Chris Angelico
2015-03-09 02:52:14 UTC
There's a bit of a discussion happening on python-list about whether
or not it should be legal to have codepoints like U+D800 in Unicode
strings. Currently, both Python and Pike permit them, but reject them
if you try to, for example, convert to UTF-8. But a suggestion has
been made that the mere presence of \uD800 in a string literal should
be a syntax error, and I'm wondering: Has anyone considered and
rejected this, or is it simply something that nobody's thought to
disallow?
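
For concreteness, here is a minimal sketch of the current behaviour on the Python side, assuming CPython 3 (Python 2 behaves differently, and the Pike side is not shown). The variable names are just for illustration.

```python
# Assumes Python 3. A lone surrogate is accepted in a str literal...
s = "\ud800"
assert len(s) == 1
assert ord(s) == 0xD800

# ...but the strict UTF-8 encoder refuses to encode it.
try:
    s.encode("utf-8")
    encoded_ok = True
except UnicodeEncodeError:
    encoded_ok = False

# The "surrogatepass" error handler opts out of that check and
# emits the generalized three-byte form for the code point.
raw = s.encode("utf-8", "surrogatepass")
round_tripped = raw.decode("utf-8", "surrogatepass")
```

So the string itself is legal today; the check only fires at the encoding boundary, unless you explicitly ask for it to be skipped.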
Are there situations in which it's necessary to be able to store these
kinds of lone surrogates in a string?
ChrisA