Notice: I'll be updating the information over the next few weeks. It may be broken occasionally.
Input | |
---|---|
Input was: | E2809E | ValueError('unichr() arg not in range(0x110000) (wide Python build)',)
...assuming this is UTF8-as-hexadecimal for U+201E (if intended as a hexadecimal codepoint, use U+E2809E style) | |
Character: | „ (U+201E) |
Characters near this | Before: ◌ ◌ ‐ ‑ ‒ – — ― ‖ ‗ ‘ ’ ‚ ‛ “ ” After: ‟ † ‡ • ‣ ․ ‥ … ‧ ◌ ◌ ◌ ◌ ◌ ◌ Note: unprintable, nonspacing and combining characters are shown as or beside ◌ |
Some unicode data | |
Character name | DOUBLE LOW-9 QUOTATION MARK |
Categories | Other Neutrals, punctuation: open |
Links elsewhere: | fileformat.info/info/unicode/char/201e codepoints.net/U+201e |
Unicode block | General Punctuation, U+2000 to U+206F (see also the corresponding PDF on unicode.org) |
Normalization | No normalisations change the data (does not necessarily mean nothing decomposes to this form) |
Encoding | |
Character stuff | |
Named entity | HTML 2: not defined; HTML 3.2: not defined; HTML 4 and XHTML 1.0: `&bdquo;`; XML 1.0: not defined |
Alt code | Alt 0132 Alt 132 (where cp1252-based, not a given) |
String stuff | |
HTML/XML numeric entities | All but basic alphanumeric encoded (hexadecimal and decimal): `&#x201e;` `&#8222;` |
UTF8 bytestring | as hex: e2809e (UTF8 bytestring length is 3) |
URL-encoded UTF8 | %E2%80%9E |
Python string before py3k | Unicode string: u'\u201e' UTF8 bytestring: '\xe2\x80\x9e' |
...in py3k | Unicode string: '\u201e' UTF8 bytestring: b'\xe2\x80\x9e' |
JavaScript (≥1.3) | "\u201e" |
LaTeX (incomplete experiment) | Character in LaTeX: ,, |
Encodings that can encode this properly | utf_8 utf_16 iso8859_13 gb18030 mac_cyrillic mac_iceland mac_latin2 mac_roman mac_turkish ptcp154 cp775 cp1250 cp1251 cp1252 cp1253 cp1254 cp1255 cp1256 cp1257 cp1258 |
Encodings that will dismember your data | ascii latin_1 iso8859_2 iso8859_3 iso8859_4 iso8859_5 iso8859_6 iso8859_7 iso8859_8 iso8859_9 iso8859_10 iso8859_14 iso8859_15 iso2022_jp iso2022_jp_1 iso2022_jp_2 iso2022_jp_2004 iso2022_jp_3 iso2022_jp_ext iso2022_kr gb2312 gbk big5 big5hkscs euc_jp euc_jis_2004 euc_jisx0213 euc_kr hz johab koi8_r koi8_u mac_greek shift_jis shift_jis_2004 shift_jisx0213 cp037 cp424 cp437 cp500 cp737 cp850 cp852 cp855 cp856 cp857 cp860 cp861 cp862 cp863 cp864 cp865 cp866 cp869 cp874 cp875 cp932 cp949 cp950 cp1006 cp1026 cp1140 |
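
Most rows in the table above can be reproduced with Python 3's standard library. The following is a minimal sketch, not the code behind this page: the function names `describe_utf8_hex`, `as_codepoint` and `can_encode` are made up for illustration, and "can encode" is assumed to mean simply that `str.encode` does not raise.

```python
import html.entities
import unicodedata
from urllib.parse import quote


def describe_utf8_hex(hex_text):
    """Read hex_text as UTF-8 bytes written in hex and describe the one character."""
    raw = bytes.fromhex(hex_text)
    char = raw.decode("utf_8")  # raises UnicodeDecodeError on invalid UTF-8
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    cp = ord(char)
    return {
        "codepoint": f"U+{cp:04X}",
        "name": unicodedata.name(char, "(unnamed)"),
        "category": unicodedata.category(char),
        # HTML 4 named entity, or None if there is none
        "named_entity": html.entities.codepoint2name.get(cp),
        "numeric_entities": (f"&#x{cp:x};", f"&#{cp};"),
        "utf8_hex": raw.hex(),
        "url_encoded": quote(char),  # percent-encodes the UTF-8 bytes
        "python_literal": ascii(char),
    }


def as_codepoint(hex_text):
    """Read hex_text as a codepoint number; None if it is beyond U+10FFFF."""
    try:
        return chr(int(hex_text, 16))
    except (ValueError, OverflowError):
        return None


def can_encode(char, codec):
    """True if the codec has a representation for char (assumed meaning of 'properly')."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


info = describe_utf8_hex("E2809E")
for key, value in info.items():
    print(key, value)

# E2809E read as a codepoint is out of range, which is why it is treated as UTF-8.
print(as_codepoint("E2809E"))

for codec in ("utf_8", "cp1252", "latin_1", "ascii"):
    print(codec, can_encode("\u201e", codec))
```

Looping `can_encode` over a list of codec names gives a split like the two encoding rows above, though the exact result depends on the codecs shipped with the Python build in use.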
Note that of the ~1.1 million codepoints up to U+10FFFF (the current cap), ~900K are unused, ~130K are private use, and only ~100K are general-purpose graphic codepoints (about half in the BMP).
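
The fixed parts of these figures follow from the range boundaries, and the rest can be estimated from whatever Unicode database the running Python ships. A rough sketch: the "graphic" grouping below (letters, marks, numbers, punctuation, symbols and space separators) is an assumption made here, and the unassigned and graphic counts vary with the Unicode version, so they will not match the rounded numbers above exactly.

```python
import unicodedata
from collections import Counter

TOTAL = 0x10FFFF + 1                    # every codepoint from U+0000 to U+10FFFF

# Private use: U+E000..U+F8FF in the BMP, plus planes 15 and 16 minus
# their last two codepoints (U+xFFFE and U+xFFFF are noncharacters).
bmp_private = 0xF8FF - 0xE000 + 1
plane_private = 0xFFFD + 1
private_use = bmp_private + 2 * plane_private

surrogates = 0xDFFF - 0xD800 + 1        # U+D800..U+DFFF


def count_by_category():
    """Count codepoints per general category using this Python's Unicode data."""
    return Counter(unicodedata.category(chr(cp)) for cp in range(TOTAL))


counts = count_by_category()
unassigned = counts["Cn"]               # version dependent
graphic = sum(n for cat, n in counts.items()
              if cat[0] in "LMNPS" or cat == "Zs")  # assumed definition

print("Unicode data version:", unicodedata.unidata_version)
print("total:", TOTAL)
print("private use:", private_use)
print("surrogates:", surrogates)
print("unassigned:", unassigned)
print("graphic (assumed grouping):", graphic)
```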
The grouping used below is somewhat arbitrary, but looks halfway sensible.
For more on planes, see http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Planes
Mouseover shows a range's codepoint range, links are to unicode.org PDFs.
Nameless gray blocks are reserved, unused or restricted areas, and show their size.
If we are showing a single codepoint above, the range it is in is bolded below.