Input | |
---|---|
The character set is up to date with Unicode 12. The tool as a whole is a new version, made public at an early stage. While I work on it, it will be missing features and occasionally data, and will sometimes give errors. Input currently recognised:
|
|
Type here | |
Input was | |
Interpretation | No valid codepoint in U+DC00..U+DFFF. You didn't ask for anything; treating the input as an empty string. |
Search | |
Name search | no interesting words to search for |
Unicode string properties | |
Normalization | No normalisations change the data (does not necessarily mean nothing decomposes to this form) |
Encodings that can encode this properly | utf_8 utf_16 utf_32 ascii latin_1 iso8859_2 iso8859_3 iso8859_4 iso8859_5 iso8859_6 iso8859_7 iso8859_8 iso8859_9 iso8859_10 iso8859_13 iso8859_14 iso8859_15 iso2022_jp iso2022_jp_1 iso2022_jp_2 iso2022_jp_2004 iso2022_jp_3 iso2022_jp_ext iso2022_kr gb2312 gbk gb18030 big5 big5hkscs euc_jp euc_jis_2004 euc_jisx0213 euc_kr hz johab koi8_r koi8_u mac_cyrillic mac_greek mac_iceland mac_latin2 mac_roman mac_turkish ptcp154 shift_jis shift_jis_2004 shift_jisx0213 cp037 cp424 cp437 cp500 cp737 cp775 cp850 cp852 cp855 cp856 cp857 cp860 cp861 cp862 cp863 cp864 cp865 cp866 cp869 cp874 cp875 cp932 cp949 cp950 cp1006 cp1026 cp1140 cp1250 cp1251 cp1252 cp1253 cp1254 cp1255 cp1256 cp1257 cp1258 |
Encodings that will mangle your text | |
String encoding | |
String stuff | |
HTML/XML numeric entities | All but a-zA-Z0-9 and space are encoded, which is a little overzealous. hexadecimal: decimal: |
UTF8 bytestring | as hex: (UTF8 bytestring length is 0) |
URL-encoded UTF8 | |
Javascript ~ES3 | "" |
ES6 | "" |
Python py2 | Unicode string: u'' UTF8 bytestring: '' |
py3 | Unicode string: '' UTF8 bytestring: b'' |
Ruby | '' |
CSS (in :before/:after) | '' |
TeX (experiment) | nothing interesting to report here |
Emoji (experiment; TODO) | |
CJK (experiment; TODO) | |
Unicode layout, and blocks used by the input | |
BMP - Basic Multilingual Plane: Basic Latin (128) pdf
Latin-1 Supplement (128) pdf
Latin Extended-A (128) pdf
Latin Extended-B (208) pdf
IPA Extensions (96) pdf
Greek and Coptic (144) pdf
Cyrillic Supplement (48) pdf
Arabic Supplement (48) pdf
Syriac Supplement (16) pdf
not allocated (48)
Arabic Extended-A (96) pdf
Devanagari (128) pdf
Hangul Jamo (256) pdf
Ethiopic Supplement (32) pdf
New Tai Lue (96) pdf
Khmer Symbols (32) pdf
Cyrillic Extended-C (16) pdf
Georgian Extended (48) pdf
Sundanese Supplement (16) pdf
Vedic Extensions (48) pdf
Phonetic Extensions (128) pdf
Latin Extended Additional (256) pdf
Greek Extended (256) pdf
General Punctuation (112) pdf
Currency Symbols (48) pdf
Letterlike Symbols (80) pdf
Number Forms (64) pdf
Mathematical Operators (256) pdf
Miscellaneous Technical (256) pdf
Control Pictures (64) pdf
Enclosed Alphanumerics (160) pdf
Box Drawing (128) pdf
Block Elements (32) pdf
Geometric Shapes (96) pdf
Miscellaneous Symbols (256) pdf
Supplemental Arrows-A (16) pdf
Braille Patterns (256) pdf
Supplemental Arrows-B (128) pdf
Glagolitic (96) pdf
Latin Extended-C (32) pdf
Georgian Supplement (48) pdf
Ethiopic Extended (96) pdf
Cyrillic Extended-A (32) pdf
Supplemental Punctuation (128) pdf
CJK Radicals Supplement (128) pdf
Kangxi Radicals (224) pdf
not allocated (16)
Bopomofo Extended (32) pdf
CJK Strokes (48) pdf
CJK Compatibility (256) pdf
CJK Unified Ideographs (20992) pdf
Yi Syllables (1168) pdf
Yi Radicals (64) pdf
Cyrillic Extended-B (96) pdf
Modifier Tone Letters (32) pdf
Latin Extended-D (224) pdf
Syloti Nagri (48) pdf
Saurashtra (96) pdf
Devanagari Extended (32) pdf
Myanmar Extended-B (32) pdf
Myanmar Extended-A (32) pdf
Ethiopic Extended-A (48) pdf
Latin Extended-E (64) pdf
Cherokee Supplement (80) pdf
Meetei Mayek (64) pdf
Hangul Syllables (11184) pdf
High Surrogates (896) pdf
High Private Use Surrogates (128)
Low Surrogates (1024) pdf
Private Use Area (6400) pdf
Variation Selectors (16) pdf
Vertical Forms (16) pdf
Combining Half Marks (16) pdf
Small Form Variants (32) pdf
End of the range that UCS2-based Unicode implementations can store. UCS4 implementations have no real limit; UTF-16 implementations can go beyond it using surrogates.
SMP - Supplementary Multilingual Plane: Linear B Syllabary (128) pdf
Linear B Ideograms (128) pdf
Aegean Numbers (64) pdf
Ancient Greek Numbers (80) pdf
Ancient Symbols (64) pdf
Phaistos Disc (48) pdf
not allocated (128)
Coptic Epact Numbers (32) pdf
Old Italic (48) pdf
Old Permic (48) pdf
Old Persian (64) pdf
not allocated (32)
Caucasian Albanian (64) pdf
not allocated (144)
not allocated (128)
Cypriot Syllabary (64) pdf
Imperial Aramaic (32) pdf
not allocated (48)
Phoenician (32) pdf
not allocated (64)
Meroitic Hieroglyphs (32) pdf
Meroitic Cursive (96) pdf
Kharoshthi (96) pdf
Old South Arabian (32) pdf
Old North Arabian (32) pdf
not allocated (32)
Manichaean (64) pdf
Inscriptional Pahlavi (32) pdf
Psalter Pahlavi (48) pdf
not allocated (80)
Old Turkic (80) pdf
not allocated (48)
Old Hungarian (128) pdf
Hanifi Rohingya (64) pdf
not allocated (288)
Rumi Numeral Symbols (32) pdf
not allocated (128)
Old Sogdian (48) pdf
not allocated (112)
Sora Sompeng (48) pdf
not allocated (48)
not allocated (128)
not allocated (160)
Mongolian Supplement (32) pdf
not allocated (48)
not allocated (192)
not allocated (80)
Warang Citi (96) pdf
not allocated (160)
Nandinagari (96) pdf
Zanabazar Square (80) pdf
not allocated (16)
Pau Cin Hau (64) pdf
not allocated (256)
not allocated (64)
Masaram Gondi (96) pdf
Gunjala Gondi (80) pdf
not allocated (304)
not allocated (192)
Tamil Supplement (64) pdf
Early Dynastic Cuneiform (208) pdf
not allocated (2736)
Egyptian Hieroglyphs (1072) pdf
not allocated (4032)
Anatolian Hieroglyphs (640) pdf
not allocated (8576)
Bamum Supplement (576) pdf
not allocated (96)
Pahawh Hmong (144) pdf
not allocated (688)
Medefaidrin (96) pdf
not allocated (96)
not allocated (64)
Tangut Components (768) pdf
not allocated (9472)
Kana Supplement (256) pdf
Kana Extended-A (48) pdf
Small Kana Extension (64) pdf
not allocated (2304)
not allocated (4944)
Byzantine Musical Symbols (256) pdf
Musical Symbols (256) pdf
not allocated (144)
Mayan Numerals (32) pdf
Tai Xuan Jing Symbols (96) pdf
Counting Rod Numerals (32) pdf
not allocated (128)
Sutton SignWriting (688) pdf
not allocated (1360)
Glagolitic Supplement (48) pdf
not allocated (208)
not allocated (368)
not allocated (1280)
Mende Kikakui (224) pdf
not allocated (32)
not allocated (784)
Indic Siyaq Numbers (80) pdf
not allocated (64)
Ottoman Siyaq Numbers (80) pdf
not allocated (176)
not allocated (256)
Mahjong Tiles (48) pdf
Domino Tiles (112) pdf
Playing Cards (96) pdf
Ornamental Dingbats (48) pdf
Transport and Map Symbols (128) pdf
Alchemical Symbols (128) pdf
Geometric Shapes Extended (128) pdf
Supplemental Arrows-C (256) pdf
Chess Symbols (112) pdf
not allocated (1280)
SIP - Supplementary Ideographic Plane: not allocated (32)
not allocated (3088)
not allocated (1504)
TIP - Tertiary Ideographic Plane: not allocated (60592)
Planes 4 through 13 - not allocated: plane 4 (not allocated) (65536)
plane 5 (not allocated) (65536)
plane 6 (not allocated) (65536)
plane 7 (not allocated) (65536)
plane 8 (not allocated) (65536)
plane 9 (not allocated) (65536)
plane 10 (not allocated) (65536)
plane 11 (not allocated) (65536)
plane 12 (not allocated) (65536)
plane 13 (not allocated) (65536)
SSP - Supplementary Special-purpose Plane: not allocated (128)
not allocated (65040)
PUA-A - Private Use Area A: Supplementary Private Use Area-A (65536) pdf
PUA-B - Private Use Area B: Supplementary Private Use Area-B (65536) pdf
Note that of the ~1.1 million codepoints up to U+10FFFF (the current cap), only ~140K are general-purpose graphic codepoints (about half of them in the BMP), ~130K are private use (with no defined characters), and ~830K are unused. The grouping used above is somewhat arbitrary, but looks halfway sensible. | |
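
The rough figures in the note, and the remark about UTF-16 going beyond the BMP with surrogates, can be checked with a little arithmetic. The following is only a sketch, not code from this tool: the names `span`, `to_surrogate_pair`, `PRIVATE_USE` and `SURROGATES` are made up for illustration, and the ranges are the private use and surrogate block boundaries whose sizes are listed above.

```python
# Sketch: codepoint arithmetic behind the note above.
# All names are illustrative assumptions, not part of the tool.

TOTAL = 0x10FFFF + 1  # 17 planes of 65536 codepoints, U+0000..U+10FFFF

# Private use ranges: the BMP Private Use Area, then planes 15 and 16.
PRIVATE_USE = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFF), (0x100000, 0x10FFFF)]

# Surrogate range (high + low), reserved for UTF-16 code units.
SURROGATES = (0xD800, 0xDFFF)


def span(lo, hi):
    """Number of codepoints in the inclusive range lo..hi."""
    return hi - lo + 1


def to_surrogate_pair(cp):
    """Split a codepoint above the BMP into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only codepoints above the BMP need surrogates")
    offset = cp - 0x10000  # 20 bits: top 10 go high, bottom 10 go low
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


private_use_total = sum(span(lo, hi) for lo, hi in PRIVATE_USE)

print(TOTAL)               # 1114112
print(private_use_total)   # 137472
print(span(*SURROGATES))   # 2048
print([hex(u) for u in to_surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
```

The private use total counts whole blocks (6400 + 65536 + 65536), so it includes the two noncharacters at the end of each of planes 15 and 16. That puts it slightly above the note's rough ~130K.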