Input

The character set is up to date with Unicode 12. The tool as a whole is a new version, made public at an early stage. While I work on it, it will be missing features and occasionally data, and will sometimes give errors.

Input currently recognised:
Or you could get a random character.

Input was: exclamation question
Interpretation: Doesn't look like any specific reference. Will describe the given string as-is.
Constituent codepoints
   U+0065  e  LATIN SMALL LETTER E
   U+0078  x  LATIN SMALL LETTER X
   U+0063  c  LATIN SMALL LETTER C
   U+006C  l  LATIN SMALL LETTER L
   U+0061  a  LATIN SMALL LETTER A
   U+006D  m  LATIN SMALL LETTER M
   U+0061  a  LATIN SMALL LETTER A
   U+0074  t  LATIN SMALL LETTER T
   U+0069  i  LATIN SMALL LETTER I
   U+006F  o  LATIN SMALL LETTER O
   U+006E  n  LATIN SMALL LETTER N
   U+0020     SPACE
   U+0071  q  LATIN SMALL LETTER Q
   U+0075  u  LATIN SMALL LETTER U
   U+0065  e  LATIN SMALL LETTER E
   U+0073  s  LATIN SMALL LETTER S
   U+0074  t  LATIN SMALL LETTER T
   U+0069  i  LATIN SMALL LETTER I
   U+006F  o  LATIN SMALL LETTER O
   U+006E  n  LATIN SMALL LETTER N
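A listing like the one above can be reproduced with Python's standard `unicodedata` module. This is only a minimal sketch, not the tool's actual code; the function name `describe_codepoints` and the `<no name>` fallback are made up for illustration.

```python
import unicodedata

def describe_codepoints(text):
    """Return one (codepoint label, character, name) tuple per character."""
    rows = []
    for ch in text:
        # unicodedata.name() raises ValueError for characters without a
        # name (e.g. control codes) unless a default is supplied.
        name = unicodedata.name(ch, "<no name>")
        rows.append(("U+%04X" % ord(ch), ch, name))
    return rows

for label, ch, name in describe_codepoints("exclamation question"):
    print(label, ch, name)
```

The names come from whatever Unicode version the Python interpreter ships with, which may be older or newer than Unicode 12.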
Search
Name search: ! ? ¡ ¿ ǃ ; ՜ ՞ ؟ ߹ 𑅃 𖺚 𞥞 𞥟 🆙 🔛 🙹 🙺 🙻 𡃇 ◌󠀡 ◌󠀿
Unicode string properties
Normalization: No normalisations change the data
(does not necessarily mean nothing decomposes to this form)
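The normalization check could be done roughly as follows: apply each of the four standard forms and see whether the result differs from the input. A sketch under the assumption that plain string comparison is all that is needed; `changed_by_normalization` is a hypothetical helper name.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def changed_by_normalization(text):
    """Map each normalization form that alters `text` to its result."""
    changed = {}
    for form in FORMS:
        normalized = unicodedata.normalize(form, text)
        if normalized != text:
            changed[form] = normalized
    return changed

# Plain ASCII is unaffected by every form, so this prints an empty dict.
print(changed_by_normalization("exclamation question"))
```

An empty result only says the string is already stable under all four forms. As the note above points out, other strings may still normalize to this one.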
Encodings that can encode this properly: utf_8   utf_16   utf_32   ascii   latin_1   iso8859_2   iso8859_3   iso8859_4   iso8859_5   iso8859_6   iso8859_7   iso8859_8   iso8859_9   iso8859_10   iso8859_13   iso8859_14   iso8859_15   iso2022_jp   iso2022_jp_1   iso2022_jp_2   iso2022_jp_2004   iso2022_jp_3   iso2022_jp_ext   iso2022_kr   gb2312   gbk   gb18030   big5   big5hkscs   euc_jp   euc_jis_2004   euc_jisx0213   euc_kr   hz   johab   koi8_r   koi8_u   mac_cyrillic   mac_greek   mac_iceland   mac_latin2   mac_roman   mac_turkish   ptcp154   shift_jis   shift_jis_2004   shift_jisx0213   cp037   cp424   cp437   cp500   cp737   cp775   cp850   cp852   cp855   cp856   cp857   cp860   cp861   cp862   cp863   cp864   cp865   cp866   cp869   cp874   cp875   cp932   cp949   cp950   cp1006   cp1026   cp1140   cp1250   cp1251   cp1252   cp1253   cp1254   cp1255   cp1256   cp1257   cp1258
Encodings that will mangle your text
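A rough way to arrive at both encoding lists: try a round trip through each codec and see whether the text survives. This is a sketch, not necessarily how the tool decides. The helper name `classify_encodings` and the short candidate list are assumptions, and encodings that refuse to encode the text at all are simply lumped in with the ones that mangle it.

```python
def classify_encodings(text, candidates):
    """Split codec names into (round-trips cleanly, fails or mangles)."""
    ok, mangled = [], []
    for enc in candidates:
        try:
            round_trip = text.encode(enc).decode(enc)
        except (UnicodeError, LookupError):
            # UnicodeError: the codec cannot represent the text.
            # LookupError: this Python build does not know the codec.
            mangled.append(enc)
            continue
        if round_trip == text:
            ok.append(enc)
        else:
            mangled.append(enc)
    return ok, mangled

candidates = ("utf_8", "ascii", "latin_1", "koi8_r", "cp1252")
print(classify_encodings("exclamation question", candidates))
print(classify_encodings("\u00e9", candidates))
```

The codec names are Python's own, which is why they match the spelling in the list above.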
String encoding
String stuff
HTML/XML
numeric entities
All but a-zA-Z0-9 and space are encoded, which is a little overzealous
hexadecimal:
  exclamation question

decimal:
  exclamation question
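The rule described above (everything except a-zA-Z0-9 and space becomes a numeric entity) can be sketched with a single regular expression. The function name `numeric_entities` and the lowercase hex digits are my own choices here, not something the tool documents.

```python
import re

# Anything that is not an ASCII letter, an ASCII digit or a space.
_NOT_PLAIN = re.compile(r"[^a-zA-Z0-9 ]")

def numeric_entities(text, hexadecimal=True):
    """Replace every non-plain character with an HTML/XML numeric entity."""
    def replace(match):
        codepoint = ord(match.group(0))
        if hexadecimal:
            return "&#x%x;" % codepoint
        return "&#%d;" % codepoint
    return _NOT_PLAIN.sub(replace, text)

print(numeric_entities("exclamation question"))   # nothing to encode
print(numeric_entities("what?!"))
print(numeric_entities("what?!", hexadecimal=False))
```

Since the example input consists only of letters and a space, both variants come out identical to the input, which is what the listing above shows.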
UTF8 bytestring as hex: 6578636c616d6174696f6e207175657374696f6e
(UTF8 bytestring length is 20)
URL-encoded UTF8: %65%78%63%6C%61%6D%61%74%69%6F%6E%20%71%75%65%73%74%69%6F%6E
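Both representations above follow directly from the UTF-8 bytes. In this sketch, `utf8_hex` and `url_encode_all` are illustrative names. The second one percent-encodes every byte, as the output above does, whereas the standard `urllib.parse.quote` would leave letters unescaped.

```python
def utf8_hex(text):
    """Hex dump of the UTF-8 encoding, two lowercase digits per byte."""
    return text.encode("utf-8").hex()

def url_encode_all(text):
    """Percent-encode every UTF-8 byte, including unreserved characters."""
    return "".join("%%%02X" % byte for byte in text.encode("utf-8"))

sample = "exclamation question"
print(utf8_hex(sample))
print(len(sample.encode("utf-8")))
print(url_encode_all(sample))
```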
Javascript
~ES3
"exclamation question"
ES6
"exclamation question"
Python
py2
Unicode string:
  u'exclamation question'
UTF8 bytestring:
  'exclamation question'

py3
Unicode string:
  'exclamation question'
UTF8 bytestring:
  b'exclamation question'
Ruby: 'exclamation question'
CSS (in :before/:after): 'exclamation question'
TeX
(experiment)
nothing interesting to report here
Emoji (experiment; TODO)
Has: No
CJK (experiment; TODO)
Has: No
Font info (experiment; TODO)
Unicode layout, and blocks used by the input

BMP - Basic Multilingual Plane:






Hebrew (112) pdf
Arabic (256) pdf
Syriac (80) pdf
Thaana (64) pdf
NKo (64) pdf
not allocated (48)
Bengali (128) pdf
Oriya (128) pdf
Tamil (128) pdf
Telugu (128) pdf
Kannada (128) pdf
Sinhala (128) pdf
Thai (128) pdf
Lao (128) pdf
Tibetan (256) pdf
Myanmar (160) pdf
Ogham (32) pdf
Runic (96) pdf
Buhid (32) pdf
Khmer (128) pdf
Limbu (80) pdf
Tai Le (48) pdf
Batak (64) pdf
Lepcha (80) pdf




Arrows (112) pdf


Coptic (128) pdf


not allocated (16)
Kanbun (16) pdf
Lisu (48) pdf
Vai (320) pdf
Bamum (96) pdf
Rejang (48) pdf
Cham (96) pdf


High Private Use Surrogates (128)









End of the range that UCS2-based Unicode implementations can store.
UCS4 implementations have no such limit; UTF-16 implementations can go beyond it using surrogate pairs.
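How UTF-16 gets past the BMP: a codepoint above U+FFFF is reduced by 0x10000, and the resulting 20 bits are split over two 16-bit units taken from the surrogate ranges. A minimal sketch of that arithmetic; `to_surrogate_pair` is a hypothetical helper name.

```python
def to_surrogate_pair(codepoint):
    """Split a codepoint beyond the BMP into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = codepoint - 0x10000          # 20 bits remain
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1F199)    # the squared UP! sign
print("%04X %04X" % (high, low))          # D83C DD99
```

The upper bound of the check is also where the U+10FFFF cap comes from: two 10-bit halves cannot address anything beyond it.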




SMP - Supplementary Multilingual Plane:
not allocated (128)
Lycian (32) pdf
Carian (64) pdf
Gothic (32) pdf
not allocated (32)
Osage (80) pdf
not allocated (144)
not allocated (128)
not allocated (48)
Hatran (32) pdf
Lydian (32) pdf
not allocated (64)
not allocated (32)
not allocated (80)
not allocated (48)
not allocated (288)
not allocated (128)
not allocated (112)
Brahmi (128) pdf
Kaithi (80) pdf
Chakma (80) pdf
Khojki (80) pdf
not allocated (48)
Grantha (128) pdf
not allocated (128)
Newa (128) pdf
not allocated (160)
Siddham (128) pdf
Modi (96) pdf
Takri (80) pdf
not allocated (48)
Ahom (64) pdf
not allocated (192)
Dogra (80) pdf
not allocated (80)
not allocated (160)
not allocated (16)
not allocated (256)
not allocated (64)
not allocated (304)
not allocated (192)
Cuneiform (1024) pdf
not allocated (2736)
not allocated (4032)
not allocated (8576)
Mro (48) pdf
not allocated (96)
not allocated (688)
not allocated (96)
Miao (160) pdf
not allocated (64)
Tangut (6144) pdf
not allocated (9472)
Nushu (400) pdf
not allocated (2304)
not allocated (4944)


not allocated (144)
not allocated (128)
not allocated (1360)
not allocated (208)
not allocated (368)
Wancho (64) pdf
not allocated (1280)
not allocated (32)
Adlam (96) pdf
not allocated (784)
not allocated (64)
not allocated (176)
not allocated (256)






not allocated (1280)




SIP - Supplementary Ideographic Plane:
not allocated (32)
not allocated (3088)
not allocated (1504)




TIP - Tertiary Ideographic Plane:
not allocated (60592)




Planes 4 through 13 - not allocated:
plane 4 (not allocated) (65536)
plane 5 (not allocated) (65536)
plane 6 (not allocated) (65536)
plane 7 (not allocated) (65536)
plane 8 (not allocated) (65536)
plane 9 (not allocated) (65536)
plane 10 (not allocated) (65536)
plane 11 (not allocated) (65536)
plane 12 (not allocated) (65536)
plane 13 (not allocated) (65536)




SSP - Supplementary Special-purpose Plane:
Tags (128) pdf
not allocated (128)
not allocated (65040)




PUA-A - Private Use Area A:




PUA-B - Private Use Area B:




Note that of the ~1.1 million codepoints up to U+10FFFF (the current cap), only ~140K are general-purpose graphic codepoints (about half in the BMP), ~130K are private use (with no defined characters), and ~830K are unused.
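Figures of this kind can be checked by tallying the general category of every codepoint. This is a sketch only: the exact numbers depend on the Unicode version bundled with your Python interpreter, and which categories count as "general-purpose graphic" is a judgment call that is left open here.

```python
import unicodedata
from collections import Counter

def count_by_category():
    """Tally the general category of every codepoint up to U+10FFFF."""
    counts = Counter()
    for codepoint in range(0x110000):
        counts[unicodedata.category(chr(codepoint))] += 1
    return counts

counts = count_by_category()
total = sum(counts.values())
print("total codepoints:", total)
print("private use (Co):", counts["Co"])
print("surrogates (Cs): ", counts["Cs"])
print("unassigned (Cn): ", counts["Cn"])
print("everything else: ", total - counts["Co"] - counts["Cs"] - counts["Cn"])
```

The private-use and surrogate counts are fixed by the standard's layout. The unassigned count shrinks with every new Unicode release.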

The grouping used above is somewhat arbitrary, but looks halfway sensible.