Notice: I'll be updating the information over the next few weeks. It may be broken occasionally.
Input | |
---|---|
Input was: | U+2022 |
...interpreting as U+00 style format (hex codepoint, two to eight digits) | |
Character: | • (U+2022) |
Characters near this | Before: ‒ – — ― ‖ ‗ ‘ ’ ‚ ‛ “ ” „ ‟ † ‡ After: ‣ ․ ‥ … ‧ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ‰ ‱ Note: unprintable, nonspacing and combining characters are shown as or beside ◌ |
Some unicode data | |
Character name | BULLET |
Categories | Other Neutrals, punctuation:Other |
Links elsewhere: | fileformat.info/info/unicode/char/2022 codepoints.net/U+2022 |
Unicode block | General Punctuation, U+2000 to U+206F (see also the corresponding PDF on unicode.org) |
Normalization | No normalisations change the data (does not necessarily mean nothing decomposes to this form) |
Encoding | |
Character stuff | |
Named entity | HTML 2: not defined HTML 3.2: not defined HTML 4 and XHTML 1.0: &bull; XML 1.0: not defined |
Alt code | Alt 7 Alt 0149 Alt 149 (where cp1252-based, not a given) |
String stuff | |
HTML/XML numeric entities | All but basic alphanumeric encoded (hexadecimal and decimal): &#x2022; &#8226; |
UTF8 bytestring | as hex: e280a2 (UTF8 bytestring length is 3) |
URL-encoded UTF8 | %E2%80%A2 |
Python string before py3k | Unicode string: u'\u2022' UTF8 bytestring: '\xe2\x80\xa2' |
...in py3k | Unicode string: '\u2022' UTF8 bytestring: b'\xe2\x80\xa2' |
Javascript (≥1.3) | "\u2022" |
LaTeX (incomplete experiment) | Character in LaTeX: \textbullet |
Encodings that can encode this properly | utf_8 utf_16 iso2022_jp_2004 iso2022_jp_3 gb18030 big5 big5hkscs euc_jis_2004 euc_jisx0213 mac_cyrillic mac_greek mac_iceland mac_latin2 mac_roman mac_turkish ptcp154 shift_jis_2004 shift_jisx0213 cp874 cp950 cp1250 cp1251 cp1252 cp1253 cp1254 cp1255 cp1256 cp1257 cp1258 |
Encodings that will damage your data | ascii latin_1 iso8859_2 iso8859_3 iso8859_4 iso8859_5 iso8859_6 iso8859_7 iso8859_8 iso8859_9 iso8859_10 iso8859_13 iso8859_14 iso8859_15 iso2022_jp iso2022_jp_1 iso2022_jp_2 iso2022_jp_ext iso2022_kr gb2312 gbk euc_jp euc_kr hz johab koi8_r koi8_u shift_jis cp037 cp424 cp437 cp500 cp737 cp775 cp850 cp852 cp855 cp856 cp857 cp860 cp861 cp862 cp863 cp864 cp865 cp866 cp869 cp875 cp932 cp949 cp1006 cp1026 cp1140 |
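
Most of the rows above can be reproduced with the Python 3 standard library. The following is a minimal sketch, not the code this page actually runs: the helper names `parse_codepoint`, `describe` and `round_trips` are made up for illustration, and the round-trip check is only an assumption about what "can encode this properly" means.

```python
import re
import unicodedata
from urllib.parse import quote


def parse_codepoint(text):
    """Turn 'U+2022'-style input (hex, two to eight digits) into a character."""
    match = re.fullmatch(r"[Uu]\+([0-9A-Fa-f]{2,8})", text.strip())
    if match is None:
        raise ValueError(f"not in U+XXXX form: {text!r}")
    value = int(match.group(1), 16)
    if value > 0x10FFFF:  # the current cap on Unicode codepoints
        raise ValueError(f"beyond U+10FFFF: {text!r}")
    return chr(value)


def describe(char):
    """Collect name, category and a few encoded forms of one character."""
    codepoint = ord(char)
    return {
        "name": unicodedata.name(char, "(no name)"),
        "category": unicodedata.category(char),
        "utf8_hex": char.encode("utf-8").hex(),
        "url_encoded": quote(char),          # percent-encodes the UTF-8 bytes
        "entity_hex": f"&#x{codepoint:X};",
        "entity_dec": f"&#{codepoint};",
    }


def round_trips(char, codec):
    """True if the codec can encode the character and decode it back unchanged."""
    try:
        return char.encode(codec).decode(codec) == char
    except (UnicodeError, LookupError):
        # UnicodeError: the codec cannot represent the character.
        # LookupError: this Python build does not know the codec name.
        return False


if __name__ == "__main__":
    bullet = parse_codepoint("U+2022")
    for key, value in describe(bullet).items():
        print(f"{key}: {value}")
    for codec in ("utf_8", "cp1252", "latin_1", "ascii"):
        print(codec, round_trips(bullet, codec))
```

The codec names are the same ones used in the two encoding rows, so either list can be re-checked by looping `round_trips` over it; results for less common codecs may differ between Python versions.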
Note that of the ~1.1 million codepoints up to U+10FFFF (the current cap), ~900K are unused, ~130K are private use, and only ~100K are general-purpose graphic codepoints (about half of them in the BMP).
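
These rough figures can be checked against the Unicode database bundled with Python. This is a sketch under that assumption: the unassigned count depends on which Unicode version your Python ships, so it will drift from the estimate above as new characters are assigned. The function name `tally_general_categories` is made up for illustration.

```python
import unicodedata
from collections import Counter


def tally_general_categories():
    """Count every codepoint from U+0000 to U+10FFFF by general category."""
    return Counter(unicodedata.category(chr(cp)) for cp in range(0x110000))


counts = tally_general_categories()
total = sum(counts.values())

print("Unicode database version:", unicodedata.unidata_version)
print("total codepoints:", total)
print("unassigned (Cn):", counts["Cn"])
print("private use (Co):", counts["Co"])
print("surrogates (Cs):", counts["Cs"])
```

The total is 17 planes of 65,536 codepoints each, and the private-use count covers U+E000 to U+F8FF plus nearly all of planes 15 and 16; both are fixed. Only the unassigned figure changes between Unicode versions.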
The grouping used below is somewhat arbitrary, but looks halfway sensible.
For more on planes, see http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Planes
Mouseover shows a range's codepoint range, links are to unicode.org PDFs.
Nameless gray blocks are reserved, unused or restricted areas, and show their size.
If we are showing a single codepoint above, the range it is in is bolded below.