encoding, ascii

ASCII – a 7-bit character encoding covering unaccented English letters (codes 0 to 127; printable characters run from 32 to 126: space = 32, “A” = 65)
code pages – different ways of using the eighth bit (codes 128 to 255), depending on where you lived (Central European Windows-1250: Latin capital letter O with acute, Ó = 211)
Unicode – a single character set meant to cover all characters on the planet (“A” = U+0041; 0x41 = 65 decimal)
UTF-8 – a system for storing a string of Unicode code points (U+hex number) as bytes: 1 byte for code points 0-127, 2 to 4 bytes for code points above 127. English text looks exactly the same in UTF-8 as in ASCII or an ANSI code page.
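
The numbers quoted above can be checked with a short Python sketch. The variable names are only illustrative, and "cp1250" is the name Python's standard codecs use for Windows-1250.

```python
# Code points and bytes for the examples in the notes above.
a = "A"
o_acute = "\u00d3"  # Ó, Latin capital letter O with acute

code_point_a = ord(a)                        # Unicode code point of "A"
cp1250_byte = o_acute.encode("cp1250")[0]    # single byte in Windows-1250
utf8_a = a.encode("utf-8")                   # one byte in UTF-8
utf8_o = o_acute.encode("utf-8")             # two bytes in UTF-8

# Plain English text has identical bytes in ASCII and UTF-8.
ascii_hello = "Hello".encode("ascii")
utf8_hello = "Hello".encode("utf-8")

print(code_point_a, hex(code_point_a))   # 65 0x41
print(cp1250_byte)                       # 211
print(len(utf8_a), len(utf8_o))          # 1 2
print(ascii_hello == utf8_hello)         # True
```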

You need to know the encoding of a text; don’t assume it is ASCII. Declare it explicitly, for example in an email header:
Content-Type: text/plain; charset="UTF-8"
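
As a rough illustration of why the declared charset matters, here is a minimal Python sketch that reads it from a Content-Type header with the standard `email` package and uses it to decode bytes. The message text and the sample bytes are made up for the example.

```python
from email import message_from_string

# A made-up message carrying the header from the note above.
raw = 'Content-Type: text/plain; charset="UTF-8"\n\nbody'
msg = message_from_string(raw)

# get_content_charset() returns the charset parameter in lower case.
charset = msg.get_content_charset()

# Bytes as they might arrive on the wire: "Ó" stored as UTF-8.
payload = b"\xc3\x93"
decoded = payload.decode(charset)

# Guessing the wrong encoding gives different characters for the same bytes.
wrong = payload.decode("cp1250")

print(charset)            # utf-8
print(decoded)            # Ó
print(wrong == decoded)   # False
```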

Traditional encodings may store only some of the code points correctly and change the others into question marks “?”. UTF-8 can store any code point correctly.
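
The question-mark effect can be reproduced with Python's `errors="replace"` option. This is a small sketch with an arbitrary sample string: Ó exists in Windows-1250, while the Greek letter Ω does not.

```python
# Ó is in Windows-1250 (cp1250); Greek capital omega is not.
text = "\u00d3\u03a9"  # "ÓΩ"

# A traditional code page keeps what it can and replaces the rest with "?".
lossy_bytes = text.encode("cp1250", errors="replace")
lossy_text = lossy_bytes.decode("cp1250")

# UTF-8 can represent every code point, so the round trip is lossless.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print(lossy_bytes)          # b'\xd3?'
print(lossy_text)           # Ó?
print(round_trip == text)   # True
```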

