Changes between Version 2 and Version 3 of Documentation/NewbieGuide/CharacterSets


Timestamp:
04/15/05 10:46:29
Author:
gogo
Comment:

--

  • Documentation/NewbieGuide/CharacterSets

    v2 v3
    4  4
    5  5
    6     [wiki:CharacterSets#stopwaffle Skip to the advice.]
       6  [http://xinha.python-hosting.com/wiki/CharacterSets#stopwaffle Skip to the advice.]
    7  7
    8  8
     
    41 41 == UTF-8 ==
    42 42
    43    UTF-8 is just a character encoding. It says "take a string of bytes, do this algorithm over them, and you'll get a list of numbers which represent characters in the UNICODE character set".  The special thing about UNICODE is that it leaves the 128 characters of 7-bit ASCII (codes 0 to 127) intact (remember that these characters are unchanged in UNICODE), so for most English text, UTF-8 encoded UNICODE is just the same as ASCII (which, as you might expect, is quite useful).
       43 UTF-8 is just a character encoding. It says "take a string of bytes, do this algorithm over them, and you'll get a list of numbers which represent characters in the UNICODE character set".  The special thing about UTF-8 is that it leaves the 128 characters of 7-bit ASCII (codes 0 to 127) intact (remember that these characters are unchanged in UNICODE), so for most English text, UTF-8 encoded UNICODE is identical to plain old 7-bit ASCII (which, as you might expect, is quite useful).
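The ASCII compatibility described above can be checked directly. The following is only a minimal sketch, assuming an environment that provides the standard `TextEncoder` (modern browsers and recent Node.js); the helper names `utf8Bytes` and `asciiCodes` are made up for this illustration and are not part of the guide.

```javascript
// Hypothetical helper: encode a string as UTF-8 and return the bytes as a plain array.
function utf8Bytes(text) {
  return Array.from(new TextEncoder().encode(text));
}

// Hypothetical helper: the character code of each character in the string.
// For plain 7-bit ASCII text these are the ASCII codes.
function asciiCodes(text) {
  return Array.from(text, (ch) => ch.charCodeAt(0));
}

// Plain English text: the UTF-8 bytes are exactly the ASCII codes.
console.log(utf8Bytes("Hello"));  // [72, 101, 108, 108, 111]
console.log(asciiCodes("Hello")); // [72, 101, 108, 108, 111]

// A character outside ASCII (U+00E9, e with acute accent) needs more than one byte.
console.log(utf8Bytes("\u00e9")); // [195, 169]
```

Only characters with codes 0 to 127 come out as a single, unchanged byte; everything else is spread over two or more bytes.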
    44 44
    45 45 Slowly but surely the world is progressing to ONE character set (UNICODE) and ONE encoding (UTF-8). Gone will be ASCII, BIG-5, SHIFT-JIS, and all those other character sets and encodings, never to darken our doorstep again.
    46 46
    47    '''The important thing is''' - UTF-8 is ONLY used to get characters INTO JavaScript. Once they are there, that's it: it's just a list of numbers, nothing more, nothing less, just a list of numbers which represent characters in the UNICODE character set.  Not BIG-5, not ASCII, just UNICODE.
       47 '''The important thing is''' - UTF-8, and any other character encoding, is ONLY used to get characters INTO JavaScript. Once they are there, that's it: it's just a list of numbers which are indexes into the big UNICODE character tables, nothing more, nothing less.  Not BIG-5, not ASCII, not even UTF-8 anymore, just UNICODE index numbers.
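To illustrate that the encoding only matters on the way in, here is a small sketch, assuming the standard `TextDecoder` is available; `decodeBytes` is a hypothetical helper written for this example. The same character arrives as different bytes in two encodings, yet ends up as the same string and the same UNICODE number. (Strictly, JavaScript stores strings as 16-bit units, so the number per unit equals the UNICODE number only for characters up to U+FFFF; `codePointAt` handles the rest.)

```javascript
// Hypothetical helper: turn raw bytes into a JavaScript string using the named encoding.
function decodeBytes(bytes, encoding) {
  return new TextDecoder(encoding).decode(new Uint8Array(bytes));
}

// The same character (U+00E9) as it arrives in two different encodings.
const fromUtf8 = decodeBytes([0xc3, 0xa9], "utf-8");     // two bytes in UTF-8
const fromUtf16 = decodeBytes([0xe9, 0x00], "utf-16le"); // two different bytes in UTF-16LE

// Once decoded, nothing remembers which encoding the bytes came from.
console.log(fromUtf8 === fromUtf16);   // true
console.log(fromUtf8.codePointAt(0));  // 233, i.e. U+00E9
console.log(fromUtf16.codePointAt(0)); // 233
```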
    48 48
    49 49 {{{