Tuesday, 13 March 2012
UTF-8
                 The What, Why, and How

                      Iulian Dogariu
                        10.03.2012

Te-a°tept disearã
                          la portiÞã

00000000  89 50 4e 47 0d 0a 1a 0a  00 00 00 0d 49 48 44 52  |.PNG........IHDR|
00000010  00 00 02 e5 00 00 00 8f  08 06 00 00 00 5d ca a5  |.............]..|
00000020  eb 00 00 0a 43 69 43 43  50 49 43 43 20 70 72 6f  |....CiCCPICC pro|
00000030  66 69 6c 65 00 00 78 da  9d 53 77 58 93 f7 16 3e  |file..x..SwX...>|
00000040  df f7 65 0f 56 42 d8 f0  b1 97 6c 81 00 22 23 ac  |..e.VB....l.."#.|
00000050  08 c8 10 59 a2 10 92 00  61 84 10 12 40 c5 85 88  |...Y....a...@...|
00000060  0a 56 14 15 11 9c 48 55  c4 82 d5 0a 48 9d 88 e2  |.V....HU....H...|
00000070  a0 28 b8 67 41 8a 88 5a  8b 55 5c 38 ee 1f dc a7  |.(.gA..Z.U\8....|
00000080  b5 7d 7a ef ed ed fb d7  fb bc e7 9c e7 fc ce 79  |.}z............y|
00000090  cf 0f 80 11 12 26 91 e6  a2 6a 00 39 52 85 3c 3a  |.....&...j.9R.<:|
000000a0  d8 1f 8f 4f 48 c4 c9 bd  80 02 15 48 e0 04 20 10  |...OH......H.. .|
000000b0  e6 cb c2 67 05 c5 00 00  f0 03 79 78 7e 74 b0 3f  |...g......yx~t.?|




“ASCII is allright”


          41 53 43 49 49 20 69 73 20 61 6c 6c 72 69 67 68 74
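The byte values on this slide can be reproduced with a few lines of Python. This is only a sketch for checking the numbers; the variable names are illustrative and not part of the talk.

```python
text = "ASCII is allright"

# ASCII stores exactly one byte per character.
ascii_bytes = text.encode("ascii")
ascii_hex = ascii_bytes.hex(" ")  # bytes.hex(sep) needs Python 3.8 or newer

print(ascii_hex)
print(len(text), "characters,", len(ascii_bytes), "bytes")
```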




ASCII

One byte
                      not enough
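A small sketch of why one byte runs out: ASCII defines only 128 code points, so any character beyond U+007F cannot be encoded with it. The helper name `fits_in_ascii` is made up for this illustration.

```python
def fits_in_ascii(ch: str) -> bool:
    """Return True if the character can be encoded as 7-bit ASCII."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

# The three sample characters used later in the deck.
for ch in ("a", "ж", "龍"):
    print(f"U+{ord(ch):04X} {ch} fits in ASCII: {fits_in_ascii(ch)}")
```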


ASCII
                      “Unicode is KEWL”


                 55 6e 69 63 6f 64 65 20 69 73 20 4b 45 57 4c 0a




Unicode UCS-2
                      “Unicode is KEWL”

                 55 00 6e 00 69 00 63 00 6f 00 64 00 65 00 20 00
                 69 00 73 00 20 00 4b 00 45 00 57 00 4c 00 0a 00
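The UCS-2 bytes above can be approximated in Python. Note the assumption: Python ships no codec named UCS-2, so this sketch uses `utf-16-le`, which produces the same bytes as little-endian UCS-2 for characters in the range U+0000..U+FFFF.

```python
text = "Unicode is KEWL\n"

# Stand-in for little-endian UCS-2 (assumption: all characters are <= U+FFFF).
ucs2_le = text.encode("utf-16-le")

print(ucs2_le.hex(" "))
print(len(text.encode("ascii")), "bytes as ASCII,", len(ucs2_le), "bytes as UCS-2")
```

Every character now costs two bytes, even though this text never needs the second one.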




Two bytes
                      not enough (!)
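Two bytes cover only U+0000..U+FFFF, while Unicode code points go up to U+10FFFF. The sketch below checks this for two characters; U+1F600 is picked here purely as an example of a code point above U+FFFF and does not appear in the slides.

```python
def fits_in_two_bytes(ch: str) -> bool:
    """Return True if the code point fits in a single 16-bit unit."""
    return ord(ch) <= 0xFFFF

bmp_char = "龍"              # U+9F8D, fits in 16 bits
astral_char = "\U0001F600"   # U+1F600, example only, does not fit

for ch in (bmp_char, astral_char):
    # UTF-16 needs two 16-bit units (a surrogate pair) above U+FFFF.
    units = len(ch.encode("utf-16-le")) // 2
    print(f"U+{ord(ch):04X}: fits={fits_in_two_bytes(ch)}, UTF-16 units={units}")
```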


Unicode UCS-4
                      “Unicode is KEWL”

                 55 00 00 00 6e 00 00 00 69 00 00 00 63 00 00 00
                 6f 00 00 00 64 00 00 00 65 00 00 00 20 00 00 00
                 69 00 00 00 73 00 00 00 20 00 00 00 4b 00 00 00
                 45 00 00 00 57 00 00 00 4c 00 00 00 0a 00 00 00
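To put numbers on the storage cost, the following sketch encodes the same text three ways. It assumes Python's `utf-32-le` codec as a stand-in for little-endian UCS-4, and `utf-16-le` as a stand-in for UCS-2.

```python
text = "Unicode is KEWL\n"

ucs4_le = text.encode("utf-32-le")  # four bytes per character
print(ucs4_le.hex(" "))

# Compare the size of the same 16 characters in each fixed-width encoding.
sizes = {name: len(text.encode(name)) for name in ("ascii", "utf-16-le", "utf-32-le")}
print(sizes)
```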




“0” bytes
                      “Unicode is KEWL”

                 55 00 6e 00 69 00 63 00 6f 00 64 00 65 00 20 00
                 69 00 73 00 20 00 4b 00 45 00 57 00 4c 00 0a 00
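The zero bytes matter because much C-style code treats a zero byte as the end of a string. The sketch below imitates that behaviour with a hypothetical `c_strlen` helper written for this illustration; it is not a real library function.

```python
def c_strlen(buf: bytes) -> int:
    """Mimic C's strlen: count bytes up to the first zero byte."""
    zero_at = buf.find(b"\x00")
    return len(buf) if zero_at == -1 else zero_at

text = "Unicode is KEWL\n"
ucs2_le = text.encode("utf-16-le")  # stand-in for little-endian UCS-2

print("actual length:", len(ucs2_le))
print("length seen by NUL-terminated code:", c_strlen(ucs2_le))
print("zero bytes in the buffer:", ucs2_le.count(0))
```

Code that stops at the first zero byte would see only the single byte `55` of this string.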




UCS-2

           ... 55 f4 3a ff 6e ac 12 43 69 fa 3f 1a 63 ff ...




            ... 55 f4 3a ff 6e ac 12 43 69 fa 3f 1a 63 ff ...




This way?
           ... [55 f4] [3a ff] [6e ac] [12 43] [69 fa] [3f 1a] [63 ff] ...




                       ... or this way?
            ... 55] [f4 3a] [ff 6e] [ac 12] [43 69] [fa 3f] [1a 63] [ff ...
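The synchronisation problem can be made concrete: with a fixed two-byte unit, a reader that joins the stream one byte late pairs the bytes differently and sees entirely different values. This sketch uses the bytes from the slide and assumes little-endian units for illustration; `units_from` is a made-up helper name.

```python
import struct

stream = bytes.fromhex("55 f4 3a ff 6e ac 12 43 69 fa 3f 1a 63 ff")

def units_from(data: bytes, offset: int) -> list:
    """Read little-endian 16-bit units starting at the given byte offset."""
    usable = (len(data) - offset) // 2
    return list(struct.unpack_from(f"<{usable}H", data, offset))

aligned = units_from(stream, 0)  # reader started at the right byte
shifted = units_from(stream, 1)  # reader started one byte late

print([f"{u:04x}" for u in aligned])
print([f"{u:04x}" for u in shifted])
```

Nothing in the bytes themselves tells the reader which of the two groupings is the intended one.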




Little Endian

                      “U”   55 00



                      “U”   00 55

                              Big Endian
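Both byte orders are available in Python as separate codecs, which makes the difference easy to see. The sketch also shows what happens when the reader guesses the wrong order.

```python
little = "U".encode("utf-16-le")
big = "U".encode("utf-16-be")

print(little.hex(" "))  # 55 00
print(big.hex(" "))     # 00 55

# Decoding with the wrong byte order does not fail; it yields another character.
wrong = little.decode("utf-16-be")
print(f"misread as U+{ord(wrong):04X}")
```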
1) Storage space
2) “0” bytes
3) Synchronisation
4) Endianness

UTF-8

Variable length encoding


                      a              ж                龍

                      61             d0 b6        e9 be 8d
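The three byte sequences on this slide can be checked with Python's built-in UTF-8 codec. A minimal sketch:

```python
utf8_hex = {}
for ch in ("a", "ж", "龍"):
    encoded = ch.encode("utf-8")
    utf8_hex[ch] = encoded.hex(" ")
    print(f"U+{ord(ch):04X} {ch} -> {utf8_hex[ch]} ({len(encoded)} byte(s))")
```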




Unicode number
                           00 61


              a
                      0000 0000   0110 0001




                           0--- ----




                                              UTF-8 encoding

Unicode number
                           00 61


              a
                      0000 0000   0110 0001




                           0110 0001


                             61
                                              UTF-8 encoding

Unicode number
                           04 36


          ж
                      0000 0100   0011 0110




                      11-- ----   10-- ----




                                              UTF-8 encoding

Unicode number
                           04 36


          ж
                      0000 0100   0011 0110




                      1101 0000   1011 0110


                           d0 b6
                                              UTF-8 encoding

Unicode number
                           9f 8d


          龍
                      1001 1111   1000 1101




                      111- ----   10-- ----   10-- ----




                                                   UTF-8 encoding

Unicode number
                           9f 8d


          龍
                      1001 1111   1000 1101




                      1110 1001   1011 1110   1000 1101


                              e9 be 8d
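The bit-filling steps from the last few slides can be written out by hand. The function below is a sketch of that procedure, not a replacement for the built-in codec: the name `utf8_encode` is made up here, the four-byte branch goes beyond what the slides show, and validation (surrogates, values above U+10FFFF) is deliberately left out.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point by filling the UTF-8 bit templates."""
    if code_point < 0x80:        # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (not covered in the slides)
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

for ch in ("a", "ж", "龍"):
    print(ch, utf8_encode(ord(ch)).hex(" "))
```

For ж (U+0436) the eleven payload bits `100 0011 0110` split into `10000` and `110110`, which gives `110 10000` = d0 and `10 110110` = b6.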
                                                   UTF-8 encoding

Variable length encoding

              a       61         0110 0001

              ж       d0 b6      1101 0000   1011 0110

              龍       e9 be 8d   1110 1001   1011 1110   1000 1101
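The leading bits of the first byte tell a decoder how long the sequence is. A sketch of that lookup, with `sequence_length` as an illustrative name and the four-byte case added beyond what the slides show:

```python
def sequence_length(lead_byte: int) -> int:
    """Return the length of a UTF-8 sequence, judged by its first byte."""
    if (lead_byte & 0b1000_0000) == 0b0000_0000:   # 0xxxxxxx
        return 1
    if (lead_byte & 0b1110_0000) == 0b1100_0000:   # 110xxxxx
        return 2
    if (lead_byte & 0b1111_0000) == 0b1110_0000:   # 1110xxxx
        return 3
    if (lead_byte & 0b1111_1000) == 0b1111_0000:   # 11110xxx
        return 4
    raise ValueError("continuation byte or invalid lead byte")

# First bytes of a, ж and 龍 as shown above.
for lead in (0x61, 0xD0, 0xE9):
    print(f"{lead:02x} -> {sequence_length(lead)} byte(s)")
```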




1) Storage space
2) “0” bytes
3) Synchronisation
4) Endianness
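A short sketch of how UTF-8 fares against these four points. ASCII text stays at one byte per character, zero bytes appear only for U+0000 itself, continuation bytes are recognisable by their `10` prefix, and a one-byte unit has no byte order. The helper `next_boundary` is a hypothetical name used only here.

```python
text = "a ж 龍"
utf8 = text.encode("utf-8")

# 1) Storage space: ASCII characters still take a single byte each.
ascii_only_size = len("Unicode is KEWL\n".encode("utf-8"))

# 2) "0" bytes: none appear unless the text itself contains U+0000.
has_zero = 0 in utf8

# 3) Synchronisation: every continuation byte looks like 10xxxxxx, so a
#    reader dropped mid-stream can skip forward to the next lead byte.
def next_boundary(data: bytes, pos: int) -> int:
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos

start = next_boundary(utf8, 3)  # offset 3 is the second byte of "ж"
print(ascii_only_size, has_zero, start, repr(utf8[start:].decode("utf-8")))
```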

Thank you!
                      And please don’t forget the evaluation form :-)




CodeCamp Iasi, 10 March 2012 - UTF-8
