Guide to String Encodings


Mauro Pompilio


ENCODINGS
"abc".encode("UTF-32BE")

"\x00\x00\x00a\x00\x00\x00b\x00\x00\x00c"
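As a rough illustration of the slide above, the following sketch compares the byte layout of the same three characters in UTF-32BE and UTF-8. The variable names are illustrative only and are not from the deck.

```ruby
# Same text, two encodings: UTF-32BE spends 4 bytes per character,
# UTF-8 spends 1 byte for each of these ASCII characters.
s = "abc"
utf32 = s.encode("UTF-32BE")
utf8  = s.encode("UTF-8")

p utf32.bytes     # three zero bytes precede each letter's byte
p utf32.bytesize  # => 12
p utf8.bytesize   # => 3
p utf32.size      # => 3 (the character count does not change)
```

The character count stays the same in both encodings; only the number of bytes used to store each character differs.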

STRING EXAMPLES

e = "é"
e.encoding.name # => UTF-8
e.size # => 1
e.bytesize # => 2

FORCING ENCODINGS
x.encoding.name #=> ISO-8859-1
x.bytesize #=> 6
x.valid_encoding? #=> true
x.force_encoding("UTF-8")
x.encoding.name #=> UTF-8
x.bytesize #=> 6
x.valid_encoding? #=> false
x =~ /x/ #=> ArgumentError: invalid byte sequence in UTF-8
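The slide does not show how `x` was built. A minimal sketch, assuming `x` holds six accented ISO-8859-1 characters (hypothetical sample data), shows that `force_encoding` only changes the tag and leaves the bytes alone:

```ruby
# Six ISO-8859-1 characters, one byte each (assumed sample data).
x = [0xE9, 0xE8, 0xEA, 0xEB, 0xE0, 0xE2].pack("C*").force_encoding("ISO-8859-1")
bytes_before = x.bytes

p x.valid_encoding?        # => true
x.force_encoding("UTF-8")  # retags the string; no bytes are rewritten
p x.bytes == bytes_before  # => true
p x.bytesize               # => 6
p x.valid_encoding?        # => false, these bytes are not well-formed UTF-8

# Character-level operations on the broken string may raise, as the slide shows.
begin
  x =~ /x/
rescue ArgumentError => e
  puts "#{e.class}: #{e.message}"
end
```

Checking `valid_encoding?` after retagging is a cheap way to catch a wrong guess before it surfaces later as an exception.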

TRANSCODING
x.encoding.name #=> ISO-8859-1
x.bytesize #=> 6
x.valid_encoding? #=> true
x.encode!("UTF-8")
x.encoding.name #=> UTF-8
x.bytesize #=> 12
x.valid_encoding? #=> true
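For contrast with forcing, here is a sketch of transcoding under the same assumption (six accented ISO-8859-1 characters as hypothetical sample data). `encode!` rewrites the bytes so that the characters stay the same:

```ruby
# Six ISO-8859-1 characters, one byte each (assumed sample data).
x = [0xE9, 0xE8, 0xEA, 0xEB, 0xE0, 0xE2].pack("C*").force_encoding("ISO-8859-1")
p x.bytesize         # => 6

x.encode!("UTF-8")   # converts in place; each character now needs 2 bytes
p x.encoding.name    # => "UTF-8"
p x.bytesize         # => 12
p x.size             # => 6
p x.valid_encoding?  # => true
```

The byte count doubles because every character in this sample lies outside ASCII and takes two bytes in UTF-8.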

COMPATIBILITY
if Encoding.compatible?(ascii_string, utf8_string)
  new_string = ascii_string + utf8_string
  new_string.encoding.name #=> UTF-8
end
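The slide leaves `ascii_string` and `utf8_string` undefined. A sketch with assumed values, plus an assumed incompatible pair (`latin1_string`, not in the deck), might look like this:

```ruby
ascii_string  = "abc".encode("US-ASCII")
utf8_string   = "\u00E9t\u00E9"                   # "été" in UTF-8
latin1_string = utf8_string.encode("ISO-8859-1")  # same text, different bytes

# ASCII-only content is compatible with any ASCII-compatible encoding.
p Encoding.compatible?(ascii_string, utf8_string)   # => #<Encoding:UTF-8>
combined = ascii_string + utf8_string
p combined.encoding.name                            # => "UTF-8"

# Two strings with non-ASCII content in different encodings are not compatible.
p Encoding.compatible?(latin1_string, utf8_string)  # => nil
begin
  latin1_string + utf8_string
rescue Encoding::CompatibilityError => e
  puts "#{e.class}: #{e.message}"
end
```

`Encoding.compatible?` returns the encoding the result would have, or `nil`, so it can guard a concatenation without rescuing an exception.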

ITERATION
x.bytes
x.each_byte {|b| puts b }
x.codepoints
x.each_codepoint {|c| puts c }
x.chars
x.each_char {|c| puts c }
x.lines
x.each_line {|l| puts l }
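To make the difference between these views concrete, here is a small sketch on an assumed sample string. It assumes Ruby 2.0 or later, where `bytes`, `codepoints`, `chars`, and `lines` return arrays (in 1.9 they returned enumerators).

```ruby
x = "h\u00E9\n\u65E5"  # "h", "é", newline, "日" (assumed sample text)

p x.bytes       # => [104, 195, 169, 10, 230, 151, 165]
p x.codepoints  # => [104, 233, 10, 26085]
p x.chars.size  # => 4
p x.lines.size  # => 2

# The each_* forms yield one element at a time without building an array.
x.each_char { |c| puts "#{c.inspect} takes #{c.bytesize} byte(s)" }
```

Seven bytes, four codepoints, four characters, two lines: which iterator is right depends on the unit the code actually cares about.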

INTERNAL ENCODING
> ruby -E :UTF-8

Encoding.default_internal = 'UTF-8'

EXTERNAL ENCODING
> ruby -E UTF-8:

Encoding.default_external = 'UTF-8'

File.open("file.txt", "w:UTF-8")
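The mode string can also name both an external and an internal encoding, so that Ruby transcodes while reading. The sketch below assumes a file holding ISO-8859-1 bytes; the temporary file and its contents are illustrative and not from the deck.

```ruby
require "tempfile"

text = nil
Tempfile.create("encoding-demo") do |f|
  f.binmode
  f.write([0x63, 0x61, 0x66, 0xE9].pack("C*"))  # "café" as ISO-8859-1 bytes
  f.flush

  # "r:EXTERNAL:INTERNAL" reads ISO-8859-1 from disk and hands back UTF-8.
  text = File.open(f.path, "r:ISO-8859-1:UTF-8") { |io| io.read }
end

p text.encoding.name  # => "UTF-8"
p text.bytesize       # => 5 (4 bytes on disk; "é" takes 2 bytes in UTF-8)
```

Declaring the encoding at the IO boundary keeps the rest of the program working with a single internal encoding.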

WHAT DO?

USE MAGIC COMMENTS

DECLARE IO ENCODINGS

CONVERT BEFORE COMPARISONS

UNICODE-UTILS GEM
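The "convert before comparisons" advice can be sketched as follows, using an assumed sample word stored in two encodings:

```ruby
utf8   = "caf\u00E9"                 # "café" in UTF-8
latin1 = utf8.encode("ISO-8859-1")   # the same word as ISO-8859-1 bytes

p utf8 == latin1                     # => false, bytes and encodings differ
p utf8 == latin1.encode("UTF-8")     # => true once both sides are UTF-8
```

Both strings display as the same text, which is why this kind of mismatch tends to go unnoticed until a lookup or equality check fails.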

Thanks!
http://blog.grayproductions.net/articles/understanding_m17n


Guide to String Encodings

1. STRING ENCODINGS

2. I’m still @malditogeek

3. We’re hiring!

4. STRING ENCODINGS

7. FIRST THERE WAS... ASCII

9. BUT WHAT ABOUT THE 8TH BIT?

10. UH OH!

11. BUT MY LANGUAGE HAS 2000 CHARACTERS

12. WELCOME TO MULTIBYTE ENCODINGS

13. UNICODE

14. CHARACTER SETS vs ENCODINGS

15. CHARACTER SETS U+0061 = 97 = a

16. ENCODINGS

17. UTF-8

18. RUBY 1.8 STRINGS ARE JUST BYTE ARRAYS!

19. $KCODE='u'

20. RUBY 1.9

21. M17N

22. STRING EXAMPLES

23. FORCING ENCODINGS

24. TRANSCODING

25. COMPATIBILITY

26. ITERATION

27. SOURCE ENCODING # encoding: UTF-8

28. INTERNAL ENCODING

29. EXTERNAL ENCODING

30. WHAT DO?

31. C’EST FINI

32. We’re hiring!

33. Thanks!

Editor's notes

Slide 8: 128 characters: English uppercase and lowercase letters, symbols, and some control characters.
Slide 9: 128 characters = 7 bits. 1 byte = 8 bits. That leaves a whole other 128 values to play with.
Slide 10: Everyone wants a different 128 characters, so encodings were born. Latin-1 is one of the most popular; it uses the other 128 values mostly for accented letters.
Slide 11: It would be simple if we all spoke English! But there are also Japanese, Chinese, etc.
Slide 12: The only way to deal with this is to build characters out of more than 1 byte. This is a big issue for programming, as we now have to be very careful not to split a string in the middle of a character.
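A small sketch of that hazard, using an assumed two-character Japanese string: slicing by bytes can cut a character in half, while slicing by characters cannot.

```ruby
s = "\u65E5\u672C"           # "日本": 2 characters, 3 bytes each in UTF-8
p s.size                     # => 2
p s.bytesize                 # => 6

broken = s.byteslice(0, 4)   # first character plus 1 stray byte of the second
p broken.valid_encoding?     # => false

whole = s[0, 1]              # character-based slicing respects boundaries
p whole.valid_encoding?      # => true
p whole.bytesize             # => 3
```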
Slide 13: Ideally we would only ever have one encoding to deal with. This isn't the case, but Unicode is as close as it gets.
Slide 14: They're different.
Slide 15: Character sets are just mappings from numbers to characters: hex - int - char.
Slide 16: Here we are writing a lot of null characters.
Slide 17: UTF-8 is a Unicode encoding that is 100% compatible with US-ASCII. It uses variable-length characters to maintain that compatibility. It is the best current solution for compatibility, size, and support.
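Both properties can be checked directly. This sketch uses a few assumed sample characters to show the variable length and the byte-for-byte agreement with US-ASCII:

```ruby
# One sample character for each UTF-8 length: "a", "é", "€", and an emoji.
samples = ["a", "\u00E9", "\u20AC", "\u{1F600}"]
lengths = samples.map(&:bytesize)
p lengths      # => [1, 2, 3, 4]

# ASCII text has identical bytes in US-ASCII and UTF-8.
ascii_bytes = "abc".encode("US-ASCII").bytes
utf8_bytes  = "abc".encode("UTF-8").bytes
p ascii_bytes == utf8_bytes  # => true
```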
Slide 18: As we've seen, multibyte character sets are common and useful. Ruby 1.8 doesn't care about encodings; it just displays characters based on a mapping. reverse(), split(), and size() all break multibyte strings very easily. But Ruby doesn't check, so you'll never really know!
Slide 19: Ruby 1.8 supports UTF-8, right? Kind of. Unicode support extends to understanding character boundaries, so reverse(), split(), and match() magically all work. Unfortunately, it's very basic support, so transforms like upcase() will bite you in the arse. Also, $KCODE is global, so there is no way to deal with more than one encoding at a time.
Slide 20: Ruby 1.9 brings full string encoding support for over 80 encodings. String objects now reference both raw bytes and an Encoding object.
Slide 22: A string is aware of its encoding, character length, and byte length.
Slide 23: force_encoding retags a string with a different encoding; the actual bytes are not modified. If the bytes are not valid in the encoding the string is tagged with, manipulating the string's characters can raise an ArgumentError.
Slide 24: Calling encode() actually converts the underlying bytes (encode! does so in place).
Slide 26: String#each is gone in 1.9, replaced with explicit each_* methods and Enumerator functions.
Slide 30: http://unicode-utils.rubyforge.org/ for locale management.
