2. What Is Data Serialization?
● The process of translating a data structure or
object state into a format that can be stored
in a memory buffer or file, or transmitted over a
network.
● The end goal is that it can be reconstructed in
another computer environment.
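As a minimal sketch of the idea, assuming Python and its standard json module (the account record and its field names are made up for illustration), an object can be turned into bytes and rebuilt from them:

```python
import json

# Hypothetical in-memory object; the field names are illustrative only.
account = {"owner": "alice", "deposit_money": 12345678}

# Serialize: data structure -> bytes that can be stored or sent.
payload = json.dumps(account).encode("utf-8")

# Deserialize: bytes -> an equivalent data structure, possibly on
# another machine or in another language.
restored = json.loads(payload.decode("utf-8"))

print(restored == account)
```

Any of the formats discussed later could stand in for JSON here; only the encode and decode steps would change.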
3. Why We Do This
● Persist objects [store and later retrieve them]
● Perform Remote Procedure Calls
● Create distributed objects [CORBA, Java RMI,
ICE]
4. Key Words
● Computer Environment
- Programming Languages
- Operating Systems
- Architectures and processors
● Platform Independent Solutions
5. Popular Platform-Independent Solutions
● JSON and XML
● BSON and Binary XML
● Google Protocol Buffers, Thrift, Avro
Ref
http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
6. JSON AND XML
● Most popular
● Human-readable, to some extent
● Most web-based APIs use them by default
● Lots of generators available for both
7. How It Works
● You write an IDL [Interface Description
Language] file. Kind of like CORBA IDLs, but
much cleaner and more flexible.
● Pass it through a C++-based code generator
● Get your boilerplate code in the language
you specified
8. GOOGLE PROTOCOL BUFFERS
● A platform-independent and language-independent
data serialization solution, similar to XML in
structure but much smaller in size and easier
to structure.
● Used inside Google since 2001, open-sourced in 2008
9. JSON VS BINARY FORMATS
● JSON is darn easy to read. If you can read binary, you definitely
need to see a doctor.
● JSON gets fat even on little data; binary is really compact
{"deposit_money": "12345678"}
JSON                       BINARY
'0x6d', '0x6f', '0x6e',    '0x01', '0xBC614E'
'0x65', '0x79', '0x31',
'0x32', '0x33', '0x34',
'0x35', '0x36', '0x37',
'0x38'
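To put rough numbers on the deposit example above, here is a hand-written sketch of the varint scheme that Protocol Buffers uses for integers. It does not use the protobuf library, and field number 1 is an assumption made for illustration, so the bytes differ slightly from the simplified ones shown on the slide:

```python
import json

def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer 7 bits at a time, low bits first."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)

def encode_int_field(field_number: int, value: int) -> bytes:
    """Tag (field number plus wire type 0) followed by the varint value."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)

json_bytes = json.dumps({"deposit_money": "12345678"}).encode("utf-8")
binary_bytes = encode_int_field(1, 12345678)  # field number 1 is assumed

print(len(json_bytes), "bytes as JSON")
print(len(binary_bytes), "bytes as binary:", binary_bytes.hex())
```

The saving comes from two places: the field name is replaced by a small number, and the value is stored as an integer rather than as a string of digits.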
10. SPEED AT PARSING
● JSON is fairly fast to parse, but binary is close to
machine speed since it needs no text parsing.
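The difference in parsing work can be sketched with Python's standard struct and json modules. The fixed layout below (one byte for a field id, then a 4-byte big-endian integer) is an assumption for illustration, not the Protocol Buffers wire format, and no timing is claimed:

```python
import json
import struct

# Assumed fixed layout for illustration: one unsigned byte for the
# field id, then a 4-byte big-endian unsigned integer.
LAYOUT = struct.Struct(">BI")

binary_msg = LAYOUT.pack(1, 12345678)
json_msg = '{"deposit_money": "12345678"}'

# Binary: one unpack call at known offsets yields a native int.
field_id, amount_from_binary = LAYOUT.unpack(binary_msg)

# JSON: scan the text, build a dict, then convert the string to int.
amount_from_json = int(json.loads(json_msg)["deposit_money"])

print(amount_from_binary == amount_from_json)
```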
11. FLOW
Schema / IDL
  ↓
C++ Code Generator
  ↓
C++ | Java | Python | JavaScript
  ↓
Server / Client application bases
13. How to Generate the Code
● Use the Protocol Buffers compiler (protoc), specifying the
output language you want and the file.proto
● protoc -I=/DIR_TO_SCHEMA/
--LANGUAGE_out=FOLDER_TO_BUFFER/
/DIR_TO_SCHEMA/file.proto
(LANGUAGE is, for example, cpp, java or python)
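For scripted builds, the command can be assembled in Python. This is only a sketch: the helper name protoc_command and the directory and file names (schemas, generated, chat.proto) are hypothetical, and protoc must be installed separately for the command to actually run.

```python
# protoc names its output flags --<language>_out, for example
# --python_out, --cpp_out or --java_out.
def protoc_command(schema_dir, out_dir, proto_file, language="python"):
    """Build the argument list for one protoc invocation."""
    return [
        "protoc",
        f"-I={schema_dir}",
        f"--{language}_out={out_dir}",
        f"{schema_dir}/{proto_file}",
    ]

# Hypothetical paths, for illustration only.
cmd = protoc_command("schemas", "generated", "chat.proto")
print(" ".join(cmd))

# To actually run it (requires protoc on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```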
17. Runtime Performance
           Server CPU AVG   Client CPU AVG   Time
Protobuf   30.0%            37.75%           01:19:48
JSON       20.0%            75.00%           04:44:83
XML        12.00%           80.75%           05:27:45
18. Versioning
● This is about backward compatibility
between old and new Protocol Buffers schemas
● Old server with new client, and vice versa
Even if a field has changed, the data will still be
parsed
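The idea behind this can be sketched with a toy tag-length-value format. This is not the real Protocol Buffers wire format, and the schemas and field names are made up; it only illustrates that when fields are identified by number, a reader can skip numbers it does not know:

```python
def encode(fields: dict) -> bytes:
    """fields maps field number -> bytes value (each under 256 bytes)."""
    out = bytearray()
    for number, value in fields.items():
        out += bytes([number, len(value)]) + value
    return bytes(out)

def decode(data: bytes, known: dict) -> dict:
    """known maps field number -> name; unknown numbers are skipped."""
    result, pos = {}, 0
    while pos < len(data):
        number, length = data[pos], data[pos + 1]
        value = data[pos + 2 : pos + 2 + length]
        pos += 2 + length
        if number in known:
            result[known[number]] = value.decode("utf-8")
    return result

# Hypothetical schemas: the new version adds field 3.
OLD_SCHEMA = {1: "user", 2: "text"}
NEW_SCHEMA = {1: "user", 2: "text", 3: "room"}

new_msg = encode({1: b"ann", 2: b"hi", 3: b"lobby"})
old_msg = encode({1: b"bob", 2: b"hello"})

print(decode(new_msg, OLD_SCHEMA))  # old reader skips field 3
print(decode(old_msg, NEW_SCHEMA))  # new reader simply has no "room"
```

Both directions parse without error, which is the property the slide describes for an old server with a new client and vice versa.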
20. Reasons to Use Protocol Buffers
● They are smaller to push around over
networks
● Easier [if not easiest] to structure
● Give a sense of object-oriented structuring
21. Reasons Not to Use Them
● Well, you will have to maintain both the server
and the clients.
● They may, in most cases, not be easy to learn
● They are not an industry standard.
● I am just trying to be fair here :)
22. SIMPLE DEMO CHAT APPS
● Simple chat application working on both
desktops and laptops, and also across different
operating systems
● Partial inspiration from The Fifth Estate
23. THE END
Links to check out
● Google Protocol Buffers Main Page
https://developers.google.com/protocol-buffers/
● Apache Thrift
https://thrift.apache.org/