SlideShare a Scribd company logo
1 of 29
Download to read offline
PARQUET & AVRO
http://airisdata.com/
Presenter Introduction
•Tim	Spann,	Senior	Solutions	Architect,	airis.DATA
• ex-Pivotal	Senior	Field	Engineer
• DZONE	MVB	and	Zone	Leader
• ex-Startup	Senior	Engineer	/	Team	Lead
• http://www.slideshare.net/bunkertor
• http://sparkdeveloper.com/
• http://www.twitter.com/PaasDev
•Srinivas	Daruna	
Data	Engineer,	airis.DATA
Spark	Certified	Developer
Presenter Introduction
airis.DATA
airis.DATA is	a	next	generation	system	integrator	that	specializes	in	rapidly	deployable	
machine	learning	and	graph	solutions.	
Our	core	competencies	involve	providing	modular,	scalable	Big	Data	products	that	can	be	
tailored	to	fit	use	cases	across	industry	verticals.	
We	offer	predictive	modeling	and	machine	learning	solutions	at	Petabyte	scale	utilizing	
the	most	advanced,	best-in-class	technologies	and	frameworks	including	Spark,	H20,	
Mahout,	and	Flink.	
Our	data	pipelining	solutions	can	be	deployed	in	batch,	real-time	or	near-real-time	
settings	to	fit	your	specific	business	use-case.	
airis.DATA
qAvro	and	Parquet	- When	and	Why	to	use	which	format?
qUse	cases	for	Schema	Evolution	&	practical	examples
qData	modeling	- Avro	and	Parquet	schema
qWorkshop
- Read	Avro	input	from	Kafka
- Transform	data	in	Spark
- Write	data	frame	to	Parquet
- Read	back	from	Parquet
qOur	experiences	with	Avro	and	Parquet
qSome	helpful	insights	for	projects	design
Agenda
AVRO - Introduction
Ø Doug	Cutting	created	Avro,	a	data	serialization	and	RPC		library,	to	help	improve	data	
interchange,	interoperability,	and	versioning	in	Hadoop	Eco	System.	
ØSerialization	&	RPC	Library	and	also	storage	format.
ØWhat	led	to	a	new	serialization	mechanism.?
ØThrift	and	PB	are	not	splittable and	codegen required	for	both	of	them.	Dynamic	reading	is	
not	possible
ØSequence	files	does	not	have	schema	evolution
ØEvolved	as	in-house	serialization	and	RPC	library	for	hadoop.	Good	overall	performance	that	
can	match	up	to	Protocol	Buffers	in	some	aspects.
Ø Dynamic	Access – No	need	of	Code	generation	for	accessing	the	data.
Ø UnTagged Data – Which	allows	better	compression
Ø Platform	in-dependent – Has	libraries	in	Java,	Scala,	Python,	Ruby,	C	and	C#.
Compressible	and	Splittable – Complements	the	parllel processing	systems	such	as	MR	and	
Spark.
Ø Schema	Evolution: “Data	models	evolve	over	time”,	and	it’s	important	that	your
data	formats	support	 your	need	to	modify	your	data	models.	Schema	evolution
allows	you	to	add,	modify,	and	in	some	cases	delete	attributes,	while	at	the	same
time	providing	 backward	and	forward	compatibility	for	readers	and	writers
Some Important features of AVRO
• Row	Based
• Direct	mapping	from/to	JSON
• Interoperability:	can	serialize	into	Avro/Binary	or	Avro/Json
• Provides	rich	data	structures
• Map	keys	can	only	be	strings	(could	be	seen	as	a	limitation)
• Compact	binary	form
• Extensible	schema	language
• Untagged	data
• Bindings	for	a	wide	variety	of	programming	languages
• Dynamic	typing
• Provides	a	remote	procedure	call
• Supports	block	compression
• Avro	files	are	splittable
• Best	compatibility	for	evolving	data	schemas
Summary of AVRO Properties
AVRO Schema Types
Primitive Types
null: no value
boolean: a binary value
int: 32-bit signed integer
long: 64-bit signed integer
float: single precision (32-bit) IEEE 754
floating-point number
double: double precision (64-bit) IEEE 754
floating-point number
bytes: sequence of 8-bit unsigned bytes
string: unicode character sequence
Primitive types have no specified attributes.
Primitive type names are also defined type
names. Thus, for example, the schema "string"
is equivalent to: {"type":	"string"}
Complex Types
Records
Enums
Arrays
Maps
Unions
Fixed
https://avro.apache.org/docs/current/spec.html
Avro Schema
Understanding	Avro	schema	is	very	important	for	Avro	
Data.	
Ø JSON	Format	is	used	to	define	schema
Ø Simpler	than	IDL(Interface	Definition	Language)	of	
Protocol	Buffers	and	thrift
Ø very	useful	in	RPC.	Schemas	will	be	exchanged	to	
ensure	the	data	correctness
Ø You	can	specify	order	(Ascending	or	Descending)	for	
fields.			
Sample	Schema:
{
"type":	"record",
"name":	"Meetup",
"fields":	[	{
"name":	"name",
"type":	"string”,
"order"	:	"descending"
}, {
"name":	"value",
"type":	["null",	"string”]
}
…..
]
}
Union
File Structure - Avro
Workshop - Code Examples
Ø Java	API	to	create	Avro	file		- API	Support
Ø Hive	Query	to	create	External	table	with	Avro	Storage	Format	– Schema	Evolution
Ø Accessing	avro file	generated	from	Java	in	Python	– Language	Independence
Ø Spark-Avro	data	access
Few interesting things…
• Avro	Cli – Avro	Tools	jar	that	can	provide	some	command	line	help
• Thrift	and	Protocol	Buffers
• Kryo
• Jackson-avro-databind java	API
• Project	Kiji (	Schema	management	in	Hbase)
Please	drop	mail	for	support	if	you	have	any	issues	or	if	you	have	suggestions	on	Avro
PARQUET - Introduction
• Columnar	storage	format	that	come	out	of	a	collaboration	between	
Twitter	and	Cloudera	based	on	Dremel
• What	is	a	storage	format?
• Well-suited	to	OLAP	workloads
• High	level	of	integration	with	Hadoop	and	the	ecosystem	(Hive,	
Impala	and	Spark)
• Interoperable	Avro,	Thrift	and	Protocol	Buffers
PARQUET cont..
• Allows	compression.	Currently	supports	Snappy	and	Gzip.
• Well	supported	over	Hadoop eco	system.	
• Very	well	integrated	with	Spark	SQL	and	DataFrames.
• Predicate	pushdown:	Projection	and	predicate	pushdowns	involve	an	
execution	engine	pushing	the	projection	and	predicates	down	to	the	
storage	format	to	optimize	the	operations	at	the	lowest	level	possible.
• I/O	to	a	minimum	by	reading	from	a	disk	only	the	data	required	for	
the	query.
• Schema	Evolution	to	some	extent.	Allows	adding	new	columns	at	the	
end.
• Language	independent.	Supports	Scala,	Java,	C++,	Python.
File Structures - Parquet
message Meetup {
required binary name (UTF8);
required binary meetup_date (UTF8);
required int32 going;
required binary organizer (UTF8);
required group topics (LIST) {
repeated binary array (UTF8);
}
}
Sample Schema
Ø Be	careful	with	Parquet	Data	types	
Ø Does	not	have	a	good	stand	alone	API	as	Avro,	have	converters	for	Avro,	PB	and	Thrift	instead
Ø Flattens	all	nested	data	types	in	order	to	save	them	as	columnar	structures
Nested Schema resolution
Parquet few important notes..
• Parquet	requires	a	lot	of	memory	when	writing	files	because	it	
buffers	writes	in	memory	to	optimize	the	encoding	and	
compressing	of	the	data
• Using	a	heavily	nested	data	structure	with	Parquet	will	likely	limit	
some	of	the	optimizations	that	Parquet	makes	for	pushdowns.	If	
possible,	try	to	flatten	your	schema
Code examples
• Java	API
• Spark	Example
• Kafka	Exmple
How to decide on storage format
• What	kind	of	data	you	have?
• What	is	the	processing	framework?	Future	and	Current
• Data	processing	and	querying
• Do	you	have	RPC/IPC
• How	much	schema	evolution	do	you	have?
Our experiences with Parquet and Avro
Name	space	re-definitions
<item>
<chapters>
<content>
<name>content1</name>
<pages>100</pages>
</content>
</chapters>
</item>
<otheritem>
<chapters>
<othercontent>
<randomname>xyz</randomname>
<someothername>abcd</someothername>
</othercontent>
</chapters>
</otheritem>
Is Agenda Accomplished..??
üAvro	and	Parquet	- When	and	Why	to	use	which	format?
üUse	cases	for	Schema	Evolution	&	practical	examples
üData	modeling	- Avro	and	Parquet	schema
üWorkshop
- Read	Avro	input	from	Kafka
- Transform	data	in	Spark
- Write	data	frame	to	Parquet
- Read	back	from	Parquet
üOur	experiences	with	Avro	and	Parquet
üSome	helpful	insights	for	projects	design
Questions… ????????
Notes
• https://dzone.com/articles/where-should-i-store-hadoop-data
• https://developer.ibm.com/hadoop/blog/2016/01/14/5-reasons-to-choose-parquet-for-spark-sql/
• http://www.slideshare.net/StampedeCon/choosing-an-hdfs-data-storage-format-avro-vs-parquet-and-more-stampedecon-2015
• http://parquet.apache.org/
• https://github.com/cloudera/parquet-examples
• http://avro.apache.org/docs/current/spec.html#schema_primitive
• http://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
• https://cwiki.apache.org/confluence/display/AVRO/FAQ
• http://avro.apache.org/
• https://github.com/miguno/avro-cli-examples
• http://avro.apache.org/docs/current/spec.html#schema_primitive
• https://dzone.com/articles/getting-started-apache-avro
• https://github.com/databricks/spark-avro
Notes
• https://github.com/twitter/bijection
• https://github.com/mkuthan/example-spark
• http://blog.cloudera.com/blog/2015/09/making-apache-spark-testing-easy-with-spark-testing-base/
• https://github.com/databricks/spark-avro/blob/master/README.md
• http://www.bigdatatidbits.cc/2015/01/how-to-load-some-avro-data-into-spark.html
• http://engineering.intenthq.com/2015/08/pucket/
• https://dzone.com/articles/understanding-how-parquet
• http://blog.cloudera.com/blog/2015/03/converting-apache-avro-data-to-parquet-format-in-apache-hadoop/
Our	Solutions	Team
o Prasad	Sripathi,	CEO,	Experienced	Big	Data	Architect,	Head	of	NJ	Data	Science	and	Hadoop	Meetups
o Sergey	Fogelson,	PhD,	Director,	Data	Science
o Eric	Marshall, Senior	Systems	Architect,	Hortonworks	Hadoop	Certified	Administrator
o Kristina	Rogale Plazonic,	Spark	Certified	Data	Engineer
o Ravi	Kora,	Spark	Certified	Senior	Data	Scientist
o Srinivasarao Daruna, Spark	Certified	Data	Engineer
o Srujana Kuntumalla,Spark	Certified	Data	Engineer
o Tim	Spann,Senior	Solutions	Architect,	ex-Pivotal
o Rajiv	Singla,	Data	Engineer
o Suresh	Kempula,	Data	Engineer
Technology Stack

More Related Content

What's hot

Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Julien Le Dem
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioAlluxio, Inc.
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebookragho
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsSpark Summit
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache KuduJeff Holoman
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introductioncolorant
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudDatabricks
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetOwen O'Malley
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 

What's hot (20)

Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with Alluxio
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebook
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 

Viewers also liked

Data Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache FlumeData Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache FlumeArvind Prabhakar
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Adam Muise
 
Implementing and running a secure datalake from the trenches
Implementing and running a secure datalake from the trenches Implementing and running a secure datalake from the trenches
Implementing and running a secure datalake from the trenches DataWorks Summit
 
Introduction to streaming and messaging flume,kafka,SQS,kinesis
Introduction to streaming and messaging  flume,kafka,SQS,kinesis Introduction to streaming and messaging  flume,kafka,SQS,kinesis
Introduction to streaming and messaging flume,kafka,SQS,kinesis Omid Vahdaty
 
ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016Jayesh Thakrar
 
大型电商的数据服务的要点和难点
大型电商的数据服务的要点和难点 大型电商的数据服务的要点和难点
大型电商的数据服务的要点和难点 Chao Zhu
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascienceAdam Muise
 

Viewers also liked (9)

Data Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache FlumeData Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache Flume
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015
 
Implementing and running a secure datalake from the trenches
Implementing and running a secure datalake from the trenches Implementing and running a secure datalake from the trenches
Implementing and running a secure datalake from the trenches
 
Introduction to streaming and messaging flume,kafka,SQS,kinesis
Introduction to streaming and messaging  flume,kafka,SQS,kinesis Introduction to streaming and messaging  flume,kafka,SQS,kinesis
Introduction to streaming and messaging flume,kafka,SQS,kinesis
 
ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016
 
大型电商的数据服务的要点和难点
大型电商的数据服务的要点和难点 大型电商的数据服务的要点和难点
大型电商的数据服务的要点和难点
 
Parquet overview
Parquet overviewParquet overview
Parquet overview
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascience
 
Flume vs. kafka
Flume vs. kafkaFlume vs. kafka
Flume vs. kafka
 

Similar to Parquet and AVRO

Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark OverviewairisData
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empowerDurga Gadiraju
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaEdureka!
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkSlim Baltagi
 
Oracle Cloud - Infrastruktura jako kód
Oracle Cloud - Infrastruktura jako kódOracle Cloud - Infrastruktura jako kód
Oracle Cloud - Infrastruktura jako kódMarketingArrowECS_CZ
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Cloudera, Inc.
 
Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Cloudera, Inc.
 
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31Timothy Spann
 
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data CitizenEnabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data CitizenWes McKinney
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Michael Rys
 
Bringing Deep Learning into production
Bringing Deep Learning into production Bringing Deep Learning into production
Bringing Deep Learning into production Paolo Platter
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Michael Rys
 
HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25Cask Data
 
Apache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open SourceApache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open SourceTimothy Spann
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsDr. Mirko Kämpf
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsDr. Mirko Kämpf
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringDurga Gadiraju
 
Data Engineering Course Syllabus - WeCloudData
Data Engineering Course Syllabus - WeCloudDataData Engineering Course Syllabus - WeCloudData
Data Engineering Course Syllabus - WeCloudDataWeCloudData
 
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsConf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsTimothy Spann
 

Similar to Parquet and AVRO (20)

Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empower
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
 
Oracle Cloud - Infrastruktura jako kód
Oracle Cloud - Infrastruktura jako kódOracle Cloud - Infrastruktura jako kód
Oracle Cloud - Infrastruktura jako kód
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)
 
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
 
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data CitizenEnabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
 
Bringing Deep Learning into production
Bringing Deep Learning into production Bringing Deep Learning into production
Bringing Deep Learning into production
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25
 
Apache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open SourceApache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open Source
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific Applications
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Data Engineering Course Syllabus - WeCloudData
Data Engineering Course Syllabus - WeCloudDataData Engineering Course Syllabus - WeCloudData
Data Engineering Course Syllabus - WeCloudData
 
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsConf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python Processors
 

Recently uploaded

Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLManishPatel169454
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringmulugeta48
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 

Recently uploaded (20)

Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 

Parquet and AVRO