Hadoop is an open source platform for storing and processing large amounts of unstructured data across clusters of commodity hardware. It provides flexibility in storing various data types without schemas, and scales out workload by distributing data and processing across nodes. Hadoop is also fault tolerant, continuing operations even if nodes fail, and moves computation to where the data resides for efficiency. Key components include Hadoop Common, HDFS for storage, and MapReduce for distributed processing.
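The MapReduce flow mentioned above can be sketched in a few lines. This is a minimal, single-process illustration of the map → shuffle → reduce model (real Hadoop distributes these phases across nodes); the function names here are illustrative, not the Hadoop API.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hadoop stores data", "hadoop processes data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In real Hadoop, many map tasks run in parallel near their input blocks, and the shuffle moves intermediate pairs across the network to the reducers.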
13. Hadoop is an Open Source (Java-based), scalable, fault-tolerant platform for storing and processing large amounts of unstructured data, distributed across machines.
14. Flexibility: a single repository for storing and analyzing any kind of data, not bounded by a schema.
Scalability: scale-out architecture divides the workload across multiple nodes using a flexible distributed file system.
Low Cost: deployed on commodity hardware and an open source platform.
Fault Tolerant: continues working even if node(s) go down.
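The fault-tolerance point above comes from replication: HDFS keeps several copies of each block on different nodes (the default replication factor is 3), so losing a node loses no data. A toy sketch of that idea, with made-up node and block names:

```python
# Illustrative HDFS-style block placement; not actual HDFS code.
REPLICATION = 3

def place_blocks(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

def reachable_blocks(placement, live_nodes):
    """Blocks still readable when only `live_nodes` are up."""
    return {b for b, replicas in placement.items()
            if any(n in live_nodes for n in replicas)}

nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(["blk_0", "blk_1", "blk_2"], nodes)

# Take node1 down: every block still has at least one live replica.
survivors = reachable_blocks(placement, {"node2", "node3", "node4"})
print(sorted(survivors))  # ['blk_0', 'blk_1', 'blk_2']
```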
15. A system that moves computation to where the data is.
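Moving computation to the data amounts to locality-aware scheduling: run each task on a node that already holds its input block, so the large data stays put and only the small program travels. A rough sketch of that preference (illustrative only, not the actual YARN scheduler):

```python
def schedule(task_block, block_locations, free_nodes):
    """Return (node, is_local) for a task that reads `task_block`.

    Prefer a free node that already stores the block; otherwise fall
    back to any free node, which would force a network read.
    """
    local = [n for n in block_locations.get(task_block, []) if n in free_nodes]
    if local:
        return local[0], True            # computation moved to the data
    return sorted(free_nodes)[0], False  # remote read as a last resort

# Hypothetical cluster state: blk_7 is replicated on node2 and node3.
block_locations = {"blk_7": ["node2", "node3"]}
node, is_local = schedule("blk_7", block_locations, {"node1", "node2"})
print(node, is_local)  # node2 True
```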
23. Cloudera Impala / Hortonworks Tez
Impala uses C++-based in-memory processing of HDFS data through SQL-like statements to expedite data processing.
Use cases include user collaborative filtering, user recommendations, clustering, and classification.
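The "SQL-like statements" Impala accepts are ordinary analytic queries over tables backed by HDFS files. Impala itself is a C++ daemon, so as a stand-in this sketch runs the same kind of aggregation through SQLite purely to show the query shape; the table and column names are made up for the example.

```python
import sqlite3

# SQLite stands in for Impala here only to demonstrate the SQL; the
# schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id INTEGER, url TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)",
                 [(1, "/home"), (1, "/docs"), (2, "/home")])

# An aggregation of the sort you would submit to an Impala shell:
rows = conn.execute(
    "SELECT url, COUNT(*) AS views FROM page_views "
    "GROUP BY url ORDER BY views DESC"
).fetchall()
print(rows)  # [('/home', 2), ('/docs', 1)]
```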
Editor's Notes
Slides are for reference only; we can understand and learn more through live examples and live discussion. Let's see who is who in the room: how many coders? Program managers? Any Hadoop stories? How about: where is Hadoop's headquarters?