Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Collins & Charles Zedlewski, Cloudera
2. In the beginning
CORE HADOOP COMPONENTS
• Hadoop Distributed File System (HDFS)
– File Sharing & Data Protection Across Physical Servers
• Hadoop MapReduce
– Distributed Computing Across Physical Servers
Hadoop was a platform for data storage and processing that is…
• Scalable
• Fault tolerant
• Open source
Flexibility
• A single repository for storing, processing & analyzing any type of data
• Not bound by a single schema
Scalability
• Scale-out architecture divides workloads across multiple nodes
• Flexible file system eliminates ETL bottlenecks
Low Cost
• Can be deployed on commodity hardware
• Open source platform guards against vendor lock-in
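To make "distributed computing across physical servers" concrete, here is a minimal single-process sketch of the MapReduce programming model in Python. It is not Hadoop's API: the function names (`map_words`, `reduce_counts`, `run_job`) and the sample input are assumptions made for illustration, and each "split" merely stands in for a block of input that a separate node could process.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce step: combine all values that were emitted for one key."""
    return key, sum(values)


def run_job(splits, mapper, reducer):
    """Run map, shuffle and reduce in one process (illustration only)."""
    groups = defaultdict(list)
    # Map phase: in a real cluster each split could run on a different node.
    for split in splits:
        for line in split:
            for key, value in mapper(line):
                # Shuffle: values are grouped by key before reducing.
                groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


# Hypothetical input, already divided into two splits.
splits = [
    ["hadoop stores data", "hadoop processes data"],
    ["data flows downhill"],
]
counts = run_job(splits, map_words, reduce_counts)
```

Because the map step only looks at one line and the reduce step only looks at one key, the work can be divided across nodes without changing either function, which is the scale-out property the slide describes.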
2 ©2011 Cloudera, Inc. All Rights Reserved. Confidential.
Reproduction or redistribution without written permission is
prohibited.
3. A good start
Apache Hadoop (stack diagram):
• Shell / CLI
• Data Processing, Resource Management
• File storage
• Formats, RPC, Compression
4. Core use cases
• Data processing
– Search index building
– Click sessionization
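As an illustration of click sessionization, the sketch below groups click events by user and starts a new session whenever the gap between consecutive clicks exceeds a timeout. The 30-minute timeout, the `(user, timestamp, url)` record layout and the function name `sessionize` are assumptions, not details from the talk; on Hadoop the grouping by user would typically happen in the shuffle and the splitting in the reducer.

```python
from collections import defaultdict

# Assumed inactivity timeout; the talk does not specify one.
SESSION_GAP_SECONDS = 30 * 60


def sessionize(clicks, gap=SESSION_GAP_SECONDS):
    """Group (user, timestamp_seconds, url) clicks into per-user sessions.

    Returns {user: [session, ...]}, where each session is a list of
    (timestamp_seconds, url) tuples in time order.
    """
    by_user = defaultdict(list)
    for user, timestamp, url in clicks:
        by_user[user].append((timestamp, url))

    sessions = {}
    for user, events in by_user.items():
        events.sort()
        user_sessions = [[events[0]]]
        for previous, current in zip(events, events[1:]):
            if current[0] - previous[0] > gap:
                # Too long since the last click: start a new session.
                user_sessions.append([current])
            else:
                user_sessions[-1].append(current)
        sessions[user] = user_sessions
    return sessions


# Hypothetical click log.
clicks = [
    ("alice", 0, "/home"),
    ("bob", 100, "/home"),
    ("alice", 600, "/search"),
    ("alice", 4000, "/home"),
]
sessions = sessionize(clicks)
```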
5. We were here
Core Hadoop as % of new patches, with relevant projects by year:
2006: 100% (Core Hadoop)
2007: 100% (Core Hadoop)
2008: 58% (Core Hadoop, HBase, Zookeeper, Mahout)
2009: 37% (Core Hadoop, HBase, Pig, Zookeeper, Mahout, Hive)
2010: 37% (Core Hadoop, HBase, Pig, Zookeeper, Mahout, Hive, Avro, Whirr, Sqoop)
2011: 31% (Core Hadoop, HBase, Pig, Zookeeper, Mahout, Hive, Avro, Whirr, Sqoop, HCatalog, MRUnit, Bigtop, Oozie)
6. First cut at the system
Stack components (diagram):
• Shell / CLI
• Languages, Libraries, Workflow
• Data Processing, Resource Management
• Metadata storage
• Record storage
• File storage
• Coordination
• Formats, RPC, Compression
7. Underlying projects & communities
Stack components (diagram): Shell / CLI; Languages, Libraries, Workflow; Data Processing, Resource Management; Metadata storage; Record storage; File storage; Coordination; Formats, RPC, Compression.
Projects labelled on the diagram:
• Apache Hadoop
• Apache Pig, Hive, Mahout
• Apache Hive
• Apache HBase
• Apache Zookeeper
• Apache Avro
8. Core use cases
• Data processing
– Search index building
– Click sessionization
– Data processing pipelines
• Analytics
– Machine learning
– Batch reporting
• Live content serving (for the braver folks)
9. We were here
[Chart repeated from slide 5: Core Hadoop as % of new patches, with relevant projects by year, 2006 to 2011.]
10. Where we are today
Stack components (diagram):
• Web, Shell / CLI, Drivers
• Languages, Libraries, Workflow, Scheduling
• Data Processing, Resource Management
• Integration: Files, RDBMS, Logs & events
• Metadata storage
• Record storage
• File storage
• Coordination
• Formats, RPC, Authentication, Compression
11. Where we are today
Stack components (diagram): Web, Shell / CLI, Drivers; Languages, Libraries, Workflow, Scheduling; Data Processing, Resource Management; Integration (Files, RDBMS, Logs & events); Metadata storage; Record storage; File storage; Coordination; Formats, RPC, Authentication, Compression.
Projects labelled on the diagram:
• Hue
• Apache Hadoop
• Apache Pig, Hive, Mahout
• Apache Oozie
• JDBC / ODBC
• Apache Sqoop
• Apache Hive, HCatalog
• Apache HBase
• Apache Flume
• Apache Zookeeper
• Apache Avro
12. Core use cases
• Data processing
– Search index building
– Click sessionization
– Data processing pipelines
• Analytics
– Machine learning
– Batch reporting
• Real time applications
– Content serving
– System management
– Real-time aggregates & counters
• Storage
– EDW archive
13. Current state
[Chart repeated from slide 5: Core Hadoop as % of new patches, with relevant projects by year, 2006 to 2011.]
14. Limitations
Redundancy - DAG, RPC, serialization, integration, etc.
Uniformity - different components require different databases, management interfaces, etc.
Ease of use - improving but still an obstacle, e.g. non-native file formats require integration.
Multi-datacenter - cross-datacenter replication exists for HBase but not HDFS.
Interoperability - requires conversions and end-user integration.
15. Ongoing work
Metadata repositories - shared schema and data types, table abstraction via Apache HCatalog (incubating) and Apache Hive.
Self-describing data via Apache Avro.
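"Self-describing" means the schema travels with the data, so a reader needs no out-of-band knowledge of the fields. The sketch below illustrates the idea with the Python standard library only. It does not use the Avro library or Avro's binary container format: the schema dictionary follows the general shape of an Avro record schema, but the `Click` record, the JSON container layout and the helpers `write_container` and `read_container` are assumptions made for illustration.

```python
import json

# Schema written in the style of an Avro record schema (hypothetical record).
CLICK_SCHEMA = {
    "type": "record",
    "name": "Click",
    "fields": [
        {"name": "user", "type": "string"},
        {"name": "timestamp", "type": "long"},
        {"name": "url", "type": "string"},
    ],
}

# Toy mapping from schema type names to Python types (covers this example only).
PYTHON_TYPES = {"string": str, "long": int}


def write_container(schema, records):
    """Validate records against the schema and bundle both into one blob."""
    for record in records:
        for field in schema["fields"]:
            value = record[field["name"]]
            if not isinstance(value, PYTHON_TYPES[field["type"]]):
                raise TypeError(
                    f"field {field['name']!r} expects {field['type']}, got {value!r}"
                )
    # The schema is stored next to the records, so the blob describes itself.
    return json.dumps({"schema": schema, "records": records})


def read_container(blob):
    """Recover the record name, field names and records from the blob alone."""
    container = json.loads(blob)
    schema = container["schema"]
    field_names = [field["name"] for field in schema["fields"]]
    return schema["name"], field_names, container["records"]


blob = write_container(
    CLICK_SCHEMA, [{"user": "alice", "timestamp": 0, "url": "/home"}]
)
record_name, field_names, records = read_container(blob)
```

The reader learns the record name and field names from the blob itself, which is what lets different tools in the stack exchange data without a shared, hard-coded schema.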
16. Ongoing work: Apache Bigtop
Dedicated to Hadoop stack integration and testing.
Integration - between projects, dependencies, hosts.
Testing - interoperability, multi-component use cases.
100% Apache projects, using upstream releases.
Participants across the ecosystem - join us!
http://incubator.apache.org/bigtop
17. Technical trends - software
• Moving more forms of computation to Hadoop storage
• Frameworks to make HBase more application and developer friendly
• Taking advantage of pluggability to provide more optimized formats, schedulers, codecs, etc.
• More granular security models
18. Technical trends - hardware
• Increasingly powerful hosts
– # of cores and memory
– Network - 10/40 GigE
– Storage - 48/60 TB hosts. Flash.
• Cloud - multi-tenancy and virtualization
• Low power CPUs
19. Enable future use cases pt 1
More valuable data
• Cost = gravity. Data flows downhill to the cheapest store.
• High-value data is not just generated but also consumed by the platform, i.e. more processing is done within the system before the data leaves.
Richer end user applications
• Apps built directly on the platform (eBay’s Cassini, Facebook messages, etc.)
• Web 3.0 – data-centric apps. Apps move over common data sources rather than being tightly coupled to their data.
20. Enable future use cases pt 2
Lower latency / higher interactivity
• Low latency response times for applications
• Interactive - human-driven, correlated access, e.g. analytics
• Low latency query execution and in-memory datasets
• Resource management - batch and interactive workloads
21. Enable future use cases pt 3
Hadoop meets ILM
Policy - access control, standard management interfaces, SLAs, MDM, etc.
Operation - disaster recovery, archive, etc.
Traditional features - availability, snapshots, mirroring, ACLs, integration via standard protocols.
22. Things to look forward to
Stack components (diagram):
• Web, Shell / CLI, Drivers
• Languages, Libraries, Workflow, Scheduling
• MapReduce, Stream, Graph, MPI, Other
• Resource Management
• Integration: Files, RDBMS, Logs & events
• Metadata storage
• Time Series, ORM, OLAP, OLTP
• Record storage
• File storage
• Coordination
• Formats, RPC, Authentication, Compression
23. Getting crowded…
Stack components (diagram): Web, Shell / CLI, Drivers; Languages, Libraries, Workflow, Scheduling; MapReduce, Stream, Graph, MPI, Other; Resource Management; Integration (Files, RDBMS, Logs & events); Metadata storage; Time Series, ORM, OLAP, OLTP; Record storage; File storage; Coordination; Formats, RPC, Authentication, Compression.
Projects labelled on the diagram:
• Hue
• Apache Hadoop
• Apache Pig, Hive, Mahout
• Apache S4, Storm
• X-Rime, Giraph
• Apache Oozie
• JDBC / ODBC
• Apache Sqoop
• Apache Hive, HCatalog
• OpenTSDB
• Apache HBase
• Apache Flume
• Apache Zookeeper
• Apache Avro
• Apache Gora
• Omid
24. We appreciate your time and
interest in
For Additional Information:
+1 (888) 789-1488
sales@cloudera.com
cloudera.com
twitter.com/cloudera
facebook.com/cloudera