Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
1. March 13, 2011
From Zero to Big Data Answers in Less Than an Hour
Daniel Templeton | Cloudera Manager, Partner Program Adoption
Richard Guth | Karmasphere Chief Marketing Officer
2. The ‘Big Data’ Phenomenon
Big Data Drivers:
• The proliferation of data capture and creation technologies
• Increased “interconnectedness” drives consumption (creating more data)
• Inexpensive storage makes it possible to keep more, longer
• Innovative software and analysis tools turn data into information
Diagram labels: More Content, More Devices, More Consumption, New & Better Information
Big Data encompasses not only the content itself, but how it’s consumed.
Every gigabyte of stored content can generate a petabyte or more of transient data*
The information about you is much greater than the information you create
*Source: IDC 2011
©2011 Cloudera, Inc. All Rights Reserved.
3. What is Apache Hadoop?
Apache Hadoop is a platform for data storage and processing that is…
• Scalable
• Fault tolerant
• Open source
CORE HADOOP COMPONENTS
• Hadoop Distributed File System (HDFS): File Sharing & Data Protection Across Physical Servers
• MapReduce: Distributed Computing Across Physical Servers
Has the Flexibility to Store and Mine Any Type of Data
• Ask questions across structured and unstructured data that were previously impossible to ask or solve
• Not bound by a single schema
Excels at Processing Complex Data
• Scale-out architecture divides workloads across multiple nodes
• Flexible file system eliminates ETL bottlenecks
Scales Economically
• Can be deployed on commodity hardware
• Open source platform guards against vendor lock-in
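The MapReduce model named on the slide above can be sketched in a few lines. This is a minimal single-process illustration, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for the example. In a real cluster, the map and reduce calls would run in parallel across physical servers, with HDFS holding the input and output.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input lines, standing in for files stored in HDFS.
    print(run_job(["big data big answers", "data in hadoop"]))
```

Because each map call sees only one record and each reduce call sees only one key, the work can be divided across nodes without the functions needing to know about each other, which is the property the scale-out claim rests on.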
4. Who Is Cloudera?
The trusted leader in Apache Hadoop.
• Package the #1 distribution of Apache Hadoop in commercial and non-commercial environments
• Roadmap control or influence over all Apache Hadoop-related projects
• Top contributor to the Apache ecosystem overall
• Tens of thousands of nodes under management
We make Hadoop enterprise-easy.
• A distribution of Apache Hadoop that is tested, certified and supported
• A suite of management software for Hadoop administrators
• Training and certification programs
• Comprehensive support and consulting services
Unrivaled knowledge and experience.
• Founders, committers and contributors to Apache Hadoop and related projects
• A wealth of experience in the design and delivery of enterprise software
Strong executive team with proven abilities.
• Mike Olson, CEO
• Kirk Dunn, COO
• Jeff Hammerbacher, Chief Scientist
• Doug Cutting, Chief Architect
• Amr Awadallah, VP, Engineering
• Mary Rorabaugh, VP, Finance
• Charles Zedlewski, VP, Products
• Omer Trajman, VP, Customer Solutions
5. CDH Overview
The #1 commercial and non-commercial Apache Hadoop distribution.
Complete, Integrated Hadoop Stack
• File System Mount: FUSE-DFS
• UI Framework: HUE
• SDK: HUE SDK
• Workflow: APACHE OOZIE
• Scheduling: APACHE OOZIE
• Metadata: APACHE HIVE
• Languages / Compilers: APACHE PIG, APACHE HIVE
• Data Integration: APACHE FLUME, APACHE SQOOP
• Fast Read/Write Access: APACHE HBASE
• Coordination: APACHE ZOOKEEPER
CDH Components
• Apache Hadoop – reliable, scalable distributed computing
• Apache Hive – SQL-like language and metadata repository
• Apache Pig – High level language for expressing data analysis programs
• Apache HBase – Hadoop database for random, real-time read/write access
• Apache Zookeeper – Highly reliable distributed coordination service
• Apache Flume* – Distributed service for collecting and aggregating log and event data
• Apache Whirr* – Library for running Hadoop in the cloud
• Apache Sqoop* – Integrating Hadoop with RDBMS
• Apache Oozie* – Server-based workflow engine for Hadoop Activities
• Fuse-DFS – Module within Hadoop for mounting HDFS as a traditional file system
• Hue – Browser-based desktop interface for interacting with Hadoop
* Currently undergoing Incubation at the Apache Software Foundation.
6. Cloudera Enterprise
Cloudera Enterprise makes open source Hadoop enterprise-easy
• Simplify and Accelerate Hadoop Deployment
• Reduce Adoption Costs and Risks
• Lower the Cost of Administration
• Increase Transparency and Control Over Hadoop
• Leverage the Experience of Our Experts
CLOUDERA ENTERPRISE COMPONENTS
• Cloudera Manager: End-to-End Management Application for Apache Hadoop
• Production-Level Support: Our Team of Experts On-Call to Help You Meet Your SLAs
EFFECTIVENESS: Ensuring You Get Value From Your Hadoop Deployment
EFFICIENCY: Enabling You to Affordably Run Hadoop in Production
7. Big Data Intelligence Applications
for Enterprise Data Professionals
www.karmasphere.com
© Karmasphere 2012 All rights reserved
8. About Karmasphere
Company
Pure-play, singularly focused on Big Data Intelligence and Analytics on Hadoop and NoSQL, in the cloud and on-premise.
Engineering Expertise
Hadoop, analytics, web analytics, business intelligence, visualizations, programming languages, compilers, architecture, mathematics, database
Management Experience
Google, Yahoo, Ask, Ning, Omniture, BEA, Oracle, Sybase, Actuate, Apple, Zend, Intel, BMC, Spotfire
9. Karmasphere Mission
Provide an EASY way to find INSIGHTS in Big Data to transform business
Upcoming Skills Shortage
“By 2018, the United States alone could face a shortage of 140,000
to 190,000 people with deep analytical skills as well as 1.5 million
managers and analysts with the know-how to use the analysis of
big data to make effective decisions”
“Big Data: the next frontier for innovation, competition and productivity”
McKinsey, May 2011
11. From Zero to Answers in 60 Minutes
DEMO
Our Process
• Access any cloud or on-premise Cloudera CDH
• Assemble and organize unstructured and structured data in Hadoop
• Analyze the data using familiar SQL
Marketing Analyst for Retail Chain
1 Connect to the preconfigured Cloudera CDH cluster
2 Access our structured point of sale transactions data and bring up transactional data for lunch meals
3 Correlate results with unstructured social media data to get some insight on our buyers and buying behavior
4 Infer from these results which stores are underperforming and come up with an action plan to increase sales for these stores
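The analyst workflow on this slide (select lunch transactions, correlate them with social media data, flag underperforming stores) can be sketched with ordinary SQL. The example below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for a Hive connection to the CDH cluster, and every table name, column name, row, and threshold is hypothetical and not taken from the demo.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up rows (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE pos_transactions (store_id TEXT, meal TEXT, amount REAL);
        CREATE TABLE social_mentions (store_id TEXT, sentiment INTEGER);
        INSERT INTO pos_transactions VALUES
            ('S1', 'lunch', 12.0), ('S1', 'lunch', 8.0),
            ('S2', 'lunch', 5.0),  ('S2', 'dinner', 20.0);
        INSERT INTO social_mentions VALUES
            ('S1', 1), ('S1', 1), ('S2', -1), ('S2', -1), ('S2', 1);
    """)
    return conn


# Steps 2 and 3: total lunch sales per store, joined to average sentiment.
# Each side is aggregated before the join so rows are not double counted.
LUNCH_VS_SENTIMENT = """
    SELECT p.store_id, p.lunch_sales, s.avg_sentiment
    FROM (SELECT store_id, SUM(amount) AS lunch_sales
          FROM pos_transactions WHERE meal = 'lunch'
          GROUP BY store_id) AS p
    JOIN (SELECT store_id, AVG(sentiment) AS avg_sentiment
          FROM social_mentions
          GROUP BY store_id) AS s
      ON p.store_id = s.store_id
    ORDER BY p.lunch_sales
"""


def underperforming_stores(conn, sales_threshold):
    """Step 4: keep stores whose lunch sales fall below a chosen threshold."""
    rows = conn.execute(LUNCH_VS_SENTIMENT).fetchall()
    return [row for row in rows if row[1] < sales_threshold]


if __name__ == "__main__":
    conn = build_demo_db()  # Step 1: connect (here, to a local stand-in)
    for store_id, lunch_sales, avg_sentiment in underperforming_stores(conn, 10.0):
        print(store_id, lunch_sales, avg_sentiment)
```

In this sketch the social media data is assumed to have already been reduced to a per-mention sentiment score; in practice, turning unstructured posts into such a score is its own processing step on the cluster.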
13. From Zero to Big Data Answers in Less Than an Hour
The webinar recording will be made available shortly at:
• https://www1.gotomeeting.com/register/890391584
Contact Information:
• info@cloudera.com
• 1 (888) 789-1488
• info@karmasphere.com
• 1 (650) 292-6100