If you also got the Big Data itch, here is something to ease the pain :-)
Answers to these questions will be available soon (more info in the attached link)
Which Big Data Appliance should YOU use?
(click on the attached link for Poll results)
Appliances are Small and Quick, Right?
Revealing the 6 Types of Big Data Appliances
Uncovering the Main Players
Challenges, Pitfalls, and Winning the Big Data Game
Where is all this leading YOU to?
2. Piranha vs. Mammoth
Predator Appliances
chew up BIG DATA
• Appliances are Small and Quick, Right?
• Revealing the 6 Types of Big Data Appliances
• Uncovering the Main Players
• Which Big Data Appliance should YOU use?
• Challenges, Pitfalls, and Winning the Big Data Game
• Where is all this leading YOU to?
7. How deep have you dived into Big Data?
A. Just starting to learn it
B. Learning a lot, nothing done yet
C. Planning a Big Data Project
D. Running a Big Data Operation
E. I don't get it Yet! What's all the fuss about it?
10. So what makes a great appliance?
1. Does the job – no more, no less
2. Quick and simple setup
3. Quick and easy updates
4. Easy control of one or many instances
5. Simple Infrastructure requirements
6. Reliable underlying system
7. No delays doing its job
8. What else?
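The checklist above can be turned into a simple scorecard. The following is a minimal sketch, not part of the original slides: the criterion keys, the three-level rating scale ("good", "medium", "poor") and the weights are all assumptions made for illustration. Criteria that have no rating yet are skipped rather than guessed.

```python
# Criterion keys mirror the checklist above; the names are illustrative.
CRITERIA = [
    "does_the_job",
    "quick_setup",
    "easy_updates",
    "easy_control",
    "simple_infrastructure",
    "reliable_system",
    "no_delays",
]

# Hypothetical rating scale: an assumption, not taken from the slides.
RATING_POINTS = {"good": 2, "medium": 1, "poor": 0}


def score_appliance(ratings, weights=None):
    """Return a weighted score in [0, 1]; unrated criteria are skipped."""
    weights = weights or {}
    best = max(RATING_POINTS.values())
    earned = 0.0
    possible = 0.0
    for criterion in CRITERIA:
        rating = ratings.get(criterion)
        if rating is None:
            continue  # no opinion yet, so it does not affect the score
        weight = weights.get(criterion, 1.0)
        earned += weight * RATING_POINTS[rating]
        possible += weight * best
    return earned / possible if possible else 0.0
```

Raising the weight of one criterion (for example quick_setup for a pilot) shifts the ranking toward appliances that do well on it, which is one way to express "what is the most important job" as a number.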
11. What’s the most important Job for a
Great Appliance? (Poll #2)
A. Does the job on time – no more, no less
B. Quick and simple Setup and Updates
C. Easy control of one or many instances
D. Simple Infrastructure requirements
E. Reliable underlying system
15. What is the job for your Big Data
Appliance?
1. Extend your Existing Data Warehouse to include Non-Structured Data?
2. Discover new types of insights to Increase Innovation
3. Run a pilot to verify it is worth it
4. Process more (types of) Data
5. Process Data faster
6. Process Data cheaper
7. Static or Continuous Analysis of Data
8. Flexibility and Lock-In prevention (yes, sure :-)) - Hadoop
9. Turn Operational Data into Assets
10. Break Data Silo barriers
11. Stick to existing Data vendors or work with new ones
17. Revealing the 6 Types of Big Data
Appliances
• Hadoop Engine - Software Based Appliance
• Data Warehouse Hardware Engine + API to
Hadoop / Analytics
• Hardware Storage “Only”
• Software Based Appliance,
Compatible to Hadoop
• Cloud based VMs + Hadoop Engine
• Cloud Based API with Hooks to Hadoop
18. What type of Big Data Appliance will
you use? (Poll #3)
A. Hadoop Engine or Compatible - Software
Based
B. Data Warehouse Hardware Engine + API to
Hadoop / Analytics
C. Hardware Storage “Only”
D. Cloud based VMs + Hadoop Engine
E. Cloud Based API with Hooks to Hadoop
22. Hadoop Engine - Software Based
Appliances
• Oracle
• Cloudera
• HortonWorks (like Cloudera; co-op with VMware, Microsoft, TeraData, …)
• MapR Available on Amazon EMR
and Google Compute Engine VMs
• Red Hat Storage 2.0 Beta (Includes
compatibility for Apache Hadoop)
23. Oracle Big Data Appliance
• End goal: Get data into Oracle Database 11g
• Includes open source Hadoop (Now Cloudera)
• Oracle NoSQL Database (JVM DB vs. HDFS!)
• Oracle Loader for Hadoop (more next slide)
• Open source distribution of R
• Oracle Linux + Oracle Java HotSpot VM
24. Oracle Big Data Appliance
• Oracle Data Integrator + Hadoop API
– Easy upload to HDFS by automating MapReduce
– Validate constraints of Hive tables
– Add data to Hive tables
– Upload to Oracle using Oracle Loader for Hadoop
– Allows querying Hive tables with Oracle SQL, via a “connector” Oracle table
25. Oracle Big Data Appliance
• Type: Hadoop Engine - Software Based Appliance
• Does the job – See next slide
• Quick and simple setup – Medium (Oracle)
• Quick and easy updates – Medium (Oracle/CDH?)
• Easy control of one or many instances
• Simple Infrastructure requirements – Medium (Oracle)
• Reliable underlying system
• No delays doing its job - ?
• What else?
– Great if you’ve got Oracle already
– Add on to Oracle Exadata Hardware / Data Warehouse
26. Oracle Big Data Appliance
• Can do most of the job requirements
• Exceptions:
– Process Data faster – Looks like…
– Process Data cheaper – Oracle is not a cheap
product…
– Flexibility and Lock-In prevention - Medium
27. Cloudera
• Integrated, Tested collection of Open Source
Apache Hadoop (more next slide)
• HDFS is the NOSQL Database...
• Management Console for rapid node deploy
• Free up to X nodes
• Paid Enterprise Subscription, includes support
• Integrated into a bunch of Data software
Giants
28. Cloudera Included Open Source Mods:
• Apache HBase: HDFS-based tables
• Apache Hive: SQL-like language
• Apache Mahout: Machine Learning algorithms
• Apache Pig: High-level data flow language
• Apache Sqoop: Engine integrating with SQL DBs
• Apache Whirr: Deploys Hadoop in the cloud
• Hue: Browser-based interface for Hadoop
29. Cloudera
• Type: Hadoop Engine - Software Based Appliance
• Does the job – See next slide
• Quick and simple setup – Great once first node set
• Quick and easy updates
• Easy control of one or many instances
• Simple Infrastructure requirements
• Reliable underlying system
• No delays doing its job - maybe
• What else?
– Easy to start as a pilot!
– Great for old hardware
30. Cloudera
• Can do most of the job requirements
• Exceptions:
– Process Data faster – depends on allocated
resources
– Process Cheaper – Yes (but cheap HW can be
costly)
– Static or Continuous Analysis – needs more tools
– Endorsement from Huge Players
31. MapR Special Features
(Do You need it?)
• ExpressLane – Small jobs finish quickly (medium)
• Mount / use HDFS over NFS (strategic?)
• NFS, allows data streaming (Important/lock in?)
• Volumes (manage, mirror, snap) – (Important?)
• X times more scalable / faster (lock in?)
• Name Node and Job Tracker HA (claims regular Hadoop has only 1 Name Node) (Medium)
• SW Snapshot/Mirror (Fast? Complex?)
32. Data Warehouse Hardware Engine +
API to Hadoop / Analytics
• TeraData Aster MapReduce Appliance
• EMC GreenPlum
• IBM Netezza + Cloudera/Hadoop
as part of IBM’s Big Data Solutions Suite
• Cray Big Data Appliance, Urika
(YarcData Division)
33. TeraData Aster MapReduce Appliance
• Hadoop is not at the front, MapReduce is
• Short learning curve, using current DW tool
• MPP is already built in for scale as part of DW
• Reliability and Performance done by HW
• Connectivity (JDBC,ODBC) to Big Data: Cloudera
• Price is probably higher than Hadoop solutions (a guess)
• Platform: SuSE Linux
• Aster Data nCluster Amazon AWS Cloud Edition
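Since MapReduce, rather than Hadoop itself, is the programming model at the front here, a minimal sketch of that model may help. This is plain Python written for illustration only: it is not Aster or Hadoop code, and all function names are made up. It shows the three phases (map, shuffle, reduce) on a word count.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and stream out (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for each whitespace-separated word, lowercased."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, counts):
    """Sum the ones emitted for a word."""
    return sum(counts)


def word_count(lines):
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
```

In a real cluster the map and reduce calls run on many nodes and the shuffle moves data over the network; the sketch keeps everything in one process to show only the data flow.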
34. TeraData Aster MapReduce Appliance
• Type: Data Warehouse Hardware Engine + API to
Hadoop / Analytics
• Does the job – See next slide
• Quick and simple setup –
• Quick and easy updates
• Easy control of one or many instances
• Simple Infrastructure requirements – Specialized
HW…
• Reliable underlying system
• No delays doing its job - maybe
35. TeraData Aster MapReduce Appliance
• Can do most of the job requirements
• Exceptions:
– Run a pilot to verify it is worth it – probably pricey … unless using the Software / Cloud editions
– Process Data cheaper – probably not so…
– Static or Continuous Analysis of Data – Should Excel!
– Lock-In – probably, not sure how much
• Turn Operational Data into Assets - Should Excel
at this…
36. Hardware Storage “Only”
• DataDirect Networks Big Data Storage
Appliances
• Quantum StorNext Metadata Appliances
37. DataDirect Networks Big Data Storage
Appliances
• “Science Fiction” I/O Performance
– Single Array: 40 GB/s and 1.4 Million Flash IOPS
– Up to 25 FC/Infiniband hooked arrays: 1 TB/s +
– More info and pricing
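The two throughput figures above are consistent with each other: 25 arrays at 40 GB/s each give 1000 GB/s, i.e. 1 TB/s. A small worked sketch follows. It assumes perfectly linear scaling across arrays and decimal units (1 TB = 1000 GB); both are simplifying assumptions, and the 100 TB dataset in the usage note is a made-up example, not a vendor figure.

```python
SINGLE_ARRAY_GB_PER_S = 40  # per-array figure quoted on the slide
MAX_ARRAYS = 25             # maximum number of hooked arrays quoted on the slide


def aggregate_gb_per_s(arrays, per_array=SINGLE_ARRAY_GB_PER_S):
    """Aggregate throughput, assuming perfectly linear scaling (an assumption)."""
    return arrays * per_array


def scan_seconds(dataset_tb, arrays):
    """Seconds to read a dataset once at full throughput (1 TB = 1000 GB)."""
    return dataset_tb * 1000 / aggregate_gb_per_s(arrays)
```

Under these assumptions, reading a hypothetical 100 TB dataset once would take scan_seconds(100, 1) = 2500 seconds on a single array and scan_seconds(100, MAX_ARRAYS) = 100 seconds on the full configuration.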
38. Quantum StorNext Metadata
Appliances
• Special additional features:
– Huge file size support
– Huge amount of files support
– Varying Operating System direct access support
39. Hardware Storage “Only”
• Does the job – See next slide
• Quick and simple setup – Once you set the HW
• Quick and easy updates - probably
• Easy control of one or many instances
• Simple Infrastructure requirements – Specialized
HW…
• Reliable underlying system
• No delays doing its job
40. Hardware Storage “Only”
• Can do SOME of the job requirements
• Exceptions: Can’t do all those without
additional software
– Run a pilot to verify it is worth it – too costly for a
pilot?
– Process Data faster
– Process Data cheaper
– Flexibility and Lock-In prevention
41. Cloud based VMs + Hadoop Engine
• Amazon Elastic MapReduce
(Amazon EMR)
• Google Compute Engine
42. Amazon Elastic MapReduce
(Amazon EMR)
• Type: Cloud based VMs + Hadoop Engine
• Cost-effective (not always = cheap!)
• Includes Hadoop SW such as MapR including all
MapR advanced SW based File Services
• Easily add or remove nodes
– Pre set VMs
– Easy mass deployment using AWS console
• HA integrated into Amazon S3
• Hadoop HBase DB as EMR service
43. Google Compute Engine Special
Features
• Type: Cloud based VMs + Hadoop Engine
• Based on CentOS (nice – open…)
• Various disk types (all encrypted, fast)
– Non Persistent (dies with the VM)
– Persistent – shared + snapshots
– Cloud based (looks similar to Amazon S3)
• Cheaper than Amazon?
44. Amazon Elastic MapReduce
(Amazon EMR)
• Does the job – See next slide
• Quick and simple setup
• Quick and easy updates - probably
• Easy control of one or many instances
• Simple Infrastructure requirements
• Reliable underlying system
• No delays doing its job
45. Amazon Elastic MapReduce
(Amazon EMR)
• Can do most of the job requirements
• Exceptions:
– Extend your Existing Data Warehouse to include Non-Structured Data - Your DW out in the cloud …
– Run a pilot to verify it is worth it – Excels at this!
– Process Data faster
– Process Data cheaper
– Static or Continuous Analysis of Data
– Turn Operational Data into Assets - Operational in the Cloud…
46. Cloud Based API with Hooks to
Hadoop
• Google App Engine MapReduce
• Microsoft Big Data via Windows Azure
47. Google App Engine MapReduce
• Open-source library for doing MapReduce on the Google App Engine platform
• Can process data store entities and blob files
(probably Google Cloud Storage)
• Both in memory and disk operation
• Scale up or down “working threads”
• Python and Java support
• Experimental, still allows a look into the future…
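The idea of scaling the number of "working threads" up or down can be sketched as follows. This is plain Python using the standard library thread pool, not the App Engine MapReduce API; the function names and the per-entity work (summing entity lengths) are assumptions chosen only to keep the example small.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(items, shard_count):
    """Deal items round-robin into shard_count lists (some may be empty)."""
    return [items[i::shard_count] for i in range(shard_count)]


def map_shard(shard):
    """Stand-in for per-entity work: total length of the entities in a shard."""
    return sum(len(entity) for entity in shard)


def run_sharded(items, shard_count):
    """Process shards in parallel and combine the partial results."""
    shards = split_into_shards(items, shard_count)
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        partials = list(pool.map(map_shard, shards))
    return sum(partials)
```

The result does not depend on shard_count; only the degree of parallelism changes, which is the property that lets the worker count be tuned to the load.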
48. Google App Engine MapReduce
• Does the job – See next slide
• Quick and simple setup – Once you learn the
API
• Quick and easy updates
• Easy control of one or many instances
• Simple Infrastructure requirements
• Reliable underlying system – still Beta…
• No delays doing its job
49. Google App Engine MapReduce
• Can do SOME of the job requirements
• Exceptions:
– Extend your Existing Data Warehouse – Cloud Security
and DW
– Run a pilot to verify it is worth it – could be great!
– Process Data faster
– Process Data cheaper
– Static or Continuous Analysis of Data
– Flexibility and Lock-In prevention – Code is open, but
Process may not be
– Turn Operational Data into Assets – Cloud Security…
50. Microsoft Big Data via Windows Azure
• Provides SQL Server Hadoop Connector
• Provides ODBC Hadoop connector to tie MS Office and other Apps to Hadoop Hive
• Seems similar to DW providers who have
connector to Hadoop
– Reason: It is not clear exactly where and how
Azure Cloud Implementation goes…
52. Which Big Data Appliance should YOU
use?
• Let’s look at the Big Data Appliance Job to be
Done and ask questions:
• Where are you and what is your goal?
– So you have some of the puzzle pieces?
– Any constraints?
– Long term vs. Short term?
– (Always start with a Pilot, if this is your first time…)
54. Challenges, Pitfalls, and
Winning the Big Data Game
• You can’t get much out of Big Data if you don’t know how to find useful insights (Lack of Data Scientists)
• The same abilities you needed for Data Warehouse digging are needed with Big Data, even more so
• Commoditization of the data warehouse (Hadoop + Cloud) = More players and innovation
55. Challenges, Pitfalls, and
Winning the Big Data Game
• You can’t make use of it if you lack the agility and innovation to change direction quickly and respond on time
• Privacy (implied and specific)
• Security (implied and specific)
• To keep costs low (many x86 nodes) you need a Mass Node Management app
• Big DW Vendors embrace Hadoop through solution providers such as Cloudera and HortonWorks, but it “feels” a bit “vague”
57. Where is all this leading YOU to?
• The Simple Stuff (I know it looks complicated)
– Crunching More and Faster for Less
– Optimizing the Process and Utilizing the right Tools
• The real challenge: Turning Data into an Asset
– Finding: The Golden Nuggets
– Deciding: What should I do now?
– Pitching and leading: The Transformation
• Big Data does not mean Endless Capacity…
• Don’t get lost in the Technology Playground
58. Q&A Soon…But First,
I need Your Help now…
1. Please rate the Webinar
2. Download the resource attachments for
future use
3. Subscribe to my channel on BrightTalk
4. Spread the word
5. Have fun with Big Data and Enjoy Life