Piranha vs. mammoth predator appliances that chew up big data

Piranha vs. Mammoth
Predator Appliances
chew up BIG DATA

• Appliances are Small and Quick, Right?
• Revealing the 6 Types of Big Data Appliances
• Uncovering the Main Players
• Which Big Data Appliance should YOU use?
• Challenges, Pitfalls, and Winning the Big Data
Game
• Where is all this leading YOU to?

Appliances are Small and Quick, Right?

Well, in some cases.
But a Big Data Appliance can be…
BIG…

Quantum StorNext M330 Presented on YouTube
http://www.youtube.com/watch?v=X1IZpoyHxlY

So what makes a great appliance?

But first, let’s get to know You
(Big Data Appliance Poll #1…)

How deep have you dived into Big
Data?
A. Just starting to learn it
B. Learning a lot, nothing done yet
C. Planning a Big Data Project
D. Running a Big Data Operation
E. I don't get it yet! What's all the fuss about?

So what makes a great appliance?
1. Does the job – no more, no less
2. Quick and simple setup
3. Quick and easy updates
4. Easy control of one or many instances
5. Simple Infrastructure requirements
6. Reliable underlying system
7. No delays doing its job
8. What else?

What’s the most important Job for a
Great Appliance? (Poll #2)
A. Does the job on time – no more, no less
B. Quick and simple Setup and Updates
C. Easy control of one or many instances
D. Simple Infrastructure requirements
E. Reliable underlying system

What is the job for your Big Data
Appliance?
1. Extend your Existing Data Warehouse to include Non-
Structured Data?
2. Discover new types of insights to Increase Innovation
3. Run a pilot to verify it is worth it
4. Process more (types of) Data
5. Process Data faster
6. Process Data cheaper
7. Static or Continuous Analysis of Data
8. Flexibility and Lock-In prevention (yes, sure :-)) - Hadoop
9. Turn Operational Data into Assets
10. Break Data Silo barriers
11. Stick to existing Data vendors or work with new ones

Revealing the 6 Types of Big Data
Appliances
• Hadoop Engine - Software Based Appliance
• Data Warehouse Hardware Engine + API to
Hadoop / Analytics
• Hardware Storage “Only”
• Software Based Appliance,
Compatible with Hadoop
• Cloud based VMs + Hadoop Engine
• Cloud Based API with Hooks to Hadoop

What type of Big Data Appliance will
you use? (Poll #3)
A. Hadoop Engine or Compatible - Software
Based
B. Data Warehouse Hardware Engine + API to
Hadoop / Analytics
C. Hardware Storage “Only”
D. Cloud based VMs + Hadoop Engine
E. Cloud Based API with Hooks to Hadoop

Uncovering (some of) the Main Players

Hadoop Engine - Software Based
Appliances
• Oracle
• Cloudera
• HortonWorks (like Cloudera; co-operates
with VMware, Microsoft, TeraData, …)
• MapR (available on Amazon EMR
and Google Compute Engine VMs)
• Red Hat Storage 2.0 Beta (Includes
compatibility for Apache Hadoop)

Oracle Big Data Appliance
• End goal: Get data into Oracle Database 11g
• Includes open source Hadoop (Now Cloudera)
• Oracle NoSQL Database (JVM DB vs. HDFS!)
• Oracle Loader for Hadoop (more next slide)
• Open source distribution of R
• Oracle Linux + Oracle Java HotSpot VM

• Oracle Data Integrator + Hadoop API
– Easy upload to HDFS by automating MapReduce
– Validate constraints on Hive tables
– Add data to Hive tables
– Upload to Oracle using Oracle Loader for Hadoop
– Allows querying Hive tables, using Oracle SQL, via a
“connector” Oracle table

• Type: Hadoop Engine - Software Based Appliance
• Does the job – See next slide
• Quick and simple setup – Medium (Oracle)
• Quick and easy updates – Medium (Oracle/CDH?)
• Easy control of one or many instances
• Simple Infrastructure requirements – Medium (Oracle)
• Reliable underlying system
• No delays doing its job - ?
• What else?
– Great if you’ve got Oracle already
– Add on to Oracle Exadata Hardware / Data Warehouse

• Can do most of the job requirements
• Exceptions:
– Process Data faster – Looks like…
– Process Data cheaper – Oracle is not a cheap
product…
– Flexibility and Lock-In prevention - Medium

Cloudera
• Integrated, Tested collection of Open Source
Apache Hadoop (more next slide)
• HDFS (with HBase tables on top) plays the NoSQL database role...
• Management Console for rapid node deploy
• Free up to X nodes
• Paid Enterprise Subscription, includes support
• Integrated into products from a number of
data software giants

Open Source Modules Included with Cloudera:
• Apache HBase: HDFS-based tables
• Apache Hive: SQL-like language
• Apache Mahout: machine learning algorithms
• Apache Pig: high-level data flow language
• Apache Sqoop: engine integrating with SQL DBs
• Apache Whirr: deploys Hadoop in the cloud
• Hue: browser-based interface for Hadoop

Cloudera
• Type: Hadoop Engine - Software Based Appliance
• Quick and simple setup – Great once the first node is set up
• Quick and easy updates
• Simple Infrastructure requirements
• No delays doing its job - maybe
• What else?
– Easy to start as a pilot!
– Great for old hardware

Cloudera
• Exceptions:
– Process Data faster – depends on allocated
resources
– Process Cheaper – Yes (but cheap HW can be
costly)
– Static or Continuous Analysis – needs more tools
– Endorsement from Huge Players

MapR Special Features
(Do You need it?)
• ExpressLane – Small jobs finish quickly (medium)
• Mount / use HDFS over NFS (strategic?)
• NFS, allows data streaming (Important/lock in?)
• Volumes (manage, mirror, snap) – (Important?)
• X times more scalable / faster (lock in?)
• NameNode and JobTracker HA (claims regular
Hadoop has only one NameNode) (Medium)
• SW Snapshot/Mirror (Fast? Complex?)

Data Warehouse Hardware Engine +
API to Hadoop / Analytics
• TeraData Aster MapReduce Appliance
• EMC GreenPlum
• IBM Netezza + Cloudera/Hadoop
as part of IBM’s Big Data Solutions Suite
• Cray Big Data Appliance, Urika
(YarcData Division)

TeraData Aster MapReduce Appliance
• Hadoop is not at the front; MapReduce is
• Short learning curve, using current DW tool
• MPP is already built in for scale as part of DW
• Reliability and Performance done by HW
• Connectivity (JDBC, ODBC) to Big Data: Cloudera
• Price is probably higher than Hadoop solutions (a guess)
• Platform: SuSE Linux
• Aster Data nCluster Amazon AWS Cloud Edition

• Type: Data Warehouse Hardware Engine + API to
Hadoop / Analytics
• Quick and simple setup –
• Simple Infrastructure requirements – Specialized
HW…
• No delays doing its job - maybe

• Exceptions:
– Run a pilot to verify it is worth it – probably
pricey… unless using the Software / Cloud editions
– Process Data cheaper – probably not so…
– Static or Continuous Analysis of Data – Should Excel!
– Lock-In – probably, not sure how much
• Turn Operational Data into Assets - Should Excel
at this…

Hardware Storage “Only”
• DataDirect Networks Big Data Storage
Appliances

• Quantum StorNext Metadata Appliances

DataDirect Networks Big Data Storage
Appliances
• “Science Fiction” I/O Performance
– Single Array: 40 GB/s and 1.4 Million Flash IOPS
– Up to 25 FC/InfiniBand hooked arrays: 1 TB/s +
– More info and pricing
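
A quick back-of-the-envelope check of these figures, as a minimal Python sketch. The 1 PB dataset is an assumption picked for illustration, decimal units (1 TB = 1000 GB) are assumed, and the rates are treated as sustained with perfectly linear scaling, which real workloads may not reach.

```python
# Back-of-the-envelope arithmetic for the throughput figures on the slide.
# Assumptions: decimal units, sustained peak rates, hypothetical 1 PB dataset.

SINGLE_ARRAY_GB_PER_S = 40   # from the slide: one array
MAX_ARRAYS = 25              # from the slide: up to 25 hooked arrays
DATASET_GB = 1_000_000       # hypothetical 1 PB dataset (assumption)


def aggregate_gb_per_s(arrays: int, per_array: float = SINGLE_ARRAY_GB_PER_S) -> float:
    """Aggregate throughput, assuming perfectly linear scaling across arrays."""
    return arrays * per_array


def scan_seconds(dataset_gb: float, gb_per_s: float) -> float:
    """Time in seconds to read the whole dataset once at the given rate."""
    return dataset_gb / gb_per_s


if __name__ == "__main__":
    full = aggregate_gb_per_s(MAX_ARRAYS)
    one_array_hours = scan_seconds(DATASET_GB, SINGLE_ARRAY_GB_PER_S) / 3600
    all_arrays_minutes = scan_seconds(DATASET_GB, full) / 60
    print(f"{MAX_ARRAYS} arrays: {full:.0f} GB/s")
    print(f"1 PB on one array: {one_array_hours:.1f} hours")
    print(f"1 PB on {MAX_ARRAYS} arrays: {all_arrays_minutes:.1f} minutes")
```

Under these assumptions, 25 arrays at 40 GB/s each add up to the 1 TB/s quoted above.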

Quantum StorNext Metadata
Appliances
• Special additional features:
– Support for huge file sizes
– Support for huge numbers of files
– Direct access from various operating systems

• Quick and simple setup – Once the HW is set up
• Quick and easy updates - probably
• Simple Infrastructure requirements – Specialized
HW…
• No delays doing its job

• Can do SOME of the job requirements
• Exceptions: Can’t do all those without
additional software
– Run a pilot to verify it is worth it – too costly for a
pilot?
– Process Data faster
– Process Data cheaper
– Flexibility and Lock-In prevention

Cloud based VMs + Hadoop Engine

• Amazon Elastic MapReduce
(Amazon EMR)

• Google Compute Engine

Amazon Elastic MapReduce
(Amazon EMR)
• Type: Cloud based VMs + Hadoop Engine
• Cost Effective (not always = cheap!)
• Includes Hadoop SW such as MapR including all
MapR advanced SW based File Services
• Easily add or remove nodes
– Preset VMs
– Easy mass deployment using AWS console
• HA integrated into Amazon S3
• Hadoop HBase DB as an EMR service
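
Why "cost effective" is not always "cheap" can be sketched with simple arithmetic. This is a minimal Python illustration only: the hourly price and the owned-node cost below are made-up placeholder numbers, not Amazon pricing, and the function names are hypothetical.

```python
# Pay-per-hour cloud nodes vs. owned nodes: a rough break-even sketch.
# All prices are hypothetical placeholders, not real vendor pricing.

PRICE_PER_NODE_HOUR = 0.50    # assumption: cloud price per node per hour
OWNED_COST_PER_NODE = 4000.0  # assumption: total cost of owning one node


def cloud_cost(nodes: int, hours: float, price: float = PRICE_PER_NODE_HOUR) -> float:
    """Cost of renting `nodes` cloud nodes for `hours` hours each."""
    return nodes * hours * price


def break_even_hours(owned_cost: float = OWNED_COST_PER_NODE,
                     price: float = PRICE_PER_NODE_HOUR) -> float:
    """Node-hours after which renting costs as much as owning one node."""
    return owned_cost / price


if __name__ == "__main__":
    # A short pilot: 10 nodes for 40 hours.
    print(f"Pilot cost: {cloud_cost(10, 40):.2f}")
    # Continuous use: how long until renting matches the owned-node cost?
    hours = break_even_hours()
    print(f"Break-even after {hours:.0f} hours (~{hours / 24:.0f} days of 24/7 use)")
```

With these placeholder numbers a short pilot is inexpensive, while a cluster that runs around the clock eventually costs more than owned hardware; the real answer depends entirely on actual prices and utilization.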

Google Compute Engine Special
Features
• Type: Cloud based VMs + Hadoop Engine
• Based on CentOS (nice – open…)
• Various disk types (all encrypted, fast)
– Non Persistent (dies with the VM)
– Persistent – shared + snapshots
– Cloud based (looks similar to Amazon S3)
• Cheaper than Amazon?

Amazon Elastic MapReduce
(Amazon EMR)
• Quick and simple setup
• Quick and easy updates - probably

Amazon Elastic MapReduce
(Amazon EMR)
• Exceptions:
– Extend your Existing Data Warehouse to include Non-
Structured Data - Your DW out in the cloud …
– Run a pilot to verify it is worth it – Excels at this!
– Static or Continuous Analysis of Data
– Turn Operational Data into Assets –
Operational in the Cloud…

Cloud Based API with Hooks to
Hadoop
• Google App Engine MapReduce

• Microsoft Big Data via Windows Azure

Google App Engine MapReduce
• open-source library for doing MapReduce on
the Google App Engine platform
• Can process data store entities and blob files
(probably Google Cloud Storage)
• Both in memory and disk operation
• Scale up or down “working threads”
• Python and Java support
• Experimental, but still allows a look into the future…
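
For readers new to the MapReduce model itself, here is a minimal, single-process Python sketch of the map, shuffle, and reduce steps using a word count. It is plain Python for illustration only and does not use the App Engine MapReduce library's API; the names map_reduce, word_mapper, and count_reducer are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a tiny in-memory MapReduce: map, group by key, then reduce."""
    groups = defaultdict(list)
    # Map phase: each record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle step: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: collapse each key's values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


if __name__ == "__main__":
    lines = ["big data appliance", "big data big insights"]
    print(map_reduce(lines, word_mapper, count_reducer))
```

A real framework runs the map and reduce phases on many workers in parallel and moves the grouped data between them; the programming model stays the same.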

• Quick and simple setup – Once you learn the
API
• Reliable underlying system – still Beta…

• Can do SOME of the job requirements
• Exceptions:
– Extend your Existing Data Warehouse – Cloud Security
and DW
– Run a pilot to verify it is worth it – could be great!
– Static or Continuous Analysis of Data
– Flexibility and Lock-In prevention – Code is open, but
Process may not be
– Turn Operational Data into Assets – Cloud Security…

Microsoft Big Data via Windows Azure

• Provides SQL Server Hadoop Connector
• Provides ODBC Hadoop connector to tie MS
Office and other Apps to Hadoop Hive
• Seems similar to DW providers who have a
connector to Hadoop
– Reason: It is not clear exactly where and how
the Azure Cloud implementation goes…

Which Big Data Appliance should YOU
use?
• Let’s look at the Big Data Appliance Job to be
Done and ask questions:
• Where are you and what is your goal?
– Do you have some of the puzzle pieces?
– Any constraints?
– Long term vs. Short term?
– (Always start with a Pilot, if this is your first time…)
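
One way to turn these questions into a comparison is a simple weighted score over the "great appliance" criteria listed earlier. The Python sketch below is only an illustration: the 0 to 2 rating scale, the weights, and the candidates option_a and option_b are hypothetical placeholders, not ratings of any real product.

```python
# Weighted scoring of candidate appliances against the "great appliance" criteria.
# Ratings use a hypothetical 0 (poor) to 2 (great) scale; weights reflect YOUR priorities.

CRITERIA = [
    "does the job",
    "quick and simple setup",
    "quick and easy updates",
    "easy control of instances",
    "simple infrastructure",
    "reliable underlying system",
    "no delays",
]


def weighted_score(ratings, weights):
    """Weighted average rating; criteria missing from `ratings` count as 0."""
    total_weight = sum(weights.values())
    return sum(w * ratings.get(c, 0) for c, w in weights.items()) / total_weight


def rank(candidates, weights):
    """Return candidate names ordered from best to worst weighted score."""
    return sorted(candidates,
                  key=lambda name: weighted_score(candidates[name], weights),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical priorities: simple infrastructure matters three times as much.
    weights = {c: 1 for c in CRITERIA}
    weights["simple infrastructure"] = 3

    # Hypothetical candidates, not real products.
    option_a = {c: 2 for c in CRITERIA}
    option_a["simple infrastructure"] = 0   # strong, but needs specialized HW
    option_b = {c: 1 for c in CRITERIA}     # average across the board

    candidates = {"option_a": option_a, "option_b": option_b}
    for name in rank(candidates, weights):
        print(f"{name}: {weighted_score(candidates[name], weights):.2f}")
```

Changing the weights to match your constraints (short term vs. long term, pilot vs. production) can change the ranking, which is the point of asking the questions first.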

Challenges, Pitfalls, and
Winning the Big Data Game

• You can’t get much out of Big Data if you don’t
know how to find useful insights (Lack of Data
Scientists)
• The abilities you needed for Data Warehouse
digging are needed even more with Big Data
• Commoditization of the data warehouse
(Hadoop + Cloud) = More players and
innovation

• You can’t make use of it if you lack the innovative,
quick, agile ability to change direction and
respond on time
• Privacy (implied and specific)
• Security (implied and specific)
• To keep costs low (many x86 nodes) you need a
mass node management app
• Big DW Vendors embrace Hadoop through
solution providers such as Cloudera and
HortonWorks, but it “feels” a bit “vague”

Where is all this leading YOU to?
• The Simple Stuff (I know it looks complicated)
– Crunching More and Faster for Less
– Optimizing the Process and Utilizing the right Tools
• The real challenge: Turning Data into an Asset
– Finding: The Golden Nuggets
– Deciding: What should I do now?
– Pitching and leading: The Transformation
• Big Data does not mean Endless Capacity…
• Don’t get lost in the Technology Playground

Q&A Soon…But First,
I need Your Help now…

1. Please rate the Webinar
2. Download the resource attachments for
future use
3. Register to my channel on BrightTalk
4. Spread the word
5. Have fun with Big Data and Enjoy Life

