Piranha vs. Mammoth
 Predator Appliances
  chew up BIG DATA
• Appliances are Small and Quick, Right?
• Revealing the 6 Types of Big Data Appliances
• Uncovering the Main Players
• Which Big Data Appliance should YOU use?
• Challenges, Pitfalls, and Winning the Big Data
  Game
• Where is all this leading YOU to?
Appliances are Small and Quick, Right?
Well, in some cases.
But a Big Data Appliance can be…
              BIG…




    Quantum StorNext M330 Presented on YouTube
    http://www.youtube.com/watch?v=X1IZpoyHxlY
So what makes a great appliance?
But first, let’s get to know You
(Big Data Appliance Poll #1…)
How deep have you dived into Big
                 Data?
A.   Just starting to learn it
B.   Learning a lot, nothing done yet
C.   Planning a Big Data Project
D.   Running a Big Data Operation
E.   I don't get it Yet! What's all the fuss about it?
Results…
So what makes a great appliance?
1.   Does the job – no more, no less
2.   Quick and simple setup
3.   Quick and easy updates
4.   Easy control of one or many instances
5.   Simple Infrastructure requirements
6.   Reliable underlying system
7.   No delays doing its job
8.   What else?
What’s the most important Job for a
     Great Appliance? (Poll #2)
A.   Does the job on time – no more, no less
B.   Quick and simple Setup and Updates
C.   Easy control of one or many instances
D.   Simple Infrastructure requirements
E.   Reliable underlying system
Results…
What is the job for your Big Data
               Appliance?
1. Extend your Existing Data Warehouse to include Non-
    Structured Data?
2. Discover new types of insights to Increase Innovation
3. Run a pilot to verify it is worth it
4. Process more (types of) Data
5. Process Data faster
6. Process Data cheaper
7. Static or Continuous Analysis of Data
8. Flexibility and Lock-In prevention (yes, sure :-)) - Hadoop
9. Turn Operational Data into Assets
10. Break Data Silo barriers
11. Stick to existing Data vendors or work with new ones
Revealing the 6 Types of Big Data
              Appliances
• Hadoop Engine - Software Based Appliance
• Data Warehouse Hardware Engine + API to
  Hadoop / Analytics
• Hardware Storage “Only”
• Software Based Appliance,
  Compatible to Hadoop
• Cloud based VMs + Hadoop Engine
• Cloud Based API with Hooks to Hadoop
What type of Big Data Appliance will
         you use? (Poll #3)
A. Hadoop Engine or Compatible - Software
   Based
B. Data Warehouse Hardware Engine + API to
   Hadoop / Analytics
C. Hardware Storage “Only”
D. Cloud based VMs + Hadoop Engine
E. Cloud Based API with Hooks to Hadoop
Results…
Uncovering (some of) the Main Players
Hadoop Engine - Software Based
            Appliances
• Oracle
• Cloudera
• Hortonworks (like Cloudera; partnering with VMware,
  Microsoft, Teradata, …)
• MapR (available on Amazon EMR
  and Google Compute Engine VMs)
• Red Hat Storage 2.0 Beta (Includes
  compatibility for Apache Hadoop)
Oracle Big Data Appliance
•   End goal: Get data into Oracle Database 11g
•   Includes open source Hadoop (Now Cloudera)
•   Oracle NoSQL Database (JVM DB vs. HDFS!)
•   Oracle Loader for Hadoop (more next slide)
•   Open source distribution of R
•   Oracle Linux + Oracle Java HotSpot VM
Oracle Big Data Appliance
• Oracle Data Integrator + Hadoop API
  – Easy upload to HDFS by automating MapReduce jobs
  – Validate constraints on Hive tables
  – Add data to Hive tables
  – Upload to Oracle using Oracle Loader for Hadoop
  – Allows querying Hive tables, using Oracle SQL, via a
    “connector” Oracle table
Oracle Big Data Appliance
•   Type: Hadoop Engine - Software Based Appliance
•   Does the job – See next slide
•   Quick and simple setup – Medium (Oracle)
•   Quick and easy updates – Medium (Oracle/CDH?)
•   Easy control of one or many instances
•   Simple Infrastructure requirements – Medium (Oracle)
•   Reliable underlying system
•   No delays doing its job - ?
•   What else?
    – Great if you’ve got Oracle already
    – Add on to Oracle Exadata Hardware / Data Warehouse
Oracle Big Data Appliance
• Can do most of the job requirements
• Exceptions:
  – Process Data faster – Looks like…
  – Process Data cheaper – Oracle is not a cheap
    product…
  – Flexibility and Lock-In prevention - Medium
Cloudera
• Integrated, Tested collection of Open Source
  Apache Hadoop (more next slide)
• HDFS is the NOSQL Database...
• Management Console for rapid node deploy
• Free up to X nodes
• Paid Enterprise Subscription, includes support
• Integrated into a bunch of Data software
  Giants
Cloudera Included Open Source Mods:
•   Apache HBase: HDFS-based tables
•   Apache Hive: SQL-like language
•   Apache Mahout: Machine Learning algorithms
•   Apache Pig: High-level data flow language
•   Apache Sqoop: Engine integrating with SQL DBs
•   Apache Whirr: Deploys Hadoop in the cloud
•   Hue: Browser-based interface for Hadoop
Cloudera
•   Type: Hadoop Engine - Software Based Appliance
•   Does the job – See next slide
•   Quick and simple setup – Great once first node set
•   Quick and easy updates
•   Easy control of one or many instances
•   Simple Infrastructure requirements
•   Reliable underlying system
•   No delays doing its job - maybe
•   What else?
    – Easy to start as a pilot!
    – Great for old hardware
Cloudera
• Can do most of the job requirements
• Exceptions:
  – Process Data faster – depends on allocated
    resources
  – Process Data cheaper – Yes (but cheap HW can be
    costly)
  – Static or Continuous Analysis – needs more tools
  – Endorsement from Huge Players
MapR Special Features
            (Do You need it?)
• ExpressLane – Small jobs finish quickly (medium)
• Mount / use HDFS over NFS (strategic?)
• NFS, allows data streaming (Important/lock in?)
• Volumes (manage, mirror, snap) – (Important?)
• X times more scalable / faster (lock in?)
• Name Node and Job Tracker HA (claims regular
  hadoop has only 1 Name Node) (Medium)
• SW Snapshot/Mirror (Fast? Complex?)
Data Warehouse Hardware Engine +
      API to Hadoop / Analytics
• Teradata Aster MapReduce Appliance
• EMC Greenplum
• IBM Netezza + Cloudera/Hadoop
   as part of IBM’s Big Data Solutions Suite
• Cray Big Data Appliance, Urika
  (YarcData Division)
Teradata Aster MapReduce Appliance
•   Hadoop is not at the front, MapReduce is
•   Short learning curve, using current DW tools
•   MPP is already built in for scale as part of DW
•   Reliability and performance handled by HW
•   Connectivity (JDBC, ODBC) to Big Data: Cloudera
•   Price is probably higher than Hadoop solutions
•   Platform: SUSE Linux
•   Aster Data nCluster Amazon AWS Cloud Edition
Teradata Aster MapReduce Appliance
• Type: Data Warehouse Hardware Engine + API to
  Hadoop / Analytics
• Does the job – See next slide
• Quick and simple setup –
• Quick and easy updates
• Easy control of one or many instances
• Simple Infrastructure requirements – Specialized
  HW…
• Reliable underlying system
• No delays doing its job - maybe
Teradata Aster MapReduce Appliance
• Can do most of the job requirements
• Exceptions:
  – Run a pilot to verify it is worth it – probably
    pricy…unless using the Software / Cloud editions
  – Process Data cheaper – probably not so…
  – Static or Continuous Analysis of Data – Should Excel!
  – Lock-In – probably, not sure how much
• Turn Operational Data into Assets - Should Excel
  at this…
Hardware Storage “Only”
• DataDirect Networks Big Data Storage
  Appliances

• Quantum StorNext Metadata Appliances
DataDirect Networks Big Data Storage
             Appliances
• “Science Fiction” I/O Performance
  – Single Array: 40 GB/s and 1.4 Million Flash IOPS
  – Up to 25 FC/InfiniBand hooked arrays: 1 TB/s +
  – More info and pricing
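As a quick check on the two figures quoted above, here is a minimal sketch of the arithmetic. It assumes throughput scales linearly across arrays and uses decimal units (1 TB = 1000 GB); both are assumptions for illustration, and the function name is hypothetical.

```python
# Figures quoted on the slide; linear scaling across arrays is an assumption.
SINGLE_ARRAY_GB_PER_S = 40
MAX_ARRAYS = 25


def aggregate_throughput_gb_per_s(arrays: int,
                                  per_array: float = SINGLE_ARRAY_GB_PER_S) -> float:
    """Ideal aggregate throughput if every array delivers its full rate."""
    if arrays < 0:
        raise ValueError("number of arrays cannot be negative")
    return arrays * per_array


total = aggregate_throughput_gb_per_s(MAX_ARRAYS)
print(f"{MAX_ARRAYS} arrays x {SINGLE_ARRAY_GB_PER_S} GB/s = {total} GB/s "
      f"({total / 1000} TB/s)")
```

Under those assumptions, 25 arrays at 40 GB/s each land at the 1 TB/s mark the slide mentions; real aggregate throughput depends on the interconnect and workload.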
Quantum StorNext Metadata
             Appliances
• Special additional features:
  – Huge file size support
  – Huge number of files support
  – Varying Operating System direct access support
Hardware Storage “Only”
• Does the job – See next slide
• Quick and simple setup – Once you set the HW
• Quick and easy updates - probably
• Easy control of one or many instances
• Simple Infrastructure requirements – Specialized
  HW…
• Reliable underlying system
• No delays doing its job
Hardware Storage “Only”
• Can do SOME of the job requirements
• Exceptions: Can’t do all those without
  additional software
  – Run a pilot to verify it is worth it – too costly for a
    pilot?
  – Process Data faster
  – Process Data cheaper
  – Flexibility and Lock-In prevention
Cloud based VMs + Hadoop Engine

• Amazon Elastic MapReduce
  (Amazon EMR)

• Google Compute Engine
Amazon Elastic MapReduce
            (Amazon EMR)
• Type: Cloud based VMs + Hadoop Engine
• Cost Effective (not always = cheap!)
• Includes Hadoop SW such as MapR including all
  MapR advanced SW based File Services
• Easily add or remove nodes
  – Pre set VMs
  – Easy mass deployment using AWS console
• HA integrated into Amazon S3
• Hadoop HBase DB as EMR service
Google Compute Engine Special
              Features
• Type: Cloud based VMs + Hadoop Engine
• Based on CentOS (nice – open…)
• Various disk types (all encrypted, fast)
  – Non Persistent (dies with the VM)
  – Persistent – shared + snapshots
  – Cloud based (looks similar to Amazon S3)
• Cheaper than Amazon?
Amazon Elastic MapReduce
              (Amazon EMR)
•   Does the job – See next slide
•   Quick and simple setup
•   Quick and easy updates - probably
•   Easy control of one or many instances
•   Simple Infrastructure requirements
•   Reliable underlying system
•   No delays doing its job
Amazon Elastic MapReduce
            (Amazon EMR)
• Can do most of the job requirements
• Exceptions:
  – Extend your Existing Data Warehouse to include Non-
    Structured Data - Your DW out in the cloud …
  – Run a pilot to verify it is worth it – Excels at this!
  – Process Data faster
  – Process Data cheaper
  – Static or Continuous Analysis of Data
  – Turn Operational Data into Assets –
    Operational in the Cloud…
Cloud Based API with Hooks to
              Hadoop
• Google App Engine MapReduce



• Microsoft Big Data via Windows Azure
Google App Engine MapReduce
• open-source library for doing MapReduce on
  the Google App Engine platform
• Can process data store entities and blob files
  (probably Google Cloud Storage)
• Both in memory and disk operation
• Scale up or down “working threads”
• Python and Java support
• Experimental, still allows a look into the future…
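For readers new to the model, the map, group, and reduce steps can be sketched in plain Python. This is an in-memory illustration only; it does not use the App Engine library's API, and the names `map_reduce`, `word_mapper`, and `count_reducer` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(records: Iterable[str],
               mapper: Callable[[str], Iterator[Tuple[str, int]]],
               reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Run map, group-by-key (shuffle), and reduce in a single process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        # Map phase: each record emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: collapse each key's values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(word: str, counts: List[int]) -> int:
    """Sum the per-occurrence counts for one word."""
    return sum(counts)


lines = ["big data appliances", "big data game"]
counts = map_reduce(lines, word_mapper, count_reducer)
print(counts)
```

A distributed engine runs the same three phases, but spreads the map and reduce calls across many workers and moves the grouped data between them.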
Google App Engine MapReduce
• Does the job – See next slide
• Quick and simple setup – Once you learn the
  API
• Quick and easy updates
• Easy control of one or many instances
• Simple Infrastructure requirements
• Reliable underlying system – still Beta…
• No delays doing its job
Google App Engine MapReduce
• Can do SOME of the job requirements
• Exceptions:
  – Extend your Existing Data Warehouse – Cloud Security
    and DW
  – Run a pilot to verify it is worth it – could be great!
  – Process Data faster
  – Process Data cheaper
  – Static or Continuous Analysis of Data
  – Flexibility and Lock-In prevention – Code is open, but
    Process may not be
  – Turn Operational Data into Assets – Cloud Security…
Microsoft Big Data via Windows Azure

• Provides SQL Server Hadoop Connector
• Provides ODBC Hadoop connector to tie MS
  Office and other Apps to Hadoop Hive
• Seems similar to DW providers who have
  connector to Hadoop
  – Reason: It is not clear exactly where and how
    Azure Cloud Implementation goes…
Which Big Data Appliance should YOU
                use?
• Let’s look at the Big Data Appliance Job to be
  Done and ask questions:
• Where are you and what is your goal?
  – So you have some of the puzzle pieces?
  – Any constraints?
  – Long term vs. Short term?
  – (Always start with a Pilot, if this is your first time…)
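One way to turn these questions into a comparison is a simple weighted score per candidate. The sketch below is only an illustration: the criteria names are borrowed from the "job" list earlier in the deck, while the weights, ratings, and option names are made-up placeholders, not assessments of any real product.

```python
from typing import Dict


def score_appliance(ratings: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted average of 0-5 ratings; criteria missing from ratings count as 0."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one criterion needs a non-zero weight")
    return sum(weights[c] * ratings.get(c, 0) for c in weights) / total_weight


# Hypothetical weights: how much each job matters to you.
weights = {"run a pilot": 3, "process data cheaper": 2, "lock-in prevention": 1}

# Hypothetical ratings (0 = poor, 5 = excellent) for two placeholder options.
candidates = {
    "option A": {"run a pilot": 5, "process data cheaper": 3, "lock-in prevention": 2},
    "option B": {"run a pilot": 2, "process data cheaper": 2, "lock-in prevention": 4},
}

ranked = sorted(candidates,
                key=lambda name: score_appliance(candidates[name], weights),
                reverse=True)
for name in ranked:
    print(f"{name}: {score_appliance(candidates[name], weights):.2f}")
```

The numbers matter less than the exercise: writing down the weights forces the goal, constraints, and time horizon to be stated before comparing vendors.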
Challenges, Pitfalls, and
       Winning the Big Data Game
• You can’t get much out of Big Data if you don’t
  know how to find useful insights (lack of Data
  Scientists)
• The same abilities you needed for Data
  Warehouse digging, you need with Big
  Data, only more so
• Commoditization of the data warehouse
  (hadoop + Cloud) = More players and
  innovation
Challenges, Pitfalls, and
       Winning the Big Data Game
• You can’t make use of it, if you lack innovative
  quick agile abilities to change direction and
  respond on time
• Privacy (implied and specific)
• Security (implied and specific)
• To keep costs low (many x86 nodes) you need a mass
  node management app
• Big DW Vendors embrace hadoop through
  solution providers such as Cloudera and
  HortonWorks, but it “feels” a bit “vague”
Where is all this leading YOU to?
• The Simple Stuff (I know it looks complicated)
   – Crunching More and Faster for Less
   – Optimizing the Process and Utilizing the right Tools
• The real challenge: Turning Data into an Asset
   – Finding: The Golden Nuggets
   – Deciding: What should I do now?
   – Pitching and leading: The Transformation
• Big Data does not mean Endless Capacity…
• Don’t get lost in the Technology Playground
Q&A Soon…But First,
         I need Your Help now…




1. Please rate the Webinar
2. Download the resource attachments for
   future use
3. Subscribe to my channel on BrightTalk
4. Spread the word
5. Have fun with Big Data and Enjoy Life :-)
Questions?
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 

Último (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Piranha vs. mammoth predator appliances that chew up big data

  • 1. Piranha vs. Mammoth Predator Appliances chew up BIG DATA
  • 2. Piranha vs. Mammoth Predator Appliances chew up BIG DATA • Appliances are Small and Quick, Right? • Revealing the 6 Types of Big Data Appliances • Uncovering the Main Players • Which Big Data Appliance should YOU use? • Challenges, Pitfalls, and Winning the Big Data Game • Where is all this leading YOU to?
  • 3. Appliances are Small and Quick, Right?
  • 4. Well, in some cases. But, Big Data Appliance can be… BIG… Quantum StorNext M330 Presented on YouTube http://www.youtube.com/watch?v=X1IZpoyHxlY
  • 5. So what makes a great appliance?
  • 6. But first, let’s get to know You (Big Data Appliance Poll #1…)
  • 7. How deep have you dived into Big Data? A. Just starting to learn it B. Learning a lot, nothing done yet C. Planning a Big Data Project D. Running a Big Data Operation E. I don't get it Yet! What's all the fuss about it?
  • 9. So what makes a great appliance?
  • 10. So what makes a great appliance? 1. Does the job – no more, no less 2. Quick and simple setup 3. Quick and easy updates 4. Easy control of one or many instances 5. Simple Infrastructure requirements 6. Reliable underlying system 7. No delays doing its job 8. What else?
  • 11. What’s the most important Job for a Great Appliance? (Poll #2)
  • 12. What’s the most important Job for a Great Appliance? (Poll #2) A. Does the job on time – no more, no less B. Quick and simple Setup and Updates C. Easy control of one or many instances D. Simple Infrastructure requirements E. Reliable underlying system
  • 14. What is the job for your Big Data Appliance?
  • 15. What is the job for your Big Data Appliance? 1. Extend your Existing Data Warehouse to include Non-Structured Data? 2. Discover new types of insights to Increase Innovation 3. Run a pilot to verify it is worth it 4. Process more (types of) Data 5. Process Data faster 6. Process Data cheaper 7. Static or Continuous Analysis of Data 8. Flexibility and Lock-In prevention (yes, sure :-)) - Hadoop 9. Turn Operational Data into Assets 10. Break Data Silo barriers 11. Stick to existing Data vendors or work with new ones
  • 16. Revealing the 6 Types of Big Data Appliances
  • 17. Revealing the 6 Types of Big Data Appliances • Hadoop Engine - Software Based Appliance • Data Warehouse Hardware Engine + API to Hadoop / Analytics • Hardware Storage “Only” • Software Based Appliance, Compatible with Hadoop • Cloud based VMs + Hadoop Engine • Cloud Based API with Hooks to Hadoop
  • 18. What type of Big Data Appliance will you use? (Poll #3)
  • 19. What type of Big Data Appliance will you use? (Poll #3) A. Hadoop Engine or Compatible - Software Based B. Data Warehouse Hardware Engine + API to Hadoop / Analytics C. Hardware Storage “Only” D. Cloud based VMs + Hadoop Engine E. Cloud Based API with Hooks to Hadoop
  • 21. Uncovering (some of) the Main Players
  • 22. Hadoop Engine - Software Based Appliances • Oracle • Cloudera • HortonWorks (like Cloudera Co-Op VMware, Microsoft, TeraData,…) • MapR Available on Amazon EMR and Google Compute Engine VMs • Red Hat Storage 2.0 Beta (Includes compatibility for Apache Hadoop)
  • 23. Oracle Big Data Appliance • End goal: Get data into Oracle Database 11g • Includes open source Hadoop (Now Cloudera) • Oracle NoSQL Database (JVM DB vs. HDFS!) • Oracle Loader for Hadoop (more next slide) • Open source distribution of R • Oracle Linux + Oracle Java HotSpot VM
  • 24. Oracle Big Data Appliance • Oracle Data Integrator + Hadoop API – Easy upload to HDFS by automating MAP-R – Validate constraints of Hives – Add Data to Hives – Upload to Oracle using Oracle Loader for Hadoop – Allows query of Hives, using Oracle SQL, via a “connector” Oracle Table
  • 25. Oracle Big Data Appliance • Type: Hadoop Engine - Software Based Appliance • Does the job – See next slide • Quick and simple setup – Medium (Oracle) • Quick and easy updates – Medium (Oracle/CDH?) • Easy control of one or many instances • Simple Infrastructure requirements – Medium (Oracle) • Reliable underlying system • No delays doing its job - ? • What else? – Great if you’ve got Oracle already – Add on to Oracle Exadata Hardware / Data Warehouse
  • 26. Oracle Big Data Appliance • Can do most of the job requirements • Exceptions: – Process Data faster – Looks like… – Process Data cheaper – Oracle is not a cheap product… – Flexibility and Lock-In prevention - Medium
  • 27. Cloudera • Integrated, Tested collection of Open Source Apache Hadoop (more next slide) • HDFS is the NOSQL Database... • Management Console for rapid node deploy • Free up to X nodes • Paid Enterprise Subscription, includes support • Integrated into a bunch of Data software Giants
  • 28. Cloudera Included Open Source Mods: • Apache HBase HDFS based tables • Apache Hive SQL-like language • Apache Mahout Machine Learning algorithms • Apache Pig High-level data flow language • Apache Sqoop Engine integrating with SQLDBs • Apache Whirr to deploy Hadoop in the cloud • Hue Browser-based interface for Hadoop
  • 29. Cloudera • Type: Hadoop Engine - Software Based Appliance • Does the job – See next slide • Quick and simple setup – Great once first node set • Quick and easy updates • Easy control of one or many instances • Simple Infrastructure requirements • Reliable underlying system • No delays doing its job - maybe • What else? – Easy to start as a pilot! – Great for old hardware
  • 30. Cloudera • Can do most of the job requirements • Exceptions: – Process Data faster – depends on allocated resources – Process Cheaper – Yes (but cheap HW can be costly) – Static or Continuous Analysis – needs more tools – Endorsement from Huge Players
  • 31. MapR Special Features (Do You need it?) • ExpressLane – Small jobs finish quickly (medium) • Mount / use HDFS over NFS (strategic?) • NFS, allows data streaming (Important/lock in?) • Volumes (manage, mirror, snap) – (Important?) • X times more scalable / faster (lock in?) • Name Node and Job Tracker HA (claims regular Hadoop has only 1 Name Node) (Medium) • SW Snapshot/Mirror (Fast? Complex?)
  • 32. Data Warehouse Hardware Engine + API to Hadoop / Analytics • TeraData Aster MapR Appliance • EMC GreenPlum • IBM Netezza + Cloudera/Hadoop as part of IBM’s Big Data Solutions Suite • Cray Big Data Appliance, Urika (YarcData Division)
  • 33. TeraData Aster MapR Appliance • Hadoop is not at the front, MAP Reduce is • Short learning curve, using current DW tool • MPP is already built in for scale as part of DW • Reliability and Performance done by HW • Connectivity (JDBC,ODBC) to Big Data: Cloudera • Guess Price is higher than Hadoop solutions • Platform: SuSE Linux • Aster Data nCluster Amazon AWS Cloud Edition
  • 34. TeraData Aster MapR Appliance • Type: Data Warehouse Hardware Engine + API to Hadoop / Analytics • Does the job – See next slide • Quick and simple setup – • Quick and easy updates • Easy control of one or many instances • Simple Infrastructure requirements – Specialized HW… • Reliable underlying system • No delays doing its job - maybe
  • 35. TeraData Aster MapR Appliance • Can do most of the job requirements • Exceptions: – Run a pilot to verify it is worth it – probably pricy…unless using the Software / Cloud editions – Process Data cheaper – probably not so… – Static or Continuous Analysis of Data – Should Excel! – Lock-In – probably, not sure how much • Turn Operational Data into Assets - Should Excel at this…
  • 36. Hardware Storage “Only” • DataDirect Networks Big Data Storage Appliances • Quantum StorNext Metadata Appliances
  • 37. DataDirect Networks Big Data Storage Appliances • “Science Fiction” I/O Performance – Single Array: 40 GB/s and 1.4 Million Flash IOPS – Up to 25 FC/Infiniband hooked arrays: 1 TB/s+ – More info and pricing
  • 38. Quantum StorNext Metadata Appliances • Special additional features: – Huge file size support – Huge amount of files support – Varying Operating System direct access support
  • 39. Hardware Storage “Only” • Does the job – See next slide • Quick and simple setup – Once you set the HW • Quick and easy updates - probably • Easy control of one or many instances • Simple Infrastructure requirements – Specialized HW… • Reliable underlying system • No delays doing its job
  • 40. Hardware Storage “Only” • Can do SOME of the job requirements • Exceptions: Can’t do all those without additional software – Run a pilot to verify it is worth it – too costly for a pilot? – Process Data faster – Process Data cheaper – Flexibility and Lock-In prevention
  • 41. Cloud based VMs + Hadoop Engine • Amazon Elastic MapReduce (Amazon EMR) • Google Compute Engine
  • 42. Amazon Elastic MapReduce (Amazon EMR) • Type: Cloud based VMs + Hadoop Engine • Cost Effective (not always = cheap!) • Includes Hadoop SW such as MapR including all MapR advanced SW based File Services • Easily add or remove nodes – Pre set VMs – Easy mass deployment using AWS console • HA integrated into Amazon S3 • Hadoop HBase DB as EMR service
  • 43. Google Compute Engine Special Features • Type: Cloud based VMs + Hadoop Engine • Based on CentOS (nice – open…) • Various disk types (all encrypted, fast) – Non Persistent (dies with the VM) – Persistent – shared + snapshots – Cloud based (looks similar to Amazon S3) • Cheaper than Amazon?
  • 44. Amazon Elastic MapReduce (Amazon EMR) • Does the job – See next slide • Quick and simple setup • Quick and easy updates - probably • Easy control of one or many instances • Simple Infrastructure requirements • Reliable underlying system • No delays doing its job
  • 45. Amazon Elastic MapReduce (Amazon EMR) • Can do most of the job requirements • Exceptions: – Extend your Existing Data Warehouse to include Non-Structured Data - Your DW out in the cloud … – Run a pilot to verify it is worth it – Excels at this! – Process Data faster – Process Data cheaper – Static or Continuous Analysis of Data – Turn Operational Data into Assets Operational in the Cloud…
  • 46. Cloud Based API with Hooks to Hadoop • Google APP Engine Map Reduce • Microsoft Big Data via Windows Azure
  • 47. Google APP Engine Map Reduce • open-source library for doing MapReduce on the Google App Engine platform • Can process data store entities and blob files (probably Google Cloud Storage) • Both in memory and disk operation • Scale up or down “working threads” • Python and Java support • Experimental, still allows a look into the future…
  • 48. Google APP Engine Map Reduce • Does the job – See next slide • Quick and simple setup – Once you learn the API • Quick and easy updates • Easy control of one or many instances • Simple Infrastructure requirements • Reliable underlying system – still Beta… • No delays doing its job
  • 49. Google APP Engine Map Reduce • Can do SOME of the job requirements • Exceptions: – Extend your Existing Data Warehouse – Cloud Security and DW – Run a pilot to verify it is worth it – could be great! – Process Data faster – Process Data cheaper – Static or Continuous Analysis of Data – Flexibility and Lock-In prevention – Code is open, but Process may not be – Turn Operational Data into Assets – Cloud Security…
  • 50. Microsoft Big Data via Windows Azure • Provides SQL Server Hadoop Connector • Provides ODBC Hadoop connector to tie MS Office and other Apps to Hadoop Hive • Seems similar to DW providers who have connector to Hadoop – Reason: It is not clear exactly where and how Azure Cloud Implementation goes…
  • 51. Which Big Data Appliance should YOU use?
  • 52. Which Big Data Appliance should YOU use? • Let’s look at the Big Data Appliance Job to be Done and ask questions: • Where are you and what is your goal? – So you have some of the puzzle pieces? – Any constraints? – Long term vs. Short term? – (Always start with a Pilot, if this is your first time…)
  • 54. Challenges, Pitfalls, and Winning the Big Data Game • You can’t get much out of Big Data if you don’t know how to find useful insights (Lack of Data Scientists) • The same abilities you needed for Data Warehouse digging, you need with Big Data, even more • Commoditization of the data warehouse (Hadoop + Cloud) = More players and innovation
  • 55. Challenges, Pitfalls, and Winning the Big Data Game • You can’t make use of it, if you lack innovative quick agile abilities to change direction and respond on time • Privacy (implied and specific) • Security (implied and specific) • To pay cheap (many X86 nodes) you need Mass Node Management APP • Big DW Vendors embrace Hadoop through solution providers such as Cloudera and HortonWorks, but it “feels” a bit “vague”
  • 56. Where is all this leading YOU to?
  • 57. Where is all this leading YOU to? • The Simple Stuff (I know it looks complicated) – Crunching More and Faster for Less – Optimizing the Process and Utilizing the right Tools • The real challenge: Turning Data into an Asset – Finding: The Golden Nuggets – Deciding: What should I do now? – Pitching and leading: The Transformation • Big Data does not mean Endless Capacity… • Don’t get lost in the Technology Playground
  • 58. Q&A Soon…But First, I need Your Help now… 1. Please rate the Webinar 2. Download the resource attachments for future use 3. Register to my channel on BrightTalk 4. Spread the word 5. Have fun with Big Data and Enjoy Life :-)
  • 60. Reminder… 1. Please rate the Webinar 2. Download the resource attachments for future use 3. Register to my channel on BrightTalk 4. Spread the word 5. Have fun with Big Data and Enjoy Life :-)
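
Most of the appliances in the slides above (Hadoop engines, Amazon EMR, App Engine Map Reduce) are built around the MapReduce model. For readers new to it, here is a minimal single-process sketch of the map, shuffle and reduce phases as a word count. It is an illustration only and does not use Hadoop or any vendor API; the function names (map_words, shuffle, reduce_counts, run_job) and the sample input are assumptions made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence in one process (hypothetical driver)."""
    mapped = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Made-up sample input, not data from the webinar.
    sample = ["big data appliance", "Big Data pilot"]
    print(run_job(sample))
```

In a real cluster the map and reduce calls run in parallel on many nodes and the shuffle moves data across the network; what the appliances above differ on is how that distribution, storage and management is packaged.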