Everything you've always wanted to
      know about Big Data
        (But were afraid to ask)




           Howie Rosenshine
      howie.rosenshine@gmail.com



                        PhillyDB – 6/19/2012
Administrivia


    Why the title of the talk?

    When I use the unqualified term database I
    probably mean DBMS. Bad habit.

    When I use the unqualified term database I
    probably mean RDBMS (not NoSQL).
    Another habit... maybe not a bad one. That will be
    for you to decide.


                    Howie Rosenshine - Ergo Analytics
What is Big Data?


    Data whose size is such that it cannot
    (easily/economically) be processed within a single
    node (or a single "shared something" cluster)

    Or data on any smaller architecture that is capable
    of scaling to meet the definition above

    And what exactly do we mean by “processed”?




CRUD


    Create Read Update Destroy (plus a potentially
    huge amount of actual computing)

    Big Data examples tend to come from the “machine
    generated” domain, e.g. web crawling or
    tracking, realtime sensor data, log files, etc.

    So for Big Data, CRUD ⇒ CRud. Or perhaps:


     CRAP ⇒ Create Read Analytical Processing

Why is all this CRAP a problem?


    Because tools and architectures that have
    grown to support not(B) do not scale well to B,
    where B=Big Data.

    Why is this so?

    Scalability? No such thing. Bottlenecks!




Bottlenecks (Scalability)


    Consider the single-node RDB example (single
    node = shared everything)

    Bottlenecks can be hardware or software
    (probably software more often than not), e.g.
    contention on kernel locks for I/O will probably
    bite before you run out of disks to attach, PCI
    bandwidth, etc.

    But one or the other will bite eventually.
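
Note: a minimal sketch of the bottleneck idea above. Every capacity figure and function name here is an invented assumption, not a measurement: throughput is capped by whichever component saturates first, so adding disks stops helping once something else becomes the limit.

```python
# Hypothetical per-component capacity limits, in operations per second.
# The figures are invented for illustration only.
def node_throughput(disks, ops_per_disk=200, kernel_lock_limit=1500, pci_limit=4000):
    """Throughput is capped by whichever component saturates first."""
    disk_capacity = disks * ops_per_disk
    return min(disk_capacity, kernel_lock_limit, pci_limit)

def bottleneck(disks, ops_per_disk=200, kernel_lock_limit=1500, pci_limit=4000):
    """Name the component that limits throughput for a given disk count."""
    limits = {
        "disks": disks * ops_per_disk,
        "kernel lock": kernel_lock_limit,
        "PCI bandwidth": pci_limit,
    }
    return min(limits, key=limits.get)

for n in (2, 4, 8, 16):
    print(n, "disks ->", node_throughput(n), "ops/s, limited by", bottleneck(n))
```

With these assumed numbers, going from 8 to 16 disks changes nothing, because the (software) lock limit is hit first.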


Scalability Solution(?)


    Distribute!

    Multiple machines ⇒ multiple kernels,
    multiple I/O backplanes...yay!

    Shared something...yay?




Shared Something


    Shared logical disk, implemented with a pretty
    extensive inter-machine IPC locking mechanism.

    This will typically bottleneck long before any
    aggregated hardware limits are reached.

    Nevertheless, it is good enough to have become a
    dominant force in the OLTP industry.
    Note: this is not to say that you can’t do serious
    analytical processing on such an architecture.

    But what happens when your “really big data” exceeds
    this limit?

Big Data Strategies


    Shared nothing parallel relational database

    NoSQL (key/value stores)

    Map Reduce
    Note: Embarrassingly parallel problems require
    none of these. Examples:
        Static web pages.
        Wikipedia (at least w/o edits).
        Google Maps/Earth.
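
Note: a small sketch of why the examples above need no special machinery. The page table and the `serve` function are hypothetical stand-ins for a static web server; each request depends only on its own input, so the work can be split across any number of workers without coordination.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a set of static pages.
PAGES = {"/index.html": "<h1>Home</h1>", "/about.html": "<h1>About</h1>"}

def serve(path):
    """Return the page body for a path, or a 404 marker if it is unknown."""
    return PAGES.get(path, "404")

paths = ["/index.html", "/about.html", "/missing.html", "/index.html"]

# Any number of workers (or machines behind a load balancer) can split the
# requests, because no request reads or writes state used by another.
with ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(serve, paths))

print(responses)
```

Once edits (writes) enter the picture, workers have to agree on shared state, and the problem stops being embarrassingly parallel.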

Shared Nothing parallel RDB


    Shared nothing, obviously

    Partitioning/Sharding

    Columnar (typically)




Shared Nothing


    Well, “Nothing but Net”, that is

    Network should be fast, certainly for bandwidth,
    preferably for latency as well

    At least for some queries (see next section)




Partitioning/Sharding


    Ideally little/no inter-shard/inter-node
    communication (local/localized join)

    Data distribution/redistribution among shards

    Redundancy also allows for orthogonal
    sharding
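
Note: a minimal sketch of a localized join, assuming hash sharding on the join key. The tables, the shard count, and the helper names are all invented for illustration. When both tables are distributed on the same key, matching rows land on the same shard and each shard can join without talking to the others. Orthogonal sharding of redundant copies is not shown.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # assumed cluster size, for illustration

def shard_for(key):
    """Map a key to a shard with a stable hash (crc32 is deterministic across runs)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_SHARDS

# Hypothetical tables: (customer_id, name) and (order_id, customer_id, amount).
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
orders = [(100, 1, 25.0), (101, 2, 10.0), (102, 1, 5.0), (103, 3, 7.5)]

# Distribute both tables on the join key so matching rows share a shard.
customer_shards = defaultdict(list)
order_shards = defaultdict(list)
for row in customers:
    customer_shards[shard_for(row[0])].append(row)
for row in orders:
    order_shards[shard_for(row[1])].append(row)

def local_join(shard):
    """Join customers to orders using only the rows stored on one shard."""
    names = dict(customer_shards[shard])
    return [(names[cust], amount) for _, cust, amount in order_shards[shard]]

# Each shard joins independently; results are simply concatenated.
joined = sorted(pair for s in range(NUM_SHARDS) for pair in local_join(s))
print(joined)
```

If the orders were instead sharded on order_id, the join would need rows shipped between shards, which is the redistribution cost mentioned above.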




Columnar Store


    Columnar store, for the most part, at this point
    “Some RDBMSs are born columnar, and some
    have columnarness thrust upon them”

    Strong advantage for aggregation

    Also advantageous for compression
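
Note: a toy illustration of both advantages, using an invented four-row table. Plain Python lists stand in for on-disk columns, and run-length encoding is just one simple compression scheme chosen for the example.

```python
from itertools import groupby

# Hypothetical table, first stored row by row.
rows = [
    ("2012-06-19", "PA", 10),
    ("2012-06-19", "PA", 20),
    ("2012-06-19", "NJ", 5),
    ("2012-06-20", "NJ", 15),
]

# The same table stored column by column: one list per column.
columns = {
    "day": [r[0] for r in rows],
    "state": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# Aggregation touches only the one column it needs.
total = sum(columns["amount"])

def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

# Values within a column are similar, so simple schemes compress them well.
encoded_day = run_length_encode(columns["day"])
print(total, encoded_day)
```

A row store would have to read every field of every row to compute the same sum.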




NoSQL (Key/Value store) Types


    “Key value” store (simple key/value store)
    ⇒ Riak, Voldemort, etc.

    Document store (complex key/value store)
    ⇒ MongoDB, CouchDB, etc.

    Column oriented stores (tabular key/value
    store)
    ⇒ Bigtable, HBase, Cassandra, etc.
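
Note: a rough sketch of the three data models using plain Python dicts. It models only the shape of the data, not the API of Riak, MongoDB, HBase, or any other product; all keys, values, and the `find_by_city` helper are invented.

```python
# All keys and values below are invented examples.

# Simple key/value store: the value is an opaque blob; lookup is by key only.
kv_store = {"user:42": b'{"name": "Ann", "city": "Philadelphia"}'}

# Document store: the value is a structured document the store can look inside.
doc_store = {"user:42": {"name": "Ann", "address": {"city": "Philadelphia"}}}

# Column-oriented (tabular) store: row key -> column family -> column -> value.
# Rows may carry different columns from one another.
wide_store = {
    "user:42": {"profile": {"name": "Ann"}, "visits": {"2012-06-19": "3"}},
    "user:43": {"profile": {"name": "Bob", "email": "bob@example.com"}},
}

def find_by_city(store, city):
    """A document store can filter on a field inside the value."""
    return [k for k, doc in store.items() if doc["address"]["city"] == city]

print(kv_store["user:42"])                        # fetch by key, decode in the client
print(find_by_city(doc_store, "Philadelphia"))    # query on document contents
print(wide_store["user:43"]["profile"]["email"])  # address a single column
```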


NoSQL (Key/Value store)
            Characteristics

    Relatively low latency, targeted at transaction
    oriented data (simple transactions)

    Typically not ACID

    Typically no joins




Database vs Datastore?


    Is it ACID?

    Must a database be an instantiation of DBMS?

    “I shall not attempt to further define the
    characteristics of a database, but I know it
    when I see it, and this isn’t it”




Map Reduce
    (“And now for something completely different”)



    Practical general purpose (or as close as
    anyone has come) implicit parallel
    programming paradigm

    Attributed to Google, who published the
    original Map Reduce white paper.

    Open source implementation: Hadoop (Doug Cutting, Yahoo)
    Note: Hadoop is an “ecosystem”, not a “product”;
    however, the unqualified use of Hadoop is typically
    taken to mean the use of Hadoop map reduce
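
Note: a single-process sketch of the paradigm, not Hadoop's or Google's implementation. The function names are assumptions made for this example. The programmer writes only a mapper and a reducer; the framework (here, three small helpers) handles grouping, and in a real system would also handle distribution across machines.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every record, collecting (key, value) pairs."""
    return [pair for record in records for pair in mapper(record)]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}

# User-supplied functions for the classic word count.
def word_mapper(line):
    return [(word.lower(), 1) for word in line.split()]

def count_reducer(word, counts):
    return sum(counts)

lines = ["big data", "Big bottlenecks", "data data"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
print(counts)
```

The parallelism is implicit because each mapper call and each per-key reducer call is independent of the others, so a framework is free to run them on different nodes.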

Hadoop characteristics


    Hadoop addresses the crAP partition

    Hadoop is composed primarily of HDFS and
    map reduce itself.

    Not just Java ⇒ streaming interface:
    Python, Ruby...,
    Unix utilities, pipes, filters, shell



Hadoop


    “Hello World”

    $HADOOP_HOME/bin/hadoop jar \
        $HADOOP_HOME/hadoop-streaming.jar \
        -input myInputDirs \
        -output myOutputDir \
        -mapper /bin/cat \
        -reducer /bin/wc
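
Note: a sketch of what a Python mapper and reducer for the streaming interface might look like, simulated locally. The function names and sample input are assumptions. In a real job each function would live in its own script, reading lines from standard input and printing tab-separated key/value lines, and those scripts would take the place of /bin/cat and /bin/wc above.

```python
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would on stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum the counts for each word; input must already be sorted by key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Local simulation of the job: map, sort by key (the shuffle), then reduce.
sample = ["to be or not", "to be"]
result = list(reducer(sorted(mapper(sample))))
print(result)
```

The same pipeline can be approximated at a shell prompt by piping input through the mapper, `sort`, and the reducer, which is a convenient way to test before submitting a job.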



General Purpose


    Use your imagination: If you can make the
    shoe fit, Hadoop will wear it

    Hive ⇒ RDBMS-like SQL queries on Hadoop

    RDBMS X...new and improved, 100% fortified
    with Hadoop ⇒ ETL




Big Picture “Scalability”


    Order of magnitude comparison 1/10/100/1000
    ⇒ single node / shared something RDB /
    shared nothing RDB / map reduce


    This is not necessarily a good inter-platform
    performance comparison, though it may be
    reasonable for intra-platform comparison


Further Reading:


    dbms2.com - Curt Monash

    dbmsmusings.blogspot.com - Daniel Abadi



