© 2015 IBM Corporation | Hadoop Summit – San Jose 2015
NoSQL Needs SomeSQL
Scott C. Gray (sgray@us.ibm.com)
Senior Architect and STSM, Big SQL, Big Data Open Source
© 2015 IBM Corporation | Hadoop Summit – San Jose, CA – June 2015
Agenda
 SQL Overview
 History
 Pros and Cons
 Challenges of SQL on Hadoop
 NoSQL Overview
 History
 Solving the Challenges
 Advantages and Tradeoffs
 Conclusion and Questions
Structured Query Language
Quick History on SQL (for NoSQL comparison later on)
 Developed in the 1970s by IBM
 Multiple commercial offerings by 1980
 Standardization began in 1986 and continues today
 SQL:2011 is the most recent standard
 Defining characteristics:
 Tabular (row/column storage)
 Strict schema
 Highly encourages a relational design
Structured Query Language
What’s to Like? The obvious:
 A well known language
 Ubiquitous use by IT and business
 Standardization makes skills (and applications)
easily transferable
 Many, many tools available due to a relatively
simple and common data model
 Relational model allows you to easily explore
data relationships
 Sales by part #
 Sales by region
 Sales by customer
 …
Structured Query Language
What’s to Like? The not-so-obvious
 Formal and strict modelling allows for very smart optimizations based upon
 Data distribution (statistics)
 Data size (bytes per row, rows per page, etc.)
 Data type domains (value ranges, nullability, etc.)
 Declared domains (CHECK constraints)
 Formal relationships (referential constraints)
 The database engine can make very smart query strategy decisions
Structured Query Language
What’s NOT to Like?
 Typically not so efficient with sparse data
 This is changing with modern columnar stores – but they have tradeoffs too
 Very rigid, simple, data model makes modeling complex objects tedious
 May take dozens of tables to model one “object” (e.g. XML document)
 Fetching one “object” now requires significant work to reconstruct (many joins)
 Evolving the data model can be non-trivial
 E.g. changing a column’s type may require a table rebuild (and all dependent tables!)
 The relational model can make it difficult to be agile!
 The structure of all data must be defined up front
Structured Query Language
What about Apache Drill?
 All of this talk about schema inflexibility…but what about projects
like Apache Drill??
 Apache Drill allows for efficient SQL queries against data without a schema*
 *It at least needs to know how the data is encoded (e.g. JSON, XML, etc.)
 It re-evaluates the structure of each “row” of data as it runs
 Supports a number of NoSQL platforms (HBase, MongoDB, etc.)
 But this only addresses the flexibility of the query language, and still suffers from:
 Difficult to make optimization decisions (they are making some strides here…)
 Still pay a cost for joins (more on this coming up…)
 You still may not be able to ask a “table” what its schema is
• Lots of tooling relies upon this
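To make "re-evaluates the structure of each row" concrete, here is a minimal Python sketch. It is not Drill's implementation; the sample records and the helpers `infer_schema` and `union_schema` are made up for illustration. It derives a schema per JSON record at read time and shows why no single schema can be reported for the "table" up front.

```python
import json

# Hypothetical sample data: three JSON "rows" with differing structure.
RECORDS = [
    '{"id": 1, "name": "Scott", "age": 45}',
    '{"id": 2, "name": "Mary", "phones": {"home": "123-555-1212"}}',
    '{"id": "3", "team": "Ferrari"}',
]

def infer_schema(record: str) -> dict:
    """Return {field: type name} for one record, discovered at read time."""
    return {key: type(value).__name__ for key, value in json.loads(record).items()}

def union_schema(records) -> dict:
    """Merge per-record schemas; a field may end up with several types."""
    merged: dict = {}
    for record in records:
        for field, type_name in infer_schema(record).items():
            merged.setdefault(field, set()).add(type_name)
    return merged

schema = union_schema(RECORDS)
for field in sorted(schema):
    print(field, sorted(schema[field]))
```

Note that `id` comes back with two types. A tool that asks for "the" column type before reading the data has nothing reliable to go on.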
SQL on Hadoop
The Great Promise
 In many ways, the architecture of Hadoop runs against the grain of relational processing
 Most data warehouses (DWs) rely heavily on controlled data placement
 Data is explicitly partitioned across the cluster
 A particular node “owns” a known subset of data
 Partitioning tables on the same key(s) and on the same
nodes allows for co-located processing
 The fundamental design of HDFS explicitly implements
“random” data placement
 No matter which node writes a block there is no
guarantee a copy will live on that node
 Rebalancing HDFS can move blocks around
 So, no co-located processing without bending over
backwards
See my other session:
Challenges of SQL on Hadoop
Thursday, 3:10pm – Grand Ballroom 220C
[Figure: Query Coordinator above Partitions A, B and C, each holding T1 and T2; HDFS]
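The difference between controlled and "random" placement can be sketched in a few lines of Python. This is a toy model under stated assumptions: three nodes, integer join keys, and the hypothetical helpers `hash_partition`, `random_placement` and `local_join_matches`. It is not how HDFS or any warehouse actually places data.

```python
import random

NODES = 3

def hash_partition(rows, key_index=0, nodes=NODES):
    """Place each row on the node chosen by its join key (controlled placement)."""
    placement = {n: [] for n in range(nodes)}
    for row in rows:
        placement[row[key_index] % nodes].append(row)
    return placement

def random_placement(rows, rng, nodes=NODES):
    """Place each row on an arbitrary node, loosely mimicking HDFS block placement."""
    placement = {n: [] for n in range(nodes)}
    for row in rows:
        placement[rng.randrange(nodes)].append(row)
    return placement

def local_join_matches(t1_place, t2_place):
    """Count join matches that can be found without leaving a node."""
    matches = 0
    for node, t1_rows in t1_place.items():
        t2_keys = [row[0] for row in t2_place[node]]
        for row in t1_rows:
            matches += t2_keys.count(row[0])
    return matches

# Hypothetical tables: T1(dept_id, name) and T2(dept_id, employee).
T1 = [(d, f"dept{d}") for d in range(12)]
T2 = [(e % 12, f"emp{e}") for e in range(48)]

rng = random.Random(7)
colocated = local_join_matches(hash_partition(T1), hash_partition(T2))
scattered = local_join_matches(random_placement(T1, rng), random_placement(T2, rng))
print(colocated, scattered)
```

With both tables partitioned on the same key, all 48 matches are found locally. With arbitrary placement only some of them are, and the rest need data movement.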
SQL on Hadoop
Query Processing Without Data Placement
 Without co-location the options for join processing are limited
 Redistribution join
 DB engines read and filter “local” blocks for each table
 Records with the same key are shipped to the same
node to be joined
 In the worst case both joined tables are moved in their entirety!
 Doesn’t really work well for non-equijoins (!=, <, >, etc.)
 Broadcast join (hash join against a broadcast table)
 Smaller, or heavily filtered, tables are shipped to all
other nodes
 An in-memory hash table is used for very fast joins
 Can still lead to a lot of network traffic to move the small table
[Figure: DB engines exchanging T1 and T2 rows; panels labeled "Broadcast Join" and "Hash Join"]
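The two strategies described above (shipping rows of both tables to the node that owns their key, versus copying the small table to every node and probing an in-memory hash table) can be compared with a small Python simulation. The node layout, the sample rows and both function names are assumptions made for illustration. No real engine is being modeled.

```python
from collections import defaultdict

def redistribution_join(t1_by_node, t2_by_node, nodes):
    """Ship every row to the node that owns key % nodes, then join locally."""
    shipped = 0
    inbox1, inbox2 = defaultdict(list), defaultdict(list)
    for inbox, table in ((inbox1, t1_by_node), (inbox2, t2_by_node)):
        for source, rows in table.items():
            for row in rows:
                target = row[0] % nodes
                shipped += target != source   # count only rows that move
                inbox[target].append(row)
    result = []
    for node in range(nodes):
        for a in inbox1[node]:
            result += [(a[0], a[1], b[1]) for b in inbox2[node] if b[0] == a[0]]
    return sorted(result), shipped

def broadcast_join(big_by_node, small_by_node, nodes):
    """Copy the small table to every node and probe an in-memory hash table."""
    small_rows = [row for rows in small_by_node.values() for row in rows]
    shipped = len(small_rows) * (nodes - 1)   # one copy per other node
    lookup = defaultdict(list)
    for key, value in small_rows:
        lookup[key].append(value)
    result = []
    for rows in big_by_node.values():
        for key, value in rows:
            result += [(key, value, small_value) for small_value in lookup[key]]
    return sorted(result), shipped

# Hypothetical layout: node id -> rows of (join key, payload).
BIG = {0: [(1, "a"), (2, "b")], 1: [(3, "c"), (1, "d")], 2: [(2, "e"), (4, "f")]}
SMALL = {0: [(1, "x")], 1: [(2, "y")], 2: []}

r_rows, r_shipped = redistribution_join(BIG, SMALL, 3)
b_rows, b_shipped = broadcast_join(BIG, SMALL, 3)
print(r_rows == b_rows, r_shipped, b_shipped)
```

Both produce the same rows. Which one moves less data depends on table sizes and node count, which is exactly the decision an optimizer needs statistics for.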
Enter: NoSQL (“Not Only” SQL!)
History of NoSQL
 It’s older than SQL!
 First database created in 1965 by TRW
 IBM’s IMS (hierarchical database) created for NASA and the Apollo space program in 1966
 Advanced on Hadoop by Google’s BigTable papers
 Defining characteristics:
 No pre-defined schema (a.k.a. late-binding, schema-on-read)
 Designed for horizontal scale-out
 Related data tends to be physically co-located or nested
 Strongly encourages non-relational designs
 Typically API-accessed (or path expressions)
Solving the Relational on Hadoop Challenge
 We saw the challenges of relational joins on distributed data
 There isn't time to explore each NoSQL technology
 Let's focus on one popular technology (HBase) and explore
how it can solve our relational woes, and the tradeoffs…
HBase In One Slide
 HBase is a popular key-value store for Hadoop
 Client/server database
 A table has no schema, just a name
 All HBase tables are ordered and
accessed by primary key
 Each row can have zero or more
name-value stores (“column family”)
 Each column family can have zero or
more name-value pairs
 Names and values are just binary data;
there are no data types!
MyTable
Row Key | Col Family: userinfo | Col Family: changehistory
123412 | fname: Scott, lname: Gray, age: 45, mobile: 609-555-1212 | 20140721: fname=Scot, 20141103: age=44
123746 | fname: Mary, lname: Swanson, age: 28, home: 123-555-1212 | (no entries)
139442 | fname: Kimi, lname: Räikkönen, age: 34, team: Ferrari | 20130911: team=Lotus, 20131007: age=33
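As a rough mental model, the structure described on this slide can be mimicked with nested dictionaries of raw bytes. The sketch below is plain Python, not the HBase client API. The helpers `get` and `scan` are hypothetical stand-ins for a point lookup and an ordered key-range read.

```python
# table -> row key -> column family -> column name -> value, all raw bytes
my_table = {
    b"123412": {
        b"userinfo": {b"fname": b"Scott", b"lname": b"Gray", b"age": b"45",
                      b"mobile": b"609-555-1212"},
        b"changehistory": {b"20140721": b"fname=Scot", b"20141103": b"age=44"},
    },
    b"139442": {
        b"userinfo": {b"fname": b"Kimi", b"lname": "Räikkönen".encode("utf-8"),
                      b"age": b"34", b"team": b"Ferrari"},
        b"changehistory": {b"20130911": b"team=Lotus", b"20131007": b"age=33"},
    },
}

def get(table, row_key, family, column):
    """Look up one cell; return None when the row, family or column is absent."""
    return table.get(row_key, {}).get(family, {}).get(column)

def scan(table, start, stop):
    """Yield (row key, row) in key order for start <= key < stop."""
    for key in sorted(table):
        if start <= key < stop:
            yield key, table[key]

# The store only returns bytes; the reader decides that "age" is an integer.
age = int(get(my_table, b"123412", b"userinfo", b"age"))
print(age, [key for key, _ in scan(my_table, b"1", b"13")])
```

Nothing in the structure says that `age` is numeric or that every row has a `mobile` column. That knowledge lives only in the application.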
Describing an HBase Table Relationally
 Different database engines provide different
mechanisms for describing HBase tables
 Describe how data is encoded in the table
 Map the column family:column to relational
column(s)
 But some common HBase design patterns are
difficult/impossible to describe relationally…
CREATE HBASE TABLE MY_TABLE
(
  C1 INT NOT NULL,
  C2 INT NOT NULL,
  C3 INT NOT NULL,
  C4 VARCHAR(10),
  C5 DECIMAL(5,2),
  C6 SMALLINT NOT NULL,
  CONSTRAINT PK1 PRIMARY KEY (C1, C2)
)
COLUMN MAPPING
(
  -- the row key packs C1 and C2 in binary form
  KEY MAPPED BY (C1, C2) ENCODING BINARY,
  -- CF:COL1 holds C3 and C4 as one '|'-separated string
  CF:COL1 MAPPED BY (C3, C4)
    SEPARATOR '|' ENCODING STRING,
  -- CF:COL2 is decoded into C5 and C6 by a custom SerDe
  CF:COL2 MAPPED BY (C5, C6)
    ENCODING SERDE 'com.myco.MyJSONSerDe'
)
Big SQL Example
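What such a mapping asks the engine to do at read time can be sketched in Python. This is only an illustration: the byte layout of the binary key (two 4-byte big-endian integers) is an assumption, not Big SQL's documented encoding, and `decode_key` and `decode_string_column` are hypothetical helper names.

```python
import struct

def decode_key(raw: bytes) -> tuple:
    """Split a composite row key into (C1, C2).

    Assumption: two 4-byte big-endian signed integers; the engine's real
    binary encoding may differ.
    """
    return struct.unpack(">ii", raw)

def decode_string_column(raw: bytes, separator: str = "|") -> tuple:
    """Split a string-encoded cell into (C3, C4); an empty C4 becomes None."""
    c3_text, c4_text = raw.decode("utf-8").split(separator, 1)
    return int(c3_text), (c4_text or None)

# Hypothetical stored cell: key (7, 42) and CF:COL1 = "100|widget".
row_key = struct.pack(">ii", 7, 42)
print(decode_key(row_key), decode_string_column(b"100|widget"))
```

The point is that every relational column is recovered by a decoding rule declared in the DDL. If the application wrote the bytes in some other way, the mapping cannot describe them.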
HBase Design Patterns
Getting Rid of the Join
 One common HBase design pattern is to physically nest related data within its parent row
 Take the typical department/employee relationship
 Each employee may be its own column within an employees column family of the dept
 Reading the dept automatically reads the employees with it
 No need for joins!
DepartmentEmployees
Row Key 0001
Col Fam: dept_info | Name: Finance, Manager: Bob Smith, Address: 451 St. Claire…, Phone: 609-555-1212
Col Fam: employees | 287: { fname: Glen, lname: Hanks, … }
Col Fam: employees | 934: { fname: Scott, lname: Anderson, … }
Col Fam: employees | 16: { fname: Brian, lname: Applebaum, … }
Col Fam: employees | 1023: { fname: Jim, lname: Demes, … }
Row Key 0002
Col Fam: dept_info | Name: Sales, Manager: Jane McClaren, Address: 555 Bailey …, Phone: 408-314-8234
Col Fam: employees | 287: { fname: Tom, lname: Donohue, … }
Col Fam: employees | 934: { fname: Mary, lname: Swanson, … }
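A minimal Python sketch of the nesting pattern, following the figure's layout (an employees family holding one JSON value per employee). The dictionary stands in for the HBase table and `read_department` is a hypothetical helper, not a client API call.

```python
import json

# Hypothetical in-memory stand-in for the DepartmentEmployees table.
department_employees = {
    "0001": {
        "dept_info": {"Name": "Finance", "Manager": "Bob Smith"},
        "employees": {
            "287": '{"fname": "Glen", "lname": "Hanks"}',
            "934": '{"fname": "Scott", "lname": "Anderson"}',
        },
    },
    "0002": {
        "dept_info": {"Name": "Sales", "Manager": "Jane McClaren"},
        "employees": {"287": '{"fname": "Tom", "lname": "Donohue"}'},
    },
}

def read_department(table, dept_id):
    """One row read returns the department and all of its employees."""
    row = table[dept_id]
    employees = {emp_id: json.loads(doc) for emp_id, doc in row["employees"].items()}
    return row["dept_info"], employees

info, staff = read_department(department_employees, "0001")
print(info["Name"], sorted(person["lname"] for person in staff.values()))
```

A single keyed read replaces the department-to-employee join, at the cost of the employee data only being reachable through its department.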
HBase Design Patterns
Getting Rid of the Join
 Another approach is to use the row key to force child data to be adjacent to the parent record
 Asking for row key 0001 gives just the dept
 Asking for keys >= 0001 and < 0002 gives dept + employees
 Odds are very good dept + employees are physically adjacent on the same server
DepartmentEmployees
Row Key 0001 (dept_id) | Name: Finance, Manager: …
Row Key 0001/287 (dept_id/emp_id) | fname: Glen, lname: Hanks
Row Key 0001/934 (dept_id/emp_id) | fname: Scott, lname: Anderson
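The key-range trick relies only on rows being stored in sorted key order. A short Python sketch, using a sorted list and binary search as a stand-in for the ordered store (the rows for department 0002 and the helpers `range_scan` and `get_row` are assumptions for illustration):

```python
from bisect import bisect_left

# Hypothetical sorted key space: dept rows and dept_id/emp_id child rows.
ROWS = sorted([
    ("0001", {"Name": "Finance"}),
    ("0001/287", {"fname": "Glen", "lname": "Hanks"}),
    ("0001/934", {"fname": "Scott", "lname": "Anderson"}),
    ("0002", {"Name": "Sales"}),
    ("0002/287", {"fname": "Tom", "lname": "Donohue"}),
], key=lambda item: item[0])
KEYS = [key for key, _ in ROWS]

def range_scan(start, stop):
    """Return rows with start <= key < stop using two binary searches."""
    return ROWS[bisect_left(KEYS, start):bisect_left(KEYS, stop)]

def get_row(key):
    """Point lookup: the department row alone."""
    position = bisect_left(KEYS, key)
    if position < len(KEYS) and KEYS[position] == key:
        return ROWS[position][1]
    return None

print(get_row("0001"))
print([key for key, _ in range_scan("0001", "0002")])
```

Because `/` sorts before the digits, every `0001/...` key falls between `0001` and `0002`, so one contiguous scan returns the department followed by its employees.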
NoSQL Design Tradeoffs
 There are many other similar design approaches!
 What are the tradeoffs for such designs vs. relational?
 Advantages
 Related data is always co-located, no network hop for a join
 As data "shards" related data automatically stays together
 Schema can trivially be extended in the future
• Add new name/value pairs
• Add new column families
• Add new adjacent rows…
NoSQL Design Tradeoffs
 Disadvantages
 Relationships tend to be one-way
• What if I want to find the department a given employee is in?
• May need to maintain multiple copies of the data
• Cannot easily (efficiently) explore ad-hoc relationships
 Difficult to model
• Describing these data models to a relational engine is very difficult
• Hive has limited/restrictive support for ad-hoc data in column families
• Making the wrong choice can make SQL access impossible or limited
 Query optimization
• The developer is the query optimizer
• The data model dramatically limits available optimizations
 What's the schema??
• Database schema cannot be determined from the database!
• Tooling (data exploration/management) tends to need to be custom built
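The one-way nature of nested relationships shows up as soon as the question is asked from the child's side. A small Python sketch with made-up data (the employee ids in department 0002 and both helper names are hypothetical):

```python
# Hypothetical nested layout: department row key -> {employee id: last name}.
DEPARTMENTS = {
    "0001": {"287": "Hanks", "934": "Anderson", "16": "Applebaum"},
    "0002": {"512": "Donohue", "640": "Swanson"},
}

def department_of_scan(emp_id):
    """Parent-to-child is cheap; child-to-parent needs a scan of every row."""
    rows_read = 0
    for dept_id, employees in DEPARTMENTS.items():
        rows_read += 1
        if emp_id in employees:
            return dept_id, rows_read
    return None, rows_read

def build_reverse_index():
    """Second copy of the relationship, which the application must keep in sync."""
    return {emp_id: dept_id
            for dept_id, employees in DEPARTMENTS.items()
            for emp_id in employees}

REVERSE = build_reverse_index()
print(department_of_scan("640"), REVERSE["640"])
```

Either every such question scans the table, or the developer maintains a second structure by hand. That is the "developer is the query optimizer" tradeoff in miniature.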
Why Not Just Model Relationally?
 You can, of course, just model your data relationally
 But, there is a good chance your data will not be co-located!
 Every joined row may require a network hop to fetch
 You’re back to most of the problems you were trying to solve!
 Modelling complex objects is difficult
 Re-assembling complex objects is expensive
 Changing the data model is still a pain
Department
Row Key 0001 | Name: Finance, Manager: …
Employee
Row Key 287 | fname: Glen, lname: Hanks, dept_id: 0001
[Figure: three Region Servers; labels in reading order: Department 0001-0486, Employee 1-300, Employee 301-999, Department 0487-0923]
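How often a joined row needs a network hop depends on where the two tables' key ranges happen to land. The Python sketch below assumes one possible reading of the figure's region layout. The server names, the employee/department pairs and both helpers are hypothetical.

```python
# Assumed region layout (one reading of the figure): inclusive key ranges per table.
REGIONS = {
    "server-1": {"Department": (1, 486), "Employee": (1, 300)},
    "server-2": {"Employee": (301, 999)},
    "server-3": {"Department": (487, 923)},
}

def server_for(table, key):
    """Find the region server whose key range covers this row."""
    for server, ranges in REGIONS.items():
        low, high = ranges.get(table, (1, 0))   # empty range if table absent
        if low <= key <= high:
            return server
    raise KeyError((table, key))

def remote_lookups(employees):
    """Count employee rows whose department row lives on another server."""
    return sum(server_for("Employee", emp_id) != server_for("Department", dept_id)
               for emp_id, dept_id in employees)

# (emp_id, dept_id) pairs; only the first pair happens to be co-located.
EMPLOYEES = [(287, 1), (934, 2), (640, 500), (150, 700)]
print(remote_lookups(EMPLOYEES), "of", len(EMPLOYEES), "joined rows need a hop")
```

Co-location here is an accident of the two key ranges, not something the data model guarantees.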
So, All Is Lost Then?
 All is not lost!
 You can expose limited portions of your data model through SQL
 Co-processors/batch jobs can maintain relational views of non-relational data
 Some SQL solutions can model certain design patterns
 Hive can capture an entire column family into a MAP
 Big SQL allows for custom column decoders to map arbitrary data structures relationally
 Drill can dig into certain complex column types
 Mix-and-match relational design with what your SQL engine can do
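The idea of a batch job maintaining a relational view of nested data can be sketched as a simple flattening step. This Python example is only an outline under assumed data. It does not use HBase co-processors or any engine's API, and `relational_view` is a hypothetical name.

```python
def relational_view(nested):
    """Flatten {dept_id: {"name": ..., "employees": {...}}} into two row sets."""
    departments, employees = [], []
    for dept_id, row in sorted(nested.items()):
        departments.append((dept_id, row["name"]))
        for emp_id, person in sorted(row["employees"].items()):
            employees.append((emp_id, dept_id, person["fname"], person["lname"]))
    return departments, employees

# Hypothetical nested source data.
NESTED = {
    "0001": {"name": "Finance", "employees": {
        "287": {"fname": "Glen", "lname": "Hanks"},
        "934": {"fname": "Scott", "lname": "Anderson"}}},
    "0002": {"name": "Sales", "employees": {}},
}

DEPARTMENT_ROWS, EMPLOYEE_ROWS = relational_view(NESTED)
print(DEPARTMENT_ROWS)
print(EMPLOYEE_ROWS)
```

The flattened rows carry an explicit `dept_id`, so a SQL engine can join and filter them normally. The cost is a second copy that is only as fresh as the last run of the job.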
Conclusion
 Not all NoSQL solutions have the same limitations as HBase!
 But invariably they all pose some challenge to traditional relational querying
 NoSQL fundamentally encourages nested relationships
 You have to plan for SQL access in advance
 It is important to understand the NoSQL capabilities of your SQL solution thoroughly
 There are more challenges than I have described here!
Thank You!
 Thanks for putting up with me
 Questions?

Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Último (20)

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

NoSQL Needs SomeSQL

  • 1. NoSQL Needs SomeSQL
      • Scott C. Gray (sgray@us.ibm.com), Senior Architect and STSM, Big SQL, Big Data Open Source
      • Hadoop Summit – San Jose 2015
      • © 2015 IBM Corporation
  • 2. Agenda
      • SQL Overview
          • History
          • Pros and Cons
          • Challenges of SQL on Hadoop
      • NoSQL Overview
          • History
          • Solving the Challenges
          • Advantages and Tradeoffs
      • Conclusion and Questions
  • 3. Structured Query Language: Quick History on SQL (for NoSQL comparison later on)
      • Developed in the 1970s by IBM
      • Multiple commercial offerings by 1980
      • Standardization began in 1986 and continues today
          • SQL:2011 is the most recent standard
      • Defining characteristics:
          • Tabular (row/column storage)
          • Strict schema
          • Highly encourages a relational design
  • 4. Structured Query Language: What’s to Like? The obvious:
      • A well known language
      • Ubiquitous use by IT and business
      • Standardization makes skills (and applications) easily transferable
      • Many, many tools available due to a relatively simple and common data model
      • Relational model allows you to easily explore data relationships
          • Sales by part #
          • Sales by region
          • Sales by customer
          • …
  • 5. Structured Query Language: What’s to Like? The not-so-obvious
      • Formal and strict modelling allows for very smart optimizations based upon
          • Data distribution (statistics)
          • Data size (bytes per row, rows per page, etc.)
          • Data type domains (value ranges, nullability, etc.)
          • Declared domains (CHECK constraints)
          • Formal relationships (referential constraints)
      • The database engine can make very smart query strategy decisions
  • 6. Structured Query Language: What’s NOT to Like?
      • Typically not so efficient with sparse data
          • This is changing with modern columnar stores, but they have tradeoffs too
      • Very rigid, simple data model makes modeling complex objects tedious
          • May take dozens of tables to model one “object” (e.g. XML document)
          • Fetching one “object” now requires significant work to reconstruct (many joins)
      • Evolving the data model can be non-trivial
          • E.g. changing a column’s type may require a table rebuild (and all dependent tables!)
      • The relational model can make it difficult to be agile!
          • The structure of all data must be defined up front
  • 7. Structured Query Language: What about Apache Drill?
      • All of this talk about schema inflexibility… but what about projects like Apache Drill?
      • Apache Drill allows for efficient SQL queries against data without a schema*
          • *It at least needs to know how the data is encoded (e.g. JSON, XML, etc.)
          • It re-evaluates the structure of each “row” of data as it runs
          • Supports a number of NoSQL platforms (HBase, MongoDB, etc.)
      • But this only addresses the flexibility of the query language, and still suffers from:
          • Difficult to make optimization decisions (they are making some strides here…)
          • Still pay a cost for joins (more on this coming up…)
          • You still may not be able to ask a “table” what its schema is
              • Lots of tooling relies upon this
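The idea of re-evaluating the structure of each “row” as the query runs can be illustrated with a small sketch. This is not Drill's implementation; the sample records and the helper names (`infer_row_schema`, `scan`) are hypothetical and only show why an engine in this mode cannot report a table's schema up front.

```python
import json

def infer_row_schema(row: dict) -> dict:
    """Map each field name to the Python type name observed in this one row."""
    return {name: type(value).__name__ for name, value in row.items()}

def scan(lines):
    """Yield (row, schema) pairs; the schema is recomputed for every row."""
    for line in lines:
        row = json.loads(line)
        yield row, infer_row_schema(row)

# Hypothetical schemaless input: fields and even types differ per record.
RAW = [
    '{"id": 1, "name": "Scott", "age": 45}',
    '{"id": 2, "name": "Mary", "phones": ["123-555-1212"]}',
    '{"id": "3", "team": "Ferrari"}',
]

schemas = [schema for _, schema in scan(RAW)]
# The full column list is only known after every row has been read.
union_of_columns = sorted({col for s in schemas for col in s})
```

Note that `id` is an integer in two rows and a string in the third, which is the kind of surprise that makes optimization decisions hard without a declared schema.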
  • 8. SQL on Hadoop: The Great Promise
      • In many ways, the architecture of Hadoop runs against the grain of relational processing
      • Most DW’s rely heavily on controlled data placement
          • Data is explicitly partitioned across the cluster
          • A particular node “owns” a known subset of data
          • Partitioning tables on the same key(s) and on the same nodes allows for co-located processing
      • The fundamental design of HDFS explicitly implements “random” data placement
          • No matter which node writes a block there is no guarantee a copy will live on that node
          • Rebalancing HDFS can move blocks around
      • So, no co-located processing without bending over backwards
      • See my other session: Challenges of SQL on Hadoop, Thursday, 3:10pm, Grand Ballroom 220C
      • [Figure: a query coordinator over partitions A, B, and C, each holding slices of tables T1 and T2, shown alongside HDFS]
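A minimal sketch of the co-located processing described above, assuming a toy three-node cluster and a modulo hash as the placement function (both are illustrative assumptions, as are the sample tables). Because both tables are partitioned on the same key with the same function, each node can join its own slices without shipping any rows.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size

def node_for(key: int) -> int:
    """Placement function shared by both tables (assumed: simple modulo)."""
    return key % NUM_NODES

def partition(rows, key_index):
    """Group rows by the node that owns their partitioning key."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_index])].append(row)
    return nodes

T1 = [(1, "Finance"), (2, "Sales"), (3, "Legal")]          # (dept_id, name)
T2 = [(287, "Glen", 1), (934, "Scott", 1),
      (16, "Tom", 2), (50, "Ann", 3)]                       # (emp_id, name, dept_id)

t1_nodes = partition(T1, 0)   # partitioned on dept_id
t2_nodes = partition(T2, 2)   # partitioned on the same dept_id

def colocated_join():
    """Each node joins only the rows it already owns; nothing crosses the network."""
    out = []
    for node in range(NUM_NODES):
        depts = {dept_id: name for dept_id, name in t1_nodes.get(node, [])}
        for _emp_id, emp_name, dept_id in t2_nodes.get(node, []):
            if dept_id in depts:
                out.append((depts[dept_id], emp_name))
    return sorted(out)
```

With HDFS-style placement there is no such shared placement function, which is what forces the join strategies on the next slide.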
  • 9. SQL on Hadoop: Query Processing Without Data Placement
      • Without co-location the options for join processing are limited
      • Redistribution join
          • DB engines read and filter “local” blocks for each table
          • Records with the same key are shipped to the same node to be joined
          • In the worst case both joined tables are moved in their entirety!
          • Doesn’t really work well for non-equijoins (!=, <, >, etc.)
      • Hash Join
          • Smaller, or heavily filtered, tables are shipped to all other nodes
          • An in-memory hash table is used for very fast joins
          • Can still lead to a lot of network traffic to move the small table
      • [Figures: DB engines exchanging rows of T1 and T2; diagrams labeled “Broadcast Join” and “Hash Join”]
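The two strategies above can be compared with a toy simulation that counts how many rows cross the network. This is a sketch under stated assumptions: the three-node cluster, the initial placement dictionaries, and the modulo routing are all hypothetical, and a real engine would also filter rows before shipping them.

```python
NUM_NODES = 3  # hypothetical cluster size

# Hypothetical initial placement (node -> rows), unrelated to the join key.
T1_PLACEMENT = {0: [(2, "Sales")], 1: [(1, "Finance")], 2: [(3, "Legal")]}
T2_PLACEMENT = {
    0: [(287, "Glen", 1), (16, "Tom", 2)],
    1: [(934, "Scott", 1)],
    2: [(50, "Ann", 3)],
}

def redistribution_join(t1, t2):
    """Ship every row to node (join key % NUM_NODES), then join locally."""
    shipped = 0
    t1_at = {n: [] for n in range(NUM_NODES)}
    t2_at = {n: [] for n in range(NUM_NODES)}
    for src, rows in t1.items():
        for row in rows:
            dst = row[0] % NUM_NODES
            shipped += int(dst != src)   # count rows that leave their node
            t1_at[dst].append(row)
    for src, rows in t2.items():
        for row in rows:
            dst = row[2] % NUM_NODES
            shipped += int(dst != src)
            t2_at[dst].append(row)
    result = []
    for n in range(NUM_NODES):
        depts = dict(t1_at[n])
        result += [(depts[d], name) for _, name, d in t2_at[n] if d in depts]
    return sorted(result), shipped

def broadcast_hash_join(small, large):
    """Copy the whole small table to every other node; probe an in-memory hash table."""
    all_small = [row for rows in small.values() for row in rows]
    shipped = len(all_small) * (NUM_NODES - 1)   # each row goes to the other nodes
    hash_table = dict(all_small)
    result = []
    for rows in large.values():                   # the large table never moves
        result += [(hash_table[d], name) for _, name, d in rows if d in hash_table]
    return sorted(result), shipped

redistributed, moved_by_redistribution = redistribution_join(T1_PLACEMENT, T2_PLACEMENT)
broadcasted, moved_by_broadcast = broadcast_hash_join(T1_PLACEMENT, T2_PLACEMENT)
```

Both strategies return the same rows; they differ only in which table moves and how much of it, which is the tradeoff the slide describes.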
  • 10. Enter: NoSQL (“Not Only” SQL!): History of NoSQL
      • It’s older than SQL!
          • First database created in 1965 by TRW
          • IBM’s IMS (hierarchical database) created for NASA and the Apollo space program in 1966
      • Advanced on Hadoop by Google’s BigTable papers
      • Defining characteristics:
          • No pre-defined schema (a.k.a. late-binding, schema-on-read)
          • Designed for horizontal scale-out
          • Related data tends to be physically co-located or nested
          • Strongly encourages non-relational designs
          • Typically API-accessed (or path expressions)
  • 11. Solving the Relational on Hadoop Challenge
      • We saw the challenges of relational joins on distributed data
      • There isn't time to explore each NoSQL technology
      • Let's focus on one popular technology (HBase) and explore how it can solve our relational woes, and the tradeoffs…
  • 12. HBase In One Slide
      • HBase is a popular key-value store for Hadoop
      • Client/server database
      • A table has no schema, just a name
      • All HBase tables are ordered and accessed by primary key
      • Each row can have zero or more name-value stores (“column family”)
      • Each column family can have zero or more name-value pairs
      • Names and values are just binary data; there are no data types!
      • Example table MyTable:
          • Row key 123412
              • Col family userinfo: fname=Scott, lname=Gray, age=45, mobile=609-555-1212
              • Col family changehistory: 20140721 → fname=Scot, 20141103 → age=44
          • Row key 123746
              • Col family userinfo: fname=Mary, lname=Swanson, age=28, home=123-555-1212
              • Col family changehistory: (empty)
          • Row key 139442
              • Col family userinfo: fname=Kimi, lname=Räikkönen, age=34, team=Ferrari
              • Col family changehistory: 20130911 → team=Lotus, 20131007 → age=33
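The data model on this slide can be approximated as a sorted map of row keys to column families to name-value pairs, all stored as bytes. The sketch below is an in-memory stand-in, not the HBase client API; `put`, `get`, and `scan` are hypothetical helpers named after the operations they imitate.

```python
# table: row key (bytes) -> column family (str) -> qualifier (bytes) -> value (bytes)
table = {}

def put(row_key: bytes, family: str, qualifier: bytes, value: bytes) -> None:
    """Store one cell; rows and families come into existence on first write."""
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

def get(row_key: bytes, family: str, qualifier: bytes):
    """Return the raw bytes of one cell, or None if it was never written."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)

def scan():
    """Iterate rows in row-key order, as an ordered store would."""
    for key in sorted(table):
        yield key, table[key]

# Rows are written out of order and with different columns per row.
put(b"139442", "userinfo", b"fname", b"Kimi")
put(b"123412", "userinfo", b"fname", b"Scott")
put(b"123412", "userinfo", b"age", b"45")
put(b"123412", "changehistory", b"20141103", b"age=44")
put(b"123746", "userinfo", b"home", b"123-555-1212")

row_keys_in_order = [key for key, _ in scan()]
# There are no data types: the client must know that "age" holds ASCII digits.
age = int(get(b"123412", "userinfo", b"age").decode("ascii"))
```

The final line is the important one: interpreting a value is entirely the reader's job, which is why a SQL engine needs a mapping like the one on the next slide.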
  • 13. Describing an HBase Table Relationally
      • Different database engines provide different mechanisms for describing HBase tables
          • Describe how data is encoded in the table
          • Map the column family:column to relational column(s)
      • But some common HBase design patterns are difficult/impossible to describe relationally…
      • Big SQL example:
          CREATE HBASE TABLE MY_TABLE (
            C1 INT NOT NULL,
            C2 INT NOT NULL,
            C3 INT NOT NULL,
            C4 VARCHAR(10),
            C5 DECIMAL(5,2),
            C6 SMALLINT NOT NULL,
            CONSTRAINT PK1 PRIMARY KEY (C1, C2)
          )
          COLUMN MAPPING (
            KEY MAPPED BY (C1, C2) ENCODING BINARY,
            CF:COL1 MAPPED BY (C3, C4) SEPARATOR '|' ENCODING STRING,
            CF:COL2 MAPPED BY (C5, C6) ENCODING SERDE 'com.myco.MyJSONSerDe'
          )
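To make the column mapping concrete, here is a rough sketch of what a decoder for the first two mappings might do. It does not reproduce Big SQL's actual encodings: the binary key layout (two big-endian 32-bit integers) is an assumption made for illustration, and the helper names are hypothetical. Only the `'|'` separator and the column names come from the example above.

```python
import struct

def decode_key(raw: bytes) -> dict:
    """Assumed layout: C1 and C2 packed as two big-endian 32-bit integers."""
    c1, c2 = struct.unpack(">ii", raw)
    return {"C1": c1, "C2": c2}

def decode_string_cell(raw: bytes, names, types, sep="|") -> dict:
    """Split one separator-delimited cell into several typed relational columns."""
    parts = raw.decode("utf-8").split(sep)
    return {n: (t(p) if p != "" else None) for n, t, p in zip(names, types, parts)}

def to_relational_row(key: bytes, cells: dict) -> dict:
    """Combine the decoded row key and the CF:COL1 cell into one row."""
    row = decode_key(key)
    row.update(decode_string_cell(cells[b"CF:COL1"], ["C3", "C4"], [int, str]))
    return row

row = to_relational_row(struct.pack(">ii", 7, 9), {b"CF:COL1": b"42|widget"})
```

The mapping works here because one HBase cell corresponds to a fixed set of relational columns; patterns where the column names themselves carry data do not fit this shape.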
  • 14. HBase Design Patterns: Getting Rid of the Join
      • One common HBase design pattern is to physically nest related data within its parent row
      • Take the typical department/employee relationship
          • Each employee may be in its own column family within the dept
          • Reading the dept automatically reads the employees with it
          • No need for joins!
      • Example table DepartmentEmployees:
          • Row key 0001
              • Col fam dept_info: Name=Finance, Manager=Bob Smith, Address=451 St. Claire…, Phone=609-555-1212
              • Col fam employees: 287 → { fname: Glen, lname: Hanks, … }, 934 → { fname: Scott, lname: Anderson, … }, 16 → { fname: Brian, lname: Applebaum, … }, 1023 → { fname: Jim, lname: Demes, … }
          • Row key 0002
              • Col fam dept_info: Name=Sales, Manager=Jane McClaren, Address=555 Bailey …, Phone=408-314-8234
              • Col fam employees: 287 → { fname: Tom, lname: Donohue, … }, 934 → { fname: Mary, lname: Swanson, … }
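A small sketch of the nesting pattern, using a plain dictionary as a stand-in for the HBase row and JSON strings for the employee values. The storage layout and the `read_department` helper are illustrative assumptions; the department and employee names come from the slide's example.

```python
import json

# One row per department; employees are nested in their own column family.
department_employees = {
    "0001": {
        "dept_info": {"Name": "Finance", "Manager": "Bob Smith"},
        "employees": {
            "287": json.dumps({"fname": "Glen", "lname": "Hanks"}),
            "934": json.dumps({"fname": "Scott", "lname": "Anderson"}),
        },
    },
    "0002": {
        "dept_info": {"Name": "Sales", "Manager": "Jane McClaren"},
        "employees": {"287": json.dumps({"fname": "Tom", "lname": "Donohue"})},
    },
}

def read_department(dept_id: str):
    """A single row read returns the department and all of its employees."""
    row = department_employees[dept_id]
    staff = sorted(json.loads(v)["lname"] for v in row["employees"].values())
    return row["dept_info"]["Name"], staff

finance = read_department("0001")
```

One lookup replaces what would be a department-to-employee join in a relational design.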
  • 15. HBase Design Patterns: Getting Rid of the Join
      • Another approach is to use the row key to force child data to be adjacent to the parent record
          • Asking for row key 0001 gives just the dept
          • Asking for keys >= 0001 and < 0002 gives dept + employees
          • Odds are very good dept + employees are physically adjacent on the same server
      • Example table DepartmentEmployees:
          • Row key 0001 (dept_id): Name=Finance, Manager=…
          • Row key 0001/287 (dept_id/emp_id): fname=Glen, lname=Hanks
          • Row key 0001/934 (dept_id/emp_id): fname=Scott, lname=Anderson
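The row-key pattern relies only on keys being stored in sorted order, so it can be sketched with a sorted list and binary search. This is an in-memory illustration, not the HBase scan API; the `get` and `scan` helpers and the second department's rows are hypothetical.

```python
from bisect import bisect_left

# Rows kept in row-key order, as an ordered key-value store would keep them.
ROWS = sorted([
    ("0001", {"Name": "Finance"}),
    ("0001/287", {"fname": "Glen", "lname": "Hanks"}),
    ("0001/934", {"fname": "Scott", "lname": "Anderson"}),
    ("0002", {"Name": "Sales"}),
    ("0002/287", {"fname": "Tom", "lname": "Donohue"}),
], key=lambda kv: kv[0])
KEYS = [k for k, _ in ROWS]

def get(key: str):
    """Exact-key lookup: returns just the one row, or None."""
    i = bisect_left(KEYS, key)
    return ROWS[i][1] if i < len(KEYS) and KEYS[i] == key else None

def scan(start: str, stop: str):
    """Range scan: rows with start <= key < stop, in key order."""
    return ROWS[bisect_left(KEYS, start):bisect_left(KEYS, stop)]

dept_only = get("0001")
dept_with_employees = [k for k, _ in scan("0001", "0002")]
```

Because `0001/…` sorts between `0001` and `0002`, the range scan picks up the department row followed by its employees in one contiguous read.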
  • 16. NoSQL Design Tradeoffs
      • There are many other similar design approaches!
      • What are the tradeoffs for such designs vs. relational?
      • Advantages
          • Related data is always co-located, no network hop for a join
          • As data "shards", related data automatically stays together
          • Schema can trivially be extended in the future
              • Add new name/value pairs
              • Add new column families
              • Add new adjacent rows…
  • 17. NoSQL Design Tradeoffs
      • Disadvantages
          • Relationships tend to be one-way
              • What if I want to find the department a given employee is in?
              • May need to maintain multiple copies of the data
              • Cannot easily (efficiently) explore ad-hoc relationships
          • Difficult to model
              • Describing these data models to a relational engine is very difficult
              • Hive has limited/restrictive support for ad-hoc data in column families
              • Making the wrong choice can make SQL access impossible or limited
          • Query optimization
              • The developer is the query optimizer
              • The data model dramatically limits available optimizations
          • What's the schema??
              • Database schema cannot be determined from the database!
              • Tooling (data exploration/management) tends to need to be custom built
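The one-way relationship problem, and the extra copy needed to work around it, can be sketched as follows. The dictionaries and helper functions are hypothetical stand-ins for two HBase tables; the point is that the application, not the database, has to keep the reverse index consistent.

```python
dept_rows = {}      # dept_id -> {emp_id: employee name}   (the nested design)
emp_to_dept = {}    # emp_id  -> dept_id                   (hand-maintained reverse index)

def add_employee(dept_id: str, emp_id: str, name: str) -> None:
    """Every write must update both copies, or they drift apart."""
    dept_rows.setdefault(dept_id, {})[emp_id] = name
    emp_to_dept[emp_id] = dept_id

def move_employee(emp_id: str, new_dept_id: str) -> None:
    """Moving an employee touches the old row, the new row, and the index."""
    old_dept_id = emp_to_dept[emp_id]
    name = dept_rows[old_dept_id].pop(emp_id)
    add_employee(new_dept_id, emp_id, name)

def find_dept_by_scan(emp_id: str):
    """Without the reverse index, the only option is to read every department."""
    for dept_id, employees in dept_rows.items():
        if emp_id in employees:
            return dept_id
    return None

add_employee("0001", "287", "Glen Hanks")
add_employee("0001", "934", "Scott Anderson")
add_employee("0002", "16", "Tom Donohue")
move_employee("934", "0002")
```

In a relational design the same question is a single indexed lookup or join, and the optimizer picks the access path instead of the developer.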
  • 18. Why Not Just Model Relationally?
      • You can, of course, just model your data relationally
      • But, there is a good chance your data will not be co-located!
          • Every joined row may require a network hop to fetch
      • You’re back to most of the problems you were trying to solve!
          • Modelling complex objects is difficult
          • Re-assembling complex objects is expensive
          • Changing the data model is still a pain
      • [Figure: separate Department table (row key 0001: Name=Finance, Manager=…) and Employee table (row key 287: fname=Glen, lname=Hanks, dept_id=0001), with their key ranges (Department 0001-0486 and 0487-0923; Employee 1-300 and 301-999) spread across three region servers]
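A rough way to see the cost is to count how many joined rows need a lookup on a different region server. The key ranges below come from the slide's figure, but their assignment to servers `rs1` to `rs3`, the sample employees, and the helper names are hypothetical.

```python
REGIONS = [
    # (table, first key, last key, region server); assumed layout
    ("Department", 1, 486, "rs1"),
    ("Department", 487, 923, "rs3"),
    ("Employee", 1, 300, "rs1"),
    ("Employee", 301, 999, "rs2"),
]

def server_for(table: str, key: int) -> str:
    """Find the region server whose key range covers this row."""
    for region_table, low, high, server in REGIONS:
        if region_table == table and low <= key <= high:
            return server
    raise KeyError((table, key))

def remote_lookups(employees) -> int:
    """Count joined rows whose department lives on a different server."""
    return sum(
        server_for("Employee", emp_id) != server_for("Department", dept_id)
        for emp_id, dept_id in employees
    )

# Hypothetical (emp_id, dept_id) pairs to join against Department.
EMPLOYEES = [(287, 1), (934, 1), (16, 500), (450, 600)]
hops = remote_lookups(EMPLOYEES)
```

Only the first employee happens to share a server with its department; the other three joined rows each cost a network hop.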
  • 19. So, All Is Lost Then?
      • All is not lost!
      • You can expose limited portions of your data model through SQL
          • Co-processors/batch jobs can maintain relational views of non-relational data
      • Some SQL solutions can model certain design patterns
          • Hive can capture an entire column family into a MAP
          • Big SQL allows for custom column decoders to map arbitrary data structures relationally
          • Drill can dig into certain complex column types
      • Mix-and-match relational design with what your SQL engine can do
  • 20. Conclusion
      • Not all NoSQL solutions have the same limitations as HBase!
          • But invariably they all pose some challenge to traditional relational querying
      • NoSQL fundamentally encourages nested relationships
      • You have to plan for SQL access in advance
      • It is important to understand the NoSQL capabilities of your SQL solution thoroughly
      • There are more challenges than I have described here!
  • 21. Thank You!
      • Thanks for putting up with me
      • Questions?