NoSQL Needs SomeSQL
DataWorks Summit
Hadoop Summit 2015
1.
© 2015 IBM Corporation
Hadoop Summit – San Jose 2015
NoSQL Needs SomeSQL
Scott C. Gray (sgray@us.ibm.com)
Senior Architect and STSM, Big SQL, Big Data Open Source
2.
Hadoop Summit – San Jose, CA – June 2015
Agenda
• SQL Overview
  • History
  • Pros and Cons
  • Challenges of SQL on Hadoop
• NoSQL Overview
  • History
  • Solving the Challenges
  • Advantages and Tradeoffs
• Conclusion and Questions
3.
Structured Query Language
Quick History on SQL (for NoSQL comparison later on)
• Developed in the 1970s by IBM
• Multiple commercial offerings by 1980
• Standardization began in 1986 and continues today
  • SQL:2011 is the most recent standard
• Defining characteristics:
  • Tabular (row/column storage)
  • Strict schema
  • Highly encourages a relational design
4.
Structured Query Language
What’s to Like?
• The obvious:
  • A well-known language
  • Ubiquitous use by IT and business
  • Standardization makes skills (and applications) easily transferable
  • Many, many tools available due to a relatively simple and common data model
• Relational model allows you to easily explore data relationships
  • Sales by part #
  • Sales by region
  • Sales by customer
  • …
5.
Structured Query Language
What’s to Like?
• The not-so-obvious:
  • Formal and strict modelling allows for very smart optimizations based upon
    • Data distribution (statistics)
    • Data size (bytes per row, rows per page, etc.)
    • Data type domains (value ranges, nullability, etc.)
    • Declared domains (CHECK constraints)
    • Formal relationships (referential constraints)
  • The database engine can make very smart query strategy decisions
6.
Structured Query Language
What’s NOT to Like?
• Typically not so efficient with sparse data
  • This is changing with modern columnar stores, but they have tradeoffs too
• Very rigid, simple data model makes modeling complex objects tedious
  • May take dozens of tables to model one “object” (e.g. an XML document)
  • Fetching one “object” now requires significant work to reconstruct (many joins)
• Evolving the data model can be non-trivial
  • E.g. changing a column’s type may require a table rebuild (and a rebuild of all dependent tables!)
• The relational model can make it difficult to be agile!
  • The structure of all data must be defined up front
7.
Structured Query Language
What about Apache Drill?
• All of this talk about schema inflexibility… but what about projects like Apache Drill?
• Apache Drill allows for efficient SQL queries against data without a schema*
  • *It at least needs to know how the data is encoded (e.g. JSON, XML, etc.)
  • It re-evaluates the structure of each “row” of data as it runs
  • Supports a number of NoSQL platforms (HBase, MongoDB, etc.)
• But this only addresses the flexibility of the query language, and still suffers from:
  • Difficulty making optimization decisions (they are making some strides here…)
  • Still paying a cost for joins (more on this coming up…)
  • You still may not be able to ask a “table” what its schema is
    • Lots of tooling relies upon this
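To make "re-evaluates the structure of each row as it runs" concrete, here is a minimal Python sketch of schema-on-read over JSON lines. It only illustrates the idea and is not how Apache Drill is implemented; the function names `scan` and `project` and the sample records are assumptions made for this example.

```python
import json


def scan(lines):
    """Parse each record on its own and report the columns it carries."""
    for line in lines:
        record = json.loads(line)
        # The "schema" is discovered per record, not declared up front.
        yield record, {name: type(value).__name__ for name, value in record.items()}


def project(lines, column):
    """Select one column; records that lack it simply yield None."""
    return [json.loads(line).get(column) for line in lines]


# Hypothetical data: the second record changes the type of "id" and adds "tags".
raw = ['{"id": 1, "name": "Ann"}', '{"id": "2", "tags": ["x"]}']

schemas = [schema for _, schema in scan(raw)]
names = project(raw, "name")
```

Because the structure is only known after each record is read, a tool cannot ask this "table" for its schema in advance, which is the limitation the slide points out.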
8.
SQL on Hadoop
The Great Promise
• In many ways, the architecture of Hadoop runs against the grain of relational processing
• Most data warehouses (DWs) rely heavily on controlled data placement
  • Data is explicitly partitioned across the cluster
  • A particular node “owns” a known subset of data
  • Partitioning tables on the same key(s) and on the same nodes allows for co-located processing
• The fundamental design of HDFS explicitly implements “random” data placement
  • No matter which node writes a block, there is no guarantee a copy will live on that node
  • Rebalancing HDFS can move blocks around
• So, no co-located processing without bending over backwards
• See my other session: Challenges of SQL on Hadoop, Thursday, 3:10pm – Grand Ballroom 220C
[Diagram: a query coordinator over Partitions A, B, and C, each holding co-located slices of tables T1 and T2, shown alongside HDFS]
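The co-location idea can be sketched in a few lines of Python: when two tables are partitioned on the same key with the same placement function, each node can join its own slices without moving any rows. The cluster size, the modulo placement function, and the sample tables are assumptions for illustration only, not taken from any particular data warehouse.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def node_for(key: int) -> int:
    """Deterministic placement: the same key always maps to the same node."""
    return key % NUM_NODES


def partition(rows, key_index=0):
    """Group rows by the node that owns their partitioning key."""
    placement = defaultdict(list)
    for row in rows:
        placement[node_for(row[key_index])].append(row)
    return placement


def colocated_join(t1_parts, t2_parts):
    """Join node by node; no row ever leaves the node that owns it."""
    joined = []
    for node in range(NUM_NODES):
        lookup = defaultdict(list)
        for row in t1_parts.get(node, []):
            lookup[row[0]].append(row)
        for row in t2_parts.get(node, []):
            for match in lookup.get(row[0], []):
                joined.append(match + row[1:])
    return sorted(joined)


# Hypothetical tables, both keyed on a customer id in column 0.
customers = [(1, "Ann"), (2, "Raj"), (3, "Lee"), (4, "Eva")]
orders = [(1, "bolt"), (3, "nut"), (3, "gear"), (4, "cog")]

result = colocated_join(partition(customers), partition(orders))
```

HDFS offers no equivalent of `node_for`: block placement is outside the engine's control, so this node-local join is not available without extra work.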
9.
SQL on Hadoop
Query Processing Without Data Placement
• Without co-location the options for join processing are limited
• Redistribution join
  • DB engines read and filter “local” blocks for each table
  • Records with the same key are shipped to the same node to be joined
  • In the worst case both joined tables are moved in their entirety!
  • Doesn’t really work well for non-equijoins (!=, <, >, etc.)
• Hash join
  • Smaller, or heavily filtered, tables are shipped to all other nodes
  • An in-memory hash table is used for very fast joins
  • Can still lead to a lot of network traffic to move the small table
[Diagram: DB engines on several nodes exchanging blocks of T1 and T2, with panels labeled “Broadcast Join” and “Hash Join”]
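The two strategies can be compared with a small Python simulation that counts how many rows cross the network. This is a toy model under stated assumptions: three nodes, rows placed arbitrarily (as HDFS might place them), and hypothetical `sales` and `regions` tables. It does not reflect any specific engine's implementation.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def redistribution_join(t1_blocks, t2_blocks):
    """Ship every row to the node chosen by its join key, then join locally.

    Returns the joined rows and the number of rows moved between nodes.
    """
    inbox1, inbox2 = defaultdict(list), defaultdict(list)
    moved = 0
    for inbox, blocks in ((inbox1, t1_blocks), (inbox2, t2_blocks)):
        for source, rows in blocks.items():
            for row in rows:
                target = row[0] % NUM_NODES
                if target != source:
                    moved += 1
                inbox[target].append(row)
    joined = [a + b[1:]
              for node in range(NUM_NODES)
              for a in inbox1[node] for b in inbox2[node] if a[0] == b[0]]
    return sorted(joined), moved


def broadcast_hash_join(big_blocks, small_blocks):
    """Copy the small table to every node and probe an in-memory hash table."""
    small_rows = [row for rows in small_blocks.values() for row in rows]
    # Each small-table row is copied to every node except the one it lives on.
    moved = len(small_rows) * (NUM_NODES - 1)
    table = defaultdict(list)
    for row in small_rows:
        table[row[0]].append(row)
    joined = [a + b[1:]
              for rows in big_blocks.values()
              for a in rows for b in table.get(a[0], [])]
    return sorted(joined), moved


# node id -> rows stored there; placement is arbitrary on purpose.
sales = {0: [(1, "bolt"), (5, "nut")], 1: [(2, "gear"), (1, "cog")], 2: [(5, "pin")]}
regions = {0: [(5, "west")], 1: [(1, "east")], 2: [(2, "north")]}

shuffle_rows, shuffle_moved = redistribution_join(sales, regions)
broadcast_rows, broadcast_moved = broadcast_hash_join(sales, regions)
```

Both functions return the same joined rows; only the movement cost differs. In this tiny example the broadcast moves more rows than the shuffle, which illustrates the slide's caveat that shipping even a small table to every node is not free.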
10.
Enter: NoSQL (“Not Only” SQL!)
History of NoSQL
• It’s older than SQL!
  • First database created in 1965 by TRW
  • IBM’s IMS (hierarchical database) created for NASA and the Apollo space program in 1966
• Advanced on Hadoop by Google’s BigTable papers
• Defining characteristics:
  • No pre-defined schema (a.k.a. late-binding, schema-on-read)
  • Designed for horizontal scale-out
  • Related data tends to be physically co-located or nested
  • Strongly encourages non-relational designs
  • Typically API-accessed (or path expressions)
11.
Solving the Relational on Hadoop Challenge
• We saw the challenges of relational joins on distributed data
• There isn't time to explore each NoSQL technology
• Let's focus on one popular technology (HBase) and explore how it can solve our relational woes, and the tradeoffs…
12.
HBase In One Slide

HBase is a popular key-value store for Hadoop
  • Client/server database
  • A table has no schema, just a name
  • All HBase tables are ordered and accessed by primary key
  • Each row can have zero or more name-value stores (“column family”)
  • Each column family can have zero or more name-value pairs
  • Names and values are just binary data; there are no data types!

Example table: MyTable
  Row key 123412
    Col family userinfo:      fname=Scott, lname=Gray, age=45, mobile=609-555-1212
    Col family changehistory: 20140721: fname=Scot; 20141103: age=44
  Row key 123746
    Col family userinfo:      fname=Mary, lname=Swanson, age=28, home=123-555-1212
    Col family changehistory: (empty)
  Row key 139442
    Col family userinfo:      fname=Kimi, lname=Räikkönen, age=34, team=Ferrari
    Col family changehistory: 20130911: team=Lotus; 20131007: age=33
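As a rough mental model (not the HBase client API), the table on this slide behaves like a sorted map of row key to column family to column name to value, with every name and value stored as raw bytes. The helper names `get` and `scan` below are hypothetical and only mimic the idea of a point lookup and an ordered scan.

```python
# Row key -> column family -> column name -> value. Everything is bytes.
my_table = {
    b"123412": {
        b"userinfo": {b"fname": b"Scott", b"lname": b"Gray",
                      b"age": b"45", b"mobile": b"609-555-1212"},
        b"changehistory": {b"20140721": b"fname=Scot", b"20141103": b"age=44"},
    },
    b"123746": {
        b"userinfo": {b"fname": b"Mary", b"lname": b"Swanson",
                      b"age": b"28", b"home": b"123-555-1212"},
        b"changehistory": {},
    },
    b"139442": {
        b"userinfo": {b"fname": b"Kimi", b"lname": "Räikkönen".encode("utf-8"),
                      b"age": b"34", b"team": b"Ferrari"},
        b"changehistory": {b"20130911": b"team=Lotus", b"20131007": b"age=33"},
    },
}

def get(table, row_key, family, column):
    """Point lookup; returns raw bytes, or None if any level is missing."""
    return table.get(row_key, {}).get(family, {}).get(column)

def scan(table):
    """Yield rows in row-key order, the only order the store guarantees."""
    for row_key in sorted(table):
        yield row_key, table[row_key]

# The store has no data types: the caller decides how to decode a value.
age = int(get(my_table, b"123412", b"userinfo", b"age").decode("ascii"))
row_keys = [key for key, _ in scan(my_table)]
```

Note that rows do not share a fixed set of columns: one row has `mobile`, another `home`, another `team`, and nothing in the table records which ones to expect.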
13.
Describing an HBase Table Relationally

Different database engines provide different mechanisms for describing HBase tables:
  • Describe how data is encoded in the table
  • Map the column family:column to relational column(s)
But some common HBase design patterns are difficult/impossible to describe relationally…

Big SQL example:

    CREATE HBASE TABLE MY_TABLE (
      C1 INT NOT NULL,
      C2 INT NOT NULL,
      C3 INT NOT NULL,
      C4 VARCHAR(10),
      C5 DECIMAL(5,2),
      C6 SMALLINT NOT NULL,
      CONSTRAINT PK1 PRIMARY KEY (C1, C2)
    )
    COLUMN MAPPING (
      KEY     MAPPED BY (C1, C2) ENCODING BINARY,
      CF:COL1 MAPPED BY (C3, C4) SEPARATOR '|' ENCODING STRING,
      CF:COL2 MAPPED BY (C5, C6) ENCODING SERDE 'com.myco.MyJSONSerDe'
    )
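To make the idea of a column mapping concrete, here is a small Python sketch of what packing several relational columns into one HBase key or cell could look like. It is an illustration only and does not reproduce Big SQL's actual on-disk encodings: the function names, the big-endian unsigned key layout, and the rule that an empty field means NULL are all assumptions.

```python
import struct

def encode_binary_key(c1, c2):
    """Pack two non-negative ints so that byte order matches numeric order."""
    return struct.pack(">II", c1, c2)   # assumed layout: big-endian, unsigned

def decode_binary_key(key):
    return struct.unpack(">II", key)

def encode_string_cell(values, separator="|"):
    """Join several column values into one delimited string cell."""
    return separator.join("" if v is None else str(v) for v in values).encode("utf-8")

def decode_string_cell(cell, types, separator="|"):
    """Split a cell and cast each field; an empty field is read back as NULL."""
    parts = cell.decode("utf-8").split(separator)
    return tuple(None if p == "" else cast(p) for p, cast in zip(parts, types))

cell = encode_string_cell((7, "abc"))            # C3 = 7, C4 = 'abc'
decoded = decode_string_cell(cell, (int, str))

# Rows sort by their encoded key, so the key encoding decides scan order.
keys = [(2, 1), (1, 300), (1, 2)]
scan_order = sorted(keys, key=lambda k: encode_binary_key(*k))
```

Even this toy version shows the tradeoff: a delimited string cannot distinguish NULL from an empty string, and a value containing the separator would be split incorrectly.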
14.
HBase Design Patterns: Getting Rid of the Join

One common HBase design pattern is to physically nest related data within its parent row
  • Take the typical department/employee relationship
  • Each employee may be its own column in an employees column family within the dept row
  • Reading the dept automatically reads the employees with it
  • No need for joins!

Example table: DepartmentEmployees
  Row key 0001
    Col fam dept_info: Name=Finance, Manager=Bob Smith, Address=451 St. Claire…, Phone=609-555-1212
    Col fam employees:
      287:  { fname: Glen, lname: Hanks, … }
      934:  { fname: Scott, lname: Anderson, … }
      16:   { fname: Brian, lname: Applebaum, … }
      1023: { fname: Jim, lname: Demes, … }
  Row key 0002
    Col fam dept_info: Name=Sales, Manager=Jane McClaren, Address=555 Bailey …, Phone=408-314-8234
    Col fam employees:
      287: { fname: Tom, lname: Donohue, … }
      934: { fname: Mary, lname: Swanson, … }
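A minimal Python sketch of the nesting pattern, assuming each employee is stored as a JSON document in a column named after the employee id. A plain dict stands in for the HBase table, and `read_department` is a hypothetical helper, not an HBase call. Only fields that appear on the slide are included.

```python
import json

department_employees = {
    "0001": {
        "dept_info": {"Name": "Finance", "Manager": "Bob Smith",
                      "Phone": "609-555-1212"},
        "employees": {
            "287": json.dumps({"fname": "Glen", "lname": "Hanks"}),
            "934": json.dumps({"fname": "Scott", "lname": "Anderson"}),
            "16": json.dumps({"fname": "Brian", "lname": "Applebaum"}),
            "1023": json.dumps({"fname": "Jim", "lname": "Demes"}),
        },
    },
    "0002": {
        "dept_info": {"Name": "Sales", "Manager": "Jane McClaren",
                      "Phone": "408-314-8234"},
        "employees": {
            "287": json.dumps({"fname": "Tom", "lname": "Donohue"}),
            "934": json.dumps({"fname": "Mary", "lname": "Swanson"}),
        },
    },
}

def read_department(table, dept_id):
    """One row read returns the department and all of its employees."""
    row = table[dept_id]
    employees = {emp_id: json.loads(doc)
                 for emp_id, doc in row["employees"].items()}
    return row["dept_info"], employees

info, staff = read_department(department_employees, "0001")
```

The single lookup by `dept_id` replaces what would be a department-to-employee join in a relational model.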
15.
HBase Design Patterns: Getting Rid of the Join

Another approach is to use the row key to force child data to be adjacent to the parent record
  • Asking for row key 0001 gives just the dept
  • Asking for keys >= 0001 and < 0002 gives dept + employees
  • Odds are very good dept + employees are physically adjacent on the same server

Example table: DepartmentEmployees
  Row key 0001 (dept_id):            Name=Finance, Manager=…
  Row key 0001/287 (dept_id/emp_id): fname=Glen, lname=Hanks
  Row key 0001/934 (dept_id/emp_id): fname=Scott, lname=Anderson
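The range-scan idea can be sketched with a sorted list and binary search. This is a stand-in for an ordered key-value store, not HBase itself; `get_row` and `scan` are hypothetical helpers, and the rows for department 0002 are added only to show where the scan stops.

```python
from bisect import bisect_left

rows = sorted([
    ("0001", {"Name": "Finance"}),
    ("0001/287", {"fname": "Glen", "lname": "Hanks"}),
    ("0001/934", {"fname": "Scott", "lname": "Anderson"}),
    ("0002", {"Name": "Sales"}),
    ("0002/287", {"fname": "Tom", "lname": "Donohue"}),
], key=lambda row: row[0])
keys = [key for key, _ in rows]

def get_row(key):
    """Exact-key lookup: returns just that row, or None."""
    i = bisect_left(keys, key)
    return rows[i][1] if i < len(keys) and keys[i] == key else None

def scan(start, stop):
    """Return all rows with start <= key < stop, in key order."""
    return rows[bisect_left(keys, start):bisect_left(keys, stop)]

dept_only = get_row("0001")
dept_with_employees = scan("0001", "0002")
```

Because "/" sorts before the digits, every `0001/...` key falls between `0001` and `0002`, so one contiguous scan returns the department followed by its employees.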
16.
NoSQL Design Tradeoffs

There are many other similar design approaches!
What are the tradeoffs for such designs vs. relational?

Advantages
Related data is always co-located, no network hop for a join
As data "shards", related data automatically stays together
Schema can trivially be extended in the future
  • Add new name/value pairs
  • Add new column families
  • Add new adjacent rows…
17.
NoSQL Design Tradeoffs

Disadvantages
Relationships tend to be one-way
  • What if I want to find the department a given employee is in?
  • May need to maintain multiple copies of the data
  • Cannot easily (efficiently) explore ad-hoc relationships
Difficult to model
  • Describing these data models to a relational engine is very difficult
  • Hive has limited/restrictive support for ad-hoc data in column families
  • Making the wrong choice can make SQL access impossible or limited
Query optimization
  • The developer is the query optimizer
  • The data model dramatically limits available optimizations
What's the schema??
  • Database schema cannot be determined from the database!
  • Tooling (data exploration/management) tends to need to be custom built
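The "one-way relationship" problem can be illustrated with a short Python sketch. With employees nested under departments, answering "which department is this employee in?" means either scanning every department or maintaining a second copy of the relationship by hand. The data, the employee ids for department 0002, and all function names here are hypothetical.

```python
# Employees nested under their department, keyed by employee id.
departments = {
    "0001": {"287": "Glen Hanks", "934": "Scott Anderson"},
    "0002": {"301": "Tom Donohue", "512": "Mary Swanson"},  # ids are made up
}

def find_dept_by_scan(table, emp_id):
    """Without an index, the reverse lookup touches every department row."""
    for dept_id, employees in table.items():
        if emp_id in employees:
            return dept_id
    return None

def build_reverse_index(table):
    """A second copy of the relationship: employee id -> department id."""
    return {emp_id: dept_id
            for dept_id, employees in table.items()
            for emp_id in employees}

def add_employee(table, index, dept_id, emp_id, name):
    """Every write must now update both copies, or they drift apart."""
    table.setdefault(dept_id, {})[emp_id] = name
    index[emp_id] = dept_id

emp_to_dept = build_reverse_index(departments)
add_employee(departments, emp_to_dept, "0002", "777", "New Hire")
```

Here the application code plays the role that a relational optimizer and its indexes would normally play, which is the point of "the developer is the query optimizer".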
18.
Why Not Just Model Relationally?

You can, of course, just model your data relationally
  • But, there is a good chance your data will not be co-located!
  • Every joined row may require a network hop to fetch
You’re back to most of the problems you were trying to solve!
  • Modelling complex objects is difficult
  • Re-assembling complex objects is expensive
  • Changing the data model is still a pain

Example tables:
  Department, row key 0001: Name=Finance, Manager=…
  Employee, row key 287:    fname=Glen, lname=Hanks, dept_id=0001
Example placement:
  Region server: Department 0001-0486, Employee 1-300
  Region server: Employee 301-999
  Region server: Department 0487-0923
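A small Python sketch of why relational modelling loses co-location, using the key ranges from the diagram on this slide. The region server names (`rs1` to `rs3`) and all employee/department pairs other than employee 287 in department 0001 are hypothetical; the code just counts how many employee-to-department lookups would land on a different server.

```python
# (server, table, first key, last key), following the slide's placement diagram.
REGIONS = [
    ("rs1", "Department", 1, 486),
    ("rs1", "Employee", 1, 300),
    ("rs2", "Employee", 301, 999),
    ("rs3", "Department", 487, 923),
]

def server_for(table, key):
    """Find which region server holds a given row of a given table."""
    for server, region_table, first, last in REGIONS:
        if region_table == table and first <= key <= last:
            return server
    raise KeyError((table, key))

def remote_lookups(employee_dept_pairs):
    """Count joined rows whose department lives on another server."""
    return sum(server_for("Employee", emp_id) != server_for("Department", dept_id)
               for emp_id, dept_id in employee_dept_pairs)

# (emp_id, dept_id) pairs; only (287, 1) comes from the slide.
pairs = [(287, 1), (450, 1), (120, 500), (700, 600)]
hops = remote_lookups(pairs)
```

In this made-up sample, three of the four joined rows need a network hop, because the two tables are split by their own keys rather than by the relationship between them.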
19.
So, All Is Lost Then?

All is not lost!
You can expose limited portions of your data model through SQL
Co-processors/batch jobs can maintain relational views of non-relational data
Some SQL solutions can model certain design patterns
  • Hive can capture an entire column family into a MAP
  • Big SQL allows for custom column decoders to map arbitrary data structures relationally
  • Drill can dig into certain complex column types
Mix-and-match relational design with what your SQL engine can do
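One way to picture a batch job that maintains a relational view of nested data is the flattening step below. It is a sketch under assumptions, not a co-processor or any engine's built-in feature: a dict stands in for the nested HBase table, employees are assumed to be JSON documents, and `flatten` is a hypothetical helper that emits rows for two relational tables.

```python
import json

nested = {
    "0001": {
        "dept_info": {"Name": "Finance", "Manager": "Bob Smith"},
        "employees": {
            "287": json.dumps({"fname": "Glen", "lname": "Hanks"}),
            "934": json.dumps({"fname": "Scott", "lname": "Anderson"}),
            "16": json.dumps({"fname": "Brian", "lname": "Applebaum"}),
            "1023": json.dumps({"fname": "Jim", "lname": "Demes"}),
        },
    },
    "0002": {
        "dept_info": {"Name": "Sales", "Manager": "Jane McClaren"},
        "employees": {
            "287": json.dumps({"fname": "Tom", "lname": "Donohue"}),
            "934": json.dumps({"fname": "Mary", "lname": "Swanson"}),
        },
    },
}

def flatten(table):
    """Turn nested department rows into DEPARTMENT and EMPLOYEE tuples."""
    dept_rows, emp_rows = [], []
    for dept_id in sorted(table):
        row = table[dept_id]
        info = row["dept_info"]
        dept_rows.append((dept_id, info.get("Name"), info.get("Manager")))
        for emp_id, doc in sorted(row.get("employees", {}).items()):
            emp = json.loads(doc)
            emp_rows.append((dept_id, emp_id, emp.get("fname"), emp.get("lname")))
    return dept_rows, emp_rows

dept_rows, emp_rows = flatten(nested)
```

The output rows could then be loaded into tables that a SQL engine understands, at the cost of keeping that copy up to date as the nested data changes.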
20.
Conclusion

Not all NoSQL solutions have the same limitations as HBase!
  • But invariably they all pose some challenge to traditional relational querying
NoSQL fundamentally encourages nested relationships
You have to plan for SQL access in advance
It is important to understand the NoSQL capabilities of your SQL solution thoroughly
There are more challenges than I have described here!
21.
Thank You!

Thanks for putting up with me
Questions?