Google BigQuery
About myself
Matthias Feys
work @Datatonic:
- big data (with Google Cloud)
- machine learning
- data visualizations (Tableau/Spotfire)
Google Qualified Cloud Developer
contact:
- @FsMatt
- matthias@datatonic.com
About Datatonic
Datatonic is a team of data science experts who help corporations unleash
the power of data. They use Google Cloud Platform, data visualisation
technologies (like Tableau or Spotfire) and machine learning to build
breakthrough solutions, either as expert advisors or as a fully
managed solution with end-to-end support (A3S).
Some references:
@teamdatatonic
This talk
● What is BigQuery?
● How does it scale/work?
● How does it compare to:
- NoSQL datastores
- MapReduce
● Demo
● Pricing Model
● Best Practices
What is BigQuery?
“BigQuery is a fully-managed
and cloud-based interactive
query service for
massive datasets.”
It’s the externalization
of Dremel, one of
Google’s core
technologies
What is BigQuery? (2)
BigQuery Service is available via:
● Web UI (bigquery.cloud.google.com)
● command line (bq tool, part of the Google Cloud SDK)
● API (+ client libraries)
● external tools (Tableau, Excel, …)
● ODBC connector
How Does it Scale?
Dremel Architecture
Data Model/Storage:
- Columnar Storage
- Nested/Repeated Fields
- No Index!
-> Single Full Table Scan (from disk)
Query Execution:
- Tree Architecture
- Using tens of thousands of
machines over the fast Google
network (more than 1 Petabit/s)
Columnar Storage
● Traffic minimization:
○ only read selected
columns
● Higher Compression Ratio:
○ Similar values in the
same column
○ Compression ratio improves from 1:3 to 1:10
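A minimal Python sketch of the two columnar-storage points above. The table, its column names and its values are made up for illustration, and the run-length encoding only stands in for whatever compression the real storage format applies.

```python
from itertools import groupby

# Toy table: each row is (user_id, country, revenue). All values are made up.
column_names = ["user_id", "country", "revenue"]
rows = [
    (1, "BE", 10.0),
    (2, "BE", 12.5),
    (3, "NL", 7.0),
    (4, "BE", 3.5),
]

# Columnar layout: one list per column instead of one tuple per row.
columns = {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def values_read_row_store(rows, wanted):
    """A row store touches every field of every row, whatever `wanted` is."""
    return sum(len(row) for row in rows)


def values_read_column_store(columns, wanted):
    """A column store only touches the columns named in the query."""
    return sum(len(columns[name]) for name in wanted)


# SELECT SUM(revenue): only the revenue column has to be read.
total_revenue = sum(columns["revenue"])

# Similar values sit next to each other in a column, so they compress well.
# Run-length encoding is used here purely as an illustration.
run_lengths = [(value, len(list(group))) for value, group in groupby(columns["country"])]

print(values_read_row_store(rows, ["revenue"]), "values read from a row store")
print(values_read_column_store(columns, ["revenue"]), "values read from a column store")
```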
Tree Architecture
- root server:
->receives query + reads table metadata
->rewrites the query (or queries)
->sends queries to the next level
<-returns final query results
- intermediate servers:
->(similar steps)
<-parallel partial aggregation
- leaf servers:
->actually scan (parts of) the table
<-send data to intermediate servers
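The tree above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Dremel's actual implementation: the function names, the shard layout and the query (a COUNT per country) are all hypothetical.

```python
from collections import Counter


def leaf_scan(shard):
    """Leaf server: scan one part of the table, return a partial COUNT per country."""
    return Counter(row["country"] for row in shard)


def merge_partials(partials):
    """Intermediate or root server: combine the partial aggregates of its children."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


def run_query(shard_groups):
    """Root server: one intermediate server per group of shards, then a final merge."""
    intermediate_results = [
        merge_partials(leaf_scan(shard) for shard in group) for group in shard_groups
    ]
    return merge_partials(intermediate_results)


# Two intermediate servers; the first has two leaves, the second has one.
shard_groups = [
    [[{"country": "BE"}, {"country": "NL"}], [{"country": "BE"}]],
    [[{"country": "FR"}, {"country": "BE"}]],
]
result = run_query(shard_groups)
print(dict(result))
```

Because each level only forwards partial aggregates, the amount of data moving up the tree stays small compared to the data scanned at the leaves.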
NoSQL Datastore vs. BigQuery?
NoSQL Datastore
● Index based
(expected queries)
● Read-write
BigQuery
● Non-index based
(ad hoc queries)
● Read-only (append-only)
MapReduce vs. BigQuery?
MapReduce
● High latency
● Flexible (complex)
batch processing
● Unstructured data
BigQuery
● Low latency
● SQL-like queries
● Structured data
Demos
Pricing Model
Category      Price                     Note
Storage Cost  $0.020 per GB, per month
Query Cost    $5 per TB                 1st TB per month is free
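A rough cost estimator built only from the prices on this slide. The function name is hypothetical, and the sketch assumes the free first TB applies per month to query volume only; current prices may differ.

```python
# Prices as listed on the slide.
STORAGE_USD_PER_GB_MONTH = 0.020
QUERY_USD_PER_TB = 5.0
FREE_QUERY_TB_PER_MONTH = 1.0


def estimate_monthly_cost(stored_gb, queried_tb):
    """Return the estimated monthly bill in USD for storage plus queries."""
    storage_cost = stored_gb * STORAGE_USD_PER_GB_MONTH
    # The first TB of query data each month is free.
    billable_tb = max(0.0, queried_tb - FREE_QUERY_TB_PER_MONTH)
    return storage_cost + billable_tb * QUERY_USD_PER_TB


# 1000 GB stored and 3 TB queried in one month.
print(round(estimate_monthly_cost(1000, 3), 2))
```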
Best Practices
Denormalize / Pre-Join Where Possible
● Best performance
● Only pay for the columns you need
● Nested/repeated fields!
(Figure: Relational Database Design, Denormalized, Nested/Repeated (JSON))
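A small Python sketch of the pre-join idea: two relational tables are merged into one table with a nested, repeated field and written as newline-delimited JSON. The table and field names (customers, orders, amount) are made up for illustration.

```python
import json

# Two normalized tables, joined on customer_id in a relational design.
customers = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 1, "amount": 5.0},
    {"order_id": 12, "customer_id": 2, "amount": 8.0},
]


def nest_orders(customers, orders):
    """Pre-join: attach each customer's orders as a repeated nested record."""
    by_customer = {}
    for order in orders:
        by_customer.setdefault(order["customer_id"], []).append(
            {"order_id": order["order_id"], "amount": order["amount"]}
        )
    return [
        {**customer, "orders": by_customer.get(customer["customer_id"], [])}
        for customer in customers
    ]


nested = nest_orders(customers, orders)

# One JSON object per line, so each customer row carries its own orders.
ndjson = "\n".join(json.dumps(record) for record in nested)
print(ndjson)
```

With this layout a query over customers and their orders needs no join at query time.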
Table Sharding
● You pay for what you read
→ Read less, pay less
● Table wildcards allow for easy reading over multiple tables
https://cloud.google.com/bigquery/query-reference#tablewildcardfunctions
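A sketch of daily sharding in Python. The dataset name, table prefix and column names are hypothetical; the generated query uses the legacy SQL TABLE_DATE_RANGE wildcard function from the page linked above, so only the shards inside the date range are read.

```python
from datetime import date, timedelta


def daily_shards(prefix, start, end):
    """List the per-day table names (prefix + YYYYMMDD) from start to end inclusive."""
    days = (end - start).days
    return [f"{prefix}{(start + timedelta(days=i)):%Y%m%d}" for i in range(days + 1)]


def wildcard_query(dataset, prefix, start, end, columns):
    """Build a legacy SQL query that reads only the shards in the date range."""
    return (
        f"SELECT {', '.join(columns)} "
        f"FROM TABLE_DATE_RANGE([{dataset}.{prefix}], "
        f"TIMESTAMP('{start.isoformat()}'), TIMESTAMP('{end.isoformat()}'))"
    )


shards = daily_shards("events_", date(2016, 1, 30), date(2016, 2, 1))
query = wildcard_query(
    "mydataset", "events_", date(2016, 1, 30), date(2016, 2, 1), ["user_id", "revenue"]
)
print(shards)
print(query)
```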
Optimize for Query vs. Storage Costs
Common Queries?
- Materialized views
(save intermediate results in tables)
with pre-aggregated data:
→ faster + cheaper queries
- Store data in multiple tables:
- table for daily data
- table for weekly data
- table for monthly data
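The pre-aggregation idea, sketched in Python with made-up numbers: the daily table is rolled up once into a monthly table, and common monthly queries then read the small table instead of rescanning every day. In practice the rollup would be a query whose result is saved to a table; the function name here is hypothetical.

```python
from collections import defaultdict

# Daily table: (day, page_views). Values are made up.
daily = [
    ("2016-01-30", 100),
    ("2016-01-31", 50),
    ("2016-02-01", 70),
]


def rollup_monthly(daily_rows):
    """Pre-aggregate daily rows into one row per month (key: YYYY-MM)."""
    totals = defaultdict(int)
    for day, value in daily_rows:
        totals[day[:7]] += value
    return dict(totals)


monthly = rollup_monthly(daily)
print(monthly)
```

The trade-off is the one named in the slide title: the extra tables cost storage, but each common query scans far less data.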
Narrow the Table Scans
You only pay for the columns you read
Don’t use “SELECT *” !!!
Table Decorators
Only way to avoid doing full table scans!
Allows undeleting tables
options:
● snapshot decorator + range decorator
● relative value + absolute values
https://cloud.google.com/bigquery/table-decorators
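A sketch of how decorated table references can be built, assuming the legacy SQL decorator syntax from the page linked above: `@<time>` for a snapshot and `@<time1>-<time2>` for a range, with times in milliseconds. Negative values are relative to now, positive values are absolute (since the Unix epoch). The helper names and the table name are hypothetical.

```python
def snapshot(table, time_ms):
    """Snapshot decorator: the table as it was at the given time."""
    return f"[{table}@{time_ms}]"


def time_range(table, start_ms, end_ms=None):
    """Range decorator: only data added between start and end (default: now)."""
    end = "" if end_ms is None else str(end_ms)
    return f"[{table}@{start_ms}-{end}]"


ONE_HOUR_MS = 60 * 60 * 1000

# Table as it was one hour ago (relative value).
print(snapshot("mydataset.events", -ONE_HOUR_MS))
# Only the data added during the last hour.
print(time_range("mydataset.events", -ONE_HOUR_MS))
# Data added between one hour ago and half an hour ago.
print(time_range("mydataset.events", -ONE_HOUR_MS, -ONE_HOUR_MS // 2))
```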
Query optimizations
Query Plan
https://cloud.google.com/bigquery/query-plan-explanation
Big Data Reference Architecture
Questions?
You can reach me at:
- mail: matthias@datatonic.com
- Twitter: @FsMatt
