SlideShare una empresa de Scribd logo
1 de 14
Section Slide Template Option 2
Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you.
Make the subtitle something clever. People will think it’s neat.
Google BigQuery 101 & What’s New
Vadim Solovey - CTO, DoIT International
Google Cloud Developer Expert | Authorized Trainer
vadim@doit-intl.com
DoIT International confidential │ Do not distribute
About me..
Vadim Solovey - CTO, DoiT International
Google Cloud Developer Expert | AWS Solutions Architect
vadim@doit-intl.com
DoIT International confidential │ Do not distribute
Agenda
Google BigQuery 101
Partitioned Tables
Standard SQL & New DML Statements
1
2
3
New Formats4
Cost Optimization
6 Q & A
5
DoIT International confidential │ Do not distribute
BigQuery 101
Google’s Highly Distributed Columnar Database optimized for Analytics
● Fully managed NoOps service
● Multi petabyte scale & zero sizing required
● Ingestion + Analytics + Storage + API
● No indexes, only full table scans (!)
● Pre-integrated with other Google Cloud services:
○ Dataproc (Hadoop/Spark)
DoIT International confidential │ Do not distribute
BigQuery 101
Continue...
● Supports nested and repeated fields/columns
● Google’s SQL Dialect
● Query results are cached for up to 24 hours (no charge)
● Charged for storage ($10-$20 per TB/month) and for data scans ($5/TB)
○ No idle costs
○ Highly cost optimizable
DoIT International confidential │ Do not distribute
Based on Dremel
Google File System (GFS)
Leaf Leaf Leaf Leaf Leaf Leaf
Mixer 1 Mixer 1
Mixer 0
BigQuery in 60 Seconds
● Long Lived Shared Tree
● Mixer = Master & Reducer
● Leaf = Mapper
● Partial Reduction
● Diskless Data flow
Columnar Storage
● Execution Independent
● Reduces Disk Time
DoIT International confidential │ Do not distribute
Demo
DoIT International confidential │ Do not distribute
What’s New
New features:
● Table Partitions
● Insert/Update/Delete DML
● Standard ANSI SQL 2011
● Identity and Access Management
● Stackdriver for Monitoring
● New data formats for import/export
DoIT International confidential │ Do not distribute
Table Partitions
New way to shard the data to minimize amount of data being scanned by a query:
● Integrated with Streaming API for easy partition creation and update
● _PARTITIONTIME pseudo column
● Current release supports partition by DAY
Creating partitioned table (using CLI)
● bq mk --time_partitioning_type=DAY mydataset.table1
● bq mk --time_partitioning_type=DAY --time_partitioning_expiration=259200 mydataset.table2
Accessing partitioned data:
● Query all partitions: SELECT * from mydataset.table
● Query specific partition: SELECT * from mydataset.table$20161109
● Query range: SELECT * FROM mydataset.table WHERE _PARTITIONTIME BETWEEN
TIMESTAMP('2016-01-01') AND TIMESTAMP('2016-01-02')
DoIT International confidential │ Do not distribute
Insert, Update & Delete DML
BigQuery is not append-only anymore ;-)
Data Manipulation Language (DML) supporting these statements:
● INSERT
● UPDATE
● DELETE
Every statement is implicit transactions, no multi-statement transactions yet.
Quotas:
● Maximum UPDATE/DELETE statements per day per table: 48
● Maximum UPDATE/DELETE statements per day per project: 500
● Maximum INSERT statements per day per table: 1,000
● Maximum INSERT statements per day per project: 10,000
DoIT International confidential │ Do not distribute
Standard SQL
Full ANSI SQL 2011
● With extensions to support nested and repeated fields
● ‘Legacy SQL’ is still supported
Set a desired dialect using prefix, i.e.:
● #legacySQL or #standardSQL
#standardSQL
SELECT
weight_pounds, state, year, gestation_weeks
FROM
`bigquery-public-data.samples.natality`
ORDER BY weight_pounds DESC
LIMIT 10;
DoIT International confidential │ Do not distribute
Import/Export Formats
Data is importable (and exportable) into/from the following formats:
● *CV files
● JSON
● AVRO
● PARQUET
DoIT International confidential │ Do not distribute
Cost Optimization Tips
Some query optimization strategies:
● Use CONTAINS() instead of REGEXP_MATCH(), where possible..
● Sometimes, the sample of data is enough. Use HASH() function to sample the data.
● Use JSON_EXTRACT() if you have raw, unstructured json data in your data
● Avoid nondeterministic queries, i.e. things like NOW() etc. to improve caching
● Don’t query the table which you stream data into (cache will be immediately invalidated)
● Keep query result < 128MB, otherwise it won’t get cached as well
● Use the __TABLES__ & __DATASET__ metadata table for house-keeping goals
DoIT International confidential │ Do not distribute
Are you paying too much?
BigQuery is a Columnar Datastore, and maximum performance is achieved on denormalized data sets:
● Pre-Filter with Destination Table when running many similar queries (in WHERE clause)
● Use static tables to optimize BigQuery’s cache
○ If streaming/uploading frequently, create daily/hourly ‘snapshots’ and query them instead of
primary table
● Always prefer storage over compute!
● Set TableExpiration on datasets/partitions for automatic data lifecycle management
● Fetch only required columns in your SELECT clause
● Use dryRun & EXPLAIN to find most cost efficient query
● Set Cost Controls to cap your BigQuery spending

Más contenido relacionado

La actualidad más candente

TDC2016SP - Trilha BigData
TDC2016SP - Trilha BigDataTDC2016SP - Trilha BigData
TDC2016SP - Trilha BigDatatdc-globalcode
 
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012Big Data Spain
 
Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperMárton Kodok
 
Streaming 4 billion Messages per day. Lessons Learned.
Streaming 4 billion Messages per day. Lessons Learned.Streaming 4 billion Messages per day. Lessons Learned.
Streaming 4 billion Messages per day. Lessons Learned.Angelos Petheriotis
 
BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...
BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...
BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...Big Data Spain
 
Azure Big Data Story
Azure Big Data StoryAzure Big Data Story
Azure Big Data StoryLynn Langit
 
NoSQL and MongoDB Introdction
NoSQL and MongoDB IntrodctionNoSQL and MongoDB Introdction
NoSQL and MongoDB IntrodctionBrian Enochson
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Database Choices
Database ChoicesDatabase Choices
Database ChoicesLynn Langit
 
Azure DocumentDB 101
Azure DocumentDB 101Azure DocumentDB 101
Azure DocumentDB 101Ike Ellis
 
Your data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the futureYour data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the futureObjectRocket
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Icebergkbajda
 
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...Rittman Analytics
 
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL ServerPhilly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL ServerMark Kromer
 

La actualidad más candente (20)

TDC2016SP - Trilha BigData
TDC2016SP - Trilha BigDataTDC2016SP - Trilha BigData
TDC2016SP - Trilha BigData
 
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
 
Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday Developer
 
Streaming 4 billion Messages per day. Lessons Learned.
Streaming 4 billion Messages per day. Lessons Learned.Streaming 4 billion Messages per day. Lessons Learned.
Streaming 4 billion Messages per day. Lessons Learned.
 
NoSQL for SQL Users
NoSQL for SQL UsersNoSQL for SQL Users
NoSQL for SQL Users
 
Practical Use of a NoSQL Database
Practical Use of a NoSQL DatabasePractical Use of a NoSQL Database
Practical Use of a NoSQL Database
 
BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...
BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...
BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...
 
Azure Big Data Story
Azure Big Data StoryAzure Big Data Story
Azure Big Data Story
 
NoSQL and MongoDB Introdction
NoSQL and MongoDB IntrodctionNoSQL and MongoDB Introdction
NoSQL and MongoDB Introdction
 
Practical Use of a NoSQL
Practical Use of a NoSQLPractical Use of a NoSQL
Practical Use of a NoSQL
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Database Choices
Database ChoicesDatabase Choices
Database Choices
 
Azure DocumentDB 101
Azure DocumentDB 101Azure DocumentDB 101
Azure DocumentDB 101
 
Hugfr SPARK & RIAK -20160114_hug_france
Hugfr  SPARK & RIAK -20160114_hug_franceHugfr  SPARK & RIAK -20160114_hug_france
Hugfr SPARK & RIAK -20160114_hug_france
 
Your data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the futureYour data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the future
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
Big data in Azure
Big data in AzureBig data in Azure
Big data in Azure
 
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
 
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL ServerPhilly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
 
MongoDB on Azure
MongoDB on AzureMongoDB on Azure
MongoDB on Azure
 

Destacado

AWS Athena vs. Google BigQuery for interactive SQL Queries
AWS Athena vs. Google BigQuery for interactive SQL QueriesAWS Athena vs. Google BigQuery for interactive SQL Queries
AWS Athena vs. Google BigQuery for interactive SQL QueriesDoiT International
 
Amazon Athena Hands-On Workshop
Amazon Athena Hands-On WorkshopAmazon Athena Hands-On Workshop
Amazon Athena Hands-On WorkshopDoiT International
 
Get more from Analytics 360 with BigQuery and the Google Cloud Platform
Get more from Analytics 360 with BigQuery and the Google Cloud PlatformGet more from Analytics 360 with BigQuery and the Google Cloud Platform
Get more from Analytics 360 with BigQuery and the Google Cloud Platformjavier ramirez
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
An overview of Amazon Athena
An overview of Amazon AthenaAn overview of Amazon Athena
An overview of Amazon AthenaJulien SIMON
 
Ensayo blogger def 2
Ensayo blogger def 2Ensayo blogger def 2
Ensayo blogger def 2AldoMaGe
 
الفيلم أداة للتدريس - التجربة الشخصية أثناء دراسة الماجستير
الفيلم أداة للتدريس - التجربة الشخصية أثناء دراسة الماجستيرالفيلم أداة للتدريس - التجربة الشخصية أثناء دراسة الماجستير
الفيلم أداة للتدريس - التجربة الشخصية أثناء دراسة الماجستيرProf. Sherif Shaheen
 
Superfunds Magazine - Ready to take on the world
Superfunds Magazine - Ready to take on the worldSuperfunds Magazine - Ready to take on the world
Superfunds Magazine - Ready to take on the worldChloe Tilley
 
Посібник "Конспекти уроків у 1 семестрі"
Посібник "Конспекти уроків у 1 семестрі"Посібник "Конспекти уроків у 1 семестрі"
Посібник "Конспекти уроків у 1 семестрі"sveta7940
 
Dmc roast
Dmc roastDmc roast
Dmc roastcdnpal
 

Destacado (11)

AWS Athena vs. Google BigQuery for interactive SQL Queries
AWS Athena vs. Google BigQuery for interactive SQL QueriesAWS Athena vs. Google BigQuery for interactive SQL Queries
AWS Athena vs. Google BigQuery for interactive SQL Queries
 
Amazon Athena Hands-On Workshop
Amazon Athena Hands-On WorkshopAmazon Athena Hands-On Workshop
Amazon Athena Hands-On Workshop
 
Get more from Analytics 360 with BigQuery and the Google Cloud Platform
Get more from Analytics 360 with BigQuery and the Google Cloud PlatformGet more from Analytics 360 with BigQuery and the Google Cloud Platform
Get more from Analytics 360 with BigQuery and the Google Cloud Platform
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
An overview of Amazon Athena
An overview of Amazon AthenaAn overview of Amazon Athena
An overview of Amazon Athena
 
Google BigQuery
Google BigQueryGoogle BigQuery
Google BigQuery
 
Ensayo blogger def 2
Ensayo blogger def 2Ensayo blogger def 2
Ensayo blogger def 2
 
الفيلم أداة للتدريس - التجربة الشخصية أثناء دراسة الماجستير
الفيلم أداة للتدريس - التجربة الشخصية أثناء دراسة الماجستيرالفيلم أداة للتدريس - التجربة الشخصية أثناء دراسة الماجستير
الفيلم أداة للتدريس - التجربة الشخصية أثناء دراسة الماجستير
 
Superfunds Magazine - Ready to take on the world
Superfunds Magazine - Ready to take on the worldSuperfunds Magazine - Ready to take on the world
Superfunds Magazine - Ready to take on the world
 
Посібник "Конспекти уроків у 1 семестрі"
Посібник "Конспекти уроків у 1 семестрі"Посібник "Конспекти уроків у 1 семестрі"
Посібник "Конспекти уроків у 1 семестрі"
 
Dmc roast
Dmc roastDmc roast
Dmc roast
 

Similar a Google BigQuery 101 & What’s New

ORACLE 12C-New-Features
ORACLE 12C-New-FeaturesORACLE 12C-New-Features
ORACLE 12C-New-FeaturesNavneet Upneja
 
Oracle 12 c new-features
Oracle 12 c new-featuresOracle 12 c new-features
Oracle 12 c new-featuresNavneet Upneja
 
Why PostgreSQL for Analytics Infrastructure (DW)?
Why PostgreSQL for Analytics Infrastructure (DW)?Why PostgreSQL for Analytics Infrastructure (DW)?
Why PostgreSQL for Analytics Infrastructure (DW)?Huy Nguyen
 
Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7Morgan Tocker
 
Encompassing Information Integration
Encompassing Information IntegrationEncompassing Information Integration
Encompassing Information Integrationnguyenfilip
 
OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...
OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...
OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...Altinity Ltd
 
MariaDB 10.4 New Features
MariaDB 10.4 New FeaturesMariaDB 10.4 New Features
MariaDB 10.4 New FeaturesFromDual GmbH
 
IT Tage 2019 MariaDB 10.4 New Features
IT Tage 2019 MariaDB 10.4 New FeaturesIT Tage 2019 MariaDB 10.4 New Features
IT Tage 2019 MariaDB 10.4 New FeaturesFromDual GmbH
 
4 Essential Checklist to Manage Drupal Projects
4 Essential Checklist to Manage Drupal Projects4 Essential Checklist to Manage Drupal Projects
4 Essential Checklist to Manage Drupal ProjectsSnake Hill Web Agency
 
[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analyticsGUSS
 
Jss 2015 in memory and operational analytics
Jss 2015   in memory and operational analyticsJss 2015   in memory and operational analytics
Jss 2015 in memory and operational analyticsDavid Barbarin
 
Activity Recognition project
Activity Recognition projectActivity Recognition project
Activity Recognition projectAndreaNapoletani
 
MySQL 8 - UKOUG Techfest Brighton December 2nd, 2019
MySQL 8 - UKOUG Techfest Brighton December 2nd, 2019MySQL 8 - UKOUG Techfest Brighton December 2nd, 2019
MySQL 8 - UKOUG Techfest Brighton December 2nd, 2019Dave Stokes
 
PHP Detroit -- MySQL 8 A New Beginning (updated presentation)
PHP Detroit -- MySQL 8 A New Beginning (updated presentation)PHP Detroit -- MySQL 8 A New Beginning (updated presentation)
PHP Detroit -- MySQL 8 A New Beginning (updated presentation)Dave Stokes
 
Plain english guide to drupal 8 criticals
Plain english guide to drupal 8 criticalsPlain english guide to drupal 8 criticals
Plain english guide to drupal 8 criticalsAngela Byron
 
Scaling Magento
Scaling MagentoScaling Magento
Scaling MagentoCopious
 
Complete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examplesComplete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examplesnicolascombin1
 
COUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_FeaturesCOUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_FeaturesAlfredo Abate
 

Similar a Google BigQuery 101 & What’s New (20)

ORACLE 12C-New-Features
ORACLE 12C-New-FeaturesORACLE 12C-New-Features
ORACLE 12C-New-Features
 
Oracle 12 c new-features
Oracle 12 c new-featuresOracle 12 c new-features
Oracle 12 c new-features
 
Why PostgreSQL for Analytics Infrastructure (DW)?
Why PostgreSQL for Analytics Infrastructure (DW)?Why PostgreSQL for Analytics Infrastructure (DW)?
Why PostgreSQL for Analytics Infrastructure (DW)?
 
GCP for AWS Professionals
GCP for AWS ProfessionalsGCP for AWS Professionals
GCP for AWS Professionals
 
Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
Encompassing Information Integration
Encompassing Information IntegrationEncompassing Information Integration
Encompassing Information Integration
 
OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...
OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...
OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...
 
MariaDB 10.4 New Features
MariaDB 10.4 New FeaturesMariaDB 10.4 New Features
MariaDB 10.4 New Features
 
IT Tage 2019 MariaDB 10.4 New Features
IT Tage 2019 MariaDB 10.4 New FeaturesIT Tage 2019 MariaDB 10.4 New Features
IT Tage 2019 MariaDB 10.4 New Features
 
4 Essential Checklist to Manage Drupal Projects
4 Essential Checklist to Manage Drupal Projects4 Essential Checklist to Manage Drupal Projects
4 Essential Checklist to Manage Drupal Projects
 
[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics
 
Jss 2015 in memory and operational analytics
Jss 2015   in memory and operational analyticsJss 2015   in memory and operational analytics
Jss 2015 in memory and operational analytics
 
Activity Recognition project
Activity Recognition projectActivity Recognition project
Activity Recognition project
 
MySQL 8 - UKOUG Techfest Brighton December 2nd, 2019
MySQL 8 - UKOUG Techfest Brighton December 2nd, 2019MySQL 8 - UKOUG Techfest Brighton December 2nd, 2019
MySQL 8 - UKOUG Techfest Brighton December 2nd, 2019
 
PHP Detroit -- MySQL 8 A New Beginning (updated presentation)
PHP Detroit -- MySQL 8 A New Beginning (updated presentation)PHP Detroit -- MySQL 8 A New Beginning (updated presentation)
PHP Detroit -- MySQL 8 A New Beginning (updated presentation)
 
Plain english guide to drupal 8 criticals
Plain english guide to drupal 8 criticalsPlain english guide to drupal 8 criticals
Plain english guide to drupal 8 criticals
 
Scaling Magento
Scaling MagentoScaling Magento
Scaling Magento
 
Complete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examplesComplete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examples
 
COUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_FeaturesCOUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_Features
 

Más de DoiT International

Terraform Modules Restructured
Terraform Modules RestructuredTerraform Modules Restructured
Terraform Modules RestructuredDoiT International
 
GAN training with Tensorflow and Tensor Cores
GAN training with Tensorflow and Tensor CoresGAN training with Tensorflow and Tensor Cores
GAN training with Tensorflow and Tensor CoresDoiT International
 
Orchestrating Redis & K8s Operators
Orchestrating Redis & K8s OperatorsOrchestrating Redis & K8s Operators
Orchestrating Redis & K8s OperatorsDoiT International
 
K8s best practices from the field!
K8s best practices from the field!K8s best practices from the field!
K8s best practices from the field!DoiT International
 
An Open-Source Platform to Connect, Manage, and Secure Microservices
An Open-Source Platform to Connect, Manage, and Secure MicroservicesAn Open-Source Platform to Connect, Manage, and Secure Microservices
An Open-Source Platform to Connect, Manage, and Secure MicroservicesDoiT International
 
Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?DoiT International
 
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingCloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
 
AWS Cyber Security Best Practices
AWS Cyber Security Best PracticesAWS Cyber Security Best Practices
AWS Cyber Security Best PracticesDoiT International
 
Running Production-Grade Kubernetes on AWS
Running Production-Grade Kubernetes on AWSRunning Production-Grade Kubernetes on AWS
Running Production-Grade Kubernetes on AWSDoiT International
 
Scaling Jenkins with Kubernetes by Ami Mahloof
Scaling Jenkins with Kubernetes by Ami MahloofScaling Jenkins with Kubernetes by Ami Mahloof
Scaling Jenkins with Kubernetes by Ami MahloofDoiT International
 
CI Implementation with Kubernetes at LivePerson by Saar Demri
CI Implementation with Kubernetes at LivePerson by Saar DemriCI Implementation with Kubernetes at LivePerson by Saar Demri
CI Implementation with Kubernetes at LivePerson by Saar DemriDoiT International
 
Kubernetes @ Nanit by Chen Fisher
Kubernetes @ Nanit by Chen FisherKubernetes @ Nanit by Chen Fisher
Kubernetes @ Nanit by Chen FisherDoiT International
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
 
Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)DoiT International
 

Más de DoiT International (15)

Terraform Modules Restructured
Terraform Modules RestructuredTerraform Modules Restructured
Terraform Modules Restructured
 
GAN training with Tensorflow and Tensor Cores
GAN training with Tensorflow and Tensor CoresGAN training with Tensorflow and Tensor Cores
GAN training with Tensorflow and Tensor Cores
 
Orchestrating Redis & K8s Operators
Orchestrating Redis & K8s OperatorsOrchestrating Redis & K8s Operators
Orchestrating Redis & K8s Operators
 
K8s best practices from the field!
K8s best practices from the field!K8s best practices from the field!
K8s best practices from the field!
 
An Open-Source Platform to Connect, Manage, and Secure Microservices
An Open-Source Platform to Connect, Manage, and Secure MicroservicesAn Open-Source Platform to Connect, Manage, and Secure Microservices
An Open-Source Platform to Connect, Manage, and Secure Microservices
 
Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?
 
Applying ML for Log Analysis
Applying ML for Log AnalysisApplying ML for Log Analysis
Applying ML for Log Analysis
 
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingCloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
 
AWS Cyber Security Best Practices
AWS Cyber Security Best PracticesAWS Cyber Security Best Practices
AWS Cyber Security Best Practices
 
Running Production-Grade Kubernetes on AWS
Running Production-Grade Kubernetes on AWSRunning Production-Grade Kubernetes on AWS
Running Production-Grade Kubernetes on AWS
 
Scaling Jenkins with Kubernetes by Ami Mahloof
Scaling Jenkins with Kubernetes by Ami MahloofScaling Jenkins with Kubernetes by Ami Mahloof
Scaling Jenkins with Kubernetes by Ami Mahloof
 
CI Implementation with Kubernetes at LivePerson by Saar Demri
CI Implementation with Kubernetes at LivePerson by Saar DemriCI Implementation with Kubernetes at LivePerson by Saar Demri
CI Implementation with Kubernetes at LivePerson by Saar Demri
 
Kubernetes @ Nanit by Chen Fisher
Kubernetes @ Nanit by Chen FisherKubernetes @ Nanit by Chen Fisher
Kubernetes @ Nanit by Chen Fisher
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data Processing
 
Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)
 

Último

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 

Último (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

Google BigQuery 101 & What’s New

  • 1. Section Slide Template Option 2 Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you. Make the subtitle something clever. People will think it’s neat. Google BigQuery 101 & What’s New Vadim Solovey - CTO, DoIT International Google Cloud Developer Expert | Authorized Trainer vadim@doit-intl.com
  • 2. DoIT International confidential │ Do not distribute About me.. Vadim Solovey - CTO, DoiT International Google Cloud Developer Expert | AWS Solutions Architect vadim@doit-intl.com
  • 3. DoIT International confidential │ Do not distribute Agenda Google BigQuery 101 Partitioned Tables Standard SQL & New DML Statements 1 2 3 New Formats4 Cost Optimization 6 Q & A 5
  • 4. DoIT International confidential │ Do not distribute BigQuery 101 Google’s Highly Distributed Columnar Database optimized for Analytics ● Fully managed NoOps service ● Multi petabyte scale & zero sizing required ● Ingestion + Analytics + Storage + API ● No indexes, only full table scans (!) ● Pre-integrated with other Google Cloud services: ○ Dataproc (Hadoop/Spark)
  • 5. DoIT International confidential │ Do not distribute BigQuery 101 Continue... ● Supports nested and repeated fields/columns ● Google’s SQL Dialect ● Query results are cached for up to 24 hours (no charge) ● Charged for storage ($10-$20 per TB/month) and for data scans ($5/TB) ○ No idle costs ○ Highly cost optimizable
  • 6. DoIT International confidential │ Do not distribute Based on Dremel Google File System (GFS) Leaf Leaf Leaf Leaf Leaf Leaf Mixer 1 Mixer 1 Mixer 0 BigQuery in 60 Seconds ● Long Lived Shared Tree ● Mixer = Master & Reducer ● Leaf = Mapper ● Partial Reduction ● Diskless Data flow Columnar Storage ● Execution Independent ● Reduces Disk Time
  • 7. DoIT International confidential │ Do not distribute Demo
  • 8. DoIT International confidential │ Do not distribute What’s New New features: ● Table Partitions ● Insert/Update/Delete DML ● Standard ANSI SQL 2011 ● Identity and Access Management ● Stackdriver for Monitoring ● New data formats for import/export
  • 9. DoIT International confidential │ Do not distribute Table Partitions New way to shard the data to minimize amount of data being scanned by a query: ● Integrated with Streaming API for easy partition creation and update ● _PARTITIONTIME pseudo column ● Current release supports partition by DAY Creating partitioned table (using CLI) ● bq mk --time_partitioning_type=DAY mydataset.table1 ● bq mk --time_partitioning_type=DAY --time_partitioning_expiration=259200 mydataset.table2 Accessing partitioned data: ● Query all partitions: SELECT * from mydataset.table ● Query specific partition: SELECT * from mydataset.table$20161109 ● Query range: SELECT * FROM mydataset.table WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2016-01-01') AND TIMESTAMP('2016-01-02')
  • 10. DoIT International confidential │ Do not distribute Insert, Update & Delete DML BigQuery is not append-only anymore ;-) Data Manipulation Language (DML) supporting these statements: ● INSERT ● UPDATE ● DELETE Every statement is implicit transactions, no multi-statement transactions yet. Quotas: ● Maximum UPDATE/DELETE statements per day per table: 48 ● Maximum UPDATE/DELETE statements per day per project: 500 ● Maximum INSERT statements per day per table: 1,000 ● Maximum INSERT statements per day per project: 10,000
  • 11. DoIT International confidential │ Do not distribute Standard SQL Full ANSI SQL 2011 ● With extensions to support nested and repeated fields ● ‘Legacy SQL’ is still supported Set a desired dialect using prefix, i.e.: ● #legacySQL or #standardSQL #standardSQL SELECT weight_pounds, state, year, gestation_weeks FROM `bigquery-public-data.samples.natality` ORDER BY weight_pounds DESC LIMIT 10;
  • 12. DoIT International confidential │ Do not distribute Import/Export Formats Data is importable (and exportable) into/from the following formats: ● *CV files ● JSON ● AVRO ● PARQUET
  • 13. DoIT International confidential │ Do not distribute Cost Optimization Tips Some query optimization strategies: ● Use CONTAINS() instead of REGEXP_MATCH(), where possible.. ● Sometimes, the sample of data is enough. Use HASH() function to sample the data. ● Use JSON_EXTRACT() if you have raw, unstructured json data in your data ● Avoid nondeterministic queries, i.e. things like NOW() etc. to improve caching ● Don’t query the table which you stream data into (cache will be immediately invalidated) ● Keep query result < 128MB, otherwise it won’t get cached as well ● Use the __TABLES__ & __DATASET__ metadata table for house-keeping goals
  • 14. DoIT International confidential │ Do not distribute Are you paying too much? BigQuery is a Columnar Datastore, and maximum performance is achieved on denormalized data sets: ● Pre-Filter with Destination Table when running many similar queries (in WHERE clause) ● Use static tables to optimize BigQuery’s cache ○ If streaming/uploading frequently, create daily/hourly ‘snapshots’ and query them instead of primary table ● Always prefer storage over compute! ● Set TableExpiration on datasets/partitions for automatic data lifecycle management ● Fetch only required columns in your SELECT clause ● Use dryRun & EXPLAIN to find most cost efficient query ● Set Cost Controls to cap your BigQuery spending

Notas del editor

  1. Before we talk about the next generation stack, let’s look at the principles that underlie it.
  2. The storage is very high performance, distributed, high bandwidth storage platform Each leaf reads only few MB of data Mixers combine data along the tree