Google Developer Group (GDG) Sonargaon is a community-based group for developers focused on Google and related technologies. In this talk I cover Big Data and BigQuery, a topic I see as the future of analytics.
4. Big Data
Data that has three attributes (the three V's) can be 'Big Data': Volume, Velocity, Variety
BigQuery
A fast, economical, fully managed, cloud-based interactive query service for large-scale data analytics
5. How Big is B-I-G
YouTube: media data, 15+ exabytes (2017)
Amazon: inventory & customer data, 42 terabytes (2014)
Google: Gmail only, 18.5+ petabytes (2018)
Wikipedia: English articles, 10+ terabytes (2013)
6. 1. Generating big data reports requires expensive servers and skilled database administrators
2. Interacting with big data has been expensive, slow and inefficient
3. BigQuery changes all that, reducing the time and expense of querying data
4. Super-fast SQL queries: run queries on terabyte-scale data sets in seconds (4.7 TB of data took 2.5 sec.)
5. Scalable: i) store hundreds of terabytes; ii) pay only for what you use
6. Service for interactive analysis of massive datasets:
a) Query billions of rows: seconds to write, seconds to return
b) Uses a SQL-style query syntax
c) It's a service, accessed through a RESTful API
Why BigQuery
7. [
{
"mode": "NULLABLE",
"name": "version",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "amount",
"type": "NUMERIC"
}
]
Integer: 64-bit signed
Float
String: UTF-8 encoded, <64KB
Boolean: "true" or "false"
Timestamp: string (YYYY-MM-DD HH:MM:SS) or numeric (seconds from the UNIX epoch)
Schema & Data Types
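The schema on this slide can be used to check a row before loading it. The following is a minimal sketch in Python, not BigQuery's own validation logic: the helper names `validate_row` and `_is_valid` are hypothetical, and the type rules are simplified to the types listed on the slide.

```python
import json
from decimal import Decimal, InvalidOperation

# Schema from the slide, written as BigQuery-style JSON.
SCHEMA_JSON = """
[
  {"mode": "NULLABLE", "name": "version", "type": "INTEGER"},
  {"mode": "NULLABLE", "name": "amount", "type": "NUMERIC"}
]
"""

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def _is_valid(value, field_type):
    """Return True if value fits the given type (simplified, assumed rules)."""
    if field_type == "INTEGER":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (isinstance(value, int) and not isinstance(value, bool)
                and INT64_MIN <= value <= INT64_MAX)
    if field_type == "NUMERIC":
        if isinstance(value, bool):
            return False
        try:
            Decimal(str(value))
            return True
        except InvalidOperation:
            return False
    if field_type == "STRING":
        return isinstance(value, str)
    if field_type == "BOOLEAN":
        return isinstance(value, bool)
    return False  # types not covered by this sketch


def validate_row(row, schema):
    """Hypothetical helper: list the problems found in one row."""
    errors = []
    for field in schema:
        name = field["name"]
        value = row.get(name)
        if value is None:
            if field["mode"] != "NULLABLE":
                errors.append(f"{name}: missing required value")
        elif not _is_valid(value, field["type"]):
            errors.append(f"{name}: not a valid {field['type']}")
    return errors


schema = json.loads(SCHEMA_JSON)
print(validate_row({"version": 1, "amount": "19.99"}, schema))
print(validate_row({"version": "one", "amount": None}, schema))
```

Because both fields are NULLABLE, a missing `amount` is accepted, while a non-integer `version` is reported.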
8. 1. Project: All data in BigQuery belongs inside a project (set of users, APIs, authentication, billing information)
2. Dataset: Holds one or more tables (the lowest level of access control)
3. Table: Row-column structure that contains the actual data
4. Job: Used to start potentially long-running queries
[Diagram: a Project contains BigQuery jobs, team access, and Datasets; each Dataset contains Tables]
Project Hierarchy
9. 1. A table name is represented as follows:
Current project: <dataset>.<table name>
Different project: <project>:<dataset>.<table name>
e.g. lightcastle-data-testing:forecasting.sales
Datasets & Tables
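To make the naming rule concrete, here is a small Python sketch that splits a table name into its parts. `TableRef`, `parse_table_name` and `to_standard_sql` are hypothetical names used only for illustration; they are not part of any BigQuery client library.

```python
from typing import NamedTuple, Optional


class TableRef(NamedTuple):
    project: Optional[str]
    dataset: str
    table: str


def parse_table_name(name: str, default_project: Optional[str] = None) -> TableRef:
    """Split '<project>:<dataset>.<table>' or '<dataset>.<table>' into parts."""
    project = default_project
    rest = name
    if ":" in name:
        project, rest = name.split(":", 1)
    dataset, sep, table = rest.partition(".")
    if not (sep and dataset and table) or "." in table:
        raise ValueError(f"expected <dataset>.<table name>, got {name!r}")
    return TableRef(project, dataset, table)


def to_standard_sql(ref: TableRef) -> str:
    """Backtick-quoted, dot-separated form, as used in the demo query."""
    parts = [p for p in (ref.project, ref.dataset, ref.table) if p]
    return "`" + ".".join(parts) + "`"


ref = parse_table_name("lightcastle-data-testing:forecasting.sales")
print(ref)
print(to_standard_sql(ref))
```

A name without a project prefix falls back to `default_project`, which stands in for the "current project" on the slide.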
10. BigQuery supports the following formats for data loading:
Avro, CSV, TSV, JSON, ORC, Parquet, Cloud Datastore exports, Cloud Firestore exports
[Diagram: the BigQuery tool, a web browser, and the API each provide access to BigQuery]
Data Format & Accessing BigQuery
11. SELECT EXTRACT(YEAR FROM timestamp) AS year, country, SUM(amount) AS total
FROM `lightcastle-data-testing.forecasting.sales`
WHERE version = 1
GROUP BY EXTRACT(YEAR FROM timestamp), country
LIMIT 1000;
BigQuery Demo Using Web Interface
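The demo query groups sales by year and country and sums the amounts for version 1. To see what that aggregation produces without a BigQuery project, the sketch below runs an equivalent query locally with Python's built-in `sqlite3` module. This is a stand-in, not BigQuery: SQLite has no `EXTRACT`, so `strftime('%Y', ...)` is used instead, an `ORDER BY` is added to make the result order predictable, and the sample rows and the `yearly_totals` helper are invented for illustration.

```python
import sqlite3

# Invented sample rows: (timestamp, country, amount, version).
SAMPLE_ROWS = [
    ("2017-03-01 10:00:00", "BD", 100.0, 1),
    ("2017-07-15 12:30:00", "BD", 50.0, 1),
    ("2018-01-05 09:00:00", "BD", 70.0, 1),
    ("2018-02-10 08:00:00", "IN", 30.0, 1),
    ("2018-02-11 08:00:00", "IN", 999.0, 2),  # other version, filtered out
]


def yearly_totals(rows):
    """Sum amounts per (year, country) for version 1, using in-memory SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales "
            "(timestamp TEXT, country TEXT, amount REAL, version INTEGER)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
        cursor = conn.execute(
            """
            SELECT CAST(strftime('%Y', timestamp) AS INTEGER) AS year,
                   country,
                   SUM(amount) AS total
            FROM sales
            WHERE version = 1
            GROUP BY year, country
            ORDER BY year, country
            LIMIT 1000
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


for year, country, total in yearly_totals(SAMPLE_ROWS):
    print(year, country, total)
```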
12. Visualization Tools
1. Data Studio
2. Tableau
3. QlikView
4. Metric Insights
5. Jaspersoft
6. Bime
Analysis Using Google Data Studio
13. • CSV/JSON files must be split into chunks smaller than 1TB
• Split data into smaller files: easier error recovery; use a smaller data unit (day or month instead of year)
• Split tables by date: minimizes the cost of data scanned and the query time
• Denormalize your data
• When querying, select only the columns you need (SELECT name) instead of all columns (SELECT *)
A Few Best Practices
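Splitting by date can be sketched as a simple grouping step before loading. The Python example below buckets rows by day and names each bucket with a date suffix. The helper `split_by_day`, the `sales` base name and the `_YYYYMMDD` suffix are assumptions chosen for illustration; the rows are assumed to carry a `timestamp` string in the YYYY-MM-DD HH:MM:SS form shown on the schema slide.

```python
from collections import defaultdict
from datetime import datetime


def split_by_day(rows, base_table="sales"):
    """Group rows into per-day buckets keyed by a date-suffixed table name."""
    buckets = defaultdict(list)
    for row in rows:
        day = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        buckets[f"{base_table}_{day:%Y%m%d}"].append(row)
    return dict(buckets)


# Invented sample rows for illustration only.
rows = [
    {"timestamp": "2019-01-01 09:00:00", "amount": 10},
    {"timestamp": "2019-01-01 17:30:00", "amount": 25},
    {"timestamp": "2019-01-02 08:15:00", "amount": 40},
]

for table_name, chunk in sorted(split_by_day(rows).items()):
    print(table_name, len(chunk))
```

A query that only needs one day then touches one small table instead of a full year of data, which is the cost and time saving the slide refers to.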
14. • 1,000 import jobs per table per day
• 10,000 import jobs per project per day
• File size (for both CSV and JSON): 1GB for a compressed file, 1TB for an uncompressed file
• 10,000 files per import job
• 1TB per import job
BigQuery Data Load
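The per-job limits on this slide can be checked before submitting a load. The following Python sketch takes the numbers from the slide as given. The function `check_load_job` is hypothetical, and treating 1GB and 1TB as binary units (1024-based) is an assumption.

```python
GB = 1024**3  # assumption: binary units
TB = 1024**4

# Limits as listed on the slide.
MAX_FILES_PER_JOB = 10_000
MAX_BYTES_PER_JOB = 1 * TB
MAX_COMPRESSED_FILE = 1 * GB
MAX_UNCOMPRESSED_FILE = 1 * TB


def check_load_job(files):
    """Hypothetical pre-flight check.

    files: list of (size_in_bytes, is_compressed) tuples.
    Returns a list of human-readable limit violations (empty if none).
    """
    problems = []
    if len(files) > MAX_FILES_PER_JOB:
        problems.append(f"too many files: {len(files)} > {MAX_FILES_PER_JOB}")
    total = sum(size for size, _ in files)
    if total > MAX_BYTES_PER_JOB:
        problems.append(f"job too large: {total} bytes > {MAX_BYTES_PER_JOB}")
    for index, (size, compressed) in enumerate(files):
        limit = MAX_COMPRESSED_FILE if compressed else MAX_UNCOMPRESSED_FILE
        if size > limit:
            kind = "compressed" if compressed else "uncompressed"
            problems.append(f"file {index}: {kind} size {size} > {limit}")
    return problems


print(check_load_job([(500 * GB, False), (GB // 2, True)]))
print(check_load_job([(2 * GB, True)]))
```

The daily job counts (per table and per project) are not modeled here, since they depend on state outside a single job.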
15. • Use it when you have queries that run for more than five seconds
• Mainly used in data analytics
• BigQuery is good for scenarios where data does not change often
• Retailers using data to forecast product sales
• Ad targeting for the right customer segments
• Log analysis: making sense of computer-generated records
Use Cases of BigQuery
17. BigQuery Pricing Summary
Operation | Pricing | Details
Active storage | $0.020 per GB | The first 10 GB is free each month.
Long-term storage | $0.010 per GB | The first 10 GB is free each month.
BigQuery Storage API | $1.10 per TB | The BigQuery Storage API is not included in the free tier.
Streaming Inserts | $0.010 per 200 MB | You are charged for rows that are successfully inserted. Individual rows are calculated using a 1 KB minimum size.
Queries (on-demand) | $5.00 per TB | First 1 TB per month is free.
Queries (monthly flat-rate) | $10,000 per 500 slots | You can purchase additional slots in 500 slot increments.
Get $300 free credit to spend over 12 months
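The prices on this slide can be turned into a rough cost estimate. The Python sketch below uses only the rates and free tiers listed here. The function names are hypothetical, binary units are assumed for the 200 MB and 1 KB figures, and each free tier is assumed to apply once per month to the usage passed in.

```python
def monthly_storage_cost(active_gb, free_gb=10.0, price_per_gb=0.020):
    """Active storage: first 10 GB free, then $0.020 per GB."""
    return max(active_gb - free_gb, 0.0) * price_per_gb


def on_demand_query_cost(tb_scanned, free_tb=1.0, price_per_tb=5.00):
    """On-demand queries: first 1 TB per month free, then $5.00 per TB."""
    return max(tb_scanned - free_tb, 0.0) * price_per_tb


def streaming_insert_cost(row_sizes_bytes, price_per_200mb=0.010):
    """Streaming inserts: $0.010 per 200 MB, each row billed as at least 1 KB."""
    billed_bytes = sum(max(size, 1024) for size in row_sizes_bytes)
    return billed_bytes / (200 * 1024**2) * price_per_200mb


print(f"storage:   ${monthly_storage_cost(110):.2f}")
print(f"queries:   ${on_demand_query_cost(4.7):.2f}")
print(f"streaming: ${streaming_insert_cost([100] * 204_800):.2f}")
```

Under these assumptions, scanning 4.7 TB in a month (the query size quoted on the "Why BigQuery" slide) would cost about $18.50 on demand, since the first 1 TB is free.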