2. What is BigQuery?
• BigQuery is a web service that lets you do
interactive analysis of massive datasets—up to
billions of rows. Scalable and easy to use,
BigQuery lets developers and businesses tap into
powerful data analytics on demand.
• BigQuery lets you run SQL-like queries for
business analysis against very large
datasets, potentially with billions of rows.
• It is an OLAP (online analytical processing) system,
not an OLTP (online transactional processing)
system like MySQL.
3. History of BigQuery
• Google uses Dremel (an internal project)
for all sorts of analysis and monitoring on
big data (Search, YouTube, AdWords, Gmail).
BigQuery is the externalization of Dremel.
• Hence, BigQuery gives us a stable platform
that has been tested extensively over the years
and carries the credibility of Google.
4. How fast and scalable is it?
• It can scan 35 billion rows without an index
in tens of seconds.
• Dremel, the cloud-powered massively parallel
query service, shares Google’s infrastructure,
so it can parallelize each query and run it on
tens of thousands of servers simultaneously.
5. So how is it so fast?
• The basic architecture that makes it so fast
includes the following components:
– Columnar Storage: Data is stored column by column
rather than row by row, so a query reads only the
columns it references. The column data is spread
across many (even thousands of) commodity servers.
– Tree Architecture is used for dispatching queries
and aggregating results across thousands of
machines in a few seconds.
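The two ideas above can be illustrated with a toy, single-process sketch. It is not Dremel's implementation: the function names (`to_columns`, `split_into_shards`, `tree_sum`), the shard count, and the fan-out are assumptions chosen for illustration. It only shows that an aggregate over one column needs that column alone, and that partial results from many leaves can be combined level by level.

```python
from collections import defaultdict


def to_columns(rows):
    """Pivot row records into one list per column (a columnar layout)."""
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)


def split_into_shards(values, n_shards):
    """Split one column into contiguous shards, one per hypothetical leaf server."""
    size = max(1, -(-len(values) // n_shards))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def tree_sum(shards, fanout=2):
    """Leaves sum their own shard; each tree level then combines `fanout` partial sums."""
    partials = [sum(shard) for shard in shards]
    while len(partials) > 1:
        partials = [sum(partials[i:i + fanout])
                    for i in range(0, len(partials), fanout)]
    return partials[0] if partials else 0


# Hypothetical sample data, not taken from any real dataset.
rows = [
    {"title": "A", "views": 3, "lang": "en"},
    {"title": "B", "views": 5, "lang": "de"},
    {"title": "C", "views": 7, "lang": "en"},
    {"title": "D", "views": 11, "lang": "fr"},
    {"title": "E", "views": 13, "lang": "en"},
]

columns = to_columns(rows)
# Only the "views" column is touched; "title" and "lang" are never read.
total_views = tree_sum(split_into_shards(columns["views"], n_shards=3))
print(total_views)
```

In a real system each shard would live on a different machine and the partial sums would travel over the network; here everything runs in one process to keep the sketch self-contained.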
6. How to access it?
It is very simple to get started, with no need for
extensive programming knowledge. It can be
accessed via:
• Web browser
• Command-line tool
• REST API, using client libraries in Java, Python,
etc.
7. Pricing
• Free to start with, like Google App Engine.
• You pay only if usage exceeds a threshold; pricing is
then based on:
– Storage ($0.12 per GB/month)
– Query processing (charged only for the data in the
columns a query reads, not for entire tables):
• Batch queries ($0.02 per GB processed)
• Interactive queries ($0.035 per GB processed)
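As a rough worked example, the rates listed above can be turned into a small estimator. This is a sketch under assumptions: the function names and the per-column sizes are hypothetical, the free threshold is ignored because the slide does not state it, and the rates are simply the ones on this slide.

```python
# Rates as listed on this slide (USD).
STORAGE_PER_GB_MONTH = 0.12
BATCH_PER_GB = 0.02
INTERACTIVE_PER_GB = 0.035


def processed_gb(column_sizes_gb, referenced_columns):
    """Data processed by a query: only the columns it references count."""
    return sum(column_sizes_gb[name] for name in referenced_columns)


def estimate_monthly_cost(stored_gb, batch_gb=0.0, interactive_gb=0.0):
    """Estimated monthly bill, ignoring any free threshold (not stated on the slide)."""
    cost = (stored_gb * STORAGE_PER_GB_MONTH
            + batch_gb * BATCH_PER_GB
            + interactive_gb * INTERACTIVE_PER_GB)
    return round(cost, 2)


# Hypothetical table: 100 GB in total, spread over four columns.
column_sizes_gb = {"title": 10, "text": 80, "author": 6, "timestamp": 4}

# A query that reads only `title` and `timestamp` processes 14 GB, not 100 GB.
per_query_gb = processed_gb(column_sizes_gb, ["title", "timestamp"])

# 100 GB stored plus 50 such interactive queries in a month.
monthly = estimate_monthly_cost(stored_gb=100, interactive_gb=50 * per_query_gb)
print(per_query_gb, monthly)
```

The point of `processed_gb` is that the query charge follows the columns read rather than the table size, which is why selecting fewer columns lowers the bill.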
8. Comparison with MapReduce
• BigQuery is suitable for OLAP (Online Analytical
Processing) or BI (Business Intelligence) usage,
where most of the queries are simple and done
through a quick aggregation and filtering by a set
of columns (dimensions).
• Best for ad hoc queries or trial-and-error data
analysis.
• MapReduce is a better choice when you want to
process unstructured data programmatically or if
you need to output gigabytes of data, as in the
case of merging two big tables.
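To make the "merging two big tables" case concrete, here is a toy, in-memory sketch of a reduce-side join in the MapReduce style. It is an illustration only: the function names and sample records are hypothetical, and a real MapReduce job would distribute the map, shuffle, and reduce phases across many machines.

```python
from collections import defaultdict
from itertools import product


def map_phase(table_name, records, key_field):
    """Emit (join key, (source table, record)) pairs."""
    for record in records:
        yield record[key_field], (table_name, record)


def shuffle(pairs):
    """Group all emitted values by join key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values, left_name, right_name):
    """For one key, pair every left record with every right record."""
    left = [rec for name, rec in values if name == left_name]
    right = [rec for name, rec in values if name == right_name]
    for l_rec, r_rec in product(left, right):
        yield {**l_rec, **r_rec}


def mapreduce_join(left, right, key_field):
    """Inner join of two record lists on `key_field`."""
    pairs = list(map_phase("left", left, key_field))
    pairs += list(map_phase("right", right, key_field))
    joined = []
    for values in shuffle(pairs).values():
        joined.extend(reduce_phase(values, "left", "right"))
    return joined


# Hypothetical sample tables.
users = [{"id": 1, "name": "asha"}, {"id": 2, "name": "ravi"}]
orders = [{"id": 1, "item": "book"}, {"id": 1, "item": "pen"}, {"id": 3, "item": "cup"}]

joined = mapreduce_join(users, orders, "id")
print(joined)
```

The output of such a join can be as large as the inputs or larger, which is the kind of bulk result the slide says suits MapReduce better than an interactive query.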
9. Demo
• Visit https://bigquery.cloud.google.com/
• Sample Query:
SELECT title, COUNT(title) as count
FROM publicdata:samples.wikipedia
WHERE (REGEXP_MATCH(title,r'\w\w\'\w\w'))
GROUP BY title
ORDER BY count DESC;
This matches "ne'er", "we'll", "speak'st", "you'll" and so on, scanning 313 million rows
within a few seconds.
Here publicdata is the project ID, samples is the dataset, and wikipedia is the table name.
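The filter, group, and sort logic of the sample query can be tried locally on a handful of strings. This is a sketch under assumptions: the pattern used (two word characters, an apostrophe, two word characters) is inferred from the matches listed above, the function name `count_matching_titles` and the sample titles are hypothetical, and Python's `re.search` stands in for REGEXP_MATCH on the assumption that a partial match is enough.

```python
import re
from collections import Counter

# Two word characters, an apostrophe, then two word characters.
PATTERN = re.compile(r"\w\w'\w\w")


def count_matching_titles(titles):
    """Keep titles containing the pattern, count each one, sort by count descending."""
    counts = Counter(title for title in titles if PATTERN.search(title))
    return counts.most_common()


# Hypothetical sample input, not real Wikipedia data.
titles = ["we'll", "ne'er", "we'll", "hello", "it's", "speak'st"]

for title, count in count_matching_titles(titles):
    print(title, count)
```

Note that "it's" is filtered out because only one word character follows the apostrophe, while "speak'st" passes because the pattern may match anywhere inside the string.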
10. Have a good day at
BarCamp Bangalore 2013
Deepak Singhal
deepakagra@gmail.com
in.linkedin.com/in/deepakagra