The document discusses using Google Cloud Platform for big data applications. It provides examples of how various companies are using GCP products like BigQuery, Dataflow, and Cloud Storage to gain insights from large, diverse datasets. Specifically, it outlines how marketing analytics, sensor data from IoT, log and system data, SaaS applications, and traditional Hadoop workloads can benefit from GCP's scalable and easy-to-use infrastructure for capturing, processing, and analyzing big data.
Google at the Big Data Russia conference
1. Big Data with Google Cloud Platform
Focus on insight, not infrastructure
Google confidential │ Do not distribute
Daniel Bergqvist
Solution Engineer, Big Data Technologies
Olga Strelova
Cloud Platform Sales,
Tel: +7 495 734-71-41, olgastrelova@google.com
3. Big Data is driving Big Value
Used data from telematic
sensors in over 46K vehicles
to:
● Reduce daily routes by
85 million miles
● Save 8.4 million gallons
of fuel
● Save over $30 million
in miles cut/driver/day
Created Snapshot device to
collect data on driving habits
and user behavior in real-time
Calculated applicable discount
to driver’s monthly premium
based on their individual
behavior
Analyzed the activity of their
entire customer base (over
7M customers and 19B
images)
Uncovered trends that
improved customer
acquisition, retention and
value through optimized
marketing
4. Trends
Increasing Digitization
of Human & Economic
Activity
Falling Costs of
Storage & Computing
Increasing Pace of
Innovation
5. Opportunities with Big Data
1. Recognize and seize market trends before your competitors
2. Capture business value from information
3. Create a smarter, learning organization
6. Big Data remains inaccessible
Big Data is Hard
● Complex technical infrastructure to support distributed computing
● Requires specialized expertise
● Time consuming
Big Data is Expensive
● Storage costs scale with larger datasets
● Computing resources must be provisioned for peak loads
● Personnel are expensive
7. Google is making Big Data accessible
Easy, not Hard
● No complex data architecture required
● Use the technical and product skillsets you already have
● Query within seconds and get real-time results
Affordable, not Expensive
● Pay on-demand for only the resources you use
● Take advantage of falling prices & Moore’s Law
● Reduce infrastructure management burden
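The contrast between provisioning for peak loads and paying on demand can be made concrete with a little arithmetic. The sketch below is illustrative only: the hourly demand profile and the unit price are hypothetical assumptions, not Google pricing.

```python
# Hypothetical demand: machines needed during each hour of one day.
hourly_demand = [2, 2, 2, 2, 3, 5, 8, 12, 20, 20, 18, 15,
                 15, 16, 18, 20, 14, 10, 8, 6, 4, 3, 2, 2]
PRICE_PER_MACHINE_HOUR = 0.10  # assumed unit price, for illustration only


def peak_provisioned_cost(demand, price):
    """Cost when capacity is fixed at the peak for every hour."""
    return max(demand) * len(demand) * price


def on_demand_cost(demand, price):
    """Cost when you pay only for the machine-hours actually used."""
    return sum(demand) * price


peak = peak_provisioned_cost(hourly_demand, PRICE_PER_MACHINE_HOUR)
used = on_demand_cost(hourly_demand, PRICE_PER_MACHINE_HOUR)
print(f"peak-provisioned: ${peak:.2f}, on-demand: ${used:.2f}")
```

The gap between the two figures grows with how spiky the load is; a perfectly flat load would show no difference.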
11. Google Services in Numbers
Search
1B Searches/Month
>25% of F500 (GSA)
Android
1.5M+ activation per day
900+ M devices
YouTube
100 hours of video
uploaded per minute
G+
500M+ accounts;
135M+ active in stream
Apps
500M+ Gmail
Chrome
310M+ browser users
Maps & Earth
1B+ downloads; 200M+ mobile;
10M+ activations on iOS
Cloud Platform
4.75M+ apps; 250K+
developers
13. Google is a pioneer in Big Data
GFS, MapReduce, Bigtable, Dremel, Colossus, Flume, Spanner, MillWheel
[Timeline axis: 2002, 2004, 2006, 2008, 2010, 2012, 2013]
14. We help you manage the entire lifecycle of Big Data
Stages: Capture, Store, Process, Analyze
Products shown: Pub/Sub, Storage, SQL, Datastore, Dataflow, BigQuery, Open Source Tools
15. Our Big Data products
Computing Patterns
Cloud Pub/Sub
• Event management system that simplifies analytics application architecture
• Connect your services with reliable, many-to-many asynchronous messaging
• Guarantees that messages will be delivered whether or not all consumers are online
• Provides a single global ingestion point, not dependent on zone or regional availability
• Scales to what you need with no wasted capacity
Cloud Dataflow
• Successor to MapReduce and based on Google technologies, including Flume and MillWheel
• Fully managed service
• Create data pipelines that ingest, transform and analyze in batch or streaming mode
• Takes care of deploying, maintaining and scaling infrastructure
BigQuery
• Interactive analysis of large scale datasets, providing real-time insights
• Run fast SQL queries against virtually limitless datasets in seconds
• Full visibility and control with pricing, only pay for querying and storage
• No complex data architecture required
Open Source Tools
• Run Hadoop and other FOSS on Cloud platform; take advantage of performance, ease of use and cost efficiency
• Using cloud resources eliminates capital costs and reduces administration time
• With one command line, start a cluster running Hadoop, Hive, Pig, Spark or Shark in order to get up and running quickly and without worrying about configuration hassles
• Using GCP storage products allows you to take advantage of accessing data within any Hadoop deployment
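The many-to-many, deliver-even-when-offline behaviour described for Cloud Pub/Sub can be illustrated with a toy in-memory model. This is a conceptual sketch only, not the Cloud Pub/Sub client API: the `Topic` class and its methods are hypothetical, and real systems add acknowledgements, retries and persistence that are omitted here.

```python
from collections import deque


class Topic:
    """Toy in-memory topic: one retained queue per named subscription."""

    def __init__(self):
        self._queues = {}

    def subscribe(self, name):
        # Messages published after this call are retained for `name`.
        self._queues.setdefault(name, deque())

    def publish(self, message):
        # Fan out: every subscription gets its own copy of the message.
        for queue in self._queues.values():
            queue.append(message)

    def pull(self, name, max_messages=10):
        # A consumer that was offline simply pulls its backlog later.
        queue = self._queues[name]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch


topic = Topic()
topic.subscribe("analytics")
topic.subscribe("alerting")

# Two independent publishers write to the same topic.
topic.publish({"source": "web", "event": "click"})
topic.publish({"source": "mobile", "event": "purchase"})

online_batch = topic.pull("analytics")   # consumer that is online now
late_batch = topic.pull("alerting")      # consumer that comes online later
```

Because each subscription keeps its own queue, a slow or offline consumer never blocks the others, which is the decoupling the slide refers to.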
17. 1. Marketing Analytics
Using Google Cloud Platform for marketing analytics enables a deeper understanding of how marketing investments are performing
What Cloud Platform offers:
● Easily micro-segment by looking for discrete patterns in large sets of customer data
● Measure campaigns by combining multiple datasets that can track campaigns across channels and users across stages of the buying funnel
● Market-mix modeling to optimize spend across channels
● Identify patterns and trends in real-time to improve customer acquisition and ROI
The Technology
Integration between Google Analytics Premium and BigQuery allows for data mashups, analysis of user interaction across multiple devices, and complex queries at lightning speed to gain deeper, broader insights
Cloud Dataflow helps you ingest and analyze data from live campaigns, existing CRM tools, and any other data sources you need
Open Source Tools and Connectors allow you to harness the power of many open-source tools such as Hadoop and Spark to provide flexibility when analyzing campaign data
BigQuery enables interactive analysis of unlimited amounts of data, allowing you to seize opportunities and optimize in a timely manner, thereby increasing acquisition and ROI
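Measuring campaigns by combining datasets comes down to a join plus an aggregate in SQL. The sketch below uses Python's built-in `sqlite3` as a local stand-in for BigQuery; the table names, columns and figures are hypothetical, and BigQuery's own SQL dialect and client library are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE spend (campaign TEXT, channel TEXT, cost REAL);
CREATE TABLE conversions (campaign TEXT, channel TEXT, revenue REAL);
INSERT INTO spend VALUES
  ('spring_sale', 'search', 100.0),
  ('spring_sale', 'email', 50.0),
  ('spring_sale', 'display', 200.0);
INSERT INTO conversions VALUES
  ('spring_sale', 'search', 180.0),
  ('spring_sale', 'search', 120.0),
  ('spring_sale', 'email', 200.0),
  ('spring_sale', 'display', 150.0);
""")

# Join spend with conversions and compute return on investment per channel.
query = """
SELECT s.channel,
       s.cost,
       SUM(c.revenue) AS revenue,
       (SUM(c.revenue) - s.cost) / s.cost AS roi
FROM spend AS s
JOIN conversions AS c
  ON c.campaign = s.campaign AND c.channel = s.channel
GROUP BY s.campaign, s.channel, s.cost
ORDER BY roi DESC
"""
rows = conn.execute(query).fetchall()
for channel, cost, revenue, roi in rows:
    print(f"{channel:8s} cost={cost:7.2f} revenue={revenue:7.2f} roi={roi:5.2f}")
```

Note that the inner join drops channels with spend but no conversions; a `LEFT JOIN` would keep them if that matters for the analysis.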
18. Boosting Sales While Improving Shopping Experience
Home furnishing retailer Rooms
To Go simplifies the consumer
shopping experience by offering
completely designed room
packages.
19. 2. Sensor Data & IoT
Using Google Cloud Platform for sensor data & IoT enables use of diffuse data sources to optimize large-scale systems & improve production processes
What Cloud Platform offers:
● Scalable, reliable platform for capturing and managing IoT data
● Ability to run analytics (streaming and historical) over this data
● Improve customer experiences based on faster responses to events
● Cost-effective storage needed to process vast amounts of data
The Technology
Google Cloud Storage, Cloud SQL, and Datastore provide scalable and secure ways to store data
Pub/Sub provides a reliable system for event collection and management
Dataflow lets you filter, aggregate and enrich data for both streaming and batch analysis under one API
BigQuery allows for interactive analysis of unlimited data to uncover trends in large databases and across all customers in order to improve customer experience
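The filter, enrich and aggregate steps can be sketched in plain Python so that the same chain runs over a list (batch) or a generator (stream). This is not the Dataflow SDK: the function names, the device-to-site table and the sensor readings are hypothetical, and a real streaming pipeline would emit each window as it closes rather than after the source is exhausted.

```python
from collections import defaultdict

# Hypothetical device metadata used to enrich raw readings.
DEVICE_SITE = {"dev-1": "plant-a", "dev-2": "plant-b"}


def keep_valid(readings):
    """Filter: drop readings with a missing or implausible temperature."""
    for r in readings:
        if r.get("temp_c") is not None and -50 <= r["temp_c"] <= 150:
            yield r


def enrich(readings):
    """Enrich: attach the site each device belongs to."""
    for r in readings:
        yield {**r, "site": DEVICE_SITE.get(r["device"], "unknown")}


def mean_by_site_and_window(readings, window_s=60):
    """Aggregate: mean temperature per (site, fixed time window start)."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in readings:
        key = (r["site"], r["ts"] // window_s * window_s)
        totals[key][0] += r["temp_c"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def pipeline(source):
    # The same chain accepts a list (batch) or a generator (stream).
    return mean_by_site_and_window(enrich(keep_valid(source)))


batch = [
    {"device": "dev-1", "ts": 5, "temp_c": 20.0},
    {"device": "dev-1", "ts": 30, "temp_c": 22.0},
    {"device": "dev-2", "ts": 40, "temp_c": None},   # filtered out
    {"device": "dev-2", "ts": 65, "temp_c": 30.0},
]
batch_result = pipeline(batch)
stream_result = pipeline(r for r in batch)  # generator standing in for a stream
```

Writing each step as a generator keeps the steps composable and lets the source type vary without touching the transformation logic.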
20. Connected Equipment/Devices
Lennox International Inc. is an American
company. Through its subsidiaries, it is a
provider of climate control products for the
heating, ventilation, air conditioning, and
refrigeration markets in housing and
commercial sectors around the world.
Goal: Capture detailed product performance
data and ambient conditions from the installed
units for better innovation and customer
service
● Innovation: Finding out areas for product
improvements and new designs
● Customer Delight: Proactively providing energy settings
advice to customers based on usage,
weather conditions, etc.
● Customer Service: Predictive maintenance to
avoid major breakdowns
● Cost Savings: Better understanding of failure
points feeding back into better design, helping
reduce warranty and replacement costs
21. 3. Log Data
Using Google Cloud Platform for Log Data enables easy management of massive log files while constantly ingesting real-time data with much shorter response times
What Cloud Platform offers:
● Better management of massive log files
● An efficient platform for capturing, managing and analyzing IoT infrastructure
● The ability to continuously identify customer trends and take timely actions
The Technology
BigQuery handles log files of massive volume, constantly ingesting real-time data with much shorter response times
Pub/Sub provides a fully managed service for reliable event ingestion, distribution and notifications, which automatically scales to what you need with no wasted capacity
Dataflow is a pipeline management system that allows you to examine a real-time stream of data as well as compare it to historical data in order to capture significant patterns and activities
Apps running in Compute Engine and App Engine benefit from advanced log analytics based on data streaming with real-time alerts
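Comparing a real-time stream against historical data to raise alerts can be sketched as a simple baseline check. The approach below (flagging values more than a few standard deviations from the historical mean) is one common technique chosen for illustration, not a description of how Dataflow works; the function name, threshold and log counts are all hypothetical.

```python
from statistics import mean, pstdev


def find_anomalies(history, live, threshold=3.0):
    """Return live (minute, count) pairs far from the historical mean.

    `history` is a list of past per-minute error counts; `live` is an
    iterable of (minute, count) pairs arriving in order.
    """
    baseline = mean(history)
    spread = pstdev(history) or 1.0  # avoid dividing by zero on flat history
    alerts = []
    for minute, count in live:
        z = (count - baseline) / spread
        if abs(z) > threshold:
            alerts.append((minute, count))
    return alerts


history = [10, 12, 11, 9, 10, 13, 11, 10]      # hypothetical past counts
live = [(0, 11), (1, 12), (2, 40), (3, 10)]    # hypothetical live stream
alerts = find_anomalies(history, live)
print(alerts)
```

In practice the baseline would itself be refreshed from the historical store on a schedule so that slow drifts do not trigger alerts.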
22. Motorola
[Architecture diagram. Components shown: Phones, App Engine, Cloud Storage, Compute Engine, Hadoop MapReduce Workflows, BigQuery Storage, BigQuery Workflows, BigQuery. Consumers: Business Analysts, Applications, Visualizations]
23. 4. SaaS
Using Google Cloud Platform for SaaS enables ease of management for analytics
What Cloud Platform offers:
● Ease of integration with open source tools
● A platform to capture, process and analyze large scale analytics without needing to worry about building a complex infrastructure
● Technology that scales and requires minimal administration
● The most cost-effective, fastest way to store and analyze data
The Technology
Connectors and Tools for Hadoop data sources allow you to easily install different open source processing frameworks such as Spark, Shark, Hive and Pig to take advantage of interoperability and portability within all these frameworks as well as other Google Cloud Platform products under one system
Dataflow takes care of ingestion, transformation and analysis of data, providing real-time access to application and consumer data across a set of devices
Compute Engine allows you to easily scale up and down depending on your workload. Also, per-minute billing lets you pay for exactly what you use, and sustained-use discounts automatically reward you for running steady-state workloads
BigQuery provides a 99.9% uptime SLA and you only pay for the storage you need and queries you run, giving you full visibility and control
Cloud Storage and BigQuery require no hardware or software, eliminating capital expenditure and the need to build complex infrastructure
24. Streak - CRM in email
Managing millions of interactions and recommendations/day with Prediction API and BigQuery
25. 5. Traditional Hadoop Workloads
Using Google Cloud Platform for Hadoop Workloads enables an easy and effective way to unlock the power of the Apache Hadoop framework
What Cloud Platform offers:
● Quick startup times
● Unmatched value with per-minute billing to optimize for scale and speed
● Agility to mix and match data with multiple open source software and cloud services without worrying about configuration
● Greater stability for running Hadoop
● Flexibility and control of resizing your cluster depending on workload
● An easy way to leverage the Hadoop framework without worrying about investing in costly infrastructure and administration
The Technology
Compute Engine virtual machines start in seconds
bdutil allows you to easily deploy and use the best tools from the open-source ecosystem. With one command line, you can start a cluster running Hadoop, Hive, Pig, Spark or Shark in order to get up and running quickly without worrying about configuration hassles
Cloud Storage frees you from the burden of investing in complex disks and machines and provides flexibility to scale up and down when needed
Connectors provide access to Cloud Storage, BigQuery and Datastore, which allow you to turn down your cluster without losing any of your data and take advantage of accessing your data within any of your Hadoop deployments
26. Cdiscount.com
France's largest e-commerce site, Cdiscount.com, is using Compute Engine because it's 15x faster than their on-premises data warehouse.
27. "Google probably processes more information than any company on the planet and tends to have to invent tools to cope with the data. As a result its technology runs a good five to 10 years ahead of the competition."
Bloomberg Businessweek, June 2014