Big Data Analytics
on

January 9th, 2014
GROW WITH BIG DATA.
Third Eye Consulting Services & Solutions
LLC.
For Questions
Tweet Directly to
@ThirdEyeCss
We are actively monitoring this Twitter
channel!
Agenda
1. 5 minutes
- Introductions
2. 15 minutes
- Introduction to the Google Cloud Platform & its various
Big Data services
3. 10 minutes
- Showcasing various Online Retail Analytics
- User, Site & Products Analytics
4. 15 minutes
- Live Demonstration
- Ingestion of session log data to visualization in Tableau
5. 15 minutes
- Q&A Session
(Can run longer, depending on audience enthusiasm & participation!)
Google Cloud Platform
Google Cloud Platform
– Key Components
• App Engine
• BigQuery
• Cloud SQL
• Cloud Storage
• Compute Engine

Tweet @ThirdEyeCss



https://cloud.google.com
App Engine - Architecture
A highly elastic, scale-on-demand infrastructure for deploying and running
front-end web applications
[Diagram: App Master; Front End Instances 1 to n; App Server Instances 1 to n; Datastore; Memcache; Static Files]

https://cloud.google.com/products/app-engine
App Engine - Advantages







Scales on Demand
Very low barrier for entry
No initial hardware costs
Scalability and reliability are handled by the platform
Can handle very large amounts of data
Can handle very large user volumes, including sudden
spikes by scaling elastically

https://cloud.google.com/products/app-engine
BigQuery


A column-oriented data store that can store and
process billions of rows of data



SQL-like query syntax for querying data



Run ad-hoc queries against multi-terabyte data
sets in seconds



Highly scalable, reliable and secure because it runs on
Google's core platform infrastructure

https://cloud.google.com/products/big-query
BigQuery


Supports all the main ETL and BI tools like
Informatica, Talend, QlikView and Tableau



Primarily used for real-time data analysis and
visualization



Integration with App Engine through APIs

https://cloud.google.com/products/big-query
BigQuery
SQL Access


Only SELECT operations



No CREATE, UPDATE or DROP



Analysis of unstructured data using REGEXP_* functions
(for example REGEXP_MATCH and REGEXP_EXTRACT)



JOINs of small (< 8 MB of compressed data) and large
tables are possible, with a performance penalty for
large-table joins

https://cloud.google.com/products/big-query
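
The idea behind the REGEXP-style functions can be tried out locally. The following is only an analogy written with Python's `re` module, not BigQuery syntax; the log line layout and the `extract_product` helper are assumptions made up for illustration.

```python
import re

# Hypothetical, unstructured log fragment; the layout is an assumption.
LINE = "2014-01-09 10:15:32 GET /products/vitamin-c?ref=google.com 200"

# Pull the product slug out of the request path, much as a REGEXP-style
# extract function would pull a substring out of a text column.
PRODUCT_RE = re.compile(r"/products/([a-z0-9-]+)")


def extract_product(line):
    """Return the product slug found in the line, or None if there is none."""
    match = PRODUCT_RE.search(line)
    return match.group(1) if match else None


print(extract_product(LINE))
```

The same pattern, moved into a query, would let raw log text stay in a single string column while still being grouped by product.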
BigQuery
Programmatic Access


bq command line tool, Google API client library,
REST API



Google API client library supports various languages
like Java, Python, JavaScript, Ruby, PHP, Google
Apps Script



Authentication is handled via OAuth 2.0



With the REST API, the user has to handle credentials
and HTTP requests manually

https://cloud.google.com/products/big-query
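
To give a feel for what "handled manually" means with the REST API, here is a minimal sketch that builds, but does not send, a query request using only the Python standard library. The endpoint path reflects the v2 REST API as best understood here and should be checked against the current documentation; the project ID, access token, table name and the `build_query_request` helper are all placeholders invented for this example.

```python
import json
import urllib.request


def build_query_request(project_id, access_token, sql):
    """Build (but do not send) an HTTP POST request that submits a query."""
    # Assumed endpoint shape; verify against the BigQuery REST reference.
    url = "https://www.googleapis.com/bigquery/v2/projects/%s/queries" % project_id
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            # The OAuth 2.0 access token must be obtained separately.
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values only; the table name is hypothetical.
req = build_query_request(
    "my-project", "ACCESS_TOKEN_PLACEHOLDER", "SELECT COUNT(*) FROM [mydataset.sessions]"
)
print(req.get_method(), req.full_url)
```

With the client libraries or the bq tool, the token handling and request construction shown here are done for you.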
BigQuery
Use Cases
• Can be used for batch analysis of large data sets
• Real-time analytics for dashboard-type applications
• Pre-process very large data sets and serve data in real time
• Visualization using third-party tools that call BigQuery APIs
https://cloud.google.com/products/big-query
Cloud SQL


MySQL database running on the Google Cloud Platform



Easy migration from local MySQL instances to Cloud SQL



Highly scalable and reliable with replication



Supports all major MySQL features including stored
procedures, triggers and views



GUI Frontend for easy administration and operations



Built on top of core Google Infrastructure



Easy integration with App Engine

https://cloud.google.com/products/cloud-sql
Cloud Storage




[Diagram: Cloud Storage alongside Custom App, Cloud SQL and BigQuery]


A highly reliable cloud storage
platform for storing and
accessing vast amounts of data
Can be used for data archival
and content delivery



Data can be ingested and
processed by other Google
Cloud Services



Accessible through GUI,
command line and APIs

https://cloud.google.com/products/cloud-storage
Cloud Storage


Object store that delivers content efficiently over the internet



Not a mountable file system



Buckets are the basic container. They cannot be nested and can reside in the
US or EU geographies.



Objects are stored in buckets. They are immutable and can be up to 5 TB in
size.



ACLs can be set up for Google users, groups, app domain, authenticated
users with READ, WRITE or FULL_CONTROL. Signed URL access for
anonymous users.



Can be accessed using XML and JSON REST APIs



Command line access using gsutil tool



App Engine Storage API for access from App Engine
https://cloud.google.com/products/cloud-storage
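
Because buckets cannot be nested, folder-like organization is usually achieved through object naming. The sketch below imitates that with plain Python, without calling any Cloud Storage API; the object names and the `list_with_prefix` helper are hypothetical.

```python
# Hypothetical object names in one bucket. The "/" is just a character in
# the name; the bucket itself holds a flat list of objects.
OBJECT_NAMES = [
    "logs/2014-01-08/session.log",
    "logs/2014-01-09/session.log",
    "reports/summary.csv",
]


def list_with_prefix(names, prefix):
    """Return the object names that start with the given prefix, sorted."""
    return sorted(name for name in names if name.startswith(prefix))


print(list_with_prefix(OBJECT_NAMES, "logs/"))
```

Listing by prefix gives the appearance of directories even though the store is not a mountable file system.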
Compute Engine


Infrastructure as a service



Linux Virtual machines with associated storage and network
infrastructure are hosted by Google



Can run any type of application or workload in the Google Cloud, on
the same Google core infrastructure



Highly elastic and scalable



A typical use case would be to provision a Hadoop cluster on demand
using tens to hundreds of virtual machines as name node and data
nodes

https://cloud.google.com/products/compute-engine
Compute Engine


Various machine type configurations possible such as High
Memory, High CPU, Standard etc.



Very easy provisioning and management using cloud
management software like RightScale



CentOS and Debian are the default OSes currently
supported.



Typical use cases are batch processing, log analysis, I/O-intensive
workloads, Hadoop on the cloud (MapReduce)

https://cloud.google.com/products/compute-engine
Online Retail
Analytics
&
Visualization
Online Retail Industry

Forrester: U.S. Online Retail Sales to Hit $370 Billion by
Healthcare Store


Large online
retailer’s Health
Store website.



Thousands of health
care products are
sold per month.
These large online
retailers are killing us!
I need to increase
sales.
I need to understand
my site visitors better.
VP OF MARKETING

Can Big Data
Analytics
help?
DATA SCIENTIST

Yes, Big Data
Analytics can help!
Google’s Cloud
platform handles all
the complexities of Big
Data processing.
We start with regular
session log files.
Session Log File (W3C compliant)

Time & Date
when visitor
came on site

Unique User
& Session Id

Product Page
Visited by
User

Referral Site
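
As a rough illustration of how such a file can be read, the sketch below parses a W3C extended log, which declares its columns in a `#Fields:` directive. The sample lines and the `x-user-id` / `x-session-id` field names are assumptions; the real log used in the demo may name its fields differently.

```python
# Made-up sample in W3C extended log format; field names are assumptions.
SAMPLE_LOG = """\
#Version: 1.0
#Fields: date time x-user-id x-session-id cs-uri-stem cs(Referer)
2014-01-09 10:15:32 u42 s1001 /products/vitamin-c http://www.google.com/
2014-01-09 10:17:05 u42 s1001 /products/fish-oil -
"""


def parse_w3c_log(text):
    """Yield one dict per log entry, keyed by the names in the #Fields directive."""
    fields = []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            # Column names for every entry that follows.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            # Entries are whitespace-separated, in the declared order.
            yield dict(zip(fields, line.split()))


records = list(parse_w3c_log(SAMPLE_LOG))
print(records[0]["cs-uri-stem"])
```

Each resulting record carries the four pieces of information called out above: when the visit happened, who the user and session were, which product page was viewed, and the referral site.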
From the simple log files, we can do
sophisticated analytics like these:

DATA SCIENTIST

User Analytics
• # of Unique Site Visitors,
per hour, per day
• # of Return Site Visitors,
per hour, per day
• Total # of Site Visitors,
per hour, per day
• Top 10 Active Users
per hour, per day
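
A small sketch of the per-hour visitor counts, run on made-up data rather than through BigQuery. It assumes that a "return visitor" is a user already seen in an earlier hour; the slides do not define the term, so treat that as an illustrative choice.

```python
from collections import defaultdict

# Hypothetical (hour, user_id) pairs already extracted from the session log.
VISITS = [
    ("2014-01-09 10", "u1"),
    ("2014-01-09 10", "u2"),
    ("2014-01-09 10", "u1"),
    ("2014-01-09 11", "u1"),
    ("2014-01-09 11", "u3"),
]


def visitor_counts(visits):
    """Return {hour: (total_visits, unique_visitors, returning_visitors)}."""
    by_hour = defaultdict(list)
    for hour, user in visits:
        by_hour[hour].append(user)

    seen_before = set()  # users seen in any earlier hour
    result = {}
    for hour in sorted(by_hour):
        users = by_hour[hour]
        unique = set(users)
        returning = unique & seen_before
        result[hour] = (len(users), len(unique), len(returning))
        seen_before |= unique
    return result


print(visitor_counts(VISITS))
```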
Product Analytics like these:
• Top 10 Popular Products
per hour, per day
• Top 10 popular Products
in Shopping Basket
per hour, per day
• Top 10 Bought Products
per hour, per day
DATA SCIENTIST
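
The "Top 10" style of product analytics boils down to counting and ranking. A minimal local sketch with made-up page views follows; the product names and the `top_products` helper are hypothetical.

```python
from collections import Counter

# Hypothetical product page views taken from parsed log records.
PAGE_VIEWS = [
    "vitamin-c", "fish-oil", "vitamin-c",
    "protein-bar", "vitamin-c", "fish-oil",
]


def top_products(views, n=10):
    """Return the n most viewed products as (product, view_count) pairs."""
    return Counter(views).most_common(n)


print(top_products(PAGE_VIEWS, 2))
```

Restricting the input to one hour or one day before counting gives the per-hour and per-day variants.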
Conversion Analytics like these:
• # of users who added products to
shopping basket
per hour, per day
• # of users who actually bought
products
per hour, per day
• % of users who browsed,
added products to shopping cart &
actually bought
per hour, per day.
DATA SCIENTIST
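
The conversion percentages can be sketched as a simple funnel over distinct users. The event names (`browse`, `add_to_cart`, `purchase`) and the sample data are assumptions for illustration; the actual log would have to be mapped onto such events first.

```python
# Hypothetical (user_id, action) events derived from the session log.
EVENTS = [
    ("u1", "browse"), ("u2", "browse"), ("u3", "browse"), ("u4", "browse"),
    ("u1", "add_to_cart"), ("u2", "add_to_cart"),
    ("u1", "purchase"),
]


def conversion_rates(events):
    """Return the % of browsing users who added to cart and who bought."""
    users = {"browse": set(), "add_to_cart": set(), "purchase": set()}
    for user, action in events:
        if action in users:  # ignore actions outside the funnel
            users[action].add(user)

    browsed = len(users["browse"])
    if browsed == 0:
        return {"added_pct": 0.0, "bought_pct": 0.0}
    return {
        "added_pct": 100.0 * len(users["add_to_cart"] & users["browse"]) / browsed,
        "bought_pct": 100.0 * len(users["purchase"] & users["browse"]) / browsed,
    }


print(conversion_rates(EVENTS))
```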
Behold, The Google Cloud Platform’s Dashboard!
DATA SCIENTIST

List of available services.
Google Cloud Platform’s Cloud Storage
DATA SCIENTIST

Session log files uploaded to Cloud Storage.
Google Cloud Platform’s BigQuery
DATA SCIENTIST

Tables on BigQuery with data from session log files.
Running a Query on BigQuery
DATA SCIENTIST

Queries on BigQuery are very much SQL-like, easy to develop, and return results fast.
Visualize BigQuery’s Results in
DATA SCIENTIST

Tableau provides an easy & effective way to develop dashboards & reports.
Site Analytics – Referral Site Comparisons
DATA SCIENTIST

Traffic referred to site from other sources like Google.com
Product Analytics - Product Purchase Trends
DATA SCIENTIST

Analysis of specific products as purchased on site over hours / days in a month
Conversion Analytics
- Product Added to Cart vs. Bought.
DATA SCIENTIST

Analysis of which products were placed in cart vs. actually bought over hours / days in a month
Conversion Analytics - Conversion Rate Trends
DATA SCIENTIST

Analysis of which products were placed in cart vs. actually bought over hours / days in a month
DATA SCIENTIST

You now know:
- how your products are selling,
- when they are selling,
- which referring site helps the most, and other such info.
You now have the power of Big Data Analytics at your fingertips!
Wow!
Now, I can compete
against all the giants!
Let me start on my
marketing plans!
VP OF MARKETING
Q&A
@ThirdEyeCss
Third Eye is Google’s
Partner for the Google
Cloud Platform
We are mentioned on Google’s Cloud Platform site:
https://cloud.google.com/partners/
Tweet @ThirdEyeCss
Contact:
Dj Das, Founder & CEO, djdas@thirdeyecss.com
Alan Merrihew, VP of Business Development, alan@thirdeyecss.com
Phone - (408) 462-5257
Corporate Site - ThirdEyeCSS.com
Big Data Training - ThirdEyeClasses.com
Big Data Educational Seminars - BigDataCloud.com, BigDataCloudToday.com, meetup.com/BigDataCloud
Big Data Jobs - jobs.BigDataCloud.com
Big Data Analytics As a Service - ClustersTogo.com, Power140.com, Raaser.com, PowerI90.com
THANK YOU!

My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

Big Data Analytics on the Google Cloud Platform

  • 2. GROW WITH BIG DATA. Third Eye Consulting Services & Solutions LLC.
  • 3. For Questions Tweet Directly to @ThirdEyeCss We are actively monitoring this Twitter channel!
  • 4. Agenda
      1. 5 minutes - Introductions
      2. 15 minutes - Introduction to the Google Cloud Platform & its various Big Data services
      3. 10 minutes - Showcasing various Online Retail Analytics - User, Site & Products Analytics
      4. 15 minutes - Live Demonstration - Ingestion of session log data to visualization in Tableau
      5. 15 minutes - Q&A Session (can extend beyond that, based on audience enthusiasm & participation!)
  • 6. Google Cloud Platform – Key Components
      - App Engine
      - BigQuery
      - Cloud SQL
      - Cloud Storage
      - Compute Engine
      Tweet @ThirdEyeCss
      https://cloud.google.com
  • 7. App Engine - Architecture
      - A highly elastic, scale-on-demand infrastructure for deploying and running front-end web applications
      - Diagram components: App Master; Front End Instances 1 to n; App Server Instances 1 to n; Datastore; Memcache; Static Files
      https://cloud.google.com/products/app-engine
  • 8. App Engine - Advantages
      - Scales on demand
      - Very low barrier to entry
      - No initial hardware costs
      - Issues such as scalability and reliability are non-issues
      - Can handle very large amounts of data
      - Can handle very large user volumes, including sudden spikes, by scaling elastically
      https://cloud.google.com/products/app-engine
  • 9. BigQuery
      - A column-oriented data store that can store and process billions of rows of data
      - SQL-like query syntax for querying data
      - Run ad-hoc queries against multi-terabyte data sets in seconds
      - Highly scalable, reliable and secure, as it uses the underlying core Google Platform infrastructure
      https://cloud.google.com/products/big-query
  • 10. BigQuery
      - Supports all the main ETL and BI tools like Informatica, Talend, QlikView and Tableau
      - Primarily used for real-time data analysis and visualization
      - Integration with App Engine through APIs
      https://cloud.google.com/products/big-query
  • 11. BigQuery SQL Access
      - Only SELECT operations
      - No CREATE, UPDATE or DROP
      - Analysis of unstructured data using REGEXP_yyyy functions
      - JOINs of small (<8 MB of compressed data) and large tables are possible. Performance penalty for large table joins
      https://cloud.google.com/products/big-query
  • 12. BigQuery Programmatic Access
      - bq command line tool, Google API client library, REST API
      - Google API client library supports various languages like Java, Python, JavaScript, Ruby, PHP, Google Apps Script
      - Authentication is handled via OAuth2
      - With the REST API, credentials and HTTP requests have to be handled manually by the user
      https://cloud.google.com/products/big-query
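To illustrate what "handled manually" means for the REST API on slide 12, here is a minimal Python sketch that only builds an authenticated query request and does not send it. The project id, the access token and the table name are hypothetical placeholders, and the URL is an assumption based on the BigQuery v2 `jobs.query` REST method, so it should be checked against the current documentation. Obtaining the OAuth2 token itself is out of scope here.

```python
import json
import urllib.request

# Assumptions (not from the slides): placeholder project id and token, and a
# URL shaped after the BigQuery v2 "jobs.query" REST method.
PROJECT_ID = "my-project"           # hypothetical project id
ACCESS_TOKEN = "ya29.placeholder"   # hypothetical OAuth2 access token


def build_query_request(project_id, token, sql):
    """Build (but do not send) an HTTP POST that carries a SQL query."""
    url = (
        "https://bigquery.googleapis.com/bigquery/v2/"
        f"projects/{project_id}/queries"
    )
    body = json.dumps({"query": sql}).encode("utf-8")
    headers = {
        # The caller attaches the OAuth2 credential by hand.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_query_request(
    PROJECT_ID, ACCESS_TOKEN, "SELECT COUNT(*) FROM logs.sessions"
)
print(request.get_method(), request.full_url)
```

With a client library, the token handling and request construction shown here would be done for you, which is the contrast the slide draws.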
  • 13. BigQuery Use Cases
      - Can be used for batch analysis of large data sets
      - Real-time analytics for dashboard-type applications
      - Pre-process very large data sets and serve data in real time
      - Visualization using third-party tools that call BigQuery APIs
      https://cloud.google.com/products/big-query
  • 14. Cloud SQL
      - MySQL database running on the Google Cloud Platform
      - Easy migration from local MySQL instances to Cloud SQL
      - Highly scalable and reliable with replication
      - Supports all major MySQL features including stored procedures, triggers and views
      - GUI front end for easy administration and operations
      - Built on top of core Google infrastructure
      - Easy integration with App Engine
      https://cloud.google.com/products/cloud-sql
  • 15. Cloud Storage
      - A highly reliable cloud storage platform for storing and accessing vast amounts of data
      - Can be used for data archival and content delivery
      - Data can be ingested and processed by other Google Cloud services
      - Accessible through GUI, command line and APIs
      - Diagram labels: Custom App, Cloud SQL, BigQuery, Cloud Storage
      https://cloud.google.com/products/cloud-storage
  • 16. Cloud Storage
      - Object store that can deliver very efficiently over the internet
      - Not a mountable file system
      - Buckets are the basic container. They cannot be nested and can reside in the US or EU geographies.
      - Objects are stored in buckets. They are immutable and can be up to 5 TB in size.
      - ACLs can be set up for Google users, groups, app domain and authenticated users with READ, WRITE or FULL_CONTROL. Signed URL access for anonymous users.
      - Can be accessed using XML and JSON REST APIs
      - Command line access using the gsutil tool
      - App Engine Storage API for access from App Engine
      https://cloud.google.com/products/cloud-storage
  • 17. Compute Engine
      - Infrastructure as a service
      - Linux virtual machines with associated storage and network infrastructure are hosted by Google
      - Can run any type of application or workload in the Google cloud, which uses the same core Google infrastructure
      - Highly elastic and scalable
      - A typical use case would be to provision a Hadoop cluster on demand, using tens to hundreds of virtual machines as name node and data nodes
      https://cloud.google.com/products/compute-engine
  • 18. Compute Engine
      - Various machine type configurations are possible, such as High Memory, High CPU, Standard etc.
      - Very easy provisioning and management using cloud management software like RightScale
      - CentOS and Debian are the default OSes currently supported
      - Typical use cases are batch processing, log analysis, I/O-intensive workloads and Hadoop on the cloud (MapReduce)
      https://cloud.google.com/products/compute-engine
  • 20. Online Retail Industry Forrester: U.S. Online Retail Sales to Hit $370 Billion by
  • 21. Healthcare Store  Large online retailer’s Health Store website.  Thousands of health care products are sold per month.
  • 22. VP OF MARKETING: These large online retailers are killing us! I need to increase sales. I need to understand my site visitors better. Can Big Data Analytics help?
  • 23. DATA SCIENTIST: Yes, Big Data Analytics can help! Google’s Cloud platform handles all the complexities of Big Data processing. We start with regular session log files.
  • 24. Session Log File (W3C compliant)
      - Time & date when the visitor came on site
      - Unique user & session id
      - Product page visited by the user
      - Referral site
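As a rough illustration of the log fields named on slide 24, the following Python sketch parses a W3C-style extended log into records. The sample lines, the field names and their order are assumptions made for the example; the slides only say which kinds of data the log contains.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Hypothetical sample in W3C extended log style. Field names and order are
# assumptions; the deck does not show the actual "#Fields" directive.
SAMPLE_LOG = """\
#Fields: date time user-id session-id cs-uri-stem cs(Referer)
2014-01-09 10:15:02 u1001 s9001 /products/vitamin-c http://www.google.com/
2014-01-09 10:17:45 u1002 s9002 /products/fish-oil -
"""


@dataclass
class Visit:
    timestamp: datetime
    user_id: str
    session_id: str
    page: str
    referrer: Optional[str]


def parse_log(text: str) -> List[Visit]:
    """Turn log text into Visit records, using the #Fields line as header."""
    fields: List[str] = []
    visits: List[Visit] = []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other directives
        row = dict(zip(fields, line.split()))
        referrer = row["cs(Referer)"]
        visits.append(
            Visit(
                timestamp=datetime.strptime(
                    f"{row['date']} {row['time']}", "%Y-%m-%d %H:%M:%S"
                ),
                user_id=row["user-id"],
                session_id=row["session-id"],
                page=row["cs-uri-stem"],
                # "-" marks an empty value in this log style.
                referrer=None if referrer == "-" else referrer,
            )
        )
    return visits


visits = parse_log(SAMPLE_LOG)
for visit in visits:
    print(visit.timestamp, visit.user_id, visit.page, visit.referrer)
```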
  • 25. DATA SCIENTIST: From the simple log files, we can do sophisticated analytics like these: User Analytics
      - # of unique site visitors, per hour, per day
      - # of return site visitors, per hour, per day
      - Total # of site visitors, per hour, per day
      - Top 10 active users per hour, per day
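The user metrics on slide 25 can be sketched in plain Python as follows. This is only an illustration on a handful of made-up events, not the BigQuery queries used in the talk. In particular, treating a "return visitor" as a user already seen in an earlier hour is an assumption of this sketch.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical page-view events: (timestamp, user id).
EVENTS = [
    ("2014-01-09 10:15:02", "u1"),
    ("2014-01-09 10:40:10", "u2"),
    ("2014-01-09 10:55:30", "u1"),
    ("2014-01-09 11:05:00", "u1"),
    ("2014-01-09 11:20:00", "u3"),
]


def hour_bucket(timestamp):
    """Truncate a timestamp string to its hour, e.g. '2014-01-09 10:00'."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H:00")


def visitors_per_hour(events):
    """Total page views, unique users and returning users for each hour."""
    total = Counter()
    users = defaultdict(set)
    for timestamp, user in events:
        bucket = hour_bucket(timestamp)
        total[bucket] += 1
        users[bucket].add(user)

    stats = {}
    seen_before = set()
    for bucket in sorted(users):
        stats[bucket] = {
            "total": total[bucket],
            "unique": len(users[bucket]),
            # Assumption: "returning" = already seen in an earlier hour.
            "returning": len(users[bucket] & seen_before),
        }
        seen_before |= users[bucket]
    return stats


def top_active_users(events, n=10):
    """Users ranked by number of events, most active first."""
    return Counter(user for _, user in events).most_common(n)


stats = visitors_per_hour(EVENTS)
for bucket, values in stats.items():
    print(bucket, values)
print(top_active_users(EVENTS, n=1))
```

Grouping by day instead of hour only requires a different bucket format in `hour_bucket`.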
  • 26. DATA SCIENTIST: Product Analytics like these:
      - Top 10 popular products per hour, per day
      - Top 10 popular products in shopping basket per hour, per day
      - Top 10 bought products per hour, per day
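A "top N products" ranking like the ones on slide 26 amounts to counting events per product within each time bucket. The sketch below shows the idea in Python on made-up data. The action names (`view`, `add_to_cart`, `buy`) and the event layout are hypothetical, chosen only for the example.

```python
from collections import Counter, defaultdict

# Hypothetical events: (hour bucket, action, product).
EVENTS = [
    ("2014-01-09 10:00", "view", "vitamin-c"),
    ("2014-01-09 10:00", "view", "fish-oil"),
    ("2014-01-09 10:00", "view", "vitamin-c"),
    ("2014-01-09 10:00", "add_to_cart", "vitamin-c"),
    ("2014-01-09 10:00", "buy", "vitamin-c"),
    ("2014-01-09 11:00", "view", "fish-oil"),
]


def top_products(events, action, n=10):
    """For each hour, the n products with the most events of one action."""
    counts = defaultdict(Counter)
    for hour, event_action, product in events:
        if event_action == action:
            counts[hour][product] += 1
    return {hour: counter.most_common(n) for hour, counter in counts.items()}


popular = top_products(EVENTS, "view")
bought = top_products(EVENTS, "buy")
print(popular)
print(bought)
```

The same function covers all three rankings on the slide by switching the `action` argument.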
  • 27. DATA SCIENTIST: Conversion Analytics like these:
      - # of users who added products to shopping basket per hour, per day
      - # of users who actually bought products per hour, per day
      - % of users who browsed, added products to shopping cart & actually bought per hour, per day
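The conversion figures on slide 27 describe a funnel: users who browsed, the subset who added something to the cart, and the subset of those who bought. A minimal Python sketch on made-up events follows. The action names are hypothetical, and percentages are taken relative to all users who browsed, which is one possible reading of the slide.

```python
# Hypothetical events: (user id, action).
EVENTS = [
    ("u1", "view"), ("u1", "add_to_cart"), ("u1", "buy"),
    ("u2", "view"), ("u2", "add_to_cart"),
    ("u3", "view"),
    ("u4", "view"),
]


def funnel(events):
    """Count users at each funnel stage and express them as % of browsers."""
    users = {"view": set(), "add_to_cart": set(), "buy": set()}
    for user, action in events:
        if action in users:
            users[action].add(user)

    browsed = users["view"]
    added = users["add_to_cart"] & browsed   # must have browsed first
    bought = users["buy"] & added            # must have added to cart first

    def percent(stage):
        return 100.0 * len(stage) / len(browsed) if browsed else 0.0

    return {
        "browsed": len(browsed),
        "added": len(added),
        "bought": len(bought),
        "added_pct": percent(added),
        "bought_pct": percent(bought),
    }


result = funnel(EVENTS)
print(result)
```

To get the per-hour or per-day view from the slide, the same function can be applied to the events of each time bucket separately.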
  • 28. DATA SCIENTIST: Behold, the Google Cloud Platform’s Dashboard! List of available services.
  • 29. DATA SCIENTIST: Google Cloud Platform’s Cloud Storage. Session log files uploaded to Cloud Storage.
  • 30. DATA SCIENTIST: Google Cloud Platform’s BigQuery. Tables on BigQuery with data from session log files.
  • 31. DATA SCIENTIST: Running a Query on BigQuery. Queries on BigQuery are very much SQL-like, easy to develop, and return results fast.
  • 32. DATA SCIENTIST: Visualize BigQuery’s Results in Tableau. Tableau provides an easy & effective way to develop dashboards & reports.
  • 33. DATA SCIENTIST: Site Analytics – Referral Site Comparisons. Traffic referred to the site from other sources like Google.com
  • 34. DATA SCIENTIST: Site Analytics – Referral Site Comparisons. Traffic referred to the site from other sources like Google.com
  • 35. DATA SCIENTIST: Site Analytics – Referral Site Comparisons. Traffic referred to the site from other sources like Google.com
  • 36. DATA SCIENTIST: Product Analytics - Product Purchase Trends. Analysis of specific products as purchased on site over hours / days in a month
  • 37. DATA SCIENTIST: Conversion Analytics - Product Added to Cart vs. Bought. Analysis of which products were placed in cart vs. actually bought over hours / days in a month
  • 38. DATA SCIENTIST: Conversion Analytics - Conversion Rate Trends. Analysis of which products were placed in cart vs. actually bought over hours / days in a month
  • 39. DATA SCIENTIST: You now know how your products are selling, when they are selling, which referring site helps the most, and other such info. You now have the power of Big Data Analytics at your fingertips!
  • 40. VP OF MARKETING: Wow! Now I can compete against all the giants! Let me start on my marketing plans!
  • 42. Third Eye is Google’s Partner for the Google Cloud Platform. We are mentioned on Google’s Cloud Platform site: https://cloud.google.com/partners/ Tweet @ThirdEyeCss
  • 43. Contact
      - Dj Das, Founder & CEO, djdas@thirdeyecss.com
      - Alan Merrihew, VP of Business Development, alan@thirdeyecss.com
      - Phone: (408) 462-5257
      - Corporate Site: ThirdEyeCSS.com
      - Big Data Training: ThirdEyeClasses.com
      - Big Data Educational Seminars: BigDataCloud.com, BigDataCloudToday.com, meetup.com/BigDataCloud
      - Big Data Jobs: jobs.BigDataCloud.com
      - Big Data Analytics As a Service: ClustersTogo.com, Power140.com, Raaser.com, PowerI90.com

Editor's notes

  1. The online retail market has seen phenomenal growth in recent years, and that growth is not going to abate in the next couple of decades. More Americans are planning to shop online than go down to their neighborhood mall!