Online retail is a fiercely competitive market in which every retailer tries to gain an edge by understanding its customers better and analyzing their buying patterns, likes, and dislikes. Such knowledge helps retailers target and serve their customers better, thereby increasing sales revenue.
Big Data analytics is the answer to online retailers’ need to glean such business insights from their customer data.
In this webinar, we showcased & discussed:
- End to end data flow from session log files to analytical dashboards & reports.
- Developing a solution that aggregates and analyzes online transactions.
- Aggregation and analysis of data residing in the session log files.
All data related techniques were demonstrated on the Google Cloud Platform.
All data visualizations were performed using Tableau.
4. Agenda
1. 5 minutes
- Introductions
2. 15 minutes
- Introduction to the Google Cloud Platform & its various Big Data services
3. 10 minutes
- Showcasing various Online Retail Analytics
- User, Site & Products Analytics
4. 15 minutes
- Live Demonstration
- Ingestion of session log data to visualization in Tableau
5. 15 minutes
- Q&A Session
(Can extend beyond this, based on audience enthusiasm & participation!)
7. App Engine - Architecture
A highly elastic, scale-on-demand infrastructure for deploying and running front-end web applications
[Architecture diagram: App Master; Front End Instances 1 to n; App Server Instances 1 to n; Datastore; Memcache; Static Files]
https://cloud.google.com/products/app-engine
8. App Engine - Advantages
Scales on Demand
Very low barrier for entry
No initial hardware costs
Concerns such as scalability and reliability become non-issues
Can handle very large amounts of data
Can handle very large user volumes, including sudden
spikes by scaling elastically
https://cloud.google.com/products/app-engine
9. BigQuery
A column-oriented data store that can store and process billions of rows of data
SQL-like query syntax for querying data
Run ad-hoc queries against multi-terabyte data sets in seconds
Highly scalable, reliable, and secure, as it uses the underlying core Google Platform infrastructure
https://cloud.google.com/products/big-query
10. BigQuery
Supports all the main ETL and BI tools, such as Informatica, Talend, QlikView, and Tableau
Primarily used for real-time data analysis and visualization
Integration with App Engine through APIs
https://cloud.google.com/products/big-query
11. BigQuery
SQL Access
Only SELECT operations
No CREATE, UPDATE, or DROP
Analysis of unstructured data using the REGEXP_yyyy functions
JOINs of small (< 8 MB of compressed data) and large tables are possible. Large-table joins carry a performance penalty
https://cloud.google.com/products/big-query
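To illustrate the regular-expression style of analysis mentioned on this slide, here is a minimal Python sketch that pulls the referring host out of raw referrer strings. It uses Python's `re` module as a stand-in and is not BigQuery itself; the sample referrer values and the helper name `referrer_host` are made up for the example.

```python
import re

# Capture the host part of an http(s) URL; an optional "www." prefix is skipped.
HOST_RE = re.compile(r"^https?://(?:www\.)?([^/:?#]+)", re.IGNORECASE)

def referrer_host(referrer):
    """Return the lower-cased referring host, or None if the value is not a URL."""
    match = HOST_RE.match(referrer)
    return match.group(1).lower() if match else None

# Hypothetical referrer values, as they might appear in a session log.
samples = [
    "http://www.google.com/search?q=running+shoes",
    "https://Example.org:8080/deals",
    "-",  # placeholder commonly used for a missing value
]
hosts = [referrer_host(s) for s in samples]
```

The same idea, extracting a structured value from free-form text with a pattern, is what a regular-expression function in a query would do on the server side.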
12. BigQuery
Programmatic Access
bq command-line tool, Google API client library, REST API
The Google API client library supports various languages, such as Java, Python, JavaScript, Ruby, PHP, and Google Apps Script
Authentication is handled via OAuth 2.0
With the REST API, the user has to handle credentials and HTTP requests manually
https://cloud.google.com/products/big-query
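As a rough idea of what "handling credentials and HTTP requests manually" can look like, the sketch below builds (but does not send) a request for BigQuery's `jobs.query` REST method using only the Python standard library. The project id, dataset, table, and token are placeholders, the helper name `build_query_request` is hypothetical, and the endpoint URL should be checked against the current BigQuery documentation before use.

```python
import json
import urllib.request

def build_query_request(project_id, access_token, sql):
    """Build (but do not send) a POST request for BigQuery's jobs.query REST method."""
    url = f"https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/queries"
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )

request = build_query_request(
    "my-project",                             # hypothetical project id
    "ACCESS_TOKEN_PLACEHOLDER",               # obtained separately through an OAuth 2.0 flow
    "SELECT COUNT(*) FROM weblogs.sessions",  # hypothetical dataset and table
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

A client library hides these steps (token handling, request construction, response parsing), which is the trade-off the slide describes.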
13. BigQuery
Use Cases
Can be used for batch analysis of large data sets
Real-time analytics for dashboard-type applications
Pre-process very large data sets and serve data in real time
Visualization using third-party tools that call BigQuery APIs.
https://cloud.google.com/products/big-query
14. Cloud SQL
MySQL database running on the Google Cloud Platform
Easy migration from local MySQL instances to Cloud SQL
Highly scalable and reliable with replication
Supports all major MySQL features, including stored procedures, triggers, and views
GUI Frontend for easy administration and operations
Built on top of core Google Infrastructure
Easy integration with App Engine
https://cloud.google.com/products/cloud-sql
15. Cloud Storage
[Diagram: Custom App, Cloud SQL, BigQuery, Cloud Storage]
A highly reliable cloud storage platform for storing and accessing vast amounts of data
Can be used for data archival and content delivery
Data can be ingested and processed by other Google Cloud services
Accessible through a GUI, the command line, and APIs
https://cloud.google.com/products/cloud-storage
16. Cloud Storage
Object store that can deliver content very efficiently over the internet
Not a mountable file system
Buckets are the basic container. They cannot be nested and can reside in the US or EU geographies.
Objects are stored in buckets. They are immutable and can be up to 5 TB in size.
ACLs can be set up for Google users, groups, app domains, and authenticated users with READ, WRITE, or FULL_CONTROL permissions. Signed URLs provide access for anonymous users.
Can be accessed using XML and JSON REST APIs
Command-line access using the gsutil tool
App Engine Storage API for access from App Engine
https://cloud.google.com/products/cloud-storage
17. Compute Engine
Infrastructure as a service
Linux virtual machines with associated storage and network infrastructure are hosted by Google
Can run any type of application or workload in the Google cloud that uses the same core Google infrastructure
Highly elastic and scalable
A typical use case would be to provision a Hadoop cluster on demand, using tens to hundreds of virtual machines as name node and data nodes
https://cloud.google.com/products/compute-engine
18. Compute Engine
Various machine type configurations are possible, such as High Memory, High CPU, and Standard
Very easy provisioning and management using cloud management software such as RightScale
CentOS and Debian are the default OSes currently supported.
Typical use cases are batch processing, log analysis, I/O-intensive workloads, and Hadoop in the cloud (MapReduce)
https://cloud.google.com/products/compute-engine
22. These large online retailers are killing us!
I need to increase sales.
I need to understand my site visitors better.
Can Big Data Analytics help?
VP OF MARKETING
23. DATA SCIENTIST
Yes, Big Data Analytics can help!
Google’s Cloud Platform handles all the complexities of Big Data processing.
We start with regular session log files.
24. Session Log File (W3C compliant)
Time & date when the visitor came on site
Unique user & session ID
Product page visited by the user
Referral site
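A minimal Python sketch of reading such a log follows, assuming the W3C extended log file format, in which a `#Fields:` directive names the whitespace-separated columns. The sample lines and the custom field names `x-user-id` and `x-session-id` are invented for illustration; the actual logs used in the webinar are not shown in the slides.

```python
from datetime import datetime

# Hypothetical sample in W3C extended log file format.
SAMPLE_LOG = """\
#Version: 1.0
#Fields: date time x-user-id x-session-id cs-uri-stem cs(Referer)
2013-05-01 09:15:02 u1001 s9001 /product/42 http://www.google.com/
2013-05-01 09:47:30 u1002 s9002 /product/7 -
"""

def parse_w3c_log(text):
    """Yield one dict per log entry, keyed by the names in the #Fields directive."""
    fields = []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            record = dict(zip(fields, line.split()))
            # Combine the date and time columns into one timestamp for later grouping.
            record["timestamp"] = datetime.strptime(
                record["date"] + " " + record["time"], "%Y-%m-%d %H:%M:%S"
            )
            yield record

records = list(parse_w3c_log(SAMPLE_LOG))
```

Each record then carries the four pieces of information listed on the slide: the visit time, the user and session ids, the product page, and the referral site.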
25. From the simple log files, we can do sophisticated analytics like these:
DATA SCIENTIST
User Analytics
• # of Unique Site Visitors, per hour, per day
• # of Return Site Visitors, per hour, per day
• Total # of Site Visitors, per hour, per day
• Top 10 Active Users, per hour, per day
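The user metrics above can be sketched in a few lines of Python. This is only an illustration on made-up data, not the webinar's BigQuery implementation; in particular, treating a "return visitor" as a user already seen in an earlier hour is an assumption, since the slides do not define the term.

```python
from collections import Counter, defaultdict

# Hypothetical (hour, user_id) pairs, one per page view, as if read from the session logs.
page_views = [
    ("2013-05-01 09", "u1"), ("2013-05-01 09", "u2"), ("2013-05-01 09", "u1"),
    ("2013-05-01 10", "u1"), ("2013-05-01 10", "u3"),
]

users_by_hour = defaultdict(set)
views_per_user = Counter()
for hour, user in page_views:
    users_by_hour[hour].add(user)
    views_per_user[user] += 1

seen_before = set()
unique_per_hour = {}
return_per_hour = {}
for hour in sorted(users_by_hour):
    users = users_by_hour[hour]
    unique_per_hour[hour] = len(users)
    # Assumption: a return visitor is a user already seen in an earlier hour.
    return_per_hour[hour] = len(users & seen_before)
    seen_before |= users

# Most active users by number of page views.
top_active = views_per_user.most_common(10)
```

Grouping by day instead of hour only requires truncating the key to the date part.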
26. Product Analytics like these:
• Top 10 Popular Products, per hour, per day
• Top 10 Popular Products in Shopping Basket, per hour, per day
• Top 10 Bought Products, per hour, per day
DATA SCIENTIST
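A top-N ranking like the ones listed for products can be sketched as follows. The event names (`view`, `cart`, `buy`), the sample rows, and the helper `top_products` are hypothetical; the slides do not say how these events are encoded in the logs.

```python
from collections import Counter, defaultdict

# Hypothetical (day, event, product) rows; event names are placeholders.
events = [
    ("2013-05-01", "view", "shoes"), ("2013-05-01", "view", "shoes"),
    ("2013-05-01", "view", "socks"), ("2013-05-01", "cart", "shoes"),
    ("2013-05-01", "buy", "socks"), ("2013-05-02", "view", "hat"),
]

def top_products(rows, event, n=10):
    """Return {day: [(product, count), ...]} with the n most frequent products per day."""
    counts = defaultdict(Counter)
    for day, kind, product in rows:
        if kind == event:
            counts[day][product] += 1
    return {day: counter.most_common(n) for day, counter in counts.items()}

popular = top_products(events, "view")    # most viewed products
in_basket = top_products(events, "cart")  # most often added to the basket
bought = top_products(events, "buy")      # most often purchased
```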
27. Conversion Analytics like these:
• # of users who added products to shopping basket, per hour, per day
• # of users who actually bought products, per hour, per day
• % of users who browsed, added products to shopping cart & actually bought, per hour, per day
DATA SCIENTIST
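The conversion percentages can be worked through on a small example. The sketch below assumes one reporting period and placeholder event names; it counts distinct users at each stage and expresses the later stages as a share of everyone who browsed.

```python
# Hypothetical (user_id, event) rows for one reporting period; event names are placeholders.
events = [
    ("u1", "view"), ("u1", "cart"), ("u1", "buy"),
    ("u2", "view"), ("u2", "cart"),
    ("u3", "view"),
    ("u4", "view"),
]

def users_with(rows, event):
    """Return the set of distinct users who triggered the given event."""
    return {user for user, kind in rows if kind == event}

browsed = users_with(events, "view")
added = users_with(events, "cart") & browsed
bought = users_with(events, "buy") & added

def percent(part, whole):
    """Percentage of `whole` represented by `part`; 0.0 when `whole` is empty."""
    return 100.0 * len(part) / len(whole) if whole else 0.0

cart_rate = percent(added, browsed)       # share of browsers who added to the cart
purchase_rate = percent(bought, browsed)  # share of browsers who went on to buy
```

With four browsers, two of whom added a product and one of whom bought, the cart rate is 50% and the purchase rate is 25%.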
28. Behold, The Google Cloud Platform’s Dashboard!
DATA SCIENTIST
List of available services.
29. Google Cloud Platform’s Cloud Storage
DATA SCIENTIST
Session log files uploaded to Cloud Storage.
30. Google Cloud Platform’s BigQuery
DATA SCIENTIST
Tables on BigQuery with data from session log files.
31. Running a Query on BigQuery
DATA SCIENTIST
Queries on BigQuery are very much SQL-like, easy to develop, and return results fast.
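To give a feel for the shape of such a query, the sketch below runs a unique-visitors-per-hour aggregate against an in-memory SQLite database from Python. SQLite is only a local stand-in here: the table and column names are hypothetical, and BigQuery's own SQL dialect and functions may differ from what is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (visit_hour TEXT, user_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [
        ("2013-05-01 09", "u1", "/product/42"),
        ("2013-05-01 09", "u2", "/product/42"),
        ("2013-05-01 09", "u1", "/product/7"),
        ("2013-05-01 10", "u3", "/product/7"),
    ],
)

# Unique visitors per hour: the kind of aggregate a dashboard would chart.
QUERY = """
SELECT visit_hour, COUNT(DISTINCT user_id) AS unique_visitors
FROM sessions
GROUP BY visit_hour
ORDER BY visit_hour
"""
rows = conn.execute(QUERY).fetchall()
conn.close()
```

The result is one row per hour with its distinct-user count, which is the tabular form a visualization tool would consume.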
32. Visualize BigQuery’s Results in Tableau
DATA SCIENTIST
Tableau provides an easy & effective way to develop dashboards & reports.
33. Site Analytics – Referral Site Comparisons
DATA SCIENTIST
Traffic referred to the site from other sources, like Google.com
34. Site Analytics – Referral Site Comparisons
DATA SCIENTIST
Traffic referred to the site from other sources, like Google.com
35. Site Analytics – Referral Site Comparisons
DATA SCIENTIST
Traffic referred to the site from other sources, like Google.com
36. Product Analytics - Product Purchase Trends
DATA SCIENTIST
Analysis of specific products as purchased on site over hours / days in a month
37. Conversion Analytics - Product Added to Cart vs. Bought
DATA SCIENTIST
Analysis of which products were placed in cart vs. actually bought over hours / days in a month
38. Conversion Analytics - Conversion Rate Trends
DATA SCIENTIST
Analysis of which products were placed in cart vs. actually bought over hours / days in a month
39. DATA SCIENTIST
You now know:
- how your products are selling,
- when they are selling,
- which referring site helps the most, and other such info.
You now have the power of Big Data Analytics at your fingertips!
40. Wow!
Now, I can compete against all the giants!
Let me start on my marketing plans!
VP OF MARKETING
42. Third Eye is Google’s Partner for the Google Cloud Platform
We are mentioned on Google’s Cloud Platform site:
https://cloud.google.com/partners/
Tweet @ThirdEyeCss
43. Contact:
Dj Das, Founder & CEO, djdas@thirdeyecss.com
Alan Merrihew, VP of Business Development, alan@thirdeyecss.com
Phone
- (408) 462-5257
Corporate Site
- ThirdEyeCSS.com
Big Data Training
- ThirdEyeClasses.com
Big Data Educational Seminars
- BigDataCloud.com, BigDataCloudToday.com, meetup.com/BigDataCloud
Big Data Jobs
- jobs.BigDataCloud.com
Big Data Analytics As a Service
- ClustersTogo.com, Power140.com, Raaser.com, PowerI90.com
The online retail market has seen phenomenal growth in recent years, growth that is not going to abate in the next couple of decades. More Americans are planning to shop online than go down to their neighborhood mall!