Patrick Deglon worked at CERN from 1996 to 2002, where he analyzed particle collision data from experiments. He used large-scale computational analysis and statistics to make discoveries about particle properties and interactions. In 2004, he joined eBay, where he now leads analytics to understand customer behavior and measure the impact of initiatives using A/B testing and other techniques. He discusses how the challenges of analyzing large datasets at CERN prepared him for working with eBay's "big data".
3. FROM THE BIG BANG TO ECOMMERCE,
A JOURNEY IN MAKING SENSE OF BIG DATA
Patrick Deglon
Director of Global Traffic Analytics
pdeglon@ebay.com
linkd.in/pdeglon
6. From 1996 to 2002, worked at CERN
(the European Laboratory for Particle Physics)
for my MS and PhD at the University of Geneva
[Image: Mont Blanc; Geneva, Switzerland; the 17-mile underground tunnel
for the LEP & LHC accelerators]
Image source: CERN
8. Example of a particle collision
FROM THE BIG BANG TO ECOMMERCE,
A JOURNEY IN MAKING SENSE OF BIG DATA
9. Solving the puzzle… which particles go together?
1. AB + CD?
2. AC + BD?
3. AD + BC?
[Diagram: particles A, B, C and D around an unknown origin "?"]
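The three candidate groupings listed on this slide are all the ways to split four particles into two unordered pairs. A minimal sketch of that enumeration follows; the function name `pairings` and the use of plain labels instead of real particle records are assumptions made for illustration only.

```python
def pairings(items):
    """Return every way to split `items` into unordered pairs.

    Hypothetical helper: particles are represented by plain labels.
    """
    if not items:
        return [[]]
    first, rest = items[0], items[1:]
    result = []
    for i, partner in enumerate(rest):
        # Pair the first item with each possible partner, then
        # recursively pair whatever is left over.
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            result.append([(first, partner)] + tail)
    return result


candidates = pairings(["A", "B", "C", "D"])
for number, candidate in enumerate(candidates, start=1):
    print(number, " + ".join(a + b for a, b in candidate))
```

With four particles there are only three groupings, but the count grows quickly with the number of particles, which is why the analysis needs large-scale computation.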
10. PAW – Physics Analysis Workstation
Source: Wikipedia
Tape robot
Data collection & analysis were
done in Fortran. Advanced
analysis/statistics were done
in PAW. [1996-2002]
Source: CERN
11. Solution: Big Data infrastructure enables large-scale
computation, such as combining all possibilities (cross-product)
Schematic View
CERN Example
(discovery of a new particle bb)
Signal
(particle resonance)
Statistical Noise
Source: http://www.atlas.ch/news/2011/ATLAS-discovers-its-first-new-particle.html
12. Size of the electron?
R < 5.1 × 10^-19 m ***
*** Patrick Deglon, Etude de la diffusion Bhabha avec le détecteur L3
au LEP, Th. phys. Genève, 2002; Sc. 3332
13. Extra dimension?
M_S > 1.1 TeV ***
[Diagram: e+ e- → e+ e- scattering with a graviton in an extra dimension, outside our universe in 4 dimensions]
*** Patrick Deglon, Etude de la diffusion Bhabha avec le détecteur L3
au LEP, Th. phys. Genève, 2002; Sc. 3332
14. 2004, joined
eBay European HQ
in Bern, Switzerland
15. $68 billion
in merchandise traded in 2011 ... or
$1.3 million every
10 minutes
16. eBay: The World's Online Marketplace®
• Every 26 min., a Ford Mustang is sold
• Every 2 min., a major appliance is sold
• Every 4 sec., a pair of shoes is sold
17. CERN vs EBAY
• CERN: Write kilometers-long Fortran code
  EBAY: Write miles-long SQL code
• CERN: Analysis can run for many hours… before a batch robot error
  EBAY: Queries can run for many hours… before a spool space error
• CERN: Study data from billions of collisions
  EBAY: Study data from billions of transactions
• CERN: Great depth of data structure & complexity
  EBAY: Great depth of data structure & complexity
• CERN: Know your local expert for questions, but try to find the solution by yourself… much quicker
  EBAY: Know your local expert for questions, but try to find the solution by yourself… much quicker
• CERN: Remove “bad runs” (unclean data batches)
  EBAY: Remove “wackos” (non-material transactions)
• CERN: Transform a complex system into insights
  EBAY: Transform a complex system into insights
• CERN: Communicate findings to conferences
  EBAY: Communicate recommendations to business reviews
• CERN: Strong competitive landscape (4 distinct experiments competing to be the first to publish, or to publish better results)
  EBAY: Strong competitive landscape
18. Analytics at eBay
“CIO”
“CDO”
“CAO”
“CMO”
Analytics Platforms & Delivery (APD)
Analytics
Marketing
Technology
Finance
Business
Units
End Users
of Big Data
19. What my friends
think I do
What my mum
thinks I do
What the BU
thinks I do
What I think I do
What the BU
wants me to do
What I really do
Source: Pierre Donzier
21. Core Analytics
Data Access
Business Centric
DataHub
MS Excel
Tableau
Data Platform
Technology Centric
SAS/R
OBIEE
MicroStrategy
Analyze & Report
SOA/DAL
Purpose-Built Apps
SQL
Discover & Explore
• EDW: enterprise-class system; relational data; dual system; 10+ PB
• “SINGULARITY”: low-end enterprise-class system; semi-structured & relational data; deep storage; 40+ PB
• HADOOP CLUSTERS: commodity hardware system; unstructured data; pattern detection; deep storage; 40+ PB
Teradata 55xx and 66xx Series
Data Integration
Ab Initio
Informatica
Golden Gate
UC4
BES
MapReduce
22. DW Sandbox enables agile analytics
Analytics teams have access to sandboxes within eBay's Teradata data warehouses (~100 GB per sandbox):
• Keeps the “Single Point of Truth” philosophy
• Improves time to market: days/weeks vs. months
• Enables the business to do agile prototyping
• Enables users to “fail fast”: makes it easy to try out new ideas
• Eliminates isolated data marts
[Diagram: analyst's sandbox inside the Teradata Data Warehouse]
23. SO… WHERE DO WE GO FROM HERE?
1. Intro: CERN & eBay
2. eBay Infrastructure
3. Examples of Analysis
4. Partnership & Trust
24. Measuring impact of initiatives
[Chart, A/B test, illustrative example (simulation): number of purchases for the test group vs. the control group, Aug 1st to Oct 1st, with “Initiative launched” and “Impact of the initiative” marked]
[Chart, Pre/Post analysis, illustrative example (simulation): number of listings in 2011 vs. 2012, pre and post launch, Aug 1st to Oct 1st, points A, B, C and D, with “Initiative launched” and “Impact of the initiative” marked]
• Randomized Test/Control group methodology is a gold standard in research
• Used to measure the impact of an initiative in a full market or a market segment
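The two measurement approaches on this slide can be sketched in a few lines. This is a minimal illustration, not the method used at eBay: the function names, the choice of relative lift for the A/B test, and the use of the prior year's pre-to-post growth as the baseline for the pre/post analysis are all assumptions, and the numbers are made up.

```python
def ab_relative_lift(test_purchases, test_users, control_purchases, control_users):
    """Relative lift of the test group's purchase rate over the control group's."""
    test_rate = test_purchases / test_users
    control_rate = control_purchases / control_users
    return test_rate / control_rate - 1.0


def pre_post_impact(pre_prior, post_prior, pre_current, post_current):
    """Impact = observed post-launch value minus the value expected without the launch.

    Assumption: without the initiative, this year would have grown from
    pre to post at the same rate as the prior year did.
    """
    expected_post = pre_current * (post_prior / pre_prior)
    return post_current - expected_post


# Made-up illustrative numbers.
lift = ab_relative_lift(test_purchases=450, test_users=1000,
                        control_purchases=400, control_users=1000)
impact = pre_post_impact(pre_prior=10_000, post_prior=12_000,
                         pre_current=20_000, post_current=30_000)
print(f"A/B relative lift: {lift:.1%}")
print(f"Pre/post incremental listings: {impact:.0f}")
```

The A/B version needs no seasonality assumption because both groups live through the same period; the pre/post version depends entirely on how well the prior year predicts the baseline.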
25. Marketing 101
[Diagram: Cost, Direct Return and Incremental Return for Purchase vs. No Purchase (unknown cells marked “?”) under “Don't Do Marketing” vs. “Do Marketing”; cell labels L, C and D]
26. Medici Effect
• New ideas proliferate when professional or cultural fields collide.
That's the “Medici Effect.”
• During the Renaissance, the Medici family enabled such collisions
by funding various fields and facilitating interdisciplinary creativity.
House of Medici
Michelangelo
27. Remember this physics problem?
1. AB + CD?
2. AC + BD?
3. AD + BC?
[Diagram: particles A, B, C and D around an unknown origin "?"]
28. Solution: Big Data infrastructure enables large-scale
computation, such as combining all possibilities (cross-product)
Schematic View
CERN Example
(discovery of a new particle bb)
Signal
(particle resonance)
Statistical Noise
Combining correlated and uncorrelated events produces a distribution made of
statistical noise (which is simple enough to extract) plus the signal being sought
Source: http://www.atlas.ch/news/2011/ATLAS-discovers-its-first-new-particle.html
29. Big Data technologies enable the full Cartesian product of
Marketing action & Revenue generating events
Clicks – Conversion
Playground
Marketing Events
(Clicks or Impressions)
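A toy version of the clicks-conversion cross-product can show why the brute-force approach works. The sketch below pairs every click with every purchase and histograms the time lag: unrelated pairs spread out as flat noise, while related pairs pile up at short lags. Everything here is an assumption for illustration: the synthetic data, the 10-minute bin width, the one-week window, and the helper name `lag_histogram`.

```python
import random
from itertools import product


def lag_histogram(click_times, purchase_times, bin_width, max_lag):
    """Histogram of (purchase - click) lags over the full Cartesian product."""
    n_bins = max_lag // bin_width
    counts = [0] * n_bins
    for click, purchase in product(click_times, purchase_times):
        lag = purchase - click
        if 0 <= lag < n_bins * bin_width:
            counts[lag // bin_width] += 1
    return counts


rng = random.Random(0)
n_users = 200
# Synthetic data: clicks spread over roughly one week (times in seconds).
clicks = [rng.randrange(0, 600_000) for _ in range(n_users)]
# Assumption: each user's purchase follows their own click within 10 minutes.
purchases = [click + rng.randrange(0, 600) for click in clicks]

hist = lag_histogram(clicks, purchases, bin_width=600, max_lag=6_000)
for i, count in enumerate(hist):
    print(f"{i * 10:3d}-{(i + 1) * 10:3d} min: {count}")
```

The first bin holds all the correlated click-purchase pairs plus some noise; the later bins hold only uncorrelated pairs and give an estimate of the noise level to subtract.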
30. Alternative way to understand customer behavior &
incrementality: geographic experimentation
[Chart: Revenues / Cost as a 3-period moving average for Groups 1 to 4, across Baseline, Phase 1 and Phase 2]
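The chart's series are 3-period moving averages of revenues over cost per group. A minimal sketch of that smoothing step follows; the weekly figures are made up and the function name is an assumption.

```python
def moving_average(values, window=3):
    """Trailing moving average; the first window-1 points have no value."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Made-up figures for one geographic group.
revenues = [120.0, 150.0, 90.0, 200.0, 180.0]
costs = [100.0, 100.0, 100.0, 100.0, 100.0]
ratios = [revenue / cost for revenue, cost in zip(revenues, costs)]
smoothed = moving_average(ratios, window=3)
print(smoothed)
```

Comparing the smoothed ratio of each group against its own baseline period is what lets the phases be read as experiments rather than noise.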
32. Analytics as a function?
Embedded Model: “I'm following my BU leader, but can't get promoted”
Functional Model: “I'm a partner of business execution”
Need to track satisfaction/loyalty/trust of our partnership
33. Net Promoter Score
NPS: How likely is it that you will recommend [Brand Name] to a friend or a colleague?
Scale: 0 (very unlikely) to 10 (very likely)
Detractors: 0 to 6
Passives: 7 or 8
Promoters: 9 or 10
NPS = % Promoters - % Detractors
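The score definition above translates directly into code. A minimal sketch, with a made-up list of survey answers and a hypothetical function name:

```python
def net_promoter_score(responses):
    """NPS in percentage points from a list of 0-10 answers."""
    if not responses:
        raise ValueError("at least one response is required")
    promoters = sum(1 for score in responses if score >= 9)
    detractors = sum(1 for score in responses if score <= 6)
    return 100.0 * (promoters - detractors) / len(responses)


# Made-up answers: 4 promoters, 3 passives, 3 detractors.
answers = [10, 9, 8, 7, 6, 0, 10, 9, 3, 8]
print(net_promoter_score(answers))
```

Passives count in the denominator but not in the numerator, so converting a detractor into a passive raises the score just as much as converting a passive into a promoter.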
34. The logic behind NPS
• To improve NPS, a company needs to work on 2 fronts:
– Move Detractors into Passives
(i.e. fix the holes, i.e. no more unacceptable bad experiences)
– Move Passives into Promoters
(i.e. improve the whole experience, best-in-class buyer experience)
Scale: Detractors 0 to 6, Passives 7 or 8, Promoters 9 or 10
NPS = % Promoters - % Detractors
35. Side note: Error on NPS measurement
• NPS answers follow a multinomial distribution with
– p the probability to answer 0 to 6
– q the probability to answer 7 or 8
– r the probability to answer 9 or 10
– N the number of answers
• The expected value of the Net Promoter Score is then
E(NPS) = r - p
• The variance is then
V(NPS) = V(r - p) = V(r) + V(p) - 2 Cov(r, p)
       = r (1 - r) / N + p (1 - p) / N + 2 r p / N
since Cov(r, p) = - r p / N for a multinomial distribution
• Hence the error on NPS, i.e. the standard deviation, is
σ(NPS) = SQRT [ r (1 - r) / N + p (1 - p) / N + 2 r p / N ]
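The error formula on this slide can be evaluated as follows. This is a minimal sketch with a hypothetical function name; p and r are fractions between 0 and 1, so the result is also a fraction (multiply by 100 for NPS points).

```python
import math


def nps_standard_error(p, r, n):
    """Standard deviation of NPS = r - p for n multinomial answers.

    p: share of detractors (answers 0 to 6)
    r: share of promoters (answers 9 or 10)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    variance = (r * (1 - r) + p * (1 - p) + 2 * r * p) / n
    return math.sqrt(variance)


# Example: 50% promoters, 20% detractors, 100 answers -> NPS = 0.30.
error = nps_standard_error(p=0.20, r=0.50, n=100)
print(f"NPS = 0.30 +/- {error:.3f}")
```

With only 100 answers the error is close to 8 points, which is why small internal surveys can only be read directionally.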
36. NPS is a measurement of Loyalty in a free environment. In a
paid environment, it's more a measurement of Trust between
co-workers/partners
Net Promoter Score
How likely is it that you would recommend working
with Analyst XXX to a friend or colleague?
Scale: 0 to 10
37. eNPS Survey
Team eNPS Survey; Partner eNPS Survey
• Identify opportunities to partner better with the business
• Identify opportunities to work better together as a team
• Enable a directional assessment of eNPS, keeping in mind the
biases: low N, subjective question, unlikely to promote an
unknown entity, partner <> client (i.e. Finance vs Agency)
Now that we have a measurement, how do we improve it?
38. What is Trust? How to improve it?
Trust =
• Credibility (Words): Convincing & believable
• Reliability (Actions): Consistently good in quality & performance
• Intimacy (Emotions): Feel comfortable talking to you about the sensitive, personal issues connected to the surface issue
• Unselfishness (Motives): Know that you care about serving higher interests
http://www.collieassociates.com/common/Trust_Equation.pdf
39. Build Trust: Trust Equation
Trust = R × C × I × U

Reliability (R) (Actions = Consistently good in quality & performance)
  eBay Success Factor: Lead completely
  Insights Discovery® Colors: Fiery RED, “Do it now!”
  Hartman Personality Profiles: RED, The Power Wielders

Credibility (C) (Words = Convincing & believable)
  eBay Success Factor: Practice judgment
  Insights Discovery® Colors: Cool BLUE, “Do it right!”
  Hartman Personality Profiles: BLUE, The Do-gooders

Intimacy (I) (Emotions = Feel comfortable talking to you about the sensitive/personal issues connected to the surface issue)
  eBay Success Factor: Keep it human
  Insights Discovery® Colors: Earth GREEN, “Do it harmoniously!”
  Hartman Personality Profiles: WHITE, The Peacekeepers

Unselfishness (U) (Motives = Know that you care about serving our higher interests)
  eBay Success Factor: Trust each other
  Insights Discovery® Colors: Sunshine YELLOW, “Do it together!”
  Hartman Personality Profiles: YELLOW, The Fun Lovers

[Image: Carl Jung, Swiss psychologist]
40. Example of an internal partner survey on
the Trust foundation
Please complete each of the following statements using the rating guide. Try to provide a rating for every statement
and be honest with your feedback.
Weak in this area=1, Some concerns=2, A minor shortfall=3, Competent=4, Better than competent=5, Outstanding=6

Reliability (4.9)
• Translates ideas and concepts into action: 4.9
• Turns around requests effectively: 5.0
• Is comfortable with change: 5.0
• Is adept at prioritizing tasks: 4.9

Credibility (5.3)
• Does what one says one will do: 5.2
• Tells the truth: 5.6
• Is genuine in saying 'Thank you' or 'I don't know': 5.5
• Is comfortable saying 'no' at the beginning rather than being unable to deliver in the end: 5.0

Intimacy (5.2)
• Creates an environment to address potential conflicts openly: 5.0
• Seeks help when facing difficulties: 5.3
• Has an appropriate sense of humor: 5.3
• Responds to and understands the feelings/needs of others: 5.4

Unselfishness (5.3)
• Uses 'we' rather than 'they' or 'I': 5.2
• Makes time for others: 5.4
• Supports ideas for innovation from others: 5.3
• Trusts others to make decisions and get things done for them: 5.2
41. Trust Equation assessment by the team and our partners
[Scatter plot: partner average answer (y-axis, 60 to 90) vs. team average answer (x-axis, 60 to 90), with an under-confidence zone and an over-confidence zone. Points: Intimity / Keep It Human; Credibility / Meets Quality; Non Political / Unselfishness; Reliability / Meets Deadline]
42. Reliability: Value of an Analysis
Keep It Simple & Stupid
[Chart: Total Cost, Direct Return and Net Return (Profit) vs. Complexity of Analytics, marking the optimal level of complexity, the analyst's preferred level of complexity and the individual limit]
43. Credibility: Principle Of Least Surprise (POLS)
Don't surprise executives & partners
with new metrics, new definitions,
new formats or anything new…
without a proper business reason.
Set up Insights & Recommendations
in a natural, logical, global &
agreed-upon framework.
44. Credibility: Fixed Standard… or Flexible Chaos?
Standardized
Global
Metrics
Store anything to
enable measuring any
metric to answer any
question
Chaos enables
flexibility, but requires a
strong process to
maintain credibility
45. (Business) Intimacy
• Keep It Human – meet people, talk to people, walk to their desk, pick up the phone
• Seek help when needed
• Have a good sense of humor – “It's just a website…”
• Create an environment where people can open up and discuss underlying issues
• Respond to the needs/feelings of others
• CONNECT with people (Avatar's “I see you”)
46. Unselfishness
• Don't work in silos
• Consider “we” rather than “I” or “they”
• Support ideas for innovation from others (improv's “yes, and…”)
• Trust others to make the right decision – and live with it
• Be AVAILABLE – make time for others
47. Wrapping Up
How complexity can spark innovation, but also kill effectiveness
• Medici principle
• KISS
• Managing chaos
Why an embedded or client-centric Analytics organization is not
necessarily a great idea
• Enable career path with an Analytics organization
• Partner vs Client
• eNPS - Maintain the pulse on the internal-client/partner satisfaction
Why analyst creativity is antagonistic to executive reporting
• Trust pillars: Reliability, Credibility, Intimacy, Unselfishness
• POLS
49. FROM THE BIG BANG
TO ECOMMERCE, A
JOURNEY IN MAKING
SENSE OF BIG DATA
Patrick Deglon
Director of Global Traffic Analytics
pdeglon@ebay.com
linkd.in/pdeglon
50. Credibility: Key Phases of an Analytics Project
Move the
Business
Follow-up /
Implementation
Readout
Executive
Summary
Scoping
Hypothesis
to be verified
Scoping the
question
Measurement
set up
Measuring
Query
Data check
Guiding the
Business
Story Line /
Deck
Driving
Insights
Facts / Slides
Review
hypothesis
Data
manipulation
Interpretation
Statistics
Graphs
51. James, 32, lives in Pittsburgh,
married, 1 child, Electronics Enthusiast
Site Visit
Site Visit
YouTube
Display Click
Site Visit
Offline
Store
Visit
Google Search on
“Digital Camera”,
click on eBay PS Ad
Google Search on
“eBay Digital Camera”
Click on NS link
Purchase
Loyalty Level
i.e. Likelihood to purchase on eBay
“Whoa… They really have nice deals on eBay”
“Ah… yes, eBay was a good idea – what do they have?”
“That's really expensive in a store”
“Let's get that camera now”
Time
52. Marketing Attribution Logic
$
YouTube
Display Impression
Google Search on
“Digital Camera”,
click on eBay PS Ad
Google Search on
“eBay Digital Camera”
Click on NS link
Purchase
How does the purchase correlate to the customer touch points?
How “close”/“distant” are the clicks & the purchase?
Which one is the most important?
53. What is more important:
the front wheel or the back wheel?
54. Marketing Attribution Management
YouTube
Display Impression
Google Search on
“Digital Camera”,
click on eBay PS Ad
Google Search on
“eBay Digital Camera”
Click on NS link
Purchase
Define correlation (“distance”) between
customer touch points and purchase and
the likelihood that it happens
distance in time
distance in KW space
distance in Mindset
• Latency: time between click and ROI event (2 minutes? 2 hours? 2 days?)
• Relevancy: difference between Search keyword and Item purchased (KW-Title
relevancy, KW-Vertical relevancy)
• Loyalty: mindset of customer, i.e. RFM segment (Reactivation or Top Buyer)
• …
55. Marketing Attribution Management
Customer path: YouTube Display Impression → Google Search on “Digital Camera”, click on eBay PS Ad → Google Search on “eBay Digital Camera”, click on NS link → Purchase

Credit given to each touch point:
• Last Click: 100% to the “eBay Digital Camera” search (click on NS link)
• First Click: 100% to the YouTube Display Impression
• All Clicks: 33% to the YouTube Display Impression, 33% to the “Digital Camera” search (click on eBay PS Ad), 33% to the “eBay Digital Camera” search (click on NS link)
• Model: 60% to the YouTube Display Impression, 35% to the “Digital Camera” search (click on eBay PS Ad), 5% to the “eBay Digital Camera” search (click on NS link)
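The four attribution schemes on this slide differ only in how one purchase's credit is split across the touch points. A minimal sketch follows; the function name, the model keywords and the short touch-point labels are assumptions, and the "custom" weights simply reuse the 60% / 35% / 5% split shown for the modelled case.

```python
def attribute(touch_points, model, weights=None):
    """Split one purchase's credit across an ordered list of touch points."""
    n = len(touch_points)
    if n == 0:
        raise ValueError("at least one touch point is required")
    if model == "last_click":
        shares = [0.0] * (n - 1) + [1.0]
    elif model == "first_click":
        shares = [1.0] + [0.0] * (n - 1)
    elif model == "all_clicks":
        shares = [1.0 / n] * n
    elif model == "custom":
        # Weights come from some external model; here they are given by hand.
        if weights is None or len(weights) != n or abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("custom weights must match the path and sum to 1")
        shares = list(weights)
    else:
        raise ValueError(f"unknown model: {model}")
    return dict(zip(touch_points, shares))


path = ["YouTube display", "Google PS ad", "Google NS link"]
for name in ("last_click", "first_click", "all_clicks"):
    print(name, attribute(path, name))
print("custom", attribute(path, "custom", weights=[0.60, 0.35, 0.05]))
```

Summing these shares per channel over all purchases gives the per-channel totals that the different models disagree on.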
56. … So what?
Last Click
             GMB    ROI
Channel A    8%     +20%
Channel B    5%     -10%
Channel C    1%     +10%
• Reduce spend on channel B
• Invest in channel A
• When prioritizing, ignore channel C
<>
All Clicks Model
             GMB    ROI
Channel A    7%     -20%
Channel B    6%     +30%
Channel C    12%    +60%
• Reduce spend on channel A
• Invest heavily in channel C
• Marketing actually accounts for 25% of the site
57. Example of the International Weekly Variance
Infrastructure (2007)
Automated SQL
Core DW
database
Excel
inputs
PDF
print-out
PET*
Modular
Back-end
single
pivot table
PPT &
Excel
report
Flexible
Front-end
* PET is a small database inside the Teradata Data Warehouse for building prototypes.
58. Example of Automated Quarterly Market Review deck (2007)
PowerPoint chart object with a
“SQL” field containing an EXEC
MACRO to refresh the data content
of the chart
Linked to an Excel file that we
can refresh when needed
PowerPoint table object with a
“SQL” field containing an EXEC
MACRO to refresh the table
content
59. PowerPoint Reporting Tool (2012)
Update the content of the selected objects (table or chart)
Update the content of all objects in the PowerPoint
Login to DW
Add a “SQL” tag to
objects (table or chart)
and edit the SQL
Create a dummy chart
60. Example of BI report using Tableau