2. Agenda
• Problem Statement:
– Digital and Retail behavior analysis:
• Long tail problem similarities
– Propensity Marketing:
• Propensity for consumer to respond to promotion?
• Cover DM/ML Demographics presentation
– Profitability Marketing
• Who are the most profitable customers?
• Obvious answer: select * from customers join orders order by amt desc;
– Promotion Modeling
• What drives order values and who should receive promotions?
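The "obvious answer" query for the most profitable customers can be made runnable. This is a minimal sketch using Python's built-in sqlite3 module. The slide names only the `customers` and `orders` tables and the `amt` column; the `id`, `name`, and `customer_id` columns and the sample rows are assumptions added for illustration.

```python
import sqlite3

def top_customers(conn, limit=10):
    """Return (name, total_amt) rows, largest total order amount first."""
    sql = """
        SELECT c.name, SUM(o.amt) AS total_amt
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id  -- join key is an assumption
        GROUP BY c.id, c.name
        ORDER BY total_amt DESC
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()

def demo_connection():
    """Build an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (customer_id INTEGER, amt REAL);
        INSERT INTO customers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO orders VALUES (1, 20.0), (1, 35.0), (2, 80.0), (3, 5.0);
    """)
    return conn
```

Note that summing `amt` ranks customers by order amount, which is revenue rather than profit.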
3. What do I do
• Work: Tech lead, Google, ~10y; Architect, Absolute SW
• Teach, mentor others on Big Data, Hadoop,
DM/ML
• http://www.meetup.com/HandsOnProgrammingEvents/
4. Review
• Theory:
– What is long tail?
– Long tail success case studies
– Demographic targeting/Modeling and prediction
– ML/DM success case studies
• Data Analysis Strategies/Structure
5. What is the Long Tail?
• Originated from search engines/Google
• Don’t focus on the top 20% queries, focus on
the bottom 50% first
• Why? The bottom 50% was the hardest:
LP&SB. The top 20% was automatic
7. Keyword Lift/Complementary
Strategies
• 70% of the keywords are not used frequently.
• Page Rank/feature selection/Spam reduction
– Most data (demographics) is inaccurate (the eBay problem)
• Quality of features enables ML/DM modeling
– Identify these words first using simple SQL queries
then run a model and use A/B testing to iterate to
better results
– Example of ML/DM later
• Case study of data visualisation for search query
length
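The first step on this slide, finding the infrequently used keywords, can be sketched in a few lines. This is an illustrative sketch in Python rather than SQL; the function names, the whitespace tokenization, and the `max_count` threshold are assumptions, not part of the original talk.

```python
from collections import Counter

def keyword_counts(queries):
    """Count how often each whitespace-separated keyword appears."""
    return Counter(word for q in queries for word in q.lower().split())

def rare_keywords(queries, max_count=1):
    """Keywords seen at most max_count times (threshold is an assumption)."""
    counts = keyword_counts(queries)
    return sorted(w for w, n in counts.items() if n <= max_count)

def rare_share(queries, max_count=1):
    """Fraction of distinct keywords that are rare."""
    counts = keyword_counts(queries)
    if not counts:
        return 0.0
    rare = sum(1 for n in counts.values() if n <= max_count)
    return rare / len(counts)
```

Running `rare_share` on a real query log is one way to check a claim like "70% of the keywords are not used frequently" before building any model.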
8. Complete solution not possible
• A complete solution to the long tail is not
possible via a hackathon
• Examples of Complete Solutions
– Example: Symantec uses modified page rank to see if
virus files are safe/not safe. Viruses are different, all
are unique. You can’t rely on past examples. >90%
accuracy rate. Uses people feedback.
– Example: Yahoo content system matching users to
content ~100 attributes->1k attributes. Most users
only go to Yahoo news for a few stories. MM guides
this
10. Long Tail
• Obviously, longer queries imply the user wants a more precise
result. Precision vs. Recall
• Obviously, these users are more valuable because their directed
intent is more focused. A user entering queries with more
precision is very valuable for shopping and other applications
with focused, directed intent
• The above case results in a $50.00 click to Google for
Salesforce/SAP ads (e.g. home financing/mortgages)
• Best way to see this is in a demo:
Move mouse on dots which are close to each other:
http://dataincolour.com:8888/#1144645000
DEMO!!!!!
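For reference, the precision vs. recall trade-off named on this slide can be stated in code. This is a minimal sketch; the function name and the idea of passing result sets as lists of document IDs are assumptions for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant results returned / all results returned
    recall    = relevant results returned / all relevant results
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

A longer, more specific query tends to shrink the retrieved set, which is why it favors precision over recall.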
11. Example real time applied to previous
example
We looked at search keywords and search phrase
length. Visualizations as a substitute for Machine
Learning algorithms. Much faster to implement
Some students <~20 years old did this in a
weekend hackathon:
http://www.dataincolour.com/2011/06/curiousnakes-visualization-of-aol-questions/
http://datainsightsf.com/schedule-2/ Not
repeated
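The search phrase length analysis described on this slide starts from a simple count. This is a rough sketch, not the students' code; the function names and the text-bar output are assumptions, and a real visualization would replace `text_bars`.

```python
from collections import Counter

def query_length_histogram(queries):
    """Map words-per-query -> number of queries with that length."""
    return Counter(len(q.split()) for q in queries if q.strip())

def text_bars(hist):
    """Render the histogram as plain-text bars, shortest queries first."""
    return [f"{length:>2} words | {'#' * hist[length]}" for length in sorted(hist)]
```

Usage: `print("\n".join(text_bars(query_length_histogram(queries))))` gives a quick look at how long the tail of long queries is.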
12. What to do?
• Brainstorm some more, definitely something here, play
w/data; will come in time. The most important part is
the definition of the problem, not the code
– Think more, code less
• Should you copy the data visualisation example on
Search Query Length?
– Probably not
• A long long time ago Google displayed the incoming
search queries in the lobby; this had practical use
• Real time constrains the problem: less complicated
processing, less about the algorithm, more about the
user
13. Why Real Time? Long Tail
Do I really need real time? Yes, why?
Pre-2010, Google search displayed all the results, a
combination of precision and recall.
Post-2010, Google went to instant search, with limited recall.
Nobody drilled down to the millionth page for DVDs.
Better ad results with real time
Analytics today is similar to pre-2010 Google search:
batch processing using click logs
Real time analytics is mostly custom solutions but can be
much more effective. Once the user leaves the website, it is
too late to do anything. Many orders of magnitude
difference. Precision >> Recall
15. Mouse on a dot which is part of a
group which looks like a snake
Can see the queries a user typed one after
another; here is one example:
How to fix car -> What is a fuel filter -> How to
replace a fuel filter.
This is valuable for adding features
for the user who asked this
Can't get this from SQL queries easily or at all.
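A chain of queries like the fuel filter example can be recovered from a query log with a short script. This is a sketch under assumptions: the log is a list of hypothetical `(user_id, timestamp_seconds, query)` rows, and a gap of more than 30 minutes starts a new chain. Neither the layout nor the threshold comes from the slides.

```python
from collections import defaultdict

def query_chains(log, max_gap_seconds=1800):
    """Group (user_id, timestamp, query) rows into per-user query chains.

    A new chain starts when the gap between two consecutive queries from
    the same user exceeds max_gap_seconds (an assumed threshold).
    """
    by_user = defaultdict(list)
    for user_id, ts, query in log:
        by_user[user_id].append((ts, query))

    chains = []
    for user_id, events in by_user.items():
        events.sort()  # order each user's queries by timestamp
        current = [events[0][1]]
        for (prev_ts, _), (ts, query) in zip(events, events[1:]):
            if ts - prev_ts > max_gap_seconds:
                chains.append((user_id, current))
                current = []
            current.append(query)
        chains.append((user_id, current))
    return chains
```

Each returned chain is the ordered list of queries for one user session, which is the "snake" a viewer sees when hovering over a group of dots.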
16. What is the lesson here?
• Viewing data in real time has value
• At a minimum, it helps clarify the thinking for the
next step
• Use as an alerting system/QC process to show
if ML/DM is running correctly (proprietary in
Google/Yahoo). Every business has these.
• Key: visible to everybody w/o running a SQL
query
17. Wisdom gained matches across 2
hackathons
• One of the most surprising pieces of work was
a unique data visualization from the DM
hackathon
• None of these positive results were defined in
the problem statement. Required creativity.
• Careful
18. Review ML/DM
• Review a small subset of these slides:
– http://www.slideshare.net/DougChang1/demographics-andweblogtargeting-10757778
• Agenda: review a case study of the Motley Fool
and how to create/target promotions to likely
subscribers for problem #2, propensity marketing
• Case study of a past hackathon.
– My role: I seed the ideas, Mike Bowles, Nick Kolegraff
19. ML/DM Slides
• DO NOT INSERT SLIDES, cover the original so
we don’t limit the scope of audience
questions
20. ML/DM and Hackathons
• Done 2 as examples,
– Motley Fool, cosponsored by Kaggle (Mike Bowles)
– Best Buy, paid Kaggle (Nick Kolegraff@Accenture/DM
SIG, we sought him out)
– These events require guidance but were very successful; both
are still receptive to more DM/ML events
• Careful: an algorithm doesn’t mean you have a
production process or something someone can
manage via a paid analyst headcount
• Why aren’t there more? Time investment to clean
data, tech talk to guide participants, min 3 months
work
21. What do I do for others which may
help you?
• Seed the ideas; should add a structure to this. NDA. Run
SQL queries
• Current Case Study
– Starting to do the prep work for another real time analytics
example, teaching from this
– Nick/Mike did this for the other 2 hackathons.
• Match the strategy w/structure
– Take time off work to build an engineering prototype (Twitter
Storm in old slide deck)
– Not covering this here
– Strategy: first display the data in a real time dashboard then
iterate the visualizations, then add DM/ML algorithms after the
A/B testing framework is complete
27. Kiehl’s Example
• Put in offers w/($ amount, product desc, click url)
customized per user, A/B test layouts and placement,
store data for customization and measure lift
• Measure Facebook ads via page rank
• Predict missing links application
• http://blog.echen.me/2012/07/31/edge-prediction-in-a-social-graph-my-solution-to-facebooks-user-recommendation-contest-on-kaggle/
• Careful, don’t copy. Example only. Generalize to
hackathon. Many other ideas
• Your answer is different from Yahoo & Google.
This isn’t a roadmap.
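To give a feel for the "predict missing links" idea, here is a simple common-neighbors baseline. It is not the method from the linked post or from any production system; it is a swapped-in, minimal technique, and the function names and sample graph are assumptions.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def suggest_links(graph, node, top_k=3):
    """Rank nodes not yet linked to `node` by number of shared neighbors."""
    scores = defaultdict(int)
    for neighbor in graph[node]:
        for candidate in graph[neighbor]:
            if candidate != node and candidate not in graph[node]:
                scores[candidate] += 1
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

In a retail setting the nodes could be users and products rather than people, which is one way to generalize the example instead of copying it.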
28. Promotion Modeling
• Is this a long tail problem?
– How to formulate the graph and influence across
nodes?
– Which features to select to use for modeling?
– Still ok if you don’t have the long tail answer.
Follow the Demographics Customer modeling ex.
• How to change the model over time?
• Metrics for promotion effectiveness
– Facebook campaigns are easy to iterate and run.
Still need some form of A/B testing
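One common way to turn an A/B test into a promotion effectiveness metric is relative lift plus a significance check. This is a sketch under assumptions: conversions are counted per visitor, the two groups are independent, and a pooled two-proportion z statistic is used. The function names and sample numbers are illustrative only.

```python
import math

def lift(control_conv, control_n, treat_conv, treat_n):
    """Relative lift of the treatment conversion rate over control."""
    p_c = control_conv / control_n
    p_t = treat_conv / treat_n
    return (p_t - p_c) / p_c

def two_proportion_z(control_conv, control_n, treat_conv, treat_n):
    """Pooled two-proportion z statistic for treatment vs. control."""
    p_c = control_conv / control_n
    p_t = treat_conv / treat_n
    pooled = (control_conv + treat_conv) / (control_n + treat_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treat_n))
    return (p_t - p_c) / se
```

For example, 50 conversions out of 1000 in control against 65 out of 1000 with the promotion is a 30% relative lift, but the z statistic is below the usual 1.96 cutoff, so the test would need more traffic before acting on it.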
29. Structure has to match Strategy
• Partner w/Macy’s? Develop a structure to
work with retail partners to increase their
sales
– E.g. customized shopkick
– Don’t just release APIs, release mobile app source
code ppl can modify
• Test promotions and building profiles?
• … lots of ideas