Link to video: https://www.youtube.com/watch?v=IZ-kC6ut1Lg
In a previous talk, we explained how we developed Imhotep, a distributed system for building decision trees for machine learning. We went on to describe how we build large-scale interactive analytics tools using the same platform. This has kept our engineering and product organizations focused on key metrics by analyzing test results. It also gives our marketing organization timely and accurate insight into our data, allowing us to identify opportunities, spot trends, and learn about our job seekers. In this talk, Zak Cocos, who leads our Marketing Sciences team, and Product Manager Tom Bergman will discuss and provide examples of the valuable insights that can be gained by using Imhotep with almost any data set.
31. How has traffic from Yahoo! changed over
time in Great Britain, Germany, and Japan?
32. QUERY
from:yahoo AND country:(gb, de, jp)
METRIC
visits
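The query and metric above amount to a filter followed by a count grouped by country and time bucket. A minimal Python sketch of that idea follows. It is not Ramses or Imhotep code; the record layout, the weekly bucket, and the sample rows are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical visit records; field names and values are invented for this sketch.
visits = [
    {"week": "2014-W01", "from": "yahoo", "country": "gb"},
    {"week": "2014-W01", "from": "yahoo", "country": "gb"},
    {"week": "2014-W02", "from": "yahoo", "country": "gb"},
    {"week": "2014-W01", "from": "yahoo", "country": "de"},
    {"week": "2014-W01", "from": "google", "country": "jp"},
    {"week": "2014-W01", "from": "yahoo", "country": "us"},
]


def visits_over_time(records, source, countries):
    """Count visits per (country, week) for one traffic source."""
    counts = defaultdict(int)
    for r in records:
        # Equivalent of: from:<source> AND country:(<countries>)
        if r["from"] == source and r["country"] in countries:
            counts[(r["country"], r["week"])] += 1
    return dict(counts)


series = visits_over_time(visits, "yahoo", {"gb", "de", "jp"})
for (country, week), n in sorted(series.items()):
    print(country, week, n)
```

Plotting one line per country from `series` would give the kind of time-series chart the question asks for.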
34. ● How many unique queries in the US?
● What are the top 50 queries in the US?
● How many clicks did each of those queries receive?
Questions Ramses can’t answer
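These three questions all need a group-by on the query string rather than a single filtered count. The sketch below shows that shape of computation in plain Python; the `(country, query, clicks)` row format and the sample data are assumptions, and it says nothing about how Imhotep itself executes such a request.

```python
from collections import Counter

# Hypothetical search log rows: (country, query, clicks on that search).
searches = [
    ("us", "nurse", 1),
    ("us", "nurse", 0),
    ("us", "nurse", 2),
    ("us", "driver", 1),
    ("us", "driver", 1),
    ("us", "architecture", 0),
    ("gb", "nurse", 1),
]


def query_stats(rows, country, top_n=50):
    """Return (number of unique queries, top-N list of (query, searches, clicks))."""
    searches_per_query = Counter()
    clicks_per_query = Counter()
    for row_country, query, clicks in rows:
        if row_country == country:
            searches_per_query[query] += 1
            clicks_per_query[query] += clicks
    top = [
        (query, n, clicks_per_query[query])
        for query, n in searches_per_query.most_common(top_n)
    ]
    return len(searches_per_query), top


unique_queries, top_queries = query_stats(searches, "us", top_n=2)
```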
36. Began as a distributed iteration and group-by
engine for building click prediction models.
Imhotep Origins
37. We use an iterative algorithm to build decision
trees level-by-level.
Decision Tree Builder
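To make "level-by-level" concrete, here is a small single-machine sketch in Python. Each level is one pass that groups rows by (node, feature, value) and aggregates label counts, then every node on that level is split at once. The slide does not give the algorithm's details, so the binary features, the Gini impurity criterion, and the node numbering are assumptions, not a description of Indeed's builder.

```python
from collections import defaultdict


def gini(pos, total):
    """Gini impurity of a group with `pos` positive labels out of `total`."""
    if total == 0:
        return 0.0
    p = pos / total
    return 2.0 * p * (1.0 - p)


def build_tree(rows, labels, features, max_depth):
    """Grow a tree one level at a time.

    rows: list of dicts mapping feature name -> 0 or 1
    labels: list of 0/1 outcomes (for example, clicked or not)
    Returns {node_id: feature} for every node that was split. The children
    of node n are 2n+1 (feature == 0) and 2n+2 (feature == 1).
    """
    node_of = [0] * len(rows)  # every row starts at the root
    splits = {}
    for _ in range(max_depth):
        # One pass over the data: group by (node, feature, value), aggregate labels.
        stats = defaultdict(lambda: [0, 0])  # key -> [positives, total]
        for row, y, node in zip(rows, labels, node_of):
            for f in features:
                s = stats[(node, f, row[f])]
                s[0] += y
                s[1] += 1
        # For each node on this level, pick the feature with the lowest weighted impurity.
        best = {}
        for node in set(node_of):
            for f in features:
                p0, n0 = stats.get((node, f, 0), (0, 0))
                p1, n1 = stats.get((node, f, 1), (0, 0))
                if n0 == 0 or n1 == 0:
                    continue  # this feature does not separate the node's rows
                score = (n0 * gini(p0, n0) + n1 * gini(p1, n1)) / (n0 + n1)
                if node not in best or score < best[node][0]:
                    best[node] = (score, f)
        if not best:
            break
        for node, (_, f) in best.items():
            splits[node] = f
        # Regroup: move each row to the child selected by its node's split.
        node_of = [
            2 * node + 1 + row[splits[node]] if node in best else node
            for row, node in zip(rows, node_of)
        ]
    return splits


rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = [0, 0, 1, 1]
tree = build_tree(rows, labels, ["a", "b"], max_depth=1)
```

The point of the level-wise layout is that the expensive step is a large group-by and aggregate, which is the operation the next slides say Imhotep was built to distribute.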
39. Began as a distributed iteration and group-by
engine for building click prediction models.
Leveraged the ability to do massive group-bys and
aggregations to build a real-time analytics engine.
Imhotep Origins
40. How many Android App users with accounts
older than 30 days saved at least 1 job in the
past week?
41. What titles have the highest click-through rate
for the query “Architecture” in the US?
What about the lowest click-through rate?
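Click-through rate per title is a grouped ratio: clicks divided by impressions for each title, restricted to one query and country. A minimal Python sketch under assumed field names (`q`, `country`, `title`, `clicked`) and invented sample rows:

```python
from collections import defaultdict

# Hypothetical impression log; one row per job shown in a search result.
impressions = [
    {"q": "architecture", "country": "us", "title": "Architect", "clicked": 1},
    {"q": "architecture", "country": "us", "title": "Architect", "clicked": 1},
    {"q": "architecture", "country": "us", "title": "Software Architect", "clicked": 0},
    {"q": "architecture", "country": "us", "title": "Software Architect", "clicked": 1},
    {"q": "architecture", "country": "gb", "title": "Architect", "clicked": 0},
    {"q": "nurse", "country": "us", "title": "Nurse", "clicked": 1},
]


def ctr_by_title(rows, query, country, min_impressions=1):
    """Return [(title, ctr)] sorted from highest to lowest click-through rate."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for r in rows:
        if r["q"] == query and r["country"] == country:
            shown[r["title"]] += 1
            clicked[r["title"]] += r["clicked"]
    rates = {
        title: clicked[title] / shown[title]
        for title in shown
        if shown[title] >= min_impressions  # ignore titles with too little data
    }
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


ranked = ctr_by_title(impressions, "architecture", "us")
highest, lowest = ranked[0], ranked[-1]
```

The `min_impressions` cutoff is an added assumption; without some cutoff, titles shown only once or twice would dominate both ends of the ranking.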
42. For job seekers who click on Google jobs in
Ireland, which other companies’ jobs do they
click on?
84. Methodology
1) organic index: select companies in the US
which received organic clicks
2) crunchbase index: select companies, and
the amount of funding for companies receiving
investments in Austin
3) Join, segment, and do the math!
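The three steps can be sketched in Python as two filtered aggregations joined on company name, followed by a segmentation. The slides do not say how the results were segmented or what math was done, so the funding threshold, the average-clicks calculation, the field names, and the sample rows below are all assumptions.

```python
from collections import defaultdict


def join_clicks_and_funding(organic_rows, crunchbase_rows):
    """Join US organic clicks with Austin funding on company name."""
    clicks = defaultdict(int)
    for r in organic_rows:  # step 1: organic index
        if r["country"] == "us" and r["clicks"] > 0:
            clicks[r["company"]] += r["clicks"]
    funding = defaultdict(int)
    for r in crunchbase_rows:  # step 2: crunchbase index
        if r["city"] == "Austin":
            funding[r["company"]] += r["funding_usd"]
    # Inner join: keep only companies present in both results.
    return {c: (clicks[c], funding[c]) for c in clicks.keys() & funding.keys()}


def average_clicks_by_funding_tier(joined, threshold_usd=1_000_000):
    """Step 3 (assumed): segment by funding size, average clicks per company."""
    totals = defaultdict(lambda: [0, 0])  # tier -> [click sum, company count]
    for n_clicks, usd in joined.values():
        tier = "at_or_above" if usd >= threshold_usd else "below"
        totals[tier][0] += n_clicks
        totals[tier][1] += 1
    return {tier: s / n for tier, (s, n) in totals.items()}


# Hypothetical rows for illustration only.
organic = [
    {"company": "A", "country": "us", "clicks": 10},
    {"company": "A", "country": "us", "clicks": 5},
    {"company": "B", "country": "us", "clicks": 4},
    {"company": "C", "country": "us", "clicks": 7},
    {"company": "D", "country": "gb", "clicks": 9},
]
crunchbase = [
    {"company": "A", "city": "Austin", "funding_usd": 2_000_000},
    {"company": "B", "city": "Austin", "funding_usd": 500_000},
    {"company": "D", "city": "Austin", "funding_usd": 1_000_000},
    {"company": "E", "city": "Boston", "funding_usd": 3_000_000},
]

joined = join_clicks_and_funding(organic, crunchbase)
tiers = average_clicks_by_funding_tier(joined)
```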
87. Large Scale Interactive Analytics Platform
● 123 Unique Indexes
● Largest Index 30TB
● Total size ~125TB
88. Large Scale Interactive Analytics Platform
IQL -> Largely Programmatic access
● approx 76k queries/day
● Avg time to execute 0.67 seconds
Ramses -> Largely Human
● approx 3,400 queries/day
● Avg time to execute 4.4 seconds
89. Large Scale Interactive Analytics Platform
Users
● 198 unique users in past month
● 25,622 unique queries in past month
● Avg 53 queries/user per day
90. Large Scale Interactive Analytics Platform
40+ internal clients
● 6 Analytics Webapps
● 5 dashboards
● 10 programming/scripting shells
● 6 monitoring apps
● … and more
91. Large Scale Interactive Analytics Platform
One Tool-set for all data
● Website usage
● Operational Monitoring
● Financial Reporting
● Google Analytics
● Internal Webapp Usage
● External Reports
98. Terminology Common to both
Software and Architecture
Blueprint
Design
Framework
Infrastructure
Engineer
Project manager
Development
Technical architect
Software
Modeling
Computation
Code reviews
99. Architecture vs Software Titles
Architect
CAD Designer
Project Manager
vs
Software Architect
UI Designer
Project Manager
102. Query Management
Indeed uses Imhotep to improve
matching
Automatically detect results that should
be added or removed from queries
26,790 rules across all countries
106. Next @IndeedEng Talk
Launching Indeed Around the World
Davide Novelli, International Director
David Tulig, Tech Lead
May 28, 2014
http://engineering.indeed.com/talks