A summary of the philosophy and approach taken by the TravelBird Data Science team (and the company as a whole) that allow rapid development of new machine learning algorithms and data insights, and their integration into production and operations.
Architecting for Analytics
1. Rob Winters
Head of Data Science
Architecting for Analytics
Data Science at TravelBird
2. Founded in 2010, TravelBird focuses on bringing back the joy of travel by providing inspiration to explore and simplicity in discovering new destinations. It is active in eleven markets across Europe and inspires three million travelers daily via email, web, and mobile app.
Our Values
Inspiring
Prompting you to visit a place you'd never thought about before.
Curated & local
Proudly introducing travellers to the very best their destinations have to offer, with insider tips and local insight.
Simple & easy
Taking care of the core elements of your journey, and there for you every step of the way.
3. Data Science @ TB
Team Composition
● Team Lead: Rob
● Data Engineering: Niels
● Data Science: Tedy, Egle, Bastien
● Reporting: Jeff, Enzo
Summary Stats
● 30 million events processed/day
● 2.5 million personalized interactions/day
● 700 discrete dashboards/ad-hoc analyses; 300 FTE supported with 60% daily reporting utilization
5. Systems Engineering
The Data Science team independently manages >50% of the company's technology stack, composed of over a dozen systems and services supporting all components of data capture, management, storage, and utilization
8. Marketing channel attribution and spend decision support
[Diagram: an example journey of touchpoints (Affiliate, Email, Affiliate, Affiliate, Email, Email, Email, SEA, Organic, Email, Email) separated by gaps of 3d, 5d, 2d, 1d, 1d, 1d, 3d, 3d, 1d, and 5d, ending in an order]
A journey's net revenue is spread backwards across its touchpoints according to:
- the channel's weight,
- the number of touchpoints in the journey, and
- whether the touchpoint is the first or the last click of the journey.
A sketch of this weighting scheme follows below.
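A minimal sketch of the idea in Python, with invented channel weights and first/last-click multipliers (the production model's actual parameters are not shown here):

# Backwards revenue spreading; all weights below are illustrative assumptions.
CHANNEL_WEIGHT = {"Email": 0.5, "Affiliate": 1.0, "SEA": 1.2, "Organic": 0.8}
FIRST_CLICK_BONUS = 1.5  # hypothetical multiplier for the first touchpoint
LAST_CLICK_BONUS = 2.0   # hypothetical multiplier for the last touchpoint

def attribute(revenue, journey):
    """Spread a journey's net revenue across its touchpoints (oldest first)."""
    scores = []
    for i, channel in enumerate(journey):
        score = CHANNEL_WEIGHT.get(channel, 1.0)
        if i == 0:
            score *= FIRST_CLICK_BONUS
        if i == len(journey) - 1:
            score *= LAST_CLICK_BONUS
        scores.append(score)
    total = sum(scores)
    return [(c, revenue * s / total) for c, s in zip(journey, scores)]

print(attribute(100.0, ["Affiliate", "Email", "SEA", "Organic", "Email"]))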
9. Data Science in Operations
Forecasting
Using a mixture of internal and external data fed through ARIMA and neural-net models, we predict expected travel demand and how it matches our negotiated availability
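A minimal forecasting sketch using statsmodels' ARIMA on a made-up daily demand series (the model order and data are illustrative; the production mixture of models is not shown):

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical daily demand series.
demand = pd.Series(
    [120, 135, 128, 150, 160, 155, 170, 180, 175, 190, 200, 195, 210, 220],
    index=pd.date_range("2018-01-01", periods=14, freq="D"),
)

model = ARIMA(demand, order=(1, 1, 1))  # (p, d, q) chosen for illustration
fitted = model.fit()
print(fitted.forecast(steps=7))         # expected demand for the next week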
Decision Support
By analyzing offer features and user interactions, we automatically recommend changes to images, text, calendars, prices, etc.
Calendar Analysis
By using CNNs to analyze calendar features, we can identify how users interpret calendars and how that changes over time, allowing better negotiation with partners
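A minimal Keras sketch of what such a calendar CNN might look like, assuming calendars are encoded as 6-week x 7-day single-channel grids; the layers and target below are illustrative, not the production model:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(6, 7, 1)),  # 6 weeks x 7 days
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # e.g. probability the user books from this calendar
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()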
10. Other Tasks
● Customer base management and strategy
● A/B and multivariate testing
● Financial target setting
● Liability risk management
● Organizational coaching/training
● GDPR management of 3rd parties
● Website optimization algorithms
● Partner billing/invoicing
If it has data, we deal with it
12. Self-service everywhere
The number one focus in reporting is self-service, and everyone from the CEO down is expected to be comfortable in BI. Reporting is a standard part of new-hire onboarding, and advanced trainings are conducted on a monthly basis.
The best way to get value from data is if everyone mines it
13. Never deliver more than MVP
With our stakeholders we have agreed to always deliver an MVP and iterate together, focusing on shipping quickly and perfecting later. This often means our initial products have incomplete features, only partially validated data, known bugs, etc. Once the core problems are solved, the incomplete solution may remain sufficient for several quarters.
Agile Always
14. Close partners with all teams
Data Science acts as a peer operational team with Marketing, Sales, etc. This means that we join the same operational meetings, have similar targets, and work directly together to solve problems. It also means that every project is jointly run start-to-finish with one or more people from an ops team.
Partnership, not Service
16. T-Shaped People
Every person is expected to be full-stack capable and to understand the general mechanics of everything relating to their domain. This includes technical components (e.g. reporting analysts write ETL and API integrations) as well as functional ones (everyone does their own stakeholder management).
Everyone can do everything
17. Specialists lower complexity for others
We place a large focus on reducing complexity for others when they need to interface with different domains. This means building standardized tools, conducting trainings, and continuous side-by-side coaching and pair programming.
Always be helping
18. Continuous Improvement
Learning and coaching are central to our work. Each quarter, everyone works on projects that are outside their expertise but within their learning goals; 15% of time is reserved for learning and “hack time”; and each person is paired quarterly with another to coach and be coached.
Always Be Learning
20. Architectural Goals
Cheap
We don't want to pay for anything unless we have to, and even then we try not to pay. Our architecture is designed to minimize costs whenever possible by using open source, lots of flexible scaling, and low-cost hosted solutions
Auto-scaling/recovering
With only one engineer and no on-call, our systems should automatically adjust to demand and handle near-catastrophic failure gracefully
Easily flexible
As every person must be able to manage parts of the infrastructure, we have to build tooling and functionality that allow non-engineers to build and destroy servers, scale clusters, and productionalize jobs without any support
21. Our Architecture (Overall)
● Fully AWS hosted
● Mixture of permanent hosts, auto-scaled hosts, and dynamically launched hosts (e.g. for ML jobs)
● Production is built in Django + MySQL
● Data Science architecture is:
○ Postgres + Vertica for databases
○ Kinesis for event buffering
○ Spark, Keras, TensorFlow for ML
○ Airflow + Rundeck for scheduling
○ Redis for real-time data
○ S3 + HDFS + GFS for storage
And Python for EVERYTHING
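To illustrate the scheduling layer, a minimal Airflow DAG sketch (Airflow 1.x style; the DAG and task names are hypothetical, not actual TravelBird jobs):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {"owner": "data-science", "retries": 2, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="nightly_recommender_refresh",  # hypothetical job
    default_args=default_args,
    start_date=datetime(2018, 1, 1),
    schedule_interval="@daily",
) as dag:
    extract = BashOperator(task_id="extract_events", bash_command="python extract_events.py")
    train = BashOperator(task_id="train_model", bash_command="python train_model.py")
    extract >> train  # train only after extraction succeeds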
22. Reporting in Detail: Self-Service
● Structural trainings every month, with a total of 12 hours of training material prepared by the team
● Two tools:
○ Tableau for general reporting
○ Metabase for more technical users (allows raw SQL)
● >80% of all reporting is end-user created and maintained
23. Events in Detail: Real-Time + Micro-batch
Our inspiration: the Lambda architecture
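A minimal sketch of the micro-batch side: draining one Kinesis shard with boto3 (the stream and shard names are hypothetical; a real consumer would iterate all shards and page with NextShardIterator):

import json
import boto3

kinesis = boto3.client("kinesis", region_name="eu-west-1")

iterator = kinesis.get_shard_iterator(
    StreamName="frontend-events",      # hypothetical stream
    ShardId="shardId-000000000000",
    ShardIteratorType="TRIM_HORIZON",  # start from the oldest buffered record
)["ShardIterator"]

batch = kinesis.get_records(ShardIterator=iterator, Limit=1000)
events = [json.loads(record["Data"]) for record in batch["Records"]]
print(len(events), "events in this micro-batch")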
24. Machine Learning in Detail
● Used for all the big, sexy analytics
○ Regression on billions of records (see the sketch below)
○ Collaborative filtering
■ The average domain has 15k products and 1.5M training users
● PySpark instead of Scala allows recycling of all our custom Python libraries into ML jobs (rather than rewriting)
● In modern Spark, performance in Python and Scala is about the same (when using Spark functionality)
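A minimal sketch of the regression-at-scale pattern in PySpark (the path and column names are hypothetical):

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("demand-regression").getOrCreate()

df = spark.read.parquet("s3://bucket/events/")  # hypothetical event store
assembler = VectorAssembler(
    inputCols=["price", "discount", "days_to_departure"], outputCol="features"
)
lr = LinearRegression(featuresCol="features", labelCol="bookings")
model = lr.fit(assembler.transform(df))
print(model.coefficients)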
● Used for all the small, sexy analytics
○ Deep learning on session purchase propensity
○ Predicting sellout dates using RNNs (see the sketch below)
● Keras is easier and cleaner to read than raw TensorFlow
● Spark deep learning functionality is underdeveloped at this time
● In deep learning, TF is #1 and Keras #2, so Keras + TF is … #12? Great community and development
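A minimal Keras sketch of the RNN idea, assuming fixed-length sequences of daily sales features; the shapes, layers, and random training data are illustrative only:

import numpy as np
from tensorflow.keras import layers, models

X = np.random.rand(1000, 30, 4)  # 1000 offers x 30 days x 4 features (fabricated)
y = np.random.rand(1000)         # normalized days until sellout (fabricated)

model = models.Sequential([
    layers.LSTM(32, input_shape=(30, 4)),
    layers.Dense(1),             # regression head: predicted days to sellout
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=64, verbose=0)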
25. The BI-brary and Central Config
Tools we've built to facilitate data science
The Bibrary is a Python library everyone contributes to, containing standardized functionality to be reused for any conceivable task: everything from data management to Spark and TensorFlow functionality
The Executor
The Executor allows anyone to launch servers or clusters, execute code remotely, process data into the database, etc., all from models using a simple JSON configuration block (see the example below)
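A hypothetical example of what such a configuration block might look like; this schema is invented for illustration and is not the actual Bibrary format:

config = {
    "cluster": {"type": "spark", "workers": 4, "instance_type": "r4.xlarge"},
    "job": {"entrypoint": "models/recommender.py", "schedule": "daily"},
    "output": {"table": "dwh.recommendations", "mode": "overwrite"},
}
# executor.run(config)  # hypothetical call that launches the cluster and runs the job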
Auto-DBA
A large part of performance management and optimization is automated, including storage management, likely foreign-key identification, and data security
26. Working in Python exclusively means that
data science is easy
● A simplified recommender model fits in ~20 lines of Python (see the sketch below)
● A data scientist familiar with Python can be working productively in Spark in a few days
● Easy, fast modeling means we can keep iteration time low, increasing the number of tests we can run
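The code image from the original slide is not reproduced here; a minimal PySpark sketch of what a ~20-line recommender might look like (the path and column names are hypothetical):

from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("recommender").getOrCreate()

# Hypothetical interaction table: user_id, product_id, implicit rating.
ratings = spark.read.parquet("s3://bucket/interactions/")

als = ALS(
    userCol="user_id",
    itemCol="product_id",
    ratingCol="rating",
    implicitPrefs=True,  # clicks/views rather than explicit scores
    rank=32,
    regParam=0.1,
)
model = als.fit(ratings)
model.recommendForAllUsers(10).show(5)  # top-10 products per user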
27. But the production code is equally easy
This Bibrary function interprets a JSON blob into SQL to determine what content should be sent in an email
● SQL + Python makes it easy for data scientists to understand
● Using a consistent input/output structure means that very little testing is needed when introducing new models, templates, or products
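The real Bibrary function is not reproduced here; a hypothetical sketch of the JSON-to-SQL idea (the schema and helper are invented, and a production version would need parameterized queries rather than string interpolation):

def selection_to_sql(blob):
    """Translate a declarative selection block into a SQL string (illustrative only)."""
    clauses = ["{} {} {}".format(f["column"], f["op"], repr(f["value"]))
               for f in blob["filters"]]
    return "SELECT {} FROM {} WHERE {} LIMIT {}".format(
        ", ".join(blob["select"]), blob["table"], " AND ".join(clauses), blob["limit"])

selection = {
    "table": "offers",
    "select": ["offer_id", "title", "image_url"],
    "filters": [{"column": "market", "op": "=", "value": "NL"},
                {"column": "margin", "op": ">", "value": 0.15}],
    "limit": 6,
}
print(selection_to_sql(selection))
# SELECT offer_id, title, image_url FROM offers WHERE market = 'NL' AND margin > 0.15 LIMIT 6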
29. The Goal
The primary goal of the project was to improve the effectiveness of our marketing attribution model, and thereby the team's ability to spend effectively. To achieve this, the secondary goals were to:
● Identify the largest opportunities for model improvement
● Build, test, and accept model changes for the two largest opportunities
● Conduct a workshop with Marketing on how the changes will impact their channel strategies
30. Project Start: Team and Kickoff
The team consisted of:
● Enzo: Data Analyst studying Data Science
● Bastien: Data Scientist
● Noah: Display marketer
● Colin: SEA marketer
Together they reviewed the products with the lowest performance in attribution and identified likely model factors that could be adjusted to account for the product variances
31. Analysis and Planning
Together they prioritized two changes:
● Last click: use channel conversion propensity to re-weight the last session
● Dynamic journey decay: based on a product's average time-to-purchase, dynamically re-weight older sessions (see the sketch below)
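A minimal sketch of the dynamic-decay idea, assuming exponential decay scaled by the product's average time-to-purchase (the production model's actual functional form is not shown):

import math

def session_weight(days_before_purchase, avg_time_to_purchase):
    """Older sessions count less; fast-selling products decay faster."""
    return math.exp(-days_before_purchase / avg_time_to_purchase)

print(session_weight(10, 5.0))   # old touchpoint, fast-selling product: small weight
print(session_weight(10, 30.0))  # same age, slow-selling product: larger weight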
Together they defined deliverables, timelines, and scope of work, and jointly divided tasks including learning goals:
● Enzo is the better engineer and would supervise Bastien in data pipeline changes
● Bastien is the more experienced Data Scientist and would support Enzo in algorithm development
32. Development and Acceptance
Development
The team used standard deployment scripting to create a sandbox DWH environment and to build new model workers each day, allowing them to easily test and evaluate on 100% of historical data (>300M rows)
Communication
The team communicated progress directly with the CMO and stakeholders, with intermediary acceptance conducted via Slack messages. Colin regularly inspected intermediary output using SQL and Tableau
Final Acceptance
After shipping the model changes, acceptance was conducted as a joint review with marketing team leads. Start to finish, the project took two weeks from agreement to production
33. Knowledge Sharing
To conclude, Bastien conducted a workshop and attribution Q&A with all of Marketing, senior leadership, and other operational folks to explain attribution and how Markov chains work