Rob Winters
Head of Data Science
Architecting for
Analytics
Data Science at TravelBird
Founded in 2010, TravelBird’s
focus is to bring back the joy of
travel by providing inspiration to
explore and simplicity in
discovering new destinations.
Active in eleven markets across
Europe and inspiring three
million travelers daily via email,
web, and mobile app.
Our Values
Inspiring
Prompting you to visit a place you’d
never thought about before.
Curated & local
Proudly introducing travellers to the
very best their destinations have to
offer, with insider tips and local
insight.
Simple & easy
Taking care of the core elements of
your journey, and there for you
every step of the way.
● Team Lead: Rob
● Data Engineering: Niels
● Data Science: Tedy, Egle, Bastien
● Reporting: Jeff, Enzo
Data
Science @
TB
Team
Composition
Summary Stats
● 30 million events processed/day
● 2.5 million personalized
interactions/day
● 700 discrete dashboards/ad hoc
analyses, 300 FTE supported with
60% daily reporting utilization
What does the
Data Science
team do?
Systems
Engineering
The Data Science team independently
manages >50% of the company
technology stack composed of over a
dozen systems and services to support
all components of data capture,
management, storage, and utilization
Traditional BI/Reporting
Omni-channel personalization and CRM
● Every email
● The web and app
● Every service interaction
Marketing channel attribution and spend decision support
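[Figure: example customer journey leading to an order. Touchpoints in sequence: Affiliate, Email, Affiliate, Affiliate, Email, Email, Email, SEA, Organic, Email, Email. Gaps between touchpoints: 3d, 5d, 2d, 1d, 1d, 1d, 3d, 3d, 1d, 5d.]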
A journey’s net revenue is spread backwards
across its touchpoints according to:
- the channel weight,
- the number of touchpoints in the journey, and
- whether the touchpoint is the first or the
last click of the journey.
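The allocation rule above can be sketched in a few lines of Python. This is a minimal illustration and not the model from the talk: the channel weights, the first/last-click bonus factors, and the function name `spread_revenue` are all assumptions invented for the example. Only the three inputs (channel weight, number of touchpoints, first or last position) come from the slide.

```python
# Hypothetical channel weights and position bonuses; illustrative values only.
CHANNEL_WEIGHT = {"Affiliate": 0.8, "Email": 1.0, "SEA": 1.2, "Organic": 0.6}
FIRST_CLICK_BONUS = 1.5
LAST_CLICK_BONUS = 2.0


def spread_revenue(touchpoints, net_revenue):
    """Split a journey's net revenue across its touchpoints.

    Each touchpoint gets a raw score from its channel weight, boosted if it
    is the first or the last click. Scores are normalised so the shares sum
    to the journey's net revenue, whatever the number of touchpoints.
    """
    if not touchpoints:
        return []
    last = len(touchpoints) - 1
    scores = []
    for i, channel in enumerate(touchpoints):
        score = CHANNEL_WEIGHT.get(channel, 1.0)
        if i == 0:
            score *= FIRST_CLICK_BONUS
        if i == last:
            score *= LAST_CLICK_BONUS
        scores.append(score)
    total = sum(scores)
    return [(ch, net_revenue * s / total) for ch, s in zip(touchpoints, scores)]


# The eleven-touchpoint journey from the slide, with a made-up revenue figure.
journey = ["Affiliate", "Email", "Affiliate", "Affiliate", "Email", "Email",
           "Email", "SEA", "Organic", "Email", "Email"]
shares = spread_revenue(journey, net_revenue=110.0)
```

Normalising by the total score is what makes the number of touchpoints matter: the longer the journey, the smaller each touchpoint's share.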
Forecasting
Using a mixture of internal and
external data fed through ARIMA
and neural net models, we
predict the expected travel
demand and how that matches
our negotiated availability
Data Science in Operations
Decision Support
By analyzing offer features and
user interactions, automatically
recommend changes to image,
text, calendar, price, etc.
Calendar Analysis
By using CNNs to analyze
calendar features, we can
identify how users interpret
calendars and how that changes
over time, allowing better
negotiation with partners
Other Tasks
● Customer base management and
strategy
● A/B and multivariate testing
● Financial target setting
● Liability risk management
● Organizational coaching/training
● GDPR management of 3rd parties
● Website optimization algorithms
● Partner billing/invoicing
If it has
data, we
deal with it
Driving outcomes in the company
Organizational
Philosophies
Self-service
everywhere
The number one focus in reporting is
self-service, and everyone from the
CEO down is expected to be
comfortable in BI. Reporting is a
standard part of new hire onboarding
and advanced trainings are conducted
on a monthly basis.
The best way
to get value
from data is if
everyone
mines it
Never deliver
more than MVP
With our stakeholders we have agreed
to always deliver MVP and iterate
together, focusing on shipping quickly
and perfecting later. This means that
many times our initial products have
incomplete features, partly invalidated
data, known bugs, etc. Once the core
problems are solved, the incomplete
solution may remain sufficient for
several quarters.
Agile
Always
Close partners
with all teams
Data Science acts as a peer operational
team with Marketing, Sales, etc. This
means that we join the same
operational meetings, have similar
targets, and directly work together to
solve problems. This also means that
every project is jointly run start-to-finish
with one or more people from an ops
team.
Partnership,
not Service
How we do what we do
Team
Philosophies
T-Shaped People
Every person is expected to be
full-stack capable and understand the
general mechanics of everything relating
to their domain. This includes technical
components (e.g., reporting staff write
ETL and API integrations) as well as
functional (everyone does their own
stakeholder management)
Everyone
can do
everything
Specialists lower
complexity for
others
We place large focus on working to
reduce complexity for others when they
need to interface in different domains.
This means building standardized tools,
conducting trainings, and continuous
side-by-side coaching and pair
programming
Always be
helping
Continuous
Improvement
Learning and coaching are central to our
work. Everyone works on projects each
quarter that are outside their expertise
but in their learning goals, 15% of time
is reserved for learning and “hack time”,
and each person is paired quarterly with
another to coach and be coached.
Always Be
Learning
Translating philosophy to
technology
In Detail:
Infrastructure
Cheap
We don’t want to pay for
anything unless we have to, and
even then we try not to pay. Our
architecture is designed to
minimize costs whenever
possible by using open source,
lots of flexible scaling, and low
cost hosted solutions
Architectural Goals
Auto scaling/recovering
With only one engineer and no
on-call, our systems should be
able to automatically adjust to
demand and handle near
catastrophic failure gracefully
Easily flexible
As every person must be able to
partially manage parts of the
infrastructure, we have to be
able to build tooling and
functionality that allow
non-engineers to build and
destroy servers, scale clusters,
and productionalize jobs without
any support
Our Architecture (Overall)
● Fully AWS hosted
● Mixture of permanent hosts, auto-scaled,
and dynamically launched (e.g., for ML jobs)
● Production is built in Django + MySQL
● Data Science architecture (interesting stuff
in red) is:
○ Postgres + Vertica for databases
○ Kinesis for event buffering
○ Spark, Keras, TensorFlow for ML
○ Airflow + Rundeck for scheduling
○ Redis for real-time data
○ S3 + HDFS + GFS for storage
And Python for EVERYTHING
Reporting in Detail: Self-Service
● Structural trainings every month,
total of 12 hours of training
material prepared by team
● Two tools
○ Tableau for general reporting
○ Metabase for more technical users
(allows raw SQL)
● >80% of all reporting is end user
created and maintained
Event In Detail: Real Time + Microbatch
Our Inspiration: Lambda architecture
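As a rough illustration of the Lambda idea referenced above: a batch layer periodically recomputes a complete view from the stored events, a speed layer keeps counts for events that arrived since the last batch run, and queries merge the two. The class name, the event shape, and the in-memory stores below are hypothetical. The deck names Kinesis, Redis, and the databases for these roles but does not show the implementation.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda-style event counter (in-memory stand-in for real stores)."""

    def __init__(self):
        self.master_log = []         # append-only event log (batch layer input)
        self.batch_view = Counter()  # recomputed on every micro-batch run
        self.speed_view = Counter()  # events seen since the last batch run

    def ingest(self, event):
        """Append the event to the log and update the real-time view."""
        self.master_log.append(event)
        self.speed_view[event["offer_id"]] += 1

    def run_microbatch(self):
        """Recompute the batch view from the full log, then reset the speed view."""
        self.batch_view = Counter(e["offer_id"] for e in self.master_log)
        self.speed_view.clear()

    def views(self, offer_id):
        """Serve a query by merging the batch and real-time views."""
        return self.batch_view[offer_id] + self.speed_view[offer_id]


counter = LambdaCounter()
for offer in ["a", "b", "a"]:
    counter.ingest({"offer_id": offer})
counter.run_microbatch()             # "a" and "b" are now in the batch view
counter.ingest({"offer_id": "a"})    # arrives after the batch run
total_views_a = counter.views("a")   # merges both layers
```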
Machine Learning In Detail
● Spark: used for all the big, sexy analytics
○ Regression on billions of records
○ Collaborative filtering
■ Average domain has 15k
products and 1.5M training
users
● PySpark instead of Scala allows
recycling of all our custom Python
libraries into ML jobs (rather than
rewriting)
● In modern Spark, performance in Python
and Scala is about the same (when using
Spark functionality)
● Keras + TensorFlow: used for all the small, sexy analytics
○ Deep learning on session purchase
propensity
○ Predicting sellout dates using
RNNs
● Keras is easier and cleaner to read than
raw TensorFlow
● Spark deep learning functionality is
underdeveloped at this time
● In deep learning, TF is #1 and Keras #2,
so Keras + TF is … #12? Great
community and development
The BI-brary and Central Config
The BI-brary is a Python library that
everyone contributes to. It contains
standardized functionality that can be
reused for any conceivable task, from
data management to Spark and
TensorFlow functionality.
Tools we’ve built to facilitate data science
The Executor
The Executor allows anyone to launch
servers or clusters, execute code
remotely, process data into the
database, etc., from models, all using a
simple JSON configuration block
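The deck does not show what such a configuration block looks like, so the following is only a sketch of the idea. Every key name, the `plan_from_config` function, and the step wording are assumptions. The code validates a JSON block and turns it into an ordered plan; it makes no cloud or database calls.

```python
import json

# Hypothetical configuration block; every key name here is an assumption.
CONFIG = """
{
  "job": "train_recommender",
  "cluster": {"type": "spark", "workers": 4},
  "code": "models/recommender.py",
  "output": {"schema": "ml", "table": "recommendations"}
}
"""

REQUIRED_KEYS = ("job", "cluster", "code", "output")


def plan_from_config(raw):
    """Validate a JSON config block and return the ordered steps to run.

    This only builds a plan (a list of human-readable steps); a real
    executor would call the cloud and database APIs at each step.
    """
    config = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"config is missing keys: {', '.join(missing)}")
    cluster = config["cluster"]
    output = config["output"]
    return [
        f"launch {cluster['workers']} x {cluster['type']} worker(s)",
        f"run {config['code']} for job {config['job']}",
        f"load results into {output['schema']}.{output['table']}",
        "terminate cluster",
    ]


steps = plan_from_config(CONFIG)
```

Keeping the whole job description in one declarative block is what lets non-engineers launch work without writing infrastructure code: they edit values, and the tool decides the order of operations.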
Auto-DBA
A large part of performance
management and optimization is
automated including storage
management, likely foreign key
identification, and data security
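How the Auto-DBA identifies likely foreign keys is not described in the deck. One common heuristic, sketched here as an assumption, is value inclusion: a column is a foreign-key candidate if every non-null value it holds also appears in a unique, non-null column of another table. The table layout (plain dictionaries) and the function name are hypothetical.

```python
def likely_foreign_keys(tables):
    """Return (child_table, child_col, parent_table, parent_col) candidates.

    `tables` maps table name -> {column name -> list of values}.
    A parent column qualifies if its values are unique and non-null; a child
    column is a candidate if every non-null value it holds exists in the parent.
    """
    # Collect columns that could serve as a referenced key.
    key_columns = []
    for table, columns in tables.items():
        for column, values in columns.items():
            if values and None not in values and len(set(values)) == len(values):
                key_columns.append((table, column, set(values)))

    candidates = []
    for table, columns in tables.items():
        for column, values in columns.items():
            present = {v for v in values if v is not None}
            if not present:
                continue
            for parent_table, parent_column, parent_values in key_columns:
                if parent_table == table:
                    continue  # ignore self-references in this sketch
                if present <= parent_values:
                    candidates.append((table, column, parent_table, parent_column))
    return candidates


tables = {
    "offers": {"offer_id": [1, 2, 3], "title": ["Rome", "Oslo", "Lyon"]},
    "orders": {"order_id": [10, 11, 12, 13], "offer_id": [1, 1, 3, None]},
}
fks = likely_foreign_keys(tables)
```

Value inclusion alone produces false positives when unrelated columns share a small integer range, which is presumably why the slide says "likely": a practical version would also compare column names and types.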
Working in Python exclusively means that
data science is easy
● This is a simplified
recommender model in 20 lines
of Python
● A data scientist familiar with
Python can be working
productively in Spark in a few
days
● Easy, fast modeling means we
can keep iteration time low,
increasing number of tests
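The code shown on the slide is not part of this transcript. As a stand-in, here is a small item-based collaborative filter in about 20 lines of standard-library Python. It is not the team's model, and it deliberately avoids PySpark so that it runs anywhere; the data and the `recommend` function are invented for illustration.

```python
from collections import defaultdict
from math import sqrt


def recommend(interactions, user, top_n=3):
    """Item-based collaborative filtering on implicit (user, item) pairs."""
    items_by_user = defaultdict(set)
    users_by_item = defaultdict(set)
    for u, item in interactions:
        items_by_user[u].add(item)
        users_by_item[item].add(u)

    seen = items_by_user.get(user, set())
    scores = defaultdict(float)
    for candidate, candidate_users in users_by_item.items():
        if candidate in seen:
            continue
        for item in seen:
            overlap = len(candidate_users & users_by_item[item])
            if overlap:
                # Cosine similarity between two binary item vectors.
                scores[candidate] += overlap / sqrt(
                    len(candidate_users) * len(users_by_item[item])
                )
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


interactions = [
    ("ann", "rome"), ("ann", "paris"),
    ("bob", "rome"), ("bob", "paris"), ("bob", "oslo"),
    ("cat", "rome"), ("cat", "lima"),
]
suggestions = recommend(interactions, "ann")
```

Here `ann` is recommended `oslo` ahead of `lima`, because `oslo` co-occurs with both destinations she has already interacted with.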
But the production code is equally easy
This BI-brary function translates a JSON
blob into SQL to determine what content
is sent in an email
● SQL + Python makes it easy for
data scientists to understand
● Using consistent input/output
structure means that very little
testing is needed when introducing
new models, templates, or products
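The function itself is not reproduced in the transcript. The sketch below shows one way a JSON blob could be translated into SQL, under stated assumptions: the JSON keys, the `offers` table, its columns, and the name `build_content_query` are all hypothetical. Values are passed as query parameters using the `%s` placeholder style (as used by drivers such as psycopg2), and field names are checked against a whitelist so that nothing from the blob is interpolated into the SQL text unchecked.

```python
import json

# Hypothetical whitelist of filterable fields for a hypothetical `offers` table.
ALLOWED_FILTERS = {"market", "category", "max_price"}


def build_content_query(blob):
    """Translate a JSON blob into a parameterised SQL query and its parameters."""
    spec = json.loads(blob)
    clauses, params = [], []
    # Sort the filters so the same blob always yields the same SQL text.
    for field, value in sorted(spec.get("filters", {}).items()):
        if field not in ALLOWED_FILTERS:
            raise ValueError(f"unsupported filter: {field}")
        if field == "max_price":
            clauses.append("price <= %s")
        else:
            clauses.append(f"{field} = %s")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "TRUE"
    limit = int(spec.get("limit", 10))
    sql = (
        "SELECT offer_id FROM offers "
        f"WHERE {where} ORDER BY score DESC LIMIT {limit}"
    )
    return sql, params


sql, params = build_content_query(
    '{"filters": {"market": "NL", "max_price": 300}, "limit": 5}'
)
```

Because the input and output shapes stay fixed (JSON in, SQL plus parameters out), adding a new filter only means extending the whitelist, which fits the point above about needing little testing for new models or templates.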
Translating technology and
philosophy to outcomes
Example:
Attribution
Enhancement
The primary goal of the project was to
improve the effectiveness of our
marketing attribution model, improving
the team’s ability to spend effectively.
To achieve this goal, the secondary
goals were to:
● Identify the largest opportunities for
model improvement
● Build, test, and accept model changes
for two largest opportunities
● Conduct a workshop with Marketing on
how the changes will impact their
channel strategies
The Goal
The team consisted of:
● Enzo: Data Analyst studying Data
Science
● Bastien: Data Scientist
● Noah: Display marketer
● Colin: SEA marketer
Together they reviewed products that
had the lowest performance in
attribution and identified likely model
factors that could be adjusted to
account for the product variances
Project
Start
Team and Kickoff
Together they prioritized two changes:
● Last click: use channel conversion
propensity to re-weight the last session
● Dynamic journey decay: based on a
product’s average time-to-purchase,
dynamically reweight older sessions
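The two changes can be illustrated together. The formulas below are assumptions chosen for the sketch, not the ones the team shipped: older sessions decay exponentially with a half-life equal to the product's average time-to-purchase, and the last session is scaled up by its channel's conversion propensity. The function name and all sample values are hypothetical.

```python
def session_weights(sessions, avg_days_to_purchase, propensity):
    """Weight each session of a journey for attribution.

    sessions: list of (channel, days_before_order), oldest first.
    avg_days_to_purchase: the product's average time-to-purchase (> 0), used
        here as the half-life of an exponential decay (an assumption).
    propensity: channel -> conversion propensity in [0, 1], used to scale
        the last session (also an assumption).
    Returns weights normalised to sum to 1.
    """
    if not sessions:
        return []
    raw = []
    for index, (channel, days_before) in enumerate(sessions):
        # Dynamic journey decay: older sessions count for less.
        weight = 0.5 ** (days_before / avg_days_to_purchase)
        # Last click: re-weight the final session by channel propensity.
        if index == len(sessions) - 1:
            weight *= 1.0 + propensity.get(channel, 0.0)
        raw.append(weight)
    total = sum(raw)
    return [w / total for w in raw]


sessions = [("Affiliate", 10), ("Email", 5), ("SEA", 0)]
propensity = {"SEA": 0.5, "Email": 0.2}
fast = session_weights(sessions, avg_days_to_purchase=5, propensity=propensity)
slow = session_weights(sessions, avg_days_to_purchase=20, propensity=propensity)
```

For a product that is typically bought quickly (`fast`), the ten-day-old Affiliate session receives a smaller share than it does for a slowly purchased product (`slow`), which is the intent of making the decay depend on the product.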
Together they defined deliverables, timelines,
scope of work and jointly divided tasks
including learning goals:
● Enzo is the better engineer and would
supervise Bastien in data pipeline
changes
● Bastien is the more experienced Data
Scientist and would support Enzo in
algorithm development
Analysis and
Planning
Development
The team used standard deployment
scripting to create a sandbox DWH
environment and to build new model
workers each day, allowing them to
easily test and evaluate on 100% of
historical data (>300M rows)
Development and Acceptance
Communication
The team directly communicated
progress with the CMO and
stakeholders, with intermediary
acceptance conducted based on
Slack messages. Colin regularly
looked into intermediary output using
SQL and Tableau
Final Acceptance
After shipping the model changes,
acceptance was conducted as a joint
review with marketing team leads.
Start to finish was two weeks from
agreement of project to production
Knowledge Sharing
To conclude, Bastien conducted a
workshop/attribution Q&A with all of
marketing, senior leadership, and
other operational folks to explain
attribution and how Markov chains
work
Culture, not
Technology, drives
data-driven
outcomes
Determinants of health, dimensions of health, positive health and spectrum of...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Architecting for analytics

  • 1. Rob Winters Head of Data Science Architecting for Analytics Data Science at TravelBird
  • 2. Founded in 2010, TravelBird’s focus is to bring back the joy of travel by providing inspiration to explore and simplicity in discovering new destinations. Active in eleven markets across Europe and inspiring three million travelers daily via email, web, and mobile app. Our Values. Inspiring: Prompting you to visit a place you’d never thought about before. Curated & local: Proudly introducing travellers to the very best their destinations have to offer, with insider tips and local insight. Simple & easy: Taking care of the core elements of your journey, and there for you every step of the way.
  • 3. Data Science @ TB. Team Composition: ● Team Lead: Rob ● Data Engineering: Niels ● Data Science: Tedy, Egle, Bastien ● Reporting: Jeff, Enzo. Summary Stats: ● 30 million events processed/day ● 2.5 million personalized interactions/day ● 700 discrete dashboards/ad hoc analyses, 300 FTE supported with 60% daily reporting utilization
  • 4. What does the Data Science team do?
  • 5. Systems Engineering The Data Science team independently manages >50% of the company technology stack composed of over a dozen systems and services to support all components of data capture, management, storage, and utilization
  • 7. Omni-channel personalization and CRM Every email The web and app Every service interaction
  • 8. Marketing channel attribution and spend decision support. [Figure: example journey ending in an order. Touchpoints: Affiliate, Email, Affiliate, Affiliate, Email, Email, Email, SEA, Organic, Email, Email; gaps: 3d, 5d, 2d, 1d, 1d, 1d, 3d, 3d, 1d, 5d.] A journey’s net revenue is spread backwards across its touchpoints according to: channel weight; the number of touchpoints in the journey; and whether the touchpoint is the first or the last click of the journey.
  • 9. Data Science in Operations. Forecasting: Using a mixture of internal and external data fed through ARIMA and neural net models, we predict the expected travel demand and how that matches our negotiated availability. Decision Support: By analyzing offer features and user interactions, we automatically recommend changes to image, text, calendar, price, etc. Calendar Analysis: By using CNNs to analyze calendar features, we can identify how users interpret calendars and how that changes over time, allowing better negotiation with partners.
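The backwards revenue split described on slide 8 could be sketched roughly as follows. This is a minimal illustration, not TravelBird’s model: the channel weights, the first/last-click multiplier `EDGE_BONUS`, and the function name `attribute_revenue` are all assumptions made up for the example.

```python
from typing import Dict, List

# Hypothetical channel weights; the slides do not give the real values.
CHANNEL_WEIGHT: Dict[str, float] = {
    "Affiliate": 0.5,
    "Email": 1.0,
    "SEA": 1.5,
    "Organic": 1.0,
}
# Hypothetical multiplier for the first and the last click of a journey.
EDGE_BONUS = 2.0


def attribute_revenue(journey: List[str], net_revenue: float) -> List[float]:
    """Split net_revenue over the touchpoints of one journey."""
    if not journey:
        return []
    last = len(journey) - 1
    scores = []
    for position, channel in enumerate(journey):
        score = CHANNEL_WEIGHT.get(channel, 1.0)  # unknown channels get weight 1
        if position == 0 or position == last:
            score *= EDGE_BONUS
        scores.append(score)
    # Normalizing by the total makes each share shrink as the journey
    # gains touchpoints, and guarantees the shares sum to net_revenue.
    total = sum(scores)
    return [net_revenue * s / total for s in scores]


journey = ["Affiliate", "Email", "SEA", "Email"]
shares = attribute_revenue(journey, 100.0)
for channel, share in zip(journey, shares):
    print(f"{channel:10s} {share:6.2f}")
```

In this sketch the last Email click earns more than the middle Email click only because of the edge multiplier; the relative sizes say nothing about the real weights.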
  • 10. Other Tasks ● Customer base management and strategy ● A/B and multivariate testing ● Financial target setting ● Liability risk management ● Organizational coaching/training ● GDPR management of 3rd parties ● Website optimization algorithms ● Partner billing/invoicing If it has data, we deal with it
  • 11. Driving outcomes in the company Organizational Philosophies
  • 12. Self-service everywhere The number one focus in reporting is self-service, and everyone from the CEO down is expected to be comfortable in BI. Reporting is a standard part of new hire onboarding and advanced trainings are conducted on a monthly basis. The best way to get value from data is if everyone mines it
  • 13. Never deliver more than MVP With our stakeholders we have agreed to always deliver MVP and iterate together, focusing on shipping quickly and perfecting later. This means that many times our initial products have incomplete features, partly invalidated data, known bugs, etc. Once the core problems are solved, the incomplete solution may remain sufficient for several quarters. Agile Always
  • 14. Close partners with all teams Data Science acts as a peer operational team with Marketing, Sales, etc. This means that we join the same operational meetings, have similar targets, and directly work together to solve problems. This also means that every project is jointly run start-to-finish with one or more people from an ops team. Partnership, not Service
  • 15. How we do what we do Team Philosophies
  • 16. T-Shaped People Every person is expected to be full-stack capable and understand the general mechanics of everything relating to their domain. This includes technical components (e.g., reporting guys write ETL and API integrations) as well as functional ones (everyone does their own stakeholder management). Everyone can do everything
  • 17. Specialists lower complexity for others We focus heavily on reducing complexity for others when they need to work in other domains. This means building standardized tools, conducting trainings, and continuous side-by-side coaching and pair programming. Always be helping
  • 18. Continuous Improvement Learning and coaching are central to our work. Everyone works on projects each quarter that are outside their expertise but within their learning goals, 15% of time is reserved for learning and “hack time”, and each person is paired quarterly with another to coach and be coached. Always Be Learning
  • 19. Translating philosophy to technology In Detail: Infrastructure
  • 20. Architectural Goals. Cheap: We don’t want to pay for anything unless we have to, and even then we try not to pay. Our architecture is designed to minimize costs wherever possible by using open source, flexible scaling, and low-cost hosted solutions. Auto scaling/recovering: With only one engineer and no on-call, our systems should automatically adjust to demand and handle near-catastrophic failure gracefully. Easily flexible: As every person must be able to manage parts of the infrastructure, we need tooling and functionality that allow non-engineers to build and destroy servers, scale clusters, and productionize jobs without any support.
  • 21. Our Architecture (Overall) ● Fully AWS hosted ● Mixture of permanent hosts, auto-scaled, and dynamically launched (e.g., for ML jobs) ● Production is built in Django + MySQL ● Data Science architecture (interesting stuff in red) is: ○ Postgres + Vertica for databases ○ Kinesis for event buffering ○ Spark, Keras, Tensorflow for ML ○ Airflow + Rundeck for scheduling ○ Redis for real-time data ○ S3 + HDFS + GFS for storage And Python for EVERYTHING
  • 22. Reporting in Detail: Self-Service ● Structural trainings every month, total of 12 hours of training material prepared by team ● Two tools ○ Tableau for general reporting ○ Metabase for more technical users (allows raw SQL) ● >80% of all reporting is end user created and maintained
  • 23. Event In Detail: Real Time + Microbatch Our Inspiration: Lambda architecture
  • 24. Machine Learning In Detail. Spark: ● Used for all the big, sexy analytics ○ Regression on billions of records ○ Collaborative filtering ■ Average domain has 15k products and 1.5M training users ● PySpark instead of Scala allows recycling of all our custom Python libraries into ML jobs (rather than rewriting) ● In modern Spark, performance in Python and Scala is about the same (when using Spark functionality). Keras + TensorFlow: ● Used for all the small, sexy analytics ○ Deep learning on session purchase propensity ○ Predicting sellout dates using RNNs ● Keras is easier and cleaner to read than raw TensorFlow ● Spark deep learning functionality is underdeveloped at this time ● In deep learning, TF is #1 and Keras #2, so Keras + TF is … #12? Great community and development
  • 25. Tools we’ve built to facilitate data science. The BI-brary and Central Config: The BI-brary is a Python library everyone contributes to, which contains standardized functionality to be reused for any conceivable task: everything from data management to Spark and Tensorflow functionality. The Executor: The executor allows anyone to launch servers or clusters, execute code remotely, process data from models into the database, etc., all using a simple JSON configuration block. Auto-DBA: A large part of performance management and optimization is automated, including storage management, likely foreign key identification, and data security.
  • 26. Working in Python exclusively means that data science is easy ● This is a simplified recommender model in 20 lines of Python ● A data scientist familiar with Python can be working productively in Spark in a few days ● Easy, fast modeling means we can keep iteration time low, increasing the number of tests
  • 27. But the production code is equally easy This BI-brary function interprets a JSON blob into SQL to determine what content to send in an email ● SQL + Python makes it easy for data scientists to understand ● Using a consistent input/output structure means that very little testing is needed when introducing new models, templates, or products
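The function shown on slide 27 is not preserved in this transcript. As a rough idea of what “interpreting a JSON blob into SQL” can look like, here is a minimal sketch; the JSON keys, the `offers` table, its columns, and the name `config_to_sql` are hypothetical and not taken from the BI-brary. Values are passed as bound parameters (`%s` placeholders, as used by common Postgres drivers) rather than pasted into the query string.

```python
import json
from typing import Any, List, Tuple

# Hypothetical whitelist of filter keys. Column names cannot be bound as
# parameters, so they are checked against a fixed set instead.
ALLOWED_FILTERS = {"country", "category", "max_price"}


def config_to_sql(blob: str) -> Tuple[str, List[Any]]:
    """Turn a JSON content request into a parameterized SELECT statement."""
    config = json.loads(blob)
    clauses: List[str] = []
    params: List[Any] = []
    for key, value in sorted(config.get("filters", {}).items()):
        if key not in ALLOWED_FILTERS:
            raise ValueError(f"unsupported filter: {key}")
        if key == "max_price":
            clauses.append("price <= %s")
        else:
            clauses.append(f"{key} = %s")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "TRUE"
    limit = int(config.get("limit", 10))
    sql = (
        f"SELECT offer_id FROM offers WHERE {where} "
        f"ORDER BY score DESC LIMIT {limit}"
    )
    return sql, params


blob = '{"filters": {"country": "NL", "max_price": 500}, "limit": 3}'
sql, params = config_to_sql(blob)
print(sql)
print(params)
```

Keeping the input (a JSON blob) and the output (a query plus parameters) in one fixed shape is what lets new templates or models reuse the same path with little extra testing, as the slide argues.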
  • 28. Translating technology and philosophy to outcomes Example: Attribution Enhancement
  • 29. The primary goal of the project was to improve the effectiveness of our marketing attribution model, improving the team’s ability to spend effectively. To achieve this goal, the secondary goals were to: ● Identify the largest opportunities for model improvement ● Build, test, and accept model changes for two largest opportunities ● Conduct a workshop with Marketing on how the changes will impact their channel strategies The Goal
  • 30. The team consisted of: ● Enzo: Data Analyst studying Data Science ● Bastien: Data Scientist ● Noah: Display marketer ● Colin: SEA marketer Together they reviewed products that had the lowest performance in attribution and identified likely model factors that could be adjusted to account for the product variances Project Start Team and Kickoff
  • 31. Analysis and Planning. Together they prioritized two changes: ● Last click: use channel conversion propensity to re-weight the last session ● Dynamic journey decay: based on a product’s average time-to-purchase, dynamically re-weight older sessions. Together they defined deliverables, timelines, and scope of work, and jointly divided tasks, including learning goals: ● Enzo is the better engineer and would supervise Bastien in data pipeline changes ● Bastien is the more experienced Data Scientist and would support Enzo in algorithm development
  • 32. Development and Acceptance. Development: The team used standard deployment scripting to create a sandbox DWH environment and to build new model workers each day, allowing them to easily test and evaluate on 100% of historical data (>300M rows). Communication: The team communicated progress directly with the CMO and stakeholders, with intermediary acceptance conducted via Slack messages. Colin regularly looked into intermediary output using SQL and Tableau. Final Acceptance: After shipping the model changes, acceptance was conducted as a joint review with marketing team leads. Start to finish was two weeks from agreement of project to production.
  • 33. Knowledge Sharing To conclude, Bastien conducted a workshop/attribution Q&A with all of marketing, senior leadership, and other operational folks to explain attribution and how Markov chains work
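One way to picture the “dynamic journey decay” change is an exponential decay whose half-life is tied to the product’s average time-to-purchase. The slides do not say which decay function was used, so the half-life choice and the name `decay_weights` below are assumptions for illustration only.

```python
from typing import List


def decay_weights(days_before_order: List[float],
                  avg_days_to_purchase: float) -> List[float]:
    """Weight sessions by age; the half-life is the product's average
    time-to-purchase (an assumed rule, not the documented one)."""
    if avg_days_to_purchase <= 0:
        raise ValueError("avg_days_to_purchase must be positive")
    if not days_before_order:
        return []
    raw = [0.5 ** (days / avg_days_to_purchase) for days in days_before_order]
    total = sum(raw)
    return [weight / total for weight in raw]  # normalized to sum to 1


# Sessions 10, 3, and 0 days before the order.
sessions = [10.0, 3.0, 0.0]
fast = decay_weights(sessions, avg_days_to_purchase=2.0)   # impulse product
slow = decay_weights(sessions, avg_days_to_purchase=20.0)  # long-planned trip
print([round(w, 3) for w in fast])
print([round(w, 3) for w in slow])
```

For the quickly purchased product the ten-day-old session receives almost no credit, while for the slowly purchased one it keeps a substantial share, which is the behaviour the slide describes.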
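The slides mention Markov chains only in passing. A common way to use them for attribution is the “removal effect”: build a first-order chain from observed journeys, then measure how much the overall conversion probability drops when a channel is removed. The sketch below assumes that approach; the toy journeys and all function names are invented for the example and are not taken from the workshop.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

START, CONV, NULL = "start", "conversion", "null"
Probs = Dict[str, Dict[str, float]]


def transition_probs(journeys: List[Tuple[List[str], bool]]) -> Probs:
    """Estimate first-order transition probabilities from journeys."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for channels, converted in journeys:
        path = [START] + channels + [CONV if converted else NULL]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(nxt.values()) for dst, n in nxt.items()}
        for src, nxt in counts.items()
    }


def conversion_prob(probs: Probs, removed: Optional[str] = None,
                    sweeps: int = 200) -> float:
    """Probability of reaching CONV from START by repeated sweeps.
    Traffic that would enter the removed channel is treated as lost."""
    value = {state: 0.0 for state in probs}
    value[CONV], value[NULL] = 1.0, 0.0
    for _ in range(sweeps):
        for state, nxt in probs.items():
            if state == removed:
                continue
            value[state] = sum(
                p * value.get(dst, 0.0) for dst, p in nxt.items() if dst != removed
            )
    return value[START]


def removal_effects(probs: Probs) -> Dict[str, float]:
    """Relative drop in conversion probability when each channel is removed."""
    base = conversion_prob(probs)
    channels = [state for state in probs if state != START]
    if base == 0:
        return {channel: 0.0 for channel in channels}
    return {c: 1.0 - conversion_prob(probs, removed=c) / base for c in channels}


# Toy data: (touchpoints in order, did the journey convert?)
journeys = [
    (["Email", "SEA"], True),
    (["Email"], False),
    (["SEA"], True),
    (["Affiliate", "Email"], False),
]
probs = transition_probs(journeys)
base = conversion_prob(probs)
effects = removal_effects(probs)
total_effect = sum(effects.values())
credit = {channel: effect / total_effect for channel, effect in effects.items()}
print(base, effects, credit)
```

Normalizing the removal effects gives each channel a share of credit; in the toy data SEA sits on every converting path and therefore receives the largest share.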