A Hitchhiker’s Guide to Data Science
Sudeep Das
Senior Machine Learning Researcher
@datamusing
My Journey
Ph. D. Astrophysics
Cosmic Microwave Background
Gravitational Lensing
Beats Music
Core Recommendation Systems Group
What do I do?
The Grand Innovation Workflow
Identify Problem
● Understand what is important to the business
● Deep data dives
● Visualizations
● Communicate to stakeholders
● Sometimes top-down, sometimes ground-up idea generation
Prepare Data
● Slice/dice/massage data
● Work with data teams to ensure data integrity
● Make sure the data tables/feeds that you need are stood up
● Offline/online data integrity
Build Models
● Prototype features
● Modeling extremes: from out-of-the-box Logistic Regression and GBMs to adapting an emergent idea from a recent paper!
● Set up offline training pipeline
● Monitor offline metrics
Implement in Production
● Integrate your models with the production systems (code review, load testing)
● Hook up with the testing platform
Test Hypotheses
● Design the experiment/hypothesis/cell structure
● Read results of experiments to determine significance
● Slice and dice the online data to determine if your test affected the intended audience
● If results are flat, rinse and repeat!
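The "read results of experiments to determine significance" step can be sketched as a two-proportion z-test. This is only a minimal illustration, not the method used by any particular testing platform: the function name, the counts, and the choice of a pooled normal approximation are all assumptions made for the example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value). Uses the pooled-proportion standard error and the
    normal approximation, so it is only reasonable for large samples.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, via the error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for a control cell (a) and a test cell (b),
# invented purely for illustration.
z, p = two_proportion_z_test(conv_a=1000, n_a=10000, conv_b=1100, n_b=10000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value only says the cells differ overall. Checking whether the intended audience was the one affected still requires slicing the online data by segment.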
In some companies, this is a data scientist
In some other companies, this is a data scientist
Yet in some other companies, this is a data scientist
At Netflix, this is broadly what I do
Tools of the trade
SQL, Spark (Scala), PySpark, Python/pandas, Hive, AWS S3
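As a rough illustration of slicing and dicing data with SQL, the sketch below groups a tiny, made-up event table by test cell and device. It uses Python's built-in sqlite3 module as a stand-in for a warehouse such as Hive; the table name, column names, and rows are all hypothetical.

```python
import sqlite3

# Hypothetical event log: one row per member with test cell, device, and
# whether they played anything. All names and values are made up.
rows = [
    ("control", "tv", 1), ("control", "tv", 0), ("control", "mobile", 0),
    ("control", "mobile", 1), ("treatment", "tv", 1), ("treatment", "tv", 1),
    ("treatment", "mobile", 0), ("treatment", "mobile", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (cell TEXT, device TEXT, played INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# Slice the play rate by test cell and device segment.
query = """
    SELECT cell, device, COUNT(*) AS members, AVG(played) AS play_rate
    FROM events
    GROUP BY cell, device
    ORDER BY cell, device
"""
summary = conn.execute(query).fetchall()
for cell, device, members, play_rate in summary:
    print(f"{cell:<10} {device:<7} n={members} play_rate={play_rate:.2f}")
conn.close()
```

The same GROUP BY pattern carries over to Spark SQL or a pandas groupby; only the connection and execution layer would change.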
Matplotlib, Tableau, Vega, Plotly, custom JavaScript (D3)
Hive, S3, APIs in Flask/Django/Java
Python, scikit-learn, Jupyter notebooks, TensorFlow/Keras, XGBoost, SparkML/Scala, Zeppelin ...
Docker, company-specific platforms
Java, Scala, in some cases Python, company specific
Types of Problems
● Personalization
● Search
● Object recognition
● Voice/speech recognition
● Pattern recognition
● Natural Language Processing
● Trend prediction
● Segmentation/clustering
● Dynamic Pricing
● Optimization
● Outlier Detection
At Netflix, we do a bit of everything
Emergent Trends
● Probabilistic Graphical Models - Bayes Nets
● Deep Learning
● Causal Inference
● (Deep) Reinforcement Learning
What academia prepares you for
● Perseverance
● Ability to pick up new technical skills
● Presentation skills
● Some quantitative visualization skills
● Ability to distil technical research in related areas and adapt it to the problem at hand
● If you are from a quantitative and experimental field:
○ Mathematical abilities
○ Knowledge of Basic Statistics - error analysis, experiment design
○ Some exposure to parameter estimation and Bayesian inference
○ Some ability to write code
○ Some exposure to general machine learning
● Learning from failure: Most A/B tests fail - so do experiments in academia
● Writing papers, technical blogs, etc.
What academia doesn’t prepare you for
● Being a good listener
● Asking questions
● Understanding and articulating the business value of your technical pursuit
● Writing clean, maintainable code with documentation and unit tests
● Ability to collaborate across teams and cultures - cross-functionally
● Admitting that “Good enough” is better than perfect
● Coping with quick project timelines
● Documenting, sharing, getting early input on projects
● Dealing with live, large, and exceptionally dirty datasets.
● Understanding that research in Industry is results driven and not publication driven.
● Stepping out of your focus area and seeing your problem in the bigger context of where your company is headed.
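For the point about clean, maintainable code with documentation and unit tests, a minimal sketch might look like the following. The function `click_through_rate` and its edge-case rules are invented for illustration; the point is only the shape: a small documented function plus tests that pin down its behavior.

```python
import unittest


def click_through_rate(clicks, impressions):
    """Return clicks / impressions as a float in [0, 1].

    Raises ValueError on negative counts or when clicks exceed impressions.
    Returns 0.0 when there are no impressions, so callers need no special case.
    """
    if clicks < 0 or impressions < 0:
        raise ValueError("counts must be non-negative")
    if clicks > impressions:
        raise ValueError("clicks cannot exceed impressions")
    if impressions == 0:
        return 0.0
    return clicks / impressions


class ClickThroughRateTest(unittest.TestCase):
    def test_typical_case(self):
        self.assertAlmostEqual(click_through_rate(5, 20), 0.25)

    def test_zero_impressions(self):
        self.assertEqual(click_through_rate(0, 0), 0.0)

    def test_rejects_impossible_counts(self):
        with self.assertRaises(ValueError):
            click_through_rate(3, 2)


# Run the tests programmatically so the snippet also works inside a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClickThroughRateTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the tests would normally live in their own module and run through a test runner in continuous integration.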
Landing the First Job!
Marketing Yourself
Fill in your basic skills gaps
● Databases, SQL, Spark familiarity
● Data Structures, Algo/CS 101
● Get really strong in one language - highly recommend Python - pandas, scikit ecosystem
● Good coding practices - documentation, modular code, unit tests
Amp up your ML Knowledge
● Your friends: online courses and open datasets!
● Do mini projects on ML, esp. Deep Learning, Reinforcement Learning. Get creative!
● Get a rock solid foundation in basic stats.
Create an Online Presence
● Kaggle Competitions
● GitHub repo so recruiters can look at your code.
● Put your hobby projects online
● Write a blog post on something new you learned
● Follow/contribute to Stack Overflow
Improve soft skills
● Identify weaknesses in communication skills and work on them.
● Pick up speaking engagements at meetups, at your university, and at conferences such as PyData
● Do collaborative projects with people who are also transitioning
Interview Prep
● Practise whiteboarding, collaborative coding on CoderPad
● Standard books like Cracking the Coding Interview, Glassdoor
● Go for some “dry run” interviews.
● Do background research on the company - be inquisitive, ask questions
Keep at it!
@datamusing

 
Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
 
drug book file on obs. and gynae clinical pstings
drug book file on obs. and gynae clinical pstingsdrug book file on obs. and gynae clinical pstings
drug book file on obs. and gynae clinical pstingsKarishma7720
 
Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Call Girl Service in Ahmednagar { 9332606886 } VVIP NISHA Call Girls Near 5 S...
Call Girl Service in Ahmednagar { 9332606886 } VVIP NISHA Call Girls Near 5 S...Call Girl Service in Ahmednagar { 9332606886 } VVIP NISHA Call Girls Near 5 S...
Call Girl Service in Ahmednagar { 9332606886 } VVIP NISHA Call Girls Near 5 S...Sareena Khatun
 

Último (20)

B.tech Civil Engineering Major Project by Deepak Kumar ppt.pdf
B.tech Civil Engineering Major Project by Deepak Kumar ppt.pdfB.tech Civil Engineering Major Project by Deepak Kumar ppt.pdf
B.tech Civil Engineering Major Project by Deepak Kumar ppt.pdf
 
Top profile Call Girls In godhra [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In godhra [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In godhra [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In godhra [ 7014168258 ] Call Me For Genuine Models We...
 
B.tech civil major project by Deepak Kumar
B.tech civil major project by Deepak KumarB.tech civil major project by Deepak Kumar
B.tech civil major project by Deepak Kumar
 
Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...
Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...
Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...
 
UIowa Application Instructions - 2024 Update
UIowa Application Instructions - 2024 UpdateUIowa Application Instructions - 2024 Update
UIowa Application Instructions - 2024 Update
 
Top profile Call Girls In Shillong [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Shillong [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Shillong [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Shillong [ 7014168258 ] Call Me For Genuine Models ...
 
Top profile Call Girls In Jabalpur [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Jabalpur [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Jabalpur [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Jabalpur [ 7014168258 ] Call Me For Genuine Models ...
 
Simple, 3-Step Strategy to Improve Your Executive Presence (Even if You Don't...
Simple, 3-Step Strategy to Improve Your Executive Presence (Even if You Don't...Simple, 3-Step Strategy to Improve Your Executive Presence (Even if You Don't...
Simple, 3-Step Strategy to Improve Your Executive Presence (Even if You Don't...
 
Jual obat aborsi Jakarta ( 085657271886 )Cytote pil telat bulan penggugur kan...
Jual obat aborsi Jakarta ( 085657271886 )Cytote pil telat bulan penggugur kan...Jual obat aborsi Jakarta ( 085657271886 )Cytote pil telat bulan penggugur kan...
Jual obat aborsi Jakarta ( 085657271886 )Cytote pil telat bulan penggugur kan...
 
Top profile Call Girls In Etawah [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Etawah [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Etawah [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Etawah [ 7014168258 ] Call Me For Genuine Models We...
 
obat aborsi pacitan wa 081336238223 jual obat aborsi cytotec asli di pacitan0...
obat aborsi pacitan wa 081336238223 jual obat aborsi cytotec asli di pacitan0...obat aborsi pacitan wa 081336238223 jual obat aborsi cytotec asli di pacitan0...
obat aborsi pacitan wa 081336238223 jual obat aborsi cytotec asli di pacitan0...
 
9352852248 Call Girls Sanand Escort Service Available 24×7 In Sanand
9352852248 Call Girls  Sanand Escort Service Available 24×7 In Sanand9352852248 Call Girls  Sanand Escort Service Available 24×7 In Sanand
9352852248 Call Girls Sanand Escort Service Available 24×7 In Sanand
 
Gabriel_Carter_EXPOLRATIONpp.pptx........
Gabriel_Carter_EXPOLRATIONpp.pptx........Gabriel_Carter_EXPOLRATIONpp.pptx........
Gabriel_Carter_EXPOLRATIONpp.pptx........
 
207095666-Book-Review-on-Ignited-Minds-Final.pptx
207095666-Book-Review-on-Ignited-Minds-Final.pptx207095666-Book-Review-on-Ignited-Minds-Final.pptx
207095666-Book-Review-on-Ignited-Minds-Final.pptx
 
Top profile Call Girls In Sagar [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Sagar [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Sagar [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Sagar [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...
Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...
Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...
 
Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...
 
drug book file on obs. and gynae clinical pstings
drug book file on obs. and gynae clinical pstingsdrug book file on obs. and gynae clinical pstings
drug book file on obs. and gynae clinical pstings
 
Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...
 
Call Girl Service in Ahmednagar { 9332606886 } VVIP NISHA Call Girls Near 5 S...
Call Girl Service in Ahmednagar { 9332606886 } VVIP NISHA Call Girls Near 5 S...Call Girl Service in Ahmednagar { 9332606886 } VVIP NISHA Call Girls Near 5 S...
Call Girl Service in Ahmednagar { 9332606886 } VVIP NISHA Call Girls Near 5 S...
 

Academia to Data Science - A Hitchhiker's Guide

  • 1. A Hitchhiker’s Guide to Data Science. Sudeep Das, Senior Machine Learning Researcher, @datamusing
  • 3. Ph.D. in Astrophysics: Cosmic Microwave Background, Gravitational Lensing
  • 5. What do I do?
  • 6. The Grand Innovation Workflow
      Stages: Identify Problem, Prepare Data, Build Models, Implement in Production, Test Hypotheses
      Tasks, in slide order:
      ● Understand what is important to the business
      ● Deep data dives
      ● Visualizations
      ● Communicate to stakeholders
      ● Idea generation: sometimes top-down, sometimes ground-up
      ● Slice/dice/massage data
      ● Work with data teams to ensure data integrity
      ● Make sure the data tables/feeds that you need are stood up
      ● Offline/online data integrity
      ● Prototype features
      ● Modeling extremes: from out-of-the-box Logistic Regression and GBMs to adapting an emergent idea from a recent paper!
      ● Set up offline training pipeline
      ● Monitor offline metrics
      ● Design the experiment/hypothesis/cell structure
      ● Integrate your models with the production systems (code review, load testing)
      ● Hook up with the testing platform
      ● Read results of experiments to determine significance
      ● Slice and dice the online data to determine if your test affected the intended audience
      ● If results are flat, rinse and repeat!
  • 7. (Workflow from slide 6, repeated.) In some companies, this is a data scientist
  • 8. (Workflow from slide 6, repeated.) In some other companies, this is a data scientist
  • 9. (Workflow from slide 6, repeated.) Yet in some other companies, this is a data scientist
  • 10. (Workflow from slide 6, repeated.) At Netflix, this is broadly what I do
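
The workflow step "Read results of experiments to determine significance" can be illustrated with a small sketch. This is not the method used in the talk or at any particular company; it assumes a simple two-cell A/B test with a binary outcome (converted or not) and applies a standard two-proportion z-test. The function name and the counts below are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between two cells."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both cells convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, written with erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical counts: control cell A vs. treatment cell B.
z, p = two_proportion_z_test(conv_a=1000, n_a=10000, conv_b=1080, n_b=10000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A p-value above the chosen threshold corresponds to the "results are flat" case in the workflow. In practice the same test would be repeated on slices of the online data, with care taken over multiple comparisons.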
  • 11. Tools of the trade
  • 12. (Workflow from slide 6, repeated.) SQL, Spark (Scala), PySpark, Python-Pandas, Hive, AWS S3
  • 13. (Workflow from slide 6, repeated.) Matplotlib, Tableau, Vega, Plotly, custom JavaScript (D3)
  • 14. (Workflow from slide 6, repeated.) Hive, S3, APIs in Flask/Django/Java
  • 15. (Workflow from slide 6, repeated.) Python, scikit-learn, Jupyter notebooks, TensorFlow/Keras, XGBoost, Spark ML/Scala, Zeppelin ...
  • 16. (Workflow from slide 6, repeated.) Docker, company-specific platforms
  • 17. (Workflow from slide 6, repeated.) Java, Scala, in some cases Python, company-specific
  • 19. At Netflix, we do a bit of everything
      ● Personalization
      ● Search
      ● Object recognition
      ● Voice/speech recognition
      ● Pattern recognition
      ● Natural Language Processing
      ● Trend prediction
      ● Segmentation/clustering
      ● Dynamic pricing
      ● Optimization
      ● Outlier detection
  • 21. Probabilistic Graphical Models (Bayes Nets); Deep Learning; Causal Inference; (Deep) Reinforcement Learning
  • 23.
      ● Perseverance
      ● Ability to pick up new technical skills
      ● Presentation skills
      ● Some quantitative visualization skills
      ● Ability to distill technical research in related areas and adapt it to the problem at hand
      ● If you are from a quantitative and experimental field:
          ○ Mathematical abilities
          ○ Knowledge of basic statistics: error analysis, experiment design
          ○ Some exposure to parameter estimation and Bayesian inference
          ○ Some ability to write code
          ○ Some exposure to general machine learning
      ● Learning from failure: most A/B tests fail, and so do experiments in academia
      ● Writing papers, technical blogs, etc.
  • 24. What academia doesn’t prepare you for
  • 25.
      ● Being a good listener
      ● Asking questions
      ● Understanding and articulating the business value of your technical pursuit
      ● Writing clean, maintainable code with documentation and unit tests
      ● Ability to collaborate cross-functionally, across teams and cultures
      ● Admitting that “good enough” is better than perfect
      ● Coping with quick project timelines
      ● Documenting, sharing, and getting early input on projects
      ● Dealing with live, large, and exceptionally dirty datasets
      ● Understanding that research in industry is results-driven, not publication-driven
      ● Stepping out of your focus area and seeing your problem in the bigger context of where your company is headed
  • 27. Landing the First Job!
      ● Fill in your basic skills gaps
          ○ Databases, SQL, Spark familiarity
          ○ Data Structures, Algo/CS 101
          ○ Get really strong in one language: highly recommend Python and the pandas/scikit ecosystem
          ○ Good coding practices: documentation, modular code, unit tests
      ● Amp up your ML knowledge
          ○ Your friends: online courses and open datasets!
          ○ Do mini projects on ML, esp. Deep Learning and Reinforcement Learning. Get creative!
          ○ Get a rock-solid foundation in basic stats
          ○ Kaggle competitions
      ● Create an online presence
          ○ GitHub repo so recruiters can look at your code
          ○ Put your hobby projects online
          ○ Write a blog post on something new you learned
          ○ Follow/contribute to Stack Overflow
      ● Improve soft skills
          ○ Identify weaknesses in your communication skills and work on them
          ○ Pick up speaking engagements at meetups, at your university, and at conferences such as PyData
          ○ Do collaborative projects with people who are also transitioning
      ● Interview prep
          ○ Practise whiteboarding and collaborative coding on CoderPad
          ○ Standard books like Cracking the Coding Interview; Glassdoor
          ○ Go for some “dry run” interviews
          ○ Do background research on the company: be inquisitive, ask questions
      Keep at it!