SlideShare a Scribd company logo
1 of 76
@arburbank
Building a culture
of experimentation
scaling data science at Pinterest
@arburbank
Andrea Burbank
@arburbank
σ, μ
@arburbank
Organizational maturity model
use source control
write unit tests
track bugs
write a spec
build often
@arburbank
Experimentation maturity model
@arburbank
Experimentation maturity model
get started
get big
get better
get out
get tools
@arburbank
Stage 1:
get started
get started
@arburbank
problem:
people making bad decisions
get started
@arburbank
Run experiments
entire
population
control
enable
d
@arburbank
Cultural maturity model
get started
entire
population
control
enabled
data
data
insight
@arburbank
Stage 2:
get big
get started
get big
@arburbank
problem:
underutilization
get started
get big
@arburbank
http://altmba.com/wp-content/uploads/2015/06/fieldofdreamscorn.jpg
@arburbank
http://altmba.com/wp-content/uploads/2015/06/fieldofdreamscorn.jpg
@arburbank
if you build it, they won’t come
marketing
@arburbank
if you build it, they won’t come
evangelism
@arburbank
if you build it, they won’t come
salesmanship
@arburbank
Cultural maturity model
evangelize
educate
explain
get big
@arburbank
Stage 3:
get better
get started
get big
get better
@arburbank
problem:
guidance
get started
get big
get better
needed
@arburbank
you are the human in the loop
ensure
successrun test
YOU
@arburbank
you are the human in the loop
ensure
successrun test
YOU
ensure
successrun test
ensure
success
run test
ensure
success
run test
@arburbank
@arburbank
It is not your career
goal to be the
experiments person
@arburbank
(you should have
higher ambitions)
@arburbank
Cultural maturity model
how can I
help?
get better
@arburbank
Stage 4:
get out
get started
get big
get better
get out
@arburbank
problem:
scale yourself
get started
get big
get better
get out
@arburbank
Write down the process
What mistakes do you see in experiments?
What questions do you answer repeatedly?
How will learning this help others?
@arburbank
@arburbank
@arburbank
@arburbank
“if you let engineers
run experiments, they
will screw them up in
every way possible.”
@arburbank
“if you let untrained
engineers run
experiments, they will
screw them up in every
way possible.”
@arburbank@arburbank
@arburbank
For every important
mistake, explain why
it’s wrong and how to
avoid it.
@arburbank
launch landing
in-flight
@arburbank
launch
@arburbank
@arburbank
in-flight
@arburbank
@arburbank
landing
@arburbank
@arburbank
Make a list, check it twice


 landing
in-flight
launch
@arburbank
Make a list, check it twice
e+r+
@arburbank
Make a list, check it twice
@experiments-help
@arburbank
@experiments-help
names matter:
“help,” not “on-call”
@arburbank
@experiments-help
engineer partners:
move fast, own the process
@arburbank
@experiments-help
the right people:
thoughtful, well-respected
@arburbank
@arburbank
Implement a process
1. Checklists for experiments
2. @experiments-help mention in code review
3. e+ as part of code review
4. Mailing list: experiments-help@
5. Experiment document template
6. Rotation of experiment helpers
@arburbank
Implement a process
1. Checklists for experiments
2. @experiments-help mention in code review
3. e+ as part of code review
4. Mailing list: experiments-help@
5. Experiment document template
6. Rotation of experiment helpers
@arburbank
Train your successors
So you want to be an experiment helper?
• Step 1: read the documentation
• Step 2: take the experiment quiz
• Step 3: review all experiments for a week
@arburbank
@arburbank
@arburbank
50
trained
experiment
helpers
@arburbank
Cultural maturity model
how would
you answer
that?
get out
@arburbank
Stage 5:
get tools
get started
get big
get better
get out
get tools
@arburbank
problem:
simple mistakes
get started
get big
get better
get out
get tools
@arburbank
get tools
launch
@arburbank
simplify experiment API
@arburbank
remove untriggered experiments
@arburbank
create helper functions
@arburbank
get tools
in flight
@arburbank
add a control group automatically
when a new variant is introduced
@arburbank
expand experiment groups
at the same rate
@arburbank
get tools
landing
@arburbank
detect errors
@arburbank
Automation: analysis
chi-squared test on group sizes
@arburbank
Automation: analysis
test that groups grew at the same rate
@arburbank
Automation: analysis
verify similar distributions of users
@arburbank
Automation: analysis
hide results that are likely to be wrong
@arburbank
simplify analysis
@arburbank
Automation: analysis
automatically track important metrics
(and compute statistical significance)
@arburbank
Automation: analysis
segment important populations
@arburbank
Automation: analysis
measure novelty vs. long-term effects
@arburbank
Cultural maturity model
just use
humans for
the hard part:
thinking
get tools
@arburbank
Experimentation maturity model
get started
get big
get better
get out
get tools
@arburbank
Stage 6:
the future ??
@arburbank
data science:
changing minds, one at a time
andrea@pinterest.com

More Related Content

What's hot

A/B Testing Pitfalls and Lessons Learned at Spotify
A/B Testing Pitfalls and Lessons Learned at SpotifyA/B Testing Pitfalls and Lessons Learned at Spotify
A/B Testing Pitfalls and Lessons Learned at SpotifyDanielle Jabin
 
A/B testing at Spotify
A/B testing at SpotifyA/B testing at Spotify
A/B testing at SpotifyAli Sarrafi
 
Intro to A/B Testing by Ever's Senior Product Manager
Intro to A/B Testing by Ever's Senior Product ManagerIntro to A/B Testing by Ever's Senior Product Manager
Intro to A/B Testing by Ever's Senior Product ManagerProduct School
 
Product Workshop - Finding Your North Star - handout
Product Workshop - Finding Your North Star - handoutProduct Workshop - Finding Your North Star - handout
Product Workshop - Finding Your North Star - handoutAmplitude
 
Experimentation Platform at Netflix
Experimentation Platform at NetflixExperimentation Platform at Netflix
Experimentation Platform at NetflixSteve Urban
 
A/B Testing with Yammer's Product Manager
A/B Testing with Yammer's Product ManagerA/B Testing with Yammer's Product Manager
A/B Testing with Yammer's Product ManagerProduct School
 
Talks@Coursera - A/B Testing @ Internet Scale
Talks@Coursera - A/B Testing @ Internet ScaleTalks@Coursera - A/B Testing @ Internet Scale
Talks@Coursera - A/B Testing @ Internet Scalecourseratalks
 
Basics of AB testing in online products
Basics of AB testing in online productsBasics of AB testing in online products
Basics of AB testing in online productsAshish Dua
 
Prioritization Method for Every Case by fmr Atlassian Principal PM
Prioritization Method for Every Case by fmr Atlassian Principal PMPrioritization Method for Every Case by fmr Atlassian Principal PM
Prioritization Method for Every Case by fmr Atlassian Principal PMProduct School
 
Product Discovery At Google
Product Discovery At GoogleProduct Discovery At Google
Product Discovery At GoogleJohn Gibbon
 
Lean Analytics for Startups and Enterprises
Lean Analytics for Startups and EnterprisesLean Analytics for Startups and Enterprises
Lean Analytics for Startups and EnterprisesLean Analytics
 
Practical Introduction to A/B Testing
Practical Introduction to A/B TestingPractical Introduction to A/B Testing
Practical Introduction to A/B TestingAlex Alwan
 
Growth Hacking / Marketing 101: It's about process
Growth Hacking / Marketing 101: It's about processGrowth Hacking / Marketing 101: It's about process
Growth Hacking / Marketing 101: It's about processRuben Hamilius
 
21 Story Splitting Patterns
21 Story Splitting Patterns21 Story Splitting Patterns
21 Story Splitting PatternsKent McDonald
 
Product Led Growth: The Rise of the User
Product Led Growth: The Rise of the UserProduct Led Growth: The Rise of the User
Product Led Growth: The Rise of the UserOpenView
 
4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B Testing4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B TestingJanessa Lantz
 
Lean Product Discovery
Lean Product DiscoveryLean Product Discovery
Lean Product DiscoveryDavid Hawks
 

What's hot (20)

A/B Testing Pitfalls and Lessons Learned at Spotify
A/B Testing Pitfalls and Lessons Learned at SpotifyA/B Testing Pitfalls and Lessons Learned at Spotify
A/B Testing Pitfalls and Lessons Learned at Spotify
 
A/B testing at Spotify
A/B testing at SpotifyA/B testing at Spotify
A/B testing at Spotify
 
Intro to A/B Testing by Ever's Senior Product Manager
Intro to A/B Testing by Ever's Senior Product ManagerIntro to A/B Testing by Ever's Senior Product Manager
Intro to A/B Testing by Ever's Senior Product Manager
 
Product Workshop - Finding Your North Star - handout
Product Workshop - Finding Your North Star - handoutProduct Workshop - Finding Your North Star - handout
Product Workshop - Finding Your North Star - handout
 
The Power of A/B Testing
The Power of A/B TestingThe Power of A/B Testing
The Power of A/B Testing
 
Experimentation Platform at Netflix
Experimentation Platform at NetflixExperimentation Platform at Netflix
Experimentation Platform at Netflix
 
A/B Testing with Yammer's Product Manager
A/B Testing with Yammer's Product ManagerA/B Testing with Yammer's Product Manager
A/B Testing with Yammer's Product Manager
 
Talks@Coursera - A/B Testing @ Internet Scale
Talks@Coursera - A/B Testing @ Internet ScaleTalks@Coursera - A/B Testing @ Internet Scale
Talks@Coursera - A/B Testing @ Internet Scale
 
Basics of AB testing in online products
Basics of AB testing in online productsBasics of AB testing in online products
Basics of AB testing in online products
 
Prioritization Method for Every Case by fmr Atlassian Principal PM
Prioritization Method for Every Case by fmr Atlassian Principal PMPrioritization Method for Every Case by fmr Atlassian Principal PM
Prioritization Method for Every Case by fmr Atlassian Principal PM
 
Product Discovery At Google
Product Discovery At GoogleProduct Discovery At Google
Product Discovery At Google
 
Lean Analytics for Startups and Enterprises
Lean Analytics for Startups and EnterprisesLean Analytics for Startups and Enterprises
Lean Analytics for Startups and Enterprises
 
Practical Introduction to A/B Testing
Practical Introduction to A/B TestingPractical Introduction to A/B Testing
Practical Introduction to A/B Testing
 
Growth Hacking / Marketing 101: It's about process
Growth Hacking / Marketing 101: It's about processGrowth Hacking / Marketing 101: It's about process
Growth Hacking / Marketing 101: It's about process
 
WTF is a Product Roadmap?
WTF is a Product Roadmap?WTF is a Product Roadmap?
WTF is a Product Roadmap?
 
User Stories
User StoriesUser Stories
User Stories
 
21 Story Splitting Patterns
21 Story Splitting Patterns21 Story Splitting Patterns
21 Story Splitting Patterns
 
Product Led Growth: The Rise of the User
Product Led Growth: The Rise of the UserProduct Led Growth: The Rise of the User
Product Led Growth: The Rise of the User
 
4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B Testing4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B Testing
 
Lean Product Discovery
Lean Product DiscoveryLean Product Discovery
Lean Product Discovery
 

Viewers also liked

いまさら聞けない機械学習の評価指標
いまさら聞けない機械学習の評価指標いまさら聞けない機械学習の評価指標
いまさら聞けない機械学習の評価指標圭輔 大曽根
 
機械学習で大事なことをミニGunosyをつくって学んだ╭( ・ㅂ・)و ̑̑ 
機械学習で大事なことをミニGunosyをつくって学んだ╭( ・ㅂ・)و ̑̑ 機械学習で大事なことをミニGunosyをつくって学んだ╭( ・ㅂ・)و ̑̑ 
機械学習で大事なことをミニGunosyをつくって学んだ╭( ・ㅂ・)و ̑̑ Seiji Takahashi
 
Gunosyデータマイニング研究会 #118 これからの強化学習
Gunosyデータマイニング研究会 #118 これからの強化学習Gunosyデータマイニング研究会 #118 これからの強化学習
Gunosyデータマイニング研究会 #118 これからの強化学習圭輔 大曽根
 
あなただけにそっと教える弊社の分析事情 #data analyst meetup tokyo vol.1 LT
あなただけにそっと教える弊社の分析事情 #data analyst meetup tokyo vol.1 LTあなただけにそっと教える弊社の分析事情 #data analyst meetup tokyo vol.1 LT
あなただけにそっと教える弊社の分析事情 #data analyst meetup tokyo vol.1 LTHiroaki Kudo
 
#cwt2016 Apache Kudu 構成とテーブル設計
#cwt2016 Apache Kudu 構成とテーブル設計#cwt2016 Apache Kudu 構成とテーブル設計
#cwt2016 Apache Kudu 構成とテーブル設計Cloudera Japan
 
「新製品 Kudu 及び RecordServiceの概要」 #cwt2015
「新製品 Kudu 及び RecordServiceの概要」 #cwt2015「新製品 Kudu 及び RecordServiceの概要」 #cwt2015
「新製品 Kudu 及び RecordServiceの概要」 #cwt2015Cloudera Japan
 
Apache Kudu - Updatable Analytical Storage #rakutentech
Apache Kudu - Updatable Analytical Storage #rakutentechApache Kudu - Updatable Analytical Storage #rakutentech
Apache Kudu - Updatable Analytical Storage #rakutentechCloudera Japan
 
“確率的最適化”を読む前に知っておくといいかもしれない関数解析のこと
“確率的最適化”を読む前に知っておくといいかもしれない関数解析のこと“確率的最適化”を読む前に知っておくといいかもしれない関数解析のこと
“確率的最適化”を読む前に知っておくといいかもしれない関数解析のことHiroaki Kudo
 
爆速クエリエンジン”Presto”を使いたくなる話
爆速クエリエンジン”Presto”を使いたくなる話爆速クエリエンジン”Presto”を使いたくなる話
爆速クエリエンジン”Presto”を使いたくなる話Kentaro Yoshida
 
Gunosy における AWS 上での自然言語処理・機械学習の活用事例
Gunosy における AWS 上での自然言語処理・機械学習の活用事例Gunosy における AWS 上での自然言語処理・機械学習の活用事例
Gunosy における AWS 上での自然言語処理・機械学習の活用事例圭輔 大曽根
 
論文紹介@ Gunosyデータマイニング研究会 #97
論文紹介@ Gunosyデータマイニング研究会 #97論文紹介@ Gunosyデータマイニング研究会 #97
論文紹介@ Gunosyデータマイニング研究会 #97圭輔 大曽根
 
記事分類における教師データおよびモデルの管理
記事分類における教師データおよびモデルの管理記事分類における教師データおよびモデルの管理
記事分類における教師データおよびモデルの管理圭輔 大曽根
 
マイクロサービスとABテスト
マイクロサービスとABテストマイクロサービスとABテスト
マイクロサービスとABテスト圭輔 大曽根
 
WebDB Forum 2016 gunosy
WebDB Forum 2016 gunosyWebDB Forum 2016 gunosy
WebDB Forum 2016 gunosyHiroaki Kudo
 
Apache Kuduは何がそんなに「速い」DBなのか? #dbts2017
Apache Kuduは何がそんなに「速い」DBなのか? #dbts2017Apache Kuduは何がそんなに「速い」DBなのか? #dbts2017
Apache Kuduは何がそんなに「速い」DBなのか? #dbts2017Cloudera Japan
 

Viewers also liked (15)

いまさら聞けない機械学習の評価指標
いまさら聞けない機械学習の評価指標いまさら聞けない機械学習の評価指標
いまさら聞けない機械学習の評価指標
 
機械学習で大事なことをミニGunosyをつくって学んだ╭( ・ㅂ・)و ̑̑ 
機械学習で大事なことをミニGunosyをつくって学んだ╭( ・ㅂ・)و ̑̑ 機械学習で大事なことをミニGunosyをつくって学んだ╭( ・ㅂ・)و ̑̑ 
機械学習で大事なことをミニGunosyをつくって学んだ╭( ・ㅂ・)و ̑̑ 
 
Gunosyデータマイニング研究会 #118 これからの強化学習
Gunosyデータマイニング研究会 #118 これからの強化学習Gunosyデータマイニング研究会 #118 これからの強化学習
Gunosyデータマイニング研究会 #118 これからの強化学習
 
あなただけにそっと教える弊社の分析事情 #data analyst meetup tokyo vol.1 LT
あなただけにそっと教える弊社の分析事情 #data analyst meetup tokyo vol.1 LTあなただけにそっと教える弊社の分析事情 #data analyst meetup tokyo vol.1 LT
あなただけにそっと教える弊社の分析事情 #data analyst meetup tokyo vol.1 LT
 
#cwt2016 Apache Kudu 構成とテーブル設計
#cwt2016 Apache Kudu 構成とテーブル設計#cwt2016 Apache Kudu 構成とテーブル設計
#cwt2016 Apache Kudu 構成とテーブル設計
 
「新製品 Kudu 及び RecordServiceの概要」 #cwt2015
「新製品 Kudu 及び RecordServiceの概要」 #cwt2015「新製品 Kudu 及び RecordServiceの概要」 #cwt2015
「新製品 Kudu 及び RecordServiceの概要」 #cwt2015
 
Apache Kudu - Updatable Analytical Storage #rakutentech
Apache Kudu - Updatable Analytical Storage #rakutentechApache Kudu - Updatable Analytical Storage #rakutentech
Apache Kudu - Updatable Analytical Storage #rakutentech
 
“確率的最適化”を読む前に知っておくといいかもしれない関数解析のこと
“確率的最適化”を読む前に知っておくといいかもしれない関数解析のこと“確率的最適化”を読む前に知っておくといいかもしれない関数解析のこと
“確率的最適化”を読む前に知っておくといいかもしれない関数解析のこと
 
爆速クエリエンジン”Presto”を使いたくなる話
爆速クエリエンジン”Presto”を使いたくなる話爆速クエリエンジン”Presto”を使いたくなる話
爆速クエリエンジン”Presto”を使いたくなる話
 
Gunosy における AWS 上での自然言語処理・機械学習の活用事例
Gunosy における AWS 上での自然言語処理・機械学習の活用事例Gunosy における AWS 上での自然言語処理・機械学習の活用事例
Gunosy における AWS 上での自然言語処理・機械学習の活用事例
 
論文紹介@ Gunosyデータマイニング研究会 #97
論文紹介@ Gunosyデータマイニング研究会 #97論文紹介@ Gunosyデータマイニング研究会 #97
論文紹介@ Gunosyデータマイニング研究会 #97
 
記事分類における教師データおよびモデルの管理
記事分類における教師データおよびモデルの管理記事分類における教師データおよびモデルの管理
記事分類における教師データおよびモデルの管理
 
マイクロサービスとABテスト
マイクロサービスとABテストマイクロサービスとABテスト
マイクロサービスとABテスト
 
WebDB Forum 2016 gunosy
WebDB Forum 2016 gunosyWebDB Forum 2016 gunosy
WebDB Forum 2016 gunosy
 
Apache Kuduは何がそんなに「速い」DBなのか? #dbts2017
Apache Kuduは何がそんなに「速い」DBなのか? #dbts2017Apache Kuduは何がそんなに「速い」DBなのか? #dbts2017
Apache Kuduは何がそんなに「速い」DBなのか? #dbts2017
 

Similar to A/B Testing at Pinterest: Building a Culture of Experimentation

Workshop #2: User Research For Everyone by Aras Bilgen
Workshop #2: User Research For Everyone by Aras BilgenWorkshop #2: User Research For Everyone by Aras Bilgen
Workshop #2: User Research For Everyone by Aras Bilgenux singapore
 
Rachel Meyer Pubcon Presentation
Rachel Meyer Pubcon PresentationRachel Meyer Pubcon Presentation
Rachel Meyer Pubcon PresentationRachel Meyer
 
Designing to save lives: Government technical documentation
Designing  to save  lives: Government technical documentation Designing  to save  lives: Government technical documentation
Designing to save lives: Government technical documentation Laurian Vega
 
Content Strategy: A Framework for Marketing Success
Content Strategy: A Framework for Marketing SuccessContent Strategy: A Framework for Marketing Success
Content Strategy: A Framework for Marketing SuccessLaura Creekmore
 
Vivien Ibironke Ibiyemi. Comaqa Spring 2018. Enhance your Testing Skills With...
Vivien Ibironke Ibiyemi. Comaqa Spring 2018. Enhance your Testing Skills With...Vivien Ibironke Ibiyemi. Comaqa Spring 2018. Enhance your Testing Skills With...
Vivien Ibironke Ibiyemi. Comaqa Spring 2018. Enhance your Testing Skills With...COMAQA.BY
 
D school assignment 3 Prototype and Test
D school assignment 3 Prototype and TestD school assignment 3 Prototype and Test
D school assignment 3 Prototype and TestLee-Anne Walker
 
How to avoid research debt
How to avoid research debtHow to avoid research debt
How to avoid research debtCaroline Jarrett
 
Full Stack Engineering - April 29th, 2014 @ Full Stack Engineering Meetup NYC
Full Stack Engineering - April 29th, 2014 @ Full Stack Engineering Meetup NYCFull Stack Engineering - April 29th, 2014 @ Full Stack Engineering Meetup NYC
Full Stack Engineering - April 29th, 2014 @ Full Stack Engineering Meetup NYCKarl Stanton
 
Introduction to bugs measurement
Introduction to bugs measurementIntroduction to bugs measurement
Introduction to bugs measurementVolodya Novostavsky
 
Data Science Popup Austin: Privilege and Supervised Machine Learning
Data Science Popup Austin: Privilege and Supervised Machine LearningData Science Popup Austin: Privilege and Supervised Machine Learning
Data Science Popup Austin: Privilege and Supervised Machine LearningDomino Data Lab
 
LEARN STARTUP OVERVIEW
LEARN STARTUP OVERVIEWLEARN STARTUP OVERVIEW
LEARN STARTUP OVERVIEWwe20
 
[Pcamp19] - Prototyping the Pivotal Moments First: Visualizing the Forks in t...
[Pcamp19] - Prototyping the Pivotal Moments First: Visualizing the Forks in t...[Pcamp19] - Prototyping the Pivotal Moments First: Visualizing the Forks in t...
[Pcamp19] - Prototyping the Pivotal Moments First: Visualizing the Forks in t...Product Camp Brasil
 
Digital portfolio 1_v2
Digital portfolio 1_v2Digital portfolio 1_v2
Digital portfolio 1_v2mustafaalinike
 
Case study for agile software development:
Case study for agile software development: Case study for agile software development:
Case study for agile software development: Joe Crespo
 
Improve the UX of Your Content and Prove It
Improve the UX of Your Content and Prove ItImprove the UX of Your Content and Prove It
Improve the UX of Your Content and Prove ItPam Noreault
 
It's time to research our designs better. Here's how. UIUX Conference 2018 - ...
It's time to research our designs better. Here's how. UIUX Conference 2018 - ...It's time to research our designs better. Here's how. UIUX Conference 2018 - ...
It's time to research our designs better. Here's how. UIUX Conference 2018 - ...Sophie Freiermuth
 
Using cognitive walkthroughs to better review designs for accessibility
Using cognitive walkthroughs to better review designs for accessibilityUsing cognitive walkthroughs to better review designs for accessibility
Using cognitive walkthroughs to better review designs for accessibilityIntopia
 
Cultivating Content: Designing Wiki Solutions That Scale
Cultivating Content: Designing Wiki Solutions That ScaleCultivating Content: Designing Wiki Solutions That Scale
Cultivating Content: Designing Wiki Solutions That Scalecolleenfry
 
Pubcon SFIMA Super Awesome Extended Bonus Edition
Pubcon SFIMA Super Awesome Extended Bonus EditionPubcon SFIMA Super Awesome Extended Bonus Edition
Pubcon SFIMA Super Awesome Extended Bonus Editionrachelmeyer
 
5 Essential Tips For Improving Your Website Mockups & Prototypes!
5 Essential Tips For Improving Your Website Mockups & Prototypes!5 Essential Tips For Improving Your Website Mockups & Prototypes!
5 Essential Tips For Improving Your Website Mockups & Prototypes!Usersnap
 

Similar to A/B Testing at Pinterest: Building a Culture of Experimentation (20)

Workshop #2: User Research For Everyone by Aras Bilgen
Workshop #2: User Research For Everyone by Aras BilgenWorkshop #2: User Research For Everyone by Aras Bilgen
Workshop #2: User Research For Everyone by Aras Bilgen
 
Rachel Meyer Pubcon Presentation
Rachel Meyer Pubcon PresentationRachel Meyer Pubcon Presentation
Rachel Meyer Pubcon Presentation
 
Designing to save lives: Government technical documentation
Designing  to save  lives: Government technical documentation Designing  to save  lives: Government technical documentation
Designing to save lives: Government technical documentation
 
Content Strategy: A Framework for Marketing Success
Content Strategy: A Framework for Marketing SuccessContent Strategy: A Framework for Marketing Success
Content Strategy: A Framework for Marketing Success
 
Vivien Ibironke Ibiyemi. Comaqa Spring 2018. Enhance your Testing Skills With...
Vivien Ibironke Ibiyemi. Comaqa Spring 2018. Enhance your Testing Skills With...Vivien Ibironke Ibiyemi. Comaqa Spring 2018. Enhance your Testing Skills With...
Vivien Ibironke Ibiyemi. Comaqa Spring 2018. Enhance your Testing Skills With...
 
D school assignment 3 Prototype and Test
D school assignment 3 Prototype and TestD school assignment 3 Prototype and Test
D school assignment 3 Prototype and Test
 
How to avoid research debt
How to avoid research debtHow to avoid research debt
How to avoid research debt
 
Full Stack Engineering - April 29th, 2014 @ Full Stack Engineering Meetup NYC
Full Stack Engineering - April 29th, 2014 @ Full Stack Engineering Meetup NYCFull Stack Engineering - April 29th, 2014 @ Full Stack Engineering Meetup NYC
Full Stack Engineering - April 29th, 2014 @ Full Stack Engineering Meetup NYC
 
Introduction to bugs measurement
Introduction to bugs measurementIntroduction to bugs measurement
Introduction to bugs measurement
 
Data Science Popup Austin: Privilege and Supervised Machine Learning
Data Science Popup Austin: Privilege and Supervised Machine LearningData Science Popup Austin: Privilege and Supervised Machine Learning
Data Science Popup Austin: Privilege and Supervised Machine Learning
 
LEARN STARTUP OVERVIEW
LEARN STARTUP OVERVIEWLEARN STARTUP OVERVIEW
LEARN STARTUP OVERVIEW
 
[Pcamp19] - Prototyping the Pivotal Moments First: Visualizing the Forks in t...
[Pcamp19] - Prototyping the Pivotal Moments First: Visualizing the Forks in t...[Pcamp19] - Prototyping the Pivotal Moments First: Visualizing the Forks in t...
[Pcamp19] - Prototyping the Pivotal Moments First: Visualizing the Forks in t...
 
Digital portfolio 1_v2
Digital portfolio 1_v2Digital portfolio 1_v2
Digital portfolio 1_v2
 
Case study for agile software development:
Case study for agile software development: Case study for agile software development:
Case study for agile software development:
 
Improve the UX of Your Content and Prove It
Improve the UX of Your Content and Prove ItImprove the UX of Your Content and Prove It
Improve the UX of Your Content and Prove It
 
It's time to research our designs better. Here's how. UIUX Conference 2018 - ...
It's time to research our designs better. Here's how. UIUX Conference 2018 - ...It's time to research our designs better. Here's how. UIUX Conference 2018 - ...
It's time to research our designs better. Here's how. UIUX Conference 2018 - ...
 
Using cognitive walkthroughs to better review designs for accessibility
Using cognitive walkthroughs to better review designs for accessibilityUsing cognitive walkthroughs to better review designs for accessibility
Using cognitive walkthroughs to better review designs for accessibility
 
Cultivating Content: Designing Wiki Solutions That Scale
Cultivating Content: Designing Wiki Solutions That ScaleCultivating Content: Designing Wiki Solutions That Scale
Cultivating Content: Designing Wiki Solutions That Scale
 
Pubcon SFIMA Super Awesome Extended Bonus Edition
Pubcon SFIMA Super Awesome Extended Bonus EditionPubcon SFIMA Super Awesome Extended Bonus Edition
Pubcon SFIMA Super Awesome Extended Bonus Edition
 
5 Essential Tips For Improving Your Website Mockups & Prototypes!
5 Essential Tips For Improving Your Website Mockups & Prototypes!5 Essential Tips For Improving Your Website Mockups & Prototypes!
5 Essential Tips For Improving Your Website Mockups & Prototypes!
 

More from WrangleConf

Wrangle 2016: Staying Hippocratic with High Stakes Data
Wrangle 2016: Staying Hippocratic with High Stakes DataWrangle 2016: Staying Hippocratic with High Stakes Data
Wrangle 2016: Staying Hippocratic with High Stakes DataWrangleConf
 
Wrangle 2016: Driving Healthcare Operations with Small Data
Wrangle 2016: Driving Healthcare Operations with Small DataWrangle 2016: Driving Healthcare Operations with Small Data
Wrangle 2016: Driving Healthcare Operations with Small DataWrangleConf
 
Wrangle 2016 - Digital Vulnerability: Characterizing Risks and Contemplating ...
Wrangle 2016 - Digital Vulnerability: Characterizing Risks and Contemplating ...Wrangle 2016 - Digital Vulnerability: Characterizing Risks and Contemplating ...
Wrangle 2016 - Digital Vulnerability: Characterizing Risks and Contemplating ...WrangleConf
 
Wrangle 2016: Malware Tracking at Scale
Wrangle 2016: Malware Tracking at ScaleWrangle 2016: Malware Tracking at Scale
Wrangle 2016: Malware Tracking at ScaleWrangleConf
 
Wrangle 2016: (Lightning Talk) FizzBuzz in TensorFlow
Wrangle 2016: (Lightning Talk) FizzBuzz in TensorFlowWrangle 2016: (Lightning Talk) FizzBuzz in TensorFlow
Wrangle 2016: (Lightning Talk) FizzBuzz in TensorFlowWrangleConf
 
Wrangle 2016: Seeing Behaviors as Humans Do: Uncovering Hidden Patterns in Ti...
Wrangle 2016: Seeing Behaviors as Humans Do: Uncovering Hidden Patterns in Ti...Wrangle 2016: Seeing Behaviors as Humans Do: Uncovering Hidden Patterns in Ti...
Wrangle 2016: Seeing Behaviors as Humans Do: Uncovering Hidden Patterns in Ti...WrangleConf
 
Wrangle 2016: Data Science for HR
Wrangle 2016: Data Science for HRWrangle 2016: Data Science for HR
Wrangle 2016: Data Science for HRWrangleConf
 
Sensor Data Wrangling: From Metal to Cloud
Sensor Data Wrangling: From Metal to CloudSensor Data Wrangling: From Metal to Cloud
Sensor Data Wrangling: From Metal to CloudWrangleConf
 
Condense Fact from the Vapor of Nuance
Condense Fact from the Vapor of Nuance Condense Fact from the Vapor of Nuance
Condense Fact from the Vapor of Nuance WrangleConf
 
Data Science in Drug Discovery
Data Science in Drug DiscoveryData Science in Drug Discovery
Data Science in Drug DiscoveryWrangleConf
 
From Science to Product (Company)
From Science to Product (Company)From Science to Product (Company)
From Science to Product (Company)WrangleConf
 
The Unreasonable Effectiveness of Product Sense
The Unreasonable Effectiveness of Product SenseThe Unreasonable Effectiveness of Product Sense
The Unreasonable Effectiveness of Product SenseWrangleConf
 

More from WrangleConf (12)

Wrangle 2016: Staying Hippocratic with High Stakes Data
Wrangle 2016: Staying Hippocratic with High Stakes DataWrangle 2016: Staying Hippocratic with High Stakes Data
Wrangle 2016: Staying Hippocratic with High Stakes Data
 
Wrangle 2016: Driving Healthcare Operations with Small Data
Wrangle 2016: Driving Healthcare Operations with Small DataWrangle 2016: Driving Healthcare Operations with Small Data
Wrangle 2016: Driving Healthcare Operations with Small Data
 
Wrangle 2016 - Digital Vulnerability: Characterizing Risks and Contemplating ...
Wrangle 2016 - Digital Vulnerability: Characterizing Risks and Contemplating ...Wrangle 2016 - Digital Vulnerability: Characterizing Risks and Contemplating ...
Wrangle 2016 - Digital Vulnerability: Characterizing Risks and Contemplating ...
 
Wrangle 2016: Malware Tracking at Scale
Wrangle 2016: Malware Tracking at ScaleWrangle 2016: Malware Tracking at Scale
Wrangle 2016: Malware Tracking at Scale
 
Wrangle 2016: (Lightning Talk) FizzBuzz in TensorFlow
Wrangle 2016: (Lightning Talk) FizzBuzz in TensorFlowWrangle 2016: (Lightning Talk) FizzBuzz in TensorFlow
Wrangle 2016: (Lightning Talk) FizzBuzz in TensorFlow
 
Wrangle 2016: Seeing Behaviors as Humans Do: Uncovering Hidden Patterns in Ti...
Wrangle 2016: Seeing Behaviors as Humans Do: Uncovering Hidden Patterns in Ti...Wrangle 2016: Seeing Behaviors as Humans Do: Uncovering Hidden Patterns in Ti...
Wrangle 2016: Seeing Behaviors as Humans Do: Uncovering Hidden Patterns in Ti...
 
Wrangle 2016: Data Science for HR
Wrangle 2016: Data Science for HRWrangle 2016: Data Science for HR
Wrangle 2016: Data Science for HR
 
Sensor Data Wrangling: From Metal to Cloud
Sensor Data Wrangling: From Metal to CloudSensor Data Wrangling: From Metal to Cloud
Sensor Data Wrangling: From Metal to Cloud
 
Condense Fact from the Vapor of Nuance
Condense Fact from the Vapor of Nuance Condense Fact from the Vapor of Nuance
Condense Fact from the Vapor of Nuance
 
Data Science in Drug Discovery
Data Science in Drug DiscoveryData Science in Drug Discovery
Data Science in Drug Discovery
 
From Science to Product (Company)
From Science to Product (Company)From Science to Product (Company)
From Science to Product (Company)
 
The Unreasonable Effectiveness of Product Sense
The Unreasonable Effectiveness of Product SenseThe Unreasonable Effectiveness of Product Sense
The Unreasonable Effectiveness of Product Sense
 

Recently uploaded

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 

Recently uploaded (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

A/B Testing at Pinterest: Building a Culture of Experimentation

Editor's Notes

  1. Hi there! I’m Andrea Burbank, and I’m a data scientist at Pinterest. WHAT MOVES THE NEEDLE When I was asked to speak at a conference for data scientists, I thought for a while about what that means, and which aspects of my experience would be most interesting to folks who are working on the same sorts of problems that I tackle every day. What I ultimately decided was that the work that moves the needle at Pinterest wasn’t just the analysis we do to understand our ecosystem or to predict user engagement, but the culture of experimentation we’ve built up across the entire company. It wasn’t something that happened overnight, and I hope that by sharing our experience I can help you scale data science at your companies as well.
  2. As data scientists, we often think of ourselves as a hybrid between a software engineer and a statistician, blending the best of both to build a talented data machine. When we hit on a problem like AB testing, we tend to approach it from that perspective: what tools and frameworks should I engineer and what statistical comparisons are most relevant in order to build a successful AB testing program? Those are both tremendously important tools, obviously. But what will end up making or breaking your experimentation program is neither of those: it’s the people. It’s building up a culture of AB testing, one person at a time.
  3. Perhaps you’ve heard of a notion of an organizational maturity model. In software, there are basic steps you follow to improve your software engineering quality: Use source control, write unit tests, and so on.
  4. MODEL + PEOPLE + ANTICIPATE Along those lines, I’d like to propose a model for the cultural maturity of experimentation. Every time you solve THE BIG problem facing you at the moment, move on to the next stage of experimentation and you create a new problem. And fundamentally, each of the problems you face is about the people and the culture, and the solutions you form are only as successful as the culture you foster to nourish them. For us, we didn’t recognize this pattern until we’d already stumbled partway through this evolution, and even when we recognized that the solution was in the culture and the people, it took us a while to embrace that approach. My hope is that by talking about these stages I can help you, unlike us, to recognize the stage of the maturity model you’re currently in, to frame and solve it as a human problem, and then to start to anticipate the next phase before it becomes absolutely necessary. So what are those stages?
  5. So what are those stages? I’d say they look like this. Get STARTED. Get BIG. Get BETTER. Get OUT. Get TOOLS. Let’s dive in.
  6. Stage 1: get started. This is where you actually build the experiment framework.
  7. The problem: people are making bad decisions. Maybe they’re shipping things willy-nilly without measuring them at all, or maybe they’re watching trends over time and attributing change to newly released products when in fact the change might be completely unrelated. So you decide to build an AB testing framework.
  8. FRAMEWORK + PIPELINE + UI -> WE MADE IT! In my first couple months at Pinterest, I built up our experiment framework to have all the capabilities I thought were important: - triggering users at the moment the experiment actually affected their experience, - keeping track of novelty effects, and - functioning correctly for offline experiments. I built a data pipeline to capture all the most important metrics automatically and a UI to surface those metrics. I ran a few experiments myself, validated the findings with A/A tests and on real experiments, and figured AHA! We’d made it. Now we could run experiments and actually understand the effects of the feature changes we made.
  9. When you’ve toiled and coded and tested and built, you may think you’re done. After all, you now have a working framework. But in fact, you only just got started.
  10. ACTUALLY USE IT The next stage is to get BIG. What I mean by that is to get people to actually use the framework you’ve built.
  11. DRIVE ADOPTION The problem you’re facing now is that your framework on its own is useless; you need to drive adoption. I think it’s easy … to underestimate how important this phase is.
  12. A GREAT PRODUCT SPEAKS FOR ITSELF I think it’s easy to underestimate how important this phase is. Again, we are engineers. There’s a part of us that really wants to believe that a great product speaks for itself. It’s so tremendously useful! You’ve anticipated all the use cases and made it easy to actually understand the effect your feature is having on users! How on earth is this not the holy grail??
  13. Unfortunately, this is almost never actually true. Even high-quality tools don’t magically attract users. So stage 2 is about getting people to actually adopt your new framework, to buy into the idea of running experiments.
  14. Once you have your experiment framework in place, your #1 priority is to get people to use it. That means you do marketing.
  15. That means you do evangelism.
  16. TECH TALKS + DEMO + PM + BENEFITS + STRATEGIC PROJECT That means that you have to be a salesman (or woman). Give tech talks. Do a demo. Give impassioned speeches to anyone who will listen. Whenever you hear about a feature going out, go find the PM, chat with the engineer, try to convince them to run an experiment. Tell them what they will learn, how they will benefit, how easy it will be. Find a strategically important project, suggest running it as an experiment, and don’t take no for an answer.
  17. SHOW VALUE -> AGAIN AND AGAIN If you demonstrate the metrics effect of a strategic initiative, or you earn people call-outs at the company all-hands for lifting a metric by 5%, or you help the company avoid a huge mistake, people like it. They want you to do it again. And again. And now, you’ve done it: you got big.
  18. LOTS OF PEOPLE -> NUDGES In stage 3, your experiment framework is big and things are going swimmingly. People start running lots of experiments, and they firmly believe that running an AB test is the best way to understand the performance of their feature. But now that you’re not the person running all the experiments, you find that they need some nudges here and there to make their experiments run correctly. Instead of evangelizing, you spend your time helping people run experiments: come up with a hypothesis, determine how they’ll detect failure, consider how changes might affect individual users’ experience.
  19. DECIDE ON OWN -> NEED YOUR GUIDANCE In stage 2, no one was trying to run experiments unless you cajoled them into it, so you were always right there to help with implementation. Now that folks have bought in and are doing it on their own, guidance is needed, and you become the human to provide that guidance.
  20. FUN! And depending on your personality, your patience, and how quickly your company is growing, this stage might last a while. If you’re in this stage now, you might think it’s pretty great. Your framework is getting used, people are making good decisions, and you have the added perk that you get to be connected to feature development across the whole company, so you always know what’s going on. And honestly, that’s a lot of fun.
  21. SPOF! NAPA But after a while, you realize that you are a single point of failure. When you’re not there, people ship experiments when there aren’t enough users. They add new variants without thinking about how to measure them. They start experiments that accidentally trigger for everyone instead of only the affected users. For me, stage 3 became suboptimal pretty abruptly when I found myself trying to do code reviews on my iPhone while on my anniversary bike trip in Napa.
  22. GO INSANE OR STOP LEARNING Now, I hit this problem fairly quickly because Pinterest was growing at a breakneck pace. You might think you can last in stage 3 for a while, or even indefinitely. But as someone who enjoyed that stage tremendously, I’d advise against it. If your organization grows and you don’t scale, the culture will spin apart and you’ll go insane. If it doesn’t grow, you’ll keep needing to play the same role of experiments diva, and you won’t get a chance to learn what else you can contribute.
  23. This is important. It’s not your career goal to be the experiments person. (You should have higher ambitions.)
  24. Making experiments run is important. It’s interesting. But it’s not what you should be doing with the rest of your life.
  25. HELP PEOPLE SUCCEED. So that’s stage three. Once you have momentum behind your experiment framework, help people succeed with it. Help them think through their setup and their data. Help them figure out whether they have enough people, or it’s not working for a subset of the population. Help review their code, check their triggering, and figure out how to relaunch when things go wrong. Having you in the loop will make your company’s experiments successful. But also: start thinking about how you can move on to stage 4.
  26. TEACH OTHERS In stage 4, you start thinking about how you can teach others to fulfill the role you’ve been taking on in helping people to run successful experiments, and how you can get out. In stage 4, you start to (flip) scale yourself.
  27. Scale yourself. Figure out what you do and write it down. Develop repeatable processes, guidelines, checklists.
  28. LIST ERRORS At this point, you’ve been helping people with experiments for a while. What mistakes do you see happening? What questions do you answer repeatedly? And how can you get others to want to understand experiments better? The first thing I did was try to write down every problem I’d seen in an experiment. I dug up that list when I was writing this talk.
  29. It was three pages long in small font.
  30. It was three pages long in small font.
  31. It was three pages long in small font. When I shared this list with a coworker, he said:
  32. It was three pages long in small font. When I shared this list with a coworker, he said:
  33. TRAIN ENGINEERS -> PEOPLE PROBLEM + PEOPLE SOLUTION I rephrased it as follows. But then the question is: how do you train engineers to run experiments accurately? Again, this is a people problem, and the solution again comes from people.
  34. The answer: make your process clear and easily repeatable. If you haven’t read Atul Gawande’s book, you should. Even the most complex human processes: performing surgery, flying a plane, building a skyscraper, can be improved by simple checklists. There are so many pieces to keep track of that having a simple list can help you get the important things right.
  35. To make a checklist: for every important mistake, explain why it’s wrong and how to avoid it.
  36. LIFECYCLE We also thought about the experiment lifecycle. In the end, there are three major phases of every experiment. First, the experiment has to launch. Before it actually take off with users on board, you want to make sure that it’s configured correctly so we can learn what we want. Once an experiment is in flight, we may need to make adjustments. Perhaps we had an idea for a new take on the feature we’re testing, or we just want to increase our experimental power to measure the effect on a larger population. And finally, when we’re ready to land the experiment, we need to make sure that we’ve learned what we want to learn and that we’re making the right decision from the data. So we built checklists: what should you watch for in each of these phases?
  37. THINKING Launch is the most important thing. If the experiment is trying to measure the wrong thing or is set up incorrectly, you won’t learn anything from it. Before an experiment begins, most of the work is in the thinking. What are you trying to do? Why? Can you measure what you want to change?
  38. WHY CHANGE? Sometimes an experiment owner will want to make changes to an experiment after launch. Usually they want to increase group sizes to get more statistical power, but sometimes they want to change the population they’re measuring or add new types of treatment. Sometimes they want to change an experiment but they haven’t actually checked to see whether it’s working as expected. All of these things then turned into the in-flight checklist.
  39. READY? RIGHT DECISION? Lastly, at some point every experiment should be shut down. Sometimes people try to shut it down too early, before they have enough data or before we can understand the long-term effects. Or they’re right that it should be shut down now, but they decide to turn it off when it actually should be shipped because metrics are up, or they decide to ship it even though metrics are down but they don’t acknowledge it. The landing checklist tries to anticipate these issues and make sure we’re avoiding them.
  40. IMPLEMENT: NO FRICTION + GET OTHERS ON BOARD But all the checklists in the world are meaningless if nobody implements them. So we spent a while thinking about two things: how we could try to improve the quality of experiments being run without introducing too much friction, and how we could get others on board to help monitor experiments’ quality.
  41. PIGGYBACK ON R+ = OK. OPTIONAL IN THEORY. INTERNS. E+. YOUR CULTURE. To improve the quality of experiments without introducing friction, we piggybacked on the concept we already have of getting an r+. If you’re not familiar with an r+, it’s a naming convention that we adopted from Mozilla, but it’s just a way of signing off on a code review. When a code reviewer signs off with r+ on a review, it means that they think the new code improves the codebase. We had a culture of r+ that we stole for e+. We never said it was mandatory, just that it was recommended, but practically it was mandatory. No one ships code without an r+ except for new people and interns. For e+, we took that and said, hey, just as with code review, making sure that an experiment goes out correctly is critical. When you set up or change an experiment in a code review, ask someone who knows about experiments to take a look at it and provide feedback on your experiment setup. You need to find something that works within your culture. We could leverage this part of our existing engineering culture to create improved experiments. What is that lever at your company?
  42. The other key to our success was getting others on board to be the experiment reviewers. I think there were a couple pieces that were important here: 1) Calling it experiments-help. We considered experiments on-call but who finds on-call glamorous? Everyone wants to help others. 2) Getting partners in engineering: move faster, badge value (certification) and owning the process themselves, not gatekeepers. 3) Choosing the right people. The first few helpers were really thoughtful, well-respected engineers in the organization. Other people looked to them as leaders.
  43. The other key to our success was getting others on board to be the experiment reviewers. I think there were a couple pieces that were important here: 1) Calling it experiments-help. We considered experiments on-call but who finds on-call glamorous? Everyone wants to help others. 2) Getting partners in engineering: move faster, badge value (certification) and owning the process themselves, not gatekeepers. 3) Choosing the right people. The first few helpers were really thoughtful, well-respected engineers in the organization. Other people looked to them as leaders.
  44. The other key to our success was getting others on board to be the experiment reviewers. I think there were a couple pieces that were important here: 1) Calling it experiments-help. We considered experiments on-call but who finds on-call glamorous? Everyone wants to help others. 2) Getting partners in engineering: move faster, badge value (certification) and owning the process themselves, not gatekeepers. 3) Choosing the right people. The first few helpers were really thoughtful, well-respected engineers in the organization. Other people looked to them as leaders.
  45. The other key to our success was getting others on board to be the experiment reviewers. I think there were a couple pieces that were important here: 1) Calling it experiments-help. We considered experiments on-call but who finds on-call glamorous? Everyone wants to help others. 2) Getting partners in engineering: move faster, badge value (certification) and owning the process themselves, not gatekeepers. 3) Choosing the right people. The first few helpers were really thoughtful, well-respected engineers in the organization. Other people looked to them as leaders.
  46. And so we announced a process. We introduced the experiments-help@ email alias and just asked people to come to us for help if they wanted to learn from their experiments.
  47. ALL THE PIECES. TRAIN FIRST SET Now we had all the pieces in place: checklists for experiments, a way for people to ask for help in code review and for a certified helper to sign off, and a way for people to write about experiments in a standard way. Now we just had to train our first set of experiment helpers.
  48. LEARN BY DOING -> APPRENTICESHIP Now we had all the pieces in place: checklists for experiments, a way for people to ask for help in code review and for a certified helper to sign off, and a way for people to ask questions about experiments outside code reviews as well. Now we just had to train first set of experiment helpers. We are strong believers in learning by doing. So we set up the experiment helper program as an apprenticeship.
  49. QUIZ + ON THE HOOK Sure, people could read the documentation. But it’s not until they were put on the spot that they’d really begin to develop a sense of what to do. We created a quiz for prospective experiment helpers to test their understanding and ability to detect common problems. Nothing fancy – ours was just a Google doc with an answer key at the end. And then when your week of the rotation came along, you were on the hook for every question that came into experiments-help and every code review. When you were exposed to the variety of experiments people ran and had to be the person who kept them going in the right direction, you learned quickly.
  50. And so we expanded from just me, to me and Dan and John.
  51. And from Dan and John to a small set of respected engineers on a variety of teams, who start to build up the culture of experiments within their own smaller organizations.
  52. 50 PEOPLE + SELF-PROPELLING: QUEUE, COMMUNITY, TEAMS And now we have 50 trained experiment helpers distributed across all the product engineering teams at the company. It’s become self-propelling: we have queues of folks waiting to train as helpers, folks jumping in to answer each other’s questions, and individual engineering teams honing their own team’s experiment processes. We add questions to the quiz as new problems arise, and we now have a small army of folks equipped with experimental understanding who can explain new changes to their teams and help our process continue to grow.
  53. REMOVE YOURSELF FROM THE LOOP BY TRAINING OTHERS. GROWING VOLUME OF EXPERIMENTS. So that’s stage four. Remove yourself from the loop by training others to take over your role. Get them to ask the hard questions, to help experiment owners avoid pitfalls and follow best practices. At this point, you’ve built a well-oiled, self-sustaining machine. The volume of experiments grows and grows. Problems that were rare when you started now crop up often enough that they’re really starting to get irritating, and so you start to think about what else you could invest in to simplify experiments and increase their likelihood of success.
  54. MANY ERRORS ARE HUMAN, BUT SIMPLE ONES (FLIP) ARE PREVENTABLE. LETS HUMANS FOCUS ON THINKING. A lot of the things that can go wrong with an experiment are human: you can’t automate them away. Is it worth running an experiment in the first place? Does your hypothesis make sense, given the feature you’re building? Have you thought about what will happen to users if you remove the treatment? How will we decide whether the experiment is a success? But as you step back from individually reviewing everyone’s experiments, you may start to notice patterns of where simple things are going wrong,(flip) and you have the opportunity to step back and try to eliminate the problems that can be solved by better tools and automation. By solving this set of problems, you allow humans to focus on the hard stuff: the thinking.
  55. MANY ERRORS ARE HUMAN, BUT SIMPLE ONES (FLIP) ARE PREVENTABLE. LETS HUMANS FOCUS ON THINKING. A lot of the things that can go wrong with an experiment are human: you can’t automate them away. Is it worth running an experiment in the first place? Does your hypothesis make sense, given the feature you’re building? Have you thought about what will happen to users if you remove the treatment? How will we decide whether the experiment is a success? But as you step back from individually reviewing everyone’s experiments, you may start to notice patterns of where simple things are going wrong, and you have the opportunity to step back and try to eliminate the problems that can be solved by better tools and automation. By solving this set of problems, you allow humans to focus on the hard stuff: the thinking.
  56. Some of the simple mistakes happen at launch. If you can remove all the implementation details, you allow the experiment helper to focus on the important questions of what the experiment is trying to measure.
  57. For us, that meant simplifying the experiment API, removing untriggered experiments, and creating helper functions for common user populations, like only experimenting on the latest app version.
  58. For us, that meant simplifying the experiment API, removing untriggered experiments, and creating helper functions for common user populations, like only experimenting on the latest app version.
  59. LAST ONE For us, that meant simplifying the experiment API, removing untriggered experiments, and creating helper functions for common user populations, like only experimenting on the latest app version.
  60. Other mistakes happen in-flight. By building tools to take care of those details, we allowed the experiment helper to pay attention instead to why the experiment was changing and how it would be measured.
  61. LAST ONE
  62. WRONG DECISION -> HURTS USERS, WRONG DIRECTION Perhaps the most worrisome set of mistakes happens when someone decides to land an experiment. If they make the wrong decision here, it could result not only in shipping a product that hurts users, but in shaping future product decisions based on erroneous learnings! So we invested especially heavily in helping people avoid mistakes in interpreting their experiment results.
  63. First off, an experiment will be invalid if the randomization produced groups that aren’t actually the same, so we built a number of tools to detect errors.
  64. (last error)
  65. Other mistakes resulted from people trying to do their own analysis on metrics: querying the data incorrectly, making comparisons that didn’t make sense, or just not thinking about statistical significance.
  66. So that’s stage five, where Pinterest currently finds itself. After stepping back from the day-to-day review of experiments, we built tools so that the experiment helpers can focus on the important part: deciding what to build and understanding how it affects our users.
  67. SUMMARIZE. NOT JUST ENGINEERING: PEOPLE. BUY-IN, TEACHING, HARDER TO SHOOT FOOT. We’ve built an experiment framework that allows us to track changes on all parts of our service, gotten it widespread adoption, built up a core set of 50 engineers who lead their teams in running experiments, and automated tools to make all of the aspects of the experiment lifecycle harder to screw up. At each stage, while engineering and statistical know-how were part of the equation, the real solution lay in building a culture of experimentation: getting the humans who make up the organization to buy into experiments, teaching them to help each other make decisions, and building tools that make it harder to shoot yourself in the foot.
  68. NEXT?? LESSONS BEYOND EXPERIMENTATION. I don’t know yet what the next stage will look like. If you do, I’d love to find out. But I think the lessons extend beyond just experimentation. (flip) Data science is not just engineering and statistics: your recommendation model will not be used unless you convince someone it’s useful, and your analysis will not change product strategy until it’s changed people’s minds.
  69. NOT JUST ENGINEERING AND STATS. CONVINCE PEOPLE. Data science is not just engineering and statistics: your recommender system will gather dust unless you convince someone it’s useful, and your analysis will not change product strategy until it’s changed people’s minds. Spending time actively investing in building a data-driven culture will pay off handsomely in the long run.