NONDETERMINISTIC SOFTWARE
FOR THE REST OF US
An exercise in frustration by
Tomer Gabel @ GeeCON 2018, Krakow

Nondeterministic Software for the Rest of Us

A talk given at GeeCON 2018 in Krakow, Poland.

Classically trained (if you can call it that) software engineers are used to clear problem statements and clear success and acceptance criteria. Need a mobile front-end for your blog? Sure! Support instant messaging for a million concurrent users? No problem! Store and serve 50TB of JSON blobs? Presto!

Unfortunately, it turns out modern software often includes challenges that we have a hard time with: those with no clear criteria for correctness and no easy way to measure performance, where success is about more than green dashboards. Your blog platform had better have a spam filter, your instant messaging service has to have search, and your blobs will inevitably be fed into some data scientist's crazy contraption.

In this talk I'll share my experiences of learning to deal with non-deterministic problems, what made the process easier for me and what I've learned along the way. With any luck, you'll have an easier time of it!

Nondeterministic Software for the Rest of Us

  1. NONDETERMINISTIC SOFTWARE FOR THE REST OF US: An exercise in frustration by Tomer Gabel @ GeeCON 2018, Krakow
  2. Case Study #1 • Delver, circa 2007 • We built a search engine • What’s expected? – Performant (<1 sec) – Reliable – Useful
  3. Let me take you back… • We applied good old-fashioned engineering • It was kind of great! – Reliability – Fast iteration – Built-in regression suite [Diagram: Spec, Tests, Code, Deployment]
  4. Let me take you back… • So yeah, we coded it • And it worked… sort of – It was highly available – It responded within SLA – … but with crap results • Green tests aren’t everything!
  5. Furthermore • Not all software can be acceptance-tested – Qualitative/subjective (e.g. search, social feed)
  6. Furthermore • Not all software can be acceptance-tested – Qualitative/subjective (e.g. search, social feed) – Huge input space (e.g. machine vision) Image: Cristian David
  7. Furthermore • Not all software can be acceptance-tested – Qualitative/subjective (e.g. search, social feed) – Huge input space (e.g. machine vision) – Resource-constrained (e.g. Lyft or Uber) Image: rideshareapps.com
  8. “CORRECT” AND “GOOD” ARE SEPARATE DIMENSIONS (Takeaway #1)
  9. Getting Started • For any product of any scale, always ask: – What does success look like? Image: Hole in the Wall, FremantleMedia North America
  10. Getting Started • For any product of any scale, always ask: – What does success look like? – How can I measure success? Image: Hole in the Wall, FremantleMedia North America
  11. Getting Started • For any product of any scale, always ask: – What does success look like? – How can I measure success? • You’re an engineer! – Intuition can’t replace data – QA can’t save your butt Image: Hole in the Wall, FremantleMedia North America
  12. What should you measure? • (Un-) fortunately, you have customers • Analyze their behavior – What do they want? – What influences your quality of service? • For a search engine… [Diagram: Query, Skim, Decide, Follow, Refinement, Paging]
  13. USERS ARE PART OF YOUR SYSTEM (Takeaway #2)
  14. What should you measure? • (Un-) fortunately, you have customers • Analyze their behavior – What do they want? – What influences your quality of service? • For a search engine… [Diagram: Query, Skim, Decide, Follow, Refinement, Paging; three labels reading “Signal”]
  15. What should you measure? Paging – “Not relevant enough” [Diagram: Query, Skim, Decide, Follow, Refinement, Paging]
  16. What should you measure? Paging – “Not relevant enough” Refinement – “Not what I meant” [Diagram: Query, Skim, Decide, Follow, Refinement, Paging]
  17. What should you measure? Paging – “Not relevant enough” Refinement – “Not what I meant” Clickthrough – “Bingo!” [Diagram: Query, Skim, Decide, Follow, Refinement, Paging]
  18. What should you measure? Paging – “Not relevant enough” Refinement – “Not what I meant” Clickthrough – “Bingo!” Bonus: Abandonment – “You suck” [Diagram: Query, Skim, Decide, Follow, Refinement, Paging]
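
The four signals on slides 15 to 18 can be turned into per-session rates. The following is a minimal sketch, not the Delver implementation: the event names, the session format (an ordered list of event names recorded after the initial query), and the definition of abandonment as "a session that never reaches a clickthrough" are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical event names; the slides name the signals but not a log format.
PAGING = "paging"
REFINEMENT = "refinement"
CLICKTHROUGH = "clickthrough"
ABANDONMENT = "abandonment"


def signal_rates(sessions):
    """Return the share of sessions that show each signal at least once.

    Each session is a list of event names recorded after the initial query.
    Assumption: a session with no clickthrough counts as abandoned.
    """
    if not sessions:
        raise ValueError("need at least one session")
    counts = Counter()
    for events in sessions:
        seen = set(events)
        for signal in (PAGING, REFINEMENT, CLICKTHROUGH):
            if signal in seen:
                counts[signal] += 1
        if CLICKTHROUGH not in seen:
            counts[ABANDONMENT] += 1
    total = len(sessions)
    return {
        name: counts[name] / total
        for name in (PAGING, REFINEMENT, CLICKTHROUGH, ABANDONMENT)
    }


example_sessions = [
    ["clickthrough"],            # "Bingo!" on the first page
    ["paging", "clickthrough"],  # not relevant enough at first, then a hit
    ["refinement", "paging"],    # rephrased, paged, gave up
    [],                          # left without doing anything
]
print(signal_rates(example_sessions))
```

Counting each signal once per session keeps a single heavy pager from dominating the rate; counting raw events instead would be an equally defensible choice.
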
  19. Is this starting to look familiar? It should.
  20. Well now! • We’ve been having this conversation for years • Mostly with… – Product managers – Business analysis – Data engineers • Guess what? [Diagram: Product Changes, R&D, Deployment, Measurement, Analysis]
  21. Well now! • We’ve been having this conversation for years • Mostly with… – Product managers – Business analysis – Data engineers • Guess what? [Diagram: Product Changes, R&D, Deployment, Measurement, Analysis; Informed by BI]
  22. What can we learn from BI? ➢ Be mindful of your users ➢ Talk to your analysts! • Analysis • Experimentation • Iteration
  23. What can we learn from BI? ➢ Invest in A/B tests ➢ Prove your improvements! • Analysis • Experimentation • Iteration
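
One common way to "prove your improvements" from an A/B test is a two-proportion z-test on a rate such as clickthrough. The slides do not say which statistical test was used, so the sketch below is an assumption: it uses the standard pooled-variance z-test with a normal approximation, and the function and parameter names are made up for illustration.

```python
import math


def two_proportion_z_test(clicks_a, sessions_a, clicks_b, sessions_b):
    """Return (z, two-sided p-value) for the difference between two rates.

    A is the control group, B is the variant. Uses the pooled-variance
    z-test, which assumes reasonably large sample sizes.
    """
    if sessions_a <= 0 or sessions_b <= 0:
        raise ValueError("both groups need at least one session")
    rate_a = clicks_a / sessions_a
    rate_b = clicks_b / sessions_b
    pooled = (clicks_a + clicks_b) / (sessions_a + sessions_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sessions_a + 1 / sessions_b))
    if std_err == 0:
        raise ValueError("degenerate rates: every session or no session clicked")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative numbers only: 10% clickthrough in control, 11% in the variant.
z, p = two_proportion_z_test(1000, 10_000, 1100, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value only says the difference is unlikely to be noise; whether a one-point lift is worth shipping is still a product decision.
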
  24. What can we learn from BI? • Analysis • Experimentation • Iteration ➢ Establish your baseline ➢ Invest in metric collection and dashboards
  25. SYSTEMS ARE NOT SNAPSHOTS. MEASURE CONTINUOUSLY (Takeaway #3)
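
Establishing a baseline and measuring continuously could look like the check below: keep a window of recent metric values and flag a new reading that strays too far from them. This is only a sketch under assumptions the slides do not state: the metric (daily clickthrough rate), the window contents, and the three-standard-deviation threshold are all hypothetical.

```python
import statistics


def deviates_from_baseline(history, latest, threshold=3.0):
    """Report whether `latest` is far from the baseline formed by `history`.

    "Far" means more than `threshold` sample standard deviations away from
    the mean of the baseline window.
    """
    if len(history) < 2:
        raise ValueError("need at least two baseline observations")
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return latest != mean
    return abs(latest - mean) / spread > threshold


# Hypothetical daily clickthrough rates for the past week.
baseline_ctr = [0.31, 0.30, 0.32, 0.29, 0.31, 0.30, 0.32]
print(deviates_from_baseline(baseline_ctr, 0.30))  # an ordinary day
print(deviates_from_baseline(baseline_ctr, 0.22))  # worth a look
```

Run on every new data point rather than once per release, a check like this treats the system as something that drifts over time instead of a snapshot.
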
  26. Hold on to your hats … this isn’t about search engines
  27. Case Study #2 • newBrandAnalytics, circa 2011 • A social listening platform – Finds user-generated content (e.g. reviews) – Provides operational analytics
  28. Social Listening Platform • A three-stage pipeline [Diagram: Acquisition (3rd party ingestion, BizDev, web scraping); Analysis (manual tagging/training, NLP/ML models); Analytics (dashboards, ad-hoc query/drilldown, reporting)]
  29. Social Listening Platform • A three-stage pipeline • My team focused on data acquisition • Let’s discuss web scraping – Structured data extraction – At scale – Reliability is paramount [Diagram: Acquisition (3rd party ingestion, BizDev, web scraping); Analysis (manual tagging/training, NLP/ML models); Analytics (dashboards, ad-hoc query/drilldown, reporting)]
  30. Large-Scale Scraping • A two-pronged problem • Target sites… – Can change at the drop of a hat – Actively resist scraping! • Both are external constraints • Neither can be unit-tested
  31. Optimizing for User Happiness • Users consume reviews • What do they want? – Completeness (no missed reviews) – Correctness (no duplicates/garbage) – Timeliness (near real-time) [Diagram: TripAdvisor, Twitter, Yelp, …; Data Acquisition; Reports, Notifications, Data Lake]
  32. Putting It Together • How do we measure completeness? • Manually – Costly, time consuming – Sampled (by definition) Image: Keypunching at Texas A&M, Cushing Memorial Library and Archives, Texas A&M (CC-BY 2.0)
  33. Putting It Together • How do we measure completeness? • Manually – Costly, time consuming – Sampled (by definition) • Automatically – Re-scrape a known subset – Produce similarity score
  34. Putting It Together • How do we measure completeness? • Manually – Costly, time consuming – Sampled (by definition) • Automatically – Re-scrape a known subset – Produce similarity score • Same with correctness
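
The automatic approach on slides 33 and 34 (re-scrape a known subset, produce a similarity score) can be sketched with plain set arithmetic. The slides do not say which similarity measure was used, so the two below are assumptions: completeness as the share of known reviews that the re-scrape found, and Jaccard similarity as a single score that also penalizes garbage. The review IDs are hypothetical.

```python
def completeness(reference_ids, scraped_ids):
    """Share of the known (reference) reviews that the re-scrape found."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(reference & set(scraped_ids)) / len(reference)


def jaccard_similarity(reference_ids, scraped_ids):
    """Overlap between the two sets, penalizing both misses and extras."""
    reference = set(reference_ids)
    scraped = set(scraped_ids)
    union = reference | scraped
    if not union:
        return 1.0  # two empty sets are trivially identical
    return len(reference & scraped) / len(union)


# Hypothetical review IDs: "r3" was missed and "x9" is garbage.
known = ["r1", "r2", "r3", "r4"]
rescraped = ["r1", "r2", "r4", "x9"]
print(completeness(known, rescraped))        # 0.75
print(jaccard_similarity(known, rescraped))  # 0.6
```

Because the inputs are converted to sets, duplicates in the scraped output are silently collapsed; measuring the "no duplicates" half of correctness would need a separate count over the raw list.
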
  35. Putting It Together • Targets do not want to be scraped • Major sites employ: – IP throttling – Traffic fingerprinting • 3rd party proxies are expensive Image from the movie “UHF”, Metro-Goldwyn-Mayer
  36. Putting It Together • What of timeliness? • It’s an optimization problem – Polling frequency determines latency – But polling has a cost – “Good” is a tradeoff
  37. Putting It Together • So then, timeliness…? • First, build a cost model – Review acquisition cost – Break it down by source • Next, put together SLAs – Reflect cost in pricing! – Adjust scheduler by SLA
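
The timeliness tradeoff on slides 36 and 37 can be made concrete with a toy cost model. Everything numeric here is an assumption for illustration: the per-poll cost, the 30-day month, and the simplification that a review posted just after a poll waits one full interval, so the SLA latency can be used directly as the polling interval.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 30 * 24  # simplification: a 30-day month


@dataclass(frozen=True)
class Source:
    name: str
    cost_per_poll: float  # hypothetical unit cost (proxies, compute)


def polling_interval_hours(sla_max_latency_hours):
    """Longest polling interval that still meets the SLA.

    Worst case, a review appears right after a poll and waits one full
    interval, so the interval must not exceed the SLA latency.
    """
    if sla_max_latency_hours <= 0:
        raise ValueError("SLA latency must be positive")
    return sla_max_latency_hours


def average_latency_hours(interval_hours):
    """Expected wait if reviews arrive uniformly between polls."""
    return interval_hours / 2


def monthly_cost(source, interval_hours):
    """Cost of polling one source at a fixed interval for a month."""
    polls = HOURS_PER_MONTH / interval_hours
    return polls * source.cost_per_poll


site = Source("example-reviews-site", cost_per_poll=0.25)
for sla_hours in (1, 6, 24):
    interval = polling_interval_hours(sla_hours)
    print(sla_hours, average_latency_hours(interval), monthly_cost(site, interval))
```

Tightening the SLA from 24 hours to 1 hour multiplies the polling cost by 24 in this model, which is the kind of number the slide suggests reflecting in pricing.
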
  38. Recap 1. “Correct” and “Good” are separate dimensions 2. Users are part of your system 3. Systems are not snapshots. Measure continuously Image: Confused Monkey, Michael Keen (CC BY-NC-ND 2.0)
  39. QUESTIONS? Thank you for listening tomer@tomergabel.com @tomerg http://www.tomergabel.com This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
