Future of data science as a profession

How can you thrive in a future where machine learning has been popular for a few years already?
In this talk, I will give you actionable advice from my experience training serious data scientists at our retreat center in Berlin. You are going to face these pointed, hard questions:
- What is the promise of machine learning? Has it happened yet?
- Is it easy to take advantage of machine learning, now that most algorithms are nicely packaged in APIs and libraries?
- How much time should I spend getting good at machine learning? Am I good enough now?
- Are data scientists going to be replaced by algorithms? Are we all?
- Is it easy to hire talent in machine learning after the explosion of MOOCs?

Future of data science as a profession

  1. Future of Data Science as a profession
     Jose Quesada, Director, Data Science Retreat
     @datascienceret
     http://datascienceretreat.com/
  2. The promise
  3. The machine learning promise
     People should be able to predict:
     • Which employee will leave in the next 6 months
     • Which electric generator is likely to die in the next 2 weeks
     • Which sales lead has the highest potential to close in the next 3 months
     • What each new website visitor is likely to buy based on past visitors
  4. Jao. The Past, Present, and Future of Machine Learning APIs
     http://www.slideshare.net/bigml/the-past-present-and-future-of-machine-learning-apis
  5. http://www.enlitic.com/healthcare.html
  6. Smile detection
     Example: graduate portfolio project from DSR 03. Smile detection on video streams. Works reliably with multiple people on camera. Applications: YouTube funny video evaluation
  7. Data analysis has become super easy. But has it?
     • Great libraries exist with every algorithm under the sun
  8. The machine learning promise
     (Anyone who can turn on a computer) should be able to predict:
     • Which employee will leave in the next 6 months
     • Which electric generator is likely to die in the next 2 weeks
     • Which sales lead has the highest potential to close in the next 3 months
     • What each new website visitor is likely to buy based on past visitors
  9. Paco Nathan: Data Science in future tense
  10. Why data analysis is still hard, after all the libraries and APIs
  11. Andreas Mueller’s map
  12. Trent McConaghy’s riff on Andy
      http://trent.st/ffx/
  13. Two machine learners, two maps
      Andreas Mueller, PhD
      Andy is an Assistant Research Scientist at the NYU Center for Data Science, building a group to work on open source software for data science. Previously he was a Machine Learning Scientist at Amazon, working on computer vision and forecasting problems. He is one of the core developers of the scikit-learn machine learning library and has maintained it for several years. He authored the now famous model picker image from scikit-learn.
      Trent McConaghy, PhD
      Trent is co-founder & CTO of ascribe, which uses modern crypto, ML, and big data to tackle challenges in digital property ownership. His two startups applied ML in the enterprise semiconductor space: ADA was acquired in 2004 and Solido is going strong. His interests include large scale regression, automating creativity, anything labeled "impossible", and thousand-fold improvements. He was raised on a pig farm in Canada.
  14. Why data analysis is still hard, after all the libraries and APIs
      • It’s too easy to lie to yourself about it working
      • It’s very hard to tell whether it could work if it doesn’t
      • There is no free lunch
      http://blog.mikiobraun.de/2014/02/data-analysis-hard-parts.html
  15. No free lunch theorem
      • There is no universally optimal learning algorithm. As the No Free Lunch Theorem shows, no algorithm is better than all the rest for all kinds of data.
  16. “Toolified”
      • As more and more ML techniques become "toolified", the problem is that the business doesn't understand that the hard work is still ahead of them.
      • Home Depot sells hammers and lumber, and while some people have the skill and dedication to build their own house, most folks are smart enough to hire someone that knows what they're doing so the thing doesn't fall in and kill their family.
      • Blind faith in the power of tools is not helpful
  17. 80% data mangling, 20% building & testing models
      Is model building automatable? How about the data wrangling part? It’s actually the larger chunk
  18. Automating the data scientist
  19. Machine learning APIs
  20. Machine learning for data wrangling
  21. • Zoubin Ghahramani, Automatic statistician
      • It's easy to shoot yourself in the foot with automated tools, and to convince yourself that the results are meaningful when they're not
  22. Alternative: interfaces that draw the most useful information out of people
      Aka ‘The Luis von Ahn trick’. Human computation: combine human brainpower with computers to solve problems that neither could solve alone.
      ReCAPTCHA: computer-generated tests that humans are routinely able to pass but that computers have not yet mastered.
  23. Actionable advice for individuals
  24. Goal
      • Become a full-stack problem solver
      • AKA the unicorn data scientist
  25. How to get there
      • Focus on delivering business value
  26. How to get there
      Only after the business side is covered: focus on the tech stack.
      • Machine learning
      • Big data / engineering
      • When to use ML at scale, when to sample and run on a single machine
  27. Constant learning
      • The field changes faster than any other in technology
      • If you are not willing to allocate ‘time outside work’ to learn new things, you will stagnate fast
  28. Not being the equivalent of a code monkey
      • MOOCs have lowered the barrier to entry to machine learning.
      • Nowadays, you cannot be ‘the guy who knows how to run (insert off-the-shelf-algo-here)’. In dataland, that’s the equivalent of being a code monkey. MOOCs and superb libraries (scikit-learn, R’s ecosystem) made sure there are plenty of people who can throw, say, a random forest at a problem. In the modern world, this does not add that much value.
  29. Picking problems to add the most value
      • Sometimes beating what the company is already doing (often, nothing) offers a lot of value. Detecting fraud poorly is better than not detecting fraud
  30. Data Science will continue to be democratized
      • There’s no shortage of data scientists.
      • 1900: people thought the number of cars on the road would be limited by the supply of trained chauffeurs.
  31. Machine learning can very quickly get you, say, 80% of the way to solving just about any (real world) problem
      You want to apply ML to contexts that are fault tolerant:
      • Online ad targeting
      • Ranking search results
      • Recommendations
      • Spam filtering
  32. ML quickly hits a point of diminishing returns
      “The gain is not worth the pain."
  33. Actionable advice for companies
  34. Talent: invest in it
      • The hunt for the 10x programmer continues (although few companies succeed)
      • In data science, the equivalent is the unicorn data scientist
      • A unicorn data scientist should generate more business value than a 10x programmer
      • The market agrees: supersalaries of >200k are common for unicorn data scientists
  35. Talent: beware of the fake data scientist
      • Each LinkedIn job ad for a data scientist gets ~150 applications
      • Often from people who just rebranded themselves but have no real experience
      • Very common among people bailing out of academia
      • HR managers cannot tell the difference
      • It’s a common mistake to hire one, and never be able to produce business value
  36. Talent: easier to find than you may think
      • Online courses have raised the bar
      • Intensive bootcamps do work, as long as people have built something at the end
      • You will still get 150 fake data scientists for each decent one
  37. A future where ML has been popular for years. What does it look like?
  38. Next 3 years
      • ML APIs will enable people with less and less skill to run quite sophisticated analyses
      • Startups doing ML as a service will grow up, then contract. ML will stop being a key competitive advantage in most (not all) domains
      • Blind faith in the power of tools will lead to wrong decisions, which will lead to a backlash
  39. Next 10 years
      • Prediction: C-level people will be data scientists in the future
      • Product managers will become data scientists, or get replaced by one
  40. DS is a chaotic field and people don’t really know what they want (much less what they need)
  41. Interested in Data Science Retreat? Apply to either of our two tracks
      http://datascienceretreat.com/
  42. Thank You!
      Jose Quesada, PhD
      Director, Data Science Retreat
      @datascienceret
      me@josequesada.com
  43. References
      • Paco Nathan. Data science in future tense
      • Chris Dixon. Machine learning is really good at partially solving just about any problem
      • Jao. The Past, Present, and Future of Machine Learning APIs
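
Slide 14's warning that it is too easy to lie to yourself about a model working can be made concrete with a small sketch. The example below is not from the talk. It assumes a toy setup in which the labels are pure coin flips, so nothing can be learned, and it uses a hand-written 1-nearest-neighbor classifier. The helper names `nearest_neighbor_predict`, `accuracy`, and `make_noise` are hypothetical and exist only for illustration.

```python
import random


def nearest_neighbor_predict(train_x, train_y, point):
    """Return the label of the training point closest to `point` (1-NN)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], point)),
    )
    return train_y[best]


def accuracy(train_x, train_y, eval_x, eval_y):
    """Fraction of evaluation points whose predicted label matches."""
    hits = sum(
        nearest_neighbor_predict(train_x, train_y, x) == y
        for x, y in zip(eval_x, eval_y)
    )
    return hits / len(eval_x)


def make_noise(n, dims, rng):
    """Assumed toy data: random features and coin-flip labels (no signal)."""
    xs = [[rng.random() for _ in range(dims)] for _ in range(n)]
    ys = [rng.randint(0, 1) for _ in range(n)]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_noise(200, 5, rng)
test_x, test_y = make_noise(200, 5, rng)

# Scoring on the data the model memorized looks perfect...
train_acc = accuracy(train_x, train_y, train_x, train_y)
# ...while held-out data shows there was nothing to learn.
test_acc = accuracy(train_x, train_y, test_x, test_y)

print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on held-out data: {test_acc:.2f}")
```

Each training point is its own nearest neighbor, so the training score is perfect even though the labels are noise. The held-out score should hover around chance, which is the honest number. A library or API will report either figure just as readily, which is one reason the tools alone do not make the analysis easy.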

Editor's notes

  • It was almost a joke
    Too many emails asking the ‘when to do what’ question
  • If you thought scikit-learn was convenient
  • What is business value? If you have been in academia or away from a customer-facing role most of your career, you probably don’t have good intuitions about this. A sure-fire way to learn is to start a business. Or take a customer-facing role. Even so, it may take years to know your market
  • The discussion about the shortage of Data Scientists reminds me that in the early 1900s people thought that the number of cars on the road would be limited by the supply of trained chauffeurs. Then Henry Ford and others built cars that owners could drive themselves. New tools are going to be available that business owners can use themselves without needing data scientists
  • You need to apply ML to contexts that are fault tolerant:
    online ad targeting,
    ranking search results,
    recommendations,
    spam filtering.
