Intro to Data Science


  1. network: In3Guest
  2. March 2017 • Intro to Data Science
  3. Me • TJ Stalcup • Lead DC Mentor @ Thinkful • API Evangelist @ WealthEngine • Github: tjstalcup • Twitter: @tjstalcup
  4. You • I already have a career in data • I’m serious about switching into a career in data • I’m curious about switching into a career in data • I just want to see what all the fuss is about
  5. Today’s Goals • What is a data scientist and what do they do? • How and why has the field emerged? • How can one become a data scientist?
  6. Why do we care? • “The United States alone faces a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts to analyze big data and make decisions based on their findings.” - @McKinsey
  7. Why do we care? • Also… average salaries are $115,000 a year
  8. Nate Silver • “I think data-scientist is a sexed up term for a statistician”
  9. Example: LinkedIn 2006 • “[LinkedIn] was like arriving at a conference reception and realizing you don’t know anyone. So you just stand in the corner sipping your drink - and you probably leave early.” - LinkedIn Manager, June 2006
  10. Enter: Data Scientist • Jonathan Goldman • Joined LinkedIn in 2006, only 8M users (450M in 2016) • Started experiments to predict people’s networks • Engineers were dismissive: “you can already import your address book”
  11. The Result
  12. Other Examples • Uber: where drivers should hang out • Netflix: $1M movie recommendations contest • Ebola: mobile mapping in Senegal to fight disease
  13. Big Data • Big Data: datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze
  14. Big Data - History • Trend “started” in 2005 (Hadoop!) • Web 2.0 - Majority of content is created by users • Mobile accelerates this: data/person skyrockets
  15. Hadoop? • HDFS • MapReduce
  16. Hadoop Distributed File System • File is too big… Distribute! • Too many files… Distribute! • Yahoo has over 10,000 servers running Hadoop
  17. MapReduce • Data + Processing Software • Distributed Processing • Map all of the data, reduce it
  18. MapReduce
  19. Big Data • 90% of the data in the world today has been created in the last two years alone - IBM, May 2013
  20. Big Data
  21. Data Scientists - We Can Be Heroes
  22. Data Scientists - Jack of all Trades
  23. The Process - LinkedIn Example • Frame the question • Collect the raw data • Process the data • Explore the data • Communicate results
  24. Case: Frame the Question • What questions do we want to answer?
  25. Case: Frame the Question • What connections (type and number) lead to higher user engagement? • Which connections do people want to make but are currently limited from making? • How might we predict these types of connections with limited data from the user?
  26. Case: Collect the Data • What data do we need to answer these questions?
  27. Case: Collect the Data • Connection data (who is connected to whom?) • Demographic data (what is the profile of the connection?) • Retention data (do people stay or leave?) • Engagement data (how do they use the site?)
  28. Case: Process the Data • How is the data “dirty” and how can we clean it?
  29. Case: Process the Data • User input - 80/20 • Redundancies - 2 emails • Feature changes • Data model changes
  30. Case: Explore the Data • What are the meaningful patterns in the data?
  31. Case: Explore the Data • Triangle closing • Time overlaps • Geographic clustering
  32. Case: Communicate Findings • How do we communicate this? To whom?
  33. Case: Communicate Findings • Tell the story at the right technical level for each audience • Make sure to focus on What’s In It For You (WIIFY!) • Be objective, don’t lie with statistics • Be visual! Show, don’t just tell
  34. Tools • SQL Queries • Business Analytics Software • Machine Learning Algorithms
  35. #1 - SQL Queries • SQL is the standard querying language to access and manipulate databases
  36. #1 - SQL Queries • Table: friends
      id   full_name       age
      1    Dan Friedman    24
      2    Tyler Brewer    27
      3    David Coulter   22
      4    TJ Stalcup      33
      Query: SELECT full_name FROM friends WHERE age > 22
  37. #2: Visualization Software • Business analytics software for your database enabling you to easily find and communicate insights visually
  38. #2: Visualization Software
  39. #3: Machine Learning Algorithms • Machine learning algorithms provide computers with the ability to learn without being explicitly programmed: “programming by example”
  40. Iris Data Set
  41. Iris Data Set
  42. Iris Data Set ?
  43. Use Cases for Machine Learning • Classification: predict categories • Regression: predict values • Anomaly Detection: find unusual occurrences • Clustering: discover structure
  44. It’s not easy but someone has to do it
  45. That someone might be you • Knowledge of statistics, algorithms, & software • Comfort with languages & tools (Python, SQL, Tableau) • Inquisitiveness and intellectual curiosity • Strong communication skills • It’s all teachable!
  46. Ways to keep learning (chart: level of support vs. learning methods)
  47. 1-on-1 mentorship enables flexibility • 325+ mentors with an average of 10 years of experience in the field
  48. Support ‘round the clock
  49. Our results (charts: Job Titles after Graduation • Months until Employed)
  50. Try us out! • Initial 3-week prep course includes six mentor sessions for $250 • Learn Python, Python’s data science toolkit, stats • Option to continue onto Data Science bootcamp • Talk to me (or email) if you’re interested
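
Slide 17 sums up MapReduce as "map all of the data, reduce it". A minimal single-process sketch of that idea, using word counting as the example task, might look like the following. It does not use Hadoop or any distributed framework; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample sentences are assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input: each string stands in for one file on one machine.
    print(word_count(["big data is big", "data about data"]))
```

In a real cluster each call to `map_phase` could run on a different machine next to its share of the data, which is the "distribute" point made on slide 16; here everything runs in one process so the flow is easy to follow.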
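
Slide 31 lists "triangle closing" as a pattern found while exploring connection data. One common reading of that term is: if A is connected to B and B is connected to C, then A and C are likely candidates to connect. The sketch below ranks candidates by how many connections they share with a user. It is an illustration under that assumption, not a description of LinkedIn's actual system; `suggest_connections`, `EXAMPLE_GRAPH`, and the names in it are invented.

```python
from collections import Counter

# Hypothetical connection data: each person maps to the set of people
# they are already connected to (connections are symmetric).
EXAMPLE_GRAPH = {
    "ana": {"ben", "cat"},
    "ben": {"ana", "dev"},
    "cat": {"ana", "dev", "eli"},
    "dev": {"ben", "cat"},
    "eli": {"cat"},
}


def suggest_connections(graph, user):
    """Rank people `user` is not connected to by number of shared connections."""
    direct = graph.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user and anyone they already know.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most shared connections first; ties broken alphabetically.
    return sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(suggest_connections(EXAMPLE_GRAPH, "ana"))
```

Each suggestion would "close" one or more open triangles in the graph, which is why the count of shared connections is a natural first score to try.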
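
The query on slide 36 can be tried without installing a database server. The sketch below loads the slide's `friends` table into an in-memory SQLite database through Python's standard `sqlite3` module and runs the same `SELECT`. The helper name `older_than`, the column types, and the added `ORDER BY id` (there only to make the output order predictable) are assumptions, not part of the slide.

```python
import sqlite3

# Rows copied from the friends table on slide 36: (id, full_name, age).
FRIENDS = [
    (1, "Dan Friedman", 24),
    (2, "Tyler Brewer", 27),
    (3, "David Coulter", 22),
    (4, "TJ Stalcup", 33),
]


def older_than(min_age):
    """Return the full names of friends whose age is greater than min_age."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing on disk
    try:
        conn.execute(
            "CREATE TABLE friends (id INTEGER PRIMARY KEY, full_name TEXT, age INTEGER)"
        )
        conn.executemany("INSERT INTO friends VALUES (?, ?, ?)", FRIENDS)
        # Same query as the slide, with the threshold passed as a parameter.
        cursor = conn.execute(
            "SELECT full_name FROM friends WHERE age > ? ORDER BY id", (min_age,)
        )
        return [name for (name,) in cursor]
    finally:
        conn.close()


if __name__ == "__main__":
    print(older_than(22))
```

Note that David Coulter, aged exactly 22, is excluded because the condition is strictly greater than 22.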
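
Slides 39 to 43 describe machine learning as "programming by example" and name classification (predicting categories) as a use case, with the Iris data set as the illustration. A rough sketch of that idea is a nearest-neighbour classifier: no rules about flowers are written by hand, and a new flower simply gets the label of the most similar known example. The handful of measurements below are hard-coded in the style of the Iris data set (sepal length, sepal width, petal length, petal width, in cm) and should be treated as illustrative; `classify` and `EXAMPLES` are made-up names, and real work would normally use the full data set and a library.

```python
import math

# Labelled examples: (sepal length, sepal width, petal length, petal width).
# Illustrative values in the style of the Iris data set, not the full data.
EXAMPLES = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((4.7, 3.2, 1.3, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.9, 3.1, 4.9, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
    ((7.1, 3.0, 5.9, 2.1), "virginica"),
]


def classify(measurements, examples=EXAMPLES):
    """Return the label of the example closest to `measurements`."""
    # math.dist computes the Euclidean distance between two points.
    nearest = min(examples, key=lambda example: math.dist(example[0], measurements))
    return nearest[1]


if __name__ == "__main__":
    # The "?" flower from slide 42: an unlabelled set of measurements.
    print(classify((6.0, 2.9, 4.5, 1.5)))
```

Swapping in more labelled examples changes the predictions without changing the code, which is the sense in which the computer "learns without being explicitly programmed".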