Getting started in Data Science (April 2017, Los Angeles)
1. Data Science:
How did we get here and where are we going?
April 2017
http://bit.ly/data-la
Wifi: CrossCamp.us Events
2. About us
We train developers and data scientists through 1-on-1 mentorship and career prep
3. About me
• Noel Duarte
• Los Angeles Area General Manager
• UC Berkeley ’15 — worked primarily with R for population
genetics analysis, at Thinkful since January 2016
4. About you
Why are you here?
• I already have a career in data
• I’m curious about switching to a career in data
• I’m currently transitioning into a career in data
• I want to learn what data science is and why it’s
important
5. Today’s goals
• Why is data science important?
• What is a data scientist and what do they do?
• How and why has the field emerged?
• How can one become a data scientist? (And why
would you want to?)
6. Why is data science important?
By 2018, the United States alone could face a shortage
of 140,000 to 190,000 people with deep analytical skills
as well as 1.5 million managers and analysts with the
know-how to use the analysis of big data to make
effective decisions.
- McKinsey Global Institute (MGI)
10. Case study: LinkedIn (2006)
“[LinkedIn] was like arriving at a conference reception
and realizing you don’t know anyone. So you just stand
in the corner sipping your drink—and you probably
leave early.”
- LinkedIn Manager, June 2006
11. The new guy
• Joined LinkedIn in 2006, when it had only 8M users (450M in 2016)
• Started experiments to predict people’s networks
• Engineers were dismissive: “you can already import your address book”
13. Data, data everywhere 🚀
• Uber — Where drivers should hang out
• Netflix — movie recommendations
• Ebola epidemic — Mobile mapping in Senegal to
fight disease
15. Big Data — what exactly does it mean?
Big Data: datasets whose size is beyond the ability of
typical database software tools to capture, store,
manage, and analyze
16. Big Data — brief history
• Trend “started” in 2005 (Hadoop!)
• Web 2.0 - Majority of content is created by users
• Mobile accelerates this — data/person skyrockets
21. The data science process
Let’s come back to LinkedIn’s evolution in 2006 and
examine it using a typical* data science approach.
• Frame the question
• Collect the raw data
• Process the data
• Explore the data
• Communicate results
22. Case: Frame the question
What questions do we want to answer?
23. Case: Frame the question
• What connections (type and number) lead to higher
user engagement?
• Which connections do people want to make but are
currently limited from making?
• How might we predict these types of connections
with limited data from the user?
24. Case: Collect the data
What data do we need to answer these questions?
25. Case: Collect the data
• Connection data (who is connected to whom?)
• Demographic data (what is the profile of each connection?)
• Retention data (how do people stay or leave?)
• Engagement data (how do they use the site?)
26. Case: Process the data
How is the data “dirty” and how can we clean it?
27. Case: Process the data
• User input
• Redundancies
• Feature changes
• Data model changes
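A minimal sketch of what cleaning can look like, assuming made-up profile records (the names, companies, and the `company`/`employer` field names are all hypothetical, not LinkedIn’s actual data). It touches three of the problems above: messy user input, redundant rows, and a field that was renamed by a data model change. Feature changes are not covered here.

```python
# Hypothetical raw records showing messy input, a duplicate, and a renamed field.
raw_profiles = [
    {"id": 1, "name": "  ada LOPEZ ", "company": "acme"},
    {"id": 1, "name": "Ada Lopez", "company": "Acme"},      # redundant copy of id 1
    {"id": 2, "name": "sam  chen", "employer": "Globex"},   # older field name
]

def clean(record):
    """Normalize one record: unify field names, trim and title-case free text."""
    company = record.get("company") or record.get("employer") or ""
    return {
        "id": record["id"],
        "name": " ".join(record["name"].split()).title(),
        "company": company.strip().title(),
    }

def dedupe(records):
    """Keep the first cleaned record seen for each id."""
    seen = {}
    for record in map(clean, records):
        seen.setdefault(record["id"], record)
    return list(seen.values())

cleaned = dedupe(raw_profiles)
for row in cleaned:
    print(row)
```

Real cleaning rules depend on the data; title-casing, for example, would be wrong for names like "McDonald", so treat these rules as placeholders.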
28. Case: Explore the data
What are the meaningful patterns in the data?
29. Case: Explore the data
• Triangle closing
• Time overlaps
• Geographic clustering
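Triangle closing means that if two people share a connection, they are likely to know each other too. A rough sketch of the idea, assuming a tiny made-up graph (the people and the `suggest` helper are hypothetical, not LinkedIn’s actual method): count mutual connections and rank the people you are not yet connected to.

```python
from collections import Counter

# Hypothetical connection graph: each person maps to the set of people they know.
connections = {
    "ana": {"ben", "cal"},
    "ben": {"ana", "cal", "dee"},
    "cal": {"ana", "ben", "dee"},
    "dee": {"ben", "cal", "eli"},
    "eli": {"dee"},
}

def suggest(person):
    """Rank non-connections by how many mutual connections they share with `person`."""
    mutual = Counter()
    for friend in connections[person]:
        for candidate in connections[friend]:
            if candidate != person and candidate not in connections[person]:
                mutual[candidate] += 1
    return mutual.most_common()

print(suggest("ana"))  # [('dee', 2)]: ana and dee share ben and cal
```

Time overlaps and geographic clustering could be added as extra signals in the same way, for example by counting shared employers or cities, but that is beyond this sketch.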
31. Case: Communicate results
• Tell story at the right technical level for each audience
• Make sure to focus on What’s In It For You (WIIFY!)
• Be objective, don’t lie with statistics
• Be visual! Show, don’t just tell
32. Tools to explore “big data”
• SQL Queries
• Business Analytics Software
• Machine Learning Algorithms
33. Tool #1: SQL queries
SQL is the standard query language for accessing and
manipulating relational databases
34. SQL example
Table: friends

id  full_name     age
1   Dan Friedman  24
2   Jared Jones   27
3   Paul Gu       22
4   Noel Duarte   73

Query:
SELECT full_name FROM friends WHERE age = 73;
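To try the query yourself, here is a small sketch using Python’s built-in `sqlite3` module. The in-memory database and the column types are assumptions made for the example; the table contents and the query come from the slide.

```python
import sqlite3

# In-memory database holding the slide's sample table (column types are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friends (id INTEGER PRIMARY KEY, full_name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO friends (id, full_name, age) VALUES (?, ?, ?)",
    [
        (1, "Dan Friedman", 24),
        (2, "Jared Jones", 27),
        (3, "Paul Gu", 22),
        (4, "Noel Duarte", 73),
    ],
)

# The query from the slide: names of friends whose age is 73.
rows = conn.execute("SELECT full_name FROM friends WHERE age = 73").fetchall()
names = [name for (name,) in rows]
print(names)  # ['Noel Duarte']
conn.close()
```

Changing the `WHERE` clause (for example `age < 25`) is a quick way to see how filtering works.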
35. Tool #2: Analytics software
Business analytics software connects to your database and
lets you easily find and communicate insights visually
37. Tool #3: Machine Learning Algorithms
Machine learning algorithms provide computers
with the ability to learn without being explicitly
programmed — “programming by example”
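A toy illustration of “programming by example”, assuming made-up data: instead of writing rules for who stays on a site, we give the program labeled examples and let it answer by looking at the most similar one (a 1-nearest-neighbor classifier). The features, numbers, and labels are hypothetical.

```python
import math

# Hypothetical labeled examples: (connections, visits per week) -> outcome.
examples = [
    ((2, 1), "leaves"),
    ((3, 0), "leaves"),
    ((5, 1), "leaves"),
    ((30, 4), "stays"),
    ((40, 5), "stays"),
    ((55, 7), "stays"),
]

def predict(point):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    nearest = min(examples, key=lambda ex: math.dist(ex[0], point))
    return nearest[1]

print(predict((4, 1)))   # leaves
print(predict((45, 6)))  # stays
```

No rule such as "fewer than 10 connections means the user leaves" appears anywhere in the code; the behavior comes entirely from the examples. Real projects would typically use a library such as scikit-learn and far more data.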
42. This is what you’ll need
• Knowledge of statistics, algorithms, & software
• Comfort with languages & tools (Python, SQL,
Tableau)
• Inquisitiveness and intellectual curiosity
• Strong communication skills
43. Ways to keep learning
[Chart; axis labels: “Level of support”, “Structure efficiency”]
44. 1-on-1 mentorship enables flexibility
325+ mentors with an average of 10
years of experience in the field
46. Want to try us/data science out?
• Three-week program, includes six mentor sessions for $250
• Overview of Python, Python’s data science toolkit, stats
• Option to continue into full data science bootcamp
• Talk to me (or email noel@thinkful.com) if you’re interested