How to Prepare for a Career
in Data Science
Juuso Parkkinen, PhD - @ouzor
Head of Data Science, Nightingale Health - @NgaleHealth
Aalto University, November 25, 2019
Outline
1. My Career as a Data Scientist
2. Data Science Workflow
3. Data Science and Business
My Career as a Data Scientist
The Data Science Venn Diagram
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
My career steps
MSc in bioinformation technology
from HUT / Aalto
PhD in bioinformatics and machine
learning from Aalto
Data Scientist (consultant) at Reaktor
Data Scientist at Nightingale Health
Data Science research: probabilistic
models for biomedical problems
More on my research and other projects:
https://ouzor.github.io/projects.html
Data Science as a hobby: open tools for open data
Blogging
Open source programming
Open Knowledge community
Blogs: https://louhos.github.io/, https://ouzor.github.io/
Open data science example: Biking activity in Helsinki
How do various factors affect biking activity
in Helsinki?
Data sources:
- Automatic bike activity counters from
multiple sites
- Weather data from FMI
Bike activity modelled with Negative
Binomial distribution using R (mgcv::gam)
Done with Janne Sinkkonen and Antti
Poikola
Data, code & results:
https://github.com/apoikola/fillarilaskennat
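The negative binomial distribution suits count data such as bike passes because its variance can exceed its mean, which a Poisson model does not allow. The following is a minimal Python sketch of the distribution itself, not of the mgcv::gam model used in the project. The mean `mu` and dispersion `theta` values are made up for illustration.

```python
import math


def nb_pmf(y: int, mu: float, theta: float) -> float:
    """Negative binomial probability of count y, given mean mu and dispersion theta.

    In this parameterisation the variance is mu + mu**2 / theta, so a small
    theta means strong overdispersion relative to a Poisson with the same mean.
    """
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


# Hypothetical values: 20 bikes per hour on average, strongly overdispersed.
mu, theta = 20.0, 2.0

# Sum far enough into the tail that the remaining probability is negligible.
counts = range(2000)
total = sum(nb_pmf(y, mu, theta) for y in counts)
nb_mean = sum(y * nb_pmf(y, mu, theta) for y in counts)
nb_var = sum((y - nb_mean) ** 2 * nb_pmf(y, mu, theta) for y in counts)

print(f"total probability: {total:.6f}")
print(f"mean: {nb_mean:.2f}, variance: {nb_var:.2f}")
```

A Poisson model with the same mean would force the variance to equal 20, whereas this parameterisation gives mu + mu**2 / theta.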
Open Data Science at Reaktor: Apartment price modelling
Kannattaakokauppa.fi by Reaktor: http://kannattaakokauppa.fi
More about the model: https://ouzor.github.io/blog/2016/03/08/apartment-price-model.html
Data Science Workflow
Data Science in the vacuum
Typically starts with a clean data set and a clear (modelling) task.
Example: Weather data in csv, and a goal to predict humidity.
What might be different in the real world?
Data Science Workflow in the Real World
1. Identifying and defining the problem
2. Accessing data
3. Preprocessing and cleaning the data
4. Exploratory data analysis and visualisation
5. Statistical modelling or machine learning
6. End result
Note the difference between academic interests and practical relevance!
The steps above are repeated iteratively.
Identifying and defining the problem
Learn to be critical and ask good questions!
• Why is this problem important?
• How does solving this improve our user experience?
• How does solving this improve our business?
• Is the problem really something we should solve, or is it something where we happen to have data or methods
available?
• Do we even need to solve this problem!?
Only after the problem is identified can you start thinking about data science:
- Do we have relevant data to support solving the problem?
- Can we use modelling to solve the problem (e.g. prediction or classification)?
Accessing data
Data exists in a variety of sources and formats.
A data scientist might need to access data from any of
these in a reasonable time.
Typical data sources: files, APIs, databases, web
scraping
Typical data formats:
- CSV, TSV, Excel
- JSON (XML less nowadays)
- Lots of strangely structured text files
Domain-specific formats:
- Relational data (networks)
- Spatial data
- Gene expression, genomic data
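As a rough illustration of handling two of the formats listed above, the sketch below reads CSV and JSON into the same list-of-records shape using only the Python standard library. The weather observations and column names are hypothetical and are inlined so the example runs without any files.

```python
import csv
import io
import json

# Hypothetical observations; in practice these would come from files or an API.
csv_text = (
    "time,temperature,humidity\n"
    "2019-11-25T10:00,1.5,87\n"
    "2019-11-25T11:00,2.0,85\n"
)
json_text = '[{"time": "2019-11-25T12:00", "temperature": 2.4, "humidity": 83}]'


def normalise(record):
    """Return a record with numeric fields converted to float."""
    return {
        "time": record["time"],
        "temperature": float(record["temperature"]),
        "humidity": float(record["humidity"]),
    }


def read_csv_records(text):
    # csv.DictReader yields every value as a string, so convert explicitly.
    return [normalise(row) for row in csv.DictReader(io.StringIO(text))]


def read_json_records(text):
    # JSON already carries numbers, but normalising keeps both sources alike.
    return [normalise(row) for row in json.loads(text)]


records = read_csv_records(csv_text) + read_json_records(json_text)
for record in records:
    print(record)
```

Converting every source to one common record shape early makes the later cleaning and modelling steps independent of where the data came from.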
Example: Weather data from WFS API
http://opendata.fmi.fi/wfs?service=WFS&version=2.0.0&request=getFeature&storedquery_id=fmi::forecast::hirlam::surface::point::multipointcoverage&place=helsinki&
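A query URL like the one above can be assembled from named parameters instead of being typed by hand. The sketch below uses only the parameters visible in the slide. The URL there ends with "&", so further parameters may have been cut off and are not guessed here. No request is sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://opendata.fmi.fi/wfs"

# Only the parameters that are visible in the slide's URL.
params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "getFeature",
    "storedquery_id": "fmi::forecast::hirlam::surface::point::multipointcoverage",
    "place": "helsinki",
}

# safe=":" keeps the "::" separators readable instead of percent-encoding them.
url = f"{BASE_URL}?{urlencode(params, safe=':')}"
print(url)

# Parsing the URL back is a cheap check that nothing was mangled.
parsed = parse_qs(urlsplit(url).query)
print(parsed["place"])
```

Keeping the parameters in a dictionary makes it easy to change the place or the stored query without editing a long string.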
Be very careful with Excel data formatting!
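The slide does not say which formatting problem it has in mind. One well-documented failure mode is spreadsheet software silently converting identifier-like text, such as some gene symbols, into dates. The sketch below flags values that look like the result of such a conversion. The pattern and the sample column are illustrative assumptions and will not catch every date format.

```python
import re

# Matches date-like strings such as "2-Sep" or "1-Mar", which a spreadsheet may
# produce from identifiers like "SEPT2" or "MARCH1". Illustrative only.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)


def find_suspicious(values):
    """Return (index, value) pairs for entries that look like converted dates."""
    return [(i, v) for i, v in enumerate(values) if DATE_LIKE.match(v.strip())]


# Hypothetical identifier column exported from a spreadsheet.
gene_column = ["TP53", "2-Sep", "BRCA1", "1-Mar", "DEC1"]
suspicious = find_suspicious(gene_column)
print(suspicious)
```

A check like this belongs right after import, before the damaged values spread into later analysis steps.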
Preprocessing and cleaning (”wrangling” / ”munging”)
“Tidy datasets are all alike, but every messy dataset is messy in its own way.” –– Hadley Wickham
Having data in a tidy format makes data analysis, visualisation and modelling easier.
Data frames in R and Python.
Read more about tidy data: https://r4ds.had.co.nz/tidy-data.html
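In a tidy table each row is one observation and each column is one variable. As a minimal sketch, the plain-Python function below reshapes a "wide" table with one column per month into that form. The site names, months and counts are hypothetical, and in practice a data frame library would do this step.

```python
# Hypothetical "wide" table: one row per counter site, one column per month.
wide = [
    {"site": "site_a", "2019-05": 4100, "2019-06": 5200},
    {"site": "site_b", "2019-05": 1300, "2019-06": 1700},
]


def to_long(rows, id_column, variable_name, value_name):
    """Turn every non-id column into its own (id, variable, value) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            long_rows.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return long_rows


tidy = to_long(wide, id_column="site", variable_name="month", value_name="count")
for row in tidy:
    print(row)
```

In the long form, "month" is an ordinary variable that can be filtered, grouped or plotted like any other column.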
Exploratory Data Analysis and Visualisation
The goal of Exploratory Data Analysis is to get
to know your data, using visual summaries and
computing descriptive statistics.
Includes identifying missing data, outliers and
other possible problems with the data.
This informs preprocessing and cleaning, and
typically needs a couple of iterations before the
data is ready for analysis.
You should also contact domain experts and
confirm if the data looks as it should.
It’s hard to define when the data is really
”clean”. You will develop an instinct for this
over time.
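A first exploratory pass can be as simple as counting missing values, computing a few descriptive statistics and flagging candidate outliers. The sketch below does this with the Python standard library on made-up humidity readings. The 1.5 × IQR rule is one common convention, assumed here for illustration.

```python
import statistics

# Hypothetical humidity readings (%). None marks a missing value and 870 is a
# plausible data-entry error.
humidity = [87, 85, None, 83, 90, 870, 88, 86, None, 84]

observed = [h for h in humidity if h is not None]
n_missing = len(humidity) - len(observed)

median = statistics.median(observed)
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1

# Flag values more than 1.5 * IQR outside the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [h for h in observed if h < lower or h > upper]

print(f"missing: {n_missing}, median: {median}")
print(f"candidate outliers: {outliers}")
```

A flagged value is only a candidate. Whether 870 is an error or a real measurement is a question for a domain expert.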
Statistical modelling and machine learning
Modelling is one way to reach a goal in data analysis, not a
goal in itself.
Pick a suitable method based on your goals - not the other
way around!
Start with simple methods, add complexity gradually, if
needed.
You can get pretty far with linear or logistic regression.
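To show how little machinery a simple baseline needs, the sketch below fits a one-variable linear regression by ordinary least squares in plain Python. The temperature and bike-count values are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: temperature (degrees C) and bikes counted per day.
temperature = [0, 5, 10, 15, 20]
bike_count = [110, 205, 290, 410, 495]

slope, intercept = fit_line(temperature, bike_count)
predicted = intercept + slope * 12
print(f"slope={slope:.1f}, intercept={intercept:.1f}, prediction at 12 C={predicted:.0f}")
```

A baseline like this gives a reference point, so any more complex model has to justify itself by improving on it.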
End result
The end result of a data science project can be many things, such as
- A single figure describing the association of two variables
- A comprehensive report for a client or business department
- A machine learning product ready to be deployed into production
In most projects, it is important to write some kind of report or documentation of what has been done.
Learning to communicate effectively is a very important skill for data scientists. This includes producing clear visual
summaries of the main results, and using generally understandable language.
Deploying Data Products
Data science is useful in creating insights, increasing understanding, and informing decision making.
The biggest impact however comes from intelligent systems that operate automatically and continuously, such as
recommendation engines. This typically means that data science products are deployed as part of larger software
systems.
Deploying your first data products can be frightening for data scientists with no programming background.
Get support from software developers or data engineers!
Data Science Tools – Some tips
Make everything reproducible and use version control!
Tidyverse is a family of R packages that cover most of the
data science workflow.
Many similar tools exist for Python!
Tidyverse: https://www.tidyverse.org/
R for Data Science: https://r4ds.had.co.nz/
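One small piece of reproducibility is making any randomness repeatable. The sketch below, written in Python rather than R, passes an explicit seed to a local random generator so that the same call always returns the same resamples. The function name, the data and the seed value are hypothetical.

```python
import random


def bootstrap_means(values, n_resamples, seed):
    """Return the means of n_resamples bootstrap samples drawn with a fixed seed."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    return means


# Hypothetical measurements.
values = [87, 85, 83, 90, 88, 86, 84]

run_a = bootstrap_means(values, n_resamples=100, seed=2019)
run_b = bootstrap_means(values, n_resamples=100, seed=2019)
print(run_a == run_b)
```

Keeping the seed as an explicit argument also means it ends up in version control along with the rest of the analysis code.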
How to learn the Data Science Workflow?
Data Science is an art – you only learn it by doing!
• Pick challenging courses with large and realistic projects
• Start a hobby project, for example using some open data set, and share the code and results (e.g. GitHub)
• Participate in competitions and challenges
• Tidytuesdays: https://github.com/rfordatascience/tidytuesday
• Kaggle: https://www.kaggle.com/
Learning a proper Data Science Workflow will help you in producing reliable results in a reasonable time.
This will benefit your career regardless of whether you work in academia, industry, or somewhere else.
Data Science and Business
Agile Data Science
Any sufficiently interesting problem has more than one ”correct” answer.
You can spend anything between 2 hours and a PhD on a single problem. Try to recognize how much effort each problem
is worth.
You can often get a satisfactory solution with 20% of the effort compared to a ”perfect” solution.
Learn to fail fast. Sometimes data science solutions do not work, and it’s good to realise this as soon as possible.
Adopting agile software development practices helps!
Agile Data Science with R: https://edwinth.github.io/ADSwR/index.html
Data Science in a Team
No single person can master every possible data science
skill.
Data scientists work effectively in teams, with
complementary skill sets and backgrounds.
When looking for your first job as a data scientist, look for
places where there are senior people who can help you
learn and grow as a data scientist.
Data Science Use Cases
Data Science as part of a Product or Project
Data Science is typically only a small part of the larger
Product or Project.
It is important to know what the overall goal is, and to
adjust data science development towards that.
You need to collaborate with other people, such as
designers, software developers, marketing and sales
people, customers, etc.
Some takeaway notes
Data Science is an art – you only learn it by doing.
Find ways to continuously learn and practice your skills, with e.g.
hobby projects or competitions.
Finding a problem worth solving is hard.
There is never a single correct solution.
Curiosity and critical thinking are invaluable!
Thank you!
Juuso Parkkinen, PhD - @ouzor
Head of Data Science, Nightingale Health - @NgaleHealth
www.nightingalehealth.com


How to Prepare for a Career in Data Science

  • 2. How to Prepare for a Career in Data Science Juuso Parkkinen, PhD - @ouzor Head of Data Science, Nightingale Health - @NgaleHealth Aalto University, November 25, 2019
  • 3. Outline: 1. My Career as a Data Scientist 2. Data Science Workflow 3. Data Science and Business
  • 4. My Career as a Data Scientist
  • 5. The Data Science Venn Diagram http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  • 6. My career steps: MSc in bioinformation technology from HUT / Aalto; PhD in bioinformatics and machine learning from Aalto; Data Scientist (consultant) at Reaktor; Data Scientist at Nightingale Health
  • 7. Data Science research: probabilistic models for biomedical problems. More on my research and other projects: https://ouzor.github.io/projects.html
  • 8. Data Science as a hobby: open tools for open data. Blogging; open source programming; Open Knowledge community. Blogs: https://louhos.github.io/, https://ouzor.github.io/
  • 9. Open data science example: Biking activity in Helsinki. How do various factors affect biking activity in Helsinki? Data sources: - Automatic bike activity counters from multiple sites - Weather data from FMI. Bike activity modelled with a Negative Binomial distribution using R (mgcv::gam). Done with Janne Sinkkonen and Antti Poikola. Data, code & results: https://github.com/apoikola/fillarilaskennat
  • 10. Open Data Science at Reaktor: Apartment price modelling. Kannattaakokauppa.fi by Reaktor: http://kannattaakokauppa.fi More about the model: https://ouzor.github.io/blog/2016/03/08/apartment-price-model.html
  • 12. Data Science in the vacuum: Typically starts with a clean data set and a clear (modelling) task. Example: Weather data in csv, and a goal to predict humidity. What might be different in the real world?
  • 13. Data Science Workflow in the Real World: 1. Identifying and defining the problem 2. Accessing data 3. Preprocessing and cleaning the data 4. Exploratory data analysis and visualisation 5. Statistical modelling or machine learning 6. End result. The steps are iterated rather than run once. Note the difference between academic interests and practical relevance!
  • 14. Identifying and defining the problem: Learn to be critical and ask good questions! • Why is this problem important? • How does solving this improve our user experience? • How does solving this improve our business? • Is the problem really something we should solve, or is it something where we happen to have data or methods available? • Do we even need to solve this problem!? Only after the problem is identified can you start thinking about data science: - Do we have relevant data to support solving the problem? - Can we use modelling to solve the problem (e.g. prediction or classification)?
  • 15. Accessing data: Data exists in a variety of sources and formats. A data scientist might need to access data from any of these in a reasonable time. Typical data sources: files, APIs, databases, web scraping. Typical data formats: - CSV, TSV, Excel - JSON (XML less nowadays) - Lots of strange structure in text files. Domain-specific formats: - Relational data (networks) - Spatial data - Gene expression, genomic data
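As a rough illustration of the point about formats on slide 15, the sketch below reads the same small table from CSV text and from JSON text into one common shape, using only the Python standard library. The column names and the values are made up for the example and do not come from the talk.

```python
import csv
import io
import json

# Hypothetical weather observations, inlined so the example runs without files.
CSV_TEXT = "time,temperature\n2019-11-25T12:00,1.5\n2019-11-25T13:00,2.0\n"
JSON_TEXT = (
    '[{"time": "2019-11-25T12:00", "temperature": 1.5},'
    ' {"time": "2019-11-25T13:00", "temperature": 2.0}]'
)


def records_from_csv(text):
    """Parse CSV text into a list of dicts, converting temperature to float."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"time": r["time"], "temperature": float(r["temperature"])} for r in rows]


def records_from_json(text):
    """Parse a JSON array of objects into the same list-of-dicts shape."""
    return [
        {"time": r["time"], "temperature": float(r["temperature"])}
        for r in json.loads(text)
    ]


csv_records = records_from_csv(CSV_TEXT)
json_records = records_from_json(JSON_TEXT)
print(csv_records == json_records)
```

Normalising every source into one record shape early keeps the later cleaning and modelling steps independent of where the data came from.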
  • 16. Example: Weather data from WFS API: http://opendata.fmi.fi/wfs?service=WFS&version=2.0.0&request=getFeature&storedquery_id=fmi::forecast::hirlam::surface::point::multipointcoverage&place=helsinki&
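A request URL like the one on slide 16 can be assembled from named parameters instead of being typed by hand. The sketch below uses only the parameters visible on the slide, whose URL ends in a trailing "&" and so may have carried more. It makes no network call, and the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://opendata.fmi.fi/wfs"

# Query parameters copied from the slide; nothing beyond them is assumed.
params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "getFeature",
    "storedquery_id": "fmi::forecast::hirlam::surface::point::multipointcoverage",
    "place": "helsinki",
}

# safe=":" keeps the "::" separators readable instead of percent-encoding them.
url = BASE_URL + "?" + urlencode(params, safe=":")
print(url)

# Parsing the URL back shows which parameters a given request actually carries.
parsed = parse_qs(urlsplit(url).query)
print(parsed["place"])
```

Keeping the parameters in a dictionary makes it easy to change one of them, such as the place, without touching the rest of the query.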
  • 17. Be very careful with Excel data formatting!
  • 18. Preprocessing and cleaning (”wrangling” / ”munging”): “Tidy datasets are all alike, but every messy dataset is messy in its own way.” (Hadley Wickham) Having data in a tidy format makes data analysis, visualisation and modelling easier. Data frames in R and Python. Read more about tidy data: https://r4ds.had.co.nz/tidy-data.html
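To make the tidy-data idea on slide 18 concrete, here is a minimal sketch that reshapes a "wide" table (one column per year) into a tidy one (one row per observation). It is written in plain Python rather than with a data frame library. The site names, years and counts are invented, and `to_tidy` is a hypothetical helper, not a library function.

```python
# Hypothetical "wide" table: one row per site, one column per year.
wide = [
    {"site": "A", "2017": 120, "2018": 135},
    {"site": "B", "2017": 80, "2018": 95},
]


def to_tidy(rows, id_key, var_name, value_name):
    """Turn every non-id column into its own (id, variable, value) row."""
    tidy_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue
            tidy_rows.append({id_key: row[id_key], var_name: key, value_name: value})
    return tidy_rows


tidy = to_tidy(wide, id_key="site", var_name="year", value_name="count")
for row in tidy:
    print(row)
```

In the tidy form each variable (site, year, count) has its own column, which is the layout that grouping, plotting and modelling tools generally expect.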
  • 19. Exploratory Data Analysis and Visualisation: The goal of Exploratory Data Analysis is to get to know your data, using visual summaries and computing descriptive statistics. Includes identifying missing data, outliers and other possible problems with the data. This informs preprocessing and cleaning, and typically needs a couple of iterations before the data is ready for analysis. You should also contact domain experts and confirm that the data looks as it should. It’s hard to define when the data is really ”clean”. You will develop an instinct for this over time.
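A first pass at the checks described on slide 19 might look like the sketch below. It counts missing values, computes a few descriptive statistics and flags outliers with the common 1.5 × IQR rule. The humidity readings are invented, and the IQR rule is one possible convention, not something the talk prescribes.

```python
import statistics

# Hypothetical humidity readings (%); None marks a missing value.
humidity = [81.0, 79.5, None, 80.2, 82.1, 15.0, 80.8, None, 79.9]


def summarise(values):
    """Return missing-value count, basic statistics and IQR-rule outliers."""
    present = [v for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": median,
        "outliers": [v for v in present if v < low or v > high],
    }


summary = summarise(humidity)
print(summary)
```

Whether a flagged value such as 15.0 is a sensor fault or a real observation is the kind of question to take to a domain expert before removing anything.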
  • 20. Statistical modelling and machine learning: Modelling is one way to reach a goal in data analysis, not a goal in itself. Pick a suitable method based on your goals, not the other way around! Start with simple methods, add complexity gradually, if needed. You can get pretty far with linear or logistic regression.
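As an example of the "start with simple methods" advice on slide 20, the sketch below fits a one-variable linear regression by ordinary least squares in plain Python. It reuses the earlier humidity-prediction scenario, but the numbers are invented and chosen to lie exactly on a line, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: temperature (°C) and relative humidity (%).
temperature = [0.0, 5.0, 10.0, 15.0, 20.0]
humidity = [90.0, 85.0, 80.0, 75.0, 70.0]

slope, intercept = fit_line(temperature, humidity)
predicted = intercept + slope * 12.0  # predicted humidity at 12 °C
print(slope, intercept, predicted)
```

A baseline like this takes minutes to build and gives a reference point against which any more complex model has to justify itself.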
  • 21. End result: The end result of a data science project can be many things, such as - A single figure describing the association of two variables - A comprehensive report for a client or business department - A machine learning product ready to be deployed into production. In most projects, it is important to write some kind of report or documentation of what has been done. Learning to communicate effectively is a very important skill for data scientists. This includes producing clear visual summaries of the main results, and using generally understandable language.
  • 22. Deploying Data Products: Data science is useful in creating insights, increasing understanding, and informing decision making. The biggest impact, however, comes from intelligent systems that operate automatically and continuously, such as recommendation engines. This typically means that data science products are deployed as part of larger software systems. Deploying your first data products can be frightening for data scientists with no programming background. Get support from software developers or data engineers!
  • 23. Data Science Tools: Some tips. Make everything reproducible and use version control! Tidyverse is a family of R packages that cover most of the data science workflow. Many similar tools exist for Python! Tidyverse: https://www.tidyverse.org/ R for Data Science: https://r4ds.had.co.nz/
  • 24. How to learn the Data Science Workflow? Data Science is an art: you only learn it by doing! • Pick challenging courses with large and realistic projects • Start a hobby project, for example using some open data set, and share the code and results (e.g. GitHub) • Participate in competitions and challenges • Tidytuesdays: https://github.com/rfordatascience/tidytuesday • Kaggle: https://www.kaggle.com/ Learning a proper Data Science Workflow will help you produce reliable results in a reasonable time. This will benefit your career regardless of whether you work in academia, industry, or somewhere else.
  • 25. Data Science and Business
  • 26. Agile Data Science: Any sufficiently interesting problem has more than one ”correct” answer. You can spend anything between 2 hours and a PhD on a single problem. Try to recognize how much effort each problem is worth. You can often get a satisfactory solution with 20% of the effort compared to a ”perfect” solution. Learn to fail fast. Sometimes data science solutions do not work, and it’s good to realise this as soon as possible. Adopting agile software development practices helps! Agile Data Science with R: https://edwinth.github.io/ADSwR/index.html
  • 27. Data Science in a Team: No single person can master every possible data science skill. Data scientists work effectively in teams, with complementary skill sets and backgrounds. When looking for your first job as a data scientist, look for places where there are senior people who can help you learn and grow as a data scientist.
  • 28. Data Science Use Cases
  • 29. Data Science as part of a Product or Project: Data Science is typically only a small part of the larger Product or Project. It is important to know what the overall goal is, and to adjust data science development towards that. You need to collaborate with other people, such as designers, software developers, marketing and sales people, customers, etc.
  • 30. Some takeaway notes: Data Science is an art: you only learn it by doing. Find ways to continuously learn and practice your skills, with e.g. hobby projects or competitions. Finding a problem worth solving is hard. There is never a single correct solution. Curiosity and critical thinking are invaluable!
  • 31. Thank you! Juuso Parkkinen, PhD - @ouzor Head of Data Science, Nightingale Health - @NgaleHealth www.nightingalehealth.com