FROM PHYSICS TO DATA SCIENCE
Martina Pugliese
17 December 2015
Scotland Data Science & Technology
An outline of what we will discuss
THE PARTS ABOUT ME, MY JOB, MY BACKGROUND
WHAT (I LEARNED) IT MEANS TO DO DATA SCIENCE
WHAT IS DATA SCIENCE AND ITS (AMBIGUOUS) RELATIONSHIP TO RESEARCH
WHO AM I?
Why am I here?
What do I want?
THE BORING BACKGROUND
➤ I did a Bachelor’s degree in Physics
I thought I wanted to do particle physics
➤ Then I did a Master’s degree in Physics (Statistical Mechanics)
I studied the evolution of the influenza virus
[Figure: plots from the influenza simulations; axis labels S, E, βM, pM0.55]
Numerical model (using a genetic
algorithm) simulating how
the pathogen creates new variants
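To give a flavour of what a genetic algorithm does, here is a minimal sketch in Python. It is not the influenza model itself: the bit-string genomes, the "count the ones" fitness, the function name `evolve` and all its parameters are assumptions made up for illustration. Variants appear through random mutation, and selection keeps the fitter ones.

```python
import random


def evolve(genome_length=20, population_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    """Toy genetic algorithm: bit-string genomes under mutation and selection.

    Returns the best fitness found in each generation.
    All names and parameters are hypothetical, chosen for illustration only.
    """
    rng = random.Random(seed)
    # Illustrative fitness: the number of 1 bits in the genome.
    fitness = sum
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:population_size // 2]
        best_per_generation.append(fitness(parents[0]))
        # Reproduction: each child copies a random parent,
        # flipping each bit with probability mutation_rate (new variants).
        children = []
        while len(parents) + len(children) < population_size:
            parent = rng.choice(parents)
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in parent]
            children.append(child)
        population = parents + children
    return best_per_generation


if __name__ == "__main__":
    history = evolve()
    print("best fitness, first vs last generation:", history[0], history[-1])
```

Because the parents survive into the next generation, the best fitness can never decrease in this sketch; a realistic pathogen model would use a very different fitness landscape.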
THE BORING BACKGROUND
➤ Then I did a PhD in Physics
I explored how natural language evolves over time
[Figure: four panels, one per verb (burn, dwell, hide, sing); axis labels I and fsum]
Verbs changing inflection in time:
hide became irregular
sing stayed irregular
burn stayed regular
dwell oscillates
Data Mining & Simulations
THE BORING BACKGROUND
➤ I wanted a job in industry, as a Data Scientist, so …
I did a bootcamp in London, S2DS, working on a
commercial DS problem [1]
Physics gave me:
the ability to model reality
(mathematically)
a brain trained to deal with data
ideas about many more things to study
the scientific method to carry out
experiments
DATA SCIENCE
Trend or Hype?
what do you mean by “science”?
“The key word in ‘Data Science’ is not Data, it is Science.”
-Jeff Leek
DATA SCIENCE: A BABY COME OF AGE?
NGram Viewer data
There’s lots of talk these days about several buzzwords containing “data”
But the science of extracting information out of raw data
is much older than some think
A WEE BIT OF HISTORY
➤ The ‘60s: Data Analysis bashfully starts branching out of Statistics as an
empirical science [1]
➤ The ‘70s: Establishing the idea of converting data into knowledge
➤ The ‘80s: G. Piatetsky-Shapiro founds the KDD (Knowledge Discovery in
Databases) conferences
➤ The ‘90s: companies have lots of data on customers! The term Data Science is
first used in a conference name [2]
➤ The 2000s: Academic endeavours to define the field [3]
Statistical models (the “irrelevant theory”) vs. Algorithms
➤ The 2010s: the BOOM!
The “sexiest job of the 21st century” [4]
Big Data is the new innovation [5]
Growth in “analytics” and Data Science educational programs [6]
Data Science in Business should be called “Decision Science” [7]
But today, this is what’s happening:
[Intel, What happens in an Internet Minute? 2012]
So industry came to need (many more) specialised people who can
understand this dirty, varied, large data and leverage it to provide solutions
The data we agree to give to services we use
(social networks, apps …) is used to sell us
tailored experiences
There is a saying in Italian which goes
(translated) as:
“I know you like my pockets”
It should now become something like “I know you like your phone”
Where to get all these people from?
DS academic programs on the rise
Research ???
PhD?
No thanks…or maybe Yes
The ugly fact: research has no room for all PhD graduates
Growth of PhD graduates in S&E fields in time
vs. growth of research positions [8]
The academic bottleneck comes after the PhD
PhDs do not have real “transferable” skills (The Economist, [8])
Is this alone a reason to move PhD graduates into industry?
NO
A PhD is an academic qualification
It is meant to train people for research
And for the new challenges ahead,
we need lots of scientists
to study new solutions
climate change
ageing of population
sustainable energy sources
the human brain
data science algorithms
…
Does it mean access to PhD programs should change?
MAYBE
Can we suggest Academia and industry should cooperate more?
CERTAINLY
Google cooperates with (and hires from) Academia a lot
They’re shaping the innovation landscape
They’re contributing to “traditional” academic research (Quantum Annealing, [9])
They’re pushing the current borders of AI (deep learning, anyone?)
Considering them as separate worlds does not help
THE (OBVIOUS) DISADVANTAGES OF A PHD GRADUATE
➤ The “overqualified and inexperienced” curse
➤ The “age” and “expectations” problems (www.phdcomics.com)
THE (NOT-SO-OBVIOUS) ADVANTAGES OF A PHD GRADUATE
➤ Research trains you to sustain and cope with failure
➤ You know how to quickly learn new stuff alone (I’d argue this is the best skill to have today)
➤ You have a long history of communicating your findings
THE STUFF YOU (DEFINITELY) NEED
hints on where to find it
I believe the most important skill
one needs in this role is the ability to
learn quickly, together with a passion for doing so
BUT PRACTICALLY SPEAKING…
➤ Mathematics & Statistics foundations
This is the brain training you need to understand it all. Listing everything you need
wouldn’t make sense, but in short:
Linear Algebra (matrix operations)
Probability Theory, the concepts
Graph Theory, the concepts
Be proficient with Calculus and Mathematical Methods
Statistical Tests and Techniques
…
➤ Machine Learning
You need to be able to work through an algorithm with pen and paper, otherwise you are just pushing a button
in an ML library. With practice you learn which algorithm to choose for which problem and how to assess its performance.
As for libraries, it depends, but scikit-learn is great and very well documented, including the maths
behind the algorithms, so it’s a great resource.
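As an example of an algorithm small enough to follow with pen and paper, here is a minimal sketch of k-nearest-neighbours classification in plain Python, scored on held-out data. Everything in it is an assumption for illustration: the choice of k-NN, the synthetic two-cloud data set, and the helper names `knn_predict`, `make_blobs` and `holdout_accuracy`. In practice a library such as scikit-learn would do this job.

```python
import random
from collections import Counter


def knn_predict(train, point, k=3):
    """Classify `point` by majority vote among its k nearest training points."""
    # Squared Euclidean distance is enough for ranking neighbours.
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], point)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def make_blobs(n_per_class=50, seed=0):
    """Synthetic data: two Gaussian clouds centred at (0, 0) and (4, 4)."""
    rng = random.Random(seed)
    data = []
    for label, centre in enumerate([(0.0, 0.0), (4.0, 4.0)]):
        for _ in range(n_per_class):
            features = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
            data.append((features, label))
    rng.shuffle(data)
    return data


def holdout_accuracy(data, test_fraction=0.3, k=3):
    """Hold out the first `test_fraction` of the data, train on the rest."""
    n_test = int(len(data) * test_fraction)
    test, train = data[:n_test], data[n_test:]
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / n_test


if __name__ == "__main__":
    print("hold-out accuracy:", holdout_accuracy(make_blobs()))
```

Scoring on points the model has never seen is the simplest way to assess performance; scoring on the training points themselves would flatter any algorithm.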
BUT PRACTICALLY SPEAKING…
➤ Programming
It’s essential to code quickly and produce reusable, robust scripts.
I have a thing for Python.
I also use R sometimes for stats analyses.
Proficiency with shell commands saves a lot of time
Numerical simulations: something like C++ is very useful
Basics of web development and of the software development process
➤ Data visualisation tools
Visualisations help you and others around you understand information
I use Python libraries for simple things, but the beauty of D3 is unbeatable
➤ Big Data Technologies
This is the bit there’s lots of talk about these days. Having analytical skills also means you
can learn the technologies (Hadoop/Spark/Mahout…) with practice.
RIGHT, BUT WHAT EXACTLY DO YOU DO?
tell me about your job!
Mallzee is the fashion app for everyone
You swipe a product right (like)
or left (dislike)
You can create your own style feeds
You can search for specific products
and favourite brands
You can buy products
We have millions of “swipes” plus user data
WHAT I DO IN MY JOB
Follow the DS mantra:
Data pre-processing (takes a long time… [8])
Exploratory Analyses
Model
Model Validation
Product Insights
Along the way: produce visualisations, produce software
THE ROLE CONSISTS OF SEVERAL THINGS
Understand user behaviour
in all parts of the app
Predict what drives
retention/usage
Analyse numerical data on swipes
to see what’s hot this season
Improve product with
tailored-to-you features
Computer Vision to see which
image features perform best,
for what sorts and for whom
Measure all indicators
across the business
Recommendations
THE REFERENCES
➤ [1] Something I wrote for S2DS
➤ [1] Tukey, The Future of Data Analysis
➤ [2] Data Science, Classification and related methods, Kobe, Japan, 1996
➤ [3] Leo Breiman, Statistical Modeling, the Two Cultures
➤ [4] HBR, Data Scientist: The Sexiest Job of the 21st Century
➤ [5] McKinsey, Big Data, the next frontier for innovation
➤ [6] KDNuggets, the boom in analytics education
➤ [7] TechCrunch, Why Decision Science matters
➤ [8] Nature Biotechnology, The missing piece to changing the university culture
➤ [8] The Economist, the disposable academic
➤ [9] What is the computational value of finite range tunnelling?
➤ [8] NY Times, the "Janitor work" is key hurdle for insight
➤ [8] M. Loukides, What is Data Science?
➤ [9] The Edison European Project
Thanks!
… and a special thanks to W. Kandinsky
