What is Data Science
Looking for an objective, complete, inclusive, accurate and succinct definition of this emerging field
Ioannis Kourouklides
www.kourouklides.com
Contents
• Introduction
• History
• Related terms
• Definitions by various individuals
• Domain expertise
• Data Science in the job market
• How Data Scientists are self-defined
• Summary
• Conclusion
• References & Bibliography
Introduction
• In a Forbes article, Gil Press (2013), among others, acknowledges that Data Science (DS) is a buzzword without a clear definition
• A quick search of online and print resources confirms this lack of a definition
• Several people and companies have expressed their own opinions on the matter
• Nonetheless, most definitions overlap with each other
• Data Science is not concerned with everything that has to do with data
• A brief look at the recent history can give more insight
• The proper (concrete) definition of this science would have to come from industry rather than academia, and it might keep evolving over time
History
• The term “Data Science” has been around for more than 30 years
• It has not always had the same meaning, but it has gained traction since then
• Gil Press (2013) authored an article about the evolution of the term
• 1966: Peter Naur used the term “Science of Data” interchangeably with “Datalogy” as a synonym for Computer Science in his courses (Naur, 1968)
• 1974: Naur published the book ‘Concise Survey of Computer Methods’, a survey of modern data processing methods
• 1989: Gregory Piatetsky-Shapiro organized and chaired the first Knowledge Discovery in Databases workshop. In 1995, it became the annual ACM Conference on Knowledge Discovery and Data Mining (KDD).
History
• 1996: The International Federation of Classification Societies (IFCS) used the term “Data Science” for the first time in the title of its conference (“Data science, classification, and related methods”)
• 1997: C.F. Jeff Wu gave his inaugural lecture entitled ‘Statistics = Data Science?’ (“Identity of statistics in science examined,” 1997)
• 2001: William S. Cleveland published ‘Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics’
• 2002: Launch of the ‘Data Science Journal’ by CODATA of ICSU
• 2003: Launch of the ‘Journal of Data Science’ by Columbia University
• 2005: The National Science Board defined what a Data Scientist is
• 2007: Nathan Yau wrote about the “Rise of the Data Scientist”
Related terms
• But let’s look at some related (possibly overlapping) terms:
• Machine Learning
• Data Mining
• Predictive Analytics
• Statistics
• Big Data
• Data Analysis
• Business Intelligence
• Data Engineering
• Business Analytics
• Knowledge Discovery in Databases
• For a comparison of these terms with Data Science: http://goo.gl/uW15El
Definition by M. Loukides
• Loukides (2010) wrote an article titled ‘What is data science?’
• “Data science requires skills ranging from traditional computer science to mathematics to art.”
• “Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions.”
• This is not a very precise definition, but it is insightful enough
• He also highlighted the industry’s perspective and the rising job trends
Definition by D. Conway
• Conway (2010) gave a less vague definition:
“…one needs to learn a lot as they aspire to become a fully competent data scientist. Unfortunately, simply enumerating texts and tutorials does not untangle the knots. Therefore, in an effort to simplify the discussion, and add my own thoughts to what is already a crowded market of ideas, I present the Data Science Venn Diagram… hacking skills, math and stats knowledge, and substantive expertise.”
Definition by P. Warden
• Another description of DS (Warden, 2011) is the following:
• “There is no widely accepted boundary for what’s inside and outside of data science’s scope. Is it just a faddish rebranding of statistics? I don’t think so, but I also don’t have a full definition. I believe that the recent abundance of data has sparked something new in the world, and when I look around I see people with shared characteristics who don’t fit into traditional categories. These people tend to work beyond the narrow specialties that dominate the corporate and institutional world, handling everything from finding the data, processing it at scale, visualizing it and writing it up as a story. They also seem to start by looking at what the data can tell them, and then picking interesting threads to follow, rather than the traditional scientist’s approach of choosing the problem first and then finding data to shed light on it.”
Definition by J. Wills
Definition by B. Tierney
Definition by F. Lo
• When searching online for the phrase ‘define data science’, an excellent article (Lo, 2013) appears as the answer suggested/endorsed by Google
• “Data science is multidisciplinary; the skill set of a data scientist lies at the intersection of 3 main competencies.”
• “Also, a big misconception is that data science is all about statistics. While statistics are important, it is not the only type of mathematics that should be well-understood by a data scientist.”
• “A defining personality trait of data scientists is they are deep thinkers with intense intellectual curiosity.”
Definition by M. Mut
• Mut (2013) went a step further and classified Data Scientists into 3 distinct specialties with very little overlap:
• “Advanced Analysis: Math, Stats, Pattern Recognition/Learning, Uncertainty, Visualization, Data Mining” – let’s call them Data Researchers
• “Computer Systems - Advanced Computing, High Performance Computing, Visualization, Data Mining” – let’s call them Data Hackers
• “Databases - Data Engineering, Data Warehousing” – let’s call them Data Developers
• He argued that DS is defined to include all these specialties, which makes life confusing for employers and applicants
• He proposed that a solution would be to educate HR departments and employers that they need to break DS down into specialties
Definition by V. Granville
• However, Granville (2014) and others disagreed with Mut. They maintained that combining these different areas is not impossible, and they forecast that in the future there will be more overlap of skills within individuals
• In his book ‘Developing Analytic Talent: Becoming a Data Scientist’ he seems to provide the most convincing and consistent definition:
• “Data Science is the intersection of computer science, business engineering, statistics, data mining, machine learning, operations research, Six Sigma, automation and domain expertise.”
• “… people interested in a data science career don’t need to learn […] everything from these domains.”
Domain expertise
• Domain expertise and business acumen are absolutely essential for DS
• The relevant domain depends on the kind of data and their source, such as:
• Bioinformatics & Genomics
• Information Security
• Computer Vision & Image Processing
• Finance & Econometrics
• Insurance
• Marketing
• Medicine, Health & Biomedical applications
• Particle Physics
• Social Networks
• Telecoms & Utilities
• Web & Text Mining
Data Science in the job market
• Data Scientist roles can be referred to by various names, according to the seniority level, the specific skillset and the area of expertise
• Frequently required skills are:
• Hadoop/MapReduce/MongoDB/Hive (not always necessary; sometimes a plus)
• SQL (though less popular than NoSQL)
• Perl/Java/PHP/.NET/Ruby/C++
• Machine Learning techniques
• Python/R/MATLAB/Octave/SPSS/SAS/Stata/Mathematica
• Advanced degree: MSc or PhD
• Work experience (typically more than 1-3 years)
• Communication skills
How Data Scientists are self-defined
• Harris et al. (2013) identified four clusters (latent factors) of Data Scientists in their book, using Non-negative Matrix Factorization:
• Three of these clusters overlap with the specializations described by Mut (2013)
• The fourth refers mostly to CDOs (Chief Data Officers), who self-identified as Leaders, Businesspersons, or Entrepreneurs
Data Researcher: Researcher, Scientist, Statistician
Data Hacker: Hacker, Artist, Jack of All Trades
Data Developer: Developer, Engineer
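The slide only names the technique, so the following is a minimal sketch of how Non-negative Matrix Factorization can turn survey answers into latent factors, assuming the standard Lee-Seung multiplicative updates for squared error. It is not Harris et al.'s actual procedure or data: the function name `nmf` and the respondent-by-skill matrix `V` are hypothetical, illustrative choices. Each row of `W` holds one respondent's weights on the `k` factors, each row of `H` describes a factor in terms of the skills, and a respondent can be assigned to the factor with the largest weight.

```python
import numpy as np


def nmf(V, k, n_iter=1000, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (m x n) by W @ H, where
    W (m x k) and H (k x n) are non-negative.

    Uses Lee-Seung multiplicative updates, which (up to the small eps
    term) do not increase the squared Frobenius error ||V - W H||^2.
    """
    V = np.asarray(V, dtype=float)
    if np.any(V < 0):
        raise ValueError("V must be non-negative")
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Strictly positive random starting point.
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so entries
        # never become negative; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical survey: 6 respondents (rows) rating 6 skills (columns)
# from 0 to 5. Synthetic numbers for illustration only.
V = np.array([
    [5, 4, 5, 0, 1, 0],
    [4, 5, 4, 1, 0, 0],
    [5, 5, 4, 0, 0, 1],
    [0, 1, 0, 4, 5, 5],
    [1, 0, 0, 5, 4, 4],
    [0, 0, 1, 5, 5, 4],
], dtype=float)

W, H = nmf(V, k=2)
labels = W.argmax(axis=1)  # dominant latent factor for each respondent
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("factor per respondent:", labels)
print("relative reconstruction error:", round(float(rel_err), 3))
```

In practice a library implementation, such as scikit-learn's `NMF` class, would usually be preferred; the hand-written loop only makes the update rule explicit.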
How Data Scientists are self-defined
• The three specializations have started to emerge as three job positions:
Data Researcher: Data Scientist
Data Hacker: Machine Learning Engineer
Data Developer: Data Engineer
• Nothing stops a person who studied Science from becoming a Data Developer or Data Hacker, and nothing stops a person who studied Engineering from becoming a Data Researcher
• Thus, it is the author’s belief that the terms ‘Scientist’ and ‘Engineer’ should not have been used, as they are misleading
Summary
• In brief, one can split the skills defining DS into three groups:
Soft skills
• Communication
• Business knowledge
• Domain expertise
Knowledge & Research skills
• Machine Learning – Data Mining
• Statistics & other Maths
• Relational Databases
• High Performance Computing
• Data Visualization
Coding skills
• Perl/Java/C#/PHP/Ruby/C++
• Python/R/MATLAB/Octave
• SPSS/SAS/Stata/Mathematica
• Hadoop/MongoDB/Hive
• SQL/JSON/XML/HTML/CSS
Note: The three groups are independent; an item in one group does not correspond to any item in the other groups
Conclusion
• To sum up, DS is an interdisciplinary science, but one without a clear definition
• It can be defined as a set of skills from Computer Science, Statistics, …
• It definitely requires some research qualities, but also domain expertise
• Machine Learning is at the epicentre of this newly coined term
• Different Data Scientists used to focus on, or specialize in, one area of expertise
• It is the author’s belief that future Data Professionals will fall into three distinct specializations, similar to Quantitative Professionals, i.e. Quant Researchers, Quant Traders and Quant Developers, corresponding to Data Scientists, Machine Learning Engineers and Data Engineers respectively
• More resources can be found on the next slides
References & Bibliography
1. Press, G. (2013). http://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/
2. Naur, P., “'Datalogy', the science of data and data processes.” IFIP Congress 2, 1968, pp. 1383-1387.
3. "Identity of statistics in science examined". The University Record, 9 November 1997, The University of Michigan. http://ur.umich.edu/9899/Nov09_98/4.htm Retrieved 8 August 2014.
4. Cleveland, W. S. (2001). "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics". International Statistical Review / Revue Internationale de Statistique 69 (1).
5. Loukides, M. (2010). http://radar.oreilly.com/2010/06/what-is-data-science.html
References & Bibliography
• http://www.theguardian.com/news/datablog/2012/mar/02/data-scientist#_
• http://www.indeed.com/trendgraph/jobgraph.png?q=%22data+scientist%22
• http://www.indeed.com/trendgraph/jobgraph.png?q=%22machine+learning%22
• http://blogs.nature.com/naturejobs/2013/03/18/so-you-want-to-be-a-data-scientist
• http://www.forbes.com/sites/rawnshah/2014/01/16/revealing-data-sciences-job-potential/
• http://www.kdnuggets.com/2014/03/data-scientist-right-career-path-candid-advice.html
• http://www.oreilly.com/data/free/analyzing-the-analyzers.csp?goback=%2Egde_2013423_member_254847898
• http://www.kddi-ri.jp/download/report/RA2014006

9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 

What is Data Science

  • 1. What is Data Science
    Looking for an objective, complete, inclusive, accurate and succinct definition of this emerging field
    Ioannis Kourouklides
    www.kourouklides.com
  • 2. Contents
    • Introduction
    • History
    • Related terms
    • Definitions by various individuals
    • Domain expertise
    • Data Science in the job market
    • How Data Scientists are self-defined
    • Summary
    • Conclusion
    • References & Bibliography
  • 3. Introduction
    • In a Forbes article, Gil Press (2013), among others, admits that Data Science (DS) is a buzzword without a clear definition
    • A quick search of online and print resources confirms this lack of a definition
    • Several people and companies have expressed their own opinions on the matter
    • Nonetheless, most definitions overlap with each other
    • Data Science is not concerned with everything that has to do with data
    • A brief look at its recent history can give more insight
    • The proper (concrete) definition of this science would have to come from industry rather than academia, and it may keep evolving over time
  • 4. History
    • The term "Data Science" has been around for more than 30 years
    • Its meaning has not always been the same, but the term has gained traction since then
    • Gil Press (2013) wrote an article about the evolution of the term
    • 1966: Peter Naur used the term "Science of Data" interchangeably with "Datalogy" as a synonym for Computer Science in his courses (Naur, 1968)
    • 1974: Naur published the book 'Concise Survey of Computer Methods', a survey of modern data processing methods
    • 1989: Gregory Piatetsky-Shapiro organized and chaired the first Knowledge Discovery in Databases workshop. In 1995, it became the annual ACM Conference on Knowledge Discovery and Data Mining (KDD).
  • 5. History
    • 1996: The International Federation of Classification Societies (IFCS) used the term "Data Science" in a conference title for the first time ("Data science, classification, and related methods")
    • 1997: C.F. Jeff Wu gave his inaugural lecture entitled 'Statistics = Data Science?' ("Identity of statistics in science examined," 1997)
    • 2001: William S. Cleveland published 'Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics'
    • 2002: Launch of the 'Data Science Journal' by CODATA of ICSU
    • 2003: Launch of the 'Journal of Data Science' by Columbia University
    • 2005: The National Science Board defined what a Data Scientist is
    • 2009: Nathan Yau wrote about the "Rise of the Data Scientist"
  • 6. Related terms
    • Some related (possibly overlapping) terms are:
      • Machine Learning
      • Data Mining
      • Predictive Analytics
      • Statistics
      • Big Data
      • Data Analysis
      • Business Intelligence
      • Data Engineering
      • Business Analytics
      • Knowledge Discovery in Databases
    • For a comparison of these terms with Data Science: http://goo.gl/uW15El
  • 7. Definition by M. Loukides
    • Loukides (2010) wrote an article titled 'What is data science?'
    • "Data science requires skills ranging from traditional computer science to mathematics to art."
    • "Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions."
    • This is not a very precise definition, but it is insightful
    • He also highlighted the industry's perspective and the rapidly rising job trends
  • 8. Definition by D. Conway
    • Conway (2010) gave a less vague definition:
      "…one needs to learn a lot as they aspire to become a fully competent data scientist. Unfortunately, simply enumerating texts and tutorials does not untangle the knots. Therefore, in an effort to simplify the discussion, and add my own thoughts to what is already a crowded market of ideas, I present the Data Science Venn Diagram… hacking skills, math and stats knowledge, and substantive expertise."
  • 9. Definition by P. Warden
    • Another description of DS (Warden, 2011) is the following:
    • "There is no widely accepted boundary for what's inside and outside of data science's scope. Is it just a faddish rebranding of statistics? I don't think so, but I also don't have a full definition. I believe that the recent abundance of data has sparked something new in the world, and when I look around I see people with shared characteristics who don't fit into traditional categories. These people tend to work beyond the narrow specialties that dominate the corporate and institutional world, handling everything from finding the data, processing it at scale, visualizing it and writing it up as a story. They also seem to start by looking at what the data can tell them, and then picking interesting threads to follow, rather than the traditional scientist's approach of choosing the problem first and then finding data to shed light on it."
  • 11. Definition by B. Tierney
  • 12. Definition by F. Lo
    • Searching online for the phrase 'define data science' brings up an excellent article (Lo, 2013) as Google's suggested answer
    • "Data science is multidisciplinary; the skill set of a data scientist lies at the intersection of 3 main competencies."
    • "Also, a big misconception is that data science is all about statistics. While statistics are important, it is not the only type of mathematics that should be well-understood by a data scientist."
    • "A defining personality trait of data scientists is they are deep thinkers with intense intellectual curiosity."
  • 13. Definition by M. Mut
    • Mut (2013) went a step further and classified Data Scientists into 3 distinct specialties with very little overlap:
      • "Advanced Analysis: Math, Stats, Pattern Recognition/Learning, Uncertainty, Visualization, Data Mining" – let's call them Data Researchers
      • "Computer Systems - Advanced Computing, High Performance Computing, Visualization, Data Mining" – let's call them Data Hackers
      • "Databases - Data Engineering, Data Warehousing" – let's call them Data Developers
    • He argued that, because DS is defined to include all these specialties, it makes life confusing for employers and applicants
    • He proposed that the solution is to educate HR departments and employers that they need to break DS down into specialties
  • 14. Definition by V. Granville
    • However, Granville (2014) and others disagreed with Mut. They maintained that combining these different areas is not impossible, and they forecast that individuals will have more overlapping skills in the future
    • In his book 'Developing Analytic Talent: Becoming a Data Scientist', Granville provides what seems to be the most convincing definition, and one that conforms with most of the others:
    • "Data Science is the intersection of computer science, business engineering, statistics, data mining, machine learning, operations research, Six Sigma, automation and domain expertise."
    • "… people interested in a data science career don't need to learn […] everything from these domains."
  • 15. Domain expertise
    • Domain expertise and business acumen are essential for DS
    • The expertise required depends on the kind of data and its source, for example:
      • Bioinformatics & Genomics
      • Information Security
      • Computer Vision & Image Processing
      • Finance & Econometrics
      • Insurance
      • Marketing
      • Medicine, Health & Biomedical applications
      • Particle Physics
      • Social Networks
      • Telecoms & Utilities
      • Web & Text Mining
  • 16. Data Science in the job market
    • Data Scientist roles go by various names depending on the seniority level, the specific skill set and the area of expertise
    • Frequently required skills are:
      • Hadoop/MapReduce/MongoDB/Hive (not always necessary; sometimes listed as a plus)
      • SQL (though less popular than NoSQL)
      • Perl/Java/PHP/.NET/Ruby/C++
      • Machine Learning techniques
      • Python/R/MATLAB/Octave/SPSS/SAS/Stata/Mathematica
      • Advanced degree: MSc or PhD
      • Work experience (typically at least 1-3 years)
      • Communication skills
  • 17. Data Science in the job market
  • 18. How Data Scientists are self-defined
    • Harris et al. (2013) identified four clusters (latent factors) of Data Scientists in their book, using Non-negative Matrix Factorization:
    • Three of these specializations overlap with the ones described by Mut (2013)
    • The fourth refers mostly to CDOs (Chief Data Officers), who self-identify as Leaders, Businesspersons, or Entrepreneurs
    • Self-identification terms for the three overlapping specializations:
      Data Researcher: Researcher, Scientist, Statistician
      Data Hacker: Hacker, Artist, Jack of All Trades
      Data Developer: Developer, Engineer
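The sketch below is not the analysis from Harris et al. (2013); it is a minimal illustration of how Non-negative Matrix Factorization can turn survey answers into latent factors. It applies the classic Lee and Seung multiplicative-update rules to a small respondents-by-labels matrix. The matrix `SURVEY`, its `LABELS`, the choice of three factors, and the helper names are hypothetical toy values invented for this example.

```python
import random

# Hypothetical toy data, NOT the survey from Harris et al. (2013):
# rows are respondents, columns are self-identification scores.
LABELS = ["Researcher", "Statistician", "Hacker", "Artist", "Developer", "Engineer"]
SURVEY = [
    [5.0, 4.0, 0.0, 0.0, 0.0, 0.0],
    [2.5, 2.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 2.5, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 5.0, 4.0],
    [0.0, 0.0, 0.0, 0.0, 2.5, 2.0],
]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def nmf(v, rank, iterations=300, seed=0, eps=1e-9):
    """Approximate non-negative V (n x m) as W (n x rank) times H (rank x m).

    Uses Lee & Seung's multiplicative updates for the squared (Frobenius)
    error; eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), using the freshly updated H
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def reconstruction_error(v, w, h):
    """Sum of squared differences between V and W times H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


if __name__ == "__main__":
    w, h = nmf(SURVEY, rank=3)
    for k, row in enumerate(h):
        # The two labels that load most heavily on latent factor k
        top = sorted(range(len(LABELS)), key=lambda j: row[j], reverse=True)[:2]
        print(f"factor {k}:", [LABELS[j] for j in top])
    print("squared error:", round(reconstruction_error(SURVEY, w, h), 4))
```

Each row of `h` shows which labels load on a latent factor, and each row of `w` shows how strongly a respondent expresses each factor, which is the sense in which the clusters above are "latent factors". Multiplicative updates only guarantee a non-increasing error, not a global optimum, so the factors found may depend on the random seed.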
  • 19. How Data Scientists are self-defined
    • The three specializations have started to emerge as three job positions:
      Data Researcher → Data Scientist
      Data Hacker → Machine Learning Engineer
      Data Developer → Data Engineer
    • Nothing stops a person who studied Science from becoming a Data Developer or Data Hacker, and nothing stops a person who studied Engineering from becoming a Data Researcher
    • Thus, it is the author's belief that the terms 'Scientist' and 'Engineer' should not have been used in these job titles, as they are misleading
  • 20. Summary
    • In brief, the skills that define DS can be split into three groups:
      Soft skills: Communication; Business knowledge; Domain expertise
      Knowledge & Research skills: Machine Learning – Data Mining; Statistics & other Maths; Relational Databases; High Performance Computing; Data Visualization
      Coding skills: Perl/Java/C#/PHP/Ruby/C++; Python/R/MATLAB/Octave; SPSS/SAS/Stata/Mathematica; Hadoop/MongoDB/Hive; SQL/JSON/XML/HTML/CSS
    Note: The three groups are independent; an item in one group does not correspond to any particular item in another
  • 21. Conclusion
    • To sum up, DS is an interdisciplinary science, but one without a clear definition
    • It can be defined as a set of skills from Computer Science, Statistics, …
    • It requires research qualities, but also domain expertise
    • Machine Learning is at the heart of this newly coined term
    • Data Scientists have typically focused or specialized in one area of expertise
    • It is the author's belief that future Data Professionals will fall into three distinct specializations, similar to Quantitative Professionals: Quant Researchers, Quant Traders and Quant Developers correspond to Data Scientists, Machine Learning Engineers and Data Engineers, respectively
    • More resources can be found in the following slides
  • 22. References & Bibliography
    1. Press, G. (2013). "A Very Short History of Data Science." Forbes. http://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/
    2. Naur, P. (1968). "'Datalogy', the science of data and data processes." IFIP Congress 2, pp. 1383-1387.
    3. "Identity of statistics in science examined." The University Record, 9 November 1997, The University of Michigan. http://ur.umich.edu/9899/Nov09_98/4.htm (retrieved 8 August 2014).
    4. Cleveland, W. S. (2001). "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics." International Statistical Review / Revue Internationale de Statistique 69 (1).
    5. Loukides, M. (2010). "What is data science?" O'Reilly Radar. http://radar.oreilly.com/2010/06/what-is-data-science.html
  • 23. References & Bibliography
    • http://www.theguardian.com/news/datablog/2012/mar/02/data-scientist
    • http://www.indeed.com/trendgraph/jobgraph.png?q=%22data+scientist%22
    • http://www.indeed.com/trendgraph/jobgraph.png?q=%22machine+learning%22
    • http://blogs.nature.com/naturejobs/2013/03/18/so-you-want-to-be-a-data-scientist
    • http://www.forbes.com/sites/rawnshah/2014/01/16/revealing-data-sciences-job-potential/
    • http://www.kdnuggets.com/2014/03/data-scientist-right-career-path-candid-advice.html
    • http://www.oreilly.com/data/free/analyzing-the-analyzers.csp
    • http://www.kddi-ri.jp/download/report/RA2014006