Explains: What is Data Science? What is the difference between Data Science and Data Engineering, and between Data Science and Business Intelligence? What type of work do Data Scientists do, and what types of companies employ them? What is the job outlook for Data Science? What professional education is required?
2. What we will talk about
• Data Science:
• What is Data Science?
• What kind of work do Data Scientists do?
• Employment demand for Data Science jobs
• What kind of education is required?
• What is Data Engineering, and how does it differ from
Data Science?
• What is the difference between Data Science and
Business Intelligence?
3. Who am I?
• David Rostcheck
• I’m a consulting Data Scientist
• I have worked in various software
roles (software engineer, enterprise
architect, etc.)
• My degree is in Physics
• I write articles on Data Science on
http://linkedin.com/in/davidrostcheck
4. What is Data Science?
• Data Science is industrial research on a company’s own
data
• Goal: develop advanced algorithms that create a
competitive advantage
• Often works with unstructured data, which may be large
• “The qualifications for the job include the strength to
tunnel through mountains of information and the vision to
discern patterns where others see none”
- Bloomberg Businessweek
5. Is Data Science really science?
|             | Academic Science                             | Industrial Research                           |
| Teams       | PhDs, graduate students                      | PhDs, technologists                           |
| Setting     | University                                   | Company                                       |
| Publication | Formal (academic publications, conferences)  | Less formal (blogs, white papers, open source) |
| Funding     | Public grants                                | Corporate                                     |
| Goal        | Advance human knowledge                      | Create competitive advantage                  |
- Data Science is industrial science
- It shares some attributes with academic
science but differs in others
6. What kind of work do Data Scientists do?
• Create Artificially Intelligent systems (“narrow AI”)
• Examples:
• Recommender systems
• Self-driving cars
• AI agents
• Smart energy management
• Medical diagnosis
• Machine vision
7. Data Science is in Demand
• “The hot job of the decade… Data scientists today are
akin to Wall Street “quants” of the 1980s and 1990s”
- Harvard Business Review
• “18.7% projected growth 2010-2020”
- VentureBeat
• “McKinsey projects […] ‘50 percent to 60 percent gap
between supply and requisite demand’”
- Bloomberg Businessweek
8. On the other hand…
• Some people believe Data Science itself will be
automated
• “New Teradata Platform Reduces Demand For Data
Scientists”
- Forbes
• “Automating the Data Scientist”
- MIT Technology Review
9. What do I think?
• Yes, advanced tools will automate some data exploration
• But: research and communication (the fundamental skills)
are always in demand when the world is changing
• Data will continue to explode (Internet of Things)
• We will see more change and faster change
10. What is Data Engineering?
• Specialized type of software engineering
• Requires additional training in:
• Data (SQL, NoSQL, data visualization) and Big Data
(Hadoop, Apache Spark/Storm/Flink, cloud)
• Machine Learning algorithms and platforms (e.g., Dato)
• Predictive APIs (e.g., Watson)
• Linear Algebra & Calculus really help in understanding
Machine Learning
11. Data Engineering vs. Data Science
|                     | Data Science                      | Data Engineering          |
| Approach            | Scientific (Exploration)          | Engineering (Development) |
| Problems            | Unbounded                         | Bounded                   |
| Path to Solution    | Iterative, exploratory, nonlinear | Mostly linear             |
| Education           | More is better (PhDs common)      | BS and/or self-trained    |
| Presentation Skills | Important                         | Not as important          |
| Research experience | Important                         | Not as important          |
| Programming skills  | Not as important                  | Important                 |
| Data skills         | Important                         | Important                 |
12. Data Science vs. Business Intelligence
|               | Business Intelligence (BI)       | Data Science                                         |
| Data analysis | Yes                              | Yes                                                  |
| Statistics    | Yes                              | Yes                                                  |
| Visualization | Yes                              | Yes                                                  |
| Data Sources  | Usually SQL, often Data Warehouse | Less structured (logs, cloud data, SQL, NoSQL, text) |
| Tools         | Statistics, Visualization        | Statistics, Machine Learning, Graph Analysis, NLP    |
| Focus         | Present and past                 | Future                                               |
| Approach      | Analytic                         | Scientific                                           |
| Goal          | Better strategic decisions       | Advanced functionality                               |
The two fields are closely related. In some ways Data Science is an
evolution of BI.
13. What industries use Data Science?
• Now: Technology (employs over 50% of Data Scientists), Education,
Finance, Consulting, Health Care
• But: “Technology” companies like Uber, Amazon, Airbnb
compete in other industries (transportation, retail, hotels)
• “Software is eating the world” – Andreessen Horowitz
• What industries will AI change? Ultimately, all of them.
• Incorporating AI == large business opportunity
14. Education for Data Science/Engineering
• Academic programs
• Boot camps
• Online classes (Coursera & Udacity)
• For Data Engineering:
• Documentation and webinars (self-education)
• Focus on data manipulation tools and Machine Learning
• For Data Science:
• The more academic science and research expertise, the better
• Focus on projects that solve unknown problems
• Work with more experienced Data Scientists
16. “Big Data”
• Specialized technologies and techniques for working with very
large data sets
• Often too big to process with one computer – need clusters
and/or cloud computing
• Data may change rapidly
• Specialized tools: Map/Reduce, Apache
Hadoop/Spark/Storm/Flink, Elastic, etc.
• Large demand, but keep perspective:
• Big Data tools can be more awkward
• It is often easier to solve problems at small scale, then scale up, if
possible
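As a rough illustration of the Map/Reduce idea named above, and of solving a problem at small scale before scaling up, here is a minimal single-machine sketch in plain Python. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample text are illustrative assumptions; they are not part of the Hadoop or Spark APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a cluster framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a single machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample data, small enough to check by hand.
    sample = ["big data tools", "small data first", "then scale up"]
    print(word_count(sample))
```

On a cluster, the same three phases would run in parallel across many machines. Prototyping the logic this way on one computer is often simpler before reaching for Big Data tools.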
Editor's notes
Statistically model human behavior
Predict and respond to humans
Understand natural language and the natural world
Understand subtle patterns in big data
- Machine Learning is here to stay
On a large team, Data Science and Data Engineering are separate roles
On a small team, a Data Scientist must do at least some of their own Data Engineering
The roles are new and not strictly defined. Today, one role is often called by the other’s name.