Data science involves using industrial research techniques on a company's own data to develop advanced algorithms that provide a competitive advantage. Data engineering is a specialized form of software engineering focused on handling and processing data using skills in areas like structured and unstructured data storage, machine learning platforms, and predictive APIs. While data science and business intelligence overlap in using data analysis, statistics, and visualization, data science has a more scientific approach focused on the future rather than the past. Data-focused jobs are in high demand across many industries, especially technology, but some roles may become automated, increasing the value of skills like research and communication. Education options for these fields include academic programs, boot camps, and online classes.
21. “The qualifications for the job include the strength to tunnel through mountains of information and the vision to discern patterns where others see none”
- Bloomberg Businessweek
23. let’s compare…
            | academic science                            | data science
Teams       | PhDs, graduate students                     | PhDs, technologists
Setting     | University                                  | Company
Publication | Formal (academic publications, conferences) | Less formal (blogs, white papers, open source)
Funding     | Public grants                               | Corporate
Goal        | Advance human knowledge                     | Create competitive advantage
24. Data science is industrial science
It shares some attributes with academic science but differs in others
30. data engineering is a specialized kind of software engineering with additional skills in handling and processing data
31. data science vs. data engineering
                    | data science                      | data engineering
Approach            | Scientific (Exploration)          | Engineering (Development)
Problems            | Unbounded                         | Bounded
Path to Solution    | Iterative, exploratory, nonlinear | Mostly linear
Education           | More is better (PhDs common)      | BS and/or self-trained
Presentation skills | Important                         | Not as important
Research experience | Important                         | Not as important
Programming skills  | Not as important                  | Important
Data skills         | Important                         | Important
32. What kind of special training does a data engineer need?
33. Data storage and processing
– structured (SQL)
– unstructured (NoSQL)
– Big Data (Hadoop, Apache Spark/Storm/Flink, cloud)
Data visualization
Machine learning algorithms and platforms (e.g., Dato)
Predictive APIs (e.g., Watson)
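A minimal sketch of the structured vs. unstructured distinction above, using only the Python standard library. The sample records, field names, and function names are hypothetical and chosen for illustration; an in-memory SQLite table stands in for a SQL database, and a list of JSON strings stands in for a NoSQL document store.

```python
import json
import sqlite3

# Hypothetical sample records; the field names are illustrative only.
events = [
    {"user": "ana", "action": "click", "meta": {"page": "/home"}},
    {"user": "bob", "action": "buy", "meta": {"sku": "A1", "qty": 2}},
    {"user": "ana", "action": "buy", "meta": {"sku": "B7", "qty": 1}},
]


def count_actions_sql(records):
    """Structured path: fixed columns in a relational table, queried with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user TEXT, action TEXT)")
    conn.executemany(
        "INSERT INTO events (user, action) VALUES (?, ?)",
        [(r["user"], r["action"]) for r in records],
    )
    rows = conn.execute(
        "SELECT action, COUNT(*) FROM events GROUP BY action ORDER BY action"
    ).fetchall()
    conn.close()
    return dict(rows)


def total_quantity_docs(records):
    """Unstructured path: schema-free JSON documents; a field is read only if present."""
    docs = [json.dumps(r) for r in records]  # stand-in for a document store
    return sum(json.loads(d)["meta"].get("qty", 0) for d in docs)


if __name__ == "__main__":
    print(count_actions_sql(events))
    print(total_quantity_docs(events))
```

The SQL path needs the schema declared up front, while the document path tolerates records whose "meta" fields differ from one record to the next. That trade-off is one reason a data engineer is expected to know both styles.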
34. Does a data engineer need more math than a regular software engineer?
45. Ultimately, all of them.
Incorporating AI is a large business opportunity
46. data jobs are in demand
• “The hot job of the decade… Data scientists today are akin to Wall Street ‘quants’ of the 1980s and 1990s”
- Harvard Business Review
• “18.7% projected growth 2010-2020”
- VentureBeat
• “McKinsey projects […] ‘50 percent to 60 percent gap between supply and requisite demand’”
- Bloomberg Businessweek
47. On the other hand…
Some people believe data jobs themselves will be automated:
“New Teradata Platform Reduces Demand For Data Scientists”
- Forbes
“Automating the Data Scientist”
- MIT Technology Review
48. What do we think?
• Yes, advanced tools will automate some data exploration
• But research and communication are fundamental skills, and they are always in demand when the world is changing
• Data will continue to explode (Internet of Things)
• We will see more change and faster change
49. education for data jobs
options include:
academic programs,
boot camps,
and online classes (Coursera, Udacity)
50. for data engineering:
– documentation and webinars (self-education)
– focus on data manipulation tools and machine learning
51. for data science:
– The more academic science and research expertise, the better
– Focus on projects that solve unknown problems
– Work with more experienced data scientists
Statistically model human behavior
Predict and respond to humans
Understand natural language and the natural world
Understand subtle patterns in big data
On a large team, Data Science and Data Engineering are separate roles
On a small team, a Data Scientist must do at least some of their own Data Engineering
The roles are new and not strictly defined. Today, one role is often called by the other’s name.