This document provides an overview of data science. It defines data science as an interdisciplinary field that uses scientific methods and processes to extract knowledge and insights from data. It discusses what constitutes data and how "big data" refers to extremely large and growing collections of data that are challenging to store and process using traditional tools. The document outlines some examples of the scale of big data generated daily. It also reviews the growing demand for data scientists and the skills needed in data science, including statistics, machine learning, data visualization, and related fields. Finally, it provides some learning resources for becoming a data scientist.
2. What is DataIn.me?
A community for data science enthusiasts
Email: info@datain.me Facebook: fb.com/DataIn.me
Website: www.DataIn.me Twitter: twitter.com/DataInme
3. What is Data Science?
Data Science is an interdisciplinary field that uses scientific methods,
processes, and systems to extract knowledge or insights from data in
various forms.
4. What is Data?
"The quantities, characters, or symbols on which operations are performed by
a computer, which may be stored and transmitted in the form of electrical
signals and recorded on magnetic, optical, or mechanical recording media."
6. Big Data
• 'Big Data' is a term used to
describe a collection of data that is
huge in size and yet growing
exponentially with time.
• In short, such data is so large
and complex that none of the
traditional data management
tools are able to store it or
process it efficiently.
7. How Big is Big Data?
The New York Stock Exchange
generates about one terabyte of new
trade data per day.
Statistics show that 500+ terabytes of
new data are ingested into the
databases of the social media site
Facebook every day.
A single jet engine can generate
10+ terabytes of data in 30 minutes of
flight time.
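
To compare these figures, it helps to put them on a common scale. The sketch below converts each quoted volume to terabytes per day. The helper name `tb_per_day` is made up for illustration, the quoted numbers are treated as exact even though the slide gives them as approximate or lower bounds, and the jet engine is assumed to run for a full 24 hours, which is a deliberate simplification.

```python
# Convert the quoted volumes to a common unit (terabytes per day).
# Assumption: every source runs continuously for 24 hours. That is not
# realistic for a jet engine, so its figure is only an upper-end illustration.
MINUTES_PER_DAY = 24 * 60


def tb_per_day(terabytes: float, minutes: float) -> float:
    """Scale a volume observed over `minutes` to a full day."""
    return terabytes * MINUTES_PER_DAY / minutes


sources = {
    "NYSE trade data": tb_per_day(1, MINUTES_PER_DAY),
    "Facebook ingest": tb_per_day(500, MINUTES_PER_DAY),
    "One jet engine (if running all day)": tb_per_day(10, 30),
}

for name, rate in sources.items():
    print(f"{name}: {rate:,.0f} TB/day")
```

Under that assumption, a single engine would produce data at a rate comparable to the Facebook figure, which is why sensor data is often cited alongside social media as a big data source.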
10. IBM Predicts Demand For Data Scientists Will Soar 28% By 2020
• Market Growth: By 2018, data science jobs in the U.S. will exceed 490,000,
with fewer than 200,000 available data scientists to fill these positions
(McKinsey & Co.)
• Average Salary: Between $116,000 and $163,500 in 2017 (Forbes)
• Job openings: Shortage of up to 1.5 million by 2018 (McKinsey & Co.)
13. Why statistics?
Statistics play a central role in data science.
Statistical concepts and techniques can be applied to data to analyze it
and extract insights.
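
As a small illustration of applying statistical techniques to data, the sketch below summarizes a sample with Python's standard `statistics` module. The `daily_visits` numbers are invented for this example and do not come from the slides.

```python
import statistics

# Hypothetical sample: daily website visits over one week (made-up numbers).
# One day (400) is an unusual spike.
daily_visits = [120, 135, 128, 400, 131, 126, 133]

mean_visits = statistics.mean(daily_visits)      # sensitive to the spike
median_visits = statistics.median(daily_visits)  # robust to the spike
stdev_visits = statistics.stdev(daily_visits)    # sample standard deviation

print(f"mean:   {mean_visits:.1f}")
print(f"median: {median_visits}")
print(f"stdev:  {stdev_visits:.1f}")
```

The gap between the mean and the median is itself an insight: it signals that one outlying day is distorting the average, so the median is likely the better summary of a typical day here.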
17. What is Machine Learning?
Machine learning is a type of Artificial Intelligence that allows software
applications to become more accurate in predicting outcomes without
being explicitly programmed.
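
One way to picture "without being explicitly programmed" is a model that derives its rule from examples. The sketch below fits a straight line by ordinary least squares in plain Python. It is only one simple instance of machine learning, and the names `fit_line` and `predict` and the training data are assumptions made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: hours studied -> exam score (made-up numbers).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 64, 69]

# The relationship is learned from the examples, not written by hand.
slope, intercept = fit_line(hours, scores)


def predict(h):
    """Predict a score for `h` hours using the learned parameters."""
    return slope * h + intercept


print(f"learned rule: score = {slope:.1f} * hours + {intercept:.1f}")
print(f"prediction for 6 hours: {predict(6):.1f}")
```

Nothing in the code states how hours relate to scores; giving it more or different examples changes the learned parameters, which is the sense in which predictions can become more accurate with more data.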
20. Data Visualization
Data visualization refers to representing data in a visual context, like a
chart or a map, to help people understand the significance of that data.
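
To make the idea concrete without depending on a plotting library, the sketch below renders a horizontal bar chart as plain text. The function `bar_chart` and the `signups` figures are hypothetical and invented for this example; in practice a charting library would normally be used instead.

```python
def bar_chart(data: dict[str, int], width: int = 40) -> str:
    """Render a horizontal bar chart as text, scaled to `width` columns."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Bar length is proportional to the value; the largest fills `width`.
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical monthly sign-up counts (made-up numbers).
signups = {"Jan": 120, "Feb": 180, "Mar": 90, "Apr": 240}
print(bar_chart(signups))
```

Even in this crude form, the dip in March and the peak in April are visible at a glance, which is harder to see in the raw numbers.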
26. Summary
The extraordinary spread of computers and online data is changing
forever the way decisions are made in many fields, from medicine to
marketing to scientific research.
Dramatic growth in the scale and complexity of data that can be
collected and analysed is affecting all aspects of work and society
including health care, business practices, public safety, scientific
discoveries and public policy.
Understanding effective and ethical ways of using vast amounts of data
is a significant challenge to science and to society as a whole, and
developing scalable techniques for data analysis and decision making
requires interdisciplinary research in many areas, including machine
learning, algorithms, statistics, operations research, databases,
complexity analysis, visualization, and privacy and security.