This Data Science presentation will help you understand what Data Science is, why we need it, the prerequisites for learning it, what a Data Scientist does, the Data Science lifecycle with an example, and career opportunities in the Data Science domain. You will also learn the differences between Data Science and Business Intelligence. The role of data scientist has been described as one of the sexiest jobs of the century. The demand for data scientists is high, and the number of opportunities for certified data scientists is increasing. Companies are looking for more and more skilled data scientists, and studies show that a continued shortfall in qualified candidates is expected. So, let us dive deep into Data Science and understand what it is all about.
This Data Science Presentation will cover the following topics:
1. Need for Data Science?
2. What is Data Science?
3. Data Science vs Business Intelligence
4. Prerequisites for learning Data Science
5. What does a Data Scientist do?
6. Data Science life cycle with use case
7. Demand for Data Scientists
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you’ll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.
Why learn Data Science?
Data scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. Data scientist is the pinnacle rank in an analytics organization. Glassdoor ranked data scientist first in its 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data scientist, you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
The Data Science with Python course is recommended for:
1. Analytics professionals who want to work with Python
2. Software professionals looking to get into the field of analytics
3. IT professionals interested in pursuing a career in analytics
4. Graduates looking to build a career in analytics and data science
5. Experienced professionals who would like to harness data science in their fields
2. What’s in it for you
Need for Data Science
What is Data Science?
Data Science vs Business Intelligence
The Prerequisites for learning Data Science
What does a Data Scientist do?
Data Science lifecycle with example
Data Scientist demand
4. Need For Data Science
Does the thought of your car driving you home by itself excite you?
5. Is that even possible?
6. This is where the ‘Need For Data Science’ comes into the picture. Data Science helps in making better decisions!
7. You mean it will be able to take decisions like slowing down, stopping by itself, speeding up and all of that?
8. Exactly! And then let the machine learn iteratively using unsupervised learning!
9. That’s interesting!
10. Need For Data Science
Self-driving cars could help prevent many of the more than one million deaths caused by road accidents each year.
11. Need For Data Science
1. Due to lack of data available, flights are often delayed or cancelled at the last minute.
“We’re extremely sorry to inform you that your flight has been delayed by 4 hours due to bad weather conditions. We regret the inconvenience caused.”
12. Need For Data Science
2. Due to improper route planning, customers don’t get a flight at the desired time and duration.
“We’re extremely sorry to inform you that there are no flights for the time selected. There’s a connecting flight for the same time tomorrow.”
13. Need For Data Science
3. Incorrect decisions in selecting the right equipment lead to unplanned delays and cancellations.
“Dear Flyer, we regret to inform you that your flight has been cancelled due to a delay from Airbus on account of engine delivery.”
14. Need For Data Science
With Data Science, it has become possible to predict such disruptions and alleviate the loss for both the airline and the passenger.
15. Need For Data Science
Using Data Science, we can achieve the following:
• Route planning: whether to schedule direct or connecting flights
• A predictive analytics model can be built to foresee flight delays
• Deciding which class of planes to purchase for better performance
• Promotional offers depending on customer booking patterns
16. Need For Data Science
Logistics companies like FedEx are using Data Science models for operational efficiency:
• Discover the best routes to ship
• The best suited time to deliver
• The best mode of transport
17. Need For Data Science
So Data Science is mainly needed for:
• Better Decision Making: Whether A or B?
• Predictive Analysis: What will happen next?
• Pattern Discovery: Is there any hidden information in the data?
19. What is Data Science?
Suppose you have decided to buy furniture online for your new office.
How do you choose the right website?
20. What is Data Science?
Want to buy furniture online?
Does the website sell furniture?
  No: Close website
  Yes: Is the rating > 4 out of 5?
    No: Close website
    Yes: Is the discount > 20%?
      No: Close website
      Yes: Purchase product
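The decision flow on this slide can be sketched as a small function. This is only an illustration: the function name `should_purchase` and its parameters are hypothetical, and the order of the rating and discount checks is an assumption read off the flattened flowchart. The thresholds (rating above 4 out of 5, discount above 20%) come from the slide.

```python
def should_purchase(sells_furniture: bool, rating: float, discount_pct: float) -> bool:
    """Return True only when every check in the slide's flowchart passes.

    Hypothetical helper: the names and the check order are assumptions.
    """
    if not sells_furniture:
        return False  # "Close website"
    if rating <= 4:
        return False  # rating must be > 4 out of 5
    if discount_pct <= 20:
        return False  # discount must be > 20%
    return True       # "Purchase product"


# Example calls with made-up website attributes
decision_good_site = should_purchase(True, 4.5, 25)
decision_low_discount = should_purchase(True, 4.5, 10)
```

Each `return False` corresponds to one "Close website" branch, so the function gives up at the first failed check, just as a shopper would.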
21. What is Data Science?
Data Science can answer a lot of other questions as well!
• Which route should my cab take so that I reach faster?
• Which viewers like the same kind of TV shows?
• Will this refrigerator fail in the next 3 years: Yes or No?
• Who will win the elections?
22. What is Data Science?
So, Data Science or Data-driven Science is about:
• Asking the right questions and exploring the data
• Modeling the data using various algorithms
• Finally communicating and visualizing the results
26. Business Intelligence vs Data Science
Criterion | Business Intelligence | Data Science
Data Source | Structured data, e.g. Data Warehouse | Unstructured data, e.g. web logs
Method | Analytical | Scientific
Skills | Statistics, Visualization | Statistics, Visualization, Machine Learning
Focus | Past and Present Data | Present Data and Future Predictions
28. Prerequisites for Data Science
The following are the 3 essential traits of a Data Scientist:
CURIOSITY: Only when you ask questions will you have a better understanding of the business problem.
29. Prerequisites for Data Science
COMMON SENSE: To identify new ways to solve a business problem and to detect priority problems.
30. Prerequisites for Data Science
COMMUNICATION SKILLS: A Data Scientist needs to communicate findings to business teams so that they can act upon the insights.
31. Prerequisites for Data Science
1. MACHINE LEARNING: Machine learning is the backbone of Data Science. It is one of the many ways that Data Science uses to find a solution to a problem.
32. Prerequisites for Data Science
2. MATHEMATICAL MODELLING: Mathematical models can be extremely helpful to make fast calculations and predictions from what you know about your data.
33. Prerequisites for Data Science
3. STATISTICS: Statistics is foundational to Data Science. It lets you extract knowledge and obtain better results from the data.
34. Prerequisites for Data Science
4. COMPUTER PROGRAMMING: You should know at least one programming language, preferably Python or R, for data modelling.
35. Prerequisites for Data Science
5. DATABASES: The discipline of querying databases teaches you to ask better questions as a Data Scientist.
36. Tools/Skills used in Data Science
Data Analysis
Skills: R, Python, Statistics
Tools: SAS, Jupyter, RStudio, MATLAB, Excel, RapidMiner
Data Warehousing
Skills: ETL, SQL, Hadoop, Apache Spark
Tools: Informatica/Talend, AWS Redshift
Data Visualization
Skills: R, Python libraries
Tools: Jupyter, Tableau, Cognos, RAW
Machine Learning
Skills: Python, Algebra, ML Algorithms, Statistics
Tools: Spark MLlib, Mahout, Azure ML Studio
42. What does a Data Scientist do?
Real World → Raw Data → Process and Analyze → Meaningful Data → Useful Insights
43. Must Know Machine Learning Algorithms
The most basic and important techniques that you should know as a Data Scientist are:
• Regression
• Decision Tree
• Clustering
• Naive Bayes
• Support Vector Machine
Note to instructor: Please say that they can find the videos on specific algorithms
in the video description below
45. Concept Study – Life Cycle
CONCEPT STUDY
Understanding the problem statement requires a thorough study of the business model.
46. Concept Study – Example
What is the example?
• What is the end goal?
• What is the budget?
• What are the various specifications?
47. Concept Study – Example
Concept of the task: Predict the price of a 1.35 carat diamond
Get to know the diamond industry and the various terminologies it uses. Understand the business problem and collect RELEVANT and sufficient data.
Suppose we get the prices of diamonds from different diamond retailers. Now we want to find out the price of a 1.35 carat diamond.
48. Data Preparation - Life cycle
Data Preparation
Also known as data munging, this is the most important stage of the Data Science lifecycle: valuable insights emerge only from well-prepared data.
49. Data Preparation - Life cycle
• Data Integration: Resolving any conflicts in the data and handling redundancies
• Data Cleaning: Correcting inconsistent data by filling in missing values and smoothing out noisy data
• Data Transformation: It involves normalization, transformation and aggregation of data using ETL methods
• Data Reduction: Using various strategies, reducing the size of the data while yielding the same outcome
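As a rough illustration of the four data preparation steps, here is a minimal pandas sketch. Everything in it is an assumption made for the example: the two retailer tables, their column names and their values are invented, min-max scaling stands in for "normalization", and dropping an unneeded column stands in for "data reduction". Cleaning here means fixing an improper datatype and discarding unparseable rows.

```python
import pandas as pd

# Hypothetical records from two retailers; all values are invented for illustration.
retailer_a = pd.DataFrame(
    {"carat": [0.5, 1.0, 1.5], "price": ["5000", "8000", "n/a"], "source": "A"}
)
retailer_b = pd.DataFrame(
    {"carat": [1.0, 1.2], "price": ["8000", "9500"], "source": "B"}
)

# Data integration: combine both sources and drop redundant (carat, price) rows.
combined = pd.concat([retailer_a, retailer_b], ignore_index=True)
combined = combined.drop_duplicates(subset=["carat", "price"])

# Data cleaning: convert price to a numeric type; unparseable entries become NaN
# and are dropped here.
cleaned = (
    combined.assign(price=pd.to_numeric(combined["price"], errors="coerce"))
    .dropna(subset=["price"])
    .reset_index(drop=True)
)

# Data transformation: min-max normalization of price into the range [0, 1].
price_min, price_max = cleaned["price"].min(), cleaned["price"].max()
transformed = cleaned.assign(
    price_scaled=(cleaned["price"] - price_min) / (price_max - price_min)
)

# Data reduction: keep only the columns needed for modelling.
reduced = transformed[["carat", "price_scaled"]]
```

In a real project each step would be driven by what the data actually contains; the point of the sketch is only the order and intent of the steps.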
50. Data Preparation - Example
Data preparation: Make the data clean and valuable.
Typical problems in the raw data: missing values, improper datatypes, null values.
51. Data Preparation - Example
Ways to handle missing data values:
• If the dataset is huge, we can simply remove the rows with missing data values. It is the quickest way.
• We can substitute missing values with the mean of the rest of the data using a pandas DataFrame in Python, i.e. mean = df.mean() followed by df.fillna(mean).
• We can predict the missing values, i.e. we use the rest of the data to predict the values.
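The row-removal and mean-substitution options can be tried out with a few lines of pandas. This is a minimal sketch on an invented table; the column names and values are assumptions, while `dropna`, `mean` and `fillna` are standard pandas DataFrame methods.

```python
import pandas as pd

# Hypothetical toy data with gaps; the values are invented for illustration.
df = pd.DataFrame(
    {
        "carat": [0.5, 1.0, None, 1.5],
        "price": [5000.0, None, 9000.0, 11000.0],
    }
)

# Option 1: remove every row that contains a missing value.
dropped = df.dropna()

# Option 2: replace each missing value with the mean of its own column.
column_means = df.mean()          # one mean per numeric column, NaN ignored
filled = df.fillna(column_means)  # a Series is matched to columns by name
```

Dropping rows is quick but discards information, which matters on a small dataset like this one; mean substitution keeps every row at the cost of flattening the variation in the filled column.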
52. Data Preparation - Example
• Split the data into train data and test data in the ratio of 80:20
• It is generally advised to divide the dataset into two random partitions
Train data (80%)
Test data (20%)
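A random 80:20 split needs nothing beyond the standard library. The sketch below is one possible way to do it; the helper name `split_train_test`, the fixed seed and the sample rows are all assumptions made for the example.

```python
import random


def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) partitions.

    Hypothetical helper; a fixed seed keeps the split reproducible.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Invented (carat, price) pairs, used only to demonstrate the split.
data = [(0.5 + 0.1 * i, 4000 + 700 * i) for i in range(10)]
train, test = split_train_test(data)
```

Shuffling before cutting is what makes the two partitions random; slicing an unshuffled, sorted dataset would put all the largest diamonds in one partition.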
53. Model Planning - Life cycle
Model Planning:
After properly understanding and cleaning the data in hand, a suitable model is selected.
54. Model Planning - Life cycle
Model Planning:
• This step involves Exploratory Data Analysis (EDA) to understand the relationship between variables and to see what the data can tell us
• Key variables are selected
55. Model Planning - Life cycle
But what is Exploratory Data Analysis?
Definition: Deeper analysis of the dataset to better understand the data.
Goals :
• Know the datatypes and answer questions with the data
• Understand how data is distributed
• Identify outliers
• Identify patterns, if any
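The first three goals can be illustrated with Python's built-in `statistics` module (Python 3.8 or later for `quantiles`). This is only a sketch: the price list is invented, and the 1.5 × IQR rule used for outliers is one common convention, not something the slides prescribe.

```python
import statistics

# Hypothetical diamond prices in Rs.; invented values with one extreme entry.
prices = [4000, 5500, 7500, 9000, 11000, 60000]

# Goal: know the datatypes present in the column.
value_types = {type(p).__name__ for p in prices}

# Goal: understand how the data is distributed.
summary = {
    "min": min(prices),
    "max": max(prices),
    "mean": statistics.mean(prices),
    "median": statistics.median(prices),
}

# Goal: identify outliers, here with the 1.5 * IQR rule (an assumed convention).
q1, _, q3 = statistics.quantiles(prices, n=4)
iqr = q3 - q1
outliers = [p for p in prices if p < q1 - 1.5 * iqr or p > q3 + 1.5 * iqr]
```

A large gap between mean and median, as in this toy list, is itself a hint that extreme values are pulling the distribution.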
57. Model Planning - Example
Train Data vs Test Data
• Train Data (80%) is used to develop the model
• Test Data (20%) is used to validate the model
Once the model is created, feedback from validation is used for improvement.
59. Model Building - Life cycle
Model Building:
Using various analytical tools and techniques, data is transformed with the goal of ‘discovering’ useful information to build the right model.
60. Model Building - Example
Model Building:
On analyzing the data, we observe that the output progresses linearly. Hence, we use the Linear Regression algorithm for model building in this case.
[Figure: price of diamond (Rs. 5,000 to Rs. 15,000) plotted against carat (0.5 to 1.5) with a regression line; 1.35 carat is marked on the carat axis]
62. Model Building - Example
Linear regression describes the relation between 2 variables, i.e. X and Y.
After the regression line is drawn, we can predict the Y value for an input X value using the following formula: Y = mX + c
m = Slope of the line
c = Y intercept
X is the independent variable
Y is the dependent variable
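To make Y = mX + c concrete, the sketch below fits m and c by ordinary least squares in plain Python and then predicts the price for 1.35 carat. The helper name `fit_line` and the (carat, price) observations are invented for illustration and are not the data behind the slides, so the prediction is not expected to match the slide's figure.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line Y = mX + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


# Hypothetical observations: X = carat (independent), Y = price in Rs. (dependent).
carats = [0.5, 0.75, 1.0, 1.25, 1.5]
prices = [4000, 5500, 7500, 9000, 11000]

m, c = fit_line(carats, prices)
predicted_price = m * 1.35 + c  # Y = mX + c evaluated at X = 1.35
```

With this toy data the fitted line has slope 7000 and intercept 400, giving roughly Rs. 9,850 for a 1.35 carat diamond.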
63. Model Building - Example
[Diagram: collected and analysed data (carat, price) feeds model building; the model takes test data (carat) and outputs a prediction (price), with feedback into model building]
Using the test data set, the built model is validated for the best accuracy.
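Validation on the test set amounts to comparing predicted prices with actual prices the model has not seen. The sketch below assumes a line has already been fitted; the slope, intercept and test pairs are hypothetical values, and mean absolute error is just one possible accuracy measure, chosen here for readability.

```python
# Assumed parameters of an already fitted line Y = mX + c (hypothetical values).
m, c = 7000.0, 400.0

# Hypothetical held-out test pairs: (carat, actual price in Rs.).
test_data = [(0.6, 4700.0), (1.1, 8000.0), (1.4, 10300.0)]


def mean_absolute_error(pairs, slope, intercept):
    """Average absolute gap between actual and predicted prices."""
    errors = [abs(price - (slope * carat + intercept)) for carat, price in pairs]
    return sum(errors) / len(errors)


mae = mean_absolute_error(test_data, m, c)
```

A large error on the test set is the "feedback" that sends you back to model building, for example to collect more data or to choose a different model.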
64. Model Building - Example
Prediction:
After successful validation of the model, we predict the price of a 1.35 carat diamond.
[Figure: regression line of diamond price (Rs.) against carat, with 1.35 carat marked]
65. Model Building - Example
Prediction:
Thus, using the Simple Linear Regression algorithm, we have implemented a successful model and predicted the price of a 1.35 carat diamond to be Rs. 10,000.
[Figure: regression line of diamond price (Rs.) against carat, with 1.35 carat marked]
66. Model Building - Example
This model is easily built using Python packages like pandas, matplotlib and numpy.
We will study this in detail in the upcoming Data Science Tutorial using Python.
67. Communication - Life cycle
Communicate results:
Key findings are identified and conveyed to the stakeholders.
68. Communication - Life cycle
The battle is not over yet!
A good Data Scientist should be able to communicate the findings to the business team in such a way that they move easily into the execution phase.
69. Life cycle of Data Science project
Operationalize:
Final reports, code, and technical documents are delivered by the team.
70. Summary - Life cycle
1. Concept Study
2. Data Preparation
3. Model Planning
4. Model Building
5. Communicate Results
6. Operationalize
72. Demand for Data Scientist
Industries with high demand for Data Scientists:
• Marketing
• Finance
• Healthcare
• Gaming
• Technology
73. Summary
Need For Data Science
What is Data Science?
Prerequisites of Data Science
Tools used in Data Science
Lifecycle with example
Demand for data scientists
Data Science, also known as data-driven science, is an interdisciplinary field of scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms.
A good insight into the workings of a DBMS will surely take you a long way.
A Data Scientist collects as much raw data as possible from the real world.
We can also use:
Natural language processing to enable it to communicate successfully in English (or some other human language).
Knowledge representation to store information provided before or during the interrogation.
Automated reasoning to use the stored information to answer questions and to draw new conclusions.
Machine learning to adapt to new circumstances and to detect and extrapolate patterns.