»Significance
»Advantages
»Process of Data Science
»Roles in a Data Science Project
»Stages of Data Science Project
»Responsibilities of Data Scientist
»Qualifications of Data Scientist
»Data Science vs Big Data
»Python Libraries for Data Science
M Vishnuvardhan
Data Science
» Data science, also known as data-driven science, is an interdisciplinary
field that uses scientific methods, processes, and systems to extract
knowledge or insights from data in various forms, whether structured or
unstructured.
» A data scientist manages big data: they take a large number of data
points and use their skills in programming, math, and statistics to
organize and clean them.
Data Science and its Importance
The aim of data science is to unify statistics, machine
learning, data analysis, and related methods so that people
can better understand and analyze information using data. The term
"data science" is often used to describe predictive modeling, business
intelligence, business analytics, and other uses of data.
Data science is about uncovering hidden information that can
help companies make smarter choices for their business.
E.g.: Spotify recommending new music, Netflix recommending new
movies, the spam filter in Gmail, and the recommendation engine in Amazon.
Data Science Advantages
» Monetizing the data
» Mitigating company risk
» Better understanding the customers
» Unique insights for businesses
» Business Expansion
» Improve forecasting
» Objective decisions for businesses
Process of Data Science
» Frame the problem
» Collect the data needed to solve the problem
» Process the data
» Explore the data
» Perform in-depth analysis
» Communicate the results of the analysis
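The steps above can be sketched on a toy dataset. This is only a minimal illustration using the Python standard library; the question being framed (average spend per customer segment), the records, the field names, and the function names are all hypothetical and do not come from the slides.

```python
from statistics import mean

# Frame the problem (hypothetical): what is the average spend per segment?

# Collect: hypothetical raw records; in practice these would come from
# files, databases, or APIs.
RAW = [
    {"segment": "retail", "spend": "120.0"},
    {"segment": "retail", "spend": ""},          # missing value
    {"segment": "business", "spend": "300.5"},
    {"segment": "business", "spend": "280.0"},
    {"segment": "retail", "spend": "95.0"},
]


def process(records):
    """Process: drop rows with a missing spend and convert strings to floats."""
    return [
        {"segment": r["segment"], "spend": float(r["spend"])}
        for r in records
        if r["spend"].strip()
    ]


def explore(records):
    """Explore: a basic summary of the cleaned data."""
    spends = [r["spend"] for r in records]
    return {"rows": len(records), "min": min(spends), "max": max(spends)}


def analyze(records):
    """Analyze: mean spend per segment."""
    groups = {}
    for r in records:
        groups.setdefault(r["segment"], []).append(r["spend"])
    return {segment: mean(values) for segment, values in groups.items()}


def communicate(result):
    """Communicate: render the result as a short plain-text report."""
    return "\n".join(f"{seg}: {avg:.2f}" for seg, avg in sorted(result.items()))


clean = process(RAW)
summary = explore(clean)
averages = analyze(clean)
report = communicate(averages)
print(summary)
print(report)
```

Each function maps to one step of the process, so a step can be replaced (for example, reading from a database instead of a list) without touching the others.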
Data Science Project - Roles
» Project sponsor
» Client
» Data scientist
» Data architect
» Operations
Stages of Data Science Project
» Define Goal
» Data Collection and Management
» Modelling
» Model evaluation
» Presentation and documentation
» Model deployment and maintenance
Stages of Data Science Project - Define Goal
» Why do the sponsors want the project?
» What do they lack, and what do they need?
» What are they doing to solve the problem, and
why isn’t that good enough?
» What resources will you need (e.g., staff,
domain experts)?
» How do the project sponsors plan to deploy
your results?
» What are the constraints that have to be met
for successful deployment?
Stages of Data Science Project - Data Collection and Management
This stage includes identifying the data you need,
exploring it, and conditioning it to be suitable for
analysis.
» What data is available to me?
» Will it help me solve the problem?
» Is it enough?
» Is the data quality good enough?
Stages of Data Science Project - Modelling
Statistics and machine learning are used during the
modelling stage. The most common data science
modelling tasks are these:
» Classification
» Scoring
» Ranking
» Clustering
» Finding relations
» Characterization
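Three of the tasks listed above (classification, scoring, and ranking) can be illustrated with a deliberately simple nearest-centroid model. This is a sketch under stated assumptions, not a method taken from the slides: the one-dimensional training data, the class labels, and the function names are hypothetical, and a real project would normally use a library such as scikit-learn.

```python
from statistics import mean

# Hypothetical labelled training data: (feature value, class label).
TRAIN = [(1.0, "low"), (1.5, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]


def fit_centroids(samples):
    """Compute the mean feature value (centroid) of each class."""
    by_label = {}
    for x, label in samples:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def classify(x, centroids):
    """Classification: assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def score(x, centroids, positive="high"):
    """Scoring: a crude 0..1 value, higher when x is closer to the positive class."""
    d_pos = abs(x - centroids[positive])
    d_all = sum(abs(x - c) for c in centroids.values())
    return 1.0 - d_pos / d_all if d_all else 0.5


centroids = fit_centroids(TRAIN)

# Ranking: order new items by their score, best first.
new_items = [3.0, 7.0, 5.0]
ranked = sorted(new_items, key=lambda x: score(x, centroids), reverse=True)

print(centroids)
print([classify(x, centroids) for x in new_items])
print(ranked)
```

Classification returns a label, scoring returns a number, and ranking only needs the relative order of the scores, which is why the same toy model can serve all three.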
Stages of Data Science Project - Model Evaluation
Once you have a model, you need to determine if
it meets your goals:
» Is it accurate enough for your needs? Does it
generalize well?
» Does it perform better than an obvious guess? Better
than whatever estimate you currently use?
» Do the results of the model make sense in the
context of the problem domain?
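One way to check the "better than a guess" question is to compare the model's accuracy on held-out data with a trivial baseline that always predicts the most common class. The labels and predictions below are hypothetical, and accuracy is only one of several possible metrics.

```python
from collections import Counter

# Hypothetical held-out labels and the model's predictions for them.
y_true = ["spam", "ham", "ham", "ham", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam", "ham", "ham", "ham"]


def accuracy(truth, predicted):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


# Baseline "guess": always predict the most common class.
# Simplification: the majority class is taken from y_true here; in practice
# it would be computed from the training labels.
majority = Counter(y_true).most_common(1)[0][0]
baseline_pred = [majority] * len(y_true)

model_acc = accuracy(y_true, y_pred)
baseline_acc = accuracy(y_true, baseline_pred)

print(f"model: {model_acc:.3f}, baseline: {baseline_acc:.3f}")
```

If the model does not clearly beat the baseline, it has learned little beyond the class frequencies, whatever its raw accuracy looks like.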
Stages of Data Science Project - Presentation and Documentation
Once you have a model that meets your success
criteria, you must present your results to your
project sponsor and other stakeholders. You must
also document the model.
Stages of Data Science Project - Model Deployment and Maintenance
Finally, the model is put into operation. In many organizations
this means the data scientist no longer has
primary responsibility for the day-to-day
operation of the model. But you still should
ensure that the model will run smoothly and
won’t make disastrous unsupervised decisions.
You also want to make sure that the model can be
updated as its environment changes.
Responsibilities of a Data Scientist
» Recommend the most cost-effective changes that should be made to
existing strategies and procedures.
» Communicate findings and predictions to IT and management
departments through effective reports and visualizations of data.
» Come up with new algorithms to solve problems and create new
tools to automate work.
» Devise data-driven solutions to the most pressing challenges.
» Examine and explore data from several different angles to find hidden
opportunities, weaknesses, and trends.
» Prune and clean data to get rid of the irrelevant information.
» Employ sophisticated analytics programs, statistical methods, and
machine learning to prepare data for use in prescriptive and
predictive modelling.
» Extract data from several external and internal sources.
» Conduct undirected research and frame open-ended questions.
Qualifications of Data Scientists - Technical
» Cloud tools such as Amazon S3.
» Big data platforms such as Hadoop, Hive, and Pig.
» Programming languages such as Python, Perl, Java, and C/C++.
» SQL databases and database querying languages.
» SAS and R languages.
» Unstructured data techniques.
» Data visualization and reporting techniques.
» Data munging and cleaning.
» Data mining.
» Software engineering skills.
» Machine learning techniques and tools.
» Statistics.
» Mathematics.
Qualifications of Data Scientists - Business
» Industry knowledge: It is important to understand how your chosen
industry works and how its data is collected, analyzed, and used.
» Intellectual curiosity: Data scientists have to explore new territory and
find unusual and creative ways to solve problems.
» Effective communication: Data scientists have to explain their
discoveries and techniques to both technical and non-technical audiences
in a way that each can understand.
» Analytic problem-solving: Data scientists approach high-level challenges
with a clear view of what is important. They employ the right methods
and approaches to make the best use of human resources and time.
Data Science and Big Data
Big data refers to large volumes of heterogeneous data that come from
various sources. It encompasses all types of data: unstructured,
semi-structured, and structured information, much of which can be
found throughout the internet. Big data includes:
» Structured data: transaction data, OLTP, RDBMS, and other structured
formats.
» Semi-structured data: text files, system log files, XML files, etc.
» Unstructured data: web pages, sensor data, mobile data, online data
sources, digital audio and video feeds, digital images, tweets, blogs,
emails, social networks, and other sources.
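The practical difference between the three categories shows up in how much work is needed to get values out of the data. The sketch below uses only the Python standard library; the sample strings, field names, and the simplified email pattern are hypothetical and chosen purely for illustration.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns, so values can be read by name.
structured = "order_id,amount\n1,19.50\n2,5.25\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: tagged but flexible, so the tree is searched for what is needed.
semi_structured = (
    "<log>"
    "<event level='error'>disk full</event>"
    "<event level='info'>started</event>"
    "</log>"
)
root = ET.fromstring(semi_structured)
errors = [event.text for event in root.iter("event") if event.get("level") == "error"]

# Unstructured: free text, so patterns have to be extracted heuristically.
# The regular expression is a simplified email pattern, not a full validator.
unstructured = "Loved the new release! Contact me at jane@example.com for details."
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", unstructured)

print(total)
print(errors)
print(emails)
```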
Data Science and Big Data - Meaning
Big Data
• Large volumes of data that can’t be handled using a normal database
program.
• Characterized by velocity, volume, and variety.
Data Science
• Data-focused scientific activity.
• Similar in nature to data mining.
• Harnesses the potential of big data to support business decisions.
• Includes approaches to process big data.
Data Science and Big Data - Concept
Big Data
• Includes all formats and types of data.
• Diverse data types are generated from several different sources.
Data Science
• Helps organizations make decisions.
• Provides techniques to help extract insights and information from large
datasets.
Data Science and Big Data - Basis of Formation
Big Data
• Data generated from system logs.
• Data created in organizations: emails, spreadsheets, databases,
transactions, and so on.
• Online discussion forums.
• Video and audio streams, including live feeds.
• Electronic devices: RFID, sensors, and so on.
• Internet traffic and users.
Data Science
• Working apps are made by programming developed models.
• It captures complex patterns from big data and developed models.
• It is related to data analysis, preparation, and filtering.
• Applies scientific methods to find the knowledge in big data.
Data Science and Big Data - Application Areas
Big Data
• Security and law enforcement.
• Research and development.
• Commerce.
• Sports and health.
• Performance optimization.
• Optimizing business processes.
• Telecommunications.
• Financial services.
Data Science
• Web development.
• Fraud and risk detection.
• Image and speech recognition.
• Search recommenders.
• Digital advertisements.
• Internet search.
• Other miscellaneous areas and utilities.
Data Science and Big Data - Approach
Big Data
• To understand the market and to gain new customers.
• To find sustainability.
• To establish realistic ROI and metrics.
• To leverage datasets for the advantage of the business.
• To gain competitiveness.
• To develop business agility.
Data Science
• Data visualization and prediction.
• Data acquisition, preparation, processing, publishing, preservation,
or destruction.
• Programming skills, such as SQL, NoSQL, and Hadoop platforms.
• State-of-the-art algorithms and techniques for data mining.
• Involves the extensive use of statistics, mathematics, and other tools.
Python Libraries used for Data Science
» NumPy: a Python module for numerical computation that can process
large amounts of data and perform array computations. NumPy
integrates with other libraries commonly used in data
science, such as pandas and Matplotlib.
» Matplotlib: a plotting package used to build graphs and charts. It is
frequently used in data analysis for the charts and histograms that it
generates.
» Seaborn: a Matplotlib-based package for making visualizations that
are more attractive and informative. It includes themes, color palettes,
and custom fonts.
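As a small illustration of the array computations mentioned for NumPy, the sketch below computes column means and centers a table of values without writing explicit loops. It assumes NumPy is installed; the sample values are hypothetical.

```python
import numpy as np

# Hypothetical measurements: 2 observations (rows) of 3 features (columns).
values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# Reduce along axis 0 to get one mean per column.
col_means = values.mean(axis=0)

# Broadcasting: the 1-D array of means is subtracted from every row.
centered = values - col_means

print(values.shape)
print(col_means)
print(centered)
```

The same element-wise style carries over to pandas and Matplotlib, which accept NumPy arrays as input.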
» Scikit-learn: a machine learning package for Python that offers
practical tools for data analysis and data mining.
» TensorFlow: an open-source software framework created by Google
that supports dataflow and differentiable programming for
a variety of purposes, including machine learning. TensorFlow is used to
run ML algorithms on smartphones, on the web, and in the cloud.
» Keras: a Python-based high-level neural network API that was designed
to run on top of backends such as TensorFlow, CNTK, or Theano. It was
created with the goal of allowing quick experimentation. It supports the
layers, activation functions, loss functions, and optimizers that are
typical in neural networks.
» PyTorch: an open-source machine learning library used for tasks such as
computer vision and natural language processing. It was created by
Facebook's AI research team and is used extensively in both business and
academia. PyTorch supports distributed computation, enabling quick and
effective model training on large datasets.
» Pandas: a popular data science library that provides a range of
functions for data manipulation, data analysis, and data visualization.
» Statsmodels: provides a range of statistical models and tools for
data scientists. The models include linear regression, logistic regression,
and generalized linear models. It also integrates with
pandas to analyze and visualize data stored in data frames.
» NLTK (Natural Language Toolkit): a library for natural language
processing, used by data scientists who analyze natural
language data. It provides a range of functions for text processing. It
also offers functions for sentiment analysis, which is the process of
determining the sentiment or opinion expressed in a piece of text.
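To make the idea of sentiment analysis concrete, here is a toy lexicon-based scorer written with the standard library only. It does not use NLTK and is not how NLTK implements sentiment analysis; the word list, its weights, and the function names are hypothetical.

```python
import re

# Hypothetical, very small sentiment lexicon: word -> polarity weight.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text):
    """Sum the polarity of known words and map the total to a label."""
    total = sum(LEXICON.get(token, 0) for token in tokenize(text))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


for review in ["I love this great phone",
               "Terrible battery, I hate it",
               "It arrived on Tuesday"]:
    print(review, "->", sentiment(review))
```

A real toolkit adds much larger lexicons and handles negation, intensity, and context, which this sketch ignores.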
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 

DataScience.pptx

  • 1.
  • 2. » Significance » Advantages » Process of Data Science » Roles in a Data Science Project » Stages of Data Science Project » Responsibilities of Data Scientist » Qualifications of Data Scientist » Data Science vs Big Data » Python Libraries for Data Science
  • 3. M Vishnuvardhan Data Science » Data science, also known as data-driven science, is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured. » A data scientist manages big data. They take a large number of data points and use their skills in programming, math, and statistics to organize and clean them.
  • 4. Data Science and its Importance The aim of data science is to unify statistics, machine learning, data analysis, and other related methods so that people can better understand and analyze information using data. The term is often used to describe predictive modeling, business intelligence, business analytics, or other uses of data. Data science is about uncovering hidden information that may help companies make smarter choices for their business. E.g.: Spotify recommends new music, Netflix recommends new movies, Gmail filters spam, and Amazon runs a recommendation engine.
  • 5. Data Science Advantages » Monetizing the data » Mitigating company risk » Better understanding the customers » Unique insights for businesses » Business Expansion » Improve forecasting » Objective decisions for businesses
  • 6. Process of Data Science » Frame the problem » Collection of data to solve the problem » Process the data » Explore the data » Perform in-depth analysis » Communication of result analysis
  • 7. Data Science Project - Roles » Project sponsor » Client » Data scientist » Data architect » Operations
  • 8. Stages of Data Science Project
  • 9. Stages of Data Science Project » Define Goal » Data Collection and Management » Modelling » Model evaluation » Presentation and documentation » Model deployment and maintenance
  • 10. Stages of Data Science Project: Define Goal » Why do the sponsors want the project? » What do they lack, and what do they need? » What are they doing to solve the problem, and why isn’t that good enough? » What resources will you need, i.e., staff and domain experts? » How do the project sponsors plan to deploy your results? » What are the constraints that have to be met for successful deployment?
  • 11. Stages of Data Science Project: Data Collection & Management This step includes identifying the data you need, exploring it, and conditioning it to be suitable for analysis. » What data is available to me? » Will it help me solve the problem? » Is it enough? » Is the data quality good enough?
  • 12. Stages of Data Science Project: Modelling Statistics and machine learning are used during the modelling stage. The most common data science modelling tasks are these: » Classification » Scoring » Ranking » Clustering » Finding relations » Characterization
  • 13. Stages of Data Science Project: Model evaluation Once you have a model, you need to determine if it meets your goals: » Is it accurate enough for your needs? Does it generalize well? » Does it perform better than a simple guess? Better than whatever estimate you currently use? » Do the results of the model make sense in the context of the problem domain?
  • 14. Stages of Data Science Project: Presentation & documentation Once you have a model that meets your success criteria, you must present your results to your project sponsor and other stakeholders. You must also document the model.
  • 15. Stages of Data Science Project: Model deployment & maintenance The model is put into operation. In many organizations this means the data scientist no longer has primary responsibility for the day-to-day operation of the model. But you should still ensure that the model will run smoothly and won’t make disastrous unsupervised decisions. You also want to make sure that the model can be updated as its environment changes.
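The model evaluation question above, whether a model performs better than a guess, can be made concrete with a small sketch. It compares a model's accuracy on held-out labels against a majority-class baseline. The labels, the `model_pred` values, and the helper names `accuracy` and `majority_baseline` are all hypothetical and chosen only for illustration; they do not come from the slides.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n):
    """The 'guess': always predict the most frequent training label."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n


# Hypothetical spam-filter labels (illustrative data only).
y_train = ["ham", "ham", "spam", "ham", "spam", "ham"]
y_test = ["ham", "spam", "ham", "ham", "spam"]
model_pred = ["ham", "spam", "ham", "spam", "spam"]  # assumed model output

baseline_acc = accuracy(y_test, majority_baseline(y_train, len(y_test)))
model_acc = accuracy(y_test, model_pred)
beats_baseline = model_acc > baseline_acc

print(f"baseline={baseline_acc:.2f} model={model_acc:.2f} better={beats_baseline}")
```

Accuracy is only one possible measure; for imbalanced problems a different metric may be a better fit for the project goal.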
  • 16. Responsibilities of a Data Scientist » Recommend the most cost-effective changes that should be made to existing strategies and procedures. » Communicate findings and predictions to IT and management departments through effective reports and visualizations of data. » Come up with new algorithms to solve problems and create new tools to automate work. » Devise data-driven solutions to the challenges that are most pressing. » Examine and explore data from several different angles to find hidden opportunities, weaknesses, and trends.
  • 17. Responsibilities of a Data Scientist » Prune and clean data to get rid of irrelevant information. » Employ sophisticated analytics programs, statistical methods, and machine learning to get data ready for use in prescriptive and predictive modelling. » Extract data from several external and internal sources. » Conduct undirected research and create open-ended questions.
  • 18. Qualifications of Data Scientists - Technical » Cloud tools such as Amazon S3. » Big data platforms such as Hadoop, Hive, and Pig. » Python, Perl, Java, C/C++. » SQL databases, as well as database querying languages. » SAS and R languages. » Unstructured data techniques. » Data visualization and reporting techniques. » Data munging and cleaning. » Data mining. » Software engineering skills. » Machine learning techniques and tools. » Statistics. » Mathematics.
  • 19. Qualifications of Data Scientists - Business » Industry knowledge: It’s important to understand how your chosen industry works and how its data is collected, analyzed, and used. » Intellectual curiosity: Data scientists have to explore new territories and find unusual and creative ways to solve problems. » Effective communication: Data scientists have to explain their discoveries and techniques to both technical and non-technical audiences in a way that they can understand. » Analytic problem-solving: Data scientists approach high-level challenges with a clear eye on what is important. They employ the right methods and approaches to make the best use of human resources and time.
  • 20. Data Science and Big Data Big data refers to large collections of heterogeneous data that come from various sources. It encompasses all types of data: unstructured, semi-structured, and structured information, much of which can be found throughout the internet. Big data includes: » Structured data: transaction data, OLTP, RDBMS, and other structured formats. » Semi-structured data: text files, system log files, XML files, etc. » Unstructured data: web pages, sensor data, mobile data, online data sources, digital audio and video feeds, digital images, tweets, blogs, emails, social networks, and other sources.
  • 21. Data Science and Big Data: Meaning » Big Data: • Large volumes of data that can’t be handled using a normal database program. • Characterized by velocity, volume, and variety. » Data Science: • Data-focused scientific activity. • Similar in nature to data mining. • Harnesses the potential of big data to support business decisions. • Includes approaches to process big data.
  • 22. Data Science and Big Data: Concept » Big Data: • Includes all formats and types of data. • Diverse data types are generated from several different sources. » Data Science: • Helps organizations make decisions. • Provides techniques to help extract insights and information from large datasets.
  • 23. Data Science and Big Data: Basis of Formation » Big Data: • Data is generated from system logs. • Data is created in organizations: emails, spreadsheets, DB, transactions, and so on. • Online discussion forums. • Video and audio streams that include live feeds. • Electronic devices: RFID, sensors, and so on. • Internet traffic and users. » Data Science: • Working apps are made by programming developed models. • It captures complex patterns from big data and developed models. • It is related to data analysis, preparation, and filtering. • Applies scientific methods to find the knowledge in big data.
  • 24. Data Science and Big Data: Application Areas » Big Data: • Security and law enforcement. • Research and development. • Commerce. • Sports and health. • Performance optimization. • Optimizing business processes. • Telecommunications. • Financial services. » Data Science: • Web development. • Fraud and risk detection. • Image and speech recognition. • Search recommenders. • Digital advertisements. • Internet search. • Other miscellaneous areas and utilities.
  • 25. Data Science and Big Data: Approach » Big Data: • To understand the market and to gain new customers. • To find sustainability. • To establish realistic ROI and metrics. • To leverage datasets for the advantage of the business. • To gain competitiveness. • To develop business agility. » Data Science: • Data visualization and prediction. • Data acquisition, preparation, processing, publishing, preservation, or destruction. • Programming skills, such as NoSQL, SQL, and Hadoop platforms. • State-of-the-art algorithms and techniques for data mining. • Involves the extensive use of statistics, mathematics, and other tools.
  • 26. Python Libraries used for Data Science » NumPy: a Python module for numerical computation that can process large amounts of data and perform array computations. NumPy integrates with other libraries commonly used in data science, such as pandas and Matplotlib. » Matplotlib: a plotting package used to build graphs and charts. It is frequently used in data analysis for the charts and histograms it generates. » Seaborn: a Matplotlib-based package used to make visualizations that are more attractive and informative. Its features include themes, color palettes, and custom fonts.
  • 27. Python Libraries used for Data Science » Scikit-learn: a machine learning package for Python that offers practical tools for data analysis and mining. » TensorFlow: an open-source software framework created by Google that enables dataflow and differentiable programming for a variety of purposes, including machine learning. TensorFlow is used to run ML algorithms on smartphones, the web, and the cloud. » Keras: a Python-based high-level neural network API designed to run on top of backends such as TensorFlow, CNTK, or Theano. It was created with the goal of allowing quick experimentation. It supports the layers, activation functions, loss functions, and optimizers that are typical in neural networks.
  • 28. Python Libraries used for Data Science » PyTorch: an open-source machine learning library used for tasks like computer vision and natural language processing. It was created by Facebook's AI research team and is extensively used in both business and academia. PyTorch supports distributed computation, enabling quick and effective model training on huge datasets. » Pandas: a popular data science library. It provides a range of functions for data manipulation, data analysis, and data visualization. » Statsmodels: provides a range of statistical models as well as tools for data scientists. The models include linear regression, logistic regression, and generalized linear models. It also integrates with pandas to analyze and visualize data stored in data frames.
  • 29. Python Libraries used for Data Science » NLTK (Natural Language Toolkit): used for natural language processing, for data scientists who analyze natural language data. It provides a range of functions for text processing. It also offers functions for sentiment analysis, which is the process of determining the sentiment or opinion expressed in a piece of text.
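As a rough illustration of the NumPy and pandas descriptions in the slides above, the sketch below performs an array computation with NumPy and then hands the same array to pandas for a grouped summary. The scores and team labels are made-up values used only for this example, and it assumes both libraries are installed.

```python
import numpy as np
import pandas as pd

# Hypothetical exam scores (illustrative values only).
scores = np.array([72.0, 85.0, 90.0, 64.0])

# NumPy: vectorized array computation, no explicit loop needed.
centered = scores - scores.mean()

# pandas: the same NumPy array becomes a DataFrame column,
# which can then be grouped and summarized.
df = pd.DataFrame({"team": ["a", "b", "a", "b"], "score": scores})
team_means = df.groupby("team")["score"].mean()

print(centered)
print(team_means)
```

Passing the NumPy array straight into the DataFrame constructor is one small example of how the two libraries work together.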