ICT 3202 - INTRODUCTION
TO DATA SCIENCE
BY
ENGR. JOHNSON C. UBAH
B.ENG, M.ENG, HCNA, ASM
Course description
This course is an introduction to data science. The major goals
of this course are to learn how to use tools for acquiring,
cleaning, analyzing, exploring, and visualizing data; making
data-driven inferences and decisions; and effectively
communicating results. For practical purposes one may work
with Python, Octave/Matlab, ...
Fields to be covered
• Data mining
• Statistics
• Machine learning
• Information visualization
• Network analysis
• Natural language processing
• Algorithms
• Software engineering
• Databases
• Distributed systems
• Big data
Introduction
Data science is an interdisciplinary field that uses scientific
methods, processes, algorithms and systems to
extract knowledge and insights from structured
and unstructured data.
Data science is related to data mining and big data.
Introduction to data science
Data science is a "concept to unify statistics, data analysis, machine
learning and their related methods" in order to "understand and
analyze actual phenomena" with data.[3] It employs techniques and
theories drawn from many fields within the context
of mathematics, statistics, computer science, and information
science.
Big data
Big Data refers to a huge volume of data that can be structured,
semi-structured or unstructured. It is characterized by five Vs:
Volume: the amount or size of the data, which for big data can run
into the quintillions.
Variety: the different types of data, such as social media content, web
server logs, etc.
Velocity: how fast the data is growing; data is growing exponentially
and at a very fast rate.
Veracity: the uncertainty of the data, for example social media content,
i.e. whether the data can be trusted or not.
Value: whether the data we are storing and processing is worth it, and
how we benefit from this huge amount of data.
Structured data
Structured data is the easiest data to search and organize, because it
is usually contained in rows and columns and its elements can be mapped
into fixed, pre-defined fields.
Structured data is often managed using Structured Query Language
(SQL), a query language developed at IBM in the 1970s for relational
databases.
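
As a minimal sketch of structured data managed with SQL, the example below uses Python's built-in `sqlite3` module with an in-memory database. The table name `ratings`, its columns and the sample rows are assumptions made up for illustration; they do not come from the course material.

```python
import sqlite3

# In-memory database; table, columns and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (customer TEXT, product TEXT, stars INTEGER)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [("ada", "kettle", 5), ("bob", "kettle", 3), ("ada", "toaster", 4)],
)

# Every row has the same fixed fields, so a search is a single query.
rows = conn.execute(
    "SELECT product, AVG(stars) FROM ratings GROUP BY product ORDER BY product"
).fetchall()
print(rows)
conn.close()
```

Because each value sits in a pre-defined field, the average rating per product can be computed without inspecting the contents of any record by hand.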
Examples of structured data include financial data such as accounting
transactions, address details, demographic information, star ratings
by customers, machine logs, location data from smartphones and
smart devices, etc.
Most estimates today put structured data at less than 20
percent of all data.
Unstructured data
A much bigger percentage of all the data in our world is unstructured data.
Unstructured data is data that cannot be contained in a row-column
database and doesn’t have an associated data model.
Think of the text of an email message. The lack of structure makes
unstructured data more difficult to search, manage and analyse.
Other examples of unstructured data include photos, video and audio
files, text files, social media content, satellite imagery, presentations,
PDFs, open-ended survey responses, websites and call center
transcripts/recordings.
Instead of spreadsheets or relational databases, unstructured data is
usually stored in data lakes, NoSQL databases, applications and data
warehouses.
Semi-structured data
Beyond structured and unstructured data, there is a third category, which
is a mix of the two.
Semi-structured data has some defining or consistent characteristics but
doesn’t conform to a structure as rigid as is expected with a relational
database.
It has some organizational properties, such as semantic tags or metadata,
that make it easier to organize, but there’s still fluidity in the data.
Email messages are a good example.
While the actual content is unstructured, it does contain structured data such as
name and email address of sender and recipient, time sent, etc.
Another example is a digital photograph.
The image itself is unstructured, but if the photo was taken on a smartphone,
for example, it would be date and time stamped, geotagged, and would have a
device ID.
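
The email example can be sketched with Python's standard `email` package: the headers parse into named fields, while the body remains free text. The message below, including its names and addresses, is invented for illustration.

```python
from email import message_from_string
from email.utils import parseaddr

# A made-up message: structured headers, then a blank line, then a free-text body.
raw = """From: Ada <ada@example.com>
To: Bob <bob@example.com>
Date: Mon, 01 Jan 2024 09:30:00 +0000
Subject: Meeting notes

Hi Bob, here are the notes from today. Let me know what you think.
"""

msg = message_from_string(raw)

# Structured part: each header is a named field that can be looked up directly.
sender_name, sender_address = parseaddr(msg["From"])
structured = {"sender": sender_address, "to": msg["To"], "date": msg["Date"]}

# Unstructured part: the body is just text with no predefined fields.
body = msg.get_payload()

print(structured)
print(body)
```

The header fields could be loaded into a table as they are, whereas the body would need text analysis before anything could be queried from it.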
Big data can be analyzed for insights that
lead to better decisions and strategic
business moves.
How much data does it take to
be called Big Data?
Usually, data that is equal to or greater than 1 TB is known as Big
Data. Analysts predicted that by 2020 there would be 5,200 GB of data
for every person in the world.
Example: On average, people send about 50 million tweets per day, and
Walmart processes 1 million customer transactions per hour.
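
To get a feel for these rates, the two figures quoted above can be converted to per-second values. This is only arithmetic on the numbers given in the slide; nothing else is assumed.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60

tweets_per_day = 50_000_000        # figure quoted in the slide
transactions_per_hour = 1_000_000  # figure quoted in the slide

tweets_per_second = tweets_per_day / SECONDS_PER_DAY
transactions_per_second = transactions_per_hour / SECONDS_PER_HOUR

print(f"{tweets_per_second:.0f} tweets per second")
print(f"{transactions_per_second:.0f} transactions per second")
```

That is roughly 579 tweets and 278 transactions arriving every second, which is the velocity aspect of big data in concrete terms.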
Why is Big Data Important?
The importance of Big Data lies not in how much data we have
but in what we can get out of that data. We can analyze data to
reduce cost and time, make smarter decisions, etc.
Challenges:
Storing such a huge amount of data efficiently.
Processing and extracting valuable information from this huge
amount of data within a given timeframe.
Solution: frameworks such as Hadoop and Spark.
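
Hadoop popularized the MapReduce processing model, in which work is split into a map phase and a reduce phase that can run on many machines. The sketch below imitates that model in plain Python on a single machine; it is not Hadoop's or Spark's API, and the function names and sample text are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


# Each chunk stands in for a block of data stored on a different machine.
chunks = ["big data big value", "data mining finds value"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Since each chunk is mapped independently, the map step could run in parallel across machines, which is how such frameworks address both the storage and the processing challenge.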
Data Mining
Data Mining, also known as Knowledge Discovery in Data,
refers to extracting knowledge from a large amount of data,
e.g. Big Data. It is mainly used in statistics, machine
learning and artificial intelligence. It is the analysis step of
the “Knowledge Discovery in Databases” process.
Data Mining basics
Data mining mainly consists of five levels:
1. Extract, transform and load data into a warehouse
2. Store and manage the data
3. Provide data access (Communication)
4. Analyze (Process)
5. User Interface (Present data to the user)
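
The five levels above can be sketched end to end in a few lines of Python. In this illustration an inline CSV string stands in for the data source and an in-memory SQLite database stands in for the warehouse; the `sales` table and its values are hypothetical.

```python
import csv
import io
import sqlite3

# Level 1, extract: read raw records (an inline CSV stands in for a real source).
raw = "customer,amount\nada, 120.50\nbob,80\nada,30.25\n"
records = list(csv.DictReader(io.StringIO(raw)))

# Level 1, transform: strip whitespace and convert amounts to numbers.
cleaned = [(r["customer"].strip(), float(r["amount"])) for r in records]

# Levels 1-2, load and store: an in-memory SQLite table plays the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)

# Levels 3-4, access and analyze: total spend per customer.
totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()

# Level 5, present the result to the user.
for customer, total in totals:
    print(f"{customer}: {total:.2f}")
```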
Need for Data Mining
Data mining analyzes relationships and patterns in stored transaction
data to obtain information that supports better business decisions.
It helps with credit ratings, targeted marketing, fraud detection
(for example, identifying which transactions are likely to be fraudulent
by checking a user's past transactions), and customer relationship
analysis (which customers are loyal and which will leave for another company).
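
One very simple way to check a new transaction against a user's past transactions is to flag amounts that lie far from the historical mean. The sketch below uses a three-standard-deviation rule; the rule, the threshold and the amounts are assumptions for illustration, not a method prescribed by the course, and real fraud detection uses far richer models.

```python
from statistics import mean, stdev

# Hypothetical transaction history for one user.
past_amounts = [42.0, 38.5, 45.0, 40.0, 44.5]


def looks_unusual(amount, history, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(history), stdev(history)
    return abs(amount - mu) > threshold * sigma


typical = looks_unusual(41.0, past_amounts)
suspicious = looks_unusual(900.0, past_amounts)
print(typical, suspicious)
```

An amount close to the user's usual spending is not flagged, while one far outside it is marked for review.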
Data mining looks for four kinds of relationships:
1. Classes: used to locate the target group
2. Clusters: group data items according to logical relationships
3. Associations: identify relationships between data items
4. Sequential patterns: used to anticipate behavioral patterns and trends.
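
As an illustration of the association relationship, the sketch below counts how often items occur together in a handful of shopping baskets and computes two common measures: support (how often a pair appears at all) and confidence (how often the second item appears when the first does). The baskets are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Count single items and unordered pairs (sorted so each pair has one key).
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def support(a, b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)


def confidence(a, b):
    """Of the baskets that contain a, the fraction that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]


print(support("bread", "milk"), confidence("bread", "milk"))
```

Here bread and milk appear together in half of the baskets, and two of the three baskets containing bread also contain milk.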
Challenges in Data Mining
1. Mining different types of Knowledge in databases
2. Handling noise and incomplete data
3. Efficiency and scaling of data mining algorithms
4. Handling relational and complex types of data
5. Protection of data security, integrity, and privacy
Head To Head Comparison
Between Big Data vs Data Mining
Big Data and Data Mining are two different concepts. Big Data is a
term that refers to a large amount of data, whereas data
mining refers to a deep dive into the data to extract the key
knowledge, patterns or information from a small or large amount of
data.
The main concept in Data Mining is to dig deep into analyzing the
patterns and relationships of data that can be used further in
Artificial Intelligence, Predictive Analysis etc. But the main concept
in Big Data is the source, variety, volume of data and how to store
and process this amount of data.
Analyzing Big Data to produce a business solution or to make a
business decision plays a crucial role in determining growth.
Data mining does not depend on Big Data, as it can be done on a
small or large amount of data, but Big Data surely depends on data
mining: if we are not able to find the value or importance of a
large amount of data, then that data is of no use.

| Features | Data Mining | Big Data |
|---|---|---|
| Focus | Mainly focuses on lots of details of the data | Mainly focuses on lots of relationships between data |
| View | A close-up view of the data | The big picture of the data |
| Data | Expresses the "what" of the data | Expresses the "why" of the data |
| Volume | Can be used for small data or big data | Refers to a large data set |

| Features | Data Mining | Big Data |
|---|---|---|
| Definition | A technique for analyzing data | More a concept than a precise term |
| Data types | Structured data, relational and dimensional databases | Structured, semi-structured and unstructured data (in NoSQL) |
| Analysis | Mainly statistical analysis; focus on prediction and discovery of business factors on a small scale | Mainly data analysis; focus on prediction and discovery of business factors on a large scale |
| Result | Mainly for strategic decision making | Dashboards and predictive measures |

Big Data refers only to a large amount of data, and all big data
solutions depend on the availability of data. It can be considered a
combination of Business Intelligence and Data Mining.
Data mining uses different kinds of tools and software on Big Data to
return specific results. It is mainly “looking for a needle in a haystack”.
In short, Big Data is the asset and data mining is the manager of that
asset, used to provide beneficial results.
Thank you!!!
QUESTION
