Introduction to Data Science.pptx

CS3352 - Foundations of Data Science
Introduction to Data Science
Dr.V.Anusuya
Associate Professor/IT
Ramco Institute of Technology
Rajapalayam
Introduction
• Data Science is the area of study which involves extracting
insights from vast amounts of data using various scientific
methods, algorithms and processes.
• Data Science helps discover hidden patterns in voluminous
raw data.
• Data Science emerged from the evolution of mathematical
statistics, data analysis, and big data.
Contd.,
• Ingredients = Data
• Consider a chef who is starting out on a new dish.
• After the chef has determined what type of dish he would like to
make, he goes to the fridge to gather ingredients. If he doesn’t
have the necessary ingredients, he takes a trip to the store to
collect them.
• As data scientists, our data points are our ingredients. We may
have some on hand, but we still might have to collect more
through web scraping, SQL queries, etc.
Big Data
• Data Science focuses on processing the huge
volume of heterogeneous data known as big
data.
• Big data refers to the data sets that are large
and complex in nature and difficult to process
using traditional data processing application
software.
Contd.,
• The characteristics of big data are often referred
to as the three Vs, to which a fourth is commonly added:
• Volume: How much data is there?
• Variety: How diverse are the different types of
data?
• Velocity: At what speed is new data generated?
• Veracity: How accurate is the data?
• These four properties make big data different
from the data found in traditional data
management tools.
Contd.,
• The four Vs make capturing, cleaning,
preprocessing, storing, searching, sharing,
transferring, and visualizing data a complex task.
• Specialized techniques are required to extract
insights from this huge volume of data.
• The main things that set a data scientist apart
from a statistician are the ability to work with
big data and experience in machine learning,
computing, and algorithm building.
Big data
• Tools: Hadoop, Pig, Spark, R, Python, and Java.
• Python is a great language for data science
because it has many data science libraries
available, and it’s widely supported by
specialized software.
Contd.,
• Data Science is an interdisciplinary field that
makes it possible to extract knowledge from
structured or unstructured data.
• It translates business problems and research
projects into practical solutions.
Benefits and uses of data science
• Data science and big data are used almost everywhere
in both commercial and noncommercial settings.
• Commercial companies in almost every industry use
data science and big data to gain insights into their
customers, processes, staff, competition, and products.
• Many companies use data science to offer customers a
better user experience, as well as to cross-sell, up-sell,
and personalize their offerings.
Benefits and uses of data science
• Governmental organizations are also aware of data’s value. Many
governmental organizations not only rely on internal data scientists to
discover valuable information, but also share their data with the
public.
• Nongovernmental organizations (NGOs) use data science to raise
money and defend their causes.
• Universities use data science not only in their research but also to
enhance the study experience of their students. The rise of massive
open online courses (MOOCs) produces a lot of data, which allows
universities to study how this type of learning can complement
traditional classes.
Facets of data
• In data science and big data we’ll come across
many different types of data, and each of them
tends to require different tools and techniques. The
main categories of data are these:
• Structured
• Unstructured
• Natural language
• Machine-generated
• Graph-based
• Audio, video, and images
• Streaming
Structured Data
• Structured data is data that depends on a
data model and resides in a fixed field within
a record.
• Structured data is usually stored in tables
within databases or Excel files.
• SQL, or Structured Query Language, is the
preferred way to manage and query data
that resides in databases.
• Some structured data, such as hierarchical data
like a family tree, can be hard to store in a
traditional relational database.
Example
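A minimal sketch of the idea, using Python's built-in sqlite3 module. The table name, column names, and rows are made up for illustration and are not taken from the slides. Because every record has the same fixed fields, one SQL query can aggregate over them directly.

```python
import sqlite3

# In-memory database; table, columns, and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "id INTEGER PRIMARY KEY, name TEXT, dept TEXT, marks INTEGER)"
)
conn.executemany(
    "INSERT INTO students (name, dept, marks) VALUES (?, ?, ?)",
    [("Asha", "IT", 82), ("Ravi", "IT", 74), ("Meena", "CSE", 91)],
)

# Each record has the same fixed fields, so SQL can group and average them.
rows = conn.execute(
    "SELECT dept, AVG(marks) FROM students GROUP BY dept"
).fetchall()
avg_by_dept = dict(rows)
print(avg_by_dept)
conn.close()
```

The same query would work unchanged on a table with millions of rows, which is the practical benefit of a fixed data model.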
Unstructured Data
• Unstructured data is data that isn’t easy to fit
into a data model because the content is
context-specific or varying.
• One example of unstructured data is your regular
email.
• Although an email contains structured elements
such as the sender, title, and body text, the
content of the body itself is free-form.
• The thousands of different languages and
dialects out there further complicate this.
Example
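The split between the structured and unstructured parts of an email can be sketched with Python's standard email module. The message below is invented for illustration. The headers come out as named fields, while the body is returned as one free-form string that needs further processing.

```python
from email import message_from_string

# A made-up message; addresses and text are illustrative only.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Project update\n"
    "\n"
    "Hi Bob, the report is nearly done. Can we meet on Friday?\n"
)

msg = message_from_string(raw)

# Structured elements: each header is a named field with a single value.
sender = msg["From"]
subject = msg["Subject"]

# Unstructured element: the body is plain text with no fixed fields.
body = msg.get_payload()

print(sender, "|", subject)
print(body)
```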
Natural language
• Natural language is a special type of unstructured data; it’s challenging to
process because it requires knowledge of specific data science techniques
and linguistics.
• The natural language processing community has had success in entity
recognition, speech recognition, summarization, text completion, and
sentiment analysis, but models trained in one domain don’t generalize well to
other domains.
• Even state-of-the-art techniques aren’t able to decipher the meaning of every
piece of text.
• Example: personal voice assistants such as Siri and Alexa recognize
patterns in speech and provide a useful response.
• Example: email filters that sort messages into Primary, Social, and
Promotions.
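As a toy illustration of the email-filter example, the sketch below sorts a message into Primary, Social, or Promotions by counting keyword matches. The keyword lists and the categorize function are hypothetical. Real filters rely on trained models rather than fixed word lists, so this only shows the general shape of the task.

```python
import re

# Hypothetical keyword lists, chosen only for this illustration.
KEYWORDS = {
    "Promotions": {"sale", "discount", "offer", "coupon"},
    "Social": {"friend", "followed", "liked", "commented"},
}


def categorize(text):
    """Return the category with the most keyword hits, else 'Primary'."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    best, best_hits = "Primary", 0
    for category, vocab in KEYWORDS.items():
        hits = len(words & vocab)
        if hits > best_hits:
            best, best_hits = category, hits
    return best


print(categorize("Huge sale: 50% discount on all shoes"))
print(categorize("Ravi commented on your photo"))
print(categorize("Minutes of today's meeting attached"))
```

A rule set like this breaks as soon as the wording changes, which is one way to see why models trained in one domain do not generalize well to others.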
Machine-generated data
• Machine-generated data is information that’s
automatically created by a computer, process,
application, or other machine without human
intervention.
• Machine-generated data is becoming a major
data resource and will continue to be one.
• The analysis of machine data relies on highly
scalable tools, due to its high volume and
speed. Examples of machine data are web
server logs, call detail records, network event
logs, and telemetry.
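A small sketch of working with machine-generated data: the lines below imitate web server access log entries in the Common Log Format, and a regular expression pulls out the fields. The log lines themselves are invented for illustration.

```python
import re
from collections import Counter

# Invented sample lines in the Common Log Format.
LOG_LINES = [
    '192.168.1.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.168.1.11 - - [10/Oct/2023:13:55:40 +0000] "GET /missing.html HTTP/1.1" 404 153',
    '192.168.1.10 - - [10/Oct/2023:13:56:01 +0000] "POST /login HTTP/1.1" 200 512',
]

PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)

# Count responses per HTTP status code, skipping lines that do not match.
status_counts = Counter()
for line in LOG_LINES:
    match = PATTERN.match(line)
    if match:
        status_counts[match.group("status")] += 1

print(dict(status_counts))
```

On a real server the same loop would run over millions of lines per day, which is why the analysis of machine data depends on scalable tools.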
Graph-based or Networked Data
• A graph is a mathematical structure to model
pair-wise relationships between objects.
• Graph or network data is, in short, data that
focuses on the relationship or adjacency of
objects.
• Graph structures use nodes, edges, and
properties to represent and store graph data.
Contd.,
• Graph-based data is a natural way to represent
social networks, and its structure allows you to
calculate specific metrics such as the influence
of a person and the shortest path between two
people.
• Examples:
• LinkedIn: business colleagues. Twitter:
follower list.
• Facebook: connecting edges show
friends.
Graph-based structure
• Graph-based data is queried with specialized
query languages such as SPARQL.
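The two metrics mentioned earlier, the shortest path between two people and a person's influence, can be sketched on a tiny friendship graph. The names and connections are invented, the shortest path uses a plain breadth-first search, and "influence" is approximated here by the number of direct connections, which is only one of many possible measures.

```python
from collections import deque

# Invented friendship graph: each person maps to a list of direct friends.
FRIENDS = {
    "Asha": ["Ravi", "Meena"],
    "Ravi": ["Asha", "Kumar"],
    "Meena": ["Asha", "Kumar"],
    "Kumar": ["Ravi", "Meena", "Divya"],
    "Divya": ["Kumar"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a shortest list of nodes or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = shortest_path(FRIENDS, "Asha", "Divya")

# Crude influence measure (an assumption): the person with the most friends.
most_connected = max(FRIENDS, key=lambda person: len(FRIENDS[person]))

print(path)
print(most_connected)
```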
Audio, image, and video
• Audio, image, and video are data types that
pose specific challenges to a data scientist.
• Tasks that are trivial for humans, such as
recognizing objects in pictures, turn out to be
challenging for computers.
• MLBAM (Major League Baseball Advanced
Media) announced in 2014 that it would increase
video capture to approximately 7 TB per game
for the purpose of live, in-game analytics.
Contd.,
• Recently a company called DeepMind
succeeded at creating an algorithm that’s
capable of learning how to play video games.
• This algorithm takes the video screen as input
and learns to interpret everything via a
complex process of deep learning.
Streaming Data
• Streaming data flows into the system when an
event happens instead of being loaded into a
data store in a batch.
• Examples are the “What’s trending” on
Twitter, live sporting or music events, and the
stock market.
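The difference from batch loading can be sketched with a Python generator that hands over one event at a time. The event_stream function and its hashtags are stand-ins for a live feed and are purely illustrative. The running count is updated as each event arrives, so a "trending" answer is available at any moment without waiting for a full batch.

```python
from collections import Counter


def event_stream():
    """Stand-in for a live feed: yields one hashtag per event."""
    for tag in ["#cricket", "#music", "#cricket", "#stocks", "#cricket", "#music"]:
        yield tag


trending = Counter()
for tag in event_stream():
    # State is updated per event instead of after loading a whole batch.
    trending[tag] += 1

top_tag, top_count = trending.most_common(1)[0]
print(top_tag, top_count)
```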