What is Data Analytics?
Why do we need it?
Where do we need it?
We need it to predict outcomes relevant to our goals, to increase
profit, and to use our resources efficiently.
We need it everywhere:
Daily life: before buying groceries, before filling up on fuel
Business: to increase sales
Computer systems: various algorithms (LRU, MRU,
command queuing algorithms)
It is the process of transforming processed information into knowledge.
Have you ever noticed that after you select or view a product on
Amazon, Snapdeal, or Myntra,
advertisements for the same product appear on the other websites and webpages
you visit?
How can Domino's offer you a free PIZZA on
Thursday, and maybe even give
you GARLIC BREAD with a CHEESY DIP
absolutely free?
WHY do supermarkets hold
SALES, and how can they give a product away for free?
IF YOU HAVE SUCH QUESTIONS IN YOUR MIND, THEN
DIVE INTO THE OCEAN OF ANALYSIS
DATA ANALYTICS (Play with data)
ANALYSIS needs DATA
So we need to understand the various types of DATA that
companies work with
1. STRUCTURED DATA
2. SEMI STRUCTURED DATA
3. UNSTRUCTURED DATA
BIG DATA
Structured Data
• It covers all data that can be stored in a SQL database, in
tables with rows and columns.
• Such data has relational keys and can be easily mapped into
pre-designed fields.
• Today, this is the most commonly processed kind of data in software
development and the simplest way to manage information.
• But structured data is commonly estimated to represent only 5 to 10% of
all data.
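To make "rows, columns and relational keys" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are invented for illustration only; they do not come from the slides.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " amount REAL NOT NULL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.5)],
)

# The relational key (orders.customer_id -> customers.id) lets us join
# the two tables and total the spend per customer.
totals = conn.execute(
    "SELECT c.name, SUM(o.amount) "
    "FROM customers c JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.id, c.name ORDER BY c.id"
).fetchall()
print(totals)
```

Every value sits in a pre-designed field, which is why this kind of data is the easiest to query.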
Semi Structured Data
• Semi-structured data is information that doesn't reside in a
relational database but that does have some organizational
properties that make it easier to analyze.
• With some processing you can store it in a relational database (this
can be very hard for some kinds of semi-structured data), but
the semi-structured form exists to save space, improve clarity, or ease computation.
Examples of semi-structured data: CSV and XML documents are
semi-structured documents.
Like structured data, semi-structured data makes up only a small
share of all data (roughly 5 to 10%).
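The "with some processing you can store it in a relational database" step can be sketched as follows, using only the Python standard library. The XML document, the `products` table and the `xml_to_rows` helper are hypothetical examples, not part of the original notes. Note how the second product has no price: the loader has to decide what to do with missing fields, which is one reason the conversion can be hard.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical semi-structured input: tags give it some organization,
# but not every product carries every field.
XML_DOC = """
<products>
  <product id="1"><name>Pizza</name><price>8.50</price></product>
  <product id="2"><name>Garlic bread</name></product>
</products>
""".strip()

def xml_to_rows(xml_text):
    """Flatten each <product> element into an (id, name, price) tuple."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall("product"):
        price = node.findtext("price")  # None when the tag is absent
        rows.append(
            (int(node.get("id")), node.findtext("name"),
             float(price) if price is not None else None)
        )
    return rows

rows = xml_to_rows(XML_DOC)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
stored = conn.execute("SELECT id, name, price FROM products ORDER BY id").fetchall()
print(stored)
```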
Unstructured data
• Unstructured data is commonly estimated to represent around 80% of all data.
• It often includes text and multimedia content.
• Examples include e-mail messages, word processing
documents, videos, photos, audio files, presentations,
webpages and many other kinds of business documents.
• Unstructured data is everywhere.
• In fact, most individuals and organizations conduct their lives
around unstructured data.
• Just as with structured data, unstructured data is either
machine generated or human generated.
Unstructured data
Here are some examples of machine-generated unstructured data:
• Satellite images: This includes weather data or the data that the government captures in
its satellite surveillance imagery. Just think about Google Earth, and you get the picture.
• Scientific data: This includes seismic imagery, atmospheric data, and high energy
physics.
• Photographs and video: This includes security, surveillance, and traffic video.
• Radar or sonar data: This includes vehicular, meteorological, and oceanographic
seismic profiles.
The following list shows a few examples of human-generated unstructured data:
• Social media data: This data is generated on social media platforms such as
YouTube, Facebook, Twitter, LinkedIn, and Flickr.
• Mobile data: This includes data such as text messages and location information.
• Website content: This comes from any site delivering unstructured content, like
YouTube, Flickr, or Instagram.
BIG DATA
• Big Data may well be the Next Big Thing in the IT world.
• Big data burst upon the scene in the first decade of the 21st
century.
• The first organizations to embrace it were online and startup
firms. Firms like Google, eBay, LinkedIn, and Facebook were
built around big data from the beginning.
• Like many new information technologies, big data can bring
about dramatic cost reductions, substantial improvements in the
time required to perform a computing task, or new product and
service offerings.
What is BIG DATA?
• "Big Data" is similar to "small data", but bigger in size.
• Because the data is bigger, it requires different approaches:
– Techniques, tools and architecture
• The aim is to solve new problems, or old problems in a better way.
• Big Data generates value from the storage and processing of very
large quantities of digital information that cannot be analyzed
with traditional computing techniques.
What is BIG DATA
• Walmart handles more than 1 million customer transactions every
hour.
• Facebook handles 40 billion photos from its user base.
• Decoding the human genome originally took 10 years to process; now
it can be achieved in one week.
1st Characteristic of Big Data
Volume
• A typical PC might have had 10 gigabytes of storage in 2000.
• Today, Facebook ingests 500 terabytes of new data every day.
• A Boeing 737 is reported to generate 240 terabytes of flight data during a single
flight across the US.
• Smartphones, the data they create and consume, and sensors
embedded into everyday objects will soon result in billions of new,
constantly updated data feeds containing environmental, location,
and other information, including video.
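To get a feel for these volumes, the short calculation below compares the two figures quoted above (a 10 GB PC from 2000 and 500 TB of new data per day). It assumes decimal units (1 TB = 1000 GB); the variable names are only illustrative.

```python
GB_PER_TB = 1000          # assumption: decimal (SI) units
SECONDS_PER_DAY = 24 * 60 * 60

pc_storage_gb = 10        # typical PC in 2000, per the slide
daily_ingest_tb = 500     # Facebook's daily ingest, per the slide

# How many year-2000 PCs would one day of ingest fill?
pcs_per_day = daily_ingest_tb * GB_PER_TB // pc_storage_gb

# How often would one such PC fill up?
seconds_per_pc = SECONDS_PER_DAY / pcs_per_day

print(pcs_per_day, round(seconds_per_pc, 2))
```

Under these assumptions, one day of ingest would fill 50,000 such PCs, or one roughly every 1.7 seconds.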
2nd Characteristic of Big Data
Velocity
• Clickstreams and ad impressions capture user behavior at
millions of events per second.
• High-frequency stock trading algorithms reflect market
changes within microseconds.
• Machine-to-machine processes exchange data between billions
of devices.
• Infrastructure and sensors generate massive log data in
real time.
• Online gaming systems support millions of concurrent users,
each producing multiple inputs per second.
3rd Characteristic of Big Data
Variety
• Big Data isn't just numbers, dates, and strings. Big Data is also
geospatial data, 3D data, audio and video, and unstructured
text, including log files and social media.
• Traditional database systems were designed to address smaller
volumes of structured data, with fewer updates and a predictable,
consistent data structure.
• Big Data analysis includes different types of data.
Storing Big Data
Analyzing your data characteristics
• Selecting data sources for ANALYSIS
• Eliminating redundant data
• Establishing the role of NoSQL
Overview of Big Data stores
• Data models: key-value, graph, document,
column-family
• Hadoop Distributed File System
• HBase
• Hive
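The four data models listed above can be compared by storing the same small fact ("user 1 is Asha from Pune and follows user 2") in each shape. This is only a sketch with plain Python dictionaries; the keys, field names and values are invented, and real stores add indexing, persistence and distribution on top.

```python
import json

# Key-value: an opaque value looked up by a single key.
kv_store = {"user:1": json.dumps({"name": "Asha", "city": "Pune"})}

# Document: nested, self-describing records that can be queried by field.
doc_store = [
    {"_id": 1, "name": "Asha", "address": {"city": "Pune"}, "follows": [2]},
    {"_id": 2, "name": "Ravi", "address": {"city": "Delhi"}, "follows": []},
]

# Column-family: row key -> column family -> column -> value.
cf_store = {
    "user:1": {
        "profile": {"name": "Asha", "city": "Pune"},
        "social": {"follows:2": "true"},
    }
}

# Graph: nodes plus explicitly typed edges between them.
graph_nodes = {1: {"name": "Asha"}, 2: {"name": "Ravi"}}
graph_edges = [(1, "FOLLOWS", 2)]

# The same question, "which city does user 1 live in?", in three models.
city_kv = json.loads(kv_store["user:1"])["city"]
city_doc = next(d for d in doc_store if d["_id"] == 1)["address"]["city"]
city_cf = cf_store["user:1"]["profile"]["city"]

# A relationship question is most natural in the graph model.
followed_by_1 = [graph_nodes[dst]["name"] for src, rel, dst in graph_edges
                 if src == 1 and rel == "FOLLOWS"]
print(city_kv, city_doc, city_cf, followed_by_1)
```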
HADOOP
• Hadoop is not a type of database, but rather a software ecosystem that
allows for massively parallel computing.
• It is an enabler of certain types of NoSQL distributed databases (such as
HBase), which allow data to be spread across thousands of
servers with little reduction in performance.
• Hadoop is an Apache Software Foundation project that
provides two important things:
– A distributed filesystem called HDFS (Hadoop Distributed File System)
– A framework and API for building and running MapReduce jobs
• HDFS is structured similarly to a regular Unix filesystem, except that
data storage is distributed across several machines.
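The idea of "data storage distributed across several machines" can be sketched in a few lines of Python. This is a toy model, not the HDFS API: the node names, the tiny block size and the round-robin placement are assumptions made for illustration (real HDFS uses much larger blocks and a rack-aware placement policy).

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified stand-in for a real placement policy.
    """
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }

# Hypothetical cluster and file.
nodes = ["node-a", "node-b", "node-c", "node-d"]
data = b"0123456789" * 10             # a 100-byte "file"
blocks = split_into_blocks(data, 32)  # toy block size of 32 bytes
placement = place_blocks(len(blocks), nodes, replication=3)

# If one machine fails, every block is still held by at least one other node.
failed = "node-a"
survivors = {b: [n for n in holders if n != failed]
             for b, holders in placement.items()}
print(len(blocks), placement[0], all(survivors.values()))
```

Keeping several copies of each block is what lets the filesystem lose a machine without losing the file.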
HADOOP
• Google solved the problem of processing very large datasets using an algorithm called MapReduce.
• This algorithm divides the task into small parts and assigns those
parts to many computers connected over the network, and collects
the results to form the final result dataset.
• Doug Cutting, Mike Cafarella and team took the solution provided
by Google and started an open source project called HADOOP in
2005, and Doug named it after his son's toy elephant.
• Hadoop is an Apache open source framework written in Java that
allows distributed processing of large datasets across clusters of
computers using simple programming models.
• Hadoop is designed to scale up from a single server to thousands of
machines, each offering local computation and storage.
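The divide, assign and collect steps described above can be illustrated with a word count, the usual introductory MapReduce example. The sketch below runs in a single Python process and does not use Hadoop's Java API; the function names and the input chunks are made up for illustration. In a real cluster each chunk would be mapped on a different machine.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}

# Each string stands in for a part of the input handled by one worker.
chunks = ["big data needs big tools", "data tools scale"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```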
SO FAR WE HAVE STUDIED THE
VARIOUS TYPES OF DATA
Structured Data –> Maintained by RDBMS software
Unstructured Data –> Managed by NoSQL databases
BIG Data –> Handled by HADOOP (MapReduce framework)
BUT to INCREASE a company's or organisation's PROFIT, once
these data are managed, they need to be THOROUGHLY ANALYZED.
