Presented by,
Gavara Sai Sri Lakshmi Alekhya,
MTech Graduate
Big Data
• Big data is data whose scale, diversity and complexity require new
architecture, techniques, algorithms and analytics to manage it and
extract value and hidden knowledge from it.
• Put simply, big data is like “small data”, only bigger in size.
• Because the data is bigger, it requires different approaches:
techniques, tools, and architecture.
• Big data aims to solve new problems, or old problems in a better
way.
• Big data generates value from the storage and processing of very
large quantities of digital information that cannot be analyzed with
traditional computing techniques.
Generating Big data
Analysis - Big Data Generation
• Walmart handles more than 1 million customer transactions every
hour.
• Facebook handles 40 billion photos from its user base.
• Facebook generates 10 TB of data daily.
• Twitter generates 7 TB of data daily.
• Decoding the human genome originally took 10 years of processing;
it can now be achieved in one week.
4 V’s in Big Data
Big Data to Value
• Big data is not about the size of the data; it is
mainly about the value within the data.
Why Is Big Data Needed?
• The growth of big data is driven by:
  • Increase in storage capacities.
  • Increase in processing power.
  • Availability of data (different data
types).
• Every day we create 2.5
quintillion bytes of data.
• IBM claims 90% of the data in the
world today has been created in the
last two years alone.
Big Data Analytics
• Examining huge amounts of data.
• Accurate information.
• Identification of hidden patterns and unknown
correlations.
• Competitive advantage.
• Better business decisions, both strategic and
operational.
• Effective marketing, customer satisfaction, increased
revenue.
Applications of Big Data
Risks of Big Data
• Organizations can be overwhelmed by the data.
  • It needs the right people solving the
right problems.
• Costs can escalate too fast.
  • It is not necessary to capture 100% of the data.
• Many sources of big data raise privacy concerns.
  • Self-regulation and legal regulation are needed.
Challenges of Big Data
• Uncertainty of the data management landscape
• The big data talent gap
• Getting data into the big data platform
• Synchronization across the data sources
• Getting useful information out of the big data platform
Big Data Analytics Technologies
• NoSQL: non-relational, or at least non-SQL, database
solutions such as HBase (also a part of the Hadoop
ecosystem), Cassandra, MongoDB, Riak, CouchDB and
many others.
• Hadoop: an ecosystem of software packages,
including MapReduce, HDFS and a whole host of
other software packages.
• Apache Hadoop is a framework that allows for the distributed
processing of large data sets across clusters of commodity
computers using a simple programming model.
• It is an open-source data management platform with scale-out storage and
distributed processing.
• Hadoop is a system for large-scale data processing.
• It has two main components:
Hadoop = HDFS + MapReduce
HDFS – Hadoop Distributed File System
• HDFS (storage and file system): HDFS is a
scalable, fault-tolerant, reliable distributed file
system that provides high-throughput access to
data.
• NameNode:
  • Master of the system.
  • Maintains and manages the blocks which are
present on the DataNodes.
• DataNodes:
  • Slaves (workers) which are deployed on each machine
and provide the actual storage.
  • Responsible for serving read and write
requests for the clients.
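The split between a metadata-only master and storage-holding workers can be sketched with a toy in-memory model. This is only an illustration and not the HDFS API: the class and method names, the 4-byte block size, the replication factor of 2 and the round-robin placement are all assumptions made for the example, and reads are routed through the NameNode object purely to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Worker that holds the actual block contents."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes

    def write(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Master that tracks metadata: which blocks live on which DataNodes."""

    def __init__(self, datanodes, block_size=4, replication=2):
        # block_size and replication are illustrative values, not HDFS defaults.
        self.datanodes = datanodes
        self.block_size = block_size
        self.replication = replication
        self.block_map = {}  # filename -> list of (block_id, [DataNode names])

    def put(self, filename, data):
        entries = []
        n = len(self.datanodes)
        for start in range(0, len(data), self.block_size):
            index = start // self.block_size
            block_id = f"{filename}#{index}"
            # Round-robin placement: each block is copied to `replication` nodes.
            targets = [self.datanodes[(index + r) % n] for r in range(self.replication)]
            for node in targets:
                node.write(block_id, data[start:start + self.block_size])
            entries.append((block_id, [node.name for node in targets]))
        self.block_map[filename] = entries

    def get(self, filename, failed=()):
        by_name = {node.name: node for node in self.datanodes}
        chunks = []
        for block_id, holders in self.block_map[filename]:
            # Read from the first replica whose DataNode is still available.
            alive = [h for h in holders if h not in failed]
            if not alive:
                raise IOError(f"all replicas of {block_id} are unavailable")
            chunks.append(by_name[alive[0]].read(block_id))
        return b"".join(chunks)


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode(nodes)
namenode.put("log.txt", b"hello big data")
print(namenode.get("log.txt"))
print(namenode.get("log.txt", failed={"dn1"}))
```

Because every block is stored on two different DataNodes, the file can still be reassembled when one node is marked as failed, which is the fault-tolerance idea described above.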
HDFS Architecture
MapReduce
• A MapReduce job usually splits the input data set into independent
chunks which are processed by the map tasks in a completely parallel
manner. The framework sorts the outputs of the maps, which are then
input to the reduce tasks. Typically both the input and the output of
the job are stored in a file system.
• It has two phases.
• Mapper phase:
  • Processes a key/value pair to generate intermediate key/value pairs.
• Reducer phase:
  • Merges all intermediate values associated with the same key.
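The two phases can be sketched with a word count run in a single process. This is a minimal illustration of the programming model rather than Hadoop code: the function names `mapper`, `reducer` and `run_job` and the sample input are assumptions, and the grouping step stands in for the sort that the framework performs between the phases.

```python
from collections import defaultdict


def mapper(_key, line):
    """Map phase: turn one input pair into intermediate (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: merge all intermediate values that share a key."""
    return word, sum(counts)


def run_job(records):
    # Run the mapper over every input record.
    intermediate = []
    for key, value in records:
        intermediate.extend(mapper(key, value))

    # Group intermediate values by key (the framework's sort/shuffle step).
    groups = defaultdict(list)
    for word, count in intermediate:
        groups[word].append(count)

    # Run the reducer once per key, in sorted key order.
    return dict(reducer(word, groups[word]) for word in sorted(groups))


lines = [(0, "big data is big"), (1, "data is value")]
print(run_job(lines))  # {'big': 2, 'data': 2, 'is': 2, 'value': 1}
```

In a real cluster the map calls would run in parallel on separate chunks of the input; here they run one after another, which does not change the result.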
MapReduce Architecture
HDFS and MapReduce
Hadoop Eco-System
PIG:
Pig was initially developed at Yahoo Research around 2006 and moved into the
Apache Software Foundation in 2007. It allows people using Apache Hadoop to
focus more on analyzing massive data sets and spend less time
writing mapper and reducer programs.
The Pig programming language is designed to handle any kind of data, hence the
name!
• Pig consists of two components: the language, called Pig Latin, and
an execution environment in which Pig Latin programs are executed.
HIVE:
Apache Hive is a data warehouse system for Apache Hadoop.
Hive is a technology developed by Facebook that turns Hadoop into
a data warehouse, complete with an extension of SQL for querying.
Hive is queried using HiveQL, which is a declarative language.
In Pig Latin the dataflow is described, whereas in Hive the results must be described;
Hive itself works out a dataflow to get those results.
Hive requires a schema, and may have more than one.
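The difference between describing a dataflow and describing a result can be illustrated in plain Python. This sketch uses neither Pig nor Hive: the sample `sales` data is invented, the step-by-step version only imitates the dataflow style, and the standard-library `sqlite3` module stands in for a declarative SQL engine.

```python
import sqlite3
from collections import defaultdict

# Invented sample data: (customer, amount) pairs.
sales = [("alice", 30), ("bob", 5), ("alice", 20), ("carol", 50), ("bob", 45)]

# Dataflow style: each step (filter, group, aggregate) is spelled out in order.
filtered = [(customer, amount) for customer, amount in sales if amount >= 10]
grouped = defaultdict(list)
for customer, amount in filtered:
    grouped[customer].append(amount)
dataflow_result = sorted((customer, sum(amounts)) for customer, amounts in grouped.items())

# Declarative style: only the desired result is stated; the engine plans the steps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
declarative_result = conn.execute(
    "SELECT customer, SUM(amount) FROM sales "
    "WHERE amount >= 10 GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(dataflow_result)
print(declarative_result)
```

Both versions produce the same totals per customer; the table definition in the second version also shows why a declarative engine needs a schema before it can answer a query.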
OOZIE:
Oozie is a Java-based web application that runs in a Java servlet container and
uses a database to store workflow definitions, where a workflow is a collection of
actions. It is used to manage Hadoop jobs.
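A workflow as a collection of actions can be sketched as a mapping from each action to the actions it waits for. This is not Oozie's definition format or API: the action names are hypothetical, the dependency ordering is an assumption added for the example, and the standard-library `graphlib` module (Python 3.9 or later) is used to decide the run order.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: action name -> set of actions that must finish first.
workflow = {
    "import_logs": set(),
    "clean_logs": {"import_logs"},
    "count_words": {"clean_logs"},
    "export_report": {"count_words"},
}


def run_workflow(definition, actions):
    """Run every action once, after all of the actions it depends on."""
    executed = []
    for name in TopologicalSorter(definition).static_order():
        actions[name]()
        executed.append(name)
    return executed


# Stand-in actions that only record that they ran.
log = []
actions = {name: (lambda n=name: log.append(f"ran {n}")) for name in workflow}

order = run_workflow(workflow, actions)
print(order)
```

Storing the definition as data, separate from the code that runs it, mirrors the idea of keeping workflow definitions in a database.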
HBASE:
HBase is a non-relational, distributed, column-oriented database,
whereas HDFS is a file system.
• It is built and run on top of HDFS.
• It is an open-source, versioned, distributed data management system
based on Google's Bigtable.
• It is written in Java. It can serve as the input and output for
MapReduce jobs.
• It suits workloads in which, for instance, read and write operations involve all rows but only a small
subset of all columns.
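The access pattern in the last point, all rows but only a few columns, can be sketched by comparing a row layout with a column layout. This is a toy illustration and not the HBase data model or client API; the sample records and field names are invented.

```python
# Invented sample records in a row-oriented layout: one dict per row.
rows = [
    {"id": 1, "name": "alice", "city": "Pune", "clicks": 12},
    {"id": 2, "name": "bob", "city": "Delhi", "clicks": 7},
    {"id": 3, "name": "carol", "city": "Chennai", "clicks": 21},
]

# Column-oriented layout: one list per column, with values aligned by position.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Row layout: every full record is visited even though one field is needed.
total_from_rows = sum(row["clicks"] for row in rows)

# Column layout: only the single column of interest is read.
total_from_columns = sum(columns["clicks"])

print(total_from_rows, total_from_columns)  # 40 40
```

Both layouts give the same answer; the column layout simply avoids touching the `name` and `city` values when only `clicks` is wanted.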
SQOOP:
Sqoop is a tool used to transfer data from relational database environments such as
Oracle, MySQL and PostgreSQL into the Hadoop environment.
It is a command-line tool for transferring data between
relational databases and Hadoop.
MAHOUT:
Mahout is a library for machine learning and data mining, divided
into four main groups: collaborative filtering, categorization, clustering, and
parallel frequent pattern mining.
Mahout belongs to the subset of machine-learning libraries that can be executed in a
distributed mode, run by MapReduce.
FLUME:
Flume is open-source software, created by Cloudera, that acts as
a service for gathering and moving large amounts of data around a
Hadoop cluster as the data is produced or shortly afterwards.
The primary use case of Flume is to gather log files from all machines in a cluster and
persist them in a centralized store.
Conclusion
• Real-time big data is not just a process for storing petabytes or
exabytes of data in a data warehouse; it is about the ability to make
better decisions and take meaningful actions at the right time.
• Fast forward to the present, and technologies like Hadoop give you
the scale and flexibility to store data before you know how you are
going to process it.
• Technologies such as MapReduce, Hive and Impala enable you to
run queries without changing the data structures underneath.
• Big data offers commercial opportunities of a comparable scale to
enterprise software in the late 1980s.
Vendors Using Big Data (Hadoop)
Future
• Our new research finds that organizations use big data to:
  • target customer-centric outcomes,
  • tap into internal data, and
  • build a better information ecosystem.
A Glimpse of Bigdata - Introduction
