Sysfore Technologies
#117-120, First Floor, 4th Block, 80 Feet Road, Koramangala, Bangalore 560034
Hadoop and Big Data Analytics
Gartner defines Big Data as “high volume, velocity and variety information
assets that demand cost-effective, innovative forms of information processing
for enhanced insight and decision making”.
Big data is data that, by virtue of its velocity, volume, or variety (the three Vs),
cannot be easily stored or analysed with traditional methods.
The term covers every piece of data your organization has stored to date,
whether on-premises or in the cloud: paper records, digital files, and
structured and unstructured data alike.
There is a deluge of structured and unstructured data that is generated every
second. This is known as Big Data, which can be analysed to help customers
turn that data into insights. AWS provides a broad platform of managed
services, infrastructure and tools to tackle your next Big Data project. It
enables you to collect, store, process, analyse and visualize Big Data on the
cloud. It provides the hardware, software and infrastructure needed to
maintain and scale, so that you can focus on building your application.
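For a concrete (if hedged) illustration of that model: Amazon EMR is AWS's managed Hadoop service, and a small cluster can be launched programmatically with the boto3 SDK. The sketch below is minimal; the cluster name, instance types, region and S3 log bucket are placeholder assumptions, not values from this article.

```python
import boto3  # AWS SDK for Python

# Launch a small managed Hadoop cluster on Amazon EMR.
# Region, names, instance types and the S3 log bucket are
# illustrative placeholders -- substitute your own values.
emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="bigdata-analytics-demo",          # hypothetical cluster name
    ReleaseLabel="emr-6.15.0",              # an EMR release bundling Hadoop
    Applications=[{"Name": "Hadoop"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,                  # 1 master + 2 core nodes
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    LogUri="s3://my-log-bucket/emr-logs/",   # hypothetical bucket
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Started cluster:", response["JobFlowId"])
```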
Some of the common Big Data customer scenarios include Web & E-Tailing,
Telecommunications, Government, Healthcare & Life Sciences, Banking &
Financial Services, and Retail, where Big Data is continuously generated.
How is Big Data consumed by businesses: Businesses can gain a great deal of
insight into how their products are being used by analysing the Big Data they
generate. Big Data analytics is an area of rapidly growing diversity. Analysing
large data sets requires significant compute capacity that can vary in size based
on the amount of input data and the analysis required. This characteristic of
big data workloads is ideally suited to the pay-as-you-go cloud computing
model, where applications can easily scale up and down based on demand.
Using Big Data analytics gives you a clear picture of how your data is
being generated and consumed by your customers. It can be used for predictive
marketing and for planning business growth. It provides:
• Early key indicators that give insight into business trends and, in turn,
business fortunes.
• Analytics results that translate into business advantage.
• More precise analysis and results as more data becomes available.
Limitations of traditional analytics methods: Advances in technology have
resulted in huge volumes of data being generated every second. With
traditional methods, storing, processing and analysing that data to obtain
quality results is time-consuming, costly and ineffective.
• Only a limited amount of high-fidelity raw data is available for analysis.
• Storage is limited relative to the high volume of raw data that is
continuously generated.
• Moving data to the computation does not scale.
• Data is archived regularly to conserve space, which limits the amount of
data available to analytical tools.
• The perception that traditional data warehousing processes are too slow
and too limited in scalability.
• The need to converge data from multiple sources, both structured and
unstructured.
• The realization that time to information is critical to extracting value
from data sources that include mobile devices, RFID, the web and a growing
list of automated sensory technologies.
As requirements change, you can easily resize your environment (horizontally
or vertically) on AWS to meet your needs.
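To make the resizing point concrete, here is a hedged boto3 sketch that scales out the core instance group of a running EMR cluster; the cluster ID and target size are placeholders.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Find the CORE instance group of a running cluster and grow it
# to 10 nodes. "j-XXXXXXXXXXXXX" is a placeholder cluster ID.
cluster_id = "j-XXXXXXXXXXXXX"
groups = emr.list_instance_groups(ClusterId=cluster_id)["InstanceGroups"]
core = next(g for g in groups if g["InstanceGroupType"] == "CORE")

emr.modify_instance_groups(
    ClusterId=cluster_id,
    InstanceGroups=[{
        "InstanceGroupId": core["Id"],
        "InstanceCount": 10,   # new target size (scale out)
    }],
)
```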
In addition, there are at least four major developmental segments that
underscore the diversity found within Big Data analytics: MapReduce,
scalable databases, real-time stream processing and Big Data appliances.
Using Hadoop for Big Data analytics: There is a big difference between Big
Data and Hadoop. The former is an asset, often a complex and ambiguous one,
while the latter is a framework that accomplishes a set of goals and
objectives for dealing with that asset.
Hadoop is an open-source software framework for storing data and running
applications on clusters of commodity hardware. It provides massive storage
for any kind of data, enormous processing power and the ability to handle
virtually limitless concurrent tasks or jobs.
Hadoop is a framework that allows the processing of large data sets. It can
complete in minutes tasks that would take hours with a traditional RDBMS.
Hadoop has two main components:
• HDFS – Hadoop Distributed File System (for storage)
• MapReduce (for processing)
How the Hadoop Distributed File System works: The Hadoop Distributed File
System (HDFS) is the primary storage system used by Hadoop applications. An
HDFS cluster contains one or more data nodes. Incoming data is split into
segments (blocks) and distributed across the data nodes to support parallel
processing. Each segment is then replicated on multiple data nodes so that
processing can continue in the event of a node failure.
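As a toy illustration of that split-and-replicate idea (a simulation only, not HDFS code), the following Python sketch splits a file into fixed-size segments and places each segment's replicas on distinct data nodes; the 128 MB block size and replication factor of 3 mirror common HDFS defaults.

```python
import itertools

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, a common HDFS default
REPLICATION = 3                  # HDFS's default replication factor

def place_blocks(file_size: int, nodes: list[str]) -> dict[int, list[str]]:
    """Split a file into blocks and assign each block's replicas
    to REPLICATION distinct data nodes, round-robin style.
    A toy stand-in for the NameNode's placement policy."""
    num_blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    ring = itertools.cycle(nodes)
    placement = {}
    for block_id in range(num_blocks):
        # take the next REPLICATION distinct nodes from the ring
        placement[block_id] = [next(ring) for _ in range(REPLICATION)]
    return placement

# A 500 MB file across five data nodes -> 4 blocks, 3 replicas each.
layout = place_blocks(500 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4", "dn5"])
for block, replicas in layout.items():
    print(f"block {block}: {replicas}")
```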
While HDFS protects against some types of failure, it is not entirely fault
tolerant. A single NameNode located on a single server is required. If this
server fails, the entire file system shuts down. A secondary NameNode
periodically backs up the primary. The backup data is used to restart the
primary but cannot be used to maintain operation.
HDFS is typically used in a Hadoop installation, yet other distributed file
systems are also supported. The Amazon S3 file system can be used but does
not maintain information on the location of data segments, reducing the ability
of Hadoop to survive server or rack failures. Other file systems, such as the
open-source CloudStore and the MapR file system, do maintain location
information.
Distributed processing is handled by MapReduce: The idea behind
MapReduce is that Hadoop can first map a large data set, and then perform a
reduction on that content for specific results. A reduce function can be thought
of as a kind of filter for raw data. The HDFS system then acts to distribute data
across a network or migrate it as necessary.
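The canonical example is word count. Hadoop Streaming allows the map and reduce functions to be ordinary programs that read stdin and write stdout; the self-contained Python sketch below imitates that flow locally, with a sort standing in for Hadoop's shuffle phase.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map phase: emit (word, 1) for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: pairs arrive sorted by key; sum the counts
    for each distinct word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Local, single-process stand-in for the full framework:
    # map, then sort by key (Hadoop's shuffle does this), then reduce.
    mapped = sorted(mapper(sys.stdin))
    for word, total in reducer(mapped):
        print(f"{word}\t{total}")
```

Run it with, for example, `echo "to be or not to be" | python wordcount.py`; on a real cluster the same mapper and reducer would run in parallel on many TaskTracker nodes.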
The MapReduce engine consists of one JobTracker and multiple TaskTrackers.
Client applications submit jobs to the JobTracker, which assigns each job to a
TaskTracker node. When HDFS or another location-aware file system is in use,
JobTracker takes advantage of knowing the location of each data segment. It
attempts to assign processing to the same node on which the required data
has been placed.
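A toy sketch of that locality preference (again a simulation, not JobTracker code): given the replica locations the file system reports for a block, the scheduler prefers a free node that already holds the data and falls back to a remote node only when it must.

```python
def schedule(block_replicas: list[str], free_nodes: set[str]) -> str:
    """Pick a node for a task over one data block.
    Prefer a free node that already stores a replica (data-local);
    otherwise fall back to any free node (the data must then move)."""
    for node in block_replicas:
        if node in free_nodes:
            return node            # data-local assignment
    return next(iter(free_nodes))  # remote assignment as a fallback

# Block 0 is replicated on dn1 and dn3; dn1 is busy, dn3 is free,
# so the task runs on dn3 without moving the data.
print(schedule(["dn1", "dn3"], {"dn2", "dn3", "dn4"}))  # -> dn3
```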
Apache Hadoop users typically build their own parallelized computing clusters
from commodity servers, each with dedicated storage in the form of a small
disk array or solid-state drive (SSD) for performance. These are commonly
referred to as “shared-nothing” architectures.
Big Data is getting bigger and more important: As more and more data is
collected, its analysis requires scalable, flexible, high-performing tools
that deliver analysis and insight in a timely fashion. Big Data analytics is
a growing field, and the need to parse large data sets from multiple sources
and to produce information in real time or near real time is gaining
importance. IT organizations are exploring various analytics technologies to
parse web-based data sources and extract value from the social networking
boom.