SlideShare una empresa de Scribd logo
1 de 31
Descargar para leer sin conexión
Introduction	
to
Big	Data
By:	Haluan	Mohammad	Irsad
Definition
Big data can defined by 3Vs (Three Vs):
– Volume, starts as low as 1 terabyte and it has no upper limit.
– Velocity, data volume per unit time, should be at least 30KB/sec.
– Variety, add unstructured & semi-structured to structured data.
Volume
Big data is composed of huge numbers of very small transactions that come in variety
formats.
Data produce true value only after they’re aggregated and analyzed.
Velocity
Required latency less than 100ms, measured from the time data is created to time has
the responds.
Throughput requirement can easily be as high as 1,000 messages per second.
Variety
Composing of a combination of datasets with differing underlying structures (structured,
semi-structured, or structured).
Heterogeneous format: graphics, JSON, XML, CSV, and log files.
Idetifying by	the	
Sources
Data		nowadays	generated	by:
• Humans
• Machine
• Sensors
Typical	sources:
• Social	media
• Financial	transactions
• Health	records
• Click	streams
• Log	files
• Internet	of	Things
Problem
■ Managing volume of data, caused by overloading volume
■ Maintaining system performance, caused by low velocity of data access
■ Avoiding disjunction of data, caused by variety of data structure and formats.
How to accomplish those?
Managing Volume
Scalable database, use NoSQL DBMS (MongoDB, Cassandra DB, Titan DB)
Maintaining Performance
■ For Batch Processing, use Hadoop MapReduce
■ For Stream Processing, use Apache Spark, Apache Storm, Apache Drill
Avoiding Disjunction
■ Use Flat Storage Architecture / Data Lake, to hold huge volumes of multi-structured
data.
■ Use Hadoop Distributed File System (HDFS), to deploy to the machine.
Hadoop
Digging	deeper	into
What is Hadoop?
A framework that allows for the distributed processing of large data sets across clusters
of computers using simple programming models.
Why Hadoop?
■ Most proven framework in industry nowadays
■ Open Source
■ Rich features & functionalities
■ Rich support plugins in the ecosystem
=> (https://hadoopecosystemtable.github.io/)
When to use Hadoop?
■ For processing large volumes of data
■ For parallel data processing
■ For storing a diverse set of data
When not to use Hadoop?
■ For a relational database system
■ For a general network network file system
■ For non-parallel data processing
Core Functions
■ Data Storage
■ Data Processing
■ Resource Management
Data Storage
Hadoop use HDFS (Hadoop Distributed File System) to store the data.
HDFS is a distributed file system designed to fault tolerant and deployed on low-cost
hardware
HDFS’ Goals
■ Hardware Failure, detection of faults and quick automatic recovery.
■ Streaming Data Access, emphasis on high throughput of data access.
■ Large Data Sets, provide high aggregate data bandwidth and scale to hundreds of
nodes in a single cluster and support tens of millions of files in a single instance.
■ Simple Coherency Model, HDFS applications need a write-once-read-many access
model for files.
■ Portability, designed to be easily move from one platform to another.
Data Processing
Data processing has two ways, batch and real-time.
■ Batch processing, execution of a series of jobs.
– Use Hadoop MapReduce
■ Real-time processing, execution of instantaneously jobs.
– Use Apache Spark
Hadoop	MapReduce
• MapReduce	is	a	parallel	distributed	processing	
that	can	be	used	to	process	large	of	data	in	
batch	to	transform	it	into	manageable-size	
data.
• This	work	is	done	in	two	steps:
1. Map	the	Data,	this	stage	is	to	delegate	
the	data	into	key-value	pairs	&	divided	
into	fragments,	then	assigned	to	map	
tasks.
2. Reduce	the	Data,	this	stage	is	the	
combination	of	the Shuffle stage	and	
the Reduce stage,	the	goals	of	this	stage	
is	to	process	the	data	result	of	map	
tasks,	then	produce	a	new	set	of	output	
which	will	stored	in	the	HDFS.
Apache	Spark
Is	a	compute	engine	for	Hadoop	data,	provides	
expressive	programming	model	(SparkQL),	stream	
processing,	machine	learning	(MLib),	and	graph	
computation	(GraphX).
Resource Management
Manage all resources in the Hadoop cluster, to monitor if there are any faults, job
scheduling, and do quick automatic recovery.
Hadoop use YARN.
Hadoop	YARN
• The	ResourceManager is	the	ultimate	authority	
that	arbitrates	resources	among	all	the	
applications	in	the	system	(cluster).
• The	NodeManager is	the	per-machine	framework	
agent	who	is	responsible	for	containers,	
monitoring	their	resource	usage	(cpu,	memory,	
disk,	network)	and	reporting	the	same	to	the	
ResourceManager
YARN (cot’d)
■ The Scheduler is responsible for allocating resources to the various running
applications subject to familiar constraints of capacities, queues, etc.
– Performs no monitoring or tracking of status for the application
– No guarantees about restarting failed tasks either due to application failure or
hardware failures
– Performs its scheduling function based on the resource requirements of the
applications
Analyzing Data
Process of inspecting, cleansing, transforming, and modeling data with the goal of
discovering useful information, suggesting conclusions, and supporting decision-
making.
The goal of analyzing data is to leverage your business to grow more higher.
Hadoop support this activity with the help from Apache Mahout.
Apache	Mahout
Library	that	help	creates	a	machine	learning	
applications.
The	main	functions	is	to	help	solve:
1. Classification,	assigning	a	set	of	data	to	known	
category.
2. Clustering,	grouping	a	set	of	objects	based	on	
the	similarity.
3. Recommendations,	give	list	of	
recommendation	based	on	statistic	analyzing.
Mahout	provides	the	algorithm	to	solve	all	problem	
above	and	allow	to	customized	them	on	demand.
Visualization
Hadoop by default doesn’t support to visualize the data.
To visualize the data, use Apache Zeppelin (http://zeppelin.apache.org/).
Apache	Zeppelin
Apache	Zeppelin	runs	on	top	of	Apache	Spark,	but	
provide	pluggable	interpreter	APIs	to	support	other	
data	processing	system.
Benefits
Hadoop give some benefits:
■ Ease of scaling
Hadoop is designed as a distributed system
■ Performance
Hadoop is designed to works in distributed & parallel processing
■ Availability & Reliability
Hadoop platform is providing data protection and automatic failover
configuration
Conclusion
■ Big data is not a barrier, but only a data that need to be managed properly.
■ Used a proper tools to managed them.
■ Prepare the strategy to processing the data (batch or stream).
■ Managed & maintain the system carefully.
■ Use plugins that needed by functional requirements.
■ Grow your business with Data-Driven Approach
FIN

Más contenido relacionado

La actualidad más candente

Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Simplilearn
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component rebeccatho
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Raja Chiky
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemMd. Hasan Basri (Angel)
 
The AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewThe AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewAmazon Web Services
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Edureka!
 

La actualidad más candente (20)

Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Big Data ppt
Big Data pptBig Data ppt
Big Data ppt
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
The AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewThe AWS Big Data Platform – Overview
The AWS Big Data Platform – Overview
 
Big data Ppt
Big data PptBig data Ppt
Big data Ppt
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
 

Destacado

Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Lucas Jellema
 
Tracxn Research - Finance & Accounting Landscape, February 2017
Tracxn Research - Finance & Accounting Landscape, February 2017Tracxn Research - Finance & Accounting Landscape, February 2017
Tracxn Research - Finance & Accounting Landscape, February 2017Tracxn
 
2015 Internet Trends Report
2015 Internet Trends Report2015 Internet Trends Report
2015 Internet Trends ReportIQbal KHan
 
Tracxn Research - Mobile Advertising Landscape, February 2017
Tracxn Research - Mobile Advertising Landscape, February 2017Tracxn Research - Mobile Advertising Landscape, February 2017
Tracxn Research - Mobile Advertising Landscape, February 2017Tracxn
 
Tracxn Research - Industrial Robotics Landscape, February 2017
Tracxn Research - Industrial Robotics Landscape, February 2017Tracxn Research - Industrial Robotics Landscape, February 2017
Tracxn Research - Industrial Robotics Landscape, February 2017Tracxn
 
2017 iosco research report on financial technologies (fintech)
2017 iosco research report on  financial technologies (fintech)2017 iosco research report on  financial technologies (fintech)
2017 iosco research report on financial technologies (fintech)Ian Beckett
 
Tracxn Research - Healthcare Analytics Landscape, February 2017
Tracxn Research - Healthcare Analytics Landscape, February 2017Tracxn Research - Healthcare Analytics Landscape, February 2017
Tracxn Research - Healthcare Analytics Landscape, February 2017Tracxn
 
Tracxn Research - Construction Tech Landscape, February 2017
Tracxn Research - Construction Tech Landscape, February 2017Tracxn Research - Construction Tech Landscape, February 2017
Tracxn Research - Construction Tech Landscape, February 2017Tracxn
 
Tugas 4 0317-imelda felicia-1412510545
Tugas 4 0317-imelda felicia-1412510545Tugas 4 0317-imelda felicia-1412510545
Tugas 4 0317-imelda felicia-1412510545imeldafelicia
 
Comparing 30 MongoDB operations with Oracle SQL statements
Comparing 30 MongoDB operations with Oracle SQL statementsComparing 30 MongoDB operations with Oracle SQL statements
Comparing 30 MongoDB operations with Oracle SQL statementsLucas Jellema
 
Tracxn Research - Insurance Tech Landscape, February 2017
Tracxn Research - Insurance Tech Landscape, February 2017Tracxn Research - Insurance Tech Landscape, February 2017
Tracxn Research - Insurance Tech Landscape, February 2017Tracxn
 
MongoDB NoSQL database a deep dive -MyWhitePaper
MongoDB  NoSQL database a deep dive -MyWhitePaperMongoDB  NoSQL database a deep dive -MyWhitePaper
MongoDB NoSQL database a deep dive -MyWhitePaperRajesh Kumar
 
Tracxn Research - Chatbots Landscape, February 2017
Tracxn Research - Chatbots Landscape, February 2017Tracxn Research - Chatbots Landscape, February 2017
Tracxn Research - Chatbots Landscape, February 2017Tracxn
 
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION Elvis Muyanja
 
Tugas4 0317-nasrulakbar-141250552
Tugas4 0317-nasrulakbar-141250552Tugas4 0317-nasrulakbar-141250552
Tugas4 0317-nasrulakbar-141250552Nasrul Akbar
 
What’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial DatabasesWhat’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial DatabasesAmazon Web Services
 
Migrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration ServiceMigrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration ServiceAmazon Web Services
 
Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Apache Apex
 
Europa AI startup scaleups report 2016
Europa AI startup scaleups report 2016 Europa AI startup scaleups report 2016
Europa AI startup scaleups report 2016 Ian Beckett
 

Destacado (20)

Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
 
Tracxn Research - Finance & Accounting Landscape, February 2017
Tracxn Research - Finance & Accounting Landscape, February 2017Tracxn Research - Finance & Accounting Landscape, February 2017
Tracxn Research - Finance & Accounting Landscape, February 2017
 
2015 Internet Trends Report
2015 Internet Trends Report2015 Internet Trends Report
2015 Internet Trends Report
 
Tracxn Research - Mobile Advertising Landscape, February 2017
Tracxn Research - Mobile Advertising Landscape, February 2017Tracxn Research - Mobile Advertising Landscape, February 2017
Tracxn Research - Mobile Advertising Landscape, February 2017
 
Tracxn Research - Industrial Robotics Landscape, February 2017
Tracxn Research - Industrial Robotics Landscape, February 2017Tracxn Research - Industrial Robotics Landscape, February 2017
Tracxn Research - Industrial Robotics Landscape, February 2017
 
Google Cloud Spanner Preview
Google Cloud Spanner PreviewGoogle Cloud Spanner Preview
Google Cloud Spanner Preview
 
2017 iosco research report on financial technologies (fintech)
2017 iosco research report on  financial technologies (fintech)2017 iosco research report on  financial technologies (fintech)
2017 iosco research report on financial technologies (fintech)
 
Tracxn Research - Healthcare Analytics Landscape, February 2017
Tracxn Research - Healthcare Analytics Landscape, February 2017Tracxn Research - Healthcare Analytics Landscape, February 2017
Tracxn Research - Healthcare Analytics Landscape, February 2017
 
Tracxn Research - Construction Tech Landscape, February 2017
Tracxn Research - Construction Tech Landscape, February 2017Tracxn Research - Construction Tech Landscape, February 2017
Tracxn Research - Construction Tech Landscape, February 2017
 
Tugas 4 0317-imelda felicia-1412510545
Tugas 4 0317-imelda felicia-1412510545Tugas 4 0317-imelda felicia-1412510545
Tugas 4 0317-imelda felicia-1412510545
 
Comparing 30 MongoDB operations with Oracle SQL statements
Comparing 30 MongoDB operations with Oracle SQL statementsComparing 30 MongoDB operations with Oracle SQL statements
Comparing 30 MongoDB operations with Oracle SQL statements
 
Tracxn Research - Insurance Tech Landscape, February 2017
Tracxn Research - Insurance Tech Landscape, February 2017Tracxn Research - Insurance Tech Landscape, February 2017
Tracxn Research - Insurance Tech Landscape, February 2017
 
MongoDB NoSQL database a deep dive -MyWhitePaper
MongoDB  NoSQL database a deep dive -MyWhitePaperMongoDB  NoSQL database a deep dive -MyWhitePaper
MongoDB NoSQL database a deep dive -MyWhitePaper
 
Tracxn Research - Chatbots Landscape, February 2017
Tracxn Research - Chatbots Landscape, February 2017Tracxn Research - Chatbots Landscape, February 2017
Tracxn Research - Chatbots Landscape, February 2017
 
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
 
Tugas4 0317-nasrulakbar-141250552
Tugas4 0317-nasrulakbar-141250552Tugas4 0317-nasrulakbar-141250552
Tugas4 0317-nasrulakbar-141250552
 
What’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial DatabasesWhat’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial Databases
 
Migrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration ServiceMigrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration Service
 
Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)
 
Europa AI startup scaleups report 2016
Europa AI startup scaleups report 2016 Europa AI startup scaleups report 2016
Europa AI startup scaleups report 2016
 

Similar a Introduction to Big Data

Fundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and HadoopFundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and HadoopArchana Gopinath
 
Architecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & ManipulationArchitecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & ManipulationGeorge Long
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoopMohit Tare
 
Foxvalley bigdata
Foxvalley bigdataFoxvalley bigdata
Foxvalley bigdataTom Rogers
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.pptSathish24111
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystemnallagangus
 
MOD-2 presentation on engineering students
MOD-2 presentation on engineering studentsMOD-2 presentation on engineering students
MOD-2 presentation on engineering studentsrishavkumar1402
 
Google Data Engineering.pdf
Google Data Engineering.pdfGoogle Data Engineering.pdf
Google Data Engineering.pdfavenkatram
 
Data Engineering on GCP
Data Engineering on GCPData Engineering on GCP
Data Engineering on GCPBlibBlobb
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopMr. Ankit
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvewKunal Khanna
 
Hadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFSHadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFSpraveen bhat
 

Similar a Introduction to Big Data (20)

Big Data
Big DataBig Data
Big Data
 
Fundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and HadoopFundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and Hadoop
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Architecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & ManipulationArchitecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & Manipulation
 
getFamiliarWithHadoop
getFamiliarWithHadoopgetFamiliarWithHadoop
getFamiliarWithHadoop
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
Foxvalley bigdata
Foxvalley bigdataFoxvalley bigdata
Foxvalley bigdata
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
 
MOD-2 presentation on engineering students
MOD-2 presentation on engineering studentsMOD-2 presentation on engineering students
MOD-2 presentation on engineering students
 
Google Data Engineering.pdf
Google Data Engineering.pdfGoogle Data Engineering.pdf
Google Data Engineering.pdf
 
Data Engineering on GCP
Data Engineering on GCPData Engineering on GCP
Data Engineering on GCP
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvew
 
Hadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFSHadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFS
 
Big data
Big dataBig data
Big data
 
hadoop
hadoophadoop
hadoop
 

Último

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Último (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Introduction to Big Data

Notas del editor

  1. RDBMS could scale on read operation, but in write operation you need to drop ACID requirements which is violated RDBMS core rules.
  2. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
  3. Batch: “where data is collected and then processed as one unit with processing completion times on the order of hours or days”