BIG DATA TRAINING
Ranga Vadlamudi
March 2014
  
What is Big Data
• Volume: Large amounts of data at rest
• Velocity: Milliseconds to seconds to respond
• Variety: Data in many forms (structured, unstructured, multimedia, text, etc.)
• Veracity: Data in doubt
  
• 30 billion pieces of content a month
• 1 petabyte of content every day
• 2 billion videos watched every day
• 3 billion people will be online
• Sharing 8 zettabytes of data
	
  	
  
CAP THEOREM
(Consistency, Availability, Partition Tolerance)
  
Big Data Solutions
Big Data
• Real-Time Querying
• Batch Querying
• Mining & Analytics
• Machine Learning
• Storage
  
Technology	
  
Background
• Underlying technology invented by Google
• Google Bigtable & Google File System
• Doug Cutting created Nutch, and Hadoop was spun off at Yahoo
• Yahoo played a key role in developing Hadoop for enterprise applications
  
Hadoop
• Is a framework
• Built on commodity hardware
• Implements a computational paradigm called MapReduce
• Provides a distributed file system called HDFS to store data
• Node failures are handled automatically
Data Becomes Bottleneck
• Getting data to processors is expensive
• Typical disk data transfer rate: 75 MB/sec
• 100 GB data transfer: approx. 22 mins
• New approach is needed
	
  
Hadoop Solves
• Problems where you have a lot of data
• Mixture of complex and structured data
• Speeds up computations by distribution
• Mantra: take the computation to the data, don’t bring the data to the computation
Hadoop Distributions
  
Hadoop Architecture
• Master-slave philosophy
• Designed to run on a large number of machines
• Machines don’t share memory or disk
• Rack them up and run Hadoop on each machine
  
Hadoop Architecture
• Data is divided and spread across servers
• Hadoop keeps track of where the data is
• Hadoop replicates data into multiple copies to avoid a single point of failure
• MapReduce is a programming model for processing large sets of data in parallel
• Map the operation out to all servers
• Shuffle the results
• Reduce the results back into one result set
Hadoop Components
  
HDFS
(Hadoop Distributed File System)
  	
  
HDFS
• Distributed file system
• Highly fault tolerant
• An HDFS instance can span many servers
• Holds large datasets, from terabytes to petabytes
• Moving computation is cheaper than moving data
• Large block sizes (128 MB, for example)
HDFS Layout
  
Cloudera Manager
• Management software to manage the Hadoop ecosystem
• Helps install, manage and maintain a cluster
• Resource consumption tracking
• Proactive health checks
• Alerting
• Config changes
	
  
Cloudera Capabilities
  
Demo Cloudera
Demo Cassandra
Demo MongoDB

Questions?
  

