Harnessing Hadoop for Big Data Analytics v0.1
jobinwilson
1.
Transforming Mobile Marketing & Advertising™
Harnessing Hadoop for Big Data Analytics
Jobin Wilson
jobin.wilson@flytxt.com
Confidential. Copyright © 2010 Flytxt B.V. All rights reserved.
2.
Who am I?
• Architect @ Flytxt (Big Data Analytics & Automation)
• Passionate about data, distributed computing, machine learning
• Previously
  • Virtualization & Cloud Lifecycle Management (BMC)
    • Designed and implemented Cloud Life Cycle Management Interface for BMC
  • Large Scale Data Centre Automation (AOL)
    • Implemented Centralized Data Center Management Framework for AOL
  • Workflow Systems & Automation (Accenture)
    • Implemented Service Management Suite for various customers
3.
Session Agenda!
• Data – What's the big deal?
• What is Hadoop (& what it is not)
• Map-Reduce Model & HDFS
• Hadoop Ecosystem & Tools
• Let's get started!
• Q&A
4.
Five computers & a 640k ;-)
• "I think there is a world market for about five computers" – Thomas Watson, Chairman of the board of IBM, 1943
• "640k ought to be enough for anybody" – Attributed to Bill Gates in 1981
• Moore's Law
5.
Data Explosion!
6.
Do I also know what you might do next summer?
• Does your travel company know you visited Goa & Cochin twice in the last two years?
• Collaborative Filtering
• Lots of Data + Statistics = WOW!!!
• BTW, don't worry about the equation
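The collaborative filtering idea on this slide can be sketched in a few lines. This is only an illustration, assuming a tiny made-up table of visit counts (the traveller names and numbers are hypothetical) and user-based cosine similarity, which is one of several collaborative filtering variants:

```python
from math import sqrt

# Hypothetical visit counts: traveller -> {destination: number of visits}.
visits = {
    "asha":  {"Goa": 2, "Cochin": 2},
    "ravi":  {"Goa": 2, "Cochin": 1, "Munnar": 1},
    "meera": {"Shimla": 3, "Manali": 1},
}

def cosine(a, b):
    """Cosine similarity between two sparse visit vectors."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user, data):
    """Score unseen destinations by similarity-weighted visits of other users."""
    scores = {}
    for other, their_visits in data.items():
        if other == user:
            continue
        sim = cosine(data[user], their_visits)
        for place, count in their_visits.items():
            if place not in data[user]:
                scores[place] = scores.get(place, 0.0) + sim * count
    # Highest score first; drop places with no supporting similarity.
    return [p for p, s in sorted(scores.items(), key=lambda kv: -kv[1]) if s > 0]

print(recommend("asha", visits))
```

Travellers with similar histories pull each other's destinations into the ranking; travellers with nothing in common contribute nothing.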
7.
Don't throw away data just because it doesn't "fit"
• Relational tuples, log files, semi-structured textual data (e.g., e-mail), pictures, videos
• User-generated data & system-generated data
• Applications need more than structured data
• My application is not "dumb" any more!!
• "I keep saying that the sexy job in the next 10 years will be statisticians, and I'm not kidding." – Hal Varian (Google's chief economist)
8.
Let's get to business!! What is Apache Hadoop?
• Apache Hadoop is an open-source system to reliably store and process extremely large data sets across many commodity computers
• Originally developed to support the Nutch search engine project
• Scales linearly with data size or analysis complexity
• Scale-out, shared-nothing architecture
• Inspired by Google's MapReduce and Google File System (GFS) papers
9.
Basics of Hadoop
• Two core components – HDFS & Map-Reduce
• Machines are unreliable
• Separates distributed fault-tolerant computing code from application logic
• No need to worry about the identity of a machine
• Lets you interact with a cluster, not a bunch of machines
• Analysis workloads span multiple machines
• Runs as a cloud (cluster) & possibly on a cloud (EC2)
10.
Lead Actors
• Name Node – Bookkeeping metadata server
• Secondary Name Node – Assistant to the Name Node (periodic checkpointing of metadata, not a hot standby)
• Job Tracker – Scheduler
• Task Tracker – Task execution
• Data Node – Block storage
11.
HDFS Write Model
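As a rough illustration of an HDFS write: the client asks the Name Node where each block should go, then streams the block to a pipeline of Data Nodes that replicate it. The sketch below only plans such a write. It assumes a 64 MB block size and a replication factor of 3, uses hypothetical Data Node names, and replaces the Name Node's real placement policy with plain round-robin:

```python
BLOCK_SIZE = 64 * 1024 * 1024   # assumed block size (64 MB)
REPLICATION = 3                 # assumed replication factor

def plan_write(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a list of (block_index, block_length, replica_pipeline) tuples.

    Placement is plain round-robin, a stand-in for the Name Node's real
    placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("not enough Data Nodes for the requested replication")
    plan = []
    offset = 0
    index = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        pipeline = [datanodes[(index + r) % len(datanodes)] for r in range(replication)]
        plan.append((index, length, pipeline))
        offset += length
        index += 1
    return plan

nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical Data Node names
for block in plan_write(150 * 1024 * 1024, nodes):
    print(block)
```

Every block lands on several machines, so losing one Data Node does not lose the data.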
12.
Map-Reduce Model
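The model can be illustrated with the classic word count, written here as an in-memory sketch rather than a real Hadoop job. The function names `map_fn`, `reduce_fn` and `run_job` are made up for this example; on a cluster the framework runs the same two user functions across many machines:

```python
from collections import defaultdict

def map_fn(line):
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: sum all counts emitted for one word."""
    return word, sum(counts)

def run_job(lines):
    # Shuffle: group every mapped value under its key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    # Reduce each group independently, visiting keys in sorted order.
    return dict(reduce_fn(k, groups[k]) for k in sorted(groups))

print(run_job(["big data", "Big Data analytics"]))
```

The application only supplies `map_fn` and `reduce_fn`; grouping values by key is the framework's job.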
13.
Map-Reduce Execution Flow
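A simplified view of the execution flow: input is cut into splits, one map task runs per split, each emitted key is routed to a reduce task by a partition function, and each reduce task processes its keys in sorted order. The sketch below simulates this in one process. The names are hypothetical, and CRC32 is an assumed stand-in for the hash that a real partitioner would compute on the key:

```python
import zlib
from collections import defaultdict

def partition(key, num_reducers):
    """Pick the reduce task for a key; CRC32 stands in for the key's hash code."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def run_flow(splits, num_reducers):
    # Map phase: one map task per input split, each emitting (word, 1) pairs
    # into the bucket of the reduce task that owns the word.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for line in split:
            for word in line.split():
                buckets[partition(word, num_reducers)][word].append(1)
    # Reduce phase: each reduce task sees its keys in sorted order and sums them.
    return [{k: sum(v) for k, v in sorted(b.items())} for b in buckets]

splits = [["a b a"], ["b c"]]   # two hypothetical input splits
outputs = run_flow(splits, num_reducers=2)
print(outputs)
```

Because the partition function is deterministic, every occurrence of a key reaches the same reduce task no matter which map task emitted it.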
14.
Hadoop Ecosystem
• Oozie – Open-source workflow/coordination service to manage data processing jobs for Apache Hadoop™. Developed at Yahoo!
• HBase – Column-store database based on Google's BigTable. Holds extremely large data sets (petabytes)
• Hive – SQL-based data warehousing application with features for analyzing very large data sets. Developed at Facebook
• ZooKeeper – Distributed consensus engine providing leader election, service discovery, distributed locking / mutual exclusion
• Pig – Platform for analyzing large data sets that consists of a high-level language for expressing data analysis steps
• Ganglia – A scalable distributed monitoring system for high-performance computing systems such as clusters and grids
15.
Hadoop is not a "Holy Grail"
• Not a substitute for a database
• MapReduce is not always the best algorithm
• HDFS is not a substitute for a high-availability SAN-hosted file system
• HDFS is not a POSIX file system
• Not a place to learn Java programming
• Not a place to learn Unix/Linux system administration
• Not a place to learn the basics of networking
16.
Notable Users of Hadoop (Source: http://en.wikipedia.org/wiki/Hadoop)
• A9.com
• Meebo
• AOL
• Metaweb
• EHarmony
• The New York Times
• eBay
• Rackspace
• Facebook
• StumbleUpon
• Fox Interactive Media
• Twitter
• IBM
• Yahoo
• Last.fm
• Amazon
• LinkedIn
17.
Q&A
www.flytxt.com
18.
THANK YOU
Contact us: dev2dev@flytxt.com / jobin.wilson@flytxt.com
www.flytxt.com