Hadoop Presentation


Pham Thai Hoa


Topics
• Introduction to Hadoop
• Introduction to Hive
• Introduction to Logger
• Using Hadoop at Mobion
• Warehouse at Mobion
• Q&A

4/14/2012 Pham Thai Hoa

What is Hadoop
• It is a framework for distributed processing
• Inspired by Google's architecture: MapReduce and GFS
• A top-level Apache project
• Hadoop is open source
• Hadoop has two important elements:
+ MapReduce core
+ Hadoop Distributed File System (HDFS)
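
The MapReduce model named above can be illustrated with a minimal in-memory sketch. This is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and everything runs in a single process, whereas Hadoop distributes the same three steps across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling word_count on a list of text lines returns a dictionary of word totals; the user only writes the map and reduce logic, and the grouping step in between is what the framework provides.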

Why use Hadoop
• Fault-tolerant hardware is expensive
• Hadoop is designed to run on cheap commodity hardware
• It automatically handles data replication and node failure
• It does the hard work, so you can focus on processing data
• It supports three modes: Local, Pseudo-Distributed, and Fully Distributed
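
To make "handles data replication and node failure" concrete, here is a toy sketch of the idea. It is an assumption-laden simplification, not HDFS's actual placement policy: the replication factor of 3, the round-robin placement, and the names place_blocks and handle_failure are all chosen for illustration only.

```python
import itertools

REPLICATION = 3  # assumed replication factor for this sketch


def place_blocks(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Requires replication <= len(nodes) so the copies land on distinct nodes.
    """
    placement = {}
    ring = itertools.cycle(nodes)
    for block in blocks:
        placement[block] = {next(ring) for _ in range(replication)}
    return placement


def handle_failure(placement, failed, live_nodes, replication=REPLICATION):
    """Drop the failed node, then copy under-replicated blocks to other live nodes."""
    for holders in placement.values():
        holders.discard(failed)
        candidates = [node for node in live_nodes if node not in holders]
        while len(holders) < replication and candidates:
            holders.add(candidates.pop(0))
    return placement
```

After a node is reported as failed, every block that lost a copy is brought back to the target number of replicas, which is why the loss of one cheap machine does not lose data.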

Data Flow into Hadoop


Who uses Hadoop
• Amazon: builds product search indices using the streaming API and pre-existing C++, Perl, and Python tools
• Yahoo: more than 100,000 CPUs in more than 40,000 computers running Hadoop
• Facebook: uses Hadoop to store copies of internal log and dimension data sources, and uses it as a source for reporting/analytics and machine learning
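
The streaming API mentioned above lets existing scripts act as mappers and reducers by exchanging plain text lines, conventionally as key, tab, value. The sketch below shows that convention only; the mapper and reducer functions are hypothetical, and in a real streaming job each would be a separate script reading standard input and writing standard output, with the framework sorting the mapper output by key in between.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, in the streaming key/value text style."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"
```

Locally, the sort step can be imitated with sorted(mapper(lines)) before passing the result to reducer, which is a common way to check such scripts before submitting a job.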

What is Hive
• Hive is a data warehouse system for Hadoop
• Uses MapReduce for execution
• Uses HDFS for storage
• Keeps metadata in an RDBMS
• Scalability and performance
• Interoperability
• Uses a SQL-like language called HiveQL

Data Flow into Hive


Hive Data Model
• Tables
+ Typed columns (int, float, string, …)
+ Also array/map/struct for JSON-like data
• Partitions
+ e.g., to range-partition tables by date
• Buckets
+ Hash partitions within ranges (useful for sampling, join optimization)
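
The relationship between partitions and buckets can be sketched as a lookup: the partition selects a directory, and a hash of the bucketing column selects a bucket inside it. Everything specific here is an assumption for illustration: the table path, the dt and user_id column names, and the hash rule (an integer key taken modulo the bucket count), which approximates rather than reproduces Hive's own hashing.

```python
def partition_dir(table_root, dt):
    """Directory for one date partition, in the key=value naming style."""
    return f"{table_root}/dt={dt}"


def bucket_for(user_id, num_buckets):
    """Bucket index for an integer key: non-negative value modulo bucket count."""
    return (user_id & 0x7FFFFFFF) % num_buckets


def locate(table_root, dt, user_id, num_buckets):
    """Return (partition directory, bucket index) where a row would be stored."""
    return partition_dir(table_root, dt), bucket_for(user_id, num_buckets)
```

A query filtered on the date only needs to read one directory, and sampling or joining on the bucketing column only needs the matching bucket, which is where the benefits listed above come from.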

Hive Metastore
• Database: namespace containing a set of tables
• Holds table/partition definitions (column types, mappings to HDFS directories)
• Statistics
• Implemented with DataNucleus ORM. Runs on Derby, MySQL, and many other relational databases

Introduction to Logger
• A logging system has three broad components:
+ Client Code Interface
+ Distribution System
+ Do Something Usefullizer
• Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures

Why use Scribe
• Scalability and performance
• Event notification library
• Thrift framework
• Hadoop is optional
• Client using
• Distributed Scribe system
• Over 1 million messages per second for logging
• Hierarchical stores
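
One way a logging server can stay robust to network and node failures is to buffer messages locally while the upstream store is unreachable and replay them later. The sketch below illustrates that store-and-forward idea only; it is not Scribe's implementation or API, and the BufferedStore class, its methods, and the use of ConnectionError to signal an outage are all assumptions.

```python
class BufferedStore:
    """Try a primary store first; buffer locally while it is unavailable."""

    def __init__(self, primary):
        # primary: callable(category, message) that raises ConnectionError on failure
        self.primary = primary
        # In-memory list standing in for a secondary (local) store
        self.buffer = []

    def log(self, category, message):
        """Send a message, preserving order with anything buffered earlier."""
        try:
            self.flush()
            self.primary(category, message)
        except ConnectionError:
            self.buffer.append((category, message))

    def flush(self):
        """Replay buffered messages in order; stop at the first failure."""
        while self.buffer:
            self.primary(*self.buffer[0])
            self.buffer.pop(0)
```

Because each message carries a category, a receiving server can route it to a different destination per category, and chaining such stores from application hosts to a central server gives a hierarchy of stores.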

Warehouse at Mobion
• Log Collector
• Log/Data Transformer
• Data Analyzer
• Web Reporter
• Log definition
• Log integration (into applications)
• Log/data analysis
• Report development (API, Mobion, Music, …)

Warehouse at Mobion
• Data mining
• Music recommendation
• Spam detection
• Application performance
• Export data and import it into MySQL for web reports
• Analytics system

Q&A
• Why use Hadoop?
• Why use Hive?
• Why do we need a logging system?
• What is the warehouse system architecture?
• Do we use these systems for voting, chat, messages, and feeds?
• How can we use them for recommendations and suggestions?

Links
• http://facebook.com
• http://highscalability.com/product-scribe-facebooks-scalable-logging-system
• http://hadoop.apache.org/
• http://hive.apache.org/
• http://wiki.apache.org/hadoop/PoweredBy
• http://www.apache.org/foundation/thanks.html

Hadoop Presentation 2012
Presenter: Pham Thai Hoa
Email: thaihoabo@gmail.com
Web: http://mobion.com/hoa

THANK YOU
