HadoopDB in Action


Tilani Gunawardena PhD(UNIBAS), BSc(Pera), FHEA(UK), CEng, MIESL

HadoopDB in Action: Building Real World Applications


• Introduction
• Architecture and Design
• Example application
• Demonstration Scenario

• Managing and analysing massive data
◦ Provides high performance
◦ Scales over clusters of thousands of heterogeneous machines
◦ Versatile: adapts to analytical queries of varying complexity

How does one build real-world applications with HadoopDB?

• Database Connector: connects Hadoop with the single-node database systems.
• Data Loader: partitions data and manages parallel loading of data into the database systems.
• Catalog: tracks the locations of different data chunks, including those replicated across multiple nodes.
• SQL-MapReduce-SQL (SMS) planner: extends Hive to provide a SQL interface to HadoopDB.
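
The roles of the Data Loader and the Catalog can be pictured with a small sketch. This is not HadoopDB's actual code: the node names, the replication factor, and the functions `chunk_of` and `load` are all hypothetical, and hash partitioning is just one plausible scheme chosen for illustration.

```python
import hashlib
from collections import defaultdict

NODES = ["node0", "node1", "node2"]  # hypothetical worker names
REPLICATION = 2                      # hypothetical replication factor


def chunk_of(key: str, num_chunks: int) -> int:
    """Map a partitioning key to a chunk id using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_chunks


def load(rows, key_field, num_chunks=len(NODES)):
    """Loader role: split rows into chunks. Catalog role: record chunk placement."""
    chunks = defaultdict(list)
    for row in rows:
        chunks[chunk_of(str(row[key_field]), num_chunks)].append(row)

    # Each chunk is placed on REPLICATION consecutive nodes (an assumed policy).
    catalog = {
        chunk_id: [NODES[(chunk_id + r) % len(NODES)] for r in range(REPLICATION)]
        for chunk_id in range(num_chunks)
    }
    return dict(chunks), catalog


rows = [{"id": i, "value": i * 10} for i in range(20)]
chunks, catalog = load(rows, "id")
```

Because the hash is stable, rows sharing a key always land in the same chunk, and `catalog` answers the question "which nodes hold chunk N?" for the query planner.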

• Supports any JDBC-compliant database server as the underlying DBMS layer
• Applications built on top of HadoopDB generally use a 3-tier architecture:
◦ data tier
◦ business logic tier
◦ presentation tier
• From the application's perspective, HadoopDB is a black box

• A semantic web / biological data analysis application.
• A business data warehousing application.

• The Semantic Web is an effort by the W3C to enable integration and sharing of data across different applications
• RDF is a directed, labeled graph data format for representing information on the Web
• SPARQL is an RDF query language

• Find all proteins whose existence in the 'Human' organism is uncertain
• SPARQL query:
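
The query shown on the original slide is not reproduced in this transcript. As a rough illustration of the kind of triple-pattern matching such a query performs, the sketch below evaluates three patterns that share one subject variable over a tiny in-memory triple list. All identifiers and data here are made up for the example and do not come from the real protein dataset.

```python
# Hypothetical (subject, predicate, object) triples; identifiers are invented.
TRIPLES = [
    ("protein:P1", "rdf:type", "Protein"),
    ("protein:P1", "organism", "Human"),
    ("protein:P1", "existence", "Uncertain"),
    ("protein:P2", "rdf:type", "Protein"),
    ("protein:P2", "organism", "Human"),
    ("protein:P2", "existence", "EvidenceAtProteinLevel"),
    ("protein:P3", "rdf:type", "Protein"),
    ("protein:P3", "organism", "Mouse"),
    ("protein:P3", "existence", "Uncertain"),
]


def subjects_matching(triples, predicate, obj):
    """Return the set of subjects that carry the given (predicate, object) pair."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


def uncertain_human_proteins(triples):
    """Intersect three triple patterns bound to the same subject variable."""
    return sorted(
        subjects_matching(triples, "rdf:type", "Protein")
        & subjects_matching(triples, "organism", "Human")
        & subjects_matching(triples, "existence", "Uncertain")
    )


print(uncertain_human_proteins(TRIPLES))  # prints ['protein:P1']
```

When triples are stored in a relational table, the same intersection becomes a self-join of the triple table on the subject column.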

• Demonstrates:
◦ how the data administrator should prepare the dataset.
• The analyst is shielded from the complexity of the actual implementation of the RDF storage layer.

• A natural target application for HadoopDB.
• Common business data warehousing workloads are read-mostly and involve analytical queries over a complex schema
• To achieve good query performance, the dataset requires significant preparation through data partitioning and replication to optimize for join queries
• Data and queries: TPC-H benchmark

• Find the 10 highest-revenue unshipped orders
• Query:
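
The query from the original slide is not reproduced in this transcript. The sketch below is only an approximation of that kind of query, loosely modeled on the shape of TPC-H Q3 and run with Python's built-in `sqlite3` on a toy two-table schema. The data, the cutoff date, and the omission of the customer table are simplifications assumed for illustration; it runs on a single node and says nothing about how HadoopDB would plan it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders   (o_orderkey INTEGER PRIMARY KEY, o_orderdate TEXT);
CREATE TABLE lineitem (l_orderkey INTEGER, l_extendedprice REAL,
                       l_discount REAL, l_shipdate TEXT);
INSERT INTO orders VALUES
    (1, '1995-03-01'), (2, '1995-03-10'), (3, '1995-04-01'), (4, '1995-02-01');
INSERT INTO lineitem VALUES
    (1,  100.0, 0.0,  '1995-03-20'),
    (1,  200.0, 0.5,  '1995-03-25'),
    (2,  500.0, 0.0,  '1995-03-12'),
    (3,  900.0, 0.0,  '1995-04-05'),
    (4, 1000.0, 0.25, '1995-03-16');
""")

CUTOFF = "1995-03-15"  # hypothetical parameter chosen by the analyst

QUERY = """
SELECT l.l_orderkey,
       SUM(l.l_extendedprice * (1 - l.l_discount)) AS revenue,
       o.o_orderdate
FROM orders o
JOIN lineitem l ON l.l_orderkey = o.o_orderkey
WHERE o.o_orderdate < ?   -- ordered before the cutoff
  AND l.l_shipdate  > ?   -- line items not yet shipped at the cutoff
GROUP BY l.l_orderkey, o.o_orderdate
ORDER BY revenue DESC, o.o_orderdate
LIMIT 10
"""

top_orders = conn.execute(QUERY, (CUTOFF, CUTOFF)).fetchall()
print(top_orders)
```

The join on the order key is the reason partitioning matters: if both tables are partitioned on that key, each node can compute its share of the join locally.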

• The audience is invited to query both data sets through HadoopDB
• The data sets are located in a remote cluster
• Multi-user interaction: two client machines that connect to the clusters.

• The user selects a dataset:
◦ Semantic Web / Biological Data Analysis: an animation of the behind-the-scenes data preparation and loading is presented, with details on the tools used for data conversion from RDF to relational form.
◦ Business Data Warehousing: the animation provides details on the partitioning scheme, the interaction between the loader and catalog components, and a summary of the configuration parameters.
• The user selects and parameterizes a query to execute
◦ The user can then monitor the progress of query execution

• In addition, the demonstration shows HadoopDB's fault tolerance by introducing a node failure.
• For a subset of the predefined queries, as the query executes in the background, an animation of the flow of data and control through the HadoopDB system is simultaneously presented, highlighting which parts of the query execution are run in parallel.
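
One way to picture how replication helps when a node fails is the sketch below. It is a simplified stand-in, not HadoopDB's scheduler: the placement table, the `run_on_chunk` helper, and the simulated failure set are all hypothetical.

```python
def run_on_chunk(chunk_id, replicas, execute, failed_nodes):
    """Try each replica of a chunk in turn and return the first successful result."""
    for node in replicas:
        if node in failed_nodes:
            continue  # simulate a node that has stopped responding
        return node, execute(node, chunk_id)
    raise RuntimeError(f"no live replica for chunk {chunk_id}")


# Hypothetical chunk placement, as a catalog might record it.
placement = {0: ["node0", "node1"], 1: ["node1", "node2"]}
failed = {"node0"}  # simulated node failure


def fake_execute(node, chunk_id):
    """Stand-in for running the per-chunk part of a query on a node."""
    return f"rows of chunk {chunk_id}"


results = {
    chunk_id: run_on_chunk(chunk_id, nodes, fake_execute, failed)
    for chunk_id, nodes in placement.items()
}
print(results)
```

With `node0` marked as failed, the work for chunk 0 falls through to its second replica, so the query as a whole can still complete.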


Thank You!

Editor's notes

Versatile: system flexibility
Key components of HadoopDB
HadoopDB therefore pushes computation closer to the data (into the data tier) to achieve maximum parallelization in a multi-node cluster. The complexity of the data tier and its parallel nature is hidden from the application developer.
Universal Protein Resource (UniProt). The presentation layer consists of a web-based interface where analysts specify queries and view results. The logic layer consists of a SPARQL-to-SQL conversion tool. The logic and data layers communicate through JDBC.
The presentations give the audience an idea of the effort required for data preparation in HadoopDB.
