SlideShare una empresa de Scribd logo
1 de 59
Descargar para leer sin conexión
Big Data Hadoop – Hands On Workshop
Data Processing Solutions – Comparison Guide
Big Data Workshop Series
Danairat T.
Results
Data Inputs
Cloud
1 2
Data Inputs
Results
Staging
Staging
Staging
Big
DWH
Data
Mart
Data
Mart
Data
Mart
Data
Mart
C
u
b
e
C
u
b
e
C
u
b
e
C
u
b
e
C
u
b
e
Staging
Analy
tic
Resul
ts
Layer
Cube
Layer
Data
Mart
Layer
Data
Warehouse
Layer
Data
Staging
Layer
Data
Source
Layer
1 2 3 4 5 6
Core Hadoop Traditional Data Warehouse
VS.
Big Data Hadoop
Solution 1. Core Hadoop processing
NO data staging transformation and NO data move required!!
Analytic Results
Excel Inputs
Top Benefits
1. Cloud and IoT ready architecture roadmap
2. No data duplication with reduce cost of data store/storage
3. Fast data processing and all processing are built-in fault tolerant
4. Align with unify data architecture and data governance
5. Less steps of data processing comparing with traditional DWH
The Effort Investment:-
1. Learn core Hadoop
Cloud Ready
1 2
Examples
Big Data Hadoop
Solution 2. Using BI Tools to analyze Hadoop data
Required single transformation in Hadoop for BI Tools since there is no BI Tools
built-in POI parser (MS-Office connector) over HDFS protocol.
Hadoop HDFS
(CVS Raw Text)
Excel Inputs
Top Benefits
1. Lower cost with cloud/IoT ready architecture
2. Fast data processing and all processing are built-in fault tolerant
3. Less steps of data processing comparing with traditional DWH
The Effort Investment:-
1. Learn Hadoop
2. Require transformation to
RAW text for BI Tools
Cloud Ready
1 2 3
Examples
Big Data Hadoop
Solution 3. Creating data warehouse in Hadoop
Required single transformation with DWH set up on Hadoop
for BI Tools
Top Benefits
1. Lower cost with cloud/IoT ready architecture
2. Fast data processing and all processing are built-in fault tolerant
3. Less steps of data processing comparing with traditional DWH
The Effort Investment:-
1. Learn core Hadoop
2. Require transformation to RAW text
for BI Tools
3. Require DWH on Hadoop set up
(Hive, Cassandra, HBase)
Hadoop HDFSExcel Inputs
Cloud Ready
Hadoop
DWH
Hive, (or
Cassandra,
Hbase)
1 2 3 4
Examples
Big Data Hadoop
Solution 4. Implementing traditional data warehouse
Staging
Staging
Staging
The more data
grow, the
slower data
processing
Data Mart
Data Mart
Data Mart
Data Mart
Top Concerns from Traditional Data Warehouse Architecture
1. A lot of data duplication lead to cost of data store/storage issue
2. Very slow of data processing and need to restart/roll back the job if any failed
3. Data security issue due to keep data too many copies and various formats
Cube
Cube
Cube
Cube
Cube
Staging
Analytic
Results
Layer
Cube
Layer
Data Mart
Layer
Data
Warehouse
Layer
Data
Staging
Layer
Data Source
Layer
1 2 3 4 5 6
Big Data Hadoop
Benefits Comparison Summary
Benefits
Criteria
Solutions
Cloud
Ready
Archit
ecture
Built-In
Parallel
Proces
sing
IoT
Archite
cture
Roadma
p
Without
DB cube
investm
ent
Witho
ut data
mart
invest
ment
Without
DWH
investme
nt
Without
Staging
data
(RAW
Text)
Unstruct
ured and
RAW
Source
Content
Processin
g
1. Core
Hadoop
Yes Yes Yes Yes Yes Yes Yes Yes
2. Hadoop and
Pentaho/Power
BI
Yes Yes Yes Yes Yes Yes No
(require
CSV)
No
(require
CSV)
3. Hadoop and
Cognos,
RapidMiner,
BO, Cognos,
Tableau
Yes Yes Yes Yes Yes No
(require
Hive
connector)
No
(require
Hive
connector)
No
(require
Hive
connector)
4. Traditional
Data
Warehouse
No No No No No No No No
Big Data Hadoop
Appendix
Big Data Hadoop
Pentaho supports Big Data Inputs
Big Data Hadoop
PowerBI supports Big Data Inputs
Big Data Hadoop
Tableau supports Big Data Inputs
Big Data Hadoop
Rapid Miner supports Big Data Inputs
Big Data Hadoop
Hadoop Cluster Installation and Excel
Parser Processing
Big Data Hadoop
Clone hadoop master to slave1 and slave2
master
slave1
slave2
Big Data Hadoop
At master node: Edit host file
Big Data Hadoop
At master node : Copy key file to slave1 and slave2
scp /home/ubuntu/.ssh/id_dsa.pub ip-172-31-1-8:/home/ubuntu/.ssh/master.pub
scp /home/ubuntu/.ssh/id_dsa.pub 172.31.15.16:/home/ubuntu/.ssh/master.pub
Big Data Hadoop
After this slide, we will use 3 cascaded
windows to represent master node, slave1
node and slave2 node
master node
slave1 node
slave2 node
Big Data Hadoop
At slave1 and slave2: cat /home/ubuntu/.ssh/master.pub >> /home/ubuntu/.ssh/authorized_keys
Big Data Hadoop
At master: Test ssh to slave1 and slave 2
$ ssh ip-172-31-1-8
$ exit
$ ssh ip-172-31-15-16
$ exit
Big Data Hadoop
At master: add slave1 and slave2 to Hadoop slave file
Big Data Hadoop
At master: add slave1 and slave2 to Hadoop slave file
Big Data Hadoop
At master: edit hdfs-site.xml
Big Data Hadoop
At master: edit hdfs-site.xml for 2 replication servers
Big Data Hadoop
At all nodes: remove directories of namenode and datanode
Big Data Hadoop
At master: format namenode
Big Data Hadoop
At master: format namenode
Big Data Hadoop
At master: Execute start-dfs.sh
Big Data Hadoop
At slave1: Check jps result, you will see DataNode has been started
Big Data Hadoop
At slave2: Check jps result, you will see DataNode has been started
Big Data Hadoop
At master: Execute start-yarn.sh
Big Data Hadoop
At slave1: Check jps result, you will see NodeManager has been started
Big Data Hadoop
At slave2: Check jps result, you will see NodeManager has been started
Big Data Hadoop
Importing data into HDFS Cluster
Big Data Hadoop
At master: import data to hdfs
Big Data Hadoop
At slave1: review imported result data from hdfs
Big Data Hadoop
At slave2: review imported result data from hdfs
Big Data Hadoop
Running MapReduce in Cluster Mode
Big Data Hadoop
At master: execute YARN mapreduce program
Big Data Hadoop
At slave1, slave2: you will see Application Master and Yarn Child Container
Big Data Hadoop
At master: review output file from hdfs
Big Data Hadoop
At master: review output file from hdfs
Big Data Hadoop
At slave1, slave2: review output file from hdfs by using command:-
hdfs dfs -cat /outputs/wordcount_output_dir01/part-r-00000
Big Data Hadoop
At master: review output result data from
web console
Big Data Hadoop
At master: review output result data from
web console
Big Data Hadoop
At master: review output result data from
web console
Big Data Hadoop
At master: review output result data from
web console
Big Data Hadoop
Process Excel Worksheet
Big Data Hadoop
1. Create Java Class using POI Libs
Big Data Hadoop
2. Transversal Data in Excel Spreadsheet
Workbook workbook = new XSSFWorkbook(inputStream);
Sheet firstSheet = workbook.getSheetAt(0);
Iterator<Row> iterator = firstSheet.iterator();
while (iterator.hasNext()) {
Row nextRow = iterator.next();
Iterator<Cell> cellIterator = nextRow.cellIterator();
while (cellIterator.hasNext()) {
Cell cell = cellIterator.next();
Big Data Hadoop
3. Extract Data from Excel Spreadsheet
switch (cell.getCellType()) {
case Cell.CELL_TYPE_STRING:
System.out.print(cell.getStringCellValue());
break;
case Cell.CELL_TYPE_BOOLEAN:
System.out.print(cell.getBooleanCellValue());
break;
case Cell.CELL_TYPE_NUMERIC:
System.out.print(cell.getNumericCellValue());
break;
}
For further integration into HDFS, please emit data to output collector.
Big Data Hadoop
4. Close Excel Spreadsheet
workbook.close();
inputStream.close();
Big Data Hadoop
Excel Processing Results in Hadoop
Big Data Hadoop
Stopping Hadoop Cluster
Big Data Hadoop
At master: execute stop-yarn.sh
Big Data Hadoop
At slave1: use jps to review NodeManager has been stopped
Big Data Hadoop
At slave2: use jps to review NodeManager has been stopped
Big Data Hadoop
At master: execute stop-dfs.sh
Big Data Hadoop
At slave1: use jps to review DataNode has been stopped
Big Data Hadoop
At slave2: use jps to review DataNode has been stopped
Big Data Hadoop
Thank you very much

Más contenido relacionado

La actualidad más candente

Hadoop Architecture in Depth
Hadoop Architecture in DepthHadoop Architecture in Depth
Hadoop Architecture in DepthSyed Hadoop
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hari Shankar Sreekumar
 
Hadoop hdfs interview questions
Hadoop hdfs interview questionsHadoop hdfs interview questions
Hadoop hdfs interview questionsKalyan Hadoop
 
Hadoop Summit 2015: Hive at Yahoo: Letters from the Trenches
Hadoop Summit 2015: Hive at Yahoo: Letters from the TrenchesHadoop Summit 2015: Hive at Yahoo: Letters from the Trenches
Hadoop Summit 2015: Hive at Yahoo: Letters from the TrenchesMithun Radhakrishnan
 
Scalding by Adform Research, Alex Gryzlov
Scalding by Adform Research, Alex GryzlovScalding by Adform Research, Alex Gryzlov
Scalding by Adform Research, Alex GryzlovVasil Remeniuk
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorialawesomesos
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop DeveloperEdureka!
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFSEdureka!
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaCloudera, Inc.
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Simplilearn
 
Hadoop and big data training
Hadoop and big data trainingHadoop and big data training
Hadoop and big data trainingagiamas
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeAdam Kawa
 
Hadoop interview quations1
Hadoop interview quations1Hadoop interview quations1
Hadoop interview quations1Vemula Ravi
 
Mutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldMutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldLester Martin
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Ferran Galí Reniu
 

La actualidad más candente (20)

Hadoop Architecture in Depth
Hadoop Architecture in DepthHadoop Architecture in Depth
Hadoop Architecture in Depth
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
 
Hadoop hdfs interview questions
Hadoop hdfs interview questionsHadoop hdfs interview questions
Hadoop hdfs interview questions
 
Hadoop Summit 2015: Hive at Yahoo: Letters from the Trenches
Hadoop Summit 2015: Hive at Yahoo: Letters from the TrenchesHadoop Summit 2015: Hive at Yahoo: Letters from the Trenches
Hadoop Summit 2015: Hive at Yahoo: Letters from the Trenches
 
Scalding by Adform Research, Alex Gryzlov
Scalding by Adform Research, Alex GryzlovScalding by Adform Research, Alex Gryzlov
Scalding by Adform Research, Alex Gryzlov
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorial
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop Developer
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
 
Hadoop and big data training
Hadoop and big data trainingHadoop and big data training
Hadoop and big data training
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
 
Hadoop interview quations1
Hadoop interview quations1Hadoop interview quations1
Hadoop interview quations1
 
Mutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldMutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable World
 
Hadoop architecture by ajay
Hadoop architecture by ajayHadoop architecture by ajay
Hadoop architecture by ajay
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 

Similar a Big data hadooop analytic and data warehouse comparison guide

SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases  SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases DataWorks Summit
 
Hadoop and Mapreduce Certification
Hadoop and Mapreduce CertificationHadoop and Mapreduce Certification
Hadoop and Mapreduce CertificationVskills
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-servicesSreenu Musham
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Søren Lund
 
Hw09 Production Deep Dive With High Availability
Hw09   Production Deep Dive With High AvailabilityHw09   Production Deep Dive With High Availability
Hw09 Production Deep Dive With High AvailabilityCloudera, Inc.
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune amrutupre
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And HiveCloudera, Inc.
 
Hadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderHadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderDmitry Makarchuk
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherJanBask Training
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo pptPhil Young
 
Hadoop training by keylabs
Hadoop training by keylabsHadoop training by keylabs
Hadoop training by keylabsSiva Sankar
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1Thanh Nguyen
 

Similar a Big data hadooop analytic and data warehouse comparison guide (20)

SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases  SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases
 
Hadoop content
Hadoop contentHadoop content
Hadoop content
 
Hadoop and Mapreduce Certification
Hadoop and Mapreduce CertificationHadoop and Mapreduce Certification
Hadoop and Mapreduce Certification
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-services
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)
 
Hw09 Production Deep Dive With High Availability
Hw09   Production Deep Dive With High AvailabilityHw09   Production Deep Dive With High Availability
Hw09 Production Deep Dive With High Availability
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 
Hadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderHadoop and mysql by Chris Schneider
Hadoop and mysql by Chris Schneider
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for Fresher
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
 
Hadoop training by keylabs
Hadoop training by keylabsHadoop training by keylabs
Hadoop training by keylabs
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1
 

Más de Danairat Thanabodithammachari

Thailand State Enterprise - Business Architecture and SE-AM
Thailand State Enterprise - Business Architecture and SE-AMThailand State Enterprise - Business Architecture and SE-AM
Thailand State Enterprise - Business Architecture and SE-AMDanairat Thanabodithammachari
 
Agile Organization and Enterprise Architecture v1129 Danairat
Agile Organization and Enterprise Architecture v1129 DanairatAgile Organization and Enterprise Architecture v1129 Danairat
Agile Organization and Enterprise Architecture v1129 DanairatDanairat Thanabodithammachari
 
Enterprise Architecture and Agile Organization Management v1076 Danairat
Enterprise Architecture and Agile Organization Management v1076 DanairatEnterprise Architecture and Agile Organization Management v1076 Danairat
Enterprise Architecture and Agile Organization Management v1076 DanairatDanairat Thanabodithammachari
 
Digital Transformation, Enterprise Architecture, Big Data by Danairat
Digital Transformation, Enterprise Architecture, Big Data by DanairatDigital Transformation, Enterprise Architecture, Big Data by Danairat
Digital Transformation, Enterprise Architecture, Big Data by DanairatDanairat Thanabodithammachari
 
Perl for System Automation - 01 Advanced File Processing
Perl for System Automation - 01 Advanced File ProcessingPerl for System Automation - 01 Advanced File Processing
Perl for System Automation - 01 Advanced File ProcessingDanairat Thanabodithammachari
 
JEE Programming - 08 Enterprise Application Deployment
JEE Programming - 08 Enterprise Application DeploymentJEE Programming - 08 Enterprise Application Deployment
JEE Programming - 08 Enterprise Application DeploymentDanairat Thanabodithammachari
 

Más de Danairat Thanabodithammachari (20)

Thailand State Enterprise - Business Architecture and SE-AM
Thailand State Enterprise - Business Architecture and SE-AMThailand State Enterprise - Business Architecture and SE-AM
Thailand State Enterprise - Business Architecture and SE-AM
 
Agile Management
Agile ManagementAgile Management
Agile Management
 
Agile Organization and Enterprise Architecture v1129 Danairat
Agile Organization and Enterprise Architecture v1129 DanairatAgile Organization and Enterprise Architecture v1129 Danairat
Agile Organization and Enterprise Architecture v1129 Danairat
 
Blockchain for Management
Blockchain for ManagementBlockchain for Management
Blockchain for Management
 
Enterprise Architecture and Agile Organization Management v1076 Danairat
Enterprise Architecture and Agile Organization Management v1076 DanairatEnterprise Architecture and Agile Organization Management v1076 Danairat
Enterprise Architecture and Agile Organization Management v1076 Danairat
 
Agile Enterprise Architecture - Danairat
Agile Enterprise Architecture - DanairatAgile Enterprise Architecture - Danairat
Agile Enterprise Architecture - Danairat
 
Digital Transformation, Enterprise Architecture, Big Data by Danairat
Digital Transformation, Enterprise Architecture, Big Data by DanairatDigital Transformation, Enterprise Architecture, Big Data by Danairat
Digital Transformation, Enterprise Architecture, Big Data by Danairat
 
Perl for System Automation - 01 Advanced File Processing
Perl for System Automation - 01 Advanced File ProcessingPerl for System Automation - 01 Advanced File Processing
Perl for System Automation - 01 Advanced File Processing
 
Perl Programming - 04 Programming Database
Perl Programming - 04 Programming DatabasePerl Programming - 04 Programming Database
Perl Programming - 04 Programming Database
 
Perl Programming - 03 Programming File
Perl Programming - 03 Programming FilePerl Programming - 03 Programming File
Perl Programming - 03 Programming File
 
Perl Programming - 02 Regular Expression
Perl Programming - 02 Regular ExpressionPerl Programming - 02 Regular Expression
Perl Programming - 02 Regular Expression
 
Perl Programming - 01 Basic Perl
Perl Programming - 01 Basic PerlPerl Programming - 01 Basic Perl
Perl Programming - 01 Basic Perl
 
JEE Programming - 03 Model View Controller
JEE Programming - 03 Model View ControllerJEE Programming - 03 Model View Controller
JEE Programming - 03 Model View Controller
 
JEE Programming - 05 JSP
JEE Programming - 05 JSPJEE Programming - 05 JSP
JEE Programming - 05 JSP
 
JEE Programming - 04 Java Servlets
JEE Programming - 04 Java ServletsJEE Programming - 04 Java Servlets
JEE Programming - 04 Java Servlets
 
JEE Programming - 08 Enterprise Application Deployment
JEE Programming - 08 Enterprise Application DeploymentJEE Programming - 08 Enterprise Application Deployment
JEE Programming - 08 Enterprise Application Deployment
 
JEE Programming - 07 EJB Programming
JEE Programming - 07 EJB ProgrammingJEE Programming - 07 EJB Programming
JEE Programming - 07 EJB Programming
 
JEE Programming - 06 Web Application Deployment
JEE Programming - 06 Web Application DeploymentJEE Programming - 06 Web Application Deployment
JEE Programming - 06 Web Application Deployment
 
JEE Programming - 01 Introduction
JEE Programming - 01 IntroductionJEE Programming - 01 Introduction
JEE Programming - 01 Introduction
 
JEE Programming - 02 The Containers
JEE Programming - 02 The ContainersJEE Programming - 02 The Containers
JEE Programming - 02 The Containers
 

Último

AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Big data hadooop analytic and data warehouse comparison guide

  • 1. Big Data Hadoop – Hands On Workshop Data Processing Solutions – Comparison Guide Big Data Workshop Series Danairat T. Results Data Inputs Cloud 1 2 Data Inputs Results Staging Staging Staging Big DWH Data Mart Data Mart Data Mart Data Mart C u b e C u b e C u b e C u b e C u b e Staging Analy tic Resul ts Layer Cube Layer Data Mart Layer Data Warehouse Layer Data Staging Layer Data Source Layer 1 2 3 4 5 6 Core Hadoop Traditional Data Warehouse VS.
  • 2. Big Data Hadoop Solution 1. Core Hadoop processing NO data staging transformation and NO data move required!! Analytic Results Excel Inputs Top Benefits 1. Cloud and IoT ready architecture roadmap 2. No data duplication with reduce cost of data store/storage 3. Fast data processing and all processing are built-in fault tolerant 4. Align with unify data architecture and data governance 5. Less steps of data processing comparing with traditional DWH The Effort Investment:- 1. Learn core Hadoop Cloud Ready 1 2 Examples
  • 3. Big Data Hadoop Solution 2. Using BI Tools to analyze Hadoop data Required single transformation in Hadoop for BI Tools since there is no BI Tools built-in POI parser (MS-Office connector) over HDFS protocol. Hadoop HDFS (CVS Raw Text) Excel Inputs Top Benefits 1. Lower cost with cloud/IoT ready architecture 2. Fast data processing and all processing are built-in fault tolerant 3. Less steps of data processing comparing with traditional DWH The Effort Investment:- 1. Learn Hadoop 2. Require transformation to RAW text for BI Tools Cloud Ready 1 2 3 Examples
  • 4. Big Data Hadoop Solution 3. Creating data warehouse in Hadoop Required single transformation with DWH set up on Hadoop for BI Tools Top Benefits 1. Lower cost with cloud/IoT ready architecture 2. Fast data processing and all processing are built-in fault tolerant 3. Less steps of data processing comparing with traditional DWH The Effort Investment:- 1. Learn core Hadoop 2. Require transformation to RAW text for BI Tools 3. Require DWH on Hadoop set up (Hive, Cassandra, HBase) Hadoop HDFSExcel Inputs Cloud Ready Hadoop DWH Hive, (or Cassandra, Hbase) 1 2 3 4 Examples
  • 5. Big Data Hadoop Solution 4. Implementing traditional data warehouse Staging Staging Staging The more data grow, the slower data processing Data Mart Data Mart Data Mart Data Mart Top Concerns from Traditional Data Warehouse Architecture 1. A lot of data duplication lead to cost of data store/storage issue 2. Very slow of data processing and need to restart/roll back the job if any failed 3. Data security issue due to keep data too many copies and various formats Cube Cube Cube Cube Cube Staging Analytic Results Layer Cube Layer Data Mart Layer Data Warehouse Layer Data Staging Layer Data Source Layer 1 2 3 4 5 6
  • 6. Big Data Hadoop Benefits Comparison Summary Benefits Criteria Solutions Cloud Ready Archit ecture Built-In Parallel Proces sing IoT Archite cture Roadma p Without DB cube investm ent Witho ut data mart invest ment Without DWH investme nt Without Staging data (RAW Text) Unstruct ured and RAW Source Content Processin g 1. Core Hadoop Yes Yes Yes Yes Yes Yes Yes Yes 2. Hadoop and Pentaho/Power BI Yes Yes Yes Yes Yes Yes No (require CSV) No (require CSV) 3. Hadoop and Cognos, RapidMiner, BO, Cognos, Tableau Yes Yes Yes Yes Yes No (require Hive connector) No (require Hive connector) No (require Hive connector) 4. Traditional Data Warehouse No No No No No No No No
  • 8. Big Data Hadoop Pentaho supports Big Data Inputs
  • 9. Big Data Hadoop PowerBI supports Big Data Inputs
  • 10. Big Data Hadoop Tableau supports Big Data Inputs
  • 11. Big Data Hadoop Rapid Miner supports Big Data Inputs
  • 12. Big Data Hadoop Hadoop Cluster Installation and Excel Parser Processing
  • 13. Big Data Hadoop Clone hadoop master to slave1 and slave2 master slave1 slave2
  • 14. Big Data Hadoop At master node: Edit host file
  • 15. Big Data Hadoop At master node : Copy key file to slave1 and slave2 scp /home/ubuntu/.ssh/id_dsa.pub ip-172-31-1-8:/home/ubuntu/.ssh/master.pub scp /home/ubuntu/.ssh/id_dsa.pub 172.31.15.16:/home/ubuntu/.ssh/master.pub
  • 16. Big Data Hadoop After this slide, we will use 3 cascaded windows to represent master node, slave1 node and slave2 node master node slave1 node slave2 node
  • 17. Big Data Hadoop At slave1 and slave2: cat /home/ubuntu/.ssh/master.pub >> /home/ubuntu/.ssh/authorized_keys
  • 18. Big Data Hadoop At master: Test ssh to slave1 and slave 2 $ ssh ip-172-31-1-8 $ exit $ ssh ip-172-31-15-16 $ exit
  • 19. Big Data Hadoop At master: add slave1 and slave2 to Hadoop slave file
  • 20. Big Data Hadoop At master: add slave1 and slave2 to Hadoop slave file
  • 21. Big Data Hadoop At master: edit hdfs-site.xml
  • 22. Big Data Hadoop At master: edit hdfs-site.xml for 2 replication servers
  • 23. Big Data Hadoop At all nodes: remove directories of namenode and datanode
  • 24. Big Data Hadoop At master: format namenode
  • 25. Big Data Hadoop At master: format namenode
  • 26. Big Data Hadoop At master: Execute start-dfs.sh
  • 27. Big Data Hadoop At slave1: Check jps result, you will see DataNode has been started
  • 28. Big Data Hadoop At slave2: Check jps result, you will see DataNode has been started
  • 29. Big Data Hadoop At master: Execute start-yarn.sh
  • 30. Big Data Hadoop At slave1: Check jps result, you will see NodeManager has been started
  • 31. Big Data Hadoop At slave2: Check jps result, you will see NodeManager has been started
  • 32. Big Data Hadoop Importing data into HDFS Cluster
  • 33. Big Data Hadoop At master: import data to hdfs
  • 34. Big Data Hadoop At slave1: review imported result data from hdfs
  • 35. Big Data Hadoop At slave2: review imported result data from hdfs
  • 36. Big Data Hadoop Running MapReduce in Cluster Mode
  • 37. Big Data Hadoop At master: execute YARN mapreduce program
  • 38. Big Data Hadoop At slave1, slave2: you will see Application Master and Yarn Child Container
  • 39. Big Data Hadoop At master: review output file from hdfs
  • 40. Big Data Hadoop At master: review output file from hdfs
  • 41. Big Data Hadoop At slave1, slave2: review output file from hdfs by using command:- hdfs dfs -cat /outputs/wordcount_output_dir01/part-r-00000
  • 42. Big Data Hadoop At master: review output result data from web console
  • 43. Big Data Hadoop At master: review output result data from web console
  • 44. Big Data Hadoop At master: review output result data from web console
  • 45. Big Data Hadoop At master: review output result data from web console
  • 46. Big Data Hadoop Process Excel Worksheet
  • 47. Big Data Hadoop 1. Create Java Class using POI Libs
  • 48. Big Data Hadoop 2. Transversal Data in Excel Spreadsheet Workbook workbook = new XSSFWorkbook(inputStream); Sheet firstSheet = workbook.getSheetAt(0); Iterator<Row> iterator = firstSheet.iterator(); while (iterator.hasNext()) { Row nextRow = iterator.next(); Iterator<Cell> cellIterator = nextRow.cellIterator(); while (cellIterator.hasNext()) { Cell cell = cellIterator.next();
  • 49. Big Data Hadoop 3. Extract Data from Excel Spreadsheet switch (cell.getCellType()) { case Cell.CELL_TYPE_STRING: System.out.print(cell.getStringCellValue()); break; case Cell.CELL_TYPE_BOOLEAN: System.out.print(cell.getBooleanCellValue()); break; case Cell.CELL_TYPE_NUMERIC: System.out.print(cell.getNumericCellValue()); break; } For further integration into HDFS, please emit data to output collector.
  • 50. Big Data Hadoop 4. Close Excel Spreadsheet workbook.close(); inputStream.close();
  • 51. Big Data Hadoop Excel Processing Results in Hadoop
  • 52. Big Data Hadoop Stopping Hadoop Cluster
  • 53. Big Data Hadoop At master: execute stop-yarn.sh
  • 54. Big Data Hadoop At slave1: use jps to review NodeManager has been stopped
  • 55. Big Data Hadoop At slave2: use jps to review NodeManager has been stopped
  • 56. Big Data Hadoop At master: execute stop-dfs.sh
  • 57. Big Data Hadoop At slave1: use jps to review DataNode has been stopped
  • 58. Big Data Hadoop At slave2: use jps to review DataNode has been stopped
  • 59. Big Data Hadoop Thank you very much