SlideShare una empresa de Scribd logo
1 de 42
Descargar para leer sin conexión
Building the Integrated Data
Warehouse
With Oracle Database and Hadoop

Gwen Shapira, Senior Consultant
Why Pythian
•  Recognized Leader:
   •  Global industry leader in data infrastructure managed services and consulting with expertise
      in Oracle, Oracle Applications, Microsoft SQL Server, MySQL, big data and systems
      administration
   •  Work with over 200 multinational companies such as Forbes.com, Fox Sports, Nordion and
      Western Union to help manage their complex IT deployments
•  Expertise:
   •  One of the world’s largest concentrations of dedicated, full-time DBA expertise. Employ 8
      Oracle ACEs/ACE Directors
   •  Hold 7 Specializations under Oracle Platinum Partner program, including Oracle Exadata,
      Oracle GoldenGate & Oracle RAC
•  Global Reach & Scalability:
   •  24/7/365 global remote support for DBA and consulting, systems administration, special
      projects or emergency response

                                              © 2012 – Pythian
About Gwen Shapira
             •  Oracle ACE Director
             •  13 Years with pager
                   •    7 as Oracle DBA
             •  Senior Consultant:
                  •  Has   MacBook, will travel.
             •    @gwenshap
             •    http://www.pythian.com/news/author/
                  shapira/

                            © 2012 – Pythian
Agenda
•  What is Big Data?
•  Why do we care about Big Data?
•  Why your DWH needs Hadoop?
•  Examples of Hadoop in the DWH
•  How to integrate Hadoop into your
   DWH
•  Avoiding major pitfalls




                                    © 2012 – Pythian
What is Big Data?
MORE DATA THAN YOU
CAN HANDLE


        © 2012 – Pythian
MORE DATA THAN
RELATIONAL DATABASES
CAN HANDLE

         © 2012 – Pythian
MORE DATA THAN
RELATIONAL DATABASES
CAN HANDLE CHEAPLY

          © 2012 – Pythian
Data Arriving at fast Rates
Typically unstructured
Stored without aggregation
Analyzed in Real Time
For Reasonable Cost

                   © 2012 – Pythian
Where does Big Data come from?
• Social media
• Enterprise transactional data
• Consumer behaviour
• Multimedia
• Sensors and embedded devices
• Network devices


                         © 2012 – Pythian
Why all the Excitement?




                   © 2012 – Pythian
Complex Data Architecture




           © 2012 – Pythian
Your DWH needs Hadoop
Big Problems with Big Data
•  It is:
  •  Unstructured

  •  Unprocessed

  •  Un-aggregated

  •  Un-filtered

  •  Repetitive

  •  And    generally messy.

  Oh, and there is a lot of it.
                               © 2012 – Pythian
Technical Challenges
•  Storage capacity
•  Storage throughput         Scalable storage	

•  Pipeline throughput
•  Processing power
•  Parallel processing
                              Massive Parallel Processing	

•  System Integration
•  Data Analysis              Ready to use tools	



                         © 2012 – Pythian
Hadoop Principles
Bring Code to Data                      Share Nothing




                     © 2012 – Pythian
Hadoop in a Nutshell
HDFS:                                           Map-Reduce:
Replicated Distributed Big-Data File               Framework for writing massively parallel
                                                   jobs
System	





                                       © 2012 – Pythian
Hadoop Benefits
•  Reliable solution based on unreliable hardware
•  Designed for large files
•  Load data first, structure later
•  Designed to maximize throughput of large scans
•  Designed to maximize parallelism
•  Designed to scale
•  Flexible development platform
•  Solution Ecosystem


                                      © 2012 – Pythian
Hadoop Limitations
•  Hadoop is scalable but not fast
•  Batteries not included
•  Instrumentation not included either
•  Well-known reliability limitations




                               © 2012 – Pythian
Hadoop in the Data Warehouse
Use Cases and Customer Stories
ETL for Unstructured Data
   Logs        Flume                        Hadoop
Web servers,                                  Cleanup,
 app server,                                 aggregation
clickstreams                              Longterm storage




                                              DWH
                                                BI,
                                           batch reports
                       © 2012 – Pythian
ETL for Structured Data
               Sqoop,                        Hadoop
    OLTP
     Oracle,    Perl                        Transformation
     MySQL,                                   aggregation
   Informix…                               Longterm storage




                                               DWH
                                                 BI,
                                            batch reports

                        © 2012 – Pythian
Bring the World into Your Datacenter




                    © 2012 – Pythian
Rare Historical Report




                    © 2012 – Pythian
Find Needle in Haystack




                    © 2012 – Pythian
We are not doing SQL anymore




                   © 2012 – Pythian
Connecting the (big) Dots
Sqoop
        Queries	





                 © 2012 – Pythian
Sqoop is Flexible (for import)
 • Select
        columns from table where condition
 • Or write your own query

 • Split   column
 • Parallel

 • Incremental

 • File   formats

                       © 2012 – Pythian
Sqoop Import Examples
•  Sqoop	
  import	
  -­‐-­‐connect	
  jdbc:oracle:thin:@//dbserver:
   1521/masterdb	
  	
  
   -­‐-­‐username	
  hr	
  -­‐-­‐table	
  emp	
  	
  
   -­‐-­‐where	
  “start_date	
  	
  ’01-­‐01-­‐2012’”	
  
	
  

•  Sqoop	
  import	
  jdbc:oracle:thin:@//dbserver:1521/masterdb	
  	
  
   -­‐-­‐username	
  myuser	
  	
  
   -­‐-­‐table	
  shops	
  -­‐-­‐split-­‐by	
  shop_id	
  	
  
                                                               Must be indexed or
   -­‐-­‐num-­‐mappers	
  16	
  
                                                               partitioned to avoid
                                                                16 full table scans	

                                      © 2012 – Pythian
Less Flexible Export
 • 100row batch inserts
 • Commit every 100 batches

 • Parallel   export
 • Update     mode
 Example:
 sqoop	
  export	
  	
  
 -­‐-­‐connect	
  jdbc:oracle:thin:@//dbserver:1521/masterdb	
  	
  
 -­‐-­‐table	
  bar	
  	
  
 -­‐-­‐export-­‐dir	
  /results/bar_data	
  
                                 © 2012 – Pythian
Fuse-DFS
•  Mount HDFS on Oracle server:
 •  sudo   yum install hadoop-0.20-fuse
 •  hadoop-fuse-dfs
               dfs://
  name_node_hostname:namenode_port mount_point
•  Use external tables to load data into Oracle
•  File Formats may vary
•  All ETL best practices apply


                                © 2012 – Pythian
Oracle Loader for Hadoop
•  Load data from Hadoop into Oracle
•  Map-Reduce job inside Hadoop
•  Converts data types.
•  Partitions and sorts
•  Direct path loads
•  Reduces CPU utilization on database



                              © 2012 – Pythian
Oracle Direct Connector to HDFS
•  Create external tables of files in HDFS
•  PREPROCESSOR	
  HDFS_BIN_PATH:hdfs_stream	
  
•  All the features of External Tables
•  Tested (by Oracle) as 5 times faster (GB/s) than FUSE-DFS




                              © 2012 – Pythian
Big Data Appliance and Exadata




                    © 2012 – Pythian
How not to Fail
Data that belongs in RDBMS




                   © 2012 – Pythian
Prepare for Migration




                    © 2012 – Pythian
Use Hadoop Efficiently
•  Understand your bottlenecks:
 •  CPU,    storage or network?
•  Reduce use of temporary data:
 •  All   data is over the network
 •  Written    to disk in triplicate.
•  Eliminate unbalanced workloads
•  Offload work to RDBMS
•  Fine-tune optimization with Map-Reduce


                                        © 2012 – Pythian
Your Data
               is   NOT
              as    BIG
 as you think


© 2012 – Pythian
Getting Started
• Pick a Business Problem
•  Acquire Data
• Use right tool for the job
•  Hadoop can start on the cheap
•  Integrate the systems
• Analyze data
•  Get operational



                                   © 2012 – Pythian
Thank you and QA
To contact us…
    sales@pythian.com	


    1-877-PYTHIAN	


To follow us…
    http://www.pythian.com/news/	


    http://www.facebook.com/pages/The-Pythian-Group/163902527671	


    @pythian	


    @pythianjobs 	


    http://www.linkedin.com/company/pythian	


                                          © 2012 – Pythian

Más contenido relacionado

La actualidad más candente

Hybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsHybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsDavid Portnoy
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analyticsjoshwills
 
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Kolja Manuel Rödel
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architectureMilos Milovanovic
 
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...DataWorks Summit/Hadoop Summit
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsCloudera, Inc.
 
2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_finalAdam Muise
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Jonathan Seidman
 
Hadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHortonworks
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which DataWorks Summit
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An OverviewC. Scyphers
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoophadooparchbook
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and DeploymentCisco Canada
 
So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?David P. Moore
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016StampedeCon
 

La actualidad más candente (20)

Hybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsHybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop Implementations
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analytics
 
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architecture
 
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
 
2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013
 
Hadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - Jaspersoft
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An Overview
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoop
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
 
So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
 
Data lake
Data lakeData lake
Data lake
 

Destacado

Informatica to ODI Migration – What, Why and How | Informatica to Oracle Dat...
Informatica to ODI Migration – What, Why and How |  Informatica to Oracle Dat...Informatica to ODI Migration – What, Why and How |  Informatica to Oracle Dat...
Informatica to ODI Migration – What, Why and How | Informatica to Oracle Dat...Jade Global
 
Faster Cheaper Better-Replacing Oracle with Hadoop & Solr
Faster Cheaper Better-Replacing Oracle with Hadoop & SolrFaster Cheaper Better-Replacing Oracle with Hadoop & Solr
Faster Cheaper Better-Replacing Oracle with Hadoop & SolrDataWorks Summit
 
2010-08-26-mongodb-step-by-step-by-hexnova
2010-08-26-mongodb-step-by-step-by-hexnova2010-08-26-mongodb-step-by-step-by-hexnova
2010-08-26-mongodb-step-by-step-by-hexnovaccdaisy
 
[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba
[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba
[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten IchibaRakuten Group, Inc.
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Amp Your Customer Service Statistics by Improving Data in Salesforce Service ...
Amp Your Customer Service Statistics by Improving Data in Salesforce Service ...Amp Your Customer Service Statistics by Improving Data in Salesforce Service ...
Amp Your Customer Service Statistics by Improving Data in Salesforce Service ...Informatica Cloud
 
Informatica Cloud for Oracle
Informatica Cloud for OracleInformatica Cloud for Oracle
Informatica Cloud for OracleDarren Cunningham
 
Why and How Migrate Informatica to ODI | Infa to ODI Migration | Infa to ODI ...
Why and How Migrate Informatica to ODI | Infa to ODI Migration | Infa to ODI ...Why and How Migrate Informatica to ODI | Infa to ODI Migration | Infa to ODI ...
Why and How Migrate Informatica to ODI | Infa to ODI Migration | Infa to ODI ...Jade Global
 
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsWhat Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsCloudera, Inc.
 
5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce Analytics5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce AnalyticsInformatica Cloud
 
Informatica Cloud Winter 2016 Release Webinar
Informatica Cloud Winter 2016 Release WebinarInformatica Cloud Winter 2016 Release Webinar
Informatica Cloud Winter 2016 Release WebinarInformatica Cloud
 
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about..."Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...Kai Wähner
 
Informatica Cloud Summer 2016 Release Webinar Slides
Informatica Cloud Summer 2016 Release Webinar SlidesInformatica Cloud Summer 2016 Release Webinar Slides
Informatica Cloud Summer 2016 Release Webinar SlidesInformatica Cloud
 
Chief Data Officer: DataOps - Transformation of the Business Data Environment
Chief Data Officer: DataOps - Transformation of the Business Data EnvironmentChief Data Officer: DataOps - Transformation of the Business Data Environment
Chief Data Officer: DataOps - Transformation of the Business Data EnvironmentCraig Milroy
 
Java 초보자를 위한 hadoop 설정
Java 초보자를 위한 hadoop 설정Java 초보자를 위한 hadoop 설정
Java 초보자를 위한 hadoop 설정HyeonSeok Choi
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Caserta
 

Destacado (16)

Informatica to ODI Migration – What, Why and How | Informatica to Oracle Dat...
Informatica to ODI Migration – What, Why and How |  Informatica to Oracle Dat...Informatica to ODI Migration – What, Why and How |  Informatica to Oracle Dat...
Informatica to ODI Migration – What, Why and How | Informatica to Oracle Dat...
 
Faster Cheaper Better-Replacing Oracle with Hadoop & Solr
Faster Cheaper Better-Replacing Oracle with Hadoop & SolrFaster Cheaper Better-Replacing Oracle with Hadoop & Solr
Faster Cheaper Better-Replacing Oracle with Hadoop & Solr
 
2010-08-26-mongodb-step-by-step-by-hexnova
2010-08-26-mongodb-step-by-step-by-hexnova2010-08-26-mongodb-step-by-step-by-hexnova
2010-08-26-mongodb-step-by-step-by-hexnova
 
[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba
[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba
[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Amp Your Customer Service Statistics by Improving Data in Salesforce Service ...
Amp Your Customer Service Statistics by Improving Data in Salesforce Service ...Amp Your Customer Service Statistics by Improving Data in Salesforce Service ...
Amp Your Customer Service Statistics by Improving Data in Salesforce Service ...
 
Informatica Cloud for Oracle
Informatica Cloud for OracleInformatica Cloud for Oracle
Informatica Cloud for Oracle
 
Why and How Migrate Informatica to ODI | Infa to ODI Migration | Infa to ODI ...
Why and How Migrate Informatica to ODI | Infa to ODI Migration | Infa to ODI ...Why and How Migrate Informatica to ODI | Infa to ODI Migration | Infa to ODI ...
Why and How Migrate Informatica to ODI | Infa to ODI Migration | Infa to ODI ...
 
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsWhat Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
 
5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce Analytics5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce Analytics
 
Informatica Cloud Winter 2016 Release Webinar
Informatica Cloud Winter 2016 Release WebinarInformatica Cloud Winter 2016 Release Webinar
Informatica Cloud Winter 2016 Release Webinar
 
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about..."Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
 
Informatica Cloud Summer 2016 Release Webinar Slides
Informatica Cloud Summer 2016 Release Webinar SlidesInformatica Cloud Summer 2016 Release Webinar Slides
Informatica Cloud Summer 2016 Release Webinar Slides
 
Chief Data Officer: DataOps - Transformation of the Business Data Environment
Chief Data Officer: DataOps - Transformation of the Business Data EnvironmentChief Data Officer: DataOps - Transformation of the Business Data Environment
Chief Data Officer: DataOps - Transformation of the Business Data Environment
 
Java 초보자를 위한 hadoop 설정
Java 초보자를 위한 hadoop 설정Java 초보자를 위한 hadoop 설정
Java 초보자를 위한 hadoop 설정
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
 

Similar a Integrated Data Warehouse with Hadoop and Oracle Database

Practical introduction to hadoop
Practical introduction to hadoopPractical introduction to hadoop
Practical introduction to hadoopinside-BigData.com
 
Making Sense of Big data with Hadoop
Making Sense of Big data with HadoopMaking Sense of Big data with Hadoop
Making Sense of Big data with HadoopGwen (Chen) Shapira
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinerySteve Loughran
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranJAX London
 
Summer Shorts: Big Data Integration
Summer Shorts: Big Data IntegrationSummer Shorts: Big Data Integration
Summer Shorts: Big Data Integrationibi
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Andrew Brust
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopSlim Baltagi
 
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio..."Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...Dataconomy Media
 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3xKinAnx
 
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.OW2
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...MapR Technologies
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online TrainingLearntek1
 

Similar a Integrated Data Warehouse with Hadoop and Oracle Database (20)

Integrated dwh 3
Integrated dwh 3Integrated dwh 3
Integrated dwh 3
 
Practical introduction to hadoop
Practical introduction to hadoopPractical introduction to hadoop
Practical introduction to hadoop
 
Making Sense of Big data with Hadoop
Making Sense of Big data with HadoopMaking Sense of Big data with Hadoop
Making Sense of Big data with Hadoop
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinery
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve Loughran
 
Summer Shorts: Big Data Integration
Summer Shorts: Big Data IntegrationSummer Shorts: Big Data Integration
Summer Shorts: Big Data Integration
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
 
Beyond TCO
Beyond TCOBeyond TCO
Beyond TCO
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
Robin_Hadoop
Robin_HadoopRobin_Hadoop
Robin_Hadoop
 
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio..."Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...
 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3
 
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 

Más de Gwen (Chen) Shapira

Velocity 2019 - Kafka Operations Deep Dive
Velocity 2019  - Kafka Operations Deep DiveVelocity 2019  - Kafka Operations Deep Dive
Velocity 2019 - Kafka Operations Deep DiveGwen (Chen) Shapira
 
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote Gwen (Chen) Shapira
 
Gluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGwen (Chen) Shapira
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Gwen (Chen) Shapira
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebookGwen (Chen) Shapira
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Gwen (Chen) Shapira
 
Streaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data MeetupStreaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data MeetupGwen (Chen) Shapira
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Gwen (Chen) Shapira
 
Fraud Detection for Israel BigThings Meetup
Fraud Detection  for Israel BigThings MeetupFraud Detection  for Israel BigThings Meetup
Fraud Detection for Israel BigThings MeetupGwen (Chen) Shapira
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereGwen (Chen) Shapira
 
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clustersNyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clustersGwen (Chen) Shapira
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingGwen (Chen) Shapira
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupGwen (Chen) Shapira
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupGwen (Chen) Shapira
 

Más de Gwen (Chen) Shapira (20)

Velocity 2019 - Kafka Operations Deep Dive
Velocity 2019  - Kafka Operations Deep DiveVelocity 2019  - Kafka Operations Deep Dive
Velocity 2019 - Kafka Operations Deep Dive
 
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote
 
Gluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGluecon - Kafka and the service mesh
Gluecon - Kafka and the service mesh
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebook
 
Kafka reliability velocity 17
Kafka reliability   velocity 17Kafka reliability   velocity 17
Kafka reliability velocity 17
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
 
Streaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data MeetupStreaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data Meetup
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016
 
Fraud Detection for Israel BigThings Meetup
Fraud Detection  for Israel BigThings MeetupFraud Detection  for Israel BigThings Meetup
Fraud Detection for Israel BigThings Meetup
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be there
 
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clustersNyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
 
Fraud Detection Architecture
Fraud Detection ArchitectureFraud Detection Architecture
Fraud Detection Architecture
 
Have your cake and eat it too
Have your cake and eat it tooHave your cake and eat it too
Have your cake and eat it too
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn Meetup
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka Meetup
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 

Último

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 

Último (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 

Integrated Data Warehouse with Hadoop and Oracle Database

  • 1. Building the Integrated Data Warehouse With Oracle Database and Hadoop Gwen Shapira, Senior Consultant
  • 2. Why Pythian •  Recognized Leader: •  Global industry leader in data infrastructure managed services and consulting with expertise in Oracle, Oracle Applications, Microsoft SQL Server, MySQL, big data and systems administration •  Work with over 200 multinational companies such as Forbes.com, Fox Sports, Nordion and Western Union to help manage their complex IT deployments •  Expertise: •  One of the world’s largest concentrations of dedicated, full-time DBA expertise. Employ 8 Oracle ACEs/ACE Directors •  Hold 7 Specializations under Oracle Platinum Partner program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC •  Global Reach & Scalability: •  24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response © 2012 – Pythian
  • 3. About Gwen Shapira •  Oracle ACE Director •  13 Years with pager •  7 as Oracle DBA •  Senior Consultant: •  Has MacBook, will travel. •  @gwenshap •  http://www.pythian.com/news/author/ shapira/ © 2012 – Pythian
  • 4. Agenda •  What is Big Data? •  Why do we care about Big Data? •  Why your DWH needs Hadoop? •  Examples of Hadoop in the DWH •  How to integrate Hadoop into your DWH •  Avoiding major pitfalls © 2012 – Pythian
  • 5. What is Big Data?
  • 6. MORE DATA THAN YOU CAN HANDLE © 2012 – Pythian
  • 7. MORE DATA THAN RELATIONAL DATABASES CAN HANDLE © 2012 – Pythian
  • 8. MORE DATA THAN RELATIONAL DATABASES CAN HANDLE CHEAPLY © 2012 – Pythian
  • 9. Data Arriving at fast Rates Typically unstructured Stored without aggregation Analyzed in Real Time For Reasonable Cost © 2012 – Pythian
  • 10. Where does Big Data come from? • Social media • Enterprise transactional data • Consumer behaviour • Multimedia • Sensors and embedded devices • Network devices © 2012 – Pythian
  • 11. Why all the Excitement? © 2012 – Pythian
  • 12. Complex Data Architecture © 2012 – Pythian
  • 13. Your DWH needs Hadoop
  • 14. Big Problems with Big Data •  It is: •  Unstructured •  Unprocessed •  Un-aggregated •  Un-filtered •  Repetitive •  And generally messy. Oh, and there is a lot of it. © 2012 – Pythian
  • 15. Technical Challenges •  Storage capacity •  Storage throughput Scalable storage •  Pipeline throughput •  Processing power •  Parallel processing Massive Parallel Processing •  System Integration •  Data Analysis Ready to use tools © 2012 – Pythian
  • 16. Hadoop Principles Bring Code to Data Share Nothing © 2012 – Pythian
  • 17. Hadoop in a Nutshell HDFS: Map-Reduce: Replicated Distributed Big-Data File Framework for writing massively parallel jobs System © 2012 – Pythian
  • 18. Hadoop Benefits •  Reliable solution based on unreliable hardware •  Designed for large files •  Load data first, structure later •  Designed to maximize throughput of large scans •  Designed to maximize parallelism •  Designed to scale •  Flexible development platform •  Solution Ecosystem © 2012 – Pythian
  • 19. Hadoop Limitations •  Hadoop is scalable but not fast •  Batteries not included •  Instrumentation not included either •  Well-known reliability limitations © 2012 – Pythian
  • 20. Hadoop in the Data Warehouse Use Cases and Customer Stories
  • 21. ETL for Unstructured Data Logs Flume Hadoop Web servers, Cleanup, app server, aggregation clickstreams Longterm storage DWH BI, batch reports © 2012 – Pythian
  • 22. ETL for Structured Data Sqoop, Hadoop OLTP Oracle, Perl Transformation MySQL, aggregation Informix… Longterm storage DWH BI, batch reports © 2012 – Pythian
  • 23. Bring the World into Your Datacenter © 2012 – Pythian
  • 24. Rare Historical Report © 2012 – Pythian
  • 25. Find Needle in Haystack © 2012 – Pythian
  • 26. We are not doing SQL anymore © 2012 – Pythian
  • 28. Sqoop Queries © 2012 – Pythian
  • 29. Sqoop is Flexible (for import) • Select columns from table where condition • Or write your own query • Split column • Parallel • Incremental • File formats © 2012 – Pythian
  • 30. Sqoop Import Examples •  Sqoop  import  -­‐-­‐connect  jdbc:oracle:thin:@//dbserver: 1521/masterdb     -­‐-­‐username  hr  -­‐-­‐table  emp     -­‐-­‐where  “start_date    ’01-­‐01-­‐2012’”     •  Sqoop  import  jdbc:oracle:thin:@//dbserver:1521/masterdb     -­‐-­‐username  myuser     -­‐-­‐table  shops  -­‐-­‐split-­‐by  shop_id     Must be indexed or -­‐-­‐num-­‐mappers  16   partitioned to avoid 16 full table scans © 2012 – Pythian
  • 31. Less Flexible Export • 100row batch inserts • Commit every 100 batches • Parallel export • Update mode Example: sqoop  export     -­‐-­‐connect  jdbc:oracle:thin:@//dbserver:1521/masterdb     -­‐-­‐table  bar     -­‐-­‐export-­‐dir  /results/bar_data   © 2012 – Pythian
  • 32. Fuse-DFS •  Mount HDFS on Oracle server: •  sudo yum install hadoop-0.20-fuse •  hadoop-fuse-dfs dfs:// name_node_hostname:namenode_port mount_point •  Use external tables to load data into Oracle •  File Formats may vary •  All ETL best practices apply © 2012 – Pythian
  • 33. Oracle Loader for Hadoop •  Load data from Hadoop into Oracle •  Map-Reduce job inside Hadoop •  Converts data types. •  Partitions and sorts •  Direct path loads •  Reduces CPU utilization on database © 2012 – Pythian
  • 34. Oracle Direct Connector to HDFS •  Create external tables of files in HDFS •  PREPROCESSOR  HDFS_BIN_PATH:hdfs_stream   •  All the features of External Tables •  Tested (by Oracle) as 5 times faster (GB/s) than FUSE-DFS © 2012 – Pythian
  • 35. Big Data Appliance and Exadata © 2012 – Pythian
  • 36. How not to Fail
  • 37. Data that belongs in RDBMS © 2012 – Pythian
  • 38. Prepare for Migration © 2012 – Pythian
  • 39. Use Hadoop Efficiently •  Understand your bottlenecks: •  CPU, storage or network? •  Reduce use of temporary data: •  All data is over the network •  Written to disk in triplicate. •  Eliminate unbalanced workloads •  Offload work to RDBMS •  Fine-tune optimization with Map-Reduce © 2012 – Pythian
  • 40. Your Data is NOT as BIG as you think © 2012 – Pythian
  • 41. Getting Started • Pick a Business Problem •  Acquire Data • Use right tool for the job •  Hadoop can start on the cheap •  Integrate the systems • Analyze data •  Get operational © 2012 – Pythian
  • 42. Thank you and QA To contact us… sales@pythian.com 1-877-PYTHIAN To follow us… http://www.pythian.com/news/ http://www.facebook.com/pages/The-Pythian-Group/163902527671 @pythian @pythianjobs http://www.linkedin.com/company/pythian © 2012 – Pythian