Big Data redefines Enterprise Data Warehouse
Big Data Innovation, Unicom - Bangalore
February 2013
Raghu Kashyap
About Raghu Kashyap
Personal

■ Director – Data Insights Group @ Orbitz Worldwide
■ eMail: raghu.kashyap@orbitz.com
■ Twitter: @ragskashyap
■ Blog: http://kashyaps.com
■ LinkedIn: http://www.linkedin.com/in/raghukashyap/




Areas of Responsibility

■ Orbitz Services Bangalore Center Head
■ Lead the Big Data team that builds out the global data infrastructure for Orbitz Worldwide and provides business insights
■ US, Europe, Australia (APAC)



 page 2
Orbitz in a nutshell




page 3
Orbitz Worldwide




page 4
Back to the future.




page 5
Vendor evaluation

     • KARMAsphere
     • Datameer
     • Aster Data




page 6
Traditional Data warehouse

     [Diagram, labeled Greenplum: Raw logs -> ETL -> Staging table -> Temp tables -> ETL -> Data Mart]

page 7
Hadoop Infrastructure




page 8
Redefine Enterprise Data warehouse
    • ETL only approach: 2:12 seconds
    • Run map reduce job: 1m 14.298s
    • Port flat file to Greenplum using GP connector: 5.077 s
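
The timings on this slide can be compared with a little arithmetic. The sketch below is only illustrative: it assumes that "2:12 seconds" means 2 minutes 12 seconds, and that the Hadoop path is the map reduce job followed by the connector load with nothing else in between. The function and variable names are made up for this example.

```python
def to_seconds(minutes: float, seconds: float) -> float:
    """Convert a minutes/seconds pair into seconds."""
    return minutes * 60 + seconds


# Assumption: "2:12 seconds" on the slide is read as 2 min 12 s.
etl_only = to_seconds(2, 12)

# Timings quoted on the slide for the Hadoop-based path.
map_reduce_job = to_seconds(1, 14.298)
gp_connector_load = to_seconds(0, 5.077)

# Assumption: the two Hadoop-path steps run one after the other.
hadoop_total = map_reduce_job + gp_connector_load
saved = etl_only - hadoop_total
speedup = etl_only / hadoop_total

print(f"ETL only:      {etl_only:.3f} s")
print(f"Hadoop + load: {hadoop_total:.3f} s")
print(f"Saved:         {saved:.3f} s ({speedup:.2f}x faster)")
```

Under those assumptions the Hadoop path takes roughly 79 seconds against 132 seconds for the ETL only run.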


page 9
Approach with Hadoop and ETL

     [Diagram: Raw logs -> Map Reduce -> Flat files -> GP Connector -> External Tables -> ETL -> Event Model (Greenplum)]

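
The first half of this flow (raw logs, map reduce, flat files) can be sketched in plain Python. This is not the Hadoop API and not the job used at Orbitz: the log line format, the delimiter, and every function name below are hypothetical, chosen only to show the shape of a map step, a reduce step, and the delimited rows that a connector could then expose as an external table.

```python
from collections import defaultdict

# Hypothetical raw log lines of the form "<timestamp> <tag>=<value>".
raw_logs = [
    "2013-02-01T10:00:00 pos=ORB",
    "2013-02-01T10:00:05 pos=ORBC",
    "2013-02-01T10:00:09 pos=ORB",
]


def map_line(line):
    """Map step: emit ((tag, value), 1) for one log line."""
    _, pair = line.split(" ", 1)
    tag, value = pair.split("=", 1)
    yield (tag, value), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts for each (tag, value) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def to_flat_file_rows(totals, delimiter="|"):
    """Render the reduced totals as delimited flat-file rows."""
    return [
        delimiter.join([tag, value, str(count)])
        for (tag, value), count in sorted(totals.items())
    ]


mapped = [pair for line in raw_logs for pair in map_line(line)]
totals = reduce_counts(mapped)
rows = to_flat_file_rows(totals)

for row in rows:
    print(row)
```

In a real cluster the map and reduce steps would run as distributed tasks and the rows would be written to files in HDFS; the in-memory lists here stand in for both.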

page 10
Resolving database keys

    tag_value_dim                        Greenplum tag_value_dim
    id     tag     value                 id     tag     value
    1      pos     ORB                   200    pos     ORB
    2      pos     ORBC                  157    pos     ORBC
    3      pos     ORB

                              ETL

    fact                                 Greenplum fact
    id     tag value id    fact value    id     tag value id    value
           1               $ 5600               200             $ 5600
           3               $ 7500               200             $ 7500
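
The key resolution on this slide can be illustrated with a small lookup. The sketch below is one possible way to do it, not necessarily how the ETL at Orbitz works: it assumes the (tag, value) pair acts as the natural key shared by both sides, and the function and variable names are invented for the example. The row values are the ones shown on the slide.

```python
# Source-side dimension rows as (id, tag, value), taken from the slide.
source_dim = [(1, "pos", "ORB"), (2, "pos", "ORBC"), (3, "pos", "ORB")]

# Greenplum dimension rows as (id, tag, value), taken from the slide.
greenplum_dim = [(200, "pos", "ORB"), (157, "pos", "ORBC")]

# Source-side fact rows as (tag_value_id, fact_value in dollars).
source_fact = [(1, 5600), (3, 7500)]


def resolve_keys(source_dim, greenplum_dim, source_fact):
    """Rewrite source dimension ids in the fact rows to Greenplum ids.

    Assumption: (tag, value) identifies the same dimension member on
    both sides, so it can bridge the two id spaces.
    """
    natural_by_source_id = {
        sid: (tag, value) for sid, tag, value in source_dim
    }
    greenplum_id_by_natural = {
        (tag, value): gid for gid, tag, value in greenplum_dim
    }
    resolved = []
    for source_id, amount in source_fact:
        natural_key = natural_by_source_id[source_id]
        resolved.append((greenplum_id_by_natural[natural_key], amount))
    return resolved


resolved_fact = resolve_keys(source_dim, greenplum_dim, source_fact)
print(resolved_fact)  # [(200, 5600), (200, 7500)]
```

Source ids 1 and 3 both carry pos=ORB, so both fact rows end up pointing at Greenplum id 200, matching the right-hand fact table on the slide.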



page 11
Hadoop Configuration

     • 74 Nodes
     • >1PB
     • Hive
     • Flume
     • HBase
     • R
     • Cloudera Distribution
     • Greenplum Connector




page 12
Hadoop Applications

     Site Analytics


     Machine Learning


     Multivariate Testing (MVT) Analysis


     Production Logs


     Hotel Rate Cache TTL


page 13
Hadoop Usage




page 14
Business Performance Monitoring

     • EFX
     • Marketing channels
     • Shopper patterns
     • Recommendation Module




page 15
Multi-channel attribution




page 16
MVT

     Analyze behavioral and test data from our multivariate testing (MVT)




page 17
Lessons Learnt




   Analytics using Big Data comes with a price.

   Data governance

   Senior leadership buy-in

   "I can't tell you the key to success, but the key to failure is
    trying to please everyone." - Ed Sheeran
page 18
Thank you




page 19
