Big Data Adoption
Enterprise Integration of Disruptive Technologies

Prepared by: Alasdair Anderson
Date: 18 March 2013




PUBLIC
Big Data and the Enterprise




[Figure: "!="]
Business Context: HSBC (HSS), a business with a lot of data…



Global Business

• Global outsourcer of investment operations
• Active in 40+ countries and jurisdictions
• Over 150 operational technology systems
• Outsourcing is a diverse and increasingly complex business
Challenges in building Big Data Environments

[Figure: traditional warehouse architecture. Sources (Ops, Trades, Position, Corp Actions, External Market Data, Client, Exchange) are loaded by ETL into an integration layer (ODS, CMF, Staging), then by ETL into the Enterprise Logical Model warehouse, then by ETL into division marts (product and function strategic marts), which are read by channels (eCommerce, Analytical Tools, Reporting).]

Design time:
• ETL is a brittle, one-shot chance at success
• One version of the truth…
• Tight coupling to the relational model
• Any significant change initiates data migration

Run time:
• Vertical scale, big batch
• RDBMS struggle with scale-out; appliances are uneconomic
• Multi-marts increase duplication; cost increases with proliferation

Time to Market: months for any given slice, years in total
Total Cost: any volume or low-latency environment requires annual spend in the millions to tens of millions
Building Big Data platforms has been an unhappy experience

• Time to market has driven proliferation, not consolidation
• Delivery risk is high, as industry-wide failure rates show
• Ultimate customer satisfaction is low; we often end up answering yesterday’s questions tomorrow
• The economics of traditional technologies work against the proliferation of analytical platforms
  – Costs increase with the addition of data sources
  – Costs of change increase with the addition of data sources
• Processing ceilings are reached quickly when adding newer sources of data to traditional platforms
Crisis of Supply and Demand: we need a new approach

High-level requirements…

• A single data platform that can provide 360-degree views of clients, operations and products

– Functionally, the platform should support:
  – Continual development, integration and deployment
  – Parallel development streams
  – Integration of poly-structured datasets
  – Multiple views on a single dataset
  – …and act as an ENABLER of change

– Non-functionally, the platform should support:
  – A low-cost economic model for analytical platforms
  – Scaling to terabytes with high-throughput ingest and integration
  – Co-existence with our current estate
  – Accessibility to business and technology teams

Enter Hadoop!
Introducing any new technology to an enterprise

Adoption Lifecycle: Hadoop

Learn: Proof of Concept
Plan: Business Value
Build: Pilot Projects, Strategic Stack

What have we done?
What’s left, what’s next?
Big Data Vision




Big Data Vision: The Agile Information Lifecycle

[Figure: the lifecycle drawn as a cycle of Data Discovery, Events, Blotters, Ingest, MapReduce Processing and Analytical Application]
Insights rarely come from the first query or build; they are more likely to emerge after several iterations on a dataset.
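The MapReduce Processing stage of this lifecycle can be illustrated with a minimal, single-process sketch in Python. It is not the Hadoop API: on a cluster, Hadoop runs the map and reduce functions in parallel across nodes and performs the shuffle itself. The trade-event fields (account, security, side, quantity) and all function names are hypothetical, chosen only to echo the Trades and Position sources mentioned earlier.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical trade event: (account, security, side, quantity).
# The field layout is an illustrative assumption, not a real HSBC schema.
TradeEvent = tuple[str, str, str, int]
Key = tuple[str, str]


def map_trade(event: TradeEvent) -> Iterator[tuple[Key, int]]:
    """Map phase: emit one (key, value) pair per event, signing quantity by side."""
    account, security, side, quantity = event
    signed = quantity if side == "BUY" else -quantity
    yield (account, security), signed


def shuffle(pairs: Iterable[tuple[Key, int]]) -> dict[Key, list[int]]:
    """Shuffle phase: group emitted values by key (Hadoop does this across nodes)."""
    groups: dict[Key, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_position(key: Key, values: list[int]) -> tuple[Key, int]:
    """Reduce phase: collapse all values for one key into a net position."""
    return key, sum(values)


def run_job(events: Iterable[TradeEvent]) -> dict[Key, int]:
    """Chain map, shuffle and reduce over an in-memory collection of events."""
    mapped = (pair for event in events for pair in map_trade(event))
    return dict(reduce_position(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    sample = [
        ("ACC1", "XYZ", "BUY", 100),
        ("ACC1", "XYZ", "SELL", 30),
        ("ACC2", "XYZ", "BUY", 50),
    ]
    print(run_job(sample))  # {('ACC1', 'XYZ'): 70, ('ACC2', 'XYZ'): 50}
```

Because each phase is a plain function, iterating on a dataset amounts to swapping the mapper or reducer (for example, summing notional instead of quantity) and re-running the job over the same raw events.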

Hadoop Proof of Concept Scope: Guangzhou, China

• Using Hadoop: time to install a vendor package; ease of maintaining the cluster; performance comparison
• Developing on Hadoop: integration of existing databases; building applications on the cluster; porting existing code to Hadoop
• Advanced Analytics on Hadoop: development skill levels; enhancing an existing analytics package; building out a new modelling service
Proof of Concept Results

• Hadoop was installed and operational in a week
• 18 RDBMS warehouse and mart databases were ported to Hadoop in 4 weeks
• An existing batch that currently takes 3 hours was re-engineered on Hadoop: run time 10 minutes
• A current Java-based analytics routine was ported onto Hadoop, increasing data coverage and reducing execution time
• We lost the NameNode and had to rebuild the cluster…
Hadoop Code Day: Guangzhou, China

We sponsored a 24-hour code competition to allow the offshore teams to show their stuff.

We had over 50 volunteers for the event.

The volunteers were split into teams of 3 and given 24 hours to develop an application using the Proof of Concept cluster.

One week’s training was offered to participants on a casual basis.

All the teams delivered…
Next Step: Planning

Adoption Lifecycle

Learn: Proof of Concept
Plan: Business Value
Build: Pilot Projects, Strategic Stack
Big Data Plan: Big Data Economics (names removed to protect the innocent)




Hadoop Economics: Technology for Austerity

[Figure: Revenue, Margin and Cost]

Hadoop speaks to the economics of today:
growing product and capacity while increasing margin
Generic HSBC Big Data Use Cases

Volume File Processing
Characteristics
• High-volume, high-throughput processing of legacy flat files, XML or other structured and semi-structured data
Current challenges
• Cost: high-volume processing predominantly still resides on the mainframe, making low-complexity processing expensive
• Scale: the ability to grow mainframe capacity quickly is limited, and the ability to scale on distributed platforms is limited

Big Warehouse
Characteristics
• Multi-source warehouse analytics environment providing a single data platform across multiple business lines
• Integration of poly-structured data
Current challenges
• Time to Market: Data Warehouse / MI projects have proved extremely challenging to implement in HSBC and in the finance industry in general
• Complexity: data integration of even group-standard systems has proved difficult due to the variety of data structures and content
• Latency: real-time MI is still only available via reporting directly from source

Advanced Analytics
Characteristics
• Statistical modelling and what-if analysis on group-wide data across multiple business lines
• Production of data-derived products
Current challenges
• Scale: traditional analytic data platforms have only been able to scale vertically
• Cost: the amount of compute power required to perform volume statistical operations is cost-prohibitive
• Fidelity: analytical calculations are typically run on aggregate totals, leading to a disconnect between events and the derived conclusions or decisions

Legend: Day 1 Value / Strategic Value
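For the Volume File Processing use case, one possible way to move flat-file processing onto Hadoop without writing Java is Hadoop Streaming, which runs any executable as the mapper or reducer, feeds it input lines on standard input, expects tab-separated key/value lines on standard output, and sorts mapper output by key before it reaches the reducer. This is a sketch under assumptions, not necessarily the approach used at HSBC: the fixed-width record layout (8-character account, 3-character currency, 12-digit amount in cents) is hypothetical, real legacy layouts will differ, and the script name job.py is also hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator

# Hypothetical fixed-width layout for a legacy flat-file record.
# Real layouts (e.g. from mainframe copybooks) will differ; these offsets are assumptions.
ACCOUNT = slice(0, 8)         # 8-character account identifier
CURRENCY = slice(8, 11)       # 3-character currency code
AMOUNT_CENTS = slice(11, 23)  # 12-digit amount in cents


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Parse each fixed-width record and emit 'account|currency<TAB>amount'."""
    for line in lines:
        record = line.rstrip("\n")
        if len(record) < AMOUNT_CENTS.stop:
            continue  # skip short records instead of failing the whole job
        key = f"{record[ACCOUNT].strip()}|{record[CURRENCY]}"
        yield f"{key}\t{int(record[AMOUNT_CENTS])}"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum amounts per key; relies on input being sorted by key, as Streaming provides."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run as "python job.py map" or "python job.py reduce" (job.py is a hypothetical name).
    stage = mapper if sys.argv[1] == "map" else reducer
    for output_line in stage(sys.stdin):
        print(output_line)
```

Locally, the same flow can be simulated with a shell pipeline such as cat records.txt | python job.py map | sort | python job.py reduce; on a cluster, the script would be supplied to the Hadoop Streaming jar through its -mapper and -reducer options.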
Big Data Plan: When and Where




Next Step: Planning

Adoption Lifecycle

Learn: Proof of Concept
Plan: Business Value
Build: Pilot Projects, Strategic Stack
So we’re done?

Not quite…
Remaining Challenges: Big Data Operations

Big Data Operations
• Is Hadoop anti-virtualisation?
• High availability / disaster recovery need to improve
• Security and data privacy concerns
• Data federation

Big Data Organisation
• Segregation of duties
• Big Data doesn’t want separate app, database, OS and storage teams; the platform demands skilled generalists

Hype / Cynicism
• USE IT AS A POSITIVE!!!
• Place Big Data in a competitive situation against your existing Information Management technologies; if you can’t get the job done better/faster/cheaper, alter your decision tree
The art of the possible in 24 hours…

[Photos from the Hadoop Code Day]
• Hadoop excites… (and tires)
• Hadoop on iPad & Android
• Hadoop on HTML5 & Flex
• The winners… Hadoop & R for Portfolio Optimisation

Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneUiPathCommunity
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Nikki Chapple
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 

Último (20)

React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyone
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 

Enterprise Integration of Disruptive Technologies

  • 1. Big Data Adoption: Enterprise Integration of Disruptive Technologies. Prepared by: Alasdair Anderson. Date: 18 March 2013. PUBLIC
  • 2. Big Data and the Enterprise: !=
  • 3. Business Context: HSBC (HSS), a business with a lot of data. Global business: a global outsourcer of investment operations, active in 40+ countries and jurisdictions, with over 150 operational technology systems. Outsourcing is a diverse and incrementally complex business.
  • 4. Challenges in building Big Data Environments. The diagram shows the traditional architecture. Data flows from Source (Ops, Trades, Position, Corp Actions, External, Market Data, Client, Exchange) via ETL into Integration (ODS, CMF, Staging), then via ETL into the Warehouse (Enterprise Logical Model), then via ETL into Division Marts (Product, Function and Strategic Marts). Channels (eCommerce, Analytical Tools, Reporting) read from the marts. Annotations: ETL is a brittle, one-shot attempt at success; one version of the truth; tight coupling to the relational model; any significant change initiates data migration; vertical scale: RDBMSs struggle to scale out; multiple marts increase duplication; run big batch; appliances are uneconomic; cost increases with proliferation. Time to market: months for any given slice, years in total. Total cost: any high-volume or low-latency environment requires annual spend in the millions to tens of millions.
  • 5. Building Big Data platforms has been an unhappy experience. Time to market has increased proliferation, not consolidation. Delivery risk is high, as witnessed by industry-wide failure rates. Ultimate customer satisfaction is low: we often end up answering yesterday's questions tomorrow. The economics of traditional technologies work against the proliferation of analytical platforms, because costs increase with each added data source, and so do the costs of change. Processing ceilings are reached quickly when newer sources of data are added to traditional platforms.
  • 6. Crisis of supply and demand: we need a new approach. High-level requirements: a single data platform that can provide 360-degree views of clients, operations and products. Functionally, the platform should support continual development, integration and deployment; parallel development streams; integration of poly-structured datasets; and multiple views on single datasets. It should act as an ENABLER of change. Non-functionally, the platform should offer a low-cost economic model for analytical platforms; scale to terabytes with high-throughput ingest and integration; co-exist with our current estate; and be accessible to business and technology teams. Enter Hadoop!
  • 7. Introducing any new technology to an enterprise. Adoption lifecycle for Hadoop: Learn, Plan, Build; stages: Proof of Concept, Business Value, Pilot Projects, Strategic Stack. What have we done? What's left, what's next?
  • 8. Big Data Vision
  • 9. Big Data Vision: The Agile Information Lifecycle. Elements shown: Data Events, Ingest, Map Reduce Processing, Discovery, Analytical Blotters, Application. Insights rarely happen on the first query or build. They are more likely to occur after several iterations on a dataset.
  • 10. Hadoop Proof of Concept Scope: Guangzhou, China. Areas covered: using a vendor package; time to install Hadoop; ease of maintaining the cluster; performance comparison; developing applications on Hadoop; integration of existing code on the cluster; porting existing databases to Hadoop; advanced analytics on Hadoop; enhancing an existing analytics package; building out a new modelling service; development skills levels.
  • 11. Proof of Concept Results: Hadoop was installed and operational in a week. 18 RDBMS warehouse and mart databases were ported to Hadoop in 4 weeks. An existing batch that currently takes 3 hours was re-engineered on Hadoop, with a run time of 10 minutes. A current Java-based analytics routine was ported onto Hadoop, which increased data coverage and reduced execution time. We lost the NameNode and had to rebuild the cluster...
  • 12. Hadoop Code Day: Guangzhou, China. We sponsored a 24-hour code competition so the offshore teams could show their stuff. Over 50 volunteers signed up for the event. The volunteers were split into teams of 3 and given 24 hours to develop an application on the Proof of Concept cluster. One week's training was offered to the participants on a casual basis. All the teams delivered...
  • 13. Next Step: Planning. Adoption lifecycle: Learn, Plan, Build; stages: Proof of Concept, Business Value, Pilot Projects, Strategic Stack.
  • 14. Big Data Plan: Big Data Economics (names removed to protect the innocent)
  • 15. Hadoop Economics: Technology for Austerity. Revenue, margin, cost: Hadoop speaks to the economics of today. It lets us grow product and capacity while also increasing margin.
  • 16. Generic HSBC Big Data Use Cases (ordered from Day 1 Value to Strategic Value)
    – Volume File Processing. Characteristics: high-volume, high-throughput processing of legacy flat files, XML or other structured and semi-structured data. Current challenges: Cost: high-volume processing still predominantly resides on the mainframe, which makes low-complexity processing expensive. Scale: the ability to grow out mainframe capacity quickly is limited, and so is the ability to scale on distributed platforms.
    – Big Warehouse. Characteristics: a multi-source warehouse analytics environment providing a single data platform across multiple business lines; integration of poly-structured data. Current challenges: Time to market: data warehouse / MI projects have proved extremely challenging to implement in HSBC and in the finance industry in general. Complexity: data integration of even group-standard systems has proved difficult due to the variety of data structures and content. Latency: real-time MI is still only available via reporting directly from source.
    – Advanced Analytics. Characteristics: statistical modelling and what-if analysis on group-wide data across multiple business lines; production of data-derived products. Current challenges: Scale: traditional analytic data platforms have only been able to scale vertically. Cost: the amount of compute power required to perform volume statistical operations is cost-prohibitive. Fidelity: analytical calculations are typically run on aggregate totals, which leads to a disconnect between events and the derived conclusions or decisions.
  • 17. Big Data Plan: When and Where
  • 18. Next Step: Planning. Adoption lifecycle: Learn, Plan, Build; stages: Proof of Concept, Business Value, Pilot Projects, Strategic Stack.
  • 19. So we’re done? Not quite...
  • 20. Remaining Challenges: Big Data Operations
    – Big Data Operations: Is Hadoop anti-virtualisation? High availability / disaster recovery needs to improve. Security and data privacy concerns. Data federation.
    – Big Data Organisation: Segregation of duties. Big Data doesn't want a separate app, database, OS and storage team; the platform demands skilled generalists.
    – Hype / Cynicism: USE IT AS A POSITIVE!!! Put Big Data in competition with your existing information management technologies. If you can't get the job done better/faster/cheaper, then alter your decision tree?
  • 21. The art of the possible in 24 hours... Hadoop excites... Hadoop on iPad & Android (and tires). The winners... Hadoop on HTML5 & Flex. Hadoop & R for Portfolio Optimisation.
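Slide 9 places Map Reduce at the centre of the agile information lifecycle. Below is a minimal single-process Python sketch of the map, shuffle and reduce phases. It assumes hypothetical trade records with `country` and `notional` fields, neither of which appears in the slides. The sketch illustrates the programming model only. It does not use the Hadoop API, and on a real cluster the framework distributes and sorts the shuffle across nodes.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, Any]
Pair = Tuple[str, int]


def map_reduce(
    records: Iterable[Record],
    mapper: Callable[[Record], Iterable[Pair]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Single-process illustration of map -> shuffle -> reduce (not Hadoop itself)."""
    # Map: each input record emits zero or more (key, value) pairs.
    intermediate: List[Pair] = []
    for record in records:
        intermediate.extend(mapper(record))

    # Shuffle: group all values by key (Hadoop performs this step across nodes).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce: collapse each key's values to a single result, in key order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Hypothetical records; field names and values are illustrative assumptions.
trades: List[Record] = [
    {"country": "HK", "notional": 1_000_000},
    {"country": "GB", "notional": 250_000},
    {"country": "HK", "notional": 500_000},
    {"country": "CN", "notional": 750_000},
]


def notional_by_country(trade: Record) -> Iterator[Pair]:
    # Hypothetical mapper: key each trade by country.
    yield trade["country"], trade["notional"]


def total(_key: str, values: List[int]) -> int:
    # Hypothetical reducer: sum the notionals for one country.
    return sum(values)


totals = map_reduce(trades, notional_by_country, total)
print(totals)  # {'CN': 750000, 'GB': 250000, 'HK': 1500000}
```

In this sketch, you re-cut the same records by swapping in a different `mapper`/`reducer` pair, and the input stays untouched. That is roughly the iterate-on-a-dataset property the lifecycle slide describes.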

Speaker notes

  1. In essence: we are a processor of other people's data. Challenges: nobody does data the same way, even within the same systems. The differences are in definitions, formats and content.
  2. In essence: we are a processor of other people's data. Challenges: nobody does data the same way, even within the same systems. The differences are in definitions, formats and content.
  3. Dedicated ETL is an expensive way of doing things. Big RDBMSs or dedicated appliances are expensive. Marts, marts everywhere. CONCLUSION: high volume and/or low latency is very expensive to run. RESULT: people are becoming reluctant to invest in these platforms and are looking for a service that can start small and grow.
  4. The road to Damascus... The vision is HSS-only at this point in time. The search for an alternative way of doing things has led us to Hadoop. Hadoop lowers the barrier to entry for compute-style solutions to data problems. CONCLUSION: we view Hadoop as THE future technology for data platforms. RESULT: we have begun the technology adoption process in the bank.
  5. The road to Damascus... The vision is HSS-only at this point in time. The search for an alternative way of doing things has led us to Hadoop. Hadoop lowers the barrier to entry for compute-style solutions to data problems. CONCLUSION: we view Hadoop as THE future technology for data platforms. RESULT: we have begun the technology adoption process in the bank.
  6. Today's biggest business challenge is information management, which currently means agility in delivering data integration and flexibility to present multiple views of data. The biggest business opportunity is analytics, such as scenario modelling and portfolio efficiency measurement. These all require big compute.
  7. ...here's what it looks like. Walk left to right. Explain MapReduce. Contrast the old way with our vision of the new way. The EDW will be around for some time to come but will gradually be superseded. MapReduce will be implemented via high-level languages. A single warehouse becomes achievable. Marts are retired in favour of views onto the base data. The value-add will come from data discovery, iterative ETL and hypothesis testing. CONCLUSION: Hadoop brings massive compute to bear on these problems, affordably.
  8. This is next-generation ETL. The ETL process becomes truly iterative. Accept that you will get it wrong the first time round; Hadoop makes the penalty for failure minimal. The value-add will come from data discovery, iterative ETL and hypothesis testing. CONCLUSION: ETL moves from brittle to "bend, don't break". RESULT: when you build your Big Warehouse, adding more data, systems or perspectives is a low-tax operation.
  9. Where we've got to. Go through the key challenges. CONCLUSION: it's a journey, and we're walking through it just now. RESULT: the first two have been addressed; challenges remain.
  10. Where we've got to. Go through the key challenges. CONCLUSION: it's a journey, and we're walking through it just now. RESULT: the first two have been addressed; challenges remain.
  11. ...our experience was: a vendor Hadoop package makes sense for an organisation like ours. Data loads took days, not months. We were quickly able to automate the loads. We used Apache tools only. BONUS: Calypso data, new for HSS. HACKATHON: open invite to all markets staff. Objective: to use Hadoop against the business use case. Set judging criteria. A straight 24 hours over a weekend. Competition prizes. Attended by nearly 60 staff, equal to 20% of our China office. 18 teams, 17 delivered. The winning application was stunning. CONCLUSION: Hadoop is a great functional fit for our business demand. RESULT: a high level of confidence in the technology.
  12. Today's biggest business challenge is information management, which currently means agility in delivering data integration and flexibility to present multiple views of data. The biggest business opportunity is analytics, such as scenario modelling and portfolio efficiency measurement. These all require big compute.
  13. Where we've got to. Go through the key challenges. CONCLUSION: it's a journey, and we're walking through it just now. RESULT: the first two have been addressed; challenges remain.
  14. This is next-generation ETL. The ETL process becomes truly iterative. Accept that you will get it wrong the first time round; Hadoop makes the penalty for failure minimal. The value-add will come from data discovery, iterative ETL and hypothesis testing. CONCLUSION: ETL moves from brittle to "bend, don't break". RESULT: when you build your Big Warehouse, adding more data, systems or perspectives is a low-tax operation.
  15. Where we've got to. Go through the key challenges. CONCLUSION: it's a journey, and we're walking through it just now. RESULT: the first two have been addressed; challenges remain.
  16. Today's biggest business challenge is information management, which currently means agility in delivering data integration and flexibility to present multiple views of data. The biggest business opportunity is analytics, such as scenario modelling and portfolio efficiency measurement. These all require big compute.
  17. Where we've got to. Go through the key challenges. CONCLUSION: it's a journey, and we're walking through it just now. RESULT: the first two have been addressed; challenges remain.
  18. Where we've got to. Go through the key challenges. CONCLUSION: it's a journey, and we're walking through it just now. RESULT: the first two have been addressed; challenges remain.
  19. ...our experience was: a vendor Hadoop package makes sense for an organisation like ours. Data loads took days, not months. We were quickly able to automate the loads. We used Apache tools only. BONUS: Calypso data, new for HSS. HACKATHON: open invite to all markets staff. Objective: to use Hadoop against the business use case. Set judging criteria. A straight 24 hours over a weekend. Competition prizes. Attended by nearly 60 staff, equal to 20% of our China office. 18 teams, 17 delivered. The winning application was stunning. CONCLUSION: Hadoop is a great functional fit for our business demand. RESULT: a high level of confidence in the technology.
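Speaker notes 7 and 8 describe two changes: marts are retired in favour of views onto the base data, and ETL becomes iterative. One way to picture this is schema-on-read. The raw feed is stored once, exactly as ingested, and each view applies its own parsing at read time. The Python sketch below assumes a hypothetical comma-separated trade feed and two hypothetical views. It illustrates the idea only and is not the method described in the talk.

```python
import csv
import io
from collections import Counter
from typing import Callable, Dict, Iterator, List

# Hypothetical raw feed, stored exactly as ingested: trade_id,client,product,qty
RAW_TRADES: List[str] = [
    "T1,ClientA,Equity,100",
    "T2,ClientB,Bond,250",
    "T3,ClientA,Bond,75",
]


def parse(lines: List[str]) -> Iterator[Dict[str, str]]:
    """Apply the (assumed) schema at read time; the stored lines are never modified."""
    for trade_id, client, product, qty in csv.reader(io.StringIO("\n".join(lines))):
        yield {"trade_id": trade_id, "client": client, "product": product, "qty": qty}


def product_view(lines: List[str]) -> Dict[str, int]:
    """Total quantity per product, computed on demand instead of stored in a mart."""
    totals: Counter = Counter()
    for row in parse(lines):
        totals[row["product"]] += int(row["qty"])
    return dict(totals)


def client_view(lines: List[str]) -> Dict[str, int]:
    """Number of trades per client, derived from the same base data."""
    return dict(Counter(row["client"] for row in parse(lines)))


# Hypothetical registry of views; a new perspective is one more entry, no migration.
VIEWS: Dict[str, Callable[[List[str]], Dict[str, int]]] = {
    "product": product_view,
    "client": client_view,
}

for name, view in VIEWS.items():
    print(name, view(RAW_TRADES))
```

In this sketch, adding a new perspective means adding one function to `VIEWS`. `RAW_TRADES` is never rewritten or migrated, which mirrors the "low-tax" claim in note 8 under these assumptions.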