SOA & Big data

Arnon Rotem-Gal-Oz

Sept 2012 – iOS6 launched with new maps application
http://theamazingios6maps.tumblr.com/




But something went terribly wrong…

•  It isn't just about getting all the data there
•  Algorithms are cool but we need humans in the loop
•  Hire the right people
•  Test! Test! Test!







It isn't just one pile of data
Integrating Big data & SOA

Yoel Ben Avraham - http://www.flickr.com/photos/epublicist/3546059144/
Data Refinery

Ofer Berger - http://www.haifacity.com/allsites/allpic/a/A1738/A1738Pic3326.jpg
[Diagram: department servers and databases connected through several integration styles: ETL integration, DB integration, file-based integration and online integration.]
  
 
Object soup

[Diagram: dozens of loosely connected components with three-letter names (ASB, BLT, AFT, TGI, FRY, and so on), with no grouping or structure.]
 
Services

[Diagram: the same components grouped into services: Customer, Orders, Promotions and Invoices.]
[Diagram: SOA components and their relations. Components: Service, Service consumer, Endpoint, Contracts, Policy, Messages. Relations: adheres to, governed by, binds to, exposes, serves, understands, implements, describes, sends/receives. Key: component, relation.]
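The diagram names the core SOA components but not how they interact at runtime. A minimal Python sketch, assuming an in-process call stands in for a real network endpoint: a hypothetical `CustomerService` owns its data, implements a `Contract`, and exposes an `Endpoint`; a `Consumer` binds to that endpoint and exchanges `Message`s. All class names and sample data are illustrative, not taken from the talk.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Message:
    action: str
    body: dict


@dataclass(frozen=True)
class Contract:
    """Names the message actions a service understands."""
    name: str
    actions: FrozenSet[str]


@dataclass
class Endpoint:
    """Address through which a service receives messages for its contract."""
    address: str
    contract: Contract
    handler: Callable[[Message], Message]

    def receive(self, msg: Message) -> Message:
        # Reject anything the contract does not describe.
        if msg.action not in self.contract.actions:
            raise ValueError(f"{msg.action!r} is not in contract {self.contract.name!r}")
        return self.handler(msg)


class CustomerService:
    """Hypothetical service: its data is private and reachable only via messages."""
    contract = Contract("customer", frozenset({"get_customer"}))

    def __init__(self) -> None:
        self._customers = {"42": {"surname": "McDonalds", "name": "Old"}}

    def _handle(self, msg: Message) -> Message:
        cid = msg.body["id"]
        return Message("customer", {"id": cid, **self._customers.get(cid, {})})

    def expose(self, address: str) -> Endpoint:
        return Endpoint(address, self.contract, self._handle)


class Consumer:
    """Binds to an endpoint and talks to it only through contract messages."""

    def __init__(self, endpoint: Endpoint) -> None:
        self.endpoint = endpoint

    def get_customer(self, cid: str) -> dict:
        reply = self.endpoint.receive(Message("get_customer", {"id": cid}))
        return reply.body
```

Usage: `Consumer(CustomerService().expose("inproc://customer")).get_customer("42")` returns the customer record, while a message whose action is not in the contract is rejected at the endpoint. The point of the separation is that the consumer depends only on the contract and endpoint, never on the service's internal data.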
  
[Diagram: services in the example system: Customer, Interactions, Agents and Categories.]
  
Integrating Big data & SOA




Saga

[Diagram: the Saga pattern. An Initiator (service consumer) creates a context; Participators (services) register with a Coordinator*; activities and replies flow between them under a prepare/commit/undo protocol, and each participator can perform an activity or compensate for it. Key: SOA component, pattern component, relation, concern/attribute.]
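The diagram shows the full pattern with a coordinator and a prepare/commit/undo protocol. A simpler Python sketch of the core idea, assuming a single in-process coordinator (the hypothetical `run_saga` function) and no separate prepare phase: each participant supplies a perform step and a compensating step, and if any step fails, the steps already completed are compensated in reverse order. The step names in the demo are hypothetical.

```python
from typing import Callable, List, Tuple

# (participant name, perform activity, compensate activity)
Step = Tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_saga(steps: List[Step], context: dict) -> bool:
    """Perform each step in order; on failure, compensate completed steps in reverse.

    Returns True if every step succeeded, False if the saga was compensated.
    """
    log = context.setdefault("log", [])
    completed: List[Tuple[str, Callable[[dict], None]]] = []
    for name, perform, compensate in steps:
        try:
            perform(context)
        except Exception as exc:  # any participant failure aborts the saga
            log.append(f"{name} failed: {exc}")
            for done_name, undo in reversed(completed):
                undo(context)
                log.append(f"compensated {done_name}")
            return False
        completed.append((name, compensate))
        log.append(f"performed {name}")
    return True


if __name__ == "__main__":
    ctx = {"stock": 1}

    def reserve(c: dict) -> None:
        c["stock"] -= 1

    def release(c: dict) -> None:
        c["stock"] += 1

    def charge(c: dict) -> None:
        raise RuntimeError("card declined")

    def refund(c: dict) -> None:
        pass

    print(run_saga([("reserve", reserve, release), ("charge", charge, refund)], ctx))
    print(ctx)
```

Compensation here is a business-level undo (release the reservation, refund the charge) rather than a database rollback, which is what lets a saga span services that do not share a transaction.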
  	
  
[Diagram: a Hadoop cluster containing HCatalog (data management); HBase tables for Interactions, Customer and Categories; raw data (HDFS) fed through ETL from interaction recordings and NIM; and resolved interactions (HDFS).]
So, what's the problem?
& Big data can't move
  
Performance of joins in a distributed system sucks!
  

Node 1: Interactions 0-99, customers A-H
Node 2: Interactions 100-199, customers I-M
Node 3: Interactions 200-299, customers N-Z
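This layout partitions interactions by id and customers by surname, so joining an interaction to its customer often means a call to another node. A small Python sketch that counts those cross-node lookups, assuming exactly the ranges on the slide (the function names are hypothetical).

```python
from typing import Iterable, Tuple


def node_for_interaction(interaction_id: int) -> int:
    """Interactions 0-99 -> node 1, 100-199 -> node 2, 200-299 -> node 3."""
    if not 0 <= interaction_id <= 299:
        raise ValueError("interaction id outside the partitioned range")
    return interaction_id // 100 + 1


def node_for_customer(surname: str) -> int:
    """Customers A-H -> node 1, I-M -> node 2, N-Z -> node 3."""
    first = surname[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError("surname must start with a letter A-Z")
    if first <= "H":
        return 1
    if first <= "M":
        return 2
    return 3


def remote_lookups(pairs: Iterable[Tuple[int, str]]) -> int:
    """Count (interaction id, customer surname) join pairs whose rows live on different nodes."""
    return sum(
        1 for iid, surname in pairs
        if node_for_interaction(iid) != node_for_customer(surname)
    )
```

For example, `remote_lookups([(5, "McDonalds"), (150, "Jones"), (250, "Adams")])` is 2: two of the three joins need a network hop. At big-data volumes those hops dominate, which is one motivation for storing customer details inside the interaction document.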



    {"Interaction": {
        "id": "5",
        "participants": {
            "customer": [
                {"surname": "McDonalds", "name": "Old"}
            ]
        }
    }}
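The document above embeds the customer inside the interaction, trading duplication for locality: reading an interaction together with its customers needs no join. A Python sketch using the standard `json` module; `customer_names` and `embed_customer` are hypothetical helpers.

```python
import json
from typing import List

DOC = """
{"Interaction": {
    "id": "5",
    "participants": {
        "customer": [
            {"surname": "McDonalds", "name": "Old"}
        ]
    }
}}
"""


def customer_names(document: str) -> List[str]:
    """Return 'name surname' for each customer embedded in an interaction document."""
    interaction = json.loads(document)["Interaction"]
    customers = interaction.get("participants", {}).get("customer", [])
    return [f"{c['name']} {c['surname']}" for c in customers]


def embed_customer(document: str, customer: dict) -> str:
    """Denormalize: copy a customer record into the interaction instead of referencing it by id."""
    data = json.loads(document)
    data["Interaction"].setdefault("participants", {}).setdefault("customer", []).append(customer)
    return json.dumps(data)
```

The cost of this design is on the write side: a change to a customer has to be propagated to every interaction document that embeds a copy of it.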
  
Cookie cutter scalability
  
Cell architecture

[Diagram: identical cells: Node 1, Node 2, Node 3 … Node N.]
  
Cell Architecture

[Diagram: services inside a cell communicating over a BUS: Categories (HBase), Customers (HBase), ORCA (HBase), Interactions (HDFS), Reference Data (HBase), … (HBase).]
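In a cell architecture each cell is a self-contained copy of the services and their storage, and scaling means adding cells rather than growing one cluster. A minimal Python sketch of request routing, assuming each customer is pinned to one cell by a stable hash of its id (`zlib.crc32`, because Python's built-in `hash` of strings changes between runs). The `Cell` and `CellRouter` classes are hypothetical; the service names come from the slide.

```python
import zlib
from typing import Dict, List

SERVICES = ("Categories", "Customers", "ORCA", "Interactions", "Reference Data")


class Cell:
    """One cookie-cutter unit: every cell hosts the same set of services."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Each service gets its own store in the cell (a dict stands in for HBase/HDFS).
        self.stores: Dict[str, dict] = {s: {} for s in SERVICES}


class CellRouter:
    def __init__(self, n_cells: int) -> None:
        if n_cells < 1:
            raise ValueError("need at least one cell")
        self.cells: List[Cell] = [Cell(f"cell-{i + 1}") for i in range(n_cells)]

    def cell_for(self, customer_id: str) -> Cell:
        """All data for a customer lives in one cell, so requests never span cells."""
        index = zlib.crc32(customer_id.encode("utf-8")) % len(self.cells)
        return self.cells[index]
```

Plain modulo routing remaps most customers when a cell is added; a production setup would more likely keep an explicit customer-to-cell lookup table or use consistent hashing so existing customers stay where they are.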
Orchestration

[Diagram: a workflow engine hosts, manages and monitors workflows. A request to initiate a business process arrives at an endpoint and is routed to a workflow instance; the engine manages and schedules the process, and the workflow instance invokes services through their endpoints.]
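A minimal Python sketch of this orchestration flow, assuming in-process callables stand in for service endpoints: the hypothetical `WorkflowEngine` hosts workflow definitions (ordered lists of service names), starts an instance per request, invokes each service in turn and keeps each instance's status for monitoring. Real workflow engines add persistence, scheduling and retries, which are omitted here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Service = Callable[[dict], dict]


@dataclass
class WorkflowInstance:
    workflow: str
    state: dict
    history: List[str] = field(default_factory=list)
    status: str = "running"


class WorkflowEngine:
    def __init__(self, endpoints: Dict[str, Service]) -> None:
        self.endpoints = endpoints  # service name -> callable endpoint
        self.workflows: Dict[str, List[str]] = {}
        self.instances: List[WorkflowInstance] = []

    def host(self, name: str, steps: List[str]) -> None:
        """Register a workflow as an ordered list of service names."""
        self.workflows[name] = list(steps)

    def initiate(self, name: str, request: dict) -> WorkflowInstance:
        """Start a business process: route the request to a new instance and run it."""
        instance = WorkflowInstance(name, dict(request))
        self.instances.append(instance)
        for service in self.workflows[name]:
            try:
                # Each service returns fields to merge into the instance state.
                instance.state.update(self.endpoints[service](instance.state))
            except Exception as exc:
                instance.status = f"failed at {service}: {exc}"
                return instance
            instance.history.append(service)
        instance.status = "completed"
        return instance

    def monitor(self) -> List[Tuple[str, str]]:
        """Return (workflow, status) for every instance started so far."""
        return [(i.workflow, i.status) for i in self.instances]
```

Keeping the sequence in the engine rather than in the services means the services stay unaware of each other; the price is that the engine becomes a component that must itself be made reliable.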
                                                                               	
  
                                                                               	
  
Map Reduce processing pipeline

Map pipeline (Hadoop Map):
  Segment Row
  -> Retrieve segment data, create segment document (Interaction)
  -> Resolve Customer IDs (Customer), using a local cache of customers
  -> Categorize Segment (Categorization)
  -> Update Segment document (Interaction)
  -> emit InteractionID, Segment Row
  Writes: Categories Results (Categorization), Interaction (Interaction)

Reduce pipeline (Hadoop Reduce):
  Interaction & Segments
  -> Categorize Interaction (Categorization)
  -> Prepare data mart export (Datamart)
  -> Update Interaction document (Interaction)
  Writes: Categories Results (Categorization), Interaction (Interaction), Interaction (Interaction)
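The pipeline above can be simulated locally in Python to show where each step sits: the map side turns a segment row into a segment document (resolving the customer from a local cache and categorizing the segment) and emits it keyed by interaction id; the shuffle groups segments per interaction; the reduce side categorizes the interaction as a whole. This is a sketch, not Hadoop code, and the cache contents, field names and keyword rule in `categorize` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical local cache of customers: caller number -> customer id.
CUSTOMER_CACHE: Dict[str, str] = {"555-0100": "cust-42"}


def categorize(text: str) -> List[str]:
    """Hypothetical categorization rule: a single keyword match."""
    return ["billing"] if "invoice" in text.lower() else []


def map_segment(row: dict) -> Tuple[str, dict]:
    """Map step: segment row -> (interaction id, categorized segment document)."""
    segment = {
        "interaction_id": row["interaction_id"],
        "text": row["text"],
        "customer_id": CUSTOMER_CACHE.get(row["caller"]),  # resolve customer id
    }
    segment["categories"] = categorize(segment["text"])      # categorize segment
    return row["interaction_id"], segment


def reduce_interaction(interaction_id: str, segments: List[dict]) -> dict:
    """Reduce step: categorize the interaction as a whole across all its segments."""
    full_text = " ".join(s["text"] for s in segments)
    customers = sorted({s["customer_id"] for s in segments if s["customer_id"]})
    return {
        "interaction_id": interaction_id,
        "customers": customers,
        "categories": categorize(full_text),
        "segments": len(segments),
    }


def run_pipeline(rows: List[dict]) -> List[dict]:
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        key, value = map_segment(row)
        grouped[key].append(value)  # shuffle: group segment documents by interaction id
    return [reduce_interaction(k, v) for k, v in sorted(grouped.items())]
```

Categorizing at both levels lets per-segment rules run in the map, close to the data and the local customer cache, while rules that need the whole conversation run after the shuffle.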
Data Facets

[Diagram: overlapping facets of data technologies. Facets: caching, data grid, in-memory, key-value store, columnar, graph, distributed file systems, document, relational, NewSQL, analytics/MPP, indexing. Products shown: Memcached, Redis, GigaSpaces, GridGain, Oracle Coherence, WebSphere eXtreme Scale, HBase, Cassandra, Accumulo, Hypertable, Hama, Pregel, Neo4j, Hadoop, GlusterFS, RavenDB, MongoDB, CouchDB, IndexTank, Apache Solr, Attivio, ScaleBase, VoltDB, Amazon RDS, Aster Data, Microsoft PDW, ParAccel, SAP HANA, Oracle Exadata, HP Vertica, IBM Netezza, EMC Greenplum.]
  
Data is multi-tiered

Tier              Store           Retention and granularity
Real-time         In memory       1-7 days, detailed
Cube              MOLAP           6-? months, aggregated
Datamart(s)       RDBMS           6-12 months detailed; 1-3 years aggregated
Datawarehouse     Hadoop/HBase    20 years, detailed and aggregated
  
          	
  
          	
  
Data is multi-tiered

Tier              Store           Retention and granularity
Real-time                         1-7 days, detailed
Datamart(s)       Columnar        6-12 months, detailed
Data warehouse    Hadoop/HBase    20 years, detailed and aggregated
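One practical consequence of tiering is that a query should go to the fastest tier that still holds data of the requested age. A Python sketch using the retention figures from the table above, assuming the upper bound of each range (7 days; 12 months taken as 365 days; 20 years); `tier_for` is a hypothetical helper.

```python
from datetime import timedelta
from typing import List, Tuple

# (tier, maximum age of data it holds), fastest tier first.
TIERS: List[Tuple[str, timedelta]] = [
    ("Real-time", timedelta(days=7)),
    ("Datamart (columnar)", timedelta(days=365)),
    ("Data warehouse (Hadoop/HBase)", timedelta(days=365 * 20)),
]


def tier_for(age: timedelta) -> str:
    """Return the fastest tier whose retention covers data of the given age."""
    for name, retention in TIERS:
        if age <= retention:
            return name
    raise LookupError(f"no tier retains data that is {age.days} days old")
```

For instance, data from yesterday is served from the real-time tier, a quarter-old query goes to the datamart, and anything older than a year falls through to the Hadoop/HBase warehouse.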
  
          	
  
          	
  
SOA leaves us with a lot of isolated data
Aggregated Reporting

[Diagram: the Aggregated Reporting pattern. Elements: service endpoints; subscribed/pulled data; pull data, ingest and load into a landing area in the data backend; clean, transform and join into an ODS/DM; request and produce reports through an endpoint; raw data out (transpose); SQL endpoints exposing the data.]
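[Diagram: a reporting implementation built from services: a load service, a landing area, a transformation service, a DW/ODS holding raw data and views, and a report service (numbered flow steps 1-5).]

[Diagram: Hadoop-based reporting: raw data (HDFS), ETL (map/reduce + ETL) producing details in HBase, an aggregation map/reduce producing aggregates in a data mart, and a REST API and report tool with drill-through (numbered flow steps 1-10).]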
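A minimal Python sketch of the Aggregated Reporting flow, assuming services publish or are polled for plain dict records: data lands in a landing area keyed by source, is cleaned (records missing required fields are dropped), transformed (durations parsed to integers), joined across sources and aggregated into report rows. The source names, fields and helper functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Landing = Dict[str, List[dict]]


def ingest(landing: Landing, source: str, records: Iterable[dict]) -> None:
    """Load step: append records pulled from (or published by) a service."""
    landing.setdefault(source, []).extend(records)


def clean(records: Iterable[dict], required: Iterable[str]) -> List[dict]:
    """Clean step: drop records with missing or empty required fields."""
    keys = list(required)
    return [r for r in records if all(r.get(k) not in (None, "") for k in keys)]


def talk_time_report(landing: Landing) -> List[Tuple[str, int]]:
    """Transform, join and aggregate: total interaction seconds per customer name."""
    names = {c["id"]: c["name"] for c in clean(landing.get("customers", []), ["id", "name"])}
    totals: Dict[str, int] = defaultdict(int)
    for rec in clean(landing.get("interactions", []), ["customer_id", "duration_sec"]):
        totals[rec["customer_id"]] += int(rec["duration_sec"])  # transform text -> int
    # Join with the customers source; unmatched ids are reported as "unknown".
    return sorted((names.get(cid, "unknown"), secs) for cid, secs in totals.items())
```

The services keep owning their operational data; only copies flow into the landing area, so the reporting side can join across services without the services exposing their internal schemas to each other.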
Takeaways




SOA & Big data are better together
Arnon Rotem-Gal-Oz

arnonr@nice.com
http://www.nice.com

http://arnon.me/soa-patterns
arnon@rgoarchitects.com
@arnonrgo
http://arnon.me

Data governance with Unity Catalog PresentationKnoldus Inc.
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Último (20)

Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

SOA & Big Data

  • 1. SOA & Big data, Arnon Rotem-Gal-Oz
  • 2. Sept 2012 – iOS 6 launched with a new Maps application
  • 4. It isn't just about getting all the data there; algorithms are cool, but we need humans in the loop; hire the right people; Test! Test! Test! (source: http://theamazingios6maps.tumblr.com/)
  • 5. It isn't just one pile of data (source: http://theamazingios6maps.tumblr.com/)
  • 6. Integrating Big data & SOA (photo: Yoel Ben Avraham, http://www.flickr.com/photos/epublicist/3546059144/)
  • 7.
  • 8. Data Refinery (photo: Ofer Berger, http://www.haifacity.com/allsites/allpic/a/A1738/A1738Pic3326.jpg)
  • 9.
  • 10. Integration styles: ETL integration, DB integration, file-based integration, online integration (diagram: department server and DB)
  • 11. Object soup (diagram: dozens of unrelated, arbitrarily named objects)
  • 12. Services (diagram: the same objects grouped into Customer, Orders, Promotions and Invoices services)
  • 13. SOA building blocks (diagram): service, service consumer, endpoint, contract, policy and messages, linked by the relations "adheres to", "governed by", "binds to", "exposes", "serves", "understands", "implements", "describes" and "sends/receives"
  • 14. Customer, Interactions, Agents, Categories
  • 15. Integrating Big data & SOA (photo: Yoel Ben Avraham, http://www.flickr.com/photos/epublicist/3546059144/)
  • 16. Saga (diagram): an Initiator (service consumer), a Coordinator* and a Participator (service), connected by the interactions create context, register, perform activity, activities and replies, prepare/commit/undo, and compensate. Key: SOA component, pattern component, relation, concern/attribute.
  • 17. Hadoop cluster (diagram): HCatalog, HBase stores (Data Management, Interactions, Customer, Interaction Recordings, NIM), Categories, an ETL step, and Raw and Resolved Interactions data in HDFS
  • 18. So, what's the problem?
  • 19. & Big data can't move
  • 20. Performance of joins in a distributed system sucks! (diagram: Node 1, Node 2 and Node 3 each hold a range of interactions (0-99, 200-299, 100-199) and a range of customers (A-H, I-M, N-Z)) Example document: {"Interaction": {"id": "5", "participants": {"customer": [{"surname": "McDonalds", "name": "Old"}]}}}
  • 21. Cookie cutter scalability
  • 22. Cell architecture (diagram: Node 1, Node 2, Node 3 … Node N)
  • 23. Cell Architecture (diagram): HBase stores (Categories, Customers, Interactions, Reference Data, …) and HDFS, connected through the ORCA BUS
  • 24. Orchestration (diagram): endpoints, services, a workflow host/engine and workflow instances, with the steps initiate business process, manage process, schedule, route request, invoke services, and manage/monitor workflows
  • 25. Map/Reduce processing pipeline (diagram). Map pipeline: resolve customer IDs (Customer) from a local cache of customers, retrieve the segment document (Categorization), categorize (Interaction), and update the segment (Interaction), creating segment documents from segment row data; it writes Categories (interaction) and Interaction Results (Categorization). Reduce pipeline: map interaction & segments (Categorization), update the interaction document, categorize the interaction, prepare the data mart (Datamart), and export (Interaction); it writes Categories (interaction), Interaction (interaction) and Interaction Results (Categorization). Both run on Hadoop Map/Reduce.
  • 26. Map/Reduce processing pipeline (diagram, map stage only): resolve customer IDs (Customer) from a local cache of customers, retrieve the segment document (Categorization), categorize (Interaction), and update the segment (Interaction); it writes Categories (interaction) and Interaction Results (Categorization).
  • 28. Data technology landscape (diagram). Categories: caching, data grid, in-memory, key-value store, columnar, graph, distributed file systems, relational, document, NewSQL, analytics/MPP, indexing. Products shown: Memcached, GigaSpaces, Redis, GridGain, Oracle Coherence, WebSphere eXtreme Scale, Hama, HBase, Cassandra, Pregel, Accumulo, Hypertable, Neo4j, Hadoop, GlusterFS, RavenDB, ScaleBase, MongoDB, CouchDB, IndexTank, Amazon RDS, VoltDB, Apache Solr, Attivio, Aster Data, Microsoft PDW, ParAccel, SAP HANA, Oracle Exadata, HP Vertica, IBM Netezza, EMC Greenplum.
  • 29. Data is multi-tiered (diagram). Tiers: datamart(s) (RDBMS), cube (MOLAP), real-time (in memory) and data warehouse (Hadoop/HBase). Retention windows shown: 1-7 days detailed; 6-12 months detailed; 6-? months aggregated; 1-3 years aggregated; 20 years detailed.
  • 30. Data is multi-tiered (revised diagram). Tiers: datamart(s), real-time and data warehouse; storage shown: columnar and Hadoop/HBase. Retention windows shown: 1-7 days detailed; 6-12 months detailed; 20 years detailed and aggregated.
  • 31. SOA leaves us with a lot of isolated data
  • 32. Aggregated Reporting (diagram): subscribed or pulled data from services is ingested into a landing area, cleaned, joined, transposed and transformed, then loaded into an ODS/DM that produces reports on request; SQL endpoints and service endpoints expose the raw data, the data backend and the reports.
  • 33. Numbered flow (diagram): landing area, raw data service, load, transformation service, DW/ODS, views, and report service
  • 34. Numbered flow (diagram): raw data (HDFS), ETL, aggregation (map/reduce), details in HBase, aggregates (map/reduce + ETL), a data mart, a REST API, and a report tool with drill-through
  • 35. Takeaways: SOA & Big data are better together
  • 36. Arnon Rotem-Gal-Oz, arnonr@nice.com, http://www.nice.com, http://arnon.me/soa-patterns, arnon@rgoarchitects.com, @arnonrgo, http://arnon.me
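Slide 16 shows the Saga pattern. An initiator creates a context, participants register with a coordinator and perform their activities, and completed work is compensated when a later step fails. The Python sketch below is a minimal in-process illustration of that compensation path only. It assumes each participant exposes a "perform" and a "compensate" action. The names `Step` and `SagaCoordinator` are hypothetical and do not come from the deck. The sketch leaves out the prepare/commit/undo protocol and the messaging between services.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """Hypothetical participant: an activity plus the action that undoes it."""
    name: str
    perform: Callable[[dict], None]
    compensate: Callable[[dict], None]


@dataclass
class SagaCoordinator:
    """Hypothetical coordinator: runs registered steps in order and
    compensates the completed ones (newest first) if any step fails."""
    steps: List[Step] = field(default_factory=list)

    def register(self, step: Step) -> None:
        self.steps.append(step)

    def run(self, context: dict) -> bool:
        completed: List[Step] = []
        for step in self.steps:
            try:
                step.perform(context)
                completed.append(step)
            except Exception:
                # Undo finished work in reverse order; the failed step itself
                # did not complete, so it is not compensated.
                for done in reversed(completed):
                    done.compensate(context)
                return False
        return True
```

A real saga would also need to handle a compensation that itself fails, for example by retrying or escalating. This sketch does not do that.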
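Slide 20 argues that joins perform poorly once related data is partitioned independently. The Python sketch below counts how many interaction-to-customer joins would have to cross nodes. It then builds the denormalized interaction document shown on the slide, which avoids the join altogether.

Several parts of the sketch are assumptions:

- The node layout is read from the flattened diagram: node 1 holds interactions 0-99 and customers A-H, node 2 holds 200-299 and I-M, node 3 holds 100-199 and N-Z. The pairing of ranges to nodes is a best guess.
- The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed layout (read from slide 20): each node owns a range of interaction
# ids and an unrelated range of customer surnames.
INTERACTION_RANGES = {1: (0, 99), 2: (200, 299), 3: (100, 199)}
SURNAME_RANGES = {1: ("A", "H"), 2: ("I", "M"), 3: ("N", "Z")}


def node_for_interaction(interaction_id: int) -> int:
    for node, (lo, hi) in INTERACTION_RANGES.items():
        if lo <= interaction_id <= hi:
            return node
    raise KeyError(interaction_id)


def node_for_customer(surname: str) -> int:
    first = surname[0].upper()
    for node, (lo, hi) in SURNAME_RANGES.items():
        if lo <= first <= hi:
            return node
    raise KeyError(surname)


def remote_lookups(interactions: List[Tuple[int, str]]) -> int:
    """Count interaction->customer joins whose two sides live on different nodes."""
    return sum(
        1
        for interaction_id, surname in interactions
        if node_for_interaction(interaction_id) != node_for_customer(surname)
    )


def denormalize(interaction_id: int, customer: Dict[str, str]) -> dict:
    """Embed the customer inside the interaction document, as in the slide's
    JSON, so reading an interaction needs no cross-node join."""
    return {
        "Interaction": {
            "id": str(interaction_id),
            "participants": {"customer": [customer]},
        }
    }
```

With this layout, the slide's own example crosses the network. Interaction 5 lives on node 1, but the customer "McDonalds" lives on node 2.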
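Slides 21-23 answer the join problem with "cookie cutter" cells. Each cell is a self-contained copy of the stack that owns a slice of the data. The Python sketch below illustrates the idea under simple assumptions:

- Cells are keyed by customer id.
- A stable hash picks the cell.
- Each customer's interactions are stored in the same cell as the customer, so joining them stays local.

`Cell`, `CellRouter` and `cell_for` are hypothetical names. They are not the ORCA BUS or any component from the deck.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def cell_for(customer_id: str, cell_count: int) -> int:
    """Stable hash -> cell index (md5 is used only to spread keys evenly)."""
    digest = hashlib.md5(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % cell_count


class Cell:
    """Hypothetical cell: keeps its customers together with their interactions."""

    def __init__(self, cell_id: int) -> None:
        self.cell_id = cell_id
        self.customers: Dict[str, dict] = {}
        self.interactions: Dict[str, List[dict]] = defaultdict(list)

    def interactions_with_customer(self, customer_id: str) -> List[dict]:
        # A purely local join: both sides live in this cell.
        customer = self.customers[customer_id]
        return [{**i, "customer": customer} for i in self.interactions[customer_id]]


class CellRouter:
    """Sends every write and read for a customer to that customer's cell."""

    def __init__(self, cell_count: int) -> None:
        self.cells = [Cell(i) for i in range(cell_count)]

    def cell(self, customer_id: str) -> Cell:
        return self.cells[cell_for(customer_id, len(self.cells))]

    def add_customer(self, customer_id: str, data: dict) -> None:
        self.cell(customer_id).customers[customer_id] = data

    def add_interaction(self, customer_id: str, interaction: dict) -> None:
        self.cell(customer_id).interactions[customer_id].append(interaction)
```

With plain modulo hashing, changing the number of cells remaps most keys. A deployment that adds cells over time would more likely keep an explicit customer-to-cell lookup table or use consistent hashing.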