SlideShare una empresa de Scribd logo
1 de 44
Descargar para leer sin conexión
Financial Big Data
Loosely coupled, highly structured

         Andrew Elmore
Agenda

     • Why traditional RDBMSs aren’t an ideal fit

     • Why storage format is important

     • Using NoSQL stores to store & query financial data
             • Examples using GemFire, GigaSpaces & Integration Objects

     • Scaling complex processing using grids

Copyright © 2013 C24 Technologies Ltd.
Message Structure
The Humble MT103

                                            {1:F01MIDLGB22AXXX0548100000}{2:I103BKTRUS33XBR

     • A commonly used SWIFT                DN2}{3:{108:valid}}{4:
                                            :20:8861198-0706
            message                         :23B:CRED
                                            :32A:000612USD73025,
                                            :33B:USD73025,
                                            :50K:GIAN ANGELO IMPORTS
                                            NAPLES
                                            :52A:EFGHITMM500

     • Used for credit transfers            :53A:BCITUS33
                                            :54A:IRVTUS3N
                                            :57A:BNPAFRPPGRE
                                            :59:/20041010050500001M02606
                                            KILLY S.A.


     • Mid-complexity compared to
                                            GRENOBLE
                                            :70:Invoice: 100001
                                            :71A:SHA
            other SWIFT messages            :72:/ABCD/narrative
                                            //more narrative
                                            /EFGH/narrative
                                            -}




Copyright © 2013 C24 Technologies Ltd.
MT103 Structure




Copyright © 2013 C24 Technologies Ltd.
MT103 Structure




     • The MT103 Block 4 has 22 fields
             • Some are optional
             • Some repeat a variable number of times
             • Some are grouped into (repeating) sequences
Copyright © 2013 C24 Technologies Ltd.
MT103 Structure




     • Some fields have multiple options
Copyright © 2013 C24 Technologies Ltd.
MT103 Structure




     • Each field can be composed of multiple components
Copyright © 2013 C24 Technologies Ltd.
The Full MT103 Definition




Copyright © 2013 C24 Technologies Ltd.
To Make Matters Worse...

     • There are hundreds of SWIFT messages
             • Each organisation typically uses a subset

     • The standard changes every year
             • Firms may need to support multiple versions concurrently

     • This is just one of many standards
             • FIX, FpML, SEPA, ...



Copyright © 2013 C24 Technologies Ltd.
RDBMS & Financial Messages



     • Storing the full message in a relational structure is very time
            consuming and fragile
             • In practice it’s not viable

     • What options do we have?



Copyright © 2013 C24 Technologies Ltd.
Lossy Storage
     • If we only care about a subset of
            fields, we can construct a simpler
            storage structure
                                                         {1:F01MIDLGB22AXXX0548100000}{2:I…
                                                         :20:8861198-0706
                                                         :23B:CRED


     • Information thrown away
                                                         :32A:000612USD73025,                 100000   USD   73025   BCIT
                                                         :33B:USD73025,
                                                         :50K:GIAN ANGELO IMPORTS
            however is lost forever                      NAPLES
                                                         :52A:EFGHITMM500

             • Advances in processing capacity
                                                         :53A:BCITUS33
                                                         :54A:IRVTUS3N
                   however mean that new applications    :57A:BNPAFRPPGRE
                                                         :59:/20041010050500001M02606
                   are being found for historical data   KILLY S.A.
                                                         GRENOBLE
                                                         :70:Invoice: 100001
                                                         :71A:SHA
                                                         :72:/ABCD/narrative

     • Hybrid solution is to store
                                                         //more narrative
                                                         /EFGH/narrative
                                                         -}
            original source message
            alongside in a CLOB
             • Not readily accessible but at least
                   it’s available

Copyright © 2013 C24 Technologies Ltd.
XML Overload
     • As message formats are hierarchical, we can
            construct an XML representation of the messages
                                                     100000   USD   73025   BCIT   <?xml version="1.0"?>


     • Store this in a CLOB
                                                                                   <MT103 xmlns="http://www.c24.biz/IO/SWIFT2012">
                                                                                     <Block1>
                                                                                        <ApplicationID>F</ApplicationID>
                                                                                        <ServiceID>01</ServiceID>
                                                                                        <LTAddress>
                                                                                           <BankCode>MIDL</BankCode>
                                                                                           <CountryCode>GB</CountryCode>



     • Either create additional columns for indexed
                                                                                           <LocationCode>
                                                                                             <LocationCode1>2</LocationCode1>
                                                                                             <LocationCode2>2</LocationCode2>

            values or use RDBMS with XML index support
                                                                                           </LocationCode>
                                                                                           <LogicalTerminalCode>A</
                                                                                   LogicalTerminalCode>
                                                                                           <BranchCode>XXX</BranchCode>
                                                                                        </LTAddress>
                                                                                        <SessionNumber>0548</SessionNumber>
                                                                                        <SequenceNumber>100005</
                                                                                   SequenceNumber>
                                                                                     </Block1>
                                                                                     <Iblock2>



     • Extraction of data either requires multiple XPath
                                                                                        ….

                                                                                   </MT103>
            queries or parsing the XML to resurrect the
            original message/object
             • Slower than querying simple cols and (probably) an
                   artificial format

Copyright © 2013 C24 Technologies Ltd.
Message Format
     • Our in-memory structure needs to model all the data we need for
            parsing/processing in an efficient structure
             • e.g. a tree of (Java) objects

     • Ideally we’ll persist the same structure directly into the store
             • Persistence and loading are therefore fast
                     •
                 Do we even need to persist to disk?

             • We need ready access to the original message format too

     • The store must be able to index arbitrary attributes in the persisted
            object tree for querying


     • The store should be capable of extracting individual values from the
            object tree

Copyright © 2013 C24 Technologies Ltd.
Future Proofing
     • New clients == new message formats
             • Same purpose, just different presentation
     • Systems architected to work regardless of format &
            transport

                                         Flat File, XML,
                                          Native, ZIP,
                                            (S)FTP, ..



            SWIFT
                    FIX                     JMS, MQ,
                                            AMQP, ...       Parse   Validate   Process
                  FpML
                                          HTTP, REST,
                                         WebServices, ...




                                             SMTP




Copyright © 2013 C24 Technologies Ltd.
C24 Integration Objects


     • Pre-built models for common standards
             • Updated as standards evolve

     • Easy to import/create new models

     • Java binding toolkit
             • Parse messages into queryable objects


Copyright © 2013 C24 Technologies Ltd.
Partners & Platforms




Copyright © 2013 C24 Technologies Ltd.
Typical Integration Flow

     • Linear process, potentially replicated

                                      File
                                 Channel Adapter

                         ErrorChannel




                                     JMS                C24                                                Store
                                                   Unmarshalling           Validating       Process
                                Channel Adapter                                                       Channel Adapter
                                                    Transformer          Message Filter
                                                                                                       Valid Region
                                                                                  Invalid
                        ErrorChannel                               ErrorChannel




                                    HTTP
                                Channel Adapter

                        ErrorChannel




Copyright © 2013 C24 Technologies Ltd.
Acquisition

             <!-- Read the SWIFT message files   -->
             <int-file:inbound-channel-adapter   directory="data/swift"
             	 	 	 	 	 	 	 	 	                   filename-pattern="*.dat"
             	 	 	 	 	 	 	 	 	                   prevent-duplicates="true"
             	 	 	 	 	 	 	 	 	                   channel="parse-swift-message-channel"
             	 	 	 	 	 	 	 	 	                   id="inbound-transport-swift-file">

             	 	 <poller fixed-rate="100" task-executor="swiftThreadPool"/>
             </int-file:inbound-channel-adapter>
             	
             <task:executor id="swiftThreadPool" pool-size="8"/>
             <channel id="parse-swift-message-channel"/>




Copyright © 2013 C24 Technologies Ltd.
Flow-based Processing
     <!-- SWIFT Parsing -->
     <c24:model id="swiftMessageModel" base-element="biz.c24.io.swift2012.MT103Element"/>
     <int-c24:unmarshalling-transformer input-channel="parse-swift-message-channel"
                                        output-channel="validate-message-channel"
                                        model-ref="swiftMessageModel"
                                        source-factory-ref="textualSourceFactory"/>



     <!-- Validate and filter out invalid messages -->
     <int-c24:validating-selector id="c24Validator" throw-exception-on-rejection="true"/>
     <filter input-channel="validate-message-channel"
             output-channel="process-message-channel" ref="c24Validator"
             discard-channel="errorChannel"/>



     <!-- Process - stub out for now-->
     <bridge input-channel="process-message-channel"
             output-channel="persist-valid-message-channel"/>



     <!-- Persist -->
     <channel id="persist-valid-message-channel"/>



Copyright © 2013 C24 Technologies Ltd.
Persistence


             <int-gfe:outbound-channel-adapter id="validStore"
                                               channel="persist-valid-message-channel"
                                               region="valid">

                     <int-gfe:cache-entries>

                            <beans:entry key="headers[id]" value="payload"/>

                     </int-gfe:cache-entries>

             </int-gfe:outbound-channel-adapter>




Copyright © 2013 C24 Technologies Ltd.
Flow-based Architecture




     • All of the scalability is at the front-end
             • Requires us to spread input load evenly across processing units

                                         Acquire   Parse   Validate   Store




     • Alternatively we can use the store and make our flow event-
            based

Copyright © 2013 C24 Technologies Ltd.
Event-based Architecture

                                                    Parser           Validator
                                         Acquirer                                Processor




                                                             Store




     • The store ‘triggers’ processing
             • Listeners
             • Subscriptions
             • Querying
             • ...
Copyright © 2013 C24 Technologies Ltd.
Scaling Up


                                                        Parser        Validator
                                                         Parser        Validator
                                         Acquirer         Parser        Validator      Processor
                                          Acquirer         Parser         Validator     Processor
                                           Acquirer         Parser         Validator     Processor
                                            Acquirer                                      Processor
                                             Acquirer                                      Processor




                                                              Store




     • Multiple components can register an interest in the events
             • We can scale out and add redundancy by replicating the listeners
Copyright © 2013 C24 Technologies Ltd.
Scaling Out
     • We can control what is replicated and to where
     • The node that parses the message doesn’t have to be the same as
            the one that processes it
                                                          Parser          Validator
                                                           Parser          Validator
                                         Acquirer           Parser          Validator       Processor
                                          Acquirer           Parser           Validator      Processor
                                           Acquirer           Parser           Validator      Processor
                                            Acquirer                                           Processor
                                             Acquirer                                           Processor




                                                                                                                                           Parser         Validator
                                                                                                                                            Parser         Validator
                                                                  Store                                                     Acquirer         Parser         Validator      Processor
                                                                                                                             Acquirer         Parser          Validator     Processor
                                                                                                                              Acquirer         Parser          Validator     Processor
                                                                                                                               Acquirer                                       Processor
                                                                                                                                Acquirer                                       Processor




                                                                                                Replicate                                         Store




                                                                                   Store




                                                                          Parser           Validator
                                                                           Parser           Validator
                                                        Acquirer            Parser           Validator      Processor
                                                         Acquirer            Parser            Validator     Processor
                                                          Acquirer            Parser            Validator     Processor
                                                           Acquirer                                            Processor
                                                            Acquirer                                            Processor




Copyright © 2013 C24 Technologies Ltd.
Ingesting Data
Demo Time
      • GemFire Groovy DSL is still in development

      • Allows simple definition of regions and event-based handlers
  	        	       	       region 'new', shortcut: REPLICATE, {
  	        	       	       	 listener (
  	        	       	       	 	 afterCreate: { EntryEvent e ->
  	        	       	       	 	 	 try {
  	        	       	       	 	 	 	 // Parse the raw message
  	        	       	       	 	 	 	 String message = e.getNewValue();
  	        	       	       	 	 	 	 source.setReader(new StringReader(message));
  	        	       	       	 	 	 	 parsedRegion.put(e.getKey(), source.readObject(element));
  	        	       	       	 	 	 } catch(Exception ex) {
  	        	       	       	 	 	 	 exceptionRegion.put(e.getNewValue(), ex);
  	        	       	       	 	 	 } finally {
  	        	       	       	 	 	 	 newRegion.remove(e.getKey());
  	        	       	       	 	 	 }
  	        	       	       	 	 }
  	        	       	       	 )
  	        	       	       }


Copyright © 2013 C24 Technologies Ltd.
Demo Takeaways
     • We had the full message available to us
             • With a meaningful schema

     • We can query on and extract the full message or individual values

     • Approach is message format neutral
             • We just need to know the hierarchical location of the information we’re
                   interested in


     • Our persistence code is extremely simple

     • We can index on any [derived] attribute
Copyright © 2013 C24 Technologies Ltd.
Scaling Out

     • Nodes can be replicated and sharded/partitioned
             • Queries are automatically distributed

     • Data can be ingested on any node

     • The same event-based architecture works well for
            distributed processing & analysis
             • And as we’ve seen our data grids allow us to embed our logic in them


Copyright © 2013 C24 Technologies Ltd.
Processing
Real-Time Analytics
     • Complex Event Processing

     • Inter-dependent set of variables & rules

     • Typically very high volume & latency critical

     • Run on multiple nodes to cope with
             • Load
             • Volume
             • Resilience
Copyright © 2013 C24 Technologies Ltd.
Rule Engines

     • Track dependencies between rules
             • Recalculate when necessary
             • Trigger behaviour when all rules are met

     • Changes typically triggered by time & message state change
             • Fits well with our event-based architecture

     • See RETE algorithms & derivates for creating highly
            performant rule trees


Copyright © 2013 C24 Technologies Ltd.
A Simple Architecture
     • We’ll use Gigaspaces XAP for some variety
             • Distributed in-memory data grid + elastic application platform

     • Sample application consists of 3 key areas:
             • ‘Space Object’ ie what are we going to be storing
             • Feeder - how do we get data into the grid
             • Processors - what happens to data once it’s in the grid
                    mvn os:create -DgroupId=my.group -DartifactId=Test1 -Dtemplate=basic
                    mvn package
                    mvn os:deploy -Dlocators=localhost:4174


     • Message Processors
             • ParseProcessor - parse raw messages into domain objects
             • RuleProcessor - trigger rule updates on message state change
Copyright © 2013 C24 Technologies Ltd.
Configuring XAP




     • Multiple ways to mark up our code
             • We’ll use annotations for simplicity

     • First we need the type we’re going to store in the space



Copyright © 2013 C24 Technologies Ltd.
The Space Class
       @SpaceClass
       public class Message<T extends ComplexDataObject> implements Serializable {

       	       public static enum State {
       	       	 NEW,
       	       	 VALID,
       	       	 PROCESSED,
       	       	 INVALID
       	       };
       	
                 private String id = null;

                 private State state = null;

                 private String rawMessage = null;

                 private T message = null;

                 private Throwable failure = null;

                 @SpaceId(autoGenerate=true)
                 public String getId() {
                     return id;
                 }


Copyright © 2013 C24 Technologies Ltd.
The Feeder
     • Simple implementation to read raw messages from the
            filesystem
             • In practice we’ll be acquiring from multiple sources
             • Use an outbound channel adapater as a feeder?
                      public class Feeder implements InitializingBean, DisposableBean {
                          @GigaSpaceContext
                          private GigaSpace gigaSpace;

                                ...

                               public void loadFiles() {
                               	 ...;
                               	 for(File file : dir.listFiles()) {
                      	       	 	 Scanner scanner = new Scanner(file);
                      	               String fileContent = scanner.useDelimiter("A").next();
                      	               scanner.close();
                      	               gigaSpace.write(new Message(fileContent));
                               	 }
                               }


Copyright © 2013 C24 Technologies Ltd.
ParseProcessor

     • Declare the Processor and tell it when to trigger
          @EventDriven @Notify
          @NotifyType(write = true, update = true)
          public class ParseProcessor<T extends ComplexDataObject> {

                    /**
                      * We process any messages in a new state
                      * @return
                      */
                    @EventTemplate
                    Message templateMessage() {
                    	 Message msg = new Message();
                    	 msg.setState(Message.State.NEW);
                    	 return msg;
                    }




Copyright © 2013 C24 Technologies Ltd.
ParseProcessor
     • ...and how to process the messages it receives
                         @SpaceDataEvent
                         public Message<T> processData(Message<T> data) {
                         	
                         	 try {

                         	      	        log.info("Parsing " + data.getId());

                         	      	        source.setReader(new StringReader(data.getRawMessage()));
                         	      	        data.setMessage((T)source.readObject(element));
                         	      	        data.setState(State.VALID);

                         	      	        log.info("Parsed " + data.getId());

                         	      } catch(Throwable ex) {
                         	      	 log.info("Message invalid " + data.getId());
                         	      	 data.setFailure(ex);
                         	      	 data.setState(State.INVALID);
                         	      }
                         	
                                    return data;
                         }


Copyright © 2013 C24 Technologies Ltd.
RuleProcessor
                 /**
                   * We process any messages in a valid state
                   * @return
                   */
                 @EventTemplate
                 Message templateMessage() {
                 	 Message msg = new Message();
                 	 msg.setState(Message.State.VALID);
                 	 return msg;
                 }



                @SpaceDataEvent
                public Message<T> processData(Message<T> data) {
                	
       	       	 log.info("Processing " + data.getId());
       	       	 engine.process(data.getMessage());
       	       	 data.setState(State.PROCESSED);

                         return data;
                 }




Copyright © 2013 C24 Technologies Ltd.
Demo
General Concerns
Serialisation

     • At some point your objects are likely to be serialized
             • ...and it won’t be during your local testing

     • Create unit tests
                  	        	       MyType obj = ...;
                  	        	       ByteArrayOutputStream bos = new ByteArrayOutputStream();
                  	        	       ObjectOutputStream oos = new ObjectOutputStream(bos);
                  	        	       oos.writeObject(obj);
                  	        	       oos.close();




     • Check what you’re actually serialising
                                   assertThat(bos.toByteArray().length, is(lessThan(some size)));




Copyright © 2013 C24 Technologies Ltd.
Interface Semantics
     • Exclusive lock vs shared lock vs take

     • Who gets notified?

     • Callback thread behaviour
             • What happens if we hold on to the thread for a long time?
             • What if we generate an exception?

     • None of these technologies make coordination problems
            disappear
             • They can however make it easier to apply good practice
Copyright © 2013 C24 Technologies Ltd.
Thank You!




                                              ?
                                         andrew.elmore@c24.biz
                                              @andrew_elmore




Copyright © 2013 C24 Technologies Ltd.

Más contenido relacionado

Similar a Financial big data loosely coupled, highly structured - andrew elmore

Postgis_Topology_to_secure_data_integrity,_simple_API_and_clean_up_messy_simp...
Postgis_Topology_to_secure_data_integrity,_simple_API_and_clean_up_messy_simp...Postgis_Topology_to_secure_data_integrity,_simple_API_and_clean_up_messy_simp...
Postgis_Topology_to_secure_data_integrity,_simple_API_and_clean_up_messy_simp...Lars Aksel Opsahl
 
CourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkCourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkDataWorks Summit
 
Desktop Private Cloud
Desktop Private CloudDesktop Private Cloud
Desktop Private CloudPaul Morse
 
Istio as an enabler for migrating to microservices (edition 2022)
Istio as an enabler for migrating to microservices (edition 2022)Istio as an enabler for migrating to microservices (edition 2022)
Istio as an enabler for migrating to microservices (edition 2022)Ahmed Misbah
 
Introducing LCS to Digital Design Verification
Introducing LCS to Digital Design VerificationIntroducing LCS to Digital Design Verification
Introducing LCS to Digital Design VerificationDaniele Loiacono
 
Bitcoin Decision Point - April 2017
Bitcoin Decision Point - April 2017Bitcoin Decision Point - April 2017
Bitcoin Decision Point - April 2017Jeff Garzik
 
ISO 20022 for funds presentation
ISO 20022 for funds presentationISO 20022 for funds presentation
ISO 20022 for funds presentationKoen Vierendeels
 
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...Ambassador Labs
 
SKS_Presentation(2008.10)
SKS_Presentation(2008.10)SKS_Presentation(2008.10)
SKS_Presentation(2008.10)aprilnewman
 
New idc architecture
New idc architectureNew idc architecture
New idc architectureMason Mei
 
bcPro Introductory Offer
bcPro Introductory OfferbcPro Introductory Offer
bcPro Introductory OfferNghiaCao8
 
2019 touraine tech - Marteau Hubert
2019 touraine tech - Marteau Hubert2019 touraine tech - Marteau Hubert
2019 touraine tech - Marteau HubertHubertMARTEAU
 
Cryptocurrencies overview
Cryptocurrencies overviewCryptocurrencies overview
Cryptocurrencies overviewTrector Rancor
 
State of GeoServer 2012
State of GeoServer 2012State of GeoServer 2012
State of GeoServer 2012Jody Garnett
 
onos-day-dkim-20150914-lkin
onos-day-dkim-20150914-lkinonos-day-dkim-20150914-lkin
onos-day-dkim-20150914-lkinDongkyun Kim
 
The Future of Service Mesh
The Future of Service MeshThe Future of Service Mesh
The Future of Service MeshAll Things Open
 
Using Graphs to Take Down Fraudsters in Real Time
Using Graphs to Take Down Fraudsters in Real TimeUsing Graphs to Take Down Fraudsters in Real Time
Using Graphs to Take Down Fraudsters in Real TimeNeo4j
 
Enabling product personalisation using Apache Kafka, Apache Pinot and Trino w...
Enabling product personalisation using Apache Kafka, Apache Pinot and Trino w...Enabling product personalisation using Apache Kafka, Apache Pinot and Trino w...
Enabling product personalisation using Apache Kafka, Apache Pinot and Trino w...HostedbyConfluent
 

Similar a Financial big data loosely coupled, highly structured - andrew elmore (20)

Postgis_Topology_to_secure_data_integrity,_simple_API_and_clean_up_messy_simp...
Postgis_Topology_to_secure_data_integrity,_simple_API_and_clean_up_messy_simp...Postgis_Topology_to_secure_data_integrity,_simple_API_and_clean_up_messy_simp...
Postgis_Topology_to_secure_data_integrity,_simple_API_and_clean_up_messy_simp...
 
CourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkCourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on Spark
 
Desktop Private Cloud
Desktop Private CloudDesktop Private Cloud
Desktop Private Cloud
 
Istio as an enabler for migrating to microservices (edition 2022)
Istio as an enabler for migrating to microservices (edition 2022)Istio as an enabler for migrating to microservices (edition 2022)
Istio as an enabler for migrating to microservices (edition 2022)
 
Introducing LCS to Digital Design Verification
Introducing LCS to Digital Design VerificationIntroducing LCS to Digital Design Verification
Introducing LCS to Digital Design Verification
 
CourboSpark
CourboSparkCourboSpark
CourboSpark
 
Bitcoin Decision Point - April 2017
Bitcoin Decision Point - April 2017Bitcoin Decision Point - April 2017
Bitcoin Decision Point - April 2017
 
ISO 20022 for funds presentation
ISO 20022 for funds presentationISO 20022 for funds presentation
ISO 20022 for funds presentation
 
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
 
SKS_Presentation(2008.10)
SKS_Presentation(2008.10)SKS_Presentation(2008.10)
SKS_Presentation(2008.10)
 
New idc architecture
New idc architectureNew idc architecture
New idc architecture
 
Trending Topics in Payments
Trending Topics in PaymentsTrending Topics in Payments
Trending Topics in Payments
 
bcPro Introductory Offer
bcPro Introductory OfferbcPro Introductory Offer
bcPro Introductory Offer
 
2019 touraine tech - Marteau Hubert
2019 touraine tech - Marteau Hubert2019 touraine tech - Marteau Hubert
2019 touraine tech - Marteau Hubert
 
Cryptocurrencies overview
Cryptocurrencies overviewCryptocurrencies overview
Cryptocurrencies overview
 
State of GeoServer 2012
State of GeoServer 2012State of GeoServer 2012
State of GeoServer 2012
 
onos-day-dkim-20150914-lkin
onos-day-dkim-20150914-lkinonos-day-dkim-20150914-lkin
onos-day-dkim-20150914-lkin
 
The Future of Service Mesh
The Future of Service MeshThe Future of Service Mesh
The Future of Service Mesh
 
Using Graphs to Take Down Fraudsters in Real Time
Using Graphs to Take Down Fraudsters in Real TimeUsing Graphs to Take Down Fraudsters in Real Time
Using Graphs to Take Down Fraudsters in Real Time
 
Enabling product personalisation using Apache Kafka, Apache Pinot and Trino w...
Enabling product personalisation using Apache Kafka, Apache Pinot and Trino w...Enabling product personalisation using Apache Kafka, Apache Pinot and Trino w...
Enabling product personalisation using Apache Kafka, Apache Pinot and Trino w...
 

Financial big data loosely coupled, highly structured - andrew elmore

  • 1. Financial Big Data Loosely coupled, highly structured Andrew Elmore
  • 2. Agenda • Why traditional RDBMSs aren’t an ideal fit • Why storage format is important • Using NoSQL stores to store & query financial data • Examples using GemFire, GigaSpaces & Integration Objects • Scaling complex processing using grids Copyright © 2013 C24 Technologies Ltd.
  • 4. The Humble MT103 {1:F01MIDLGB22AXXX0548100000}{2:I103BKTRUS33XBR • A commonly used SWIFT DN2}{3:{108:valid}}{4: :20:8861198-0706 message :23B:CRED :32A:000612USD73025, :33B:USD73025, :50K:GIAN ANGELO IMPORTS NAPLES :52A:EFGHITMM500 • Used for credit transfers :53A:BCITUS33 :54A:IRVTUS3N :57A:BNPAFRPPGRE :59:/20041010050500001M02606 KILLY S.A. • Mid-complexity compared to GRENOBLE :70:Invoice: 100001 :71A:SHA other SWIFT messages :72:/ABCD/narrative //more narrative /EFGH/narrative -} Copyright © 2013 C24 Technologies Ltd.
  • 5. MT103 Structure Copyright © 2013 C24 Technologies Ltd.
  • 6. MT103 Structure • The MT103 Block 4 has 22 fields • Some are optional • Some repeat a variable number of times • Some are grouped into (repeating) sequences Copyright © 2013 C24 Technologies Ltd.
  • 7. MT103 Structure • Some fields have multiple options Copyright © 2013 C24 Technologies Ltd.
  • 8. MT103 Structure • Each field can be composed of multiple components Copyright © 2013 C24 Technologies Ltd.
  • 9. The Full MT103 Definition Copyright © 2013 C24 Technologies Ltd.
  • 10. To Make Matters Worse... • There are hundreds of SWIFT messages • Each organisation typically uses a subset • The standard changes every year • Firms may need to support multiple versions concurrently • This is just one of many standards • FIX, FpML, SEPA, ... Copyright © 2013 C24 Technologies Ltd.
  • 11. RDBMS & Financial Messages • Storing the full message in a relational structure is very time consuming and fragile • In practice it’s not viable • What options do we have? Copyright © 2013 C24 Technologies Ltd.
  • 12. Lossy Storage • If we only care about a subset of fields, we can construct a simpler storage structure {1:F01MIDLGB22AXXX0548100000}{2:I… :20:8861198-0706 :23B:CRED • Information thrown away :32A:000612USD73025, 100000 USD 73025 BCIT :33B:USD73025, :50K:GIAN ANGELO IMPORTS however is lost forever NAPLES :52A:EFGHITMM500 • Advances in processing capacity :53A:BCITUS33 :54A:IRVTUS3N however mean that new applications :57A:BNPAFRPPGRE :59:/20041010050500001M02606 are being found for historical data KILLY S.A. GRENOBLE :70:Invoice: 100001 :71A:SHA :72:/ABCD/narrative • Hybrid solution is to store //more narrative /EFGH/narrative -} original source message alongside in a CLOB • Not readily accessible but at least it’s available Copyright © 2013 C24 Technologies Ltd.
  • 13. XML Overload • As message formats are hierarchical, we can construct an XML representation of the messages 100000 USD 73025 BCIT <?xml version="1.0"?> • Store this in a CLOB <MT103 xmlns="http://www.c24.biz/IO/SWIFT2012"> <Block1> <ApplicationID>F</ApplicationID> <ServiceID>01</ServiceID> <LTAddress> <BankCode>MIDL</BankCode> <CountryCode>GB</CountryCode> • Either create additional columns for indexed <LocationCode> <LocationCode1>2</LocationCode1> <LocationCode2>2</LocationCode2> values or use RDBMS with XML index support </LocationCode> <LogicalTerminalCode>A</ LogicalTerminalCode> <BranchCode>XXX</BranchCode> </LTAddress> <SessionNumber>0548</SessionNumber> <SequenceNumber>100005</ SequenceNumber> </Block1> <Iblock2> • Extraction of data either requires multiple XPath …. </MT103> queries or parsing the XML to resurrect the original message/object • Slower than querying simple cols and (probably) an artificial format Copyright © 2013 C24 Technologies Ltd.
  • 14. Message Format • Our in-memory structure needs to model all the data we need for parsing/processing in an efficient structure • e.g. a tree of (Java) objects • Ideally we’ll persist the same structure directly into the store • Persistence and loading are therefore fast • Do we even need to persist to disk? • We need ready access to the original message format too • The store must be able to index arbitrary attributes in the persisted object tree for querying • The store should be capable of extracting individual values from the object tree Copyright © 2013 C24 Technologies Ltd.
  • 15. Future Proofing • New clients == new message formats • Same purpose, just different presentation • Systems architected to work regardless of format & transport Flat File, XML, Native, ZIP, (S)FTP, .. SWIFT FIX JMS, MQ, AMQP, ... Parse Validate Process FpML HTTP, REST, WebServices, ... SMTP Copyright © 2013 C24 Technologies Ltd.
  • 16. C24 Integration Objects • Pre-built models for common standards • Updated as standards evolve • Easy to import/create new models • Java binding toolkit • Parse messages into queryable objects Copyright © 2013 C24 Technologies Ltd.
  • 17. Partners & Platforms Copyright © 2013 C24 Technologies Ltd.
  • 18. Typical Integration Flow • Linear process, potentially replicated File Channel Adapter ErrorChannel JMS C24 Store Unmarshalling Validating Process Channel Adapter Channel Adapter Transformer Message Filter Valid Region Invalid ErrorChannel ErrorChannel HTTP Channel Adapter ErrorChannel Copyright © 2013 C24 Technologies Ltd.
  • 19. Acquisition <!-- Read the SWIFT message files --> <int-file:inbound-channel-adapter directory="data/swift" filename-pattern="*.dat" prevent-duplicates="true" channel="parse-swift-message-channel" id="inbound-transport-swift-file"> <poller fixed-rate="100" task-executor="swiftThreadPool"/> </int-file:inbound-channel-adapter> <task:executor id="swiftThreadPool" pool-size="8"/> <channel id="parse-swift-message-channel"/> Copyright © 2013 C24 Technologies Ltd.
  • 20. Flow-based Processing <!-- SWIFT Parsing --> <c24:model id="swiftMessageModel" base-element="biz.c24.io.swift2012.MT103Element"/> <int-c24:unmarshalling-transformer input-channel="parse-swift-message-channel" output-channel="validate-message-channel" model-ref="swiftMessageModel" source-factory-ref="textualSourceFactory"/> <!-- Validate and filter out invalid messages --> <int-c24:validating-selector id="c24Validator" throw-exception-on-rejection="true"/> <filter input-channel="validate-message-channel" output-channel="process-message-channel" ref="c24Validator" discard-channel="errorChannel"/> <!-- Process - stub out for now--> <bridge input-channel="process-message-channel" output-channel="persist-valid-message-channel"/> <!-- Persist --> <channel id="persist-valid-message-channel"/> Copyright © 2013 C24 Technologies Ltd.
  • 21. Persistence <int-gfe:outbound-channel-adapter id="validStore" channel="persist-valid-message-channel" region="valid"> <int-gfe:cache-entries> <beans:entry key="headers[id]" value="payload"/> </int-gfe:cache-entries> </int-gfe:outbound-channel-adapter> Copyright © 2013 C24 Technologies Ltd.
  • 22. Flow-based Architecture • All of the scalability is at the front-end • Requires us to spread input load evenly across processing units Acquire Parse Validate Store • Alternatively we can use the store and make our flow event- based Copyright © 2013 C24 Technologies Ltd.
  • 23. Event-based Architecture Parser Validator Acquirer Processor Store • The store ‘triggers’ processing • Listeners • Subscriptions • Querying • ... Copyright © 2013 C24 Technologies Ltd.
  • 24. Scaling Up Parser Validator Parser Validator Acquirer Parser Validator Processor Acquirer Parser Validator Processor Acquirer Parser Validator Processor Acquirer Processor Acquirer Processor Store • Multiple components can register an interest in the events • We can scale out and add redundancy by replicating the listeners Copyright © 2013 C24 Technologies Ltd.
  • 25. Scaling Out • We can control what is replicated and to where • The node that parses the message doesn’t have to be the same as the one that processes it Parser Validator Parser Validator Acquirer Parser Validator Processor Acquirer Parser Validator Processor Acquirer Parser Validator Processor Acquirer Processor Acquirer Processor Parser Validator Parser Validator Store Acquirer Parser Validator Processor Acquirer Parser Validator Processor Acquirer Parser Validator Processor Acquirer Processor Acquirer Processor Replicate Store Store Parser Validator Parser Validator Acquirer Parser Validator Processor Acquirer Parser Validator Processor Acquirer Parser Validator Processor Acquirer Processor Acquirer Processor Copyright © 2013 C24 Technologies Ltd.
  • 27. Demo Time • GemFire Groovy DSL is still in development • Allows simple definition of regions and event-based handlers region 'new', shortcut: REPLICATE, { listener ( afterCreate: { EntryEvent e -> try { // Parse the raw message String message = e.getNewValue(); source.setReader(new StringReader(message)); parsedRegion.put(e.getKey(), source.readObject(element)); } catch(Exception ex) { exceptionRegion.put(e.getNewValue(), ex); } finally { newRegion.remove(e.getKey()); } } ) } Copyright © 2013 C24 Technologies Ltd.
  • 28. Demo Takeaways • We had the full message available to us • With a meaningful schema • We can query on and extract the full message or individual values • Approach is message format neutral • We just need to know the hierarchical location of the information we’re interested in • Our persistence code is extremely simple • We can index on any [derived] attribute Copyright © 2013 C24 Technologies Ltd.
  • 29. Scaling Out • Nodes can be replicated and sharded/partitioned • Queries are automatically distributed • Data can be ingested on any node • The same event-based architecture works well for distributed processing & analysis • And as we’ve seen our data grids allow us to embed our logic in them Copyright © 2013 C24 Technologies Ltd.
  • 31. Real-Time Analytics • Complex Event Processing • Inter-dependent set of variables & rules • Typically very high volume & latency critical • Run on multiple nodes to cope with • Load • Volume • Resilience Copyright © 2013 C24 Technologies Ltd.
  • 32. Rule Engines • Track dependencies between rules • Recalculate when necessary • Trigger behaviour when all rules are met • Changes typically triggered by time & message state change • Fits well with our event-based architecture • See RETE algorithms & derivates for creating highly performant rule trees Copyright © 2013 C24 Technologies Ltd.
  • 33. A Simple Architecture • We’ll use Gigaspaces XAP for some variety • Distributed in-memory data grid + elastic application platform • Sample application consists of 3 key areas: • ‘Space Object’ ie what are we going to be storing • Feeder - how do we get data into the grid • Processors - what happens to data once it’s in the grid mvn os:create -DgroupId=my.group -DartifactId=Test1 -Dtemplate=basic mvn package mvn os:deploy -Dlocators=localhost:4174 • Message Processors • ParseProcessor - parse raw messages into domain objects • RuleProcessor - trigger rule updates on message state change Copyright © 2013 C24 Technologies Ltd.
  • 34. Configuring XAP • Multiple ways to mark up our code • We’ll use annotations for simplicity • First we need the type we’re going to store in the space Copyright © 2013 C24 Technologies Ltd.
  • 35. The Space Class @SpaceClass public class Message<T extends ComplexDataObject> implements Serializable { public static enum State { NEW, VALID, PROCESSED, INVALID }; private String id = null; private State state = null; private String rawMessage = null; private T message = null; private Throwable failure = null; @SpaceId(autoGenerate=true) public String getId() { return id; } Copyright © 2013 C24 Technologies Ltd.
  • 36. The Feeder • Simple implementation to read raw messages from the filesystem • In practice we’ll be acquiring from multiple sources • Use an outbound channel adapater as a feeder? public class Feeder implements InitializingBean, DisposableBean { @GigaSpaceContext private GigaSpace gigaSpace; ... public void loadFiles() { ...; for(File file : dir.listFiles()) { Scanner scanner = new Scanner(file); String fileContent = scanner.useDelimiter("A").next(); scanner.close(); gigaSpace.write(new Message(fileContent)); } } Copyright © 2013 C24 Technologies Ltd.
  • 37. ParseProcessor • Declare the Processor and tell it when to trigger @EventDriven @Notify @NotifyType(write = true, update = true) public class ParseProcessor<T extends ComplexDataObject> { /** * We process any messages in a new state * @return */ @EventTemplate Message templateMessage() { Message msg = new Message(); msg.setState(Message.State.NEW); return msg; } Copyright © 2013 C24 Technologies Ltd.
  • 38. ParseProcessor • ...and how to process the messages it receives @SpaceDataEvent public Message<T> processData(Message<T> data) { try { log.info("Parsing " + data.getId()); source.setReader(new StringReader(data.getRawMessage())); data.setMessage((T)source.readObject(element)); data.setState(State.VALID); log.info("Parsed " + data.getId()); } catch(Throwable ex) { log.info("Message invalid " + data.getId()); data.setFailure(ex); data.setState(State.INVALID); } return data; } Copyright © 2013 C24 Technologies Ltd.
  • 39. RuleProcessor /** * We process any messages in a valid state * @return */ @EventTemplate Message templateMessage() { Message msg = new Message(); msg.setState(Message.State.VALID); return msg; } @SpaceDataEvent public Message<T> processData(Message<T> data) { log.info("Processing " + data.getId()); engine.process(data.getMessage()); data.setState(State.PROCESSED); return data; } Copyright © 2013 C24 Technologies Ltd.
  • 40. Demo
  • 42. Serialisation • At some point your objects are likely to be serialized • ...and it won’t be during your local testing • Create unit tests MyType obj = ...; ByteArrayOutputStream bos = new ByteArrayOutputStream(); ObjectOutputStream oos = new ObjectOutputStream(bos); oos.writeObject(obj); oos.close(); • Check what you’re actually serialising assertThat(bos.toByteArray().length, is(lessThan(some size))); Copyright © 2013 C24 Technologies Ltd.
  • 43. Interface Semantics • Exclusive lock vs shared lock vs take • Who gets notified? • Callback thread behaviour • What happens if we hold on to the thread for a long time? • What if we generate an exception? • None of these technologies make coordination problems disappear • They can however make it easier to apply good practice Copyright © 2013 C24 Technologies Ltd.
  • 44. Thank You! ? andrew.elmore@c24.biz @andrew_elmore Copyright © 2013 C24 Technologies Ltd.