Storing and processing data
  with the WSO2 Platform
        Deependra Ariyadewa
         Wathsala Vithanage
WSO2
• Founded in 2005 by acknowledged leaders in XML, Web
  Services Technologies & Standards and Open Source

• Produces an entire middleware platform, 100% open source,
  under the Apache license

• Business model is to sell comprehensive support &
  maintenance for our products

• Venture funded by Intel Capital and Quest Software.

• Global corporation with offices in USA, UK & Sri Lanka

• 150+ employees and growing.
Introduction to Data Problem
• Information explosion
   o Rapid growth of published data.
   o Managing large amounts of data is difficult (this leads to
     an information overload)
   o Difficulties include
       Capture
       Storage
       Search
       Sharing
       Analytics
       Visualization
   o We need new tools to deal with BIG DATA.
The Well Known Data Solution
RDBMS
• For many years this has been the choice

• Scaling up RDBMS
   o   Put it in a bigger computer
   o   Replicate database over 2 - 3 nodes. This does not work well
       with more than 2 - 3 nodes.
   o   Partition data over several nodes. However, JOIN queries are
       hard across many nodes and may require custom code and
       configuration. Transactions may not scale well.
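
The partitioning trade-off can be sketched in a few lines of Python. This is only an illustration, not how any particular RDBMS shards data: the three-node cluster, the `users` and `orders` tables and all function names are assumptions made up for the example. A primary-key lookup touches one node, while a JOIN has to fetch rows that often live on other nodes.

```python
import zlib

NODES = 3  # hypothetical cluster size

def node_for(key):
    """Pick a node from a stable hash of the partition key."""
    return zlib.crc32(key.encode("utf-8")) % NODES

# One dictionary per node stands in for that node's local tables.
users = {n: {} for n in range(NODES)}
orders = {n: [] for n in range(NODES)}

def put_user(user_id, name):
    users[node_for(user_id)][user_id] = name

def put_order(order_id, user_id, total):
    # Orders are partitioned by order id, not by user id.
    orders[node_for(order_id)].append((order_id, user_id, total))

def get_user(user_id):
    """Primary-key lookup: exactly one node is contacted."""
    return users[node_for(user_id)].get(user_id)

def join_orders_with_users():
    """JOIN: count how many user rows must come from another node."""
    joined, remote_fetches = [], 0
    for node, rows in orders.items():
        for order_id, user_id, total in rows:
            owner = node_for(user_id)
            if owner != node:
                remote_fetches += 1
            joined.append((order_id, users[owner].get(user_id), total))
    return joined, remote_fetches
```

Partitioning both tables by the same key would keep this particular JOIN local, which is the kind of custom design work the slide refers to.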
CAP Theorem and RDBMS
• RDBMS has two key features
   o Relational Model with SQL
   o ACID transactions (Atomic, Consistent, Isolated &
     Durable)
• CAP theorem states that a distributed system can provide only
  two of the three properties Consistency, Availability &
  Partition Tolerance at any given time.
   o Once you have picked two properties you will lose the
     remaining one.
• But there are some applications that do not need all the
  properties of RDBMS. Once these are dropped the system
  scales. (e.g. Google Bigtable)
Rise of NoSQL
• Large internet companies hit the problem first. They built
  systems that were specific to their problems, and those systems
  did scale.
   o Google Bigtable
   o Amazon Dynamo

• Soon many others followed, and most of them are free and
  open source.

• Among the advantages of NoSQL are
  o Scalability
  o Flexible schema
  o Designed to scale and support fault tolerance out of the
    box
Finding the right Data Solution

• Data Types
  o Unstructured Data
      Files

  o   Semi Structured Data
       XML Databases, Queues, Graphs and Lists

  o   Structured Data
       DBMS
Handling Unstructured Data
• Storage Options
  o Key - Value storages for small data items
  o Distributed file systems for other cases
  o Metadata Registries (Nirvana, SDSC Resource broker)
• Scalability
  o Key - Value storages are highly Scalable (e.g. Amazon
    Dynamo)
  o Distributed File Systems are generally scalable (HDFS,
    Lustre)
  o Metadata Registries are also highly scalable
• Search
  o Each of the above provides key based retrieval
  o Metadata registries provide property based search.
  o It is possible to build an index for content using tools like
    Lucene and use that for search.
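
The idea of a content index can be shown with a toy inverted index. This is a minimal sketch of the concept only; it does not use or imitate the Lucene API, and the function names and sample file names are invented for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)
```

The files themselves can stay in a key-value store or a distributed file system; only the index has to be consulted at search time.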
Handling Semi-Structured Data
• Storage Options
   o   Answer depends on the type of structure. (e.g. XML = XML Databases,
       Graphs = Graph Databases, List = Data structure servers, work
       items = Queue)
   o   If there is a server optimized for a given type, it is often much more efficient than
       using a DB. (e.g. Graph databases can support fast relationship search)

• Scalability
   o   XML databases can share data across nodes, so they are usually scalable, but the others are not
       that scalable

• Search
   o   Very much custom. E.g. XML or any tree = XPath
   o   Graph can support very fast relationship search
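
As a small illustration of XPath-style search over tree data, the sketch below uses the limited XPath subset in Python's standard `xml.etree.ElementTree` module. The catalog document and the `titles_in_category` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
CATALOG = """<catalog>
  <book category="nosql"><title>Column Stores</title></book>
  <book category="rdbms"><title>SQL Basics</title></book>
  <book category="nosql"><title>Key-Value Stores</title></book>
</catalog>"""

def titles_in_category(xml_text, category):
    """Select the titles of all books whose category attribute matches."""
    root = ET.fromstring(xml_text)
    path = f".//book[@category='{category}']/title"
    return [title.text for title in root.findall(path)]
```

A dedicated XML database evaluates the same kind of path expression, but against indexed, persistent storage.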
Handling Structured Data (1-3 nodes)

Small (1-3 nodes)

               Loose         Operation     Transactions
               Consistency   Consistency
 Primary Key   DB/KV/CF      DB/KV/CF      DB
 Where         DB/CF/Doc     DB/CF/Doc     DB
 JOIN          DB            DB            DB
 Offline       DB/CF/Doc     DB/CF/Doc     DB/CF/Doc

• In general, using a DB here for every case might work.
• Reasons for using options other than a DB
   o When there is a potential need to scale later.
   o High write throughput
• KV is 1-D whereas the other two are 2-D

*KV: Key-Value Systems, CF: Column Families, Doc: Document-based Systems
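
The 1-D versus 2-D distinction can be sketched with plain dictionaries. This is an assumption-laden toy model, not the data layout of any real store: a key-value system maps one key to one opaque value, while a column family maps a row key and a column name to a value, which is what makes WHERE-style filtering possible.

```python
# 1-D: one key, one opaque value. The store cannot look inside the value.
kv_store = {}
kv_store["user:42"] = b'{"name": "Ann", "city": "Colombo"}'

# 2-D: row key -> {column name -> value}.
cf_store = {}

def cf_put(row, column, value):
    cf_store.setdefault(row, {})[column] = value

def cf_get(row, column):
    return cf_store.get(row, {}).get(column)

def cf_where(column, value):
    """Filter rows by a column value, which a pure KV store cannot do."""
    return sorted(row for row, cols in cf_store.items()
                  if cols.get(column) == value)
```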
Handling Structured Data (10 nodes)

Scalable (10 nodes)

               Loose         Operation     Transactions
               Consistency   Consistency
 Primary Key   KV/CF         KV/CF         Partitioned DB?
 Where         CF/Doc        CF/Doc        Partitioned DB?
 JOIN          ??            ??            Partitioned DB??
 Offline       CF/Doc        CF/Doc        No

• KV, CF, and Doc can easily handle this case.
• If DBs are used with data sharded across many nodes:
   o Transactions might work, given that the participants in one
     transaction are not too many.
   o JOINs might need to transfer too much data between nodes.
   o Also consider in-memory DBs like VoltDB
• Offline mode will work
• Most systems let users choose the consistency level, and loose
  consistency can scale more. (e.g. Cassandra)

*KV: Key-Value Systems, CF: Column Families, Doc: Document-based Systems
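
For stores with tunable consistency, a common rule of thumb in Dynamo-style systems is that reads see the latest write when the read and write replica sets must overlap, that is when R + W > N. The helper names below are invented for this sketch and are not part of any client library.

```python
def quorum(replicas):
    """Smallest majority of a replica set."""
    return replicas // 2 + 1

def reads_see_latest_write(replicas, write_acks, read_acks):
    """True when every read set overlaps every write set (R + W > N)."""
    return read_acks + write_acks > replicas
```

With 3 replicas, writing and reading at quorum (2 + 2 > 3) gives overlapping sets, while writing and reading at a single replica (1 + 1) does not, which is the looser but more scalable setting.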
Highly Scalable System

Highly Scalable (1000s nodes)

               Loose         Operation     Transactions
               Consistency   Consistency
 Primary Key   KV/CF         KV/CF         No
 Where         CF/Doc        CF/Doc        No
 JOIN          No            No            No
 Offline       CF/Doc        CF/Doc        No

• Transactions do not work at this scale (CAP theorem).
• Same for the JOIN. The problem is that sometimes too much data
  needs to be transferred between nodes to perform the JOIN.
• The offline case is handled through MapReduce. Even the JOIN
  case is OK since there is time.

*KV: Key-Value Systems, CF: Column Families, Doc: Document-based Systems
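
An offline JOIN through MapReduce is often written as a reduce-side join: the map step tags each record with its source and emits it under the join key, and the reduce step pairs the records that share a key. The sketch below simulates the three phases in memory; the `users` and `orders` record shapes and all function names are assumptions for illustration, and a real job would run on a framework such as Hadoop.

```python
from collections import defaultdict

def map_phase(users, orders):
    """Emit (join key, tagged record) pairs from both inputs."""
    for user_id, name in users:
        yield user_id, ("user", name)
    for order_id, user_id, total in orders:
        yield user_id, ("order", (order_id, total))

def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Pair every user record with every order record under the same key."""
    joined = []
    for user_id in sorted(groups):
        names = [v for tag, v in groups[user_id] if tag == "user"]
        rows = [v for tag, v in groups[user_id] if tag == "order"]
        for name in names:
            for order_id, total in rows:
                joined.append((user_id, name, order_id, total))
    return joined
```

The shuffle still moves a lot of data between nodes, but in a batch job that cost is acceptable.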
Highly Scalable Systems + Primary Key Retrieval

Highly Scalable (1000s nodes)

               Loose         Operation     Transactions
               Consistency   Consistency
 Primary Key   KV/CF         KV/CF         No
 Where         CF/Doc(?)     CF/Doc(?)     No
 JOIN          No            No            No
 Offline       CF/Doc        CF/Doc        No

• This is (comparatively) the easy one.
• Can be solved through DHT (Distributed Hash Table) based
  solutions or architectures like OceanStore.
• Both Key-Value stores (KV) and Column Families (CF) can be
  used. But the Key-Value model is preferred as it is more scalable.

*KV: Key-Value Systems, CF: Column Families, Doc: Document-based Systems
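
One common building block of DHT-style key placement is consistent hashing. The ring below is a minimal sketch under simplifying assumptions (one position per node, no replication, no virtual nodes); the class and node names are invented and it is not the scheme of any specific product.

```python
import bisect
import hashlib

def _position(value):
    """Place a node name or key on the ring using a stable hash."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        self._ring = sorted((_position(node), node) for node in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key):
        """Return the first node clockwise from the key's position."""
        i = bisect.bisect_right(self._positions, _position(key))
        return self._ring[i % len(self._ring)][1]
```

When a node is added, a key either stays where it was or moves to the new node, so only a fraction of the data has to be relocated.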
Highly scalable systems + WHERE

Highly Scalable (1000s nodes)

               Loose         Operation     Transactions
               Consistency   Consistency
 Primary Key   KV/CF         KV/CF         No
 Where         CF/Doc(?)     CF/Doc(?)     No
 JOIN          No            No            No
 Offline       CF/Doc        CF/Doc        No

• This is generally OK, but tricky.
• CF works through a secondary index that does scatter-gather
  (e.g. Cassandra).
• Doc works through MapReduce views (e.g. CouchDB).
• There is Bissa, which builds an index for all possible queries
  (no range queries)
• If you are doing this, you should do pilot runs and make sure
  things work.

*KV: Key-Value Systems, CF: Column Families, Doc: Document-based Systems
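
Scatter-gather over a secondary index can be sketched as follows: each node indexes only its own rows, and a WHERE query is sent to every node and the partial results are merged. This is a simplified model with invented class names, not Cassandra's implementation.

```python
import zlib
from collections import defaultdict

class Node:
    def __init__(self):
        self.rows = {}
        self.index = defaultdict(set)  # (column, value) -> local row keys

    def put(self, key, columns):
        # Drop index entries of the previous version of the row, if any.
        for item in self.rows.get(key, {}).items():
            self.index[item].discard(key)
        self.rows[key] = dict(columns)
        for item in columns.items():
            self.index[item].add(key)

    def find(self, column, value):
        return list(self.index.get((column, value), ()))

class Cluster:
    def __init__(self, size):
        self.nodes = [Node() for _ in range(size)]

    def put(self, key, columns):
        owner = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        self.nodes[owner].put(key, columns)

    def where(self, column, value):
        hits = []
        for node in self.nodes:                   # scatter to every node
            hits.extend(node.find(column, value))
        return sorted(hits)                       # gather and merge
```

Every query touches all nodes, which is why the slide recommends pilot runs before relying on this pattern at scale.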
Hybrid Approaches

• Some solutions have many types of data and hence need
  more than one data solution (hybrid architectures).

• For example
   o Using DB for transactional data and CF for other data.
   o Keeping metadata and actual data separate for large data
     archives.
   o Using a GraphDB to store relationship data while other
     data is in Column Family storage.

• However, if transactions are needed, transactions have to
  be handled outside the storage systems (e.g. using Atomikos,
  ZooKeeper).
Other Parameters

• The above list is not exhaustive, and there are other parameters
  o Read/Write ratio - when high, easy to scale.
  o High write throughput.
  o Very large data products - you will need a file system.
    Maybe keep metadata in a data registry and store data in
    a file system.
  o Flexible schema.
  o Archival use cases
  o Analytical use cases
  o Others ...
WSO2 Data Solutions




• Data Service Server - DSS

• Relational Storage Service - RSS
• Column Store Service - CSS
• File System as a service ( FSaaS) - HDFS

• DSS and RSS
• DSS and CSS
WSO2 Data Service Server (DSS)
   Support for large XML outputs
   Content Filtering based on User's role
   Support for named parameters
   Ability to configure schema type for output elements
   Mixing multiple data sources in nested queries
   Distributed transaction support
   Oracle Ref Cursor support
   Support for multiple data source types
   Clustering support for High Availability and High Scalability
   Full support for WS-Security, WS-Trust, WS-Policy and WS-Secure Conversation and XKMS
   JMX and Web interface based monitoring and management
   WS-* and REST support
   Data validations
   UDT (User Defined Type) Support
   Complex Results
   Auto Generated Keys Support
   Boxcarring Support
   Batch Request Support
   Scheduled Tasks
   Registry Integration for Excel,CSV,XSLT
   Web Scraping Support
   Multiple SQL Dialect Support
   DB -> DS Generation
   Service Group/Hierarchy Support
   Database Explorer
   Data as a Service Features - DSS Stratos Service
     o Cassandra Integration
     o RDS Provisioning
Data Services Description Language - DSDL
DSS Management Console
WSO2 Stratos Support for Relational Data

 • Offering a “database as a service” for tenants
    o WSO2 Relational Storage Service
 • Users create a database and receive a JDBC URL
 • Database is allocated from an Amazon RDS (MySQL) horizontal cluster
 • Tenants are isolated from each other and integrated with the platform
   security model
WSO2 Relational Storage Service

• Use your own database server (anywhere)

• Register database connection as a datasource
• Use RSS to allocate a database
Stratos RSS
RSS Sample
WSO2 Column Store Service - CSS




Users can log in to the Web Console and create
Cassandra key spaces.
Column Store Service (Contd.)

• Key spaces will be allocated from a Cassandra cluster

• Users can manage and share their key spaces through the Stratos
  Web Console and use those key spaces through the Hector
  Client (Java Client for Cassandra)

• In essence we provide Cassandra as a service, as a part of
  Stratos, with multi-tenancy support and security integration
  with the WSO2 security model
WSO2 CSS Admin Console
(Screenshot: Left Menu and Keyspace View)
WSO2 CSS Admin Console

Keyspace Connection Details
WSO2 CSS Sample
File System as a Service - FSaaS

The volume will be allocated from an HDFS cluster. Volumes are
isolated from other tenants in Stratos and integrated with the WSO2
security model.

Users can manage and share their file systems through the Stratos
Web Console and use the file system like any other file
system.
FSaaS Sample
Data Processing - MapReduce

• MapReduce is inspired by the map and reduce functions used in
  functional programming.
   o Initially introduced by Google, with some parts being
     patented.

• Hadoop is a MapReduce implementation released under the
  Apache license.

• WSO2 provides MapReduce as a service.

• WSO2 Business Activity Monitor (BAM2) is an example use
  case for WSO2's MapReduce as a service.
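
The functional roots of the model can be seen in a word count written with Python's built-in `map` and `functools.reduce`. This is a single-process sketch with invented function names; it shows the programming model only, not Hadoop's API or WSO2's service.

```python
from collections import Counter
from functools import reduce

def map_words(line):
    """Map step: turn one line into partial word counts."""
    return Counter(line.lower().split())

def merge_counts(left, right):
    """Reduce step: combine two partial results."""
    return left + right

def word_count(lines):
    return reduce(merge_counts, map(map_words, lines), Counter())
```

Because the map step treats every line independently and the merge is associative, a framework can spread both steps across many machines.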
WSO2 MapReduce
• WSO2 MapReduce is secure.

• WSO2 MapReduce can use both FSaaS and DSS.
  o HDFS (FSaaS)
  o Cassandra (DSS)
Q&A
Selected Customers



WSO2 engagement model

• QuickStart
• Development
  Support
• Development
  Services
• Production
  Support
• Turnkey Solutions
  • WSO2 Mobile Services Solution
  • WSO2 FIX Gateway Solution
  • WSO2 SAP Gateway Solution

 
Delivering New Digital Experiences Fast - Introducing Choreo
Delivering New Digital Experiences Fast - Introducing ChoreoDelivering New Digital Experiences Fast - Introducing Choreo
Delivering New Digital Experiences Fast - Introducing Choreo
 
Fueling the Digital Experience Economy with Connected Products
Fueling the Digital Experience Economy with Connected ProductsFueling the Digital Experience Economy with Connected Products
Fueling the Digital Experience Economy with Connected Products
 
A Reference Methodology for Agile Digital Businesses
 A Reference Methodology for Agile Digital Businesses A Reference Methodology for Agile Digital Businesses
A Reference Methodology for Agile Digital Businesses
 
Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)
Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)
Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)
 
Lessons from the pandemic - From a single use case to true transformation
 Lessons from the pandemic - From a single use case to true transformation Lessons from the pandemic - From a single use case to true transformation
Lessons from the pandemic - From a single use case to true transformation
 
Adding Liveliness to Banking Experiences
Adding Liveliness to Banking ExperiencesAdding Liveliness to Banking Experiences
Adding Liveliness to Banking Experiences
 
Building a Future-ready Bank
Building a Future-ready BankBuilding a Future-ready Bank
Building a Future-ready Bank
 
WSO2 API Manager Community Call - November 2021
WSO2 API Manager Community Call - November 2021WSO2 API Manager Community Call - November 2021
WSO2 API Manager Community Call - November 2021
 
[API World ] - Managing Asynchronous APIs
[API World ] - Managing Asynchronous APIs[API World ] - Managing Asynchronous APIs
[API World ] - Managing Asynchronous APIs
 
[API World 2021 ] - Understanding Cloud Native Deployment
[API World 2021 ] - Understanding Cloud Native Deployment[API World 2021 ] - Understanding Cloud Native Deployment
[API World 2021 ] - Understanding Cloud Native Deployment
 
[API Word 2021] - Quantum Duality of “API as a Business and a Technology”
[API Word 2021] - Quantum Duality of “API as a Business and a Technology”[API Word 2021] - Quantum Duality of “API as a Business and a Technology”
[API Word 2021] - Quantum Duality of “API as a Business and a Technology”
 

Recently uploaded

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 

Recently uploaded (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 

Storing and processing data with the WSO2 Platform

  • 1. Storing and processing data with the WSO2 Platform Deependra Ariyadewa Wathsala Vithanage
  • 2. WSO2 • Founded in 2005 by acknowledged leaders in XML, Web Services Technologies & Standards and Open Source • Producing entire middleware platform 100% open source under Apache license • Business model is to sell comprehensive support & maintenance for our products • Venture funded by Intel Capital and Quest Software. • Global corporation with offices in USA, UK & Sri Lanka • 150+ employees and growing.
  • 3. Introduction to Data Problem • Information explosion o Rapid growth of published data. o Managing large amounts of data is difficult (this leads to an information overload). o Difficulties include capture, storage, search, sharing, analytics, and visualization. o We need new tools to deal with BIG DATA.
  • 4. The Well-Known Data Solution: RDBMS • For many years this has been the choice • Scaling up RDBMS o Put it in a bigger computer. o Replicate the database over 2 - 3 nodes. This does not work well with more than 2 - 3 nodes. o Partition data over several nodes. However, JOIN queries are hard across many nodes and may require custom code and configuration, and transactions may not scale well.
  • 5. CAP Theorem and RDBMS • RDBMS has two key features o Relational model with SQL o ACID transactions (Atomicity, Consistency, Isolation & Durability) • The CAP theorem states that in a distributed system it is only possible to have two of the three properties Consistency, Availability & Partition Tolerance at any given time. o Once you have picked two properties, you will lose the remaining one. • But there are some applications that do not need all the properties of an RDBMS. Once these are dropped, the system scales (e.g. Google Bigtable).
  • 6. Rise of NoSQL • Large internet companies hit the problem first; they built systems that were specific to their problems, and those systems did scale. o Google Bigtable o Amazon Dynamo • Soon many others followed, and most of them are free and open source. • Among the advantages of NoSQL are o Scalability o Flexible schema o Designed to scale and support fault tolerance out of the box
  • 7. Finding the right Data Solution • Data Types o Unstructured Data: Files o Semi-Structured Data: XML Databases, Queues, Graphs and Lists o Structured Data: DBMS
  • 8. Handling Unstructured Data • Storage Options o Key-value stores for small data items o Distributed file systems for other cases o Metadata registries (Nirvana, SDSC Resource Broker) • Scalability o Key-value stores are highly scalable (e.g. Amazon Dynamo) o Distributed file systems are generally scalable (HDFS, Lustre) o Metadata registries are also highly scalable • Search o Each of the above provides key-based retrieval. o Metadata registries provide property-based search. o It is possible to build an index of the content using tools like Lucene and use that for search.
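The content-index idea on this slide can be illustrated with a toy inverted index. This is only a sketch of the general technique that tools like Lucene implement at much larger scale; the function names `build_index` and `search` and the sample documents are made up for illustration and are not part of any WSO2 or Lucene API.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document keys that contain it."""
    index = defaultdict(set)
    for key, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(key)
    return index


def search(index, query):
    """Return the keys of documents that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical unstructured items, stored under their keys.
documents = {
    "doc1": "Cassandra is a column family store",
    "doc2": "HDFS is a distributed file system",
    "doc3": "Lucene builds an index over file content",
}
index = build_index(documents)
```

Key-based retrieval stays a plain lookup in `documents`, while `search(index, "distributed file")` answers a content query without scanning every item.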
  • 9. Handling Semi-Structured Data • Storage Options o The answer depends on the type of structure (e.g. XML = XML databases, graphs = graph databases, lists = data structure servers, work items = queues). o If there is a server optimized for a given type, it is often much more efficient than using a DB (e.g. graph databases can support fast relationship search). • Scalability o XML databases can share data across nodes, so they are usually scalable, but the others are not that scalable. • Search o Very much custom, e.g. XML or any tree = XPath. o Graphs can support very fast relationship search.
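As a small illustration of XPath-style search over tree data, the sketch below uses Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The `catalog` document and its attribute names are invented for the example; a real XML database would offer full XPath or XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured data; element and attribute names are assumptions.
xml_text = """
<catalog>
  <product id="p1" category="storage"><name>Column Store</name></product>
  <product id="p2" category="compute"><name>MapReduce</name></product>
  <product id="p3" category="storage"><name>File System</name></product>
</catalog>
"""

root = ET.fromstring(xml_text)

# Select every <product> child whose category attribute equals "storage".
storage_products = root.findall("./product[@category='storage']")
storage_names = [product.findtext("name") for product in storage_products]
```

The query is expressed against the shape of the tree rather than against rows and columns, which is why search on semi-structured data tends to be specific to each structure type.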
  • 10. Handling Structured Data (1-3 nodes)
      • In general, using a DB for every case might work here.
      • Reasons for using options other than a DB
          o When there is a potential need to scale later.
          o High write throughput.
      • KV is 1-D, whereas the other two are 2-D.

      Small (1-3 nodes)
      Operation     Loose Consistency   Consistency   Transactions
      Primary Key   DB/KV/CF            DB/KV/CF      DB
      Where         DB/CF/Doc           DB/CF/Doc     DB
      JOIN          DB                  DB            DB
      Offline       DB/CF/Doc           DB/CF/Doc     DB/CF/Doc

      *KV: Key-Value Systems, CF: Column Families, Doc: document based Systems
  • 11. Handling Structured Data (10 nodes)
      • KV, CF, and Doc can easily handle this case.
      • If DBs are used, the data is partitioned (sharded) across many nodes.
          o Transactions might work, given that the participants in one transaction are not too many.
          o JOINs might need to transfer too much data between nodes.
      • Also consider in-memory DBs like VoltDB.
      • Offline mode will work.
      • Most systems let users choose the consistency level, and loose consistency can scale more (e.g. Cassandra).

      Scalable (10 nodes)
      Operation     Loose Consistency   Consistency   Transactions
      Primary Key   KV/CF               KV/CF         Partitioned DB?
      Where         CF/Doc              CF/Doc        Partitioned DB?
      JOIN          ??                  ??            Partitioned DB??
      Offline       CF/Doc              CF/Doc        No

      *KV: Key-Value Systems, CF: Column Families, Doc: document based Systems
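The consistency choice mentioned on this slide is often reasoned about with the replica-overlap rule used by Dynamo-style stores: with N replicas, a write acknowledged by W replicas and a read that consults R replicas are guaranteed to overlap when R + W > N. The sketch below only encodes that arithmetic; the function names are illustrative and are not a Cassandra API.

```python
def quorum(replicas):
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def reads_see_latest_write(replicas, write_acks, read_acks):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > replicas


# With 3 replicas, quorum writes plus quorum reads overlap on at least one node.
n = 3
strict = reads_see_latest_write(n, quorum(n), quorum(n))

# Writing to one replica and reading from one replica is faster but only loosely consistent.
loose = reads_see_latest_write(n, 1, 1)
```

Lower values of W and R mean fewer nodes take part in each request, which is the sense in which loose consistency scales more.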
  • 12. Highly Scalable System
      • Transactions do not work at this scale (CAP theorem).
      • The same goes for JOIN. The problem is that sometimes too much data needs to be transferred between nodes to perform the JOIN.
      • The offline case is handled through Map-Reduce. Even the JOIN case is OK offline, since there is time.

      Highly Scalable (1000s nodes)
      Operation     Loose Consistency   Consistency   Transactions
      Primary Key   KV/CF               KV/CF         No
      Where         CF/Doc              CF/Doc        No
      JOIN          No                  No            No
      Offline       CF/Doc              CF/Doc        No

      *KV: Key-Value Systems, CF: Column Families, Doc: document based Systems
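To show why an offline JOIN is workable with Map-Reduce, here is a minimal single-process sketch of a reduce-side join. It imitates the map, shuffle and reduce phases with plain Python; the `users` and `orders` records are invented, and a real job would run these phases across many nodes (for example on Hadoop).

```python
from collections import defaultdict


def map_phase(users, orders):
    """Tag each record with its source and emit it under the join key."""
    for user_id, name in users:
        yield user_id, ("user", name)
    for user_id, item in orders:
        yield user_id, ("order", item)


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each key, pair every user record with every order record."""
    joined = []
    for key in sorted(groups):
        names = [value for tag, value in groups[key] if tag == "user"]
        items = [value for tag, value in groups[key] if tag == "order"]
        for name in names:
            for item in items:
                joined.append((key, name, item))
    return joined


# Hypothetical input data sets.
users = [(1, "alice"), (2, "bob")]
orders = [(1, "disk"), (2, "cpu"), (1, "ram")]
joined = reduce_phase(shuffle(map_phase(users, orders)))
```

All records sharing a key are moved to one reducer during the shuffle, which is exactly the data transfer that makes online JOINs expensive and offline JOINs acceptable.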
  • 13. Highly Scalable Systems + Primary Key Retrieval
      • This is (comparatively) the easy one.
      • It can be solved through DHT (Distributed Hash Table) based solutions or architectures like OceanStore.
      • Both Key-Value stores (KV) and Column Families (CF) can be used, but the Key-Value model is preferred as it is more scalable.

      Highly Scalable (1000s nodes)
      Operation     Loose Consistency   Consistency   Transactions
      Primary Key   KV/CF               KV/CF         No
      Where         CF/Doc(?)           CF/Doc(?)     No
      JOIN          No                  No            No
      Offline       CF/Doc              CF/Doc        No

      *KV: Key-Value Systems, CF: Column Families, Doc: document based Systems
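One common building block behind DHT-based key retrieval is consistent hashing, sketched below. This is an illustrative toy, not the algorithm of any particular product: the `HashRing` class, the node names and the choice of SHA-256 with 64 virtual points per node are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring that maps each key to the node owning it."""

    def __init__(self, nodes, replicas=64):
        # Place several virtual points per node on the ring to even out load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")
```

Because a lookup needs only the key and the ring, any client can locate the owning node directly, and removing a node moves only the keys that node owned.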
  • 14. Highly scalable systems + WHERE
      • This is generally OK, but tricky.
      • CF works through a secondary index that does scatter-gather (e.g. Cassandra).
      • Doc works through Map-Reduce views (e.g. CouchDB).
      • There is Bissa, which builds an index for all possible queries (no range queries).
      • If you are doing this, you should do pilot runs and make sure things work.

      Highly Scalable (1000s nodes)
      Operation     Loose Consistency   Consistency   Transactions
      Primary Key   KV/CF               KV/CF         No
      Where         CF/Doc(?)           CF/Doc(?)     No
      JOIN          No                  No            No
      Offline       CF/Doc              CF/Doc        No

      *KV: Key-Value Systems, CF: Column Families, Doc: document based Systems
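The scatter-gather pattern behind a per-node secondary index can be sketched as follows. Each node indexes only its own rows, so a WHERE query has to be sent to every node and the partial results merged. The `Node` and `Cluster` classes and the byte-sum partitioning are simplifications invented for this example, not Cassandra's actual implementation.

```python
class Node:
    """Holds a partition of the rows plus a local secondary index."""

    def __init__(self):
        self.rows = {}
        self.index = {}  # (column, value) -> set of row keys stored on this node

    def put(self, key, row):
        self.rows[key] = row
        for column, value in row.items():
            self.index.setdefault((column, value), set()).add(key)

    def where(self, column, value):
        keys = self.index.get((column, value), set())
        return [(key, self.rows[key]) for key in keys]


class Cluster:
    def __init__(self, size):
        self.nodes = [Node() for _ in range(size)]

    def put(self, key, row):
        # Toy partitioner: deterministic byte sum of the key modulo node count.
        node = self.nodes[sum(key.encode("utf-8")) % len(self.nodes)]
        node.put(key, row)

    def where(self, column, value):
        results = []
        for node in self.nodes:  # scatter: ask every node
            results.extend(node.where(column, value))
        return sorted(results, key=lambda pair: pair[0])  # gather: merge


cluster = Cluster(3)
cluster.put("u1", {"city": "Colombo"})
cluster.put("u2", {"city": "London"})
cluster.put("u3", {"city": "Colombo"})
matches = [key for key, _ in cluster.where("city", "Colombo")]
```

Every WHERE query touches all nodes, which is why the slide recommends pilot runs before relying on this at thousands of nodes.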
  • 15. Hybrid Approaches • Some solutions have many types of data and hence need more than one data solution (hybrid architectures). • For example o Using a DB for transactional data and CF for other data. o Keeping metadata and actual data separate for large data archives. o Using a GraphDB to store relationship data while other data is in column family storage. • However, if transactions are needed, they have to be handled outside the storage layer (e.g. using Atomikos, Zookeeper).
  • 16. Other Parameters • The above list is not exhaustive, and there are other parameters o Read/Write ratio - when high, easy to scale. o High write throughput. o Very large data products - you will need a file system. Maybe keep metadata in a data registry and store the data in a file system. o Flexible schema. o Archival use cases o Analytical use cases o Others ...
  • 17. WSO2 Data Solutions • Data Service Server - DSS • Relational Storage Service - RSS • Column Store Service - CSS • File System as a Service (FSaaS) - HDFS • DSS and RSS • DSS and CSS
  • 18. WSO2 Data Service Server (DSS)
  • 19. WSO2 Data Service Server (DSS)
      o Support for large XML outputs
      o Content filtering based on user's role
      o Support for named parameters
      o Ability to configure schema type for output elements
      o Mixing multiple data sources in nested queries
      o Distributed transaction support
      o Oracle Ref Cursor support
      o Support for multiple data source types
      o Clustering support for High Availability and High Scalability
      o Full support for WS-Security, WS-Trust, WS-Policy, WS-Secure Conversation and XKMS
      o JMX and Web interface based monitoring and management
      o WS-* and REST support
      o Data validations
      o UDT (User Defined Type) support
      o Complex results
      o Auto-generated keys support
      o Boxcarring support
      o Batch request support
      o Scheduled tasks
      o Registry integration for Excel, CSV, XSLT
      o Web scraping support
      o Multiple SQL dialect support
      o DB -> DS generation
      o Service group/hierarchy support
      o Database Explorer
      o Data as a Service features - DSS Stratos Service: Cassandra integration, RDS provisioning
  • 20. WSO2 Data Service Server (DSS)
  • 21. Data Services Description Language - DSDL
  • 23. WSO2 Stratos Support for Relational Data • Offering a “database as a service” for tenants: the WSO2 Relational Storage Service • Users create a database and receive a JDBC URL • The database is allocated from an Amazon RDS (MySQL) horizontal cluster • Tenants are isolated from each other and integrated with the platform security model
  • 24. WSO2 Relational Storage Service • Use your own database server (anywhere) • Register the database connection as a datasource • Use RSS to allocate a database
  • 29. WSO2 Column Store Service - CSS Users can log in to the Web Console and create Cassandra key spaces.
  • 30. Column Store Service (Contd.) • Key spaces will be allocated from a Cassandra cluster • Users can manage and share their key spaces through the Stratos Web Console and use those key spaces through the Hector client (a Java client for Cassandra) • In essence, we provide Cassandra as a service within Stratos, with multi-tenancy support and security integration with the WSO2 security model
  • 31. WSO2 CSS Admin Console Left Menu Keyspace View
  • 32. WSO2 CSS Admin Console Keyspace Connection Details
  • 34. File System as a Service - FSaaS
  • 35. File System as a Service - FSaaS • The volume will be allocated from an HDFS cluster. • Volumes are isolated from other tenants in Stratos. • It is integrated with the WSO2 security model. • Users can manage and share their file systems through the Stratos Web Console and use them like any other file system.
  • 37. Data Processing - MapReduce • MapReduce is inspired by the map and reduce functions used in functional programming. o Initially introduced by Google, with some parts being patented. • Hadoop is a MapReduce implementation released under the Apache license. • WSO2 provides MapReduce as a service. • WSO2 Business Activity Monitor (BAM2) is an example use case for WSO2's MapReduce as a service.
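The link to functional programming's map and reduce can be seen in a tiny word count. The sketch below runs in a single Python process and only imitates the map, shuffle and reduce phases; it is not Hadoop or WSO2 code, and the function names and input lines are chosen for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted counts by word, as the framework does between phases."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_counts(groups):
    """Reduce step: fold each word's list of counts into a single total."""
    return {
        word: reduce(lambda total, count: total + count, counts)
        for word, counts in groups.items()
    }


lines = ["big data needs new tools", "new tools for big data"]
pairs = [pair for line in lines for pair in map_words(line)]
word_counts = reduce_counts(shuffle(pairs))
```

Because each line is mapped independently and each word is reduced independently, both phases can be spread over many machines, which is what an implementation such as Hadoop does.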
  • 38. WSO2 MapReduce • WSO2 MapReduce is secure. • WSO2 MapReduce can use both FSaaS and DSS. o HDFS (FSaaS) o Cassandra (DSS)
  • 45. Q&A
  • 47. Selected Customers
  • 48. WSO2 engagement model • QuickStart • Development Support • Development Services • Production Support • Turnkey Solutions • WSO2 Mobile Services Solution • WSO2 FIX Gateway Solution • WSO2 SAP Gateway Solution