Se ha denunciado esta presentación.
Utilizamos tu perfil de LinkedIn y tus datos de actividad para personalizar los anuncios y mostrarte publicidad más relevante. Puedes cambiar tus preferencias de publicidad en cualquier momento.

Seamless Analytical Framework

25 visualizaciones

Publicado el


FinTech and InsuranceTech case studies digitally transforming Europe's future with BigData and AI
The new data-driven industrial revolution highlights the need for big data technologies to unlock the potential in various application domains. The insurance and finance services industry is rapidly transformed by data-intensive operations and applications. FinTech and InsuranceTech combine very large datasets from legacy banking systems with other data sources such as financial markets data, regulatory datasets, real-time retail transactions, and more, improving financial services and activities for customers.

Publicado en: Tecnología
  • Sé el primero en comentar

  • Sé el primero en recomendar esto

Seamless Analytical Framework

  1. 1. Seamless Analytical Framework Pavlos Kranas LeanXcale pavlos@leanxcale.com
  2. 2. Seamless Analytical Framework Modern enterprises use Operational databases for OLTP load Key-Value for IoT data Data warehouses for data analytics Datalakes etc ... Need polyglot capabilities
  3. 3. Seamless Analytical Framework Nowadays: Data Federation using Spark
  4. 4. Seamless Analytical Framework Nowadays: Data Federation using Spark BUT: Can be very resource consuming Cannot exploit the specific capabilities of each different datastore
  5. 5. Seamless Analytical Framework – User Story Data ingestion in operational datastore (LeanXcale) Old data becomes historical, with no modifications Data Warehouse to perform analytics on big data volumes Distribution of datasets is problematic Data to be retrieved from both stores To be merged in the application level Data consistency considerations when moving datasets
  6. 6. Seamless Analytical Framework - Solution Seamless Analytical Framework Federate data coming from two different datastores: HTAP Relational LXS Datastore IBM Object store sharing the SAME dataset Single (black box) component that consists of two datastores exploits unique characteristics of each one transparently from the user does not compromise some requirements for the benefits of others
  7. 7. Query Federation
  8. 8. Seamless Component Query path 16.07.2019 Periodic Review Meeting 8 LXS Federator at: ‘2018-01-01' SELECT vessel_code, datetime, longitude, latitude, wind_speed FROM danaos WHERE wind_speed > 30
  9. 9. Query path 16.07.2019 Periodic Review Meeting 9 LXS Federator LXS DB Seamless Component SELECT vessel_code, datetime, longitude, latitude, wind_speed FROM danaos WHERE wind_speed > 30 LXS Federator at: ‘2018-01-01' WHERE wind_speed > 30 AND date > ‘2018-01-01’
  10. 10. Seamless Component Query path 16.07.2019 Periodic Review Meeting 10 LXS Federator Spark JDBC Thrift Connector Object Storage with Data Skipping LXS DB SELECT vessel_code, datetime, longitude, latitude, wind_speed FROM danaos WHERE wind_speed > 30 WHERE wind_speed > 30 AND date > ‘2018-01-01’ WHERE wind_speed > 30 AND date <= ‘2018-01-01’
  11. 11. Supported Operations Currently supports Full Scan Ordered Scan LIMIT Aggregations Group By Aggregations Ordered Group By Aggregations Does not yet support JOIN on fragmented data tables
  12. 12. Data Movement High Level 16.07.2019 Periodic Review Meeting 12
  13. 13. Data Movement High Level 16.07.2019 Periodic Review Meeting 13 Prepare Data Slice – Data Manager informs LXS to move data slice to a new data region, so that it can easily drop it later 1. Prepare Data Slice
  14. 14. Data Movement High Level 16.07.2019 Periodic Review Meeting 14 Inform to Move Data Slice – Data Manager informs the Data Mover that it can start the movement process, sending the SQL statement to grab the data slice from LXS 2. Inform to Move Data Slice
  15. 15. Data Movement High Level 16.07.2019 Periodic Review Meeting 15 Get Data Slice – Upon receiving a message from the RabbitMQ the data mover fetches the data slice from LXS via JDBC 3. Get Data Slice
  16. 16. Data Movement High Level 16.07.2019 Periodic Review Meeting 16 Store Data Slice – Data Mover layout the data, build data skipping indexes and stores the data to Object Storage 4. Store Data Slice
  17. 17. Data Movement High Level 16.07.2019 Periodic Review Meeting 17 Slice is moved – After data is persisted in Object Store, an ACK is sent to the Data Manager which then informs the federator to adjust the split point and drop the slice from LXS 5. Slice is Moved
  18. 18. Data Movement High Level 16.07.2019 Periodic Review Meeting 18 Drop Slice – Federator forwards the split point and requests LXS to drop the slice 6. Drop Slice
  19. 19. Seamless Analytical Framework Thank you!!!
  20. 20. Federator Ensures Data Consistency 16.07.2019 Periodic Review Meeting 20 IBM COS LXS Federator 2018-01-01 2018-01-01
  21. 21. Federator Ensures Data Consistency 16.07.2019 Periodic Review Meeting 21 IBM COS LXS Federator 2018-01-01 2018-01-01 2018-02-01
  22. 22. Federator Ensures Data Consistency 16.07.2019 Periodic Review Meeting 22 IBM COS LXS Federator 2018-01-01 2018-01-01 2018-02-01 2018-02-01
  23. 23. Federator Ensures Data Consistency 16.07.2019 Periodic Review Meeting 23 IBM COS LXS Federator 2018-01-01 2018-01-01 2018-02-01 2018-02-01
  24. 24. Federator Ensures Data Consistency 16.07.2019 Periodic Review Meeting 24 IBM COS LXS Federator 2018-01-01 2018-01-01 2018-02-01 2018-02-01
  25. 25. Federator Ensures Data Consistency 16.07.2019 Periodic Review Meeting 25 IBM COS LXS Federator 2018-02-01 2018-02-01
  26. 26. Next Steps Support of the JOIN operation on split tables between the stores Both Stores natively support JOIN operations Query Federator can push down the operation and merge results Strategies to take into account data locality to reduce amount of data to be sent JOIN can be translated as the union of 4 separate JOINS Bind Join can be considered for optimize the execution

×