
EDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics Accelerator

  1. IBM DB2 Analytics Accelerator (IDAA)
     Near Real-Time Analytics with IDAA
     March 2013
     Daniel Martin (danmartin@de.ibm.com), IBM Software Group, Information Management
     © 2013 IBM Corporation
  2. Disclaimer
     © Copyright IBM Corporation 2012. All rights reserved.
     U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
     IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
     IBM, the IBM logo, ibm.com, DB2, and DB2 for z/OS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml. Other company, product, or service names may be trademarks or service marks of others.
  3. Introduction & Overview
     © 2013 IBM Corporation, 03/20/13
  4. IBM DB2 Analytics Accelerator (IDAA)
     Concept: transparently accelerate analytical queries by dynamically offloading them (the DB2 optimizer decides) to a data warehouse appliance, with no application changes.
     • Transparency: applications connected to DB2 are entirely unaware of the Accelerator
     • Integration: deep integration with DB2 (security, monitoring, backup, ...)
     • Self-managed workloads: queries are executed in the most efficient location
     • Simplified administration: hands-free appliance operation eliminates most database tuning tasks
     • Performance: unprecedented response times for both OLTP and OLAP queries
  5. IDAA Server (Powered by IBM Netezza)
     • "Host" computers: SQL compiler, query plan, optimizer, administration
       - 2 front-end hosts, IBM 3650M3 or 3850X5, clustered active-passive
       - 2 Nehalem-EP quad-core 2.4 GHz processors per host
     • Snippet Blades™ (S-Blades, SPUs): processor and streaming DB logic
       - High-performance database engine: streaming joins, aggregations, sorts, etc.
       - e.g. TF12: 12 back-end SPUs (more details on following charts)
     • Disk enclosures: slice of user data, swap and mirror partitions
       - High-speed data streaming, high compression rate
       - EXP3000 JBOD enclosures: 12 × 3.5" 1 TB, 7200 RPM, SAS (3 Gb/s) disks; max 116 MB/s (200-500 MB/s compressed data)
       - e.g. TF12: 8 enclosures → 96 HDDs; 32 TB uncompressed user data (→ 128 TB); 9 GB/s scan rate (~36 GB/s with compression)
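The TF12 figures fit together as simple arithmetic. The sketch below reproduces them, assuming the roughly 4x compression factor implied by the slide's 32 TB → 128 TB and 9 GB/s → ~36 GB/s pairs; the function name and the factor are illustrative assumptions, not an IBM sizing formula.

```python
# Worked arithmetic for the TF12 example.
# COMPRESSION_FACTOR is an assumption inferred from the slide's
# 32 TB -> 128 TB and 9 GB/s -> ~36 GB/s pairs; real ratios depend on the data.
COMPRESSION_FACTOR = 4


def tf12_capacity(enclosures: int = 8, disks_per_enclosure: int = 12,
                  user_data_tb: float = 32.0, scan_gb_per_s: float = 9.0):
    """Return (disk count, effective user data in TB, effective scan rate in GB/s)."""
    disks = enclosures * disks_per_enclosure
    effective_tb = user_data_tb * COMPRESSION_FACTOR
    effective_scan = scan_gb_per_s * COMPRESSION_FACTOR
    return disks, effective_tb, effective_scan


disks, tb, scan = tf12_capacity()
print(f"{disks} HDDs, ~{tb:.0f} TB user data, ~{scan:.0f} GB/s effective scan")
# prints: 96 HDDs, ~128 TB user data, ~36 GB/s effective scan
```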
  6. IDAA Query Execution
     [Diagram] Application → DB2 for z/OS (Optimizer, ISAOptDRDARequestor) → Smart Analytics Optimizer (SMP host and SPUs, each with CPU, FPGA and memory)
     • Queries executed with the Smart Analytics Optimizer: sent to the accelerator through the ISAOptDRDARequestor
     • Queries executed without the Smart Analytics Optimizer: handled by the DB2 query execution run-time, for queries that cannot or should not be offloaded to ISAOpt
     • Heartbeat: Smart Analytics Optimizer availability and performance indicators
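The routing decision on this slide can be pictured as a small function. This is a hypothetical sketch: the `Query` fields, the `route` function and the heartbeat flag are illustrative assumptions, while the real decision is made inside the DB2 optimizer using cost estimates and eligibility rules not described here.

```python
from dataclasses import dataclass


@dataclass
class Query:
    sql: str
    offloadable: bool  # hypothetical: only accelerated tables and supported SQL are used
    analytical: bool   # hypothetical: optimizer expects the accelerator to be faster


def route(query: Query, accelerator_alive: bool) -> str:
    """Return 'accelerator' or 'db2' for one query (illustrative only)."""
    # The heartbeat tells DB2 whether the accelerator is available at all.
    if not accelerator_alive:
        return "db2"
    # Queries that cannot or should not be offloaded stay in DB2's own run-time.
    if not (query.offloadable and query.analytical):
        return "db2"
    return "accelerator"


q = Query("SELECT region, SUM(amount) FROM sales GROUP BY region", True, True)
print(route(q, accelerator_alive=True))   # accelerator
print(route(q, accelerator_alive=False))  # db2
```

Because the application only talks to DB2, whichever target `route` picks is invisible to it, which mirrors the transparency property described for IDAA.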
  7. Integrating Replication - Requirements
     • The Incremental Update capability is part of the base offering for all customers, not a separately orderable feature
     • Fully integrated into IDAA:
       - Managed via IDAA Studio
       - Integrated into IDAA software updates
       - Integrated into IDAA high-availability (HA) concepts
       - Automated scheduling of maintenance operations (RUNSTATS / REORG) on IDAA
       - Automation possible via stored procedures
  8. Complementing Existing Synchronization Options
     • There are different options to synchronize tables between DB2 and IDAA
       - The choice depends on IDAA usage scenarios, update frequency, affinity to partitions, etc.

     Synchronization option: use cases, characteristics and requirements
     • Full table refresh: the entire content of a database table is refreshed for accelerator processing
       - An existing ETL process replaces the entire table
       - Multiple sources or complex transformations
       - Smaller, unpartitioned tables
       - Reporting based on a consistent snapshot ("checkpoint")
     • Table partition refresh: for a partitioned database table, selected partitions can be refreshed for accelerator processing
       - Optimization for (time-)partitioned warehouse tables, appending changes "at the end"
       - More efficient than a full table refresh for larger tables
       - Reporting based on a consistent snapshot ("checkpoint")
     • Incremental update: log-based capturing of changes and propagation to IDAA with low latency (typically a few minutes)
       - Scattered updates after a "bulk" load
       - Reporting on continuously updated data (e.g., an ODS), considering the most recent changes
       - More efficient than a full table refresh for smaller updates
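As a rough rule of thumb, the comparison above can be condensed into a decision helper. The parameter names and the 10 % threshold are hypothetical, chosen only to illustrate how the criteria interact; they are not IBM guidance.

```python
def choose_sync_option(partitioned: bool,
                       changed_fraction: float,
                       changes_confined_to_partitions: bool,
                       needs_recent_changes: bool,
                       replaced_by_etl: bool) -> str:
    """Pick one of the three synchronization options (illustrative heuristic).

    changed_fraction: share of rows changed between synchronizations (0..1).
    The 0.1 threshold is an arbitrary assumption for this sketch.
    """
    if replaced_by_etl:
        # An ETL job rewrites the whole table anyway.
        return "full table refresh"
    if needs_recent_changes and changed_fraction < 0.1:
        # Scattered, small changes that must become visible within minutes.
        return "incremental update"
    if partitioned and changes_confined_to_partitions:
        # e.g. time-partitioned warehouse tables appending "at the end".
        return "table partition refresh"
    return "full table refresh"


print(choose_sync_option(True, 0.02, True, True, False))   # incremental update
print(choose_sync_option(True, 0.5, True, False, False))   # table partition refresh
```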
  9. Reporting and Analytics on Continuously Changing Data
     • With continuously changing data, users may get different results from subsequent executions of the same query
       - Users need to understand this behavior
     • The "waitForReplication" Accelerator stored procedure subcommand can be used
       - It waits until all data committed at the time of the stored procedure invocation has been applied to the target
     [Diagram: timeline of users submitting queries and updates to the database, with two waitForReplication() calls]
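The guarantee of waitForReplication can be modeled with log positions: remember the highest committed position at the time of the call, then block until the apply side has applied at least that position. The class and method names below are hypothetical; this models the semantics only, not the stored procedure interface.

```python
import threading
from typing import Optional


class ReplicationProgress:
    """Tracks committed (source) and applied (target) log positions. Hypothetical model."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self.committed = 0  # highest commit position read from the DB2 log
        self.applied = 0    # highest position applied on the accelerator

    def commit(self, position: int) -> None:
        with self._cond:
            self.committed = max(self.committed, position)

    def apply_up_to(self, position: int) -> None:
        with self._cond:
            self.applied = max(self.applied, position)
            self._cond.notify_all()

    def wait_for_replication(self, timeout: Optional[float] = None) -> bool:
        """Block until everything committed at call time has been applied.

        Commits arriving after the call are not waited for, so the wait
        cannot be prolonged indefinitely by ongoing updates.
        Returns False if the timeout expires first.
        """
        with self._cond:
            target = self.committed
            return self._cond.wait_for(lambda: self.applied >= target, timeout)


progress = ReplicationProgress()
progress.commit(42)
applier = threading.Timer(0.1, progress.apply_up_to, args=(42,))
applier.start()
print(progress.wait_for_replication(timeout=5))  # True
```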
  10. Architecture
  11. Architecture
     [Diagram] Components shown:
     • DB2 for z/OS (source of insert / delete / update changes) and the Engine for DB2 z/OS (log reading)
     • Private network (10G fiber) between DB2 for z/OS and the accelerator
     • IBM PureData System for Analytics, containing the IDAA Server with the Access Server (manages engines and subscriptions), the Engine for IBM Netezza (stages and applies changes), an API, and the IDAA database
     • IDAA stored procedures (ACCEL_CONTROL_ACCELERATOR, ACCEL_ENABLE_REPLICATION, ...), called from JCL or IDAA Studio
     • Automation code (creates data sources and subscriptions)
     • Catalog information (<xml>)
  12. Properties of this Architecture
     • Optimized for throughput
       - During normal operation, no disk I/O is involved:
         DB2 → log buffer → capture staging space → network → apply staging space → IDAA
       - Changes within the apply staging space are consolidated on the target: more than one change to the same row results in a single change
       - Mini-batches leverage the Netezza bulk-load interface:
         ◦ The source sends a unit of recovery (UR) to the target once the commit log record has been read
         ◦ The target applies all URs that arrived during a 60 s window (or when a size limit is reached)
       - UPDATEs are decomposed into <DELETE, INSERT> pairs (and merged with the "regular" DELETE and INSERT batches)
     • Parallel UNLOAD with the DB2 INTERNAL format is used to establish the initial snapshot of a table
       - Replication continues from this snapshot (the capture point is managed automatically)
     • IDAA automatically schedules a REORG as a low-priority background task when a threshold of "disorganization" is reached on Netezza
     • Simple identity mapping of tables
       - No user exits
       - No transformations
     • Based on "production" components
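The consolidation, UPDATE decomposition and mini-batch trigger described above can be sketched as follows. This is an illustrative model, not the product's apply engine: the `Change` record, keying by a unique key, `consolidate`, `batch_due` and its 100,000-row size limit are all assumptions. Each UPDATE becomes a DELETE of the old key plus an INSERT of the new row image, and only the net effect per key survives, so a batch yields one DELETE set (run first) and one INSERT list (suited to a single bulk load).

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set, Tuple

Key = Tuple[Any, ...]
Row = Dict[str, Any]


@dataclass
class Change:
    op: str                        # "INSERT", "UPDATE" or "DELETE"
    key: Key                       # unique key of the affected row (old key for UPDATE)
    row: Optional[Row] = None      # new row image for INSERT / UPDATE
    new_key: Optional[Key] = None  # UPDATE only: new key if the key columns changed


def consolidate(changes: List[Change]) -> Tuple[Set[Key], List[Row]]:
    """Reduce one mini-batch to a single DELETE key set and a single INSERT row list."""
    delete_keys: Set[Key] = set()
    final_rows: Dict[Key, Optional[Row]] = {}  # net row image per key (None = deleted)
    for ch in changes:
        if ch.op == "UPDATE":
            # Decompose the UPDATE into a <DELETE, INSERT> pair.
            new_key = ch.new_key if ch.new_key is not None else ch.key
            steps = [("DELETE", ch.key, None), ("INSERT", new_key, ch.row)]
        else:
            steps = [(ch.op, ch.key, ch.row)]
        for op, key, row in steps:
            if op == "DELETE":
                # Deleting a key that was only inserted within this batch is harmless.
                delete_keys.add(key)
                final_rows[key] = None
            elif op == "INSERT":
                final_rows[key] = row
            else:
                raise ValueError(f"unknown operation: {op!r}")
    inserts = [row for row in final_rows.values() if row is not None]
    return delete_keys, inserts


def batch_due(window_start: float, now: float, pending_rows: int,
              window_s: float = 60.0, size_limit: int = 100_000) -> bool:
    """Flush when the 60 s window has elapsed or the (hypothetical) size limit is hit."""
    return now - window_start >= window_s or pending_rows >= size_limit


batch = [
    Change("INSERT", (1,), {"id": 1, "qty": 5}),
    Change("UPDATE", (1,), {"id": 1, "qty": 7}),
    Change("UPDATE", (2,), {"id": 2, "qty": 3}),
    Change("DELETE", (3,)),
]
deletes, inserts = consolidate(batch)
print(sorted(deletes))  # [(1,), (2,), (3,)]
print(inserts)          # [{'id': 1, 'qty': 7}, {'id': 2, 'qty': 3}]
```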
  13. Incremental Update - Table Refresh Integration
     Using the IDAA table refresh to take the initial snapshot or to re-synchronize after bulk changes.
     • Use case: enable incremental update on a newly added table (state: INITIAL_LOAD_PENDING)
       - Details: lock mode TABLE or TABLESET is used for the load to prevent in-flight changes while the UNLOADs are running
       - Operations:
         1. Enable replication for the table
         2. Load the table (sets the capture point when the load has completed)
         3. Start replication
     • Use case: re-load a loaded, replicated table, e.g. because of a non-logged operation on the source table
       - Details: the table is assumed to be synchronized after the re-load; replication continues from this new "snapshot"
       - Operations: full reload or partition reload of the table (sets a new capture point when the load has completed)
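The operation sequence for a newly added table can be read as a small state machine. Apart from INITIAL_LOAD_PENDING, which appears on the slide, the state names, the `ReplicatedTable` class and its methods are hypothetical; in IDAA these steps are driven through IDAA Studio or the stored procedures, whose exact interfaces are not shown here.

```python
from enum import Enum, auto
from typing import Optional


class State(Enum):
    INITIAL_LOAD_PENDING = auto()  # table added, no data loaded yet (named on the slide)
    LOADED = auto()                # hypothetical: snapshot loaded, capture point set
    REPLICATING = auto()           # hypothetical: log changes are being applied


class ReplicatedTable:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.INITIAL_LOAD_PENDING
        self.replication_enabled = False
        self.capture_point: Optional[int] = None  # log position replication resumes from

    def enable_replication(self) -> None:
        self.replication_enabled = True

    def load(self, log_position_at_load_end: int) -> None:
        # The load runs with lock mode TABLE or TABLESET so no changes are in flight;
        # the capture point is set when the load has completed.
        self.capture_point = log_position_at_load_end
        if self.state is State.INITIAL_LOAD_PENDING:
            self.state = State.LOADED
        # A re-load of a replicating table only moves the capture point.

    def start_replication(self) -> None:
        if not self.replication_enabled or self.capture_point is None:
            raise RuntimeError("enable replication and load the table first")
        self.state = State.REPLICATING


t = ReplicatedTable("SALES")
t.enable_replication()
t.load(log_position_at_load_end=1000)
t.start_replication()
# Later, after a non-logged operation on the source: re-load sets a new capture point.
t.load(log_position_at_load_end=5000)
print(t.state.name, t.capture_point)  # REPLICATING 5000
```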
  14. User Interface
     Incremental update UI elements are only visible if the function has been enabled on the DB2 subsystem:
     • Start / stop the replication process (per subsystem-accelerator pair)
     • Enable / disable replication (per table)
     • Trace collection
     • Information on replication latency and events
  15. High-Availability Setup
     • Capture side
       - One active capture engine per DS group (DB2 data sharing group)
         ◦ Multiple standby instances, coordinated via ENQ
         ◦ Shared metadata
       - The z/OS Communications Server migrates the D-VIPA (dynamic virtual IP address) in case of fail-over
     • Apply side (appliance-internal)
       - Integration into cluster management (active-standby)
       - Mirrored disk between the active and standby host (shared metadata)
       - All components are migrated to the standby host and restarted; replication continues automatically where it left off
     [Diagram: DS group with Member 1 on LPAR 1 running the active capture engine and Member 2 on LPAR 2 running a hot-standby capture engine, with a shared catalog and D-VIPA]
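The capture-side rule of one active engine per data sharing group, with standbys coordinated via ENQ, behaves like an exclusive named lock. The sketch models ENQ as an in-process object; `EnqResource`, `CaptureEngine` and the member names are hypothetical, and real z/OS ENQ serialization (and D-VIPA movement) is not reproduced.

```python
from typing import Optional


class EnqResource:
    """Hypothetical stand-in for an exclusive ENQ on one resource name."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def try_acquire(self, owner: str) -> bool:
        if self.holder is None:
            self.holder = owner
            return True
        return self.holder == owner

    def release(self, owner: str) -> None:
        if self.holder == owner:
            self.holder = None


class CaptureEngine:
    def __init__(self, member: str, enq: EnqResource) -> None:
        self.member = member
        self.enq = enq
        self.failed = False

    @property
    def active(self) -> bool:
        return self.enq.holder == self.member

    def poll(self) -> None:
        # Standbys keep trying; whoever holds the ENQ is the single active engine.
        if not self.failed:
            self.enq.try_acquire(self.member)

    def fail(self) -> None:
        # On failure the ENQ is released (by the system, in reality),
        # which lets a standby engine take over.
        self.failed = True
        self.enq.release(self.member)


enq = EnqResource()
engines = [CaptureEngine("MEMBER1", enq), CaptureEngine("MEMBER2", enq)]
for e in engines:
    e.poll()
active_before = [e.member for e in engines if e.active]
engines[0].fail()
for e in engines:
    e.poll()
active_after = [e.member for e in engines if e.active]
print(active_before, active_after)  # ['MEMBER1'] ['MEMBER2']
```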
  16. Replication Tuning
     • Replication on the target system produces DELETE statements with predicates on the unique columns (index or constraint) of the source table
       - "Clustered base tables" can be used to locate the rows to be deleted more efficiently
       - Caveat: this may conflict with other tuning objectives (e.g., the table is already clustered on time columns)
     • If multiple unique constraints are available, the "best" set of columns is selected automatically:
       - the set with the minimal number of columns that (partially) matches the existing clustering columns
     • If tables are not clustered yet, the system suggests clustering on the source table columns that have a unique index or unique constraint
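One reading of the selection rule is: prefer unique column sets that overlap the table's clustering columns, and among those take the one with the fewest columns. The function below encodes that reading; the tie-breaking order and the fallback to the smallest set overall are assumptions, not documented behavior.

```python
from typing import Optional, Sequence, Tuple


def choose_delete_key(unique_sets: Sequence[Sequence[str]],
                      clustering_cols: Sequence[str]) -> Optional[Tuple[str, ...]]:
    """Pick the column set used in DELETE predicates (illustrative heuristic)."""
    if not unique_sets:
        return None
    clustering = set(clustering_cols)
    # Sets that (partially) match the existing clustering columns.
    matching = [s for s in unique_sets if clustering & set(s)]
    candidates = matching or list(unique_sets)  # assumption: fall back to all sets
    # Fewest columns first; on ties prefer the larger overlap with clustering columns.
    best = min(candidates, key=lambda s: (len(s), -len(clustering & set(s))))
    return tuple(best)


sets = [["ORDER_ID"], ["CUST_ID", "ORDER_DATE"]]
print(choose_delete_key(sets, ["ORDER_DATE"]))  # ('CUST_ID', 'ORDER_DATE')
print(choose_delete_key(sets, []))              # ('ORDER_ID',)
```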
  17. Evaluation
  18. Impact on Concurrently Running Queries
     • Validated that incremental update has only a minor impact on query response time
       - "No" workload:
         ◦ 10 parallel queries: 5 streaming, 5 aggregation / GROUP BY
       - "Medium" workload:
         ◦ 10 parallel queries: 5 streaming, 5 aggregation / GROUP BY
         ◦ Replication from 1 subsystem: 300,000 rows/minute (5,000 rows/s)
       - "Full" workload:
         ◦ 10 parallel queries: 5 streaming, 5 aggregation / GROUP BY
         ◦ Replication from 2 subsystems: 2.0 million rows/minute (33,333 rows/s)
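The per-second rates quoted for the workloads follow from the per-minute figures by dividing by 60; the quick check below confirms them (the 33,333 rows/s figure is truncated from 33,333.3).

```python
def rows_per_second(rows_per_minute: int) -> int:
    """Convert a per-minute replication rate to whole rows per second (truncating)."""
    return rows_per_minute // 60


medium = rows_per_second(300_000)    # "Medium" workload, 1 subsystem
full = rows_per_second(2_000_000)    # "Full" workload, 2 subsystems
print(medium, full)  # 5000 33333
```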
  19. Table Refresh "Best Practices"