Information Virtualization: Query Federation on Data Lakes
DataWorks Summit
June 16, 2015
Hadoop Summit 2015
1.
© 2015 IBM Corporation
Information Virtualization: Query Federation on Data Lakes
Beate Porst (porst@us.ibm.com), Product Manager, Information Server
Jo Ramos (joramos@us.ibm.com), Distinguished Engineer, Big Data and Analytics, IBM
2.
Agenda
• Data Lakes and Data Reservoirs
• Information Virtualization and Federation
• Examples of Federation and Best Practices
• Information Integration on Hadoop
3.
The true value of Big Data is in context
Progression: Raw data, Feature extraction metadata, Domain linkages, Full contextual analytics
Example: Patient records, placed in context with:
• Location risk
• Occupational risk
• Dietary risk
• Family history
• Actuarial data
• Government statistics
• Epidemic data
• Chemical exposure
• Personal financial situation
• Social relationships
• Travel history
• Weather history
• ...
4.
A growing data demand … and organizational tensions
• Data scientists seeking data for new analytics models.
• Marketers seeking data for new campaigns.
• Fraud investigators seeking data to understand the details of suspicious activity.
Competing priorities: Agility, Data Access Freedom, Any kind of data, Powerful Analysis & Visualization versus Security, Data Privacy, Standards, ...
Roles shown: Application Developer, Knowledge Worker, Lines of Business, IT Organization
5.
Why a Data Reservoir and Not a Lake
• Data Lake: data flows in "naturally" and just sits there
• Data Reservoir: built to extract value from the data
6.
The Data Reservoir subsystems
[Diagram labels]
• Information Management and Governance Fabric
• Data Reservoir Repositories: Sandbox, Master Data Management, Cache Data, Data Marts, Operational Data Stores, Information Warehouse (EDW), Deep Data (aka Hadoop, aka Data Lake)
• Access paths: Catalogue, Self-Service Access, Enterprise IT Data Exchange, Raw Data Interaction
• Users: Analytics Teams; Governance, Risk and Compliance Team; Information Curator; Line of Business Teams; Data Reservoir Operations
• Enterprise IT: New Sources, System of Record, Systems of Engagement
7.
Data Reservoir Logical Architecture
[Architecture diagram. Recoverable labels:]
• Data Reservoir Repositories: Harvested Data, Descriptive Data, Shared Operational Data, Deposited Data, Historical Data. Named stores: Information Warehouse, Information Views, Catalog, Search Index, Asset Hub, Activity Hub, Code Hub, Content Hub, Deep Data, Audit Data, Operational History, Offline Archive
• Information Integration & Governance: Information Broker, Operational Governance Hub, Code Hub, Workflow, Staging Areas, Guards, Monitor
• Catalog Interfaces
• Raw Data Interaction: Sandboxes
• View-based Interaction: Access and Feedback, Published Sandboxes, Reporting Data Marts, Object Cache
• Enterprise IT Interaction: Service Interfaces, Data Ingestion, Publishing Feeds, Continuous Analytics, Streaming Analytics, Event Correlation
• External parties: Line of Business Applications, Data Reservoir Operations, Enterprise IT (System of Record Applications, Enterprise Service Bus, Systems of Engagement), New Sources (Third Party Feeds, Third Party APIs, Internal Sources), Other Systems of Insight, Other Data Reservoirs, Consumers of Insight, Governance, Risk and Compliance Team, Information Curator
• Flows: Information Service Calls, Search Requests, Report Requests, Data Access, Data Deposit, Data In, Data Out, Events to Evaluate, Notifications, Deploy Decision Models, Deploy Real-time Decision Models, Decision Model Management, Curation, Interaction Management, Understand Information Sources, Advertise Information Source, Understand Compliance, Report Compliance
8.
INFORMATION VIRTUALIZATION & FEDERATION
9.
Information Virtualization hides the complexity of the information landscape
[Diagram labels]
• Users: Data Scientist, Line of Business
• Actions: Report on Values, View related Values, Search Values, Browse Sources, Analyze Values, Provision
• Layers: Information Virtualization, Information Provisioning, Information Delivery, Data Access APIs, Semantic/Business Objects
10.
Different Styles of Information Provisioning
• Federation
• Replication
• Caching
• Consolidation
[Diagram labels: Analytical & Reporting Tools, Web Applications, Product Performance, Real-time Inventory Level, Headquarters, Stores, Primary Data Center, Backup Data Center, Cache, Region 1 Product Performance, Region 2 Product Performance, Database]
11.
Example – Integrating the enterprise across independent silos
The optimal approach depends on how consistent the data is across the silos, how much spare capacity each silo has to support additional queries, and the appropriate availability of all silos to answer a global query.
[Diagram: two ways to build a Global View over Silo 1, Silo 2 and Silo 3. Labels: ETL transforming data for consistency; Federated Queries; Consistent Data Sources]
12.
Example – Creating a logical warehouse
Information virtualization hides the complexities of where the data is located. Here different repositories are being used to host different workloads, but this complexity is hidden by the information virtualization layer.
• Detailed data maintained for exploratory analysis and investigations.
• Structured information optimized for complex analytics and reporting.
[Diagram labels: Deep Data (Hadoop system), System of Record, Requested View]
13.
Information Federation Process
[Diagram: a Virtual Information Collection reached through three styles of federation]
Database Federation
• Relational data only
• SQL pushdown
• Challenges: query optimization, out-of-memory, complex SQL/joins
Service Federation
• Data is combined in memory
• Technology: SOA, message broker, Spark, BI & reporting tools
• Challenges: performance (network, memory, etc.)
Semantic Federation
• Use a triple store and ontology to create the virtualized interfaces on the fly. New technology, i.e. Spark
• Challenges: query optimization, security
14.
IBM FEDERATION SOLUTIONS
15.
BigSQL Fluid Query (federation)
Data never lives in isolation
• Whether Hadoop serves as a landing zone or as a queryable archive, it is desirable to query data across Hadoop and active data warehouses
Big SQL provides the ability to query heterogeneous systems
• Join Hadoop to other relational databases
• The query optimizer understands the capabilities of the external system, including available statistics
• As much work as possible is pushed to each system to process
[Diagram: a Big SQL head node and four compute nodes, each running Big SQL alongside a Task Tracker and a Data Node]
16.
BigSQL Fluid Query: Federation to RDBMS Engines
Application needs to join Table-1, Table-2 and Table-3.
[Diagram: Applications / User Interaction send SQL to the BIGSQL MPP Engine on BigInsights (Hadoop). Table-3 is local to Hadoop on HDFS & GPFS (file formats: Parquet, CSV, Seq, RC, Avro, JSON, ORC, Custom). Table-1 and Table-2 are local to the relational database engines (Oracle, Teradata, Netezza, DB2). How the SQL reaches those engines is left open.]
17.
BigSQL Fluid Query: Federation to RDBMS Engines
Application needs to join Table-1, Table-2 and Table-3.
1. Create an alias for Table-1 and Table-2 on the BigSQL Federation Engine.
[Diagram: Applications / User Interaction send SQL to the BIGSQL MPP Engine and Federation Engine on BigInsights (Hadoop). Table-3 is local on HDFS & GPFS (file formats: Parquet, CSV, Seq, RC, Avro, JSON, ORC, Custom). Table-1 and Table-2 appear as aliases and are local to the relational database engines (Oracle, Teradata, Netezza, DB2).]
18.
BigSQL Fluid Query: Federation to RDBMS Engines
Application needs to join Table-1, Table-2 and Table-3.
1. Create an alias for Table-1 and Table-2 on the BigSQL Federation Engine.
2. The query optimizer pushes part of the SQL down to be executed on the remote RDBMS.
3. The final join/aggregation is executed on BigSQL.
• Joins, predicates and aggregation are pushed down to the backend RDBMS engine to reduce data transfers.
[Diagram: Applications / User Interaction send SQL to the BIGSQL MPP Engine and Federation Engine on BigInsights (Hadoop). Table-3 is local on HDFS & GPFS (file formats: Parquet, CSV, Seq, RC, Avro, JSON, ORC, Custom). Table-1 and Table-2 appear as aliases; SQL travels through client drivers to the relational database engines (Oracle, Teradata, Netezza, DB2), where the tables are local. Arrows distinguish data access from data flow.]
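The three steps above can be illustrated with a minimal Python sketch. It is not Big SQL code: two in-memory SQLite databases stand in for the remote RDBMS and the local engine, and every name (`orders`, `customers`, `ALIASES`, `pushed_down_totals`, `federated_totals`) is a hypothetical chosen for the example. The point is only the shape of the flow: the filter and aggregation run where the remote data lives, and just the reduced result is joined locally.

```python
import sqlite3

# Hypothetical stand-ins: one in-memory SQLite database plays the remote
# RDBMS, another plays the local Hadoop-side SQL engine.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL, region TEXT)")
remote.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "EU"), (1, 5.0, "EU"), (2, 7.0, "US"), (3, 4.0, "EU")],
)

local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
local.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Edsger")],
)

# Step 1: an "alias" maps a table name visible to the local engine
# onto the connection that actually holds the table.
ALIASES = {"orders": remote}


def pushed_down_totals(region):
    """Step 2: run the predicate and aggregation on the remote engine."""
    conn = ALIASES["orders"]
    sql = (
        "SELECT customer_id, SUM(amount) FROM orders "
        "WHERE region = ? GROUP BY customer_id"
    )
    return conn.execute(sql, (region,)).fetchall()


def federated_totals(region):
    """Step 3: join the reduced remote result with the local table."""
    partial = pushed_down_totals(region)
    local.execute("DROP TABLE IF EXISTS remote_totals")
    local.execute("CREATE TEMP TABLE remote_totals (customer_id INTEGER, total REAL)")
    local.executemany("INSERT INTO remote_totals VALUES (?, ?)", partial)
    sql = (
        "SELECT c.name, t.total FROM customers c "
        "JOIN remote_totals t ON c.customer_id = t.customer_id "
        "ORDER BY c.name"
    )
    return local.execute(sql).fetchall()
```

In this toy data set, `pushed_down_totals("EU")` returns two aggregated rows instead of the three matching detail rows, which is the data-transfer saving the slide describes.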
19.
IBM Fluid Query V1.0
Unifying PureData System for Analytics (PDA) with Hadoop
Connectors:
• Routes PDA (Netezza) queries to the top Hadoop providers
Data movement:
• Allows rapid data movement between PDA and Hadoop
• PDA to Hadoop
• Hadoop to PDA
Initial supported Hadoop SQL query engines:
• BigInsights: Hive2, BigSQL v1, BigSQL v3, BigSQL v4
• Hortonworks: Hive2
• Cloudera: Hive2, Impala
20.
Netezza Fluid Query to Hadoop Engines
Application needs to join Table-1, Table-2 and Table-3.
• Joins, predicates and aggregation are applied on Hadoop via views to minimize data transfers.
• Final joins, predicates and aggregation are applied on Netezza.
[Diagram: Applications / User Interaction send SQL to the NPS MPP Engine on PureData for Analytics (Netezza), where Table-3 is local. Fluid Query exposes Table-1 and Table-2 as aliases and reaches Impala / Hive and BigSQL through client drivers. Table-1 and Table-2 are local to Hadoop on HDFS (file formats: Parquet, CSV, Seq, RC, Avro, JSON, ORC). Arrows show the data flow.]
21.
Query Federation Best Practices
Avoid complex joins across multiple disparate repositories
• Example: joining tables from BigSQL, Oracle, Teradata and Netezza in the same SQL statement.
• Consider other techniques (copying data locally, caching, etc.)
Keep statistics current on every table that is part of the federated system
• Statistics are critical for query optimization.
Watch out for network bandwidth and traffic
• You can overload the network with large data transfers (intermediate results need to be generated)
Consider implementing workload management and a query governor
• Prevent a federated query from overloading a system.
Avoid complex data transformations (in-flight transformation)
• Can impact any of the involved systems
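One simple form of the query governor mentioned above is a cap on how many rows a federated query may pull from a source. The sketch below is a hypothetical illustration in Python with SQLite, not the workload management feature of any IBM product; `governed_fetch`, `RowLimitExceeded` and the `readings` table are names invented for the example.

```python
import sqlite3


class RowLimitExceeded(RuntimeError):
    """Raised when a query returns more rows than the governor allows."""


def governed_fetch(conn, sql, params=(), max_rows=1000):
    """Run a query, refusing to return more than max_rows rows."""
    cursor = conn.execute(sql, params)
    # Ask for one row beyond the limit so an overflow can be detected
    # without reading the whole result set.
    rows = cursor.fetchmany(max_rows + 1)
    if len(rows) > max_rows:
        raise RowLimitExceeded(f"query returned more than {max_rows} rows")
    return rows


# Small demo source (hypothetical data).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE readings (n INTEGER)")
demo.executemany("INSERT INTO readings VALUES (?)", [(i,) for i in range(5)])
```

A real governor would more likely limit elapsed time, estimated cost or concurrency as well; a row cap is just the easiest limit to show in a few lines.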
22.
When to Apply Federation
• Building multi-temperature data systems: hot/warm/cold data on different repositories
• Data is changing dynamically, in particular through schema evolution
• Federated queries can perform reasonably without impacting any of the systems involved
• Real-time access to a small set of data on distributed systems
• Remote data cannot be moved to a local system (regulatory issues)
• The number of federated queries is manageable
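The multi-temperature case can be sketched as a router that sends a query only to the repositories whose data overlaps the requested date range. This is a hypothetical Python illustration: the two lists stand in for a hot and a cold repository, and `HOT_CUTOFF`, `stores_for_range` and `federated_sum` are invented names, not part of any product.

```python
from datetime import date

# Hypothetical boundary: rows on or after this date live in the hot store.
HOT_CUTOFF = date(2015, 1, 1)

# Each store holds (day, value) rows; plain lists stand in for repositories.
hot_store = [(date(2015, 3, 1), 120), (date(2015, 5, 20), 80)]
cold_store = [(date(2013, 7, 4), 200), (date(2014, 11, 30), 50)]


def stores_for_range(start, end):
    """Pick only the repositories whose date range overlaps the query."""
    targets = []
    if start < HOT_CUTOFF:
        targets.append(("cold", cold_store))
    if end >= HOT_CUTOFF:
        targets.append(("hot", hot_store))
    return targets


def federated_sum(start, end):
    """Sum the values with start <= day <= end across the selected stores."""
    total = 0
    for _name, rows in stores_for_range(start, end):
        total += sum(value for day, value in rows if start <= day <= end)
    return total
```

A query confined to 2015 touches only the hot store, while one that spans the cutoff is answered from both, which is the situation where federation saves copying cold data into the hot system.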
23.
Some considerations to provide access to information
Access in place
• Up-to-date information
• Cost-effective
• Slower access path (remote access, reformatting)
Make a local copy
• Specially formatted for the use case
• Local data access
• Local control
• Local cost
• Potentially stale values
Consider these questions and make the best choice
• How much information?
• How rapidly is it changing?
• How frequently is it accessed?
• How much transformation is required to consume the information?
• When is the information available?
• Who owns the information?
• How easily can it be changed?
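The trade-off between access in place and a local copy can be made concrete with a small time-to-live cache. This is a hypothetical Python sketch: `CachedSource` and its parameters are invented for illustration, and the clock is passed in so the staleness window is easy to see and test.

```python
class CachedSource:
    """Serve values from a local copy, refreshing it after ttl_seconds."""

    def __init__(self, fetch_remote, ttl_seconds, clock):
        self._fetch_remote = fetch_remote  # callable: key -> current remote value
        self._ttl = ttl_seconds
        self._clock = clock                # callable returning seconds as a number
        self._cache = {}                   # key -> (time fetched, value)
        self.remote_calls = 0

    def get(self, key):
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            # Local access: fast, but the value may be stale.
            return entry[1]
        # Access in place: up to date, at the cost of a remote call.
        self.remote_calls += 1
        value = self._fetch_remote(key)
        self._cache[key] = (now, value)
        return value
```

With `ttl_seconds=0` every read goes to the remote source (pure access in place); a larger value trades freshness for fewer remote calls. In real use the clock could be `time.monotonic`.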
24.
IBM INFORMATION SERVER FOR HADOOP
25.
The Data Reservoir subsystems
[Diagram labels]
• Information Management and Governance Fabric
• Data Reservoir Repositories: Sandbox, Master Data Management, In-Memory Cache, Data Marts, Operational Data Stores, Information Warehouse (EDW), Deep Data (aka Hadoop, aka Data Lake)
• Access paths: Catalogue, Self-Service Access, Enterprise IT Data Exchange, Raw Data Interaction
• Users: Analytics Teams; Governance, Risk and Compliance Team; Information Curator; Line of Business Teams; Data Reservoir Operations
• Enterprise IT: New Sources, System of Record, Systems of Engagement
26.
Introducing IBM Information Server for Apache Hadoop: Information Empowerment for Your Hadoop Environment
Superfast data ingest and processing
• Integrate, prepare and enrich data with speed and confidence, running natively on Hadoop with speeds 10-15x faster than MapReduce
Complete confidence in your data
• Understand what data is available and where it came from; monitor and cleanse the quality of data; catalog metadata assets and trace lineage
Higher level of productivity
• Develop integration processes much faster than with hand coding, based on existing enterprise skills
• Graphical data flow development environment with 100s of prebuilt stages and 1000s of prebuilt functions
Callouts:
• No other vendor has this scale or speed
• Extend existing leadership into the Hadoop domain
• Proven development paradigm
27.
Execute "Anywhere": One Integration & Quality Design
Maximize your IT resources utilization through "anywhere" execution
• Optimize your integration and DQ workload based on data locality and resource availability
• Design your transformation or cleansing once and run it on your Hadoop cluster, on your traditional engine, or optimized to run on your database
• This release adds this pattern to run natively on the Hadoop cluster
[Diagram labels: Traditional ETL Engine, Databases]
28.
Questions?
29.
REFERENCE MATERIAL
New Information Architectures and Capabilities