Watch the full session: Denodo DataFest 2016 sessions: https://goo.gl/ptGwp7
Curious about the product roadmap? In this session, we will review some of the key new features introduced this year in the Denodo Platform in areas such as performance, self-service, security and monitoring. We will also take a sneak peek at the most exciting features on the roadmap for Denodo 7.0.
In this session, you will learn:
• New performance-related features in big data scenarios
• New governance and self-service features
• New connectivity, data transformation, and enterprise-wide deployment features
This session is part of the Denodo DataFest 2016 event. You can also watch more Denodo DataFest sessions on demand here: https://goo.gl/VXb6M6
Denodo DataFest 2016: What’s New in Denodo Platform – Demo and Roadmap
1. OCTOBER 18, 2016 – SAN FRANCISCO BAY AREA, CA
#DenodoDataFest
RAPID, AGILE DATA STRATEGIES
For Accelerating Analytics, Cloud, and Big Data Initiatives.
2. What’s New in Denodo Platform
Dr. Alberto Pan
Denodo, CTO
5. Main Areas
Performance in Big Data Scenarios
• Dynamic Query Optimizer for Big Data (Denodo 6)
• Incremental queries (Denodo 6 Updates)
• Embedded in-memory fabric (Denodo 7)
Security, Governance and Self-Service
• New Information Self-Service Tool (Denodo 6)
• Information Self-Service: Glossary and Collaboration Features (Denodo 7)
• Tighter integration with Data Governance and Data Modeling Tools (Denodo 7)
Enterprise-Wide Deployments
• Workload Management: Denodo Resource Manager (Denodo 6)
• Monitoring and Diagnostic Tool (Denodo 6 Updates)
• Solution Manager (Denodo 7)
• New VDP Admin Tool (Denodo 6)
• Git Support (Denodo 6)
Connectivity and Data Transformation
• Support for new data sources and publishing formats (continuous work)
• New Data Types (Denodo 7)
6. Move Processing to the Data
Process the data where it resides
• Process the data locally where it resides
• The DV system combines partial results
• Minimizes network traffic
• Leverages specialized data sources
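To illustrate why moving processing to the data reduces network traffic, here is a minimal Python sketch under simplifying assumptions: two hypothetical sources (`SOURCE_A`, `SOURCE_B`) each hold part of a sales table as in-memory lists, each source computes a local SUM per customer (the pushed-down work), and the DV layer only combines the partial results. The function names are illustrative, not Denodo APIs.

```python
from collections import Counter

# Hypothetical horizontal partitions of a sales table: (customer_id, amount)
SOURCE_A = [("c1", 10.0), ("c2", 5.0), ("c1", 2.5), ("c3", 7.0)]
SOURCE_B = [("c2", 1.0), ("c3", 3.0), ("c3", 4.0)]


def local_sum(rows):
    """Runs 'at the source': aggregate locally, return one row per customer."""
    totals = Counter()
    for customer, amount in rows:
        totals[customer] += amount
    return totals


def combine(partials):
    """Runs in the DV layer: add up the partial sums returned by every source."""
    result = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds values for matching keys
    return dict(result)


partials = [local_sum(SOURCE_A), local_sum(SOURCE_B)]
totals = combine(partials)
rows_moved = sum(len(p) for p in partials)  # groups shipped over the network
rows_raw = len(SOURCE_A) + len(SOURCE_B)    # rows shipped without push-down
print(totals)
print(f"rows transferred: {rows_moved} instead of {rows_raw}")
```

This works because SUM is decomposable; an AVG, for example, would need SUM and COUNT pushed down separately and divided only after combining.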
7. How to Choose the Best Execution Plan?
Cost-Based Optimization in Data Virtualization
Must take into account:
• Data statistics to estimate the size of intermediate result sets
• Data source indexes (and other physical structures)
• Execution model of the data sources: e.g. parallel databases vs. Hadoop clusters vs. relational databases
• Features of the data sources (e.g. number of processing cores in a parallel database or Hadoop cluster)
• Data transfer rate
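The factors above can be combined into a cost estimate per candidate plan, and the cheapest plan wins. The following Python sketch is a deliberately toy cost model, not Denodo's optimizer: the throughput constants and the 32-core source are hypothetical, indexes and execution-model differences are folded into a single rows-per-core rate, and the row counts are borrowed from the benchmark scenario for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    rows_processed_at_source: float  # estimated from data statistics
    source_cores: int                # processing cores of the (parallel) source
    rows_transferred: float          # estimated size of the intermediate result
    rows_processed_locally: float    # work left for the DV engine


# Hypothetical throughputs in rows per second; a real optimizer would measure them.
ROWS_PER_CORE_PER_SEC = 1_000_000
TRANSFER_ROWS_PER_SEC = 200_000
LOCAL_ROWS_PER_SEC = 2_000_000


def estimated_cost(plan: Plan) -> float:
    """Rough execution-time estimate (seconds) for one candidate plan."""
    source_time = plan.rows_processed_at_source / (plan.source_cores * ROWS_PER_CORE_PER_SEC)
    transfer_time = plan.rows_transferred / TRANSFER_ROWS_PER_SEC
    local_time = plan.rows_processed_locally / LOCAL_ROWS_PER_SEC
    return source_time + transfer_time + local_time


def choose_plan(plans):
    """Cost-based choice: keep the plan with the lowest estimated cost."""
    return min(plans, key=estimated_cost)


plans = [
    Plan("no push-down (move all rows)", 290_000_000, 32, 290_000_000, 290_000_000),
    Plan("full aggregation push-down", 290_000_000, 32, 2_000_000, 2_000_000),
]
for p in plans:
    print(f"{p.name}: {estimated_cost(p):.1f} s")
print("chosen:", choose_plan(plans).name)
```

Even in this crude model the transfer term dominates, which is why estimating intermediate result sizes accurately matters so much.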
8. Performance Comparison – Logical Data Warehouse vs. Physical Data Warehouse
Denodo 6.0 Architecture
Denodo has done extensive testing using queries from the standard TPC-DS* benchmark in the following scenario, which compares the performance of a federated approach in Denodo with an MPP system where all the data has been replicated via ETL. Both sides contain the same tables: Customer Dim. (2 M rows), Sales Facts (290 M rows) and Items Dim. (400 K rows).
• Logical data warehouse: the tables stay in their original sources and are federated by Denodo
vs.
• Physical data warehouse: all tables are replicated via ETL into the MPP system
* TPC-DS is the de facto industry-standard benchmark for measuring the performance of decision support solutions including, but not limited to, Big Data systems.
9. Performance Comparison – Logical Data Warehouse vs. Physical Data Warehouse
Denodo 6.0 Architecture
Query Description | Returned Rows | Time Netezza | Time Denodo (Federated Oracle, Netezza & SQL Server) | Optimization Technique (automatically selected)
Total sales by customer | 1.99 M | 20.9 sec. | 21.4 sec. | Full aggregation push-down
Total sales by customer and year between 2000 and 2004 | 5.51 M | 52.3 sec. | 59.0 sec. | Full aggregation push-down
Total sales by item brand | 31.35 K | 4.7 sec. | 5.0 sec. | Partial aggregation push-down
Total sales by item where sale price is less than current list price | 17.05 K | 3.5 sec. | 5.2 sec. | On-the-fly data movement
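The "partial aggregation push-down" in the third row can be pictured as follows, assuming the sales facts and the item dimension (which holds the brand) live in different sources: the GROUP BY brand cannot be pushed entirely to the fact source, but a GROUP BY item_id can, so only per-item subtotals travel to the DV layer, which joins them with the items and re-aggregates by brand. This Python sketch uses small hypothetical data; it illustrates the general technique, not Denodo's internal implementation.

```python
from collections import defaultdict

# Hypothetical sample data; in the benchmark these tables live in different systems.
SALES = [  # fact source: (item_id, amount)
    (1, 100.0), (2, 50.0), (1, 25.0), (3, 10.0), (2, 5.0), (4, 40.0),
]
ITEMS = {1: "Acme", 2: "Acme", 3: "Globex", 4: "Globex"}  # dimension: item_id -> brand


def pushed_down_sum_by_item(sales):
    """Executed by the fact source: SUM(amount) GROUP BY item_id."""
    totals = defaultdict(float)
    for item_id, amount in sales:
        totals[item_id] += amount
    return dict(totals)


def sum_by_brand(item_totals, items):
    """Executed by the DV layer: join subtotals with items, then GROUP BY brand."""
    by_brand = defaultdict(float)
    for item_id, subtotal in item_totals.items():
        by_brand[items[item_id]] += subtotal
    return dict(by_brand)


item_totals = pushed_down_sum_by_item(SALES)  # one row per item crosses the network
print(sum_by_brand(item_totals, ITEMS))
```

The gain grows with the ratio of fact rows to distinct items; in the benchmark that is 290 M sales rows against at most 400 K items.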
10. Incremental Queries
New Caching Mode for SaaS Data Sources
• Merge cached data with delta changes from the data source
• Real-time results with minimum latency
• The data source needs to provide a way to obtain the delta changes
Diagram: the cache holds Leads updated at 1:00 AM; a query gets the Leads changed/added since 1:00 AM from the source, and merging both yields up-to-date Leads data.
11. Full Cache – Incremental Queries
Configuration
1. Cached data
2. New data from source
3. Merged based on PK
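The three steps above (cached data, new data from the source, merge on the primary key) can be sketched in Python as follows. The Leads records, timestamps and the helper `get_changed_since` are hypothetical stand-ins for a SaaS source's "changed since" query; this is not Denodo's cache implementation.

```python
from datetime import datetime

# Hypothetical current contents of the SaaS source (e.g. a Leads object).
SOURCE_LEADS = [
    {"id": 101, "status": "open", "modified": datetime(2016, 10, 18, 0, 30)},
    {"id": 102, "status": "won", "modified": datetime(2016, 10, 18, 9, 15)},
    {"id": 103, "status": "open", "modified": datetime(2016, 10, 18, 10, 5)},
]

# 1. Cached data as loaded at 1:00 AM, keyed by primary key (lead 103 did not exist yet).
LAST_REFRESH = datetime(2016, 10, 18, 1, 0)
CACHE = {
    101: {"id": 101, "status": "open", "modified": datetime(2016, 10, 18, 0, 30)},
    102: {"id": 102, "status": "open", "modified": datetime(2016, 10, 18, 0, 45)},
}


def get_changed_since(rows, since):
    """2. Stand-in for the source's delta query ('Leads changed/added since 1:00 AM')."""
    return [row for row in rows if row["modified"] > since]


def merge_incremental(cached, delta, pk="id"):
    """3. Merge on the primary key: changed rows replace cached ones, new rows are added."""
    merged = dict(cached)  # copy, so the cache itself is left untouched
    for row in delta:
        merged[row[pk]] = row
    return merged


delta = get_changed_since(SOURCE_LEADS, LAST_REFRESH)
result = merge_incremental(CACHE, delta)
print({pk: row["status"] for pk, row in result.items()})
```

Only two rows cross the network instead of the whole object. Note that this sketch ignores deleted records, which the source would have to report through some separate mechanism.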
12. Parallel In-Memory Fabric
Embedded in-memory fabric fully integrated with cost-based optimization (Denodo 7)
• Embedded in-memory fabric: MPP processing of costly local processing operations
• External in-memory fabrics supported
• Integrated with cost-based optimization
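MPP processing of a costly local operation, such as a large GROUP BY, typically means hash-partitioning the rows across nodes so each key lands on exactly one node, aggregating every partition in parallel, and concatenating the results. This Python sketch uses threads in one process as a stand-in for cluster nodes (so, because of CPython's GIL, it shows the data flow rather than a real speed-up); `NUM_NODES` and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical number of nodes in the in-memory fabric


def partition(rows, num_nodes):
    """Hash-partition (key, value) rows so that each key goes to exactly one node."""
    parts = [[] for _ in range(num_nodes)]
    for key, value in rows:
        parts[hash(key) % num_nodes].append((key, value))
    return parts


def node_aggregate(rows):
    """Work done independently on one node: SUM(value) GROUP BY key."""
    totals = Counter()
    for key, value in rows:
        totals[key] += value
    return totals


def parallel_group_by(rows, num_nodes=NUM_NODES):
    parts = partition(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        results = list(pool.map(node_aggregate, parts))
    # Keys are disjoint across nodes after hash partitioning, so a plain union suffices.
    merged = {}
    for node_result in results:
        merged.update(node_result)
    return merged


rows = [(i % 10, 1) for i in range(1000)]
print(parallel_group_by(rows))
```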
14. Information Discovery and Self-Service (1)
Graphically Expose Data Views to Business Users
• Search and query data and metadata
• Browse data associations
• Transform and combine views
• Publish results to Denodo or your favourite reporting tool
Find more details at: http://www.datavirtualizationblog.com/data-exploration-and-self-service-bi-welcome-to-the-dataweb/
19. Information Self-Service Tool: 6.0 Updates
Enhancements in 6.0 Updates
• Support for Solr and Elasticsearch in Global Search
• See the folder structure
• See web services
• Improved metadata search
• Support for specifying field descriptions
20. Information Self-Service Tool: Denodo 7 (1)
Extended Metadata and Components Catalog
• Categorized/tagged catalog of data components to associate views and business terms
• Extended metadata fields
• Ability to edit metadata
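A categorized/tagged catalog that associates views with business terms, carries editable extended metadata and supports search can be modelled minimally as below. This is a hypothetical Python sketch of the concept (the class names, fields and sample views are invented for illustration), not the Information Self-Service Tool's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    view: str
    description: str
    categories: set = field(default_factory=set)         # e.g. "Sales", "Finance"
    business_terms: set = field(default_factory=set)     # glossary terms linked to the view
    extra_metadata: dict = field(default_factory=dict)   # extended, editable fields


class Catalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.view] = entry

    def get(self, view):
        return self._entries[view]

    def edit_metadata(self, view, **fields):
        """Add or overwrite extended metadata fields of a registered view."""
        self._entries[view].extra_metadata.update(fields)

    def by_term(self, term):
        """Names of the views associated with a business term, sorted."""
        return sorted(e.view for e in self._entries.values() if term in e.business_terms)

    def search(self, text):
        """Case-insensitive substring search over names, descriptions and tags."""
        needle = text.lower()
        hits = []
        for e in self._entries.values():
            haystack = " ".join([e.view, e.description, *e.categories, *e.business_terms])
            if needle in haystack.lower():
                hits.append(e.view)
        return sorted(hits)


catalog = Catalog()
catalog.register(CatalogEntry("customer_360", "Unified customer view",
                              {"Sales"}, {"Customer"}))
catalog.register(CatalogEntry("total_sales_by_brand", "Sales aggregated by item brand",
                              {"Sales", "Finance"}, {"Revenue", "Brand"}))
catalog.edit_metadata("customer_360", owner="crm-team")
print(catalog.by_term("Revenue"))
print(catalog.search("brand"))
```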
21. Information Self-Service Tool: Denodo 7 (2)
Governance and Collaboration Features
• Publish/share new components to the catalog
• Governance:
- Approval process
- Stewards
• Public and private comments
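The publish, approve and comment flow listed above can be thought of as a small state machine. In this Python sketch, the states, the rule that only a listed steward may approve, and the rule that a private comment is visible only to its author are all assumptions made for illustration; the slide does not specify how Denodo 7 implements them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    PENDING_APPROVAL = "pending approval"
    PUBLISHED = "published"
    REJECTED = "rejected"


@dataclass
class Comment:
    author: str
    text: str
    private: bool = False


@dataclass
class Component:
    name: str
    owner: str
    status: Status = Status.DRAFT
    comments: list = field(default_factory=list)

    def submit(self):
        """The owner shares the component; it then waits for a steward's decision."""
        if self.status is not Status.DRAFT:
            raise ValueError(f"cannot submit from state '{self.status.value}'")
        self.status = Status.PENDING_APPROVAL

    def review(self, steward, stewards, approve):
        """Assumed rule: only a designated steward may approve or reject."""
        if steward not in stewards:
            raise PermissionError(f"{steward} is not a steward")
        if self.status is not Status.PENDING_APPROVAL:
            raise ValueError("nothing to review")
        self.status = Status.PUBLISHED if approve else Status.REJECTED

    def add_comment(self, author, text, private=False):
        self.comments.append(Comment(author, text, private))

    def visible_comments(self, user):
        """Assumed rule: public comments for everyone, private ones for their author only."""
        return [c.text for c in self.comments if not c.private or c.author == user]


STEWARDS = {"maria"}
comp = Component("total_sales_by_brand", owner="john")
comp.submit()
comp.review("maria", STEWARDS, approve=True)
comp.add_comment("ana", "Great for the quarterly report")
comp.add_comment("john", "TODO: add returns", private=True)
print(comp.status.value, comp.visible_comments("ana"))
```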
23. Denodo Resource Manager
Controlled Resource Allocation
1. Define a rule that is triggered for “app1” and users with the role “reporting”.
2. For the requests that match the rule, if CPU usage is greater than 85%, apply the following:
• Reduce thread priority
• Reduce the number of concurrent requests
• Limit the number of queued queries
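The rule above has two parts: a match condition (application and role) and a resource condition (CPU above 85%) that, when both hold, yields a set of restrictions. This Python sketch models only that logic. It is not the Resource Manager's configuration syntax, and the concrete limits (5 concurrent requests, 10 queued queries) are hypothetical because the slide gives none.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    application: str
    roles: frozenset


@dataclass(frozen=True)
class Restrictions:
    lower_thread_priority: bool
    max_concurrent_requests: int
    max_queued_queries: int


@dataclass(frozen=True)
class Rule:
    application: str
    role: str
    cpu_threshold: float  # percent
    restrictions: Restrictions

    def matches(self, request: Request) -> bool:
        """Step 1: the rule only concerns requests from its application and role."""
        return request.application == self.application and self.role in request.roles

    def evaluate(self, request: Request, cpu_percent: float) -> Optional[Restrictions]:
        """Step 2: return the restrictions to enforce, or None when the rule does not fire."""
        if self.matches(request) and cpu_percent > self.cpu_threshold:
            return self.restrictions
        return None


# Hypothetical limits; the slide does not give concrete values.
reporting_rule = Rule("app1", "reporting", 85.0,
                      Restrictions(lower_thread_priority=True,
                                   max_concurrent_requests=5,
                                   max_queued_queries=10))

req = Request("app1", frozenset({"reporting"}))
print(reporting_rule.evaluate(req, cpu_percent=92.0))  # restrictions apply
print(reporting_rule.evaluate(req, cpu_percent=60.0))  # None: CPU below threshold
```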
24. Monitoring and Diagnostic Tool (1)
Graphical Monitoring and Diagnosing of Servers and Clusters
• Monitor the current state of servers and clusters
• Inspect sessions, queries (with real-time trace), connections,...
• Inspect data source activity, cache load processes and content,...
• Go back in time to the moment when a problem happened
• Diagnose the root cause of the problem
25. Monitoring and Diagnostic Tool (2)
Graphical Monitoring and Diagnosing of Servers and Clusters
• State: summary of the state of the server/environment
• Resources: physical resources (memory, CPU,…)
• Requests: including real-time execution trace
• Sessions: currently open sessions, including client application
• Cache: cache load processes, cache contents,...
• Data sources: pool state, active requests,...
• Threads: priorities, CPU usage,...
• Errors: inspect logged errors and warnings
• … and many others
Filter and sort information by any criteria
26. Monitoring and Diagnostic Tool (3)
Automatic Alerts (Denodo 6.0 Updates)
Alerts (visual / e-mail):
• Server down
• Data source or cache down
• % CPU usage
• Connection pool full
• …
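Alert conditions like these boil down to checks over a periodic monitoring snapshot. The Python sketch below is hypothetical throughout: the `Snapshot` fields, the 90% CPU threshold and the message texts are assumptions, and delivery is reduced to printing (a visual channel); it does not reflect how the Denodo tool collects metrics or sends e-mail.

```python
from dataclasses import dataclass, field

CPU_ALERT_PERCENT = 90.0  # hypothetical threshold


@dataclass
class Snapshot:
    server_up: bool
    cache_up: bool
    cpu_percent: float
    sources_up: dict = field(default_factory=dict)   # data source -> reachable?
    pool_active: dict = field(default_factory=dict)  # data source -> connections in use
    pool_max: dict = field(default_factory=dict)     # data source -> pool size


def check_alerts(s):
    """Return the alert messages raised by one monitoring snapshot."""
    if not s.server_up:
        return ["Server down"]  # other metrics cannot be trusted if the server is down
    alerts = [f"Data source down: {name}" for name, up in s.sources_up.items() if not up]
    if not s.cache_up:
        alerts.append("Cache down")
    if s.cpu_percent > CPU_ALERT_PERCENT:
        alerts.append(f"CPU usage at {s.cpu_percent:.0f}%")
    alerts += [f"Connection pool full: {name}"
               for name, active in s.pool_active.items() if active >= s.pool_max[name]]
    return alerts


snapshot = Snapshot(server_up=True, cache_up=True, cpu_percent=95.0,
                    sources_up={"oracle": True, "salesforce": False},
                    pool_active={"oracle": 20}, pool_max={"oracle": 20})
for alert in check_alerts(snapshot):
    print("ALERT:", alert)
```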
27. Monitoring and Diagnostic Tool (4)
Pre-defined Reports (Denodo 7)
Pre-defined graphical usage reports:
• Workload breakdown by application
• Most used views
• Requests per data source
• …
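Each of these usage reports is essentially a group-and-count over a request log. The Python sketch below assumes a hypothetical log format of `(application, view, data_source)` tuples with one data source per request; Denodo's actual log schema is not described on the slide.

```python
from collections import Counter

# Hypothetical request log entries: (application, view, data_source)
REQUEST_LOG = [
    ("app1", "customer_360", "oracle"),
    ("app1", "total_sales_by_brand", "netezza"),
    ("app2", "customer_360", "oracle"),
    ("app1", "customer_360", "sqlserver"),
    ("app3", "inventory", "sqlserver"),
    ("app2", "total_sales_by_brand", "netezza"),
]

# Workload breakdown by application, as counts and as percentages
workload_by_app = Counter(app for app, _, _ in REQUEST_LOG)
total = sum(workload_by_app.values())
workload_share = {app: round(100 * n / total, 1) for app, n in workload_by_app.items()}

# Most used views (top 2)
most_used_views = Counter(view for _, view, _ in REQUEST_LOG).most_common(2)

# Requests per data source
requests_per_source = Counter(src for _, _, src in REQUEST_LOG)

print(workload_share)
print(most_used_views)
print(dict(requests_per_source))
```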
28. Denodo Solution Manager
Making It Easier to Manage Large Deployments (Denodo 7)
• Catalog of all elements of a Denodo deployment
• Manage licenses, configuration, logs and extensions
• Automate migrations
• Integrated governance workflows
29. Automate Migration Between Environments
Overview of the Migration Process in Denodo 7 (Simplified)
Diagram components: Developers, Migration Admins, Solution Manager, Properties DB, VCS, the Development and Production environments, a load balancer, servers denodo-prd-1 and denodo-prd-2, and nodes S11–S13 and S21–S23.
1. Select elements to migrate
2. Validate revision
3. Register revision
4. Deploy revision
5. Save full VQL after revision
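The five numbered steps can be read as a pipeline in which each step guards the next. This Python sketch is a hypothetical model of that flow, not the Solution Manager API: `MigrationPipeline`, `Revision` and `Environment` are invented names, the element names are made up, and the "full VQL" of step 5 is approximated by a snapshot of what each production server has deployed.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    elements: list
    validated: bool = False


@dataclass
class Environment:
    name: str
    servers: list
    # server -> {element name: id of the revision that deployed it}
    deployed: dict = field(default_factory=dict)

    def __post_init__(self):
        for server in self.servers:
            self.deployed.setdefault(server, {})


class MigrationPipeline:
    """Hypothetical model of the five steps on the slide, not the Solution Manager API."""

    def __init__(self, development_elements):
        self.development = set(development_elements)  # elements that exist in Development
        self.registry = []  # registered revisions; id = position + 1
        self.vcs = []       # (revision id, full snapshot of the environment after deploying)

    def select(self, elements):
        """Step 1: select the elements to migrate."""
        return Revision(list(elements))

    def validate(self, revision):
        """Step 2: reject revisions that reference unknown elements."""
        missing = [e for e in revision.elements if e not in self.development]
        if missing:
            raise ValueError(f"unknown elements: {missing}")
        revision.validated = True

    def register(self, revision):
        """Step 3: register a validated revision and return its id."""
        if not revision.validated:
            raise RuntimeError("validate the revision before registering it")
        self.registry.append(revision)
        return len(self.registry)

    def deploy(self, revision_id, env):
        """Step 4: deploy to every server; step 5: save a full snapshot (stand-in for full VQL)."""
        revision = self.registry[revision_id - 1]
        for server in env.servers:
            for element in revision.elements:
                env.deployed[server][element] = revision_id
        snapshot = {server: dict(elems) for server, elems in env.deployed.items()}
        self.vcs.append((revision_id, snapshot))


pipeline = MigrationPipeline(["customer_360", "total_sales_by_brand"])
production = Environment("Production", ["denodo-prd-1", "denodo-prd-2"])

rev = pipeline.select(["customer_360"])
pipeline.validate(rev)
rev_id = pipeline.register(rev)
pipeline.deploy(rev_id, production)
print(production.deployed)
```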
32. New VDP Admin Tool (2)
Collapsible work areas
33. Other Enhancements
Connectivity:
• New adapters for Spark, Redshift and Snowflake (already available), Presto DB (Q1 2017), Neo4j (Denodo 7)
• New adapters for Denodo in IBM Cognos and Looker (already available), Tableau (Q4 2016)
Transformation / Integration:
• Extended set of geospatial functions and GeoJSON support (Denodo 7)
• Continuous work on transformation functions