Lecture about SAP HANA and Enterprise Computing at University of Halle
- 2. 2© AOK Systems GmbH 2013
Content
HANA Architecture and Use Cases
Enabling Quantitative Approaches
Summary
Chances for Development & IT Management
- 3. 3© AOK Systems GmbH 2013
SAP HANA (formerly "High Performance Analytic Appliance")
HANA
is a hardware appliance from certified vendors with integrated firmware
has standard DBMS features: ACID properties, high availability, SQL and MDX. It
uses MVCC throughout and offers the usual isolation levels, such as
statement-level and snapshot isolation
has specialized engines (calculation and planning engine) and proprietary
languages: SQLScript, RDL, …
supports pushing calculations down to the database level using IMSL, R and
specialized libraries for data mining, machine learning, statistics, optimization
and financial mathematics
offers multi-tenancy only for certain customer scenarios so far; SAP is working
on it
supports text analysis, indexing and search; support for geospatial data has been
announced
supports temporal tables
- 4. 4© AOK Systems GmbH 2013
HANA Hardware
© Hitachi
Different hardware vendors offer appliances: Cisco, Dell, Fujitsu, Hitachi, HP,
IBM, NEC – the solutions differ in details
SAP HANA runs on Intel's Westmere-EX / E7 processors; Intel and SAP
collaborated to optimize HANA for those CPUs
A single HANA node has 128 GB of RAM per CPU; a CPU has 20 cores
HANA uses Fusion-io flash drives as log space, sized the same as the RAM
(logs are written after each transaction)
In addition HANA has disk storage for persistence,
which is about 4 times the RAM size
HANA scales out and can be installed on multiple
nodes. So far scale-out scenarios have been used only within SAP,
but they were recently certified for public use;
experts expect nearly linear scaling
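The sizing rules on this slide can be written down as a small calculation. The following is a minimal sketch, not a sizing tool: the names `NodeSizing`, `size_node` and `disk_factor` are hypothetical, and the disk factor is deliberately left as a parameter. The value 4 used in the example matches the 0.5 TB appliance with 2 TB of persistence storage quoted in the Exalytics excursus.

```python
from dataclasses import dataclass

GB_PER_CPU = 128  # RAM per CPU, as stated on the slide


@dataclass(frozen=True)
class NodeSizing:
    ram_gb: int
    log_gb: int
    disk_gb: int


def size_node(cpus: int, disk_factor: int) -> NodeSizing:
    """Apply the slide's rules of thumb to a single node."""
    if cpus < 1:
        raise ValueError("a node needs at least one CPU")
    ram = GB_PER_CPU * cpus
    # Log space has the same size as RAM; disk is a multiple of RAM.
    return NodeSizing(ram_gb=ram, log_gb=ram, disk_gb=disk_factor * ram)


node = size_node(cpus=4, disk_factor=4)
print(node)  # NodeSizing(ram_gb=512, log_gb=512, disk_gb=2048)
```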
- 5. 5© AOK Systems GmbH 2013
HANA as a multi-core Platform
HANA is an in-memory database optimized for multicore technology: as much as
possible is kept in the CPU and its caches, and the storage hierarchy is used
for persistence
Ailamaki et al. showed that RDBMSs do not run optimally on multicore
processors and are idle up to 50% of the time (see DBMSs on a modern processor:
Where does time go?, Proceedings of the 25th International Conference on Very
Large Data Bases (VLDB), 1999)
Because CPUs have become faster in recent years by adding cores rather than by
raising the clock rate, SAP decided to create a platform that is optimized for
parallel execution
According to SAP, HANA scales up linearly with the size of its RAM
- 6. 6© AOK Systems GmbH 2013
Advantages of Column Stores
In a column store, data is stored using special encodings that save a value
together with the number of its consecutive occurrences (see Plattner, A Common
Database Approach for OLTP and OLAP Using an In-Memory Column Database,
SIGMOD 2009)
This leads to new possibilities and drastic performance gains:
- data can be loaded into the CPU very fast
- column operations, especially aggregates, can be performed very efficiently
- additional indexes (especially materialized ones) can be eliminated
- operations on multiple columns can be parallelized across multiple cores
Further optimization is possible using an insert-only approach that avoids
expensive update operations (see Copeland and Khoshafian, A Decomposition Storage
Model, Proceedings of the 1985 ACM SIGMOD International Conference on
Management of Data, Austin, Texas, p. 268-279, ACM Press)
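Storing a value together with the number of its consecutive occurrences is commonly called run-length encoding. The sketch below illustrates why aggregates on such a column are cheap: the sum is computed per run, without expanding the data. It is an illustration in plain Python, not HANA's actual storage format; the function names and sample values are made up.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of equal values into a (value, run_length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_sum(encoded):
    """Sum a numeric column without expanding the runs."""
    return sum(value * length for value, length in encoded)


def rle_count(encoded, wanted):
    """Count the occurrences of one value by adding up matching run lengths."""
    return sum(length for value, length in encoded if value == wanted)


amounts = [100, 100, 100, 250, 250, 100, 100]
encoded = rle_encode(amounts)
print(encoded)           # [(100, 3), (250, 2), (100, 2)]
print(rle_sum(encoded))  # 1000
```

The encoded column has three entries instead of seven, and the aggregate touches only those three.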
- 7. 7© AOK Systems GmbH 2013
Advantages of Column Stores in an ERP Environment
Krueger et al. (see Krueger et al., Enterprise data management in mixed workload
environments, 16th International Conference on Industrial Engineering and
Engineering Management, 2009) showed that in typical ERP systems most of the
columns contain only a few distinct values. The figure shows the first 10 of the
98 columns of an accounting header table, in descending order of the number of
distinct values
Most SQL queries touch only 10% of the columns (see Plattner, A Common
Database Approach for OLTP and OLAP Using an In-Memory Column Database,
SIGMOD 2009), which makes data access in column stores fast
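Columns with only a few distinct values compress well under dictionary encoding, a technique commonly used in column stores. The slide does not name the technique, so treat the following as an illustrative assumption rather than a description of HANA internals. The function names and the sample company codes are hypothetical.

```python
def dictionary_encode(column):
    """Return (sorted dictionary, list of integer codes) for one column."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in column]


def bits_per_value(distinct_count):
    """Bits needed to address every dictionary entry (at least 1)."""
    return max(1, (distinct_count - 1).bit_length())


company_codes = ["1000", "2000", "1000", "1000", "3000", "2000"]
dictionary, codes = dictionary_encode(company_codes)
print(dictionary)                       # ['1000', '2000', '3000']
print(codes)                            # [0, 1, 0, 0, 2, 1]
print(bits_per_value(len(dictionary)))  # 2
```

With three distinct values, each entry needs 2 bits instead of a full string, which is the effect the distinct-value statistics point to.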
- 8. 8© AOK Systems GmbH 2013
HANA Architecture in a Nutshell
The technical foundations were prototyped in SanssouciDB at HPI. SAP integrated
the TREX search engine, P*Time and MaxDB for persistence
Planning Engine for the execution of basic financial planning operations
Calculation Engine as a common infrastructure that can be accessed using
SQLScript
Extended Application Services (server-side JavaScript for light-weight
applications) make it possible to expose data and queries using REST interfaces
Programming on database level using L (a restricted subset of C++) and C++
(so far not released for customers)
© SAP
- 9. 9© AOK Systems GmbH 2013
Calculation Engine as Common Execution Runtime
An overview of the HANA architecture is given in Franz Faerber et al., The SAP
HANA Database – An Architecture Overview, IEEE Data Engineering Bulletin,
Volume 35
The Calculation Engine is a common execution runtime that is able to optimize
and execute calculation models (see Bernhard Jaecksch, Franz Faerber, Wolfgang
Lehner: Cherry picking in database languages. IDEAS 2010: 117-122) from
various domain-specific languages
This approach is very flexible and extensible because the calculation model is a
data flow graph whose nodes can contain operations from various operators.
These operators can integrate different frameworks accessible from the execution
environment: the column store as well as specialized DSLs
The Calculation Engine thus introduces the first level of parallelization. The
operators it calls (especially the column store) can introduce further
parallelization by accessing a single row with different processes as well as by
splitting the data into multiple partitions in distributed scenarios (scale-out)
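The idea of a calculation model as a data flow graph can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Calculation Engine's API: `run_dataflow`, the node names and the sample figures are hypothetical. Nodes whose inputs are available run concurrently, which corresponds to the first level of parallelization described above.

```python
from concurrent.futures import ThreadPoolExecutor


def run_dataflow(nodes, inputs):
    """Evaluate a DAG of operators; independent nodes run concurrently.

    nodes:  {name: (function, [names of upstream nodes or inputs])}
    inputs: {name: value} for the source data
    """
    results = dict(inputs)
    pending = dict(nodes)
    with ThreadPoolExecutor() as pool:
        while pending:
            # A node is ready once all of its upstream results exist.
            ready = [name for name, (_, deps) in pending.items()
                     if all(dep in results for dep in deps)]
            if not ready:
                raise ValueError("cycle or missing input in the graph")
            futures = {}
            for name in ready:
                func, deps = pending[name]
                futures[name] = pool.submit(func, *[results[d] for d in deps])
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
    return results


model = {
    "revenue": (sum, ["sales"]),
    "cost": (sum, ["expenses"]),
    "margin": (lambda r, c: r - c, ["revenue", "cost"]),
}
out = run_dataflow(model, {"sales": [120, 80], "expenses": [50, 30]})
print(out["margin"])  # 120
```

Here "revenue" and "cost" are independent and are submitted together; "margin" waits for both.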
- 10. 10© AOK Systems GmbH 2013
Excursus: HANA and Exalytics differ in Architecture & Technology
HANA keeps the data in dynamic random-access memory (DRAM); solid-state
disks/flash devices are used for persistence. A 0.5 TB appliance usually has
2 TB of SSD storage for persistence. In contrast, Oracle's Exadata appliance
keeps most data in SSD/flash.
HANA is an integrated solution for OLTP, BI and predictive analysis. Exalytics
consists of different components. With Oracle Exalytics, data is
replicated for read-only scenarios: into the TimesTen database for reporting and
into the Essbase OLAP engine for forecasting.
HANA uses columnar storage, special encodings and RAM- and processor-cache-aware
algorithms, which is quite similar to Sybase IQ or HBase. Exalytics
(more precisely, its TimesTen engine) provides a so-called hybrid columnar
compression scheme.
There are scale-out scenarios for HANA (at the moment SAP-internal only and not
yet released for SAP Business Suite), whereas Exalytics has none.
- 11. 11© AOK Systems GmbH 2013
Current HANA Research: Graph Data Structures and Processing
in the Data Management Platform
Current research and possible applications are described in Rudolf et al., The
Graph Story of the SAP HANA Database, 15. GI-Fachtagung
Datenbanksysteme für Business, Technologie und Web, 11-15 March 2013
A software layer called Active Information Store was created on top of the
column store. It provides a generalization of directed multigraphs with
attributes on vertices and edges as well as hierarchies of attributes (taxonomies)
For graph manipulation, querying, graph traversal and BI-like aggregation the
WIPE language ("Weakly structured Information Processing and Exploration") was
invented, see Bornhövd et al., Flexible Information Management, Exploration
and Analysis in SAP HANA, Proceedings of the International Conference on Data
Technologies and Applications, pages 15-28, SciTePress, 2012
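A directed multigraph with attributes on vertices and edges can be pictured with a small data structure. The sketch below is plain Python and only illustrates the data model; it is neither the Active Information Store nor WIPE, and the class name, attribute names and sample data are hypothetical.

```python
from collections import defaultdict, deque


class AttributedMultigraph:
    def __init__(self):
        self.vertices = {}                  # vertex id -> attribute dict
        self.out_edges = defaultdict(list)  # source id -> [(target id, attrs)]

    def add_vertex(self, vid, **attrs):
        self.vertices[vid] = attrs

    def add_edge(self, src, dst, **attrs):
        # Edges live in a list, so parallel edges between the same
        # pair of vertices are allowed (multigraph).
        self.out_edges[src].append((dst, attrs))

    def reachable(self, start, edge_type=None):
        """Breadth-first traversal, optionally restricted to one edge type."""
        seen, queue = {start}, deque([start])
        while queue:
            current = queue.popleft()
            for target, attrs in self.out_edges.get(current, ()):
                if edge_type is not None and attrs.get("type") != edge_type:
                    continue
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        return seen - {start}


g = AttributedMultigraph()
g.add_vertex("patient", kind="person")
g.add_vertex("physician", kind="person")
g.add_vertex("hospital", kind="organization")
g.add_edge("patient", "physician", type="treated_by", year=2012)
g.add_edge("patient", "physician", type="treated_by", year=2013)  # parallel edge
g.add_edge("physician", "hospital", type="refers_to")

print(sorted(g.reachable("patient")))       # ['hospital', 'physician']
print(g.reachable("patient", "treated_by")) # {'physician'}
```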
- 12. 12© AOK Systems GmbH 2013
Current SAP HANA Use Cases
SAP Business Warehouse on HANA
SAP Business Suite on HANA
Accelerators and Rapid Deployment Solutions:
- Customer Segmentation
- Financial and Controlling
- Operational Intelligence
- Sales Pipeline Analysis
- Smart Meter Analysis (Utilities)
Also for non-SAP data:
personalized cancer therapy
real-time offer management for online games by Bigpoint
real-time analysis & simulation for Formula One by McLaren
- 13. 13© AOK Systems GmbH 2013
AOK – Business at Large Scale
AOK has a market share of 34%, so we have to work with mass data:
24 million insured people
54,500 employees
370 million medical treatments by resident physicians per year
6 million hospital treatments per year
400 million drug prescriptions per year
Our mission:
optimal service for insured people
continuous improvement of the quality of treatment and prevention
optimal allocation of costs
- 14. 14© AOK Systems GmbH 2013
HANA @ AOK – Operations at Large Scale
We have to automate all business processes and create complex workflows only for
the relevant items. But what is relevant? We need insight for making decisions:
prediction based on operational data: how much will a treatment cost?
selection of insured people for Disease Management Programs and campaigns
simulation: "What happens if we change fraud detection rulesets?"
fast navigation in huge sets of structured and unstructured data
measuring marketing campaign response
anomaly detection
cross-selling, up-selling and down-selling of insurance products
finding hidden patterns in data
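The simulation question in the list above ("What happens if we change fraud detection rulesets?") boils down to applying the current and the proposed rules to the same operational records and comparing the outcomes. The following is a minimal sketch with invented data: the claim fields, thresholds and function names are hypothetical and do not reflect AOK's rules.

```python
def flagged(claims, rules):
    """Return the ids of claims that trigger at least one rule."""
    return {claim["id"] for claim in claims if any(rule(claim) for rule in rules)}


def simulate_rule_change(claims, current_rules, proposed_rules):
    """Compare which claims each ruleset flags on the same data."""
    before = flagged(claims, current_rules)
    after = flagged(claims, proposed_rules)
    return {"newly_flagged": after - before, "no_longer_flagged": before - after}


claims = [
    {"id": 1, "amount": 900, "treatments_per_day": 1},
    {"id": 2, "amount": 4200, "treatments_per_day": 1},
    {"id": 3, "amount": 300, "treatments_per_day": 6},
]
current = [lambda c: c["amount"] > 5000]
proposed = [lambda c: c["amount"] > 4000,
            lambda c: c["treatments_per_day"] > 5]

result = simulate_rule_change(claims, current, proposed)
print(result["newly_flagged"])      # {2, 3}
print(result["no_longer_flagged"])  # set()
```

On a real system the comparison would run over the full operational data set, which is where fast in-memory scans matter.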
- 15. 15© AOK Systems GmbH 2013
HANA @ AOK – Analytical Applications
Results from HANA queries:
The "diabetic foot syndrome" query predicts a possible amputation within the
next 3 months and is used to identify candidates for disease management
programs. We simplified the query to 250 lines of SQL.
BW processing time was reduced by 60%, and by 80% after a redesign that also
improved runtime on traditional databases. We have the same code line for HANA
and non-HANA BW
Most BW queries became 20 times faster
- 16. 16© AOK Systems GmbH 2013
Content
HANA Architecture and Use Cases
Enabling Quantitative Approaches
Summary
Chances for Development & IT Management
- 17. 17© AOK Systems GmbH 2013
In-Memory Computing simplifies Queries and Data Models
SQL statements become simpler when written as set-based SQL
operational reporting can start to run directly on OLTP systems
with traditional databases we often have to persist the results of calculations
such as aggregations; with HANA this is only necessary if the calculations are
complex and contain values from external systems, or if results have to be
persisted for compliance reasons
performing more and more aggregations on the fly simplifies the code and keeps
aggregated values up to date
Business Warehouse processing gets faster and simpler if we remove complex
staging & materializations
faster responses provide more insight into data and shorten development cycles
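The difference between a persisted aggregate and an aggregation on the fly can be shown with any SQL database. The sketch below uses SQLite from the Python standard library purely as a stand-in; it says nothing about HANA's performance, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE line_items (account TEXT, amount INTEGER);
    CREATE TABLE account_totals (account TEXT PRIMARY KEY, total INTEGER);
    INSERT INTO line_items VALUES ('A', 100), ('A', 50), ('B', 70);
    -- persisted aggregate, filled once by a batch job
    INSERT INTO account_totals
        SELECT account, SUM(amount) FROM line_items GROUP BY account;
""")

# A new posting arrives after the batch job has run.
conn.execute("INSERT INTO line_items VALUES ('A', 25)")

persisted = dict(conn.execute(
    "SELECT account, total FROM account_totals ORDER BY account"))
on_the_fly = dict(conn.execute(
    "SELECT account, SUM(amount) FROM line_items GROUP BY account ORDER BY account"))

print(persisted)   # {'A': 150, 'B': 70}
print(on_the_fly)  # {'A': 175, 'B': 70}
```

The persisted total for account A is stale until the batch job runs again, and the extra table plus its refresh logic is exactly the code that disappears when the aggregate is computed at query time.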
- 18. 18© AOK Systems GmbH 2013
Challenge #1 – Evolution of existing Business Applications
New applications based on HANA can be developed in various programming
languages. SAP Business Suite and SAP Business Warehouse are database
agnostic and can benefit directly from HANA. Furthermore:
SAP optimizes programs and frameworks so that they make use of HANA's
proprietary features
the ABAP language and infrastructure are being evolved to support HANA-specific
features
That implies challenges for software engineering:
new development patterns for code pushdown beyond stored procedures
new programming models for efficient transactional applications
evolution of existing applications to run more efficiently using HANA
- 19. 19© AOK Systems GmbH 2013
Challenge #2 – Topics for Research in In-Memory Analytics
Real-time data warehousing is complex: HANA supports the concept of
temporal tables, but BW processing consists of complex processing steps,
which makes temporal queries non-trivial
OLTP reporting is more difficult than OLAP reporting:
- an Enterprise Data Warehouse has governance of the data model
- there are no deletions, and data is preprocessed to ensure consistency
- data goes through a process of cleansing, enrichment and completion
Advances in OLTP reporting will lead to a convergence of OLAP and OLTP. More
and more analytics will be performed directly on operational data
- 20. 20© AOK Systems GmbH 2013
Why is this an Inflection Point for IT Architecture
Management?
Today's IT system landscapes are "best of breed", heterogeneous and diverse
They consist of
- standard software for operations
- individual software
- highly specialized software, e.g. for statistics and optimization
- platforms for edge innovation
- OLAP systems with complex ETL processes
HANA can be used as data storage but also as a development platform for all
of the above-mentioned systems (SAP and non-SAP)
Architects of enterprise IT can use it to identify complexity and latency in IT
landscapes and to simplify them
- 21. 21© AOK Systems GmbH 2013
Complexity in IT Landscapes
Enterprise architecture has separated OLAP and OLTP. This produces latency and
complexity because of ETL scenarios
The same pattern is applied in other cases:
- data in mainframe systems is often replicated from VSAM data files/IMS
into an RDBMS to give client-server applications or other systems (e.g. OMS)
access to that data
- even data from operational SAP systems is often replicated to avoid direct
access from external applications
Remark:
1. From my point of view, service orientation could not solve this problem. Studies
(see D. Krafzig et al., Enterprise SOA, Prentice Hall PTR, Englewood Cliffs, 2006)
report that the overall reuse factor of a service is 1.6
2. I am not aware of much scientific work on metrics of IT landscapes and the
reasons for latency; this could be a topic for thorough research
- 22. 22© AOK Systems GmbH 2013
Complexity and Latency in Enterprise Resource Planning
[Figure: process with the steps: definition of business rules; implementation
and test of business rules; working with business rules; data extraction; data
processing and analysis]
Today's ERP systems aren't agile enough:
every step of the process shown in the figure can take weeks
How to speed the whole process up?
- operational reporting: analyzing huge amounts of operational data, even
real-time data
- getting faster insight into data by running queries in real time instead
of waiting hours
- simulation of changes of business rules in transactional systems
- 23. 23
Workarounds and Edge-Innovation increase Complexity
latency produces workarounds that increase the complexity of the IT landscape
platforms for edge innovation increase complexity, too, if they require new
data flows and data integration
virtualization & enterprise service buses help, but IT governance and release
planning remain complex tasks: data flow is complex, changes take time
if a solution or a change is delivered too late, business users will create
workarounds that increase complexity, especially if data is written back from
workaround systems into operational systems
[Figure: IT systems and data flows among CRM, ERP, central CRM, central ERP,
HCM, BW, SRM, non-SAP systems, Portal, a specialized system and several
workarounds]
- 24. 24© AOK Systems GmbH 2013
Content
HANA Architecture and Use Cases
Enabling Quantitative Approaches
Summary
Chances for Development & IT Management
- 25. 25© AOK Systems GmbH 2013
In-Memory Computing and Decision Making
With In-memory technology you can help users of IT systems:
users benefit from Google-like search functions
navigation in huge datasets
access all data for a customer
faster segmentation for campaigns in customer relationship management
Business Intelligence and Data Mining on operational data
simulation of changes of business rules based on operational data
performing predictions
solving optimization problems
HANA is an enabler for quantitative methods in operations: decision
making and optimization
- 26. 26© AOK Systems GmbH 2013
Challenge #3 – Quantitative Methods for Business Insight
are used only in a few Lines of Business
The biggest strength of HANA is not speed. It is a calculation engine that
provides business insight and enables decision making. This requires more skills
in statistics, data mining and machine learning. But:
only a few lines of business frequently use mathematical methods: finance,
insurance, logistics (supply chain management)
developers need skills in Business Intelligence and Business Warehouse
foundations: key figures, measures, star schemas, hierarchies and other
concepts directly supported by HANA through attribute and calculation views that
operate on top of the Calculation Engine
isolated skills aren't enough: we need the skills of a "data scientist", as
found in companies that work with "big data" (Facebook, Google, Amazon)
methods from Operations Research are used even more rarely than other
quantitative approaches
- 27. 27© AOK Systems GmbH 2013
Does Data Speak for Itself?
Taken from „What Data Doesn‘t Do“ by Coco Krumme in „Beautiful Data“
- 28. 28© AOK Systems GmbH 2013
Can Simple Statistics Help?
Taken from „What Data Doesn‘t Do“ by Coco Krumme in „Beautiful Data“
- 29. 29© AOK Systems GmbH 2013
Challenge #4 – Skill Management in the Enterprise
To use the full potential of HANA we need mathematical skills (visualization of
huge data sets, predictive analytics and simulation); unfortunately those
skills are rare
Developers need skills with mathematical standard software (R, IMSL)
BI experts don‘t know OLTP data models - programmers usually have limited
BI skills
Many BW experts are afraid of using virtual data sources and prefer
materialized aggregations instead
BI experts and experts from operations usually don‘t work in the same
organizational units
- 30. 30© AOK Systems GmbH 2013
Challenge #5 – Innovation Management in Enterprises
We have accepted the limitations of traditional database systems for years and
censor our own ideas ("scissors in the head")
Because IT people tend to think in solutions, like engineers, SAP established the
method of "Design Thinking". Here is a definition from Wikipedia:
"As a style of thinking, design thinking is generally considered the ability to
combine empathy for the context of a problem, creativity in the generation of
insights and solutions, and rationality to analyze and fit solutions to the
context."
- 31. 31© AOK Systems GmbH 2013
Content
HANA Architecture and Use Cases
Transformation of Enterprise IT
Enabling Quantitative Approaches
Summary
New Development Patterns
- 32. 32© AOK Systems GmbH 2013
My Personal Conclusion
With HANA we can build new types of business applications
HANA makes existing SAP and non-SAP solutions faster and more flexible which
leads to more agility
HANA is the first step towards convergence of OLAP and OLTP
Enterprise Architects can use HANA to simplify corporate IT landscapes
Software developers have the chance to use more quantitative approaches in
business and bring them closer to operations
Therefore we need new skills in the enterprise: classical BI, statistics, data
mining, traditional data warehousing, machine learning, optimization and
business domain knowledge
- 33. 33
Some HANA-relevant Research Topics
OLTP reporting: where to perform data cleansing, enrichment and completion?
how to achieve consistent time-awareness?
software engineering: programming models that allow code pushdown of
business logic to the database
software evolution: how to evolve systems and the IT landscape to profit from
in-memory technology? how can we push down code to the database and still
keep maintainability and one code line?
solving large-scale optimization problems on HANA: strengths and weaknesses of
the current architecture & libraries
advanced business rules on the database: monotone and non-monotone
reasoning.
- 34. 34
Research Projects where HANA is promising
Graph-based search and graph-based data mining: so far semantic
technologies provided solutions but didn't scale
Combination of graph-based data mining and traditional data mining
Complex event processing and SOA integration: with HANA we can store event
streams (RFID events from manufacturing, clicks in webshops etc.). How can we
define alerts and notifications from that data and publish them in a SOA?
Multi-criteria decision making (see Kou, Miettinen and Shi, "Multiple Criteria
Decision Making: Challenges and advancements", Journal of Multi-Criteria
Decision Analysis, vol. 18, 2011)
- 35. 35
Some Challenges at AOK
code pushdown of very complex rulesets, e.g.
- checks according to the provisions of the German Social Code that regulate
benefits
- automated agent determination for workflows
expert systems for advanced process automation:
- accident questionnaires contain narrative text that has to be evaluated
using business rules that also need data from the backend
- automated fine-tuning of those rulesets
- 36. 36© AOK Systems GmbH 2013
Challenge #6 – Invention vs. Adoption
I presented examples of research topics that could be tackled using HANA as a
platform. Last but not least, a piece of personal advice: academia has created
innovative technologies, but why aren't they ubiquitous in industry?
This is an acid test for prototypes:
Do they work with real data?
Are they able to work with huge data sets?
Can business people use them?
Are they as easy to use as a mobile app?
If parts of the domain change (business rules,
compliance…), can you adapt the application within
a short time?
© DB AG
- 38. 38© AOK Systems GmbH 2013
Information about SAP‘s In-Memory Data Management
General Information:
- www.experiencehana.com
- www.scn.sap.com
- help.sap.com/hana
Training material
- open.sap.com
- www.saphana.com/community/implement/hana-academy
- openhpi.de/course/inmemorydatabases
Starting point for search for scientific HANA research:
www.informatik.uni-trier.de/~ley/pers/hd/f/F=auml=rber:Franz.html
- 39. 39© AOK Systems GmbH 2013
SAP University Alliance
Information at scn.sap.com/community/uac
SAP HANA @ Universities: scn.sap.com/community/uac/hana
SAP gives access to:
- 30-day free trial license
HANA in the Cloud
- training material
- special prices for HANA access
- SAP HANA Demo Cloud environment for Universities