Slide deck: More than Reporting - Data Analysis with Oracle R Enterprise - DOAG Development and DOAG SIG BigData 2014
- 2. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
More than Reporting:
Data Analysis with Oracle R Enterprise
Dr. Nadine Schöne
Sales Consultant
Oracle Direct, Sales Consulting
Dr. Michael Haupt
Principal Member of Technical Staff
Oracle Labs, Virtual Machine Research Group
25 September 2014
- 3.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
- 4.
Agenda
1. More than standard reporting?
2. Advanced data analyses
3. R and Oracle R Enterprise (ORE)
4. Demo
5. Benefits
6. Outlook: more performance for R
- 5.
More than standard reporting?
- 6.
Reporting
- 7.
Advanced data analyses
- 9.
Sensor data analysis I
200,000 households
3 years
1 measurement per hour
5.256 billion measurements
(26,280 measurements per customer)
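The data volume on this slide follows from simple multiplication. A minimal sketch of the arithmetic, assuming non-leap years of 365 days (the constant names are illustrative, not from the slides):

```python
# Figures taken from the slide; names are illustrative.
HOUSEHOLDS = 200_000
YEARS = 3
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap years, one reading per hour

# Readings collected for a single household over the whole period.
per_household = YEARS * HOURS_PER_YEAR

# Readings across all households.
total = HOUSEHOLDS * per_household

print(f"{per_household:,} measurements per household")  # 26,280
print(f"{total:,} measurements in total")               # 5,256,000,000
```

With these inputs the result is 26,280 measurements per household and 5.256 billion measurements overall.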
- 10.
Sensor data analysis II
10 s per model
200,000 households ➔ 200,000 models
23 days + 4 hours ➔ 4.3 hours with Oracle R Enterprise
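The runtimes on this slide can be checked the same way. The sketch below assumes the models are fitted strictly one after another at 10 s each, and compares that with the 4.3 hours quoted for Oracle R Enterprise; the speedup factor is derived here and does not appear on the slide.

```python
from datetime import timedelta

# Figures taken from the slide; names are illustrative.
MODELS = 200_000
SECONDS_PER_MODEL = 10

# Assumption: purely sequential fitting, no parallelism.
sequential = timedelta(seconds=MODELS * SECONDS_PER_MODEL)
days = sequential.days
hours = sequential.seconds / 3600  # remaining part of the last day, in hours

# Runtime quoted on the slide for Oracle R Enterprise.
with_ore = timedelta(hours=4.3)

# Dividing two timedeltas yields a plain float ratio.
speedup = sequential / with_ore

print(f"sequential: {days} days + {hours:.1f} hours")
print(f"speedup: about {speedup:.0f}x")
```

The sequential estimate comes to 23 days plus roughly 3.6 hours, which the slide rounds to 4 hours; the ratio to 4.3 hours is a factor of about 129.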
- 11.
R Screenshots
- 12.
Oracle Advanced Analytics
• Data Understanding & Visualization
– Summary & Descriptive Statistics
– Histograms, scatter plots, box plots, bar charts
– R graphics: 3-D plots, link plots, special R graph types
– Cross tabulations
– Tests for Correlations (t-test, Pearson’s, ANOVA)
– Selected Base SAS equivalents
• Data Selection, Preparation and Transformations
– Joins, Tables, Views, Data Selection, Data Filter, SQL time windows, Multiple schemas
– Sampling techniques
– Re-coding, Missing values
– Aggregations
– Spatial data
– R to SQL transparency and push down
• Classification Models
– Logistic Regression (GLM)
– Naive Bayes
– Decision Trees
– Support Vector Machines (SVM)
– Neural Networks (NNs)
• Regression Models
– Multiple Regression (GLM)
– Support Vector Machines
Broad range of in-database data mining and statistical functions
• Clustering
– Hierarchical K-means
– Orthogonal Partitioning
– Expectation Maximization
• Anomaly Detection
– Special case Support Vector Machine (1-Class SVM)
• Associations / Market Basket Analysis
– Apriori algorithm
• Feature Selection and Reduction
– Attribute Importance (Minimum Description Length)
– Principal Components Analysis (PCA)
– Non-negative Matrix Factorization
– Singular Value Decomposition
• Text Mining
– Most OAA algorithms support unstructured data (e.g. customer comments, email, abstracts)
• Transactional Data
– Most OAA algorithms support transactional data (e.g. purchase transactions, repeated measures over time)
• R packages: ability to run open source
– Broad range of R CRAN packages can be run as part of database process via R to SQL transparency and/or via Embedded R mode
* included in every Oracle Database
Descriptive data analysis & visualization
Classification & regression models
Clustering
Use of open source R packages
Data preparation & transformations
- 13.
Important topics for enterprise data analytics
1. Scalability
2. Performance
3. Development & production
- 14.
R and Oracle R Enterprise (ORE)
- 15.
Aspects of conventional R/database interaction
R logo © R Foundation, from http://www.r-project.org
- 16.
"Collaborative Execution" model
1. User R engine (desktop): R engine, other R packages, Oracle R Enterprise packages
2. Database compute engine: Oracle DB with user tables; SQL, results
3. R engine(s) managed by Oracle DB: R engine, other R packages, Oracle R Enterprise packages; R, results
Annotations in the diagram:
• Post-processing of results
• Analyses that are not available in the Oracle DB
• Execution in collaboration with the Oracle DB
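The idea of letting the database compute engine do the heavy lifting, with only results returning to the client, can be illustrated with a small sketch. It is not Oracle R Enterprise code: Python's built-in sqlite3 module stands in for Oracle DB, and the table, column and function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Assumption: an in-memory sqlite3 database stands in for Oracle DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (household INTEGER, kwh REAL)")
rows = [(h, float(h + m)) for h in range(3) for m in range(4)]
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)


def mean_client_side(conn):
    """Pull every row to the client, then aggregate there."""
    sums, counts = defaultdict(float), defaultdict(int)
    for household, kwh in conn.execute("SELECT household, kwh FROM readings"):
        sums[household] += kwh
        counts[household] += 1
    return {h: sums[h] / counts[h] for h in sums}


def mean_pushed_down(conn):
    """Let the database aggregate; only one row per household comes back."""
    query = "SELECT household, AVG(kwh) FROM readings GROUP BY household"
    return dict(conn.execute(query).fetchall())


client = mean_client_side(conn)
pushed = mean_pushed_down(conn)
print(client == pushed)  # both approaches give the same per-household means
```

Both functions return the same means, but the pushed-down variant transfers one row per household instead of every reading, which is the point of moving the computation to where the data lives.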
- 17.
Oracle's R technologies
• Oracle R Distribution
• ROracle
• Oracle R Enterprise
• Oracle R Advanced Analytics for Hadoop
Freely available to the R community
- 18.
Demo
- 19.
Benefits
- 20.
Benefits I
5,881 R packages
- 21.
Benefits II
Integration
Performance & Scalability
High-performance enterprise predictive analytics applications
Low total cost of ownership
- 22.
Outlook: more performance for R
- 23.
FastR
• Reimplementation of R in Java
– Uses Graal (compiler) and Truffle (AST interpreter framework)
– Dynamic compilation, scaling on heterogeneous architectures
– Contributors: Oracle Labs (Germany, USA, Austria), JKU Linz, Purdue University, TU Dortmund
Figure: AST Interpreter, Uninitialized Nodes ➔ Node Rewriting for Profiling Feedback ➔ AST Interpreter, Rewritten Nodes ➔ Compilation using Partial Evaluation ➔ Compiled Code
Node Transitions: U = Uninitialized, I = Integer, D = Double, S = String, G = Generic
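The node transitions in the figure can be sketched as a toy self-specializing AST node. This is an illustration in Python only: it does not use the Truffle API, it does not model compilation by partial evaluation, and the class and function names (`Var`, `AddNode`, `kind_of`, `widen`) are hypothetical.

```python
def kind_of(a, b):
    """Classify an operand pair using the legend letters I, D, S, G."""
    if isinstance(a, bool) or isinstance(b, bool):
        return "G"  # booleans are kept out of the integer fast path
    if isinstance(a, int) and isinstance(b, int):
        return "I"
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return "D"
    if isinstance(a, str) and isinstance(b, str):
        return "S"
    return "G"


def widen(state, kind):
    """Rewrite rule: states only become more general, never more specific."""
    if state == "U" or state == kind:
        return kind
    if {state, kind} == {"I", "D"}:
        return "D"
    return "G"


class Var:
    """Leaf node that reads a variable from the environment."""

    def __init__(self, name):
        self.name = name

    def execute(self, env):
        return env[self.name]


class AddNode:
    """Addition node that records the operand types it has observed."""

    def __init__(self, left, right):
        self.left, self.right = left, right
        self.state = "U"  # every node starts uninitialized

    def execute(self, env):
        a = self.left.execute(env)
        b = self.right.execute(env)
        # Profiling feedback: rewrite the node when new operand types show up.
        self.state = widen(self.state, kind_of(a, b))
        return a + b  # mixed string/number operands raise TypeError in Python


node = AddNode(Var("x"), Var("y"))
states = [node.state]
for env in ({"x": 1, "y": 2}, {"x": 1.5, "y": 2}, {"x": "a", "y": "b"}):
    node.execute(env)
    states.append(node.state)
print(states)  # ['U', 'I', 'D', 'G']
```

In a real Truffle interpreter each state would be a separate node class with a type-specific fast path, and a stable tree would then be handed to the compiler; here the state letter merely records which specialization would be active.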
- 24.
“R is a powerful and interesting tool for
data analysis! ORE brings R into a
scalable DB engine (solving problems
of data management, analysis and
scalability). We actually can obtain
information and added value from not
so actively used data.”
– Stefano Alberto Russo, Researcher at CERN Openlab
- 25.
Further information
ORE discussion forum:
https://community.oracle.com/community/developer/english/business_intelligence/data_warehousing/r
Oracle Advanced Analytics:
http://www.oracle.com/technetwork/database/options/advanced-analytics/index.html
ORE blog:
https://blogs.oracle.com/R/
FastR:
https://bitbucket.org/allR/fastR
Graal/Truffle:
https://wiki.openjdk.java.net/display/Graal/Main
- 26.
Contact
Dr. Nadine Schöne | Sales Consultant
Email: nadine.schoene@oracle.com
Tel: +49 331 200 7190
ORACLE Deutschland B.V. & Co. KG
Schiffbauergasse 14
14467 Potsdam