Enterprise-Level Data Analysis with Oracle R Enterprise
Slide deck, DOAG 2014. Copyright © 2014 Oracle and/or its affiliates. All rights reserved.
Dr. Nadine Schöne
Sales Consultant
Oracle Direct, Sales Consulting
Dr. Michael Haupt
Tech Lead, FastR Project
Virtual Machine Research Group, Oracle Labs
Negib Marhoul
Leading Senior Sales Consultant
Oracle Direct, Sales Consulting
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
1. Data analysis in the enterprise
2. R and Oracle R Enterprise (ORE)
3. Demo
4. Oracle Labs and FastR
5. Further information
Data Analysis in the Enterprise
Background
Statistical and mining methods
• Time-consuming analysis processes
• Multiple iterations
• Workflows of recurring steps
• Resource-intensive data analyses
Collect data, identify data, prepare data, analyze data
Key Topics for Enterprise Data Analytics
1. Scalability
2. Performance
3. Development & production
R and Oracle R Enterprise (ORE)
R is …
1. A programming language
2. A statistical workbench
3. A data science ecosystem
R is the lingua franca of data science.
R logo © R Foundation, from http://www.r-project.org
Aspects of Conventional R/Database Interaction
“Collaborative Execution” Model
1. User R engine (desktop): Oracle R Enterprise packages plus other R packages; exchanges SQL and results with the database
2. Database compute engine: Oracle DB with the user tables
3. R engine(s) managed by Oracle DB: Oracle R Enterprise packages plus other R packages; run R and return results
Transparency layer => uses the computing power of the database
No flat-file export => saves time and uses the computing power of the server
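As a rough illustration of the push-down idea behind the transparency layer, the following Python sketch contrasts pulling every row to the client with letting the database compute the aggregate. It is not Oracle R Enterprise code: Python's built-in sqlite3 module stands in for Oracle DB, and the readings table and its values are invented for this example.

```python
import sqlite3

# Stand-in database (assumption: sqlite3 replaces Oracle DB for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (household INTEGER, kwh REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 0.5), (1, 1.5), (2, 2.0), (2, 4.0)],  # hypothetical sample rows
)

# Conventional interaction: export all rows, then aggregate on the client.
rows = conn.execute("SELECT household, kwh FROM readings").fetchall()
sums = {}
for household, kwh in rows:
    total, count = sums.get(household, (0.0, 0))
    sums[household] = (total + kwh, count + 1)
client_means = {h: total / count for h, (total, count) in sums.items()}

# Push-down: the database aggregates and only the small result is transferred.
pushed_means = dict(
    conn.execute("SELECT household, AVG(kwh) FROM readings GROUP BY household")
)

print(len(rows), "rows transferred without push-down")
print(len(pushed_means), "rows transferred with push-down")
```

Both paths produce the same per-household means; the difference is how much data leaves the database, which is the effect the slide attributes to avoiding flat-file exports.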
“R is a powerful and interesting tool for
data analysis! ORE brings R into a
scalable DB engine (solving problems
of data management, analysis and
scalability). We actually can obtain
information and added value from not
so actively used data.”
– Stefano Alberto Russo, Researcher at CERN Openlab
• Oracle R Distribution
• ROracle
• Oracle R Enterprise
• Oracle R Advanced Analytics for Hadoop
Free of charge for the R community
Oracle R Enterprise at a Glance
• Development: R workspace console; unchanged user experience
• Production: Oracle statistics engine; scales to large data volumes
• Application: OBIEE, web services; embedding in operational systems
Function push-down: data transformation & statistics
Sensor Data Analysis I
200,000 households
3 years
1 measurement/hour
5.256 billion measurements
(26,280 measurements/customer)
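The data volume follows directly from the three inputs on the slide. A short Python check, assuming 365-day years with no leap days:

```python
HOUSEHOLDS = 200_000
YEARS = 3
HOURS_PER_YEAR = 365 * 24  # assumption: leap days ignored

# One measurement per hour and household.
per_household = YEARS * HOURS_PER_YEAR
total = HOUSEHOLDS * per_household

print(per_household)  # 26280
print(total)          # 5256000000, i.e. 5.256 billion
```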
Sensor Data Analysis II
200,000 households ➔ 200,000 models
10 s/model
Sequential execution: 23 days + 4 hours
Oracle R Enterprise: 4.3 hours
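The two run times can be checked with a few lines of Python. The sketch assumes the longer figure is the purely sequential cost of 200,000 models at 10 s each, and derives the implied speedup for the 4.3-hour run; the speedup itself is not stated on the slide.

```python
SECONDS_PER_MODEL = 10
MODELS = 200_000
ORE_HOURS = 4.3  # run time reported for Oracle R Enterprise

# Sequential cost: one model after another.
serial_seconds = SECONDS_PER_MODEL * MODELS
days, remainder = divmod(serial_seconds, 24 * 3600)
hours = remainder / 3600

# Implied speedup of the 4.3-hour run over the sequential run.
speedup = serial_seconds / (ORE_HOURS * 3600)

print(days, "days +", round(hours, 1), "hours")  # 23 days + 3.6 hours
print(round(speedup), "x faster")                # 129 x faster
```

The sequential figure comes out at 23 days and about 3.6 hours, which the slide rounds to 4 hours.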
Integrating Data Miner with Oracle R Enterprise
SQL Query node
– Allows R scripts to be integrated
Oracle Advanced Analytics
Wide Range of In-Database Data Mining and Statistical Functions
• Data Understanding & Visualization
– Summary & Descriptive Statistics
– Histograms, scatter plots, box plots, bar charts
– R graphics: 3-D plots, link plots, special R graph types
– Cross tabulations
– Tests for Correlations (t-test, Pearson’s, ANOVA)
– Selected Base SAS equivalents
• Data Selection, Preparation and Transformations
– Joins, Tables, Views, Data Selection, Data Filter, SQL time windows, Multiple schemas
– Sampling techniques
– Re-coding, Missing values
– Aggregations
– Spatial data
– R to SQL transparency and push down
• Classification Models
– Logistic Regression (GLM)
– Naive Bayes
– Decision Trees
– Support Vector Machines (SVM)
– Neural Networks (NNs)
• Regression Models
– Multiple Regression (GLM)
– Support Vector Machines
• Clustering
– Hierarchical K-means
– Orthogonal Partitioning
– Expectation Maximization
• Anomaly Detection
– Special case Support Vector Machine (1-Class SVM)
• Associations / Market Basket Analysis
– Apriori algorithm
• Feature Selection and Reduction
– Attribute Importance (Minimum Description Length)
– Principal Components Analysis (PCA)
– Non-negative Matrix Factorization
– Singular Value Decomposition
• Text Mining
– Most OAA algorithms support unstructured data (e.g. customer comments, email, abstracts)
• Transactional Data
– Most OAA algorithms support transactional data (e.g. purchase transactions, repeated measures over time)
• R packages: ability to run open source
– A broad range of R CRAN packages can be run as part of the database process via R to SQL transparency and/or via Embedded R mode
* included in every Oracle Database
Demo
Software Components in the VM Image
• R 3.1.1
• Oracle R Enterprise (ORE) 1.4.1
• Oracle DB 12.1.0.2.0
• Oracle SQL Developer 4.0.3
• RStudio 0.98.1079
R, SQL
Benefits
6,054 R packages
Oracle Labs and FastR
Safe Harbor Statement
The following is intended to provide some insight into a line of research in Oracle Labs. It
is intended for information purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any material, code, or functionality,
and should not be relied upon in making purchasing decisions. Oracle reserves the right
to alter its development plans and practices at any time, and the development, release,
and timing of any features or functionality described in connection with any
Oracle product or service remains at the sole discretion of Oracle. Any views expressed in
this presentation are my own and do not necessarily reflect the views of Oracle.
The Mission of Oracle Labs is straightforward:
Identify, explore, and transfer new
technologies that have the potential to
substantially improve Oracle's business.
– Edward Screven, Chief Corporate Architect, Oracle
Considerations about R
• R is excellently suited to statistical tasks.
Why should one use C and Fortran?
• R is inherently parallel as a language.
Why should one implement parallelism separately?
Diagram: Library 1 (R + C), Library 2 (R + Fortran)
FastR
• Open-source R implementation
– GPL 2
– https://bitbucket.org/allr/fastr
– Research prototype
– Linux, Mac
• Properties
– Implemented in “100 % Java”
– Built with Truffle (interpreter) and Graal (dynamic compiler)
Diagram: Library 1 (R), Library 2 (R)
Truffle and Graal
Node transitions (specializing for types): from Uninitialized to Generic
States: AST interpreter with uninitialized nodes; AST interpreter with rewritten nodes; compiled code
Transitions:
– Node rewriting for profiling feedback
– Compilation using partial evaluation
– Deoptimization to the AST interpreter
– Node rewriting to update profiling feedback
– Recompilation using partial evaluation
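The node-rewriting part of this picture can be sketched in a few lines of Python. This is a toy model only: Truffle is a Java framework, all class names below are hypothetical and not Truffle's API, and the sketch covers only self-specializing AST nodes, not compilation, partial evaluation, or deoptimization.

```python
class Node:
    """Base AST node; knows its parent so it can rewrite itself in place."""

    def __init__(self):
        self.parent = None

    def adopt(self, child):
        child.parent = self
        return child

    def replace_with(self, new_node):
        # Swap this node for new_node in whichever parent field holds it.
        for field, value in list(vars(self.parent).items()):
            if value is self:
                setattr(self.parent, field, new_node)
        new_node.parent = self.parent
        return new_node


class Const(Node):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def execute(self, env):
        return self.value


class Var(Node):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def execute(self, env):
        return env[self.name]


class Add(Node):
    """Shared plumbing for all versions of the add node."""

    def __init__(self, left, right):
        super().__init__()
        self.left = self.adopt(left)
        self.right = self.adopt(right)

    def operands(self, env):
        return self.left.execute(env), self.right.execute(env)

    def rewrite(self, node_class):
        return self.replace_with(node_class(self.left, self.right))


def both_int(a, b):
    return type(a) is int and type(b) is int


class UninitializedAdd(Add):
    def execute(self, env):
        a, b = self.operands(env)
        # First execution: pick a specialization from the observed types.
        self.rewrite(IntAdd if both_int(a, b) else GenericAdd)
        return a + b


class IntAdd(Add):
    def execute(self, env):
        a, b = self.operands(env)
        if not both_int(a, b):
            # Speculation failed: rewrite to the generic node.
            self.rewrite(GenericAdd)
        return a + b  # a real implementation would use an int-only fast path


class GenericAdd(Add):
    def execute(self, env):
        a, b = self.operands(env)
        return a + b


class Root(Node):
    def __init__(self, body):
        super().__init__()
        self.body = self.adopt(body)

    def execute(self, env):
        return self.body.execute(env)


root = Root(UninitializedAdd(Var("x"), Const(1)))
trace = [type(root.body).__name__]
for x in (2, 3, 2.5):
    result = root.execute({"x": x})
    trace.append(type(root.body).__name__)
print(trace)
```

The trace shows the add node moving from uninitialized to the integer specialization and then to the generic version once a non-integer operand appears. In this toy every version computes `a + b`; the point is only that the tree records which types it has seen, which is the profiling feedback a compiler could then exploit.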
Benchmark Results: Shootout
• Benchmark properties
– “Computer Languages Shootout Game”
– Not typical R applications
• Results
– Note the logarithmic axis
– Most are about 10x faster
– Positive exception: about 520x
Chart (logarithmic axis from 1 to 1000): the geometric mean is a 10x improvement over GNU R
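Speedup factors are usually summarized with the geometric mean, which matches a logarithmic axis: it is the ordinary mean of the values taken on the log scale. The Python sketch below uses made-up speedup factors purely to show the calculation; they are not the benchmark results from this slide.

```python
import math
import statistics

# Hypothetical speedup factors, chosen only so the arithmetic is easy to follow.
speedups = [2.0, 5.0, 10.0, 20.0, 50.0]

arithmetic = statistics.mean(speedups)
geometric = statistics.geometric_mean(speedups)

# Equivalent view: average the values on a log10 axis, then transform back.
via_log_axis = 10 ** statistics.mean(math.log10(s) for s in speedups)

print(round(arithmetic, 1))    # 17.4
print(round(geometric, 6))     # 10.0
print(round(via_log_axis, 6))  # 10.0
```

The arithmetic mean is pulled upward by the largest factor, while the geometric mean stays at the center of the values on the log scale.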
PGX: Overview
PGX is a data analysis framework that supports powerful graph analyses of the data.
Example analyses: recommendation, influencer identification, community detection, pattern matching
PGX runs fast, parallel analyses on large graphs, both on a single machine and in a distributed environment.
PGX is tightly integrated with Oracle DB (RDF and PG options), which manages graph data consistently in persistent storage.
Our DSL compiler technology makes it easy to switch between the two environments.
Diagram: graph program (DSL), compiler, PGX (single machine, distributed)
More Information
ORE Discussion Forum:
https://community.oracle.com/community/developer/english/business_intelligence/data_warehousing/r
Oracle Advanced Analytics:
http://www.oracle.com/technetwork/database/options/advanced-analytics/index.html
ORE blog:
https://blogs.oracle.com/R/
FastR:
https://bitbucket.org/allR/fastR
Graal/Truffle:
https://wiki.openjdk.java.net/display/Graal/Main
Oracle Labs on OTN:
http://www.oracle.com/technetwork/oracle-labs/index.html
Contact
Dr. Nadine Schöne | Sales Consultant
Email: nadine.schoene@oracle.com
Tel: +49 331 200 7190
Dr. Michael Haupt | Tech Lead, FastR Project
Email: michael.haupt@oracle.com
Tel: +49 331 200 7277
ORACLE Deutschland B.V. & Co. KG
Schiffbauergasse 14
14467 Potsdam