Slide deck: Enterprise-Level Data Analysis with Oracle R Enterprise - DOAG2014

Slide deck for conference talk at DOAG2014 conference. In German only, translation available on request. Please have a look at the corresponding abstract.

  1. Copyright © 2014 Oracle and/or its affiliates. All rights reserved.
  2. Enterprise-Level Data Analysis with Oracle R Enterprise. Dr. Nadine Schöne, Sales Consultant, Oracle Direct, Sales Consulting; Dr. Michael Haupt, Tech Lead, FastR Project, Virtual Machine Research Group, Oracle Labs; Negib Marhoul, Leading Senior Sales Consultant, Oracle Direct, Sales Consulting
  3. Safe Harbor Statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  4. Agenda: (1) Data analysis in the enterprise; (2) R and Oracle R Enterprise (ORE); (3) Demo; (4) Oracle Labs and FastR; (5) Further information
  5. Data Analysis in the Enterprise
  7. Background: statistics and mining methods • Time-consuming analysis processes • Multiple iterations • Workflows of recurring work steps • Resource-intensive data analyses. Steps: collect data, identify data, prepare data, analyze data
  8. Key topics for enterprise data analytics: 1. Scalability 2. Performance 3. Development & production
  9. R and Oracle R Enterprise (ORE)
  10. R is ... 1. A programming language 2. A statistical workbench 3. A data science ecosystem. R is the lingua franca of data science. R logo © R Foundation, from http://www.r-project.org
  11. Aspects of conventional R/database interaction. R logo © R Foundation, from http://www.r-project.org
  12. "Collaborative execution" model with three compute engines: (1) User R engine (desktop): R engine, other R packages, Oracle R Enterprise packages. (2) Database compute engine: Oracle DB with user tables; SQL goes in, results come back. (3) R engine(s) managed by Oracle DB: R engine, other R packages, Oracle R Enterprise packages; R results. Transparency layer => uses the computing power of the database. No flat-file export => saves time + uses the computing power of the server.
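
The push-down idea on this slide (let the database compute, and move only results to the client) can be illustrated outside ORE. The sketch below is not ORE code and does not use any Oracle API: it takes Python's built-in `sqlite3` module as a stand-in database, and the table `readings` with its columns and values is hypothetical.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a small hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (household INTEGER, kwh REAL)")
    rows = [(1, 0.5), (1, 1.5), (2, 2.0), (2, 4.0), (2, 6.0)]
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    return conn


def mean_client_side(conn):
    """Pull every row to the client, then aggregate there."""
    sums, counts = {}, {}
    for household, kwh in conn.execute("SELECT household, kwh FROM readings"):
        sums[household] = sums.get(household, 0.0) + kwh
        counts[household] = counts.get(household, 0) + 1
    return {h: sums[h] / counts[h] for h in sums}


def mean_pushed_down(conn):
    """Send the aggregation to the database; only the results come back."""
    query = "SELECT household, AVG(kwh) FROM readings GROUP BY household"
    return dict(conn.execute(query).fetchall())


conn = make_demo_db()
client_result = mean_client_side(conn)
pushed_result = mean_pushed_down(conn)
```

Both functions return the same per-household means. The difference is where the work happens: the first transfers five rows, the second transfers two. With billions of rows that difference is presumably what the slide means by saving time and using the server's computing power.
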
  13. “R is a powerful and interesting tool for data analysis! ORE brings R into a scalable DB engine (solving problems of data management, analysis and scalability). We actually can obtain information and added value from not so actively used data.” – Stefano Alberto Russo, Researcher at CERN Openlab
  14. • Oracle R Distribution • ROracle • Oracle R Enterprise • Oracle R Advanced Analytics for Hadoop. Free for the R community
  15. Oracle R Enterprise at a glance. Function push-down: data transformation & statistics. Components: R workspace console; Oracle statistics engine; OBIEE, web services. Benefits: unchanged user experience; scalable to large data volumes; embedding in operational systems. Stages: development, production, application
  16. Sensor data analysis I: 200,000 households, 3 years, 1 measurement/hour: 5.256 billion measurements (26,280 measurements per household)
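
The data volume on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes 365-day years (no leap days) and exactly one reading per hour; the variable names are illustrative.

```python
HOUSEHOLDS = 200_000
YEARS = 3
HOURS_PER_YEAR = 365 * 24  # assumption: leap days are ignored

# One reading per hour for three years
readings_per_household = YEARS * HOURS_PER_YEAR
total_readings = HOUSEHOLDS * readings_per_household

print(f"{readings_per_household:,} readings per household")
print(f"{total_readings / 1e9:.3f} billion readings in total")
```

Under these assumptions each household contributes 26,280 readings, and the total matches the 5.256 billion stated on the slide.
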
  17. Sensor data analysis II: 10 s/model; 200,000 households ➔ 200,000 models. Fitted one after another: 23 days + 4 hours. With Oracle R Enterprise: 4.3 hours
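
The run times on this slide are consistent with a simple back-of-the-envelope calculation. The sketch below assumes that every model takes exactly 10 seconds and that the 4.3 hours is wall-clock time for all 200,000 models; neither assumption is confirmed by the slide.

```python
MODELS = 200_000
SECONDS_PER_MODEL = 10
ORE_HOURS = 4.3  # run time quoted for Oracle R Enterprise

# Fitting the models strictly one after another
sequential_seconds = MODELS * SECONDS_PER_MODEL
days, remainder = divmod(sequential_seconds, 24 * 3600)
hours = remainder / 3600

# Ratio of sequential run time to the quoted ORE run time
speedup = sequential_seconds / (ORE_HOURS * 3600)

print(f"sequential: {days} days + {hours:.1f} hours")
print(f"speedup: about {speedup:.0f}x")
```

Two million seconds is 23 days plus roughly 3.6 hours, which the slide rounds up to 4 hours. Against 4.3 hours this is a speedup of roughly 129x, as one would expect when many models are fitted in parallel rather than sequentially.
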
  18. Integration of Data Miner with Oracle R Enterprise • SQL Query node: allows the integration of R scripts
  19. Oracle Advanced Analytics: Wide Range of In-Database Data Mining and Statistical Functions • Data Understanding & Visualization: summary & descriptive statistics; histograms, scatter plots, box plots, bar charts; R graphics: 3-D plots, link plots, special R graph types; cross tabulations; tests for correlations (t-test, Pearson’s, ANOVA); selected Base SAS equivalents • Data Selection, Preparation and Transformations: joins, tables, views, data selection, data filter, SQL time windows, multiple schemas; sampling techniques; re-coding, missing values; aggregations; spatial data; R to SQL transparency and push-down • Classification Models: Logistic Regression (GLM); Naive Bayes; Decision Trees; Support Vector Machines (SVM); Neural Networks (NNs) • Regression Models: Multiple Regression (GLM); Support Vector Machines • Clustering: Hierarchical K-means; Orthogonal Partitioning; Expectation Maximization • Anomaly Detection: special case Support Vector Machine (1-Class SVM) • Associations / Market Basket Analysis: Apriori algorithm • Feature Selection and Reduction: Attribute Importance (Minimum Description Length); Principal Components Analysis (PCA); Non-negative Matrix Factorization; Singular Value Decomposition • Text Mining: most OAA algorithms support unstructured data (e.g. customer comments, email, abstracts) • Transactional Data: most OAA algorithms support transactional data (e.g. purchase transactions, repeated measures over time) • R packages, ability to run open source: a broad range of R CRAN packages can be run as part of the database process via R to SQL transparency and/or via Embedded R mode. * included in every Oracle Database
  20. Demo
  21. Software components in the VM image (R, SQL): R 3.1.1; Oracle R Enterprise (ORE) 1.4.1; Oracle DB 12.1.0.2.0; Oracle SQL Developer 4.0.3; RStudio 0.98.1079
  22. Benefits: 6054 R packages
  23. Oracle Labs and FastR
  24. Safe Harbor Statement: The following is intended to provide some insight into a line of research in Oracle Labs. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. Oracle reserves the right to alter its development plans and practices at any time, and the development, release, and timing of any features or functionality described in connection with any Oracle product or service remains at the sole discretion of Oracle. Any views expressed in this presentation are my own and do not necessarily reflect the views of Oracle.
  25. “The Mission of Oracle Labs is straightforward: Identify, explore, and transfer new technologies that have the potential to substantially improve Oracle's business.” – Edward Screven, Chief Corporate Architect, Oracle
  26. Thoughts on R • R is excellently suited to statistical tasks. Why should one have to use C and Fortran? • R is inherently parallel as a language. Why should one have to implement parallelism separately? Diagram labels: Library 1 (R + C), Library 2 (R + Fortran)
  27. FastR • Open-source R implementation: GPL 2; https://bitbucket.org/allr/fastr; research prototype; Linux, Mac • Properties: implemented in “100 % Java”; built with Truffle (interpreter) and Graal (dynamic compiler). Diagram labels: Library 1 (R), Library 2 (R)
  28. Truffle and Graal. Figure: node transitions, specializing for types (Uninitialized, Generic). Labels: AST interpreter with uninitialized nodes; node rewriting for profiling feedback; AST interpreter with rewritten nodes; compilation using partial evaluation; compiled code; deoptimization to AST interpreter; node rewriting to update profiling feedback; recompilation using partial evaluation
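
The node rewriting named in this figure can be sketched in a few lines. The code below is a toy model in Python, not Truffle's actual (Java) API: all class names are hypothetical, and the compilation step with Graal and partial evaluation is not modeled at all. It only shows how an AST node can replace itself with a version specialized for the operand types it has observed, and fall back to a generic version when that speculation fails.

```python
class Node:
    """Base class: a node knows its parent so it can replace itself."""

    def __init__(self):
        self.parent = None

    def replace(self, new_node):
        # Rewrite the tree in place: the parent now points at new_node
        new_node.parent = self.parent
        self.parent.child = new_node
        return new_node


class Root:
    """Holds the single child node of this tiny tree."""

    def __init__(self, child):
        self.child = child
        child.parent = self

    def call(self, a, b):
        return self.child.execute(a, b)


class UninitializedAdd(Node):
    def execute(self, a, b):
        # First execution: specialize for the operand types seen
        if type(a) is int and type(b) is int:
            return self.replace(IntAdd()).execute(a, b)
        return self.replace(GenericAdd()).execute(a, b)


class IntAdd(Node):
    def execute(self, a, b):
        if type(a) is int and type(b) is int:
            return a + b  # fast path for the speculated types
        # Speculation failed: rewrite to the generic node
        return self.replace(GenericAdd()).execute(a, b)


class GenericAdd(Node):
    def execute(self, a, b):
        return float(a) + float(b)  # handles ints and floats alike


root = Root(UninitializedAdd())
states = [type(root.child).__name__]
r1 = root.call(1, 2)
states.append(type(root.child).__name__)
r2 = root.call(1.5, 2)
states.append(type(root.child).__name__)
r3 = root.call(3, 4)
states.append(type(root.child).__name__)
```

After the three calls the node has moved from `UninitializedAdd` to `IntAdd` to `GenericAdd` and stays there. A tree whose nodes have stopped changing is the kind of stable input that the figure's compilation step would then work on.
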
  29. Benchmark results: Shootout • Benchmark properties: “Computer Languages Shootout Game”; not typical R applications • Results: note the logarithmic axis; most are about 10x faster; positive exception: about 520x. Chart: logarithmic axis from 1 to 1000; geometric mean: 10x improvement over GNU R
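
The slide reports a geometric mean, which suits speedup ratios shown on a logarithmic axis because a single large outlier influences it far less than it would an arithmetic mean. The sketch below illustrates this with made-up numbers: the speedup values are hypothetical and are not the benchmark results from the talk.

```python
import math

# Hypothetical speedup factors, NOT the measured results from the talk
speedups = [4.0, 8.0, 10.0, 12.0, 520.0]

arithmetic_mean = sum(speedups) / len(speedups)

# Geometric mean: the exponential of the average of the logarithms
geometric_mean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))

print(f"arithmetic mean: {arithmetic_mean:.1f}x")
print(f"geometric mean:  {geometric_mean:.1f}x")
```

With these illustrative values the arithmetic mean is pulled above 100x by the single outlier, while the geometric mean stays much closer to the typical factor.
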
  30. PGX: Overview. PGX is a data analysis framework that supports powerful graph analyses of the data: recommendation, influencer identification, community detection, pattern matching. PGX runs fast, parallel analyses on large graphs, both on a single machine and in a distributed environment. PGX is tightly integrated with Oracle DB (RDF and PG options), which consistently manages graph data on persistent storage. Our DSL compiler technology makes it easy to switch between the two environments. Diagram labels: graph program (DSL), compiler, PGX, single machine, distributed
  31. More Information
  32. More Information. ORE Discussion Forum: https://community.oracle.com/community/developer/english/business_intelligence/data_warehousing/r; Oracle Advanced Analytics: http://www.oracle.com/technetwork/database/options/advanced-analytics/index.html; ORE blog: https://blogs.oracle.com/R/; FastR: https://bitbucket.org/allR/fastR; Graal/Truffle: https://wiki.openjdk.java.net/display/Graal/Main; Oracle Labs on OTN: http://www.oracle.com/technetwork/oracle-labs/index.html
  33. Contact. Dr. Nadine Schöne | Sales Consultant, Email: nadine.schoene@oracle.com, Tel: +49 331 200 7190. Dr. Michael Haupt | Tech Lead, FastR Project, Email: michael.haupt@oracle.com, Tel: +49 331 200 7277. ORACLE Deutschland B.V. & Co. KG, Schiffbauergasse 14, 14467 Potsdam