SlideShare una empresa de Scribd logo
1 de 58
Descargar para leer sin conexión
Oracle Advanced Analytics
Oracle R Enterprise & Oracle Data Mining
Extending the Database into a Comprehensive Advanced Analytics Platform
Charlie Berger
Sr. Director Product Management, Data Mining and Advanced Analytics
Oracle Corporation
charlie.berger@oracle.com www.twitter.com/CharlieDataMine
                                                                          R
The following is intended to outline our general product
     direction. It is intended for information purposes only, and may
     not be incorporated into any contract. It is not a commitment to
     deliver any material, code, or functionality, and should not be
     relied upon in making purchasing decisions.
     The development, release, and timing of any features or
     functionality described for Oracle’s products remains at the
     sole discretion of Oracle.




2   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle Advanced Analytics Option—Agenda
    Extending the Database into a Comprehensive Advanced Analytics Platform

    • Big Data Analytics—Oracle Advanced Analytics Option
    • Oracle Data Mining
          – SQL & PL/SQL focused in-database data mining and predictive analytics
    • Oracle R Enterprise
          – Integrates Open Source R with the Oracle Database




                                                                                    R
3   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Big Data Analytics
                                                                                                                                                      R
    But the bigger you get, the more likely you are to be doing
    extensive data mining and the more likely you are to be
    implementing or moving towards in-database analytics.

    —Jim Kobielus, a senior data management analyst with Cambridge, Mass.-based Forrester
    Research Inc. quote from “Customary Data Warehouse Concepts vs. Hadoop: Forrester
    Makes the Call”,
    Mark Brunelli, Senior News Editor
    This RSS Reprints Published: 11 Aug 2011
    http://searchdatamanagement.techtarget.com/news/2240039468/Customary-data-warehouse-concepts-vs-Hadoop-Forrester-makes-the-call?vgnextfmt=print




4   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle Advanced Analytics Option
    Transforming the Database into a Comprehensive Advanced Analytics Platform

    • Oracle Advanced Analytics Option enables companies to
                                                                                                     R
      "bring the algorithms to the data" vs. extracting the data to specialized and
      expensive dedicated statistical and data mining servers
    • Oracle Advanced Analytics Option includes:
      – Oracle Data Mining
                   • SQL & PL/SQL focused in-database data mining and predictive analytics

          – Oracle R Enterprise
                   • Integrates the Open-Source Statistical Environment R with the Oracle Database

    • Data movement is eliminated or dramatically reduced while analytical and compute
      intensive operations are performed inside the database, where the data resides, to
      increase performance, reduce cycle times required to extract information from data
      and reduce total cost of ownership over traditional statistical and data mining
      environments

5   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle In-Database Advanced Analytics
    Comprehensive Advanced Analytics Platform




Oracle R Enterprise                                                             Oracle Data Mining
• Popular open source statistical                                               • Automated knowledge discovery inside the
  programming language & environment                                              Database
• Integrated with database for scalability                                      • 12 in-database data mining algorithms
• Wide range of statistical and advanced                                        • Text mining
  analytical functions                                                          • Predictive analytics applications
• R embedded in enterprise appls & OBIEE
• Exploratory data analysis
                                                                           R      development environment
                                                                                • Star schema and transactional data mining
• Extensive graphics                                                            • Exadata "scoring" of ODM models
• Open source R (CRAN) packages                                                 • SQL Developer/Oracle Data Miner GUI
• Integrated with Hadoop for HPC

Statistics                            Advanced Analytics                       Data & Text Mining          Predictive Analytics

6   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle—Hardware and Software
    Engineering to Work Together
                                                                           • Oracle is the world's most complete,
                                                                             open, and integrated business software
                                                                             and hardware systems company
                                                                            • Data Warehousing, VLDB and ILM
                                                                              • Oracle Data Mining        New GUI
                                                                               •   12- in-DB data mining algorithms
                                                                               •   Mine star schemas, text, transactional data
                                                                               •   In-DB model build & apply—Exadata scoring
                                                                               •   50+ in-DB statistical functions
                                                                              • Oracle R Enterprise          New
                                                                               • Run R in-DB; function push down to SQL
                                                                               • Wide library of supported in-DB statistical functions
                                                                               • Embedded R supports all R packages
                                                                             Oracle has taught the RDBMS how to perform
                                                                             data mining, statistical analysis, adv. analytics, etc.

7   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle’s In-Database Advanced Analytics Option
    Value Proposition

    •10-100x PERFORMANCE
          – Integrated features of the Database
          – Perform analytics in-DB to eliminate data movement
          – Reduce information latency: days-wks  mins-hours


    •10x LOWER TOTAL COST OF OWNERSHIP
          – Eliminate/minimize expensive annual usage fees associated with
            traditional stats/mining packages
          – Leverage Oracle DB, DW & BI technology platform
8   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle Data Mining
    Building Predictive Analytics Applications
    • Oracle Data Mining provides 12 powerful in-database data mining algorithms for big
      data analytics as a native feature of the database
          –      Designed for or big data problems involving discovering patterns and relationships in large amounts of data and
                 oftentimes making predictions based on those patterns, Oracle Data Mining allows data analysts and data miners to
                 mine star schemas, transactional data and unstructured data stored inside the database, build predictive models and
                 apply them to data inside the database--all without moving data.

    • Oracle Data Miner help users mine their data and define, save and share advanced
      analytical methodologies
          –      Users who prefer a Graphical User Interface can use the Oracle Data Miner extension to SQL Developer to develop,
                 build, evaluate, share and automate analytical workflows to solve important data driven business problems.

    • Developers can use the SQL APIs and PL/SQL to build applications to automate
      knowledge discovery
          –      The Oracle Data Miner GUI generates SQL code that application developers can use to develop and deploy SQL and
                 PL/SQL based automated predictive analytics applications that run natively inside the Oracle Database.




9   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
What is Data Mining?
     •   Automatically sifts through data to find hidden patterns,
         discover new insights, and make predictions
     •   Data Mining can provide valuable results:
           •      Predict customer behavior (Classification)
           •      Predict or estimate a value (Regression)
           •      Segment a population (Clustering)
           •      Identify factors more associated with a business
                  problem (Attribute Importance)
           •      Find profiles of targeted people or items (Decision Trees)
           •      Determine important relationships and “market baskets” within the population
                  (Associations)
           •      Find fraudulent or “rare events” (Anomaly Detection)

10   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
In-Database Data Mining
         Traditional Analytics                                               Oracle Data Mining

                 Data Import                                                                              Results
                                                                                                          • Faster time for
               Data Mining                                                                                  “Data” to “Insights”
              Model “Scoring”
                                                                                                          • Lower TCO—Eliminates
              Data Preparation                                                                               • Data Movement
                    and
               Transformation
                                                                                        Savings              • Data Duplication
                                                                                                          • Maintains Security
                Data Mining
               Model Building
                                                                                                        Model “Scoring”
                                                                                                        Data remains in the Database
                Data Prep &
               Transformation                                                                           Embedded data preparation

                                                                                                        Cutting edge machine learning
                                                                                    Model “Scoring”     algorithms inside the SQL kernel of
              Data Extraction                                                      Embedded Data Prep
                                                                                                        Database
                                                                                  Model Building
                                                                                 Data Preparation
                                                                                                        SQL—Most powerful language for data
       Hours, Days or Weeks                                                  Secs, Mins or Hours        preparation and transformation
     Source    Dataset   Analytic   Process    Target
      Data     s/ Work
                Area
                            al
                         Process
                                    Output                                                              Data remains in the Database
                           ing




11    Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle Data Miner Nodes (Partial List)
Tables and Views


 Transformations


       Explore Data


                   Modeling


                                   Text
  12   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle Data Mining and Unstructured Data
     • Oracle Data Mining
       mines unstructured
       i.e. “text” data
     • Include free text and
       comments in ODM
       models
     • Cluster and Classify
       documents
     • Oracle Text
       used to preprocess
       unstructured text



13    Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle Data Mining Algorithms
 Problem                                                                       Algorithm                          Applicability
     Classification                                                            Logistic Regression (GLM)          Classical statistical technique
                                                                               Decision Trees                     Popular / Rules / transparency
                                                                               Naïve Bayes                        Embedded app
                                                                               Support Vector Machine             Wide / narrow data / text

     Regression                                                                Multiple Regression (GLM)          Classical statistical technique
                                                                               Support Vector Machine             Wide / narrow data / text
     Anomaly
                                                                               One Class SVM                      Lack examples of target field
     Detection
     Attribute                                                                                                    Attribute reduction
                                                                               Minimum Description Length (MDL)   Identify useful data
     Importance                                         A1 A2 A3 A4 A5 A6 A7
                                                                                                                  Reduce data noise
     Association                                                                                                  Market basket analysis
     Rules                                                                     Apriori                            Link analysis
                                                                                                                  Product grouping
     Clustering                                                                Hierarchical K-Means               Text mining
                                                                               Hierarchical O-Cluster             Gene and protein analysis
     Feature                                                                                                      Text analysis
     Extraction                                                                Nonnegative Matrix Factorization   Feature reduction
                                                          F1 F2 F3 F4




14    Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
SQL Developer 3.0/Oracle Data Miner 11g Release 2 GUI
     • Graphical User Interface
       for data analyst
     • SQL Developer Extension
         (OTN download)
     • Explore data—discover
       new insights
     • Build and evaluate data
       mining models
     • Apply predictive models                                              New GUI
     • Share analytical workflows
     • Deploy SQL Apply
       code/scripts




15   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle Data Miner 11g Release 2 GUI
     Churn Demo—Simple Conceptual Workflow




Churn models to product
   and “profile” likely
       churners




16   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle Data Miner 11g Release 2 GUI
      Churn Demo—Simple Conceptual Workflow




Clustering analysis to
  discover customer
 segments based on
behavior, demograhics,
plans, equipment, etc.



 17   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle Data Miner 11g Release 2 GUI
     Churn Demo—Simple Conceptual Workflow



                                                                            Market Basket Analysis to
                                                                             identify potential product
                                                                                      bundless




18   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Fraud Prediction Demo
     drop table CLAIMS_SET;                                                                POLICYNUMBER     PERCENT_FRAUD      RNK
                                                                                          ------------      -------------      ----------
     exec dbms_data_mining.drop_model('CLAIMSMODEL');                                     6532              64.78              1
     create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));   2749              64.17              2
     insert into CLAIMS_SET values                                                        3440              63.22              3
         ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');                                    654               63.1               4
                                                                                          12650             62.36              5
     insert into CLAIMS_SET values ('PREP_AUTO','ON');
     commit;
                                                                                          Automated Monthly “Application”! Just add:
     begin                                                                                  Create
     dbms_data_mining.create_model('CLAIMSMODEL', 'CLASSIFICATION',                         View CLAIMS2_30
       'CLAIMS', 'POLICYNUMBER', null, 'CLAIMS_SET');                                       As
     end;                                                                                   Select * from CLAIMS2
     /                                                                                      Where mydate > SYSDATE – 30

     -- Top 5 most suspicious fraud policy holder claims
     select * from
     (select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud,
         rank() over (order by prob_fraud desc) rnk from
     (select POLICYNUMBER, prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
     from CLAIMS
     where PASTNUMBEROFCLAIMS in ('2to4', 'morethan4')))
     where rnk <= 5
     order by percent_fraud desc;


19   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Exadata + Data Mining 11g Release 2
     “DM Scoring” Pushed to Storage!


                                                                Faster




     • In 11g Release 2, SQL predicates and Oracle Data Mining models are pushed to
       storage level for execution
         For example, find the US customers likely to churn:
         select cust_id
         from customers
         where region = ‘US’
         and prediction_probability(churnmod,‘Y’ using *) > 0.8;


20   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Real-time Prediction for a Customer
     • On-the-fly, single record apply with new data (e.g. from                                                 call center)
           Select prediction_probability(CLAS_DT_5_2, 'Yes'
             USING 7800 as bank_funds, 125 as checking_amount, 20 as
             credit_balance, 55 as age, 'Married' as marital_status,
               250 as MONEY_MONTLY_OVERDRAWN, 1 as house_ownership)
           from dual;
                                                                    Call Center   Social Media
                                                                                                   Branch
                              ECM                                                                                       BI
                                                                                                 Get Advice
                                             Web                                                              Mobile




                                                                                  Email



                                                                                    CRM


21   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Ability to Import/Export 3rd Party DM Models
     • ODM 11g Release 2 adds ability to import 3rd party models (PMML), convert
       to native ODM models and score them in-DB
           • Supported models for ODM model export:
                    • Decision Trees (PMML)
           • Supported algorithms for ODM model import:
                    • Multiple regression models (PMML)
                    • Logistic regression models (PMML)

     • Benefits
           • SAS, SPSS, R, etc. data mining models can scored on Exadata
                    • Imported dm models become native ODM models and inherit all ODM benefits including scoring at
                      Exadata storage layer, 1st class objects, security, etc.

                                                                            Faster




22   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle Communications Industry Data Model
     Better Information for OBIEE Dashboards
                                                                            Oracle Data Mining discovers different
                             Oracle Data Mining                             profiles of churning customers and their
                             identifies key contributors                    profile, both critical to developing a
                             to customer churn                              proactive response to reduce churn.




23   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle Communications Industry Data Model Example
     Better Information for OBIEE Dashboards
                                                                            ODM’s predictions & probabilities are
                                                                            available in the Database for reporting
                                                                            using Oracle BI EE and other tools




24   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Exadata with Analytics and Business Intelligence
      Better Together

• In-database
  data mining
  builds
  predictive
  models that
  predict
  customer
  behavior
• OBIEE’s
  integrated
  spatial
  mapping
                                                     Customer “most likely” be be
  shows                                              HIGH and VERY HIGH value
  where                                              customer in the future


 25   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Exadata with Analytics and Business Intelligence
       Better Together
                                                                              Drill-through for details about top
• Exadata                                                                     factors that define HIGH and VERY
  power                                                                       HIGH value customers
• OBIEE
  ease-of-use




  26   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Fusion HCM Predictive Analytics
     Factory Installed PA/ODM Methodologies




       Oracle Data Mining’s factory-installed predictive analytics
       show employees likely to leave, top reasons, expected
       performance and real-time "What if?" analysis


27   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Predictive Analytics Applications                                           (partial list)

     • Oracle Industry Data Models—factory installed data mining for specific industries
         • Communications Industry Data Model—churn, segmentation, profiling, etc.
         • Retail Industry Data Model—market basket analysis, loyalty, etc.
     • Oracle Spend Classification—auto review/real-time correction of submission mistakes
     • Oracle Fusion Human Capital Management (HCM)
         • Predictive Workforce—Employee turnover and performance prediction
     • Oracle Fusion CRM
         • Sales Prediction—Prediction of Sales opportunities, what to sell, amount, timing, etc.
     • Oracle Identify Management
         • Adaptive Access Manager Real-time Security at user login
     • Oracle FMW
         • Complex Event Processing integrated with integrated ODM models
     • Oracle ACS
         • Predictive Incident Monitoring Service for Oracle Database customers

28   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Learn More
     Oracle Data Mining on OTN                                                     Oracle Data Mining Blog




                                                                       Oracle Data Mining
29   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Learn More




30   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
R Statistical Programming Language

                                                                            Open source language and
                                                                            environment

                                                                            Used for statistical computing
                                                                            and graphics

                                                                            Strength in easily producing
                                                                            publication-quality plots

                                                                            Highly extensible with open
                                                                            source community R packages



31   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Growing Popularity




 • R’s rapid adoption over several years
   has earned its reputation as a new
   statistical software standard
         – Rival to SAS and SPSS
         While it is difficult to calculate exactly how many people use R, those most familiar
        with the software estimate that close to 250,000 people work with it regularly.
        “Data Analysts Captivated by R’s Power”, New York Times, Jan 6, 2009

      http://www.r-project.org/
32   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Typical R Approach



                                                                            Statistical and
                                                                            advanced analyses are
                                                                            run and stored on the
                                                                            user’s laptop




33   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
What Are                                                         ’s Challenges?

     1. R is memory constrained
             – R processing is single threaded - does not exploit available
               compute infrastructure
             – R lacks industrial strength for enterprise use cases
     2. R has lacked mindshare in Enterprise market
             – R is still met with caution by the long established SAS and
               IBM/SPSS statistical community
                   • However, major university (e.g. Yale ) Statistics courses now taught in R
                   • The FDA has recently shown indications for approval of new drugs for which
                     the submission’s data analysis was performed using R

34   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle R Enterprise Approach


                                                                             Data and statistical analysis
                                                                             are stored and run in-
                                                                             database
                                                                             Same R user experience &
                                                                     R       same R clients
                                                               Open Source

                                                                             Embed in operational
                                                                             systems
                                                                             Complements Oracle Data
                                                                             Mining


35   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
What is                                                                R Enterprise?
     • Integrates popular open source R statistical programming language
       and environment with Oracle Database 11g Release 2 for big data.
            – Designed for problems involving large amounts of data, Oracle R Enterprise
              integrates R wit the database and enables users to run R commands on database-
              resident data, develop and refine R scripts, and leverage the parallelism and
              scalability of the database.
            – Data analysts can run the latest R open source packages and develop and
              operationalize R scripts for analytical applications in one step—without having to
              learn SQL.
            – Oracle R Enterprise performs function push down for in-database execution of base
              R and popular R packages.
            – Because it runs as an embedded component of the database, Oracle R Enterprise
              can run any R package either by function push down or via embedded R while the
              database manages the data served to the R engine.

36   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
What is                                                                R Enterprise?
     • Oracle R Enterprise brings R’s statistical
       functionality closer to the Oracle Database
     1. Eliminate R’s memory constraint by enabling R
        to work directly/transparently on database objects                                     R
                                                                                            Open Source

             –          Allows R to run on very large data sets, tables, views
     2. Architected for Enterprise production infrastructure
             –          Automatically exploits database parallelism without requiring
                        parallel R programming
             –          Build and immediately deploy R scripts
     3. Oracle R leverages the latest R algorithms and packages
             –          Embedded R runs as an embedded component of the DBMS server


37   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Architecture and Performance
     • Transparently function-ships R
       constructs to database via R  SQL
       translation
       –Data structures
       –Functions
           • Data manipulation functions (select, project, join)




                                                                            Seconds
           • Basic statistical functions (avg, sum, summary)
           • Advanced statistical functions(gamma, beta)

     • Performs data-heavy computations in
       database
       –R for summary analysis and graphics
     • Transparent implementation enables
       using wide range of R “packages” from
       open source community

38   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle R Enterprise Architecture                                                                                   Worst

                                          Use Case: Using ONTIME airline data, of the 36 busiest
                                      airports, run a box-plot analysis of the best/worst airports for
                                                                                       arrival delay?



R workspace console
                                                                                                                         Best




                Function push-down                                           Oracle statistics engine
                  – data transformation &                                                                  OBIEE, Web
                                                      statistics                                R
                                                                                             Open Source
                                                                                                           Services



                      Development                                             Production                   Consumption

 39   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Better Business Intelligence
     Enrich BI Dashboards with Statistics, Data Mining and Adv. Analytics
     • Ad hoc
     • Exploratory data analysis
     • Interactive graphics
     • Problem-solving
            Oracle R Enterprise's
           and ODM's results become a
           data feed for OBIEE




40   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
How Oracle R Enterprise Works
     ORE Computation Engines
                                                                                                    R
                                                                                                 Open Source

     • Oracle R Enterprise tightly integrates R with the database and fully
       manages the data operated upon by R code.
           – The database is always involved in serving up data to the R code.
           – Oracle R Enterprise runs in the Oracle Database.
     • Oracle R Enterprise eliminates data movement and duplication, maintains
       security and minimizes latency time from raw data to new information.
     • Three ORE Computation Engines
           – Oracle R Enterprise provides three different interfaces between the open-source R engine and
             the Oracle database:
                 1. Oracle R Enterprise (ORE) Transparency Layer
                 2. Oracle Statistics Engine
                 3. Embedded R

41   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle R Enterprise Compute Engines

                                                          1                                                                2                                                           3
                                                                                                  Oracle Database
                                               Other R                                                                                                                            Other R
           R Engine                            packages
                                                                                  SQL
                                                                                                    User tables
                                                                                                                                           R               R Engine               packages
                                                                                                                          ?x
      Oracle R Enterprise packages                                                               R                                                    Oracle R Enterprise packages
                                                                                  Results     Open Source                                Results


User R Engine on desktop                                                            Database Compute Engine                                    R Engine(s) spawned by Oracle DB
•   R-SQL Transparency Framework intercepts R                                        •      Scale to large datasets                            •   Database can spawn multiple R engines for
    functions for scalable                                                                                                                         database-managed parallelism
                                                                                     •      Access tables, views, and external tables,
    in-database execution                                                                                                                      •   Efficient data transfer to spawned
                                                                                            as well as data through
•   Function intercept for data transforms,                                                 DB LINKS                                               R engines
    statistical functions and advanced analytics                                                                                               •   Emulate map-reduce style algorithms and
                                                                                     •      Leverage database SQL parallelism                      applications
•   Interactive display of graphical results and                                                                                               •   Enables “lights-out” execution of R scripts
                                                                                     •      Leverage new and existing
    flow control as in standard R
                                                                                            in-database statistical and data mining
•   Submit entire R scripts for execution by                                                capabilities
    Oracle Database

      42   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
How Oracle R Enterprise Works
     ORE Computation Engines                                                            R
                                                                                    Open Source


     1. Oracle R Enterprise (ORE) Transparency Layer
           – Traps all R commands and scripts prior to execution and looks for
             opportunities to function ship them to the database for native execution
           – ORE transparency layer converts R commands/scripts into SQL equivalents
             and thereby leverages the database as a compute engine.




43   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
How Oracle R Enterprise Works
     ORE Computation Engines                                                                               R
                                                                                                        Open Source


     2. In-Database Statistics Engine
           – Significantly extends the Oracle Database’s library of                     All Base R functions
             statistical functions and advanced analytical computations                 R Multiple Regression
           – Provides support for the complete R language and                           …. Driven by customers
             statistical functions found in Base R and selected R
             packages based on customer usage                                            ORE Functions
                                                                                         •   ORE SUMMARY
                    • Open source packages - written entirely in R language with only    •   ORE FREQUENCY
                      the functions for which we have implemented SQL counterparts -     •   ORE CORR
                                                                                         •   ORE UNIVARITE
                      can be translated to execute in database.                          •   ORE CROSSTAB
           – Without anything visibly different to the R users, their R                  •   ORE RANK
                                                                                         •   ORE SORT
             commands and scripts are oftentimes accelerated by a                        •   …
             factor of 10-100x
           – ORE Functions for most common/base traditional statistical
             software procedures
44   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Transparency Framework + Statistics Engine
     • The following operators and functions are supported. See R documentation for syntax and
       semantics of these operators/function. Syntax and semantics for these items remain unchanged
       when used on a database mapped data type (also henceforth known as an Oracle R Enterprise
       data type):

     • Mathematical transformations: abs, sign, sqrt, ceiling, floor, trunc, cummax, cummin, cumprod,
       cumsum, log, log10, log2, log1p, acos, acosh, asin, asinh, atan, atanh, exp, expm1, cos, cosh,
       sin, sinh, tan, tanh, gamma, lgamma, digamma, trigamma, round, signif, pmin, pmax, zapsmall
     • Basic statistics: mean, summary, min, max, sum, prod, any, all, median, range,
     • IQR, fivenum, mad, quantile, sd, var, CSS, CV, kurtosis, skewness, USS,
     • STDERR, table, rowSums, colSums, rowMeans, colMeans
     • Arithmetic operators:+, -, *, /, ^, %%, %/%
     • Comparison operators: ==, >, <, !=, <=, >=




45    Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Transparency Framework + Statistics Engine
     • Logical operators: &, |, xor                                         • Graphics: hist, boxplot, plot, smoothScatter
     • Set operations: unique, %in%, match, union, intersect,               • Garbage collection of temporary objects created by the
       setdiff, setequal, is.element                                          transparency
     • Assignment operator forms: <-, =, ->                                 • framework: gc
     • String operations: tolower, toupper, casefold, toString, chatr,      • Conversion functions: as.ore.{character, factor,
       sub, gsub, substr, substring, paste, nchar                             integer, logical, numeric, vector}
     • Combine Data Frame: cbind, rbind, merge                              • Test functions: is.ore.{character, factor, integer,
     • Combine vectors: c, append, stack                                      logical, numeric, vector}
     • Vector creation: rep, ifelse                                         • Hypothesis testing: wilcox.test, ks.test, var.test, binom.test,
     • Subset: [, [[, $, head, tail, window, subset, Filter, na.omit,         chisq.test, t.test, bartlett.test
       na.exclude, complete.cases                                           • Bessel Functions: Bessel(I,J,K,Y)
     • Data order: order, rank, sort, rev                                   • Gamma Functions: gamma, lgamma, digamma, trigamma
     • Data reshaping: split, unlist                                        • Error Functions: ERF, ERFC
     • Data processing: eval, with, within, transform                       • Covariance and Correlation matrices: cov, cor
     • Apply variants: tapply, lapply, sapply                               • Probability, Density and Quantile Functions: Poisson, Beta,
     • Aggregation: aggregate                                                 Normal,
     • Regression: ore.lm() - a variant of lm()                             • Chi-Squared, F, Gamma, Hyper-geometric, Probit, ProbNegative,
     • Anova and Boxcox: Working on output of ore.lm()                        ProbNorm, ProbT
     • Step-wise regression: step                                           • Matrix Operations: Matrix multiplication, Crossprod, tcrossprod,
     • Special value checks: is.na, is.finite, is.infinite, is.nan            solve, backsolve, eigen, svd
     • Metadata functions: attributes, nrow, NROW, ncol, NCOL,
       nlevels, names, row, col, dimnames, dim, length,
       row.names, col.names, levels, reorder



46   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
How Oracle R Enterprise Works
     ORE Computation Engines                                                                                 R
                                                                                                         Open Source


     3. Embedded R Engine
           – The R Engine spawned by Oracle DB is also used for lights-out scheduled
             execution of R scripts; that is, scheduling or triggering R scripts packaged
             inside a PL/SQL or SQL query. Oracle R Enterprise R packages are also
             installed with the embedded R engine.
                    • For R functions not able to be mapped to native in-database functions, Oracle R
                      Enterprise makes “extproc” remote procedure calls to multiple R engines running on
                      multiple database servers/nodes
                    • This Oracle R Enterprise embedded layer uses the database as a data provider
                      providing data level parallelism to R code
                    • The interfaces, called embedded-layer RQ functions, pass streams of data to one or
                      more instances of R for (parallel) row by row processing (scoring), groups of rows
                      processing (building a model one per group) and table of rows processing (building a
                      model
           – These functions are used for “operationalizing” R code to run in production
47   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle R Enterprise Example
     Illustrates Use of all 3 Engines from within 1 R Script                   R
                                                                            Open Source




48   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
R Data and Summary Statistic
     CARSTATS
     head(CARSTATS)
     • Prints top rows of data set




     summary(CARSTATS)
     • Provides summary
       statistics for data set




49   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
R Graphics
     R> plot(CARSTATS$weight, CARSTATS$mpg)




                                                                            Heavy cars




50   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
R Graphics
      R> boxplot(split(CARSTATS$mpg, CARSTATS$model.year), col =
        "green")

                                                                            MPG increases
                                                                            over time…




51   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
R Graphics

     R> plot(CARSTATS)

     • Supports
       Exploratory Data
       Analysis for Oracle
       data




52   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle R Enterprise ARIMA Forecasting Script
     year200801 <- ONTIME_S[(ONTIME_S$YEAR==2008)& (ONTIME_S$MONTH==1),]
     y <- ore.pull(year200801)
     gc()
     delays <- tapply(y$ARRDELAY, y$DAYOFMONTH, mean, na.rm=TRUE)
     delays <- ts(delays, start=1, end=31, frequency=1)
     # Create a Kalman filter with the first 5 delays and predict the rest
     preds <- c()
     ses <- c()
     # 1 step predictions
     for (i in 5:length(delays))
     {
       fit <- arima(delays[1:i], c(1,2,1))
       # predict 1 step into the future.
       pred <- predict(fit)
       preds <- c(preds, pred$pred)
       ses <- c(ses, pred$se)
     }
     plot(5:length(delays), preds, type='l', col='green',
       ylim=range(c(preds+2*ses, preds-2*ses)), xlab="DEay of month",
       ylab="Predicted average delay (in minutes)",
       main="Average delays by day for January 2008")
     lines(5:length(delays), preds+2*ses, col='red')
     lines(5:length(delays), preds-2*ses, col='red')
     points(5:length(delays), as.vector(delays[5:length(delays)]))

     legend( 23, -8, c("Delay", "Predicted delay", "2 se confidence"),
       col=c(1, 3, 8), lty=c(0, 1, 1), pch=c(1, -1, -1), merge=TRUE)

53    Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Statistical Quality Control R Applications
 • The qcc package for R:
        – Plots Shewhart quality
          control charts for
          continuous, attribute and
          count data;
        – • lots Cusum and EWMA
          P
          charts for continuous data;
        – Performs process
           capability analyses;
        – Creates Pareto charts and
          cause-and-effect diagrams


http://www.stat.unipg.it/luca/Rnews_2004-1-pag11-17.pdf


   54    Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Statistical Quality Control R Applications
Process capability studies to                                               Pareto (80/20 rule) analysis
characterize and understand the                                             to understand which few
behavior of a "process"                                                     factors contribute most




        http://www.stat.unipg.it/luca/Rnews_2004-1-pag11-17.pdf

55   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Big Data Appliance + R
       For Compute Intensive Operations Using R
                        R workspace console


                                                                              Oracle statistics engine




                                                                                                            OBIEE, Web
                                                                                                            Services
                                         Function push-down
                                           – data transf ormation &
                                        statistics
logreg <- function(input, iterations, dims, alpha){                                                       Massively
  plane = rep(0, dims)                                                                                     parallel
  g = function(z) 1/(1 + exp(-z))                                                                        computations
  for (i in 1:iterations) {
    z = hdfs.get(hadoop.run( input,
          export = c(plane, g),
          map = logisticRegressionMapper,
          reduce = logisticRegressionReducer))
    gradient = c(z$val[1], z$val[2])
    plane = plane + alpha * gradient
  }
  plane
}
x = hdfs.push(WEBSESSIONS)
logreg(x, 10, 2, 0.05)


  56   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle In-Database Advanced Analytics
     Comprehensive Advanced Analytics Platform




 Oracle R Enterprise                                                             Oracle Data Mining
 • Popular open source statistical                                               • Automated knowledge discovery inside the
   programming language & environment                                              Database
 • Integrated with database for scalability                                      • 12 in-database data mining algorithms
 • Wide range of statistical and advanced                                        • Text mining
   analytical functions                                                          • Predictive analytics applications
 • R embedded in enterprise appls & OBIEE
 • Exploratory data analysis
                                                                            R      development environment
                                                                                 • Star schema and transactional data mining
 • Extensive graphics                                                            • Exadata "scoring" of ODM models
 • Open source R (CRAN) packages                                                 • SQL Developer/Oracle Data Miner GUI
 • Integrated with Hadoop for HPC

Statistics                             Advanced Analytics                       Data & Text Mining          Predictive Analytics

57   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
58   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.

Más contenido relacionado

La actualidad más candente

Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Rajesh Kumar
 
Azure Data Factory Data Flow
Azure Data Factory Data FlowAzure Data Factory Data Flow
Azure Data Factory Data FlowMark Kromer
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptxWasm1953
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Adrien Blind
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeDatabricks
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringDurga Gadiraju
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?James Serra
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)James Serra
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks FundamentalsDalibor Wijas
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsDatabricks
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing conceptspcherukumalla
 
Scaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with DatabricksScaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with DatabricksDatabricks
 
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...Databricks
 
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020Timothy McAliley
 

La actualidad más candente (20)

Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
 
Azure Data Factory Data Flow
Azure Data Factory Data FlowAzure Data Factory Data Flow
Azure Data Factory Data Flow
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta Lake
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
Scaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with DatabricksScaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with Databricks
 
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
 
ETL
ETLETL
ETL
 
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
 

Similar a Oracle Advanced Analytics

Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3xKinAnx
 
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data avanttic Consultoría Tecnológica
 
Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Jeffrey T. Pollock
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinerySteve Loughran
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranJAX London
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseJeffrey T. Pollock
 
2009.10.22 S308460 Cloud Data Services
2009.10.22 S308460  Cloud Data Services2009.10.22 S308460  Cloud Data Services
2009.10.22 S308460 Cloud Data ServicesJeffrey T. Pollock
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...Institute of Contemporary Sciences
 
Oracle’s Advanced Analytics & Machine Learning 12.2c New Features & Road Map;...
Oracle’s Advanced Analytics & Machine Learning 12.2c New Features & Road Map;...Oracle’s Advanced Analytics & Machine Learning 12.2c New Features & Road Map;...
Oracle’s Advanced Analytics & Machine Learning 12.2c New Features & Road Map;...Charlie Berger
 
Big dataappliance hadoopworld_final
Big dataappliance hadoopworld_finalBig dataappliance hadoopworld_final
Big dataappliance hadoopworld_finaljdijcks
 
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...Cloudera, Inc.
 
Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15
Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15
Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15Dave Segleau
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data SolutionsMark Kromer
 
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendIntroducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendCaserta
 
2. nick whitehead&ajlec bojan final bi
2. nick whitehead&ajlec bojan final  bi2. nick whitehead&ajlec bojan final  bi
2. nick whitehead&ajlec bojan final biDoina Draganescu
 
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Cloudera, Inc.
 
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopBusiness Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopCloudera, Inc.
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...Amr Awadallah
 

Similar a Oracle Advanced Analytics (20)

Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3
 
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
 
Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)
 
Meetup Oracle Database BCN: 2.1 Data Management Trends
Meetup Oracle Database BCN: 2.1 Data Management TrendsMeetup Oracle Database BCN: 2.1 Data Management Trends
Meetup Oracle Database BCN: 2.1 Data Management Trends
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinery
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve Loughran
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San Jose
 
2009.10.22 S308460 Cloud Data Services
2009.10.22 S308460  Cloud Data Services2009.10.22 S308460  Cloud Data Services
2009.10.22 S308460 Cloud Data Services
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 
Oracle’s Advanced Analytics & Machine Learning 12.2c New Features & Road Map;...
Oracle’s Advanced Analytics & Machine Learning 12.2c New Features & Road Map;...Oracle’s Advanced Analytics & Machine Learning 12.2c New Features & Road Map;...
Oracle’s Advanced Analytics & Machine Learning 12.2c New Features & Road Map;...
 
Big dataappliance hadoopworld_final
Big dataappliance hadoopworld_finalBig dataappliance hadoopworld_final
Big dataappliance hadoopworld_final
 
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
 
Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15
Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15
Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data Solutions
 
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendIntroducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
 
2. nick whitehead&ajlec bojan final bi
2. nick whitehead&ajlec bojan final  bi2. nick whitehead&ajlec bojan final  bi
2. nick whitehead&ajlec bojan final bi
 
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
 
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopBusiness Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
 

Oracle Advanced Analytics

  • 1. Oracle Advanced Analytics Oracle R Enterprise & Oracle Data Mining Extending the Database into a Comprehensive Advanced Analytics Platform Charlie Berger Sr. Director Product Management, Data Mining and Advanced Analytics Oracle Corporation charlie.berger@oracle.com www.twitter.com/CharlieDataMine R
  • 2. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle. 2 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 3. Oracle Advanced Analytics Option—Agenda Extending the Database into a Comprehensive Advanced Analytics Platform • Big Data Analytics—Oracle Advanced Analytics Option • Oracle Data Mining – SQL & PL/SQL focused in-database data mining and predictive analytics • Oracle R Enterprise – Integrates Open Source R with the Oracle Database R 3 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 4. Big Data Analytics R But the bigger you get, the more likely you are to be doing extensive data mining and the more likely you are to be implementing or moving towards in-database analytics. —Jim Kobielus, a senior data management analyst with Cambridge, Mass.-based Forrester Research Inc. quote from “Customary Data Warehouse Concepts vs. Hadoop: Forrester Makes the Call”, Mark Brunelli, Senior News Editor This RSS Reprints Published: 11 Aug 2011 http://searchdatamanagement.techtarget.com/news/2240039468/Customary-data-warehouse-concepts-vs-Hadoop-Forrester-makes-the-call?vgnextfmt=print 4 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 5. Oracle Advanced Analytics Option Transforming the Database into a Comprehensive Advanced Analytics Platform • Oracle Advanced Analytics Option enables companies to R "bring the algorithms to the data" vs. extracting the data to specialized and expensive dedicated statistical and data mining servers • Oracle Advanced Analytics Option includes: – Oracle Data Mining • SQL & PL/SQL focused in-database data mining and predictive analytics – Oracle R Enterprise • Integrates the Open-Source Statistical Environment R with the Oracle Database • Data movement is eliminated or dramatically reduced while analytical and compute intensive operations are performed inside the database, where the data resides, to increase performance, reduce cycle times required to extract information from data and reduce total cost of ownership over traditional statistical and data mining environments 5 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 6. Oracle In-Database Advanced Analytics Comprehensive Advanced Analytics Platform Oracle R Enterprise Oracle Data Mining • Popular open source statistical • Automated knowledge discovery inside the programming language & environment Database • Integrated with database for scalability • 12 in-database data mining algorithms • Wide range of statistical and advanced • Text mining analytical functions • Predictive analytics applications • R embedded in enterprise appls & OBIEE • Exploratory data analysis R development environment • Star schema and transactional data mining • Extensive graphics • Exadata "scoring" of ODM models • Open source R (CRAN) packages • SQL Developer/Oracle Data Miner GUI • Integrated with Hadoop for HPC Statistics Advanced Analytics Data & Text Mining Predictive Analytics 6 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 7. Oracle—Hardware and Software Engineering to Work Together • Oracle is the world's most complete, open, and integrated business software and hardware systems company • Data Warehousing, VLDB and ILM • Oracle Data Mining New GUI • 12- in-DB data mining algorithms • Mine star schemas, text, transactional data • In-DB model build & apply—Exadata scoring • 50+ in-DB statistical functions • Oracle R Enterprise New • Run R in-DB; function push down to SQL • Wide library of supported in-DB statistical functions • Embedded R supports all R packages Oracle has taught the RDBMS how to perform data mining, statistical analysis, adv. analytics, etc. 7 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 8. Oracle’s In-Database Advanced Analytics Option Value Proposition •10-100x PERFORMANCE – Integrated features of the Database – Perform analytics in-DB to eliminate data movement – Reduce information latency: days-wks  mins-hours •10x LOWER TOTAL COST OF OWNERSHIP – Eliminate/minimize expensive annual usage fees associated with traditional stats/mining packages – Leverage Oracle DB, DW & BI technology platform 8 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 9. Oracle Data Mining Building Predictive Analytics Applications • Oracle Data Mining provides 12 powerful in-database data mining algorithms for big data analytics as a native feature of the database – Designed for or big data problems involving discovering patterns and relationships in large amounts of data and oftentimes making predictions based on those patterns, Oracle Data Mining allows data analysts and data miners to mine star schemas, transactional data and unstructured data stored inside the database, build predictive models and apply them to data inside the database--all without moving data. • Oracle Data Miner help users mine their data and define, save and share advanced analytical methodologies – Users who prefer a Graphical User Interface can use the Oracle Data Miner extension to SQL Developer to develop, build, evaluate, share and automate analytical workflows to solve important data driven business problems. • Developers can use the SQL APIs and PL/SQL to build applications to automate knowledge discovery – The Oracle Data Miner GUI generates SQL code that application developers can use to develop and deploy SQL and PL/SQL based automated predictive analytics applications that run natively inside the Oracle Database. 9 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 10. What is Data Mining? • Automatically sifts through data to find hidden patterns, discover new insights, and make predictions • Data Mining can provide valuable results: • Predict customer behavior (Classification) • Predict or estimate a value (Regression) • Segment a population (Clustering) • Identify factors more associated with a business problem (Attribute Importance) • Find profiles of targeted people or items (Decision Trees) • Determine important relationships and “market baskets” within the population (Associations) • Find fraudulent or “rare events” (Anomaly Detection) 10 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 11. In-Database Data Mining Traditional Analytics Oracle Data Mining Data Import Results • Faster time for Data Mining “Data” to “Insights” Model “Scoring” • Lower TCO—Eliminates Data Preparation • Data Movement and Transformation Savings • Data Duplication • Maintains Security Data Mining Model Building Model “Scoring” Data remains in the Database Data Prep & Transformation Embedded data preparation Cutting edge machine learning Model “Scoring” algorithms inside the SQL kernel of Data Extraction Embedded Data Prep Database Model Building Data Preparation SQL—Most powerful language for data Hours, Days or Weeks Secs, Mins or Hours preparation and transformation Source Dataset Analytic Process Target Data s/ Work Area al Process Output Data remains in the Database ing 11 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 12. Oracle Data Miner Nodes (Partial List) Tables and Views Transformations Explore Data Modeling Text 12 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 13. Oracle Data Mining and Unstructured Data • Oracle Data Mining mines unstructured i.e. “text” data • Include free text and comments in ODM models • Cluster and Classify documents • Oracle Text used to preprocess unstructured text 13 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 14. Oracle Data Mining Algorithms Problem Algorithm Applicability Classification Logistic Regression (GLM) Classical statistical technique Decision Trees Popular / Rules / transparency Naïve Bayes Embedded app Support Vector Machine Wide / narrow data / text Regression Multiple Regression (GLM) Classical statistical technique Support Vector Machine Wide / narrow data / text Anomaly One Class SVM Lack examples of target field Detection Attribute Attribute reduction Minimum Description Length (MDL) Identify useful data Importance A1 A2 A3 A4 A5 A6 A7 Reduce data noise Association Market basket analysis Rules Apriori Link analysis Product grouping Clustering Hierarchical K-Means Text mining Hierarchical O-Cluster Gene and protein analysis Feature Text analysis Extraction Nonnegative Matrix Factorization Feature reduction F1 F2 F3 F4 14 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 15. SQL Developer 3.0/Oracle Data Miner 11g Release 2 GUI • Graphical User Interface for data analyst • SQL Developer Extension (OTN download) • Explore data—discover new insights • Build and evaluate data mining models • Apply predictive models New GUI • Share analytical workflows • Deploy SQL Apply code/scripts 15 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 16. Oracle Data Miner 11g Release 2 GUI Churn Demo—Simple Conceptual Workflow Churn models to product and “profile” likely churners 16 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 17. Oracle Data Miner 11g Release 2 GUI Churn Demo—Simple Conceptual Workflow Clustering analysis to discover customer segments based on behavior, demograhics, plans, equipment, etc. 17 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 18. Oracle Data Miner 11g Release 2 GUI Churn Demo—Simple Conceptual Workflow Market Basket Analysis to identify potential product bundless 18 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 19. Fraud Prediction Demo drop table CLAIMS_SET; POLICYNUMBER PERCENT_FRAUD RNK ------------ ------------- ---------- exec dbms_data_mining.drop_model('CLAIMSMODEL'); 6532 64.78 1 create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000)); 2749 64.17 2 insert into CLAIMS_SET values 3440 63.22 3 ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES'); 654 63.1 4 12650 62.36 5 insert into CLAIMS_SET values ('PREP_AUTO','ON'); commit; Automated Monthly “Application”! Just add: begin Create dbms_data_mining.create_model('CLAIMSMODEL', 'CLASSIFICATION', View CLAIMS2_30 'CLAIMS', 'POLICYNUMBER', null, 'CLAIMS_SET'); As end; Select * from CLAIMS2 / Where mydate > SYSDATE – 30 -- Top 5 most suspicious fraud policy holder claims select * from (select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud, rank() over (order by prob_fraud desc) rnk from (select POLICYNUMBER, prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud from CLAIMS where PASTNUMBEROFCLAIMS in ('2to4', 'morethan4'))) where rnk <= 5 order by percent_fraud desc; 19 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 20. Exadata + Data Mining 11g Release 2 “DM Scoring” Pushed to Storage! Faster • In 11g Release 2, SQL predicates and Oracle Data Mining models are pushed to storage level for execution For example, find the US customers likely to churn: select cust_id from customers where region = ‘US’ and prediction_probability(churnmod,‘Y’ using *) > 0.8; 20 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 21. Real-time Prediction for a Customer • On-the-fly, single record apply with new data (e.g. from call center) Select prediction_probability(CLAS_DT_5_2, 'Yes' USING 7800 as bank_funds, 125 as checking_amount, 20 as credit_balance, 55 as age, 'Married' as marital_status, 250 as MONEY_MONTLY_OVERDRAWN, 1 as house_ownership) from dual; Call Center Social Media Branch ECM BI Get Advice Web Mobile Email CRM 21 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 22. Ability to Import/Export 3rd Party DM Models • ODM 11g Release 2 adds ability to import 3rd party models (PMML), convert to native ODM models and score them in-DB • Supported models for ODM model export: • Decision Trees (PMML) • Supported algorithms for ODM model import: • Multiple regression models (PMML) • Logistic regression models (PMML) • Benefits • SAS, SPSS, R, etc. data mining models can scored on Exadata • Imported dm models become native ODM models and inherit all ODM benefits including scoring at Exadata storage layer, 1st class objects, security, etc. Faster 22 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 23. Oracle Communications Industry Data Model Better Information for OBIEE Dashboards Oracle Data Mining discovers different Oracle Data Mining profiles of churning customers and their identifies key contributors profile, both critical to developing a to customer churn proactive response to reduce churn. 23 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 24. Oracle Communications Industry Data Model Example Better Information for OBIEE Dashboards ODM’s predictions & probabilities are available in the Database for reporting using Oracle BI EE and other tools 24 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 25. Exadata with Analytics and Business Intelligence Better Together • In-database data mining builds predictive models that predict customer behavior • OBIEE’s integrated spatial mapping Customer “most likely” be be shows HIGH and VERY HIGH value where customer in the future 25 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 26. Exadata with Analytics and Business Intelligence Better Together Drill-through for details about top • Exadata factors that define HIGH and VERY power HIGH value customers • OBIEE ease-of-use 26 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 27. Fusion HCM Predictive Analytics Factory Installed PA/ODM Methodologies Oracle Data Mining’s factory-installed predictive analytics show employees likely to leave, top reasons, expected performance and real-time "What if?" analysis 27 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 28. Predictive Analytics Applications (partial list) • Oracle Industry Data Models—factory installed data mining for specific industries • Communications Industry Data Model—churn, segmentation, profiling, etc. • Retail Industry Data Model—market basket analysis, loyalty, etc. • Oracle Spend Classification—auto review/real-time correction of submission mistakes • Oracle Fusion Human Capital Management (HCM) • Predictive Workforce—Employee turnover and performance prediction • Oracle Fusion CRM • Sales Prediction—Prediction of Sales opportunities, what to sell, amount, timing, etc. • Oracle Identify Management • Adaptive Access Manager Real-time Security at user login • Oracle FMW • Complex Event Processing integrated with integrated ODM models • Oracle ACS • Predictive Incident Monitoring Service for Oracle Database customers 28 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 29. Learn More Oracle Data Mining on OTN Oracle Data Mining Blog Oracle Data Mining 29 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 30. Learn More 30 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 31. R Statistical Programming Language Open source language and environment Used for statistical computing and graphics Strength in easily producing publication-quality plots Highly extensible with open source community R packages 31 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 32. Growing Popularity • R’s rapid adoption over several years has earned its reputation as a new statistical software standard – Rival to SAS and SPSS While it is difficult to calculate exactly how many people use R, those most familiar with the software estimate that close to 250,000 people work with it regularly. “Data Analysts Captivated by R’s Power”, New York Times, Jan 6, 2009 http://www.r-project.org/ 32 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 33. Typical R Approach Statistical and advanced analyses are run and stored on the user’s laptop 33 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 34. What Are ’s Challenges? 1. R is memory constrained – R processing is single threaded - does not exploit available compute infrastructure – R lacks industrial strength for enterprise use cases 2. R has lacked mindshare in Enterprise market – R is still met with caution by the long established SAS and IBM/SPSS statistical community • However, major university (e.g. Yale ) Statistics courses now taught in R • The FDA has recently shown indications for approval of new drugs for which the submission’s data analysis was performed using R 34 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 35. Oracle R Enterprise Approach Data and statistical analysis are stored and run in- database Same R user experience & R same R clients Open Source Embed in operational systems Complements Oracle Data Mining 35 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 36. What is R Enterprise? • Integrates popular open source R statistical programming language and environment with Oracle Database 11g Release 2 for big data. – Designed for problems involving large amounts of data, Oracle R Enterprise integrates R wit the database and enables users to run R commands on database- resident data, develop and refine R scripts, and leverage the parallelism and scalability of the database. – Data analysts can run the latest R open source packages and develop and operationalize R scripts for analytical applications in one step—without having to learn SQL. – Oracle R Enterprise performs function push down for in-database execution of base R and popular R packages. – Because it runs as an embedded component of the database, Oracle R Enterprise can run any R package either by function push down or via embedded R while the database manages the data served to the R engine. 36 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 37. What is R Enterprise? • Oracle R Enterprise brings R’s statistical functionality closer to the Oracle Database 1. Eliminate R’s memory constraint by enabling R to work directly/transparently on database objects R Open Source – Allows R to run on very large data sets, tables, views 2. Architected for Enterprise production infrastructure – Automatically exploits database parallelism without requiring parallel R programming – Build and immediately deploy R scripts 3. Oracle R leverages the latest R algorithms and packages – Embedded R runs as an embedded component of the DBMS server 37 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 38. Architecture and Performance • Transparently function-ships R constructs to database via R  SQL translation –Data structures –Functions • Data manipulation functions (select, project, join) Seconds • Basic statistical functions (avg, sum, summary) • Advanced statistical functions(gamma, beta) • Performs data-heavy computations in database –R for summary analysis and graphics • Transparent implementation enables using wide range of R “packages” from open source community 38 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 39. Oracle R Enterprise Architecture Worst Use Case: Using ONTIME airline data, of the 36 busiest airports, run a box-plot analysis of the best/worst airports for arrival delay? R workspace console Best Function push-down Oracle statistics engine – data transformation & OBIEE, Web statistics R Open Source Services Development Production Consumption 39 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 40. Better Business Intelligence Enrich BI Dashboards with Statistics, Data Mining and Adv. Analytics • Ad hoc • Exploratory data analysis • Interactive graphics • Problem-solving  Oracle R Enterprise's and ODM's results become a data feed for OBIEE 40 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 41. How Oracle R Enterprise Works ORE Computation Engines R Open Source • Oracle R Enterprise tightly integrates R with the database and fully manages the data operated upon by R code. – The database is always involved in serving up data to the R code. – Oracle R Enterprise runs in the Oracle Database. • Oracle R Enterprise eliminates data movement and duplication, maintains security and minimizes latency time from raw data to new information. • Three ORE Computation Engines – Oracle R Enterprise provides three different interfaces between the open-source R engine and the Oracle database: 1. Oracle R Enterprise (ORE) Transparency Layer 2. Oracle Statistics Engine 3. Embedded R 41 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 42. Oracle R Enterprise Compute Engines 1 2 3 Oracle Database Other R Other R R Engine packages SQL User tables R R Engine packages ?x Oracle R Enterprise packages R Oracle R Enterprise packages Results Open Source Results User R Engine on desktop Database Compute Engine R Engine(s) spawned by Oracle DB • R-SQL Transparency Framework intercepts R • Scale to large datasets • Database can spawn multiple R engines for functions for scalable database-managed parallelism • Access tables, views, and external tables, in-database execution • Efficient data transfer to spawned as well as data through • Function intercept for data transforms, DB LINKS R engines statistical functions and advanced analytics • Emulate map-reduce style algorithms and • Leverage database SQL parallelism applications • Interactive display of graphical results and • Enables “lights-out” execution of R scripts • Leverage new and existing flow control as in standard R in-database statistical and data mining • Submit entire R scripts for execution by capabilities Oracle Database 42 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 43. How Oracle R Enterprise Works ORE Computation Engines R Open Source 1. Oracle R Enterprise (ORE) Transparency Layer – Traps all R commands and scripts prior to execution and looks for opportunities to function ship them to the database for native execution – ORE transparency layer converts R commands/scripts into SQL equivalents and thereby leverages the database as a compute engine. 43 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 44. How Oracle R Enterprise Works ORE Computation Engines R Open Source 2. In-Database Statistics Engine – Significantly extends the Oracle Database’s library of All Base R functions statistical functions and advanced analytical computations R Multiple Regression – Provides support for the complete R language and …. Driven by customers statistical functions found in Base R and selected R packages based on customer usage ORE Functions • ORE SUMMARY • Open source packages - written entirely in R language with only • ORE FREQUENCY the functions for which we have implemented SQL counterparts - • ORE CORR • ORE UNIVARITE can be translated to execute in database. • ORE CROSSTAB – Without anything visibly different to the R users, their R • ORE RANK • ORE SORT commands and scripts are oftentimes accelerated by a • … factor of 10-100x – ORE Functions for most common/base traditional statistical software procedures 44 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 45. Transparency Framework + Statistics Engine • The following operators and functions are supported. See R documentation for syntax and semantics of these operators/function. Syntax and semantics for these items remain unchanged when used on a database mapped data type (also henceforth known as an Oracle R Enterprise data type): • Mathematical transformations: abs, sign, sqrt, ceiling, floor, trunc, cummax, cummin, cumprod, cumsum, log, log10, log2, log1p, acos, acosh, asin, asinh, atan, atanh, exp, expm1, cos, cosh, sin, sinh, tan, tanh, gamma, lgamma, digamma, trigamma, round, signif, pmin, pmax, zapsmall • Basic statistics: mean, summary, min, max, sum, prod, any, all, median, range, • IQR, fivenum, mad, quantile, sd, var, CSS, CV, kurtosis, skewness, USS, • STDERR, table, rowSums, colSums, rowMeans, colMeans • Arithmetic operators:+, -, *, /, ^, %%, %/% • Comparison operators: ==, >, <, !=, <=, >= 45 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 46. Transparency Framework + Statistics Engine • Logical operators: &, |, xor • Graphics: hist, boxplot, plot, smoothScatter • Set operations: unique, %in%, match, union, intersect, • Garbage collection of temporary objects created by the setdiff, setequal, is.element transparency • Assignment operator forms: <-, =, -> • framework: gc • String operations: tolower, toupper, casefold, toString, chatr, • Conversion functions: as.ore.{character, factor, sub, gsub, substr, substring, paste, nchar integer, logical, numeric, vector} • Combine Data Frame: cbind, rbind, merge • Test functions: is.ore.{character, factor, integer, • Combine vectors: c, append, stack logical, numeric, vector} • Vector creation: rep, ifelse • Hypothesis testing: wilcox.test, ks.test, var.test, binom.test, • Subset: [, [[, $, head, tail, window, subset, Filter, na.omit, chisq.test, t.test, bartlett.test na.exclude, complete.cases • Bessel Functions: Bessel(I,J,K,Y) • Data order: order, rank, sort, rev • Gamma Functions: gamma, lgamma, digamma, trigamma • Data reshaping: split, unlist • Error Functions: ERF, ERFC • Data processing: eval, with, within, transform • Covariance and Correlation matrices: cov, cor • Apply variants: tapply, lapply, sapply • Probability, Density and Quantile Functions: Poisson, Beta, • Aggregation: aggregate Normal, • Regression: ore.lm() - a variant of lm() • Chi-Squared, F, Gamma, Hyper-geometric, Probit, ProbNegative, • Anova and Boxcox: Working on output of ore.lm() ProbNorm, ProbT • Step-wise regression: step • Matrix Operations: Matrix multiplication, Crossprod, tcrossprod, • Special value checks: is.na, is.finite, is.infinite, is.nan solve, backsolve, eigen, svd • Metadata functions: attributes, nrow, NROW, ncol, NCOL, nlevels, names, row, col, dimnames, dim, length, row.names, col.names, levels, reorder 46 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 47. How Oracle R Enterprise Works ORE Computation Engines R Open Source 3. Embedded R Engine – The R Engine spawned by Oracle DB is also used for lights-out scheduled execution of R scripts; that is, scheduling or triggering R scripts packaged inside a PL/SQL or SQL query. Oracle R Enterprise R packages are also installed with the embedded R engine. • For R functions not able to be mapped to native in-database functions, Oracle R Enterprise makes “extproc” remote procedure calls to multiple R engines running on multiple database servers/nodes • This Oracle R Enterprise embedded layer uses the database as a data provider providing data level parallelism to R code • The interfaces, called embedded-layer RQ functions, pass streams of data to one or more instances of R for (parallel) row by row processing (scoring), groups of rows processing (building a model one per group) and table of rows processing (building a model – These functions are used for “operationalizing” R code to run in production 47 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 48. Oracle R Enterprise Example Illustrates Use of all 3 Engines from within 1 R Script R Open Source 48 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 49. R Data and Summary Statistic CARSTATS head(CARSTATS) • Prints top rows of data set summary(CARSTATS) • Provides summary statistics for data set 49 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 50. R Graphics R> plot(CARSTATS$weight, CARSTATS$mpg) Heavy cars 50 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 51. R Graphics R> boxplot(split(CARSTATS$mpg, CARSTATS$model.year), col = "green") MPG increases over time… 51 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 52. R Graphics R> plot(CARSTATS) • Supports Exploratory Data Analysis for Oracle data 52 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 53. Oracle R Enterprise ARIMA Forecasting Script year200801 <- ONTIME_S[(ONTIME_S$YEAR==2008)& (ONTIME_S$MONTH==1),] y <- ore.pull(year200801) gc() delays <- tapply(y$ARRDELAY, y$DAYOFMONTH, mean, na.rm=TRUE) delays <- ts(delays, start=1, end=31, frequency=1) # Create a Kalman filter with the first 5 delays and predict the rest preds <- c() ses <- c() # 1 step predictions for (i in 5:length(delays)) { fit <- arima(delays[1:i], c(1,2,1)) # predict 1 step into the future. pred <- predict(fit) preds <- c(preds, pred$pred) ses <- c(ses, pred$se) } plot(5:length(delays), preds, type='l', col='green', ylim=range(c(preds+2*ses, preds-2*ses)), xlab="DEay of month", ylab="Predicted average delay (in minutes)", main="Average delays by day for January 2008") lines(5:length(delays), preds+2*ses, col='red') lines(5:length(delays), preds-2*ses, col='red') points(5:length(delays), as.vector(delays[5:length(delays)])) legend( 23, -8, c("Delay", "Predicted delay", "2 se confidence"), col=c(1, 3, 8), lty=c(0, 1, 1), pch=c(1, -1, -1), merge=TRUE) 53 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 54. Statistical Quality Control R Applications • The qcc package for R: – Plots Shewhart quality control charts for continuous, attribute and count data; – • lots Cusum and EWMA P charts for continuous data; – Performs process capability analyses; – Creates Pareto charts and cause-and-effect diagrams http://www.stat.unipg.it/luca/Rnews_2004-1-pag11-17.pdf 54 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 55. Statistical Quality Control R Applications Process capability studies to Pareto (80/20 rule) analysis characterize and understand the to understand which few behavior of a "process" factors contribute most http://www.stat.unipg.it/luca/Rnews_2004-1-pag11-17.pdf 55 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 56. Big Data Appliance + R For Compute Intensive Operations Using R R workspace console Oracle statistics engine OBIEE, Web Services Function push-down – data transf ormation & statistics logreg <- function(input, iterations, dims, alpha){ Massively plane = rep(0, dims) parallel g = function(z) 1/(1 + exp(-z)) computations for (i in 1:iterations) { z = hdfs.get(hadoop.run( input, export = c(plane, g), map = logisticRegressionMapper, reduce = logisticRegressionReducer)) gradient = c(z$val[1], z$val[2]) plane = plane + alpha * gradient } plane } x = hdfs.push(WEBSESSIONS) logreg(x, 10, 2, 0.05) 56 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 57. Oracle In-Database Advanced Analytics Comprehensive Advanced Analytics Platform Oracle R Enterprise Oracle Data Mining • Popular open source statistical • Automated knowledge discovery inside the programming language & environment Database • Integrated with database for scalability • 12 in-database data mining algorithms • Wide range of statistical and advanced • Text mining analytical functions • Predictive analytics applications • R embedded in enterprise appls & OBIEE • Exploratory data analysis R development environment • Star schema and transactional data mining • Extensive graphics • Exadata "scoring" of ODM models • Open source R (CRAN) packages • SQL Developer/Oracle Data Miner GUI • Integrated with Hadoop for HPC Statistics Advanced Analytics Data & Text Mining Predictive Analytics 57 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 58. 58 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.