BI/Data Warehouse



     LEVERAGING AND ENRICHING THE CAPABILITIES OF ORACLE DATABASE 11G


                                                                                         Dylan Kucera, Director – Data Architecture
                                                                                                    Ontario Teachers’ Pension Plan



INTRODUCTION
Facilitating and managing a large-scale data warehouse in an effective and efficient manner cannot be achieved with a simple
turn-key set of approaches. Oracle Database 11g offers a number of key features to help facilitate success in data warehouse
management, and where those features fall short, the platform is extensible enough to bridge the gap. This paper focuses on
the capabilities that sustain an efficient data warehouse inside the Oracle Database 11g environment. We will remain focused
on facilitation and will not explore what might traditionally be found in a BI/Data Warehousing paper, such as ETL or
Reporting tools.


PARTITIONING – ENABLING EFFICIENT DATA RETENTION
Partitioning is a feature often looked at in light of how it can assist with data access. While this is an absolutely critical aspect,
the job of the data warehouse administrator can also be aided through the use of partitioning. Consider a problem where you
have an analysis process that runs daily and produces a 200 million record result set that is stored in your Data Warehouse. The
data retention requirements state that you must store the analysis results for the previous 10 business days, in addition to the
previous 12 month end dates, and the previous year end date.

Every day, you must delete 200 million records; once a month it will be 400 million records, and once a year it’s 600 million
records to delete. The traditional way could take hours:

DELETE FROM RISK.PNL_DETAIL WHERE ANALYSIS_ID = 12345

However, were you to partition the results table by the “ANALYSIS_ID”, this operation is virtually instantaneous:

ALTER TABLE RISK.PNL_DETAIL DROP PARTITION P12345

There are two other important benefits to note with this approach. First, a lengthy delete chews up Undo space while it is
working and takes up valuable I/O bandwidth, both of which the partition drop avoids. Second, depending on the nature of
your data, you may be able to avoid costly index maintenance where you can instead rely on access by partition/sub-partition.
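
For the drop-partition approach to work, the results table needs a partitioning scheme keyed on ANALYSIS_ID. The
sketch below uses list partitioning so that each analysis run maps to exactly one partition; the column list is hypothetical,
since the paper does not show the actual DDL:

CREATE TABLE RISK.PNL_DETAIL (
    ANALYSIS_ID   NUMBER NOT NULL,
    POSITION_ID   NUMBER,
    PNL           NUMBER
)
PARTITION BY LIST (ANALYSIS_ID)
(
    PARTITION P12345 VALUES (12345)
);

-- Each new analysis run gets its own partition before (or as) its rows arrive:
ALTER TABLE RISK.PNL_DETAIL ADD PARTITION P12346 VALUES (12346);

Range or interval partitioning on a run date would work just as well; the key point is that a retention boundary lines up
exactly with a partition boundary, so retiring a result set is a metadata operation rather than a mass delete.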







ADVANCED COMPRESSION – SAVING SPACE AND I/O TIME
Consider the case where you have a need to store flat files in your Data Warehouse, perhaps for traceability and audit-ability
of the artifacts you load into your Data Warehouse from your counterparties or data providers. These days, with high-end
disc being among the most expensive hardware in your data centre, and with disc I/O typically being the slowest link in your
data access chain, it pays to minimize both the amount of storage you use and the amount of I/O time spent accessing that
storage. Oracle Database Advanced Compression is an excellent tool in achieving this goal and is very simple to implement.
All you need do is to add the compression clause to your file archive table like so:

CREATE TABLE ETL.ETL_ARCHIVE ( ETL_ID   NUMBER,
                               DATE_    DATE,
                               LENGTH   NUMBER,
                               FILE_    BLOB)
       LOB(FILE_) STORE AS SECUREFILE (
                 COMPRESS
                 CACHE
       )
           Figure 1 : Oracle Database Advanced Compression

If the files you are storing are simple comma delimited text files or XML files, you can expect to achieve well in excess of
50% compression. This means you’ve more than halved the amount of I/O it takes to retrieve the files. With the speed of
today’s CPUs, you will see no material difference in access times due to the compression algorithm, and whatever overhead
there might be is dwarfed by the savings in I/O time.
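
If the archive table already exists without compression, the LOB can be rebuilt as a compressed SecureFile after the fact, and
the space savings measured from the data dictionary. A sketch under the assumption that you have DBA-level access and a
maintenance window (the MOVE rewrites the whole LOB segment):

-- Rebuild the existing LOB as a compressed SecureFile
ALTER TABLE ETL.ETL_ARCHIVE MOVE LOB(FILE_) STORE AS SECUREFILE (COMPRESS CACHE);

-- Compare the LOB segment size before and after
SELECT S.SEGMENT_NAME, ROUND(S.BYTES/1024/1024) AS MB
  FROM DBA_SEGMENTS S
  JOIN DBA_LOBS     L ON L.SEGMENT_NAME = S.SEGMENT_NAME AND L.OWNER = S.OWNER
 WHERE L.OWNER = 'ETL' AND L.TABLE_NAME = 'ETL_ARCHIVE';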







ORACLE DATABASE JVM
Suppose that you would like your Data Warehouse to serve itself in terms of data acquisition or parsing, but External Tables
are not sufficient because, for example, the target data resides on a foreign (S)FTP site or is published in Excel .XLS format.
In these cases you can supplement the Oracle Database functionality with an (Open Source) Java Library. Take for example
the case where your Data Warehouse needs to import the following Excel (native .XLS, not CSV) file:

 AFN      Afghanistan Afghanis                                44.61500
 ALL      Albania Leke                                       107.80000
 DZD      Algeria Dinars                                      75.29000
 ARS      Argentina Pesos                                      3.94400
 AUD      Australia Dollars                                    1.11865
 BSD      Bahamas Dollars                                      1.00000
 BHD      Bahrain Dinars                                       0.37705
 BDT      Bangladesh Taka                                     69.75000
 BBD      Barbados Dollars                                     2.00000
 BMD      Bermuda Dollars                                      1.00000
 BRL      Brazil Reais                                         1.75806
 BGN      Bulgaria Leva                                        1.54157
 XOF      CFA BCEAO Francs                                   516.94708
 XAF      CFA BEAC Francs                                    516.94708
 CAD      Canada Dollars                                       1.05650
        Figure 2 : Excel File to be loaded to Oracle Data Warehouse

One could perhaps shell out and use an Excel macro to save the file in a more readable CSV format. This method forms a
disconnected architecture and could lead to reliability issues. In order to perform the task in a single connected set of
operations, we need to begin by teaching the Oracle Database how to read native XLS format. This can be achieved by
importing a Java library into the Database that is suited for the task. “Java Excel API” aka “jexcelapi” is a very complete and
compatible set of classes designed to read and write native Microsoft Excel format files. What’s better is that the library is
completely Open Source which affords you the opportunity to fix or enhance the library should it not be up to your particular
requirement. First, download the latest build of the Java Excel API here:

http://jexcelapi.sourceforge.net

Then, load the .jar file that contains the library’s classes to your target database schema from a Shell/Command window:

loadjava -schema ETL -user *******/********@EDBDEV -verbose jxl.jar

This operation will produce some rather lengthy output, though all you really need to verify is that the Java classes were
loaded and that no errors were encountered in the process. These indicators appear at the very end of the loadjava
output:







creating : class ETL.jxl/NumberCell
loading      : class ETL.jxl/NumberCell
creating : class ETL.jxl/NumberFormulaCell
loading      : class ETL.jxl/NumberFormulaCell
creating : class ETL.jxl/StringFormulaCell
loading      : class ETL.jxl/StringFormulaCell
Classes Loaded: 520
Resources Loaded: 7
Sources Loaded: 0
Published Interfaces: 0
Classes generated: 0
Classes skipped: 0
Synonyms Created: 0
Errors: 0
     Figure 3 : Output of a successful “loadjava” operation
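
Beyond scanning the loadjava summary, the load can also be confirmed from SQL by checking the classes in the target
schema. Classes may remain INVALID until they are resolved or first used, so an empty result is the goal but not strictly
required at this stage:

SELECT OBJECT_NAME, STATUS
  FROM ALL_OBJECTS
 WHERE OWNER = 'ETL'
   AND OBJECT_TYPE = 'JAVA CLASS'
   AND STATUS <> 'VALID';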

The challenging part is putting the classes together such that an Excel file can be turned into something that the Database can
understand. Now that the Java Excel API classes are imported to your schema (in this example, the schema is “ETL”), it’s
straightforward to get started:

create or replace and compile java source named etl.utl_file as
…
import jxl.Workbook;
import jxl.WorkbookSettings;
import jxl.Sheet;
import jxl.Cell;
import jxl.CellType;
import jxl.NumberCell;
import jxl.FormulaCell;
import jxl.DateCell;
…
       Figure 4 : Java Stored Procedure – importing supplemental Open Source classes

In the body of your Java class, you would then have a static procedure that takes a BLOB as a parameter; the contents of the
BLOB are expected to be an XLS file. You can then loop through every sheet, row, and column and write each cell to a row in
a (temporary) table. From there you have something that you can work with in the Database using SQL or PL/SQL. While the
entire code sample cannot be provided for the Java class due to ownership/copyright, a screen shot of the core bits is
included here which should provide a running start for your implementation:








             Figure 5 : Partial code sample bridging gap between Java capabilities and Oracle Database
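
Before SQL or PL/SQL can invoke the Java code, it also needs a call specification. The Java class and method names below
are assumptions (the full class body is not reproduced above), but the shape of the wrapper is the same regardless:

CREATE OR REPLACE PROCEDURE ETL.LOAD_EXCEL_BLOB (P_FILE IN BLOB)
AS LANGUAGE JAVA
NAME 'UtlFile.loadExcel(oracle.sql.BLOB)';   -- hypothetical class and method names
/

With the wrapper in place, loading a spreadsheet becomes a single PL/SQL call against a BLOB, such as one stored in the
ETL.ETL_ARCHIVE table described earlier.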

After reading the Excel file into a generic Excel-style database table by calling this procedure, the resulting data is still very
“Excel”-ish in structure with generic columns named SHEET, ROW_, COLUMN_, and VALUE_ (see Figure 7). However,
knowing the structure of the Excel file, a simple and elegant PIVOT query can be written to turn the general Excel structure
into something more usable for your Data Warehouse:

SELECT ISO_CODE, CURRENCY_NAME, RATE FROM ETL.TEMP_ETL_EXCEL
PIVOT (MAX(VALUE_)
FOR COLUMN_ IN             (1 AS ISO_CODE,
                           2 AS CURRENCY_NAME,
                           3 AS RATE))
WHERE ROW_ > 0
ORDER BY CURRENCY_NAME
             Figure 6 : PIVOT query transforming the generic Excel structure into a Data Warehouse shape

By designing our Java class in a way that can read any arbitrary Excel file, we create a re-usable component. The output from
this component can then be turned into its final product using the approach above so long as you can rely on consistency of
the overall structure of the Excel file.
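
The final hop is usually an INSERT...SELECT over the pivoted rows into a properly typed warehouse table. The target table
DW.FX_RATE below is hypothetical, but it illustrates where the type conversions belong, assuming VALUE_ is stored as text:

INSERT INTO DW.FX_RATE (ISO_CODE, CURRENCY_NAME, RATE, AS_OF_DATE)
SELECT ISO_CODE, CURRENCY_NAME, TO_NUMBER(RATE), TRUNC(SYSDATE)
  FROM ETL.TEMP_ETL_EXCEL
  PIVOT (MAX(VALUE_)
         FOR COLUMN_ IN (1 AS ISO_CODE, 2 AS CURRENCY_NAME, 3 AS RATE))
 WHERE ROW_ > 0;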








        Figure 7 : Excel Data translated to generic Oracle table structure




Figure 8 : PIVOTing the generic Excel table to a Data Warehouse style structure





Other useful Java libraries that are freely available as Open Source components are:

Apache Commons Net – Useful for implementing File Transfer protocols other than HTTP(S) (which is available natively in
Oracle Database package SYS.UTL_HTTP)
        http://commons.apache.org/net/

Orion SSH2 – Apache Commons Net does not include the SFTP protocol; this library fills that gap
       http://sourceforge.net/projects/orion-ssh2/

jCIFS – NetBIOS/CIFS/SMB network protocols – useful if you are running on a Microsoft Windows driven network (e.g. it
can take the IP address of a PC and query the Windows machine name even if the IP is not in DNS).
        http://jcifs.samba.org/


ORACLE ADVANCED QUEUEING – ASYNCHRONOUS NOTIFICATION
Following on the topic of extending the capabilities of the Database is the situation where the Data Warehouse must notify an
external process of a data event. Oracle Advanced Queueing (AQ) is often well suited to this problem space, depending
somewhat on the capabilities of the target that must be notified. In a sentence, Oracle Advanced Queueing provides a fully
featured Messaging implementation, with the ability to enqueue and dequeue messages using simple PL/SQL calls. Oracle
Advanced Queueing also supplies a complete JMS API, so attaching your ESB to an AQ is a simple way to allow your Data
Warehouse to announce important changes to data.
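
As a flavour of how little code is involved, a minimal enqueue sketch follows. The queue name and message text are
hypothetical, and the queue is assumed to have been created with a JMS text message payload so the same message is
readable by both PL/SQL consumers and a JMS client hanging off the ESB:

DECLARE
  l_enq_opts  DBMS_AQ.ENQUEUE_OPTIONS_T;
  l_msg_props DBMS_AQ.MESSAGE_PROPERTIES_T;
  l_msg_id    RAW(16);
  l_payload   SYS.AQ$_JMS_TEXT_MESSAGE;
BEGIN
  l_payload := SYS.AQ$_JMS_TEXT_MESSAGE.CONSTRUCT;
  l_payload.SET_TEXT('PNL analysis 12345 is now available');
  DBMS_AQ.ENQUEUE(
    queue_name         => 'ETL.DW_EVENT_Q',     -- hypothetical queue
    enqueue_options    => l_enq_opts,
    message_properties => l_msg_props,
    payload            => l_payload,
    msgid              => l_msg_id);
  COMMIT;
END;
/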

Another powerful feature of AQ is its ability to dynamically create consumers for a queue when a message is enqueued. This
feature, called PL/SQL Notification, is most handy when you want to spin up a background process to complete a
long-running operation without holding back the user that created the need for the operation. As a more specific example,
suppose you have a team of individuals preparing monthly reports. Once the team has signed off on the results in the
operational system, the Team Leader closes the book for the month, which in turn demands a refresh of the materialized
views in the Data Warehouse. The Team Leader doesn’t want their reporting publisher to show an “hourglass” for (say) half
an hour or more while the materialized views refresh; a notification some time later that the operation completed is all they
need. PL/SQL Notification is one effective way to spin up the refresh operation in the background, allowing the Team Lead
to move on with their other work.
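
Wiring up PL/SQL Notification amounts to registering a callback procedure against a queue subscription; when a message
lands, Oracle runs the callback in a background job. A sketch, where the queue, subscriber, and callback names are
hypothetical and the callback procedure must have the standard notification signature:

BEGIN
  DBMS_AQ.REGISTER(
    SYS.AQ$_REG_INFO_LIST(
      SYS.AQ$_REG_INFO(
        'ETL.BOOK_CLOSE_Q:MV_REFRESH',              -- queue:subscriber
        DBMS_AQ.NAMESPACE_AQ,
        'plsql://DW.REFRESH_MONTH_END_MVIEWS_CB',   -- callback procedure
        HEXTORAW('FF'))),                           -- user context
    1);
END;
/

The callback itself dequeues the message and kicks off the materialized view refresh, so the Team Leader's session returns
immediately.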

Always be sure to consider the necessary communication requirements (i.e. sending an e-mail) at the end of your PL/SQL
Notification processes. Should the process fail, someone may need to intervene to fix the problem, and without appropriate
communication the failure could go unnoticed until a Data Warehouse consumer phones you with an issue!








ORACLE STREAMS AS AN ENABLER OF DATA WAREHOUSE CENTRALIZATION
Suppose your environment contains one or more Data Warehouses that need to be consolidated, or Data Marts that must be
populated from the Warehouse. Perhaps these targets are technologies other than Oracle Databases. Oracle Streams
combined with Transparent Gateway allows for seamless Heterogeneous Replication back to the legacy Data Warehouse or
Data Marts. Using this approach, the Data Warehouse staff need build and support only one ETL process, and DBAs
support Oracle Streams like any other aspect of the Database Infrastructure.


ORACLE STREAMS – BUILDING A HETEROGENEOUS STREAM
When building a Heterogeneous Streams setup, the traditional separated Capture and Apply model must be used. Much can
be learned about the architecture of Oracle Streams by reading the Oracle Streams Concepts and Administration manual. In a very
small nutshell, the Capture process is responsible for mining the archive logs and finding/queueing all DML that needs to be
sent to the legacy Data Warehouse target. The Apply process dequeues from this queue and actually ships the data
downstream to the legacy target.


In general, Streams is a very memory-hungry process. Be prepared to allocate 2 to 4 gigabytes of memory to the Streams
Pool. Explicitly split your Capture and Apply processes over multiple nodes if you are employing RAC in order to smooth
the memory usage across your environment. The value that Streams will provide to your Data Warehouse migration strategy
should hopefully pay for the cost of the memory resources it requires.
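
The Streams Pool can be sized explicitly (assuming an spfile is in use); when Automatic Shared Memory Management is
enabled, the setting acts as a floor rather than a fixed allocation:

ALTER SYSTEM SET STREAMS_POOL_SIZE = 2G SCOPE = BOTH;

-- Check how the pool is currently being used
SELECT POOL, NAME, ROUND(BYTES/1024/1024) AS MB
  FROM V$SGASTAT
 WHERE POOL = 'streams pool';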


ORACLE STREAMS – CAPTURE PROCESS AND RULES
The Capture process is created the same way as any Homogeneous capture process would be and is well described in the
manual Oracle Streams Concepts and Administration. This paper will therefore not focus on the creation of the Capture process
further, except to show a script that can be used to create an example Capture process called “SAMPLE_CAPTURE” and a
Capture rule to capture the table “PLAY.NHL_PLAYER_STAT”:
BEGIN
DBMS_STREAMS_ADM.SET_UP_QUEUE(
       queue_table => 'SAMPLE_STREAM_QT',
       queue_name       => 'SAMPLE_STREAM_Q',
       queue_user       => 'STRMADMIN'
       );
END;
/


BEGIN
DBMS_CAPTURE_ADM.CREATE_CAPTURE(
    queue_name                            => 'SAMPLE_STREAM_Q',
    capture_name                          => 'SAMPLE_CAPTURE',
    capture_user                          => 'STRMADMIN',
    checkpoint_retention_time             =>    3
    );
END;
/




Figure 9 : Oracle Streams – Standard Capture







BEGIN
DBMS_STREAMS_ADM.ADD_TABLE_RULES(
        table_name                    => 'PLAY.NHL_PLAYER_STAT',
        streams_type                  => 'CAPTURE',
        streams_name                  => 'SAMPLE_CAPTURE',
        queue_name                    => 'SAMPLE_STREAM_Q',
        include_dml                   => true,
        include_ddl                   => false,
        include_tagged_lcr => false,
        inclusion_rule                => true
        );
END;
/
                 Figure 10 : Oracle Streams – Standard Capture Rule

ORACLE STREAMS – TRANSPARENT GATEWAY CONFIGURATION
Before you begin building the Streams Apply process, a Transparent Gateway Database Link must first be in place. The
recommended configuration is to create a separate Database Link for your Streams processes even if you have a Database
Link available to applications and users to the same remote target. Doing so allows you to use different permissions for the
Streams user (e.g. the Streams link must be able to write to remote tables, while applications must not write to those same
tables or the replication will fall out of sync!), and also provides flexibility to configure, upgrade, or patch the gateway for
Streams differently than the gateway for applications and users.


Creating and configuring the Database Link for Streams is therefore like any other Database Link, except we will make it
owned by the database user STRMADMIN. This example shows a link named MSSQL_STREAMS_NORTHWIND that
links to the SQL Server Northwind database on a server named SQLDEV2:

#
# HS init parameters
#
HS_FDS_CONNECT_INFO=SQLDEV2//Northwind
HS_FDS_TRACE_LEVEL=OFF
HS_COMMIT_POINT_STRENGTH=0
HS_FDS_RESULTSET_SUPPORT=TRUE
HS_FDS_DEFAULT_OWNER=dbo

                 Figure 11 : Text file “initLDB_STREAMS_NORTHWIND.ora”


CREATE DATABASE LINK MSSQL_STREAMS_NORTHWIND
    CONNECT TO STRMADMIN IDENTIFIED BY ********
    USING 'LDB_STREAMS_NORTHWIND';

                 Figure 12 : DDL to create Database Link MSSQL_STREAMS_NORTHWIND





ORACLE STREAMS – APPLY PROCESS AND RULES
The Streams Apply Process is where the work to send rows to the Heterogeneous target occurs. Each step in the Apply
Process and Rules creation/configuration is worth looking at in some detail and so this paper will focus more closely on the
Apply Process configuration than previous steps.


When creating a Heterogeneous Apply Process, a Database Link is named. This means that in the design of your Streams
Topology, you will need to include at least one Apply Process for each “Database” on the target server. This is especially
important to consider when targeting Microsoft SQL Server or Sybase, as a Database in those environments is more like a
Schema in Oracle. Below is a script to create a sample Heterogeneous Apply process called
“SAMPLE_APPLY_NORTHWIND”:

BEGIN
DBMS_APPLY_ADM.CREATE_APPLY(
        queue_name                    => 'SAMPLE_STREAM_Q',
        apply_name                    => 'SAMPLE_APPLY_NORTHWIND',
        apply_captured                => TRUE,
        apply_database_link => 'MSSQL_STREAMS_NORTHWIND'
);
END;
/
                Figure 13 : Oracle Streams – Heterogeneous Apply


In a Heterogeneous Apply situation, the Apply Table Rule itself does not differ from a typical Streams Apply Table Rule.
Below is an example of an Apply Table Rule that includes the same table we captured in the sections above,
PLAY.NHL_PLAYER_STAT, as a part of the table rules for the Apply process SAMPLE_APPLY_NORTHWIND.

BEGIN
DBMS_STREAMS_ADM.ADD_TABLE_RULES(
         table_name          => 'PLAY.NHL_PLAYER_STAT',
         streams_type => 'APPLY',
         streams_name => 'SAMPLE_APPLY_NORTHWIND',
         queue_name          => 'SAMPLE_STREAM_Q',
         include_dml         => true,
         include_ddl         => false
);
END;
/
                Figure 14 : Oracle Streams – Standard Apply Rule







ORACLE STREAMS – APPLY TRANSFORMS – TABLE RENAME


The Apply Table Rename transform is one of the most noteworthy steps in the process of setting up Heterogeneous streams
because it is absolutely required, unless you are applying to the same schema on the legacy Data Warehouse as the schema
owner of the table in the new Oracle Data Warehouse. It is more likely that you have either redesigned your schemas to be
aligned with the current business model, or in the case of a Microsoft SQL Server legacy you have made Oracle Schemas out
of the Databases on the SQL Server, and the legacy owner of the tables is “dbo”. You may also have wanted to take the
opportunity to create the table in the Oracle Data Warehouse using more accurate or standardized names. Below is an
example of an Apply Table Rename transform that maps the new table PLAY.NHL_PLAYER_STAT to the legacy
dbo.NHL_PLAYER_STATS table in the Northwind database:

BEGIN
DBMS_STREAMS_ADM.RENAME_TABLE(
         rule_name               => 'NHL_PLAYER_STAT2283',
         from_table_name => 'PLAY.NHL_PLAYER_STAT',
         to_table_name           => '"dbo".NHL_PLAYER_STATS',
         step_number             => 0,
         operation               =>'ADD');
END;
/
                Figure 15 : Oracle Streams – Apply Table Rename rule


Notice that the rule name in this example is suffixed with the number 2283. This number was chosen by Oracle in the Add
Table Rule step. You will need to pull it out of the view DBA_STREAMS_RULES after executing the ADD_TABLE_RULE
step, or write a more sophisticated script that uses the overloaded ADD_TABLE_RULE procedure to return the rule name as
an OUT parameter and store it in a variable.
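
A query along these lines will return the system-generated rule name for the Apply table rule created above:

SELECT RULE_NAME, OBJECT_NAME
  FROM DBA_STREAMS_RULES
 WHERE STREAMS_NAME = 'SAMPLE_APPLY_NORTHWIND'
   AND STREAMS_TYPE = 'APPLY';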


One final note about the Rename Table transform: it is not possible to Apply to a Heterogeneous target table whose name is
in mixed case. Microsoft SQL Server, for example, allows mixed case table names; you will need to have your DBAs change
the table names to upper case on the target before the Apply process will work. Luckily, Microsoft SQL Server is typically
case insensitive (subject to collation) when it comes to the use of the tables, so while changing the table names to upper case
may make a legacy “Camel Case” table list look rather ugly, nothing should functionally break as a result of this change.


ORACLE STREAMS – APPLY TRANSFORMS – COLUMN RENAME

The Column Rename transform is similar in nature to the Table Rename transform. Notice in the example below how a
column is being renamed because the legacy table contains a column named “DATE”, which is disallowed in Oracle as a
column name because DATE is a reserved word (it is a data type). The same case restriction applies to column names as to
table names in a Heterogeneous Apply configuration: all column names on the target must be in upper case. Again, this
should have no impact on your legacy code, as systems that allow mixed case column names, such as Microsoft SQL Server,
are typically not case sensitive when using the column.







BEGIN
    DBMS_STREAMS_ADM.RENAME_COLUMN(
    rule_name                 => 'NHL_PLAYER_STAT2283',
    table_name                => 'PLAY.NHL_PLAYER_STAT',
    from_column_name          => '"DATE_"',
    to_column_name            => '"DATE"',
    value_type                => '*',
    step_number               => 0,
    operation                 => 'ADD');
END;
/


BEGIN
    DBMS_STREAMS_ADM.RENAME_COLUMN(
    rule_name                 => 'NHL_PLAYER_STAT2283',
    table_name                => 'PLAY.NHL_PLAYER_STAT',
    from_column_name          => '"NAME_"',
    to_column_name            => '"NAME"',
    value_type                => '*',
    step_number               => 0,
    operation                 => 'ADD');
END;
/



            Figure 16 : Oracle Streams – Apply Column Rename rule


Finally, the Apply process must be told the point (SCN) from which changes to the table should be applied to the target. The
instantiation SCN is set through DBMS_APPLY_ADM:

DECLARE
iscn NUMBER;
BEGIN
    iscn := DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER;
    DBMS_APPLY_ADM.SET_TABLE_INSTANTIATION_SCN(
    source_object_name => 'PLAY.NHL_PLAYER_STAT',
    source_database_name =>'EDBDEV.OTPP.COM',
    instantiation_scn => iscn,
    apply_database_link =>'MSSQL_STREAMS_NORTHWIND');
END;
/
            Figure 17 : Oracle Streams – Instantiating the Apply process








ORACLE STREAMS – EXERCISING THE STREAM
Assuming we have tables set up in both the legacy and new Data Warehouse that differ in structure only by the table name
and the renamed columns, the steps above are sufficient to put the Stream into action. Streams has no ability to synchronize
tables that are already out of sync, so before setting up the Stream you must ensure that the table contents match exactly.
Let’s assume for now that you are starting with zero rows and plan to insert all the data after the Stream is set up. The
screenshot below illustrates for this example that the legacy target table on Microsoft SQL Server is empty:




                 Figure 18 : Oracle Streams – Empty Microsoft SQL Server target table
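
One step the scripts above do not show is actually starting the two processes; neither a newly created Capture nor Apply
process runs until told to. Something along these lines works (disabling the apply-on-error shutdown is optional, but
convenient while testing):

BEGIN
  DBMS_APPLY_ADM.SET_PARAMETER(
    apply_name => 'SAMPLE_APPLY_NORTHWIND',
    parameter  => 'disable_on_error',
    value      => 'N');
  DBMS_APPLY_ADM.START_APPLY(apply_name => 'SAMPLE_APPLY_NORTHWIND');
  DBMS_CAPTURE_ADM.START_CAPTURE(capture_name => 'SAMPLE_CAPTURE');
END;
/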


Below is a rudimentary script showing some seed data being inserted into the Oracle table. You would of course want to use
a more sophisticated approach such as SQL*Loader; however, this sample is kept simple for the purposes of understanding
and transparency:






SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
     VALUES ('2010-08-31', 'VAN', 33, 'Henrik Sedin', '1980-09-26', 112, 49.5);
1 row inserted
SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
     VALUES ('2010-08-31', 'PIT', 87, 'Sidney Crosby', '1987-08-07', 109, 55.9);
1 row inserted
SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
     VALUES ('2010-08-31', 'WSH', 8, 'Alex Ovechkin', '1985-09-17', 109, 45.4);
1 row inserted
SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
     VALUES ('2010-08-31', 'SJS', 19, 'Joe Thornton', '1979-07-02', 89, 53.9);
1 row inserted
SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
     VALUES ('2010-08-31', 'OTT', 11, 'Daniel Alfredsson', '1972-12-11', 71, 35.0);
1 row inserted
SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
     VALUES ('2010-08-31', 'CGY', 12, 'Jarome Iginla', '1977-07-01', 69, 47.0 );
1 row inserted
SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
     VALUES ('2010-08-31', 'TOR', 15, 'Tomas Kaberle', '1978-03-02', 49, NULL);
1 row inserted
SQL> COMMIT;
Commit complete


                  Figure 19 : Oracle Streams – Inserting to the new Oracle Data Warehouse Table

After allowing the Capture and Apply processes a few seconds to catch up, re-executing the query from above on the legacy
Data Warehouse shows that the rows have been replicated through Streams to the target.




               Figure 20 : Oracle Streams – Populated Microsoft SQL Server target Table
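
If the rows do not show up after a reasonable wait, the state of both processes can be checked from the Oracle side before
digging any deeper:

SELECT CAPTURE_NAME, STATUS FROM DBA_CAPTURE;
SELECT APPLY_NAME, STATUS FROM DBA_APPLY;

-- Any rows here mean the Apply process hit an error it could not resolve:
SELECT APPLY_NAME, LOCAL_TRANSACTION_ID, ERROR_MESSAGE FROM DBA_APPLY_ERROR;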






DATA WAREHOUSE MODERNIZATION – EDITION BASED REDEFINITION
Every information system has a useful life, and Data Warehouse structures are no exception. The lifecycle of Data
Warehouse structures is particularly difficult to manage due to the large number of consumers. Eventually, one reaches the
point where everyone agrees that new structures are required; however, time is needed to make the transition. It is unlikely
that every Data Warehouse modernization would justify an entirely new warehouse; in fact, a targeted and staged approach is
usually more prudent. Historically, the challenge with modernizing a subset of a Data Warehouse has been: how does one
control the use of newer structures in conjunction with the older structures such that new dependencies on legacy structures
are not created? After all, we wish to retire the legacy structures after some reasonable period of parallel availability.


Edition Based Redefinition, available in Oracle Database as of 11gR2, addresses the need to provide two (or more) structural
representations of Data Warehouse segments while allowing only one to be used at a time by any one database connection.
The edition context of the connection specifies one clear definition of the data warehouse entities, either legacy or current. Of
course, a crafty developer could work around the edition segregation by opening multiple connections or switching their
edition context mid-session; however, this would most likely be a purposeful violation of the target state rather than an easily
made oversight.
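
Mechanically, an edition is created as a child of the default ORA$BASE edition, users are granted the right to use it, and each
session then chooses which definition of the warehouse it sees. A minimal sketch, where the edition and user names are
hypothetical:

CREATE EDITION DW_MODERN AS CHILD OF ORA$BASE;
GRANT USE ON EDITION DW_MODERN TO REPORTING_USER;

-- A connection opts in to the new structures for its session:
ALTER SESSION SET EDITION = DW_MODERN;

Tables themselves are not editionable; schemas carrying editioned objects must be enabled for editions (ALTER USER ...
ENABLE EDITIONS), and the per-edition view of the data is typically presented through editioning views or editioned
synonyms layered over the base tables.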


CONCLUSION
Oracle Database is well equipped to handle some of the most challenging Data Warehouse requirements with sophisticated
features such as Partitioning, Advanced Compression, Materialized Views, Advanced Queueing and Heterogeneous Streams.
When the stack of capabilities as shipped is insufficient, it is often not too far a stretch to enrich the available functionality
with some Open Source Java classes. As our Data Warehouses begin to show signs of age and demand modernization,
perhaps Oracle’s latest addition to the stack, Edition Based Redefinition, has arrived not a moment too soon. Through the
capability of Editions, we can now take control of our legacy and manage an orderly transition of data models in our Data
Warehouse environment.





Más contenido relacionado

La actualidad más candente

Data upload from flat file to business intelligence 7.0 system
Data upload from flat file to business intelligence 7.0 systemData upload from flat file to business intelligence 7.0 system
Data upload from flat file to business intelligence 7.0 systemBhaskar Reddy
 
You most probably dont need rman catalog database white paper
You most probably dont need rman catalog database white paperYou most probably dont need rman catalog database white paper
You most probably dont need rman catalog database white paperYury Velikanov
 
Java Web Programming Using Cloud Platform: Module 3
Java Web Programming Using Cloud Platform: Module 3Java Web Programming Using Cloud Platform: Module 3
Java Web Programming Using Cloud Platform: Module 3IMC Institute
 
Developer & Fusion Middleware 2 _ Scott Robertson _ SOA, Portals and Enterpri...
Developer & Fusion Middleware 2 _ Scott Robertson _ SOA, Portals and Enterpri...Developer & Fusion Middleware 2 _ Scott Robertson _ SOA, Portals and Enterpri...
Developer & Fusion Middleware 2 _ Scott Robertson _ SOA, Portals and Enterpri...InSync2011
 
Toad for-oracle-release-notes 142
Toad for-oracle-release-notes 142Toad for-oracle-release-notes 142
Toad for-oracle-release-notes 142MrFrancito777
 
Steps for upgrading the database to 10g release 2
Steps for upgrading the database to 10g release 2Steps for upgrading the database to 10g release 2
Steps for upgrading the database to 10g release 2nesmaddy
 
Athena java dev guide
Athena java dev guideAthena java dev guide
Athena java dev guidedvdung
 
Big Data: Get started with SQL on Hadoop self-study lab
Big Data:  Get started with SQL on Hadoop self-study lab Big Data:  Get started with SQL on Hadoop self-study lab
Big Data: Get started with SQL on Hadoop self-study lab Cynthia Saracco
 
JD Edwards & Peoplesoft 1 _ Peter Bruce _ Include Work Centre error messages ...
JD Edwards & Peoplesoft 1 _ Peter Bruce _ Include Work Centre error messages ...JD Edwards & Peoplesoft 1 _ Peter Bruce _ Include Work Centre error messages ...
JD Edwards & Peoplesoft 1 _ Peter Bruce _ Include Work Centre error messages ...InSync2011
 
Database Performance Management in Cloud
Database Performance Management in CloudDatabase Performance Management in Cloud
Database Performance Management in CloudDr. Amarjeet Singh
 
Creating Web Applications with IDMS, COBOL and ADSO
Creating Web Applications with IDMS, COBOL and ADSOCreating Web Applications with IDMS, COBOL and ADSO
Creating Web Applications with IDMS, COBOL and ADSOMargaret Sliming
 
Oracle Database Backup
Oracle Database BackupOracle Database Backup
Oracle Database BackupHandy_Backup
 
Oracle database 12c sql worshop 2 student guide vol 2
Oracle database 12c sql worshop 2 student guide vol 2Oracle database 12c sql worshop 2 student guide vol 2
Oracle database 12c sql worshop 2 student guide vol 2Otto Paiz
 
Big Data: Querying complex JSON data with BigInsights and Hadoop
Big Data:  Querying complex JSON data with BigInsights and HadoopBig Data:  Querying complex JSON data with BigInsights and Hadoop
Big Data: Querying complex JSON data with BigInsights and HadoopCynthia Saracco
 

La actualidad más candente (19)

Data upload from flat file to business intelligence 7.0 system
Data upload from flat file to business intelligence 7.0 systemData upload from flat file to business intelligence 7.0 system
Data upload from flat file to business intelligence 7.0 system
 
Intro To IDMS
Intro To IDMSIntro To IDMS
Intro To IDMS
 
Ramesh_Oracle_DBA
Ramesh_Oracle_DBARamesh_Oracle_DBA
Ramesh_Oracle_DBA
 
You most probably dont need rman catalog database white paper
You most probably dont need rman catalog database white paperYou most probably dont need rman catalog database white paper
You most probably dont need rman catalog database white paper
 
Java Web Programming Using Cloud Platform: Module 3
Java Web Programming Using Cloud Platform: Module 3Java Web Programming Using Cloud Platform: Module 3
Java Web Programming Using Cloud Platform: Module 3
 
Developer & Fusion Middleware 2 _ Scott Robertson _ SOA, Portals and Enterpri...
Developer & Fusion Middleware 2 _ Scott Robertson _ SOA, Portals and Enterpri...Developer & Fusion Middleware 2 _ Scott Robertson _ SOA, Portals and Enterpri...
Developer & Fusion Middleware 2 _ Scott Robertson _ SOA, Portals and Enterpri...
 
Toad for-oracle-release-notes 142
Toad for-oracle-release-notes 142Toad for-oracle-release-notes 142
Toad for-oracle-release-notes 142
 
Steps for upgrading the database to 10g release 2
Steps for upgrading the database to 10g release 2Steps for upgrading the database to 10g release 2
Steps for upgrading the database to 10g release 2
 
Athena java dev guide
Athena java dev guideAthena java dev guide
Athena java dev guide
 
Big Data: Get started with SQL on Hadoop self-study lab
Big Data:  Get started with SQL on Hadoop self-study lab Big Data:  Get started with SQL on Hadoop self-study lab
Big Data: Get started with SQL on Hadoop self-study lab
 
JD Edwards & Peoplesoft 1 _ Peter Bruce _ Include Work Centre error messages ...
JD Edwards & Peoplesoft 1 _ Peter Bruce _ Include Work Centre error messages ...JD Edwards & Peoplesoft 1 _ Peter Bruce _ Include Work Centre error messages ...
JD Edwards & Peoplesoft 1 _ Peter Bruce _ Include Work Centre error messages ...
 
Readme
ReadmeReadme
Readme
 
Database Performance Management in Cloud
Database Performance Management in CloudDatabase Performance Management in Cloud
Database Performance Management in Cloud
 
Creating Web Applications with IDMS, COBOL and ADSO
Creating Web Applications with IDMS, COBOL and ADSOCreating Web Applications with IDMS, COBOL and ADSO
Creating Web Applications with IDMS, COBOL and ADSO
 
Oracle Database Backup
Oracle Database BackupOracle Database Backup
Oracle Database Backup
 
Oracle database 12c sql worshop 2 student guide vol 2
Oracle database 12c sql worshop 2 student guide vol 2Oracle database 12c sql worshop 2 student guide vol 2
Oracle database 12c sql worshop 2 student guide vol 2
 
Big Data: Querying complex JSON data with BigInsights and Hadoop
Big Data:  Querying complex JSON data with BigInsights and HadoopBig Data:  Querying complex JSON data with BigInsights and Hadoop
Big Data: Querying complex JSON data with BigInsights and Hadoop
 
notes
notesnotes
notes
 
Aditi
AditiAditi
Aditi
 

Similar a Collaborate 2011– Leveraging and Enriching the Capabilities of Oracle Database 11g

Collaborate 2009 - Migrating a Data Warehouse from Microsoft SQL Server to Or...
Collaborate 2009 - Migrating a Data Warehouse from Microsoft SQL Server to Or...Collaborate 2009 - Migrating a Data Warehouse from Microsoft SQL Server to Or...
Collaborate 2009 - Migrating a Data Warehouse from Microsoft SQL Server to Or...djkucera
 
Less04 database instance
Less04 database instanceLess04 database instance
Less04 database instanceAmit Bhalla
 
Oracle-12c Online Training by Quontra Solutions
 Oracle-12c Online Training by Quontra Solutions Oracle-12c Online Training by Quontra Solutions
Oracle-12c Online Training by Quontra SolutionsQuontra Solutions
 
Migration to Oracle 12c Made Easy Using Replication Technology
Migration to Oracle 12c Made Easy Using Replication TechnologyMigration to Oracle 12c Made Easy Using Replication Technology
Migration to Oracle 12c Made Easy Using Replication TechnologyDonna Guazzaloca-Zehl
 
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series LibraryFrustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series LibraryIlya Ganelin
 
Whitepaper - Information management with oracle
Whitepaper - Information management with oracleWhitepaper - Information management with oracle
Whitepaper - Information management with oracleinfoMENTUM
 
ORACLE Architechture.ppt
ORACLE Architechture.pptORACLE Architechture.ppt
ORACLE Architechture.pptaggarwalb
 
Oracle Database 12c "New features"
Oracle Database 12c "New features" Oracle Database 12c "New features"
Oracle Database 12c "New features" Anar Godjaev
 
Db2 Important questions to read
Db2 Important questions to readDb2 Important questions to read
Db2 Important questions to readPrasanth Dusi
 
OOW09 EBS Tech Essentials
OOW09 EBS Tech EssentialsOOW09 EBS Tech Essentials
OOW09 EBS Tech Essentialsjucaab
 
Top 20 FAQs on the Autonomous Database
Top 20 FAQs on the Autonomous DatabaseTop 20 FAQs on the Autonomous Database
Top 20 FAQs on the Autonomous DatabaseSandesh Rao
 
Frustration-Reduced PySpark: Data engineering with DataFrames
Frustration-Reduced PySpark: Data engineering with DataFramesFrustration-Reduced PySpark: Data engineering with DataFrames
Frustration-Reduced PySpark: Data engineering with DataFramesIlya Ganelin
 
Sql Health in a SharePoint environment
Sql Health in a SharePoint environmentSql Health in a SharePoint environment
Sql Health in a SharePoint environmentEnrique Lima
 
Whitepaper: Running Oracle e-Business Suite Database on Oracle Database Appli...
Whitepaper: Running Oracle e-Business Suite Database on Oracle Database Appli...Whitepaper: Running Oracle e-Business Suite Database on Oracle Database Appli...
Whitepaper: Running Oracle e-Business Suite Database on Oracle Database Appli...Maris Elsins
 
Move your private cloud to Dell EMC PowerEdge C6420 server nodes and boost Ap...
Move your private cloud to Dell EMC PowerEdge C6420 server nodes and boost Ap...Move your private cloud to Dell EMC PowerEdge C6420 server nodes and boost Ap...
Move your private cloud to Dell EMC PowerEdge C6420 server nodes and boost Ap...Principled Technologies
 
22059 slides
22059 slides22059 slides
22059 slidespholden1
 
Open world exadata_top_10_lessons_learned
Open world exadata_top_10_lessons_learnedOpen world exadata_top_10_lessons_learned
Open world exadata_top_10_lessons_learnedchet justice
 

Similar a Collaborate 2011– Leveraging and Enriching the Capabilities of Oracle Database 11g (20)

Collaborate 2009 - Migrating a Data Warehouse from Microsoft SQL Server to Or...
Collaborate 2009 - Migrating a Data Warehouse from Microsoft SQL Server to Or...Collaborate 2009 - Migrating a Data Warehouse from Microsoft SQL Server to Or...
Collaborate 2009 - Migrating a Data Warehouse from Microsoft SQL Server to Or...
 
PHP Oracle
PHP OraclePHP Oracle
PHP Oracle
 
Less04 database instance
Less04 database instanceLess04 database instance
Less04 database instance
 
Oracle-12c Online Training by Quontra Solutions
 Oracle-12c Online Training by Quontra Solutions Oracle-12c Online Training by Quontra Solutions
Oracle-12c Online Training by Quontra Solutions
 
Frb Briefing Database
Frb Briefing DatabaseFrb Briefing Database
Frb Briefing Database
 
Migration to Oracle 12c Made Easy Using Replication Technology
Migration to Oracle 12c Made Easy Using Replication TechnologyMigration to Oracle 12c Made Easy Using Replication Technology
Migration to Oracle 12c Made Easy Using Replication Technology
 
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series LibraryFrustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library
 
Whitepaper - Information management with oracle
Whitepaper - Information management with oracleWhitepaper - Information management with oracle
Whitepaper - Information management with oracle
 
ORACLE Architechture.ppt
ORACLE Architechture.pptORACLE Architechture.ppt
ORACLE Architechture.ppt
 
Oracle Database 12c "New features"
Oracle Database 12c "New features" Oracle Database 12c "New features"
Oracle Database 12c "New features"
 
Db2 Important questions to read
Db2 Important questions to readDb2 Important questions to read
Db2 Important questions to read
 
OOW09 EBS Tech Essentials
OOW09 EBS Tech EssentialsOOW09 EBS Tech Essentials
OOW09 EBS Tech Essentials
 
Top 20 FAQs on the Autonomous Database
Top 20 FAQs on the Autonomous DatabaseTop 20 FAQs on the Autonomous Database
Top 20 FAQs on the Autonomous Database
 
Frustration-Reduced PySpark: Data engineering with DataFrames
Frustration-Reduced PySpark: Data engineering with DataFramesFrustration-Reduced PySpark: Data engineering with DataFrames
Frustration-Reduced PySpark: Data engineering with DataFrames
 
Sql Health in a SharePoint environment
Sql Health in a SharePoint environmentSql Health in a SharePoint environment
Sql Health in a SharePoint environment
 
Whitepaper: Running Oracle e-Business Suite Database on Oracle Database Appli...
Whitepaper: Running Oracle e-Business Suite Database on Oracle Database Appli...Whitepaper: Running Oracle e-Business Suite Database on Oracle Database Appli...
Whitepaper: Running Oracle e-Business Suite Database on Oracle Database Appli...
 
Move your private cloud to Dell EMC PowerEdge C6420 server nodes and boost Ap...
Move your private cloud to Dell EMC PowerEdge C6420 server nodes and boost Ap...Move your private cloud to Dell EMC PowerEdge C6420 server nodes and boost Ap...
Move your private cloud to Dell EMC PowerEdge C6420 server nodes and boost Ap...
 
What's next after Upgrade to 12c
What's next after Upgrade to 12cWhat's next after Upgrade to 12c
What's next after Upgrade to 12c
 
22059 slides
22059 slides22059 slides
22059 slides
 
Open world exadata_top_10_lessons_learned
Open world exadata_top_10_lessons_learnedOpen world exadata_top_10_lessons_learned
Open world exadata_top_10_lessons_learned
 

Collaborate 2011– Leveraging and Enriching the Capabilities of Oracle Database 11g

  • 1. BI/Data Warehouse LEVERAGING AND ENRICHING THE CAPABILITIES OF ORACLE DATABASE 11G Dylan Kucera, Director – Data Architecture Ontario Teachers’ Pension Plan INTRODUCTION Facilitating and managing a large scale data warehouse in an effective and efficient manner cannot be achieved using a simple turn-key set of approaches. Oracle Database 11g offers up a number of key features to help facilitate success in data warehouse management. Where features fall short, enough extensibility is available in the platform to be able to bridge the gap. This paper will focus on the capabilities that enable the sustenance and efficiency of a data warehouse inside of the Oracle Database 11g environment. We will remain focused on facilitation and will not be exploring what might be traditionally found in a BI/Data Warehousing paper such as ETL or Reporting tools. PARTITIONING – ENABLING EFFICIENT DATA RETENTION Partitioning is a feature often looked at in light of how it can assist with data access. While this is an absolutely critical aspect, the job of the data warehouse administrator can also be aided through the use of partitioning. Consider a problem where you have an analysis process that runs daily a produces a 200 million record result set that is stored in your Data Warehouse. The data retention requirements state that you must store the analysis results for the previous 10 business days, in addition to the previous 12 month end dates, and the previous year end date. Every day, you must delete 200 million records; once a month it will be 400 million records, and once a year it’s 600 million records to delete. The traditional way could take hours: DELETE FROM RISK.PNL_DETAIL WHERE ANALYSIS_ID = 12345 However, were you to partition the results table by the “ANALYSIS_ID”, this operation is virtually instantaneous: ALTER TABLE RISK.PNL_DETAIL DROP PARTITION P12345 There are two other important benefits to note with this approach. Firstly, the lengthy delete will chew up Undo space while it is working, and take up valuable I/O bandwidth. Also consider, depending on the nature of your data, that you may be able to avoid costly Index maintenance where you can instead rely on access by partition/sub-partition. 1 Session 239
  • 2. BI/Data Warehouse ADVANCED COMPRESSION – SAVING SPACE AND I/O TIME Consider the case where you have a need to store flat files in your Data Warehouse, perhaps for traceability and audit-ability of the artifacts you load into your Data Warehouse from your counterparties or data providers. These days, with high-end disc being the most expensive hardware in your data centre, and with Disc I/O times being the slowest link in your data access chain, it pays to minimize the amount of storage you use and minimize the amount of I/O time spent accessing that storage. Oracle Database Advanced Compression is an excellent tool in achieving this goal and is very simple to implement. All you need do is to add the compression clause to your file archive table like so: CREATE TABLE ETL.ETL_ARCHIVE ( ETL_ID NUMBER, DATE_ DATE, LENGTH NUMBER FILE_ BLOB) LOB(FILE_) STORE AS SECUREFILE ( COMPRESS CACHE ) Figure 1 : Oracle Database Advanced Compression If the files you are storing are simple comma delimited text files or XML files, you can expect to achieve well in excess of 50% compression. This means you’ve more than halved the amount of I/O it takes to retrieve the files. With the speed of today’s CPUs, you will see no material difference in access times due to the compression algorithm, and whatever overhead there might be is dwarfed by the savings in I/O time. 2 Session 239
  • 3. BI/Data Warehouse ORACLE DATABASE JVM Suppose that you would like your Data Warehouse to serve itself in terms of data acquisition or parsing, but External Tables are not sufficient because, for example, the target data resides on a foreign (S)FTP site or is published in Excel .XLS format. In these cases you can supplement the Oracle Database functionality with a (Open Source) Java Library. Take for example the case where your Data Warehouse needs to import the following Excel (native .XLS, not CSV) file: AFN Afghanistan Afghanis 44.61500 ALL Albania Leke 107.80000 DZD Algeria Dinars 75.29000 ARS Argentina Pesos 3.94400 AUD Australia Dollars 1.11865 BSD Bahamas Dollars 1.00000 BHD Bahrain Dinars 0.37705 BDT Bangladesh Taka 69.75000 BBD Barbados Dollars 2.00000 BMD Bermuda Dollars 1.00000 BRL Brazil Reais 1.75806 BGN Bulgaria Leva 1.54157 XOF CFA BCEAO Francs 516.94708 XAF CFA BEAC Francs 516.94708 CAD Canada Dollars 1.05650 Figure 2 : Excel File to be loaded to Oracle Data Warehouse One could perhaps shell out and use an Excel macro to save the file in a more readable CSV format. This method forms a disconnected architecture and could lead to reliability issues. In order to perform the task in a single connected set of operations, we need to begin by teaching the Oracle Database how to read native XLS format. This can be achieved by importing a Java library into the Database that is suited for the task. “Java Excel API” aka “jexcelapi” is a very complete and compatible set of classes designed to read and write native Microsoft Excel format files. What’s better is that the library is completely Open Source which affords you the opportunity to fix or enhance the library should it not be up to your particular requirement. First, download the latest build of the Java Excel API here: http://jexcelapi.sourceforge.net Then, load the .jar file that contains the library’s classes to your target database schema from a Shell/Command window: loadjava -schema ETL -user *******/********@EDBDEV -verbose jxl.jar This operation will produce some rather lengthy output, though really all you must verify is that you see that the Java Classes were loaded and that there were no errors encountered in the process. These indicators appear at the very end of the loadjava output: 3 Session 239
  • 4. BI/Data Warehouse creating : class ETL.jxl/NumberCell loading : class ETL.jxl/NumberCell creating : class ETL.jxl/NumberFormulaCell loading : class ETL.jxl/NumberFormulaCell creating : class ETL.jxl/StringFormulaCell loading : class ETL.jxl/StringFormulaCell Classes Loaded: 520 Resources Loaded: 7 Sources Loaded: 0 Published Interfaces: 0 Classes generated: 0 Classes skipped: 0 Synonyms Created: 0 Errors: 0 Figure 3 : Output of a successful “loadjava” operation The challenging part is putting the classes together such that an Excel file can be turned into something that the Database can understand. Now that the Java Excel API classes are imported to your schema (in this example, the schema is “ETL”), it’s straightforward to get started: create or replace and compile java source named etl.utl_file as … import jxl.Workbook; import jxl.WorkbookSettings; import jxl.Sheet; import jxl.Cell; import jxl.CellType; import jxl.NumberCell; import jxl.FormalaCell; import jxl.DateCell; … Figure 4 : Java Stored Procedure – importing supplimental Open Source classes In the body of your Java class, you would then have a static procedure that takes a BLOB as a parameter, the contents of the BLOB is expected to be an XLS file. You can then loop through every sheet, row, and column and write each to a row in a (temporary) table. From there you have something that you can deal with in the Database with SQL or PL/SQL. While the entire code sample cannot be provided for the Java class due to ownership/copyright, a screen shot of the core bits is included here which should provide a running start for your implementation: 4 Session 239
  • 5. BI/Data Warehouse Figure 5 : Partial code sample bridging gap between Java capabilities and Oracle Database After reading the Excel file into a generic Excel-style database table by calling this procedure, the resulting data is still very “Excel”-ish in structure with generic columns named SHEET, ROW_, COLUMN_, and VALUE_ (see Figure 7). However, knowing the structure of the Excel file, a simple and elegant PIVOT query can be written to turn the general Excel structure into something more usable for your Data Warehouse: SELECT ISO_CODE, CURRENCY_NAME, RATE FROM ETL.TEMP_ETL_EXCEL PIVOT (MAX(VALUE_) FOR COLUMN_ IN (1 AS ISO_CODE, 2 AS CURRENCY_NAME, 3 AS RATE)) WHERE ROW_ > 0 ORDER BY CURRENCY_NAME Figure 6 : Partial code sample bridging gap between Java capabilities and Oracle Database By designing our Java class in a way that can read any arbitrary Excel file, we create a re-usable component. The output from this component can then be turned into its final product using the approach above so long as you can rely on consistency of the overall structure of the Excel file. 5 Session 239
  • 6. BI/Data Warehouse Figure 7 : Excel Data translated to generic Oracle table structure Figure 8 : PIVOTing the generic Excel table to a Data Warehouse style structure 6 Session 239
  • 7. BI/Data Warehouse Other useful Java libraries that are freely available as Open Source components are: Apache Commons Net – Useful for implementing File Transfer protocols other than HTTP(S) (which is available natively in Oracle Database package SYS.UTL_HTTP) http://commons.apache.org/net/ Orion SSH2 – Apache Commons Net is missing SFTP protocol. This library fills that gap http://sourceforge.net/projects/orion-ssh2/ jCIFS – NetBIOS/CIFS/SMB network protocols – useful if you are running on a Microsoft Windows driven network (eg. Can take an IP Address of a PC and query the Windows Machine Name even if the IP is not in the DNS). http://jcifs.samba.org/ ORACLE ADVANCED QUEUEING – ASYNCHRONOUS NOTIFICATION Following on the topic of extending the capabilities of the Database is the situation where the Data Warehouse must notify an external process of a data event. Oracle Advanced Queueing (AQ) is often well suited to this problem space, depending somewhat on the capabilities of the target that must be notified. In a sentence, Oracle Advanced Queueing provides a fully featured Messaging implementation, with the ability to enqueue and dequeue messages using simple PL/SQL calls. Oracle Advanced Queueing also supplies a complete JMS API, so attaching your ESB to an AQ is a simple way to allow your Data Warehouse to announce important changes to data. Another powerful feature of AQ is its ability to dynamically create consumers for a queue when a message is enqueued. This feature is called PL/SQL Notification. This feature comes in most handy when you want to spin up a background process in order to complete a long running operation without holding back the user that created the need for the operation. As a more specific example, suppose you have a team of individuals preparing monthly reports. Once the team has signed off on the results in the operational system, the Team Leader closes the book for the month which in turn demands a refresh of the materialized views in the Data Warehouse. The Team Leader doesn’t want their reporting publisher to show an “hourglass” for (say) half an hour or more while the materialized views refresh; so long as they get notified some time later that the operation was complete, and that’s all they need to know. PL/SQL Notification is one effective way to spin up the refresh operation in the background, allowing the Team Lead to move on with their other work. Always be sure to consider the necessary communication (ie. send an e-mail) requirements at the end of your PL/SQL Notification processes. Should the process fail, it may be necessary for someone to intervene to fix the problem and without appropriate communication the problem could go unnoticed until a Data Warehouse consumer phones you with an issue! 7 Session 239
  • 8. BI/Data Warehouse ORACLE STREAMS AS AN ENABLER OF DATA WAREHOUSE CENTRALIZATION Suppose your environment contains one or more Data Warehouses that need to be consolidated, or Data Marts that must be populated from the Warehouse. Perhaps these targets are technologies other than Oracle Databases. Oracle Streams combined with Transparent Gateway allows for seamless Heterogeneous Replication back to the legacy Data Warehouse or Data Marts. Using this approach, the Data Warehouse staff need build and support only one ETL process, and DBA’s support Oracle Streams like any other aspect of the Database Infrastructure. ORACLE STREAMS – BUILDING A HETEROGENEOUS STREAM When building a Heterogeneous Streams setup, the traditional separated Capture and Apply model must be used. Much can be learned about the architecture of Oracle Streams by reading the Oracle Streams Concepts and Administration manual. In a very small nutshell, the Capture Process is responsible for Mining the archive logs and finding/queueing all DML that needs to be sent to the legacy Data Warehouse target. The Apply Process takes from this queue and actually ships the data downstream to the legacy target. In general, Streams is a very memory hungry process. Be prepared to allocate 2 to 4 gigabytes of memory to the Streams Pool. Explicitly split your Capture and Apply processes over multiple nodes if you are employing RAC in order to smooth the memory usage across your environment. The value that Streams will provide to your Data Warehouse migration strategy should hopefully pay for the cost of the memory resources it requires. ORACLE STREAMS – CAPTURE PROCESS AND RULES The Capture process is created the same way as any Homogeneous capture process would be and is well described in the manual Oracle Streams Concepts and Administration. This paper will therefore not focus on the creation of the Capture process further, except to show a script that can be used to create an example Capture process called “SAMPLE_CAPTURE” and a Capture rule to capture the table “PLAY.NHL_PLAYER_STAT”: BEGIN DBMS_STREAMS_ADM.SET_UP_QUEUE( queue_table => 'SAMPLE_STREAM_QT', queue_name => 'SAMPLE_STREAM_Q', queue_user => 'STRMADMIN' ); END; / BEGIN DBMS_CAPTURE_ADM.CREATE_CAPTURE( queue_name => 'SAMPLE_STREAM_Q', capture_name => 'SAMPLE_CAPTURE', capture_user => 'STRMADMIN', checkpoint_retention_time => 3 ); END; / 8 Session 239
Figure 9 : Oracle Streams – Standard Capture
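As an aside before adding the Capture rule, the Streams Pool guidance above can be made explicit rather than left entirely to automatic memory management. The value below is only an assumed starting point for illustration, not a recommendation for any particular workload; when automatic SGA management is in use the parameter acts as a floor for the pool.

-- Reserve an explicit floor for the Streams Pool (2G is an assumed starting point)
ALTER SYSTEM SET STREAMS_POOL_SIZE = 2G SCOPE = BOTH;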
BEGIN
    DBMS_STREAMS_ADM.ADD_TABLE_RULES(
        table_name         => 'PLAY.NHL_PLAYER_STAT',
        streams_type       => 'CAPTURE',
        streams_name       => 'SAMPLE_CAPTURE',
        queue_name         => 'SAMPLE_STREAM_Q',
        include_dml        => true,
        include_ddl        => false,
        include_tagged_lcr => false,
        inclusion_rule     => true );
END;
/

Figure 10 : Oracle Streams – Standard Capture Rule

ORACLE STREAMS – TRANSPARENT GATEWAY CONFIGURATION

Before you begin building the Streams Apply process, a Transparent Gateway Database Link must first be in place. The recommended configuration is to create a separate Database Link for your Streams processes, even if a Database Link to the same remote target is already available to applications and users. Doing so allows you to grant different permissions to the Streams user (e.g., the Streams link must be able to write to the remote tables, while applications must not write to those same tables or the replication will become out of sync!), and also provides flexibility in configuring, upgrading or patching the gateway for Streams differently than the gateway for applications and users.

Creating and configuring the Database Link for Streams is therefore like any other Database Link, except that we will make it owned by the database user STRMADMIN. This example shows a link named MSSQL_STREAMS_NORTHWIND that links to the SQL Server Northwind database on a server named SQLDEV2:

#
# HS init parameters
#
HS_FDS_CONNECT_INFO=SQLDEV2//Northwind
HS_FDS_TRACE_LEVEL=OFF
HS_COMMIT_POINT_STRENGTH=0
HS_FDS_RESULTSET_SUPPORT=TRUE
HS_FDS_DEFAULT_OWNER=dbo

Figure 11 : Text file "initLDB_STREAMS_NORTHWIND.ora"

CREATE DATABASE LINK MSSQL_STREAMS_NORTHWIND
    CONNECT TO STRMADMIN IDENTIFIED BY ********
    USING 'LDB_STREAMS_NORTHWIND';

Figure 12 : DDL to create Database Link MSSQL_STREAMS_NORTHWIND
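The USING clause above must resolve through the local naming configuration to the gateway listener. The listener and tnsnames plumbing is not shown in the figures, so the entry below is only a sketch, with the host name and port as assumptions about a typical gateway installation; the HS = OK clause is what marks the connection as heterogeneous.

# Assumed tnsnames.ora entry for the gateway SID LDB_STREAMS_NORTHWIND
LDB_STREAMS_NORTHWIND =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = gateway-host.example.com)(PORT = 1521))
    (CONNECT_DATA = (SID = LDB_STREAMS_NORTHWIND))
    (HS = OK)
  )

A quick count over the link is an easy way to prove the gateway works before involving Streams:

-- Sanity check from the Oracle side; the legacy table name is assumed to already be upper case
SELECT COUNT(*) FROM NHL_PLAYER_STATS@MSSQL_STREAMS_NORTHWIND;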
ORACLE STREAMS – APPLY PROCESS AND RULES

The Streams Apply process is where the work to send rows to the Heterogeneous target occurs. Each step in the Apply process and rules creation/configuration is worth looking at in some detail, and so this paper will focus more closely on the Apply process configuration than on the previous steps.

When creating a Heterogeneous Apply process, a Database Link is named. This means that in the design of your Streams topology, you will need to include at least one Apply process for each "Database" on the target server. This is especially important to consider when targeting Microsoft SQL Server or Sybase, as a Database in those environments is more like a Schema in Oracle. Below is a script to create a sample Heterogeneous Apply process called "SAMPLE_APPLY_NORTHWIND":

BEGIN
    DBMS_APPLY_ADM.CREATE_APPLY(
        queue_name          => 'SAMPLE_STREAM_Q',
        apply_name          => 'SAMPLE_APPLY_NORTHWIND',
        apply_captured      => TRUE,
        apply_database_link => 'MSSQL_STREAMS_NORTHWIND' );
END;
/

Figure 13 : Oracle Streams – Heterogeneous Apply

In a Heterogeneous Apply situation, the Apply Table Rule itself does not differ from a typical Streams Apply Table Rule. Below is an example of an Apply Table Rule that includes the same table we captured in the sections above, PLAY.NHL_PLAYER_STAT, as part of the table rules for the Apply process SAMPLE_APPLY_NORTHWIND:

BEGIN
    DBMS_STREAMS_ADM.ADD_TABLE_RULES(
        table_name   => 'PLAY.NHL_PLAYER_STAT',
        streams_type => 'APPLY',
        streams_name => 'SAMPLE_APPLY_NORTHWIND',
        queue_name   => 'SAMPLE_STREAM_Q',
        include_dml  => true,
        include_ddl  => false );
END;
/

Figure 14 : Oracle Streams – Standard Apply Rule
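As a quick verification step (a sketch, assuming the names used above), the new Apply process can be confirmed against the intended gateway link before the rename transforms are layered on:

-- The Apply process should show the heterogeneous link and a DISABLED status until it is started
SELECT APPLY_NAME, APPLY_DATABASE_LINK, STATUS
FROM   DBA_APPLY
WHERE  APPLY_NAME = 'SAMPLE_APPLY_NORTHWIND';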
ORACLE STREAMS – APPLY TRANSFORMS – TABLE RENAME

The Apply Table Rename transform is one of the most noteworthy steps in the process of setting up Heterogeneous Streams because it is absolutely required, unless you are applying to the same schema on the legacy Data Warehouse as the schema owner of the table in the new Oracle Data Warehouse. It is more likely that you have either redesigned your schemas to be aligned with the current business model, or, in the case of a Microsoft SQL Server legacy, you have made Oracle Schemas out of the Databases on the SQL Server and the legacy owner of the tables is "dbo". You may also have wanted to take the opportunity to create the table in the Oracle Data Warehouse using more accurate or standardized names. Below is an example of an Apply Table Rename transform that maps the new table PLAY.NHL_PLAYER_STAT to the legacy dbo.NHL_PLAYER_STATS table in the Northwind database:

BEGIN
    DBMS_STREAMS_ADM.RENAME_TABLE(
        rule_name       => 'NHL_PLAYER_STAT2283',
        from_table_name => 'PLAY.NHL_PLAYER_STAT',
        to_table_name   => '"dbo".NHL_PLAYER_STATS',
        step_number     => 0,
        operation       => 'ADD' );
END;
/

Figure 15 : Oracle Streams – Apply Table Rename rule

Notice that the rule name in this example is suffixed with the number 2283. This number was chosen by Oracle in the Add Table Rule step. You will need to pull it out of the view DBA_STREAMS_RULES after executing the ADD_TABLE_RULES step (a query sketch follows Figure 17), or write a more sophisticated script that captures the rule name in a variable using the overloaded ADD_TABLE_RULES procedure that returns it as an OUT parameter.

One final note about the Rename Table transform: it is not possible to Apply to a Heterogeneous target table whose name is in mixed case. For example, Microsoft SQL Server allows mixed case table names; you will need to have your DBAs change the table names to upper case on the target before the Apply process will work. Luckily, Microsoft SQL Server is typically case insensitive when it comes to using the tables, so while changing the table names to upper case may make a legacy "camel case" table list look rather ugly, nothing should functionally break as a result of this change.

ORACLE STREAMS – APPLY TRANSFORMS – COLUMN RENAME

The Column Rename transform is similar in nature to the Table Rename transform. Notice in the example below how a column is being renamed because the legacy table contains a column named "DATE", which is disallowed in Oracle as a column name because DATE is a keyword (data type). The same restriction applies to column names as to table names in a Heterogeneous Apply configuration: all column names on the target must be in upper case. Again, this should have no impact on your legacy code, as systems that allow mixed case column names such as Microsoft SQL Server are typically not case sensitive when using the columns.
BEGIN
    DBMS_STREAMS_ADM.RENAME_COLUMN(
        rule_name        => 'NHL_PLAYER_STAT2283',
        table_name       => 'PLAY.NHL_PLAYER_STAT',
        from_column_name => '"DATE_"',
        to_column_name   => '"DATE"',
        value_type       => '*',
        step_number      => 0,
        operation        => 'ADD' );
END;
/

BEGIN
    DBMS_STREAMS_ADM.RENAME_COLUMN(
        rule_name        => 'NHL_PLAYER_STAT2283',
        table_name       => 'PLAY.NHL_PLAYER_STAT',
        from_column_name => '"NAME_"',
        to_column_name   => '"NAME"',
        value_type       => '*',
        step_number      => 0,
        operation        => 'ADD' );
END;
/

Figure 16 : Oracle Streams – Apply Column Rename rule

Finally, the Apply process must be told the point in time (SCN) from which changes to the table should be applied. Setting the table instantiation SCN against the gateway link completes the Apply configuration:

DECLARE
    iscn NUMBER;
BEGIN
    iscn := DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER;
    DBMS_APPLY_ADM.SET_TABLE_INSTANTIATION_SCN(
        source_object_name   => 'PLAY.NHL_PLAYER_STAT',
        source_database_name => 'EDBDEV.OTPP.COM',
        instantiation_scn    => iscn,
        apply_database_link  => 'MSSQL_STREAMS_NORTHWIND' );
END;
/

Figure 17 : Oracle Streams – Instantiating the Apply process
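Two housekeeping items are not shown in the figures above and are sketched here under the same example names. The first retrieves the system-generated rule name (the NHL_PLAYER_STAT2283 value referenced by the rename transforms); the second starts the Capture and Apply processes so that the stream can begin moving data.

-- Look up the generated DML rule name for the Apply table rule
SELECT RULE_NAME
FROM   DBA_STREAMS_RULES
WHERE  STREAMS_NAME = 'SAMPLE_APPLY_NORTHWIND'
AND    STREAMS_TYPE = 'APPLY'
AND    OBJECT_NAME  = 'NHL_PLAYER_STAT'
AND    RULE_TYPE    = 'DML';

-- Start the Capture and Apply processes created earlier
BEGIN
    DBMS_CAPTURE_ADM.START_CAPTURE( capture_name => 'SAMPLE_CAPTURE' );
    DBMS_APPLY_ADM.START_APPLY( apply_name => 'SAMPLE_APPLY_NORTHWIND' );
END;
/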
ORACLE STREAMS – EXERCISING THE STREAM

Assuming we have tables set up in both the legacy and the new Data Warehouse that differ in structure only by the table name and the renamed columns described above, the steps so far are sufficient to put the Stream into action. Streams has no ability to synchronize tables that are out of sync; before setting up the Stream you must ensure that the table content matches exactly. Let's assume for now that you are starting with zero rows and plan to insert all the data after the Stream is set up. The screenshot in Figure 18 illustrates that, for this example, the legacy target table on Microsoft SQL Server is empty:

Figure 18 : Oracle Streams – Empty Microsoft SQL Server target table

Below is a rudimentary script showing some seed data being inserted into the Oracle table. You would of course want to use a more sophisticated approach such as SQL*Loader; however, this sample is kept simple for the purposes of understanding and transparency (the date values are written as ANSI DATE literals so that the inserts do not depend on the session's NLS_DATE_FORMAT):
SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
     VALUES (DATE '2010-08-31', 'VAN', 33, 'Henrik Sedin', DATE '1980-09-26', 112, 49.5);

1 row inserted

SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
     VALUES (DATE '2010-08-31', 'PIT', 87, 'Sidney Crosby', DATE '1987-08-07', 109, 55.9);

1 row inserted

SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
     VALUES (DATE '2010-08-31', 'WSH', 8, 'Alex Ovechkin', DATE '1985-09-17', 109, 45.4);

1 row inserted

SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
     VALUES (DATE '2010-08-31', 'SJS', 19, 'Joe Thornton', DATE '1979-07-02', 89, 53.9);

1 row inserted

SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
     VALUES (DATE '2010-08-31', 'OTT', 11, 'Daniel Alfredsson', DATE '1972-12-11', 71, 35.0);

1 row inserted

SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
     VALUES (DATE '2010-08-31', 'CGY', 12, 'Jarome Iginla', DATE '1977-07-01', 69, 47.0);

1 row inserted

SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
     VALUES (DATE '2010-08-31', 'TOR', 15, 'Tomas Kaberle', DATE '1978-03-02', 49, NULL);

1 row inserted

SQL> COMMIT;

Commit complete

Figure 19 : Oracle Streams – Inserting to the new Oracle Data Warehouse Table

Allowing the Capture and Apply processes a few seconds to catch up, re-executing the query from above on the legacy Data Warehouse shows that the rows have been replicated through Streams to the target.

Figure 20 : Oracle Streams – Populated Microsoft SQL Server target Table
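If rows do not appear on the target after a reasonable interval, the Apply error queue is the first place to look. The query below is a simple, hedged starting point for troubleshooting a heterogeneous Apply:

-- Any rows here represent changes the Apply process could not ship to the gateway target
SELECT APPLY_NAME, ERROR_NUMBER, ERROR_MESSAGE, ERROR_CREATION_TIME
FROM   DBA_APPLY_ERROR
ORDER  BY ERROR_CREATION_TIME;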
DATA WAREHOUSE MODERNIZATION – EDITION BASED REDEFINITION

Every information system has a useful life, Data Warehouse structures being no exception. The lifecycle of Data Warehouse structures is particularly difficult to manage because of the wide range of consumers. Eventually, one reaches the point where everyone agrees that new structures are required; however, time is required to make the transition. It is unlikely that every Data Warehouse modernization would justify an entirely new warehouse; a targeted and staged approach is usually more prudent. Historically, the challenge with modernizing a subset of a Data Warehouse has been: how does one control the use of newer structures in conjunction with the older structures such that new dependencies on legacy structures are not created? After all, do we not wish to retire the legacy structures after some reasonable period of parallel availability?

Edition Based Redefinition, available in Oracle Database as of 11gR2, addresses the need to provide two (or more) structural representations of Data Warehouse segments, while allowing only one to be used at a time by any one database connection. The edition context of the connection specifies one clear definition of the data warehouse entities, either legacy or current. Of course, a crafty developer could work around the edition segregation by opening multiple connections, or by switching their edition context mid-session; however, this would most likely be a purposeful violation of the target state rather than an easily made oversight.

CONCLUSION

Oracle Database is well equipped to handle some of the most challenging Data Warehouse requirements with sophisticated features such as Partitioning, Advanced Compression, Materialized Views, Advanced Queueing and Heterogeneous Streams. When the stack of capabilities as shipped is insufficient, it is often not too far a stretch to enrich the available functionality with some Open Source Java classes. As our Data Warehouses begin to show signs of age and demand modernization, perhaps Oracle's latest addition to the stack, Edition Based Redefinition, has arrived not a moment too soon. Through the extraordinary capability of Editions, we can now take control of our legacy and manage an orderly transition of data models in our Data Warehouse environment.