7. Account, Customer & Address Relationships
[Diagram: Account, Contact, Party, Address, Address link, Account Party link]
Account information is loaded from ALL source systems.
The ETL process builds the relationship between Accounts and Customers (Party) based on the relationship file from the CUSTOMER CRM SYSTEM.
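The linking step on this slide could look roughly like the sketch below. It is only an illustration: the layout of the CRM relationship file is not given in the source, so the CSV format, the column names account_id and party_id, and the function name build_account_party_links are all assumptions.

```python
import csv
import io
from collections import defaultdict


def build_account_party_links(relationship_rows, known_accounts, known_parties):
    """Build an account -> set-of-parties mapping from relationship rows.

    Rows that point at an account or party that was never loaded are
    returned separately as rejects instead of being linked.
    """
    links = defaultdict(set)
    rejects = []
    for row in relationship_rows:
        account_id = row["account_id"].strip()
        party_id = row["party_id"].strip()
        if account_id in known_accounts and party_id in known_parties:
            links[account_id].add(party_id)
        else:
            rejects.append(row)
    return dict(links), rejects


# Hypothetical relationship file content (assumed layout, not from the source).
RELATIONSHIP_FILE = """account_id,party_id
A100,P1
A100,P2
A200,P1
A999,P3
"""

rows = list(csv.DictReader(io.StringIO(RELATIONSHIP_FILE)))
links, rejects = build_account_party_links(
    rows,
    known_accounts={"A100", "A200"},
    known_parties={"P1", "P2", "P3"},
)
```

Keeping rejected rows in their own list, rather than silently dropping them, gives the data governance side something to review when the CRM file references accounts that no source system delivered.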
8. EDW Process State
[Diagram labels: Source System, Cleanse / Pre-process, Staging Area, EDW, BI; Metadata | Data Governance | Data Management; Data cleansing, Data profiling, Sync & Sort; DM, CPS, MANTAS, CRDB, MKTG, FIN, SALES; IMP, RM, OEC, ALS, AFS, ST, RE, DFP, SBA, AFS, V-PR]
Dims have a simple PRIMARY KEY. Facts have FOREIGN KEYS (which make up a compound primary key, often used as a natural key in ETL coding). DIMS are 2nd normal form; FACTS are 3rd normal form.
Fact.Sales is the fact table, and there are three dimension tables: Dim.Date, Dim.Store and Dim.Product. Each dimension table has a primary key on its PK column, relating to one of the columns (viewed as rows in the example schema) of the Fact.Sales table's three-column (compound) primary key (Date_FK, Store_FK, Product_FK). The non-primary-key [Units Sold] column of the fact table in this example represents a measure or metric that can be used in calculations and analysis. The non-primary-key columns of the dimension tables represent additional attributes of the dimensions (such as the Year of the Dim.Date dimension).

Using schema descriptors with dot-notation, combined with simple suffix decorations for column differentiation, makes it easier to write the SQL for star schema queries, because fewer underscores are required and table aliasing is minimized. Most SQL database engines allow schema descriptors and also permit decoration suffixes on surrogate key columns. Square brackets, which are physically easier to type on the keyboard (no shift key needed), are not intrusive and make the code easier to read.

For example, the following query extracts how many TV sets have been sold, for each brand and country, in 1997:

    SELECT Brand, Country, SUM([Units Sold])
    FROM Fact.Sales
    JOIN Dim.Date ON Date_FK = Date_PK
    JOIN Dim.Store ON Store_FK = Store_PK
    JOIN Dim.Product ON Product_FK = Product_PK
    WHERE [Year] = 1997 AND [Product Category] = 'tv'
    GROUP BY Brand, Country

http://en.wikipedia.org/wiki/Star_schema
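To try the star schema query end to end, a small self-contained sketch using Python's built-in sqlite3 module is given here. It is an adaptation, not the original schema: the tables are named FactSales, DimDate, DimStore and DimProduct (no schema prefixes) to keep the SQLite setup simple, the placement of Brand in DimProduct and Country in DimStore is assumed, and all sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three dimension tables with simple primary keys, and one fact table whose
# three foreign keys together form the compound primary key.
conn.executescript("""
CREATE TABLE DimDate    (Date_PK INTEGER PRIMARY KEY, [Year] INTEGER);
CREATE TABLE DimStore   (Store_PK INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE DimProduct (Product_PK INTEGER PRIMARY KEY, Brand TEXT,
                         [Product Category] TEXT);
CREATE TABLE FactSales (
    Date_FK      INTEGER REFERENCES DimDate(Date_PK),
    Store_FK     INTEGER REFERENCES DimStore(Store_PK),
    Product_FK   INTEGER REFERENCES DimProduct(Product_PK),
    [Units Sold] INTEGER,
    PRIMARY KEY (Date_FK, Store_FK, Product_FK)
);

-- Invented sample data.
INSERT INTO DimDate    VALUES (1, 1997), (2, 1998), (3, 1997);
INSERT INTO DimStore   VALUES (1, 'US'), (2, 'DE');
INSERT INTO DimProduct VALUES (1, 'Acme', 'tv'), (2, 'Acme', 'radio'),
                              (3, 'Zenit', 'tv');
INSERT INTO FactSales  VALUES
    (1, 1, 1, 10),
    (3, 1, 1, 4),
    (1, 2, 1, 5),
    (1, 1, 3, 7),
    (1, 1, 2, 99),  -- radio: filtered out by product category
    (2, 1, 1, 50);  -- 1998: filtered out by year
""")

# Same shape as the query on the slide; ORDER BY is added only to make the
# result order deterministic.
rows = conn.execute("""
    SELECT Brand, Country, SUM([Units Sold])
    FROM FactSales
    JOIN DimDate    ON Date_FK = Date_PK
    JOIN DimStore   ON Store_FK = Store_PK
    JOIN DimProduct ON Product_FK = Product_PK
    WHERE [Year] = 1997 AND [Product Category] = 'tv'
    GROUP BY Brand, Country
    ORDER BY Brand, Country
""").fetchall()

for brand, country, units in rows:
    print(brand, country, units)
```

Because every key column carries a _PK or _FK suffix, the join conditions need no table aliases, which is the point the slide text makes about suffix decorations.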
http://en.wikipedia.org/wiki/Snowflake_schema
In a snowflake schema, DIMS branch out into related sub-dimension tables, which brings them closer to 3rd normal form.
The skyrocketing power of hardware and software, along with the availability of affordable and easy-to-use reporting and analysis tools, has played the most important role in the evolution of data warehouses.
Another factor that is fast becoming an important variable in data warehousing equations is the emergence of vendors with popular business application suites. Led by the wildly popular German software vendor SAP AG, flexible business software suites adapted to the particulars of a business have become a very popular way to move to a sophisticated multi-tier architecture. Other vendors such as Baan, PeopleSoft, and Oracle have likewise come out with suites of software that provide different strengths but have comparable functionality. The emergence of these application suites has a direct bearing on the increased use of data warehousing, in that they are increasingly able to provide standard applications that are replacing existing custom-developed legacy applications. In the near future, almost every data warehouse is likely to derive data from one of these application sources rather than from customized extraction from legacy systems. Further, there are significant initiatives at these vendors to make transaction data easily available to data warehousing systems. To the extent that these standard applications have extensive customization features, data acquisition from these applications can be much simpler than from the mainframe systems.
Provides consistent use of data element (entity attribute) values, e.g., M, F vs. 1, 2 for gender
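As a rough illustration of what consistent values mean in practice, the sketch below maps source-specific gender codes onto one warehouse standard. The source system names ("crm", "billing"), the assignment of 1 to male and 2 to female, and the fallback code "U" for unknown values are assumptions made for this example only.

```python
# Per-source lookup tables translating local codes to the warehouse standard.
# System names and the 1 = M / 2 = F assignment are assumed for illustration.
GENDER_CODE_MAPS = {
    "crm": {"M": "M", "F": "F"},
    "billing": {"1": "M", "2": "F"},
}

UNKNOWN = "U"  # assumed standard code for missing or unmapped values


def standardize_gender(source_system, raw_value):
    """Translate a source-specific gender code to the standard M/F/U code."""
    mapping = GENDER_CODE_MAPS[source_system]
    key = str(raw_value).strip().upper()
    return mapping.get(key, UNKNOWN)


standardized = [
    standardize_gender("crm", " m "),
    standardize_gender("billing", 2),
    standardize_gender("billing", "9"),
]
```

Holding the mappings in a table rather than in if/else logic means a new source system can be added by supplying one more dictionary entry.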
Yes, we can come up with more – but we’ll pay attention to these
“A challenge that organizations face as they attempt to define data quality key performance indicators is that completeness, validity and integrity may be relatively easy to measure, but measuring consistency, accuracy and timeliness is a whole other story.” (Information Mgmt)
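The two "relatively easy" indicators from the quote, completeness and validity, can be computed with a few lines of code, as sketched here. The definitions used (share of populated values, and share of populated values that fall in an allowed set) are one common reading rather than anything specified in the source, and the sample records are invented.

```python
def is_populated(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is not None and str(value).strip() != ""


def completeness(records, field):
    """Fraction of records in which the field is populated."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if is_populated(r.get(field)))
    return filled / len(records)


def validity(records, field, allowed_values):
    """Fraction of populated values that belong to the allowed set."""
    populated = [r[field] for r in records if is_populated(r.get(field))]
    if not populated:
        return 0.0
    valid = sum(1 for v in populated if v in allowed_values)
    return valid / len(populated)


# Invented sample records.
customers = [
    {"id": 1, "gender": "M"},
    {"id": 2, "gender": "F"},
    {"id": 3, "gender": ""},
    {"id": 4, "gender": "X"},
]

gender_completeness = completeness(customers, "gender")
gender_validity = validity(customers, "gender", {"M", "F"})
```

Consistency, accuracy and timeliness are harder because they need a second reference (another system, the real world, or a delivery deadline) to compare against, which is why no equivalent one-liner is shown for them.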
Hardware, software licenses, ETL, testing, promotion to production
For the purpose of this analysis, Ability to Execute is a function of a vendor's score of five measures that Gartner believes customers care about most in vendor selection. It does not equate to revenue, revenue growth or market share. Completeness of Vision is based on the scoring of six key measures, including, but not exclusive to, "Offering (Product) Strategy." It is important to understand these criteria while judging vendors' positions on the Magic Quadrant. These evaluation criteria are detailed in the Evaluation Criteria section of this document.
With an analytical approach, the Patriots managed to win the Super Bowl three times in four years. The team uses data and analytical models extensively, both on and off the field. In-depth analytics help the team select players and stay below the NFL salary cap. Patriots coaches and players are renowned for their extensive study of game film and statistics, and Coach Bill Belichick reads articles by academic economists on statistical probabilities of football outcomes. Off the field, the team uses detailed analytics to assess and improve the "total fan experience." At every home game, for example, 20 to 25 people have specific assignments to make quantitative measurements of the stadium food, parking, personnel, bathroom cleanliness and other factors. In retail, Wal-Mart uses vast amounts of data and category analysis to dominate the industry. Harrah’s has changed the basis of competition in gaming from building megacasinos to analytics around customer loyalty and service. Amazon and Yahoo aren't just e-commerce sites; they are extremely analytical and follow a "test and learn" approach to business changes. Capital One runs more than 30,000 experiments a year to identify desirable customers and price credit card offers.
Mainly 2 tools: Multidimensional OLAP (MOLAP) and Relational OLAP (ROLAP). HOLAP is a hybrid of the two.
BI Engineer job posting:

Responsibilities: Act as a point person for statistical analyses, data deep dives, and general reporting. Deep dive into massive data sets to answer key business questions using MS Excel, Oracle, SQL, SAS, Perl, and other data manipulation languages. Interact with key stakeholders to understand business issues and recommend approaches to ensure business questions are properly answered. Manage large-scale requests and projects to define requirements, manage timelines, and coordinate activities with other involved team members. Use experimental design and statistics to assist in the design and measurement of marketing tests. Report on key business metrics. Participate in the design and development of the analytics and reporting data mart. Using data mining techniques, statistics, and SAS, build predictive models and segmentation schemes for the purposes of cross sell, retention, acquisition, and lifetime value.

Qualifications: Master’s degree or foreign equivalent in Mathematics, Statistics, Analytics, Operations Research, or a related field plus one year of progressively responsible experience in the job offered or as a Business Analyst, Data Engineer, or another related occupation. Employer will accept a Bachelor’s degree in Mathematics, Statistics, Analytics, Operations Research, or a related field plus five years of experience in the specialty as equivalent to a Master’s degree and one year of progressively responsible experience. Experience in the job offered or related occupation must involve performing data modeling, database development, and statistical testing and analysis of large-scale datasets using Oracle SQL, Perl, MS Excel, and SAS.