
Data warehouse architecture

These are the slides I made for a class presentation; they are now public.


  1. 1. The topics I'll focus on:
      • What is a data warehouse?
      • The architecture of a data warehouse
      • OLAP servers, their types, and how they work
      • Data marts
  2. 2. What is a data warehouse all about?
  3. 3. A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data that is used primarily in organizational decision making.
      • Subject-oriented: a database and a data warehouse are two different things, so they take different approaches to storing data.
      • Integrated: data is brought into a common format.
      • Time-variant: the data is historical, and each item is associated with a time.
      • Non-volatile: data is stored in a form that is neither deleted nor updated.
  4. 4. Subject-oriented?
      • Operational database (application orientation). Business: order processing, stock management, billing. Bank: savings account, current account, loan account.
      • Data warehouse (subject orientation). Business: sales. Bank: account.
  5. 5. Explanation
      • As both the business and the bank examples show, databases store data application-wise. For every operational application of the organization there is an associated store that holds that application's data. These stores are called databases.
      • In the organization's data warehouse, by contrast, data is stored subject-wise. The subject is the most important aspect of the organization: for a bank the account is what matters, for a business it is sales.
  6. 6. Integrated?
      • Data in the DW comes from several operational systems.
      • Different datasets in these operational systems have different file formats.
      • Example: data for the subject Account comes from 3 different data sources in the operational environment: savings, current, and loan (as shown in the figure).
  7. 7. So there can be variations, such as:
      1. Naming conventions and field sizes can differ. Example: a savings account number could be 8 bytes long while a checking account number is only 6 bytes.
      2. The total number of attributes for a data item can differ. Example: a savings account can have 5 attributes while a checking account has 7 attributes associated with it.
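The variations above are what the integration step has to reconcile. A minimal Python sketch of that idea follows. The field names (`acct_no`, `account_number`, `bal`, `balance`) and the 8-character zero-padded target format are assumptions made for illustration, not something the slides specify.

```python
# Hypothetical source records: each operational system names and sizes
# its fields differently (all field names here are illustrative).
savings_rows = [{"acct_no": "12345678", "bal": 500.0}]
checking_rows = [{"account_number": "654321", "balance": 120.5}]


def to_common_format(row, id_field, balance_field, account_type):
    """Map one source record onto a single warehouse layout."""
    return {
        # Assumed target format: account ids zero-padded to 8 characters.
        "account_id": str(row[id_field]).zfill(8),
        "account_type": account_type,
        "balance": float(row[balance_field]),
    }


# All sources end up under one subject (Account) in one common format.
account_subject = (
    [to_common_format(r, "acct_no", "bal", "savings") for r in savings_rows]
    + [to_common_format(r, "account_number", "balance", "checking")
       for r in checking_rows]
)

for record in account_subject:
    print(record)
```

Every record now has the same three attributes and the same id length, whichever system it came from.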
  8. 8. Time-variant?
      • An operational database stores only current data, but the data warehouse stores present as well as past data in order to fulfil its purpose.
      • Data is stored as a series of snapshots, each representing a period of time.
      • Data is tagged with some element of time: creation date, as-of date, etc.
      • Data is available online for long periods of time (for example, five or more years) for trend analysis and forecasting.
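The snapshot idea can be sketched in a few lines of Python. This is only an illustration: the account id, the dates, and the balances are made up, and a plain list stands in for the warehouse.

```python
from datetime import date

# Hypothetical snapshot store: every row carries an "as of" date.
snapshots = []


def take_snapshot(as_of, balances):
    """Append the state of each account as it was on `as_of`."""
    for account_id, balance in balances.items():
        snapshots.append(
            {"as_of": as_of, "account_id": account_id, "balance": balance}
        )


take_snapshot(date(2023, 1, 31), {"A1": 100.0})
take_snapshot(date(2023, 2, 28), {"A1": 140.0})
take_snapshot(date(2023, 3, 31), {"A1": 90.0})


def history(account_id):
    """Return (as_of, balance) pairs in time order, for trend analysis."""
    rows = [s for s in snapshots if s["account_id"] == account_id]
    rows.sort(key=lambda s: s["as_of"])
    return [(s["as_of"], s["balance"]) for s in rows]


trend = history("A1")
print(trend)
```

An operational database would hold only the latest balance; here all three dated values remain available.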
  9. 9. Non-volatile?
      • Data from the operational systems is moved into the DW at specific intervals (this process is called refreshing).
      • Business transactions do not update data in the data warehouse.
      • Data is not deleted from the data warehouse.
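One way to picture non-volatility is an append-only store. The sketch below is an assumption-laden illustration: `NonVolatileStore` and its method names are hypothetical, and a real warehouse would enforce this through its load process and permissions rather than a Python class.

```python
class NonVolatileStore:
    """Append-only store: rows can be loaded and read, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def refresh(self, new_rows):
        """Periodic load from the operational systems; it only appends."""
        self._rows.extend(dict(r) for r in new_rows)

    def rows(self):
        """Return copies so that callers cannot modify the stored data."""
        return [dict(r) for r in self._rows]


store = NonVolatileStore()
store.refresh([{"account_id": "A1", "balance": 100.0}])
store.refresh([{"account_id": "A1", "balance": 140.0}])

snapshot = store.rows()
snapshot[0]["balance"] = 0.0  # changes only the caller's copy

print(store.rows())
```

The class deliberately offers no update or delete method, so the second refresh adds a row instead of overwriting the first.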
  10. 10. The 3-tier architecture of the data warehouse
      • When all the components of a system are combined to form the complete system, the style in which that structure is designed (combined) is known as the architecture of the system (e.g. the architecture of a school building).
      • In a data warehouse the components are: 1. Data acquisition 2. Data storage 3. Data processing 4. Data delivery
      • Layers (e.g. the OSI reference model in computer networks) mean the system is made of logically separated components; tiers mean the system is made of physically separated components.
  11. 11. The various possible architectures when dealing with a database:
      • Here the database (in the form of files) is itself stored on the client computer.
      • Here the database server is located in a distant place, and the client machine and the database are connected via a network.
  12. 12. Here an application server has been added between the client machine and the database server. It sits mainly on the server side, does the processing, and returns the results to the client machine.
  13. 13. Conclusions (table): the tiers compared by security, maintainability, number of users, speed, and cost.
  14. 14. The architecture of the data warehouse (figure)
      • Information sources: external sources and operational DBs, brought in through extract, transform, load.
      • Tier 1, data warehouse server (data tier): the data warehouse and the data marts.
      • Tier 2, OLAP servers (logic tier): MOLAP server and ROLAP server.
      • Tier 3, clients (presentation tier): OLAP, query/reporting, data mining.
      • Each tier serves the tier above it.
  15. 15. The bottommost layer:
      • Operational databases: the application-specific databases used to store all of the organization's day-to-day transactional data.
      • External sources: the databases used to store all important external information.
  16. 16. Database vs. data warehouse
      • OLTP (on-line transaction processing): the major task of a traditional relational DBMS. Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
      • OLAP (on-line analytical processing): the major task of a data warehouse system. Data analysis and decision making; forecasting and monitoring of the business.
  17. 17. How is the warehouse loaded? This is done using back-end tools, which are described on the next slide.
  18. 18. Back-end tools:
      • Data extraction: get data from multiple, heterogeneous, and external sources.
      • Data cleaning: correct wrong values.
      • Data transformation: convert from one format to another (e.g. pounds to kilograms, date of birth to age).
      • Load: summarized tables are loaded into the data warehouse.
      • Refresh: propagate updates from the data sources to the warehouse.
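The clean, transform, and load steps can be sketched as small Python functions. Everything concrete here is an assumption for illustration: the sample rows, the field names, the choice to tidy names as the cleaning step, and the list standing in for the warehouse. The two conversions are the ones the slide mentions (pounds to kilograms, date of birth to age).

```python
from datetime import date

# Hypothetical extracted rows (names and values are illustrative).
extracted = [
    {"name": " alice ", "weight_lb": 150.0, "dob": date(1990, 6, 15)},
    {"name": "BOB", "weight_lb": None, "dob": date(1985, 1, 2)},
]

LB_TO_KG = 0.45359237  # definition of the international pound


def clean(row):
    """Data cleaning: correct bad values (here: tidy up the name)."""
    fixed = dict(row)
    fixed["name"] = row["name"].strip().title()
    return fixed


def age_on(dob, today):
    """Whole years between the date of birth and `today`."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def transform(row, today):
    """Data transformation: pounds -> kilograms, date of birth -> age."""
    weight = row["weight_lb"]
    return {
        "name": row["name"],
        "weight_kg": None if weight is None else round(weight * LB_TO_KG, 2),
        "age": age_on(row["dob"], today),
    }


warehouse = []  # stand-in for a warehouse table


def load(rows):
    """Load: append the transformed rows to the warehouse."""
    warehouse.extend(rows)


today = date(2024, 1, 1)  # fixed date so the result is reproducible
load([transform(clean(r), today) for r in extracted])
print(warehouse)
```

A refresh would rerun the same pipeline on rows that have changed in the sources since the last load.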
  19. 19. Tier 1: data warehouse
      • This is the data warehouse itself, loaded with strategy-making information.
      • This tier also contains the data marts.
  20. 20. Tier 2
      • This tier consists of the OLAP servers, which are used for processing. The following concerns are also handled here:
      • Security of data (the user is not allowed to communicate directly with the database).
      • Business logic (here you decide what kind of information is shown for a particular kind of query).
      • Translation (the user's high-level queries are converted into low-level SQL queries).
      • Intermediate calculations (this removes the burden from the user interface and the database).
  21. 21. OLAP server
      • ROLAP server: choose this if space is important for you.
      • MOLAP server: choose this if time is important for you.
  23. 23. ROLAP (figure): the desktop client gets a multidimensional view from the ROLAP server, which creates the data cube dynamically (on the fly) from the RDBMS server that holds the data warehouse.
  24. 24. Details
      • Relational online analytical processing (ROLAP) is a form of online analytical processing (OLAP) that performs multidimensional analysis of data stored in a relational database rather than in a multidimensional database.
      • In a three-tiered architecture, the user submits a request for multidimensional analysis and the ROLAP engine converts the request to SQL for submission to the relational database. The operation is then performed in reverse: the engine converts the resulting data from SQL to a multidimensional format (on the fly) before it is returned to the client for viewing.
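The request-to-SQL-and-back round trip can be sketched with Python's built-in `sqlite3` module standing in for the relational database. This is a toy under stated assumptions: `rolap_query` and `ALLOWED_DIMENSIONS` are hypothetical names, the rows are the `sale` rows used in these slides, and a real ROLAP engine is far more elaborate.

```python
import sqlite3

# In-memory relational store standing in for the warehouse RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
)
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
     ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)],
)

ALLOWED_DIMENSIONS = {"prodId", "storeId", "date"}


def rolap_query(row_dim, col_dim):
    """Translate a two-dimensional request to SQL, then pivot the result."""
    if row_dim not in ALLOWED_DIMENSIONS or col_dim not in ALLOWED_DIMENSIONS:
        raise ValueError("unknown dimension")
    # Column names are checked against the whitelist above before they
    # are placed into the SQL text.
    sql = (
        f"SELECT {row_dim}, {col_dim}, SUM(amt) FROM sale "
        f"GROUP BY {row_dim}, {col_dim}"
    )
    cube = {}
    # Convert the flat SQL rows into a nested, multidimensional view.
    for row_key, col_key, total in conn.execute(sql):
        cube.setdefault(row_key, {})[col_key] = total
    return cube


view = rolap_query("prodId", "storeId")
print(view)
```

Nothing is precomputed: the cube is rebuilt from the relational rows on every request, which is the on-the-fly behaviour described above.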
  25. 25. Query: add up the total sale amount by day.
      In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date
      Table sale:
        prodId  storeId  date  amt
        p1      s1       1     12
        p2      s1       1     11
        p1      s3       1     50
        p2      s2       1     8
        p1      s1       2     44
        p1      s2       2     4
      Result ans:
        date  sum
        1     81
        2     48
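The query on this slide can be run as-is with Python's built-in `sqlite3` module. The in-memory database and the column types are assumptions; the rows are the ones shown in the `sale` table, and an `ORDER BY` is added only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
)
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
     ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)],
)

# Total sale amount per day, as in the slide's query.
totals = conn.execute(
    "SELECT date, SUM(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()

print(totals)  # [(1, 81), (2, 48)]
```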
  27. 27. MOLAP (figure): the desktop client gets a multidimensional view from the MOLAP server, which reads from a multidimensional database built from the RDBMS server that holds the data warehouse.
  28. 28. Points about MOLAP:
      • A multidimensional database is used to fetch data when the user submits an analytical query.
      • Facts (the fact table) are stored in multidimensional arrays.
      • Dimensions (the dimension tables) are used to index the arrays.
      • One of the major distinctions between a MOLAP and a ROLAP tool is that the data is pre-summarized and pre-calculated, and is stored in an optimized format in a multidimensional cube instead of in a relational database, in accordance with the client's reporting requirements.
  29. 29. MOLAP is better optimized for fast query performance and retrieval of summarized information, because pre-calculating (pre-consolidating) the transactional data improves speed. There are certain limitations to implementing a MOLAP system. Its primary weakness is that a MOLAP tool is less scalable than a ROLAP tool, since it can handle only a limited amount of data.
  30. 30. The MOLAP cube. Query: add up the total sale amount by day.
      Fact table view (sale):
        prodId  storeId  amt
        p1      s1       12
        p2      s1       11
        p1      s3       50
        p2      s2       8
      Multidimensional cube (dimensions = 2):
              s1   s2   s3
        p1    12        50
        p2    11   8
  31. 31. Query: add up the total sale amount by day.
      Fact table view (sale):
        prodId  storeId  date  amt
        p1      s1       1     12
        p2      s1       1     11
        p1      s3       1     50
        p2      s2       1     8
        p1      s1       2     44
        p1      s2       2     4
      Multidimensional cube (dimensions = 3):
        day 1   s1   s2   s3
        p1      12        50
        p2      11   8
        day 2   s1   s2   s3
        p1      44   4
        p2
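The three-dimensional cube can be sketched with nested Python lists, using the dimension members as indexes into the array. This is an illustration only: empty cells are stored as 0 here (the slide leaves them blank), and names such as `sale_amount` and `total_by_day` are hypothetical.

```python
# Fact rows (prodId, storeId, date, amt) from the sale table in these slides.
facts = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

# Dimensions index the array: each member maps to a position.
products = ["p1", "p2"]
stores = ["s1", "s2", "s3"]
days = [1, 2]
p_idx = {p: i for i, p in enumerate(products)}
s_idx = {s: i for i, s in enumerate(stores)}
d_idx = {d: i for i, d in enumerate(days)}

# Build the cube once, at load time: cube[day][product][store].
# Cells with no fact row stay at 0.
cube = [[[0 for _ in stores] for _ in products] for _ in days]
for prod, store, day, amt in facts:
    cube[d_idx[day]][p_idx[prod]][s_idx[store]] += amt

# Pre-summarize: totals per day are computed before any query arrives.
total_by_day = {
    day: sum(sum(row) for row in cube[d_idx[day]]) for day in days
}


def sale_amount(prod, store, day):
    """Query time is a direct array lookup, with no scan of the fact rows."""
    return cube[d_idx[day]][p_idx[prod]][s_idx[store]]


print(total_by_day)
print(sale_amount("p1", "s3", 1))
```

The trade-off is visible even at this size: the array reserves a cell for every combination of product, store, and day, whether or not a sale exists for it.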
  32. 32. Reading a cell of the cube:
      • The total sale of computers in the year 2008 at the location Asia is 200 units.
      • The total sale of books in the year 2008 at the location Europe is 200.
  33. 33. Hybrid OLAP (HOLAP)
      • HOLAP = hybrid OLAP: the best of both worlds.
      • Detailed data is stored in an RDBMS.
      • Aggregated data is stored in an MDBMS.
      • Users access the data via MOLAP tools.
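The split between detailed and aggregated data can be sketched as follows. The stand-ins are assumptions: `sqlite3` plays the RDBMS, a plain dict plays the MDBMS, and the function names `total` and `drill_through` are hypothetical. The rows are the `sale` rows used earlier in these slides.

```python
import sqlite3

rows = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

# Detailed data lives in a relational store (sqlite3 stands in for the RDBMS).
rdbms = sqlite3.connect(":memory:")
rdbms.execute(
    "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
)
rdbms.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", rows)

# Aggregated data lives in a multidimensional store
# (a dict keyed by (product, day) stands in for the MDBMS).
aggregates = {}
for prod, _store, day, amt in rows:
    aggregates[(prod, day)] = aggregates.get((prod, day), 0) + amt


def total(prod, day):
    """Summary query: answered from the pre-aggregated store."""
    return aggregates.get((prod, day), 0)


def drill_through(prod, day):
    """Detail query: reaches through to the relational rows via SQL."""
    cur = rdbms.execute(
        "SELECT storeId, amt FROM sale "
        "WHERE prodId = ? AND date = ? ORDER BY storeId",
        (prod, day),
    )
    return cur.fetchall()


print(total("p1", 1))
print(drill_through("p1", 1))
```

Summary questions never touch the relational rows, while a request for the underlying detail falls back to SQL.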
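  34. 34. Data flow in HOLAP (figure): the client has a multidimensional viewer and a relational viewer. The multidimensional viewer gets multidimensional access to the multidimensional data on the MDBMS server. The relational viewer reads from the RDBMS server (SQL-Read), which holds user data, meta data, and derived data. The MDBMS server reaches the RDBMS server through SQL-Read and SQL-Reach Through.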
  35. 35. Front-end tools (figure): query results are delivered to computers and mobile phones as reports and graphs, such as pie charts and bar charts.
  36. 36. Data mart
  37. 37. THANK YOU