Data
in databases
  “It’s not what you think”


            Clare Somerville
                Trish O’Kane
Our point
 Long term preservation of data requires
  understanding how data is created and
  managed
 We have to work out:
    ◦ What data the business needs to keep
    ◦ What records the business needs
      to create and keep
   And….. how
    ◦ What data must be unchanged
    ◦ What we mean by usable and retrievable
The problem, as we see it
We will cover:
 • What is a record, and what are its attributes?
 • What is a database, and how are databases built and maintained?
 • How can we use data sets to create records?
 • What is a data warehouse, and how are data warehouses built and maintained?
 • How can we ensure that useful data sets are available over time?
Agenda
   The problem
   Definitions
   Delivering data &
    records from data
    ◦ Data warehousing
    ◦ Data “lifecycle”
     management

   Conclusion
The problem
   Databases have replaced many semi-structured
    records
    ◦ Register of Births, Deaths and Marriages (and Divorces!)
    ◦ EQC claims data
   But - we want some of that information available
    long term in a usable format
   Records managers are unfamiliar with the world
    of structured data
    ◦ Disposal outcome in a draft disposal authority:
      “When database decommissioned,
      transfer to Archives NZ”
    ◦ Transfer what?
Who wants what?
   What have we got?
    ◦ Data in databases
   What do we want?
    ◦ Records
   When do we want them?
    ◦ Now, and for the long term
   But….what is a record
    in the context of data?
    ◦ The individual data item?
    ◦ A whole dataset?
What have we got
1.       Customers
     ◦    Customers for data
     ◦    Customers for records
2.       Information assets
     ◦    Records
     ◦    Transactional data in databases
     ◦    Datasets
     ◦    Data marts and data warehouses
3.       What do we have to do?
     ◦    Principles from data warehousing
     ◦    Data life cycle management
Definitions
Records, metadata, data, source systems, database, data warehouse
Records
 Recordkeeping definition (Public Records Act 2005)
   A record or class of records in any form in whole or part, created
   or received by a public office in the conduct of its affairs
 In structured world
   A record is a line of data in a table in a database
Attributes of a record
 Recordkeeping perspective
   • Documents the carrying out of the organisation’s business
     objectives, core business functions, services and deliverables,
     and/or
   • Provides evidence of compliance with any current jurisdictional
     standards, and/or
   • Documents the value of the resources of the organisation and how
     risks to the business are managed, and/or
   • Supports the long-term viability of the organisation
 Data management perspective
   • Field types
     ◦ Numeric
     ◦ Character
     ◦ Date/time
   • Composite, derived
   • Values
Data and metadata
Documents and metadata
“Essentially there is a different relationship
                  between
           data and its metadata
                     than
      documents and their metadata”
Is it data or is it metadata?
It depends, doesn’t it?

It’s about the level at which it is used/applied
  E.g. Date created

Customer ID   Date created   Customer name   Customer Type
123           2008-10-20     Bloggs, Joe     Retailer
124           2008-10-23     Mouse, Minnie   Distributor
125           2008-10-26     Max,            Direct

(Slide callouts: “date created”, “Metadata”)
Metadata in the data warehouse

Business metadata                    Technical metadata

Link between database and users –    What data, from where, how, when etc
road map for access
 • Business users                    • Developers
 • Analysts                          • Technical users
 • Less technical                    • Maintenance and growth
                                     • On-going development
Metadata in the data warehouse

Business metadata    Technical metadata

 Structure of data    Table names
 Table names          Keys
 Attribute names      Indexes
 Location             Program names
 Access               Job dependencies
 Reliability          Transformation
 Summarisations       Execution time
 Business rules       Audit, security controls
Metadata

     Data      Metadata

    10 bytes    1 byte
Metadata

     Data   Metadata

             Heaps!
Data – comma delimited
0349,000,A," ","CHANGE ADD ON MED CERT          "," "," "," ","
","S","GASUP","
",00000,71909,00000,0,71909,10393470,00000.00,00000.00,00000.00,00000.0
0,00000.00,000000,71937,72266,0,139,600,4,72266,471,360480713,000000000
,1,00090.00,00037.00,000031543560
",00000.00,00000.00,+000000.00,0000000,0000,000,00,000000000,00000,0000
0,000000.00,000000.00,000000000,009,72266,00000,72268,16414213,00000000
1,000000000,244,0114340511,04,01,+000000.00,+000000.00,00000,000000,+00
0000.00,610,0,00146.13,000000.00,000,000,610,0,290763901,290763901,0000
00000,000000000,000069987378
0174,000,D,"C","N","N","Y","Y","Y","N","N","3349533755","Y","T REED
","Y","DSWSINVE106      ","BELOQ","Y","NAWEK","TANIA
","REED
","C","N",02651,009,0000,72273,72268,16405202,0114340511,03,72245,0000,
003,011434,000000228855
0174,000,A,"C","N","N","Y","Y","Y","N","N","3349533755","Y","T REED
","Y","DSWSINVE106      ","BELOQ","Y","NAWEK","TANIA
","REED
","C","N",02651,009,0000,72273,72268,16405202,0114340511,04,72245,0000,
003,011434,000000228855
0161,000,D,"A",126,72263,00000.00,600,5,360480713,000007282728
0161,000,A,"A",126,72263,00000.00,600,5,360480713,000007282728
0057,000,D," ","          ","      ","          ","     ","A","
","
","AHMEV","VOKOG",000000003,0814409,2500,001,25,00,00000.00,000000,00,1
32,00000,0,+00063.00,72266,14133031,00000,00000.00,2,+00063.00,01,00000
.00,000000,2,0,0,00000.00,607,1,471,362400470,000000000,000409413299
0057,000,A," ","MANOP     ","      ","          ","     ","A","
","
","AHMEV","VOKOG",000000003,0814409,2500,001,25,00,00000.00,000000,00,1
32,72269,0,+00063.00,72266,14133031,00000,00000.00,2,+00063.00,01,00000
.00,000000,2,0,0,00000.00,607,1,471,362400470,000000000,000409413299
0270,000,A," "," ","
","N","N","G",128,72266,72268,16414261,01,00000.000,00000.000,0,139,000
00.00,00000.00,00000.00,00000.00,00000.00,00000.00,00000.00,00000.00,00
000.00,600,5,471,000,000,000,000,000,000,000,000,0001,360480713,00000.0
0,000602537445
0062,000,A,"YYYYYYYYYY                                ","AUTH
01532600063000000000131197N014101       0000000
","VA","SATRA","DSWSAUCK119       "," ","MANOP","
",003,132,72268,16414266,0000000000,0,607,362400470,000000000,000084800
530
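Why the extract needs "heaps" of metadata can be shown with a minimal sketch. It parses the two 0161 rows from the extract, once positionally and once against a field layout. The layout names below are invented for illustration only; the real meaning of each position is exactly the metadata that the extract does not carry.

```python
import csv
import io

# Two rows copied from the comma-delimited extract (a "D" and an "A" variant).
RAW = (
    '0161,000,D,"A",126,72263,00000.00,600,5,360480713,000007282728\n'
    '0161,000,A,"A",126,72263,00000.00,600,5,360480713,000007282728\n'
)

# HYPOTHETICAL layout: every field name here is an assumption made for
# illustration. The slides do not say what any of these positions mean.
LAYOUT_0161 = [
    "record_type", "sequence", "action", "status", "office", "day_number",
    "amount", "code_a", "code_b", "customer_ref", "row_id",
]


def parse(raw, layout):
    """Return one dict per row, pairing each value with its field name."""
    rows = []
    for values in csv.reader(io.StringIO(raw)):
        if len(values) != len(layout):
            raise ValueError(f"expected {len(layout)} fields, got {len(values)}")
        rows.append(dict(zip(layout, values)))
    return rows


# Without a layout we only have positions: values[2] means nothing by itself.
positional = list(csv.reader(io.StringIO(RAW)))
print(positional[0][2])

# With a layout the same value becomes a named, interpretable field.
records = parse(RAW, LAYOUT_0161)
for record in records:
    print(record["record_type"], record["action"], record["amount"])
```

If the layout is lost when a system is decommissioned, only the positional form remains.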
Data – in a table
Database
3 layers
 • User interface
 • Rules and algorithms
 • Data
Application layer
 • Provides views, creates reports
 • Turns data into information
 • Adds, overwrites, deletes data
 • Runs rules and processes
Data layer
 • Data in tables
 • Acted on by application layer
Can data fit the PRA definition?
• We are “format neutral” in the
  management of records, so….
• Data can be records!
  – Births Deaths and Marriages Register
  – EQC claims data
• Test questions
  – If we exclude data what have we lost?
  – What is the impact of losing data?
     • On the business
     • For the future
Source solution is not a recordkeeping system
(Diagram: application layer over data layer)
The Solution System is not a recordkeeping system because it…
 • Holds transactional data, not evidence of transactions in context
   (records)
 • Isn’t tamper proof
    – Difficult to know exactly what the application layer is doing
    – Different tables and rows may be managed differently
    – Hard to roll back to a point in time
 • Must overwrite ‘redundant’ data to run efficiently
    – Compromise of history vs speed
    – Business use is the priority
 • The data layer is not usable without the application layer
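The overwrite problem can be sketched with Python's built-in sqlite3 module. The table and column names are assumptions, and the 2009 change date is made up; only customer 123 and the 2008-10-20 date come from the earlier example table. The first half shows an in-place update losing the earlier value; the second half shows one possible history-keeping alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_type TEXT)"
)
conn.execute("INSERT INTO customer VALUES (123, 'Retailer')")

# The application layer overwrites the value in place to keep the table lean.
conn.execute(
    "UPDATE customer SET customer_type = 'Distributor' WHERE customer_id = 123"
)
current = conn.execute(
    "SELECT customer_type FROM customer WHERE customer_id = 123"
).fetchall()
print(current)  # only the latest value is left in the table

# A history-keeping alternative appends a dated row for every change.
# The second date below is invented for illustration.
conn.execute(
    "CREATE TABLE customer_history "
    "(customer_id INTEGER, customer_type TEXT, valid_from TEXT)"
)
conn.executemany(
    "INSERT INTO customer_history VALUES (?, ?, ?)",
    [(123, "Retailer", "2008-10-20"), (123, "Distributor", "2009-03-01")],
)
history = conn.execute(
    "SELECT customer_type, valid_from FROM customer_history "
    "WHERE customer_id = 123 ORDER BY valid_from"
).fetchall()
print(history)
```

The history table costs more storage and slower writes, which is the history vs speed compromise named above.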
Inside a database
   Here today - gone tomorrow
   Transaction metadata
    ◦ Example: An activity about a customer is a record
   Is there a Unique ID
      For the transaction?
      For the customer?
   Where and when are/were components
    located?
      Multiple data tables in one database
      Multiple data tables across multiple databases
   Table names and column names
   Standard names for elements across tables
Source / business databases
   Data stored in tables
   Normalised structure
   Lots of data
   Large number of users
   Lots of very quick transactions
   Varying history retained
   Mostly data is overwritten
Data warehouse
Data warehouse
 Storing and accessing large amounts of
 data

 Central repository for all or significant
 parts of the data that an enterprise’s
 various business systems collect
Data warehouse
 • Historical data
 • Multiple source systems
 • Designed for reporting and analysis
 • Lots of data!
 • Transaction level data
 • Large queries
 • Multiple table joins
 • Corporate effort
 • Centrally owned
 • Unpredictable use
 • Corporate needs
 • Pressure on resources
What is the
simplest/most robust
approach
to deliver
data and
records from databases?
Elegant solutions needed
1 Create policy to document:
   What authoritative records must be retained
    and what metadata must be retained
   What formats are acceptable
   Which (if any) records and metadata are
    considered transient artefacts, and why (e.g.
    format shifting duplicates, quality checking
    etc),
   Get approval for destruction of transient
    artefacts as part of the normal functioning of
    the systems that dispose of them
Approach: create and export
records from solution system
1.   Identify what data tables/records are needed
     and that can be produced
2.   Map identified records to disposal authorities
     ◦   Which records must be kept beyond system
         decommission
     ◦   Identify the business need for retention
3.   Use the application layer to create and export
     those records in a suitable format
4.   Store in recordkeeping system e.g. data
     warehouse or EDRMS
5.   Retain records needed for the business post-
     decommission
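Steps 1 to 4 can be sketched as follows, using Python's built-in sqlite3 and json modules. Everything named here is an assumption for illustration: the two tables, the activity row, the "customer-db" system name and the "DA-EXAMPLE-01" disposal class. The point is that the exported record joins the tables that give a transaction its context and carries its own metadata.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Stand-in for the solution system: two hypothetical tables, one row each.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY, name TEXT, customer_type TEXT);
    CREATE TABLE activity (
        activity_id INTEGER PRIMARY KEY, customer_id INTEGER,
        activity_date TEXT, description TEXT);
    INSERT INTO customer VALUES (123, 'Bloggs, Joe', 'Retailer');
    INSERT INTO activity VALUES (1, 123, '2008-10-20', 'Account opened');
    """
)


def export_record(conn, activity_id, disposal_class):
    """Build one self-describing record for an activity and its customer."""
    row = conn.execute(
        """
        SELECT a.activity_id, a.activity_date, a.description,
               c.customer_id, c.name, c.customer_type
        FROM activity AS a
        JOIN customer AS c ON c.customer_id = a.customer_id
        WHERE a.activity_id = ?
        """,
        (activity_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no activity with id {activity_id}")
    keys = ["activity_id", "activity_date", "description",
            "customer_id", "customer_name", "customer_type"]
    return {
        "content": dict(zip(keys, row)),
        "metadata": {
            "source_system": "customer-db",          # hypothetical name
            "source_tables": ["activity", "customer"],
            "disposal_class": disposal_class,        # hypothetical mapping
            "exported_at": datetime.now(timezone.utc).isoformat(),
        },
    }


record = export_record(conn, 1, "DA-EXAMPLE-01")
print(json.dumps(record, indent=2))
```

JSON is used here only because it is plain text; the slides do not prescribe an export format.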
2 Persistently associate metadata
   Appropriate metadata associated and
    retained with authoritative records
    ◦ Identify data linkages between systems
    ◦ Retain those linkages
    or
    ◦ Consolidate metadata and associated record
      objects into one system, and ensure they are
      persistently associated
   Ensure migrated data/metadata/objects
    retain their context (e.g. date
    created, author etc)
Future state
BAU transfers to recordkeeping systems
 Systems: EDRMS, customer mgmt system, case mgmt system
 • Structured data to data warehouse
 • Create key records and send to EDRMS
Data warehouses as an
example of good practice
Managing data
Data feeds - principles
   Direct data feeds from source systems
   Not changed in any way
   No intervening processes
   All changes to the data
   Fully auditable
   Reconcile to source system
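One way to make "reconcile to source system" concrete is to compare a row count and a content digest on both sides of a feed. This is a sketch under assumptions: the function names are invented, and rows are taken to be simple tuples. It is not a description of any particular warehouse tool.

```python
import hashlib


def fingerprint(rows):
    """Return (row count, order-independent SHA-256 digest) for the rows."""
    digest = hashlib.sha256()
    # Sorting makes the digest independent of the order rows arrive in.
    for line in sorted(repr(tuple(row)) for row in rows):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return len(rows), digest.hexdigest()


def reconcile(source_rows, warehouse_rows):
    """Compare what the source sent with what the warehouse loaded."""
    source = fingerprint(source_rows)
    warehouse = fingerprint(warehouse_rows)
    return {
        "source_count": source[0],
        "warehouse_count": warehouse[0],
        "matches": source == warehouse,
    }


source = [(123, "Retailer"), (124, "Distributor")]
print(reconcile(source, list(reversed(source))))  # same rows, different order
print(reconcile(source, source[:1]))              # one row went missing
```

Storing the result of each run alongside the load gives an audit trail for the feed.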
For Example: one table…
               Before           After
 Data held     29 months data   29 months data
 Storage       162 tapes        4 physical files
 Records       400 million      27 million
 Size          88 GB            6 GB

 Month 1, Month 2, Month 3, … Month n
   → compare each month with the next
   → Differences 1, Differences 2, …
   → consolidated file
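The compare-and-consolidate idea can be sketched as below. This is one possible reading of the diagram, not the authors' actual process: each monthly snapshot is compared with the previous one, and a new row is written only when a value changes. The function name, the snapshot structure and the sample values are assumptions.

```python
def consolidate(snapshots):
    """Collapse monthly snapshots into rows of (key, value, first, last).

    snapshots: list of (month, {key: value}) in chronological order.
    Each output row covers one unbroken run of months with the same value.
    """
    consolidated = []
    open_rows = {}  # key -> index of that key's current row in consolidated
    for month, snapshot in snapshots:
        for key, value in snapshot.items():
            index = open_rows.get(key)
            if index is not None and consolidated[index][1] == value:
                # Unchanged since last month: extend the existing row.
                row_key, row_value, first, _ = consolidated[index]
                consolidated[index] = (row_key, row_value, first, month)
            else:
                # New key or changed value: this is a "difference" row.
                consolidated.append((key, value, month, month))
                open_rows[key] = len(consolidated) - 1
        # A key absent this month ends its run; a later return starts a new row.
        for key in list(open_rows):
            if key not in snapshot:
                del open_rows[key]
    return consolidated


snapshots = [
    ("2008-10", {123: "Retailer", 124: "Distributor"}),
    ("2008-11", {123: "Retailer", 124: "Retailer"}),
    ("2008-12", {123: "Retailer", 124: "Retailer"}),
]
for row in consolidate(snapshots):
    print(row)
```

Six snapshot rows become three consolidated rows here, and any month can still be rebuilt by selecting the rows whose first and last months span it.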
Subsets
   Frequently used data
   At a point in time
   Smaller, quicker
   Easier to use
   Daily, weekly, monthly
Summary data
 Summary layer
 Analysts access the summary layer
 Smaller, easier
 Data Marts
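A summary layer can be sketched as a small aggregation over transaction-level rows. The rows, field names and grouping below are made up for illustration; a real data mart would be defined by the questions analysts ask.

```python
from collections import defaultdict

# Made-up transaction-level rows; amounts are whole cents to avoid float error.
transactions = [
    {"month": "2008-10", "customer_type": "Retailer", "amount_cents": 9000},
    {"month": "2008-10", "customer_type": "Retailer", "amount_cents": 3700},
    {"month": "2008-10", "customer_type": "Distributor", "amount_cents": 6300},
    {"month": "2008-11", "customer_type": "Retailer", "amount_cents": 14613},
]


def summarise(rows):
    """Roll transactions up to one summary row per month and customer type."""
    summary = defaultdict(lambda: {"transactions": 0, "total_cents": 0})
    for row in rows:
        cell = summary[(row["month"], row["customer_type"])]
        cell["transactions"] += 1
        cell["total_cents"] += row["amount_cents"]
    return dict(summary)


summary_layer = summarise(transactions)
for key in sorted(summary_layer):
    print(key, summary_layer[key])
```

The summary is smaller and easier to query, but it cannot answer questions about individual transactions, so the transaction-level data still has to be kept.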
Benefits of data warehouse
Accessible

  Stored online

     Quick and easy to access

        Multiple sources of data

             Updated daily

               Full history – track everything

                  Can do more – freedom to explore

                     Tuned environment

                       One version of the truth
Data management
   Data does not manage itself!
   Difficult, unruly
   Standards, processes
   Roles and responsibilities
   Data warehouse team
   Skills
    ◦ Data warehousing, Data management, Software, Hardware,
      Metadata, Architecture, Analysis, Performance, tuning
   Coordination, communication, marketing
Best practice
 Data warehousing around for years
 Proven architectures, technologies,
  methodologies
 Good infrastructure


                            … but will it last?
Challenges – big data
33% - data growth
contributes to performance
issues “most of the time”

Managing storage may cost
3-10 times cost of
procurement

Average company keeps 20-
40 duplicates of its data
Helping IT and the business to
collaborate in managing data
   It’s not just about BI

  Business and IT must
     work together
Data “lifecycle” management
Decommission = risk
 Old systems: old EDRMS, old case mgmt system
 New systems: new EDRMS, new case mgmt system, data warehouse
 Partial exports
Data lifecycle management
 Data lifecycle management (DLM)
 Managing the flow of data, information
  and associated metadata through
  information systems and
  repositories, from creation and storage
  through to when it can be discarded.
 Recognises that the importance and
  business value of data does not rely on its
  age, or how often it is used.
Why DLM
   Data and information have value for
    ◦ strategic and operational business needs
    ◦ managing risk
    ◦ meeting legislative obligations
 Value of information decays over time
 Some information can be archived, some
  discarded
 Occasionally, sometimes
  unexpectedly, older data may need to be
  accessed again, quickly, completely and
  accurately
DLM Components
Centre: Property, Customer, Tenancy

Create or Modify
 • Standards
 • Formats
 • Retrieval
 Requires:
 • Core process artefacts
 • Connected systems
 • Automated capture
 Includes data validation

Maintain
 • Organise
 • Describe
 • Manage
 Requires:
 • Risk identification
 • Lifecycle policies
 • Metadata schema
 • Business classification linked to business process

Use
 • Access
 • Share
 • Find
 Requires:
 • Single source of truth
 • Disposal Authorities
 • Disposal Planning
 • Tiered Storage

Retain or Dispose
 • Archive
 • Transfer
 • Destroy
 Requires:
 • Disposal Authorities
 • Business requirements
 • Disposal planning
 • Tiered Storage
Conclusion
Create and maintain
Principle 1: Recordkeeping Must be
Planned and Implemented
1. Responsibility assigned CEO down
2. Policy
3. Procedures
4. Responsibilities defined, resourced
5. Recordkeeping programme & monitoring
Principle 2: Full & accurate records of business activity
                        must be made
Requirement                                              Database   Data Warehouse


1. Functions and business activities identified and
   documented                                                       
2. Records of business decisions and transactions
   must be created                                                 
3. All records of business activity captured routinely
   into an organisation-wide recordkeeping                          
   framework



4. Training provided
                                                                    
Principle 3: records must provide authoritative and
  reliable evidence of business activity

Requirement                                               Database   Data Warehouse

   10. Authentic: accurately documented creation,
   receipt, & transmission                                          
   11. Reliability & integrity, maintained unaltered
                                                                    
   12. Useable, retrievable, accessible
                                                                    
   13. Complete, with content & contextual information
                                                                    
   14. Comprehensive, provide authoritative evidence of
   all business activities                                          
Principle 4: records must be managed
  systematically
Requirement                                                  Database   Data Warehouse

   15. Identified & captured in recordkeeping framework
                                                                         
   16. Organised according to a business classification
   scheme                                                                
   17. Reliably maintained over time in recordkeeping
   framework                                                             
   18. Useable, accessible & retrievable for the entire
   period of their retention                                             
   19. Contextual and structural integrity maintained over
   time                                                                  
   20. Retention & disposal actions systematic
                                                                        
RK capability of system(s)
   A system that holds authoritative records
    ◦ Must be capable of recordkeeping, or
    ◦ Made capable, or
    ◦ Must transfer records to a recordkeeping
      system
   Who makes that decision?
    ◦ Should be business owner
    ◦ (with advice from IT)
   Data warehouses show us
    ◦ what can be done
    ◦ how to do it
Developing an Enterprise Information Management Framework

GOVERNANCE
 Authority, management, monitoring and performance of information
 management functions
 • Develop a strategy and roadmap
 • Establish structures and arrangements
 • Define roles and processes arrangements
 • Establish principles
 • Define:
    - Policies
    - Standards
    - Business Rules
 • Assess current and desired maturity
 • Determine metrics and measuring
 • Establish monitoring processes
 • Document legislative framework
 • Understand compliance
 • Determine and optimise business benefits
 • Manage information risk

INFORMATION ASSET ARCHITECTURE
 A blueprint for the semantic and physical integration of enterprise
 information assets, technology and the business
 • Model key information flows
 • Establish IS design principles and standards
 • Develop an inventory of information, systems and processes
 • Identify:
    - Authoritative information
    - High-value information
    - Critical information
 • Plan for disaster recovery
 • Organise information for:
    - Navigation and retrieval
    - Discovery
    - Content types and categorisation

BUSINESS INTELLIGENCE AND DATA WAREHOUSING
 • Store and transform
 • Integrate and deliver
 • Perform analytics and reporting
 • Support decision making

REFERENCE AND MASTER DATA MANAGEMENT
 • Capture, store and re-use core business entities
 • Consolidate and match data
 • Manage and control data quality
 • Distribute core data appropriately

STRUCTURED AND UNSTRUCTURED INFORMATION
 • Develop an information lifecycle strategy and roadmap
 • Enable integration and interoperability
 • Plan and manage:
    - Repositories
    - Storage
    - Format
 • Develop a recordkeeping strategy and roadmap
 • Enable compliant retention and disposal in systems
 • Support access to legacy information
 • Plan for any content migration

METADATA MANAGEMENT
 The connecting foundation for EIM, used to describe, organise,
 integrate, share, and govern enterprise information assets
 • Develop:
    - Metadata Schema
    - Controlled Vocabulary
    - Thesauri
    - Business Function Classification
 • Utilise system generated metadata
 • Map across metadata schemas
 • Establish monitoring and maintenance processes
 • Implement metadata management tools

SECURITY AND CONTROL
 Policies, rules and tools that ensure the proper control, protection
 and privacy of information
 • Establish security policies and rules
 • Model information security and scenarios
 • Build security into system metadata
 • Manage access control
 • Manage classified information
 • Ensure regulatory compliance
 • Establish monitoring and metrics

INFORMATION CULTURE
 The behaviours, values and norms of the enterprise within the context
 of information use
 • Manage and sustain change
 • Provide information leadership
 • Embed EIM in performance management
 • Deliver training and ongoing support
 • Develop toolkits and reference material

INFORMATION STEWARDSHIP
 Oversight of the content, description, quality, and accuracy of
 enterprise information throughout its lifecycle
 • Define responsibility, roles and accountability
 • Establish stewardship processes
 • Establish monitoring and maintenance

Information types: Social, Documents, Emails, Images, Audio, Text,
Mobile, Movies, IT/OT, Search, Transactional Data
Future state of data
   Accurate, relevant, timely delivery of data and
    information
    ◦ Trustworthy information
    ◦ Where it is needed
    ◦ Formats most appropriate to business need and future
   Information found quickly, whether it’s old or new
   Clear guidelines for systems and processes
    ◦ Keep what’s needed for only as long as it’s needed
    ◦ In the right format
   Data has recognisable value and appropriate levels of
    management
    ◦ Business need: we know what’s important, and when it’s
      important
    ◦ Risk: we’re clear about what to manage, and how
    ◦ Regulatory framework: we meet legislative obligations
Our point
 Long term preservation of data requires
  understanding how data is created and
  managed
 We have to work out:
    ◦ What data the business needs to keep
    ◦ What records the business needs
      to create and keep
   And….. how
    ◦ What data must be unchanged
    ◦ What we mean by usable and retrievable
Data
                   in databases
                     “It’s not what you think”
Clare Somerville
Trish O’Kane

Clare Somerville Trish O’Kane Data in Databases

  • 1. Data in databases “It’s not what you think” Clare Somerville Trish O’Kane
  • 2. Our point
      • Long term preservation of data requires understanding how data is created and managed
      • We have to work out:
          ◦ What data the business needs to keep
          ◦ What records the business needs to create and keep
      • And… how
          ◦ What data must be unchanged
          ◦ What we mean by usable and retrievable
  • 3. The problem, as we see it. We will cover:
      • What is a record, and what are its attributes?
      • What is a database, and how is it built and maintained?
      • How can we use data sets to create records?
      • What is a data warehouse, and how is it built and maintained?
      • How can we ensure that useful data sets are available over time?
  • 4. Agenda
      • The problem
      • Definitions
      • Delivering data & records from data
          ◦ Data warehousing
          ◦ Data “lifecycle” management
      • Conclusion
  • 5. The problem
      • Databases have replaced many semi-structured records
          ◦ Register of Births, Deaths and Marriages (and Divorces!)
          ◦ EQC claims data
      • But - we want some of that information available long term in a usable format
      • Records managers are unfamiliar with the world of structured data
          ◦ Disposal outcome in a draft disposal authority: “When database decommissioned, transfer to Archives NZ”
          ◦ Transfer what?
  • 6. Who wants what?
      • What have we got?
          ◦ Data in databases
      • What do we want?
          ◦ Records
      • When do we want them?
          ◦ Now, and for the long term
      • But… what is a record in the context of data?
          ◦ The individual data item?
          ◦ A whole dataset?
  • 7. What have we got
      1. Customers
          ◦ Customers for data
          ◦ Customers for records
      2. Information assets
          ◦ Records
          ◦ Transactional data in databases
          ◦ Datasets
          ◦ Data marts and data warehouses
      3. What do we have to do?
          ◦ Principles from data warehousing
          ◦ Data life cycle management
  • 8. Definitions Records, metadata, data, source systems, database, data warehouse
  • 9. Records
    Recordkeeping definition (Public Records Act 2005)
      • A record or class of records in any form in whole or part, created or received by a public office in the conduct of its affairs
    In structured world
      • A record is a line of data in a table in a database
  • 10. Attributes of a record
    Recordkeeping perspective
      • Documents the carrying out of the organisation’s business objectives, core business functions, services and deliverables, and/or
      • Provides evidence of compliance with any current jurisdictional standards, and/or
      • Documents the value of the resources of the organisation and how risks to the business are managed, and/or
      • Supports the long-term viability of the organisation
    Data management perspective
      • Field types
          ◦ Numeric
          ◦ Character
          ◦ Date/time
      • Composite, derived
      • Values
  • 11. Data and metadata Documents and metadata “Essentially there is a different relationship between data and its metadata than documents and their metadata”
  • 12. Is it data or is it metadata?
    It depends, doesn’t it? It’s about the level at which it is used/applied, e.g. date created
      Customer ID   Date created   Customer name    Customer type
      123           2008-10-20     Bloggs, Joe      Retailer
      124           2008-10-23     Mouse, Minnie    Distributor
      125           2008-10-26     Max, Direct
    [slide label: Metadata]
  • 13. Metadata in the data warehouse
    Business metadata
      • Link between database and users – road map for access
      • Business users
      • Analysts
      • Less technical
    Technical metadata
      • What data, from where, how, when etc
      • Developers
      • Technical users
      • Maintenance and growth
      • On-going development
  • 14. Metadata in the data warehouse
    Business metadata
      • Structure of data
      • Table names
      • Attribute names
      • Location
      • Access
      • Reliability
      • Summarisations
      • Business rules
    Technical metadata
      • Table names
      • Keys
      • Indexes
      • Program names
      • Job dependencies
      • Transformation
      • Execution time
      • Audit, security controls
  • 15. Metadata
      • Data: 10 bytes
      • Metadata: 1 byte
  • 16. Metadata
      • Data
      • Metadata: Heaps!
  • 17. Data – comma delimited
      0349,000,A," ","CHANGE ADD ON MED CERT "," "," "," "," ","S","GASUP"," ",00000,71909,00000,0,71909,10393470,00000.00,00000.00,00000.00,00000.00,00000.00,000000,71937,72266,0,139,600,4,72266,471,360480713,000000000,1,00090.00,00037.00,000031543560 ",00000.00,00000.00,+000000.00,0000000,0000,000,00,000000000,00000,00000,000000.00,000000.00,000000000,009,72266,00000,72268,16414213,000000001,000000000,244,0114340511,04,01,+000000.00,+000000.00,00000,000000,+000000.00,610,0,00146.13,000000.00,000,000,610,0,290763901,290763901,000000000,000000000,000069987378
      0174,000,D,"C","N","N","Y","Y","Y","N","N","3349533755","Y","T REED ","Y","DSWSINVE106 ","BELOQ","Y","NAWEK","TANIA ","REED ","C","N",02651,009,0000,72273,72268,16405202,0114340511,03,72245,0000,003,011434,000000228855
      0174,000,A,"C","N","N","Y","Y","Y","N","N","3349533755","Y","T REED ","Y","DSWSINVE106 ","REED ","BELOQ","Y","NAWEK","TANIA ","C","N",02651,009,0000,72273,72268,16405202,0114340511,04,72245,0000,003,011434,000000228855
      0161,000,D,"A",126,72263,00000.00,600,5,360480713,000007282728
      0161,000,A,"A",126,72263,00000.00,600,5,360480713,000007282728
      0057,000,D," "," "," "," "," ","A"," "," ","AHMEV","VOKOG",000000003,0814409,2500,001,25,00,00000.00,000000,00,132,00000,0,+00063.00,72266,14133031,00000,00000.00,2,+00063.00,01,00000.00,000000,2,0,0,00000.00,607,1,471,362400470,000000000,000409413299
      0057,000,A," ","MANOP "," "," "," ","A"," "," ","AHMEV","VOKOG",000000003,0814409,2500,001,25,00,00000.00,000000,00,132,72269,0,+00063.00,72266,14133031,00000,00000.00,2,+00063.00,01,00000.00,000000,2,0,0,00000.00,607,1,471,362400470,000000000,000409413299
      0270,000,A," "," "," ","N","N","G",128,72266,72268,16414261,01,00000.000,00000.000,0,139,00000.00,00000.00,00000.00,00000.00,00000.00,00000.00,00000.00,00000.00,00000.00,600,5,471,000,000,000,000,000,000,000,000,0001,360480713,00000.00,000602537445
      0062,000,A,"YYYYYYYYYY ","AUTH 01532600063000000000131197N014101 0000000 ","VA","SATRA","DSWSAUCK119 "," ","MANOP"," ",003,132,72268,16414266,0000000000,0,607,362400470,000000000,000084800530
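The point of this slide, that raw comma-delimited values cannot be read without more information, can be illustrated with a minimal Python sketch. It takes one short record from the extract and parses it twice: once with no metadata, and once with a schema. The schema is entirely hypothetical: the field names, types and the meaning of the "A"/"D" codes are assumptions made for illustration, because the slides do not document the real layout.

```python
import csv
import io

# One short record copied from the comma-delimited extract on the slide.
RAW = '0161,000,A,"A",126,72263,00000.00,600,5,360480713,000007282728\n'

# HYPOTHETICAL schema: names and types are invented for illustration only.
# The real meaning of each position is exactly the metadata we are missing.
SCHEMA = [
    ("record_type", str),
    ("sequence", str),
    ("action", str),        # assumption: might distinguish add/delete rows
    ("status", str),
    ("code", int),
    ("day_number", int),
    ("amount", float),
    ("office", int),
    ("flag", int),
    ("customer_ref", str),
    ("row_id", str),
]


def parse_without_schema(line):
    """Split the line into values; every value is just an anonymous string."""
    return next(csv.reader(io.StringIO(line)))


def parse_with_schema(line, schema):
    """Attach names and types to the values using the supplied metadata."""
    values = parse_without_schema(line)
    if len(values) != len(schema):
        raise ValueError("record does not match the schema length")
    return {name: cast(value) for (name, cast), value in zip(schema, values)}


anonymous = parse_without_schema(RAW)
described = parse_with_schema(RAW, SCHEMA)
```

Without the schema the result is a list of eleven strings; with it, each value has a name and a type. Whether those names are right is a question only the source system's documentation can answer, which is why the metadata has to travel with the data.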
  • 18. Data – in a table
  • 20. 3 layers (Database)
      • User interface
      • Rules and algorithms
      • Data
  • 21. Application layer
      • Provides views, creates reports
      • Turns data into information
      • Adds, overwrites, deletes data
      • Runs rules and processes
    Data layer
      • Data in tables
      • Acted on by application layer
  • 22. Can data fit the PRA definition?
      • We are “format neutral” in the management of records, so…
      • Data can be records!
          – Births Deaths and Marriages Register
          – EQC claims data
      • Test questions
          – If we exclude data what have we lost?
          – What is the impact of losing data?
              ▪ On the business
              ▪ For the future
  • 23. Source solution is not a recordkeeping system [diagram labels: Application layer, Data layer]
    The Solution System is not a recordkeeping system because it…
      • Holds transactional data, not evidence of transactions in context (records)
      • Isn’t tamper proof
          – Difficult to know exactly what the application layer is doing
          – Different tables and rows may be managed differently
          – Hard to roll back to a point in time
      • Must overwrite ‘redundant’ data to run efficiently
          – Compromise of history vs speed
          – Business use is the priority
      • The data layer is not usable without the application layer
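The "overwrite versus history" trade-off on this slide can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not a description of any system in the presentation: the `customer` and `customer_history` tables, the column names and the sample addresses are all hypothetical.

```python
import sqlite3

# HYPOTHETICAL schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE customer_history (id INTEGER, address TEXT, valid_from TEXT);
    """
)


def change_address(conn, customer_id, address, changed_on):
    # Source-system behaviour: the current row is overwritten in place,
    # so the previous value is gone from this table.
    conn.execute(
        "INSERT OR REPLACE INTO customer (id, address) VALUES (?, ?)",
        (customer_id, address),
    )
    # History-keeping behaviour: each version is appended, never updated.
    conn.execute(
        "INSERT INTO customer_history (id, address, valid_from) VALUES (?, ?, ?)",
        (customer_id, address, changed_on),
    )


def address_as_at(conn, customer_id, as_at):
    """Return the address that applied on the given ISO date, if any."""
    row = conn.execute(
        "SELECT address FROM customer_history "
        "WHERE id = ? AND valid_from <= ? "
        "ORDER BY valid_from DESC LIMIT 1",
        (customer_id, as_at),
    ).fetchone()
    return row[0] if row else None


change_address(conn, 123, "1 Old Street", "2008-10-20")
change_address(conn, 123, "2 New Road", "2009-03-01")

current_rows = conn.execute(
    "SELECT address FROM customer WHERE id = 123"
).fetchall()
history_count = conn.execute(
    "SELECT COUNT(*) FROM customer_history WHERE id = 123"
).fetchone()[0]
```

The `customer` table can only answer "what is the address now?", while the append-only table can also answer "what was it on a given date?". The cost is extra rows, which is the history-versus-speed compromise the slide describes.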
  • 24. Inside a database
      • Here today - gone tomorrow
      • Transaction metadata
          ◦ Example: An activity about a customer is a record
              ▪ Is there a Unique ID
                  – For the transaction?
                  – For the customer?
              ▪ Where and when are/were components located?
                  – Multiple data tables in one database
                  – Multiple data tables across multiple databases
      • Table names and column names
      • Standard names for elements across tables
  • 25. Source / business databases
      • Data stored in tables
      • Normalised structure
      • Lots of data
      • Large number of users
      • Lots of very quick transactions
      • Varying history retained
      • Mostly data is overwritten
  • 27. Data warehouse
      • Storing and accessing large amounts of data
      • Central repository for all or significant parts of the data that an enterprise’s various business systems collect
  • 28. Data warehouse [diagram]
      • Multiple source systems
      • Designed for reporting and analysis
      • Historical data
      • Lots of data!
      • Large queries
      • Transaction level data
      • Multiple table joins
      • Corporate effort
      • Centrally owned
      • Unpredictable use
      • Corporate needs
      • Pressure on resources
  • 29. What is the simplest/most robust approach to deliver data and records from databases?
  • 31. 1 Create policy to document:
      • What authoritative records must be retained and what metadata must be retained
      • What formats are acceptable
      • Which (if any) records and metadata are considered transient artefacts, and why (e.g. format shifting duplicates, quality checking etc)
      • Get approval for destruction of transient artefacts as part of the normal functioning of the systems that dispose of them
  • 32. Approach: create and export records from solution system
      1. Identify what data tables/records are needed and that can be produced
      2. Map identified records to disposal authorities
          ◦ Which records must be kept beyond system decommission
          ◦ Identify the business need for retention
      3. Use the application layer to create and export those records in a suitable format
      4. Store in recordkeeping system e.g. data warehouse or EDRMS
      5. Retain records needed for the business post-decommission
  • 33. 2 Persistently associate metadata
      • Appropriate metadata associated and retained with authoritative records
          ◦ Identify data linkages between systems
          ◦ Retain those linkages or
          ◦ Consolidate metadata and associated record objects into one system, and ensure they are persistently associated
      • Ensure migrated data/metadata/objects retain their context (e.g. date created, author etc)
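One way to picture "export records in a suitable format" with metadata persistently associated is to write content and context into a single object. The Python sketch below does that and adds a checksum so later changes can be detected. The envelope layout, the field names and the `customer-mgmt` system name are hypothetical choices for illustration, not a format the presenters prescribe.

```python
import hashlib
import json


def _digest(rows):
    """SHA-256 over a canonical JSON form of the exported rows."""
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def export_record(rows, source_system, table, date_created, author):
    """Bundle the content with its context so the two cannot drift apart."""
    return {
        "metadata": {
            "source_system": source_system,   # hypothetical field names
            "table": table,
            "date_created": date_created,
            "author": author,
            "row_count": len(rows),
            "sha256": _digest(rows),
        },
        "content": rows,
    }


def is_intact(record):
    """Check that the content still matches the metadata stored with it."""
    meta = record["metadata"]
    return (
        meta["row_count"] == len(record["content"])
        and meta["sha256"] == _digest(record["content"])
    )


rows = [
    {"customer_id": 123, "date_created": "2008-10-20", "customer_name": "Bloggs, Joe"},
    {"customer_id": 124, "date_created": "2008-10-23", "customer_name": "Mouse, Minnie"},
]
record = export_record(
    rows,
    source_system="customer-mgmt",
    table="customer",
    date_created="2008-10-26",
    author="export-job",
)
serialised = json.dumps(record, indent=2)  # what would be sent to the EDRMS
```

Because the metadata sits in the same object as the rows, a migration that moves the file moves the context with it, and `is_intact` gives a simple check that the content was not altered along the way.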
  • 34. Future state: BAU transfers to recordkeeping systems [diagram]
      • Systems shown: Customer mgmt system, Case mgmt system, EDRMS
      • Structured data to data warehouse
      • Create key records and send to EDRMS
  • 35. Data warehouses as an example of good practice
  • 37. Data feeds - principles
      • Direct data feeds from source systems
      • Not changed in any way
      • No intervening processes
      • All changes to the data
      • Fully auditable
      • Reconcile to source system
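"Reconcile to source system" can be made concrete with a small check that compares what left the source with what arrived in the warehouse. The Python sketch below compares row counts and an order-independent fingerprint. It is one possible approach under assumptions; the function names and sample rows are hypothetical, and real reconciliations often also compare control totals.

```python
import hashlib


def fingerprint(rows):
    """Return (row count, combined hash) independent of row order."""
    digests = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()
        for row in rows
    )
    combined = hashlib.sha256("".join(digests).encode("ascii")).hexdigest()
    return len(digests), combined


def reconcile(source_rows, warehouse_rows):
    """Compare a feed as extracted with the same feed as loaded."""
    source = fingerprint(source_rows)
    warehouse = fingerprint(warehouse_rows)
    return {
        "source_count": source[0],
        "warehouse_count": warehouse[0],
        "matches": source == warehouse,
    }


source = [
    (123, "2008-10-20", "Bloggs, Joe"),
    (124, "2008-10-23", "Mouse, Minnie"),
]
loaded_ok = list(reversed(source))                  # same rows, different order
loaded_changed = [source[0], (124, "2008-10-23", "Mouse, M")]

ok_result = reconcile(source, loaded_ok)
changed_result = reconcile(source, loaded_changed)
```

A feed that is "not changed in any way" should always reconcile. A mismatch with equal counts, as in `changed_result`, points to an intervening process that altered values rather than dropped rows.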
  • 38. For Example: one table…
      Before                   After
      29 months data           29 months data
      162 tapes                4 physical files
      400 million records      27 million records
      88 GB                    6 GB
    [diagram: Month1, Month2, Month3 … Monthn → Compare → Differences1, Differences2 … → Consolidated file]
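The diagram on this slide (compare each month with the previous one, keep only the differences, consolidate them into one file) can be sketched in a few lines of Python. This is a simplified illustration, not the presenters' actual process: the action codes "A", "C" and "D" (add, change, delete), the snapshot layout and the sample rows are all assumptions.

```python
def diff_snapshots(previous, current, month):
    """Return the changes needed to turn `previous` into `current`."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append((month, "A", key, row))      # added this month
        elif previous[key] != row:
            changes.append((month, "C", key, row))      # changed this month
    for key in previous:
        if key not in current:
            changes.append((month, "D", key, None))     # deleted this month
    return changes


def consolidate(snapshots):
    """snapshots: list of (month, {key: row}) in chronological order."""
    consolidated = []
    previous = {}
    for month, current in snapshots:
        consolidated.extend(diff_snapshots(previous, current, month))
        previous = current
    return consolidated


def rebuild(consolidated, as_at_month):
    """Replay the consolidated file to recover the table as at a month."""
    state = {}
    for month, action, key, row in consolidated:
        if month > as_at_month:
            break
        if action == "D":
            state.pop(key, None)
        else:
            state[key] = row
    return state


month1 = {123: {"type": "Retailer"}, 124: {"type": "Distributor"}}
month2 = {123: {"type": "Retailer"}, 124: {"type": "Distributor"}}
month3 = {123: {"type": "Wholesaler"}, 125: {"type": "Direct"}}
snapshots = [("2008-10", month1), ("2008-11", month2), ("2008-12", month3)]

consolidated = consolidate(snapshots)
```

Three full snapshots hold six rows between them, while the consolidated file holds five change rows, and an unchanged month contributes none. On a large, slowly changing table that is where reductions like those on the slide come from, and `rebuild` shows that any month can still be recovered.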
  • 39. Subsets
      • Frequently used data
      • At a point in time
      • Smaller, quicker
      • Easier to use
      • Daily, weekly, monthly
  • 40. Summary data
      • Summary layer
      • Analysts access the summary layer
      • Smaller, easier
      • Data Marts
  • 41. Benefits of data warehouse
      • Accessible
      • Stored online
      • Quick and easy to access
      • Multiple sources of data
      • Updated daily
      • Full history – track everything
      • Can do more – freedom to explore
      • Tuned environment
      • One version of the truth
  • 42. Data management
      • Data does not manage itself!
      • Difficult, unruly
      • Standards, processes
      • Roles and responsibilities
      • Data warehouse team
      • Skills
          ◦ Data warehousing, Data management, Software, Hardware, Metadata, Architecture, Analysis, Performance, tuning
      • Coordination, communication, marketing
  • 43. Best practice
      • Data warehousing around for years
      • Proven architectures, technologies, methodologies
      • Good infrastructure
      • … but will it last?
  • 44.
  • 45. Challenges – big data
      • 33% - data growth contributes to performance issues “most of the time”
      • Managing storage may cost 3-10 times cost of procurement
      • Average company keeps 20-40 duplicates of its data
  • 46. Helping IT and the business to collaborate in managing data
      • It’s not just about BI
      • Business and IT must work together
  • 48. Decommission = risk [diagram]
      • Old case mgmt system, Old EDRMS
      • New case mgmt system, New EDRMS, Data warehouse
      • Partial exports
  • 49. Data lifecycle management
      • Data lifecycle management (DLM)
      • Managing the flow of data, information and associated metadata through information systems and repositories, from creation and storage through to when it can be discarded.
      • Recognises that the importance and business value of data does not rely on its age, or how often it is used.
  • 50. Why DLM
      • Data and information have value for
          ◦ strategic and operational business needs
          ◦ managing risk
          ◦ meeting legislative obligations
      • Value of information decays over time
      • Some information can be archived, some discarded
      • Occasionally, sometimes unexpectedly, older data may need to be accessed again, quickly, completely and accurately
  • 51. DLM Components [cycle diagram]
      • Create or Modify (includes data validation)
      • Maintain: Organise, Describe, Manage
      • Retain or Dispose: Archive, Transfer, Destroy
      • Use: Access, Share, Find
      • Also labelled: Property, Customer, Tenancy
      • Listed under “Requires”: Standards; Formats; Core process artefacts; Retrieval; Connected systems; Automated capture; Disposal Authorities; Risk identification; Business requirements; Lifecycle policies; Disposal planning; Metadata schema; Tiered Storage; Business classification linked to business process; Single source of truth
  • 53. Create and maintain
    Principle 1: Recordkeeping Must be Planned and Implemented
      1. Responsibility assigned CEO down
      2. Policy
      3. Procedures
      4. Responsibilities defined, resourced
      5. Recordkeeping programme & monitoring
  • 54. Principle 2: Full & accurate records of business activity must be made Requirement Data Data Warehouse base 1. Functions and business activities identified and documented   2. Records of business decisions and transactions must be created   3. All records of business activity captured routinely into an organisation-wide recordkeeping   framework 4. Training provided  
  • 55. Principle 3: records must provide authoritative and reliable evidence of business activity Requirement Data Data base Warehouse 10. Authentic: accurately documented creation, receipt, & transmission   11. Reliability & integrity, maintained unaltered   12. Useable, retrievable, accessible   13. Complete, with content & contextual information   14. Comprehensive, provide authoritative evidence of all business activities  
  • 56. Principle 4: records must be managed systematically Requirement Data Data Warehouse base 15. Identified & captured in recordkeeping framework   16. Organised according to a business classification scheme   17. Reliably maintained over time in recordkeeping framework   18. Useable, accessible & retrievable for the entire period of their retention   19. Contextual and structural integrity maintained over time   20. Retention & disposal actions systematic  
  • 57. RK capability of system(s)
      • A system that holds authoritative records
          ◦ Must be capable of recordkeeping, or
          ◦ Made capable, or
          ◦ Must transfer records to a recordkeeping system
      • Who makes that decision?
          ◦ Should be business owner
          ◦ (with advice from IT)
      • Data warehouses show us
          ◦ what can be done
          ◦ how to do it
  • 58. Developing an Enterprise Information Management Framework [framework diagram]
      • Governance: authority, management, monitoring and performance of information management functions
      • Information asset architecture: a blueprint for the semantic and physical integration of enterprise information assets, technology and the business processes
      • Business intelligence and data warehousing
      • Reference and master data management
      • Structured and unstructured information
      • Metadata management: the connecting foundation for EIM, used to describe, organise, integrate, share, and govern enterprise information assets
      • Security and control: policies, rules and tools that ensure the proper control, protection and privacy of information
      • Information culture: the behaviours, values and norms of the enterprise within the context of information use
      • Information stewardship: oversight of the content, description, quality, and accuracy of enterprise information throughout its lifecycle
      • Activities listed include: develop an information lifecycle strategy and roadmap; develop a recordkeeping strategy and roadmap; manage information risk; plan for disaster recovery; enable integration and interoperability; manage and control data quality; perform analytics and reporting; implement metadata management tools; establish security policies; manage access control; ensure regulatory compliance
      • Information types shown: Social, Emails, Audio, Mobile, IT/OT, Transactional Data, Documents, Images, Text, Movies, Search
  • 59. Future state of data
      • Accurate, relevant, timely delivery of data and information
          ◦ Trustworthy information
          ◦ Where it is needed
          ◦ Formats most appropriate to business need and future
      • Information found quickly, whether it’s old or new
      • Clear guidelines for systems and processes
          ◦ Keep what’s needed for only as long as it’s needed
          ◦ In the right format
      • Data has recognisable value and appropriate levels of management
          ◦ Business need: we know what’s important, and when it’s important
          ◦ Risk: we’re clear about what to manage, and how
          ◦ Regulatory framework: we meet legislative obligations
  • 60. Our point
      • Long term preservation of data requires understanding how data is created and managed
      • We have to work out:
          ◦ What data the business needs to keep
          ◦ What records the business needs to create and keep
      • And… how
          ◦ What data must be unchanged
          ◦ What we mean by usable and retrievable
  • 61. Data in databases “It’s not what you think” Clare Somerville Trish O’Kane

Editor's notes

  1. Adam Brown, Statistics NZ: Generally my feedback on ISO 15489 would be: why can't/shouldn't it be applied to data? At the end of the day data is just another record so it really shouldn't be an issue. Having said that I'm not sure that it would particularly add any value to it either. One of the main issues is how to define a record, in terms of data. Is it the individual data item or is it a whole dataset? This is certainly the most tricky issue because you generally maintain metadata at the dataset level but potentially slice and dice at lower levels. The other key addition to it for data would have to be a greater focus on usability (7.2.5). As we know, with data this isn't a given to the same extent as it can be for a document. Significantly more information is required to be able to do anything with it - essentially there is a different relationship between data and its metadata than documents and their metadata. In summary, the principles fit for applying it to data (and should be applied!) but as it is it wouldn't add much value.
  2. Adam Brown – Stats NZ
  3. CS: We used to believe that there was 1 byte of metadata for every 10 bytes of data.
  4. CS: But those numbers are changing with metadata now exceeding data.
  5. Data is a set of values, in this case in a comma delimited file. I need more information in order to know how to read this.
  6. From FMG
  7. Volume, velocity, variety, complexity! Things like smart grids are causing significant rises in data volume. Velocity – the speed at which data is produced, received and processed. Variety – structured databases, emails, metering, video, image, financial transactions etc. Much is unstructured – content analytics, taxonomy, ontology. Non-traditional BI tools.
  8. We’ve built skills in DI, DQ in BI and analytics. We have core data management skills to support BI. Other data requires the same DM practices.
  9. This slide provides some background on what data lifecycle management is, and generic reasons as to why it is important.
  10. This slide provides a brief high-level overview of the future state of DLM at HNZC.
      Accurate, relevant, timely delivery implies: data will be managed through its lifecycle in a way that ensures there is a single source of truth that meets the needs of users and can be accessed in a timely manner. “Everyone who needs it” will include mobile workers, which is also why format is an important consideration. This is just a take on “the right information to the right person at the right time (and in the right format)”.
      Finding old or new information quickly implies: information is described by metadata so that it has context and meaning, and systems use that metadata to locate relevant content and deliver information to users.
      Clear guidelines implies: principles will be agreed on, which will decompose into business rules for systems. For example: that X information should be kept for Y years, then disposed of by following Z procedures.
      Appropriate levels of management implies: the value of data is understood within the context of business need, risk and legislation, and that processes are in place to manage how business need is determined, and how risk and legislative obligations are managed.
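The "X information should be kept for Y years, then disposed of by following Z procedures" example in note 10 is the kind of principle that decomposes into a business rule a system can apply. A minimal Python sketch of such a rule follows. The class name, the seven-year period, the "tenancy agreement" class and the disposal procedure text are hypothetical values chosen for illustration; actual retention periods come from the approved disposal authority.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionRule:
    information_class: str    # the "X" in the note
    keep_years: int           # the "Y"
    disposal_procedure: str   # the "Z"


def disposal_due(rule, trigger_date):
    """Date on which the retention period ends."""
    target_year = trigger_date.year + rule.keep_years
    try:
        return trigger_date.replace(year=target_year)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return trigger_date.replace(year=target_year, day=28)


def next_action(rule, trigger_date, today):
    """Return 'retain' until the period ends, then the disposal procedure."""
    if today < disposal_due(rule, trigger_date):
        return "retain"
    return rule.disposal_procedure


# HYPOTHETICAL rule, for illustration only.
rule = RetentionRule("tenancy agreement", 7, "transfer to archive")
closed_on = date(2010, 3, 1)
```

For example, `next_action(rule, closed_on, date(2016, 12, 31))` gives "retain", and from 1 March 2017 onward the same call returns the disposal procedure. Keeping the rule as data rather than code means the business owner can review it without reading the system.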