Data-Ed Online: Let's Talk Metadata: Strategies and Successes

TITLE
Welcome!

Let’s Talk Metadata:
Strategies and Successes

Date: September 11, 2012
Time: 2:00 PM ET
Presented by: Dr. Peter Aiken

PRODUCED BY CLASSIFICATION DATE SLIDE
DATA BLUEPRINT 10124-C W. BROAD ST, GLEN ALLEN, VA 23060 EDUCATION 8/14/2012 1
09/14/12 © Copyright this and previous years by Data Blueprint - all rights reserved!

TITLE
Commonly Asked Questions
1) Will I get copies of the slides after the event?

YES*

2) Is this being recorded so I can view it afterwards?

YES*


TITLE
Get Social With Us!

Live Twitter Feed
Join the conversation! Follow us: @datablueprint, @paiken
Ask questions and submit your comments: #dataed

Like Us on Facebook
www.facebook.com/datablueprint
Post questions and comments. Find industry news, insightful content and event updates.

Join the Group
Data Management & Business Intelligence
Ask questions, gain insights and collaborate with fellow data management professionals.


TITLE
Meet Your Presenter: Dr. Peter Aiken
• Internationally recognized thought-leader in
the data management field with more than
30 years of experience
• Recipient of the 2010 International Stevens
Award
• Founding Director of Data Blueprint
(http://datablueprint.com)
• Associate Professor of Information Systems
at Virginia Commonwealth University
(http://vcu.edu)
• President of DAMA International (http://dama.org)
• DoD Computer Scientist, Reverse Engineering Program Manager/
Office of the Chief Information Officer
• Visiting Scientist, Software Engineering Institute/Carnegie Mellon
University
• 7 books and dozens of articles
• Experienced w/ 500+ data management practices in 20 countries
#dataed

TITLE

Abstract: Metadata Practices
This presentation describes how data
management can be enhanced using meta-processing.
Commonly described as metadata
management, properly implemented metadata
practices incorporate data structures into more
abstract processing. By using data about the
data to enhance its value, understandability,
and ease of use, among other qualities,
organizations have developed sophisticated
ways to enhance their data management and
especially their data quality engineering efforts.


TITLE
Outline

1. Data Management Overview
2. What is metadata and why is it
important?
3. Types of metadata
4. Metadata for unstructured data
5. Strategy and implementation
6. Guiding Principles
7. Take Aways, References and Q&A

Tweeting now: #dataed


TITLE
The DAMA Guide to the Data Management Body of Knowledge
Published by DAMA International
• The professional association for Data Managers (40 chapters worldwide)
DMBoK organized around
• Primary data management functions focused around data delivery to the organization
• Organized around several environmental elements

Figure: Data Management Functions

TITLE
The DAMA Guide to the Data Management Body of Knowledge

Amazon: http://www.amazon.com
Or enter the terms "dama dm bok" at the Amazon search engine

Figure: Environmental Elements

TITLE

Data Management
• Data Program Coordination: Manage data coherently.
• Organizational Data Integration: Share data across boundaries.
• Data Stewardship: Assign responsibilities for data.
• Data Development: Engineer data delivery systems.
• Data Support Operations: Maintain data availability.

TITLE

Data Management


from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
TITLE

Metadata Management

1/26/2010

TITLE

Meta-data or metadata
• In the history of language, whenever two words
are pasted together to form a combined concept,
a hyphen initially links them.
• With the passage of time,
the hyphen is lost. The
argument can be made
that that time has passed.
• There is a copyright on
the term "metadata," but
it has not been enforced.
• So, the term is "metadata"


TITLE

Definitions
Metadata is …
•… everywhere in every data management activity and integral
to all IT systems and applications.
•… to data what data is to real life. Data reflects real life transactions, events,
objects, relationships, etc. Metadata reflects data transactions, events, objects,
relations, etc.
•… the data that describe the structure and workings of an
organization’s use of information, and which describe the
systems it uses to manage that information.
[quote from David Hay's new book, page 4]
•Data describing various facets of a data asset, for the purpose of improving its
usability throughout its life cycle [Gartner 2010]
•Metadata unlocks the value of data, and therefore requires management
attention [Gartner 2010]
Metadata Management is …
•… the set of processes that ensure proper creation, storage, integration, and
control to support associated use of metadata

TITLE

Analogy: Card catalog in a library
• Card catalog identifies what books
are stored in the library and where
they are located in the building
• Users can search for books by
subject area, author, or title
• Catalog shows author, subject tags, publication date and
revision history of each book
• Card catalog information helps determine which books
will meet the reader’s needs
• Without this catalog resource, finding books in the library
would be difficult, time consuming and frustrating
• Readers may search many incorrect books before
finding the right book if a catalog does not exist
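
The card catalog analogy can be sketched in a few lines of Python. This is only an illustration: the `CatalogCard` fields, the sample titles, authors, shelf codes, and the `search` helper are all made up for the example and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogCard:
    """One catalog card: metadata about a book, not the book itself."""
    title: str
    author: str
    subject: str
    shelf: str  # where the book is located in the building


# Hypothetical sample catalog (titles, authors and shelves are invented).
CATALOG = [
    CatalogCard("Intro to Data Modeling", "A. Author", "data modeling", "A-12"),
    CatalogCard("Warehouse Basics", "B. Writer", "data warehousing", "B-03"),
    CatalogCard("Warehouse Design", "C. Scribe", "data warehousing", "B-04"),
]


def search(catalog, **criteria):
    """Return cards whose fields match every criterion (case-insensitive)."""
    return [
        card for card in catalog
        if all(getattr(card, field).lower() == value.lower()
               for field, value in criteria.items())
    ]


# Search by subject area, then read the shelf location from each card.
hits = search(CATALOG, subject="data warehousing")
shelves = [card.shelf for card in hits]
```

The reader never opens a book to answer the question; a small amount of metadata is enough to locate the right shelf.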

Definition, cont’d
TITLE

• Metadata is the card catalog in a
managed data environment
• Abstractly, Metadata is the descriptive
tags or context on the data (the content)
in a managed data environment
• Metadata shows business and technical
users where to find information in data
repositories
• Metadata provides details on where the
data came from, how it got there, any
transformations, and its level of quality
• Metadata provides assistance with what
the data really means and how to
interpret it
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

TITLE

Defining Metadata

Metadata is any combination of any circle and the data in the center of the spark!

Figure: "Data" at the center, surrounded by Who, What, How, Where, Why, When
Adapted from Brad Melton


Library Metadata Example
TITLE

Libraries can operate efficiently through careful use of metadata (Card Catalog)

Who: Author
What: Title
Where: Shelf Location
When: Publication Date

Manage a large amount of data (the Library) with a small amount of metadata (Card Catalog)

Figure: "Library Book" data at the center, surrounded by Who, What, How, Why, Where, When

TITLE

Outlook Example

"Outlook" metadata is used to navigate and manage email

Imagine how managing e-mail (already non-trivial) would change if Outlook did not make use of metadata

Figure: "Messages" data at the center, surrounded by Who, What, How, Where, Why, When


TITLE

Outlook Example, cont’d

Who: "To" & "From"
What: "Subject"
How: "Priority"
Where: "USERID/Inbox", "USERID/Personal Folders"
Why: "Body"
When: "Sent" & "Received"

•Find the important stuff/weed out junk
•Organize for future access/outlook rules
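
The mapping from message fields to the Who/What/How questions can be sketched with Python's standard `email` package. This is a minimal sketch, not how Outlook works internally: the sample messages, the `make_message` and `describe` helpers, and the use of an `X-Priority` header to stand in for "Priority" are assumptions made for illustration.

```python
from email.message import EmailMessage


def make_message(sender, recipient, subject, priority, body):
    """Build a small in-memory message (sample data for the sketch)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg["X-Priority"] = priority  # assumed header standing in for "Priority"
    msg.set_content(body)         # the body itself is the data, not metadata
    return msg


def describe(msg):
    """Map message headers onto the Who/What/How questions."""
    return {
        "who": (str(msg["From"]), str(msg["To"])),
        "what": str(msg["Subject"]),
        "how": str(msg["X-Priority"]),
    }


inbox = [
    make_message("boss@example.com", "me@example.com",
                 "Budget review", "1", "See attached."),
    make_message("promo@example.com", "me@example.com",
                 "Big sale!!!", "5", "Buy now."),
]

# A rule in the spirit of "find the important stuff/weed out junk":
# keep only messages whose priority metadata is "1".
important = [describe(m)["what"] for m in inbox if describe(m)["how"] == "1"]
```

The rule never reads a message body; it works entirely from the metadata.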


TITLE

Metadata practices connect data sources and uses in an organized and efficient manner

Figure: Metadata Practices (Metadata Engineering, Metadata Storage, Metadata Delivery) sit between Sources and Uses, under Metadata Governance

• What is the structure of metadata practices?
– Storage: repository, glossary, models, lineage; currently multiple technologies are used
– Engineering: identifying/harvesting/normalizing/administering evolving metadata structures
– Delivery: supply/access/portal/definition/lookup/search; identify and ensure required metadata supplies to meet business needs
– Governance: ensure proper creation/storage/integration/control to support effective use
• When executed, engineering and delivery implement governance

TITLE

Metadata Practices will be inextricably intertwined with Data Quality and Master Data and Knowledge Management (among other EIM Functions)

Figure: Knowledge Management Practices, Master Data Management Practices, and Data Quality Engineering Practices, connected by flows labeled Extraction Sources, Organized Knowledge 'Data', Data Organization Practices, Routine Data Scans, Master Data Catalogs, Data that might benefit from Master Management, Suspected/Identified Data Quality Problems, Improved Quality Data, and Operational Data

TITLE

Metadata History 1990-2008
The history of Metadata management tools and products
seems to be a metaphor for the lack of a methodological
approach to enterprise information management:

• Lack of standards and proprietary nature of most managed
Metadata solutions cause many organizations to avoid
focusing on metadata
• This limits organizations’ ability to develop a true enterprise
information management environment
• Increased attention given to information and its importance to
an organization’s operations and decision-making will drive
Metadata management products and solutions to become
more standardized
• More recognition of the need for a methodological approach
to managing information and metadata

TITLE

Metadata History: The 1990s
• Business managers began to recognize the
value of Metadata repositories
• Newer tools expanded the scope
• Potential benefits identified during this period
include:
– Providing semantic layer between company’s system
and business users
– Reducing training costs
– Making strategic information more valuable as aid in
decision making
– Creating actionable information
– Limiting incorrect decisions

TITLE

Metadata History: Mid- to late 1990s
• Metadata becomes more relevant to corporations that were
struggling to understand their information resources, driven by:
– Y2K deadline
– Emerging data warehousing initiatives
– Growing focus around the World Wide Web
• Beginning of efforts to try to standardize Metadata definition and
exchange between applications in the enterprise
• Examples of standardization:
– 1995: CASE Data Interchange Format (CDIF)
– 1995: Dublin Core Metadata Elements
– 1994 – 1999: First parts of ISO 11179 standard for Specification and
Standardization of Data Elements were published
– 1998: Common Warehouse Metamodel (CWM)
– 1995: Metadata Coalition’s (MDC) Open Information Model
– 2000: Both standards merged into CWM. Many Metadata repositories
began promising adoption of CWM standard

TITLE

Metadata History: 21st Century
• Update of existing Metadata repositories for deployment on the web
• Introduction of products to support CWM
• Vendors begin focusing on Metadata as an additional product
offering
• Few organizations purchase or develop Metadata repositories
• Effective enterprise-wide Managed Metadata Environments are rare
due to:
– Scarcity of people with real world skills
– Difficulty of the effort
– Less than stellar success of some of the initial efforts at some
companies
– Stagnation of the tool market after the initial burst of interest in late 90s
– Still less than universal understanding of the business benefits
– Too heavy emphasis on legacy applications and technical metadata

TITLE
Polling Question #1
What have been the driving factors in focusing on
metadata within the last decade?

a. Recent entry of smaller vendors into the market
b. Challenges related to addressing regulatory requirements
c. Declination to the existing Metadata standards


TITLE

Metadata History: Current Decade
• Focus on need for and importance of metadata
• Focus on how to incorporate Metadata beyond traditional
structured sources and include unstructured sources
• Driving factors:
– Recent entry of larger vendors into the market
– Challenges related to addressing regulatory requirements, e.g.
Sarbanes-Oxley, and privacy requirements with unsophisticated tools
– Emergence of enterprise-wide initiatives, e.g. information governance,
compliance, enterprise architecture, automated software reuse
– Improvements to the existing Metadata standards, e.g. RFP release of
new OMG standard Information Management Metamodel (IMM), which
will replace CWM
– Recognition at the highest levels that information is an asset that must
be actively and effectively managed


TITLE

Types of Metadata: Process Metadata
• Process Metadata is...
– Data that defines and describes the
characteristics of other system elements, e.g.
processes, business rules, programs, jobs, tools,
etc.
• Examples of Process metadata:
– Data stores and data involved
– Government/regulatory bodies
– Organization owners and stakeholders
– Process dependencies and decomposition
– Process feedback loop and documentation
– Process name

TITLE

Business Process Metadata

Who: Created the documentation?
What: Are the important dependencies among the processes?
How: Do the

Figure: "Data" at the center, surrounded by Who, What, How, Why, Where, When

TITLE

Types of Metadata: Business Metadata
• Business Metadata describe
to the end user what data are
available, what they mean and
how to retrieve them.
• Included are:
– Business names and definitions of subject and
concept areas, entities, attributes
– Attribute data types and other attribute properties
– Range descriptions, calculations, algorithms and
business rules
– Valid domain values and their definitions
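
One way to picture business metadata at work is a record that carries an attribute's business name, definition, data type, and valid domain values, and is then used to check data. The sketch below is illustrative only: the `AttributeMetadata` class, its `check` method, and the "Order Status" example are hypothetical and not taken from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeMetadata:
    """Business metadata for one attribute (hypothetical structure)."""
    business_name: str
    definition: str
    data_type: type
    valid_values: dict = field(default_factory=dict)  # code -> definition

    def check(self, value):
        """Return a list of problems found for one data value."""
        problems = []
        if not isinstance(value, self.data_type):
            problems.append(f"expected {self.data_type.__name__}")
        elif self.valid_values and value not in self.valid_values:
            problems.append(f"{value!r} is not a valid domain value")
        return problems


# Invented example attribute with its valid domain values and definitions.
order_status = AttributeMetadata(
    business_name="Order Status",
    definition="Current state of a customer order",
    data_type=str,
    valid_values={"O": "Open", "S": "Shipped", "C": "Cancelled"},
)

ok = order_status.check("S")          # a valid code
bad_code = order_status.check("X")    # right type, not in the domain
bad_type = order_status.check(7)      # wrong data type
```

The same record that tells an end user what "S" means can also drive an automated data quality check.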

TITLE
Types of Metadata: Technical & Operational Metadata
• Technical and operational metadata provides developers
and technical users with information about their systems
• Technical metadata includes…
– Physical database table and column names, column properties,
other database object properties and database storage
• Operational metadata is targeted at IT operations
users’ needs, including…
– Information about data movement, source and target systems, batch
programs, job frequency, schedule anomalies, recovery and backup
information, archive rules and usage
• Examples of Technical & Operational metadata:
– Audit controls and balancing information
– Data archiving and retention rules
– Encoding/reference table conversions
– History of extracts and results
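
Technical metadata such as table and column names can usually be harvested from the database itself. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `customer` table and the helper names `table_names` and `column_metadata` are assumptions for illustration, and other database products expose their catalogs differently.

```python
import sqlite3

# In-memory database with one invented table, so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, joined DATE)"
)


def table_names(conn):
    """List user tables recorded in SQLite's own catalog (sqlite_master)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def column_metadata(conn, table):
    """Harvest column names and properties for one table.

    PRAGMA table_info returns: cid, name, type, notnull, dflt_value, pk.
    The table name is interpolated directly, so pass trusted names only.
    """
    rows = conn.execute(f"PRAGMA table_info({table})")
    return [
        {
            "column": name,
            "type": col_type,
            "not_null": bool(notnull),
            "primary_key": bool(pk),
        }
        for _cid, name, col_type, notnull, _default, pk in rows
    ]


tables = table_names(conn)
columns = column_metadata(conn, "customer")
```

Nothing in the `customer` table's rows is read; the harvest touches only the data about the data.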

TITLE

Types of Metadata: Data Stewardship
• Data stewardship Metadata is about...
– Data stewards, stewardship processes, and responsibility
assignments
• Data stewards…
– Assure that data and Metadata are accurate, with high quality
across the enterprise.
– Establish and monitor data sharing.
• Examples of Data stewardship metadata:
– Business drivers/goals
– Data CRUD rules
– Data definitions – business and technical
– Data owners
– Data sharing rules and agreements/contracts
– Data stewards, roles and responsibilities


TITLE

Types of Metadata: Provenance
• Provenance:
– "the history of ownership of a valued object or
work of art or literature" [Merriam Webster]
– For each datum, this is the description of:
• Its source (system or person or department),
• Any derivation used, and
• The date it was created.
– Examples of Data Provenance:
• The programs or processes by which it was created
• Its owner
• The steward responsible for its quality
• Other roles and responsibilities
• Rules for sharing it.
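
The three provenance items named above (source, derivation, creation date) can travel with each datum. The following is a minimal sketch under assumed names: the `Provenance` and `Datum` classes, the `derive_total` helper, and the sales figures are invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    source: str                       # system, person or department
    created: date                     # the date the datum was created
    derivation: Optional[str] = None  # any derivation used


@dataclass(frozen=True)
class Datum:
    name: str
    value: float
    provenance: Provenance


def derive_total(name, parts, today):
    """Sum several data and record how the result was derived."""
    formula = "sum(" + ", ".join(p.name for p in parts) + ")"
    return Datum(
        name=name,
        value=sum(p.value for p in parts),
        provenance=Provenance(source="derived", created=today,
                              derivation=formula),
    )


# Invented sample data from a hypothetical order-entry system.
q1 = Datum("q1_sales", 100.0, Provenance("order-entry", date(2012, 4, 1)))
q2 = Datum("q2_sales", 150.0, Provenance("order-entry", date(2012, 7, 1)))
total = derive_total("h1_sales", [q1, q2], today=date(2012, 9, 11))
```

Anyone who later questions `h1_sales` can read its derivation and trace it back to the two source values.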


TITLE

Metadata Subject Areas
Subject Areas: Components
1) Business Analytics: Data definitions, reports, users, usage, performance
2) Business Architecture: Roles and organizations, goals and objectives
3) Business Definitions: Business terms and explanations for a particular concept, fact, or other item found in an organization
4) Business Rules: Standard calculations and derivation methods
5) Data Governance: Policies, standards, procedures, programs, roles, organizations, stewardship assignments
6) Data Integration: Sources, targets, transformations, lineage, ETL workflows, EAI, EII, migration/conversion
7) Data Quality: Defects, metrics, ratings
8) Document Content Management: Unstructured data, documents, taxonomies, ontologies, name sets, legal discovery, search engine indexes

TITLE

Metadata Subject Areas, cont’d
Subject Areas: Components
9) Information Technology Infrastructure: Platforms, networks, configurations, licenses
10) Conceptual Data Models: Entities, attributes, relationships and rules, business names and definitions
11) Logical Data Models: Files, tables, columns, views, business definitions, indexes, usage, performance, change management
12) Process Models: Functions, activities, roles, inputs/outputs, workflow, timing, stores
13) Systems Portfolio and IT Governance: Databases, applications, projects, and programs, integration roadmap, change management
14) Service-oriented Architecture (SOA) information: Components, services, messages, master data
15) System Design and Development: Requirements, designs and test plans, impact
16) Systems Management: Data security, licenses, configuration, reliability, service levels

TITLE

Benefits of Metadata
1) Increase the value of strategic information (e.g. data warehousing,
CRM, SCM, etc.) by providing context for the data, thus aiding analysts in
making more effective decisions.
2) Reduce training costs and lower the impact of staff turnover through
thorough documentation of data context, history, and origin.
3) Reduce data-oriented research time by assisting business analysts in
finding the information they need in a timely manner.
4) Improve communication by bridging the gap between business users and
IT professionals, leveraging work done by other teams and increasing
confidence in IT system data.
5) Increase speed of system development’s time-to-market by reducing
system development life-cycle time.
6) Reduce risk of project failure through better impact analysis at various
levels during change management.
7) Identify and reduce redundant data and processes, thereby reducing
rework and use of redundant, out-of-date, or incorrect data.


TITLE

Metadata for Unstructured Data
• Unstructured data = any data that is not in a database or data file,
including documents or other media data

• Metadata describes both structured and unstructured data
• Metadata for unstructured data exists in many formats, responding
to a variety of different requirements
• Examples of Metadata repositories describing unstructured data:
– Content management applications
– University websites
– Company intranet sites
– Data archives
– Electronic journals collections
– Community resource lists

• Common method for classifying Metadata in unstructured
sources is to describe them as descriptive metadata,
structural metadata, or administrative metadata


TITLE
Metadata for Unstructured Data: Examples
Examples of descriptive metadata:
• Catalog information
• Thesauri keyword terms

Examples of structural metadata:
• Dublin Core
• Field structures
• Format (audio/visual, booklet)
• Thesauri keyword labels
• XML schemas

Examples of administrative metadata:
• Source(s)
• Integration/update schedule
• Access rights
• Page relationships (e.g. site navigational design)
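
As a rough illustration of the descriptive/structural/administrative split, the sketch below groups the metadata fields of one unstructured item by class. The class assignments follow the examples on this slide; the field names, the `group_by_class` helper, and the sample record are hypothetical.

```python
# Class assignments follow the slide's examples; field names are invented.
METADATA_CLASS = {
    "catalog_entry": "descriptive",
    "keyword_terms": "descriptive",
    "format": "structural",
    "xml_schema": "structural",
    "source": "administrative",
    "access_rights": "administrative",
}


def group_by_class(record):
    """Split one item's metadata fields into the three classes."""
    grouped = {"descriptive": {}, "structural": {}, "administrative": {}}
    for field_name, value in record.items():
        grouped[METADATA_CLASS[field_name]][field_name] = value
    return grouped


# Invented metadata for a single recorded webinar.
webinar_recording = {
    "keyword_terms": ["metadata", "data management"],
    "format": "audio/visual",
    "source": "webinar platform export",
    "access_rights": "public",
}

grouped = group_by_class(webinar_recording)
```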

TITLE

Sources of Metadata
Primary Sources:
• Virtually anything named in an organization

Secondary sources:
• Other Metadata repositories, accessed using
bridge software
• CASE tools, ETL tools

Many data management tools create and use
repositories for their own use.


TITLE

Specific Example
Four metadata sources:
1. Existing reference models (i.e., ADRM)
2. Conceptual model created two years ago
3. Existing systems (to be reverse engineered)
4. Enterprise data model


TITLE

Metadata Strategy
• Metadata Strategy is…
– … a statement of direction in Metadata management by the enterprise
– … a statement of intent that acts as a reference framework for the
development teams
– …driven by business objectives and prioritized by the business value
they bring to the organization
• Build a Metadata strategy from a set of defined components
• Primary focus of Metadata strategy: gain an understanding of and
consensus on the organization’s key business drivers, issues, and
information requirements for the enterprise Metadata program
• Need to understand how well the current environment meets these
requirements now and in the future
• Metadata strategy objectives define the organization’s future
enterprise Metadata architecture and recommend logical
progression of phased implementation steps

TITLE
Metadata Strategy Implementation Phases


TITLE

Metadata Management

TITLE

Goals and Principles
1. Provide organizational
understanding of terms and
usage
2. Integrate Metadata from
diverse sources
3. Provide easy, integrated
access to metadata
4. Ensure Metadata quality and
security


TITLE

Activities
1) Understand Metadata requirements
2) Define the Metadata architecture
3) Develop and maintain Metadata standards
4) Implement a managed Metadata environment
5) Create and maintain metadata
6) Integrate metadata
7) Manage Metadata repositories
8) Distribute and deliver metadata
9) Query, report and analyze metadata


TITLE

Activities: Metadata Standards Types
• Two major types exist:
1) Industry or consensus
standards
2) International standards

• High level framework shows
how standards are related and
how they rely on each other for
context and usage:


TITLE
Activities: Noteworthy Metadata Standards Types
Common Warehouse Metamodel (CWM):
• Specifies the interchange of Metadata among data warehousing, BI, KM,
and portal technologies.
• Based on UML and depends on it to represent object-oriented data
constructs.

The CWM Metamodel (layers, top to bottom)
Management: Warehouse Process, Warehouse Operation
Analysis: Transformation, OLAP, Data Mining, Information Visualization, Business Nomenclature
Resource: Object Model, Relational, Record, Multidimensional, XML
Foundation: Business Information, Data Types, Expression, Keys and Indexes, Type Mapping, Software Deployment
Object Model


TITLE
Information Management Metamodel (IMM)
• Object Management Group Project to replace CWM
• Concerned with:
– Business Modeling
• Entity/relationship metamodel
– Technology modeling
• Relational Databases
• XML
• LDAP
– Model Management
• Traceability
– Compatibility with related models
• Semantics of business vocabulary and business rules
• Ontology Definition Metamodel


TITLE
The Information Management Metamodel...
• Based on Core model.
• Used to translate from one model to another.


TITLE

Primary Deliverables
• Metadata repositories

• Quality metadata

• Metadata analysis

• Data lineage

• Change impact analysis

• Metadata control procedures

• Metadata models and architecture

• Metadata management operational analysis
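
Two of these deliverables, data lineage and change impact analysis, are graph walks over the same metadata in opposite directions. The sketch below assumes a hypothetical lineage map (`FEEDS`) with invented system and table names; a real metadata repository would supply these edges.

```python
from collections import deque

# Hypothetical lineage metadata: each entry reads "source feeds targets".
FEEDS = {
    "crm.customer": ["staging.customer"],
    "erp.orders": ["staging.orders"],
    "staging.customer": ["warehouse.dim_customer"],
    "staging.orders": ["warehouse.fact_sales"],
    "warehouse.dim_customer": ["report.sales_by_region"],
    "warehouse.fact_sales": ["report.sales_by_region", "report.daily_revenue"],
}


def impacted_by(changed, feeds):
    """Change impact analysis: breadth-first walk downstream."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for target in feeds.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(seen)


def lineage_of(target, feeds):
    """Data lineage: invert the edges, then walk upstream the same way."""
    upstream = {}
    for source, targets in feeds.items():
        for t in targets:
            upstream.setdefault(t, []).append(source)
    return impacted_by(target, upstream)


impact = impacted_by("erp.orders", FEEDS)
lineage = lineage_of("report.daily_revenue", FEEDS)
```

Impact analysis answers "what breaks if this changes?", while lineage answers "where did this number come from?".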

TITLE

Roles and Responsibilities
Suppliers:
– Data Stewards
– Data Architects
– Data Modelers
– Database Administrators
– Other Data Professionals
– Data Brokers
– Government and Industry
Regulators

Participants:
– Metadata Specialists
– Data Integration Architects
– Data Stewards
– Data Architects and Modelers
– Database Administrators
– Other DM Professionals
– Other IT Professionals
– DM Executives
– Business Users

Consumers:
– Data Stewards
– Data Professionals
– Other IT Professionals
– Knowledge Workers
– Managers and Executives
– Customers and Collaborators
– Business Users


TITLE

Technology
• Metadata repositories
• Data modeling tools
• Database management systems
• Data integration tools
• Business intelligence tools
• System management tools
• Object modeling tools
• Process modeling tools
• Report generating tools
• Data quality tools
• Data development and administration tools
• Reference and master data management tools

TITLE

Guiding Principles
1) Establish and maintain a Metadata strategy and
appropriate policies, especially clear goals and
objectives for Metadata management and usage
2) Secure sustained commitment, funding, and vocal support from
senior management concerning Metadata management for the
enterprise
3) Take an enterprise perspective to ensure future extensibility, but
implement through iterative and incremental delivery
4) Develop a Metadata strategy before evaluating, purchasing, and
installing Metadata management products
5) Create or adopt Metadata standards to ensure interoperability of
Metadata across the enterprise
6) Ensure effective Metadata acquisition for internal and external
metadata
7) Maximize user access since a solution that is not accessed or is
under-accessed will not show business value

TITLE

Guiding Principles, cont’d
8) Understand and communicate the necessity of
Metadata and the purpose of each type of
metadata; socialization of the value of Metadata
will encourage business usage
9) Measure content and usage
10) Leverage XML, messaging and web services
11) Establish and maintain enterprise-wide business involvement in data
stewardship, assigning accountability for metadata
12) Define and monitor procedures and processes to ensure correct policy
implementation
13) Include a focus on roles, staffing, standards, procedures, training, and
metrics
14) Provide dedicated Metadata experts to the project and beyond
15) Certify Metadata quality


TITLE
Using metadata descriptions of Bluetooth devices

Data Column       Attributes/Fields
CGL Trackpad      Keyboard VCU
IDR Trackpad      Motorola S9
Motorola S9       Peter's i4
Peter's i4        Trackpad CGL
VCU Keyboard      Trackpad IDR
VCU Trackpad      Trackpad VCU


TITLE
Example: iTunes Metadata

• Example:
– iTunes Metadata
• Insert a recently
purchased CD
• iTunes can:
– Count the number
of tracks (25)
– Determine the
length of each
track


TITLE

• When connected to
the Internet iTunes
connects to the
Gracenote(.com)
Media Database and
retrieves:
– CD Name
– Artist
– Track Names
– Genre
– Artwork
• Sure would be a pain
to type in all this
information


TITLE

• To organize
iTunes
– I create a
"New Smart
Playlist" for
Artist
containing
"Miles Davis"


TITLE

• Notice I didn't get the
desired results
• I already had another
Miles Davis recording
in iTunes
• Must fine-tune the
request to get the
desired results
– Album contains "The
complete birth of the
cool"
• Now I can move the
playlist "Miles Davis"
to a folder
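
The smart playlist steps above amount to filtering tracks by rules over their metadata. The sketch below is not iTunes' implementation: the `contains` and `smart_playlist` helpers, the track names, and the second album title are hypothetical stand-ins.

```python
# Invented track metadata; only the first album title comes from the slides.
TRACKS = [
    {"name": "Track A", "artist": "Miles Davis",
     "album": "The Complete Birth of the Cool"},
    {"name": "Track B", "artist": "Miles Davis",
     "album": "The Complete Birth of the Cool"},
    {"name": "Track C", "artist": "Miles Davis",
     "album": "Some Other Album"},
]


def contains(field, text):
    """Build a rule: the given metadata field contains text (any case)."""
    return lambda track: text.lower() in track[field].lower()


def smart_playlist(tracks, *rules):
    """Return names of tracks whose metadata satisfies every rule."""
    return [t["name"] for t in tracks if all(rule(t) for rule in rules)]


# First attempt: the artist rule alone also matches the other recording.
first_try = smart_playlist(TRACKS, contains("artist", "Miles Davis"))

# Fine-tuned request: add a second rule on the album.
refined = smart_playlist(
    TRACKS,
    contains("artist", "Miles Davis"),
    contains("album", "The complete birth of the cool"),
)
```

Because the rules only look at metadata, the same filter would work unchanged for podcasts, movies, or books that carry the same fields.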

TITLE

• The same:
– Interface
– Processing
– Data Structures
• are applied to
– Podcasts
– Movies
– Books
– .pdf files
• Economies of scale
are enormous

TITLE

Summary

TITLE

Questions?

It’s your turn!
Use the chat feature or Twitter (#dataed) to submit
your questions to Peter now.


TITLE
Upcoming Events
October Webinar:
Engineering Solutions to Data Quality Challenges
October 9, 2012 @ 2:00 PM – 3:30 PM ET
(11:00 AM-12:30 PM PT)
November Webinar:
Get the Most Out of Your Tools:
Data Management Technologies
November 13, 2012 @ 2:00 PM – 3:30 PM ET
(11:00 AM-12:30 PM PT)
Sign up here:
•www.datablueprint.com/webinar-schedule
•www.Dataversity.net

