Publicidad
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Publicidad
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Publicidad
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Publicidad
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Publicidad
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Publicidad
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Publicidad
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Publicidad
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Publicidad
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Publicidad
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
Próximo SlideShare
Data Strategy Best PracticesData Strategy Best Practices
Cargando en ... 3
1 de 54
Publicidad

Más contenido relacionado

Presentaciones para ti(20)

Similar a Why Data Modeling Is Fundamental(20)

Publicidad

Más de DATAVERSITY(20)

Último(20)

Publicidad

Why Data Modeling Is Fundamental

  1. © Copyright 2021 by Peter Aiken Slide # 1 paiken@plusanythingawesome.com+1.804.382.5957 Peter Aiken, PhD Why Data Modeling is Fundamental Peter Aiken, Ph.D. • I've been doing this a long time • My work is recognized as useful • Associate Professor of IS (vcu.edu) • Institute for Defense Analyses (ida.org) • DAMA International (dama.org) • MIT CDO Society (iscdo.org) • Anything Awesome (plusanythingawesome.com) • 11 books and dozens of articles • Experienced w/ 500+ data management practices worldwide • Multi-year immersions – US DoD (DISA/Army/Marines/DLA) – Nokia – Deutsche Bank – Wells Fargo – Walmart … © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 2
  2. Supporting Your Fundamental Data Modeling Needs
  3. Data Modeling Challenges Heterogeneous Databases Technologies Agile Data Modeling Semantic Integration © 2021 erwin, Inc. All rights reserved. 2 Integrating Modeling Techniques
  4. Data Modeling Requires Broad DBMS Support and Integration Metadata Transformation & Exchange Import Bridges from Vendor Tools Export Bridges to Vendor Tools Data & Object Modeling Data Integration (EAI, ETL, EII) Business Intelligence (OLAP, Reporting) Data & Object Modeling Data Integration (EAI, ETL, EII) Business Intelligence (OLAP, Reporting) © 2021 erwin, Inc. All rights reserved. 3
  5. Data Modeling Requires Multiple, Integrated Design Paradigms Relational Design • Structure is driven by how we expect to query the data • Overarching principle is query optimization – access all relevant data from fewest containers • Leads to simple queries with minimal joins: faster = good candidate for real-time applications NoSQL Document Design • Structure is driven by what data we want to capture and store • Overarching principle is storage optimization – store atomic data once • Leads to complex queries with multiple joins: slower = poor candidate for real-time applications © 2021 erwin, Inc. All rights reserved. 4
  6. Data Vault 2.0 Hubs – A list of unique business keys Links – An intersection of business keys (two or more) Satellites – Non-key descriptive columns that change over time Bridges – A combination of primary and business keys spread across multiple Hubs and Links PIT – “Point in Time” combination of primary and business keys from a single Hub and its surrounding Satellites Reference – A collection of code and description lookup structures that are generally resolved as run-time queries Data Vault is a comprehensive methodology (rules, best practices, standards, process designs and more) composed of three pillars: Architecture, Model, and Methodology. Data Vault is designed and used for solving enterprise level issues such as: agility, scalability, flexibility, auditability and consistency.” The Data Vault Alliance “ © 2021 erwin, Inc. All rights reserved. 5
  7. User Access and Permission Management Model Check In/Check Out Model Change Control with Version Management Concurrent Modeling with Conflict Resolution Cross Model Reporting Centralized Standards Access and Management Model Mart Data Analyst Data Architect DBA & Developer Model Management Services Model Management Services Governed Collaboration for Data Modeling © 2021 erwin, Inc. All rights reserved. 6
  8. Semantic Integration – Microsoft Common Data Model (CDM) Automatically transform the Microsoft CDM into a graphical model, complete with business data constructs and semantic metadata Feed existing data models and database designs with reusable CDM constructs and semantics Manage ongoing integration and reuse of CDM best practices through compare, synchronization and automated model templates © 2021 erwin, Inc. All rights reserved. 7
  9. Optimize Data Design, Dev/OPS, Literacy & Governance with Data Modeling 8 Data Modeling Lower Design Costs Reduced Design Risks Improved Design Quality Increased Design Agility REDUCE TOTAL COST OF OWNERSHIP ACCELERATE TIME TO VALUE Assuring Business Alignment Reducing Expensive Re-Work Easing Integration Enabling Collaboration © 2021 erwin, Inc. All rights reserved.
  10. © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com Why Data Modeling is Fundamental Program • Data Management Contextual Overview • Motivation – of systems/components – Data is not well understood • Why data modeling & what is it? – Model represents our understanding of the – Fundamental, foundational system characteristics – Shared between systems and humans • Fundamentals – The power of the purpose statement – Understanding data centric thinking – Data modeling compliments other architecture/engineering techniques – Challenges beyond data modeling • Take Aways, References, Q&A X 2020 AA market value ~ $6b AAdvantage valued between $19.5-$31.5 United market value ~ 9$b MileagePlus ~ $22b https://www.forbes.com/sites/advisor/2020/07/15/how-airlines-make-billions-from-monetizing-frequent-flyer-programs/?sh=66da87a614e9 Data Assets Financial Assets Real Estate Assets Inventory Assets Non- depletable Available for subsequent use Can be used up Can be used up Non- degrading √ √ Can degrade over time Can degrade over time Durable Non-taxed √ √ Strategic Asset √ √ √ √ Data Assets Win! • Today, data is the most powerful, yet underutilized and poorly managed organizational asset • Data is your – Sole – Non-depletable – Non-degrading – Durable – Strategic • Asset – Data is the new oil! – Data is the new (s)oil! – Data is the new bacon! • As such, data deserves: – It's own strategy – Attention on par with similar organizational assets – Professional ministration to make up for past neglect © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 4 Asset: A resource controlled by the organization as a result of past events or transactions and from which future economic benefits are expected to flow [Wikipedia] Data Assets Win!
  11. Why Model? © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 5 • Would you build a house without an architecture sketch? • Model is the sketch of the system to be built in a project. • Would you like to have an estimate how much your new house is going to cost? • Your model gives you a very good idea of how demanding the implementation work is going to be! • If you hired a set of constructors from all over the world to build your house, would you like them to have a common language? • Model is the common language for the project team. • Would you like to verify the proposals of the construction team before the work gets started? • Models can be reviewed before thousands of hours of implementation work will be done. • If it was a great house, would you like to build something rather similar again, in another place? • It is possible to implement the system to various platforms using the same model. • Would you drill into a wall of your house without a map of the plumbing and electric lines? • Models document the system built in a project. This makes life easier for the support and maintenance! powerpivotpro.com Augusta Ada King (aka Lady Ada, Countess of Lovelace) © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 6 https://people.well.com/user/adatoole/bio.htm Jacquard machine 1804 ≈ • 8,000+ years • formalize practices • GAAP
  12. © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 7 Unrefined data management definition Sources Uses Data Management © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 8 More refined data management definition Sources Reuse Data Management ➜ ➜
  13. Better still data management definition © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 9 Data Governance Data Assets/Ethical Framework Sources ➜ Use ➜Reuse ➜ You can accomplish Advanced Data Practices without becoming proficient in the Foundational Data Practices however this will: • Take longer • Cost more • Deliver less • Present greater risk (with thanks to Tom DeMarco) Data Management Practices Hierarchy © Copyright 2021 by Peter Aiken Slide # Advanced Data Practices • MDM • Mining • Big Data • Analytics • Warehousing • SOA Foundational Data Practices Data Platform/Architecture Data Governance Data Quality Data Operations Data Management Strategy T e c h n o l o g i e s C a p a b i l i t i e s https://plusanythingawesome.com 10
  14. Digital Insight • Subtract data from digital and what do you have? • Subtract digital from data and you still have data © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 11 https://www.linkedin.com/in/mark-johnson-518a752/ DIGITAL DATA ? DIGITAL DATA DATA © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com Recent Technology Realization 12 Recent
  15. ( Bad Data ) + Anything Awesome ( will always yield ) Bad Results © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 13 Garbage In ➜ Garbage Out! + © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 14 Perfect Model Garbage Data Garbage Results Data Warehouse Machine Learning Business Intelligence Block Chain AI MDM Data Governance Analytics Technology GI➜GO!
  16. © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 15 Perfect Model Garbage Data Garbage Results Data Warehouse Machine Learning Block Chain AI MDM Analytics Technology Data Governance GI➜GO! Business Intelligence © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 16 Perfect Model Quality Data Good Results Data Warehouse Machine Learning Business Intelligence Block Chain AI MDM Analytics Technology Data Governance Quality In ➜ Quality Out!
  17. It isn't possible to go digital Digital © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 17 a By just spelling 'data' Dat © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 18
  18. It requires more work Data © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com a 19 © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com Metadata Management 20 Data Management Body of Knowledge (DM BoK V2) Practice Areas from The DAMA Guide to the Data Management Body of Knowledge 2E © 2017 by DAMA International • Analysis • Database Design • Implementation • Additional data development
  19. DAMA DM BoK: Data Development © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 21 from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com Why Data Modeling is Fundamental Program • Data Management Contextual Overview • Motivation – of systems/components – Data is not well understood • Why data modeling & what is it? – Model represents our understanding of the – Fundamental, foundational system characteristics – Shared between systems and humans • Fundamentals – The power of the purpose statement – Understanding data centric thinking – Data modeling compliments other architecture/engineering techniques – Challenges beyond data modeling • Take Aways, References, Q&A X
  20. Data Architectures: here, whether you like it or not © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 23 deviantart.com • All organizations have data architectures – Some are better understood and documented (and therefore more useful to the organization) than others Levels of Abstraction, Completeness and Utility © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 24 • Models more downward facing - detail • Architecture is higher level of abstraction - integration • In the past architecture attempted to gain complete (perfect) understanding – Not timely – Not feasible • Focus instead on architectural components – Governed by a framework – More immediate utility • http://www.architecturalcomponentsinc.com
  21. How are components expressed as architectures? • Details are organized into larger components • Larger components are organized into models • Models are organized into architectures (composed of architectural components) © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 25 A B C D A B C D A D C B Intricate Dependencies Purposefulness How are data structures expressed as architectures? • Attributes are organized into entities/objects – Attributes are characteristics of "things" – Entitles/objects are "things" whose information is managed in support of strategy – Example(s) • Entities/objects are organized into models – Combinations of attributes and entities are structured to represent information requirements – Poorly structured data, constrains organizational information delivery capabilities – Example(s) • Models are organized into architectures – When building new systems, architectures are used to plan development – More often, data managers do not know what existing architectures are and - therefore - cannot make use of them in support of strategy implementation © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 26 Intricate Dependencies Purposefulness THING Thing.Id # Thing.Description Thing.Status Thing.Sex.To.Be.Assigned Thing.Reserve.Reason
  22. Q: What is an Attribute? • What does the existence of this attribute tell us? – Clubs need to be identified (#) separately from one another – Club-specific information is likely maintained – Some concept (organization) exists above the 'club level' – ... © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 27 A: Attribute Definition • Attributes describe an entity and attribute values describe “instances of business things” © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 28
  23. Entities organized into a model • Defines mandatory/optional relationships using minimum/ maximum occurrences from one entity to another © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 29 Data architectures are composed of data models © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 30
  24. Working While Bleeding © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 31 $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $$$$$ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $$$$$ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $$$$$ $ $ $ $ $ Data incoherence is a hidden expense • How does maltreated data cost money? • Consider the opposite question: – Were your systems explicitly designed to be integrated or otherwise work together? – If not then what is the likelihood that they will work well together? • Organizations spend 20-40% of their IT budget evolving their data - including: – Data migration • Changing the location from one place to another – Data conversion • Changing data into another form, state, or product – Data improving • "Inspecting and manipulating, or re-keying data to prepare it for subsequent use" - Source: John Zachman © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 32 PETER AIKEN WITH JUANITA BILLINGS FOREWORD BY JOHN BOTTEGA MONETIZING DATA MANAGEMENT Unlocking the Value in Your Organization’s Most Important Asset.
  25. As a topic, data is ... Complex & detailed • Outsiders do not want to hear about or discuss any aspects of challenges/solutions • Most are unqualified re: architecture/ engineering Taught inconsistently • Focus is on technology • Business impact is not addressed Not well understood • (Re)learned by every workgroup • Lack of standards/ poor literacy/ unknown dependencies © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com Wally Easton Playing Piano https://www.youtube.com/watch?v=NNbPxSvII-Q 33 Bad Data Decisions Spiral © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 34 Bad data decisions Technical deci- sion makers are not data knowledgable Business decision makers are not data knowledgable Poor organizational outcomes Poor treatment of organizational data assets Poor quality data
  26. © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com Why Data Modeling is Fundamental Program • Data Management Contextual Overview • Motivation – of systems/components – Data is not well understood • Why data modeling & what is it? – Model represents our understanding of the – Fundamental, foundational system characteristics – Shared between systems and humans • Fundamentals – The power of the purpose statement – Understanding data centric thinking – Data modeling compliments other architecture/engineering techniques – Challenges beyond data modeling • Take Aways, References, Q&A X Data Modeling Definition • Modeling = Analysis and design method used to – Define and analyze data requirements – Design data structures that support these requirements • Model = set of data specifications and related diagrams that reflect requirements and designs – Representation of something in our environment – Employs standardized text/symbols to represent data attributes (grouped into data elements) and the relationships among them – Integrated collection of specifications and related diagrams that represent data requirements and design © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 36 from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  27. Data Modeling • Modeling = complex process involving interaction between people and with technology that don’t compromise the integrity or security of the data – Good data models accurately express and effectively communicate data requirements and quality solution design • Modeling approach (guided by 2 formulas): – Purpose + audience = deliverables – Deliverables + resources + time = approach © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 37 from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International Data Models Facilitate • Formalization – Data model documents a single, precise definition of data requirements and data-related business rules • Communication – Data model is a bridge to understanding data between people with different levels and types of experience. – Helps understand business area, existing application, or impact of modifying an existing structure – May also facilitate training new business and/or technical staff • Scope – Data model can help explain the data concept and scope of purchased application packages © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 38
  28. ANSI-SPARC 3-Layer Schema 1. CONCEPTUAL - Allows independent customized user views: – Each should be able to access the same data, but have a different customized view of the data. 2. LOGICAL - This hides the physical storage details from users: – Users should not have to deal with physical database storage details. They should be allowed to work with the data itself, without concern for how it is physically stored. 3. PHYSICAL - The database administrator should be able to change the database storage structures without affecting the users’ views: – Changes to the structure of an organization's data will be required. The internal structure of the database should be unaffected by changes to the physical aspects of the storage. © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 39 For example, a changeover to a new DBMS technology. The database administrator should be able to change the conceptual or global structure of the database without affecting the users. Families of Modeling Notation Variants © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 40 Information Engineering
  29. What is a Relationship? • Natural associations between two or more entities © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 41 Ordinality & Cardinality • Defines mandatory/optional relationships using minimum/ maximum occurrences from one entity to another © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 42 A BED is placed in one and only one ROOM A ROOM contains zero or more BEDS A BED is occupied by zero or more PATIENTS A PATIENT occupies at least one or more BEDS ROOM BED PATIENT
  30. Q: What is the proper relationship for these entities? © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 43 Eventually One or Many (optional) Eventually One (optional) Zero, or Many (optional) One or Many (mandatory) Exactly One (mandatory) Possible Entity Relationship Cardinality Options © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 44
  31. informed information investing over technology acquisition activities © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 45 Person Job Class Position BR1) One EMPLOYEE can be associated with one PERSON BR2) One EMPLOYEE can be associated with one POSITION Manual Job Sharing Manual Moon Lighting Employee informed information investing over technology acquisition activities © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 46 Person Job Class Employee Position BR1) Zero, one, or more EMPLOYEES can be associated with one PERSON BR2) Zero, one, or more EMPLOYEES can be associated with one POSITION Job Sharing Moon Lighting
  32. informed information investing over technology acquisition activities © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 47 Data structures must be specified prior to IT development/acquisition (Requires 2 structural loops more than the more flexible data structure) More flexible data structure Less flexible data structure Understanding • Definition: – 'Understanding an architecture' – Documented and articulated as a digital blueprint illustrating the commonalities and interconnections among the architectural components – Ideally the understanding is shared by systems and humans © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 48
  33. Data Modeling Process 1. Identify entities 2. Identify key for each entity 3. Draw rough draft of entity relationship data model 4. Identify data attributes 5. Map data attributes to entities © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 49 Model evolution is good, at first ... 1. Identify entities 2. Identify key for each entity 3. Draw rough draft of entity relationship data model 4. Identify data attributes 5. Map data attributes to entities © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 50
  34. Relative use of time allocated to tasks during modeling © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 51 Preliminary Modeling Wrapup Activity activities Cycles activities Evidence Analysis collection & analysis Collection Project coordination requirements Declining coordination requirements Target system analysis Increasing amounts of target system analysis Modeling Validation cycle focus Refinement Don’t Tell Them That You Are Modeling! Then make some appropriate connections between your objects © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 52 Just write some stuff down Then arrange it
  35. Table Handling • A table is a collection of data items that have the same description, such as account totals or monthly averages; it consists of a table name and subordinate items called table elements. – Under representation of other database characteristics causes confusion and introduces risk to organizational data capabilities • In this example, the table consists of Song, Album • and length? • No, iTunes uses – Song Start Time – Song Stop Time • More flexible and less risk © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 53 https://www.ibm.com/support/knowledgecenter/en/SS6SG3_4.2.0/com.ibm.entcobol.doc_4.2/PGandLR/tasks/tptbl02.htm There are correct ways to organize data • Optimization can be done for: – Flexibility – Adaptability – Retrievability – Risk reduction – ... • Techniques include: – Data integrity – Smart codes bad/dumb codes good – Architecture (table joins) – ... © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 54
  36. (a hypothetical portion of the) iTunes database • What information is lost if we delete record #1? © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 55 Record Purchaser ID Song Price 1 Peter We Met Today $0.99 2 Peter My Mother's Voice $1.29 3 Peter Fortune Smiles $0.99 4 Lolly Thousand Pieces of Gold $0.99 (a hypothetical portion of the) iTunes database: Deletion Anomaly • Question: – What information is lost if we delete record #1? • Answer: – We loose the fact that Peter purchased "We Met Today" – We loose the fact that "We Met Today" costs $0.99 – This is usually undesirable and unintended © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 56 Row Purchaser ID Song Price 1 Peter We Met Today $0.99 2 Peter My Mother's Voice $1.29 3 Peter Fortune Smiles $0.99 4 Lolly Thousand Pieces of Gold $0.99
  37. Student Activities File: Insertion Anomalies • Question: – Suppose we want to add new song SCUBA and that it costs $1.29? • Answer: – Cannot enter it until a purchaser buys SCUBA – We cannot insert a full row until we have an additional fact about that row – This is usually undesirable and unintended © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 57 Row Purchaser ID Song Price 2 Peter My Mother's Voice $1.29 3 Peter Fortune Smiles $0.99 4 Lolly Thousand Pieces of Gold $0.99 5 ??? SCUBA $1.29 Student Activities File: Update Anomalies • Question: – Suppose we want to increase the price of 'We Met Today' from $0.99 to $1.29? • Answer: – Change to data items such as Song requires examination of every single record – Will not catch spelling errors - such as "We met Toddy" – This is usually undesirable and unintended © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 58 Row Purchaser ID Song Price 1 Peter We Met Toddy $0.99 2 Peter My Mother's Voice $1.29 3 Peter Fortune Smiles $0.99 4 Lolly Thousand Pieces of Gold $0.99 5 Lolly SCUBA $1.29
  38. How Should it be Done? (In General) • As much as possible, store 1 fact per row – Row 5 is a good example as it shows both that purchaser Lolly has purchased SCUBA and that SCUBA costs $0.99 – These are two distinct facts and are correctly stored in two tables sharing a formal relationship – More remains codes © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 59 Row Purchaser ID Song Price 1 Peter We Met Toddy $0.99 2 Peter My Mother's Voice $1.29 3 Peter Fortune Smiles $0.99 4 Lolly Thousand Pieces of Gold $0.99 5 Lolly SCUBA $0.99 PRICING Row Song Price 1 We Met Today $1.29 2 My Mother's Voice $1.29 3 Fortune Smiles $0.99 4 Thousand Pieces of Gold $0.99 5 SCUBA $0.99 PURCHASES Row Purchaser ID Song 1 Peter We Met Toddy 2 Peter My Mother's Voice 3 Peter Fortune Smiles 4 Lolly Thousand Pieces of Gold 5 Lolly SCUBA 6 Pat SCUBA How Should it be Done? (Joining Tables) • Data from the two tables is joined to provide requested information • Purchaser PETER is now properly registered to own "We Meet Today" • The price change for SCUBA has been resolved • PRICING table is a better engineered solution © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com PRICING Row Song Price 1 We Met Today $1.29 2 My Mother's Voice $1.29 3 Fortune Smiles $0.99 4 Thousand Pieces of Gold $0.99 5 SCUBA $1.29 PURCHASES Row Purchaser ID Song 1 Peter We Met Today 2 Peter My Mother's Voice 3 Peter Fortune Smiles 4 Lolly Thousand Pieces of Gold 5 Lolly SCUBA 6 Pat SCUBA 60 (each price instance can provide context for many purchases)
  39. How Should it be Done? (Connection Types) • Defines mandatory/ optional relationships using minimum/ maximum occurrences from one entity to another © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 61 PRICING Row Song Price 1 We Met Today $1.29 2 My Mother's Voice $1.29 3 Fortune Smiles $0.99 4 Thousand Pieces of Gold $0.99 5 SCUBA $1.29 PURCHASES Row Purchaser ID Song 1 Peter We Met Toddy 2 Peter My Mother's Voice 3 Peter Fortune Smiles 4 Lolly Thousand Pieces of Gold 5 Lolly SCUBA 6 Pat SCUBA How Should it be Done? (Smart codes bad, dumb codes good) • 804 → N zero N → long distance call signaling – All telephone switching equipment (hardware) had to be changed to not route calls to long distance if the hardware 'saw' a zero in the middle of a 3 digit number • Course listings – "You can not add another undergraduate business computer course" • A large organization has to expand a primary master data item by a number of digits – Requires upwards of 100,000 changes to be managed © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 62 BUS360 Business Computer Courses BUS361 BUS362 BUS363 BUS364 BUS365 BUS366 BUS367 BUS368 BUS369 BUS3?? https://www.youtube.com/watch?v=_f1gwAGfZs0&frags=pl%2Cwn
  40. © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com Why Data Modeling is Fundamental Program • Data Management Contextual Overview • Motivation – of systems/components – Data is not well understood • Why data modeling & what is it? – Model represents our understanding of the – Fundamental, foundational system characteristics – Shared between systems and humans • Fundamentals – The power of the purpose statement – Understanding data centric thinking – Data modeling compliments other architecture/engineering techniques – Challenges beyond data modeling • Take Aways, References, Q&A X © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com ! ! ! ! 64 Organizational Needs become instantiated and integrated into a Data Models Informa(on)System) Requirements authorizes and articulates satisfy specific organizational needs The process is iterative Data models and data architectures are developed in response to needs
  41. Bed Entity: BED Purpose: This is a substructure within the room substructure of the facility location. It contains information about beds within rooms. Attributes: Bed.Description Bed.Status Bed.Sex.To.Be.Assigned Bed.Reserve.Reason Associations: >0-+ Room Status: Validated Keep them focused on data model purpose • The reason we are locked in this room is to: – Mission: Understand formal relationship between soda and customer • Outcome: Walk out the door with a data model this relationship – Mission: Understand the characteristics that differ between our hospital beds • Outcome: We will walk out the door when we identify the top three traits that represent the brand. – Mission: Could our systems handle the following business rule tomorrow? – "Is job-sharing permitted?" • Outcomes: Confirm that it is possible to staff a position with multiple employees effective tomorrow © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 65 selects and pays for given to Soda Customer selects can be filled by zero or 1 Employee Position has exactly 1 How does our perspective change: the primary means of tracking a patient Standard definition reporting does not provide conceptual context © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 66 BED Something you sleep in
  42. Entity: BED Data Asset Type: Principal Data Entity Purpose: This is a substructure within the room substructure of the facility location. It contains information about beds within rooms. Source: Maintenance Manual for File and Table Data (Software Version 3.0, Release 3.1) Attributes: Bed.Description Bed.Status Bed.Sex.To.Be.Assigned Bed.Reserve.Reason Associations: >0-+ Room Status: Validated The Power of the Purpose Statement © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 67 IT Project or Application-Centric Development © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com Original articulation from Doug Bagley @ Walmart 68 Data/ Information IT Projects • In support of strategy, organizations implement IT projects • Data/information are typically considered within the scope of IT projects • Problems with this approach: – Ensures data is formed to the applications and not around the organizational-wide information requirements – Process are narrowly formed around applications – Very little data reuse is possible Strategy
  43. Data-Centric Development © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com Original articulation from Doug Bagley @ Walmart 69 Data/ Information IT Projects • In support of strategy, the organization develops specific, shared data-based goals/objectives • These organizational data goals/ objectives drive the development of specific IT projects with an eye to organization-wide usage • Advantages of this approach: - Data/information assets are developed from an organization-wide perspective - Systems support organizational data needs and compliment organizational process flows - Maximum data/information reuse Strategy the Data Doctrine® (V2) We are uncovering better ways of developing IT systems by doing it and helping others do it. Through this work we have come to value: data programs driving IT programs informed information investing over technology acquisition activities stable, shared organizational data over IT component evolution data reuse over the acquisition of new data sources © Copyright 2021 by Peter Aiken Slide # 70 https://plusanythingawesome.com That is, while there is value in the items on the right, we value the items on the left more. Source: theagiledoctrine.org
  44. Typically Managed Architectures • Business Architecture – Goals, strategies, roles, organizational structure, location(s) • Process Architecture – Arrangement of inputs -> transformations = value -> outputs – Typical elements: Functions, activities, workflow, events, cycles, products, procedures • Systems Architecture – Applications, software components, interfaces, projects • Security Architecture – Arrangement of security controls relation to IT Architecture • Technical Architecture/Tarchitecture – Relation of software capabilities/technology stack – Structure of the technology infrastructure of an enterprise, solution or system – Typical elements: Networks, hardware, software platforms, standards/protocols • Data / Information Architecture – Arrangement of data assets supporting organizational strategy – Typical elements: specifications expressed as entities, relationships, attributes, definitions, values, vocabularies © Copyright 2021 by Peter Aiken Slide # 71 https://plusanythingawesome.com 1 in 10 organizations manage 1 or more of these formally Data Modeling Example #1 © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 72 from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International Primary deliverables become reference material Model Purpose Statement: This model codifies the official vocabulary to be used when describing aspects of any of the following organizational concepts: – Subscriber – Account – Charge – Bill
  45. Data Modeling Example #2 fuel rent-rate phone-rate phone-call rental agreement customer auto repair history phone-unit Source: Chikofsky 1990 © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 73 Model Purpose Statement: This model codifies the official vocabulary to be used when describing aspects of any of the following organizational concepts: – fuel – customer – auto – rental agreement – rent-rate – phone-call – phone-rate – phone-unit – repair history It is documentation shown during the on- boarding process Interpretations: 1. Car rental company 2. Rental agreement is central 3. No direct connection between customer and contract 4. Contract must have a customer 5. Nothing structural prevents autos from being rented to multiple customers 6. Phone units are tied to rentals Data Modeling Example #3 © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com salesperson name commission rate invoice # amount date paid customer name address customer # date order # price quantity order # item # quantity on hand description supplier item # cost SALESPERSON INVOICE ORDER CATALOG LINE ITEM 74 • Sales commission-based pricing information • Difficult to change a customer address • Price not included in the catalog • Easy to implement variable pricing - difficult to implement standard pricing - is standard pricing implemented • Sales person information is not directly tied to the order • Do sales people sell things that are shipped quickly so they get their commission quicker? • Nothing prohibits a sales from having multiple sales persons • Multiple invoices are allowed for a single order • Partial shipment is allowed • Data base cannot tell what part of an order the invoice pertains to Model Purpose Statement: This model codifies the official vocabulary and specific operational rules to be used when describing aspects of any of the following organizational concepts: – salesperson – invoice – order – line item – catalog
  46. DISPOSITION Data Map © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com Model Purpose Statement: This model codifies the official vocabulary to be used when describing disposition related organizational concepts: – user – admission – discharge – encounter – facility – provider – diagnosis 75 Data Model #4: DISPOSITION © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com • At least one but possibly more system USERS enter the DISPOSITION facts into the system. • An ADMISSION is associated with one and only one DISCHARGE. • An ADMISSION is associated with zero or more FACILITIES. • An ADMISSION is associated with zero or more PROVIDERS. • An ADMISSION is associated with one or more ENCOUNTERS. • An ENCOUNTER may be recorded by a system USER. • An ENCOUNTER may be associated with a PROVIDER. • An ENCOUNTER may be associated with one or more DIAGNOSES. 76 ADMISSION Contains information about patient admission history related to one or more inpatient episodes DIAGNOSIS Contains the International Disease Classification (IDC) of code representation and/or description of a patient's health related to an inpatient code DISCHARGE A table of codes describing disposition types available for an inpatient at a FACILITY ENCOUNTER Tracking information related to inpatient episodes FACILITY File containing a list of all facilities in regional health care system PROVIDER Full name of a member of the FACILITY team providing services to the patient USER Any user with access to create, read, update, and delete DISPOSITION data ADMISSION Contains information about patient admission history related to one or more inpatient episodes DIAGNOSIS Contains the International Disease Classification (IDC) of code representation and/or description of a patient's health related to an inpatient code DISCHARGE A table of codes describing disposition types available for an inpatient at a FACILITY ENCOUNTER Tracking information related to inpatient episodes FACILITY File containing a list of all facilities in regional health care system PROVIDER Full name of a member of the FACILITY team providing services to the patient USER Any user with access to create, read, update, and delete DISPOSITION data ADMISSION Contains information about patient admission history related to one or more inpatient episodes DIAGNOSIS Contains the International Disease Classification (IDC) of code representation and/or description of a patient's health related to an inpatient code DISCHARGE A table of codes describing disposition types available for an inpatient at a FACILITY ENCOUNTER Tracking information related to inpatient episodes FACILITY File containing a list of all facilities in regional health care system PROVIDER Full name of a member of the FACILITY team providing services to the patient USER Any user with access to create, read, update, and delete DISPOSITION data Death must be a disposition code!
  47. As Is Information Requirements Assets As Is Data Design Assets As Is Data Implementation Assets Existing New Modeling in Various Contexts © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com O2 Recreate Data Design Reverse Engineering Forward engineering O5 Reconstitute Requirements O9 Reimplement Data To Be Data Implementation Assets O8 Redesign Data O4 Recon- stitute Data Design O3 Recreate Requirements O6 Redesign Data To Be Design Assets O7 Re- develop Require- ments To Be Requirements Assets O1 Recreate Data Implementation Metadata 77 Modeling Options O-1 data implementation (e.g., by recreating descriptions of implemented file layouts); O-2 data designs (e.g., by recreating the logical system design layouts); or O-3 information requirements (e.g., by recreating existing system specifications and business rules). O-4 data design assets by examining the existing data implementation (when appropriate O-1 can facilitate O-4); and O-5 system information requirements by reverse engineering the data design O-4. (Note: if the data design doesn't exist O-4 must precede O-5.) O-6 transforming as is data design assets, yielding improved to be data designs that are based on reconstituted data design assets produced by O-2 or O-4 and (possibly O-1); O-7 transforming as is system requirements into to be system requirements that are based on reconstituted system requirements produced by O-3 or O-5 and (possibly O-2); O-8 redesigning to be data design assets using the to be system requirements based on reconstituted system requirements produced by O-7; and O-9 re-implementing system data based on data redesigns produced by O-6 or O-8. © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 78
  48. Model Evolution Framework © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 79 Conceptual Logical Physical Validated Not Validated Every modeling change can be mapped to a transformation in this framework! Model Evolution (better explanation) © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 80 As-is To-be Technology Independent/ Logical Technology Dependent/ Physical abstraction Other logical as-is data architecture components
  49. Pick any two! and there are still tradeoffs to be made! © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 81 Data Models Used to Support Strategy • Flexible, adaptable data structures • Cleaner, less complex code • Ensure strategy effectiveness measurement • Build in future capabilities • Form/assess merger and acquisitions strategies © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 82 Employee Type Employee Sales Person Manager Manager Type Staff Manager Line Manager Adapted from Clive Finkelstein Information Engineering Strategic Systems Development 1992
  50. How do Data Models Support Organizational Strategy • Consider the opposite question: – Were your systems explicitly designed to be integrated or otherwise work together? – If not then what is the likelihood that they will work well together? – In all likelihood your organization is spending between much of its IT budget compensating for poor data structure integration – They cannot be helpful as long as their structure is unknown • Two answers – Achieving efficiency and effectiveness goals – Providing organizational dexterity for rapid implementation © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 83 Typical focus of a database modeling effort Data Modeling Ensures Interoperability © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 84 Program F Program E Program D Program G Program H Application domain 2 Application domain 3 Program I Typical focus of a software engineering effort Program A
  51. Typical focus of a database modeling effort Data Models Ensure Interoperability © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 85 Program F Program E Program D Program G Program H Application domain 2 Application domain 3 Program I Typical focus of a software engineering effort Program A D a t a M o d e l D a t a M o d e l D a t a M o d e l D a t a M o d e l D a t a M o d e l D a t a M o d e l Program F Program E Program D Program G Program H Program I Application domain 2 Application domain 3 D a t a M o d e l D a t a M o d e l D a t a M o d e l Data Model Focus has Great Potential Business Value • How are decisions about the range and scope of common data usage, made? • Analysis scope is on use of data to support a process • Problems caused by data exchange or interface problems • Goals often connect strategic and operational • One data model is ideal © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 86 D a t a M o d e l Program A
  52. © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com Why Data Modeling is Fundamental Program • Data Management Contextual Overview • Motivation – of systems/components – Data is not well understood • Why data modeling & what is it? – Model represents our understanding of the – Fundamental, foundational system characteristics – Shared between systems and humans • Fundamentals – The power of the purpose statement – Understanding data centric thinking – Data modeling compliments other architecture/engineering techniques – Challenges beyond data modeling • Take Aways, References, Q&A X Event Pricing © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 88 • 20% off directly from the publisher on select titles • My Book Store @ http://plusanythingawesome.com • Enter the code "anythingawesome" at the Technics bookstore checkout where it says to "Apply Coupon" anythingawesome
  53. Use Models to • Store and formalize information • Filter out extraneous detail • Define an essential set of information • Help understand complex system behavior • Gain information from the process of developing and interacting with the model • Evaluate various scenarios or other outcomes indicated by the model • Monitor and predict system responses to changing environmental conditions © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 89 https://www.youtube.com/watch?v=J5YR0uqPAI8 Data Modeling for Business Value • Goal must be shared IT/business understanding – No disagreements = insufficient communication • Data sharing/exchange is automated and dependent on successful engineering/architecture – Requires a sound foundation of data modeling basics (the essence) on which to build technologies • Modeling characteristics evolve during the analysis – Different model instances may be useful to different analytical problems • Incorporate motivation (purpose statements) in all modeling – Modeling is a problem defining as well as a problem solving activity • Use of modeling is more important than selection of a specific method • Models are often living documents • Models need to be available in an easily searchable manner • Utility is paramount – Adding color and diagramming objects customizes models and allows for a more engaging and enjoyable user review process © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 90 Inspired by: Karen Lopez http://www.information-management.com/newsletters/enterprise_architecture_data_model_ERP_BI-10020246-1.html?pg=2
  54. Upcoming Events Business Value through Reference & Master Data Strategies 13 July 2021 Getting (Re)Started with Data Stewardship 10 August 2021 Approaching Data Quality Engineering 14 September 2021 © Copyright 2021 by Peter Aiken Slide # https://plusanythingawesome.com 91 Brought to you by: Time: 19:00 UTC (2:00 PM NYC) | Presented by: Peter Aiken, PhD paiken@plusanythingawesome.com +1.804.382.5957 Questions? Thank You! © Copyright 2021 by Peter Aiken Slide # 92 Book a call with Peter to discuss anything - https://plusanythingawesome.com/OfficeHours.html + =
Publicidad