I’ve Always Wanted To
Data Model
Ian Varley, Salesforce.com
Data Week, 2013-10-02
Lightning Talk (10 minutes)
Who am I?
Ian Varley
Austin, TX
Salesforce.com
Big Data Team
@thefutureian
What’s Data Modeling?
The act of taking the intelligible
structure of the world around us, and
making it concrete enough for
computers to act on it.
(More specifically, data modeling usually
has to do with storing it in a database.)
Traditionally, data modeling has meant
Entity Attribute Relationship
modeling techniques.

There are variants that are more “OO” (like UML) but they
share most of the same core assumptions.
Many a project was sunk
due to shitty data modeling.
It’s a difficult occupation.
You have to be part engineer, part
psychologist, and part philosopher.
If you’re doing it, you’re not alone.
Lots of smart folks think about this stuff.
(David Hay, Steve Hoberman, Joe Celko, many more.)
But.
The expressive power of our
conceptual modeling techniques hasn’t
improved much since the 1970s.

We mostly look at the world in the
same static way we did 40 years ago.
Partly, this is because our discipline is
wedded to relational (SQL) DBs.

When the only tool you have
is a hammer ...
A book that opened my eyes ...

(He said a lot of the stuff I’m about to say back in 1978!)
I don’t have a lot of answers.
But I want to raise some questions.
And hopefully, start a conversation.
Here are 5 observations about the
tools of traditional data modeling.
#1: nobody actually knows
what an “entity” really is.
“Entity” is another word for Category,
in linguistics terms.
And an important property of linguistic
categories is that they are slippery.
See:
● Steven Pinker: The Stuff Of Thought
● Douglas Hofstadter: Surfaces & Essences
● George Lakoff: Women, Fire, and Dangerous Things
part: an abstract definition of
a connected set of physical
materials that serve some
purpose, and that people are
willing to buy

part: one instance of a part
type, which arrives on the QA
line at a specific time and
either does or doesn't meet
quality standards
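To make the two senses of "part" concrete, here is a minimal Python sketch. The class names (PartType, PartInstance), fields, and sample values are hypothetical, invented for illustration, and are not from the talk.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PartType:
    """Sense 1: the abstract, sellable definition of a part."""
    sku: str
    description: str


@dataclass
class PartInstance:
    """Sense 2: one physical unit arriving on the QA line."""
    part_type: PartType
    arrived_at: datetime
    passed_qa: bool


widget = PartType(sku="W-100", description="widget")
units = [
    PartInstance(widget, datetime(2013, 10, 2, 9, 0), passed_qa=True),
    PartInstance(widget, datetime(2013, 10, 2, 9, 5), passed_qa=False),
]

# "How many parts?" has two different answers, depending on the sense.
distinct_types = len({u.part_type for u in units})
failed_units = sum(1 for u in units if not u.passed_qa)
```

The same word counts one thing in the catalog and two things on the QA line, which is the slipperiness in question.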
And if you think you can “solve” the
problem, I’ve got some World Trade
Center insurance policies to sell you.
That said, there are a couple tools we
could adopt that would help:
● First-class Sub- / Super-Typing
● First-class Scoping and Aliasing
(Not that there aren’t ways to do this in ERD models, but
they’re unobvious and not widely used.)
#2: entities, attributes, and
relationships are really the
same thing, maaaan ...

http://the-hippie-portfolio.tumblr.com/
Say I’ve got a “parent” in my model.
Is it:
● A “parent” entity?
● A “person” entity with
an “isParent” attribute?
● Two “person” entities in
a “parent” relationship?
It’s all of them; the distinction is
arbitrary.
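The three options can be written down side by side. This is only a sketch in Python; the class names and the sample people ("Ann", "Bob") are assumptions made up for illustration.

```python
from dataclasses import dataclass


# Option 1: "parent" is its own entity.
@dataclass
class Parent:
    name: str


# Option 2: a "person" entity with an "isParent" attribute.
@dataclass
class Person:
    name: str
    is_parent: bool = False


# Option 3: two "person" entities in a "parent" relationship.
@dataclass(frozen=True)
class ParentOf:
    parent: str
    child: str


as_entity = Parent("Ann")
as_attribute = Person("Ann", is_parent=True)
as_relationship = ParentOf(parent="Ann", child="Bob")


def is_parent_any_way(name):
    """Each representation can answer "is this person a parent?"."""
    return (
        as_entity.name == name,
        as_attribute.name == name and as_attribute.is_parent,
        as_relationship.parent == name,
    )
```

All three shapes answer the same question about Ann; they differ only in what else they make easy to ask.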
The real structure is just a graph … but
none of our modeling tools are that
flexible, nor is it helpful to think that
abstractly about most software.
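As a rough illustration of "just a graph", the same facts can be stored as edges, with the entity, attribute, and relationship shapes derived on demand. The triple layout, predicate names, and function names below are assumptions for this sketch, not an existing tool.

```python
# Hypothetical facts stored as (subject, predicate, object) edges.
triples = [
    ("ann", "is_a", "person"),
    ("bob", "is_a", "person"),
    ("ann", "parent_of", "bob"),
]


def relationship_view(edges):
    """Pairs of people joined by a "parent" relationship."""
    return [(s, o) for s, p, o in edges if p == "parent_of"]


def attribute_view(edges):
    """Each person with a derived isParent flag."""
    people = [s for s, p, o in edges if p == "is_a" and o == "person"]
    parents = {s for s, p, o in edges if p == "parent_of"}
    return {name: name in parents for name in people}


def entity_view(edges):
    """The people who would populate a "parent" entity."""
    return sorted({s for s, p, o in edges if p == "parent_of"})
```

The flexibility is real, but so is the cost: nothing in the edge list says which of these views the software is supposed to care about.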
Normally, we make the choice based
on our experience and gut feeling, and
pretend there’s a science to it.
But the whole way of thinking is a
convenience based on “records”.
I have no idea what to do about this.
Tools that allow you to view any part of
your model in any of those ways?
This isn’t realistic with today’s tools, so
this is just idle speculation.
#3: prescriptive models
encourage black & white
thinking in a gray world
You have to make decisions (about
entities, attributes, relationships, types)
up front. But sometimes that’s not right.
This is a strength of (some) NoSQL
databases: you can do data first, and
surface structure later.
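A small sketch of "data first, structure later" in plain Python, with no particular NoSQL product assumed. The documents and the surface_structure helper are hypothetical.

```python
from collections import Counter

# Documents stored as-is, with no schema declared up front.
docs = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Bob", "phone": "555-0100"},
    {"name": "Cy", "email": "cy@example.com", "dept": "QA"},
]


def surface_structure(documents):
    """Count how often each field appears across the stored documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.keys())
    return counts


field_counts = surface_structure(docs)

# Fields present in every document are candidates for required attributes.
required = sorted(f for f, n in field_counts.items() if n == len(docs))
```

Here the structure is read off the data after the fact, rather than decided before any data exists.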
Sometimes the deep structure is
actually ambiguous.
This can apply broadly.
(What if an employee isn’t really “in” a department, but has
flexible membership based on where she spends her time?)
You can represent that in a traditional
data model, sure.
But you’re not encouraged to.
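For instance, the flexible-membership case could be sketched like this in Python. The Membership record and the "fraction of time" field are assumptions for illustration; the talk does not prescribe a representation.

```python
from dataclasses import dataclass


@dataclass
class Membership:
    employee: str
    department: str
    fraction: float  # hypothetical: share of the employee's time


memberships = [
    Membership("ann", "engineering", 0.6),
    Membership("ann", "support", 0.4),
    Membership("bob", "support", 1.0),
]


def departments_of(employee, rows):
    """Return every department an employee belongs to, with time share."""
    return {m.department: m.fraction for m in rows if m.employee == employee}
```

The usual single "department" column on the employee record would have forced a choice for Ann that the real world never made.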
#4: static models make the
time dimension unwieldy
Entity models are generally silent on
the ways data changes.
Many modern databases can keep
older versions of objects.
But should they? For which entities?
How many versions? etc.
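A toy sketch of what "how many versions?" means in practice, using only the Python standard library. VersionedStore and its methods are hypothetical names, not the API of any real database.

```python
from collections import defaultdict, deque


class VersionedStore:
    """Keeps up to max_versions values per key, oldest dropped first."""

    def __init__(self, max_versions):
        # A bounded deque discards the oldest entry once it is full.
        self._history = defaultdict(lambda: deque(maxlen=max_versions))

    def put(self, key, value):
        self._history[key].append(value)

    def current(self, key):
        return self._history[key][-1]

    def versions(self, key):
        return list(self._history[key])


store = VersionedStore(max_versions=2)
for title in ["intern", "engineer", "manager"]:
    store.put("ann.title", title)
```

The retention number is a modeling decision, yet it usually lives in database configuration rather than anywhere in the entity model.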
Worse, what about when the model
changes at runtime, and you need to
also retain knowledge of what the old
model was?
As in #3, there are ways to model this
in entity models, but it’s not easy, so
most people just don’t think about it.
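One of those ways, sketched in Python: tag every record with the version of the model it was written under, and keep the old model definitions around. The schema contents and field names here are hypothetical.

```python
# Every version of the model is retained, not just the latest one.
schemas = {
    1: {"name", "department"},
    2: {"name", "departments"},  # hypothetical change: one dept became many
}

# Each record remembers which version of the model it was written under.
records = [
    {"schema": 1, "data": {"name": "Ann", "department": "QA"}},
    {"schema": 2, "data": {"name": "Bob", "departments": ["QA", "Ops"]}},
]


def conforms(record):
    """Check a record against the model version it claims, old or new."""
    return set(record["data"]) == schemas[record["schema"]]
```

It works, but note that it sits outside the entity model itself, which has no built-in place to say any of this.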
#5: boxes & lines aren’t
how we actually think
Our spatial processing of diagrams
doesn’t map well to our temporal,
spatial, and causal comprehension of
data structure.
What do people really do?
Skip making models when their
models look too complicated.
F*** THAT NOISE.
Is there an alternative? Not yet.
What could move the needle?
● Prototype based modeling
● Proper scoping
● Semantic zooming
The map is not the territory.
In conclusion …
if you dig this stuff, let’s talk!
@thefutureian
