Finding and Making Data

     Prof. Alvarado
       MDST 3705
    12 February 2013
Business
• Quizzes by Friday
• Safari Resources
  – When off grounds, use VPN or access from
    the Library web page
  – It should allow you to log on to the resource
Big Data
• What is Big Data?
  – Data produced by governments, corporations,
    scientific instruments, transactions …
  – Captured by databases
• Databases are at the foundation of almost
  all digital products we use
  – Social Media, from Facebook to WordPress
  – Learning Management Systems (e.g. Collab)
  – Video Games and Simulations
  – Maps and Timelines
The Digital Humanities has entered the
            era of Big Data

 Numerous collections of primary and
secondary sources have been digitized
      over the last two decades

 To do scholarship, you need to both
     produce and consume data
Databases
• We can also use relational databases to
  ingest data sets from the wild
• Once they are in the database, we may
  modify them to conform to our own data
  model
• And we may combine them to produce
  new data
• The database becomes a recombinant
  space for creating data mash ups
The database is also a machine for
      making inferences …
This query is an example
of how two tables can be
"joined" into a third table.
It also shows how you can
manipulate the data on
the fly to produce new
results.
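
The query itself appeared as an image on the slide and is not reproduced here. As a minimal sketch of the same idea, the example below joins two small tables and derives a new column on the fly. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the table country_network, the column names, and all the data are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country_debt (Country TEXT, Debt REAL);
CREATE TABLE country_network (Country TEXT, Network TEXT);
INSERT INTO country_debt VALUES ('Atlantis', 120.0), ('Utopia', 45.0);
INSERT INTO country_network VALUES ('Atlantis', 'NetworkA'), ('Utopia', 'NetworkB');
""")

# Join the two tables on the shared Country column. The CASE expression
# computes a value that is stored in neither table (hypothetical threshold).
rows = conn.execute("""
    SELECT d.Country,
           n.Network,
           CASE WHEN d.Debt > 100 THEN 'high' ELSE 'low' END AS debt_band
    FROM country_debt AS d
    JOIN country_network AS n ON n.Country = d.Country
    ORDER BY d.Country
""").fetchall()

for row in rows:
    print(row)
```

The result set behaves like a third table: each row combines fields from both sources plus the computed debt_band column.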
Quick Note
• MySQL uses two kinds of quotes
  – Double and single quotes wrap strings
  – Backticks ( ` ) are sometimes used to wrap
    table and field names
  – E.g. SELECT `Country` FROM `country_debt`
• Backticks allow spaces in field
  and table names
  – But this is bad practice; I do not encourage
    spaces
  – Without spaces in names, backticks are optional
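
A small sketch of the quoting rules, again assuming SQLite (via Python's sqlite3) as a stand-in for MySQL. SQLite accepts backtick-quoted identifiers for MySQL compatibility, but it treats double quotes differently from MySQL, so only single-quoted strings are used here. The `Debt Amount` column and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A column name with a space only works when it is wrapped in backticks.
conn.execute("CREATE TABLE country_debt (Country TEXT, `Debt Amount` REAL)")
conn.execute("INSERT INTO country_debt VALUES ('Atlantis', 120.0)")

# Single quotes wrap the string literal 'Atlantis'.
# Backticks around Country and country_debt are optional: both queries match.
with_ticks = conn.execute(
    "SELECT `Country` FROM `country_debt` WHERE `Country` = 'Atlantis'"
).fetchall()
without_ticks = conn.execute(
    "SELECT Country FROM country_debt WHERE Country = 'Atlantis'"
).fetchall()

# For the name that contains a space, the backticks are required.
spaced = conn.execute("SELECT `Debt Amount` FROM country_debt").fetchall()

print(with_ticks, without_ticks, spaced)
```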
Just as we saw with Aristotle’s
logic, relational databases allow us to
develop ontologies from which we can
draw inferences
We can see that each
table we imported actually
stands for an assertion




(The conclusion in this case is simply a correlation)
I felt like the strategy for database design explained in the
reading on SQL ran quite contrary to my understanding of
the "hacker" mentality, and I think it speaks to the lack
of flexibility in the SQL database system. . . . Database
designers [are] encouraged to map everything out before
even thinking about beginning construction on the actual
database.

This is true – the book does project a planning ethos
at odds with the spirit of hacking and iterative
building. This is as it should be – experienced
programmers and database designers do value
planning. But building databases can be organic and
creative too, especially when the domain being
modeled is not well understood, which is often the
case with the digital humanities.
Remember that in the digital
humanities, we are reverse engineering
          culture from media

 Instead of planning a data model, we
    need to extract and evolve one

 But we can use the tools of database
          design to help us
EXAMPLES OF DATABASES
Database Design, or Making Data
Making data is more than adding
        data to a database

You first have to create the database

 All good databases are based on
models, which we view as knowledge
          representations
Learning MySQL
• Provides the right level of
  information
  – But follows traditional planning
    model
  – Our approach is a bit different
  – Introduces useful vocabulary
• Key idea in Chapter 3 is use
  of Entity Relationship
  Diagrams
  – E-R diagrams
  – I use a simplified version
Database Design
• Process 1 (Planned)
  – Gather requirements
  – Create an ER model – data model
  – Translate into tables – database schema
• Process 2 (Evolved)
  – Gather data
  – Find implicit relations
  – Create new tables
  – Create ER model
  – Translate into tables
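
Process 2 can be sketched roughly as follows, assuming SQLite (via Python's sqlite3) in place of MySQL. The flat raw_debt table, its columns, and its rows are hypothetical stand-ins for a data set gathered from the wild.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Gather data: a flat table, as it might arrive from an import.
CREATE TABLE raw_debt (Country TEXT, Year INTEGER, Amount REAL);
INSERT INTO raw_debt VALUES
    ('Atlantis', 2010, 100.0),
    ('Atlantis', 2011, 120.0),
    ('Utopia',   2010,  45.0);

-- Find implicit relations: Country repeats, so it is really its own entity.
-- Create new tables: one row per distinct country.
CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
INSERT INTO country (name)
    SELECT DISTINCT Country FROM raw_debt ORDER BY Country;

-- Translate the evolved model into tables: debt rows now point at country.
CREATE TABLE debt_of_country (
    country_id INTEGER REFERENCES country(id),
    year INTEGER,
    amount REAL
);
INSERT INTO debt_of_country
    SELECT c.id, r.Year, r.Amount
    FROM raw_debt AS r
    JOIN country AS c ON c.name = r.Country;
""")

countries = conn.execute("SELECT name FROM country ORDER BY name").fetchall()
debt_rows = conn.execute("SELECT COUNT(*) FROM debt_of_country").fetchone()[0]
print(countries, debt_rows)
```

The model is extracted from the data rather than planned in advance: the country table exists only because the imported rows showed that countries repeat.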
The simplest case of two entities with a relationship.
We don't specify the nature of the relationship at this
point. For example, A might stand for PERSON and B
might stand for BOOK, as in PERSON READS BOOK.
This includes the cardinality of the
relationship. A relates to 1 or more (or 0 or
more) of B. For example, PERSON READS
MANY BOOKS.
This shows a Many-to-Many relationship (M:M, or
M:N). MANY PERSONS READ MANY BOOKS.
That is, a given PERSON may read more than one
BOOK, and a given BOOK may be read by more
than one PERSON.
This implies the creation of a third entity, C, to
capture the BOOK / PERSON relationship. We
can think of this as a kind of EVENT -- our
database will capture all instances, say, of
PEOPLE reading BOOKS.
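
As an illustration, the third entity can be built as a table whose rows each record one reading event. This is a sketch that assumes SQLite (via Python's sqlite3) in place of MySQL; the table name reading, the column names, and the sample people and books are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book   (id INTEGER PRIMARY KEY, title TEXT);

-- The third entity C: each row is one PERSON READS BOOK event.
CREATE TABLE reading (
    person_id INTEGER REFERENCES person(id),
    book_id   INTEGER REFERENCES book(id)
);

INSERT INTO person VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO book VALUES (1, 'Book One'), (2, 'Book Two');
INSERT INTO reading VALUES (1, 1), (1, 2), (2, 1);
""")

# One PERSON may read many BOOKS ...
books_by_ann = conn.execute("""
    SELECT b.title
    FROM reading AS r
    JOIN person AS p ON p.id = r.person_id
    JOIN book AS b ON b.id = r.book_id
    WHERE p.name = 'Ann'
    ORDER BY b.title
""").fetchall()

# ... and one BOOK may be read by many PERSONS.
readers_of_book_one = conn.execute("""
    SELECT p.name
    FROM reading AS r
    JOIN person AS p ON p.id = r.person_id
    JOIN book AS b ON b.id = r.book_id
    WHERE b.title = 'Book One'
    ORDER BY p.name
""").fetchall()

print(books_by_ann, readers_of_book_one)
```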
Now, in the case of our two tables,
we have the following implied
model. (The single arrow heads
imply a Subject/Object relation.)
After thinking about this model some, we can see
that COUNTRY actually has a 1:M relationship to
DEBT, since the latter varies by year. (We can
imagine a DEBT table with an AMOUNT field and a
YEAR field.) We also know that each
SOCIALNETWORK can be related to more than
one COUNTRY.
In the end, our model will look something like
this. So we will need to create tables to match
these entities, e.g. COUNTRY,
DEBT_OF_COUNTRY, SOCIALNETWORK,
SOCIALNETWORK_OF_COUNTRY
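
One possible translation of these four entities into tables is sketched below, assuming SQLite (via Python's sqlite3) in place of MySQL. The table names come from the slide; every column name, key choice, and data row is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COUNTRY (id INTEGER PRIMARY KEY, name TEXT UNIQUE);

-- 1:M: one COUNTRY has many yearly debt figures.
CREATE TABLE DEBT_OF_COUNTRY (
    country_id INTEGER REFERENCES COUNTRY(id),
    year INTEGER,
    amount REAL,
    PRIMARY KEY (country_id, year)
);

CREATE TABLE SOCIALNETWORK (id INTEGER PRIMARY KEY, name TEXT UNIQUE);

-- Each SOCIALNETWORK can relate to more than one COUNTRY.
CREATE TABLE SOCIALNETWORK_OF_COUNTRY (
    socialnetwork_id INTEGER REFERENCES SOCIALNETWORK(id),
    country_id INTEGER REFERENCES COUNTRY(id),
    PRIMARY KEY (socialnetwork_id, country_id)
);

INSERT INTO COUNTRY VALUES (1, 'Atlantis'), (2, 'Utopia');
INSERT INTO DEBT_OF_COUNTRY VALUES (1, 2010, 100.0), (1, 2011, 120.0), (2, 2011, 45.0);
INSERT INTO SOCIALNETWORK VALUES (1, 'NetworkA');
INSERT INTO SOCIALNETWORK_OF_COUNTRY VALUES (1, 1), (1, 2);
""")

# Walk the model: network -> country -> that country's debt in one year.
rows = conn.execute("""
    SELECT s.name, c.name, d.amount
    FROM SOCIALNETWORK_OF_COUNTRY AS sc
    JOIN SOCIALNETWORK AS s ON s.id = sc.socialnetwork_id
    JOIN COUNTRY AS c ON c.id = sc.country_id
    JOIN DEBT_OF_COUNTRY AS d ON d.country_id = c.id
    WHERE d.year = 2011
    ORDER BY c.name
""").fetchall()

print(rows)
```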
E-R Rules
• Entities and Attributes
  – Entities are definitions of things that have some
    "integrity"
  – Attributes are like properties of things
  – The difference can be logical or practical
• Relations and Cardinality
  – Relations exist between Entities
  – They are like assertions, e.g. PERSON READS BOOK
  – Relations have "cardinality", which gives clues about
    the data model
• Uniqueness and keys
  – Entities are uniquely defined by certain attributes
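
The uniqueness rule can be illustrated with a primary key, which makes the database reject a second row describing the same entity. This is a sketch that assumes SQLite (via Python's sqlite3) in place of MySQL; the person table and the choice of name plus birth_year as the identifying attributes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption for illustration: a person is uniquely defined by name and birth year.
conn.execute(
    "CREATE TABLE person (name TEXT, birth_year INTEGER, "
    "PRIMARY KEY (name, birth_year))"
)
conn.execute("INSERT INTO person VALUES ('Ann', 1990)")
conn.execute("INSERT INTO person VALUES ('Ann', 1985)")  # a different entity

try:
    # Same key values as the first row, so this is the same entity again.
    conn.execute("INSERT INTO person VALUES ('Ann', 1990)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(duplicate_rejected, row_count)
```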
Mapping ER Diagrams to Tables
Cardinality matters:
1:1      Same table, with exceptions
1:M      Two tables, table A has key
M:1      Two tables, table B has foreign key
M:M      Third table of foreign keys
