This document discusses databases and data modeling. It begins by defining what big data is and how databases underlie digital products. It then discusses how the digital humanities deals with large amounts of digitized information sources. The document goes on to explain how databases can be used to ingest and combine data sets to create new data. It also discusses how databases allow for inferences to be made through relations between tables. Finally, it covers database design, including creating entity relationship diagrams to model data and mapping those diagrams to database tables.
2. Business
• Quizzes by Friday
• Safari Resources
– When off grounds, use VPN or access from
the Library web page
– It should allow you to log on to the resource
3. Big Data
• What is Big Data?
– Data produced by governments, corporations,
scientific instruments, transactions …
– Captured by databases
• Databases are at the foundation of almost
all digital products we use
– Social Media, from Facebook to WordPress
– Learning Management Systems (e.g. Collab)
– Video Games and Simulations
– Maps and Timelines
4. The Digital Humanities has entered the
era of Big Data
Numerous collections of primary and
secondary sources have been digitized
over the last two decades
To do scholarship, you need to both
produce and consume data
22. Databases
• We can also use relational databases to
ingest data sets from the wild
• Once they are in the database, we may
modify them to conform to our own data
model
• And we may combine them to produce
new data
• The database becomes a recombinant
space for creating data mash ups
24. This query is an example
of how two tables can be
"joined" into a third table.
It also shows how you can
manipulate the data on
the fly to produce new
results.
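As a rough sketch of this kind of join (not the query shown in class), the Python below uses the built-in sqlite3 module as a stand-in for MySQL. Only the table name country_debt comes from these slides; the country_network table, all column names, and the values are made-up placeholders.

```python
import sqlite3

# In-memory SQLite stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns and placeholder values, for illustration only.
    CREATE TABLE country_debt (Country TEXT, Debt REAL);
    CREATE TABLE country_network (Country TEXT, Network TEXT);
    INSERT INTO country_debt VALUES ('Atlantis', 120.0), ('Ruritania', 45.5);
    INSERT INTO country_network VALUES ('Atlantis', 'NetA'), ('Ruritania', 'NetB');
""")

# Join the two tables on their shared Country value; the result set is
# in effect a third table. The expression d.Debt * 1000 is computed on
# the fly and is not stored in either source table.
rows = conn.execute("""
    SELECT d.Country, n.Network, d.Debt * 1000 AS DebtScaled
    FROM country_debt AS d
    JOIN country_network AS n ON n.Country = d.Country
    ORDER BY d.Country
""").fetchall()

for row in rows:
    print(row)
```

The join condition is what turns two independent data sets into one combined result; changing the SELECT list changes what "new data" comes out without touching the imported tables.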
25. Quick Note
• MySQL uses two kinds of quotes
– Double and single to wrap strings
– "Backticks" ( ` ) are sometimes used to wrap
table and field names
– E.g. SELECT `Country` FROM `country_debt`
• Backticks are used to allow spaces in field
and table names
– But this is a bad practice; I do not encourage
spaces
– If you avoid spaces, backticks are optional
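A small sketch of the quoting rules, assuming Python's built-in sqlite3 module as a stand-in for MySQL (SQLite accepts backtick-quoted names for MySQL compatibility). The table with a space in its name and the sample value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Names that contain a space must be wrapped in backticks.
conn.execute("CREATE TABLE `country debt` (`Country Name` TEXT)")
conn.execute("INSERT INTO `country debt` VALUES ('Atlantis')")  # single quotes wrap the string
quoted = conn.execute("SELECT `Country Name` FROM `country debt`").fetchall()

# Names without spaces work the same with or without backticks.
conn.execute("CREATE TABLE country_debt (Country TEXT)")
conn.execute("INSERT INTO country_debt VALUES ('Atlantis')")
plain = conn.execute("SELECT Country FROM country_debt").fetchall()
ticked = conn.execute("SELECT `Country` FROM `country_debt`").fetchall()

print(quoted, plain, ticked)
```

Using underscores instead of spaces, as in country_debt, keeps every query readable without any identifier quoting.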
26. Just as we saw with Aristotle’s
logic, relational databases allow us to
develop ontologies from which we can
draw inferences
27. We can see that each
table we imported actually
stands for an assertion
(The conclusion in this case is simply a correlation)
28. I felt like the strategy for database design explained in the
reading on SQL ran quite contrary to my understanding of
the "hacker" mentality, and I think it speaks to the lack
of flexibility in the SQL database system. . . . Database
designers [are] encouraged to map everything out before
even thinking about beginning construction on the actual
database.
This is true – the book does project a planning ethos
at odds with the spirit of hacking and iterative
building. This is as it should be – experienced
programmers and database designers do value
planning. But building databases can be organic and
creative too, especially when the domain being
modeled is not well understood, which is often the
case with the digital humanities.
29. Remember that in the digital
humanities, we are reverse engineering
culture from media
Instead of planning a data model, we
need to extract and evolve one
But we can use the tools of database
design to help us
32. Making data is more than adding
data to a database
You first have to create the database
All good databases are based on
models, which we view as knowledge
representations
33. Learning MySQL
• Provides the right level of
information
– But follows traditional planning
model
– Our approach is a bit different
– Introduces useful vocabulary
• Key idea in Chapter 3 is use
of Entity Relationship
Diagrams
– E-R diagrams
– I use a simplified version
34. Database Design
• Process 1 (Planned)
– Gather requirements
– Create an ER model – data model
– Translate into tables – database schema
• Process 2 (Evolved)
– Gather data
– Find implicit relations
– Create new tables
– Create ER model
– Translate into tables
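Process 2 can be sketched roughly as follows, assuming Python's built-in sqlite3 module as a stand-in for MySQL. The raw_debt table, its columns, and its values are hypothetical; the point is only that a repeated value in imported data hints at an entity that deserves its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Gather data: a flat import (hypothetical columns, made-up values).
    CREATE TABLE raw_debt (Country TEXT, Year INTEGER, Amount REAL);
    INSERT INTO raw_debt VALUES
        ('Atlantis', 2010, 100.0),
        ('Atlantis', 2011, 120.0),
        ('Ruritania', 2011, 45.5);

    -- Find implicit relations: Country repeats, so it is really an
    -- entity of its own. Create a new table for it.
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    INSERT INTO country (name)
        SELECT DISTINCT Country FROM raw_debt ORDER BY Country;

    -- Rebuild the debt rows so they point at the new country table.
    CREATE TABLE debt_of_country (
        country_id INTEGER REFERENCES country(id),
        year INTEGER,
        amount REAL
    );
    INSERT INTO debt_of_country
        SELECT c.id, r.Year, r.Amount
        FROM raw_debt AS r
        JOIN country AS c ON c.name = r.Country;
""")

countries = conn.execute("SELECT name FROM country ORDER BY name").fetchall()
debt_rows = conn.execute("SELECT COUNT(*) FROM debt_of_country").fetchone()[0]
print(countries, debt_rows)
```

Here the model is extracted from the data after the fact, and the ER diagram would be drawn from the tables that emerge rather than the other way around.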
35. The simplest case of two entities with a relationship.
We don't specify the nature of the relationship at this
point. For example, A might stand for PERSON and B
might stand for BOOK, as in PERSON READS BOOK.
36. This includes the cardinality of the
relationship. A relates to 1 or more (or 0 or
more) of B. For example, PERSON READS
MANY BOOKS.
37. This shows a Many-to-Many relationship (M:M, or
M:N). MANY PERSONS READ MANY BOOKS.
That is, a given PERSON may read more than one
BOOK, and a given BOOK may be read by more
than one PERSON.
38. This implies the creation of a third entity, C, to
capture the BOOK / PERSON relationship. We
can think of this as a kind of EVENT -- our
database will capture all instances, say, of
PEOPLE reading BOOKS.
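A minimal sketch of the third entity, assuming Python's built-in sqlite3 module as a stand-in for MySQL. The table name reading, the column names, and the sample people and titles are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);

    -- Entity C: one row per reading event, pointing at one person
    -- and one book.
    CREATE TABLE reading (
        person_id INTEGER REFERENCES person(id),
        book_id INTEGER REFERENCES book(id)
    );

    INSERT INTO person VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO book VALUES (1, 'Book One'), (2, 'Book Two');
    INSERT INTO reading VALUES (1, 1), (1, 2), (2, 1);
""")

# A given PERSON may read more than one BOOK ...
books_of_ann = conn.execute("""
    SELECT b.title FROM reading AS r
    JOIN book AS b ON b.id = r.book_id
    WHERE r.person_id = 1 ORDER BY b.title
""").fetchall()

# ... and a given BOOK may be read by more than one PERSON.
readers_of_book_one = conn.execute("""
    SELECT p.name FROM reading AS r
    JOIN person AS p ON p.id = r.person_id
    WHERE r.book_id = 1 ORDER BY p.name
""").fetchall()

print(books_of_ann, readers_of_book_one)
```

Because each row of reading is an event, it is also a natural place for attributes of the event itself, such as a date.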
39. Now, in the case of our two tables,
we have the following implied
model. (The single arrow heads
imply a Subject/Object relation.)
40. After thinking about this model some, we can see
that COUNTRY actually has a 1:M relationship to
DEBT, since the latter varies by year. (We can
imagine a DEBT table with an AMOUNT field and a
YEAR field.) We also know that each
SOCIALNETWORK can be related to more than
one COUNTRY.
41. In the end, our model will look something like
this. So we will need to create tables to match
these entities, e.g. COUNTRY,
DEBT_OF_COUNTRY, SOCIALNETWORK,
SOCIALNETWORK_OF_COUNTRY
42. E-R Rules
• Entities and Attributes
– Entities are definitions of things that have some
"integrity"
– Attributes are like properties of things
– The difference can be logical or practical
• Relations and Cardinality
– Relations exist between Entities
– They are like assertions: PERSON READS BOOK
– Relations have "cardinality" which gives clues about
the data model
• Uniqueness and keys
– Entities are uniquely defined by certain attributes
43. Mapping ER Diagrams to Tables
Cardinality matters:
1:1 Same table, with exceptions
1:M Two tables, table B (the "many" side) has a foreign key to A
M:1 Two tables, table A (the "many" side) has a foreign key to B
M:M Third table of foreign keys
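Applied to the COUNTRY / DEBT / SOCIALNETWORK model from the earlier slides, these rules might produce tables like the ones below. This is a sketch that assumes Python's built-in sqlite3 module as a stand-in for MySQL; the column names, the choice of primary keys, and the sample values are assumptions, not the schema used in class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- SQLite only enforces foreign keys when asked to.
    PRAGMA foreign_keys = ON;

    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE socialnetwork (id INTEGER PRIMARY KEY, name TEXT UNIQUE);

    -- 1:M rule: the "many" side (debt rows) carries the foreign key.
    CREATE TABLE debt_of_country (
        country_id INTEGER NOT NULL REFERENCES country(id),
        year INTEGER NOT NULL,
        amount REAL,
        PRIMARY KEY (country_id, year)
    );

    -- M:M rule: a third table made of foreign keys.
    CREATE TABLE socialnetwork_of_country (
        country_id INTEGER NOT NULL REFERENCES country(id),
        socialnetwork_id INTEGER NOT NULL REFERENCES socialnetwork(id),
        PRIMARY KEY (country_id, socialnetwork_id)
    );

    -- Made-up placeholder values.
    INSERT INTO country VALUES (1, 'Atlantis');
    INSERT INTO debt_of_country VALUES (1, 2010, 100.0), (1, 2011, 120.0);
""")

# One country, many debt rows (one per year).
years = conn.execute(
    "SELECT year FROM debt_of_country WHERE country_id = 1 ORDER BY year"
).fetchall()

# A debt row for a country that does not exist is refused.
try:
    conn.execute("INSERT INTO debt_of_country VALUES (99, 2010, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(years, rejected)
```

The foreign keys are what let the database itself guard the relations drawn in the ER diagram.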