Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard
Slides for the second workshop on programming in digital humanities through the University of Washington's Demystifying Digital Humanities project.

Winter 2014: Session #2
Programming on the Whiteboard
(Paige Morgan, Sarah Kremen-Hicks, Brian Gutierrez)

Previously, at DMDH...
• The work of creating usable data
• Forms that this data might take:
  • markup language
  • spreadsheets

Workshop #2
• Caveat Curator (challenges of working with data)
• Programming on the whiteboard, i.e., conceptualizing the specific steps that you need to take to accomplish your goals

Why this focus on data?
• Understanding your data, and your intended actions, is a key skill for working with any programming language or platform.
• This is true whether you are the programmer or whether you are working with professional programmers.

Programming languages are like human languages in that they both have phrases, patterns, and rules.

Programming languages are unlike human languages in that they aren’t for communicating with people.

They are also unlike human languages in that every programming utterance does something, i.e., causes an action to occur.

You can get used to patterns – even unfamiliar ones.

The shift is in getting used to thinking in terms of every single action.

Our subject matter today is all the actions that you’ll need to think about before you work with...

Image: Josh Lee, @wtrsld, via Twitter, January 2014.

Even when you’re just experimenting, you need to prep your data.

You may know your dataset in detail already, from your research -- but your computer is concerned with different levels of detail.

Becoming aware of those levels of detail is not only helpful for your project ideas...

...it’s also a useful skill for working with programming languages (where a stray /> or ; can break your program/website).
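
A minimal Python sketch of that fragility, using the standard library's `xml.etree.ElementTree`: the two fragments below are identical except for one missing closing tag, and only the first parses. The `<poem>` and `<line>` tag names are illustrative, not taken from any particular schema, and `parses` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def parses(markup: str) -> bool:
    """Return True if the string is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Hypothetical fragments: identical except for one missing </line> tag.
good = "<poem><line>Sir Walter Vivian all a summer's day</line></poem>"
bad = "<poem><line>Sir Walter Vivian all a summer's day</poem>"

print(parses(good))  # True
print(parses(bad))   # False
```

The computer does not guess what you meant: one absent tag is enough to reject the whole document.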

Caveat Curator

Data only works if your computer can read it.

But my data is just text! (Isn’t that easy?)

(Remember, your computer is fairly stupid.)

Formatted text is often full of characters your computer can’t parse correctly.

The┘re┘sÜlt ís that yoÜr te┘xt might come┘ oÜt looking like┘this whe┘n yoÜ ope┘n it in a programming e┘nvironme┘nt.
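
One common way this happens (an assumption here; the slide does not say which encodings were involved) is that text saved as UTF-8 is opened as Windows-1252. A minimal Python sketch that reproduces that specific mix-up; `misread` and `repair` are hypothetical helper names, and other encoding pairs produce different garbage characters:

```python
def misread(text: str) -> str:
    """Encode as UTF-8, then wrongly decode those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")

def repair(garbled: str) -> str:
    """Undo that specific mistake by reversing both steps."""
    return garbled.encode("cp1252").decode("utf-8")

original = "The résult’s fine"
garbled = misread(original)
print(garbled)          # The rÃ©sultâ€™s fine
print(repair(garbled))  # The résult’s fine
```

The repair only works when you know exactly which mismatch occurred, which is why it is safer to fix the encoding at the source.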

So you need to convert it to plain text (without any of the fancy details encoded in MS Word fonts).
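
A minimal Python sketch of one way to do this conversion, assuming the main culprits are word-processor typography (curly quotes, dashes, ellipses, non-breaking spaces). The replacement table is illustrative, not exhaustive, and `to_plain_text` is a hypothetical helper:

```python
import unicodedata

# Word-processor typography mapped to plain ASCII (illustrative, not exhaustive).
REPLACEMENTS = {
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote / apostrophe
    "\u201c": '"',    # left double curly quote
    "\u201d": '"',    # right double curly quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
}

def to_plain_text(text: str) -> str:
    """Normalize Unicode, then swap typographic characters for plain ones."""
    # NFC composes a letter plus a combining accent into one code point,
    # so "e" followed by a combining acute accent becomes a single "é".
    text = unicodedata.normalize("NFC", text)
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    return text
```

Keeping the substitutions in one table also gives you a record of exactly what was changed, which helps when you later need to explain your cleaning steps.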

But even that can produce unexpected errors.

Maybe you want to work with sailing data and ports of call:

The ship you’re interested in leaves the Ivory Coast for St. Helena...

But when you create your map, you get this:

The latitude/longitude coordinate is the significant datum.

The city name is just the human-readable component.

Each datum needs to be unique.
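
A minimal Python sketch of why the coordinate, not the name, should be the key. It assumes the problem is a place name matching more than one location; the coordinates are approximate and included only for illustration, and `ambiguous_names` is a hypothetical helper:

```python
from collections import defaultdict

# Illustrative records: (human-readable name, latitude, longitude).
# Approximate coordinates, used only to show two places sharing one name.
ports = [
    ("St. Helena", -15.93, -5.72),   # the South Atlantic island (approx.)
    ("St. Helena", 38.51, -122.47),  # St. Helena, California (approx.)
]

# Keyed by name, the second record silently overwrites the first.
by_name = {name: (lat, lon) for name, lat, lon in ports}

# Keyed by coordinate, both places survive; the name is just a label.
by_coord = {(lat, lon): name for name, lat, lon in ports}

def ambiguous_names(records):
    """Return names that are attached to more than one distinct coordinate."""
    coords = defaultdict(set)
    for name, lat, lon in records:
        coords[name].add((lat, lon))
    return sorted(name for name, points in coords.items() if len(points) > 1)

print(len(by_name), len(by_coord))  # 1 2
print(ambiguous_names(ports))       # ['St. Helena']
```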

Figuring out what sort of unique configuration will work best involves at least some experimentation.

To experiment effectively, you’ll want to keep careful records.

If you develop categories of information, you’ll want to keep a record of what each category means, and what its limits are.
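
One lightweight way to keep that record is a data dictionary that scripts can also check against. A minimal Python sketch; the categories, definitions, and allowed values are hypothetical examples, as is the `check_record` helper:

```python
# Hypothetical data dictionary: each category maps to its definition
# and the values it is allowed to take.
DATA_DICTIONARY = {
    "event_genre": {
        "definition": "Kind of visual/cultural event held at the place",
        "allowed": {"gallery", "play", "pleasure garden", "fireworks"},
    },
    "source_type": {
        "definition": "Where the reference was found",
        "allowed": {"poem", "newspaper", "periodical"},
    },
}

def check_record(record: dict) -> list[str]:
    """Return a list of problems: undefined categories or out-of-range values."""
    problems = []
    for category, value in record.items():
        entry = DATA_DICTIONARY.get(category)
        if entry is None:
            problems.append(f"undefined category: {category}")
        elif value not in entry["allowed"]:
            problems.append(f"{category}: {value!r} is not an allowed value")
    return problems
```

The definitions document each category for people; the allowed sets enforce its limits for the computer.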

Cleaning and structuring your data is a foundation issue that changes, depending on the available format of your data.

What if your data is crowdsourced?

You can require a particular format for submissions.

You can even put programmatic limits on the formats available for submission.
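
A minimal Python sketch of such limits, assuming submissions arrive as a date string plus latitude and longitude. The rules and the `validate_submission` name are hypothetical; the date check only tests the shape YYYY-MM-DD, not whether the date exists:

```python
import re

# Hypothetical rule: dates must have the shape YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def validate_submission(date: str, lat: str, lon: str) -> list[str]:
    """Check a crowdsourced entry before it is stored; return the errors found."""
    errors = []
    if not DATE_PATTERN.fullmatch(date.strip()):
        errors.append("date must look like YYYY-MM-DD")
    try:
        if not -90 <= float(lat) <= 90:
            errors.append("latitude must be between -90 and 90")
        if not -180 <= float(lon) <= 180:
            errors.append("longitude must be between -180 and 180")
    except ValueError:
        errors.append("latitude and longitude must be numbers")
    return errors
```

An empty list means the entry passes; anything else can be shown back to the contributor before the data reaches you.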

But in the end, you’re still going to need to scrub and/or format.
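
Even validated data usually needs a scrubbing pass. A minimal Python sketch, assuming a CSV with a hypothetical `place` column: it trims stray whitespace, normalizes capitalization, and drops rows that become exact duplicates. `scrub_rows` is a hypothetical helper.

```python
import csv
import io

def scrub_rows(raw_csv: str) -> list[dict]:
    """Trim whitespace, title-case the 'place' column, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Strip whitespace from headers and values; missing values become "".
        row = {key.strip(): (value or "").strip() for key, value in row.items()}
        row["place"] = row["place"].title()
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(row)
    return cleaned
```

Automatic cleaning brings its own surprises: Python's `str.title()` turns "sadler's wells" into "Sadler'S Wells", which is exactly why scrubbed output still needs checking by eye.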

This is true even for data from supposedly reputable sources, like government or media organizations.

Example: Doctor Who Villains dataset, http://tinyurl.com/doctorwhovillains

This step is no fun!

But it’s absolutely necessary.

What does a baby computer call his father? “Data.”
Break!

Working with “little data”: GIS and the Spatial Turn

GIS technology has paved the way for analyzing qualitative data associated with cultural experiences.

“A good map is worth a thousand words, cartographers say, and they are right: because it produces a thousand words: it raises doubts, ideas. It poses new questions, and forces you to look for new answers.” (Moretti 1998, 3–4)

Literary texts are filled with subjective spatial data: an author's or character's articulation of geographically located dwellings, urban and rural landscapes, and performance spaces.

Project: Mapping William Wordsworth's Conspicuous Consumption in The Prelude (Brian R. Gutierrez)

Objective: to map the visual culture events referenced in Wordsworth’s autobiographical poem The Prelude (as well as the ones not referenced)

Problem to solve: Prove that literary galleries, specifically Joseph Boydell’s “Shakespeare Gallery,” shaped the dramaturgical choices in the only play written by Wordsworth. He reads Shakespeare not through a personal copy of the play, but through the visual and performative texts of that time.

Data: place-names, indirect references, and all non-referenced visual cultural events

Access to data: Project Gutenberg, digital archive of British newspapers and periodicals

What to do with that data? Map it!!

First data set: Literary spatial articulations

Wordsworth mentions the following place names and references:
"Oh wonderous power of words, how sweet they are / According to the meaning which they bring-- / Vauxhall and Ranelagh, I then had heard / Of your green groves and wilderness of lamps, / Your gorgeous ladies, fairy cataracts, / And pageant fireworks" (119-125)
"Half-rural Sadler's Wells" (267)

First, I need to know what and where these places were in order to identify them as spatial data. Ex: Vauxhall and Ranelagh

Second, if I'm interested in visual cultural experiences, I need to identify what kind of event occurred there: gallery, play, etc.

Third, how would I access the data? Answer: place-names in a book are not under any copyright. However, if I wanted to include sections from the text when a viewer clicks on that place name, then I would have to think about copyright; but the text is on Project Gutenberg, so that's covered.

Fourth, I would have to locate any indirect reference to visual cultural phenomena. Ex: Wordsworth mentions two actresses by name, Mary Robinson and Sarah Siddons. Since I cannot map a person, I need to investigate which plays they were in, and at which theaters, during that moment of his life (it's an autobiography).
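
The step from person to place can be thought of as a lookup (a join) across tables. A minimal Python sketch; the tables, the `mappable_points` helper, and every play, theatre, year, and coordinate value are hypothetical placeholders standing in for what the newspaper and periodical research would supply:

```python
# Hypothetical lookup tables. Real entries would come from playbills,
# newspapers, and periodicals; all play, theatre, and year values are placeholders.
performances = [
    {"performer": "Sarah Siddons", "play": "PLAY_A", "theatre": "THEATRE_X", "year": 1791},
    {"performer": "Mary Robinson", "play": "PLAY_B", "theatre": "THEATRE_Y", "year": 1791},
    {"performer": "Sarah Siddons", "play": "PLAY_C", "theatre": "THEATRE_X", "year": 1795},
]
theatre_coords = {"THEATRE_X": (0.0, 0.0), "THEATRE_Y": (1.0, 1.0)}  # placeholders

def mappable_points(performer: str, start: int, end: int):
    """Turn a person (not mappable) into theatre locations within a span of years."""
    points = []
    for p in performances:
        if p["performer"] == performer and start <= p["year"] <= end:
            lat, lon = theatre_coords[p["theatre"]]
            points.append((p["theatre"], p["play"], lat, lon))
    return points
```

The year range stands in for "that moment of his life": narrowing it changes which theatres end up on the map.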

Fifth, I need to research what special events were occurring at other places he mentions. For that, I look to The Times (newspapers) and various periodicals.

Sixth, because I am going to create a map using ArcGIS, I need to put my data in an Excel spreadsheet so that it can be read by the program.

What is the relationship between the data?

Analyze the qualitative data.
Humanist skill = DHumanist skill

Programming on the whiteboard involves looking at the categories of information, and thinking about how they interact.

Categories
• Place names
• Poetic lines
• Genre of visual/cultural event
• Spatial data (latitude/longitude)
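
Once the categories are fixed, each one can become a column. A minimal Python sketch that writes rows as CSV, a format Excel opens; whether a given ArcGIS version imports it directly is worth checking. The `PlaceReference` class, its field names, and `rows_to_csv` are hypothetical:

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

@dataclass
class PlaceReference:
    """One spreadsheet row, one field per category."""
    place_name: str    # human-readable label
    poetic_line: str   # the line(s) of the poem that mention the place
    event_genre: str   # genre of visual/cultural event
    latitude: float    # spatial data: the significant datum for the map
    longitude: float

def rows_to_csv(rows: list[PlaceReference]) -> str:
    """Return the rows as CSV text, with a header row of column names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(PlaceReference)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()
```

To save the result, write the returned string to a file opened with `open(path, "w", newline="", encoding="utf-8")`. Fields that contain commas, such as poetic lines, are quoted automatically.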

Return to the source of original data (the literary text) to examine how the author is describing these phenomena.

Why use ArcGIS?

Benefits of ArcGIS
• It allows the overlay of historical maps
• Trainings were available and accessible (through DHSI and UW courses)
• As a software program, ArcGIS is established enough to be considered robust
• Available through the UW software suite

Disadvantages of ArcGIS
• Available only for PCs
• Proprietary file format (even if input data is open-access, the end result is not)
• Available only on an annual subscription model (and prohibitively expensive for scholars without campus-granted access)

In Franco Moretti’s Atlas of the European Novel 1800-1900 (1998), he calls for a “literary geography,” predicated on the creation of “readerly maps” and the use of those maps as analytical tools.

Caveats? The pursuit of mapping data may exclude complex social spaces (e.g., gendered domestic environments).

Caveats? Cartographical representations should not be divorced from their primary texts.

Project: Visualizing Prosody (Sarah Kremen-Hicks)
x / |x /|xx / | x / |x /
Sir Walter Vivian all a summer's day
/ x | / x | x / | x / | x /
Gave his broad lawns until the set of sun

Marking up a poem for metrical scansion is encoding it with data. What can a computer do with that data?

Computers are good at counting things – like iambs.

Is it possible to predict deviations from a metrical norm based on author or lyric classification?

Will authors show a tendency toward particular types of metrical substitution?

Prepping the Data
• For proof of concept, start with one author (Alfred, Lord Tennyson)
• Get Tennyson’s poems from Project Gutenberg
• Hand-mark representative poems for prosody

Programming on the Whiteboard
What should the computer do?

Computer tasks
• Count feet per line
• Recognize | as a foot boundary
• Recognize carriage return as a line boundary
• Supply foot boundaries at beginning/end of lines
• Count the number of areas contained within foot boundaries for each line
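
A minimal Python sketch of these first tasks, assuming the scansion marks are stored as plain text on their own (without the verse lines) using the slides' conventions: `x` unstressed, `/` stressed, `|` between feet, one line of marks per line of verse. `feet_per_line` is a hypothetical helper; it treats the start and end of each line as implicit foot boundaries:

```python
def feet_per_line(scansion: str) -> list[int]:
    """Count metrical feet on each line of a scansion string.

    '|' separates feet and a newline separates lines; the start and end of a
    line act as foot boundaries, so four '|' marks on a line mean five feet.
    """
    counts = []
    for line in scansion.splitlines():
        if not line.strip():
            continue  # skip blank lines (e.g., between stanzas)
        # Each non-empty area between boundaries is one foot.
        feet = [foot for foot in line.split("|") if foot.strip()]
        counts.append(len(feet))
    return counts

marks = "x / |x /|xx / | x / |x /\n/ x | / x | x / | x / | x /"
print(feet_per_line(marks))  # [5, 5]
```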

These steps involve recognizing each metrical foot as a unit that contains particular accentual-syllabic data.
x / |x /|xx / | x / |x /
Sir Walter Vivian all a summer's day

Computer tasks, cont’d.
• Identify the most common number of feet per line
• Supply a report on lines (by number) that deviate
• Calculate rate of deviation/adherence
• Mode = paradigm
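
A minimal Python sketch of "mode = paradigm" for line length, taking per-line foot counts as input. `line_length_report` is a hypothetical helper; when two counts tie, `Counter.most_common` returns the one encountered first, which is a choice worth revisiting:

```python
from collections import Counter

def line_length_report(counts: list[int]):
    """Treat the most common feet-per-line count as the paradigm.

    Returns the paradigm, the 1-based numbers of lines that deviate from it,
    and the adherence rate as a fraction of all lines.
    """
    paradigm, _ = Counter(counts).most_common(1)[0]
    deviant = [n for n, c in enumerate(counts, start=1) if c != paradigm]
    adherence = 1 - len(deviant) / len(counts)
    return paradigm, deviant, adherence

print(line_length_report([5, 5, 4, 5]))  # (5, [3], 0.75)
```

The deviation rate is simply `1 - adherence`.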

After recognizing the foot as a unit, the computer can calculate what patterns of data each foot contains.

Computer tasks, cont’d.
• Identify the most common foot type
• Identify markings within foot boundaries
• Compare markings to foot dictionary to identify type
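
A minimal Python sketch of a foot dictionary, assuming the same `x` and `/` marks. The table lists common foot names only, and `classify_foot` and `classify_line` are hypothetical helpers:

```python
# Foot dictionary: stress patterns ('x' unstressed, '/' stressed) mapped to
# standard foot names. Common feet only; extend as needed.
FOOT_TYPES = {
    "x/": "iamb",
    "/x": "trochee",
    "xx/": "anapest",
    "/xx": "dactyl",
    "//": "spondee",
    "xx": "pyrrhic",
}

def classify_foot(marks: str) -> str:
    """Remove spaces from one foot's markings and look up the pattern."""
    pattern = "".join(marks.split())
    return FOOT_TYPES.get(pattern, "unknown")

def classify_line(line: str) -> list[str]:
    """Split a line of marks at '|' and classify each foot."""
    return [classify_foot(foot) for foot in line.split("|") if foot.strip()]

print(classify_line("x / |x /|xx / | x / |x /"))
```

Applied to the Tennyson line, this returns iamb, iamb, anapest, iamb, iamb, which agrees with the hand analysis of a third-foot anapestic substitution. Patterns missing from the table come back as "unknown" rather than being silently misfiled.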

These tasks identify each line as a unit composed of one or more feet.
x / |x /|xx / | x / |x /
Sir Walter Vivian all a summer's day
(iambic pentameter with third-foot anapestic substitution)

Still more computing tasks!
• Identify the most common foot type within a poem
• Supply a report on feet (by line and foot number) that deviate
• Calculate rate of deviation/adherence
• Mode = paradigm
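
Scaling up from the line to the poem, a minimal Python sketch that takes foot names per line (for example, the output of a classifier like the one the slides describe) and reports the dominant foot plus every deviating foot by line and foot number. `foot_report` is a hypothetical helper:

```python
from collections import Counter

def foot_report(classified_lines: list[list[str]]):
    """Find the dominant foot in a poem and list every foot that deviates.

    Deviations are (line number, foot number, foot type), both numbers 1-based.
    """
    all_feet = [foot for line in classified_lines for foot in line]
    paradigm, _ = Counter(all_feet).most_common(1)[0]  # mode = paradigm
    deviations = [
        (line_no, foot_no, foot)
        for line_no, line in enumerate(classified_lines, start=1)
        for foot_no, foot in enumerate(line, start=1)
        if foot != paradigm
    ]
    adherence = 1 - len(deviations) / len(all_feet)
    return paradigm, deviations, adherence
```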

Just as the feet contain patterns, the lines contain patterns that can be analyzed as well.

Still more computing tasks!
• Report on types of deviations arranged by most to least common
• Information should include location (line/foot number), as well as prevalence of substitution type
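
A minimal Python sketch of that report, assuming deviations are recorded as (line number, foot number, foot type) tuples. `deviation_summary` is a hypothetical helper; "prevalence" is taken here to mean each type's share of all deviations, which is one reading among several:

```python
from collections import Counter, defaultdict

def deviation_summary(deviations):
    """Group (line, foot, type) deviations by type, most common type first.

    Returns a list of (type, count, share of all deviations, locations).
    """
    counts = Counter(foot_type for _, _, foot_type in deviations)
    locations = defaultdict(list)
    for line_no, foot_no, foot_type in deviations:
        locations[foot_type].append((line_no, foot_no))
    total = len(deviations)
    return [
        (foot_type, n, n / total, locations[foot_type])
        for foot_type, n in counts.most_common()
    ]
```

Comparing these summaries across poems, and later across authors, is where any author-specific patterns would show up.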

Deviations and their placement within each line and each poem should display certain patterns unique to each author (I hope!)

Current status: I’m investigating using the Natural Language Toolkit to tokenize each foot and to establish syllables, feet, and lines as a unique hierarchy.

Applicable Values
• Iterative development
• Failure as valuable
• Collaboration

If you are thinking about your data, and the tasks that you need to accomplish, then it’s easier to determine what sort of language or platform your project needs.

There are countless tutorials, online courses, etc., for almost any programming language or platform. (We’re giving you a cheat sheet, too; and http://www.dmdh.org is your friend. So is Google.)

Learning them can be a slow process, especially at first.

However, knowing what tasks you’re working towards makes it easier to understand the purpose of the introductory lessons.

It’s also easier to think about how the first rules you learn for any language or platform might affect your goals.

And now, it’s your turn...

For this activity, we recommend that you pair up, or form small groups to work together.

Group Activity
• What do you need to do with your data?
• What units might that data exist in?
• What categories do you need to create?
• What relationships need to exist between the units and categories?

Spring Workshops!
• Project Ideation and Development
• April 5th and April 26th (advance registration for DMDH participants at the end of Winter Quarter)

DMDH content is developed by Paige Morgan, Sarah Kremen-Hicks, and Brian Gutierrez, with generous support from the Simpson Center for the Humanities at the University of Washington. Content is available under a Creative Commons Attribution-NonCommercial 3.0 Unported License. Please contact Paige at paigecm@uw.edu with questions.