DMDH Winter 2015 Session #1:
Exploring Programming in the Digital
Humanities
Programming is complex
enough that just figuring out
what you want to do and
what sort of language you
need is work.
Thinking that you ought to be able
to do everything almost immediately
is a recipe for feeling terrible.
Being aware that it is
genuine work, and not just
work for newbies, matters.
There will always be new
programs and platforms that
you will want to experiment
with.
Working with technology
means periodically starting
from scratch -- a bit like
working with a new time
period or culture; or figuring
out how to teach a new
class.
What can programming
languages do?
Programming languages can...
They can also do all these
things in combination.
Example #1
• find all the statements in quotes ("") from a
novel.
• count how many words are in each statement
• put the statements in order from fewest
words to most
• write all the statements from the novel in a text
file
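The four steps of Example #1 could be sketched in Python roughly as follows. This is a minimal illustration, not a finished tool: the sample sentence, the function names, and the output filename `statements.txt` are all made up for the example, and it assumes that every statement opens and closes with a double quotation mark (straight or curly).

```python
import re
import tempfile
from pathlib import Path

def quoted_statements(text):
    """Find every passage enclosed in straight or curly double quotes."""
    return re.findall(r'["“]([^"”]+)["”]', text)

def sort_by_word_count(statements):
    """Order statements from fewest words to most."""
    return sorted(statements, key=lambda s: len(s.split()))

# Hypothetical stand-in for the text of a novel.
sample = ('She said, "Come in." He replied, '
          '"I would rather stay out here tonight." "No," she said.')

statements = sort_by_word_count(quoted_statements(sample))

# Write one statement per line to a text file (hypothetical filename,
# placed in the system's temporary folder).
out_path = Path(tempfile.gettempdir()) / "statements.txt"
out_path.write_text("\n".join(statements) + "\n", encoding="utf-8")
```

Even in this tiny sketch, a human decision is built in: a "word" is whatever is separated by spaces, so "No," counts as one word, comma and all.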
Example #2
• allow a user to type in some information, e.g.,
"Benedict Cumberbatch"
• compare “Benedict Cumberbatch” to a much
larger file
• retrieve any data that matches the information
• print the retrieved information on screen
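A rough Python sketch of Example #2 might look like the following. The `records` list is an invented stand-in for the "much larger file", and `find_matches` is a hypothetical helper name. In a real program the query would come from the user, for instance via Python's built-in `input()`; here it is written directly into the code so the sketch runs without anyone typing.

```python
def find_matches(query, lines):
    """Return (line number, line) pairs whose text contains the query,
    ignoring differences in capitalization."""
    needle = query.casefold()
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if needle in line.casefold()
    ]

# Invented sample records standing in for a much larger file.
records = [
    "Benedict Cumberbatch | actor | Sherlock",
    "Martin Freeman | actor | Sherlock",
    "benedict cumberbatch | actor | The Imitation Game",
]

query = "Benedict Cumberbatch"  # would normally be typed in by the user
for number, line in find_matches(query, records):
    print(number, line)
```

Note that the sketch has to decide what "matches" means (here: the whole line is retrieved, and capitalization is ignored). Those are choices made by the programmer, not by the computer.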
Example #3
• "read" two texts -- say, two plays by Seneca
• search for any words that the two plays have in
common
• print the words that they have in common on
screen
• calculate what percentage of the words in each play
are shared
• print that percentage onscreen
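Example #3 could be sketched along these lines. The two short strings are invented placeholders rather than actual text from Seneca, and the sketch assumes that "percentage of the words" means the share of each play's distinct vocabulary that also appears in the other play. Counting every occurrence of a word instead would be an equally defensible choice and would give different numbers.

```python
import re

def vocabulary(text):
    """Return the set of distinct lowercase words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def shared_words(text_a, text_b):
    """Return the shared vocabulary and the percentage of each
    text's distinct words that the other text also uses."""
    vocab_a, vocab_b = vocabulary(text_a), vocabulary(text_b)
    common = vocab_a & vocab_b
    pct_a = 100 * len(common) / len(vocab_a) if vocab_a else 0.0
    pct_b = 100 * len(common) / len(vocab_b) if vocab_b else 0.0
    return common, pct_a, pct_b

# Invented placeholder texts (not real lines from either play).
play_a = "the king fears the gods"
play_b = "the gods punish the proud king and queen"

common, pct_a, pct_b = shared_words(play_a, play_b)
print(sorted(common))
print(f"{pct_a:.1f}% of play A's vocabulary is shared")
print(f"{pct_b:.1f}% of play B's vocabulary is shared")
```

The pattern `[a-z]+` only recognizes unaccented Latin letters, which is one more place where the data (and the language it is written in) shapes what the program can do.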
Example #4
• if the user is located in geographic location
Z, e.g., 45th and University, go to an online
address and retrieve some text
• print that text on the user’s tablet screen
• receive input from the user and respond
However...
• In Example #1, the computer is focusing on things
that characters say. But what if you want to isolate
speeches from just one character?
• In Example #2, how does the computer know how
much text to print? Will it just print "Benedict
Cumberbatch" 379 times, because that's how often
it appears in the larger file?
These are the areas of
programming where critical
thinking and humanities
skills become vital.
The Difference
• Humans are good at differentiating
between material in complex and
sophisticated ways.
• Computers are good at not differentiating
between material unless they’ve been
specifically instructed to do so.
Computers work with data.
You work with data, too -- but in most cases,
you'll have to make your data readable by
computer.
How to make your data
machine-readable
• Annotate it with markup language
• Organize it in patterns that the computer
can understand
• Add data that is not explicitly readable in
the current format (e.g.,
hardbound/softbound binding;
language: English; date of record creation)
Depending on the data you
have, and the way you
annotate or structure it,
different things become
possible.
For instance, sometimes it
may be enough to know
that a tile has an area of
9 sq. in. But sometimes you
need to know that it
measures 3” x 3”.
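To make the tile example concrete, here is a small Python sketch. The `Tile` class and its field names are hypothetical, and the sketch assumes the figure of 9 refers to area in square inches. It shows that two differently shaped tiles can share the same area, so a record that stores only the area cannot tell them apart.

```python
from dataclasses import dataclass

@dataclass
class Tile:
    """A tile recorded by its two dimensions, in inches."""
    width_in: float
    height_in: float

    @property
    def area_sq_in(self):
        # The area can always be derived from the dimensions...
        return self.width_in * self.height_in

square = Tile(3, 3)
strip = Tile(1, 9)

# ...but the dimensions cannot be recovered from the area alone.
print(square.area_sq_in, strip.area_sq_in)
print(square == strip)
```

Storing width and height keeps both questions answerable; storing only the area is simpler but throws information away.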
Your goal is to make the data As
Simple As Possible -- but not so
simple that it stops being useful.
Depending on the data you
work with, the work of
structuring or annotating
becomes more challenging,
but also more useful.
The work of creating data is
social.
In other words, how can
others use it?
Many programming and markup languages
have governing bodies that establish
standards for their use:
• the World Wide Web Consortium (W3C)
(http://www.w3.org/standards/)
• the TEI Technical Council
BREAK!
Data Examples
• Annotated (Markup Languages: HTML, TEI)
• Structured (MySQL)
• Combination (Semantic Web)
Markup: HTML
<i> This text is
italic.</i>
=
This text is italic.
Markup: HTML
<a href="http://www.dmdh.org">
This text</a> will take you to a webpage.
=
This text will take you to a webpage.
Markup: HTML
Anything can be data -- and markup languages
provide instructions for how computers should
treat that data.
Markup: HTML
HTML is a display language used to format text on webpages.
<p> separates text into paragraphs.
<em> marks text as emphasized (browsers usually show it in italics).
These are just a few of the HTML formatting instructions that
you can use.
HTML Syntax Rules
• Opening and closing tags: <> and </>
• Attributes (2nd-level information)
defined using =""
• Comments: <!-- -->
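One way to see these three syntax rules from the computer's side is to feed a snippet to Python's built-in `html.parser` module, which reports each opening tag, closing tag, and comment as it encounters it. The `TagLogger` class name and the sample snippet are illustrative assumptions.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Record each opening tag (with attributes), closing tag, and comment."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.events.append(("open", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

    def handle_comment(self, data):
        self.events.append(("comment", data.strip()))

snippet = '<!-- a link --><a href="http://www.dmdh.org">This text</a>'

logger = TagLogger()
logger.feed(snippet)
logger.close()

for event in logger.events:
    print(event)
```

The parser never "understands" the link; it only recognizes the pattern of angle brackets, names, and quoted attribute values.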
Markup languages are
popular in digital humanities
because lots of humanists
work with texts.
Without markup languages,
the things that a computer
can search for are limited.
Ctrl + F: any text in iambic
pentameter.
With markup, the
things you can search
for are only limited by
your interpretation.
Markup: TEI
TEI (Text Encoding Initiative)
Poetry w/ TEI
<text xmlns="http://www.tei-c.org/ns/1.0" xml:id="d1">
  <body xml:id="d2">
    <div1 type="book" xml:id="d3">
      <head>Songs of Innocence</head>
      <pb n="4"/>
      <div2 type="poem" xml:id="d4">
        <head>Introduction</head>
        <lg type="stanza">
          <l>Piping down the valleys wild, </l>
          <l>Piping songs of pleasant glee, </l>
          <l>On a cloud I saw a child, </l>
          <l>And he laughing said to me: </l>
        </lg>
        <!-- remaining stanzas omitted -->
      </div2>
    </div1>
  </body>
</text>
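Once a poem is marked up this way, a program can ask for "every line of every poem" instead of searching for literal strings. The sketch below uses Python's built-in `xml.etree.ElementTree`. The `poem_lines` function name is hypothetical, and the sample string repeats the stanza above as a complete, well-formed fragment so that it can be parsed on its own.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this namespace, so searches must name it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

sample = """<text xmlns="http://www.tei-c.org/ns/1.0" xml:id="d1">
  <body xml:id="d2">
    <div1 type="book" xml:id="d3">
      <head>Songs of Innocence</head>
      <div2 type="poem" xml:id="d4">
        <head>Introduction</head>
        <lg type="stanza">
          <l>Piping down the valleys wild, </l>
          <l>Piping songs of pleasant glee, </l>
          <l>On a cloud I saw a child, </l>
          <l>And he laughing said to me: </l>
        </lg>
      </div2>
    </div1>
  </body>
</text>"""

def poem_lines(tei_xml):
    """Map each poem's title to the list of its verse lines."""
    root = ET.fromstring(tei_xml)
    poems = {}
    for poem in root.iterfind(".//tei:div2[@type='poem']", TEI_NS):
        title = poem.findtext("tei:head", namespaces=TEI_NS)
        poems[title] = [
            line.text.strip() for line in poem.iterfind(".//tei:l", TEI_NS)
        ]
    return poems

for title, lines in poem_lines(sample).items():
    print(title, len(lines), "lines")
```

The search depends entirely on the encoder's choices: it finds poems only because someone tagged them as `div2` with `type="poem"`.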
Grammar w/ TEI
<entry>
  <form>
    <orth>pamplemousse</orth>
  </form>
  <gramGrp>
    <gram type="pos">noun</gram>
    <gram type="gen">masculine</gram>
  </gramGrp>
</entry>
TEI’s syntax rules are
very similar to HTML’s
(TEI is written in XML,
which is stricter about
closing every tag) --
though your normal
browser can’t work with TEI
the way it works with
HTML.
TEI is meant to be a highly
social language -- meaning
that the council that
maintains its standards wants
it to be something that
anyone can use.
In order for TEI to
successfully encode texts, it
has to be adaptable to
individual projects.
Anything that you can isolate (and
put in brackets) can (theoretically)
then be manipulated to serve your
project.
TEI can be used to encode more than just text:
<div type="shot">
  <view>BBC World symbol</view>
  <sp>
    <speaker>Voice Over</speaker>
    <p>Monty Python's Flying Circus tonight comes to you live
      from the Grillomat Snack Bar, Paignton.</p>
  </sp>
</div>
<div type="shot">
  <view>Interior of a nasty snack bar. Customers around, preferably
    real people. Linkman sitting at one of the plastic tables.</view>
  <sp>
    <speaker>Linkman</speaker>
    <p>Hello to you live from the Grillomat Snack Bar.</p>
  </sp>
</div>
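Markup like this also answers the earlier question about isolating speeches from just one character. In the sketch below, the `speeches_by` function name is hypothetical, and the two shots are wrapped in a `<body>` element (an assumption made for the example) because an XML parser expects a single outermost element.

```python
import xml.etree.ElementTree as ET

sample = """<body>
<div type="shot">
  <view>BBC World symbol</view>
  <sp>
    <speaker>Voice Over</speaker>
    <p>Monty Python's Flying Circus tonight comes to you live
      from the Grillomat Snack Bar, Paignton.</p>
  </sp>
</div>
<div type="shot">
  <view>Interior of a nasty snack bar.</view>
  <sp>
    <speaker>Linkman</speaker>
    <p>Hello to you live from the Grillomat Snack Bar.</p>
  </sp>
</div>
</body>"""

def speeches_by(speaker, tei_xml):
    """Return the text of every speech attributed to one speaker."""
    root = ET.fromstring(tei_xml)
    found = []
    for sp in root.iter("sp"):
        if sp.findtext("speaker") == speaker:
            for p in sp.iter("p"):
                # Collapse line breaks and indentation inside the speech.
                found.append(" ".join("".join(p.itertext()).split()))
    return found

print(speeches_by("Linkman", sample))
```

Without the `<speaker>` tags, the program would have no reliable way to know who is talking.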
Or, you could encode all of
Stephenie Meyer’s Twilight
according to its emotional
register.
Whether you include or
exclude some aspect of the
text in your markup can be
very important from an
academic perspective.
The challenge of creating
good data is one reason that
collaboration is so
important to digital
scholarship.
Data Collaboration
• Avoid reinventing the wheel (has the
markup for this text already been done?)
• Consider the labor involved vs. the
outcome (and the future use of the data
you create).
Structured Data
Study Scenario #1
• You study urban espresso stands: their
hours, brands of coffee, whether or not
they sell pastries, and how far the espresso
stands are from major roadways.
What Types of Data?
• Binary (pastries: y/n)
• Unordered (hours; coffee brands)
• Derived/subservient (hours+proximity to
roadways; take cards? Which cards?)
Study Scenario #2
• You study female characters in novels
written between 1700 and 1850. Encoding a
whole novel just to study female characters
isn’t practical for you.
What types of data might
you collect in this case?
Both scenarios involve
aggregating information,
rather than encoding it.
Structured Data: Example #1
(MySQL)
ID   Name             Location                        Hours                 Coffee Brand          Pastries (Y/N)  Distance from Street
008  Java the Hut     56 Farringdon Road, London, UK  7:00 a.m.-2:00 p.m.   Square Mile Roasters  N               25 meters
009  Prufrock Coffee  18 Shoreditch High Street       7:00 a.m.-10:00 p.m.  Monmouth              Y               10 meters
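To show what structuring the data this way makes possible, here is a sketch that loads the two rows into a database and asks a question of them. It uses Python's built-in `sqlite3` module as a lightweight stand-in for MySQL (the SQL statements here are basic enough to look much the same in either). The table name and column names are assumptions made for the example.

```python
import sqlite3

# An in-memory database: nothing is saved to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE espresso_stands (
           id TEXT PRIMARY KEY,
           name TEXT,
           location TEXT,
           hours TEXT,
           coffee_brand TEXT,
           sells_pastries INTEGER,  -- 1 = yes, 0 = no
           distance_m INTEGER       -- distance from street, in meters
       )"""
)
conn.executemany(
    "INSERT INTO espresso_stands VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("008", "Java the Hut", "56 Farringdon Road, London, UK",
         "7:00 a.m.-2:00 p.m.", "Square Mile Roasters", 0, 25),
        ("009", "Prufrock Coffee", "18 Shoreditch High Street",
         "7:00 a.m.-10:00 p.m.", "Monmouth", 1, 10),
    ],
)

# Which stands sell pastries, nearest to the street first?
rows = conn.execute(
    "SELECT name, distance_m FROM espresso_stands "
    "WHERE sells_pastries = 1 ORDER BY distance_m"
).fetchall()
print(rows)
```

Storing the distance as a number rather than as the text "25 meters" is what allows sorting and comparison. The hours, kept here as plain text, could not yet be queried for "open after 5 p.m." without further structuring.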
Structured Data:
Example #2 (RDF)
How your data is (or can
be) structured will influence
the technology that you
(can) use to work with it.
Digital humanists see
creating machine-readable
data as valuable scholarship,
and consider it vital to make
that labor transparent.
Exercise:
You Create the Data!
Your data
determines your
project.
Every project has
data.
Text objects, images, tags, geographical
coordinates, categories, records, creator
metadata, etc.
Even if you’re not planning to learn
any programming skills, you are still
working with data.
Next time:
Programming on the Whiteboard
January 24, 9:30, CMU 202
• Cleaning data before you work with it!
• Identifying specific programming tasks
• How access affects your project idea
• Flash project development
• Homework: bring some data to work with.
