SlideShare una empresa de Scribd logo
1 de 92
Open for Research: A Demonstration of Text Analysis Applications
and a Discussion of Library Collaboration Opportunities
Nat Gustafson-Sundell
Journal Acquisitions/ Reference Librarian
Assistant Professor
Minnesota State University, Mankato
See OpenResearch.Weebly.Com for my notes
Section 1: 60-70 minutes
I. Introduction to Workshop (5-10)
II. Overview: A Functional Description of Open Access (15)
III. Overview: Digital Humanities, Humanities Computing, eResearch: Background and Theory (20-25)
IV. Overview: Text Analysis & Topic Modeling: Background and Theory (20, possibly broken by coffee break)
*Coffee Break 2:30-2:45*
Section 2: 65-70 minutes (Start around 2:45-50)
V. Demonstration: Text Discovery/ Text Preparation/ Text Analysis/ Topic Modeling (30)
VI. Group Exploration: Text Analysis/ Topic Modeling (20, less if we need to make up some time)
VII. Overview: Content Analysis (15-20, start by 3:40)
*Breather* (5 minutes)
Section 3: 55-60 minutes (Start around 4:00)
VIII. Demonstration: Text Annotation Tools & Relational Databases (20)
IX. Group Exploration: Projects (20-25)
X. Group Exploration: Collaboration (eResearch Centers and Libraries) (15, less if we need to make up some time)
Schedule
2
Overview: Open Access . 1
http://openresearch.weebly.com/open-access.html
Key Terms:
• Gratis/ Libre
• Green/ Gold/ “Platinum”
• Working Papers, pre-prints, e-prints, etc.
• Subject and Institutional Repositories
3
http://www.opendoar.org/onechart.php?cID=&ctID=&rtID=&clID=&lID=&potID=&rSoftWareName=&search=&groupby=rt.rtHeading&orderby=Tally%20D
ESC&charttype=pie&width=600&height=300&caption=Open%20Access%20Repository%20Types%20-%20Worldwide
Overview: Open Access . 2
4
https://creativecommons.org/licenses/
Overview: Open Access . 3
• Do not sign away your copyright (unless you have a REALLY good reason)
5
Overview: Open Access . 4
“We are all, or are all soon to become,
nineteenth centuryists.”
- Matthew Jockers
6
Overview: Digital Humanities . 1
http://openresearch.weebly.com/digital-humanities-eresearch.html
“In 1949, an Italian Jesuit priest, Father Roberto Busa, began what
even today is a monumental task: to make an index verborum of
all the words in the works of St. Thomas Aquinas and related
authors, totaling some 11 million words of medieval Latin. Father
Busa imagined that a machine might be able to help him, and,
having heard of computers, went to visit Thomas J. Watson at IBM
in the United States in search of support … The entire texts were
gradually transferred to punched cards and a concordance
program written for the project.” (Hockey online)
“The History of Humanities Computing,” Susan Hockey, 2004
7
Overview: Digital Humanities . 2
“The real origin of that term *digital humanities+ was in conversation with Andrew McNeillie, the original
acquiring editor for the Blackwell Companion to Digital Humanities. We started talking with him about that book
project in 2001, in April, and by the end of November we’d lined up contributors and were discussing the title,
for the contract. Ray *Siemens+ want ‘A Companion to Humanities Computing’ as that was the term commonly
used at that point; the editorial and marketing folks at Blackwell wanted ‘Companion to Digitized Humanities.’ I
suggested ‘Companion to Digital Humanities’ to shift emphasis away from simple digitization.” (John Unsworth
quoted in Kirschenbaum 2-3)
“Twitter, along with blogs and other online outlets, has inscribed the digital humanities
as a network topology, that is to say, lines drawn by aggregates of affinities, formally
and functionally manifest in who follows whom, who friends whom, who tweets whom,
and who links to what.” (Kirschenbaum 5)
“What is Digital Humanities and What’s It Doing in English Departments?” Matthew G. Kirschenbaum, 2010
8
Overview: Digital Humanities . 3
“…the digital humanities can be … a nexus of fields within which scholars use
computing technologies to investigate the kinds of questions that are
traditional to the humanities, or… who ask traditional kinds of humanities-
oriented questions about computing technologies.” (Fitzpatrick online)
“Reporting from the Digital Humanities 2010 Conference,” Kathleen Fitzpatrick, 2010
9
(Meeks online, 2011) See https://dhs.stanford.edu/comprehending-the-digital-humanities/
Overview: Digital Humanities . 4
10
Image Removed to avoid possibility of copyright infringement
Overview: Digital Humanities . 5
“The Landscape of Digital Humanities,” Patrik Svensson, 2010
5 “paradigmatic modes of engagement between the humanities and information
technology: information technology as a tool, an object of study, an exploratory laboratory,
an expressive medium and an activist venue.” (Svensson online)
“…the digital humanities comprise a field in a loose sense.” (Svensson online)
“…it seems quite unlikely that the digital humanities would ever become a fully separate
field.” (Svensson online)
“The complexity of digital humanities as a ‘field’ comes partly from its disciplinary and
institutional diversity, and its multiple modes of engagement with information
technology.” (Svensson online)
“…one interesting question is whether the digital calls for other modes of investigation,
collaboration and making that may be partially incompatible with the epistemic
commitments of the established discipline or field.” (Svensson online)
11
Overview: Digital Humanities . 6
“A Genealogy of Digital Humanities,” Marija Dalbello, 2010
“This paper explores the history of digital humanities that grappled with the epistemological
status of technology, as a major field of contemplation of the effects of new media on writing,
reading, and interpretation.” (Dalbello 482)
Previous to the TLG, according to Karen Ruhleder, “gaining familiarity with the corpus was a life’s
work” (1995), but given the TLG, graduate students could “ask questions that formerly could only
be answered through comprehensive reading and experience un the field.” (Dalbello 488)
The TLG “had a profound restructuring effect on knowledge production in the field of classics …
a broader range of ‘legitimate’ research queries could produce a sense of ‘doing complete work’”
so that the “researcher could be free to do more intellectually interesting work rather than
focusing on learning the corpus” (Dalbello 489)
“Seeing patterns and connections out of context liberates them from an archive, as a signifier
that attaches itself to a new meaning; searches through digital corpora can produce radical
readings and undermine existing interpretations, thus contributing to a program of critical
interpretation,” (Dalbello 493)
12
Overview: Digital Humanities . 7
“The term post-human … suggests that our sense of what it is to be human has changed – as
Katherine Hayles puts it, the post-human is a state of mind, a realization that mankind has finally
understood that it is definitely not the centre of the universe. My concern here is to consider the
implications of this post-human state of mind for our understanding and practice of the digital
humanities.” (Prescott online)
“Making the Digital Human: Anxieties, Possibilities, Challenges,” Andrew Prescott, 2012
“For all the rhetoric about digital technologies changing the humanities, the overwhelming
picture presented by the activities of digital humanities centres in the Great Britain is that they
are busily engaged in turning back the intellectual clock and reinstating a view of the humanities
appropriate to the 1950s...” (Prescott online)
“…as far as the digital humanities are concerned, interdisciplinarity is just a cover for the lack of
a distinctive intellectual agenda … Another major obstacle preventing the digital humanities
developing its own scholarly identity is our interest in method. If we focus on modelling
methods used by other scholar, we will simply never develop new methods of our own.”
(Prescott online)
“We might start by seeking closer contact with our colleagues in Cultural and Media Studies.”
(Prescott online)
“We should be seeking to provide new perspectives on the way in which technology interacts
with text.” (Prescott online)
13
Overview: Digital Humanities . 8
“The state of the digital humanities: A report and a critique,” Alan Liu, 2011
“A purely economic rationale for the digital humanities might … be that they re-engineer higher education for
knowledge work by providing ever smarter tools for working with increasingly global-scale knowledge resources,
all the while trimming the need to invest proportionally in the traditional facilities…” (Liu 2011)
“I offer a report on the current state of the digital humanities … I will define it with unusual breadth. “Digital
humanities’ will here have a supervening sense that combines ‘humanities computing’ or ‘text-based’ digital
humanities … and new media studies …” (Liu 10)
“Currently, I fear, the digital humanities are not ready to take up their full responsibility because the field does
not posses an adequate critical awareness of the larger social, economic, and cultural issues at stake … the
whole amounts to the lack of a mental and policy firewall against postindustrial takeovers of the digital idea …”
(Liu 11)
“The digital humanities are on the threshold of a new interpretive paradigm. The old paradigm, especially on the
text-oriented side of the field, was constraining. That paradigm was empirical … about ‘hypothesis testing and
empirical validation’.
…
The new paradigm allows computers and humans to share responsibility for the full act of interpretation,
including the component acts of hypothesis-framing, observation, discovery, analysis, testing, reiterative
hypothesis-framing, etc.” (Liu 21)
“…the text-oriented side of the digital humanities has been almost wholly uninterested in any social, political,
economic, or cultural inquiry into the contexts and implications of information technology.” (Liu 30)
14
Overview: Digital Humanities . 9
“….attempts to work toward a theoretical foundation for humanities computing surfaced at the outset of
scholarly publication in the field and have been in progress ever since – with no consensus in sight … the
considerable variety in how humanities computing is … conceived is a sign of health rather than decay.”
(McCarty 1228)
“….humanities computing … need not wait on the emergence of a theoretical framework … its
semidirected, semicoherent activities are no discredit, rather the norm for an experimental
field.” (McCarty 1233)
“…the crafted objects of humanities computing are … primary ‘metatheoretical statements’ produced by
those who ‘think in things rather than words.’ A new definition of scholarship, demanind new abilities,
would seem to follow.” (McCarty 1227)
“The question of reinventing our scholarly forms takes us beyond any list of projects to what we might
consider the fundamental ‘project’ of humanities computing … This is, in simple but far-reaching terms,
epistemological: to ask, in the context of computing, what can (and must) be known of our artifacts, how we
know what we know about them, and how new knowledge is made.” (McCarty 1231)
“The socioacademic function of humanities computing can be understood as an elaboration of the
Galisonian trading zone, which the field establishes as a methodological commons and within it, takes the
role of merchant trader among mutually divergent academic cultures.” (McCarty 1232)
“Play, apparently without conscious direction, is a recognized factor in scientific discovery.” (McCarty 1232)
“Humanities Computing,” Willard McCarty, 2003
15
Overview: Digital Humanities . 9
(McCarty 1225) 16
Image Removed to avoid possibility of copyright infringement
Overview: Text Analysis . 1
http://openresearch.weebly.com/text-analysis.html
Some Terms:
• Corpus
• Concordance
• Key Words in Context (KWIC)
• Stopwords
• Collocation
• Frequency (and Frequency Distribution)
• Lemmatization
• Hapax legomena (dis, tris, tetrakis)
• Etc.
“Text-analysis tools have their roots in the print concordance. The concordance is a standard research tool in the
humanities that goes back to the thirteenth century.
…
The challenge before us is to question our procedural habits and presuppositions as to what are legitimate
recombinations – to forget the concordance and ask anew how we can analyse text with a computer and whether
such computer-assisted interpretations are interesting in and of themselves. We need to play again and make
playpens available to our colleagues.
…
I therefore want to propose a very different image of what a concordance is … I call it a hybrid (or monster)
because it is authored not just by the original author, but also by the user’s choices and the procedures used to
generate it … it is neither the work of the original author nor that entirely of the provoker of the concordance.”
(Rockwell 210,213)
17
Overview: Text Analysis . 2
Macroanalysis, Matthew Jockers, 2013
“The literary scholar of the twenty-first century can no longer be content with anecdotal evidence, with random
‘things’ generated from a few , even ‘representative’ texts. We must strive to understand these things in the
context of everything else, including a mass of possibly ‘uninteresting’ texts.” (Jockers 8)
“Today’s student of literature must be adept at reading and gathering evidence from individual
texts and equally adept at accessing and mining digital-text repositories.” (Jockers 9)
“At the macro scale , we see evidence of time and gender influences on theme and style. By
superimposing these two network snapshots in our minds, we can begin to imagine a larger
context in which to read and study nineteenth-century literature. What is clear is that the
books we have traditionally studied are not isolated books. The canonical greats are not even
outliers: they are books that are similar to other books…” (Jockers 168)
“…macroscopic investigation is contextualization on an unprecedented scale.” (Jockers 27-8)
“It is the exact interplay between the macro and micro scale that promises a new, enhanced,
and perhaps even better understanding of the literary record. The two approaches work in
tandem and inform each other. Human interpretation of the ‘data,’ whether it be mined at the
macro or micro level, remains essential … The most fundamental and important difference in
the two approaches is that the macroanalytic approach reveals details about texts that are for
all intents and purposes unavailable to close-readers of the texts.” (Jockers online)
See Example Project
18
Overview: Text Analysis . 3
Distant Reading, Franco Moretti, 2013
“Writing about comparative social history, Marc Bloch once coined a lovely ‘slogan,’ as he himself called it:
‘years of analysis for a day of synthesis’; and if you read Braudel or Wallerstein you immediately see what Bloch
had in mind. The text which is strictly Wallerstein’s, his ‘day of synthesis’, occupies one-third of a page … the
rest are quotations … Years of analysis; other people’s analysis, which Wallerstein’s page synthesizes into a
system.
Not, if we take this model seriously, the study of world literature will somehow have to reproduce this ‘page’ –
which is to say: this relationship between analysis and synthesis – for the literary field. But in that case, literary
history will quickly become very different from what it is now: it will become ‘second hand’: a patchwork of
other people’s research, without a single direct textual reading. Still ambitious, and actually even more so than
before (world literature!); but the ambition is now directly proportional to the distance from the text: the
more ambitious the project, the greater must the distance be.” (Moretti 47-8, 2000)
“Distant reading: where distance … is a condition of knowledge: it allows you
to focus on units that are much smaller or much larger than the text:
devices, themes, tropes – or genres and systems. And if, between the very
small and the very large, the text itself disappears, well, it is one of those
cases when one can justifiably say, Less is more. It we want to understand
the system in its entirety, we must accept losing something…” (Moretti 48-9,
2000)
19
Overview: Text Analysis . 4
Distant Reading, Franco Moretti, 2013
(Moretti 221) See also http://litlab.stanford.edu/LiteraryLabPamphlet2B.Figures.pdf
20
Image Removed to avoid possibility of copyright infringement
Overview: Text Analysis . 5
Reading Machines, Stephen Ramsay, 2011
“…literary criticism operates at a register in which understanding, knowledge, and truth occur outside
of the narrower denotative realm in which scientific statements are made. It is not merely the case
that literary criticism is concerned with something other than the amassing of verified knowledge.
Literary criticism operates within a hermeneutical framework in which the specifically scientific
meaning of fact, metric, verification, and evidence simply do not apply … ‘evidence’ stands as a
metaphor for the delicate building blocks of rhetorical persuasion … ‘Verification’ occurs in a social
community of scholars whose agreement or disagreement is almost never put forth without
qualification.” (Ramsay 7, 2011)
“If text analysis is to participate in literary critical endeavor in some manner
beyond fact-checking, it must endeavor to assist the critic in the unfolding of
interpretive possibilities. We might say that its purpose should be to generate
further ‘evidence,’ though we do well to bracket the association that term holds
in the context of less methodologically certain pursuits. The evidence we seek is
not definitive, but suggestive of grander arguments and schemes.” (Ramsay 10,
2011)
“Critics often use the word ‘pattern’ to describe what they’re putting forth, and that word aptly
connotes the fundamental nature of the data upon which literary insight relies. The understanding
promised by the critical act arises not from a presentation of facts, but from the elaboration of a
gestalt, and it rightfully includes the vague reference, the conjectured similitude, the ironic twist, and
the dramatic turn. In the spirit of inventio, the critic freely employs the rhetorical tactics of conjecture
– not so that a given matter might be definitely settled, but in order that the matter might become
richer, deeper, and ever more complicated. (Ramsay 16, 2011)
21
“Although it is true that we do not typically ask *students+ to cast yarrow stalks or choose things at random,
we do ask them to find some pattern beyond the apparent pattern of the text … We ask them to select,
isolate, notice – to consider a small group of sub-patterns from among the infinity of patterns that make up
the text. Having done this, we then ask them to re-articulate those patterns in narrative form as
elucidations of the texts in which they occur. We call those articulations ‘meanings’, and we call the act of
embedding them in a narrative framework ‘interpretation’.
…
With algorithmic criticism, one would not ask how the ends of interpretation were or were not justified by
means of the algorithms imposed, but rather, how successful the algorithms were in provoking thought
and allowing insight.” (Ramsay 171, 173, 2003)
Overview: Text Analysis . 6
“…the real message of our technology is something entirely unexpected – a writerly,
anarchic text that is more useful than the readerly, institutional text … This is, if you like,
the basis of the Screwmeneutical Imperative. There are so many books. There is so little
time. Your ethical obligation is neither to read them all nor to pretend that you have read
them all, but to understand each path through the vast archive as an important moment
in the world’s duration – as an invitation to community, relationship, and play.” (Ramsay
online, 2010)
Reading Machines, Stephen Ramsay, 2011, and other essays
“… As with any text-analytical result, we can weave a narrative through the gaps. For this reason, we
would do better to say it carves a new path through the document space, which in turn allows us to
reread and rethink…” (Ramsay 80, 2011)
22
Overview: Text Analysis . 7
“Tampering with the Text to Increase Awareness of Poetry’s Art,” Estelle Irizarry, 1996
“Computer-Assisted Reading: Reconceiving Text Analysis,” Stefan Sinclair, 2003
“What is Text Analysis, Really?” Geoffrey Rockwell, 2003
“The value of the computer-mediated exercises is that they enable readers to readily perceive and appreciate
features that are not obvious in a conventional reading of a printed text.” (Irizarry 155)
“The computer is, among other things, an instrument uniquely suite to play activities ...”
(Irizarry 156)
“By thinking more about process than outcomes, about multiplying meanings (not data) rather than
converging on answers, we can consider how to make the computer an extension of the reading and
interpretive practices in which humanists are already engaged.” (Sinclair 176)
“Playful experimentation is a pragmatic approach of trying something, seeing if you obtain
interesting results, and if you do, then trying to theorize why those results are interesting
rather than starting from articulated principles.” (Rockwell 214, 2003)
“Assembling and disassembling a text, like playing with blocks of Lego, may not
necessarily contribute immediately to its understanding, but it is likely to contribute to
the aggregate experience of the text in valuable ways. … I am suggesting that play is an
integral part of a humanist’s interpretive activities…” (Sinclair 181)
“…we should rethink our tools on a principle of research as disciplined play.” (Rockwell 213)
23
Overview: Text Analysis . 8
“Between Language and Literature: Digital Text Exploration,” Geoffrey Rockwell and Stefan Sinclair, 2009
“Just because we can’t extract the same meaning(s) from a representation in the
way we might from a traditional text does not mean that representations can’t be
read.
…
As everyone should know by now, looking at visualizations of texts is a form of
exploring and should be taken not as analysis, but exploration.
…
We can (and must) learn new ways of reading texts, and to embrace
mathematical abstraction and visualization as interpretive allies rather than
black-box enemies.” (Gibbs online, 2013)
“We have found that students enjoy submitting their own texts to these types of analysis tools, where
they discover aspects of their writing of which they were not aware (like a propensity for repeating a given
phrase. An engaging activity can be to have students try to find texts on the web that most closely
resemble the data profile of their own texts. Do so can provoke interesting results and awaken the
curiosity of the students for the relationship between text analysis and linguistic proficiency.” (Rockwell
and Sinclair, 2009)
“Learning to Read Again,” Fred Gibbs, 2013
24
Overview: Topic Modeling . 1
http://openresearch.weebly.com/topic-modeling.html
(Blei 78)
25
Image Removed to avoid possibility of copyright infringement
Overview: Topic Modeling . 2
Time
Years
Sing
Past
Land
Songs
Long
Things
Divine
Blood
…
45%
Man
Body
Soul
Poems
Woman
Make
True
Large
Beauty
Times
…
23%
Thee
Thy
Soul
Joy
Life
Ship
Space
Joys
Long
…
10%
Earth
Men
Face
Strong
Young
Love
Cities
Children
Women
Fill
…
10%
World
Life
States
War
America
Great
Present
Future
Real
Today
…
5%
26
Overview: Topic Modeling . 3
“Topic modeling gives us a way to infer the latent structure behind a collection of documents. In principle, it
could work at any scale, but I tend to think human beings are already pretty good at inferring the latent
structure in (say) a single writer’s oeuvre. I suspect this technique becomes more useful as we move toward a
scale that is too large to fit into human memory.” (Underwood online)
“Topic modeling made just simple enough,” Ted Underwood, 2012
“…I’m not sure how much value they will have as evidence. For one thing, they
require you to make a series of judgment calls that deeply shape the results you
get (from choosing stopwords, to the number of topics produced, to the scope of
the collection). The resulting model ends up being tailored in difficult-to-explain
ways by a researchers preferences.” (Underwood online)
See a variety of Topic Model visualizations
“…excitement about the use of topic models for discovery needs to be tempered with skepticism about
how often the unexpected juxtapositions LDA creates will be helpful, and how often merely surprising. A
poorly supervised machine learning algorithm is like a bad research assistant. It might produce some
unexpected constellations that show flickers of deeper truths; but it will also produce tedious,
inexplicable, or misleading results.” (Schmidt 50)
“Words Alone: Dismantling Topic Models in the Humanities,” Benjamin M. Schmidt, 2012
27
Demonstration: Text Analysis: Text Discovery . 1
http://openresearch.weebly.com/tools.html
http://chroniclingamerica.loc.gov/
28
Demonstration: Text Analysis: Text Discovery . 2
29
Demonstration: Text Analysis: Text Discovery . 3
30
Demonstration: Text Analysis: Text Discovery . 4
http://www.gutenberg.org/
31
Demonstration: Text Analysis: Text Discovery . 5
32
Demonstration: Text Analysis: Text Discovery . 6
33
Demonstration: Text Analysis: Text Preparation . 1
http://openresearch.weebly.com/tools.html
34
Steps:
1. Reg Ex: ̶ → □
2. Reg Ex: n→ ◊
3. Reg Ex: r → ◊
4. Normal: ; → ;□
5. Normal: , → ,□
6. Normal: . → .□
7. Normal: ? → ?□
8. Normal: ! → !□
9. Normal: : → :□
10. Reg Ex: (s{2,}) → □
11. Normal: -□ → ◊
12. Normal: - → ◊
13. Add carriage returns for
every “document”
14. Encode in UTF-8
Demonstration: Text Analysis: Text Preparation . 2
35
Demonstration: Text Analysis: Text Preparation . 3
36
Steps:
1. Reg Ex: (.+?) → □
Find every instance of parentheses enclosing any text.
Demonstration: Text Analysis: Text Preparation . 4
37
Image Removed to avoid possibility of copyright
infringement
Demonstration: Text Analysis: Voyant . 1
http://openresearch.weebly.com/tools.html
http://openresearch.weebly.com/texts.html
http://voyant-tools.org
38
Demonstration: Text Analysis: Voyant . 2
39
Demonstration: Text Analysis: Voyant . 3
40
Demonstration: Text Analysis: Voyant . 4
41
Demonstration: Text Analysis: Voyant . 5
42
Demonstration: Text Analysis: Voyant . 6
43
<ctrl> A
<ctrl> C
Demonstration: Text Analysis: Voyant . 7
44
Demonstration: Text Analysis: Voyant . 8
45
Demonstration: Text Analysis: Voyant . 9
46
Image Removed to avoid possibility of copyright infringement
Demonstration: Text Analysis: Topic Modeling Tool . 1
http://openresearch.weebly.com/tools.html
http://code.google.com/p/topic-modeling-tool/
47
Demonstration: Text Analysis: Topic Modeling Tool . 2
48
Demonstration: Text Analysis: Topic Modeling Tool . 3
1. Reg Ex: n→ ◊
2. Reg Ex: r → ◊
49
Demonstration: Text Analysis: Topic Modeling Tool . 4
50
Demonstration: Text Analysis: Topic Modeling Tool . 5
51
Demonstration: Text Analysis: Topic Modeling Tool . 6
52
Demonstration: Text Analysis: Topic Modeling Tool . 7
53
Concept?
Language Type?
Sentiment Cluster?
Demonstration: Text Analysis: Topic Modeling Tool . 8
54
Group Exploration: Text Analysis . 1
http://openresearch.weebly.com/tools.html
???
55
Leaves of Grass 1867
Group Exploration: Text Analysis . 2
56
Leaves of Grass 1892
Group Exploration: Text Analysis . 3
57
Leaves of Grass 1892
Group Exploration: Text Analysis . 4
58
There is 1 document in this corpus
with a total of 130,506 words and
14,594 unique words.
Most frequent words in the corpus:
old (303), shall (265), life (261), love
(261), soul (245). More…
Leaves of Grass 1892
Group Exploration: Text Analysis . 5
59
Word 1892 Count 1892 Ratio 1867 Count 1867 Ratio
old 303 0.00232173 157 0.00157965
shall 265 0.00203056 218 0.0021934
life 261 0.00199991 161 0.0016199
love 261 0.00199991 218 0.0021934
soul 245 0.00187731 160 0.00160984
long 236 0.00180835 154 0.00154947
earth 229 0.00175471 199 0.00200223
night 210 0.00160912 168 0.00169033
man 206 0.00157847 190 0.00191168
day 188 0.00144055 140 0.00140861
men 185 0.00141756 182 0.00183119
know 179 0.00137158 146 0.00146898
death 178 0.00136392 131 0.00131805
come 175 0.00134093 131 0.00131805
time 175 0.00134093 111 0.00111682
great 174 0.00133327 159 0.00159977
world 157 0.00120301 78 0.0007848
sea 153 0.00117236 124 0.00124762
hear 137 0.00104976 115 0.00115707
like 137 0.00104976 94 0.00094578
face 132 0.00101145 111 0.00111682
hand 128 0.0009808 101 0.00101621
good 126 0.00096547 100 0.00100615
body 124 0.00095015 114 0.00114701
young 122 0.00093482 101 0.00101621
Total 130506 99389
Sorted by 1892 Ratio
Same list re-sorted by 1867 Ratio
Leaves of Grass 1867, 1892
Group Exploration: Text Analysis . 6
60
Overview: Content Analysis . 1
http://openresearch.weebly.com/content-analysis.html
“Content Analysis” or “message analysis”?
• Rhetorical Analysis
• Narrative Analysis
• Discourse Analysis
• Structuralist (or Semiotic) Analysis
• Interpretative Analysis
• Conversation Analysis
• Critical Analysis
• Normative Analysis
(Neuendorf 5-8)
“Content analysis is a summarizing, quantitative analysis of messages that relies on the scientific method
and is not limited as to the types of the variables that may be measured or the context in which the
messages are created or presented.” (Neuendorf 10)
“Content analysis is any technique for making inferences by objectively and systematically identifying
specified characteristics of messages. (Holsti 25)
61
Overview: Content Analysis . 2
Concepts and Design:
• Quantitative/ Qualitative
• Deductive/ Inductive
• Manifest/ Latent Content
• Content/ Form
• Reliability, Validity, Generalizability, Replicability
• Unitizing
• Etc.
• Etc.
Content analysis “ ‘learned its methods from cryptography, from the subject classification of library books, and from
biblical concordances, as well as from standard guides to legal precedents’ “ (Marvick, quoted in Rogers 214, 1994,
quoted in Neuendorf 31)
62
Message produced by
source A:
time t1
Message produced by
source A:
time t2
Content Variable X A
X t1
A
X t2
Trends in communication
content
Adapted from Figure 2-2, Holsti 28
Overview: Content Analysis . 3
63
Image Removed to avoid possibility of copyright infringement
Adapted from Figure 2-5, Holsti 30
Messages produced by
source A
Content Variable X A
X
Relationship of content
variables to each other
Content Variable Y A
Y
Overview: Content Analysis . 4
64
Image Removed to avoid possibility of copyright infringement
Message produced by
source A
Message produced by
source B
Content Variable X A
X
B
X
Differences between
communicators
Adapted from Figure 2-6, Holsti 30
Overview: Content Analysis . 5
65
Image Removed to avoid possibility of copyright infringement
Content Analysis can be used to:
• To describe trends in communication content
• To relate known characteristics of sources to the messages they produce
• To audit communication content against standards
• To analyze techniques of persuasion
• To analyze style
• To relate known characteristics of the audience to messages produced for them
• To describe patterns of communication
• To secure political and military intelligence
• To analyze psychological traits of individuals
• To infer aspects of culture and cultural change
• To provide legal evidence
• To answer questions of disputed authorship
• To measure readability
• To analyze the flow of information
• To assess responses to communication
Adapted from Table 2-1, Holsti 26
Overview: Content Analysis . 6
66
Overview: Content Analysis . 7
A 1935 study by Edgar Dale analyzed the themes and emphases of American motion pictures.
(Neuendorf 33-4)
A World War II study “systematically analyzed radio broadcasts from Axis powers …. Allied forces were able to
estimate the concentration of German troops in various locations by comparing music played on German radio
stations with music played elsewhere in occupied Europe.” (Neuendorf 37)
A 1994 study by Kathleen Carley tracked “the representation of robots in science fiction” from “three different time
periods – pre-1950s, the 1950s and 1960s, and the 1970s and 1980s.” (Neuendorf 184-6) See visualization p. 185
A 1963 study by E.S. Shneidman analyzed political rhetoric “to infer personality traits of the speaker
from logical and cognitive characteristics of his verbal production.” The study categorized
“Idiosyncrasies of reasoning and cognitive maneuvers in the rhetoric of John F. Kennedy, Richard M.
Nixon, and Nikita Krushchev:”
• Idiosyncrasies of reasoning:
o Irrelevant premise
o Argumentum ad populum
o Complex question
o Derogation
o Stranded predicate
o Truth-type confusion
• Cognitive Maneuvers:
o To enlarge or elaborate the preceding
o To smuggle debatable point into alien context
o To be irrelevant
o To allege but not substantiate
o To introduce new notion
Adapted from Holsti 72-4 67
Overview: Content Analysis . 8
ArticleManager
68
This is a typical hierarchy of forms linked from the main
form. The Group and Actor forms are more complexly
related, so possibly more interesting to look at, but
would not fit so nicely on a single page.
Overview: Content Analysis . 9
ArticleManager
69
Overview: Content Analysis . 10
ArticleManager
70
Overview: Content Analysis . 11
ArticleManager
71
Demonstration: Text Annotation: CATMA . 1
http://openresearch.weebly.com/tools.html
http://www.catma.de/
72
Demonstration: Text Annotation: CATMA . 2
73
Demonstration: Text Annotation: CATMA . 3
74
Demonstration: Text Annotation: CATMA . 4
75
Demonstration: Text Annotation: CATMA . 5
What tags should I use?
- Bottom-up (inductive)
- Top-down (deductive)
What’s most appropriate for the
text?
What is the unit of analysis?
What do I want to analyze for –
what am I wondering about>
What do I think I might find?
How is play a part of this process?
How might hypothesis testing
enter the process?
How might a collaborative
project proceed?
76
Demonstration: Text Annotation: CATMA . 6
77
Demonstration: Text Annotation: CATMA . 7
If there are too many tags, it is not possible to close-read the tags with the
text. The tags only have value for analysis.
As tags proliferate, it can be difficult to stop adding tags for each new nuance.
Should I tag the last line. Is this the “Object’s Action Property?” How to note
sarcasm?
Seek out appropriate models:
• See Narratology
• See Content Analysis
78
Demonstration: Text Annotation: CATMA . 8
79
Demonstration: Text Annotation: CATMA . 9
80
Demonstration: Text Annotation: CATMA . 10
81
Example Projects . 1
http://openresearch.weebly.com/example-projects.html
82
Example Projects . 2
http://www.oldbaileyonline.org/obapi//
83
Example Projects . 3
84
Example Projects . 4
Description of the BeardStair project
Cameron Blevins describes
how he became a Digital
Humanist by text mining
Martha Ballard’s diary.
85
Example Projects . 5
86
Example Projects . 5
87
Example Services . 1
Google Spreadsheet of Digital Humanities Centers
http://openresearch.weebly.com/eresearch-services-examples.html
To your knowledge, does your university or library support Digital Humanities research or eResearch
more broadly?
If no, would you feel comfortable approaching your library to find out if the library might help out? Or
someone in your department, or your college/ university, or through an association?
Would you feel comfortable continuing to explore the possibilities on your own?
What kinds of project(s) can you imagine pursuing? (or have you pursued?)
Would there be any need for, or value in, seeking collaboration? What kind of collaboration (or
support) would be valuable?
What kind of budget environment might factor into what services are available or could be made
available to you?
What kinds of services would be most valuable to you? (Ex: Videos, Link Lists, Project Descriptions,
Classes, One on One Consultations, Project Development support, etc.)
88
89
90
91
92

Más contenido relacionado

La actualidad más candente

Notational systems and cognitive evolution
Notational systems and cognitive evolutionNotational systems and cognitive evolution
Notational systems and cognitive evolutionJeff Long
 
When will there be a digital revolution in the humanities?
When will there be a digital revolution in the humanities?When will there be a digital revolution in the humanities?
When will there be a digital revolution in the humanities?Martin Wynne
 
Organizing and Embedding a Library Hackfest Into a 1st Year Course
Organizing and Embedding a Library Hackfest Into a 1st Year CourseOrganizing and Embedding a Library Hackfest Into a 1st Year Course
Organizing and Embedding a Library Hackfest Into a 1st Year Coursesshujah
 
Digital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social SciencesDigital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social SciencesChantal van Son
 
OntoSOC: S ociocultural K nowledge O ntology
OntoSOC:  S ociocultural  K nowledge  O ntology OntoSOC:  S ociocultural  K nowledge  O ntology
OntoSOC: S ociocultural K nowledge O ntology IJwest
 
Digital Humanities Revisited - Summary Report
Digital Humanities Revisited - Summary ReportDigital Humanities Revisited - Summary Report
Digital Humanities Revisited - Summary ReportVolkswagenStiftung
 
What is digital humanities ? What's doing into English departments?
What is digital humanities ? What's doing into English departments?What is digital humanities ? What's doing into English departments?
What is digital humanities ? What's doing into English departments?gondasmita
 
Topic Maps: Romancing Conversation Topics
Topic Maps: Romancing Conversation TopicsTopic Maps: Romancing Conversation Topics
Topic Maps: Romancing Conversation TopicsJack Park
 
Whats Wrong With Online Reading
Whats Wrong With Online ReadingWhats Wrong With Online Reading
Whats Wrong With Online ReadingRandy Connolly
 
شبكات التواصل
شبكات التواصلشبكات التواصل
شبكات التواصلNoha Abdelmoaty
 
Introduction to Electracy
Introduction to ElectracyIntroduction to Electracy
Introduction to ElectracyRichard Smyth
 

La actualidad más candente (19)

Notational systems and cognitive evolution
Notational systems and cognitive evolutionNotational systems and cognitive evolution
Notational systems and cognitive evolution
 
When will there be a digital revolution in the humanities?
When will there be a digital revolution in the humanities?When will there be a digital revolution in the humanities?
When will there be a digital revolution in the humanities?
 
Organizing and Embedding a Library Hackfest Into a 1st Year Course
Organizing and Embedding a Library Hackfest Into a 1st Year CourseOrganizing and Embedding a Library Hackfest Into a 1st Year Course
Organizing and Embedding a Library Hackfest Into a 1st Year Course
 
Digital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social SciencesDigital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social Sciences
 
OntoSOC: S ociocultural K nowledge O ntology
OntoSOC:  S ociocultural  K nowledge  O ntology OntoSOC:  S ociocultural  K nowledge  O ntology
OntoSOC: S ociocultural K nowledge O ntology
 
Digital Humanities Revisited - Summary Report
Digital Humanities Revisited - Summary ReportDigital Humanities Revisited - Summary Report
Digital Humanities Revisited - Summary Report
 
What is digital humanities ? What's doing into English departments?
What is digital humanities ? What's doing into English departments?What is digital humanities ? What's doing into English departments?
What is digital humanities ? What's doing into English departments?
 
Thatcamp recap
Thatcamp recapThatcamp recap
Thatcamp recap
 
Mortenson Distinguished Lecture2010
Mortenson Distinguished Lecture2010Mortenson Distinguished Lecture2010
Mortenson Distinguished Lecture2010
 
Topic Maps: Romancing Conversation Topics
Topic Maps: Romancing Conversation TopicsTopic Maps: Romancing Conversation Topics
Topic Maps: Romancing Conversation Topics
 
Whats Wrong With Online Reading
Whats Wrong With Online ReadingWhats Wrong With Online Reading
Whats Wrong With Online Reading
 
شبكات التواصل
شبكات التواصلشبكات التواصل
شبكات التواصل
 
P2 Lecture 1
P2 Lecture 1P2 Lecture 1
P2 Lecture 1
 
Newmedia2
Newmedia2Newmedia2
Newmedia2
 
P2 Lecture 4
P2 Lecture 4P2 Lecture 4
P2 Lecture 4
 
Introduction to Electracy
Introduction to ElectracyIntroduction to Electracy
Introduction to Electracy
 
P2 Lecture 2
P2 Lecture 2P2 Lecture 2
P2 Lecture 2
 
P2 Lecture 3
P2 Lecture 3P2 Lecture 3
P2 Lecture 3
 
P2 Lecture 5
P2 Lecture 5P2 Lecture 5
P2 Lecture 5
 

Destacado

The New Past, and a Speculative Future, of Literature: A Brief Discussion of ...
The New Past, and a Speculative Future, of Literature: A Brief Discussion of ...The New Past, and a Speculative Future, of Literature: A Brief Discussion of ...
The New Past, and a Speculative Future, of Literature: A Brief Discussion of ...NatGustafsonSundell
 
The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...Brian Solis
 
Open Source Creativity
Open Source CreativityOpen Source Creativity
Open Source CreativitySara Cannon
 
Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)maditabalnco
 
The Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post FormatsThe Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post FormatsBarry Feldman
 
The Outcome Economy
The Outcome EconomyThe Outcome Economy
The Outcome EconomyHelge Tennø
 

Destacado (7)

The New Past, and a Speculative Future, of Literature: A Brief Discussion of ...
The New Past, and a Speculative Future, of Literature: A Brief Discussion of ...The New Past, and a Speculative Future, of Literature: A Brief Discussion of ...
The New Past, and a Speculative Future, of Literature: A Brief Discussion of ...
 
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job? Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
 
The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...
 
Open Source Creativity
Open Source CreativityOpen Source Creativity
Open Source Creativity
 
Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)
 
The Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post FormatsThe Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post Formats
 
The Outcome Economy
The Outcome EconomyThe Outcome Economy
The Outcome Economy
 

Similar a Open Research

Presentation ciula-paris2013
Presentation ciula-paris2013Presentation ciula-paris2013
Presentation ciula-paris2013Arianna Ciula
 
Materiality and the digital archive
Materiality and the digital archiveMateriality and the digital archive
Materiality and the digital archiveJisc
 
Cyberanthropology
CyberanthropologyCyberanthropology
CyberanthropologyUls Ulsaa
 
Bex lecture 5 - digitisation and the museum
Bex   lecture 5 - digitisation and the museumBex   lecture 5 - digitisation and the museum
Bex lecture 5 - digitisation and the museumBex Lewis
 
Dh presentation
Dh presentationDh presentation
Dh presentationOscarfuzz
 
Data versus Text: 30 years of confrontation
Data versus Text: 30 years of confrontationData versus Text: 30 years of confrontation
Data versus Text: 30 years of confrontationLou Burnard
 
How and why study big cultural data
How and why study big cultural dataHow and why study big cultural data
How and why study big cultural dataLev Manovich
 
Cyberanthropology
CyberanthropologyCyberanthropology
CyberanthropologyUls Ulsaa
 
Library trends and_theory
Library trends and_theoryLibrary trends and_theory
Library trends and_theoryJanet Tillotson
 
Comparative Literature in the Age of Digital Humanities _ On Possible Future ...
Comparative Literature in the Age of Digital Humanities _ On Possible Future ...Comparative Literature in the Age of Digital Humanities _ On Possible Future ...
Comparative Literature in the Age of Digital Humanities _ On Possible Future ...Hina Parmar
 
2014 02-21 media-open_day_talk_slides
2014 02-21 media-open_day_talk_slides2014 02-21 media-open_day_talk_slides
2014 02-21 media-open_day_talk_slidesJames Baker
 
Estado arte de las Humanidades Digitales. Algunos proyectos de investigación
Estado arte de las Humanidades Digitales. Algunos proyectos de investigaciónEstado arte de las Humanidades Digitales. Algunos proyectos de investigación
Estado arte de las Humanidades Digitales. Algunos proyectos de investigaciónGimena Del Rio Riande
 
Mattern Guest Lecture, Understanding Media Studies, 9/21/09
Mattern Guest Lecture, Understanding Media Studies, 9/21/09Mattern Guest Lecture, Understanding Media Studies, 9/21/09
Mattern Guest Lecture, Understanding Media Studies, 9/21/09Shannon Mattern
 
Introduction to digital scholarship and digital humanities in the liberal art...
Introduction to digital scholarship and digital humanities in the liberal art...Introduction to digital scholarship and digital humanities in the liberal art...
Introduction to digital scholarship and digital humanities in the liberal art...kgerber
 
Gender, education & new technologies
Gender, education & new technologies Gender, education & new technologies
Gender, education & new technologies Michael Peters
 
Digital Humanities vs. Information Science
Digital Humanities vs. Information ScienceDigital Humanities vs. Information Science
Digital Humanities vs. Information ScienceHans-Christoph Hobohm
 

Similar a Open Research (20)

Presentation ciula-paris2013
Presentation ciula-paris2013Presentation ciula-paris2013
Presentation ciula-paris2013
 
Second Looks
Second LooksSecond Looks
Second Looks
 
Materiality and the digital archive
Materiality and the digital archiveMateriality and the digital archive
Materiality and the digital archive
 
Cyberanthropology
CyberanthropologyCyberanthropology
Cyberanthropology
 
Bex lecture 5 - digitisation and the museum
Bex   lecture 5 - digitisation and the museumBex   lecture 5 - digitisation and the museum
Bex lecture 5 - digitisation and the museum
 
Dh presentation
Dh presentationDh presentation
Dh presentation
 
Data versus Text: 30 years of confrontation
Data versus Text: 30 years of confrontationData versus Text: 30 years of confrontation
Data versus Text: 30 years of confrontation
 
How and why study big cultural data
How and why study big cultural dataHow and why study big cultural data
How and why study big cultural data
 
Cyberanthropology
CyberanthropologyCyberanthropology
Cyberanthropology
 
Library trends and_theory
Library trends and_theoryLibrary trends and_theory
Library trends and_theory
 
Comparative Literature in the Age of Digital Humanities _ On Possible Future ...
Comparative Literature in the Age of Digital Humanities _ On Possible Future ...Comparative Literature in the Age of Digital Humanities _ On Possible Future ...
Comparative Literature in the Age of Digital Humanities _ On Possible Future ...
 
2014 02-21 media-open_day_talk_slides
2014 02-21 media-open_day_talk_slides2014 02-21 media-open_day_talk_slides
2014 02-21 media-open_day_talk_slides
 
Estado arte de las Humanidades Digitales. Algunos proyectos de investigación
Estado arte de las Humanidades Digitales. Algunos proyectos de investigaciónEstado arte de las Humanidades Digitales. Algunos proyectos de investigación
Estado arte de las Humanidades Digitales. Algunos proyectos de investigación
 
Mattern Guest Lecture, Understanding Media Studies, 9/21/09
Mattern Guest Lecture, Understanding Media Studies, 9/21/09Mattern Guest Lecture, Understanding Media Studies, 9/21/09
Mattern Guest Lecture, Understanding Media Studies, 9/21/09
 
Humanities and ICT
Humanities and ICTHumanities and ICT
Humanities and ICT
 
Introduction to digital scholarship and digital humanities in the liberal art...
Introduction to digital scholarship and digital humanities in the liberal art...Introduction to digital scholarship and digital humanities in the liberal art...
Introduction to digital scholarship and digital humanities in the liberal art...
 
Dh intro
Dh introDh intro
Dh intro
 
Dh presentation 2018
Dh presentation 2018Dh presentation 2018
Dh presentation 2018
 
Gender, education & new technologies
Gender, education & new technologies Gender, education & new technologies
Gender, education & new technologies
 
Digital Humanities vs. Information Science
Digital Humanities vs. Information ScienceDigital Humanities vs. Information Science
Digital Humanities vs. Information Science
 

Último

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 

Último (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

Open Research

  • 1. Open for Research: A Demonstration of Text Analysis Applications and a Discussion of Library Collaboration Opportunities Nat Gustafson-Sundell Journal Acquisitions/ Reference Librarian Assistant Professor Minnesota State University, Mankato See OpenResearch.Weebly.Com for my notes
  • 2. Section 1: 60-70 minutes I. Introduction to Workshop (5-10) II. Overview: A Functional Description of Open Access (15) III. Overview: Digital Humanities, Humanities Computing, eResearch: Background and Theory (20-25) IV. Overview: Text Analysis & Topic Modeling: Background and Theory (20, possibly broken by coffee break) *Coffee Break 2:30-2:45* Section 2: 65-70 minutes (Start around 2:45-50) V. Demonstration: Text Discovery/ Text Preparation/ Text Analysis/ Topic Modeling (30) VI. Group Exploration: Text Analysis/ Topic Modeling (20, less if we need to make up some time) VII. Overview: Content Analysis (15-20, start by 3:40) *Breather* (5 minutes) Section 3: 55-60 minutes (Start around 4:00) VIII. Demonstration: Text Annotation Tools & Relational Databases (20) IX. Group Exploration: Projects (20-25) X. Group Exploration: Collaboration (eResearch Centers and Libraries) (15, less if we need to make up some time) Schedule 2
  • 3. Overview: Open Access . 1 http://openresearch.weebly.com/open-access.html Key Terms: • Gratis/ Libre • Green/ Gold/ “Platinum” • Working Papers, pre-prints, e-prints, etc. • Subject and Institutional Repositories 3
  • 5. https://creativecommons.org/licenses/ Overview: Open Access . 3 • Do not sign away your copyright (unless you have a REALLY good reason) 5
  • 6. Overview: Open Access . 4 “We are all, or are all soon to become, nineteenth centuryists.” - Matthew Jockers 6
  • 7. Overview: Digital Humanities . 1 http://openresearch.weebly.com/digital-humanities-eresearch.html “In 1949, an Italian Jesuit priest, Father Roberto Busa, began what even today is a monumental task: to make an index verborum of all the words in the works of St. Thomas Aquinas and related authors, totaling some 11 million words of medieval Latin. Father Busa imagined that a machine might be able to help him, and, having heard of computers, went to visit Thomas J. Watson at IBM in the United States in search of support … The entire texts were gradually transferred to punched cards and a concordance program written for the project.” (Hockey online) “The History of Humanities Computing,” Susan Hockey, 2004 7
  • 8. Overview: Digital Humanities . 2 “The real origin of that term *digital humanities+ was in conversation with Andrew McNeillie, the original acquiring editor for the Blackwell Companion to Digital Humanities. We started talking with him about that book project in 2001, in April, and by the end of November we’d lined up contributors and were discussing the title, for the contract. Ray *Siemens+ want ‘A Companion to Humanities Computing’ as that was the term commonly used at that point; the editorial and marketing folks at Blackwell wanted ‘Companion to Digitized Humanities.’ I suggested ‘Companion to Digital Humanities’ to shift emphasis away from simple digitization.” (John Unsworth quoted in Kirschenbaum 2-3) “Twitter, along with blogs and other online outlets, has inscribed the digital humanities as a network topology, that is to say, lines drawn by aggregates of affinities, formally and functionally manifest in who follows whom, who friends whom, who tweets whom, and who links to what.” (Kirschenbaum 5) “What is Digital Humanities and What’s It Doing in English Departments?” Matthew G. Kirschenbaum, 2010 8
  • 9. Overview: Digital Humanities . 3 “…the digital humanities can be … a nexus of fields within which scholars use computing technologies to investigate the kinds of questions that are traditional to the humanities, or… who ask traditional kinds of humanities- oriented questions about computing technologies.” (Fitzpatrick online) “Reporting from the Digital Humanities 2010 Conference,” Kathleen Fitzpatrick, 2010 9
  • 10. (Meeks online, 2011) See https://dhs.stanford.edu/comprehending-the-digital-humanities/ Overview: Digital Humanities . 4 10 Image Removed to avoid possibility of copyright infringement
  • 11. Overview: Digital Humanities . 5 “The Landscape of Digital Humanities,” Patrik Svensson, 2010 5 “paradigmatic modes of engagement between the humanities and information technology: information technology as a tool, an object of study, an exploratory laboratory, an expressive medium and an activist venue.” (Svensson online) “…the digital humanities comprise a field in a loose sense.” (Svensson online) “…it seems quite unlikely that the digital humanities would ever become a fully separate field.” (Svensson online) “The complexity of digital humanities as a ‘field’ comes partly from its disciplinary and institutional diversity, and its multiple modes of engagement with information technology.” (Svensson online) “…one interesting question is whether the digital calls for other modes of investigation, collaboration and making that may be partially incompatible with the epistemic commitments of the established discipline or field.” (Svensson online) 11
  • 12. Overview: Digital Humanities . 6 “A Genealogy of Digital Humanities,” Marija Dalbello, 2010 “This paper explores the history of digital humanities that grappled with the epistemological status of technology, as a major field of contemplation of the effects of new media on writing, reading, and interpretation.” (Dalbello 482) Previous to the TLG, according to Karen Ruhleder, “gaining familiarity with the corpus was a life’s work” (1995), but given the TLG, graduate students could “ask questions that formerly could only be answered through comprehensive reading and experience un the field.” (Dalbello 488) The TLG “had a profound restructuring effect on knowledge production in the field of classics … a broader range of ‘legitimate’ research queries could produce a sense of ‘doing complete work’” so that the “researcher could be free to do more intellectually interesting work rather than focusing on learning the corpus” (Dalbello 489) “Seeing patterns and connections out of context liberates them from an archive, as a signifier that attaches itself to a new meaning; searches through digital corpora can produce radical readings and undermine existing interpretations, thus contributing to a program of critical interpretation,” (Dalbello 493) 12
  • 13. Overview: Digital Humanities . 7 “The term post-human … suggests that our sense of what it is to be human has changed – as Katherine Hayles puts it, the post-human is a state of mind, a realization that mankind has finally understood that it is definitely not the centre of the universe. My concern here is to consider the implications of this post-human state of mind for our understanding and practice of the digital humanities.” (Prescott online) “Making the Digital Human: Anxieties, Possibilities, Challenges,” Andrew Prescott, 2012 “For all the rhetoric about digital technologies changing the humanities, the overwhelming picture presented by the activities of digital humanities centres in the Great Britain is that they are busily engaged in turning back the intellectual clock and reinstating a view of the humanities appropriate to the 1950s...” (Prescott online) “…as far as the digital humanities are concerned, interdisciplinarity is just a cover for the lack of a distinctive intellectual agenda … Another major obstacle preventing the digital humanities developing its own scholarly identity is our interest in method. If we focus on modelling methods used by other scholar, we will simply never develop new methods of our own.” (Prescott online) “We might start by seeking closer contact with our colleagues in Cultural and Media Studies.” (Prescott online) “We should be seeking to provide new perspectives on the way in which technology interacts with text.” (Prescott online) 13
  • 14. Overview: Digital Humanities . 8 “The state of the digital humanities: A report and a critique,” Alan Liu, 2011 “A purely economic rationale for the digital humanities might … be that they re-engineer higher education for knowledge work by providing ever smarter tools for working with increasingly global-scale knowledge resources, all the while trimming the need to invest proportionally in the traditional facilities…” (Liu 2011) “I offer a report on the current state of the digital humanities … I will define it with unusual breadth. “Digital humanities’ will here have a supervening sense that combines ‘humanities computing’ or ‘text-based’ digital humanities … and new media studies …” (Liu 10) “Currently, I fear, the digital humanities are not ready to take up their full responsibility because the field does not posses an adequate critical awareness of the larger social, economic, and cultural issues at stake … the whole amounts to the lack of a mental and policy firewall against postindustrial takeovers of the digital idea …” (Liu 11) “The digital humanities are on the threshold of a new interpretive paradigm. The old paradigm, especially on the text-oriented side of the field, was constraining. That paradigm was empirical … about ‘hypothesis testing and empirical validation’. … The new paradigm allows computers and humans to share responsibility for the full act of interpretation, including the component acts of hypothesis-framing, observation, discovery, analysis, testing, reiterative hypothesis-framing, etc.” (Liu 21) “…the text-oriented side of the digital humanities has been almost wholly uninterested in any social, political, economic, or cultural inquiry into the contexts and implications of information technology.” (Liu 30) 14
  • 15. Overview: Digital Humanities . 9 “….attempts to work toward a theoretical foundation for humanities computing surfaced at the outset of scholarly publication in the field and have been in progress ever since – with no consensus in sight … the considerable variety in how humanities computing is … conceived is a sign of health rather than decay.” (McCarty 1228) “….humanities computing … need not wait on the emergence of a theoretical framework … its semidirected, semicoherent activities are no discredit, rather the norm for an experimental field.” (McCarty 1233) “…the crafted objects of humanities computing are … primary ‘metatheoretical statements’ produced by those who ‘think in things rather than words.’ A new definition of scholarship, demanind new abilities, would seem to follow.” (McCarty 1227) “The question of reinventing our scholarly forms takes us beyond any list of projects to what we might consider the fundamental ‘project’ of humanities computing … This is, in simple but far-reaching terms, epistemological: to ask, in the context of computing, what can (and must) be known of our artifacts, how we know what we know about them, and how new knowledge is made.” (McCarty 1231) “The socioacademic function of humanities computing can be understood as an elaboration of the Galisonian trading zone, which the field establishes as a methodological commons and within it, takes the role of merchant trader among mutually divergent academic cultures.” (McCarty 1232) “Play, apparently without conscious direction, is a recognized factor in scientific discovery.” (McCarty 1232) “Humanities Computing,” Willard McCarty, 2003 15
  • 16. Overview: Digital Humanities . 9 (McCarty 1225) 16 Image Removed to avoid possibility of copyright infringement
  • 17. Overview: Text Analysis . 1 http://openresearch.weebly.com/text-analysis.html Some Terms: • Corpus • Concordance • Key Words in Context (KWIC) • Stopwords • Collocation • Frequency (and Frequency Distribution) • Lemmatization • Hapax legomena (dis, tris, tetrakis) • Etc. “Text-analysis tools have their roots in the print concordance. The concordance is a standard research tool in the humanities that goes back to the thirteenth century. … The challenge before us is to question our procedural habits and presuppositions as to what are legitimate recombinations – to forget the concordance and ask anew how we can analyse text with a computer and whether such computer-assisted interpretations are interesting in and of themselves. We need to play again and make playpens available to our colleagues. … I therefore want to propose a very different image of what a concordance is … I call it a hybrid (or monster) because it is authored not just by the original author, but also by the user’s choices and the procedures used to generate it … it is neither the work of the original author nor that entirely of the provoker of the concordance.” (Rockwell 210,213) 17
  • 18. Overview: Text Analysis . 2 Macroanalysis, Matthew Jockers, 2013 “The literary scholar of the twenty-first century can no longer be content with anecdotal evidence, with random ‘things’ generated from a few , even ‘representative’ texts. We must strive to understand these things in the context of everything else, including a mass of possibly ‘uninteresting’ texts.” (Jockers 8) “Today’s student of literature must be adept at reading and gathering evidence from individual texts and equally adept at accessing and mining digital-text repositories.” (Jockers 9) “At the macro scale , we see evidence of time and gender influences on theme and style. By superimposing these two network snapshots in our minds, we can begin to imagine a larger context in which to read and study nineteenth-century literature. What is clear is that the books we have traditionally studied are not isolated books. The canonical greats are not even outliers: they are books that are similar to other books…” (Jockers 168) “…macroscopic investigation is contextualization on an unprecedented scale.” (Jockers 27-8) “It is the exact interplay between the macro and micro scale that promises a new, enhanced, and perhaps even better understanding of the literary record. The two approaches work in tandem and inform each other. Human interpretation of the ‘data,’ whether it be mined at the macro or micro level, remains essential … The most fundamental and important difference in the two approaches is that the macroanalytic approach reveals details about texts that are for all intents and purposes unavailable to close-readers of the texts.” (Jockers online) See Example Project 18
  • 19. Overview: Text Analysis . 3 Distant Reading, Franco Moretti, 2013 “Writing about comparative social history, Marc Bloch once coined a lovely ‘slogan,’ as he himself called it: ‘years of analysis for a day of synthesis’; and if you read Braudel or Wallerstein you immediately see what Bloch had in mind. The text which is strictly Wallerstein’s, his ‘day of synthesis’, occupies one-third of a page … the rest are quotations … Years of analysis; other people’s analysis, which Wallerstein’s page synthesizes into a system. Not, if we take this model seriously, the study of world literature will somehow have to reproduce this ‘page’ – which is to say: this relationship between analysis and synthesis – for the literary field. But in that case, literary history will quickly become very different from what it is now: it will become ‘second hand’: a patchwork of other people’s research, without a single direct textual reading. Still ambitious, and actually even more so than before (world literature!); but the ambition is now directly proportional to the distance from the text: the more ambitious the project, the greater must the distance be.” (Moretti 47-8, 2000) “Distant reading: where distance … is a condition of knowledge: it allows you to focus on units that are much smaller or much larger than the text: devices, themes, tropes – or genres and systems. And if, between the very small and the very large, the text itself disappears, well, it is one of those cases when one can justifiably say, Less is more. It we want to understand the system in its entirety, we must accept losing something…” (Moretti 48-9, 2000) 19
  • 20. Overview: Text Analysis . 4 Distant Reading, Franco Moretti, 2013 (Moretti 221) See also http://litlab.stanford.edu/LiteraryLabPamphlet2B.Figures.pdf 20 Image Removed to avoid possibility of copyright infringement
  • 21. Overview: Text Analysis . 5 Reading Machines, Stephen Ramsay, 2011 “…literary criticism operates at a register in which understanding, knowledge, and truth occur outside of the narrower denotative realm in which scientific statements are made. It is not merely the case that literary criticism is concerned with something other than the amassing of verified knowledge. Literary criticism operates within a hermeneutical framework in which the specifically scientific meaning of fact, metric, verification, and evidence simply do not apply … ‘evidence’ stands as a metaphor for the delicate building blocks of rhetorical persuasion … ‘Verification’ occurs in a social community of scholars whose agreement or disagreement is almost never put forth without qualification.” (Ramsay 7, 2011) “If text analysis is to participate in literary critical endeavor in some manner beyond fact-checking, it must endeavor to assist the critic in the unfolding of interpretive possibilities. We might say that its purpose should be to generate further ‘evidence,’ though we do well to bracket the association that term holds in the context of less methodologically certain pursuits. The evidence we seek is not definitive, but suggestive of grander arguments and schemes.” (Ramsay 10, 2011) “Critics often use the word ‘pattern’ to describe what they’re putting forth, and that word aptly connotes the fundamental nature of the data upon which literary insight relies. The understanding promised by the critical act arises not from a presentation of facts, but from the elaboration of a gestalt, and it rightfully includes the vague reference, the conjectured similitude, the ironic twist, and the dramatic turn. In the spirit of inventio, the critic freely employs the rhetorical tactics of conjecture – not so that a given matter might be definitely settled, but in order that the matter might become richer, deeper, and ever more complicated. (Ramsay 16, 2011) 21
  • 22. “Although it is true that we do not typically ask *students+ to cast yarrow stalks or choose things at random, we do ask them to find some pattern beyond the apparent pattern of the text … We ask them to select, isolate, notice – to consider a small group of sub-patterns from among the infinity of patterns that make up the text. Having done this, we then ask them to re-articulate those patterns in narrative form as elucidations of the texts in which they occur. We call those articulations ‘meanings’, and we call the act of embedding them in a narrative framework ‘interpretation’. … With algorithmic criticism, one would not ask how the ends of interpretation were or were not justified by means of the algorithms imposed, but rather, how successful the algorithms were in provoking thought and allowing insight.” (Ramsay 171, 173, 2003) Overview: Text Analysis . 6 “…the real message of our technology is something entirely unexpected – a writerly, anarchic text that is more useful than the readerly, institutional text … This is, if you like, the basis of the Screwmeneutical Imperative. There are so many books. There is so little time. Your ethical obligation is neither to read them all nor to pretend that you have read them all, but to understand each path through the vast archive as an important moment in the world’s duration – as an invitation to community, relationship, and play.” (Ramsay online, 2010) Reading Machines, Stephen Ramsay, 2011, and other essays “… As with any text-analytical result, we can weave a narrative through the gaps. For this reason, we would do better to say it carves a new path through the document space, which in turn allows us to reread and rethink…” (Ramsay 80, 2011) 22
  • 23. Overview: Text Analysis . 7 “Tampering with the Text to Increase Awareness of Poetry’s Art,” Estelle Irizarry, 1996 “Computer-Assisted Reading: Reconceiving Text Analysis,” Stefan Sinclair, 2003 “What is Text Analysis, Really?” Geoffrey Rockwell, 2003 “The value of the computer-mediated exercises is that they enable readers to readily perceive and appreciate features that are not obvious in a conventional reading of a printed text.” (Irizarry 155) “The computer is, among other things, an instrument uniquely suite to play activities ...” (Irizarry 156) “By thinking more about process than outcomes, about multiplying meanings (not data) rather than converging on answers, we can consider how to make the computer an extension of the reading and interpretive practices in which humanists are already engaged.” (Sinclair 176) “Playful experimentation is a pragmatic approach of trying something, seeing if you obtain interesting results, and if you do, then trying to theorize why those results are interesting rather than starting from articulated principles.” (Rockwell 214, 2003) “Assembling and disassembling a text, like playing with blocks of Lego, may not necessarily contribute immediately to its understanding, but it is likely to contribute to the aggregate experience of the text in valuable ways. … I am suggesting that play is an integral part of a humanist’s interpretive activities…” (Sinclair 181) “…we should rethink our tools on a principle of research as disciplined play.” (Rockwell 213) 23
  • 24. Overview: Text Analysis . 8 “Between Language and Literature: Digital Text Exploration,” Geoffrey Rockwell and Stefan Sinclair, 2009 “Just because we can’t extract the same meaning(s) from a representation in the way we might from a traditional text does not mean that representations can’t be read. … As everyone should know by now, looking at visualizations of texts is a form of exploring and should be taken not as analysis, but exploration. … We can (and must) learn new ways of reading texts, and to embrace mathematical abstraction and visualization as interpretive allies rather than black-box enemies.” (Gibbs online, 2013) “We have found that students enjoy submitting their own texts to these types of analysis tools, where they discover aspects of their writing of which they were not aware (like a propensity for repeating a given phrase. An engaging activity can be to have students try to find texts on the web that most closely resemble the data profile of their own texts. Do so can provoke interesting results and awaken the curiosity of the students for the relationship between text analysis and linguistic proficiency.” (Rockwell and Sinclair, 2009) “Learning to Read Again,” Fred Gibbs, 2013 24
  • 25. Overview: Topic Modeling . 1 http://openresearch.weebly.com/topic-modeling.html (Blei 78) 25 Image Removed to avoid possibility of copyright infringement
  • 26. Overview: Topic Modeling . 2 Time Years Sing Past Land Songs Long Things Divine Blood … 45% Man Body Soul Poems Woman Make True Large Beauty Times … 23% Thee Thy Soul Joy Life Ship Space Joys Long … 10% Earth Men Face Strong Young Love Cities Children Women Fill … 10% World Life States War America Great Present Future Real Today … 5% 26
  • 27. Overview: Topic Modeling . 3 “Topic modeling gives us a way to infer the latent structure behind a collection of documents. In principle, it could work at any scale, but I tend to think human beings are already pretty good at inferring the latent structure in (say) a single writer’s oeuvre. I suspect this technique becomes more useful as we move toward a scale that is too large to fit into human memory.” (Underwood online) “Topic modeling made just simple enough,” Ted Underwood, 2012 “…I’m not sure how much value they will have as evidence. For one thing, they require you to make a series of judgment calls that deeply shape the results you get (from choosing stopwords, to the number of topics produced, to the scope of the collection). The resulting model ends up being tailored in difficult-to-explain ways by a researchers preferences.” (Underwood online) See a variety of Topic Model visualizations “…excitement about the use of topic models for discovery needs to be tempered with skepticism about how often the unexpected juxtapositions LDA creates will be helpful, and how often merely surprising. A poorly supervised machine learning algorithm is like a bad research assistant. It might produce some unexpected constellations that show flickers of deeper truths; but it will also produce tedious, inexplicable, or misleading results.” (Schmidt 50) “Words Alone: Dismantling Topic Models in the Humanities,” Benjamin M. Schmidt, 2012 27
  • 28. Demonstration: Text Analysis: Text Discovery . 1 http://openresearch.weebly.com/tools.html http://chroniclingamerica.loc.gov/ 28
  • 29. Demonstration: Text Analysis: Text Discovery . 2 29
  • 30. Demonstration: Text Analysis: Text Discovery . 3 30
  • 31. Demonstration: Text Analysis: Text Discovery . 4 http://www.gutenberg.org/ 31
  • 32. Demonstration: Text Analysis: Text Discovery . 5 32
  • 33. Demonstration: Text Analysis: Text Discovery . 6 33
  • 34. Demonstration: Text Analysis: Text Preparation . 1 http://openresearch.weebly.com/tools.html 34
  • 35. Steps: 1. Reg Ex: ̶ → □ 2. Reg Ex: n→ ◊ 3. Reg Ex: r → ◊ 4. Normal: ; → ;□ 5. Normal: , → ,□ 6. Normal: . → .□ 7. Normal: ? → ?□ 8. Normal: ! → !□ 9. Normal: : → :□ 10. Reg Ex: (s{2,}) → □ 11. Normal: -□ → ◊ 12. Normal: - → ◊ 13. Add carriage returns for every “document” 14. Encode in UTF-8 Demonstration: Text Analysis: Text Preparation . 2 35
  • 36. Demonstration: Text Analysis: Text Preparation . 3 36
  • 37. Steps: 1. Reg Ex: (.+?) → □ Find every instance of parentheses enclosing any text. Demonstration: Text Analysis: Text Preparation . 4 37 Image Removed to avoid possibility of copyright infringement
  • 38. Demonstration: Text Analysis: Voyant . 1 http://openresearch.weebly.com/tools.html http://openresearch.weebly.com/texts.html http://voyant-tools.org 38
  • 44. <ctrl> A <ctrl> C Demonstration: Text Analysis: Voyant . 7 44
  • 46. Demonstration: Text Analysis: Voyant . 9 46 Image Removed to avoid possibility of copyright infringement
  • 47. Demonstration: Text Analysis: Topic Modeling Tool . 1 http://openresearch.weebly.com/tools.html http://code.google.com/p/topic-modeling-tool/ 47
  • 48. Demonstration: Text Analysis: Topic Modeling Tool . 2 48
  • 49. Demonstration: Text Analysis: Topic Modeling Tool . 3 1. Reg Ex: n→ ◊ 2. Reg Ex: r → ◊ 49
  • 50. Demonstration: Text Analysis: Topic Modeling Tool . 4 50
  • 51. Demonstration: Text Analysis: Topic Modeling Tool . 5 51
  • 52. Demonstration: Text Analysis: Topic Modeling Tool . 6 52
  • 53. Demonstration: Text Analysis: Topic Modeling Tool . 7 53
  • 54. Concept? Language Type? Sentiment Cluster? Demonstration: Text Analysis: Topic Modeling Tool . 8 54
  • 55. Group Exploration: Text Analysis . 1 http://openresearch.weebly.com/tools.html ??? 55
  • 56. Leaves of Grass 1867 Group Exploration: Text Analysis . 2 56
  • 57. Leaves of Grass 1892 Group Exploration: Text Analysis . 3 57
  • 58. Leaves of Grass 1892 Group Exploration: Text Analysis . 4 58
  • 59. There is 1 document in this corpus with a total of 130,506 words and 14,594 unique words. Most frequent words in the corpus: old (303), shall (265), life (261), love (261), soul (245). More… Leaves of Grass 1892 Group Exploration: Text Analysis . 5 59
  • 60. Word 1892 Count 1892 Ratio 1867 Count 1867 Ratio old 303 0.00232173 157 0.00157965 shall 265 0.00203056 218 0.0021934 life 261 0.00199991 161 0.0016199 love 261 0.00199991 218 0.0021934 soul 245 0.00187731 160 0.00160984 long 236 0.00180835 154 0.00154947 earth 229 0.00175471 199 0.00200223 night 210 0.00160912 168 0.00169033 man 206 0.00157847 190 0.00191168 day 188 0.00144055 140 0.00140861 men 185 0.00141756 182 0.00183119 know 179 0.00137158 146 0.00146898 death 178 0.00136392 131 0.00131805 come 175 0.00134093 131 0.00131805 time 175 0.00134093 111 0.00111682 great 174 0.00133327 159 0.00159977 world 157 0.00120301 78 0.0007848 sea 153 0.00117236 124 0.00124762 hear 137 0.00104976 115 0.00115707 like 137 0.00104976 94 0.00094578 face 132 0.00101145 111 0.00111682 hand 128 0.0009808 101 0.00101621 good 126 0.00096547 100 0.00100615 body 124 0.00095015 114 0.00114701 young 122 0.00093482 101 0.00101621 Total 130506 99389 Sorted by 1892 Ratio Same list re-sorted by 1867 Ratio Leaves of Grass 1867, 1892 Group Exploration: Text Analysis . 6 60
  • 61. Overview: Content Analysis . 1 http://openresearch.weebly.com/content-analysis.html “Content Analysis” or “message analysis”? • Rhetorical Analysis • Narrative Analysis • Discourse Analysis • Structuralist (or Semiotic) Analysis • Interpretative Analysis • Conversation Analysis • Critical Analysis • Normative Analysis (Neuendorf 5-8) “Content analysis is a summarizing, quantitative analysis of messages that relies on the scientific method and is not limited as to the types of the variables that may be measured or the context in which the messages are created or presented.” (Neuendorf 10) “Content analysis is any technique for making inferences by objectively and systematically identifying specified characteristics of messages. (Holsti 25) 61
  • 62. Overview: Content Analysis . 2 Concepts and Design: • Quantitative/ Qualitative • Deductive/ Inductive • Manifest/ Latent Content • Content/ Form • Reliability, Validity, Generalizability, Replicability • Unitizing • Etc. • Etc. Content analysis “ ‘learned its methods from cryptography, from the subject classification of library books, and from biblical concordances, as well as from standard guides to legal precedents’ “ (Marvick, quoted in Rogers 214, 1994, quoted in Neuendorf 31) 62
  • 63. Message produced by source A: time t1 Message produced by source A: time t2 Content Variable X A X t1 A X t2 Trends in communication content Adapted from Figure 2-2, Holsti 28 Overview: Content Analysis . 3 63 Image Removed to avoid possibility of copyright infringement
  • 64. Adapted from Figure 2-5, Holsti 30 Messages produced by source A Content Variable X A X Relationship of content variables to each other Content Variable Y A Y Overview: Content Analysis . 4 64 Image Removed to avoid possibility of copyright infringement
  • 65. Message produced by source A Message produced by source B Content Variable X A X B X Differences between communicators Adapted from Figure 2-6, Holsti 30 Overview: Content Analysis . 5 65 Image Removed to avoid possibility of copyright infringement
  • 66. Content Analysis can be used to: • To describe trends in communication content • To relate known characteristics of sources to the messages they produce • To audit communication content against standards • To analyze techniques of persuasion • To analyze style • To relate known characteristics of the audience to messages produced for them • To describe patterns of communication • To secure political and military intelligence • To analyze psychological traits of individuals • To infer aspects of culture and cultural change • To provide legal evidence • To answer questions of disputed authorship • To measure readability • To analyze the flow of information • To assess responses to communication Adapted from Table 2-1, Holsti 26 Overview: Content Analysis . 6 66
  • 67. Overview: Content Analysis . 7 A 1935 study by Edgar Dale analyzed the themes and emphases of American motion pictures. (Neuendorf 33-4) A World War II study “systematically analyzed radio broadcasts from Axis powers …. Allied forces were able to estimate the concentration of German troops in various locations by comparing music played on German radio stations with music played elsewhere in occupied Europe.” (Neuendorf 37) A 1994 study by Kathleen Carley tracked “the representation of robots in science fiction” from “three different time periods – pre-1950s, the 1950s and 1960s, and the 1970s and 1980s.” (Neuendorf 184-6) See visualization p. 185 A 1963 study by E.S. Shneidman analyzed political rhetoric “to infer personality traits of the speaker from logical and cognitive characteristics of his verbal production.” The study categorized “Idiosyncrasies of reasoning and cognitive maneuvers in the rhetoric of John F. Kennedy, Richard M. Nixon, and Nikita Krushchev:” • Idiosyncrasies of reasoning: o Irrelevant premise o Argumentum ad populum o Complex question o Derogation o Stranded predicate o Truth-type confusion • Cognitive Maneuvers: o To enlarge or elaborate the preceding o To smuggle debatable point into alien context o To be irrelevant o To allege but not substantiate o To introduce new notion Adapted from Holsti 72-4 67
  • 68. Overview: Content Analysis . 8 ArticleManager 68
  • 69. This is a typical hierarchy of forms linked from the main form. The Group and Actor forms are more complexly related, so possibly more interesting to look at, but would not fit so nicely on a single page. Overview: Content Analysis . 9 ArticleManager 69
  • 70. Overview: Content Analysis . 10 ArticleManager 70
  • 71. Overview: Content Analysis . 11 ArticleManager 71
  • 72. Demonstration: Text Annotation: CATMA . 1 http://openresearch.weebly.com/tools.html http://www.catma.de/ 72
  • 76. Demonstration: Text Annotation: CATMA . 5 What tags should I use? - Bottom-up (inductive) - Top-down (deductive) What’s most appropriate for the text? What is the unit of analysis? What do I want to analyze for – what am I wondering about> What do I think I might find? How is play a part of this process? How might hypothesis testing enter the process? How might a collaborative project proceed? 76
  • 78. Demonstration: Text Annotation: CATMA . 7 If there are too many tags, it is not possible to close-read the tags with the text. The tags only have value for analysis. As tags proliferate, it can be difficult to stop adding tags for each new nuance. Should I tag the last line. Is this the “Object’s Action Property?” How to note sarcasm? Seek out appropriate models: • See Narratology • See Content Analysis 78
  • 82. Example Projects . 1 http://openresearch.weebly.com/example-projects.html 82
  • 83. Example Projects . 2 http://www.oldbaileyonline.org/obapi// 83
  • 85. Example Projects . 4 Description of the BeardStair project Cameron Blevins describes how he became a Digital Humanist by text mining Martha Ballard’s diary. 85
  • 88. Example Services . 1 Google Spreadsheet of Digital Humanities Centers http://openresearch.weebly.com/eresearch-services-examples.html To your knowledge, does your university or library support Digital Humanities research or eResearch more broadly? If no, would you feel comfortable approaching your library to find out if the library might help out? Or someone in your department, or your college/ university, or through an association? Would you feel comfortable continuing to explore the possibilities on your own? What kinds of project(s) can you imagine pursuing? (or have you pursued?) Would there be any need for, or value in, seeking collaboration? What kind of collaboration (or support) would be valuable? What kind of budget environment might factor into what services are available or could be made available to you? What kinds of services would be most valuable to you? (Ex: Videos, Link Lists, Project Descriptions, Classes, One on One Consultations, Project Development support, etc.) 88
  • 89. 89
  • 90. 90
  • 91. 91
  • 92. 92

Notas del editor

  1. MeJournal Acquisitions/ Reference Librarian Art,Communication Studies &amp; Mass Media liaison Formerly Electronic Resources Librarian, Chair of Scholarly Communications Committee at NULCreated data model and programmed database application for a complex, long-term political science content analysis projectAs librarian, created data models and programmed database applications:To track resources purchased per year, expenditures, and budgets (the main goal was an easy-to-use interface for tech averse librarians)To track resources in a large variety of ways (LC, Dewey, Year, Language, and so on), along with Circulation and Expenditure informationCitation Analysis prototypeAT MSU, currently working with a graduate student team on a tool to track electronic resource usage statistics: SUSHI client, repository, and reporting functionalityPrevious Career: Treasurer of Software Development Company and Business Manager of Market Research Firm (basically a social science lab for hire) The goals of this workshopOverview awareness of Open Access, Digital Humanities, Text Analysisincluding familiarity with significant concepts and jargon in these areasas well as a notion of some of the differences among the leading thinkers in these areas.Awareness of at least three tools that can be used for text analysisalong with some ideas about how to apply these tools to actual projectsas well as some ideas about how these tools might be used for pedagogical purposesI want you to walk out of here comfortable enough to consider using Voyant or CATMA on your own and, if you want to pursue any of the areas under discussion further, I want you to have some confidence that you know how to proceed.
  2. I’ll apologize in advance for all of the text on the slides for the overviews of the Digital Humanities and Text Analysis. There are several important essays in both areas. To give you a sense of these papers, I provided numerous quotations. I will not be reading all of the quotations, although I will draw your attention to several as we move along.If you don’t have time to read the quotations today, you will find them at the website in the slideshow. I have also included a works cited section in the slideshow.
  3. Some DH practitioners don’t seem aware of the importance of Open Access, but it is an essential background concept.Origins of OA in serials crisisOld Publishing Model: Scholar signs away copyright to publisher for freeNew Publishing Model: Scholar retains copyright (library as publisher, other “platinum” OA), or at least retains some rights from commercial publishersOA termsOA venuesThe key to Open Access is rights.
  4. Alternative to Open Access is the Public DomainIn U.S., everything published before 1923, and a selection of post 1923 publications depending on a variety of conditionsCopyright shouldn’t cover text analysis, but publishers hire expensive lawyers to block usage in attempt to profit by another means. 4 conditions of Fair Use:The purpose and character of the use, including whether such use is of commercial nature or is for nonprofit educational purposesThe nature of the copyrighted workThe amount and substantiality of the portion used in relation to the copyrighted work as a wholeThe effect of the use upon the potential market for, or value of, the copyrighted work
  5. Traditional origin story: one protagonist on a questThe genealogy also sometimes acknowledges other ancestors, such as “quantitative approaches to style and authorship studies” (Hockey, 2004)What is excluded and why?See content analysis studies on art (Gordon, 1952; Paisley, 1968), cartoons (Ehrle and Johnson, 1961), music (Paisley, 1964; La Rue, 1967), the readability of texts (Gray and Leary, 1935; Flesch, 1943; Lorge, 1944; Kearl, 1948), and so forth (from Holsti, 1969).How “should” humanists engage with tools? How “should” humanists conduct research?Ex: Can humanists utilize social scientific methodologies? … and would this lead toward a “consilient” humanities? (a humanities consistent with scientific understanding vs. current “quasi-creationism,” a la Gottschall’s formulation)There is a great deal of argumentation about methodology, which I’ll try to cover in some degree later. For now:The truth is that there don’t seem to be many humanists who actually “get” scientific methodology More often, scientific jargon is used (words like hypothesis and falsifiability), but the usage can be misleading. Or quantification is used to make a “case,” but the case is actually a rhetorical argument, not the report of a scientific study. My position is nuanced. I think we need to beware of the abuse of scientific terms for non-scientific studies. I do think there is a place for scientific methodologies in many kinds of humanities studies, so, for example, the humanities could learn much by studying content analysis, but:I tend to follow Stephen Ramsay, whom you’ll meet later. He makes the case for “homegrown” humanities-based methodologies.And anyway, should the computer really be at the heart of the story?The computer concordance replaces book concordances and index card concordances, for example. Isn’t the computer just a step on a ladder? So what is the ladder?(how much longer will we be thinking in terms of “computers” as such anyway? How many computers do you have in your home – do you know?)
  6. Digital Humanities vs. The Digital HumanitiesSingular: A field itself, standing on its own: theory(ies), programs, departments, majorsPlural: A variety of research in a range of humanities fieldsSingular: The digital as object of study? Leading to newly emergent object(s) of study?Plural: The digital (information technology) as instrument(s) of study?Singular seems to be preferred in writing, while plural often occurs when spoken … or is the singular the formal version and the plural informal?Plural: Recalls the history of the field - Humanities ComputingHumanities ComputingText Digitization ProjectsTool DevelopmenteResearch (…eScience, Social Sciences Computing)“Digital Humanities” as a trend, as a fraud…The Tweeting Humanities?Actually, there seem to be vastly more DH practitioners not tweeting or blogging than doing so. However, Kirschenbaum is certainly right that there do seem to be “cliques” among the folks active on Twitter.Are the (relatively) few DH practitioners on Twitter representative of the field? DH tweets and blogs remain useful (if not always efficient)
  7. An awkward sentence, but nicely inclusive of the ongoing range of scholarly activity called “Digital Humanities”- Note that this blog post talks much about the DH “big tent,” but the substantive literature on and about the field reveals deep, deep rifts
  8. 4 Part Article Series, of which this is #2Note his observations of the fieldSvensson refers to Gary Hall: “It follows that the consequences and implications of digital media for research into cultural studies themes, problematic, and questions cannot be explored simply by using the recognized, legitimate, preconstituted, disciplinary forms of knowledge: literary studies, philosophy, sociology, history, psychoanalysis, and so on. Digital media change the very nature of such disciplines, rending them ‘unrecognizable’ as Derrida says of psychoanalysis.” (Hall 2008, 81)The reference anticipates the positions of some later papers, of which I’ll mention 2 in a moment: Alan Liu, Andrew Prescott
  9. Not really a genealogy: really, a partial lit reviewShe tracks some of the impacts of DH practices on various aspects of scholarship, including methods, expectations, organization, and scope of practice. SEE her summary of Ruhleder:The Thesaurus Linguae Graecae (TLG) is a digital collection of “the canon of Greek authors and works,: which was “started in 1972 by Theodore Brunner at the University of California, Irvine.” “this resource had a profound restructuring effect on knowledge production in the field of Classics” (Dalbello 489)She also discusses concepts such as ‘frenetic’ reading as ‘meta reading’(492) which we will take up, at least implicitly, when we discuss approaches to text analysis.
  10. Note the big play at the start – does he deliver?Does Prescott accurately portray the range of the Digital Humanities?Does Prescott offer real solutions? Or are these solutions already encompassed by DH? (recall all of Svensson’s modes of engagement; and consider, for example, topics covered at conferences, such as the Chicago Colloquium … video game platform studies, etc.)Is this scholarly colonialism? Simply naïvete? Showmanship? Valid opinion and worthy of pursuit?
  11. Interesting essay too long to handle fairly here, but Liu shows little interest himself in “fairness,” so I’m not troubled.He claims “unusual breadth” in his construction of DH, but if anything, his DH is unusually narrow … in order to provide a basis for his next rhetorical move?Note how he sets up a problem for DH, so that DH must be “rescued” from its practitioners. While he may be right in interesting ways, the colonialism of the project staggered me at first reading.“…the single most important theoretical development in the digital humanities in recent years has been the explosion of non-empirical interpretive paradigms”Liu is right that there have been exciting developments, but his concept of the empirical throughout is bizarre. Maybe he is trying to strike a Derridean pose, but the truth is that empiricism embraces the experiential, so the new paradigms of play and deformance are as empirical as other observational methods.Liu’s references to the “text-oriented side of the field” are usually back-handed, at best. I think the last quote requires some empirical testing because, given my own reading, it seems to me he must be disallowing any evidence that may contradict him.
  12. Willard McCarty’s essay seems to me to be just a little bit wiser and farther-seeing than the previously described papers. SEE the year.
  13. If you are just starting to think about text analysis or listen in on conversations about text analysis, the vocabulary might seem off-putting, but there is really very little to it….In addition, if you look at text analysis tools, you might feel overwhelmed by the options seeming available, but you will quickly notice that the options to be variations on a limited set of themes…
  14. So let’s admit Jockers is a heavy-handed writer, thus he characterizes critical insights as “random ‘things’” and tells us what we “must” do, but next, let’s consider what he’s offering.Jockers details many interesting applications of macroanalysis“by providing … a fuller sense of the literary-historical milieu” (28) of textsAs useful corrective to inaccurate literary historical generalizationsFor authorship attribution based on type-token ratio, or by analysis of the use of “little words,” and so onFor exploring genre, period, gender and so onFor exploring influenceTo compare literary historical or comparative literature concepts, such as the idea of “national voice”Jockers does seem at times to have lost sight of the trees for the forest. It’s hard to imagine him actually close reading a book, esp The Wasteland, In Parenthesis, etc.For some kinds of studies, I think Jockers is just plain right. Some kinds of literature problems can be solved (esp. something like Authorship Attribution). But is this criticism? We’ll talk about this more.
  15. I quote Moretti at extreme length, but I want to be clear about the origins of the phrase “Distant Reading” – which has been a marketing slogan to rival “Digital Humanities”Moretti was arguing for a study of world literature which would not rely on close reading alone, but would also, or perhaps instead, rely on the scholarship of others.However, his formulation of Distant Reading does nicely encompass computer-assisted text analysis.As it happens, Moretti has also gone on to direct the Stanford Literary Lab (founded with Matthew Jockers). He has been a leader of the kind of quantitative, computer-assisted text analysis studies we saw with Jockers.
  16. “Style, Inc.: Reflections on 7,000 Titles” (lots of graphs descriptive of changes in … titles)“Network Theory, Plot Analysis” – see Figure“The Slaughterhouse of Literature”Here we might observe the danger of quantitative analysis seeming scientific, but not actually conducted scientifically. He conjectures that the ongoing popularity of Sherlock Holmes (“sociallysupercanonical right away, but academically canonical only a hundred years later), by Arthur Conan Doyle, results from Doyle’s use of clues in his stories. He compares Doyle’s stories to those of his contemporaries, tracking whether and how clues are used in their stories. Moretti finds what he sets out to find. Doyle does use clues more often than his contemporaries (but perhaps a little less consistently than Moretti expected)He does not set up a control or look for disconfirmation or even consider other possibilities. (ex: How about character traits – Sherlock is modelled on Poe’s Dupin. Sherlock on TV today relies on the character traits of both, much less on clues.)Looking ahead, we might say that Moretti succeeds because he works in the spirit of “inventio,” he “freely employs the rhetorical tactics of conjecture” (Ramsay 16) – I’m quoting the words of Stephen Ramsay.
  17. If you pick up just one book on text analysis, I suggest Stephen Ramsay’s Reading Machines. 98 pages of collected essays.“…for decades the dominant assumption within humanities computing … has been that if the computer is to be useful to the humanities, its efficacy must necessarily lie in the aptness of the scientific metaphor for humanistic study. This work takes the contrary view and proposes that scientific method and metaphor … is, for the most part, incompatible with the terms of humanistic endeavor.” (Ramsay x)
  18. I think Ramsay suggests the wisest path forward for a literary criticism that makes use of quantification (perhaps excepting some kinds of questions, as we saw in Jockers, but then, these may not be literary critical).In Ramsay, I think we see the best encapsulation of what Liu considered to be the “new interpretive paradigm.”In addition, Ramsay writes about “deformance” (from Jerome McGann and Lisa Samuels) and “tamperings” (from Estelle Irizarry), in which a text is made unfamiliar, thus leading to new thinkings … there is a cottage industry in talking about text analysis as play…
  19. Sinclair also quotes Mike McCullough from Abstracting Craft, 1996: “… it is a distinct advantage of computation to introduce play; this is a natural consequence of working in bits.”I have not read McCullough, in part because my library offers access only through a very bad online interface (humorously enough), but one might see how the chapter on “Play” might help broadly to inform a discussion of play with Text Analysis tools:“There is much to be said for play in a medium. If a medium is defined by its affordances and constraints, then learning consists of exploring these properties. Experimentation is especially useful for becoming familiar with constraints: we learn from our mistakes. We must accept that beginning work in a new medium will be full of setbacks. There will also be fortuitous discoveries, however, particularly of affordances. Design is not only invention, but also sensitivity to a medium. Craft cannot be merely in service of technique, or of inappropriately conceived ends. The craftsman must begin to feel something about the artifacts, and only certain moves will feel right.”
  20. These quotations draw our attention to pedagogical aspects of Text Analysis Tools. We’ll talk more about pedagogy later when thinking about projects, but I want to note here two issues. Students can use TA tools possibly to improve, or at least to reflect upon their own writing. We must ask whether students not taught TA tools are being deprived, are not being taught competencies that might be growing increasingly “basic”
  21. From a topic modeling (LDA) perspective, a text consists of some number of topics, each of which makes up some percent of the text. A topic can be thought of as a “bag of words.” We can think of a text as resulting from a number of random drawings from those bags of words based on the percentage allocation of topics (and the numbers of various words in those bags will dependon the percentage allocation of words within those topics).“One way to think about how the process of topic modeling works is to imagine working though an article with a set of highlighters. As you read through the article, you use a different color for the key words of themes within the paper as you come across them. When you were done, you could copy out the words as grouped by the color you assigned them. That list of words is a topic, and each color represents a different topic. Note: this description is inspired by the following illustration from David Blei’saricle, which is one of the best visual representation of a topic I’ve found.” (Brett 12)My caveat: the computer does not know the meanings of the words. The algorithm finds topics based on the co-occurrence of the words: “They look like ‘topics’ because terms that frequently occur together tend to be about the same subject” (Blei 9)
  22. This graphic represents a (perhaps gross) simplification of the most common form of topic modeling (Latent Dirichlet Allocation, or LDA)
  23. What if I want to topic model Whitman’s 1892 Leaves of Grass? I decide to treat each poem as a document, so I need to prepare the text so that TMT will read each poem as a document (and I also have to make some decisions about what will count as words).
  24. Leaves of grassMake copies of the original textFor complex clean-up, the precise order of steps might take several attempts to get right.Regular expressions can help a great deal
  25. If, instead of topic modeling, I want to visualize various aspects of a text in Voyant, my text preparation process is very different. I simply remove anything that will cause my word counts to go awry.
  26. See more of a tool?See other tools on openresearch.weebly.com?See other texts? Other kinds of texts?See more of the Whitman project: “Playing with Whitman”
  27. Content Analysis is a large topic that is not easy to cover in an overview, because we can’t cover research design or methodological issues, which are the core of the topic.I’m introducing this topic because I will demonstrate CATMA in a little bit. I think it would be worthwhile to look more closely into content analysis in order to make the best use of CATMA.I consider content analysis to be too important to skip. Many digital humanists are pursuing some form of content analysis, sometimes without knowing it as such, and they could really benefit from a wider understanding. And anyway, a lot of great humanities work is done in the social sciences.Because nothing is easy, I should point out that my use of the label “Content Analysis” here, although traditional, is not uncontroversial. The term has historically been used (and continues to be used by many) to include a wide variety of both quantitative and qualitative studies of text content. However, an important textbook on the topic, The Content Analysis Guidebook, by Kimberly A. Neuendorf, narrowly defines content analysis. She distinguishes numerous kinds of “message analysis” studies. Her informal use of the label “message analysis” in her text coincides, I think, with the historical use of “content analysis,” but I haven’t made a study of this thought.Rhetorical Analysis: “The analyst engages in a reconstruction of manifest characteristics of text … such as the message’s construction, form, metaphors, argumentation structure, and choices.”Narrative Analysis: “…involves a description of formal narrative structure”Discourse Analysis: “engages in characteristics of manifest language and word use … connection of words to theme analysis of content and the establishment of central terms.”Structuralist (or Semiotic) Analysis: “Technique aims at deep structure, latent meanings, and the signifying process through signs, codes, and binary oppositions.”Interpretative Analysis: “focus … on formation of theory from the observation of messages and the coding of those messages)Conversation Analysis: “analyzing naturally occurring conversations”Critical AnalysisNormative Analysis: prescriptive or proscriptive(Neuendorf 5-8)
  28. Quantitative/ Qualitative:In quantitative analysis, things are counted by category. The counts (frequencies) allow statistical analyses (which vary depending on the kinds of counts)Ole Holsti proposed that there might be such a thing as qualitative content analysis, depending not on frequencies, but on whether some category of content appears (and perhaps how and when it appears) (Holsti 5-11), but he nevertheless called for an objective and systematic method to apprehend these appearancesBruce Berg also suggests qualitative analysis is possible: “Analysis of the data once organized according to certain content elements should involve consideration of the literal words in the text being analyzed, including the manner in which these words have been offered … content analysis is not a reductionistic, positivistic approach. Rather, it is a passport to listening to the words of the text, and understanding better the perspective(s) of the producer of these words.” (Berg 242)Deductive/ InductiveNeuendorf acknowledges that it might be necessary for the researcher to immerse herself in some data to identify variables: she calls this the “Grounded or Emergent Process of Variable Identification” (Neuendorf 102-103)I have also seen this process referred to as “Open Coding” (Berg 245-6)Manifest/ Latent Content: Neuendorf: “…it is perhaps more useful to think of a continuum from ‘highly manfest’ to ‘highly latent’” (Neuendorf 23)Content/ Form: Consider Neuendorf’s study of gender and violence imagery in music videosReliability, Validity, Generalizability, ReplicabilityUnitizing:units of sampling, of analysis, of data collection; recording units; etc.
  29. Example: Track the use of gender pronouns in company policy manuals over a period of time, or between two or more periods of time.
  30. Neuendorf would circumscribe some of the uses on this list more carefully than Holsti
  31. If possible, show CATMA online: login, load doc, add markup doc, create some tags.
  32. Stop Calling Office Parks “nondescript” – a trivial, but fascinating example of how a text analysis tool might be used.
  33. Note how “Word Trends” seems to have a different setting from default Voyant so that multiple words cannot be tracked at once. I might be able to figure out how to enable multiple words, but how much time do I spend?
  34. See openresearch.weebly.com for more
  35. See openresearch.weebly.com