3. Once Upon a Time....
- There was too much scientific information
(43,848 Papers on p53)
- And it was all written in stories....
[demo Papers]
4. Research Goal
- Find a structure for research articles,
that allows computer-aided access to
knowledge elements
- Start with Research Articles in Cell Biology
- Expand to other genres/domains?
- How do we extract this structure?
- How do we use this structure?
5. Speech acts, conversational maxims, face principles, deixis, …
Pragmatic
(English 306A; Harris 5)
1. Colloquial: practical, vs. theoretical
2. Linguistic: ‘meaning of linguistic messages in their context of use’ (per/il/locutionary goals)
3. Pragmatic web: ‘quality of goal-oriented discourse in communities’
Meaning:
Semantics            Pragmatics
Propositions         Utterances
Truth/falsity        Appropriateness
Context-free         Context-dependent
Language-in-vitro    Language-in-vivo
7. Genre + Discourse Studies
- Science is written in text, as a story
- Text is created by humans to persuade
other humans (peers, that claims are facts)
- To tell the computer how we encode our
knowledge, we need to understand:
=> How do humans tell stories?
=> How do stories make sense?
8. Work on corpus
- Corpus of 14 coherent (citing, cited)
articles in Cell Biology, based around
(Voorhoeve, 2006)
- Hand-modeled ASCII text; created XML
- Manual (by me + small user validation)
10. 1st Attempt: Classical rhetoric
Each part of the classical oration, with its names in Aristotle and Quintilian and the matching section in Cell and in the APA Style Guide:
- Part: Introduction. Aristotle: prooimion. Quintilian: exordium. Cell: Introduction. APA Style Guide: Introduction.
  The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal of ethos in order to establish credibility with the audience.
- Part: Statement of Facts. Aristotle: prothesis. Quintilian: narratio. Cell: Introduction. APA Style Guide: Introduction.
  The second part of a classical oration, following the introduction or exordium. The speaker here provides a narrative account of what has happened and generally explains the nature of the case. Quintilian adds that the narratio is followed by the propositio, a kind of summary of the issues or a statement of the charge.
- Part: Summary. Quintilian: propositio. Cell: Abstract. APA Style Guide: Abstract.
  Coming between the narratio and the partitio of a classical oration, the propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation.
- Part: Division/outline. Quintilian: partitio. Cell: Table of Contents. APA Style Guide: Article Outline.
  Following the statement of facts, or narratio, comes the partitio or divisio. In this section of the oration, the speaker outlines what will follow, in accordance with what's been stated as the status, or point at issue in the case. Quintilian suggests the partitio is blended with the propositio and also assists memory.
- Part: Proof. Aristotle: pistis. Quintilian: confirmatio. Cell: Results. APA Style Guide: Methods, Results.
  Following the division / outline or partitio comes the main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here.
- Part: Refutation. Quintilian: refutatio. Cell: Discussion. APA Style Guide: Discussion.
  Following the confirmatio or section on proof in a classical oration, comes the refutation. As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent.
- Aristotle: epilogos. Quintilian: peroratio. Cell: Discussion. APA Style Guide: Discussion.
  Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up (see the figures of summary, below).
11. 2nd Attempt: Story Grammar
Story: The Story of Goldilocks and the Three Bears
Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins
- Story grammar: Time (Setting). Paper: Background.
  Story text: Once upon a time
  Paper text: The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
- Story grammar: Characters. Paper: Objects of study.
  Story text: a little girl named Goldilocks
  Paper text: the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
- Story grammar: Location. Paper: Experimental setup.
  Story text: She went for a walk in the forest. Pretty soon, she came upon a house.
  Paper text: studied and compared in vivo effects and interactions to those of the human protein
- Story grammar: Goal (Theme). Paper: Research goal.
  Story text: She knocked and, when no one answered,
  Paper text: Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
- Story grammar: Attempt. Paper: Hypothesis.
  Story text: she walked right in.
  Paper text: Atx-1 may play a role in the regulation of gene expression
- Story grammar: Name (Episode 1). Paper: Name.
  Story text: At the table in the kitchen, there were three bowls of porridge.
  Paper text: dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
- Story grammar: Subgoal. Paper: Subgoal.
  Story text: Goldilocks was hungry.
  Paper text: test the function of the AXH domain
- Story grammar: Attempt. Paper: Method.
  Story text: She tasted the porridge from the first bowl.
  Paper text: overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
- Story grammar: Outcome. Paper: Results.
  Story text: This porridge is too hot! she exclaimed.
  Paper text: Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1 [82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
- Paper: Data.
  Story text: So, she tasted the porridge from the second bowl.
  Paper text: (data not shown),
- Story grammar: Outcome. Paper: Results.
  Story text: This porridge is too cold, she said
  Paper text: both genotypes show many large holes and loss of cell integrity at 28 days
- Paper: Data.
  Story text: So, she tasted the last bowl of porridge.
  Paper text: (Figures 1B-1D).
12. 3rd Attempt: Discourse Segments
- “A text is made up of Discourse Segments
and the relations between them” - Grosz and
Sidner, Mann-Thompson, Marcu, Swales
- Discourse Segment Purpose: element that has
a consistent rhetorical/pragmatic goal.
- Define for Biological Research Article
13. Discourse Segments In Biology
<Goal>
To examine miRNA expression from the miR-Vec system,
</Goal>
<Method>
a miR-24 minigene-containing virus was transduced into
human cells. Expression was determined using an RNase
protection assay (RPA) with a probe designed to identify
both precursor and mature miR-24 (Figure 1B).
</Method>
<Result>
Figure 1C shows that cells transduced with miR-Vec-24
clearly express high levels of mature miR-24,
whereas little expression was detected in control-
transduced cells.
</Result>
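A minimal sketch of how markup like the fragment above could give a program access to individual knowledge elements. It uses Python's standard `xml.etree.ElementTree`; the `<Segments>` wrapper element and the helper names `segments` and `by_type` are assumptions made for this illustration and are not part of the corpus markup.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the slide. The <Segments> wrapper is an assumption,
# added only so that the fragment parses as a single XML document.
FRAGMENT = """<Segments>
<Goal>To examine miRNA expression from the miR-Vec system,</Goal>
<Method>a miR-24 minigene-containing virus was transduced into
human cells. Expression was determined using an RNase
protection assay (RPA) with a probe designed to identify
both precursor and mature miR-24 (Figure 1B).</Method>
<Result>Figure 1C shows that cells transduced with miR-Vec-24
clearly express high levels of mature miR-24,
whereas little expression was detected in control-
transduced cells.</Result>
</Segments>"""


def segments(xml_text):
    """Return (segment type, text) pairs in document order."""
    root = ET.fromstring(xml_text)
    # Collapse line breaks and repeated spaces inside each segment.
    return [(child.tag, " ".join((child.text or "").split())) for child in root]


def by_type(xml_text, wanted):
    """Return the text of every segment of one type, e.g. all Goals."""
    return [text for tag, text in segments(xml_text) if tag == wanted]


if __name__ == "__main__":
    for tag, text in segments(FRAGMENT):
        print(f"{tag:7} {text[:60]}")
```

Calling `by_type(FRAGMENT, "Goal")` would return only the goal clause, which is the kind of targeted access a flat text search cannot offer.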
18. Discourse: A Fact(ory)
[Diagram: flow of discourse segments across four realms]
Realms: hypothetical realm (might, would); realm of activity (to test, to see); realm of experience (past); realm of models (present)
Segments shown: fact, problem, goal, method, result(s), implication, hypothesis
Sections shown: introduction, discussion
Transitions shown: "to", "we", "resulting in", "suggests that", "incongruity or ignorance"
Views shown: shared view, own view
19. Links (Under Construction)
To references:
- From/to segment type makes difference:
methods link, fact link, agree/disagree link
- Not clear where to link into: is claim truly in referred
document? How to locate?
To figures/tables:
- Usually main proof in results (methods) segments: need
to allow multi-media elements in system!
Discourse relations:
- Many taxonomies: RST, Hovy, Sanders, ClaiMaker
- Identify textual coherence/argumentation...
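One way the reference links described above could be represented is sketched here. This is an assumption-laden illustration, not the system's data model: the names `LinkType`, `ReferenceLink` and `unresolved` are hypothetical, and the example values are illustrative only. The optional `target_segment` field reflects the open question of where in the cited document a link should land.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class LinkType(Enum):
    """Link kinds named on the slide."""
    METHODS = "methods"
    FACT = "fact"
    AGREE = "agree"
    DISAGREE = "disagree"


@dataclass(frozen=True)
class ReferenceLink:
    source_segment: str                   # segment type that contains the citation
    target_document: str                  # the cited document
    link_type: LinkType
    target_segment: Optional[str] = None  # None until the claim is located in the cited text


def unresolved(links: List[ReferenceLink]) -> List[ReferenceLink]:
    """Links whose landing point in the cited document is still unknown."""
    return [link for link in links if link.target_segment is None]


# Illustrative values only; the segment assignments are hypothetical.
links = [
    ReferenceLink("Method", "Brand and Perrimon, 1993", LinkType.METHODS),
    ReferenceLink("Result", "Mollereau et al., 2000", LinkType.FACT,
                  target_segment="Result"),
]

if __name__ == "__main__":
    for link in unresolved(links):
        print(link.target_document, link.link_type.value)
```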
20. Coherence Markers
Markers found at segment boundaries. Rows: preceding segment. Columns, left to right: Fact | Problem | Goal | Method | Results | Implication | Hypothesis. Each row lists its non-empty cells in column order.
Fact: in animals | however (3x) | to, we examined (2x) | we fused, we utilised | in contrast, we found (5x), though, on average, under our conditions | our data suggest, we propose that, consistent with | suggesting that (2x)
Problem: we fused | in this paper
Goal: we isolated | we showed
Method: we found (2x), but, while, as seen | suggests | we predicted
Results: in addition, in contrast (2x), second (2x), third (2x), finally (2x), subsequent, thereafter, in our study | we utilised, we used | interestingly (2x), since (3x), also, while (2x) | (strongly) suggests/suggesting that (8x), implicating (2x), consistent with (2x), demonstrating that (3x) | we propose, suggesting that
Implication: to verify, to confirm | we replaced, we fused, we tried | however, first (2x), interestingly (2x), consistent with, in our analysis, strikingly, neither | also | in theory
21. Preliminary Hypotheses
Each hypothesis is followed by its score (+, -, 0, +/0, +!):
1. 'To' infinitive appears as marker of Goal moves: +
2. Sequential connectives appear within same segment type: -
3. 'though', 'however', 'therefore' - causal connectives occur at all -> Problem and -> Hypothesis boundaries: 0
4. 'suggests' occurs at Results -> Implication/Hypothesis boundary: +/0
5. 'we found' / 'we observed' / 'we showed' -> Result boundary: +/0
6. 'we + other verb' occurs at -> Method boundary: 0
7. Contrast/correspondence in Fact <-> Result <-> Implication moves: +!
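To make the marker hypotheses concrete, here is a small rule-based sketch that guesses a segment type from surface markers. It is an illustration under stated assumptions, not the method used in the study: the rule list, the regular expressions and the function name `guess_segment` are hypothetical, and the rules only loosely follow hypotheses 1, 4, 5 and 6. Since hypothesis 6 scored 0, the Method rule in particular should be read as a weak heuristic.

```python
import re

# Hypothetical marker rules, loosely following hypotheses 1, 4, 5 and 6.
# Order matters: the first matching rule wins.
RULES = [
    ("Goal", re.compile(r"^to\s+\w+", re.IGNORECASE)),                       # 'To' infinitive
    ("Result", re.compile(r"\bwe (found|observed|showed)\b", re.IGNORECASE)),
    ("Implication", re.compile(r"\bsuggest(s|ing)?\b", re.IGNORECASE)),
    ("Method", re.compile(r"\bwe \w+ed\b", re.IGNORECASE)),                  # 'we' + other past-tense verb
]


def guess_segment(clause):
    """Return the first segment label whose marker occurs in the clause, else None."""
    text = clause.strip()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return None


if __name__ == "__main__":
    examples = [
        "To examine miRNA expression from the miR-Vec system,",
        "we fused the two constructs",
        "we found little expression in control cells",
        "our data suggest a role in gene expression",
    ]
    for clause in examples:
        print(guess_segment(clause), "<-", clause)
```

A clause with no marker, such as a passive Method sentence, returns `None`, which is one reason marker rules alone cannot replace manual segment assignment.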
23. Research Goals fulfilled?
allow computer-aided access to knowledge:
yes, but:
> need to identify if they do cover this genre
> need to finalize a structure of relations
other genres/domains?
> investigate more than cell biology
how do we extract this structure?
> collaborative attempts to identify segment markers/
relationships - next step
how do we use this structure? : [ DEMO ]
> possible collaborations with sensemaking systems?
24. Preliminary Conclusions
- Science is created in text
- Goal of text is to convince peers that claims (backed
by data) belong to fact canon
- Text convinces humans through rhetorical/narrative
discourse structure
- Text creates meaning in the human mind
- Discourse parsing could allow access to knowledge
structure
- More work needed: collaborations?
27. Related work
Columns: Bio-informatics | Style Guides | Shum et al. | Harmsze | Swales | RST | Teufel | Collier et al.
Sections x x x
Moves x x x x!
Entities x x
Embedding x x
Discourse relations x x x
Argumentational relations x x
* Need complete model for multidocument collection – markup content elements and relationships
* Unique role as a publisher: can apply/mandate at the source
29. Clause Classification Test
Results Clause assignment test (8 tests handed in, avg. 38 clauses each):
114 Clauses
51 No Disagreement
13 Fact/Result
11 Fact/Problem
10 Method/Result
7 Result/Implication
4 Goal/Method
3 Problem/Goal
2 Goal/Result
2 Problem/Interpretation
2 Fact/Interpretation
1 Problem/Result
Comments on classification:
• Incomplete sentences are unclear, hard to classify
• Add ‘Hypothesis’ category, exx. clauses 8, 33, 74a, 77, 78b.
• Other possible categories: Assumption, Observation, “Given that...”
Nr  Section                   Introduction  Results  Discussion
A1  Agami, Results            4
A2  Agami, Discussion         ½  2  ½
A3  Agami, Introduction       3
S1  Serrano, Results          2
S2  Serrano, Discussion       1  1
S3  Serrano, Introduction     2
V1  Voorhoeve, Results        2
V2  Voorhoeve, Discussion     3
V3  Voorhoeve, Introduction   1  2
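The disagreement counts on this slide can be tallied with a few lines of arithmetic. The sketch below is only an illustration: the names `share` and `segment_involvement` are hypothetical, and the numbers are copied from the slide as given. The listed rows (51 plus the ten disagreement pairs) add up to 106 of the 114 clauses.

```python
# Counts copied from the slide.
TOTAL_CLAUSES = 114
NO_DISAGREEMENT = 51
DISAGREEMENTS = {
    "Fact/Result": 13,
    "Fact/Problem": 11,
    "Method/Result": 10,
    "Result/Implication": 7,
    "Goal/Method": 4,
    "Problem/Goal": 3,
    "Goal/Result": 2,
    "Problem/Interpretation": 2,
    "Fact/Interpretation": 2,
    "Problem/Result": 1,
}


def share(count, total=TOTAL_CLAUSES):
    """Percentage of all clauses, rounded to one decimal."""
    return round(100 * count / total, 1)


def segment_involvement(pairs):
    """For each segment type, count the disagreeing clauses it is involved in."""
    totals = {}
    for pair, n in pairs.items():
        for segment in pair.split("/"):
            totals[segment] = totals.get(segment, 0) + n
    return totals


if __name__ == "__main__":
    print("No disagreement:", share(NO_DISAGREEMENT), "%")
    involvement = segment_involvement(DISAGREEMENTS)
    for segment, n in sorted(involvement.items(), key=lambda item: -item[1]):
        print(f"{segment:15} {n}")
```

Sorting by involvement shows which segment types are confused most often, which may help decide where the category definitions need sharpening.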
30. References
• Austin, J.L. How to Do Things with Words. J.O. Urmson, ed. Oxford: Clarendon Press, 1962.
• Bazerman, Charles. Shaping Written Knowledge: The Genre and Activity of the Experimental Article in Science. Madison, Wisconsin: Univ. of Wisconsin Press, 1988.
• Bex, F.J., H. Prakken, C. Reed & D.N. Walton. Towards a formal account of reasoning about evidence: argumentation schemes and generalisations. Artificial Intelligence and Law 11 (2003), 125-165.
• Buckingham Shum, Simon J., V. Uren, et al. Modelling Naturalistic Argumentation in Research Literatures: Representation and Interaction Design Issues. Tech Report kmi-04-28, December 2004.
• Harmsze, Frédérique. A Modular Structure for Scientific Articles in an Electronic Environment. PhD Thesis, February 9, 2000 (HTML & PDF).
• Hovy, E. Automated discourse generation using discourse structure relations. Artificial Intelligence 63(1-2) (1993), 341-386.
• Kircz, Joost G. Modularity: the next form of scientific information presentation? Journal of Documentation 54(2) (March 1998), 210-235.
• Kuhn, Thomas. The Structure of Scientific Revolutions. Chicago: University of Chicago Press, 1962.
• Latour, B. Science in Action: How to Follow Scientists and Engineers through Society. Cambridge, Ma.: Harvard University Press, 1987.
• Latour, Bruno, Steve Woolgar, Jonas Salk. Laboratory Life: The Construction of Scientific Facts. Princeton University Press, 1986.