Science Beyond the Facts:
A Pragmatic Structure for Research Articles

              Anita de Waard
    Elsevier Labs, Disruptive Technologies
              Utrecht University
Introduction
Once Upon a Time....
- There was too much scientific information
    (43,848 Papers on p53)
- And it was all written in stories....
   [demo Papers]
Research Goal
- Find a structure for research articles,
  that allows computer-aided access to
  knowledge elements
- Start with Research Articles in Cell Biology
- Expand to other genres/domains?
- How do we extract this structure?
- How do we use this structure?
Pragmatic
1. Colloquial: practical, vs. theoretical
2. Linguistic: ‘meaning of linguistic messages in their context of use’ (per/il/locutionary goals)
3. Pragmatic web: ‘quality of goal-oriented discourse in communities’

Meaning (English 306A; Harris)
         Semantics                                Pragmatics
         Propositions                             Utterances
         Truth/falsity                            Appropriateness
         Context-free                             Context-dependent
         Language-in-vitro                        Language-in-vivo
Speech acts, conversational maxims, face principles, deixis, …
Method
Genre + Discourse Studies
- Science is written in text, as a story
- Text is created by humans to persuade
  other humans (peers, that claims are facts)
- To tell the computer how we encode our
  knowledge, we need to understand:
 => How do humans tell stories?
 => How do stories make sense?
Work on corpus

- Corpus of 14 coherent (citing, cited)
 articles in Cell Biology, based around
 (Voorhoeve, 2006)
- Hand-modeled ascii text; created XML
- Manual (by me + small user validation)
(Preliminary) Results
1st Attempt: Classical rhetoric
Columns: Aristotle | (English) | Quintilian | Cell | APA Style Guide
(- marks an empty cell)

prooimion | Introduction | exordium | Introduction | Introduction
  The introduction of a speech, where one announces the subject and purpose of the
  discourse, and where one usually employs the persuasive appeal of ethos in order
  to establish credibility with the audience.

prothesis | Statement of Facts | narratio | Introduction | Introduction
  The second part of a classical oration, following the introduction or exordium.
  The speaker here provides a narrative account of what has happened and generally
  explains the nature of the case. Quintilian adds that the narratio is followed by
  the propositio, a kind of summary of the issues or a statement of the charge.

- | Summary | propositio | Abstract | Abstract
  Coming between the narratio and the partitio of a classical oration, the
  propositio provides a brief summary of what one is about to speak on, or
  concisely puts forth the charges or accusation.

- | Division/outline | partitio | Table of Contents | Article Outline
  Following the statement of facts, or narratio, comes the partitio or divisio.
  In this section of the oration, the speaker outlines what will follow, in
  accordance with what's been stated as the status, or point at issue in the case.
  Quintilian suggests the partitio is blended with the propositio and also assists
  memory.

pistis | Proof | confirmatio | Results | Methods, Results
  Following the division / outline or partitio comes the main body of the speech
  where one offers logical arguments as proof. The appeal to logos is emphasized
  here.

- | Refutation | refutatio | Discussion | Discussion
  Following the confirmatio or section on proof in a classical oration, comes the
  refutation. As the name connotes, this section of a speech was devoted to
  answering the counterarguments of one's opponent.

epilogos | - | peroratio | Discussion | Discussion
  Following the refutatio and concluding the classical oration, the peroratio
  conventionally employed appeals through pathos, and often included a summing up
  (see the figures of summary, below).
2nd Attempt: Story Grammar
The Story of Goldilocks           Story        Grammar     Paper          The AXH Domain of Ataxin-1 Mediates
and the Three Bears                                                       Neurodegeneration through Its Interaction with
                                                                          Gfi-1/Senseless Proteins
Once upon a time                  Time         Setting     Background     The mechanisms mediating SCA1 pathogenesis are still not fully
                                                                          understood, but some general principles have emerged.
a little girl named Goldilocks    Characters               Objects of     the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ
                                                           study          tract,
She went for a walk in the        Location                 Experimental   studied and compared in vivo effects and interactions to those of
forest. Pretty soon, she came                              setup          the human protein
upon a house.
She knocked and, when no          Goal         Theme       Research       Gain insight into how Atx-1's function contributes to SCA1
one answered,                                              goal           pathogenesis. How these interactions might contribute to the
                                                                          disease process and how they might cause toxicity in only a
                                                                          subset of neurons in SCA1 is not fully understood.

she walked right in.              Attempt                  Hypothesis     Atx-1 may play a role in the regulation of gene expression


At the table in the kitchen,      Name         Episode 1   Name           dAtx-1 and hAtx-1 Induce Similar Phenotypes When
there were three bowls of                                                 Overexpressed in Flies
porridge.
Goldilocks was hungry.            Subgoal                  Subgoal        test the function of the AXH domain
She tasted the porridge from      Attempt                  Method         overexpressed dAtx-1 in flies using the GAL4/UAS system
the first bowl.                                                            (Brand and Perrimon, 1993) and compared its effects to those of
                                                                          hAtx-1.
This porridge is too hot! she     Outcome                  Results        Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which
exclaimed.                                                                drives expression in the differentiated R1-R6 photoreceptor cells
                                                                          (Mollereau et al., 2000 and O'Tousa et al., 1985), results in
                                                                          neurodegeneration in the eye, as does overexpression of hAtx-1
                                                                          [82Q]. Although at 2 days after eclosion, overexpression of either
                                                                          Atx-1 does not show obvious morphological changes in the
                                                                          photoreceptor cells
So, she tasted the porridge                                Data           (data not shown),
from the second bowl.
This porridge is too cold, she    Outcome                  Results        both genotypes show many large holes and loss of cell integrity
said                                                                      at 28 days
So, she tasted the last bowl of                            Data           (Figures 1B-1D).
porridge.
3rd Attempt: Discourse Segments

- “A text is made up of Discourse Segments
  and the relations between them” - Grosz and
  Sidner, Mann and Thompson, Marcu, Swales
- Discourse Segment Purpose: element that has
  a consistent rhetorical/pragmatic goal.
- Define for Biological Research Article
Discourse Segments In Biology
 <Goal>
To examine miRNA expression from the miR-Vec system,
</Goal>
<Method>
a miR-24 minigene-containing virus was transduced into
human cells. Expression was determined using an RNase
protection assay (RPA) with a probe designed to identify
both precursor and mature miR-24 (Figure 1B).
</Method>
<Result>
Figure 1C shows that cells transduced with miR-Vec-24
clearly express high levels of mature miR-24,
whereas little expression was detected in control-
transduced cells.
</Result>
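
The segment markup above can be read mechanically. The following is a minimal Python sketch, not the tooling used in the study: the `<Article>` wrapper element and the function name `read_segments` are assumptions introduced here so that the fragment has a single root and can be parsed with the standard library.

```python
import xml.etree.ElementTree as ET

# Segment text copied from the slide. The <Article> wrapper is an assumption,
# added only so the fragment is a well-formed XML document with one root.
MARKUP = """<Article>
<Goal>To examine miRNA expression from the miR-Vec system,</Goal>
<Method>a miR-24 minigene-containing virus was transduced into
human cells. Expression was determined using an RNase
protection assay (RPA) with a probe designed to identify
both precursor and mature miR-24 (Figure 1B).</Method>
<Result>Figure 1C shows that cells transduced with miR-Vec-24
clearly express high levels of mature miR-24,
whereas little expression was detected in control-
transduced cells.</Result>
</Article>"""


def read_segments(markup):
    """Return (segment_type, text) pairs in document order."""
    root = ET.fromstring(markup)
    # Collapse line breaks inside each segment into single spaces.
    return [(child.tag, " ".join(child.text.split())) for child in root]


segments = read_segments(MARKUP)
for kind, text in segments:
    print(f"{kind:<7} {text[:60]}")
```

Keeping the segment type in the element name means the order of types (Goal, Method, Result) can be recovered directly from the document order.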
Segments vs. Sections
              Introduction   Method   Results   Discussion Total

Fact          63             0        104       37         204

Problem       20             0        10        15         45

Goal          2              0        72        6          80

Method        2              all      129       6          137

Result        10             0        230       44         284

Implication   14             0        100       36         150

Hypothesis    10             0        33        26         69

Total         121            0        678       170        969
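
As a rough illustration of how the table above can be read, the sketch below computes the share of each segment type within a section. The counts are copied from the table; the Method section is listed as "all" there and is therefore left out. The names `COUNTS` and `section_profile` are illustrative and not part of the original work.

```python
# Segment counts per section, copied from the "Segments vs. Sections" table.
# The Method column is omitted because the slide gives it as "all".
COUNTS = {
    "Fact":        {"Introduction": 63, "Results": 104, "Discussion": 37},
    "Problem":     {"Introduction": 20, "Results": 10,  "Discussion": 15},
    "Goal":        {"Introduction": 2,  "Results": 72,  "Discussion": 6},
    "Method":      {"Introduction": 2,  "Results": 129, "Discussion": 6},
    "Result":      {"Introduction": 10, "Results": 230, "Discussion": 44},
    "Implication": {"Introduction": 14, "Results": 100, "Discussion": 36},
    "Hypothesis":  {"Introduction": 10, "Results": 33,  "Discussion": 26},
}


def section_profile(section):
    """Return the fraction of each segment type within one section."""
    total = sum(row[section] for row in COUNTS.values())
    return {segment: row[section] / total for segment, row in COUNTS.items()}


for section in ("Introduction", "Results", "Discussion"):
    profile = section_profile(section)
    top = max(profile, key=profile.get)
    print(f"{section:<12} most frequent segment type: {top} ({profile[top]:.0%})")
```

Read this way, the table shows that a section named "Results" holds far more than Result segments: Goal, Method, Fact and Implication segments together outnumber them.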
Segment Tense
                   Fact       Problem    Goal       Method     Result     Implication  Hypothesis
Present active      72  46%    27  60%    15  23%     7   7%    37  16%    69  51%      38  55%
Present passive      5   3%     2   4%     2   3%     1   1%     1   0%    11   8%       1   1%
Past active         18  11%     5  11%    11  17%    48  47%   122  54%    16  12%       8  12%
Past passive        25  16%     2   4%     1   2%    17  17%    21   9%     1   1%       5   7%
Future               2   1%     3   7%     0   0%     0   0%     1   0%     0   0%       0   0%
Imperfect: "to"     13   8%     2   4%    32  50%     2   2%    20   9%    14  10%       7  10%
Gerund ("ing")      22  14%     4   9%     3   5%    28  27%    23  10%    24  18%      10  14%
Total              157 100%    45 100%    64 100%   103 100%   225 100%   135 100%      69 100%
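
One way to read the tense table is to ask which verb form dominates each segment type. The sketch below does that with the counts copied from the table. The label "to-infinitive" stands for the row the slide calls Imperfect: "to"; that label, and the names `FORMS`, `TENSE_COUNTS` and `dominant_form`, are assumptions made for this illustration.

```python
# Verb forms in the row order of the tense table.
# "to-infinitive" stands for the row labelled Imperfect: "to" on the slide.
FORMS = [
    "present active", "present passive", "past active", "past passive",
    "future", "to-infinitive", "gerund",
]

# Counts per segment type, copied column by column from the table.
TENSE_COUNTS = {
    "Fact":        [72, 5, 18, 25, 2, 13, 22],
    "Problem":     [27, 2, 5, 2, 3, 2, 4],
    "Goal":        [15, 2, 11, 1, 0, 32, 3],
    "Method":      [7, 1, 48, 17, 0, 2, 28],
    "Result":      [37, 1, 122, 21, 1, 20, 23],
    "Implication": [69, 11, 16, 1, 0, 14, 24],
    "Hypothesis":  [38, 1, 8, 5, 0, 7, 10],
}


def dominant_form(segment_type):
    """Return (verb form, share) for the most frequent form of a segment type."""
    counts = TENSE_COUNTS[segment_type]
    best = max(range(len(FORMS)), key=counts.__getitem__)
    return FORMS[best], counts[best] / sum(counts)


for segment_type in TENSE_COUNTS:
    form, share = dominant_form(segment_type)
    print(f"{segment_type:<12} {form:<15} {share:.0%}")
```

On these counts, Goal segments are dominated by the "to" form and Method and Result segments by the past active, while Fact, Problem, Implication and Hypothesis segments lean towards the present active.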
Segment order
           Fact Hypothesis Problem Goal   Method Result   Implication End   Total

Start      18    3        1        8      2      2        4          0      38

Fact       83    22       13       17     9      31       12         1      188

Hypothesis 20    5        3        7      6      2        6          3      52

Problem    9     7        7        2      3      5        3          3      39

Goal       7     0        2        4      46     6        0          0      65

Method     13    2        3        10     25     54       3          0      110

Result     23    9        4        6      16     85       78         6      227

Implication 13   6        4        12     11     30       12         25     113

Total      186   54       37       61     118    215      118        38     827
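
The segment-order counts above can be treated as a first-order transition table. The sketch below, under that assumption, turns one row into relative frequencies and picks the most frequent follower; it uses only the row entries, and the row sums are computed in the code. The names `ORDER`, `TRANSITIONS`, `transition_probs` and `most_likely_next` are illustrative.

```python
# Column order of the "Segment order" table (the segment that follows).
ORDER = ["Fact", "Hypothesis", "Problem", "Goal",
         "Method", "Result", "Implication", "End"]

# Rows copied from the table: preceding segment -> counts of following segments.
TRANSITIONS = {
    "Start":       [18, 3, 1, 8, 2, 2, 4, 0],
    "Fact":        [83, 22, 13, 17, 9, 31, 12, 1],
    "Hypothesis":  [20, 5, 3, 7, 6, 2, 6, 3],
    "Problem":     [9, 7, 7, 2, 3, 5, 3, 3],
    "Goal":        [7, 0, 2, 4, 46, 6, 0, 0],
    "Method":      [13, 2, 3, 10, 25, 54, 3, 0],
    "Result":      [23, 9, 4, 6, 16, 85, 78, 6],
    "Implication": [13, 6, 4, 12, 11, 30, 12, 25],
}


def transition_probs(previous):
    """Relative frequency of each following segment type after `previous`."""
    row = TRANSITIONS[previous]
    total = sum(row)
    return {following: n / total for following, n in zip(ORDER, row)}


def most_likely_next(previous):
    """Segment type that most often follows `previous` in the corpus counts."""
    probs = transition_probs(previous)
    return max(probs, key=probs.get)


for previous in TRANSITIONS:
    following = most_likely_next(previous)
    share = transition_probs(previous)[following]
    print(f"{previous:<12} -> {following:<12} {share:.0%}")
```

For example, a Goal is followed by a Method in 46 of 65 cases, which is the strongest single transition in the table relative to its row.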
Discourse: A Fact(ory)
[Diagram: segment types linked by connectives]
- Segment types: fact, problem, hypothesis, goal, method, result, implication
- Connective labels: "to", "we", "resulting in", "suggests that"; "incongruity or ignorance"
- Realms: hypothetical realm (might, would); realm of activity (to test, to see);
  realm of experience (past); realm of models (present)
- Regions: introduction, results, discussion; Shared view vs. Own view
Links (Under Construction)
To references:
- From/to segment type makes difference:
  methods link, fact link, agree/disagree link

- Not clear where to link into: is claim truly in referred
  document? How to locate?

To figures/tables:
- Usually main proof in results (methods) segments: need
  to allow multi-media elements in system!

Discourse relations:
- Many taxonomies: RST, Hovy, Sanders, ClaiMaker
- Identify textual coherence/argumentation...
Coherence Markers
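Markers found at segment boundaries; each entry reads row segment -> column segment.

Fact -> Fact:                in animals
Fact -> Problem:             however (3x)
Fact -> Goal:                to, we examined (2x)
Fact -> Method:              we fused, we utilised
Fact -> Results:             in contrast, we found (5x), though, on average, under our conditions
Fact -> Implication:         our data suggest, we propose that, consistent with
Fact -> Hypothesis:          suggesting that (2x)

Problem -> Method:           we fused
Problem -> Results:          in this paper

Goal -> Method:              we isolated
Goal -> Results:             we showed

Method -> Results:           we found (2x), while, as seen
Method -> Implication:       but suggests
Method -> Hypothesis:        we predicted

Results -> Fact:             in addition, in contrast
Results -> Method:           we utilised, we used
Results -> Results:          interestingly (2x), since (3x), also (2x), while (2x), second (2x),
                             third (2x), finally (2x), subsequent, thereafter, in our study
Results -> Implication:      (strongly) suggests/suggesting that (8x), implicating (2x),
                             consistent with (2x), demonstrating that (3x)
Results -> Hypothesis:       we propose, suggesting that

Implication -> Goal:         to verify, to confirm
Implication -> Method:       we replaced, we fused, we tried
Implication -> Results:      however, first (2x), interestingly (2x), consistent with,
                             in our analysis, strikingly, neither
Implication -> Implication:  also
Implication -> Hypothesis:   in theory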
Preliminary Hypotheses
1 'To' infinitive appears as marker of Goal moves                       +
2 Sequential connectives appear within same segment type               -
3 'though', 'however', 'therefore' - causal connectives occur at all   0
   -> Problem and -> Hypothesis boundaries
4 'suggests' occurs at Results-> Implication/Hypothesis boundary       +/0
5 'we found' /'we observed'/ 'we showed' -> Result boundary            +/0
6 'we + other verb' occurs at -> method boundary                        0
7 Contrast/correspondence in Fact <-> Result <-> Implication moves     +!
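
To make hypotheses 1, 4, 5 and 6 concrete, the sketch below turns them into a small rule-based guesser for the type of segment a clause opens. It is an illustration only: the regular expressions, the rule order and the name `guess_segment_type` are assumptions, and since the slide rates most of these hypotheses as only partly supported, such rules would misclassify many clauses (for instance, "we propose" would be labelled Method).

```python
import re

# Rules are tried in order; the first match wins.
# Each rule is a loose reading of one hypothesis, not a validated classifier.
RULES = [
    # Hypothesis 4: 'suggests' marks a Results -> Implication/Hypothesis boundary.
    ("Implication", re.compile(r"\bsuggest(s|ing)?\b", re.IGNORECASE)),
    # Hypothesis 5: 'we found' / 'we observed' / 'we showed' mark a Result.
    ("Result", re.compile(r"\bwe (found|observed|showed)\b", re.IGNORECASE)),
    # Hypothesis 1: a clause-initial 'to' infinitive marks a Goal.
    ("Goal", re.compile(r"^to \w+", re.IGNORECASE)),
    # Hypothesis 6: 'we' + any other verb marks a Method.
    ("Method", re.compile(r"\bwe \w+", re.IGNORECASE)),
]


def guess_segment_type(clause):
    """Return a guessed segment type for a clause, or None if no rule fires."""
    text = clause.strip()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return None


examples = [
    "To examine miRNA expression from the miR-Vec system,",
    "we fused the two constructs",
    "we found that expression was reduced",
    "Our data suggest that Atx-1 regulates gene expression",
]
for clause in examples:
    print(f"{guess_segment_type(clause)!s:<12} {clause}")
```

The Result rule is placed before the Method rule because "we found" also matches the more general "we + verb" pattern.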
Discussion
Research Goals fulfilled?
allow computer-aided access to knowledge:
yes, but:
> need to identify if they do cover this genre
> need to finalize a structure of relations
other genres/domains?
> investigate more than cell biology
how do we extract this structure?
> collaborative attempts to identify segment markers/
relationships - next step
how do we use this structure? : [ DEMO ]
> possible collaborations with sensemaking systems?
Preliminary Conclusions
- Science is created in text
- Goal of text is to convince peers that claims (backed
  by data) belong to fact canon
- Text convinces humans through rhetorical/narrative
  discourse structure
- Text creates meaning in the human mind
- Discourse parsing could allow access to knowledge
  structure
- More work needed: collaborations?
Questions?

       anita@cs.uu.nl
http://people.cs.uu.nl/anita
Appendix
Related work
                      Bio-informatics    Style   Shum et   Harmsze   Swales   RST   Teufel   Collier
                                        Guides     al                                         et al.


     Sections                             x                   x                       x


      Moves                                         x                  x              x        x!


     Entities               x                                 x


   Embedding                                                  x                x


Discourse relations                                 x         x                x


 Argumentational            x                                 x
    relations

 * Need complete model for multidocument collection – markup
                       content elements
                       and relationships
     * Unique role as a publisher: can apply/mandate at the source
Total      Fact Problem Goal Method Result Implication   Hypothesis   End      Total

Start      18    1      8     2       2          4       3            0        38

Fact       83    13     17    9       31         12      22           1        188

Problem    9     7      2     3       5          3       7            3        39

Goal       7     2      4     46      6          0       0            0        65

Method     13    3      10    25      54         3       2            0        110

Result     23    4      6     16      85         78      9            6        227

Implication 13   4      12    11      30         12      6            25       113

Hypothesis 20    3      7     6       2          6       5            3        52

Total      186   37     61    118     215        118     54           38       827

Selfs: 221
Model: 399
% in Model: 65.84%
Clause Classification Test
Results: clause assignment test (8 tests handed in, avg. 38 clauses each)

114 Clauses
 51 No Disagreement
 13 Fact/Result
 11 Fact/Problem
 10 Method/Result
  7 Result/Implication
  4 Goal/Method
  3 Problem/Goal
  2 Goal/Result
  2 Problem/Interpretation
  2 Fact/Interpretation
  1 Problem/Result

Comments on classification:
•   Incomplete sentences are unclear, hard to classify
•   Add ‘Hypothesis’ category, e.g. clauses 8, 33, 74a, 77, 78b.
•   Other possible categories: Assumption, Observation, “Given that...”

Nr   Section                   Introduction   Results   Discussion
A1   Agami, Results                           4
A2   Agami, Discussion         ½              2         ½
A3   Agami, Introduction       3
S1   Serrano, Results                         2
S2   Serrano, Discussion       1                        1
S3   Serrano, Introduction     2
V1   Voorhoeve, Results                       2
V2   Voorhoeve, Discussion                              3
V3   Voorhoeve, Introduction   1                        2
References
•   Austin, J.L. How to Do Things with Words. J.O. Urmson, ed. Oxford: Clarendon Press, 1962.

•   Bazerman, Charles. Shaping Written Knowledge: The Genre and Activity of the Experimental Article in Science. Madison,
    Wisconsin: Univ. of Wisconsin Press, 1988.

•   Bex, F.J., H. Prakken, C. Reed & D.N. Walton. Towards a formal account of reasoning about evidence: argumentation schemes and
    generalisations. Artificial Intelligence and Law 11 (2003), 125-165.

•   Buckingham Shum, Simon J., V. Uren, et al. Modelling Naturalistic Argumentation in Research Literatures: Representation and
    Interaction Design Issues. Tech Report kmi-04-28, December 2004.

•   Harmsze, Frédérique. A modular structure for scientific articles in an electronic environment. PhD Thesis, February 9, 2000.

•   Hovy, E. Automated discourse generation using discourse structure relations. Artificial Intelligence 63(1-2), 1993, 341-386.

•   Kircz, Joost G. Modularity: the next form of scientific information presentation? Journal of Documentation 54(2), March
    1998, 210-235.

•   Kuhn, Thomas. The Structure of Scientific Revolutions. Chicago: University of Chicago Press, 1962.

•   Latour, B. Science in Action: How to Follow Scientists and Engineers through Society. Cambridge, Ma.: Harvard University Press,
    1987.

•   Latour, Bruno, Steve Woolgar, Jonas Salk. Laboratory Life: The Construction of Scientific Facts. Princeton University Press,
    1986.




                                                                   25

Más contenido relacionado

Destacado

Pragmatics in the EFL classroom: An introduction
Pragmatics in the EFL classroom: An introductionPragmatics in the EFL classroom: An introduction
Pragmatics in the EFL classroom: An introduction
Jerry Talandis
 
Pragmatics (Linguistics)
Pragmatics (Linguistics)Pragmatics (Linguistics)
Pragmatics (Linguistics)
Coltz Mejia
 
Introduction to linguistics lec 1
Introduction to linguistics lec 1Introduction to linguistics lec 1
Introduction to linguistics lec 1
Hina Honey
 
Pragmatics presentation
Pragmatics presentationPragmatics presentation
Pragmatics presentation
Mehwish Nazar
 

Destacado (11)

Pragmatics: Introduction
Pragmatics: IntroductionPragmatics: Introduction
Pragmatics: Introduction
 
Pragmatics in the EFL classroom: An introduction
Pragmatics in the EFL classroom: An introductionPragmatics in the EFL classroom: An introduction
Pragmatics in the EFL classroom: An introduction
 
Pragmatics
PragmaticsPragmatics
Pragmatics
 
What is pragmatics
What is pragmaticsWhat is pragmatics
What is pragmatics
 
Pragmatics (Linguistics)
Pragmatics (Linguistics)Pragmatics (Linguistics)
Pragmatics (Linguistics)
 
Introduction to linguistics lec 1
Introduction to linguistics lec 1Introduction to linguistics lec 1
Introduction to linguistics lec 1
 
Pragmatics
PragmaticsPragmatics
Pragmatics
 
Pragmatics
PragmaticsPragmatics
Pragmatics
 
Pragmatics presentation
Pragmatics presentationPragmatics presentation
Pragmatics presentation
 
Pragmatics: Deixis And Distance By Dr.Shadia.Pptx
Pragmatics:  Deixis And Distance By Dr.Shadia.PptxPragmatics:  Deixis And Distance By Dr.Shadia.Pptx
Pragmatics: Deixis And Distance By Dr.Shadia.Pptx
 
Pragmatics - George Yule
Pragmatics - George YulePragmatics - George Yule
Pragmatics - George Yule
 

Similar a ICPW2007.deWaard

Knowledge Media Panel U Toronto, Sept 30 2010
Knowledge Media Panel U Toronto, Sept 30 2010Knowledge Media Panel U Toronto, Sept 30 2010
Knowledge Media Panel U Toronto, Sept 30 2010
Anita de Waard
 
Towards Collaborative Environments for Ontology Construction and Sharing
Towards Collaborative Environments for Ontology Construction and SharingTowards Collaborative Environments for Ontology Construction and Sharing
Towards Collaborative Environments for Ontology Construction and Sharing
Jie Bao
 
· Persuasive essay rubicExemplaryProficientNeeds Improvement.docx
· Persuasive essay rubicExemplaryProficientNeeds Improvement.docx· Persuasive essay rubicExemplaryProficientNeeds Improvement.docx
· Persuasive essay rubicExemplaryProficientNeeds Improvement.docx
alinainglis
 
Evolution, Humanity and Religion Where is the evidence for God?
Evolution, Humanity and Religion Where is the evidence for God?Evolution, Humanity and Religion Where is the evidence for God?
Evolution, Humanity and Religion Where is the evidence for God?
William Hall
 
Ontology - and Reloaded and Revolutions
Ontology - and Reloaded and RevolutionsOntology - and Reloaded and Revolutions
Ontology - and Reloaded and Revolutions
Jie Bao
 

Similar a ICPW2007.deWaard (20)

Sensemaking in Science
Sensemaking in ScienceSensemaking in Science
Sensemaking in Science
 
Scientific Sensemaking
Scientific SensemakingScientific Sensemaking
Scientific Sensemaking
 
C-SHALS 2010: representing scientific discourse, or: why triples are not enough
C-SHALS 2010: representing scientific discourse, or:  why triples are not enoughC-SHALS 2010: representing scientific discourse, or:  why triples are not enough
C-SHALS 2010: representing scientific discourse, or: why triples are not enough
 
A syntagmatic/Paradigmatic analysis of scientific text
A syntagmatic/Paradigmatic analysis of scientific textA syntagmatic/Paradigmatic analysis of scientific text
A syntagmatic/Paradigmatic analysis of scientific text
 
A syntagmatic and paradigmatic analysis of scientific text
A syntagmatic and paradigmatic analysis of scientific textA syntagmatic and paradigmatic analysis of scientific text
A syntagmatic and paradigmatic analysis of scientific text
 
Are we finally ready for transclusion?*
Are we finally ready for transclusion?*Are we finally ready for transclusion?*
Are we finally ready for transclusion?*
 
Annotation systems
Annotation systemsAnnotation systems
Annotation systems
 
Stories that persuade with data
Stories that persuade with dataStories that persuade with data
Stories that persuade with data
 
Knowledge Media Panel U Toronto, Sept 30 2010
Knowledge Media Panel U Toronto, Sept 30 2010Knowledge Media Panel U Toronto, Sept 30 2010
Knowledge Media Panel U Toronto, Sept 30 2010
 
KNDI Toronto panel
KNDI Toronto panelKNDI Toronto panel
KNDI Toronto panel
 
Reengineering the scientific research paper
Reengineering the scientific research paperReengineering the scientific research paper
Reengineering the scientific research paper
 
On the research paper, and the knowledge within
On the research paper, and the knowledge withinOn the research paper, and the knowledge within
On the research paper, and the knowledge within
 
Argumentation in biology papers
Argumentation in biology papersArgumentation in biology papers
Argumentation in biology papers
 
Arágon et all english grammar in context for academic and professional purp...
Arágon et all   english grammar in context for academic and professional purp...Arágon et all   english grammar in context for academic and professional purp...
Arágon et all english grammar in context for academic and professional purp...
 
Towards Collaborative Environments for Ontology Construction and Sharing
Towards Collaborative Environments for Ontology Construction and SharingTowards Collaborative Environments for Ontology Construction and Sharing
Towards Collaborative Environments for Ontology Construction and Sharing
 
· Persuasive essay rubicExemplaryProficientNeeds Improvement.docx
· Persuasive essay rubicExemplaryProficientNeeds Improvement.docx· Persuasive essay rubicExemplaryProficientNeeds Improvement.docx
· Persuasive essay rubicExemplaryProficientNeeds Improvement.docx
 
Evolution, Humanity and Religion Where is the evidence for God?
Evolution, Humanity and Religion Where is the evidence for God?Evolution, Humanity and Religion Where is the evidence for God?
Evolution, Humanity and Religion Where is the evidence for God?
 
English grammar in context
English grammar in contextEnglish grammar in context
English grammar in context
 
Contextual practice2033
Contextual practice2033Contextual practice2033
Contextual practice2033
 
Ontology - and Reloaded and Revolutions
Ontology - and Reloaded and RevolutionsOntology - and Reloaded and Revolutions
Ontology - and Reloaded and Revolutions
 

Más de pragmaticweb

ICPW2007.deLeenheerChristiaens
ICPW2007.deLeenheerChristiaensICPW2007.deLeenheerChristiaens
ICPW2007.deLeenheerChristiaens
pragmaticweb
 
ICPW2007.AgerfalkSjostrom
ICPW2007.AgerfalkSjostromICPW2007.AgerfalkSjostrom
ICPW2007.AgerfalkSjostrom
pragmaticweb
 
ICPW2007.CahierZaherZacklad
ICPW2007.CahierZaherZackladICPW2007.CahierZaherZacklad
ICPW2007.CahierZaherZacklad
pragmaticweb
 

Más de pragmaticweb (9)

ICPW2007.Aakhus
ICPW2007.AakhusICPW2007.Aakhus
ICPW2007.Aakhus
 
ICPW2007.Hoffman
ICPW2007.HoffmanICPW2007.Hoffman
ICPW2007.Hoffman
 
ICPW2007.deLeenheerChristiaens
ICPW2007.deLeenheerChristiaensICPW2007.deLeenheerChristiaens
ICPW2007.deLeenheerChristiaens
 
ICPW2007.deMoor
ICPW2007.deMoorICPW2007.deMoor
ICPW2007.deMoor
 
ICPW2007.Yetim
ICPW2007.YetimICPW2007.Yetim
ICPW2007.Yetim
 
ICPW2007.Paschke
ICPW2007.PaschkeICPW2007.Paschke
ICPW2007.Paschke
 
ICPW2007.AgerfalkSjostrom
ICPW2007.AgerfalkSjostromICPW2007.AgerfalkSjostrom
ICPW2007.AgerfalkSjostrom
 
ICPW2007.CahierZaherZacklad
ICPW2007.CahierZaherZackladICPW2007.CahierZaherZacklad
ICPW2007.CahierZaherZacklad
 
ICPW2007.Delugach
ICPW2007.DelugachICPW2007.Delugach
ICPW2007.Delugach
 


ICPW2007.deWaard

  • 1. Science Beyond the Facts: A Pragmatic Structure for Research Articles Anita de Waard Elsevier Labs, Disruptive Technologies Utrecht University
  • 3. Once Upon a Time.... - There was too much scientific information (43,848 Papers on p53) - And it was all written in stories.... [demo Papers]
  • 4. Research Goal - Find a structure for research articles, that allows computer-aided access to knowledge elements - Start with Research Articles in Cell Biology - Expand to other genres/domains? - How do we extract this structure? - How do we use this structure?
  • 5. Pragmatic
      1. Colloquial: practical, vs. theoretical
      2. Linguistic: ‘meaning of linguistic messages in their context of use’ (per/il/locutionary goals)
      3. Pragmaticweb: ‘quality of goal-oriented discourse in communities’
      Meaning (English 306A; Harris): speech acts, conversational maxims, face principles, deixis, …
        Semantics: Propositions; Truth/falsity; Context-free; Language-in-vitro
        Pragmatics: Utterances; Appropriateness; Context-dependent; Language-in-vivo
  • 7. Genre + Discourse Studies - Science is written in text, as a story - Text is created by humans to persuade other humans (peers, that claims are facts) - To tell the computer how we encode our knowledge, we need to understand: => How do humans tell stories? => How do stories make sense?
  • 8. Work on corpus - Corpus of 14 coherent (citing, cited) articles in Cell Biology, based around (Voorhoeve, 2006) - Hand-modeled ascii text; created XML - Manual (by me + small user validation)
  • 9. (Preliminary) Results
  • 10. 1st Attempt: Classical rhetoric (columns: Aristotle; Quintilian; Cell; APA Style Guide)
      - Introduction (Aristotle: prooimion; Quintilian: exordium; Cell: Introduction; APA Style Guide: Introduction). The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal of ethos in order to establish credibility with the audience.
      - Statement of Facts (Aristotle: prothesis; Quintilian: narratio; Cell: Introduction; APA Style Guide: Introduction). The second part of a classical oration, following the introduction or exordium. The speaker here provides a narrative account of what has happened and generally explains the nature of the case. Quintilian adds that the narratio is followed by the propositio, a kind of summary of the issues or a statement of the charge.
      - Summary (Quintilian: propositio; Cell: Abstract; APA Style Guide: Abstract). Coming between the narratio and the partitio of a classical oration, the propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation.
      - Division/outline (Quintilian: partitio; Cell: Table of Contents; APA Style Guide: Article Outline). Following the statement of facts, or narratio, comes the partitio or divisio. In this section of the oration, the speaker outlines what will follow, in accordance with what's been stated as the status, or point at issue in the case. Quintilian suggests the partitio is blended with the propositio and also assists memory.
      - Proof (Aristotle: pistis; Quintilian: confirmatio; Cell: Results; APA Style Guide: Methods, Results). Following the division / outline or partitio comes the main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here.
      - Refutation (Quintilian: refutatio; Cell: Discussion; APA Style Guide: Discussion). Following the confirmatio or section on proof in a classical oration, comes the refutation. As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent.
      - Peroratio (Aristotle: epilogos; Quintilian: peroratio; Cell: Discussion; APA Style Guide: Discussion). Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up (see the figures of summary, below).
  • 11. 2nd Attempt: Story Grammar
      Story: The Story of Goldilocks and the Three Bears
      Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins
      (each row: story text | story grammar element | paper element | paper text)
      - Once upon a time | Time (Setting) | Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
      - a little girl named Goldilocks | Characters | Objects of study | the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
      - She went for a walk in the forest. Pretty soon, she came upon a house. | Location | Experimental setup | studied and compared in vivo effects and interactions to those of the human protein
      - She knocked and, when no one answered, | Goal (Theme) | Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
      - she walked right in. | Attempt | Hypothesis | Atx-1 may play a role in the regulation of gene expression
      - At the table in the kitchen, there were three bowls of porridge. | Name (Episode 1) | Name | dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
      - Goldilocks was hungry. | Subgoal | Subgoal | test the function of the AXH domain
      - She tasted the porridge from the first bowl. | Attempt | Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
      - This porridge is too hot! she exclaimed. | Outcome | Results | Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cell (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1 [82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
      - So, she tasted the porridge from the second bowl. | (none) | Data | (data not shown),
      - This porridge is too cold, she said | Outcome | Results | both genotypes show many large holes and loss of cell integrity at 28 days
      - So, she tasted the last bowl of porridge. | (none) | Data | (Figures 1B-1D).
  • 12. 3rd Attempt: Discourse Segments - “A text is made up of Discourse Segments and the relations between them” - Grosz and Sidner, Mann-Thompson, Marcu, Swales - Discourse Segment Purpose: element that has a consistent rhetorical/pragmatic goal. - Define for Biological Research Article
  • 13. Discourse Segments In Biology <Goal> To examine miRNA expression from the miR-Vec system, </Goal> <Method> a miR-24 minigene-containing virus was transduced into human cells. Expression was determined using an RNase protection assay (RPA) with a probe designed to identify both precursor and mature miR-24 (Figure 1B). </Method> <Result> Figure 1C shows that cells transduced with miR-Vec-24 clearly express high levels of mature miR-24, whereas little expression was detected in control-transduced cells. </Result>
  • 15. Segments vs. Sections
                   Introduction  Method  Results  Discussion  Total
      Fact                   63       0      104          37    204
      Problem                20       0       10          15     45
      Goal                    2       0       72           6     80
      Method                  2     all      129           6    137
      Result                 10       0      230          44    284
      Implication            14       0      100          36    150
      Hypothesis             10       0       33          26     69
      Total                 121       0      678         170    969
  • 16. Segment Tense
                        Fact        Problem    Goal       Method      Result      Implication  Hypothesis
      Present active    72 (46%)    27 (60%)   15 (23%)     7 (7%)     37 (16%)    69 (51%)    38 (55%)
      Present passive    5 (3%)      2 (4%)     2 (3%)      1 (1%)      1 (0%)     11 (8%)      1 (1%)
      Past active       18 (11%)     5 (11%)   11 (17%)    48 (47%)   122 (54%)    16 (12%)     8 (12%)
      Past passive      25 (16%)     2 (4%)     1 (2%)     17 (17%)    21 (9%)      1 (1%)      5 (7%)
      Future             2 (1%)      3 (7%)     0 (0%)      0 (0%)      1 (0%)      0 (0%)      0 (0%)
      Imperfect: "to"   13 (8%)      2 (4%)    32 (50%)     2 (2%)     20 (9%)     14 (10%)     7 (10%)
      Gerund ("ing")    22 (14%)     4 (9%)     3 (5%)     28 (27%)    23 (10%)    24 (18%)    10 (14%)
      Total            157 (100%)   45 (100%)  64 (100%)  103 (100%)  225 (100%)  135 (100%)   69 (100%)
  • 17. Segment order
                   Fact  Hypothesis  Problem  Goal  Method  Result  Implication  End  Total
      Start          18           3        1     8       2       2            4    0     38
      Fact           83          22       13    17       9      31           12    1    188
      Hypothesis     20           5        3     7       6       2            6    3     52
      Problem         9           7        7     2       3       5            3    3     39
      Goal            7           0        2     4      46       6            0    0     65
      Method         13           2        3    10      25      54            3    0    110
      Result         23           9        4     6      16      85           78    6    227
      Implication    13           6        4    12      11      30           12   25    113
      Total         186          54       37    61     118     215          118   38    827
  • 18. Discourse: A Fact(ory) [Diagram. Realms: hypothetical realm (might, would); realm of activity (to test, to see); realm of experience (past); realm of models (present). Segments shown: fact, problem, goal, method, result, implication, hypothesis. Connecting phrases: "resulting in", "suggests that", "incongruity or ignorance". Regions: introduction, discussion; shared view, own view.]
  • 19. Links (Under Construction) To references: - From/to segment type makes difference: methods link, fact link, agree/disagree link - Not clear where to link into: is claim truly in referred document? How to locate? To figures/tables: - Usually main proof in results (methods) segments: need to allow multi-media elements in system! Discourse relations: - Many taxonomies: RST, Hovy, Sanders, ClaiMaker - Identify textual coherence/argumentation...
  • 20. Coherence Markers Fact Problem Goal Method Results Implication Hypothesis Fact in animals however to, we we fused, we in contrast, we our data suggesting (3x) examined utilised found (5x), suggest, we that (2x) (2x) though, on propose that, average, under consistent our conditions with Problem we fused in this paper Goal we isolated we showed Method we found (2x), but suggests we while, as seen predicted Results in addition, we utilised, interestingly (2x), (strongly) we in contrast we used since (3x), also suggests/ propose, (2x), while (2x), suggesting suggesting second (2x), third that (8x), that (2x), finally (2x), implicating subsequent, (2x), thereafter, in our consistent study with (2x), demonstrating that (3x) Implication to verify, to we however, first also in theory confirm replaced, we (2x), interestingly fused, we (2x), consistent tried with, in our analysis, strikingly, neither
  • 21. Preliminary Hypotheses (hypothesis: outcome)
      1. 'To' infinitive appears as marker of Goal moves: +
      2. Sequential connectives appear within same segment type: -
      3. 'though', 'however', 'therefore' - causal connectives occur at all -> Problem and -> Hypothesis boundaries: 0
      4. 'suggests' occurs at Results -> Implication/Hypothesis boundary: +/0
      5. 'we found' / 'we observed' / 'we showed' -> Result boundary: +/0
      6. 'we + other verb' occurs at -> Method boundary: 0
      7. Contrast/correspondence in Fact <-> Result <-> Implication moves: +!
  • 23. Research Goals fulfilled? allow computer-aided access to knowledge: yes, but: > need to identify if they do cover this genre > need to finalize a structure of relations other genres/domains? > investigate more than cell biology how do we extract this structure? > collaborative attempts to identify segment markers/ relationships - next step how do we use this structure? : [ DEMO ] > possible collaborations with sensemaking systems?
  • 24. Preliminary Conclusions - Science is created in text - Goal of text is to convince peers that claims (backed by data) belong to fact canon - Text convinces humans through rhetorical/narrative discourse structure - Text creates meaning in the human mind - Discourse parsing could allow access to knowledge structure - More work needed: collaborations?
  • 25. Questions? anita@cs.uu.nl http://people.cs.uu.nl/anita
  • 27. Related work
      Approaches compared: Style Guides; Shum et al.; Harmsze; Swales; RST; Teufel; Collier et al. (Bio-informatics)
      Sections: x x x
      Moves: x x x x!
      Entities: x x
      Embedding: x x
      Discourse relations: x x x
      Argumentational relations: x x
      * Need complete model for multidocument collection – markup content elements and relationships
      * Unique role as a publisher: can apply/mandate at the source
  • 28. Total
                   Fact  Problem  Goal  Method  Result  Implication  Hypothesis  End  Total
      Start          18        1     8       2       2            4           3    0     38
      Fact           83       13    17       9      31           12          22    1    188
      Problem         9        7     2       3       5            3           7    3     39
      Goal            7        2     4      46       6            0           0    0     65
      Method         13        3    10      25      54            3           2    0    110
      Result         23        4     6      16      85           78           9    6    227
      Implication    13        4    12      11      30           12           6   25    113
      Hypothesis     20        3     7       6       2            6           5    3     52
      Total         186       37    61     118     215          118          54   38    827
      Selfs 221 Model: 399 19 % in 65.84% Model:
  • 29. Clause Classification Test
      Tests (headers as listed: Nr, Section, Introduction, Results, Discussion, Results):
        A1  Agami, Results           4
        A2  Agami, Discussion        ½  2  ½
        A3  Agami, Introduction      3
        S1  Serrano, Results         2
        S2  Serrano, Discussion      1  1
        S3  Serrano, Introduction    2
        V1  Voorhoeve, Results       2
        V2  Voorhoeve, Discussion    3
        V3  Voorhoeve, Introduction  1  2
      Clause assignment test (8 tests handed in, avg. 38 clauses each), 114 Clauses:
        51  No Disagreement
        13  Fact/Result
        11  Fact/Problem
        10  Method/Result
         7  Result/Implication
         4  Goal/Method
         3  Problem/Goal
         2  Goal/Result
         2  Problem/Interpretation
         2  Fact/Interpretation
         1  Problem/Result
      Comments on classification:
        • Incomplete sentences are unclear, hard to classify
        • Add ‘Hypothesis’ category, exx. clauses 8, 33, 74a, 77, 78b.
        • Other possible categories: Assumption, Observation, “Given that...”
  • 30. References
      • Austin, J.L. How to do things with words, J.O. Urmson, ed. Oxford: Clarendon Press, 1962.
      • Bazerman, Charles. Shaping written knowledge: the genre and activity of the experimental article in science. Madison, Wisconsin: Univ. of Wisconsin Press, 1988.
      • Bex, F.J., H. Prakken, C. Reed & D.N. Walton. Towards a formal account of reasoning about evidence: argumentation schemes and generalisations. Artificial Intelligence and Law 11 (2003), 125-165.
      • Buckingham Shum, Simon J., Uren, V. et al. Modelling Naturalistic Argumentation in Research Literatures: Representation and Interaction Design Issues. Tech Report kmi-04-28, December 2004.
      • Harmsze, Frédérique. A modular structure for scientific articles in an electronic environment (HTML & PDF). PhD Thesis, February 9, 2000.
      • Hovy, E. Automated discourse generation using discourse structure relations. Art. Intelligence 63(1-2): 1993. 341-386.
      • Kircz, Joost G. Modularity: the next form of scientific information presentation? Journal of Documentation, vol. 54, no. 2, March 1998, pp. 210-235.
      • Kuhn, Thomas. The Structure of Scientific Revolutions. Chicago: University of Chicago Press, 1962.
      • Latour, B. Science in Action: How to Follow Scientists and Engineers through Society. Cambridge, Ma.: Harvard University Press, 1987.
      • Latour, Bruno, Steve Woolgar, Jonas Salk. Laboratory Life: The Construction of Scientific Facts. Princeton University Press, 1986.
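
The segment markup on slide 13 and the segment-order counts on slide 17 could be combined in a small script. The following is only a minimal sketch, not the tooling used for the talk (the slides say the corpus was hand-modeled): it assumes segments are marked with flat, non-nested tags named after the seven segment types, and the function names `extract_segments` and `count_transitions` are hypothetical. The sample text is abridged from slide 13.

```python
import re
from collections import Counter

# Segment types as listed on the "Segments vs. Sections" slide.
SEGMENT_TYPES = ("Fact", "Problem", "Goal", "Method",
                 "Result", "Implication", "Hypothesis")

# Matches <Type> ... </Type>. The backreference \1 requires the closing
# tag to name the same type as the opening tag. Assumption: no nesting.
SEGMENT_RE = re.compile(
    r"<(%s)>(.*?)</\1>" % "|".join(SEGMENT_TYPES), re.DOTALL
)


def extract_segments(marked_up_text):
    """Return (segment_type, text) pairs in document order."""
    return [
        (m.group(1), " ".join(m.group(2).split()))  # collapse whitespace
        for m in SEGMENT_RE.finditer(marked_up_text)
    ]


def count_transitions(segments):
    """Count (previous type, next type) pairs, padded with Start and End."""
    labels = ["Start"] + [seg_type for seg_type, _ in segments] + ["End"]
    return Counter(zip(labels, labels[1:]))


# Abridged from the example on slide 13.
sample = """
<Goal> To examine miRNA expression from the miR-Vec system, </Goal>
<Method> a miR-24 minigene-containing virus was transduced into human
cells. </Method>
<Result> Figure 1C shows that cells transduced with miR-Vec-24 clearly
express high levels of mature miR-24. </Result>
"""

segments = extract_segments(sample)
transitions = count_transitions(segments)

for (previous, following), n in sorted(transitions.items()):
    print(f"{previous} -> {following}: {n}")
```

Summing such counters over every paragraph of a marked-up corpus would give a table with the same shape as the one on slide 17; whether Start and End were counted per paragraph or per section in the original study is not stated on the slides.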