Categorizing Epistemic Segment Types in Biology Research Articles

                                       Anita de Waard
                                  Elsevier Labs, Amsterdam
                                 UiL-OTS, Utrecht University




Thursday, September 17, 2009                                   1
Introduction




Why Study Biological Discourse?

                    -      There is too much of it!

                    -      Text mining and ‘fact
                           extraction’ techniques are
                           gaining ground to tame this
                           tangle

                    -      Emerging area of biological
                           natural language processing
                           (BioNLP): subfield of computational linguistics

                    -      Main focus: identifying biological entities (genes,
                           proteins, drugs) and their relationships



Example state of the art: MEDIE

                                       without some idea of the status of the
                                        sentence, it cannot be interpreted!

    Alteration of nm23, P53, and S100A4 expression may
    contribute to the development of gastric



                     Previous studies have implicated miR-34a as a tumor
                     suppressor gene whose transcription is activated by p53.




How can linguistics help?
             Underlying model of text mining systems:

                      -        Scientific paper is ‘statement of pertinent facts’
                      -        So: finding entities and relationships will give you a summary of
                               the knowledge within the paper
                      -        However, information extracted this way is not very useful....
             Proposed approach: treat scientific paper as a persuasive text: specific
             genre, with genre characteristics and allowed persuasive techniques:
                      -        ‘these results suggest’ (depersonification)

                      -        ‘as fig. 2a shows’ (evidence is in the data)

                      -        ‘oncogenes produce a stress response [Serrano, 2003]’

             References and data form a “folded array of successive defense lines, behind
             which scientists ensconce themselves” [Latour, 1988]




Modality Dropping
                    -      Fact creation occurs through social acceptance: “[Y]ou can
                           transform .. fiction into fact just by adding or subtracting
                           references” [Latour, 1988]

                    -      When references are cited the modality is dropped:

                          -    A: ‘these results suggest/demonstrate/imply that’ X

                          -    B: ‘A et al. have shown that X [A, 2009]’

                          -    C: ‘X [2009]’

                          -    D: ‘Since X, we investigated the possibility that Y’
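
A minimal sketch of how the four stages A to D above might be told apart from surface patterns alone. The function name `modality_stage` and all regular expressions are assumptions made for illustration; they are not taken from the talk, and a real system would need parsing and a curated lexicon.

```python
import re

# Hypothetical surface patterns for the four stages on the slide.
HEDGED_CLAIM = re.compile(
    r"\bthese (results|data|observations) (suggest|demonstrate|imply|indicate)\b",
    re.I,
)
ATTRIBUTED = re.compile(r"\bet al\.?,? (have|has) (shown|demonstrated|reported)\b", re.I)
CITATION = re.compile(r"[\[(][^\])]*\b(19|20)\d{2}[\])]")
PRESUPPOSED = re.compile(r"^\s*(since|because|as)\b", re.I)


def modality_stage(segment: str) -> str:
    """Return 'A', 'B', 'C', 'D' or 'unknown' for a text segment."""
    if HEDGED_CLAIM.search(segment):
        return "A"  # own claim, modality still present
    if ATTRIBUTED.search(segment):
        return "B"  # claim attributed to named authors
    if CITATION.search(segment):
        return "C"  # bare statement followed by a citation
    if PRESUPPOSED.search(segment):
        return "D"  # statement presupposed as a premise
    return "unknown"
```

The order of the checks matters: a segment of type B also contains a citation, so the attribution pattern is tested before the bare-citation pattern.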




Overall Research Questions
              I. (How) can we add epistemic value to results from a
                 text mining system?
              II. How is a scientific fact created, as it moves from a
                  hedged claim to a fact through successive citations?
              III. Can we identify a rhetorically successful text (and
                   help authors create them)?




Present work:
            Perform discourse analysis on a few selected texts in
            biology:
            1. Parse text into discourse segments (edu’s) containing a
               single rhetorical move (if possible...)
            2. Determine categories or types of discourse segments
               that have similar rhetorical/pragmatic properties
            3. Look at a number of linguistic characteristics and see if
               these segments share those characteristics.




Present research questions:

          i. Can these segments indeed be grouped by linguistic
             characteristics (verb tense, verb registry, metadiscourse
             markers?)
          ii. Does this offer a useful version of the structure of a
              paper?
          iii. Is this useful for enabling automated epistemic markup?
          iv. Can this help us to trace evolution of a hypothesis?




Methods




Method
            1. Parse text into Discourse Segments (EDUs) according to
               syntactic criteria
            2. Define set of semantic segment types
            3. Identify semantic type for each segment
            4. Specify linguistic and structural properties for each
               segment
            5. Identify correlations between semantic type and
               structural/syntactic properties
            6. Trace a hypothesis through the process of fact creation
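
Step 5 above can be sketched as a simple cross-tabulation of segment type against one annotated property (for example verb tense). The function names `cross_tabulate` and `dominant_value` are hypothetical, and the input format (a list of pairs) is an assumption, not the annotation format used in the study.

```python
from collections import Counter, defaultdict


def cross_tabulate(annotations):
    """Count how often each (segment type, property value) pair occurs.

    `annotations` is an iterable of (segment_type, property_value) pairs.
    """
    table = defaultdict(Counter)
    for segment_type, value in annotations:
        table[segment_type][value] += 1
    return table


def dominant_value(table, segment_type):
    """Return the most frequent property value for a type and its share."""
    counts = table[segment_type]
    value, n = counts.most_common(1)[0]
    return value, n / sum(counts.values())
```

A high share for one value (say, past tense for result segments) would point to a correlation worth testing on more texts; the counts themselves have to come from the annotated corpus.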




Segmentation Criteria
        Goal: ‘one new thought per segment’:
                Figure 4A shows that following RASV12 stimulation, p53
                was stabilized and activated, and its target gene, p21cip1,
                was induced in all cases, indicating an intact p53 pathway
                in these cells.
                a.     Figure 4A shows that
                b.     following RASV12 stimulation,
                c.     p53 was stabilized and activated,
                d.     and its target gene, p21cip1, was induced in all cases,
                e.     indicating an intact p53 pathway in these cells.



Segmentation Criteria (summary)
    Finite/Non-finite  Grammatical role                                        Segment?  Example
    Finite/Non-finite  Subject                                                 N         The extent to which miRNAs specifically affect metastasis
    Finite/Non-finite  Direct object                                           Y         these miRNAs are potential novel oncogenes
    Nonfinite          Phrase-level adjunct (restrictive and non-restrictive)  N         spanning a given miRNA genomic region
    Nonfinite          Clause-level adjunct                                    Y         by cloning eight miR-Vec plasmids
    Finite             Non-restrictive phrase-level adjunct                    Y         which is only active when tamoxifen is added (De Vita et al, 2005) […]
    Finite             Restrictive phrase-level adjunct                        N         that we examined
    Finite             Clause-level adjunct                                    Y         which correlates with the reported ES-cell expression pattern of the miR-371-3 cluster (Suh et al, 2004)
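
The segmentation criteria in this table can be written as a small decision function. This is a sketch under the assumption that a parser has already supplied finiteness, grammatical role and restrictiveness for each clause; the function name `is_segment` and the role strings are hypothetical.

```python
from typing import Optional


def is_segment(finite: bool, role: str, restrictive: Optional[bool] = None) -> bool:
    """Decide whether a clause is split off as its own discourse segment."""
    role = role.lower()
    if role == "subject":
        return False  # subject clauses stay with their matrix clause
    if role in ("direct object", "clause-level adjunct"):
        return True  # split regardless of finiteness
    if role == "phrase-level adjunct":
        if not finite:
            return False  # nonfinite: never split, restrictive or not
        if restrictive is None:
            raise ValueError("restrictive must be given for finite phrase-level adjuncts")
        return not restrictive  # only non-restrictive finite adjuncts split
    raise ValueError(f"unknown grammatical role: {role}")
```

For instance, `is_segment(True, "phrase-level adjunct", restrictive=True)` corresponds to the row with the example ‘that we examined’ and yields no split.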




Basic Segment Types
    Segment      Description                                        Example
    Fact         a known fact, generally without explicit citation  mature miR-373 is a homolog of miR-372
    Hypothesis   a proposed idea, not supported by evidence         This could for instance be a result of high mdm2 levels
    Problem      unresolved, contradictory, or unclear issue        However, further investigation is required to demonstrate the exact mechanism of LATS2 action
    Goal         research goal                                      To identify novel functions of miRNAs,
    Method       experimental method                                Using fluorescence microscopy and luciferase assays,
    Result       a restatement of the outcome of an experiment      all constructs yielded high expression levels of mature miRNAs
    Implication  an interpretation of the results, in light of      our procedure is sensitive enough to detect mild growth differences
                 earlier hypotheses and facts




Two Types of Derived Segment Types
                ‘Other-segments’, related to (referenced) other work:

                -      other-result: ‘they are also found in the FCX and other cortical structures
                       [Sokoloff et al., 1990]’

                -      other-goal: ‘the role of D3 receptors in the control of motivation and affect
                       has been intensively studied [Heidbreder et al., 2005]’

                -      other-implication: ‘D1 or, more likely, D5, receptors have been implicated in
                       mechanisms underlying long-term spatial memory [Hersi et al., 1995]’

                Regulatory segments, acting as matrix sentences framing other segments:

                -      reg-hypothesis: ‘we hypothesized that ’

                -      reg-implication: ‘These observations suggest that’

                -      intratextual: ‘Fig 4 shows that’

                -      intertextual: ‘reviewed in (Serrano, 1997)’
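
One possible way to represent the basic and derived segment types as a data structure. The names `BasicType` and `parse_label` are hypothetical, and the sketch assumes that any basic type can take an `other-` or `reg-` prefix, which goes beyond the examples listed on the slide.

```python
from enum import Enum


class BasicType(Enum):
    FACT = "fact"
    HYPOTHESIS = "hypothesis"
    PROBLEM = "problem"
    GOAL = "goal"
    METHOD = "method"
    RESULT = "result"
    IMPLICATION = "implication"


def parse_label(label: str):
    """Split an annotation label into (family, basic type or None).

    family is 'own', 'other' (referenced work) or 'regulatory'
    (matrix clauses that frame another segment).
    """
    label = label.strip().lower()
    if label in ("intratextual", "intertextual"):
        return ("regulatory", None)
    for prefix, family in (("other-", "other"), ("reg-", "regulatory")):
        if label.startswith(prefix):
            return (family, BasicType(label[len(prefix):]))
    return ("own", BasicType(label))  # raises ValueError for unknown labels
```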




My categories vs. Latour (1979)




Linguistic and structural properties
                        1. Position in text

                               -   Section of the paper (Introduction, Results, Discussion)
                               -   Beginning/middle/end of section
                               -   First/second/third part of sentence
                        2. Verb:

                               -   Tense, aspect, voice
                               -   Verb class (idiosyncratic)
                               -   Lexicon

                        3. Metadiscourse markers [Hyland, 2003]:

                               -   Connectives
                               -   Endophorics, Evidentials
                               -   Hedges, Boosters
                               -   Person markers


Verb class
    Two types of entities interact in biology texts:
    -       Thing:
              -       Thing -> Increase, die, etc
              -       Thing-thing: affect, stimulate etc.
    -       People:
              -       People -> Thing:
                    -          Examine (Goal)
                    -          Operate (Method)

                    -          Observe (Result)
                    -          Implicate (Implication)
              -       People - people: Report
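
The link between the People -> Thing verb classes and segment types can be sketched as two lookups. The class-to-type mapping follows the slide; the verb lexicon `TOY_LEXICON` is invented purely for illustration and is not the lexicon used in the study.

```python
# Verb class -> segment type, as listed on the slide.
SEGMENT_FOR_VERB_CLASS = {
    "examine": "goal",
    "operate": "method",
    "observe": "result",
    "implicate": "implication",
}

# Hypothetical lemma -> verb class assignments (assumption, illustration only).
TOY_LEXICON = {
    "investigate": "examine",
    "identify": "examine",
    "clone": "operate",
    "transduce": "operate",
    "observe": "observe",
    "detect": "observe",
    "suggest": "implicate",
    "indicate": "implicate",
}


def segment_type_for_verb(lemma: str):
    """Guess a segment type from a verb lemma, or None if the verb is unknown."""
    verb_class = TOY_LEXICON.get(lemma.lower())
    if verb_class is None:
        return None
    return SEGMENT_FOR_VERB_CLASS.get(verb_class)
```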



Results




Two texts
                    1. Voorhoeve, 2006: Cell

                          -    Cell biology text, written by group in Amsterdam

                          -    Dealing with microRNAs - hot topic

                          -    290 citations in Google Scholar: successful paper!


                    2. Louiseau, 2008: European Neuropsychopharmacology

                          -    Text on schizophrenia

                          -    Prompted by interest from Pharma company

                          -    Adjacent subfield of biology (neuropharmacology)




Segment vs. Section




Segment vs. Verb Type




Segment vs. verb tense




Segments vs. markers




Segment Order




Discussion




Interpretation: 3 Realms of Science:
    Conceptual realm:
        (1)  Oncogene-induced senescence is characterized by the appearance of cells with a
             flat morphology that express senescence associated (SA)-β-Galactosidase.
        (4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents RASV12-induced
             growth arrest in primary human cells.
    Between conceptual and experimental realm:
        (2a) Indeed,
        (4a) Altogether, these data show that
    Experimental realm:
        (2b) control RASV12-arrested cells showed relatively high abundance of flat cells
             expressing SA-β-Galactosidase
        (3a) Consistent with the cell growth assay,
        (3b) very few cells showed senescent morphology when transduced with either
             miR-Vec-371&2, miR-Vec-373, or control p53kd.
    Data realm (Figures):
        (2c) (Figures 2G and 2H).




Tense 1: Concepts vs. Experiment
    Concept realm:
        (1)  Oncogene-induced senescence is characterized by the appearance of cells with a
             flat morphology that express senescence associated (SA)-β-Galactosidase.
        (4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents RASV12-induced
             growth arrest in primary human cells.
    Between concept and experimental realm:
        (2a) Indeed,
        (4a) Altogether, these data show that
    Experimental realm (personal, past):
        (2b) control RASV12-arrested cells showed relatively high abundance of flat cells
             expressing SA-β-Galactosidase
        (3a) Consistent with the cell growth assay,
        (3b) very few cells showed senescent morphology when transduced with either
             miR-Vec-371&2, miR-Vec-373, or control p53kd.
    Data realm (nonverbal):
        (2c) (Figures 2G and 2H).




Tense 2: Referral

    [Figure: tense used when referring to own and other work]
    Axis: past, present, future
    Rows: own paper (Introduction, Discussion); other papers (Other Work)
    Labels: After other work: past; Before current work: present;
            Current work (= Results section); After current work: past




Tense 1+ 2 = 3:


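    [Figure: conceptual vs. experiential realm over reading time]
    Vertical axis: Conceptual (Claim, fact); Experiential (Experiment)
    Horizontal axis: Reading time (past, present, future)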




Discourse Fact-ory
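    [Figure: flow of segment types through four realms]
    Realms:
        hypothetical realm (might, would): hypothesis
        realm of activity (to test, to see): goal
        realm of experience (past): method, result
        realm of models (present): fact, implication
    Flow: problem; hypothesis; ‘to’ goal; ‘we’ method; ‘resulting in’ result; ‘suggests that’ implication
    Labels: Shared view (fact, fact, fact); Own view (implication);
            paper sections: introduction, results, discussion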
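
A minimal sketch of how the realm labels in this diagram might be used to flag segments whose verb marking deviates from what their type suggests. The pairing of segment types with realms is read off the diagram and is an interpretation; the names `REALM_OF`, `expected_marking` and `is_unexpected`, and the marking labels "modal" and "infinitive", are assumptions.

```python
# Segment type -> (realm, typical verb marking); an interpretation of the diagram.
REALM_OF = {
    "hypothesis": ("hypothetical", "modal"),     # might, would
    "goal": ("activity", "infinitive"),          # to test, to see
    "method": ("experience", "past"),
    "result": ("experience", "past"),
    "fact": ("models", "present"),
    "implication": ("models", "present"),
}


def expected_marking(segment_type: str):
    """Return (realm, marking) for a segment type, or None if not in the diagram."""
    return REALM_OF.get(segment_type.lower())


def is_unexpected(segment_type: str, observed_marking: str) -> bool:
    """True if the observed marking differs from the one the diagram suggests."""
    expected = expected_marking(segment_type)
    return expected is not None and expected[1] != observed_marking
```

A result stated in the present tense would be flagged here, which is the kind of shift from the experimental to the conceptual realm that the tense slides describe.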

Citation and fact creation:
    Voorhoeve, 2006:
        ‘To investigate the possibility that miR-372 and miR-373 suppress the
        expression of LATS2, we...’
        ‘Therefore, these results point to LATS2 as a mediator of the miR-372 and
        miR-373 effects on cell proliferation and tumorigenicity,’
    Yabuta, JBioChem 2007:
        ‘miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)’
    Raver-Shapira et al., JMolCell 2007:
        ‘two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in
        testicular germ cell tumors by inhibition of LATS2 expression, which suggests
        that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).’
    [Diagram, from Concepts level down to Data level:
        Experiment 1: KnownFact, KnownFact, Hypothesis, Goal, Method, Data, Result, Implication
        Experiment 2: Fact, Goal, Method, Data, Result]
Answers to current research questions:
    i.     Can these segments indeed be identified?
                    ✓      yes, adequate evidence, probably ok segments:
                    ‣      need more annotators!
    ii. Does this offer a useful version of the structure of a paper?
                    ✓      yes, offers insight, and a possible model
                    ‣      need to validate whether this structure holds over more
                           papers, different subcategories
    iii. Is this useful for enabling automated epistemic markup?
                    ✓      first efforts seem promising: simple markers (‘suggest’ verbs,
                           connectives, etc.) already help
                    ‣      ongoing research! (Sandor, XRCE; Buitelaar, DERI)
    iv. Can this help us to trace the evolution of a hypothesis?
                    ✓      anecdotal: promising
                    ‣      need to scale up!
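
The ‘simple markers’ mentioned under iii can be illustrated with a small word-list tagger. The marker lists in `MARKERS` and the function name `tag_markers` are assumptions made for illustration; they are not the lexicons used in the cited research.

```python
import re

# Hypothetical marker lists, kept as explicit word forms to avoid stemming errors.
MARKERS = {
    "suggest_verb": ["suggest", "suggests", "suggested", "indicate", "indicates",
                     "indicating", "imply", "implies", "point to", "points to"],
    "connective": ["however", "therefore", "indeed", "altogether"],
    "hedge": ["may", "might", "could", "possibly"],
}


def tag_markers(segment: str) -> dict:
    """Return the marker kinds found in a segment, with the matching word forms."""
    lowered = segment.lower()
    found = {}
    for kind, words in MARKERS.items():
        hits = [w for w in words if re.search(r"\b" + re.escape(w) + r"\b", lowered)]
        if hits:
            found[kind] = hits
    return found
```

Applied to ‘This could for instance be a result of high mdm2 levels’, the tagger reports only the hedge ‘could’, which fits the Hypothesis type that the segment types table assigns to this sentence.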


Where are we on overall research questions?
              I. (How) can we add epistemic value to results from a
                 text mining system?
              ‣       Segment types help - need to expand + verify
              II. How is a scientific fact created, as it moves from a
                  hedged claim to a fact through successive citations?
              ‣       Model is developing, also spurt of other work!
              III. Can we identify a rhetorically successful text (and
                   help authors create them)?
              ‣       Not addressed yet - verb tense, hedging seem
                      important.


Work on (biological) scientific discourse

                    -      Is a growing field of interest!

                    -      Several projects are going ‘beyond the facts’

                    -      Epistemic modality is becoming a term
                           bioinformaticians are exploring

                    -      Room for people who know about discourse
                           analysis!




Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Anita de Waard
 
NFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR DataNFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR DataAnita de Waard
 
CNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsCNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsAnita de Waard
 
Enabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring GuidelinesEnabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring GuidelinesAnita de Waard
 
Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.Anita de Waard
 
Data, Data Everywhere: What's A Publisher to Do?
Data, Data Everywhere: What's  A Publisher to Do?Data, Data Everywhere: What's  A Publisher to Do?
Data, Data Everywhere: What's A Publisher to Do?Anita de Waard
 
Talk on Research Data Management
Talk on Research Data ManagementTalk on Research Data Management
Talk on Research Data ManagementAnita de Waard
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseAnita de Waard
 
Big Data and the Future of Publishing
Big Data and the Future of PublishingBig Data and the Future of Publishing
Big Data and the Future of PublishingAnita de Waard
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsAnita de Waard
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryAnita de Waard
 
The Economics of Data Sharing
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data SharingAnita de Waard
 
Public Identifiers in Scholarly Publishing
Public Identifiers in Scholarly PublishingPublic Identifiers in Scholarly Publishing
Public Identifiers in Scholarly PublishingAnita de Waard
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumAnita de Waard
 
Elsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective DataElsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective DataAnita de Waard
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016Anita de Waard
 
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...Anita de Waard
 

Más de Anita de Waard (20)

Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
 
Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
NFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR DataNFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR Data
 
CNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsCNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data Commons
 
Enabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring GuidelinesEnabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring Guidelines
 
Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.
 
Data, Data Everywhere: What's A Publisher to Do?
Data, Data Everywhere: What's  A Publisher to Do?Data, Data Everywhere: What's  A Publisher to Do?
Data, Data Everywhere: What's A Publisher to Do?
 
Talk on Research Data Management
Talk on Research Data ManagementTalk on Research Data Management
Talk on Research Data Management
 
History of the future
History of the futureHistory of the future
History of the future
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with Dataverse
 
Big Data and the Future of Publishing
Big Data and the Future of PublishingBig Data and the Future of Publishing
Big Data and the Future of Publishing
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost Recovery
 
The Economics of Data Sharing
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data Sharing
 
Public Identifiers in Scholarly Publishing
Public Identifiers in Scholarly PublishingPublic Identifiers in Scholarly Publishing
Public Identifiers in Scholarly Publishing
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
 
Elsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective DataElsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective Data
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016
 
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
 

Último

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Último (20)

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

Epistemics

  • 1. Categorizing Epistemic Segment Types in Biology Research Articles. Anita de Waard, Elsevier Labs, Amsterdam; UiL-OTS, Utrecht University. Thursday, September 17, 2009
  • 3. Why Study Biological Discourse?
      - There is too much of it!
      - Text mining and ‘fact extraction’ techniques are gaining ground to tame this tangle
      - Emerging area of biological natural language processing (BioNLP): a subfield of computational linguistics
      - Main focus: identifying biological entities (genes, proteins, drugs) and their relationships
  • 4. Example state of the art: MEDIE. Without some idea of the status of the sentence, it cannot be interpreted!
      - ‘Alteration of nm23, P53, and S100A4 expression may contribute to the development of gastric […]’
      - ‘Previous studies have implicated miR-34a as a tumor suppressor gene whose transcription is activated by p53.’
  • 5. How can linguistics help?
      Underlying model of text mining systems:
      - A scientific paper is a ‘statement of pertinent facts’
      - So: finding entities and relationships will give you a summary of the knowledge within the paper
      - However, information extracted this way is not very useful...
      Proposed approach: treat the scientific paper as a persuasive text, a specific genre with genre characteristics and allowed persuasive techniques:
      - ‘these results suggest’ (depersonification)
      - ‘as fig. 2a shows’ (evidence is in the data)
      - ‘oncogenes produce a stress response [Serrano, 2003]’
      References and data form a “folded array of successive defense lines, behind which scientists ensconce themselves” [Latour, 1988]
  • 6. Modality Dropping
      - Fact creation occurs through social acceptance: “[Y]ou can transform .. fiction into fact just by adding or subtracting references” [Latour, 1988]
      - When references are cited, the modality is dropped:
          - A: ‘these results suggest/demonstrate/imply that’ X
          - B: ‘A et al. have shown that X [A, 2009]’
          - C: ‘X [2009]’
          - D: ‘Since X, we investigated the possibility that Y’
  • 7. Overall Research Questions
      I. (How) can we add epistemic value to results from a text mining system?
      II. How is a scientific fact created, as it moves from a hedged claim to a fact through successive citations?
      III. Can we identify a rhetorically successful text (and help authors create them)?
  • 8. Present work: perform discourse analysis on a few selected texts in biology:
      1. Parse text into discourse segments (EDUs) containing a single rhetorical move (if possible...)
      2. Determine categories or types of discourse segments that have similar rhetorical/pragmatic properties
      3. Look at a number of linguistic characteristics and see if these segments share those characteristics
  • 9. Present research questions:
      i. Can these segments indeed be grouped by linguistic characteristics (verb tense, verb registry, metadiscourse markers)?
      ii. Does this offer a useful version of the structure of a paper?
      iii. Is this useful for enabling automated epistemic markup?
      iv. Can this help us to trace the evolution of a hypothesis?
  • 11. Method
      1. Parse text into Discourse Segments (EDUs) according to syntactic criteria
      2. Define a set of semantic segment types
      3. Identify the semantic type for each segment
      4. Specify linguistic and structural properties for each segment
      5. Identify correlations between semantic type and structural/syntactic properties
      6. Trace a hypothesis through the process of fact creation
  • 12. Segmentation Criteria. Goal: ‘one new thought per segment’.
      Source sentence: ‘Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.’
      Segments:
      a. Figure 4A shows that
      b. following RASV12 stimulation
      c. p53 was stabilized and activated
      d. and its target gene, p21cip1, was induced in all cases,
      e. indicating an intact p53 pathway in these cells.
  • 13. Segmentation Criteria (summary)
      Finite/Non-finite | Grammatical role | Segment? | Example
      Finite/Non-finite | Subject | N | The extent to which miRNAs specifically affect metastasis
      Finite/Non-finite | Direct Object | Y | these miRNAs are potential novel oncogenes
      Nonfinite | Phrase-level adjunct (restrictive and non-restrictive) | N | spanning a given miRNA genomic region
      Nonfinite | Clause-level adjunct | Y | by cloning eight miR-Vec plasmids
      Finite | Non-restrictive phrase-level adjunct | Y | which is only active when tamoxifen is added (De Vita et al, 2005) […]
      Finite | Restrictive phrase-level adjunct | N | that we examined
      Finite | Clause-level adjunct | Y | which correlates with the reported ES-cell expression pattern of the miR-371-3 cluster (Suh et al, 2004)
  • 14. Basic Segment Types
      Segment | Description | Example
      Fact | a known fact, generally without explicit citation | mature miR-373 is a homolog of miR-372
      Hypothesis | a proposed idea, not supported by evidence | This could for instance be a result of high mdm2 levels
      Problem | unresolved, contradictory, or unclear issue | However, further investigation is required to demonstrate the exact mechanism of LATS2 action
      Goal | research goal | To identify novel functions of miRNAs,
      Method | experimental method | Using fluorescence microscopy and luciferase assays,
      Result | a restatement of the outcome of an experiment | all constructs yielded high expression levels of mature miRNAs
      Implication | an interpretation of the results, in light of earlier hypotheses and facts | our procedure is sensitive enough to detect mild growth differences
  • 15. Two Types of Derived Segment Types
      ‘Other-segments’, related to (referenced) other work:
      - other-result: ‘they are also found in the FCX and other cortical structures [Sokoloff et al., 1990]’
      - other-goal: ‘the role of D3 receptors in the control of motivation and affect has been intensively studied [Heidbreder et al., 2005]’
      - other-implication: ‘D1 or, more likely, D5, receptors have been implicated in mechanisms underlying long-term spatial memory [Hersi et al., 1995]’
      Regulatory segments, acting as matrix sentences framing other segments:
      - reg-hypothesis: ‘we hypothesized that’
      - reg-implication: ‘These observations suggest that’
      - intratextual: ‘Fig 4 shows that’
      - intertextual: ‘reviewed in (Serrano, 1997)’
  • 16. My categories vs. Latour (1979)
  • 17. Linguistic and structural properties
      1. Position in text:
          - Section of the paper (Introduction, Results, Discussion)
          - Beginning/middle/end of section
          - First/second/third part of sentence
      2. Verb:
          - Tense, aspect, voice
          - Verb class (idiosyncratic)
          - Lexicon
      3. Metadiscourse markers [Hyland, 2003]:
          - Connectives
          - Endophorics, Evidentials
          - Hedges, Boosters
          - Person markers
  • 18. Verb class. Two types of entities interact in biology texts:
      - Thing:
          - Thing ->: increase, die, etc.
          - Thing -> Thing: affect, stimulate, etc.
      - People:
          - People -> Thing: Examine (Goal), Operate (Method), Observe (Result), Implicate (Implication)
          - People -> People: Report
  • 20. Two texts
      1. Voorhoeve, 2006: Cell
          - Cell biology text, written by a group in Amsterdam
          - Dealing with microRNAs, a hot topic
          - 290 citations in Google Scholar: a successful paper!
      2. Louiseau, 2008: European Neuropsychopharmacology
          - Text on schizophrenia
          - Prompted by interest from a pharma company
          - Adjacent subfield of biology (neuropharmacology)
  • 21. Segment vs. Section
  • 22. Segment vs. Verb Type
  • 23. Segment vs. verb tense
  • 24. Segments vs. markers
  • 27. Interpretation: 3 Realms of Science. [Figure: an annotated passage laid out across three realms: Conceptual realm, Experimental realm, Data realm (Figures)]
      (1) Oncogene-induced senescence is characterized by the appearance of cells with a flat morphology that express senescence associated (SA)-β-Galactosidase.
      (2a) Indeed,
      (2b) control RASV12-arrested cells showed relatively high abundance of flat cells expressing SA-β-Galactosidase
      (2c) (Figures 2G and 2H).
      (3a) Consistent with the cell growth assay,
      (3b) very few cells showed senescent morphology when transduced with either miR-Vec-371&2, miR-Vec-373, or control p53kd.
      (4a) Altogether, these data show that
      (4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents RASV12-induced growth arrest in primary human cells.
  • 28. Tense 1: Concepts vs. Experiment. [Figure: the same annotated passage as slide 27, segments (1) to (4b), with the realm labels: Concept realm; Experimental realm (personal, past); Data realm (Figures) (nonverbal)]
  • 29. Tense 2: Referral. [Diagram: a past/present/future timeline relating the own paper (Introduction, current work = Results section, Discussion) to other papers (Other Work), showing which tense is used for work before and after the current work]
  • 30. Tense 1 + 2 = 3: Claim, fact. [Diagram: Conceptual, Experiment, and Experiential levels plotted against past, present, and future; axis: reading time]
  • 31. Discourse Fact-ory. [Diagram, labels recovered from the slide:]
      - hypothetical realm: hypothesis (might, would)
      - realm of activity: goal (to test, to see)
      - realm of experience (past): problem, method, result
      - realm of models (present): fact, implication
      - transitions: ‘resulting in’, ‘result suggests that’
      - Shared view (introduction, discussion) vs. Own view (discussion)
  • 32. Citation and fact creation:
      - Voorhoeve, 2006: ‘To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we...’ and ‘Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,’
      - Yabuta, JBioChem 2007: ‘miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)’
      - Raver-Shapira et al., JMolCell 2007: ‘two miRNAs, miRNA-372 and-373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).’
      [Diagram labels: Concepts (KnownFact, Hypothesis, Implication, Fact); Experiment 1 and Experiment 2 (Goal, Method, Result); Data]
  • 33. Answers to current research questions:
      i. Can these segments indeed be identified?
          ✓ yes, adequate evidence, probably ok segments
          ‣ need more annotators!
      ii. Does this offer a useful version of the structure of a paper?
          ✓ yes, offers insight, and a possible model
          ‣ needs to be validated whether this structure holds over more papers, different subcategories
      iii. Is this useful for enabling automated epistemic markup?
          ✓ first efforts seem promising: simple markers (‘suggest’ verbs, connectives, etc.) already help
          ‣ ongoing research! (Sandor, XRCE; Buitelaar, DERI)
      iv. Can this help us to trace the evolution of a hypothesis?
          ✓ anecdotal: promising
          ‣ need to scale up!
  • 34. Where are we on overall research questions?
      I. (How) can we add epistemic value to results from a text mining system?
          ‣ Segment types help; need to expand and verify
      II. How is a scientific fact created, as it moves from a hedged claim to a fact through successive citations?
          ‣ Model is developing, also a spurt of other work!
      III. Can we identify a rhetorically successful text (and help authors create them)?
          ‣ Not addressed yet; verb tense and hedging seem important.
  • 35. Work on (biological) scientific discourse
      - Is a growing field of interest!
      - Several projects are developing that go ‘beyond the facts’
      - Epistemic modality is becoming a term bioinformaticians are exploring
      - Room for people who know about discourse analysis!
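
The regulatory segment types on slide 15 and the ‘simple markers’ mentioned on slide 33 can be illustrated with a small rule-based tagger. This is only a minimal sketch and not the method used in the talk: the pattern table `REGULATORY_PATTERNS` and the function `tag_regulatory` are hypothetical names, and the regular expressions are assumptions built solely from the example phrases quoted on the slides.

```python
import re
from typing import Optional

# Hypothetical marker patterns, one per regulatory segment type from slide 15.
# They cover only the example phrases on the slides; a real system would need
# a much larger, validated lexicon.
REGULATORY_PATTERNS = {
    "reg-hypothesis": re.compile(
        r"\bwe\s+hypothesi[sz]ed?\s+that\b", re.IGNORECASE
    ),
    "reg-implication": re.compile(
        r"\b(?:these|our)\s+(?:results|observations|data|findings)\s+"
        r"(?:suggest|show|indicate|imply|demonstrate)\s+that\b",
        re.IGNORECASE,
    ),
    "intratextual": re.compile(
        r"\bfig(?:ure|\.)?\s*\d+[a-z]?\s+shows\s+that\b", re.IGNORECASE
    ),
    "intertextual": re.compile(r"\breviewed\s+in\b", re.IGNORECASE),
}


def tag_regulatory(segment: str) -> Optional[str]:
    """Return the first regulatory segment type whose marker occurs in the
    segment, or None when no marker matches."""
    for label, pattern in REGULATORY_PATTERNS.items():
        if pattern.search(segment):
            return label
    return None


if __name__ == "__main__":
    # Segments a, c and e-like phrases taken from the slides.
    for text in [
        "Figure 4A shows that",
        "p53 was stabilized and activated",
        "These observations suggest that",
        "reviewed in (Serrano, 1997)",
    ]:
        print(f"{text!r}: {tag_regulatory(text)}")
```

Segments that match none of the patterns return `None`; assigning them one of the basic types from slide 14 (Fact, Result, Implication, and so on) would need the further cues listed on slide 17, such as verb tense and position in the text, which this sketch does not attempt.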