Structured and Unstructured:
                 Extracting Information From Classics
                            Scholarly Texts

                                              Matteo Romanello
                                     Centre for Computing in the Humanities
                                                 King’s College London


                                 Graduate Colloquium - DHSI 2010
                               University of Victoria BC - 8th June 2010



Romanello                                                                         CCH
Extracting Information From Scholarly Texts
The Project at a glance



               Project started in October 2009;
               Disciplines: Digital Humanities, Classics, Computer
               Science;
               co-supervised by:
                       Willard McCarty (KCL, Department of Digital Humanities)
                       Jonathan Ginzburg (KCL, Department of Computer
                       Science)
               project supported by an AHRC (Arts and Humanities
               Research Council) award



Goal

       Devising an automatic system to improve semantic
       information retrieval over a discipline-specific corpus of
       unstructured texts
               focus on secondary sources (e.g. journal papers) as
               opposed to primary sources (i.e. ancient texts)
               automatic -> scalable to large amounts of data
               information retrieval -> finding the documents or passages
               relevant to a user's information need
               unstructured texts -> raw texts (e.g. .txt files) as opposed
               to structured texts encoded in XML

       Example
       “Hom. Il. XII 1”: sequence of 14 characters meaning “first line
       of the twelfth book of Homer’s Iliad”
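The example can be made concrete with a small parser. This is a minimal sketch, assuming a single citation shape (author abbreviation, work abbreviation, Roman book number, Arabic line number); the `AUTHORS` and `WORKS` tables, the `CanonicalRef` type and `parse_reference` are hypothetical names for illustration, and a real system would presumably draw its abbreviations from a knowledge base rather than hard-coded dictionaries.

```python
import re
from typing import NamedTuple

# Hypothetical abbreviation tables (illustration only).
AUTHORS = {"Hom.": "Homer"}
WORKS = {("Homer", "Il."): "Iliad"}

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'XII' to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # Subtractive notation: a smaller numeral before a larger one is subtracted.
        if i + 1 < len(numeral) and ROMAN_VALUES[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


class CanonicalRef(NamedTuple):
    author: str
    work: str
    book: int
    line: int


# Assumed shape: "<Author abbr.> <Work abbr.> <Roman book> <line>"
PATTERN = re.compile(r"^(\S+\.)\s+(\S+\.)\s+([IVXLCDM]+)\s+(\d+)$")


def parse_reference(text: str) -> CanonicalRef:
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised reference: {text!r}")
    author_abbr, work_abbr, book, line = match.groups()
    author = AUTHORS[author_abbr]
    work = WORKS[(author, work_abbr)]
    return CanonicalRef(author, work, roman_to_int(book), int(line))


print(parse_reference("Hom. Il. XII 1"))
# CanonicalRef(author='Homer', work='Iliad', book=12, line=1)
```

The point of the structured result is that "book 12, line 1 of the Iliad" becomes something a program can compare, sort and index, which the raw 14-character string is not.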
Semantic Information Retrieval




                                 Semantic vs String Matching based IR
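A toy contrast between the two approaches, as a sketch only: string-matching IR returns a document only if it contains the exact query string, while identifier-based ("semantic") IR maps every known surface form of a citation to one normalised identifier and matches on that. The `SURFACE_FORMS` table, the invented identifier scheme `homer.iliad:12.1` and the sample documents are all hypothetical.

```python
# Hypothetical mapping from citation surface forms to one normalised identifier.
# The identifier scheme "homer.iliad:12.1" is invented for this sketch.
SURFACE_FORMS = {
    "Hom. Il. XII 1": "homer.iliad:12.1",
    "Il. 12.1": "homer.iliad:12.1",
    "Iliad 12.1": "homer.iliad:12.1",
}

# Toy corpus standing in for scholarly articles.
DOCUMENTS = {
    "doc_a": "As discussed at Hom. Il. XII 1, ...",
    "doc_b": "Compare Il. 12.1 ...",
    "doc_c": "An unrelated paragraph.",
}


def string_search(query: str, docs: dict[str, str]) -> list[str]:
    """String-matching IR: a document matches only if it contains the exact query."""
    return sorted(doc_id for doc_id, text in docs.items() if query in text)


def semantic_search(query: str, docs: dict[str, str]) -> list[str]:
    """Identifier-based IR: a document matches if it contains any surface form
    that maps to the same passage as the query."""
    target = SURFACE_FORMS.get(query)
    if target is None:
        return []
    variants = [form for form, ident in SURFACE_FORMS.items() if ident == target]
    return sorted(
        doc_id for doc_id, text in docs.items()
        if any(form in text for form in variants)
    )


print(string_search("Hom. Il. XII 1", DOCUMENTS))    # ['doc_a']
print(semantic_search("Hom. Il. XII 1", DOCUMENTS))  # ['doc_a', 'doc_b']
```

Raw substring tests are only a stand-in here: "Il. 12.1" would also match inside "Il. 12.10", so in practice matching would run on references already extracted and normalised, not on raw text.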
Named Entities as Entry Point to Information




       Entities to be extracted:
            1   Place Names (ancient and modern);
            2   Relevant Person Names (mythological names, ancient authors,
                modern scholars)
            3   References to primary and secondary sources (canonical
                texts and modern publications about them)
Work Phases




Corpus building




       Getting materials
       Crawling online archives

       Extracting the text from collected documents
               Tools for text extraction from PDF -> open issues with
               Ancient Greek encoding
               re-OCR documents, even born-digital ones
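Part of the Greek-encoding problem can be spotted mechanically. The sketch below is a hypothetical heuristic, not the project's method: it assumes that badly extracted Greek may show up as U+FFFD replacement characters or Private Use Area code points (Unicode category `Co`), and the function names and the 1% threshold are invented for illustration.

```python
import unicodedata


def suspicious_ratio(text: str) -> float:
    """Share of characters that are U+FFFD or Private Use Area code points."""
    if not text:
        return 1.0  # treat an empty extraction as entirely unusable
    bad = sum(
        1 for ch in text
        if ch == "\ufffd" or unicodedata.category(ch) == "Co"
    )
    return bad / len(text)


def needs_reocr(text: str, threshold: float = 0.01) -> bool:
    # Hypothetical cut-off: flag text if more than 1% of characters look damaged.
    return suspicious_ratio(text) > threshold


print(needs_reocr("μῆνιν ἄειδε θεὰ"))        # False: well-formed Unicode Greek
print(needs_reocr("\ue000\ue001\ue002 lo"))  # True: half the characters are private-use
```

A check like this cannot catch fonts that map Greek glyphs onto ordinary Latin code points, where the extracted text looks like valid but meaningless Latin letters; that blind spot is one reason to re-OCR every document rather than trust the embedded text.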




Corpus Building II


       Corpora
               open access, multilingual
               Princeton/Stanford Working Papers in Classics (PSWPC)
               Lexis online
               470 articles in 2 corpora

       OCR
               FineReader
               OCRopus (layout analysis)
               text extracted directly from PDFs (e.g. with pdftotext)
               Alignment of multiple OCR outputs
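Aligning OCR outputs means lining up two transcriptions of the same page so that the regions where they disagree become visible. A minimal sketch, assuming two engines' outputs are already available as strings; it uses the standard library's `difflib.SequenceMatcher` as a stand-in alignment method (not necessarily the one used in the project), and the two sample outputs are invented to show a typical `I`/`l`/`1` confusion.

```python
from difflib import SequenceMatcher


def ocr_disagreements(a: str, b: str) -> list[tuple[str, str, str]]:
    """Align two OCR outputs character by character and return
    (operation, segment_in_a, segment_in_b) for every region where they differ."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


engine_1 = "Hom. Il. XII 1"  # hypothetical output of one OCR engine
engine_2 = "Hom. II. XII l"  # hypothetical output of another
print(ocr_disagreements(engine_1, engine_2))
# [('replace', 'l', 'I'), ('replace', '1', 'l')]
```

With three or more outputs, the same alignment step can feed a per-position vote; with only two, the disagreements mark exactly the spots that need a third opinion or manual checking.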

Building the Knowledge Base (KB)

       Goal: integrate different data sources into a single KB
       Why?
               Information about the same entities spread over several
               data sources
               Data sources might use different output formats (raw text,
               DBs, HTML, XML etc.)
               partial overlaps but no interoperability

       How?
               Use of high-level ontologies to map records related to the
               same entity
               Result: KB containing semantic data
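Integration can be illustrated in miniature. This sketch deliberately simplifies the approach described above: a hypothetical alias table stands in for the ontology-based mapping, and two invented sources with different record shapes are merged into subject-predicate-object triples. All names (`ALIASES`, `integrate`, the `author:homer` key and the predicates) are illustrative assumptions.

```python
# Hypothetical alias table standing in for an ontology-based mapping:
# different names used by different sources resolve to one entity key.
ALIASES = {"Homerus": "author:homer", "Homer": "author:homer", "Ὅμηρος": "author:homer"}

# Two invented sources describing the same entity in different shapes.
SOURCE_A = [{"abbr": "Hom.", "label": "Homerus"}]                # e.g. an abbreviation list
SOURCE_B = [{"name": "Homer", "works": ["Iliad", "Odyssey"]}]    # e.g. an author database


def integrate(source_a: list[dict], source_b: list[dict]) -> set[tuple[str, str, str]]:
    """Merge records about the same entity into one set of (subject, predicate, object) triples."""
    triples: set[tuple[str, str, str]] = set()
    for record in source_a:
        subject = ALIASES[record["label"]]
        triples.add((subject, "hasAbbreviation", record["abbr"]))
        triples.add((subject, "hasName", record["label"]))
    for record in source_b:
        subject = ALIASES[record["name"]]
        triples.add((subject, "hasName", record["name"]))
        for work in record["works"]:
            triples.add((subject, "authorOf", work))
    return triples


for triple in sorted(integrate(SOURCE_A, SOURCE_B)):
    print(triple)
```

Because both sources resolve to the same subject key, facts that were scattered (an abbreviation in one place, a list of works in another) end up attached to a single entity, which is what later extraction steps can query.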

Corpus Processing



       Tasks
            1   sentence identification
            2   entity extraction (named entity recognition +
                disambiguation)
                       KB used to build up an entity context
            3   canonical reference extraction
                       KB provides training data
            4   modern bibliographic reference extraction
                       KB provides lists of journals, place names and authors
                       to improve the tool's performance
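Sentence identification is harder than it looks in Classics texts, because citations are full of abbreviation periods ("Hom.", "Il.", "fr."). A minimal sketch, assuming whitespace tokenisation and a hypothetical hard-coded abbreviation list (in the pipeline above such a list could plausibly come from the KB): a token ending in `.`, `?` or `!` closes a sentence unless it is a known abbreviation.

```python
# Hypothetical abbreviation list for illustration only.
ABBREVIATIONS = {
    "Hom.", "Il.", "Od.", "Aesch.", "Sept.", "Ar.", "Arch.", "Hes.",
    "fr.", "M.-W.", "Pf.", "ep.", "cf.", "e.g.", "i.e.",
}


def split_sentences(text: str) -> list[str]:
    """Split on sentence-final punctuation, except after known abbreviations."""
    sentences: list[str] = []
    current: list[str] = []
    for token in text.split():
        current.append(token)
        if token.endswith((".", "?", "!")) and token not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep any trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("See Hom. Il. XII 1. Compare Hes. fr. 321 M.-W. on this point."))
# ['See Hom. Il. XII 1.', 'Compare Hes. fr. 321 M.-W. on this point.']
```

The rule fails when a sentence genuinely ends on an abbreviation (e.g. a citation ending in "M.-W."), which is one reason sentence splitting and reference extraction benefit from being informed by each other.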



Canonical References




Canonical References Extraction

            1   citations used specifically for primary sources (i.e. works of
                ancient authors)
            2   essential entry point to information: they refer to the
                object of research, i.e. the ancient texts
            3   logical rather than physical citation scheme (e.g.
                chapter/paragraph vs. page)
            4   variation across time, style and language -> regular
                expressions alone are insufficient

       Example
       Hom. Il. XII 1
       Aesch. ’Sept.’ 565-67, 628-30; Ar. ’Arch.’ 803
       Hes. fr. 321 M.-W.
       Callimaco, ’ep.’ 28 Pf., 5-6
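The claim that regular expressions are insufficient can be checked against the examples above. A minimal sketch: `NAIVE_PATTERN` is a hypothetical regex written to fit the first example (capitalised author abbreviation, capitalised work abbreviation, Roman book, Arabic line), run over all four.

```python
import re

# Hypothetical pattern fitted to "Hom. Il. XII 1"-style citations.
NAIVE_PATTERN = re.compile(r"\b[A-Z][a-z]+\.\s+[A-Z][a-z]+\.\s+[IVXLCDM]+\s+\d+")

EXAMPLES = [
    "Hom. Il. XII 1",
    "Aesch. ’Sept.’ 565-67, 628-30; Ar. ’Arch.’ 803",
    "Hes. fr. 321 M.-W.",
    "Callimaco, ’ep.’ 28 Pf., 5-6",
]

for example in EXAMPLES:
    print(bool(NAIVE_PATTERN.search(example)), example)
# True Hom. Il. XII 1
# False Aesch. ’Sept.’ 565-67, 628-30; Ar. ’Arch.’ 803
# False Hes. fr. 321 M.-W.
# False Callimaco, ’ep.’ 28 Pf., 5-6
```

Only the citation the pattern was written for is found. Covering the others means handling quoted titles, ranges, multiple loci, fragment numbers with editor sigla ("M.-W.", "Pf.") and author names in modern languages ("Callimaco"), and each added alternative also raises the risk of false positives in ordinary prose.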

So What?




       New Possible Research Questions:
               How has the practice of citing primary sources in Classics
               changed over time?
               What are the characteristics of citation and co-citation
               networks?
               Are the traditional IR tools in Classics actually exhaustive?
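Once canonical references have been extracted per article, a co-citation network follows directly: two passages are linked whenever the same article cites both, and the edge weight counts how many articles do so. A minimal sketch, assuming extraction output is available as a set of normalised reference strings per article; the three articles and their references are invented sample data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical extraction output: canonical references found in each article.
ARTICLES = {
    "article_1": {"Hom. Il. 12.1", "Hes. fr. 321", "Ar. Arch. 803"},
    "article_2": {"Hom. Il. 12.1", "Hes. fr. 321"},
    "article_3": {"Hom. Il. 12.1", "Ar. Arch. 803"},
}


def co_citation_counts(articles: dict[str, set[str]]) -> Counter:
    """Count, for every pair of references, how many articles cite both."""
    counts: Counter = Counter()
    for refs in articles.values():
        # Sorting gives each unordered pair one canonical orientation.
        for pair in combinations(sorted(refs), 2):
            counts[pair] += 1
    return counts


for (ref_a, ref_b), weight in co_citation_counts(ARTICLES).most_common():
    print(f"{ref_a} <-> {ref_b}: {weight}")
```

The resulting weighted edge list can be loaded into any graph library to study the network's structure, for instance which passages are most often discussed together.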




Why a Digital Humanities project?



               Better understanding of
                       the specificities of the discipline
                       users’ needs
               Writing code to develop a project means
                       formalizing the way a given result is obtained
                       creating a repeatable and thus falsifiable process
                       introducing into Classics a form of reasoning based
                       on the analysis of quantitative data
               Being able to
                       apply the products of DH research to traditional scholarship




Thanks for your attention!
       matteo.romanello@kcl.ac.uk
       http://kcl.academia.edu/MatteoRomanello



