Data Analysis for Ancient Corpora
Cody Kingham and Dirk Roorda
FAMES, Cambridge, 2019-01-31
[Figure: Parts of Speech after Atnach in ETCBC Phrase. Bar chart over conj, nmpr, subs, adjv, prep, art; axis ticks 0 to 250.]
background
description
mini-study
new horizons
• Put researchers in control of their data.
• Empower researchers to fully harness the data available to them.
• Encourage a new paradigm in the humanities.
[Diagram labels: data, 💰, what’s important, limits, researchers, they decide]
Text-Fabric and Hebrew Data
• Free, accessible corpus annotation and analysis tool.
• Published the Amsterdam Hebrew data on GitHub under a free, open-source license.
• Encouraged researchers to step out of their technological comfort zones.
A Different Vision
• Researchers are in charge of their data and set the agenda for its use.
• Researchers are empowered with the tools needed for powerful data analysis.
• Data is made open-source and freely available.
Text-Fabric
• Graph model: words, phrases, etc. are “nodes”; the relationships between them are “edges.”
• We can model complex data structures better than other methods (e.g. XML).
• All stored in easy-to-understand, plain-text files. No messy XML, SQL, etc.
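The graph model in the bullets above can be illustrated with a minimal sketch in plain Python. It does not use the Text-Fabric API: the names `node_type`, `features`, `edges`, and the `contains` edge label are assumptions made for this illustration only.

```python
# Nodes are plain integers; everything else is a mapping keyed by node.
node_type = {1: "word", 2: "word", 3: "word", 4: "phrase"}

# Node features: one mapping per feature, from node to value.
features = {
    "letters": {1: "Everything", 2: "about", 3: "us"},
    "punc": {3: ","},
}

# Edges: a labelled relationship from one node to a set of other nodes.
# The label "contains" is hypothetical.
edges = {
    "contains": {4: {1, 2, 3}},
}


def text_of(node):
    """Return the text of a word node, or of all words a larger node contains."""
    if node_type[node] == "word":
        return features["letters"][node] + features["punc"].get(node, "")
    words = sorted(edges["contains"].get(node, ()))
    return " ".join(text_of(word) for word in words)


print(text_of(4))  # Everything about us,
```

Because every annotation is just a mapping from node numbers to values, adding a new feature does not require touching the existing ones.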
&P005381 = MSVO 3, 70
#atf: lang qpc
@tablet
@obverse
@column 1
1.a. 2(N14) , SZE~a SAL TUR3~a NUN~a
1.b. 3(N19) , |GISZ.TE|
2. 1(N14) , NAR NUN~a SIG7
3. 2(N04)# , PIRIG~b1 SIG7 URI3~a NUN~a
@column 2
1. 3(N04) , |GISZ.TE| GAR |SZU2.((HI+1(N57))+(HI+1(N57)))| GI4~a
2. , GU7 AZ SI4~f
@reverse
@column 1
1. 3(N14) , SZE~a
2. 3(N19) 5(N04) ,
3. , GU7
@column 2
1. , AZ SI4~f
CTBA|CTBA#CTBA#CTB###0#0#0#3#1#0#2#0#0#2#0#0#2#0#0#0#0#0 D;L;DOTH|;L;DOT#;L;DOTA#;LD#D#H#0#0#0#3#1#0#3#0#0#2#0#0#2#1#1#3#0#0
D;WOE|;WOE#;WOE#;WOE#D##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 MW;KA|MW;KA#MW;KA#MWK###0#1#0#3#1#0#2#0#0#0#0#2#0#0#0#0#0#0 BRH|
BR#BRA#BR##H#0#0#0#3#1#0#2#0#0#2#0#0#2#1#1#3#0#0 DDO;D|DO;D#DO;D#DO;D#D##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 BRH|
BR#BRA#BR##H#0#0#0#3#1#0#2#0#0#2#0#0#2#1#1#3#0#0 DABRHM|ABRHM#ABRHM#ABRHM#D##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0
ABRHM|ABRHM#ABRHM#ABRHM###0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 AOLD|AOLD#;LD#;LD###0#5#1#0#1#3#2#0#0#0#0#0#0#0#0#0#0#0 LA;SKX|
A;SKX#A;SKX#A;SKX#L##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 A;SKX|A;SKX#A;SKX#A;SKX###0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 AOLD|
Syriac NT (Sedra database)
DEUT33,02 >C- >;71C 1.000 >;71C- >C-
DEUT33,02 DT D.@73T 1.000 D.@73T DT
DEUT33,09 BNW B.@N@73JW 1.000 B.@N@73W BNW
EST 01,16 MWMKN M:MW.K@81N 1.000 M:WM.K@81N MWMKN
EST 03,04 B- K.:- 1.000 B.:- B-
EST 03,04 >MRM >@M:R@70M 1.000 >@M:R@70M >MRM
Hebrew Ketiv-Qere (ETCBC)
Cuneiform Uruk (CDLI) (the ATF transliteration of MSVO 3, 70 at the top of this slide)
(1:1:1:1) bi P PREFIX|bi+
(1:1:1:2) somi N STEM|POS:N|LEM:{som|ROOT:smw|M|GEN
(1:1:2:1) {ll~ahi PN STEM|POS:PN|LEM:{ll~ah|ROOT:Alh|GEN
(1:1:3:1) {l DET PREFIX|Al+
(1:1:3:2) r~aHoma`ni ADJ STEM|POS:ADJ|LEM:r~aHoma`n|ROOT:rHm|MS|GEN
(1:1:4:1) {l DET PREFIX|Al+
(1:1:4:2) r~aHiymi ADJ STEM|POS:ADJ|LEM:r~aHiym|ROOT:rHm|MS|GEN
(1:2:1:1) {lo DET PREFIX|Al+
(1:2:1:2) Hamodu N STEM|POS:N|LEM:Hamod|ROOT:Hmd|M|NOM
Arabic Quran (Tanzil)
Source data of a corpus
TEI, Markdown, ASCII, Database
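To give an idea of what reading such source data involves, here is a small sketch that parses one line of the Quran sample above. It assumes that the columns are separated by whitespace, that the four numbers in parentheses are chapter, verse, word, and segment, and that the last column is a `|`-separated list of either bare flags or `KEY:value` pairs. The function name and the shape of the result are made up for the illustration.

```python
def parse_morphology_line(line):
    """Split one line of the Quran sample into location, form, tag and features."""
    location, form, tag, features = line.strip().split(None, 3)
    # Assumption: the location reads (chapter:verse:word:segment).
    position = tuple(int(n) for n in location.strip("()").split(":"))
    keyed, flags = {}, []
    for item in features.split("|"):
        if ":" in item:
            key, value = item.split(":", 1)
            keyed[key] = value
        else:
            flags.append(item)
    return {
        "position": position,
        "form": form,
        "tag": tag,
        "keyed": keyed,
        "flags": flags,
    }


sample = "(1:1:1:2) somi N STEM|POS:N|LEM:{som|ROOT:smw|M|GEN"
record = parse_morphology_line(sample)
print(record["position"], record["keyed"]["ROOT"], record["flags"])
```

Each of the other formats on this slide would need its own reader of this kind before the data can be fed into a common model.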
Data structure of TF - the IKEA spirit
node
order! order!
stacks of components
uniquely identified
words
phrases
chapters
verses
Conversion to TF
TF does more than half of the work
# Consider Phlebas
$ author=Iain M. Banks
## 1
Everything about us,
everything around us,
everything we know [and can know of] is composed ultimately of patterns of nothing;
that’s the bottom line, the final truth.
So where we find we have any control over those patterns,
why not make the most elegant ones, the most enjoyable and good ones,
in our own terms?
## 2
Besides,
it left the humans in the Culture free to take care of the things that really mattered in life,
such as [sports, games, romance,] studying dead languages,
barbarian societies and impossible problems,
and climbing high mountains without the aid of a safety harness.
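The part of the conversion that remains for the researcher is mostly a matter of walking through the source and saying what the words are. A minimal sketch of that step in plain Python follows; it is not the Text-Fabric conversion API. It numbers the words of the first four source lines and separates trailing punctuation from the letters. The function name `tokenize` is hypothetical, and the square brackets are simply dropped here.

```python
import re

SOURCE = """\
Everything about us,
everything around us,
everything we know [and can know of] is composed ultimately of patterns of nothing;
that’s the bottom line, the final truth.
"""


def tokenize(text):
    """Number the words from 1 and split off trailing punctuation."""
    letters, punc = {}, {}
    slot = 0
    # The brackets are ignored in this sketch.
    for token in text.replace("[", " ").replace("]", " ").split():
        word, trailing = re.match(r"^(.*?)([,;.?!]*)$", token).groups()
        if not word:
            continue
        slot += 1
        letters[slot] = word
        if trailing:
            punc[slot] = trailing
    return letters, punc


letters, punc = tokenize(SOURCE)
print(len(letters))  # 27
print(punc)          # {3: ',', 6: ',', 20: ';', 24: ',', 27: '.'}
```

The word numbers that come out of this are the node numbers under which the word-level features are stored.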
@node
@compiler=Dirk Roorda
@description=the letters of a word
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
Everything
about
us
everything
around
us
everything
we
know
and
can
know
of
is
composed
ultimately
of
patterns
of
nothing
that’s
the
bottom
line
the
final
truth
So
letters
@node
@compiler=Dirk Roorda
@description=the punctuation after a word
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
3 ,
6 ,
20 ;
24 ,
27 .
38 ,
45 ,
51 ,
55 ?
,
75 ,
78 ,
,
,
83 ,
88 ,
99 .
punc
banks/tf/
author.tf
gap.tf
letters.tf
number.tf
oslots.tf
otext.tf
otype.tf
punc.tf
terminator.tf
title.tf
TF dataset
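In the feature listings on this slide some data lines carry a node number and some do not. A line without a number continues from the previous node, which is why the letters file needs no numbers at all. The sketch below reads such data lines in plain Python. It assumes that node and value are separated by a tab (the slide text only shows whitespace), and the names `expand` and `read_node_feature` are hypothetical.

```python
def expand(spec):
    """Expand a node specification such as '7-9,14-20' into a list of nodes."""
    nodes = []
    for part in spec.split(","):
        first, _, last = part.partition("-")
        nodes.extend(range(int(first), int(last or first) + 1))
    return nodes


def read_node_feature(lines):
    """Map nodes to values; a line without a node continues from the previous node."""
    values = {}
    previous = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 1:
            nodes, value = [previous + 1], fields[0]
        else:
            nodes, value = expand(fields[0]), fields[1]
        for node in nodes:
            values[node] = value
        previous = nodes[-1]
    return values


punc_lines = ["55\t?", ",", "75\t,", "78\t,", ",", ",", "83\t,"]
print(read_node_feature(punc_lines))
# {55: '?', 56: ',', 75: ',', 78: ',', 79: ',', 80: ',', 83: ','}
```

The implicit numbering keeps the files small and still readable by eye.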
otype
@node
@compiler=Dirk Roorda
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
1-99 word
100 book
101-102 chapter
103-114 line
115-117 sentence
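The otype data above assigns a type to every node by range: the words come first, then the larger units. A lookup over those ranges can be sketched as follows; the table is copied from the slide, while the names `OTYPE` and `node_type` are assumptions of this sketch.

```python
from bisect import bisect_right

# (first node, last node, type), copied from the otype data above.
OTYPE = [
    (1, 99, "word"),
    (100, 100, "book"),
    (101, 102, "chapter"),
    (103, 114, "line"),
    (115, 117, "sentence"),
]
STARTS = [first for first, _, _ in OTYPE]


def node_type(node):
    """Return the type of a node by locating the range it falls in."""
    index = bisect_right(STARTS, node) - 1
    if index < 0:
        raise ValueError(f"no such node: {node}")
    first, last, kind = OTYPE[index]
    if not first <= node <= last:
        raise ValueError(f"no such node: {node}")
    return kind


counts = {kind: last - first + 1 for first, last, kind in OTYPE}
print(node_type(105))  # line
print(counts)  # {'word': 99, 'book': 1, 'chapter': 2, 'line': 12, 'sentence': 3}
```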
oslots
@edge
@compiler=Dirk Roorda
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
100 1-99
1-55
56-99
1-3
4-6
7-9,14-20
21-27
28-38
39-51
52-55
56
57-75
76-77,81-83
84-88
89-99
1-27
28-55
56-99
1-99 word
100 book
101-102 chapter
103-114 line
115-117 sentence
## 1
Everything about us,
everything around us,
everything we know [and can know of] is composed ultimately of patterns of nothing;
that’s the bottom line, the final truth.
So where we find we have any control over those patterns,
why not make the most elegant ones, the most enjoyable and good ones,
in our own terms?
## 2
Besides,
it left the humans in the Culture free to take care of the things that really mattered in life,
such as [sports, games, romance,] studying dead languages,
barbarian societies and impossible problems,
and climbing high mountains without the aid of a safety harness.
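The oslots data on this slide lists, for each of the nodes 100 to 117, the words that node occupies. A short sketch in plain Python shows how that answers containment questions. The specifications are copied from the slide; the names `SPECS`, `oslots`, and `embedders` are hypothetical.

```python
# Word specifications for nodes 100..117, in the order shown in the oslots data.
SPECS = [
    "1-99",                                   # book
    "1-55", "56-99",                          # chapters
    "1-3", "4-6", "7-9,14-20", "21-27", "28-38", "39-51", "52-55",
    "56", "57-75", "76-77,81-83", "84-88", "89-99",   # lines
    "1-27", "28-55", "56-99",                 # sentences
]


def expand(spec):
    """Expand a specification such as '7-9,14-20' into a list of word numbers."""
    words = []
    for part in spec.split(","):
        first, _, last = part.partition("-")
        words.extend(range(int(first), int(last or first) + 1))
    return words


oslots = {node: expand(spec) for node, spec in enumerate(SPECS, start=100)}


def embedders(word):
    """Return all nodes whose word set includes the given word."""
    return [node for node, words in oslots.items() if word in words]


print(oslots[105])    # [7, 8, 9, 14, 15, 16, 17, 18, 19, 20]
print(embedders(8))   # [100, 101, 105, 115]
print(embedders(10))  # [100, 101, 115]
```

Word 10 sits inside the bracketed stretch of the source text, so it belongs to the book, the chapter, and the sentence, but to no line: node 105 has a gap there.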
otext
@config
@compiler=Dirk Roorda
@fmt:text-orig-full={letters}{punc}
@name=Culture quotes from Iain Banks
@sectionFeatures=title,number
@sectionTypes=book,chapter
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
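The `@fmt:text-orig-full` line in the otext data names a text format as a template over features. Roughly, rendering a stretch of text means filling in that template for each word. The sketch below does this with Python's `str.format`; joining the words with a space is an assumption of the sketch, and `render` is a hypothetical name, not a Text-Fabric function.

```python
TEMPLATE = "{letters}{punc}"  # copied from the otext data above

letters = {1: "Everything", 2: "about", 3: "us", 4: "everything"}
punc = {3: ","}


def render(words, template=TEMPLATE, separator=" "):
    """Fill in the template for each word; missing feature values become empty."""
    pieces = [
        template.format(letters=letters.get(w, ""), punc=punc.get(w, ""))
        for w in words
    ]
    return separator.join(pieces)


print(render([1, 2, 3]))  # Everything about us,
```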
Computing - Python - Jupyter notebooks
BHSA: https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/start.ipynb
Quran: https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/quran/start.ipynb
Syriac NT: https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/syrnt/start.ipynb
Old Babylon'
SHEBANQ: https://shebanq.ancient-data.org/hebrew/query?version=4b&id=1050
Computing - more power!
BHSA: https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/searchFromMQL.ipynb
Quran: https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/quran/search.ipynb
Syriac NT: https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/syrnt/search.ipynb
Old Babylon'
Uruk: https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/uruk/search.ipynb
Power to you! (without the programming)
Uruk
Mini-Study:
Atnachs and Phrase Divisions
• How often do atnach accents disagree with the ETCBC phrase divisions?
• Why?
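The counting behind the first question can be sketched on toy data. The word list below is invented for the illustration and is not taken from the ETCBC data; the field names `phrase`, `pos`, and `atnach` are likewise hypothetical. An atnach is counted as disagreeing with the phrase division when the next word still belongs to the same phrase, and the part of speech of that next word is tallied.

```python
from collections import Counter

# Invented toy data: one dict per word, in text order.
words = [
    {"phrase": 1, "pos": "subs", "atnach": False},
    {"phrase": 1, "pos": "subs", "atnach": True},   # last word of phrase 1: agrees
    {"phrase": 2, "pos": "verb", "atnach": False},
    {"phrase": 3, "pos": "subs", "atnach": True},   # phrase 3 continues: disagrees
    {"phrase": 3, "pos": "conj", "atnach": False},
    {"phrase": 3, "pos": "nmpr", "atnach": False},
]


def atnach_disagreements(words):
    """Count atnachs, and tally the part of speech following an atnach inside a phrase."""
    total = 0
    following_pos = Counter()
    for current, following in zip(words, words[1:] + [None]):
        if not current["atnach"]:
            continue
        total += 1
        if following is not None and following["phrase"] == current["phrase"]:
            following_pos[following["pos"]] += 1
    return total, following_pos


total, following_pos = atnach_disagreements(words)
print(total, dict(following_pos))  # 2 {'conj': 1}
```

A tally of this kind over the real data is what a chart of parts of speech after an atnach within a phrase would summarize.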
Sharing and re-using data
Text-Fabric was developed by a DANS employee.
As a consequence:
Data export is built in ✅
Provenance tracking is built in ✅
Redistribution of newly created data is built in ✅
sharing #1: GitHub & NBviewer
Work done in a Jupyter notebook inside a GitHub repository is easy to share.
https://github.com/Nino-cunei/primers/blob/master/oldbabylonian/OB-primer1.ipynb
sharing #2: Export from TF-browser
sharing #3: Zenodo
sharing #4: Create new features
https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/share.ipynb
• etcbc/valence/tf: the results of the verbal valence work of Janet Dyk in the SYNVAR project;
• etcbc/lingo/heads/tf: head words for phrases, work done by Cody Kingham;
• ch-jensen/Semantic-mapping-of-participants/actor/tf: participant analysis in progress by Christian Høygaard-Jensen;
• cmerwich/bh-reference-system/tf: participant analysis in progress by Christiaan Erwich;
• or whatever you have in the making!
• HINT: semantic/fuzzy/plurality for collective nouns (Chip Hardy?)
https://github.com/ETCBC/lingo/tree/master/easter/tf/c
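A new feature is, in the end, one more plain-text file next to the existing ones. Text-Fabric has its own facilities for saving features; the sketch below only shows the file layout, using plain Python. It assumes a blank line between the metadata and the data, a tab between node and value, and it omits the node number whenever a node directly follows the previous one. The function name and the metadata values in the example are made up.

```python
def feature_file_text(metadata, values):
    """Serialize a node feature: '@' metadata lines, a blank line, then the data."""
    lines = ["@node"]
    lines += [f"@{key}={value}" for key, value in metadata.items()]
    lines.append("")  # assumed separator between metadata and data
    previous = 0
    for node in sorted(values):
        value = values[node]
        # Leave out the node number when it is simply the next node.
        lines.append(value if node == previous + 1 else f"{node}\t{value}")
        previous = node
    return "\n".join(lines) + "\n"


text = feature_file_text(
    {"description": "a hypothetical new feature", "valueType": "str"},
    {3: "x", 55: "y", 56: "z"},
)
print(text)
```

Putting such a file in a GitHub repository, with the metadata filled in, is enough for others to load it alongside the main dataset.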
Open Science Rocks
thank you
Cody Kingham codykingham@icloud.com
Dirk Roorda dirk.roorda@dans.knaw.nl

Ancient corpora analysis

  • 1. Data Analysis for Ancient Corpora Cody Kingham and Dirk Roorda FAMES, Cambridge, 2019-01-31 0 50 100 150 200 250 conj nmpr subs adjv prep art Parts of Speech after Atnach in ETCBC Phrase
  • 3. • Put researchers in control of their data. • Empower researchers to fully harness the data available to them. • Encourage a new paradigm in the humanities
  • 4.
  • 5.
  • 6.
  • 8.
  • 9.
  • 10.
  • 12.
  • 13.
  • 14. Text-Fabric and Hebrew Data • Free, accessible corpus annotation and analysis tool. • Published the Amsterdam Hebrew data on Github with free, open-source license. • Encouraged researchers to step out of their technological comfort zones.
  • 15.
  • 16. A Different Vision • Researchers are in charge of their data and set the agenda for its use. • Researchers are empowered with the tools needed for powerful data analysis. • Data is made open-source, freely available
  • 17. Text-Fabric • Graph model: words, phrases, etc. are “nodes,” relationships between them are edges. • We can model complex data structures better than other methods (e.g. XML). • All stored in easy-to-understand, plain-text files. No messy XML, SQL, etc.
  • 18. &P005381 = MSVO 3, 70 #atf: lang qpc @tablet @obverse @column 1 1.a. 2(N14) , SZE~a SAL TUR3~a NUN~a 1.b. 3(N19) , |GISZ.TE| 2. 1(N14) , NAR NUN~a SIG7 3. 2(N04)# , PIRIG~b1 SIG7 URI3~a NUN~a @column 2 1. 3(N04) , |GISZ.TE| GAR |SZU2.((HI+1(N57))+(HI+1(N57)))| GI4~a 2. , GU7 AZ SI4~f @reverse @column 1 1. 3(N14) , SZE~a 2. 3(N19) 5(N04) , 3. , GU7 @column 2 1. , AZ SI4~f CTBA|CTBA#CTBA#CTB###0#0#0#3#1#0#2#0#0#2#0#0#2#0#0#0#0#0 D;L;DOTH|;L;DOT#;L;DOTA#;LD#D#H#0#0#0#3#1#0#3#0#0#2#0#0#2#1#1#3#0#0 D;WOE|;WOE#;WOE#;WOE#D##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 MW;KA|MW;KA#MW;KA#MWK###0#1#0#3#1#0#2#0#0#0#0#2#0#0#0#0#0#0 BRH| BR#BRA#BR##H#0#0#0#3#1#0#2#0#0#2#0#0#2#1#1#3#0#0 DDO;D|DO;D#DO;D#DO;D#D##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 BRH| BR#BRA#BR##H#0#0#0#3#1#0#2#0#0#2#0#0#2#1#1#3#0#0 DABRHM|ABRHM#ABRHM#ABRHM#D##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 ABRHM|ABRHM#ABRHM#ABRHM###0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 AOLD|AOLD#;LD#;LD###0#5#1#0#1#3#2#0#0#0#0#0#0#0#0#0#0#0 LA;SKX| A;SKX#A;SKX#A;SKX#L##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 A;SKX|A;SKX#A;SKX#A;SKX###0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 AOLD| Syriac NT (Sedra database) DEUT33,02 >C- >;71C 1.000 >;71C- >C- DEUT33,02 DT D.@73T 1.000 D.@73T DT DEUT33,09 BNW B.@N@73JW 1.000 B.@N@73W BNW EST 01,16 MWMKN M:MW.K@81N 1.000 M:WM.K@81N MWMKN EST 03,04 B- K.:- 1.000 B.:- B- EST 03,04 >MRM >@M:R@70M 1.000 >@M:R@70M >MRM Hebrew Ketiv-Qere (ETCBC) Cuneiform Uruk (CDLI) (1:1:1:1) bi P PREFIX|bi+ (1:1:1:2) somi N STEM|POS:N|LEM:{som|ROOT:smw|M|GEN (1:1:2:1) {ll~ahi PN STEM|POS:PN|LEM:{ll~ah|ROOT:Alh|GEN (1:1:3:1) {l DET PREFIX|Al+ (1:1:3:2) r~aHoma`ni ADJ STEM|POS:ADJ|LEM:r~aHoma`n|ROOT:rHm|MS|GEN (1:1:4:1) {l DET PREFIX|Al+ (1:1:4:2) r~aHiymi ADJ STEM|POS:ADJ|LEM:r~aHiym|ROOT:rHm|MS|GEN (1:2:1:1) {lo DET PREFIX|Al+ (1:2:1:2) Hamodu N STEM|POS:N|LEM:Hamod|ROOT:Hmd|M|NOM Arabic Quran (Tanzil) Source data of a corpus TEI, Markdown, ASCII, Database
  • 19. Data structure of TF - the IKEA spirit • Node order! • Stacks of components: words, phrases, chapters, verses • Components are uniquely identified
  • 20. Conversion to TF • TF does more than half of the work
  • 21. # Consider Phlebas $ author=Iain M. Banks ## 1 Everything about us, everything around us, everything we know [and can know of] is composed ultimately of patterns of nothing; that’s the bottom line, the final truth. So where we find we have any control over those patterns, why not make the most elegant ones, the most enjoyable and good ones, in our own terms? ## 2 Besides, it left the humans in the Culture free to take care of the things that really mattered in life, such as [sports, games, romance,] studying dead languages, barbarian societies and impossible problems, and climbing high mountains without the aid of a safety harness.
  • 22. TF dataset: banks/tf/ containing author.tf, gap.tf, letters.tf, number.tf, oslots.tf, otext.tf, otype.tf, punc.tf, terminator.tf, title.tf
      • letters: @node @compiler=Dirk Roorda @description=the letters of a word @name=Culture quotes from Iain Banks @source=Good Reads @url=https://www.goodreads.com/work/quotes/14366-consider-phlebas @valueType=str @writtenBy=Text-Fabric @dateWritten=2019-01-30T22:20:19Z Everything about us everything around us everything we know and can know of is composed ultimately of patterns of nothing that’s the bottom line the final truth So
      • punc: @node @compiler=Dirk Roorda @description=the punctuation after a word @name=Culture quotes from Iain Banks @source=Good Reads @url=https://www.goodreads.com/work/quotes/14366-consider-phlebas @valueType=str @writtenBy=Text-Fabric @dateWritten=2019-01-30T22:20:19Z 3 , 6 , 20 ; 24 , 27 . 38 , 45 , 51 , 55 ? , 75 , 78 , , , 83 , 88 , 99 .
  • 23. otype @node @compiler=Dirk Roorda @name=Culture quotes from Iain Banks @source=Good Reads @url=https://www.goodreads.com/work/quotes/14366-consider-phlebas @valueType=str @writtenBy=Text-Fabric @dateWritten=2019-01-30T22:20:19Z 1-99 word 100 book 101-102 chapter 103-114 line 115-117 sentence
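The `otype` data on this slide assigns a type to every node, either by single node number or by range. A minimal sketch of reading those lines in plain Python follows; it assumes the node specification and the value are separated by a tab (the slide transcript shows only spaces), and `parse_otype` is a hypothetical helper, not part of Text-Fabric.

```python
from collections import Counter

# The data lines of the otype feature as shown on the slide
# (node or node range, tab, node type). The tab is an assumption.
OTYPE_DATA = """1-99\tword
100\tbook
101-102\tchapter
103-114\tline
115-117\tsentence"""


def parse_otype(data):
    """Map every node number to its type."""
    otype = {}
    for line in data.splitlines():
        nodes, kind = line.split("\t")
        start, _, end = nodes.partition("-")
        # A single node such as "100" has no end; reuse the start.
        for node in range(int(start), int(end or start) + 1):
            otype[node] = kind
    return otype


otype = parse_otype(OTYPE_DATA)
counts = Counter(otype.values())
print(otype[105], counts["line"])
```

The words occupy the lowest node numbers (1-99); all other node types follow after them.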
  • 24. oslots @edge @compiler=Dirk Roorda @name=Culture quotes from Iain Banks @source=Good Reads @url=https://www.goodreads.com/work/quotes/14366-consider-phlebas @valueType=str @writtenBy=Text-Fabric @dateWritten=2019-01-30T22:20:19Z 100 1-99 1-55 56-99 1-3 4-6 7-9,14-20 21-27 28-38 39-51 52-55 56 57-75 76-77,81-83 84-88 89-99 1-27 28-55 56-99 1-99 word 100 book 101-102 chapter 103-114 line 115-117 sentence ## 1 Everything about us, everything around us, everything we know [and can know of] is composed ultimately of patterns of nothing; that’s the bottom line, the final truth. So where we find we have any control over those patterns, why not make the most elegant ones, the most enjoyable and good ones, in our own terms? ## 2 Besides, it left the humans in the Culture free to take care of the things that really mattered in life, such as [sports, games, romance,] studying dead languages, barbarian societies and impossible problems, and climbing high mountains without the aid of a safety harness.
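The `oslots` data links every non-word node to the words (slots) it covers. The sketch below reads the values shown on this slide in plain Python. It assumes that only the first data line names its node explicitly (`100`, tab, `1-99`) and that each following line belongs to the next node number; `parse_ranges` and `parse_oslots` are hypothetical helpers, not Text-Fabric functions.

```python
# Data lines of oslots as on the slide. Only the first line names its node;
# the others are assumed to continue with the next node number.
OSLOTS_LINES = [
    "100\t1-99",                                  # book 100
    "1-55", "56-99",                              # chapters 101-102
    "1-3", "4-6", "7-9,14-20", "21-27",           # lines 103-106
    "28-38", "39-51", "52-55", "56",              # lines 107-110
    "57-75", "76-77,81-83", "84-88", "89-99",     # lines 111-114
    "1-27", "28-55", "56-99",                     # sentences 115-117
]


def parse_ranges(spec):
    """Expand a specification such as '7-9,14-20' into a list of slots."""
    slots = []
    for part in spec.split(","):
        start, _, end = part.partition("-")
        slots.extend(range(int(start), int(end or start) + 1))
    return slots


def parse_oslots(lines):
    """Map each node to the list of slots it covers."""
    oslots = {}
    node = 0
    for line in lines:
        fields = line.split("\t")
        if len(fields) == 2:
            node = int(fields[0])   # explicit node number
        else:
            node += 1               # implicit: previous node + 1
        oslots[node] = parse_ranges(fields[-1])
    return oslots


oslots = parse_oslots(OSLOTS_LINES)
print(oslots[105])
```

Nodes 105 and 112 skip words 10-13 and 78-80, which are the bracketed stretches `[and can know of]` and `[sports, games, romance,]` in the source text.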
  • 25. otext @config @compiler=Dirk Roorda @fmt:text-orig-full={letters}{punc} @name=Culture quotes from Iain Banks @sectionFeatures=title,number @sectionTypes=book,chapter @source=Good Reads @url=https://www.goodreads.com/work/quotes/14366-consider-phlebas @writtenBy=Text-Fabric @dateWritten=2019-01-30T22:20:19Z
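The format line `@fmt:text-orig-full={letters}{punc}` states how the text of a word is assembled from its features. As a rough sketch of that idea in plain Python, the first six words of the `letters` and `punc` data shown earlier can be rendered as follows. Joining the words with a single space is an assumption made here for readability, and `render` is a hypothetical helper, not the library's own text API.

```python
# Template taken from the otext configuration on the slide.
TEMPLATE = "{letters}{punc}"

# First six words of the letters feature, and the punc values for them.
letters = {1: "Everything", 2: "about", 3: "us",
           4: "everything", 5: "around", 6: "us"}
punc = {3: ",", 6: ","}


def render(slots, template=TEMPLATE):
    """Fill the template for each slot; a missing feature value counts as ''."""
    pieces = (
        template.format(letters=letters.get(s, ""), punc=punc.get(s, ""))
        for s in slots
    )
    # Assumption: words are separated by one space.
    return " ".join(pieces)


text = render(range(1, 7))
print(text)
```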
  • 26. Computing - Python - Jupyter notebooks https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/start.ipynb BHSA
  • 31. Computing - more power! https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/searchFromMQL.ipynb BHSA
  • 37. Power to you! (without the programming) • Uruk
  • 38. Uruk
  • 39. Uruk
  • 40. Mini-Study: Atnachs and Phrase Divisions • How often do atnach accents disagree with the ETCBC phrase divisions? • Why?
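One way to approach the first question is to look for atnach accents that fall inside a phrase rather than at its end, and to tally the part of speech of the word that follows. The sketch below shows only the counting logic in plain Python. The word list is hypothetical toy data, not BHSA values, and it ignores complications such as discontinuous phrases.

```python
from collections import Counter

# Hypothetical toy data: (phrase id, part of speech, carries atnach?)
# per word, in text order. Not real BHSA values.
WORDS = [
    (1, "subs", False),
    (1, "subs", True),   # atnach inside phrase 1 ...
    (1, "conj", False),  # ... followed by a conjunction
    (2, "verb", False),
    (2, "nmpr", True),   # atnach on the last word of phrase 2
    (3, "prep", False),
    (3, "subs", True),   # atnach inside phrase 3 ...
    (3, "adjv", False),  # ... followed by an adjective
]


def pos_after_internal_atnach(words):
    """Count the part of speech of the word that follows an atnach
    when both words belong to the same phrase."""
    counts = Counter()
    for (phrase, _, atnach), (next_phrase, next_pos, _) in zip(words, words[1:]):
        if atnach and phrase == next_phrase:
            counts[next_pos] += 1
    return counts


counts = pos_after_internal_atnach(WORDS)
print(counts)
```

The atnach at the end of phrase 2 is not counted, because there the accent and the phrase division agree.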
  • 41. Sharing and re-using data • Text-Fabric has been developed by a DANS employee. As a consequence: • Data export is built in ✅ • Provenance tracking is built in ✅ • Redistribution of newly created data is built in ✅
  • 42. sharing #1: GitHub & NBviewer • Work done in a Jupyter notebook inside a GitHub repository is easy to share
  • 44. sharing #2: Export from TF-browser
  • 46. sharing #4: Create new features https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/share.ipynb • etcbc/valence/tf: the results of the verbal valence work of Janet Dyk in the SYNVAR project; • etcbc/lingo/heads/tf: head words for phrases, work done by Cody Kingham; • ch-jensen/Semantic-mapping-of-participants/actor/tf: participant analysis in progress by Christian Høygaard-Jensen; • cmerwich/bh-reference-system/tf: participant analysis in progress by Christiaan Erwich; • or whatever you have in the making! • HINT: semantic/fuzzy/plurality for collective nouns (Chip Hardy?)
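A new feature is, in the end, one more plain-text file in the layout shown on the earlier slides: `@` header lines, a blank line, then the data. The sketch below builds such text in plain Python. The `plurality` values, the node numbers and the helper `feature_file_text` are hypothetical; for real work the library's own export facilities, covered in the notebook linked on this slide, are the better route.

```python
def feature_file_text(metadata, data, kind="node"):
    """Serialise a feature as text: '@' header lines, a blank line,
    then one 'node<TAB>value' line per node."""
    header = [f"@{kind}"] + [f"@{key}={value}" for key, value in metadata.items()]
    body = [f"{node}\t{value}" for node, value in sorted(data.items())]
    return "\n".join(header + [""] + body) + "\n"


# Hypothetical feature: a plurality label for two (made-up) word nodes.
metadata = {"description": "hypothetical plurality label", "valueType": "str"}
data = {7: "collective", 3: "singular"}

text = feature_file_text(metadata, data)
print(text)
```

Keeping each feature in its own file is what lets researchers publish their additions separately from the core dataset.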
  • 49. Open Science Rocks • Thank you • Cody Kingham, codykingham@icloud.com • Dirk Roorda, dirk.roorda@dans.knaw.nl