SlideShare una empresa de Scribd logo
1 de 13
Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
Project Overview
JISC/SSHRC Digging into Data Challenge II
Jan 2012 - Dec 2013
Text mining, data extraction and information
visualisation to explore big historical datasets.
Focus on how commodities were traded across
the globe in the 19th century.
Help historians to discover novel patterns and
explore new research questions.
Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
Project Team
Ewan Klein, Bea Alex, Claire Grover, Richard
Tobin: text mining
Colin Coates, Andrew Watson and
Jim Clifford: historical analysis
James Reid, Nicola Osborne : data
management, social media
Aaron Quigley, Uta Hinrichs: information
visualisation
Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
Traditional Historical
Research
Gillow and the Use of Mahogany in the Eighteenth
Century, Adam Bowett, Regional Furniture, v.XII,
1998.
Global Fats Supply 1894-98
Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
Document Collections
Collection # of Documents # of Images
House of Commons
Parliamentary Papers
(ProQuest)
118,526 6,448,739
Early Canadiana Online 83,016 3,938,758
Directors’ Letters of
Correspondence (Kew)
14,340 n/a
Confidential Prints (Adam
Matthews)
1,315 140,010
Foreign and
Commonwealth Office
Collection
1,000 41,611
Asia and the West (Gale) 4,725 948,773 (OCRed: 450,841)
Over 10 million document pages,
Over 7 billion word tokens.
Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
System Architecture
Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
Mined Information
Example sentence:
Normalised and grounded entities:
commodity: cassia bark [concept: Cinnamomum cassia]
date: 1871 (year=1871)
location: Padang (lat=-0.94924;long=100.35427;country=ID)
location: America (lat=39.76;long=-98.50;country=n/a)
quantity + unit: 6,127 piculs
Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
Mined Information
Example sentence:
Extracted entity attributes and relations:
origin location: Padang
destination location: America
commodity–date relation: cassia bark – 1871
commodity–location relation: cassia bark – Padang
commodity–location relation: cassia bark – America
Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
Edinburgh Geoparser
Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
Text Mining Output
Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
Over 31 million commodity mentions in 7 billion words.
Over 15 million commodity-location relations.
The 100 most frequent commodities are repeated over
210,000 times on average (68% of mentions),
mentions repeated >=100 (1775) make up 99.8% of all
mentions.
All information stored in Trading Consequences
database (150GB)
Extract of Early Canadiana Online document 9_00952_3, p. vi.
OCR Errors
Extract of Early Canadiana Online document 9_00952_3, p. vi.
Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
Lessons Learned
Importance of two-way collaboration between
technology and humanities expert in digital HSS
projects.
Value of iterative development and rapid prototyping.
Geo-referencing text is very important for historical
analysis.
Most OCR errors are noise in big data but HSS
scholars need to be made more aware of OCR errors
affecting their search results for historical collections.
Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
Thank You
Contact: balex@inf.ed.ac.uk
Website: http://tradingconsequences.blogs.edina.ac.uk/
Launch of web-based user interface: March 2014.
Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014

Más contenido relacionado

Similar a Trading Consequences - Bea Alex

Digital History and Big Data: text mining historical documents on trade in th...
Digital History and Big Data: text mining historical documents on trade in th...Digital History and Big Data: text mining historical documents on trade in th...
Digital History and Big Data: text mining historical documents on trade in th...Beatrice Alex
 
Practicum presentation
Practicum presentationPracticum presentation
Practicum presentationumthisisalex
 
Culture hack scotland – handy data guide v1
Culture hack scotland – handy data guide v1Culture hack scotland – handy data guide v1
Culture hack scotland – handy data guide v1festivalslab
 
Digital Resources for the Eighteenth Century
Digital Resources for the Eighteenth CenturyDigital Resources for the Eighteenth Century
Digital Resources for the Eighteenth CenturyAlastair Dunning
 
Research skills in practice - Matthew Stephens
Research skills in practice - Matthew StephensResearch skills in practice - Matthew Stephens
Research skills in practice - Matthew Stephenslearningslnsw
 
An Introduction to the Digital Repository of Ireland
An Introduction to the Digital Repository of Ireland An Introduction to the Digital Repository of Ireland
An Introduction to the Digital Repository of Ireland dri_ireland
 
Research Skills in Practice - Matthew Stephens
Research Skills in Practice - Matthew StephensResearch Skills in Practice - Matthew Stephens
Research Skills in Practice - Matthew Stephenslearningslnsw
 
Tanya Szrajber, The British Museum Collection Database
Tanya Szrajber, The British Museum Collection DatabaseTanya Szrajber, The British Museum Collection Database
Tanya Szrajber, The British Museum Collection DatabaseAndrew Prescott
 
Quantifying the impacts of investment in humanities archives
Quantifying the impacts of investment in humanities archivesQuantifying the impacts of investment in humanities archives
Quantifying the impacts of investment in humanities archivesEric Meyer
 
LocScot's Digitisation Day School 18th March 2016
LocScot's Digitisation Day School 18th March 2016LocScot's Digitisation Day School 18th March 2016
LocScot's Digitisation Day School 18th March 2016Fiona Myles
 
Digital Archiving for Archaeological Units at Historic Environment Scotland
Digital Archiving for Archaeological Units at Historic Environment ScotlandDigital Archiving for Archaeological Units at Historic Environment Scotland
Digital Archiving for Archaeological Units at Historic Environment ScotlandHistoric Environment Scotland
 
SCA Scotland Forum 210508 Paul Ell
SCA Scotland Forum 210508 Paul EllSCA Scotland Forum 210508 Paul Ell
SCA Scotland Forum 210508 Paul Ellmichellep
 
Digital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British LibraryDigital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British LibraryNora McGregor
 
'Introduction to the concept of Open Access and Digital Preservation'
'Introduction to the concept of  Open Access and Digital Preservation''Introduction to the concept of  Open Access and Digital Preservation'
'Introduction to the concept of Open Access and Digital Preservation'dri_ireland
 
Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...
Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...
Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...Javier Pereda
 
Metadata costs per unit of effort (cpue)
Metadata  costs per unit of effort (cpue)Metadata  costs per unit of effort (cpue)
Metadata costs per unit of effort (cpue)Tom Moritz
 

Similar a Trading Consequences - Bea Alex (20)

Digital History and Big Data: text mining historical documents on trade in th...
Digital History and Big Data: text mining historical documents on trade in th...Digital History and Big Data: text mining historical documents on trade in th...
Digital History and Big Data: text mining historical documents on trade in th...
 
Practicum presentation
Practicum presentationPracticum presentation
Practicum presentation
 
Culture hack scotland – handy data guide v1
Culture hack scotland – handy data guide v1Culture hack scotland – handy data guide v1
Culture hack scotland – handy data guide v1
 
Digital Resources for the Eighteenth Century
Digital Resources for the Eighteenth CenturyDigital Resources for the Eighteenth Century
Digital Resources for the Eighteenth Century
 
Research skills in practice - Matthew Stephens
Research skills in practice - Matthew StephensResearch skills in practice - Matthew Stephens
Research skills in practice - Matthew Stephens
 
An Introduction to the Digital Repository of Ireland
An Introduction to the Digital Repository of Ireland An Introduction to the Digital Repository of Ireland
An Introduction to the Digital Repository of Ireland
 
Research Skills in Practice - Matthew Stephens
Research Skills in Practice - Matthew StephensResearch Skills in Practice - Matthew Stephens
Research Skills in Practice - Matthew Stephens
 
Tanya Szrajber, The British Museum Collection Database
Tanya Szrajber, The British Museum Collection DatabaseTanya Szrajber, The British Museum Collection Database
Tanya Szrajber, The British Museum Collection Database
 
Political Arithmetic, Territorial Geometry and Programmed Cities
Political Arithmetic, Territorial Geometry and Programmed CitiesPolitical Arithmetic, Territorial Geometry and Programmed Cities
Political Arithmetic, Territorial Geometry and Programmed Cities
 
101 This is Digital Scholarship 2016
101 This is Digital Scholarship 2016101 This is Digital Scholarship 2016
101 This is Digital Scholarship 2016
 
Quantifying the impacts of investment in humanities archives
Quantifying the impacts of investment in humanities archivesQuantifying the impacts of investment in humanities archives
Quantifying the impacts of investment in humanities archives
 
LocScot's Digitisation Day School 18th March 2016
LocScot's Digitisation Day School 18th March 2016LocScot's Digitisation Day School 18th March 2016
LocScot's Digitisation Day School 18th March 2016
 
Digital Archiving for Archaeological Units at Historic Environment Scotland
Digital Archiving for Archaeological Units at Historic Environment ScotlandDigital Archiving for Archaeological Units at Historic Environment Scotland
Digital Archiving for Archaeological Units at Historic Environment Scotland
 
SCA Scotland Forum 210508 Paul Ell
SCA Scotland Forum 210508 Paul EllSCA Scotland Forum 210508 Paul Ell
SCA Scotland Forum 210508 Paul Ell
 
Digital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British LibraryDigital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British Library
 
Digital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British LibraryDigital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British Library
 
'Introduction to the concept of Open Access and Digital Preservation'
'Introduction to the concept of  Open Access and Digital Preservation''Introduction to the concept of  Open Access and Digital Preservation'
'Introduction to the concept of Open Access and Digital Preservation'
 
Where is Cultural Heritage in INSPIRE?
Where is Cultural Heritage in INSPIRE?Where is Cultural Heritage in INSPIRE?
Where is Cultural Heritage in INSPIRE?
 
Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...
Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...
Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...
 
Metadata costs per unit of effort (cpue)
Metadata  costs per unit of effort (cpue)Metadata  costs per unit of effort (cpue)
Metadata costs per unit of effort (cpue)
 

Más de tarastar

Engaging V&A Visitors with Game Design
Engaging V&A Visitors with Game DesignEngaging V&A Visitors with Game Design
Engaging V&A Visitors with Game Designtarastar
 
Engaging the public in tagging and researching the UK's paintings: Two case s...
Engaging the public in tagging and researching the UK's paintings: Two case s...Engaging the public in tagging and researching the UK's paintings: Two case s...
Engaging the public in tagging and researching the UK's paintings: Two case s...tarastar
 
Transcribe NLS: Crowdsourcing at the National Library of Scotland
Transcribe NLS: Crowdsourcing at the National Library of ScotlandTranscribe NLS: Crowdsourcing at the National Library of Scotland
Transcribe NLS: Crowdsourcing at the National Library of Scotlandtarastar
 
Data and Ethics Roundtable
Data and Ethics RoundtableData and Ethics Roundtable
Data and Ethics Roundtabletarastar
 
Studying the Use of Glasgow University's Digital Collections
Studying the Use of Glasgow University's Digital CollectionsStudying the Use of Glasgow University's Digital Collections
Studying the Use of Glasgow University's Digital Collectionstarastar
 
Research Data Management: A Tale of Two Paradigms
Research Data Management: A Tale of Two ParadigmsResearch Data Management: A Tale of Two Paradigms
Research Data Management: A Tale of Two Paradigmstarastar
 
Glasgow Life - Glasgow Museums - John Ferry
Glasgow Life - Glasgow Museums - John FerryGlasgow Life - Glasgow Museums - John Ferry
Glasgow Life - Glasgow Museums - John Ferrytarastar
 
Visualising Metaphorical Connections with the Historical Thesaurus - Brian Ai...
Visualising Metaphorical Connections with the Historical Thesaurus - Brian Ai...Visualising Metaphorical Connections with the Historical Thesaurus - Brian Ai...
Visualising Metaphorical Connections with the Historical Thesaurus - Brian Ai...tarastar
 

Más de tarastar (8)

Engaging V&A Visitors with Game Design
Engaging V&A Visitors with Game DesignEngaging V&A Visitors with Game Design
Engaging V&A Visitors with Game Design
 
Engaging the public in tagging and researching the UK's paintings: Two case s...
Engaging the public in tagging and researching the UK's paintings: Two case s...Engaging the public in tagging and researching the UK's paintings: Two case s...
Engaging the public in tagging and researching the UK's paintings: Two case s...
 
Transcribe NLS: Crowdsourcing at the National Library of Scotland
Transcribe NLS: Crowdsourcing at the National Library of ScotlandTranscribe NLS: Crowdsourcing at the National Library of Scotland
Transcribe NLS: Crowdsourcing at the National Library of Scotland
 
Data and Ethics Roundtable
Data and Ethics RoundtableData and Ethics Roundtable
Data and Ethics Roundtable
 
Studying the Use of Glasgow University's Digital Collections
Studying the Use of Glasgow University's Digital CollectionsStudying the Use of Glasgow University's Digital Collections
Studying the Use of Glasgow University's Digital Collections
 
Research Data Management: A Tale of Two Paradigms
Research Data Management: A Tale of Two ParadigmsResearch Data Management: A Tale of Two Paradigms
Research Data Management: A Tale of Two Paradigms
 
Glasgow Life - Glasgow Museums - John Ferry
Glasgow Life - Glasgow Museums - John FerryGlasgow Life - Glasgow Museums - John Ferry
Glasgow Life - Glasgow Museums - John Ferry
 
Visualising Metaphorical Connections with the Historical Thesaurus - Brian Ai...
Visualising Metaphorical Connections with the Historical Thesaurus - Brian Ai...Visualising Metaphorical Connections with the Historical Thesaurus - Brian Ai...
Visualising Metaphorical Connections with the Historical Thesaurus - Brian Ai...
 

Último

How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 

Último (20)

How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 

Trading Consequences - Bea Alex

  • 1. Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
  • 2. Project Overview JISC/SSHRC Digging into Data Challenge II Jan 2012 - Dec 2013 Text mining, data extraction and information visualisation to explore big historical datasets. Focus on how commodities were traded across the globe in the 19th century. Help historians to discover novel patterns and explore new research questions. Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
  • 3. Project Team Ewan Klein, Bea Alex, Claire Grover, Richard Tobin: text mining Colin Coates, Andrew Watson and Jim Clifford: historical analysis James Reid, Nicola Osborne : data management, social media Aaron Quigley, Uta Hinrichs: information visualisation Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
  • 4. Traditional Historical Research Gillow and the Use of Mahogany in the Eighteenth Century, Adam Bowett, Regional Furniture, v.XII, 1998. Global Fats Supply 1894-98 Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
  • 5. Document Collections Collection # of Documents # of Images House of Commons Parliamentary Papers (ProQuest) 118,526 6,448,739 Early Canadiana Online 83,016 3,938,758 Directors’ Letters of Correspondence (Kew) 14,340 n/a Confidential Prints (Adam Matthews) 1,315 140,010 Foreign and Commonwealth Office Collection 1,000 41,611 Asia and the West (Gale) 4,725 948,773 (OCRed: 450,841) Over 10 million document pages, Over 7 billion word tokens. Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
  • 6. System Architecture Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
  • 7. Mined Information Example sentence: Normalised and grounded entities: commodity: cassia bark [concept: Cinnamomum cassia] date: 1871 (year=1871) location: Padang (lat=-0.94924;long=100.35427;country=ID) location: America (lat=39.76;long=-98.50;country=n/a) quantity + unit: 6,127 piculs Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
  • 8. Mined Information Example sentence: Extracted entity attributes and relations: origin location: Padang destination location: America commodity–date relation: cassia bark – 1871 commodity–location relation: cassia bark – Padang commodity–location relation: cassia bark – America Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
  • 9. Edinburgh Geoparser Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
  • 10. Text Mining Output Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014 Over 31 million commodity mentions in 7 billion words. Over 15 million commodity-location relations. The 100 most frequent commodities are repeated over 210,000 times on average (68% of mentions), mentions repeated >=100 (1775) make up 99.8% of all mentions. All information stored in Trading Consequences database (150GB)
  • 11. Extract of Early Canadiana Online document 9_00952_3, p. vi. OCR Errors Extract of Early Canadiana Online document 9_00952_3, p. vi. Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
  • 12. Lessons Learned Importance of two-way collaboration between technology and humanities expert in digital HSS projects. Value of iterative development and rapid prototyping. Geo-referencing text is very important for historical analysis. Most OCR errors are noise in big data but HSS scholars need to be made more aware of OCR errors affecting their search results for historical collections. Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
  • 13. Thank You Contact: balex@inf.ed.ac.uk Website: http://tradingconsequences.blogs.edina.ac.uk/ Launch of web-based user interface: March 2014. Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014