RSA 2019, Toronto
Preconference day
March 16, 2019
11AM-1PM
Programme
• 11:00-11:05 – Introduction to the session and presenters
• PRESENTATION OF PROJECTS
  ○ 11:05-11:20 – Jodi: Mapping Titian, Mapping Paintings
  ○ 11:20-11:35 – Catherine: Mapping Sculpture
• PRESENTATION OF TOOLS
  ○ 11:35-12:05 – Angela: OpenRefine, TimelineJS
  ○ 12:05-12:35 – Catherine: Palladio, CARTO
• Hands-on
OpenRefine
• Cleaning up messy data from a spreadsheet:
  ○ Spelling errors
  ○ Making data uniform
  ○ Removing whitespace
  ○ Splitting columns
  ○ Enriching data from external sources
  ○ Etc.
You won't be analysing your data cell by cell, but in groups and sets. This makes the application suitable for very large data sets.
OpenRefine
• Apart from cleaning data, you can also use OpenRefine for other purposes:
  ○ Word counts in sets
  ○ Combining sheets
  ○ Enriching reconciled data: importing data from Wikidata or VIAF
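The first two "other purposes" come down to counting values across a whole column and joining two tables on a shared key. The Python sketch below illustrates both ideas; it is not OpenRefine's own code, and the function names (`word_counts`, `combine_sheets`), the `id` key column and the sample rows are hypothetical. Inside OpenRefine, a comparable lookup across projects is the GREL function `cross()`.

```python
from collections import Counter


def word_counts(cells: list[str]) -> Counter:
    """Count how often each word occurs across a whole column (a set of cells)."""
    counts: Counter = Counter()
    for cell in cells:
        counts.update(cell.split())  # split on any run of whitespace
    return counts


def combine_sheets(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Add the columns of `right` to each row of `left` sharing the same `key` value.

    Rows of `left` without a match are kept unchanged; on a column-name clash
    the value from `left` wins.
    """
    lookup = {row[key]: row for row in right}  # if keys repeat, the last row wins
    return [{**lookup.get(row[key], {}), **row} for row in left]


if __name__ == "__main__":
    print(word_counts(["Madonna and Child", "Madonna with Saints"]).most_common(1))
    print(combine_sheets([{"id": "1", "Name": "Botticelli"}],
                         [{"id": "1", "City": "Florence"}], "id"))
```

Because the join is a dictionary lookup, it scales to large sheets without comparing every row against every other row.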
OpenRefine
• Free, open-source software
• Works best with Google Chrome (less well with Safari and Internet Explorer)
• Written in Java; requires a Java Runtime Environment (JRE)
• It is an Interactive Data Transformation tool (IDT), which lets you change a large data set in one operation. It looks similar to a spreadsheet, but has more functionality.
• Works as a desktop application: your data stay on your computer. It does not write changes back to your original file, so export your results! It can be used in several browser tabs at the same time.
• The .exe file opens a terminal window in which a small local web server runs; OpenRefine itself is used in the browser (by default at http://127.0.0.1:3333/). The terminal window needs to remain open. OpenRefine runs offline.
OpenRefine
• Choose a file and upload it to create a project.
• Rename the project (remember to export your results later: OpenRefine does not change your original file!)
• Use UTF-8 encoding.
• Configure your data: you will be shown a preview of your data. In the lower blue field, make sure “Parse data as” is set to “CSV / TSV / separator-based files”. Where it says “Character encoding”, click in the blank field next to it and select UTF-8 from the pop-up list of encodings. Make sure the first row with your column headers is recognized as headers (boldfaced) and not as data. If it is not recognized automatically, tick the checkbox “Parse next ‘1’ line(s) as column headers”. Since our exercise file is a CSV, select the radio button “commas (CSV)” as the separator.
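The import settings above (UTF-8 encoding, comma separator, first line as headers) correspond roughly to the following Python sketch using the standard `csv` module. It only illustrates what those options mean; the function names and sample rows are hypothetical.

```python
import csv
import io


def parse_csv_text(text: str) -> list[dict[str, str]]:
    """Parse CSV text, treating the first line as column headers.

    csv.DictReader takes its field names from the first row, like ticking
    "Parse next 1 line(s) as column headers"; delimiter="," corresponds to
    the "commas (CSV)" separator.
    """
    return list(csv.DictReader(io.StringIO(text), delimiter=","))


def load_csv_file(path: str) -> list[dict[str, str]]:
    """Read a CSV file explicitly as UTF-8 rather than the platform default."""
    # newline="" lets the csv module handle line endings inside quoted fields
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f, delimiter=","))


sample = "Name,City\nSandro Botticelli,Florence\nNiccolò dell'Arca,Bologna\n"
rows = parse_csv_text(sample)
print(rows[1]["Name"])
```

If an accented name such as "Niccolò" comes out garbled when reading a real file, the file was most likely not decoded as UTF-8, which is the problem the encoding setting in OpenRefine guards against.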
OpenRefine – basic cleanup
• Text facet -> cluster
• Get rid of whitespace: «Edit cells» -> «Common transforms» -> «Trim leading and trailing whitespace» / «Collapse consecutive whitespace»
• Split columns: «Edit column» -> «Split into several columns…»
• Reorder columns
• Cluster: «Edit cells» -> «Cluster and edit…» (works only on entire clusters to be merged; individual values cannot be selected)
• Replace: «Edit cells» -> «Replace»
• Undo/redo: step-by-step history in the «Undo / Redo» tab
• Deleting rows: text facet -> choose what to eliminate and star those rows -> «All» -> «Facet» -> «Facet by star» -> select «true» -> «All» -> «Edit rows» -> «Remove all matching rows»
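To make the whitespace transforms and the «Cluster and edit…» step less of a black box, here is a Python sketch. `collapse` mirrors «Collapse consecutive whitespace»; `fingerprint` approximates the idea behind OpenRefine's default key-collision ("fingerprint") method: trim, lowercase, strip accents and punctuation, then sort the unique words. It is a simplification written for illustration, not OpenRefine's exact code, and the function names and sample values are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def trim(value: str) -> str:
    """«Trim leading and trailing whitespace»."""
    return value.strip()


def collapse(value: str) -> str:
    """«Collapse consecutive whitespace»: every run of whitespace becomes one space."""
    return re.sub(r"\s+", " ", value)


def fingerprint(value: str) -> str:
    """Simplified fingerprint key: values sharing a key end up in one cluster."""
    value = value.strip().lower()
    # Strip accents: decompose characters, then drop the combining marks
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    # Remove punctuation, then sort the unique words so word order does not matter
    value = re.sub(r"[^\w\s]", "", value)
    return " ".join(sorted(set(value.split())))


def cluster_values(values: list[str]) -> list[list[str]]:
    """Group distinct spellings that share a fingerprint (key collision)."""
    groups: dict[str, set[str]] = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    # Only keys shared by two or more distinct spellings form a cluster
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(cluster_values(["Botticelli, Sandro", "sandro botticelli", "Sandro  Botticelli ", "Titian"]))
```

All three spellings of Botticelli collapse to the key "botticelli sandro", which is why they would be offered as one cluster to merge, while "Titian" stays on its own.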
OpenRefine – transform
• Change values: «Edit cells» -> «Transform…» -> GREL language -> transform the value
• Replace: value.replace('xx', 'x')
• Add characters to a column: "prefix" + value
• Clean up a date to show only the year: datePart(value, 'year')
• GREL (General Refine Expression Language) documentation on GitHub:
https://github.com/OpenRefine/OpenRefine/wiki/General-Refine-Expression-Language
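GREL expressions only run inside OpenRefine, so the Python functions below are stand-ins that show what each of the three transforms does to a single cell value. The function names are hypothetical. One assumption to flag: GREL's `datePart()` works on date values, so a cell holding plain text may need converting first (for example with `toDate()`); the sketch makes that step explicit by parsing an ISO date string.

```python
from datetime import datetime


def replace_all(value: str, old: str, new: str) -> str:
    # Like value.replace('xx', 'x'): every occurrence is replaced
    return value.replace(old, new)


def add_prefix(value: str, prefix: str) -> str:
    # Like "prefix" + value
    return prefix + value


def year_only(value: str) -> int:
    # Like datePart(value, 'year'); this sketch assumes the cell holds an
    # ISO-formatted date string such as "1510-05-17" and parses it first
    return datetime.fromisoformat(value).year


print(replace_all("Botticcelli", "cc", "c"))
print(add_prefix("Q42", "http://www.wikidata.org/entity/"))
print(year_only("1510-05-17"))
```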
OpenRefine – example from Wikipedia: Italian artists
• Download a table from Wikipedia
• You want to separate names and years
• «Edit column» -> «Add column based on this column…»
• «Edit cells» -> «Replace» (to change the brackets into a colon, to be used later as a separator)
• «Edit column» -> «Split into several columns…» (use the colon as the separator)
• Replace ")" with nothing (an empty string)
• Join two columns: value + ", " + cells["mycell"].value
• Separate first and last names: «Edit column» -> «Add column based on this column…» -> value.split(" ")[1]
  ○ [1] = last name / [0] = first name
• Put last name and first name together: value + ", " + cells["Firstname"].value
• Another option for splitting cells: choose «Edit cells» -> «Split multi-valued cells…», entering "|" as the value separator.
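The steps above can be traced on a single cell with the following Python sketch. It assumes cells shaped like "Sandro Botticelli (1445-1510)", a hypothetical example rather than the actual Wikipedia table, and the function names are illustrative only.

```python
def split_name_and_years(cell: str) -> tuple[str, str]:
    """'Sandro Botticelli (1445-1510)' -> ('Sandro Botticelli', '1445-1510')."""
    # 1. Replace the opening bracket with a colon to act as a separator
    cell = cell.replace("(", ":")
    # 2. Split into two parts at the first colon
    name, _, years = cell.partition(":")
    # 3. Replace ")" with nothing
    years = years.replace(")", "")
    return name.strip(), years.strip()


def last_first(name: str) -> str:
    """'Sandro Botticelli' -> 'Botticelli, Sandro' (two-word names only)."""
    parts = name.split(" ")
    first, last = parts[0], parts[1]  # index 0 = first name, 1 = last name
    return last + ", " + first


def split_multi_valued(cell: str, separator: str = "|") -> list[str]:
    """Like «Split multi-valued cells…»: one value per resulting row."""
    return cell.split(separator)


name, years = split_name_and_years("Sandro Botticelli (1445-1510)")
print(last_first(name), years)
```

Like `value.split(" ")[1]` in GREL, `last_first` assumes exactly two words; a name such as "Leonardo da Vinci" would yield "da" as the last name, so such rows still need manual checking.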
OpenRefine for data enrichment (using Linked Open Data)
• Fetch URLs using OpenRefine
• Construct URL queries to retrieve information from a simple web API
• Using query services such as:
  ○ Wikidata
  ○ Google Maps API
  ○ VIAF (Virtual International Authority File)
  ○ etc.
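"Constructing a URL query" means joining an API endpoint with encoded parameters, which is what «Add column by fetching URLs…» does with one expression per row. The Python sketch below shows the idea. The example targets Wikidata's public `api.php` endpoint with its `wbgetentities` module; the item ID Q42 and the User-Agent string are only placeholders. `fetch_json` needs network access and is defined but not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_query_url(base: str, params: dict[str, str]) -> str:
    """Append URL-encoded query parameters to an API endpoint."""
    return base + "?" + urlencode(params)


def fetch_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode its JSON body (network access required)."""
    # Placeholder User-Agent string; Wikimedia asks clients to identify themselves
    request = Request(url, headers={"User-Agent": "rsa-workshop-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


url = build_query_url(
    "https://www.wikidata.org/w/api.php",
    {"action": "wbgetentities", "ids": "Q42", "props": "claims", "format": "json"},
)
print(url)
```

Letting `urlencode` escape the values (spaces become "+", and so on) avoids broken URLs when a cell contains spaces or accented characters.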
Retrieving data from Wikidata
• You need a column Wikidata_uri
• Create a column Wikidata_id: «Edit column» -> «Add column based on this column…» -> to extract the ID, enter the expression
replace(value, "http://www.wikidata.org/entity/", "")
• On the Wikidata_id column: «Edit column» -> «Add column by fetching URLs…» -> to query birth dates (property P569), enter
"https://tools.wmflabs.org/openrefine-wikidata/en/fetch_values?item=" + value + "&prop=P569"
-> name the column «date_of_birth_Wikidata». The result is in JSON.
• Clean up the data: «Edit cells» -> «Transform…» -> enter
forEach(value.parseJson().values, v, v).join(";")
• Clean up a date to show only the year: datePart(value, 'year')
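The three GREL expressions on this slide translate to the Python sketch below, which shows what happens to one row. The endpoint is the one given on the slide and is not called. The response shape `{"values": [...]}` is inferred from the expression `value.parseJson().values`, not checked against the service, and the sample JSON in the usage lines is made up.

```python
import json

WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def wikidata_id(uri: str) -> str:
    # replace(value, "http://www.wikidata.org/entity/", "")
    return uri.replace(WIKIDATA_ENTITY_PREFIX, "")


def fetch_values_url(item_id: str, prop: str = "P569") -> str:
    # Same string concatenation as on the slide; P569 = date of birth
    return ("https://tools.wmflabs.org/openrefine-wikidata/en/fetch_values?item="
            + item_id + "&prop=" + prop)


def join_values(json_text: str) -> str:
    # forEach(value.parseJson().values, v, v).join(";")
    return ";".join(str(v) for v in json.loads(json_text)["values"])


item = wikidata_id("http://www.wikidata.org/entity/Q42")
print(fetch_values_url(item))
print(join_values('{"values": ["1952-03-11"]}'))
```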
Retrieving data from Wikidata
• Reconcile (how simple is this!)
• Choose the source: Wikidata (if needed, include other columns too)
• Start reconciling: records are automatically linked to Wikidata (the rest has to be matched manually)
• Use values as identifiers
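Behind «Reconcile», OpenRefine sends batches of queries to a reconciliation service and accepts a candidate automatically when the service flags it as a confident match; everything else is left for manual review. The Python sketch below models that exchange, assuming the JSON shapes of the reconciliation service API used by OpenRefine: queries keyed q0, q1, … with a `query` string and an optional `type`, and results as candidate lists with `id`, `name`, `score` and `match`. Q5 ("human") is Wikidata's class for people; the response in the usage lines is made up, and no request is sent.

```python
import json
from typing import Optional


def build_queries(names: list[str], type_id: Optional[str] = None) -> dict[str, dict]:
    """One reconciliation query per cell value, keyed q0, q1, ..."""
    queries = {}
    for i, name in enumerate(names):
        query = {"query": name}
        if type_id is not None:
            query["type"] = type_id  # e.g. "Q5" (human) on Wikidata
        queries[f"q{i}"] = query
    return queries


def split_matches(response: dict) -> tuple[dict[str, str], list[str]]:
    """Separate automatically matched queries from those needing manual review."""
    matched: dict[str, str] = {}
    to_review: list[str] = []
    for key, payload in response.items():
        confident = [c for c in payload.get("result", []) if c.get("match")]
        if confident:
            matched[key] = confident[0]["id"]
        else:
            to_review.append(key)
    return matched, to_review


# The service expects the queries as a JSON string in a "queries" form field
payload = json.dumps(build_queries(["Sandro Botticelli"], type_id="Q5"))
fake_response = {
    "q0": {"result": [{"id": "Q1", "name": "A", "score": 100, "match": True}]},
    "q1": {"result": [{"id": "Q2", "name": "B", "score": 40, "match": False}]},
}
print(split_matches(fake_response))
```

Supplying a type (and, in OpenRefine, extra columns as properties) narrows the candidates, which is why including other columns can reduce the manual work.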
OpenRefine – export
• At the end: export your data set! (OpenRefine does not change your original data set)
• Single-column export -> facet -> choose facet -> export CSV
• Full-sheet export -> «Export» -> «Comma-separated value»
• It is also possible to export only parts of your sheet.
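Exporting only part of a sheet means keeping a subset of rows (for example those selected by a facet) and a subset of columns. The Python sketch below writes such a subset as CSV text; the function name, the `keep` filter and the sample rows are hypothetical.

```python
import csv
import io
from typing import Callable, Optional


def export_csv(rows: list[dict[str, str]], columns: list[str],
               keep: Optional[Callable[[dict[str, str]], bool]] = None) -> str:
    """Write the chosen columns of the rows that pass `keep` as CSV text."""
    buffer = io.StringIO()
    # extrasaction="ignore" silently drops columns not listed in `columns`
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if keep is None or keep(row):
            writer.writerow(row)
    return buffer.getvalue()


rows = [{"Name": "A", "City": "Florence", "Born": "1"},
        {"Name": "B", "City": "Venice", "Born": "2"}]
print(export_csv(rows, ["Name", "City"], keep=lambda r: r["City"] == "Florence"))
```

The original `rows` list is never modified, which matches the point above: the export is a new file, and the source data set stays as it was.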
OpenRefine tutorials
• http://openrefine.org/
• https://programminghistorian.org/en/lessons/cleaning-data-with-openrefine
• https://github.com/miriamposner/get-started-with-openrefine/blob/master/get-started-with-openrefine.md
• https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users
• Retrieving data from Wikidata or VIAF:
https://medium.com/the-bytegeist-blog/enriching-reconciled-data-with-openrefine-89b885dcadbb
• There are many more!
Timelines (selection)
• TimelineJS (Northwestern University Knight Lab)
https://news.northwestern.edu/stories/2012/03/knight-lab-digital-timelines/ (with examples and spreadsheet)
• Neatline – for Omeka
http://docs.neatline.org/creating-records.html
• Google Timeline
https://www.google.com/maps/timeline?pb
• Office Timelines (for Excel or PowerPoint)
https://templates.office.com/en-us/Timelines?page=1
TimelineJS
With Google Chrome and Google Spreadsheets
• Advantages
  ○ Easy to use for a chronological visualization
  ○ Incorporates maps and images from the web
  ○ Can be embedded in websites and PowerPoint presentations
• Disadvantages
  ○ Limited interactivity
  ○ Only uses images published on the web, not images from your own collection
TimelineJS
With Google Chrome
• https://timeline.knightlab.com/
• Botticelli spreadsheet:
https://docs.google.com/spreadsheets/d/1BAg-2_XZM-Oap1cwQoftBcYjrJYBjXOSNOqdXBwQWyY/edit#gid=0
• Botticelli timeline (embedded link to website or presentation)
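Besides a Google spreadsheet, TimelineJS can read its events from a JSON file. The Python sketch below converts spreadsheet-style rows into that structure (`events`, each with a `start_date` and `text.headline`/`text.text`, plus optional `media`). It assumes column names from the TimelineJS spreadsheet template (Year, Month, Day, Headline, Text, Media, Media Caption, Media Credit); the sample row is hypothetical and not taken from the Botticelli spreadsheet.

```python
import json


def row_to_event(row: dict[str, str]) -> dict:
    """Turn one spreadsheet row into a TimelineJS event object."""
    start = {"year": row["Year"]}
    for column, field in (("Month", "month"), ("Day", "day")):
        if row.get(column):
            start[field] = row[column]
    event = {
        "start_date": start,
        "text": {"headline": row.get("Headline", ""), "text": row.get("Text", "")},
    }
    if row.get("Media"):  # TimelineJS only uses media published at a URL
        event["media"] = {
            "url": row["Media"],
            "caption": row.get("Media Caption", ""),
            "credit": row.get("Media Credit", ""),
        }
    return event


def build_timeline(rows: list[dict[str, str]], title: str) -> dict:
    return {"title": {"text": {"headline": title}},
            "events": [row_to_event(row) for row in rows]}


rows = [{"Year": "1445", "Headline": "Example event", "Text": "Hypothetical entry"}]
print(json.dumps(build_timeline(rows, "Botticelli"), indent=2))
```

The `media` check reflects the limitation listed among the disadvantages: an image can only be shown if it has a public URL.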
Thank you!
Dr. Angela Dressen
Villa I Tatti, The Harvard University Center for Italian Renaissance Studies / Florence, Italy
adressen@itatti.harvard.edu
Discipline Representative for Digital Humanities at the Renaissance Society of America (RSA)

Speaker notes

  1. Cleaning up your own accumulated data or data gathered from the web. Works with an algorithm.
  2. Wikidata provides an endpoint for querying data as a URL. Once you know the property you would like to retrieve, the objective is to use OpenRefine to build a query string and retrieve the data you want from that endpoint.