Search & Data Mining 
SKILLS SEMINAR 
Master of European History, University of Luxembourg, 11 December 2014 
Gerben Zaagsma 
Lichtenberg-Kolleg,
Overview
1. Introduction search & data mining
2. Tools
3. Practical exercises
1. Introduction search & data mining
Code yourself… …or use existing tools
Why historians should be interested:
Old | New
Analogue resources | Digital resources
Small data | Big data
Close reading | Distant reading
CHANGE · SCALE · TECHNOLOGY
the Big Data revolution?
Big data and claims about a paradigm change in the humanities
culturomics and Google ngrams
Data driven history
Patterns and structures: a new essentialism?
Based upon changes of scale & method: humanities supposedly becoming more ‘scientific’ > results can be checked and replicated, but can they? Interpretation.
Politics: funding & valorisation
“One of the problems confronting data enthusiasts in the humanities is that we feel a need to convince our more old-fashioned colleagues about what can be done. But our role as advocates of data shouldn't mean that we lose our critical sense as scholars.
[...] there is a risk that we look more carefully at the technical components of the datasets than the historical context of the information that they represent.”
Andrew Prescott, ‘The Deceptions of Data’, Digital Riffs (13 January 2013).
Frédéric Clavert, ‘Lecture des sources historiennes à l’ère numérique’ (14 November 2012)
Integrate approaches & methods / hybridity
1. SEARCH
Google / Bing / Yahoo
there is much more ...
Searching the Internet in general:
Google
there is much more than Google
filter bubble? take a look at: http://dontbubble.us
http://www.langreiter.com/exec/yahoo-vs-google.html
http://yometa.com
filter bubble?
http://www.thefilterbubble.com
Web search round-up 
differences between search engines 
filter bubble 
deep web versus visible web
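The differences between search engines can be made concrete by comparing the result lists they return for the same query. The sketch below is only an illustration: the two lists of URLs are invented placeholders, not real search output, and the function name `jaccard_overlap` is an assumption introduced here. It measures overlap as the Jaccard index (shared results divided by all distinct results).

```python
def jaccard_overlap(results_a, results_b):
    """Share of URLs that two result lists have in common (0.0 to 1.0)."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical top results for the same query from two engines (illustrative only).
engine_a = ["example.org/a", "example.org/b", "example.org/c", "example.org/d"]
engine_b = ["example.org/c", "example.org/d", "example.org/e", "example.org/f"]

overlap = jaccard_overlap(engine_a, engine_b)
only_a = sorted(set(engine_a) - set(engine_b))
print(f"overlap: {overlap:.2f}")
print("only in engine A:", only_a)
```

A low overlap for the same query is a reminder that the choice of search engine already shapes what a researcher gets to see.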
Searching digital libraries & archives…
composition of resources, selection…
example of Compactmemory: a great resource on German-Jewish history
The collection comprises the 110 most important Jewish newspapers and periodicals of the German-speaking world from the years 1806-1938. The periodicals represent the entire religious, political, social, literary and scholarly spectrum of the Jewish community.
but be aware of selection: focus on elites and organisations that highlight German Jewry’s process of emancipation:
• classical vision in historiography on German Jewry?
• reinforcement of existing master narratives?
mind the context…
Processing and searching data on your own 
computer…
1. DATA MINING
data? 
data = computer-processable information
Example of structured data
Many digital libraries/archives: 
un-/semi-structured data
Digital editions: bridging the gap with XML
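How XML markup bridges unstructured text and structured data can be shown with a small sketch. The fragment below is invented for illustration and only loosely TEI-like (the element names `persName`, `placeName` and `date` are borrowed from TEI, the letter and its contents are made up). It uses Python's standard `xml.etree.ElementTree` module to read the same passage both as structured fields and as plain running text.

```python
import xml.etree.ElementTree as ET

# A made-up, TEI-like fragment: running text with names and dates marked up.
SAMPLE = """
<letter>
  <p>On <date when="1912-03-04">4 March 1912</date>
  <persName>Rosa Example</persName> wrote from
  <placeName>Berlin</placeName> to
  <persName>Max Sample</persName> in
  <placeName>Wien</placeName>.</p>
</letter>
"""

root = ET.fromstring(SAMPLE)

# Structured view: pull out every tagged person, place and date.
people = [el.text for el in root.iter("persName")]
places = [el.text for el in root.iter("placeName")]
dates = [el.get("when") for el in root.iter("date")]

# Unstructured view: the same passage as plain running text.
plain_text = " ".join("".join(root.itertext()).split())

print(people, places, dates)
print(plain_text)
```

The markup lets a program query names, places and dates directly, while the full text remains available for close reading.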
http://eculture.cs.vu.nl/europeana/session/search
• Google / Bing / Yahoo
• there is much more ...
• results differ per search engine
• and there is a filter bubble
• --> in short: knowing what you are looking for and having a search strategy are crucial
Semantic web and linking data
Some definitions of data mining:
At its simplest, data mining is the process of extracting 
new knowledge (usually in terms of previously unknown 
patterns) from sets of data already in existence. 
Jonathan Hagood
Data mining (the analysis step of the "Knowledge Discovery in 
Databases" process, or KDD), an interdisciplinary subfield of 
computer science, is the computational process of discovering 
patterns in large data sets involving methods at the intersection 
of artificial intelligence, machine learning, statistics, and 
database systems. 
The overall goal of the data mining process is to extract 
information from a data set and transform it into an 
understandable structure for further use. 
Wikipedia
Examples of projects and techniques
an n-gram is a contiguous sequence of n 
items from a given sequence of text or speech
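The definition of an n-gram can be sketched in a few lines of Python. This is a minimal illustration, not how Google ngrams or any particular tool is implemented: the sentence is invented, and splitting on whitespace is a simplifying assumption about what counts as an "item".

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous sequences of n items from a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Illustrative sentence; splitting on whitespace is a simplifying assumption.
text = "the old archive and the new archive and the old catalogue"
tokens = text.split()

bigrams = ngrams(tokens, 2)
counts = Counter(bigrams)
print(counts.most_common(3))
```

With n = 1 the function returns single words (unigrams); counting such sequences per year across a large corpus is the basic idea behind n-gram frequency charts.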
Topic Modeling Martha Ballard’s Diary
data? 
data & data mining ≠ neutral
“What is too often forgotten, though, is that our 
digital helpers are full of ‘theory’ and ‘judgement’ 
already. As with any methodology, they rely on sets 
of assumptions, models, and strategies. Theory is 
already at work on the most basic level when it 
comes to defining units of analysis, algorithms, and 
visualisation procedures.” 
Bernhard Rieder and Theo Röhle, ‘Digital Methods: Five 
Challenges’ in: David M Berry ed., Understanding Digital 
Humanities (Houndmills: Palgrave Macmillan, 2012) 67-85, 
70.
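The point that theory is already at work in "defining units of analysis" can be illustrated with a small sketch. Both tokenisers below are assumptions made for this example, and the sentence is invented; neither is the "correct" way to split text. The same word gets a different count depending on which choice is made.

```python
import re
from collections import Counter

# Illustrative sentence, not taken from any corpus.
text = "The Archive's archive-users read the archive; the Archive remains."


def tokens_naive(s):
    """Split on whitespace only: case and punctuation stay attached."""
    return s.split()


def tokens_normalised(s):
    """Lowercase and keep only runs of letters: one of many possible choices."""
    return re.findall(r"[a-z]+", s.lower())


naive = Counter(tokens_naive(text))
normalised = Counter(tokens_normalised(text))

# Same text, same word, different counts depending on the tokeniser.
print(naive["archive"], normalised["archive"])
```

Neither count is neutral: each encodes a decision about case, punctuation and compounds that a historian should be able to state and defend.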
2. TOOLS
3. Practical exercises
Overview of exercises 
http://goo.gl/72fCn7
Tools & workflows 
Voyant Tools 
Voyant Tools Documentation 
Programming Historian 
DIRT: Digital Research Tools 
Turkel, William J., Kevin Kee, and Spencer Roberts, ‘A 
Method for Navigating the Infinite Archive’ in: Toni 
Weller ed., History in the Digital Age (London; New 
York: Routledge, 2013). 
William J. Turkel: How To
Further reading
Special issue on Digital History, BMGN - Low Countries Historical Review, 128/4 (2013).
Haber, Peter, Digital Past: Geschichtswissenschaft Im Digitalen Zeitalter (München: Oldenbourg Verlag, 2011).
Boonstra, Onno, Leen Breure, and Peter Doorn, Past, Present and Future of Historical Information Science (Amsterdam: NIWI-KNAW, 2004).
Ciravegna, Fabio, Mark Greengrass, Tim Hitchcock, Sam Chapman, Jamie McLaughlin, and Ravish Bhagdev, ‘Finding Needles in Haystacks: Data-Mining in Distributed Historical Datasets’ in: Mark Greengrass and Lorna M Hughes eds., The Virtual Representation of the Past (Ashgate, 2008).
Cohen, D, F Gibbs, T Hitchcock, G Rockwell, J Sander, R Shoemaker, S Sinclair, S Takats, W J Turkel, and C Briquet, ‘Data Mining with Criminal Intent’, final white paper (2011).
Hagood, Jonathan, ‘A Brief Introduction to Data Mining Projects in the Humanities’, Bulletin of the American Society for Information Science and Technology 38/4 (2012).
Hitchcock, Tim, ‘Big Data for Dead People: Digital Readings and the Conundrums of Positivism’ (9 December 2013).
Leonard, Peter, ‘Mining Large Datasets for the Humanities’, IFLA WLIC 2014.
Dr. Gerben Zaagsma 
http://gerbenzaagsma.org 
de.linkedin.com/in/gerbenzaagsma/ 
https://twitter.com/gerbenzaagsma 
https://uni-goettingen.academia.edu/GerbenZaagsma 
https://www.researchgate.net/profile/Gerben_Zaagsma 
https://www.slideshare.net/gerbenzaagsma
Image credits
The Field Museum Library, Hall 37 Geology overview. URL: https://www.flickr.com/photos/field_museum_library/3333920156/in/set-72157614881700424.
The U.S. National Archives, Photograph of Card Catalog in Central Search Room, 1942. URL: http://www.flickr.com/photos/usnationalarchives/3873932255/.
Witch computer 1951: Wolverhampton and Staffordshire College of Technology in 1961, The National Computing Museum and Computer Conservation Society/UKAEA/Wolverhampton Express and Star, via: http://www.wired.com/2009/09/britan-oldest-computer/.
Code: https://www.flickr.com/photos/lord_james/4696338852/.
Tools: Flickr Commons
The droids we're googling for: https://www.flickr.com/photos/st3f4n/3951143570/.
Jaws (Steven Spielberg) original movie poster: https://en.wikipedia.org/wiki/File:JAWS_Movie_poster.jpg
Structured/unstructured data: http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
Macbook Data Mining: http://www.flickr.com/photos/17208993@N00/442531562/.
Topic Modeling Martha Ballard’s Diary: http://www.cameronblevins.org/posts/topic-modeling-martha-ballards-diary/.
Boolean operators: http://uksourcers.co.uk/2012/capital-letters-the-key-to-boolean-success/
Miami University students in laboratory classroom 1908: https://www.flickr.com/photos/muohio_digital_collections/3199691495/

 
20110517 - Presenting the Yiddish past in contemporary Europe
20110517 - Presenting the Yiddish past in contemporary Europe20110517 - Presenting the Yiddish past in contemporary Europe
20110517 - Presenting the Yiddish past in contemporary Europe
 
20111031 - Online Jewish content in a broader context
20111031 - Online Jewish content in a broader context20111031 - Online Jewish content in a broader context
20111031 - Online Jewish content in a broader context
 

Último

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIShubhangi Sonawane
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 

Último (20)

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 

Introduction for skills seminar on Search and Data Mining, Master of European History, University of Luxembourg, 11 December 2014

  • 1. Search & Data Mining SKILLS SEMINAR Master of European History, University of Luxembourg, 11 December 2014 Gerben Zaagsma Lichtenberg-Kolleg,
  • 3. Overview: 1. Introduction search & data mining 2. T 3. Practical exercises
  • 4. Code yourself… …or use existing tools
  • 6. Why historians should be interested. Old → New: analogue resources → digital resources; small data → big data; close reading → distant reading. Keywords: CHANGE, SCALE, TECHNOLOGY
  • 7. the Big Data revolution? Big data and claims about a paradigm change in the humanities
  • 11. the Big Data revolution? Big data and claims about a paradigm change in the humanities Data driven history
  • 12. the Big Data revolution? Big data and claims about a paradigm change in the humanities Data driven history Patterns and structures: a new essentialism?
  • 13. the Big Data revolution? Big data and claims about a paradigm change in the humanities Data driven history Patterns and structures: a new essentialism? Based upon changes of scale & method: humanities supposedly becoming more ‘scientific’ > results can be checked and replicated, but can they? Interpretation.
  • 14. the Big Data revolution? Big data and claims about a paradigm change in the humanities Data driven history Patterns and structures: a new essentialism? Based upon changes of scale & method: humanities supposedly becoming more ‘scientific’ > results can be checked and replicated, but can they? Interpretation. Politics: funding & valorisation
  • 15. “One of the problems confronting data enthusiasts in the humanities is that we feel a need to convince our more old-fashioned colleagues about what can be done. But our role as advocates of data shouldn't mean that we lose our critical sense as scholars. [....] there is a risk that we look more carefully at the technical components of the datasets than the historical context of the information that they represent.” Andrew Prescott, ‘The Deceptions of Data’, Digital Riffs (13 January 2013).
  • 16. Frédéric Clavert, ‘Lecture des sources historiennes à l’ère numérique’ (14 November 2012) Integrate approaches & methods/ hybridity
  • 18. Google/Bing/Yahoo: there is much more ...
  • 19. Searching the Internet in general: Google; there is much more than Google; filter bubble? Take a look at: http://dontbubble.us
  • 20. Searching the Internet in general: Google; there is much more than Google; filter bubble? Take a look at: http://dontbubble.us http://www.langreiter.com/exec/yahoo-vs-google.html
  • 21. Searching the Internet in general: Google; there is much more than Google; filter bubble? Take a look at: http://dontbubble.us http://yometa.com
  • 25. Web search round-up: differences between search engines; filter bubble; deep web versus visible web
  • 28. example of Compactmemory: a great resource on German-Jewish history
  • 29. The collection comprises the 110 most important Jewish newspapers and periodicals of the German-speaking world from the years 1806-1938. The periodicals represent the entire religious, political, social, literary, and scholarly range of the Jewish community. But be aware of selection: the focus is on elites and organisations that highlight German Jewry’s process of emancipation: • classical vision in historiography on German Jewry? • reinforcement of existing master narratives?
  • 34. Processing and searching data on your own computer…
  • 40. data? data = computer-processable information
  • 43. Many digital libraries/archives: un-/semi-structured data
  • 44. Digital editions: bridging the gap with XML
  • 47. Semantic web and linking data http://eculture.cs.vu.nl/europeana/session/search • Google/Bing/Yahoo • there is much more ... • results differ per search engine • and there is a filter bubble • → in short: knowing what you are looking for and having a search strategy are crucial
  • 48. • Google/Bing/Yahoo • there is much more ... • results differ per search engine • and there is a filter bubble • → in short: knowing what you are looking for and having a search strategy are crucial cs.vu.nl/europeana/session/search
  • 49. • Google/Bing/Yahoo • there is much more ... • results differ per search engine • and there is a filter bubble • → in short: knowing what you are looking for and having a search strategy are crucial
  • 50. Some definitions of data mining:
  • 51. At its simplest, data mining is the process of extracting new knowledge (usually in terms of previously unknown patterns) from sets of data already in existence. Jonathan Hagood
  • 52. Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Wikipedia
  • 53. Examples of projects and techniques
  • 55. an n-gram is a contiguous sequence of n items from a given sequence of text or speech
  • 58. Topic Modeling Martha Ballard’s Diary
  • 59. data? data & data mining ≠ neutral
  • 60. “What is too often forgotten, though, is that our digital helpers are full of ‘theory’ and ‘judgement’ already. As with any methodology, they rely on sets of assumptions, models, and strategies. Theory is already at work on the most basic level when it comes to defining units of analysis, algorithms, and visualisation procedures.” Bernhard Rieder and Theo Röhle, ‘Digital Methods: Five Challenges’ in: David M Berry ed., Understanding Digital Humanities (Houndmills: Palgrave Macmillan, 2012) 67-85, 70.
  • 63. Overview of exercises http://goo.gl/72fCn7
  • 64. Tools & workflows: Voyant Tools; Voyant Tools Documentation; Programming Historian; DIRT: Digital Research Tools; Turkel, William J., Kevin Kee, and Spencer Roberts, ‘A Method for Navigating the Infinite Archive’ in: Toni Weller ed., History in the Digital Age (London; New York: Routledge, 2013); William J. Turkel: How To
  • 65. Further reading: Special issue on Digital History, BMGN - Low Countries Historical Review, 128/4 (2013). Haber, Peter, Digital Past: Geschichtswissenschaft im digitalen Zeitalter (München: Oldenbourg Verlag, 2011). Boonstra, Onno, Leen Breure, and Peter Doorn, Past, Present and Future of Historical Information Science (Amsterdam: NIWI-KNAW, 2004). Ciravegna, Fabio, Mark Greengrass, Tim Hitchcock, Sam Chapman, Jamie McLaughlin, and Ravish Bhagdev, ‘Finding Needles in Haystacks: Data-Mining in Distributed Historical Datasets’ in: Mark Greengrass and Lorna M Hughes eds., The Virtual Representation of the Past (Ashgate, 2008). Cohen, D, F Gibbs, T Hitchcock, G Rockwell, J Sander, R Shoemaker, S Sinclair, S Takats, W J Turkel, and C Briquet, "Data Mining with Criminal Intent." Final white paper (2011). Hagood, Jonathan, "A Brief Introduction to Data Mining Projects in the Humanities." Bulletin of the American Society for Information Science and Technology 38/4 (2012). Hitchcock, Tim, "Big Data for Dead People: Digital Readings and the Conundrums of Positivism." (9 December 2013). Leonard, Peter, "Mining Large Datasets for the Humanities", IFLA WLIC 2014.
  • 66. Dr. Gerben Zaagsma http://gerbenzaagsma.org de.linkedin.com/in/gerbenzaagsma/ https://twitter.com/gerbenzaagsma https://uni-goettingen.academia.edu/GerbenZaagsma https://www.researchgate.net/profile/Gerben_Zaagsma https://www.slideshare.net/gerbenzaagsma
  • 67. Image credits: The Field Museum Library, Hall 37 Geology overview. URL: https://www.flickr.com/photos/field_museum_library/3333920156/in/set-72157614881700424. The U.S. National Archives, Photograph of Card Catalog in Central Search Room, 1942. URL: http://www.flickr.com/photos/usnationalarchives/3873932255/. Witch computer 1951: Wolverhampton and Staffordshire College of Technology in 1961, The National Computing Museum and Computer Conservation Society/UKAEA/Wolverhampton Express and Star, via: http://www.wired.com/2009/09/britan-oldest-computer/. Code: https://www.flickr.com/photos/lord_james/4696338852/. Tools: Flickr Commons. The droids we're googling for: https://www.flickr.com/photos/st3f4n/3951143570/. Jaws (Steven Spielberg) original movie poster: https://en.wikipedia.org/wiki/File:JAWS_Movie_poster.jpg. Structured/unstructured data: http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm. Macbook Data Mining: http://www.flickr.com/photos/17208993@N00/442531562/. Topic Modeling Martha Ballard’s Diary: http://www.cameronblevins.org/posts/topic-modeling-martha-ballards-diary/. Boolean operators: http://uksourcers.co.uk/2012/capital-letters-the-key-to-boolean-success/. Miami University students in laboratory classroom 1908: https://www.flickr.com/photos/muohio_digital_collections/3199691495/.
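
Slide 55 defines an n-gram as a contiguous sequence of n items from a given sequence of text or speech. For readers who want to "code yourself" rather than use an existing tool, the following is a minimal Python sketch of that definition. It is not taken from the seminar materials: the function name `ngrams`, the sample sentence, and the naive whitespace tokenisation are all illustrative assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Illustrative sample text; a real corpus would need proper tokenisation
# (punctuation, OCR errors, historical spelling variants).
text = "the big data revolution and the big data debate"
tokens = text.lower().split()

# Count how often each bigram (n = 2) occurs in the sample.
bigram_counts = Counter(ngrams(tokens, 2))

for gram, count in bigram_counts.most_common(3):
    print(" ".join(gram), count)
```

Counting n-grams per year across a dated corpus is, roughly, the idea behind n-gram viewers; the sketch only covers the counting step on a single string.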