This document outlines a seminar on search and data mining skills for historians. It discusses how digital resources have increased in scale and changed the nature of historical research. The seminar introduces tools for searching online sources as well as processing and mining data on one's own computer. Practical exercises are provided to help historians learn new skills and approaches for working with digital sources and large datasets.
6. Why historians should be interested:
Old -> New
Analogue resources -> Digital resources
Small data -> Big data
Close reading -> Distant reading
CHANGE / SCALE / TECHNOLOGY
14. The Big Data revolution?
Big data and claims about a paradigm change in the humanities
Data-driven history
Patterns and structures: a new essentialism?
Based upon changes of scale and method: the humanities are supposedly
becoming more ‘scientific’, with results that can be checked and
replicated. But can they? Interpretation remains an issue.
Politics: funding and valorisation
15. “One of the problems confronting data enthusiasts in
the humanities is that we feel a need to convince our
more old-fashioned colleagues about what can be done.
But our role as advocates of data shouldn't mean that
we lose our critical sense as scholars.
[...] there is a risk that we look more carefully at the
technical components of the datasets than the
historical context of the information that they represent.”
Andrew Prescott, ‘The Deceptions of Data’, Digital Riffs (13
January 2013).
16. Frédéric Clavert, ‘Lecture des sources historiennes à l’ère
numérique’ (14 November 2012)
Integrate approaches & methods / hybridity
19. Searching the Internet in general:
Google
There is much more than Google
Filter bubble? Have a look at: http://dontbubble.us
http://www.langreiter.com/exec/yahoo-vs-google.html
http://yometa.com
29. The collection comprises the 110 most important Jewish
newspapers and periodicals of the German-speaking world
from the years 1806-1938. The periodicals represent the
full religious, political, social, literary, and
scholarly spectrum of the Jewish community.
But be aware of selection: the focus is on elites and organisations that
highlight German Jewry’s process of emancipation:
• classical vision in historiography on German Jewry?
• reinforcement of existing master narratives?
48. • Google / Bing / Yahoo
• there is much more ...
• results differ per search engine
• and there is a filter bubble
• --> in short: knowing what you are looking for and having a search strategy are crucial
cs.vu.nl/europeana/session/search
51. At its simplest, data mining is the process of extracting
new knowledge (usually in terms of previously unknown
patterns) from sets of data already in existence.
Jonathan Hagood
52. Data mining (the analysis step of the "Knowledge Discovery in
Databases" process, or KDD), an interdisciplinary subfield of
computer science, is the computational process of discovering
patterns in large data sets involving methods at the intersection
of artificial intelligence, machine learning, statistics, and
database systems.
The overall goal of the data mining process is to extract
information from a data set and transform it into an
understandable structure for further use.
Wikipedia
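To make the two definitions above concrete, here is a minimal sketch in Python of data mining at its very simplest: counting which terms recur across a set of texts. The three-sentence corpus, the stop word list, and the function names are all invented for illustration; they are assumptions, not part of the seminar material, and a real project would work with digitised sources and a fuller stop word list.

```python
import re
from collections import Counter

# Hypothetical mini-corpus; in practice these would be digitised sources.
documents = [
    "The strike began in the harbour and spread to the factories.",
    "Factories closed as the strike spread across the city.",
    "The city council met to discuss the harbour strike.",
]

# Assumed stop word list, kept tiny for illustration only.
STOP_WORDS = {"the", "in", "and", "to", "as", "a", "of"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs):
    """Count how often each non-stop word occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts


frequencies = term_frequencies(documents)
top_terms = frequencies.most_common(3)
print(top_terms)
```

The resulting frequency table is the "understandable structure" of the second definition. Whether a recurring term counts as a meaningful pattern is still a matter of interpretation.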
60. “What is too often forgotten, though, is that our
digital helpers are full of ‘theory’ and ‘judgement’
already. As with any methodology, they rely on sets
of assumptions, models, and strategies. Theory is
already at work on the most basic level when it
comes to defining units of analysis, algorithms, and
visualisation procedures.”
Bernhard Rieder and Theo Röhle, ‘Digital Methods: Five
Challenges’ in: David M Berry ed., Understanding Digital
Humanities (Houndmills: Palgrave Macmillan, 2012) 67-85,
70.
64. Tools & workflows
Voyant Tools
Voyant Tools Documentation
Programming Historian
DIRT: Digital Research Tools
Turkel, William J., Kevin Kee, and Spencer Roberts, ‘A
Method for Navigating the Infinite Archive’ in: Toni
Weller ed., History in the Digital Age (London; New
York: Routledge, 2013).
William J. Turkel: How To
65. Further reading
Special issue on Digital History, BMGN - Low Countries Historical Review, 128/4 (2013).
Haber, Peter, Digital Past: Geschichtswissenschaft im digitalen Zeitalter (München:
Oldenbourg Verlag, 2011).
Boonstra, Onno, Leen Breure, and Peter Doorn, Past, Present and Future of Historical
Information Science (Amsterdam: NIWI-KNAW, 2004).
Ciravegna, Fabio, Mark Greengrass, Tim Hitchcock, Sam Chapman, Jamie McLaughlin,
and Ravish Bhagdev, ‘Finding Needles in Haystacks: Data-Mining in Distributed
Historical Datasets’ in: Mark Greengrass and Lorna M Hughes eds., The Virtual
Representation of the Past (Ashgate, 2008).
Cohen, D, F Gibbs, T Hitchcock, G Rockwell, J Sander, R Shoemaker, S Sinclair, S Takats,
W J Turkel, and C Briquet, ‘Data Mining with Criminal Intent’, Final white paper (2011).
Hagood, Jonathan, ‘A Brief Introduction to Data Mining Projects in the Humanities’,
Bulletin of the American Society for Information Science and Technology 38/4 (2012).
Hitchcock, Tim, ‘Big Data for Dead People: Digital Readings and the Conundrums of
Positivism’ (9 December 2013).
Leonard, Peter, ‘Mining Large Datasets for the Humanities’, IFLA WLIC 2014.
66. Dr. Gerben Zaagsma
http://gerbenzaagsma.org
de.linkedin.com/in/gerbenzaagsma/
https://twitter.com/gerbenzaagsma
https://uni-goettingen.academia.edu/GerbenZaagsma
https://www.researchgate.net/profile/Gerben_Zaagsma
https://www.slideshare.net/gerbenzaagsma
67. Image credits
The Field Museum Library, Hall 37 Geology overview. URL: https://www.flickr.com/photos/field_museum_library/3333920156/in/set-72157614881700424.
The U.S. National Archives, Photograph of Card Catalog in Central Search Room, 1942. URL: http://www.flickr.com/photos/usnationalarchives/3873932255/.
Witch computer 1951: Wolverhampton and Staffordshire College of Technology in 1961, The National
Computing Museum and Computer Conservation Society/UKAEA/Wolverhampton Express and Star, via:
http://www.wired.com/2009/09/britan-oldest-computer/.
Code: https://www.flickr.com/photos/lord_james/4696338852/.
Tools: Flickr Commons
The droids we're googling for: https://www.flickr.com/photos/st3f4n/3951143570/.
Jaws (Steven Spielberg) original movie poster: https://en.wikipedia.org/wiki/File:JAWS_Movie_poster.jpg
Structured/unstructured data: http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
Macbook Data Mining: http://www.flickr.com/photos/17208993@N00/442531562/.
Topic Modeling Martha Ballard’s Diary: http://www.cameronblevins.org/posts/topic-modeling-martha-ballards-diary/.
Boolean operators: http://uksourcers.co.uk/2012/capital-letters-the-key-to-boolean-success/
Miami University students in laboratory classroom 1908: https://www.flickr.com/photos/muohio_digital_collections/3199691495/