Using Web Archives to Enrich the Live
Web Experience Through Storytelling	
Yasmin AlNoamany
University of California, Berkeley
Web Science & Digital Libraries Research Group at ODU
@yasmina_anwar
@WebSciDL
Research Funded by IMLS LG-71-15-0077-15
csv,conf,v3
Portland, OR, 2017-05-03
My son, Yousof, was 2 on
January 17, 2011
https://www.facebook.com/elshaheeed.co.uk/
2
No worries! Multiple initiatives for
documenting the Egyptian Revolution
403 photos with information about the lives of the martyrs
3,525 images and 2,387 videos posted by people for the January demonstrations
artwork produced during the Egyptian Revolution
3
Several studies and books about the
Egyptian Revolution
4
These repositories do not exist any more!
5
Luckily these sites are archived at Archive-It
in the Egyptian Revolution collection
http://wayback.archive-it.org/2358/20110314134348/http://iamjan25.com/
http://wayback.archive-it.org/2358/20110211072306/http://1000memories.com/egypt/
https://wayback.archive-it.org/2358/20111128095924/http://iamtahrir.com/
6
Archived collections are important
for posterity, but there are
problems with archived collections
7
After 10 years, Yousof knows
about Archive-It
> 3,500 collections
~340 institutions
> 10B archived pages
Archive-It, a subscription-based service, hosts curated web collections
8
There is more than one collection about
the Egyptian Revolution
•  “2010-2011 Arab Spring” https://archive-it.org/collections/3101
•  “North Africa & the Middle East 2011-2013” https://archive-it.org/collections/2349
•  “Egypt Revolution and Politics” https://archive-it.org/collections/2358
9
Current browsing and searching services for
the “Egypt Revolution and Politics” collection
10
Current browsing and searching services for
the “Egypt Revolution and Politics” collection
11
Current browsing and searching services for
the “Egypt Revolution and Politics” collection
12
Collection understanding and collection
summarization are not currently supported
Not easy to answer “what’s in that collection?” or
“how is this collection different from others?”
13
Our early attempts at collection understanding
tried to include everything…
14
“Visualizing digital collections at Archive-It”, JCDL 2012.
http://ws-dl.blogspot.com/2012/08/2012-08-10-ms-thesis-visualizing.html
1000s of seeds X 1000s of archived pages ==
Conventional Vis Methods Not Applicable
15
Idea:
Storytelling
16
Stories in literature
Story elements: setting, characters, sequence, exposition, conflict, climax, resolution
Once upon a time
http://www.learner.org/interactives/story/
17
Stories in social media
“It's hard to define a story, but I know it when I see it” (Alexander, 2008)
basically, just arranging web pages in time
18
“Storytelling” is becoming a popular
technique in social media 
19
What are the limitations of
storytelling services?
20
The Egyptian Revolution on Storify
21
Bookmarking, not preserving! 
22
Use an interface people already know how to use
to summarize collections
Archived collections
Storytelling services
Archived enriched stories
23
Hand-crafted stories to summarize the
Egyptian Revolution collection for Yousof
https://storify.com/yasmina_anwar/the-egyptian-revolution-on-archive-it-collection
https://storify.com/yasmina_anwar/the-story-of-the-egyptian-revolution-from-archive-
24
How do we generate this automatically?
25
Collections have two dimensions:
{Fixed, Sliding} X {Page, Time}
[Figure: mementos plotted by URI and Time, at times t1, t2, …, tk]
26
Fixed Page, Fixed Time
A desktop Chrome user-agent
http://www.cnn.com/2014/02/24/world/africa/egypt-politics/index.html?hpt=wo_c2
Android Chrome user-agent
http://www.cnn.com/2014/02/24/world/africa/egypt-politics/index.html?hpt=wo_c2
Schneider and McCown, “First Steps in Archiving the Mobile Web: Automated Discovery of Mobile Websites”, JCDL 2013.
Kelly et al. “A Method for Identifying Personalized Representations in Web Archives”, D-Lib Magazine 2013.
27
Fixed Page, Sliding Time
Memento dates: Feb 1, Feb 1, Feb 2, Feb 4, Feb 5, Feb 7, Feb 9, Feb 11, Feb 11
28
Sliding Page, Fixed Time
Feb. 11, 2011: Mubarak resigns
29
Sliding Page, Sliding Time
Memento dates: Jan 25, Jan 27, Jan 31, Feb 2, Feb 4, Feb 7, Feb 10, Feb 11, Feb 11
30
The Dark and Stormy Archives (DSA) framework
Establish a baseline
•  Characteristics of human-generated stories
•  Characteristics of Archive-It collections
Reduce the candidate pool of archived pages
•  Exclude duplicates
•  Exclude off-topic pages
•  Exclude non-English language pages
Select good representative pages
•  Dynamically slice the collection
•  Cluster the pages in each slice
•  Select high-quality pages from each cluster
•  Order pages by time
•  Visualize
31
https://pbs.twimg.com/media/BQcpj7ACMAAHRp4.jpg
Establish a baseline of
social media stories
“Characteristics of Social Media Stories”, TPDL 2015, IJDL 2016.
32
What is the length of a story
(the number of resources per story)?
This story has 31 resources
33
What are the types of resources that
compose a story?
Quotes
Video
This story has
•  19 quotes
•  8 images
•  4 videos
34
We found that 28 mementos is a good
number for the resources in the stories.
35
Detecting off-topic pages
“Detecting Off-Topic Pages in Web Archives”, TPDL 2015, IJDL 2016.
36
More than 60% of archived copies of
hamdeensabahy.com are off-topic
May 13, 2012: The page started as on-topic.
May 24, 2012: Off-topic due to a database error.
Mar. 21, 2013: Not working because of financial problems.
May 21, 2013: On-topic again.
June 5, 2014: The site has been hacked.
Oct. 10, 2014: The domain has expired.
http://wayback.archive-it.org/2358/*/http://hamdeensabahy.com
37
Based on evaluating 6 similarity methods, we applied the best
performing method to automatically detect off-topic pages
May 13, 2012: The page started as on-topic.
May 24, 2012: Off-topic due to a database error.
Mar. 21, 2013: Not working because of financial problems.
May 21, 2013: On-topic again.
June 5, 2014: The site has been hacked.
Oct. 10, 2014: The domain has expired.
http://wayback.archive-it.org/2358/*/http://hamdeensabahy.com
38
9 mementos for news.egypt.com,
but 5 are duplicates
39
Selecting representative pages for
generating stories
40
“Generating Stories from Archived Collections”, WebSci 2017.
Quality metrics for selecting mementos
•  In the DSA, memento quality Mq is calculated as follows:

   Mq = (1 − wm*Dm) + wql*Sql + wqc*Sqc

•  Dm is the memento damage (Brunelle, JCDL 2014)
•  Sql is the snippet quality based on the URI level
•  Sqc is the snippet quality based on the URI category
•  wm, wql, wqc are the weights of memento damage, level, and category
41
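A minimal Python sketch of the Mq formula above. The formula itself is from the slide; the default weights, the function names, and the candidate scores are hypothetical values chosen only for illustration, since the slides do not state the weights the DSA uses.

```python
def memento_quality(damage, level_score, category_score,
                    w_damage=0.4, w_level=0.3, w_category=0.3):
    """Mq = (1 - wm*Dm) + wql*Sql + wqc*Sqc.

    The default weights are placeholders, not the values used in the DSA.
    """
    return ((1 - w_damage * damage)
            + w_level * level_score
            + w_category * category_score)


def best_memento(candidates):
    """Return the candidate dict with the highest Mq."""
    return max(
        candidates,
        key=lambda c: memento_quality(c["damage"], c["level"], c["category"]),
    )


if __name__ == "__main__":
    # Hypothetical scores for two mementos in the same cluster.
    cluster = [
        {"uri": "memento-a", "damage": 0.8, "level": 0.5, "category": 0.5},
        {"uri": "memento-b", "damage": 0.1, "level": 0.5, "category": 0.5},
    ]
    print(best_memento(cluster)["uri"])
```

With level and category scores held equal, the less damaged memento gets the higher Mq, which matches the preference stated on the next slide.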
We prefer a higher quality memento (Dm)
http://wayback.archive-it.org/2358/20110201231457/http://news.blogs.cnn.com/category/world/egypt-world-latest-news/
http://wayback.archive-it.org/2358/20110201231622/http://www.bbc.co.uk/news/world/middle_east/
42
Brunelle et al. Not All Mementos Are Created Equal: Measuring The Impact Of Missing Resources, JCDL 2014
We prefer pages with attractive snippets
https://wayback.archive-it.org/2358/20110207193404/http://news.blogs.cnn.com/2011/02/07/egypt-crisis-country-to-auction-treasury-bills/
https://wayback.archive-it.org/2358/20110207194425/http://www.cnn.com/2011/WORLD/africa/02/07/egypt.google.executive/index.html?hpt=T1
43
Visualizing stories in Storify
44
“Generating Stories from Archived Collections”, WebSci 2017.
We extract the metadata of the pages
and order them chronologically
{
  "elements": [
    {
      "permalink": "http://wayback.archive-it.org/694/20070523182134/http://www.usatoday.com/news/nation/2007-04-16-virginia-tech_N.htm",
      "type": "link",
      "source": {
        "href": "http://www.usatoday.com",
        "name": "www.usatoday.com @ 23, May 2007"
      }
    },
    {
      "permalink": "http://wayback.archive-it.org/694/20070530182159/http://www.time.com/time/specials/2007/vatech_victims",
      "type": "link",
      "source": {
        "href": "http://www.time.com",
        "name": "www.time.com @ 30, May 2007"
      }
    },
    {
      "permalink": "http://wayback.archive-it.org/694/20070530182206/http://www.collegiatetimes.com/",
      "type": "link",
      "source": {
        "href": "http://www.collegiatetimes.com",
        "name": "www.collegiatetimes.com @ 30, May 2007"
      }
    },
    {
      "permalink": "http://wayback.archive-it.org/694/20070606234248/http://hokies416.wordpress.com/",
      "type": "link",
      "source": {
        "href": "http://hokies416.wordpress.com",
        "name": "hokies416.wordpress.com @ 06, Jun 2007"
      }
    },
    …
    {
      "permalink": "http://wayback.archive-it.org/694/20070620234329/http://www.hokiesports.com/april16/",
      "type": "link",
      "source": {
        "href": "http://www.hokiesports.com",
        "name": "www.hokiesports.com @ 20, Jun 2007"
      }
    }
  ],
  "description": "This is an automatically generated story from Archive-It collection.",
  "title": "April 16 Archive"
}
45
Using the Storify API, we
override the default
metadata to generate more
attractive snippets
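A story document shaped like the JSON above can be assembled from memento URIs alone, because each Archive-It URI on these slides embeds a 14-digit capture datetime and the original URI. The sketch below is an illustration under that assumption; the function names are hypothetical, and it only builds the dictionary, it does not call the Storify API.

```python
import json
import re
from datetime import datetime
from urllib.parse import urlsplit

# Assumed URI-M layout, as seen on the slides:
#   http://wayback.archive-it.org/<collection id>/<YYYYMMDDhhmmss>/<original URI>
URI_M = re.compile(r"^https?://wayback\.archive-it\.org/(\d+)/(\d{14})/(.+)$")


def story_element(uri_m):
    """Return (capture datetime, story element) for one memento URI."""
    match = URI_M.match(uri_m)
    if match is None:
        raise ValueError(f"not an Archive-It memento URI: {uri_m}")
    _, stamp, original = match.groups()
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    host = urlsplit(original).netloc
    element = {
        "permalink": uri_m,
        "type": "link",
        "source": {
            "href": f"http://{host}",
            # Title with date, e.g. "www.usatoday.com @ 23, May 2007"
            "name": f"{host} @ {captured.strftime('%d, %b %Y')}",
        },
    }
    return captured, element


def build_story(uri_ms, title, description):
    """Order the elements chronologically and wrap them in a story dict."""
    dated = sorted((story_element(u) for u in uri_ms), key=lambda pair: pair[0])
    return {
        "elements": [element for _, element in dated],
        "description": description,
        "title": title,
    }


if __name__ == "__main__":
    story = build_story(
        [
            "http://wayback.archive-it.org/694/20070530182206/http://www.collegiatetimes.com/",
            "http://wayback.archive-it.org/694/20070523182134/http://www.usatoday.com/news/nation/2007-04-16-virginia-tech_N.htm",
        ],
        title="April 16 Archive",
        description="This is an automatically generated story from Archive-It collection.",
    )
    print(json.dumps(story, indent=2))
```

Sorting on the datetime parsed from the URI keeps the ordering independent of the order in which the pages were selected. The month abbreviation produced by `%b` depends on the process locale.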
Example of an automatically generated story
46	
Notice the good
metadata: images,
titles with dates,
favicons
Evaluating the Dark and Stormy
Archive framework
(how good are the automatically generated stories?)
47
Evaluation is tricky!
(two perfectly good stories could have non-overlapping k=28
elements!)
•  Successful evaluation means:
   •  Human and DSA stories are indistinguishable
   •  Human and DSA stories are better than Random
•  We use human evaluators (via Amazon's Mechanical Turk) to compare:
   •  Human-generated stories
   •  DSA (automatically) generated stories
   •  Randomly generated stories
48
Our guidelines for expert archivists at Archive-It
for generating stories from the collections
49
A sample HIT
50
DSA == Human
(Human,DSA) > Random
[Bar chart: Percentage (0 to 100) for the Automatic/Human, Automatic/Random, and Human/Random comparisons; legend: Automatic, Human, Random]
51
Understanding the archived collections is
important for posterity
52
Understanding the archived collections is
important for posterity
Thanks mom for the DSA!
53
Use an interface people already know how to use
to summarize collections
Archived collections
Storytelling services
Archived enriched stories
54
All the code, datasets, papers, slides, etc.:
http://bit.ly/YasminPhD
@yasmina_anwar
Backup Slides
55
We sample k mementos from N pages of the
collection (k << N) to create a summary story
Collection Y
Collection Z
Collection X
56
How do we automatically detect
off-topic pages?
57
Textual content
cosine similarity, intersection of the most frequent terms,
Jaccard similarity

Method            Similarity
cosine            0.7
TF-Intersection   0.6
Jaccard           0.5

Method            Similarity
cosine            0.0
TF-Intersection   0.0
Jaccard           0.0
58
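The three textual measures named above can be sketched in a few lines of Python. This is an illustration, not the implementation evaluated in the talk: the tokenizer, the use of raw term counts for cosine similarity, the `top_n` cutoff, and the normalization of the term-frequency intersection are all assumptions.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine of the angle between raw term-count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def jaccard_similarity(a, b):
    """Shared distinct terms divided by all distinct terms."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def tf_intersection(a, b, top_n=20):
    """Share of a's most frequent terms that are also among b's."""
    top_a = {t for t, _ in Counter(tokens(a)).most_common(top_n)}
    top_b = {t for t, _ in Counter(tokens(b)).most_common(top_n)}
    return len(top_a & top_b) / len(top_a) if top_a else 0.0


if __name__ == "__main__":
    first = "Egypt protests continue in Tahrir Square"
    later = "Account suspended: contact your hosting provider"
    for measure in (cosine_similarity, tf_intersection, jaccard_similarity):
        print(measure.__name__, round(measure(first, later), 2))
```

In each case the first memento of a URI serves as the reference text and a later memento is compared against it, so two texts with no terms in common score 0.0 on all three measures, as in the second table.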
Semantics of the text
Web based kernel function using the search engine (SE)

Method      Similarity
SE-Kernel   0.7
59
Sahami and Heilman, A Web-based Kernel Function for Measuring the Similarity of Short Text Snippets, WWW 2006
Structural methods
no. of words, content-length

Word counts: 100 and 109
Method      % change
WordCount   0.09

Word counts: 100 and 5
Method      % change
WordCount   -0.95
60
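The two word-count figures above follow from a relative change against the first memento. A small sketch, assuming whitespace-delimited words and hypothetical function names:

```python
def word_count(text):
    """Number of whitespace-delimited words in a page's text."""
    return len(text.split())


def relative_change(first_count, later_count):
    """Change of a later memento relative to the first one.

    Negative values mean the later memento lost content.
    """
    if first_count == 0:
        raise ValueError("the first memento has no words to compare against")
    return (later_count - first_count) / first_count


if __name__ == "__main__":
    print(relative_change(100, 109))  # a page that grew slightly
    print(relative_change(100, 5))    # a page that lost almost all of its text
```

The same function applies to content length in bytes, the other structural measure listed on the slide.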
We built a gold standard data set to
evaluate the methods

61
We manually labeled 15,760 mementos

Collection                               URI-Rs   URI-Ms   Off-topic URI-Ms
Egypt Revolution and Politics            136      6,886    384
Occupy Movement                          255      6,570    458
Columbia Univ. Human Rights collection   198      2,304    94
62
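As a quick check of the arithmetic, the per-collection counts on this slide can be totaled in Python. The counts are copied from the slide; the variable and function names are only illustrative.

```python
# (URI-Rs, URI-Ms, off-topic URI-Ms) per collection, copied from the slide.
GOLD_STANDARD = {
    "Egypt Revolution and Politics": (136, 6886, 384),
    "Occupy Movement": (255, 6570, 458),
    "Columbia Univ. Human Rights collection": (198, 2304, 94),
}


def summarize(collections):
    """Return total mementos, total off-topic mementos, and per-collection shares."""
    total_mementos = sum(uri_ms for _, uri_ms, _ in collections.values())
    total_off_topic = sum(off for _, _, off in collections.values())
    shares = {
        name: off / uri_ms
        for name, (_, uri_ms, off) in collections.items()
    }
    return total_mementos, total_off_topic, shares


if __name__ == "__main__":
    total, off_topic, shares = summarize(GOLD_STANDARD)
    print(total, off_topic)
    for name, share in shares.items():
        print(f"{name}: {share:.1%} off-topic")
```

The URI-M counts sum to the 15,760 labeled mementos stated in the slide title.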
Evaluated 6 methods on 15,760 manually
labeled mementos
•  Textual content
   •  cosine similarity
   •  intersection of the most frequent terms
   •  Jaccard similarity
•  Semantics of the text
   •  Web based kernel function using the search engine (SE)
•  Structural methods
   •  no. of words
   •  content-length
63
“Detecting Off-Topic Pages in Web Archives”, TPDL 2015, IJDL 2016.
Cosine Similarity performed well
64

Similarity Measure    Threshold     FP   FN    FP+FN   ACC     F1      AUC
Cosine|WordCount      0.10|-0.85    24   10    34      0.987   0.906   0.968
Cosine|SEKernel       0.10|0.00     6    35    40      0.990   0.901   0.934
Cosine                0.15          31   22    53      0.983   0.881   0.961
WordCount|SEKernel    -0.80|0.00    14   27    42      0.985   0.818   0.885
WordCount             -0.85         6    44    50      0.982   0.806   0.870
SEKernel              0.05          64   83    147     0.965   0.683   0.865
Bytes                 -0.65         28   133   161     0.962   0.584   0.746
Jaccard               0.05          74   86    159     0.962   0.538   0.809
TF-Intersection       0.00          49   104   153     0.967   0.537   0.740
Average precision of 0.89 on 18 different
Archive-It collections
65
(Cosine, WordCount) with (0.10, -0.85) thresholds
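A hedged sketch of how the two thresholds might be applied to one memento. The threshold values come from the slide; the rule for combining the two measures does not, so the sketch assumes a memento is flagged when either score falls below its threshold and exposes that choice as a parameter.

```python
COSINE_THRESHOLD = 0.10       # from the slide
WORD_COUNT_THRESHOLD = -0.85  # from the slide


def is_off_topic(cosine, word_count_change, combine=any):
    """Flag a memento from its scores against the first memento of its URI.

    `combine=any` flags the memento when either measure is below its
    threshold; pass `combine=all` to require both. Which rule the DSA
    uses is not stated on the slides, so `any` is an assumption.
    """
    checks = (
        cosine < COSINE_THRESHOLD,
        word_count_change < WORD_COUNT_THRESHOLD,
    )
    return combine(checks)


if __name__ == "__main__":
    # Scores taken from the example slides: a similar page and an emptied page.
    print(is_off_topic(cosine=0.7, word_count_change=0.09))
    print(is_off_topic(cosine=0.0, word_count_change=-0.95))
```

Keeping the scores as inputs lets the same function be reused with any similarity implementation.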
Evaluating the DSA
66
MT experiment setup
•  Three HITs for each story (69 HITs to evaluate 23 stories); two comparisons per HIT:
   •  HIT1: human vs. automatic, human vs. poor
   •  HIT2: human vs. random, human vs. poor
   •  HIT3: random vs. automatic, automatic vs. poor
•  15 distinct turkers with master qualification (i.e., high acceptance rate) for each HIT
•  We rejected the submissions that contained poorly generated stories and the HITs that were completed in less than 10 seconds (mean time per HIT = 7 minutes)
•  989 out of 1,035 (69*15) valid HITs
•  We awarded the turker $0.50 per HIT
67
https://www.mturk.com/mturk/help?helpPage=worker#what_is_master_worker
We prefer deep links over
high level domains (Sql)
Feb. 11, 2011: the homepage of BBC on Storify
https://wayback.archive-it.org/2358/20110211191429/http://www.bbc.co.uk/
Feb. 11, 2011: the homepage of BBC Middle East section on Storify
https://wayback.archive-it.org/2358/20110211191942/http://www.bbc.co.uk/news/world/middle_east/
Feb. 11, 2011: the article of BBC on Storify
https://wayback.archive-it.org/2358/20110211192204/http://www.bbc.co.uk/news/world-middle-east-12433045
68
Social media pages may not produce
good snippets (Sqc)
http://wayback.archive-it.org/1784/20100131023240/http:/twitter.com/Haitifeed/
http://wayback.archive-it.org/2358/20141225080305/https:/www.facebook.com/elshaheeed.co.uk
69
How do we dynamically divide the
collections into appropriate slices?
(in other words, how do we pick just 28?)
70
“Generating Stories from Archived Collections”, WebSci 2017.
We expected most collections to look like this…
[Scatter plot: URIs (0 to 120) against Memento-Datetime (2012)]
The Global Food Crisis collection at Archive-It
71
Presentation opportunities for sharing resources with now’s learning reposito...annalarmstrong
 

csvconfyasmin2017_05_03

  • 1. Using Web Archives to Enrich the Live Web Experience Through Storytelling Yasmin AlNoamany University of California, Berkeley Web Science & Digital Libraries Research Group at ODU @yasmina_anwar @WebSciDL Research Funded by IMLS LG-71-15-0077-15 csv,conf,v3 Portland, OR, 2017-05-03
  • 2. My son, Yousof, was 2 on January 17, 2011 https://www.facebook.com/elshaheeed.co.uk/
  • 3. No worries! Multiple initiatives for documenting the Egyptian Revolution
      •  403 photos with information about the lives of the martyrs
      •  3,525 images and 2,387 videos posted by people for the January demonstrations
      •  artwork produced during the Egyptian Revolution
  • 4. Several studies and books about the Egyptian Revolution
  • 5. These repositories do not exist any more!
  • 6. Luckily these sites are archived at Archive-It in the Egyptian Revolution collection
      http://wayback.archive-it.org/2358/20110314134348/http://iamjan25.com/
      http://wayback.archive-it.org/2358/20110211072306/http://1000memories.com/egypt/
      https://wayback.archive-it.org/2358/20111128095924/http://iamtahrir.com/
  • 7. Archived collections are important for posterity, but there are problems with archived collections
  • 8. After 10 years, Yousof knows about Archive-It
      •  > 3,500 collections
      •  ~340 institutions
      •  > 10B archived pages
      Archive-It, a subscription-based service, hosts curated web collections
  • 9. There is more than one collection about the Egyptian Revolution
      •  “2010-2011 Arab Spring” https://archive-it.org/collections/3101
      •  “North Africa & the Middle East 2011-2013” https://archive-it.org/collections/2349
      •  “Egypt Revolution and Politics” https://archive-it.org/collections/2358
  • 10. Current browsing and searching services for the “Egypt Revolution and Politics” collection
  • 11. Current browsing and searching services for the “Egypt Revolution and Politics” collection
  • 12. Current browsing and searching services for the “Egypt Revolution and Politics” collection
  • 13. Collection understanding and collection summarization are not currently supported. It is not easy to answer “what’s in that collection?” or “how is this collection different from others?”
  • 14. Our early attempts at collection understanding tried to include everything… “Visualizing digital collections at Archive-It”, JCDL 2012. http://ws-dl.blogspot.com/2012/08/2012-08-10-ms-thesis-visualizing.html
  • 15. 1000s of seeds X 1000s of archived pages == Conventional Vis Methods Not Applicable
  • 18. Stories in social media: “It's hard to define a story, but I know it when I see it” (Alexander, 2008). Basically, just arranging web pages in time
  • 19. “Storytelling” is becoming a popular technique in social media
  • 20. What are the limitations of storytelling services?
  • 21. The Egyptian Revolution on Storify
  • 23. Use an interface people already know how to use to summarize collections [Figure labels: Archived collections; Storytelling services; Archived enriched stories]
  • 24. Hand-crafted stories to summarize the Egyptian Revolution collection for Yousof
      https://storify.com/yasmina_anwar/the-egyptian-revolution-on-archive-it-collection
      https://storify.com/yasmina_anwar/the-story-of-the-egyptian-revolution-from-archive-
  • 25. How do we generate this automatically?
  • 26. Collections have two dimensions: {Fixed, Sliding} X {Page, Time} [Figure: axes URI and Time, with captures at times t1, t2, …, tk]
  • 27. Fixed Page, Fixed Time
      •  A desktop Chrome user-agent: http://www.cnn.com/2014/02/24/world/africa/egypt-politics/index.html?hpt=wo_c2
      •  Android Chrome user-agent: http://www.cnn.com/2014/02/24/world/africa/egypt-politics/index.html?hpt=wo_c2
      Schneider and McCown, “First Steps in Archiving the Mobile Web: Automated Discovery of Mobile Websites”, JCDL 2013.
      Kelly et al., “A Method for Identifying Personalized Representations in Web Archives”, D-Lib Magazine 2013.
  • 28. Fixed Page, Sliding Time [Figure: captures dated Feb 1, Feb 1, Feb 2, Feb 4, Feb 5, Feb 7, Feb 9, Feb 11, Feb 11]
  • 31. The Dark and Stormy Archives (DSA) framework
      •  Establish a baseline
          •  Characteristics of human-generated stories
          •  Characteristics of Archive-It collections
      •  Reduce the candidate pool of archived pages
          •  Exclude duplicates
          •  Exclude off-topic pages
          •  Exclude non-English language pages
      •  Select good representative pages
          •  Dynamically slice the collection
          •  Cluster the pages in each slice
          •  Select high-quality pages from each cluster
          •  Order pages by time
          •  Visualize
      https://pbs.twimg.com/media/BQcpj7ACMAAHRp4.jpg
  • 32. Establish a baseline of social media stories. “Characteristics of Social Media Stories”, TPDL 2015, IJDL 2016.
  • 33. What is the length of a story (the number of resources per story)? This story has 31 resources
  • 34. What are the types of resources that compose a story? This story has
      •  19 quotes
      •  8 images
      •  4 videos
  • 35. We found that 28 mementos is a good number for the resources in the stories.
  • 37. More than 60% of archived copies of hamdeensabahy.com are off-topic
      •  May 13, 2012: The page started as on-topic.
      •  May 24, 2012: Off-topic due to a database error.
      •  Mar. 21, 2013: Not working because of financial problems.
      •  May 21, 2013: On-topic again.
      •  June 5, 2014: The site has been hacked.
      •  Oct. 10, 2014: The domain has expired.
      http://wayback.archive-it.org/2358/*/http://hamdeensabahy.com
  • 38. Based on evaluating 6 similarity methods, we applied the best performing method to automatically detect off-topic pages
      •  May 13, 2012: The page started as on-topic.
      •  May 24, 2012: Off-topic due to a database error.
      •  Mar. 21, 2013: Not working because of financial problems.
      •  May 21, 2013: On-topic again.
      •  June 5, 2014: The site has been hacked.
      •  Oct. 10, 2014: The domain has expired.
      http://wayback.archive-it.org/2358/*/http://hamdeensabahy.com
  • 39. 9 mementos for news.egypt.com, but 5 are duplicates
  • 40. Selecting representative pages for generating stories. “Generating Stories from Archived Collections”, WebSci 2017.
  • 41. Quality metrics for selecting mementos
      •  In the DSA, memento quality M_q is calculated as follows: M_q = (1 − w_m * D_m) + w_ql * S_ql + w_qc * S_qc
      •  D_m is the memento damage (Brunelle, JCDL 2014)
      •  S_ql is the snippet quality based on the URI level
      •  S_qc is the snippet quality based on the URI category
      •  w_m, w_ql, w_qc are the weights of memento damage, level, and category
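
The quality formula on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Memento` class, its field names, and the default weights are hypothetical (the slides give no weight values), and the three scores are assumed to be already computed and scaled to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Memento:
    """Hypothetical container for the three scores used on the slide."""
    uri: str
    damage: float          # D_m: memento damage, assumed in [0, 1]
    level_score: float     # S_ql: snippet quality from the URI level
    category_score: float  # S_qc: snippet quality from the URI category


def memento_quality(m, w_m=0.4, w_ql=0.3, w_qc=0.3):
    """M_q = (1 - w_m * D_m) + w_ql * S_ql + w_qc * S_qc.

    The default weights are placeholders, not values from the talk.
    """
    return (1 - w_m * m.damage) + w_ql * m.level_score + w_qc * m.category_score


def best_memento(cluster, **weights):
    """Pick the highest-quality memento from one cluster of candidates."""
    return max(cluster, key=lambda m: memento_quality(m, **weights))
```

With these placeholder weights, a lightly damaged deep link outranks a heavily damaged homepage, which matches the preferences stated on the following slides.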
  • 42. We prefer a higher quality memento (D_m)
      http://wayback.archive-it.org/2358/20110201231457/http://news.blogs.cnn.com/category/world/egypt-world-latest-news/
      http://wayback.archive-it.org/2358/20110201231622/http://www.bbc.co.uk/news/world/middle_east/
      Brunelle et al., “Not All Mementos Are Created Equal: Measuring The Impact Of Missing Resources”, JCDL 2014
  • 43. We prefer pages with attractive snippets
      https://wayback.archive-it.org/2358/20110207193404/http://news.blogs.cnn.com/2011/02/07/egypt-crisis-country-to-auction-treasury-bills/
      https://wayback.archive-it.org/2358/20110207194425/http://www.cnn.com/2011/WORLD/africa/02/07/egypt.google.executive/index.html?hpt=T1
  • 44. Visualizing stories in Storify. “Generating Stories from Archived Collections”, WebSci 2017.
  • 45. We extract the metadata of the pages and order them chronologically. Using the Storify API, we override the default metadata to generate more attractive snippets.
      {
        "elements": [
          { "permalink": "http://wayback.archive-it.org/694/20070523182134/http://www.usatoday.com/news/nation/2007-04-16-virginia-tech_N.htm",
            "type": "link",
            "source": { "href": "http://www.usatoday.com", "name": "www.usatoday.com @ 23, May 2007" } },
          { "permalink": "http://wayback.archive-it.org/694/20070530182159/http://www.time.com/time/specials/2007/vatech_victims",
            "type": "link",
            "source": { "href": "http://www.time.com", "name": "www.time.com @ 30, May 2007" } },
          { "permalink": "http://wayback.archive-it.org/694/20070530182206/http://www.collegiatetimes.com/",
            "type": "link",
            "source": { "href": "http://www.collegiatetimes.com", "name": "www.collegiatetimes.com @ 30, May 2007" } },
          { "permalink": "http://wayback.archive-it.org/694/20070606234248/http://hokies416.wordpress.com/",
            "type": "link",
            "source": { "href": "http://hokies416.wordpress.com", "name": "hokies416.wordpress.com @ 06, Jun 2007" } },
          …
          { "permalink": "http://wayback.archive-it.org/694/20070620234329/http://www.hokiesports.com/april16/",
            "type": "link",
            "source": { "href": "http://www.hokiesports.com", "name": "www.hokiesports.com @ 20, Jun 2007" } }
        ],
        "description": "This is an automatically generated story from Archive-It collection.",
        "title": "April 16 Archive"
      }
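
A payload shaped like the one on this slide could be assembled roughly as follows. This is a sketch, not the DSA code: the helper names `to_element` and `build_story` are hypothetical, the regular expression assumes mementos follow the `wayback.archive-it.org/<collection>/<14-digit timestamp>/<original URI>` pattern seen in the slides, and sending the result to Storify is deliberately left out.

```python
import json
import re
from datetime import datetime
from urllib.parse import urlparse

# Assumed URI-M layout: collection id, 14-digit capture timestamp, original URI.
URI_M = re.compile(r"^https?://wayback\.archive-it\.org/(\d+)/(\d{14})/(.+)$")


def to_element(uri_m):
    """Turn one memento URI into (capture datetime, story element)."""
    match = URI_M.match(uri_m)
    if match is None:
        raise ValueError(f"not an Archive-It memento URI: {uri_m}")
    _collection, stamp, original = match.groups()
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    host = urlparse(original).netloc
    element = {
        "permalink": uri_m,
        "type": "link",
        "source": {
            "href": f"http://{host}",
            # Same "host @ day, month year" label style as on the slide.
            "name": f"{host} @ {captured.strftime('%d, %b %Y')}",
        },
    }
    return captured, element


def build_story(uri_ms, title, description):
    """Order the elements by capture time and wrap them in a story dict."""
    dated = sorted((to_element(u) for u in uri_ms), key=lambda pair: pair[0])
    return {
        "elements": [element for _, element in dated],
        "description": description,
        "title": title,
    }


if __name__ == "__main__":
    story = build_story(
        [
            "http://wayback.archive-it.org/694/20070530182206/http://www.collegiatetimes.com/",
            "http://wayback.archive-it.org/694/20070523182134/http://www.usatoday.com/news/nation/2007-04-16-virginia-tech_N.htm",
        ],
        title="April 16 Archive",
        description="This is an automatically generated story from Archive-It collection.",
    )
    print(json.dumps(story, indent=2))
```

Sorting on the capture timestamp rather than on the URI string keeps the order chronological even when mementos come from different hosts.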
  • 46. Example of an automatically generated story. Notice the good metadata: images, titles with dates, favicons
  • 47. Evaluating the Dark and Stormy Archive framework (how good are the automatically generated stories?)
  • 48. Evaluation is tricky! (two perfectly good stories could have non-overlapping k=28 elements!)
      •  Successful evaluation means:
          •  Human and DSA stories are indistinguishable
          •  Human and DSA stories are better than Random
      •  We use human evaluators (via Amazon's Mechanical Turk) to compare:
          •  Human-generated stories
          •  DSA (automatically) generated stories
          •  Randomly generated stories
  • 49. Our guidelines for expert archivists at Archive-It for generating stories from the collections
  • 51. DSA == Human; (Human, DSA) > Random [Figure: percentage (0 to 100) for Automatic, Human, and Random stories across the comparisons Automatic/Human, Automatic/Random, and Human/Random]
  • 52. Understanding the archived collections is important for posterity
  • 53. Understanding the archived collections is important for posterity. Thanks mom for the DSA!
  • 54. Use an interface people already know how to use to summarize collections [Figure labels: Archived collections; Storytelling services; Archived enriched stories] All the code, datasets, papers, slides, etc.: http://bit.ly/YasminPhD @yasmina_anwar
  • 56. We sample k mementos from N pages of the collection (k << N) to create a summary story [Figure labels: Collection X; Collection Y; Collection Z]
  • 57. How do we automatically detect off-topic pages?
  • 58. Textual content: cosine similarity, intersection of the most frequent terms, Jaccard similarity
      First example:
          Method            Similarity
          cosine            0.7
          TF-Intersection   0.6
          Jaccard           0.5
      Second example:
          Method            Similarity
          cosine            0.0
          TF-Intersection   0.0
          Jaccard           0.0
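
The three textual measures named on this slide can be sketched as below, comparing a later capture against the first capture of the same page. This is a simplified illustration, not the evaluated implementation: it assumes raw term counts (no TF-IDF weighting, stemming, or stop-word removal), a naive tokenizer, and a hypothetical normalization for the term intersection (shared top-k terms divided by the size of the first text's top-k set).

```python
import math
import re
from collections import Counter


def tokens(text):
    """Naive lowercase tokenizer (an assumption; no stemming or stop words)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity of raw term-count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Size of the shared vocabulary over the size of the combined vocabulary."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def tf_intersection(a, b, k=20):
    """Share of a's k most frequent terms that are also among b's top k."""
    top_a = {t for t, _ in Counter(tokens(a)).most_common(k)}
    top_b = {t for t, _ in Counter(tokens(b)).most_common(k)}
    return len(top_a & top_b) / len(top_a) if top_a else 0.0
```

A capture that has turned into an error page shares almost no vocabulary with the first capture, so all three scores fall to 0.0, as in the second example on the slide.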
  • 59. Semantics of the text: Web-based kernel function using the search engine (SE)
          Method       Similarity
          SE-Kernel    0.7
      Sahami and Heilman, “A Web-based Kernel Function for Measuring the Similarity of Short Text Snippets”, WWW 2006
  • 60. Structural methods: no. of words, content-length
      Word count 100 → 109:
          Method       % change
          WordCount    0.09
      Word count 100 → 5:
          Method       % change
          WordCount    -0.95
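
The word-count figures on this slide are consistent with a relative change against the first capture, (later − first) / first. A minimal sketch, assuming whitespace-separated words and hypothetical function names:

```python
def word_count(text):
    """Count whitespace-separated words (a simplifying assumption)."""
    return len(text.split())


def relative_change(first, later):
    """Relative change of a later capture against the first capture."""
    if first == 0:
        raise ValueError("the first capture has no words to compare against")
    return (later - first) / first


if __name__ == "__main__":
    print(round(relative_change(100, 109), 2))  # slide example: 100 -> 109 words
    print(round(relative_change(100, 5), 2))    # slide example: 100 -> 5 words
```

The same function applies to content-length by passing byte counts instead of word counts.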
  • 61. We built a gold standard data set to evaluate the methods
  • 62. We manually labeled 15,760 mementos
      •  Egypt Revolution and Politics: URI-Rs: 136, URI-Ms: 6,886, Off-topic URI-Ms: 384
      •  Occupy Movement: URI-Rs: 255, URI-Ms: 6,570, Off-topic URI-Ms: 458
      •  Columbia Univ. Human Rights collection: URI-Rs: 198, URI-Ms: 2,304, Off-topic URI-Ms: 94
  • 63. Evaluated 6 methods on the 15,760 manually labeled mementos
      •  Textual content
          •  cosine similarity
          •  intersection of the most frequent terms
          •  Jaccard similarity
      •  Semantics of the text
          •  Web-based kernel function using the search engine (SE)
      •  Structural methods
          •  no. of words
          •  content-length
      “Detecting Off-Topic Pages in Web Archives”, TPDL 2015, IJDL 2016.
  • 64. Cosine Similarity performed well
      Similarity Measure    Threshold     FP    FN    FP+FN   ACC     F1      AUC
      Cosine|WordCount      0.10|-0.85    24    10    34      0.987   0.906   0.968
      Cosine|SEKernel       0.10|0.00     6     35    40      0.990   0.901   0.934
      Cosine                0.15          31    22    53      0.983   0.881   0.961
      WordCount|SEKernel    -0.80|0.00    14    27    42      0.985   0.818   0.885
      WordCount             -0.85         6     44    50      0.982   0.806   0.870
      SEKernel              0.05          64    83    147     0.965   0.683   0.865
      Bytes                 -0.65         28    133   161     0.962   0.584   0.746
      Jaccard               0.05          74    86    159     0.962   0.538   0.809
      TF-Intersection       0.00          49    104   153     0.967   0.537   0.740
  • 65. Average precision of 0.89 on 18 different Archive-It collections, using (Cosine, WordCount) with (0.10, -0.85) thresholds
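
One way to read the (Cosine, WordCount) pair with its (0.10, -0.85) thresholds is as a two-part test on scores computed against the first capture of a page. The sketch below is an assumption-laden illustration: the slides do not say how the two measures are combined, so the OR rule here is a guess, and the function names and input tuple layout are hypothetical.

```python
def is_off_topic(cosine_sim, word_count_change,
                 cosine_threshold=0.10, word_count_threshold=-0.85):
    """Flag a memento when either score falls below its threshold.

    Thresholds come from the slide; combining them with OR is an assumption.
    """
    return (cosine_sim < cosine_threshold
            or word_count_change < word_count_threshold)


def filter_on_topic(scored):
    """Keep URIs whose (cosine, word-count change) scores pass both tests.

    `scored` is an iterable of (uri, cosine_sim, word_count_change) tuples.
    """
    return [uri for uri, cos, wc in scored if not is_off_topic(cos, wc)]
```

Applied to the example values from the earlier slides, a capture scoring (0.7, 0.09) is kept, while one scoring (0.0, -0.95) is excluded.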
  • 67. MT experiment setup
      •  Three HITs for each story (69 HITs to evaluate 23 stories); two comparisons per HIT:
          •  HIT1: human vs. automatic, human vs. poor
          •  HIT2: human vs. random, human vs. poor
          •  HIT3: random vs. automatic, automatic vs. poor
      •  15 distinct turkers with master qualification (i.e., high acceptance rate) for each HIT
      •  We rejected the submissions that contained poorly generated stories and the HITs that were completed in less than 10 seconds (mean time per HIT = 7 minutes)
      •  989 out of 1,035 (69*15) valid HITs
      •  We awarded the turker $0.50 per HIT
      https://www.mturk.com/mturk/help?helpPage=worker#what_is_master_worker
  • 68. We prefer deep links over high level domains (Sql) Feb. 11, 2011: the homepage of BBC on Storify Feb. 11, 2011: the homepage of BBC Middle East secVon on Storify Feb. 11, 2011: the arVcle of BBC on Storify hQps://wayback.archive-it.org/2358/20110211191429/hQp://www.bbc.co.uk/ hQps://wayback.archive-it.org/2358/20110211192204/hQp://www.bbc.co.uk/news/world-middle-east-12433045 hQps://wayback.archive-it.org/2358/20110211191942/hQp://www.bbc.co.uk/news/world/middle_east/ 68
  • 69. Social media pages may not produce good snippets (Sqc) http://wayback.archive-it.org/1784/20100131023240/http:/twitter.com/Haitifeed/ http://wayback.archive-it.org/2358/20141225080305/https:/www.facebook.com/elshaheeed.co.uk
  • 70. How do we dynamically divide the collections into appropriate slices? (in other words, how do we pick just 28?) "Generating Stories from Archived Collections", WebSci 2017.
  • 71. We expected most collections to look like this… [Figure: scatter plot of URIs (y-axis, 0 to 120) against Memento-Datetime (x-axis, 2012) for the Global Food Crisis collection at Archive-It]