Project to Production: The Web Archiving Service
1. Project to Production: The Web
Archiving Service
DLF Fall Forum, 2010
Tracy Seneca
University of California Curation Center, California Digital Library
3. Today:
• Whirlwind tour
• Project to production
• Current / future work
• The DLF community
4. Build unique archives
for local research communities
Geographically focused archives
support local research
• Los Angeles
• Monterey Bay
• Orange County
• San Diego
• Santa Barbara
Topical archives support special
research collections
• Guantanamo Bay / Tamiment Library
• California Water Districts / Water
Resources Center Archives
6. Analyze Site Change
Are there new
documents on this
site?
Are there documents
in my archive that
have been removed
from the live site?
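Both questions come down to comparing two sets of URLs. A minimal sketch, assuming each capture (or the live site) can be reduced to a list of document URLs; the function name and the sample URLs are hypothetical and are not part of WAS:

```python
def compare_captures(earlier_urls, later_urls):
    """Return (added, removed) URL sets between two listings of one site."""
    earlier, later = set(earlier_urls), set(later_urls)
    added = later - earlier      # documents that appear only in the later listing
    removed = earlier - later    # documents that appear only in the earlier listing
    return added, removed


# Hypothetical listings: what the archive holds vs. what the live site serves now.
archived = ["http://example.org/", "http://example.org/report-2008.pdf"]
live = ["http://example.org/", "http://example.org/report-2010.pdf"]

added, removed = compare_captures(archived, live)
print("New on the site:", sorted(added))
print("Removed from the live site:", sorted(removed))
```

Passing the archived listing first and the live listing second answers both slide questions in one call.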
7. Easy to Use : Simple Workflow
8. Tailor Capture Settings
Focus on site, directory
or page
Set appropriate
frequency
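One way to picture these two settings is as a scope test plus a schedule. This is an illustrative sketch only: the scope names, the frequency table, and both function names are assumptions, not the actual WAS configuration.

```python
from datetime import date, timedelta
from urllib.parse import urlsplit

# Hypothetical frequency labels mapped to a number of days between captures.
FREQUENCY_DAYS = {"daily": 1, "weekly": 7, "monthly": 30}


def in_scope(seed, candidate, scope):
    """Decide whether candidate falls within the chosen scope of the seed URL."""
    s, c = urlsplit(seed), urlsplit(candidate)
    if s.netloc.lower() != c.netloc.lower():
        return False                      # a different host is never in scope
    if scope == "site":
        return True                       # anything on the same host
    if scope == "directory":
        directory = s.path.rsplit("/", 1)[0] + "/"
        return c.path.startswith(directory)
    if scope == "page":
        return c.path == s.path           # only the seed page itself
    raise ValueError(f"unknown scope: {scope}")


def next_capture(last_capture, frequency):
    """Return the date of the next capture for a given frequency label."""
    return last_capture + timedelta(days=FREQUENCY_DAYS[frequency])


seed = "http://example.org/water/index.html"
print(in_scope(seed, "http://example.org/water/reports/a.pdf", "directory"))
print(in_scope(seed, "http://example.org/news/x.html", "directory"))
print(next_capture(date(2010, 11, 1), "weekly"))
```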
11. Automatic List of Sites
12. Search Across All Sites or Target Just One
13. WAS Snapshot: October 2010
Stats: Since January 2007
17 organizations
4,509 sites captured
33,619 captures run
21.7 terabytes
29 archives published
14. Project to Production
• Scale
– Infrastructure: ensure continuity of service
through server upgrades
• Policy
– Setting up service agreements
15. Engage the users
(Needs assessment doesn’t stop)
16. “Production” is not just for us!
• How does WAS fit into existing workflows?
• Is there a ‘web archivist’? Student workers?
Friday November 5th 10-11:30 am
Curators from UC San Diego, NYU, University
of Michigan and UC Berkeley share their
approaches to web archiving.
http://was.cdlib.org
WAS News
17. Archiving the State of California
Learning from the archives
California State Agency Sites
Chart categories: with robots, no robots, gone
Chart values: 2%, 47%, 51%
307 sites
1363 captures
2.7 TB
Began October 2008
18. Archiving the Gulf oil spill
Improving support for collaboration
527 sites
9288+ captures
1.6 TB
Began May 5
19. LSU tags relevant sites in Delicious
CDL imports Delicious JSON feed into WAS
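The import step can be sketched as reading a JSON feed of bookmarks and turning it into a de-duplicated list of seed URLs. This is a rough illustration, not the CDL import code: it assumes each feed entry uses the short keys "u" (URL), "d" (title) and "t" (tags), and the function name, tag filter, and sample entries are all hypothetical.

```python
import json


def seeds_from_feed(feed_text, required_tag=None):
    """Parse a bookmark feed and return unique seed records, optionally by tag."""
    entries = json.loads(feed_text)
    seeds, seen = [], set()
    for entry in entries:
        url = entry.get("u")              # assumed key for the bookmarked URL
        tags = entry.get("t", [])         # assumed key for the tag list
        if not url or url in seen:
            continue                      # skip empty and duplicate bookmarks
        if required_tag is not None and required_tag not in tags:
            continue                      # keep only sites tagged as relevant
        seen.add(url)
        seeds.append({"url": url, "title": entry.get("d", ""), "tags": tags})
    return seeds


# Hypothetical feed content standing in for the real bookmarks.
sample_feed = json.dumps([
    {"u": "http://example.gov/spill", "d": "Agency response", "t": ["oilspill", "agency"]},
    {"u": "http://example.gov/spill", "d": "Agency response (again)", "t": ["oilspill"]},
    {"u": "http://example.org/other", "d": "Unrelated", "t": ["misc"]},
    {"u": "http://example.edu/research", "d": "Research page", "t": ["oilspill"]},
])

for seed in seeds_from_feed(sample_feed, required_tag="oilspill"):
    print(seed["url"], "-", seed["title"])
```

Filtering on a tag lets subject experts keep bookmarking in a tool they already use, while only the sites they mark as relevant become capture candidates.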
20. Interoperability of CDL Services
22. DLF Community
Tools
• WAS
– UC libraries, NYU, Stanford
• Archive-It
– Columbia, Rice, Indiana…
• Other
– Library of Congress, North Texas, Los Alamos
Common issues/developments
• Memento
• Policy
– Can / should Facebook block government pages from capture?
• Collection
• Collaboration
• Imagination
23. In-depth demos: Nov 8th and 16th
Workflows discussion: Nov 5th
• http://www.facebook.com/webarchiving
• http://was.cdlib.org > WAS News
Editor's Notes
I’ve talked about this issue, as it applies to WAS, a couple of times before. Look back at older presentations to see how we guessed we were going to support collaboration. How much has happened as we expected? What has happened that we didn’t anticipate at all?
Archive sites not just to preserve static content, but to see things you couldn’t see on the live web.
You determine what’s relevant to you
See WAS handout for basic details. In use by UC campus libraries, Stanford University, NYU, Minnesota Historical Society, Water Resources Center Archives. University of Michigan Bentley Historical Library coming on board in July. 81 archives under construction (public access available to 25). Archiving all State of California government agency sites. “Contact Us” link from WAS website if you have questions; video demos and user guides available. Available both within and beyond the UC.
Curator controls the appearance & description of public archives
Browseable list of sites automatically provided. Any descriptive information you provided for a site would display here as well.
Search by keyword or URL. Limit results to a particular site. Limit results by file type (PDF, HTML, images, audio, video, MS Office).
These figures include 2007 – 2008 pilot activity. Very high number of dark archives reflects:
• Caution in making content public
• Very active new users whose content is still embargoed for rights considerations
• Some may be combined after migration to SOLR
“Web resources” means entire web sites, sections of web sites pertaining to the spill, and individual resources such as patent information for blowout preventers. 117 of these sites were also included in the 2005 Hurricane Katrina Web archive. That archive is not yet publicly available; we hope to provide access concurrent with the oil spill archive. 400+ of these sites were selected by Louisiana State University subject experts.
• Gave subject experts access to WAS: # site nominations: 0
• Gave subject experts access to external site nomination tool: # nominations: 6
• Pulled librarian-nominated sites from Delicious: 400+
CDL’s Digital Preservation Repository currently provides preservation services for the Web Archiving Service. Forthcoming migration will improve curatorial access to preservation reports and features and improve the fundamental design of WAS preservation. CDL’s eScholarship service has already migrated to Merritt storage. IMLS grant proposal to create a link between eScholarship and WAS to scrape, harvest and preserve all cited URLs in faculty publications. Preserve and provide access to cited references along with publications.