Trends in Use of
Pandora Archive
        Presentation at IIPC Open Day
       The Broad Value of Web Archives

      30th April, 2012, Library of Congress
                  Monica Omodei
Director, Web Archiving and Digital Preservation
           National Library of Australia
            momodei @ nla.gov.au
About the Pandora Archive
       •  Selective, collaborative approach
           –  high value, discrete, timely collecting
           –  a number of partners contribute to Pandora
       •  Targeted Australian content
           –  selection policy, nominations are reviewed
       •  Historical – started 1996
       •  Bibliocentric approach
           –  archived sites/publications are fully catalogued
       •  Publicly accessible
           –  full content keyword search through national resource
              discovery service trove.nla.gov.au
           –  browse is of a reconstituted version of the original site
           –  metadata indexed in Google
Pandora Archive Stats

 •    Size – 6.32 TB
 •    Number of files > 140 million
 •    Number of titles > 30.5K
 •    Number of title instances > 73.5K
Whole domain archive
•  We have also commissioned the IA to crawl
   the .au domain for us annually since 2005

•  Legislation prevents us from making this
   accessible yet

•  Hopefully soon we will be able to allow
   access to researchers
Australian web domain crawls

Year            2005         2006         2007         2008         2009         2011
Files           185 million  596 million  516 million  1 billion    765 million  660 million
Hosts crawled   811,523      1,046,038    1,247,614    3,038,658    1,074,645    1,346,549
Size (TB)       6.69         19.04        18.47        34.55        24.29        30.71
The Bad News
•  We have no legal deposit legislation for electronic
   publications, so permission to archive must be
   obtained
    –  significant content is missed because permission to
       copy is refused
•  QA and fixing process can be labour intensive
    –  technical infrastructure is ten years old
•  Selection guidelines are outdated and don't align
•  Significant content is missed because of resourcing
   constraints and high labour cost
•  Search and browse functionality is very limited
    –  no URL search, no time-based searching
•  Current infrastructure doesn't scale for broader
   themed collections with multiple sites or for
   domain-scale archiving
Glass half full
       •  Situation will improve markedly if Legal Deposit
          provisions are extended to digital publications
         – The Australian Attorney-General has released a
           consultation paper with a model for this extension
       •  Broader coverage will be achieved when
          infrastructure is upgraded, improving scalability
          and reducing labour costs for QA/fixing
         – We have commenced a multi-year Digital Library
           Infrastructure Replacement Project which includes
           upgrading our web archiving tools
         – We are currently trialling Heritrix for collaborative
           thematic collecting, and Wayback for access to our
           commissioned .gov.au sub-domain archive
DLIR Project
•  Digital Library Infrastructure Replacement
•  RFP was followed by RFT for components
   where reasonable solutions had been
   proposed (including core repository)
•  The RFT evaluation recommended
   proceeding to contract negotiations with
   the selected tenderer for each component
•  Currently preparing a submission for
   ministerial approval prior to contract
   negotiations with vendors
Patterns of Use

•  Which archived sites are popular
   and why?
•  Is use of our archive growing?
•  What is the relative interest in
   older vs more recent captures?
•  Who is using our archives?
•  And what for?
Which archived sites are popular?
       •  Data source – filtered, aggregated web
          access log data which counts accesses to
          titles
       •  Examined top 30 archived titles (by number of
          accesses) for each year 2009 to 2012
       •  Selected some to examine and
          speculate as to why they might be
          popular
       •  Included consistently high-ranking titles, and
          ones that were very variable between
          years
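
A minimal sketch of the counting step described above, not the Library's actual tooling. It assumes each filtered log record has already been reduced to a (year, request path) pair and that a title can be recognised from a path of the form /pan/<title id>/...; the path layout, the title IDs and the function name are illustrative assumptions.

```python
import re
from collections import Counter, defaultdict

# Assumed path layout for an archived title: /pan/<title id>/<instance>/...
TITLE_PATH = re.compile(r"^/pan/(\d+)/")

def top_titles_by_year(records, n=30):
    """Return {year: [(title_id, accesses), ...]} for the n most accessed titles."""
    counts = defaultdict(Counter)
    for year, path in records:
        match = TITLE_PATH.match(path)
        if match:  # ignore requests that are not for an archived title
            counts[year][match.group(1)] += 1
    return {year: counter.most_common(n) for year, counter in counts.items()}

# Made-up sample records, for illustration only.
sample = [
    (2009, "/pan/10063/20040120-0000/index.html"),
    (2009, "/pan/10063/20040120-0000/about.html"),
    (2009, "/pan/22222/20080101-0000/index.html"),
    (2010, "/pan/22222/20080101-0000/index.html"),
    (2010, "/robots.txt"),
]
ranking = top_titles_by_year(sample, n=2)
print(ranking)
```

Comparing the per-year rankings side by side is what separates consistently high-ranking titles from ones that vary between years.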
Reasons for popularity of archived version
            •  Were once popular and are now
               decommissioned, particularly if
               domain name continues to exist and
               redirects to the archive
            •  May not be that popular as live sites
               but their live site links prominently to
               Pandora as an archive for their
               content
            •  Popular referencing sources cite the
               archive as well as the live site (if it
               still exists)
Conclusions

•  Be more proactive in identifying
   unresponsive domains
•  Market automatic redirect
   services to web site owners/
   managers
•  Allow Google to index archive
   content for sites which are no
   longer live
Is use of Pandora growing?
[Chart: Annual access figures for Pandora Web Site and Archive]

          NB: robots.txt was not introduced on the site until 2005
          Web site design change in 2008 affected measure downward
Interest in older vs recent content
         •  Filtered access logs by referral
            from the entry page to the archived
            instance

         •  Aggregated accesses by age (year)
            of archived instance

         •  Added number of instances of that
            age in the archive as a reference
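
The three steps above can be sketched as follows. This is an illustration under stated assumptions: each log record is taken to be a (referrer, capture year) pair, and the entry-page prefix, the sample figures and the function names are hypothetical.

```python
from collections import Counter

def accesses_by_instance_year(records, entry_prefix):
    """Count accesses per capture year, keeping only hits referred from an entry page."""
    counts = Counter()
    for referrer, instance_year in records:
        if referrer.startswith(entry_prefix):
            counts[instance_year] += 1
    return counts

def accesses_vs_holdings(access_counts, instance_counts):
    """Pair each capture year with (accesses, instances held) as a reference."""
    years = sorted(set(access_counts) | set(instance_counts))
    return [(y, access_counts.get(y, 0), instance_counts.get(y, 0)) for y in years]

# Made-up sample data, for illustration only.
ENTRY = "http://pandora.nla.gov.au/tep/"  # assumed entry-page prefix
log = [
    (ENTRY + "10063", 2004),
    (ENTRY + "10063", 2004),
    (ENTRY + "22222", 2008),
    ("http://www.google.com/search?q=example", 2008),  # not via an entry page
]
holdings = {2004: 5, 2006: 3, 2008: 10}  # instances held per capture year
table = accesses_vs_holdings(accesses_by_instance_year(log, ENTRY), holdings)
print(table)
```

Keeping the number of instances held next to each access count makes it easier to tell whether an age group draws many accesses simply because the archive holds more captures of that age.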
Age of instances accessed
Who is using the archive?

  •  Online survey linked to from search
     service - approx. 450 respondents

  •  Age, gender, location, education

  •  How did they arrive

  •  What type of information and for
     what purpose

  •  Is it still available on the live web?
But first an anecdote
Article in major newspaper – quote

WE at Spring Loaded are no conspiracy theorists, but
the disappearance of Liberal Party policies is curious.
First went the policy documents. A recent revamp of
the website saw the pre-election press releases go.
But thanks to the National Library of Australia's
Internet archive, many of the policies can be seen at
http://pandora.nla.gov.au. When Spring Loaded asked
about the missing policies, the Liberal Party said there
was nothing untoward.
Examples of lost web sites
• Qantas' own special web site presenting
 its case during the major dispute with
 pilots, engineers and cabin crew unions that
 grounded the airline in 2011
• Jeff Kennett's campaign web site in the
 1999 Victorian State election - the first use
 of the web by a politician during a
 campaign in Australia
About the respondents
How did they arrive?
What information was sought?
What for?
Other questions
•  Did you realise that you were going to enter
   an archived version of a web site, not the live
   one? (60% yes, 40% no)

•  Was the resource you were looking for no
   longer available on the live web? (50-50)

•  Have you visited other web archives? (60%
   yes, 40% no)
Conclusions
•  We need to market our archive better

•  Promote redirects for closing, unsupported
   web sites

•  Convert archives to ARC/WARC so the Memento API
   will find content

•  Allow Google indexing of content for archived
   web sites where live version is extinct or
   substantially altered
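
For context on the Memento point above: a Memento client asks a TimeGate for the capture of a URL nearest a given date by sending an Accept-Datetime request header. The sketch below only builds that header; the TimeGate address and the target site are hypothetical, and nothing here describes how Pandora itself is or will be exposed.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def memento_headers(when):
    """Build request headers asking a TimeGate for the capture nearest `when`."""
    # Accept-Datetime carries an HTTP-date expressed in GMT.
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": stamp}

# Hypothetical TimeGate address; the original site URL is appended to it.
TIMEGATE = "http://timegate.example.org/"
target = TIMEGATE + "http://www.example.com.au/"
headers = memento_headers(datetime(2012, 4, 30, tzinfo=timezone.utc))
print(target, headers)
```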

Más contenido relacionado

La actualidad más candente

Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceMicah Altman
 
Deep Dive Into KBART
Deep Dive Into KBARTDeep Dive Into KBART
Deep Dive Into KBARTNASIG
 
Radicalize Your Library Catalog with Ebooks Your Patrons Can Keep Forever
Radicalize Your Library Catalog with Ebooks Your Patrons Can Keep ForeverRadicalize Your Library Catalog with Ebooks Your Patrons Can Keep Forever
Radicalize Your Library Catalog with Ebooks Your Patrons Can Keep Foreverloriayre
 
20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...
20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...
20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...Andrew Bourgeois
 
ER&L 2022 - Set It and Forget It: Librarian, Publisher, and Vendor Perspectiv...
ER&L 2022 - Set It and Forget It: Librarian, Publisher, and Vendor Perspectiv...ER&L 2022 - Set It and Forget It: Librarian, Publisher, and Vendor Perspectiv...
ER&L 2022 - Set It and Forget It: Librarian, Publisher, and Vendor Perspectiv...Matthew Ragucci
 
Dulin PermaCC Talk for MIT PIS
Dulin PermaCC Talk for MIT PISDulin PermaCC Talk for MIT PIS
Dulin PermaCC Talk for MIT PISMicah Altman
 
Open Source ILS Add-Ons
Open Source ILS Add-OnsOpen Source ILS Add-Ons
Open Source ILS Add-Onsloriayre
 
Charleston 2021 - Hit the ground running - Best practices for navigating cont...
Charleston 2021 - Hit the ground running - Best practices for navigating cont...Charleston 2021 - Hit the ground running - Best practices for navigating cont...
Charleston 2021 - Hit the ground running - Best practices for navigating cont...Matthew Ragucci
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...Martin Klein
 
NISO Plus 2022 - Content Platform Migrations Working Group Update
NISO Plus 2022 - Content Platform Migrations  Working Group UpdateNISO Plus 2022 - Content Platform Migrations  Working Group Update
NISO Plus 2022 - Content Platform Migrations Working Group UpdateMatthew Ragucci
 

La actualidad más candente (10)

Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information Science
 
Deep Dive Into KBART
Deep Dive Into KBARTDeep Dive Into KBART
Deep Dive Into KBART
 
Radicalize Your Library Catalog with Ebooks Your Patrons Can Keep Forever
Radicalize Your Library Catalog with Ebooks Your Patrons Can Keep ForeverRadicalize Your Library Catalog with Ebooks Your Patrons Can Keep Forever
Radicalize Your Library Catalog with Ebooks Your Patrons Can Keep Forever
 
20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...
20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...
20161019-dlc-making-it-happen-together-demonstrating-resilience-thru-successf...
 
ER&L 2022 - Set It and Forget It: Librarian, Publisher, and Vendor Perspectiv...
ER&L 2022 - Set It and Forget It: Librarian, Publisher, and Vendor Perspectiv...ER&L 2022 - Set It and Forget It: Librarian, Publisher, and Vendor Perspectiv...
ER&L 2022 - Set It and Forget It: Librarian, Publisher, and Vendor Perspectiv...
 
Dulin PermaCC Talk for MIT PIS
Dulin PermaCC Talk for MIT PISDulin PermaCC Talk for MIT PIS
Dulin PermaCC Talk for MIT PIS
 
Open Source ILS Add-Ons
Open Source ILS Add-OnsOpen Source ILS Add-Ons
Open Source ILS Add-Ons
 
Charleston 2021 - Hit the ground running - Best practices for navigating cont...
Charleston 2021 - Hit the ground running - Best practices for navigating cont...Charleston 2021 - Hit the ground running - Best practices for navigating cont...
Charleston 2021 - Hit the ground running - Best practices for navigating cont...
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
 
NISO Plus 2022 - Content Platform Migrations Working Group Update
NISO Plus 2022 - Content Platform Migrations  Working Group UpdateNISO Plus 2022 - Content Platform Migrations  Working Group Update
NISO Plus 2022 - Content Platform Migrations Working Group Update
 

Similar a Pandora

Web-Scale Discovery: Post Implementation
Web-Scale Discovery: Post ImplementationWeb-Scale Discovery: Post Implementation
Web-Scale Discovery: Post ImplementationRachel Vacek
 
Building the AAPB: Inter-Institutional Preservation and Access Workflows
Building the AAPB: Inter-Institutional Preservation and Access WorkflowsBuilding the AAPB: Inter-Institutional Preservation and Access Workflows
Building the AAPB: Inter-Institutional Preservation and Access WorkflowsWGBH Media Library and Archives
 
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012lljohnston
 
Cro presentation for library jan13v2
Cro presentation for library jan13v2Cro presentation for library jan13v2
Cro presentation for library jan13v2NeilStewartCity
 
Web Archiving – Lessons and Potential
 Web Archiving – Lessons and Potential Web Archiving – Lessons and Potential
Web Archiving – Lessons and PotentialDaniel Gomes
 
Marc and beyond: 3 Linked Data Choices
 Marc and beyond: 3 Linked Data Choices  Marc and beyond: 3 Linked Data Choices
Marc and beyond: 3 Linked Data Choices Richard Wallis
 
Putting the Pieces Together: Creating a National Educational Television Catalog
Putting the Pieces Together: Creating a National Educational Television CatalogPutting the Pieces Together: Creating a National Educational Television Catalog
Putting the Pieces Together: Creating a National Educational Television CatalogWGBH Media Library and Archives
 
Your Archives - (The National Archives Wiki)
Your Archives - (The National Archives Wiki)Your Archives - (The National Archives Wiki)
Your Archives - (The National Archives Wiki)ALISS
 
Presentation Deep Web Technology.pptx
Presentation Deep Web Technology.pptxPresentation Deep Web Technology.pptx
Presentation Deep Web Technology.pptxmayurbokan
 
Repositioning realignment and the researcher
Repositioning realignment and the researcherRepositioning realignment and the researcher
Repositioning realignment and the researcherLIBER Europe
 
TechFuse 2013 - Break down the walls SharePoint 2013
TechFuse 2013 - Break down the walls SharePoint 2013TechFuse 2013 - Break down the walls SharePoint 2013
TechFuse 2013 - Break down the walls SharePoint 2013Avtex
 
Three Linked Data choices for Libraries
Three Linked Data choices for LibrariesThree Linked Data choices for Libraries
Three Linked Data choices for LibrariesRichard Wallis
 
NISO access related projects (presented at the Charleston conference 2016)
NISO access related projects (presented at the Charleston conference 2016)NISO access related projects (presented at the Charleston conference 2016)
NISO access related projects (presented at the Charleston conference 2016)Christine Stohn
 
Collaborative Web Archiving with Ivy Plus / Borrow Direct
Collaborative Web Archiving with Ivy Plus / Borrow Direct Collaborative Web Archiving with Ivy Plus / Borrow Direct
Collaborative Web Archiving with Ivy Plus / Borrow Direct Anna Perricci
 
The development of web archiving 3
The development of web archiving 3The development of web archiving 3
The development of web archiving 3Essam Obaid
 
ENGL 1221 McManus
ENGL 1221 McManusENGL 1221 McManus
ENGL 1221 McManusTraciwm
 
Web and Twitter Archiving at the Library of Congress
Web and Twitter Archiving at the Library of CongressWeb and Twitter Archiving at the Library of Congress
Web and Twitter Archiving at the Library of Congressnullhandle
 

Similar a Pandora (20)

Web-Scale Discovery: Post Implementation
Web-Scale Discovery: Post ImplementationWeb-Scale Discovery: Post Implementation
Web-Scale Discovery: Post Implementation
 
Building the AAPB: Inter-Institutional Preservation and Access Workflows
Building the AAPB: Inter-Institutional Preservation and Access WorkflowsBuilding the AAPB: Inter-Institutional Preservation and Access Workflows
Building the AAPB: Inter-Institutional Preservation and Access Workflows
 
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012
 
Cro presentation for library jan13v2
Cro presentation for library jan13v2Cro presentation for library jan13v2
Cro presentation for library jan13v2
 
Web Archiving – Lessons and Potential
 Web Archiving – Lessons and Potential Web Archiving – Lessons and Potential
Web Archiving – Lessons and Potential
 
Marc and beyond: 3 Linked Data Choices
 Marc and beyond: 3 Linked Data Choices  Marc and beyond: 3 Linked Data Choices
Marc and beyond: 3 Linked Data Choices
 
Putting the Pieces Together: Creating a National Educational Television Catalog
Putting the Pieces Together: Creating a National Educational Television CatalogPutting the Pieces Together: Creating a National Educational Television Catalog
Putting the Pieces Together: Creating a National Educational Television Catalog
 
Your Archives - (The National Archives Wiki)
Your Archives - (The National Archives Wiki)Your Archives - (The National Archives Wiki)
Your Archives - (The National Archives Wiki)
 
Scaling up to archive the UK Web. Helen Hockx-Yu
Scaling up to archive the UK Web. Helen Hockx-YuScaling up to archive the UK Web. Helen Hockx-Yu
Scaling up to archive the UK Web. Helen Hockx-Yu
 
Presentation Deep Web Technology.pptx
Presentation Deep Web Technology.pptxPresentation Deep Web Technology.pptx
Presentation Deep Web Technology.pptx
 
Repositioning realignment and the researcher
Repositioning realignment and the researcherRepositioning realignment and the researcher
Repositioning realignment and the researcher
 
TechFuse 2013 - Break down the walls SharePoint 2013
TechFuse 2013 - Break down the walls SharePoint 2013TechFuse 2013 - Break down the walls SharePoint 2013
TechFuse 2013 - Break down the walls SharePoint 2013
 
Three Linked Data choices for Libraries
Three Linked Data choices for LibrariesThree Linked Data choices for Libraries
Three Linked Data choices for Libraries
 
NISO access related projects (presented at the Charleston conference 2016)
NISO access related projects (presented at the Charleston conference 2016)NISO access related projects (presented at the Charleston conference 2016)
NISO access related projects (presented at the Charleston conference 2016)
 
Collaborative Web Archiving with Ivy Plus / Borrow Direct
Collaborative Web Archiving with Ivy Plus / Borrow Direct Collaborative Web Archiving with Ivy Plus / Borrow Direct
Collaborative Web Archiving with Ivy Plus / Borrow Direct
 
Winter, Chandler, Biedenbach, Pearson, and Stanton, "It’s Only as Good as the...
Winter, Chandler, Biedenbach, Pearson, and Stanton, "It’s Only as Good as the...Winter, Chandler, Biedenbach, Pearson, and Stanton, "It’s Only as Good as the...
Winter, Chandler, Biedenbach, Pearson, and Stanton, "It’s Only as Good as the...
 
The development of web archiving 3
The development of web archiving 3The development of web archiving 3
The development of web archiving 3
 
Web decay and Internet Archive
Web decay and Internet ArchiveWeb decay and Internet Archive
Web decay and Internet Archive
 
ENGL 1221 McManus
ENGL 1221 McManusENGL 1221 McManus
ENGL 1221 McManus
 
Web and Twitter Archiving at the Library of Congress
Web and Twitter Archiving at the Library of CongressWeb and Twitter Archiving at the Library of Congress
Web and Twitter Archiving at the Library of Congress
 

Más de National Library of Australia

Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...
Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...
Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...National Library of Australia
 
CHG recipient case study - Julia Mant of the National Institute of Dramatic Art
CHG recipient case study - Julia Mant of the National Institute of Dramatic ArtCHG recipient case study - Julia Mant of the National Institute of Dramatic Art
CHG recipient case study - Julia Mant of the National Institute of Dramatic ArtNational Library of Australia
 
Just Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office VictoriaJust Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office VictoriaNational Library of Australia
 
Trove - a window to our community heritage - Hilary Berthon of Trove, NLA
Trove - a window to our community heritage - Hilary Berthon of Trove, NLATrove - a window to our community heritage - Hilary Berthon of Trove, NLA
Trove - a window to our community heritage - Hilary Berthon of Trove, NLANational Library of Australia
 
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...National Library of Australia
 
Assessing Significance and Significance 2.0: an introduction - Margaret Birt...
 Assessing Significance and Significance 2.0: an introduction - Margaret Birt... Assessing Significance and Significance 2.0: an introduction - Margaret Birt...
Assessing Significance and Significance 2.0: an introduction - Margaret Birt...National Library of Australia
 
Assessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania ClearyAssessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania ClearyNational Library of Australia
 
Publicity, Media & Completing your CHG project - 2017 - Fran D'Castro
Publicity, Media & Completing your CHG project - 2017 - Fran D'CastroPublicity, Media & Completing your CHG project - 2017 - Fran D'Castro
Publicity, Media & Completing your CHG project - 2017 - Fran D'CastroNational Library of Australia
 
Just Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office VictoriaJust Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office VictoriaNational Library of Australia
 
TROVE - a window to our community heritage - Hilary Berthon of Trove, NLA
TROVE - a window to our community heritage - Hilary Berthon of Trove, NLATROVE - a window to our community heritage - Hilary Berthon of Trove, NLA
TROVE - a window to our community heritage - Hilary Berthon of Trove, NLANational Library of Australia
 
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...National Library of Australia
 
CHG recipient case study - Donna Bailey of the Catholic Diocese of Sandhurst
CHG recipient case study - Donna Bailey of the Catholic Diocese of SandhurstCHG recipient case study - Donna Bailey of the Catholic Diocese of Sandhurst
CHG recipient case study - Donna Bailey of the Catholic Diocese of SandhurstNational Library of Australia
 
Assessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania ClearyAssessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania ClearyNational Library of Australia
 
Significance Assessment and Significance 2.0: an introduction - Veronica Bull...
Significance Assessment and Significance 2.0: an introduction - Veronica Bull...Significance Assessment and Significance 2.0: an introduction - Veronica Bull...
Significance Assessment and Significance 2.0: an introduction - Veronica Bull...National Library of Australia
 
Just digitise it - Daniel Wilksch of the Public Records Office Victoria
Just digitise it - Daniel Wilksch of the Public Records Office VictoriaJust digitise it - Daniel Wilksch of the Public Records Office Victoria
Just digitise it - Daniel Wilksch of the Public Records Office VictoriaNational Library of Australia
 

Más de National Library of Australia (20)

Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...
Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...
Publicity and media - Anna Gressier & Sarah Kleven (Communications and Market...
 
CHG recipient case study - Julia Mant of the National Institute of Dramatic Art
CHG recipient case study - Julia Mant of the National Institute of Dramatic ArtCHG recipient case study - Julia Mant of the National Institute of Dramatic Art
CHG recipient case study - Julia Mant of the National Institute of Dramatic Art
 
Completing your CHG project - Fran D'Castro
Completing your CHG project - Fran D'CastroCompleting your CHG project - Fran D'Castro
Completing your CHG project - Fran D'Castro
 
Just Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office VictoriaJust Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office Victoria
 
Trove - a window to our community heritage - Hilary Berthon of Trove, NLA
Trove - a window to our community heritage - Hilary Berthon of Trove, NLATrove - a window to our community heritage - Hilary Berthon of Trove, NLA
Trove - a window to our community heritage - Hilary Berthon of Trove, NLA
 
National Archives of Australia
National Archives of AustraliaNational Archives of Australia
National Archives of Australia
 
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
 
Assessing Significance and Significance 2.0: an introduction - Margaret Birt...
 Assessing Significance and Significance 2.0: an introduction - Margaret Birt... Assessing Significance and Significance 2.0: an introduction - Margaret Birt...
Assessing Significance and Significance 2.0: an introduction - Margaret Birt...
 
Preservation Needs Assessment - Tamara Lavrencic
Preservation Needs Assessment  - Tamara LavrencicPreservation Needs Assessment  - Tamara Lavrencic
Preservation Needs Assessment - Tamara Lavrencic
 
Assessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania ClearyAssessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania Cleary
 
Publicity, Media & Completing your CHG project - 2017 - Fran D'Castro
Publicity, Media & Completing your CHG project - 2017 - Fran D'CastroPublicity, Media & Completing your CHG project - 2017 - Fran D'Castro
Publicity, Media & Completing your CHG project - 2017 - Fran D'Castro
 
Just Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office VictoriaJust Digitise It - Daniel Wilksch of the Public Records Office Victoria
Just Digitise It - Daniel Wilksch of the Public Records Office Victoria
 
TROVE - a window to our community heritage - Hilary Berthon of Trove, NLA
TROVE - a window to our community heritage - Hilary Berthon of Trove, NLATROVE - a window to our community heritage - Hilary Berthon of Trove, NLA
TROVE - a window to our community heritage - Hilary Berthon of Trove, NLA
 
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
Disaster Prevention, Preparedness, Response and Recovery for Collections - Ki...
 
CHG recipient case study - Donna Bailey of the Catholic Diocese of Sandhurst
CHG recipient case study - Donna Bailey of the Catholic Diocese of SandhurstCHG recipient case study - Donna Bailey of the Catholic Diocese of Sandhurst
CHG recipient case study - Donna Bailey of the Catholic Diocese of Sandhurst
 
Preservation Needs Assessment - Tamara Lavrencic
Preservation Needs Assessment - Tamara LavrencicPreservation Needs Assessment - Tamara Lavrencic
Preservation Needs Assessment - Tamara Lavrencic
 
Assessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania ClearyAssessing the significance of cultural heritage - Tania Cleary
Assessing the significance of cultural heritage - Tania Cleary
 
Significance Assessment and Significance 2.0: an introduction - Veronica Bull...
Significance Assessment and Significance 2.0: an introduction - Veronica Bull...Significance Assessment and Significance 2.0: an introduction - Veronica Bull...
Significance Assessment and Significance 2.0: an introduction - Veronica Bull...
 
Preservation assessment - Tamara Lavrencic
Preservation assessment - Tamara LavrencicPreservation assessment - Tamara Lavrencic
Preservation assessment - Tamara Lavrencic
 
Just digitise it - Daniel Wilksch of the Public Records Office Victoria
Just digitise it - Daniel Wilksch of the Public Records Office VictoriaJust digitise it - Daniel Wilksch of the Public Records Office Victoria
Just digitise it - Daniel Wilksch of the Public Records Office Victoria
 

Pandora

  • 1. Trends in Use of Pandora Archive. Presentation at IIPC Open Day, "The Broad Value of Web Archives", 30th April 2012, Library of Congress. Monica Omodei, Director, Web Archiving and Digital Preservation, National Library of Australia. momodei @ nla.gov.au
  • 2. About the Pandora Archive
      • Selective, collaborative approach
          – high-value, discrete, timely collecting
          – a number of partners contribute to Pandora
      • Targeted Australian content
          – selection policy; nominations are reviewed
      • Historical: started 1996
      • Bibliocentric approach
          – archived sites/publications are fully catalogued
      • Publicly accessible
          – full-content keyword search through the national resource discovery service trove.nla.gov.au
          – browsing shows a reconstituted version of the original site
          – metadata indexed in Google
  • 3. Pandora Archive Stats
      • Size: 6.32 TB
      • Number of files > 140 million
      • Number of titles > 30.5K
      • Number of title instances > 73.5K
  • 4. Whole domain archive
      • We have also commissioned the Internet Archive (IA) to crawl the .au domain for us annually since 2005
      • Legislation does not yet allow us to make this archive accessible
      • We hope to be able to allow access to researchers soon
  • 5. Australian web domain crawls
      Year            2005         2006         2007         2008         2009         2011
      Files           185 million  596 million  516 million  1 billion    765 million  660 million
      Hosts crawled   811,523      1,046,038    1,247,614    3,038,658    1,074,645    1,346,549
      Size (TB)       6.69         19.04        18.47        34.55        24.29        30.71
  • 11. The Bad News
      • We have no legal deposit legislation for electronic publications, so permission to archive must be obtained
          – significant content missed because permission to copy was refused
      • QA and fixing process can be labour intensive
          – technical infrastructure is ten years old
      • Selection guidelines are outdated and don't align
      • Significant content missed because of resourcing constraints and high labour cost
      • Search and browse functionality very limited
          – no URL search, no time-based searching
      • Current infrastructure doesn't scale for broader themed collections with multiple sites or for domain-scale archiving
  • 12. Glass half full
      • Situation will improve markedly if Legal Deposit provisions are extended to digital publications
          – The Australian Attorney-General has released a consultation paper with a model for this extension
      • Broader coverage will be achieved when infrastructure is upgraded, improving scalability and reducing labour costs for QA/fixing
          – We have commenced a multi-year Digital Library Infrastructure Replacement Project, which includes upgrading our web archiving tools
          – We are currently trialling Heritrix for collaborative thematic collecting, and Wayback for access to our commissioned .gov.au sub-domain archive
  • 13. DLIR Project
      • Digital Library Infrastructure Replacement
      • RFP was followed by RFT for components where reasonable solutions had been proposed (including core repository)
      • The RFT evaluation recommended proceeding to contract negotiations with the selected tenderer for each component
      • Currently preparing a submission for ministerial approval prior to contract negotiations with vendors
  • 14. Patterns of Use
      • Which archived sites are popular, and why?
      • Is use of our archive growing?
      • What is the relative interest in older vs more recent captures?
      • Who is using our archives?
      • And what for?
  • 15. Which archived sites are popular?
      • Data source: filtered, aggregated web access log data that counts accesses to titles
      • Examined the top 30 archived titles (by number of accesses) for each year from 2009 to 2012
      • Selected some to examine and speculate as to why they might be popular
      • Included titles that ranked consistently high, and ones whose rank varied widely between years
  • 16. Reasons for popularity of archived version
      • Sites were once popular and are now decommissioned, particularly if the domain name continues to exist and redirects to the archive
      • Sites may not be that popular as live sites, but the live site links prominently to Pandora as an archive for its content
      • Popular referencing sources cite the archive as well as the live site (if it still exists)
  • 27. Conclusions
      • Be more proactive in identifying unresponsive domains
      • Market automatic redirect services to web site owners/managers
      • Allow Google to index archive content for sites which are no longer live
  • 28. Is use of Pandora growing? [Chart: annual access figures for the Pandora web site and archive.] NB: robots.txt was not introduced on the site until 2005. A web site design change in 2008 lowered the measured figures.
  • 29. Interest in older vs recent content
      • Filtered access logs by referral from the entry page to the archived instance
      • Aggregated accesses by age (year) of archived instance
      • Added the number of instances of that age in the archive as a reference
  • 30. Age of instances accessed
  • 31. Who is using the archive?
      • Online survey linked to from the search service: approx. 450 respondents
      • Age, gender, location, education
      • How did they arrive?
      • What type of information, and for what purpose?
      • Is it still available on the live web?
  • 32. But first an anecdote. Article in a major newspaper, quoted: "WE at Spring Loaded are no conspiracy theorists, but the disappearance of Liberal Party policies is curious. First went the policy documents. A recent revamp of the website saw the pre-election press releases go. But thanks to the National Library of Australia's Internet archive, many of the policies can be seen at http://pandora.nla.gov.au When Spring Loaded asked about the missing policies, the Liberal Party said there was nothing untoward."
  • 33. Examples of lost web sites
      • Qantas's own special web site presenting its case during the major dispute with pilots, engineers and cabin crew unions that grounded the airline in 2011
      • Jeff Kennett's campaign web site in the 1999 Victorian State election, the first use of the web by a politician during a campaign in Australia
  • 38. How did they arrive?
  • 41. Other questions
      • Did you realise that you were going to enter an archived version of a web site, not the live one? (60% yes, 40% no)
      • Was the resource you were looking for no longer available on the live web? (50-50)
      • Have you visited other web archives? (60% yes, 40% no)
  • 42. Conclusions
      • We need to market our archive better
      • Promote redirects for closing, unsupported web sites
      • Convert archives to ARC/WARC so the Memento API will find content
      • Allow Google indexing of content for archived web sites where the live version is extinct or substantially altered
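
The log analysis outlined on slides 15 and 29 (top titles per year, and accesses grouped by the year of the archived instance) could be sketched roughly as follows. This is only an illustration, not the tooling used for the presentation: the record layout `(access_year, title_id, capture_year)`, the title identifiers, the function names and the sample numbers are all hypothetical, and the records are assumed to have already been filtered (for example, to exclude robots).

```python
from collections import Counter, defaultdict

# Hypothetical, pre-filtered access records:
# (year of access, archived title id, year the accessed instance was captured).
records = [
    (2009, "t1", 2004), (2009, "t1", 2004), (2009, "t2", 2008),
    (2010, "t2", 2008), (2010, "t3", 2009), (2010, "t2", 2004),
]

# Hypothetical count of archived instances per capture year,
# used as the reference series described on slide 29.
instances_per_year = {2004: 10, 2008: 20, 2009: 5}


def top_titles_by_year(records, n=30):
    """Return the n most-accessed titles for each access year."""
    counts = defaultdict(Counter)
    for access_year, title_id, _capture_year in records:
        counts[access_year][title_id] += 1
    return {year: c.most_common(n) for year, c in counts.items()}


def accesses_by_capture_year(records, instances_per_year):
    """Pair the number of accesses with the number of archived
    instances for each capture year, sorted by year."""
    accesses = Counter(capture_year for _, _, capture_year in records)
    years = sorted(set(accesses) | set(instances_per_year))
    return [(y, accesses.get(y, 0), instances_per_year.get(y, 0)) for y in years]


if __name__ == "__main__":
    print(top_titles_by_year(records, n=1))
    print(accesses_by_capture_year(records, instances_per_year))
```

Reporting the instance count alongside the access count matters because a capture year with many accesses may simply hold many instances; the pairing lets the two be compared.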