Presentation given by Gill Hamilton (Systems Librarian, National Library of Scotland) and Tarik Rahman (Repository developer, National Library of Scotland) to colleagues from the National Library of Scotland (NLS) and the National Archives of Scotland (NAS) on the NLS's plans to use METS as a means of ingest and dissemination of digital and other objects in the Library's Fedora repository.
Thank you. Gill Hamilton, Systems Librarian, [email_address]; Tarik Rahman, Repository Developer, [email_address]; National Library of Scotland, Information Systems Development Team.
In the beginning was the word... and it was written on stone, paper, etc. The paper world: description is well understood, with established standards; rights are implied (it's in the library, you need a card); preservation is implied (it's paper or stone).
Kinda easy: hundreds of years of experience. We understand the preservation of paper, we use standard metadata schemas, and we use standard rules for description.
Digital world: more accessible, more structure, more rights, more provenance, more metadata overall.
Descriptive metadata: information about the intellectual content of a digital object, which is used to aid identification and discovery of the object by the researcher.
Structural metadata: information about the relationships between digital objects, which can be very complex in a large hybrid personal archive. Structural metadata also supports the display and navigation of digital objects by users.
Administrative metadata: information needed by the repository for the long-term management of a digital object, including information about an object's creation, technical information such as file formats, provenance information and information about intellectual property rights.
More description: technical metadata.
Hub document: the XML schema provides the vocabulary and syntax to identify and describe the components that make up a digital (or non-digital) object, specifies the location of these components, and expresses their structural relationships.

METS provides an XML schema-based specification for encoding "hub" documents for materials whose content is digital. A "hub" document draws together potentially dispersed but related files and data. METS uses XML to provide a vocabulary and syntax for identifying the digital components that together comprise a digital object, for specifying the location of these components, and for expressing their structural relationships. The digital components comprising a digital object could include the content files, the descriptive metadata, and the administrative metadata. METS can be used for the transfer (SIP in OAIS terms), dissemination (DIP in OAIS terms) and/or archiving (AIP in OAIS terms) of digital objects. The UCB Library will ultimately use it for all three purposes.

METS is a flexible standard (some say too flexible). It is used to encode digital objects (DOs), to express the complex links and relationships between associated DOs and groups of DOs, and to manage complex and compound objects. Relationship to OAIS: METS is designed to fulfil the OAIS Information Package (IP) concept. It is a standard for exchange and transmission between repositories, it assists in display, archiving and exchange (it encapsulates the object), and it can be used to manage the lifecycle of a DO.
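To make the "hub" idea concrete, here is a minimal sketch in Python (standard library `xml.etree.ElementTree` only) that assembles the three things a hub document ties together: descriptive metadata (a MODS title wrapped in a `dmdSec`), the location of a content file (`fileSec` and `FLocat`), and the structure (`structMap`). The IDs, title and URL are hypothetical, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for METS, MODS and XLink.
METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace (Clark notation)."""
    return f"{{{METS_NS}}}{tag}"


def build_hub_document(title: str, image_href: str) -> ET.Element:
    """Build a minimal METS 'hub' document for a single-image object."""
    mets = ET.Element(m("mets"))

    # Descriptive metadata: a MODS record wrapped inside a dmdSec.
    dmd = ET.SubElement(mets, m("dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, m("mdWrap"), MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, m("xmlData"))
    mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title

    # Location of the content file: fileSec -> fileGrp -> file -> FLocat.
    file_sec = ET.SubElement(mets, m("fileSec"))
    grp = ET.SubElement(file_sec, m("fileGrp"), USE="master")
    f = ET.SubElement(grp, m("file"), ID="FILE1", MIMETYPE="image/jpeg")
    ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": image_href})

    # Structure: one root div that points at the file and at the description.
    smap = ET.SubElement(mets, m("structMap"), TYPE="physical")
    div = ET.SubElement(smap, m("div"), TYPE="photograph", DMDID="DMD1")
    ET.SubElement(div, m("fptr"), FILEID="FILE1")
    return mets


if __name__ == "__main__":
    doc = build_hub_document("Example photograph", "http://example.org/photo.jpg")
    print(ET.tostring(doc, encoding="unicode"))
```

This corresponds to the "very simple" case: one photo in one size, described once and placed in a one-div structure.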
METS and MODS
MOA: a collaborative project, between Cornell and the University of Michigan with later input from LC and funded by the Mellon Foundation, to digitise primary documents relating to the development of US infrastructure. It began in 1995 and was one of the first large-scale collaborative digitisation projects. From it came the beginnings of an understanding of how to manage DOs, and the partners developed a model. METS promotes interoperability, scalability and preservation.

Background to METS: work done at Berkeley. The Making of America II was a Digital Library Federation project to create a proposed digital library object standard by encoding defined descriptive, administrative and structural metadata, along with the primary content, inside a digital library object. The preliminary digital object "standard" that came out of the project is now itself obsolete, but it served as the starting point for the development of the Metadata Encoding and Transmission Standard (METS), which is currently maintained by the Library of Congress.

About MOA1: Making of America (MOA) represents a major collaborative endeavor to preserve and make accessible through digital technology a significant body of primary sources related to the development of the U.S. infrastructure. Funded originally by The Andrew W. Mellon Foundation, MOA sought to involve research institutions and national consortia to develop common protocols and consensus for the selection, conversion, storage, retrieval, and use of digitized materials on a large, distributed scale. METS originated in a project that identified metadata and complex digital object structure as an area of critical concern for digital libraries. As more and more institutions digitized portions of their collections, there was growing concern about how to express the structural relationships between the digital files and data that together comprise a single digital object or entity.
The Making of America II (MOA2) project, sponsored by the Digital Library Federation (DLF) in the early stages and funded by the National Endowment for the Humanities, resulted from these discussions. New York Public Library and the libraries of Cornell, Penn State, and Stanford Universities collaborated under the leadership of the UCB Library on an investigation of how to encode structural, descriptive and administrative metadata for digital objects. The project produced the MOA2 XML Document Type Definition (DTD), the direct predecessor of METS, to specify a vocabulary and syntax for encoding digital objects. After the MOA2 project ended in early 2000, the Council on Library and Information Resources (CLIR) published the group's findings, and the MOA2 DTD was circulated for assessment and discussion. While MOA2 aroused considerable interest within the library community, the MOA2 DTD was too restrictive in some respects and lacked some basic functionality, especially for time-based media such as audio and video. In February 2001, interested institutions started to meet to build on the groundwork laid by the MOA2 project. The METS schema has come out of this work.

LC image attribution: http://www.flickr.com/photos/wallyg/3658222013/sizes/m/

DLF, HUL, NYC, LC, MetaE, UC and OCLC collaborated and released the schema in 2003. Now at release 1.8; 2.0 is due in 2010. In use by …. (name organisations; get some archives). Commercial and open-source products that use METS: Fedora, DocWorks, Greenstone, DSpace, DigiTool, VITAL, etc. (get more).
It can be very simple: think of a photo that you only have in one size. Intellectual ToC, images, map: example from BNP (Portugal).
fileSec is for organising the content files. It records file-specific technical metadata (checksum, file size, creation date/time). TIFF, JPEG, JPEG 2000 and GIF would be common image file formats. Files are grouped into <fileGrp> elements by something they have in common: this could be the intended use of the files, e.g. a number of drawings, or the physical structure of the resource, e.g. a newspaper. A <file> element can have nested <file> elements. This lovely geeky man is here to warn you of XML code coming up.
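A minimal sketch of a `fileSec`, assuming Python and its standard library: each `<file>` records the technical metadata mentioned above (checksum, size in bytes, creation time), and files are grouped into `fileGrp`s by intended use. The `USE` values, IDs, paths and placeholder bytes are hypothetical; MD5 is used only as an example `CHECKSUMTYPE`.

```python
import hashlib
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def add_file(grp: ET.Element, file_id: str, data: bytes, mimetype: str, href: str) -> ET.Element:
    """Append a <file> recording checksum, size and creation time for `data`."""
    created = datetime.now(timezone.utc).replace(microsecond=0).isoformat()
    f = ET.SubElement(grp, m("file"), {
        "ID": file_id,
        "MIMETYPE": mimetype,
        "SIZE": str(len(data)),  # size in bytes
        "CREATED": created,      # xsd:dateTime in UTC
        "CHECKSUM": hashlib.md5(data).hexdigest(),
        "CHECKSUMTYPE": "MD5",
    })
    ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
    return f


# Placeholder bytes stand in for real image files (hypothetical).
PAGE1_TIFF = b"placeholder TIFF bytes"
PAGE1_JPEG = b"placeholder JPEG bytes"

# One group per intended use: archival masters and access copies.
file_sec = ET.Element(m("fileSec"))
masters = ET.SubElement(file_sec, m("fileGrp"), USE="master")
access = ET.SubElement(file_sec, m("fileGrp"), USE="access")
add_file(masters, "PAGE1_TIFF", PAGE1_TIFF, "image/tiff", "masters/page1.tif")
add_file(access, "PAGE1_JPEG", PAGE1_JPEG, "image/jpeg", "access/page1.jpg")
```

Grouping by use is only one option; a newspaper could instead be grouped by issue or page, as the slide suggests.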
An XML namespace is a URI that uniquely identifies the context for elements and attributes; in this case xsi:schemaLocation pairs each namespace with the location of its schema. METS can use any element from any namespace, e.g. a dmdSec using MODS metadata. It can link to elements in external files with xlink:href, so the XLink schema (xsd) is needed in order to use that namespace. A METS profile is expressed as an XML document conforming to the profile schema; it's a formal description of a class of METS documents.
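The namespace mechanics can be sketched the same way. This hypothetical fragment declares the METS, MODS, XLink and XSI namespaces, pairs each schema namespace with a schema URL in `xsi:schemaLocation`, embeds a MODS record in one `dmdSec`, and points at an external MODS record via `xlink:href` in another. The schema URLs are assumptions to check against the current Library of Congress locations, and the record URL is a placeholder.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Prefixes are used when serialising; the URIs are what identify each vocabulary.
for prefix, uri in {"mets": METS_NS, "mods": MODS_NS, "xlink": XLINK_NS, "xsi": XSI_NS}.items():
    ET.register_namespace(prefix, uri)

# xsi:schemaLocation holds whitespace-separated (namespace URI, schema URL) pairs.
# Assumed schema URLs; check the current Library of Congress locations.
SCHEMA_LOCATION = " ".join([
    METS_NS, "http://www.loc.gov/standards/mets/mets.xsd",
    MODS_NS, "http://www.loc.gov/standards/mods/v3/mods-3-3.xsd",
])


def q(ns: str, tag: str) -> str:
    """Build a Clark-notation name such as {http://www.loc.gov/METS/}mets."""
    return f"{{{ns}}}{tag}"


def build_root() -> ET.Element:
    mets = ET.Element(q(METS_NS, "mets"), {q(XSI_NS, "schemaLocation"): SCHEMA_LOCATION})

    # dmdSec 1: MODS elements embedded directly (elements from another namespace).
    dmd1 = ET.SubElement(mets, q(METS_NS, "dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd1, q(METS_NS, "mdWrap"), MDTYPE="MODS")
    mods = ET.SubElement(ET.SubElement(wrap, q(METS_NS, "xmlData")), q(MODS_NS, "mods"))
    title_info = ET.SubElement(mods, q(MODS_NS, "titleInfo"))
    ET.SubElement(title_info, q(MODS_NS, "title")).text = "Example title"

    # dmdSec 2: a MODS record held in an external file, linked with xlink:href.
    dmd2 = ET.SubElement(mets, q(METS_NS, "dmdSec"), ID="DMD2")
    ET.SubElement(dmd2, q(METS_NS, "mdRef"), {
        "LOCTYPE": "URL",
        "MDTYPE": "MODS",
        q(XLINK_NS, "href"): "http://example.org/records/mods2.xml",
    })

    # structMap is the one required section; its single div refers to both dmdSecs.
    smap = ET.SubElement(mets, q(METS_NS, "structMap"))
    ET.SubElement(smap, q(METS_NS, "div"), DMDID="DMD1 DMD2")
    return mets


if __name__ == "__main__":
    print(ET.tostring(build_root(), encoding="unicode"))
```

ElementTree writes the `xmlns:` declarations for every namespace used in the tree onto the root element, which is where METS documents conventionally declare them.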
The structural map (<structMap>) is the heart of the METS standard: it is the only section required for a METS document to validate. The structMap assembles some or all of the content files from the fileSec into a coherent, structured whole.
A <div> may represent a logical or a physical division. A structMap contains only one root <div> element, which can have nested <div>s within it. A <div> can refer to dmdSec and amdSec IDs (via its DMDID and ADMID attributes).
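A sketch of these structMap rules in Python: one root `<div>` for a hypothetical book, nested chapter `<div>`s, `DMDID` and `ADMID` references on the root, and a small check that a structMap has exactly one root `<div>`. The IDs (`DMD1`, `AMD1`, `FILE1`, `FILE2`) are placeholders and assume matching sections exist elsewhere in the document.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_logical_struct_map() -> ET.Element:
    """A logical structMap: one root div (the book) with nested chapter divs."""
    smap = ET.Element(m("structMap"), TYPE="logical")
    # DMDID/ADMID point at (hypothetical) descriptive and administrative sections.
    book = ET.SubElement(smap, m("div"), TYPE="book", LABEL="Example book",
                         DMDID="DMD1", ADMID="AMD1")
    for n in (1, 2):
        chapter = ET.SubElement(book, m("div"), TYPE="chapter",
                                LABEL=f"Chapter {n}", ORDER=str(n))
        # fptr links the division to a content file declared in the fileSec.
        ET.SubElement(chapter, m("fptr"), FILEID=f"FILE{n}")
    return smap


def check_single_root_div(smap: ET.Element) -> bool:
    """True if the structMap has exactly one child and that child is a <div>."""
    children = list(smap)
    return len(children) == 1 and children[0].tag == m("div")


if __name__ == "__main__":
    print(check_single_root_div(build_logical_struct_map()))
```

A physical structMap for the same book would follow the same shape, with page `<div>`s instead of chapters.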
The <div> and <fptr> elements for the PDF file and for the individual page image files all occur at the same level of the structMap, because this structMap organises the content first and foremost by content format rather than hierarchically.
The content files, the names and details for which appear below, include JPEG and GIF image versions of each page, a TEI transcription of the "entire" book and an audio recording of the "entire" book.
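The organisation-by-format idea can be sketched as follows, assuming a hypothetical list of content files like the ones described above (page images in two formats, a TEI transcription, an audio recording). Files are grouped into one `fileGrp` per MIME type, and the structMap gets one `<div>` per format, all at the same level under the root. Using the MIME type as the `USE` value is an illustrative choice, not a METS requirement.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


# Hypothetical content files: (ID, MIME type, location).
CONTENT_FILES = [
    ("JPG1", "image/jpeg", "page1.jpg"),
    ("JPG2", "image/jpeg", "page2.jpg"),
    ("GIF1", "image/gif", "page1.gif"),
    ("GIF2", "image/gif", "page2.gif"),
    ("TEI1", "text/xml", "book.tei.xml"),
    ("AUD1", "audio/mpeg", "book.mp3"),
]


def build_by_format(files) -> ET.Element:
    mets = ET.Element(m("mets"))
    file_sec = ET.SubElement(mets, m("fileSec"))
    groups = {}  # MIME type -> fileGrp element, kept in first-seen order
    for file_id, mimetype, href in files:
        if mimetype not in groups:
            groups[mimetype] = ET.SubElement(file_sec, m("fileGrp"), USE=mimetype)
        f = ET.SubElement(groups[mimetype], m("file"), ID=file_id, MIMETYPE=mimetype)
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})

    # One root div; beneath it, one div per format, all siblings at the same level.
    smap = ET.SubElement(mets, m("structMap"))
    root = ET.SubElement(smap, m("div"), TYPE="book")
    for mimetype, grp in groups.items():
        fmt_div = ET.SubElement(root, m("div"), TYPE=mimetype)
        for f in grp:  # each child of the fileGrp is a <file>
            ET.SubElement(fmt_div, m("fptr"), FILEID=f.get("ID"))
    return mets


if __name__ == "__main__":
    print(ET.tostring(build_by_format(CONTENT_FILES), encoding="unicode"))
```

A hierarchical alternative would nest page `<div>`s under the book and give each page one `fptr` per format; the flat, by-format layout trades that navigation for a simpler map.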
What is it?
The software communicates with Fedora to ingest digital objects. Upgrades will need to be made to the Fedora software to support METS v1.8, or even v2.0, which is out this year.
There will be a number of front ends, depending on who wants to use the system: librarian, curator or basic user. Input from the browser (e.g. subject, title, inserted fileSec and fileGrp) will create the METS metadata; structural and descriptive metadata could also be input manually. Content files are virus-scanned and verified. Using the METADATA service, the METS file is validated and the content file checksums are verified. The METS file plus the content files are then ingested into Fedora. Not illustrated: these objects will be available for access after ingestion into Fedora.
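The checksum step of the ingest workflow might look something like this sketch. It is not the NLS METADATA service: it simply parses a METS file, looks up each `<file>`'s content by its `xlink:href`, and recomputes the checksum named in `CHECKSUMTYPE`. Virus scanning and schema validation are not shown, and keying the content by href is an assumption made for the example.

```python
import hashlib
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# hashlib algorithm names for a subset of METS CHECKSUMTYPE values.
HASHES = {"MD5": "md5", "SHA-1": "sha1", "SHA-256": "sha256", "SHA-512": "sha512"}


def verify_checksums(mets_xml: str, content: dict[str, bytes]) -> list[str]:
    """Return IDs of <file> elements whose content is missing or fails its checksum.

    `content` maps each FLocat xlink:href to the bytes received for ingest
    (a hypothetical arrangement for this sketch).
    """
    root = ET.fromstring(mets_xml)
    failures = []
    for f in root.iter(f"{{{METS_NS}}}file"):
        flocat = f.find(f"{{{METS_NS}}}FLocat")
        href = flocat.get(f"{{{XLINK_NS}}}href") if flocat is not None else None
        data = content.get(href)
        algo = HASHES.get(f.get("CHECKSUMTYPE", ""))
        # Missing content or an unsupported checksum type counts as a failure.
        if data is None or algo is None:
            failures.append(f.get("ID"))
            continue
        if hashlib.new(algo, data).hexdigest().lower() != f.get("CHECKSUM", "").lower():
            failures.append(f.get("ID"))
    return failures
```

An empty list would mean every declared file arrived intact, at which point the package could be handed on for ingest.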
The METS file must be formatted slightly differently for ingest into Fedora.