1. Online Digital Content: Unfinished Agenda
Raj Reddy
Carnegie Mellon University
May 24, 2011
Talk Presented at QITCOM, Doha, May 24, 2011
2. Digital Libraries for Digital Content
Landmarks
1975: Stanford Connection
McCarthy Proposal
1981: The French Connection
Centre Mondial Informatique Marseilles effort
1995 to Present: CMU Connection
1995: Electronic Library Workshop
1996: MCC Austin Workshop on DL
1997: Shamos – Scan everything
2001: NSF Funding and Launch of UDL: Start scanning books
2003: MBP MOU signed
2005: First ICUDL Conference
3. The Grand Challenge
Create Access to
All published works online
Instantly available
Anywhere in the world
Searchable, browsable, navigable
In any language
By humans and machines
6. Books
Title Structure Des Molecules
Author Victor Henri
Language French
Subject Chemistry

Title Rig Veda
Author Pandit Sriram Sharma Acharya
Language Sanskrit
Subject Philosophy

Title Jawahar Ali Joyviyah
Author Dr. Ilyas lomas
Language Arabic
Subject Metrology
13. Million Book Project: Research Challenges
Providing Access to Billions everyday
Distributed Cached Servers in every country and region
Easy to use interfaces for Billions
Multilingual Information Retrieval
Translation
Summarization
14. Million Book Project: Policy Challenges
Compensating for Creative Works
5% out of copyright
92% out-of-print and in-copyright
3% in-print and in-copyright
Options
Tax Credit
Usage based Government funded compensation
Analogous to Public Lending Right in UK and Australia
Usage charges to the user
Compulsory Licensing
Digital Submissions to National Archives of all books that are “born-digital”
15. Can we do it?
The Grand Challenge: Create Access to
All published works online
Instantly available
In any language
Anywhere in the world
Searchable, browsable, navigable
By humans and machines
16. Unfinished Agenda
DL of Music
DL of Newspapers and Magazines
DL of Lectures
DL of Talks
DL of Monuments
17. Next Steps for Digital Content
Business Models have overtaken some of our Agenda
Google, Apple and Amazon already provide access to music, movies and books
Criterion: willingness to pay for content
Content of Long Term interest and value may be lost
Set up an International UDL Society to promote archiving and data permanence of all content, especially content of marginal value today
Change national policies, including digital copyright conventions, in support of Digital Archiving
Create National Digital Archives of content of all media types:
Book archives
Newspapers, Magazines, Scholarly Journals and other periodicals
Paintings archives
Music archives
Lecture archives
Movie and TV archives
Monument archives: Virtual Tours
Provide technical support for digitization, annotation and storage of archival content
Research into digital preservation technologies
18. Title Rig Veda
Author Pandit Sriram Sharma Acharya
Language Sanskrit
Subject Philosophy
Publisher Sanskriti Sansthan Bareli
Year
Abstract Rig Veda is the oldest of the Vedas. The Rig Veda is the oldest book in Sanskrit or any Indo-European language. Many great Yogis and scholars who have understood the astronomical references in the hymns, date the Rig Veda as before 4000 B.C., perhaps as early as 12,000. Modern western scholars date it around 1500 B.C., though recent archaeological finds in India (like Dwaraka) now appear to require a much earlier date.
19. Title Gulzar-A-Badesha
Author Khader Badesha
Language Urdu
Subject Literature
Publisher Namipress, Chennai
Year 1919
Abstract Literature
20. Title Structure Des Molecules
Author Victor Henri
Language French
Subject Chemistry
Publisher Taylor and Francis
Year 1925
Abstract This is a unique book that explicates, in detail, the structure of molecules and touches upon certain specific characteristics of molecules with particular reference to Benzene.
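The catalog entries on the last three slides share one set of fields (Title, Author, Language, Subject, Publisher, Year), which suggests how a "searchable" collection might be represented. The following is only an illustrative sketch, not the Million Book Project's actual schema: the `BookRecord` class, the `CATALOG` list and the `search` helper are hypothetical names, and the exact-match search is an assumed simplification of real multilingual retrieval.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BookRecord:
    """Hypothetical record mirroring the fields shown on the slides."""
    title: str
    author: str
    language: str
    subject: str
    publisher: str
    year: Optional[int]  # None where the slide leaves the year blank


# Entries copied from the three catalog slides; nothing is filled in.
CATALOG: List[BookRecord] = [
    BookRecord("Rig Veda", "Pandit Sriram Sharma Acharya", "Sanskrit",
               "Philosophy", "Sanskriti Sansthan Bareli", None),
    BookRecord("Gulzar-A-Badesha", "Khader Badesha", "Urdu",
               "Literature", "Namipress, Chennai", 1919),
    BookRecord("Structure Des Molecules", "Victor Henri", "French",
               "Chemistry", "Taylor and Francis", 1925),
]


def search(catalog: List[BookRecord], **criteria) -> List[BookRecord]:
    """Return records whose fields equal every criterion, ignoring case.

    This is an assumed, simplified exact-match filter, not a full-text
    or cross-language search.
    """
    def matches(record: BookRecord) -> bool:
        return all(
            str(getattr(record, field)).casefold() == str(value).casefold()
            for field, value in criteria.items()
        )

    return [record for record in catalog if matches(record)]
```

For example, `search(CATALOG, language="french")` would select the Victor Henri volume, and `search(CATALOG, year=1919)` the Urdu title. A production system would need indexing, transliteration and translation on top of a structure like this.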