1. The Google Book Settlement:
Where are we now, and how did we get here?
Sian Meikle & Tony Horava
University of Toronto
Sept. 25, 2009
2. Outline
Overview of settlement
Access to Google Books
Copyright issues
Marketplace impacts
Integration/curation of content
Competition issues
Privacy matters
Academic freedom
Future business models
3. Google Book Search
Started in 2004
42 Library Partners, many publishing partners
Google-funded
In-copyright and out-of-copyright material
Hosted on Google and selected library partner servers only
10 million books to date:
2 million public domain (20%)
7.5 million in copyright, out of print (75%)
0.5 million in copyright, in print (5%)
Eventual aim: 30 million books
4. Google digital products
Metadata
Scanned (back files, library partner scans):
Scanned images: TIFFs
Access derivatives:
JPEGs
Image-based PDFs (one per page or one per book)
Uncorrected OCR
Born-digital (front file, new content)
Digital text, XML format
5. Proposed Google Book Settlement
2005
US class-action lawsuit against Google
Association of American Publishers
Authors Guild
October 28 2008
proposed settlement announced
Oct 7 2009 (moved from June 11 2009)
Originally scheduled as the final Court Fairness Hearing
Possible outcomes:
accept, reject, court oversight
Out of scope: changes to the agreement
Sept 18: US Department of Justice advises Court not to accept
settlement but to encourage further discussion
Sept 24: Court accepts motion to delay final hearing;
will hold status conference on Oct 7 instead
6. Google Book Settlement Outline
covers online access in US for books:
published before January 5 2009
covered by the Berne copyright convention
Google pays $125 million
$34.5 million to establish Book Registry
$45 million to rights holders for books scanned prior to May 2009 ($60 per title)
$45.5 million for legal fees
Split of future revenues:
63% to copyright holders
37% to Google
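As a quick check on the figures above, the short Python sketch below sums the three up-front payments and applies the 63/37 split to a hypothetical $100 of revenue. It assumes the split applies directly to gross revenue; any costs or deductions the settlement may apply before the split are not modelled here.

```python
# Up-front payments as listed on the slide (US dollars, millions).
UPFRONT_PAYMENTS = {
    "Book Registry set-up": 34.5,
    "Rights holders (books scanned before May 2009)": 45.0,
    "Legal fees": 45.5,
}

RIGHTS_HOLDER_SHARE = 0.63  # share of future revenue to copyright holders
GOOGLE_SHARE = 0.37         # share of future revenue to Google


def split_revenue(gross: float) -> tuple[float, float]:
    """Split a revenue amount 63/37 (simplifying assumption: no deductions)."""
    return gross * RIGHTS_HOLDER_SHARE, gross * GOOGLE_SHARE


total = sum(UPFRONT_PAYMENTS.values())
print(f"Up-front total: ${total:.1f} million")

holders, google = split_revenue(100.0)  # hypothetical $100 of revenue
print(f"Per $100: rights holders ${holders:.2f}, Google ${google:.2f}")
```

The three components account for the full $125 million, and every dollar of future revenue is divided between the two parties.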
7. Google Book Settlement: Products
Display uses (saleable products):
Access, preview, snippets, book records
Non-display uses (free and research products):
Display of metadata only; full-text and geographic indexing
without display of text; analytical research across corpus; and
Google R&D
Inclusion:
In print books: opt-in to display uses
Out of print books: opt-out of display uses
In print = commercially available in USA and Europe
Products:
Individuals: sale of perpetual access per title via Google server
Institutions: sale of annual access to the Institutional Subscription Database (ISD)
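The default inclusion rules on this slide amount to a small decision table. The Python sketch below encodes them as stated: in-print books need an opt-in, out-of-print books are included unless the rights holder opts out, and "in print" means commercially available in the USA or in Europe. The names are hypothetical, and treating public domain books as always displayable is an assumption, since the slide does not address them.

```python
from enum import Enum


class Status(Enum):
    """Pools of books named in the settlement summary."""
    PUBLIC_DOMAIN = "out of copyright"
    OUT_OF_PRINT = "in copyright, out of print"
    IN_PRINT = "in copyright, in print"


def is_in_print(available_in_us: bool, available_in_europe: bool) -> bool:
    # "In print" = commercially available in the USA or in Europe.
    return available_in_us or available_in_europe


def display_uses_allowed(status: Status, opted_in: bool = False,
                         opted_out: bool = False) -> bool:
    """Default inclusion rule for display uses, given the rights holder's choice."""
    if status is Status.IN_PRINT:
        return opted_in        # excluded unless the rights holder opts in
    if status is Status.OUT_OF_PRINT:
        return not opted_out   # included unless the rights holder opts out
    return True                # assumption: public domain books are displayable


print(display_uses_allowed(Status.IN_PRINT))                      # False
print(display_uses_allowed(Status.IN_PRINT, opted_in=True))       # True
print(display_uses_allowed(Status.OUT_OF_PRINT))                  # True
print(display_uses_allowed(Status.OUT_OF_PRINT, opted_out=True))  # False
```

The asymmetry is the point: silence from a rights holder keeps an in-print book out of display uses, but puts an out-of-print book in.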
8. Book Registry
Non-profit independent agency representing
plaintiff interests to:
Manage rights database: book status, contact info
Negotiate terms and prices of online book uses on
behalf of rights holders
Distribute share of revenue to rights holders
9. Institutional Subscription Licensing models
Libraries that contribute books for scanning:
Fully participating libraries
Give in-copyright books; get digital copies;
must meet security requirements
Cooperating libraries
Give in-copyright books; get no digital copies
Libraries that do not contribute books for scanning:
May subscribe to ISD (either whole or discipline based)
Pricing model set by Google and Book Registry
Benefits of partnering:
Ability to challenge institutional pricing model
Some subscription discounts
Information about inclusion / exclusion of books
10. Google Research Corpus
All Google books except in-copyright works whose
rights holders have removed their works
Hosted at Google, up to two other sites
Non-consumptive research:
linguistic analysis, automated translation, book
relationships, index/search techniques
Qualified users
approved research agenda
letter of support from a participating library, the Book Registry,
Google, or a corpus host
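The Research Corpus rules on this slide combine two checks: which books are in the corpus, and which users qualify. The Python sketch below is a simplified model of those two checks as summarized here; the class, function and supporter labels are hypothetical, and the settlement's actual approval process (for example, who judges an agenda) is not modelled.

```python
from dataclasses import dataclass

# Hypothetical labels for parties that may write a letter of support.
SUPPORTERS = {"participating library", "Book Registry", "Google", "corpus host"}


@dataclass
class Book:
    title: str
    in_copyright: bool
    removed_by_rights_holder: bool = False


def in_research_corpus(book: Book) -> bool:
    # Everything is in, except in-copyright works whose rights holders removed them.
    return not (book.in_copyright and book.removed_by_rights_holder)


def is_qualified_user(agenda_approved: bool, support_from: set[str]) -> bool:
    # Needs an approved research agenda AND at least one recognised letter of support.
    return agenda_approved and bool(support_from & SUPPORTERS)


books = [
    Book("Public domain title", in_copyright=False),
    Book("Orphan title", in_copyright=True),
    Book("Withdrawn title", in_copyright=True, removed_by_rights_holder=True),
]
print([b.title for b in books if in_research_corpus(b)])
print(is_qualified_user(True, {"corpus host"}))
```

Note that an orphan title stays in the corpus by default, since no rights holder is present to remove it.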
11. How many orphans?
One possible calculation:
3.5 million out-of-print books scanned prior to May 2009
$45 million (rights claims) ÷ $60 per book = 750,000 claimed books (21%)
Remaining orphans among the out-of-print titles: 2.75 million (79%)
But $45 million is the minimum payment, so numbers may vary.
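The orphan estimate above is simple arithmetic, reproduced in the Python sketch below using only the slide's figures. Because $45 million is a minimum, the claimed count is a lower bound and the orphan count an upper bound under this method.

```python
OUT_OF_PRINT_SCANNED = 3_500_000  # out-of-print books scanned before May 2009
RIGHTS_FUND = 45_000_000          # USD set aside for rights claims (a minimum)
PER_TITLE = 60                    # USD paid per claimed title

claimed = RIGHTS_FUND // PER_TITLE        # titles the fund can pay for
orphans = OUT_OF_PRINT_SCANNED - claimed  # everything left unclaimed

print(claimed, orphans)  # 750000 2750000
print(f"claimed {claimed / OUT_OF_PRINT_SCANNED:.0%}, "
      f"orphans {orphans / OUT_OF_PRINT_SCANNED:.0%}")  # claimed 21%, orphans 79%
```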
12. Google Book Settlement: Reactions
Positive:
huge corpus available to wide audience
bypassed orphan works log jam
Concerns:
de facto monopoly
user privacy
intellectual freedom
transparency
equity of access
long-term security of data
13. Copyright challenges
Does the settlement erode statutory rights under copyright law?
Is ‘fair use’ doctrine affected by the settlement? (NB – ‘fair use’ is
much broader than Canadian ‘fair dealing’)
Many argue that it does not… it is a private settlement among
three parties, and the ‘fair use’ legislative provisions haven’t changed.
Others argue that it is more restrictive than ‘fair use’
General view that the settlement will be influential in setting de
facto standards for ‘fair use’, such as number of pages that can
be displayed or printed; the conditions for archiving and indexing
of text for discovery purposes
Contractual licensing is supplanting copyright legislation as the
driver for reproducing and disseminating in-copyright books (in a
commercial model) that belong to our collective cultural heritage: an
enormous risk for the stewardship role of libraries.
14. Copyright challenges (2)
The Registry’s rights database will not be available to the public: a key
tool is being developed privately
The board of the Registry will have no librarian or reader
representation… will a balanced approach to copyright, access,
and pricing issues be possible?
The settlement is silent on how agreements between libraries
and the Registry would ensure that users can exercise their
rights under the US Copyright Act
Will have a dampening effect on Open Access and Creative
Commons licensing: how will it affect the long-term plans of the
Open Content Alliance, for example?
15. Pricing and market impacts
No other provider can be offered license terms better than Google’s for ten
years – creates a virtual monopoly and enormous lead time advantage.
Upward pressure on pricing: affordability in a limited market.
Pricing to be determined by: the pricing of similar products & services; the
scope of books available; the quality of the scan; and features offered via
the subscription
The settlement refers to two goals: 1) market realization of revenues for
rights holders, and 2) broad access by the public, including institutions of
higher education… to be based on “comparable products and services”. But
which ones?
Only Google can license ‘orphan works’ (in the absence of legislation)
Opting out for orphan works would be very problematic. Creates a huge
locked-in pool of books.
Will have huge influence on market pricing for out-of-print books.
Rights holder can set price, or the Registry will use a pricing algorithm
based on similar books to determine price
16. Integration & curation issues
Google has opened Book Search via APIs: libraries can embed book
images, previews, and links to Book Search within discovery layers and
catalogues
Book Search won’t allow downloading of in-copyright books to mobile
devices.
How can we leverage the SFX link resolver to obtain maximum benefit from
enriched content in Book Search?
Libraries can work closer with Google and OCLC to make it very easy to
move from search results to purchased content (eg ‘Find in a Library’
link)
Could lead to better partnership arrangements with libraries for
developing finding aids and user tools
Preservation is based on a commercial model, not on certifiable
standards in a non-profit, research environment – what guarantee of
permanence do we have? What if Google’s business model changes?
17. Integration & curation (2)
Book Search offers a very limited form of collaboration
(eg shared annotation among small & predefined
groups) but:
Doesn’t permit enhancements of texts;
Doesn’t permit the layering of new services upon texts;
Doesn’t permit use of texts in digital mash-ups.
Compare this with dynamic developments in ebook
interfaces for searching, sharing, storing, and managing
ebook content
18. The Competition….
The sheer size and scope of Book Search will invite comparisons with
established commercial products, such as EEBO, ECCO, and the backlists
of major publishers like Oxford or Taylor & Francis.
How will the pricing for the Institutional Subscription Database (ISD) affect
pricing models in the academic marketplace?
There will be much pressure on US libraries to acquire the ISD. If and when
it is available in Canada, there will be pressure on libraries to acquire it.
How will the ebook aggregators (eg NetLibrary, ebrary) be affected? Google
has very deep pockets for R&D
Turf wars - Google is providing public domain titles in ePub format to the
Sony ebook reader and recently announced that “it would let anyone resell
the millions of out-of-print books it has scanned from the nation’s libraries.”
What was Amazon’s response?
“an Amazon executive immediately rejected the idea of becoming Google’s
affiliate”
19. The Competition (2)
Comparison of DRM systems will be important for
access to material.
How will Google propose to integrate Book Search
into the researcher’s workflow?
Book Search won’t be able to offer the range of
functionality and tools available on Scholars Portal as a
discovery environment for researchers
Can we deem Book Search ‘a collection’ analogous
to a library collection, eg on Scholars Portal?
20. Privacy matters
Concerns over user privacy…will these be addressed?
Google has an unprecedented opportunity to monitor
and track user reading habits, eg when a user prints out
pages from a book in the ISD, there will be a visible
watermark displaying encrypted session information
“which could be used to identify the authorized user that
printed the material or the access point from which the
material was printed” (art 4.1)
“For purchases of online e-book access or access via
institutional subscriptions, Google will have the technical
ability to track every page that one views, even recording
how long is spent on a page.” (Alan Inouye, ALA)
What will privacy look like in a Google environment?
21. Google’s response re privacy concerns
Federal Trade Commission (consumer protection) letters and statements
“...because the settlement agreement has not yet been approved by the court, and the
services authorized by the agreement have not been built or even designed yet, it's
not possible to draft a final privacy policy that covers details of the settlement's
anticipated services and features. Our privacy policies are usually based on detailed
review of a final product -- and on weeks, months or years of careful work engineering
the product itself to protect privacy. In this case, we've planned in advance for the
protections that will later be built, and we've described some of those in the Google
Books policy” – Jane Horvath, Global Privacy Counsel, Google
“The Bureau [of Consumer Protection] asks Google to commit to a continuing
dialogue regarding consumer privacy policies for Google products and services…I
believe such a commitment would require Google to adhere to the concept of privacy
by design...” – Commissioner Pamela Jones Harbour
Center for Democracy and Technology Privacy Recommendations for the Google
Book Settlement
22. Intellectual Freedom
If qualified users want to search the Research Corpus for ‘non
consumptive’ research, e.g. textual or linguistic analysis, their research
agenda needs to be approved by the host institution
“‘Research Agenda’ means a document that describes a research
project in sufficient detail to demonstrate that it will be Non-Consumptive
Research” (p. 17 of Settlement)
Host institution is responsible. What will the criteria be?
This will certainly conflict with academic freedom…fundamental values
will be at play
Google can exclude a book for editorial reasons: on what basis?
Pressure from governments, powerful interest groups could have an
important impact, e.g. Google saving itself from embarrassment or bad
PR by suppressing a controversial book
The Settlement requires Google to provide public access and the ISD
for only 85% of the in-copyright, not commercially available books,
so up to 15% (potentially 1M books) could be left out
Censorship & freedom of expression – another conflict with library
values
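The "potentially 1M books" figure follows from the 85% floor. The Python sketch below assumes the base is the 7.5 million in-copyright, out-of-print books from slide 3; the real pool will change as scanning continues, so this is an order-of-magnitude check only.

```python
OUT_OF_PRINT_BOOKS = 7_500_000  # in copyright, out of print (slide 3 estimate)
REQUIRED_PERCENT = 85           # minimum share Google must offer

# Integer arithmetic avoids floating-point rounding on the percentage.
must_offer = OUT_OF_PRINT_BOOKS * REQUIRED_PERCENT // 100
may_withhold = OUT_OF_PRINT_BOOKS - must_offer

print(f"must offer {must_offer:,}; may withhold up to {may_withhold:,}")
# must offer 6,375,000; may withhold up to 1,125,000
```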
23. Equity of Access
Works within works might be excluded, depending on whether the rights holder
exercises their rights independently, eg an essay, a poem, a chart or
a table
The Settlement doesn’t include pictorial works: eg photographs and
illustrations will be blacked out.
Momentum driving supply & demand:
“…it is possible that faculty and students at institutions of higher education
will come to view the institutional subscription as an indispensable only
because research libraries have invested significant resources in preserving
out of print books.. They might insist that their institution’s library purchase
such a subscription. The institution’s administration might also insist that the
library purchase an institutional subscription so that the institution can
remain competitive with other institutions of higher education in terms of the
recruitment and retention of faculty and students.” ALA-ACRL-ARL Brief
This can exacerbate inequalities among libraries, based on budget realities.
24. Future Business opportunities under the settlement…
Print on Demand
Custom Publishing
PDF downloads
Consumer subscriptions
Summaries, abstracts, compilations
To compete, publishers will need to focus more on
metadata, rights management and new logically
structured units (ie not pages) using an XML-based
content architecture and workflow.
Announcement last week: Google will provide the public
domain books to On Demand Books for print-on-demand
publishing using the Espresso Book Machine.
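To illustrate what a "logically structured unit (not a page)" might look like in an XML-based workflow, the Python sketch below uses the standard library's xml.etree.ElementTree to build one chapter-level unit that carries its own metadata and rights status. The element and attribute names (unit, metadata, rights, status) are hypothetical, not a published schema.

```python
import xml.etree.ElementTree as ET


def build_unit(book_id: str, unit_id: str, title: str,
               rights_status: str, paragraphs: list[str]) -> ET.Element:
    """Build one chapter-level content unit (hypothetical schema)."""
    unit = ET.Element("unit", {"book": book_id, "id": unit_id, "type": "chapter"})
    meta = ET.SubElement(unit, "metadata")
    ET.SubElement(meta, "title").text = title
    # Rights travel with the unit, so it can be licensed or sold on its own.
    ET.SubElement(meta, "rights", {"status": rights_status})
    body = ET.SubElement(unit, "body")
    for text in paragraphs:
        ET.SubElement(body, "p").text = text
    return unit


unit = build_unit("bk-0001", "ch-03", "Chapter 3", "licensed-display",
                  ["First paragraph.", "Second paragraph."])
print(ET.tostring(unit, encoding="unicode"))
```

Keeping metadata and rights inside each unit, rather than attached to a page image, is what would let a publisher recombine units into summaries, compilations or custom editions.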
25. Conclusion
We need to monitor developments closely, and engage
in vigorous, balanced advocacy with our stakeholders,
and show support for US libraries & organizations that
are raising serious concerns
What will be the future impact on our libraries?
“Google is a behemoth, and the Google Settlement, if
approved, will make it the behemoth of the book….Will
the restraints of the Book Rights Registry be enough to
keep it from abusing such a position, or will they be like
the ropes of the Lilliputians around the sleeping Gulliver?
This story is surely only in chapter one” - Grace
Westcott, Globe & Mail Feb 20, 2009
Editor's Notes
(As you may have heard, this past week has been quite tumultuous for the Google Book Settlement. Last Friday, following more than 400 filings in response to the Settlement, the US DoJ recommended that it not be accepted in its current form, and this Tuesday the plaintiffs filed a motion to delay the final hearing on the settlement while it is reworked. However, the DoJ did recognize the clear potential value of a properly constructed settlement, and recommended that the plaintiffs and Google keep discussions going towards this end.) Given the pace of change these past few days, we’ve decided that handouts for this talk might be impractical; I hope that’s okay. What Tony and I would like to do now is lay the groundwork for what is sure to be a lively panel discussion. I’ll start by setting out the general landscape of the Google Book Settlement and its components, flagging the places of particular concern as we go. Then Tony will walk through the contentious issues in more depth. Our presentation is based on a paper that we prepared, together with six other colleagues, for the directors of the OCUL libraries this spring. It’s been updated a couple of times since. The agreement is large and complex, so we are simplifying some details and leaving out others.
Let’s start by going back to 2004. Many people were scanning out-of-copyright materials, and there was much discussion of how to move into in-copyright materials. People were thinking about orphan works: those books that are in copyright, but whose rights holders cannot be located. Into this arena stepped Google, announcing Google Book Search. Google funded partnerships with library partners for older books, and with publishers for new books. The project differed from other similar projects in two key ways: Google was not proposing to make the scanned material freely available to be replicated on any server, and Google was going to scan in-copyright materials. In all, 10 million books have been scanned to date, with an eventual goal of 30 million books. Estimates vary, but one breakdown is: 2 million public domain (out of copyright), 7.5 million in copyright but out of print, and 0.5 million in copyright and in print.
The products of the project, for the scanned material, were fairly typical: TIFF master images, JPEGs, OCR, and PDF derivatives for discovery and web delivery. In addition to the scanned materials obtained from libraries, publisher partners provided new titles in XML-based digital text format.
At the outset, Google announced that the full text of in-copyright books would be indexed for search and discovery only, and argued that this fell within fair use. However, the Association of American Publishers and the Authors Guild did not agree, and so they each mounted class action lawsuits against Google in the United States. The two suits were subsequently merged. In October of last year, a settlement of the class action lawsuit was announced, subject to approval by the US Court. A final fairness hearing was originally scheduled for June of this year, but it was moved back to October, owing to the amount of reaction to the settlement and the clear need for further time to assess the issues it raises. Possible outcomes of a final fairness hearing are the acceptance, rejection, or court oversight of the agreement. Restructuring the agreement is not thought to be a possible action. Reaction continued apace over the summer, with more than 400 filings received by the court. Opinion varied as to whether the settlement should stand. Much discussion has arisen around areas where the settlement is silent, as well as those where it is explicit. Last Friday (Sept. 18), the US Department of Justice advised the US District Court (Southern District of New York) that it should not accept the class action settlement as it stands, but that discussions should continue, as an improved settlement would offer important societal benefits.
Specifically, the DoJ recommended that the parties consider modifications addressing the open-ended future licensing provisions, conflicts among class members, additional protections for unknown rights holders, and ways to make comparable access possible for competitors. Judge Denny Chin stated yesterday in his order: “The current settlement agreement raises significant issues, as demonstrated not only by the number of objections, but also by the fact that the objectors include countries, states, nonprofit organizations, and prominent authors and law professors.” But Judge Chin goes on to say: “The settlement would offer many benefits to society, as recognized by supporters of the settlement as well as DoJ. It would appear that if a fair and reasonable settlement can be struck, the public would benefit.” So what is the structure of this settlement that it has raised such concerns?
The settlement afforded Google the rights to provide and sell online access to books within the US. Several points are worth noting. It only covers books: not journals, newspapers, etc., nor the illustrations within books, as the rights holders of these materials were not represented. It covers all in-copyright books that are covered by the Berne Convention: anything, therefore, published in the 164 countries that signed the Berne Convention. The class action lawsuit applies only within the United States. No access beyond the United States was settled, though it was speculated that similar agreements might be made in other countries. So Google was to pay $125 million up front, as follows: $34.5 million to establish an independent Book Registry to manage rights and revenues; $45 million to pay the rights holders of books scanned before May 2009 ($60 per title); and $45.5 million for legal fees. (To put that number in perspective, the class action lawsuit launched by Heather Robertson on behalf of freelance newspaper writers, against CTVglobemedia Inc., Thomson Reuters Canada and The Gale Group, settled for $11 million this May.) The settlement gave Google the right to sell online access to the books and to develop other related products, with the revenue stream being split 63% to copyright holders and 37% to Google. The settlement only covered books that existed prior to January 2009. It is also a non-exclusive settlement (rights holders retain the right to make other agreements with other providers). However, it does create a massive one-stop shop for online books. It might be thought, then, to be a compelling channel for publishers to market future works, and an intimidating channel for other ventures to compete against.
Let’s take a closer look at how books are displayed through Google Book Search, and what, by default, is included and excluded in that display. The settlement lays out display uses and non-display uses of books, and then defines what uses are allowed for each pool of books: out-of-copyright, out-of-print, and in-print. Display uses are defined as the ability to view or annotate an entire book; print or copy-and-paste chunks of a book; preview 20% of a book prior to purchase; and view short snippets or metadata about a book. Non-display uses include indexing for search and discovery without display, research inquiries across the entire corpus, and Google R&D. By default, in-print books are excluded from display uses, unless the rights holder opts in. (It’s worth noting in passing that the agreement decreases access to in-print books, as snippet views were formerly available across Google Books.) Out-of-print books are included in display uses, unless the rights holder opts out. “In print” was originally defined as commercially available in the US; however, Google announced in early September, in response to concerns raised by the EU, that it would also treat books available in Europe as “in print”. Revenue will be generated from the display uses, and so, by and large, the out-of-print book pool is the revenue stream. Two sales models are proposed. Individuals may purchase perpetual access to individual titles; the titles are hosted on the Google server, and may not be downloaded to portable book readers. For institutions, Google makes available an Institutional Subscription Database (ISD), which must comprise at least 85% of the out-of-print book pool. Institutions may subscribe to the whole ISD, or to subject-based subsets of it, on an annual basis. Google also has the right to display advertising on book pages, with the permission of rights holders.
The Book Rights Registry (BRR) defined in the settlement plays a pivotal role, and was a magnet for many of the concerns expressed. This non-profit independent agency manages the database of book rights holders, which answers questions such as: what is the status of individual titles, who holds the rights to them, and how is the ISD to be priced? There is no provision in the settlement for the BRR information to be made publicly available. The Board is to have representation (4 members each) from the author and publisher subclasses of the suit. Much discussion has arisen around the composition of the Book Registry Board, the interests that it represents, and the availability and transparency of the data it manages.
Let’s turn now to the proposed licensing models for Google Book Search. Google is the primary host: all individual title purchases, and the ISDs, will be accessed on the Google server. Some libraries (fully participating partners) have limited rights to host a portion of the corpus, if they meet stringent security requirements, subject to audit by the BRR. There are other important benefits for Google’s library partners. Partners have a mechanism for challenging the institutional pricing model, and may receive information about the ISD pricing strategy. Some receive subscription discounts (UMich free for 25 years). Partners also receive information about the inclusion and exclusion of books: Google must disclose to partners information such as whether books are commercially available, and whether books are included in the ISD. Only the identity of public domain books and the identity of books excluded from display uses for editorial reasons may be publicly disclosed. Libraries that do not contribute books for scanning may subscribe to the ISD on an annual basis, on the Google server.
The agreement also provides for a Research Corpus. Under the settlement, Google retains the right to withhold up to 15% of titles from the Research Corpus for unspecified reasons. Concerns about academic freedom and privacy have mounted around this Corpus.
So those are the bones of the settlement, presented briefly. Before I finish, I want to pause here for a moment on the subject of the out-of-print pool. As I mentioned earlier, a major outcome of the settlement is that it enables Google to sell orphan books online. This is because the Authors Guild and the Association of American Publishers included all book copyright holders in their class, including the ones that couldn’t be found. The settlement is non-exclusive: rights holders do have the right to license their books elsewhere. But for the orphan books this is moot: no rights holders have come forward to exercise this right. So for this pool of books, through this settlement, and in the absence of any further legislation, Google is the only online seller. So how many orphans might there be? Google argues that because they have created a commercial value for out-of-print books, and, through the Book Registry, a mechanism for rights holders to claim their rights, the number of true orphans will diminish to perhaps 10% of the out-of-print pool, or 1 million of the 10 million currently scanned. However, the $45 million to be paid to the rights holders of books scanned prior to May 2009 suggests, at $60 per title, an expected claim pool of 750,000 titles. Although the settlement stipulates that more money may be paid out in rights, this suggests a considerably larger orphan pool of 2.75 million out of the 3.5 million out-of-print books scanned to that point. Prior to the settlement, there was a great hope (in the scanning community at least) that the solution to the orphan book logjam was legislation defining how these books could be digitized and made available online. It has been argued that because the settlement creates a commercial value for these orphan works that did not exist before, it may have a negative impact on any future orphan works legislation that might provide more general models for access.
Another point of note is that under the settlement, revenue would be generated for all titles that are out of print. That revenue is split between Google and the active rights holders, even though orphans may comprise a large percentage of the out-of-print pool.
So the Google Book Settlement has generated strong reactions, both positive and negative, and some serious questions for us to ponder. First, it has made an enormous corpus of books available to a wide audience. Therefore, there is every reason to think that it could be highly useful to our users. However, the settlement in its current form also raises some serious concerns. Does the fact that only Google has the right to provide access to these books create a de facto monopoly? And what sort of chilling effect might this new product have on other similar products from other vendors? The proposed service involves user identification, and dictates that the book products remain on a very few servers. Does this raise issues of user privacy? As defined the Research Corpus use is carefully gate-kept, and the contents of the Rights Registry are not publicly available. Do we have concerns around transparency and intellectual freedom? And this massive corpus, which might have a chilling effect on other similar enterprises, is corporately managed, not freely reproducible, and thus far, only available in the US. What then are our concerns around equity of access and long-term security of data?
The settlement is under US law, as Sian has indicated. Many commentators see an erosion of existing rights, eg interlibrary loans, and usage of public domain books. Fair use is much broader than fair dealing, eg it includes teaching, education, and parody. The settlement can be seen as more restrictive than fair use: eg under the agreement, free snippet access to in-print titles through Google has decreased. Previously snippets could be viewed; going forward, only bibliographic information and front matter may be viewed. The settlement is seen as a model for the future in terms of setting various standards, such as the number of pages that can be viewed or printed, and the ability to archive and index text under specific conditions…
The BRR will be key to determining what rights are available, ie what can be used and how it can be used. Google has a major advantage that it can use to promote the sales and marketing of books, transforming the book business. It would be very difficult for anyone to compete with Google… they would have to seek agreement with authors, and would not receive terms and conditions any better than Google’s (‘most favoured nation’ status). It is not in the parties’ interests to favour OA or CC licensing, which would undermine the commercial model.
Reasons why opting out is problematic: the BRR and Google will benefit enormously from non-display uses to analyze the market and set prices accordingly. Huge impact on discoverability, and on ILL, but no ILL is permitted on digital books (only on originals).
Various applications now available… Can we leverage SFX, eg create portfolios of subject area collections within Book Search and create targets?
Pricing is supposed to be comparable to existing pricing…but there is nothing comparable. Fully Participating Libraries will have their costs subsidized based on the scale of works that have been digitized…U Michigan gets free access for 25 years.
Conflict with privacy…how can we address if we don’t have representation on the board of the BRR?
XML-based content architecture will be important for incorporating content, metadata and rights in a consistent and scalable manner; requires new processes and skill sets to