This document discusses HathiTrust's efforts to review the copyright status of books scanned from academic libraries. It describes two grants from the Institute of Museum and Library Services (IMLS) that supported expanding the copyright review process. The first grant involved four partners who reviewed 170,000 works, identifying 87,000 in the public domain. The second grant added international partners, bringing the total works reviewed to over 200,000, with over 80,000 identified as public domain. The document then outlines five blocks or steps for effective inter-institutional collaboration on copyright review, including onboarding workers, meeting institutional requirements, training, technical support, and tracking work.
3. What is Copyright Review?
• HathiTrust has scanned books from academic libraries
• Scans are reviewed for copyright status
• Public Domain materials are opened to view by anyone
• Thank you IMLS!
4. IMLS Grant Partners
IMLS 1 – US Works – 2008-2012
• Four partners: Michigan, Wisconsin, Minnesota, Indiana
• 12 reviewers
• 170,000 works reviewed at end of grant 1
• 87,000 public domain works identified at end of grant 1
IMLS 2 – 2012-2014
• Adds works from Canada, UK, Australia
• 44 reviewers from 19 institutions
• 58,347 non-US works reviewed
• 41,805 non-US public domain works identified
11. Block 4: Technical support and shared tools
• A single unified web-accessible interface
• Locking in security
• Developer and IT support
• Culture of progressive change and improvement
13. Block 5: Tracking the work
• Ongoing feedback
• Remedial communication and re-training
• Incentives to improve
• Letting participants see progress
16. Lessons Learned
• Budget for rewards and celebrations
• Be clear and firm about the contributions, skills and qualities you want in staff.
• Get local supervisors on your side.
• Procedural training works best when formally administered and monitored from a central location.
• Little things can look big from a distance.
In 2004, the University of Michigan and several other large academic libraries made a deal with Google—scan our books and you can have a copy of the scans, and we can have a copy of the scans. In the years since then, that agreement has grown into the HathiTrust, a collaborative repository that holds nearly 11 million digital files contributed by 77 institutions.
But just having the scans doesn’t mean that we can deliver them to users. Many of the materials scanned are still protected by copyright. Sorting out which materials can be viewed and which have to be kept in dark archives is a monumental task.
In 2007, the University of Michigan and several partners applied for an Institute of Museum and Library Services Leadership grant that set a goal of identifying 60,000 US-published public domain works within a three-year period. The grant was awarded in 2008 and completed in November 2012 with over 170,000 books reviewed and 87,000 determined to be in the public domain. A second National Leadership grant was awarded for 2012-2014 to continue the US work and additionally to look at works published in the United Kingdom, Canada, and Australia. As I wrote these words in December 2013, we were nearing the end of the bulk of US copyright evaluation, with 298,623 items having been reviewed under both grants and only 4,944 remaining in the evaluation queue. As a result of the work to date, 155,423 US books published between 1923 and 1963 have been made available to anyone who chooses to view them. Managing the work of so many people to meet the requirements of legal scrutiny took a great deal of planning and organization. We hope this report of our experiences will prove useful to others who envision large-scale projects.
The evaluation process is deliberately designed to assure random assignment of work to reviewers and to provide two sets of eyes for each item as an error-avoidance, due diligence safety net. Each item is reviewed twice by different reviewers. If both reviewers agree on the copyright status, then the status is recorded as a code in the Rights Database that manages access to materials in HathiTrust. If the reviewers do not agree, then the item conflict is examined by a third, expert, reviewer who makes a final determination. Records are kept of each evaluation and who made it.
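The double-review rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the actual CRMS code: the names `Review`, `pick_item`, and `resolve`, and the status codes "pd" and "ic", are hypothetical and do not come from the Rights Database.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Review:
    reviewer: str
    status: str  # hypothetical rights code such as "pd" or "ic"


def pick_item(queue: List[str], reviewer: str,
              reviews: Dict[str, List[Review]]) -> Optional[str]:
    """Randomly pick an item that still needs a review from this reviewer."""
    eligible = [
        item for item in queue
        if len(reviews.get(item, [])) < 2
        and all(r.reviewer != reviewer for r in reviews.get(item, []))
    ]
    return random.choice(eligible) if eligible else None


def resolve(first: Review, second: Review,
            expert: Optional[Review] = None) -> Optional[str]:
    """Return the status to record, or None while a conflict awaits an expert."""
    if first.reviewer == second.reviewer:
        raise ValueError("the two reviews must come from different reviewers")
    if first.status == second.status:
        return first.status   # both reviewers agree: record the shared status
    if expert is None:
        return None           # conflict: nothing is recorded yet
    return expert.status      # the expert makes the final determination
```

In this sketch a reviewer is never offered an item they have already reviewed, and a disagreement yields no recorded status until an expert review is supplied.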
Who makes a good worker, and how can you identify and nurture that from a distance? We found one key factor for productivity is percent effort. We saw the best performance at a range of 15-25% FTE. Above that there are conflicts with other duties. Below that there are difficulties maintaining skills. We also found that School of Information grad students hired solely for the purpose of doing reviews, and catalogers with strong accuracy and production habits, were the most useful for our particular purposes. There is definite value in thinking about what qualities are needed for the task you have in mind and asking for the appropriate people, even if you are asking for volunteers. This is also a matter of respect for workers: it avoids wasting the time of someone who happens to be available at a particular time or institution but is not truly suited to the work. Because you have not observed people at work, you are often dependent on local administrators selecting workers for you. It is important to be very clear about what you are looking for and why. And each institution needs to understand the commitment that is being asked of its workers and be prepared to give them credit for the work.
A variety of communication is needed: group phone calls, email, chat, online documentation, training slides and videos. Different participants will prefer and respond to different things, and you will have limited feedback to determine whether or not one particular communication form is working for someone. Shared values need to be explicit and expressed. You can’t assume people will pick this up from the context when they have little or no context to observe.
For example, this is part of a value presentation that is included in CRMS-US orientation. It is on screen while the following statement is read: It is vitally important to the success of HathiTrust that you use your privilege of viewing all unreleased texts responsibly. To do this work, you as a reviewer will have access to works that are in copyright. You must use this access only for the purposes of review and not share your privilege with anyone else. . . . Our only protection from infringement of copyright is that we view potentially protected texts solely for the purposes of determining if they are in the public domain. If we use them for any other purpose and they turn out to still be protected, we are breaking the law. If we are careless with our access and someone else uses our login to steal and reprint protected items, we can invalidate all the good work that has been done in releasing material to the public. We are placing a heavy responsibility on you to be careful in using the access you are being given and to take the work you do seriously. You will be operating on the front lines of copyright controversy and what you do can affect how much material the world is allowed to view for generations to come.
In some cases, institutions we worked with established a higher-level administrator as a liaison. While it is important to have higher-level buy-in, for day-to-day purposes we found that workers’ direct supervisors were better able to make arrangements and to respond to our needs. Among other things, involving direct supervisors means that the time commitment is clear. It also means that the work can be part of a worker’s performance evaluation without elaborate arrangements. Communication with the supervisor means that he or she is aware of what and how workers are doing. Supervisors can apply pressure to improve performance and oversee remedial activities. They can directly compliment those whose work is good. They can also inform us when staff members are shifted or reassigned. It is equally important to communicate up front just what will be required from participating institutions. For example, because our workers must have access to copyright-protected works during evaluation, security of the workstation was vital. Users had to work from specially set up non-public workstations with up-to-date security software and stable, single-user IP addresses. Institutions need to know if they will be asked to make any investments in equipment or software and what will be asked of them in the way of network and IT support. While our case for legal indemnification was a particularly strong one, it is a good idea to consider a written agreement about responsibility in any collaborative digitization endeavor: who is responsible if an item is damaged or lost, for example. The University of Michigan accepted the responsibility for defending our decisions in court if necessary, and that needed to be formally recognized with legal documentation for all participants. We had to make it clear who would be responsible if a mistake was made and who would be liable if a suit were instituted.
Because Michigan assumed this responsibility, we had to insist on the use of Michigan’s established criteria for public domain determinations since they were the ones legally responsible for the results. It is important for a digital project to have these kinds of decisions made up front and documented and to have them clearly understood by all participants.
Copyright and, we suspect, other procedural training works best when formally administered and monitored from a central location. Misconceptions or gaps in knowledge are more common with decentralized training processes where multiple trainers approach the training differently. Training was best provided by a trainer who was also an active worker. We found it essential for instructors to have continuing and up-to-date practical experience with the review process in order to train other reviewers. Our efforts at train-the-trainer arrangements were not successful in producing a consistent, legally supportable product. Immediate and direct feedback is also important in developing the knowledge base of a new reviewer. It encourages reviewers to ask more questions and helps reviewers refine their model for thinking about complex copyright questions. We also had to be sure to communicate any changes in policy and practice to all participants in a timely, uniform and complete way.
This is an example of one piece of our documentation. To assure consistency we created a decision tree that enforced a particular sequence of decisions to ensure that all reviewers were operating with the same data at the same point in the workflow. We made sure it was kept up to date in a uniformly accessible and labeled version.
Because users all share the same interface, they are using the same resources in the same ways. They have access to the most up-to-date version of documentation, they search the same databases and view the same images. This produces the consistency necessary when people are widespread. It also means that items may be served up in a random process that shares the work evenly depending on who is using the interface at a given time. With workers using a variety of browsers and versions on a variety of networks, in several different time zones, any change can produce a bug that affects access to the interface. We found it necessary to keep a developer on staff to accommodate these changes as they occur. Any one of these bugs can halt production until it is fixed. Also, suggestions from workers for new features or improvements in display increase buy-in as well as improving the product. Questions from collaborators about whether our logic is calculating accurately are a second safety net.
This is a sample screen from the shared interface. All users see the same views, connect to the same databases for searches and see the same coding options.
Our system keeps a running total of decisions and their validations so that users can compare their time per review and accuracy to the average performance of all users. This gives them a sense of how well they are working and also allows supervisors to measure their efficiency. It pinpoints those who might need some remedial training as well.
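As a rough illustration of this kind of tracking, the sketch below keeps running totals per reviewer and computes a pooled average for comparison. It is not the project's actual code. The class and function names are hypothetical, and it assumes that a decision counts as "validated" when it matches the status finally recorded for the item.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class ReviewerTotals:
    decisions: int = 0     # running count of reviews submitted
    validated: int = 0     # assumed: reviews that matched the final status
    seconds: float = 0.0   # total time spent reviewing

    def record(self, seconds: float, validated: bool) -> None:
        """Add one completed review to the running totals."""
        self.decisions += 1
        self.seconds += seconds
        if validated:
            self.validated += 1

    @property
    def accuracy(self) -> Optional[float]:
        return self.validated / self.decisions if self.decisions else None

    @property
    def seconds_per_review(self) -> Optional[float]:
        return self.seconds / self.decisions if self.decisions else None


def group_average(
    totals: Iterable[ReviewerTotals],
) -> Optional[Tuple[float, float]]:
    """Pooled (accuracy, seconds per review) across all reviewers."""
    pooled = ReviewerTotals()
    for t in totals:
        pooled.decisions += t.decisions
        pooled.validated += t.validated
        pooled.seconds += t.seconds
    if pooled.decisions == 0:
        return None
    return pooled.validated / pooled.decisions, pooled.seconds / pooled.decisions
```

A reviewer's `accuracy` and `seconds_per_review` can then be shown next to the pair returned by `group_average`. Pooling over all decisions, rather than averaging per-reviewer rates, is a design choice of this sketch and weights busy reviewers more heavily.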
Participants can see directly how well they are doing and compare their work to the average rates. Supervisors can identify those who are doing less well and work with them to improve.
Users can view their mistakes and see what the correct decision should have been. We learned a lesson from this display: while this kind of feedback was useful, participants became upset when we first showed the error by displaying a big red X. They felt insulted and criticized and we were unable to soften the blow because we weren’t present when they encountered the feedback. We made the X smaller and pink and sent a message to everyone to reinforce the idea that feedback wasn’t intended to be criticism, but to be an opportunity to identify misconceptions that we needed to rectify. Users took this far more seriously and competitively than we had expected and we had to put in some work to tone the message down.
So what did we learn from our experience? With nothing in the grant budget, and nothing budgeted by participating institutions, for recognition of workers, we were hard put to provide a special long-distance celebration when we reached the end of the US evaluation queue. If we were to do it again, we would be sure that there was a way to provide a shared occasion for all participants that recognized their achievement beyond just an appreciative letter. We learned to specify what we needed in the way of time and working skills and to treat participants as genuine employees. We did best when direct supervisors were engaged and working with us. We found centralized training produced a more consistent product. And we learned that even the smallest detail can be upsetting when you are distant from those who are judging your work.
If you would like to learn more about HathiTrust in general or the copyright review project specifically, here are places to start.
And please feel free to contact me as well.
Again, we are clear and open about the principles upon which our process was designed and we state them directly as something that we value.
As a copyright reviewer, your task will be to apply carefully developed criteria to the volumes you examine in order to determine if they might be in the public domain. This work will be different from the kind of things you are accustomed to doing in several ways, but one significant way has to do with the law. The way the civil legal system works, what is acceptable or unacceptable is based on precedent, the previous interpretations of how laws should be applied. If there is no precedent for what you plan to do, you must take a risk, with the determination of whether or not you made the right decision based on any litigation that follows. The massive size of the HathiTrust, its relationship with Google and the new technology of scanning and delivery over the Internet mean there is limited legal precedent to define what we are allowed to do. The goal, then, is to do as much as we think we are permitted to do while limiting the risk of being sued. HathiTrust is taking on the risks of liability, but is placing strict controls on what you as reviewers are expected to do in order to control this risk. We call this risk management and have taken several steps to limit our exposure to lawsuits. Copyright laws are complicated. To minimize that complexity and to make our first steps as transparent to interested parties as possible, we chose to focus on the items that are most obviously in the public domain, that is, the “low hanging fruit.” This has given us a chance to develop understanding of what is involved and demonstrates to rights holders that we are not going after their livelihoods or profits. We exhibit good faith. We have a take-down policy that removes any item from public access if someone complains about it. The decision is then reviewed by a legal counsel to determine if a mistake was made.
We reduce the risk of this happening by asking that you take care in the work that you do. Mistakes do expose us. We apply what is called due diligence. We have two independent reviewers examine each item and only apply a decision when both agree. We are careful and give strong attention to the decision making process. We keep documentation of the criteria we use for review and stick to it. We keep records of every decision made, who made it and when it was made. This is not intended to intimidate you, but to make it clear that we all need to be mindful of our responsibility.