Slides accompanying a presentation at CTC2013 by Judges David Harvey and Daniel Garrie Esq on issues judges need to take into account when dealing with E-Discovery disputes
2. Judge David Harvey
LLB Auckland
M Jur Waikato
PhD Auckland
A Judge of the District
Court
Auckland
New Zealand
Judge David Harvey has been a judge of the district court in New Zealand for 25 years. He also teaches law and information
technology for the Faculty of Law, Auckland University, and has written a text on Internet and computer technology law
titled internet.law.nz, now in its 3rd edition. He is consultant editor to Butterworths Electronic Business and Technology
Law and a member of the editorial board for Butterworths Technology Law Forum. He has written extensively in the field of
law and technology and has presented a number of papers both in New Zealand and internationally on law and technology
matters.
He graduated with an LLB from Auckland University in 1969, MJur from University of Waikato in 1994, and PhD from
Auckland University in 2012. His doctoral dissertation was on the influence of a new technology (the printing press) on law
and legal culture in England in the Early Modern period.
Judge Harvey has an interest in the immediate and wider impact of technology on the law and legal culture. He has
co-written an article with Daniel Garrie on New Zealand’s new Discovery Rules and has delivered several presentations on the
subject.
3. Mr. Garrie is a Senior Managing Partner at Law & Forensics, an e-Discovery, cyber security, and electronic forensic consulting firm with offices
nationwide. Mr. Garrie is also General Counsel of Pulse Advisory, a venture development firm.
Mr. Garrie has served as an Electronically Stored Information Liaison, Neutral or Expert for the L.A. Superior Courts, 2nd Circuit, 3rd Circuit, 7th Circuit,
New York Supreme Court, and Delaware Supreme Court. In 2009, the Daily Journal recognized Mr. Garrie as a “Rising Star,” and in 2011 featured Mr.
Garrie as a Special Master and thought leader in E-Discovery. In addition, due to his outstanding reputation in the emerging industry of E-Discovery and
computer forensics, Mr. Garrie was one of a handful of individuals appointed to the E-Discovery Special Master Pilot Program for the U.S. District Court
of Western Pennsylvania out of a national pool of candidates.
Mr. Garrie is on the editorial board of the Journal of Legal Technology and Risk Management, Journal of Law & Cyber Warfare, and Beijing Law Review.
He has published over 90 articles spanning many topics. His articles have been featured in the University of San Diego Law Review, ABA International Law
Journal and Suffolk Law Review. Mr. Garrie also authored the textbook E-discovery and Dispute Resolution (2nd edition), published by Thomson Reuters in the
summer of 2013, and Software and the Law, published in the fall of 2013.
Mr. Garrie is admitted to practice law in Washington, New York, and New Jersey.
Daniel B. Garrie, Esq.
Chair E-Discovery Dispute Resolution Panel
Alternative Resolution Centers
Offices: Delaware, California, New Jersey, New York, Washington, and Brazil
Contact: W: (646) 738-0951 | M: (215) 280-7033 | (213) 784 – 0951
LinkedIn: http://www.linkedin.com/in/danielgarrie
Twitter: @dbgarrie
B.A., Computer Science, Brandeis Uni.
M.A., Computer Science, Brandeis Uni.
J.D., Rutgers School of Law
7. • The digital paradigm is so
revolutionary that it
undermines some of the
values and assumptions that
underlie traditional thinking
about documents.
• Instead we should be thinking
about “information”
9. Tangible Material
• Discovery and searches are
based on the quest for
information
• Information on paper – easy for a
reader to access that information
long after it was created
• Tangible – has a discrete physical
existence
10. • 1 hard drive + 12 monthly backups: 13 copies
• 3 internal recipients: 39 copies
• 5 drafts reviewed by recipients: 195 copies
• E-mail used to circulate drafts and final of the document: over 1,000 copies
Why is e-discovery so voluminous?
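The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration, assuming that each step multiplies the running count of copies; the variable names are hypothetical, and the slide does not state the multiplier behind the final "over 1,000" figure, so that step is left out.

```python
# Running count of copies of one document, following the figures on the slide.
# Assumption: each step multiplies the number of copies that already exist.
storage_copies = 1 + 12                  # 1 hard drive + 12 monthly backups
after_recipients = storage_copies * 3    # each copy set held by 3 internal recipients
after_drafts = after_recipients * 5      # 5 drafts reviewed by those recipients

print(storage_copies, after_recipients, after_drafts)
```

E-mail circulation of the drafts and the final version then pushes the total past 1,000, which is the figure the slide ends on.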
11. Nature of an “E-Document”
• An e-document is not
“out there somewhere”
like a book in a library
• An e-document is a
“process” whereby
unintelligible pieces of
data – distributed over a
storage medium – are
assembled, processed
and rendered legible
• As a single entity, the
document is “nowhere”
12. Technology and “Functional Equivalence”
• Involvement of
technology makes the e-
document
paradigmatically different
from hard copy.
• “Functional equivalence”
is used to bring cyber
searches into line with
hard copy searches.
• An electronic document
may be seen as
“functionally equivalent”
in the presentation of
information in readable
form.
14. What is E-Discovery?
• The methods by which the parties use electronic means to assist in
finding, identifying, locating, retrieving, reviewing, listing or
exchanging documents to satisfy discovery obligations
• Rules do not mandate the use of digital tools and methods to find,
identify, locate, retrieve or review documents
• But such tools and methods, when properly implemented, can lower
the monetary costs of the litigation and accord with cost and
proportionality principles
16. Federal Judicial Center Survey
• Survey found 72% of respondents met with opposing counsel to plan discovery, yet only 40% of those who
had a “meet-and-confer” discussed discovery of ESI, and only 60% discussed preservation
obligations.
• 86% of the conferences occurred by phone or videoconference, with 9% meeting in person, and
25% occurring by correspondence/email (multiple methods could be indicated)
• 73% of the respondents indicated that the meeting was completed in a half hour or less, with 19%
of those meetings lasting 10 minutes or less
• Only 25% discussed ESI issues and only 13% discussed preservation
Seventh Circuit E-Discovery Pilot Program-Phase Two Survey
• In cases “in which the Principles were perceived to have an impact, the consensus view among
attorneys appears to be that the Principles resulted in more discovery disputes, more discovery on
discovery, longer discovery periods, and greater expense for discovery and the litigation in general.”
Are Meet, Confer Efforts Doing More Harm Than Good?
17. • Examine Early Case Assessment that has been undertaken.
• What ESI retention policies are in place
• Consider validity and effectiveness of ECA search criteria.
• What technological solutions are proposed by the parties, and are they reasonable and
proportionate? Assessing this will require judicial understanding of the advantages and
disadvantages of the technology
“Once a party reasonably anticipates litigation, it must suspend its
routine document retention/destruction policy and put in place a
‘litigation hold’ to ensure the preservation of relevant documents.”
Zubulake v. UBS Warburg LLC, 220 F.R.D. 212, 218 (S.D.N.Y. 2003)
Preserve
18. • Critical examination of processes
undertaken and technologies
used to identify document
custodians.
Collection
Self-collection. The Fox Guarding the Hen House?
19. • Key word searching is a fairly blunt instrument
but may be useful for Early Case Assessment
• Key words create a black or white scenario
based upon whether or not a document
contains a word or does not
• The difficulty with key word searching is that it
may result in irrelevant documents being
identified because the key word selected may
have different meanings or context to what is
desired
• Ideally the construction of the search string or
key words should be discussed with other
parties so that the key words may be agreed
• Because of its limitations, key word searching
is not an ideal method of cutting and filtering
documents and other automated searches
may be preferable
Search – Keyword Searches
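The "black or white" behaviour of a keyword search can be illustrated with a short Python sketch. The three documents and the keyword "bank" are invented for illustration, and `keyword_hits` is a hypothetical helper rather than part of any review tool.

```python
import re

# Hypothetical mini-collection; the texts are invented for illustration.
documents = {
    "doc1": "The bank approved the loan facility.",
    "doc2": "We had lunch on the river bank.",
    "doc3": "The lender approved the credit line.",
}

def keyword_hits(docs, keyword):
    """Return ids of documents containing the keyword as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [doc_id for doc_id, text in docs.items() if pattern.search(text)]

# A document either contains the word or it does not; meaning plays no part.
print(keyword_hits(documents, "bank"))
```

Here doc2 is returned although it concerns a river bank, while doc3 is missed although it concerns lending. That is the limitation the bullets describe.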
21. How Search Works
• Build an Index
• 10-30% additional storage
• Static Copy
• Run once – search many
• Crawl/Streaming Text
• No storage
• Dynamic selection
• Full Text
• Boolean – Keywords
• Natural Language – hidden risks
• Expanded Words
• Synonyms, grouping, related words,
thesaurus
• Concept Clustering – folders v.
visual analysis
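The "build an index, run once, search many" idea can be sketched as a simple inverted index with a Boolean AND query. This is a minimal Python illustration under assumed sample data; `build_index` and `search_and` are hypothetical names, and commercial search engines are considerably more elaborate.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lower-cased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

def search_and(index, *words):
    """Boolean AND: ids of documents containing every query word."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()

# Invented sample documents.
docs = {
    "a": "Draft contract sent to counsel",
    "b": "Final contract signed",
    "c": "Lunch menu for Friday",
}
index = build_index(docs)                       # built once
print(search_and(index, "contract"))            # searched many times
print(search_and(index, "contract", "signed"))
```

The index costs extra storage, which is the trade-off the slide notes, but each later query is a set lookup rather than a fresh pass over every document.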
22. Involvement in Effective Keyword
Searching
• Cannot proceed from an uninformed
perspective.
• Examine the approach of the parties
• How did they go about keyword
selection and search construction
• Is the dispute about definitions of the
keyword search or something else
• If judge is required to adjudicate a
keyword dispute consider a mixed
process
• Sampling and testing followed by
• Manual review
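Sampling and testing of a disputed keyword might look like the following Python sketch: draw a random sample of the keyword's hits, have a reviewer mark each one, and report the share that was relevant. The function name, the `is_relevant` callback (standing in for a human reviewer) and the sample data are all assumptions made for illustration.

```python
import random

def estimate_precision(hit_ids, is_relevant, sample_size, seed=0):
    """Estimate the share of relevant documents among keyword hits from a random sample.

    is_relevant stands in for a human reviewer's call on each sampled document.
    """
    if not hit_ids:
        raise ValueError("no hits to sample from")
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    sample = rng.sample(hit_ids, min(sample_size, len(hit_ids)))
    relevant = sum(1 for doc_id in sample if is_relevant(doc_id))
    return relevant / len(sample)

# Invented example: 1,000 hits, of which the reviewer would accept every fourth one.
hits = list(range(1000))
estimate = estimate_precision(hits, lambda doc_id: doc_id % 4 == 0, sample_size=100)
print(f"Estimated share of relevant hits: {estimate:.0%}")
```

A low estimate suggests the keyword is too broad and should be refined before anyone undertakes full manual review of its hits.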
23. • De-duplication and Exclusion
• Concept Searching
• Clustering
• Document Prioritisation aka Predictive Coding
• E-mail Threading
• Near Duplicate Identification
• Native File Review
Technology and E-Discovery
24. De-duplication and Exclusion
• The process of identifying and removing duplicate
documents from a collection of documents so that
one unique copy of each document remains
• A cryptographic hash function such as the MD5
message-digest algorithm may be used to
generate a digital fingerprint for an electronic
document.
• The digital fingerprint of a document can then be
electronically compared against the digital
fingerprint of any other document to determine
whether the documents are exact duplicates
• De-duplication may also be performed by applying a
cryptographic hash function to a group of
documents
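The fingerprint comparison described above can be sketched with Python's standard `hashlib` module. The file names and contents are invented, and `fingerprint` and `deduplicate` are hypothetical helpers; MD5 is used because the slide names it, and another hash function from `hashlib` could be substituted.

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Return the MD5 hex digest of a document's bytes."""
    return hashlib.md5(content).hexdigest()

def deduplicate(documents):
    """Keep the first document seen for each fingerprint; return (unique, duplicates)."""
    seen = {}
    duplicates = []
    for name, content in documents:
        digest = fingerprint(content)
        if digest in seen:
            duplicates.append((name, seen[digest]))  # (duplicate, the copy it matches)
        else:
            seen[digest] = name
    return list(seen.values()), duplicates

# Invented sample documents.
docs = [
    ("report_v1.txt", b"Quarterly results attached."),
    ("copy_of_report.txt", b"Quarterly results attached."),
    ("report_v2.txt", b"Quarterly results attached, revised."),
]
unique, dupes = deduplicate(docs)
print(unique)
print(dupes)
```

Only byte-for-byte identical files share a fingerprint, so the revised version is kept as a separate document.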
25. De-duplication Problems
• In the paper world the process of
de-duplication required visually inspecting
documents
• Some lawyers are still using the same
practices that they used when reviewing
paper documents adding unnecessary cost
and burden to the discovery process
• It is not unknown for the document review
process to be carried out by printing out
hardcopies of all the electronic material
and then laboriously reading through
document by document to ascertain if
there were duplicates
26. Concept Searching
• Useful when large volumes have to be examined and the search
attempts to match results with the query conceptually
• Methodology is based not upon key words but upon the subject
matter of the document, paragraph or sentence
• Concept searching adds additional information to the very basic key
words as it evaluates both words and the context in which they
appear
27. Clustering
• Clustering identifies conceptually alike documents and breaks them up into groups of similar
documents. The groups are calculated from the mathematical
relationship between the text content of the documents.
• The advantage of this process is that similar issues can be
investigated at the same time instead of being encountered in different
documents scattered throughout the document review set.
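One way to picture the "mathematical relationship" between document texts is cosine similarity between word counts, followed by a simple grouping step. The Python sketch below is a toy under stated assumptions: the greedy grouping rule, the 0.5 threshold, the function names and the sample texts are all illustrative, not a description of how any particular product clusters.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Count the words in a text (lower-cased, letters only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.5):
    """Greedy grouping: join the first cluster whose first member is similar enough."""
    vectors = {doc_id: vectorize(text) for doc_id, text in docs.items()}
    clusters = []
    for doc_id in docs:
        for group in clusters:
            if cosine(vectors[doc_id], vectors[group[0]]) >= threshold:
                group.append(doc_id)
                break
        else:
            clusters.append([doc_id])  # nothing similar enough: start a new cluster
    return clusters

# Invented sample documents.
docs = {
    "lease1": "Warehouse lease expires in June",
    "lease2": "Warehouse lease renewal terms for June",
    "lunch1": "Team lunch is booked for Friday at noon",
    "lunch2": "Friday lunch booking confirmed for team",
}
groups = cluster(docs)
print(groups)
```

The grouping depends only on the words in each text, so the lease documents and the lunch documents can each be reviewed together.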
28. An Example
• Someone creates a Word doc, then prints it to PDF; another person
opens the PDF, cuts and pastes the text of the document into an email
and emails that to a third person.
• That person then prints the email, and a fourth person scans the
email to TIFF.
Cluster analysis could possibly put all of the files together in one cluster.
You’d have four types of files in the cluster (DOC, PDF, MSG, and TIFF),
all because the content is similar.
29. I took the same document and converted it into 6 different file types: MSG, TXT, DOC, PDF, RTF,
& XPS. The cluster analysis engine detected all 6 files and grouped them regardless of the
fundamentally different file types. Again our technology for clustering is all based on the
content. Notice the different file type icons in the similar panel.
Metadata can show the history of places the document has been stored. This example is
from a British dossier on Iraq’s security infrastructure and reveals that the
document was compiled by copying content from outside documents,
including one by a post-graduate student.
30. Near Duplicate Identification
• Not referred to in the New Zealand checklist
• Near duplicate technology identifies documents that have
similar content although not an exact duplicate
• The technology groups all of the near duplicates together
so they can be reviewed at the same time allowing the
reviewer to quickly focus on the differences and move
through the documents more quickly and accurately
• Email threading and near duplicate technology can be
used on paper documents as well as e-documents
• The accuracy for paper documents will depend upon
the quality of the text-searchable content produced by OCR
(optical character recognition) when the document is
scanned
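Near duplicate identification can be sketched with the `SequenceMatcher` class from Python's standard `difflib` module, which returns a similarity ratio between 0 and 1 for two texts. The 0.9 threshold, the function name and the sample texts are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def near_duplicates(docs, threshold=0.9):
    """Return pairs of ids whose texts are highly similar but not identical."""
    ids = list(docs)
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if docs[first] == docs[second]:
                continue  # exact duplicates are a hashing matter, not a near-duplicate one
            ratio = SequenceMatcher(None, docs[first], docs[second]).ratio()
            if ratio >= threshold:
                pairs.append((first, second))
    return pairs

# Invented sample documents; the two offers differ by a single character.
docs = {
    "offer_v1": "The parties agree to settle the claim for $5,000.",
    "offer_v2": "The parties agree to settle the claim for $6,000.",
    "note": "Lunch is at noon on Friday.",
}
pairs = near_duplicates(docs)
print(pairs)
```

Grouping the two offers lets a reviewer focus on the one figure that changed. With scanned paper, OCR errors would lower the ratio, which is why scan quality matters.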
31. Email Threading
• Many emails contain earlier messages and are constructed in the form
of a thread or a chain
• Email threading technology is essential to respond to the problems
caused by these chains
• By identifying the end point of the email chain, redundant emails do
not have to be reviewed
• Threading organises emails into conversations, revealing the context
of the communication and reducing review time by 50% or more
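Identifying the end point of a chain can be sketched as a containment test: an email is redundant if its whole text already appears inside another email. The Python below is a simplified illustration that assumes quoted text is reproduced verbatim; real mail clients add quote markers and headers, so actual threading tools do considerably more. The function name and sample messages are hypothetical.

```python
def inclusive_emails(emails):
    """Return ids of emails whose text is not wholly contained in another email.

    emails maps an id to the full body text, including any quoted earlier messages.
    """
    keep = []
    for email_id, body in emails.items():
        text = body.strip()
        contained = any(
            other_id != email_id and text in other_body and text != other_body.strip()
            for other_id, other_body in emails.items()
        )
        if not contained:
            keep.append(email_id)
    return keep

# Invented chain: each reply quotes everything beneath it.
emails = {
    "e1": "Can we meet Tuesday?",
    "e2": "Tuesday works.\n---\nCan we meet Tuesday?",
    "e3": "Booked room 4.\n---\nTuesday works.\n---\nCan we meet Tuesday?",
    "e4": "Invoice attached.",
}
to_review = inclusive_emails(emails)
print(to_review)
```

Only the last message of the chain and the unrelated email remain for review; the two earlier messages are read as part of the final one.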
33. Document Prioritisation or Predictive Coding
• May produce accurate results especially when there are large volumes
• An initial document set can be reviewed by someone knowledgeable about the
matter
• The same relevancy calls are then carried forward to the remainder of the
document set based on the results of the sample set
• The software then prioritises or ranks the remainder of the documents based on the
decisions made on the sample documents, which allows the most relevant documents
to be identified first
• An important feature is that the initial review must be carried out by someone with
an intimate knowledge of the case at hand
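The prioritisation step can be pictured with a deliberately crude Python sketch: words seen in documents the reviewer marked relevant gain weight, words seen in irrelevant ones lose weight, and unreviewed documents are ranked by the total weight of their words. This scoring rule is a stand-in chosen for brevity, not the statistical model any real predictive coding product uses; all names and sample texts are hypothetical.

```python
import re
from collections import Counter

def tokens(text):
    """Distinct lower-cased words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def train(seed_set):
    """Weight each word: +1 per relevant seed document containing it, -1 per irrelevant one."""
    weights = Counter()
    for text, relevant in seed_set:
        for word in tokens(text):
            weights[word] += 1 if relevant else -1
    return weights

def rank(weights, unreviewed):
    """Order unreviewed document ids from most to least likely relevant."""
    scores = {doc_id: sum(weights[w] for w in tokens(text)) for doc_id, text in unreviewed.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Invented seed set: the reviewer's calls on a handful of sample documents.
seed_set = [
    ("Pricing agreement with the distributor", True),
    ("Distributor rebate and pricing schedule", True),
    ("Office party on Friday", False),
    ("Friday parking arrangements", False),
]
unreviewed = {
    "d1": "Friday office lunch",
    "d2": "Revised distributor pricing",
    "d3": "Meeting notes",
}
ranking = rank(train(seed_set), unreviewed)
print(ranking)
```

Because every later ranking flows from the seed decisions, a poorly informed initial reviewer would skew the whole order, which is why the slide stresses intimate knowledge of the case.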
35. Measuring data retrieval
                 Responsive   Not Responsive
Retrieved            A              B
Not retrieved        D              C
Recall = A/(A+D) = (# of relevant docs retrieved) / (# of relevant docs in collection) = 8/10 = 80%
36. Measuring data retrieval
                 Responsive   Not Responsive
Retrieved            A              B
Not retrieved        D              C
Recall = A/(A+D) = (# of relevant docs retrieved) / (# of relevant docs in collection) = 8/10 = 80%
Precision = A/(A+B) = (# of relevant docs retrieved) / (# of docs retrieved) = 8/12 = 67%
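The recall and precision figures on this slide can be checked with a few lines of Python. A minimal sketch, assuming A = 8 relevant documents retrieved, B = 4 non-relevant documents retrieved (12 retrieved in total) and D = 2 relevant documents missed (10 relevant in total); the function names are hypothetical.

```python
def recall(a, d):
    """Share of all relevant documents that the search retrieved."""
    return a / (a + d)

def precision(a, b):
    """Share of retrieved documents that are relevant."""
    return a / (a + b)

A, B, D = 8, 4, 2  # counts implied by the slide's 8/10 and 8/12

print(f"Recall = {recall(A, D):.0%}")
print(f"Precision = {precision(A, B):.0%}")
```

Cell C (documents that are neither relevant nor retrieved) does not enter either formula.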
37. Measuring data retrieval
                 Responsive   Not Responsive
Retrieved            A              B
Not retrieved        D              C
Better recall means you discover more responsive docs
Better precision means you read less junk
39. “Technology-Assisted Review”
• Uses machine learning technologies
• Based on human review of a subset of the documents
• Categorizes documents as responsive or not to a given request
• Some tools rank or score documents by the likelihood they will be responsive
• Rankings can be used to partition the documents into categories: e.g. potentially
responsive or not; in need of further review or not; etc.
• Think of a spam filter that classifies e-mail into “ham,” “spam,” and “questionable”
• Contrasts with exhaustive manual review
40. Types of Machine Learning
• “Standard” Supervised Learning
• Human chooses the document exemplars (“seed set”) to feed to the system
• System ranks the remaining documents in the collection
• Ranking based on similarity (or dissimilarity) to exemplars (“find more like this”)
• Active Learning
• A variant of supervised learning
• System chooses the document exemplars to feed to the human
• Human makes responsiveness determinations
• System learns from these determinations and chooses next exemplars to maximize learning
• System applies what it has learned to the remaining documents in the collection
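The difference between the two approaches lies in who picks the next exemplar. In active learning the system picks it, and one common selection rule (an assumption here, since the slide does not name one) is uncertainty sampling: ask the human about the document the model is least sure of. A minimal Python sketch with invented scores and hypothetical function names:

```python
def next_exemplar(scores):
    """Pick the unreviewed document whose score is closest to 0.5 (the least certain one)."""
    return min(scores, key=lambda doc_id: abs(scores[doc_id] - 0.5))

def active_learning_round(scores, ask_reviewer):
    """One round: the system chooses a document, the human decides its responsiveness."""
    doc_id = next_exemplar(scores)
    label = ask_reviewer(doc_id)  # stands in for the human reviewer's determination
    return doc_id, label

# Invented model scores: estimated probability that each document is responsive.
scores = {"d1": 0.95, "d2": 0.52, "d3": 0.10}
chosen, label = active_learning_round(scores, ask_reviewer=lambda doc_id: True)
print(chosen, label)
```

In a full system the new label would be fed back into training and the scores recomputed before the next round. That retraining step is omitted here.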
41. Predictive Coding
Important Case Law
• Monique Da Silva Moore v. Publicis
Groupe SA, No. 11 Civ. 1279 (ALC)(AJP)
(S.D.N.Y. Apr. 26, 2012), approving use
of reliable predictive coding.
• In re Actos (Pioglitazone) Products
Liability Litigation, MDL No. 6:11-md-2299
(W.D. La. July 27, 2012), issuing
case management order allowing for
use of predictive coding.
• Kleen Products LLC v. Packaging
Corporation of America, No. 10 C 5711
(N.D. Ill. August 21, 2012), issuing
CMO reserving on question of use of
predictive coding.
• Automated process that culls
through electronic data and
focuses on non-keyword attributes
such as context or word frequency
• “Computer-assisted review is not a
magic, Staples-Easy-Button
solution appropriate for all cases.
The technology exists and should
be used where appropriate.” Da
Silva Moore v. Publicis Groupe et
al., No. 11 Civ. 1279 (S.D.N.Y.
February 24, 2012)
42. Predictive Coding in Practice
Most expensive component of any document
production remains the attorney review
Keyword search: an imperfect state of the art
Is there a better culling tool?
43. Native File review
• Allows lawyers to view documents in the format in
which they were intended to be viewed
• Spreadsheets and databases may only be able to
be accurately assessed in their native
applications. Native review can yield considerable cost
savings
• Converting all documents to PDF prior to the
document review (rather than after it) will usually
add unnecessary expense to the discovery process
• It will usually be more efficient to review
documents in their native file format and then
only convert the relevant documents to PDF for
the electronic exchange of documents
45. Knowing Saves Time
Ensure that both sides know
something about their clients’
systems such as:
• Know and verify how to manage information
• Know what systems may be impacted
• Know what systems are involved
• Bring technical documents including data map
46. Focus on the Facts and Issues
• Focus on the e-discovery
facts, not the issues
• Think in terms of
technical specifications
47. Remember to
educate and listen, as
counsel and parties
are not likely to be
technology or
e-discovery gurus.
Do Not Rush… walk before you run
48. • There are ways of avoiding one, utilising conferencing, careful case management,
scheduling conferences and timetabling
• Judicial Activism
Try to avoid a disputed E-Discovery Hearing
49. • Use the case management conference actively
to superintend the discovery process
• Encourage narrow targeting of requests for ESI.
• Consider imposing limits on E-Discovery.
• Consider sampling to determine relevance,
need and cost of more expansive discovery.
• Develop procedures for production of
information in usable form.
• Develop procedures to deal with inadvertent
disclosure of privileged material.
• Consider cost shifting if the information sought
is not reasonably accessible – may require a
consideration of document storage and
retention policies of the party in question.
Using the Case Management Conference
50. • Always encourage co-operation and continually
remind the parties of the necessity for
reasonableness and proportionality.
• The obligation to co-operate should be an on-
going requirement.
• Because E-Discovery is process driven, it is
important to co-operate at all stages of the
process, especially when constructing keyword searches
or using predictive coding.
Emphasise Co-Operation
51. • Is the scope of discovery reasonable in the context of the case?
• Is the scope of discovery proportional to the matters at issue?
Use reasonableness and proportionality as a yardstick to
measure stances of counsel on E-Discovery issues.
52. What is Proportionality?
• The relationship between cost
and value in the proceedings
• Is the extent or manner of the
discovery sought justified by
the amount and matters at
issue in the proceeding?
53. • New Zealand and England use variants of a checklist or questionnaire.
• The checklist provides a very useful roadmap to assist parties to co-
operate over how discovery will be conducted.
• The checklist establishes a framework to assess a proportionate and
reasonable search for documents tailored to suit the requirements of
each matter.
• All of these discussions must take place prior to the first case
management conference.
• The Checklist can be used by counsel and the Judge
Use a Checklist as a Guide
54. The New Zealand Checklist
The checklist itself highlights ways to reduce some of the listing and
exchange costs -
“to reduce unnecessary costs of listing documents parties are
encouraged to:
a) Use native electronic versions of documents as much as possible; and
b) Use the extracted metadata from native electronic documents instead of
manually listing documents; and
c) Convert documents to image format only when it is decided they are to be
produced for discovery; and
d) If document images are to be numbered, only number those images if they
are to be produced for discovery.”
55. • Focus and reduce the issues to be determined within the framework
of the pleadings.
• Proper Case Management will distil the main areas of dispute
• May only be about a technological method or the scope of discovery
of a class of documents
Worst Case Scenario – A Disputed Hearing
56. Looking Ahead
• Particular methods of discovery will depend upon the
case in hand
• Different products may be more relevant to the
different parts of the discovery process
• Lawyers and Judges are going to have to become
intimately aware of the technologies that are
available and of the technological processes that can
underlay the discovery process if the advantages of
cost reduction and proportionality that underlie the
rules are to be achieved
Editor’s notes
It is implicit that all parties, including the judicial officer, should have knowledge of the benefits, advantages and disadvantages of the various document sorting and document review technologies that are available. At times, depending upon the nature of the documents and their extent, such knowledge or awareness is going to have to be detailed and specific. This may have an impact upon a tailored discovery order.
Parties should be prepared to provide greater information if there is a dispute about proportionality. There is a difference between proportionality and taking ‘short-cuts’. What is the cost and time that make the approach disproportionate? Lawyers sometimes say they did not have time to do all of this work prior to discussing with the other party. They usually just say it is not proportionate as an excuse.