November 2012




E-discovery — Taking Predictive Coding Out of the Black Box

Joseph H. Looby
Senior Managing Director, FTI Technology




IN CASES OF COMMERCIAL LITIGATION, the process of discovery can
place a huge burden on defendants. Often, they have to review millions
or even tens of millions of documents to find those that are relevant to a
case. These costs can spiral into millions of dollars quickly.

Manual reviews are time consuming and very expensive. And the computerized alternative — keyword search — is too imprecise a tool for extracting relevant documents from a huge pool. As the number of documents to be reviewed for a typical lawsuit increases, the legal profession needs a better way.




Predictive discovery is that approach. Predictive discovery is a machine-based method that rapidly can review a large pool of documents to determine which are relevant to a lawsuit. It takes attorneys’ judgments about the relevance of a sample set of documents and builds a model that accurately extrapolates that expertise to the entire population of potential documents. Predictive discovery is faster, cheaper and more accurate than traditional discovery approaches, and this year courts in the United States have begun to endorse the use of this method. But according to a survey FTI Consulting recently conducted, a significant obstacle stands in the way of widespread adoption: most attorneys do not understand how predictive discovery works and are apprehensive about using it.

In this article, we explain how predictive discovery functions and why attorneys should familiarize themselves with it.




Companies defending themselves against litigation frequently must produce documents that may be relevant to a case by searching through archives. The same is true for companies being investigated by a regulatory body. The process of plumbing one’s own archives to produce relevant documents is known as discovery. When the documents are in electronic form, the process is called e-discovery. Large companies today often must review millions or even tens of millions of letters, e-mails, records, manuscripts, transcripts and other documents for this purpose.

The discovery process once was entirely manual. In the 1980s, the typical commercial lawsuit might have entailed searching through tens of thousands of documents. A team of attorneys would pore through all the paper, with perhaps one or two subject matter experts to define the search parameters and oversee the procedure. A larger number of more junior lawyers frequently would do the bulk of the work. This process (known in the trade as linear human or manual review) would cost a few thousand dollars (today, it runs about $1 per document).

But now, the large number of documents to be reviewed for a single case can make manual searches a hugely expensive and time-consuming solution. For a case with 10 million documents, the bill could run about $10 million — a staggering sum. Conducting keyword searches of online documents, of course, is much faster. But in certain circumstances, such searches have been shown to produce as little as 20 percent of the relevant documents, along with many that are irrelevant.

With the number of documents in large cases soaring (we currently have a case involving 50 million documents), costs are getting out of hand. Yet the price of failing to produce relevant documents can be extraordinarily high in big-stakes litigation. The legal profession needs a new approach that improves reliability and minimizes costs.



There now is a third approach. Predictive discovery takes expert judgment from a sample set of documents relevant (or responsive) to a case and uses computer modeling to extrapolate that judgment to the remainder of the pool. In a typical discovery process using predictive coding, an expert reviews a sample of the full set of documents, say 10,000-20,000 of them, and ranks them as responsive (R) or non-responsive (NR) to the case. A computer model then builds a set of rules that reflects the attorney’s judgment on the sample set. Working on the sample set and comparing its verdict on each document with the expert’s, the software continually improves its model until the results meet the standards agreed to by the parties and sometimes a court. Then the software applies its rules to the entire population of documents to identify the full responsive set.

Studies show that predictive coding usually finds a greater proportion — typically in the range of 75 percent (a measure known as recall) — of the responsive documents than other approaches. A seminal 1985 study found keyword searches yielded average recall rates of about 20 percent.[1] A 2011 study showed recall rates for linear human review to be around 60 percent.[2] And, according to these same studies, the proportion of relevant documents in the pool of documents identified by predictive coding (a measure known as precision) is better, too.

The cost of predictive discovery is a fraction of that for linear manual review or keyword search, especially for large document sets, because most of the expense is incurred to establish the upfront search rules. Afterwards, the cost increases only slightly as the size of the collection grows.

In a recent engagement, we were asked to review 5.4 million documents. Using predictive discovery, we easily met the deadline and identified 480,000 relevant documents — around 9 percent of the total — for about $1 million less than linear manual review or keyword search would have cost.

Definition: Trial and error — the process of experimenting with various methods of doing something until one finds the most successful.




So how does predictive discovery work? As suggested above, there are five key steps:

1. Experts code a representative sample of the document set as responsive or non-responsive.

2. Predictive coding software reviews the sample and assigns weights to features of the documents.

3. The software refines the weights by testing them against every document in the sample set.

4. The effectiveness of the model is measured statistically to assess whether it is producing acceptable results.

5. The software is run on all the documents to identify the responsive set.

1. “An Evaluation of Retrieval Effectiveness for a Full-Text Document Retrieval System,” by David C. Blair of the University of Michigan and M.E. Maron of the University of California at Berkeley, 1985.

2. “Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient than Exhaustive Manual Review,” by Maura R. Grossman and Gordon V. Cormack, Richmond Journal of Law and Technology, Vol. XVII, Issue 3, 2011.



Let’s look at these steps in more detail.




1. Experts code a representative sample of the document set as responsive or non-responsive.

In a typical case, a plaintiff might ask a company to produce the universe of documents regarding some specifics about Project X. The company prepares a list of its employees involved in the project. From all the places where teams of employees working on Project X stored documents, the firm secures every one of the potentially relevant e-mails and documents. These can easily total 1 million or more items.

These documents are broken into two subsets: the sample or training set and the prediction set (i.e., the remainder). To have a high level of confidence in the sample, the training set must be around 17,000 documents. One or more expert attorneys codes the training set’s documents as R or NR with respect to the litigation.

The results of this exercise will predict with reasonable accuracy the total number of Rs in the full set. A statistically valid random sample of 17,000 documents generally is sufficient to ensure that our answer will be accurate to +/-1 percent at the 99 percent confidence level. For example, if the number of R documents in the sample set is 3,900 (23 percent of 17,000), it is 99 percent certain that between 22 and 24 percent of the 1 million documents are responsive.

[Figure: A 1,000,000-document collection is split into a 17,000-document sample training set and a 983,000-document prediction set. Attorney expert review of the training set codes 3,900 documents responsive (23%) and 13,100 non-responsive. Estimate of goal: we are 99% confident that 23% (+/-1%) of the 1,000,000 documents are responsive.]
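That +/-1 percent claim is the standard normal-approximation margin of error for a sample proportion. A quick sketch (ours, not from the article) checks the arithmetic; the result, about +/-0.8 percent, rounds to the +/-1 percent quoted above:

```python
import math

def margin_of_error(p: float, n: int, z: float = 2.576) -> float:
    """Margin of error for a sample proportion at 99% confidence
    (normal approximation; z = 2.576 for the 99% level)."""
    return z * math.sqrt(p * (1 - p) / n)

# 3,900 responsive documents in a 17,000-document random sample
p_hat = 3_900 / 17_000                   # ~23% responsive
moe = margin_of_error(p_hat, 17_000)
print(f"{p_hat:.0%} +/- {moe:.1%}")      # 23% +/- 0.8%
```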




2. Predictive coding software reviews the sample and assigns weights to features of the documents.

For the sample set of documents, the software creates a list of all the features of every document in the sample set. Features are single words or strings of consecutive words; for example, up to three words long. The sentence “Skilling’s abrupt departure will raise suspicions of accounting improprieties and valuation issues ...,” for instance, yields 33 features, as shown below.

The total number of features in a set of 1 million documents is huge, but even large numbers of features easily can be processed today with relatively inexpensive technology.

Each feature is given a weight. If the feature tends to indicate a document is responsive, the weight will be positive. If the feature tends to indicate a document is non-responsive, the weight will be negative. At the start, all the weights are zero.

[Figure: An example e-mail (From: Sherron Watkins; To: Kenneth Lay; Sent: August 2001) is processed in three steps: extract machine-readable text; break the stream of text into tokens, removing whitespace and punctuation and normalizing to lowercase; and convert the tokens into a feature set.]

Example feature set (33 features):
Unigrams (1-12): skillings, abrupt, departure, will, raise, suspicions, of, accounting, improprieties, and, valuation, issues
Bigrams (13-23): skillings abrupt, abrupt departure, departure will, will raise, raise suspicions, suspicions of, of accounting, accounting improprieties, improprieties and, and valuation, valuation issues
Trigrams (24-33): skillings abrupt departure, abrupt departure will, departure will raise, will raise suspicions, raise suspicions of, suspicions of accounting, of accounting improprieties, accounting improprieties and, improprieties and valuation, and valuation issues
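The feature-making pipeline in the figure is ordinary n-gram extraction. A minimal sketch under that reading (our code, not FTI’s) reproduces the 33 features from the example sentence:

```python
import re

def make_features(text: str, max_n: int = 3) -> list[str]:
    """Tokenize (lowercase, strip punctuation) and emit every
    unigram, bigram and trigram of consecutive tokens."""
    tokens = re.sub(r"[^a-z\s]", "", text.lower()).split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]

sentence = ("Skilling's abrupt departure will raise suspicions "
            "of accounting improprieties and valuation issues")
features = make_features(sentence)
print(len(features))   # 33 = 12 unigrams + 11 bigrams + 10 trigrams
print(features[0], "|", features[12], "|", features[23])
# skillings | skillings abrupt | skillings abrupt departure
```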




3. The software refines the weights by testing them against every document in the sample set.

The next step is one of trial and error. The model picks a document from the training set and tries to guess if the document is responsive. To do this, the software sums the importance weights for the words in the document to calculate a score. If the score is positive (+), the software guesses responsive, or R — this is the trial.

If the model guesses wrong (i.e., the model and attorney disagree), the software adjusts the importance weights of the words recorded in the model’s weight table. This, of course, is the error. Gradually, through trial and error, the importance weights become better indicators of responsiveness. Pretty soon, the computer will have created a weight table with scores of importance and unimportance for the words and phrases in the training set, an excerpt of which might look like this:

Feature                        Weight
will raise                     -0.6
suspicions                      0.7
of                              -
improprieties and valuation     1.8

As the software reviews more documents, the weight table becomes an increasingly better reflection of the attorney expert’s judgment. This process is called machine learning. The computer can take several passes through the sample set until the weight table is as good as it can be based on the sample.

Once the weight table is ready, the program is run on the full sample to generate a score for every document according to the final weight table. The machine score for a document is the sum of the weights of all its features. For example, one document might score -5.3 (probably non-responsive) and another +8.7 (probably responsive).

Then, if predictive coding is to be used to select the highly responsive documents from the collection of 1 million and to discard the highly non-responsive documents, a line has to be drawn. The line is the minimum score for documents that will be considered responsive. Everything with a higher score (i.e., above the line), subject to review, will comprise the responsive set.

In our 17,000-document sample, of which the expert found 3,900 to be R, there might be 7,650 documents that scored more than zero. Of these, 3,519 are documents the expert coded R, and 4,131 are not. Therefore, with the line drawn at 0, we will achieve a recall of 90 percent (3,519 divided by 3,900) and a precision of 46 percent (3,519 divided by 7,650).

[Figure: Attorney expert review (3,900 responsive; 13,100 non-responsive; 17,000 total) vs. the computer model with the line at >0. The model finds 7,650 documents: 3,519 true positives and 4,131 false positives; the responsive documents scoring below the line are false negatives. Recall: 90 percent; precision: 46 percent.]
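The trial-and-error loop described above matches a perceptron-style update rule; the sketch below makes that reading concrete. The step size, the toy documents and the set-of-features encoding are our illustrative assumptions, not FTI’s actual model:

```python
def train(sample: list[tuple[set[str], bool]],
          passes: int = 10, step: float = 0.1) -> dict[str, float]:
    """Perceptron-style trial and error over the expert-coded sample.

    Guess R when the summed feature weights are positive (the trial);
    when the guess disagrees with the expert, nudge every feature
    weight toward the expert's answer (the error)."""
    weights: dict[str, float] = {}      # all weights start at zero
    for _ in range(passes):             # several passes over the sample
        for features, expert_says_r in sample:
            score = sum(weights.get(f, 0.0) for f in features)
            if (score > 0) != expert_says_r:
                delta = step if expert_says_r else -step
                for f in features:
                    weights[f] = weights.get(f, 0.0) + delta
    return weights

# Toy sample: (document features, did the expert code it responsive?)
sample = [
    ({"improprieties and valuation", "suspicions"}, True),
    ({"will raise", "lunch menu"}, False),
]
print(train(sample))   # positive weights on the R-indicating features
```

Because the score is just a sum of weights, scoring the prediction set later is the same one-line addition, which is exactly the simplicity the article emphasizes.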




4. The effectiveness of the model is measured statistically to assess whether it is producing acceptable results.

By drawing the line higher, one can increase precision at the expense of recall. For example, if the line is drawn at +0.3, the precision may be significantly higher (say 70 percent) but with lower recall (i.e., 80 percent, because some of the Rs between 0 and +0.3 are missed). On the basis of the sample set, the software can determine recall and precision for any line. Typically, there is a trade-off; as precision goes up (and the program delivers fewer NR documents), recall declines (and more of the R documents are left behind). However, not only are recall and precision scores higher than for other techniques, the trade-off can be managed much more explicitly and precisely.

[Figure: Attorney expert review (3,900 responsive; 13,100 non-responsive; 17,000 total) vs. the computer model with the line at >0.3. The model finds 4,450 documents: 3,125 true positives and 1,325 false positives. Recall: 80 percent; precision: 70 percent.]

Recall measures how well a process retrieves relevant documents. For example, 80% recall means a process is returning 80% of the estimated responsive documents in the collection:

80% recall = 3,125 true positives (TP) / (3,125 TP + 775 false negatives (FN))

Precision measures how well a process retrieves only relevant documents. For example, 70% precision means for every 70 responsive documents found, 30 non-responsive documents also were returned:

70% precision = 3,125 true positives (TP) / (3,125 TP + 1,325 false positives (FP))
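Recall and precision at any line can be computed directly from the confusion counts. A short sketch using the counts from the two illustrations reproduces both operating points:

```python
def recall_precision(tp: int, fn: int, fp: int) -> tuple[float, float]:
    """Recall = TP / (TP + FN); precision = TP / (TP + FP)."""
    return tp / (tp + fn), tp / (tp + fp)

# Line at >0:   3,519 of 3,900 expert Rs found; 4,131 NRs swept in
r0, p0 = recall_precision(tp=3_519, fn=381, fp=4_131)
print(f"line at >0:   recall {r0:.0%}, precision {p0:.0%}")   # 90%, 46%

# Line at >0.3: 3,125 Rs found, 775 missed; 1,325 NRs swept in
r3, p3 = recall_precision(tp=3_125, fn=775, fp=1_325)
print(f"line at >0.3: recall {r3:.0%}, precision {p3:.0%}")   # 80%, 70%
```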




5. The software is run on all the documents to identify the responsive set.

Once the trade-off between recall and precision has been made and the line determined, the full set of documents can be scored. The outcomes generally will be close to those predicted, although there usually is some degradation because the sample set never perfectly represents the whole.

As a final quality check, samples can be taken from both above and below the line to make sure results meet expectations. In most cases, samples of 2,000 are adequate, giving a +/-3 percent margin of error (at a 99 percent confidence level). Experts then review these random samples against the computer rating. They should confirm that approximately as many Rs occur above the line and as few beneath it as predicted.
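The +/-3 percent figure for the 2,000-document validation samples follows from the same proportion formula used for the training sample. A quick check, assuming the worst-case proportion p = 0.5 (our assumption; it maximizes the margin):

```python
import math

# Worst-case margin of error for a 2,000-document QC sample, 99% confidence
n, p, z = 2_000, 0.5, 2.576       # p = 0.5 gives the widest margin
moe = z * math.sqrt(p * (1 - p) / n)
print(f"+/- {moe:.1%}")           # +/- 2.9%, i.e. roughly +/- 3 percent
```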
                                              companies using the technology are largely     allowing it to do today.
                                              experimenting with it to cull and prioritize
                                              documents for manual review. 3                 To be sure, predictive discovery isn’t
                                                                                             right for every e-discovery process. Cases
Why This is Better than Keyword Search
                                              Why has the uptake been slow? Our              with many documents in which only a
Attorneys have been using search terms        study and experience suggest two               handful are expected to be responsive —
for e-discovery for many years. But the       primary reasons: 1) a reluctance to            searches for needles in a haystack — can
biggest drawback of search is that people     invest in something the courts might           be approached differently. Smart people,
determine the terms on which to search.       not support and 2) a widespread lack of        process, and a variety of technologies
A set of 1 million documents may have         understanding of how predictive coding         can address these varying and different
100,000 distinct words. People easily can     works. Two recent court cases are helping      research goals.
come up with some likely terms — for          eliminate the first concern:
example, the name of the project under        •	 Da Silva Moore: In February, both           But for most companies that have millions
investigation and the key players. But this      parties in this case agreed to              of documents to review in e-discovery,
is a primitive way of mining a complex set       e-discovery by predictive coding. This      predictive discovery is a lower-cost solution
of documents. Consequently, recall and           was the first validation of the use of      with greater reliability and a higher level
precision typically are low.                     predictive coding by a court.               of transparency that should be used more
                                              •	 Landow Aviation: In April, a Virginia       frequently than it is today.
With predictive coding, attorneys begin          state court judge permitted the
with no preconceptions about the most            defendant (over plaintiff’s objection)
                                                                                             3	      Advice from counsel: Can predictive coding
relevant words. They let a computer              to use predictive coding to search                  deliver on its promise? by Ari Kaplan of Ari Kaplan
figure that out, considering each word           an estimated 2 million electronic                   Advisors and Joe Looby of FTI Consulting, 2012.
and every combination of two and three           documents.
consecutive words without constraint.
Further, predictive coding can use the        The second concern, though, has yet to               Joe Looby
fact that certain words will indicate         be overcome. Our recent survey found                 Senior Managing Director, FTI
                                                                                                   Technology
against responsiveness. Processing            that one of the biggest impediments to
                                                                                                   joe.looby@fticonsulting.com
this volume of possibilities is much          the adoption of predictive coding is that
too big a manual task. But, of course,        many lawyers do not know how it works.
the procedure can be done easily (and
cheaply) by a computer. And it produces       As we have tried to show here, the                  For more information and an online version of
far better results than keyword search.       principles are basic; and we intentionally          this article, visit ftijournal.com.
                                              keep the computer model’s math simple,
                                                                                             The views expressed in this article are those of the author and not
                                                                                             necessarily those of FTI Consulting, Inc., or its other professionals.



                                                                                             ©2012 FTI Consulting, Inc. All rights reserved.

Más contenido relacionado

Similar a Fti Journal Predictive Discovery

, eDiscovery: Why Humans Are Better Than Algorithms; ACC Docket May 2010
, eDiscovery: Why Humans Are Better Than Algorithms; ACC Docket May 2010, eDiscovery: Why Humans Are Better Than Algorithms; ACC Docket May 2010
, eDiscovery: Why Humans Are Better Than Algorithms; ACC Docket May 2010Susan Burns
 
eDiscovery, Humans and Algorithms
eDiscovery, Humans and AlgorithmseDiscovery, Humans and Algorithms
eDiscovery, Humans and AlgorithmsRQRusseth
 
Epiq E Discovery Faq Hong Kong
Epiq E Discovery Faq Hong KongEpiq E Discovery Faq Hong Kong
Epiq E Discovery Faq Hong KongDmitriHubbard
 
The Investigative Lab - Nuix
The Investigative Lab - NuixThe Investigative Lab - Nuix
The Investigative Lab - NuixNuix
 
The Investigative Lab - White Paper
The Investigative Lab - White PaperThe Investigative Lab - White Paper
The Investigative Lab - White PaperNuix
 
Surviving Technology 2009 & The Paralegal
Surviving Technology 2009 & The ParalegalSurviving Technology 2009 & The Paralegal
Surviving Technology 2009 & The ParalegalAubrey Owens
 
Legal Technology 2011 and the Paralegal
Legal Technology 2011 and the ParalegalLegal Technology 2011 and the Paralegal
Legal Technology 2011 and the ParalegalAubrey Owens
 
Rapid dna -_disruptive new technology for criminal justice_rbj
Rapid dna -_disruptive new technology for criminal justice_rbjRapid dna -_disruptive new technology for criminal justice_rbj
Rapid dna -_disruptive new technology for criminal justice_rbjrbjamieson
 
Ten steps for early awareness and relevance/non-relevance selection
Ten steps for early awareness and relevance/non-relevance selectionTen steps for early awareness and relevance/non-relevance selection
Ten steps for early awareness and relevance/non-relevance selectionJeffJohnson442
 
Predict Conference: Data Analytics for Digital Forensics and Cybersecurity
Predict Conference: Data Analytics for Digital Forensics and CybersecurityPredict Conference: Data Analytics for Digital Forensics and Cybersecurity
Predict Conference: Data Analytics for Digital Forensics and CybersecurityMark Scanlon
 
Orange Legal Technologies Time, Risk, And Cost In E Discovery 021009
Orange Legal Technologies   Time, Risk, And Cost In E Discovery 021009Orange Legal Technologies   Time, Risk, And Cost In E Discovery 021009
Orange Legal Technologies Time, Risk, And Cost In E Discovery 021009kecurrey
 
Josh Moulin: Designing a Mobile Digital Forensic Lab on a Budget
Josh Moulin: Designing a Mobile Digital Forensic Lab on a BudgetJosh Moulin: Designing a Mobile Digital Forensic Lab on a Budget
Josh Moulin: Designing a Mobile Digital Forensic Lab on a BudgetJosh Moulin, MSISA,CISSP
 
Text mining on criminal documents
Text mining on criminal documentsText mining on criminal documents
Text mining on criminal documentsZhongLI28
 
Finding a Needle in an Electronic Haystack: The Science of Search and Retrieval
Finding a Needle in an Electronic Haystack: The Science of Search and RetrievalFinding a Needle in an Electronic Haystack: The Science of Search and Retrieval
Finding a Needle in an Electronic Haystack: The Science of Search and RetrievalRonaldJLevine
 
Judging E-Discovery Disputes
Judging E-Discovery DisputesJudging E-Discovery Disputes
Judging E-Discovery DisputesDavid Harvey
 
Defining a Legal Strategy ... The Value in Early Case Assessment
Defining a Legal Strategy ... The Value in Early Case AssessmentDefining a Legal Strategy ... The Value in Early Case Assessment
Defining a Legal Strategy ... The Value in Early Case AssessmentAubrey Owens
 
Waarom LegalTech de toekomst heeft
Waarom LegalTech de toekomst heeftWaarom LegalTech de toekomst heeft
Waarom LegalTech de toekomst heeftjcscholtes
 
Digital forensics research: The next 10 years
Digital forensics research: The next 10 yearsDigital forensics research: The next 10 years
Digital forensics research: The next 10 yearsMehedi Hasan
 
2014 Year-End E-Discovery Update
2014 Year-End E-Discovery Update2014 Year-End E-Discovery Update
2014 Year-End E-Discovery UpdateGareth Evans
 
Network and computer forensics
Network and computer forensicsNetwork and computer forensics
Network and computer forensicsJohnson Ubah
 

Similar a Fti Journal Predictive Discovery (20)

, eDiscovery: Why Humans Are Better Than Algorithms; ACC Docket May 2010
, eDiscovery: Why Humans Are Better Than Algorithms; ACC Docket May 2010, eDiscovery: Why Humans Are Better Than Algorithms; ACC Docket May 2010
, eDiscovery: Why Humans Are Better Than Algorithms; ACC Docket May 2010
 
eDiscovery, Humans and Algorithms
eDiscovery, Humans and AlgorithmseDiscovery, Humans and Algorithms
eDiscovery, Humans and Algorithms
 
Epiq E Discovery Faq Hong Kong
Epiq E Discovery Faq Hong KongEpiq E Discovery Faq Hong Kong
Epiq E Discovery Faq Hong Kong
 
The Investigative Lab - Nuix
The Investigative Lab - NuixThe Investigative Lab - Nuix
The Investigative Lab - Nuix
 
The Investigative Lab - White Paper
The Investigative Lab - White PaperThe Investigative Lab - White Paper
The Investigative Lab - White Paper
 
Surviving Technology 2009 & The Paralegal
Surviving Technology 2009 & The ParalegalSurviving Technology 2009 & The Paralegal
Surviving Technology 2009 & The Paralegal
 
Legal Technology 2011 and the Paralegal
Legal Technology 2011 and the ParalegalLegal Technology 2011 and the Paralegal
Legal Technology 2011 and the Paralegal
 
Rapid dna -_disruptive new technology for criminal justice_rbj
Rapid dna -_disruptive new technology for criminal justice_rbjRapid dna -_disruptive new technology for criminal justice_rbj
Rapid dna -_disruptive new technology for criminal justice_rbj
 
Ten steps for early awareness and relevance/non-relevance selection
Ten steps for early awareness and relevance/non-relevance selectionTen steps for early awareness and relevance/non-relevance selection
Ten steps for early awareness and relevance/non-relevance selection
 
Predict Conference: Data Analytics for Digital Forensics and Cybersecurity
Predict Conference: Data Analytics for Digital Forensics and CybersecurityPredict Conference: Data Analytics for Digital Forensics and Cybersecurity
Predict Conference: Data Analytics for Digital Forensics and Cybersecurity
 
Orange Legal Technologies Time, Risk, And Cost In E Discovery 021009
Orange Legal Technologies   Time, Risk, And Cost In E Discovery 021009Orange Legal Technologies   Time, Risk, And Cost In E Discovery 021009
Orange Legal Technologies Time, Risk, And Cost In E Discovery 021009
 
Josh Moulin: Designing a Mobile Digital Forensic Lab on a Budget
Josh Moulin: Designing a Mobile Digital Forensic Lab on a BudgetJosh Moulin: Designing a Mobile Digital Forensic Lab on a Budget
Josh Moulin: Designing a Mobile Digital Forensic Lab on a Budget
 
Text mining on criminal documents
Text mining on criminal documentsText mining on criminal documents
Text mining on criminal documents
 
Finding a Needle in an Electronic Haystack: The Science of Search and Retrieval
Finding a Needle in an Electronic Haystack: The Science of Search and RetrievalFinding a Needle in an Electronic Haystack: The Science of Search and Retrieval
Finding a Needle in an Electronic Haystack: The Science of Search and Retrieval
 
Judging E-Discovery Disputes
Judging E-Discovery DisputesJudging E-Discovery Disputes
Judging E-Discovery Disputes
 
Defining a Legal Strategy ... The Value in Early Case Assessment
Defining a Legal Strategy ... The Value in Early Case AssessmentDefining a Legal Strategy ... The Value in Early Case Assessment
Defining a Legal Strategy ... The Value in Early Case Assessment
 
Waarom LegalTech de toekomst heeft
Waarom LegalTech de toekomst heeftWaarom LegalTech de toekomst heeft
Waarom LegalTech de toekomst heeft
 
Digital forensics research: The next 10 years
Digital forensics research: The next 10 yearsDigital forensics research: The next 10 years
Digital forensics research: The next 10 years
 
2014 Year-End E-Discovery Update
2014 Year-End E-Discovery Update2014 Year-End E-Discovery Update
2014 Year-End E-Discovery Update
 
Network and computer forensics
Network and computer forensicsNetwork and computer forensics
Network and computer forensics
 

Más de Albert Kassis

Threat From The Inside, Fti Journal
Threat From The Inside, Fti JournalThreat From The Inside, Fti Journal
Threat From The Inside, Fti JournalAlbert Kassis
 
Email Risk, by Albert Kassis
Email Risk, by Albert KassisEmail Risk, by Albert Kassis
Email Risk, by Albert KassisAlbert Kassis
 
Antitrust Case Study
Antitrust Case StudyAntitrust Case Study
Antitrust Case StudyAlbert Kassis
 
Metadata Strategies by Albert Kassis
Metadata Strategies by Albert KassisMetadata Strategies by Albert Kassis
Metadata Strategies by Albert KassisAlbert Kassis
 
Audio Discovery by Albert Kassis
 Audio  Discovery by Albert Kassis  Audio  Discovery by Albert Kassis
Audio Discovery by Albert Kassis Albert Kassis
 

Más de Albert Kassis (7)

Threat From The Inside, Fti Journal
Threat From The Inside, Fti JournalThreat From The Inside, Fti Journal
Threat From The Inside, Fti Journal
 
Email Risk, by Albert Kassis
Email Risk, by Albert KassisEmail Risk, by Albert Kassis
Email Risk, by Albert Kassis
 
Antitrust Case Study
Antitrust Case StudyAntitrust Case Study
Antitrust Case Study
 
Refco Case Study
Refco Case StudyRefco Case Study
Refco Case Study
 
FTC Case Study
FTC Case StudyFTC Case Study
FTC Case Study
 
Metadata Strategies by Albert Kassis
Metadata Strategies by Albert KassisMetadata Strategies by Albert Kassis
Metadata Strategies by Albert Kassis
 
Audio Discovery by Albert Kassis
 Audio  Discovery by Albert Kassis  Audio  Discovery by Albert Kassis
Audio Discovery by Albert Kassis
 

Fti Journal Predictive Discovery

  • 1. November 2012 E-discovery — Taking Predictive Coding Out of the Black Box Joseph H. Looby Senior Managing Director FTI TECHNOLOGY IN CASES OF COMMERCIAL LITIGATION, the process of discovery can place a huge burden on defendants. Often, they have to review millions or even tens of millions of documents to find those that are relevant to a case. These costs can spiral into millions of dollars quickly.
  • 2. November 2012 M anual reviews are time consuming and very expensive. And the computerized alternate — keyword search — is too imprecise a tool for extracting relevant documents from a huge pool. As the number of documents to be reviewed for a typical lawsuit increases, the legal profession needs a better way. Predictive discovery is that approach. this year courts in the United States Illustration by Big Human Predictive discovery is a machine- have begun to endorse the use of this based method that rapidly can method. But according to a survey review a large pool of documents to FTI Consulting recently conducted, a determine which are relevant to a significant obstacle stands in the way of lawsuit. It takes attorneys’ judgments widespread adoption: Most attorneys do about the relevance of a sample set of not understand how predictive discovery documents and builds a model that works and are apprehensive about using it. accurately extrapolates that expertise to the entire population of potential In this article, we explain how predictive documents. Predictive discovery is discovery functions and why attorneys faster, cheaper and more accurate than should familiarize themselves with it. traditional discovery approaches, and Companies defending themselves commercial lawsuit might have entailed about $10 million — a staggering sum. against litigation frequently must searching through tens of thousands of Conducting keyword searches of online produce documents that may be documents. A team of attorneys would documents, of course, is much faster. But relevant to a case by searching through pore through all the paper, with perhaps in certain circumstances, such searches archives. The same is true for companies one or two subject matter experts have been shown to produce as little as being investigated by a regulatory to define the search parameters and 20 percent of the relevant documents, body. The process of plumbing one’s oversee the procedure. A larger number of along with many that are irrelevant. own archives to produce relevant more junior lawyers frequently would do documents is known as discovery. When the bulk of the work. This process (known With the number of documents in large the documents are in electronic form, in the trade as linear human or manual cases soaring (we currently have a case the process is called e-discovery. Large review) would cost a few thousand dollars involving 50 million documents), costs companies today often must review (today, it runs about $1 per document). are getting out of hand. Yet the price of millions or even tens of millions of letters, failing to produce relevant documents e-mails, records, manuscripts, transcripts But now, the large number of documents can be extraordinarily high in big-stakes and other documents for this purpose. to be reviewed for a single case can make litigation. The legal profession needs a manual searches a hugely expensive and new approach that improves reliability The discovery process once was entirely time-consuming solution. For a case with and minimizes costs. manual. In the 1980s, the typical 10 million documents, the bill could run
  • 3. November 2012 There now is a third approach. Predictive documents to identify the full responsive set. because most of the expense is incurred discovery takes expert judgment from to establish the upfront search rules. a sample set of documents relevant (or Studies show that predictive coding Afterwards, the cost increases only slightly responsive) to a case and uses computer usually finds a greater proportion — as the size of the collection grows. modeling to extrapolate that judgment typically in the range of 75 percent to the remainder of the pool. In a typical (a measure known as recall) — of the In a recent engagement, we were asked discovery process using predictive responsive documents than other to review 5.4 million documents. Using coding, an expert reviews a sample of approaches. A seminal 1985 study found predictive discovery, we easily met the the full set of documents, say 10,000- keyword searches yielded average recall deadline and identified 480,000 relevant 20,000 of them, and ranks them as rates of about 20 percent. A 2011 study documents — around 9 percent of the responsive (R) or non-responsive (NR) to showed recall rates for linear human total — for about $1 million less than linear the case. A computer model then builds review to be around 60 percent. And, manual review or keyword search would a set of rules that reflects the attorney’s according to these same studies, the have cost. judgment on the sample set. Working on proportion of relevant documents in the sample set and comparing its verdict the pool of documents identified by on each document with the expert’s, the predictive coding (a measure known as Definition: Trial and Error software continually improves its model precision) is better, too. The process of experimenting until the results meet the standards with various methods of doing agreed to by the parties and sometimes The cost of predictive discovery is a fraction something until one finds the a court. Then the software applies of that for linear manual review or keyword most successful. its rules to the entire population of search, especially for large document sets, So how does predictive discovery work? As suggested above, there are five key steps: 1 Experts code a representative sample of the document set as responsive or non-responsive. 2 Predictive coding software reviews the sample and assigns weights to features of the documents. The software refines the weights by testing them against every 3 document in the sample set. 4 The effectiveness of the model is measured statistically to assess whether it is producing acceptable results. 5 The software is run on all the documents to identify the responsive set. 1. “An Evaluation Of Retrieval Effectiveness For A Full-Text Document Retrieval System,” by David C. Blair of the University of Michigan and M.E. Maron of the University of California at Berkeley, 1985 2. “Technology-Assisted Review in E-discovery Can Be More Effective and More Efficient than Exhaustive Manual Review,” by Maura R. Grossman and Gordon V. Cormack, Richmond Journal of Law and Technology, Vol. XVII, Issue 3, 2011.
  • 4. November 2012 Let’s look at these steps in more detail. 1 COLLECTION REFERENCE SET EXPERT REVIEW RESULTS Experts code a representative sample 17,000 of the document set as Responsive responsive or non- Sample Training Set (23%) responsive. 1,000,000 In a typical case, a plaintiff might ask a 983,000 Attorney Expert Review company to produce the universe of Document Collection Non–Responsive documents regarding some specifics about Project X. The company prepares a list of Prediction Set responsive 3,900 its employees involved in the project. From non–responsive 13,100 all the places where teams of employees working on Project X stored documents, Estimate Of Goal the firm secures every one of the potentially We are 99% confident that 23% (+/- 1%) of the 1,000,000 documents are responsive. relevant e-mails and documents. These can easily total 1 million or more items. These documents are broken into two and valuation issues ….” for instance, Each feature is given a weight. If the subsets: the sample or training set and the yields 33 features as shown below. feature tends to indicate a document is prediction set (i.e., the remainder). To have responsive, the weight will be positive. a high level of confidence in the sample, The total number of features in a set If the feature tends to indicate a the training set must be around 17,000 of 1 million documents is huge, but document is non-responsive, the weight documents. One or more expert attorneys even large numbers of features easily will be negative. At the start, all the codes the training set’s documents as R or can be processed today with relatively weights are zero. NR with respect to the litigation. inexpensive technology. The results of this exercise will predict with reasonable accuracy the total number TRAINING SET EXAMPLE FEATURE SET of Rs in the full set. A statistically valid random sample of 17,000 documents Words, Phrases & responsive 3,900 Feature # Metadata skillings 1 generally is sufficient to ensure that our non–responsive 13,100 Unigrams abrupt 2 answer will be accurate to +/-1 percent departure 3 will 4 at the 99 percent confidence level. For raise 5 example, if the number of R documents suspicions 6 in the sample set is 3,900 (23 percent PROCESSING of 7 accounting 8 of 17,000), it is 99 percent certain that improprieties 9 between 22 to 24 percent of the 1 million and 10 1 Extract Machine-Readable Text valuation 11 documents are responsive. issues 12 2 Example skillings abrupt 13 Predictive coding Bigrams abrupt departure 14 departure will 15 software reviews the From: Sherron Watkins will raise 16 sample and assigns To: Kenneth Lay raise suspicions 17 Sent: August 2001 suspicions of 18 weights to features of accounting 19 of the documents. Skilling’s abrupt departure will raise accounting improprieties 20 improprieties and 21 suspicions of accounting improprieties and valuation 22 and valuation issues... For the sample set of documents, the valuation issues 23 skillings abrupt departure 24 software creates a list of all the features Trigrams abrupt departure will 25 of every document in the sample set. 3 Make Features departure will raise 26 Features are single words or strings - Break stream of text into tokens will raise suspicions 27 raise suspicions of 28 of consecutive words; for example, - Remove whitespace and punctuation suspicions of accounting 29 up to three words long. 
2. Predictive coding software reviews the sample and assigns weights to features of the documents.

For the sample set of documents, the software creates a list of all the features of every document in the sample set. Features are single words or strings of consecutive words, up to three words long. To make features, the software extracts the machine-readable text, breaks the stream of text into tokens, removes whitespace and punctuation, normalizes everything to lowercase and converts the tokens into the feature set.

For example, the sentence "Skilling's abrupt departure will raise suspicions of accounting improprieties and valuation issues ..." (from an August 2001 e-mail from Sherron Watkins to Kenneth Lay) yields 33 features: 12 unigrams (skillings, abrupt, departure, will, raise, suspicions, of, accounting, improprieties, and, valuation, issues); 11 bigrams (skillings abrupt, abrupt departure, departure will, will raise, raise suspicions, suspicions of, of accounting, accounting improprieties, improprieties and, and valuation, valuation issues); and 10 trigrams (skillings abrupt departure through and valuation issues).

The total number of features in a set of 1 million documents is huge, but even large numbers of features easily can be processed today with relatively inexpensive technology.

Each feature is given a weight. If the feature tends to indicate a document is responsive, the weight will be positive. If the feature tends to indicate a document is non-responsive, the weight will be negative. At the start, all the weights are zero.
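The tokenizing and feature-making steps above are straightforward to express in code. A minimal sketch (ours, not FTI's) that reproduces the 33 features of the example sentence:

    import re

    def ngram_features(text, max_n=3):
        # Lowercase, strip punctuation (so "Skilling's" becomes "skillings"),
        # then emit every run of one to max_n consecutive tokens.
        tokens = re.findall(r"[a-z0-9]+", text.lower().replace("'", ""))
        return [" ".join(tokens[i:i + n])
                for n in range(1, max_n + 1)
                for i in range(len(tokens) - n + 1)]

    sentence = ("Skilling's abrupt departure will raise suspicions "
                "of accounting improprieties and valuation issues")
    print(len(ngram_features(sentence)))   # 33 = 12 unigrams + 11 bigrams + 10 trigrams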
3. The software refines the weights by testing them against every document in the sample set.

The next step is one of trial and error. The model picks a document from the training set and tries to guess whether the document is responsive. To do this, the software sums the importance weights for the words in the document to calculate a score. If the score is positive (+), the software guesses responsive, or R; this is the trial.

If the model guesses wrong (i.e., the model and attorney disagree), the software adjusts the importance weights of the words recorded in the model's weight table. This, of course, is the error. Gradually, through trial and error, the importance weights become better indicators of responsiveness. Pretty soon, the computer will have created a weight table with scores of importance and unimportance for the words and phrases in the training set, an excerpt of which might look like this:

    Feature                        Weight
    will raise                     -0.6
    suspicions                     +0.7
    improprieties and valuation    +1.8

As the software reviews more documents, the weight table becomes an increasingly better reflection of the attorney expert's judgment. This process is called machine learning. The computer can take several passes through the sample set until the weight table is as good as it can be based on the sample.

Once the weight table is ready, the program is run on the full sample to generate a score for every document according to the final weight table. The machine score for a document is the sum of the weights of all its features. A document with a score of -5.3, for example, probably is non-responsive, while a document with a score of +8.7 probably is responsive.

Then, if predictive coding is to be used to select the highly responsive documents from the collection of 1 million and to discard the highly non-responsive documents, a line has to be drawn. The line is the minimum score for documents that will be considered responsive. Everything with a higher score (i.e., above the line), subject to review, will comprise the responsive set.

In our 17,000-document sample, of which the expert found 3,900 to be R, there might be 7,650 documents that scored more than zero. Of these, 3,519 are documents the expert coded R, and 4,131 are not. Therefore, with the line drawn at 0, we will achieve a recall of 90 percent (3,519 divided by 3,900) and a precision of 46 percent (3,519 divided by 7,650).
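The recall and precision figures in this example are plain arithmetic:

    # With the line drawn at 0, using the sample numbers above:
    coded_responsive = 3_900   # documents the expert coded R
    above_line = 7_650         # documents the model scored above 0
    true_positives = 3_519     # coded R and scored above 0

    recall = true_positives / coded_responsive   # 0.90, i.e. 90 percent
    precision = true_positives / above_line      # 0.46, i.e. 46 percent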
4. The effectiveness of the model is measured statistically to assess whether it is producing acceptable results.

By drawing the line higher, one can trade recall for precision. For example, if the line is drawn at +0.3, the precision may be significantly higher (say, 70 percent) but with lower recall (i.e., 80 percent, because the Rs scoring between 0 and +0.3 are missed). On the basis of the sample set, the software can determine recall and precision for any line. Typically, there is a trade-off: as precision goes up (and the program delivers fewer NR documents), recall declines (and more of the R documents are left behind). However, not only are recall and precision scores higher than for other techniques, the trade-off can be managed much more explicitly and precisely.

[Figure: with the line at +0.3, the model returns 4,450 documents: 3,125 true positives and 1,325 false positives; 775 responsive documents fall below the line.]

Recall measures how well a process retrieves relevant documents. For example, 80 percent recall means a process is returning 80 percent of the estimated responsive documents in the collection: recall = TP / (TP + FN) = 3,125 / (3,125 + 775) = 80 percent.

Precision measures how well a process retrieves only relevant documents. For example, 70 percent precision means for every 70 responsive documents found, 30 non-responsive documents also were returned: precision = TP / (TP + FP) = 3,125 / (3,125 + 1,325) = 70 percent.
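Because every document in the sample carries both an expert label and a machine score, recall and precision can be computed for any candidate line before committing to one. A sketch of that sweep, with made-up scores (the two normal distributions are our assumption, chosen only so that responsive documents tend to score higher):

    import random

    def recall_precision(scored, line):
        # scored is a list of (machine_score, expert_coded_responsive) pairs.
        tp = sum(1 for s, r in scored if s > line and r)
        fp = sum(1 for s, r in scored if s > line and not r)
        fn = sum(1 for s, r in scored if s <= line and r)
        return tp / (tp + fn), (tp / (tp + fp) if tp + fp else 1.0)

    random.seed(2)
    scored = ([(random.gauss(1.0, 1.0), True) for _ in range(3_900)]
              + [(random.gauss(-1.0, 1.0), False) for _ in range(13_100)])

    for line in (0.0, 0.3, 0.6):
        r, p = recall_precision(scored, line)
        print(f"line {line:+.1f}: recall {r:.0%}, precision {p:.0%}")
    # Raising the line raises precision and lowers recall, the trade-off
    # described above.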
5. The software is run on all the documents to identify the responsive set.

Once the trade-off between recall and precision has been made and the line determined, the full set of documents can be scored. The outcomes generally will be close to those predicted, although there usually is some degradation because the sample set never perfectly represents the whole.

As a final quality check, samples can be taken from both above and below the line to make sure results meet expectations. In most cases, samples of 2,000 are adequate, giving a +/-3 percent margin of error at a 99 percent confidence level. Experts then review these random samples against the computer rating. They should confirm that approximately as many Rs occur above the line, and as few beneath it, as predicted.
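The 2,000-document figure follows from the same margin-of-error arithmetic used to size the training sample in step 1:

    from math import sqrt

    # Worst-case margin of error for a quality-check sample of 2,000 at
    # 99 percent confidence (z = 2.576):
    print(f"{2.576 * sqrt(0.5 * 0.5 / 2_000):.1%}")   # 2.9%, the +/-3 percent cited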
Why This is Better than Keyword Search

Attorneys have been using search terms for e-discovery for many years. But the biggest drawback of search is that people determine the terms on which to search. A set of 1 million documents may have 100,000 distinct words. People easily can come up with some likely terms, for example, the name of the project under investigation and the key players. But this is a primitive way of mining a complex set of documents. Consequently, recall and precision typically are low.

With predictive coding, attorneys begin with no preconceptions about the most relevant words. They let a computer figure that out, considering each word and every combination of two and three consecutive words without constraint. Further, predictive coding can use the fact that certain words will indicate against responsiveness. Processing this volume of possibilities is much too big a manual task. But, of course, the procedure can be done easily (and cheaply) by a computer. And it produces far better results than keyword search.

Is The World Ready For Predictive Discovery?

Predictive discovery is not new. With patents dating back to 2005, FTI Consulting was the first firm to use machine learning for commercial e-discovery. Yet adoption has been slow. In a survey conducted this year, we found only about half of corporate in-house and external lawyers are using predictive discovery. But even companies using the technology are largely experimenting with it to cull and prioritize documents for manual review.[3]

Why has the uptake been slow? Our study and experience suggest two primary reasons: 1) a reluctance to invest in something the courts might not support and 2) a widespread lack of understanding of how predictive coding works. Two recent court cases are helping eliminate the first concern:

• Da Silva Moore: In February, both parties in this case agreed to e-discovery by predictive coding. This was the first validation of the use of predictive coding by a court.
• Landow Aviation: In April, a Virginia state court judge permitted the defendant (over the plaintiff's objection) to use predictive coding to search an estimated 2 million electronic documents.

The second concern, though, has yet to be overcome. Our recent survey found that one of the biggest impediments to the adoption of predictive coding is that many lawyers do not know how it works.

As we have tried to show here, the principles are basic, and we intentionally keep the computer model's math simple, too. For example, the machine score for a document is the sum of the weights of all its features. There are no more straightforward arithmetic functions than addition and subtraction.

As courts begin to endorse predictive coding and judges articulate its advantages, attorneys will not want to be left behind.

The Way Forward

As we mentioned, we found when we surveyed in-house counsel and outside lawyers that most are using predictive coding to cull the document set to reduce the task for the follow-on manual review. The need for some human review never will go away. With predictive coding, manual review still is required for coding the sample set, learning the facts of the case and identifying privileged documents, for example. But predictive coding can take a higher proportion of the load in e-discovery than most are allowing it to do today.

To be sure, predictive discovery isn't right for every e-discovery process. Cases with many documents in which only a handful are expected to be responsive (searches for needles in a haystack) can be approached differently. Smart people, process and a variety of technologies can address these varying research goals.

But for most companies that have millions of documents to review in e-discovery, predictive discovery is a lower-cost solution with greater reliability and a higher level of transparency, and it should be used more frequently than it is today.

Notes
[1] "An Evaluation of Retrieval Effectiveness for a Full-Text Document Retrieval System," by David C. Blair of the University of Michigan and M.E. Maron of the University of California at Berkeley, 1985.
[2] "Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient than Exhaustive Manual Review," by Maura R. Grossman and Gordon V. Cormack, Richmond Journal of Law and Technology, Vol. XVII, Issue 3, 2011.
[3] "Advice from Counsel: Can Predictive Coding Deliver on Its Promise?" by Ari Kaplan of Ari Kaplan Advisors and Joe Looby of FTI Consulting, 2012.

Joe Looby
Senior Managing Director, FTI Technology
joe.looby@fticonsulting.com

For more information and an online version of this article, visit ftijournal.com.

The views expressed in this article are those of the author and not necessarily those of FTI Consulting, Inc., or its other professionals.

©2012 FTI Consulting, Inc. All rights reserved.