A (bright) green light for predictive coding in English litigation
A recent interlocutory judgment in Pyrrho Investments Limited & Anr -v- MWB Property Limited & Ors  EWHC 256 (Ch) endorses, for the first time, the use of predictive coding when conducting disclosure in English civil proceedings.
RPC act for the claimants in the case. The decision is of considerable interest to all parties involved in disputes with large volumes of documents.
Disclosure in any large case can be challenging. The extent of the "reasonable search" that parties must conduct under CPR Part 31 and its practice directions can be a particularly vexing question, given the sharp increase in the amount of electronic data being created in all walks of life. The number of documents initially captured by a search frequently runs into several million. The associated costs can be vast, as can the time required to review those documents.
In the present case, the bulk of the documents to be reviewed for the purposes of disclosure were held by the second claimant, because it controlled the back-up tapes on which all the data from its servers (including emails) was stored during the relevant period. Restoring data from a selection of those back-up tapes yielded more than 17.6 million documents. Electronic de-duplication reduced this to approximately 3.1 million, but reviewing that number of documents remained a large and costly exercise which, in broad terms, the parties agreed would be disproportionate in this case. The parties therefore considered ways in which technology could improve the second claimant's disclosure review process.
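The article does not say how the de-duplication was performed. A common technique is exact-match hashing: each document is reduced to a fingerprint, and only the first document with a given fingerprint is kept. The sketch below assumes exact duplicates are identified by a SHA-256 digest of the document text; the function names and sample documents are hypothetical, and real review platforms may also normalise metadata or detect near-duplicates.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of the document text (an exact-match fingerprint)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def deduplicate(documents: list[str]) -> list[str]:
    """Keep the first occurrence of each distinct document, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in documents:
        digest = content_hash(doc)
        if digest not in seen:  # first time this exact content has appeared
            seen.add(digest)
            unique.append(doc)
    return unique


# Hypothetical restored documents: the same email recovered from two tapes.
restored = [
    "Email: board meeting moved to Friday",
    "Email: board meeting moved to Friday",
    "Memo: lease renewal terms",
]
print(len(deduplicate(restored)))  # 2
```

Hashing means each document is compared against a set of fingerprints rather than against every other document, which is what makes de-duplication practical at the scale of millions of items.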
The extent of the reasonable search
Parties ordered to give standard disclosure in English proceedings must, in summary, make a reasonable search for documents (including electronic documents) which are helpful or unhelpful to their own case or to the case of another party. Factors relevant in deciding the extent of the reasonable search include the number of documents involved, the nature and complexity of the proceedings, the ease and expense of retrieving any particular document, and the significance of any document likely to be located. In relation to the reasonable search for electronic documents specifically, these factors are supplemented by a requirement that parties "should bear in mind that the overriding objective includes dealing with the case in ways which are proportionate".
However, the rules do not address in any detail how the search for and review of electronic documents should be conducted. Practice Direction 31B refers to the use of "Keyword Searches or other automated methods of searching if a full review of each and every document would be unreasonable". The judges of the Technology and Construction Court support an eDisclosure Protocol, produced by practitioners and available on the website of the Technology and Construction Solicitors’ Association, which contemplates the use of computer software in appropriate cases. However, it is only a protocol and has no normative force.
Historically, this lack of guidance mattered less, because there were fewer electronic documents to be searched for and reviewed and the volume of paper documents tended to be manageable in most cases. As a result, litigators have for decades tended to use a manual "linear review" of the documents collected by a reasonable search, in which a lawyer or team of lawyers reviews every document collected to determine whether or not it is disclosable. In cases with large numbers of documents, this can mean a substantial army of lawyers and paralegals reviewing hundreds of thousands of documents over many months. As the amount of data, particularly electronic data, grows ever larger, such linear reviews are becoming increasingly unworkable from both a time and a proportionality perspective. Enter predictive coding.
Predictive coding goes by many names, including "technology assisted review" and "computer aided review". It means the review of documents by proprietary computer software rather than by human beings. There are a number of potential variables and processes involved but, in essence, the software is "trained" by lawyers who are familiar with the issues in the case. The lawyers review various subsets of the overall dataset, and the computer then categorises all the remaining documents as relevant or not relevant, essentially by applying complex algorithms that look for concepts and language common to the documents the lawyers have coded.
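Commercial predictive coding tools are proprietary, and neither the article nor the judgment says which algorithm was used. To illustrate the general idea of training on a lawyer-coded subset and then categorising everything else, the sketch below swaps in a deliberately simple technique, a multinomial Naive Bayes text classifier with add-one smoothing. The class name, tokeniser and seed documents are hypothetical, and the sketch assumes the seed set contains at least one relevant and one non-relevant document.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a crude stand-in for real text processing."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesReviewer:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over two labels."""

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, seed_set: list[tuple[str, bool]]) -> None:
        # Each seed document carries the lawyers' coding: True means relevant.
        for text, relevant in seed_set:
            self.doc_counts[relevant] += 1
            self.word_counts[relevant].update(tokenize(text))

    def relevance_score(self, text: str) -> float:
        """Log-odds that the document is relevant; positive means 'relevant'."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        score = 0.0
        for label, sign in ((True, 1.0), (False, -1.0)):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            log_p = math.log(self.doc_counts[label] / total_docs)  # class prior
            for word in tokenize(text):
                # Smoothed per-word likelihood, so unseen words never give log(0).
                log_p += math.log((counts[word] + 1) / (total_words + len(vocab)))
            score += sign * log_p
        return score

    def classify(self, text: str) -> bool:
        return self.relevance_score(text) > 0


# Hypothetical seed set coded by the legal team.
seed = [
    ("lease renewal rent review terms", True),
    ("rent arrears under the lease", True),
    ("office party catering menu", False),
    ("car park cleaning rota", False),
]
reviewer = NaiveBayesReviewer()
reviewer.train(seed)

# The trained model then codes documents no lawyer has looked at.
unreviewed = ["draft lease rent schedule", "catering for the party"]
print({doc: reviewer.classify(doc) for doc in unreviewed})
# {'draft lease rent schedule': True, 'catering for the party': False}
```

Real tools typically iterate: lawyers code further subsets chosen by the software, the model is retrained, and the cycle repeats until its results are judged stable.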
There then follows a further level of manual review, after which the documents to be disclosed can be finalised. The extent of this review tends to lie somewhere between, on the one hand, a full manual review of all documents the computer considers relevant and, on the other hand, a more limited review of further subsets of documents for quality assurance purposes. Mechanisms can also be built into the process to identify potentially privileged material so that it can be subjected to full manual review.
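The article does not specify the quality assurance protocol used. One common approach, assumed here, is to draw a random sample from the documents the computer has marked not relevant, have lawyers review it, and measure how many relevant documents slipped through (often called the "elusion" rate). The sketch also includes a crude keyword screen that routes potentially privileged documents to full manual review; the function names and the marker terms are hypothetical, and a real privilege screen would be agreed case by case.

```python
import random


def draw_qa_sample(doc_ids: list[str], sample_size: int, seed: int = 0) -> list[str]:
    """Simple random sample without replacement; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, min(sample_size, len(doc_ids)))


def elusion_rate(sample: list[str], lawyer_coding: dict[str, bool]) -> float:
    """Proportion of sampled 'not relevant' documents that a lawyer codes as relevant."""
    if not sample:
        raise ValueError("sample must not be empty")
    missed = sum(1 for doc_id in sample if lawyer_coding[doc_id])
    return missed / len(sample)


# Hypothetical markers used to route documents to full manual privilege review.
PRIVILEGE_MARKERS = ("privileged", "legal advice", "counsel")


def needs_privilege_review(text: str) -> bool:
    """True if the text mentions any marker, ignoring case."""
    lowered = text.lower()
    return any(marker in lowered for marker in PRIVILEGE_MARKERS)


# Hypothetical IDs of documents the computer marked not relevant.
excluded = [f"DOC-{i:05d}" for i in range(1000)]
sample = draw_qa_sample(excluded, 50)

# Hypothetical lawyer coding of a four-document sample: one relevant document missed.
print(elusion_rate(["A", "B", "C", "D"], {"A": True, "B": False, "C": False, "D": False}))  # 0.25
print(needs_privilege_review("Legal advice from counsel re lease"))  # True
```

Fixing the random seed lets the other side, or the court, see that the same sample would be drawn again, which supports the transparency of the process.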
At the very least, predictive coding helps ensure that the documents most likely to be relevant are reviewed earlier in the process and that those least likely to be relevant are not manually reviewed at all. If employed to a greater extent in appropriate cases, however, predictive coding can allow only a relatively small proportion of the overall pool of documents to be manually reviewed, with a far greater number ultimately selected by the computer for disclosure without necessarily having been manually reviewed at all. It was the latter approach which, although perhaps not suitable in all cases, was contemplated (and ultimately agreed) in this case for various reasons.
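The two uses described above can be sketched as one step applied to the computer's relevance scores: rank the documents so the likeliest are reviewed first, then apply a cut-off. The sketch assumes a hypothetical score per document, a hypothetical relevance threshold and a hypothetical budget for manual review; it is one illustrative way of splitting the pool, not the protocol agreed in this case.

```python
from dataclasses import dataclass


@dataclass
class ReviewPlan:
    review_queue: list[str]         # highest-scoring first, for human reviewers
    disclose_unreviewed: list[str]  # predicted relevant, disclosed without manual review
    not_disclosed: list[str]        # predicted not relevant


def plan_review(scores: dict[str, float], threshold: float, manual_budget: int) -> ReviewPlan:
    """Rank by score, then split the pool using the threshold and review budget."""
    ranked = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    predicted_relevant = [d for d in ranked if scores[d] > threshold]
    not_relevant = [d for d in ranked if scores[d] <= threshold]
    return ReviewPlan(
        review_queue=predicted_relevant[:manual_budget],
        disclose_unreviewed=predicted_relevant[manual_budget:],
        not_disclosed=not_relevant,
    )


# Hypothetical scores produced by a trained model.
plan = plan_review({"A": 0.9, "B": 0.2, "C": 0.7, "D": 0.6, "E": 0.1}, threshold=0.5, manual_budget=1)
print(plan.review_queue, plan.disclose_unreviewed, plan.not_disclosed)
# ['A'] ['C', 'D'] ['B', 'E']
```

Setting `manual_budget` at or above the number of predicted-relevant documents reproduces the more cautious approach in which everything the computer selects is reviewed by a lawyer before disclosure.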
Discussion in this case
In light of the large number of documents involved in the second claimant's disclosure review, and the projected cost and time of conducting a full linear review of them, the parties engaged in extensive correspondence to agree a sensible and proportionate process for the second claimant to use.
At the case management conference, the Court ordered that a further hearing be held to deal with issues relating to e-disclosure. In the lead-up to that hearing, the parties sought to agree various methods of making the second claimant's disclosure review more targeted, including identifying particular data custodians, applying date ranges and using keyword searches in the usual way. In addition, the parties agreed in principle to use predictive coding to reduce significantly the amount of manual review to be undertaken. However, because the method of predictive coding contemplated meant that not every document disclosed by the second claimant would have been reviewed by its legal team beforehand, and because there was no prior English authority endorsing the use of predictive coding to any extent, the parties considered it appropriate to seek the Court's endorsement of the proposed approach.
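The conventional culling steps mentioned above (custodians, date ranges and keyword searches) can be sketched as a single filter applied before any predictive coding. The `Document` fields, custodian names, dates and keywords below are all hypothetical, and the keyword test is plain case-insensitive substring matching, which is cruder than the search syntax real platforms offer (for example, "rent" also matches "current").

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    doc_id: str
    custodian: str
    sent: date
    text: str


def cull(docs: list[Document], custodians: set[str], start: date, end: date,
         keywords: list[str]) -> list[Document]:
    """Keep documents from named custodians, inside the date range, hitting any keyword."""
    lowered = [k.lower() for k in keywords]
    return [
        d for d in docs
        if d.custodian in custodians
        and start <= d.sent <= end
        and any(k in d.text.lower() for k in lowered)  # substring match, ignoring case
    ]


docs = [
    Document("D1", "Finance Director", date(2008, 3, 1), "Rent review for the lease"),
    Document("D2", "Receptionist", date(2008, 3, 1), "Rent review for the lease"),
    Document("D3", "Finance Director", date(2001, 1, 1), "Rent review for the lease"),
    Document("D4", "Finance Director", date(2008, 5, 1), "Christmas party"),
]
hits = cull(docs, {"Finance Director"}, date(2007, 1, 1), date(2009, 12, 31), ["rent", "lease"])
print([d.doc_id for d in hits])  # ['D1']
```

Each criterion removes documents for a different reason (wrong custodian, outside the period, no keyword hit), which is why such filters are usually agreed between the parties before they are applied.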
In its helpful judgment on these issues, the Court explained the matters summarised above in further detail. It referred to previous comments on electronic disclosure made by the English Court in Goodale v Ministry of Justice, which contemplated the use of computer software to aid a disclosure review (but went no further than that). It also referred to judgments of a US federal court and of the Irish High Court, both of which gave helpful commentary on the use of computer assisted review in those jurisdictions. Drawing all of this together, the Court then set out 10 factors in favour of approving the use of predictive coding in this case, including that:
- experience in other jurisdictions has been that predictive coding can be useful in appropriate cases;
- there is no evidence to suggest that predictive coding leads to less accurate disclosure being given than other methods, and indeed greater accuracy and consistency may be achievable;
- there is nothing precluding its use in English procedural rules; and
- there was a vast number of documents to be considered for review in this case, and the cost of lawyers conducting a linear review would be very high; that cost could be reduced substantially, to a more proportionate level, by using predictive coding in the manner contemplated.
The Court was also of the view that there were no factors pointing against the use of predictive coding in this case.
As a result, the Court concluded that this "was a suitable case in which to use [predictive coding], and that it would promote the overriding objective set out in Part 1 of the CPR…"
The Court's approval of predictive coding is welcome and marks a significant development. With ever-increasing amounts of data being handled in litigation, and automated search techniques becoming ever more sophisticated, perhaps the only surprise is that it has taken until now for the Court to endorse its use formally. In any event, the potential benefits of predictive coding in appropriate cases are obvious. At the very least, it offers a viable alternative to traditional linear review that parties should consider.
For those involved in complex litigation with vast numbers of documents, this judgment is likely to provide the comfort needed for serious consideration to be given to predictive coding, which may previously have been seen as a riskier and less defensible alternative to linear review. As a result, the use of predictive coding in such cases may well increase notably following this judgment (as, perhaps, will the body of English judicial authority supporting its use).
CPR, rule 31.6
CPR, rule 31.7
CPR, paragraph 20 of Practice Direction B to Part 31
Goodale v Ministry of Justice  EWHC B41 (QB)
Moore v Publicis Groupe, 11 Civ 1279 (ALC)(AJP)
Irish Bank Resolution Corporation Ltd v Quinn  IEHC 175