Use Of Artificial Intelligence In E-Discovery

Artificial Intelligence (AI) is not a new concept, with the term widely accepted to have entered popular usage as early as 1955 when computer scientist (and AI founding father) John McCarthy coined it as part of his efforts to consolidate thinking and new ideas around 'thinking machines'. The following year, McCarthy and others organized the Dartmouth Summer Research Project on Artificial Intelligence, which is now considered to be the founding event in the field of AI.¹

There have been several AI Springs and Winters since then, with periods of rapid progress followed by cycles of reduced funding and interest. Experts now agree, however, that the age of AI 'hype cycles' may be over, as we transition globally into the so-called AI era, towards the irreversible integration of AI technology in everyday life and across an increasing number of economic sectors. This new era has largely been driven by the release of generative AI models, deep-learning models which can – to varying degrees of success – replicate the levels of reasoning, creativity and intelligence associated with human cognition and abilities, based on the data they were trained on.

Role Of Technology and AI In E-Discovery

AI certainly has the capacity to completely transform the legal industry, particularly when it comes to the world of e-discovery. E-discovery refers to the process of identifying, collecting and producing data in litigation or other legal proceedings where Electronically Stored Information (ESI) is required. ESI is any data that is stored or transmitted electronically – not only emails and documents, but also digital content such as audio files, images, videos, website content, instant messages etc.

AI technology has long been used at various stages of the e-discovery process to expedite and simplify the task at hand. Experts use predictive coding or Technology-Assisted Review (TAR) which – in its original iteration – uses machine learning to predict which documents are likely to contain relevant content based on an initial 'seed set' of manually coded or 'tagged' documents inputted by a human reviewer. The computer running TAR software learns from the seed set to predict how new documents should be tagged. This method relies heavily on complete and accurate data sets being inputted initially – the quality of the computer-generated results reflects the quality of the input.
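The seed-set approach described above can be illustrated with a minimal sketch. It assumes a simple naive Bayes text classifier written in plain Python; commercial TAR platforms use their own models and far larger seed sets, and every function name and sample document below is hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case a document and split it into words."""
    return text.lower().split()


def train_seed_model(seed_set):
    """Build word statistics from (text, is_relevant) pairs coded by a human reviewer."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in seed_set:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def relevance_score(model, text):
    """Return the log-odds that a document is relevant (positive means relevant)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_prob = {}
    for label in (True, False):
        # Laplace smoothing avoids zero probabilities for unseen combinations.
        log_prob[label] = math.log((doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in the seed set carry no signal
                log_prob[label] += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
    return log_prob[True] - log_prob[False]


# Hypothetical seed set: documents already tagged by a human reviewer.
seed_set = [
    ("merger pricing agreement with supplier", True),
    ("confidential pricing discussion before merger", True),
    ("office lunch menu for friday", False),
    ("holiday party parking arrangements", False),
]
model = train_seed_model(seed_set)

# The trained model predicts tags for documents no one has reviewed yet.
unreviewed = ["draft pricing agreement attached", "friday parking reminder"]
predictions = {doc: relevance_score(model, doc) > 0 for doc in unreviewed}
```

Because the model only knows what the seed set contains, a mis-tagged or unrepresentative seed document skews every later prediction, which is the dependency on input quality noted above.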

The next generation of TAR, however, uses Continuous Active Learning (CAL), commonly known as TAR 2.0. With a CAL workflow, there is no need to review a seed set of documents first, as there is with the traditional TAR model. Instead, the computer learns in real time as human reviewers begin coding documents. The CAL workflow then promotes the documents it predicts to be most relevant to the front of the review queue, so that reviewers see the documents likely to be of greatest relevance earlier. The computer continuously improves its understanding of the data set, integrating and learning from information as the review team codes documents. This technology is particularly beneficial in cases with large data sets and tight timeframes.
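The CAL loop described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how any particular product works: the "model" here is just a running tally of word weights, whereas real tools typically rely on statistical classifiers, and the `reviewer` function is a hypothetical stand-in for a human coding decision.

```python
from collections import Counter


def tokenize(text):
    """Lower-case a document and split it into words."""
    return text.lower().split()


def score(weights, text):
    """Sum the learned weights of the distinct words in a document."""
    return sum(weights[word] for word in set(tokenize(text)))


def cal_review(documents, reviewer, batch_size=1):
    """Review all documents, re-ranking the queue after every coded batch."""
    weights = Counter()  # starts empty: no seed set is required
    queue = list(documents)
    review_order = []
    while queue:
        # Promote the documents the current model rates most relevant.
        queue.sort(key=lambda doc: score(weights, doc), reverse=True)
        batch, queue = queue[:batch_size], queue[batch_size:]
        for doc in batch:
            is_relevant = reviewer(doc)  # the human coding decision
            review_order.append((doc, is_relevant))
            # Learn immediately from the decision just made.
            delta = 1 if is_relevant else -1
            for word in set(tokenize(doc)):
                weights[word] += delta
    return review_order


# Hypothetical data set; relevant documents sit at positions 0, 3 and 4.
documents = [
    "pricing agreement draft",
    "lunch menu friday",
    "parking reminder friday",
    "supplier pricing call notes",
    "revised supplier agreement",
    "friday lunch cancelled",
]


def reviewer(doc):
    """Stand-in for a human reviewer tagging documents about the deal."""
    return any(term in doc for term in ("pricing", "supplier", "agreement"))


order = cal_review(documents, reviewer)
```

In this toy run the three relevant documents are reviewed first, even though two of them sat near the end of the original queue, because each coding decision re-ranks what remains.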

TAR and the Courts

New UK disclosure rules, which became a permanent practice direction in September 2022, highlight the importance of TAR in the e-discovery process and reflect the general acceptance of the technology by the courts². TAR was first approved for use in the US courts over a decade ago, in 2012, with the Da Silva Moore v Publicis Groupe & MSL Group ruling³. In that ruling, now-retired New York Magistrate Judge Andrew J. Peck issued an opinion endorsing the use of TAR as an 'acceptable way to search for relevant ESI in appropriate cases' given the hugely positive impact on speed of review, performance outcomes and process transparency.

Three years later, Judge Peck reiterated his judicial opinion on court acceptance of TAR in Rio Tinto PLC v Vale S.A., in which he stated that 'in the three years since Da Silva Moore, case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it'⁴.

In the same year, 2015, the Irish High Court also approved the use of TAR for the first time (Irish Bank Resolution Corporation Ltd. v Quinn⁵). The High Court of England and Wales followed closely in 2016, recognizing the use of TAR in Pyrrho Investments Ltd. v MWB Property Ltd.⁶, in a ruling that highlighted the increased efficiency of TAR and the overriding objective of dealing with cases 'justly and at proportionate cost' as set out by the Civil Procedure Rules (CPR)⁷.

TAR is commonly accepted today, with disputes regarding TAR now focused instead on the transparency of the approach or how it is conducted, rather than the question of whether it should be used in the discovery process at all.

Whilst TAR cannot entirely replace humans in the discovery process, the relentless advancement of the software's design and capabilities makes it a vital tool for lawyers working on increasingly large data sets in the digital age. Even so, for TAR to be effective, an experienced e-discovery expert needs to be at the helm of the algorithm.

Generative AI And E-Discovery

Generative AI (GenAI) describes algorithms that work by analyzing vast and often complex data sets to create a series of structures and patterns from which the model is able to generate new content – whether that be text, images, or other data – as an output, often as a response to a set of specific prompts.

In the e-discovery process, GenAI models use pre-trained Large Language Models (LLMs) as a reference framework for completing elements of discovery including document identification and review. In contrast to TAR – which requires significant time and cost investment from experts to train the model – GenAI is able to deliver accurate results 'out of the box' resulting in a significant time saving for discovery experts.

GenAI has much broader capabilities than TAR, allowing it to perform a number of tasks beyond document classification and sorting. For instance, GenAI can perform conceptual searches, allowing discovery experts to use the technology for fact finding by entering a prompt into the system. The GenAI model uses LLM technology to answer the question swiftly in natural language, supported by references and example documents. Moreover, the natural language processing capabilities of GenAI mean it is able to perform these searches and analyze sentiment across multiple languages simultaneously.

GenAI is also considerably more flexible than TAR, and can provide customized solutions for specific use cases in terms of the way the system performs or is integrated into e-discovery processes. This increased integration functionality means GenAI models often offer a more holistic solution by working alongside other AI technologies and tools / data platforms to increase the quality of the output.

However, due to the sensitive nature of e-discovery work, it is highly unlikely that case data would ever be used to train an LLM – in no small part due to issues around confidentiality.

As a result, GenAI systems used in e-discovery cannot, at the moment at least, "learn" or get better from the matters they work on – the performance of the system is effectively a flat line, which could easily be overtaken by adaptive / learning technologies.

That said, we will see enhanced benefits when interoperability challenges are overcome and the different forms of AI (e.g. TAR/CAL/GenAI) start to work together to drive enhanced results – though this integrated approach is still some way off becoming a reality.

The Risks

The greater flexibility and capability of GenAI without doubt make the technology a huge value add in the e-discovery process, especially for complex cases and those involving vast amounts of diverse data. That said, this potential benefit also comes with potential risk.

Despite huge advancements in the technology over recent years, GenAI models still can – and often do – make errors, particularly when dealing with complex documents. GenAI has also been known to hallucinate – confidently fabricating content and presenting it as fact. These hallucinations are often caused by flaws in the input data set, whether that be inaccurate data, data biases, or simply data that is too limited in volume or scope. Where a GenAI system does not have enough data to respond to a prompt, it will often 'paper over the gaps' and produce what it considers the most likely answer based on its training data.

Problems arise, however, where this answer is presented as truth when it is factually incorrect. An example of GenAI creating a fictitious output has already been seen in the courts: in a 2023 personal injury case against an airline operator, six of the cases submitted by the plaintiff's lawyers appeared to be 'bogus', and the legal team later admitted they had been created by the GenAI chatbot ChatGPT⁸. The problem seems to be widespread, with a recent study from Stanford University finding that AI hallucinations relating to the law are 'alarmingly prevalent', occurring around 69% of the time with ChatGPT⁹.

More widely, there is increasing pressure to rush to incorporate GenAI into business models and processes for fear of 'missing out' or losing competitive advantage. Anyone using the technology – including e-discovery experts – needs to proceed with extreme caution here, as it is almost certain that the GenAI 'boom' we are currently seeing will lead to increased legal and regulatory challenges. These technologies rely on masses of data and are complex and not always fully understood. When developing and deploying these models, it will be essential to ensure that appropriate safeguards are implemented, and that there is a clear understanding of how inputted data is being used, how the learning was done, whether there was bias and so forth. Models need to be forensically examined and outputs need to be explainable to a judge or court.

Conclusion

AI will no doubt have an enormous impact on e-discovery and the legal industry more widely and will certainly drive competitive advantage in the marketplace. That said, it is important to understand that any changes will be incremental, and experienced users who understand the technology and its limitations will see the most benefit. AI is still not a replacement for humans – in fact in many ways it makes human work even more critical. AI is another tool in the e-discovery toolbox which needs to be used in the right way, and for the right purpose.

Footnotes

1. Moor, J., "The Dartmouth College Artificial Intelligence Conference: The Next Fifty Years", AI Magazine, Vol 27, No. 4, pp. 87–89, 2006

2. https://www.justice.gov.uk/courts/procedure-rules/civil/rules/part31/pd_part31b

3. Da Silva Moore v. Publicis Groupe et al, No. 1:2011cv01279 - Document 96 (S.D.N.Y. 2012)

4. Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125, 127 (S.D.N.Y. 2015)

5. Quinn v Irish Bank Resolution Corporation Ltd (In Special Liquidation) & ors [2015] IESC 29,[1] [2016]

6. Pyrrho Investments Limited v MWB Property Limited [2016] EWHC 256

7. https://www.justice.gov.uk/courts/procedure-rules/civil/rules/part01

8. Mata v Avianca Airlines (2023), Civil Action No.: 22-cv-1461 (PKC), Document 32-1

9. Matthew Dahl, Varun Magesh, Mirac Suzgun & Daniel E. Ho, Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models, arXiv (2024).

Originally published by 13 June, 2024

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
