The Case for LLM Optimism

Guest post by Sean A. Harrington, Law & Technology Librarian at Ross-Blakley Law Library, Arizona State University

I’ve watched the lively discussions on the RIP-SIS blog about the implications of ChatGPT in legal research and instruction with great interest.  I have been taking a deep dive into this technology in recent months, and I thought I might be able to add a new perspective to the conversation and share some of the information that I’ve discovered with the broader law librarian community.

Hallucinations

Let’s dive into the topic at the most controversial place: hallucinations. Hallucinations – AI responses that are simply made up – are at the forefront of everyone’s mind when they use these tools for legal research. I’m hoping what follows will ease your mind and get you excited about this new technology. I’m going to focus on search and retrieval, because generative AI and agents are equally complex topics that could have posts of their own.

The Terminology Problem

As many of you have doubtless heard, the law-specific LLMs from Lexis and Westlaw are going to be different from something like ChatGPT. ChatGPT is designed to cater to a wide array of queries from an assortment of domains and to produce a huge range of outputs. While this versatility enables users to obtain preliminary information, it falls short in the context of legal research, where specificity, precision, and reliability are paramount. Asking ChatGPT a legal research question is like Googling a research question and then cutting and pasting the summary of the first result into a legal research memo. It could be an excerpt from an attorney’s blog that is mostly correct. In many cases, it could be a result from years ago, because people are not using a web-connected plugin (or Bing) to find current information. In any event, it is not how we conduct legal research.

The big upgrade we’re getting with these law-specific LLMs is the massive library of premium secondary materials from Lexis and Westlaw used to train the models. This will immediately make them substantially more useful than something like ChatGPT because it refines the predictive nature of the LLMs. In addition, we get the massive primary law databases from both Lexis and Westlaw, so that we can have direct links to the sources used to generate the answer. Let’s talk about some of the other things going on behind the scenes with LLMs that will make them more effective for our purposes.

Prompting

A concept that is often not understood to its full extent is “prompting.” Prompting usually means providing the model with some context so it knows what type of information to look for when predicting the output. You’ve undoubtedly seen a webinar, YouTube video, or Twitter thread with prompting along these lines:

“You are a law professor at a top United States law school. You are an expert in Administrative Law and provide all of your output in a clear, academic tone without bias. Please summarize the current state of the Chevron Doctrine in ten bullet points.”

That is not what I’m talking about. LLM-specific programming frameworks like LangChain allow for the creation of complex behind-the-scenes prompts and for “multi-stage reasoning” that provides natural-language feedback to the LLM before the user ever sees a response. LangChain chains read like a sort of pseudo-code, something between natural-language text and Python programming. A law librarian with an understanding of HTML, JSON, Python, or R could learn it. With LangChain you can link these prompts in a sequence that transforms the output into various formats, rigorously verifies the output through multiple checks, executes an array of similar searches, and amalgamates the results. You can connect the LLM to a myriad of different applications and platforms. More importantly, this can allow the LLM to perform complex, niche subtasks with a high level of efficacy.

Here is a visual example of the types of advanced prompting you could do with LangChain for legal research:

Note: Each step of this analysis could be pages and pages worth of complex code running behind the scenes before the user sees anything.
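To make that concrete in code rather than a diagram, here is a minimal sketch of a two-stage chain. It assumes the classic LangChain API (PromptTemplate, LLMChain, SimpleSequentialChain) and an OpenAI key in the environment; import paths and class names shift between LangChain versions, and the prompts here are purely illustrative, not any vendor’s actual pipeline.

# A minimal sketch of multi-stage prompting with LangChain.
# Assumes: pip install langchain openai, and OPENAI_API_KEY set in the environment.
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm = ChatOpenAI(temperature=0)

# Stage 1: restate the user's plain-English question as a focused research issue.
issue_prompt = PromptTemplate(
    input_variables=["question"],
    template=(
        "Restate the following question as a single, precisely framed legal "
        "research issue, noting the jurisdiction if one is mentioned:\n{question}"
    ),
)
issue_chain = LLMChain(llm=llm, prompt=issue_prompt)

# Stage 2: turn that framed issue into several candidate search queries.
query_prompt = PromptTemplate(
    input_variables=["issue"],
    template=(
        "For this legal research issue, list five alternative search queries, "
        "mixing natural-language and terms-and-connectors styles:\n{issue}"
    ),
)
query_chain = LLMChain(llm=llm, prompt=query_prompt)

# The user only ever sees the output of the final stage.
pipeline = SimpleSequentialChain(chains=[issue_chain, query_chain])
print(pipeline.run("Can I be charged with DUI if the engine was never running?"))

In a production system, each stage could also branch, call out to a citator or a vector search, and double-check its own output before anything reaches the screen.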

I’m not advocating that every law librarian should immediately enroll in Computer Science classes to learn new programming languages.  Frankly, you don’t need to in order to use these tools.  However, having someone on your staff with a basic understanding of these processes could become very valuable and could provide a new form of outreach (and growth) for your law library. 

Vector Embeddings and Vector Databases

Imagine you have a vast collection of books, and you want to create a magical card catalog or index where you can instantly find not just books, but also ideas and concepts that are similar or related. Embeddings in Large Language Models (LLMs) are like creating this magical index.

In this index, each book, paragraph, or even word is represented by a numerical vector – its own magical catalog card. These vectors are arranged in such a way that similar ideas and concepts are placed closer together. These embeddings allow for semantic searching.

For example, consider the phrases:

“The cat meowed for food.”
“The kitten called for treats.”

These two expressions would be very difficult to connect through a traditional keyword search, but conceptually they are quite similar. Once you have embeddings in place, the terms sit close enough together for an LLM to pick up this nuance easily.
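As a rough illustration, here is a small sketch that measures how close those two sentences sit in embedding space. It assumes the open-source sentence-transformers package and its all-MiniLM-L6-v2 model, chosen only because they are freely available; the commercial legal platforms use their own, much larger embedding models.

# A toy demonstration of semantic similarity using vector embeddings.
# Assumes: pip install sentence-transformers numpy
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, general-purpose embedding model

vec_a, vec_b = model.encode(["The cat meowed for food.", "The kitten called for treats."])

# Cosine similarity: values near 1.0 mean the sentences point in nearly the same direction.
similarity = np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
print(f"Cosine similarity: {similarity:.2f}")

Despite sharing almost no keywords, the two sentences should land close together in the vector space – exactly the nuance a keyword search misses.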

Imagine the complexity of some of the Boolean expressions you have created for complex legal scenarios. In this example, the natural-language prompt is: a DUI arrest where the person was stopped on the side of the road, keys in the ignition but the engine not running. A traditional keyword search might look something like this:

(drunk driving OR intoxicated OR under influence OR impaired OR alcohol OR drug OR substance abuse OR alcohol abuse OR drug abuse OR DUI OR DWI) 

AND 

(NOT (car started OR vehicle started OR ignition started OR engine started)) 

The relationships created in a vector database are especially useful to a field that frequently relies on analogy to craft compelling arguments. In this way, legal research is particularly well-suited to the use of LLMs because so much of it turns on words and related concepts.
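To see how that plays out for the DUI example above, here is a bare-bones sketch of semantic retrieval: embed a handful of invented case summaries, embed the natural-language prompt, and rank by similarity instead of matching keywords. A real vector database (and real case law) would replace the in-memory list, but the retrieval idea is the same.

# A bare-bones semantic search over a few invented case summaries (not real cases).
# Assumes sentence-transformers and numpy are installed.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

summaries = [
    "Defendant found asleep in a parked car on the shoulder, keys in the ignition, engine off.",
    "Driver rear-ended another vehicle while texting; no alcohol involved.",
    "Motorist charged with operating under the influence after failing a roadside breath test.",
    "Landlord disputes a tenant's claim for the return of a security deposit.",
]

prompt = ("DUI arrest where the person was stopped on the side of the road, "
          "keys in the ignition but the engine not running")

doc_vecs = model.encode(summaries)   # one vector per summary
query_vec = model.encode(prompt)     # one vector for the prompt

# Rank summaries by cosine similarity to the prompt, highest first.
scores = (doc_vecs @ query_vec) / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.2f}  {summaries[idx]}")

The point is that a summary like the sleeping-driver one can surface near the top even though it never uses the words “DUI” or “arrest” – the kind of match the Boolean expression above has to anticipate explicitly.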

Conclusion

LLMs excel at automating and speeding up repetitive and tedious processes. As it stands, you still need a thorough understanding of the law to evaluate the results you get back from any of these services. LLMs could afford law librarians the invaluable opportunity to allocate their time and energy toward more human-centric pursuits (the reference interview, one-on-one student appointments, writing, etc.). They also give you some breathing room, so you can spend time deeply reading and pondering the legal analysis and tasks before you. Finally, LLMs create new opportunities for law librarians to get involved in AI so that we can ride this tech tidal wave into the future and expand our budgets and staff. Noteworthy people in tech have anticipated that the demand for legal services could increase in the AI age as questions surrounding ownership, usage, and risk become increasingly pressing. This reallocation of focus not only optimizes the research process but also fosters a more interpersonal and service-oriented approach to law librarianship – specifically the part that is unlikely to be replaced by machines anytime soon.

For a deeper dive into all of this, I have an editorial on SSRN in preprint format for feedback (or coauthors, if you are interested in this technology and want to tag-team it with me). I’ve also got another preprint that discusses students using custom-tailored LLMs for studying in law school. I’d love to hear from you!

