Overview

Updated 12.01.23

Our experience applying language models in healthcare has led us to several key learnings (and a growing list of use cases).

While LLMs are easy to use for great effect, they’re harder to use for great outcomes.

Rather than being a panacea, they open up a class of data problems that previously seemed virtually intractable. Rather than thinking of them as language-understanding and language-generating models, think of them as tools that are uniquely capable of going from structured data to unstructured data, and back. That makes them exceptionally powerful in healthcare, because healthcare might be the industry vertical with the highest intersection of natural language and rules-based systems. Extracting contract configurations in our claims system (← structured) from contract language (← unstructured), or explaining (← unstructured language) claims payment decisions (← structured), or translating a utilization management decision (← structured) into a patient-friendly letter (← unstructured) - those are all healthcare use cases that sit right at those intersections. Seek out those intersections, and you’ll have found a language model application in healthcare.
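As a minimal sketch of the structured-to-unstructured direction, the snippet below turns a claims payment decision into a prompt that asks a model for a plain-language explanation. The `ClaimDecision` fields, the example values, and the prompt wording are all hypothetical and chosen only for illustration; the model call itself is left out.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ClaimDecision:
    # Hypothetical fields; a real claims system will have its own schema.
    claim_id: str
    status: str          # e.g. "paid" or "denied"
    reason_code: str
    billed_amount: float
    paid_amount: float


def build_explanation_prompt(decision: ClaimDecision) -> str:
    """Serialize the structured record and wrap it in an instruction."""
    record = json.dumps(asdict(decision), indent=2, sort_keys=True)
    return (
        "Explain the following claims payment decision in plain language. "
        "Use only the facts in the record and do not add details.\n\n"
        f"Record:\n{record}\n"
    )


decision = ClaimDecision("C-001", "denied", "NOT_COVERED", 250.0, 0.0)
prompt = build_explanation_prompt(decision)
```

Keeping the record as the single source of facts in the prompt makes it easier for a human reviewer to compare the generated explanation against the decision it describes.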

Newest post here: Evaluating the Behavior of Call Chaining LLM Agents

Hallucinations are real, and a sensitive issue in an industry like healthcare.

LLMs are capable of spectacular feats, and they are also capable of spectacularly random flame-outs. We deal with them as follows. First, we focus on use cases with a human in the loop (like our test result documentation automation for Oscar Medical Group). Second, hallucinations become more controllable and identifiable in use cases that go from unstructured to structured data, such as our provider data fixer, because structured output can be checked against known values. Third, knowledge injected into the prompt is much more reliably retrieved than knowledge encoded in the model’s weights, especially when the knowledge is esoteric or sparse within the corpus. As context windows expand, and with the help of embeddings and vector databases, it becomes easier to frame LLM applications as prompt parsing problems, where the model reads the relevant facts from its prompt rather than recalling them from its weights.
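One way to picture why unstructured-to-structured use cases make hallucinations easier to spot: the output can be validated mechanically. The sketch below is an assumption-laden illustration, not a description of our provider data fixer. The field names, the `ALLOWED_SPECIALTIES` set, and the sample strings are hypothetical. It checks that the model's output is a JSON object, that a categorical field holds a known value, and that an extracted value appears verbatim in the source text.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical list of accepted values for a categorical field.
ALLOWED_SPECIALTIES = {"cardiology", "dermatology", "pediatrics"}


def validate_extraction(source_text: str, model_output: str) -> Tuple[Dict, List[str]]:
    """Parse model output and return (record, list of problems found)."""
    try:
        record = json.loads(model_output)
    except json.JSONDecodeError:
        return {}, ["output is not valid JSON"]
    if not isinstance(record, dict):
        return {}, ["output is not a JSON object"]

    problems = []

    # Closed-vocabulary check: the value must be one we already know.
    specialty = record.get("specialty")
    if specialty not in ALLOWED_SPECIALTIES:
        problems.append(f"unknown specialty: {specialty!r}")

    # Grounding check: the value must appear verbatim in the source text.
    phone = record.get("phone")
    if not isinstance(phone, str) or phone not in source_text:
        problems.append(f"phone not found in source: {phone!r}")

    return record, problems


source = "Dr. Lee, cardiology, phone 212-555-0100"
good = '{"specialty": "cardiology", "phone": "212-555-0100"}'
bad = '{"specialty": "neurosurgery", "phone": "212-555-9999"}'
```

Records that come back with a non-empty problem list can be routed to a human reviewer instead of being written to the system of record.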

Learn more here: A Simple Example for Limits on LLM Prompting Complexity

Healthcare regulatory rules are useful language model benchmarks.

Language models need benchmarks. General-purpose benchmarks such as MMLU (a collection of multiple-choice language understanding tasks) are popular. However, healthcare is highly regulated, so many decisions are already made using very quantifiable and auditable rubrics. Use your rubrics to create your own language model benchmarks. One example is the NCQA guidelines for care management programs: they are already a formalized rule set that many organizations benchmark against, so you might as well turn them into a language model benchmark.
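A rubric-based benchmark can be as simple as a list of documents with the label the rubric assigns to each, plus a loop that scores a model against those labels. The sketch below assumes a binary "met" / "not_met" label. The cases are placeholders rather than actual NCQA content, and `keyword_baseline` is a stand-in for a real model call.

```python
from typing import Callable, Dict, List

# Hypothetical rubric cases; the text and labels are illustrative placeholders.
CASES = [
    {"document": "Member was assessed within 30 days of enrollment.", "expected": "met"},
    {"document": "No assessment is documented.", "expected": "not_met"},
]


def run_benchmark(classify: Callable[[str], str], cases: List[Dict[str, str]]) -> Dict[str, float]:
    """Score a classifier against rubric-labeled cases."""
    correct = sum(1 for case in cases if classify(case["document"]) == case["expected"])
    total = len(cases)
    accuracy = correct / total if total else 0.0
    return {"total": total, "correct": correct, "accuracy": accuracy}


def keyword_baseline(document: str) -> str:
    """Trivial stand-in for a model: flags documents that start with 'No '."""
    return "not_met" if document.lower().startswith("no ") else "met"


report = run_benchmark(keyword_baseline, CASES)
```

Because `run_benchmark` only needs a function from text to label, the same cases can be reused to compare different models or prompts.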

Healthcare data regulation creates some unique issues.

Because this is such a fast-moving paradigm shift, many companies offering models are not yet prepared to accept the contractual and legal obligations of a HIPAA business associate, and the availability of HIPAA-compliant models is inconsistent. Google has responded most quickly to meet rising demand for LLMs-as-a-service. But be aware that “alpha,” “beta,” or so-called “preview” features released by cloud providers are not automatically included under the existing HIPAA compliance provisions you’ve negotiated with them. And of course, even if you have a signed business associate agreement (BAA), you must comply with HIPAA’s minimum necessary rule and obtain appropriate member/patient consents, among other privacy-related obligations.

On the question of open-source models (like Meta’s LLaMA) vs. proprietary models (like OpenAI’s GPT-4):

The open-source model landscape has been exploding recently. But, in short: proprietary models are currently better at most use cases, even though (until recently) open-source models could fit more use cases. With the launch of GPT-4 Turbo, which has a 128K-token context window, proprietary models have cemented their advantage (for now).

Ultimately, our three drivers to generate healthcare value are: 1) create better healthcare experiences that drive growth and retention, 2) impact behavior that yields better outcomes, 3) automate processes to increase affordability.

Language models have the potential to support all three. For example, incorporating language model processing in our Campaign Builder tool (which already manages virtually all of our member engagement) has clear potential for continuing to create better healthcare experiences and drive retention.