Enforced planning and reasoning within our LLM Claim Assistant

By Evan Poe, Nazem Aldroubi, and Benjamin Baker

Background

We recently shared that we’re developing an LLM-powered Claim Assistant to answer complex questions about how claims are processed. As discussed in our first post, this task often exceeded the LLM’s token limits when we tried to process an entire claim stack trace at once. Our solution was to adopt call chaining, which we describe further in our second post. Call chaining allows us to break the various pieces of the claims process into smaller, more manageable segments, facilitating more effective analysis.

Despite these improvements, we continued to encounter limitations when reviewing our most complex claims. For instance, processing a claim involving multiple services might require Oscar’s claim system to review a member’s past claims to inform our decision on the current one. This necessitates a series of intricate cross-queries, where each service might trigger a search through a potentially endless list of related claims. The resulting data traces are incredibly difficult to interpret because they consist of repetitive, yet slightly varied, logic operations (e.g., searching for a specific service within a claim). These subtle differences are crucial for decision-making but are so nuanced that even human experts find them challenging to analyze.

In an attempt to solve this, we explored the concept of enforced reasoning and action using an agent-based approach. In this framework, the LLM serves as the cognitive core of the agent, while a suite of APIs acts as its “toolbox.” This allows the agent to first reason (using planning, memory, and reflection) and then act using the tools at its disposal.

Fortunately for us, we had already developed many of the necessary components; they just hadn’t been organized within an agent-based structure. In this post, we will outline how we integrated these elements into the agent framework and discuss the resulting performance improvements.

Strategic Planning and Subgoal Breakdown

The Claim Assistant starts by formulating a strategic plan to tackle the problem at hand. This plan is an array of thoughts, or subgoals, much like breaking a large task into smaller, manageable pieces. The Claim Assistant, acting as the agent’s brain, uses its LLM capabilities to process complex information and develop a step-by-step approach to the claims adjudication question. This planning step is handled by a service we call “Relevancy Search,” which prompts the LLM with the claim’s Skeleton Trace (described below) and the question at hand.
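A sketch of what that prompt can look like; the wording, the placeholder names, and the variable name below are illustrative rather than our production prompt:

RELEVANCY_SEARCH_PROMPT = """
You are helping to explain how a health insurance claim was adjudicated.
Below is the Skeleton Trace for the claim: a high-level outline of every
module that executed, identified by its module index, without the detailed logic.

Skeleton Trace:
{skeleton_trace}

Question:
{question}

List the module indexes most relevant to answering the question, and give a
one-sentence reason for each choice.
"""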

In the context of our claims platform, a “module” corresponds to a single file containing a chunk of adjudication logic. Each module execution receives a unique ID, or index, that can be used to identify exactly where it executed within a particular claim’s trace. A typical Claim Assistant response to this prompt looks something like the example below.
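The module indexes and reasons here are invented for illustration; an actual response is free-form text in the same spirit:

Relevant module indexes:
- 14: searches the member’s prior claims for earlier occurrences of the billed service, which determines whether a frequency limit applies.
- 27: applies the duplicate-service rule that decides whether the current claim line is payable.
- 31: produces the final adjudication decision for the line, showing how the earlier findings were combined.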

The process of generating these structured thoughts is underpinned by a data structure we refer to as the Skeleton Trace. This trace functions analogously to a book’s table of contents, providing a high-level outline of the claim’s processing narrative without delving into the specifics. By presenting the Relevancy Search service with the Skeleton Trace and inquiring about the most relevant segments, the Claim Assistant can pinpoint the essential parts of the full trace required to address the query. This strategy effectively narrows the focus from one large, complex problem to a series of smaller, more manageable issues.
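To make the table-of-contents analogy concrete, a Skeleton Trace can be pictured as a compact outline like the sketch below; the module names and indexes are invented for illustration:

skeleton_trace = [
    {"module_index": 12, "module": "member_eligibility_check"},
    {"module_index": 14, "module": "claim_history_search"},
    {"module_index": 27, "module": "duplicate_service_check"},
    {"module_index": 31, "module": "final_adjudication_decision"},
]

Each entry carries just enough context for the Relevancy Search to decide which module indexes are worth expanding into their full, detailed traces.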

Through this refined planning and subgoal decomposition process, the Claim Assistant identifies specific module indexes to investigate, explaining the rationale behind each choice. This targeted approach streamlines the planning process, ensuring that the Claim Assistant efficiently navigates the complexities of claims adjudication.

Tool-Assisted Reasoning and Execution

In the preceding section, we discussed the Claim Assistant’s ability to plan and decompose problems, which is crucial for managing the complexity of claims adjudication. Now, we turn our attention to the next stage: how the Claim Assistant reasons and acts using a suite of tools to navigate through the decision-making process.

In this phase, where the Claim Assistant transitions from planning to execution, our methodology is informed by the work of Yao et al. on synergizing reasoning and acting in language models. We use prompting that guides the LLM through a structured cycle of thinking, acting, and observing. This method allows the Claim Assistant to interact with its available tools: APIs we had previously developed but not yet integrated into an agent-based framework. For the Claim Assistant, the output of this step takes a thought / action / observation form.
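An illustrative example of that output; the module index, tool call, and findings are invented, but the shape is what the prompting enforces:

Thought: Module index 14 was flagged as relevant because it searches the member’s claim history. I need its detailed trace to see which prior claims matched.
Action: detail_summary(module_index=14)
Observation: The module found two prior claims with the same procedure code in the last 90 days, so a frequency limit applies to the current claim line.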

The prompt template is designed to facilitate a clear sequence of steps for the LLM. It begins with a thought, which is a specific inquiry or action item derived from the planning stage. The Claim Assistant then performs an action, such as querying a detailed trace of a particular module. Finally, it makes an observation, interpreting the results of its action to gather insights.

For example, when the Claim Assistant is prompted to investigate a specific module index, it might take the following action: querying the detail_summary API for that module. The observation would then articulate the findings from this detailed trace, providing a concise explanation that contributes to solving the larger problem at hand.

This iterative process of thought, action, and observation continues until the Claim Assistant exhausts all relevant thoughts. By doing so, it systematically works through the complexities of the claim, combining observations to reach a comprehensive solution.
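A skeletal version of that loop, with hypothetical helpers (relevancy_search and call_llm) standing in for our real prompting code, and an assumed signature for the detail_summary API, might look like this:

def answer_question(question, skeleton_trace):
    # Planning: the Relevancy Search turns the question and the Skeleton Trace into
    # a list of thoughts, each a dict with a "module_index" and a "reason".
    thoughts = relevancy_search(question, skeleton_trace)  # hypothetical helper

    observations = []
    for thought in thoughts:
        # Act: pull the detailed trace for the module this thought targets
        # (the signature of our detail_summary API is assumed here).
        detail = detail_summary(thought["module_index"])

        # Observe: ask the LLM what the detailed trace means for the question.
        observations.append(call_llm(
            f"Thought: {thought['reason']}\n"
            f"Detailed trace:\n{detail}\n"
            "What does this tell us about the question?"
        ))

    # Combine all observations into a single, comprehensive answer.
    return call_llm(
        f"Question: {question}\n"
        "Observations:\n" + "\n".join(observations) +
        "\nWrite a concise answer based on the observations above."
    )

The key design point is that each thought maps to one tool call and one interpreted observation, so no single LLM call needs the entire trace in its context.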

This prompting method was a key milestone in the Claim Assistant’s development. Prior to this enhancement, our ground truth evaluations, in which GPT-4 grades the agent-generated answer against a human-provided ground truth (more on that in our previous post), showed us plateauing at scores around 7.9 (out of 10) for our most complex tasks. As a frame of reference, responses that score below a 9 typically do not meet our performance standards. Once we implemented the agent-based approach, our average evaluation score jumped from 7.9 to 9.4, and every subsequent evaluation has stayed in the 9s. This improvement underscores the efficacy of enforced reasoning in sharpening the Claim Assistant’s decision-making.
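As a rough sketch of how such a grading step can be wired up (the prompt wording and the call_llm helper are illustrative, not our actual evaluation harness):

GRADING_PROMPT = """
You are grading an AI assistant's answer about claims adjudication.

Question: {question}
Ground-truth answer written by a human expert: {ground_truth}
Assistant's answer: {candidate}

Score the assistant's answer from 1 to 10 for correctness and completeness
relative to the ground truth, then briefly justify the score.
"""

def grade_answer(question, ground_truth, candidate):
    # call_llm is a hypothetical wrapper around a GPT-4 call.
    return call_llm(GRADING_PROMPT.format(
        question=question, ground_truth=ground_truth, candidate=candidate
    ))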

What’s Next?

By integrating additional tools such as document retrieval systems for reimbursement policies and code functions capable of executing specific claims-related calculations, we can unlock new use cases and deeper insights into the provider reimbursement journey. The agent framework, coupled with enforced planning and reasoning, provides a great foundation for these advances, ensuring that each new tool can be incorporated into the agent’s cognitive process. This continuous expansion of the agent’s toolbox represents an exciting opportunity to step towards a more intelligent and autonomous Claim Assistant.

References

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing Reasoning and Acting in Language Models. https://arxiv.org/abs/2210.03629
