Using AI to fix one of healthcare’s most overlooked bottlenecks: getting documents to the right place

By Lauren Pendo and Rachael Burns

Documents are the lifeblood of healthcare operations, but they can also be a real challenge to manage.

At Oscar, we receive thousands of provider documents every day. These documents need to be processed and routed to the correct team in order to help our members get the care they need. Since these documents come from thousands of sources – including world-leading hospitals and “Mom and Pop Pediatrics” – they are not uniform and are often missing key details that we rely on internally.

For example, while we may be tracking a claim as it is being processed, the medical records associated with the claim may not contain the respective claim ID. Similarly, the medical record may be missing the unique identifier for a member, and instead may only contain the member’s name and date of birth.  

Without clear metadata, routing documents to the right teams becomes challenging, causing potential delays in processes like claim reviews. We needed a better system. So we built one.


Building a smarter way to route documents

Our team designed a new AI-powered triage tool to automatically extract and validate the metadata inside each document: details such as the member’s name, dates of service, and provider information. These fields are the keys to matching a document to its corresponding claim.

Here’s how it works:

  • The document comes in through various channels, including our fax lines

  • Oscar's central document system registers the document and dispatches an event

  • The AI Document Extractor listens for that event, downloads the document, and converts the pages to images

  • The Extractor sends three pages at a time to GPT, along with a structured definition of what we want the model to read from those pages (a sketch of this step follows the list)

  • The Extractor compiles information from all of these chunks, then validates it against Oscar data through a series of service calls.
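
Here is a minimal sketch of the chunking and extraction step, assuming pdf2image for page-to-image conversion, the OpenAI Python SDK’s structured-output parsing, and a deliberately trimmed-down schema. The helper and field names are illustrative, not our production code.

import base64
import io

from openai import OpenAI
from pdf2image import convert_from_path
from pydantic import BaseModel


class ExtractedMetadata(BaseModel):
    # Trimmed-down schema for illustration; the real definition covers member,
    # provider, and date-of-service fields, as in the example response below.
    document_type: str
    claim_id: list[str]


client = OpenAI()
CHUNK_SIZE = 3  # pages sent to GPT per request


def image_to_data_url(page) -> str:
    # Encode a PIL page image as a base64 data URL for the vision request.
    buffer = io.BytesIO()
    page.save(buffer, format="PNG")
    return "data:image/png;base64," + base64.b64encode(buffer.getvalue()).decode()


def extract_document(pdf_path: str) -> list[ExtractedMetadata]:
    pages = convert_from_path(pdf_path)  # one PIL image per page
    results = []
    for start in range(0, len(pages), CHUNK_SIZE):
        chunk = pages[start:start + CHUNK_SIZE]
        completion = client.beta.chat.completions.parse(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": "Extract the document metadata from these pages."},
                    *({"type": "image_url", "image_url": {"url": image_to_data_url(p)}} for p in chunk),
                ],
            }],
            response_format=ExtractedMetadata,
        )
        results.append(completion.choices[0].message.parsed)
    return results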

For a sample medical record, an example response from the Extractor would look like:

responses = [
    {
        "document_type": "medical_record",
        "claim_id": ["123499999", "56789999"],
        "member": {
            "first_name": "John",
            "last_name": "Doe",
            "osc_id": "OSC12345678-01",
            "date_of_birth": {"year": 1980, "month": 1, "day": 1},
        },
        "dates_of_service": [
            {
                "service_start_date": {"year": 2023, "month": 5, "day": 10},
                "service_end_date": {"year": 2023, "month": 5, "day": 20},
            }
        ],
        "providers": [
            {
                "first_name": "Jane",
                "last_name": "Smith",
                "npi": "1245319599",
                "tin": "987654321",
                "address": "123 Main Street",
                "city": "Anytown",
                "zip_code": "11111",
            }
        ],
    },
]
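
The “structured definition” handed to the model is essentially a schema that mirrors this response. Below is a rough sketch of what such a schema could look like, using Pydantic models with field names taken from the example above; our production definition differs in the details.

from pydantic import BaseModel


class Date(BaseModel):
    year: int
    month: int
    day: int


class Member(BaseModel):
    first_name: str
    last_name: str
    osc_id: str
    date_of_birth: Date


class DateOfService(BaseModel):
    service_start_date: Date
    service_end_date: Date


class Provider(BaseModel):
    first_name: str
    last_name: str
    npi: str
    tin: str
    address: str
    city: str
    zip_code: str


class ExtractedDocument(BaseModel):
    # Top-level structure the model is asked to fill in for each chunk of pages.
    document_type: str
    claim_id: list[str]
    member: Member
    dates_of_service: list[DateOfService]
    providers: list[Provider]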

Once we have the extracted metadata from a medical record, the real challenge is matching it to the correct Oscar member and claim, especially when the information isn’t perfect. For a single medical record, we perform the following checks to tag the document with the correct member and claim.

Validating members

We have two services that we call to validate the raw metadata extracted from the document. 

Exact Match RPC (GetPersonByDemographicInformation): Requires a perfect match. Oftentimes, we see small mistakes in how AI extracts a member’s name, which would cause this lookup to return no results. For example, “Joe” vs. “Joseph.”

Search RPC (StructuredSearchPerson): Tolerates typos, nicknames, and partial dates, but returns multiple candidates. The service returns the best match based on the request, but also provides other candidate matches, including both high confidence matches and low confidence ones. Typically, the best match is the member we want to find, but it is not guaranteed.

With our metadata, we do the following: 

  1. Try the strict lookup first. We call the exact RPC to see if we get a perfect match.

  2. Fall back to the broader search. If the strict lookup does not find a member, we automatically run the more tolerant search. 

    1. This handles cases when there is a small mistake in the extraction or a member has a confusing name structure.

    2. For example, a member with first name “James” and last name “Taylor Morgan” might be interpreted by GPT as first name “James”, middle name “Taylor”, and last name “Morgan”.

  3. Normalize the results: Regardless of which service returns the match, we standardize the output so the rest of our system always receives the same Python objects.

This orchestration means our application logic doesn’t have to worry about which service to call or how to handle mismatches. It just asks for a member tag, and our resilient lookup handles the rest.
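
A condensed sketch of that fallback logic is below. The two RPC clients are hypothetical stand-ins for GetPersonByDemographicInformation and StructuredSearchPerson; the point is the orchestration, not the exact interfaces.

from dataclasses import dataclass


@dataclass
class MemberMatch:
    member_id: str
    confidence: str  # "exact", "high", or "low"


def find_member(first_name, last_name, date_of_birth, exact_client, search_client):
    # 1. Try the strict lookup first: it only returns on a perfect demographic match.
    exact = exact_client.get_person_by_demographic_information(
        first_name=first_name, last_name=last_name, date_of_birth=date_of_birth,
    )
    if exact is not None:
        return [MemberMatch(member_id=exact.id, confidence="exact")]

    # 2. Fall back to the more tolerant search, which returns ranked candidates.
    candidates = search_client.structured_search_person(
        first_name=first_name, last_name=last_name, date_of_birth=date_of_birth,
    )

    # 3. Normalize: downstream code receives the same objects regardless of
    #    which service produced the match.
    return [
        MemberMatch(member_id=c.id, confidence="high" if c.score > 0.9 else "low")
        for c in candidates
    ]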

Validating providers

Once we’ve identified the member, we need to validate the provider. Documents often contain incomplete or slightly incorrect provider details: maybe just a last name and a city, or an address on the medical record that differs from the one in our system. We tackle this in the following way (a sketch follows the list): 

  • Geocoding: Using Oscar’s Geocoder to turn whatever address info we have into coordinates, even if it means gradually dropping fields (from street to city to zip) until we get a hit.

  • Progressive Search: Running wider and wider radius searches (1, 5, 10, 20, 50 miles) against our provider database until we find at least one plausible match.

  • De-duplication: Packaging results as structured, de-duplicated objects, so we can confidently link the document to the right provider.
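
Putting those three steps together, a simplified version of the provider lookup could look like the following. The geocoder and provider-search clients are hypothetical stand-ins for Oscar’s internal services.

RADII_MILES = [1, 5, 10, 20, 50]  # progressively wider searches


def find_providers(last_name, address, city, zip_code, geocoder, provider_search):
    # Geocode with progressively less specific input (street -> city -> zip)
    # until we get coordinates back.
    attempts = [
        {"address": address, "city": city, "zip_code": zip_code},
        {"city": city, "zip_code": zip_code},
        {"zip_code": zip_code},
    ]
    coordinates = None
    for attempt in attempts:
        coordinates = geocoder.geocode(**attempt)
        if coordinates is not None:
            break
    if coordinates is None:
        return []

    # Run wider and wider radius searches until at least one plausible match.
    for radius in RADII_MILES:
        matches = provider_search.search(
            last_name=last_name,
            latitude=coordinates.lat,
            longitude=coordinates.lng,
            radius_miles=radius,
        )
        if matches:
            # De-duplicate by NPI so the document links to a single provider record.
            return list({m.npi: m for m in matches}.values())
    return []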

Looking up claims

After retrieving a list of candidate members and validated providers, we are ready to look for matching claims. 

  1. Comprehensive Claim Search: With our list of candidate member IDs, provider NPIs/TINs, and the extracted date-of-service range, we run a resilient claim lookup. This process:

    • Tries all sensible combinations of member, provider, and date filters, starting with the exact member match or high confidence member matches.

    • Deduplicates by claim lineage, returning only the root claim(s) to avoid confusion with reversals or adjustments.

    • Short-circuits and returns as soon as a confident match is found.

  2. Traceability: Throughout the process, we log every step and capture all intermediate results in a trace object, making it easy to debug and audit the matching logic.

If at any point we lack enough information (for example, if the document is missing both a member ID and a service date), we gracefully skip the claim lookup and return an empty result, ensuring that only high-confidence matches are surfaced to downstream systems. Only when the system passes those checks do we tag the document and route it to the right workflow.
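
A simplified sketch of that claim lookup is below. The claim-search client, field names, and trace helper are illustrative stand-ins, but they show the combination ordering, lineage de-duplication, and short-circuiting described above.

from itertools import product


def find_claims(member_matches, provider_npis, service_start, service_end, claim_search, trace):
    # Not enough information to search: skip gracefully and return nothing.
    if not member_matches or service_start is None:
        trace.log("claim_lookup_skipped", reason="missing member or date of service")
        return []

    # Try exact / high-confidence member matches before lower-confidence ones.
    ordered_members = sorted(member_matches, key=lambda m: m.confidence != "exact")

    for member, npi in product(ordered_members, provider_npis or [None]):
        claims = claim_search.search(
            member_id=member.member_id,
            provider_npi=npi,
            service_start=service_start,
            service_end=service_end,
        )
        # De-duplicate by lineage: keep only root claims, not reversals or adjustments.
        roots = {c.root_claim_id: c for c in claims}
        trace.log("claim_lookup_attempt", member=member.member_id, npi=npi, hits=len(roots))
        if roots:
            # Short-circuit as soon as a confident match is found.
            return list(roots.values())
    return []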

Designed to be reused

We initially built this system for our Special Investigations Unit team, which investigates potential fraud, waste, and abuse. But because we structured it using shared infrastructure, it’s already paying dividends elsewhere.

For example, Oscar’s Utilization Management team often receives pre-claim documents that don’t have claim IDs. Thanks to the tagging built into our triage system, they can now identify the right member automatically, without manual review.

Another unexpected use case: identifying checks. Members sometimes send paper checks that get scanned into the same system. We can use the same extraction pipeline to match those checks to the right member.

What we’ve learned so far

A few lessons stood out along the way:

  • Extraction is easy. Validation is hard. AI can pull names and dates, but matching them correctly requires smart logic and a clean internal architecture with resilient fallbacks, as detailed above.

  • AI often outperforms humans at classifying document types. We found that in many cases, the model was better than people at telling whether a document was a medical record, especially when human labels were vague or incorrect. When we dug into the 7% of documents that GPT classified as medical records but Oscar hadn’t, we found that they were in fact medical records; humans had classified them based on their intent rather than their type, leading to misclassification. By decoupling the classification of document type from document intent, we can classify document types more cleanly.

What’s next

We’re just beginning to explore what else this system can do. With the core tagging and routing infrastructure in place, we’re exploring new use cases across our operations—from claims and payments to care management and member support.

We’re also working with our provider partners to encourage more structured submissions, like standardized cover sheets. But in the meantime, this model helps us bridge that gap.

At Oscar, we’re pushing the boundaries of AI to make healthcare smarter and more efficient. Interested in joining us? Check out our open roles.
