
Biomarker Parser

@loop/biomarker-parser extracts structured biomarker data from lab report PDFs and images using AI providers (Anthropic Claude, OpenAI GPT-4). It handles the complete pipeline from raw document to normalized, classified biomarker readings.

Installation

pnpm add @loop/biomarker-parser

Peer dependencies: openai and/or @anthropic-ai/sdk (depending on which provider you use).

Quick Start

```typescript
import { readFile } from 'node:fs/promises';
import { parseLabReport, createOpenAIProvider } from '@loop/biomarker-parser';

// PDF input needs a provider that accepts PDFs directly (see AI Providers).
const provider = createOpenAIProvider({
  apiKey: process.env.OPENAI_API_KEY!,
});

const pdfBuffer = await readFile('lab-report.pdf'); // example path

const result = await parseLabReport(
  provider,
  {
    type: 'pdf',
    content: pdfBuffer, // Buffer or base64 string
  },
  { patientSex: 'male' },
);

if (result.ok) {
  console.log(result.value.biomarkers);
  // [{ code: 'testosterone-total', name: 'Total Testosterone', value: 650, unit: 'ng/dL', status: 'normal' }]
} else {
  console.error(result.error);
}
```
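Because result.value.biomarkers is a plain array, downstream handling is ordinary TypeScript. The sketch below filters out-of-range readings after narrowing on result.ok. It assumes that Result<T> from @loop/core is a discriminated union on ok (the real type may carry more fields), mirrors only part of ProcessedBiomarker, and uses a hypothetical helper name, flagOutOfRange, with made-up sample values.

```typescript
// Local mirrors of the documented types, so this sketch runs without the package.
type BiomarkerStatus =
  | 'critical_low' | 'low' | 'optimal' | 'normal' | 'high' | 'critical_high' | 'unknown';

interface ProcessedBiomarker {
  code: string;
  name: string;
  value: number;
  unit: string;
  status: BiomarkerStatus;
}

// Assumed shape of Result<T> from @loop/core: a discriminated union on `ok`.
type Result<T> = { ok: true; value: T } | { ok: false; error: unknown };

type ParseOutcome = Result<{ biomarkers: ProcessedBiomarker[] }>;

const OUT_OF_RANGE = new Set<BiomarkerStatus>(['critical_low', 'low', 'high', 'critical_high']);

// Hypothetical helper: returns biomarkers outside their reference range; a failed parse yields [].
function flagOutOfRange(result: ParseOutcome): ProcessedBiomarker[] {
  if (!result.ok) return [];
  return result.value.biomarkers.filter((b) => OUT_OF_RANGE.has(b.status));
}

// Made-up sample values for illustration only.
const sample: ParseOutcome = {
  ok: true,
  value: {
    biomarkers: [
      { code: 'testosterone-total', name: 'Total Testosterone', value: 650, unit: 'ng/dL', status: 'normal' },
      { code: 'tsh', name: 'TSH', value: 5.8, unit: 'mIU/L', status: 'high' },
    ],
  },
};

console.log(flagOutOfRange(sample).map((b) => b.code)); // [ 'tsh' ]
```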

API Reference

parseLabReport(provider, input, options?)

Main entry point for lab report parsing.

Parameters:

| Param | Type | Description |
|---|---|---|
| provider | AIProvider | AI provider instance (Anthropic or OpenAI) |
| input | ExtractionInput | Input document |
| options | ParserOptions | Optional configuration |

ExtractionInput:

```typescript
type ExtractionInput =
  | { type: 'pdf'; content: Buffer | string }
  | { type: 'image'; content: Buffer | string; mimeType: string }
  | { type: 'text'; content: string };
```
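A common first step is choosing the right ExtractionInput variant from a file name. The helper below, inputFromFile, is hypothetical (not part of the package), and its extension-to-MIME table is an assumption to extend for whatever image formats your provider accepts. It types content as Uint8Array so it runs without Node type definitions; Node's Buffer is a Uint8Array subclass, so Buffers can be passed in.

```typescript
// Mirror of the documented union, with Uint8Array standing in for Buffer.
type ExtractionInput =
  | { type: 'pdf'; content: Uint8Array | string }
  | { type: 'image'; content: Uint8Array | string; mimeType: string }
  | { type: 'text'; content: string };

// Hypothetical extension-to-MIME mapping.
const IMAGE_MIME: Record<string, string | undefined> = {
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  webp: 'image/webp',
};

// Picks the ExtractionInput variant from the file extension (case-insensitive).
function inputFromFile(fileName: string, bytes: Uint8Array): ExtractionInput {
  const ext = fileName.split('.').pop()?.toLowerCase() ?? '';
  if (ext === 'pdf') return { type: 'pdf', content: bytes };
  const mimeType = IMAGE_MIME[ext];
  if (mimeType !== undefined) return { type: 'image', content: bytes, mimeType };
  throw new Error(`Unsupported file type: .${ext}`);
}

console.log(inputFromFile('scan.png', new Uint8Array([0x89, 0x50])).type); // image
```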

ParserOptions:

```typescript
interface ParserOptions {
  patientSex?: 'male' | 'female';
  normalizeUnits?: boolean; // default: true
}
```

Returns: Promise<Result<ParseResult>>

```typescript
interface ParseResult {
  biomarkers: ProcessedBiomarker[];
  metadata: {
    provider: string;
    labDate?: string;
    patientName?: string;
    rawBiomarkerCount: number;
    matchedCount: number;
    unmatchedCount: number;
  };
}

interface ProcessedBiomarker {
  code: string;          // Canonical biomarker code
  name: string;          // Display name
  value: number;         // Numeric value
  unit: string;          // Standardized unit
  originalUnit?: string; // Original unit before normalization
  status: BiomarkerStatus;
  referenceRange?: {
    low: number;
    high: number;
    optimalLow?: number;
    optimalHigh?: number;
  };
}

type BiomarkerStatus =
  | 'critical_low'
  | 'low'
  | 'optimal'
  | 'normal' // within range but not optimal
  | 'high'
  | 'critical_high'
  | 'unknown';
```
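One plausible way to derive a BiomarkerStatus from referenceRange is sketched below; the package's actual rules are not documented here. Assumptions: bounds are inclusive, 'unknown' means no reference range, and the critical thresholds come from a hypothetical separate argument, because the documented referenceRange has no critical fields. The function name classifyStatus and the numbers in the example are illustrative, not clinical ranges.

```typescript
type BiomarkerStatus =
  | 'critical_low' | 'low' | 'optimal' | 'normal' | 'high' | 'critical_high' | 'unknown';

interface ReferenceRange {
  low: number;
  high: number;
  optimalLow?: number;
  optimalHigh?: number;
}

// Hypothetical: critical bounds are passed separately since referenceRange lacks them.
interface CriticalLimits {
  criticalLow?: number;
  criticalHigh?: number;
}

function classifyStatus(
  value: number,
  range?: ReferenceRange,
  critical: CriticalLimits = {},
): BiomarkerStatus {
  if (range === undefined) return 'unknown'; // nothing to compare against
  if (critical.criticalLow !== undefined && value < critical.criticalLow) return 'critical_low';
  if (critical.criticalHigh !== undefined && value > critical.criticalHigh) return 'critical_high';
  if (value < range.low) return 'low';
  if (value > range.high) return 'high';
  // Inside the reference range: 'optimal' only if an optimal band exists and contains the value.
  if (range.optimalLow === undefined && range.optimalHigh === undefined) return 'normal';
  const optLow = range.optimalLow ?? range.low;
  const optHigh = range.optimalHigh ?? range.high;
  return value >= optLow && value <= optHigh ? 'optimal' : 'normal';
}

// Illustrative numbers only.
const demoRange: ReferenceRange = { low: 10, high: 50, optimalLow: 20, optimalHigh: 40 };
console.log(classifyStatus(30, demoRange)); // optimal
console.log(classifyStatus(45, demoRange)); // normal
console.log(classifyStatus(2, demoRange, { criticalLow: 3 })); // critical_low
```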

parseFromText(provider, text, options?)

Parses pre-extracted text, bypassing PDF and image processing.

```typescript
const result = await parseFromText(provider, labReportText, {
  patientSex: 'female',
});
```

AI Providers

Anthropic Provider

```typescript
import { createAnthropicProvider } from '@loop/biomarker-parser';

const provider = createAnthropicProvider({
  apiKey: process.env.ANTHROPIC_API_KEY!,
  model: 'claude-sonnet-4-20250514', // optional; default varies
});
```

Supports: text and image inputs. It does not accept PDF input directly; PDFs must first be converted to images.

OpenAI Provider

```typescript
import { createOpenAIProvider } from '@loop/biomarker-parser';

const provider = createOpenAIProvider({
  apiKey: process.env.OPENAI_API_KEY!,
  model: 'gpt-4o', // optional
});
```

Supports: PDF, image, and text inputs.
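Since the two providers accept different input types, a caller may want to route each document to a provider that can handle it. The capability table below restates this section; the package may not expose such a table, and the names SUPPORTED and pickProvider are hypothetical.

```typescript
type InputType = 'pdf' | 'image' | 'text';
type ProviderName = 'anthropic' | 'openai';

// Capabilities as stated in this section (hypothetical table, not a package export).
const SUPPORTED: Record<ProviderName, ReadonlySet<InputType>> = {
  anthropic: new Set<InputType>(['image', 'text']),
  openai: new Set<InputType>(['pdf', 'image', 'text']),
};

// Returns the first provider in preference order that accepts the input type, or null.
function pickProvider(inputType: InputType, preferred: ProviderName[]): ProviderName | null {
  return preferred.find((p) => SUPPORTED[p].has(inputType)) ?? null;
}

console.log(pickProvider('pdf', ['anthropic', 'openai'])); // openai
console.log(pickProvider('image', ['anthropic', 'openai'])); // anthropic
```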


Unit Normalization

The parser normalizes biomarker values to standard units using a conversion table.

normalizeValue(biomarkerCode, value, unit)

```typescript
import { normalizeValue } from '@loop/biomarker-parser';

const result = normalizeValue('testosterone-total', 22.5, 'nmol/L');
// { ok: true, value: { value: 648.4, unit: 'ng/dL', conversionFactor: 28.818 } }
// (22.5 nmol/L × 28.818 ≈ 648.4 ng/dL)
```
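A conversion table of this kind can be as simple as a standard unit per biomarker plus a multiplicative factor per source unit. The sketch below is a hypothetical stand-in for normalizeValue (named normalizeValueSketch), with a one-entry table using the factor from the example above and a local Result mirror. Which error code the package uses for an unknown biomarker is an assumption; NORMALIZATION_FAILED is used here. It does not round, whereas the package may.

```typescript
// Local mirror of Result<T>; the real error object from @loop/core may differ.
type Result<T> = { ok: true; value: T } | { ok: false; error: { code: string; message: string } };

interface Normalized {
  value: number;
  unit: string;
  conversionFactor: number;
}

// Hypothetical table: factors multiply a value in the source unit to give the standard unit.
const TABLE: Record<string, { standard: string; factors: Record<string, number | undefined> } | undefined> = {
  'testosterone-total': { standard: 'ng/dL', factors: { 'nmol/L': 28.818 } },
};

function normalizeValueSketch(code: string, value: number, unit: string): Result<Normalized> {
  const entry = TABLE[code];
  if (entry === undefined) {
    return { ok: false, error: { code: 'NORMALIZATION_FAILED', message: `No conversions for ${code}` } };
  }
  if (unit === entry.standard) return { ok: true, value: { value, unit, conversionFactor: 1 } };
  const factor = entry.factors[unit];
  if (factor === undefined) {
    return { ok: false, error: { code: 'UNSUPPORTED_UNIT', message: `${unit} -> ${entry.standard}` } };
  }
  return { ok: true, value: { value: value * factor, unit: entry.standard, conversionFactor: factor } };
}

const converted = normalizeValueSketch('testosterone-total', 22.5, 'nmol/L');
if (converted.ok) console.log(converted.value.value.toFixed(3), converted.value.unit); // 648.405 ng/dL
```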

convertFromStandard(biomarkerCode, value, targetUnit)

Reverse conversion from standard unit:

```typescript
import { convertFromStandard } from '@loop/biomarker-parser';

const result = convertFromStandard('testosterone-total', 650, 'nmol/L');
// { ok: true, value: { value: 22.56, unit: 'nmol/L' } }
// (650 ng/dL ÷ 28.818 ≈ 22.56 nmol/L)
```
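With a multiplicative factor, the reverse conversion is a division by the same factor, so converting out of the standard unit and back should recover the original value up to floating-point error. The functions below are hypothetical sketches (not the package's implementation), using a one-entry table with the factor quoted above and no rounding.

```typescript
// Factor that multiplies a value in the keyed unit to give the standard unit (hypothetical table).
const FACTORS: Record<string, Record<string, number | undefined> | undefined> = {
  'testosterone-total': { 'nmol/L': 28.818 },
};

// Source unit -> standard unit: multiply.
function toStandardSketch(code: string, value: number, unit: string): number | null {
  const factor = FACTORS[code]?.[unit];
  return factor === undefined ? null : value * factor;
}

// Standard unit -> target unit: divide by the same factor.
function fromStandardSketch(code: string, standardValue: number, targetUnit: string): number | null {
  const factor = FACTORS[code]?.[targetUnit];
  return factor === undefined ? null : standardValue / factor;
}

const nmol = fromStandardSketch('testosterone-total', 650, 'nmol/L');
console.log(nmol?.toFixed(3)); // 22.555
```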

canonicalizeUnit(unit)

Normalizes unit strings to canonical form:

```typescript
import { canonicalizeUnit } from '@loop/biomarker-parser';

canonicalizeUnit('ng/dl');  // 'ng/dL'
canonicalizeUnit('mIU/ml'); // 'mIU/mL'
canonicalizeUnit('µg/L');   // 'mcg/L'
```
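The three documented cases can be reproduced with two string rewrites: replace the micro prefix with "mc", and capitalize a trailing litre, decilitre, or millilitre. The sketch below (canonicalizeUnitSketch, a hypothetical name) covers only those patterns; the package presumably handles many more spellings.

```typescript
// Hypothetical sketch covering the documented examples only.
function canonicalizeUnitSketch(unit: string): string {
  return unit
    .trim()
    // µ (U+00B5 micro sign) or μ (U+03BC Greek mu) -> "mc"
    .replace(/[\u00B5\u03BC]/g, 'mc')
    // trailing "/l", "/dl", "/ml" in any case -> "/L", "/dL", "/mL"
    .replace(/\/([dm]?)l$/i, (_match: string, prefix: string) => `/${prefix.toLowerCase()}L`);
}

console.log(canonicalizeUnitSketch('ng/dl'));  // ng/dL
console.log(canonicalizeUnitSketch('mIU/ml')); // mIU/mL
```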

Biomarker Matching

matchBiomarkerCode(name)

Maps raw biomarker names from lab reports to canonical codes using synonym resolution:

```typescript
import { matchBiomarkerCode } from '@loop/biomarker-parser';

matchBiomarkerCode('Total Testosterone'); // 'testosterone-total'
matchBiomarkerCode('TSH');                // 'tsh'
matchBiomarkerCode('Hemoglobin A1c');     // 'hba1c'
matchBiomarkerCode('Unknown Test');       // null
```

Matching is attempted in this order:

  1. Exact match against canonical names
  2. Synonym lookup from @loop/health-data
  3. Longest-substring match as a fallback
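The three tiers above can be sketched as follows. The tiny catalogue stands in for the synonym data in @loop/health-data, and "longest-substring match" is interpreted here as "the longest known name or synonym that appears inside the input"; the library may define it differently. The names BiomarkerDef, CATALOGUE, and matchCodeSketch are hypothetical.

```typescript
interface BiomarkerDef {
  code: string;
  name: string;
  synonyms: string[];
}

// Illustrative catalogue; the real synonyms come from @loop/health-data.
const CATALOGUE: BiomarkerDef[] = [
  { code: 'testosterone-total', name: 'Total Testosterone', synonyms: ['Testosterone, Total', 'Testosterone'] },
  { code: 'testosterone-free', name: 'Free Testosterone', synonyms: ['Testosterone, Free'] },
  { code: 'tsh', name: 'Thyroid Stimulating Hormone', synonyms: ['TSH', 'Thyrotropin'] },
  { code: 'hba1c', name: 'Hemoglobin A1c', synonyms: ['HbA1c', 'Glycated Hemoglobin'] },
];

// Case-insensitive comparison with collapsed whitespace.
const normalizeName = (s: string): string => s.toLowerCase().replace(/\s+/g, ' ').trim();

function matchCodeSketch(rawName: string): string | null {
  const input = normalizeName(rawName);

  // 1. Exact match against canonical names
  const exact = CATALOGUE.find((d) => normalizeName(d.name) === input);
  if (exact) return exact.code;

  // 2. Synonym lookup
  const synonym = CATALOGUE.find((d) => d.synonyms.some((s) => normalizeName(s) === input));
  if (synonym) return synonym.code;

  // 3. Fallback: the longest known name or synonym contained in the input wins
  let best: { code: string; length: number } | null = null;
  for (const def of CATALOGUE) {
    for (const term of [def.name, ...def.synonyms].map(normalizeName)) {
      if (input.includes(term) && (best === null || term.length > best.length)) {
        best = { code: def.code, length: term.length };
      }
    }
  }
  return best === null ? null : best.code;
}

// 'free testosterone' (17 chars) beats 'testosterone' (12 chars).
console.log(matchCodeSketch('Free Testosterone, LC/MS')); // testosterone-free
```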

processRawBiomarkers(raw, sex?, normalizeUnits?)

Full processing pipeline for raw biomarker data:

```typescript
import { processRawBiomarkers } from '@loop/biomarker-parser';

const result = processRawBiomarkers(
  [
    { name: 'Total Testosterone', value: 650, unit: 'ng/dL' },
    { name: 'TSH', value: 2.1, unit: 'mIU/L' },
  ],
  'male', // sex
  true,   // normalizeUnits
);

// result.matched: ProcessedBiomarker[] (successfully matched)
// result.unmatched: RawBiomarker[] (could not be matched)
```
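At its core, the matched/unmatched split routes each raw reading according to whether a canonical code is found. The sketch below shows only that step; the real pipeline also normalizes units and assigns a status, which are omitted here. partitionByMatch and the lookup-table matcher are hypothetical, with the matcher injected so any name resolver can be plugged in.

```typescript
interface RawBiomarker {
  name: string;
  value: number;
  unit: string;
}

interface MatchedBiomarker extends RawBiomarker {
  code: string;
}

// Hypothetical: route each reading to `matched` or `unmatched` by the matcher's answer.
function partitionByMatch(
  raw: RawBiomarker[],
  match: (name: string) => string | null,
): { matched: MatchedBiomarker[]; unmatched: RawBiomarker[] } {
  const matched: MatchedBiomarker[] = [];
  const unmatched: RawBiomarker[] = [];
  for (const reading of raw) {
    const code = match(reading.name);
    if (code === null) {
      unmatched.push(reading);
    } else {
      matched.push({ ...reading, code });
    }
  }
  return { matched, unmatched };
}

// Stand-in matcher backed by a fixed lookup table (illustrative only).
const lookup: Record<string, string | undefined> = { 'total testosterone': 'testosterone-total', tsh: 'tsh' };
const simpleMatch = (name: string): string | null => lookup[name.toLowerCase()] ?? null;

const { matched, unmatched } = partitionByMatch(
  [
    { name: 'Total Testosterone', value: 650, unit: 'ng/dL' },
    { name: 'TSH', value: 2.1, unit: 'mIU/L' },
    { name: 'Mystery Marker', value: 1, unit: 'U/L' },
  ],
  simpleMatch,
);
console.log(matched.length, unmatched.length); // 2 1
```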

Error Handling

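The parsing and conversion functions (parseLabReport, parseFromText, normalizeValue, convertFromStandard) return Result<T> from @loop/core. The helpers return plain values: matchBiomarkerCode returns null when no code matches, canonicalizeUnit returns a string, and processRawBiomarkers returns its matched/unmatched split directly. Errors use specific error codes: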

| Error code | Factory function | Description |
|---|---|---|
| EXTRACTION_FAILED | extractionFailed() | AI provider failed to extract biomarkers |
| INVALID_PDF | invalidPdf() | Input PDF is invalid or corrupt |
| NORMALIZATION_FAILED | normalizationFailed() | Unit conversion failed |
| PROVIDER_ERROR | providerError() | AI provider returned an error |
| NO_BIOMARKERS_FOUND | noBiomarkersFound() | No biomarkers extracted from input |
| UNSUPPORTED_UNIT | unsupportedUnit() | Unit conversion not supported |
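Because the error codes form a closed set, callers can branch on them with an exhaustive switch so the compiler flags any code left unhandled. The sketch below assumes an error object shaped as { code, message } (check @loop/core for the actual fields), and its retry policy is a judgment call rather than documented behavior: only failures on the AI-provider side are treated as worth retrying. isRetryable is a hypothetical name.

```typescript
// The six documented codes; the error object's shape is an assumption.
type ParserErrorCode =
  | 'EXTRACTION_FAILED'
  | 'INVALID_PDF'
  | 'NORMALIZATION_FAILED'
  | 'PROVIDER_ERROR'
  | 'NO_BIOMARKERS_FOUND'
  | 'UNSUPPORTED_UNIT';

interface ParserError {
  code: ParserErrorCode;
  message: string;
}

// Hypothetical retry policy: provider-side failures may succeed on a second attempt.
function isRetryable(error: ParserError): boolean {
  switch (error.code) {
    case 'PROVIDER_ERROR':
    case 'EXTRACTION_FAILED':
      return true;
    case 'INVALID_PDF':
    case 'NO_BIOMARKERS_FOUND':
    case 'NORMALIZATION_FAILED':
    case 'UNSUPPORTED_UNIT':
      return false;
    default: {
      // Compile-time exhaustiveness check: a new, unhandled code becomes a type error here.
      const unhandled: never = error.code;
      throw new Error(`Unhandled error code: ${String(unhandled)}`);
    }
  }
}

console.log(isRetryable({ code: 'PROVIDER_ERROR', message: 'rate limited' })); // true
```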