Biomarker Parser

@loop/biomarker-parser extracts structured biomarker data from lab report PDFs and images using AI providers (Anthropic Claude, OpenAI GPT-4). It handles the complete pipeline from raw document to normalized, classified biomarker readings.

Installation


pnpm add @loop/biomarker-parser

Peer dependencies: openai and/or @anthropic-ai/sdk (depending on which provider you use).

Quick Start


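import { readFile } from 'node:fs/promises';
import { parseLabReport, createAnthropicProvider } from '@loop/biomarker-parser';
 
const provider = createAnthropicProvider({
  apiKey: process.env.ANTHROPIC_API_KEY!,
});
 
// Load the report from disk (the path is illustrative).
const pdfBuffer = await readFile('./lab-report.pdf');
 
const result = await parseLabReport(provider, {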
  type: 'pdf',
  content: pdfBuffer, // Buffer or base64 string
}, {
  patientSex: 'male',
});
 
if (result.ok) {
  console.log(result.value.biomarkers);
  // [{ code: 'testosterone-total', name: 'Total Testosterone', value: 650, unit: 'ng/dL', status: 'normal' }]
} else {
  console.error(result.error);
}

API Reference

`parseLabReport(provider, input, options?)`

Main entry point for lab report parsing.

Parameters:

| Param | Type | Description |
| --- | --- | --- |
| `provider` | `AIProvider` | AI provider instance (Anthropic or OpenAI) |
| `input` | `ExtractionInput` | Input document |
| `options` | `ParserOptions` | Optional configuration |

ExtractionInput:


type ExtractionInput =
  | { type: 'pdf'; content: Buffer | string }
  | { type: 'image'; content: Buffer | string; mimeType: string }
  | { type: 'text'; content: string };

ParserOptions:


interface ParserOptions {
  patientSex?: 'male' | 'female';
  normalizeUnits?: boolean;  // default: true
}

Returns: `Promise<Result<ParseResult>>`


interface ParseResult {
  biomarkers: ProcessedBiomarker[];
  metadata: {
    provider: string;
    labDate?: string;
    patientName?: string;
    rawBiomarkerCount: number;
    matchedCount: number;
    unmatchedCount: number;
  };
}
 
interface ProcessedBiomarker {
  code: string;           // Canonical biomarker code
  name: string;           // Display name
  value: number;          // Numeric value
  unit: string;           // Standardized unit
  originalUnit?: string;  // Original unit before normalization
  status: BiomarkerStatus;
  referenceRange?: {
    low: number;
    high: number;
    optimalLow?: number;
    optimalHigh?: number;
  };
}
 
type BiomarkerStatus =
  | 'critical_low'
  | 'low'
  | 'optimal'
  | 'normal'    // within range but not optimal
  | 'high'
  | 'critical_high'
  | 'unknown';
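
The status values read as bands around the reference range, with `normal` meaning inside the range but outside the optimal band. The sketch below is one plausible classifier, not the package's implementation. The `CriticalThresholds` parameter is hypothetical because `referenceRange` carries no critical cut-offs, and a missing range yields `'unknown'`. The ranges in the usage lines are invented for illustration, not clinical values.

```typescript
type BiomarkerStatus =
  | 'critical_low'
  | 'low'
  | 'optimal'
  | 'normal'
  | 'high'
  | 'critical_high'
  | 'unknown';

interface ReferenceRange {
  low: number;
  high: number;
  optimalLow?: number;
  optimalHigh?: number;
}

// Hypothetical: critical cut-offs are not part of ProcessedBiomarker.referenceRange.
interface CriticalThresholds {
  criticalLow?: number;
  criticalHigh?: number;
}

// Sketch only: classify a value against its reference and (optional) optimal range.
function classifyStatus(
  value: number,
  range?: ReferenceRange,
  critical: CriticalThresholds = {},
): BiomarkerStatus {
  if (range === undefined || !Number.isFinite(value)) return 'unknown';

  // Critical bands are checked first so they take precedence over plain low/high.
  const { criticalLow, criticalHigh } = critical;
  if (criticalLow !== undefined && value < criticalLow) return 'critical_low';
  if (criticalHigh !== undefined && value > criticalHigh) return 'critical_high';

  if (value < range.low) return 'low';
  if (value > range.high) return 'high';

  // Inside the reference range. Without any optimal bound, the best we can say is 'normal'.
  const { optimalLow, optimalHigh } = range;
  if (optimalLow === undefined && optimalHigh === undefined) return 'normal';

  // A one-sided optimal band falls back to the reference bound on the other side.
  const inOptimal = value >= (optimalLow ?? range.low) && value <= (optimalHigh ?? range.high);
  return inOptimal ? 'optimal' : 'normal';
}

// Invented ranges for illustration; not clinical reference values.
const exampleRange: ReferenceRange = { low: 300, high: 1000, optimalLow: 700, optimalHigh: 900 };
console.log(classifyStatus(650, exampleRange)); // logs: normal
console.log(classifyStatus(800, exampleRange)); // logs: optimal
```

Checking critical thresholds before the reference bounds keeps a severely low reading from being reported as merely `low`.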

`parseFromText(provider, text, options?)`

Parses pre-extracted text, skipping PDF and image processing.


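import { parseFromText } from '@loop/biomarker-parser';
 
// labReportText: report text you have already extracted yourself
const result = await parseFromText(provider, labReportText, {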
  patientSex: 'female',
});

AI Providers

Anthropic Provider


import { createAnthropicProvider } from '@loop/biomarker-parser';
 
const provider = createAnthropicProvider({
  apiKey: process.env.ANTHROPIC_API_KEY!,
  model: 'claude-sonnet-4-20250514',  // optional; omit to use the package default
});

Supports: text and image inputs. Does not support direct PDF input (PDF must be converted to images first).

OpenAI Provider


import { createOpenAIProvider } from '@loop/biomarker-parser';
 
const provider = createOpenAIProvider({
  apiKey: process.env.OPENAI_API_KEY!,
  model: 'gpt-4o',  // optional
});

Supports: PDF, image, and text inputs.
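
Because the two providers accept different input types, an application that receives both PDFs and images may want to route each document to a provider that can handle it. The sketch below encodes the support matrix stated above; `SUPPORTED_INPUTS` and `pickProvider` are hypothetical helpers, not package exports, and they return provider names rather than real `AIProvider` instances.

```typescript
type InputType = 'pdf' | 'image' | 'text';
type ProviderName = 'anthropic' | 'openai';

// Support matrix as documented above: the Anthropic provider lacks direct PDF input.
const SUPPORTED_INPUTS: Record<ProviderName, ReadonlySet<InputType>> = {
  anthropic: new Set<InputType>(['text', 'image']),
  openai: new Set<InputType>(['pdf', 'image', 'text']),
};

// Hypothetical helper: first provider in preference order that accepts the input type.
function pickProvider(
  inputType: InputType,
  preference: readonly ProviderName[] = ['anthropic', 'openai'],
): ProviderName | null {
  for (const name of preference) {
    if (SUPPORTED_INPUTS[name].has(inputType)) return name;
  }
  return null; // no preferred provider accepts this input type
}

console.log(pickProvider('pdf'));   // logs: openai
console.log(pickProvider('image')); // logs: anthropic
```

Returning `null` instead of throwing leaves the caller free to fall back, for example by converting PDF pages to images before using the Anthropic provider.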

Unit Normalization

The parser normalizes biomarker values to standard units using a conversion table.

`normalizeValue(biomarkerCode, value, unit)`


import { normalizeValue } from '@loop/biomarker-parser';
 
const result = normalizeValue('testosterone-total', 22.5, 'nmol/L');
// { ok: true, value: { value: 648.4, unit: 'ng/dL', conversionFactor: 28.818 } }

`convertFromStandard(biomarkerCode, value, targetUnit)`

Converts a value from the biomarker's standard unit back to `targetUnit`:


import { convertFromStandard } from '@loop/biomarker-parser';
 
const result = convertFromStandard('testosterone-total', 650, 'nmol/L');
// { ok: true, value: { value: 22.55, unit: 'nmol/L' } }
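
Forward and reverse conversion can both be driven by one factor per (biomarker, unit) pair: multiply to reach the standard unit, divide to leave it. The sketch below assumes that design. The `CONVERSIONS` table, the local `Result` stand-in, the error shape, and the rounding are all illustrative; only the testosterone factor 28.818 comes from the examples above.

```typescript
// Local stand-in for Result<T>; the real type lives in @loop/core and may differ.
type Result<T> = { ok: true; value: T } | { ok: false; error: { code: string; message: string } };

interface ConversionEntry {
  standardUnit: string;
  // Multiply a value expressed in the key unit by this factor to reach standardUnit.
  factors: Record<string, number>;
}

// Illustrative table; not the package's actual conversion data.
const CONVERSIONS: Record<string, ConversionEntry> = {
  'testosterone-total': { standardUnit: 'ng/dL', factors: { 'ng/dL': 1, 'nmol/L': 28.818 } },
};

function roundTo(x: number, digits: number): number {
  const scale = 10 ** digits;
  return Math.round(x * scale) / scale;
}

function lookupFactor(code: string, unit: string): Result<{ factor: number; standardUnit: string }> {
  const entry = CONVERSIONS[code];
  const factor = entry?.factors[unit];
  if (entry === undefined || factor === undefined) {
    return { ok: false, error: { code: 'UNSUPPORTED_UNIT', message: `No conversion for ${code} in ${unit}` } };
  }
  return { ok: true, value: { factor, standardUnit: entry.standardUnit } };
}

// Forward: source unit -> standard unit (multiply by the factor).
function toStandardSketch(
  code: string,
  value: number,
  unit: string,
): Result<{ value: number; unit: string; conversionFactor: number }> {
  const found = lookupFactor(code, unit);
  if (!found.ok) return found;
  const { factor, standardUnit } = found.value;
  return { ok: true, value: { value: roundTo(value * factor, 1), unit: standardUnit, conversionFactor: factor } };
}

// Reverse: standard unit -> target unit (divide by the same factor).
function fromStandardSketch(code: string, value: number, targetUnit: string): Result<{ value: number; unit: string }> {
  const found = lookupFactor(code, targetUnit);
  if (!found.ok) return found;
  return { ok: true, value: { value: roundTo(value / found.value.factor, 2), unit: targetUnit } };
}
```

With this rounding, 22.5 nmol/L converts to 648.4 ng/dL, and 650 ng/dL converts back to 22.56 nmol/L, one digit off the 22.55 shown above. Rounding choices visibly affect the last digit, so treat the package's own output as authoritative.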

`canonicalizeUnit(unit)`

Normalizes unit strings to canonical form:


import { canonicalizeUnit } from '@loop/biomarker-parser';
 
canonicalizeUnit('ng/dl');  // 'ng/dL'
canonicalizeUnit('mIU/ml'); // 'mIU/mL'
canonicalizeUnit('µg/L');   // 'mcg/L'
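
One way to get this behavior is to fold micro signs to `u`, case-fold the string, and look it up in a table of canonical spellings plus a few aliases. This is an assumption about the approach, not the package's code; `CANONICAL_UNITS` and `ALIASES` are small illustrative subsets, and unknown units are returned trimmed but otherwise unchanged.

```typescript
// Illustrative subset of canonical spellings.
const CANONICAL_UNITS = ['ng/dL', 'mIU/mL', 'mIU/L', 'mcg/L', 'mcg/dL', 'nmol/L', 'pg/mL', 'mg/dL', '%'];

// Aliases that differ by more than letter case (keys are already folded).
const ALIASES = new Map<string, string>([
  ['ug/l', 'mcg/L'],
  ['ug/dl', 'mcg/dL'],
]);

// Case-folded spelling -> canonical spelling.
const BY_FOLDED = new Map(CANONICAL_UNITS.map((u) => [u.toLowerCase(), u] as const));

function canonicalizeUnitSketch(unit: string): string {
  const trimmed = unit.trim();
  // Micro sign (U+00B5) and Greek small mu (U+03BC) both fold to 'u' before lowercasing.
  const folded = trimmed.replace(/[\u00B5\u03BC]/g, 'u').toLowerCase();
  return ALIASES.get(folded) ?? BY_FOLDED.get(folded) ?? trimmed;
}

canonicalizeUnitSketch('ng/dl');  // 'ng/dL'
canonicalizeUnitSketch('mIU/ml'); // 'mIU/mL'
canonicalizeUnitSketch('µg/L');   // 'mcg/L'
```

Case folding is only safe because no two units in this list differ by case alone; a table that had to distinguish, say, `mU` from `MU` would need exact-case entries.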

Biomarker Matching

`matchBiomarkerCode(name)`

Maps raw biomarker names from lab reports to canonical codes using synonym resolution:


import { matchBiomarkerCode } from '@loop/biomarker-parser';
 
matchBiomarkerCode('Total Testosterone');  // 'testosterone-total'
matchBiomarkerCode('TSH');                 // 'tsh'
matchBiomarkerCode('Hemoglobin A1c');      // 'hba1c'
matchBiomarkerCode('Unknown Test');        // null

The matching uses:

- Exact match against canonical names
- Synonym lookup from @loop/health-data
- Longest-substring match as a fallback
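
The three strategies can be sketched as a cascade that returns on the first hit. Everything here is hypothetical: the `CANONICAL_NAMES` and `SYNONYMS` lists stand in for data the package takes from @loop/health-data, and "longest-substring match" is interpreted as choosing the longest known name or synonym that appears inside the input.

```typescript
type Entry = readonly [name: string, code: string];

// Stand-in data with lowercase keys; the package draws on @loop/health-data instead.
const CANONICAL_NAMES: readonly Entry[] = [
  ['total testosterone', 'testosterone-total'],
  ['thyroid stimulating hormone', 'tsh'],
  ['hemoglobin a1c', 'hba1c'],
];
const SYNONYMS: readonly Entry[] = [
  ['testosterone, total', 'testosterone-total'],
  ['tsh', 'tsh'],
  ['hba1c', 'hba1c'],
  ['a1c', 'hba1c'],
];

const canonicalLookup = new Map<string, string>(CANONICAL_NAMES);
const synonymLookup = new Map<string, string>(SYNONYMS);

// Case-fold and collapse whitespace so "Total  Testosterone" equals "total testosterone".
function normalizeName(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, ' ');
}

function matchBiomarkerCodeSketch(rawName: string): string | null {
  const name = normalizeName(rawName);
  if (name === '') return null;

  // 1. Exact match against canonical names.
  const exact = canonicalLookup.get(name);
  if (exact !== undefined) return exact;

  // 2. Synonym lookup.
  const synonym = synonymLookup.get(name);
  if (synonym !== undefined) return synonym;

  // 3. Fallback: the longest known name or synonym contained in the input.
  let best: Entry | null = null;
  for (const entry of [...CANONICAL_NAMES, ...SYNONYMS]) {
    if (name.includes(entry[0]) && (best === null || entry[0].length > best[0].length)) {
      best = entry;
    }
  }
  return best === null ? null : best[1];
}

matchBiomarkerCodeSketch('Total Testosterone');         // 'testosterone-total' (exact)
matchBiomarkerCodeSketch('TSH');                        // 'tsh' (synonym)
matchBiomarkerCodeSketch('Serum TSH (3rd generation)'); // 'tsh' (substring fallback)
matchBiomarkerCodeSketch('Unknown Test');               // null
```

Preferring the longest contained key limits false positives from short synonyms such as `a1c`, and keeping the substring pass last means it only runs when both exact strategies miss.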

`processRawBiomarkers(raw, sex?, normalizeUnits?)`

Runs the full processing pipeline (name matching, optional unit normalization, and status classification) over raw biomarker data:


import { processRawBiomarkers } from '@loop/biomarker-parser';
 
const result = processRawBiomarkers(
  [
    { name: 'Total Testosterone', value: 650, unit: 'ng/dL' },
    { name: 'TSH', value: 2.1, unit: 'mIU/L' },
  ],
  'male', // patient sex
  true    // normalizeUnits
);
 
// result.matched: ProcessedBiomarker[] (successfully matched)
// result.unmatched: RawBiomarker[] (could not be matched)
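
The matched/unmatched split, and the counts reported in `ParseResult.metadata`, can be pictured as a single partition pass. In this sketch `matchCode` is any name-to-code matcher (for example `matchBiomarkerCode`), the output record is trimmed to a few fields, unit normalization and status classification are omitted, and the assumption that `rawBiomarkerCount` equals `matchedCount + unmatchedCount` is ours rather than documented.

```typescript
interface RawBiomarker {
  name: string;
  value: number;
  unit: string;
}

// Trimmed-down record; the real ProcessedBiomarker also carries a display name, status, and ranges.
interface MatchedBiomarker extends RawBiomarker {
  code: string;
}

interface Partition {
  matched: MatchedBiomarker[];
  unmatched: RawBiomarker[];
  counts: { rawBiomarkerCount: number; matchedCount: number; unmatchedCount: number };
}

// Sketch: route each raw reading to matched or unmatched based on the matcher's verdict.
function partitionBiomarkers(
  raw: readonly RawBiomarker[],
  matchCode: (name: string) => string | null,
): Partition {
  const matched: MatchedBiomarker[] = [];
  const unmatched: RawBiomarker[] = [];
  for (const item of raw) {
    const code = matchCode(item.name);
    if (code === null) unmatched.push(item);
    else matched.push({ ...item, code });
  }
  return {
    matched,
    unmatched,
    counts: {
      rawBiomarkerCount: raw.length,
      matchedCount: matched.length,
      unmatchedCount: unmatched.length,
    },
  };
}
```

Keeping unmatched readings intact, rather than dropping them, lets a caller show them to the user or add new synonyms later.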

Error Handling

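Functions that can fail, such as `parseLabReport`, `parseFromText`, `normalizeValue`, and `convertFromStandard`, return `Result<T>` from @loop/core. Failures carry one of the following error codes, each with a matching factory function: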

| Error code | Factory function | Meaning |
| --- | --- | --- |
| `EXTRACTION_FAILED` | `extractionFailed()` | AI provider failed to extract biomarkers |
| `INVALID_PDF` | `invalidPdf()` | Input PDF is invalid or corrupt |
| `NORMALIZATION_FAILED` | `normalizationFailed()` | Unit conversion failed |
| `PROVIDER_ERROR` | `providerError()` | AI provider returned an error |
| `NO_BIOMARKERS_FOUND` | `noBiomarkersFound()` | No biomarkers extracted from input |
| `UNSUPPORTED_UNIT` | `unsupportedUnit()` | Unit conversion not supported |
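
Callers typically branch on `result.ok` and then on the error code. The sketch below uses a local stand-in for `Result<T>` and assumes the error object exposes its code as a `code` string; check the actual error type in @loop/core before relying on that shape. The split into retryable and non-retryable failures is an illustrative policy, not something the package prescribes.

```typescript
type ParserErrorCode =
  | 'EXTRACTION_FAILED'
  | 'INVALID_PDF'
  | 'NORMALIZATION_FAILED'
  | 'PROVIDER_ERROR'
  | 'NO_BIOMARKERS_FOUND'
  | 'UNSUPPORTED_UNIT';

// Local stand-in; the real Result<T> and error type come from @loop/core.
type Result<T> = { ok: true; value: T } | { ok: false; error: { code: ParserErrorCode; message: string } };

// Illustrative policy: which failures are worth retrying, and what to tell the user.
function describeFailure(code: ParserErrorCode): { retryable: boolean; userMessage: string } {
  switch (code) {
    case 'PROVIDER_ERROR':
    case 'EXTRACTION_FAILED':
      return { retryable: true, userMessage: 'The AI provider could not process the report. Try again.' };
    case 'INVALID_PDF':
      return { retryable: false, userMessage: 'The file is not a readable PDF.' };
    case 'NO_BIOMARKERS_FOUND':
      return { retryable: false, userMessage: 'No biomarkers were found in this document.' };
    case 'NORMALIZATION_FAILED':
    case 'UNSUPPORTED_UNIT':
      return { retryable: false, userMessage: 'A reported unit could not be converted.' };
  }
}

// Unwrap a successful result, or log a categorized failure and return null.
function handleResult<T>(result: Result<T>): T | null {
  if (result.ok) return result.value;
  const { retryable, userMessage } = describeFailure(result.error.code);
  console.error(`[${result.error.code}] ${userMessage}${retryable ? ' (retryable)' : ''}`);
  return null;
}
```

Because the switch covers every member of `ParserErrorCode`, TypeScript flags the function if a new code is added to the union without a matching case.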