Biomarker Parser
@loop/biomarker-parser extracts structured biomarker data from lab report PDFs and images using AI providers (Anthropic Claude, OpenAI GPT-4). It handles the complete pipeline from raw document to normalized, classified biomarker readings.
Installation
pnpm add @loop/biomarker-parser
Peer dependencies: openai and/or @anthropic-ai/sdk (depending on which provider you use).
Quick Start
import { parseLabReport, createAnthropicProvider } from '@loop/biomarker-parser';
const provider = createAnthropicProvider({
apiKey: process.env.ANTHROPIC_API_KEY!,
});
const result = await parseLabReport(provider, {
type: 'pdf',
content: pdfBuffer, // Buffer or base64 string
}, {
patientSex: 'male',
});
if (result.ok) {
console.log(result.value.biomarkers);
// [{ code: 'testosterone-total', name: 'Total Testosterone', value: 650, unit: 'ng/dL', status: 'normal' }]
} else {
console.error(result.error);
}
API Reference
parseLabReport(provider, input, options?)
Main entry point for lab report parsing.
Parameters:
| Param | Type | Description |
|---|---|---|
| provider | AIProvider | AI provider instance (Anthropic or OpenAI) |
| input | ExtractionInput | Input document |
| options | ParserOptions | Optional configuration |
ExtractionInput:
type ExtractionInput =
| { type: 'pdf'; content: Buffer | string }
| { type: 'image'; content: Buffer | string; mimeType: string }
| { type: 'text'; content: string };
ParserOptions:
interface ParserOptions {
patientSex?: 'male' | 'female';
normalizeUnits?: boolean; // default: true
}
Returns: Promise<Result<ParseResult>>
interface ParseResult {
biomarkers: ProcessedBiomarker[];
metadata: {
provider: string;
labDate?: string;
patientName?: string;
rawBiomarkerCount: number;
matchedCount: number;
unmatchedCount: number;
};
}
interface ProcessedBiomarker {
code: string; // Canonical biomarker code
name: string; // Display name
value: number; // Numeric value
unit: string; // Standardized unit
originalUnit?: string; // Original unit before normalization
status: BiomarkerStatus;
referenceRange?: {
low: number;
high: number;
optimalLow?: number;
optimalHigh?: number;
};
}
type BiomarkerStatus =
| 'critical_low'
| 'low'
| 'optimal'
| 'normal' // within range but not optimal
| 'high'
| 'critical_high'
| 'unknown';
parseFromText(provider, text, options?)
Parse pre-extracted text (bypasses PDF/image processing).
const result = await parseFromText(provider, labReportText, {
patientSex: 'female',
});
AI Providers
Anthropic Provider
import { createAnthropicProvider } from '@loop/biomarker-parser';
const provider = createAnthropicProvider({
apiKey: process.env.ANTHROPIC_API_KEY!,
model: 'claude-sonnet-4-20250514', // optional, default varies
});
Supports: text and image inputs. Does not support direct PDF input (PDF must be converted to images first).
OpenAI Provider
import { createOpenAIProvider } from '@loop/biomarker-parser';
const provider = createOpenAIProvider({
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-4o', // optional
});
Supports: PDF, image, and text inputs.
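The capability notes for the two providers can be captured in a small lookup. The following is a minimal sketch, not part of the package: `InputKind`, `ProviderName`, `SUPPORTED_INPUTS`, and `providersFor` are hypothetical names introduced here, and the table only restates the "Supports" lines above.

```typescript
// Input kinds taken from the ExtractionInput union documented above.
type InputKind = 'pdf' | 'image' | 'text';

// Hypothetical provider labels; the package may not expose anything like this.
type ProviderName = 'anthropic' | 'openai';

// Direct input support, as stated in each provider's "Supports" note.
const SUPPORTED_INPUTS: Record<ProviderName, readonly InputKind[]> = {
  anthropic: ['image', 'text'],
  openai: ['pdf', 'image', 'text'],
};

// Returns the providers that accept the given input kind without conversion.
function providersFor(kind: InputKind): ProviderName[] {
  return (Object.keys(SUPPORTED_INPUTS) as ProviderName[]).filter((name) =>
    SUPPORTED_INPUTS[name].includes(kind),
  );
}

console.log(providersFor('pdf'));   // [ 'openai' ]
console.log(providersFor('image')); // [ 'anthropic', 'openai' ]
```

A check like this could run before a document is handed to a provider, so that a PDF destined for the Anthropic provider is converted to images first.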
Unit Normalization
The parser normalizes biomarker values to standard units using a conversion table.
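To illustrate what a factor-based conversion table might look like, here is a minimal sketch. It is an assumption about the approach, not the package's implementation: `CONVERSIONS`, `toStandard`, and `fromStandard` are hypothetical names, the `Result` type is a local stand-in for the one in @loop/core, and the only factor used (28.818 for nmol/L to ng/dL) is the one shown in the normalizeValue example.

```typescript
// Local stand-in for the Result shape used in this README's examples.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

interface UnitTable {
  standardUnit: string;
  // Multiply a value in the keyed unit by the factor to get the standard unit.
  factors: Record<string, number>;
}

// Hypothetical table with a single entry.
const CONVERSIONS: Record<string, UnitTable> = {
  'testosterone-total': {
    standardUnit: 'ng/dL',
    factors: { 'ng/dL': 1, 'nmol/L': 28.818 },
  },
};

// Looks up the table entry and factor for a biomarker/unit pair, if any.
function lookup(code: string, unit: string): { table: UnitTable; factor: number } | null {
  const table: UnitTable | undefined = CONVERSIONS[code];
  const factor: number | undefined = table?.factors[unit];
  return table !== undefined && factor !== undefined ? { table, factor } : null;
}

// Source unit -> standard unit (multiply by the factor).
function toStandard(code: string, value: number, unit: string): Result<{ value: number; unit: string }> {
  const hit = lookup(code, unit);
  if (hit === null) return { ok: false, error: 'UNSUPPORTED_UNIT' };
  return { ok: true, value: { value: value * hit.factor, unit: hit.table.standardUnit } };
}

// Standard unit -> target unit (divide by the same factor).
function fromStandard(code: string, value: number, targetUnit: string): Result<{ value: number; unit: string }> {
  const hit = lookup(code, targetUnit);
  if (hit === null) return { ok: false, error: 'UNSUPPORTED_UNIT' };
  return { ok: true, value: { value: value / hit.factor, unit: targetUnit } };
}

console.log(toStandard('testosterone-total', 22.5, 'nmol/L'));
console.log(fromStandard('testosterone-total', 650, 'nmol/L'));
```

Storing one factor per unit keeps the forward and reverse conversions consistent, since the reverse direction divides by the same number.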
normalizeValue(biomarkerCode, value, unit)
import { normalizeValue } from '@loop/biomarker-parser';
const result = normalizeValue('testosterone-total', 22.5, 'nmol/L');
// { ok: true, value: { value: 648.405, unit: 'ng/dL', conversionFactor: 28.818 } }
convertFromStandard(biomarkerCode, value, targetUnit)
Reverse conversion from standard unit:
import { convertFromStandard } from '@loop/biomarker-parser';
const result = convertFromStandard('testosterone-total', 650, 'nmol/L');
// { ok: true, value: { value: 22.55, unit: 'nmol/L' } }
canonicalizeUnit(unit)
Normalizes unit strings to canonical form:
import { canonicalizeUnit } from '@loop/biomarker-parser';
canonicalizeUnit('ng/dl'); // 'ng/dL'
canonicalizeUnit('mIU/ml'); // 'mIU/mL'
canonicalizeUnit('µg/L'); // 'mcg/L'
Biomarker Matching
matchBiomarkerCode(name)
Maps raw biomarker names from lab reports to canonical codes using synonym resolution:
import { matchBiomarkerCode } from '@loop/biomarker-parser';
matchBiomarkerCode('Total Testosterone'); // 'testosterone-total'
matchBiomarkerCode('TSH'); // 'tsh'
matchBiomarkerCode('Hemoglobin A1c'); // 'hba1c'
matchBiomarkerCode('Unknown Test'); // null
The matching uses:
- Exact match against canonical names
- Synonym lookup from @loop/health-data
- Longest-substring match as fallback
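The three matching stages listed above can be sketched as follows. This is one possible reading, not the package's code: `CATALOG`, `normalizeName`, and `matchCode` are hypothetical, the synonym entries are illustrative stand-ins for the data in @loop/health-data, and "longest-substring" is interpreted here as "the longest known name contained in the input".

```typescript
interface CatalogEntry {
  code: string;       // canonical biomarker code
  name: string;       // canonical display name
  synonyms: string[]; // alternative names seen on lab reports
}

// Hypothetical catalog; the real synonym data lives in @loop/health-data.
const CATALOG: CatalogEntry[] = [
  { code: 'testosterone-total', name: 'Total Testosterone', synonyms: ['Testosterone, Total'] },
  { code: 'tsh', name: 'TSH', synonyms: ['Thyroid Stimulating Hormone', 'Thyrotropin'] },
  { code: 'hba1c', name: 'Hemoglobin A1c', synonyms: ['HbA1c', 'Glycated Hemoglobin'] },
];

// Case-insensitive comparison with collapsed whitespace.
const normalizeName = (s: string): string => s.trim().toLowerCase().replace(/\s+/g, ' ');

function matchCode(rawName: string): string | null {
  const key = normalizeName(rawName);

  // 1. Exact match against canonical names.
  for (const entry of CATALOG) {
    if (normalizeName(entry.name) === key) return entry.code;
  }

  // 2. Exact match against synonyms.
  for (const entry of CATALOG) {
    if (entry.synonyms.some((s) => normalizeName(s) === key)) return entry.code;
  }

  // 3. Fallback: the longest known name or synonym contained in the input.
  let best: { code: string; length: number } | null = null;
  for (const entry of CATALOG) {
    for (const alias of [entry.name, ...entry.synonyms]) {
      const a = normalizeName(alias);
      if (key.includes(a) && (best === null || a.length > best.length)) {
        best = { code: entry.code, length: a.length };
      }
    }
  }
  return best === null ? null : best.code;
}

console.log(matchCode('Total Testosterone'));               // 'testosterone-total'
console.log(matchCode('Serum Total Testosterone (LC/MS)')); // 'testosterone-total'
console.log(matchCode('Unknown Test'));                     // null
```

Preferring the longest contained name reduces the chance that a short alias wins over a more specific one when both appear in the raw label.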
processRawBiomarkers(raw, sex?, normalizeUnits?)
Full processing pipeline for raw biomarker data:
import { processRawBiomarkers } from '@loop/biomarker-parser';
const result = processRawBiomarkers(
[
{ name: 'Total Testosterone', value: 650, unit: 'ng/dL' },
{ name: 'TSH', value: 2.1, unit: 'mIU/L' },
],
'male',
true
);
// result.matched: ProcessedBiomarker[] (successfully matched)
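// result.unmatched: RawBiomarker[] (could not be matched)
Error Handling
The parsing and unit-conversion functions return Result<T> from @loop/core; helpers such as canonicalizeUnit and matchBiomarkerCode return plain values, as in the examples above. Errors use specific error codes: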
| Error Code | Function | Description |
|---|---|---|
| EXTRACTION_FAILED | extractionFailed() | AI provider failed to extract biomarkers |
| INVALID_PDF | invalidPdf() | Input PDF is invalid or corrupt |
| NORMALIZATION_FAILED | normalizationFailed() | Unit conversion failed |
| PROVIDER_ERROR | providerError() | AI provider returned an error |
| NO_BIOMARKERS_FOUND | noBiomarkersFound() | No biomarkers extracted from input |
| UNSUPPORTED_UNIT | unsupportedUnit() | Unit conversion not supported |
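
As an illustration of branching on these codes, here is a minimal sketch. The error object's shape is an assumption: this README only shows `result.error`, so the `code` and `message` fields, the local `Result` type, and the `summarize` helper are all hypothetical. The descriptions are copied from the table above.

```typescript
// Error codes listed in the table above.
type ParserErrorCode =
  | 'EXTRACTION_FAILED'
  | 'INVALID_PDF'
  | 'NORMALIZATION_FAILED'
  | 'PROVIDER_ERROR'
  | 'NO_BIOMARKERS_FOUND'
  | 'UNSUPPORTED_UNIT';

// Assumed error shape; the real type comes from @loop/core and may differ.
interface ParserError {
  code: ParserErrorCode;
  message: string;
}

// Local stand-in for Result<T>, matching the ok/value/error fields used in Quick Start.
type Result<T> = { ok: true; value: T } | { ok: false; error: ParserError };

// Human-readable descriptions, taken from the error table.
const DESCRIPTIONS: Record<ParserErrorCode, string> = {
  EXTRACTION_FAILED: 'AI provider failed to extract biomarkers',
  INVALID_PDF: 'Input PDF is invalid or corrupt',
  NORMALIZATION_FAILED: 'Unit conversion failed',
  PROVIDER_ERROR: 'AI provider returned an error',
  NO_BIOMARKERS_FOUND: 'No biomarkers extracted from input',
  UNSUPPORTED_UNIT: 'Unit conversion not supported',
};

// Turns any result into a short log line.
function summarize<T>(result: Result<T>): string {
  if (result.ok) return 'ok';
  return `${result.error.code}: ${DESCRIPTIONS[result.error.code]}`;
}

console.log(summarize({ ok: false, error: { code: 'INVALID_PDF', message: 'bad header' } }));
// INVALID_PDF: Input PDF is invalid or corrupt
```

Typing the description map as `Record<ParserErrorCode, string>` makes the compiler flag any code that is added to the union without a matching description.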