
@loop/embeddings-api — Vector Embedding Generation

Hono-based microservice for generating vector embeddings from text using OpenAI’s embedding models. It provides endpoints for semantic search, content recommendations, and RAG (Retrieval-Augmented Generation) pipelines.

Purpose

The embeddings API powers semantic search and AI features across Loop applications:

Core Functions

  • Text Embedding: Convert text to vector representations using OpenAI
  • Batch Processing: Efficient batch embedding generation
  • Vector Storage: Store embeddings in Pinecone vector database
  • Similarity Search: Find semantically similar content
  • Content Indexing: Index CMS content, research papers, FAQs for AI retrieval

Use Cases

  • Semantic Search: Find relevant content based on meaning, not keywords
  • Content Recommendations: Suggest related articles, protocols, research
  • RAG Pipelines: Retrieve context for AI-powered chat and Q&A
  • Duplicate Detection: Identify similar content in CMS
  • Clustering: Group related content for analytics

Enables AI-powered search and recommendations without exposing OpenAI API keys to frontend apps.

Architecture

Route Structure

API Routes (src/routes/)

  • POST /api/v1/embeddings — Generate embedding for text
  • POST /api/v1/embeddings/batch — Batch embed multiple texts
  • POST /api/v1/search — Semantic similarity search
  • POST /api/v1/index — Index content with metadata
  • DELETE /api/v1/index/:id — Remove indexed content
  • GET /api/v1/similar/:id — Find similar content by ID
  • GET /api/health — Health check endpoint

Key Components

Embedding Generator (src/services/embedding-generator.ts)

  • OpenAI API client wrapper
  • Batch processing with rate limiting
  • Error handling and retry logic
  • Token counting and cost estimation
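
The retry logic for transient OpenAI failures (429s, 5xx) can be sketched as a generic backoff wrapper. The real service wires retries and circuit breakers through @loop/core; this standalone version only illustrates the pattern:

```typescript
// Retry an async operation with exponential backoff: 250ms, 500ms, 1000ms, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```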

Vector Store (src/services/vector-store.ts)

  • Pinecone client wrapper
  • Upsert and delete operations
  • Similarity search with filtering
  • Metadata management
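
Similarity in a cosine-metric Pinecone index is computed server-side; the service never computes it itself. For intuition, the score behind each match is the plain cosine similarity of two vectors:

```typescript
// Reference implementation of cosine similarity, the score a
// metric: "cosine" Pinecone index returns for each match.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```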

Content Indexer (src/services/content-indexer.ts)

  • Chunk long documents into embeddable segments
  • Extract metadata from content
  • Batch index documents
  • Track indexing status
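
Chunking can be sketched as a sliding window with overlap, so context is not lost at chunk boundaries. This version is character-based for simplicity; the real indexer counts tokens via @loop/ai, and the sizes below are illustrative:

```typescript
// Split text into overlapping chunks so each fits within the
// embedding model's input limit. `overlap` characters are repeated
// at the start of each subsequent chunk to preserve boundary context.
function chunkText(text: string, chunkSize = 1000, overlap = 100): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```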

Search Engine (src/services/search-engine.ts)

  • Query vector generation
  • Hybrid search (vector + metadata filters)
  • Result ranking and scoring
  • Deduplication
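
Deduplication and ranking can be sketched as follows: when several chunks of the same document match a query, keep only the highest-scoring chunk per document, then sort descending by score. The field names (`sourceId`, `score`) are assumptions, not the service's actual match shape:

```typescript
interface Match {
  id: string;
  sourceId: string; // parent document the chunk came from
  score: number;    // similarity score returned by the vector store
}

// Keep the best-scoring match per source document, ranked by score.
function dedupeAndRank(matches: Match[]): Match[] {
  const best = new Map<string, Match>();
  for (const m of matches) {
    const current = best.get(m.sourceId);
    if (!current || m.score > current.score) best.set(m.sourceId, m);
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}
```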

Key Features

Embedding Generation

  • OpenAI Models: Uses text-embedding-3-small or text-embedding-3-large
  • Batch Processing: Efficient batching for multiple texts
  • Rate Limiting: Automatic rate limiting to stay within OpenAI quotas
  • Error Handling: Retry logic for transient failures
  • Cost Tracking: Log token usage for cost monitoring
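
The batching step can be sketched as a pure helper that splits an arbitrary list of texts into API-sized groups (this service caps batches at MAX_BATCH_SIZE=100); the generator then issues one OpenAI call per group:

```typescript
// Split items into batches of at most `batchSize` for per-batch API calls.
function toBatches<T>(items: T[], batchSize = 100): T[][] {
  if (batchSize <= 0) throw new Error("batchSize must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```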

Vector Storage

  • Pinecone Integration: Store embeddings in Pinecone vector database
  • Metadata Filtering: Filter search results by metadata
  • Namespaces: Separate embeddings by content type or environment
  • Upsert Operations: Update existing embeddings efficiently
  • Bulk Deletion: Remove embeddings by namespace or metadata

Similarity Search

  • Cosine Similarity: Find semantically similar content
  • Hybrid Search: Combine vector search with metadata filters
  • Result Ranking: Score and rank results by relevance
  • Deduplication: Remove duplicate results
  • Pagination: Support for large result sets

Content Indexing

  • Chunking: Split long documents into embeddable chunks
  • Metadata Extraction: Extract title, type, author, date, etc.
  • Batch Indexing: Index multiple documents in parallel
  • Status Tracking: Track indexing progress in database
  • Error Recovery: Resume failed indexing jobs

Tech Stack

  • Framework: Hono (lightweight web framework)
  • Runtime: Node.js 20+ or Cloudflare Workers
  • Database: Supabase PostgreSQL (indexing status tracking)
  • Embeddings: OpenAI text-embedding-3-small/large
  • Vector DB: Pinecone vector database
  • Validation: Zod 3.x
  • Auth: Clerk JWT verification for API access

Package Dependencies

  • @loop/core — Result type, error handling, logging, circuit breakers, rate limiting
  • @loop/shared — Zod schemas, types, constants
  • @loop/database — Supabase repositories for indexing status
  • @loop/ai — OpenAI client, embedding utilities, token counting
  • @loop/hono — Shared Hono middleware (auth, errors, CORS)

Development

Local Setup

# Install dependencies
pnpm install

# Set environment variables
cp apps/embeddings-api/.env.example apps/embeddings-api/.env

# Run development server
pnpm --filter @loop/embeddings-api dev

# Access service at http://localhost:3004

Required Environment Variables

# Database
DATABASE_URL=postgresql://...
SUPABASE_URL=https://okjpxbiipeghfhwksoit.supabase.co
SUPABASE_SERVICE_KEY=eyJhbGc...

# Clerk Auth
CLERK_PUBLISHABLE_KEY=pk_...
CLERK_SECRET_KEY=sk_...

# OpenAI
OPENAI_API_KEY=sk-...

# Pinecone
PINECONE_API_KEY=...
PINECONE_ENVIRONMENT=us-west1-gcp
PINECONE_INDEX=loop-embeddings

# Configuration
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
MAX_BATCH_SIZE=100
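
Loading the configuration variables above could be sketched as below. The real service validates its environment with Zod; this dependency-free version only mirrors the defaults stated in this README, and the `EmbeddingConfig` shape is an assumption:

```typescript
interface EmbeddingConfig {
  model: string;
  dimensions: number;
  maxBatchSize: number;
}

// Parse and validate embedding settings from an env-style record,
// falling back to the defaults documented above.
function loadConfig(env: Record<string, string | undefined>): EmbeddingConfig {
  const dimensions = Number(env.EMBEDDING_DIMENSIONS ?? "1536");
  const maxBatchSize = Number(env.MAX_BATCH_SIZE ?? "100");
  if (!Number.isInteger(dimensions) || dimensions <= 0) {
    throw new Error("EMBEDDING_DIMENSIONS must be a positive integer");
  }
  if (!Number.isInteger(maxBatchSize) || maxBatchSize <= 0) {
    throw new Error("MAX_BATCH_SIZE must be a positive integer");
  }
  return {
    model: env.EMBEDDING_MODEL ?? "text-embedding-3-small",
    dimensions,
    maxBatchSize,
  };
}
```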

Testing Embeddings

# Generate single embedding
curl -X POST http://localhost:3004/api/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <clerk-token>" \
  -d '{"text": "What are the benefits of BPC-157?"}'

# Search for similar content
curl -X POST http://localhost:3004/api/v1/search \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <clerk-token>" \
  -d '{"query": "muscle recovery peptides", "limit": 10}'

Commands

pnpm dev        # Start dev server (port 3004)
pnpm build      # Production build
pnpm start      # Start production server
pnpm typecheck  # Type check
pnpm lint       # Lint code
pnpm test       # Run tests

Deployment

  • Platform: Cloudflare Workers (production) or Vercel (fallback)
  • Environments:
    • Production: main branch → embeddings.loop.health
    • Staging: staging branch → staging-embeddings.loop.health
    • Development: develop branch → dev-embeddings.loop.health
  • Database: Shared Supabase project
  • Vector DB: Pinecone production index
  • Edge Runtime: Runs on Cloudflare’s global network
  • Auto-Deploy: Push to branch triggers deployment

Environment-Specific Configuration

Production:

  • Model: text-embedding-3-small (cost-optimized)
  • Index: loop-embeddings-production
  • Rate limit: 3000 requests/minute

Staging:

  • Model: text-embedding-3-small
  • Index: loop-embeddings-staging
  • Rate limit: 1000 requests/minute

Development:

  • Model: text-embedding-3-small
  • Index: loop-embeddings-dev
  • Rate limit: 100 requests/minute

Monitoring

  • Health Check: GET /api/health returns service status
  • Metrics: Embedding generation rate, search latency, error rate
  • Cost Tracking: OpenAI API usage and costs
  • Alerts: Failed embeddings, Pinecone errors, rate limit violations
  • Logs: Structured logs via @loop/core logger
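
A health payload for GET /api/health might be assembled as below. The dependency names and response shape are assumptions for illustration, not the endpoint's actual contract:

```typescript
interface HealthStatus {
  status: "ok" | "degraded";
  uptimeSeconds: number;
  checks: Record<string, boolean>;
}

// Aggregate per-dependency check results (e.g. openai, pinecone, supabase)
// into a single service status: "ok" only if every check passed.
function buildHealthStatus(
  checks: Record<string, boolean>,
  uptimeSeconds: number,
): HealthStatus {
  const allHealthy = Object.values(checks).every(Boolean);
  return {
    status: allHealthy ? "ok" : "degraded",
    uptimeSeconds,
    checks,
  };
}
```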

See Also