# @loop/embeddings-api — Vector Embedding Generation
A Hono-based microservice that generates vector embeddings from text using OpenAI’s embedding models. It provides API endpoints for semantic search, content recommendations, and RAG (Retrieval-Augmented Generation) pipelines.
## Purpose
The embeddings API powers semantic search and AI features across Loop applications:
### Core Functions
- Text Embedding: Convert text to vector representations using OpenAI
- Batch Processing: Efficient batch embedding generation
- Vector Storage: Store embeddings in Pinecone vector database
- Similarity Search: Find semantically similar content
- Content Indexing: Index CMS content, research papers, FAQs for AI retrieval
### Use Cases
- Semantic Search: Find relevant content based on meaning, not keywords
- Content Recommendations: Suggest related articles, protocols, research
- RAG Pipelines: Retrieve context for AI-powered chat and Q&A
- Duplicate Detection: Identify similar content in CMS
- Clustering: Group related content for analytics
Enables AI-powered search and recommendations without exposing OpenAI API keys to frontend apps.
## Architecture

### Route Structure

**API Routes** (`src/routes/`)

- `POST /api/v1/embeddings` — Generate embedding for text
- `POST /api/v1/embeddings/batch` — Batch embed multiple texts
- `POST /api/v1/search` — Semantic similarity search
- `POST /api/v1/index` — Index content with metadata
- `DELETE /api/v1/index/:id` — Remove indexed content
- `GET /api/v1/similar/:id` — Find similar content by ID
- `GET /api/health` — Health check endpoint
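Before a request reaches OpenAI, its body should be validated. As a minimal sketch, the `EmbedRequest` shape and `parseEmbedRequest` helper below are hypothetical stand-ins for the Zod schemas the service would actually pull from `@loop/shared`:

```typescript
// Hypothetical shape of the POST /api/v1/embeddings request body.
interface EmbedRequest {
  text: string;
}

// Minimal runtime validation; returns null on any malformed input.
function parseEmbedRequest(body: unknown): EmbedRequest | null {
  if (typeof body !== "object" || body === null) return null;
  const text = (body as Record<string, unknown>).text;
  if (typeof text !== "string" || text.trim().length === 0) return null;
  return { text };
}
```

A route handler would call this on the parsed JSON body and return a 400 response when it yields `null`.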
### Key Components
**Embedding Generator** (`src/services/embedding-generator.ts`)
- OpenAI API client wrapper
- Batch processing with rate limiting
- Error handling and retry logic
- Token counting and cost estimation
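The retry logic for transient failures can be sketched as exponential backoff around the OpenAI call. `backoffDelay` and `retryWithBackoff` are illustrative names, not the service's actual API:

```typescript
// Exponential backoff, capped: 500ms, 1s, 2s, ... up to capMs.
function backoffDelay(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry a task (e.g. one embeddings API call) on transient failure,
// waiting backoffDelay(attempt) between attempts.
async function retryWithBackoff<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
    }
  }
  throw lastError;
}
```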
**Vector Store** (`src/services/vector-store.ts`)
- Pinecone client wrapper
- Upsert and delete operations
- Similarity search with filtering
- Metadata management
**Content Indexer** (`src/services/content-indexer.ts`)
- Chunk long documents into embedding-sized segments
- Extract metadata from content
- Batch index documents
- Track indexing status
**Search Engine** (`src/services/search-engine.ts`)
- Query vector generation
- Hybrid search (vector + metadata filters)
- Result ranking and scoring
- Deduplication
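Deduplication and ranking reduce to keeping the best hit per document and sorting by score. A minimal sketch, where the `Hit` shape is an assumption rather than the codebase's actual type:

```typescript
// A search hit as returned from the vector store (hypothetical shape).
interface Hit {
  id: string;
  score: number; // similarity score, higher is more relevant
}

// Keep the best-scoring hit per id, then sort descending by score.
function dedupeAndRank(hits: Hit[]): Hit[] {
  const best = new Map<string, Hit>();
  for (const hit of hits) {
    const current = best.get(hit.id);
    if (!current || hit.score > current.score) best.set(hit.id, hit);
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}
```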
## Key Features
### Embedding Generation
- OpenAI Models: Uses `text-embedding-3-small` or `text-embedding-3-large`
- Batch Processing: Efficient batching for multiple texts
- Rate Limiting: Automatic rate limiting to stay within OpenAI quotas
- Error Handling: Retry logic for transient failures
- Cost Tracking: Log token usage for cost monitoring
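Batch processing means splitting inputs into groups no larger than `MAX_BATCH_SIZE` before each API call. A hypothetical `toBatches` helper:

```typescript
// Split items into batches no larger than maxBatchSize, e.g. before
// sending texts to the embeddings endpoint in groups.
function toBatches<T>(items: T[], maxBatchSize = 100): T[][] {
  if (maxBatchSize < 1) throw new Error("maxBatchSize must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += maxBatchSize) {
    batches.push(items.slice(i, i + maxBatchSize));
  }
  return batches;
}
```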
### Vector Storage
- Pinecone Integration: Store embeddings in Pinecone vector database
- Metadata Filtering: Filter search results by metadata
- Namespaces: Separate embeddings by content type or environment
- Upsert Operations: Update existing embeddings efficiently
- Bulk Deletion: Remove embeddings by namespace or metadata
### Semantic Search
- Cosine Similarity: Find semantically similar content
- Hybrid Search: Combine vector search with metadata filters
- Result Ranking: Score and rank results by relevance
- Deduplication: Remove duplicate results
- Pagination: Support for large result sets
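Cosine similarity is computed server-side by Pinecone, but the formula itself is simple enough to show directly for reference:

```typescript
// Cosine similarity between two embedding vectors:
// dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```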
### Content Indexing
- Chunking: Split long documents into embeddable chunks
- Metadata Extraction: Extract title, type, author, date, etc.
- Batch Indexing: Index multiple documents in parallel
- Status Tracking: Track indexing progress in database
- Error Recovery: Resume failed indexing jobs
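Chunking can be sketched as a fixed-size sliding window with overlap so context carries across chunk boundaries; the real indexer likely splits on token or sentence boundaries instead of raw characters, so treat `chunkText` as illustrative:

```typescript
// Split text into overlapping fixed-size chunks. Each chunk repeats the
// last `overlap` characters of the previous one to preserve context.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be < chunkSize");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```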
## Tech Stack
- Framework: Hono (lightweight web framework)
- Runtime: Node.js 20+ or Cloudflare Workers
- Database: Supabase PostgreSQL (indexing status tracking)
- Embeddings: OpenAI text-embedding-3-small/large
- Vector DB: Pinecone vector database
- Validation: Zod 3.x
- Auth: Clerk JWT verification for API access
## Package Dependencies

- `@loop/core` — Result type, error handling, logging, circuit breakers, rate limiting
- `@loop/shared` — Zod schemas, types, constants
- `@loop/database` — Supabase repositories for indexing status
- `@loop/ai` — OpenAI client, embedding utilities, token counting
- `@loop/hono` — Shared Hono middleware (auth, errors, CORS)
## Development
### Local Setup

```bash
# Install dependencies
pnpm install

# Set environment variables
cp apps/embeddings-api/.env.example apps/embeddings-api/.env

# Run development server
pnpm --filter @loop/embeddings-api dev

# Access service at http://localhost:3004
```

### Required Environment Variables
```bash
# Database
DATABASE_URL=postgresql://...
SUPABASE_URL=https://okjpxbiipeghfhwksoit.supabase.co
SUPABASE_SERVICE_KEY=eyJhbGc...

# Clerk Auth
CLERK_PUBLISHABLE_KEY=pk_...
CLERK_SECRET_KEY=sk_...

# OpenAI
OPENAI_API_KEY=sk-...

# Pinecone
PINECONE_API_KEY=...
PINECONE_ENVIRONMENT=us-west1-gcp
PINECONE_INDEX=loop-embeddings

# Configuration
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
MAX_BATCH_SIZE=100
```

### Testing Embeddings
```bash
# Generate single embedding
curl -X POST http://localhost:3004/api/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <clerk-token>" \
  -d '{"text": "What are the benefits of BPC-157?"}'

# Search for similar content
curl -X POST http://localhost:3004/api/v1/search \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <clerk-token>" \
  -d '{"query": "muscle recovery peptides", "limit": 10}'
```

### Commands
```bash
pnpm dev        # Start dev server (port 3004)
pnpm build      # Production build
pnpm start      # Start production server
pnpm typecheck  # Type check
pnpm lint       # Lint code
pnpm test       # Run tests
```

## Deployment
- Platform: Cloudflare Workers (production) or Vercel (fallback)
- Environments:
  - Production: `main` branch → embeddings.loop.health
  - Staging: `staging` branch → staging-embeddings.loop.health
  - Development: `develop` branch → dev-embeddings.loop.health
- Database: Shared Supabase project
- Vector DB: Pinecone production index
- Edge Runtime: Runs on Cloudflare’s global network
- Auto-Deploy: Pushing to a branch triggers deployment
### Environment-Specific Configuration
**Production:**

- Model: `text-embedding-3-small` (cost-optimized)
- Index: `loop-embeddings-production`
- Rate limit: 3000 requests/minute

**Staging:**

- Model: `text-embedding-3-small`
- Index: `loop-embeddings-staging`
- Rate limit: 1000 requests/minute

**Development:**

- Model: `text-embedding-3-small`
- Index: `loop-embeddings-dev`
- Rate limit: 100 requests/minute
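Per-environment request caps like those above map naturally onto a token bucket. The sketch below injects the clock for determinism; it is illustrative only, since the service gets its rate limiting from `@loop/core`:

```typescript
// Token-bucket rate limiter: allows bursts up to `capacity`, refilling
// at `refillPerMs` tokens per millisecond (e.g. 3000 / 60000 for
// 3000 requests/minute). The clock is injected for testability.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerMs: number,
    private now: () => number = Date.now,
  ) {
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Returns true and consumes a token if the request is allowed.
  tryAcquire(): boolean {
    const t = this.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + (t - this.lastRefill) * this.refillPerMs,
    );
    this.lastRefill = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```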
## Monitoring
- Health Check: `GET /api/health` returns service status
- Metrics: Embedding generation rate, search latency, error rate
- Cost Tracking: OpenAI API usage and costs
- Alerts: Failed embeddings, Pinecone errors, rate limit violations
- Logs: Structured logs via the `@loop/core` logger