@anvia/fastembed

Local FastEmbed embedding models for Anvia.

@anvia/fastembed runs embedding models locally on ONNX Runtime through the FastEmbed library. Apart from the one-time model download, embedding needs no API calls, adds no network latency, and has no per-token costs. Use it when you want embeddings computed entirely on your own machine or CI server.

Install

pnpm add @anvia/fastembed

The FastEmbed runtime (the fastembed package) is installed automatically as a dependency, so you do not need to add it yourself. Models are downloaded on first use and cached locally.

Quick Start

import { createFastEmbedEmbeddingModel } from "@anvia/fastembed";

const model = await createFastEmbedEmbeddingModel();

const embeddings = await model.embedTexts([
  "Anvia is a TypeScript AI agent framework.",
  "Agents can use tools and maintain conversation history.",
]);

console.log(embeddings[0].vector.length); // 384 with the default model, BAAI/bge-small-en-v1.5

createFastEmbedEmbeddingModel() downloads and initializes the model on first run. Later runs load it from the local cache instead of downloading it again.
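Because creating the model is the expensive step, it usually pays to create it once per process and share it. The sketch below assumes the object returned by createFastEmbedEmbeddingModel() is safe to reuse across calls; lazyAsync is a hypothetical helper, not part of @anvia/fastembed.

```typescript
// Hypothetical helper: memoize an async factory so that model download and
// initialization run at most once, even when several callers ask concurrently.
function lazyAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = factory().catch((err: unknown) => {
        // Forget the failed attempt so a later call can try again
        // (for example after a network error during the first download).
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}

// Usage with the real package (not executed here):
//   import { createFastEmbedEmbeddingModel } from "@anvia/fastembed";
//   export const getEmbeddingModel = lazyAsync(() => createFastEmbedEmbeddingModel());
//   const model = await getEmbeddingModel(); // every caller shares one instance
```

Resetting the cached promise on failure is a design choice: a failed download is not remembered forever, so the next call retries instead of returning the same rejection.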

Configuration

type FastEmbedEmbeddingModelOptions = {
  model?: FastEmbedEmbeddingModelName;  // model name (default: "BAAI/bge-small-en-v1.5")
  maxBatchSize?: number;                // texts per batch (default: 256)
  initOptions?: {
    executionProviders?: ExecutionProvider[];  // ONNX runtime providers
    maxLength?: number;                        // max token length
    cacheDir?: string;                         // model cache directory
    showDownloadProgress?: boolean;            // download progress bar
    modelName?: string;                        // custom model name
  };
};

import { createFastEmbedEmbeddingModel } from "@anvia/fastembed";

// Custom model, with smaller batches
const baseModel = await createFastEmbedEmbeddingModel({
  model: "BAAI/bge-base-en-v1.5",
  maxBatchSize: 128,
});

// Custom cache directory, with a download progress bar
const localModel = await createFastEmbedEmbeddingModel({
  initOptions: { cacheDir: "./models", showDownloadProgress: true },
});
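maxBatchSize caps how many texts are handed to the runtime at once. The sketch below illustrates that splitting under the assumption that texts are grouped into consecutive batches in input order; toBatches is a hypothetical helper, not the package's internal code. A smaller value generally lowers peak memory at the cost of more runtime calls.

```typescript
// Split items into consecutive batches of at most maxBatchSize,
// preserving input order so results can be concatenated back one-to-one.
function toBatches<T>(items: readonly T[], maxBatchSize = 256): T[][] {
  if (!Number.isInteger(maxBatchSize) || maxBatchSize < 1) {
    throw new RangeError(`maxBatchSize must be a positive integer, got ${maxBatchSize}`);
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += maxBatchSize) {
    batches.push(items.slice(start, start + maxBatchSize));
  }
  return batches;
}

const texts = Array.from({ length: 600 }, (_, i) => `document ${i}`);
console.log(toBatches(texts).map((b) => b.length)); // [ 256, 256, 88 ]
console.log(toBatches(texts, 128).map((b) => b.length)); // [ 128, 128, 128, 128, 88 ]
```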

Available Models

The default model is BAAI/bge-small-en-v1.5 (384 dimensions, fast).

Which other models are available depends on the installed fastembed version. Check the FastEmbed documentation for the full list of supported models.
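Switching models changes the vector size: BAAI/bge-small-en-v1.5 produces 384-dimensional vectors and BAAI/bge-base-en-v1.5 produces 768-dimensional ones. Vectors from different models are not comparable, so an existing index has to be rebuilt after a model change. A cheap guard is to check dimensions before writing to a store. This is a sketch; assertDimension and EXPECTED_DIMENSIONS are hypothetical, and the `vector` field is assumed from the Quick Start.

```typescript
// Shape assumed from the Quick Start: each embedding exposes a `vector` array.
interface Embedding {
  vector: number[];
}

// Throw if any embedding does not have the expected number of dimensions,
// e.g. after switching models without re-indexing stored documents.
function assertDimension(embeddings: readonly Embedding[], expected: number): void {
  embeddings.forEach((embedding, i) => {
    if (embedding.vector.length !== expected) {
      throw new Error(
        `Embedding ${i} has ${embedding.vector.length} dimensions; expected ${expected}`,
      );
    }
  });
}

// Output sizes of the two models mentioned in this README.
const EXPECTED_DIMENSIONS: Record<string, number> = {
  "BAAI/bge-small-en-v1.5": 384,
  "BAAI/bge-base-en-v1.5": 768,
};

// Passes: a 384-dimensional vector checked against the default model.
assertDimension([{ vector: Array(384).fill(0) }], EXPECTED_DIMENSIONS["BAAI/bge-small-en-v1.5"]);
```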

When to Use Local Embeddings

| Scenario                   | Recommendation                                  |
|----------------------------|-------------------------------------------------|
| Development and testing    | FastEmbed (no API key needed)                   |
| CI/CD pipelines            | FastEmbed (no network dependency)               |
| Privacy-sensitive data     | FastEmbed (data stays local)                    |
| High-throughput production | Provider embeddings (dedicated infrastructure)  |
| Large model quality        | Provider embeddings (bigger models available)   |
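The recommendations above can be encoded as a small decision function, for example to pick a backend at startup. This is a sketch under stated assumptions: DeploymentContext and chooseEmbeddingBackend are hypothetical names, not an Anvia API, and the precedence (privacy first, then environment) is one reasonable reading of the table.

```typescript
type EmbeddingBackend = "fastembed" | "provider";

// Hypothetical description of where the code is running.
interface DeploymentContext {
  environment: "development" | "ci" | "production";
  hasProviderApiKey: boolean;
  dataMustStayLocal: boolean;
}

function chooseEmbeddingBackend(ctx: DeploymentContext): EmbeddingBackend {
  // Data that must not leave the machine rules out hosted providers entirely.
  if (ctx.dataMustStayLocal) return "fastembed";
  // Development and CI: avoid network dependencies and API keys.
  if (ctx.environment !== "production") return "fastembed";
  // Production without credentials falls back to local embeddings.
  if (!ctx.hasProviderApiKey) return "fastembed";
  // High-throughput production or larger models: use a provider.
  return "provider";
}
```

In practice the context would be filled from environment variables, for instance treating `CI=true` as the "ci" environment.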

Using with Vector Stores

import { createFastEmbedEmbeddingModel } from "@anvia/fastembed";
import { ChromaVectorStore } from "@anvia/chroma";

const model = await createFastEmbedEmbeddingModel();
const store = await ChromaVectorStore.connect({ collectionName: "docs" });

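// Create a searchable index that uses this embedding model.
// Documents in the collection should be embedded with the same model (same vector size).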
const index = store.index(model);
const results = await index.search({ query: "How do agents work?", topK: 5 });
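Conceptually, a search like this embeds the query and returns the stored entries whose vectors are closest to it. The brute-force sketch below illustrates that idea with cosine similarity. It is not how Chroma implements search: Chroma uses an approximate nearest-neighbour index, and the distance metric depends on how the collection is configured. cosineSimilarity, topK, and StoredVector are hypothetical names.

```typescript
interface StoredVector {
  id: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-K: score every stored vector, sort by descending similarity.
function topK(query: readonly number[], items: readonly StoredVector[], k: number) {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const results = topK([1, 0], [
  { id: "a", vector: [0, 1] },
  { id: "b", vector: [1, 0.1] },
  { id: "c", vector: [-1, 0] },
], 2);
console.log(results.map((r) => r.id)); // [ 'b', 'a' ]
```

Brute force is linear in the number of stored vectors, which is fine for small test sets; a vector store's index exists to avoid that cost at scale.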

Error Handling

  • Throws if the model download fails, for example because of network problems on first run
  • Throws if the number of embeddings returned does not match the number of input texts
  • Throws if the FastEmbed runtime returns batches in an unexpected format
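Of these, only the download failure is likely to be transient. A minimal sketch, assuming you want to retry model creation with exponential backoff; withRetry is a hypothetical helper, not part of @anvia/fastembed.

```typescript
// Hypothetical helper: retry an async operation with exponential backoff,
// rethrowing the last error once all attempts are used up.
async function withRetry<T>(
  operation: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        // With the defaults: wait 500 ms after the first failure, 1000 ms after the second.
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Usage with the real package (not executed here):
//   const model = await withRetry(() => createFastEmbedEmbeddingModel());
```

Wrap only model creation this way. The count-mismatch and batch-format errors point to a runtime or version problem that retrying is unlikely to fix.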