@anvia/transformers

Local Transformers.js embedding models for Anvia.

@anvia/transformers runs Hugging Face embedding models locally using Transformers.js. After the one-time model download, embedding involves no API calls, no network latency, and no per-token costs. It supports a wide range of sentence transformer models from the Hugging Face Hub.

Install

pnpm add @anvia/transformers

@huggingface/transformers is a transitive dependency. Models are downloaded and cached on first use.

Quick Start

import { createTransformersEmbeddingModel } from "@anvia/transformers";

const model = await createTransformersEmbeddingModel();

const embeddings = await model.embedTexts([
  "Anvia is a TypeScript AI agent framework.",
  "Agents can use tools and maintain conversation history.",
]);

console.log(embeddings[0].vector.length); // 384 (for all-MiniLM-L6-v2)

The create call downloads and initializes the model on first run. Subsequent calls use the cached model.
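
Once you have embeddings, a common next step is to compare them. The sketch below is a plain cosine similarity function; it is not part of @anvia/transformers, and it assumes that each `vector` is a plain `number[]`, which the Quick Start suggests but does not state outright.

```typescript
// Hypothetical helper, not exported by @anvia/transformers.
// Returns the cosine similarity of two equal-length vectors, in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// With the Quick Start result, the call would look like:
// cosineSimilarity(embeddings[0].vector, embeddings[1].vector)
```

With `normalize: true` (the default), vectors should already have unit length, so the dot product alone would likely give the same value.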

Configuration

type TransformersEmbeddingModelOptions = {
  model?: string;           // Hugging Face model name (default: "Xenova/all-MiniLM-L6-v2")
  pooling?: "mean" | "cls"; // pooling strategy (default: "mean")
  normalize?: boolean;      // normalize vectors (default: true)
  maxBatchSize?: number;    // texts per batch (default: 16)
};

// Custom model with a larger batch size
const mpnetModel = await createTransformersEmbeddingModel({
  model: "Xenova/all-mpnet-base-v2",
  maxBatchSize: 32,
});

// CLS pooling instead of mean, without normalization
const clsModel = await createTransformersEmbeddingModel({
  pooling: "cls",
  normalize: false,
});
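
To make `maxBatchSize` concrete, here is a minimal sketch of how a list of texts might be split into batches of at most that size. The `chunk` helper is hypothetical and only illustrates the idea; the package's internal batching may differ.

```typescript
// Hypothetical helper: split items into consecutive batches of at most `size`.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError(`Batch size must be a positive integer, got ${size}`);
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 40 texts with the default batch size of 16
const sampleTexts = Array.from({ length: 40 }, (_, i) => `text ${i}`);
const batchSizes = chunk(sampleTexts, 16).map((batch) => batch.length);
console.log(batchSizes); // [ 16, 16, 8 ]
```

Larger batches generally mean fewer model invocations at the cost of more memory per call, so the right value depends on the model size and the runtime environment.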

Pooling Strategies

Strategy        Description
mean (default)  Average all token embeddings. Best for general-purpose similarity.
cls             Use the [CLS] token embedding. Some models are trained specifically for this.
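
The two strategies can be sketched on a small matrix of token embeddings (one row per token). This is a simplified illustration with hypothetical helper names, not the Transformers.js implementation: real mean pooling typically weights tokens by the attention mask so that padding is ignored, and the normalization shown here is assumed to be L2.

```typescript
// Hypothetical helpers illustrating the pooling options; rows are tokens, columns are dimensions.
function meanPool(tokens: number[][]): number[] {
  if (tokens.length === 0) throw new Error("No token embeddings to pool");
  const dims = tokens[0].length;
  const sum = new Array<number>(dims).fill(0);
  for (const token of tokens) {
    for (let d = 0; d < dims; d++) sum[d] += token[d];
  }
  return sum.map((value) => value / tokens.length);
}

// Assumes the [CLS] token is the first row, as in BERT-style models.
function clsPool(tokens: number[][]): number[] {
  if (tokens.length === 0) throw new Error("No token embeddings to pool");
  return [...tokens[0]];
}

// Scale a vector to unit length (assumed meaning of `normalize: true`).
function l2Normalize(vector: number[]): number[] {
  const norm = Math.sqrt(vector.reduce((acc, x) => acc + x * x, 0));
  return norm === 0 ? [...vector] : vector.map((x) => x / norm);
}

const tokenEmbeddings = [
  [1, 2],
  [3, 4],
  [5, 6],
];
console.log(meanPool(tokenEmbeddings)); // [ 3, 4 ]
console.log(clsPool(tokenEmbeddings)); // [ 1, 2 ]
console.log(l2Normalize(meanPool(tokenEmbeddings))); // [ 0.6, 0.8 ]
```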

Model Selection

The default Xenova/all-MiniLM-L6-v2 is a good general-purpose model (384 dimensions, fast). Alternatives for higher quality or more specific use cases:

  • Xenova/all-mpnet-base-v2 -- 768 dimensions, better accuracy
  • Xenova/multi-qa-MiniLM-L6-cos-v1 -- optimized for semantic search
  • Xenova/paraphrase-multilingual-MiniLM-L12-v2 -- multilingual support

Any model compatible with the feature-extraction pipeline from @huggingface/transformers will work.

FastEmbed vs Transformers.js

Aspect              @anvia/fastembed        @anvia/transformers
Runtime             ONNX Runtime            Transformers.js (ONNX in browser/Node)
Model library       FastEmbed curated list  Any HF feature-extraction model
Default model       BGE-small-en-v1.5       all-MiniLM-L6-v2
Browser support     No                      Yes
Batch size default  256                     16

Choose FastEmbed for a curated, optimized experience. Choose Transformers.js when you need specific Hugging Face models or browser compatibility.

Using with Vector Stores

import { createTransformersEmbeddingModel } from "@anvia/transformers";
import { QdrantVectorStore } from "@anvia/qdrant";

const model = await createTransformersEmbeddingModel();
const store = await QdrantVectorStore.connect({
  collectionName: "docs",
  vectorSize: 384, // must match the model's output dimensions (384 for all-MiniLM-L6-v2)
});

const index = store.index(model);
const results = await index.search({ query: "How do agents work?", topK: 5 });
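
Because `vectorSize` has to agree with the embedding model, it can help to check the dimensions once before indexing, especially after switching models (for example, to the 768-dimension all-mpnet-base-v2). The guard below is a hypothetical helper, not part of @anvia/transformers or @anvia/qdrant, and assumes vectors are plain `number[]` arrays.

```typescript
// Hypothetical guard: throws if any vector does not have the expected dimension.
function assertVectorSize(vectors: number[][], expected: number): void {
  vectors.forEach((vector, i) => {
    if (vector.length !== expected) {
      throw new Error(
        `Vector ${i} has ${vector.length} dimensions, but the store expects ${expected}`,
      );
    }
  });
}

// Stand-in vectors; in practice these would come from model.embedTexts(...).
const VECTOR_SIZE = 384;
const probeVectors = [new Array<number>(384).fill(0)];
assertVectorSize(probeVectors, VECTOR_SIZE); // passes silently
```

Running a check like this on a single probe text at startup should surface a mismatch early rather than at the first upsert or search.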

Error Handling

  • Throws if the model download fails (network issues on first run)
  • Throws if the embedding count does not match the input text count
  • Throws on invalid vector format from the Transformers.js runtime
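
The last two checks can be pictured with the sketch below. It is an illustration under assumptions only: the function name, error messages, and the exact definition of a valid vector (a non-empty array of finite numbers) are hypothetical and not taken from the package source.

```typescript
// Hypothetical validation mirroring the count and format checks described above.
function validateEmbeddings(texts: string[], vectors: unknown[]): number[][] {
  if (vectors.length !== texts.length) {
    throw new Error(`Expected ${texts.length} embeddings, got ${vectors.length}`);
  }
  return vectors.map((vector, i) => {
    const valid =
      Array.isArray(vector) &&
      vector.length > 0 &&
      vector.every((x) => typeof x === "number" && Number.isFinite(x));
    if (!valid) {
      throw new Error(`Invalid vector format at index ${i}`);
    }
    return vector as number[];
  });
}

const checked = validateEmbeddings(["a", "b"], [[0.1, 0.2], [0.3, 0.4]]);
console.log(checked.length); // 2
```

On the caller side, wrapping model creation and `embedTexts` calls in `try`/`catch` is the straightforward way to handle all three failure modes.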