Embed Documents

Prepare and embed text documents for retrieval.

Use embedDocuments(...) during preprocessing. Run this step ahead of user requests rather than while serving them, typically in a build step, admin action, startup task, or background ingestion job.

1. Prepare Documents

const documents = [
  {
    id: "password-reset",
    title: "Password reset policy",
    body: "Password reset links expire after 30 minutes.",
    product: "support",
  },
  {
    id: "priority-support",
    title: "Priority support",
    body: "Enterprise customers receive priority support.",
    product: "support",
  },
];

Normalize or chunk your source text before embedding it.
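As one way to picture that preparation step, here is a minimal sketch that collapses whitespace and splits text into size-limited chunks on word boundaries. The helpers normalizeText and chunkText are hypothetical names introduced for illustration and are not part of @anvia/core; a character limit is only one possible chunking strategy.

```typescript
// Hypothetical helpers for illustration; not part of @anvia/core.

/** Collapse runs of whitespace into single spaces and trim both ends. */
function normalizeText(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

/**
 * Split text into chunks of at most maxChars characters, breaking only
 * between words. A single word longer than maxChars becomes its own chunk.
 */
function chunkText(text: string, maxChars: number): string[] {
  const words = normalizeText(text)
    .split(" ")
    .filter((word) => word.length > 0);
  const chunks: string[] = [];
  let current = "";

  for (const word of words) {
    const candidate = current.length === 0 ? word : `${current} ${word}`;
    if (candidate.length > maxChars && current.length > 0) {
      // Adding this word would overflow the chunk, so start a new one.
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

The resulting array of strings could then be stored on each document (for example as sections) and returned from a content selector.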

2. Embed With Selectors

import { embedDocuments } from "@anvia/core";

// `embeddings` is an embedding model instance created elsewhere (not shown here).
const embedded = await embedDocuments(embeddings, documents, {
  id: (doc) => doc.id,
  content: (doc) => `${doc.title}\n${doc.body}`,
  metadata: (doc) => ({
    product: doc.product,
    title: doc.title,
  }),
});

The content selector chooses the text to embed. The embedded result still keeps the original document.

3. Embed Multiple Chunks per Document

Return an array from content(...) when one document should have multiple embeddings.

// `articles` is assumed to be an array of { slug, product, sections: [{ text }] }.
const embedded = await embedDocuments(embeddings, articles, {
  id: (article) => article.slug,
  content: (article) => article.sections.map((section) => section.text),
  metadata: (article) => ({ product: article.product }),
});

During search, multiple chunks from the same document can match a single query, so one document may appear more than once in the results.
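If you want one result per document, you can collapse chunk-level matches after searching. The sketch below assumes a hypothetical ChunkHit shape with a documentId and a score where higher means more similar; the actual search result type in your setup may differ, so treat the field names as placeholders.

```typescript
// Hypothetical hit shape for illustration; the real result type may differ.
interface ChunkHit {
  documentId: string;
  score: number; // assumed: higher means more similar
  text: string;
}

/** Keep only the highest-scoring chunk for each document, best first. */
function bestHitPerDocument(hits: ChunkHit[]): ChunkHit[] {
  const best = new Map<string, ChunkHit>();
  for (const hit of hits) {
    const existing = best.get(hit.documentId);
    if (existing === undefined || hit.score > existing.score) {
      best.set(hit.documentId, hit);
    }
  }
  // Sort the surviving hits so the strongest match comes first.
  return Array.from(best.values()).sort((a, b) => b.score - a.score);
}
```

Keeping the best chunk per document is one design choice; depending on the use case you might instead keep all matching chunks and group them under their document.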

4. Control Ingestion Concurrency

const embedded = await embedDocuments(embeddings, documents, {
  id: (doc) => doc.id,
  content: (doc) => doc.body,
  concurrency: 2,
});

Start low and increase concurrency only when your embedding provider and infrastructure can handle it.
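To make the effect of a concurrency limit concrete, here is a generic sketch of a bounded-concurrency map. It is not how embedDocuments is implemented; mapWithConcurrency is a hypothetical helper that only illustrates the idea that at most `limit` requests are in flight at once.

```typescript
/**
 * Hypothetical helper for illustration; not part of @anvia/core.
 * Maps items through an async worker with at most `limit` calls in flight.
 * Results keep the input order.
 */
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

With a limit of 2, a third request starts only after one of the first two finishes, which is why a low value is a safer starting point against provider rate limits.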