Retrieval
Embed Documents
Prepare and embed text documents for retrieval.
Use embedDocuments(...) during preprocessing. This step should usually run before any user request arrives: in a build step, an admin action, a startup task, or a background ingestion job.
1. Prepare Documents
const documents = [
  {
    id: "password-reset",
    title: "Password reset policy",
    body: "Password reset links expire after 30 minutes.",
    product: "support",
  },
  {
    id: "priority-support",
    title: "Priority support",
    body: "Enterprise customers receive priority support.",
    product: "support",
  },
];

Normalize or chunk your source text before embedding it.
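As a rough illustration of the normalize-and-chunk step, the sketch below collapses whitespace and splits text on word boundaries. The helpers `normalizeText` and `chunkText` and the character limit are hypothetical and are not part of @anvia/core; a real pipeline might chunk by tokens, sentences, or headings instead.

```typescript
// Hypothetical helpers for illustration; not part of @anvia/core.

// Collapse runs of whitespace (including newlines) and trim both ends.
function normalizeText(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Greedily pack words into chunks of up to `maxChars` characters.
// A single word longer than `maxChars` is kept whole as its own chunk.
function chunkText(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  const words = normalizeText(text)
    .split(" ")
    .filter((word) => word.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const word of words) {
    if (current.length === 0) {
      current = word;
    } else if (current.length + 1 + word.length <= maxChars) {
      current += " " + word;
    } else {
      chunks.push(current);
      current = word;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

const bodyChunks = chunkText("Password reset links expire after 30 minutes.", 20);
```

The resulting strings could then be stored on each document (for example as a `sections` array) before the documents are passed to the embedding step.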
2. Embed With Selectors
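import { embedDocuments } from "@anvia/core";

// `embeddings` is assumed to be an embeddings provider configured elsewhere.
const embedded = await embedDocuments(embeddings, documents, {
  // Stable identifier for each document.
  id: (doc) => doc.id,
  // Text that gets embedded.
  content: (doc) => `${doc.title}\n${doc.body}`,
  // Extra fields stored alongside the embedding.
  metadata: (doc) => ({
    product: doc.product,
    title: doc.title,
  }),
});

The content selector chooses the text to embed. The original document is still kept on the embedded result.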
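To make the selector idea concrete, here is a simplified model of how id, content, and metadata selectors could turn a document into embedding inputs while keeping the original document alongside them. This is only a sketch: `Selectors`, `PreparedInput`, and `prepareInputs` are hypothetical names, and the actual result shape returned by embedDocuments may differ.

```typescript
// Illustrative model only; none of these names come from @anvia/core.
type Selectors<T> = {
  id: (doc: T) => string;
  content: (doc: T) => string | string[];
  metadata?: (doc: T) => Record<string, unknown>;
};

type PreparedInput<T> = {
  id: string;
  texts: string[]; // one entry per text that would be embedded
  metadata: Record<string, unknown>;
  document: T; // the original document, kept untouched
};

function prepareInputs<T>(docs: T[], selectors: Selectors<T>): PreparedInput<T>[] {
  return docs.map((doc) => {
    const content = selectors.content(doc);
    return {
      id: selectors.id(doc),
      // A single string becomes a one-element list.
      texts: Array.isArray(content) ? content : [content],
      metadata: selectors.metadata ? selectors.metadata(doc) : {},
      document: doc,
    };
  });
}

const prepared = prepareInputs(
  [
    {
      id: "password-reset",
      title: "Password reset policy",
      body: "Password reset links expire after 30 minutes.",
      product: "support",
    },
  ],
  {
    id: (doc) => doc.id,
    content: (doc) => `${doc.title}\n${doc.body}`,
    metadata: (doc) => ({ product: doc.product, title: doc.title }),
  },
);
```

In this model only `texts` would be sent to the embedding provider; `metadata` and `document` travel with the result so they can be read back at search time.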
3. Embed Multiple Chunks per Document
Return an array from content(...) when one document should have multiple embeddings.
// Each article is expected to have a slug, a product, and a sections array.
const embedded = await embedDocuments(embeddings, articles, {
  id: (article) => article.slug,
  // One embedding per section.
  content: (article) => article.sections.map((section) => section.text),
  metadata: (article) => ({ product: article.product }),
});

During search, several chunks from the same document can match a single query.
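Because several chunks can point back to one document, search results often need to be collapsed to one entry per document. The sketch below keeps the best-scoring chunk for each document. The `ChunkHit` shape and the assumption that a higher score means a better match are hypothetical; check the actual search result type before adapting it.

```typescript
// Hypothetical shape of a chunk-level search hit; the real result type may differ.
type ChunkHit = { documentId: string; chunkIndex: number; score: number };

// Keep the highest-scoring chunk per document, sorted by score descending.
// Assumes a higher score means a closer match.
function bestHitPerDocument(hits: ChunkHit[]): ChunkHit[] {
  const best = new Map<string, ChunkHit>();
  for (const hit of hits) {
    const existing = best.get(hit.documentId);
    if (existing === undefined || hit.score > existing.score) {
      best.set(hit.documentId, hit);
    }
  }
  return Array.from(best.values()).sort((a, b) => b.score - a.score);
}

// Made-up sample hits: two chunks of "billing-guide" match the same query.
const hits: ChunkHit[] = [
  { documentId: "billing-guide", chunkIndex: 0, score: 0.71 },
  { documentId: "refund-policy", chunkIndex: 2, score: 0.64 },
  { documentId: "billing-guide", chunkIndex: 3, score: 0.88 },
];
const deduped = bestHitPerDocument(hits);
```

Here `deduped` contains one entry per document, with the stronger of the two "billing-guide" chunks listed first.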
4. Control Ingestion Concurrency
const embedded = await embedDocuments(embeddings, documents, {
  id: (doc) => doc.id,
  content: (doc) => doc.body,
  concurrency: 2,
});

Start low and increase concurrency only when your embedding provider and infrastructure can handle it.
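For intuition about what a concurrency limit does, the following sketch runs an async worker over a list with at most `limit` calls in flight at once. `mapWithConcurrency` is a hypothetical helper written for this illustration; it is not how @anvia/core necessarily implements the concurrency option.

```typescript
// Illustrative only; not the library's implementation.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new Error("limit must be at least 1");
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none remain.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  // Start up to `limit` runners; results keep the input order.
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => run());
  await Promise.all(runners);
  return results;
}
```

With a limit of 2, a slow or rate-limited provider never sees more than two requests at a time, which is why a low starting value is the safer default.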
