Embed Documents
Prepare and embed text documents for retrieval.
Use embedDocuments(...) during preprocessing. This step should usually run before user requests: in a build step, admin action, startup task, or background ingestion job.
If your source material starts as local files or PDFs, read Loaders first, then embed the loaded documents.
1. Prepare Documents
const documents = [
  {
    id: "password-reset",
    title: "Password reset policy",
    body: "Password reset links expire after 30 minutes.",
    product: "support",
  },
  {
    id: "priority-support",
    title: "Priority support",
    body: "Enterprise customers receive priority support.",
    product: "support",
  },
];

Normalize or chunk your source text before embedding it.
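As a rough illustration of the normalization step, the sketch below cleans up whitespace and line endings before text is embedded. `normalizeText` is a hypothetical application-side helper, not part of `@anvia/core`, and the specific rules (NFC, collapsing spaces, at most one blank line) are assumptions you would adapt to your own content.

```typescript
// Hypothetical helper (not an Anvia API): tidy raw text before embedding.
function normalizeText(raw: string): string {
  return raw
    .normalize("NFC") // canonical Unicode form
    .replace(/\r\n?/g, "\n") // unify line endings
    .replace(/[ \t]+/g, " ") // collapse runs of spaces and tabs
    .replace(/ ?\n ?/g, "\n") // drop spaces around line breaks
    .replace(/\n{3,}/g, "\n\n") // keep at most one blank line
    .trim();
}

const cleaned = normalizeText(
  "  Password reset\r\n\r\n\r\nlinks   expire\tafter 30 minutes.  ",
);
console.log(JSON.stringify(cleaned));
```

Running the helper on each document body before building the `documents` array keeps near-identical text from producing needlessly different embeddings.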
2. Load Local Files
Use @anvia/core/loaders when ingestion starts from local text files or PDFs. Loaders convert files, directories, globs, bytes, and PDFs into the Document[] shape that embedDocuments(...) expects.
import { FileLoader, fileLoaderToDocuments } from "@anvia/core/loaders";

const documents = await fileLoaderToDocuments(
  FileLoader.withGlob("content/**/*.md").readWithPath().ignoreErrors(),
);

PDF loaders support the same glob, directory, and byte inputs. Call .byPage() when each page should become its own retrieval document.
import { PdfFileLoader, pdfPageLoaderToDocuments } from "@anvia/core/loaders";

const pdfPages = await pdfPageLoaderToDocuments(
  PdfFileLoader.withGlob("manuals/**/*.pdf").readWithPath().byPage().ignoreErrors(),
);

Anvia loaders do ingestion only. For text chunking beyond PDF pages, preprocess text in application code before calling embedDocuments(...). For the complete loader workflow, see Loaders.
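One way to do that preprocessing is a small paragraph-based chunker. The sketch below is an assumption about how an application might split text, not something Anvia provides: `chunkByParagraph` and the `maxChars` limit are hypothetical names, and character counts stand in for whatever size measure your embedding model actually uses.

```typescript
// Hypothetical helper (not an Anvia API): group paragraphs into chunks of at
// most maxChars characters, hard-splitting any paragraph that is too long.
function chunkByParagraph(text: string, maxChars: number): string[] {
  if (!Number.isInteger(maxChars) || maxChars < 1) {
    throw new RangeError("maxChars must be a positive integer");
  }
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current === "" ? paragraph : `${current}\n\n${paragraph}`;
    if (candidate.length <= maxChars) {
      current = candidate; // still fits, keep accumulating
      continue;
    }
    if (current !== "") chunks.push(current);
    // Oversized paragraph: cut it at maxChars boundaries.
    let rest = paragraph;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current = rest;
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

const chunks = chunkByParagraph(
  "Alpha one.\n\nBeta two.\n\nGamma three is longer.",
  22,
);
console.log(chunks);
```

The resulting array can be stored as separate documents, or returned from a content(...) selector when one source document should produce several embeddings.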
3. Embed With Selectors
import { embedDocuments } from "@anvia/core/embeddings";

// `embeddings` is assumed to be an embedding model configured elsewhere.
const embedded = await embedDocuments(embeddings, documents, {
  id: (doc) => doc.id,
  content: (doc) => `${doc.title}\n${doc.body}`,
  metadata: (doc) => ({
    product: doc.product,
    title: doc.title,
  }),
});

The content selector chooses the text to embed. The original document is still kept on the embedded result.
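To see exactly what text a content selector would hand to the embedding model, it can help to run the selectors on a sample document first. The sketch below does that without calling the library at all; the `SupportDoc` interface and the `preview` object are illustrative assumptions that mirror the sample documents from step 1.

```typescript
// Assumed document shape, mirroring the sample documents in step 1.
interface SupportDoc {
  id: string;
  title: string;
  body: string;
  product: string;
}

// The same selector functions as in the call above, defined standalone.
const selectors = {
  id: (doc: SupportDoc) => doc.id,
  content: (doc: SupportDoc) => `${doc.title}\n${doc.body}`,
  metadata: (doc: SupportDoc) => ({ product: doc.product, title: doc.title }),
};

const sample: SupportDoc = {
  id: "password-reset",
  title: "Password reset policy",
  body: "Password reset links expire after 30 minutes.",
  product: "support",
};

// Apply each selector by hand to inspect what would be embedded and stored.
const preview = {
  id: selectors.id(sample),
  content: selectors.content(sample),
  metadata: selectors.metadata(sample),
};
console.log(preview);
```

Combining the title and body in the content selector means a query can match on either, while the metadata selector keeps only the fields you want available alongside the embedding.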
4. Embed Multiple Chunks per Document
Return an array from content(...) when one document should have multiple embeddings.
const embedded = await embedDocuments(embeddings, articles, {
  id: (article) => article.slug,
  content: (article) => article.sections.map((section) => section.text),
  metadata: (article) => ({ product: article.product }),
});

Multiple chunks can match the same document during search.
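Because several chunks can point back to one document, search results may need to be collapsed before they are shown. The sketch below keeps the best-scoring chunk per document. The `ChunkHit` shape, its field names, and the assumption that a higher score means a closer match are all hypothetical; adapt them to whatever your search call actually returns.

```typescript
// Hypothetical search hit shape (not an Anvia type).
interface ChunkHit {
  documentId: string;
  chunk: string;
  score: number; // assumed: higher means more similar
}

// Keep only the highest-scoring chunk for each document, best first.
function bestHitPerDocument(hits: ChunkHit[]): ChunkHit[] {
  const best = new Map<string, ChunkHit>();
  for (const hit of hits) {
    const existing = best.get(hit.documentId);
    if (existing === undefined || hit.score > existing.score) {
      best.set(hit.documentId, hit);
    }
  }
  return Array.from(best.values()).sort((a, b) => b.score - a.score);
}

const deduped = bestHitPerDocument([
  { documentId: "billing-guide", chunk: "Invoices are sent monthly.", score: 0.71 },
  { documentId: "refund-policy", chunk: "Refunds take five days.", score: 0.8 },
  { documentId: "billing-guide", chunk: "Update your card in settings.", score: 0.92 },
]);
console.log(deduped.map((hit) => hit.documentId));
```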
5. Control Ingestion Concurrency
const embedded = await embedDocuments(embeddings, documents, {
  id: (doc) => doc.id,
  content: (doc) => doc.body,
  concurrency: 2,
});

Start low and increase concurrency only when your embedding provider and infrastructure can handle it.
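To make the effect of a concurrency cap concrete, the sketch below shows a generic limiter that never runs more than `limit` tasks at once. It is an illustration of the general idea only and is not how embedDocuments(...) is implemented; `mapWithConcurrency` and the timer-based worker are hypothetical stand-ins for real embedding requests.

```typescript
// Generic illustration (not Anvia's implementation): run worker over items
// with at most `limit` calls in flight, preserving input order in the output.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until none remain.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}

// Demo: track how many fake requests overlap when the limit is 2.
let inFlight = 0;
let peak = 0;
const demo = mapWithConcurrency([1, 2, 3, 4, 5], 2, async (n) => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  inFlight--;
  return n * 10;
});
demo.then((out) => console.log(out, "peak in flight:", peak));
```

A higher limit shortens ingestion time only until the provider starts rate limiting or timing out, which is why raising it gradually while watching error rates is the safer approach.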
