Loaders
Read local text files and PDFs before embedding documents.
Loaders are ingestion helpers for retrieval preprocessing. Use them when your source material starts as local files, directories, globs, bytes, or PDFs and you need to turn that material into Anvia Document[] values before calling embedDocuments(...).
Import loaders from @anvia/core/loaders, not from the root @anvia/core entry point. The loader subpath is separate because it depends on Node filesystem and PDF extraction packages.
1. Load Text Files
Use FileLoader for UTF-8 text files such as Markdown, plain text, exported docs, or generated knowledge files.
import { FileLoader, fileLoaderToDocuments } from "@anvia/core/loaders";
const documents = await fileLoaderToDocuments(
  FileLoader.withGlob("content/**/*.md").readWithPath().ignoreErrors(),
);
readWithPath() keeps the source path, and fileLoaderToDocuments(...) stores that path as the document id plus source metadata.
2. Load a Directory
const documents = await fileLoaderToDocuments(
  FileLoader.withDir("content/articles").readWithPath().ignoreErrors(),
);
withDir(...) reads only the files directly inside the directory; it does not descend into subdirectories. Use withGlob(...) when you need recursive matching.
3. Load Bytes
const bytes = new TextEncoder().encode("Password reset links expire after 30 minutes.");
const documents = await fileLoaderToDocuments(
  FileLoader.fromBytes(bytes).readWithPath().ignoreErrors(),
);
Byte loaders are useful when files come from an upload, object store, or another runtime source instead of a local path.
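Because FileLoader targets UTF-8 text, it can help to confirm that bytes arriving at runtime actually decode as UTF-8 before handing them to a loader. The following is a minimal sketch using the standard TextDecoder in fatal mode. decodeUtf8OrNull is a hypothetical helper name, not part of @anvia/core, and this page does not say whether Anvia performs a similar check internally.

```typescript
// Hypothetical helper (not part of @anvia/core): returns the decoded text,
// or null when the bytes are not valid UTF-8. TextDecoder in fatal mode
// throws on malformed input instead of substituting U+FFFD.
function decodeUtf8OrNull(bytes: Uint8Array): string | null {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return null;
  }
}

// Valid UTF-8 produced by TextEncoder.
const valid = new TextEncoder().encode(
  "Password reset links expire after 30 minutes.",
);

// 0xFF can never appear in well-formed UTF-8.
const invalid = new Uint8Array([0xff, 0xfe, 0xfd]);

const validText = decodeUtf8OrNull(valid);
const invalidText = decodeUtf8OrNull(invalid);
```

A check like this lets an upload handler reject binary payloads early rather than embedding garbled text.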
4. Load PDFs
Use PdfFileLoader when source material is a PDF.
import { PdfFileLoader, pdfLoaderToDocuments } from "@anvia/core/loaders";
const documents = await pdfLoaderToDocuments(
  PdfFileLoader.withGlob("manuals/**/*.pdf").readWithPath().ignoreErrors(),
);
This creates one document per PDF with the extracted text and mediaType: "application/pdf" metadata.
5. Split PDFs by Page
import { PdfFileLoader, pdfPageLoaderToDocuments } from "@anvia/core/loaders";
const pages = await pdfPageLoaderToDocuments(
  PdfFileLoader.withGlob("manuals/**/*.pdf").readWithPath().byPage().ignoreErrors(),
);
Use page splitting when a whole PDF is too broad for retrieval or when page-level source metadata matters. Page documents include source, mediaType, and pageNumber metadata.
6. Handle Batch Errors
Loader methods yield LoaderResult<T> values by default so one unreadable file does not have to fail the whole batch.
for await (const result of FileLoader.withGlob("content/**/*.md").readWithPath()) {
  if (result.ok) {
    console.log(result.value.path);
  } else {
    console.error(result.error);
  }
}
Call .ignoreErrors() when your ingestion job should skip failed files and continue with successful records.
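To make the result pattern concrete, here is a self-contained sketch of how a result-yielding stream and an error-skipping wrapper could behave. The Result type, skipErrors, collect, and fakeBatch are hypothetical stand-ins written for illustration. They are not Anvia's LoaderResult<T> or ignoreErrors() implementation, whose exact shapes may differ.

```typescript
// Hypothetical stand-in for a loader result; the real LoaderResult<T>
// may have a different shape.
type Result<T> = { ok: true; value: T } | { ok: false; error: unknown };

// Yields only the successful values and silently drops failures.
async function* skipErrors<T>(
  results: AsyncIterable<Result<T>>,
): AsyncGenerator<T> {
  for await (const result of results) {
    if (result.ok) yield result.value;
  }
}

// Collects an async iterable into an array.
async function collect<T>(items: AsyncIterable<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const item of items) out.push(item);
  return out;
}

// Simulated batch in which the second file fails to read.
async function* fakeBatch(): AsyncGenerator<Result<string>> {
  yield { ok: true, value: "intro.md" };
  yield { ok: false, error: new Error("EACCES: permission denied") };
  yield { ok: true, value: "faq.md" };
}
```

Calling collect(skipErrors(fakeBatch())) resolves to the two successful entries and leaves out the failed one. Iterating the raw results instead is the better choice when failures need to be logged or counted.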
7. Embed Loaded Documents
After loading, pass the documents to embedDocuments(...).
import { embedDocuments } from "@anvia/core";
// `embeddings` is an embedding model created elsewhere in your application;
// `documents` is the array returned by one of the loaders above.
const embedded = await embedDocuments(embeddings, documents, {
  id: (document) => document.id,
  content: (document) => document.text,
  metadata: (document) => document.additionalProps,
});
Loaders do ingestion only. For chunking beyond PDF pages, split text in application code before embedding or return multiple strings from the content(...) selector.
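Application-side chunking can be sketched as a small helper. This is a minimal illustration, not part of Anvia: chunkText and its maxChars parameter are hypothetical names. The helper packs paragraphs into chunks up to a character budget and hard-splits any paragraph that exceeds the budget on its own.

```typescript
// Hypothetical helper (not part of @anvia/core): packs paragraphs into
// chunks of at most `maxChars` characters. Paragraphs are separated by
// blank lines; an oversized paragraph is hard-split at the budget.
function chunkText(text: string, maxChars = 1000): string[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");

  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";

  for (const paragraph of paragraphs) {
    if (paragraph.length > maxChars) {
      // Flush what we have, then slice the long paragraph into pieces.
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < paragraph.length; i += maxChars) {
        chunks.push(paragraph.slice(i, i + maxChars));
      }
      continue;
    }

    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      // `current` is non-empty here because `paragraph` alone fits.
      chunks.push(current);
      current = paragraph;
    }
  }

  if (current) chunks.push(current);
  return chunks;
}
```

Assuming the content(...) selector accepts an array of strings as described above, a selector such as (document) => chunkText(document.text, 800) would be one way to wire this in. Character counts are only a rough proxy for model token limits, so the budget should be chosen conservatively.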
Related Reference
| Topic | Reference |
|---|---|
| Loader API | Loaders |
| Document embedding | Embed Documents |
