Loaders

File and PDF ingestion helpers for retrieval preprocessing.

Import from @anvia/core/loaders.

Loaders are async iterables for ingestion pipelines. They read source material and, by default, yield a LoaderResult<T> for each item. Call .ignoreErrors() when a batch should skip unreadable files instead of yielding failed results.

Loaders are intentionally not exported from the root @anvia/core entry point because they depend on Node filesystem and PDF extraction packages.

LoaderResult

type LoaderResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: unknown };

type FileSource = { path: string } | { path: "<memory>"; bytes: Uint8Array };
type FileReadWithPath = { path: string; text: string };

type PdfSource = { path: string } | { path: "<memory>"; bytes: Uint8Array };
type PdfReadWithPath = { path: string; text: string };
type PdfPage = { pageNumber: number; text: string };
type PdfPageWithPath = { path: string; pageNumber: number; text: string };

Purpose: LoaderResult keeps batch ingestion from throwing on the first failed item. The remaining types name the file and PDF records that each loader mode yields.

Return behavior: .ignoreErrors() unwraps successful values and filters out failures.
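The unwrap-and-filter behavior can be sketched without the library. The block below is a minimal illustration, not Anvia's implementation: skipFailures, fakeResults, and collect are hypothetical names introduced here, and only the LoaderResult<T> shape comes from this page.

```typescript
// Mirrors the LoaderResult<T> shape documented above.
type LoaderResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: unknown };

// Hypothetical helper (not part of @anvia/core): yields the value of each
// successful result and silently drops failed ones.
async function* skipFailures<T>(
  results: AsyncIterable<LoaderResult<T>>,
): AsyncGenerator<T> {
  for await (const result of results) {
    // Checking the `ok` discriminant narrows the union to the success branch.
    if (result.ok) yield result.value;
  }
}

// Stand-in source so the sketch runs without touching the filesystem.
async function* fakeResults(): AsyncGenerator<LoaderResult<string>> {
  yield { ok: true, value: "first file" };
  yield { ok: false, error: new Error("unreadable") };
  yield { ok: true, value: "second file" };
}

// Drains any async iterable into an array.
async function collect<T>(items: AsyncIterable<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const item of items) out.push(item);
  return out;
}
```

Without the filtering step, a consumer would check result.ok on every item and decide per item whether to log, retry, or skip the failure.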

FileLoader

FileLoader.withGlob(pattern);
FileLoader.withDir(directory);
FileLoader.fromBytes(bytes);
FileLoader.fromBytesMany(bytesArray);

loader.read();
loader.readWithPath();
loader.ignoreErrors();

Purpose: read UTF-8 text files from globs, directories, or memory.

Return behavior:

| Method | Output |
| --- | --- |
| .read() | string text |
| .readWithPath() | { path, text } |
| fromBytes(...) | uses "<memory>" as the path |

withDir(...) reads only the files directly inside the directory; subdirectories are ignored.
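For in-memory input, the { path, text } record and the "<memory>" path can be illustrated with a small stand-in. This is a sketch under assumptions: readBytesWithPath is a hypothetical function, not the library's API, and treating invalid UTF-8 as a failed result is an assumption made here for illustration.

```typescript
// Shapes copied from the LoaderResult section of this page.
type LoaderResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: unknown };
type FileReadWithPath = { path: string; text: string };

// In-memory sources have no real location, so "<memory>" is used as the path.
const MEMORY_PATH = "<memory>";

// Hypothetical stand-in for reading one in-memory source with its path.
function readBytesWithPath(bytes: Uint8Array): LoaderResult<FileReadWithPath> {
  try {
    // fatal: true makes invalid UTF-8 throw instead of inserting U+FFFD.
    // Assumption: the real loader may handle invalid bytes differently.
    const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return { ok: true, value: { path: MEMORY_PATH, text } };
  } catch (error) {
    return { ok: false, error };
  }
}

// One valid UTF-8 payload and one byte sequence that is never valid UTF-8.
const good = readBytesWithPath(new TextEncoder().encode("hello loaders"));
const bad = readBytesWithPath(new Uint8Array([0xff, 0xfe, 0xfd]));
```

Returning a result object rather than throwing lets a batch continue past one bad buffer, which is the same reason loaders yield LoaderResult<T> by default.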

PdfFileLoader

PdfFileLoader.withGlob(pattern);
PdfFileLoader.withDir(directory);
PdfFileLoader.fromBytes(bytes);
PdfFileLoader.fromBytesMany(bytesArray);

loader.read();
loader.readWithPath();
loader.byPage();
loader.ignoreErrors();

Purpose: extract text from PDF files or PDF bytes.

Return behavior:

| Method | Output |
| --- | --- |
| .read() | full PDF text |
| .readWithPath() | { path, text } |
| .byPage() | { pageNumber, text } with zero-based page numbers |
| .readWithPath().byPage() | { path, pageNumber, text } |

PDF page splitting is the only built-in loader chunking behavior.
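The page record shapes can be sketched as plain data transformations. The block below assumes per-page text has already been extracted; toPages and attachPath are hypothetical helpers introduced for illustration and do not perform any PDF parsing.

```typescript
// Shapes copied from the LoaderResult section of this page.
type PdfPage = { pageNumber: number; text: string };
type PdfPageWithPath = { path: string; pageNumber: number; text: string };

// Hypothetical helper: `pageTexts` stands in for text already extracted
// per page. Page numbers are zero-based, matching the table above.
function toPages(pageTexts: string[]): PdfPage[] {
  return pageTexts.map((text, index) => ({ pageNumber: index, text }));
}

// Hypothetical helper: adds the source path to every page record,
// producing the shape yielded by .readWithPath().byPage().
function attachPath(path: string, pages: PdfPage[]): PdfPageWithPath[] {
  return pages.map((page) => ({ path, ...page }));
}

// "report.pdf" is a made-up path used only for this example.
const pages = toPages(["Cover page", "Introduction", "Appendix"]);
const pagesWithPath = attachPath("report.pdf", pages);
```

Because numbering starts at 0, code that displays page numbers to readers would typically add 1 before rendering.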

Document Adapters

fileToDocument(file);
fileLoaderToDocuments(loader);

pdfToDocument(pdf);
pdfLoaderToDocuments(loader);

pdfPageToDocument(page);
pdfPageLoaderToDocuments(loader);

Purpose: convert successful loader outputs into Anvia Document[] for retrieval preprocessing.

Text documents include source and mediaType: "text/plain" metadata. PDF documents include source and mediaType: "application/pdf" metadata. PDF page documents also include a string pageNumber metadata value.
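The metadata described above can be illustrated with a stand-in adapter. This is a sketch only: SketchDocument is a hypothetical shape, not Anvia's Document type, and using the loader path as the source value is an assumption. The metadata keys and media type come from this page.

```typescript
// Shape copied from the LoaderResult section of this page.
type PdfPageWithPath = { path: string; pageNumber: number; text: string };

// Hypothetical document shape for illustration; the real Anvia Document
// type may differ.
type SketchDocument = { text: string; metadata: Record<string, string> };

// Hypothetical stand-in for a page-to-document adapter.
function sketchPdfPageToDocument(page: PdfPageWithPath): SketchDocument {
  return {
    text: page.text,
    metadata: {
      source: page.path, // assumption: source carries the loader path
      mediaType: "application/pdf",
      pageNumber: String(page.pageNumber), // stored as a string, per the text above
    },
  };
}

// "report.pdf" is a made-up path used only for this example.
const doc = sketchPdfPageToDocument({
  path: "report.pdf",
  pageNumber: 0,
  text: "Cover page",
});
```

Since pageNumber is a string in metadata, any numeric sort or range filter applied downstream would need to parse it first.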

For workflow guidance, see Embed Documents.