Retrieval

Loaders

Read local text files and PDFs before embedding documents.

Loaders are ingestion helpers for retrieval preprocessing. Use them when your source material starts as local files, directories, globs, bytes, or PDFs and you need to turn that material into Anvia Document[] values before calling embedDocuments(...).

Import loaders from @anvia/core/loaders, not from the root @anvia/core entry point. The loader subpath is separate because it depends on Node filesystem and PDF extraction packages.

1. Load Text Files

Use FileLoader for UTF-8 text files such as Markdown, plain text, exported docs, or generated knowledge files.

import { FileLoader, fileLoaderToDocuments } from "@anvia/core/loaders";

const documents = await fileLoaderToDocuments(
  FileLoader.withGlob("content/**/*.md").readWithPath().ignoreErrors(),
);

readWithPath() keeps the source path, and fileLoaderToDocuments(...) stores that path as the document id plus source metadata.
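To make that mapping concrete, the sketch below mirrors the behavior described above: a file record with a path and content becomes a document whose id is the path and whose metadata carries the same path as source. The `LoadedFile` and `SketchDocument` types and the `toDocument` helper are hypothetical stand-ins for illustration, not Anvia's actual types or implementation. The `text` and `additionalProps` field names are borrowed from the embedding step later on this page.

```typescript
// Hypothetical shapes for illustration only; the real Anvia types may differ.
interface LoadedFile {
  path: string;
  content: string;
}

interface SketchDocument {
  id: string;
  text: string;
  additionalProps: Record<string, unknown>;
}

// Mirror the described behavior: the path becomes both the id and the source metadata.
function toDocument(file: LoadedFile): SketchDocument {
  return {
    id: file.path,
    text: file.content,
    additionalProps: { source: file.path },
  };
}

const example = toDocument({
  path: "content/guides/reset-password.md",
  content: "Password reset links expire after 30 minutes.",
});
console.log(example.id);
```

Because the id is derived from the path, re-running ingestion over the same files should produce stable ids, which can help when you need to update or deduplicate stored documents.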

2. Load a Directory

const documents = await fileLoaderToDocuments(
  FileLoader.withDir("content/articles").readWithPath().ignoreErrors(),
);

withDir(...) reads only the files directly inside the directory and does not descend into subdirectories. Use withGlob(...) when you need recursive matching.
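The difference between a direct read and a recursive match can be illustrated on a plain list of paths, without touching the filesystem. The `isDirectChild` helper and the sample paths are hypothetical, and the recursive filter only approximates how most glob implementations treat a pattern such as content/articles/**/*.md; it is not how the loaders are implemented.

```typescript
// Hypothetical helper: true when `filePath` sits directly inside `dir`,
// meaning there is no further "/" after the directory prefix.
function isDirectChild(dir: string, filePath: string): boolean {
  const prefix = dir.endsWith("/") ? dir : dir + "/";
  if (!filePath.startsWith(prefix)) return false;
  const rest = filePath.slice(prefix.length);
  return rest.length > 0 && !rest.includes("/");
}

const files = [
  "content/articles/intro.md",
  "content/articles/2024/launch.md",
  "content/about.md",
];

// Roughly what a direct-files-only directory read would pick up.
const direct = files.filter((f) => isDirectChild("content/articles", f));

// Roughly what a recursive Markdown pattern under content/articles would match.
const recursive = files.filter(
  (f) => f.startsWith("content/articles/") && f.endsWith(".md"),
);

console.log(direct, recursive);
```

In this sample, the nested 2024/launch.md file appears only in the recursive list, which is the case to watch for when a directory read returns fewer documents than expected.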

3. Load Bytes

const bytes = new TextEncoder().encode("Password reset links expire after 30 minutes.");

const documents = await fileLoaderToDocuments(
  FileLoader.fromBytes(bytes).readWithPath().ignoreErrors(),
);

Byte loaders are useful when files come from an upload, object store, or another runtime source instead of a local path.

4. Load PDFs

Use PdfFileLoader when source material is a PDF.

import { PdfFileLoader, pdfLoaderToDocuments } from "@anvia/core/loaders";

const documents = await pdfLoaderToDocuments(
  PdfFileLoader.withGlob("manuals/**/*.pdf").readWithPath().ignoreErrors(),
);

This creates one document per PDF with the extracted text and mediaType: "application/pdf" metadata.

5. Split PDFs by Page

import { PdfFileLoader, pdfPageLoaderToDocuments } from "@anvia/core/loaders";

const pages = await pdfPageLoaderToDocuments(
  PdfFileLoader.withGlob("manuals/**/*.pdf").readWithPath().byPage().ignoreErrors(),
);

Use page splitting when a whole PDF is too broad for retrieval or when page-level source metadata matters. Page documents include source, mediaType, and pageNumber metadata.
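As one example of why page-level metadata matters, the sketch below turns the source and pageNumber metadata into a short citation string that could accompany a retrieved passage. The `PageDocument` shape, the `formatCitation` helper, and the sample id format are assumptions for illustration; only the metadata field names come from the description above.

```typescript
// Hypothetical page-document shape; metadata field names follow the list above.
interface PageDocument {
  id: string;
  text: string;
  additionalProps: { source: string; mediaType: string; pageNumber: number };
}

// Build a human-readable citation from page-level metadata.
function formatCitation(page: PageDocument): string {
  const { source, pageNumber } = page.additionalProps;
  return `${source}, p. ${pageNumber}`;
}

const page: PageDocument = {
  id: "manuals/router.pdf#page=3", // assumed id format, for illustration only
  text: "Hold the reset button for ten seconds.",
  additionalProps: {
    source: "manuals/router.pdf",
    mediaType: "application/pdf",
    pageNumber: 3,
  },
};

console.log(formatCitation(page)); // manuals/router.pdf, p. 3
```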

6. Handle Batch Errors

Loader methods yield LoaderResult<T> values by default so one unreadable file does not have to fail the whole batch.

for await (const result of FileLoader.withGlob("content/**/*.md").readWithPath()) {
  if (result.ok) {
    console.log(result.value.path);
  } else {
    console.error(result.error);
  }
}

Call .ignoreErrors() when your ingestion job should skip failed files and continue with successful records.
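If you want to skip failed files but still keep a record of what went wrong, a small accumulator called from inside the loop above is one option. The sketch below is self-contained: `SketchResult`, `IngestReport`, and `recordResult` are hypothetical names, and the result union only mirrors the ok, value, and error fields used in the loop, so the real LoaderResult<T> may differ.

```typescript
// Hypothetical mirror of a result union with the `ok`, `value`, and `error`
// fields used in the loop above; Anvia's real LoaderResult<T> may carry more.
type SketchResult<T> = { ok: true; value: T } | { ok: false; error: unknown };

interface IngestReport<T> {
  loaded: T[];
  failures: string[];
}

// Record one result: keep successful values, keep a message for each failure.
function recordResult<T>(report: IngestReport<T>, result: SketchResult<T>): void {
  if (result.ok === true) {
    report.loaded.push(result.value);
  } else {
    const error = result.error;
    report.failures.push(error instanceof Error ? error.message : String(error));
  }
}

const report: IngestReport<{ path: string }> = { loaded: [], failures: [] };
const sampleResults: SketchResult<{ path: string }>[] = [
  { ok: true, value: { path: "content/a.md" } },
  { ok: false, error: new Error("EACCES: permission denied") },
];
for (const result of sampleResults) {
  recordResult(report, result);
}
console.log(`${report.loaded.length} loaded, ${report.failures.length} failed`);
```

Compared with dropping errors entirely, this keeps the batch running while leaving enough information to report or retry the failed files afterwards.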

7. Embed Loaded Documents

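After loading, pass the documents to embedDocuments(...). The embeddings value in the snippet below is not created on this page; it is assumed to be configured elsewhere in your application.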

import { embedDocuments } from "@anvia/core";

const embedded = await embedDocuments(embeddings, documents, {
  id: (document) => document.id,
  content: (document) => document.text,
  metadata: (document) => document.additionalProps,
});

Loaders handle ingestion only; they do not chunk text. To chunk more finely than PDF pages, split the text in application code before embedding, or return multiple strings from the content(...) selector.
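A minimal sketch of application-side chunking follows. It splits text on blank lines and packs paragraphs into chunks up to a character budget. The `chunkByParagraph` helper and its `maxChars` parameter are hypothetical and not part of Anvia; character counts are only a rough proxy for token limits, so treat the budget as something to tune.

```typescript
// Hypothetical helper: split text on blank lines, then pack paragraphs into
// chunks of at most `maxChars` characters. A single paragraph longer than
// `maxChars` is kept whole rather than cut mid-sentence.
function chunkByParagraph(text: string, maxChars: number): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (candidate.length <= maxChars || current === "") {
      // Still within budget, or this is the first paragraph of a new chunk.
      current = candidate;
    } else {
      // Budget exceeded: close the current chunk and start a new one.
      chunks.push(current);
      current = paragraph;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const sample =
  "Reset links expire after 30 minutes.\n\n" +
  "Request a new link from the sign-in page.\n\n" +
  "Contact support if the email never arrives.";
const chunks = chunkByParagraph(sample, 100);
console.log(chunks.length);
```

One way to wire this in, assuming the content(...) selector accepts an array of strings as described above, is `content: (document) => chunkByParagraph(document.text, 1000)`.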

| Topic | Reference |
| --- | --- |
| Loader API | Loaders |
| Document embedding | Embed Documents |