DeepInfra
Use DeepInfra's OpenAI-compatible inference with Anvia.
DeepInfra exposes an OpenAI-compatible chat completions surface at https://api.deepinfra.com/v1/openai. In Anvia, configure OpenAIClient with that baseUrl, then pass DeepInfra model ids to completionModel(...). DeepInfra also hosts dedicated APIs for embeddings, image generation, speech, and reranking.
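For orientation, the following sketch shows the shape of a raw request against the OpenAI-compatible surface, independent of Anvia. It builds a chat completions request but does not send it. The `/chat/completions` path follows the OpenAI convention, and `endpoint` and `buildChatRequest` are hypothetical helpers introduced only for illustration.

```typescript
// Sketch only: builds an OpenAI-style chat completions request for DeepInfra
// without sending it. Not Anvia code; the path follows the OpenAI convention.
const DEEPINFRA_BASE_URL = "https://api.deepinfra.com/v1/openai";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: join a base URL and a path without a double slash.
function endpoint(baseUrl: string, path: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
}

// Hypothetical helper: assemble URL, bearer auth header, and JSON body.
function buildChatRequest(token: string, model: string, messages: ChatMessage[]): ChatRequest {
  return {
    url: endpoint(DEEPINFRA_BASE_URL, "chat/completions"),
    method: "POST",
    headers: {
      // DeepInfra API tokens are sent as bearer tokens.
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages }),
  };
}

// Placeholder token; real code would read process.env.DEEPINFRA_TOKEN.
const request = buildChatRequest("YOUR_DEEPINFRA_TOKEN", "deepseek-ai/DeepSeek-V3", [
  { role: "user", content: "Hello!" },
]);
console.log(request.method, request.url);
// POST https://api.deepinfra.com/v1/openai/chat/completions
```

To actually send it, the fields can be passed to the standard Fetch API, for example `await fetch(request.url, request)` in Node 18+ or a browser.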
Create the Client
```typescript
import { AgentBuilder } from "@anvia/core";
import { OpenAIClient } from "@anvia/openai";

// Point the OpenAI client at DeepInfra's OpenAI-compatible base URL.
const client = new OpenAIClient({
  baseUrl: "https://api.deepinfra.com/v1/openai",
  apiKey: process.env.DEEPINFRA_TOKEN,
});

// DeepInfra model ids are namespaced as <org>/<model>.
const model = client.completionModel("deepseek-ai/DeepSeek-V3");

const agent = new AgentBuilder("support", model)
  .instructions("Answer support questions clearly.")
  .build();

const response = await agent.prompt("Hello!").send();
console.log(response.output);
```

Setting `baseUrl` makes Anvia use the OpenAI-compatible chat completion adapter. DeepInfra model ids are namespaced as `<org>/<model>` and are not interchangeable with the upstream Hugging Face ids.
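The `<org>/<model>` convention is easy to get wrong when copying ids from elsewhere. A small guard like the one below can reject a bare model name before it reaches `completionModel(...)`. `parseModelId` is a hypothetical helper, not part of Anvia or DeepInfra, and it checks only the shape of the id, not whether DeepInfra actually serves that model.

```typescript
// Hypothetical helper (not part of Anvia): split a DeepInfra model id of the
// form "<org>/<model>" into its parts, rejecting ids without an org prefix.
interface ModelId {
  org: string;
  model: string;
}

function parseModelId(id: string): ModelId {
  const slash = id.indexOf("/");
  // Require a non-empty org before the first slash and a non-empty model after it.
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "<org>/<model>", got "${id}"`);
  }
  return { org: id.slice(0, slash), model: id.slice(slash + 1) };
}

console.log(parseModelId("deepseek-ai/DeepSeek-V3"));
// { org: 'deepseek-ai', model: 'DeepSeek-V3' }
```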
Get the Model List
DeepInfra's OpenAI-compatible surface exposes a models endpoint that returns the model ids available to your token. Because the client was created with `baseUrl`, `listModels()` calls the `/models` path under that base URL (https://api.deepinfra.com/v1/openai/models).
```typescript
const models = await client.listModels();

console.table(
  models.data.map((model) => ({
    id: model.id,
    name: model.name,
    contextLength: model.contextLength,
  })),
);
```

Use the `id` field directly with `completionModel(...)`.
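To catch a stale or mistyped id early, one option is to check it against the model list before building a model. The sketch below assumes the list has a `data` array whose entries carry an `id` field, as read in the `listModels()` example; `idsForOrg`, `requireModel`, and the sample data are hypothetical illustrations, not a real DeepInfra response.

```typescript
// Assumed shape: mirrors the fields read in the listModels() example;
// only `id` is relied on here.
interface ModelEntry {
  id: string;
  name?: string;
  contextLength?: number;
}

interface ModelList {
  data: ModelEntry[];
}

// Hypothetical helper: ids published under one org prefix, sorted for stable output.
function idsForOrg(list: ModelList, org: string): string[] {
  return list.data
    .map((m) => m.id)
    .filter((id) => id.startsWith(`${org}/`))
    .sort();
}

// Hypothetical helper: fail early if a hard-coded id is not available to this token.
function requireModel(list: ModelList, id: string): string {
  if (!list.data.some((m) => m.id === id)) {
    throw new Error(`Model "${id}" is not in the list returned for this token`);
  }
  return id;
}

// Illustrative data only, not a real DeepInfra response.
const sample: ModelList = {
  data: [
    { id: "deepseek-ai/DeepSeek-V3" },
    { id: "meta-llama/Meta-Llama-3.1-8B-Instruct" },
  ],
};

console.log(idsForOrg(sample, "deepseek-ai"));
console.log(requireModel(sample, "deepseek-ai/DeepSeek-V3"));
```

With a real list, `models` from `client.listModels()` would take the place of `sample`, assuming its entries expose `id` as that example shows; the checked id can then be passed to `completionModel(...)`.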
Notes
- DeepInfra API tokens are passed as bearer tokens in the `Authorization` header.
- Supported parameters include `model`, `messages`, `max_tokens`, `stream`, `temperature`, `top_p`, `stop`, `n`, `presence_penalty`, `frequency_penalty`, `response_format`, `tools`, `tool_choice`, `service_tier`, and `reasoning_effort`. Support for individual parameters can vary by model.
- `service_tier: "priority"` requests priority inference, with faster time-to-first-token and higher throughput during peak demand. Priority requests incur a 20% surcharge on top of the model's standard per-token price.
- The hard cap for output tokens is 16384 for most models. To extend a response beyond the cap, send a follow-up chat completion that includes the previous assistant content; DeepInfra returns `400` when the total context size is exceeded.
- DeepInfra hosts a wide open-model catalog, including DeepSeek, Llama, Qwen, Mistral, Mixtral, and many others. Browse the full list at deepinfra.com/models.
- DeepInfra also exposes dedicated APIs for embeddings, image generation, speech, and reranking that are not part of the OpenAI-compatible chat completions surface. Call DeepInfra's native endpoints for those; this page covers only the OpenAI-compatible chat completions endpoint.
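The request-related notes can be sketched as plain OpenAI-style request bodies. This is an illustration under stated assumptions, not Anvia's API: `buildBody` and `continuationBody` are hypothetical helpers, the 16384 constant applies to most models but not necessarily all, and how a given model treats a trailing assistant message when asked to continue can vary.

```typescript
// Sketch only: OpenAI-style request bodies for DeepInfra's chat completions
// endpoint. Field names come from the supported-parameter list.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatBody {
  model: string;
  messages: ChatMessage[];
  max_tokens: number;
  service_tier?: "priority";
}

// Documented cap for most models; verify per model (assumption).
const OUTPUT_TOKEN_CAP = 16384;

// Hypothetical helper: clamp max_tokens to the cap and opt into priority on request.
function buildBody(model: string, messages: ChatMessage[], maxTokens: number, priority = false): ChatBody {
  const body: ChatBody = { model, messages, max_tokens: Math.min(maxTokens, OUTPUT_TOKEN_CAP) };
  // Priority inference is billed with a surcharge, so it is opt-in here.
  if (priority) body.service_tier = "priority";
  return body;
}

// Hypothetical helper: to continue past the cap, resend the conversation with
// the truncated assistant output appended. The total context must still fit,
// or DeepInfra responds with HTTP 400.
function continuationBody(previous: ChatBody, partialAnswer: string): ChatBody {
  return {
    ...previous,
    messages: [...previous.messages, { role: "assistant", content: partialAnswer }],
  };
}

const first = buildBody("deepseek-ai/DeepSeek-V3", [{ role: "user", content: "Write a long report." }], 50000, true);
console.log(first.max_tokens); // 16384
const next = continuationBody(first, "...first part of the report...");
console.log(next.messages.length); // 2
```

Returning a new `messages` array in `continuationBody` keeps the original body unchanged, so the first request can be logged or retried as sent.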
For current DeepInfra API details, see the DeepInfra OpenAI chat completions docs and the DeepInfra model catalog.
