Module

ai.llm

LLM — Typed async LLM access.

Provider string in, typed results out. Streaming-native.

The LLM class wraps provider-specific HTTP calls behind a unified interface. Both generate() and stream() support text and structured (dataclass) output modes.

Free-threading safety (see the concurrency sketch after this list):

  • LLM instances are effectively immutable after construction
  • httpx.AsyncClient is created per-request (no shared mutable state)
  • ProviderConfig is a frozen dataclass
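
Because instances hold no shared mutable state, one LLM can be shared across concurrent tasks without locks. A minimal sketch of that usage (the prompts are placeholders)::

import asyncio

from ai.llm import LLM

async def main() -> None:
    llm = LLM("anthropic:claude-sonnet-4-20250514")
    # One immutable instance fanned out across tasks; each call creates
    # its own httpx.AsyncClient, so no coordination is needed.
    results = await asyncio.gather(
        llm.generate("Summarize topic A"),
        llm.generate("Summarize topic B"),
        llm.generate("Summarize topic C"),
    )
    for text in results:
        print(text)

asyncio.run(main())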

Classes

LLM

Typed async LLM access.

Usage::

llm = LLM("anthropic:claude-sonnet-4-20250514")

# Text generation
text = await llm.generate("Explain quantum computing")

# Text streaming
async for token in llm.stream("Analyze this:"):
    print(token, end="")

# Structured output (frozen dataclass)
@dataclass(frozen=True, slots=True)
class Summary:
    title: str
    key_points: list[str]
    sentiment: str

summary = await llm.generate(Summary, prompt="Summarize: ...")

Provider string format: provider:model (a parsing sketch follows the list below)

  • **anthropic**: claude-sonnet-4-20250514
  • **openai**: gpt-4o
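
A sketch of how a provider string splits into its two parts (str.partition is standard Python; the error message is illustrative, not the module's)::

provider_string = "anthropic:claude-sonnet-4-20250514"
provider, sep, model = provider_string.partition(":")
if not sep or not model:
    raise ValueError(f"Expected 'provider:model', got {provider_string!r}")
assert provider == "anthropic"
assert model == "claude-sonnet-4-20250514"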

Methods

provider → str
The provider name (e.g., 'anthropic', 'openai').
property
def provider(self) -> str
Returns
str
model → str
The model name (e.g., 'claude-sonnet-4-20250514').
property
def model(self) -> str
Returns
str
generate (text overload) → str
async
async def generate(self, prompt: str, /, **kwargs: Any) -> str
Parameters
Name Type Description
prompt str The prompt text.
**kwargs Any Generation options forwarded to the implementation (system, max_tokens, temperature).
Returns
str
generate (structured overload) → T
async
async def generate(self, cls: type[T], /, *, prompt: str, **kwargs: Any) -> T
Parameters
Name Type Description
cls type[T] Dataclass type describing the desired structure.
prompt str The prompt text.
**kwargs Any Generation options forwarded to the implementation (system, max_tokens, temperature).
Returns
T
generate → Any
async
async def generate(self, prompt_or_cls: str | type, /, *, prompt: str | None = None, system: str | None = None, max_tokens: int | None = None, temperature: float | None = None) -> Any

Generate a complete LLM response.

Text mode — pass a prompt string, get a string back::

text = await llm.generate("Explain quantum computing")

Structured mode — pass a dataclass type + prompt, get a typed instance back::

summary = await llm.generate(Summary, prompt="Summarize: ...")

The LLM is instructed to return JSON matching the dataclass schema. The response is parsed and mapped to a frozen dataclass instance.
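
The module's own parser isn't reproduced here; the sketch below shows the general shape of that mapping, assuming the response body is a JSON object keyed by field name (parse_into is a hypothetical stand-in, not this module's API)::

import json
from dataclasses import fields

def parse_into(cls: type, raw: str):
    # Hypothetical stand-in for the module's real parser: map a JSON
    # object onto a (frozen) dataclass by field name.
    data = json.loads(raw)
    return cls(**{f.name: data[f.name] for f in fields(cls)})

summary = parse_into(Summary, '{"title": "T", "key_points": ["a"], "sentiment": "positive"}')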

Parameters
Name Type Description
prompt_or_cls str | type A prompt string (text mode) or a dataclass type (structured mode).
prompt str | None Prompt text when prompt_or_cls is a dataclass type. Default: None
system str | None Optional system prompt. Default: None
max_tokens int | None Per-call override of the instance default. Default: None
temperature float | None Per-call override of the instance default. Default: None
Returns
Any
stream (text overload) → AsyncIterator[str]
def stream(self, prompt: str, /, **kwargs: Any) -> AsyncIterator[str]
Parameters
Name Type Description
prompt str The prompt text.
**kwargs Any Generation options forwarded to the implementation (system, max_tokens, temperature).
Returns
AsyncIterator[str]
stream (structured overload) → AsyncIterator[str]
def stream(self, cls: type[T], /, *, prompt: str, **kwargs: Any) -> AsyncIterator[str]
Parameters
Name Type Description
cls type[T] Dataclass type describing the desired structure.
prompt str The prompt text.
**kwargs Any Generation options forwarded to the implementation (system, max_tokens, temperature).
Returns
AsyncIterator[str]
stream → AsyncIterator[str]
async
async def stream(self, prompt_or_cls: str | type, /, *, prompt: str | None = None, system: str | None = None, max_tokens: int | None = None, temperature: float | None = None) -> AsyncIterator[str]

Stream LLM response tokens incrementally.

Text mode — yields string tokens::

async for token in llm.stream("Analyze this:"):
    print(token, end="")

Structured mode — streams tokens (for display) while building toward a structured result. Caller accumulates tokens for parsing.

Both modes yield str tokens. For structured output, accumulate the full text and parse with parse_structured() after streaming completes.
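
Putting that together (parse_structured is named above, but its exact signature isn't shown in this section; the call below assumes it takes the target type and the accumulated text)::

chunks: list[str] = []
async for token in llm.stream(Summary, prompt="Summarize: ..."):
    print(token, end="")   # live display while tokens arrive
    chunks.append(token)
summary = parse_structured(Summary, "".join(chunks))  # assumed signature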

Parameters
Name Type Description
prompt_or_cls str | type A prompt string (text mode) or a dataclass type (structured mode).
prompt str | None Prompt text when prompt_or_cls is a dataclass type. Default: None
system str | None Optional system prompt. Default: None
max_tokens int | None Per-call override of the instance default. Default: None
temperature float | None Per-call override of the instance default. Default: None
Returns
AsyncIterator[str]
Internal Methods
__init__
def __init__(self, provider: str, /, *, api_key: str | None = None, max_tokens: int = 4096, temperature: float = 0.0) -> None
Parameters
Name Type Description
provider str Provider string in provider:model format.
api_key str | None Optional API key. Default: None
max_tokens int Default maximum tokens per response. Default: 4096
temperature float Default sampling temperature. Default: 0.0
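
A construction sketch (the environment variable name is an assumption, not something this module mandates)::

import os

from ai.llm import LLM

llm = LLM(
    "openai:gpt-4o",
    api_key=os.environ.get("OPENAI_API_KEY"),  # assumed env var name
    max_tokens=1024,
    temperature=0.7,
)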
_generate_raw → str
Dispatch to provider-specific generation.
async
async def _generate_raw(self, messages: list[dict[str, str]], *, system: str | None, max_tokens: int, temperature: float) -> str
Parameters
Name Type Description
messages list[dict[str, str]] Chat messages to send to the provider.
system str | None Optional system prompt.
max_tokens int Maximum tokens to generate.
temperature float Sampling temperature.
Returns
str
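
A sketch of the dispatch shape (the per-provider helpers here are hypothetical; real endpoint and payload details live in the module itself)::

async def _generate_raw(self, messages, *, system, max_tokens, temperature) -> str:
    # Hypothetical dispatch on the parsed provider name.
    if self.provider == "anthropic":
        return await self._anthropic_generate(
            messages, system=system, max_tokens=max_tokens, temperature=temperature
        )
    if self.provider == "openai":
        return await self._openai_generate(
            messages, system=system, max_tokens=max_tokens, temperature=temperature
        )
    raise ValueError(f"Unsupported provider: {self.provider}")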
_stream_raw → AsyncIterator[str]
Dispatch to provider-specific streaming.
async
async def _stream_raw(self, messages: list[dict[str, str]], *, system: str | None, max_tokens: int, temperature: float) -> AsyncIterator[str]
Parameters
Name Type Description
messages list[dict[str, str]] Chat messages to send to the provider.
system str | None Optional system prompt.
max_tokens int Maximum tokens to generate.
temperature float Sampling temperature.
Returns
AsyncIterator[str]