Swapping the LLM Provider¶
FastSvelte ships OpenAI as the default LLM, but no route, service, or billing code ever imports a provider SDK directly — they depend only on a small LLMClient seam (see AI Integration). To switch providers you write one class implementing that interface and repoint the dependency-injection container at it. Nothing else in the backend changes.
The contract¶
backend/app/service/llm_client.py:
class LLMClient(Protocol):
async def structured(self, messages: list[dict], model: Type[T]) -> tuple[T, TokenUsage]: ...
def stream(self, messages: list[dict]) -> AsyncIterator[str | TokenUsage]: ...
structured()returns a parsed Pydantic model plus the call'sTokenUsage.stream()yields plain-text chunks and, as its final item, the call'sTokenUsage(the caller accumulates text for the UI and forwards that final usage to billing).
messages is an OpenAI-style list[dict] (roles system / user / assistant); each adapter is responsible for translating it into its own provider's request shape.
Claude (Anthropic) — a complete recipe¶
Add anthropic to the backend dependencies, then create backend/app/service/claude_client.py:
from typing import AsyncIterator, Type, TypeVar
from anthropic import AsyncAnthropic, NOT_GIVEN
from pydantic import BaseModel
from app.model.llm_usage_model import TokenUsage
from app.service.llm_client import LLMClient
T = TypeVar("T", bound=BaseModel)
PROVIDER = "anthropic"
def _split_system(messages: list[dict]) -> tuple[str | object, list[dict]]:
# Anthropic takes the system prompt as a top-level arg, not a role in `messages`.
system = next((m["content"] for m in messages if m["role"] == "system"), NOT_GIVEN)
chat = [m for m in messages if m["role"] != "system"]
return system, chat
class ClaudeClient(LLMClient):
def __init__(self, model: str, max_tokens: int = 4096, api_key: str | None = None):
self.client = AsyncAnthropic(api_key=api_key)
self.model = model
self.max_tokens = max_tokens
async def structured(self, messages: list[dict], model: Type[T]) -> tuple[T, TokenUsage]:
system, chat = _split_system(messages)
response = await self.client.messages.parse(
model=self.model,
max_tokens=self.max_tokens,
system=system,
messages=chat,
output_format=model, # Pydantic class -> response.parsed_output
)
return response.parsed_output, self._usage(response.usage)
async def stream(self, messages: list[dict]) -> AsyncIterator[str | TokenUsage]:
system, chat = _split_system(messages)
async with self.client.messages.stream(
model=self.model,
max_tokens=self.max_tokens,
system=system,
messages=chat,
) as stream:
async for text in stream.text_stream:
yield text
final = await stream.get_final_message()
yield self._usage(final.usage)
def _usage(self, usage) -> TokenUsage:
# Anthropic reports input/output separately; total is the sum.
return TokenUsage(
provider=PROVIDER,
model=self.model,
input_tokens=usage.input_tokens,
output_tokens=usage.output_tokens,
total_tokens=usage.input_tokens + usage.output_tokens,
)
The shape mirrors the shipped OpenAIClient: structured() uses the SDK's structured-output helper (messages.parse with a Pydantic output_format), and stream() yields text deltas then a final TokenUsage. The only provider-specific work is lifting the system role out of messages (Anthropic takes it as a top-level argument) and supplying max_tokens, which Anthropic requires.
Wire it in¶
Point the dependency-injection container at the new class. In backend/app/config/container.py, swap the openai_client provider for a claude_client and inject it wherever openai_client was wired (the llm_client=... argument):
claude_client = providers.Singleton(
ClaudeClient,
model=settings.anthropic_model, # e.g. "claude-opus-4-8"
api_key=settings.anthropic_api_key,
)
Add anthropic_api_key / anthropic_model to Settings (mirroring the existing OpenAI fields), which read FS_ANTHROPIC_API_KEY / FS_ANTHROPIC_MODEL from the environment.
Add the pricing row¶
Cost is computed from the model_price table, so add a row for the Claude model under provider anthropic or cost calculation fails — see AI Usage & Credit Billing. Current Claude pricing (per 1M tokens, input / output):
| Model | Input / 1M | Output / 1M |
|---|---|---|
claude-opus-4-8 |
$5.00 | $25.00 |
claude-sonnet-4-6 |
$3.00 | $15.00 |
claude-haiku-4-5 |
$1.00 | $5.00 |
Other providers¶
Gemini, LiteLLM, or any other provider follow the exact same shape: one class implementing structured() + stream(), mapping the provider's usage fields into TokenUsage, wired via container.py, with a matching model_price row. The copilot, billing, routes, and UI are untouched.
Next steps¶
- AI Integration — the
LLMClientseam, the sample copilot, and streaming. - AI Usage & Credit Billing — how every call is metered and billed.