Response Length Enforcement

New in 0.36.0

Response length enforcement is now configurable per client with four modes.

Talemate can control response length in two ways: by capping the token budget sent to the API (max_tokens) and by appending human-readable length instructions to the prompt. The Response Length Enforcement setting lets you choose which of these mechanisms to use.

How It Works

Many prompt templates already include response length instructions inline (e.g., "The length of your response must fit within 2 paragraphs"). However, some templates do not. When the selected mode includes instructions, Talemate automatically appends a response length instruction as a fallback for those templates, ensuring the model always receives length guidance.

The instruction is derived from the configured max token count for the current generation:

Token Budget	Instruction
≤ 32	Keep your response short and limited
≤ 64	1 - 3 sentences
≤ 128	1 paragraph
≤ 256	2 paragraphs
≤ 384	3 paragraphs
≤ 512	4 paragraphs
≤ 1024	6 paragraphs
≤ 1536	9 paragraphs
> 1536	Be as detailed and verbose as you need to be

Modes

Mode	Token limit	Instructions	Description
Limit tokens and send instructions	Yes	Yes	The default. Limits the API token budget and appends length instructions to prompts that don't already include them.
Limit tokens	Yes	No	Limits the API token budget but does not append any length instructions.
Send instructions	No	Yes	Appends length instructions but does not limit the API token budget, leaving it up to the API.
Uncapped	No	No	Neither limits tokens nor sends instructions.

Uncapped

Not recommended. Any generation length settings will be ignored when this is selected.

Configuration

The setting is found in the client settings under the General tab.

Open the client settings by clicking on a client in the sidebar
Find the Response Length Enforcement dropdown
Select the desired mode

This setting defaults to Limit tokens and send instructions.