Concurrent Requests (Experimental)
Concurrent requests is an experimental feature that allows supported LLM clients to process multiple requests simultaneously instead of one at a time.
What It Does
When enabled, operations that require multiple LLM queries (such as generating image prompts) will execute those queries in parallel instead of sequentially. This can significantly reduce the total time needed for these batch operations.
Currently, this feature is only used for visual prompt generation (creating prompts for image generation). It is not applied to regular conversation or narration tasks.
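To illustrate the difference, here is a minimal sketch of sequential versus parallel query dispatch. The `generate_image_prompt` coroutine is a hypothetical stand-in for a single LLM query (latency is simulated with a sleep), not this application's actual API.

```python
import asyncio
import time

async def generate_image_prompt(scene: str) -> str:
    # Hypothetical stand-in for one LLM query; in practice this would
    # be an HTTP call to the hosted API.
    await asyncio.sleep(1.0)  # simulate network + inference latency
    return f"prompt for {scene}"

async def sequential(scenes):
    # One request at a time: total time ~= latency * len(scenes)
    return [await generate_image_prompt(s) for s in scenes]

async def concurrent(scenes):
    # All requests in flight at once: total time ~= latency of one request
    return await asyncio.gather(*(generate_image_prompt(s) for s in scenes))

async def main():
    scenes = ["forest", "castle", "harbor"]

    start = time.perf_counter()
    await sequential(scenes)
    print(f"sequential: {time.perf_counter() - start:.1f}s")  # ~3.0s

    start = time.perf_counter()
    await concurrent(scenes)
    print(f"concurrent: {time.perf_counter() - start:.1f}s")  # ~1.0s

asyncio.run(main())
```

With three prompts, the parallel path finishes in roughly the time of a single request, which is where the speedup for batch operations comes from.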
Supported Clients
Concurrent requests are available for hosted API clients.
Local clients (KoboldCpp, llama.cpp, etc.) do not support this feature, as they typically cannot handle concurrent inference requests.
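For context, a dispatcher can gate on a capability flag and fall back to a one-at-a-time loop, which is all a single local inference process can usefully handle. This is an illustrative sketch; `supports_concurrency` is an assumed flag, not a documented attribute of these clients.

```python
import asyncio

async def run_queries(tasks, supports_concurrency: bool):
    # tasks: zero-argument coroutine functions, one per LLM query.
    if supports_concurrency:
        # Hosted API: fan out, all requests in flight at once.
        return await asyncio.gather(*(t() for t in tasks))
    # Local client: strictly sequential fallback.
    results = []
    for t in tasks:
        results.append(await t())
    return results
```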
How to Enable
For clients that support concurrent requests, a toggle button (marked with a parallel-lines icon) appears in the client list.
Click the button to enable concurrent requests; while enabled, the button appears highlighted.
You can also enable this feature through the client's settings dialog under the Concurrency tab.
Important Considerations
Experimental Feature
This feature is experimental and may behave unpredictably in certain situations, particularly when rate limiting is in effect.
Rate Limiting: If you have rate limiting configured for the client, concurrent requests may interact with the rate limiter in unexpected ways. If you experience issues, consider disabling concurrent requests or adjusting your rate limit settings.
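One common way to reconcile the two is to bound both the number of in-flight requests and how quickly new requests may start. The sketch below assumes a simple minimum-interval budget; `RateLimitedRunner` and its parameters are illustrative, not this application's internals.

```python
import asyncio

class RateLimitedRunner:
    """Illustrative sketch: cap in-flight requests and space out their starts."""

    def __init__(self, max_concurrent: int, min_interval: float):
        self._sem = asyncio.Semaphore(max_concurrent)  # concurrency cap
        self._min_interval = min_interval              # seconds between request starts
        self._lock = asyncio.Lock()
        self._next_start = 0.0

    async def run(self, coro_fn):
        async with self._sem:
            # Reserve the next start slot under the lock, then sleep
            # outside it so other tasks can keep scheduling.
            async with self._lock:
                now = asyncio.get_running_loop().time()
                delay = max(0.0, self._next_start - now)
                self._next_start = max(now, self._next_start) + self._min_interval
            if delay:
                await asyncio.sleep(delay)
            return await coro_fn()

async def main():
    runner = RateLimitedRunner(max_concurrent=3, min_interval=0.5)

    async def fake_request():
        await asyncio.sleep(0.2)  # simulated API call
        return "ok"

    print(await asyncio.gather(*(runner.run(fake_request) for _ in range(6))))

asyncio.run(main())
```

If your rate limit is strict, an arrangement like this effectively serializes requests anyway, which is why disabling concurrency can be the simpler fix.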
When to Use This Feature
Consider enabling concurrent requests if:
- You frequently use the visual/image generation features
- You want to reduce wait times during image prompt generation
- You are not experiencing rate limit issues with the API
You can safely leave this disabled if:
- You rarely use image generation features
- You have strict rate limits configured
- You experience any instability with the feature enabled