Google

The Google backend provides image generation, editing, and analysis capabilities using Google's Gemini image models. It supports text-to-image generation, image editing with reference images, and AI-powered image analysis.

Prerequisites

Before configuring the Google backend, you need to obtain a Google API key:

Go to Google AI Studio
Sign in with your Google account
Create a new API key or use an existing one
Copy the API key

Then configure it in Talemate:

Open Talemate Settings → Application → Google
Paste your Google API key in the "Google API Key" field
Save your changes

API Key vs Vertex AI Credentials

The Visualizer agent uses the Google API key (not Vertex AI service account credentials). Make sure you're using the API key from Google AI Studio, not the service account JSON file used for Vertex AI.

Configuration

In the Visualizer agent settings, select Google as your backend for text-to-image generation, image editing, image analysis, or any combination of these. Each operation can be configured separately.

Text-to-Image Configuration

For text-to-image generation, configure the following settings:

Google API Key: Your Google API key (configured globally in Talemate Settings)
Model: Select the image generation model to use:
- gemini-2.5-flash-image: Faster generation, good quality
- gemini-3-pro-image-preview: Higher quality, slower generation

The Google backend automatically handles aspect ratios based on the format you select:

Landscape: 16:9 aspect ratio
Portrait: 9:16 aspect ratio
Square: 1:1 aspect ratio

Image Editing Configuration

For image editing, configure similar settings but with an additional option:

Google API Key: Your Google API key
Model: Select the image generation model (same options as text-to-image)
Max References: Configure the maximum number of reference images (1-3). This determines how many reference images you can provide when editing an image.

Reference Images

Google's image editing models can use up to 3 reference images to guide the editing process. The "Max References" setting controls how many reference images Talemate will send to the API. You can adjust this based on your needs, but keep in mind that more references may provide better context for complex edits.

Image Analysis Configuration

For image analysis, configure the following:

Google API Key: Your Google API key
Model: Select a vision-capable text model:
- gemini-2.5-flash: Fast analysis, good for general use
- gemini-2.5-pro: Higher quality analysis
- gemini-3-pro-preview: Latest model with improved capabilities

Analysis Models

Image analysis uses text models that support vision capabilities, not the image generation models. These models can analyze images and provide detailed descriptions, answer questions about image content, and extract information from visual content.

Usage

Once configured, the Google backend will appear in the Visualizer agent status with green indicators showing which capabilities are available.

The status indicators show:

Text to Image: Available when text-to-image backend is configured
Image Edit: Available when image editing backend is configured (shows max references if configured)
Image Analysis: Available when image analysis backend is configured

Model Recommendations

Text-to-Image and Image Editing

gemini-2.5-flash-image: Best for faster generation and general use. Good balance of speed and quality.
gemini-3-pro-image-preview: Best for higher quality results when speed is less important. Use when you need the best possible image quality.

Image Analysis

gemini-2.5-flash: Best for quick analysis and general use cases. Fast responses with good accuracy.
gemini-2.5-pro: Best for detailed analysis requiring higher accuracy and more nuanced understanding.
gemini-3-pro-preview: Best for the latest capabilities and most advanced analysis features.

Prompt Formatting

The Google backend uses Descriptive prompt formatting by default. This means prompts are formatted as natural language descriptions rather than keyword lists. This works well with Google's Gemini models, which are designed to understand natural language instructions.

When generating images, provide detailed descriptions of what you want to create. For image editing, describe the changes you want to make in natural language.