Embeddings

You can manage your available embeddings through the application settings.

In the settings dialogue go to Presets and then Embeddings.

Pre-configured Embeddings

all-MiniLM-L6-v2

The default ChromaDB embedding. Also the default for the Memory agent unless changed.

Fast, but the least accurate.

Alibaba-NLP/Gte-Base-En-V1.5

Sentence transformer model that is decently fast and accurate and will likely become the default for the Memory agent in the future.

Instructor Models

Support of these likely deprecated

Its become increasingly difficult to install support for these while keeping other dependencies up to date. See this issue for more details.

Use the Alibaba-NLP/Gte-Base-En-V1.5 embedding instead, its pretty close in accuracy and much smaller.

Instructor embeddings, coming in three sizes: base, large, and xl. XL is the most accurate but also has the biggest memory footprint and is the slowest. Using cuda is recommended for the xl and large models.

OpenAI text-embedding-3-small

OpenAI's current text embedding model. Fast and accurate, but not free.

Adding an Embedding

You can add new embeddings by clicking the Add new button.

Select the embedding type and then enter the model name. When using sentence-transformer, make sure the modelname matches the name of the model repository on Huggingface, so for example Alibaba-NLP/gte-base-en-v1.5.

New embeddings require a download

When you add a new embedding model and use it for the first time in the Memory agent, Talemate will download the model from Huggingface. This can take a while, depending on the size of the model and your internet connection.

You can track the download in the talemate process window. A better UX based download progress bar is planned for a future release.

Editing an Embedding

Select the existing embedding from the left side bar and you may change the following properties:

Trust Remote Code

For custom sentence-transformer models, you may need to toggle this on. This can be a security risk, so only do this if you trust the model's creator. It basically allows remote code execution.

Warning

Only trust models from reputable sources.

Device

The device to use for the embeddings. This can be either cpu or cuda. Note that this can also be overridden in the Memory agent settings.

Distance

The maximum distance for results to be considered a match. Different embeddings may require different distances, so if you find low accuracy, try changing this value.

Distance Mod

A multiplier for the distance. This can be used to fine-tune the distance without changing the actual distance value. Generally you should leave this at 1.

Distance Function

The function to use for calculating the distance. The default is Cosine Similarity, but you can also use Inner Product or Squared L2. The selected embedding may require a specific distance function, so if you find low accuracy, try changing this value.

Fast

This is just a tag to mark the embedding as fast. It doesn't actually do anything, but can be useful for sorting later on.

GPU Recommendation

This is a tag to mark the embedding as needing a GPU. It doesn't actually do anything, but can be useful for sorting later on.

Local

This is a tag to mark the embedding as local. It doesn't actually do anything, but can be useful for sorting later on.