Embedding Models

Embedding models transform text into vector representations, enabling semantic search capabilities in CapyDB. These vectors capture the meaning and context of text, allowing you to find documents based on semantic similarity rather than exact keyword matches.

When working with EmbText in CapyDB, you can specify which embedding model to use via the emb_model parameter.

Supported Embedding Models

| Model | Provider | Dimensions | Description |
|-------|----------|------------|-------------|
| text-embedding-3-small | OpenAI | 1536 | Smaller, faster model with an excellent performance-to-cost ratio |
| text-embedding-3-large | OpenAI | 3072 | Highest-quality model for the most demanding use cases |
| text-embedding-ada-002 | OpenAI | 1536 | Legacy model with good performance for general use cases |
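Regardless of which model produces the vectors, semantic search compares them with a similarity measure, most commonly cosine similarity. The sketch below uses plain Python and toy 4-dimensional vectors (standing in for real 1536- or 3072-dimensional embeddings) to show the kind of comparison that underlies semantic search; the vectors and scores are illustrative, not produced by any real model:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors; real embeddings have 1536 or 3072 dimensions.
query = [0.1, 0.3, 0.5, 0.1]
doc_close = [0.1, 0.25, 0.55, 0.1]  # semantically similar document
doc_far = [0.9, 0.05, 0.0, 0.05]    # unrelated document

print(cosine_similarity(query, doc_close))  # close to 1.0
print(cosine_similarity(query, doc_far))    # much lower
```

A higher cosine similarity means the two texts are closer in meaning, which is why documents can match a query even when they share no keywords.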

Using Embedding Models in CapyDB

To specify an embedding model when working with EmbText, use the emb_model parameter:

from capydb import EmbText, EmbModels

document = {
    "description": EmbText(
        "This is a sample text that will be embedded.",
        emb_model=EmbModels.TEXT_EMBEDDING_3_LARGE
    )
}

In the example above, we're using the TEXT_EMBEDDING_3_LARGE model from OpenAI to embed the text.

Note: CapyDB does not charge for LLM usage directly. Instead, you pay the LLM providers via CapyDB, which facilitates the payment process for your convenience.

Best Practices

  • Use text-embedding-3-small for most general-purpose applications where cost-efficiency is important.
  • Choose text-embedding-3-large for applications requiring the highest quality embeddings, such as complex semantic retrieval tasks.
  • Consider using the same embedding model throughout your application for consistent results.
  • For text fields that don't require semantic search, you can omit the embedding model to save on processing costs.
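When weighing text-embedding-3-small against text-embedding-3-large, note that per-vector storage grows linearly with dimensions. A quick back-of-the-envelope estimate (assuming 4-byte float32 storage per dimension, which is an assumption about how the vectors are stored, not a documented CapyDB detail):

```python
def vector_bytes(dimensions, bytes_per_float=4):
    # float32 storage assumed; actual on-disk size depends on the database.
    return dimensions * bytes_per_float

print(vector_bytes(1536))  # text-embedding-3-small: 6144 bytes (~6 KB) per vector
print(vector_bytes(3072))  # text-embedding-3-large: 12288 bytes (~12 KB) per vector
```

Doubling the dimensions doubles storage (and typically index and compute cost), so the large model is worth it only when retrieval quality measurably improves for your data.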
