Embedding Models

Embedding models transform text into vector representations, enabling semantic search capabilities in CapyDB. These vectors capture the meaning and context of text, allowing you to find documents based on semantic similarity rather than exact keyword matches.

When working with EmbText in CapyDB, you can specify which embedding model to use via the emb_model parameter.

Supported Embedding Models

| Model | Provider | Dimensions | Description |
|-------|----------|------------|-------------|
| text-embedding-3-small | OpenAI | 1536 | Smaller, faster model with an excellent performance-to-cost ratio |
| text-embedding-3-large | OpenAI | 3072 | Highest-quality model for the most demanding use cases |
| text-embedding-ada-002 | OpenAI | 1536 | Legacy model with good performance for general use cases |
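Regardless of which model produces the vectors, semantic search compares them with a similarity measure, most commonly cosine similarity. The sketch below uses plain Python and toy 4-dimensional vectors (standing in for real 1536- or 3072-dimensional embeddings) to show the kind of comparison that underlies semantic search; the vectors and scores are illustrative, not produced by any real model:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors; real embeddings have 1536 or 3072 dimensions.
query = [0.1, 0.3, 0.5, 0.1]
doc_close = [0.1, 0.25, 0.55, 0.1]  # semantically similar document
doc_far = [0.9, 0.05, 0.0, 0.05]    # unrelated document

print(cosine_similarity(query, doc_close))  # close to 1.0
print(cosine_similarity(query, doc_far))    # much lower
```

A higher cosine similarity means the two texts are closer in meaning, which is why documents can match a query even when they share no keywords.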

Using Embedding Models in CapyDB

To specify an embedding model when working with EmbText, use the emb_model parameter:

from capydb import EmbText, EmbModels

document = {
    "description": EmbText(
        "This is a sample text that will be embedded.",
        emb_model=EmbModels.TEXT_EMBEDDING_3_LARGE
    )
}

In the example above, we're using the TEXT_EMBEDDING_3_LARGE model from OpenAI to embed the text.

Note: CapyDB does not charge for LLM usage directly. Instead, you pay the LLM providers via CapyDB, which facilitates the payment process for your convenience.

Best Practices

  • Use text-embedding-3-small for most general-purpose applications where cost-efficiency is important.
  • Choose text-embedding-3-large for applications requiring the highest quality embeddings, such as complex semantic retrieval tasks.
  • Consider using the same embedding model throughout your application for consistent results.
  • For text fields that don't require semantic search, you can omit the embedding model to save on processing costs.
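When weighing text-embedding-3-small against text-embedding-3-large, note that per-vector storage grows linearly with dimensions. A quick back-of-the-envelope estimate (assuming 4-byte float32 storage per dimension, which is an assumption about how the vectors are stored, not a documented CapyDB detail):

```python
def vector_bytes(dimensions, bytes_per_float=4):
    # float32 storage assumed; actual on-disk size depends on the database.
    return dimensions * bytes_per_float

print(vector_bytes(1536))  # text-embedding-3-small: 6144 bytes (~6 KB) per vector
print(vector_bytes(3072))  # text-embedding-3-large: 12288 bytes (~12 KB) per vector
```

Doubling the dimensions doubles storage (and typically index and compute cost), so the large model is worth it only when retrieval quality measurably improves for your data.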
