Vision models enable CapyDB to process and understand image content, creating semantic representations that can be used for advanced image search and analysis. These models extract features and context from images, similar to how embedding models work with text.
When working with `EmbImage` in CapyDB, you can specify which vision model to use via the `vision_model` parameter.
| Model | Provider | Description |
|---|---|---|
| GPT-4o | OpenAI | High-quality multimodal model that interprets images with excellent detail recognition and contextual understanding |
| GPT-4o-mini | OpenAI | Smaller, more cost-effective version of GPT-4o with good performance for most image processing tasks |
| GPT-4-turbo | OpenAI | Tuned for efficiency while maintaining high-quality image understanding capabilities |
| O1 | OpenAI | Advanced vision model optimized for high-fidelity understanding of complex visual content |
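In code, these models are referenced through the `VisionModels` class. Only `GPT_4O` is confirmed by the examples below; the other constant names in this sketch are assumed to follow the same naming pattern, so verify them against the SDK:

```python
from capydb import VisionModels

# Constants corresponding to the table above. GPT_4O appears in the
# examples below; the remaining names are assumed, not confirmed.
VisionModels.GPT_4O        # GPT-4o
VisionModels.GPT_4O_MINI   # GPT-4o-mini (assumed constant name)
VisionModels.GPT_4_TURBO   # GPT-4-turbo (assumed constant name)
VisionModels.O1            # O1 (assumed constant name)
```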
To specify a vision model when working with `EmbImage`, use the `vision_model` parameter:
```python
from capydb import EmbImage, VisionModels

# A document field whose image is processed by the GPT-4o vision model.
document = {
    "product_image": EmbImage(
        data="base64_encoded_image_data",  # placeholder for your Base64 string
        mime_type="image/jpeg",
        vision_model=VisionModels.GPT_4O,
    )
}
```
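The `data` argument expects a Base64-encoded string. A minimal sketch of producing one with Python's standard library (the file path is illustrative):

```python
import base64

from capydb import EmbImage, VisionModels

# Read image bytes and Base64-encode them (file path is illustrative).
with open("product.jpg", "rb") as f:
    encoded_image = base64.b64encode(f.read()).decode("utf-8")

document = {
    "product_image": EmbImage(
        data=encoded_image,
        mime_type="image/jpeg",
        vision_model=VisionModels.GPT_4O,
    )
}
```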
In the example above, we're using the `GPT_4O` model from OpenAI to process and understand the image content.
You can use both vision models and embedding models together with `EmbImage` to get the benefits of both:
```python
from capydb import EmbImage, EmbModels, VisionModels

# Combine a vision model (visual features) with a text embedding model
# (embeds the generated description) on the same field.
document = {
    "product_image": EmbImage(
        data="base64_encoded_image_data",  # placeholder for your Base64 string
        mime_type="image/jpeg",
        vision_model=VisionModels.GPT_4O,
        emb_model=EmbModels.TEXT_EMBEDDING_3_LARGE,
    )
}
```
This combination allows CapyDB to extract both visual features (via the vision model) and encode textual descriptions (via the embedding model) for comprehensive multimodal search capabilities.
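To make the search side concrete, here is a hypothetical sketch of querying such a collection. The client entry point (`CapyDB`) and the `query` method are assumptions for illustration, not confirmed API; consult the client reference for the actual calls:

```python
from capydb import CapyDB  # client entry point assumed for this sketch

# Hypothetical: a natural-language query can match against both the
# vision model's description of the image and its text embedding.
client = CapyDB()
collection = client.db("my_database").collection("products")

matches = collection.query("red running shoes with white soles")
for match in matches:
    print(match)
```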
Note: CapyDB does not charge for LLM usage directly. Instead, you pay the LLM providers through CapyDB, which handles the payment process on your behalf.
Recommended usage:

- GPT-4o-mini for cost-efficient image processing where the highest level of detail recognition is not required.
- GPT-4o for high-quality image understanding in production applications.
- O1 for applications that require the most advanced image understanding capabilities.
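As one way to apply these recommendations, the sketch below maps a quality tier to a vision model; constant names other than `GPT_4O` are assumed to follow the same pattern and are not confirmed by the SDK docs:

```python
from capydb import EmbImage, VisionModels

# Tier-to-model mapping based on the recommendations above. Constant
# names other than GPT_4O are assumed, not confirmed.
MODEL_BY_TIER = {
    "budget": VisionModels.GPT_4O_MINI,  # cost-efficient processing
    "standard": VisionModels.GPT_4O,     # production-quality understanding
    "premium": VisionModels.O1,          # most advanced understanding
}

def make_product_image(encoded_image: str, tier: str = "standard") -> EmbImage:
    """Build an EmbImage using the vision model for the given tier."""
    return EmbImage(
        data=encoded_image,
        mime_type="image/jpeg",
        vision_model=MODEL_BY_TIER[tier],
    )
```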