CapyDB now supports image embeddings as well. By wrapping your image data in EmbImage, you can leverage both text and vision models to extract meaningful features from your images, enabling semantic searches that go beyond traditional keyword matching.
Below is the simplest way to use EmbImage:
from capydb import EmbImage

# Storing a single image field to embed.
document = {
    "field_name": EmbImage("https://example.com/image.jpg", mime_type="image/jpeg")
}
This snippet creates an EmbImage object containing your image URL. The mime_type parameter is required and specifies the image format. By default, no specific models are set, and all other parameters remain optional.
If you have specific requirements (e.g., using a particular embedding or vision model), customize EmbImage by specifying additional parameters:
from capydb import EmbImage, EmbModels, VisionModels

document = {
    "field_name": EmbImage(
        url="https://example.com/image.jpg",          # URL to the image
        mime_type="image/jpeg",                       # Required: specify the image format
        emb_model=EmbModels.TEXT_EMBEDDING_3_LARGE,   # Optionally specify an embedding model
        vision_model=VisionModels.GPT_4O,             # Optionally specify a vision model
        max_chunk_size=200,                           # Maximum size of each chunk
        chunk_overlap=20,                             # Overlap between consecutive chunks
        is_separator_regex=False,                     # Treat separators as plain strings or regex?
        separators=["\n\n", "\n"],
        keep_separator=False                          # Keep or remove the separator in chunks
    )
}
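CapyDB performs chunking server-side, so you never run it yourself. Purely to illustrate how max_chunk_size, chunk_overlap, separators, is_separator_regex, and keep_separator interact, here is a hypothetical Python sketch of separator-based chunking with overlap (the function name and logic are illustrative, not CapyDB internals):

```python
import re

def chunk_text(text, max_chunk_size=200, chunk_overlap=20,
               separators=("\n\n", "\n"), is_separator_regex=False,
               keep_separator=False):
    """Illustrative sketch only: split text on separators, then pack the
    pieces into chunks of at most max_chunk_size characters, carrying
    chunk_overlap trailing characters into the next chunk."""
    pattern = "|".join(
        s if is_separator_regex else re.escape(s) for s in separators
    )
    if keep_separator:
        # Capture the separators and re-attach each one to the piece before it.
        parts = re.split(f"({pattern})", text)
        pieces = [a + b for a, b in zip(parts[0::2], parts[1::2] + [""])]
    else:
        pieces = re.split(pattern, text)

    chunks, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) > max_chunk_size:
            chunks.append(current)
            # Carry the tail of the finished chunk forward as overlap.
            current = current[-chunk_overlap:] if chunk_overlap else ""
        current += piece
    if current:
        chunks.append(current)
    return chunks

description = ("A capybara rests by a river.\n\n"
               "Tall grass surrounds it.\nThe sky is overcast.")
chunks = chunk_text(description, max_chunk_size=40, chunk_overlap=10)
```

With these settings the three sentences above pack into three chunks, each no longer than 40 characters, with a 10-character overlap preserving context across boundaries.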
CapyDB processes your image data asynchronously. Once processed, it automatically adds a chunks field to each EmbImage, giving you easy access to the internal representations.
{
    "field_name": EmbImage(
        url="https://example.com/image.jpg",
        mime_type="image/jpeg",
        chunks=["chunk1", "chunk2", "chunk3"],  # Auto-populated by CapyDB; not set by the user
        emb_model=EmbModels.TEXT_EMBEDDING_3_LARGE,
        vision_model=VisionModels.GPT_4O,
        max_chunk_size=200,
        chunk_overlap=20,
        is_separator_regex=False,
        separators=["\n\n", "\n"],
        keep_separator=False
    )
}
| Parameter | Description |
|---|---|
| url | The URL to the image. The image is retrieved, processed, and embedded for semantic search. |
| mime_type | The MIME type of the image (e.g., "image/jpeg", "image/png"). Required. Supported types: image/jpeg, image/jpg, image/png, image/gif, and image/webp. |
| emb_model | The embedding model to use. Defaults to None if not provided. Supported models: text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002. |
| vision_model | The vision model used to process the image. Defaults to None if not provided. Supported models: GPT_4O_MINI, GPT_4O, GPT_4O_TURBO, and GPT_O1. |
| max_chunk_size | Maximum size of each chunk; controls the granularity of the embedded segments. |
| chunk_overlap | Overlap between consecutive chunks, useful for preserving context between segments. |
| is_separator_regex | Whether to treat each entry in separators as a regular expression. Defaults to False. |
| separators | A list of separator strings (or regex patterns) used during processing. More common for text, but these may also apply to image metadata or descriptions if present. |
| keep_separator | If True, separators remain in the processed chunks; if False, they are removed. |
| chunks | Auto-generated by the database after processing the image. Not set by the user, and available only after embedding completes. |
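Because mime_type is required, it can be convenient to derive it from the image URL's extension. The helper below is a hypothetical sketch (not part of the CapyDB SDK) covering only the supported types listed in the table:

```python
# Hypothetical helper: map a URL's file extension to one of the MIME types
# the table lists as supported. Not part of the CapyDB SDK.
SUPPORTED_MIME_TYPES = {
    ".jpeg": "image/jpeg",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

def guess_mime_type(url: str) -> str:
    """Return the MIME type for a supported image extension, else raise."""
    lowered = url.lower()
    for ext, mime in SUPPORTED_MIME_TYPES.items():
        if lowered.endswith(ext):
            return mime
    raise ValueError(f"Unsupported image extension in URL: {url}")

mime = guess_mime_type("https://example.com/image.jpg")  # "image/jpeg"
```

You could then pass the result straight through, e.g. EmbImage(url, mime_type=guess_mime_type(url)).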
Whenever you insert a document containing EmbImage into CapyDB, the following steps occur asynchronously:
1. Image Retrieval: The image is retrieved from the provided URL.
2. Chunking (if applicable): Depending on your configuration, the image may be internally segmented into chunks for finer-grained processing.
3. Embedding: The image (or its chunks) is transformed into vector representations using the specified embedding and/or vision model, extracting its semantic and visual features.
4. Indexing: The resulting embeddings are indexed for efficient semantic search.

These steps happen in the background, so write operations return quickly, but newly inserted data may take a moment to become queryable.
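The background pipeline can be pictured as a small async chain. The sketch below is purely conceptual (stub functions and names are assumptions, not CapyDB internals or its API):

```python
import asyncio

# Conceptual sketch of the asynchronous pipeline: retrieve -> chunk -> embed -> index.
# Every function here is a stub standing in for a CapyDB-internal service.

async def retrieve(url: str) -> bytes:
    # In reality, the image is fetched from the provided URL.
    return b"<image-bytes>"

async def describe_and_chunk(image: bytes) -> list[str]:
    # A vision model describes the image; the description is split into chunks.
    return ["a capybara by a river", "tall grass in the background"]

async def embed(chunks: list[str]) -> list[list[float]]:
    # Each chunk becomes a vector via the embedding model (stubbed as zeros).
    return [[0.0] * 3 for _ in chunks]

async def index(vectors: list[list[float]]) -> int:
    # Vectors are added to the search index; returns how many were indexed.
    return len(vectors)

async def process_emb_image(url: str) -> int:
    image = await retrieve(url)
    chunks = await describe_and_chunk(image)
    vectors = await embed(chunks)
    return await index(vectors)

indexed = asyncio.run(process_emb_image("https://example.com/image.jpg"))
```

Because the chain runs in the background, the insert call itself does not wait for indexing to finish.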
Once the Embedding and Indexing steps are complete, your EmbImage fields become searchable. To perform semantic queries on image data, use the standard query operations; refer to the Query page for details.