EmbImage

Overview

CapyDB also supports image embeddings. By wrapping your base64-encoded image data in EmbImage, you can leverage both text and vision models to extract meaningful features from your images, enabling semantic searches that go beyond traditional keyword matching.

Key Points:

  • Image Data Handling: Provide your image data as a base64-encoded string. Note that EmbImage only accepts base64-encoded binary data and does not accept image file paths.
  • Dual-Model Support: Optionally use an embedding model and a vision model to generate rich, multi-modal representations.
  • Asynchronous Processing: Image embeddings and chunking occur in the background, ensuring a responsive write experience.
  • Automatic Chunking: Images are internally processed (and chunked, if applicable) for efficient embedding and indexing.

Basic Usage

Below is the simplest way to use EmbImage:

from capydb import EmbImage

# Storing a single image field to embed.
{
  "field_name": EmbImage(
    data="base64_encoded_image_data",
    mime_type="image/jpeg"
  )
}

This snippet creates an EmbImage object containing your base64 encoded image data. The mime_type parameter is required to specify the image format. By default, no specific models are set and all other parameters remain optional.
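Since EmbImage accepts only base64-encoded data, an image file on disk must be read and encoded first. A minimal helper using only the standard library (the file path you pass is a placeholder, not part of the capydb package):

```python
import base64

def image_to_base64(path: str) -> str:
    """Read an image file from disk and return its contents as a
    base64-encoded string, suitable for the EmbImage `data` parameter."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```

For example, `EmbImage(data=image_to_base64("photo.jpg"), mime_type="image/jpeg")`.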


Customized Usage with Parameters

If you have specific requirements (e.g., using a particular embedding or vision model), customize EmbImage by specifying additional parameters:

from capydb import EmbImage, EmbModels, VisionModels

{
    "field_name": EmbImage(
        data="base64_encoded_image_data",                   # Base64-encoded image
        mime_type="image/jpeg",                             # Required: specify the image format
        emb_model=EmbModels.TEXT_EMBEDDING_3_LARGE,         # Optionally specify an embedding model
        vision_model=VisionModels.GPT_4O,                   # Optionally specify a vision model
        max_chunk_size=200,                                 # Configure chunk sizes
        chunk_overlap=20,                                   # Overlap between chunks
        is_separator_regex=False,                           # Are separators plain strings or regex?
        separators=[
            "\n\n",
            "\n",
        ],
        keep_separator=False                                # Keep or remove the separator in chunks
    )
}
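The max_chunk_size and chunk_overlap parameters behave like a sliding window over the generated text. A simplified client-side sketch of that windowing (the real pipeline runs server-side and also honors the separators settings):

```python
def chunk_text(text: str, max_chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most max_chunk_size characters,
    where each window overlaps the previous one by chunk_overlap characters."""
    step = max_chunk_size - chunk_overlap
    return [text[i:i + max_chunk_size] for i in range(0, len(text), step)]
```

With max_chunk_size=200 and chunk_overlap=20, consecutive chunks share 20 characters of context, which helps preserve meaning across chunk boundaries.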

After Saving

CapyDB processes your image data asynchronously. Once processed, several important changes happen:

  1. The original base64 image data is removed to save space and bandwidth
  2. A public URL is assigned to access the stored image
  3. A chunks field is added containing the generated text descriptions/representations

This significantly reduces document size while providing convenient access to both the image and its processed text representations.

# Conceptual view of the stored field after processing:
{
    "field_name": EmbImage(
        # Original base64 data is removed and replaced with a URL
        url="https://media.capydb.com/your-project/your-db/your-collection/doc-id/field_name.jpg",
        mime_type="image/jpeg",
        chunks=["chunk1", "chunk2", "chunk3"],
        emb_model=EmbModels.TEXT_EMBEDDING_3_LARGE,
        vision_model=VisionModels.GPT_4O,
        max_chunk_size=200,
        chunk_overlap=20,
        is_separator_regex=False,
        separators=["\n\n", "\n"],
        keep_separator=False
    )
}

Accessing EmbImage Properties

After retrieving a document with an EmbImage field, you can access its properties like the URL and chunks. The following examples demonstrate how to work with EmbImage objects in your application:

from capydb import CapyDBClient, EmbImage

# Get document with an EmbImage
client = CapyDBClient("your_api_key")
collection = client.database("project_id", "db_name").collection("collection_name")
document = collection.find_one({"_id": "document_id"})

# Access EmbImage properties
if "image" in document and isinstance(document["image"], EmbImage):
    # Access the URL (added by server after processing)
    image_url = document["image"].url
    print(f"Image URL: {image_url}")
    
    # Access processed chunks (added by server after processing)
    chunks = document["image"].chunks
    print(f"First chunk: {chunks[0] if chunks else 'No chunks yet'}")

Parameter Reference

| Parameter | Description |
| --- | --- |
| data | The base64-encoded image data. This image is processed and embedded for semantic search. Note: after processing, this data is removed from the document and replaced with a URL. Important: EmbImage only accepts base64-encoded binary data and does not support image file paths. |
| url | Auto-generated by the database after processing the image. This URL provides access to the stored image and replaces the original base64 data in the document. |
| mime_type | The MIME type of the image (e.g., "image/jpeg", "image/png"). This parameter is required. Supported types include image/jpeg, image/jpg, image/png, image/gif, and image/webp. |
| emb_model | Which embedding model to use. Defaults to None if not provided. Supported models include text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002. |
| vision_model | Which vision model to use for processing the image. Defaults to None if not provided. Supported vision models include GPT_4O_MINI, GPT_4O, GPT_4_TURBO, and O1. |
| max_chunk_size | Maximum size for each chunk. Depending on your image processing, this parameter can control the granularity of the embedded segments. |
| chunk_overlap | Overlapping size between consecutive chunks, useful for preserving context between segmented image parts. |
| is_separator_regex | Whether to treat each separator in separators as a regular expression. Defaults to False. |
| separators | A list of separator strings (or regex patterns) used during processing. While more common in text, these may also apply to image metadata or descriptions if present. |
| keep_separator | If True, separators remain in the processed data. If False, they are removed. |
| chunks | Auto-generated by the database after processing the image. It is not set by the user, and is available only after embedding completes. |
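Since an unsupported MIME type cannot be processed, it can be useful to check the value client-side before constructing an EmbImage. This validator is a convenience sketch, not part of the capydb package; the set mirrors the supported types listed above:

```python
# Supported types per the parameter reference above.
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/jpg", "image/png", "image/gif", "image/webp"}

def validate_mime_type(mime_type: str) -> str:
    """Raise ValueError for MIME types EmbImage does not support."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported mime_type: {mime_type!r}")
    return mime_type
```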

How It Works: Asynchronous Processing

Whenever you insert a document containing EmbImage into CapyDB, the following steps occur asynchronously:

  1. Image Retrieval

    The image is retrieved from the provided base64 encoded data.

  2. Chunking (if applicable)

    Depending on your configuration, the image may be internally segmented into chunks for finer-grained processing.

  3. Embedding

    The image (or its chunks) is transformed into vector representations using the specified embedding and/or vision model. This step extracts the semantic and visual features.

  4. Indexing

    The resulting embeddings are indexed for efficient, semantic searches. These steps happen in the background, so while write operations are fast, query availability may have a slight delay.
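Because these steps run in the background, a document read immediately after insertion may not yet have its url or chunks populated. One way to handle this is a simple polling loop; the pattern below is an assumed client-side sketch (not a capydb API), where fetch is any callable returning the current EmbImage field, e.g. a find_one lookup as shown earlier:

```python
import time

def wait_for_chunks(fetch, attempts: int = 10, delay: float = 2.0):
    """Poll fetch() until the background pipeline has populated the
    EmbImage's chunks, or give up after the given number of attempts."""
    for _ in range(attempts):
        image = fetch()
        if image is not None and getattr(image, "chunks", None):
            return image.chunks
        time.sleep(delay)
    return None
```

For example: `wait_for_chunks(lambda: collection.find_one({"_id": doc_id})["image"])`.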


Querying

Once the Embedding and Indexing steps are complete, your EmbImage fields become searchable. To perform semantic queries on image data, use the standard query operations. Please refer to the Query page for details.

