CapyDB Extended JSON (EmbJSON)

CapyDB Extended JSON (EmbJSON) is a set of special data types that make working with AI and embeddings simple. With EmbJSON, you can store text, images, and other media in your database and have them automatically embedded for semantic search—without setting up separate vector databases or pipelines.

What EmbJSON Does For You

  • Automatic Embedding: Just wrap your content in an EmbJSON type, and CapyDB handles the embedding process
  • Simple Semantic Search: Query your data using natural language and similarity search
  • No Pipeline Management: Skip complex data processing workflows—embedding happens behind the scenes
  • Background Processing: Embeddings are generated asynchronously, keeping your app responsive
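To give a rough sense of what "similarity search" means in the list above, here is a minimal sketch that ranks items by cosine similarity between vectors. This is an illustration only, not CapyDB's implementation: the function names, the three-dimensional toy vectors, and the document labels are all hypothetical, and real embeddings are produced by a model and have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, vectors_by_name):
    """Return (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in vectors_by_name.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy vectors standing in for stored embeddings
toy_vectors = {
    "design principles overview": [0.9, 0.1, 0.0],
    "quarterly sales report": [0.0, 0.2, 0.9],
    "ui layout guidelines": [0.7, 0.3, 0.1],
}

# Hypothetical vector standing in for an embedded query
query_vector = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query_vector, toy_vectors)
for name, score in ranking:
    print(f"{score:.3f}  {name}")
```

With EmbJSON, this vector handling stays behind the scenes; the sketch only shows the kind of comparison a semantic query relies on.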

Quick Example

The following example stores a document with embedded text and image fields, then runs a semantic search over it:

from capydb import EmbText, EmbImage

# Create a document with embedded fields
document = {
    "title": "My First Document",
    # EmbText automatically embeds the text for semantic search
    "description": EmbText("This is a detailed description that will be embedded for semantic search"),
    # EmbImage embeds image data passed as a base64-encoded string
    "thumbnail": EmbImage(
        data="base64_encoded_image_data",
        mime_type="image/jpeg"
    )
}

# Store in CapyDB; embedding and indexing happen automatically in the background
# (`collection` is assumed to be an existing CapyDB collection handle)
collection.insert_one(document)

# Later, search semantically across all embedded fields
results = collection.find({"$semanticSearch": "design principles"})

Available EmbJSON Types

  • EmbText: For text data ranging from short phrases to long documents
  • EmbImage: For images (accepts base64-encoded image data only)
  • Coming Soon: EmbVideo, EmbFile, EmbAudio, and Emb3D for additional media types
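Because EmbImage accepts only base64-encoded data, raw image bytes need to be encoded first. The sketch below uses Python's standard `base64` module; the helper name `to_base64_string` and the four-byte sample are assumptions made for illustration, and in practice the bytes would come from reading an image file in binary mode.

```python
import base64

def to_base64_string(raw_bytes: bytes) -> str:
    """Encode raw bytes as an ASCII base64 string."""
    return base64.b64encode(raw_bytes).decode("ascii")

# Stand-in data: the first four bytes of a typical JPEG file.
# Real usage would read the whole file, e.g. open(path, "rb").read()
sample_bytes = b"\xff\xd8\xff\xe0"

encoded = to_base64_string(sample_bytes)
print(encoded)
```

The resulting string is the kind of value the Quick Example passes as `data`, alongside a matching `mime_type` such as "image/jpeg".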

Why Use EmbJSON?

Traditional approaches require separate systems for storing data and embeddings. EmbJSON unifies them, eliminating the need to maintain vector databases alongside document stores. Write your data once, query it semantically.

Pro Tip: EmbJSON types accept customization options such as chunk size and embedding model. Start simple and refine as your needs evolve.
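To illustrate what a chunk size controls, here is a minimal sketch that splits text into fixed-size character chunks before each piece would be embedded. The function `chunk_text` is hypothetical, and the source does not describe how CapyDB actually chunks text or which unit it measures chunk size in, so treat this as a conceptual picture only.

```python
def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# Smaller chunks give more, finer-grained pieces; larger chunks give fewer
small_chunks = chunk_text("abcdefghij", chunk_size=4)
large_chunks = chunk_text("abcdefghij", chunk_size=8)
print(small_chunks)
print(large_chunks)
```

Smaller chunks tend to make matches more specific, while larger chunks keep more surrounding context in each embedded piece.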
