Overview

CapyDB is a high-level database built specifically for Large Language Model (LLM) applications. It unifies multiple database architectures—NoSQL, vector, and object storage—within a single platform, allowing seamless storage, indexing, and retrieval of structured, unstructured, and vector-based data. This makes CapyDB the ideal choice for AI-driven projects, particularly those focused on natural language processing and data analysis.

What is a High-Level Database?

Much like how high-level programming languages like Python abstract away technical complexities to simplify development, CapyDB abstracts the complexities of different database architectures. By integrating NoSQL, vector, and object storage under one system, it provides developers with an accessible, powerful platform to manage the diverse data needs of LLM applications—without requiring expertise in multiple types of databases.

Benefits of a High-Level Database

CapyDB offers several key advantages for developers:

Cost Efficiency: No need to maintain separate servers or databases—CapyDB handles these tasks, reducing infrastructure costs and making it more affordable.
Time Savings: Built-in solutions streamline setup and management, allowing developers to focus on building applications instead of managing backends.
Ease of Use: Industry-leading data processing pipelines reduce the need for specialized knowledge, saving time and avoiding the need for additional hires.

Components of CapyDB

1. NoSQL (Document) Database

CapyDB includes a Mongo-compatible NoSQL database for flexible document-based storage and querying, making it easy for developers familiar with MongoDB tools.

2. Vector Database

CapyDB integrates a high-performance vector database that supports:

Vector Storage in Documents: Embedding vector data (e.g., text embeddings) directly within documents for efficient management.
Semantic Indexing: Advanced retrieval and similarity searches, crucial for LLM and AI applications.

3. Object Storage

CapyDB's object storage manages unstructured data like files and images, complementing its structured and vector data capabilities.

EmbJSON (Extended JSON Types)

CapyDB extends the standard BSON (Binary JSON) format with EmbJSON (CapyDB Extended JSON), which simplifies managing and querying complex data structures like text embeddings. EmbJSON is key to CapyDB's database abstraction and is explained further in EmbJSON Overview.