Query
This guide explains how to perform semantic queries on documents in CapybaraDB. Semantic queries retrieve documents by matching the meaning of the provided query text with EmbJSONs in the database.
The query operation returns a list of matched chunks from EmbJSONs in the collection. Only EmbJSONs whose emb_model matches the one used for the query text are included in the semantic search; EmbJSONs with a different emb_model are excluded.
Example Code for query Operation
Below is an example of how to perform a semantic query using Python. It assumes the collection already contains documents with EmbJSON fields, such as those you may have inserted previously:
query_text = "Software engineer with expertise in AI"
response = collection.query(query_text)
Alternatively, you can use an advanced query with optional parameters like so:
query_text = "Software engineer with expertise in AI"
emb_model = "text-embedding-3-small" # Optional
top_k = 3 # Optional
include_values = True # Optional
projection = {
"mode": "include",
"fields": ["name", "bio"]
} # Optional
response = collection.query(query_text, emb_model, top_k, include_values, projection)
Default Response
{
"matches": [
{
"chunk": "John is a software engineer with expertise in AI.",
"path": "bio", # the path of this chunk in the document
"chunk_n": 0, # the index number of this chunk in the EmbJSON
"score": 0.95,
"document": {
"_id": ObjectId("64d2f8f01234abcd5678ef90"),
}
},
{
"chunk": "Alice is a data scientist with a background in machine learning.",
"path": "bio",
"chunk_n": 1,
"score": 0.89,
"document": {
"_id": ObjectId("64d2f8f01234abcd5678ef91"),
}
}
]
}
The matches field contains an array of semantically matched chunks. Each entry carries the chunk text, its path in the document, its chunk_n index within the EmbJSON, its similarity score, the parent document, and, if requested, the vector values. Note that only documents containing EmbJSON fields are returned in the response.
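As a rough illustration of consuming this response, the sketch below walks the matches list and keeps only the chunks whose score clears a threshold. The response here is a hand-built dict that mirrors the shape shown above (with _id as a plain string rather than an ObjectId); the helper name filter_matches and the 0.9 threshold are hypothetical and not part of CapybaraDB.

```python
# Hand-built stand-in for a query response; a real one would come from collection.query(...).
response = {
    "matches": [
        {
            "chunk": "John is a software engineer with expertise in AI.",
            "path": "bio",
            "chunk_n": 0,
            "score": 0.95,
            "document": {"_id": "64d2f8f01234abcd5678ef90"},
        },
        {
            "chunk": "Alice is a data scientist with a background in machine learning.",
            "path": "bio",
            "chunk_n": 1,
            "score": 0.89,
            "document": {"_id": "64d2f8f01234abcd5678ef91"},
        },
    ]
}


def filter_matches(response, min_score):
    """Return (document _id, path, chunk) tuples for matches scoring at least min_score."""
    return [
        (match["document"]["_id"], match["path"], match["chunk"])
        for match in response.get("matches", [])
        if match["score"] >= min_score
    ]


# Hypothetical threshold: keep only the strongest matches.
strong = filter_matches(response, min_score=0.9)
```

With the sample data above, strong holds a single tuple for the chunk found at path bio in John's document.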
Detailed Response with Additional Parameters
If you use additional parameters such as include_values or projection, the response will include more details.
{
"matches": [
{
"path": "bio",
"chunk": "John is a software engineer with expertise in AI.",
"chunk_n": 0,
"score": 0.95,
"values": [
0.123, 0.456, 0.789, ...
], # Vector values of this chunk. (include_values = True)
"document": {
"_id": ObjectId("64d2f8f01234abcd5678ef90"),
"name": "John Doe",
"bio": EmbText("John is a software engineer with expertise in AI.")
}, # Specified fields in the projection will be returned
},
{
"path": "bio",
"chunk": "Alice is a data scientist with a background in machine learning.",
"chunk_n": 1,
"score": 0.89,
"values": [
0.234, 0.567, 0.890, ...
],
"document": {
"_id": ObjectId("64d2f8f01234abcd5678ef91"),
"name": "Alice Smith",
"bio": EmbText("Alice is a data scientist with a background in machine learning.")
},
}
]
}
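In this response, each document includes the name and bio fields in addition to _id because they were listed in the projection. In the default response, only the _id field is included in the document since the projection parameter was not specified.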
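Because values and the projected document fields appear only when requested, code that reads a response may want to tolerate their absence. The following is a minimal sketch under that assumption; summarize_matches is a hypothetical helper, and the sample response is hand-built with short placeholder vectors and string ids rather than real embeddings or ObjectId values.

```python
def summarize_matches(response):
    """Build one flat row per match, tolerating optional fields that are absent."""
    rows = []
    for match in response.get("matches", []):
        document = match.get("document", {})
        values = match.get("values")  # present only when include_values is True
        rows.append(
            {
                "id": document.get("_id"),
                "name": document.get("name"),  # None unless the projection returns it
                "score": match["score"],
                "dims": len(values) if values is not None else None,
            }
        )
    return rows


# Hand-built sample: the first match is "detailed", the second is in the default shape.
sample = {
    "matches": [
        {
            "path": "bio",
            "chunk": "John is a software engineer with expertise in AI.",
            "chunk_n": 0,
            "score": 0.95,
            "values": [0.123, 0.456, 0.789],  # placeholder vector, not a real embedding
            "document": {"_id": "64d2f8f01234abcd5678ef90", "name": "John Doe"},
        },
        {
            "path": "bio",
            "chunk": "Alice is a data scientist with a background in machine learning.",
            "chunk_n": 1,
            "score": 0.89,
            "document": {"_id": "64d2f8f01234abcd5678ef91"},
        },
    ]
}

rows = summarize_matches(sample)
```

Using .get for the optional keys lets the same helper handle both the default and the detailed response shapes.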
Query Parameters
Parameter | Description | Data Type / Format |
---|---|---|
query | The text to be embedded and matched against stored EmbJSON fields. This parameter is required. | string |
emb_model (optional) | The embedding model used for the query. Defaults to OpenAI's text-embedding-3-small. Users can select from supported embedding models listed at /models. Refer to Supported Embedding Models for more details. If some stored EmbJSON fields use a different model, only the fields whose model matches the specified one are searched. | string |
top_k (optional) | The maximum number of top-matching chunks to return, sorted by semantic similarity. Default is 10. | integer |
include_values (optional) | Boolean flag that controls whether the vector values of each matched chunk are included in the response. Default is false. | boolean |
projection (optional) | Defines which fields to return in the response. The default value is { "mode": "exclude" }. Accepts a required mode (include or exclude) and an optional fields list. See Projection Parameter Scenarios below for how different values of projection affect the response. | JSON object |
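The defaults in the table can be collected in one place, as in the sketch below. It assumes the positional argument order used in the example call earlier in this guide; build_query_args and DEFAULTS are hypothetical names, not part of the CapybaraDB client, and the final call is commented out because it needs a real collection.

```python
# Defaults as documented in the Query Parameters table.
DEFAULTS = {
    "emb_model": "text-embedding-3-small",
    "top_k": 10,
    "include_values": False,
    "projection": {"mode": "exclude"},
}


def build_query_args(query, **overrides):
    """Return the arguments for collection.query in the order used in this guide."""
    if not isinstance(query, str):
        raise TypeError("query is required and must be a string")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown parameters: {sorted(unknown)}")
    merged = {**DEFAULTS, **overrides}
    return (
        query,
        merged["emb_model"],
        merged["top_k"],
        merged["include_values"],
        dict(merged["projection"]),  # copy so callers cannot mutate DEFAULTS
    )


args = build_query_args("Software engineer with expertise in AI", top_k=3)
# response = collection.query(*args)  # requires a real CapybaraDB collection
```

Here only top_k is overridden, so the other three optional parameters fall back to their documented defaults.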
Format of projection Parameter
Key | Description | Format |
---|---|---|
mode | Required. Specifies whether to include or exclude certain fields in the response. | string ("include" or "exclude") |
fields | Optional. A list of specific fields to include or exclude based on the mode setting. | list of strings (["field1", "field2"]) |
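A client-side check of this format before sending a query might look like the sketch below. validate_projection is a hypothetical helper written for illustration; whether and how CapybaraDB itself validates the projection is not covered here.

```python
def validate_projection(projection):
    """Raise ValueError if projection does not follow the documented format."""
    if not isinstance(projection, dict):
        raise ValueError("projection must be a JSON object (dict)")
    if projection.get("mode") not in ("include", "exclude"):
        raise ValueError('mode is required and must be "include" or "exclude"')
    fields = projection.get("fields")
    if fields is not None:
        if not isinstance(fields, list) or not all(isinstance(f, str) for f in fields):
            raise ValueError("fields must be a list of strings")
    return projection


checked = validate_projection({"mode": "include", "fields": ["name", "bio"]})
```

Failing early on a malformed projection keeps the mistake close to the code that built it.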
Projection Parameter Scenarios
Example Projection Value | Result |
---|---|
{ "mode": "include" } | The entire document is returned. |
{ "mode": "include", "fields": ["title", "author"] } | Only the title and author fields are returned. |
{ "mode": "exclude" } | Only the _id field is returned. |
{ "mode": "exclude", "fields": ["title", "author"] } | All fields except title and author are returned. |
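To make the four scenarios concrete, the sketch below reproduces them locally on a plain dict. It is only an illustration of the table, not CapybaraDB's implementation: apply_projection is a hypothetical function, and it assumes that _id is always kept, as the detailed response example earlier in this guide suggests. The behavior for an empty fields list is not specified in this guide, so the sketch simply treats it as a list that names no fields.

```python
def apply_projection(document, projection):
    """Mimic the documented projection scenarios on a plain dict."""
    mode = projection["mode"]
    fields = projection.get("fields")
    if mode == "include":
        if fields is None:
            return dict(document)  # entire document
        keep = set(fields) | {"_id"}  # assumption: _id is always returned
        return {k: v for k, v in document.items() if k in keep}
    if mode == "exclude":
        if fields is None:
            return {k: v for k, v in document.items() if k == "_id"}  # only _id
        drop = set(fields) - {"_id"}  # assumption: _id cannot be excluded
        return {k: v for k, v in document.items() if k not in drop}
    raise ValueError('mode must be "include" or "exclude"')


# Hypothetical document used only for this illustration.
doc = {"_id": "1", "title": "Dune", "author": "Frank Herbert", "year": 1965}

everything = apply_projection(doc, {"mode": "include"})
some = apply_projection(doc, {"mode": "include", "fields": ["title", "author"]})
id_only = apply_projection(doc, {"mode": "exclude"})
rest = apply_projection(doc, {"mode": "exclude", "fields": ["title", "author"]})
```

Each of the four results corresponds to one row of the table, in the same order.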