Python API Reference
Database
Database(path: str, dim: Optional[int] = None, metric: Optional[str] = None, m: Optional[int] = None, ef_construction: Optional[int] = None, quantize: bool = False)
A persistent vector database backed by an HNSW index.
Stores vectors with string IDs and optional JSON metadata. Supports cosine, euclidean (L2), and dot-product distance metrics. Uses memory-mapped I/O for fast loading and a write-ahead log for crash safety.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Directory path for the database files. Created if it doesn't exist. TYPE: str |
| dim | Vector dimensionality (e.g. 384 for MiniLM). Required when creating a new database; omit when opening an existing one. TYPE: Optional[int] |
| metric | Distance metric: cosine, euclidean (L2), or dot product. TYPE: Optional[str] |
| m | HNSW M parameter (default 16). TYPE: Optional[int] |
| ef_construction | HNSW build-time search width (default 200). Higher values improve index quality at the cost of build time. TYPE: Optional[int] |
| quantize | If True, enables SQ8 quantization. TYPE: bool |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If opening an existing database without |
Example::

    with Database("my_db", dim=384) as db:
        db.add("doc1", embedding, {"title": "Hello"})
        results = db.search(query_vector, k=5)
        for r in results:
            print(r.id, r.distance)

Example (opening existing)::

    db = Database("my_db")  # auto-detects dim and metric
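The class description names three distance metrics. As a rough illustration of what each one measures, the sketch below computes them in plain Python. The function names are hypothetical and not part of this API, and the sign convention for dot-product distance (negated here so that lower means closer) is an assumption; this reference does not state the library's exact convention.

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """L2 distance: straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; compares direction and ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dot_distance(a: list[float], b: list[float]) -> float:
    """Negated dot product (assumed convention: lower means more similar)."""
    return -sum(x * y for x, y in zip(a, b))


a, b = [1.0, 0.0], [0.0, 2.0]
print(euclidean_distance(a, b))
print(cosine_distance(a, b))
print(dot_distance(a, a))
```

Cosine distance is unaffected by scaling a vector, whereas euclidean and dot-product distances are, which is usually the deciding factor when choosing a metric for a given embedding model.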
deleted_count
property
Number of deleted slots not yet reclaimed by :meth:`compact`.
add
add(id: str, vector: Union[list[float], NDArray[float32]], metadata: Optional[dict[str, Any]] = None) -> None
Add a vector with a unique string ID.
| PARAMETER | DESCRIPTION |
|---|---|
| id | Unique identifier. Raises if the ID already exists (use :meth:`upsert` to overwrite). TYPE: str |
| vector | The embedding vector. Must match the database's dimensionality. Accepts a Python list or a 1-D numpy array of float32. TYPE: Union[list[float], NDArray[float32]] |
| metadata | Optional JSON-serializable dict of metadata. TYPE: Optional[dict[str, Any]] |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the ID already exists or the vector dimension doesn't match. |
upsert
upsert(id: str, vector: Union[list[float], NDArray[float32]], metadata: Optional[dict[str, Any]] = None) -> None
Insert a vector, or update it if the ID already exists.
Same as :meth:`add` but overwrites existing entries instead of raising an error.
| PARAMETER | DESCRIPTION |
|---|---|
| id | Unique identifier. TYPE: str |
| vector | The embedding vector. TYPE: Union[list[float], NDArray[float32]] |
| metadata | Optional metadata dict. TYPE: Optional[dict[str, Any]] |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the vector dimension doesn't match. |
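The difference between add and upsert comes down to how a duplicate ID is handled. The toy in-memory class below is a hypothetical stand-in (it is not the library's Database) that mirrors only the documented contract: add raises ValueError on an existing ID, upsert overwrites it, and both reject a vector of the wrong dimension.

```python
from typing import Any, Optional


class ToyStore:
    """Hypothetical stand-in that mimics the documented add/upsert contract."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.rows: dict[str, tuple[list[float], Optional[dict[str, Any]]]] = {}

    def _check_dim(self, vector: list[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected dim {self.dim}, got {len(vector)}")

    def add(self, id: str, vector: list[float],
            metadata: Optional[dict[str, Any]] = None) -> None:
        self._check_dim(vector)
        if id in self.rows:
            raise ValueError(f"ID already exists: {id}")
        self.rows[id] = (list(vector), metadata)

    def upsert(self, id: str, vector: list[float],
               metadata: Optional[dict[str, Any]] = None) -> None:
        self._check_dim(vector)
        self.rows[id] = (list(vector), metadata)  # insert or overwrite


store = ToyStore(dim=2)
store.add("doc1", [0.1, 0.2], {"title": "Hello"})
store.upsert("doc1", [0.3, 0.4], {"title": "Updated"})  # replaces doc1, no error
try:
    store.add("doc1", [0.5, 0.6])
except ValueError as exc:
    print("add refused:", exc)
```

In practice, add suits pipelines where a duplicate ID signals a bug, while upsert suits re-indexing jobs that may revisit the same documents.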
add_many
add_many(ids: list[str], vectors: Union[list[list[float]], NDArray[float32]], metadatas: Optional[list[Optional[dict[str, Any]]]] = None) -> None
Batch insert multiple vectors.
All three lists must have the same length. Vectors can be passed as a list of lists or a 2-D numpy array of shape (n, dim).
| PARAMETER | DESCRIPTION |
|---|---|
| ids | List of unique identifiers. TYPE: list[str] |
| vectors | Batch of embedding vectors. TYPE: Union[list[list[float]], NDArray[float32]] |
| metadatas | Optional list of metadata dicts (or None for entries without metadata). TYPE: Optional[list[Optional[dict[str, Any]]]] |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If any ID already exists, lengths mismatch, or dimensions don't match. |
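Because a failed batch raises ValueError, it can be convenient to check a batch up front. The helper below is a hypothetical pre-flight check written for this illustration, not a function provided by the library. It covers the length and dimension rules described above; rejecting duplicate IDs inside a single batch is an assumption.

```python
from typing import Any, Optional


def validate_batch(
    ids: list[str],
    vectors: list[list[float]],
    metadatas: Optional[list[Optional[dict[str, Any]]]],
    dim: int,
) -> None:
    """Raise ValueError if the batch is inconsistent; return None otherwise."""
    if len(ids) != len(vectors):
        raise ValueError(f"{len(ids)} ids but {len(vectors)} vectors")
    if metadatas is not None and len(metadatas) != len(ids):
        raise ValueError(f"{len(ids)} ids but {len(metadatas)} metadatas")
    if len(set(ids)) != len(ids):
        # Assumption: duplicates within one batch are treated as an error.
        raise ValueError("duplicate IDs within the batch")
    for id_, vector in zip(ids, vectors):
        if len(vector) != dim:
            raise ValueError(f"{id_}: expected dim {dim}, got {len(vector)}")


validate_batch(["a", "b"], [[0.1, 0.2], [0.3, 0.4]], [{"tag": "x"}, None], dim=2)
try:
    validate_batch(["a", "b"], [[0.1, 0.2]], None, dim=2)
except ValueError as exc:
    print("rejected:", exc)
```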
upsert_many
upsert_many(ids: list[str], vectors: Union[list[list[float]], NDArray[float32]], metadatas: Optional[list[Optional[dict[str, Any]]]] = None) -> None
Batch upsert: inserts new vectors and updates existing ones.
More efficient than calling :meth:`upsert` in a loop because new vectors are batch-inserted together.
| PARAMETER | DESCRIPTION |
|---|---|
| ids | List of identifiers. TYPE: list[str] |
| vectors | Batch of embedding vectors. TYPE: Union[list[list[float]], NDArray[float32]] |
| metadatas | Optional list of metadata dicts. TYPE: Optional[list[Optional[dict[str, Any]]]] |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If lengths mismatch or dimensions don't match. |
search
search(vector: Union[list[float], NDArray[float32]], k: int = 10, ef_search: Optional[int] = None, where_filter: Optional[dict[str, Any]] = None, max_distance: Optional[float] = None) -> list[SearchResult]
Find the k nearest neighbors to a query vector.
| PARAMETER | DESCRIPTION |
|---|---|
| vector | The query embedding. TYPE: Union[list[float], NDArray[float32]] |
| k | Number of results to return (default 10). TYPE: int |
| ef_search | HNSW search-time width. Higher values improve recall at the cost of latency. When omitted, a default value is used. TYPE: Optional[int] |
| where_filter | Optional metadata filter. TYPE: Optional[dict[str, Any]] |
| max_distance | Optional distance threshold. Results with distance greater than this value are discarded. Useful for finding only "close enough" matches. TYPE: Optional[float] |

| RETURNS | DESCRIPTION |
|---|---|
| list[SearchResult] | List of :class:`SearchResult` objects sorted by distance (ascending). |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the vector dimension doesn't match. |
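The result contract of search (at most k hits, ascending distance, an optional max_distance cutoff) can be illustrated with an exhaustive scan. This is not how the HNSW index works internally, since HNSW is an approximate graph search and far faster on large collections. The function brute_force_search and the choice of euclidean distance are assumptions made for this sketch.

```python
import math
from typing import Optional


def brute_force_search(
    vectors: dict[str, list[float]],
    query: list[float],
    k: int = 10,
    max_distance: Optional[float] = None,
) -> list[tuple[str, float]]:
    """Exhaustively score every vector, then apply the cutoff and top-k."""
    scored = [(id_, math.dist(query, vec)) for id_, vec in vectors.items()]
    if max_distance is not None:
        # Discard results farther away than the threshold.
        scored = [(id_, d) for id_, d in scored if d <= max_distance]
    scored.sort(key=lambda pair: pair[1])  # ascending: closest first
    return scored[:k]


corpus = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}
hits = brute_force_search(corpus, [0.0, 0.0], k=2)
close = brute_force_search(corpus, [0.0, 0.0], k=10, max_distance=1.5)
print(hits)
print(close)
```

A scan like this is also a handy ground truth when measuring how recall changes with ef_search on a small sample.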
search_many
search_many(vectors: Union[list[list[float]], NDArray[float32]], k: int = 10, ef_search: Optional[int] = None, where_filter: Optional[dict[str, Any]] = None, max_distance: Optional[float] = None) -> list[list[SearchResult]]
Search multiple queries in parallel using Rayon.
Significantly faster than calling :meth:`search` in a loop for multiple queries.
| PARAMETER | DESCRIPTION |
|---|---|
| vectors | Batch of query embeddings (list of lists or 2-D numpy array). TYPE: Union[list[list[float]], NDArray[float32]] |
| k | Number of results per query (default 10). TYPE: int |
| ef_search | HNSW search-time width. TYPE: Optional[int] |
| where_filter | Optional metadata filter (same syntax as :meth:`search`). TYPE: Optional[dict[str, Any]] |
| max_distance | Optional distance threshold applied to all queries. TYPE: Optional[float] |

| RETURNS | DESCRIPTION |
|---|---|
| list[list[SearchResult]] | List of result lists, one per query. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If any vector dimension doesn't match. |
get
Retrieve a vector and its metadata by ID.
| PARAMETER | DESCRIPTION |
|---|---|
| id | The vector's unique identifier. |

| RETURNS | DESCRIPTION |
|---|---|
| tuple[list[float], Optional[dict[str, Any]]] | A tuple of (vector, metadata). |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the ID is not found. |
delete
Delete a vector by ID.
The slot is marked as deleted but not reclaimed until :meth:`compact` is called.
| PARAMETER | DESCRIPTION |
|---|---|
| id | The vector's unique identifier. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the vector was found and deleted, False if the ID was not found. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | On internal errors (e.g. WAL write failure). |
delete_many
Delete multiple vectors by ID.
More efficient than calling :meth:`delete` in a loop because locks are held once for the entire batch.
| PARAMETER | DESCRIPTION |
|---|---|
| ids | List of vector IDs to delete. |

| RETURNS | DESCRIPTION |
|---|---|
| int | Number of vectors actually deleted (IDs not found are skipped). |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | On internal errors (e.g. WAL write failure). |
update
update(id: str, vector: Optional[Union[list[float], NDArray[float32]]] = None, metadata: Optional[dict[str, Any]] = None) -> None
Update a vector's embedding and/or metadata in-place.
At least one of vector or metadata must be provided.
| PARAMETER | DESCRIPTION |
|---|---|
| id | The vector's unique identifier. TYPE: str |
| vector | New embedding vector (or None to leave it unchanged). TYPE: Optional[Union[list[float], NDArray[float32]]] |
| metadata | New metadata dict (or None to leave it unchanged). TYPE: Optional[dict[str, Any]] |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the ID is not found or the vector dimension doesn't match. |
count
Count vectors matching a filter, or all vectors if no filter is given.
Uses the inverted metadata index for fast counting with equality and $in filters.
| PARAMETER | DESCRIPTION |
|---|---|
| where_filter | Optional metadata filter (same syntax as :meth:`search`). |

| RETURNS | DESCRIPTION |
|---|---|
| int | Number of matching vectors. |
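To see why equality and $in filters can be counted quickly, consider a minimal inverted index that maps each (field, value) pair to the set of IDs carrying it. The filter shapes used here ({"field": value} and {"field": {"$in": [...]}}), the AND semantics across fields, and all helper names are assumptions made for illustration; this reference does not spell out the filter syntax or the index layout.

```python
from collections import defaultdict
from typing import Any


def build_inverted_index(
    metadata_by_id: dict[str, dict[str, Any]],
) -> dict[tuple[str, Any], set[str]]:
    """Map each (field, value) pair to the IDs whose metadata contains it.

    Assumes metadata values are hashable scalars (str, int, float, bool).
    """
    index: dict[tuple[str, Any], set[str]] = defaultdict(set)
    for id_, metadata in metadata_by_id.items():
        for field, value in metadata.items():
            index[(field, value)].add(id_)
    return index


def count_matching(
    index: dict[tuple[str, Any], set[str]],
    all_ids: set[str],
    where_filter: dict[str, Any],
) -> int:
    """Count IDs matching every clause (clauses are ANDed, an assumption)."""
    matching = set(all_ids)
    for field, condition in where_filter.items():
        if isinstance(condition, dict) and "$in" in condition:
            # $in: union of the ID sets for each allowed value.
            ids = set().union(
                *(index.get((field, v), set()) for v in condition["$in"])
            )
        else:
            # Equality: a single lookup in the index.
            ids = index.get((field, condition), set())
        matching &= ids
    return len(matching)


docs = {
    "d1": {"lang": "en", "year": 2023},
    "d2": {"lang": "fr", "year": 2023},
    "d3": {"lang": "en", "year": 2024},
}
index = build_inverted_index(docs)
print(count_matching(index, set(docs), {"lang": "en"}))
print(count_matching(index, set(docs), {"year": {"$in": [2023, 2024]}, "lang": "en"}))
```

Both filter kinds reduce to set lookups and set operations, so no vector data has to be touched to produce the count.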
save
Persist the database to disk.
Writes the HNSW graph, metadata, and vectors to the database directory and truncates the write-ahead log.
| RAISES | DESCRIPTION |
|---|---|
| ValueError | On I/O errors (e.g. disk full, permission denied). |
compact
Rebuild the index with only live vectors.
Reclaims slots from deleted vectors, reducing memory usage and on-disk size. Call after many deletions.
| RAISES | DESCRIPTION |
|---|---|
| ValueError | On internal errors. |
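The interplay between delete, deleted_count, and compact can be pictured with a slot array and tombstones. The class below is a hypothetical toy, not the library's storage engine: delete only marks a slot as dead, and compact rebuilds the array from the live entries.

```python
from typing import Optional


class SlotArray:
    """Toy slot storage: deletions leave tombstones until compact() runs."""

    def __init__(self) -> None:
        self.slots: list[Optional[tuple[str, list[float]]]] = []
        self.slot_of: dict[str, int] = {}

    def add(self, id: str, vector: list[float]) -> None:
        self.slot_of[id] = len(self.slots)
        self.slots.append((id, vector))

    def delete(self, id: str) -> bool:
        slot = self.slot_of.pop(id, None)
        if slot is None:
            return False  # ID not found
        self.slots[slot] = None  # tombstone; the slot stays allocated
        return True

    @property
    def deleted_count(self) -> int:
        return sum(1 for entry in self.slots if entry is None)

    def compact(self) -> None:
        # Keep only live entries and renumber their slots.
        live = [entry for entry in self.slots if entry is not None]
        self.slots = live
        self.slot_of = {id_: i for i, (id_, _) in enumerate(live)}


store = SlotArray()
for name in ("a", "b", "c"):
    store.add(name, [0.0])
store.delete("b")
print(len(store.slots), store.deleted_count)  # slot count is unchanged by delete
store.compact()
print(len(store.slots), store.deleted_count)
```

Watching deleted_count relative to the total size is one reasonable way to decide when a compact call is worth its rebuild cost.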
enable_quantized_search
Enable SQ8 quantized search for faster HNSW traversal.
Quantized vectors use 4x less memory for distance comparisons during graph traversal, with full-precision re-ranking of final candidates.
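The 4x figure follows from storing each component in one byte instead of a four-byte float32. The sketch below shows one simple form of 8-bit scalar quantization using a per-vector min/max range. The function names and the per-vector range are assumptions; the library's actual SQ8 scheme may differ (for example, it may use per-dimension ranges).

```python
def quantize_sq8(vector: list[float]) -> tuple[bytes, float, float]:
    """Map each float to an integer code in 0..255 using the vector's min/max."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = bytes(round((x - lo) / scale) for x in vector)
    return codes, lo, scale


def dequantize_sq8(codes: bytes, lo: float, scale: float) -> list[float]:
    """Approximate reconstruction of the original floats."""
    return [lo + code * scale for code in codes]


vector = [0.12, -0.53, 0.98, 0.0]
codes, lo, scale = quantize_sq8(vector)
approx = dequantize_sq8(codes, lo, scale)

print(len(codes), "bytes quantized vs", 4 * len(vector), "bytes as float32")
print(max(abs(a - b) for a, b in zip(vector, approx)))  # reconstruction error
```

The reconstruction error is bounded by half a quantization step, which is why re-ranking the final candidates at full precision recovers most of the lost accuracy.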
disable_quantized_search
Disable quantized search and use full-precision vectors.
stats
Get graph-level statistics for diagnostics.
| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict of graph-level statistics. |
export_json
Export all vectors and metadata to a JSON file.
| PARAMETER | DESCRIPTION |
|---|---|
| path | File path to write the JSON export. |
| pretty | If True, pretty-prints the JSON output. |
import_json
Import vectors from a JSON file (upsert semantics).
Updates existing IDs and inserts new ones.
| PARAMETER | DESCRIPTION |
|---|---|
| path | File path to read the JSON export from. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the JSON dimension doesn't match. |
Client
Multi-collection client for managing named vector databases.
Each collection is stored in its own subdirectory under the root path.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Root directory for all collections. |
Example::

    client = Client("/data/vectors")
    movies = client.create_collection("movies", dim=384)
    docs = client.get_or_create_collection("docs", dim=768)
    print(client.list_collections())  # ["docs", "movies"]
create_collection
create_collection(name: str, dim: int, metric: Optional[str] = None, m: Optional[int] = None, ef_construction: Optional[int] = None, quantize: bool = False) -> Database
Create a new collection.
| PARAMETER | DESCRIPTION |
|---|---|
| name | Collection name (alphanumeric, hyphens, underscores). TYPE: str |
| dim | Vector dimensionality. TYPE: int |
| metric | Distance metric. TYPE: Optional[str] |
| m | HNSW M parameter (default 16). TYPE: Optional[int] |
| ef_construction | HNSW build-time width (default 200). TYPE: Optional[int] |
| quantize | Enable SQ8 quantization. TYPE: bool |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the collection already exists or the name is invalid. |
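The naming rule (alphanumeric, hyphens, underscores) can be checked before calling create_collection, which avoids a ValueError for user-supplied names. The regular expression below is an assumption based on that description; the library's exact rule (for instance length limits, or whether non-ASCII letters count as alphanumeric) is not stated here, and is_valid_collection_name is a hypothetical helper.

```python
import re

# Assumed pattern: one or more ASCII letters, digits, hyphens, or underscores.
_VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_collection_name(name: str) -> bool:
    """Return True if the name uses only the documented character classes."""
    return _VALID_NAME.fullmatch(name) is not None


for candidate in ("movies", "docs-v2", "my_docs", "bad/name", ""):
    print(repr(candidate), is_valid_collection_name(candidate))
```

Since each collection lives in its own subdirectory, rejecting path separators and dots early also keeps names from pointing outside the root directory.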
get_collection
get_collection(name: str) -> Database
Open an existing collection.
| PARAMETER | DESCRIPTION |
|---|---|
| name | Collection name. TYPE: str |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the collection doesn't exist. |
get_or_create_collection
get_or_create_collection(name: str, dim: int, metric: Optional[str] = None) -> Database
Get or create a collection.
If the collection exists, opens it (dim/metric are ignored). Otherwise creates a new one.
| PARAMETER | DESCRIPTION |
|---|---|
| name | Collection name. TYPE: str |
| dim | Vector dimensionality (used only for creation). TYPE: int |
| metric | Distance metric (used only for creation). TYPE: Optional[str] |
delete_collection
Delete a collection and all its data.
| PARAMETER | DESCRIPTION |
|---|---|
| name | Collection name. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | |
list_collections
List all collection names (sorted alphabetically).
AsyncDatabase
AsyncDatabase(path: str, dim: Optional[int] = None, metric: Optional[str] = None, m: Optional[int] = None, ef_construction: Optional[int] = None, quantize: bool = False)
Async wrapper around Database. All methods are non-blocking and run the underlying Rust operations in a thread pool via asyncio.to_thread().
Usage::

    async with AsyncDatabase("my_db", dim=384) as db:
        await db.add("id1", vector, {"key": "value"})
        results = await db.search(query_vector, k=10)
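The wrapping pattern described for AsyncDatabase can be sketched with a small stand-in. BlockingStore and AsyncStore are hypothetical classes written for this illustration (they are not the library's Database or AsyncDatabase); only asyncio.to_thread is a real standard-library call, available from Python 3.9.

```python
import asyncio
import time


class BlockingStore:
    """Hypothetical synchronous class standing in for a blocking database."""

    def search(self, query: list[float], k: int = 10) -> list[str]:
        time.sleep(0.05)  # simulate work that would otherwise block the loop
        return [f"hit-{i}" for i in range(k)]


class AsyncStore:
    """Async facade: each call runs the blocking method in a worker thread."""

    def __init__(self) -> None:
        self._inner = BlockingStore()

    async def search(self, query: list[float], k: int = 10) -> list[str]:
        return await asyncio.to_thread(self._inner.search, query, k)


async def main() -> list[list[str]]:
    store = AsyncStore()
    # The two searches overlap because neither blocks the event loop.
    return await asyncio.gather(
        store.search([0.1, 0.2], k=2),
        store.search([0.3, 0.4], k=3),
    )


results = asyncio.run(main())
print(results)
```

The event loop stays free to serve other coroutines while a worker thread waits on the blocking call, which is the main benefit in a web server or similar async application.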
open
async
classmethod
open(path: str, dim: Optional[int] = None, metric: Optional[str] = None, m: Optional[int] = None, ef_construction: Optional[int] = None, quantize: bool = False) -> AsyncDatabase
Async factory method to open or create a database.
add
async
add(id: str, vector: Union[list[float], NDArray[float32]], metadata: Optional[dict[str, Any]] = None) -> None
Add a vector with a unique string ID.
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the ID already exists or the vector dimension doesn't match. |
upsert
async
upsert(id: str, vector: Union[list[float], NDArray[float32]], metadata: Optional[dict[str, Any]] = None) -> None
Insert a vector, or update it if the ID already exists.
add_many
async
add_many(ids: list[str], vectors: Union[list[list[float]], NDArray[float32]], metadatas: Optional[list[Optional[dict[str, Any]]]] = None) -> None
Batch insert multiple vectors.
upsert_many
async
upsert_many(ids: list[str], vectors: Union[list[list[float]], NDArray[float32]], metadatas: Optional[list[Optional[dict[str, Any]]]] = None) -> None
Batch upsert: inserts new vectors and updates existing ones.
search
async
search(vector: Union[list[float], NDArray[float32]], k: int = 10, ef_search: Optional[int] = None, where_filter: Optional[dict[str, Any]] = None, max_distance: Optional[float] = None) -> list[SearchResult]
Find the k nearest neighbors to a query vector.
| PARAMETER | DESCRIPTION |
|---|---|
| vector | The query embedding. TYPE: Union[list[float], NDArray[float32]] |
| k | Number of results to return (default 10). TYPE: int |
| ef_search | HNSW search-time width. TYPE: Optional[int] |
| where_filter | Optional metadata filter. TYPE: Optional[dict[str, Any]] |
| max_distance | Optional distance threshold. TYPE: Optional[float] |

| RETURNS | DESCRIPTION |
|---|---|
| list[SearchResult] | List of SearchResult objects sorted by distance. |
search_many
async
search_many(vectors: Union[list[list[float]], NDArray[float32]], k: int = 10, ef_search: Optional[int] = None, where_filter: Optional[dict[str, Any]] = None, max_distance: Optional[float] = None) -> list[list[SearchResult]]
Search multiple queries in parallel.
| RETURNS | DESCRIPTION |
|---|---|
| list[list[SearchResult]] | List of result lists, one per query. |
get
async
Retrieve a vector and its metadata by ID.
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the ID is not found. |
delete
async
Delete a vector by ID.
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if found and deleted, False if not found. |
delete_many
async
Delete multiple vectors by ID.
| RETURNS | DESCRIPTION |
|---|---|
| int | Number of vectors actually deleted. |
update
async
update(id: str, vector: Optional[Union[list[float], NDArray[float32]]] = None, metadata: Optional[dict[str, Any]] = None) -> None
Update a vector's embedding and/or metadata.
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the ID is not found. |
count
Count vectors matching a filter, or all vectors if no filter.
SearchResult
A single search result returned by :meth:`Database.search`.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| id | The unique string identifier of the matched vector. |
| distance | The distance between the query and this vector. Lower is more similar for cosine and euclidean metrics. |
| metadata | The metadata dict attached to this vector, or None if it has none. |
Supports indexing (result[0] → id, result[1] → distance, result[2] → metadata) for tuple-style destructuring.
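Tuple-style access means a result can be unpacked directly in a loop. The NamedTuple below is a hypothetical stand-in with the same three fields, used only so that the snippet runs on its own; the real SearchResult class comes from the library and may be implemented differently.

```python
from typing import Any, NamedTuple, Optional


class FakeResult(NamedTuple):
    """Stand-in with the documented fields: id, distance, metadata."""

    id: str
    distance: float
    metadata: Optional[dict[str, Any]]


results = [
    FakeResult("doc1", 0.12, {"title": "Hello"}),
    FakeResult("doc2", 0.34, None),
]

for result in results:
    # Attribute access and index access refer to the same values.
    assert result[0] == result.id and result[1] == result.distance

# Tuple-style destructuring of each result.
for id_, distance, metadata in results:
    print(id_, distance, metadata)
```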