Python API Reference¶

Database¶

Database ¶

Database(path: str, dim: Optional[int] = None, metric: Optional[str] = None, m: Optional[int] = None, ef_construction: Optional[int] = None, quantize: bool = False)

A persistent vector database backed by an HNSW index.

Stores vectors with string IDs and optional JSON metadata. Supports cosine, euclidean (L2), and dot-product distance metrics. Uses memory-mapped I/O for fast loading and a write-ahead log for crash safety.

PARAMETER	DESCRIPTION
`path`	Directory path for the database files. Created if it doesn't exist. TYPE: `str`
`dim`	Vector dimensionality (e.g. 384 for MiniLM). Required when creating a new database; omit it when opening an existing one. TYPE: `Optional[int]` DEFAULT: `None`
`metric`	Distance metric — `"cosine"` (default), `"euclidean"` / `"l2"`, or `"dot"` / `"dot_product"`. TYPE: `Optional[str]` DEFAULT: `None`
`m`	HNSW `M` parameter — max edges per node (default 16). Higher values improve recall at the cost of memory. TYPE: `Optional[int]` DEFAULT: `None`
`ef_construction`	HNSW build-time search width (default 200). Higher values improve index quality at the cost of build time. TYPE: `Optional[int]` DEFAULT: `None`
`quantize`	If `True`, enable scalar quantization (SQ8) for faster search with slightly lower precision. TYPE: `bool` DEFAULT: `False`

RAISES	DESCRIPTION
`ValueError`	If `dim` is omitted and no database exists at `path`, or if `metric` is invalid.

Example::

with Database("my_db", dim=384) as db:
    db.add("doc1", embedding, {"title": "Hello"})
    results = db.search(query_vector, k=5)
    for r in results:
        print(r.id, r.distance)

Example (opening existing)::

db = Database("my_db")  # auto-detects dim and metric

quantized_search `property` ¶

quantized_search: bool

Whether quantized search is currently enabled.

deleted_count `property` ¶

deleted_count: int

Number of deleted slots not yet reclaimed by `compact`.

total_slots `property` ¶

total_slots: int

Total allocated slots (active + deleted).

dim `property` ¶

dim: int

The vector dimensionality of this database.

metric `property` ¶

metric: str

The distance metric: "cosine", "euclidean", or "dot_product".
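
For intuition about the three metrics, the sketch below computes each one in pure Python. The exact formulas are assumptions, since the reference only names the metrics: cosine distance is taken as 1 minus cosine similarity, euclidean distance as the unsquared L2 norm, and dot-product distance as the negated inner product so that smaller still means closer. Check real search output before relying on specific values.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # Assumed convention: 1 - cosine similarity (0.0 means same direction).
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    # Assumed convention: plain (unsquared) L2 distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product_distance(a, b):
    # Assumed convention: negated inner product, so larger dot = smaller distance.
    return -dot(a, b)

a = [1.0, 0.0]
b = [0.0, 1.0]
c = [2.0, 0.0]

print(cosine_distance(a, b), cosine_distance(a, c))
print(euclidean_distance(a, b), euclidean_distance(a, c))
print(dot_product_distance(a, c))
```

Note that `a` and `c` point in the same direction, so their cosine distance is zero even though their euclidean distance is not.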

add ¶

add(id: str, vector: Union[list[float], NDArray[float32]], metadata: Optional[dict[str, Any]] = None) -> None

Add a vector with a unique string ID.

PARAMETER	DESCRIPTION
`id`	Unique identifier. Raises `ValueError` if the ID already exists (use `upsert` to insert or update). TYPE: `str`
`vector`	The embedding vector. Must match the database's dimensionality. Accepts a Python list or a 1-D numpy array of float32. TYPE: `Union[list[float], NDArray[float32]]`
`metadata`	Optional JSON-serializable dict of metadata (e.g. `{"category": "science", "year": 2024}`). TYPE: `Optional[dict[str, Any]]` DEFAULT: `None`

RAISES	DESCRIPTION
`ValueError`	If the ID already exists or the vector dimension doesn't match.

upsert ¶

upsert(id: str, vector: Union[list[float], NDArray[float32]], metadata: Optional[dict[str, Any]] = None) -> None

Insert a vector, or update it if the ID already exists.

Same as `add`, but overwrites existing entries instead of raising an error.

PARAMETER	DESCRIPTION
`id`	Unique identifier. TYPE: `str`
`vector`	The embedding vector. TYPE: `Union[list[float], NDArray[float32]]`
`metadata`	Optional metadata dict. TYPE: `Optional[dict[str, Any]]` DEFAULT: `None`

RAISES	DESCRIPTION
`ValueError`	If the vector dimension doesn't match.

add_many ¶

add_many(ids: list[str], vectors: Union[list[list[float]], NDArray[float32]], metadatas: Optional[list[Optional[dict[str, Any]]]] = None) -> None

Batch insert multiple vectors.

`ids`, `vectors`, and `metadatas` (when provided) must have the same length. Vectors can be passed as a list of lists or a 2-D numpy array of shape `(n, dim)`.

PARAMETER	DESCRIPTION
`ids`	List of unique identifiers. TYPE: `list[str]`
`vectors`	Batch of embedding vectors. TYPE: `Union[list[list[float]], NDArray[float32]]`
`metadatas`	Optional list of metadata dicts (or `None` per entry). TYPE: `Optional[list[Optional[dict[str, Any]]]]` DEFAULT: `None`

RAISES	DESCRIPTION
`ValueError`	If any ID already exists, lengths mismatch, or dimensions don't match.
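
The length and dimension rules above can be checked before calling `add_many`, which gives clearer error messages on large batches. The helper below is hypothetical (it is not part of this API) and only mirrors the documented constraints; the duplicate-ID check within a batch is an added assumption.

```python
def validate_batch(ids, vectors, metadatas, dim):
    """Hypothetical pre-flight check for a batch insert (not part of the library)."""
    if len(ids) != len(vectors):
        raise ValueError(f"got {len(ids)} ids but {len(vectors)} vectors")
    if metadatas is not None and len(metadatas) != len(ids):
        raise ValueError(f"got {len(ids)} ids but {len(metadatas)} metadata entries")
    # Assumption: IDs repeated inside one batch are treated as an error.
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate IDs within the batch")
    for id_, vec in zip(ids, vectors):
        if len(vec) != dim:
            raise ValueError(f"vector for {id_!r} has length {len(vec)}, expected {dim}")

# A valid batch passes silently; None is allowed per metadata entry.
validate_batch(["a", "b"], [[0.1, 0.2], [0.3, 0.4]], [{"k": 1}, None], dim=2)
```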

upsert_many ¶

upsert_many(ids: list[str], vectors: Union[list[list[float]], NDArray[float32]], metadatas: Optional[list[Optional[dict[str, Any]]]] = None) -> None

Batch upsert — inserts new vectors, updates existing ones.

More efficient than calling `upsert` in a loop because new vectors are batch-inserted together.

PARAMETER	DESCRIPTION
`ids`	List of identifiers. TYPE: `list[str]`
`vectors`	Batch of embedding vectors. TYPE: `Union[list[list[float]], NDArray[float32]]`
`metadatas`	Optional list of metadata dicts. TYPE: `Optional[list[Optional[dict[str, Any]]]]` DEFAULT: `None`

RAISES	DESCRIPTION
`ValueError`	If lengths mismatch or dimensions don't match.

search ¶

search(vector: Union[list[float], NDArray[float32]], k: int = 10, ef_search: Optional[int] = None, where_filter: Optional[dict[str, Any]] = None, max_distance: Optional[float] = None) -> list[SearchResult]

Find the k nearest neighbors to a query vector.

PARAMETER	DESCRIPTION
`vector`	The query embedding. TYPE: `Union[list[float], NDArray[float32]]`
`k`	Number of results to return (default 10). TYPE: `int` DEFAULT: `10`
`ef_search`	HNSW search-time width. Higher values improve recall at the cost of latency. Defaults to `max(k, 10)`. TYPE: `Optional[int]` DEFAULT: `None`
`where_filter`	Optional metadata filter. Supports: Equality: `{"field": "value"}` Not-equal: `{"field": {"$ne": "value"}}` In-set: `{"field": {"$in": ["a", "b"]}}` Numeric ranges: `{"field": {"$gt": 10, "$lte": 20}}` Compound (AND): `{"f1": "v1", "f2": {"$gt": 5}}` TYPE: `Optional[dict[str, Any]]` DEFAULT: `None`
`max_distance`	Optional distance threshold. Results with distance greater than this value are discarded. Useful for finding only "close enough" matches. TYPE: `Optional[float]` DEFAULT: `None`

RETURNS	DESCRIPTION
`list[SearchResult]`	List of `SearchResult` objects sorted by distance (ascending).

RAISES	DESCRIPTION
`ValueError`	If the vector dimension doesn't match.
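
To make the `where_filter` syntax concrete, here is a small pure-Python matcher that evaluates a filter against one metadata dict. It is a sketch of the documented semantics, not the library's implementation. The operators `$gte` and `$lt` are assumed by analogy with the `$gt` and `$lte` shown above, and the handling of missing fields (treated as `None`) is also an assumption.

```python
from typing import Any, Optional

_OPS = {
    "$ne": lambda value, arg: value != arg,
    "$in": lambda value, arg: value in arg,
    "$gt": lambda value, arg: value is not None and value > arg,
    "$gte": lambda value, arg: value is not None and value >= arg,  # assumed operator
    "$lt": lambda value, arg: value is not None and value < arg,    # assumed operator
    "$lte": lambda value, arg: value is not None and value <= arg,
}

def matches(metadata: Optional[dict], where_filter: dict) -> bool:
    """Return True if every condition in where_filter holds (compound AND)."""
    metadata = metadata or {}
    for field, cond in where_filter.items():
        value = metadata.get(field)  # assumption: a missing field behaves like None
        if isinstance(cond, dict):
            for op, arg in cond.items():
                if op not in _OPS:
                    raise ValueError(f"unsupported operator: {op}")
                if not _OPS[op](value, arg):
                    return False
        elif value != cond:
            return False
    return True

meta = {"category": "science", "year": 2024}
print(matches(meta, {"category": "science", "year": {"$gt": 2020, "$lte": 2024}}))
```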

search_many ¶

search_many(vectors: Union[list[list[float]], NDArray[float32]], k: int = 10, ef_search: Optional[int] = None, where_filter: Optional[dict[str, Any]] = None, max_distance: Optional[float] = None) -> list[list[SearchResult]]

Search multiple queries in parallel using Rayon.

Significantly faster than calling `search` in a loop for multiple queries.

PARAMETER	DESCRIPTION
`vectors`	Batch of query embeddings (list of lists or 2-D numpy array). TYPE: `Union[list[list[float]], NDArray[float32]]`
`k`	Number of results per query (default 10). TYPE: `int` DEFAULT: `10`
`ef_search`	HNSW search-time width. TYPE: `Optional[int]` DEFAULT: `None`
`where_filter`	Optional metadata filter (same syntax as `search`). Applied to all queries. TYPE: `Optional[dict[str, Any]]` DEFAULT: `None`
`max_distance`	Optional distance threshold applied to all queries. TYPE: `Optional[float]` DEFAULT: `None`

RETURNS	DESCRIPTION
`list[list[SearchResult]]`	List of result lists, one per query.

RAISES	DESCRIPTION
`ValueError`	If any vector dimension doesn't match.
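
When checking recall or debugging `max_distance`, an exact reference is handy. The sketch below is a sequential brute-force search over an in-memory dict with euclidean distance; it illustrates the output shape (one result list per query) and the threshold rule (results with distance greater than `max_distance` are dropped). The function names and the `(id, distance)` tuple format are illustrative and do not come from the library.

```python
import heapq
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_search(stored, query, k=10, max_distance=None):
    """Exact k-NN over a dict of id -> vector; returns (id, distance) pairs."""
    scored = ((id_, l2(vec, query)) for id_, vec in stored.items())
    if max_distance is not None:
        # Keep results at or below the threshold; larger distances are discarded.
        scored = (pair for pair in scored if pair[1] <= max_distance)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])

def brute_force_search_many(stored, queries, k=10, max_distance=None):
    # One result list per query, in the same order as the queries.
    return [brute_force_search(stored, q, k, max_distance) for q in queries]

stored = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}
results = brute_force_search_many(
    stored, [[0.0, 0.0], [5.0, 4.0]], k=2, max_distance=2.0
)
print(results)
```

The second query returns fewer than `k` results because only one stored vector lies within the threshold.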

get ¶

get(id: str) -> tuple[list[float], Optional[dict[str, Any]]]

Retrieve a vector and its metadata by ID.

PARAMETER	DESCRIPTION
`id`	The vector's unique identifier. TYPE: `str`

RETURNS	DESCRIPTION
`tuple[list[float], Optional[dict[str, Any]]]`	A tuple of `(vector, metadata)`, where `metadata` may be `None`.

RAISES	DESCRIPTION
`ValueError`	If the ID is not found.

delete ¶

delete(id: str) -> bool

Delete a vector by ID.

The slot is marked as deleted but not reclaimed until `compact` is called.

PARAMETER	DESCRIPTION
`id`	The vector's unique identifier. TYPE: `str`

RETURNS	DESCRIPTION
`bool`	`True` if the vector was found and deleted, `False` if the ID was not found.

RAISES	DESCRIPTION
`ValueError`	On internal errors (e.g. WAL write failure).

delete_many ¶

delete_many(ids: list[str]) -> int

Delete multiple vectors by ID.

More efficient than calling `delete` in a loop because locks are held once for the entire batch.

PARAMETER	DESCRIPTION
`ids`	List of vector IDs to delete. TYPE: `list[str]`

RETURNS	DESCRIPTION
`int`	Number of vectors actually deleted (IDs not found are skipped).

RAISES	DESCRIPTION
`ValueError`	On internal errors (e.g. WAL write failure).

update ¶

update(id: str, vector: Optional[Union[list[float], NDArray[float32]]] = None, metadata: Optional[dict[str, Any]] = None) -> None

Update a vector's embedding and/or metadata in-place.

At least one of vector or metadata must be provided.

PARAMETER	DESCRIPTION
`id`	The vector's unique identifier. TYPE: `str`
`vector`	New embedding vector (or `None` to keep existing). TYPE: `Optional[Union[list[float], NDArray[float32]]]` DEFAULT: `None`
`metadata`	New metadata dict (or `None` to keep existing). TYPE: `Optional[dict[str, Any]]` DEFAULT: `None`

RAISES	DESCRIPTION
`ValueError`	If the ID is not found or the vector dimension doesn't match.

count ¶

count(where_filter: Optional[dict[str, Any]] = None) -> int

Count vectors matching a filter, or all vectors if no filter.

Uses the inverted metadata index for fast counting with equality and $in filters.

PARAMETER	DESCRIPTION
`where_filter`	Optional metadata filter (same syntax as `search`). If `None`, returns total count. TYPE: `Optional[dict[str, Any]]` DEFAULT: `None`

RETURNS	DESCRIPTION
`int`	Number of matching vectors.
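
As an illustration of why equality and `$in` filters can be counted quickly, the toy inverted index below maps each metadata field and value to the set of IDs that carry it, so a count becomes a few set unions and intersections rather than a scan. This is a conceptual sketch only; the class and its layout are assumptions and say nothing about the library's internal data structures.

```python
from collections import defaultdict

class InvertedIndex:
    """Toy inverted index: field -> value -> set of IDs (illustrative only)."""

    def __init__(self):
        self._postings = defaultdict(lambda: defaultdict(set))
        self._all_ids = set()

    def add(self, id_, metadata):
        self._all_ids.add(id_)
        for field, value in (metadata or {}).items():
            self._postings[field][value].add(id_)

    def count(self, where_filter=None):
        if not where_filter:
            return len(self._all_ids)
        candidates = None
        for field, cond in where_filter.items():
            if isinstance(cond, dict) and set(cond) == {"$in"}:
                values = cond["$in"]
            elif not isinstance(cond, dict):
                values = [cond]
            else:
                raise ValueError("this sketch only handles equality and $in")
            # Union over the allowed values for this field...
            ids = set()
            for value in values:
                ids |= self._postings.get(field, {}).get(value, set())
            # ...then intersect across fields (compound AND).
            candidates = ids if candidates is None else candidates & ids
        return len(candidates)

index = InvertedIndex()
index.add("d1", {"category": "science", "year": 2024})
index.add("d2", {"category": "science", "year": 2023})
index.add("d3", {"category": "art", "year": 2024})
print(index.count({"category": "science", "year": 2024}))
```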

ids ¶

ids() -> list[str]

Return a list of all vector IDs in the database.

save ¶

save() -> None

Persist the database to disk.

Writes the HNSW graph, metadata, and vectors to the database directory and truncates the write-ahead log.

RAISES	DESCRIPTION
`ValueError`	On I/O errors (e.g. disk full, permission denied).
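
The interaction between the write-ahead log and `save` can be pictured with a toy store: every write is appended to a log before it is applied, reopening replays the log on top of the last snapshot, and saving writes a fresh snapshot and then truncates the log. The file names (`snapshot.json`, `wal.jsonl`) and the JSON-lines format are invented for this sketch and are not the library's on-disk layout.

```python
import json
import os
import tempfile

class TinyStore:
    """Toy key-value store with a JSON-lines write-ahead log (illustrative only)."""

    def __init__(self, directory):
        os.makedirs(directory, exist_ok=True)
        self._snapshot = os.path.join(directory, "snapshot.json")  # hypothetical name
        self._wal = os.path.join(directory, "wal.jsonl")           # hypothetical name
        self.data = {}
        if os.path.exists(self._snapshot):
            with open(self._snapshot) as f:
                self.data = json.load(f)
        # Replay operations logged after the last snapshot.
        if os.path.exists(self._wal):
            with open(self._wal) as f:
                for line in f:
                    op = json.loads(line)
                    self.data[op["id"]] = op["vector"]

    def add(self, id_, vector):
        # Log first, then apply, so the write survives a crash after this point.
        with open(self._wal, "a") as f:
            f.write(json.dumps({"id": id_, "vector": vector}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[id_] = vector

    def save(self):
        # Write the snapshot atomically, then truncate the log.
        tmp = self._snapshot + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp, self._snapshot)
        open(self._wal, "w").close()

with tempfile.TemporaryDirectory() as d:
    store = TinyStore(d)
    store.add("doc1", [0.1, 0.2])
    recovered = TinyStore(d).data  # simulates reopening without an explicit save
    store.save()
    wal_size_after_save = os.path.getsize(os.path.join(d, "wal.jsonl"))
    reopened = TinyStore(d).data
```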

compact ¶

compact() -> None

Rebuild the index with only live vectors.

Reclaims slots from deleted vectors, reducing memory usage and on-disk size. Call after many deletions.

RAISES	DESCRIPTION
`ValueError`	On internal errors.
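
The relationship between `delete`, `deleted_count`, `total_slots`, and `compact` can be sketched with a toy slot array that uses tombstones. The class below is a conceptual model under that assumption, not the library's storage engine, and it ignores the HNSW graph entirely.

```python
class SlotStore:
    """Toy slot array with tombstones (illustrative only)."""

    def __init__(self):
        self._slots = []   # each entry is (id, vector), or None for a deleted slot
        self._index = {}   # id -> slot position

    def add(self, id_, vector):
        if id_ in self._index:
            raise ValueError(f"ID already exists: {id_}")
        self._index[id_] = len(self._slots)
        self._slots.append((id_, vector))

    def delete(self, id_):
        pos = self._index.pop(id_, None)
        if pos is None:
            return False
        self._slots[pos] = None  # mark as deleted; the slot is not reclaimed yet
        return True

    @property
    def total_slots(self):
        return len(self._slots)

    @property
    def deleted_count(self):
        return sum(1 for slot in self._slots if slot is None)

    def compact(self):
        # Rebuild with live entries only, which reclaims the deleted slots.
        live = [slot for slot in self._slots if slot is not None]
        self._slots = live
        self._index = {id_: i for i, (id_, _) in enumerate(live)}

store = SlotStore()
for name in ("a", "b", "c"):
    store.add(name, [0.0])
store.delete("b")
before = (store.total_slots, store.deleted_count)
store.compact()
after = (store.total_slots, store.deleted_count)
print(before, after)
```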

enable_quantized_search ¶

enable_quantized_search() -> None

Enable SQ8 quantized search for faster HNSW traversal.

Quantized vectors use 4x less memory for distance comparisons during graph traversal, with full-precision re-ranking of final candidates.
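
The 4x figure follows from storing one byte per component instead of a 4-byte float32. The sketch below shows one common form of 8-bit scalar quantization using a per-vector minimum and scale; whether the library computes ranges per vector, per dimension, or globally is not stated in this reference, so treat the scheme as an assumption.

```python
from array import array

def sq8_encode(vector):
    """Map floats to 0..255 using the vector's own min/max (assumed scheme)."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = array("B", (min(255, round((x - lo) / scale)) for x in vector))
    return codes, lo, scale

def sq8_decode(codes, lo, scale):
    return [lo + code * scale for code in codes]

vector = [0.0, 0.25, 0.5, 1.0]
codes, lo, scale = sq8_encode(vector)
approx = sq8_decode(codes, lo, scale)

full = array("f", vector)  # float32 storage for comparison
print(full.itemsize, codes.itemsize)
print(max(abs(a - b) for a, b in zip(vector, approx)))
```

The reconstruction error per component is bounded by half a quantization step, which is why re-ranking the final candidates with full-precision vectors recovers most of the lost accuracy.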

disable_quantized_search ¶

disable_quantized_search() -> None

Disable quantized search and use full-precision vectors.

stats ¶

stats() -> dict[str, Any]

Get graph-level statistics for diagnostics.

RETURNS	DESCRIPTION
`dict[str, Any]`	Dict with keys: `num_vectors`, `num_deleted`, `num_layers`, `avg_degree_layer0`, `max_degree_layer0`, `min_degree_layer0`, `memory_vectors_bytes`, `memory_graph_bytes`, `memory_quantized_bytes`, `uses_brute_force`, `uses_quantized_search`.

export_json ¶

export_json(path: str, pretty: bool = False) -> None

Export all vectors and metadata to a JSON file.

PARAMETER	DESCRIPTION
`path`	File path to write the JSON export. TYPE: `str`
`pretty`	If `True`, pretty-print the JSON output. TYPE: `bool` DEFAULT: `False`

import_json ¶

import_json(path: str) -> None

Import vectors from a JSON file (upsert semantics).

Updates existing IDs and inserts new ones.

PARAMETER	DESCRIPTION
`path`	File path to read the JSON export from. TYPE: `str`

RAISES	DESCRIPTION
`ValueError`	If the dimension of the imported vectors doesn't match the database's dimensionality.

Client¶

Client ¶

Client(path: str)

Multi-collection client for managing named vector databases.

Each collection is stored in its own subdirectory under the root path.

PARAMETER	DESCRIPTION
`path`	Root directory for all collections. TYPE: `str`

Example::

client = Client("/data/vectors")
movies = client.create_collection("movies", dim=384)
docs = client.get_or_create_collection("docs", dim=768)
print(client.list_collections())  # ["docs", "movies"]

create_collection ¶

create_collection(name: str, dim: int, metric: Optional[str] = None, m: Optional[int] = None, ef_construction: Optional[int] = None, quantize: bool = False) -> Database

Create a new collection.

PARAMETER	DESCRIPTION
`name`	Collection name (alphanumeric, hyphens, underscores). TYPE: `str`
`dim`	Vector dimensionality. TYPE: `int`
`metric`	Distance metric (default `"cosine"`). TYPE: `Optional[str]` DEFAULT: `None`
`m`	HNSW M parameter (default 16). TYPE: `Optional[int]` DEFAULT: `None`
`ef_construction`	HNSW build-time width (default 200). TYPE: `Optional[int]` DEFAULT: `None`
`quantize`	Enable SQ8 quantization. TYPE: `bool` DEFAULT: `False`

RAISES	DESCRIPTION
`ValueError`	If the collection already exists or the name is invalid.
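
Because each collection lives in its own subdirectory, the name rule (alphanumeric, hyphens, underscores) also keeps names safe to use as path components. The helper below is hypothetical and only sketches that idea; the real validation may differ, for example in whether non-ASCII letters are accepted.

```python
import os
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")

def collection_path(root, name):
    """Hypothetical helper: validate a collection name and map it to a subdirectory."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid collection name: {name!r}")
    return os.path.join(root, name)

print(collection_path("/data/vectors", "movies"))
```

Rejecting characters such as `/` and `.` means a name like `../other` cannot escape the root directory.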

get_collection ¶

get_collection(name: str) -> Database

Open an existing collection.

PARAMETER	DESCRIPTION
`name`	Collection name. TYPE: `str`

RAISES	DESCRIPTION
`ValueError`	If the collection doesn't exist.

get_or_create_collection ¶

get_or_create_collection(name: str, dim: int, metric: Optional[str] = None) -> Database

Get or create a collection.

If the collection exists, opens it (dim/metric are ignored). Otherwise creates a new one.

PARAMETER	DESCRIPTION
`name`	Collection name. TYPE: `str`
`dim`	Vector dimensionality (used only for creation). TYPE: `int`
`metric`	Distance metric (used only for creation). TYPE: `Optional[str]` DEFAULT: `None`

delete_collection ¶

delete_collection(name: str) -> bool

Delete a collection and all its data.

PARAMETER	DESCRIPTION
`name`	Collection name. TYPE: `str`

RETURNS	DESCRIPTION
`bool`	`True` if the collection existed and was deleted.

list_collections ¶

list_collections() -> list[str]

List all collection names (sorted alphabetically).

AsyncDatabase¶

AsyncDatabase ¶

AsyncDatabase(path: str, dim: Optional[int] = None, metric: Optional[str] = None, m: Optional[int] = None, ef_construction: Optional[int] = None, quantize: bool = False)

Async wrapper around Database. All methods are non-blocking and run the underlying Rust operations in a thread pool via asyncio.to_thread().

Usage::

async with AsyncDatabase("my_db", dim=384) as db:
    await db.add("id1", vector, {"key": "value"})
    results = await db.search(query_vector, k=10)
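
The wrapping pattern described above can be sketched with a stand-in class. `BlockingStore` and `AsyncStore` are hypothetical names used only for illustration; the point is that `asyncio.to_thread` (Python 3.9+) moves a blocking call off the event loop so several calls can overlap.

```python
import asyncio
import time

class BlockingStore:
    """Stand-in for a synchronous database object (hypothetical)."""

    def search(self, query, k=10):
        time.sleep(0.05)  # simulate blocking native work
        return [("id1", 0.0)][:k]

class AsyncStore:
    """Thin async wrapper that delegates each call to a worker thread."""

    def __init__(self, inner):
        self._inner = inner

    async def search(self, query, k=10):
        return await asyncio.to_thread(self._inner.search, query, k)

async def main():
    store = AsyncStore(BlockingStore())
    # The two searches run in separate threads, so the event loop stays responsive.
    return await asyncio.gather(
        store.search([0.0], k=1),
        store.search([1.0], k=1),
    )

results = asyncio.run(main())
print(results)
```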

dim `property` ¶

dim: int

The vector dimensionality of this database.

metric `property` ¶

metric: str

The distance metric.

quantized_search `property` ¶

quantized_search: bool

Whether quantized search is currently enabled.

deleted_count `property` ¶

deleted_count: int

Number of deleted slots not yet reclaimed.

total_slots `property` ¶

total_slots: int

Total allocated slots (active + deleted).

open `async` `classmethod` ¶

open(path: str, dim: Optional[int] = None, metric: Optional[str] = None, m: Optional[int] = None, ef_construction: Optional[int] = None, quantize: bool = False) -> AsyncDatabase

Async factory method to open or create a database.

add `async` ¶

add(id: str, vector: Union[list[float], NDArray[float32]], metadata: Optional[dict[str, Any]] = None) -> None

Add a vector with a unique string ID.

RAISES	DESCRIPTION
`ValueError`	If the ID already exists or the vector dimension doesn't match.

upsert `async` ¶

upsert(id: str, vector: Union[list[float], NDArray[float32]], metadata: Optional[dict[str, Any]] = None) -> None

Insert a vector, or update it if the ID already exists.

add_many `async` ¶

add_many(ids: list[str], vectors: Union[list[list[float]], NDArray[float32]], metadatas: Optional[list[Optional[dict[str, Any]]]] = None) -> None

Batch insert multiple vectors.

upsert_many `async` ¶

upsert_many(ids: list[str], vectors: Union[list[list[float]], NDArray[float32]], metadatas: Optional[list[Optional[dict[str, Any]]]] = None) -> None

Batch upsert — inserts new vectors, updates existing ones.

search `async` ¶

search(vector: Union[list[float], NDArray[float32]], k: int = 10, ef_search: Optional[int] = None, where_filter: Optional[dict[str, Any]] = None, max_distance: Optional[float] = None) -> list[SearchResult]

Find the k nearest neighbors to a query vector.

PARAMETER	DESCRIPTION
`vector`	The query embedding. TYPE: `Union[list[float], NDArray[float32]]`
`k`	Number of results to return (default 10). TYPE: `int` DEFAULT: `10`
`ef_search`	HNSW search-time width. TYPE: `Optional[int]` DEFAULT: `None`
`where_filter`	Optional metadata filter. TYPE: `Optional[dict[str, Any]]` DEFAULT: `None`
`max_distance`	Optional distance threshold. TYPE: `Optional[float]` DEFAULT: `None`

RETURNS	DESCRIPTION
`list[SearchResult]`	List of SearchResult objects sorted by distance.

search_many `async` ¶

search_many(vectors: Union[list[list[float]], NDArray[float32]], k: int = 10, ef_search: Optional[int] = None, where_filter: Optional[dict[str, Any]] = None, max_distance: Optional[float] = None) -> list[list[SearchResult]]

Search multiple queries in parallel.

RETURNS	DESCRIPTION
`list[list[SearchResult]]`	List of result lists, one per query.

get `async` ¶

get(id: str) -> tuple[list[float], Optional[dict[str, Any]]]

Retrieve a vector and its metadata by ID.

RAISES	DESCRIPTION
`ValueError`	If the ID is not found.

delete `async` ¶

delete(id: str) -> bool

Delete a vector by ID.

RETURNS	DESCRIPTION
`bool`	`True` if the vector was found and deleted, `False` if the ID was not found.

delete_many `async` ¶

delete_many(ids: list[str]) -> int

Delete multiple vectors by ID.

RETURNS	DESCRIPTION
`int`	Number of vectors actually deleted.

update `async` ¶

update(id: str, vector: Optional[Union[list[float], NDArray[float32]]] = None, metadata: Optional[dict[str, Any]] = None) -> None

Update a vector's embedding and/or metadata.

RAISES	DESCRIPTION
`ValueError`	If the ID is not found.

save `async` ¶

save() -> None

Persist the database to disk.

compact `async` ¶

compact() -> None

Rebuild the index with only live vectors.

count ¶

count(where_filter: Optional[dict[str, Any]] = None) -> int

Count vectors matching a filter, or all vectors if no filter.

ids ¶

ids() -> list[str]

Return a list of all vector IDs.

stats ¶

stats() -> dict[str, Any]

Get graph-level statistics for diagnostics.

enable_quantized_search ¶

enable_quantized_search() -> None

Enable SQ8 quantized search.

disable_quantized_search ¶

disable_quantized_search() -> None

Disable quantized search.

SearchResult¶

SearchResult ¶

A single search result returned by `Database.search`.

ATTRIBUTE	DESCRIPTION
`id`	The unique string identifier of the matched vector. TYPE: `str`
`distance`	The distance between the query and this vector. Lower is more similar for cosine and euclidean metrics. TYPE: `float`
`metadata`	The metadata dict attached to this vector, or `None`. TYPE: `Optional[dict[str, Any]]`

Supports indexing (result[0] → id, result[1] → distance, result[2] → metadata) for tuple-style destructuring.
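
The indexing behaviour can be mimicked with a small stand-in class. `Result` below is hypothetical and is not the library's `SearchResult`; it shows how positional indexing alone is enough for Python to unpack an object like a 3-tuple.

```python
class Result:
    """Hypothetical stand-in that mimics the documented indexing behaviour."""

    def __init__(self, id, distance, metadata=None):
        self.id = id
        self.distance = distance
        self.metadata = metadata

    def __getitem__(self, index):
        # Raising IndexError past position 2 tells unpacking where to stop.
        return (self.id, self.distance, self.metadata)[index]

r = Result("doc1", 0.12, {"title": "Hello"})

# Attribute access and positional access refer to the same values.
id_, distance, metadata = r
print(r.id == r[0], r.distance == r[1], r.metadata == r[2])
```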
