Python API Reference

Database

Database(path: str, dim: Optional[int] = None, metric: Optional[str] = None, m: Optional[int] = None, ef_construction: Optional[int] = None, quantize: bool = False)

A persistent vector database backed by an HNSW index.

Stores vectors with string IDs and optional JSON metadata. Supports cosine, euclidean (L2), and dot-product distance metrics. Uses memory-mapped I/O for fast loading and a write-ahead log for crash safety.
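
As a rough illustration of the three metrics named above, the following pure-Python sketch computes each one for plain lists. The function names are hypothetical, and the "one minus cosine similarity" convention is an assumption; how the library itself signs or scales dot-product distance is not stated here.

```python
import math


def dot(a, b):
    """Raw inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """L2 distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus cosine similarity (a common convention, assumed here)."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    return 1.0 - dot(a, b) / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.0, 1.0]
print("euclidean:", euclidean_distance(a, b))
print("cosine:", cosine_distance(a, b))
print("dot:", dot(a, b))
```

Cosine distance ignores vector length, while euclidean distance and the inner product do not, which is usually the deciding factor when choosing a metric for a given embedding model.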

PARAMETER DESCRIPTION
path

Directory path for the database files. Created if it doesn't exist.

TYPE: str

dim

Vector dimensionality (e.g. 384 for MiniLM). Required when creating a new database; omit when opening an existing one.

TYPE: Optional[int] DEFAULT: None

metric

Distance metric — "cosine" (default), "euclidean" / "l2", or "dot" / "dot_product".

TYPE: Optional[str] DEFAULT: None

m

HNSW M parameter — max edges per node (default 16). Higher values improve recall at the cost of memory.

TYPE: Optional[int] DEFAULT: None

ef_construction

HNSW build-time search width (default 200). Higher values improve index quality at the cost of build time.

TYPE: Optional[int] DEFAULT: None

quantize

If True, enable scalar quantization (SQ8) for faster search with slightly lower precision.

TYPE: bool DEFAULT: False

RAISES DESCRIPTION
ValueError

If dim is omitted and no database exists at path, or if metric is invalid.

Example::

with Database("my_db", dim=384) as db:
    db.add("doc1", embedding, {"title": "Hello"})
    results = db.search(query_vector, k=5)
    for r in results:
        print(r.id, r.distance)

Example (opening existing)::

db = Database("my_db")  # auto-detects dim and metric

quantized_search property

quantized_search: bool

Whether quantized search is currently enabled.

deleted_count property

deleted_count: int

Number of deleted slots not yet reclaimed by compact().

total_slots property

total_slots: int

Total allocated slots (active + deleted).

dim property

dim: int

The vector dimensionality of this database.

metric property

metric: str

The distance metric: "cosine", "euclidean", or "dot_product".

add

add(id: str, vector: Union[list[float], NDArray[float32]], metadata: Optional[dict[str, Any]] = None) -> None

Add a vector with a unique string ID.

PARAMETER DESCRIPTION
id

Unique identifier. Raises ValueError if the ID already exists (use upsert() to insert or update).

TYPE: str

vector

The embedding vector. Must match the database's dimensionality. Accepts a Python list or a 1-D numpy array of float32.

TYPE: Union[list[float], NDArray[float32]]

metadata

Optional JSON-serializable dict of metadata (e.g. {"category": "science", "year": 2024}).

TYPE: Optional[dict[str, Any]] DEFAULT: None

RAISES DESCRIPTION
ValueError

If the ID already exists or the vector dimension doesn't match.

upsert

upsert(id: str, vector: Union[list[float], NDArray[float32]], metadata: Optional[dict[str, Any]] = None) -> None

Insert a vector, or update it if the ID already exists.

Same as add() but overwrites existing entries instead of raising an error.

PARAMETER DESCRIPTION
id

Unique identifier.

TYPE: str

vector

The embedding vector.

TYPE: Union[list[float], NDArray[float32]]

metadata

Optional metadata dict.

TYPE: Optional[dict[str, Any]] DEFAULT: None

RAISES DESCRIPTION
ValueError

If the vector dimension doesn't match.
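
The difference between add and upsert can be sketched with a small dict-backed stand-in. TinyStore is a hypothetical class written only for this illustration; it mirrors the documented error behaviour and is not the library's Database.

```python
class TinyStore:
    """Hypothetical in-memory stand-in; not the library's Database class."""

    def __init__(self, dim):
        self.dim = dim
        self._rows = {}  # id -> (vector, metadata)

    def _check_dim(self, vector):
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")

    def add(self, id, vector, metadata=None):
        """Insert only; a duplicate ID is an error."""
        self._check_dim(vector)
        if id in self._rows:
            raise ValueError(f"ID already exists: {id!r}")
        self._rows[id] = (list(vector), metadata)

    def upsert(self, id, vector, metadata=None):
        """Insert, or overwrite the existing entry for this ID."""
        self._check_dim(vector)
        self._rows[id] = (list(vector), metadata)

    def get(self, id):
        if id not in self._rows:
            raise ValueError(f"ID not found: {id!r}")
        return self._rows[id]


store = TinyStore(dim=2)
store.add("doc1", [0.1, 0.2], {"v": 1})
try:
    store.add("doc1", [0.3, 0.4])
except ValueError as err:
    print("add rejected:", err)
store.upsert("doc1", [0.3, 0.4], {"v": 2})  # same ID, no error
```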

add_many

add_many(ids: list[str], vectors: Union[list[list[float]], NDArray[float32]], metadatas: Optional[list[Optional[dict[str, Any]]]] = None) -> None

Batch insert multiple vectors.

All three lists must have the same length. Vectors can be passed as a list of lists or a 2-D numpy array of shape (n, dim).

PARAMETER DESCRIPTION
ids

List of unique identifiers.

TYPE: list[str]

vectors

Batch of embedding vectors.

TYPE: Union[list[list[float]], NDArray[float32]]

metadatas

Optional list of metadata dicts (or None per entry).

TYPE: Optional[list[Optional[dict[str, Any]]]] DEFAULT: None

RAISES DESCRIPTION
ValueError

If any ID already exists, lengths mismatch, or dimensions don't match.
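
The length and dimension rules for a batch can be checked up front. The helper below is a hypothetical pre-flight check a caller might run before add_many; it only restates the documented constraints and does not reflect how the library validates input internally.

```python
def validate_batch(ids, vectors, metadatas, dim):
    """Raise ValueError if the batch breaks the documented shape rules."""
    if len(ids) != len(vectors):
        raise ValueError(f"{len(ids)} ids but {len(vectors)} vectors")
    if metadatas is not None and len(metadatas) != len(ids):
        raise ValueError(f"{len(ids)} ids but {len(metadatas)} metadata entries")
    for position, vector in enumerate(vectors):
        if len(vector) != dim:
            raise ValueError(
                f"vector {position} has {len(vector)} dimensions, expected {dim}"
            )


# A well-formed batch: two ids, two 3-dimensional vectors, one entry without metadata.
validate_batch(
    ["a", "b"],
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [{"category": "science"}, None],
    dim=3,
)
```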

upsert_many

upsert_many(ids: list[str], vectors: Union[list[list[float]], NDArray[float32]], metadatas: Optional[list[Optional[dict[str, Any]]]] = None) -> None

Batch upsert — inserts new vectors, updates existing ones.

More efficient than calling upsert() in a loop because new vectors are batch-inserted together.

PARAMETER DESCRIPTION
ids

List of identifiers.

TYPE: list[str]

vectors

Batch of embedding vectors.

TYPE: Union[list[list[float]], NDArray[float32]]

metadatas

Optional list of metadata dicts.

TYPE: Optional[list[Optional[dict[str, Any]]]] DEFAULT: None

RAISES DESCRIPTION
ValueError

If lengths mismatch or dimensions don't match.

search

search(vector: Union[list[float], NDArray[float32]], k: int = 10, ef_search: Optional[int] = None, where_filter: Optional[dict[str, Any]] = None, max_distance: Optional[float] = None) -> list[SearchResult]

Find the k nearest neighbors to a query vector.

PARAMETER DESCRIPTION
vector

The query embedding.

TYPE: Union[list[float], NDArray[float32]]

k

Number of results to return (default 10).

TYPE: int DEFAULT: 10

ef_search

HNSW search-time width. Higher values improve recall at the cost of latency. Defaults to max(k, 10).

TYPE: Optional[int] DEFAULT: None

where_filter

Optional metadata filter. Supports:

  • Equality: {"field": "value"}
  • Not-equal: {"field": {"$ne": "value"}}
  • In-set: {"field": {"$in": ["a", "b"]}}
  • Numeric ranges: {"field": {"$gt": 10, "$lte": 20}}
  • Compound (AND): {"f1": "v1", "f2": {"$gt": 5}}

TYPE: Optional[dict[str, Any]] DEFAULT: None

max_distance

Optional distance threshold. Results with distance greater than this value are discarded. Useful for finding only "close enough" matches.

TYPE: Optional[float] DEFAULT: None

RETURNS DESCRIPTION
list[SearchResult]

List of SearchResult objects sorted by distance (ascending).

RAISES DESCRIPTION
ValueError

If the vector dimension doesn't match.
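
To make the where_filter, max_distance, and k semantics concrete, here is a brute-force sketch over an in-memory dict. It is an illustration of the documented behaviour, not the HNSW implementation. The names matches and brute_force_search are hypothetical, the "$gte" and "$lt" operators are assumed by analogy with "$gt" and "$lte", and treating a missing field as None is also an assumption.

```python
import math

# "$gte" and "$lt" are assumptions; the text above shows only "$gt" and "$lte".
_OPERATORS = {
    "$ne": lambda value, arg: value != arg,
    "$in": lambda value, arg: value in arg,
    "$gt": lambda value, arg: value is not None and value > arg,
    "$gte": lambda value, arg: value is not None and value >= arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$lte": lambda value, arg: value is not None and value <= arg,
}


def matches(metadata, where_filter):
    """Return True if every field condition holds (conditions are ANDed)."""
    metadata = metadata or {}
    for field, condition in where_filter.items():
        value = metadata.get(field)  # missing fields read as None (assumption)
        if isinstance(condition, dict):
            if not all(_OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            return False
    return True


def brute_force_search(rows, query, k=10, where_filter=None, max_distance=None):
    """rows maps id -> (vector, metadata); returns (id, distance) pairs."""
    hits = []
    for row_id, (vector, metadata) in rows.items():
        if where_filter is not None and not matches(metadata, where_filter):
            continue
        distance = math.dist(vector, query)  # Euclidean, for simplicity
        if max_distance is not None and distance > max_distance:
            continue
        hits.append((row_id, distance))
    hits.sort(key=lambda hit: hit[1])  # ascending distance
    return hits[:k]


rows = {
    "a": ([0.0, 0.0], {"category": "science", "year": 2024}),
    "b": ([1.0, 0.0], {"category": "science", "year": 2019}),
    "c": ([0.0, 3.0], {"category": "art", "year": 2024}),
}
recent_science = {"category": "science", "year": {"$gt": 2020}}
print(brute_force_search(rows, [0.0, 0.0], k=5, where_filter=recent_science))
```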

search_many

search_many(vectors: Union[list[list[float]], NDArray[float32]], k: int = 10, ef_search: Optional[int] = None, where_filter: Optional[dict[str, Any]] = None, max_distance: Optional[float] = None) -> list[list[SearchResult]]

Search multiple queries in parallel using Rayon (a Rust data-parallelism library).

Significantly faster than calling search() in a loop for multiple queries.

PARAMETER DESCRIPTION
vectors

Batch of query embeddings (list of lists or 2-D numpy array).

TYPE: Union[list[list[float]], NDArray[float32]]

k

Number of results per query (default 10).

TYPE: int DEFAULT: 10

ef_search

HNSW search-time width.

TYPE: Optional[int] DEFAULT: None

where_filter

Optional metadata filter (same syntax as search()). Applied to all queries.

TYPE: Optional[dict[str, Any]] DEFAULT: None

max_distance

Optional distance threshold applied to all queries.

TYPE: Optional[float] DEFAULT: None

RETURNS DESCRIPTION
list[list[SearchResult]]

List of result lists, one per query.

RAISES DESCRIPTION
ValueError

If any vector dimension doesn't match.

get

get(id: str) -> tuple[list[float], Optional[dict[str, Any]]]

Retrieve a vector and its metadata by ID.

PARAMETER DESCRIPTION
id

The vector's unique identifier.

TYPE: str

RETURNS DESCRIPTION
tuple[list[float], Optional[dict[str, Any]]]

A tuple of (vector, metadata), where metadata may be None.

RAISES DESCRIPTION
ValueError

If the ID is not found.

delete

delete(id: str) -> bool

Delete a vector by ID.

The slot is marked as deleted but not reclaimed until compact() is called.

PARAMETER DESCRIPTION
id

The vector's unique identifier.

TYPE: str

RETURNS DESCRIPTION
bool

True if the vector was found and deleted, False if the ID was not found.

RAISES DESCRIPTION
ValueError

On internal errors (e.g. WAL write failure).

delete_many

delete_many(ids: list[str]) -> int

Delete multiple vectors by ID.

More efficient than calling delete() in a loop because locks are held once for the entire batch.

PARAMETER DESCRIPTION
ids

List of vector IDs to delete.

TYPE: list[str]

RETURNS DESCRIPTION
int

Number of vectors actually deleted (IDs not found are skipped).

RAISES DESCRIPTION
ValueError

On internal errors (e.g. WAL write failure).

update

update(id: str, vector: Optional[Union[list[float], NDArray[float32]]] = None, metadata: Optional[dict[str, Any]] = None) -> None

Update a vector's embedding and/or metadata in-place.

At least one of vector or metadata must be provided.

PARAMETER DESCRIPTION
id

The vector's unique identifier.

TYPE: str

vector

New embedding vector (or None to keep existing).

TYPE: Optional[Union[list[float], NDArray[float32]]] DEFAULT: None

metadata

New metadata dict (or None to keep existing).

TYPE: Optional[dict[str, Any]] DEFAULT: None

RAISES DESCRIPTION
ValueError

If the ID is not found or the vector dimension doesn't match.
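
The "None keeps the existing value" rule can be sketched as a small pure function. apply_update is a hypothetical helper, not part of the API; raising ValueError when both arguments are None and replacing (rather than merging) the metadata dict are assumptions made for this illustration.

```python
def apply_update(row, vector=None, metadata=None):
    """Return a new (vector, metadata) pair, keeping whichever part is None."""
    if vector is None and metadata is None:
        # Assumed error type; the text only says one of the two is required.
        raise ValueError("provide vector, metadata, or both")
    old_vector, old_metadata = row
    new_vector = list(vector) if vector is not None else old_vector
    # Assumption: new metadata replaces the old dict rather than merging into it.
    new_metadata = metadata if metadata is not None else old_metadata
    return new_vector, new_metadata


row = ([0.1, 0.2], {"title": "Hello"})
row = apply_update(row, metadata={"title": "Hello, world"})  # vector unchanged
row = apply_update(row, vector=[0.3, 0.4])                   # metadata unchanged
```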

count

count(where_filter: Optional[dict[str, Any]] = None) -> int

Count vectors matching a filter, or all vectors if no filter.

Uses the inverted metadata index for fast counting with equality and $in filters.

PARAMETER DESCRIPTION
where_filter

Optional metadata filter (same syntax as search()). If None, returns total count.

TYPE: Optional[dict[str, Any]] DEFAULT: None

RETURNS DESCRIPTION
int

Number of matching vectors.

ids

ids() -> list[str]

Return a list of all vector IDs in the database.

save

save() -> None

Persist the database to disk.

Writes the HNSW graph, metadata, and vectors to the database directory and truncates the write-ahead log.

RAISES DESCRIPTION
ValueError

On I/O errors (e.g. disk full, permission denied).

compact

compact() -> None

Rebuild the index with only live vectors.

Reclaims slots from deleted vectors, reducing memory usage and on-disk size. Call after many deletions.

RAISES DESCRIPTION
ValueError

On internal errors.
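
The relationship between delete, compact, deleted_count, and total_slots can be modelled with a toy slot table. SlotTable is a hypothetical stand-in that tracks only IDs; it illustrates the bookkeeping described above, not the library's storage layout.

```python
class SlotTable:
    """Hypothetical model of tombstoned slots; stores IDs only."""

    def __init__(self):
        self._slots = []  # slot index -> id, or None once deleted
        self._index = {}  # id -> slot index

    def add(self, id):
        self._index[id] = len(self._slots)
        self._slots.append(id)

    def delete(self, id):
        """Mark the slot as deleted; the slot itself stays allocated."""
        slot = self._index.pop(id, None)
        if slot is None:
            return False
        self._slots[slot] = None
        return True

    @property
    def total_slots(self):
        return len(self._slots)  # active + deleted

    @property
    def deleted_count(self):
        return sum(1 for entry in self._slots if entry is None)

    def compact(self):
        """Rebuild with live entries only, reclaiming deleted slots."""
        live = [entry for entry in self._slots if entry is not None]
        self._slots = live
        self._index = {id: slot for slot, id in enumerate(live)}


table = SlotTable()
for name in ("a", "b", "c"):
    table.add(name)
table.delete("b")
before = (table.total_slots, table.deleted_count)
table.compact()
after = (table.total_slots, table.deleted_count)
```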

enable_quantized_search

enable_quantized_search() -> None

Enable SQ8 quantized search for faster HNSW traversal.

Quantized vectors use 4x less memory for distance comparisons during graph traversal, with full-precision re-ranking of final candidates.
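
The 4x figure follows from storing one byte per component instead of a four-byte float32. The sketch below shows one simple form of 8-bit scalar quantization, using a single min/max range per vector. That choice, and the names sq8_encode and sq8_decode, are assumptions for illustration; the library's actual SQ8 scheme may differ, for example by using per-dimension ranges.

```python
from array import array


def sq8_encode(vector):
    """Map each float to a code in 0..255 using the vector's own min and max."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for constant vectors
    codes = bytes(min(255, max(0, round((x - lo) / scale))) for x in vector)
    return codes, lo, scale


def sq8_decode(codes, lo, scale):
    """Approximate reconstruction of the original floats."""
    return [lo + code * scale for code in codes]


vector = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, lo, scale = sq8_encode(vector)
restored = sq8_decode(codes, lo, scale)

float32_bytes = len(vector) * array("f").itemsize  # 4 bytes per component
sq8_bytes = len(codes)                             # 1 byte per component
worst_error = max(abs(x - y) for x, y in zip(vector, restored))
print(float32_bytes, sq8_bytes, worst_error)
```

The rounding error per component is bounded by half a quantization step, which is why a full-precision re-ranking pass over the final candidates can recover most of the lost accuracy.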

disable_quantized_search

disable_quantized_search() -> None

Disable quantized search and use full-precision vectors.

stats

stats() -> dict[str, Any]

Get graph-level statistics for diagnostics.

RETURNS DESCRIPTION
dict[str, Any]

Dict with keys: num_vectors, num_deleted, num_layers, avg_degree_layer0, max_degree_layer0, min_degree_layer0, memory_vectors_bytes, memory_graph_bytes, memory_quantized_bytes, uses_brute_force, uses_quantized_search.

export_json

export_json(path: str, pretty: bool = False) -> None

Export all vectors and metadata to a JSON file.

PARAMETER DESCRIPTION
path

File path to write the JSON export.

TYPE: str

pretty

If True, pretty-print the JSON output.

TYPE: bool DEFAULT: False

import_json

import_json(path: str) -> None

Import vectors from a JSON file (upsert semantics).

Updates existing IDs and inserts new ones.

PARAMETER DESCRIPTION
path

File path to read the JSON export from.

TYPE: str

RAISES DESCRIPTION
ValueError

If the JSON dimension doesn't match.

Client

Client(path: str)

Multi-collection client for managing named vector databases.

Each collection is stored in its own subdirectory under the root path.

PARAMETER DESCRIPTION
path

Root directory for all collections.

TYPE: str

Example::

client = Client("/data/vectors")
movies = client.create_collection("movies", dim=384)
docs = client.get_or_create_collection("docs", dim=768)
print(client.list_collections())  # ["docs", "movies"]

create_collection

create_collection(name: str, dim: int, metric: Optional[str] = None, m: Optional[int] = None, ef_construction: Optional[int] = None, quantize: bool = False) -> Database

Create a new collection.

PARAMETER DESCRIPTION
name

Collection name (alphanumeric, hyphens, underscores).

TYPE: str

dim

Vector dimensionality.

TYPE: int

metric

Distance metric (default "cosine").

TYPE: Optional[str] DEFAULT: None

m

HNSW M parameter (default 16).

TYPE: Optional[int] DEFAULT: None

ef_construction

HNSW build-time width (default 200).

TYPE: Optional[int] DEFAULT: None

quantize

Enable SQ8 quantization.

TYPE: bool DEFAULT: False

RAISES DESCRIPTION
ValueError

If the collection already exists or the name is invalid.
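
A caller can pre-check a collection name against the stated character set before calling create_collection. The pattern below is a guess at that rule: it assumes ASCII letters and digits and rejects empty names, neither of which is confirmed here, and is_valid_collection_name is a hypothetical helper.

```python
import re

# Assumed rule: one or more ASCII letters, digits, hyphens, or underscores.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_collection_name(name):
    """Return True if the whole name matches the assumed character set."""
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ("movies", "my-docs_2", "bad/name", "has space"):
    print(candidate, is_valid_collection_name(candidate))
```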

get_collection

get_collection(name: str) -> Database

Open an existing collection.

PARAMETER DESCRIPTION
name

Collection name.

TYPE: str

RAISES DESCRIPTION
ValueError

If the collection doesn't exist.

get_or_create_collection

get_or_create_collection(name: str, dim: int, metric: Optional[str] = None) -> Database

Get or create a collection.

If the collection exists, opens it (dim/metric are ignored). Otherwise creates a new one.

PARAMETER DESCRIPTION
name

Collection name.

TYPE: str

dim

Vector dimensionality (used only for creation).

TYPE: int

metric

Distance metric (used only for creation).

TYPE: Optional[str] DEFAULT: None

delete_collection

delete_collection(name: str) -> bool

Delete a collection and all its data.

PARAMETER DESCRIPTION
name

Collection name.

TYPE: str

RETURNS DESCRIPTION
bool

True if the collection existed and was deleted.

list_collections

list_collections() -> list[str]

List all collection names (sorted alphabetically).
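
The one-subdirectory-per-collection layout and the create, get-or-create, delete, and list behaviours can be modelled with plain directories. DirClient is a hypothetical stand-in that creates empty folders only; it sketches the documented semantics and says nothing about the files the library actually writes.

```python
import shutil
import tempfile
from pathlib import Path


class DirClient:
    """Hypothetical stand-in that models collections as plain subdirectories."""

    def __init__(self, path):
        self.root = Path(path)
        self.root.mkdir(parents=True, exist_ok=True)

    def create_collection(self, name):
        target = self.root / name
        if target.exists():
            raise ValueError(f"collection already exists: {name!r}")
        target.mkdir()
        return target

    def get_or_create_collection(self, name):
        target = self.root / name
        target.mkdir(exist_ok=True)  # opening an existing collection is not an error
        return target

    def delete_collection(self, name):
        target = self.root / name
        if not target.is_dir():
            return False
        shutil.rmtree(target)
        return True

    def list_collections(self):
        return sorted(p.name for p in self.root.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as root:
    client = DirClient(root)
    client.create_collection("movies")
    client.get_or_create_collection("docs")
    client.get_or_create_collection("docs")  # second call reuses the directory
    names = client.list_collections()
    removed = client.delete_collection("movies")
    remaining = client.list_collections()
```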

AsyncDatabase

AsyncDatabase(path: str, dim: Optional[int] = None, metric: Optional[str] = None, m: Optional[int] = None, ef_construction: Optional[int] = None, quantize: bool = False)

Async wrapper around Database. All methods are non-blocking and run the underlying Rust operations in a thread pool via asyncio.to_thread().

Usage::

async with AsyncDatabase("my_db", dim=384) as db:
    await db.add("id1", vector, {"key": "value"})
    results = await db.search(query_vector, k=10)
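
The thread-pool delegation described above can be sketched with two hypothetical classes: BlockingStore stands in for the synchronous Database, and AsyncStore wraps it with asyncio.to_thread (available from Python 3.9). This shows the general pattern only, not the wrapper's actual source.

```python
import asyncio
import time


class BlockingStore:
    """Hypothetical synchronous backend with a slow call."""

    def search(self, query):
        time.sleep(0.01)  # simulate blocking work
        return [query]


class AsyncStore:
    """Hypothetical async facade that runs each call in a worker thread."""

    def __init__(self):
        self._inner = BlockingStore()

    async def search(self, query):
        # The event loop stays free while the blocking call runs in a thread.
        return await asyncio.to_thread(self._inner.search, query)

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False  # do not suppress exceptions


async def main():
    async with AsyncStore() as store:
        # Both searches are in flight at the same time.
        return await asyncio.gather(store.search("a"), store.search("b"))


results = asyncio.run(main())
```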

dim property

dim: int

The vector dimensionality of this database.

metric property

metric: str

The distance metric.

quantized_search property

quantized_search: bool

Whether quantized search is currently enabled.

deleted_count property

deleted_count: int

Number of deleted slots not yet reclaimed.

total_slots property

total_slots: int

Total allocated slots (active + deleted).

open async classmethod

open(path: str, dim: Optional[int] = None, metric: Optional[str] = None, m: Optional[int] = None, ef_construction: Optional[int] = None, quantize: bool = False) -> AsyncDatabase

Async factory method to open or create a database.

add async

add(id: str, vector: Union[list[float], NDArray[float32]], metadata: Optional[dict[str, Any]] = None) -> None

Add a vector with a unique string ID.

RAISES DESCRIPTION
ValueError

If the ID already exists or the vector dimension doesn't match.

upsert async

upsert(id: str, vector: Union[list[float], NDArray[float32]], metadata: Optional[dict[str, Any]] = None) -> None

Insert a vector, or update it if the ID already exists.

add_many async

add_many(ids: list[str], vectors: Union[list[list[float]], NDArray[float32]], metadatas: Optional[list[Optional[dict[str, Any]]]] = None) -> None

Batch insert multiple vectors.

upsert_many async

upsert_many(ids: list[str], vectors: Union[list[list[float]], NDArray[float32]], metadatas: Optional[list[Optional[dict[str, Any]]]] = None) -> None

Batch upsert — inserts new vectors, updates existing ones.

search async

search(vector: Union[list[float], NDArray[float32]], k: int = 10, ef_search: Optional[int] = None, where_filter: Optional[dict[str, Any]] = None, max_distance: Optional[float] = None) -> list[SearchResult]

Find the k nearest neighbors to a query vector.

PARAMETER DESCRIPTION
vector

The query embedding.

TYPE: Union[list[float], NDArray[float32]]

k

Number of results to return (default 10).

TYPE: int DEFAULT: 10

ef_search

HNSW search-time width.

TYPE: Optional[int] DEFAULT: None

where_filter

Optional metadata filter.

TYPE: Optional[dict[str, Any]] DEFAULT: None

max_distance

Optional distance threshold.

TYPE: Optional[float] DEFAULT: None

RETURNS DESCRIPTION
list[SearchResult]

List of SearchResult objects sorted by distance.

search_many async

search_many(vectors: Union[list[list[float]], NDArray[float32]], k: int = 10, ef_search: Optional[int] = None, where_filter: Optional[dict[str, Any]] = None, max_distance: Optional[float] = None) -> list[list[SearchResult]]

Search multiple queries in parallel.

RETURNS DESCRIPTION
list[list[SearchResult]]

List of result lists, one per query.

get async

get(id: str) -> tuple[list[float], Optional[dict[str, Any]]]

Retrieve a vector and its metadata by ID.

RAISES DESCRIPTION
ValueError

If the ID is not found.

delete async

delete(id: str) -> bool

Delete a vector by ID.

RETURNS DESCRIPTION
bool

True if found and deleted, False if not found.

delete_many async

delete_many(ids: list[str]) -> int

Delete multiple vectors by ID.

RETURNS DESCRIPTION
int

Number of vectors actually deleted.

update async

update(id: str, vector: Optional[Union[list[float], NDArray[float32]]] = None, metadata: Optional[dict[str, Any]] = None) -> None

Update a vector's embedding and/or metadata.

RAISES DESCRIPTION
ValueError

If the ID is not found.

save async

save() -> None

Persist the database to disk.

compact async

compact() -> None

Rebuild the index with only live vectors.

count

count(where_filter: Optional[dict[str, Any]] = None) -> int

Count vectors matching a filter, or all vectors if no filter.

ids

ids() -> list[str]

Return a list of all vector IDs.

stats

stats() -> dict[str, Any]

Get graph-level statistics for diagnostics.

enable_quantized_search

enable_quantized_search() -> None

Enable SQ8 quantized search.

disable_quantized_search

disable_quantized_search() -> None

Disable quantized search.

SearchResult

A single search result returned by Database.search().

ATTRIBUTE DESCRIPTION
id

The unique string identifier of the matched vector.

TYPE: str

distance

The distance between the query and this vector. Lower is more similar for cosine and euclidean metrics.

TYPE: float

metadata

The metadata dict attached to this vector, or None.

TYPE: Optional[dict[str, Any]]

Supports indexing (result[0] → id, result[1] → distance, result[2] → metadata) for tuple-style destructuring.
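
Tuple-style access of this kind can be provided by a class that defines __getitem__ over its three fields. ResultSketch below is a hypothetical stand-in, not the library's SearchResult; it shows why both attribute access and unpacking work on such an object.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ResultSketch:
    """Hypothetical stand-in with the same three fields as SearchResult."""

    id: str
    distance: float
    metadata: Optional[dict[str, Any]] = None

    def __getitem__(self, index):
        # Indexing past position 2 raises IndexError, which ends unpacking.
        return (self.id, self.distance, self.metadata)[index]


result = ResultSketch("doc1", 0.12, {"title": "Hello"})
rid, dist, meta = result          # tuple-style destructuring
same_id = result[0] == result.id  # index and attribute access agree
```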