VecBench is a benchmark suite for evaluating vector databases under structured scalar predicates, controllable vector distributions, adjustable query conditions, and dynamic end-to-end workloads.
Filtered vector search combines nearest-neighbor retrieval with scalar predicates such as equality, range, and containment filters. Existing benchmarks often use fixed datasets, random filters, or query-only measurements. VecBench is designed to expose how vector databases behave under realistic, controllable, and dynamic workloads.
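To make the definition concrete, the following is a minimal brute-force sketch of a filtered search, not VecBench code. The record layout, the helper names (`eq`, `in_range`, `contains`, `filtered_knn`), and the toy data are all assumptions made for illustration.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical predicate builders; each returns a function over an attribute dict.
def eq(field, value):
    return lambda attrs: attrs.get(field) == value

def in_range(field, lo, hi):
    return lambda attrs: field in attrs and lo <= attrs[field] <= hi

def contains(field, value):
    return lambda attrs: value in attrs.get(field, ())

def filtered_knn(records, query, k, predicates):
    """Exact filtered k-NN: keep records passing every predicate, then rank by distance."""
    candidates = [r for r in records if all(p(r["attrs"]) for p in predicates)]
    candidates.sort(key=lambda r: l2(r["vec"], query))
    return [r["id"] for r in candidates[:k]]

# Toy data, invented for this example.
records = [
    {"id": 0, "vec": [0.0, 0.0], "attrs": {"category": "a", "price": 5, "tags": ["x"]}},
    {"id": 1, "vec": [1.0, 0.0], "attrs": {"category": "b", "price": 15, "tags": ["x", "y"]}},
    {"id": 2, "vec": [0.0, 2.0], "attrs": {"category": "a", "price": 25, "tags": ["y"]}},
    {"id": 3, "vec": [3.0, 3.0], "attrs": {"category": "a", "price": 12, "tags": ["x", "y"]}},
]

# Equality plus range filter: only ids 2 and 3 qualify.
by_category_and_price = filtered_knn(
    records, [0.0, 0.0], 2, [eq("category", "a"), in_range("price", 10, 30)]
)
# Containment filter: ids 1, 2, 3 qualify; id 1 is nearest to the query.
by_tag = filtered_knn(records, [0.0, 0.0], 1, [contains("tags", "y")])
```

An exact scan like this is the usual ground truth for recall; a real system would replace the linear scan with an index.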
Controllable Data Generation: generates high-dimensional, large-scale vector data while preserving the original similarity distribution as closely as possible.
Controls selectivity and filter correlation to stress pre-filtering, post-filtering, in-filtering, and expanded-filtering strategies.
Aggregates rankings across initialization, querying, concurrency, incremental loading, updates, and deletion to compare end-to-end behavior.
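The interaction between selectivity, filter correlation, and filtering strategy can be illustrated with a small sketch. It covers only pre-filtering and post-filtering over an exact scan; in-filtering and expanded-filtering are not shown. The function names, the `fetch` parameter, and the toy data are assumptions for illustration, not part of VecBench.

```python
def selectivity(attrs, predicate):
    """Fraction of records whose scalar attribute satisfies the predicate."""
    return sum(1 for a in attrs if predicate(a)) / len(attrs)

def sq_dist(a, b):
    """Squared Euclidean distance (enough for ranking)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pre_filter_search(vectors, attrs, query, k, predicate):
    """Apply the scalar filter first, then rank only the survivors."""
    ids = [i for i, a in enumerate(attrs) if predicate(a)]
    ids.sort(key=lambda i: sq_dist(vectors[i], query))
    return ids[:k]

def post_filter_search(vectors, attrs, query, k, predicate, fetch):
    """Take the `fetch` nearest vectors first, then drop those failing the filter."""
    nearest = sorted(range(len(vectors)), key=lambda i: sq_dist(vectors[i], query))[:fetch]
    return [i for i in nearest if predicate(attrs[i])][:k]

# Toy data: ten 1-D vectors; the attribute equals the position on the line,
# so the filter below is anti-correlated with proximity to the query at 0.
vectors = [[float(i)] for i in range(10)]
attrs = list(range(10))
predicate = lambda a: a >= 8
query = [0.0]

sel = selectivity(attrs, predicate)                                       # 2 of 10 records match
pre = pre_filter_search(vectors, attrs, query, 2, predicate)
post_small = post_filter_search(vectors, attrs, query, 2, predicate, 4)   # candidate pool too small
post_full = post_filter_search(vectors, attrs, query, 2, predicate, 10)   # pool covers all records
```

With a selective, anti-correlated filter, the post-filter variant returns nothing until its candidate pool grows to cover the matching records, while the pre-filter variant always finds them.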
VecBench separates benchmark construction from system execution. The data synthesizer controls vector dimensionality and scale; the query generator builds filtered search workloads with target properties; the executor runs database-specific pipelines and collects metrics.
This structure makes it easier to analyze whether a system is sensitive to vector dimension, dataset scale, filter selectivity, local filter correlation, or dynamic maintenance costs.
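The separation described above can be sketched as three small pieces. This is an illustrative skeleton under stated assumptions, not the VecBench implementation: `synthesize`, `generate_queries`, `BruteForceBackend`, and `execute` are hypothetical names, the synthesizer simply draws Gaussian vectors rather than preserving a source distribution, and the only query property controlled here is range-filter selectivity.

```python
import random
import time
from dataclasses import dataclass

@dataclass
class Query:
    vector: list
    lo: int   # inclusive lower bound of the range filter
    hi: int   # inclusive upper bound of the range filter
    k: int

def synthesize(n, dim, seed=0):
    """Data synthesizer stand-in: n Gaussian vectors of size dim, plus unique scalars 0..n-1."""
    rng = random.Random(seed)
    vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    scalars = rng.sample(range(n), n)  # a permutation, so every value is unique
    return vectors, scalars

def generate_queries(vectors, count, target_selectivity, k, seed=1):
    """Query generator stand-in: range filters that each match a fixed fraction of records."""
    rng = random.Random(seed)
    n = len(vectors)
    width = max(1, round(target_selectivity * n))
    queries = []
    for _ in range(count):
        lo = rng.randrange(n - width + 1)
        queries.append(Query(rng.choice(vectors), lo, lo + width - 1, k))
    return queries

class BruteForceBackend:
    """Placeholder for a database-specific adapter (hypothetical interface)."""
    def load(self, vectors, scalars):
        self.vectors, self.scalars = vectors, scalars

    def search(self, q):
        ids = [i for i, s in enumerate(self.scalars) if q.lo <= s <= q.hi]
        ids.sort(key=lambda i: sum((a - b) ** 2 for a, b in zip(self.vectors[i], q.vector)))
        return ids[:q.k]

def execute(backend, vectors, scalars, queries):
    """Executor stand-in: time the load step and each query, and keep the results."""
    start = time.perf_counter()
    backend.load(vectors, scalars)
    load_seconds = time.perf_counter() - start
    latencies, results = [], []
    for q in queries:
        start = time.perf_counter()
        results.append(backend.search(q))
        latencies.append(time.perf_counter() - start)
    return {
        "load_seconds": load_seconds,
        "mean_latency": sum(latencies) / len(latencies),
        "results": results,
    }

vectors, scalars = synthesize(n=200, dim=8)
queries = generate_queries(vectors, count=5, target_selectivity=0.1, k=3)
report = execute(BruteForceBackend(), vectors, scalars, queries)
```

Because the executor only depends on `load` and `search`, the same data and queries can be replayed against another backend, which is the kind of isolation that lets one factor (dimension, scale, selectivity) be varied at a time.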
In the leaderboard, lower is better for T0, latency, T1, T2, T3, and Vrank; higher is better for recall and QPS.
N/A indicates that the corresponding result is not reported or not directly comparable in the current setting.
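This page does not give the Vrank formula, so the following is only a generic mean-rank sketch showing how the metric directions and N/A entries above could be combined into one score. The aggregation rule, the handling of missing values (skipped), the system names, and the numbers are all assumptions made up for illustration.

```python
LOWER_IS_BETTER = {"T0", "latency", "T1", "T2", "T3"}
HIGHER_IS_BETTER = {"recall", "QPS"}

def mean_rank(results):
    """results: {system: {metric: value or None}} -> {system: mean rank over reported metrics}.

    Assumption: a missing (N/A) value is skipped for that metric rather than penalized.
    Ties are not handled specially; tied systems keep their input order.
    """
    ranks = {name: [] for name in results}
    for metric in sorted(LOWER_IS_BETTER | HIGHER_IS_BETTER):
        reported = [
            (name, values[metric])
            for name, values in results.items()
            if values.get(metric) is not None
        ]
        # Rank 1 is best: ascending for lower-is-better, descending otherwise.
        reported.sort(key=lambda item: item[1], reverse=metric in HIGHER_IS_BETTER)
        for position, (name, _) in enumerate(reported, start=1):
            ranks[name].append(position)
    return {name: sum(r) / len(r) for name, r in ranks.items() if r}

# Made-up numbers for two hypothetical systems; None stands for N/A.
example = {
    "sys_a": {"T0": 10.0, "recall": 0.95, "QPS": 800, "latency": 4.0},
    "sys_b": {"T0": 20.0, "recall": 0.99, "QPS": None, "latency": 3.0},
}
scores = mean_rank(example)
```

Skipping N/A values is only one possible policy; assigning the worst rank instead would penalize systems that do not report a metric.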
VecBench evaluates both static search quality and dynamic maintenance behavior, then summarizes the result with Vrank.
T0: Insert initial data and build the vector index.
Recall · Latency: Run filtered search queries on indexed data.
QPS: Execute multiple filtered search queries concurrently.
T1: Insert remaining data and maintain indexes.
T2: Modify vectors and scalar attributes.
T3: Delete a portion of the dataset and measure delay.
The paper is authored by researchers from Renmin University of China and Tsinghua University. The project implementation also includes two undergraduate contributors.
Implementation contributor, Renmin University of China
Implementation contributor, Renmin University of China
Use the following BibTeX entry when referring to the benchmark, paper, or leaderboard.
@article{zhang2026vecbench,
title = {VecBench: A Controllable Benchmark for Filtered Vector Search},
author = {Zhang, Xiang and Zhang, Chao and Fan, Ju and Li, Guoliang and Du, Xiaoyong},
journal = {Proceedings of the ACM on Management of Data},
volume = {4},
number = {3},
year = {2026},
doi = {10.1145/3802125}
}