X-bench: End-to-End Filtered Vector Search Benchmark

X-bench offers a controllable, system-agnostic suite for benchmarking filtered vector search, combining high-dimensional similarity lookup with structured scalar filtering to enable fair, scalable comparisons across vector databases.

Project Overview

X-bench is a controllable and end-to-end benchmark suite for evaluating filtered vector search—a core capability of modern vector databases that combines high-dimensional similarity search with structured scalar filtering. It is designed to reflect realistic filtered search scenarios, enabling fair, scalable, and explainable performance comparison across diverse database systems.

X-bench provides a modular, auto-scalable benchmarking pipeline covering data generation, query synthesis, and workload execution. The benchmark leverages statistically grounded methods—such as Johnson–Lindenstrauss–based random projection and distribution-aware query generation—to preserve vector-scalar correlations while flexibly controlling data dimension, scale, filtering rate, and filter correlation. This design enables systematic stress testing of vector databases under both static and dynamic workloads, including concurrent queries and online updates.
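
As an illustration of these two mechanisms, the sketch below shows a minimal Johnson–Lindenstrauss-style random projection and a quantile-based way to hit a target filtering rate. It is written in Python with NumPy; the function names and parameters are illustrative only and do not reflect X-bench's actual generator API.

```python
import numpy as np

def project_vectors(vectors: np.ndarray, target_dim: int, seed: int = 0) -> np.ndarray:
    """Gaussian random projection to `target_dim` dimensions.

    By the Johnson-Lindenstrauss lemma, pairwise distances are approximately
    preserved, so nearest-neighbor structure (and its correlation with scalar
    attributes) carries over to the projected benchmark data.
    """
    rng = np.random.default_rng(seed)
    src_dim = vectors.shape[1]
    # N(0, 1/target_dim) entries keep expected squared norms roughly unchanged.
    proj = rng.normal(0.0, 1.0 / np.sqrt(target_dim), size=(src_dim, target_dim))
    return vectors @ proj

def threshold_for_filtering_rate(scalars: np.ndarray, rate: float) -> float:
    """Choose a threshold so that about `rate` of rows pass `scalar <= threshold`."""
    return float(np.quantile(scalars, rate))

# Example: project 960-d base vectors down to 128-d and build a ~10%-selective filter.
rng = np.random.default_rng(1)
base_vectors = rng.normal(size=(10_000, 960)).astype(np.float32)
prices = rng.uniform(0.0, 100.0, size=10_000)

vectors_128 = project_vectors(base_vectors, target_dim=128)
threshold = threshold_for_filtering_rate(prices, rate=0.10)
passes_filter = prices <= threshold   # scalar predicate attached to each query
```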

A unified ranking-based metric (Vrank) aggregates performance across all six evaluation phases (initialization, query execution, concurrency, incremental load, update, and deletion) into a single comparable score, providing a holistic view of system efficiency and robustness.

X-bench is system-agnostic and can be easily adapted to a wide range of vector databases and indexing strategies. It encourages combined software–hardware optimization and fair comparison under standardized configurations, promoting deeper understanding of the trade-offs in filtered vector search and driving innovation in next-generation vector data systems.

With its controllable workloads, realistic data distributions, and unified metric, X-bench offers the first principled framework for ranking and understanding filtered vector search performance at scale.

Leaderboard

Click any metric to sort the table. Values highlighted in green indicate the best score for that column.

X-bench Evaluation Workflow

X-bench adopts a six-phase end-to-end evaluation workflow to comprehensively measure vector database performance across static and dynamic workloads. Each phase mirrors a real-world hybrid search stage, and the unified Vrank metric aggregates all results.

Initialization Phase

Load the initial portion of the dataset and build vector indexes.

Measures: Index construction latency (T₀)
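
For concreteness, a minimal way to measure T₀ is sketched below, using FAISS's HNSW index purely as a stand-in for the system under test; X-bench itself is system-agnostic and would issue the equivalent build call against each database's client API.

```python
import time
import numpy as np
import faiss  # stand-in index library; any vector database client could take its place

dim, n_initial = 128, 100_000
rng = np.random.default_rng(0)
initial_slice = rng.normal(size=(n_initial, dim)).astype(np.float32)

start = time.perf_counter()
index = faiss.IndexHNSWFlat(dim, 32)   # HNSW graph index with M = 32 links per node
index.add(initial_slice)               # build over the initial portion of the dataset
t0 = time.perf_counter() - start       # T0: index construction latency
print(f"T0 (index build): {t0:.2f} s for {n_initial:,} vectors")
```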

Query Execution Phase

Execute filtered vector search queries over the indexed data.

Measures: Average latency and recall
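
A sketch of how per-query latency and recall can be measured is given below. The brute-force pass over the filtered candidates provides ground truth; `system_filtered_search` is a hypothetical placeholder for the client call into the database under test.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
xb = rng.normal(size=(50_000, 64)).astype(np.float32)
scalars = rng.uniform(0.0, 100.0, size=50_000)
queries = rng.normal(size=(100, 64)).astype(np.float32)
k = 10

def exact_filtered_topk(q: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Brute-force ground truth: rank only the rows that satisfy the scalar filter."""
    candidates = np.flatnonzero(mask)
    dists = np.linalg.norm(xb[candidates] - q, axis=1)
    return candidates[np.argsort(dists)[:k]]

def system_filtered_search(q: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the system under test; it reuses the exact search
    # so the harness runs end to end. Replace with a real filtered-search client call.
    return exact_filtered_topk(q, mask)

def recall_at_k(approx_ids, exact_ids) -> float:
    """Fraction of the true top-k neighbors that the system actually returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

latencies, recalls = [], []
mask = scalars <= 10.0                      # example predicate, ~10% filtering rate
for q in queries:
    truth = exact_filtered_topk(q, mask)
    start = time.perf_counter()
    result = system_filtered_search(q, mask)
    latencies.append(time.perf_counter() - start)
    recalls.append(recall_at_k(result, truth))

print(f"avg latency: {np.mean(latencies) * 1e3:.2f} ms, recall@{k}: {np.mean(recalls):.3f}")
```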

Concurrent Phase

Run multiple filtered queries simultaneously to assess scalability.

Measures: Queries per second (QPS)
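
One way to measure QPS under load is to replay the query set from a pool of concurrent client threads, as sketched below; `run_filtered_query` is a hypothetical placeholder (here simulated with a 1 ms sleep) for an actual filtered-search client call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

N_QUERIES = 2_000

def run_filtered_query(query_id: int) -> None:
    # Hypothetical placeholder for one filtered vector search against the system
    # under test; a 1 ms sleep simulates the network and server round trip.
    time.sleep(0.001)

for workers in (1, 4, 16):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run_filtered_query, range(N_QUERIES)))
    elapsed = time.perf_counter() - start
    print(f"{workers:>2} concurrent clients: {N_QUERIES / elapsed:,.0f} QPS")
```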

Incremental Load Phase

Insert the remaining dataset portion and trigger incremental index updates.

Measures: Insertion and maintenance time (T₁)

Update Phase

Modify a subset of scalar attributes to evaluate index maintenance under updates.

Measures: Update latency (T₂)

Deletion Phase

Delete part of the dataset to study removal overhead and cleanup efficiency.

Measures: Deletion latency (T₃)
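
The three dynamic phases (incremental load, update, deletion) share the same measure-while-mutating pattern; the sketch below times each with placeholder operations. `insert_batch`, `update_scalars`, and `delete_rows` are hypothetical and would be replaced by the database's real insert, update, and delete APIs.

```python
import time

def timed(op, payload) -> float:
    """Run one maintenance operation and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    op(payload)
    return time.perf_counter() - start

# Hypothetical placeholders for the system under test; swap in real client calls
# (batched inserts, scalar attribute updates, and deletes) for an actual run.
def insert_batch(rows):   pass
def update_scalars(rows): pass
def delete_rows(ids):     pass

remaining_ids = list(range(100_000, 200_000))   # remaining portion of the dataset
changed_ids   = remaining_ids[:10_000]          # rows whose scalar attributes change
deleted_ids   = remaining_ids[10_000:20_000]    # rows removed in the deletion phase

t1 = timed(insert_batch,   remaining_ids)   # T1: insertion + incremental index maintenance
t2 = timed(update_scalars, changed_ids)     # T2: scalar attribute update latency
t3 = timed(delete_rows,    deleted_ids)     # T3: deletion and cleanup latency
print(f"T1={t1:.2f}s  T2={t2:.2f}s  T3={t3:.2f}s")
```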

Comprehensive Metric

All six phases are aggregated into the unified Vrank score, delivering a holistic ranking that captures indexing throughput, query efficiency, concurrency resilience, and dynamic maintenance cost.
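
The exact Vrank formula is not spelled out here, so the sketch below should be read as one plausible ranking-based aggregation under the assumption that each system is ranked per phase and the ranks are then averaged; the systems and numbers are purely illustrative, not real results.

```python
import numpy as np

# Illustrative per-phase measurements for three hypothetical systems (not real results).
# Columns: T0 build (s), avg latency (ms), recall, QPS, T1 (s), T2 + T3 (s).
measurements = {
    "sys_a": [120.0, 4.2, 0.95, 3500, 80.0, 30.0],
    "sys_b": [95.0, 6.1, 0.92, 2800, 60.0, 25.0],
    "sys_c": [150.0, 3.8, 0.97, 4100, 110.0, 45.0],
}
higher_is_better = [False, False, True, True, False, False]

names = list(measurements)
scores = np.array([measurements[n] for n in names], dtype=float)

# Rank systems within each phase (rank 1 = best), then average ranks into a Vrank-style score.
ranks = np.empty_like(scores)
for col, hib in enumerate(higher_is_better):
    order = np.argsort(-scores[:, col]) if hib else np.argsort(scores[:, col])
    ranks[order, col] = np.arange(1, len(names) + 1)

vrank = ranks.mean(axis=1)   # lower aggregate rank = better overall
for name, score in sorted(zip(names, vrank), key=lambda item: item[1]):
    print(f"{name}: Vrank = {score:.2f}")
```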

Team

Researchers and engineers across search, recommendation, and data infrastructure collaborate to advance benchmarking standards.

Member A

Member B

Member C

Member D