Advance Submission: Vector Search

Advance Rewards Submission

We are delighted to share that our first Advance submission for Vector Search has been made public today!

The algorithm, created by a research team at Granite Labs LLC, is now open for the community to review.

Please see here: Advance Evidence Form.
The code submission that embodies the method described above: Code Submission.

You can compare this submission against state-of-the-art (SOTA) algorithms using Granite Labs LLC’s TIG-SOTA-reproduce repo.

You are invited to explore the evidence and code and prepare to cast your vote in the token-weighted vote on the submission’s eligibility for Advance rewards. Voting will open at the beginning of the next round (round 83) and remain open until the end of that round.

Vector Search

Vector search is a powerful method for finding similar items within a massive dataset. It works by converting data, whether it’s a document, an image, or even a molecule, into a numerical representation known as a vector.

Vectors capture the most important attributes of an item, and items with shared traits will have vectors that are numerically close to one another. Vector search, also known as Approximate Nearest Neighbor (ANN) search, is the process of efficiently finding the “closest” vectors in a dataset to a given “query” vector.
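As a toy illustration (not part of the submission), exact nearest-neighbour search over a handful of vectors takes only a few lines; ANN methods approximate exactly this computation at scale:

```python
import numpy as np

# Toy "database": five items embedded as 3-dimensional vectors.
database = np.array([
    [0.9, 0.1, 0.0],   # item 0
    [0.8, 0.3, 0.1],   # item 1
    [0.0, 1.0, 0.2],   # item 2
    [0.1, 0.9, 0.3],   # item 3
    [0.5, 0.5, 0.5],   # item 4
])

# A query vector representing, say, a user's preferences.
query = np.array([0.85, 0.15, 0.05])

# Exact (brute-force) nearest neighbour under Euclidean distance.
distances = np.linalg.norm(database - query, axis=1)
nearest = int(np.argmin(distances))
```

Real systems replace the exhaustive scan with an approximate index or, as in this submission, with compressed-domain screening.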

You likely interact with vector search algorithms daily without realizing it. When Netflix recommends a particular series, it’s using vector search. It converts your history into a vector of your viewing preferences, then matches it against similar shows.

Why It Matters

Vector search is a fundamental technology that powers countless applications, especially in the age of big data and AI. It’s crucial for:

  • Navigating Huge Datasets: It excels at sifting through millions or even billions of data points to find relevant results quickly.
  • Handling Complex Data: It can efficiently search high-dimensional data, like the complex vectors that represent images or text.
  • Powering Real-Time Applications: Its speed makes it ideal for recommendation engines, search systems, fraud detection, and Retrieval-Augmented Generation (RAG), which helps Large Language Models (LLMs) access more relevant and up-to-date information.

Applications

The Vector Search challenge hosted on TIG focuses on an important area: vector search on dynamic datasets. Most real-world data isn’t static; it’s constantly changing. New content is created, old content is updated, and some is deleted. For vector search to be effective in these scenarios, the system must handle these continuous updates in real time without degrading search quality or speed.

This is essential for applications such as:

  • Online Data Collection: Such as in autonomous navigation systems that constantly process new sensor data.
  • Real-Time Machine Learning: Where ML models are continuously updated, changing the vector representations of data.
  • Dynamic Robotics: For tasks like motion planning in ever-changing environments.
  • Molecular Similarity Search: Accelerating drug discovery by rapidly finding compounds with similar molecular structures.

Vector search underpins critical production systems, many of which run continuously 365 days a year. Improvements in efficiency compound into substantial savings in compute, energy, and infrastructure costs.

This submission represents a potential step forward in one of the most fundamental problems in computer science. Your participation in reviewing, discussing, and voting will help determine whether it qualifies for Advance Rewards.


I think a lot of people might not realize how huge this advance submission actually is. Most of the current SOTA vector search methods (HNSW, IVF, PQ, etc.) are only really fast once you’ve built a heavy index. That’s fine if your data never changes, but in real life data is constantly being updated. Every rebuild costs time and money, and that’s where most systems get bogged down.

What makes stat_filter different is that it basically skips the heavy index step and still manages to be both fast and accurate. It shifts the expensive math into lightweight GPU operations and then double-checks a tiny shortlist to make sure nothing is missed. The end result is near-perfect accuracy at speeds that are tens or even hundreds of times faster in the situations that matter most: when your dataset is changing all the time.

That has huge implications for real-world systems such as recommendation engines, fraud detection, search, and even retrieval-augmented generation for LLMs. This is not a minor tweak; it is a massive efficiency improvement. At scale, it can translate directly into major cost savings and faster performance for services people rely on every day.

While stat_filter currently won’t topple IVF/HNSW it has certainly found a niche that will be very attractive for someone.

The guys at Granite Labs have done a fantastic job with this vector_search algorithm, one that truly embodies what TIG is all about. This is proper innovation! :slight_smile:

In my opinion this deserves Advance rewards. It’s not just a simple improvement; it’s a game-changer that lays the foundations for reshaping the vector_search landscape.


Thanks to @Aoibheann for opening the discussion, and to @Jake_Logos for weighing in. Your perspectives, one focused on practical overhead and the other on operational reality, capture exactly the gap we set out to close.

The core idea behind Stat_Filter is simple and unified: do most of the similarity work in the compressed domain, then verify a tiny shortlist exactly. Concretely, we use a bit‑sliced, low‑bit comparator (the same mechanism we originally prototyped at 1–2 bits, now expanded to 2–4 bits per dimension) to score candidates with GPU bitwise ops plus popcount across bit‑planes. We optionally apply a robust MAD gate when the distribution helps, and disable it when it hurts (e.g., heavy‑tailed SIFT‑1M). Finally, we run a small FP16 exact re‑rank over the shortlist to return the true nearest neighbor. There are two tunable knobs, (1) K (shortlist size) and (2) MAD (gate aggressiveness), which let practitioners move along the speed/recall curve without changing the method. This is incredibly useful by itself, reducing context switching and cognitive overhead.

For those focused on theoretical foundations and implications, the contribution is a clean, analyzable pipeline: rate-distortion in the quantizer; bit‑plane weighted scoring that’s monotone with respect to the underlying metric in expectation; and a correctness‑preserving exact re‑rank that bounds error while keeping the computational budget in O(K) per query. We also address the details that usually get hand‑waved: per‑dimension range selection (e.g., ~80% caps on heavy‑tailed dims), dataset‑aware tunable gating policies (MAD off for SIFT‑1M, moderate MAD for Fashion‑60K), and explicit reporting of “reported” vs “actual” recall when an answer key contains label noise.

With a focus on industry and production applications, the meaning is straightforward: we move the cost out of index rebuilds (often run on 64–128 CPUs for flashy benchmark numbers, but typically run as single‑CPU builds “in the wild”) and into fast, predictable GPU work. There’s essentially no heavy index to build or warm: you can ingest and serve immediately, the latency envelope is dominated by lightweight bit‑ops and a tiny exact pass, and the operational surface area shrinks significantly. We also evaluated in a way that mirrors how teams actually deploy: single‑CPU builds, one GPU, and results reported as total time (build + search), not just search‑only. That fairness constraint makes the numbers believable and portable.

This raises the question: do classic IVF/HNSW/PQ still have a place? Absolutely; we’re not suggesting this is a replacement for everything. If your data is largely static and your batches are huge, amortization can work in favor of IVF et al. But modern production systems increasingly live in the small‑to‑mid batch, continuously updating regime. That’s the operating point Stat_Filter targets, and it’s where we believe replacing most FLOPs with bit‑ops and exactly re‑ranking a handful of candidates is a very good, and quite novel, trade. In practice, this can sit in front of existing stacks as a drop‑in prefilter, or stand on its own when rebuild windows are the bottleneck.

This qualifies as an Advance because it’s not a clever micro‑optimization of a known index; it’s a completely different cost model for ANN that remains accurate by construction (thanks to the exact re‑rank), hardware‑efficient (with bit‑sliced GPU scoring), and operationally honest (using single‑CPU‑build totals that reflect real‑world dynamics). It generalizes across datasets with clear, reproducible policies, and it opens room for hybrid designs that combine compressed‑domain screening with light partitioning or graphs where appropriate.

In short, Stat_Filter makes vector search less brittle and more deployable in the places real systems actually live in 2025 and beyond. That is the kind of algorithmic shift an Advance would recognize. We welcome careful scrutiny of the code, the settings, and the fairness constraints, and I’m confident the community will find this a solid foundation that others can build on and commercialize.

Looking forward to hearing from the community and happy to discuss further.

Inventor,
Granite Labs LLC


The Vector Search Challenge

The TIG vector search challenge has Innovators build algorithms that receive a database and query set of vectors. For each query vector, the algorithm must return a candidate database vector, such that the mean distance between the queries and the candidates is within some threshold. The database and query vectors are sampled from a mixture of Gaussians. Each instance of the TIG challenge is a new sampling of the database and query set.
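A minimal sketch of what such a challenge instance looks like (illustrative only; all names and parameters below are ours, not TIG’s actual API):

```python
import numpy as np

rng = np.random.default_rng(0)
dims, n_components = 16, 4

# Component means of the Gaussian mixture shared by database and queries.
means = rng.normal(0.0, 5.0, size=(n_components, dims))

def sample(n):
    """Draw n vectors from the mixture: pick a component, add unit noise."""
    comps = rng.integers(0, n_components, size=n)
    return means[comps] + rng.normal(0.0, 1.0, size=(n, dims))

database = sample(1000)   # a fresh sampling per challenge instance
queries = sample(50)

# An algorithm must return one candidate per query such that the mean
# query-to-candidate distance stays within the instance's threshold.
# Here we use exact search as the reference answer.
candidates = np.array([
    int(np.argmin(np.linalg.norm(database - q, axis=1))) for q in queries
])
mean_dist = float(np.mean(
    np.linalg.norm(database[candidates] - queries, axis=1)
))
```

An Innovator’s algorithm replaces the exact scan above with something faster while keeping `mean_dist` under the threshold.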

Stat-filter High-Level Algorithm Flow

The Stat-filter algorithm has two main components: first, a quantization is performed on all the vectors; then a MAD filter, based on the L2 norms of the bit-quantized vectors, is applied before the search phase begins.

  • Quantization.
    First the per-dimension min/max of the vectors is identified. A data shift is then applied: let \text{overall}\_\min be the minimum across all dimensions of all database vectors; if it is negative, a uniform shift is applied (every component of every vector has \text{overall}\_\min subtracted from it), so that all data is non-negative. Using the per-dimension min/max, the vectors are then quantized: each vector’s component in each dimension is binned into 1 of either 4 or 16 bins, corresponding to 2-bit or 4-bit quantization. By default 4-bit quantization is used, unless STATFILT_BIT_MODE=2, in which case 2-bit quantization is used.
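A minimal NumPy sketch of this quantization step as we read it (function and variable names are ours, not the submission’s):

```python
import numpy as np

def quantize(vectors, bits=4):
    """Per-dimension min/max binning into 2**bits levels (2- or 4-bit modes)."""
    overall_min = vectors.min()
    if overall_min < 0:
        # Uniform shift: subtract the (negative) overall minimum so that
        # every component becomes non-negative.
        vectors = vectors - overall_min
    lo = vectors.min(axis=0)                 # per-dimension minimum
    hi = vectors.max(axis=0)                 # per-dimension maximum
    span = np.where(hi > lo, hi - lo, 1.0)   # guard flat dimensions
    levels = (1 << bits) - 1                 # 3 for 2-bit, 15 for 4-bit
    return np.round((vectors - lo) / span * levels).astype(np.uint8)
```

With `bits=4` each component lands in one of 16 bins; with `bits=2`, one of 4, mirroring the STATFILT_BIT_MODE switch described above.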

  • MAD filter.
    First the L2 norms of the quantized data are calculated; note that MAD is computed on the quantized norms, not the original norms. This creates a length-based filter that rejects database vectors whose lengths make them unlikely to be close to a query. Practically, the quantized database norms are sorted and their median is found; the absolute deviation between each database vector’s norm and this median is computed, and the median of those deviations is the value \text{mad}. A threshold is then constructed via the formula

\text{threshold} = \text{scale} \times \text{mad} \times \text{MAD\_SCALE\_NORMAL}

For each query vector q and database vector p, if

|\|q\|_2-\|p\|_2|>\text{threshold},

then when performing a search with q we skip p. The two scaling terms are:

  • \text{MAD\_SCALE\_NORMAL}, which is set to 1.4826. This constant comes from the property that for a Normal distribution X \sim N(\mu,\sigma^2), if one defines \text{MAD} = \text{median}(|X - \text{median}(X)|), then \sigma \approx 1.4826 \times \text{MAD}. In the case of the submission, X is the quantized L2 norm of a vector (which, in high dimensions, is approximately normally distributed). The threshold therefore effectively measures how many standard deviations away from the median a norm lies. The filtering itself is justified by the reverse triangle inequality, |\|q\|_2 - \|p\|_2| \le \|q-p\|_2: if two norms differ by more than the threshold, the vectors themselves must be at least that far apart.

  • The parameter \text{scale} adapts selectivity: for few queries (≤700) it is 0.20, giving fast but aggressive filtering (~80–90% removed), while for many queries (≥2500) it is 1.00, giving broader coverage (~50–60% removed) to preserve accuracy.
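Putting the gate together, a small NumPy sketch (names are ours; the constant and threshold formula follow the description above):

```python
import numpy as np

MAD_SCALE_NORMAL = 1.4826  # sigma ≈ 1.4826 × MAD for Normal data

def mad_threshold(db_norms, scale):
    """Build the length-based rejection threshold from quantized L2 norms."""
    med = np.median(db_norms)
    mad = np.median(np.abs(db_norms - med))
    return scale * mad * MAD_SCALE_NORMAL

def keep_mask(query_norm, db_norms, threshold):
    """True where a database vector survives the filter for this query."""
    return np.abs(db_norms - query_norm) <= threshold

# Toy quantized norms; 30.0 is far from the median and gets gated out.
db_norms = np.array([10.0, 10.5, 11.0, 30.0, 9.5])
thr = mad_threshold(db_norms, scale=1.0)
mask = keep_mask(query_norm=10.2, db_norms=db_norms, threshold=thr)
```

With `scale=1.0` here the gate keeps only vectors whose quantized norm is within roughly 0.74 of the query’s; lowering `scale` (as for small query counts) tightens the gate further.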

Brute Force Search. For each query, a GPU block computes the quantized dot product between it and non-filtered database vectors using bit-sliced operations. Each bit plane is compared using AND operations, then popcount tallies the matches. These counts are weighted by powers of 2 and summed to reconstruct the approximate dot product in quantized space. Each thread tracks its own top-K candidates, then all threads combine results to find the final top-K for exact reranking.
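A pure-Python illustration of the bit-sliced reconstruction (a sketch of the idea, not the GPU kernel): each quantized value is split into bit-planes packed as integer bitmasks; plane pairs are ANDed, popcounted, and weighted by powers of two, recovering the exact quantized dot product:

```python
def to_bitplanes(vals, bits=4):
    """Pack bit `a` of every quantized value into one integer bitmask."""
    planes = []
    for a in range(bits):
        plane = 0
        for i, v in enumerate(vals):
            if (v >> a) & 1:
                plane |= 1 << i
        planes.append(plane)
    return planes

def bitsliced_dot(q_planes, p_planes):
    """Reconstruct sum(q[i] * p[i]) from AND + popcount over plane pairs."""
    total = 0
    for a, qa in enumerate(q_planes):
        for b, pb in enumerate(p_planes):
            total += (1 << (a + b)) * bin(qa & pb).count("1")
    return total
```

The GPU version does the same thing with word-level AND and hardware popcount across many vectors at once; the exact FP16 re-rank then runs only on the resulting top-K.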

Performance

Here we evaluate the performance of the submission by testing it against other vector search algorithms. We stick to vector search algorithms that have a low build time. We test the algorithms on the fashion-mnist-784-euclidean dataset.


Figure: two bar plots comparing the stat_filter algorithm against IVF-Flat, IVF-PQ, and Brute Force, showing total time (left) and recall (right), where recall is the proportion of candidate neighbours that were the true nearest neighbour. The submitted algorithm stat_filter achieves 0.95 recall whilst being the fastest algorithm. We used the provided SOTA repo to produce these results. We also point to the evidence form, which shows stat_filter’s performance against improved_search, a previously high-performing merged TIG algorithm.

Claim of Novelty

According to the Advance Evidence Form, the claim of novelty relies on the following:

  1. The method itself is entirely new,

  2. The method represents a new combination of prior art,

  3. The method applies known techniques in a novel way that produces a distinct technical effect.

As described above in the Stat-filter High-Level Algorithm Flow, the submission combines bit quantization with a Median Absolute Deviation (MAD) gating layer to eliminate low-probability candidate vectors before the costly distance computation stage of an Approximate Nearest Neighbour (ANN) pipeline. In the search phase, the method then applies an optimized brute-force approach, where a quantized dot product is used to efficiently search for approximate nearest neighbours. Within the ANN literature on filtering, this approach clearly belongs to the pre-filtering category: for each query, the database is filtered first, and search is performed only on the reduced set.

We now assess this method against the three points of novelty, noting that only one of these needs to be true in order to be considered “novel” (novelty being a requirement to be eligible for Advance rewards):

  1. Entirely new method
    Neither MAD filtering nor bit quantization is new in itself, and bit quantization methods for ANN are well established [1]. To the best of our knowledge, however, MAD filtering has not been applied in this specific way to ANN search. Typical uses of MAD are outlier detection [2] or robust data scaling [3]. General filtering in ANN is also not new, with common examples including:
  • Metadata filters: restricting search to subsets of vectors based on attached scalar values [4]

  • IVF methods: using cluster centroids for coarse partitioning (IVF-flat) or quantization (IVF-PQ) [5]

  • Locality-Sensitive Hashing (LSH): bucketing vectors by random hash functions [6]

  2. New combination of prior art
    To the best of our knowledge, there are no prior examples of combining bit-quantized MAD L2-norm filtering with bit-sliced similarity evaluation in this way for ANN search. However, using bit-quantized vectors for distance calculations is not in itself a new combination.

  3. Novel technical effect
    The specific way these techniques are combined produces tangible technical benefits. MAD L2-norm filtering becomes significantly faster because it operates on quantized values, and it in turn reduces the number of candidate vectors considered in the distance search. Since bit-sliced similarity itself requires bit quantization, the pipeline is tightly coupled, leading to improved efficiency overall. However, as advised by the submitter, when running the method on heavy-tailed distributions (such as the SIFT dataset), STATFILT_MAD_SCALE must be set to 0; any non-zero value will cause severe recall degradation. Hence, when STATFILT_MAD_SCALE=0, the technical effect of the submission is attributable only to the quantization and bit-slicing, not the pre-filter.

The submitted evidence form reflects this analysis. It acknowledges that the method does not meet criterion (1) but makes a case that it qualifies for novelty under criteria (2) and (3).

Claim of inventiveness

Having established novelty, we now turn to inventiveness (or “non-obviousness”). The evidence from Granite Labs argues that the submission is non-obvious because a person of ordinary skill in the art (POSITA) would not naturally expect that combining a robust statistical measure from outside the ANN field with coarse quantization would yield state-of-the-art results.

Most existing ANN filters are based on mean statistics or clustering. In this context, the use of the Median Absolute Deviation (MAD) as a pre-filter is unconventional. MAD is typically used in outlier detection or robust scaling, but its application here to reject improbable candidates in a high-dimensional ANN setting is not an obvious choice. The method achieves two outcomes:

  • predictable speedups from filtering and low-precision computation, and

  • unexpectedly high recall, even after pre-filtering and quantized similarity evaluation.

The latter result strengthens the inventiveness claim. A POSITA might anticipate efficiency gains but would reasonably doubt whether accuracy could be preserved under such coarse approximations. The fact that the method demonstrates both efficiency and strong recall suggests that its effectiveness is not obvious from prior art.

Summary

The submission combines two well-known mathematical methods but applies them jointly to ANN search in a way that, to the best of our knowledge, has not previously been documented. The combination achieves clear technical benefits, as shown both in the evidence form and in our performance evaluations. Importantly, this method should not be directly compared to ANN algorithms that rely on pre-built or learned structures, since those approaches achieve faster query times at the cost of substantially higher build times.



  1. Weaviate Documentation. Binary Quantization. Available at: https://docs.weaviate.io/weaviate/concepts/vector-quantization#binary-quantization. ↩︎

  2. Zilliz. How do I detect and handle outlier embeddings? Available at: https://zilliz.com/ai-faq/how-do-i-detect-and-handle-outlier-embeddings. ↩︎

  3. Kalinin, A. A., Arevalo, J., Serrano, E., Vulliard, L., Tsang, H., Bornholdt, M., Muñoz, Á. F., Sivagurunathan, S., Rajwa, B., Carpenter, A. E., et al. (2025). A versatile information retrieval framework for evaluating profile strength and similarity. Nature Communications, 16(1), 5181. ↩︎

  4. Lin, Y., Zhang, K., He, Z., Jing, Y., & Wang, X. S. (2025). Survey of Filtered Approximate Nearest Neighbor Search over the Vector-Scalar Hybrid Data. arXiv preprint arXiv:2505.06501. ↩︎

  5. Jégou, H., Douze, M., & Schmid, C. (2010). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 117–128. ↩︎

  6. Gionis, A., Indyk, P., & Motwani, R. (1999). Similarity search in high dimensions via hashing. In Proceedings of VLDB (Vol. 99, pp. 518–529). ↩︎


review_alex.pdf (155.7 KB)
