ANN Using PQ on Multivectors
Using an Approximate Nearest Neighbor (ANN) index, such as one built with product quantization (PQ), introduces an approximation compared to the exact (or native) maxsim computation. In other words, there is a trade-off between accuracy and speed—some precision is sacrificed for significantly faster search speeds. Let's break this down mathematically.
Native Maxsim Computation
For a query multivector $Q = \{q_1, q_2, \dots, q_n\}$ and a document multivector $D = \{d_1, d_2, \dots, d_m\}$, the native maxsim score is

$$
\text{maxsim}(Q, D) = \sum_{i=1}^{n} \max_{1 \le j \le m} \cos(q_i, d_j),
$$

where the cosine similarity between two vectors is

$$
\cos(q_i, d_j) = \frac{q_i \cdot d_j}{\lVert q_i \rVert \, \lVert d_j \rVert}.
$$
This computation is exact, meaning there is no approximation at this stage.
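As a concrete illustration of this formula, here is a minimal NumPy sketch of the exact maxsim computation; the function name `maxsim_exact` and the array shapes are assumptions for illustration, not part of the original text.

```python
import numpy as np

def maxsim_exact(Q: np.ndarray, D: np.ndarray) -> float:
    """Exact maxsim: sum over query vectors of the maximum cosine
    similarity against all document vectors.

    Q: (n, dim) query multivector, D: (m, dim) document multivector.
    """
    # Normalize rows so a plain dot product equals cosine similarity.
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    sims = Qn @ Dn.T                        # (n, m) matrix of cos(q_i, d_j)
    return float(sims.max(axis=1).sum())    # sum_i max_j cos(q_i, d_j)
```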
Approximate Similarity via PQ-Based Indexing
In many ANN methods, including those based on product quantization, each vector is split into $M$ sub-vectors:

$$
d_j = \bigl[\, d_j^{(1)}, d_j^{(2)}, \dots, d_j^{(M)} \,\bigr].
$$

Each sub-vector $d_j^{(k)}$ is replaced by the nearest centroid $c^{(k)}$ from a learned codebook, producing a quantized approximation

$$
\hat{d}_j = \bigl[\, c^{(1)}, c^{(2)}, \dots, c^{(M)} \,\bigr],
$$

where $\hat{d}_j \approx d_j$ but is not exactly equal to it.
When computing the aggregated multivector similarity using the index, we get:

$$
\widetilde{\text{maxsim}}(Q, D) = \sum_{i=1}^{n} \max_{1 \le j \le m} \cos(q_i, \hat{d}_j).
$$

Since each sub-vector is approximated, an error is introduced:

$$
\cos(q_i, \hat{d}_j) = \cos(q_i, d_j) + \epsilon_{ij},
$$

where $\epsilon_{ij}$ is the error caused by quantizing $d_j$.
In the worst case, if each sub-vector approximation introduces a small error $\delta_k$, the errors can accumulate up to

$$
|\epsilon_{ij}| \le \sum_{k=1}^{M} |\delta_k|,
$$

or, more realistically, a sum of errors that does not fully cancel out. Thus, the total approximation error for one cosine similarity computation is:

$$
\epsilon_{ij} = \sum_{k=1}^{M} \delta_k,
$$

where $\delta_k$ is the error contributed by the $k$-th quantized sub-vector.
Aggregating over the multivector, the approximate maxsim becomes:

$$
\widetilde{\text{maxsim}}(Q, D) = \sum_{i=1}^{n} \max_{1 \le j \le m} \bigl[ \cos(q_i, d_j) + \epsilon_{ij} \bigr],
$$

where each term inside the max now carries a small quantization error $\epsilon_{ij}$.
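To make the error term tangible, the following toy sketch quantizes document vectors sub-vector by sub-vector with small k-means codebooks and then reuses the `maxsim_exact` helper from the sketch above. Real PQ implementations store codes and use lookup tables rather than reconstructed vectors; the sub-vector count and codebook size here are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def pq_quantize(vectors: np.ndarray, n_subvectors: int = 4, n_centroids: int = 16) -> np.ndarray:
    """Return PQ-style reconstructions: each sub-vector is replaced by its
    nearest k-means centroid, so the output approximates the input with a
    small per-sub-vector error."""
    n, dim = vectors.shape
    sub_dim = dim // n_subvectors
    reconstructed = np.empty_like(vectors)
    for k in range(n_subvectors):
        sub = vectors[:, k * sub_dim:(k + 1) * sub_dim]
        km = KMeans(n_clusters=n_centroids, n_init=10, random_state=0).fit(sub)
        # Snap every sub-vector to the centroid it was assigned to.
        reconstructed[:, k * sub_dim:(k + 1) * sub_dim] = km.cluster_centers_[km.labels_]
    return reconstructed

rng = np.random.default_rng(0)
Q = rng.normal(size=(8, 64))      # query multivector: 8 token vectors of dim 64
D = rng.normal(size=(100, 64))    # document multivector: 100 token vectors of dim 64
D_hat = pq_quantize(D)

print("exact  maxsim:", maxsim_exact(Q, D))
print("approx maxsim:", maxsim_exact(Q, D_hat))
```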
Trade-off: Accuracy vs. Speed
There is a fundamental trade-off between computational accuracy and efficiency:
- Exact (Native) Calculation: The cosine similarities are computed exactly, making the results precise but computationally expensive for large datasets.
- ANN Indexing (PQ-Based Computation): The similarity is computed using quantized representations, significantly speeding up the search but introducing an error term $\epsilon_{ij}$ in each similarity computation.
Mathematically, while

$$
\widetilde{\text{maxsim}}(Q, D) \neq \text{maxsim}(Q, D),
$$
the error introduced is typically small relative to the differences in similarity scores between truly similar and dissimilar items. In practical retrieval scenarios, the most important factor is maintaining the correct ranking of items rather than the exact similarity values. Even if the absolute similarity values are slightly shifted or compressed (resulting in a smaller dynamic range), as long as the relative order is preserved, the ANN approach remains effective.
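A simple way to check the "ranking is preserved" claim empirically is to compare the order induced by exact scores with the order induced by approximate scores, for example via a rank correlation. This continues the sketches above (`maxsim_exact`, `pq_quantize`, `Q`, and `rng` are reused) and is illustrative only.

```python
from scipy.stats import kendalltau

# Score 20 random "documents" with exact vectors and with their PQ reconstructions.
docs = [rng.normal(size=(50, 64)) for _ in range(20)]
exact_scores  = [maxsim_exact(Q, d) for d in docs]
approx_scores = [maxsim_exact(Q, pq_quantize(d)) for d in docs]

tau, _ = kendalltau(exact_scores, approx_scores)
print(f"Kendall tau between exact and approximate rankings: {tau:.3f}")
# A tau close to 1.0 means the relative ordering of documents survives the
# quantization error, even though the absolute scores are shifted.
```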
Conclusion
Using an ANN index like PQ in LanceDB does introduce an approximation:
- Mathematically: The cosine similarities are replaced by their quantized approximations,

  $$
  \cos(q_i, d_j) \;\longrightarrow\; \cos(q_i, \hat{d}_j) = \cos(q_i, d_j) + \epsilon_{ij},
  $$

  leading to an aggregated maxsim computation:

  $$
  \widetilde{\text{maxsim}}(Q, D) = \sum_{i=1}^{n} \max_{1 \le j \le m} \cos(q_i, \hat{d}_j).
  $$

- Implication: While the exact similarity values differ from the native computation, the relative ranking of results is generally preserved. This trade-off is a common and accepted compromise in ANN search systems, where efficiency is prioritized over minor precision losses.
My note
This is the mathematical picture, but not necessarily the real world. In practice, we might ask the following question:
Is it possible for the error to be nearly zero in this case? Since the key operation here involves MaxSim, which finds the maximum dot product for each query and sums the maximum values to compute the final score, the situation differs from typical approximate searches. If the ANN method consistently identifies the exact maximum dot product for each query, the approximation would still yield an accurate final score with minimal loss. This should, theoretically, result in much higher accuracy compared to the usual error associated with approximate cosine similarity searches.
The short answer is:
Yes, if your ANN indexing method (e.g., using PQ) almost always returns the exact maximum candidate for each query vector, then the aggregated maxsim score will be very close to the native (exact) score. In this scenario, the error introduced by the approximation is nearly zero—especially when compared to methods that do not use the max operation or when the ANN method has lower recall.
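The sketch below illustrates why: if the quantized similarities are used only to select the best-matching document vector for each query vector, and the selected candidates are then re-scored with the exact vectors, the final score equals the native maxsim whenever the argmax is identified correctly. This two-stage rescoring is an illustrative assumption, not a claim about how LanceDB computes the score internally.

```python
import numpy as np

def maxsim_rescored(Q: np.ndarray, D: np.ndarray, D_hat: np.ndarray) -> float:
    """Two-stage maxsim: pick the best document vector per query vector using
    the quantized approximations, then compute the final score with the exact
    vectors. If the argmax is found correctly, the result is the native score."""
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    Dh = D_hat / np.linalg.norm(D_hat, axis=1, keepdims=True)

    approx_sims = Qn @ Dh.T                 # approximate cos(q_i, d_hat_j)
    best_j = approx_sims.argmax(axis=1)     # candidate argmax per query vector
    # Re-score the selected candidates exactly; any residual error comes only
    # from cases where best_j differs from the true argmax.
    exact_sims = np.einsum("id,id->i", Qn, Dn[best_j])
    return float(exact_sims.sum())
```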
Check the details if you want to know why: Is it possible for the ANN indexing and MaxSim's error to be nearly zero in this case?