ANN using PQ on multivectors

Using an Approximate Nearest Neighbor (ANN) index, such as one built with product quantization (PQ), introduces an approximation compared to the exact (or native) maxsim computation. In other words, there is a trade-off between accuracy and speed—some precision is sacrificed for significantly faster search speeds. Let's break this down mathematically.

Native Maxsim Computation

For a query multivector q and a stored item’s multivector x, the native similarity is defined as:

$$\text{maxsim}(q, x) = \sum_{i} \max_{j} \, \text{sim}(q_i, x_j)$$

where $q_i$ ranges over the query's vectors, $x_j$ over the stored item's vectors, and the cosine similarity between two vectors $q_i$ and $x_j$ is computed exactly as:

$$\text{sim}(q_i, x_j) = \frac{q_i \cdot x_j}{\|q_i\| \, \|x_j\|}$$

This computation is exact, meaning there is no approximation at this stage.
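
To make the definition concrete, here is a minimal NumPy sketch of the exact computation. The function name `exact_maxsim` and the toy shapes are purely illustrative; they are not tied to any particular library.

```python
import numpy as np

def exact_maxsim(q: np.ndarray, x: np.ndarray) -> float:
    """Exact MaxSim: for every query vector q_i, take the best cosine
    similarity over all stored vectors x_j, then sum over i."""
    q_norm = q / np.linalg.norm(q, axis=1, keepdims=True)   # unit-normalize rows
    x_norm = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = q_norm @ x_norm.T            # sims[i, j] = sim(q_i, x_j)
    return float(sims.max(axis=1).sum())

# Toy multivectors: 16 query vectors and 100 stored vectors, 64 dimensions each.
rng = np.random.default_rng(0)
query = rng.normal(size=(16, 64))
doc = rng.normal(size=(100, 64))
print(exact_maxsim(query, doc))
```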

Approximate Similarity via PQ-Based Indexing

In many ANN methods, including those based on product quantization, each vector (both the query vectors $q_i$ and the stored vectors $x_j$; we drop the multivector subscript here for readability) is split into $M$ sub-vectors. A vector $x$ can then be written as:

$$x = [\, x^{(1)}, x^{(2)}, \ldots, x^{(M)} \,]$$

Each sub-vector $x^{(m)}$ is then approximated by a centroid from a learned codebook:

$$x^{(m)} \approx c^{(m)}$$

where $c^{(m)}$ is the codebook centroid approximating the $m$-th sub-vector of $x$. As a result, each sub-vector similarity between a query vector $q$ and a stored vector $x$ is approximated by:

$$\text{sim}(q^{(m)}, x^{(m)}) \approx \text{sim}(q^{(m)}, c^{(m)})$$
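
To make the sub-vector notation concrete, the sketch below splits one vector into $M$ sub-vectors and snaps each one to its nearest centroid. The codebooks here are random stand-ins (a real PQ index learns them, typically with k-means), and all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

D, M, K = 64, 8, 256        # vector dimension, sub-vectors per vector, centroids per codebook
d_sub = D // M              # each sub-vector has D / M dimensions

# One codebook per sub-space. Random here; a real PQ index learns these with k-means.
codebooks = rng.normal(size=(M, K, d_sub))

x = rng.normal(size=D)
x_sub = x.reshape(M, d_sub)                     # x = [x^(1), ..., x^(M)]

# Encode: the PQ code of x is the index of the nearest centroid in each sub-space.
codes = [int(np.argmin(np.linalg.norm(codebooks[m] - x_sub[m], axis=1))) for m in range(M)]

# Reconstruct: replace every sub-vector x^(m) by its chosen centroid c^(m).
x_tilde = np.concatenate([codebooks[m][codes[m]] for m in range(M)])

print("codes:", codes)
print("reconstruction error:", np.linalg.norm(x - x_tilde))
```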

When computing the aggregated multivector similarity using indexing, we get:

$$\text{maxsim}(q, x) \approx \sum_{i} \max_{j} \, \text{sim}(q_i, \tilde{x}_j)$$

where $\tilde{x}_j$ denotes the PQ reconstruction of $x_j$ from its centroids.
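
Putting the pieces together, the following sketch learns a tiny set of per-sub-space codebooks with k-means, reconstructs every stored vector from its centroids, and compares the aggregated score with the exact one. It is a simplified illustration of the idea, not how LanceDB's index is actually implemented; helpers such as `train_pq` and `pq_reconstruct` are invented for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def maxsim(q, x):
    """Sum over query vectors of the best cosine similarity among stored vectors."""
    qn = q / np.linalg.norm(q, axis=1, keepdims=True)
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    return float((qn @ xn.T).max(axis=1).sum())

def train_pq(vectors, M, K):
    """Fit one k-means codebook per sub-space (a miniature, illustrative PQ)."""
    d_sub = vectors.shape[1] // M
    sub = vectors.reshape(len(vectors), M, d_sub)
    return [KMeans(n_clusters=K, n_init=4, random_state=0).fit(sub[:, m]) for m in range(M)]

def pq_reconstruct(vectors, codebooks):
    """Snap each sub-vector to its nearest centroid and stitch the pieces back together."""
    M = len(codebooks)
    d_sub = vectors.shape[1] // M
    sub = vectors.reshape(len(vectors), M, d_sub)
    parts = [cb.cluster_centers_[cb.predict(sub[:, m])] for m, cb in enumerate(codebooks)]
    return np.concatenate(parts, axis=1)

# One query multivector and one stored multivector (toy sizes).
query = rng.normal(size=(16, 64))
doc = rng.normal(size=(200, 64))

codebooks = train_pq(doc, M=8, K=16)
doc_tilde = pq_reconstruct(doc, codebooks)      # reconstructed stored vectors

exact = maxsim(query, doc)
approx = maxsim(query, doc_tilde)
print(f"exact maxsim:  {exact:.4f}")
print(f"approx maxsim: {approx:.4f}   (difference {abs(exact - approx):.4f})")
```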

Since each sub-vector is approximated, an error is introduced:

$$e^{(m)} = \text{sim}(q^{(m)}, x^{(m)}) - \text{sim}(q^{(m)}, c^{(m)})$$

where $e^{(m)}$ represents the quantization error for each $(q^{(m)}, x^{(m)})$ pair.

In the worst case, if each sub-vector approximation introduces a small error $e^{(m)}$, then over $M$ sub-vectors the total error may accumulate to an order of approximately:

$$\sum_{m=1}^{M} e^{(m)}$$

or, more realistically, to a smaller value, since errors of mixed sign partially (but not fully) cancel out. Thus, the total approximation error for one cosine similarity computation is on the order of:

$$E = O(M \cdot \bar{e})$$

where $\bar{e}$ is a typical per-sub-vector error magnitude, so $E$ depends on the quantization precision and the number of sub-vectors used.
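
To get a rough sense of scale (with purely illustrative numbers), suppose $M = 16$ and each sub-vector contributes an error of about $e^{(m)} \approx 0.002$. In the worst case, where all errors share the same sign, they accumulate to roughly $16 \times 0.002 = 0.032$; if the errors have mixed signs and partially cancel, the total behaves more like $\sqrt{16} \times 0.002 = 0.008$. Either way, $E$ grows with $M$ and shrinks as the codebooks become finer (more centroids per sub-space).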

Aggregating over the multivector, the approximate maxsim becomes:

$$\text{maxsim}(q, x) \approx \sum_{i} \max_{j} \left( \text{sim}(q_i, x_j) + e_{ij} \right)$$

where $e_{ij}$ is the net error of the approximated similarity for the pair $(q_i, x_j)$, i.e. the quantity that is carried into the maximum over the approximated cosine similarities.

Trade-off: Accuracy vs. Speed

There is a fundamental trade-off between computational accuracy and efficiency. Mathematically, while

$$\text{sim}(q^{(m)}, x^{(m)}) \approx \text{sim}(q^{(m)}, c^{(m)}),$$

the error introduced is typically small relative to the differences in similarity scores between truly similar and dissimilar items. In practical retrieval scenarios, the most important factor is maintaining the correct ranking of items rather than the exact similarity values. Even if the absolute similarity values are slightly shifted or compressed (resulting in a smaller dynamic range), as long as the relative order is preserved, the ANN approach remains effective.
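
This ranking argument is easy to check on toy data. The sketch below scores several stored items both exactly and from coarsely rounded vectors (a crude, illustrative stand-in for PQ reconstruction) and compares the two orderings; all names and numbers are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def maxsim(q, x):
    qn = q / np.linalg.norm(q, axis=1, keepdims=True)
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    return float((qn @ xn.T).max(axis=1).sum())

def coarse(x, step=0.25):
    """Crude stand-in for quantization: snap every coordinate to a grid."""
    return np.round(x / step) * step

query = rng.normal(size=(16, 64))

# Ten stored items whose true relevance to the query decreases with k:
# each mixes copies of query vectors with noise, so exact scores are well separated.
items = []
for k in range(10):
    signal = query[rng.integers(0, len(query), size=100)]
    noise = rng.normal(size=(100, 64))
    w = 1.0 - k / 10.0
    items.append(w * signal + (1.0 - w) * noise)

exact_scores = np.array([maxsim(query, it) for it in items])
approx_scores = np.array([maxsim(query, coarse(it)) for it in items])

exact_order = np.argsort(-exact_scores)
approx_order = np.argsort(-approx_scores)
print("exact ranking :", exact_order)
print("approx ranking:", approx_order)
print("same order    :", bool(np.array_equal(exact_order, approx_order)))
```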

Conclusion

Using an ANN index like PQ in LanceDB does introduce an approximation: each sub-vector is replaced by a codebook centroid, so the individual similarities, and with them the aggregated maxsim score, can drift slightly from their exact values. In return you get much faster search, and because retrieval quality depends mainly on the ranking of items rather than on exact scores, that ranking is usually preserved.

My note

This is the mathematical picture, but not necessarily the real world. In the real world, we might ask the following question:

Is it possible for the error to be nearly zero in this case? The key operation here is MaxSim, which finds the maximum dot product for each query vector and sums those maxima to compute the final score, so the situation differs from a typical approximate search. If the ANN method consistently identifies the true maximum for each query vector, the approximation would still yield an accurate final score with minimal loss. This should, in theory, result in much higher accuracy than the usual error associated with approximate cosine similarity search.

The short answer is:

Yes, if your ANN indexing method (e.g., using PQ) almost always returns the exact maximum candidate for each query vector, then the aggregated maxsim score will be very close to the native (exact) score. In this scenario, the error introduced by the approximation is nearly zero—especially when compared to methods that do not use the max operation or when the ANN method has lower recall.
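
One way to see why is to check, for each query vector, whether the approximated similarities still point at the same best stored vector, and then rescore those winners with exact similarities. The sketch below does this with a coarse-rounding stand-in for PQ; it illustrates the argument and is not LanceDB's actual retrieval path.

```python
import numpy as np

rng = np.random.default_rng(2)

def cosine_matrix(q, x):
    qn = q / np.linalg.norm(q, axis=1, keepdims=True)
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    return qn @ xn.T                               # sims[i, j] = sim(q_i, x_j)

query = rng.normal(size=(16, 64))
doc = rng.normal(size=(200, 64))
doc_tilde = np.round(doc / 0.25) * 0.25            # coarse stand-in for a PQ reconstruction

exact_sims = cosine_matrix(query, doc)
approx_sims = cosine_matrix(query, doc_tilde)

exact_best = exact_sims.argmax(axis=1)             # true best stored vector per query vector
approx_best = approx_sims.argmax(axis=1)           # best candidate found via the approximation

# How often does the approximate search still point at the true maximum?
agreement = float(np.mean(exact_best == approx_best))

native_score = float(exact_sims.max(axis=1).sum())
# Rescore the candidates chosen by the approximate search with exact similarities.
# Whenever the argmax was found correctly, the rescored total matches the native score;
# each miss can only make the rescored total slightly smaller.
rescored = float(exact_sims[np.arange(len(query)), approx_best].sum())

print(f"argmax agreement rate: {agreement:.2f}")
print(f"native maxsim:         {native_score:.4f}")
print(f"rescored maxsim:       {rescored:.4f}")
```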

See the details if you want to know why: "Is it possible for the ANN indexing and MaxSim's error to be nearly zero in this case?"