IVF-PQ Integration

IVF and PQ combine hierarchically to achieve sub-linear search time and memory efficiency. Let’s formalize their synergy using our toy dataset and mathematical analysis.

1. Workflow Overview

Indexing Phase (Preprocessing):

  1. IVF Clustering: Partition data into K clusters.
  2. PQ Compression: Encode vectors in each cluster into PQ codes.

Query Phase (Search):

  1. IVF: Find the nprobe closest clusters to q.
  2. PQ: Compute approximate distances within those clusters using LUTs.

2. Indexing Phase (Step-by-Step)

Step 1: IVF Clustering

  1. Input: Dataset X = {x_1, ..., x_n} ⊂ ℝ^d.
  2. K-means: Partition X into K clusters with centroids {μ1,...,μK}.
  3. Output: Inverted lists mapping centroids to their vectors.

Example:
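A minimal sketch of the clustering step, using a hypothetical 2D toy dataset (the exact points are assumptions; only the middle group mirrors the vectors in the PQ table below):

```python
import numpy as np

# Hypothetical toy dataset: three well-separated groups in R^2.
X = np.array([
    [1.0, 1.1], [0.9, 0.8], [1.2, 1.0],   # group A
    [5.2, 5.0], [5.1, 4.8], [5.3, 5.1],   # group B ("Cluster 2")
    [9.0, 1.0], [8.8, 1.2], [9.1, 0.9],   # group C
])

def kmeans(X, K, iters=20):
    """Plain Lloyd's algorithm with a fixed (deterministic) init for the demo."""
    centroids = X[:: len(X) // K][:K].astype(float)
    for _ in range(iters):
        # assign every point to its nearest centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        # move each centroid to the mean of its member points
        for k in range(K):
            if (assign == k).any():
                centroids[k] = X[assign == k].mean(axis=0)
    return centroids, assign

K = 3
centroids, assign = kmeans(X, K)

# Inverted lists: cluster id -> indices of the vectors it owns
inverted_lists = {k: np.where(assign == k)[0].tolist() for k in range(K)}
print(inverted_lists)  # -> {0: [0, 1, 2], 1: [3, 4, 5], 2: [6, 7, 8]}
```

Each cluster keeps only the IDs of its members, so a query later touches just the probed lists rather than all of X.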

Step 2: PQ Compression

  1. Subspace Decomposition: Split each cluster’s vectors into m subspaces.
  2. Codebook Training: Learn h centroids per subspace via k-means.
  3. PQ Encoding: Replace subvectors with centroid indices.

Example:

| Original Vector | PQ Code | Reconstructed Vector |
|---|---|---|
| [5.2, 5.0] | (2, 2) | [5.2, 4.97] |
| [5.1, 4.8] | (2, 2) | [5.2, 4.97] |
| [5.3, 5.1] | (2, 3) | [5.2, 5.1] |
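The encodings above can be reproduced with per-subspace codebooks. The entries at indices 2 and 3 come from the document's reconstructed values; the remaining codewords are placeholder assumptions to fill out h = 4 codewords:

```python
import numpy as np

# Codebooks per subspace (d=2, m=2, so each subspace is one coordinate).
# Index 2 of subspace 1 is 5.2; indices 2 and 3 of subspace 2 are 4.97
# and 5.1 (from the table); other entries are made-up placeholders.
codebooks = [
    np.array([1.0, 3.0, 5.2, 7.0]),    # subspace 1 (first coordinate)
    np.array([1.0, 3.0, 4.97, 5.1]),   # subspace 2 (second coordinate)
]

def pq_encode(x):
    """Replace each subvector with the index of its nearest codeword."""
    return tuple(int(np.argmin((cb - xj) ** 2)) for cb, xj in zip(codebooks, x))

def pq_decode(code):
    """Reconstruct the vector from its centroid indices."""
    return [float(codebooks[j][k]) for j, k in enumerate(code)]

for x in [[5.2, 5.0], [5.1, 4.8], [5.3, 5.1]]:
    code = pq_encode(x)
    print(x, "->", code, "->", pq_decode(code))
```

Note how [5.2, 5.0] and [5.1, 4.8] collapse to the same code (2, 2): that collision is exactly the quantization error PQ trades for compression.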

3. Query Phase (Step-by-Step)

Step 1: IVF Cluster Selection

  1. Compute Distances to Centroids: ‖q − μ_k‖² for k = 1, ..., K.
  2. Select the nprobe Closest Clusters:
    • Example: For q = [5.4, 5.2], probe Cluster 2.
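A sketch of the selection step, with assumed centroid positions (indices are 0-based here, so the document's "Cluster 2" is index 1):

```python
import numpy as np

# Hypothetical IVF centroids; the second one sits near the query.
centroids = np.array([[1.0, 1.0], [5.2, 5.0], [9.0, 1.0]])
q = np.array([5.4, 5.2])

# ||q - mu_k||^2 for every centroid, then keep the nprobe closest
d2 = ((centroids - q) ** 2).sum(axis=1)
nprobe = 1
probed = np.argsort(d2)[:nprobe]
print(probed)  # -> [1]  (0-based index of "Cluster 2")
```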

Step 2: PQ Distance Approximation

  1. Precompute LUTs:
    For each subspace j, compute the distance from q^(j) to every codebook centroid c_{j,k}:

    LUT_j[k] = ‖q^(j) − c_{j,k}‖²
  2. Search in Probed Clusters:
    For each PQ code (k_1, ..., k_m) in the cluster:

    d̃²(q, x) = Σ_{j=1}^{m} LUT_j[k_j]

Example:

| PQ Code | d̃² |
|---|---|
| (2, 2) | 0.04 + 0.05 = 0.09 |
| (2, 2) | 0.09 |
| (2, 3) | 0.04 + 0.01 = 0.05 |
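These numbers follow directly from the LUTs (codebooks as assumed earlier; (2, 3) differs from (2, 2) only in the second subspace):

```python
import numpy as np

# Query and per-subspace codebooks from the running example (entries other
# than indices 2 and 3 are assumed placeholders).
q = [5.4, 5.2]
codebooks = [
    np.array([1.0, 3.0, 5.2, 7.0]),    # subspace 1
    np.array([1.0, 3.0, 4.97, 5.1]),   # subspace 2
]

# Step 1: one lookup table per subspace: LUT_j[k] = (q^(j) - c_{j,k})^2
luts = [(cb - qj) ** 2 for cb, qj in zip(codebooks, q)]

# Step 2: the approximate distance for a PQ code is a sum of m lookups
def adc(code):
    return sum(luts[j][k] for j, k in enumerate(code))

for code in [(2, 2), (2, 2), (2, 3)]:
    print(code, round(float(adc(code)), 2))
```

Each stored vector costs only m table lookups and m − 1 additions, never a full d-dimensional distance computation.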

4. Error Sources & Mitigation

Error 1: IVF Cluster Miss

If the true nearest neighbor falls in a cluster that is not among the nprobe probed clusters, it can never be returned. Mitigation: raise nprobe (trading search speed for recall).

Error 2: PQ Quantization

PQ compares q against quantized reconstructions, so approximate distances carry an error ε_PQ that can reorder close candidates. Mitigation: raise m or h, or re-rank the top candidates with exact distances.

Example:
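A toy illustration of a cluster miss and its mitigation via nprobe (all coordinates are assumptions): the true nearest neighbor sits just across a cluster boundary, so with nprobe = 1 it is never examined, while nprobe = 2 recovers it.

```python
import numpy as np

centroids = np.array([[0.0, 0.0], [4.0, 0.0]])
database = np.array([[1.0, 0.0],    # owned by cluster 0
                     [2.1, 0.0]])   # owned by cluster 1, yet closest to q
q = np.array([1.9, 0.0])

# each database vector belongs to the cluster with the nearest centroid
owner = ((database[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)

def ivf_search(nprobe):
    """Return the index of the best candidate found in the probed clusters."""
    d2c = ((centroids - q) ** 2).sum(axis=1)
    probed = set(np.argsort(d2c)[:nprobe].tolist())
    cands = [i for i in range(len(database)) if owner[i] in probed]
    return min(cands, key=lambda i: float(((database[i] - q) ** 2).sum()))

print(ivf_search(1), ivf_search(2))  # -> 0 1
```

With nprobe = 1 only cluster 0 is scanned and the search settles for vector 0; probing both clusters finds the true neighbor, vector 1.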

5. Parameter Tuning

| Parameter | IVF Impact | PQ Impact |
|---|---|---|
| num_partitions (K) | Higher K → smaller clusters, lower IVF error. | No direct impact. |
| nprobe | Higher → slower search, better recall. | No direct impact. |
| num_sub_vectors (m) | No direct impact. | Higher m → lower PQ error, slower LUTs. |
| accelerator | Faster k-means clustering. | Faster codebook training. |

| Parameter | Effect on Efficiency | Effect on Accuracy |
|---|---|---|
| num_partitions (K) | Higher K → smaller clusters, faster search. | Too high K may fragment true neighbors. |
| nprobe | Higher nprobe → slower search. | Higher recall. |
| num_sub_vectors (m) | Higher m → slower but more accurate ADC. | Lower quantization error ε_PQ. |

Example:
For a 1536D vector:
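Back-of-the-envelope arithmetic for the 1536D case, assuming h = 256 codewords per subspace (i.e. 1-byte codes, a common default):

```python
d = 1536                     # vector dimensionality
m = 16                       # num_sub_vectors
dims_per_sub = d // m        # dimensions handled by each subquantizer
h = 256                      # assumed codewords per subspace -> 1-byte codes

raw_bytes = d * 4            # float32 storage per vector
pq_bytes = m * 1             # one 8-bit code per subspace

print(dims_per_sub, raw_bytes, pq_bytes, raw_bytes // pq_bytes)
# -> 96 6144 16 384
```

So each subquantizer covers 96 dimensions, and a 6144-byte float32 vector shrinks to a 16-byte code, a 384× compression.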

6. Real-World Scaling

For 1B vectors in ℝ^1536:
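Back-of-the-envelope figures, assuming the parameters from the code below (K = 256, nprobe = 20, m = 16, 1-byte codes) and uniformly sized clusters:

```python
n = 1_000_000_000
d = 1536
m = 16

raw_tb = n * d * 4 / 1e12          # float32 storage: ~6.1 TB
pq_gb = n * m / 1e9                # PQ codes: 16 GB

K = 256                            # num_partitions (assumed, from the example)
nprobe = 20
scanned = n * nprobe // K          # ~78M codes scanned per query, not 1B

print(raw_tb, pq_gb, scanned)
```

PQ makes the index small enough to hold in RAM, and IVF ensures each query touches under 8% of it.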

7. Summary

| Aspect | IVF-PQ Integration |
|---|---|
| Indexing | Hierarchical: clusters (IVF) + compressed codes (PQ). |
| Querying | Two-stage: coarse search (IVF) + fine approximation (PQ). |
| Efficiency | Combines IVF's search-space reduction with PQ's compression for a multiplicative speedup. |
| Accuracy Control | Tunable via K, nprobe, m, and h. |

8. Back to Your Code

```python
def create_indexed_table(...):
    # IVF-PQ parameters
    target_table.create_index(
        num_partitions=256,    # K: 256 IVF clusters
        num_sub_vectors=16,    # m: split 1536D into 16 subspaces of 96 dims (PQ)
        nprobe=20,             # probe 20/256 clusters (~7.8%); note that some
                               # libraries take this per query, not at index time
        accelerator="cuda"     # GPU for faster k-means and codebook training
    )
```

Why This Works:

  • num_partitions=256 with nprobe=20 scans only ~7.8% of the inverted lists, cutting per-query work by roughly 13×.
  • num_sub_vectors=16 compresses each 1536D float32 vector (6144 bytes) down to a 16-byte PQ code.
  • accelerator="cuda" offloads the expensive k-means and codebook training to the GPU.

Final Takeaway

IVF-PQ is like a two-layered sieve:

  1. IVF filters out irrelevant data (coarse layer).
  2. PQ refines the search within the filtered subset (fine layer).
    Together, they make high-dimensional ANN tractable!

Back to outline: IVF-PQ Outline