IVF-PQ Integration

IVF and PQ combine hierarchically to achieve sub-linear search time and memory efficiency. Let’s formalize their synergy using our toy dataset and mathematical analysis.

1. Workflow Overview

Indexing Phase (Preprocessing):

  1. IVF Clustering: Partition data into K clusters.
  2. PQ Compression: Encode vectors in each cluster into PQ codes.

Query Phase (Search):

  1. IVF: Find the nprobe closest clusters to q.
  2. PQ: Compute approximate distances within those clusters using LUTs.

2. Indexing Phase (Step-by-Step)

Step 1: IVF Clustering

  1. Input: Dataset X = {x_1, ..., x_n} ⊂ ℝ^d.
  2. K-means: Partition X into K clusters with centroids {μ1,...,μK}.
  3. Output: Inverted lists mapping centroids to their vectors.

Example:
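A minimal sketch of the clustering step, using a hypothetical 2D toy dataset (the exact points are assumptions; only the middle group mirrors the vectors in the PQ table below):

```python
import numpy as np

# Hypothetical toy dataset: three well-separated groups in R^2.
X = np.array([
    [1.0, 1.1], [0.9, 0.8], [1.2, 1.0],   # group A
    [5.2, 5.0], [5.1, 4.8], [5.3, 5.1],   # group B ("Cluster 2")
    [9.0, 1.0], [8.8, 1.2], [9.1, 0.9],   # group C
])

def kmeans(X, K, iters=20):
    """Plain Lloyd's algorithm with a fixed (deterministic) init for the demo."""
    centroids = X[:: len(X) // K][:K].astype(float)
    for _ in range(iters):
        # assign every point to its nearest centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        # move each centroid to the mean of its member points
        for k in range(K):
            if (assign == k).any():
                centroids[k] = X[assign == k].mean(axis=0)
    return centroids, assign

K = 3
centroids, assign = kmeans(X, K)

# Inverted lists: cluster id -> indices of the vectors it owns
inverted_lists = {k: np.where(assign == k)[0].tolist() for k in range(K)}
print(inverted_lists)  # -> {0: [0, 1, 2], 1: [3, 4, 5], 2: [6, 7, 8]}
```

Each cluster keeps only the IDs of its members, so a query later touches just the probed lists rather than all of X.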

Step 2: PQ Compression

  1. Subspace Decomposition: Split each cluster’s vectors into m subspaces.
  2. Codebook Training: Learn h centroids per subspace via k-means.
  3. PQ Encoding: Replace subvectors with centroid indices.

Example:

| Original Vector | PQ Code | Reconstructed Vector |
|---|---|---|
| [5.2, 5.0] | (2, 2) | [5.2, 4.97] |
| [5.1, 4.8] | (2, 2) | [5.2, 4.97] |
| [5.3, 5.1] | (2, 3) | [5.2, 5.1] |
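The encodings above can be reproduced with per-subspace codebooks. The entries at indices 2 and 3 come from the document's reconstructed values; the remaining codewords are placeholder assumptions to fill out h = 4 codewords:

```python
import numpy as np

# Codebooks per subspace (d=2, m=2, so each subspace is one coordinate).
# Index 2 of subspace 1 is 5.2; indices 2 and 3 of subspace 2 are 4.97
# and 5.1 (from the table); other entries are made-up placeholders.
codebooks = [
    np.array([1.0, 3.0, 5.2, 7.0]),    # subspace 1 (first coordinate)
    np.array([1.0, 3.0, 4.97, 5.1]),   # subspace 2 (second coordinate)
]

def pq_encode(x):
    """Replace each subvector with the index of its nearest codeword."""
    return tuple(int(np.argmin((cb - xj) ** 2)) for cb, xj in zip(codebooks, x))

def pq_decode(code):
    """Reconstruct the vector from its centroid indices."""
    return [float(codebooks[j][k]) for j, k in enumerate(code)]

for x in [[5.2, 5.0], [5.1, 4.8], [5.3, 5.1]]:
    code = pq_encode(x)
    print(x, "->", code, "->", pq_decode(code))
```

Note how [5.2, 5.0] and [5.1, 4.8] collapse to the same code (2, 2): that collision is exactly the quantization error PQ trades for compression.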

3. Query Phase (Step-by-Step)

Step 1: IVF Cluster Selection

  1. Compute Distances to Centroids: ‖q − μ_k‖² for k = 1, ..., K.
  2. Select the nprobe Closest Clusters:
    • Example: For q = [5.4, 5.2], probe Cluster 2.
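A sketch of the selection step, with assumed centroid positions (indices are 0-based here, so the document's "Cluster 2" is index 1):

```python
import numpy as np

# Hypothetical IVF centroids; the second one sits near the query.
centroids = np.array([[1.0, 1.0], [5.2, 5.0], [9.0, 1.0]])
q = np.array([5.4, 5.2])

# ||q - mu_k||^2 for every centroid, then keep the nprobe closest
d2 = ((centroids - q) ** 2).sum(axis=1)
nprobe = 1
probed = np.argsort(d2)[:nprobe]
print(probed)  # -> [1]  (0-based index of "Cluster 2")
```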

Step 2: PQ Distance Approximation

  1. Precompute LUTs:
    For each subspace j, compute the distance from q^(j) to every codebook centroid c_{j,k}:

    LUT_j[k] = ‖q^(j) − c_{j,k}‖²
  2. Search in Probed Clusters:
    For each PQ code (k_1, ..., k_m) in the cluster:

    d̃²(q, x) = Σ_{j=1}^{m} LUT_j[k_j]

Example:

| PQ Code | d̃² |
|---|---|
| (2, 2) | 0.04 + 0.05 = 0.09 |
| (2, 2) | 0.09 |
| (2, 3) | 0.04 + 0.01 = 0.05 |
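These numbers follow directly from the LUTs (codebooks as assumed earlier; (2, 3) differs from (2, 2) only in the second subspace):

```python
import numpy as np

# Query and per-subspace codebooks from the running example (entries other
# than indices 2 and 3 are assumed placeholders).
q = [5.4, 5.2]
codebooks = [
    np.array([1.0, 3.0, 5.2, 7.0]),    # subspace 1
    np.array([1.0, 3.0, 4.97, 5.1]),   # subspace 2
]

# Step 1: one lookup table per subspace: LUT_j[k] = (q^(j) - c_{j,k})^2
luts = [(cb - qj) ** 2 for cb, qj in zip(codebooks, q)]

# Step 2: the approximate distance for a PQ code is a sum of m lookups
def adc(code):
    return sum(luts[j][k] for j, k in enumerate(code))

for code in [(2, 2), (2, 2), (2, 3)]:
    print(code, round(float(adc(code)), 2))
```

Each stored vector costs only m table lookups and m − 1 additions, never a full d-dimensional distance computation.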

4. Error Sources & Mitigation

Error 1: IVF Cluster Miss

If the true nearest neighbor falls in a cluster that is not among the nprobe probed clusters, it can never be returned. Mitigation: raise nprobe (trading search speed for recall).

Error 2: PQ Quantization

PQ compares q against quantized reconstructions, so approximate distances carry an error ε_PQ that can reorder close candidates. Mitigation: raise m or h, or re-rank the top candidates with exact distances.

Example:
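A toy illustration of a cluster miss and its mitigation via nprobe (all coordinates are assumptions): the true nearest neighbor sits just across a cluster boundary, so with nprobe = 1 it is never examined, while nprobe = 2 recovers it.

```python
import numpy as np

centroids = np.array([[0.0, 0.0], [4.0, 0.0]])
database = np.array([[1.0, 0.0],    # owned by cluster 0
                     [2.1, 0.0]])   # owned by cluster 1, yet closest to q
q = np.array([1.9, 0.0])

# each database vector belongs to the cluster with the nearest centroid
owner = ((database[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)

def ivf_search(nprobe):
    """Return the index of the best candidate found in the probed clusters."""
    d2c = ((centroids - q) ** 2).sum(axis=1)
    probed = set(np.argsort(d2c)[:nprobe].tolist())
    cands = [i for i in range(len(database)) if owner[i] in probed]
    return min(cands, key=lambda i: float(((database[i] - q) ** 2).sum()))

print(ivf_search(1), ivf_search(2))  # -> 0 1
```

With nprobe = 1 only cluster 0 is scanned and the search settles for vector 0; probing both clusters finds the true neighbor, vector 1.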

5. Parameter Tuning

| Parameter | IVF Impact | PQ Impact |
|---|---|---|
| num_partitions (K) | Higher K → smaller clusters, lower IVF error. | No direct impact. |
| nprobe | Higher → slower search, better recall. | No direct impact. |
| num_sub_vectors (m) | No direct impact. | Higher m → lower PQ error, slower LUTs. |
| accelerator | Faster k-means clustering. | Faster codebook training. |

| Parameter | Effect on Efficiency | Effect on Accuracy |
|---|---|---|
| num_partitions (K) | Higher K → smaller clusters, faster search. | Too high K may fragment true neighbors. |
| nprobe | Higher nprobe → slower search. | Higher recall. |
| num_sub_vectors (m) | Higher m → slower but more accurate ADC. | Lower quantization error ε_PQ. |

Example:
For a 1536D vector:
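Back-of-the-envelope arithmetic for the 1536D case, assuming h = 256 codewords per subspace (i.e. 1-byte codes, a common default):

```python
d = 1536                     # vector dimensionality
m = 16                       # num_sub_vectors
dims_per_sub = d // m        # dimensions handled by each subquantizer
h = 256                      # assumed codewords per subspace -> 1-byte codes

raw_bytes = d * 4            # float32 storage per vector
pq_bytes = m * 1             # one 8-bit code per subspace

print(dims_per_sub, raw_bytes, pq_bytes, raw_bytes // pq_bytes)
# -> 96 6144 16 384
```

So each subquantizer covers 96 dimensions, and a 6144-byte float32 vector shrinks to a 16-byte code, a 384× compression.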

6. Real-World Scaling

For 1B vectors in ℝ^1536:
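Back-of-the-envelope figures, assuming the parameters from the code below (K = 256, nprobe = 20, m = 16, 1-byte codes) and uniformly sized clusters:

```python
n = 1_000_000_000
d = 1536
m = 16

raw_tb = n * d * 4 / 1e12          # float32 storage: ~6.1 TB
pq_gb = n * m / 1e9                # PQ codes: 16 GB

K = 256                            # num_partitions (assumed, from the example)
nprobe = 20
scanned = n * nprobe // K          # ~78M codes scanned per query, not 1B

print(raw_tb, pq_gb, scanned)
```

PQ makes the index small enough to hold in RAM, and IVF ensures each query touches under 8% of it.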

7. Summary

| Aspect | IVF-PQ Integration |
|---|---|
| Indexing | Hierarchical: clusters (IVF) + compressed codes (PQ). |
| Querying | Two-stage: coarse search (IVF) + fine approximation (PQ). |
| Efficiency | Combines IVF's search-space reduction with PQ's compression for a multiplicative speedup. |
| Accuracy Control | Tunable via K, nprobe, m, and h. |

8. Back to Your Code

```python
def create_indexed_table(...):
    # IVF-PQ parameters
    target_table.create_index(
        num_partitions=256,    # K: 256 IVF clusters
        num_sub_vectors=16,    # m: split 1536D into 16 subspaces of 96 dims (PQ)
        nprobe=20,             # probe 20/256 clusters (~7.8%); note that some
                               # libraries take this per query, not at index time
        accelerator="cuda"     # GPU for faster k-means and codebook training
    )
```

Why This Works:

  • num_partitions=256 with nprobe=20 scans only ~7.8% of the inverted lists, cutting per-query work by roughly 13×.
  • num_sub_vectors=16 compresses each 1536D float32 vector (6144 bytes) down to a 16-byte PQ code.
  • accelerator="cuda" offloads the expensive k-means and codebook training to the GPU.

Final Takeaway

IVF-PQ is like a two-layered sieve:

  1. IVF filters out irrelevant data (coarse layer).
  2. PQ refines the search within the filtered subset (fine layer).
    Together, they make high-dimensional ANN tractable!

Back to outline: IVF-PQ Outline