ColPali Log

2025-2-8

Implemented the LanceDB indexing.

Big issue with LanceDB inside the Modal container:


RuntimeError: lance error: LanceError(IO): Execution error: ExecNode(Take): thread panicked: task 25 panicked with message "called Result::unwrap() on an Err value: JoinError::Panic(Id(150),\"called Option::unwrap() on a None value\", ...)"

We had no choice but to file an issue: bug(python): tbl.create_index(metric="cosine") causes Rust panic in Modal container, but works locally · Issue #2105 · lancedb/lancedb

2025-2-9

Reorganized the code. Now the code is clean.

Re-tested the RuntimeError by copying the locally built indices into the Modal container. Still get the same error.

Next, try quantization.

Insight: Indexing is crucial here. In the ColPali case, indexing does not reduce accuracy. Even a single image can be indexed effectively, since it generates 1,030 vectors, which gives PQ (Product Quantization) enough data to learn features. The more images available, the better the indexing performance, approaching 99.999% of native MaxSim computation accuracy.
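For reference, the "native MaxSim" that the index is compared against can be sketched as below. This is a minimal NumPy version, not the project's actual code: the function name, the random data, and the 20-token query length are assumptions made for illustration; only the 1,030 vectors per page and the 128 dimensions come from this log.

```python
import numpy as np

def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction score: for each query token take its best-matching
    document vector (max dot product), then sum those maxima."""
    sims = query_vecs @ doc_vecs.T            # shape: (n_query_tokens, n_doc_vectors)
    return float(sims.max(axis=1).sum())

# Illustrative random data (assumption): 20 query tokens, 1030 vectors per page.
rng = np.random.default_rng(0)
query = rng.standard_normal((20, 128)).astype(np.float32)
pages = {
    "page_a": rng.standard_normal((1030, 128)).astype(np.float32),
    "page_b": rng.standard_normal((1030, 128)).astype(np.float32),
}
scores = {name: maxsim(query, vecs) for name, vecs in pages.items()}
best_page = max(scores, key=scores.get)
print(best_page, scores)
```

An approximate index only has to reproduce the ranking that this exhaustive computation gives, which is what the accuracy comparison in the insight refers to.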

2025-2-10

Test indexing performance on RTX 4090

Results for query: 'What is Sushan Wild party?'

Indexing  Time   Result
No        0.035  data_test_1.png, Distance: 21.97092056274414
                 Exploring_the_Limits_of_Language_Modeling_page_10.png, Distance: 22.033401489257812
Yes       0.019  data_test_1.png, Distance: 10.978950500488281
                 Generating Sequences_With_Recurrent_Neural_Networks_page_31.png, Distance: 13.490697860717773

This looks good, but the index comes out different on every build, and most of the time indexing messes up the results. We need a solution for this.

2025-2-11

Went through the theoretical pipeline for building the best ColPali.

Make a plan:

Stage                      Speed       Accuracy        Memory
Stage 1: Native ColPali    1x          98.1% NDCG@20   High
Stage 2: Hybrid            13x faster  95.2% NDCG@20   Moderate
Stage 3: Hybrid + BQ       40x faster  94.8% NDCG@20   Low

Stage 1

Stage 2

Stage 3

2025-2-12

Native ColPali finished. Test result:
dataset: test-pdfs

Query 1: What is Sushan Wild party?
Query 2: Which party got more women in 112th?
Query 3: Who are transformers paper's authors?

Method  Query  Top results                               Time
Native  Q1     data_test_1.png (score: 10.004)           0.28765s
               gao-25-900570_page_74.png (score: 7.956)
Native  Q2     data_test_2.png (score: 17.327)           0.08996s
               data_test_1.png (score: 14.918)
Native  Q3     gao-25-900570_page_24.png (score: 9.214)  0.09184s
               gao-25-900570_page_7.png (score: 7.922)

2025-2-13

Problem 1: Qdrant's upsert() has an upload limit, which is 17 in our case, so we upsert the embeddings in a for loop instead of sending them all in one call.
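The for-loop workaround can be sketched as a small batching helper. This is a hedged illustration, not the code used in the project: `chunked`, `upsert_in_batches`, and the default batch size of 16 are hypothetical, chosen only to stay under the limit of 17 noted above. The upload itself is passed in as a callable so the sketch runs without a Qdrant server.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def upsert_in_batches(
    points: Sequence[T],
    upsert_fn: Callable[[List[T]], object],
    batch_size: int = 16,   # assumption: any value below the observed limit
) -> int:
    """Call `upsert_fn` once per batch and return the number of calls made."""
    calls = 0
    for batch in chunked(points, batch_size):
        upsert_fn(list(batch))
        calls += 1
    return calls

# Stand-in for a real upload: just record the batch sizes.
sent: List[int] = []
n_calls = upsert_in_batches(list(range(40)), lambda batch: sent.append(len(batch)))
print(n_calls, sent)
```

With a real client, `upsert_fn` could be something like `lambda batch: client.upsert(collection_name="pages", points=batch)`, where the collection name "pages" is a placeholder.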

Today I implemented HNSW, mean_pooling_columns, and mean_pooling_rows, and got prefetch working.
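A minimal sketch of what the two pooled representations could look like, under assumptions not stated in this log: the 1,030 vectors per page are taken to be 1,024 patch vectors laid out row-major on a 32 × 32 grid, followed by 6 extra token vectors that are kept unpooled. That layout gives 32 + 6 = 38 pooled vectors, which matches the 38 vectors per image recorded in the 2025-5-6 entry, but the function name and the token order here are hypothetical.

```python
import numpy as np

GRID = 32      # assumed patch grid: 32 x 32 = 1024 patch vectors
N_EXTRA = 6    # assumed: the remaining 1030 - 1024 vectors are not patches

def pool_rows_and_cols(page_vecs: np.ndarray):
    """Return (pooled_rows, pooled_cols), each of shape (GRID + N_EXTRA, dim)."""
    n_patches = GRID * GRID
    patches = page_vecs[:n_patches].reshape(GRID, GRID, -1)   # (row, col, dim)
    extra = page_vecs[n_patches:]                             # kept as they are
    pooled_rows = np.concatenate([patches.mean(axis=1), extra])  # one vector per grid row
    pooled_cols = np.concatenate([patches.mean(axis=0), extra])  # one vector per grid column
    return pooled_rows, pooled_cols

page = np.random.default_rng(0).standard_normal((GRID * GRID + N_EXTRA, 128))
rows, cols = pool_rows_and_cols(page)
print(rows.shape, cols.shape)
```

Each pooled representation is roughly 27 times smaller than the original (38 vs 1,030 vectors), which is what makes it cheap enough to use as a first-stage filter.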

Some questions have been answered:

Q1: How does prefetch work in Qdrant?

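from qdrant_client.models import Prefetch, QueryRequest

search_queries.append(
    QueryRequest(
        query=q_embedding,
        prefetch=[
            # Sub-queries on the two pooled vector fields, 200 hits each
            Prefetch(query=q_embedding, limit=200, using="mean_pooling_columns"),
            Prefetch(query=q_embedding, limit=200, using="mean_pooling_rows"),
        ],
        limit=top_k,        # number of results returned to the caller
        with_payload=True,
        with_vector=False,
        using="original",   # named vector used by the outer query
    )
)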

This means:
Qdrant runs the prefetch sub-queries first, then uses "original" to re-score only the candidates they return.

  1. Prefetching (prefetch=[...])
    Each prefetch retrieves up to 200 candidate points: one search on "mean_pooling_columns" and one on "mean_pooling_rows". These searches are cheap because the pooled representations hold far fewer vectors per page.
  2. Final scoring (using="original")
    The outer query is evaluated only on the prefetched candidates: they are re-scored against "original" and the best top_k are returned. The pooled vectors decide which points become candidates; the final ranking comes from "original".
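In Qdrant's Query API the prefetch sub-queries run first and the outer query is evaluated on their results. As a toy illustration of that prefetch-then-rescore flow in plain NumPy (no Qdrant involved), a sketch might look like this. The function and argument names are hypothetical, and MaxSim is assumed as the scoring function at both stages.

```python
import numpy as np

def maxsim(q: np.ndarray, d: np.ndarray) -> float:
    """Sum over query tokens of the best dot product against the document."""
    return float((q @ d.T).max(axis=1).sum())

def two_stage_search(query, originals, pooled_rows, pooled_cols,
                     prefetch_limit=200, top_k=3):
    """Stage 1: pick candidates with the cheap pooled vectors.
    Stage 2: re-score only those candidates with the full vectors."""
    row_scores = np.array([maxsim(query, p) for p in pooled_rows])
    col_scores = np.array([maxsim(query, p) for p in pooled_cols])
    candidates = (set(np.argsort(-row_scores)[:prefetch_limit].tolist())
                  | set(np.argsort(-col_scores)[:prefetch_limit].tolist()))
    rescored = sorted(((maxsim(query, originals[i]), i) for i in candidates),
                      reverse=True)
    return [(i, score) for score, i in rescored[:top_k]]

# Tiny illustrative corpus (assumption): 10 pages, 4 x 4 patch grid, dim 8.
rng = np.random.default_rng(0)
originals = [rng.standard_normal((16, 8)) for _ in range(10)]
pooled_rows = [o.reshape(4, 4, 8).mean(axis=1) for o in originals]
pooled_cols = [o.reshape(4, 4, 8).mean(axis=0) for o in originals]
query = rng.standard_normal((5, 8))
print(two_stage_search(query, originals, pooled_rows, pooled_cols,
                       prefetch_limit=4, top_k=3))
```

The trade-off sits in `prefetch_limit`: a page that never makes it into the candidate set cannot be recovered by the second stage, however well it would have scored on the full vectors.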

Q2: Why Use prefetch with mean_pooling_columns and mean_pooling_rows?

How does prefetch work?

2025-2-14

81 images

Search with prefetch

pipeline.search_with_text_queries.remote(queries, prefetch_size=20, top_k=3)

Query: What is Sushan Wild party?

Search time: 0.12687s

Search without prefetch

pipeline.search_without_prefetch.remote(queries, top_k=3)

Query: What is Sushan Wild party?

HNSW

hnsw_config=HnswConfigDiff(m=0) # HNSW switched off

Related HNSW setting, ef_construct: number of neighbours to consider during index building. The larger the value, the more accurate the search and the longer the index takes to build.

Binary Quantization

Query: What is Sushan Wild party?

Query: Which party got more women in 112th?

Query: Who are transformers paper's authors?

Search time: 0.19166s

The results are not perfect: BQ changes the accuracy of MaxSim search.
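To see why binarization can shift MaxSim rankings, here is a small sketch that compares exact MaxSim with MaxSim computed on sign-only codes. It assumes the simplest form of binary quantization (keep only the sign of each component) and uses random data, so it is not a model of Qdrant's BQ implementation; all names are hypothetical.

```python
import numpy as np

def maxsim(q: np.ndarray, d: np.ndarray) -> float:
    """Sum over query tokens of the best dot product against the document."""
    return float((q @ d.T).max(axis=1).sum())

def binarize(vecs: np.ndarray) -> np.ndarray:
    """Keep only the sign of each component: +1 if positive, otherwise -1."""
    return np.where(vecs > 0, 1.0, -1.0)

# Illustrative random data (assumption): 20 query tokens, 5 pages of 1030 vectors.
rng = np.random.default_rng(0)
query = rng.standard_normal((20, 128))
pages = [rng.standard_normal((1030, 128)) for _ in range(5)]

exact = [maxsim(query, p) for p in pages]
approx = [maxsim(binarize(query), binarize(p)) for p in pages]

# Page indices ordered from best to worst under each scoring.
exact_order = np.argsort(exact)[::-1].tolist()
approx_order = np.argsort(approx)[::-1].tolist()
print("exact ranking :", exact_order)
print("binary ranking:", approx_order)
```

The two rankings need not agree, because the magnitude information that separates close candidates is discarded by the sign codes. Re-scoring the top candidates with the full-precision vectors is the usual way to recover accuracy.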

2025-2-25

“Full Model vs. State Dict” in PyTorch

Finished classifier training.

2025-3-2

curl -X POST "https://tu-zhenzhao--stylemi-app-v2-api-service.modal.run/search?amount=5" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@nail/183491c3-519c-446a-92d2-0e0955ff1eba.jpg"

2025-4-6

compared two classify_images_async

For 1600 images

2025-4-23


This code can run on a large image dataset. Next, try uploading 20k images.

For a batch of 1000: Memory 6.97 GB, Low 3.41 GB

Cost: $2.03 per hour, ~4k images per hour

$5 for 8k images
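A quick linear estimate from the two figures above, assuming throughput stays constant at about 4k images per hour (the function name is hypothetical). It gives roughly $4 for 8k images, in the same range as the $5 noted above, and roughly $10 for the planned 20k upload.

```python
HOURLY_RATE_USD = 2.03     # from the log
IMAGES_PER_HOUR = 4_000    # "~4k images" per hour, from the log

def estimated_cost(n_images: int) -> float:
    """Linear estimate: hours needed multiplied by the hourly rate."""
    return n_images / IMAGES_PER_HOUR * HOURLY_RATE_USD

for n in (8_000, 20_000):
    print(f"{n} images: ~${estimated_cost(n):.2f}")
```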

2025-5-1

Iteration 1

Using the original method for reranking, with prefilter on for final_df (.where(f"id IN {tuple_ids}", prefilter=True))
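One thing to watch with the f-string above: formatting a Python tuple directly works for two or more ids, but a one-element tuple renders with a trailing comma, which is not a valid SQL list. A small helper avoids that. This is a sketch that assumes integer ids (the log does not state the id type), and the helper name is hypothetical.

```python
from typing import Iterable

def id_in_filter(ids: Iterable[int], column: str = "id") -> str:
    """Build an `id IN (...)` filter string without Python tuple formatting."""
    values = [int(i) for i in ids]   # assumption: ids are integers
    if not values:
        raise ValueError("need at least one id")
    return f"{column} IN ({', '.join(str(v) for v in values)})"

print(f"id IN {(7,)}")           # tuple formatting leaves a trailing comma
print(id_in_filter([7]))         # single id, no trailing comma
print(id_in_filter([3, 5, 8]))
```

The returned string can be passed to the same `.where(..., prefilter=True)` call as before.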

Results: Direct Search first, then Reranking Search
Query: batman say 'Good shot robin, and now we'll see who our masked mystery man is'
Filename: 1_page_272.png
Score : 0.6503
Duration: 21.34 s
----------------------------------------
Filename: 2_page_60.png
Score : 0.6483
Duration: 21.34 s
----------------------------------------
Filename: 6_page_224.png
Score : 0.6477
Duration: 21.34 s
----------------------------------------
Filename: 1_page_268.png
Score : 0.6462
Duration: 21.34 s
----------------------------------------
Filename: 1_page_234.png
Score : 0.6431
Duration: 21.34 s
----------------------------------------

= Query: Limit of LLM =
Filename: Captain America vol 1 383 (1991) (c2ce-dcp)_page_3.png
Score : 0.8310
Duration: 20.16 s
----------------------------------------
Filename: Captain America vol 1 415 (1993) (c2ce-dcp)_page_8.png
Score : 0.8257
Duration: 20.16 s
----------------------------------------
Filename: Captain America vol 1 412 (1993) (c2ce-dcp)_page_12.png
Score : 0.8245
Duration: 20.16 s
----------------------------------------
Filename: Captain America vol 1 407 (1992) (c2ce-dcp)_page_15.png
Score : 0.8162
Duration: 20.16 s
----------------------------------------
Filename: Captain America vol 1 382 (1991) (c2ce-dcp)_page_3.png
Score : 0.8138
Duration: 20.16 s
----------------------------------------
Query: batman say 'Good shot robin, and now we'll see who our masked mystery man is'
Filename: 1_page_303.png
Score : 0.6543
Duration: 29.04 s
----------------------------------------
Filename: 4_page_186.png
Score : 0.6540
Duration: 29.04 s
----------------------------------------
Filename: 1_page_315.png
Score : 0.6514
Duration: 29.04 s
----------------------------------------
Filename: 1_page_268.png
Score : 0.6462
Duration: 29.04 s
----------------------------------------
Filename: 1_page_234.png
Score : 0.6431
Duration: 29.04 s
----------------------------------------

= Query: Limit of LLM =
Filename: Marvel Universe v1 004_page_3.png
Score : 0.9017
Duration: 31.92 s
----------------------------------------
Filename: Marvel Universe v1 005_page_4.png
Score : 0.8899
Duration: 31.92 s
----------------------------------------
Filename: Marvel Universe v1 001_page_3.png
Score : 0.8888
Duration: 31.92 s
----------------------------------------
Filename: 352.08 Spider-Man V1 #19 (Digital)_page_14.png
Score : 0.8878
Duration: 31.92 s
----------------------------------------
Filename: Captain America vol 1 263 (1981) (c2ce) (Mazen-DCP)_page_22.png
Score : 0.8454
Duration: 31.92 s
----------------------------------------

Iteration 2

Now indexed pooling_rows and pooling_cols, but no scalar index and no original indexing.

Results: Direct Search first, then Reranking Search
Query: batman say 'Good shot robin, and now we'll see who our masked mystery man is'
Filename: 1_page_272.png
Score : 0.6503
Duration: 21.34 s
----------------------------------------
Filename: 2_page_60.png
Score : 0.6483
Duration: 21.34 s
----------------------------------------
Filename: 6_page_224.png
Score : 0.6477
Duration: 21.34 s
----------------------------------------
Filename: 1_page_268.png
Score : 0.6462
Duration: 21.34 s
----------------------------------------
Filename: 1_page_234.png
Score : 0.6431
Duration: 21.34 s
----------------------------------------

= Query: Limit of LLM =
Filename: Captain America vol 1 383 (1991) (c2ce-dcp)_page_3.png
Score : 0.8310
Duration: 20.16 s
----------------------------------------
Filename: Captain America vol 1 415 (1993) (c2ce-dcp)_page_8.png
Score : 0.8257
Duration: 20.16 s
----------------------------------------
Filename: Captain America vol 1 412 (1993) (c2ce-dcp)_page_12.png
Score : 0.8245
Duration: 20.16 s
----------------------------------------
Filename: Captain America vol 1 407 (1992) (c2ce-dcp)_page_15.png
Score : 0.8162
Duration: 20.16 s
----------------------------------------
Filename: Captain America vol 1 382 (1991) (c2ce-dcp)_page_3.png
Score : 0.8138
Duration: 20.16 s
= Query: batman say 'Good shot robin, and now we'll see who our masked mystery man is' =
Filename: 3_page_235.png
Score : 0.6637
Duration: 25.72 s
----------------------------------------
Filename: 3_page_250.png
Score : 0.6624
Duration: 25.72 s
----------------------------------------
Filename: 3_page_20.png
Score : 0.6591
Duration: 25.72 s
----------------------------------------
Filename: 3_page_234.png
Score : 0.6581
Duration: 25.72 s
----------------------------------------
Filename: 1_page_234.png
Score : 0.6431
Duration: 25.72 s
----------------------------------------

= Query: Limit of LLM =
Filename: Captain America vol 1 283 (c2ce-dcp)_page_35.png
Score : 0.9489
Duration: 22.05 s
----------------------------------------
Filename: Marvel Universe v1 002_page_2.png
Score : 0.9303
Duration: 22.05 s
----------------------------------------
Filename: Marvel Universe v1 002_page_29.png
Score : 0.9256
Duration: 22.05 s
----------------------------------------
Filename: Captain America vol 1 284 (c2ce-dcp)_page_31.png
Score : 0.9251
Duration: 22.05 s
----------------------------------------
Filename: Marvel Universe v1 004_page_3.png
Score : 0.9017
Duration: 22.05 s

Iteration 3

Now indexed pooling_rows, pooling_cols, and the scalar index, but no original indexing.

Results: Direct Search first, then Reranking Search
Query: batman say 'Good shot robin, and now we'll see who our masked mystery man is'
Filename: 1_page_272.png
Score : 0.6503
Duration: 21.34 s
----------------------------------------
Filename: 2_page_60.png
Score : 0.6483
Duration: 21.34 s
----------------------------------------
Filename: 6_page_224.png
Score : 0.6477
Duration: 21.34 s
----------------------------------------
Filename: 1_page_268.png
Score : 0.6462
Duration: 21.34 s
----------------------------------------
Filename: 1_page_234.png
Score : 0.6431
Duration: 21.34 s
----------------------------------------

= Query: Limit of LLM =
Filename: Captain America vol 1 383 (1991) (c2ce-dcp)_page_3.png
Score : 0.8310
Duration: 20.16 s
----------------------------------------
Filename: Captain America vol 1 415 (1993) (c2ce-dcp)_page_8.png
Score : 0.8257
Duration: 20.16 s
----------------------------------------
Filename: Captain America vol 1 412 (1993) (c2ce-dcp)_page_12.png
Score : 0.8245
Duration: 20.16 s
----------------------------------------
Filename: Captain America vol 1 407 (1992) (c2ce-dcp)_page_15.png
Score : 0.8162
Duration: 20.16 s
----------------------------------------
Filename: Captain America vol 1 382 (1991) (c2ce-dcp)_page_3.png
Score : 0.8138
Duration: 20.16 s
= Query: batman say 'Good shot robin, and now we'll see who our masked mystery man is' =
Filename: 3_page_235.png
Score : 0.6637
Duration: 1.60 s
----------------------------------------
Filename: 3_page_250.png
Score : 0.6624
Duration: 1.60 s
----------------------------------------
Filename: 3_page_20.png
Score : 0.6591
Duration: 1.60 s
----------------------------------------
Filename: 3_page_234.png
Score : 0.6581
Duration: 1.60 s
----------------------------------------
Filename: 1_page_234.png
Score : 0.6431
Duration: 1.60 s
----------------------------------------

= Query: Limit of LLM =
Filename: Captain America vol 1 283 (c2ce-dcp)_page_35.png
Score : 0.9489
Duration: 2.72 s
----------------------------------------
Filename: Marvel Universe v1 002_page_2.png
Score : 0.9303
Duration: 2.72 s
----------------------------------------
Filename: Marvel Universe v1 002_page_29.png
Score : 0.9256
Duration: 2.72 s
----------------------------------------
Filename: Captain America vol 1 284 (c2ce-dcp)_page_31.png
Score : 0.9251
Duration: 2.72 s
----------------------------------------
Filename: Marvel Universe v1 004_page_3.png
Score : 0.9017
Duration: 2.72 s

Iteration 4

Now indexed pooling_rows, pooling_cols, the scalar index, and original.

original indexing time

Results: Direct Search first, then Reranking Search
= Query: batman say 'Good shot robin, and now we'll see who our masked mystery man is' =
Filename: 5_page_156.png
Score : 0.6910
Duration: 0.87 s
----------------------------------------
Filename: 3_page_173.png
Score : 0.6894
Duration: 0.87 s
----------------------------------------
Filename: 2_page_159.png
Score : 0.6868
Duration: 0.87 s
----------------------------------------
Filename: 2_page_385.png
Score : 0.6832
Duration: 0.87 s
----------------------------------------
Filename: Captain America vol 1 400 (1992) (c2ce-dcp)_page_30.png
Score : 0.6819
Duration: 0.87 s
----------------------------------------

= Query: Limit of LLM =
Filename: Captain America vol 1 382 (1991) (c2ce-dcp)_page_3.png
Score : 0.9471
Duration: 0.36 s
----------------------------------------
Filename: Captain America vol 1 406 (1992) (c2ce-dcp)_page_24.png
Score : 0.9397
Duration: 0.36 s
----------------------------------------
Filename: Captain America vol 1 407 (1992) (c2ce-dcp)_page_15.png
Score : 0.9275
Duration: 0.36 s
----------------------------------------
Filename: Daredevil 157 (03-1979)(HD)(C2C)(RexTyler-DCP)_page_11.png
Score : 0.9242
Duration: 0.36 s
----------------------------------------
Filename: 06. Thor 373_page_12.png
Score : 0.9238
Duration: 0.36 s
----------------------------------------
= Query: batman say 'Good shot robin, and now we'll see who our masked mystery man is' =
Filename: 3_page_235.png
Score : 0.6637
Duration: 1.60 s
----------------------------------------
Filename: 3_page_250.png
Score : 0.6624
Duration: 1.60 s
----------------------------------------
Filename: 3_page_20.png
Score : 0.6591
Duration: 1.60 s
----------------------------------------
Filename: 3_page_234.png
Score : 0.6581
Duration: 1.60 s
----------------------------------------
Filename: 1_page_234.png
Score : 0.6431
Duration: 1.60 s
----------------------------------------

= Query: Limit of LLM =
Filename: Captain America vol 1 283 (c2ce-dcp)_page_35.png
Score : 0.9489
Duration: 2.72 s
----------------------------------------
Filename: Marvel Universe v1 002_page_2.png
Score : 0.9303
Duration: 2.72 s
----------------------------------------
Filename: Marvel Universe v1 002_page_29.png
Score : 0.9256
Duration: 2.72 s
----------------------------------------
Filename: Captain America vol 1 284 (c2ce-dcp)_page_31.png
Score : 0.9251
Duration: 2.72 s
----------------------------------------
Filename: Marvel Universe v1 004_page_3.png
Score : 0.9017
Duration: 2.72 s

2025-5-6

Key facts about our table

Baseline:

no indices

stage                server work                                                                              latency
pooled_cols search   flat scan (25 k × 1 030 dots)                                                            ≈ 10 s
pooled_rows search   flat scan again                                                                          ≈ 10 s
original refine      flat scan, then discard rows not in id IN (…) (post-filter, because no scalar idx)       ≈ 10 s
total                3 scans × 25 k rows                                                                      32 – 35 s

Plan shows full_scan: true, prefilter: false.

Add IVF-PQ on the two pooled columns

stage                          latency now
pooled_cols (IVF-PQ, cosine)   ≈ 40 ms
pooled_rows (IVF-PQ, cosine)   ≈ 40 ms
original refine (flat scan)    ≈ 22–25 s
total                          22 – 26 s (my log)

The 9–10 s gain we observed corresponds to the two full scans we removed.

Add BTREE scalar index on id

stage                         latency now
pooled_cols IVF-PQ            40 ms
pooled_rows IVF-PQ            40 ms
original refine on 200 rows   ~ 500 ms
total (including 3× HTTPS)    1.6 – 2.7 s

Query-to-query variation (1.6 vs 2.7 s) is mostly:

Execution plan now shows

VectorSearch
  index: IVF_PQ(cosine)          -- for pooled vectors
  prefilter: true
  row_count: 197                 -- refine touches only ~200 rows
full_scan: false                 -- ✅ no table-wide scan

Why we didn’t index the “original” column

If we need even lower latency later (or many more pages), we can build an HNSW on a single-vector surrogate (e.g., CLS-token embedding) or switch prefetch_limit down to 50.

Take-away table

configuration            pooled vectors index   scalar id index   refine rows   total latency
None                     none                   none              25 000        32–35 s
IVF-PQ on pooled         IVF-PQ                 none              25 000        22–26 s
IVF-PQ + BTREE (final)   IVF-PQ                 BTREE             ~200          1.6–2.7 s

Size of Multi-vector

                        Original                               Pooled
Vectors per image       1030 vectors × 128 dimensions          38 vectors × 128 dimensions
                        = 131,840 float32 numbers              = 4,864 float32 numbers
Size per image          ~527.36 KB                             ~19 KB
Size for 25,000 images  13,184,000,000 bytes                   486,400,000 bytes
Total vectors           1030 × 25,000 = 25,750,000 (≈ 25M)     38 × 25,000 = 950,000 (< 1M)
Total size              12.28 GB                               0.45 GB
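The figures above can be re-derived with a few lines of arithmetic. This sketch assumes float32 storage (4 bytes per number) and no index or metadata overhead; the helper name is hypothetical. Note that the per-image KB figures use 1 KB = 1,000 bytes, while the GB totals match a division by 1024³.

```python
BYTES_PER_FLOAT32 = 4
DIM = 128
N_IMAGES = 25_000

def footprint(vectors_per_image: int) -> dict:
    """Raw storage for one multi-vector representation, ignoring overhead."""
    floats_per_image = vectors_per_image * DIM
    bytes_per_image = floats_per_image * BYTES_PER_FLOAT32
    total_bytes = bytes_per_image * N_IMAGES
    return {
        "floats_per_image": floats_per_image,
        "bytes_per_image": bytes_per_image,
        "total_vectors": vectors_per_image * N_IMAGES,
        "total_bytes": total_bytes,
        "total_gib": total_bytes / 1024**3,
    }

original = footprint(1030)
pooled = footprint(38)
print(original)
print(pooled)
print("size ratio:", original["total_bytes"] / pooled["total_bytes"])
```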

Eg:

Reranker. rerank number: 200