ColPali Log

2025-2-8

Implemented LanceDB indexing.

Big issue with LanceDB inside the Modal container:


RuntimeError: lance error: LanceError(IO): Execution error: ExecNode(Take): thread panicked: task 25 panicked with message "called Result::unwrap() on an Err value: JoinError::Panic(Id(150),\"called Option::unwrap() on a None value\", ...)"

We had no choice but to file an issue: bug(python): tbl.create_index(metric="cosine") causes Rust panic in Modal container, but works locally · Issue #2105 · lancedb/lancedb

2025-2-9

Reorganized the code. Now the code is clean.

Retested the RuntimeError by copying the locally built indices to the Modal container. Still get the same error.

Next, try quantization.

Insight: Indexing is crucial here. In the ColPali case, indexing does not reduce accuracy. Even a single image can be indexed effectively, since it generates 1,030 vectors, which is enough data for PQ (Product Quantization) to learn from. The more images available, the better the index performs, approaching 99.999% of the accuracy of native MaxSim computation.
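For reference, the native MaxSim score that the index is compared against can be sketched in a few lines. This is a minimal pure-Python illustration with toy 2-dimensional vectors (the vectors and page names are made up); real ColPali embeddings are much larger and would be scored with a tensor library.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: for every query vector take the best-matching
    document vector, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy data (hypothetical): 2 query vectors, two "pages" with a few vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
page_b = [[0.1, 0.1], [0.3, 0.2]]

score_a = maxsim(query, page_a)
score_b = maxsim(query, page_b)
```

Here page_a scores higher than page_b because each query vector finds a close match among its vectors. Any index or quantization scheme is an approximation of this ranking.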

2025-2-10

Tested indexing performance on an RTX 4090.

Results for query: 'What is Sushan Wild party?'

Indexing: No, Time: 0.035
  data_test_1.png, Distance: 21.97092056274414
  Exploring_the_Limits_of_Language_Modeling_page_10.png, Distance: 22.033401489257812
Indexing: Yes, Time: 0.019
  data_test_1.png, Distance: 10.978950500488281
  Generating Sequences_With_Recurrent_Neural_Networks_page_31.png, Distance: 13.490697860717773

This looks good, but the index comes out different on every build, and most of the time it degrades the results. We need a solution for this.

2025-2-11

Went through the theory of how to build the best ColPali pipeline.

Make a plan:

Stage 1: Native ColPali
  Speed: 1x
  Accuracy: 98.1% NDCG@20
  Memory: High
Stage 2: Hybrid
  Speed: 13x faster
  Accuracy: 95.2% NDCG@20
  Memory: Moderate
Stage 3: Hybrid + BQ
  Speed: 40x faster
  Accuracy: 94.8% NDCG@20
  Memory: Low

Stage 1

Stage 2

Stage 3

2025-2-12

Native ColPali finished. Test results:
Dataset: test-pdfs

Query 1: What is Sushan Wild party?
Query 2: Which party got more women in 112th?
Query 3: Who are transformers paper's authors?

Method: Native, Query: Q1, Time: 0.28765s
  data_test_1.png (score: 10.004)
  gao-25-900570_page_74.png (score: 7.956)
Method: Native, Query: Q2, Time: 0.08996s
  data_test_2.png (score: 17.327)
  data_test_1.png (score: 14.918)
Method: Native, Query: Q3, Time: 0.09184s
  gao-25-900570_page_24.png (score: 9.214)
  gao-25-900570_page_7.png (score: 7.922)

2025-2-13

Problem 1: Qdrant's upsert() has an upload limit of 17, so we upsert the embeddings in a for loop instead.
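The for-loop workaround can be sketched as a small batching helper. This is only an illustration: upsert_fn is a hypothetical stand-in for whatever actually sends one batch (for example a wrapper around the Qdrant client's upsert call), and batch_size=16 is an assumed value chosen to stay below the limit noted above.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def upsert_in_batches(
    points: Sequence[T],
    upsert_fn: Callable[[List[T]], object],  # hypothetical: sends one batch
    batch_size: int = 16,                    # assumed value, below the limit
) -> int:
    """Send points batch by batch and return the number of calls made."""
    calls = 0
    for batch in chunked(points, batch_size):
        upsert_fn(batch)
        calls += 1
    return calls


# Stand-in for the real client: just collect what would be uploaded.
uploaded: List[int] = []
n_calls = upsert_in_batches(list(range(81)), uploaded.extend, batch_size=16)
```

With 81 points and a batch size of 16 this makes six calls, the last one carrying a single point.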

Today I implemented HNSW, mean_pooling_columns, mean_pooling_rows, and prefetch.

Some questions have been answered:

Q1: How does prefetch work in Qdrant?
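A minimal sketch of what the two mean-pooled representations compute, assuming the patch embeddings arrive in row-major order over an n_rows x n_cols grid. The function names are illustrative, and ColPali's extra non-patch tokens are ignored here for simplicity.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def mean_pool_rows(patches: List[Vector], n_rows: int, n_cols: int) -> List[Vector]:
    """One vector per grid row: average the n_cols patches in that row."""
    if len(patches) != n_rows * n_cols:
        raise ValueError("patch count does not match the grid shape")
    return [mean_vector(patches[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]


def mean_pool_columns(patches: List[Vector], n_rows: int, n_cols: int) -> List[Vector]:
    """One vector per grid column: average the n_rows patches in that column."""
    if len(patches) != n_rows * n_cols:
        raise ValueError("patch count does not match the grid shape")
    return [mean_vector(patches[c::n_cols]) for c in range(n_cols)]


# Toy 2 x 3 grid of 2-dimensional patch embeddings (row-major).
patches = [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0],
           [0.0, 2.0], [0.0, 4.0], [0.0, 6.0]]
rows = mean_pool_rows(patches, n_rows=2, n_cols=3)
cols = mean_pool_columns(patches, n_rows=2, n_cols=3)
```

Pooling shrinks the number of vectors per page from n_rows * n_cols to n_rows (or n_cols), which is what makes these representations cheap to search.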

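from qdrant_client.models import Prefetch, QueryRequest

search_queries.append(
    QueryRequest(
        query=q_embedding,
        prefetch=[
            # Up to 200 candidates from each mean-pooled vector set
            Prefetch(query=q_embedding, limit=200, using="mean_pooling_columns"),
            Prefetch(query=q_embedding, limit=200, using="mean_pooling_rows"),
        ],
        limit=top_k,          # number of results returned
        with_payload=True,
        with_vector=False,
        using="original",     # named vector used by the main query
    )
)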

This means:
Qdrant runs the prefetch queries first and then applies the main query only to their results. Prefetch is a cheap candidate filter; "original" does the final ranking.

  1. Prefetching (prefetch=[...])
    Qdrant first searches "mean_pooling_columns" and "mean_pooling_rows" and collects up to 200 candidate points from each. The pooled representations hold far fewer vectors per page than "original", so this step is fast.
  2. Re-ranking (using="original")
    The main query is then evaluated only on the prefetched candidates. They are re-scored against the full "original" embeddings and the best top_k are returned, so the expensive comparison runs on at most 400 points instead of the whole collection.
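The prefetch-then-rescore flow can be imitated in plain Python to make the order of operations concrete. This is a toy model, not Qdrant's implementation: the page names, vectors, and the two_stage_search helper are all hypothetical, and scoring uses a simple MaxSim over dot products.

```python
from typing import Dict, List

Vector = List[float]
Page = Dict[str, List[Vector]]  # named vector sets for one page


def maxsim(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Sum over query vectors of the best dot product against the doc vectors."""
    return sum(
        max(sum(x * y for x, y in zip(q, d)) for d in doc_vecs)
        for q in query_vecs
    )


def two_stage_search(query_vecs: List[Vector], pages: Dict[str, Page],
                     prefetch_limit: int, top_k: int) -> List[str]:
    # Stage 1 (prefetch): gather candidates from each cheap pooled vector set.
    candidates = set()
    for field in ("mean_pooling_columns", "mean_pooling_rows"):
        ranked = sorted(pages, reverse=True,
                        key=lambda name: maxsim(query_vecs, pages[name][field]))
        candidates.update(ranked[:prefetch_limit])
    # Stage 2 (main query): re-score only the candidates with the full vectors.
    reranked = sorted(candidates, reverse=True,
                      key=lambda name: maxsim(query_vecs, pages[name]["original"]))
    return reranked[:top_k]


# Hypothetical toy collection with three pages.
pages = {
    "a": {"original": [[1.0, 0.0], [0.0, 1.0]],
          "mean_pooling_rows": [[0.5, 0.5]], "mean_pooling_columns": [[0.5, 0.5]]},
    "b": {"original": [[0.6, 0.0], [0.0, 0.2]],
          "mean_pooling_rows": [[0.3, 0.1]], "mean_pooling_columns": [[0.3, 0.1]]},
    "c": {"original": [[0.0, 0.0], [0.1, 0.1]],
          "mean_pooling_rows": [[0.05, 0.05]], "mean_pooling_columns": [[0.05, 0.05]]},
}
top = two_stage_search([[1.0, 0.0]], pages, prefetch_limit=2, top_k=2)
```

Page "c" never reaches the second stage because it falls outside the prefetch limit, which is exactly the trade-off: a smaller prefetch limit is faster but can drop a page that the full embeddings would have ranked highly.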

Q2: Why use prefetch with mean_pooling_columns and mean_pooling_rows?

How does prefetch work?

2025-2-14

81 images

Search with prefetch

pipeline.search_with_text_queries.remote(queries, prefetch_size=20, top_k=3)

Query: What is Sushan Wild party?

Search time: 0.12687s

Search without prefetch

pipeline.search_without_prefetch.remote(queries, top_k=3)

Query: What is Sushan Wild party?

HNSW

hnsw_config=HnswConfigDiff(m=0) # HNSW switched off

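Note on the parameters: m is the number of edges per node in the index graph, and setting m=0 disables HNSW graph building. The related ef_construct parameter is the number of neighbours to consider during index building. The larger the value, the more accurate the search and the more time required to build the index.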

Binary Quantization

Query: What is Sushan Wild party?

Query: Which party got more women in 112th?

Query: Who are transformers paper's authors?

Search time: 0.19166s

The results are not perfect. BQ changes the accuracy of MaxSim search.
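One way to see why accuracy shifts: binary quantization keeps only the sign of each component, so vectors that differ in magnitude but not in sign become indistinguishable. The sketch below illustrates the general idea with made-up 3-dimensional vectors; it is not Qdrant's actual BQ implementation, and the helper names are hypothetical.

```python
from typing import List

Vector = List[float]


def binarize(vec: Vector) -> List[int]:
    """Keep one bit per component: 1 if positive, else 0."""
    return [1 if x > 0 else 0 for x in vec]


def matching_bits(a: List[int], b: List[int]) -> int:
    """Similarity between two bit vectors: number of positions that agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


# Toy vectors (hypothetical): doc_a is close to the query, doc_b is not,
# but both share the query's sign pattern.
query = [0.9, 0.1, -0.2]
doc_a = [0.8, 0.2, -0.1]
doc_b = [0.01, 0.9, -0.9]

exact_a, exact_b = dot(query, doc_a), dot(query, doc_b)
approx_a = matching_bits(binarize(query), binarize(doc_a))
approx_b = matching_bits(binarize(query), binarize(doc_b))
```

The exact scores clearly prefer doc_a, while the binarized scores tie. Since MaxSim sums per-vector maxima, such ties and swaps at the vector level can change the final page ranking, which is consistent with the imperfect results above.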