Modal Cost for Alex
1. GPU Running Time
| GPU Tasks | Costs |
|---|---|
| Nvidia H100 | $0.001267 / sec |
| Nvidia A100, 80 GB | $0.000944 / sec |
| Nvidia A100, 40 GB | $0.000772 / sec |
| Nvidia L40S | $0.000542 / sec |
| Nvidia A10G | $0.000306 / sec |
| Nvidia L4 | $0.000222 / sec |
| Nvidia T4 | $0.000164 / sec |
- The Catalog class runs on a GPU (H100 in your example) for image embeddings.
- The Lookup class runs on a GPU (L4 in your code snippet) for queries.
2. CPU Running Time
| CPU | Cost |
|---|---|
| Physical core (2 vCPU equivalent) | $0.000038 / core / sec |
- Each class also reserves CPU resources (e.g., cpu=4.0 for Catalog, cpu=2.0 for Lookup).
- Any time a function runs on CPU, the reserved cores are billed for the duration.
3. Memory Allocation
| Memory | Cost |
|---|---|
| Memory | $0.00000667 / GiB / sec |

- For instance, memory=16384 reserves 16 GiB and memory=1024 reserves 1 GiB; the reservation is billed per GiB-second.
4. Cloud Storage & Data Transfer
- Volume usage (the modal.Volume for models/lancedb).
- CloudBucketMount usage (your R2 buckets).
- Data egress from Modal to your users (e.g. returning images or large JSON to the client).
5. Idle Container Time (container_idle_timeout)
- With a container idle timeout of 720 seconds, for example, you pay for the whole time the container stays alive, whether or not it is handling a request.
Testing Data
get_image:
- Cost: $0.40; takes 347.9s + 17.06s

get_image_urns:
- Memory: 1.06 GB used of 16 GB reserved
- CPU: 0.11 cores used of 4 reserved
- GPU memory used: 16.40 GB
- Runtime: 23 min 17 s
- Total: $1.42 = $0.92 (L40S) + $0.27 (CPU) + $0.19 (Memory)
- From D: "cost me like 4.50 and 19min for the 2.7k images in H100"
save_image_embedding:
- takes 4.51s

copy_to_volume:
- takes 3.92s

get_text_embedding:
- takes 17s (spins up the embedding container)

kNN:
- search takes 0.469s
- $0 cost
First API call:
- curl "https://tu-zhenzhao--stylemi-app-v2-api-service.modal.run/search?query=pink%20coffin%20nails&amount=6"
- takes 32s (cold start)

Second (warm) call:
- takes 0.35s (kNN) + 0.40s (text embedding)
- Memory: 832 MB used of 1 GB reserved
- CPU: 0.35 cores used of 2 reserved
- GPU memory used: 4.53 GB
- From start to the 720s idle timeout: total $0.24 = $0.18 (L4) + $0.06 (CPU) + $0.01 (Memory)
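That total is roughly what the posted rates predict for a container billed for the full 720 s idle window, assuming cpu=2.0 and 1 GiB reserved (the small gap versus the reported $0.24 would be the execution time on top of the idle window):

```python
seconds = 720                      # container_idle_timeout
gpu = 0.000222 * seconds           # L4 rate
cpu = 0.000038 * 2.0 * seconds     # 2 reserved cores
mem = 0.00000667 * 1.0 * seconds   # 1 GiB reserved
print(f"GPU ${gpu:.2f} + CPU ${cpu:.2f} + Mem ${mem:.2f} = ${gpu + cpu + mem:.2f}")
# → GPU $0.16 + CPU $0.05 + Mem $0.00 = $0.22
```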
Balance is now $8.30; let's see if it changes once we let the container go cold.
Frequent Request Test
- 40 requests per minute; 160 requests in total
- Lookup memory: 2.54 GB used of 2.61 GB
- Total cost: $0.88 = $0.64 (L40S) + $0.22 (Memory) + $0.02 (CPU)
- Duration: 720s = 12 min
- This keeps api_service at 22 live containers and Lookup at 3 live containers.
- About $0.0055 per call
Takeaway
Setting container_idle_timeout=720s on both functions keeps creating containers. Here is how they behave:
- First call -> activates 1 container in Lookup and 1 container in api_service.
- Each call takes roughly 900ms to process; if a new request arrives during that window, another container is created in Lookup and in api_service, so it grows to 2 containers each.
- Note: Lookup is a class-based service (@app.cls), so it only spins up new containers when every existing one is busy. Modal tried to reuse an existing Lookup container, but when multiple queries needed Lookup at the same time it created extra instances (3 total).
- So api_service creates a container for every overlapping request, while Lookup packs as many requests as possible into existing containers before creating more. As long as those warm containers don't cool down, they keep handling new requests without spawning more.
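The fan-out described above reduces to ceiling division: with one request per container (api_service's behavior), overlapping requests map one-to-one onto containers, while a class that batches N concurrent inputs absorbs the same overlap with far fewer. A toy illustration, not Modal's actual scheduler:

```python
import math

def containers_needed(overlapping_requests, inputs_per_container):
    """Containers spawned when `overlapping_requests` arrive at once and each
    container can process `inputs_per_container` requests concurrently."""
    return math.ceil(overlapping_requests / inputs_per_container)

print(containers_needed(6, 1))  # api_service style → 6 containers
print(containers_needed(6, 4))  # Lookup with 4 concurrent inputs → 2 containers
```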
Plan for Optimization
✅ Strategy Breakdown
- Make Lookup (the GPU model) handle more requests per container
  - Increase memory & concurrency so each container can process multiple API requests before a new one is needed.
  - Keep container_idle_timeout long enough to reuse the same container for new queries.

Lookup container startup time (avg): 25s
Example Config:

```python
@app.cls(
    gpu="L4",                    # keep L4 GPU for inference
    cpu=2.0,                     # enough CPU for request handling
    memory=4096,                 # 🟢 increase memory from 1024 → 4096 MB (4 GB)
    container_idle_timeout=300,  # 🟢 keep warm for 5 min (was 720 s)
    allow_concurrent_inputs=4,   # 🟢 let each container handle 4 requests at once
)
class Lookup:
    ...
```
✅ More memory → each container can handle larger requests.
✅ Idle timeout (300s) → avoids cold starts & GPU model reloading between queries.
✅ Concurrent inputs = 4 → one container serves 4 Lookup requests at once, meaning fewer total containers.
- Make api_service (FastAPI web handler) lightweight & disposable
  - Shorten container_idle_timeout so unused API containers shut down quickly (reducing cost).
  - Keep memory low and no warm instances, so new API containers spawn only when necessary.

api_service container startup time (avg): 3s
Example Config:

```python
@app.function(
    image=endpoints_image,
    enable_memory_snapshot=True,
    container_idle_timeout=30,  # 🔴 reduce to 30s (was 720s)
    concurrency_limit=2,        # 🔴 keep at most 2 API containers
    keep_warm=0,                # 🔴 no pre-warmed containers (saves cost)
)
@modal.asgi_app()
def api_service():
    ...
```
✅ Short timeout (30s) → API containers shut down almost immediately if idle.
✅ Concurrency limit (2) → At most 2 containers alive, even under heavy load.
✅ No keep_warm → New requests start fresh, but fast.
2025-2-15
I set up the configuration above, then tested API calls for 5 minutes, sending 200 requests to the server.
The report:
Cost in total: $0.31 = $0.21 (L4) + $0.07 (Memory) + $0.03 (CPU)
About $0.0015 per call.
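Comparing per-call cost before and after the config change (numbers taken from the two stress tests above):

```python
before = 0.88 / 160  # 720 s idle-timeout test: $0.88 over 160 requests
after = 0.31 / 200   # tuned config: $0.31 over 200 requests
saving = 1 - after / before
print(f"before ${before:.4f}/call, after ${after:.5f}/call, {saving:.0%} cheaper")
```

So the tuned config cut per-call cost by roughly 70%.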
2025-2-16
Started a container for 605s; the first spin-up took 14s.
Total cost: $0.23, i.e. about $0.00038 per second.
5 min ≈ $0.09, 10 min ≈ $0.18, 1 h ≈ $1.08.
v1: Price Report
Function Test
API called for 5 minutes at 40 calls per minute: 200 requests sent to the server in total.
- Lookup container: 591s (9m51s); 1 container
- api_service: average 1min30s; 6 containers
- Cost: $0.16 = $0.13 (L4) + $0.01 (Memory) + $0.02 (CPU)
Prediction function:
Lookup GPU Cost of L4: $0.133200
Lookup CPU Cost of L4: $0.011400
Lookup Memory Cost of L4: $0.008004
------------------------------------
Lookup GPU Cost of None: $0.000000
Lookup CPU Cost of None: $0.010640
Lookup Memory Cost of None: $0.007470
------------------------------------
Estimated Lookup Service Cost: $0.152604
Estimated API Service Cost: $0.018110
Estimated Total Cost for Stress Test: $0.170714
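The Lookup rows above are consistent with a plain rate × duration model. A minimal sketch of such an estimator (the 600 s duration, 0.5 effective cores, and 2 GiB are assumptions reverse-engineered from the report, not values from the actual prediction script):

```python
L4_RATE, CPU_RATE, MEM_RATE = 0.000222, 0.000038, 0.00000667  # $/sec rates

def predict(gpu_rate, seconds, cores, gib):
    """Split a container's cost into GPU, CPU, and memory components."""
    return {
        "gpu": gpu_rate * seconds,
        "cpu": CPU_RATE * cores * seconds,
        "mem": MEM_RATE * gib * seconds,
    }

lookup = predict(L4_RATE, 600, 0.5, 2.0)
print(f"GPU ${lookup['gpu']:.6f}  CPU ${lookup['cpu']:.6f}  Mem ${lookup['mem']:.6f}")
# → GPU $0.133200  CPU $0.011400  Mem $0.008004
```

The three components sum to $0.152604, matching the estimated Lookup service cost above.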
More tests:
API called for 5 minutes at 40 calls per minute: 200 requests sent to the server in total.
- Lookup container: 410s (6m50s); 1 container
- api_service: average 5min35s; 6 containers
- Total cost: $0.11
Prediction Function:
💰 **Estimated Costs:**
- Lookup Service (L4, 20 concurrent inputs, 100s idle)
⮕ Execution Time: 410s ⮕ Cost: $0.104279
- API Service (5 concurrent inputs, 30s idle)
⮕ Execution Time: 335s ⮕ Cost: $0.010834
🚀 **Total Estimated Cost: $0.115113**