Price Report
v1 from Modal Cost for Alex
Object Cost from Modal
The cost breakdown for the search service is categorized into the following components:
- GPU Running Time: Costs vary depending on the GPU type used (e.g., Nvidia H100, L4, L40S).
- CPU Running Time: Costs are based on the number of CPU cores reserved and their usage duration.
- Memory Allocation: Costs are calculated per GiB per second.
- Cloud Storage & Data Transfer: Includes volume usage, CloudBucketMount, and data egress costs.
- Idle Container Time: Costs are incurred for the duration a container remains active, even if idle.
Methodology for Measuring the Cost
- GPU Tasks: Measure the cost per second for each GPU type used in the Catalog and Lookup classes.
- CPU Tasks: Calculate the cost based on the number of CPU cores reserved and their usage time.
- Memory Usage: Track memory allocation and usage to determine the cost per GiB per second.
- Idle Container Time: Monitor the container idle timeout to calculate the cost of keeping containers active.
- Testing Data: Use real-world testing data (e.g., API calls, image processing) to validate cost calculations.
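One way to realize the measurement plan above is to time each task with a wall clock and price it at the resource's per-second rate. A minimal sketch in plain Python (`measure_cost` is a hypothetical helper, not part of Modal; real billing also accrues CPU, memory, and idle charges in parallel):

```python
import time

def measure_cost(fn, rate_per_sec: float):
    """Run fn(), time it with a wall clock, and price the elapsed time
    at a single per-second rate (e.g., one GPU type's rate)."""
    start = time.monotonic()
    result = fn()
    elapsed = time.monotonic() - start
    return result, elapsed * rate_per_sec

# Example: pricing a ~10 ms task at the Nvidia L4 rate of $0.000222/sec.
_, cost = measure_cost(lambda: time.sleep(0.01), 0.000222)
```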
Key Points of Cost
- GPU Costs:
  - Nvidia H100: $0.001267/sec
  - Nvidia L4: $0.000222/sec
  - Nvidia L40S: $0.000542/sec
- CPU Costs:
  - $0.000038/core/sec
- Memory Costs:
  - $0.00000667/GiB/sec
- Idle Container Costs:
  - Containers remain active for 720 seconds by default, incurring costs even when idle.
- Optimization Strategies:
  - Increase memory and concurrency for Lookup to handle more requests per container.
  - Reduce the container idle timeout for API services to minimize idle costs.
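Since each resource is billed per second, the rates above compose additively into a simple cost model. A minimal sketch in plain Python (no Modal dependency; `container_cost` is a hypothetical helper built only from the rates listed above):

```python
# Per-second rates from the Key Points section (USD).
GPU_RATES = {"H100": 0.001267, "L4": 0.000222, "L40S": 0.000542}
CPU_RATE = 0.000038       # per core per second
MEMORY_RATE = 0.00000667  # per GiB per second

def container_cost(gpu: str, cores: float, mem_gib: float, seconds: float) -> float:
    """Cost of one container: GPU + CPU + memory, all billed per second."""
    rate = GPU_RATES[gpu] + cores * CPU_RATE + mem_gib * MEMORY_RATE
    return rate * seconds

# Example: an L4 container with 2 cores and 4 GiB that sits through the
# default 720 s idle timeout still accrues cost while doing no work.
idle_cost = container_cost("L4", cores=2, mem_gib=4, seconds=720)
```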
Cost Log Report
Testing Data
- get_image:
  - Cost: $0.40
  - Time: 347.9s + 17.06s
- get_image_urns:
  - Memory: 1.06 GB / 16 GB
  - CPU: 0.11 cores / 4 cores
  - GPU Memory: 16.40 GB
  - Cost: $1.42 ($0.92 GPU + $0.27 CPU + $0.19 Memory)
  - Time: 23 min 17 s
- First API Call:
  - Time: 32 s
  - Cost: $0.24 ($0.18 GPU + $0.06 CPU + $0.01 Memory)
- Frequent Request Test (40 requests/min):
  - Total Requests: 160
  - Cost: $0.88 ($0.64 GPU + $0.22 Memory + $0.02 CPU)
  - Time: 12 min
  - Cost per Call: $0.0055
- Optimized Configuration Test (200 requests in 5 min):
  - Cost: $0.31 ($0.21 GPU + $0.07 Memory + $0.03 CPU)
  - Cost per Call: $0.0015
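The per-call figures in the log are total cost divided by request count; a quick sanity check of the two load tests in plain Python:

```python
def cost_per_call(total_cost: float, requests: int) -> float:
    """Average cost of one request in a load test."""
    return total_cost / requests

# Frequent Request Test: 160 requests for $0.88 total.
frequent = cost_per_call(0.88, 160)   # $0.0055
# Optimized Configuration Test: 200 requests for $0.31 total.
optimized = cost_per_call(0.31, 200)  # ~$0.00155, reported as $0.0015
```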
Cost Analysis
- GPU Costs:
  - The Nvidia L40S GPU is the most cost-effective for frequent requests, with a cost of $0.000542/sec.
  - The Nvidia H100 is more expensive but may be necessary for specific tasks.
- CPU Costs:
  - CPU costs are minimal compared to GPU costs but should still be optimized by reducing reserved cores.
- Memory Costs:
  - Memory costs are relatively low but can add up with high concurrency and large datasets.
- Idle Container Costs:
  - Reducing the container idle timeout significantly lowers costs, especially for API services.
- Optimization Impact:
  - Increasing memory and concurrency for Lookup reduced the cost per call from $0.0055 to $0.0015.
  - Shortening the container idle timeout for API services further minimized idle costs.
Price Table
| Resource Type | Cost per Unit | Example Usage | Cost Example (720 s) |
|---|---|---|---|
| Nvidia L40S | $0.000542/sec | Frequent request handling | $0.39 |
| Nvidia L4 | $0.000222/sec | Lookup class for queries | $0.16 |
| CPU (per core) | $0.000038/core/sec | Catalog (4 cores), Lookup (2 cores) | $0.11 (Catalog), $0.05 (Lookup) |
| Memory | $0.00000667/GiB/sec | 16 GB reserved for Catalog | $0.08 |
| Idle Container | Sum of GPU/CPU/Memory rates | 720 s idle timeout | $0.24 (L4 GPU + CPU + Memory) |
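Each Cost Example entry is just rate × 720 s. A short sketch reproducing the table's figures (variable names are illustrative):

```python
RATES = {
    "L40S": 0.000542,          # $/sec
    "L4": 0.000222,            # $/sec
    "cpu_core": 0.000038,      # $/core/sec
    "memory_gib": 0.00000667,  # $/GiB/sec
}
TIMEOUT = 720  # default container idle timeout, seconds

l40s_cost = RATES["L40S"] * TIMEOUT               # ~$0.39
l4_cost = RATES["L4"] * TIMEOUT                   # ~$0.16
catalog_cpu = 4 * RATES["cpu_core"] * TIMEOUT     # ~$0.11 (Catalog, 4 cores)
lookup_cpu = 2 * RATES["cpu_core"] * TIMEOUT      # ~$0.05 (Lookup, 2 cores)
catalog_mem = 16 * RATES["memory_gib"] * TIMEOUT  # ~$0.08 (16 GiB for Catalog)
```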
Optimization Plan
1. Lookup Class (GPU Model)
- Goal: Handle more requests per container to reduce the number of containers needed.
- Actions:
  - Increase memory from 1024 MB → 4096 MB (4 GB).
  - Set `container_idle_timeout=300` (5 minutes) to avoid cold starts.
  - Set `concurrency_limit=4` to allow handling 4 requests simultaneously.
- Impact:
  - Fewer containers are created, reducing GPU and memory costs.
  - Cost per call reduced from $0.0055 to $0.0015.
2. API Service (FastAPI Web Handler)
- Goal: Make the API service lightweight and disposable to minimize idle costs.
- Actions:
  - Reduce `container_idle_timeout` to 30 seconds.
  - Set `concurrency_limit=2` to limit the number of active containers.
  - Disable `keep_warm` to avoid pre-warmed containers.
- Impact:
  - API containers shut down quickly when idle, reducing idle costs.
  - Startup time is fast (~3 s), so new requests are handled efficiently.
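Both configurations can be expressed as Modal decorator options. A hedged sketch using the option names this report uses (the app, class, and function names are illustrative; newer Modal SDK releases may rename some of these options, so check the current API reference before copying):

```python
import modal

app = modal.App("search-service")  # hypothetical app name

# Lookup (GPU model): bigger container, warm for 5 minutes, 4-way concurrency.
@app.cls(
    gpu="L4",
    memory=4096,                 # 1024 MB -> 4096 MB
    container_idle_timeout=300,  # 5 minutes, avoids cold starts
    concurrency_limit=4,
)
class Lookup:
    @modal.method()
    def query(self, q: str):
        ...  # model inference goes here

# API service (FastAPI web handler): lightweight and disposable.
@app.function(
    container_idle_timeout=30,  # shut down quickly when idle
    concurrency_limit=2,        # cap active containers
    keep_warm=0,                # no pre-warmed containers
)
@modal.asgi_app()
def api():
    from fastapi import FastAPI
    web = FastAPI()
    # routes calling Lookup.query would be registered here
    return web
```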
Cost Projections
Current Configuration
- Cost per Call: $0.0015 (optimized configuration).
- Hourly Cost: $1.08 (based on 200 requests in 5 minutes).
- Monthly Cost: ~$777.60 (assuming 24/7 operation).
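The monthly figure is the hourly figure times 720 hours (24 × 30). A sketch of the projection arithmetic (`monthly_cost_at` is a hypothetical helper assuming a steady request rate at the optimized per-call cost):

```python
COST_PER_CALL = 0.0015  # optimized configuration

def monthly_cost_at(requests_per_hour: float) -> float:
    """Project monthly cost from a steady request rate (24/7, 30-day month)."""
    return requests_per_hour * COST_PER_CALL * 24 * 30

hourly_cost = 1.08                    # $/hour, from the report
monthly_cost = hourly_cost * 24 * 30  # ~$777.60, as projected above
```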
Future Improvements
- Further Increase Concurrency:
  - If `concurrency_limit` is increased to 8, the cost per call could drop further.
- Dynamic Scaling:
  - Implement dynamic scaling based on request load to reduce idle time.
- GPU Selection:
  - Evaluate whether a lower-cost GPU (e.g., Nvidia T4) can handle the workload without performance degradation.
Key Takeaways
- GPU Costs Dominate:
  - The choice of GPU (e.g., L40S vs. H100) has the most significant impact on overall costs.
- Idle Time is Expensive:
  - Reducing `container_idle_timeout` for API services significantly lowers costs.
- Optimization Works:
  - Increasing memory and concurrency for Lookup cut the cost per call by roughly 73% ($0.0055 → $0.0015).
- Testing is Crucial:
  - Real-world testing (e.g., 200 requests in 5 minutes) validated the cost savings.
Next Steps
- Monitor Costs:
  - Continuously track costs after implementing the optimized configuration.
- Evaluate GPU Options:
  - Test whether a lower-cost GPU (e.g., Nvidia T4) can handle the workload.