Price Report
v1 from Modal Cost for Alex
Object Cost from Modal
The cost breakdown for the search service is categorized into the following components:
- GPU Running Time: Costs vary depending on the GPU type used (e.g., Nvidia H100, L4, L40S).
- CPU Running Time: Costs are based on the number of CPU cores reserved and their usage duration.
- Memory Allocation: Costs are calculated per GiB per second.
- Cloud Storage & Data Transfer: Includes volume usage, CloudBucketMount, and data egress costs.
- Idle Container Time: Costs are incurred for the duration a container remains active, even if idle.
Methodology for Measuring the Cost
- GPU Tasks: Measure the cost per second for each GPU type used in the Catalog and Lookup classes.
- CPU Tasks: Calculate the cost based on the number of CPU cores reserved and their usage time.
- Memory Usage: Track memory allocation and usage to determine the cost per GiB per second.
- Idle Container Time: Monitor the container idle timeout to calculate the cost of keeping containers active.
- Testing Data: Use real-world testing data (e.g., API calls, image processing) to validate cost calculations.
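One way to realize the measurement plan above is to time each task with a wall clock and price it at the resource's per-second rate. A minimal sketch in plain Python (`measure_cost` is a hypothetical helper, not part of Modal; real billing also accrues CPU, memory, and idle charges in parallel):

```python
import time

def measure_cost(fn, rate_per_sec: float):
    """Run fn(), time it with a wall clock, and price the elapsed time
    at a single per-second rate (e.g., one GPU type's rate)."""
    start = time.monotonic()
    result = fn()
    elapsed = time.monotonic() - start
    return result, elapsed * rate_per_sec

# Example: pricing a ~10 ms task at the Nvidia L4 rate of $0.000222/sec.
_, cost = measure_cost(lambda: time.sleep(0.01), 0.000222)
```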
Key Points of Cost
- GPU Costs:
  - Nvidia H100: $0.001267/sec
  - Nvidia L4: $0.000222/sec
  - Nvidia L40S: $0.000542/sec
- CPU Costs:
  - $0.000038/core/sec
- Memory Costs:
  - $0.00000667/GiB/sec
- Idle Container Costs:
  - Containers remain active for 720 seconds by default, incurring costs even when idle.
- Optimization Strategies:
  - Increase memory and concurrency for Lookup to handle more requests per container.
  - Reduce the container idle timeout for API services to minimize idle costs.
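Since each resource is billed per second, the rates above compose additively into a simple cost model. A minimal sketch in plain Python (no Modal dependency; `container_cost` is a hypothetical helper built only from the rates listed above):

```python
# Per-second rates from the Key Points section (USD).
GPU_RATES = {"H100": 0.001267, "L4": 0.000222, "L40S": 0.000542}
CPU_RATE = 0.000038       # per core per second
MEMORY_RATE = 0.00000667  # per GiB per second

def container_cost(gpu: str, cores: float, mem_gib: float, seconds: float) -> float:
    """Cost of one container: GPU + CPU + memory, all billed per second."""
    rate = GPU_RATES[gpu] + cores * CPU_RATE + mem_gib * MEMORY_RATE
    return rate * seconds

# Example: an L4 container with 2 cores and 4 GiB that sits through the
# default 720 s idle timeout still accrues cost while doing no work.
idle_cost = container_cost("L4", cores=2, mem_gib=4, seconds=720)
```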
Cost Log Report
Testing Data
- get_image:
  - Cost: $0.40
  - Time: 347.9s + 17.06s
- get_image_urns:
  - Memory: 1.06 GB / 16 GB
  - CPU: 0.11 cores / 4 cores
  - GPU Memory: 16.40 GB
  - Cost: $1.42 ($0.92 GPU + $0.27 CPU + $0.19 Memory)
  - Time: 23 min 17 s
- First API Call:
  - Time: 32 s
  - Cost: $0.24 ($0.18 GPU + $0.06 CPU + $0.01 Memory)
- Frequent Request Test (40 requests/min):
  - Total Requests: 160
  - Cost: $0.88 ($0.64 GPU + $0.22 Memory + $0.02 CPU)
  - Time: 12 min
  - Cost per Call: $0.0055
- Optimized Configuration Test (200 requests in 5 min):
  - Cost: $0.31 ($0.21 GPU + $0.07 Memory + $0.03 CPU)
  - Cost per Call: $0.0015
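The per-call figures in the log are total cost divided by request count; a quick sanity check of the two load tests in plain Python:

```python
def cost_per_call(total_cost: float, requests: int) -> float:
    """Average cost of one request in a load test."""
    return total_cost / requests

# Frequent Request Test: 160 requests for $0.88 total.
frequent = cost_per_call(0.88, 160)   # $0.0055
# Optimized Configuration Test: 200 requests for $0.31 total.
optimized = cost_per_call(0.31, 200)  # ~$0.00155, reported as $0.0015
```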
Cost Analysis
- GPU Costs:
  - The Nvidia L40S GPU is the most cost-effective for frequent requests, with a cost of $0.000542/sec.
  - The Nvidia H100 is more expensive but may be necessary for specific tasks.
- CPU Costs:
  - CPU costs are minimal compared to GPU costs but should still be optimized by reducing reserved cores.
- Memory Costs:
  - Memory costs are relatively low but can add up with high concurrency and large datasets.
- Idle Container Costs:
  - Reducing the container idle timeout significantly lowers costs, especially for API services.
- Optimization Impact:
  - Increasing memory and concurrency for Lookup reduced the cost per call from $0.0055 to $0.0015.
  - Shortening the container idle timeout for API services further minimized idle costs.
Price Table
| Resource Type | Cost per Unit | Example Usage | Cost Example (720 s) |
|---|---|---|---|
| Nvidia L40S | $0.000542/sec | Frequent request handling | $0.39 |
| Nvidia L4 | $0.000222/sec | Lookup class for queries | $0.16 |
| CPU (per core) | $0.000038/core/sec | Catalog (4 cores), Lookup (2 cores) | $0.11 (Catalog), $0.05 (Lookup) |
| Memory | $0.00000667/GiB/sec | 16 GB reserved for Catalog | $0.08 |
| Idle Container | Sum of GPU/CPU/Memory rates | 720 s idle timeout | $0.24 (L4 GPU + CPU + Memory) |
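Each Cost Example entry is just rate × 720 s. A short sketch reproducing the table's figures (variable names are illustrative):

```python
RATES = {
    "L40S": 0.000542,          # $/sec
    "L4": 0.000222,            # $/sec
    "cpu_core": 0.000038,      # $/core/sec
    "memory_gib": 0.00000667,  # $/GiB/sec
}
TIMEOUT = 720  # default container idle timeout, seconds

l40s_cost = RATES["L40S"] * TIMEOUT               # ~$0.39
l4_cost = RATES["L4"] * TIMEOUT                   # ~$0.16
catalog_cpu = 4 * RATES["cpu_core"] * TIMEOUT     # ~$0.11 (Catalog, 4 cores)
lookup_cpu = 2 * RATES["cpu_core"] * TIMEOUT      # ~$0.05 (Lookup, 2 cores)
catalog_mem = 16 * RATES["memory_gib"] * TIMEOUT  # ~$0.08 (16 GiB for Catalog)
```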
Optimization Plan
1. Lookup Class (GPU Model)
- Goal: Handle more requests per container to reduce the number of containers needed.
- Actions:
  - Increase memory from 1024 MB → 4096 MB (4 GB).
  - Set `container_idle_timeout=300` (5 minutes) to avoid cold starts.
  - Set `concurrency_limit=4` to allow handling 4 requests simultaneously.
- Impact:
  - Fewer containers are created, reducing GPU and memory costs.
  - Cost per call reduced from $0.0055 to $0.0015.
2. API Service (FastAPI Web Handler)
- Goal: Make the API service lightweight and disposable to minimize idle costs.
- Actions:
  - Reduce `container_idle_timeout` to 30 seconds.
  - Set `concurrency_limit=2` to limit the number of active containers.
  - Disable `keep_warm` to avoid pre-warmed containers.
- Impact:
  - API containers shut down quickly when idle, reducing idle costs.
  - Startup time is fast (~3 s), so new requests are handled efficiently.
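Both configurations can be expressed as Modal decorator options. A hedged sketch using the option names this report uses (the app, class, and function names are illustrative; newer Modal SDK releases may rename some of these options, so check the current API reference before copying):

```python
import modal

app = modal.App("search-service")  # hypothetical app name

# Lookup (GPU model): bigger container, warm for 5 minutes, 4-way concurrency.
@app.cls(
    gpu="L4",
    memory=4096,                 # 1024 MB -> 4096 MB
    container_idle_timeout=300,  # 5 minutes, avoids cold starts
    concurrency_limit=4,
)
class Lookup:
    @modal.method()
    def query(self, q: str):
        ...  # model inference goes here

# API service (FastAPI web handler): lightweight and disposable.
@app.function(
    container_idle_timeout=30,  # shut down quickly when idle
    concurrency_limit=2,        # cap active containers
    keep_warm=0,                # no pre-warmed containers
)
@modal.asgi_app()
def api():
    from fastapi import FastAPI
    web = FastAPI()
    # routes calling Lookup.query would be registered here
    return web
```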
Cost Projections
Current Configuration
- Cost per Call: $0.0015 (optimized configuration).
- Hourly Cost: $1.08 (based on 200 requests in 5 minutes).
- Monthly Cost: ~$777.60 (assuming 24/7 operation).
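The monthly figure is the hourly figure times 720 hours (24 × 30). A sketch of the projection arithmetic (`monthly_cost_at` is a hypothetical helper assuming a steady request rate at the optimized per-call cost):

```python
COST_PER_CALL = 0.0015  # optimized configuration

def monthly_cost_at(requests_per_hour: float) -> float:
    """Project monthly cost from a steady request rate (24/7, 30-day month)."""
    return requests_per_hour * COST_PER_CALL * 24 * 30

hourly_cost = 1.08                    # $/hour, from the report
monthly_cost = hourly_cost * 24 * 30  # ~$777.60, as projected above
```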
Future Improvements
- Further Increase Concurrency:
  - If `concurrency_limit` is increased to 8, the cost per call could drop further.
- Dynamic Scaling:
  - Implement dynamic scaling based on request load to reduce idle time.
- GPU Selection:
  - Evaluate whether a lower-cost GPU (e.g., Nvidia T4) can handle the workload without performance degradation.
Key Takeaways
- GPU Costs Dominate:
  - The choice of GPU (e.g., L40S vs. H100) has the most significant impact on overall costs.
- Idle Time is Expensive:
  - Reducing `container_idle_timeout` for API services significantly lowers costs.
- Optimization Works:
  - Increasing memory and concurrency for Lookup cut the cost per call by roughly 73% ($0.0055 → $0.0015).
- Testing is Crucial:
  - Real-world testing (e.g., 200 requests in 5 minutes) validated the cost savings.
Next Steps
- Monitor Costs:
  - Continuously track costs after implementing the optimized configuration.
- Evaluate GPU Options:
  - Test whether a lower-cost GPU (e.g., Nvidia T4) can handle the workload.