Price Report

v1: Modal cost report for Alex

Cost Breakdown from Modal

The cost of the search service on Modal breaks down into the following components:

  1. GPU Running Time: Costs vary depending on the GPU type used (e.g., Nvidia H100, L4, L40S).
  2. CPU Running Time: Costs are based on the number of CPU cores reserved and their usage duration.
  3. Memory Allocation: Costs are calculated per GiB per second.
  4. Cloud Storage & Data Transfer: Includes volume usage, CloudBucketMount, and data egress costs.
  5. Idle Container Time: Costs are incurred for the duration a container remains active, even if idle.
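
As a reference for the rest of the report, the sketch below expresses this breakdown as a simple per-container cost model. It is a minimal sketch, not Modal's billing engine: the rates are the ones listed under "Key Points of Cost" below, and storage/egress (component 4) and any pricing tiers are deliberately left out.

```python
# Simplified per-container cost model covering components 1-3 above.
# Idle containers (component 5) accrue the same per-second charges for
# as long as they stay warm; storage and egress are not modeled here.

GPU_RATES = {          # $ per GPU-second
    "H100": 0.001267,
    "L4": 0.000222,
    "L40S": 0.000542,
}
CPU_RATE = 0.000038    # $ per core-second
MEM_RATE = 0.00000667  # $ per GiB-second


def estimate_cost(gpu, gpu_s, cpu_cores, cpu_s, mem_gib, mem_s):
    """Sum GPU, CPU, and memory charges for one container's billed time."""
    gpu_cost = GPU_RATES.get(gpu, 0.0) * gpu_s
    cpu_cost = CPU_RATE * cpu_cores * cpu_s
    mem_cost = MEM_RATE * mem_gib * mem_s
    return gpu_cost + cpu_cost + mem_cost


# Example: an L4 container with 2 cores and 16 GiB billed for 720 seconds.
print(round(estimate_cost("L4", 720, 2, 720, 16, 720), 2))  # ~0.29
```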

The Methodology Plan to Measure the Cost

  1. GPU Tasks: Measure the cost per second for each GPU type used in the Catalog and Lookup classes.
  2. CPU Tasks: Calculate the cost based on the number of CPU cores reserved and their usage time.
  3. Memory Usage: Track memory allocation and usage to determine the cost per GiB per second.
  4. Idle Container Time: Monitor the container idle timeout to calculate the cost of keeping containers active.
  5. Testing Data: Use real-world testing data (e.g., API calls, image processing) to validate cost calculations.
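
A minimal sketch of step 5 (validating against real calls), assuming a Lookup container with one L4 GPU, 2 reserved cores, and 16 GiB of memory; run_lookup_query is a placeholder for the real workload. It only approximates the wall-clock portion of the bill, since the idle window after the call is charged on top.

```python
import time

# Time one real call, then convert the duration into an approximate charge
# using the per-second rates from "Key Points of Cost". The container shape
# (1x L4, 2 cores, 16 GiB) is an assumption for illustration.
L4_RATE = 0.000222     # $ per GPU-second
CPU_RATE = 0.000038    # $ per core-second
MEM_RATE = 0.00000667  # $ per GiB-second


def run_lookup_query():
    time.sleep(0.5)    # stand-in for an actual Lookup call


start = time.perf_counter()
run_lookup_query()
elapsed = time.perf_counter() - start

per_second = L4_RATE + CPU_RATE * 2 + MEM_RATE * 16
print(f"{elapsed:.1f}s -> ~${per_second * elapsed:.4f} (excludes idle time after the call)")
```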

Key Points of Cost

  1. GPU Costs:

    • Nvidia H100: $0.001267/sec
    • Nvidia L4: $0.000222/sec
    • Nvidia L40S: $0.000542/sec
  2. CPU Costs:

    • $0.000038/core/sec
  3. Memory Costs:

    • $0.00000667/GiB/sec
  4. Idle Container Costs:

    • Containers remain active for 720 seconds by default, incurring costs even when idle.
  5. Optimization Strategies:

    • Increase memory and concurrency for Lookup to handle more requests per container.
    • Reduce container idle timeout for API services to minimize idle costs.
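
To make item 4 concrete, the quick check below multiplies each GPU rate by the 720-second default idle window; the L4 and L40S figures match the Price Table further down, and CPU and memory charges come on top of these.

```python
# GPU-only cost of one full 720 s idle window, per GPU type.
RATES = {"H100": 0.001267, "L4": 0.000222, "L40S": 0.000542}  # $ per second
IDLE_SECONDS = 720

for gpu, rate in RATES.items():
    print(f"{gpu}: ${rate * IDLE_SECONDS:.2f} per idle window")
# H100: $0.91, L4: $0.16, L40S: $0.39
```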

Cost Log Report

Testing Data

  1. get_image:

    • Cost: $0.40
    • Time: 347.9s + 17.06s
  2. get_image_urns:

    • Memory: 1.06GB/16GB
    • CPU: 0.11 core/4 core
    • GPU Memory: 16.40GB
    • Cost: $1.42 ($0.92 GPU + $0.27 CPU + $0.19 Memory)
    • Time: 23mins 17s
  3. First API Call:

    • Time: 32s
    • Cost: $0.24 ($0.18 GPU + $0.06 CPU + $0.01 Memory)
  4. Frequent Request Test (40 requests/min):

    • Total Requests: 160
    • Cost: $0.88 ($0.64 GPU + $0.22 Memory + $0.02 CPU)
    • Time: 12mins
    • Cost per Call: $0.0055
  5. Optimized Configuration Test (200 requests in 5 mins):

    • Cost: $0.31 ($0.21 GPU + $0.07 Memory + $0.03 CPU)
    • Cost per Call: $0.0015
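
The per-call figures in items 4 and 5 follow directly from total cost divided by request count; a quick check:

```python
# Re-deriving the cost-per-call numbers from the two load tests above.
frequent = {"total_cost": 0.88, "requests": 160}   # 40 requests/min test
optimized = {"total_cost": 0.31, "requests": 200}  # optimized configuration test

print(f"{frequent['total_cost'] / frequent['requests']:.4f}")    # 0.0055
print(f"{optimized['total_cost'] / optimized['requests']:.5f}")  # 0.00155, i.e. ~$0.0015
```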

Cost Analysis

  1. GPU Costs:

    • The Nvidia L40S GPU is the most cost-effective for frequent requests, with a cost of $0.000542/sec.
    • The Nvidia H100 is more expensive but may be necessary for specific tasks.
  2. CPU Costs:

    • CPU costs are minimal compared to GPU costs but should still be optimized by reducing reserved cores.
  3. Memory Costs:

    • Memory costs are relatively low but can add up with high concurrency and large datasets.
  4. Idle Container Costs:

    • Reducing the container idle timeout significantly lowers costs, especially for API services.
  5. Optimization Impact:

    • Increasing memory and concurrency for Lookup reduced the cost per call from $0.0055 to $0.0015.
    • Shortening the container idle timeout for API services further minimized idle costs.

Price Table

Resource Type   | Cost per Unit            | Example Usage                        | Cost Example
Nvidia L40S     | $0.000542/sec            | Frequent request handling            | 720s = $0.39
Nvidia L4       | $0.000222/sec            | Lookup class for queries             | 720s = $0.16
CPU (per core)  | $0.000038/core/sec       | Catalog (4 cores), Lookup (2 cores)  | 720s = $0.11 (Catalog), $0.05 (Lookup)
Memory          | $0.00000667/GiB/sec      | 16GB reserved for Catalog            | 720s = $0.08
Idle Container  | Based on GPU/CPU/Memory  | 720s timeout                         | $0.24 (L4 GPU + CPU + Memory)

Optimization Plan

1. Lookup Class (GPU Model)
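
A minimal sketch of how the Lookup class could be configured on Modal to apply the strategies above: L4 GPU, 2 reserved cores, more memory, several concurrent inputs per container, and a shorter idle timeout. Decorator and parameter names follow Modal's Python SDK and may differ between SDK versions; the concrete values, image, and model-loading code are illustrative assumptions, not the deployed settings.

```python
import modal

app = modal.App("lookup-service")
image = modal.Image.debian_slim().pip_install("torch")  # assumed dependencies


@app.cls(
    image=image,
    gpu="L4",                    # cheaper GPU used for Lookup queries
    cpu=2,                       # 2 reserved cores (per the Price Table)
    memory=16 * 1024,            # MiB; raised so one container can serve more requests
    allow_concurrent_inputs=4,   # several requests share one warm container
    container_idle_timeout=120,  # well under the 720 s default idle window
)
class Lookup:
    @modal.enter()
    def load_model(self):
        # Load the GPU model once per container so per-request cost stays low.
        self.model = ...  # placeholder for the real model

    @modal.method()
    def query(self, urn: str):
        # Run a single lookup against the loaded model (placeholder result).
        return {"urn": urn}
```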

2. API Service (FastAPI Web Handler)
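
A matching sketch for the FastAPI web handler: CPU-only, high concurrency, and a short idle timeout so idle API containers stop billing quickly. As above, the names track Modal's Python SDK and the values are assumptions; the endpoint body is a placeholder rather than the real routing to Lookup.

```python
import modal
from fastapi import FastAPI

app = modal.App("search-api")
web_app = FastAPI()


@web_app.get("/lookup")
async def lookup(q: str):
    # Placeholder: the real handler would call the Lookup class.
    return {"query": q}


@app.function(
    image=modal.Image.debian_slim().pip_install("fastapi[standard]"),
    cpu=1,                       # no GPU needed for request routing
    allow_concurrent_inputs=20,  # web requests are cheap to multiplex
    container_idle_timeout=60,   # far below the 720 s default
)
@modal.asgi_app()
def fastapi_app():
    return web_app
```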


Cost Projections

Current Configuration

Future Improvements

  1. Further Increase Concurrency:
    • If concurrency_limit is increased to 8, the cost per call could drop further (see the sketch after this list).
  2. Dynamic Scaling:
    • Implement dynamic scaling based on request load to reduce idle time.
  3. GPU Selection:
    • Evaluate if a lower-cost GPU (e.g., Nvidia T4) can handle the workload without performance degradation.
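
A purely illustrative projection for item 1, assuming GPU time per container is shared evenly across concurrent requests; real gains will be smaller once the GPU saturates, so treat this as an upper bound rather than a forecast.

```python
# Ideal-scaling projection from the measured $0.0015/call at concurrency 4.
baseline_cost_per_call = 0.0015
baseline_concurrency = 4

for concurrency in (4, 8):
    projected = baseline_cost_per_call * baseline_concurrency / concurrency
    print(f"concurrency {concurrency}: ~${projected:.5f} per call")
# concurrency 8: ~$0.00075 per call under this assumption
```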

Key Takeaways

  1. GPU Costs Dominate:
    • The choice of GPU (e.g., L40S vs. H100) has the most significant impact on overall costs.
  2. Idle Time is Expensive:
    • Containers keep billing for the full idle window (720 seconds by default), so the idle timeout is a major cost lever.
  3. Optimization Works:
    • Raising memory and concurrency for Lookup and shortening idle timeouts cut the cost per call from $0.0055 to $0.0015.
  4. Testing is Crucial:
    • The per-call figures above come from real load tests (160 and 200 requests), not from list prices alone.

Next Steps

  1. Monitor Costs:
    • Keep logging GPU, CPU, and memory spend per test so the per-call estimates stay current as request volume grows.
  2. Evaluate GPU Options:
    • Benchmark the Lookup workload on a lower-cost GPU (e.g., Nvidia T4) before committing to the current L4/L40S setup.