Can you use "Spot" or "Preemptible" GPU instances for real-time inference? What happens to the user's request if the cloud provider reclaims the GPU mid-generation?

Accepted Answer

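You can use them for batch processing, where latency doesn't matter and an interrupted job can simply be retried, but they are risky for real-time inference. If the provider reclaims the GPU mid-generation, the in-flight request fails. The stream is cut off, and the user sees an error or a truncated response unless the serving layer handles it. To avoid that, you need a fallback mechanism that quickly reroutes the request to a higher-cost, on-demand instance. The fallback either restarts the generation or resumes it from the tokens already produced.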
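Below is a minimal Python sketch of such a fallback. It assumes a hypothetical serving layer in which each backend is a function that takes a prompt and yields tokens, and in which the spot backend signals reclamation by raising a hypothetical `PreemptedError`. Tokens are forwarded to the user as they arrive. If the spot backend is preempted, the original prompt plus the partial output is sent to the on-demand backend, so the user's stream continues instead of restarting. `stream_with_fallback`, `fake_spot`, and `fake_on_demand` are illustrative names and are not part of any real library.

```python
from typing import Callable, Iterator, List


class PreemptedError(Exception):
    """Hypothetical signal: the spot/preemptible instance was reclaimed mid-stream."""


# Hypothetical backend interface: takes a prompt, yields generated tokens.
Backend = Callable[[str], Iterator[str]]


def stream_with_fallback(prompt: str, spot: Backend, on_demand: Backend) -> Iterator[str]:
    """Stream tokens from the cheap spot backend; on preemption, continue on on-demand."""
    emitted: List[str] = []
    try:
        for token in spot(prompt):
            emitted.append(token)
            yield token  # forward each token to the user immediately
    except PreemptedError:
        # Resume rather than restart: the on-demand backend continues from the
        # original prompt plus everything the user has already received.
        continuation = prompt + "".join(emitted)
        yield from on_demand(continuation)


# --- Usage with fake backends (illustrative only) ---
def fake_spot(prompt: str) -> Iterator[str]:
    yield "Hello"
    yield ","
    raise PreemptedError("GPU reclaimed by the provider")


def fake_on_demand(prompt: str) -> Iterator[str]:
    yield " world"
    yield "!"


print("".join(stream_with_fallback("Say hi: ", fake_spot, fake_on_demand)))
# → Hello, world!
```

Resuming by concatenating text assumes both backends run the same model with the same prompt format. The simpler alternative is to retry the full request on on-demand, but then any tokens already streamed to the user must be discarded or de-duplicated. Providers also give a short warning before reclaiming an instance: about two minutes on AWS Spot, and about 30 seconds on Google Cloud Spot/preemptible VMs. A router can use that window to stop sending new requests to the instance before it disappears.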
