Runway ML built a capacity controller that shifts GPUs between production inference and research to follow the daily demand cycle. It uses queueing theory to size production capacity: overnight, when traffic drops to half its peak, production GPUs flow to research, and they return before the morning surge, cutting both costs and queue wait times.
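To make the sizing idea concrete, here is a minimal sketch of how queueing theory can turn a traffic level into a GPU count. The summary does not say which model Runway ML uses; this example assumes a textbook M/M/c queue and the Erlang C formula, and the function names, request rates, and the 10% wait-probability target are all hypothetical values chosen for illustration.

```python
import math


def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arriving request has to queue in an M/M/c system.

    offered_load is arrival_rate / service_rate (in Erlangs).
    """
    if offered_load >= servers:
        return 1.0  # Unstable regime: every request ends up waiting.
    # Erlang B via the standard recurrence, which avoids large factorials.
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    utilization = offered_load / servers
    # Convert Erlang B (loss) to Erlang C (delay).
    return blocking / (1.0 - utilization * (1.0 - blocking))


def gpus_needed(arrival_rate: float, service_rate: float, max_wait_prob: float) -> int:
    """Smallest GPU count that keeps P(request waits) at or below max_wait_prob."""
    load = arrival_rate / service_rate
    servers = max(1, math.ceil(load))
    while erlang_c(servers, load) > max_wait_prob:
        servers += 1
    return servers


# Hypothetical numbers: 40 requests/s at peak, 20 requests/s overnight,
# each GPU serving 1 request/s, and at most 10% of requests allowed to queue.
peak_gpus = gpus_needed(arrival_rate=40.0, service_rate=1.0, max_wait_prob=0.10)
night_gpus = gpus_needed(arrival_rate=20.0, service_rate=1.0, max_wait_prob=0.10)
lendable_to_research = peak_gpus - night_gpus

print(f"peak: {peak_gpus} GPUs, overnight: {night_gpus} GPUs, "
      f"lendable to research: {lendable_to_research}")
```

Under this model the overnight requirement is more than half of the peak requirement, because the safety margin above the raw load shrinks more slowly than the load itself. A sizing formula captures that effect where a simple "halve the fleet" rule would not.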

Borrowing the Night: Reclaiming Idle Inference GPUs for Research