Inference mode

API backend setup

This ZeroGPU demo keeps local vLLM disabled; benchmarking runs against an external OpenAI-compatible endpoint instead.
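A benchmarking call to an OpenAI-compatible endpoint can be sketched as below. This is a minimal illustration, not the demo's actual client code: the base URL, API key, and model name are placeholders, and the payload fields follow the standard `/v1/chat/completions` schema that OpenAI-compatible servers (vLLM included) accept.

```python
import json
from urllib import request

API_BASE = "http://localhost:8000/v1"  # placeholder: any OpenAI-compatible server
API_KEY = "EMPTY"                      # placeholder; many local servers ignore the key

def build_chat_request(model: str, prompt: str,
                       max_tokens: int = 256, temperature: float = 0.7) -> dict:
    """Assemble a /v1/chat/completions payload for an OpenAI-compatible server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def send_chat_request(payload: dict, timeout: float = 300.0) -> dict:
    """POST the payload to the chat completions route and return parsed JSON."""
    req = request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"},
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

payload = build_chat_request("my-remote-model", "Hello!", max_tokens=64)
```

The same payload works unchanged against a remote provider or a locally hosted server; only `API_BASE` and the model name differ.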

OpenAI-compatible preset
Remote API model

Performance History

Exports appear after the first successful run.
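The history table above can be thought of as an aggregation over individual benchmark runs. A hedged sketch of that aggregation, assuming hypothetical per-run fields `latency_s` and `completion_tokens` (the demo's real export schema is not shown here):

```python
from statistics import mean

def summarize_runs(runs: list[dict]) -> dict:
    """Aggregate per-run benchmark records into one history row.

    Each run dict is assumed to carry 'latency_s' (wall-clock seconds) and
    'completion_tokens' (tokens generated) -- hypothetical field names.
    """
    latencies = [r["latency_s"] for r in runs]
    tokens = [r["completion_tokens"] for r in runs]
    return {
        "runs": len(runs),
        "avg_latency_s": round(mean(latencies), 3),
        # Aggregate throughput: total tokens over total wall time.
        "tokens_per_s": round(sum(tokens) / sum(latencies), 2),
    }

history = summarize_runs([
    {"latency_s": 2.0, "completion_tokens": 100},
    {"latency_s": 4.0, "completion_tokens": 260},
])
# history["tokens_per_s"] == 60.0
```

Computing throughput from totals rather than averaging per-run rates avoids over-weighting short runs.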