Inference mode

API backend setup

This ZeroGPU demo keeps local vLLM disabled; benchmarking runs against an external OpenAI-compatible endpoint instead.
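A benchmarking call to an OpenAI-compatible endpoint can be sketched as below. This is a minimal illustration, not the demo's actual client code: the base URL, API key, and model name are placeholders, and the payload fields follow the standard `/v1/chat/completions` schema that OpenAI-compatible servers (vLLM included) accept.

```python
import json
from urllib import request

API_BASE = "http://localhost:8000/v1"  # placeholder: any OpenAI-compatible server
API_KEY = "EMPTY"                      # placeholder; many local servers ignore the key

def build_chat_request(model: str, prompt: str,
                       max_tokens: int = 256, temperature: float = 0.7) -> dict:
    """Assemble a /v1/chat/completions payload for an OpenAI-compatible server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def send_chat_request(payload: dict, timeout: float = 300.0) -> dict:
    """POST the payload to the chat completions route and return parsed JSON."""
    req = request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"},
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

payload = build_chat_request("my-remote-model", "Hello!", max_tokens=64)
```

The same payload works unchanged against a remote provider or a locally hosted server; only `API_BASE` and the model name differ.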

OpenAI-compatible preset
Remote API model

Performance History

Exports appear after the first successful run.
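The history table above can be thought of as an aggregation over individual benchmark runs. A hedged sketch of that aggregation, assuming hypothetical per-run fields `latency_s` and `completion_tokens` (the demo's real export schema is not shown here):

```python
from statistics import mean

def summarize_runs(runs: list[dict]) -> dict:
    """Aggregate per-run benchmark records into one history row.

    Each run dict is assumed to carry 'latency_s' (wall-clock seconds) and
    'completion_tokens' (tokens generated) -- hypothetical field names.
    """
    latencies = [r["latency_s"] for r in runs]
    tokens = [r["completion_tokens"] for r in runs]
    return {
        "runs": len(runs),
        "avg_latency_s": round(mean(latencies), 3),
        # Aggregate throughput: total tokens over total wall time.
        "tokens_per_s": round(sum(tokens) / sum(latencies), 2),
    }

history = summarize_runs([
    {"latency_s": 2.0, "completion_tokens": 100},
    {"latency_s": 4.0, "completion_tokens": 260},
])
# history["tokens_per_s"] == 60.0
```

Computing throughput from totals rather than averaging per-run rates avoids over-weighting short runs.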