Delivering 1.5 M TPS Inference on NVIDIA GB200 NVL72, NVIDIA Accelerates OpenAI gpt-oss Models from Cloud to Edge

Open AI and NVIDIA logos. NVIDIA and OpenAI began pushing the boundaries of AI with the launch of NVIDIA DGX back in 2016. The collaborative AI innovation continues with the OpenAI…

NVIDIA and OpenAI began pushing the boundaries of AI with the launch of NVIDIA DGX back in 2016. The collaborative AI innovation continues with the OpenAI gpt-oss-20b and gpt-oss-120b launch. NVIDIA has optimized both new open-weight models for accelerated inference performance on NVIDIA Blackwell architecture, delivering up to 1.5 million tokens per second (TPS) on an NVIDIA GB200 NVL72 system.

Source

Leave a Reply Cancel reply