GeForce NOW is throwing open the vault doors to welcome the legendary Borderlands series to the cloud. Whether you’re a seasoned Vault Hunter or new to the mayhem of Pandora, prepare to experience the high-octane action and humor that define the series, which includes Borderlands Game of the Year Enhanced, Borderlands 2, Borderlands 3 and Borderlands: …
As enterprises generate and consume increasing volumes of diverse data, extracting insights from multimodal documents, like PDFs and presentations, has become a major challenge. Traditional text-only extraction and basic retrieval-augmented generation (RAG) pipelines fall short, failing to capture the full value of these complex documents. The result? Missed insights, inefficient workflows…
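As a rough sketch of the retrieval step such a pipeline depends on, the snippet below ranks pre-extracted chunks by cosine similarity to a query embedding. The embed() function here is a random-vector stand-in for a real multimodal embedding model; only the ranking logic is meant to carry over.

```python
import numpy as np

# Stand-in embedding function: in a real multimodal pipeline this would
# come from a vision-language embedding model, not a seeded RNG.
def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(256)
    return v / np.linalg.norm(v)  # unit-normalize so dot product = cosine

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query embedding."""
    q = embed(query)
    scores = [float(q @ embed(c)) for c in chunks]
    top = sorted(range(len(chunks)), key=lambda i: -scores[i])[:k]
    return [chunks[i] for i in top]

chunks = [
    "Q3 revenue table (extracted from slide 12)",
    "Figure caption: wafer yield by fab line",
    "Executive summary paragraph",
]
print(retrieve("What was Q3 revenue?", chunks, k=1))
```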
In today’s fast-paced IT environment, not all incidents begin with obvious alarms. They may start as subtle, scattered signals: a missed alert, a quiet SLO breach, or a degraded service that slowly impacts users. Designed by the NVIDIA IT team, ITMonitron is an internal tool that helps make sense of these faint signals. By combining real-time telemetry with NVIDIA NIM inference microservices…
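NIM microservices expose an OpenAI-compatible HTTP API, so the inference side of a tool like this can be sketched in a few lines. The endpoint URL and model name below are placeholders for a locally deployed NIM, not details from the post.

```python
from openai import OpenAI

# A locally deployed NIM serves an OpenAI-compatible API; base_url and
# model id are illustrative placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

signals = [
    "SLO breach: checkout latency p99 > 800ms for 12 min",
    "Alert suppressed: disk pressure on node cache-04",
    "Error-rate drift on auth-service, +0.3% over baseline",
]

resp = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # example NIM model id
    messages=[{
        "role": "user",
        "content": "Correlate these IT signals and summarize the likely "
                   "incident:\n" + "\n".join(signals),
    }],
)
print(resp.choices[0].message.content)
```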
A chunking strategy is the method of breaking down large documents into smaller, manageable pieces for AI retrieval. It determines how effectively relevant information is fetched for accurate AI responses, and poor chunking leads to irrelevant results, inefficiency, and reduced business value. With so many options available—page-level, section-level, or token-based chunking with various sizes—how do…
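For a concrete feel of the token-based option, here is a minimal fixed-size chunker with overlap. Whitespace tokens stand in for a real tokenizer, and the sizes are illustrative rather than recommendations.

```python
def chunk_tokens(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size token windows that overlap.

    Overlap keeps sentences that straddle a boundary retrievable from
    both neighboring chunks.
    """
    tokens = text.split()  # stand-in for a real (e.g. BPE) tokenizer
    step = chunk_size - overlap
    return [" ".join(tokens[i:i + chunk_size])
            for i in range(0, max(len(tokens) - overlap, 1), step)]

doc = "word " * 1000
pieces = chunk_tokens(doc, chunk_size=200, overlap=40)
print(len(pieces), "chunks;", len(pieces[0].split()), "tokens in the first")
```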
Have you ever wondered exactly what the CUDA compiler generates when you write GPU kernels? Ever wanted to share a minimal CUDA example with a colleague effortlessly, without the need for them to install a specific CUDA toolkit version first? Or perhaps you’re completely new to CUDA and looking for an easy way to start without needing to install anything or even having a GPU on hand?
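One way to peek at generated GPU code from Python, assuming Numba with CUDA support is installed, is to compile a kernel to PTX without ever launching it. This is a sketch under those assumptions, not the workflow the post itself describes.

```python
from numba import cuda, float32

# A trivial kernel whose generated PTX we want to inspect.
def scale(out, x, a):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = a * x[i]

# compile_ptx compiles for a given compute capability without launching
# anything; no GPU kernel ever runs here.
ptx, _ = cuda.compile_ptx(scale, (float32[:], float32[:], float32), cc=(8, 0))
print(ptx[:400])  # first lines of the generated PTX
```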
LMArena at the University of California, Berkeley is making it easier to see which large language models excel at specific tasks, thanks to help from NVIDIA and Nebius. Its rankings, powered by the Prompt-to-Leaderboard (P2L) model, are built from human votes on which AI performs best in areas such as math, coding, or creative writing. “We capture user preferences across tasks and apply…
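Arena-style leaderboards of this kind typically rest on Bradley-Terry pairwise models, where each model gets a skill coefficient and the chance of winning a head-to-head vote follows a logistic curve. A minimal illustration, with invented coefficients:

```python
import math

def win_probability(theta_a: float, theta_b: float) -> float:
    """Bradley-Terry: P(A preferred over B) from skill coefficients."""
    return 1.0 / (1.0 + math.exp(theta_b - theta_a))

# Illustrative coefficients, e.g. for a "coding" prompt category.
print(win_probability(1.2, 0.8))  # ~0.60: model A edges out model B
```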
The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node communication primitives optimized for NVIDIA GPUs and networking. NCCL is a central piece of software for multi-GPU deep learning training. It handles any kind of inter-GPU communication, be it over PCIe, NVIDIA NVLink, or networking. It uses advanced topology detection, optimized communication graphs…
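Most developers exercise NCCL indirectly through a framework. As a minimal sketch, assuming a machine with at least two GPUs, PyTorch’s distributed package with the "nccl" backend routes the all-reduce below through NCCL:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank: int, world_size: int):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    # The "nccl" backend makes this process group use NCCL collectives.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    x = torch.ones(4, device="cuda") * (rank + 1)
    dist.all_reduce(x, op=dist.ReduceOp.SUM)  # summed across all GPUs
    print(f"rank {rank}: {x.tolist()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    n = torch.cuda.device_count()  # assumes >= 2 GPUs are present
    mp.spawn(worker, args=(n,), nprocs=n)
```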
NVIDIA leverages data science and machine learning to optimize chip manufacturing and operations workflows—from wafer fabrication and circuit probing to packaged chip testing. These stages generate terabytes of data, and turning that data into actionable insights at speed and scale is critical to ensuring quality, throughput, and cost efficiency. Over the years, we’ve developed robust ML pipelines…
This is the third post in the large language model latency-throughput benchmarking series, which shows developers how to determine the cost of LLM inference by estimating the total cost of ownership (TCO). See LLM Inference Benchmarking: Fundamental Concepts for background on common benchmarking metrics and parameters. See LLM Inference Benchmarking Guide: NVIDIA…
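At its core, a TCO estimate reduces to simple arithmetic: dollars per hour of hardware divided by tokens served per hour. A back-of-envelope sketch with invented numbers, not figures from the series:

```python
def cost_per_million_tokens(gpu_hourly_usd: float,
                            num_gpus: int,
                            tokens_per_second: float) -> float:
    """Back-of-envelope serving cost; all inputs are illustrative."""
    tokens_per_hour = tokens_per_second * 3600
    hourly_cost = gpu_hourly_usd * num_gpus
    return hourly_cost / tokens_per_hour * 1_000_000

# Hypothetical: 8 GPUs at $3/hr each, 12,000 output tokens/s aggregate.
print(f"${cost_per_million_tokens(3.0, 8, 12_000):.2f} per 1M tokens")  # $0.56
```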