NVIDIA published a fresh round of PiD (Pixel Diffusion Decoder) checkpoints on June 2, including a FLUX.2 variant that fixes a color-drift issue from the original release and brand-new checkpoints for Qwen-Image and Qwen-Image-2512. The drop also adds an SDXL 2k-to-4k checkpoint, putting three of the most-used image-generation latent spaces on a single Apache 2.0 decoder.

PiD replaces the standard VAE decode + super-resolution cascade with a single conditional pixel-diffusion pass, decoding directly into 4K. On a consumer RTX 5090, the team reports a 512-to-2048 decode in under one second, and the research page claims the approach is roughly 6x faster than cascaded SR baselines.

Try It: Run the New Checkpoints in ComfyUI

The fastest path is the community ComfyUI-PiD node, which auto-downloads NVIDIA's checkpoints on first run. Add the PiD Decode node to any FLUX.2 or Qwen-Image workflow, set pid_ckpt_type to 2kto4k, and feed it the latent along with your prompt and sigma value. The result is a 4K image with no separate upscaler in the graph.

For VRAM-tight rigs, the staged PiD Prepare → PiD Sample → PiD Finalize nodes run sampling in a subprocess so that CUDA memory is freed between steps. Both node packs are on the ComfyUI feature-request tracker for first-party integration.
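
The subprocess trick works because a child process releases everything it allocated, including its CUDA context, when it exits. The following is a minimal sketch of that isolation pattern, not the node pack's actual code: run_stage_isolated, CHILD_CODE, and the toy "sampling" arithmetic are hypothetical stand-ins, and a real stage would load the model and pass tensors through files rather than JSON.

```python
import json
import subprocess
import sys

# Code executed in the child process. The arithmetic is a stand-in for a
# GPU-heavy sampling step (hypothetical; a real stage would run the model).
CHILD_CODE = """
import json, sys
payload = json.load(sys.stdin)
result = [x * payload["steps"] for x in payload["latent"]]
json.dump(result, sys.stdout)
"""


def run_stage_isolated(latent, steps):
    """Run one stage in a fresh Python process and return its result.

    When the child exits, the OS and driver reclaim all of its memory,
    so nothing from this stage stays resident in the parent process.
    """
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=json.dumps({"latent": latent, "steps": steps}),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    print(run_stage_isolated([1, 2, 3], steps=4))
```

The trade-off is startup cost: each stage pays for a new interpreter and, in a real pipeline, a model reload, which is why this pattern suits memory-constrained setups rather than latency-critical ones.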

Why It Matters

VAE decoders are reconstruction-trained and become the visual-quality ceiling at 4K. PiD treats decoding as generation, so it can synthesize plausible detail itself rather than leaving a downstream upscaler to invent it. The June 2 checkpoint resolves the most-reported issue from the first FLUX.2 release: a green color cast on saturated images. The original PiD paper shows the architecture distilled to four sampling steps using DMD2, which is what keeps decode latency under a second. For creators running FLUX.2 in production pipelines, swapping the decoder is a one-line change with no LoRA or sampler impact.

Key Details

The repository now ships seven official checkpoints across FLUX.1, FLUX.2, SD3, SDXL, Qwen-Image, Qwen-Image-2512, and Z-Image latent spaces, plus DINOv2 and SigLIP variants for semantic-latent backbones. Resolution variants are 2k (sr4x or sr8x) and 2kto4k (sr4x), all distilled to 4 sampling steps. The Hugging Face checkpoint directory lists upload dates, and the latest batch carries the _2606 suffix. The license is Apache 2.0, which permits commercial use and redistribution as long as the license and attribution notices are preserved; the licenses of the underlying FLUX and Qwen-Image models still apply separately. On a GB200 GPU, decode time drops to 210 milliseconds.
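
If you script your downloads, the suffix gives a simple way to pick out the latest batch from a directory listing. The sketch below assumes the suffix sits at the end of the file name, just before the extension; the helper name and the example file names are made up for illustration and are not the repository's real file names.

```python
from pathlib import PurePosixPath


def latest_batch(filenames, suffix="_2606"):
    """Return the names whose stem (name without extension) ends with suffix."""
    return [
        name for name in filenames
        if PurePosixPath(name).stem.endswith(suffix)
    ]


if __name__ == "__main__":
    # Hypothetical listing, for illustration only.
    listing = ["example_a_2606.ckpt", "example_a.ckpt", "example_b_2606.ckpt"]
    print(latest_batch(listing))
```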

What to Do Next

Pull the new checkpoints from Hugging Face and swap your current VAE decode node for PiD Decode on a test prompt before changing your production graph. Compare PiD against your existing VAE-plus-SR pipeline at 4K, measuring both visual fidelity and end-to-end latency. If you batch-render at scale, the single-pass decode should cut total inference time noticeably on RTX-class hardware.
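
For the latency half of that comparison, a small timing harness is enough. This is a rough sketch under stated assumptions: median_latency and compare are hypothetical helpers, and the two callables you pass in stand for "run my existing VAE plus SR pipeline once" and "run PiD once" on the same latent. For GPU work, each callable should block until the result is actually ready (for example by calling torch.cuda.synchronize() at its end), or the timer will only measure kernel launch time.

```python
import statistics
import time


def median_latency(fn, *, warmup=1, runs=5):
    """Return the median wall-clock seconds of fn() over `runs` timed calls."""
    for _ in range(warmup):
        fn()  # untimed calls absorb one-off costs such as model loading
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(baseline_fn, candidate_fn, **kwargs):
    """Time both callables and report the candidate's speedup over the baseline."""
    base = median_latency(baseline_fn, **kwargs)
    cand = median_latency(candidate_fn, **kwargs)
    return {"baseline_s": base, "candidate_s": cand, "speedup": base / cand}


if __name__ == "__main__":
    # Stand-ins for the two pipelines; replace with your own decode calls.
    report = compare(lambda: time.sleep(0.05), lambda: time.sleep(0.01))
    print(report)
```

The median is used rather than the mean so that a single slow run, such as a cache miss or a background process, does not skew the result.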