Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron
NVIDIA Technical Blog
Hao Wu
Higher-order optimization algorithms such as Shampoo have been effectively applied in neural network training for at least a decade. These methods have achieved significant success more recently when applied to leading LLMs. In particular, Muon (MomentUm Orthogonalized by Newton-Schulz) was used to train some of today's best open source models, including Kimi K2 and GLM-4.5.
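The core of Muon, as its name suggests, is orthogonalizing the momentum matrix with a Newton-Schulz iteration. The sketch below illustrates that step in NumPy; the quintic coefficients follow the widely used Muon reference implementation and are an assumption here, not taken from this article.

```python
import numpy as np

def newton_schulz_orthogonalize(G: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximately orthogonalize a matrix with a quintic Newton-Schulz iteration.

    A minimal sketch of the orthogonalization at the heart of Muon: it drives
    the singular values of G toward 1 without computing an explicit SVD,
    using only matrix multiplications (which run efficiently on GPUs).
    """
    # Coefficients from the common Muon reference implementation (assumption).
    a, b, c = 3.4445, -4.7750, 2.0315

    # Normalize by the Frobenius norm so all singular values start in (0, 1],
    # which keeps the iteration in its convergent regime.
    X = G / (np.linalg.norm(G) + eps)

    # Work with a wide matrix so X @ X.T is the smaller Gram matrix.
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T

    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X  # applies the quintic polynomial to each singular value

    return X.T if transposed else X
```

Each iteration applies the polynomial f(x) = ax + bx³ + cx⁵ to the singular values, so after a few steps they cluster near 1 while the singular vectors are untouched; the resulting near-orthogonal matrix replaces the raw momentum as the update direction.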
