    NVIDIA Releases Nemotron-Cascade 2: An Open 30B MoE with 3B Active Parameters, Delivering Better Reasoning and Strong Agentic Capabilities

    March 21, 2026
    NVIDIA has announced the release of Nemotron-Cascade 2, an open-weight 30B Mixture-of-Experts (MoE) model with 3B activated parameters. The model focuses on maximizing ‘intelligence density,’ delivering advanced reasoning capabilities at a fraction of the parameter scale used by frontier models. Nemotron-Cascade 2 is the second open-weight LLM to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals.

    https://research.nvidia.com/labs/nemotron/files/Nemotron-Cascade-2.pdf

    Targeted Performance and Strategic Trade-offs

    The primary value proposition of Nemotron-Cascade 2 is its specialized performance in mathematical reasoning, coding, alignment, and instruction following. While it achieves state-of-the-art results in these reasoning-intensive domains, it is not a blanket win across every benchmark.

    The model outperforms both the recently released Qwen3.5-35B-A3B (February 2026) and the larger Nemotron-3-Super-120B-A12B in several targeted categories:

    • Mathematical Reasoning: Outperforms Qwen3.5-35B-A3B on AIME 2025 (92.4 vs. 91.9) and HMMT Feb25 (94.6 vs. 89.0).
    • Coding: Leads on LiveCodeBench v6 (87.2 vs. 74.6) and IOI 2025 (439.28 vs. 348.6+).
    • Alignment and Instruction Following: Scores significantly higher on ArenaHard v2 (83.5 vs. 65.4+) and IFBench (82.9 vs. 70.2).

    Technical Architecture: Cascade RL and Multi-domain On-Policy Distillation (MOPD)

    The model’s reasoning capabilities stem from its post-training pipeline, starting from the Nemotron-3-Nano-30B-A3B-Base model.


    1. Supervised Fine-Tuning (SFT)

    During SFT, the NVIDIA research team used a meticulously curated dataset in which samples were packed into sequences of up to 256K tokens. The dataset included:

    • 1.9M Python reasoning traces and 1.3M Python tool-calling samples for competitive coding.
    • 816K samples for mathematical natural language proofs.
    • A specialized Software Engineering (SWE) blend consisting of 125K agentic and 389K agentless samples.
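    The packing step described above can be sketched as a greedy bin-packing of tokenized samples into fixed-length sequences. This is a minimal illustration, not NVIDIA's pipeline: `MAX_LEN` mirrors the 256K-token limit from the text, but the packing strategy and sample lengths are assumptions.

    ```python
    # Minimal sketch of greedy sequence packing for SFT, assuming samples are
    # already tokenized. MAX_LEN mirrors the 256K-token limit described above;
    # the sample lengths below are made up for illustration.
    MAX_LEN = 256_000

    def pack_samples(sample_lengths, max_len=MAX_LEN):
        """Greedily pack sample indices into sequences no longer than max_len tokens."""
        sequences, current, current_len = [], [], 0
        for i, length in enumerate(sample_lengths):
            if current_len + length > max_len and current:
                sequences.append(current)
                current, current_len = [], 0
            current.append(i)
            current_len += length
        if current:
            sequences.append(current)
        return sequences

    # Three short samples fit in one packed sequence; the long one starts a new one.
    packed = pack_samples([100_000, 100_000, 50_000, 200_000])
    print(packed)  # [[0, 1, 2], [3]]
    ```
    
    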

    2. Cascade Reinforcement Learning

    Following SFT, the model underwent Cascade RL, which applies sequential, domain-wise training. This prevents catastrophic forgetting by allowing hyperparameters to be tailored to specific domains without destabilizing others. The pipeline includes stages for instruction-following (IF-RL), multi-domain RL, RLHF, long-context RL, and specialized Code and SWE RL.
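    The sequential, domain-wise structure of Cascade RL can be sketched as a simple staged loop. The stage names come from the text; the per-stage hyperparameters and the `train_stage` function are hypothetical placeholders, not NVIDIA's implementation.

    ```python
    # Illustrative sketch of sequential, domain-wise RL as described above.
    # Each stage gets its own hyperparameters, so tuning one domain does not
    # destabilize the others. Hyperparameter values are invented.
    STAGES = [
        ("IF-RL",           {"lr": 1e-6, "kl_coef": 0.01}),
        ("multi-domain RL", {"lr": 5e-7, "kl_coef": 0.02}),
        ("RLHF",            {"lr": 5e-7, "kl_coef": 0.05}),
        ("long-context RL", {"lr": 2e-7, "kl_coef": 0.02}),
        ("Code/SWE RL",     {"lr": 5e-7, "kl_coef": 0.01}),
    ]

    def train_stage(model, name, lr, kl_coef):
        # Placeholder: a real stage would run policy optimization here.
        return model

    def run_cascade(model, stages=STAGES):
        """Apply each RL stage in sequence, recording the order of stages run."""
        history = []
        for name, hparams in stages:
            model = train_stage(model, name, **hparams)
            history.append(name)
        return model, history

    model, history = run_cascade(model="sft-checkpoint")
    print(history)
    ```
    
    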

    https://research.nvidia.com/labs/nemotron/files/Nemotron-Cascade-2.pdf

    3. Multi-Domain On-Policy Distillation (MOPD)

    A critical innovation in Nemotron-Cascade 2 is the integration of MOPD during the Cascade RL process. MOPD assembly uses the best-performing intermediate ‘teacher’ models—already derived from the same SFT initialization—to provide a dense token-level distillation advantage. This advantage is defined mathematically as:

    $$a_{t}^{\mathrm{MOPD}} = \log \pi^{\mathrm{domain}_t}(y_{t} \mid s_{t}) - \log \pi^{\mathrm{train}}(y_{t} \mid s_{t})$$
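    In code, the per-token advantage is simply the difference between the teacher's and the student's log-probability of each sampled token. The probabilities below are made up for illustration; a real implementation would read them from the two models' output distributions.

    ```python
    import math

    def mopd_advantage(p_teacher, p_train):
        """Token-level MOPD advantage: log pi_domain(y|s) - log pi_train(y|s)."""
        return math.log(p_teacher) - math.log(p_train)

    # Hypothetical per-token probabilities for a short sampled sequence.
    teacher_probs = [0.60, 0.40, 0.90]
    student_probs = [0.30, 0.40, 0.45]
    advantages = [mopd_advantage(t, s) for t, s in zip(teacher_probs, student_probs)]
    print([round(a, 3) for a in advantages])  # [0.693, 0.0, 0.693]
    ```

    Where the teacher is more confident than the student, the advantage is positive, pushing the student toward the teacher on exactly those tokens.
    
    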

    The research team found that MOPD is substantially more sample-efficient than sequence-level reward algorithms like Group Relative Policy Optimization (GRPO). For instance, on AIME25, MOPD reached teacher-level performance (92.0) within 30 steps, while GRPO reached only 91.0 after the same number of steps.

    Inference Features and Agentic Interaction

    Nemotron-Cascade 2 supports two primary operating modes through its chat template:

    • Thinking Mode: Initiated by a single <think> token, followed by a newline. This activates deep reasoning for complex math and code tasks.
    • Non-Thinking Mode: Activated by prepending an empty <think></think> block for more efficient, direct responses.
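    The two modes above can be toggled at prompt-construction time. The `<think>` token placement follows the text; the surrounding role markers in this sketch are illustrative, not the model's actual chat template.

    ```python
    # Sketch of toggling thinking vs. non-thinking mode, per the text above.
    # The "User:"/"Assistant:" framing is an assumption; only the <think>
    # token placement comes from the article.
    def build_prompt(user_msg, thinking=True):
        if thinking:
            reasoning_prefix = "<think>\n"        # single <think> token plus newline
        else:
            reasoning_prefix = "<think></think>"  # empty block for direct responses
        return f"User: {user_msg}\nAssistant: {reasoning_prefix}"

    print(build_prompt("Prove that sqrt(2) is irrational.", thinking=True))
    print(build_prompt("What is 2 + 2?", thinking=False))
    ```
    
    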

    For agentic tasks, the model utilizes a structured tool-calling protocol within the system prompt. Available tools are listed within <tools> tags, and the model is instructed to perform tool calls wrapped in <tool_call> tags to ensure verifiable execution feedback.
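    On the receiving end, an agent harness has to pull the `<tool_call>` blocks out of the model's output before executing anything. The tag names follow the text; the JSON payload shape and the `web_search` tool are assumptions for illustration.

    ```python
    import json
    import re

    # Illustrative parser for the <tool_call> protocol described above.
    # Tag names come from the article; the payload schema is assumed.
    TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

    def extract_tool_calls(model_output):
        """Return the parsed JSON bodies of all <tool_call> blocks."""
        return [json.loads(m) for m in TOOL_CALL_RE.findall(model_output)]

    output = (
        "I will look this up.\n"
        '<tool_call>{"name": "web_search", '
        '"arguments": {"query": "IMO 2025 results"}}</tool_call>'
    )
    calls = extract_tool_calls(output)
    print(calls[0]["name"])  # web_search
    ```

    The harness would execute each parsed call and feed the result back, giving the model the verifiable execution feedback the protocol is designed for.
    
    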

    By focusing on ‘intelligence density,’ Nemotron-Cascade 2 demonstrates that specialized reasoning capabilities once thought to be the exclusive domain of frontier-scale models are achievable at a 30B scale through domain-specific reinforcement learning.

    Check out the Paper and the Model on Hugging Face.


