Close Menu
Techora News HubTechora News Hub
    Facebook X (Twitter) Instagram
    Techora News HubTechora News Hub
    • Home
    • Crypto News
      • Bitcoin
      • Ethereum
      • Altcoins
      • Blockchain
      • DeFi
    • AI News
    • Stock News
    • Learn
      • AI for Beginners
      • AI Tips
      • Make Money with AI
    • Reviews
    • Tools
      • Best AI Tools
      • Crypto Market Cap List
      • Stock Market Overview
      • Market Heatmap
    • Contact
    Techora News HubTechora News Hub
    Home»AI News»MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding
    AI News

    MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding

    June 2, 2026
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr WhatsApp Email
    MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding
    Share
    Facebook Twitter LinkedIn Pinterest Telegram Email
    binance


    MiniMax officially released MiniMax M3 on June 1, 2026. The model introduces MSA (MiniMax Sparse Attention), a new sparse attention architecture that gives M3 a 1M-token context window. M3 also supports image and video input and desktop computer operation natively. The API is live now.

    MiniMax M3 is available today via MiniMax Code, the MiniMax Token Plan, and the MiniMax API. It is the next model in the M-series line after M2.7. MiniMax positions M3 as an open-weight model combining frontier-level coding performance, a 1M-token context window, and native multimodal input in a single architecture — the first to do so, per MiniMax. The corresponding model weights and technical report are scheduled for release within 10 days of launch.

    MSA: MiniMax Sparse Attention

    The central architectural change in MiniMax M3 is MSA (MiniMax Sparse Attention). Standard full attention has quadratic computational complexity: as context length grows, compute cost grows as the square of the sequence length. MSA is designed to address this.

    Sparse attention mechanisms generally add a pre-filtering stage before computing attention, avoiding full quadratic cost. MiniMax team states that compared to approaches like DSA and MoBA, MSA partitions the KV cache into blocks more precisely, achieving higher effective context coverage.

    changelly

    At the operator level, MSA uses a “KV outer gather Q” approach. KV blocks serve as the outer loop to aggregate the queries that hit them. Each block is read only once and memory access is contiguous. MiniMax team reports this is more than 4× faster than open-source implementations such as Flash-Sparse-Attention and flash-moba under MiniMax M3’s head configuration.

    The result: at a context length of 1 million tokens, MiniMax M3’s per-token compute is 1/20th that of the previous-generation M2 models. MiniMax team reports a speedup of more than 9× in the prefill stage and more than 15× in the decoding stage at 1M-token context. Across multiple ablation studies, MSA matched full attention on the majority of capabilities.

    Coding and Agentic Benchmarks

    Coding and agentic capabilities are key areas of improvement for M3. The benchmark results below are reported by MiniMax team. Several evaluations were run on MiniMax internal infrastructure, while some comparison scores were taken from official leaderboards or external benchmark sources, as noted in MiniMax’s methodology. SWE-Bench Verified was tested on internal infrastructure using Claude Code scaffolding and averaged over 4 runs. SWE-Bench Pro was also tested on internal infrastructure using Claude Code scaffolding, with testing logic aligned to the official evaluation.

    • SWE-Bench Pro: 59.0% (surpasses GPT-5.5 and Gemini 3.1 Pro; approaches Opus 4.7)
    • Terminal-Bench 2.1: 66.0%
    • SWE-fficiency: 34.8%
    • KernelBench Hard: 28.8% (evaluated on NVIDIA Blackwell GPUs, CUDA capability sm_120)
    • MCP Atlas: 74.2%
    • Claw-Eval: highest score among models evaluated (General Task Group, 161 tasks)
    • SVG-Bench: surpasses Opus 4.7

    On OmniDocBench, a multimodal document understanding benchmark, M3 scores above Gemini 3.1 Pro. On OSWorld-Verified (361 samples), M3 achieves a 70.06% task completion rate for computer use (Max Steps = 200).

    MiniMax also built an interactive user simulator framework for training and evaluation. It simulates multi-turn developer collaboration: requirement elaboration, solution discussion, feedback-based correction, continuous task switching, and multi-round project iteration. This is intended to reduce the gap between single-turn benchmark performance and real-world, multi-turn developer workflows.

    Native Multimodality

    MiniMax M3 underwent mixed-modality training from step 0. Text, images, and video are trained together from the beginning rather than added post-training. MiniMax team reports that interleaved data — sequences where text and images are naturally intermixed — is more critical to model performance than commonly assumed. After rebuilding the entire data pipeline for interleaved formats, training data was scaled to the order of 100 trillion tokens.

    MiniMax M3 supports image and video input and can operate a desktop computer.

    Real-World Task Examples from MiniMax

    MiniMax documents three internal tasks in the release post:

    Paper reproduction: MiniMax gave MiniMax M3 the ICLR 2025 Outstanding Paper Award-winning paper Learning Dynamics of LLM Finetuning and asked it to reproduce the experiments independently. M3 ran autonomously for nearly 12 hours, produced 18 commits and 23 experimental figures, and completed the core experiments without human intervention. It required multimodal capability to read curves and formulas, long context to hold the paper and experiment logs simultaneously, and coding capability to execute the reproduction across a long thread.

    CUDA kernel optimization: MiniMax asked MiniMax M3 to optimize an FP8 matrix multiplication (GEMM) kernel on NVIDIA Hopper architecture GPUs. The model started with only a task description, a benchmark evaluation script, and a non-functional Triton skeleton — no reference implementation was provided. Over approximately 24 hours, MiniMax M3 made 147 benchmark submissions and 1,959 tool calls. It progressed through baseline implementation, autotune configuration generation, performance bottleneck diagnosis, CUDA Graph integration, persistent kernel rewriting, and host-side scheduling optimization. After six landmark rounds of optimization, MiniMax M3 improved Hopper FP8 hardware peak utilization from 7.6% to 71.3%, a 9.4× speedup. The best solution appeared on the 145th submission. MiniMax notes that most other models stopped making new progress within the first 30 submissions; only Opus 4.7 and M3 continued beyond that point.

    PostTrainBench (autonomous model training): MiniMax gave MiniMax M3 four base models that had completed pretraining only. MiniMax M3 autonomously ran the full data synthesis → training → evaluation → iteration cycle over 12 hours with no human intervention. The target was for the base models to acquire capabilities across mathematical reasoning (AIME2025), tool calling (BFCL), scientific knowledge reasoning (GPQA Main), arithmetic reasoning (GSM8K), and code generation (HumanEval). MiniMax M3 scored 0.37, below Opus 4.7 (0.42) and GPT-5.5 (0.39), but ahead of the other models tested.

    Marktechpost’s Visual Explainer

    Overview

    MiniMax M3: Frontier Coding, 1M-Token Context, Native Multimodality

    MiniMax officially released M3 on June 1, 2026. The API is live now. Model weights and technical report will be open-sourced within 10 days.

    M3 is the next model in the M-series line after M2.7. MiniMax positions it as the first open-weight model to combine all three of the following in a single architecture:

    1M
    Token Context Window

    59.0%
    SWE-Bench Pro Score

    MSA
    Sparse Attention Architecture

    70.06%
    OSWorld-Verified (Computer Use)

    Architecture

    MSA: MiniMax Sparse Attention

    Standard full attention has quadratic computational complexity. As context length grows, compute cost grows as the square of the sequence length. MSA is designed to solve this at the operator level.

    Compared to approaches like DSA and MoBA, MSA partitions the KV cache into blocks more precisely, achieving higher effective context coverage.

    MSA uses a “KV outer gather Q” approach — each KV block is read only once, memory access is contiguous, and arithmetic intensity is significantly better than common methods.

    >9×
    Prefill Speedup at 1M ctx

    >15×
    Decoding Speedup at 1M ctx

    1/20
    Per-token compute vs M2 at 1M

    >4×
    Faster than Flash-Sparse-Attn

    Benchmarks

    Coding and Agentic Performance

    Results reported by MiniMax. SWE-Bench Verified used Claude Code scaffolding, averaged over 4 runs. SWE-Bench Pro used Claude Code scaffolding, aligned to official evaluation.

    • SWE-Bench Pro: 59.0% — surpasses GPT-5.5 and Gemini 3.1 Pro; approaches Opus 4.7
    • Terminal-Bench 2.1: 66.0%
    • SWE-fficiency: 34.8%
    • KernelBench Hard: 28.8% — evaluated on NVIDIA Blackwell GPUs (sm_120)
    • MCP Atlas: 74.2%
    • Claw-Eval: Highest score among models evaluated (161 tasks)
    • SVG-Bench: Surpasses Opus 4.7
    • OmniDocBench: Above Gemini 3.1 Pro
    • OSWorld-Verified: 70.06% — 361 samples, Max Steps = 200
    Multimodality

    Native Multimodal Training from Step 0

    M3 underwent mixed-modality training from step 0. Text, images, and video are trained together from the start — not added as a post-training capability.

    MiniMax reports that interleaved data — sequences where text and images are naturally intermixed — is more critical to model performance than commonly assumed.

    After rebuilding the entire data pipeline for interleaved formats, training data was scaled to the order of 100 trillion tokens.

    • Image input
    • Video input
    • Desktop computer operation (computer use)
    Real-World Tasks

    Three Internal Tasks Documented by MiniMax

    • Paper Reproduction — M3 reproduced the ICLR 2025 paper Learning Dynamics of LLM Finetuning autonomously over ~12 hours, producing 18 commits and 23 experimental figures with no human intervention.
    • CUDA Kernel Optimization — M3 optimized an FP8 GEMM kernel on NVIDIA Hopper GPUs over ~24 hours: 147 benchmark submissions, 1,959 tool calls, 6 landmark optimization rounds. Improved Hopper FP8 peak utilization from 7.6% → 71.3% (9.4× speedup). Best solution appeared on submission 145.
    • PostTrainBench — M3 autonomously ran data synthesis → training → evaluation → iteration for 4 base models over 12 hours. Scored 0.37, below Opus 4.7 (0.42) and GPT-5.5 (0.39), but ahead of other evaluated models. Targets: AIME2025, BFCL, GPQA Main, GSM8K, HumanEval.
    MiniMax Code

    MiniMax Code: Agent Product Built and Trained with M3

    MiniMax Code is an agent product built and trained together with M3. Available at agent.minimaxi.com/download. Works with MiniMax Token Plans.

    • Agent Teams — multiple agents run concurrent, multi-stage, dynamically adjustable workflows
    • Producer + Verifier loop — adversarial harness enables continuous self-correction during execution
    • Computer use — M3’s native multimodal capability enables cross-application desktop automation
    • Built on OpenCode and Pi — MiniMax states it plans to open-source MiniMax Code in the future

    // Example use case
    User (on phone): “Open the local ERP client
    and batch-enter invoice data from this Excel file.”
    → MiniMax Code handles operations across
    applications, files, and systems on desktop.

    API & Pricing

    API Details and Token Plan Tiers

    The M3 API is live at platform.minimax.io.

    Pricing by input length: Calls ≤512K tokens → standard rate. Calls >512K → higher long-context rate.

    Thinking mode: Toggle on/off at request time. Both modes share the same pricing.

    Service tiers: standard (default) and priority (service_tier=priority) — priority available via sales, opening to all users soon.

    Plus
    ~1.7B tokens/mo
    $20/mo

    Max
    ~5.1B tokens/mo
    $50/mo

    Ultra
    ~9.8B tokens/mo
    $120/mo

    Text, image, speech, and music usage all draw from the same token pool.

    Key Takeaways

    What Engineers and Researchers Need to Know

    • MiniMax M3 launched June 1, 2026. API is live. Open model weights and technical report committed within 10 days.
    • MSA delivers >9× prefill and >15× decoding speedup at 1M-token context vs M2, at 1/20th the per-token compute.
    • M3 scores 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro.
    • Natively multimodal from step 0 — supports image, video input, and 70.06% on OSWorld-Verified for computer use.
    • Thinking mode toggleable at request time. Token Plan starts at $20/month (~1.7B M3 tokens).

    Key Takeaways

    • MiniMax M3 launched June 1, 2026; API is live now. MiniMax has committed to releasing open model weights and a technical report within 10 days.
    • MSA (MiniMax Sparse Attention) delivers more than 9× prefill and more than 15× decoding speedup at 1M-token context versus M2, at 1/20th the per-token compute.
    • M3 scores 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro.
    • M3 is natively multimodal from step 0, supporting image and video input, and achieves 70.06% on OSWorld-Verified for computer use.

    Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities

    – Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas- MiniMax Sparse Attention scales context to 1M-… pic.twitter.com/TF891iJukF

    — MiniMax (official) (@MiniMax_AI) June 1, 2026

    Check out the Technical details. Also, feel free to follow us on Twitter and don’t forget to join our 150k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.

    Need to partner with us for promoting your GitHub Repo OR Hugging Face Page OR Product Release OR Webinar etc.? Connect with us





    Source link

    Customgpt
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email

    Related Posts

    Claude Mythos exposed a hard truth: Your enterprise patching process is way too slow

    June 1, 2026

    OpenAI governance frameworks secure enterprise AI deployments

    May 31, 2026

    NVIDIA Introduces X-Token: Projection-Guided Cross-Tokenizer KD That Outperforms GOLD by +3.82 Average Points on Llama-3.2-1B

    May 30, 2026

    Media Advisory: MIT to establish regional quantum hub | MIT News

    May 29, 2026

    MiniMax teases upcoming M3 model with new sparse attention mechanism and 15.6X long-context response speed boost

    May 27, 2026

    Autonomous AI systems test governance in physical environments

    May 26, 2026
    binance
    Latest Posts

    MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding

    June 2, 2026

    You’re Not Behind (Yet): Learn AI Agents in 13 Minutes

    June 2, 2026

    Hyperliquid’s HYPE Breakout Puts $100 Price Target in Play

    June 1, 2026

    Sui Addresses Three Network Outages With Major Upgrade

    June 1, 2026

    Will it Push Ether’s Price Lower?

    June 1, 2026
    kraken
    LEGAL INFORMATION
    • Privacy Policy
    • Terms Of Service
    • Social Media Disclaimer
    • DMCA Compliance
    • Anti-Spam Policy
    Top Insights

    Strive Doubles Down on Bitcoin With $185M Buy, Holdings Near 19,000 BTC

    June 2, 2026

    EdgeX Blames Outsider for EDGE Token Crash as ZachXBT Alleges Insider Manipulation

    June 2, 2026
    frase
    Facebook X (Twitter) Instagram Pinterest
    © 2026 TechoraNewsHub.com - All rights reserved.

    Type above and press Enter to search. Press Esc to cancel.

    bitcoin
    Bitcoin (BTC) $ 67,705.00
    ethereum
    Ethereum (ETH) $ 1,925.95
    tether
    Tether (USDT) $ 0.998389
    bnb
    BNB (BNB) $ 668.42
    xrp
    XRP (XRP) $ 1.24
    usd-coin
    USDC (USDC) $ 0.999644
    solana
    Solana (SOL) $ 77.07
    tron
    TRON (TRX) $ 0.337161
    figure-heloc
    Figure Heloc (FIGR_HELOC) $ 1.03
    staked-ether
    Lido Staked Ether (STETH) $ 2,265.05