Mistral AI Goes Big: Remote Coding Agents, 128B Model, and the Promise of Async AI
Last week, Mistral AI released something quietly significant — Mistral Medium 3.5, their first “flagship merged model”, paired with a feature that changes how coding agents actually feel to use.
For once, a new model release isn’t just about benchmark numbers. It’s about a shift in how we interact with AI-powered development.
The Model: 128B Dense, Self-Hosted on Four GPUs
Mistral Medium 3.5 is a 128-billion-parameter dense model with a 256K context window, trained from scratch with a custom vision encoder (not a reused CLIP, which is notable). It handles instruction-following, reasoning, and coding in a single weight set — what Mistral calls their first “merged” flagship model.
The numbers: 77.6% on SWE-Bench Verified, ahead of Mistral’s own Devstral 2 and Qwen3.5 397B A17B. It also scores 91.4 on τ³-Telecom. Released on Hugging Face under a modified MIT license, and available for self-hosting on as few as four GPUs.
API pricing sits at $1.50 per million input tokens and $7.50 per million output tokens.
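To get a feel for what those rates mean in practice, here's a rough cost estimator. The per-million-token prices come from the figures above; the example token counts are made up for illustration:

```python
# Rough API cost estimator using the published per-million-token rates.
INPUT_PRICE_PER_M = 1.50   # USD per million input tokens
OUTPUT_PRICE_PER_M = 7.50  # USD per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 50K-token prompt (a decent slice of a codebase)
# with a 4K-token response costs about 10.5 cents.
print(f"${estimate_cost(50_000, 4_000):.4f}")  # → $0.1050
```

At these rates, even context-heavy agentic sessions stay in the cents-per-request range; it's the long-running, many-turn sessions where output tokens start to dominate the bill.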
The Feature That Matters: Remote Agents in Vibe
Here’s the part that’s genuinely interesting. Until now, Mistral Vibe (their coding agent, accessible via the CLI) ran locally on your laptop. You kicked it off, watched your terminal, and babysat every step. You were the bottleneck.
Mistral has moved Vibe sessions to the cloud.
Sessions now run asynchronously: you start a coding task, walk away, and it keeps going. You can spawn multiple agents in parallel, inspect diffs and tool calls in real time, and the agent opens a pull request when done. You review the PR instead of watching every keystroke.
Local CLI sessions can even be “teleported” to the cloud when you want to leave them running — with session history, task state, and approvals all carrying across.
Integration-wise, Vibe connects to GitHub (code and PRs), Linear and Jira (issues), Sentry (incidents), and Slack or Teams (reporting).
Source: Mistral AI official announcement
Le Chat’s Work Mode: Beyond Coding
Mistral Medium 3.5 also powers a new Work mode in Le Chat — an agentic mode for multi-step tasks beyond coding. The agent becomes the execution backend for the assistant, calling tools in parallel and working through projects until they’re complete.
Example workflows:

1. Cross-tool workflows: catching up across email, messages, and calendar.
2. Research and synthesis: diving across the web, internal docs, and connected tools.
3. Inbox triage: drafting replies, creating Jira issues from team discussions, sending Slack summaries.
Sessions persist longer than a typical chat response. Every action is visible, with tool calls and reasoning rationale surfaced. Explicit approval is required for sensitive tasks.
Source: The New Stack – Mistral pushes coding agents to the cloud
Why This Matters
The agentic coding race is heating up. Cursor, GitHub Copilot, Amazon Q, Claude Code — they’re all pushing in this direction. But Mistral’s approach is distinct because:
1. The model is open weights. If you run your own infrastructure, you can run this yourself. No vendor lock-in on the base model.
2. The 256K context window is huge. That’s roughly 200,000 words in a single pass, which lets the model take in large codebases that would overflow smaller context windows.
3. Configurable reasoning effort. The same model can dial compute up for a complex multi-step task or down for a quick lookup, chosen per API call. No model switching required.
4. The “teleport” feature is practical. If you’ve ever had a coding session running locally and then needed to walk away from your machine, this actually solves a real workflow problem.
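Point 3 above is easiest to see in code. The sketch below builds two request payloads that differ only in the effort level; note that the field name `reasoning_effort` and the model ID `mistral-medium-3.5` are assumptions for illustration, not confirmed by the announcement — check Mistral’s API docs for the real names.

```python
# Sketch of per-call reasoning control. The "reasoning_effort" field
# and the model ID are hypothetical placeholders, not the confirmed
# Mistral API surface.

def build_request(prompt: str, effort: str) -> dict:
    """Build a chat-completion payload with a per-call effort level."""
    return {
        "model": "mistral-medium-3.5",   # hypothetical model ID
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,      # e.g. "low" or "high"
    }

# Same model, different compute budgets, chosen per call:
quick = build_request("What does this regex match?", effort="low")
deep = build_request("Refactor the auth module across services.", effort="high")
```

The design point is that effort lives in the request, not in the model choice, so a single deployment serves both quick lookups and long agentic runs.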
The Bigger Picture
Mistral, based in Paris, continues to position itself as Europe’s answer to the US AI labs. They’re not just training better models; they’re building complete agent infrastructure. Medium 3.5 wasn’t just a model release; it was a platform update.
The fact that it scores 77.6% on SWE-Bench Verified with “only” 128B parameters (versus Qwen3.5’s 397B) also suggests Mistral is making progress on parameter efficiency — bigger wins per parameter, which is critical for anyone running these models on their own hardware.
As agentic development tools move from “interesting demo” to “daily workflow”, the question becomes less about which model has the highest benchmark score and more about which one integrates cleanly into your existing toolchain. Mistral’s Medium 3.5, with its open weights, configurable reasoning, and cloud agents, is making that path clearer.