In just six weeks, an AI model Anthropic calls Mythos has, according to data published by the company itself, uncovered thousands of zero-days in critical software, produced functional exploits against Firefox at a rate no commercial model before it had reached, and rediscovered a 27-year-old vulnerability buried in OpenBSD, alongside an FFmpeg flaw that five million automated tests had failed to catch.

For now, only around fifty companies in the world have operational access to it.

For any company that ships software, or that depends on a digital stack to run (almost all of them), this represents a structural shift in how corporate security has to be thought through. And the way Anthropic and the wider industry are managing the transition leaves most firms in a hard-to-defend position, because attackers will have equivalent models within months while defenders, as of today, don't even have access.

What Mythos has done in six weeks

The numbers Anthropic has released are the most alarming the company has ever shared about one of its own models.

On SWE-bench Verified, Mythos scores 93.9%, nearly 13 points above its immediate predecessor. That benchmark matters because it measures the model's ability to resolve real software engineering issues — the sort of work that separates a junior from a senior — and a 13-point jump on an already saturated metric amounts to a category shift in its own right.

On real-world vulnerabilities the gap becomes brutal. Against Firefox, Mythos produced 181 functional exploits, compared with the 2 that Opus 4.6 managed in the same exercise. On OSS-Fuzz benchmarks it generated 595 tier 1-2 crashes and 10 tier 5 (full control-flow hijack), while Opus 4.6 reached only a single tier 3.

The most uncomfortable part of the experiment: engineers with zero security background received complete, working remote code execution exploits overnight, without writing a line of code themselves, simply by asking the model.

Anthropic has also reported thousands of high-severity zero-days, many of them one or two decades old, sitting in code that had been publicly audited for years. Among them sits CVE-2026-4747, which lets an unauthenticated attacker take complete control of a server from anywhere on the internet.

This episode isn't a bolt from the blue. Mythos's commercial predecessor, Claude Opus 4.6, publicly available since February 2026, has identified more than 500 validated high-severity vulnerabilities in open-source software while operating in an environment with analysis tools. And back in 2024, Google DeepMind together with Project Zero showed with the Big Sleep project that an AI could find a zero-day in SQLite before the official release. Mythos marks the next order-of-magnitude jump in a trajectory that's been documented for at least eighteen months.

"We only have something like six months before the open-weight models catch up."
— Alex Stamos, former CSO at Facebook, quoted by Platformer

That six-month window, cited by Stamos and aligned with Anthropic's own estimates, is the real timeline the next several quarters will run on. After that, criminal actors will be able to weaponize bugs at marginal cost.

The asymmetry nobody wants to look at

Anthropic has chosen not to release Mythos to the general public and has instead launched Project Glasswing, a programme with roughly fifty partners (AWS, Apple, Google, Microsoft, Nvidia, Cisco, CrowdStrike, Broadcom, Palo Alto Networks, JPMorgan Chase and the Linux Foundation, among others), $100M in usage credits and $4M donated to open-source security. The stated logic is coherent on paper, even if in practical terms it falls short of what the wider market needs. The partner list in fact reinforces the point: hyperscalers and large banks already have privileged access, while the rest of the corporate fabric does not.

For a mid-market company with fifty, five hundred or five thousand employees, one that ships software or depends on a digital stack to operate, the relevant question is no longer whether this shift reaches them but which specific vector will hit first. The answer the market is offering today amounts to waiting: waiting for Glasswing partners to patch their code, for a version with sufficient guardrails, for pentesting vendors to incorporate the capability, for SIEM platforms to learn the new patterns, and for every software vendor to ship the relevant fixes. Attackers, meanwhile, aren't waiting for anyone.

On the attacker side, the criminal community has been experimenting since 2023 with unfiltered LLM variants (WormGPT, FraudGPT and successors, publicly documented by firms such as SlashNext and Trend Micro); hostile intelligence services run offensive labs on state-level budgets; and, per statements from Anthropic's own offensive research team, within 6 to 12 months there will be comparable open-weight alternatives without filters, most likely originating in China.

The resulting asymmetry is uncomfortable to look at: attackers are investing systematically in offensive capability while defenders keep waiting for someone to give them permission to match it.

Why "limit access" is the wrong answer

The default regulatory instinct is to restrict — less access, less risk — and in almost every security domain that instinct works reasonably well. In this particular case it fails for structural reasons.

The offensive capabilities of Mythos follow directly from scaling frontier models past a certain compute threshold, so any lab with the resources to get there will eventually land at an equivalent place. The only meaningful question is who uses that capability against your infrastructure first.

When an offensive capability becomes inevitable, the only sensible defensive strategy is to put the same capability in defenders' hands before the attacker has it operational, and to do so broadly rather than gating it to fifty handpicked companies.

That translates into three concrete implications for any company that ships or meaningfully consumes software:

First. The reasonable bar from this point forward is that software reaching production has been through an automated audit with a model of this class, without waiting for a regulator to force it or for the incident to have already happened. If your vendor doesn't do that, in practice your attack surface is being defined by them, even though the insurance premium is paid by you.

Second. Companies that own code (far more of them than recognise themselves as such, because that category includes any team with custom integrations, scripts, serious low-code, internal APIs or workflow glue) need a defensive equivalent. This isn't about massively expanding the security team; it's about giving the technical profiles you already have (in-house or via your usual pentesting vendor) access to models comparable to what will be used against them.

Third. For CISOs who've been in "we're evaluating AI" mode for a while, this is the moment to close that phase and move into operationalisation. Pure evaluation is starting to fall short against the pace of the offensive side.

What the bad actors already have

There's a comfortable fantasy, widespread in boardrooms, that sophisticated attacks come only from nation-states and are the government's problem. That fantasy costs real money every quarter.

The crimeware-as-a-service market has been offering attack infrastructure on subscription for years, so what Mythos reshapes is the capability ceiling available to whoever pays a relatively modest subscription in a closed forum.

A ransomware affiliate group with no real in-house technical sophistication, backed by an open model with advanced pentesting capability, can now automate in hours what used to require weeks of an experienced operator: identify the right CVE, validate exploitability in a cloned environment, build the payload, adapt the C2, move laterally across the network, exfiltrate data and encrypt. An attack cycle that used to stretch across a month now fits inside an afternoon.

It's worth asking whether your security architecture is designed to withstand an attacker moving at that speed. Across the mid-market the honest answer is almost invariably no, and the root of the problem lies mostly in how the security budget has been distributed. The spend of the last five years has gone mainly to more SIEM, more EDR, more cyber insurance and more anti-phishing training: all of it designed to slow down a human attacker with limited time and manual friction, and almost none of it built to contain an automated attacker capable of finding and exploiting inside a single session.

And while the attackers gear up, the attack surface itself is expanding from the inside. The Veracode GenAI Code Security Report 2025, which ran more than 100 LLMs through 80 code-completion tasks across four languages, found that 45% of the generated samples contained OWASP Top 10 vulnerabilities, and that AI-generated code carries on average 2.74x more vulnerabilities than human-written code. In parallel, Stack Overflow's 2025 Developer Survey (49,000+ respondents across 177 countries) puts at 84% the share of professional developers who already use or plan to use AI tools in their workflow, and SonarSource's 2026 State of Code report estimates around 40% of new code is AI-assisted or AI-generated. So the Mythos problem stacks on top of a baseline one: a growing share of the software you're shipping enters production with more security defects than two years ago, at a rhythm the traditional review cycle no longer absorbs.

Six imperatives for CISOs and CEOs

If you run a company that ships software or relies on it, and in 2026 that category leaves almost no one out, these are the moves to make in the next 90 days rather than in the next strategic plan.

1. Make offensive-AI auditing a procurement criterion in your vendor conversations. On every renewal, every new SaaS onboarding and every open-source component you pull in, add to the list of questions which offensive models have passed over that code before it reached you, and how often. Many vendors today will say they're only starting to explore it; the point is to open that conversation now and, all else being equal, start preferring vendors who do have a concrete answer. The CISO's responsibility plays out as much at the firewall as at the procurement table.

2. Integrate pentesting models into your SDLC now, without waiting for Mythos. There's an entire catalogue of LLM-powered tools available commercially today: Snyk DeepCode AI, Cycode, Semgrep AI, GitHub Advanced Security with Copilot Autofix, CodeAnt and Aikido, among others. They plug into standard pipelines and surface serious vulnerabilities across both human-written and AI-generated code. Your development cycle should include at least one automated pass on every meaningful pull request. License cost is marginal compared with the expected cost of an incident, and the findings are real from day one. If your technical team still sees the integration as unworkable, it's worth asking them for an up-to-date comparison: the tooling has jumped noticeably over the last six months.
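The "automated pass on every meaningful pull request" above can be wired into any standard CI pipeline as a merge gate. The sketch below is a minimal, hypothetical version: it assumes the scanner (Semgrep, CodeQL or similar) has already written a SARIF report, which is the OASIS-standard output format most of these tools can emit, and it treats SARIF `error`-level results as merge-blocking. The threshold policy and file handling are illustrative assumptions, not any vendor's defaults.

```python
import json
import sys

# SARIF severity levels treated as merge-blocking in this sketch.
# Real policies would usually key on rule IDs or CWE tags as well.
BLOCKING_LEVELS = {"error"}

def blocking_findings(sarif: dict) -> list[str]:
    """Return one human-readable line per scanner finding at a blocking level.

    Only the SARIF fields used here (`runs`, `results`, `level`,
    `ruleId`, `message.text`) are assumed to be present.
    """
    findings = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            if result.get("level", "warning") in BLOCKING_LEVELS:
                rule = result.get("ruleId", "unknown-rule")
                text = result.get("message", {}).get("text", "")
                findings.append(f"{rule}: {text}")
    return findings

def gate(report_path: str) -> int:
    """CI exit code: 1 blocks the merge, 0 lets it through."""
    with open(report_path) as f:
        hits = blocking_findings(json.load(f))
    for line in hits:
        print(line, file=sys.stderr)
    return 1 if hits else 0
```

The design choice worth copying is the shape, not the details: the scanner runs on every PR, and a non-zero exit code from the gate is what makes the audit mandatory rather than advisory.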

3. Compress your patch cycle as far as the architecture allows. If your production patching process takes two weeks, your exposure window against an automated attacker is two weeks. Redesign the pipeline so that a critical-severity patch can ideally ship the same day, and in any case well inside a sprint. AI only shortens the countdown here; the underlying issue is plain operational discipline.
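The exposure window described above is measurable, and measuring it is the precondition for compressing it. A minimal sketch, assuming you log a disclosure timestamp and a patch timestamp per vulnerability; the severity labels and the 24-hour critical SLA are illustrative assumptions mirroring the "same day / inside a sprint" targets in the text, not an industry standard.

```python
from datetime import datetime, timedelta
from statistics import median

# Illustrative SLAs from the text: critical same day, high within a sprint.
SLA = {"critical": timedelta(hours=24), "high": timedelta(days=14)}

def exposure_windows(events: list[dict]) -> dict[str, dict]:
    """Per-severity exposure metrics.

    events: dicts with 'severity' (str) and 'disclosed'/'patched'
    (datetime) keys. Returns, for each severity with data, the median
    disclosed-to-patched window and how many events breached the SLA.
    """
    out = {}
    for sev, sla in SLA.items():
        windows = [e["patched"] - e["disclosed"]
                   for e in events if e["severity"] == sev]
        if not windows:
            continue
        out[sev] = {
            "median": median(windows),
            "sla_breaches": sum(w > sla for w in windows),
        }
    return out
```

Tracked monthly, the breach count is the honest answer to "how long are we exposed": against an automated attacker, every breach is a window someone else can use.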

4. Red-team with models, not only with humans. Annual audits from an external consultant are still necessary, although they are no longer sufficient on their own. Complement them with continuous exercises against models equivalent to what attackers use, and if your pentesting vendor doesn't offer that capability in 2026, it's reasonable to consider rotating providers.

5. Take this to the board this quarter. Frame it as an investment decision with a dedicated budget line and an assigned owner. The asymmetry is the thesis: attackers have offensive models, you don't; the cost of matching capabilities is bounded and measurable, while the potential cost of a serious incident (regulatory reporting, reputational damage, downtime, ransom) is typically several orders of magnitude higher, as documented year after year by the IBM Cost of a Data Breach Report. Treating it as a bullet labelled "emerging AI risk" in the quarterly report effectively means not treating it at all.

6. Prepare the organization for a world with much shorter vulnerability windows. It may sound counterintuitive, although it is the direction the trajectory points to. As frontier models systematically surface more and more vulnerabilities, security will shift from "discover what nobody knows" to "patch fast what everybody knows," and competitive advantage will live in response speed rather than in opacity.

The 6 to 12 month window

Anthropic estimates that open-weight models with offensive capabilities comparable to Mythos will become available within 6 to 12 months. Given the pace at which open-weight models have advanced since 2023, that estimate may well fall short.

Call that your preparation window.

Inside that window, your company has to close three fronts: audit all critical software with capabilities equivalent to the current frontier, redesign the patching process to operate in hours, and deploy monitoring able to detect automated attack patterns rather than only known signatures.
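One concrete way to read "automated attack patterns rather than only known signatures" is tempo: a model-driven attacker acts at machine speed, and sustained sub-human inter-action gaps are a behavioural signal no signature database captures. The sketch below illustrates that idea only; the 500 ms floor and the run length of five are illustrative assumptions, and a real deployment would baseline per user and per workload rather than use fixed constants.

```python
# Sustained gaps below this are hard for a human operator to produce.
HUMAN_FLOOR_SECONDS = 0.5
SUSTAINED_RUN = 5  # consecutive fast actions needed to raise a flag

def looks_automated(event_times: list[float]) -> bool:
    """Flag a session whose action tempo is machine-like.

    event_times: sorted epoch-second timestamps of one session's
    actions. Returns True when SUSTAINED_RUN consecutive inter-action
    gaps fall below the human floor; isolated fast gaps (double
    clicks, retries) reset the run and do not trigger.
    """
    run = 0
    for prev, cur in zip(event_times, event_times[1:]):
        if cur - prev < HUMAN_FLOOR_SECONDS:
            run += 1
            if run >= SUSTAINED_RUN:
                return True
        else:
            run = 0
    return False
```

The point of the sketch is the category shift: the detection feature is how the attacker behaves in time, not what payload they carry, which is exactly what signature-only stacks miss.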

If you reach the end of that window without those three pieces in place, your company is defenseless regardless of where any AI-maturity benchmark positions it. "Defenseless" isn't a word that usually shows up in board reports, but after what Mythos has just demonstrated it should start to.

What to ask your software vendors

Closing with the operational piece. When you sit down tomorrow with a critical vendor (your ERP, your CRM, your ecommerce platform, your identity provider, your data stack), these are the questions the CISO should be asking before anything gets signed:

1. Which offensive models pass over your code before each release?

2. What's your mean time to patch for critical-severity vulnerabilities discovered internally, and what's the number for the ones discovered externally?

3. Do you have access to Project Glasswing or an equivalent, and if not, what's your alternative?

4. What evidence can you deliver each quarter of the audits performed on the code I consume?

5. Does your architecture allow patching without downtime, or do you still depend on maintenance windows?

If a vendor becomes uncomfortable with those questions, you already have valuable information; if they answer with specifics, the same applies. In both cases the procurement decision changes.

Mythos isn't Anthropic's problem alone. It's a phase change in how software gets attacked and defended, and the companies that understand this in the spring of 2026 will hold a twelve-month defensive lead over the ones that catch up in the fall.

If you want to work through what this means for your organization, our strategic advisory programmes are built exactly for that: no implementation attached, just the honest conversation your board needs to have.