CyberGym
83.1% Mythos vs. 66.6% Opus 4.6. Vulnerability reproduction. The headline number.
A specialized Claude model for defensive cybersecurity research. Announced April 7, 2026 as part of Project Glasswing. Invitation-only — not generally available.
Mythos is the first publicly named Claude model specialized for a single research domain: defensive cybersecurity. It identifies zero-day vulnerabilities in operating systems, web browsers, and critical software, and can autonomously discover and exploit security flaws with minimal human guidance. Anthropic positions it as a tool for the defensive side of the AI-era security arms race.
The most-quoted finding from the launch announcement: Mythos uncovered a 27-year vulnerability in OpenBSD and a 16-year flaw in FFmpeg that automated testing had encountered five million times without detecting. The number is intended to dramatize the capability gap; the underlying point is that pre-Mythos automated testing had a ceiling, and this model breaks it.
Anthropic published four benchmarks at launch, each comparing Mythos to Claude Opus 4.6:
83.1% Mythos vs. 66.6% Opus 4.6. Vulnerability reproduction. The headline number.
93.9% Mythos vs. 80.8% Opus 4.6. General software-engineering tasks.
77.8% Mythos vs. 53.4% Opus 4.6. Harder professional engineering benchmark.
82.0% Mythos vs. 65.4% Opus 4.6. Terminal-driven agentic tasks.
Thousands of high-severity vulnerabilities across every major OS and browser; specific finds include OpenBSD (27 yrs old) and FFmpeg (16 yrs).
$25 / $125 per million input/output tokens. Roughly 5× Opus 4.7’s rate — reflects the research-tier positioning.
Mythos didn’t ship as a product. It shipped as the engine of an industry initiative. Project Glasswing is a 12-org coalition launched alongside it — named for the glasswing butterfly, the “hidden vulnerabilities and transparency in defense” metaphor. The founders:
Amazon Web Services · Anthropic · Apple · Broadcom · Cisco · CrowdStrike · Google · JPMorgan Chase · Linux Foundation · Microsoft · NVIDIA · Palo Alto Networks.
Plus 40+ additional critical-infrastructure organizations with access. The stated goal: get Mythos-class capability into defenders’ hands before attackers reach parity.
Anthropic also committed: $100M in model-usage credits for participating researchers, $2.5M to Alpha-Omega/OpenSSF (Linux Foundation), and $1.5M to the Apache Software Foundation. There’s a 90-day public-reporting cycle on findings and fixed vulnerabilities.
A running log of post-launch movement, refreshed daily by the local Mythos watcher (cron 06:15 UTC) that diffs Anthropic’s public surfaces and flags changes. Newest first.
anthropic.com top navigation now lists Models > Mythos preview alongside Opus, Sonnet, and Haiku. A subtle promotion — on April 7 the announcement page lived under /glasswing and didn’t appear in the global product nav. As of today, every page on Anthropic’s marketing site offers Mythos as a navigable surface. Read: it has graduated from a launch event to a permanent shelf position. (Detected by the watcher this morning; research, news, and release-notes all changed.)
docs.anthropic.com/release-notes overview page changed; the watcher saw a content delta. No GA-style entry yet — the change is most likely a Mythos-context insertion, not a new public-tier announcement. Worth re-reading directly.
Mythos’s public footprint is intentionally narrow. The signal is in adjacent moves — pricing, product-nav placement, partner statements, and what shows up in sibling Anthropic surfaces.
April 7 launched at /glasswing — a campaign URL. May 9, Mythos joins the global model menu. Anthropic doesn’t list research-preview models in nav unless they expect them to stay. This is the institutional commitment showing.
$25 / $125 per million tokens is roughly 5× the consumer Opus rate. That’s not “bigger model is more expensive” pricing — it’s gating. Anthropic is using the price tag to keep usage to org-scale defenders, not curious individuals.
$2.5M to Alpha-Omega/OpenSSF, $1.5M to Apache — the non-profit OSS security ecosystem. When Mythos finds something nasty in a widely-deployed library, the disclosure path is already funded and friendly. Smart defensive PR move; also actually useful.
SWE-bench Verified jumped from 80.8% (Opus 4.6) to 93.9% (Mythos). That’s a 13-point improvement on a general-purpose engineering benchmark, not just security. Whatever training recipe makes Mythos better at vulnerability-hunting also makes it better at code in general. That recipe will surface in next-generation Opus / Sonnet long before Mythos itself does.
Most of the last 30 days has been silent on the marketing side and visible only as small content diffs — release-note paragraphs, nav reshuffles, doc cross-links. That’s why the watcher exists: launch energy fades but the institutional footprint keeps moving. Dashboard at wholetech.com/admin/mythos-watcher/.
Every Mythos discussion eventually returns to the question Anthropic carefully doesn’t answer: when does the same capability show up on the offensive side? CrowdStrike’s “minutes” framing at launch was the most direct industry comment. Watch for analogous statements from the defensive vendor coalition over the summer.
Last refreshed 2026-05-09 after the Mythos watcher (cron daily 06:15 UTC, dashboard at wholetech.com/admin/mythos-watcher/) detected three Anthropic-surface changes today.