Claude Mythos Preview: Anthropic’s Unreleased AI Cracked Linux and OpenBSD Bugs Humans Missed for Decades – Bitcoin News

https://news.bitcoin.com/claude-mythos-preview-anthropics-unreleased-ai-cracked-linux-and-openbsd-bugs-humans-missed-for-decades/

Publish Date: 2026-04-09 03:51:00

Source Domain: news.bitcoin.com

Key Takeaways:

  • Anthropic’s Claude Mythos Preview scored 83.1% on Cybergym, finding thousands of zero-days across every major OS and browser.
  • Project Glasswing launched April 7, 2026, with 11 founding partners and up to $100 million in Mythos usage credits for defenders.
  • A 27-year-old OpenBSD flaw and a 16-year-old FFmpeg bug survived millions of automated tests until Mythos found them in hours.

Claude Mythos AI Scored 83% on Cybergym and Found Critical Flaws Across Every Major Browser and OS

The model, which Anthropic describes as the largest single-model capability gain in frontier AI history, completed training and was announced publicly on April 7, 2026. Internal details had already surfaced in late March through a misconfigured content management system that exposed roughly 3,000 internal files.

Anthropic is not releasing the Claude Mythos Preview to the public or through its general API. The company restricted access to a vetted group of partners after the model demonstrated it could discover and exploit previously unknown software flaws at a speed and scale that outpaces both human experts and prior AI systems.

On cybersecurity and coding benchmarks, the gap between Mythos and Claude Opus 4.6 is hard to ignore. Mythos scored 83.1% on Cybergym versus 66.6% for Opus 4.6, and 93.9% versus 80.8% on SWE-bench Verified. On SWE-bench Pro, it posted 77.8% against 53.4%, a 24.4-point spread. It hit 56.8% on Humanity’s Last Exam without tools, compared to 40.0% for its predecessor.
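For readers who want to check the spreads, the short Python sketch below recomputes the percentage-point gap on each benchmark from the scores quoted above. The dictionary layout and the `point_gaps` helper name are illustrative choices, not anything published by Anthropic.

```python
# Scores quoted in the article, as (Mythos Preview, Claude Opus 4.6) percentages.
scores = {
    "Cybergym": (83.1, 66.6),
    "SWE-bench Verified": (93.9, 80.8),
    "SWE-bench Pro": (77.8, 53.4),
    "Humanity's Last Exam (no tools)": (56.8, 40.0),
}


def point_gaps(table):
    """Return the percentage-point gap (first score minus second), rounded to 0.1."""
    return {name: round(a - b, 1) for name, (a, b) in table.items()}


gaps = point_gaps(scores)
for name, gap in gaps.items():
    print(f"{name}: +{gap} points")
```

By this arithmetic, the widest gap is on SWE-bench Pro and the narrowest is on SWE-bench Verified.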

The model does not need cybersecurity-specific training to find these bugs. Its gains come from broader advances in reasoning, multi-step planning, and autonomous agentic behavior. Given a target codebase in an isolated container, it reads source code, forms hypotheses about memory-safety flaws, compiles and runs the software, uses debugging tools such as AddressSanitizer, ranks files by vulnerability likelihood, and produces validated bug reports with working proof-of-concept exploits.
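The article does not say how Mythos ranks files, so the Python sketch below is only a toy illustration of what "ranking files by vulnerability likelihood" could mean in its simplest form. It assumes a crude heuristic, counting calls to C functions commonly involved in memory-safety bugs; the `RISKY_CALLS` list, the `rank_files` helper, and the sample file contents are all hypothetical.

```python
import re

# Hypothetical heuristic: C library calls that often show up in memory-safety bugs.
RISKY_CALLS = ("strcpy", "strcat", "sprintf", "gets", "memcpy")
_PATTERN = re.compile(r"\b(" + "|".join(RISKY_CALLS) + r")\s*\(")


def rank_files(sources):
    """Rank file names by the number of risky calls in their text, highest first.

    Ties are broken alphabetically by file name.
    """
    counts = {name: len(_PATTERN.findall(text)) for name, text in sources.items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up source snippets, used only to exercise the heuristic.
example = {
    "parser.c": "strcpy(buf, in); memcpy(dst, src, n); sprintf(out, fmt, x);",
    "util.c": "size_t n = strlen(s);",
    "net.c": "memcpy(pkt, data, len);",
}
ranking = rank_files(example)
print(ranking)
```

A pattern count like this only suggests where to look first. The workflow described above goes further, compiling and running the code and validating each finding with a working proof of concept.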

Some of those…
