Introducing Guardian 2.0 Thinking
The second generation of OQENYX's flagship reasoning model. IFBench 76.5, GPQA Diamond 86, MMLU-Pro 85 — and native support for 200+ languages.
Today we're releasing Guardian 2.0 Thinking, the second generation of OQENYX's flagship reasoning model. It is a substantial upgrade in instruction following, graduate-level reasoning, and multilingual capability.
A defining strength is instruction following. Guardian 2.0 Thinking scores 76.5 on IFBench, the benchmark that most directly measures how reliably a model does exactly what you ask, with nothing missed and nothing invented.
Performance Overview
Guardian 2.0 Thinking achieves top-tier results across the benchmarks that matter for complex, real-world tasks.
Instruction Following (IFBench)
Guardian 2.0 Thinking scores 76.5 on IFBench. When asked to respect constraints, formatting requirements, and multi-step instructions, it does so consistently.
[Charts: Graduate-Level Reasoning (GPQA Diamond), Professional Knowledge (MMLU-Pro), Complex Instructions (MultiChallenge), Function Calling (BFCL-V4)]
Detailed Comparison
| Benchmark | Guardian 2.0 Thinking | GPT-5.3-Codex | Claude Opus 4.6 | Gemini 3.1 Pro |
|---|---|---|---|---|
| GPQA Diamond (Graduate Reasoning) | 86% | 81% | 91.3% | 94.3% |
| SWE-bench Verified (Real-World Coding) | 79% | 85% | 80.8% | 80.6% |
| Terminal-Bench 2.0 (Agentic Tasks) | 62% | 77.3% | 65.4% | 68.5% |
| AIME 2026 (Math Reasoning) | 93% | — | — | 91.2% |
| MMLU-Pro (Professional Knowledge) | 85% | — | — | 80.5% |
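To make the table easier to read at a glance, the short Python sketch below computes Guardian 2.0 Thinking's margin against the strongest listed competitor on each benchmark. The scores are copied from the table above. The names `SCORES`, `GUARDIAN`, and `margin_vs_best_rival` are illustrative helpers introduced here, and cells shown as a dash are treated as missing rather than as zero.

```python
# Scores copied from the comparison table; None marks a cell shown as a dash.
SCORES = {
    "GPQA Diamond": {
        "Guardian 2.0 Thinking": 86.0, "GPT-5.3-Codex": 81.0,
        "Claude Opus 4.6": 91.3, "Gemini 3.1 Pro": 94.3,
    },
    "SWE-bench Verified": {
        "Guardian 2.0 Thinking": 79.0, "GPT-5.3-Codex": 85.0,
        "Claude Opus 4.6": 80.8, "Gemini 3.1 Pro": 80.6,
    },
    "Terminal-Bench 2.0": {
        "Guardian 2.0 Thinking": 62.0, "GPT-5.3-Codex": 77.3,
        "Claude Opus 4.6": 65.4, "Gemini 3.1 Pro": 68.5,
    },
    "AIME 2026": {
        "Guardian 2.0 Thinking": 93.0, "GPT-5.3-Codex": None,
        "Claude Opus 4.6": None, "Gemini 3.1 Pro": 91.2,
    },
    "MMLU-Pro": {
        "Guardian 2.0 Thinking": 85.0, "GPT-5.3-Codex": None,
        "Claude Opus 4.6": None, "Gemini 3.1 Pro": 80.5,
    },
}

GUARDIAN = "Guardian 2.0 Thinking"


def margin_vs_best_rival(row):
    """Return (best rival name, Guardian score minus that rival's score)."""
    # Ignore Guardian itself and any rival without a reported score.
    rivals = {
        name: score
        for name, score in row.items()
        if name != GUARDIAN and score is not None
    }
    best = max(rivals, key=rivals.get)
    return best, round(row[GUARDIAN] - rivals[best], 1)


MARGINS = {bench: margin_vs_best_rival(row) for bench, row in SCORES.items()}

for bench, (rival, margin) in MARGINS.items():
    print(f"{bench}: {margin:+.1f} points vs {rival}")
```

A positive margin means Guardian 2.0 Thinking leads the best listed competitor on that benchmark, and a negative margin means it trails. For AIME 2026 and MMLU-Pro the comparison covers only Gemini 3.1 Pro, since the other two columns have no reported score.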
Generational Improvements: 1.0 → 2.0
Guardian 2.0 Thinking improves substantially over the first generation in professional knowledge and complex instruction following.
- MMLU-Pro (Professional Knowledge): +12.5%
- MultiChallenge (Complex Instructions): +9.0%
What's New in 2.0
IFBench — Instruction Following
Instruction following is the most practically important capability for production deployments. A model that misses a constraint, ignores a formatting requirement, or invents information it was told to omit is unreliable regardless of its raw knowledge scores. Guardian 2.0 Thinking achieves 76.5 on IFBench.
GPQA Diamond and MMLU-Pro
Guardian 2.0 Thinking scores 86 on GPQA Diamond (graduate-level science reasoning) and 85 on MMLU-Pro (professional knowledge across law, medicine, engineering, and more) — competitive, frontier-class results.
200+ Language Support
Guardian 2.0 Thinking natively supports 200+ languages — exceptionally broad multilingual coverage. For global enterprise deployments, this eliminates the need for translation pre-processing and the quality degradation that comes with it.
Three Reasoning Levels
The three-level reasoning system (low, medium, high) carries forward from Guardian 1.0 Thinking, now with improved calibration. Low-reasoning mode is substantially faster than it was in 1.0. High-reasoning mode shows the largest gains on multi-step scientific and mathematical tasks.
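One way to think about choosing among the three levels is sketched below in Python. This is an illustration under stated assumptions, not a documented Guardian API: the `Task` fields, the `choose_level` rule, the payload keys in `build_request`, and the model identifier string are all hypothetical. Only the level names (low, medium, high) and the guidance that high suits multi-step scientific and mathematical work, while low favors speed, come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class ReasoningLevel(str, Enum):
    """The three reasoning levels named in this post."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Task:
    """Hypothetical description of a request, used only for this sketch."""
    prompt: str
    multi_step: bool = False         # e.g. chained scientific or math reasoning
    latency_sensitive: bool = False  # e.g. an interactive chat turn


def choose_level(task: Task) -> ReasoningLevel:
    """Pick a level: accuracy-critical multi-step work first, then speed."""
    if task.multi_step:
        return ReasoningLevel.HIGH
    if task.latency_sensitive:
        return ReasoningLevel.LOW
    return ReasoningLevel.MEDIUM


def build_request(task: Task) -> dict:
    """Assemble a request payload.

    The key names and model identifier are placeholders (assumptions),
    not fields confirmed by any published API reference.
    """
    return {
        "model": "guardian-2.0-thinking",  # placeholder identifier
        "reasoning": choose_level(task).value,
        "prompt": task.prompt,
    }


print(build_request(Task("Summarize this email.", latency_sensitive=True)))
print(build_request(Task("Derive the rate law step by step.", multi_step=True)))
```

The rule here deliberately prefers high reasoning whenever a task is multi-step, even if it is also latency sensitive. A deployment with strict response-time budgets might reverse that priority.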
Privacy and Infrastructure
All inference runs on Guardian's privacy-first architecture, with data encrypted in transit and at rest. Retention is minimal by default, and prompts and completions are processed transparently under clear policies for sensitive workloads.
Availability
Guardian 2.0 Thinking is available now through:
- Onora — OQENYX's consumer AI assistant at oqenyx.com/onora
What's Next
Guardian 2.0 Thinking is part of the full Guardian 2.0 release. The same generation brings:
- G-2.0-Lite — Compact, fast model with native vision and long context
- G-2.0-Code — Agentic coding model with BFCL-V4 72 and SWE-bench 77%
- Guardian Pulse 2.0 — Native multimodal: audio, video, and text in one endpoint