OQENYX

Introducing Guardian 2.0 Thinking

The second generation of OQENYX's flagship reasoning model. IFBench 76.5, GPQA Diamond 86, MMLU-Pro 85 — and native support for 200+ languages.

Model Release · Benchmarks · Guardian
By OQENYX Research

Today we're releasing Guardian 2.0 Thinking — the second generation of OQENYX's flagship reasoning model. It represents a step-change upgrade across instruction following, graduate-level reasoning, and multilingual capability.

A defining strength is instruction following. Guardian 2.0 Thinking scores 76.5 on IFBench, the benchmark that most directly measures how reliably a model does exactly what you ask, with nothing missed and nothing invented.

  • IFBench (Instruction Following): 76.5
  • GPQA Diamond: 88.7
  • MMLU-Pro: 88.2
  • Languages Supported: 200+
  • Context Window: Long
  • Engineered in: DE

Performance Overview

Guardian 2.0 Thinking achieves top-tier results across the benchmarks that matter for complex, real-world tasks.

Instruction Following (IFBench)

  • Guardian 2.0 Thinking: 76.5%
  • GPT-5.2: 75.4%
  • Gemini 3 Pro: 72%
  • Claude Opus 4.6: 58%

On IFBench, the benchmark that most directly measures real-world reliability, Guardian 2.0 Thinking scores 76.5. When you give it constraints, formatting requirements, or multi-step instructions, it follows them consistently.

Graduate-Level Reasoning (GPQA Diamond)

  • Guardian 2.0 Thinking: 88.7%
  • Claude Opus 4.6: 86.2%
  • GPT-5.2: 85%
  • Gemini 3 Pro: 83.1%

Professional Knowledge (MMLU-Pro)

  • Guardian 2.0 Thinking: 88.2%
  • GPT-5.2: 85.2%
  • Claude Opus 4.6: 84.3%
  • Gemini 3 Pro: 82.5%

Complex Instructions (MultiChallenge)

  • Guardian 2.0 Thinking: 67.6%
  • Gemini 3 Pro: 64.2%
  • GPT-5.2: 57.9%
  • Claude Opus 4.6: 54.2%

Function Calling (BFCL-V4)

  • Guardian 2.0 Thinking: 72.2%
  • GPT-5.2: 70.8%
  • Gemini 3 Pro: 68.4%
  • Claude Opus 4.6: 65.1%

Detailed Comparison

Benchmark | Guardian 2.0 Thinking | GPT-5.3-Codex | Claude Opus 4.6 | Gemini 3.1 Pro
GPQA Diamond (Graduate Reasoning) | 86% | 81% | 91.3% | 94.3%
SWE-bench Verified (Real-World Coding) | 79% | 85% | 80.8% | 80.6%
Terminal-Bench 2.0 (Agentic Tasks) | 62% | 77.3% | 65.4% | 68.5%
AIME 2026 (Math Reasoning) | 93% | 91.2%
MMLU-Pro (Professional Knowledge) | 85% | 80.5%

Generational Improvements: 1.0 → 2.0

Guardian 2.0 Thinking improves substantially over the first generation in professional knowledge and complex instruction following, alongside broader multilingual capability.

MMLU-Pro: Professional Knowledge

+12.5% (relative)
  • Guardian 1.0: 78.4%
  • Guardian 2.0: 88.2%

MultiChallenge: Complex Instructions

+9.0% (relative)
  • Guardian 1.0: 62%
  • Guardian 2.0: 67.6%
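The generational gains above are relative improvements, not percentage-point differences. A minimal Python sketch reproduces both kinds of figure from the listed Guardian 1.0 and 2.0 scores; the function names are illustrative only.

```python
def absolute_gain(old: float, new: float) -> float:
    """Difference in percentage points between two benchmark scores."""
    return new - old


def relative_gain(old: float, new: float) -> float:
    """Improvement expressed as a percentage of the old score."""
    return (new - old) / old * 100


# Scores as listed above for Guardian 1.0 and Guardian 2.0.
scores = {
    "MMLU-Pro": (78.4, 88.2),
    "MultiChallenge": (62.0, 67.6),
}

for benchmark, (old, new) in scores.items():
    print(
        f"{benchmark}: +{absolute_gain(old, new):.1f} points, "
        f"+{relative_gain(old, new):.1f}% relative"
    )
```

Run as-is, this prints "MMLU-Pro: +9.8 points, +12.5% relative" and "MultiChallenge: +5.6 points, +9.0% relative", matching the figures shown.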

What's New in 2.0

IFBench — Instruction Following

Instruction following is the most practically important capability for production deployments. A model that misses a constraint, ignores a formatting requirement, or includes information it was told to leave out is unreliable regardless of its raw knowledge scores. Guardian 2.0 Thinking achieves 76.5 on IFBench.
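Instruction-following benchmarks such as IFBench are built around constraints that can be verified programmatically rather than judged by hand. The Python sketch below illustrates that idea only; it is not IFBench's code, and the checker names (`max_words`, `bullet_count`, `forbidden_words`) are hypothetical. Each constraint is a function that returns True when a response complies, and a response passes only if every check holds.

```python
import re
from typing import Callable

# A hypothetical constraint checker: takes a response, returns True if it complies.
Check = Callable[[str], bool]


def max_words(limit: int) -> Check:
    """Response must contain at most `limit` whitespace-separated tokens."""
    return lambda response: len(response.split()) <= limit


def bullet_count(n: int) -> Check:
    """Response must contain exactly `n` lines starting with '- '."""
    return lambda response: sum(
        1 for line in response.splitlines() if line.startswith("- ")
    ) == n


def forbidden_words(words: list[str]) -> Check:
    """Response must not contain any of `words` (whole words, case-insensitive)."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    return lambda response: pattern.search(response) is None


def follows_all(response: str, checks: list[Check]) -> bool:
    """A response passes only if it satisfies every constraint."""
    return all(check(response) for check in checks)


response = "- Fast\n- Private\n- Multilingual"
checks = [max_words(10), bullet_count(3), forbidden_words(["cheap"])]
print(follows_all(response, checks))  # True
```

Scoring each constraint independently makes "nothing missed" concrete: a single failed check fails the whole response, which is why small lapses in formatting or omission show up directly in the score.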

GPQA Diamond and MMLU-Pro

Guardian 2.0 Thinking scores 86 on GPQA Diamond (graduate-level science reasoning) and 85 on MMLU-Pro (professional knowledge across law, medicine, engineering, and more) — competitive, frontier-class results.

200+ Language Support

Guardian 2.0 Thinking natively supports more than 200 languages. For global enterprise deployments, this removes the need for translation pre-processing and the quality degradation that comes with it.

Three Reasoning Levels

The three-level reasoning system (low, medium, high) carries forward from Guardian 1.0 Thinking, now with improved calibration. Low-reasoning mode is substantially faster than its 1.0 counterpart. High-reasoning mode shows the largest gains on multi-step scientific and mathematical tasks.
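In client code, the three levels can be modeled as a small enum plus a routing rule. The Python sketch below is hypothetical throughout: the `ReasoningLevel` enum, the `pick_level` heuristic, the `reasoning_level` request field, and the `guardian-2.0-thinking` model identifier are illustrative names, not a documented Guardian API. It only encodes the trade-off described above, choosing high for multi-step work and low when latency matters most.

```python
from enum import Enum


class ReasoningLevel(Enum):
    """The three reasoning levels described for Guardian 2.0 Thinking."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def pick_level(multi_step: bool, latency_sensitive: bool) -> ReasoningLevel:
    """Hypothetical heuristic trading response speed against reasoning depth."""
    if multi_step:
        # Multi-step scientific and mathematical tasks gain most from high.
        return ReasoningLevel.HIGH
    if latency_sensitive:
        # Low is the fastest mode.
        return ReasoningLevel.LOW
    return ReasoningLevel.MEDIUM


def build_request(prompt: str, level: ReasoningLevel) -> dict:
    """Assemble a request payload; every field name here is a placeholder."""
    return {
        "model": "guardian-2.0-thinking",  # hypothetical model identifier
        "reasoning_level": level.value,     # hypothetical parameter name
        "input": prompt,
    }


request = build_request(
    "Prove that sqrt(2) is irrational.",
    pick_level(multi_step=True, latency_sensitive=False),
)
print(request["reasoning_level"])  # high
```

Keeping the choice in one function makes it easy to adjust the routing rule later without touching the code that builds requests.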

Privacy and Infrastructure

All inference runs with Guardian's privacy-first architecture — encrypted in transit and at rest. Minimal retention by default: prompts and completions are handled with transparent processing and clear policies for sensitive workloads.

Availability

Guardian 2.0 Thinking is available now through:

What's Next

Guardian 2.0 Thinking is part of the full Guardian 2.0 release. The same generation brings:

  • G-2.0-Lite — Compact, fast model with native vision and long context
  • G-2.0-Code — Agentic coding model with BFCL-V4 72 and SWE-bench 77%
  • Guardian Pulse 2.0 — Native multimodal: audio, video, and text in one endpoint