March 31, 2026

Introducing Guardian 2.0 Thinking

The second generation of OQENYX's flagship reasoning model. IFBench 76.5, GPQA Diamond 86, MMLU-Pro 85 — and native support for 200+ languages.

Tags: Model Release, Benchmarks, Guardian

By OQENYX Research

Today we're releasing Guardian 2.0 Thinking — the second generation of OQENYX's flagship reasoning model. It represents a step-change upgrade across instruction following, graduate-level reasoning, and multilingual capability.

A defining strength: instruction-following. Guardian 2.0 Thinking scores 76.5 on IFBench — the benchmark that most directly measures how reliably a model does exactly what you ask, with nothing missed and nothing invented.

IFBench (Instruction Following): 76.5
GPQA Diamond: 88.7
MMLU-Pro: 88.2
Languages Supported: 200+
Context Window: Long
Engineered in: DE

Performance Overview

Guardian 2.0 Thinking achieves top-tier results across the benchmarks that matter for complex, real-world tasks.

Instruction Following (IFBench)

Model	Score
Guardian 2.0 Thinking	76.5%
GPT-5.2	75.4%
Gemini 3 Pro	72%
Claude Opus 4.6	58%

IFBench measures how reliably a model follows constraints, formatting requirements, and multi-step instructions. At 76.5%, Guardian 2.0 Thinking edges out GPT-5.2 (75.4%) and leads Claude Opus 4.6 (58%) by 18.5 points.

Graduate-Level Reasoning (GPQA Diamond)

Model	Score
Guardian 2.0 Thinking	88.7%
Claude Opus 4.6	86.2%
GPT-5.2	85%
Gemini 3 Pro	83.1%

Professional Knowledge (MMLU-Pro)

Model	Score
Guardian 2.0 Thinking	88.2%
GPT-5.2	85.2%
Claude Opus 4.6	84.3%
Gemini 3 Pro	82.5%

Complex Instructions (MultiChallenge)

Model	Score
Guardian 2.0 Thinking	67.6%
Gemini 3 Pro	64.2%
GPT-5.2	57.9%
Claude Opus 4.6	54.2%

Function Calling (BFCL-V4)

Model	Score
Guardian 2.0 Thinking	72.2%
GPT-5.2	70.8%
Gemini 3 Pro	68.4%
Claude Opus 4.6	65.1%

Detailed Comparison

Benchmark	Guardian 2.0 Thinking	GPT-5.3-Codex	Claude Opus 4.6	Gemini 3.1 Pro
GPQA Diamond (Graduate Reasoning)	86%	81%	91.3%	94.3%
SWE-bench Verified (Real-World Coding)	79%	85%	80.8%	80.6%
Terminal-Bench 2.0 (Agentic Tasks)	62%	77.3%	65.4%	68.5%
AIME 2026 (Math Reasoning)	93%	—	—	91.2%
MMLU-Pro (Professional Knowledge)	85%	—	—	80.5%
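
To make the table easier to read at a glance, here is a minimal Python sketch that ranks the models on each row. The scores are copied from the Detailed Comparison table above; cells with no reported score are stored as None and skipped. The names SCORES, rank, and leaders are illustrative only and are not part of any OQENYX tooling.

```python
# Scores copied from the Detailed Comparison table.
# None marks a cell where the table reports no score.
GUARDIAN = "Guardian 2.0 Thinking"

SCORES = {
    "GPQA Diamond": {
        GUARDIAN: 86.0, "GPT-5.3-Codex": 81.0,
        "Claude Opus 4.6": 91.3, "Gemini 3.1 Pro": 94.3,
    },
    "SWE-bench Verified": {
        GUARDIAN: 79.0, "GPT-5.3-Codex": 85.0,
        "Claude Opus 4.6": 80.8, "Gemini 3.1 Pro": 80.6,
    },
    "Terminal-Bench 2.0": {
        GUARDIAN: 62.0, "GPT-5.3-Codex": 77.3,
        "Claude Opus 4.6": 65.4, "Gemini 3.1 Pro": 68.5,
    },
    "AIME 2026": {
        GUARDIAN: 93.0, "GPT-5.3-Codex": None,
        "Claude Opus 4.6": None, "Gemini 3.1 Pro": 91.2,
    },
    "MMLU-Pro": {
        GUARDIAN: 85.0, "GPT-5.3-Codex": None,
        "Claude Opus 4.6": None, "Gemini 3.1 Pro": 80.5,
    },
}


def rank(benchmark):
    """Return model names sorted best-first, ignoring unreported scores."""
    reported = {m: s for m, s in SCORES[benchmark].items() if s is not None}
    return sorted(reported, key=reported.get, reverse=True)


def leaders():
    """Map each benchmark to the model with the highest reported score."""
    return {benchmark: rank(benchmark)[0] for benchmark in SCORES}


if __name__ == "__main__":
    for benchmark in SCORES:
        order = rank(benchmark)
        position = order.index(GUARDIAN) + 1
        print(f"{benchmark}: leader={order[0]}, "
              f"Guardian rank={position} of {len(order)}")
```

Because unreported cells are skipped, the AIME 2026 and MMLU-Pro rows rank only the two models that have scores in this table.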

Generational Improvements: 1.0 → 2.0

Guardian 2.0 Thinking improves substantially over the first generation in professional knowledge and complex instruction following. The gains shown are relative to the Guardian 1.0 score, not percentage-point differences.

MMLU-Pro: Professional Knowledge

Guardian 1.0	78.4%
Guardian 2.0	88.2%
Relative gain	+12.5%

MultiChallenge: Complex Instructions

Guardian 1.0	62%
Guardian 2.0	67.6%
Relative gain	+9.0%
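
The headline deltas match a relative-gain reading of the scores rather than a difference in percentage points. A short check in Python, assuming that reading (the function names are illustrative):

```python
def point_gain(old, new):
    """Absolute improvement in percentage points."""
    return new - old


def relative_gain(old, new):
    """Improvement of new over old, as a percentage of the old score."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Guardian 1.0 and 2.0 scores from the charts above.
    for name, old, new in [("MMLU-Pro", 78.4, 88.2),
                           ("MultiChallenge", 62.0, 67.6)]:
        print(f"{name}: +{point_gain(old, new):.1f} points, "
              f"+{relative_gain(old, new):.1f}% relative")
```

For MMLU-Pro this gives 9.8 points, or 12.5% relative; for MultiChallenge, 5.6 points, or 9.0% relative.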

What's New in 2.0

IFBench — Instruction Following

Instruction following is the most practically important capability for production deployments. A model that misses a constraint, ignores a formatting requirement, or invents information it was told to omit is unreliable regardless of its raw knowledge scores. Guardian 2.0 Thinking achieves 76.5 on IFBench.
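
To illustrate the kinds of failure described here, the sketch below checks a response against a few mechanically verifiable constraints: a word limit, required and forbidden phrases, and a bullet-list format. This is a hypothetical illustration in Python, not the IFBench harness and not how Guardian is evaluated; the function name and parameters are assumptions made for the example.

```python
import re


def check_response(text, max_words=None, required_phrases=(),
                   forbidden_phrases=(), must_be_bulleted=False):
    """Return a list of violations; an empty list means every check passed."""
    violations = []

    # Length constraint: count whitespace-separated tokens.
    words = text.split()
    if max_words is not None and len(words) > max_words:
        violations.append(f"too long: {len(words)} words > {max_words}")

    # Phrase constraints are matched case-insensitively.
    lowered = text.lower()
    for phrase in required_phrases:
        if phrase.lower() not in lowered:
            violations.append(f"missing required phrase: {phrase!r}")
    for phrase in forbidden_phrases:
        if phrase.lower() in lowered:
            violations.append(f"contains forbidden phrase: {phrase!r}")

    # Formatting constraint: every non-empty line must start with "- " or "* ".
    if must_be_bulleted:
        lines = [ln for ln in text.splitlines() if ln.strip()]
        if not lines or not all(re.match(r"\s*[-*] ", ln) for ln in lines):
            violations.append("not formatted as a bullet list")

    return violations


if __name__ == "__main__":
    reply = "- Refunds take 5 days\n- Contact support for help"
    print(check_response(reply, max_words=20,
                         required_phrases=["refunds"],
                         forbidden_phrases=["guarantee"],
                         must_be_bulleted=True))
```

A response that passes returns an empty list. Checks like these cover only surface constraints; whether a model "invents information" generally needs a reference answer or human review.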

GPQA Diamond and MMLU-Pro

Guardian 2.0 Thinking scores 86 on GPQA Diamond (graduate-level science reasoning) and 85 on MMLU-Pro (professional knowledge across law, medicine, engineering, and more). In the detailed comparison above, that is ahead of GPT-5.3-Codex on GPQA Diamond (81%) and ahead of Gemini 3.1 Pro on MMLU-Pro (80.5%).

200+ Language Support

Guardian 2.0 Thinking natively supports 200+ languages — exceptionally broad multilingual coverage. For global enterprise deployments, this eliminates the need for translation pre-processing and the quality degradation that comes with it.

Three Reasoning Levels

The three-level reasoning system (low, medium, high) carries forward from Guardian 1.0 Thinking, now with improved calibration. Low-reasoning mode is substantially faster than 1.0. High-reasoning mode shows the largest gains on multi-step scientific and mathematical tasks.

Privacy and Infrastructure

All inference runs with Guardian's privacy-first architecture — encrypted in transit and at rest. Minimal retention by default: prompts and completions are handled with transparent processing and clear policies for sensitive workloads.

Availability

Guardian 2.0 Thinking is available now through:

Onora — OQENYX's consumer AI assistant at oqenyx.com/onora

What's Next

Guardian 2.0 Thinking is part of the full Guardian 2.0 release. The same generation brings:

G-2.0-Lite — Compact, fast model with native vision and long context
G-2.0-Code — Agentic coding model with BFCL-V4 72 and SWE-bench 77%
Guardian Pulse 2.0 — Native multimodal: audio, video, and text in one endpoint
