Introducing Guardian Pulse 2.0
OQENYX's multimodal model. Speech recognition in 113 languages, text-to-speech in 36 — audio, video, and text unified in one Guardian endpoint.
Today we're releasing Guardian Pulse 2.0, OQENYX's multimodal model. It unifies audio, video, and text in a single architecture with a privacy-first design, supports speech recognition in 113 languages and text-to-speech in 36, and delivers strong voice quality in blind human evaluation.
Performance Overview
[Charts: Voice Quality, TTS Human Evaluation (20 languages); Speech Recognition Accuracy, ASR (113 languages); Audio-Visual Benchmark Wins (of 36 total)]
Audio-Visual Benchmark Overview
| Metric | Guardian Pulse 2.0 | GPT-4o Audio | Gemini 3.1 Flash | ElevenLabs v3 |
|---|---|---|---|---|
| ASR languages supported | 113 | — | — | — |
| TTS languages supported | 36 | — | — | — |
What Guardian Pulse 2.0 Can Do
Audio — Up to 10 Hours Per Request
Pulse 2.0 processes up to 10 hours of audio in a single request, within its 256K context window. Entire meetings, full-day recordings, and long-form podcasts can be transcribed, summarised, and analysed in one call, with speaker diarisation, accurate timestamps, and multilingual switching handled natively.
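As a rough sanity check on how 10 hours of audio fits into the context window, the sketch below computes the average token budget per second of audio. It assumes "256K" means a window counted in tokens, and tries both common readings (256,000 and 262,144). It also assumes the whole window is available for audio unless you reserve some for the prompt and output. The post does not describe Pulse 2.0's actual audio tokenisation rate, and `tokens_per_audio_second` is a hypothetical helper.

```python
# Back-of-envelope arithmetic only: Pulse 2.0's real audio tokenisation rate
# is not stated in the announcement.

SECONDS_PER_HOUR = 3600


def tokens_per_audio_second(context_tokens: int, audio_hours: float,
                            reserved_tokens: int = 0) -> float:
    """Average tokens available per second of audio (hypothetical helper).

    context_tokens: total context window size, assumed to be counted in tokens.
    audio_hours: length of the audio input in hours.
    reserved_tokens: assumed allowance for prompt and output tokens.
    """
    usable = context_tokens - reserved_tokens
    if usable <= 0:
        raise ValueError("reserved_tokens leaves no room for audio")
    return usable / (audio_hours * SECONDS_PER_HOUR)


if __name__ == "__main__":
    for window in (256_000, 262_144):  # two common readings of "256K"
        rate = tokens_per_audio_second(window, audio_hours=10)
        print(f"{window} tokens / 10 h = {rate:.2f} tokens per second")
```

Under these assumptions, a full 10-hour recording averages a little over 7 tokens per second of audio, so every token reserved for instructions or output leaves less room for audio.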
Video — 400 Seconds at 720p
Pulse 2.0 accepts video inputs of up to 400 seconds at 720p. It understands scene content, spoken dialogue, on-screen text, and visual context simultaneously, enabling use cases such as meeting-recording analysis, video captioning, and content moderation that require joint audio-visual understanding.
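Clients that upload arbitrary clips can check the stated limits before sending a request. The sketch below is hypothetical: `VideoInput`, `video_limit_violations`, and `split_into_segments` are illustrative names, not part of any OQENYX SDK. It reads "720p" as "the shorter side is at most 720 pixels" so that portrait clips are handled. It also assumes that splitting a longer video into consecutive segments of at most 400 seconds is acceptable for your use case.

```python
from dataclasses import dataclass

# Limits as stated in the announcement; how the service enforces them is not documented.
MAX_VIDEO_SECONDS = 400
MAX_SHORT_SIDE_PX = 720  # assumption: "720p" means the shorter side is at most 720 px


@dataclass
class VideoInput:
    """Hypothetical description of a clip to be sent to Pulse 2.0."""
    duration_s: float
    width_px: int
    height_px: int


def video_limit_violations(video: VideoInput) -> list[str]:
    """Return the reasons a clip exceeds the stated limits (empty list if it fits)."""
    problems = []
    if video.duration_s > MAX_VIDEO_SECONDS:
        problems.append(f"duration {video.duration_s:g}s exceeds {MAX_VIDEO_SECONDS}s")
    if min(video.width_px, video.height_px) > MAX_SHORT_SIDE_PX:
        problems.append(f"resolution {video.width_px}x{video.height_px} exceeds 720p")
    return problems


def split_into_segments(duration_s: float,
                        max_len_s: float = MAX_VIDEO_SECONDS) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows of at most max_len_s."""
    if max_len_s <= 0:
        raise ValueError("max_len_s must be positive")
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_len_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


if __name__ == "__main__":
    clip = VideoInput(duration_s=1000, width_px=1920, height_px=1080)
    print(video_limit_violations(clip))
    print(split_into_segments(clip.duration_s))
```

The post does not say whether the model can carry context across separately submitted segments, so this sketch treats each segment as an independent request. Any downscaling to 720p would happen in your own media pipeline before upload.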
Text-to-Speech in 36 Languages
The TTS engine in Pulse 2.0 supports 36 languages and delivers strong voice quality in blind human evaluation across the 20 languages the evaluation covered. Prosody, intonation, and natural pacing are modelled jointly with semantic content rather than added in post-processing, and output sounds natural across European, Asian, and Middle Eastern language families.
ASR in 113 Languages
Pulse 2.0 provides automatic speech recognition in 113 languages with strong accuracy. It handles accent variation, code-switching (mixing languages mid-sentence), and noisy audio without dropping to a fallback model.
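The post does not say which metric underlies its ASR accuracy results. Word error rate (WER) is the most commonly reported ASR metric. It is the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the model's output, divided by the number of reference words. A minimal implementation is sketched below. `word_error_rate` is an illustrative helper, not an OQENYX API. It splits on whitespace, which suits space-delimited languages. For languages written without spaces, such as Chinese or Japanese, character error rate is usually reported instead. Lower is better.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (initially: empty reference).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution or exact match
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Code-switched reference: one substituted word out of five.
    print(word_error_rate("quiero un coffee por favor", "quiero un café por favor"))
```

Because insertions count as errors, WER can exceed 1.0 when the output is much longer than the reference.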
Audio-Visual Understanding
Across a suite of 36 audio-visual benchmarks, Pulse 2.0 performs strongly on the evaluations most relevant to meeting intelligence, multilingual voice synthesis, and video summarisation.
Privacy and Infrastructure
All audio and video processing follows the same privacy-first architecture as every other Guardian model: data is encrypted in transit and at rest, with minimal retention by default. Voice data, meeting recordings, and video content are processed transparently under clear policies. For enterprise deployments handling sensitive voice data, minimal retention means no audio is kept after the request completes unless you consent to retention.
Availability
Guardian Pulse 2.0 is available now through:
- Onora — Voice mode in the Onora App is powered by Pulse 2.0
The Full 2.0 Release
Guardian Pulse 2.0 launches alongside the complete Guardian 2.0 family:
- Guardian 2.0 Thinking — Frontier reasoning, IFBench 76.5, 200+ languages
- G-2.0-Lite — Compact, fast model, AIME 93, native vision, long context
- G-2.0-Code — Agentic coding, BFCL-V4 72, SWE-bench 77%