A model can clear its Afghan-language benchmark on a single score — the average of fluent Dari and broken Pashto. The people who speak Pashto are not an average.
The Pashto-Dari Parity Index is Ariana Nexus's annual benchmark of linguistic parity in Afghan-language AI — a measurement of whether a system serves Afghanistan's two official languages equally, or only appears to because a single score averages one language's competence over the other's failure. It scores each language on its own, by dialect band, validated by speakers, and reports the gap between them — because parity is the one thing an aggregate score is built to hide.
Why Parity
Pashto and Dari are both official languages of Afghanistan, and an institution serving the country serves speakers of both. AI systems do not treat them as equals, and the reason is structural rather than accidental. Dari is a variety of Persian and inherits the comparative wealth of Persian-language data and tooling; Pashto is lower-resourced and more morphologically demanding — grammatical gender, complex inflection, features that a thinly-trained model handles badly. The predictable result is a system that performs respectably in one language and poorly in the other, while its single reported capability — "supports Afghan languages" — averages the two into a number that conceals exactly which speakers it fails.
That concealment is not a reporting quirk; it is a decision about who gets served. A health system that procures an AI translation tool on its aggregate Afghan-language score may be serving Dari speakers adequately and failing Pashto speakers entirely, with no figure on the page that would tell it so. The gap between the two languages is the most consequential thing about the system, and it is the one thing the headline number is designed not to show.
The Pashto-Dari Parity Index exists to show it. It refuses the aggregate, scores each language on its own terms and by dialect band, has the results validated by people who speak them, and reports the disparity as the headline rather than the footnote. The gap can run in either direction; what the Index guarantees is that it is measured, named, and put in front of the institution that would otherwise procure the average.
Dari inherits Persian's data wealth. Pashto does not. The gap between them is structural — not incidental, and not visible in a single score.
2 official languages held to one standard of parity, not averaged into one score
6 dimensions of parity the Index measures
0 parity claims made on a single aggregate score; each language is scored alone
100% of scores speaker-validated, not assigned by automatic metric alone
The Doctrine
An average is not a parity.
A single score across two languages can look healthy while one of them is failing. Parity is the gap the average conceals — and the gap is who gets served.
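A small arithmetic sketch makes the doctrine concrete. The two scores are invented for illustration and are not Index results; the 0-to-1 scale and the function names are assumptions, since the Index's actual metrics and scales are the firm's own.

```python
from statistics import mean

# Hypothetical scores on an assumed 0-to-1 scale; not Index results.
scores = {"dari": 0.90, "pashto": 0.50}

def aggregate(scores: dict[str, float]) -> float:
    """The single headline number an aggregate benchmark would report."""
    return mean(scores.values())

def parity_gap(scores: dict[str, float]) -> float:
    """Disparity between the best-served and worst-served language."""
    return max(scores.values()) - min(scores.values())

print(f"aggregate:  {aggregate(scores):.2f}")
print(f"parity gap: {parity_gap(scores):.2f}")
```

With these invented inputs the aggregate is 0.70, a number that looks serviceable, while the gap is 0.40. Nothing in the aggregate says which language carries the failure.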
The Dimensions
Translation
Accuracy in both directions — into and out of each language — because a system fluent translating from a language can still fail translating into it.
Comprehension
Whether the system understands meaning and intent in each language, not merely matches surface strings.
Generation
Output that is fluent, correct, and right in register for each language — fluency in one is not fluency in both.
Speech
Recognition and synthesis parity, measured in concert with the Sovereign Speech Index — voice systems fail across the Pashto-Dari line as readily as text systems do.
Dialect coverage
Parity across the dialect bands within each language, not only the standard register — an average over dialects hides its own gap.
Cultural accuracy
Whether output is culturally faithful and not merely linguistically fluent, drawing on the Cultural Hallucination Audit — a parity of words is not a parity of meaning.
The exact metrics, weights, and scales for each dimension are specified in the firm's defined methodology.
Anatomy of the measurement
Language A — Pashto
score by dimension and dialect band
Language B — Dari
score by dimension and dialect band
The parity gap
the disparity between them — the Index's headline
Coverage
measured by dialect band, not the standard register alone
Validation
speaker-validated and gated; not automatic-metric-only
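One way to picture this anatomy is as a record per language, dimension, and dialect band, with unvalidated scores kept out of the report. The sketch below is an assumption about layout only: the `Score` fields, the band labels, the 0-to-1 scale, and the choice to compute the headline gap on a band named "standard" are all hypothetical, not the firm's methodology.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Score:
    language: str            # "pashto" or "dari"
    dimension: str           # one of the six dimensions, e.g. "translation"
    band: str                # dialect band label (placeholder names below)
    value: float             # assumed 0-to-1 scale
    speaker_validated: bool  # True once qualified speakers have confirmed it

def report(scores: list[Score]) -> dict:
    """Group validated scores as language -> dimension -> band; skip the rest."""
    table: dict = {}
    for s in scores:
        if not s.speaker_validated:
            continue  # automatic-metric-only scores are not reported
        table.setdefault(s.language, {}).setdefault(s.dimension, {})[s.band] = s.value
    return table

def headline_gap(table: dict, dimension: str, band: str = "standard") -> float:
    """Signed gap for one dimension: positive favours Dari, negative Pashto."""
    return table["dari"][dimension][band] - table["pashto"][dimension][band]

# Invented rows for illustration; not Index results.
rows = [
    Score("dari", "translation", "standard", 0.88, True),
    Score("pashto", "translation", "standard", 0.61, True),
    Score("pashto", "translation", "band_b", 0.95, False),  # unvalidated, excluded
]
table = report(rows)
print(table)
print(f"translation gap: {headline_gap(table, 'translation'):+.2f}")
```

The gap is kept signed here because the disparity can run in either direction, and the sign records which language is being underserved.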
The Method
01. Scored per language, then compared.
Pashto and Dari are each evaluated on their own terms; the headline is the disparity between them, never an average across them.
02. Speaker-validated, not self-scored.
Results are validated by qualified speakers, because automatic metrics underperform for low-resource languages and cannot be trusted to grade the very gap the Index exists to find.
03. By dialect band.
Each language is measured across its dialect bands, so a parity that holds for the standard register but collapses in a dialect is reported, not hidden.
04. Held to the firm's gates.
The Index methodology runs through the firm's validation (the Five-Gate Protocol and the Cultural Compliance Bureau), so the measurement is itself held to a standard.
05. Published and reproducible.
The method is documented so the score can be examined and reproduced — a benchmark no one can check is a claim, not a measurement.
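The dialect-band step can be illustrated with a short sketch of a parity that holds on the standard register and collapses in one band. All numbers and band labels are invented placeholders, and `band_collapse` is a hypothetical helper, not part of the published method.

```python
def band_collapse(bands: dict[str, float], standard: str = "standard") -> tuple[str, float]:
    """Return the weakest band and how far it falls below the standard register."""
    weakest = min(bands, key=lambda name: bands[name])
    return weakest, bands[standard] - bands[weakest]

# Hypothetical scores for one dimension on an assumed 0-to-1 scale.
pashto = {"standard": 0.80, "band_a": 0.78, "band_b": 0.45}
dari = {"standard": 0.82, "band_a": 0.79}

# Compared on the standard register alone, the two languages look near parity.
print(f"standard-register gap: {dari['standard'] - pashto['standard']:.2f}")

# Measured by band, one Pashto band falls far below its own standard register.
for language, bands in (("pashto", pashto), ("dari", dari)):
    weakest, drop = band_collapse(bands)
    print(f"{language}: weakest band {weakest}, {drop:.2f} below standard")
```

With these inputs the standard-register gap is 0.02, while the weakest Pashto band sits 0.35 below the Pashto standard. Reporting only the first figure would hide the second.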
The Editions
The Index is published annually. Each edition reports the state of Pashto-Dari parity in Afghan-language AI as measured that year — by dimension and by dialect band — so that progress, or its absence, is visible across editions rather than asserted. Where an edition warrants the formal record, it is issued as a citable report. The point of an annual cadence is accountability over time: a single year's measurement is a snapshot; a series is a trajectory.
A single year is a snapshot. A series is a trajectory — and accountability lives in the trajectory.
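Reading a series rather than a snapshot amounts to comparing each edition's gap with the one before it. The sketch below assumes one gap figure per edition; the edition numbers and values are invented, since no edition has yet been published.

```python
def trajectory(gaps_by_edition: dict[int, float]) -> list[tuple[int, float, str]]:
    """Change in the parity gap from each edition to the next."""
    editions = sorted(gaps_by_edition)
    changes = []
    for prev, cur in zip(editions, editions[1:]):
        delta = gaps_by_edition[cur] - gaps_by_edition[prev]
        label = "closing" if delta < 0 else "widening" if delta > 0 else "unchanged"
        changes.append((cur, delta, label))
    return changes

# Hypothetical gap per edition number; not published results.
gaps = {1: 0.30, 2: 0.22, 3: 0.25}

for edition, delta, label in trajectory(gaps):
    print(f"edition {edition}: {delta:+.2f} ({label})")
```

A single edition yields no entries at all, which is the point of the annual cadence: direction only exists once there are at least two measurements.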
Inaugural edition
The inaugural edition of the Pashto-Dari Parity Index is in preparation. Editions publish here as they are completed and validated. Request the Index to receive the inaugural edition on release.
Reading It
For the institution choosing or relying on Afghan-language AI, the Index converts a single, misleading number into the one that matters: how far apart the two languages actually are, and where. It is built to be cited in a procurement, to set a bar a system must clear before it touches a population, and to hold a vendor to a standard their aggregate score was designed to dodge. Read it for the gap, by dimension and by band, and read it across editions for the direction of travel.
Procure against it.
A parity bar a system must clear before it serves a population.
Cite the gap.
The disparity, by dimension and band — named, not averaged away.
Track the trajectory.
Year over year, whether the gap is closing or widening.
Parity is the absence of disparity, not the presence of quality. The Index measures whether the two languages are served equally; whether they are served well is a separate question, and a system can be perfectly equal and equally inadequate. Parity is necessary. It is not sufficient.
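The distinction between parity and quality can be sketched as two separate checks that a procuring institution might apply together. The thresholds, the 0-to-1 scale, and the `clears_bar` helper are assumptions for illustration; the quality floor in particular is the institution's own addition, because the Index itself measures only the gap.

```python
def clears_bar(pashto: float, dari: float, max_gap: float, min_quality: float) -> dict[str, bool]:
    """Check parity and quality separately; procurement needs both."""
    parity = abs(pashto - dari) <= max_gap        # are the languages served equally?
    quality = min(pashto, dari) >= min_quality    # is the weaker one served well?
    return {"parity": parity, "quality": quality, "procure": parity and quality}

# Hypothetical thresholds and scores; not Index values.
print(clears_bar(0.40, 0.42, max_gap=0.05, min_quality=0.75))  # equal, equally inadequate
print(clears_bar(0.55, 0.90, max_gap=0.05, min_quality=0.75))  # one language failing
print(clears_bar(0.84, 0.86, max_gap=0.05, min_quality=0.75))  # equal and adequate
```

The first case passes the parity check and fails the quality check, which is the situation the paragraph above warns about: a system can be perfectly equal and still not fit to serve either population.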
24 Afghan languages & dialect bands
0 security incidents
100% senior-led engagements
41+ Trust Center documents
Continue
The multilingual speech AI benchmark across Afghan languages and dialects.
The methodology for detecting culturally inaccurate AI output.
The architecture: this Index lives in its Measures layer.
The full directory of named frameworks.
For the health systems, agencies, courts, and vendors choosing or building Afghan-language AI, and unwilling to let a single score decide who gets served. Briefings are conducted under NDA, in Washington, D.C. or virtually.