The Pashto-Dari Parity Index

A model can clear its Afghan-language benchmark on a single score — the average of fluent Dari and broken Pashto. The people who speak Pashto are not an average.

The Pashto-Dari Parity Index is Ariana Nexus's annual benchmark of linguistic parity in Afghan-language AI — a measurement of whether a system serves Afghanistan's two official languages equally, or only appears to because a single score averages one language's competence over the other's failure. It scores each language on its own, by dialect band, validated by speakers, and reports the gap between them — because parity is the one thing an aggregate score is built to hide.

Why Parity

Afghanistan has two official languages. AI rarely serves them as if it knows that.

Pashto and Dari are both official languages of Afghanistan, and an institution serving the country serves speakers of both. AI systems do not treat them as equals, and the reason is structural rather than accidental. Dari is a variety of Persian and inherits the comparative wealth of Persian-language data and tooling; Pashto is lower-resourced and more morphologically demanding, with grammatical gender, complex inflection, and other features that a thinly trained model handles badly. The predictable result is a system that performs respectably in one language and poorly in the other, while its single reported capability ("supports Afghan languages") averages the two into a number that conceals exactly which speakers it fails.

That concealment is not a reporting quirk; it is a decision about who gets served. A health system that procures an AI translation tool on its aggregate Afghan-language score may be serving Dari speakers adequately and failing Pashto speakers entirely, with no figure on the page that would tell it so. The gap between the two languages is the most consequential thing about the system, and it is the one thing the headline number is designed not to show.

The Pashto-Dari Parity Index exists to show it. It refuses the aggregate, scores each language on its own terms and by dialect band, has the results validated by people who speak them, and reports the disparity as the headline rather than the footnote. The gap can run in either direction; what the Index guarantees is that it is measured, named, and put in front of the institution that would otherwise procure the average.

Dari inherits Persian's data wealth. Pashto does not. The gap between them is structural — not incidental, and not visible in a single score.

2 official languages held to one standard of parity, not averaged into one score

6 dimensions of parity the Index measures

0 parity claimed on a single aggregate score; each language scored alone

100% of scores speaker-validated, not assigned by automatic metric alone

The Doctrine

An average is not a parity.

A single score across two languages can look healthy while one of them is failing. The average conceals the gap between them, and the gap is who gets served.
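A small worked example makes the doctrine concrete. The scores below are invented for illustration on an assumed 0-100 scale; they are not Index results, and the function names `aggregate` and `parity_gap` are hypothetical.

```python
# Illustrative only: invented scores on an assumed 0-100 scale, not Index results.
uneven = {"dari": 88, "pashto": 52}
balanced = {"dari": 70, "pashto": 70}


def aggregate(scores: dict[str, int]) -> float:
    """The single headline number: a plain mean across the two languages."""
    return sum(scores.values()) / len(scores)


def parity_gap(scores: dict[str, int]) -> int:
    """Signed gap, Dari minus Pashto; positive means Dari is served better."""
    return scores["dari"] - scores["pashto"]


print(aggregate(uneven), parity_gap(uneven))      # 70.0 36
print(aggregate(balanced), parity_gap(balanced))  # 70.0 0
```

Both hypothetical systems report the same aggregate of 70.0. Only the gap distinguishes the one that serves both languages from the one that serves Dari at Pashto's expense.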

The Dimensions

Parity is not one number. It is six, measured twice.

Translation

Accuracy in both directions — into and out of each language — because a system fluent translating from a language can still fail translating into it.

Comprehension

Whether the system understands meaning and intent in each language, not merely matches surface strings.

Generation

Output that is fluent, correct, and right in register for each language — fluency in one is not fluency in both.

Speech

Recognition and synthesis parity, measured in concert with the Sovereign Speech Index — voice systems fail across the Pashto-Dari line as readily as text systems do.

Dialect coverage

Parity across the dialect bands within each language, not only the standard register — an average over dialects hides its own gap.

Cultural accuracy

Whether output is culturally faithful and not merely linguistically fluent, drawing on the Cultural Hallucination Audit — a parity of words is not a parity of meaning.

Exact metrics, weights, and scales for each dimension are set by the firm's defined methodology.

Anatomy of the measurement

Language A — Pashto

score by dimension and dialect band

Language B — Dari

score by dimension and dialect band

The parity gap

the disparity between them — the Index's headline

Coverage

measured by dialect band, not the standard register alone

Validation

speaker-validated and gated; not automatic-metric-only
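The anatomy above could be represented roughly as follows. This is a minimal sketch, not the firm's schema: the `Score` record, its field names, the band labels, the 0-100 scale, and the choice to compare the two languages at a band labelled "standard" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    language: str            # "pashto" or "dari"
    dimension: str           # e.g. "translation"
    dialect_band: str        # hypothetical label, e.g. "standard"
    value: int               # assumed 0-100 scale
    speaker_validated: bool  # True once qualified speakers have signed off


def build_table(rows: list[Score]) -> dict:
    """Nest validated scores as language -> dimension -> band -> value."""
    table: dict = {}
    for r in rows:
        if not r.speaker_validated:
            continue  # automatic-metric-only scores never reach the report
        table.setdefault(r.language, {}).setdefault(r.dimension, {})[r.dialect_band] = r.value
    return table


def standard_gap(table: dict, dimension: str) -> int:
    """Signed gap (Dari minus Pashto) for one dimension at the standard register."""
    return table["dari"][dimension]["standard"] - table["pashto"][dimension]["standard"]


# Invented rows for illustration only.
rows = [
    Score("dari", "translation", "standard", 85, True),
    Score("pashto", "translation", "standard", 60, True),
    Score("pashto", "translation", "band_x", 90, False),  # unvalidated, dropped
]
table = build_table(rows)
print(standard_gap(table, "translation"))          # 25
print("band_x" in table["pashto"]["translation"])  # False
```

Keeping each language in its own branch of the table means no code path can produce a cross-language average by accident; the gap is the only place the two are combined.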

The Method

Each language on its own, then the gap between them.

01

Scored per language, then compared.

Pashto and Dari are each evaluated on their own terms; the headline is the disparity between them, never an average across them.

02

Speaker-validated, not self-scored.

Results are validated by qualified speakers, because automatic metrics underperform for low-resource languages and cannot be trusted to grade the very gap the Index exists to find.

03

By dialect band.

Each language is measured across its dialect bands, so a parity that holds for the standard register but collapses in a dialect is reported, not hidden.

04

Held to the firm's gates.

The Index methodology runs through the firm's validation — the Five-Gate Protocol and the Cultural Compliance Bureau — so the measurement is itself held to a standard.

05

Published and reproducible.

The method is documented so the score can be examined and reproduced — a benchmark no one can check is a claim, not a measurement.
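One way to picture steps 01 and 03 together is a per-band report for a single dimension. The sketch below uses invented scores, placeholder band labels, and an assumed 0-100 scale; comparing each language's weakest band is one possible convention chosen for illustration, not the Index's documented rule.

```python
# Invented scores for one dimension; band labels are placeholders.
generation = {
    "dari":   {"standard": 84, "band_a": 80},
    "pashto": {"standard": 82, "band_a": 75, "band_b": 41},
}


def weakest_band(bands: dict[str, int]) -> tuple[str, int]:
    """Return the (band, score) pair with the lowest score."""
    band = min(bands, key=bands.get)
    return band, bands[band]


def report(per_language: dict[str, dict[str, int]]) -> dict:
    """Score each language on its own, then compare the two."""
    out: dict = {}
    for lang, bands in per_language.items():
        band, low = weakest_band(bands)
        out[lang] = {
            "standard": bands["standard"],
            "weakest_band": band,
            "weakest_score": low,
            "within_language_spread": max(bands.values()) - low,
        }
    # Signed gaps, Dari minus Pashto.
    out["standard_gap"] = out["dari"]["standard"] - out["pashto"]["standard"]
    out["weakest_band_gap"] = out["dari"]["weakest_score"] - out["pashto"]["weakest_score"]
    return out


result = report(generation)
print(result["standard_gap"])      # 2
print(result["weakest_band_gap"])  # 39
print(result["pashto"]["weakest_band"], result["pashto"]["within_language_spread"])  # band_b 41
```

In this made-up case the standard registers sit two points apart, which looks like parity, while the weakest Pashto band trails the weakest Dari band by 39. Reporting by band is what surfaces the second figure.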

The Editions

Measured every year, so the gap can be watched closing — or not.

The Index is published annually. Each edition reports the state of Pashto-Dari parity in Afghan-language AI as measured that year — by dimension and by dialect band — so that progress, or its absence, is visible across editions rather than asserted. Where an edition warrants the formal record, it is issued as a citable report. The point of an annual cadence is accountability over time: a single year's measurement is a snapshot; a series is a trajectory.
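Reading a series rather than a snapshot can be sketched as a comparison of consecutive editions. The edition labels and gap values below are placeholders invented for illustration; no edition has been published, and the `trajectory` helper is hypothetical.

```python
# Placeholder labels and invented gaps (Dari minus Pashto, assumed 0-100 scale).
gaps_by_edition = {"edition_1": 36, "edition_2": 30, "edition_3": 33}


def trajectory(gaps: dict[str, int]) -> list[tuple[str, str, int, str]]:
    """Label each edition-to-edition change in the absolute gap."""
    editions = list(gaps)  # dicts preserve insertion order
    changes = []
    for prev, cur in zip(editions, editions[1:]):
        delta = abs(gaps[cur]) - abs(gaps[prev])
        if delta < 0:
            label = "closing"
        elif delta > 0:
            label = "widening"
        else:
            label = "unchanged"
        changes.append((prev, cur, delta, label))
    return changes


for change in trajectory(gaps_by_edition):
    print(change)
# ('edition_1', 'edition_2', -6, 'closing')
# ('edition_2', 'edition_3', 3, 'widening')
```

The absolute value is used because the gap can run in either direction; what matters across editions is whether its size is shrinking.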

A single year is a snapshot. A series is a trajectory — and accountability lives in the trajectory.

Inaugural edition

The inaugural edition of the Pashto-Dari Parity Index is in preparation. Editions publish here as they are completed and validated. Request the Index to receive the inaugural edition on release.

Request the Index

Reading It

What the gap tells you — and what it does not.

For the institution choosing or relying on Afghan-language AI, the Index converts a single, misleading number into the one that matters: how far apart the two languages actually are, and where. It is built to be cited in a procurement, to set a bar a system must clear before it touches a population, and to hold a vendor to a standard their aggregate score was designed to dodge. Read it for the gap, by dimension and by band, and read it across editions for the direction of travel.

Procure against it.

A parity bar a system must clear before it serves a population.

Cite the gap.

The disparity, by dimension and band — named, not averaged away.

Track the trajectory.

Year over year, whether the gap is closing or widening.

Parity is the absence of disparity, not the presence of quality. The Index measures whether the two languages are served equally; whether they are served well is a separate question, and a system can be perfectly equal and equally inadequate. Parity is necessary. It is not sufficient.
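The distinction between parity and quality can be sketched as two independent checks. The thresholds `max_gap` and `min_score`, the 0-100 scale, and the `assess` function are assumptions a procuring institution would set for itself; none of them comes from the Index.

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    meets_parity_bar: bool
    meets_quality_floor: bool


def assess(pashto: int, dari: int, max_gap: int, min_score: int) -> Verdict:
    """Check parity and quality separately; both thresholds are the buyer's choice."""
    return Verdict(
        meets_parity_bar=abs(dari - pashto) <= max_gap,
        meets_quality_floor=min(pashto, dari) >= min_score,
    )


# Invented scores: nearly equal, and equally inadequate.
print(assess(pashto=40, dari=42, max_gap=5, min_score=70))
# Verdict(meets_parity_bar=True, meets_quality_floor=False)

# Invented scores: adequate in Dari only.
print(assess(pashto=52, dari=88, max_gap=5, min_score=70))
# Verdict(meets_parity_bar=False, meets_quality_floor=False)
```

Keeping the two results in separate fields avoids collapsing them back into a single pass or fail, which would repeat the aggregate's mistake in a different form.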

Continue

Explore Frameworks & Benchmarks.

Index: The Sovereign Speech Index
The multilingual speech AI benchmark across Afghan languages and dialects.

Methodology: The Cultural Hallucination Audit
The methodology for detecting culturally inaccurate AI output.

Architecture: The Lapis Stack
The architecture: this Index lives in its Measures layer.

Directory: All Frameworks & Benchmarks
The full directory of named frameworks.

Procure on the gap, not the average.

For the health systems, agencies, courts, and vendors choosing or building Afghan-language AI — and unwilling to let a single score decide who gets served. Briefings are conducted under NDA, in Washington, D.C. or virtually.
