The Sovereign Speech Index

The voice system understood standard Dari on the first try. The woman speaking a regional Pashto dialect repeated herself until she gave up.

Why Speech

Speech AI fails the way it was trained to fail — quietly, and for the voices that had the least data.

A voice system is only as wide as the speech it learned from, and the speech it learned from is narrow: the standard register of the highest-resource languages, recorded in clean conditions, spoken by the demographics best represented in the data. Bring it a regional dialect, an older speaker, a noisy clinic line, a language it was never meaningfully trained on, and it does not announce that it is lost. It returns confident, wrong text, or it returns nothing, and the person on the other end is left to conclude that the failure is theirs. Speech fails more quietly than text, and for the people furthest from the training data, the quiet failure is the normal case.

That silence is why this work carries the word sovereign. Every Afghan language and every dialect is a voice with standing: a right to be recognized and spoken by the systems that increasingly mediate access to a clinic, a court, or an agency. A speech system that hears the prestige register of the dominant language and nothing else has not served the country; it has selected a fraction of it and left the rest unheard. The stakes are immediate. A voice interface that cannot understand a Pashai speaker or a regional Pashto dialect does not merely inconvenience that person; it excludes them at the moment they reach out.

The Sovereign Speech Index measures the exclusion. It benchmarks speech recognition and synthesis across the full range of Afghan languages and their dialects — not the standard register alone, not the studio recording, not the best-represented voice — and it reports, by language and by dialect, who the system can hear and who it cannot. The gap is measured on consented voices, validated by the people who speak them, so that the silence has a number, and the number has a name.

24

Afghan languages, with their dialects, across which the Index measures speech

2

modalities measured: speech recognition and speech synthesis

0

voices in the benchmark used without consent — provenance tracked

100%

of results speaker-validated, across languages and dialects

The Doctrine

Spoken is not heard.

Every language and dialect is spoken. Only some are heard by the machine — and the unheard ones fail in silence. The Index measures the distance between the two.

The Dimensions

Six ways a voice is heard, or is not.

Recognition (ASR).

Whether the system correctly transcribes speech in each language — not whether it returns text, but whether the text is what was said.

Synthesis (TTS).

Whether the system produces natural, intelligible, correct speech in each language — a synthetic voice that mangles a language is its own kind of silence.

Dialect and register coverage.

Performance across dialects and registers, including the forms most people actually speak, not the standard or prestige form alone.

Speaker variation.

Performance across the voices that training data underrepresents, by age, gender, and regional accent, because a system tuned to one kind of speaker fails the others.

Real-world conditions.

Performance under the audio of actual use — telephony, background noise, field conditions — not the studio recording the system was demonstrated on.

Code-switching and mixed speech.

Handling of the natural mixing of languages and registers that real speakers use, rather than assuming one clean language per utterance.

Exact metrics, weights, and scales for each dimension are set out in the firm's defined methodology.
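To make the Recognition dimension concrete, here is a minimal sketch of word error rate (WER), a widely used ASR metric. It is an assumption for illustration only: the Index's actual metrics are the firm's defined methodology, and the function name `word_error_rate` is hypothetical. The sketch shows why "returns text" and "returns what was said" are different tests: a confident but wrong transcript still scores a high error rate.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: WER is a common ASR metric, assumed here; it is not
    confirmed as the metric the Index uses.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A system that returns nothing for a two-word utterance scores 1.0 under this sketch, the same as one that returns two confident but wrong words. Naturalness and intelligibility of synthesis need human judgment and are not captured by a metric of this kind.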

Anatomy of the measurement

Modalities

recognition (ASR) · synthesis (TTS)

Languages

scored across the 24, individually

Dialects & registers

by band, not the standard form alone

Conditions

real-world audio, not studio

Coverage map

where the system hears, and where it does not — the Index's headline

Voices

consented and provenance-tracked; results speaker-validated

The Method

Measured by language and dialect, on voices that consented to be heard.

01

Scored per language and dialect, then mapped.

Each language and dialect is evaluated on its own; the headline is the coverage map — where the system hears and where it fails — not an average across them.
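The difference between a coverage map and an average can be sketched in a few lines. Everything below is hypothetical: the variety labels, the error rates, the threshold `BAR`, and the helper `coverage_map` are made up for illustration and are not Index results or the firm's scoring rule.

```python
from statistics import mean


def coverage_map(error_rates: dict[str, float], bar: float) -> dict[str, bool]:
    """Mark each variety as heard (True) when its error rate is at or below the bar."""
    return {variety: rate <= bar for variety, rate in error_rates.items()}


# Hypothetical error rates per variety; placeholders, not measured data.
error_rates = {
    "standard_register": 0.08,
    "regional_dialect_a": 0.10,
    "regional_dialect_b": 0.54,
}
BAR = 0.25  # hypothetical maximum acceptable error rate

average = mean(list(error_rates.values()))  # about 0.24, which clears the bar
heard = coverage_map(error_rates, BAR)
unheard = sorted(variety for variety, ok in heard.items() if not ok)

print(f"average error rate clears the bar: {average <= BAR}")
print(f"varieties the system does not hear: {unheard}")
```

In this made-up case the average passes while one dialect fails outright, which is the reason to report the map as the headline rather than a single pooled number.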

02

Speaker-validated.

Results are validated by native speakers of each language and dialect, because the people who speak a form are the only ones who can confirm a machine heard it.

03

On consented, provenance-tracked voices.

The evaluation voices are gathered with consent and tracked to source — no scraped audio — because a benchmark for sovereign speech cannot itself be built on voices taken without permission.

04

Under real conditions.

Tested under the audio of actual use, so a system that works in the studio and fails on a clinic phone line is reported as failing where it matters.

05

Held to the firm's gates, and reproducible.

The methodology passes through the firm's validation gates and is documented so the measurement can be examined and reproduced.

The unheard do not raise their hands. The Index counts them anyway.

The Editions

Measured every year, so the map of who is heard can be redrawn.

The Index is published annually. Each edition reports the state of speech AI across Afghan languages and dialects as measured that year — by modality, by language, by dialect, under real conditions — so that coverage gained, or lost, is visible across editions. Where an edition warrants the formal record, it is issued as a citable report. The annual cadence makes the silence accountable: a single measurement shows who is unheard now; a series shows whether anyone is doing anything about it.
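As a rough sketch of how coverage gained or lost could be read across two editions, the snippet below compares two coverage maps. The function `coverage_changes`, the variety labels, and the values are hypothetical placeholders; no edition has been published, and this is not the Index's reporting format.

```python
def coverage_changes(
    previous: dict[str, bool], current: dict[str, bool]
) -> dict[str, list[str]]:
    """Compare two coverage maps (variety -> heard?) over the varieties both contain."""
    shared = previous.keys() & current.keys()
    return {
        "gained": sorted(v for v in shared if current[v] and not previous[v]),
        "lost": sorted(v for v in shared if previous[v] and not current[v]),
        "still_unheard": sorted(
            v for v in shared if not previous[v] and not current[v]
        ),
    }


# Hypothetical coverage maps for two consecutive editions; not real results.
edition_one = {"variety_a": True, "variety_b": False, "variety_c": False}
edition_two = {"variety_a": True, "variety_b": True, "variety_c": False}

changes = coverage_changes(edition_one, edition_two)
print(changes)
```

Keeping "still_unheard" as its own list reflects the point of a series: a variety that stays unheard from one edition to the next is a finding in itself.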

Inaugural edition

The inaugural edition of the Sovereign Speech Index is in preparation. Editions publish here as they are completed and validated. Request the Index to receive the inaugural edition on release.

Request the Index

Reading It

What the coverage map tells you — and what it does not.

For the institution choosing or building speech systems, the Index turns a vendor's demonstration into a map: which languages and dialects a system can actually hear and speak, under the conditions you will actually use it in, and where it goes silent. Read it before a voice system touches a population, to set a coverage bar a system must clear; read it by dialect and by condition, not by headline; and read it across editions to see whether the unheard are being heard.

Set a coverage bar.

A standard a speech system must clear — by language, dialect, and condition — before it serves a population.

Read the map, not the demo.

Where a system hears and where it fails, under real conditions, not a studio showcase.

Track who gets heard.

Year over year, whether coverage of the unheard is widening.

Being heard is the threshold, not the destination. The Index measures whether a machine can recognize and speak a language; it does not measure whether the institution behind it then serves the person, and it is not a license to replace a qualified human interpreter where the stakes require one. A high score means a system can hear a voice — not that it should be the only thing listening.

Continue

Explore Frameworks & Benchmarks.

Index

The Pashto-Dari Parity Index

The parity benchmark for the two official languages; its speech dimension draws on this Index.

Reference set

ASR & TTS Reference Sets

The consented speech data behind responsible voice work.

Architecture

The Lapis Stack

The architecture; this Index lives in its Measures layer.

Directory

All Frameworks & Benchmarks

The full directory of named frameworks and benchmarks.

Find out whose voice your system cannot hear.

For the health systems, agencies, courts, and builders deploying speech AI to reach Afghan populations — and unwilling to discover the silence at the moment someone needs to be heard. Briefings are conducted under NDA, in Washington, D.C. or virtually.
