

The ADF Pipeline

The rest of this catalog measures the data. This one makes it.

The ADF Pipeline is the AI Data Factory’s standard methodology for producing multilingual annotation and reference sets — the labeled, validated data that AI systems learn from and that the benchmarks in this catalog measure against. A benchmark is only as honest as the reference set behind it, and the Pipeline is how that reference set is built. This page places it among the frameworks; the full account of how the Pipeline runs, station by station, lives on its primary page.

An AI Data Factory methodology · Updated June 2026


Made · Validated · Reference-ready

In the catalog

Everything here measures the data. This is how the data is made.

Most of the Frameworks & Benchmarks catalog acts on data that already exists. The indices and scores measure it: the Pashto-Dari Parity Index, the Sovereign Speech Index, and the rest report where a system stands against a reference. The audits and standards validate it. But all of that depends on something prior: a reference set worth measuring against, and training data worth learning from. The ADF Pipeline is what produces them. It is not a measure or a check; it is the production methodology that makes the multilingual annotation and reference sets the rest of the catalog takes as given.

That is the Pipeline’s place here, and why it sits among instruments that otherwise judge rather than build. A benchmark that measures against a careless reference set measures nothing; a model trained on unvalidated data learns the wrong thing confidently. The Pipeline’s reference sets are produced through validation — built and checked, not assumed — which is precisely what allows a benchmark to trust them and a system to rely on them. The ground truth the rest of the catalog stands on is made here.

The complete account — how the Pipeline runs, station by station, and how annotation and reference sets are produced and validated — is on its primary page. → The ADF Pipeline

The shape of it

What everything else stands on.

Make · The Pipeline
Measure · Validate

The Pipeline makes the reference sets the rest of the catalog measures and validates against — the ground truth everything else stands on.


The catalog

Where the Pipeline sits among the instruments.

The catalog runs in three movements — the data is made, then measured, then validated — and the Pipeline is where it begins.

Make

The production methodology behind the reference and training data.

The ADF Pipeline

The distinction

What the Pipeline is — and is not.

Not a measurement

It produces no score about where a system stands. That is the work of the indices.

Not a check

It does not validate a finished deliverable. That is the work of the audits and standards.

A production methodology

It makes the multilingual annotation and reference sets the rest of the catalog depends on.

Validated, not assumed

The Pipeline’s reference sets are built and human-checked rather than assumed, which is what lets a benchmark trust them and a system rely on them. The full account is on its primary page.

The data is made here. The full method is one click away.

The ADF Pipeline belongs to the AI Data Factory, and its complete page covers everything this one does not — the stations, the production of annotation and reference sets, and the validation that makes them trustworthy. For work that depends on data built right, the conversation starts here.