Retrieval Bench
Multilingual retrieval tested on real editorial discovery tasks
We care less about generic leaderboard bragging rights and more about which retrieval stack actually finds the right story in a bilingual publication with tags, archives, and shifting terminology.
- Test cross-language related-story discovery
- Measure on-site retrieval quality, not isolated benchmark scores
- Turn results into concrete search and recommendation decisions
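
One way to make "on-site retrieval quality" concrete is to score each stack against editor-judged related stories. The sketch below is only an illustration: the metrics (recall@k and mean reciprocal rank), the function names, and the story IDs such as `en-101` and `fr-7` are assumptions, not something this bench prescribes.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant stories that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant story, or 0.0 if none was retrieved."""
    for rank, story_id in enumerate(ranked, start=1):
        if story_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    runs: Dict[str, List[str]],
    judgments: Dict[str, Set[str]],
    k: int = 5,
) -> Dict[str, float]:
    """Average both metrics over every query story that has judgments."""
    queries = list(judgments)
    if not queries:
        return {"recall@k": 0.0, "mrr": 0.0}
    n = len(queries)
    recall = sum(recall_at_k(runs.get(q, []), judgments[q], k) for q in queries) / n
    mrr = sum(reciprocal_rank(runs.get(q, []), judgments[q]) for q in queries) / n
    return {"recall@k": recall, "mrr": mrr}


# Hypothetical data: each query story maps to related stories in the other language.
judgments = {"en-101": {"fr-7", "fr-9"}, "fr-7": {"en-101"}}
# Hypothetical ranked output from one retrieval stack.
runs = {"en-101": ["fr-7", "fr-3", "fr-9"], "fr-7": ["en-55", "en-101"]}

scores = evaluate(runs, judgments, k=2)
print(scores)
```

Running the same `evaluate` call over each candidate stack's `runs` gives numbers that can be compared directly when making search and recommendation decisions.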