DroidNexus Labs

The original research and evaluation layer behind DroidNexus.

This is where coverage turns into reusable signal: benchmark notes, decision pages, and operational evaluations for retrieval, translation, speech workflows, and MCP safety, rather than recycled commentary.

Original evaluation framing • English + Arabic workflows • Hugging Face-aware research

Lab tracks

Four authority lanes we can scale without diluting the site

Each track is designed to ship something citeable: a benchmark, a failure analysis, or a decision page that helps readers and teams act with more confidence.

Retrieval Bench

Multilingual retrieval tested on real editorial discovery tasks

We care less about generic leaderboard bragging rights and more about which stack actually finds the right story in a bilingual publication with tags, archives, and shifting terminology.

  • Test cross-language related-story discovery
  • Measure on-site retrieval quality, not isolated benchmark vanity
  • Turn results into concrete search and recommendation decisions
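
A minimal sketch of how this lane's core check could run, assuming a toy bilingual corpus and the BAAI/bge-m3 embedder tracked in the watchlist below; the records, query, and recall@3 cutoff are illustrative, not the published harness.

```python
# Illustrative cross-language related-story check: does an Arabic query
# surface the matching English article?
from sentence_transformers import SentenceTransformer, util

# Assumed toy corpus; real runs would load the site's editorial records.
articles = [
    {"id": "en-042", "lang": "en", "text": "A new realtime speech API ships for newsroom transcription"},
    {"id": "en-077", "lang": "en", "text": "A practical guide to MCP tool permissions"},
    {"id": "ar-910", "lang": "ar", "text": "دليل عملي لأذونات أدوات بروتوكول MCP"},
]
# Each query carries the set of article ids an editor marked as related.
queries = [("تحديثات التفريغ الصوتي الفوري للصحافة", {"en-042"})]

model = SentenceTransformer("BAAI/bge-m3")
doc_emb = model.encode([a["text"] for a in articles], normalize_embeddings=True)

hits = 0
for query_text, relevant_ids in queries:
    q_emb = model.encode(query_text, normalize_embeddings=True)
    scores = util.cos_sim(q_emb, doc_emb)[0]
    top_k = scores.argsort(descending=True)[:3]          # recall@3 cutoff
    retrieved = {articles[i]["id"] for i in top_k.tolist()}
    hits += bool(retrieved & relevant_ids)

print(f"recall@3 = {hits / len(queries):.2f}")
```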

Translation Bench

Translation measured by editorial burden, not generation speed

We compare drafts by terminology discipline, voice drift, and human post-edit cost, because bilingual publishing usually breaks at that layer first.

  • Track technical terminology drift
  • Measure post-edit burden after model output
  • Separate acceptable drafts from publishable copy
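
A minimal, stdlib-only sketch of the post-edit and terminology checks described above, assuming a model draft and its human-edited final are available as plain strings; the glossary and the character-level burden metric are illustrative placeholders, not the Labs scoring rubric.

```python
# Illustrative post-edit burden and terminology-drift check on one segment.
from difflib import SequenceMatcher

# Assumed inputs: the model's draft and the copy that actually shipped.
draft = "The model supports retrieval augmented generation out of the box."
final = "The model supports retrieval-augmented generation (RAG) out of the box."

# Post-edit burden: how far the editor had to move the draft to publish it.
post_edit_burden = 1.0 - SequenceMatcher(None, draft, final).ratio()

# Terminology discipline: required forms that must already appear in the draft.
glossary = ["retrieval-augmented generation", "RAG"]
violations = [term for term in glossary if term not in draft]

print(f"post-edit burden ≈ {post_edit_burden:.2%}")
print(f"glossary violations: {violations}")   # separates drafts from publishable copy
```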

Speech Bench

Transcription that is actually usable for editing, quotes, and search

WER alone is not the question. What matters is whether a transcript becomes useful source material inside interviews, note capture, and fast-moving reporting workflows.

  • Test quoteability and searchability
  • Measure source usefulness, not abstract accuracy alone
  • Pull audio back into coverage instead of leaving it stranded
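
A minimal, stdlib-only sketch of the quoteability check, assuming an editor's list of needed quotes and a model transcript; the normalization and the quote-recall framing are illustrative assumptions rather than the published harness.

```python
# Illustrative quote-recovery check: do the quotes an editor needs survive
# transcription well enough to be lifted verbatim?
import re

def normalize(text: str) -> str:
    # Lowercase, drop punctuation, collapse whitespace.
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", text.lower())).strip()

transcript = "… and she said we will not ship the model until the evaluation passes, full stop."
needed_quotes = [
    "We will not ship the model until the evaluation passes.",
    "The benchmark goes public next week.",
]

norm_transcript = normalize(transcript)
recovered = [q for q in needed_quotes if normalize(q) in norm_transcript]
quote_recall = len(recovered) / len(needed_quotes)

print(f"quote recall = {quote_recall:.2f}")   # usable-for-quotes signal, not WER alone
```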

MCP Safety Desk

MCP security from prompt layer to human approval boundaries

This lane combines research synthesis with operating judgment: tool permissions, tool descriptions, delegation limits, and the ways protection fails in practice.

  • Keep research and operational guidance clearly separated
  • Aggregate failure modes instead of generic warnings
  • Translate papers into product-facing implementation choices
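
A minimal sketch of the human-approval boundary idea, assuming a simple policy table keyed by tool name; the tool names, risk tiers, and gate function are illustrative and not tied to any particular MCP server or client implementation.

```python
# Illustrative approval gate: write-capable MCP tools require an explicit
# human decision, unknown tools are denied, read-only tools pass through.
TOOL_POLICY = {
    "search_docs":   {"risk": "read",  "needs_human": False},
    "send_email":    {"risk": "write", "needs_human": True},
    "delete_record": {"risk": "write", "needs_human": True},
}

def approve_call(tool_name: str, args: dict, human_ok: bool = False) -> bool:
    policy = TOOL_POLICY.get(tool_name)
    if policy is None:
        return False                      # deny by default: unlisted tool
    if policy["needs_human"] and not human_ok:
        return False                      # block until a person signs off
    return True

assert approve_call("search_docs", {"query": "MCP prompt injection"})
assert not approve_call("delete_record", {"id": 42})                 # no approval yet
assert approve_call("delete_record", {"id": 42}, human_ok=True)
```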

Public standard

What makes a Labs piece publishable

  • State the task, corpus, and evaluation frame before the conclusion.
  • Separate ecosystem signal from independent testing.
  • Lower certainty when reproducibility is weak instead of overselling the verdict.
  • Keep Arabic and English aligned in judgment, not mirrored sentence by sentence.
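
One way this standard could be made checkable at build time, sketched as a hypothetical metadata block each piece would declare; the field names and defaults are assumptions, not an existing DroidNexus schema.

```python
# Illustrative front-matter a Labs piece would state before its conclusion.
from dataclasses import dataclass

@dataclass
class LabsPieceMeta:
    task: str                      # what was evaluated
    corpus: str                    # what it was evaluated on
    eval_frame: str                # how success was measured
    independent_testing: bool      # our own runs vs. ecosystem signal only
    reproducible: bool             # can a reader re-run it from the artifact?
    confidence: str = "medium"     # lowered when reproducibility is weak

piece = LabsPieceMeta(
    task="cross-language related-story retrieval",
    corpus="DroidNexus bilingual editorial archive (subset)",
    eval_frame="recall@3 on editor-labelled related pairs",
    independent_testing=True,
    reproducible=True,
)
```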

Current output

Published work that already reflects the Labs mindset

These are not filler articles. They are the first layer of a citeable library around evaluation, comparison, and bilingual workflow design.

Signal Watchlist

What we are actively tracking on Hugging Face

Live Hub metadata, pulled at build time and tied directly to the editorial lanes we are measuring and publishing inside DroidNexus Labs.
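
A minimal sketch of that build-time pull, using the public model_info endpoint in huggingface_hub; the repo list mirrors the cards below, while the output shape is an assumption about how the static cards get rendered.

```python
# Illustrative build-time fetch of the watchlist metadata shown below.
from huggingface_hub import HfApi

WATCHLIST = [
    "BAAI/bge-m3",
    "ibm-granite/granite-embedding-107m-multilingual",
    "tencent/HY-MT1.5-1.8B",
    "mistralai/Voxtral-Mini-4B-Realtime-2602",
]

api = HfApi()
cards = []
for repo_id in WATCHLIST:
    info = api.model_info(repo_id)            # downloads, likes, last update, tags
    cards.append({
        "repo": repo_id,
        "downloads": info.downloads,
        "likes": info.likes,
        "updated": info.last_modified,        # attribute name in recent huggingface_hub
    })

print(cards)   # rendered into the static cards at build time
```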

Retrieval

A strong retrieval candidate for bilingual benchmark work

We track it because it remains a practical multilingual retrieval reference, not just a leaderboard headline.

BAAI/bge-m3

Sentence Similarity • Sentence Transformers • MIT

Downloads: 15M • Likes: 2.9K • Updated: Jul 3, 2024

Small Models

A real small-embedding contender worth operational testing

This lane matters because serving efficiency matters as much as retrieval quality in fast editorial products.

ibm-granite/granite-embedding-107m-multilingual

Sentence Similarity • Transformers • Apache-2.0

Downloads: 34.2K • Likes: 48 • Updated: Aug 19, 2025

Translation

A key signal in the draft-translation lane

We care about its editorial repair cost and terminology behavior, not only raw translation output.

tencent/HY-MT1.5-1.8B

Translation • Transformers

Downloads: 22.1K • Likes: 591 • Updated: Jan 1, 2026

Realtime Speech

A live speech signal worth testing beyond latency demos

We track it because fast transcription only matters when the output still holds up inside reporting and source-capture workflows.

mistralai/Voxtral-Mini-4B-Realtime-2602

Automatic Speech Recognition • vLLM • Apache-2.0

Downloads: 777.4K • Likes: 750 • Updated: Mar 11, 2026

Published Artifact

The first public DroidNexus benchmark asset is live now.

Arabic-English Editorial Retrieval Mini Benchmark

The retrieval lane has moved from analysis into a public Hugging Face artifact with downloadable files and a Labs scorecard that ties the benchmark directly back to the site’s editorial corpus.

This artifact is designed for teams evaluating editorial search behavior on a small bilingual corpus before they scale into larger retrieval pipelines or Hugging Face dataset releases.

Public artifact seed • Built for bilingual search • Published on Hugging Face
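
A minimal sketch of pulling the artifact before wiring it into a larger pipeline; the repo id below is a placeholder rather than the published dataset name, and snapshot_download is just one way to fetch Hub files.

```python
# Illustrative fetch of the benchmark files from the Hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="your-org/arabic-english-editorial-retrieval-mini",   # placeholder id
    repo_type="dataset",
)
print(local_dir)   # local path to the downloaded benchmark files and scorecard
```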
