Harness Research¶
The OpenCastor fleet runs distributed harness evaluation — searching 263,424 possible configurations to find the optimal AI agent harness for robotics tasks.
The problem¶
AI agent harness configuration matters. Changing max_iterations from 8 to 6, or thinking_budget from 2048 to 1024, can mean the difference between a task completing cleanly and an agent drifting into repeated retries. But the right values depend on hardware, task type, and model.
No one runs enough robots to explore this space alone. A fleet can.
The search space¶
8 axes, 263,424 total configurations:
| Axis | Values | Options |
|---|---|---|
max_iterations |
3–10 | 8 |
thinking_budget |
256–4096 | 7 |
context_budget |
2048–32768 | 7 |
cost_gate_usd |
0.001–0.10 | 7 |
retry_on_error |
true/false | 2 |
drift_detection |
true/false | 2 |
p66_consent_threshold |
physical/digital/none | 3 |
pattern |
4 patterns | 4 |
memory_backend |
working/filesystem/firestore | 2 (for research) |
Explored so far: 435 configs (0.17%)
Current champion: lower_cost — OHB-1 score 0.6541
How it works¶
graph LR
A[Robot idles] --> B[Fetch candidate config]
B --> C[Run OHB-1 benchmark\n30 tasks · gemma3:1b]
C --> D[Submit score + RRN]
D --> E[Nightly aggregation]
E --> F[New champion found]
F --> G[Stored as pending update]
G --> H[You apply when ready]
- Your robot fetches a candidate harness config from the queue
- Runs the OHB-1 benchmark — 30 real robotics tasks using
gemma3:1bvia local Ollama - Score + your RRN submitted to Firestore (
harness_eval_results) - Nightly pipeline finds the highest-scoring config across all contributors
- Champion stored as
harness_pendingin Firestore — never auto-applied - You apply it via app or CLI when ready
Tracks¶
Three research tracks run in parallel:
| Track | Focus | What it varies |
|---|---|---|
| Track A — HarnessParam | Core tunables | cost/tokens/timeout combos |
| Track B — Architecture | Execution pattern | pattern names (single, init/exec, multi) |
| Track C — Skill | Skill sets | skill_set combinations |
Contributor lineage¶
Every evaluation is attributed to the contributing robot's RRN. Firestore stores:
{
"candidate_id": "lower_cost",
"evaluated_by": "RRN-000000000001",
"score": 0.6541,
"evaluated_at": "2026-03-21T...",
"is_champion": true
}
See Contributing Compute and Castor Credits.