Evaluator

Evaluator is the release confidence layer.

It tracks important user journeys and shows whether the project still satisfies the flows that matter before work is considered ready to ship.

Open the Evaluator prototype

Why It Matters

Task completion is not the same as product readiness.

An agent can finish a local task while a critical journey remains broken. Evaluator exists to keep delivery focused on outcomes:

  • required journeys are visible;
  • failed journeys block confidence;
  • disabled journeys are explicit rather than forgotten;
  • evidence and review gates are attached to the journey;
  • humans can see where release risk still lives.

This turns "tests passed" into a stronger question: did the user journey pass?
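
The rules above can be sketched as a small readiness check. This is a minimal illustration, not Evaluator's actual API: `Status`, `JourneyResult`, and `release_confidence` are hypothetical names. It assumes that any failed journey blocks readiness and that disabled journeys are reported instead of being silently skipped.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical status values; the real set may differ.
    PASSED = "passed"
    FAILED = "failed"
    DISABLED = "disabled"


@dataclass(frozen=True)
class JourneyResult:
    name: str
    status: Status


def release_confidence(results):
    """Summarize journey results into a ship / no-ship signal."""
    # Assumption: every failed journey blocks confidence.
    blocking = [r.name for r in results if r.status is Status.FAILED]
    # Disabled journeys are listed explicitly so they are not forgotten.
    disabled = [r.name for r in results if r.status is Status.DISABLED]
    return {"ready": not blocking, "blocking": blocking, "disabled": disabled}


results = [
    JourneyResult("onboarding", Status.PASSED),
    JourneyResult("invitation flow", Status.FAILED),
    JourneyResult("report download", Status.DISABLED),
]
summary = release_confidence(results)
print(summary)
# {'ready': False, 'blocking': ['invitation flow'], 'disabled': ['report download']}
```

Returning the blocking and disabled journeys by name, instead of a single boolean, keeps the remaining release risk visible to a human reviewer.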

What Evaluator Tracks

Evaluator is designed around journeys, not files.

Examples:

  • onboarding works end to end;
  • invitation flow can add a teammate;
  • report download remains available;
  • agent resume does not lose project context;
  • dashboard actions still map to the right workspace.

Each journey can have checkpoints, latest run status, evidence, and human review state.
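
One way to picture such a journey record is the sketch below. The field names, the string values for status and review state, and the `first_failed_checkpoint` helper are all assumptions made for illustration; they do not describe Evaluator's real schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Checkpoint:
    name: str
    passed: bool


@dataclass
class Journey:
    name: str
    checkpoints: List[Checkpoint] = field(default_factory=list)
    # Hypothetical values: "passed", "failed", "not_run".
    latest_run_status: str = "not_run"
    # Evidence is modeled here as free-form references (links, file paths).
    evidence: List[str] = field(default_factory=list)
    # Hypothetical values: "pending", "approved", "rejected".
    review_state: str = "pending"

    def first_failed_checkpoint(self) -> Optional[str]:
        """Return the name of the first checkpoint that did not pass, if any."""
        for checkpoint in self.checkpoints:
            if not checkpoint.passed:
                return checkpoint.name
        return None


journey = Journey(
    name="invitation flow can add a teammate",
    checkpoints=[
        Checkpoint("invite form submitted", passed=True),
        Checkpoint("invite email sent", passed=False),
        Checkpoint("teammate joined workspace", passed=False),
    ],
    latest_run_status="failed",
    evidence=["runs/invitation/latest.log"],  # illustrative path only
)
print(journey.first_failed_checkpoint())
# invite email sent
```

Keeping evidence and review state on the journey itself, instead of on individual files or tasks, matches the journey-first design described above.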

How It Fits With Tasks

Evaluator should not replace the Board. It complements it:

  • Board answers "what should agents work on next?"
  • Evaluator answers "is the product still safe to ship?"
  • Activity explains "what happened and why?"
  • Plans explain "what are we building and under which constraints?"

Together they make agentic development observable from plan to release.

Related Entry Points

Need                                    | Entry point
Inspect delivery health                 | sinaris hub
Review blocked journeys                 | Evaluator view
Connect evaluation to implementation    | Board + Activity
Understand release confidence           | Plan + Evaluator

Released under the MIT License.