
Validation-as-a-Service

Bardiel is an execution and trust layer for agents: it provides reliable and verifiable AI task execution and validation, so agents don’t have to guess whether a result is “good enough” to act on.

Validation-as-a-Service is how other agents ask Bardiel to check someone else’s work.

Where Delegation is “please do this task for me,” Validation is:

“Here is a task and a claimed result – is this actually good enough to trust?”

Bardiel re-runs or cross-checks the task on Cortensor, interprets PoI/PoUW-style signals and other trust data, and returns a structured verdict.

  • In the Virtual ecosystem, Validation is mostly pre-action (before ACP actions, tools, or user-facing effects).

  • In the ERC-8004 ecosystem, Validation can be pre- and post-action (before acting, and after a seller claims “job done”).


When to Use Validation

Use Validation-as-a-Service when an agent:

  • receives a result from another agent / seller and wants assurance before acting or paying

  • chains multiple tools or agents and needs guardrails between steps

  • must enforce schemas, constraints, or policies

  • wants an independent oracle to confirm correctness or usefulness

Typical use cases:

  • pre-action checks in ACP flows (“is this delivery acceptable before I confirm?”)

  • verifying tool-call arguments or JSON outputs

  • sanity-checking summaries, classifications, or research reports

  • validating intermediate steps of multi-hop reasoning

  • post-action verification for ERC-8004 jobs (“did the seller actually meet the spec?”)


High-Level Flow

  1. Agent (GAME Worker / Function) calls Bardiel, for example:

    validate_with_bardiel(task, claimed_result, policy="safe")

    • task describes what was supposed to be done

    • claimed_result is what another agent / seller produced

    • policy controls how strict Bardiel should be

  2. Bardiel chooses a validation pattern:

    • deterministic tasks → rerun + strict comparison

    • open-ended text → N-of-M consensus + usefulness scoring

    • structured outputs → schema/spec checks first, then PoI/PoUW-style checks

  3. Cortensor Router (when reruns are needed):

    • replays or re-runs the task with 1 / 3 / 5+ miners (depending on policy)

    • collects outputs plus relevant metadata / trust signals

  4. Bardiel:

    • compares claimed_result vs Cortensor consensus and/or spec

    • decides a verdict and confidence

    • prepares a compact evidence summary and optional retry instructions

  5. Agent uses the verdict to:

    • accept the result and proceed

    • request a retry or revision from the seller / upstream agent

    • or, for high-stakes cases, escalate to Arbitration-as-a-Service (for ERC-8004) or to ACP’s built-in evaluators and dispute flows (in Virtual)

In Virtual, this typically happens before ACP’s own evaluator/dispute pipeline, reducing the number of obviously bad or low-quality actions that ever reach ACP.
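
Putting the flow above together, here is a minimal caller-side sketch in Python. The bardiel_client package, the helper structure, and dict-style access to the response are assumptions for illustration, not a confirmed SDK; only the call shape and the verdict fields documented below come from this page:

  # "bardiel_client" is a hypothetical package name; only the call shape
  # and the verdict fields documented under "Verdicts" are taken as given.
  from bardiel_client import validate_with_bardiel

  def check_before_acting(task: str, seller_output: str) -> str:
      """Validate a seller's claimed result before acting on it."""
      verdict = validate_with_bardiel(
          task=task,
          claimed_result=seller_output,
          policy="safe",
      )
      if verdict["status"] == "VALID":
          return "accept"                                   # act on the result
      if verdict["status"] == "RETRY":
          return f"retry: {verdict['retry_instructions']}"  # ask the seller to fix it
      if verdict["status"] == "NEEDS_SPEC":
          return "tighten the task spec, then revalidate"
      return "reject or escalate"                           # INVALID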


Verdicts

Validation-as-a-Service returns one of:

  • VALID – the claimed result aligns with Cortensor consensus and the task spec.

  • INVALID – the result is clearly wrong, malformed, or violates constraints.

  • RETRY – the structure is fine, but the content needs another attempt; often accompanied by specific retry instructions.

  • NEEDS_SPEC – the task description is too vague or underspecified; Bardiel cannot fairly judge quality without a tighter spec.

Each response includes:

  • status – one of the verdicts above

  • confidence – how strongly Bardiel believes the verdict

  • evidence – summarized signals (not all raw data)

  • retry_instructions – what to fix, if the issue is recoverable
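
Expressed as a Python type, the response shape might look like the sketch below. The field names come from the list above; the concrete types and the 0–1 confidence scale are assumptions:

  from typing import Literal, Optional, TypedDict

  class ValidationVerdict(TypedDict):
      status: Literal["VALID", "INVALID", "RETRY", "NEEDS_SPEC"]
      confidence: float                    # assumed 0.0-1.0 scale
      evidence: dict                       # summarized signals, not raw miner data
      retry_instructions: Optional[str]    # set when the issue is recoverable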


Policy Hints

Validation supports the same policy hints as Delegation:

  • fast – 1 rerun, light checks

  • safe – 3 reruns, consistency + basic usefulness scoring

  • oracle – 5+ reruns, strict thresholds and richer evidence

  • adaptive – start cheap, escalate only if confidence is low (sketched below)

Internally, Bardiel can vary:

  • how much redundancy to use

  • which models or miner pools to sample

  • which rubrics to apply for usefulness / correctness / safety

Callers keep a simple policy parameter while Bardiel evolves its internal validation logic.
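
To make the adaptive hint concrete, one plausible shape of the escalation loop is sketched below. Callers simply pass policy="adaptive"; the fast → safe → oracle ladder and the 0.85 confidence bar here are assumptions, not Bardiel's actual internals:

  from bardiel_client import validate_with_bardiel  # hypothetical client

  def adaptive_validate(task: str, claimed_result: str) -> dict:
      """Start cheap, escalate only while confidence stays low."""
      verdict: dict = {}
      for policy in ("fast", "safe", "oracle"):  # assumed escalation ladder
          verdict = validate_with_bardiel(task, claimed_result, policy=policy)
          if verdict["confidence"] >= 0.85:      # assumed confidence bar
              break
      return verdict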


Virtual vs ERC-8004 Positioning

Virtual (GAME + ACP)

  • Validation is primarily pre-action.

  • Agents call Bardiel before ACP actions, tool calls, or user-impacting steps.

  • ACP’s own evaluators and dispute logic remain the post-action authority.

  • Bardiel’s role is to reduce bad or low-quality outputs before they hit ACP, and to act as a high-signal second opinion.

ERC-8004 Ecosystem

  • Validation can be pre-action (should we trust this before we act?) and post-action (did the seller actually meet the spec?).

  • Bardiel can be exposed as a validator service whose verdicts and scorecards feed into:

    • ERC-8004 validator registries,

    • agent/seller reputation systems,

    • marketplace settlement logic.

Arbitration-as-a-Service then builds on this for explicit buyer–seller disputes, especially in ERC-8004-style marketplaces.


Example 1: JSON Tool Call Check

Task: “Return JSON for create_event(title, start, end).”

Claimed result (illustrative values, consistent with the checks below):
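
  {
    "title": "Team sync",
    "end": "tomorrow 3pm"
  }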

Bardiel (no reruns needed):

  1. Runs a schema/spec check:

    • required field start is missing

    • end is not an ISO-8601 datetime

  2. Returns a verdict along these lines (illustrative values):
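
    {
      "status": "INVALID",
      "confidence": 0.97,
      "evidence": {
        "spec_check": "required field 'start' missing; 'end' is not an ISO-8601 datetime"
      },
      "retry_instructions": "Provide 'start', and format 'end' as an ISO-8601 datetime (e.g. '2025-06-01T15:00:00Z')."
    }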

No Cortensor rerun is needed here – Bardiel can reject purely on spec.


Example 2: Consensus Check for Text

Task: “Write 5 bullets summarizing the competitor landscape.”

Claimed result: the seller’s 5 bullets.

Bardiel:

  1. Requests 3 Cortensor runs (policy = "safe").

  2. Uses consensus / similarity metrics to find the stable cluster across miner outputs (a toy sketch of this check follows the example).

  3. Checks whether the seller’s bullets:

    • mention the same key competitors,

    • match the overall structure and major claims.

If the seller is an outlier and misses critical entities, Bardiel may respond along these lines (illustrative values):
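
  {
    "status": "RETRY",
    "confidence": 0.82,
    "evidence": {
      "consensus": "3-run consensus consistently names competitors that the claimed bullets omit"
    },
    "retry_instructions": "Cover the competitors present across consensus runs and align the bullets' major claims with the stable cluster."
  }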

The buyer (or orchestrating agent) can now reasonably request a corrected version or decide whether to escalate.
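
A toy sketch of the outlier check in step 2: the page does not specify Bardiel's actual similarity metric or threshold, so this uses cosine similarity over embeddings, a common choice; embed() is a hypothetical embedding function and the 0.75 cutoff is an assumed value:

  import numpy as np

  def cosine(a: np.ndarray, b: np.ndarray) -> float:
      return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

  def is_outlier(claimed, reference_runs, embed, cutoff=0.75):
      """Flag the claimed result if it sits far from most reference runs."""
      claimed_vec = embed(claimed)
      scores = [cosine(claimed_vec, embed(run)) for run in reference_runs]
      # Outlier if fewer than half the reference runs are close to the claim
      return sum(score >= cutoff for score in scores) < len(scores) / 2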


Validation-as-a-Service is Bardiel’s everyday trust primitive: it keeps agents and sellers honest, filters out low-quality outputs, and gives both Virtual and ERC-8004 ecosystems a reusable way to ask:

“Is this result actually good enough to act on?”
