When agents make decisions, logs are not enough.

Attova records the decision behind every action: the evidence the agent had, the alternatives it rejected, its confidence, and the outcome that followed.

billing-recovery-agent / decision run_5f1a3c · v4.2.0
Decision · seq 3

Refund the $42,000 disputed charge, or escalate to human review?

Chosen auto_refund · Rejected escalate_to_human · Confidence 0.95
Reverted · confident and wrong · 0.88 · queued for review

A trace shows what happened. It cannot show why.

Execution runs the agent. Observation records the trace. Cognition preserves why it decided and whether that reasoning held up. Today only the first two exist.

What the trace keeps
  • tool calls and spans
  • inputs and outputs
  • latency and errors
  • the order things ran

Enough to see the steps. Not enough to defend the call.

What Attova keeps
  • the decision and what it chose
  • the evidence it had at the time
  • the alternatives it rejected, and why
  • its stated confidence
  • the outcome, and where confidence and outcome diverged

Enough to reconstruct the reasoning and answer for it.
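
As a rough sketch, the fields listed above could be held in a single record, with a check for the case where confidence and outcome diverge. The type and function names here (DecisionRecord, Outcome, isDivergent) and the 0.8 cutoff are assumptions for illustration, not Attova's actual schema. The sample values come from the billing-recovery example on this page.

```typescript
// Hypothetical shape of a decision record. Field names are illustrative only.
type Outcome = 'held' | 'reverted' | 'pending';

interface DecisionRecord {
  question: string;
  chosen: string;
  alternatives: { option: string; rejectedReason: string }[];
  evidence: { type: string; summary: string }[];
  confidence: number; // stated confidence, 0..1
  outcome: Outcome;
}

// A decision diverges when stated confidence was high but the outcome reverted.
// The 0.8 default cutoff is an assumption made for this sketch.
function isDivergent(d: DecisionRecord, minConfidence = 0.8): boolean {
  return d.outcome === 'reverted' && d.confidence >= minConfidence;
}

const refundDecision: DecisionRecord = {
  question: 'Refund the $42,000 disputed charge, or escalate to human review?',
  chosen: 'auto_refund',
  alternatives: [
    { option: 'escalate_to_human', rejectedReason: 'under auto-approve threshold' },
  ],
  evidence: [{ type: 'tool_output', summary: 'charge 4d old, issuer code lost_card' }],
  confidence: 0.95,
  outcome: 'reverted',
};

const divergent = isDivergent(refundDecision);
console.log(divergent);
```

A trace alone could not answer this check: it holds the refund call, but not the stated confidence or the rejected alternative.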

Find where the agent was confident, and wrong.

A billing-recovery agent, instrumented with Attova. Open a decision to see the evidence it had and what it rejected, then replay the run.

Sandbox data is illustrative and runs entirely in your browser. No backend, no login.

billing-recovery-agent / divergence active · v4.2.0

Decision · Confidence · Outcome · Divergence · Replay

Review what the agent is trying to learn.

Nothing learns silently. Candidate experiences queue for approval before they become active knowledge, each one grounded in the run, decision, and outcome it came from.

Learning candidate / pending · 3 in queue
outcome rule 0.82

Do not auto-refund disputed charges above $25,000 without a fraud-ring check. Escalate to human review instead.

Reason for review
Would supersede an active experience. Derived from a high-confidence decision that reversed.
Expected
Refund holds, customer retained, no manual queue.
Observed
Refund reverted as fraud, clawed back nine days later.
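
One way to picture the approval gate is a queue in which a candidate stays pending until a reviewer rules on it, and only approved candidates count as active knowledge. This is a minimal sketch under that assumption. ReviewQueue, LearningCandidate, and the cand_1 id are hypothetical names, not part of Attova's API.

```typescript
type CandidateStatus = 'pending' | 'approved' | 'rejected';

interface LearningCandidate {
  id: string;
  rule: string;
  score: number;
  sourceRunId: string; // the run the candidate was derived from
  status: CandidateStatus;
}

// Hypothetical in-memory review queue: nothing becomes active without a verdict.
class ReviewQueue {
  private candidates = new Map<string, LearningCandidate>();

  // New candidates always enter as pending.
  propose(c: Omit<LearningCandidate, 'status'>): void {
    this.candidates.set(c.id, { ...c, status: 'pending' });
  }

  // A reviewer approves or rejects a pending candidate exactly once.
  review(id: string, verdict: 'approved' | 'rejected'): void {
    const c = this.candidates.get(id);
    if (!c) throw new Error(`unknown candidate: ${id}`);
    if (c.status !== 'pending') throw new Error(`candidate ${id} already reviewed`);
    c.status = verdict;
  }

  // Only approved candidates are visible to the agent.
  activeKnowledge(): LearningCandidate[] {
    return Array.from(this.candidates.values()).filter((c) => c.status === 'approved');
  }
}

const queue = new ReviewQueue();
queue.propose({
  id: 'cand_1',
  rule: 'Do not auto-refund disputed charges above $25,000 without a fraud-ring check.',
  score: 0.82,
  sourceRunId: 'run_5f1a3c',
});

const activeBeforeReview = queue.activeKnowledge().length; // still pending
queue.review('cand_1', 'approved');
const activeAfterReview = queue.activeKnowledge().length;
console.log(activeBeforeReview, activeAfterReview);
```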

See what the agent would remember before it acts.

Inspect which experiences a runtime scenario would retrieve, why they matched, and where they land in the agent's context.

Runtime context
invoice.payment_failed · charge $42,000 · issuer_code lost_card · value_p98
Retrieved experiences
#1 outcome rule 0.91

Disputed charges above $25,000 need a fraud-ring check before any auto-refund.

matched: charge>25k, disputed
#2 policy note 0.84

issuer_code lost_card can lag a customer card update by up to an hour.

matched: lost_card
#3 action pattern 0.77

High-value accounts: prefer escalation over write-off on ambiguous disputes.

matched: value_p98, dispute
Injection order
  1. outcome_rule 0.91, fraud-ring check
  2. policy_note 0.84, lost_card lag
  3. action_pattern 0.77, prefer escalation

Injected ahead of tool selection, highest-confidence first. Two lower-ranked experiences fell below the recall threshold and were dropped.
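
The ordering described above amounts to a filter and a sort. In this sketch the three retained scores come from the example on this page. The 0.7 recall threshold and the two dropped entries (with their scores and summaries) are made-up placeholders, since the page does not state them. Experience and injectionOrder are hypothetical names.

```typescript
interface Experience {
  kind: string;
  summary: string;
  score: number;
}

// Drop anything under the recall threshold, then order highest score first.
function injectionOrder(candidates: Experience[], recallThreshold: number): Experience[] {
  return candidates
    .filter((e) => e.score >= recallThreshold)
    .sort((a, b) => b.score - a.score);
}

const retrieved: Experience[] = [
  { kind: 'policy_note', summary: 'lost_card lag', score: 0.84 },
  { kind: 'action_pattern', summary: 'prefer escalation', score: 0.77 },
  { kind: 'outcome_rule', summary: 'fraud-ring check', score: 0.91 },
  // Placeholder low-ranked entries, invented for this sketch.
  { kind: 'policy_note', summary: 'placeholder A', score: 0.58 },
  { kind: 'action_pattern', summary: 'placeholder B', score: 0.41 },
];

const ASSUMED_RECALL_THRESHOLD = 0.7; // not a documented value

const ordered = injectionOrder(retrieved, ASSUMED_RECALL_THRESHOLD);
const dropped = retrieved.length - ordered.length;
console.log(ordered.map((e) => `${e.kind} ${e.score}`), dropped);
```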

One call records the decision, not just the action.

Most agents log what they did. The decision call records why: the question, what it chose, what it rejected, the evidence, the confidence. Dependency-free, in TypeScript and Python.

// open a run for the event that triggered the agent
const run = await attova.startRun({ trigger: 'invoice.payment_failed' });

// record the action the agent takes
const action = await run.action({ actionType: 'tool_call', toolName: 'stripe.refund' });

// the call that makes it defensible
await run.decision({
  actionId: action.id,
  question: 'Refund or escalate?',
  chosen: 'refund',
  confidence: 0.95,
  alternatives: [{ option: 'escalate', rejectedReason: 'under auto-approve threshold' }],
  evidence: [{ type: 'tool_output', summary: 'charge 4d old, issuer code lost_card' }],
});

// record what followed, then close the run
await run.outcome({ outcomeType: 'success', signal: 'succeeded' });
await run.finish('succeeded');

Running agents that make consequential decisions?

Work with us to instrument one production decision and see where confidence, evidence, and outcomes diverge. We do the setup. You tell us the truth.