OrgBench

OrgBench is a benchmark and dataset initiative for evaluating agentic systems using governance-aligned cognitive and behavioral metrics.

It is designed to answer questions like:

  • Does the agent retrieve the right memories when it should?
  • Does it stay aligned to its role and constraints?
  • Can it explain decisions and show evidence?
  • Does it behave predictably under low-temperature settings?
It measures dimensions such as:

  • Memory quality: precision and relevance of retrieved context
  • Judgment consistency: stable decisions under similar conditions
  • Observability: transparency of reasoning, evidence, and traceability
  • Diligence: thoroughness in considering alternatives and risks
  • Precedent responsiveness: retrieving and applying relevant precedents

The project provides:

  • A set of benchmark tasks and agent scenarios
  • A scoring harness and reporting format
  • Reference agent manifests and evaluation metadata
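As a rough illustration of how a scoring harness might report across the metric dimensions above, here is a minimal Python sketch. The `TaskScore` record, its field names, the unweighted-mean aggregation, and the task IDs are all hypothetical assumptions for illustration, not an official OrgBench schema.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical per-task score record; field names mirror the metric
# dimensions listed above (assumed 0..1 scales), not a real OrgBench API.
@dataclass
class TaskScore:
    task_id: str
    memory_quality: float            # precision/relevance of retrieved context
    judgment_consistency: float      # stable decisions under similar conditions
    observability: float             # transparency of reasoning and evidence
    diligence: float                 # thoroughness on alternatives and risks
    precedent_responsiveness: float  # retrieving/applying relevant precedents

    def aggregate(self) -> float:
        """Unweighted mean across the five dimensions (illustrative only)."""
        return mean([
            self.memory_quality,
            self.judgment_consistency,
            self.observability,
            self.diligence,
            self.precedent_responsiveness,
        ])

def report(scores: list[TaskScore]) -> dict[str, float]:
    """Collapse a suite of task scores into a simple per-task report."""
    return {s.task_id: round(s.aggregate(), 3) for s in scores}

if __name__ == "__main__":
    suite = [
        TaskScore("retrieval-01", 0.9, 0.8, 0.7, 0.6, 0.8),
        TaskScore("escalation-02", 0.5, 0.9, 0.8, 0.7, 0.6),
    ]
    print(report(suite))  # one aggregate score per task
```

A real harness would likely weight dimensions per task and attach evidence traces to each score; the flat mean here is only to make the reporting shape concrete.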

OrgBench is in an early build phase: the initial task suites and scoring harness are being shaped alongside AWTF and CognitiveCache™.