OrgBench

OrgBench is a benchmark and dataset initiative for evaluating agentic systems using governance-aligned cognitive and behavioral metrics.

It is designed to answer questions like:

  • Does the agent retrieve the right memories when it should?
  • Does it stay aligned to its role and constraints?
  • Can it explain decisions and show evidence?
  • Does it behave predictably under low-temperature settings?
It measures dimensions such as:

  • Memory quality: precision and relevance of retrieved context
  • Judgment consistency: stable decisions under similar conditions
  • Observability: transparency of reasoning, evidence, and traceability
  • Diligence: thoroughness in considering alternatives and risks
  • Precedent responsiveness: retrieving and applying relevant precedents

The project provides:

  • A set of benchmark tasks and agent scenarios
  • A scoring harness and reporting format
  • Reference agent manifests and evaluation metadata
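As a rough illustration of how a scoring harness might report across the metric dimensions above, here is a minimal Python sketch. The `TaskScore` record, its field names, the unweighted-mean aggregation, and the task IDs are all hypothetical assumptions for illustration, not an official OrgBench schema.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical per-task score record; field names mirror the metric
# dimensions listed above (assumed 0..1 scales), not a real OrgBench API.
@dataclass
class TaskScore:
    task_id: str
    memory_quality: float            # precision/relevance of retrieved context
    judgment_consistency: float      # stable decisions under similar conditions
    observability: float             # transparency of reasoning and evidence
    diligence: float                 # thoroughness on alternatives and risks
    precedent_responsiveness: float  # retrieving/applying relevant precedents

    def aggregate(self) -> float:
        """Unweighted mean across the five dimensions (illustrative only)."""
        return mean([
            self.memory_quality,
            self.judgment_consistency,
            self.observability,
            self.diligence,
            self.precedent_responsiveness,
        ])

def report(scores: list[TaskScore]) -> dict[str, float]:
    """Collapse a suite of task scores into a simple per-task report."""
    return {s.task_id: round(s.aggregate(), 3) for s in scores}

if __name__ == "__main__":
    suite = [
        TaskScore("retrieval-01", 0.9, 0.8, 0.7, 0.6, 0.8),
        TaskScore("escalation-02", 0.5, 0.9, 0.8, 0.7, 0.6),
    ]
    print(report(suite))  # one aggregate score per task
```

A real harness would likely weight dimensions per task and attach evidence traces to each score; the flat mean here is only to make the reporting shape concrete.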

OrgBench is in an early build phase: the initial task suites and scoring harness are being shaped alongside AWTF and CognitiveCache™.