OrgBench
What OrgBench is
OrgBench is a benchmark and dataset initiative for evaluating agentic systems using governance-aligned cognitive and behavioral metrics.
It is designed to answer questions like:
- Does the agent retrieve the right memories when it should?
- Does it stay aligned to its role and constraints?
- Can it explain decisions and show evidence?
- Does it behave predictably under low-temperature settings?
What OrgBench evaluates (high level)
- Memory quality: precision and relevance of retrieved context
- Judgment consistency: stable decisions under similar conditions
- Observability: transparency of reasoning, evidence, and traceability
- Diligence: thoroughness in considering alternatives and risks
- Precedent responsiveness: retrieving and applying relevant precedents
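As a rough illustration of how two of these dimensions might be quantified, the sketch below computes a retrieval precision score (for memory quality) and a majority-agreement rate across repeated runs (for judgment consistency). The function names, inputs, and formulas are assumptions made for this example; OrgBench has not published its scoring definitions, so this should not be read as the project's actual harness.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def retrieval_precision(retrieved: Sequence[Hashable],
                        relevant: Iterable[Hashable]) -> float:
    """Hypothetical memory-quality metric: the fraction of retrieved
    memory IDs that appear in the set judged relevant for the task."""
    if not retrieved:
        return 0.0
    relevant_set = set(relevant)
    hits = sum(1 for item in retrieved if item in relevant_set)
    return hits / len(retrieved)


def judgment_consistency(decisions: Sequence[Hashable]) -> float:
    """Hypothetical consistency metric: the share of repeated runs whose
    decision matches the most common decision across those runs."""
    if not decisions:
        return 0.0
    _, top_count = Counter(decisions).most_common(1)[0]
    return top_count / len(decisions)


# Example: 2 of 4 retrieved memories are relevant.
precision = retrieval_precision(["m1", "m2", "m3", "m4"], {"m1", "m3"})

# Example: 3 of 4 runs on a similar scenario reach the same decision.
consistency = judgment_consistency(["approve", "approve", "escalate", "approve"])
```

Both scores fall in the range 0 to 1, which would make them easy to aggregate per task or per agent. Dimensions such as observability and diligence likely need rubric-based or human-in-the-loop scoring rather than a simple ratio.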
Outputs (planned)
- A set of benchmark tasks and agent scenarios
- A scoring harness and reporting format
- Reference agent manifests and evaluation metadata
Status
Early build phase: initial task suites and the scoring harness are being shaped alongside AWTF and CognitiveCache™.