---
description: Run evaluation against acceptance criteria
agent: build
---

# Eval Command

Evaluate implementation against acceptance criteria: $ARGUMENTS

## Your Task

Run a structured evaluation to verify the implementation meets requirements.

## Evaluation Framework

### Grader Types

1. **Binary Grader** - Pass/Fail
   - Does it work? Yes/No
   - Good for: feature completion, bug fixes
2. **Scalar Grader** - Score 0-100
   - How well does it work?
   - Good for: performance, quality metrics
3. **Rubric Grader** - Category scores
   - Multiple dimensions evaluated
   - Good for: comprehensive review
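
The three grader types above can be modeled as a discriminated union so every result normalizes to a single 0-100 score for reporting. This is an illustrative sketch; the type and function names are hypothetical, not part of OpenCode.

```typescript
// Hypothetical model of the three grader types; names are
// illustrative, not an OpenCode API.
type BinaryResult = { kind: "binary"; pass: boolean };
type ScalarResult = { kind: "scalar"; score: number }; // 0-100
type RubricResult = { kind: "rubric"; scores: Record<string, number> };

type GradeResult = BinaryResult | ScalarResult | RubricResult;

// Normalize any grader result to a 0-100 score for reporting.
function toScore(result: GradeResult): number {
  switch (result.kind) {
    case "binary":
      return result.pass ? 100 : 0;
    case "scalar":
      return result.score;
    case "rubric": {
      // Average the per-category scores.
      const values = Object.values(result.scores);
      const sum = values.reduce((a, b) => a + b, 0);
      return values.length ? sum / values.length : 0;
    }
  }
}
```

Averaging rubric categories is one reasonable choice; a weighted rubric would instead reuse the weighted-sum formula from Step 3 below.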

## Evaluation Process

### Step 1: Define Criteria

Acceptance Criteria:

1. [Criterion 1] - [weight]
2. [Criterion 2] - [weight]
3. [Criterion 3] - [weight]

### Step 2: Run Tests

For each criterion:

- Execute the relevant test
- Collect evidence
- Score the result

### Step 3: Calculate Score

Final Score = Σ (criterion_score × weight) / total_weight
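
The formula above can be sketched as a small helper. This is an illustrative implementation; the `Criterion` interface and `finalScore` name are assumptions, not part of the command.

```typescript
// Sketch of: Final Score = Σ (criterion_score × weight) / total_weight
// Names are illustrative, not an OpenCode API.
interface Criterion {
  name: string;
  score: number;  // e.g. 0-10 per criterion
  weight: number; // relative weight, e.g. 30 for 30%
}

function finalScore(criteria: Criterion[]): number {
  const totalWeight = criteria.reduce((sum, c) => sum + c.weight, 0);
  if (totalWeight === 0) return 0; // avoid division by zero
  const weightedSum = criteria.reduce(
    (sum, c) => sum + c.score * c.weight,
    0,
  );
  return weightedSum / totalWeight;
}
```

For example, scores of 8/10, 9/10, and 7/10 at weights 30/40/30 give (240 + 360 + 210) / 100 = 8.1.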

### Step 4: Report

**Evaluation Report**

Overall: [PASS/FAIL] (Score: X/100)

#### Criterion Breakdown

| Criterion     | Score | Weight | Weighted |
|---------------|-------|--------|----------|
| [Criterion 1] | X/10  | 30%    | X        |
| [Criterion 2] | X/10  | 40%    | X        |
| [Criterion 3] | X/10  | 30%    | X        |

#### Evidence

**Criterion 1: [Name]**

- Test: [what was tested]
- Result: [outcome]
- Evidence: [screenshot, log, output]

#### Recommendations

[If not passing, what needs to change]

## Pass@K Metrics

For non-deterministic evaluations:

- Run K times
- Calculate the pass rate
- Report: "Pass@K = X/K"
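
The steps above can be sketched as a small runner. This is a hypothetical helper (not an OpenCode API); a real evaluation would likely be asynchronous, but a synchronous version shows the shape.

```typescript
// Hypothetical Pass@K runner for non-deterministic evaluations.
// runEval returns true on a passing run; k is the number of attempts.
function passAtK(
  runEval: () => boolean,
  k: number,
): { passes: number; k: number; rate: number } {
  let passes = 0;
  for (let i = 0; i < k; i++) {
    if (runEval()) passes++;
  }
  return { passes, k, rate: k > 0 ? passes / k : 0 };
}
```

The report line then becomes, e.g., `Pass@K = ${passes}/${k}`.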

TIP: Use eval for acceptance testing before marking features complete.