everything-claude-code/.opencode/commands/eval.md
Affaan Mustafa 6d440c036d feat: complete OpenCode plugin support with hooks, tools, and commands
Major OpenCode integration overhaul:

- llms.txt: Comprehensive OpenCode documentation for LLMs (642 lines)
- .opencode/plugins/ecc-hooks.ts: All Claude Code hooks translated to OpenCode's plugin system
- .opencode/tools/*.ts: 3 custom tools (run-tests, check-coverage, security-audit)
- .opencode/commands/*.md: All 24 commands in OpenCode format
- .opencode/package.json: npm package structure for opencode-ecc
- .opencode/index.ts: Main plugin entry point

- Delete incorrect LIMITATIONS.md (hooks ARE supported via plugins)
- Rewrite MIGRATION.md with correct hook event mapping
- Update README.md OpenCode section to show full feature parity

OpenCode has 20+ events vs Claude Code's 3 phases:
- PreToolUse → tool.execute.before
- PostToolUse → tool.execute.after
- Stop → session.idle
- SessionStart → session.created
- SessionEnd → session.deleted
- Plus: file.edited, file.watcher.updated, permission.asked, todo.updated

- 12 agents: Full parity
- 24 commands: Full parity (+1 from original 23)
- 16 skills: Full parity
- Hooks: OpenCode has MORE (20+ events vs 3 phases)
- Custom Tools: 3 native OpenCode tools

The OpenCode configuration can now be:
1. Used directly: cd everything-claude-code && opencode
2. Installed via npm: npm install opencode-ecc
2026-02-05 05:14:33 -08:00

---
description: Run evaluation against acceptance criteria
agent: build
---
# Eval Command
Evaluate implementation against acceptance criteria: $ARGUMENTS
## Your Task
Run structured evaluation to verify the implementation meets requirements.
## Evaluation Framework
### Grader Types
1. **Binary Grader** - Pass/Fail
- Does it work? Yes/No
- Good for: feature completion, bug fixes
2. **Scalar Grader** - Score 0-100
- How well does it work?
- Good for: performance, quality metrics
3. **Rubric Grader** - Category scores
- Multiple dimensions evaluated
- Good for: comprehensive review
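The three grader types above can be sketched as a small TypeScript discriminated union. This is an illustrative model only, not part of the command or the repo's tools; the type and function names are made up here.

```typescript
// Illustrative sketch of the three grader types; names are hypothetical.
type GraderResult =
  | { kind: "binary"; pass: boolean }                   // Pass/Fail
  | { kind: "scalar"; score: number }                   // 0-100
  | { kind: "rubric"; scores: Record<string, number> }; // per-category 0-100

// Normalize any grader result to a single 0-100 number for reporting.
function toScore(r: GraderResult): number {
  switch (r.kind) {
    case "binary":
      return r.pass ? 100 : 0;
    case "scalar":
      return r.score;
    case "rubric": {
      // Simple unweighted mean across rubric categories.
      const values = Object.values(r.scores);
      return values.reduce((a, b) => a + b, 0) / values.length;
    }
  }
}
```

A rubric result like `{ kind: "rubric", scores: { correctness: 80, style: 60 } }` normalizes to 70.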
## Evaluation Process
### Step 1: Define Criteria
```
Acceptance Criteria:
1. [Criterion 1] - [weight]
2. [Criterion 2] - [weight]
3. [Criterion 3] - [weight]
```
### Step 2: Run Tests
For each criterion:
- Execute relevant test
- Collect evidence
- Score result
### Step 3: Calculate Score
```
Final Score = Σ (criterion_score × weight) / total_weight
```
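The weighted-sum formula above can be sketched in TypeScript. This is a minimal illustration, assuming per-criterion scores on the 0-10 scale used in the report table and an overall score reported out of 100; the names are placeholders.

```typescript
// Hypothetical criterion record: score on a 0-10 scale, weight in percent.
interface Criterion {
  name: string;
  score: number;  // 0-10
  weight: number; // e.g. 30 for 30%
}

// Final Score = Σ (criterion_score × weight) / total_weight,
// scaled from the 0-10 criterion scale to the 0-100 report scale.
function finalScore(criteria: Criterion[]): number {
  const totalWeight = criteria.reduce((sum, c) => sum + c.weight, 0);
  const weighted = criteria.reduce((sum, c) => sum + c.score * c.weight, 0);
  return (weighted * 10) / totalWeight;
}
```

With the example weights from the report table (30/40/30) and scores 8, 9, 7, this yields (8×30 + 9×40 + 7×30) / 100 × 10 = 81.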
### Step 4: Report
## Evaluation Report
### Overall: [PASS/FAIL] (Score: X/100)
### Criterion Breakdown
| Criterion | Score | Weight | Weighted |
|-----------|-------|--------|----------|
| [Criterion 1] | X/10 | 30% | X |
| [Criterion 2] | X/10 | 40% | X |
| [Criterion 3] | X/10 | 30% | X |
### Evidence
**Criterion 1: [Name]**
- Test: [what was tested]
- Result: [outcome]
- Evidence: [screenshot, log, output]
### Recommendations
[If not passing, what needs to change]
## Pass@K Metrics
For non-deterministic evaluations:
- Run K times
- Calculate pass rate
- Report: "Pass@K = X/K"
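The run-K-times loop above can be sketched as follows, assuming a caller-supplied `evalOnce` function that returns pass/fail for a single run (the function name and shape are illustrative):

```typescript
// Run a non-deterministic evaluation K times and compute the pass rate.
function passAtK(
  evalOnce: () => boolean,
  k: number
): { passes: number; rate: number } {
  let passes = 0;
  for (let i = 0; i < k; i++) {
    if (evalOnce()) passes++;
  }
  return { passes, rate: passes / k };
}

// Report as: `Pass@${k} = ${passes}/${k}`
```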
---
**TIP**: Use eval for acceptance testing before marking features complete.