mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-02-08 15:34:56 +08:00
---
description: Run evaluation against acceptance criteria
agent: build
---

# Eval Command

Evaluate implementation against acceptance criteria: $ARGUMENTS

## Your Task

Run structured evaluation to verify the implementation meets requirements.

## Evaluation Framework

### Grader Types

1. **Binary Grader** - Pass/Fail
   - Does it work? Yes/No
   - Good for: feature completion, bug fixes

2. **Scalar Grader** - Score 0-100
   - How well does it work?
   - Good for: performance, quality metrics

3. **Rubric Grader** - Category scores
   - Multiple dimensions evaluated
   - Good for: comprehensive review
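
As a rough sketch, the three grader types could be modeled as a discriminated union (the type names and the `passed` helper below are illustrative, not part of this command):

```typescript
// Illustrative sketch: one result shape per grader type.
type GradeResult =
  | { kind: "binary"; pass: boolean }                   // Pass/Fail
  | { kind: "scalar"; score: number }                   // 0-100
  | { kind: "rubric"; scores: Record<string, number> }; // per-dimension scores

function passed(result: GradeResult, threshold = 70): boolean {
  switch (result.kind) {
    case "binary":
      return result.pass;
    case "scalar":
      return result.score >= threshold;
    case "rubric": {
      // Treat a rubric as passing when every dimension clears the threshold.
      const values = Object.values(result.scores);
      return values.length > 0 && values.every((s) => s >= threshold);
    }
  }
}

console.log(passed({ kind: "binary", pass: true }));               // true
console.log(passed({ kind: "scalar", score: 55 }));                // false
console.log(passed({ kind: "rubric", scores: { a: 80, b: 90 } })); // true
```

The union keeps grader-specific data separate while letting one pass/fail decision sit on top, which matches how the report below collapses everything to an overall PASS/FAIL.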

## Evaluation Process

### Step 1: Define Criteria

```
Acceptance Criteria:
1. [Criterion 1] - [weight]
2. [Criterion 2] - [weight]
3. [Criterion 3] - [weight]
```

### Step 2: Run Tests

For each criterion:
- Execute relevant test
- Collect evidence
- Score result

### Step 3: Calculate Score

```
Final Score = Σ (criterion_score × weight) / total_weight
```
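
The formula can be sketched directly; the `Criterion` shape and the example scores and weights below are placeholders mirroring the breakdown table, not values this command prescribes:

```typescript
interface Criterion {
  name: string;
  score: number;  // 0-10, as in the breakdown table
  weight: number; // percentage weight, e.g. 30 for 30%
}

// Final Score = Σ (criterion_score × weight) / total_weight, on a 0-100 scale.
function finalScore(criteria: Criterion[]): number {
  const totalWeight = criteria.reduce((sum, c) => sum + c.weight, 0);
  if (totalWeight <= 0) throw new Error("total weight must be positive");
  const weighted = criteria.reduce((sum, c) => sum + c.score * c.weight, 0);
  return (weighted * 10) / totalWeight; // scale 0-10 scores up to 0-100
}

console.log(
  finalScore([
    { name: "Criterion 1", score: 8, weight: 30 },
    { name: "Criterion 2", score: 9, weight: 40 },
    { name: "Criterion 3", score: 7, weight: 30 },
  ]),
); // (8*30 + 9*40 + 7*30) * 10 / 100 = 81
```

Dividing by the summed weight means the weights need not total exactly 100%; they are normalized automatically.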

### Step 4: Report

Produce a report in the format below.

## Evaluation Report

### Overall: [PASS/FAIL] (Score: X/100)

### Criterion Breakdown

| Criterion | Score | Weight | Weighted |
|-----------|-------|--------|----------|
| [Criterion 1] | X/10 | 30% | X |
| [Criterion 2] | X/10 | 40% | X |
| [Criterion 3] | X/10 | 30% | X |

### Evidence

**Criterion 1: [Name]**
- Test: [what was tested]
- Result: [outcome]
- Evidence: [screenshot, log, output]

### Recommendations

[If not passing, what needs to change]

## Pass@K Metrics

For non-deterministic evaluations:
- Run K times
- Calculate pass rate
- Report: "Pass@K = X/K"
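
A minimal sketch of that loop, assuming a synchronous `runEvaluation` callback as a stand-in for whatever check the criterion actually requires:

```typescript
// Run a (possibly non-deterministic) evaluation K times and report Pass@K.
function passAtK(k: number, runEvaluation: () => boolean) {
  let passes = 0;
  for (let i = 0; i < k; i++) {
    if (runEvaluation()) passes++;
  }
  return { passes, k, rate: passes / k };
}

// Usage with a dummy evaluation that alternates pass/fail:
let call = 0;
const result = passAtK(4, () => call++ % 2 === 0);
console.log(`Pass@${result.k} = ${result.passes}/${result.k}`); // Pass@4 = 2/4
```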

---

**TIP**: Use eval for acceptance testing before marking features complete.