The agent-testing-eval-skill is a structured SKILL.md behavioral protocol for QA-grade AI agent testing. It teaches agents how to systematically test other agents: build golden datasets, define scoring rubrics, detect regressions, and set up eval harnesses. Access is priced per use via x402 micropayments: fetch the protocol at task time so you always get the current version.
The protocol covers five areas of systematic agent quality assurance:
| Phase | What It Covers |
|---|---|
| Test Planning | Scope definition, test type selection (unit / integration / end-to-end), coverage targets for agent behaviors |
| Golden Dataset Construction | Input-output pair curation, edge case identification, human labeling protocol, dataset versioning |
| Output Scoring Rubrics | Factual accuracy scoring, format compliance, hallucination detection, task-specific rubric design |
| Regression Detection | Baseline anchoring, delta thresholds, automated regression alerts, model version tracking |
| Eval Harness Setup | Runner configuration, batch evaluation, CI/CD integration, report generation |
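The sketch below illustrates how the scoring-rubric and regression-detection phases fit together: score agent outputs against a golden dataset and flag a regression when the mean score falls below a baseline by more than a delta threshold. The dataset schema, the exact-match scoring function, and the threshold are illustrative assumptions, not the schema or rubrics defined by the protocol itself.

```python
# Illustrative sketch only: the dataset schema, scoring rubric, and threshold
# here are assumptions, not the definitions used by agent-testing-eval-skill.
from dataclasses import dataclass

@dataclass
class GoldenCase:
    case_id: str
    input_text: str
    expected: str  # human-labeled reference output

def score_output(expected: str, actual: str) -> float:
    """Toy exact-match rubric; a real rubric would also score factual
    accuracy, format compliance, and hallucinations per the protocol."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

def detect_regression(cases: list[GoldenCase],
                      outputs: dict[str, str],
                      baseline_mean: float,
                      delta_threshold: float = 0.05) -> bool:
    """Return True if the current run drops more than delta_threshold
    below the anchored baseline score."""
    scores = [score_output(c.expected, outputs.get(c.case_id, "")) for c in cases]
    current_mean = sum(scores) / len(scores)
    return (baseline_mean - current_mean) > delta_threshold

# Example: compare a fresh run against a baseline anchored on a pinned model version.
cases = [GoldenCase("c1", "What is 2+2?", "4")]
print(detect_regression(cases, {"c1": "4"}, baseline_mean=0.92))
```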
Access follows the x402 flow:

- GET https://clawmerchants.com/v1/preview/agent-testing-eval-skill returns a protocol excerpt, no payment required.
- GET https://clawmerchants.com/v1/data/agent-testing-eval-skill returns HTTP 402 with a USDC price.
- Retrying with an X-PAYMENT: <base64 proof> header returns HTTP 200 with the full SKILL.md protocol.

Fetching the protocol at task time ensures your agent always runs the current version, not a stale cached copy. At $0.03 per access, you would need 334 runs to match the cost of a $10 one-time download; most agent eval workflows run tens to hundreds of times per month, not thousands.
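A minimal sketch of that flow in Python, assuming the requests library. How the base64 payment proof is constructed (wallet signing and USDC settlement on Base L2) is outside the scope of this sketch and is left as a placeholder argument.

```python
# Minimal sketch of the x402 access flow described above.
# Payment-proof construction is NOT shown; it depends on your wallet tooling.
import requests

BASE = "https://clawmerchants.com/v1"
SKILL = "agent-testing-eval-skill"

def fetch_skill(payment_proof_b64: str | None = None) -> str:
    """Fetch the full SKILL.md, paying per access when HTTP 402 is returned."""
    url = f"{BASE}/data/{SKILL}"
    resp = requests.get(url)
    if resp.status_code == 402:
        # The 402 response carries the USDC price; settle it with your wallet
        # stack, then retry with the base64 proof in the X-PAYMENT header.
        if payment_proof_b64 is None:
            raise RuntimeError(f"Payment required: {resp.text}")
        resp = requests.get(url, headers={"X-PAYMENT": payment_proof_b64})
    resp.raise_for_status()
    return resp.text  # full SKILL.md protocol

# The preview endpoint returns an excerpt with no payment:
preview = requests.get(f"{BASE}/preview/{SKILL}").text
```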
ClawMerchants — agent testing evaluation SKILL.md protocol — x402 + USDC + Base L2