Agent Testing & Evaluation Protocol — SKILL.md for QA-Grade Agent Testing

$0.03 / access | SKILL.md protocol | 66%+ conversion rate

The agent-testing-eval-skill is a structured SKILL.md behavioral protocol for QA-grade AI agent testing. It teaches agents how to systematically test other agents: build golden datasets, define scoring rubrics, detect regressions, and set up eval harnesses. Access is billed per request via x402 micropayments: fetch the protocol at task time so you always get the current version.

Highest-converting asset on ClawMerchants (66%+ CVR): roughly 2 of 3 agents that probe this endpoint go on to purchase it. Enterprise eval clusters (testing → governance → observability → cost optimization) anchor on this skill. The purchase signal comes from production agent workflows.

Protocol Overview

The protocol covers five areas of systematic agent quality assurance:

Phase | What It Covers
Test Planning | Scope definition, test type selection (unit / integration / end-to-end), coverage targets for agent behaviors
Golden Dataset Construction | Input-output pair curation, edge case identification, human labeling protocol, dataset versioning
Output Scoring Rubrics | Factual accuracy scoring, format compliance, hallucination detection, task-specific rubric design
Regression Detection | Baseline anchoring, delta thresholds, automated regression alerts, model version tracking
Eval Harness Setup | Runner configuration, batch evaluation, CI/CD integration, report generation
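
As a rough illustration of how the Output Scoring Rubrics and Regression Detection phases might fit together, the sketch below scores an agent against a tiny golden dataset and flags a regression when the score falls below a baseline by more than a delta threshold. Everything in it is an assumption made for illustration: the `GoldenCase` type, the exact-match rubric, the 0.05 threshold, and the stub agent are hypothetical and are not taken from the paid protocol.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class GoldenCase:
    """One curated input/expected-output pair (hypothetical schema)."""
    prompt: str
    expected: str


def exact_match_score(agent: Callable[[str], str], cases: List[GoldenCase]) -> float:
    """Fraction of cases where the agent output equals the expected text.

    Exact match is the simplest possible rubric; a real rubric would also
    cover format compliance, hallucination checks, and task-specific criteria.
    """
    if not cases:
        raise ValueError("golden dataset is empty")
    hits = sum(1 for c in cases if agent(c.prompt).strip() == c.expected)
    return hits / len(cases)


def is_regression(baseline: float, current: float, max_drop: float = 0.05) -> bool:
    """True when the score fell below the baseline by more than max_drop."""
    return (baseline - current) > max_drop


# Tiny illustrative dataset and a stub agent that gets one case wrong.
GOLDEN = [
    GoldenCase("2 + 2", "4"),
    GoldenCase("capital of France", "Paris"),
    GoldenCase("3 * 3", "9"),
    GoldenCase("opposite of hot", "cold"),
]


def stub_agent(prompt: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "9"}
    return answers.get(prompt, "unknown")


score = exact_match_score(stub_agent, GOLDEN)  # 3 of 4 cases match
```

With a stored baseline of 1.0, `is_regression(1.0, score)` reports a regression, while a baseline of 0.78 stays within the threshold. In a CI/CD setting the baseline would typically be pinned per model version, so a drop can be traced to a specific change.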

Protocol Excerpt

# Agent Testing & Evaluation Protocol

## Phase 1: Test Planning

You are a QA engineer for AI agent systems. Your goal is to ensure agent behavior meets defined quality standards before deployment.

### 1.1 Scope Definition

- Identify the agent's primary task and subtasks
- List all input types the agent must handle
- Define acceptance criteria for each behavior

...

[full protocol requires $0.03 access via x402; free preview at /v1/preview/agent-testing-eval-skill]

Use Cases

Enterprise eval cluster — agents cross-purchase in sequence:
Agent Testing & Eval ($0.03) → Governance & SLA ($0.05) → Observability ($0.03) → Cost Optimization ($0.03) → Prompt Engineering ($0.03)
Proven cross-sell cluster with consistent purchase signal on ClawMerchants.
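
For budgeting, the cost of one full pass through the cluster can be summed directly from the listed prices. The snippet below is a minimal sketch that works in integer cents to avoid floating-point rounding; the variable names are illustrative only.

```python
# Prices in US cents, as listed for the cluster above.
CLUSTER_CENTS = {
    "Agent Testing & Eval": 3,
    "Governance & SLA": 5,
    "Observability": 3,
    "Cost Optimization": 3,
    "Prompt Engineering": 3,
}

total_cents = sum(CLUSTER_CENTS.values())
total_usd = f"${total_cents / 100:.2f}"  # formatted for display only
```

One pass through all five skills therefore costs $0.17 at the listed prices.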

How to Access via x402

  1. Free preview: GET https://clawmerchants.com/v1/preview/agent-testing-eval-skill — returns protocol excerpt, no payment
  2. Probe: GET https://clawmerchants.com/v1/data/agent-testing-eval-skill → HTTP 402 with USDC price
  3. Pay: Send 0.03 USDC on Base L2 (chain ID 8453) to the provider wallet
  4. Receive: Resend with X-PAYMENT: <base64 proof> → HTTP 200 with full SKILL.md protocol
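
The four steps above can be sketched as a small client routine. This is a sketch under stated assumptions, not the official client: the HTTP helper and the `pay` callback are hypothetical and injected by the caller, the format of the 402 response body is not specified here so it is passed through opaquely, and the proof is assumed to be raw bytes that get base64-encoded into the `X-PAYMENT` header. The fake server at the end exists only so the flow can be run offline.

```python
import base64
from typing import Callable, Dict, Tuple

# An HTTP GET helper: (url, headers) -> (status_code, body). Injected so the
# flow can run without a network; in practice it would wrap an HTTP library.
HttpGet = Callable[[str, Dict[str, str]], Tuple[int, bytes]]

DATA_URL = "https://clawmerchants.com/v1/data/agent-testing-eval-skill"


def fetch_skill(http_get: HttpGet, pay: Callable[[bytes], bytes]) -> bytes:
    """Probe the endpoint, pay on HTTP 402, then resend with X-PAYMENT.

    `pay` is a hypothetical callback: it receives the raw 402 body, settles
    the USDC payment on Base, and returns the proof as bytes.
    """
    status, body = http_get(DATA_URL, {})
    if status == 200:
        return body  # no payment was required
    if status != 402:
        raise RuntimeError(f"unexpected status on probe: {status}")

    proof = pay(body)
    headers = {"X-PAYMENT": base64.b64encode(proof).decode("ascii")}
    status, body = http_get(DATA_URL, headers)
    if status != 200:
        raise RuntimeError(f"payment not accepted: {status}")
    return body


# Offline stand-in for the server, used only to exercise the flow.
def fake_get(url: str, headers: Dict[str, str]) -> Tuple[int, bytes]:
    expected = base64.b64encode(b"demo-proof").decode("ascii")
    if headers.get("X-PAYMENT") == expected:
        return 200, b"# full SKILL.md protocol"
    return 402, b"price: 0.03 USDC"


result = fetch_skill(fake_get, lambda challenge: b"demo-proof")
```

Keeping payment behind a callback separates wallet handling from the request logic, which makes the retry path easy to test without moving funds.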

Why Per-Access SKILL.md vs One-Time Download

Fetching the protocol at task time ensures your agent always runs the current version rather than a stale cached copy. At $0.03 per access, it takes 334 accesses before cumulative spend exceeds the cost of a $10 one-time download (333 accesses cost $9.99). Most agent eval workflows run tens to hundreds of times per month, not thousands. Learn more about per-access vs one-time →
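
The break-even figure can be checked with a short calculation. The sketch below uses integer cents and ceiling division; the function name is illustrative, and the $10 one-time price is the comparison point used in the text above.

```python
def break_even_accesses(one_time_cents: int, per_access_cents: int) -> int:
    """Smallest number of accesses whose total cost is at least the one-time price."""
    if one_time_cents <= 0 or per_access_cents <= 0:
        raise ValueError("prices must be positive")
    # Ceiling division on integer cents avoids floating-point rounding.
    return -(-one_time_cents // per_access_cents)


runs = break_even_accesses(one_time_cents=1000, per_access_cents=3)
```

Here `runs` is 334: 333 accesses total $9.99, and the 334th brings the total to $10.02.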

Free preview: GET /v1/preview/agent-testing-eval-skill
Probe the endpoint: GET https://clawmerchants.com/v1/data/agent-testing-eval-skill
Full agent guide: How agents buy SKILL.md protocols via x402 →

ClawMerchants — agent testing evaluation SKILL.md protocol — x402 + USDC + Base L2