
Eval feature addition. #9

Merged
kunalkushwaha merged 4 commits into main from eval-v2
Feb 7, 2026

Conversation

@kunalkushwaha
Member

Implement AGK eval command for automated workflow testing with three semantic
strategies: embedding similarity, LLM-as-judge, and hybrid approach. Add
EvalServer integration to v1beta with HTTP endpoints for test execution.
Generate professional markdown reports with confidence scoring, collapsible
sections, and trace links.
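
One way the hybrid strategy could combine the other two is sketched below. This is an illustration only, not the code merged in this PR: `cosine`, `hybridMatch`, and the `low`/`high` thresholds are hypothetical names, and the rule "trust the embedding score when it is clear, ask the judge only in the gray zone" is an assumption about how a hybrid might work.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
// It returns 0 when the lengths differ or either vector has zero magnitude.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// hybridMatch (hypothetical) passes outright when sim >= high, fails outright
// when sim < low, and otherwise defers to judge, a stand-in for an
// LLM-as-judge call.
func hybridMatch(sim, low, high float64, judge func() (bool, error)) (bool, error) {
	switch {
	case sim >= high:
		return true, nil
	case sim < low:
		return false, nil
	default:
		return judge()
	}
}

func main() {
	// Toy vectors standing in for the expected and actual output embeddings.
	sim := cosine([]float64{1, 0, 1}, []float64{1, 0, 1})
	ok, err := hybridMatch(sim, 0.5, 0.9, func() (bool, error) {
		return false, nil // not reached: the similarity is above the high threshold
	})
	fmt.Println(ok, err)
}
```

Deferring to the judge only in the uncertain band keeps LLM calls, and their cost and latency, off the clear-cut cases.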

Fix streaming bug in LLM judge by reading both Delta and Content fields.
Add comprehensive documentation (docs/eval.md, docs/trace.md) with examples,
best practices, and troubleshooting guides. Update all READMEs with eval
and trace sections.
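
The streaming fix can be pictured with a minimal sketch. The `Chunk` type and `collect` function here are hypothetical and do not reproduce the project's actual types; the sketch assumes a stream chunk may carry its text in either a `Delta` or a `Content` field, so a reader that checks only one of them silently drops text.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a hypothetical stream chunk; the real type in the codebase may differ.
type Chunk struct {
	Delta   string // incremental text, when the provider streams deltas
	Content string // text, when the provider fills Content instead
}

// collect concatenates the text of every chunk, preferring Delta and
// falling back to Content when Delta is empty.
func collect(chunks []Chunk) string {
	var sb strings.Builder
	for _, c := range chunks {
		if c.Delta != "" {
			sb.WriteString(c.Delta)
		} else {
			sb.WriteString(c.Content)
		}
	}
	return sb.String()
}

func main() {
	out := collect([]Chunk{{Delta: "PA"}, {Content: "SS"}})
	fmt.Println(out) // prints "PASS"
}
```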

fixes #8

@kunalkushwaha kunalkushwaha merged commit 02305ed into main Feb 7, 2026
8 checks passed
@kunalkushwaha kunalkushwaha deleted the eval-v2 branch February 8, 2026 11:47

Development

Successfully merging this pull request may close these issues.

Feature Request: Add eval Support to agk

1 participant