Eval feature addition. by kunalkushwaha · Pull Request #9 · AgenticGoKit/agk

kunalkushwaha · 2026-02-07T07:31:19Z

Implement AGK eval command for automated workflow testing with three semantic
strategies: embedding similarity, LLM-as-judge, and hybrid approach. Add
EvalServer integration to v1beta with HTTP endpoints for test execution.
Generate professional markdown reports with confidence scoring, collapsible
sections, and trace links.

Fix streaming bug in LLM judge by reading both Delta and Content fields.
Add comprehensive documentation (docs/eval.md, docs/trace.md) with examples,
best practices, and troubleshooting guides. Update all READMEs with eval
and trace sections.

fixes #8

kunalkushwaha added 4 commits February 6, 2026 16:17

eval feature implemented using trace and eval hook in agenticgokit

08061b9

LLM as Judge implemented for eval

8ebf154

LLM issues resolved

219c9e7

kunalkushwaha force-pushed the eval-v2 branch from 30bf9c2 to 76f4cbb February 7, 2026 07:40

kunalkushwaha merged commit 02305ed into main Feb 7, 2026
8 checks passed

kunalkushwaha deleted the eval-v2 branch February 8, 2026 11:47
