
eval-kit

A TypeScript SDK for evaluating content quality using traditional metrics and AI-powered evaluation.

Features

  • Traditional Metrics: BLEU, TER, BERTScore, Coherence, Perplexity
  • AI-Powered Evaluation: LLM-based evaluator with prompt templating (via Vercel AI SDK)
  • Batch Processing: Concurrent execution, progress tracking, retry logic, CSV/JSON export

Installation

npm install @loveholidays/eval-kit
# or
pnpm add @loveholidays/eval-kit

For AI evaluation, you'll also need an AI SDK provider:

npm install @ai-sdk/openai
# or @ai-sdk/anthropic, @ai-sdk/google, etc.
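
Most AI SDK providers read their API key from an environment variable by default (OPENAI_API_KEY for @ai-sdk/openai, ANTHROPIC_API_KEY for @ai-sdk/anthropic, and so on), so no extra configuration is usually needed beyond setting the variable:

export OPENAI_API_KEY="sk-..."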

Quick Start

Traditional Metrics

import { calculateBleu, calculateCoherence } from '@loveholidays/eval-kit';

// BLEU score for translation quality
const bleuResult = calculateBleu(
  'The cat sits on the mat',
  ['The cat is on the mat']
);
console.log(bleuResult.score); // 75.98

// Coherence for text flow
const coherenceResult = calculateCoherence(
  'The cat sat on the mat. It was comfortable.'
);
console.log(coherenceResult.score); // 65.43
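
The other metrics listed under Features follow the same call-and-result shape. As a sketch, assuming TER follows the same naming convention as calculateBleu (calculateTer is an assumed export name here, not one confirmed by the docs):

import { calculateTer } from '@loveholidays/eval-kit'; // assumed export name

// TER (Translation Edit Rate): lower scores mean fewer edits
// are needed to turn the candidate into the reference
const terResult = calculateTer(
  'The cat sits on the mat',   // candidate
  ['The cat is on the mat']    // reference(s)
);
console.log(terResult.score);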

AI-Powered Evaluation

import { openai } from '@ai-sdk/openai';
import { Evaluator } from '@loveholidays/eval-kit';

const evaluator = Evaluator.create('fluency', openai('gpt-4'));

const result = await evaluator.evaluate({
  candidateText: 'The quick brown fox jumps over the lazy dog.'
});

console.log(result.score);    // 95
console.log(result.feedback); // "Excellent fluency..."

Batch Evaluation

import { anthropic } from '@ai-sdk/anthropic';
import { BatchEvaluator, Evaluator } from '@loveholidays/eval-kit';

const evaluator = new Evaluator({
  name: 'quality',
  model: anthropic('claude-3-5-haiku-20241022'),
  evaluationPrompt: 'Rate the quality of this text from 1-10.',
  scoreConfig: { type: 'numeric', min: 1, max: 10 },
});

const batchEvaluator = new BatchEvaluator({
  evaluators: [evaluator],
  concurrency: 5,
  onResult: (result) => console.log(`Row ${result.rowId}: ${result.results[0].score}`),
});

const result = await batchEvaluator.evaluate({ filePath: './data.csv' });

await batchEvaluator.export({
  format: 'csv',
  destination: './results.csv',
});
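
Export supports JSON as well as CSV (see the Export guide below); switching formats only changes the export options:

// Same results, serialized as JSON instead of CSV
await batchEvaluator.export({
  format: 'json',
  destination: './results.json',
});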

Documentation

Guide              Description
Metrics            BLEU, TER, BERTScore, Coherence, Perplexity
Evaluator          AI-powered evaluation and scoring
Batch Evaluation   Concurrent processing, progress tracking
Export             CSV and JSON export options

Supported LLM Providers

Via Vercel AI SDK: OpenAI, Anthropic, Google, Mistral, Groq, Cohere, and any OpenAI-compatible endpoint.
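
Pointing an evaluator at an OpenAI-compatible endpoint goes through the AI SDK's provider factory. A minimal sketch, assuming a local server exposing an OpenAI-style API (the URL and model name below are placeholders):

import { createOpenAI } from '@ai-sdk/openai';
import { Evaluator } from '@loveholidays/eval-kit';

// Any OpenAI-compatible server (vLLM, Ollama, LM Studio, ...) works here
const compatible = createOpenAI({
  baseURL: 'http://localhost:8000/v1', // placeholder endpoint
  apiKey: 'unused-or-your-key',
});

const evaluator = Evaluator.create('fluency', compatible('my-local-model'));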

Development

pnpm install    # Install dependencies
pnpm build      # Build the project
pnpm test       # Run tests
pnpm lint       # Lint code

Publishing

This package uses Changesets for version management and is published to the npm registry.

Creating a Release

  1. Add a changeset when you make changes that should be released:

    pnpm changeset

    • Select the version bump type (patch/minor/major)
    • Write a summary of your changes
    • This creates a markdown file in .changeset/ (see the example after this list)

  2. Merge to main. CI will then automatically:

    • Detect changesets
    • Bump the version in package.json
    • Update CHANGELOG.md
    • Publish to the npm registry
    • Push git tags
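
For reference, the markdown file created in step 1 pairs a version bump with a summary. A typical patch changeset for this package looks like the following (the summary line is illustrative):

---
"@loveholidays/eval-kit": patch
---

Fix rounding of BLEU scores for short inputs.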

Manual Publishing

For local testing or manual releases:

pnpm build              # Build the package
pnpm changeset version  # Apply version bumps
pnpm changeset publish  # Publish to registry

Version Types

Type    When to use
patch   Bug fixes, small updates
minor   New features (backwards compatible)
major   Breaking changes

License

MIT
