A TypeScript SDK for evaluating content quality using traditional metrics and AI-powered evaluation.

## Features

- **Traditional Metrics**: BLEU, TER, BERTScore, Coherence, Perplexity
- **AI-Powered Evaluation**: LLM-based evaluator with prompt templating (via the Vercel AI SDK)
- **Batch Processing**: Concurrent execution, progress tracking, retry logic, CSV/JSON export
## Installation

```bash
npm install @loveholidays/eval-kit
# or
pnpm add @loveholidays/eval-kit
```

For AI evaluation, you'll also need an AI SDK provider:

```bash
npm install @ai-sdk/openai
# or @ai-sdk/anthropic, @ai-sdk/google, etc.
```

## Quick Start

```ts
import { calculateBleu, calculateCoherence } from '@loveholidays/eval-kit';

// BLEU score for translation quality
const bleuResult = calculateBleu(
  'The cat sits on the mat',
  ['The cat is on the mat']
);
console.log(bleuResult.score); // 75.98

// Coherence for text flow
const coherenceResult = calculateCoherence(
  'The cat sat on the mat. It was comfortable.'
);
console.log(coherenceResult.score); // 65.43
```
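The remaining traditional metrics are documented in the Metrics guide. As a minimal sketch only, assuming a hypothetical `calculateTer(candidate, references)` signature that mirrors `calculateBleu` (verify the exact API against the guide):

```ts
import { calculateTer } from '@loveholidays/eval-kit';

// Hypothetical usage mirroring calculateBleu above; check the Metrics guide
const terResult = calculateTer(
  'The cat sits on the mat',
  ['The cat is on the mat']
);
console.log(terResult.score); // for TER, a lower edit rate means a closer match
```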
## AI-Powered Evaluation

```ts
import { openai } from '@ai-sdk/openai';
import { Evaluator } from '@loveholidays/eval-kit';

const evaluator = Evaluator.create('fluency', openai('gpt-4'));

const result = await evaluator.evaluate({
  candidateText: 'The quick brown fox jumps over the lazy dog.'
});

console.log(result.score); // 95
console.log(result.feedback); // "Excellent fluency..."
```
## Batch Evaluation

```ts
import { anthropic } from '@ai-sdk/anthropic';
import { BatchEvaluator, Evaluator } from '@loveholidays/eval-kit';

const evaluator = new Evaluator({
  name: 'quality',
  model: anthropic('claude-3-5-haiku-20241022'),
  evaluationPrompt: 'Rate the quality of this text from 1-10.',
  scoreConfig: { type: 'numeric', min: 1, max: 10 },
});

const batchEvaluator = new BatchEvaluator({
  evaluators: [evaluator],
  concurrency: 5,
  onResult: (result) => console.log(`Row ${result.rowId}: ${result.results[0].score}`),
});

const result = await batchEvaluator.evaluate({ filePath: './data.csv' });

await batchEvaluator.export({
  format: 'csv',
  destination: './results.csv',
});
```
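Results can also be exported as JSON (see the Export guide). A sketch reusing the same `export` call, with the destination path as an illustrative placeholder:

```ts
// Same batch results, exported as JSON (path is illustrative)
await batchEvaluator.export({
  format: 'json',
  destination: './results.json',
});
```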
## Documentation

| Guide | Description |
|---|---|
| Metrics | BLEU, TER, BERTScore, Coherence, Perplexity |
| Evaluator | AI-powered evaluation and scoring |
| Batch Evaluation | Concurrent processing, progress tracking |
| Export | CSV and JSON export options |
## Supported Providers

Via the Vercel AI SDK: OpenAI, Anthropic, Google, Mistral, Groq, Cohere, and any OpenAI-compatible endpoint.
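For OpenAI-compatible endpoints, the AI SDK's `createOpenAI` accepts a custom base URL. A minimal sketch, with the URL, key, and model name as placeholders:

```ts
import { createOpenAI } from '@ai-sdk/openai';
import { Evaluator } from '@loveholidays/eval-kit';

// Point the OpenAI provider at any OpenAI-compatible endpoint
// (baseURL, apiKey, and model name below are placeholders)
const compatible = createOpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'placeholder-key',
});

const evaluator = Evaluator.create('fluency', compatible('llama-3.1-8b'));
```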
## Development

```bash
pnpm install  # Install dependencies
pnpm build    # Build the project
pnpm test     # Run tests
pnpm lint     # Lint code
```

## Releasing

This package uses Changesets for version management and is published to the npm registry.
1. **Add a changeset** when you make changes that should be released:

   ```bash
   pnpm changeset
   ```

   - Select the version bump type (patch/minor/major)
   - Write a summary of your changes
   - This creates a markdown file in `.changeset/` (see the example after these steps)

2. **Merge to main.** The CI will automatically:

   - Detect changesets
   - Bump the version in `package.json`
   - Update `CHANGELOG.md`
   - Publish to the npm registry
   - Push git tags
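A changeset is a small markdown file whose YAML frontmatter names the affected package and bump type, followed by the release summary. A sketch (the summary line is illustrative):

```md
---
"@loveholidays/eval-kit": patch
---

Describe the change that should appear in the changelog
```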
For local testing or manual releases:

```bash
pnpm build              # Build the package
pnpm changeset version  # Apply version bumps
pnpm changeset publish  # Publish to registry
```
| Type | When to use |
|---|---|
| `patch` | Bug fixes, small updates |
| `minor` | New features (backwards compatible) |
| `major` | Breaking changes |
## License

MIT