Changes from all commits

46 commits
- 011d489 first stab at STT evals (AkhileshNegi, Jan 30, 2026)
- 7777290 Merge branch 'main' of github.com:ProjectTech4DevAI/kaapi-backend int… (AkhileshNegi, Jan 30, 2026)
- d8df80c Merge branch 'main' of github.com:ProjectTech4DevAI/kaapi-backend int… (AkhileshNegi, Jan 31, 2026)
- f1df7f9 fix migration naming (AkhileshNegi, Jan 31, 2026)
- cda0611 fixing endpoints (AkhileshNegi, Jan 31, 2026)
- ad5779f update dataset endpoint (AkhileshNegi, Jan 31, 2026)
- 01e2beb update types (AkhileshNegi, Jan 31, 2026)
- 1637007 updated dataset with URL (AkhileshNegi, Jan 31, 2026)
- 36af7e9 added few more testcases (AkhileshNegi, Jan 31, 2026)
- 78fd206 added storage to core for easy reuse (AkhileshNegi, Jan 31, 2026)
- 4ac2ca6 cleanup for audio duration (AkhileshNegi, Jan 31, 2026)
- d8b531c first stab at fixing celery task to cron (AkhileshNegi, Jan 31, 2026)
- 2295da5 added gemini as provider (AkhileshNegi, Feb 2, 2026)
- 25e6002 moving to batch job in gemini (AkhileshNegi, Feb 2, 2026)
- db2512e code refactoring, using batch requests and files similar to OpenAI (AkhileshNegi, Feb 2, 2026)
- ff29ddd few cleanups (AkhileshNegi, Feb 2, 2026)
- cd979fd updated migration (AkhileshNegi, Feb 3, 2026)
- b6c633a cleanup config for batch (AkhileshNegi, Feb 3, 2026)
- b6e6649 moved documentation to separate folder (AkhileshNegi, Feb 3, 2026)
- 719584d updated score format in stt result (AkhileshNegi, Feb 3, 2026)
- bf0b4c2 cleaner dataset sample count (AkhileshNegi, Feb 3, 2026)
- 68e6821 got rid of redundant sample count (AkhileshNegi, Feb 3, 2026)
- 2247faa removed deadcode (AkhileshNegi, Feb 3, 2026)
- 056612c removing more redundant code (AkhileshNegi, Feb 3, 2026)
- 13bb9cc clean few more cruds (AkhileshNegi, Feb 3, 2026)
- 7bbf811 more free from dead code (AkhileshNegi, Feb 3, 2026)
- 04e419c cleanup batch request code (AkhileshNegi, Feb 3, 2026)
- 09deab2 cleanup batch (AkhileshNegi, Feb 3, 2026)
- f6bf0c2 got rid of processed_samples as well (AkhileshNegi, Feb 3, 2026)
- d20084b cleanup provider_metadata from results (AkhileshNegi, Feb 3, 2026)
- 4afdd2d cleanup optimize results (AkhileshNegi, Feb 4, 2026)
- 3e62a98 cleanup queries (AkhileshNegi, Feb 4, 2026)
- 63de270 cleanup leftovers (AkhileshNegi, Feb 4, 2026)
- c95c044 added validation for provider (AkhileshNegi, Feb 4, 2026)
- 9aa6858 updated test suite (AkhileshNegi, Feb 4, 2026)
- 4a92416 coderabbit suggestions (AkhileshNegi, Feb 4, 2026)
- e204416 added few more testcases (AkhileshNegi, Feb 4, 2026)
- 0210dab added more testcases for coverage (AkhileshNegi, Feb 4, 2026)
- cce5f11 moving to file table (AkhileshNegi, Feb 5, 2026)
- 497427e Merge branch 'main' into feature/stt-evaluation (AkhileshNegi, Feb 6, 2026)
- 0d5a0f8 Merge branch 'feature/stt-evaluation' of github.com:ProjectTech4DevAI… (AkhileshNegi, Feb 6, 2026)
- 066f645 update migration (AkhileshNegi, Feb 6, 2026)
- a3428df updating with language id (AkhileshNegi, Feb 6, 2026)
- 5dcf743 updated testcases (AkhileshNegi, Feb 6, 2026)
- d07f6fa cleanup code (AkhileshNegi, Feb 6, 2026)
- 7f8cfaa removed language_id from evaluation run (AkhileshNegi, Feb 6, 2026)
462 changes: 462 additions & 0 deletions backend/app/alembic/versions/044_add_stt_evaluation_tables.py

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions backend/app/api/docs/stt_evaluation/create_dataset.md
@@ -0,0 +1,5 @@
Create a new STT evaluation dataset with audio samples.

Each sample specifies:
- **object_store_url**: S3 URL of the audio file (from the /evaluations/stt/files/audio endpoint)
- **ground_truth**: Reference transcription (optional; used for WER/CER metrics)
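
For illustration, a minimal creation request. This is a sketch, assuming the router is mounted under /evaluations/stt; the base URL, token, and sample values are placeholders, not confirmed by this diff:

```python
import requests

# Field names follow STTDatasetCreate in this PR; values are illustrative.
payload = {
    "name": "hindi-field-recordings",
    "description": "Pilot calls collected in January",  # optional
    "language_id": 1,  # optional; must reference an existing language
    "samples": [
        {
            "object_store_url": "s3://my-bucket/audio/sample-001.mp3",
            "ground_truth": "Reference transcription for WER/CER",  # optional
        }
    ],
}

resp = requests.post(
    "https://api.example.com/api/v1/evaluations/stt/datasets",  # prefix assumed
    json=payload,
    headers={"Authorization": "Bearer <token>"},
)
print(resp.json()["data"]["id"])  # APIResponse envelope assumed
```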
1 change: 1 addition & 0 deletions backend/app/api/docs/stt_evaluation/get_dataset.md
@@ -0,0 +1 @@
Get an STT dataset with its samples.
1 change: 1 addition & 0 deletions backend/app/api/docs/stt_evaluation/get_result.md
@@ -0,0 +1 @@
Get a single STT transcription result.
1 change: 1 addition & 0 deletions backend/app/api/docs/stt_evaluation/get_run.md
@@ -0,0 +1 @@
Get an STT evaluation run with its results.
1 change: 1 addition & 0 deletions backend/app/api/docs/stt_evaluation/list_datasets.md
@@ -0,0 +1 @@
List all STT evaluation datasets for the current project.
1 change: 1 addition & 0 deletions backend/app/api/docs/stt_evaluation/list_runs.md
@@ -0,0 +1 @@
List all STT evaluation runs for the current project.
8 changes: 8 additions & 0 deletions backend/app/api/docs/stt_evaluation/start_evaluation.md
@@ -0,0 +1,8 @@
Start an STT evaluation run on a dataset.

The evaluation will:
1. Process each audio sample through the specified providers
2. Generate transcriptions using the Gemini Batch API
3. Store results for human review

**Supported providers:** gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash
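
A hedged sketch of kicking off a run; the run endpoint path and request fields are not visible in this diff, so dataset_id and providers below are assumptions based on the description above:

```python
import requests

payload = {
    "dataset_id": 42,                   # dataset created earlier (field name assumed)
    "providers": ["gemini-2.5-flash"],  # any of the supported providers listed above
}

resp = requests.post(
    "https://api.example.com/api/v1/evaluations/stt/runs",  # path assumed
    json=payload,
    headers={"Authorization": "Bearer <token>"},
)
print(resp.json())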
5 changes: 5 additions & 0 deletions backend/app/api/docs/stt_evaluation/update_feedback.md
@@ -0,0 +1,5 @@
Update human feedback on an STT transcription result.

**Fields:**
- **is_correct**: Boolean indicating if the transcription is correct
- **comment**: Optional feedback comment explaining issues or observations
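
A sketch of submitting feedback, assuming a PATCH-style result endpoint; the method, path, and result ID are placeholders, not confirmed by this diff:

```python
import requests

payload = {
    "is_correct": False,
    "comment": "Numbers are transcribed as words; speaker change missed.",
}

resp = requests.patch(
    "https://api.example.com/api/v1/evaluations/stt/results/123/feedback",  # assumed
    json=payload,
    headers={"Authorization": "Bearer <token>"},
)
```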
7 changes: 7 additions & 0 deletions backend/app/api/docs/stt_evaluation/upload_audio.md
@@ -0,0 +1,7 @@
Upload a single audio file to S3 for STT evaluation.

**Supported formats:** mp3, wav, flac, m4a, ogg, webm

**Maximum file size:** 200 MB

Returns the S3 URL, which can be used when creating an STT dataset.
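
A minimal upload sketch using multipart form data; the form field name and response envelope keys are assumptions:

```python
import requests

with open("sample-001.mp3", "rb") as audio:
    resp = requests.post(
        "https://api.example.com/api/v1/evaluations/stt/files/audio",
        files={"file": ("sample-001.mp3", audio, "audio/mpeg")},  # field name assumed
        headers={"Authorization": "Bearer <token>"},
    )

# The returned S3 URL feeds object_store_url when creating a dataset.
object_store_url = resp.json()["data"]["object_store_url"]  # envelope keys assumed
```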
2 changes: 2 additions & 0 deletions backend/app/api/main.py
@@ -26,6 +26,7 @@
    collection_job,
)
from app.api.routes.evaluations import dataset as evaluation_dataset, evaluation
from app.api.routes import stt_evaluations
from app.core.config import settings

api_router = APIRouter()
@@ -40,6 +41,7 @@
api_router.include_router(doc_transformation_job.router)
api_router.include_router(evaluation_dataset.router)
api_router.include_router(evaluation.router)
api_router.include_router(stt_evaluations.router)
api_router.include_router(languages.router)
api_router.include_router(llm.router)
api_router.include_router(login.router)
5 changes: 5 additions & 0 deletions backend/app/api/routes/stt_evaluations/__init__.py
@@ -0,0 +1,5 @@
"""STT Evaluation API routes."""

from .router import router

__all__ = ["router"]
193 changes: 193 additions & 0 deletions backend/app/api/routes/stt_evaluations/dataset.py
@@ -0,0 +1,193 @@
"""STT dataset API routes."""

import logging

from fastapi import APIRouter, Body, Depends, HTTPException, Query

from app.api.deps import AuthContextDep, SessionDep
from app.api.permissions import Permission, require_permission
from app.crud.file import get_files_by_ids
from app.crud.language import get_language_by_id
from app.crud.stt_evaluations import (
get_stt_dataset_by_id,
list_stt_datasets,
get_samples_by_dataset_id,
)
from app.models.stt_evaluation import (
STTDatasetCreate,
STTDatasetPublic,
STTDatasetWithSamples,
STTSamplePublic,
)
from app.services.stt_evaluations.dataset import upload_stt_dataset
from app.utils import APIResponse, load_description

logger = logging.getLogger(__name__)

router = APIRouter()


@router.post(
"/datasets",
response_model=APIResponse[STTDatasetPublic],
dependencies=[Depends(require_permission(Permission.REQUIRE_PROJECT))],
summary="Create STT dataset",
description=load_description("stt_evaluation/create_dataset.md"),
)
def create_dataset(
_session: SessionDep,
auth_context: AuthContextDep,
dataset_create: STTDatasetCreate = Body(...),
) -> APIResponse[STTDatasetPublic]:
"""Create an STT evaluation dataset."""
# Validate language_id if provided
if dataset_create.language_id is not None:
language = get_language_by_id(
session=_session, language_id=dataset_create.language_id
)
if not language:
raise HTTPException(
status_code=400, detail="Invalid language_id: language not found"
)

dataset, samples = upload_stt_dataset(
session=_session,
name=dataset_create.name,
samples=dataset_create.samples,
organization_id=auth_context.organization_.id,
project_id=auth_context.project_.id,
description=dataset_create.description,
language_id=dataset_create.language_id,
)

return APIResponse.success_response(
data=STTDatasetPublic(
id=dataset.id,
name=dataset.name,
description=dataset.description,
type=dataset.type,
language_id=dataset.language_id,
object_store_url=dataset.object_store_url,
dataset_metadata=dataset.dataset_metadata,
sample_count=len(samples),
organization_id=dataset.organization_id,
project_id=dataset.project_id,
inserted_at=dataset.inserted_at,
updated_at=dataset.updated_at,
)
)


@router.get(
"/datasets",
response_model=APIResponse[list[STTDatasetPublic]],
dependencies=[Depends(require_permission(Permission.REQUIRE_PROJECT))],
summary="List STT datasets",
description=load_description("stt_evaluation/list_datasets.md"),
)
def list_datasets(
_session: SessionDep,
auth_context: AuthContextDep,
limit: int = Query(50, ge=1, le=100, description="Maximum results to return"),
offset: int = Query(0, ge=0, description="Number of results to skip"),
) -> APIResponse[list[STTDatasetPublic]]:
"""List STT evaluation datasets."""
datasets, total = list_stt_datasets(
session=_session,
org_id=auth_context.organization_.id,
project_id=auth_context.project_.id,
limit=limit,
offset=offset,
)

return APIResponse.success_response(
data=datasets,
metadata={"total": total, "limit": limit, "offset": offset},
)


@router.get(
"/datasets/{dataset_id}",
response_model=APIResponse[STTDatasetWithSamples],
dependencies=[Depends(require_permission(Permission.REQUIRE_PROJECT))],
summary="Get STT dataset",
description=load_description("stt_evaluation/get_dataset.md"),
)
def get_dataset(
_session: SessionDep,
auth_context: AuthContextDep,
dataset_id: int,
include_samples: bool = Query(True, description="Include samples in response"),
sample_limit: int = Query(100, ge=1, le=1000, description="Max samples to return"),
sample_offset: int = Query(0, ge=0, description="Sample offset"),
) -> APIResponse[STTDatasetWithSamples]:
"""Get an STT evaluation dataset."""
dataset = get_stt_dataset_by_id(
session=_session,
dataset_id=dataset_id,
org_id=auth_context.organization_.id,
project_id=auth_context.project_.id,
)

if not dataset:
raise HTTPException(status_code=404, detail="Dataset not found")

samples = []
samples_total = (dataset.dataset_metadata or {}).get("sample_count", 0)

if include_samples:
sample_records = get_samples_by_dataset_id(
session=_session,
dataset_id=dataset_id,
org_id=auth_context.organization_.id,
project_id=auth_context.project_.id,
limit=sample_limit,
offset=sample_offset,
)

# Fetch file records to get object_store_url
file_ids = [s.file_id for s in sample_records]
file_records = get_files_by_ids(
session=_session,
file_ids=file_ids,
organization_id=auth_context.organization_.id,
project_id=auth_context.project_.id,
)
file_map = {f.id: f for f in file_records}

samples = [
STTSamplePublic(
id=s.id,
file_id=s.file_id,
object_store_url=file_map.get(s.file_id).object_store_url
if s.file_id in file_map
else None,
language_id=s.language_id,
ground_truth=s.ground_truth,
sample_metadata=s.sample_metadata,
dataset_id=s.dataset_id,
organization_id=s.organization_id,
project_id=s.project_id,
inserted_at=s.inserted_at,
updated_at=s.updated_at,
)
for s in sample_records
]

return APIResponse.success_response(
data=STTDatasetWithSamples(
id=dataset.id,
name=dataset.name,
description=dataset.description,
type=dataset.type,
language_id=dataset.language_id,
object_store_url=dataset.object_store_url,
dataset_metadata=dataset.dataset_metadata,
organization_id=dataset.organization_id,
project_id=dataset.project_id,
inserted_at=dataset.inserted_at,
updated_at=dataset.updated_at,
samples=samples,
),
metadata={"samples_total": samples_total},
)
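
Since get_dataset pages samples via sample_limit and sample_offset, here is a hedged client-side sketch for iterating a large dataset; the base URL, auth header, and response envelope are assumptions:

```python
import requests

BASE = "https://api.example.com/api/v1/evaluations/stt"  # prefix assumed
HEADERS = {"Authorization": "Bearer <token>"}


def iter_samples(dataset_id: int, page_size: int = 100):
    """Yield every sample of a dataset, paging via sample_limit/sample_offset."""
    offset = 0
    while True:
        resp = requests.get(
            f"{BASE}/datasets/{dataset_id}",
            params={
                "include_samples": True,
                "sample_limit": page_size,
                "sample_offset": offset,
            },
            headers=HEADERS,
        )
        samples = resp.json()["data"]["samples"]  # APIResponse envelope assumed
        yield from samples
        if len(samples) < page_size:
            break  # short page means we have reached the end
        offset += page_size
```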