Evaluation: STT #571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
**Status:** Open. AkhileshNegi wants to merge 46 commits into `main` from `feature/stt-evaluation`.
Commits (46):
- `011d489` first stab at STT evals
- `7777290` Merge branch 'main' of github.com:ProjectTech4DevAI/kaapi-backend int…
- `d8df80c` Merge branch 'main' of github.com:ProjectTech4DevAI/kaapi-backend int…
- `f1df7f9` fix migration naming
- `cda0611` fixing endpoints
- `ad5779f` update dataset endpoint
- `01e2beb` update types
- `1637007` updated dataset with URL
- `36af7e9` added few more testcases
- `78fd206` added storage to core for easy reuse
- `4ac2ca6` cleanup for audio duration
- `d8b531c` first stab at fixing celery task to cron
- `2295da5` added gemini as provider
- `25e6002` moving to batch job in gemini
- `db2512e` code refactoring, using batch requests and files similar to OpenAI
- `ff29ddd` few cleanups
- `cd979fd` updated migration
- `b6c633a` cleanup config for batch
- `b6e6649` moved documentation to separate folder
- `719584d` updated score format in stt result
- `bf0b4c2` cleaner dataset sample count
- `68e6821` got rid of redundant sample count
- `2247faa` removed deadcode
- `056612c` removing more redundant code
- `13bb9cc` clean few more cruds
- `7bbf811` more free from dead code
- `04e419c` cleanup batch request code
- `09deab2` cleanup batch
- `f6bf0c2` got rid of processed_samples as well
- `d20084b` cleanup provider_metadata from results
- `4afdd2d` cleanup optimize results
- `3e62a98` cleanup queries
- `63de270` cleanup leftovers
- `c95c044` added validation for provider
- `9aa6858` updated test suite
- `4a92416` coderabbit suggestions
- `e204416` added few more testcases
- `0210dab` added more testcases for coverage
- `cce5f11` moving to file table
- `497427e` Merge branch 'main' into feature/stt-evaluation
- `0d5a0f8` Merge branch 'feature/stt-evaluation' of github.com:ProjectTech4DevAI…
- `066f645` update migration
- `a3428df` updating with language id
- `5dcf743` updated testcases
- `d07f6fa` cleanup code
- `7f8cfaa` removed language_id from evaluation run
`backend/app/alembic/versions/044_add_stt_evaluation_tables.py`: 462 additions, 0 deletions (large diff not rendered by default).
New file (+5 lines):

> Create a new STT evaluation dataset with audio samples.
>
> Each sample requires:
> - **object_store_url**: S3 URL of the audio file (from the /evaluations/stt/files/audio endpoint)
> - **ground_truth**: reference transcription (optional, used for WER/CER metrics)
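As an illustration, a request body for this endpoint might look like the sketch below. The per-sample fields come from the description above; the top-level `name`, `description`, and `language_id` fields follow the `STTDatasetCreate` model added in this PR, and every value is a placeholder:

```python
import json

# Hypothetical body for POST /datasets (create STT dataset).
# All values are placeholders; field names follow STTDatasetCreate.
payload = {
    "name": "hindi-call-recordings-v1",
    "description": "Pilot STT evaluation set",
    "language_id": 1,  # optional; must reference an existing language
    "samples": [
        {
            # S3 URL returned by the audio upload endpoint
            "object_store_url": "s3://example-bucket/audio/sample-001.wav",
            # optional reference transcription for WER/CER scoring
            "ground_truth": "hello, how can I help you today?",
        }
    ],
}
body = json.dumps(payload)
```

Omitting `ground_truth` simply skips reference-based metrics for that sample.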
New file (+1 line):

> Get an STT dataset with its samples.
New file (+1 line):

> Get a single STT transcription result.
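When a result has a ground truth, it is typically scored with word error rate (WER), one of the metrics named earlier in this PR. The PR's own scoring code is not shown in this diff; the following is a generic WER sketch using word-level edit distance:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

CER is the same computation over characters instead of words.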
New file (+1 line):

> Get an STT evaluation run with its results.
New file (+1 line):

> List all STT evaluation datasets for the current project.
New file (+1 line):

> List all STT evaluation runs for the current project.
New file (+8 lines):

> Start an STT evaluation run on a dataset.
>
> The evaluation will:
> 1. Process each audio sample through the specified providers
> 2. Generate transcriptions using the Gemini Batch API
> 3. Store results for human review
>
> **Supported providers:** gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash
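A run-creation request might be sketched as follows. Note that `dataset_id` and `providers` are assumed field names (only the provider model IDs are documented here), and the check below merely mirrors the idea of the provider validation this PR adds in commit `c95c044`:

```python
# Hypothetical body for starting an evaluation run; "dataset_id" and
# "providers" are assumed field names, not confirmed by this diff.
run_request = {
    "dataset_id": 42,
    "providers": ["gemini-2.5-flash", "gemini-2.0-flash"],
}

# Only the documented Gemini models should pass provider validation.
SUPPORTED_PROVIDERS = {"gemini-2.5-pro", "gemini-2.5-flash", "gemini-2.0-flash"}
unknown = [p for p in run_request["providers"] if p not in SUPPORTED_PROVIDERS]
if unknown:
    raise ValueError(f"Unsupported providers: {unknown}")
```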
New file (+5 lines):

> Update human feedback on an STT transcription result.
>
> **Fields:**
> - **is_correct**: boolean indicating whether the transcription is correct
> - **comment**: optional feedback comment explaining issues or observations
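A small helper sketch for building this request body; the field names are taken from the list above, and `comment` is dropped when not supplied since it is optional:

```python
from typing import Optional


def make_feedback(is_correct: bool, comment: Optional[str] = None) -> dict:
    """Build an update-feedback body; omits comment when not provided."""
    body: dict = {"is_correct": is_correct}
    if comment is not None:
        body["comment"] = comment
    return body
```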
New file (+7 lines):

> Upload a single audio file to S3 for STT evaluation.
>
> **Supported formats:** mp3, wav, flac, m4a, ogg, webm
>
> **Maximum file size:** 200 MB
>
> Returns the S3 URL, which can be used when creating an STT dataset.
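A client can mirror these constraints before uploading to avoid a round trip on obviously invalid files. The sketch below is an assumed pre-check, not code from the PR:

```python
import os

# Constraints documented by the upload endpoint above.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg", ".webm"}
MAX_BYTES = 200 * 1024 * 1024  # 200 MB


def check_audio_upload(filename: str, size_bytes: int) -> bool:
    """Client-side pre-check mirroring the endpoint's documented limits."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in ALLOWED_EXTENSIONS and 0 < size_bytes <= MAX_BYTES
```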
New file (+5 lines):

```python
"""STT Evaluation API routes."""

from .router import router

__all__ = ["router"]
```
New file (+193 lines):

```python
"""STT dataset API routes."""

import logging

from fastapi import APIRouter, Body, Depends, HTTPException, Query

from app.api.deps import AuthContextDep, SessionDep
from app.api.permissions import Permission, require_permission
from app.crud.file import get_files_by_ids
from app.crud.language import get_language_by_id
from app.crud.stt_evaluations import (
    get_stt_dataset_by_id,
    list_stt_datasets,
    get_samples_by_dataset_id,
)
from app.models.stt_evaluation import (
    STTDatasetCreate,
    STTDatasetPublic,
    STTDatasetWithSamples,
    STTSamplePublic,
)
from app.services.stt_evaluations.dataset import upload_stt_dataset
from app.utils import APIResponse, load_description

logger = logging.getLogger(__name__)

router = APIRouter()


@router.post(
    "/datasets",
    response_model=APIResponse[STTDatasetPublic],
    dependencies=[Depends(require_permission(Permission.REQUIRE_PROJECT))],
    summary="Create STT dataset",
    description=load_description("stt_evaluation/create_dataset.md"),
)
def create_dataset(
    _session: SessionDep,
    auth_context: AuthContextDep,
    dataset_create: STTDatasetCreate = Body(...),
) -> APIResponse[STTDatasetPublic]:
    """Create an STT evaluation dataset."""
    # Validate language_id if provided
    if dataset_create.language_id is not None:
        language = get_language_by_id(
            session=_session, language_id=dataset_create.language_id
        )
        if not language:
            raise HTTPException(
                status_code=400, detail="Invalid language_id: language not found"
            )

    dataset, samples = upload_stt_dataset(
        session=_session,
        name=dataset_create.name,
        samples=dataset_create.samples,
        organization_id=auth_context.organization_.id,
        project_id=auth_context.project_.id,
        description=dataset_create.description,
        language_id=dataset_create.language_id,
    )

    return APIResponse.success_response(
        data=STTDatasetPublic(
            id=dataset.id,
            name=dataset.name,
            description=dataset.description,
            type=dataset.type,
            language_id=dataset.language_id,
            object_store_url=dataset.object_store_url,
            dataset_metadata=dataset.dataset_metadata,
            sample_count=len(samples),
            organization_id=dataset.organization_id,
            project_id=dataset.project_id,
            inserted_at=dataset.inserted_at,
            updated_at=dataset.updated_at,
        )
    )


@router.get(
    "/datasets",
    response_model=APIResponse[list[STTDatasetPublic]],
    dependencies=[Depends(require_permission(Permission.REQUIRE_PROJECT))],
    summary="List STT datasets",
    description=load_description("stt_evaluation/list_datasets.md"),
)
def list_datasets(
    _session: SessionDep,
    auth_context: AuthContextDep,
    limit: int = Query(50, ge=1, le=100, description="Maximum results to return"),
    offset: int = Query(0, ge=0, description="Number of results to skip"),
) -> APIResponse[list[STTDatasetPublic]]:
    """List STT evaluation datasets."""
    datasets, total = list_stt_datasets(
        session=_session,
        org_id=auth_context.organization_.id,
        project_id=auth_context.project_.id,
        limit=limit,
        offset=offset,
    )

    return APIResponse.success_response(
        data=datasets,
        metadata={"total": total, "limit": limit, "offset": offset},
    )


@router.get(
    "/datasets/{dataset_id}",
    response_model=APIResponse[STTDatasetWithSamples],
    dependencies=[Depends(require_permission(Permission.REQUIRE_PROJECT))],
    summary="Get STT dataset",
    description=load_description("stt_evaluation/get_dataset.md"),
)
def get_dataset(
    _session: SessionDep,
    auth_context: AuthContextDep,
    dataset_id: int,
    include_samples: bool = Query(True, description="Include samples in response"),
    sample_limit: int = Query(100, ge=1, le=1000, description="Max samples to return"),
    sample_offset: int = Query(0, ge=0, description="Sample offset"),
) -> APIResponse[STTDatasetWithSamples]:
    """Get an STT evaluation dataset."""
    dataset = get_stt_dataset_by_id(
        session=_session,
        dataset_id=dataset_id,
        org_id=auth_context.organization_.id,
        project_id=auth_context.project_.id,
    )

    if not dataset:
        raise HTTPException(status_code=404, detail="Dataset not found")

    samples = []
    samples_total = (dataset.dataset_metadata or {}).get("sample_count", 0)

    if include_samples:
        sample_records = get_samples_by_dataset_id(
            session=_session,
            dataset_id=dataset_id,
            org_id=auth_context.organization_.id,
            project_id=auth_context.project_.id,
            limit=sample_limit,
            offset=sample_offset,
        )

        # Fetch file records to get object_store_url
        file_ids = [s.file_id for s in sample_records]
        file_records = get_files_by_ids(
            session=_session,
            file_ids=file_ids,
            organization_id=auth_context.organization_.id,
            project_id=auth_context.project_.id,
        )
        file_map = {f.id: f for f in file_records}

        samples = [
            STTSamplePublic(
                id=s.id,
                file_id=s.file_id,
                object_store_url=file_map.get(s.file_id).object_store_url
                if s.file_id in file_map
                else None,
                language_id=s.language_id,
                ground_truth=s.ground_truth,
                sample_metadata=s.sample_metadata,
                dataset_id=s.dataset_id,
                organization_id=s.organization_id,
                project_id=s.project_id,
                inserted_at=s.inserted_at,
                updated_at=s.updated_at,
            )
            for s in sample_records
        ]

    return APIResponse.success_response(
        data=STTDatasetWithSamples(
            id=dataset.id,
            name=dataset.name,
            description=dataset.description,
            type=dataset.type,
            language_id=dataset.language_id,
            object_store_url=dataset.object_store_url,
            dataset_metadata=dataset.dataset_metadata,
            organization_id=dataset.organization_id,
            project_id=dataset.project_id,
            inserted_at=dataset.inserted_at,
            updated_at=dataset.updated_at,
            samples=samples,
        ),
        metadata={"samples_total": samples_total},
    )
```
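One detail worth noting in `get_dataset`: rather than querying a file record per sample, it fetches all file records in one call and indexes them by id. The same batch-fetch-then-map pattern in isolation, with stand-in record types (not the PR's models):

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    id: int
    object_store_url: str


@dataclass
class Sample:
    id: int
    file_id: int


def attach_urls(samples: list, files: list) -> list:
    """One lookup table instead of a query per sample; missing files map to None."""
    file_map = {f.id: f for f in files}
    return [
        (s.id, file_map[s.file_id].object_store_url if s.file_id in file_map else None)
        for s in samples
    ]
```

This keeps the endpoint at two queries regardless of how many samples are returned.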