tom-pinckney/reference_finder

Description

reference_finder is a package for finding references in a target corpus that come from a search corpus. It uses the cosine similarities of rolling embeddings to find subsections of target documents that likely came from a source document. It works for audio, video, and text files.
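The idea of comparing rolling embeddings by cosine similarity can be illustrated with a minimal sketch. This is not reference_finder's actual API: the function names (`embed`, `cosine`, `rolling_windows`, `find_references`), the window size, the threshold, and the toy bag-of-words "embedding" are all assumptions made for illustration. A real pipeline would presumably use a learned embedding model instead.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    Assumption: the real package likely uses a learned embedding instead.
    """
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rolling_windows(words, size, step):
    """Yield (start_index, window_text) for overlapping word windows."""
    last_start = max(len(words) - size, 0)
    for start in range(0, last_start + 1, step):
        yield start, " ".join(words[start:start + size])


def find_references(source, target, size=9, step=1, threshold=0.95):
    """Return windows of `target` whose similarity to `source` meets the threshold.

    Hypothetical helper: the source passage is embedded once and compared
    against every rolling window of the target document.
    """
    source_vec = embed(source)
    hits = []
    for start, window in rolling_windows(target.split(), size, step):
        score = cosine(source_vec, embed(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


source = "the quick brown fox jumps over the lazy dog"
target = ("yesterday I read that the quick brown fox jumps over "
          "the lazy dog and I laughed")

for start, window, score in find_references(source, target):
    print(start, repr(window), round(score, 3))
```

Overlapping windows let a match be located even when the quoted passage does not align with sentence or paragraph boundaries. The window size and threshold trade recall against false positives and would need tuning for real data.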

Requirements

Transcribing audio and video files requires ffmpeg and whisper. In Google Colab, both can be installed with the following commands:

!sudo apt update && sudo apt install ffmpeg
!pip install -q git+https://github.com/openai/whisper.git
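After installation, it can be useful to confirm that both requirements are visible from Python before transcribing anything. The sketch below is an illustration only and is not part of reference_finder: `missing_requirements` and `transcribe` are hypothetical helper names, and the `"base"` model size is an arbitrary choice. The calls `whisper.load_model` and `model.transcribe` come from the openai-whisper Python package.

```python
import importlib.util
import shutil


def missing_requirements():
    """Return the names of transcription requirements that are not available."""
    missing = []
    # ffmpeg must be on PATH so whisper can decode audio/video files.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    # Check for the whisper module without importing it.
    if importlib.util.find_spec("whisper") is None:
        missing.append("whisper")
    return missing


def transcribe(path, model_name="base"):
    """Transcribe an audio or video file to text (hypothetical helper)."""
    missing = missing_requirements()
    if missing:
        raise RuntimeError("missing requirements: " + ", ".join(missing))
    import whisper  # imported lazily so the check above can run without it

    model = whisper.load_model(model_name)
    return model.transcribe(path)["text"]


print("missing:", missing_requirements())
```

Calling `transcribe("talk.mp4")` on a file of your own (the filename here is a placeholder) would return the transcript as a single string, which could then be searched like any other text document.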
