Dear all,
First of all, thanks a lot for the amazing work and the benchmarks. The dataset on the Hugging Face website contains text, labels, and dimensions. How can I access the audio and video files that correspond to each row of the dataset? Making them available would be a big contribution.
Best,
Esam