Hi @Kaichengalex 🤗
I'm Niels and I work as part of the open-source team at Hugging Face. I discovered your work through Hugging Face's daily papers, where it was featured: https://huggingface.co/papers/2601.10305.
The paper page lets people find artifacts related to your paper. You can also claim the paper as yours, which will show it on your public profile, and add GitHub and project page URLs.
It's great to see the DanQing100M dataset already hosted on the Hub! It's an excellent resource for the Chinese vision-language community.
Would you also like to host the SigLIP2 model checkpoints that you've pre-trained on DanQing on https://huggingface.co/models?
Hosting the weights on Hugging Face will give them more visibility and enable better discoverability. We can add metadata tags so that people find the models more easily (e.g. when filtering for Chinese multimodal models), link them to the paper page, etc.
Additionally, I noticed you mentioned storage limitations for hosting the full 12TB of images. Have you considered using the Parquet format or WebDataset? Both are efficient for large-scale vision datasets and are commonly used for hosting large datasets on Hugging Face, and they let users stream the data directly instead of downloading everything first.
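In case it helps, here's a rough, stdlib-only sketch of the WebDataset layout: each shard is a plain tar file, and files that share a basename (the key) form one sample, with the extension naming the field (e.g. `000000.jpg` + `000000.txt`). This is just an illustration, not a description of how DanQing is stored today; the function names, shard name, and placeholder image bytes are all hypothetical. In practice you'd likely use the `webdataset` package or the `datasets` library for reading rather than hand-rolling it.

```python
import io
import os
import tarfile
import tempfile


def write_webdataset_shard(path, samples):
    """Write (key, image_bytes, caption) tuples into one uncompressed tar shard.

    Members sharing a key are grouped into one sample by WebDataset-style
    loaders; the extension identifies the field ("jpg" image, "txt" caption).
    """
    with tarfile.open(path, "w") as tar:
        for key, image_bytes, caption in samples:
            fields = (("jpg", image_bytes), ("txt", caption.encode("utf-8")))
            for ext, payload in fields:
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def read_webdataset_shard(path):
    """Group tar members back into {key: {ext: bytes}} (a toy version of what loaders do)."""
    samples = {}
    with tarfile.open(path, "r") as tar:
        for member in tar.getmembers():
            # Split at the first dot: "000000.jpg" -> key "000000", field "jpg".
            key, ext = member.name.split(".", 1)
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    return samples


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shard = os.path.join(tmp, "shard-000000.tar")  # hypothetical shard name
        # Placeholder bytes stand in for a real JPEG.
        write_webdataset_shard(shard, [("000000", b"fake-jpeg-bytes", "一只猫")])
        print(read_webdataset_shard(shard)["000000"]["txt"].decode("utf-8"))
```

Splitting 12TB into many shards like this keeps each file a manageable size and lets users stream shard by shard.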
If you're down, I'm leaving a guide for uploading models here.
Let me know if you're interested or need any guidance!
Kind regards,
Niels