Hi,
Thanks for the great tool. I'm really interested in the concept, and I'm trying to optimize the process to analyze a large dataset (up to 1M cells). Digging into the repository, I found you provide the add_dataset.py script, which should help analyze large numbers of cells by allowing the analysis to be split into chunks.
However, it seems that each chunk must then be analyzed sequentially, so the run time remains impractical. If I understand correctly, for each new chunk I have to provide the folder of previous results, the mtx files for all previously analyzed chunks, and the new chunk. The tool then appears to repeat the same process as in a single-chunk run, so each new chunk takes roughly the same time to analyze.
In my case, a 1000-cell chunk takes about 1 h with 32 cores (the maximum I can use per node), so the full 1M-cell dataset would take ~1000 h (~42 days).
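For reference, here is the back-of-envelope arithmetic behind that estimate:

```python
# Sequential runtime estimate: 1M cells in 1000-cell chunks,
# ~1 h per chunk on 32 cores (observed).
cells_total = 1_000_000
chunk_size = 1_000
hours_per_chunk = 1.0

n_chunks = cells_total // chunk_size       # 1000 chunks
total_hours = n_chunks * hours_per_chunk   # 1000 h
total_days = total_hours / 24              # ~41.7 days
print(n_chunks, total_hours, round(total_days, 1))
```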
Am I doing something wrong here?
Is there any way to analyze all the small chunks in parallel and then "merge" results?
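To illustrate the kind of workflow I have in mind, here is a minimal sketch. The `analyze_chunk` and `merge_results` functions are purely hypothetical placeholders (they do not correspond to anything in add_dataset.py); threads are used only to keep the sketch self-contained, whereas on a cluster each chunk would be its own job:

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_chunk(mtx_path):
    # Hypothetical: analyze one chunk independently of all others.
    # In practice this would be one per-chunk run of the tool.
    return {"chunk": mtx_path, "n_cells": 1000}

def merge_results(per_chunk_results):
    # Hypothetical: combine the independent per-chunk outputs
    # into a single result for the whole dataset.
    return sorted(per_chunk_results, key=lambda r: r["chunk"])

chunks = [f"chunk_{i:04d}.mtx" for i in range(4)]

# Run all chunks concurrently instead of sequentially...
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(analyze_chunk, chunks))

# ...then merge once at the end.
merged = merge_results(results)
print(len(merged))
```

If the per-chunk analyses were independent like this, the wall-clock time would be bounded by one chunk plus the merge, rather than by the number of chunks.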
Thanks!