Conversation

I'm not sure what the situation is currently on Windows. I kept it optional for ExLlama because it can be a little tricky to get it up and running, even on Linux, where it sometimes takes 15-30 minutes to install if it needs to compile, but especially on Windows. Or at least that used to be the case.

I think it should work on Windows too now. Ooba installs wheels from https://github.com/bdashore3/flash-attention for Windows, and from https://github.com/Dao-AILab/flash-attention for Linux. I'm not so sure about the package on PyPI, though.
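
For reference, a minimal sketch of what a platform-aware install step could look like (this is an illustration, not the actual ooba requirements logic; the wheel URL parameter is a placeholder you'd fill in from the bdashore3 releases page):

```python
import platform
import subprocess
import sys

def install_flash_attn(windows_wheel_url=None):
    """Install flash-attn, preferring a prebuilt wheel on Windows."""
    if platform.system() == "Windows" and windows_wheel_url:
        # Point pip at a prebuilt wheel, e.g. one published under
        # https://github.com/bdashore3/flash-attention/releases
        target = windows_wheel_url
    else:
        # On Linux this resolves to the upstream package
        # (https://github.com/Dao-AILab/flash-attention); pip may compile it
        # from source, which is where the long install times come from.
        target = "flash-attn"
    subprocess.check_call([sys.executable, "-m", "pip", "install", target])
```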

flash-attn did not install on Windows for me, so it's worth investigating before integrating if the intent is for this to be nice and light and user-friendly.
|
|
|
I'd agree, it's nice to have it optional. I do actually have `packaging` installed, so I'm not really sure. I gave up trying to get flash-attn working as it was just causing massive headaches.
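For what it's worth, keeping it optional can be as simple as a guarded import with a fallback; a minimal sketch (names here are illustrative, not from this repo):

```python
# Treat flash-attn as an optional dependency: use it when the import works,
# otherwise fall back to the default attention path.
try:
    import flash_attn  # noqa: F401
    HAS_FLASH_ATTN = True
except ImportError:
    HAS_FLASH_ATTN = False

def attention_backend():
    # Callers branch on this instead of hard-requiring the package.
    return "flash_attn" if HAS_FLASH_ATTN else "default"
```
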
A speedup, especially as context grows.