Conversation
experiments/run_slicegpt.py
Outdated
import torch
import wandb
from transformers.models.llama.modeling_llama import LlamaConfig
I think we might be missing one abstraction. We shouldn't need any model-specific imports here. hf_utils.get_model_and_tokenizer abstracts the model type away; we want the same for saving HF-compatible sliced models. This save abstraction would also remove the need for the Sliced<Model>ForCausalLM imports here.
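One way such a save/load abstraction could look (a sketch only; `register_sliced_model` and `get_sliced_model_class` are hypothetical names, not the repository's actual API) is a registry keyed on the HF `model_type` string, so callers never import model-specific sliced classes directly:

```python
# Hypothetical sketch: a registry in hf_utils mapping an HF model_type
# string to its sliced class, so experiment scripts stay model-agnostic.
# The Sliced*ForCausalLM classes below are stand-ins for the real ones.

_SLICED_MODEL_REGISTRY = {}

def register_sliced_model(model_type):
    """Decorator that records the sliced class for a given model_type."""
    def wrap(cls):
        _SLICED_MODEL_REGISTRY[model_type] = cls
        return cls
    return wrap

def get_sliced_model_class(model_type):
    """Look up the sliced class for a model type, e.g. 'llama' or 'phi'."""
    try:
        return _SLICED_MODEL_REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"No sliced model registered for type {model_type!r}")

@register_sliced_model("llama")
class SlicedLlamaForCausalLM:  # stand-in for the real class
    pass

@register_sliced_model("phi")
class SlicedPhiForCausalLM:  # stand-in for the real class
    pass
```

With this in place, a script would call `get_sliced_model_class(config.model_type)` instead of importing a concrete `Sliced<Model>ForCausalLM`.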
Since there is a single intermediate size, we can make this explicit by using the type ConstSlicingScheduler in the interface instead of SlicingScheduler. Otherwise, we can add support for all slicing schedulers and keep the base type SlicingScheduler. I don't mind how this is done - in this PR or a follow-up PR. We should update the PR title to reflect the decision and the contents of the PR.
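To illustrate the distinction being discussed (this is a minimal sketch; the class and method names are assumptions for discussion, not the repository's actual definitions): a `ConstSlicingScheduler` assigns every layer the same sliced dimension, whereas a general `SlicingScheduler` may vary it per layer.

```python
# Illustrative sketch only: a constant-size scheduler vs. the general base.

class SlicingScheduler:
    """Base type: maps each layer index to its sliced hidden dimension."""
    def get_embedding_dimensions(self) -> dict[int, int]:
        raise NotImplementedError

class ConstSlicingScheduler(SlicingScheduler):
    """Every layer is sliced to the same intermediate size."""
    def __init__(self, hidden_size: int, num_layers: int):
        self.hidden_size = hidden_size
        self.num_layers = num_layers

    def get_embedding_dimensions(self) -> dict[int, int]:
        # Constant schedule: identical dimension for all layers.
        return {i: self.hidden_size for i in range(self.num_layers)}
```

Typing the interface against `ConstSlicingScheduler` documents the single-size assumption; keeping `SlicingScheduler` leaves the door open to per-layer schedules.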
pashminacameron left a comment
Hmm :/ Tests are not working due to import issues.
…icrosoft/TransformerCompression into liana/make_model_HF_compatible
I would like to hold off merging this into main for a bit. I will work on this more (after the smaller fixes) and we can re-review.
Any update on when the changes will be merged?
    parallel_blocks=True,
)
sliced_model = SlicedPhiForCausalLM.from_pretrained(
The sliced_model can be of type SlicedPhiForCausalLM or SlicedLlamaForCausalLM at this point. Debugging..
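One way to resolve which sliced class to load (a sketch under the assumption that a sliced checkpoint keeps a standard HF `config.json` with a `model_type` field; the helper name is hypothetical) is to inspect the checkpoint before calling `from_pretrained`:

```python
# Hedged sketch: read the checkpoint's config.json to decide whether the
# saved model is a Phi or a Llama, instead of hard-coding the class.
import json
import pathlib

def detect_model_type(checkpoint_dir: str) -> str:
    """Return the HF model_type recorded in a checkpoint's config.json."""
    config_path = pathlib.Path(checkpoint_dir) / "config.json"
    config = json.loads(config_path.read_text())
    return config["model_type"]  # e.g. "phi" or "llama"
```

The result could then select between `SlicedPhiForCausalLM` and `SlicedLlamaForCausalLM`.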
model_adapter.use_cache = False
- tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True, token=token, local_files_only=local_model)
+ tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, token=token, local_files_only=local_model)
This change breaks loading of local models.
This should probably remain model_path.
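The fix could be made explicit with a tiny helper (illustrative only; `tokenizer_source` is a hypothetical name): when `local_files_only` is set, the tokenizer must be loaded from the local checkpoint path, not the hub model name.

```python
# Hypothetical helper: pick the identifier to pass to
# AutoTokenizer.from_pretrained depending on whether the model is local.
def tokenizer_source(model_name: str, model_path: str, local_model: bool) -> str:
    """Local checkpoints load from model_path; hub models from model_name."""
    return model_path if local_model else model_name
```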
We will work on merging these changes by the end of this week.
Hi, please let me know when these changes will be merged.
Hi @pashminacameron and @LianaMikael, is there a timeline for merging these changes to main?
This PR adds the implementations for sliced Phi and Llama models to make it easy to save and load sliced models.
The models can be initialized with a given scheduler (or no scheduler for zero sparsity) and support save_pretrained and from_pretrained methods like standard HF models.
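The intended round trip can be sketched with a stand-in class (this is not the PR's implementation, just an illustration of the save_pretrained/from_pretrained contract: configuration written to disk on save, reconstructed on load):

```python
# Illustrative stand-in showing the save/load contract the sliced models
# are described as following; the real classes also persist weights.
import json
import pathlib

class TinySlicedModel:
    def __init__(self, hidden_size: int):
        self.hidden_size = hidden_size

    def save_pretrained(self, save_dir: str) -> None:
        """Write the model's configuration to save_dir, HF-style."""
        d = pathlib.Path(save_dir)
        d.mkdir(parents=True, exist_ok=True)
        (d / "config.json").write_text(json.dumps({"hidden_size": self.hidden_size}))

    @classmethod
    def from_pretrained(cls, save_dir: str) -> "TinySlicedModel":
        """Rebuild the model from a previously saved directory."""
        cfg = json.loads((pathlib.Path(save_dir) / "config.json").read_text())
        return cls(cfg["hidden_size"])
```

A saved sliced model can then be reloaded with the same two calls users already know from standard HF models.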