[test] Add LlamaDecoderLayerWithKVCache using captured input #215
Conversation
This adds a LlamaDecoderLayer test that uses captured input. TICO-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>
Hmm... Ah, `./ccex format` fixed the format error by default. I added the change.
```python
import copy
import inspect

from transformers.models.llama.modeling_llama import LlamaDecoderLayer

forward_old = LlamaDecoderLayer.forward


def capture_and_forward(self, *args, **kwargs):
    global captured_input

    # Prepare the args tuple for TICO.convert().
    # Get arg names in positional-argument order using inspect.
    sig = inspect.signature(forward_old)
    args_names = [
        # The signature includes `self` and `kwargs`;
        # retrieve only the ordinary positional inputs.
        name for name in sig.parameters.keys() if name not in ("self", "kwargs")
    ]

    args_dict = dict(zip(args_names, args))
    args_dict.update(kwargs)

    def populate_args(args_dict, keys_to_remove):
        for key in keys_to_remove:
            args_dict.pop(key, None)
        args_tuple = tuple(args_dict.get(name, None) for name in args_names)
        return copy.deepcopy(args_tuple)

    # Capture only once the KV cache is populated, i.e. on a decode step
    # after prefill.
    if len(args_dict["past_key_value"].key_cache) != 0:
        input_to_remove = ["use_cache"]
        captured_input = populate_args(args_dict, input_to_remove)

    return forward_old(self, *args, **kwargs)
```
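For context, here is a minimal sketch of how this patch might be driven end to end. The `model`/`inputs` setup and the final conversion call are assumptions inferred from the `TICO.convert()` comment in `capture_and_forward`, not code from this PR:

```python
# Sketch only: patch the layer, run generation past the prefill step so the
# KV cache is non-empty, then restore the original forward.
import torch

LlamaDecoderLayer.forward = capture_and_forward
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=2)  # the decode step triggers the capture
LlamaDecoderLayer.forward = forward_old

# `captured_input` now holds a deep-copied args tuple for one decoder layer,
# presumably to be passed to TICO.convert(), e.g.:
#   circle_model = tico.convert(model.model.layers[0], captured_input)
```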
It would be good to put this capture code somewhere like `TICO/util`.
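A rough sketch of what such a shared helper might look like, assuming a hypothetical `tico/utils` module (all names below are illustrative, not part of this PR):

```python
# Hypothetical reusable helper, e.g. tico/utils/capture.py.
import copy
import inspect
from contextlib import contextmanager


@contextmanager
def capture_forward_inputs(module_cls, skip=("use_cache",), when=lambda d: True):
    """Temporarily patch `module_cls.forward` to record its call arguments.

    Yields a one-element list whose slot holds the last captured args tuple.
    """
    forward_old = module_cls.forward
    sig = inspect.signature(forward_old)
    names = [n for n in sig.parameters if n not in ("self", "kwargs")]
    captured = [None]

    def wrapper(self, *args, **kwargs):
        args_dict = dict(zip(names, args))
        args_dict.update(kwargs)
        if when(args_dict):
            for key in skip:
                args_dict.pop(key, None)
            captured[0] = copy.deepcopy(tuple(args_dict.get(n) for n in names))
        return forward_old(self, *args, **kwargs)

    module_cls.forward = wrapper
    try:
        yield captured
    finally:
        module_cls.forward = forward_old
```

The `when` predicate would let callers reproduce this PR's condition, e.g. `lambda d: len(d["past_key_value"].key_cache) != 0`.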
```python
# Tokenizer
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    padding="max_length",
    max_length=32,
    truncation=True,
)
```
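Padding every prompt to a fixed `max_length=32` (with truncation) keeps the input shape identical across prompts; presumably this gives the captured decoder-layer inputs the stable, static shapes a conversion flow expects.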
I think this code can be generated simply using GPT for new models.
Force-pushed from 99e8496 to 040cca8.
See #217.
This adds a LlamaDecoderLayerWithKVCache test that uses captured input.
TICO-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>