
Conversation

@glistening (Contributor) commented Jul 17, 2025

It adds a LlamaDecoderLayerWithKVCache test, which uses captured input.

TICO-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>

./ccex test -m LlamaDecoderLayerWithKVCache
open llama.decoderlayer.circle

It adds a LlamaDecoderLayer test, which uses captured input.

TICO-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>
@glistening (Contributor, Author)

It is a slightly modified version of #208.
It generates a circle model for LlamaDecoderLayer, while #208 generates the whole LlamaModel for the prefill pass with padded user tokens.

@glistening (Contributor, Author) commented Jul 17, 2025

Hmm... ./ccex format on my local machine does not show any error, but CI complains. I will fix it manually.

(ADD) Ah, by default ./ccex format fixes the formatting error in place, so it did not report anything locally. I added the change.

Comment on lines 7 to 39
import copy, inspect, types

from transformers.models.llama.modeling_llama import LlamaDecoderLayer

forward_old = LlamaDecoderLayer.forward


def capture_and_forward(self, *args, **kwargs):
    global captured_input

    # Prepare the args tuple for TICO.convert().
    # Get the argument names in positional order using inspect.
    sig = inspect.signature(forward_old)
    args_names = [
        # The signature includes `self` and `kwargs`;
        # keep only the ordinary positional inputs.
        name for name in sig.parameters.keys() if name not in ("self", "kwargs")
    ]

    args_dict = dict(zip(args_names, args))
    args_dict.update(kwargs)

    def populate_args(args_dict, filter):
        # Drop the filtered keys and build a tuple in signature order.
        for key in filter:
            args_dict.pop(key, None)
        args_tuple = tuple(args_dict.get(name, None) for name in args_names)
        return copy.deepcopy(args_tuple)

    # Capture the inputs only after the KV cache has been populated,
    # i.e. on a decode step rather than on the first (prefill) call.
    if len(args_dict["past_key_value"].key_cache) != 0:
        input_to_remove = ["use_cache"]
        captured_input = populate_args(args_dict, input_to_remove)

    return forward_old(self, *args, **kwargs)
@glistening (Contributor, Author)


It would be good to put this code somewhere like TICO/util.
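A possible shape for such a shared helper, as a rough sketch only; the factory name make_capture_hook, its return convention, and the .get("past_key_value") handling are hypothetical generalizations of the code above, not existing TICO/util API:

import copy, inspect


def make_capture_hook(forward_org):
    """Wrap a module's forward so its positional arguments are captured
    once a non-empty KV cache is observed. Returns (hook, get_captured)."""
    captured = {}

    sig = inspect.signature(forward_org)
    arg_names = [
        name for name in sig.parameters.keys() if name not in ("self", "kwargs")
    ]

    def hook(self, *args, **kwargs):
        args_dict = dict(zip(arg_names, args))
        args_dict.update(kwargs)
        past = args_dict.get("past_key_value")
        if past is not None and len(past.key_cache) != 0:
            # Same filtering as above: drop use_cache, keep signature order.
            args_dict.pop("use_cache", None)
            captured["input"] = copy.deepcopy(
                tuple(args_dict.get(name, None) for name in arg_names)
            )
        return forward_org(self, *args, **kwargs)

    return hook, lambda: captured.get("input")

A test would then install it with hook, get_captured = make_capture_hook(LlamaDecoderLayer.forward); LlamaDecoderLayer.forward = hook, and read the captured tuple after one forward pass with a populated cache.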

Comment on lines 42 to 55
# Tokenizer
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    padding="max_length",
    max_length=32,
    truncation=True,
)

@glistening (Contributor, Author)


I think this code can easily be generated with GPT for new models.
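For reference, a rough end-to-end sketch of how these pieces fit together, assuming the tico.convert()/save() API from the TICO README; max_new_tokens, model.eval(), and the layer index are illustrative, while model_name, prompt, inputs, capture_and_forward, forward_old, and captured_input come from the test code above:

import tico
import torch
from transformers import AutoModelForCausalLM
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Install the capture hook, then generate a couple of tokens so that a decode
# step runs with a non-empty KV cache and capture_and_forward stores its args.
LlamaDecoderLayer.forward = capture_and_forward
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=2, use_cache=True)
LlamaDecoderLayer.forward = forward_old  # restore the original forward

# Export a single decoder layer with the captured example inputs.
layer = model.model.layers[0]
circle_model = tico.convert(layer, captured_input)
circle_model.save("llama.decoderlayer.circle")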

@glistening changed the title from "[test] Add LlamaDecoderLayer using captured input" to "[test] Add LlamaDecoderLayerWithKVCache using captured input" on Jul 17, 2025
@glistening force-pushed the decoderlayer branch 4 times, most recently from 99e8496 to 040cca8 on July 17, 2025 at 06:06
- forward_old → forward_org
- output filename : llama → tinyllama
- LlamaDecoderLayerWithCache → LlamaDecoderLayerWithKVCache
@glistening (Contributor, Author)

See #217.

@glistening closed this on Nov 4, 2025