
Conversation

@glistening (Contributor) commented Jul 17, 2025

It adds a LlamaDecoderLayerWithKVCache test, which uses captured input.

TICO-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>

./ccex test -m LlamaDecoderLayerWithKVCache
open llama.decoderlayer.circle

It adds a LlamaDecoderLayer test, which uses captured input.

TICO-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>
@glistening (Contributor, Author)

It is a slightly modified version of #208.
It generates a circle model for LlamaDecoderLayer, while #208 generates the whole LlamaModel for the prefill pass with padded user tokens.

@glistening (Contributor, Author) commented Jul 17, 2025

Hmm... ./ccex format on my local machine does not show any error, but CI complains. I will fix it manually.

(ADD) Ah, by default ./ccex format fixes the formatting error in place, so it did not report anything locally. I added the change.

Comment on lines 7 to 39
import copy, inspect, types

from transformers.models.llama.modeling_llama import LlamaDecoderLayer

forward_old = LlamaDecoderLayer.forward


def capture_and_forward(self, *args, **kwargs):
    global captured_input

    # Prepare the args tuple for TICO.convert().
    # Get the argument names in positional order using inspect.
    sig = inspect.signature(forward_old)
    args_names = [
        # The signature includes `self` and `kwargs`;
        # keep only the ordinary positional inputs.
        name for name in sig.parameters.keys() if name not in ("self", "kwargs")
    ]

    args_dict = dict(zip(args_names, args))
    args_dict.update(kwargs)

    def populate_args(args_dict, filter):
        # Drop the filtered keys and build a tuple in signature order.
        for key in filter:
            args_dict.pop(key, None)
        args_tuple = tuple(args_dict.get(name, None) for name in args_names)
        return copy.deepcopy(args_tuple)

    # Capture the inputs only after the KV cache has been populated,
    # i.e. on a decode step rather than on the first (prefill) call.
    if len(args_dict["past_key_value"].key_cache) != 0:
        input_to_remove = ["use_cache"]
        captured_input = populate_args(args_dict, input_to_remove)

    return forward_old(self, *args, **kwargs)
@glistening (Contributor, Author)


It would be good to put this code somewhere like TICO/util.
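A possible shape for such a shared helper, as a rough sketch only; the factory name make_capture_hook, its return convention, and the .get("past_key_value") handling are hypothetical generalizations of the code above, not existing TICO/util API:

import copy, inspect


def make_capture_hook(forward_org):
    """Wrap a module's forward so its positional arguments are captured
    once a non-empty KV cache is observed. Returns (hook, get_captured)."""
    captured = {}

    sig = inspect.signature(forward_org)
    arg_names = [
        name for name in sig.parameters.keys() if name not in ("self", "kwargs")
    ]

    def hook(self, *args, **kwargs):
        args_dict = dict(zip(arg_names, args))
        args_dict.update(kwargs)
        past = args_dict.get("past_key_value")
        if past is not None and len(past.key_cache) != 0:
            # Same filtering as above: drop use_cache, keep signature order.
            args_dict.pop("use_cache", None)
            captured["input"] = copy.deepcopy(
                tuple(args_dict.get(name, None) for name in arg_names)
            )
        return forward_org(self, *args, **kwargs)

    return hook, lambda: captured.get("input")

A test would then install it with hook, get_captured = make_capture_hook(LlamaDecoderLayer.forward); LlamaDecoderLayer.forward = hook, and read the captured tuple after one forward pass with a populated cache.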

Comment on lines 42 to 55
# Tokenizer
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    padding="max_length",
    max_length=32,
    truncation=True,
)

@glistening (Contributor, Author)


I think this code can easily be generated with GPT for new models.
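For reference, a rough end-to-end sketch of how these pieces fit together, assuming the tico.convert()/save() API from the TICO README; max_new_tokens, model.eval(), and the layer index are illustrative, while model_name, prompt, inputs, capture_and_forward, forward_old, and captured_input come from the test code above:

import tico
import torch
from transformers import AutoModelForCausalLM
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Install the capture hook, then generate a couple of tokens so that a decode
# step runs with a non-empty KV cache and capture_and_forward stores its args.
LlamaDecoderLayer.forward = capture_and_forward
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=2, use_cache=True)
LlamaDecoderLayer.forward = forward_old  # restore the original forward

# Export a single decoder layer with the captured example inputs.
layer = model.model.layers[0]
circle_model = tico.convert(layer, captured_input)
circle_model.save("llama.decoderlayer.circle")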

@glistening changed the title from "[test] Add LlamaDecoderLayer using captured input" to "[test] Add LlamaDecoderLayerWithKVCache using captured input" on Jul 17, 2025
@glistening force-pushed the decoderlayer branch 4 times, most recently from 99e8496 to 040cca8 on July 17, 2025 at 06:06
- forward_old → forward_org
- output filename : llama → tinyllama
- LlamaDecoderLayerWithCache → LlamaDecoderLayerWithKVCache
@glistening (Contributor, Author)

See #217.

@glistening closed this on Nov 4, 2025