
Conversation

@dayo09
Contributor

@dayo09 dayo09 commented Sep 12, 2025

Let's add a convert-matmul-to-linear pass.
This commit...
- refactors mm serialization logic and adds a convert_matmul_to_linear pass
- introduces new CompileConfig attributes convert_lhs/rhs_const_mm_to_fc

TICO-DCO-1.0-Signed-off-by: Dayoung Lee dayoung.lee@samsung.com


For #339

@dayo09 dayo09 force-pushed the 0912-mm-to-linear branch 2 times, most recently from fddc18a to 72a3617 on September 12, 2025 at 11:55
Comment on lines +92 to +93
def get_compile_config(self):
return CompileConfigV1(convert_lhs_const_mm_to_fc=True)
Contributor Author

@glistening @seockho-kim Using this compile config enables conversion of matmul ops whose lhs is a const node.

Contributor

@dayo09 Could you improve it to handle bmm, too?

# Assumes the repo's test helpers (tag, TestModuleBase, CompileConfigV1) and torch.
@tag.use_onert
class BmmTest(TestModuleBase):
    def __init__(self):
        super().__init__()
        # Rank-3 constant weight on the lhs, so `@` lowers to bmm, not mm.
        self.weight = torch.randn(2, 3, 4)

    def forward(self, rhs):
        out = self.weight @ rhs
        return out

    def get_example_inputs(self):
        return (torch.randn(2, 4, 5),), {}

    def get_compile_config(self):
        return CompileConfigV1(convert_lhs_const_mm_to_fc=True)

Contributor Author

@seockho-kim The above case is not supported because matmul-to-FC conversion is possible only when the weight is 2-D; the Circle FullyConnected operation assumes a rank-2 weight.
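
For illustration, a minimal sketch of the constraint (assuming Circle FullyConnected follows the TFLite convention y = x @ W.T with a rank-2 weight W; shapes below are hypothetical):

import torch

W2 = torch.randn(3, 4)   # rank-2 constant: convertible to an FC weight
x = torch.randn(5, 4)
assert torch.allclose(torch.nn.functional.linear(x, W2), x @ W2.t())

W3 = torch.randn(2, 3, 4)  # rank-3 (bmm) constant: no single rank-2 weight
                           # reproduces it, so it stays a BatchMatMul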

Contributor

I'm sorry, I gave you the wrong example.
I meant the bmm (batch=1) case.

Contributor Author

Let me add it in the next PR!

* Linear has better quantization accuracy (NPU backend)
  Due to the ONE compiler's quantization policy:
  FullyConnected (=Linear) uses per-channel quantization for the weight and per-tensor for the input.
  BatchMatMul (=matmul) uses per-tensor quantization for both lhs and rhs.
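
For intuition, a minimal sketch of the difference (generic symmetric int8 quantization, not the ONE compiler's exact scheme; all names are illustrative):

import torch

W = torch.randn(8, 16)  # (out_channels, in_channels) weight

# Per-tensor: one scale for the whole tensor (the BatchMatMul policy).
scale_pt = W.abs().max() / 127.0
q_pt = (W / scale_pt).round().clamp(-127, 127)

# Per-channel: one scale per output channel (the FullyConnected weight
# policy); each row keeps its own range, so less precision is lost.
scale_pc = W.abs().amax(dim=1) / 127.0
q_pc = (W / scale_pc[:, None]).round().clamp(-127, 127)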
Contributor

FYI, a new generation of the NPU would support channel-wise quantization (cwq) for matmul.

Contributor Author

@jinevening Do you mean 3rd generation?

Contributor

Yes.

inputs = [input, other]
outputs = [node]

if not is_const(other) and prior_latency:
Contributor

prior_latency is not used anymore?

Contributor Author

@jinevening Yes, the old feature is basically part of the new one.

# BEFORE: prior_latency == False (default)
# AFTER:  (default)
if rhs is const: conversion ON
else:            conversion OFF

# BEFORE: prior_latency == True
# AFTER:  convert_rhs_const_mm_to_fc == False
always: conversion OFF
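
A hypothetical sketch of the resulting decision logic (illustrative only, not the actual TICO source; defaults mirror CompileConfigV1):

def should_convert_mm_to_fc(lhs_const: bool, rhs_const: bool, config) -> bool:
    # rhs-const conversion subsumes the old prior_latency==False behavior.
    if rhs_const:
        return config.convert_rhs_const_mm_to_fc  # default: True
    if lhs_const:
        return config.convert_lhs_const_mm_to_fc  # default: False
    return False  # no const operand: keep BatchMatMul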

Contributor

I see. Then it would be possible to remove that arg and the related code.

Contributor Author

Removed ;-D

return fc_node


class ConvertLhsConstMatmulToLinear(Converter):
Contributor

Why should it consider lhs and rhs separately? Do the left and right matter when converting?

Contributor Author

@mhs4670go onert doesn't run a matmul whose lhs is const. So this PR converts matmul to FullyConnected for onert, conditionally via the config.

@mhs4670go
Contributor

@glistening I saw this comment. Is this pass still needed?

@dayo09
Contributor Author

dayo09 commented Sep 17, 2025

> I saw Samsung/ONE#16064 (comment). Is this pass still needed?

@glistening @seockho-kim What do you think, is this pass still needed? If it is, I plan to add a pass for lowering bmm to mm in another PR. Please share your opinions.

@glistening
Contributor

glistening commented Sep 18, 2025

> I saw Samsung/ONE#16064 (comment). Is this pass still needed?

@dayo09 Sure. I need your PR :). The maintainer of ONERT will keep the constraint (not allowing lhs const). My PR (#356) and the commenting-out are a workaround to unblock the next steps in onert. If you're in a hurry or it takes much work, please feel free to let me know.

@dayo09
Contributor Author

dayo09 commented Sep 18, 2025

@jinevening @seockho-kim @mhs4670go PTAL :-D

Comment on lines +65 to +67
""" """

def __init__(self):
Contributor

Suggested change
-""" """
-def __init__(self):
+def __init__(self):

Contributor

I think it would be good to describe what error is expected. NNFW_STATUS_ERROR is a bit ambiguous.

Contributor Author

That is how onert throws. It should match.

Contributor

Ah, I think using a docstring or comments is also enough.

jinevening previously approved these changes Sep 18, 2025
Contributor

@jinevening jinevening left a comment

LGTM

Comment on lines +23 to +25
convert_lhs_const_mm_to_fc: bool = False
convert_rhs_const_mm_to_fc: bool = True
Contributor

On second thought, a single convert_const_mm_to_fc could be a simpler choice. Do you have any reason for choosing this design?

Contributor Author

convert_rhs_const_mm_to_fc has no trade-off because the transpose is foldable into the const, but convert_lhs_const_mm_to_fc carries a potential latency trade-off. Therefore, the user needs to decide each case separately.
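
For illustration, a minimal torch sketch of the asymmetry (hypothetical shapes, not the actual pass):

import torch

x = torch.randn(5, 4)

# rhs const: x @ W equals linear(x, W.T); transposing a constant folds
# away at compile time, so there is no runtime cost.
W_rhs = torch.randn(4, 3)
assert torch.allclose(x @ W_rhs, torch.nn.functional.linear(x, W_rhs.t()))

# lhs const: W @ x = (x.T @ W.T).T, so the converted graph needs runtime
# transposes around the FullyConnected, hence the latency trade-off.
W_lhs = torch.randn(3, 5)
assert torch.allclose(W_lhs @ x, torch.nn.functional.linear(x.t(), W_lhs).t())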

@glistening
Contributor

glistening commented Sep 18, 2025

If I understand correctly, TICO will convert matmul(const lhs, rhs) into linear(...) + transpose1() + transpose2(), where transpose1() + transpose2() cancel out. I don't fully understand why this happens. Who generates batchmatmul + transpose for a matmul, and why? Does the initial aten graph have batchmatmul + transpose, or does TICO introduce it?

Removing the redundant pair of transposes requires adding a pass in circle2circle that fuses batchmatmul + transpose into one fullyconnected. It would be nice if TICO could do this.

@mhs4670go
Contributor

> It would be nice if TICO could do this.

We've decided to delegate graph optimization to one-optimize in order to avoid code duplication. Is it hard to use one-optimize?

mhs4670go previously approved these changes Sep 18, 2025
Contributor

@mhs4670go mhs4670go left a comment

LGTM

@glistening
Contributor

glistening commented Sep 18, 2025

Yes, I agree that circle2circle is better for circle-level optimization.
It is also better in that batchmatmul is converted to fullyconnected based on config.

(I would like to check that it works. However, as usual, I cannot access GitHub.
I will check later once I manage to check out this PR.)

I was just curious why TICO or torch.export chooses the inefficient operation sequence for a simple matmul.

@glistening
Contributor

glistening commented Sep 18, 2025

Hmm. I succeeded with gh pr checkout 341.
I converted my model, but there is no change.
It still emits batchmatmul.

@seockho-kim Does this PR solve the same issue in gemma3?

@mhs4670go
Contributor

mhs4670go commented Sep 18, 2025

@dayo09 Conflicts should be resolved.

@glistening

After you apply this PR, you need to pass the configuration to use the introduced feature. Did you do it like below? It seems that bmm-to-mm conversion is also needed.

import tico

config = tico.CompileConfigV1()
config.convert_lhs_const_mm_to_fc = True  # enable lhs-const matmul -> FC conversion
circle_model = tico.convert(torch_module, example_inputs, config=config)

dayo09 and others added 3 commits September 18, 2025 16:45
Let's add a convert-matmul-to-linear pass.
This commit...
 - refactors mm serialization logic and adds a convert_matmul_to_linear pass
 - introduces new CompileConfig attributes convert_lhs/rhs_const_mm_to_fc

TICO-DCO-1.0-Signed-off-by: Dayoung Lee <dayoung.lee@samsung.com>
Co-authored-by: Hyukjin Jeong <hj1.jeong@samsung.com>
@dayo09 dayo09 dismissed stale reviews from mhs4670go and jinevening via 4588a1b September 18, 2025 07:45
@dayo09
Contributor Author

dayo09 commented Sep 18, 2025

@glistening Sorry for the ambiguity in my comment.

> If it is, I plan to add a pass for lowering bmm to mm in another PR. Please share your opinions.

I mean, this PR doesn't support bmm to mm YET; it needs a further PR. Batch-1 bmm to mm is a separate feature, so I planned to do it in the next PR.

@dayo09
Contributor Author

dayo09 commented Sep 18, 2025

@jinevening @seockho-kim @mhs4670go It's rebased. PTAL again 😅

@seockho-kim
Contributor

> Hmm. I succeeded with gh pr checkout 341 and converted my model, but there is no change. It still emits batchmatmul.
>
> @seockho-kim Does this PR solve the same issue in gemma3?

This PR does not solve the issue in gemma3, because bmm is not changed.
I'm waiting for another PR.

Contributor

@seockho-kim seockho-kim left a comment

LGTM

@dayo09
Contributor Author

dayo09 commented Sep 18, 2025

On second thought, simply converting 1-batch bmm to mm may introduce a redundant reshape, so it's not a general pass.

@jinevening @mhs4670go Do you think conversion of 1-batch bmm to mm should be unconditionally allowed, or should I introduce the pass optionally for the matmul-to-linear case?
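
For context, a minimal torch sketch of the 1-batch lowering in question (hypothetical, not the pass itself); the squeeze/unsqueeze pair is exactly the redundant reshape:

import torch

a = torch.randn(1, 3, 4)
b = torch.randn(1, 4, 5)

# bmm with batch 1 rewritten as mm plus two reshapes:
out = torch.mm(a.squeeze(0), b.squeeze(0)).unsqueeze(0)
assert torch.allclose(out, torch.bmm(a, b))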

@dayo09 dayo09 merged commit 16276d7 into Samsung:main Sep 18, 2025
6 checks passed
@mhs4670go
Contributor

> Do you think conversion of 1-batch bmm to mm should be unconditionally allowed, or should I introduce the pass optionally for the matmul-to-linear case?

I think it should be optional because it's not a kind of circle legalization.

@jinevening
Contributor

> Do you think conversion of 1-batch bmm to mm should be unconditionally allowed, or should I introduce the pass optionally for the matmul-to-linear case?

+1 for the latter

@glistening
Contributor

glistening commented Sep 19, 2025

If TICO cannot convert my case on its own and needs to run a circle-level optimizer:

  • It would be good to write a single pass in the circle-level optimizer, rather than introducing two passes (one in TICO, another in the circle optimizer).
  • I don't expect circle2circle to have such a pass.
  • I don't like circle2circle, a C++ solution. Maybe it would be better to introduce my own circle-level graph optimizer written in Python, using the flatbuffers object API.
