Releases: UbiquitousLearning/mllm
MLLM v2.0.0 Release
New Features
- Pythonic eager execution – Rapid model development
- Unified hardware support – Arm CPU, OpenCL GPU, QNN NPU
- Advanced optimizations – Quantization, pruning, speculative execution
- NPU-ready IR – Seamless integration with NPU frameworks
- Deployment toolkit – SDK + CLI inference tool
- mllm JIT Kernel
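The quantization mentioned above ships as optimized C++/AArch64 kernels (see the q4_0 GEMM/GEMV work in #104). As a conceptual illustration only, not the mllm API, the sketch below shows q4_0-style symmetric block quantization in pure Python: each block of weights shares one scale, and values are rounded to 4-bit signed codes. All function names here are illustrative.

```python
def quantize_block(block):
    """Symmetric 4-bit quantization of one block: one shared scale,
    integer codes clipped to the signed 4-bit range [-8, 7]."""
    amax = max(abs(x) for x in block)
    scale = amax / 7.0 if amax > 0 else 1.0
    q = [max(-8, min(7, round(x / scale))) for x in block]
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate floats from the codes and the block scale."""
    return [v * scale for v in q]

def quantize_q4_0(weights, block_size=32):
    """Split weights into fixed-size blocks and quantize each
    independently, mirroring the per-block layout of ggml-style q4_0."""
    assert len(weights) % block_size == 0
    return [quantize_block(weights[i:i + block_size])
            for i in range(0, len(weights), block_size)]
```

Because each block is rounded against its own scale, the reconstruction error of any element is at most half that block's quantization step, which is why per-block scales beat a single tensor-wide scale on outlier-heavy weights.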
News
[2026 Feb 03] 🔥🔥🔥 MLLM QNN AOT support for full-graph execution on NPU! See the Quick Start and Technical Report.
[2025 Nov 27] Android Demo Update: Enabled stable Qwen3 and DeepSeek-OCR streaming on Android via a novel In-App Go Server Architecture.
[2025 Nov 23] MLLM v2 released!
What's Changed
- Develop qnn zh by @liang1232018 in #42
- Develop qnn zh by @liang1232018 in #43
- Develop qnn zh by @liang1232018 in #44
- fix: qnn rope file name by @liang1232018 in #46
- Develop qnn zh by @liang1232018 in #47
- Develop qnn zh by @liang1232018 in #48
- chore: qnn arm build config by @liang1232018 in #49
- Develop qnn zh by @liang1232018 in #50
- Develop qnn zh by @liang1232018 in #51
- Develop qnn zh by @liang1232018 in #52
- fix: qnn linear quantize tensor duplicate by @liang1232018 in #53
- Feat: Add new FrontEnd and model demos. by @yirongjie in #68
- feat: Add OPT Tokenizer. by @lx200916 in #66
- Feat: Optimize the operation process by @yirongjie in #69
- Fix: `Tensor::mm()`: reference not passed in as input by @yirongjie in #70
- Feat: Fill in input Tensor by @yirongjie in #72
- Single precision inference support for the gemma-2B model by @chenghuaWang in #75
- Update README.md by @yirongjie in #76
- Support for the QWen1.5-0.5B model by @chenghuaWang in #79
- feat: mistral v0.2 7B support by @chenghuaWang in #83
- Update requirements.txt by @lx200916 in #87
- doc: Update README.md by @xumengwei in #89
- feat: Add Multi-Head Latent Attention(MLA) support. by @yirongjie in #90
- feat: add sparse inference like powerinfer by @XieWeikai in #86
- feat: Yi-1.5-6B support by @chenghuaWang in #88
- feat: Inference speed(tokens/s) profiling by @yirongjie in #91
- feat: Add new demo: demo_imagebind_1mod by @yirongjie in #92
- feat: Stablelm 2 1.6b support by @emt0re0 in #94
- doc: Update README.md by @yirongjie in #95
- feat: add elastic llama by @yirongjie in #98
- feat:Add OPT support by @yirongjie in #99
- feat: add Qwen 1.8B demo by @yirongjie in #100
- perf: Use `vector<shared_ptr<Tensor>> Tensor::graphs` by @yirongjie in #101
- perf: add AArch64 GEMM/GEMV for q4_0. by @yirongjie in #104
- feat: add DEBUGSAVETENSOR & DEBUGOPTIME by @yirongjie in #106
- feat: topk/topp sampling by @chenghuaWang in #105
- fix: Qwen v1.5 Tokenizer bug by @chenghuaWang in #107
- feat: add clear_kvcache && fix: BUG in quantize. by @yirongjie in #108
- feat: GEMV + Bias mixed precision support for ARM Devices by @chenghuaWang in #109
- feat: llamafile_sgemm bias support by @chenghuaWang in #111
- chore: Disable OpenMP for Mac. by @lx200916 in #110
- feat: Preliminary implementation on Qualcomm NPU (QNN) backend. by @liang1232018 in #112
- doc: Update README.md by @xumengwei in #113
- refactor: `Layer::run` & `Tensor::getStaticFunc` by @yirongjie in #120
- feat: add Phi-3-mini model by @WhiteNight123 in #119
- refactor: `Tensor::run` & `Layer::getFunc`: Tensor& -> Tensor by @yirongjie in #121
- perf: CPU Function: +-*/ by @yirongjie in #122
- fix: +-*/ for old front end by @yirongjie in #129
- refactor: `Tensor::run` & `Layer::getFunc` by @yirongjie in #130
- fix: Windows environment by @WhiteNight123 in #127
- feat: add MiniCPM 2B demo by @yirongjie in #132
- refactor: remove Layer class `Split`, replace it with `Tensor::split` by @yirongjie in #136
- fix: python bindings, clang-tidy, set line width to 100 by @chenghuaWang in #142
- fix: Memory Alignment Error by @chenghuaWang in #143
- fix: calculate bugs, cmakelist and clang-tidy by @yirongjie in #144
- fix: bug fix for windows compilation by @chenghuaWang in #145
- fix: windows compile bug by @chenghuaWang in #147
- feat: cross compile arm on windows(x86) by @chenghuaWang in #148
- fix: Memory Alloc bug in CPU Backend by @chenghuaWang in #149
- Fix: QNN Cmakelists Config by @oreomaker in #150
- Xnnpack backend support by @chenghuaWang in #152
- Fixed typos. by @hustc12 in #155
- fix: SmolLM name by @chenghuaWang in #157
- feat: Support QWen2.5-1.5B, OpenELM-1.1B, DCLM-1B by @yirongjie in #160
- feat: add profiling activation by @chunfenri in #154
- fix: CMakeLists.txt in `example` by @yirongjie in #161
- refactor: add TransformerConfig by @yirongjie in #162
- fix: mv `Tensor::graph` to `Module.activation_tensors` by @yirongjie in #164
- feat: add PhoneLM by @yirongjie in #165
- QNN Module API(new frontend) Preliminary Support by @oreomaker in #158
- fix: rope_theta is set wrong by @yirongjie in #169
- feat:QNN New Frontend End to End Inference by @oreomaker in #170
- feat: Add modeling bert support by @XieWeikai in #166
- fix: comment used in uni by @yirongjie in #171
- fix: `BertTokenizer::tokenize` by @yirongjie in #172
- Add Bert for JNI. by @lx200916 in #173
- Xnnpack backend support by @chenghuaWang in #159
- feat: Boost xnnpack backend inference speed by freeze tensor weight. by @chenghuaWang in #174
- fix: CPUTensorFunction.hpp by @UbiquitousLearning in #176
- feat: drop xnn wrapper and move xnnwrapper to new front-end by @chenghuaWang in #177
- feat: QNN New Frontend Phonelm Support and Refactors by @oreomaker in #179
- fix: smollm tokenizer regex pattern by @chenghuaWang in #180
- refactor: change tokenize method parameter from std::string& to const by @lx200916 in #181
- fix: NPU affect CPU by @yirongjie in #182
- fix: remove unused "fmt" files by @yirongjie in #185
- ...