Releases: UbiquitousLearning/mllm
MLLM v2.0.0 Release
New Features
- Pythonic eager execution – Rapid model development
- Unified hardware support – Arm CPU, OpenCL GPU, QNN NPU
- Advanced optimizations – Quantization, pruning, speculative execution
- NPU-ready IR – Seamless integration with NPU frameworks
- Deployment toolkit – SDK + CLI inference tool
- mllm JIT Kernel
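The quantization mentioned above ships as optimized C++/AArch64 kernels (see the q4_0 GEMM/GEMV work in #104). As a conceptual illustration only, not the mllm API, the sketch below shows q4_0-style symmetric block quantization in pure Python: each block of weights shares one scale, and values are rounded to 4-bit signed codes. All function names here are illustrative.

```python
def quantize_block(block):
    """Symmetric 4-bit quantization of one block: one shared scale,
    integer codes clipped to the signed 4-bit range [-8, 7]."""
    amax = max(abs(x) for x in block)
    scale = amax / 7.0 if amax > 0 else 1.0
    q = [max(-8, min(7, round(x / scale))) for x in block]
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate floats from the codes and the block scale."""
    return [v * scale for v in q]

def quantize_q4_0(weights, block_size=32):
    """Split weights into fixed-size blocks and quantize each
    independently, mirroring the per-block layout of ggml-style q4_0."""
    assert len(weights) % block_size == 0
    return [quantize_block(weights[i:i + block_size])
            for i in range(0, len(weights), block_size)]
```

Because each block is rounded against its own scale, the reconstruction error of any element is at most half that block's quantization step, which is why per-block scales beat a single tensor-wide scale on outlier-heavy weights.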
News
[2026 Feb 03] 🔥🔥🔥 MLLM QNN AOT support for full-graph execution on NPU! See the Quick Start and Technical Report.
[2025 Nov 27] Android Demo Update: Enabled stable Qwen3 and DeepSeek-OCR streaming on Android via a novel In-App Go Server Architecture.
[2025 Nov 23] MLLM v2 released!
What's Changed
- Develop qnn zh by @liang1232018 in #42
- Develop qnn zh by @liang1232018 in #43
- Develop qnn zh by @liang1232018 in #44
- fix: qnn rope file name by @liang1232018 in #46
- Develop qnn zh by @liang1232018 in #47
- Develop qnn zh by @liang1232018 in #48
- chore: qnn arm build config by @liang1232018 in #49
- Develop qnn zh by @liang1232018 in #50
- Develop qnn zh by @liang1232018 in #51
- Develop qnn zh by @liang1232018 in #52
- fix: qnn linear quantize tensor duplicate by @liang1232018 in #53
- Feat: Add new FrontEnd and model demos. by @yirongjie in #68
- feat: Add OPT Tokenizer. by @lx200916 in #66
- Feat: Optimize the operation process by @yirongjie in #69
- Fix: `Tensor::mm()`: reference not passed in as input by @yirongjie in #70
- Feat: Fill in input Tensor by @yirongjie in #72
- Single precision inference support for the gemma-2B model by @chenghuaWang in #75
- Update README.md by @yirongjie in #76
- Support for the QWen1.5-0.5B model by @chenghuaWang in #79
- feat: mistral v0.2 7B support by @chenghuaWang in #83
- Update requirements.txt by @lx200916 in #87
- doc: Update README.md by @xumengwei in #89
- feat: Add Multi-Head Latent Attention(MLA) support. by @yirongjie in #90
- feat: add sparse inference like powerinfer by @XieWeikai in #86
- feat: Yi-1.5-6B support by @chenghuaWang in #88
- feat: Inference speed(tokens/s) profiling by @yirongjie in #91
- feat: Add new demo: demo_imagebind_1mod by @yirongjie in #92
- feat: Stablelm 2 1.6b support by @emt0re0 in #94
- doc: Update README.md by @yirongjie in #95
- feat: add elastic llama by @yirongjie in #98
- feat:Add OPT support by @yirongjie in #99
- feat: add Qwen 1.8B demo by @yirongjie in #100
- perf: Use `vector<shared_ptr<Tensor>> Tensor::graphs` by @yirongjie in #101
- perf: add AArch64 GEMM/GEMV for q4_0. by @yirongjie in #104
- feat: add DEBUGSAVETENSOR & DEBUGOPTIME by @yirongjie in #106
- feat: topk/topp sampling by @chenghuaWang in #105
- fix: Qwen v1.5 Tokenizer bug by @chenghuaWang in #107
- feat: add clear_kvcache && fix: BUG in quantize. by @yirongjie in #108
- feat: GEMV + Bias mixed precision support for ARM Devices by @chenghuaWang in #109
- feat: llamafile_sgemm bias support by @chenghuaWang in #111
- chore: Disable OpenMP for Mac. by @lx200916 in #110
- feat: Preliminary implementation on Qualcomm NPU (QNN) backend. by @liang1232018 in #112
- doc: Update README.md by @xumengwei in #113
- refactor: `Layer::run` & `Tensor::getStaticFunc` by @yirongjie in #120
- feat: add Phi-3-mini model by @WhiteNight123 in #119
- refactor: `Tensor::run` & `Layer::getFunc`: Tensor& -> Tensor by @yirongjie in #121
- perf: CPU Function: +-*/ by @yirongjie in #122
- fix: +-*/ for old front end by @yirongjie in #129
- refactor: `Tensor::run` & `Layer::getFunc` by @yirongjie in #130
- fix: Windows environment by @WhiteNight123 in #127
- feat: add MiniCPM 2B demo by @yirongjie in #132
- refactor: remove Layer class `Split`, replace it with `Tensor::split` by @yirongjie in #136
- fix: python bindings, clang-tidy, set line width to 100 by @chenghuaWang in #142
- fix: Memory Alignment Error by @chenghuaWang in #143
- fix: calculate bugs, cmakelist and clang-tidy by @yirongjie in #144
- fix: bug fix for windows compilation by @chenghuaWang in #145
- fix: windows compile bug by @chenghuaWang in #147
- feat: cross compile arm on windows(x86) by @chenghuaWang in #148
- fix: Memory Alloc bug in CPU Backend by @chenghuaWang in #149
- Fix: QNN Cmakelists Config by @oreomaker in #150
- Xnnpack backend support by @chenghuaWang in #152
- Fixed typos. by @hustc12 in #155
- fix: SmolLM name by @chenghuaWang in #157
- feat: Support QWen2.5-1.5B, OpenELM-1.1B, DCLM-1B by @yirongjie in #160
- feat: add profiling activation by @chunfenri in #154
- fix: CMakeLists.txt in `example` by @yirongjie in #161
- refactor: add TransformerConfig by @yirongjie in #162
- fix: mv `Tensor::graph` to `Module.activation_tensors` by @yirongjie in #164
- feat: add PhoneLM by @yirongjie in #165
- QNN Module API(new frontend) Preliminary Support by @oreomaker in #158
- fix: rope_theta is set wrong by @yirongjie in #169
- feat:QNN New Frontend End to End Inference by @oreomaker in #170
- feat: Add modeling bert support by @XieWeikai in #166
- fix: comment used in uni by @yirongjie in #171
- fix: `BertTokenizer::tokenize` by @yirongjie in #172
- Add Bert for JNI. by @lx200916 in #173
- Xnnpack backend support by @chenghuaWang in #159
- feat: Boost xnnpack backend inference speed by freeze tensor weight. by @chenghuaWang in #174
- fix: CPUTensorFunction.hpp by @UbiquitousLearning in #176
- feat: drop xnn wrapper and move xnnwrapper to new front-end by @chenghuaWang in #177
- feat: QNN New Frontend Phonelm Support and Refactors by @oreomaker in #179
- fix: smollm tokenizer regex pattern by @chenghuaWang in #180
- refactor: change tokenize method parameter from std::string& to const by @lx200916 in #181
- fix: NPU affect CPU by @yirongjie in #182
- fix: remove unused "fmt" files by @yirongjie in #185
- ...