vkop 是一个基于 Vulkan 实现的迷你AI推理引擎, 仅在GPU上运行.
首先需要安装项目的依赖项shaderc或者vulkan sdk:
wget https://sdk.lunarg.com/sdk/download/latest/linux/vulkan-sdk.tar.gz
tar xvf vulkan-sdk.tar.gz
source path/to/VulkanSDK/setup-env.sh
export PATH=$VULKAN_SDK/x86_64_bin:$PATH对于模型转换
export CMAKE_POLICY_VERSION_MINIMUM=3.5
pip install onnx onnx-simplifier onnxsim onnxruntime
对于压测模型下载
pip install torchvision
对于测试依赖libtorch, cmake过程自动下载解压
设置 Vulkan ICD 加载器,以 NVIDIA 为例:
export VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json确保使用正确的 Vulkan 版本:
source path/to/VulkanSDK/setup-env.shcmake .. -DENABLE_TESTS=ON -DUSE_VALIDATION_LAYERS=ON -DENABLE_ASAN=OFF -DUSE_DEBUG_LAYERS=OFF -DUSE_FP16=OFF -DUSE_MEASURE_TIME=OFF
如果是交叉编译,需要设置交叉编译环境变量,借鉴参考toolchain.cmake
cmake .. -DCMAKE_TOOLCHAIN_FILE=../toolchain.cmake -DENABLE_TESTS=OFF
python3 model/onnx2vkop.py -i resnet18-v2-7.onnx- 支持量化:fp16, int8 对称量化
- 支持指定batch size
- 支持针对3D,4D NCHW to RGBA转换到模型
- 支持tensor合并,以便节约内存
usage: onnx2vkop.py [-h] [-q QUANT] -i INPUT [-u] [-b BATCH] [-r]
options:
-h, --help show this help message and exit
-q, --quant QUANT Override input_model
-i, --input INPUT input_model file
-u, --unify convert initializers to a single memory block
-b, --batch BATCH batch size for inference
-r, --rgba nchw to rgba conversion for initializers
./benchmark/vkbench ../resnet18-v2-7.vkopbin dog.jpeg
支持将postproc 手动注册到gpu 处理,比如softmax,topk减少CPU与GPU间的内存吞吐
vkop is a mini AI inference engine based on Vulkan, with runtime logic under 1000 lines of code.
First, install the required dependencies, such as shaderc or Vulkan SDK:
wget https://sdk.lunarg.com/sdk/download/latest/linux/vulkan-sdk.tar.gz
tar xvf vulkan-sdk.tar.gz
source path/to/VulkanSDK/setup-env.sh
export PATH=$VULKAN_SDK/x86_64_bin:$PATHFor model conversion:
export CMAKE_POLICY_VERSION_MINIMUM=3.5
pip install onnx onnx-simplifier onnxsim onnxruntimeFor benchmarking models:
pip install torchvision
For testing dependencies, libtorch is downloaded and extracted automatically during the cmake process.
Set up the Vulkan ICD loader, using NVIDIA as an example:
export VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.jsonEnsure the correct Vulkan version is used:
source path/to/VulkanSDK/setup-env.shcmake .. -DENABLE_TESTS=ON -DUSE_VALIDATION_LAYERS=OFF -DENABLE_ASAN=OFF -DUSE_DEBUG_LAYERS=OFF -DUSE_FP16=OFF -DUSE_MEASURE_TIME=OFFIf you are cross-compiling, set up the cross-compilation environment variables, based on toolchain.cmake:
cmake .. -DCMAKE_TOOLCHAIN_FILE=../toolchain.cmake -DENABLE_TESTS=OFF
python3.13 tools/onnx2vkop.py -i resnet18-v2-7.onnx- Supports quantization: fp16, int8 symmetric quantization
- Supports specifying batch size
- Supports 3D/4D NCHW to RGBA model conversion
- Supports tensor merging to save memory
usage: onnx2vkop.py [-h] [-q QUANT] -i INPUT [-u] [-b BATCH] [-r]
options:
-h, --help show this help message and exit
-q, --quant QUANT Override input_model
-i, --input INPUT input_model file
-u, --unify convert initializers to a single memory block
-b, --batch BATCH batch size for inference
-r, --rgba nchw to rgba conversion for initializers
./benchmark/vkbench ../resnet18-v2-7.vkopbin dog.jpegSupports manually registering post-processing operations like softmax and top-k on the GPU to reduce memory throughput between CPU and GPU.