WIP Allow Python backend to directly write Numpy arrays to SHM #264
asos-danielbunting wants to merge 6 commits into triton-inference-server:r23.05
Conversation
Tabrizian left a comment:
@asos-danielbunting thanks for the PR. I was wondering what use case this PR is trying to address. Is the idea to pre-allocate the buffers in shared memory and work with them directly to speed up the inference process?
Could you please share more details about the places where this becomes useful?
Hi @Tabrizian. I'm looking at speeding up the transfer of a large tensor between a Python BLS model doing preprocessing and a TensorFlow inference model. As you say, the idea is to allocate the buffer and write my data directly into it from the Python side, avoiding an extra allocation and copy. I've run a couple of tests, and for my use case this speeds up inference time by a decent amount; e.g. for a 100000 x 200 float32 tensor the saving was 30 ms.
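To make that concrete, here is a rough sketch of what this would let a BLS model do. The `new_shm_tensor` name comes from this PR, but its exact Python signature (here assumed to be name, shape, dtype), and whether `as_numpy()` returns a writable view over the shared-memory buffer, are assumptions for illustration only:

```python
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Allocate the tensor directly in shared memory.
            # The (name, shape, dtype) signature is an assumption.
            shm_tensor = pb_utils.new_shm_tensor(
                "input", [100000, 200], np.float32)

            # Assumed: as_numpy() gives a writable view over the
            # shared-memory buffer, so preprocessing writes land in
            # place with no extra allocation + copy.
            shm_tensor.as_numpy()[:] = 1.0  # stand-in for real preprocessing

            # Hand the shared-memory tensor straight to the TF model via BLS.
            infer_request = pb_utils.InferenceRequest(
                model_name="tf_model",
                requested_output_names=["output"],
                inputs=[shm_tensor])
            infer_response = infer_request.exec()

            responses.append(pb_utils.InferenceResponse(
                output_tensors=infer_response.output_tensors()))
        return responses
```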
@@ -0,0 +1,31 @@
FROM asnpdsacr.azurecr.io/public/tritonserver:23.05-tf2-python-py3
@@ -431,8 +431,12 @@ Stub::StubSetup()
py::setattr(
Remove all the changes except the ones in the src directory.
| c_python_backend_utils.attr("new_shm_tensor")); | ||
|
|
||
| c_python_backend_utils.attr("shared_memory") = py::cast(shm_pool_.get()); | ||
| python_backend_utils.attr("shared_memory") = py::cast(shm_pool_.get()); |
| python_backend_utils, "InferenceResponse", | ||
| c_python_backend_utils.attr("InferenceResponse")); | ||
| c_python_backend_utils.attr("shared_memory") = py::cast(shm_pool_.get()); | ||
| python_backend_utils.attr("shared_memory") = py::cast(shm_pool_.get()); |
py::register_exception<PythonBackendException>(
    module, "TritonModelException");

module.def(
    "new_shm_tensor", &PbTensor::CreateInSHM,
    "Creates a new Tensor directly into shared memory");
Can we rename this to pb.Tensor.new(shape, dtype, device='cpu')?
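For comparison, a sketch of what the suggested spelling might look like from the model side; `pb_utils.Tensor.new` and its keyword names are taken from the comment above, not from an implemented API:

```python
import numpy as np
import triton_python_backend_utils as pb_utils

# Hypothetical classmethod-style constructor, per the review comment.
tensor = pb_utils.Tensor.new(shape=[100000, 200],
                             dtype=np.float32,
                             device='cpu')
tensor.as_numpy()[:] = 0.0  # fill the shared-memory buffer in place
```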
reinterpret_cast<char*>(tensor_shm_ptr) + pb_memory_offset,
    shm_handle + pb_memory_offset, false);
tensor_shm_ptr->memory = 0;
std::cout << "Offset is - " << pb_memory_offset << "\n";
{

// Input params of tensor
// std::vector<int64_t> dims = std::vector<int64_t>({10, 10});