Improve dpnp.partition implementation #2766
Conversation
View rendered docs @ https://intelpython.github.io/dpnp/index.html

Array API standard conformance tests for dpnp=0.20.0dev2=py313h509198e_33 ran successfully.
```python
axis = normalize_axis_index(axis, nd)
length = a.shape[axis]

if isinstance(kth, int):
```
It looks too strict, because it does not support integer-like scalars such as dpnp.int64.
In other dpnp functions this type is accepted for axes:
```python
# axis : {None, int, tuple}, optional
In [22]: dpnp.count_nonzero(a, axis=dpnp.int64(0))
Out[22]: array([1, 1, 2, 1])

# axis : {None, int, tuple of ints}, optional
In [25]: np.all(x, axis=np.int64(0))
Out[25]: array([ True, False])
```
But axes are validated by the special function normalize_axis_tuple from NumPy.
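(For context, a quick illustration of why axes accept integer-like scalars; this assumes NumPy >= 2.0, where the helper is exposed under numpy.lib.array_utils, while older releases keep it in numpy.core.numeric.)

```python
# Quick illustration (assumes NumPy >= 2.0, where the helper is public in
# numpy.lib.array_utils; older releases keep it in numpy.core.numeric):
import numpy as np
from numpy.lib.array_utils import normalize_axis_tuple

# normalize_axis_tuple runs operator.index() on every entry, so integer-like
# scalars such as np.int64 are accepted and turned into plain Python ints
print(normalize_axis_tuple(np.int64(0), 2))        # -> (0,)
print(normalize_axis_tuple((0, np.int64(1)), 2))   # -> (0, 1)
```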
I meant the kth argument:

```python
dpnp.partition(a, dpnp.int64(4), axis=None)
TypeError: kth must be int or sequence of ints, but got <class 'numpy.int64'>
```
Should we use `operator.index` for this, as implemented in `dpnp.tri` for `k`?
We could, but it's not mandatory, because we explicitly state the support of int and a sequence of ints in the documentation.
We could add a helper function, like normalize_int_sequence, but I'd prefer to have that done separately as a general improvement.
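For reference, a minimal sketch of what such a helper could look like; the name `normalize_int_sequence`, its signature, and the exact error message are hypothetical and not part of this PR:

```python
import operator


def normalize_int_sequence(kth, name="kth"):
    # Hypothetical helper (not part of this PR): convert an integer-like scalar
    # or a sequence of integer-like values into a tuple of plain Python ints.
    # operator.index() accepts anything implementing __index__ (int, numpy.int64,
    # dpnp.int64, ...) and raises TypeError otherwise.
    try:
        return (operator.index(kth),)
    except TypeError:
        pass
    try:
        return tuple(operator.index(k) for k in kth)
    except TypeError:
        raise TypeError(
            f"{name} must be an integer or a sequence of integers, "
            f"but got {type(kth)}"
        ) from None
```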
vlad-perevezentsev
left a comment
LGTM
Thank you @antonwolfy!
The PR proposes to improve the implementation and to use a `dpnp.sort` call when:

- the input array has a number of dimensions > 1
- the input array has a previously not supported integer dtype
- the `axis` keyword is passed (previously not supported)
- a sequence of `kth` is passed (previously not supported)

In the case of `ndim > 1`, the implementation from the legacy backend was used previously, which is significantly slower (see the performance comparison below). It copied the input data into shared USM memory and included computations on the host.

This PR proposes to reuse `dpnp.sort` for all the above cases. The legacy implementation is kept only for the case where it is stable and fast (a 1D input array), because it relies on `std::nth_element` from OneDPL.

The benchmark results were collected on PVC with the help of the code below:

```python
import dpnp, numpy as np
from dpnp.tests.helper import generate_random_numpy_array

a = generate_random_numpy_array(10**7, dtype=np.float64, seed_value=117)
ia = dpnp.array(a)

%timeit x = dpnp.partition(ia, 513); x.sycl_queue.wait()
```

The table below contains data for a 1D input array (shape=(10**7,)), where the implementation path was kept the same, plus support of the missing integer dtypes was added via a fallback on the sort function:

| Implementation | int32 | uint32 | int64 | uint64 | float32 | float64 | complex64 | complex128 |
|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| old (legacy backend) | 7.46 ms | not supported | 9.46 ms | not supported | 7.39 ms | 8.92 ms | 10.9 ms | 21.2 ms |
| new (backend + sort) | 7.34 ms | 10.8 ms | 9.48 ms | 12.5 ms | 7.37 ms | 8.89 ms | 11 ms | 21.2 ms |

The following code was used for a 2D input array with shape=(10**4, 10**4):

```python
import dpnp, numpy as np
from dpnp.tests.helper import generate_random_numpy_array

a = generate_random_numpy_array((10**4, 10**4), dtype=np.float64, seed_value=117)
ia = dpnp.array(a)

%timeit x = dpnp.partition(ia, 1513); x.sycl_queue.wait()
```

In that case the new implementation is fully based on the sort call:

| Implementation | int32 | int64 | float32 | float64 | complex64 | complex128 |
|--------|--------|--------|--------|--------|--------|--------|
| old (legacy backend) | 6.4 s | 6.89 s | 7.36 s | 7.66 s | 8.61 s | 10 s |
| new (sort) | 57.4 ms | 64.7 ms | 62.2 ms | 68 ms | 77 ms | 151 ms |

9c4aed2
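As a side note on why a sort fallback is valid here: a fully sorted array already satisfies everything the partition contract promises. A small self-check, written with plain NumPy purely for illustration (this snippet is not code from the PR):

```python
# A fully sorted array satisfies the np.partition/dpnp.partition contract,
# so falling back on sort is correct for the cases listed above, just doing
# more work than a selection algorithm would.
import numpy as np

a = np.random.default_rng(117).random((4, 6))
kth = 2

part = np.partition(a, kth, axis=-1)
full = np.sort(a, axis=-1)

# the kth element along the axis has the same value in both results, and in
# the sorted result everything before/after it lies on the correct side of it
assert np.array_equal(part[..., kth], full[..., kth])
assert (full[..., :kth] <= full[..., kth:kth + 1]).all()
assert (full[..., kth + 1:] >= full[..., kth:kth + 1]).all()
```

The trade-off is the extra work of a full sort, which is why the legacy `std::nth_element` path is kept for the fast 1D case.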