Skip to content

Improve dpnp.partition implementation#2766

Merged
antonwolfy merged 12 commits intomasterfrom
resolve-gh-2762
Feb 16, 2026
Merged

Improve dpnp.partition implementation#2766
antonwolfy merged 12 commits intomasterfrom
resolve-gh-2762

Conversation

@antonwolfy
Copy link
Contributor

@antonwolfy antonwolfy commented Feb 11, 2026

The PR propose to improve implementation and to use dpnp.sort call when

  • input array has number of dimensions > 1
  • input array has previously not supported integer dtype
  • axis keyword is passed (previously not supported)
  • sequence of kth is passed (previously not supported)
    In case of ndim > 1 previously the implementation from legacy backend was used, which is significantly slow (see performance comparation below). It used a copy of input data into the shared USM memory and included computations on the host.

This PR proposes to reuse dpnp.sort for all the above cases.
While in case when the legacy implementation is stable and fast (for 1D input array), it will remain, because it relays on std::nth_element from OneDPL.

The benchmark results were collected on PVC with help of the below code:

import dpnp, numpy as np
from dpnp.tests.helper import generate_random_numpy_array

a = generate_random_numpy_array(10**7, dtype=np.float64, seed_value=117)
ia = dpnp.array(a)
%timeit x = dpnp.partition(ia, 513); x.sycl_queue.wait()

Below tables contains data in case of 1D input array (shape=(10**7,)), where the implementation path was kept the same, plus adding support of missing integer dtypes using fallback on the sort function:

Implementation int32 uint32 int64 uint64 float32 float64 complex64 complex128
old (legacy backend) 7.46 ms not supported 9.46 ms not supported 7.39 ms 8.92 ms 10.9 ms 21.2 ms
new (backend + sort) 7.34 ms 10.8 ms 9.48 ms 12.5 ms 7.37 ms 8.89 ms 11 ms 21.2 ms

The following code was used for 2D input array with shape=(104, 104):

import dpnp, numpy as np
from dpnp.tests.helper import generate_random_numpy_array

a = generate_random_numpy_array((10**4, 10**4), dtype=np.float64, seed_value=117)
ia = dpnp.array(a)
%timeit x = dpnp.partition(ia, 1513); x.sycl_queue.wait()

In that case the new implementation is fully based on the sort call:

Implementation int32 int64 float32 float64 complex64 complex128
old (legacy backend) 6.4 s 6.89 s 7.36 s 7.66 s 8.61 s 10 s
new (sort) 57.4 ms 64.7 ms 62.2 ms 68 ms 77 ms 151 ms
  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?

@antonwolfy antonwolfy added this to the 0.20.0 release milestone Feb 11, 2026
@antonwolfy antonwolfy self-assigned this Feb 11, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Feb 11, 2026

View rendered docs @ https://intelpython.github.io/dpnp/index.html

@github-actions
Copy link
Contributor

github-actions bot commented Feb 11, 2026

Array API standard conformance tests for dpnp=0.20.0dev2=py313h509198e_33 ran successfully.
Passed: 1353
Failed: 4
Skipped: 7

@coveralls
Copy link
Collaborator

coveralls commented Feb 11, 2026

Coverage Status

coverage: 81.124% (-0.04%) from 81.167%
when pulling 6422635 on resolve-gh-2762
into 9d6d5a5 on master.

@antonwolfy antonwolfy linked an issue Feb 11, 2026 that may be closed by this pull request
@antonwolfy antonwolfy marked this pull request as ready for review February 12, 2026 13:05
axis = normalize_axis_index(axis, nd)
length = a.shape[axis]

if isinstance(kth, int):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks too strict because does not support integer-like scalars as dpnp.int64
In other dpnp functions this type are accepted for axes:

# axis : {None, int, tuple}, optional
In [22]: dpnp.count_nonzero(a, axis=dpnp.int64(0))
Out[22]: array([1, 1, 2, 1])
# axis : {None, int, tuple of ints}, optional
np.all(x, axis=np.int64(0))
Out[25]: array([ True, False])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But axes is validating by special function normalize_axis_tuple from NumPy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant kth argument

dpnp.partition(a,dpnp.int64(4),axis=None)
TypeError: kth must be int or sequence of ints, but got <class 'numpy.int64'>

Should we use operator.index for this as implemented in dpnp.tri for k?

Copy link
Contributor Author

@antonwolfy antonwolfy Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but it's not mandatory, because we explicitly state the support of int and sequence of itns in the documentation.
We can implement a general improvement and to add a helper function, like normalize_int_sequence. But I'd prefer to have that done separately as a general improvement.

Copy link
Contributor

@vlad-perevezentsev vlad-perevezentsev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thank you @antonwolfy !

@antonwolfy antonwolfy merged commit 9c4aed2 into master Feb 16, 2026
94 of 97 checks passed
@antonwolfy antonwolfy deleted the resolve-gh-2762 branch February 16, 2026 19:16
github-actions bot added a commit that referenced this pull request Feb 16, 2026
The PR propose to improve implementation and to use `dpnp.sort` call
when
- input array has number of dimensions > 1
- input array has previously not supported integer dtype
- `axis` keyword is passed (previously not supported)
- sequence of `kth` is passed (previously not supported)
In case of `ndim > 1` previously the implementation from legacy backend
was used, which is significantly slow (see performance comparation
below). It used a copy of input data into the shared USM memory and
included computations on the host.

This PR proposes to reuse `dpnp.sort` for all the above cases.
While in case when the legacy implementation is stable and fast (for 1D
input array), it will remain, because it relays on `std::nth_element`
from OneDPL.

The benchmark results were collected on PVC with help of the below code:
```python
import dpnp, numpy as np
from dpnp.tests.helper import generate_random_numpy_array

a = generate_random_numpy_array(10**7, dtype=np.float64, seed_value=117)
ia = dpnp.array(a)
%timeit x = dpnp.partition(ia, 513); x.sycl_queue.wait()
```

Below tables contains data in case of 1D input array (shape=(10**7,)),
where the implementation path was kept the same, plus adding support of
missing integer dtypes using fallback on the sort function:
| Implementation | int32 | uint32 | int64 | uint64 | float32 | float64 |
complex64 | complex128 |

|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| old (legacy backend) | 7.46 ms | not supported | 9.46 ms | not
supported | 7.39 ms | 8.92 ms | 10.9 ms | 21.2 ms |
| new (backend + sort) | 7.34 ms | 10.8 ms | 9.48 ms | 12.5 ms | 7.37 ms
| 8.89 ms | 11 ms | 21.2 ms |

The following code was used for 2D input array with shape=(10**4,
10**4):
```python
import dpnp, numpy as np
from dpnp.tests.helper import generate_random_numpy_array

a = generate_random_numpy_array((10**4, 10**4), dtype=np.float64, seed_value=117)
ia = dpnp.array(a)
%timeit x = dpnp.partition(ia, 1513); x.sycl_queue.wait()
```

In that case the new implementation is fully based on the sort call:
| Implementation | int32 | int64 | float32 | float64 | complex64 |
complex128 |
|--------|--------|--------|--------|--------|--------|--------|
| old (legacy backend) | 6.4 s | 6.89 s | 7.36 s | 7.66 s | 8.61 s | 10
s |
| new (sort) | 57.4 ms | 64.7 ms | 62.2 ms | 68 ms | 77 ms | 151 ms | 9c4aed2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dpnp.partition fails on 2D array of complex dtype

3 participants