Conversation
|
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1187/index.html |
|
For array of rank But in our case the Also, instead of decoding integer to sequence of bits using a string, perhaps consider using |
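For reference, an integer can be decoded into a fixed-width sequence of bits with shifts and masks, with no intermediate binary string. This is only a sketch of one string-free option, not necessarily what the comment above goes on to suggest; `int_to_bits` is a hypothetical helper name, not part of dpctl.

```python
def int_to_bits(value, width):
    """Return the `width` lowest bits of `value`, least significant bit first.

    Hypothetical helper for illustration only; it is not a dpctl function.
    """
    if value < 0 or width < 0:
        raise ValueError("value and width must be non-negative")
    # Shift bit k down to position 0, then mask everything else off.
    return [(value >> k) & 1 for k in range(width)]


def int_to_bits_via_string(value, width):
    """String-based equivalent, shown only for comparison."""
    # format() pads to `width` digits, most significant bit first, so reverse.
    return [int(ch) for ch in format(value, "0{}b".format(width))][::-1]
```

In a corner-selection loop, bit `k` of a counter could then pick the leading or trailing block along axis `k`.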
|
Array API standard conformance tests for dpctl=0.14.3dev1=py310h76be34b_14 ran successfully. |
eaa63ee to e24fe4e
oleksandr-pavlyk left a comment
Before:
In [1]: import dpctl.tensor as dpt, dpctl
In [2]: m = dpt.ones((17, 15, 4, 31, 9, 4, 13), dtype="i2")
In [3]: from dpctl.tensor._print import _nd_corners
In [4]: %timeit -n 500 -r 12 dpt.asnumpy(_nd_corners(m, 3)).shape
14.2 ms ± 1.29 ms per loop (mean ± std. dev. of 12 runs, 500 loops each)
With changes from this PR:
In [1]: import dpctl.tensor as dpt, dpctl
In [2]: m = dpt.ones((17, 15, 4, 31, 9, 4, 13), dtype="i2")
In [3]: from dpctl.tensor._print import _nd_corners
In [4]: %timeit -n 500 -r 12 _nd_corners(m, 3).shape
4.72 ms ± 357 µs per loop (mean ± std. dev. of 12 runs, 500 loops each)
|
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞 |
In this PR, the recursive method used in the dpctl.tensor._print._nd_corners function is replaced with an iterative method to improve performance.
x_dpt = dpt.reshape(dpt.arange(6*6*117*117, dtype='i4'), (6, 117, 117, 6))
%timeit -r 20 dpt.usm_ndarray_repr(x_dpt)
New timing: 4.55 ms ± 645 µs per loop (mean ± std. dev. of 20 runs, 100 loops each)
Old timing: 6.43 ms ± 2.31 ms per loop (mean ± std. dev. of 20 runs, 100 loops each)
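As a rough illustration of how corner extraction can be done without recursion, the sketch below uses NumPy as a stand-in for dpctl.tensor. It is not the code from this PR: `nd_corners_iterative` and `edge_items` are hypothetical names, and the real `_nd_corners` may differ in signature and behaviour.

```python
import itertools

import numpy as np


def nd_corners_iterative(arr, edge_items):
    """Keep the first and last `edge_items` entries along every long axis.

    Illustrative NumPy sketch only; not the dpctl implementation.
    """
    if edge_items < 1:
        raise ValueError("edge_items must be at least 1")
    # Per axis: a (head, tail) pair of slices if the axis is long enough
    # to be truncated, otherwise a single slice covering the whole axis.
    per_axis = []
    for n in arr.shape:
        if n > 2 * edge_items:
            per_axis.append(
                (slice(None, edge_items), slice(-edge_items, None))
            )
        else:
            per_axis.append((slice(None),))
    out_shape = tuple(min(n, 2 * edge_items) for n in arr.shape)
    out = np.empty(out_shape, dtype=arr.dtype)
    # A truncated output axis has length 2 * edge_items, so the same
    # head/tail slices address matching blocks in source and destination.
    for block in itertools.product(*per_axis):
        out[block] = arr[block]
    return out
```

The loop visits at most 2**ndim blocks, one per combination of leading and trailing slices, so no recursion over axes is needed.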