No need to call _copy_overlapping if src and dst address same memory #1284

Merged
oleksandr-pavlyk merged 1 commit into master from improve-overlap-check-in-copy on Jul 17, 2023

@oleksandr-pavlyk
Contributor

In [1]: import dpctl.tensor as dpt, dpctl, dpctl.utils

In [2]: n, m = 8 * 540, 8 * 960

In [3]: a = dpt.ones((m, n))

In [4]: b = dpt.zeros((m, n))

In [5]: b_s = dpt.zeros((m, n+2))

In [6]: with dpctl.utils.onetrace_enabled():
   ...:     b_s[:,:-2] += a
   ...:
Device Timeline (queue: 0x556080b9cea0): zeCommandListAppendMemoryCopy(H2D)[48 bytes]<4.1> [ns] = 16946404661 (append) 16952292497 (submit) 16952613747 (start) 16952623538 (end)
Device Timeline (queue: 0x556080b9cea0): dpctl::tensor::kernels::add::add_inplace_strided_kernel<float, float, dpctl::tensor::offset_utils::TwoOffsets_StridedIndexer>[SIMD32 {64800; 1; 1} {512; 1; 1}]<5.1> [ns] = 17017855801 (append) 17018342202 (submit) 17019138920 (start) 17030770482 (end)

Before this change, the same statement also triggered two additional copy operations.
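
For context, Python evaluates `b_s[:,:-2] += a` as a `__getitem__`, an in-place add on the resulting view, and then a `__setitem__` that assigns that view back onto itself, so the final assignment has a source and destination that address the same memory. The sketch below illustrates the kind of check the PR title describes. It uses NumPy as a stand-in, and the helper names `addresses_same_memory` and `needs_temporary_copy` are hypothetical; the actual dpctl implementation is not shown in this thread and may differ.

```python
import numpy as np


def addresses_same_memory(dst, src):
    """Return True if both views start at the same address and have the
    same shape, strides, and dtype (hypothetical helper, NumPy stand-in)."""
    return (
        dst.__array_interface__["data"][0] == src.__array_interface__["data"][0]
        and dst.shape == src.shape
        and dst.strides == src.strides
        and dst.dtype == src.dtype
    )


def needs_temporary_copy(dst, src):
    """Decide whether copying src into dst needs an intermediate buffer.

    Disjoint arrays can be copied directly. Views that address exactly the
    same elements make the copy a no-op. Only a partial overlap requires
    the slower path through a temporary.
    """
    if not np.shares_memory(dst, src):
        return False
    return not addresses_same_memory(dst, src)


b_s = np.zeros((6, 8), dtype=np.float32)
view = b_s[:, :-2]        # what __getitem__ returns for the in-place add
same_view = b_s[:, :-2]   # the destination of the trailing __setitem__
shifted = b_s[:, 2:]      # overlaps `view` but is offset by two columns
disjoint = np.ones((6, 6), dtype=np.float32)

print(needs_temporary_copy(same_view, view))  # identical views
print(needs_temporary_copy(shifted, view))    # partial overlap
print(needs_temporary_copy(disjoint, view))   # no shared memory
```

Only the partially overlapping pair takes the temporary-buffer path in this sketch; the identical-view case is the one that the in-place update hits.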

Previously:

In [7]: %time b_s[:,:-2] += a
CPU times: user 13.2 ms, sys: 24.7 ms, total: 37.9 ms
Wall time: 53 ms

Now:

In [7]: %time b_s[:,:-2] += a
CPU times: user 5.08 ms, sys: 9.58 ms, total: 14.7 ms
Wall time: 16.7 ms
  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you opening the PR as a draft?

@AlexanderKalistratov

Shouldn't it also fix sqrt with 'out' for pairwise distance?

@oleksandr-pavlyk oleksandr-pavlyk force-pushed the improve-overlap-check-in-copy branch from 17a2623 to 701c05b Compare July 17, 2023 14:24
@github-actions

Array API standard conformance tests for dpctl=0.14.5dev1=py310h7bf5fec_11 ran successfully.
Passed: 448
Failed: 552
Skipped: 119

@oleksandr-pavlyk oleksandr-pavlyk merged commit a6d16f2 into master Jul 17, 2023
@oleksandr-pavlyk oleksandr-pavlyk deleted the improve-overlap-check-in-copy branch July 17, 2023 17:43
@github-actions

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

@github-actions

Array API standard conformance tests for dpctl=0.14.5dev1=py310h7bf5fec_11 ran successfully.
Passed: 448
Failed: 552
Skipped: 119
