From cac3e24b914f30a0c193c003d5f2b7616427956f Mon Sep 17 00:00:00 2001
From: Oleksandr Pavlyk
Date: Thu, 15 Feb 2024 06:22:26 -0600
Subject: [PATCH 01/13] Update to CHANGELOG for 0.16.0

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5068b75b93..1a4215d9a9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,7 +4,7 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [0.16.0] - MMM. DD, YYYY
+## [0.16.0] - Feb. 16, 2024

 This release will require DPC++ 2024.1.0, which no longer supports Intel Gen9 integrated GPUs found in Intel CPUs of 10th generation and older.
 Featurewise, this release is identical to 0.15.1.

From bdbca85c1bf0b98b479226a6dacba59c901a4a9a Mon Sep 17 00:00:00 2001
From: Oleksandr Pavlyk
Date: Wed, 27 Mar 2024 11:44:52 -0500
Subject: [PATCH 02/13] Added changelog entries for the upcoming 0.16.1 release

---
 CHANGELOG.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1a4215d9a9..504c412a5a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,25 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [0.16.1] - Apr. XX, 2024
+
+This is a bug-fix release, which also provides a change needed by ``numba_dpex`` project to support dispatching kernels
+consuming instances of ``sycl::kernel_accessor`` template type.
+
+### Changed
+
+* Changed behavior of ``dpctl.tensor.usm_ndarray.__dlpack_device__`` method to return device id of the parent unpartition device if array is allocated on a sub-device instead of raising an exception: [#1604](https://github.com/IntelPython/dpctl/pull/1604)
+* Array creation functions and the ``usm-ndarray`` constructor in `dpctl.tensor` submodule now use cached default-selected device to improve performance: [#1606](https://github.com/IntelPython/dpctl/pull/1606)
+* Changed treatment of `axis` keyword for `dpctl.tensor.tensordot` and `dpctl.tensor.vecdot` to align with Python Array API 2023.12 specification: [#1608](https://github.com/IntelPython/dpctl/pull/1608)
+* Changed implementation of `DPCTLQueue_SubmitRange`, `DPCTLQueue_SubmitNDRange` in DPCTLSyclInterface library to support ``sycl::local_accessor`` arguments needed by ``numba_dpex``; the enum `DPCTLKernelArgType` to correspond to C++ disjoint types: [#1609](https://github.com/IntelPython/dpctl/pull/1609), [#1611](https://github.com/IntelPython/dpctl/pull/1611), [#1612](https://github.com/IntelPython/dpctl/pull/1612)
+
+### Fixed
+
+* Fixed a crash on Windows platform during execution of getter of `dpctl.SyclPlatfom.default_context` property: : [#1604](https://github.com/IntelPython/dpctl/pull/1604)
+* Fixed kernel submission error on NVidia CUDA GPUs during `dpctl.tensor.matmul` operation: [#1605](https://github.com/IntelPython/dpctl/pull/1605)
+* Fixed corruption of context cache table entries: [#1607](https://github.com/IntelPython/dpctl/pull/1607)
+
+
 ## [0.16.0] - Feb. 16, 2024

 This release will require DPC++ 2024.1.0, which no longer supports Intel Gen9 integrated GPUs found in Intel CPUs of 10th generation and older.
From c090c77e89a6d7b8face1b9bcd201d094126838c Mon Sep 17 00:00:00 2001
From: Oleksandr Pavlyk
Date: Wed, 27 Mar 2024 13:36:03 -0500
Subject: [PATCH 03/13] Updated change-log to document fix in gh-1615

---
 CHANGELOG.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 504c412a5a..8859645a63 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -21,6 +21,7 @@ consuming instances of ``sycl::kernel_accessor`` template type.
 * Fixed a crash on Windows platform during execution of getter of `dpctl.SyclPlatfom.default_context` property: : [#1604](https://github.com/IntelPython/dpctl/pull/1604)
 * Fixed kernel submission error on NVidia CUDA GPUs during `dpctl.tensor.matmul` operation: [#1605](https://github.com/IntelPython/dpctl/pull/1605)
 * Fixed corruption of context cache table entries: [#1607](https://github.com/IntelPython/dpctl/pull/1607)
+* Fixed output of ``python -m dpctl --library`` to fix specified library name: [#1615](https://github.com/IntelPython/dpctl/pull/1615)


 ## [0.16.0] - Feb. 16, 2024

From 567a845ae6f16c19c632d307c06d479f6c85aaef Mon Sep 17 00:00:00 2001
From: Oleksandr Pavlyk
Date: Wed, 27 Mar 2024 16:31:26 -0500
Subject: [PATCH 04/13] Update CHANGELOG.md

usm-ndarray->usm_ndarray

Co-authored-by: ndgrigorian <46709016+ndgrigorian@users.noreply.github.com>
---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8859645a63..d786876792 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -12,7 +12,7 @@ consuming instances of ``sycl::kernel_accessor`` template type.
 ### Changed

 * Changed behavior of ``dpctl.tensor.usm_ndarray.__dlpack_device__`` method to return device id of the parent unpartition device if array is allocated on a sub-device instead of raising an exception: [#1604](https://github.com/IntelPython/dpctl/pull/1604)
-* Array creation functions and the ``usm-ndarray`` constructor in `dpctl.tensor` submodule now use cached default-selected device to improve performance: [#1606](https://github.com/IntelPython/dpctl/pull/1606)
+* Array creation functions and the ``usm_ndarray`` constructor in `dpctl.tensor` submodule now use cached default-selected device to improve performance: [#1606](https://github.com/IntelPython/dpctl/pull/1606)
 * Changed treatment of `axis` keyword for `dpctl.tensor.tensordot` and `dpctl.tensor.vecdot` to align with Python Array API 2023.12 specification: [#1608](https://github.com/IntelPython/dpctl/pull/1608)
 * Changed implementation of `DPCTLQueue_SubmitRange`, `DPCTLQueue_SubmitNDRange` in DPCTLSyclInterface library to support ``sycl::local_accessor`` arguments needed by ``numba_dpex``; the enum `DPCTLKernelArgType` to correspond to C++ disjoint types: [#1609](https://github.com/IntelPython/dpctl/pull/1609), [#1611](https://github.com/IntelPython/dpctl/pull/1611), [#1612](https://github.com/IntelPython/dpctl/pull/1612)

From 126a1433323d3fe633f511299eb11713206dbc85 Mon Sep 17 00:00:00 2001
From: Oleksandr Pavlyk
Date: Wed, 27 Mar 2024 15:58:50 -0500
Subject: [PATCH 05/13] Fixed typo, documented fix of gh-1570

---
 CHANGELOG.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d786876792..9d153752ea 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -11,7 +11,7 @@ consuming instances of ``sycl::kernel_accessor`` template type.

 ### Changed

-* Changed behavior of ``dpctl.tensor.usm_ndarray.__dlpack_device__`` method to return device id of the parent unpartition device if array is allocated on a sub-device instead of raising an exception: [#1604](https://github.com/IntelPython/dpctl/pull/1604)
+* Changed behavior of ``dpctl.tensor.usm_ndarray.__dlpack_device__`` method to return device id of the parent unpartitioned device if array is allocated on a sub-device instead of raising an exception: [#1604](https://github.com/IntelPython/dpctl/pull/1604)
 * Array creation functions and the ``usm_ndarray`` constructor in `dpctl.tensor` submodule now use cached default-selected device to improve performance: [#1606](https://github.com/IntelPython/dpctl/pull/1606)
 * Changed treatment of `axis` keyword for `dpctl.tensor.tensordot` and `dpctl.tensor.vecdot` to align with Python Array API 2023.12 specification: [#1608](https://github.com/IntelPython/dpctl/pull/1608)
 * Changed implementation of `DPCTLQueue_SubmitRange`, `DPCTLQueue_SubmitNDRange` in DPCTLSyclInterface library to support ``sycl::local_accessor`` arguments needed by ``numba_dpex``; the enum `DPCTLKernelArgType` to correspond to C++ disjoint types: [#1609](https://github.com/IntelPython/dpctl/pull/1609), [#1611](https://github.com/IntelPython/dpctl/pull/1611), [#1612](https://github.com/IntelPython/dpctl/pull/1612)
@@ -21,6 +21,7 @@ consuming instances of ``sycl::kernel_accessor`` template type.
 * Fixed a crash on Windows platform during execution of getter of `dpctl.SyclPlatfom.default_context` property: : [#1604](https://github.com/IntelPython/dpctl/pull/1604)
 * Fixed kernel submission error on NVidia CUDA GPUs during `dpctl.tensor.matmul` operation: [#1605](https://github.com/IntelPython/dpctl/pull/1605)
 * Fixed corruption of context cache table entries: [#1607](https://github.com/IntelPython/dpctl/pull/1607)
+* Fixed incorrect result from ``dpctl.tensor.tensordot`` reported in issue [#1570](https://github.com/IntelPython/dpctl/issues/1570): [#1608](https://github.com/IntelPython/dpctl/pull/1608)
 * Fixed output of ``python -m dpctl --library`` to fix specified library name: [#1615](https://github.com/IntelPython/dpctl/pull/1615)



From 77149a28b0f03c738f320c45dd548ba9eefd3dc7 Mon Sep 17 00:00:00 2001
From: Oleksandr Pavlyk
Date: Wed, 27 Mar 2024 22:00:48 -0500
Subject: [PATCH 06/13] Fictional sycl::kernel_accessor -> sycl::local_accessor

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9d153752ea..8ad86f4404 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,7 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [0.16.1] - Apr. XX, 2024

 This is a bug-fix release, which also provides a change needed by ``numba_dpex`` project to support dispatching kernels
-consuming instances of ``sycl::kernel_accessor`` template type.
+consuming instances of ``sycl::local_accessor`` template type.

 ### Changed


From 731b2097b9ca2ac02d3a337a662c6563e199f9e4 Mon Sep 17 00:00:00 2001
From: Oleksandr Pavlyk
Date: Wed, 10 Apr 2024 11:45:30 -0500
Subject: [PATCH 07/13] Set date in Changelog for release of 0.16.1

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8ad86f4404..9bdb230092 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,7 +4,7 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [0.16.1] - Apr. XX, 2024
+## [0.16.1] - Apr. 10, 2024

 This is a bug-fix release, which also provides a change needed by ``numba_dpex`` project to support dispatching kernels
 consuming instances of ``sycl::local_accessor`` template type.

From 68f9b85f07bb6552f91b3b253275d684bd525328 Mon Sep 17 00:00:00 2001
From: Oleksandr Pavlyk
Date: Wed, 15 May 2024 15:33:04 -0500
Subject: [PATCH 08/13] Populated changelog for 0.17

---
 CHANGELOG.md | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9bdb230092..75432ca897 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,37 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [0.17.0] - May. XX, 2024
+
+This release features updated documentation web-page https://intelpython.github.io/dpctl/latest/index.html, adds cumulative reductions,
+and complies with revision [2023.12](https://data-apis.org/array-api/2023.12/) of Python Array API specification.
+
+### Added
+
+* Added pybind11 caster for ``sycl::half`` to map to/from Python `float` to ``"dpctl4pybind11.hpp"`` header: [gh-1655](https://github.com/IntelPython/dpctl/pull/1655)
+* Added support for DLPack data interchange per Python Array API 2023.12 specification: [gh-1667](https://github.com/IntelPython/dpctl/pull/1667)
+* Implemented `tensor.cumulative_sum`, `tensor.cumulative_prod` and `tensor.cumulative_logsumexp`: [gh-1602](https://github.com/IntelPython/dpctl/pull/1602)
+
+### Changed
+
+* Expanded documentation for `dpctl`: [gh-1619](https://github.com/IntelPython/dpctl/pull/1619)
+* Expanded `utils.intel_device_info` functionality: [gh-1656](https://github.com/IntelPython/dpctl/pull/1656)
+* Improved performance of elementwise operations: [gh-1651](https://github.com/IntelPython/dpctl/pull/1651)
+* Efficiency improvement by avoiding unnecessary copying: [gh-1645](https://github.com/IntelPython/dpctl/pull/1645)
+* `dpctl` uses pybind11 2.12.0: [gh-1640](https://github.com/IntelPython/dpctl/pull/1640)
+
+
+### Fixed
+
+* Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: [gh-1624](https://github.com/IntelPython/dpctl/pull/1624)
+* Fixed crash in `tensor.sort` reported for a CPU device and a CUDA device: [gh-1676](https://github.com/IntelPython/dpctl/pull/1676)
+* Fixed comparison operators for mixed signed and unsigned integral types: [gh-1650](https://github.com/IntelPython/dpctl/pull/1650)
+* Support use of index arrays of different integral types in indexing operations: [gh-47](https://github.com/IntelPython/dpctl/pull/1647)
+* Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: [gh-1630](https://github.com/IntelPython/dpctl/pull/1630)
+* Corrected `tensor.tile` for scalar inputs and empty repetitions: [gh-1628](https://github.com/IntelPython/dpctl/pull/1628)
+* Fixed support for `out` keyword in `tensor.matmul`: [gh-1610](https://github.com/IntelPython/dpctl/pull/1610)
+
+
 ## [0.16.1] - Apr. 10, 2024

 This is a bug-fix release, which also provides a change needed by ``numba_dpex`` project to support dispatching kernels

From 8a787d5b5873ad8d796da37aab82d45310892c83 Mon Sep 17 00:00:00 2001
From: Oleksandr Pavlyk
Date: Wed, 15 May 2024 16:00:38 -0500
Subject: [PATCH 09/13] Added item for fix in gh-1665

---
 CHANGELOG.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 75432ca897..4ac2d083e9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -26,8 +26,9 @@ and complies with revision [2023.12](https://data-apis.org/array-api/2023.12/) o

 ### Fixed

-* Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: [gh-1624](https://github.com/IntelPython/dpctl/pull/1624)
+* Fixed initialization of byte type constants in `dpctl_capi` Python/C API loader class in `"dpctl4pybind11.hpp"`: [gh-1665](https://github.com/IntelPython/dpctl/pull/1665)
 * Fixed crash in `tensor.sort` reported for a CPU device and a CUDA device: [gh-1676](https://github.com/IntelPython/dpctl/pull/1676)
+* Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: [gh-1624](https://github.com/IntelPython/dpctl/pull/1624)
 * Fixed comparison operators for mixed signed and unsigned integral types: [gh-1650](https://github.com/IntelPython/dpctl/pull/1650)
 * Support use of index arrays of different integral types in indexing operations: [gh-47](https://github.com/IntelPython/dpctl/pull/1647)
 * Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: [gh-1630](https://github.com/IntelPython/dpctl/pull/1630)

From e01f2c20b7ca119a12043dd3350aa1e8259c5b07 Mon Sep 17 00:00:00 2001
From: Oleksandr Pavlyk
Date: Wed, 15 May 2024 21:12:50 -0500
Subject: [PATCH 10/13] Update CHANGELOG.md

Co-authored-by: ndgrigorian <46709016+ndgrigorian@users.noreply.github.com>
---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4ac2d083e9..b689f9c92b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -20,7 +20,7 @@ and complies with revision [2023.12](https://data-apis.org/array-api/2023.12/) o
 * Expanded documentation for `dpctl`: [gh-1619](https://github.com/IntelPython/dpctl/pull/1619)
 * Expanded `utils.intel_device_info` functionality: [gh-1656](https://github.com/IntelPython/dpctl/pull/1656)
 * Improved performance of elementwise operations: [gh-1651](https://github.com/IntelPython/dpctl/pull/1651)
-* Efficiency improvement by avoiding unnecessary copying: [gh-1645](https://github.com/IntelPython/dpctl/pull/1645)
+* Efficiency improvement by avoiding unnecessary copying of ``sycl::queue``: [gh-1645](https://github.com/IntelPython/dpctl/pull/1645)
 * `dpctl` uses pybind11 2.12.0: [gh-1640](https://github.com/IntelPython/dpctl/pull/1640)



From 0da9d2bb8ad10114c878ba80df99ce31cf296356 Mon Sep 17 00:00:00 2001
From: Oleksandr Pavlyk
Date: Wed, 15 May 2024 21:13:14 -0500
Subject: [PATCH 11/13] Update CHANGELOG.md

Co-authored-by: ndgrigorian <46709016+ndgrigorian@users.noreply.github.com>
---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index b689f9c92b..06988f9677 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -54,7 +54,7 @@ consuming instances of ``sycl::local_accessor`` template type.
 * Fixed kernel submission error on NVidia CUDA GPUs during `dpctl.tensor.matmul` operation: [#1605](https://github.com/IntelPython/dpctl/pull/1605)
 * Fixed corruption of context cache table entries: [#1607](https://github.com/IntelPython/dpctl/pull/1607)
 * Fixed incorrect result from ``dpctl.tensor.tensordot`` reported in issue [#1570](https://github.com/IntelPython/dpctl/issues/1570): [#1608](https://github.com/IntelPython/dpctl/pull/1608)
-* Fixed output of ``python -m dpctl --library`` to fix specified library name: [#1615](https://github.com/IntelPython/dpctl/pull/1615)
+* Fixed library name output by ``python -m dpctl --library``: [#1615](https://github.com/IntelPython/dpctl/pull/1615)


 ## [0.16.0] - Feb. 16, 2024

From 9ade7f199ae52d2b3d17f14f6e16416ddb177882 Mon Sep 17 00:00:00 2001
From: Oleksandr Pavlyk
Date: Thu, 16 May 2024 10:31:35 -0500
Subject: [PATCH 12/13] Added gh-1677 and gh-1680 to the changelog

---
 CHANGELOG.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 06988f9677..5fe058ff5e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -22,7 +22,7 @@ and complies with revision [2023.12](https://data-apis.org/array-api/2023.12/) o
 * Improved performance of elementwise operations: [gh-1651](https://github.com/IntelPython/dpctl/pull/1651)
 * Efficiency improvement by avoiding unnecessary copying of ``sycl::queue``: [gh-1645](https://github.com/IntelPython/dpctl/pull/1645)
 * `dpctl` uses pybind11 2.12.0: [gh-1640](https://github.com/IntelPython/dpctl/pull/1640)
-
+* Improved performance of `tensor.reshape` operation with `order="F"` when copying is needed, or requested: [gh-1677](https://github.com/IntelPython/dpctl/pull/1677)

 ### Fixed

@@ -34,6 +34,7 @@ and complies with revision [2023.12](https://data-apis.org/array-api/2023.12/) o
 * Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: [gh-1630](https://github.com/IntelPython/dpctl/pull/1630)
 * Corrected `tensor.tile` for scalar inputs and empty repetitions: [gh-1628](https://github.com/IntelPython/dpctl/pull/1628)
 * Fixed support for `out` keyword in `tensor.matmul`: [gh-1610](https://github.com/IntelPython/dpctl/pull/1610)
+* Fixed bug in basic slicing of empty arrays: [gh-1680](https://github.com/IntelPython/dpctl/pull/1680)


 ## [0.16.1] - Apr. 10, 2024

From e4c60f824c46ba19fadfd8616deb262216ace301 Mon Sep 17 00:00:00 2001
From: Oleksandr Pavlyk
Date: Thu, 16 May 2024 15:33:34 -0500
Subject: [PATCH 13/13] Added entries for fixes gh-1681 and gh-1682

---
 CHANGELOG.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5fe058ff5e..4d373efa75 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -35,6 +35,8 @@ and complies with revision [2023.12](https://data-apis.org/array-api/2023.12/) o
 * Corrected `tensor.tile` for scalar inputs and empty repetitions: [gh-1628](https://github.com/IntelPython/dpctl/pull/1628)
 * Fixed support for `out` keyword in `tensor.matmul`: [gh-1610](https://github.com/IntelPython/dpctl/pull/1610)
 * Fixed bug in basic slicing of empty arrays: [gh-1680](https://github.com/IntelPython/dpctl/pull/1680)
+* Fixed bug in `tensor.bitwise_invert` for boolean input array: [gh-1681](https://github.com/IntelPython/dpctl/pull/1681)
+* Fixed bug in `tensor.repeat` on zero-size input arrays: [gh-1682](https://github.com/IntelPython/dpctl/pull/1682)


 ## [0.16.1] - Apr. 10, 2024