Used fma in linsequence_affine kernel #1034
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1034/index.html

Array API standard conformance tests for dpctl=0.14.1dev1=py310h76be34b_14 ran successfully.

Array API standard conformance tests for dpctl=0.14.1dev1=py310h76be34b_15 ran successfully.

Array API standard conformance tests for dpctl=0.14.1dev1=py310h76be34b_16 ran successfully.

Branch updated from afcccfd to 5a126fd.

@npolina4 I fixed the issue by using

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

This PR changes `_tensor_impl` to use the `sycl::fma` function to work around aggressive compiler optimizations that reorder multiplications and cause overflows. The issue could also be addressed by applying the `-fno-associative-math` flag (see https://clang.llvm.org/docs/UsersManual.html#controlling-floating-point-behavior for how to control floating-point behavior in clang), which helps address the issue on Linux, but not on Windows.

This fixes the output of `dpt.linspace(dpt.finfo('f4').max, dpt.finfo('f4').max, num=16, dtype='f4')`, which unexpectedly contained `nan` values, as discovered by @npolina4.
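To illustrate why the order of operations matters for this reproducer, here is a minimal sketch in NumPy `float32` arithmetic. It does not run the dpctl kernel. It compares an affine form `start*w + stop*wc`, with weights `w = (n - i)/n` and `wc = i/n` in [0, 1], against one algebraically equivalent rearrangement that multiplies by unscaled indices and divides last. That rearrangement is hypothetical and chosen only for illustration; the PR does not say which transformation the compiler actually applied. The helper names `affine_form` and `reordered_form`, and the choice `n = num - 1` (endpoint included), are likewise assumptions.

```python
import numpy as np


def affine_form(start, stop, num):
    """Hypothetical helper: start*w + stop*wc with weights in [0, 1]."""
    start, stop = np.float32(start), np.float32(stop)
    nf = np.float32(num - 1)  # assumes the endpoint is included
    i = np.arange(num, dtype=np.float32)
    w = (nf - i) / nf  # weight of start, in [0, 1]
    wc = i / nf        # weight of stop, in [0, 1]
    # Each product is bounded by |start| or |stop| because the weights lie
    # in [0, 1], so no product overflows and no inf - inf can occur.
    with np.errstate(over="ignore"):
        return start * w + stop * wc


def reordered_form(start, stop, num):
    """Hypothetical rearrangement: multiply by unscaled indices, divide last."""
    start, stop = np.float32(start), np.float32(stop)
    nf = np.float32(num - 1)
    i = np.arange(num, dtype=np.float32)
    # Algebraically equal to affine_form, but for float32 max the products
    # start*nf and start*i overflow to inf, and inf - inf then yields nan.
    with np.errstate(over="ignore", invalid="ignore"):
        return (start * nf - start * i + stop * i) / nf


big = np.finfo(np.float32).max
good = affine_form(big, big, 16)
bad = reordered_form(big, big, 16)
print(np.isnan(good).any(), np.isnan(bad).any())  # prints: False True
```

With `start == stop == float32 max`, the rearranged form yields `inf` and `nan` entries, while the affine form contains no `nan`. NumPy has no fused multiply-add, so the sketch models only the unfused affine form; per the PR description, computing the kernel through `sycl::fma` is what avoids the problematic reordering on the device.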