PR #41167: Fix shape-dependent erf precision by adding saturation select by copybara-service[bot] · Pull Request #116345 · tensorflow/tensorflow

copybara-service · 2026-04-20T09:46:02Z

PR #41167: Fix shape-dependent erf precision by adding saturation select

Imported from GitHub PR openxla/xla#41167

Summary

The scalar erf lowering in expand_float_ops.cc clamps input x to [-kErfInvOneMinusHalfULP, +kErfInvOneMinusHalfULP] and then evaluates the rational polynomial. Any input at or beyond the clamp boundary therefore yields the polynomial's boundary value, 0.9999998212, instead of 1.0 (3 f32 ULPs off). The vector path in xla/codegen/intrinsic/erf.cc already handles saturation correctly via a select; this adds the matching select to the scalar path.
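The "3 f32 ULPs" figure can be checked by comparing IEEE-754 bit patterns. The sketch below is illustrative only: `UlpDistance` is a hypothetical helper, not an XLA function, and it assumes both inputs are finite and share a sign.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of XLA): counts how many representable f32
// steps separate two finite floats that have the same sign.
inline std::int64_t UlpDistance(float a, float b) {
  assert(std::isfinite(a) && std::isfinite(b));
  assert(std::signbit(a) == std::signbit(b));
  std::uint32_t ua = 0;
  std::uint32_t ub = 0;
  std::memcpy(&ua, &a, sizeof(float));
  std::memcpy(&ub, &b, sizeof(float));
  // For same-sign finite floats, consecutive values have consecutive bit
  // patterns, so the integer difference is the distance in ULPs.
  const std::int64_t d =
      static_cast<std::int64_t>(ua) - static_cast<std::int64_t>(ub);
  return d < 0 ? -d : d;
}
```

Calling `UlpDistance(0.9999998212f, 1.0f)` compares the value reported above with the saturated result the lowering is expected to return.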

Context

This was diagnosed collaboratively in #41122:

- @wuyii8941 reported the shape-dependent precision issue and provided a side-by-side LLVM IR dump that isolated the divergence to the scalar-vs-vector lowering paths
- Follow-up investigation established that the polynomial coefficients are identical across both paths; only the saturation strategy differs
- @wuyii8941 confirmed a saturation select after line 107 as the correct fix direction

The comment on lines 83-84 documents the intended behavior ("outside of which x should be +/-1") but the implementing code never enforced the ±1 fallback.

Approach

Compute the saturation check on the original unclamped x before the clamp is applied:

Value abs_x = math::AbsFOp::create(b, original_x);
Value saturates = ma::CmpFOp::create(b, CmpFPredicate::OGE, abs_x,
                                     c(kErfInvOneMinusHalfULP));
Value saturated_value = math::CopySignOp::create(b, c(1.0f), original_x);
// ... existing clamp + polynomial evaluation ...
rewriter.replaceOpWithNewOp<SelectOp>(op, saturates, saturated_value,
                                      poly_result);

For x = 4.5: |4.5| >= 3.7439 → select returns copysign(1.0, 4.5) = 1.0.
For in-range x: select picks the polynomial result, unchanged from before.
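The difference between the old clamp-only behavior and the new saturation select can be sketched in plain C++, outside of MLIR. This is a minimal illustration under stated assumptions, not the XLA implementation: `ErfClampOnly`, `ErfWithSaturation`, and `kSaturationThreshold` are hypothetical names, the threshold is the 3.7439 value quoted above, and `approx` stands in for the rational polynomial, whose coefficients are not reproduced here.

```cpp
#include <algorithm>
#include <cmath>

// Assumption: threshold taken from the worked example above (|4.5| >= 3.7439).
constexpr float kSaturationThreshold = 3.7439f;

// Old behavior: clamp the input, then evaluate the approximation.
// `approx` is a stand-in for the rational polynomial.
template <typename Approx>
float ErfClampOnly(float x, Approx approx) {
  const float clamped =
      std::max(-kSaturationThreshold, std::min(x, kSaturationThreshold));
  return approx(clamped);
}

// New behavior: decide saturation on the original, unclamped x, and select
// between copysign(1, x) and the polynomial result.
template <typename Approx>
float ErfWithSaturation(float x, Approx approx) {
  const bool saturates = std::fabs(x) >= kSaturationThreshold;
  const float saturated_value = std::copysign(1.0f, x);
  const float poly_result = ErfClampOnly(x, approx);
  return saturates ? saturated_value : poly_result;
}
```

Passing a stub such as `[](float v) { return std::copysign(0.9999998212f, v); }` as `approx` mimics the boundary value reported in the summary: `ErfClampOnly(4.5f, stub)` returns the stub's value, while `ErfWithSaturation(4.5f, stub)` returns exactly `1.0f`. In-range inputs take the polynomial branch in both functions, so their results are unchanged.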

Testing

- Added CHECK lines to expand_float_ops.mlir verifying the lowering contains math.absf, arith.cmpf oge, math.copysign, and arith.select
- All 19 tests in //xla/codegen/emitters/transforms/tests:tests pass locally (bazel test //xla/codegen/emitters/transforms/tests:tests)

cc @wuyii8941 — you flagged this and confirmed the fix direction in the issue, feel free to take a look if interested.
Copybara import of the project:

--
cd3de7891e6deaeedbd80d89a1ff93625b538fbc by Manish Reddy <kreddy.manish@gmail.com>:

Fix shape-dependent erf precision by adding saturation select.

The scalar erf lowering in expand_float_ops.cc clamped the input to
[-kErfInvOneMinusHalfULP, kErfInvOneMinusHalfULP] and evaluated the
rational polynomial, producing ~0.9999998212 at the boundary instead
of 1.0 (3 f32 ULPs off). The vector path in xla/codegen/intrinsic/erf.cc
already handles saturation correctly via a select; this adds the
matching select to the scalar path.

The comment on lines 83-84 documents this intent ("outside of which x
should be +/-1") but the original code never enforced it. The fix
follows exactly the strategy used by EmitErfF32 in erf.cc: check the
original unclamped |x| against the threshold, and if |x| reaches or
exceeds it, return copysign(1.0, x) directly.

Fixes #41122

Merging this change closes #41167

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#41167 from kredd2506:fix-scalar-erf-saturation cd3de7891e6deaeedbd80d89a1ff93625b538fbc

PiperOrigin-RevId: 902516787

copybara-service[bot] wants to merge 1 commit into master from exported_pr_902516787