CI: add Linux x86_64 clang ASAN+LSAN job #30800
Conversation
Neat, thanks for working on this! I'll try to take a close look at this next week sometime.
Sorry for dropping this! I'll look this over today.
```shell
`find numpy -name "test*.py" | xargs grep -E -l "import threading|ThreadPoolExecutor" | tr '\n' ' '` \
  -- -v -s --timeout=600 --durations=10
```
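As a quick local illustration of what that selection pipeline does (the scratch directory and file names below are made up for the demo):

```shell
# Create a scratch tree with one threaded and one non-threaded test file
d=$(mktemp -d)
printf 'import threading\n' > "$d/test_threads.py"
printf 'x = 1\n' > "$d/test_plain.py"

# Same pipeline as in the workflow: keep only test files that use threading
selected=$(find "$d" -name "test*.py" \
  | xargs grep -E -l "import threading|ThreadPoolExecutor" \
  | tr '\n' ' ')
echo "$selected"  # only test_threads.py is selected
```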
`gcc_ASAN:`
Sorry for taking two weeks to notice this.
For a while we were using the GCC compiler sanitizers in CI but turned them off after encountering mysterious flaky failures. Perhaps things have improved since then?
If I were setting this up from scratch on Linux, I'd use an LLVM toolchain.
I have a vague feeling that the tests will also run faster under LLVM's ASan implementation. Of course it's apples-to-oranges because of the different CPU and OS, but the Mac ASan builds using LLVM run the tests in 8.5 minutes while the GCC ASan tests take 15 minutes.
Thankfully, we have a neat fix for this already. @nascheme set up ASan docker images that have a CPython build using an LLVM toolchain, see https://github.com/nascheme/cpython_sanity. You should be able to follow the setup of the docker images for the TSan tests, which also use the cpython_sanity images and an LLVM toolchain. This also nicely sidesteps the need to build CPython from source under TSan, and the docker images already have a build environment installed.
I haven't seen flakiness running gcc, but in any case moving to clang reduces run time a bit, mostly numpy build time; see this workflow run in my fork.
I can move to clang in this PR if this is the preferred option.
As for the ASan docker image, it is not usable without further investigation into the leaks reported when running the free-threaded Python build, which is the only one available in those images, as mentioned in my 3rd remark in the description: "Trying to use ghcr.io/nascheme/numpy-asan:3.14t shows many leaks and at this point, I did not check any of those and felt it was better to get something working reliably with the GIL-enabled interpreter rather than nothing."
I'm trying to narrow down the tests causing those leaks, but it will take weeks given the time I can spend on this in my spare time.
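One way to do that narrowing (a sketch of the `__lsan_do_recoverable_leak_check` idea mentioned in the PR description; this assumes a Linux build, and it only does anything useful under an ASan/LSan-instrumented interpreter) is to trigger a recoverable leak check between test files via ctypes:

```python
import ctypes

def lsan_recoverable_leak_check():
    """Trigger LeakSanitizer's recoverable leak check. Returns nonzero if
    leaks were found since the last check, or None when the interpreter is
    not running under ASan/LSan (i.e. a regular CPython build)."""
    try:
        runtime = ctypes.CDLL(None)  # look up symbols already loaded in-process
        fn = getattr(runtime, "__lsan_do_recoverable_leak_check")
    except (OSError, AttributeError):
        return None  # sanitizer runtime not present
    fn.restype = ctypes.c_int
    return fn()

# Calling this between test files (e.g. from a pytest fixture) would attribute
# leaks to the file that ran since the previous check.
result = lsan_recoverable_leak_check()
```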
> which is the only one available in those images as mentioned in my 3rd remark in the description
Sorry, missed this.
I'd like to fix these issues on the free-threaded build, but no worries if you're not interested.
> I can move to clang in this PR if this is the preferred option.
Let's do that.
> Sorry, missed this.

no worries

> I'd like to fix these issues on the free-threaded build, but no worries if you're not interested.

I'll try to post the list of test files that show leaks even if I can't narrow it down to individual tests just yet. This first pass should be faster and can help further investigation by others. Maybe this is better posted as a new issue in the bug tracker.

> Let's do that.

Done. Rebased on main and added clang-21 installation.
Force-pushed from 8c96728 to fc808db
This now comes in at 25 minutes, which is only about a minute longer than our longest QEMU-based test jobs. I think there are plenty of opportunities to speed this up and I'm OK with adding another long test job - particularly one that will help us avoid memory leaks in CI.

I agree, I think this is awesome to have on regular CI, thanks for all the work!
ngoldbaum left a comment
I looked over the CI logs and didn't see anything besides clang-21 adding some compiler warnings, which isn't something to worry about here. One question inline.
```yaml
- name: Test
  run: |
    # pass -s to pytest to see ASAN errors and warnings, otherwise pytest captures them
    export ASAN_OPTIONS="detect_leaks=1:symbolize=1:strict_init_order=true:allocator_may_return_null=1:use_sigaltstack=0"
```
First time I've seen `use_sigaltstack` or `strict_init_order`. Should we maybe be enabling these in other sanitizer jobs?
I've never used them myself before and never saw them elsewhere either. They come from a grep on ASAN_OPTIONS in the numpy code base, which led me to the following line:
numpy/pixi-packages/asan/pixi.toml, line 16 in 089ceb7
Given it seemed to be used for the linux-x64 build in there, I copied the same options into the new job.
Agreed, sounds like a good idea to me - I don't see a problem with one more heavy job. Thanks for doing this.
rgommers left a comment
Overall LGTM, approving - just two small suggestions
```yaml
- name: Install clang 21
  run: |
    curl -fsSLo /tmp/llvm.sh https://apt.llvm.org/llvm.sh
```
This looks a bit questionable, if acceptable. We should replace it as soon as Clang 21 packages are available from Ubuntu though; it looks like that's in the 26.04 image, which should arrive in a month or two. Maybe add a comment?
Comment added.

> This looks a bit questionable, if acceptable.

What kind of issue are you worried about?
I don't want to speak for Ralf, but I guess just depending on upstream binaries instead of system binaries is a little odd. Maybe also the security implications of downloading and executing binaries like this?
Yes, just a general security principle. The more random downloads there are, with no pinning or hashes, the wider the trust circle. If it's the best option for right now, then sure it's not a major issue. But when we can replace it with trusting Ubuntu packages, we should.
I don't think this can be considered a major supply-chain security risk given the source (not so random: the upstream LLVM repos) and the fact that this is only a test job which does not expose sensitive secrets and does not produce any artefacts. It could have been better to manually add the repo and the public key rather than download an unchecked script, but I did not consider the risk enough to warrant the extra complexity.
I agree however that this can count as an example of a bad habit regarding supply-chain security risk.
@nascheme published some new images including a GIL-enabled ASAN python build. See #30905 for an update of this job. This moves the supply chain security risk to what's been accepted for the TSAN job.
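For reference, a hypothetical version of the step that adds the repo and key manually instead of piping `llvm.sh` might look like this (the URLs are the standard apt.llvm.org ones; this narrows trust to the LLVM signing key but still isn't hash-pinned):

```yaml
- name: Install clang 21 (pinned repo, sketch)
  run: |
    # Fetch and dearmor the LLVM apt signing key explicitly
    curl -fsSL https://apt.llvm.org/llvm-snapshot.gpg.key \
      | sudo gpg --dearmor -o /usr/share/keyrings/llvm.gpg
    # Register the repo for the runner's Ubuntu release (noble assumed here)
    echo "deb [signed-by=/usr/share/keyrings/llvm.gpg] http://apt.llvm.org/noble/ llvm-toolchain-noble-21 main" \
      | sudo tee /etc/apt/sources.list.d/llvm.list
    sudo apt-get update && sudo apt-get install -y clang-21
```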
Yep, agreed with all your thoughts - thanks for the follow-up!
I wonder if it would be good to change my docker images to use Ubuntu 25 rather than 24. The 25 release comes with Clang 21 as an Ubuntu package. For testing with TSAN or ASAN, perhaps it's better to be running on the newest stable OS version that's supported.
That does seem useful indeed, newer Clang versions can help. But no hurry to change it right now on account of NumPy.
```yaml
- name: Build Python with address sanitizer
  run: |
    CONFIGURE_OPTS="--with-address-sanitizer" pyenv install 3.14
```
The line numbers in the suppressions file might change with bug fix releases. Might be better to use 3.14.3 here?
The line numbers in the suppressions file are just illustrative (the same goes for the numpy line numbers in there).
They are not used to determine suppressions; it's just a comment.
I used the same major.minor, minus free-threading, as what's found in https://github.com/nascheme/cpython_sanity (although the cpython 3.14 images have not been rebuilt in a while, so I stuck to 3.14.0 for now).
I see both pros and cons in pinning Python to an exact release. On the pro side, obviously stability of the job: it won't fail because a new CPython version is available. On the other hand, I think it would be best to test with the latest patch version, as is done in most of the other jobs.
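For context, LSan suppressions match symbol/module/source patterns, never line numbers, so a file like the following (entries here are hypothetical) stays valid across patch releases:

```text
# Passed via LSAN_OPTIONS=suppressions=/path/to/lsan-suppressions.txt
# CPython 3.14 Modules/somefile.c:123  <- line references like this are comments only
leak:_PyMem_RawMalloc
leak:some_extension_module.so
```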
I'm in the process of building images for 3.14.3. They will also be built automatically when there is a new release, which will be nice.
- add a comment to use regular ubuntu package once ubuntu 26.04 is available
Let's bring this in and see if it is noisy or, even better, starts telling us when we write memory leaks. Thanks so much for the contribution @mayeut and also for your work on manylinux.
This PR adds a Linux x86_64 clang AddressSanitizer+LeakSanitizer job as discussed in #30761 (comment) and #30583
I'll try to address the various comments found in #30583 regarding leak checks in this description.
For now, AddressSanitizer is only run on macOS arm64, so it does make sense to add a job for Linux x86_64, which likely covers a much larger share of users and uses different optimisations not yet covered by existing AddressSanitizer tests. LeakSanitizer should not add much overhead to testing time compared to AddressSanitizer tests.
Limitations:

- The job runs with `detect_leaks=1` and thus, leaks in CPython might be revealed here on CPython upgrades.

Remarks:

- I did not try `__lsan_do_leak_check`/`__lsan_do_recoverable_leak_check` to check if it could help narrow down which specific tests are actually leaking. This could be investigated as a future enhancement.
- (e.g. `python -c "import numpy"` or some subprocess tests in the test suite).
- Trying to use `ghcr.io/nascheme/numpy-asan:3.14t` shows many leaks and at this point, I did not check any of those and felt it was better to get something working reliably with the GIL-enabled interpreter rather than nothing.
- `LSAN_OPTIONS` in environment variables. I did not want to overcomplicate things to determine if leak sanitizer was actually enabled. This also means that if the suppression file evolves to be empty, it should probably be kept, rather than removed, with no updates to the job; or this skip mechanism should be reworked.