Skip to content

CI: add Linux x86_64 clang ASAN+LSAN job#30800

Merged
ngoldbaum merged 2 commits intonumpy:mainfrom
mayeut:linux-x64-gcc-asan
Feb 27, 2026
Merged

CI: add Linux x86_64 clang ASAN+LSAN job#30800
ngoldbaum merged 2 commits intonumpy:mainfrom
mayeut:linux-x64-gcc-asan

Conversation

@mayeut
Copy link
Copy Markdown
Contributor

@mayeut mayeut commented Feb 7, 2026

This PR adds a Linux x86_64 clang AddressSanitizer+LeakSanitizer job as discussed in #30761 (comment) and #30583

I'll try to address the various comments found in #30583 regarding leak checks in this description.

For now, AddressSanitizer is only ran on macOS arm64 so it does make sense to add a job for Linux x86_64 which likely covers a much larger share of users and uses different optimisations not yet covered by existing AddressSanitizer tests. LeakSanitizer should not add much overhead to testing time compared to AddressSanitizer tests.

Limitations:

  • CPython itself does not seem to be tested with detect_leaks=1 and thus, leaks in CPython might be revealed here on CPython upgrades.
  • The actual leak check is only done once the interpreter exits and the (native only) stack trace does not help tracking which tests leak. As mentioned by @seberg in ENH: add C memory leak regression tests #30583 (comment), if this runs on every PR then you will know relatively well what might have caused the leak as it would be related to code changed in the PR itself.
  • It's much easier to skip some tests rather than to try and update the suppression file (there are 4 tests skipped in this PR, at least one of those should be a leak mentioned in BUG: Fix some bugs found via valgrind #30680 (comment)) which mean those are not covered by AddressSanitizer, for this specific job.

Remarks:

  • I didn't try to use __lsan_do_leak_check/__lsan_do_recoverable_leak_check to check if it could help narrow down which specific tests are actually leaking. This could be investigated as a future enhancement.
  • I would expect more than 2 leaks for initialize_static_globals, furthermore, when running the full test suite, those do not appear (as opposed to running python -c "import numpy" or some subprocess tests in the test suite).
  • Trying to use ghcr.io/nascheme/numpy-asan:3.14t shows many leaks and at this point, I did not check any of those and felt it was better to get something working reliably with the GIL-enabled interpreter rather than nothing.
  • Running on Linux aarch64 (I do use this locally in a docker container on macOS at home where I do opensource contributions in my spare time) is really slow compared to x86_64 (which I'm more used to at work).
  • Skipping tests is only done based on the presence of LSAN_OPTIONS in environment variables. I did not want to overcomplicate things to determine if leak sanitizer was actually enabled. This also means that if the suppression file evolves to be empty, it shall probably shall be kept, rather than removed, with no updates to the job or, this skip mechanism should be reworked.
  • The leaks fixed by BUG: Fix some leaks found via LeakSanitizer #30756 and BUG: Fix some leaks found in f2py via LeakSanitizer #30761 were found while setting up this job in my fork.

@ngoldbaum
Copy link
Copy Markdown
Member

Neat, thanks for working on this! I'll try to take a close look at this next week sometime.

@jorenham jorenham changed the title ci: add Linux x86_64 gcc ASAN+LSAN job CI: add Linux x86_64 gcc ASAN+LSAN job Feb 20, 2026
Comment thread .github/workflows/compiler_sanitizers.yml
@ngoldbaum
Copy link
Copy Markdown
Member

Sorry for dropping this! I'll look this over today.

`find numpy -name "test*.py" | xargs grep -E -l "import threading|ThreadPoolExecutor" | tr '\n' ' '` \
-- -v -s --timeout=600 --durations=10

gcc_ASAN:
Copy link
Copy Markdown
Member

@ngoldbaum ngoldbaum Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking two weeks to notice this.

For a while we were using the GCC compiler sanitizers in CI but turned them off after encountering mysterious flaky failures. Perhaps things have improved since then?

If I were setting this up from scratch on Linux, I'd use an LLVM toolchain.

I have a vague feeling that the tests will also run faster under LLVM's ASan implementation. Of course it's apples-to-oranges because of the different CPU and OS but the Mac ASan builds using LLVM run the tests in 8.5 minutes but the GCC ASan tests run in 15 minutes.

Thankfully, we have a neat fix for this already. @nascheme set up ASan docker images that have a CPython build using an LLVM toolchain, see https://github.com/nascheme/cpython_sanity. You should be able to follow the set up of the docker images for the TSan tests, which also use the cpython_sanity images and an LLVM toolchain. This also nicely sidesteps the need to build CPython from source under TSan and the docker images already have a build environment installed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen flakiness running gcc but in any case moving to clang allows to reduce run time a bit, mostly numpy build time, see this workflow run in my fork.
I can move to clang in this PR if this is the preferred option.

As for the Asan docker image, it is not usable without further investigations into the leaks reported when running the free-threaded python build which is the only one available in those images as mentioned in my 3rd remark in the description: "Trying to use ghcr.io/nascheme/numpy-asan:3.14t shows many leaks and at this point, I did not check any of those and felt it was better to get something working reliably with the GIL-enabled interpreter rather than nothing."
I'm trying to narrow down the tests causing those leaks but it will take weeks given the time I can spend on this in my spare time.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is the only one available in those images as mentioned in my 3rd remark in the description

Sorry, missed this.

I'd like to fix these issues on the free-threaded build, but no worries if you're not interested.

I can move to clang in this PR if this is the preferred option.

Let's do that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed this.

no worries

I'd like to fix these issues on the free-threaded build, but no worries if you're not interested.

I'll try & post the list of test files that shows leak even if I can't narrow it down to individual tests just yet. This first pass should be faster and can help further investigations by others. Maybe this is better posted as a new issue in the bug tracker.

Let's do that.

done. rebased on main and added clang-21 installation.

@mayeut mayeut force-pushed the linux-x64-gcc-asan branch from 8c96728 to fc808db Compare February 26, 2026 06:46
@mayeut mayeut changed the title CI: add Linux x86_64 gcc ASAN+LSAN job CI: add Linux x86_64 clang ASAN+LSAN job Feb 26, 2026
@ngoldbaum
Copy link
Copy Markdown
Member

This now comes in at 25 minutes, which is only about a minute longer than our longest QEMU-based test jobs.

I think there are plenty of opportunities to speed this up and I'm OK with adding another long test job - particularly one that will help us avoid memory leaks in CI.

@charris @rgommers @seberg what do you all think?

@seberg
Copy link
Copy Markdown
Member

seberg commented Feb 26, 2026

I agree, I think this is awesome to have on regular CI, thanks for all the work!

Copy link
Copy Markdown
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked over the CI logs and didn't see anything besides clang-21 adding some compiler warnings, which isn't something to worry about here. One question inline.

- name: Test
run: |
# pass -s to pytest to see ASAN errors and warnings, otherwise pytest captures them
export ASAN_OPTIONS="detect_leaks=1:symbolize=1:strict_init_order=true:allocator_may_return_null=1:use_sigaltstack=0"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First time I've seen use_sigaltstack or strict_init_order. Should we maybe be enabling in other sanitizer jobs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never used them myself before and never saw them either. They come from a grep on ASAN_OPTIONS on the numpy code base which lead me to the following line:

env.ASAN_OPTIONS = "detect_leaks=0:symbolize=1:strict_init_order=true:allocator_may_return_null=1:use_sigaltstack=0"

Given it seemed to be used for the linux-x64 build in their, I've copied the same options in the new job.

@rgommers
Copy link
Copy Markdown
Member

Agreed, sounds like a good idea to me - I don't see a problem with one more heavy job. Thanks for doing this.

Copy link
Copy Markdown
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, approving - just two small suggestions


- name: Install clang 21
run: |
curl -fsSLo /tmp/llvm.sh https://apt.llvm.org/llvm.sh
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit questionable, if acceptable. We should replace it as soon as Clang 21 packages are available from Ubuntu though, it looks like that's in the 26.04 image which should arrive in a month or two. Maybe add a comment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment added.

This looks a bit questionable, if acceptable.

What kind of issue are you worried about ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to speak for Ralf, but I guess just depending on upstream binaries instead of system binaries is a little odd. Maybe also the security implications of downloading and executing binaries like this?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just a general security principle. The more random downloads there are, with no pinning or hashes, the wider the trust circle. If it's the best option for right now, then sure it's not a major issue. But when we can replace it with trusting Ubuntu packages, we should.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this could be considered as a major supply chain security risk given the source (not so random, upstream LLVM repos) and the fact that's this is only a test job which does not expose sensitive secrets and does not produce any artefacts. It could have been better by manually adding the repo and the public key rather than an unchecked script download but I did not considered the risk enough to warrant for the extra complexity.

I agree however that this can count as an exemple of bad habit regarding supply chain security risk.

@nascheme published some new images including a GIL-enabled ASAN python build. See #30905 for an update of this job. This moves the supply chain security risk to what's been accepted for the TSAN job.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, agreed with all your thoughts - thanks for the follow-up!

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if would be good if I change my docker images to use Ubuntu 25 rather than 24. The 25 release comes with Clang 21 as an Ubuntu package. For testing with TSAN or ASAN, perhaps it's better to be running on the newest stable OS version that's supported.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does seem useful indeed, newer Clang versions can help. But no hurry to change it right now on account of NumPy.


- name: Build Python with address sanitizer
run: |
CONFIGURE_OPTS="--with-address-sanitizer" pyenv install 3.14
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line numbers in the suppressions file might change with bug fix releases. Might be better to use 3.14.3 here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line numbers in the suppressions file are just illustrative (the same goes for numpy line numbers in there).
They are not used to determine suppressions, it's just a comment.
I used the same major.minor minus free-threading as what's found in https://github.com/nascheme/cpython_sanity (although cpython 3.14 images have not been rebuilt in a while so stuck to 3.14.0 for now).
I see both pros and cons in pinning python to an exact release. In the pros, obviously stability of the job, it won't fail because a new cpython version is available. On the other hand, I think it would be best to test with the latest patch version as is done in most of the other jobs.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in the process of building images for 3.14.3. They will also be built automatically when there is a new release, which will be nice.

- add a comment to use regular ubuntu package once ubuntu 26.04 is available
@ngoldbaum
Copy link
Copy Markdown
Member

Let's bring this in and see if it is noisy or, even better, starts telling us when we write memory leaks.

Thanks so much for the contribution @mayeut and also for your work on manylinux.

@ngoldbaum ngoldbaum merged commit fb3b178 into numpy:main Feb 27, 2026
77 checks passed
@mayeut mayeut deleted the linux-x64-gcc-asan branch February 28, 2026 19:58
sabasiddique1 pushed a commit to sabasiddique1/numpy that referenced this pull request Mar 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants