Skip to content

Fix space leak between modules during compilation#4517

Merged
JordanMartinez merged 27 commits intopurescript:masterfrom
MonoidMusician:mm/module-space-leak
Dec 20, 2023
Merged

Fix space leak between modules during compilation#4517
JordanMartinez merged 27 commits intopurescript:masterfrom
MonoidMusician:mm/module-space-leak

Conversation

@MonoidMusician
Copy link
Copy Markdown
Contributor

@MonoidMusician MonoidMusician commented Nov 7, 2023

To fix #4491.

For builds with a lot of warnings, memory usage grows drastically since it appears that the thunks for the warnings hang onto a lot of memory from compiling the module itself.

The goal of this change is to get memory usage for full builds back in line with partial builds.

This is just a WIP patch for now, showing a sort of minimal solution to the problem that isn't overly strict. I'd appreciate suggestions on how to accomplish it better.

rnf x = rnf (show x) is obviously a bad NFData SimpleErrorMessage instance, but it was going to be very tedious to derive NFData for every transitive data type. Perhaps it's better to make a dedicated function that computes just enough strictness for the errors? Or adding more strictness when computing the warnings in the first place, and to the data types themselves.

For part of our work codebase (which generates a lot of warnings), the difference is quite dramatic: 4GB of memory usage down to under 1GB, as measured by the GHC RTS profiling:

image image

The difference is less dramatic, but still visible, when running on just our public dependencies – I suspect this is since they produce much fewer warnings.

I've been profiling with this command, though -hc is not a particularly helpful heap profile to track in retrospect:

$PURS +RTS -l-agu -i0.5 -hc -RTS compile -g corefn $(spago sources)

@mjrussell do you have a way to compile PureScript and test this out on your codebase?

Checklist:

  • Added a file to CHANGELOG.d for this PR (see CHANGELOG.d/README.md)
  • Linked any existing issues or proposals that this pull request should close
  • Updated or added relevant documentation
  • Added a test for the contribution (if applicable)

For builds with a lot of warnings, memory usage grows drastically
since it appears that the thunks for the warnings hang onto a lot
of memory from compiling the module itself.

The goal of this change is to get memory usage for full builds
back in line with partial builds.
@mjrussell
Copy link
Copy Markdown
Contributor

Wow! Nice work. I can try to give it a run this weekend. Sorry I never got you a smaller repro case like I promised

@mjrussell
Copy link
Copy Markdown
Contributor

@MonoidMusician I built your PR of purescript locally:

purs --version
0.15.12 [development build; commit: 27ec9cea1e45774647c58dfccc1665afc6d1d710 DIRTY]

I'm still seeing some pretty sizable memory pressure issues and longer compile times than on 15.2. I'm seeing about 26GB peak ram usage by this PR and 4m4s for a compilation of the bottleneck package whereas 15.2 maxes out at 1GB of ram and runs in 22 seconds.

@MonoidMusician
Copy link
Copy Markdown
Contributor Author

Thanks for taking the time to try it out!

I have pushed some more commits you can try: being more strict in the data kept between modules, and removing the CSE (in two commits: just dropping the CSE pass, and reverting the structural changes too).

On our codebase I actually don’t see a noticeable difference with these approaches. Just hoping it might help your codebase. If it doesn’t, we’ll have to look elsewhere!

@JordanMartinez
Copy link
Copy Markdown
Contributor

I feel like reverting the CSE pass doesn't give Ryan an opportunity to refactor that code. And I'd guess he hasn't because (besides potentially not having time) we still don't have a reproducible example.

@MonoidMusician
Copy link
Copy Markdown
Contributor Author

Yeah, I understand, this is just for debugging purposes to narrow down if it is CSE or not. (I don't necessarily believe it is CSE but that was one of the biggest changes that I can point to in the timeframe. Although I have another commit to look at now ...)

@mjrussell
Copy link
Copy Markdown
Contributor

@MonoidMusician I went to 7cba4ec and its close to 15.2 performance!

spago build 88.56s user 7.10s system 272% cpu 35.124 total so thats like ~3-4x worse time, but not 10-11x.

The memory pressure is greatly reduced, only about 2GB of ram, way under 26GB earlier in the PR.

I'll give the commits with just strictness a try

@mjrussell
Copy link
Copy Markdown
Contributor

dec9091 alone doesn't seem to fix the issue, in fact..it might be worse than 27ec9ce.

spago build 254.14s user 160.48s system 136% cpu 5:04.05 total

The memory pressure was really strong, like 40GB of ram. On to the next

@mjrussell
Copy link
Copy Markdown
Contributor

Yeah looks to be CSE: On 33582d7

spago build 87.00s user 7.48s system 246% cpu 38.336 total 1-2 GB of ram

@rhendric
Copy link
Copy Markdown
Member

Again, if you can carve out something that you can make public and demonstrates the issue—it doesn't have to be minimal—there are at least a half dozen different things I can imagine that might improve memory use in the CSE pass and I would be happy to investigate if any of them actually help.

@mjrussell
Copy link
Copy Markdown
Contributor

mjrussell commented Nov 23, 2023

I was writing up some explanation for why I can't make the file public b/c its internal types and we should try scripting out the generation of a 14k line file, while at the same time testing a fix that seems to alleviate the problem.

spago build 124.99s user 12.42s system 206% cpu 1:06.55 total It has a bit higher mem pressure ~2GB constant but its easily workable.

Replacing Data.Map with Data.Map.Strict in CSE.hs does the trick.

I was expecting it to be something like that, building up thunks in a lazy structure. I thought you might need to use RWS.Strict too but this is enough for a dramatic improvement

@MonoidMusician
Copy link
Copy Markdown
Contributor Author

interesting! it looks like we're chasing down rather separate issues then

I also have a fix coming up for more inter-module leaks WRT concurrency, so that should improve it even more

(also will help narrow down what modules are problematic)

This ensures that modules are fully built in one pass,
to avoid partial builds being interrupted and holding onto
memory in the meantime.
This reverts commit 33582d7.
@mjrussell
Copy link
Copy Markdown
Contributor

I pulled out that single Map change into #4521, thanks again!

@MonoidMusician
Copy link
Copy Markdown
Contributor Author

Okay, here's the second fix I promised:

With +RTS -N1 -RTS, the Haskell runtime was single threaded but the module builds were getting interleaved in some way, so the memory usage kind of bled across modules and peaked at 4.6GB. After introducing a semaphore to limit the concurrent module builds to the number of runtime threads, the memory usage looks more reasonable (a nice sawtooth graph) and peaked at 1.9GB.

image image

I verified with the event logging that there are no longer too many modules being built simultaneously. (See trace.js if you want to replicate this.)

It matters less for builds with -N8 (the default on my system) but still seems to make a difference.

I don't see significant differences with the CSE fix/reverts on this build, even measuring memory per-module in single threaded mode, so I guess it is orthogonal.

@MonoidMusician
Copy link
Copy Markdown
Contributor Author

More stats from the eventlogs:

with -N8 it can stay at 8 concurrent modules for 60% of the build and averages just under 6 concurrent modules ... but the user time speedup is only 2x-3x compared to a single-threaded build?

with -N6 it will stay at 6 concurrent modules for 70% of the build and average just under 5 concurrent modules, with comparable user time and a bit lower memory usage

with -N4 it gets a little slower (maybe 10%?) but still less memory usage

I can replicate these proportions without profiling, so there's probably more to investigate performance wise, but I'm happy with this for now.

@MonoidMusician
Copy link
Copy Markdown
Contributor Author

Sorry for the commit noise, but I think this should be good to go.

The changes are:

  • NFData instances for more types
  • Semaphore to limit concurrent module builds
  • Deep sequence the module data before returning from that semaphore.
    (NFData is a pretty big hammer but it should ensure we don't see leaks here again and I don't think it noticeably makes things slower.)

Can I keep traces.js around? Maybe stick it in a dev directory or something? I think it is a useful script for debugging compile times.

There's still some work that could be done in optimization (e.g. purs is spending half of its time in GC), but this is good improvement for now.

@f-f
Copy link
Copy Markdown
Member

f-f commented Nov 27, 2023

@MonoidMusician did you benchmark running times? I.e. does it improve memory usage while being about as fast as before, or is there a tradeoff involved? (with a longer running time)

@MonoidMusician
Copy link
Copy Markdown
Contributor Author

I timed each run and didn't see noticeable slow downs from the fixes here. Just speedups from the reduced memory usage, where applicable.

@JordanMartinez
Copy link
Copy Markdown
Contributor

@MonoidMusician Is this PR ready for review? Or are you still working on it? It's still in a draft state, so I'm unsure whether to wait or to start reviewing.

@MonoidMusician
Copy link
Copy Markdown
Contributor Author

Yeah, I'll un-draft it

@MonoidMusician MonoidMusician marked this pull request as ready for review December 2, 2023 04:00
Copy link
Copy Markdown
Contributor

@JordanMartinez JordanMartinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some general comments.

Comment thread traces.js
Comment thread traces.js
Comment thread src/Language/PureScript/AST/Declarations.hs Outdated
Comment thread src/Language/PureScript/AST/Declarations.hs Outdated
@JordanMartinez
Copy link
Copy Markdown
Contributor

Here's the steps I took. I don't know what I'm doing so feel free to correct me when I'm wrong. Since there's not a public codebase to compile this PR against, I went with the package set.

# Enter purescript root directory
$ pushd purescript
# Make a profile build
$ stack build --profile

# Install entire package set in another dir called 'z'
$ mkdir z
$ pushd z
$ spago init
$ spago install $(spago sources | cut -f 1 -d ' ' | tr '\n' ' ')
$ popd
# Setup eventlog2html in separate directory because PS's stack config 
# doesn't work on eventlog2html's latest version
$ popd
$ mkdir eventlog2html-temp-build
$ pushd eventlog2html-temp-build
$ stack build --copy-compiler-tool eventlog2html
$ popd

# Run the compiler on the package set
$ pushd purescript/z
# ... except that the below command fails...
$ stack --profile exec bash <<EOF
purs +RTS -l-agu -i1.5 -hc -RTS compile -g corefn $(spago sources)
EOF
# ... something about `purs` needing to be built with `-profile`.
# After more researching later, I passed the globs directly by producing the string manually
$ stack --profile exec -- purs +RTS -l-agu -i1.5 -hc -RTS compile -g corefn "src/**/*.purs" "test/**/*.purs" ".spago/prelude/v6.0.1/src/**/*.purs" # and so forth...

# Produce the HTML file
$ path/to/install/eventlog2html purs.eventlog
# Produce the JSON file
$ path/to/install/eventlog2html --json purs.eventlog
$ node ../debug/eventlog.js purs.eventlog.json > eventlog-output.txt

Outputs I get are:

Copy link
Copy Markdown
Member

@rhendric rhendric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate how deep into declarations/expressions the NFData instances need to go, but I also don't see a straightforward alternative that achieves the goal, and I agree with the general direction. After some cleanup, I'm good with this.

Comment thread src/Language/PureScript/Make.hs Outdated
Comment thread src/Language/PureScript/AST/Binders.hs Outdated
Comment thread src/Language/PureScript/Make.hs Outdated
Comment thread src/Language/PureScript/AST/Binders.hs Outdated
Comment thread src/Language/PureScript/Sugar/Names/Env.hs Outdated
Comment thread src/Language/PureScript/CST/Layout.hs Outdated
@MonoidMusician
Copy link
Copy Markdown
Contributor Author

Thanks for the reviews! Should be good now

@rhendric Agreed on all the places NFData needs to reach. But I'd rather be sure that the problem isn't going to come back, and I don't think there's noticeable performance impact from it. Thanks for the style tips.

@JordanMartinez I added those stack profiling instructions to eventlog.js too, thanks for that. I think the reason that stack --profile exec bash doesn't work (at least for me) is that my user directories take precedence over the stack provided $PATH.

Also a singular glob .spago/*/*/src/**/*.purs should work to pick up all spago packages (as long as there is only one version of each package downloaded).

Comment thread src/Language/PureScript/CST/Layout.hs Outdated
@JordanMartinez JordanMartinez merged commit e826bff into purescript:master Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression in compilation time and memory in Purescript > 0.15.2

5 participants