Fix space leak between modules during compilation#4517
Fix space leak between modules during compilation#4517JordanMartinez merged 27 commits intopurescript:masterfrom
Conversation
For builds with a lot of warnings, memory usage grows drastically since it appears that the thunks for the warnings hang onto a lot of memory from compiling the module itself. The goal of this change is to get memory usage for full builds back in line with partial builds.
|
Wow! Nice work. I can try to give it a run this weekend. Sorry I never got you a smaller repro case like I promised |
|
@MonoidMusician I built your PR of purescript locally: I'm still seeing some pretty sizable memory pressure issues and longer compile times than on 15.2. I'm seeing about 26GB peak ram usage by this PR and 4m4s for a compilation of the bottleneck package whereas 15.2 maxes out at 1GB of ram and runs in 22 seconds. |
|
Thanks for taking the time to try it out! I have pushed some more commits you can try: being more strict in the data kept between modules, and removing the CSE (in two commits: just dropping the CSE pass, and reverting the structural changes too). On our codebase I actually don’t see a noticeable difference with these approaches. Just hoping it might help your codebase. If it doesn’t, we’ll have to look elsewhere! |
|
I feel like reverting the CSE pass doesn't give Ryan an opportunity to refactor that code. And I'd guess he hasn't because (besides potentially not having time) we still don't have a reproducible example. |
|
Yeah, I understand, this is just for debugging purposes to narrow down if it is CSE or not. (I don't necessarily believe it is CSE but that was one of the biggest changes that I can point to in the timeframe. Although I have another commit to look at now ...) |
|
@MonoidMusician I went to 7cba4ec and its close to 15.2 performance!
The memory pressure is greatly reduced, only about 2GB of ram, way under 26GB earlier in the PR. I'll give the commits with just strictness a try |
|
Yeah looks to be CSE: On 33582d7
|
|
Again, if you can carve out something that you can make public and demonstrates the issue—it doesn't have to be minimal—there are at least a half dozen different things I can imagine that might improve memory use in the CSE pass and I would be happy to investigate if any of them actually help. |
|
I was writing up some explanation for why I can't make the file public b/c its internal types and we should try scripting out the generation of a 14k line file, while at the same time testing a fix that seems to alleviate the problem.
Replacing I was expecting it to be something like that, building up thunks in a lazy structure. I thought you might need to use RWS.Strict too but this is enough for a dramatic improvement |
|
interesting! it looks like we're chasing down rather separate issues then I also have a fix coming up for more inter-module leaks WRT concurrency, so that should improve it even more (also will help narrow down what modules are problematic) |
This ensures that modules are fully built in one pass, to avoid partial builds being interrupted and holding onto memory in the meantime.
…escript#3915)"" This reverts commit 7cba4ec.
This reverts commit 33582d7.
|
I pulled out that single Map change into #4521, thanks again! |
|
More stats from the eventlogs: with -N8 it can stay at 8 concurrent modules for 60% of the build and averages just under 6 concurrent modules ... but the user time speedup is only 2x-3x compared to a single-threaded build? with -N6 it will stay at 6 concurrent modules for 70% of the build and average just under 5 concurrent modules, with comparable user time and a bit lower memory usage with -N4 it gets a little slower (maybe 10%?) but still less memory usage I can replicate these proportions without profiling, so there's probably more to investigate performance wise, but I'm happy with this for now. |
|
Sorry for the commit noise, but I think this should be good to go. The changes are:
Can I keep There's still some work that could be done in optimization (e.g. purs is spending half of its time in GC), but this is good improvement for now. |
|
@MonoidMusician did you benchmark running times? I.e. does it improve memory usage while being about as fast as before, or is there a tradeoff involved? (with a longer running time) |
|
I timed each run and didn't see noticeable slow downs from the fixes here. Just speedups from the reduced memory usage, where applicable. |
|
@MonoidMusician Is this PR ready for review? Or are you still working on it? It's still in a draft state, so I'm unsure whether to wait or to start reviewing. |
|
Yeah, I'll un-draft it |
JordanMartinez
left a comment
There was a problem hiding this comment.
Just some general comments.
|
Here's the steps I took. I don't know what I'm doing so feel free to correct me when I'm wrong. Since there's not a public codebase to compile this PR against, I went with the package set. # Enter purescript root directory
$ pushd purescript
# Make a profile build
$ stack build --profile
# Install entire package set in another dir called 'z'
$ mkdir z
$ pushd z
$ spago init
$ spago install $(spago sources | cut -f 1 -d ' ' | tr '\n' ' ')
$ popd
# Setup eventlog2html in separate directory because PS's stack config
# doesn't work on eventlog2html's latest version
$ popd
$ mkdir eventlog2html-temp-build
$ pushd eventlog2html-temp-build
$ stack build --copy-compiler-tool eventlog2html
$ popd
# Run the compiler on the package set
$ pushd purescript/z
# ... except that the below command fails...
$ stack --profile exec bash <<EOF
purs +RTS -l-agu -i1.5 -hc -RTS compile -g corefn $(spago sources)
EOF
# ... something about `purs` needing to be built with `-profile`.
# After more researching later, I passed the globs directly by producing the string manually
$ stack --profile exec -- purs +RTS -l-agu -i1.5 -hc -RTS compile -g corefn "src/**/*.purs" "test/**/*.purs" ".spago/prelude/v6.0.1/src/**/*.purs" # and so forth...
# Produce the HTML file
$ path/to/install/eventlog2html purs.eventlog
# Produce the JSON file
$ path/to/install/eventlog2html --json purs.eventlog
$ node ../debug/eventlog.js purs.eventlog.json > eventlog-output.txtOutputs I get are:
|
rhendric
left a comment
There was a problem hiding this comment.
It's unfortunate how deep into declarations/expressions the NFData instances need to go, but I also don't see a straightforward alternative that achieves the goal, and I agree with the general direction. After some cleanup, I'm good with this.
|
Thanks for the reviews! Should be good now @rhendric Agreed on all the places NFData needs to reach. But I'd rather be sure that the problem isn't going to come back, and I don't think there's noticeable performance impact from it. Thanks for the style tips. @JordanMartinez I added those stack profiling instructions to eventlog.js too, thanks for that. I think the reason that Also a singular glob |




To fix #4491.
For builds with a lot of warnings, memory usage grows drastically since it appears that the thunks for the warnings hang onto a lot of memory from compiling the module itself.
The goal of this change is to get memory usage for full builds back in line with partial builds.
This is just a WIP patch for now, showing a sort of minimal solution to the problem that isn't overly strict. I'd appreciate suggestions on how to accomplish it better.
rnf x = rnf (show x)is obviously a badNFData SimpleErrorMessageinstance, but it was going to be very tedious to deriveNFDatafor every transitive data type. Perhaps it's better to make a dedicated function that computes just enough strictness for the errors? Or adding more strictness when computing the warnings in the first place, and to the data types themselves.For part of our work codebase (which generates a lot of warnings), the difference is quite dramatic: 4GB of memory usage down to under 1GB, as measured by the GHC RTS profiling:
The difference is less dramatic, but still visible, when running on just our public dependencies – I suspect this is since they produce much fewer warnings.
I've been profiling with this command, though
-hcis not a particularly helpful heap profile to track in retrospect:@mjrussell do you have a way to compile PureScript and test this out on your codebase?
Checklist: