Move insertSorted from server to core, use for diagnostic collections#21401
Move insertSorted from server to core, use for diagnostic collections#21401weswigham merged 5 commits intomicrosoft:masterfrom
Conversation
|
It seemed surprising that it would be faster to insert sorted than to just push and sort at the end, then I noticed:
That would explain it, if we were adding some diagnostics, doing a complete sort, then adding one more and doing another complete sort. Can you explain why we do this? When do we add diagnostics after having gotten the sorted diagnostics? |
|
Minimally, we always get diagnostics after each file, and our largest rwc test now has 2100 files. |
|
OK, but why do we query the complete list of diagnostics after each individual file? It looks like we have per-file lists available. |
|
We query the global diagnostics before and after each file to see if they've mutated (to see if the global diagnostic is caused by the current file? I'm not fully sure on the reason) doing so caused us to sort and deduplicate all diagnostics, all the time. |
|
@weswigham why not just sort and deduplicate diagnostics for the given fileName or global ones as they are asked.. That would be better than doing it on all files all the time and yet we dont need to to insert and keep diagnostics sorted |
|
@sheetalkamat quicksort (as is usually done by the JS engine) has an average time complexity of |
|
@weswigham i am not sure i understand the original issue, why were we alternating between getting errors and adding them? is that for global? or for file diagnostics? |
|
While a sort-once strategy is likely fine for per-file diagnostics (provided it's actually only sorted once ever), both the list of all diagnostics and the global diagnostics list need to be constantly resorted, since additions (can) occur in every file. Rather than holding a dirty flag for every file and sorting file diagnostics once, but also inserting sorted for the global and all diagnostics lists, it's much simpler, implementation-wise, to use insertSorted for all of the lists, and is just as effective to use for the per-file lists provided we always actually query for them (and we always query for diagnostics if the checker is configured to generate them), and is way less likely to become a unintended performance pitfall in the future (for example, if there's some future feature where you only check part of a file at a time). |
|
I like your suggestion of creating allDiagnostics list on demand by combining all file diagnostics in the right order instead of having to keep a second sorted list. let's do that as well. |
…st creation lazy and more efficient
|
@mhegazy done. |
| fileDiagnostics.forEach((diagnostics, key) => { | ||
| fileDiagnostics.set(key, sortAndDeduplicateDiagnostics(diagnostics)); | ||
| }); | ||
| return [...nonFileDiagnostics, ...flatMap(filesWithDiagnostics, f => fileDiagnostics.get(f))]; |
There was a problem hiding this comment.
Dont like usage of flatMap here.. We are unnecessarily constructing one big array just to be merged.. better would be to use push with filesWithDiagnostics.forEach
There was a problem hiding this comment.
I think
return [...nonFileDiagnostics, ...flatMap(filesWithDiagnostics, f => fileDiagnostics.get(f))];is more idiomatic than
const fileDiags = flatMap(filesWithDiagnostics, f => fileDiagnostics.get(f));
if (!nonFileDiagnostics.length) {
return fileDiags;
}
fileDiags.unshift(...nonFileDiagnostics);
return fileDiags;(and would also avoid making any intermediate arrays if we actually targeted es6) but I'll change it.
This causes us to spend considerably less time sorting diagnostics in projects with many files where we alternate between getting the file's diagnostics and adding diagnostics (~10s, or 8% on a very large project). Additionally, I've upped the time rwc tests have to verify js output, because our newest RWC test takes 5s (!!!) just to textually compare its js emit, and mocha's default 2s timeout was causing it to timeout when I ran it via mocha directly (to profile it).