Skip to content

Move insertSorted from server to core, use for diagnostic collections#21401

Merged
weswigham merged 5 commits intomicrosoft:masterfrom
weswigham:insert-sorted
Feb 8, 2018
Merged

Move insertSorted from server to core, use for diagnostic collections#21401
weswigham merged 5 commits intomicrosoft:masterfrom
weswigham:insert-sorted

Conversation

@weswigham
Copy link
Copy Markdown
Member

This causes us to spend considerably less time sorting diagnostics in projects with many files where we alternate between getting the file's diagnostics and adding diagnostics (~10s, or 8% on a very large project). Additionally, I've upped the time rwc tests have to verify js output, because our newest RWC test takes 5s (!!!) just to textually compare its js emit, and mocha's default 2s timeout was causing it to timeout when I ran it via mocha directly (to profile it).

@weswigham weswigham requested review from a user, mhegazy and sandersn January 25, 2018 00:03
@ghost
Copy link
Copy Markdown

ghost commented Jan 25, 2018

It seemed surprising that it would be faster to insert sorted than to just push and sort at the end, then I noticed:

we alternate between getting the file's diagnostics and adding diagnostics

That would explain it, if we were adding some diagnostics, doing a complete sort, then adding one more and doing another complete sort. Can you explain why we do this? When do we add diagnostics after having gotten the sorted diagnostics?

@weswigham
Copy link
Copy Markdown
Member Author

Minimally, we always get diagnostics after each file, and our largest rwc test now has 2100 files.

@ghost
Copy link
Copy Markdown

ghost commented Jan 25, 2018

OK, but why do we query the complete list of diagnostics after each individual file? It looks like we have per-file lists available.

@weswigham
Copy link
Copy Markdown
Member Author

We query the global diagnostics before and after each file to see if they've mutated (to see if the global diagnostic is caused by the current file? I'm not fully sure on the reason) doing so caused us to sort and deduplicate all diagnostics, all the time.

@sheetalkamat
Copy link
Copy Markdown
Member

@weswigham why not just sort and deduplicate diagnostics for the given fileName or global ones as they are asked.. That would be better than doing it on all files all the time and yet we dont need to to insert and keep diagnostics sorted

@weswigham
Copy link
Copy Markdown
Member Author

weswigham commented Feb 1, 2018

@sheetalkamat quicksort (as is usually done by the JS engine) has an average time complexity of n log n, but a worst case complexity of n^2 (though it may use another algo if it heuristically determines it's in a worst-case scenario). Inserting sorted is a log n operation, done n times, so is just always n log n time complexity. We always ask for sorted diagnostics so this is minimally as good as the previous delayed-sorting method from a computational standpoint, since we will always query for diagnostics at least once, and this significantly reduces the constant factor if we query for diagnostics (between mutations) more than once. Plus, it makes the implementation way cleaner to just keep a sorted list at all times (for both), since then we don't need to keep dirty bits in the state, or conditional branches on get.

@mhegazy
Copy link
Copy Markdown
Contributor

mhegazy commented Feb 2, 2018

@weswigham i am not sure i understand the original issue, why were we alternating between getting errors and adding them? is that for global? or for file diagnostics?

@weswigham
Copy link
Copy Markdown
Member Author

getSemanticDiagnostics, as used by emitWorker or getPreEmitDiagnostics, takes a source file (or not). When it is called with an individual source file (as is always true in emitWorker and sometimes true for getPreEmitDiagnostics), it called into checker's getDiagnostics function, which in turn calls getDiagnosticsWorker, which, when called with a sourceFile, gets the global diagnostics, then checks the file, then gets the file diagnostics and the global diagnostics again (so it can return only global diagnostics added during the check of that file). All three of those calls were capable of triggering diagnostic sorting (though only the second is likely to). Previously when any of the methods on diagnostic collection were called to get any subset of the diagnostics, it would sort all diagnostics (unless it had already been sorted and hadn't been modified in any way) - including those from files that weren't currently being looked at. Additionally, whenever you queried for all diagnostics, it would sort and dedupe everything... then combine them all into an all diagnostics array... then sort and dedupe them again. (The first sort and dedupe being extraneous.)

While a sort-once strategy is likely fine for per-file diagnostics (provided it's actually only sorted once ever), both the list of all diagnostics and the global diagnostics list need to be constantly resorted, since additions (can) occur in every file. Rather than holding a dirty flag for every file and sorting file diagnostics once, but also inserting sorted for the global and all diagnostics lists, it's much simpler, implementation-wise, to use insertSorted for all of the lists, and is just as effective to use for the per-file lists provided we always actually query for them (and we always query for diagnostics if the checker is configured to generate them), and is way less likely to become a unintended performance pitfall in the future (for example, if there's some future feature where you only check part of a file at a time).

@mhegazy
Copy link
Copy Markdown
Contributor

mhegazy commented Feb 7, 2018

I like your suggestion of creating allDiagnostics list on demand by combining all file diagnostics in the right order instead of having to keep a second sorted list. let's do that as well.

@weswigham
Copy link
Copy Markdown
Member Author

@mhegazy done.

Comment thread src/compiler/utilities.ts Outdated
fileDiagnostics.forEach((diagnostics, key) => {
fileDiagnostics.set(key, sortAndDeduplicateDiagnostics(diagnostics));
});
return [...nonFileDiagnostics, ...flatMap(filesWithDiagnostics, f => fileDiagnostics.get(f))];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dont like usage of flatMap here.. We are unnecessarily constructing one big array just to be merged.. better would be to use push with filesWithDiagnostics.forEach

Copy link
Copy Markdown
Member Author

@weswigham weswigham Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think

return [...nonFileDiagnostics, ...flatMap(filesWithDiagnostics, f => fileDiagnostics.get(f))];

is more idiomatic than

            const fileDiags = flatMap(filesWithDiagnostics, f => fileDiagnostics.get(f));
            if (!nonFileDiagnostics.length) {
                return fileDiags;
            }
            fileDiags.unshift(...nonFileDiagnostics);
            return fileDiags;

(and would also avoid making any intermediate arrays if we actually targeted es6) but I'll change it.

@weswigham weswigham merged commit 871e71d into microsoft:master Feb 8, 2018
@weswigham weswigham deleted the insert-sorted branch February 8, 2018 01:01
@microsoft microsoft locked and limited conversation to collaborators Jul 3, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants