Move insertSorted from server to core, use for diagnostic collections by weswigham · Pull Request #21401 · microsoft/TypeScript

weswigham · 2018-01-25T00:03:29Z

This causes us to spend considerably less time sorting diagnostics in projects with many files, where we alternate between getting the file's diagnostics and adding diagnostics (~10s, or 8%, on a very large project). Additionally, I've increased the time RWC tests have to verify JS output, because our newest RWC test takes 5s (!!!) just to textually compare its JS emit, and mocha's default 2s timeout was causing it to time out when I ran it via mocha directly (to profile it).
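The core idea of insertSorted can be sketched as follows. This is a minimal TypeScript illustration, not the compiler's code: the `Comparer` type, the `binarySearch` helper, and the choice to drop an element that compares equal to an existing one (so no separate dedupe pass is needed) are assumptions of this sketch.

```typescript
// A comparer returns a negative number if a < b, zero if equal, positive if a > b.
type Comparer<T> = (a: T, b: T) => number;

// Binary search in a sorted array. Returns the index of a matching element,
// or the bitwise complement (~) of the index where `value` would be inserted.
function binarySearch<T>(array: readonly T[], value: T, compare: Comparer<T>): number {
    let low = 0;
    let high = array.length - 1;
    while (low <= high) {
        const middle = low + ((high - low) >> 1);
        const result = compare(array[middle], value);
        if (result === 0) {
            return middle;
        }
        if (result < 0) {
            low = middle + 1;
        } else {
            high = middle - 1;
        }
    }
    return ~low;
}

// Insert `value` into an already-sorted array so that it stays sorted.
// Assumption: an equal element is treated as a duplicate and not inserted.
function insertSorted<T>(array: T[], value: T, compare: Comparer<T>): void {
    const index = binarySearch(array, value, compare);
    if (index < 0) {
        // O(log n) search above; splice then shifts the tail of the array.
        array.splice(~index, 0, value);
    }
}

// Usage: the array is sorted after every insertion, so reads never sort.
const numbers: number[] = [];
for (const n of [5, 1, 4, 1, 3]) {
    insertSorted(numbers, n, (a, b) => a - b);
}
console.log(numbers); // numbers is now [1, 3, 4, 5] (the second 1 was dropped)
```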

ghost · 2018-01-25T15:41:02Z

It seemed surprising that it would be faster to insert sorted than to just push and sort at the end, until I noticed:

we alternate between getting the file's diagnostics and adding diagnostics

That would explain it, if we were adding some diagnostics, doing a complete sort, then adding one more and doing another complete sort. Can you explain why we do this? When do we add diagnostics after having gotten the sorted diagnostics?
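To illustrate the pattern being described, here is a hypothetical "push now, sort lazily on read" list (the `LazilySortedList` class is invented for this sketch). Each read after a mutation pays for a full sort of every item, so alternating one addition with one read re-sorts the whole list every time.

```typescript
// Hypothetical model of "push now, sort lazily on read" with a dirty flag.
class LazilySortedList<T> {
    private items: T[] = [];
    private dirty = false;
    private readonly compare: (a: T, b: T) => number;
    sortCount = 0; // number of full sorts performed, for illustration

    constructor(compare: (a: T, b: T) => number) {
        this.compare = compare;
    }

    add(item: T): void {
        this.items.push(item);
        this.dirty = true;
    }

    get(): readonly T[] {
        if (this.dirty) {
            this.items.sort(this.compare); // sorts every item, not just the new one
            this.sortCount++;
            this.dirty = false;
        }
        return this.items;
    }
}

// Alternate one addition with one read, as when diagnostics are read
// after checking each file: every read triggers another full sort.
const list = new LazilySortedList<number>((a, b) => a - b);
for (let i = 5; i > 0; i--) {
    list.add(i);
    list.get();
}
console.log(list.sortCount); // 5
console.log(list.get()); // no new sort here, since nothing changed
```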

weswigham · 2018-01-25T20:30:27Z

At minimum, we always get diagnostics after each file, and our largest RWC test now has 2100 files.

ghost · 2018-01-25T20:43:59Z

OK, but why do we query the complete list of diagnostics after each individual file? It looks like we have per-file lists available.

weswigham · 2018-01-25T20:58:01Z

We query the global diagnostics before and after checking each file to see whether they've changed (to see if a global diagnostic was caused by the current file? I'm not fully sure of the reason). Doing so caused us to sort and deduplicate all diagnostics, all the time.

sheetalkamat · 2018-01-25T23:16:53Z

@weswigham Why not just sort and deduplicate the diagnostics for the given fileName (or the global ones) when they are requested? That would be better than doing it for all files all the time, and we wouldn't need to insert and keep diagnostics sorted.
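The per-file alternative suggested here might look roughly like the following. It is a hypothetical sketch (`PerFileLazyDiagnostics` is an invented name, and diagnostics are modeled as plain message strings): each file keeps its own list and dirty flag, and a request sorts and dedupes only the list that was asked for.

```typescript
// Hypothetical per-file lazy sorting; diagnostics are modeled as message strings.
interface FileEntry {
    messages: string[];
    dirty: boolean; // true when messages were added since the last sort
}

class PerFileLazyDiagnostics {
    private readonly byFile = new Map<string, FileEntry>();
    readonly sortedFiles: string[] = []; // which files were sorted, for illustration

    add(fileName: string, message: string): void {
        let entry = this.byFile.get(fileName);
        if (!entry) {
            entry = { messages: [], dirty: false };
            this.byFile.set(fileName, entry);
        }
        entry.messages.push(message);
        entry.dirty = true;
    }

    // Sort and dedupe only the requested file's list, and only if it changed.
    get(fileName: string): readonly string[] {
        const entry = this.byFile.get(fileName);
        if (!entry) {
            return [];
        }
        if (entry.dirty) {
            entry.messages.sort();
            // After sorting, duplicates are adjacent, so one pass removes them.
            entry.messages = entry.messages.filter((m, i, all) => i === 0 || all[i - 1] !== m);
            entry.dirty = false;
            this.sortedFiles.push(fileName);
        }
        return entry.messages;
    }
}

const diagnostics = new PerFileLazyDiagnostics();
diagnostics.add("b.ts", "error B");
diagnostics.add("a.ts", "error A2");
diagnostics.add("a.ts", "error A1");
diagnostics.add("a.ts", "error A1");
console.log(diagnostics.get("a.ts")); // "error A1", "error A2"
console.log(diagnostics.sortedFiles); // only "a.ts": b.ts has not been sorted
```

The trade-off is that every read still branches on a dirty flag, which is the bookkeeping the insert-sorted approach avoids.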

weswigham · 2018-02-01T20:46:38Z

@sheetalkamat quicksort (as JS engines usually implement sort) has an average time complexity of n log n, but a worst-case complexity of n^2 (though the engine may switch to another algorithm if it heuristically detects a worst-case input). Inserting sorted costs a log n binary search per insertion, done n times, so the comparison count is always n log n (each insertion also shifts the tail of the array, which is linear in the worst case but a cheap contiguous copy). We always ask for sorted diagnostics, and we always query for them at least once, so in terms of comparisons this is at least as good as the previous delayed-sorting method, and it significantly reduces the constant factor whenever we query for diagnostics more than once between mutations. Plus, it makes the implementation way cleaner to just keep a sorted list at all times (for both), since then we don't need to keep dirty bits in the state or conditional branches on get.
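Roughly, the comparison and move counts work out as follows (a back-of-the-envelope sketch, not a measurement). Assume a binary search over $i$ sorted elements needs at most $\lceil \log_2(i+1) \rceil$ comparisons, an insertion at the front shifts all $i$ existing elements, and a full re-sort of $n_q$ items at query $q$ costs on the order of $n_q \log_2 n_q$ comparisons, with $Q$ queries in total and $n$ diagnostics at the end:

$$
\begin{aligned}
C_{\text{insert}}(n) &\le \sum_{i=0}^{n-1} \left\lceil \log_2 (i+1) \right\rceil \;\le\; n \left\lceil \log_2 n \right\rceil = O(n \log n) \\
M_{\text{insert}}(n) &\le \sum_{i=0}^{n-1} i = \frac{n(n-1)}{2} \\
C_{\text{resort}}(Q) &\approx \sum_{q=1}^{Q} n_q \log_2 n_q \;\le\; Q \, n \log_2 n
\end{aligned}
$$

The middle line (element moves) is the part the "log n per insertion" shorthand leaves out; in practice it is a contiguous memory move with a small constant. The last line grows with the number of queries, which is at least the number of files checked when diagnostics are read after each file.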

mhegazy · 2018-02-02T19:07:28Z

@weswigham I'm not sure I understand the original issue. Why were we alternating between getting errors and adding them? Is that for global or for file diagnostics?

weswigham · 2018-02-02T21:14:13Z

getSemanticDiagnostics, as used by emitWorker or getPreEmitDiagnostics, takes an optional source file. When it is called with an individual source file (always the case in emitWorker and sometimes in getPreEmitDiagnostics), it calls into the checker's getDiagnostics function, which in turn calls getDiagnosticsWorker. When given a sourceFile, getDiagnosticsWorker gets the global diagnostics, checks the file, then gets the file diagnostics and the global diagnostics again (so it can return only the global diagnostics added while checking that file). All three of those calls could trigger diagnostic sorting (though only the second was likely to). Previously, whenever any method on the diagnostic collection was called to get any subset of the diagnostics, it would sort all diagnostics (unless they had already been sorted and not modified since), including those from files that weren't currently being looked at. Additionally, whenever you queried for all diagnostics, it would sort and dedupe everything, then combine it all into a single all-diagnostics array, then sort and dedupe that again. (The first sort and dedupe was redundant.)
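The before/after snapshot in getDiagnosticsWorker can be illustrated with a small sketch. Everything here is hypothetical (`addedSince`, `checkFile`, `globalDiagnosticsAddedBy`, and string-valued diagnostics are stand-ins, not checker APIs); it only shows that when the global list is kept sorted and duplicate-free, finding the diagnostics added during one file's check is a single linear merge over the two snapshots.

```typescript
type Comparer<T> = (a: T, b: T) => number;

// Returns the elements of `after` that do not appear in `before`. Both arrays
// must be sorted by `compare` and free of duplicates; one merge-style pass
// makes this O(before.length + after.length).
function addedSince<T>(before: readonly T[], after: readonly T[], compare: Comparer<T>): T[] {
    const added: T[] = [];
    let i = 0;
    for (const item of after) {
        // Advance past snapshot entries that sort before the current item.
        while (i < before.length && compare(before[i], item) < 0) {
            i++;
        }
        if (i < before.length && compare(before[i], item) === 0) {
            i++; // present in both snapshots: not new
        } else {
            added.push(item);
        }
    }
    return added;
}

const compareStrings: Comparer<string> = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Hypothetical checker state: a global list kept sorted and deduplicated.
const globalDiagnostics: string[] = ["Cannot find global type 'Array'."];

// Hypothetical stand-in for checking a file; a.ts triggers a new global error.
function checkFile(fileName: string): void {
    const message = "Cannot find global type 'Promise'.";
    if (fileName === "a.ts" && globalDiagnostics.indexOf(message) === -1) {
        globalDiagnostics.push(message);
        globalDiagnostics.sort(compareStrings);
    }
}

// Snapshot, check, then keep only what the check added.
function globalDiagnosticsAddedBy(fileName: string): string[] {
    const before = globalDiagnostics.slice(); // copy: the live list is mutated in place
    checkFile(fileName);
    return addedSince(before, globalDiagnostics, compareStrings);
}

console.log(globalDiagnosticsAddedBy("a.ts")); // the 'Promise' diagnostic only
console.log(globalDiagnosticsAddedBy("b.ts")); // nothing new
```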

While a sort-once strategy is likely fine for per-file diagnostics (provided each list really is sorted only once), both the list of all diagnostics and the global diagnostics list would need to be re-sorted constantly, since additions can occur while checking any file. Rather than holding a dirty flag for every file and sorting file diagnostics once, while also inserting sorted into the global and all-diagnostics lists, it's much simpler, implementation-wise, to use insertSorted for all of the lists. It is just as effective for the per-file lists, provided we always actually query them (and we always query for diagnostics if the checker is configured to generate them), and it is far less likely to become an unintended performance pitfall in the future (for example, if some future feature checks only part of a file at a time).
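A collection that uses insertSorted for every list could look roughly like this. It is a sketch under stated assumptions: the `Diagnostic` shape, `compareDiagnostics`, and the `DiagnosticCollection` members shown here are simplified and hypothetical, not the compiler's definitions. The point is that reads never sort, so there are no dirty flags and no conditional branches on get.

```typescript
// Hypothetical, simplified diagnostic shape for this sketch.
interface Diagnostic {
    file?: string; // undefined for global (non-file) diagnostics
    start: number;
    message: string;
}

type Comparer<T> = (a: T, b: T) => number;

function compareStrings(a: string, b: string): number {
    return a < b ? -1 : a > b ? 1 : 0;
}

// Orders by file (global diagnostics first), then position, then message.
function compareDiagnostics(a: Diagnostic, b: Diagnostic): number {
    return compareStrings(a.file ?? "", b.file ?? "")
        || a.start - b.start
        || compareStrings(a.message, b.message);
}

// Binary search for the insertion point; skip values already present.
function insertSorted<T>(array: T[], value: T, compare: Comparer<T>): void {
    let low = 0;
    let high = array.length;
    while (low < high) {
        const middle = (low + high) >>> 1;
        const result = compare(array[middle], value);
        if (result === 0) {
            return; // duplicate: keep the list deduplicated
        }
        if (result < 0) {
            low = middle + 1;
        } else {
            high = middle;
        }
    }
    array.splice(low, 0, value);
}

class DiagnosticCollection {
    private readonly nonFileDiagnostics: Diagnostic[] = [];
    private readonly fileDiagnostics = new Map<string, Diagnostic[]>();
    private readonly filesWithDiagnostics: string[] = []; // also kept sorted

    add(diagnostic: Diagnostic): void {
        if (diagnostic.file === undefined) {
            insertSorted(this.nonFileDiagnostics, diagnostic, compareDiagnostics);
            return;
        }
        let list = this.fileDiagnostics.get(diagnostic.file);
        if (!list) {
            list = [];
            this.fileDiagnostics.set(diagnostic.file, list);
            insertSorted(this.filesWithDiagnostics, diagnostic.file, compareStrings);
        }
        insertSorted(list, diagnostic, compareDiagnostics);
    }

    // Reads never sort: every list is already in order.
    getGlobalDiagnostics(): readonly Diagnostic[] {
        return this.nonFileDiagnostics;
    }

    getDiagnostics(fileName: string): readonly Diagnostic[] {
        return this.fileDiagnostics.get(fileName) ?? [];
    }

    getFilesWithDiagnostics(): readonly string[] {
        return this.filesWithDiagnostics;
    }
}

// Usage
const collection = new DiagnosticCollection();
collection.add({ file: "b.ts", start: 10, message: "x" });
collection.add({ file: "a.ts", start: 5, message: "y" });
collection.add({ file: "a.ts", start: 1, message: "z" });
collection.add({ file: "a.ts", start: 1, message: "z" }); // equal to an existing entry: dropped
collection.add({ start: 0, message: "global" });
console.log(collection.getDiagnostics("a.ts").map(d => d.start)); // starts 1 and 5
```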

mhegazy · 2018-02-07T23:29:11Z

I like your suggestion of creating the allDiagnostics list on demand by combining all file diagnostics in the right order instead of keeping a second sorted list. Let's do that as well.
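Building the full list on demand works because of how diagnostics are ordered. In this hypothetical sketch, the comparator orders by file name first, and file-less (global) diagnostics sort before every file; under that assumption, concatenating the global list with each file's already-sorted list, in sorted file-name order, yields a fully sorted result with no extra sort. The check at the end compares the concatenation against a from-scratch sort.

```typescript
// Hypothetical, simplified diagnostic shape for this sketch.
interface Diagnostic {
    file?: string; // undefined for global (non-file) diagnostics
    start: number;
    message: string;
}

function compareStrings(a: string, b: string): number {
    return a < b ? -1 : a > b ? 1 : 0;
}

// File first ("" for global diagnostics sorts before any file name),
// then position, then message.
function compareDiagnostics(a: Diagnostic, b: Diagnostic): number {
    return compareStrings(a.file ?? "", b.file ?? "")
        || a.start - b.start
        || compareStrings(a.message, b.message);
}

// Already-sorted inputs, as a collection that uses insertSorted would hold them.
const nonFileDiagnostics: Diagnostic[] = [{ start: 0, message: "global" }];
const filesWithDiagnostics: string[] = ["a.ts", "b.ts"];
const fileDiagnostics = new Map<string, Diagnostic[]>([
    ["a.ts", [{ file: "a.ts", start: 1, message: "z" }, { file: "a.ts", start: 5, message: "y" }]],
    ["b.ts", [{ file: "b.ts", start: 0, message: "x" }]],
]);

// Build the full list on demand by concatenation; no global sort is needed.
const allDiagnostics = nonFileDiagnostics.concat(
    ...filesWithDiagnostics.map(f => fileDiagnostics.get(f) ?? []),
);

// Sanity check: the concatenation matches a from-scratch sort of the union.
const resorted = allDiagnostics.slice().sort(compareDiagnostics);
console.log(allDiagnostics.every((d, i) => d === resorted[i])); // true
```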


weswigham · 2018-02-07T23:51:50Z

@mhegazy done.

sheetalkamat · 2018-02-07T23:55:17Z

-            fileDiagnostics.forEach((diagnostics, key) => {
-                fileDiagnostics.set(key, sortAndDeduplicateDiagnostics(diagnostics));
-            });
+            return [...nonFileDiagnostics, ...flatMap(filesWithDiagnostics, f => fileDiagnostics.get(f))];


I don't like the use of flatMap here. We're unnecessarily constructing one big array just to merge it into another; it would be better to push into the result with filesWithDiagnostics.forEach.

weswigham · 2018-02-08

I think

return [...nonFileDiagnostics, ...flatMap(filesWithDiagnostics, f => fileDiagnostics.get(f))];

is more idiomatic than

const fileDiags = flatMap(filesWithDiagnostics, f => fileDiagnostics.get(f));
if (!nonFileDiagnostics.length) {
    return fileDiags;
}
fileDiags.unshift(...nonFileDiagnostics);
return fileDiags;

(and would also avoid making any intermediate arrays if we actually targeted ES6), but I'll change it.
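The push-based alternative suggested in the review might look like this. It is a hypothetical sketch (the `collectAllDiagnostics` name and the `Diagnostic` shape are invented): a single output array grows in place, so no intermediate flattened array is built just to be copied again.

```typescript
// Hypothetical, simplified diagnostic shape for this sketch.
interface Diagnostic {
    file?: string;
    start: number;
    message: string;
}

// Append each list into one growing output array instead of building a
// flattened intermediate array and then copying it again.
function collectAllDiagnostics(
    nonFileDiagnostics: readonly Diagnostic[],
    filesWithDiagnostics: readonly string[],
    fileDiagnostics: ReadonlyMap<string, readonly Diagnostic[]>,
): Diagnostic[] {
    const result: Diagnostic[] = [];
    for (const diagnostic of nonFileDiagnostics) {
        result.push(diagnostic);
    }
    filesWithDiagnostics.forEach(fileName => {
        const list = fileDiagnostics.get(fileName);
        if (list) {
            for (const diagnostic of list) {
                result.push(diagnostic);
            }
        }
    });
    return result;
}

const combined = collectAllDiagnostics(
    [{ start: 0, message: "global" }],
    ["a.ts", "b.ts"],
    new Map<string, Diagnostic[]>([
        ["a.ts", [{ file: "a.ts", start: 1, message: "z" }]],
        ["b.ts", [{ file: "b.ts", start: 0, message: "x" }]],
    ]),
);
console.log(combined.map(d => d.message)); // "global", "z", "x"
```

A plain loop is used rather than `result.push(...list)`, since spreading a very large array into a call's arguments can exceed the engine's argument limit.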

weswigham added 3 commits January 24, 2018 15:21

Move insertSorted from server to core, use for diagnostic collections

0c79731

All keep the overall list sorted, too

0d6bc79

Increase timeout for js verification

22ff7c0

weswigham requested review from a user, mhegazy and sandersn January 25, 2018 00:03

sandersn approved these changes Jan 25, 2018


mhegazy approved these changes Feb 7, 2018


Use knowledge of how diagnostics are sorted to make all diagnostic list creation lazy and more efficient

0948a6b

sheetalkamat reviewed Feb 7, 2018


Staunchly avoid array allocation in favor of resizing an existing array

80a0650

weswigham merged commit 871e71d into microsoft:master Feb 8, 2018

weswigham deleted the insert-sorted branch February 8, 2018 01:01

microsoft locked and limited conversation to collaborators Jul 3, 2018

weswigham merged 5 commits into microsoft:master from weswigham:insert-sorted

weswigham commented Jan 25, 2018

Uh oh!

ghost commented Jan 25, 2018

Uh oh!

weswigham commented Jan 25, 2018

Uh oh!

ghost commented Jan 25, 2018

Uh oh!

weswigham commented Jan 25, 2018

Uh oh!

sheetalkamat commented Jan 25, 2018

Uh oh!

weswigham commented Feb 1, 2018 •

edited

Loading

Uh oh!

mhegazy commented Feb 2, 2018

Uh oh!

weswigham commented Feb 2, 2018

Uh oh!

mhegazy commented Feb 7, 2018

Uh oh!

weswigham commented Feb 7, 2018

Uh oh!

sheetalkamat Feb 7, 2018

Uh oh!

weswigham Feb 8, 2018 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

weswigham commented Jan 25, 2018

Uh oh!

ghost commented Jan 25, 2018

Uh oh!

weswigham commented Jan 25, 2018

Uh oh!

ghost commented Jan 25, 2018

Uh oh!

weswigham commented Jan 25, 2018

Uh oh!

sheetalkamat commented Jan 25, 2018

Uh oh!

weswigham commented Feb 1, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mhegazy commented Feb 2, 2018

Uh oh!

weswigham commented Feb 2, 2018

Uh oh!

mhegazy commented Feb 7, 2018

Uh oh!

weswigham commented Feb 7, 2018

Uh oh!

sheetalkamat Feb 7, 2018

Choose a reason for hiding this comment

Uh oh!

weswigham Feb 8, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

weswigham commented Feb 1, 2018 •

edited

Loading

weswigham Feb 8, 2018 •

edited

Loading