Break out of speculative parsing on bad parameter initializer#19158
Conversation
|
Are those results for trials of the same input or different inputs? |
|
In order, those are the numbers for:
I made a new commit that removed uses of
|
|
@sandersn Any comments? |
|
So it sounds like performance is 4% better on node and 2% worse on tsc. This is likely fine, given how few people use anything besides node. @rbuckton would you agree? Exceptions-for-control-flow seems like something that would get slower in future updates of V8 rather than faster (if it changes at all of course). |
| let SourceFileConstructor: new (kind: SyntaxKind, pos: number, end: number) => Node; | ||
|
|
||
| let inSpeculation = false; | ||
| const GIVE_UP_SPECULATION = {}; |
There was a problem hiding this comment.
extremely minor: STOP_SPECULATION sounds better to me
|
I'm generally not in favor of using exceptions for control flow and would prefer to keep them reserved for unexpected errors or asserts. @mhegazy can weigh in on his thoughts, but I'd prefer a solution that doesn't depend on exceptions for this. |
| return result; | ||
| } | ||
|
|
||
| function startSpeculation(): SpeculationReset { |
There was a problem hiding this comment.
Was there a reason to use this approach over the previous approach? The previous approach guaranteed that saving/resetting state was properly balanced and didn't depend on an external party to remember to reset the state where necessary. Also, this results in object allocations every time we speculate, which could get expensive given how often we do speculation. I'm curious what the performance impact of this approach is over the previous approach.
One approach to avoid excessive allocations would be to reuse the same SpeculationReset object for shallow speculation, or maintain a pool of SpeculationReset objects to use for this purpose.
Another approach would be to have an array to use as a stack for each state variable and push/pop onto the stack, i.e.:
const posStack: number[] = [];
const startPosStack: number[] = [];
// ...
function startSpeculation() {
posStack.push(pos);
startPosStack.push(startPos);
// ...
}
/** @param accept If true, accept the current state; otherwise, reset to the prior state */
function endSpeculation(accept: boolean) {
const savePos = posStack.pop();
const saveStartPos = startPosStack.pop();
// ...
if (!accept) {
pos = savePos;
startPos = saveStartPos;
// ...
}
}There was a problem hiding this comment.
I've reverted these changes. You can try that approach in a separate PR if you like.
| signature.parameters = parseParameterList(flags); | ||
| const parameters = parseParameterList(flags, inSpeculation); | ||
| if (isFail(parameters)) { | ||
| return Fail; |
There was a problem hiding this comment.
Does this need to return Fail? It could just return boolean.
| fillSignature(SyntaxKind.ColonToken, isAsync | (allowAmbiguity ? SignatureFlags.None : SignatureFlags.RequireCompleteParameterList), node); | ||
|
|
||
| const sigFail = fillSignature(SyntaxKind.ColonToken, isAsync | (inSpeculation ? SignatureFlags.RequireCompleteParameterList : SignatureFlags.None), node, inSpeculation); | ||
| if (isFail(sigFail)) { |
There was a problem hiding this comment.
No need for the local, inline sigFail.
| } | ||
|
|
||
| function parseParameter(): ParameterDeclaration; | ||
| function parseParameter(inSpeculation?: boolean): ParameterDeclaration | Fail; |
There was a problem hiding this comment.
inSpeculation shouldn't be optional in this overload.
| list.push(parseListElement(kind, parseElement)); | ||
| const elem = parseListElement(kind, parseElement, inSpeculation); | ||
| if (isFail(elem)) { | ||
| return Fail; |
There was a problem hiding this comment.
Returning Fail here makes parseDelimitedList and anything that calls it polymorphic, since Fail isn't an array. Consider adding something like a ListFail sentinel value that has the properties of a NodeArray to reduce polymorphism.
| const result = isLookAhead | ||
| ? scanner.lookAhead(callback) | ||
| : scanner.tryScan(callback); | ||
| const reset = scanner.startSpeculation(); |
There was a problem hiding this comment.
I'm concerned about the performance cost of this approach. See my comments in scanner.ts.
|
|
||
| interface Fail { __FAIL: void; } | ||
| /** Only value of type Fail. */ | ||
| const Fail: Fail = { __FAIL: undefined }; |
There was a problem hiding this comment.
Fail may introduce added polymorphism because it doesn't have the same hidden class as a Node. I would recommend the Fail sentinel be a Node instance (possibly using SyntaxKind.Unknown), though you would have to wait to allocate Fail until parse time to ensure the correct objectAllocator instance from services has already been loaded (if present). That would ensure Fail has the same shape and ordering of properties that any other Node instance would start with.
| interface Fail { __FAIL: void; } | ||
| /** Only value of type Fail. */ | ||
| const Fail: Fail = { __FAIL: undefined }; | ||
| function isFail(x: {} | null | undefined | void): x is Fail { |
There was a problem hiding this comment.
How is {} | null | undefined | void really any different from any in this case? This seems unnecessarily verbose.
| function parseDelimitedList<T extends Node>(kind: ParsingContext, parseElement: () => T, considerSemicolonAsDelimiter?: boolean): NodeArray<T> { | ||
| function parseDelimitedList<T extends Node>(kind: ParsingContext, parseElement: () => T, considerSemicolonAsDelimiter?: boolean, inSpeculation?: false): NodeArray<T>; | ||
| function parseDelimitedList<T extends Node>(kind: ParsingContext, parseElement: (inSpeculation: boolean) => T | Fail, considerSemicolonAsDelimiter?: boolean, inSpeculation?: boolean): NodeArray<T> | Fail; | ||
| function parseDelimitedList<T extends Node>(kind: ParsingContext, parseElement: (inSpeculation: boolean) => T | Fail, considerSemicolonAsDelimiter?: boolean, inSpeculation?: boolean): NodeArray<T> | Fail { |
There was a problem hiding this comment.
Do we need to thread inSpeculation here? The function passed as the parseElement parameter could easily be a wrapper to parse the element with inSpeculation true, similar to parseParameterNoSpeculation below does for the inverse.
| } | ||
|
|
||
| function parseParameter(requireEqualsToken?: boolean): ParameterDeclaration { | ||
| function parseParameterNoSpeculation(): ParameterDeclaration { |
There was a problem hiding this comment.
Is this wrapper necessary, since the default behavior is the same as parseParameter()?
There was a problem hiding this comment.
Removed parseParameter, now should always use this or parseParameterInSpeculation.
|
@rbuckton I think I've responded to all of your review now. |
| return x === Fail; | ||
| interface Fail extends Node { kind: SyntaxKind.Unknown; } | ||
| function fail(): Fail { | ||
| return createNode(SyntaxKind.Unknown) as Fail; |
There was a problem hiding this comment.
I still think that fail and failList should be singleton values local to the Parser namespace. You can easily initialize them for the first time in initializeState so that we don't create excess objects when speculative parsing fails.
rbuckton
left a comment
There was a problem hiding this comment.
A few more recommendations.
| let Fail: Fail; | ||
| let FailList: FailList; | ||
| function isFail(x: Node | undefined): x is Fail { | ||
| return !!x && x.kind === SyntaxKind.Unknown; |
There was a problem hiding this comment.
return Fail !== undefined && x === Fail; doesn't require the property lookup and verifies that Fail has a value.
| return !!x && x.kind === SyntaxKind.Unknown; | ||
| } | ||
| function isFailList(x: NodeArray<Node> | undefined): x is FailList { | ||
| return !!x && x.pos === -1; |
There was a problem hiding this comment.
return FailList !== undefined && x === FailList; doesn't require the property lookup and verifies that FailList has a value.
| node.name = parseIdentifierOrPattern(); | ||
| const name = parseIdentifierOrPattern(inSpeculation); | ||
| if (isFail(name)) { | ||
| return name; |
There was a problem hiding this comment.
For clarity, I would recommend you just return Fail.
| node.initializer = parseInitializer(/*inParameter*/ true, requireEqualsToken); | ||
| const initializer = parseInitializer(/*inParameter*/ true, inSpeculation); | ||
| if (isFail(initializer)) { | ||
| return initializer; |
There was a problem hiding this comment.
For clarity, I would recommend you just return Fail.
| node.initializer = parseInitializer(/*inParameter*/ false); | ||
| const name = parseIdentifierOrPattern(inSpeculation); | ||
| if (isFail(name)) { | ||
| return name; |
There was a problem hiding this comment.
For clarity, I would recommend you just return Fail.
| node.name = name; | ||
| const init = parseInitializer(/*inParameter*/ false, inSpeculation); | ||
| if (isFail(init)) { | ||
| return init; |
There was a problem hiding this comment.
For clarity, I would recommend you just return Fail.
| node.name = parseIdentifierOrPattern(); | ||
| const name = parseIdentifierOrPattern(inSpeculation); | ||
| if (isFail(name)) { | ||
| return name; |
There was a problem hiding this comment.
For clarity, I would recommend you just return Fail.
| } | ||
| const init = parseInitializer(/*inParameter*/ false, inSpeculation); | ||
| if (isFail(init)) { | ||
| return init; |
There was a problem hiding this comment.
For clarity, I would recommend you just return Fail.
rbuckton
left a comment
There was a problem hiding this comment.
I have a few more minor comments but they are not critical. This looks good.
| // Invokes the provided callback then unconditionally restores the scanner to the state it | ||
| // was in immediately prior to invoking the callback. The result of invoking the callback | ||
| // is returned from this function. | ||
| // Present for backwards compatibility only. |
| } | ||
|
|
||
| function parseParameter(requireEqualsToken?: boolean): ParameterDeclaration { | ||
| function parseParameterInSpeculation(): ParameterDeclaration | Fail { |
There was a problem hiding this comment.
Not important, but I'd be tempted to call this one tryParseParameter and the other one parseParameter. We generally use tryParse to indicate speculation vs. parse to indicate definitive parsing, though that's not always the case.
| // DECLARATIONS | ||
|
|
||
| function parseArrayBindingElement(): ArrayBindingElement { | ||
| function parseArrayBindingElementInSpeculation(): ArrayBindingElement | Fail { |
There was a problem hiding this comment.
As above, I'd recommend tryParseArrayBindingElement and parseArrayBindingElement.
| } | ||
|
|
||
| function parseObjectBindingElement(): BindingElement { | ||
| function parseObjectBindingElementInSpeculation(): BindingElement | Fail { |
There was a problem hiding this comment.
As above, I'd recommend tryParseObjectBindingElement and parseObjectBindingElement.
Fixes #19134
Sequel to #18417
This will break out of speculative parsing immediately if we see a bad initializer, rather than generating the entire AST and then iterating over it to look for things which failed parsing.
This seems to result in an average 1% decrease in parse time, although results varied greatly when I tested. (-3.61%, +5.79%, -2.08%, +1.63%, -6.47%, -1.23%).