Conversation
9a37061 to
fac6641
Compare
kaeluka
left a comment
There was a problem hiding this comment.
✅ Approve
This is the kind of PR where I'm extra glad we have CI checks: big surface, but not much new logic ;)
I'll give this a formal approval once the first PR is merged and this has been rebased; but I've done the review now and it all looks good to me. I've left one question for my own understanding.
`isOtherModeledArgument` and `isArgumentToBuiltinFunction` contained the old logic for selecting negative endpoints for training. These can now be deleted, and replaced by a single base class that collects all EndpointCharacteristics that are currently used to indicate negative training samples: `OtherModeledArgumentCharacteristic`. This in turn lets us delete code from `StandardEndpointFilters` that effectively said that endpoints that are high-confidence non-sinks shouldn't be scored at inference time, either.
All remaining functionality in `CoreKnowledge` is only being used in `EndpointCharacteristics`, so it can be moved there as a small set of helper predicates.
All remaining functionality in `StandardEndpointFilters` is only being used in `EndpointCharacteristics`, so it can be moved there as a small set of helper predicates.
fac6641 to
4580b55
Compare
I rebased this on top of the latest version of |
|
@kaeluka to merge this into main I had to remove a merge conflict due to a tiny comment change in |
@adityasharad Thank you! ❤️ @kaeluka Never mind! 😄 |
This PR is the code cleanup made possible by the previous few PRs. As such, it has a lot of lines of change (mostly code deletion 🥳), but the changes are conceptually small. I tried to make the commits as clear as possible, tackling one piece of complexity reduction per commit, and explaining it in the commit comment. Commit-by-commit review warmly recommended 😄
Note that this PR is based on the branch
tiferet/endpoint-filters, because that branch has not yet been merged and I don't want to see the commits from #11281 in this PR as well.main code deletions / simplifications
isOtherModeledArgumentandisArgumentToBuiltinFunctioncontained the old logic for selecting negative endpoints for training. These can now be deleted, and replaced by a single base class that collects all EndpointCharacteristics that are currently used to indicate negative training samples:OtherModeledArgumentCharacteristic. This in turn lets us delete code fromStandardEndpointFiltersthat effectively said that endpoints that are high-confidence non-sinks shouldn't be scored at inference time, either.FilteringReasonis no longer being used and can be deleted.CoreKnowledgeandStandardEndpointFilters: All remaining functionality inCoreKnowledgeandStandardEndpointFiltersis only being used inEndpointCharacteristics, so it can be moved there as a small set of helper predicates.Timing checks
endpoint_large_scale/ExtractEndpointDataTrainingremains like it was after the last PR that affected timing: About 5s.Closes https://github.com/github/ml-ql-adaptive-threat-modeling/issues/2110