Add MaDs for Apache Avro by jacknojo · Pull Request #21754 · github/codeql

jacknojo · 2026-04-24T11:37:56Z

Add Models as Data for the Java version of Apache Avro. This is based on this subfolder/commit.

The models are entirely LLM-generated, and the output has undergone a voting procedure. They are not meant to cover the library fully. I am curious to hear any feedback on this. We also need to decide whether the provenance is OK or whether there is a better name.

This PR contains #21751 for the qlpack.yml wildcard, needed for a DCA run.

Copilot

Copilot wasn't able to review any files in this pull request.

jacknojo · 2026-04-24T11:47:55Z

I am not necessarily asking for a full review. This is machine-generated, and I have looked at each row individually. My CodeQL knowledge is still limited, so I may well have made mistakes.

I am open to discussing how to approach getting this reviewed and merged.

owen-mc

I think the output format needs to be updated to correctly deal with nested classes.

owen-mc · 2026-04-28T11:04:37Z

+      pack: codeql/java-all
+      extensible: sinkModel
+    data:
+      - ["org.apache.avro.data", "ObjectReader", True, "read", "(Object,Decoder)", "", "Argument[1]", "unsafe-deserialization", "ai-generated"]


I tried to look up the docs for this and found that it's actually a nested class. I believe we would need the below syntax to correctly specify it. (You can see this is used for java.io.ObjectInputFilter.Config, for example.)

Suggested change

-      - ["org.apache.avro.data", "ObjectReader", True, "read", "(Object,Decoder)", "", "Argument[1]", "unsafe-deserialization", "ai-generated"]
+      - ["org.apache.avro.data", "Json$ObjectReader", True, "read", "(Object,Decoder)", "", "Argument[1]", "unsafe-deserialization", "ai-generated"]
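For context on the `$` form in the suggestion above: the JVM binary name of a nested class joins the outer and inner class names with `$`, and that is what `Class.getName()` returns. The sketch below uses only JDK classes, including the `java.io.ObjectInputFilter.Config` example mentioned above. The `packageAndType` helper is hypothetical and only illustrates how a binary name splits into a package column and a type column; it is not part of CodeQL or of the model generator.

```java
import java.io.ObjectInputFilter;
import java.util.Map;

public class NestedClassNames {
    // Hypothetical helper: splits a JVM binary name at its last '.' into
    // { package, type }. This works for nested classes because the JVM
    // separates outer and inner class names with '$', not '.'.
    static String[] packageAndType(Class<?> c) {
        String binaryName = c.getName();
        int lastDot = binaryName.lastIndexOf('.');
        return new String[] {
            binaryName.substring(0, lastDot),
            binaryName.substring(lastDot + 1)
        };
    }

    public static void main(String[] args) {
        String[] config = packageAndType(ObjectInputFilter.Config.class);
        System.out.println(config[0] + " / " + config[1]);

        String[] entry = packageAndType(Map.Entry.class);
        System.out.println(entry[0] + " / " + entry[1]);
    }
}
```

Running `main` prints `java.io / ObjectInputFilter$Config` and `java.util / Map$Entry`, which is the same outer-`$`-inner shape as `Json$ObjectReader`.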

owen-mc · 2026-04-28T11:15:27Z

+      pack: codeql/java-all
+      extensible: sinkModel
+    data:
+      - ["org.apache.avro.data", "ObjectReader", True, "read", "(Object,Decoder)", "", "Argument[1]", "unsafe-deserialization", "ai-generated"]


Also, Gemini 3.1 Pro doesn't think this should be a sink.

Reasoning: Unsafe deserialization vulnerabilities occur when an application deserializes untrusted data into arbitrary Java objects (allowing an attacker to trigger malicious gadget chains). However, Json.ObjectReader is designed to strictly read Avro-encoded data matching the specific Json.SCHEMA internal to Apache Avro.

If you examine its implementation, it maps incoming primitive tokens directly to basic, safe Jackson JsonNode types (like LongNode, DoubleNode, TextNode, ArrayNode, and ObjectNode) and then unwraps them into basic Java structures (Map, List, String, Long, etc.). Since it does not perform polymorphic deserialization or resolve arbitrary class names from the data stream, it is structurally immune to unsafe class instantiation and does not act as a deserialization sink.
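The distinction in the reasoning above can be illustrated with a toy decoder. This is a sketch under stated assumptions, not Avro's or Jackson's code: the token format (`L:`, `D:`, `S:`, `[`, `]`) and the `ClosedTypeDecoder` class are made up for illustration. The point is that the decoder can only ever produce `Long`, `Double`, `String`, and `List` values, and it never turns a name from the input into a class.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ClosedTypeDecoder {
    // Decodes one value starting at `token`, pulling further tokens from
    // `rest` when the value is an array. Only a fixed set of result types is
    // possible; nothing from the input is used to look up or load a class.
    static Object decode(String token, Iterator<String> rest) {
        if (token.equals("[")) {
            List<Object> items = new ArrayList<>();
            for (String t = rest.next(); !t.equals("]"); t = rest.next()) {
                items.add(decode(t, rest));
            }
            return items;
        }
        if (token.startsWith("L:")) return Long.parseLong(token.substring(2));
        if (token.startsWith("D:")) return Double.parseDouble(token.substring(2));
        if (token.startsWith("S:")) return token.substring(2);
        // Anything else, including something that looks like a class name,
        // is rejected rather than resolved.
        throw new IllegalArgumentException("unsupported token: " + token);
    }

    static Object decodeAll(List<String> tokens) {
        Iterator<String> it = tokens.iterator();
        return decode(it.next(), it);
    }

    public static void main(String[] args) {
        System.out.println(decodeAll(List.of("[", "L:42", "S:hi", "[", "D:1.5", "]", "]")));
    }
}
```

A polymorphic deserializer would, by contrast, read a type name from the stream and instantiate it, for example via `Class.forName`. That step is absent here, and its absence is the property the reasoning above attributes to `Json.ObjectReader`.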

owen-mc · 2026-04-28T11:33:09Z

In the past, when we have generated too many models to check manually, we have validated them by running QA and checking that the alert changes are reasonable (not too many new alerts, and not too many FPs). It turns out that if you have 100 incorrect models, most of them won't cause any problems, because they don't lead to any extra data flow paths. A few do cause a lot of extra data flow paths, though, and the resulting FPs are pretty obvious when you look at the QA results.

This PR doesn't introduce that many models, so it might be worth reviewing them manually. Also, the above procedure won't work for sources that belong to non-default threat models. It might well be possible to run QA with a particular threat model; I haven't done this before, so I've asked around. If so, we should definitely make sure to enable the file threat model alongside the default remote threat model, so that those models are exercised.

jacknojo added 5 commits April 24, 2026 13:24

Move generated MaDs for Java into modelgenerator/

6ec2509

Move generated MaDs for CPP into modelgenerator/

07cb980

Move generated MaDs for Rust into modelgenerator/

073529a

Move generated MaDs for C# into modelgenerator/

a6e052b

Change path where tool generate MaDs

7f12fb7

Copilot AI review requested due to automatic review settings April 24, 2026 11:37

jacknojo requested a review from a team as a code owner April 24, 2026 11:37

Copilot AI reviewed Apr 24, 2026

View reviewed changes

github-actions Bot added the Java label Apr 24, 2026

jacknojo requested a review from michaelnebel April 24, 2026 11:38

jacknojo removed the request for review from michaelnebel April 27, 2026 13:24

jacknojo marked this pull request as draft April 27, 2026 13:24

jacknojo changed the base branch from main to jacknojo/move_java_generated_mads April 27, 2026 13:42

owen-mc requested changes Apr 28, 2026

View reviewed changes

jacknojo added 2 commits April 28, 2026 13:23

Add MaDs for Apache Avro

069e749

Change provenance to ai-generated

ec10873

jacknojo force-pushed the jacknojo/add_llm_generated_mads_for_avro branch from 5d78705 to ec10873 Compare April 28, 2026 11:24

jacknojo changed the base branch from jacknojo/move_java_generated_mads to main April 28, 2026 11:24

jacknojo wants to merge 7 commits into main from jacknojo/add_llm_generated_mads_for_avro
