@danielvartan danielvartan commented Aug 5, 2025

Description

Hi there,

I’d like to propose adding the .nls extension to the NetLogo language in linguist. This is an official NetLogo file type, but it's currently missing from languages.yml.

I've added heuristics to differentiate .nls files as NetLogo, TeX, or INI. Each pattern was tested individually, and all of them return results associated with the extension.

Thanks for considering this!

Checklist:

@danielvartan danielvartan requested a review from a team as a code owner August 5, 2025 05:05
Member

@lildude lildude left a comment

> Please note that the .nls extension is unique—no other language in languages.yml uses it.

And that's going to cause a problem, as there appear to be more non-NetLogo files using this extension than NetLogo ones (add NOT before the keyword in your search), which means they would all be incorrectly classified.

You will need to identify the other main user of this extension, add support for it in this same PR, and use a heuristic to differentiate the two.

@danielvartan
Author

Hi @lildude,

Thanks for pointing out the issue.

I've added heuristics to distinguish .nls files as NetLogo, TeX, or INI, along with two sample files for each type. I also updated the PR description and the checklist sections.

Let me know if any further changes are needed.

Collaborator

@Alhadis Alhadis left a comment

I question the need for heuristics in the first place, as our classifier should do a decent enough job disambiguating between the filetypes. If not, more (and better-quality) samples are needed.

Regardless, the feedback I've left relates specifically to the accuracy and formatting of the heuristics you've added.

Comment on lines 617 to 630
pattern:
- '^\s*;'
- '^\s*to\s+[\w-]+'
- '^\s*to-report\s+[\w-]+'
- '^\s*__includes\s+\['
- '^\s*extensions\s+\['
- '^\s*globals\s+\['
- '^\s*breed\s+\['
- '^\s*turtles-own\s+\['
- '^\s*patches-own\s+\['
- '^\s*links-own\s+\['
- '^\s*undirected-link-breed\s+\['
- '^\s*directed-link-breed\s+\['
- '^\s*ask\s+[\w-]+\s+\['
Collaborator

  1. Many of these heuristics are extremely open-ended and could very easily match a valid line of a TeX or INI file. In particular, something like ^\s*; is likely to match a comment line in an INI file (here are three such examples I plucked from a quick search).

    Even less obvious forms such as this might be matched by ^\s*ask\s+[\w-]+\s+\[ (remember, \s includes newlines as well as horizontal whitespace):

    notice = For help with regular expressions,
    	ask Alhadis
    
    [section]
    Foo = Bar
  2. Secondly, these patterns (problematic as they are) can be more efficiently written as a single, combined expression in expanded (?x) mode:

    Suggested change
    - pattern:
    - - '^\s*;'
    - - '^\s*to\s+[\w-]+'
    - - '^\s*to-report\s+[\w-]+'
    - - '^\s*__includes\s+\['
    - - '^\s*extensions\s+\['
    - - '^\s*globals\s+\['
    - - '^\s*breed\s+\['
    - - '^\s*turtles-own\s+\['
    - - '^\s*patches-own\s+\['
    - - '^\s*links-own\s+\['
    - - '^\s*undirected-link-breed\s+\['
    - - '^\s*directed-link-breed\s+\['
    - - '^\s*ask\s+[\w-]+\s+\['
    + pattern: >-
    +   (?x) ^ \s*
    +   ( ;
    +   | to(-report)? \s+ [\w-]+
    +   | ask \s+ [\w-]+ \s+ \[
    +   | (extension|global|__include)s \s+ \[
    +   | (turtles|patches|links)-own \s+ \[
    +   | ((un)?directed-link-)?breed \s+ \[
    +   )
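
Both points in this comment can be checked with a short script. The sketch below is not part of the PR: it uses Python's `re` module with `re.MULTILINE` as a stand-in for the Ruby regex engine, on the assumption that the two behave the same for these constructs (in Ruby, `^` matches at every line start by default). The helper names and the sample strings are made up for illustration.

```python
import re

# Patterns copied from the diff under review (one regex per list item).
ORIGINAL_PATTERNS = [
    r'^\s*;',
    r'^\s*to\s+[\w-]+',
    r'^\s*to-report\s+[\w-]+',
    r'^\s*__includes\s+\[',
    r'^\s*extensions\s+\[',
    r'^\s*globals\s+\[',
    r'^\s*breed\s+\[',
    r'^\s*turtles-own\s+\[',
    r'^\s*patches-own\s+\[',
    r'^\s*links-own\s+\[',
    r'^\s*undirected-link-breed\s+\[',
    r'^\s*directed-link-breed\s+\[',
    r'^\s*ask\s+[\w-]+\s+\[',
]

# The combined expression from the suggested change, in extended (?x) mode.
COMBINED = r"""(?x) ^ \s*
( ;
| to(-report)? \s+ [\w-]+
| ask \s+ [\w-]+ \s+ \[
| (extension|global|__include)s \s+ \[
| (turtles|patches|links)-own \s+ \[
| ((un)?directed-link-)?breed \s+ \[
)"""


def matches_any(text: str) -> bool:
    """True if at least one of the original patterns matches anywhere in text."""
    # re.MULTILINE makes ^ match at every line start (assumed to mirror Ruby).
    return any(re.search(p, text, re.MULTILINE) for p in ORIGINAL_PATTERNS)


def matches_combined(text: str) -> bool:
    """True if the combined expression matches anywhere in text."""
    return re.search(COMBINED, text, re.MULTILINE) is not None


# Made-up INI content: a comment line, and the multi-line value from point 1.
INI_COMMENT = "; settings written by an installer\n[section]\nFoo = Bar\n"
INI_ASK = (
    "notice = For help with regular expressions,\n"
    "\task Alhadis\n"
    "\n"
    "[section]\n"
    "Foo = Bar\n"
)

# Made-up snippets: NetLogo declarations plus lines that should not match.
SAMPLES = [
    "to setup\n  clear-all\nend\n",
    "to-report average-energy\n",
    "globals [ counter ]\n",
    "turtles-own\n[\n  energy\n]\n",
    "undirected-link-breed [ edges edge ]\n",
    "ask turtles [ fd 1 ]\n",
    "Foo = Bar\n",
    "\\documentclass{article}\n",
    INI_COMMENT,
    INI_ASK,
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample[:30]), matches_any(sample), matches_combined(sample))
```

On these samples the pattern list and the combined expression agree on every input, and both flag the two INI snippets as NetLogo. Combining the patterns changes only how they are written; it does not address the false positives.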

Author

Hi @Alhadis,

Thank you for your review.

Regarding your points:

  • I have no problem removing the heuristics. I didn't include them initially but added them at @lildude's request.
  • I removed the ^\s*; pattern, which, as you pointed out, was too open-ended, and used the (?x) mode as suggested to combine all patterns into a single expression. I made a few modifications to the regex you proposed.
  • I kept \s after the keyword because NetLogo, like other formats, allows the opening bracket on a new line (example). I removed the ask keyword to avoid confusion. Since the first \s is preceded by ^, I don't think it needs to be changed.
  • I also added more examples, all under licenses that permit commercial use.
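
The effect of the changes described above can be sketched as follows. The regex actually committed to the PR is not shown in this thread, so `REVISED` is a hypothetical reconstruction: the reviewer's combined expression with the `;` and `ask` alternatives dropped. Python's `re` with `re.MULTILINE` again stands in for the Ruby engine, and the sample strings are made up.

```python
import re

# Hypothetical revision (an assumption, not the PR's regex): the reviewer's
# combined expression minus the `;` and `ask` alternatives.
REVISED = r"""(?x) ^ \s*
( to(-report)? \s+ [\w-]+
| (extension|global|__include)s \s+ \[
| (turtles|patches|links)-own \s+ \[
| ((un)?directed-link-)?breed \s+ \[
)"""


def looks_like_netlogo(text: str) -> bool:
    """True if the hypothetical revised heuristic matches anywhere in text."""
    return re.search(REVISED, text, re.MULTILINE) is not None


# NetLogo declaration with the opening bracket on the line after the keyword.
BRACKET_ON_NEW_LINE = "globals\n[\n  population\n]\n"

# The two INI shapes raised in the review.
INI_COMMENT = "; generated file\n[section]\nFoo = Bar\n"
INI_ASK = (
    "notice = For help with regular expressions,\n"
    "\task Alhadis\n"
    "\n"
    "[section]\n"
    "Foo = Bar\n"
)

if __name__ == "__main__":
    for name, text in [
        ("bracket on new line", BRACKET_ON_NEW_LINE),
        ("INI comment", INI_COMMENT),
        ("INI multi-line value", INI_ASK),
    ]:
        print(name, looks_like_netlogo(text))
```

Because `\s+` also matches newlines, the keyword and the bracket can sit on separate lines. With the `;` and `ask` alternatives gone, neither INI snippet matches in this sketch.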

The PR description has been updated to reflect these changes.

@danielvartan danielvartan requested a review from Alhadis January 21, 2026 04:51