@betatim

@betatim betatim commented Jan 27, 2026

For the PR or Issue author show an alert icon if they match one of several heuristics that could mean they are a "spammy" user.

In the scikit-learn monthly meeting we discussed that there are increasingly many accounts that are "spammy" and that many of the maintainers visit the profile page of a new contributor before doing anything else to try and judge the "spammyness". So I thought I'd give modifying Refined GitHub a try as I've used this extension for a long time and other maintainers also use it.

However, I don't know any TypeScript or the code base. So I asked Cursor (an AI agent editor thing) to make the changes for me. To my untrained eye they look reasonable, but yeah, an expert's opinion would be useful (from my experience with using Cursor on scikit-learn: it needs quite a bit of polishing to be truly great code). I'd be happy to do the polishing if pointed in the right direction.

I had a quick look through the issues but couldn't find anything about a feature like this. I'm wondering if Refined GitHub is the place to implement this or not? How useful are these heuristics in practice?

Test URLs

The PR and Issue opened by the same user both get an alert symbol:

A long-time GitHub user's Issue doesn't get the alert: scikit-learn/scikit-learn#30088

Screenshot

Screenshot 2026-01-27 at 09 55 11

edit: as a friend just pointed out about this PR: "Ah, a AI spammer!" - the irony of using AI to make something to help with (mostly AI powered) spammers is not lost on me 🤣

@fregante
Member

Thank you for opening this PR. I have personally not seen this sort of spam before. Is it more common in the Python community? Or on larger OSS repos? Generally the spam I see is truly spam and immediately detectable by its contents.

This feature would require sending one API request per user seen, which would probably be a lot of HTTP traffic only to catch a tiny fraction of users. See

https://github.com/refined-github/refined-github/wiki/%22Can-you-add-this-feature%3F%22#6-it-doesnt-require-too-much-http-traffic

For this reason I suggest creating a userscript instead and sharing it with your team.

@fregante fregante closed this Jan 27, 2026
@SunsetTechuila
Member

There's currently a massive wave of AI slop PRs flooding a lot of big OSS projects. I'm surprised you haven't heard about that. I think this feature would be really useful.

Not arguing that this particular PR should be reopened, just sharing my opinion.

@betatim
Author

betatim commented Jan 27, 2026

Thanks for the quick reply!

> Thank you for opening this PR. I have personally not seen this sort of spam before. Is it more common in the Python community? Or on larger OSS repos? Generally the spam I see is truly spam and immediately detectable by its contents.

On scikit-learn we see a lot of these "spammy" contributions. An example is the issue and PR linked as "testing links". They aren't spam in the sense of trying to sell me something; instead they fix a problem that doesn't really exist (in this case an obscure bug that can't be triggered from the public API). It seems that a lot of these are the result of using some automated tool to find the issue. These users also often open many PRs to many different repos. My guess is that this is somehow "farming for karma"?

For the linked PR/issue it took an experienced maintainer a bit of effort to come to the conclusion that this is probably "spam". While discussing this problem we realised that many of the maintainers look at contributors' profiles before doing anything else to try and assess "how serious is this contribution?". As a result I thought it could be useful to bring some of the information from the profile into the PR/issue view (to save one click).

> This feature would require sending one API request per user seen, which would probably be a lot of HTTP traffic only to catch a tiny fraction of users. See
>
> Wiki: "Can you add this feature?" (6. It doesn't require too much HTTP traffic)
>
> For this reason I suggest creating a userscript instead and sharing it with your team.

This was a mistake. I only want to add the indicator to the issue/PR creator. So there should be one HTTP request per page, not one for each user. I don't know if that is still too many requests?
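
To make the difference in request count concrete, here is a minimal sketch of the two lookup strategies. The `PageParticipants` shape and both function names are made up for illustration; they are not taken from this PR or from Refined GitHub. The sketch only counts how many profile lookups a single page would trigger, it does not perform any HTTP request.

```typescript
// Hypothetical description of who appears on one PR/issue page.
interface PageParticipants {
  author: string; // login of the user who opened the PR or issue
  commenters: string[]; // logins of commenters, in order; may repeat
}

// Strategy A: one lookup per distinct user seen on the page.
function loginsToCheckAllUsers(page: PageParticipants): string[] {
  const all = [page.author].concat(page.commenters);
  // Keep only the first occurrence of each login.
  return all.filter((login, index) => all.indexOf(login) === index);
}

// Strategy B: one lookup per page, for the author only.
function loginsToCheckAuthorOnly(page: PageParticipants): string[] {
  return [page.author];
}

// Made-up logins, purely for illustration.
const examplePage: PageParticipants = {
  author: "alice",
  commenters: ["bob", "carol", "alice", "bob"],
};

console.log(loginsToCheckAllUsers(examplePage).length); // 3
console.log(loginsToCheckAuthorOnly(examplePage).length); // 1
```

With the author-only strategy the cost stays at one lookup per page regardless of how long the thread gets.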

@fregante
Member

@SunsetTechuila the core point is "immediately detectable", which you can also confirm yourself by looking at the PRs tagged spam in that repo:

https://github.com/scikit-learn/scikit-learn/pulls?q=sort%3Aupdated-desc+is%3Apr+label%3Aspam+is%3Aclosed

Most of them lack an avatar, which is a huge signal that they're spam. The rest of them are repeated PRs or have junk contents like https://github.com/scikit-learn/scikit-learn/pull/33007/files

The special part about the PR/issue links in the current PR is that the user has an avatar and the contents appear useful.

@fregante
Member

@betatim looking back at the issues/PRs that were opened recently in your repo, how many would actually benefit from this signal? The signals I highlighted (no avatar, junk contents, certain names) already seem to cover most of them.

If this signal would help you detect 2 issues or PRs per month in a repo as popular as that one, I'd say it's not worth sending hundreds of thousands of HTTP requests daily by Refined GitHub users.
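
As a back-of-the-envelope check on that order of magnitude: the 150k user count is the figure quoted later in this thread, while the number of PR/issue pages each user opens per day is purely an assumption picked for illustration, as is the function name.

```typescript
// Rough estimate of extra API requests per day across all extension users.
function estimateDailyRequests(
  users: number,
  pagesPerUserPerDay: number,
  requestsPerPage: number,
): number {
  return users * pagesPerUserPerDay * requestsPerPage;
}

// 150000 users (figure from this thread), 2 PR/issue page views per user
// per day (assumed), 1 request per page (author-only lookup).
console.log(estimateDailyRequests(150000, 2, 1)); // 300000
```

Even with a single request per page, a modest assumed browsing rate already lands in the hundreds of thousands per day; the real number depends on how many pages users actually open and on any caching.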

@fregante
Member

fregante commented Jan 27, 2026

To reiterate, a userscript to be used by maintainers of popular repos is a better solution than adding this to Refined GitHub where almost no one would benefit.

You could also create a GitHub workflow that marks certain PRs as spam. Or create an AI bot that can use better heuristics to detect spam and deal with it.

@betatim
Author

betatim commented Jan 27, 2026

I think a userscript might indeed be the right solution here (I need to learn about how and what that is, but that is a detail).

The (I think) useful signals that are not directly available when looking at a PR/issue are "user set their contribution history to private" and "opened many PRs to many different repos in a short time" (not implemented in this PR). But they are only one click away.
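
For anyone who goes the userscript route, here is a minimal sketch of how the signals mentioned in this thread could be combined once the profile data is in hand. This is not the code from this PR. The `ProfileSummary` fields, the `SignalThresholds` names, and the default threshold values are all hypothetical and would need tuning; how the data is fetched from GitHub is deliberately left out.

```typescript
// Hypothetical summary of a contributor's profile; the field names are
// illustrative and do not correspond to a GitHub API schema.
interface ProfileSummary {
  hasAvatar: boolean;
  contributionsPrivate: boolean;
  recentPullRequests: { repo: string; openedDaysAgo: number }[];
}

// Hypothetical tuning knobs; the defaults below are guesses, not measured values.
interface SignalThresholds {
  windowDays: number; // how far back "a short time" reaches
  maxDistinctRepos: number; // more distinct repos than this raises a signal
}

const defaultThresholds: SignalThresholds = { windowDays: 7, maxDistinctRepos: 5 };

// Returns the list of signals that fired; an empty list means no alert.
function spamSignals(
  profile: ProfileSummary,
  thresholds: SignalThresholds = defaultThresholds,
): string[] {
  const signals: string[] = [];
  if (!profile.hasAvatar) {
    signals.push("no avatar");
  }
  if (profile.contributionsPrivate) {
    signals.push("private contribution history");
  }
  // Count distinct repositories that received a PR inside the time window.
  const repos: string[] = [];
  for (const pr of profile.recentPullRequests) {
    if (pr.openedDaysAgo <= thresholds.windowDays && repos.indexOf(pr.repo) === -1) {
      repos.push(pr.repo);
    }
  }
  if (repos.length > thresholds.maxDistinctRepos) {
    signals.push("many PRs across many repos in a short time");
  }
  return signals;
}
```

Returning the individual signals rather than a single boolean lets the alert tooltip say why a profile was flagged, which matters because none of these signals is conclusive on its own.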

@SunsetTechuila
Member

> the contents appear useful.

That's the problem with a lot of low-effort AI PRs - you can't immediately tell whether the code makes sense, particularly if the PR is not small.

> Most of them lack an avatar, which is a huge signal that they're spam.

Sure, but the lack of an avatar alone is not enough to determine whether a user is likely to be a spammer. And, as @betatim said, other signals are one click away.

> where almost no one would benefit.

I'm not sure where this comes from. Perhaps it would be a good idea to open an issue to see if users are interested.

@fregante
Member

fregante commented Jan 27, 2026

> I'm not sure where this comes from

You said it best:

> flooding a lot of big OSS projects

How many Refined GitHub users manage "big OSS projects"? The answer? Almost no one (of the 150k+ Refined GitHub users).

This repo has 30k stars and 150k users, yet I don’t think I've seen more than a couple of such PRs.

At the end of the day, as you both have said, this signal is one click away. Frankly, even discussing the utility of this feature in the context of Refined GitHub is wasting more time than clicking those 5 profiles a month.
