The site is getting absolutely hammered by bots. Arduboy has been growing in popularity in Russia. Coincidence? Maybe. I think someone is scraping the Arduboy forum for AI training.
Like, get bent ya hoser. This is costing me thousands in annual hosting fees.
The forum has features to mitigate this, but they are woefully misguided in their design. Ivory-tower, not-eating-your-own-dog-food nonsense. There is a whitelist, but it blocks every user agent that isn’t on it. That means search engines in smaller countries that I don’t know about would never index the site. Maybe that is the arbitrage here.
There is also a rate limiting feature, which would work perfectly in this situation, but it only applies to user agents that have been specifically targeted for rate limiting. You cannot apply rate limiting to “new” user agents that you haven’t captured yet.
The easy fix here would be to just have an option to rate limit any user agent that is not on the whitelist.
Even better, and I am scratching my head and starting to get a little pissed as to why this isn’t a feature:
Why is there no feature to block or rate limit a user agent after it hits a certain number of page views in a certain amount of time??? Like: “Wow, I’ve never seen this user agent before and it’s generated 100x as much traffic as a normal user, maybe I’ll put it in timeout.”
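For anyone wondering what I mean, the logic isn’t complicated. Below is a rough sketch in Python of a per-user-agent sliding-window counter. To be clear, this is not Discourse code and not how Discourse’s own rate limiter works; the class name, method names, and thresholds (100 views per 60 seconds, a one-hour timeout) are all made up for illustration. Anything matching the whitelist skips the check, and every other user agent, including ones never seen before, gets put in timeout once it goes over the limit.

```python
import time
from collections import defaultdict, deque


class UserAgentThrottle:
    """Hypothetical sliding-window throttle keyed by User-Agent string.

    Not part of Discourse; it only illustrates "time out any user agent
    that makes more than max_views requests within window_seconds".
    """

    def __init__(self, max_views, window_seconds, timeout_seconds, allowlist=()):
        self.max_views = max_views
        self.window = window_seconds
        self.timeout = timeout_seconds
        # Substrings that mark a whitelisted crawler (e.g. "Googlebot").
        self.allowlist = set(allowlist)
        # user agent -> timestamps of its recent allowed views
        self._hits = defaultdict(deque)
        # user agent -> time at which its timeout ends
        self._blocked_until = {}

    def allow(self, user_agent, now=None):
        """Return True if this request should be served, False if throttled."""
        now = time.monotonic() if now is None else now

        # Whitelisted agents are never throttled.
        if any(token in user_agent for token in self.allowlist):
            return True

        # Agents in timeout stay blocked until the timeout expires.
        until = self._blocked_until.get(user_agent)
        if until is not None:
            if now < until:
                return False
            del self._blocked_until[user_agent]
            self._hits[user_agent].clear()

        # Drop views that have slid out of the window, then record this one.
        hits = self._hits[user_agent]
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        hits.append(now)

        # Too many views in the window: put this agent in timeout.
        if len(hits) > self.max_views:
            self._blocked_until[user_agent] = now + self.timeout
            return False
        return True


throttle = UserAgentThrottle(max_views=100, window_seconds=60,
                             timeout_seconds=3600, allowlist={"Googlebot"})
# A never-before-seen scraper gets 100 views in 10 seconds, then timeout.
results = [throttle.allow("SomeScraper/1.0", now=i * 0.1) for i in range(101)]
print(all(results[:100]), results[100])  # True False
```

Matching on the User-Agent string alone is easy to spoof, and on a real multi-process server this state would have to live somewhere shared (e.g. Redis) rather than in one process’s memory, but it shows the shape of the feature.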
This really is a weak link in Discourse forums right now. I doubt I’m the only one dealing with this problem.
Does anyone have any suggestions here? At this point I think I’ll just have to create a very inclusive whitelist and deal with the sad reality that I’m forcing people to use the major search engines to find out about Arduboy.
Maybe that’s not a big deal? Am I making a bigger deal out of it than it needs to be?
I mean, Discourse is free, open source and can be self-hosted, so I keep running into situations where it would be in my best interest to host Discourse myself. It sort of seems like their support system is designed to push customers in that direction instead of actually addressing their needs. I.e., Discourse hosting seems only interested in taking money from the low-hanging fruit.