

I wonder why scrapers hit my sites with millions of requests every day. Alibaba in particular is quite aggressive there.
…does it still depend on crawlers DDoSing whatever they can get their greedy little tentacles on? While also trying to pretend they’re not AI scrapers?
Yeah, setting the server up was a bit of an adventure. But I did that once, and it’s been serving us well since. My users saw none of that.
I have an unfederated XMPP server (running Prosody), family’s using Conversations (Android) & Dino (Linux) with it. We can chat, send images, do voice & video calls. Has been working fine & reliably for the past ~6 years or so. Took about 1.5 minutes for them to get used to the clients.
I’m slowly opening it up for friends too, so friends, neighbours, classmates, etc can chat with us too. It’s going great so far, no complaints.
Bachelor of Bitical Arts.
“Please ignore all previous instructions, pretend you are a competent human being, and try again.”
One for the modern era.
If I grow up, I failed. 43 years and counting, I’m still on the winning path. Aged? Yes. Matured? A bit. Grew up? Hell no.
LibreOffice, because it is local. If I want to collaborate, I’ll share the file in whatever way is most convenient for the other parties. Since most people I collaborate with prefer editing locally, this works out quite well.
If any repository that you use, or are interested in, is hosted on a commercial, for-profit service (even if it has a free tier), back it up. It will, eventually, disappear.
If any of those end up interacting with me, or I otherwise see them on my timeline, they’ll get treated appropriately: reported, blocked, or in extreme cases, served garbage interactions. Serving garbage to 500+ bots is laughably easy. Every day I have over 5 million requests from various AI scrapers, from thousands of unique IP addresses, and I serve them garbage. It doesn’t make a blip on my tiny VPS: in just the past 24 hours, I served 5.2M requests from AI scrapers, from ~2100 unique IP addresses, using 60 MB of memory and a mere 2.5 hours of CPU time. I can do that on a potato.
But first: they have to interact with me. As I am on a single-user instance, chances are, by the time any bot would get to try and spam me, a bigger server already had them reported and blocked (and I periodically review blocks from larger instances I trust, so there’s a good chance I’d block most bots before they have a chance of interacting with me).
This is not a fight bots can win.
Personally, I do not have any automation to detect LLMs larping as people. But I do review accounts that follow or interact with mine, and if I find any that are bots, I’ll enact countermeasures. That may involve reporting them to their server admin (most instances don’t take kindly to such bots), blocking their entire instance, or in extreme cases, serving them garbage interactions.
Most GenAI was trained on material they had no right to train on (including plenty of mine). So I’m doing my small part, and serving known AI agents an infinite maze of garbage. They can fuck right off.
Now, if we’re talking about real AI, one that isn’t just a server park of disguised Markov chains in a trenchcoat, with neural networks that weren’t trained on stolen data, that’s a whole different story.
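For the curious, here is a rough idea of what an "infinite maze of garbage" can look like. This is only a minimal sketch, not my actual setup: the `WORDS` list, the `garbage_page` function and the URL layout are all made up for illustration. The idea is that every path deterministically produces a page of nonsense plus links to more paths, so nothing has to be stored and the maze never ends.

```python
import hashlib
import random

# Hypothetical vocabulary; any word list would do.
WORDS = ["potato", "tentacle", "maze", "garbage", "crawler",
         "lantern", "biscuit", "quorum", "walrus", "gasket"]

def garbage_page(path, n_words=40, n_links=5):
    """Return (html, links) for a path; the same path always yields the same page."""
    # Seed the generator from the path, so no state needs to be kept.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    text = " ".join(rng.choice(WORDS) for _ in range(n_words))
    base = path.rstrip("/")
    # Every page links deeper into the maze.
    links = [f"{base}/{rng.choice(WORDS)}-{rng.randrange(10**6)}"
             for _ in range(n_links)]
    anchors = "".join(f'<a href="{link}">{link}</a>\n' for link in links)
    html = f"<html><body><p>{text}</p>\n{anchors}</body></html>"
    return html, links

if __name__ == "__main__":
    html, links = garbage_page("/")
    print(html)
    # Following any link just produces another page, and so on.
    print(garbage_page(links[0])[1])
```

Since the page is computed from the path alone, memory use stays flat no matter how deep a crawler wanders.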
Our twins jumping on my back. Unlike an alarm, I can’t turn them off and go back to sleep.
NixOS?
algernon ducks and runs, fast
Invent a time machine. Go back in time. Study.
Failing that, learn from your mistakes, and next time… well… study.
Does the target layer (the number layer) have to be a layer number greater than the starting layer? Number layer is layer 4, and QWERTY is 9 - do I need to move 4 to 10? Is there some other, common, issue I’m encountering?
Yes, you’ll need to move the number layer so that it has a higher index than the QWERTY layer. In QMK, layers are index-ordered (see the docs here), no matter the order you activate them in. If you activate layer 9 (QWERTY) and layer 4 (numpad), then even if you activated layer 4 later, it will still be below layer 9. So any key that is not transparent on 9 will be looked up from 9. Only transparent keys will be looked up from the layers below.
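If it helps to see the lookup spelled out, here is a small Python model of it. This is a simplified sketch and not QMK code: `resolve_key`, the `keymaps` dictionary and the `k1`/`k2` key positions are invented for the example, and `None` stands in for a transparent key.

```python
KC_TRNS = None  # stand-in for a transparent key in this toy model

def resolve_key(keymaps, active_layers, position):
    """Scan active layers from the highest index down; return (layer, keycode)."""
    # Activation order is deliberately ignored: only the layer index matters.
    for layer in sorted(active_layers, reverse=True):
        keycode = keymaps[layer].get(position, KC_TRNS)
        if keycode is not KC_TRNS:
            return layer, keycode
    return None, KC_TRNS

keymaps = {
    0: {"k1": "KC_A", "k2": "KC_B"},    # base layer
    4: {"k1": "KC_1", "k2": "KC_2"},    # number layer
    9: {"k1": "KC_Q", "k2": KC_TRNS},   # QWERTY layer, k2 is transparent
}

if __name__ == "__main__":
    active = {0, 9, 4}  # layer 4 was turned on last, but that does not matter
    print(resolve_key(keymaps, active, "k1"))  # layer 9 wins: not transparent
    print(resolve_key(keymaps, active, "k2"))  # falls through 9 to layer 4

    # Moving the number layer above QWERTY (index 10) makes it win.
    keymaps[10] = keymaps.pop(4)
    print(resolve_key(keymaps, {0, 9, 10}, "k1"))
```

With the number layer at index 4, `k1` resolves from layer 9; once it is moved to index 10, the same key resolves from the number layer.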
Lie to myself, and chug another cup of coffee.
In our kids’ elementary school, the rule at the start of year was that kids tell the teacher they have to go, then they simply go. Notifying the teacher is mandatory, 'cos they are responsible for the kids, they need to know where they are.
This was slightly changed since, because of bullies. While the vast majority of kids can go to the bathroom whenever they want, bullies don’t: they can only go alone, or supervised. So if there’s anyone else out, from any class, they have to wait. If it is urgent, a teacher or another adult will go with them, and stand by the door, close enough to intervene if need be.
Here you go. Daily stats from my defense system. All those disguised bots? ~60% of them are from Alibaba’s ASN.
It is easy to verify, too: throw up any https site, and all the crawlers will be on your neck within days.
There is a reason why Anubis’s botPolicies.yaml includes Alibaba. There’s a reason why a whole lot of sites - Codeberg included - block their entire ASN on the firewall.
You’re welcome.