When I first got into self-hosting, I wanted to join the Fediverse by hosting my own instance. After realizing I wasn’t that committed to the idea, I went in a simpler direction.

Originally I was using Cloudflare’s tunnel service. Watching the logs, I would get traffic from random corporations and places.

Being uncomfortable with Cloudflare after pivoting away from social media, I learned how to secure my device myself and started using an uncommon port with a reverse proxy. My logs now only ever show activity when I am connecting to my own site.

Which is what led me to this question.

What do bots and scrapers look for when they come to a site? Do they mainly probe well-known ports like 80 or 22 for vulnerabilities? Do they ever scan other ports looking for other common services that may be insecure? Is it even worth their time scanning for open ports?

Seeing as I am tiny and obscure, I most likely won’t need to do much research into protecting myself from such threats, but I am still curious about the threats that bots pose to other self-hosters or larger platforms.

  • derek@infosec.pub
    16 hours ago

    That sounds pretty good to me for self-hosted services you’re running just for you and yours. The only addition I have on the DR front is implementing an off-site backup as well. I prefer restic for file-level backups, Proxmox Backup Server for image backups (clonezilla works in a pinch), and Backblaze B2 for off-site storage. They’re reliable and reasonably priced. If a third party service isn’t in the cards then get a second SSD and put it in a safety deposit box or bury it on the other side of town or something. Swap the two backup disks once a month.

    The point is to make sure you’re following the 3-2-1 principle: three copies of your data, two different storage media, and at least one remote location. If disaster strikes and your home disappears, you want something to restore from rather than losing absolutely everything.
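
    As a quick sanity check, the rule can be written down as a few lines of Python. This is only an illustrative sketch: the `BackupCopy` type, its fields, and the example inventory are made up for the demonstration and are not part of any backup tool.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    """One copy of the data (hypothetical inventory record)."""
    name: str
    medium: str    # e.g. "internal-ssd", "external-ssd", "object-storage"
    offsite: bool  # True if the copy lives away from home


def satisfies_3_2_1(copies):
    """Return True if the inventory meets the 3-2-1 rule."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Example inventory: live data, a local external disk, and a cloud bucket.
inventory = [
    BackupCopy("server", "internal-ssd", offsite=False),
    BackupCopy("usb-backup", "external-ssd", offsite=False),
    BackupCopy("cloud-bucket", "object-storage", offsite=True),
]
print(satisfies_3_2_1(inventory))
```

    Dropping the cloud entry from the inventory makes the check fail on two counts: only two copies, and none of them off-site.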

    Extending your current setup to ship the external SSD’s contents out to B2 would likely just be pointing rclone at your B2 bucket (rsync itself can’t talk to B2, but rclone and restic both can) and scheduling a cron job or systemd timer to run it.
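
    A minimal sketch of that scheduled job, assuming rclone is the tool doing the upload (it has a B2 backend; plain rsync does not). The remote name `b2`, the bucket `my-bucket`, the `homelab` prefix, and the `/mnt/backup` path are placeholders, and the helper functions are hypothetical names, not part of rclone.

```python
import shlex


def build_sync_command(source_dir, remote, bucket, prefix="", dry_run=True):
    """Build the rclone argv that mirrors source_dir into a bucket.

    `remote` is the name of a remote already set up with `rclone config`.
    """
    destination = f"{remote}:{bucket}"
    if prefix:
        destination += "/" + prefix.strip("/")
    command = ["rclone", "sync", source_dir, destination]
    if dry_run:
        # Only report what would change; nothing is uploaded or deleted.
        command.append("--dry-run")
    return command


def cron_line(command, schedule="0 3 * * *"):
    """Format a crontab entry; the default schedule is daily at 03:00."""
    return f"{schedule} {shlex.join(command)}"


if __name__ == "__main__":
    cmd = build_sync_command("/mnt/backup", "b2", "my-bucket", "homelab",
                             dry_run=False)
    print(cron_line(cmd))
```

    Note that `rclone sync` makes the destination match the source, deletions included, so it is worth running with `--dry-run` first; `rclone copy` is the gentler option if you never want the remote side to lose files.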

    After that if you’re itching for more I’d suggest reading/watching some Red Team content like the stuff at hacker101 dot com and sans dot org. OWASP dot org is also building some neat educational tools. Getting a better understanding of the what and why around internet background noise and threat actor patterns is powerful.

    You could also play around with Wazuh if you want to launch straight into the Blue Team weeds. Understanding the attacking side is essential for us to be effective as defenders, but deeper learning anywhere across the spectrum is always a good thing. Standing up a full-blown SIEM/XDR, for free, offers a lot of education.

    P.S. I realize this is all tangential to your OP. I don’t care for the grizzled killjoys who chime in with “that’s dumb, don’t do that” or similar, offer little helpful insight, and trot off arrogantly over the horizon on their high horse. I wanted to be sure I offered actionable suggestions for improvement and was tangibly helpful.