Lemmy newb here, not sure if this is right for this /c.

An article I found from someone who hosts their own website and micro-social network, and their experience with web-scraping robots who refuse to respect robots.txt, and how they deal with them.

  • F04118F@feddit.nl
    link
    fedilink
    English
    arrow-up
    25
    ·
    4 days ago

    Interesting approach but looks like this ultimately ends up:

    • being a lot of babysitting / manual work
    • blocking a lot of humans
    • not being robust against scrapers

    Anubis seems like a much better option, for those wanting to block bots without relying on Cloudflare:

    https://anubis.techaro.lol/