Now that we know AI bots will ignore robots.txt and churn residential IP addresses to scrape websites, does anyone know of a method to block them that doesn’t entail handing over your website to Cloudflare?
Perhaps feed them convincing fake data so they don’t realize they’ve been IP-banned or user-agent filtered.
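A minimal sketch of that idea in Python, assuming you filter on the User-Agent header only (it does nothing about bots that spoof a browser user agent or rotate residential IPs). The marker list, the word list, and the function names `is_blocked_agent`, `decoy_page`, and `respond` are all hypothetical and only here for illustration. A filtered client still gets a 200 response, but with cheap generated filler instead of the real page:

```python
import random
from typing import Tuple

# Illustrative user-agent substrings to treat as AI crawlers (assumed list;
# adjust to whatever you actually see in your logs).
BLOCKED_AGENT_MARKERS = ("GPTBot", "CCBot", "Bytespider")

# Tiny hypothetical vocabulary for the filler text.
WORDS = ("server", "garden", "protocol", "river", "lantern", "index", "copper", "window")


def is_blocked_agent(user_agent: str) -> bool:
    """Return True if the user agent contains any blocked marker (case-insensitive)."""
    ua = user_agent.lower()
    return any(marker.lower() in ua for marker in BLOCKED_AGENT_MARKERS)


def decoy_page(path: str, n_words: int = 200) -> str:
    """Build filler HTML for a path.

    The generator is seeded with the path, so the same URL always returns
    the same filler and a re-crawl does not reveal that the content is random.
    """
    rng = random.Random(path)
    body = " ".join(rng.choice(WORDS) for _ in range(n_words))
    return f"<html><body><p>{body}</p></body></html>"


def respond(user_agent: str, path: str, real_page: str) -> Tuple[int, str]:
    """Serve the decoy to filtered agents and the real page to everyone else.

    Both cases return status 200, so the bot gets no signal that it was filtered.
    """
    if is_blocked_agent(user_agent):
        return 200, decoy_page(path)
    return 200, real_page
```

Random words from a short list will not fool anyone who inspects the output; something like a Markov chain trained on your own text would be more convincing, at a higher cost per request.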
A commenter on the Hacker News post has created this: https://marcusb.org/hacks/quixotic.html
I’m interested, but it seems like an easy way for bots to exhaust your own server resources before they give up crawling.