WordPress website owners have observed their sites becoming sluggish and less responsive. A significant contributor to this issue is the increased activity of AI crawlers—automated bots deployed by companies to collect data from the internet for training large language models (LLMs). These crawlers can inundate websites with requests, leading to performance issues and degraded user experience.
The Problem: AI Crawlers Overloading WordPress Websites
AI companies employ web crawlers to amass vast amounts of data from various websites. While this data collection is vital for developing advanced AI models, it can inadvertently harm the websites being accessed. For instance, SourceHut, a platform hosting open-source projects, reported significant slowdowns due to aggressive crawling by AI bots, likening the excessive traffic to a Distributed Denial of Service (DDoS) attack.
The Solution: Implementing Protective Measures for WordPress Sites
To mitigate the adverse effects of AI crawlers on WordPress websites, administrators can adopt several strategies:
- Utilising the Robots.txt File: The `robots.txt` file serves as a guide for web crawlers, indicating which parts of your site they are permitted to access. By updating this file, you can instruct AI crawlers to avoid specific sections of your site. For example, adding directives to disallow known AI user agents can help reduce unwanted crawling. Example `robots.txt` entries:

```
User-agent: GPTBot
Disallow: /
```

However, it’s important to note that not all bots adhere to `robots.txt` directives, and some may ignore these instructions entirely.
- Deploying Security Plugins like Wordfence: Wordfence is a comprehensive security plugin for WordPress that offers features to block malicious traffic, including unwanted bots. While Wordfence primarily focuses on security threats, it can be configured to limit the impact of aggressive crawlers. Discussions within the WordPress community suggest that Wordfence is aware of the challenges posed by AI bots and is considering measures to address them.
- Leveraging Cloudflare’s Bot Management: Cloudflare provides a feature that allows website owners to block AI scrapers and crawlers with a single click. By navigating to the Security > Bots section of the Cloudflare dashboard, you can enable the “Block AI Scrapers and Crawlers” option, effectively preventing these bots from accessing your site.
- Adjusting Crawl Rate in Google Search Console: Google Search Console previously allowed site owners to manage how frequently Googlebot crawled their sites; if Google’s crawling was straining the server, you could request a lower crawl rate through the Search Console settings. However, as of January 8, 2024, Google has deprecated the Crawl Rate Limiter Tool and Search Console no longer offers manual crawl-rate adjustment. Google cited advancements in its crawling algorithms, which now autonomously manage crawl rates based on server capacity and response times.
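Before deploying `robots.txt` rules like those above, it can help to sanity-check that they express what you intend. A minimal sketch using Python’s standard-library `urllib.robotparser` (the GPTBot rules mirror the example in this article; the URLs and the wildcard group are illustrative):

```python
# Sketch: verify robots.txt rules with Python's standard-library parser.
# The GPTBot rules mirror the article's example; the "*" group and URLs
# are illustrative additions.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# GPTBot is disallowed everywhere; ordinary user agents are unaffected.
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))      # False
print(parser.can_fetch("Mozilla/5.0", "https://example.com/blog/post"))  # True
```

This only confirms what the rules say; as noted above, a bot that ignores `robots.txt` will not be stopped by any of it.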
Balancing Data Collection and Website Performance
While AI companies require data to advance their technologies, it’s crucial to balance this need with the performance and stability of WordPress websites. Website owners should remain vigilant, monitor their traffic, and implement protective measures as needed. Collaborative efforts between AI developers and web administrators can lead to solutions that respect both the necessity for data collection and the integrity of online platforms.
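Monitoring traffic, as suggested above, can be as simple as counting requests per user agent in your server access log. A rough sketch, assuming the common Apache/Nginx “combined” log format where the user agent is the last quoted field (the sample lines are fabricated for illustration):

```python
# Sketch: spot aggressive crawlers by counting requests per user agent.
# Assumes the Apache/Nginx "combined" log format, where the user agent
# is the final quoted field. The sample lines below are illustrative.
import re
from collections import Counter

LOG_LINES = [
    '1.2.3.4 - - [10/May/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "GPTBot/1.0"',
    '1.2.3.4 - - [10/May/2025:10:00:01 +0000] "GET /about HTTP/1.1" 200 512 "-" "GPTBot/1.0"',
    '5.6.7.8 - - [10/May/2025:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

def count_user_agents(lines):
    """Return a Counter of requests per user-agent string."""
    counts = Counter()
    for line in lines:
        quoted = re.findall(r'"([^"]*)"', line)
        if quoted:
            counts[quoted[-1]] += 1  # user agent is the last quoted field
    return counts

counts = count_user_agents(LOG_LINES)
for agent, n in counts.most_common():
    print(f"{n:5d}  {agent}")
```

A user agent that dominates the counts, especially one belonging to a known AI crawler, is a good candidate for a `robots.txt` directive or a firewall rule.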