Don't let ChatGPT harm your content and SEO - find out how to protect your website from scraping and unauthorised usage with these simple steps.
In today's digital landscape, website owners must be proactive in protecting their content from scraping and unauthorised usage. ChatGPT is built on large language models developed by OpenAI, and OpenAI's web crawlers, such as GPTBot, can collect website content that the model may later reproduce, potentially harming your Search Engine Optimisation (SEO) and brand reputation.
However, with a few simple steps, website owners can effectively block ChatGPT from using their content.
Using Robots.txt File
The robots.txt file is a standard file used to communicate with search engine crawlers, such as Googlebot, about which pages on your website should not be crawled.
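To block ChatGPT completely from scraping your website, add the following rules to your robots.txt file. GPTBot is OpenAI's crawler, ChatGPT-User is the agent ChatGPT uses when it visits a page on behalf of a user, and CCBot is the Common Crawl crawler, whose dataset is widely used to train AI models: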
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: CCBot
Disallow: /
To block ChatGPT partially, use the following rules instead, replacing the example directory names with your own:
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
User-agent: ChatGPT-User
Allow: /directory-1/
Disallow: /directory-2/
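The first set of rules asks AI crawlers like GPTBot to stay away from every page on your website. The second set lets them into /directory-1/ and keeps them out of /directory-2/; any path not listed remains open to them. Keep in mind that robots.txt is a request rather than an enforcement mechanism, so it only works for crawlers that choose to honour it.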
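Before publishing the rules, it can help to check that they behave as intended. The sketch below uses Python's standard urllib.robotparser module to test both rule sets from this section. The helper name is_allowed and the example.com URLs are placeholders for illustration only, and the parser reflects how Python reads the rules, which may differ in edge cases from how a given crawler reads them.

```python
from urllib.robotparser import RobotFileParser

# The "block completely" rules from this article.
FULL_BLOCK = """\
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: CCBot
Disallow: /
"""

# The "block partially" rules from this article.
PARTIAL_BLOCK = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
User-agent: ChatGPT-User
Allow: /directory-1/
Disallow: /directory-2/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url.

    Hypothetical helper: it parses the text locally and never
    contacts the website.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain.
    for agent in ("GPTBot", "ChatGPT-User", "CCBot", "Googlebot"):
        allowed = is_allowed(FULL_BLOCK, agent, "https://example.com/blog/post")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With the full block in place, the three AI crawlers should be reported as blocked while Googlebot stays allowed, so regular search indexing is unaffected.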
How to Block ChatGPT on WordPress Websites
If you lack dev resources or don't know what a robots.txt file is, but your website runs on WordPress, there is an alternative: you can edit the file through a WordPress plugin.
Prerequisite: Install the Yoast Plugin
- Click on “Yoast SEO” in the menu
- Click on “Tools”
- Click on “File Editor”
- Click on the “Create robots.txt file” button
- Copy/paste the above lines into the file
- Click on “Save changes to robots.txt”
Implement CAPTCHA
Another effective method to prevent ChatGPT from scraping your website is to use CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart).
This is a tool that requires users to complete a task that is difficult for a computer to perform, such as solving a simple puzzle or entering text from an image.
By implementing CAPTCHA on your website, you can prevent ChatGPT from accessing your content and scraping it.
Implement IP Range Blocks
Another common way to block generative AI crawlers is to block the IP ranges that their operators publish.
Note: You can block these IP ranges through .htaccess, but the ranges can change, so the .htaccess file has to be updated whenever the crawler operator publishes a new list.
These are the current GPTBot IP ranges as of 08-09-2023:
- 20.15.240.64/28
- 20.15.240.80/28
- 20.15.240.96/28
- 20.15.240.176/28
- 20.15.241.0/28
- 20.15.242.128/28
- 20.15.242.144/28
- 20.15.242.192/28
- 40.83.2.64/28
Check https://openai.com/gptbot-ranges.txt to stay up to date.
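Because the ranges change over time, it may be easier to generate the .htaccess rules from the list than to type them by hand. The following is a minimal sketch using Python's standard ipaddress module. The function names is_gptbot_ip and htaccess_rules are made up for this example, and the generated snippet assumes an Apache 2.4 server where .htaccess files are allowed to set authorisation rules; adapt it if your server is configured differently.

```python
import ipaddress

# GPTBot ranges as listed in this article; refresh them from the
# published list before relying on them.
GPTBOT_RANGES = [
    "20.15.240.64/28",
    "20.15.240.80/28",
    "20.15.240.96/28",
    "20.15.240.176/28",
    "20.15.241.0/28",
    "20.15.242.128/28",
    "20.15.242.144/28",
    "20.15.242.192/28",
    "40.83.2.64/28",
]

# Parsing also validates the CIDR notation (a typo raises ValueError).
NETWORKS = [ipaddress.ip_network(cidr) for cidr in GPTBOT_RANGES]


def is_gptbot_ip(address: str) -> bool:
    """Return True if address falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in NETWORKS)


def htaccess_rules(cidrs: list[str]) -> str:
    """Build an Apache 2.4 snippet that denies the given ranges.

    Assumption: the server runs Apache 2.4 and permits these
    directives in .htaccess.
    """
    lines = ["<RequireAll>", "    Require all granted"]
    lines += [f"    Require not ip {cidr}" for cidr in cidrs]
    lines.append("</RequireAll>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(htaccess_rules(GPTBOT_RANGES))
```

Running the script prints a block you can paste into .htaccess, and is_gptbot_ip can be used to check whether an address from your server logs belongs to one of the listed ranges.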
How to Opt Out of ChatGPT Scraped Data
- Head to the OpenAI Data Opt-Out Request form.
- Type in your email address associated with the account.
- Enter the Organisation ID.
- Type in your Organisation Name found in your ChatGPT settings.
- Solve the Captcha, and the data opt-out form will be submitted to OpenAI.
Note: Check your email for a copy of the form, which is sent to your account's address as verification. For more information on OpenAI's ChatGPT service, read through its privacy policy and the ChatGPT terms of use.
Conclusion
By using these simple methods, website owners can effectively block ChatGPT from using their content, preserving their SEO and brand reputation. Implementing these measures will also help protect your website from other types of content scraping, ensuring that your hard-earned content remains protected.
Last Update: 03 October 2023