With the advent of artificial intelligence and large language models, publishers may want to prevent their sites from being crawled and their content used to train and inform these models. Here’s how to do that using Yoast, which comes standard with Newspack. Publishers can also elect to pay for Yoast Premium, which makes this task a bit easier.
Note: These settings tell web crawlers what you’d like them to do, but it’s up to the crawlers to respect your preferences.
For those with standard Yoast
Go to Yoast SEO > Tools > File editor and choose “Create robots.txt file.”

If you get a giant screen full of code, hit the back button once in your browser. You should see the robots.txt editor with your file's contents.
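If you've just created the file, it will likely contain only Yoast's default rule, which looks something like this (the exact contents can vary by site):

User-agent: *
Disallow: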

Create a blank line after "Disallow:" by hitting Return/Enter twice. Copy the following and paste it after that blank line:

## OpenAI crawler
User-agent: GPTBot
Disallow: /

## ChatGPT service
User-agent: ChatGPT-User
Disallow: /

## Common Crawl crawler
User-agent: CCBot
Disallow: /

## Bard/Gemini service
User-agent: Google-Extended
Disallow: /

## Perplexity crawler
User-agent: PerplexityBot
Disallow: /
Make sure there's at least one blank line before and after each entry.
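Assuming your file began with the default rule shown earlier, the top of the finished file should look something like this:

User-agent: *
Disallow:

## OpenAI crawler
User-agent: GPTBot
Disallow: /

…and so on for the remaining entries.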

Save changes at the bottom of the file. You’ll see a confirmation message at the top of the screen.

You can ensure it’s correct by opening a browser and adding /robots.txt to the end of your URL (such as https://newspack.com/robots.txt). You should see those lines in the result.
As more crawlers become common, we’ll add them to the list here, and you can modify your robots.txt file by adding new entries.
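Each entry follows the same pattern: a comment line naming the crawler, a User-agent line with the bot's name, and a Disallow rule. For example, to block a hypothetical crawler that identifies itself as ExampleBot (a placeholder name, not a real bot), you would add:

## Example crawler
User-agent: ExampleBot
Disallow: /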
For those with Yoast Premium
Yoast Premium makes this simple. Go to Yoast SEO > Settings > Advanced > Crawl optimization and turn on the appropriate toggles to block AI crawlers.

