With the advent of artificial intelligence and large language models, publishers may want to prevent their sites from being crawled and their content from being used to train and inform these models. Here’s how to do that using Yoast, which comes standard with Newspack. Publishers can also elect to pay for Yoast Premium, which makes this task a bit easier.

Note: These settings tell web crawlers what you’d like them to do, but it’s up to the crawlers to respect your preferences.

For those with standard Yoast

Go to Yoast SEO > Tools > File editor and choose “Create robots.txt file.”

If you get a giant screen full of code, hit the back button once in your browser; you should then see the robots.txt editor.

Create a blank line after “Disallow:” by hitting Return/Enter twice, then copy the following and paste it after that blank line.

## OpenAI crawler
User-agent: GPTBot
Disallow: /

## ChatGPT service
User-agent: ChatGPT-User
Disallow: /

## Common Crawl crawler
User-agent: CCBot
Disallow: /

## Bard/Gemini service
User-agent: Google-Extended
Disallow: /

## Perplexity crawler
User-agent: PerplexityBot
Disallow: /

Make sure there’s at least one blank line before and after each entry; the finished file should look roughly like the sketch below.
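If your default file contained only the stock lines Yoast creates, the top of the finished file should read roughly like this (your existing default lines may differ):

User-agent: *
Disallow:

## OpenAI crawler
User-agent: GPTBot
Disallow: /

The remaining four entries follow in the same pattern.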

Save your changes at the bottom of the editor. You’ll see a confirmation message at the top of the screen.

You can confirm it’s correct by opening a browser and adding /robots.txt to the end of your site’s URL (such as https://newspack.com/robots.txt). You should see the lines you added in the result.
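If you’d rather verify from a script than a browser, here is a minimal Python sketch; the URL below is the example from above, so substitute your own domain:

import urllib.request

# Example URL from above; replace with your own site's domain.
url = "https://newspack.com/robots.txt"

# Fetch the live robots.txt file.
with urllib.request.urlopen(url) as resp:
    body = resp.read().decode("utf-8")

# The crawler tokens this guide adds; each should appear in the file.
for agent in ["GPTBot", "ChatGPT-User", "CCBot", "Google-Extended", "PerplexityBot"]:
    print(agent, "found" if agent in body else "MISSING")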

As more crawlers become common, we’ll add them to the list here, and you can update your robots.txt file by adding new entries in the same pattern, as shown below.
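For example, to block a hypothetical new crawler whose user-agent token is ExampleBot (a placeholder name, not a real crawler), you would append another three-line entry:

## Hypothetical new crawler
User-agent: ExampleBot
Disallow: /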

For those with Yoast Premium

Yoast Premium makes this simple. Go to Yoast SEO > Settings > Advanced > Crawl optimization and set the toggles for the crawlers you want to block.