Why you should allow AI crawlers

AI answers are the new search results
ChatGPT, Claude, Gemini, and Perplexity answer questions directly, often without users ever clicking a link. If AI crawlers can't access your content, it won't be included in those answers. Ever.
No access means no citation
AI systems can only cite sources they've been able to read. Blocking GPTBot or ClaudeBot is the equivalent of telling ChatGPT and Claude, "Don't mention my site," even if your content is excellent.
Organic AI visibility compounds over time
The earlier AI systems index your content, the more likely they are to include it in answers. Sites that block AI crawlers now are falling behind, and catching up will take time.
You stay in control
Allowing AI crawlers doesn't mean giving up control. You decide which paths are accessible and which aren't. You can allow your public content while still blocking admin areas, private pages, or anything else.
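That selective control is expressed directly in robots.txt: one group per crawler, with path-level Disallow rules. A minimal sketch (the paths here are placeholders for your own private areas):

```txt
# Allow OpenAI's crawler everywhere except private areas
User-agent: GPTBot
Disallow: /admin/
Disallow: /private/

# Same policy for Anthropic's retrieval crawler
User-agent: ClaudeBot
Disallow: /admin/
Disallow: /private/
```

Anything not listed under Disallow stays accessible, so your public pages remain readable and citable while the blocked paths are off limits.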

Who are these AI bots and what do they do?

Each major AI company operates its own crawler that visits websites to collect content. This content is used for two main purposes: training AI models and powering real-time retrieval in AI search features. Here is who they are:

GPTBot - OpenAI
The crawler behind ChatGPT and OpenAI's models. Used for both training data collection and real-time web browsing in ChatGPT. Blocking GPTBot means your content won't be cited in ChatGPT responses.
ClaudeBot - Anthropic
The primary crawler for Claude. Used for real-time retrieval when Claude accesses the web during conversations. Blocking it prevents Claude from reading or citing your content.
Anthropic-AI - Anthropic
A secondary Anthropic crawler used primarily for AI model training. It is separate from ClaudeBot; allow both if you want full Anthropic coverage.
PerplexityBot - Perplexity AI
Powers Perplexity's AI search engine, which provides direct answers with citations. Perplexity is one of the fastest-growing AI search platforms, and being cited there drives real referral traffic.
Googlebot - Google
Powers both traditional Google search and Google's AI Overviews (formerly SGE). Blocking Googlebot affects both your organic rankings and your visibility in AI-generated search summaries.
Meta-ExternalAgent - Meta
Meta's crawler for AI features across Facebook, Instagram, and the Llama model family. Increasingly relevant as Meta integrates AI assistants across its platforms.
Bytespider - ByteDance
ByteDance's crawler, used for AI features across TikTok and its broader ecosystem. As TikTok expands its AI search and content recommendation capabilities, Bytespider is becoming increasingly active in content indexing across the web.
cohere-ai - Cohere
Cohere's crawler, used to power its enterprise AI platform, which businesses widely adopt for internal search, RAG (Retrieval-Augmented Generation), and AI-powered workflows. Allowing it means your content can be included in enterprise AI applications built on Cohere's models.
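If your goal is full AI visibility, the crawlers above map to an explicit allowlist like the following sketch. Allow: / simply makes the permission explicit; having no rules at all for a crawler also permits access:

```txt
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Googlebot
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

User-agent: Bytespider
Allow: /

User-agent: cohere-ai
Allow: /
```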

How sites accidentally block AI crawlers

01
Blocking all unknown bots as a "security measure"
A common pattern is to use User-agent: * with broad Disallow rules to block anything that isn't a known search engine. This was reasonable practice before AI crawlers existed, but now it silently blocks GPTBot, ClaudeBot, and every other AI crawler that wasn't on the original allowlist.
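A typical shape of this pattern, with the unintended effect noted in comments:

```txt
# Intended as a security measure: only "known good" bots are allowed
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Everything else is blocked, which now silently includes
# GPTBot, ClaudeBot, PerplexityBot, and every other AI crawler
User-agent: *
Disallow: /
```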
02
Copy-pasting a "block all AI" robots.txt from 2023
In 2023, when concerns about AI training data were at their peak, many publishers copied robots.txt templates that blocked all AI crawlers. Two years later, the landscape has changed but the robots.txt hasn't. Many sites are still blocking crawlers they've long since forgotten about.
03
Blocking training crawlers but forgetting retrieval crawlers
Some site owners distinguish between AI training (which they want to block) and real-time retrieval (which they want to allow). But they block Anthropic-AI without realizing that ClaudeBot is a separate crawler or vice versa. The result is unintended partial blocking.
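If the goal is to block training while keeping retrieval, both Anthropic user agents need their own explicit rules. A sketch of that intent:

```txt
# Block the training crawler
User-agent: anthropic-ai
Disallow: /

# Explicitly allow the retrieval crawler so Claude can still
# read and cite your content during conversations
User-agent: ClaudeBot
Allow: /
```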
04
Never having a robots.txt at all
Sites with no robots.txt default to "allow everything," which is actually fine for AI crawlers. But without explicit rules, you lose the ability to block specific paths, and you miss the opportunity to include your sitemap URL, which helps crawlers discover all your pages efficiently.
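Even a minimal robots.txt recovers both of those abilities (path blocking and sitemap discovery). A sketch with placeholder values:

```txt
# Allow all crawlers by default, minus one private path
User-agent: *
Disallow: /admin/

# Help crawlers discover every page (replace with your own URL)
Sitemap: https://example.com/sitemap.xml
```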

Check if your robots.txt is blocking AI crawlers

Run a free LLM visibility analysis: the Analyzer checks your robots.txt live and tells you exactly which AI crawlers are blocked.

Run a free analysis ↗
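For a quick local check, Python's standard-library robotparser can evaluate a robots.txt against specific AI user agents. A minimal sketch; the robots.txt contents below are hypothetical:

```python
from urllib import robotparser

# Hypothetical robots.txt: blocks GPTBot entirely,
# blocks /admin/ for everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# GPTBot is blocked site-wide; other crawlers only lose /admin/.
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))     # False
print(rp.can_fetch("ClaudeBot", "https://example.com/blog/post"))  # True
print(rp.can_fetch("ClaudeBot", "https://example.com/admin/"))     # False
```

In production you would fetch the live file with RobotFileParser(url) plus read() instead of parsing a string.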

Common questions

What happens if I block AI crawlers?
Your content cannot be indexed by AI systems for training or real-time retrieval. This means your site will not be cited or referenced in AI-generated answers even if it ranks well on Google.
Does allowing AI crawlers affect my Google ranking?
No. AI crawlers like GPTBot and ClaudeBot are completely separate from Googlebot. Allowing or blocking them has no effect on your Google search ranking.
I already have a robots.txt. Can I just add the AI rules?
Yes, and that's the recommended approach. Don't replace your existing robots.txt entirely. Instead, copy the AI crawler rules from the generated file and add them to your existing one. Be careful not to duplicate or conflict with existing User-agent rules.
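Merging usually means appending the AI-specific groups after your existing rules, keeping each User-agent group intact. A sketch with placeholder rules:

```txt
# --- existing rules (unchanged) ---
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml

# --- AI crawler rules appended from the generated file ---
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /
```

One subtlety: a crawler that matches a named group ignores the User-agent: * group entirely, so if a path block like Disallow: /admin/ should also apply to the AI crawlers, repeat it inside each of their groups.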
Do AI crawlers actually respect robots.txt?
Yes. All major AI companies (OpenAI, Anthropic, Google, Perplexity) have publicly committed to respecting robots.txt directives, and each documents this in its published crawler documentation. Keep in mind that robots.txt is a voluntary convention, not a technical enforcement mechanism.
How does this affect my LLM Visibility Score?
The hey-eye Analyzer checks your robots.txt live at analysis time. If AI crawlers are blocked, it applies a −5 penalty in the Authority & Trust pillar. Allowing them earns +5 points.