Why you should allow AI crawlers

AI answers are the new search results
ChatGPT, Claude, Gemini, and Perplexity answer questions directly, often without users ever clicking a link. If AI crawlers can't access your content, it won't be included in those answers. Ever.
No access means no citation
AI systems can only cite sources they've been able to read. Blocking GPTBot or ClaudeBot is the equivalent of telling ChatGPT and Claude, "Don't mention my site," even if your content is excellent.
Organic AI visibility compounds over time
The earlier AI systems index your content, the more likely they are to include it in answers. Sites that block AI crawlers now are falling behind, and catching up will take time.
You stay in control
Allowing AI crawlers doesn't mean giving up control. You decide which paths are accessible and which aren't. You can allow your public content while still blocking admin areas, private pages, or anything else.
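That selective control is expressed directly in robots.txt: one group per crawler, with path-level Disallow rules. A minimal sketch (the paths here are placeholders for your own private areas):

```txt
# Allow OpenAI's crawler everywhere except private areas
User-agent: GPTBot
Disallow: /admin/
Disallow: /private/

# Same policy for Anthropic's retrieval crawler
User-agent: ClaudeBot
Disallow: /admin/
Disallow: /private/
```

Anything not listed under Disallow stays accessible, so your public pages remain readable and citable while the blocked paths are off limits.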

Who are these AI bots and what do they do?

Each major AI company operates its own crawler that visits websites to collect content. This content is used for two main purposes: training AI models and powering real-time retrieval in AI search features. Here is who they are:

GPTBot - OpenAI
The crawler behind ChatGPT and OpenAI's models. Used for both training data collection and real-time web browsing in ChatGPT. Blocking GPTBot means your content won't be cited in ChatGPT responses.
ClaudeBot - Anthropic
The primary crawler for Claude. Used for real-time retrieval when Claude accesses the web during conversations. Blocking it prevents Claude from reading or citing your content.
Anthropic-AI - Anthropic
A secondary Anthropic crawler used primarily for AI model training. It is separate from ClaudeBot; allow both if you want full Anthropic coverage.
PerplexityBot - Perplexity AI
Powers Perplexity's AI search engine, which provides direct answers with citations. Perplexity is one of the fastest-growing AI search platforms, and being cited there drives real referral traffic.
Googlebot - Google
Powers both traditional Google search and Google's AI Overviews (formerly SGE). Blocking Googlebot affects both your organic rankings and your visibility in AI-generated search summaries.
Meta-ExternalAgent - Meta
Meta's crawler for AI features across Facebook, Instagram, and the Llama model family. Increasingly relevant as Meta integrates AI assistants across its platforms.
Bytespider - ByteDance
ByteDance's crawler, used for AI features across TikTok and its broader ecosystem. As TikTok expands its AI search and content recommendation capabilities, Bytespider is becoming increasingly active in content indexing across the web.
cohere-ai - Cohere
Cohere's crawler, used to power its enterprise AI platform, which businesses widely adopt for internal search, RAG (Retrieval-Augmented Generation), and AI-powered workflows. Allowing it means your content can be included in enterprise AI applications built on Cohere's models.
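If your goal is full AI visibility, the crawlers above map to an explicit allowlist like the following sketch. Allow: / simply makes the permission explicit; having no rules at all for a crawler also permits access:

```txt
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Googlebot
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

User-agent: Bytespider
Allow: /

User-agent: cohere-ai
Allow: /
```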

How sites accidentally block AI crawlers

01
Blocking all unknown bots as a "security measure"
A common pattern is to use User-agent: * with broad Disallow rules to block anything that isn't a known search engine. This was reasonable practice before AI crawlers existed, but now it silently blocks GPTBot, ClaudeBot, and every other AI crawler that wasn't on the original allowlist.
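A typical shape of this pattern, with the unintended effect noted in comments:

```txt
# Intended as a security measure: only "known good" bots are allowed
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Everything else is blocked, which now silently includes
# GPTBot, ClaudeBot, PerplexityBot, and every other AI crawler
User-agent: *
Disallow: /
```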
02
Copy-pasting a "block all AI" robots.txt from 2023
In 2023, when concerns about AI training data were at their peak, many publishers copied robots.txt templates that blocked all AI crawlers. Two years later, the landscape has changed but the robots.txt hasn't. Many sites are still blocking crawlers they've long since forgotten about.
03
Blocking training crawlers but forgetting retrieval crawlers
Some site owners distinguish between AI training (which they want to block) and real-time retrieval (which they want to allow). But they block Anthropic-AI without realizing that ClaudeBot is a separate crawler or vice versa. The result is unintended partial blocking.
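If the goal is to block training while keeping retrieval, both Anthropic user agents need their own explicit rules. A sketch of that intent:

```txt
# Block the training crawler
User-agent: anthropic-ai
Disallow: /

# Explicitly allow the retrieval crawler so Claude can still
# read and cite your content during conversations
User-agent: ClaudeBot
Allow: /
```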
04
Never having a robots.txt at all
Sites with no robots.txt default to "allow everything," which is actually fine for AI crawlers. But without explicit rules, you lose the ability to block specific paths, and you miss the opportunity to include your sitemap URL, which helps crawlers discover all your pages efficiently.
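Even a minimal robots.txt recovers both of those abilities (path blocking and sitemap discovery). A sketch with placeholder values:

```txt
# Allow all crawlers by default, minus one private path
User-agent: *
Disallow: /admin/

# Help crawlers discover every page (replace with your own URL)
Sitemap: https://example.com/sitemap.xml
```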

Check if your robots.txt is blocking AI crawlers

Run a free LLM visibility analysis: the Analyzer checks your robots.txt live and tells you exactly which AI crawlers are blocked.

Run a free analysis ↗
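For a quick local check, Python's standard-library robotparser can evaluate a robots.txt against specific AI user agents. A minimal sketch; the robots.txt contents below are hypothetical:

```python
from urllib import robotparser

# Hypothetical robots.txt: blocks GPTBot entirely,
# blocks /admin/ for everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# GPTBot is blocked site-wide; other crawlers only lose /admin/.
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))     # False
print(rp.can_fetch("ClaudeBot", "https://example.com/blog/post"))  # True
print(rp.can_fetch("ClaudeBot", "https://example.com/admin/"))     # False
```

In production you would fetch the live file with RobotFileParser(url) plus read() instead of parsing a string.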

Common questions

What happens if I block AI crawlers?
Your content cannot be indexed by AI systems for training or real-time retrieval. This means your site will not be cited or referenced in AI-generated answers even if it ranks well on Google.
Does allowing AI crawlers affect my Google ranking?
No. AI crawlers like GPTBot and ClaudeBot are completely separate from Googlebot. Allowing or blocking them has no effect on your Google search ranking.
I already have a robots.txt. Can I just add the AI rules?
Yes, and that's the recommended approach. Don't replace your existing robots.txt entirely. Instead, copy the AI crawler rules from the generated file and add them to your existing one. Be careful not to duplicate or conflict with existing User-agent rules.
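Merging usually means appending the AI-specific groups after your existing rules, keeping each User-agent group intact. A sketch with placeholder rules:

```txt
# --- existing rules (unchanged) ---
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml

# --- AI crawler rules appended from the generated file ---
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /
```

One subtlety: a crawler that matches a named group ignores the User-agent: * group entirely, so if a path block like Disallow: /admin/ should also apply to the AI crawlers, repeat it inside each of their groups.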
Do AI crawlers actually respect robots.txt?
Yes. All major AI companies (OpenAI, Anthropic, Google, Perplexity) have publicly committed to respecting robots.txt directives, and each documents this in its published crawler documentation. Keep in mind that robots.txt is a voluntary convention, not a technical enforcement mechanism.
How does this affect my LLM Visibility Score?
The hey-eye Analyzer checks your robots.txt live at analysis time. If AI crawlers are blocked, it applies a −5 penalty in the Authority & Trust pillar. Allowing them earns +5 points.