Generate robots.txt, llms.txt, and a per-bot AI policy in your browser. Allow or block GPTBot, ClaudeBot, PerplexityBot, Google-Extended and 8 more — no signup.
Files regenerate live as you edit.
~160 chars. Shown at the top of llms.txt.
Included in llms.txt so LLMs can route inquiries to you.
Paths to keep all crawlers out of (admin areas, search params, staging, etc.).
Tick to allow, untick to block.
# robots.txt for My Site
# Generated at https://softechinfra.com/tools/robots-llms-ai-policy-builder
User-agent: *
Disallow: /admin
Disallow: /api
Sitemap: https://example.com/sitemap.xml
# AI / LLM crawlers
# Policy generated at https://softechinfra.com/tools/robots-llms-ai-policy-builder
# OpenAI — GPTBot
User-agent: GPTBot
Allow: /
# Anthropic — ClaudeBot
User-agent: ClaudeBot
Allow: /
# Perplexity — PerplexityBot
User-agent: PerplexityBot
Allow: /
# Common Crawl — CCBot
User-agent: CCBot
Allow: /
# Google — Google-Extended
User-agent: Google-Extended
Allow: /
# Apple — Applebot-Extended
User-agent: Applebot-Extended
Allow: /
# Meta — meta-externalagent
User-agent: meta-externalagent
Allow: /
# ByteDance — Bytespider
User-agent: Bytespider
Allow: /
# Anthropic (legacy) — anthropic-ai
User-agent: anthropic-ai
Allow: /
# Cohere — cohere-ai
User-agent: cohere-ai
Allow: /
# Amazon — amazonbot
User-agent: amazonbot
Allow: /
# You.com — YouBot
User-agent: YouBot
Allow: /
robots.txt and llms.txt are voluntary standards: well-behaved bots respect them, but bad actors ignore them entirely. For hard access control use server-side rules (Cloudflare bot management, IP blocks, WAF). The bot catalog reflects the most active AI crawlers as of mid-2026 and is updated as new agents emerge.
Enter your site name, URL, a short description (~160 chars), sitemap URL, and a contact email. These populate both files.
Tick or untick each of the 12 AI crawlers. Defaults allow all (recommended for AI-search visibility). Use "Block all" if your concern is content protection.
Add the 5–15 most important pages on your site (about, services, key blog posts, pricing). LLMs use this list to answer questions about you accurately.
Use the Copy buttons on each tab. Save robots.txt and llms.txt at your site root. The AI-Bot Policy tab is a human-readable summary you can share with your team.
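The steps above can be sketched in a few lines of Python. This is a minimal illustration of how a per-bot allow/block choice maps onto robots.txt groups, not the builder's actual implementation. The function name `build_robots_txt` and its parameters are hypothetical.

```python
from typing import Dict, List


def build_robots_txt(
    site_name: str,
    disallow_paths: List[str],
    sitemap_url: str,
    bot_policy: Dict[str, bool],
) -> str:
    """Return robots.txt text: a wildcard group, a sitemap line, then one group per AI bot."""
    lines = [f"# robots.txt for {site_name}", "User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallow_paths]
    lines.append(f"Sitemap: {sitemap_url}")
    lines.append("# AI / LLM crawlers")
    for bot, allowed in bot_policy.items():
        lines.append(f"User-agent: {bot}")
        # Ticked bots get a site-wide Allow, unticked bots a site-wide Disallow.
        lines.append("Allow: /" if allowed else "Disallow: /")
    return "\n".join(lines) + "\n"


# Hypothetical inputs mirroring the form fields described above.
robots = build_robots_txt(
    site_name="My Site",
    disallow_paths=["/admin", "/api"],
    sitemap_url="https://example.com/sitemap.xml",
    bot_policy={"GPTBot": True, "ClaudeBot": True, "Bytespider": False},
)
print(robots)
```

Unticking a bot in the form corresponds to setting its entry to `False` here, which swaps `Allow: /` for `Disallow: /` in that bot's group.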
llms.txt is an emerging convention (spec at llmstxt.org): a Markdown file at your site root (e.g. https://yoursite.com/llms.txt) that gives LLMs a curated index of your most important pages plus a short summary. It is often described as robots.txt for LLMs, but instead of telling crawlers what NOT to do, it tells them what to read first so they cite you accurately. You don't strictly need it yet, since most AI crawlers still read your HTML pages directly, but adopting early is cheap insurance for AI-search visibility, the same way XML sitemaps were optional for years before becoming standard.
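As a rough sketch of what such a file can look like, the snippet below assembles a small llms.txt in the shape the llmstxt.org proposal describes: an H1 title, a blockquote summary, then a section of Markdown links. The function name `build_llms_txt`, the "Contact:" line, and the "Key pages" heading are assumptions for illustration. The exact layout this builder emits may differ.

```python
from typing import List, Tuple


def build_llms_txt(
    site_name: str,
    summary: str,
    contact_email: str,
    key_pages: List[Tuple[str, str, str]],
) -> str:
    """Return a Markdown llms.txt: H1 title, blockquote summary, then a link list."""
    lines = [
        f"# {site_name}",
        "",
        f"> {summary}",
        "",
        f"Contact: {contact_email}",  # assumed placement of the contact email
        "",
        "## Key pages",  # assumed section heading
        "",
    ]
    for title, url, note in key_pages:
        lines.append(f"- [{title}]({url}): {note}")
    return "\n".join(lines) + "\n"


# Hypothetical site details.
llms = build_llms_txt(
    site_name="My Site",
    summary="My Site sells example widgets and publishes guides on using them.",
    contact_email="hello@example.com",
    key_pages=[
        ("About", "https://example.com/about", "Who we are"),
        ("Pricing", "https://example.com/pricing", "Plans and prices"),
    ],
)
print(llms)
```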
Allowing AI crawlers is how you become a citable source inside ChatGPT, Claude, and Perplexity answers. When a user asks an AI assistant a question your content can answer, the assistant pulls from sites it has crawled and cites them by name with a clickable link. If you block these bots you become invisible in AI answers — even if you rank #1 in Google. For most marketing sites, the citation traffic is worth more than any copyright concern.
Blocking dedicated AI bots (GPTBot, ClaudeBot, CCBot, etc.) does NOT affect Google Search, Bing, or any traditional SEO ranking — those use different user-agents (Googlebot, Bingbot) which you should never block. Special case: blocking Google-Extended only opts you out of Gemini training, NOT Google Search. Same for Applebot-Extended (opts out of Apple Intelligence training, doesn't touch Spotlight or Siri). The only SEO-adjacent risk is losing visibility inside AI search engines (ChatGPT search, Perplexity, You.com).
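One way to sanity-check this is to run a candidate file through Python's standard-library `urllib.robotparser`. The sketch below uses a made-up robots.txt that blocks GPTBot and Google-Extended, and then asks which user-agents may fetch a sample URL. Real crawlers use their own parsers, so treat this as an approximation of their behaviour rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: two AI bots blocked site-wide, everyone else only kept out of /admin.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/blog/post"
results = {
    agent: parser.can_fetch(agent, url)
    for agent in ("GPTBot", "Google-Extended", "Googlebot", "Bingbot")
}
for agent, allowed in results.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

In this example Googlebot and Bingbot fall through to the wildcard group, so the AI-bot blocks leave them unaffected.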
No. robots.txt is a voluntary standard (RFC 9309). Well-behaved crawlers from OpenAI, Anthropic, Google, Microsoft, and most reputable AI companies do respect it. Bad actors — scrapers, training datasets that anonymise their fetches, or unbranded crawlers — ignore it entirely. For hard enforcement use server-side rules: Cloudflare bot management, WAF rules, IP allowlisting, or rate-limiting at the edge. Treat robots.txt as a polite request, not a security control.
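To illustrate the difference between a polite request and enforcement, here is a minimal sketch of a server-side check written as WSGI middleware. The names `block_ai_bots`, `BLOCKED_AGENT_TOKENS`, and `hello_app` are hypothetical. A User-Agent check like this only stops crawlers that identify themselves honestly, which is why the answer above points to WAF rules, IP controls, and rate-limiting for determined scrapers.

```python
from typing import Callable, Iterable

# Hypothetical block list: lowercase substrings matched against the User-Agent header.
BLOCKED_AGENT_TOKENS = ("bytespider", "ccbot")


def block_ai_bots(app: Callable) -> Callable:
    """Wrap a WSGI app and answer 403 when the User-Agent contains a blocked token."""

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in BLOCKED_AGENT_TOKENS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else is passed through to the wrapped application unchanged.
        return app(environ, start_response)

    return middleware


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


protected_app = block_ai_bots(hello_app)
```

Unlike a robots.txt rule, the 403 here is applied by the server, so it does not depend on the crawler choosing to comply.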
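Yes. The robots.txt spec supports per-user-agent Disallow paths. The current builder generates a site-wide Allow / Disallow per bot for simplicity. Per-bot path-level rules (e.g. block GPTBot from /pricing only) are on the roadmap. If you need them now, hand-edit the generated file: under the GPTBot block, replace the site-wide "Allow: /" (or "Disallow: /") line with one or more path-specific lines such as "Disallow: /pricing". Because a crawler follows only the most specific group that matches it, also repeat under that block any site-wide Disallow paths (such as /admin) that you still want it to obey.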
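A hand-edited file can be checked before deploying it. The sketch below assumes a GPTBot block that disallows /pricing and repeats the wildcard group's /admin and /api paths, since a bot with its own group does not inherit rules from `User-agent: *`. It uses Python's standard-library `urllib.robotparser`, and the file contents are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical hand-edited file with path-level rules for GPTBot only.
HAND_EDITED = """\
User-agent: *
Disallow: /admin
Disallow: /api

User-agent: GPTBot
Disallow: /pricing
Disallow: /admin
Disallow: /api
"""

parser = RobotFileParser()
parser.parse(HAND_EDITED.splitlines())

base = "https://example.com"
gptbot = {
    path: parser.can_fetch("GPTBot", base + path)
    for path in ("/pricing", "/blog/hello", "/admin")
}
claudebot_pricing = parser.can_fetch("ClaudeBot", base + "/pricing")

for path, allowed in gptbot.items():
    print(f"GPTBot {path}: {'allowed' if allowed else 'blocked'}")
print(f"ClaudeBot /pricing: {'allowed' if claudebot_pricing else 'blocked'}")
```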
Both files go at your site root, served at the URL paths /robots.txt and /llms.txt with content-type text/plain. On Next.js you can put them in /public, return them from app/robots.ts and app/llms.txt/route.ts, or generate them at build time. On WordPress, most SEO plugins (Yoast, Rank Math, AIOSEO) have a robots.txt editor. On a static site, just commit the file. After major robots.txt changes, request a recrawl of the file in Google Search Console so the new rules are picked up sooner.
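For stacks not listed above, the requirement boils down to two fixed paths and a text/plain content type. The following is a minimal, framework-free sketch as a WSGI app. `FILES` and `root_files_app` are hypothetical names, and a real deployment would more likely read the generated files from disk or let the web server serve them statically.

```python
from typing import Callable, Iterable

# Hypothetical file contents; in practice, load the generated files instead.
FILES = {
    "/robots.txt": "User-agent: *\nDisallow: /admin\n",
    "/llms.txt": "# My Site\n\n> Short summary of the site.\n",
}


def root_files_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Serve /robots.txt and /llms.txt as UTF-8 text/plain; 404 for anything else."""
    body = FILES.get(environ.get("PATH_INFO", ""))
    if body is None:
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not found\n"]
    payload = body.encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(payload))),
        ],
    )
    return [payload]
```

For a quick local check, the app can be passed to `wsgiref.simple_server.make_server` from the standard library and requested with a browser or curl.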
Generating the files is step one. Our SEO team optimises sites for AI search engines — schema markup, llms.txt curation, citable content structure, and the technical SEO that wins both Google and AI answers. The same playbook we use on our in-house products PenLeap and TalkDrill.
Talk to our SEO team
We audit your site for AI citation readiness, generate llms.txt + robots.txt + schema markup, and ship the structured-data plumbing that gets you cited in ChatGPT and Perplexity answers.