Layer 1 - Infrastructure

robots.txt: the file that tells AI crawlers what they may access on your site.

Most businesses have never looked at their robots.txt file. Many are unintentionally blocking GPTBot, PerplexityBot, Google-Extended, ClaudeBot, or related agents - or publishing no clear crawl policy at all. A crawler that is blocked cannot read those pages, so it has nothing to evaluate or cite. VERIS is an AI search infrastructure specialist for independent service businesses.

Get Your Free Audit ->

The audit identifies whether this service is needed for your site.

The Problem

You may be blocking ChatGPT without knowing it.

robots.txt is a plain text file at yourdomain.com/robots.txt. It tells web crawlers which parts of a site they are allowed to access. Most business websites have a robots.txt created automatically by their CMS at launch, and many of those defaults cause one of two problems. Some block every crawler with Disallow: / under User-agent: *. Others predate the AI crawlers, so those bots are never named and silently inherit whatever restrictions the wildcard rule carries.

OpenAI documents multiple agents with different roles: OAI-SearchBot for search discovery, GPTBot for training, and ChatGPT-User for user-triggered fetches. Anthropic and Perplexity document similar agent families. Each agent follows its own User-agent group (or the wildcard group if it is not named), so allowing GPTBot does not also allow OAI-SearchBot. If any of these agents is disallowed, access becomes inconsistent and the business is harder to crawl, verify, or cite.

Public search discovery is a separate but related issue. Google and Bing index coverage are useful proxy checks because AI systems often cite pages that are already discoverable in major web indexes. That is not a guarantee of inclusion in any one AI product, but total absence from public indexes is a warning sign that should be fixed.

Removing crawler blocks and publishing a clean sitemap improves discovery signals. It does not guarantee placement in ChatGPT, Gemini, Claude, or Perplexity on a fixed timeline.

What Gets Implemented

Full AI crawler access configuration.

1
robots.txt full rewrite

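Directives for each published agent reviewed and added explicitly where appropriate: OAI-SearchBot, GPTBot, ChatGPT-User, PerplexityBot, Google-Extended, ClaudeBot, Googlebot, and Bingbot. Google-Extended is a control token honored by Google's existing crawlers rather than a separate crawler. Conflicts between the wildcard group and agent-specific groups corrected.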

2
Sitemap.xml validation and submission

sitemap.xml checked for validity, completeness, and correct lastmod values. Submitted to Google Search Console and Bing Webmaster Tools.

3
Canonical tag audit

3-5 key pages checked for correct self-referencing canonical tags. HTTP/HTTPS and www/non-www consistency verified.

4
Public discovery checks

Google and Bing site: searches used as public discovery proxies. If coverage is weak, sitemap submission and crawl follow-up steps are documented.

5
Redirect chain check

HTTP -> HTTPS redirect verified as single-hop. Chains of 2+ hops flagged.

6
Accidental noindex correction

HTML and response headers checked for unintentional noindex directives on important pages. Corrected if found.
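
Steps 3 and 6 both come down to reading a page's <head> and its response headers. The sketch below is a minimal standard-library version, assuming the page HTML and any X-Robots-Tag header values have already been fetched. The names HeadAudit, blocks_indexing, and audit_page, the issue wording, and the choice to treat only the "robots" and "googlebot" meta names as directives are illustrative assumptions, not VERIS tooling.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumption: only these meta names are treated as robots directives here.
ROBOTS_META_NAMES = {"robots", "googlebot"}


class HeadAudit(HTMLParser):
    """Collects canonical hrefs and robots meta content from an HTML page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []
        self.robots_meta = []

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonicals.append(a.get("href", "").strip())
        elif tag == "meta" and a.get("name", "").lower() in ROBOTS_META_NAMES:
            self.robots_meta.append(a.get("content", ""))


def blocks_indexing(directives):
    """True if a comma-separated directive list contains noindex or none."""
    # Drops an optional "agent:" prefix, e.g. "googlebot: noindex".
    tokens = {t.split(":")[-1].strip().lower() for t in directives.split(",")}
    return bool(tokens & {"noindex", "none"})


def audit_page(page_url, html, x_robots_tag=()):
    """Return canonical and noindex problems found for one fetched page."""
    parser = HeadAudit()
    parser.feed(html)
    parser.close()
    issues = []

    hrefs = set(parser.canonicals)
    if not hrefs:
        issues.append("no canonical tag")
    elif len(hrefs) > 1:
        issues.append(f"conflicting canonical tags: {sorted(hrefs)}")
    else:
        page, canon = urlsplit(page_url), urlsplit(hrefs.pop())
        if canon.scheme != "https":
            issues.append("canonical does not use https")
        if canon.netloc.lower() != page.netloc.lower():
            issues.append("canonical host differs (www vs non-www?)")
        if (canon.path or "/") != (page.path or "/"):
            issues.append("canonical is not self-referencing")

    for value in parser.robots_meta:
        if blocks_indexing(value):
            issues.append(f"meta robots noindex: {value}")
    for value in x_robots_tag:
        if blocks_indexing(value):
            issues.append(f"X-Robots-Tag noindex: {value}")
    return issues
```

Paths are compared exactly, so /services and /services/ count as different pages; whether that matters depends on how the site resolves trailing slashes.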

ROBOTS.TXT EXAMPLE
# VERIS - AI Crawler Configuration
# Updated: 2026-03-31

User-agent: *
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
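
The Sitemap: line at the end of this example points crawlers at the file that step 2 validates. The sketch below covers the lastmod part of that check, assuming a standard <urlset> sitemap (not a sitemap index) already downloaded as text; sitemap_problems and parse_lastmod are hypothetical helper names. The sitemaps.org protocol specifies W3C Datetime for lastmod, and this sketch deliberately accepts only its date-only and complete date-time forms.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Date-only or complete date-time with timezone: a strict subset of W3C Datetime.
W3C_DATETIME = re.compile(
    r"\d{4}-\d{2}-\d{2}"                # YYYY-MM-DD
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?"   # optional Thh:mm[:ss[.s]]
    r"(Z|[+-]\d{2}:\d{2}))?"            # a time must carry a timezone
)


def parse_lastmod(value):
    """Return the calendar date of a lastmod value, or None if it is invalid."""
    if not W3C_DATETIME.fullmatch(value):
        return None
    try:
        return date.fromisoformat(value[:10])  # rejects impossible dates like 02-30
    except ValueError:
        return None


def sitemap_problems(xml_text):
    """List missing <loc> entries and invalid or future <lastmod> values."""
    root = ET.fromstring(xml_text)
    problems = []
    for i, url in enumerate(root.findall("sm:url", NS), start=1):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        label = loc or f"<url> entry {i}"
        if not loc:
            problems.append(f"{label}: missing <loc>")
        if not lastmod:
            continue  # lastmod is optional in the sitemap protocol
        parsed = parse_lastmod(lastmod)
        if parsed is None:
            problems.append(f"{label}: invalid <lastmod> {lastmod!r}")
        elif parsed > date.today():
            problems.append(f"{label}: <lastmod> is in the future")
    return problems
```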

Scope Boundary

What this service does not include.

  • Content production or new pages to be indexed
  • Custom sitemap architecture (standard sitemap generation is included)
  • Google Search Console account creation
  • Bing Webmaster Tools account creation (setup is guided; the account requires the client's email)
  • Paid search indexation services
  • Any changes to page content, design, or structure

How To Verify This

Check your robots.txt in 10 seconds.

Open your browser and navigate to yourdomain.com/robots.txt. Look for User-agent lines naming OAI-SearchBot, GPTBot, ChatGPT-User, PerplexityBot, ClaudeBot, and any Google-Extended policy you choose to publish. If "Disallow: /" appears in the group for any of those agents, that agent is blocked from the whole site. If an agent is not named at all, it falls back to the User-agent: * group, so check that group too: "Disallow: /" there blocks every crawler that is not named elsewhere.
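
The same check can be scripted. Below is a minimal sketch using Python's standard-library urllib.robotparser; the crawler_access helper and the AGENTS list (copied from this page's checklist) are illustrative, not VERIS tooling. One caveat: Python's parser applies rules in file order and the first match wins, while RFC 9309 uses the most specific (longest) matching rule. For whole-site files built from Allow: / and Disallow: /, like the example on this page, both give the same answer; files with many path-level rules can diverge.

```python
from urllib.robotparser import RobotFileParser

# Agent tokens from this page's checklist. Confirm current names against each
# vendor's crawler documentation before relying on the result.
AGENTS = ["OAI-SearchBot", "GPTBot", "ChatGPT-User", "PerplexityBot",
          "Google-Extended", "ClaudeBot", "Googlebot", "Bingbot"]


def crawler_access(robots_txt, path="/"):
    """Map each agent to True/False: may it fetch `path` under this robots.txt?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in AGENTS}


if __name__ == "__main__":
    # A file that only names Googlebot: every other agent falls back to "*".
    sample = "User-agent: *\nDisallow: /\n\nUser-agent: Googlebot\nAllow: /\n"
    for agent, allowed in crawler_access(sample).items():
        print(f"{agent:16} {'allowed' if allowed else 'BLOCKED'}")
```

To check a live site rather than a pasted string, RobotFileParser also provides set_url() and read(), which download and parse the file in one step.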

Verification Tool
Direct browser URL check
https://yourdomain.com/robots.txt

Step 1: Open the URL. Step 2: Find OAI-SearchBot, GPTBot, PerplexityBot, ClaudeBot, and any Google-Extended policy. Look for Allow: / (not Disallow: /), and check the User-agent: * group for any agent that is not named.

Google or Bing (site: search)
https://www.google.com or https://www.bing.com

Search: site:yourdomain.com. Zero results suggests a crawl or indexation problem worth fixing. It is a public discovery proxy, not a standalone guarantee of AI visibility.
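
Browsers follow redirects silently, so the hop count in step 5 is easier to see from a script. This is a minimal sketch, assuming Python's standard library and a server that answers HEAD the same way it answers GET (some do not, in which case switch the method). head_once and redirect_chain are hypothetical helper names; the fetch function is a parameter so the chain logic can be exercised without network access.

```python
import http.client
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = {301, 302, 303, 307, 308}
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def head_once(url: str) -> Tuple[int, Optional[str]]:
    """Send one HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    conn = conn_class(parts.netloc, timeout=10)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirect_chain(url: str, fetch: Fetch = head_once, max_hops: int = 10) -> List[str]:
    """Follow 3xx responses from url; return every URL visited, start included."""
    chain = [url]
    for _ in range(max_hops):  # cap guards against redirect loops
        status, location = fetch(chain[-1])
        if status not in REDIRECT_STATUSES or not location:
            break
        chain.append(urljoin(chain[-1], location))  # Location may be relative
    return chain
```

len(redirect_chain("http://yourdomain.com/")) - 1 gives the hop count; anything above 1 is the kind of chain step 5 flags.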

Find out which AI crawlers can and cannot access your site.

The audit checks your robots.txt against published AI and search agents and reviews your public discovery signals in major web indexes.

Quick route planner

Check which service path is probably right before you buy anything.

The service pages explain individual parts of the system. This planner helps confirm the likely first layer and send that context into the audit.

This planner does not promise commercial outcomes. It routes the site into the most likely VERIS starting path based on business type, site complexity, and goal.

How this works

This is the optional routing step. If you continue from here, VERIS carries the URL and planner context into the full audit form below so you do not have to start over.

First layers
Layer 1 first
Typical next step
Start with the audit, then confirm pricing and service order.
  • Likely implementation tier: Full Setup.
  • Single-location sites are usually priced by page count and CMS complexity.
  • The first priority is usually making the business readable and crawlable.