robots.txt for AI Agents: Complete Guide
Learn how to read, respect, and work with robots.txt when building AI agents that access the web.
Marcus Chen
Founder & CEO

robots.txt is the web's standard way of telling bots which parts of a site they may and may not access, formalized as the Robots Exclusion Protocol in RFC 9309. Understanding it is essential for building ethical AI agents.
What is robots.txt?
robots.txt is a plain-text file served from the root of a domain (e.g., https://example.com/robots.txt) that gives crawling instructions to bots.
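Because it lives at a well-known path, you can inspect any site's file with a single request. A minimal sketch using the built-in fetch (example.com is a placeholder):

```typescript
// Fetch a site's robots.txt; it is just plain text at a fixed, well-known path.
const res = await fetch('https://example.com/robots.txt');
console.log(res.ok ? await res.text() : `No robots.txt (HTTP ${res.status})`);
```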
Basic Syntax
```
# Allow all bots
User-agent: *
Allow: /

# Block all bots
User-agent: *
Disallow: /

# Block specific paths
User-agent: *
Disallow: /admin/
Disallow: /private/

# Allow specific bot
User-agent: Tryb-Agent
Allow: /
```
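To see how a parser interprets these rules, here is a small sketch that feeds the "Block specific paths" example to robots-parser, the library used later in this guide; the URLs checked are placeholders:

```typescript
import robotsParser from 'robots-parser';

// Parse the "Block specific paths" example from above.
const robots = robotsParser('https://example.com/robots.txt', [
  'User-agent: *',
  'Disallow: /admin/',
  'Disallow: /private/',
].join('\n'));

console.log(robots.isAllowed('https://example.com/blog/post', 'Tryb-Agent'));   // true
console.log(robots.isAllowed('https://example.com/admin/users', 'Tryb-Agent')); // false
```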
Common Directives
| Directive | Meaning |
|---|---|
| User-agent: * | Applies to all bots |
| Disallow: / | Block entire site |
| Disallow: /path/ | Block specific path |
| Allow: /path/ | Explicitly allow path |
| Crawl-delay: 10 | Wait 10 s between requests (non-standard, but honored by many crawlers) |
| Sitemap: url | Location of sitemap |
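Crawl-delay and Sitemap are metadata rather than allow/deny rules. As a small sketch, they can be read through robots-parser's getCrawlDelay() and getSitemaps() helpers (the same library used in the example below):

```typescript
import robotsParser from 'robots-parser';

const robots = robotsParser('https://example.com/robots.txt', [
  'User-agent: *',
  'Crawl-delay: 10',
  'Sitemap: https://example.com/sitemap.xml',
].join('\n'));

console.log(robots.getCrawlDelay('Tryb-Agent')); // 10
console.log(robots.getSitemaps());               // ['https://example.com/sitemap.xml']
```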
AI Agent Considerations
AI agents should:
- Check robots.txt before scraping any domain
- Use a descriptive User-agent string
- Respect Crawl-delay directives
- Cache robots.txt (refresh every 24h); a caching sketch follows the example below
A basic check before fetching any URL:

```typescript
import robotsParser from 'robots-parser';

async function canScrape(url: string): Promise<boolean> {
  const domain = new URL(url).origin;
  const robotsUrl = `${domain}/robots.txt`;

  const response = await fetch(robotsUrl);
  // A missing robots.txt (404) or error page means crawling is not restricted.
  if (!response.ok) return true;

  const robotsTxt = await response.text();
  const robots = robotsParser(robotsUrl, robotsTxt);
  // isAllowed() returns undefined for URLs the file does not cover; treat that as allowed.
  return robots.isAllowed(url, 'Tryb-Agent') ?? true;
}
```
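The check above fetches robots.txt on every call. Below is a minimal sketch of the caching and crawl-delay points from the list above; the 24h TTL, the in-memory cache, and the getRobots/politeFetch helpers are illustrative assumptions, not an established API.

```typescript
import robotsParser from 'robots-parser';

type Robots = ReturnType<typeof robotsParser>;

// Hypothetical in-memory cache keyed by origin, refreshed every 24h per the list above.
const cache = new Map<string, { robots: Robots; fetchedAt: number }>();
const TTL_MS = 24 * 60 * 60 * 1000;

async function getRobots(origin: string): Promise<Robots> {
  const cached = cache.get(origin);
  if (cached && Date.now() - cached.fetchedAt < TTL_MS) return cached.robots;

  const robotsUrl = `${origin}/robots.txt`;
  const response = await fetch(robotsUrl);
  // An empty file parses to "no restrictions".
  const robotsTxt = response.ok ? await response.text() : '';
  const robots = robotsParser(robotsUrl, robotsTxt);
  cache.set(origin, { robots, fetchedAt: Date.now() });
  return robots;
}

async function politeFetch(url: string): Promise<Response | null> {
  const robots = await getRobots(new URL(url).origin);
  if (!(robots.isAllowed(url, 'Tryb-Agent') ?? true)) return null;

  // Honor Crawl-delay if the site declares one.
  const delay = robots.getCrawlDelay('Tryb-Agent');
  if (delay) await new Promise((resolve) => setTimeout(resolve, delay * 1000));

  return fetch(url, { headers: { 'User-Agent': 'Tryb-Agent' } });
}
```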
Legal Status
robots.txt is not legally binding, but:
- Courts have referenced it in scraping cases
- Ignoring it may support claims of trespass or a terms-of-service violation
- Following it demonstrates good faith

Marcus Chen
Founder & CEO at Tryb
Marcus advocates for ethical AI development.


