About This Project
This project provides tools and resources related to the llmstxt.org initiative, which proposes a standard way for websites to publish LLM-friendly content. You can learn more at the main project site: https://llmstxt.org/.
Our Data Collection Process
To provide comprehensive insights, we regularly crawl the top 1 million websites (updated every few days) to find `llms.txt` and `llms-full.txt` files. The collected data is then enriched with additional metrics, including website category and a quality score.
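The discovery step described above boils down to probing each crawled domain at the well-known root paths. A minimal sketch of that step (the actual crawler is not shown here; the helper name and domain are illustrative):

```python
from urllib.parse import urljoin

# Root-level paths probed on each crawled domain; per llmstxt.org the
# file is expected at the site root.
CANDIDATE_PATHS = ["/llms.txt", "/llms-full.txt"]

def candidate_urls(domain: str) -> list[str]:
    """Build the URLs to probe for a given domain."""
    base = f"https://{domain}"
    return [urljoin(base, path) for path in CANDIDATE_PATHS]

# Example: candidate_urls("example.com")
# -> ["https://example.com/llms.txt", "https://example.com/llms-full.txt"]
```

A real crawler would issue HTTP GET requests to these URLs and record which ones return a `200` with plausible text content.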
The quality score is derived by validating each discovered `llms.txt` and `llms-full.txt` file against the official specification using the `llms_txt` module available from llmstxt.org.
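To give a sense of what such validation involves, here is a simplified, stdlib-only sketch of a few structural checks loosely based on the spec (this is not the official `llms_txt` module, and the exact checks behind the quality score are assumptions):

```python
import re

def basic_llms_txt_check(text: str) -> dict:
    """Simplified structural checks loosely based on the llms.txt spec:
    a required H1 title, an optional blockquote summary, and H2 sections
    containing markdown link lists. Not the official validator."""
    lines = text.splitlines()
    has_title = bool(lines) and lines[0].startswith("# ")
    has_summary = any(line.startswith("> ") for line in lines[:5])
    sections = re.findall(r"^## (.+)$", text, flags=re.MULTILINE)
    links = re.findall(r"^- \[([^\]]+)\]\(([^)]+)\)", text, flags=re.MULTILINE)
    return {
        "has_title": has_title,
        "has_summary": has_summary,
        "sections": sections,
        "link_count": len(links),
    }
```

A quality score could then be computed from how many of these checks pass, though the project's actual scoring formula is not described in this document.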
About the llmstxt Standard
The `llms.txt` standard, proposed at llmstxt.org, gives website owners a structured way to provide concise, LLM-friendly information about their sites. It complements existing standards like `robots.txt` (which governs crawler access) and `sitemap.xml` (which aids page discovery): rather than controlling crawling, it helps language models make effective use of a site's content.
An `llms.txt` file is a Markdown file located at the root path of a website. The spec calls for an H1 title, an optional blockquote summary, and optional H2-delimited sections containing markdown link lists that point to further detail, with an `Optional` section for links that can be skipped when context is limited. The companion `llms-full.txt` variant inlines the full content rather than linking out to it. The goal is to give LLMs and AI agents a curated, concise entry point to a site's most useful content.
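Per the spec on llmstxt.org, a minimal `llms.txt` looks like the following (the project name, section names, and URLs here are all hypothetical):

```markdown
# Example Project

> A short summary of what the site offers, written for LLM consumption.

## Docs

- [Quick start](https://example.com/docs/quickstart.md): setup in five minutes

## Optional

- [Changelog](https://example.com/changelog.md)
```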
Key Features
- Comprehensive database of `llms.txt` and `llms-full.txt` files.
- Regular updates sourced from crawling top websites.
- Data enrichment with website category and quality score metrics.
- Validation against the official llmstxt.org standard.
Technology Stack
Built with Astro and Tailwind CSS.