# Output Formats

URL: /bengal/docs/building/output-formats/
Section: building
Description: Generate JSON, LLM-ready text, and other output formats for search and AI discovery

---

Bengal can generate multiple output formats for your content, enabling search functionality, AI discovery, and programmatic access.

## Available Formats

### Per-Page Formats

Generated for every page in your site:

- **JSON** (`index.json`): Structured data including metadata, HTML content, plain text, and optional heading-level chunks for RAG.
- **LLM Text** (`index.txt`): AI-friendly plain text format optimized for **RAG** (Retrieval-Augmented Generation) and LLM consumption.

### Site-Wide Formats

Generated at the site root:

- **Site Index** (`index.json`): A searchable index of all pages (useful for client-side search).
- **Full LLM Text** (`llm-full.txt`): The complete content of your site in a single plain text file.
- **LLMs.txt** (`llms.txt`): Curated site overview per the [llms.txt spec](https://llmstxt.org/), providing lightweight navigation for AI agents.
- **Build Changelog** (`changelog.json`): Per-build diff of added, modified, and removed pages (for incremental indexing).
- **Agent Manifest** (`agent.json`): Hierarchical site structure with sections and available formats (for agent discovery).

## Configuration

Enable output formats in your config file.

:::{tab-set}
:::{tab-item} YAML (directory config)
```yaml
# config/_default/outputs.yaml
output_formats:
  enabled: true
  per_page: ["json"]
  site_wide: ["index_json"]
  options:
    excerpt_length: 200                  # Excerpt length for site index
    json_indent: null                    # null for compact JSON, 2 for pretty-print
    llm_separator_width: 80              # Width of LLM text separators
    include_full_content_in_index: false # Include full content in site index
    include_chunks: true                 # Heading-level chunks in per-page JSON (for RAG)
    exclude_sections: []                 # Sections to exclude from output formats
    exclude_patterns: ["404.html", "search.html"]  # Files to exclude
```
:::

:::{tab-item} TOML (single file)
```toml
# bengal.toml
[output_formats]
enabled = true
per_page = ["json"]
site_wide = ["index_json"]

[output_formats.options]
excerpt_length = 200
# json_indent = 2   # TOML has no null value; omit this key for compact JSON
llm_separator_width = 80
include_full_content_in_index = false
include_chunks = true
exclude_sections = []
exclude_patterns = ["404.html", "search.html"]
```
:::
:::{/tab-set}

:::{tip}
**Effective Defaults**: The `[features]` section controls which formats are enabled. With default features (`json = true`, `llm_txt = true`), Bengal generates:

- **per_page**: `["json", "llm_txt"]` (both JSON and LLM text)
- **site_wide**: `["index_json", "llm_full", "llms_txt", "changelog", "agent_manifest"]` (search index, LLM texts, build changelog, and agent manifest)

To disable LLM text generation, set `features.llm_txt = false` in your config.
:::

:::{note}
**Visibility**: Output formats respect page visibility settings. Hidden pages and drafts are excluded by default. Use `exclude_sections` or `exclude_patterns` for additional filtering.
:::

## Use Cases

### Client-Side Search

Fetch the site index to implement fast, client-side search without a backend.

:::{note}
For larger sites, enable the **Pre-built Lunr Index** to improve performance.
This requires the `search` optional dependency:

```bash
pip install "bengal[search]"
```

In addition to `index.json`, this generates `search-index.json`, a pre-serialized Lunr index that loads faster in the browser.
:::

```html

```

### AI & LLM Discovery

Provide `llm-full.txt` to LLMs so they can ingest your entire documentation site efficiently.

```bash
curl https://mysite.com/llm-full.txt
```

### Static API

Use your static site as a read-only API for other applications.

```python
import requests

# Fetch the per-page JSON for a single page
response = requests.get('https://mysite.com/docs/intro/index.json', timeout=10)
response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
data = response.json()
print(data['title'])
print(data['word_count'])
```
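
The same per-page files can also be read straight from a local build, without HTTP. The following is a minimal sketch, not part of Bengal itself: `collect_page_stats` is a hypothetical helper, and it relies only on the `title` and `word_count` keys used in the example above. Any `index.json` that lacks those keys (the site-wide index at the root may well be one, though its schema is not assumed here) is skipped.

```python
import json
from pathlib import Path


def collect_page_stats(output_dir):
    """Return (title, word_count) pairs from per-page index.json files.

    Hypothetical helper for illustration. Files that are unreadable,
    malformed, or missing either key are skipped rather than raising.
    """
    stats = []
    for path in sorted(Path(output_dir).rglob("index.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed file: skip it
        # Only keep objects that look like per-page JSON.
        if isinstance(data, dict) and "title" in data and "word_count" in data:
            stats.append((data["title"], data["word_count"]))
    return stats
```

Calling `collect_page_stats("public")` would return one pair per page, where `public` is an assumed output directory name; substitute whatever directory your build writes to.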