Output Formats

Generate JSON, LLM-ready text, and other output formats for search and AI discovery

Bengal can generate multiple output formats for your content, enabling search functionality, AI discovery, and programmatic access.

Available Formats

Per-Page Formats

Generated for every page in your site:

  • JSON (index.json): Structured data including metadata, HTML content, plain text, and optional heading-level chunks for RAG.
  • LLM Text (index.txt): AI-friendly plain text format optimized for RAG (Retrieval-Augmented Generation) and LLM consumption.
  • Markdown (index.md): Markdown mirror for coding agents and documentation checkers. Each file includes a short directive pointing agents to the site's llms.txt index.

Site-Wide Formats

Generated at the site root:

  • Site Index (index.json): A searchable index of all pages (useful for client-side search).
  • Full LLM Text (llm-full.txt): The complete content of your site in a single plain text file.
  • LLMs.txt (llms.txt): Curated site overview per the llms.txt spec — lightweight navigation for AI agents.
  • Build Changelog (changelog.json): Per-build diff of added, modified, and removed pages (for incremental indexing).
  • Agent Manifest (agent.json): Hierarchical site structure with sections and available formats (for agent discovery).
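
As a rough illustration of the incremental-indexing use of changelog.json, the sketch below applies one build's changelog to a locally cached index. The schema is an assumption: the top-level keys `added`, `modified`, and `removed` and the per-entry `url` field are hypothetical names chosen for this example, so check a real changelog.json from your build before relying on them.

```python
from typing import Any

Index = dict[str, dict[str, Any]]


def apply_changelog(index: Index, changelog: dict[str, Any]) -> Index:
    """Return a copy of `index` (keyed by page URL) updated from one changelog.

    Assumed, hypothetical schema: {"added": [...], "modified": [...],
    "removed": [...]}, where every entry is a dict with a "url" key.
    """
    updated = dict(index)  # shallow copy, so the caller's index is untouched

    # Drop pages that no longer exist in the build.
    for entry in changelog.get("removed", []):
        updated.pop(entry["url"], None)

    # Insert new pages and overwrite changed ones.
    for entry in changelog.get("added", []) + changelog.get("modified", []):
        updated[entry["url"]] = entry

    return updated
```

With this shape, an external indexer only needs to re-process the pages named in the changelog rather than re-reading the whole site after every build.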

Configuration

Enable output formats in your config file.

YAML
# config/_default/outputs.yaml
output_formats:
  enabled: true
  per_page: ["json", "llm_txt", "markdown"]
  site_wide: ["index_json"]
  options:
    excerpt_length: 200                    # Excerpt length for site index
    json_indent: null                      # null for compact JSON, 2 for pretty-print
    llm_separator_width: 80                # Width of LLM text separators
    include_full_content_in_index: false   # Include full content in site index
    include_chunks: true                    # Heading-level chunks in per-page JSON (for RAG)
    exclude_sections: []                   # Sections to exclude from output formats
    exclude_patterns: ["404.html", "search.html"]  # Files to exclude
TOML
# bengal.toml
[output_formats]
enabled = true
per_page = ["json", "llm_txt", "markdown"]
site_wide = ["index_json"]

[output_formats.options]
excerpt_length = 200
# json_indent = 2    # TOML has no null value: leave the key unset for compact JSON, or set 2 to pretty-print
llm_separator_width = 80
include_full_content_in_index = false
include_chunks = true
exclude_sections = []
exclude_patterns = ["404.html", "search.html"]

Tip

Effective Defaults: The [features] section controls which formats are enabled. With default features (json = true, llm_txt = true), Bengal generates:

  • per_page: ["json", "llm_txt", "markdown"] (JSON, LLM text, and Markdown mirrors)
  • site_wide: ["index_json", "llm_full", "llms_txt", "changelog", "agent_manifest"] (search index, LLM texts, build changelog, and agent manifest)

To disable LLM text generation, set features.llm_txt = false in your config.

Note

Visibility: Output formats respect page visibility settings. Hidden pages and drafts are excluded by default. Use exclude_sections or exclude_patterns for additional filtering.
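
To make the filtering options more concrete, here is a minimal sketch of how exclude_sections and exclude_patterns could be evaluated against an output path. The matching rules are assumptions, not Bengal's documented behavior: the example treats the first path component as the section and matches patterns glob-style against both the file name and the full relative path. `is_excluded` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_excluded(output_path: str,
                exclude_sections: list[str],
                exclude_patterns: list[str]) -> bool:
    """Decide whether a site-relative output path would be filtered out.

    Assumptions (not taken from Bengal's source): the section is the first
    path component, and patterns are case-sensitive shell-style globs.
    """
    path = PurePosixPath(output_path)

    # A file at the site root has no section.
    section = path.parts[0] if len(path.parts) > 1 else ""
    if section in exclude_sections:
        return True

    # Match each pattern against the bare file name and the full path.
    return any(
        fnmatchcase(path.name, pattern) or fnmatchcase(str(path), pattern)
        for pattern in exclude_patterns
    )


patterns = ["404.html", "search.html"]
print(is_excluded("404.html", [], patterns))
print(is_excluded("docs/intro/index.html", [], patterns))
print(is_excluded("drafts/idea/index.html", ["drafts"], patterns))
```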

Use Cases

Client-Side Search

Fetch the site index to implement fast, client-side search without a backend.

Note

For larger sites, enable the Pre-built Lunr Index to improve performance. This requires the search optional dependency:

BASH
pip install "bengal[search]"

This generates search-index.json (a pre-serialized Lunr index) in addition to index.json; the pre-built index loads faster in the browser. The search backend is configured explicitly and defaults to search.backend: lunr. index.json remains the stable source artifact for client-side search, and search-index.json is emitted only by the Lunr backend when prebuilding is enabled.

HTML
<!-- Simple search UI -->
<input type="text" id="search-input" placeholder="Search...">
<ul id="search-results"></ul>

<script>
  const searchInput = document.getElementById('search-input');
  const resultsList = document.getElementById('search-results');
  let searchIndex = [];

  // Fetch index once
  fetch('/index.json')
    .then(response => response.json())
    .then(data => {
      searchIndex = data.pages;
    });

  // Filter and display results
  searchInput.addEventListener('input', (e) => {
    const query = e.target.value.toLowerCase();
    if (query.length < 2) {
      resultsList.innerHTML = '';
      return;
    }

    const results = searchIndex.filter(page =>
      (page.title && page.title.toLowerCase().includes(query)) ||
      (page.excerpt && page.excerpt.toLowerCase().includes(query))
    ).slice(0, 10);

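    // Escape index text before inserting it as HTML, so titles or excerpts
    // containing characters such as "<" cannot break the markup.
    const esc = (s) => String(s ?? '').replace(/[&<>"']/g, (c) => `&#${c.charCodeAt(0)};`);

    resultsList.innerHTML = results.map(page => `
      <li>
        <a href="${esc(page.href)}">
          <strong>${esc(page.title)}</strong>
          <p>${esc(page.excerpt)}</p>
        </a>
      </li>
    `).join('');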
  });
</script>
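
The same substring filter can also be run outside the browser, for example in a build check that confirms key pages are findable. The sketch below mirrors the logic of the script above in Python; it relies only on the `title` and `excerpt` fields used there, and `search_pages` is a hypothetical helper name. Load the `pages` list from your own index.json before calling it.

```python
from typing import Any


def search_pages(pages: list[dict[str, Any]], query: str,
                 limit: int = 10) -> list[dict[str, Any]]:
    """Return up to `limit` pages whose title or excerpt contains `query`."""
    needle = query.strip().lower()
    if len(needle) < 2:  # same minimum query length as the browser example
        return []

    hits = [
        page for page in pages
        if needle in (page.get("title") or "").lower()
        or needle in (page.get("excerpt") or "").lower()
    ]
    return hits[:limit]


# In-memory sample standing in for the "pages" array of index.json.
sample_pages = [
    {"title": "Output Formats", "excerpt": "Generate JSON and LLM text.", "href": "/docs/output/"},
    {"title": "Theming", "excerpt": "Customize templates.", "href": "/docs/theming/"},
]
for page in search_pages(sample_pages, "json"):
    print(page["href"])
```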

AI & LLM Discovery

Provide llm-full.txt to LLMs to allow them to ingest your entire documentation site efficiently.

BASH
curl https://mysite.com/llm-full.txt
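
A large site may produce an llm-full.txt that exceeds a model's context window. One possible approach, sketched here, is to split the downloaded text into size-bounded chunks on blank-line boundaries. This is a generic technique rather than a Bengal feature: `chunk_text` and the `max_chars` budget are hypothetical, and the sketch does not depend on the file's separator format.

```python
def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks.

    Each chunk stays within `max_chars`, except that a single paragraph
    longer than the budget is kept whole rather than cut mid-sentence.
    """
    chunks: list[str] = []
    current = ""

    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate  # still fits, or this starts a new chunk
        else:
            chunks.append(current)  # budget exceeded: close the chunk
            current = paragraph

    if current:
        chunks.append(current)
    return chunks


sample = "Intro paragraph.\n\nSecond paragraph.\n\nThird paragraph."
for number, chunk in enumerate(chunk_text(sample, max_chars=40), start=1):
    print(number, len(chunk))
```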

Static API

Use your static site as a read-only API for other applications.

PYTHON
import requests

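# Get page data (fail fast on HTTP errors or a hung connection)
response = requests.get('https://mysite.com/docs/intro/index.json', timeout=10)
response.raise_for_status()
data = response.json()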
print(data['title'])
print(data['word_count'])
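
Because every page ships the same set of per-page files, a client can derive the URL of each format from the page URL alone. The sketch below assumes directory-style page URLs, as in the example above, and uses the file names listed under Per-Page Formats. `format_urls` is a hypothetical helper, not part of Bengal.

```python
from urllib.parse import urljoin

# File names taken from the Per-Page Formats list.
PER_PAGE_FILES = {
    "json": "index.json",
    "llm_txt": "index.txt",
    "markdown": "index.md",
}


def format_urls(page_url: str) -> dict[str, str]:
    """Map each per-page format name to its URL for a given page."""
    # urljoin replaces the last path segment unless the base ends with "/".
    base = page_url if page_url.endswith("/") else page_url + "/"
    return {name: urljoin(base, filename) for name, filename in PER_PAGE_FILES.items()}


for name, url in format_urls("https://mysite.com/docs/intro").items():
    print(f"{name}: {url}")
```

Only formats that are enabled in per_page will exist at these URLs, so a client should be prepared for a 404 on any format the site has turned off.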