SEO & Discovery

Bengal already ships with most of the technical building blocks needed for search and discovery. The main work is using those features well and publishing pages that match real search intent.

What Bengal Supports

Page Metadata

Bengal pages can carry structured front matter such as:

title
description
keywords
canonical
noindex

The default theme uses those fields to render metadata such as descriptions, keyword tags, canonical URLs, Open Graph tags, Twitter cards, and robots directives.

See SEO Functions for the template helpers that power these tags.
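As a rough illustration of how these fields map to tags, the Python sketch below turns a front matter dictionary into <head> elements. The function render_meta_tags is hypothetical and is not one of Bengal's SEO template helpers. Falling back to baseurl plus the page path when canonical is unset is also an assumption.

```python
from html import escape


def render_meta_tags(front_matter: dict, base_url: str, path: str) -> list[str]:
    """Build <head> tags from one page's front matter (hypothetical helper)."""
    tags = []
    description = front_matter.get("description")
    if description:
        tags.append(f'<meta name="description" content="{escape(description)}">')
    keywords = front_matter.get("keywords") or []
    if keywords:
        joined = ", ".join(keywords)
        tags.append(f'<meta name="keywords" content="{escape(joined)}">')
    # Assumption: fall back to baseurl + page path when no explicit canonical is set.
    canonical = front_matter.get("canonical") or base_url.rstrip("/") + path
    tags.append(f'<link rel="canonical" href="{escape(canonical)}">')
    if front_matter.get("noindex"):
        tags.append('<meta name="robots" content="noindex">')
    return tags
```

Values are HTML-escaped because descriptions and titles often contain quotes or ampersands.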

Search Engine Discovery

Bengal's post-processing pipeline includes:

XML sitemap generation for search engines
RSS feeds for blog-style content
Generated special pages such as 404
Generated robots.txt with Content Signals directives
Version-aware canonical URLs for versioned documentation
A machine-readable policy manifest at .well-known/content-signals.json

These features help avoid duplicate-content problems, give search engines a clean map of your site, and let you control how AI systems use your content.
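The Python sketch below shows what a sitemap contains, using the standard library's xml.etree.ElementTree. The helper build_sitemap and its page dictionaries are hypothetical. Skipping noindex pages is an assumption about sensible behavior, not a description of Bengal's pipeline.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, pages: list[dict]) -> str:
    """Return sitemap XML for pages not marked noindex (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if page.get("noindex"):
            continue  # assumption: pages excluded from indexing are not advertised
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + page["path"]
        if page.get("lastmod"):
            ET.SubElement(url, "lastmod").text = page["lastmod"]
    return ET.tostring(urlset, encoding="unicode")
```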

Bengal supports social sharing metadata through:

Open Graph URL and description tags
Open Graph image support
Auto-generated social cards
Twitter card metadata in the default theme

For projects that rely on docs links shared in Slack, Discord, X, or GitHub, social cards are one of the highest-leverage discovery features after page titles and descriptions.

On-Site Discovery

Bengal also improves discovery inside the site itself:

Client-side search indexes
Pre-built Lunr search indexes
Related-content patterns through tags and template queries
Broken-link detection and health checks
Content analysis for orphan pages and internal-link quality

These features do not directly affect search ranking. They make a site easier to navigate, which often leads to better content structure and clearer internal linking.
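Orphan detection reduces to a set difference over the internal link graph. This Python sketch assumes a hypothetical links mapping from each page URL to the URLs it links to. It also takes a set of entry points, such as the home page, that are allowed to have no inbound links. Bengal's own content analysis may weigh more signals than this.

```python
def find_orphans(links: dict[str, set[str]], entry_points: set[str]) -> set[str]:
    """Pages that no other page links to (hypothetical helper).

    `links` maps each page URL to the set of internal URLs it links to.
    """
    linked_to: set[str] = set()
    for source, targets in links.items():
        # Self-links do not rescue a page from being an orphan.
        linked_to.update(t for t in targets if t != source)
    return set(links) - linked_to - entry_points
```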

Machine Discovery

Bengal can generate machine-friendly output formats such as:

Per-page index.json (with optional heading-level chunks for RAG)
Site-wide index.json
search-index.json
llm-full.txt— full plain-text corpus of all pages
llms.txt— curated site overview per the llms.txt spec
changelog.json— per-build diff of added, modified, and removed pages
agent.json— hierarchical site structure for agent discovery

See Output Formats for configuration details.
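changelog.json is a per-build diff of added, modified, and removed pages. If each build records a map of page URL to content hash, the diff can be computed with set operations on dictionary keys, as in this Python sketch. diff_builds and its output shape are hypothetical and are not Bengal's changelog.json schema.

```python
def diff_builds(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare {url: content_hash} maps from two builds (hypothetical helper)."""
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        # A page counts as modified when it exists in both builds with a new hash.
        "modified": sorted(
            url for url in current.keys() & previous.keys()
            if current[url] != previous[url]
        ),
    }
```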

llms.txt is a short Markdown table of contents that tells AI agents what the site is and where to find things. It is auto-generated from the site's section hierarchy and page descriptions. Unlike llm-full.txt, which is a full content dump, llms.txt is a lightweight navigation aid, typically under 100 lines.
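Here is a rough Python sketch of the generation step. It follows the llms.txt layout: an H1 title, a blockquote summary, and H2 sections listing Markdown links with optional descriptions. build_llms_txt and its input shape are hypothetical. Bengal derives the real file from its own section tree.

```python
def build_llms_txt(site_title: str, summary: str, sections: dict[str, list[dict]]) -> str:
    """Render an llms.txt-style overview (hypothetical helper).

    `sections` maps a section title to pages with "title", "url", and an
    optional "description".
    """
    lines = [f"# {site_title}", "", f"> {summary}"]
    for section, pages in sections.items():
        lines += ["", f"## {section}", ""]
        for page in pages:
            entry = f"- [{page['title']}]({page['url']})"
            if page.get("description"):
                entry += f": {page['description']}"
            lines.append(entry)
    return "\n".join(lines) + "\n"
```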

Per-page JSON includes structured navigation, freshness data, and optional heading-level chunks for AI agents:

{
  "url": "/docs/getting-started/installation/",
  "title": "Installation",
  "navigation": {
    "parent": "/docs/getting-started/",
    "prev": "/docs/getting-started/quickstart/",
    "next": "/docs/getting-started/configuration/",
    "related": ["/docs/building/deployment/"]
  },
  "last_modified": "2026-03-10T14:30:00",
  "content_hash": "a1b2c3...",
  "chunks": [
    {"anchor": "prerequisites", "title": "Prerequisites", "level": 2, "content": "...", "content_hash": "..."},
    {"anchor": "steps", "title": "Steps", "level": 2, "content": "...", "content_hash": "..."}
  ]
}

navigation lets agents traverse docs without parsing HTML nav elements
last_modified comes from frontmatter (lastmod, last_modified, or updated) or the file mtime
content_hash is a SHA-256 hash of the plain text, so RAG pipelines know when to re-index
chunks (when enabled) splits content by headings for finer-grained RAG retrieval
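The content_hash and chunks fields can be approximated as follows. This Python sketch splits Markdown at level 2 to 6 headings and hashes each chunk with SHA-256. chunk_by_headings and its slug rule are assumptions. It hashes the chunk text as given rather than Bengal's rendered plain text, drops text before the first heading, and does not skip heading-like lines inside fenced code.

```python
import hashlib
import re

# Markdown ATX headings from level 2 to 6, e.g. "## Steps".
HEADING = re.compile(r"^(#{2,6})\s+(.+?)\s*$")


def content_hash(text: str) -> str:
    """SHA-256 hex digest of UTF-8 text, usable as a re-index trigger."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunk_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into heading-level chunks (hypothetical helper)."""
    chunks: list[dict] = []
    current: dict | None = None
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            if current is not None:
                chunks.append(current)
            title = match.group(2)
            # Assumed slug rule: lowercase, runs of other characters become "-".
            anchor = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
            current = {"anchor": anchor, "title": title,
                       "level": len(match.group(1)), "lines": []}
        elif current is not None:
            current["lines"].append(line)
    if current is not None:
        chunks.append(current)
    for chunk in chunks:
        chunk["content"] = "\n".join(chunk.pop("lines")).strip()
        chunk["content_hash"] = content_hash(chunk["content"])
    return chunks
```

A pipeline can compare a stored content_hash with the new one and re-embed only the chunks that changed.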

These outputs help with search, internal tooling, and AI consumption without adding a backend.

Connect to IDE (Cursor MCP)

Bengal can show a "Connect to IDE" button that opens Cursor and adds your docs as an MCP server via a one-click install. This requires a hosted Streamable HTTP MCP server: Bengal generates the button, and you provide the server. See Connect to IDE for setup.

Content Signals

Bengal generates a robots.txt with Content Signals directives that declare how automated systems may use your content. Three signals are supported:

Signal	Default	Meaning
`search`	`true`	Allow search engine indexing
`ai_input`	`true`	Allow AI input (RAG, grounding, AI answers)
`ai_train`	`false`	Allow AI model training and fine-tuning

The default posture is privacy-first: content is discoverable and citable by AI systems but not available for training. You must opt in to ai_train explicitly.

Site-Wide Configuration

Set defaults in bengal.toml:

[content_signals]
search = true
ai_input = true
ai_train = false  # opt-in for training

# Target specific crawlers
[content_signals.user_agents.GPTBot]
ai_train = false
ai_input = true

Per-Page and Per-Section Control

Override signals using the visibility frontmatter key. These values cascade through sections via _index.md:

# Per page
---
visibility:
  ai_train: false
  ai_input: true
---

# Section cascade (docs/_index.md) — all children inherit
---
cascade:
  visibility:
    ai_train: true
---

Disabling Content Signals

To skip robots.txt and manifest generation entirely:

[content_signals]
enabled = false

Enforcement

Content Signals are not just advisory. Bengal enforces them at the output format level:

Pages with ai_input: false do not get page.json or page.txt generated
Pages with ai_train: false are excluded from llm-full.txt
Pages with search: false are excluded from index.json
Draft pages are excluded from all machine-readable outputs regardless of visibility

For denied or draft pages, the corresponding output files are never written to disk.
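The enforcement rules can be read as a per-page filter. In this Python sketch, outputs_for (hypothetical) takes a page with a draft flag and an already-resolved visibility dictionary. It returns the machine-readable outputs that page may appear in, and missing keys fall back to the defaults from the signals table.

```python
def outputs_for(page: dict) -> set[str]:
    """Machine-readable outputs a page may appear in (hypothetical helper)."""
    if page.get("draft"):
        return set()  # drafts never reach machine-readable outputs
    vis = page.get("visibility", {})
    allowed: set[str] = set()
    if vis.get("ai_input", True):
        allowed |= {"page.json", "page.txt"}
    if vis.get("ai_train", False):
        allowed.add("llm-full.txt")
    if vis.get("search", True):
        allowed.add("index.json")
    return allowed
```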

Generated Files

File	Purpose
`robots.txt`	Content-Signal directives per the spec
`.well-known/content-signals.json`	Machine-readable policy manifest for AI discovery
`llms.txt`	Curated site overview for AI agents per llmstxt.org
`changelog.json`	Per-build diff of added, modified, removed pages (for incremental indexing)
`agent.json`	Hierarchical site structure and available formats (for agent discovery)

The meta tags content-signal:ai-train and content-signal:ai-input are also emitted in the HTML <head> when a page restricts any signal.

Practical Strategy

If you want Bengal sites to rank and convert better, work through these steps in order:

Write pages that match real search intent: installation, quickstart, tutorials, migration guides, troubleshooting, and comparisons.
Give every important page a clear title and description.
Use tags, categories, and internal links so related pages reinforce each other.
Configure baseurl so that canonical URLs and metadata resolve to the right domain.
Enable social cards for content people are likely to share publicly.
Publish feeds, sitemap, and search indexes as part of normal builds.
Review your content signals policy — decide which sections allow AI training.
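A quick audit for the title and description step might look like the Python sketch below. audit_front_matter is hypothetical and works on already-parsed front matter dictionaries. The 160-character threshold is a common rule of thumb for search snippets, not a Bengal limit.

```python
def audit_front_matter(pages: dict[str, dict], max_description: int = 160) -> list[str]:
    """Flag pages with a missing title or a missing or overlong description."""
    problems = []
    for path, fm in sorted(pages.items()):
        if not fm.get("title"):
            problems.append(f"{path}: missing title")
        description = fm.get("description", "")
        if not description:
            problems.append(f"{path}: missing description")
        elif len(description) > max_description:
            problems.append(f"{path}: description is {len(description)} characters")
    return problems
```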

Recommended Page Types

Bengal works well when a project publishes some mix of:

Install <project>
<project> quickstart
<project> tutorial
<project> vs <alternative>
Migrate from <other tool>
<project> troubleshooting

Those titles line up with how Python developers search for libraries.

Example Front Matter

---
title: Build a Documentation Site with Bengal
description: Create a Python-powered documentation site with search, sitemap, RSS, and social sharing metadata.
keywords:
  - python static site generator
  - documentation site generator
  - bengal
canonical: https://example.com/docs/build-a-documentation-site/
visibility:
  ai_train: true  # allow training on this page
---

Content Over Tricks

The technical side of SEO in Bengal is already strong. The bigger opportunity is publishing better discovery pages:

sharper README copy
better docs landing pages
migration guides
comparison pages
tutorial pages for concrete workflows

That is where many of the biggest gains come from.