Module

postprocess.output_formats.utils

Shared utilities for output format generation.

Provides common functions used across all output format generators, including text processing, URL handling, and path resolution.

Note:

Text utilities delegate to bengal.utils.text for DRY compliance.
See RFC: plan/active/rfc-code-quality-improvements.md

Functions

strip_html
def strip_html(text: str) -> str

Remove HTML tags from text and normalize whitespace.

Delegates to bengal.utils.text.strip_html with additional whitespace normalization specific to output format generation.

Parameters

Name  Type  Default   Description
text  str   required  HTML text

Returns

str

Plain text with HTML tags, entities, and excess whitespace removed
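The following is a minimal sketch of the kind of cleanup described above. It is not Bengal's implementation. The name `strip_html_sketch` is hypothetical, and the sketch assumes that a simple regex is adequate for tag removal and that entities are decoded with `html.unescape` rather than dropped.

```python
import html
import re


def strip_html_sketch(text: str) -> str:
    """Hypothetical sketch: strip tags, decode entities, collapse whitespace."""
    # Replace each tag with a space so adjacent block elements don't fuse words.
    no_tags = re.sub(r"<[^>]+>", " ", text)
    # Decode entities such as &amp; and &nbsp; after tag removal, so escaped
    # markup like &lt;b&gt; survives as literal text instead of being stripped.
    decoded = html.unescape(no_tags)
    # Collapse runs of whitespace (including non-breaking spaces) to one space.
    return re.sub(r"\s+", " ", decoded).strip()
```

For example, `strip_html_sketch("<p>Hello&nbsp;<b>world</b></p>")` returns `"Hello world"`.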

generate_excerpt
def generate_excerpt(text: str, length: int = 200) -> str

Generate excerpt from text using character-based truncation.

Note: This uses character-based truncation for backward compatibility with output format generation. For word-based truncation, use bengal.utils.text.generate_excerpt directly.

Parameters

Name    Type  Default   Description
text    str   required  Source text (may contain HTML)
length  int   200       Maximum character length

Returns

str

Excerpt string, truncated at a word boundary with an ellipsis
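A minimal sketch of character-based truncation that still cuts at a word boundary, under stated assumptions: the helper name `generate_excerpt_sketch` is hypothetical, HTML is stripped with a simple regex, and the ellipsis is assumed to be three ASCII dots appended after the cut (so the result can exceed `length` by three characters). The real function may differ on each of these points.

```python
import re


def generate_excerpt_sketch(text: str, length: int = 200) -> str:
    """Hypothetical sketch: character-limited excerpt cut at a word boundary."""
    # Reduce to plain text first (simple regex; assumes well-formed tags).
    plain = re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", text)).strip()
    if len(plain) <= length:
        return plain
    cut = plain[:length]
    # If the limit falls mid-word, back up to the previous space.
    if plain[length] != " " and " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    # Assumed ellipsis style: three ASCII dots.
    return cut.rstrip() + "..."
```

With `length=12`, `"<p>The quick brown fox</p>"` becomes `"The quick..."`: the 12-character window ends inside "brown", so the partial word is dropped.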

get_page_relative_url
def get_page_relative_url(page: Page, site: Any) -> str

Get the clean relative URL for a page (without baseurl).

Parameters

Name  Type  Default   Description
page  Page  required  Page to get URL for
site  Any   required  Site instance

Returns

str

Relative URL string (without baseurl)
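To illustrate what "without baseurl" means, here is a sketch that removes a baseurl path prefix from a URL. The helper `strip_baseurl` is hypothetical and works on plain strings; it does not touch the real `Page` or `Site` objects, whose attributes are not documented here.

```python
from urllib.parse import urlparse


def strip_baseurl(url: str, baseurl: str) -> str:
    """Hypothetical sketch: return the site-relative path of a URL."""
    # Only the path part of the baseurl matters, e.g. "/docs".
    base_path = urlparse(baseurl).path.rstrip("/")
    path = urlparse(url).path or "/"
    # Strip the prefix only on a segment boundary, so "/docsify" is untouched.
    if base_path and (path == base_path or path.startswith(base_path + "/")):
        path = path[len(base_path):] or "/"
    return path
```

For example, `strip_baseurl("https://example.com/docs/guide/intro/", "https://example.com/docs")` returns `"/guide/intro/"`.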

get_page_url
def get_page_url(page: Page, site: Any) -> str

Get the public URL for a page.

Parameters

Name  Type  Default   Description
page  Page  required  Page to get URL for
site  Any   required  Site instance

Returns

str

Full public URL including baseurl
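The relationship between the relative and public URLs can be sketched as joining the site's baseurl onto the relative URL. `join_public_url` is a hypothetical string helper, not the actual implementation of `get_page_url`.

```python
def join_public_url(baseurl: str, relative_url: str) -> str:
    """Hypothetical sketch: prefix a relative URL with the site baseurl."""
    # With no baseurl configured, the relative URL is already public.
    if not baseurl:
        return relative_url
    # Normalize the seam so exactly one slash separates the two parts.
    return baseurl.rstrip("/") + "/" + relative_url.lstrip("/")
```

For example, `join_public_url("https://example.com/docs/", "/guide/")` returns `"https://example.com/docs/guide/"`.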

get_page_json_path
def get_page_json_path(page: Page) -> Path | None

Get the output path for a page's JSON file.

Parameters

Name  Type  Default   Description
page  Page  required  Page to get JSON path for

Returns

Path | None

Path for the JSON file, or None if the page's output_path is not available

get_page_txt_path
def get_page_txt_path(page: Page) -> Path | None

Get the output path for a page's TXT file.

Parameters

Name  Type  Default   Description
page  Page  required  Page to get TXT path for

Returns

Path | None

Path for the TXT file, or None if the page's output_path is not available
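Both path helpers can be sketched as deriving a sibling file from the page's HTML output path. This is an assumption for illustration: it supposes the JSON and TXT files sit next to the HTML file with the same stem (so `index.html` becomes `index.json` or `index.txt`). `sibling_output_path` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def sibling_output_path(output_path: Optional[Path], suffix: str) -> Optional[Path]:
    """Hypothetical sketch: swap the HTML suffix for another output format."""
    # Mirror the documented behavior: no output_path means no derived path.
    if output_path is None:
        return None
    # Assumed layout: same directory and stem, different extension.
    return output_path.with_suffix(suffix)
```

Under this assumption, `sibling_output_path(Path("public/docs/intro/index.html"), ".json")` gives `public/docs/intro/index.json`.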

normalize_url
def normalize_url(url: str) -> str

Normalize a URL for consistent comparison.

Parameters

Name  Type  Default   Description
url   str   required  URL to normalize

Returns

str

Normalized URL with consistent formatting
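The exact normalization rules are not documented here. As an illustration only, a hypothetical `normalize_url_sketch` might lowercase the scheme and host, collapse repeated slashes, add a trailing slash to directory-style paths, and drop fragments, so that equivalent URLs compare equal. Every one of these rules is an assumption.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_url_sketch(url: str) -> str:
    """Hypothetical sketch of URL normalization for comparison."""
    parts = urlsplit(url.strip())
    # Collapse "//" runs in the path only, so "https://" is left alone.
    path = re.sub(r"/{2,}", "/", parts.path or "/")
    if not path.startswith("/"):
        path = "/" + path
    # Treat paths whose last segment has no extension as directories.
    last = path.rsplit("/", 1)[-1]
    if "." not in last and not path.endswith("/"):
        path += "/"
    # Lowercase scheme and host; keep the query; drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))
```

For example, `normalize_url_sketch("docs//guide")` returns `"/docs/guide/"`, while `"/feed.xml"` is left unchanged.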