utils

postprocess.output_formats.utils

Shared utilities for output format generation.

Provides common functions used across all output format generators, including text processing, URL handling, and path resolution.

Note:

Text utilities delegate to bengal.utils.text for DRY compliance.
See RFC: plan/active/rfc-code-quality-improvements.md

Functions

strip_html

def strip_html(text: str) -> str

Remove HTML tags from text and normalize whitespace.

Delegates to bengal.utils.text.strip_html with additional whitespace normalization specific to output format generation.

Parameters 1

Name	Type	Default	Description
`text`	`str`	—	HTML text

Returns

`str`: Plain text with HTML tags, entities, and excess whitespace removed.
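
The three steps named above (tag removal, entity decoding, whitespace normalization) can be sketched with the standard library alone. This is an illustrative stand-in, not the code in bengal.utils.text; the name `strip_html_sketch` and the choice to replace each tag with a space are assumptions.

```python
import html
import re

# Hypothetical patterns for this sketch only.
_TAG_RE = re.compile(r"<[^>]+>")
_WS_RE = re.compile(r"\s+")


def strip_html_sketch(text: str) -> str:
    """Approximate strip_html: drop tags, decode entities, collapse whitespace."""
    if not text:
        return ""
    # Replace tags with a space so adjacent block elements do not fuse together.
    no_tags = _TAG_RE.sub(" ", text)
    # Decode entities such as &amp; after the tags are gone.
    decoded = html.unescape(no_tags)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return _WS_RE.sub(" ", decoded).strip()


print(strip_html_sketch("<p>Hello &amp; <b>world</b></p>\n\n  again"))
```

A regex-based approach like this is adequate for well-formed generated HTML; it also splits a word that contains an inline tag, which a real parser would avoid.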

generate_excerpt

def generate_excerpt(text: str, length: int = 200) -> str

Generate excerpt from text using character-based truncation.

Note: This uses character-based truncation for backward compatibility with output format generation. For word-based truncation, use bengal.utils.text.generate_excerpt directly.

Parameters 2

Name	Type	Default	Description
`text`	`str`	—	Source text (may contain HTML)
`length`	`int`	`200`	Maximum character length

Returns

`str`: Excerpt string, truncated at a word boundary with an ellipsis.
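
One way to read "character-based truncation" together with "truncated at word boundary" is: cut at `length` characters, then back up to the last space. The sketch below follows that reading. It is an assumption about the behavior, not the library's code; it also assumes the input is already plain text, and `generate_excerpt_sketch` is a hypothetical name.

```python
def generate_excerpt_sketch(text: str, length: int = 200) -> str:
    """Cut text at `length` characters, then back up to the last space."""
    if len(text) <= length:
        # Short enough: returned unchanged, with no ellipsis.
        return text
    cut = text[:length]
    # Drop a trailing partial word if the cut contains a space.
    head, sep, _ = cut.rpartition(" ")
    if sep:
        cut = head
    return cut.rstrip() + "..."


print(generate_excerpt_sketch("The quick brown fox jumps", 12))
```

Under this reading the limit is measured in characters, while the visible cut never lands in the middle of a word unless the text has no space within the limit.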

get_page_relative_url

def get_page_relative_url(page: Page, site: Any) -> str

Get clean relative URL for page (without baseurl).

Parameters 2

Name	Type	Default	Description
`page`	`Page`	—	Page to get URL for
`site`	`Any`	—	Site instance

Returns

`str`: Relative URL string (without baseurl).
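
To illustrate what "without baseurl" means, here is a minimal sketch that removes a baseurl prefix from a URL string. The helper `strip_baseurl` and its string arguments are hypothetical; the real function takes `page` and `site` objects, whose attributes are not documented here.

```python
def strip_baseurl(url: str, baseurl: str) -> str:
    """Return `url` with a leading `baseurl` removed (hypothetical helper)."""
    base = baseurl.rstrip("/")
    # Only strip on a whole path-segment match, so "/docs" does not eat "/docsify".
    if base and (url == base or url.startswith(base + "/")):
        url = url[len(base):]
    # Always return a path that starts with a slash.
    return url if url.startswith("/") else "/" + url


print(strip_baseurl("/docs/guide/intro/", "/docs"))
```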

get_page_url

def get_page_url(page: Page, site: Any) -> str

Get the public URL for a page.

Parameters 2

Name	Type	Default	Description
`page`	`Page`	—	Page to get URL for
`site`	`Any`	—	Site instance

Returns

`str`: Full public URL including baseurl.
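
The relationship between a relative URL and the public URL can be sketched as a simple join with the baseurl. The helper `with_baseurl` is hypothetical and works on plain strings; how the real function reads the baseurl from `site` is not shown in this reference.

```python
def with_baseurl(relative_url: str, baseurl: str) -> str:
    """Prefix a site-relative URL with the baseurl (hypothetical helper)."""
    # Normalize both sides so exactly one slash separates them.
    base = baseurl.rstrip("/")
    path = "/" + relative_url.lstrip("/")
    return base + path


print(with_baseurl("/guide/intro/", "https://example.com/docs/"))
```

With an empty baseurl the result is just the site-relative path, which matches the case of a site served from the domain root.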

get_page_json_path

def get_page_json_path(page: Page) -> Path | None

Get the output path for a page's JSON file.

Parameters 1

Name	Type	Default	Description
`page`	`Page`	—	Page to get JSON path for

Returns

`Path | None`: Path for the JSON file, or None if output_path not available.

get_page_txt_path

def get_page_txt_path(page: Page) -> Path | None

Get the output path for a page's TXT file.

Parameters 1

Name	Type	Default	Description
`page`	`Page`	—	Page to get TXT path for

Returns

`Path | None`: Path for the TXT file, or None if output_path not available.
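
Both get_page_json_path and get_page_txt_path return None when the page has no output_path. A plausible shape for that logic is sketched below, assuming the companion file sits next to the rendered HTML and differs only by suffix. That layout is an assumption, and `sibling_with_suffix` is a hypothetical helper that takes the path directly rather than a `Page`.

```python
from pathlib import Path
from typing import Optional


def sibling_with_suffix(output_path: Optional[Path], suffix: str) -> Optional[Path]:
    """Return output_path with a new suffix, or None if there is no output_path."""
    if output_path is None:
        # Mirrors the documented "None if output_path not available" case.
        return None
    return output_path.with_suffix(suffix)


html_path = Path("public/guide/index.html")
print(sibling_with_suffix(html_path, ".json"))
print(sibling_with_suffix(html_path, ".txt"))
print(sibling_with_suffix(None, ".json"))
```

Callers should check for None before writing, since a page that has not been assigned an output location has no place to put its JSON or TXT companion.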

normalize_url

def normalize_url(url: str) -> str

Normalize a URL for consistent comparison.

Parameters 1

Name	Type	Default	Description
`url`	`str`	—	URL to normalize

Returns

`str`: Normalized URL with consistent formatting.
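
This reference does not say which rules normalize_url applies. As an illustration of why normalization helps comparison, the sketch below applies three assumed rules to site-relative, directory-style paths: a leading slash, no repeated slashes, and a trailing slash. The name `normalize_url_sketch` and all three rules are assumptions, not the library's documented behavior.

```python
import re


def normalize_url_sketch(url: str) -> str:
    """Normalize a site-relative path under assumed rules (see lead-in)."""
    if not url:
        return "/"
    # Assumed rule 1: exactly one leading slash.
    url = "/" + url.strip().lstrip("/")
    # Assumed rule 2: collapse repeated slashes. Not safe for absolute URLs,
    # where it would damage the "//" after the scheme.
    url = re.sub(r"/{2,}", "/", url)
    # Assumed rule 3: directory-style trailing slash.
    if not url.endswith("/"):
        url += "/"
    return url


# Two spellings of the same page compare equal after normalization.
print(normalize_url_sketch("docs//guide") == normalize_url_sketch("/docs/guide/"))
```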