Functions
strip_html
def strip_html(text: str) -> str
Remove HTML tags from text and normalize whitespace.
Delegates to bengal.utils.text.strip_html with additional whitespace normalization specific to output format generation.
Parameters 1
| Name | Type | Default | Description |
|---|---|---|---|
| text | str | — | HTML text |
Returns
str: Plain text with HTML tags, entities, and excess whitespace removed.
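As a rough illustration of the behavior described above (tags removed, entities decoded, whitespace collapsed), here is a minimal standard-library sketch. The name strip_html_sketch and the regex-based approach are assumptions made for this example; this is not the code in bengal.utils.text.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")  # naive tag matcher, adequate for a sketch
_WS_RE = re.compile(r"\s+")       # any run of whitespace


def strip_html_sketch(text: str) -> str:
    """Hypothetical stand-in; not Bengal's implementation."""
    if not text:
        return ""
    no_tags = _TAG_RE.sub("", text)    # drop anything that looks like a tag
    decoded = html.unescape(no_tags)   # turn &amp; into &, and so on
    return _WS_RE.sub(" ", decoded).strip()  # collapse whitespace runs


print(strip_html_sketch("<p>Hello &amp;  <b>world</b></p>\n"))
```

A regex is enough for well-formed snippets; input containing script blocks or malformed markup would likely need a real HTML parser.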
generate_excerpt
def generate_excerpt(text: str, length: int = 200) -> str
Generate excerpt from text using character-based truncation.
Note: This uses character-based truncation for backward compatibility with output format generation. For word-based truncation, use bengal.utils.text.generate_excerpt directly.
Parameters 2
| Name | Type | Default | Description |
|---|---|---|---|
| text | str | — | Source text (may contain HTML) |
| length | int | 200 | Maximum character length |
Returns
str: Excerpt string, truncated at a word boundary with an ellipsis.
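The sketch below shows one way a character-limited excerpt that ends on a word boundary could work. It is only an illustration: the name generate_excerpt_sketch, the regex-based tag stripping, and the choice to append the ellipsis on top of the character limit are all assumptions, not confirmed details of Bengal's implementation.

```python
import re


def generate_excerpt_sketch(text: str, length: int = 200) -> str:
    """Hypothetical stand-in; not Bengal's implementation."""
    # Assumption: tags are dropped and whitespace collapsed before measuring.
    plain = re.sub(r"\s+", " ", re.sub(r"<[^>]+>", "", text)).strip()
    if len(plain) <= length:
        return plain  # short enough, nothing to truncate
    cut = plain[:length]
    # If the cut lands in the middle of a word, back up to the previous space.
    if not plain[length].isspace():
        head, sep, _ = cut.rpartition(" ")
        if sep:
            cut = head
    # Assumption: the ellipsis is added after the cut, beyond the limit.
    return cut.rstrip() + "..."


print(generate_excerpt_sketch("<p>The quick brown fox jumps</p>", 12))
```

With a limit of 12 the raw cut would be "The quick br", so the sketch backs up to "The quick" before appending the ellipsis.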
get_page_relative_url
def get_page_relative_url(page: Page, site: Any) -> str
Get the clean relative URL for a page (without baseurl).
Parameters 2
| Name | Type | Default | Description |
|---|---|---|---|
| page | Page | — | Page to get URL for |
| site | Any | — | Site instance |
Returns
str: Relative URL string (without baseurl).
get_page_url
def get_page_url(page: Page, site: Any) -> str
Get the public URL for a page.
Parameters 2
| Name | Type | Default | Description |
|---|---|---|---|
| page | Page | — | Page to get URL for |
| site | Any | — | Site instance |
Returns
str: Full public URL including baseurl.
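To make the difference between a relative URL (without baseurl) and a public URL (with baseurl) concrete, the sketch below works on plain strings. The helpers strip_baseurl and with_baseurl are hypothetical names invented for this example. How Bengal reads the URL from a Page or the baseurl from a Site is not documented in this section, so no such attribute access is shown.

```python
def strip_baseurl(url: str, baseurl: str) -> str:
    """Hypothetical helper: remove a baseurl prefix from a URL."""
    base = baseurl.rstrip("/")
    # Match only on a full path segment, so "/docs" does not match "/docsy/".
    if base and (url == base or url.startswith(base + "/")):
        url = url[len(base):]
    return url if url.startswith("/") else "/" + url


def with_baseurl(relative_url: str, baseurl: str) -> str:
    """Hypothetical helper: prefix a relative URL with a baseurl."""
    base = baseurl.rstrip("/")
    return base + "/" + relative_url.lstrip("/")


print(strip_baseurl("/docs/guide/intro/", "/docs"))
print(with_baseurl("/guide/intro/", "https://example.com/docs/"))
```

Trimming slashes on both sides before joining avoids doubled or missing separators, whichever form the configured baseurl takes.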
get_page_json_path
def get_page_json_path(page: Page) -> Path | None
Get the output path for a page's JSON file.
Parameters 1
| Name | Type | Default | Description |
|---|---|---|---|
| page | Page | — | Page to get JSON path for |
Returns
Path | None: Path for the JSON file, or None if output_path is not available.
get_page_txt_path
def get_page_txt_path(page: Page) -> Path | None
Get the output path for a page's TXT file.
Parameters 1
| Name | Type | Default | Description |
|---|---|---|---|
| page | Page | — | Page to get TXT path for |
Returns
Path | None: Path for the TXT file, or None if output_path is not available.
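Both path helpers return None when the page has no output_path. The sketch below illustrates that contract with a stub page object. PageStub and sibling_output_path are hypothetical names, and deriving the JSON or TXT path by swapping the file suffix is an assumption made for illustration; the actual file layout Bengal uses is not specified here.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class PageStub:
    """Hypothetical stand-in for Page; only output_path is modeled."""
    output_path: Optional[Path] = None


def sibling_output_path(page: PageStub, suffix: str) -> Optional[Path]:
    """Assumed rule: same directory and stem as the page output, new suffix."""
    if page.output_path is None:
        return None  # nothing rendered yet, so there is no sibling path
    return page.output_path.with_suffix(suffix)


page = PageStub(Path("public/guide/index.html"))
print(sibling_output_path(page, ".json"))
print(sibling_output_path(PageStub(), ".txt"))
```

Callers should check for None before writing, since a page that has not been assigned an output location yields no path.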
normalize_url
def normalize_url(url: str) -> str
Normalize a URL for consistent comparison.
Parameters 1
| Name | Type | Default | Description |
|---|---|---|---|
| url | str | — | URL to normalize |
Returns
str: Normalized URL with consistent formatting.
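This section does not say which rules normalize_url applies. As an illustration of what normalizing for consistent comparison can mean, the sketch below uses three assumed rules for site-relative paths: collapse repeated slashes, ensure a leading slash, and ensure a trailing slash. The name normalize_url_sketch and all three rules are assumptions, and the sketch is not meant for absolute URLs with a scheme.

```python
import re


def normalize_url_sketch(url: str) -> str:
    """Hypothetical normalizer for site-relative paths only."""
    path = re.sub(r"/+", "/", url.strip())  # collapse "//" runs
    if not path.startswith("/"):
        path = "/" + path                   # assumed rule: leading slash
    if not path.endswith("/"):
        path += "/"                         # assumed rule: trailing slash
    return path


# Two spellings of the same page compare equal after normalization.
print(normalize_url_sketch("docs//guide") == normalize_url_sketch("/docs/guide/"))
```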