Module

utils.text

Text processing utilities.

Provides canonical implementations for common text operations like slugification, HTML stripping, truncation, and excerpt generation. These utilities consolidate duplicate implementations found throughout the codebase.

Example:

from bengal.utils.text import slugify, strip_html, truncate_words

slug = slugify("Hello World!")  # "hello-world"
text = strip_html("<p>Hello</p>")  # "Hello"
excerpt = truncate_words("Long text here...", 10)

Functions

slugify
def slugify(text: str, unescape_html: bool = True, max_length: int | None = None, separator: str = '-') -> str

Convert text to URL-safe slug with Unicode support.

Preserves Unicode word characters (letters, digits, underscore) to support international content. Modern web browsers and servers handle Unicode URLs.

Consolidates implementations from:

  • bengal/rendering/parser.py:629 (_slugify)
  • bengal/rendering/template_functions/strings.py:92 (slugify)
  • bengal/rendering/template_functions/taxonomies.py:184 (tag_url pattern)

Parameters

Name           Type        Default  Description
text           str                  Text to slugify
unescape_html  bool        True     Whether to decode HTML entities first (e.g., &amp; -> &)
max_length     int | None  None     Maximum slug length (None = unlimited)
separator      str         '-'      Character to use between words (default: '-')

Returns

str

URL-safe slug (lowercase, with Unicode word chars and separators)
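The behavior described above can be approximated with the standard library alone. The following is a minimal sketch, not the module's actual source: `slugify_sketch` is a hypothetical name, and the exact regular expressions and the way `max_length` trims a trailing separator are assumptions.

```python
import html
import re
from typing import Optional


def slugify_sketch(text: str, unescape_html: bool = True,
                   max_length: Optional[int] = None,
                   separator: str = "-") -> str:
    """Illustrative stand-in for slugify (assumed behavior)."""
    if unescape_html:
        text = html.unescape(text)  # e.g. "&amp;" -> "&"
    text = text.lower().strip()
    # Keep Unicode word characters, whitespace, and hyphens; drop the rest.
    text = re.sub(r"[^\w\s-]", "", text)
    # Collapse runs of whitespace and hyphens into one separator.
    text = re.sub(r"[\s-]+", separator, text)
    text = text.strip(separator)
    if max_length is not None:
        # Assumption: cut first, then drop a dangling separator.
        text = text[:max_length].rstrip(separator)
    return text


print(slugify_sketch("Hello World!"))     # hello-world
print(slugify_sketch("Tom &amp; Jerry"))  # tom-jerry
```

Because `\w` matches Unicode letters in Python 3 string patterns, accented and non-Latin characters survive in this sketch, which matches the Unicode support described above.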

strip_html
def strip_html(text: str, decode_entities: bool = True) -> str

Remove all HTML tags from text.

Consolidates implementation from:

  • bengal/rendering/template_functions/strings.py:157 (strip_html)

Parameters

Name             Type  Default  Description
text             str            HTML text to clean
decode_entities  bool  True     Whether to decode HTML entities (e.g., &lt; -> <)

Returns

str

Plain text with HTML tags removed
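As a rough illustration of tag stripping followed by optional entity decoding, here is a minimal sketch. `strip_html_sketch` is a hypothetical name, and the regex-based approach is an assumption; the real function may use a different technique.

```python
import html
import re


def strip_html_sketch(text: str, decode_entities: bool = True) -> str:
    """Illustrative stand-in for strip_html (assumed behavior)."""
    if not text:
        return ""
    # Remove anything that looks like a tag: "<", non-">" characters, ">".
    text = re.sub(r"<[^>]+>", "", text)
    if decode_entities:
        text = html.unescape(text)  # e.g. "&lt;" -> "<"
    return text


print(strip_html_sketch("<p>Hello</p>"))                            # Hello
print(strip_html_sketch("<b>1 &lt; 2</b>", decode_entities=False))  # 1 &lt; 2
```

A regex like this is adequate for trusted, well-formed markup; it is not a sanitizer for untrusted input.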

truncate_words
def truncate_words(text: str, word_count: int, suffix: str = '...') -> str

Truncate text to specified word count.

Consolidates pattern from:

  • bengal/rendering/template_functions/strings.py (truncatewords)

Parameters

Name        Type  Default  Description
text        str            Text to truncate
word_count  int            Maximum number of words
suffix      str   '...'    Suffix to append if truncated

Returns

str

Truncated text with suffix if shortened
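Word-count truncation can be sketched as follows. `truncate_words_sketch` is a hypothetical name; splitting on arbitrary whitespace and returning the input unchanged when it is already short enough are assumptions.

```python
def truncate_words_sketch(text: str, word_count: int, suffix: str = "...") -> str:
    """Illustrative stand-in for truncate_words (assumed behavior)."""
    words = text.split()
    if len(words) <= word_count:
        return text  # nothing to cut, so no suffix is added
    return " ".join(words[:word_count]) + suffix


print(truncate_words_sketch("one two three four", 2))  # one two...
print(truncate_words_sketch("one two", 5))             # one two
```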

truncate_chars
def truncate_chars(text: str, length: int, suffix: str = '...') -> str

Truncate text to specified character length (including suffix).

Parameters

Name    Type  Default  Description
text    str            Text to truncate
length  int            Maximum total length (including suffix if truncated)
suffix  str   '...'    Suffix to append if truncated

Returns

str

Truncated text with suffix if shortened, never exceeding length
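The key point is that the suffix counts toward the limit. A minimal sketch of that arithmetic is shown below; `truncate_chars_sketch` is a hypothetical name, and the handling of a limit shorter than the suffix is an assumption.

```python
def truncate_chars_sketch(text: str, length: int, suffix: str = "...") -> str:
    """Illustrative stand-in for truncate_chars (assumed behavior)."""
    if len(text) <= length:
        return text
    # Reserve room for the suffix so the result never exceeds `length`.
    keep = max(0, length - len(suffix))
    # Assumption: if the suffix alone is too long, it is cut as well.
    return (text[:keep] + suffix)[:length]


print(truncate_chars_sketch("Hello, world", 8))  # Hello...
print(truncate_chars_sketch("short", 10))        # short
```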

truncate_middle
def truncate_middle(text: str, max_length: int, separator: str = '...') -> str

Truncate text in the middle (useful for file paths).

Parameters

Name        Type  Default  Description
text        str            Text to truncate
max_length  int            Maximum total length
separator   str   '...'    Separator to use in middle

Returns

str

Truncated text with separator in middle
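Middle truncation keeps the start and end of a string, which is why it suits file paths. The sketch below is one way to do it; `truncate_middle_sketch` is a hypothetical name, and the choice to give the extra character to the head when the budget is odd is an assumption.

```python
def truncate_middle_sketch(text: str, max_length: int, separator: str = "...") -> str:
    """Illustrative stand-in for truncate_middle (assumed behavior)."""
    if len(text) <= max_length:
        return text
    available = max_length - len(separator)
    if available <= 0:
        return separator[:max_length]
    head = (available + 1) // 2  # head gets the extra character if odd
    tail = available // 2
    return text[:head] + separator + (text[-tail:] if tail else "")


print(truncate_middle_sketch("/very/long/path/to/file.txt", 15))  # /very/...le.txt
```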

generate_excerpt
def generate_excerpt(html: str, word_count: int = 50, suffix: str = '...') -> str

Generate plain text excerpt from HTML content.

Combines strip_html and truncate_words for common use case.

Consolidates pattern from:

  • bengal/postprocess/output_formats.py:674
  • Various template functions

Parameters

Name        Type  Default  Description
html        str            HTML content
word_count  int   50       Maximum number of words
suffix      str   '...'    Suffix to append if truncated

Returns

str

Plain text excerpt
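The two-step composition (strip tags, then cut by words) can be sketched in a self-contained way. `generate_excerpt_sketch` is a hypothetical name, the parameter is called `html_text` here only to avoid shadowing the standard `html` module, and the inlined stripping and truncation logic are assumptions rather than the module's own code.

```python
import html
import re


def generate_excerpt_sketch(html_text: str, word_count: int = 50,
                            suffix: str = "...") -> str:
    """Illustrative stand-in for generate_excerpt (assumed behavior)."""
    # Step 1: strip tags and decode entities (what strip_html is described as doing).
    plain = html.unescape(re.sub(r"<[^>]+>", "", html_text))
    # Step 2: keep at most `word_count` words (what truncate_words is described as doing).
    words = plain.split()
    if len(words) <= word_count:
        return " ".join(words)
    return " ".join(words[:word_count]) + suffix


print(generate_excerpt_sketch("<p>One <b>two</b> three four</p>", word_count=2))  # One two...
```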

normalize_whitespace
def normalize_whitespace(text: str, collapse: bool = True) -> str

Normalize whitespace in text.

Parameters

Name      Type  Default  Description
text      str            Text to normalize
collapse  bool  True     Whether to collapse multiple spaces to single space

Returns

str

Text with normalized whitespace
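The reference does not say what happens when `collapse` is False. The sketch below assumes that leading and trailing whitespace is always trimmed and that `collapse` only controls whether interior runs are reduced to a single space; `normalize_whitespace_sketch` is a hypothetical name.

```python
import re


def normalize_whitespace_sketch(text: str, collapse: bool = True) -> str:
    """Illustrative stand-in for normalize_whitespace (assumed behavior)."""
    if collapse:
        # Any run of spaces, tabs, or newlines becomes one space.
        text = re.sub(r"\s+", " ", text)
    return text.strip()


print(normalize_whitespace_sketch("  a \n\t b  "))            # a b
print(normalize_whitespace_sketch("  a  b ", collapse=False))  # a  b
```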

escape_html
def escape_html(text: str) -> str

Escape HTML entities.

Converts special characters to HTML entities:

  • < becomes &lt;
  • > becomes &gt;
  • & becomes &amp;
  • " becomes &quot;
  • ' becomes &#x27;

Parameters

Name  Type  Default  Description
text  str            Text to escape

Returns

str

HTML-escaped text
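The five replacements listed above are the same ones Python's standard `html.escape` performs when `quote=True`. The sketch below uses that call to show the expected output; `escape_html_sketch` is a hypothetical name, and whether the real function delegates to `html.escape` is an assumption.

```python
import html


def escape_html_sketch(text: str) -> str:
    """Illustrative stand-in for escape_html (assumed behavior)."""
    # quote=True (the default) also escapes double and single quotes.
    return html.escape(text, quote=True)


sample = """<a href="x">Tom & 'Jerry'</a>"""
print(escape_html_sketch(sample))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; &#x27;Jerry&#x27;&lt;/a&gt;
```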

unescape_html
def unescape_html(text: str) -> str

Unescape HTML entities.

Converts HTML entities back to characters:

  • &lt; becomes <
  • &gt; becomes >
  • &amp; becomes &
  • &quot; becomes "

Parameters

Name  Type  Default  Description
text  str            HTML text with entities

Returns

str

Unescaped text
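Python's standard `html.unescape` performs the conversions listed above (and handles numeric references as well). The sketch below uses it for illustration; `unescape_html_sketch` is a hypothetical name, and whether the real function delegates to `html.unescape` is an assumption.

```python
import html


def unescape_html_sketch(text: str) -> str:
    """Illustrative stand-in for unescape_html (assumed behavior)."""
    return html.unescape(text)


print(unescape_html_sketch("&lt;p&gt;Tom &amp; &quot;Jerry&quot;&lt;/p&gt;"))
# <p>Tom & "Jerry"</p>
```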

pluralize
def pluralize(count: int, singular: str, plural: str | None = None) -> str

Return singular or plural form based on count.

Parameters

Name      Type        Default  Description
count     int                  Count value
singular  str                  Singular form
plural    str | None  None     Plural form (default: singular + 's')

Returns

str

Appropriate form for the count
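A minimal sketch of the selection rule follows. `pluralize_sketch` is a hypothetical name, and treating only a count of exactly 1 as singular (so 0 takes the plural form) is an assumption.

```python
from typing import Optional


def pluralize_sketch(count: int, singular: str, plural: Optional[str] = None) -> str:
    """Illustrative stand-in for pluralize (assumed behavior)."""
    if count == 1:
        return singular
    # Fall back to the naive "add an s" rule when no plural is given.
    return plural if plural is not None else singular + "s"


print(pluralize_sketch(1, "page"))                    # page
print(pluralize_sketch(3, "page"))                    # pages
print(pluralize_sketch(2, "category", "categories"))  # categories
```

Passing an explicit `plural` is the way to handle irregular forms, since the default only appends 's'.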

humanize_bytes
def humanize_bytes(size_bytes: int) -> str

Format bytes as human-readable string.

Parameters

Name        Type  Default  Description
size_bytes  int            Size in bytes

Returns

str

Human-readable string (e.g., "1.5 KB", "2.3 MB")
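The reference gives only example outputs, so the unit base and rounding are not specified. The sketch below assumes 1024-byte units, one decimal place, and whole numbers for plain bytes; `humanize_bytes_sketch` is a hypothetical name.

```python
def humanize_bytes_sketch(size_bytes: int) -> str:
    """Illustrative stand-in for humanize_bytes (assumed behavior)."""
    size = float(size_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        # Stop at the first unit that fits, or at the largest unit.
        if size < 1024 or unit == "TB":
            if unit == "B":
                return f"{int(size)} B"
            return f"{size:.1f} {unit}"
        size /= 1024  # assumption: binary (1024-based) units
    raise AssertionError("unreachable: the loop always returns")


print(humanize_bytes_sketch(512))      # 512 B
print(humanize_bytes_sketch(1536))     # 1.5 KB
print(humanize_bytes_sketch(1048576))  # 1.0 MB
```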

humanize_number
def humanize_number(num: int) -> str

Format number with thousand separators.

Parameters

Name  Type  Default  Description
num   int            Number to format

Returns

str

Formatted string with commas
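Python's format specification supports a `,` option that inserts thousands separators, which is enough to reproduce the described output. `humanize_number_sketch` is a hypothetical name, and whether the real function uses this mechanism is an assumption.

```python
def humanize_number_sketch(num: int) -> str:
    """Illustrative stand-in for humanize_number (assumed behavior)."""
    return f"{num:,}"  # the "," option groups digits in threes


print(humanize_number_sketch(1234567))  # 1,234,567
print(humanize_number_sketch(999))      # 999
```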

humanize_slug
def humanize_slug(slug: str) -> str

Convert slug or filename stem to human-readable title.

Transforms kebab-case and snake_case identifiers into Title Case strings suitable for display in navigation, page titles, and other user-facing contexts.

Consolidates pattern from:

  • bengal/core/page/metadata.py (title property)
  • bengal/discovery/content_discovery.py (fallback titles)
  • bengal/rendering/template_functions/navigation.py (breadcrumbs)
  • bengal/cli/helpers/menu_config.py (menu titles)
  • Various Jinja templates

Parameters

Name  Type  Default  Description
slug  str            Slug or filename stem (e.g., "my-page-name", "data_model")

Returns

str

Human-readable title (e.g., "My Page Name", "Data Model")
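The transformation amounts to replacing hyphens and underscores with spaces and title-casing the result. The sketch below reproduces the documented examples; `humanize_slug_sketch` is a hypothetical name, and the use of `str.title()` is an assumption.

```python
def humanize_slug_sketch(slug: str) -> str:
    """Illustrative stand-in for humanize_slug (assumed behavior)."""
    # Treat both kebab-case and snake_case separators as word breaks.
    words = slug.replace("-", " ").replace("_", " ")
    return words.title()


print(humanize_slug_sketch("my-page-name"))  # My Page Name
print(humanize_slug_sketch("data_model"))    # Data Model
```

Note that `str.title()` lowercases everything after the first letter of each word, so an acronym such as "api" would come out as "Api" in this sketch.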