Module

utils.html

HTML utilities for Kida template engine.

Provides optimized HTML escaping, the Markup safe-string class, and related utilities. Zero external dependencies — pure Python 3.14+.

Security Hardening (RFC: Markup Security Hardening):

  • NUL byte stripping in all escaping
  • O(1) frozenset lookups for character classes
  • Attribute name validation in xmlattr()
  • Event handler attribute warnings
  • Context-specific escaping (JS, CSS, URL)

All operations are O(n) single-pass with no ReDoS risk.

Classes

Markup 33

A string subclass marking content as safe (won't be auto-escaped).

The Markup class implements the __html__ protocol used by template engines to identify pre-escaped content. When combined with regular strings via operators like +, the non-Markup strings are automatically escaped.

This is Kida's native implementation — no external dependencies required.
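The protocol described above can be sketched with a small stand-in class. This is an illustration only: `SafeText` and `to_html` are hypothetical names, and the standard library's `html.escape` stands in for Kida's own escaping.

```python
from html import escape as _stdlib_escape


def to_html(value: object) -> str:
    """Return value unchanged if it implements __html__, else escape it."""
    if hasattr(value, "__html__"):
        return str(value.__html__())
    return _stdlib_escape(str(value), quote=True)


class SafeText(str):
    """Hypothetical stand-in for Markup: a str subclass flagged as pre-escaped."""

    def __html__(self) -> "SafeText":
        # Engines that honor the protocol call this instead of escaping.
        return self

    def __add__(self, other: str) -> "SafeText":
        # Escape the right-hand side unless it already declares itself safe.
        return SafeText(str.__add__(self, to_html(other)))

    def __radd__(self, other: str) -> "SafeText":
        # Same rule when the plain string is on the left.
        return SafeText(str.__add__(to_html(other), self))


greeting = SafeText("<b>Hello</b> ") + "<script>"
print(greeting)
```

The trusted fragment passes through unchanged while the plain string is escaped, and the result stays marked as safe so it is not escaped a second time.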

Methods

format 0 Self
Format string, escaping non-Markup arguments.
def format(self, *args: Any, **kwargs: Any) -> Self
Returns
Self
join 1 Self
Join sequence, escaping non-Markup elements.
def join(self, seq: Iterable[str]) -> Self
Parameters
Name Type Description
seq
Returns
Self
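As a rough illustration of an escaping join, the sketch below uses a hypothetical `SafeText` class (not Kida's Markup) and the standard library's `html.escape`. Elements that implement `__html__` are kept as-is and everything else is escaped before joining.

```python
from collections.abc import Iterable
from html import escape


class SafeText(str):
    """Hypothetical stand-in for Markup (not Kida's class)."""

    def __html__(self) -> "SafeText":
        return self

    def join(self, seq: Iterable[str]) -> "SafeText":
        # Escape each element unless it implements __html__.
        parts = [
            str(item.__html__()) if hasattr(item, "__html__") else escape(str(item))
            for item in seq
        ]
        return SafeText(str.join(self, parts))


items = SafeText("<br>").join(["a & b", SafeText("<em>c</em>")])
print(items)
```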
capitalize 0 Self
def capitalize(self) -> Self
Returns
Self
casefold 0 Self
def casefold(self) -> Self
Returns
Self
center 2 Self
def center(self, width: SupportsIndex, fillchar: str = ' ') -> Self
Parameters
Name Type Description
width
fillchar Default:' '
Returns
Self
lower 0 Self
def lower(self) -> Self
Returns
Self
upper 0 Self
def upper(self) -> Self
Returns
Self
title 0 Self
def title(self) -> Self
Returns
Self
swapcase 0 Self
def swapcase(self) -> Self
Returns
Self
strip 1 Self
def strip(self, chars: str | None = None) -> Self
Parameters
Name Type Description
chars Default:None
Returns
Self
lstrip 1 Self
def lstrip(self, chars: str | None = None) -> Self
Parameters
Name Type Description
chars Default:None
Returns
Self
rstrip 1 Self
def rstrip(self, chars: str | None = None) -> Self
Parameters
Name Type Description
chars Default:None
Returns
Self
ljust 2 Self
def ljust(self, width: SupportsIndex, fillchar: str = ' ') -> Self
Parameters
Name Type Description
width
fillchar Default:' '
Returns
Self
rjust 2 Self
def rjust(self, width: SupportsIndex, fillchar: str = ' ') -> Self
Parameters
Name Type Description
width
fillchar Default:' '
Returns
Self
zfill 1 Self
def zfill(self, width: SupportsIndex) -> Self
Parameters
Name Type Description
width
Returns
Self
replace 3 Self
def replace(self, old: str, new: str, count: SupportsIndex = -1) -> Self
Parameters
Name Type Description
old
new
count Default:-1
Returns
Self
expandtabs 1 Self
def expandtabs(self, tabsize: SupportsIndex = 8) -> Self
Parameters
Name Type Description
tabsize Default:8
Returns
Self
split 2 list[Self]
def split(self, sep: str | None = None, maxsplit: SupportsIndex = -1) -> list[Self]
Parameters
Name Type Description
sep Default:None
maxsplit Default:-1
Returns
list[Self]
rsplit 2 list[Self]
def rsplit(self, sep: str | None = None, maxsplit: SupportsIndex = -1) -> list[Self]
Parameters
Name Type Description
sep Default:None
maxsplit Default:-1
Returns
list[Self]
splitlines 1 list[Self]
def splitlines(self, keepends: bool = False) -> list[Self]
Parameters
Name Type Description
keepends Default:False
Returns
list[Self]
partition 1 tuple[Self, Self, Self]
def partition(self, sep: str) -> tuple[Self, Self, Self]
Parameters
Name Type Description
sep
Returns
tuple[Self, Self, Self]
rpartition 1 tuple[Self, Self, Self]
def rpartition(self, sep: str) -> tuple[Self, Self, Self]
Parameters
Name Type Description
sep
Returns
tuple[Self, Self, Self]
striptags 0 Self
def striptags(self) -> Self

Remove HTML tags from the string.

IMPORTANT: This is for DISPLAY purposes only, not security.

This method uses regex-based tag removal which can be bypassed with malformed HTML. It is suitable for:

  • Displaying text previews
  • Extracting text content for search indexing
  • Formatting plain-text emails from HTML

For security (preventing XSS), always use html_escape() or the Markup class with autoescape enabled.

Returns
Self Markup with all HTML tags removed.
unescape 0 str
Convert HTML entities back to characters.
def unescape(self) -> str
Returns
str Plain string with entities decoded (no longer marked as safe).
escape 1 Self
classmethod
def escape(cls, value: Any) -> Self

Escape a value and wrap it as Markup.

This is the class method form of escaping — use for explicit escaping.

Parameters
Name Type Description
value

Value to escape. Objects with __html__() are used as-is.

Returns
Self Markup instance with the escaped content.
Internal Methods 8
__new__ 1 Self
Create a Markup string.
def __new__(cls, value: Any = '') -> Self
Parameters
Name Type Description
value

Content to mark as safe. If it has an __html__() method, that method is called to get the string value.

Default:''
Returns
Self Markup instance containing the safe content.
__html__ 0 Self
def __html__(self) -> Self

Return self — already safe content.

This method is the __html__ protocol that template engines use to detect pre-escaped content.

Returns
Self
__repr__ 0 str
def __repr__(self) -> str
Returns
str
__add__ 1 Self
Concatenate, escaping `other` if not Markup.
def __add__(self, other: str) -> Self
Parameters
Name Type Description
other
Returns
Self
__radd__ 1 Self
Reverse concatenate, escaping `other` if not Markup.
def __radd__(self, other: str) -> Self
Parameters
Name Type Description
other
Returns
Self
__mul__ 1 Self
Repeat string n times.
def __mul__(self, n: SupportsIndex) -> Self
Parameters
Name Type Description
n
Returns
Self
__rmul__ 1 Self
Repeat string n times (reverse).
def __rmul__(self, n: SupportsIndex) -> Self
Parameters
Name Type Description
n
Returns
Self
__mod__ 1 Self
Format string with %-style, escaping non-Markup args.
def __mod__(self, args: Any) -> Self
Parameters
Name Type Description
args
Returns
Self
JSString 2

A string safe for JavaScript string literal context.

Similar to Markup but for JavaScript strings instead of HTML. Prevents accidental double-escaping in JS contexts.

Methods

Internal Methods 2
__new__ 1 Self
def __new__(cls, value: Any = '') -> Self
Parameters
Name Type Description
value Default:''
Returns
Self
__repr__ 0 str
def __repr__(self) -> str
Returns
str
SoftStr 6

A string wrapper that defers __str__ evaluation.

Useful for expensive string operations that may not be needed. Commonly used with missing template variables or expensive lookups.

Thread-Safety: The lazy evaluation is NOT thread-safe. If you need thread-safe lazy evaluation, compute the value before passing to templates.
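A minimal sketch of deferred evaluation, under stated assumptions: `LazyText` is a hypothetical class, and caching the result after the first call is an assumption rather than documented SoftStr behavior. The comment marks where the thread-safety caveat comes from.

```python
from collections.abc import Callable


class LazyText:
    """Hypothetical sketch of a deferred string; not Kida's SoftStr."""

    def __init__(self, func: Callable[[], str]) -> None:
        self._func = func
        self._value: str | None = None

    def __str__(self) -> str:
        # Not thread-safe: two threads may both see None and both call _func.
        if self._value is None:
            self._value = self._func()
        return self._value


calls: list[int] = []


def expensive() -> str:
    calls.append(1)  # record each real evaluation
    return "computed"


lazy = LazyText(expensive)
evaluated_before = len(calls)  # still 0: nothing has run yet
first = str(lazy)
second = str(lazy)
```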

Methods

Internal Methods 6
__init__ 1
def __init__(self, func: Callable[[], str]) -> None
Parameters
Name Type Description
func
__str__ 0 str
def __str__(self) -> str
Returns
str
__html__ 0 str
def __html__(self) -> str

Support __html__ protocol - escape when rendered.

Handles the case where _func returns Markup properly.

Returns
str
__repr__ 0 str
def __repr__(self) -> str
Returns
str
__bool__ 0 bool
def __bool__(self) -> bool
Returns
bool
__len__ 0 int
def __len__(self) -> int
Returns
int

Functions

_escape_str 1 str
def _escape_str(s: str) -> str

Escape a string for HTML (internal helper).

Uses O(n) single-pass str.translate() for all strings.

Rationale: benchmarks on 2026-01-11 (Python 3.14) showed the previous frozenset intersection "fast path" was slower for 64–8192 byte inputs with no escapable characters (154ns → 5.9µs for translate-only vs 569ns → 50µs with intersection). We now always translate; CPython returns the original string when no substitutions are needed.

Complexity: O(n) single pass. No backtracking, no regex.

Parameters
Name Type Description
s str

String to escape.

Returns
str
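The single-pass approach can be sketched with `str.maketrans` and `str.translate`. The table below is an assumption for illustration: the entity spellings Kida emits (for example for the single quote) are not shown in this reference and may differ.

```python
# Assumed table; the entity spellings Kida emits may differ.
_ESCAPE_TABLE = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
    "\x00": None,  # mapping to None deletes the character
})


def escape_str_sketch(s: str) -> str:
    """Escape HTML metacharacters and drop NUL bytes in one pass."""
    return s.translate(_ESCAPE_TABLE)


print(escape_str_sketch('<a href="x">T&J</a>'))
```

Because every character is looked up once in a fixed table, the cost grows linearly with input length and there is no pattern matching that could backtrack.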
_escape_arg 1 Any
def _escape_arg(value: Any) -> Any

Escape a value if it's a string but not Markup.

Used for escaping format arguments.

Parameters
Name Type Description
value Any
Returns
Any
html_escape 1 str
def html_escape(value: Any) -> str

O(n) single-pass HTML escaping with type optimization.

Returns plain str (for template._escape use).

Complexity: O(n) single-pass using str.translate().

Security:

  • Escapes &, <, >, ", '
  • Strips NUL bytes (\x00) which can bypass some filters
  • Objects with __html__() are returned as-is (already safe)

Optimizations:

  1. Skip Markup objects (already safe)
  2. Skip objects with __html__() method (protocol-based safety)
  3. Skip numeric types (int, float, bool) - cannot contain HTML chars
  4. Single-pass translation instead of 5 chained .replace()

The numeric type optimization provides ~2.5x speedup for number-heavy templates (benchmarks/test_benchmark_optimization_levers.py).

Parameters
Name Type Description
value Any

Value to escape (will be converted to string)

Returns
str
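The type-based shortcuts can be sketched as follows. The function name, the `Trusted` class, and the entity table are all hypothetical; the sketch only illustrates the order of checks described above, not Kida's actual code.

```python
from typing import Any

# Assumed entity spellings, for illustration only.
_TABLE = str.maketrans({
    "&": "&amp;", "<": "&lt;", ">": "&gt;",
    '"': "&quot;", "'": "&#39;", "\x00": None,
})
_NUMERIC_TYPES = frozenset({int, float, bool})


def html_escape_sketch(value: Any) -> str:
    # Objects implementing __html__ (Markup included) are trusted as-is.
    if hasattr(value, "__html__"):
        return str(value.__html__())
    # Exact numeric types stringify without any HTML metacharacters.
    if type(value) in _NUMERIC_TYPES:
        return str(value)
    # Everything else: stringify, then translate in a single pass.
    return str(value).translate(_TABLE)


class Trusted:
    """Hypothetical object that declares its own HTML."""

    def __html__(self) -> str:
        return "<hr>"


print(html_escape_sketch(Trusted()), html_escape_sketch(42), html_escape_sketch("<b>"))
```

Checking the exact type rather than using `isinstance` is a deliberate choice in this sketch: a subclass of `int` could override `__str__` and return arbitrary text.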
html_escape_filter 1 Markup
def html_escape_filter(value: Any) -> Markup

HTML escape returning Markup (for filter use).

Returns Markup object so result won't be escaped again by autoescape.

Parameters
Name Type Description
value Any

Value to escape (will be converted to string)

Returns
Markup
_is_valid_attr_name 1 bool
def _is_valid_attr_name(name: str) -> bool

Check if attribute name is valid per HTML5 spec.

Uses O(n) single-pass check with frozenset for O(1) char lookup. No regex.

Parameters
Name Type Description
name str

Attribute name to validate.

Returns
bool
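A frozenset-based check might look like the sketch below. The forbidden set is an assumption that covers only part of the HTML syntax rules (space, quotes, `>`, `/`, `=`, and ASCII control characters); C1 controls and Unicode noncharacters are left out for brevity, and Kida's exact set may differ.

```python
# Assumed subset of characters the HTML syntax forbids in attribute names.
_INVALID_ATTR_CHARS = frozenset(
    " \"'>/=" + "".join(chr(c) for c in range(0x20)) + "\x7f"
)


def is_valid_attr_name_sketch(name: str) -> bool:
    """Single pass over the name with O(1) membership tests, no regex."""
    if not name:
        return False  # attribute names need at least one character
    return not any(ch in _INVALID_ATTR_CHARS for ch in name)


print(is_valid_attr_name_sketch("data-id"), is_valid_attr_name_sketch("on click"))
```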
xmlattr 1 Markup
def xmlattr(value: dict[str, Any]) -> Markup

Convert dict to XML/HTML attributes string.

Escapes attribute values and formats as key="value" pairs. Returns Markup to prevent double-escaping when autoescape is enabled.

Attribute ordering: Python 3.7+ dicts maintain insertion order. Output order matches input dict order.

Security:

  • Validates attribute names per HTML5 spec
  • Warns on event handler attributes (onclick, onerror, etc.)
  • Escapes all attribute values
Parameters
Name Type Description
value dict[str, Any]

Dictionary of attribute names to values.

Returns
Markup
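As a rough sketch of the formatting step only, the function below escapes values and emits key="value" pairs in dict order. Several things are assumptions: skipping `None` values, the `on` prefix test for event handlers, and returning a plain `str` where the real function returns Markup. Name validation is omitted.

```python
import warnings
from html import escape
from typing import Any


def xmlattr_sketch(attrs: dict[str, Any]) -> str:
    parts = []
    for name, value in attrs.items():  # dict preserves insertion order
        if value is None:
            continue  # assumption: None means "omit this attribute"
        if name.lower().startswith("on"):
            # Assumed heuristic for event handler attributes such as onclick.
            warnings.warn(f"event handler attribute: {name}", stacklevel=2)
        parts.append(f'{name}="{escape(str(value), quote=True)}"')
    return " ".join(parts)


attrs_html = xmlattr_sketch({"class": "btn", "title": 'Say "hi"', "hidden": None})
print(attrs_html)
```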
js_escape 1 str
def js_escape(value: Any) -> str

Escape a value for use inside JavaScript string literals.

This escapes characters that could break out of a JS string or inject code. Use this when embedding user data in inline scripts.

Complexity: O(n) single pass using str.translate().

Parameters
Name Type Description
value Any

Value to escape (will be converted to string).

Returns
str
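One way to approach this with `str.translate` is sketched below. The set of escaped characters is an assumption chosen for illustration; the characters Kida actually escapes are not listed in this reference.

```python
# Assumed escape set for a value placed inside a quoted JS string literal.
_JS_ESCAPES = str.maketrans({
    "\\": "\\\\",
    '"': '\\"',
    "'": "\\'",
    "\n": "\\n",
    "\r": "\\r",
    "<": "\\u003c",       # keeps "</script>" from closing the element
    ">": "\\u003e",
    "&": "\\u0026",
    "\u2028": "\\u2028",  # line separators were illegal in JS strings before ES2019
    "\u2029": "\\u2029",
    "\x00": None,         # drop NUL bytes
})


def js_escape_sketch(value: object) -> str:
    """Stringify, then escape in a single pass."""
    return str(value).translate(_JS_ESCAPES)


print(js_escape_sketch('</script>"'))
```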
css_escape 1 str
def css_escape(value: Any) -> str

Escape a value for use in CSS contexts.

Protects against breaking out of quotes in properties or injecting malicious content into url() or @import.

Complexity: O(n) single pass using str.translate().

Parameters
Name Type Description
value Any

Value to escape.

Returns
str
url_is_safe 1 bool
def url_is_safe(url: str) -> bool

Check if a URL has a safe protocol scheme.

Protects against javascript:, vbscript:, and data: URLs that can execute code when used in href/src attributes.

Uses window-based parsing: O(n) single pass, no regex.

Parameters
Name Type Description
url str

URL to check.

Returns
bool
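A regex-free scheme check can be sketched as a single scan for the first colon. The allow-list and the whitespace handling below are assumptions for illustration; the schemes Kida accepts are not listed in this reference.

```python
_SAFE_SCHEMES = frozenset({"http", "https", "mailto", "tel", "ftp"})  # assumed


def url_is_safe_sketch(url: str) -> bool:
    # Browsers ignore surrounding whitespace and embedded tabs/newlines,
    # so drop them before looking for the scheme.
    cleaned = "".join(ch for ch in url.strip() if ch not in "\t\r\n\x00")
    for i, ch in enumerate(cleaned):
        if ch == ":":
            # Everything before the first colon is the scheme.
            return cleaned[:i].lower() in _SAFE_SCHEMES
        if ch in "/?#":
            break  # reached path, query, or fragment: relative URL
    return True  # no scheme means a relative URL


print(url_is_safe_sketch("https://example.com"), url_is_safe_sketch("JaVaScRiPt:alert(1)"))
```

An allow-list is used here rather than a block-list of `javascript:`, `vbscript:`, and `data:` so that unknown schemes are rejected by default; that is a design choice of the sketch.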
safe_url 1 str
def safe_url(url: str) -> str

Return URL if safe, otherwise return fallback.

Use in templates where you need a safe URL value (href, src, etc.).

Parameters
Name Type Description
url str

URL to validate.

Returns
str
strip_tags 1 str
def strip_tags(value: str) -> str

Remove HTML tags from string.

IMPORTANT: This is for DISPLAY purposes only, not security.

This function uses regex-based tag removal which can be bypassed with malformed HTML. It is suitable for:

  • Displaying text previews
  • Extracting text content for search indexing
  • Formatting plain-text emails from HTML

For security (preventing XSS), always use html_escape() or the Markup class with autoescape enabled.

Uses pre-compiled regex for performance. O(n) single pass.

Parameters
Name Type Description
value str

String potentially containing HTML tags

Returns
str
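To see why regex-based stripping is unsuitable as a security measure, consider the sketch below. The pattern is an assumption for illustration (the regex Kida uses is not shown here), but the failure mode is typical: input that never closes its tag does not match and passes through untouched.

```python
import re

# Assumed pattern for illustration; not necessarily the one Kida compiles.
_TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_sketch(value: str) -> str:
    """Drop anything that looks like <...>; display use only."""
    return _TAG_RE.sub("", value)


preview = strip_tags_sketch("<p>Hello <b>world</b></p>")
# An unclosed tag never matches the pattern, so it survives untouched.
leaked = strip_tags_sketch("<img src=x onerror=alert(1)")
print(preview)
print(leaked)
```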
spaceless 1 str
Remove whitespace between HTML tags.
def spaceless(html_str: str) -> str
Parameters
Name Type Description
html_str str

HTML string

Returns
str
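A minimal sketch of the idea, assuming a simple pattern that collapses whitespace between a closing `>` and the next `<`. Stripping leading and trailing whitespace is also an assumption; whitespace inside text content is left alone.

```python
import re

_BETWEEN_TAGS = re.compile(r">\s+<")


def spaceless_sketch(html_str: str) -> str:
    """Remove whitespace that sits between two tags."""
    return _BETWEEN_TAGS.sub("><", html_str.strip())


compact = spaceless_sketch("<ul>\n  <li>a b</li>\n  <li>c</li>\n</ul>\n")
print(compact)
```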
format_html 1 Markup
def format_html(format_string: str, *args: Any, **kwargs: Any) -> Markup

Format a string with HTML escaping of all arguments.

Like str.format() but escapes all arguments for HTML safety. The format string itself is trusted (not escaped).

This is a convenience wrapper around Markup().format().

Parameters
Name Type Description
format_string str

Format string (trusted, not escaped).

*args

Positional arguments (escaped).

**kwargs

Keyword arguments (escaped).

Returns
Markup
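The behavior can be approximated with the standard library as shown below. This is a sketch under stated assumptions: `format_html_sketch` is a hypothetical name, it returns a plain `str` where the real function returns Markup, and only string arguments are escaped so that numeric format specs keep working.

```python
from html import escape
from typing import Any


def format_html_sketch(format_string: str, *args: Any, **kwargs: Any) -> str:
    def esc(value: Any) -> Any:
        if hasattr(value, "__html__"):
            return value.__html__()  # already safe
        if isinstance(value, str):
            return escape(value, quote=True)
        return value  # leave numbers alone so specs like {:.2f} work

    return format_string.format(
        *(esc(a) for a in args),
        **{k: esc(v) for k, v in kwargs.items()},
    )


row = format_html_sketch("<p>{} scored {:.2f}</p>", "<Bob>", 9.5)
print(row)
```

Only the arguments are escaped; the format string is inserted verbatim, so it should never be built from user input.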