Module

`parsing.charsets`

Character sets for O(1) classification.

All sets are frozensets for:

O(1) membership testing (vs O(n) for strings)
Immutability (thread-safe)
Module-level caching (no per-call allocation)

Reference: CommonMark 0.31.2 specification

Usage:

from patitas.parsing.charsets import ASCII_PUNCTUATION

if char in ASCII_PUNCTUATION:  # O(1) lookup
    ...
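
The frozenset properties listed above can be illustrated with a minimal sketch. The module's actual definition of `ASCII_PUNCTUATION` is not shown here, so `ASCII_PUNCTUATION_SKETCH` and `count_punctuation` are hypothetical names, and building the set from `string.punctuation` is an assumption (that constant holds the same 32 characters CommonMark lists as ASCII punctuation).

```python
import string

# Hypothetical stand-in for the module's constant, built once at import time
# so no set is allocated per call. Assumption: string.punctuation matches the
# CommonMark ASCII punctuation characters.
ASCII_PUNCTUATION_SKETCH = frozenset(string.punctuation)


def count_punctuation(text: str) -> int:
    """Count ASCII punctuation characters via hash-based membership tests."""
    return sum(1 for ch in text if ch in ASCII_PUNCTUATION_SKETCH)
```

Because a frozenset cannot be mutated after creation, the same object can be shared across threads without locking.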

Functions

is_unicode_punctuation

def is_unicode_punctuation(char: str) -> bool

Check if character is Unicode punctuation: general category P (Pc, Pd, Pe, Pf, Pi, Po, Ps) or general category S (Sc, Sk, Sm, So).

CommonMark uses Unicode punctuation categories for flanking rules. This includes ASCII punctuation as a subset.

Parameters

Name	Type	Description
`char`	`str`	Character to classify.

Returns

bool
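
The category check described above could be sketched with the standard library's `unicodedata` module. This is not the library's implementation, which is not shown here; the name `is_unicode_punctuation_sketch` is hypothetical, and returning `False` for anything other than a single character is an assumption.

```python
import unicodedata


def is_unicode_punctuation_sketch(char: str) -> bool:
    """Return True if char is in a Unicode P* or S* general category."""
    # Assumption: inputs that are not exactly one character are not punctuation.
    if len(char) != 1:
        return False
    # unicodedata.category returns a two-letter code such as "Po" or "Sm";
    # the first letter identifies the major class.
    return unicodedata.category(char)[0] in ("P", "S")
```

ASCII punctuation needs no special case in this sketch, since every ASCII punctuation character already falls in a P or S category.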

is_unicode_whitespace

def is_unicode_whitespace(char: str) -> bool

Check if character is Unicode whitespace.

CommonMark uses Unicode whitespace for emphasis flanking rules. The check covers ASCII whitespace and Unicode category Zs (space separator). An empty string is also treated as whitespace (for boundary checks), because the flanking rules count the beginning and end of a line as whitespace.

Parameters

Name	Type	Description
`char`	`str`	Character to classify, or an empty string for a boundary.

Returns

bool
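
The behaviour described above, including the empty-string case, could be sketched as follows. The name `is_unicode_whitespace_sketch` and the helper set `_ASCII_WHITESPACE_SKETCH` are hypothetical. The exact ASCII whitespace characters are an assumption taken from the CommonMark definition (tab, line feed, form feed, carriage return, plus space, which is itself in Zs); the library's own set is not shown here.

```python
import unicodedata

# Assumption: ASCII whitespace as CommonMark defines it for this purpose.
_ASCII_WHITESPACE_SKETCH = frozenset(" \t\n\f\r")


def is_unicode_whitespace_sketch(char: str) -> bool:
    """Return True for ASCII whitespace, category Zs, or an empty string."""
    # An empty string stands for a line boundary and counts as whitespace.
    if char == "":
        return True
    # Assumption: multi-character inputs are not whitespace.
    if len(char) != 1:
        return False
    if char in _ASCII_WHITESPACE_SKETCH:
        return True
    return unicodedata.category(char) == "Zs"
```

Treating the empty string as whitespace lets a caller pass the neighbour of a delimiter run without first checking whether that neighbour exists.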