Module

parsing.charsets

Character sets for O(1) classification.

All sets are frozensets for:

  • O(1) membership testing (vs O(n) for strings)
  • Immutability (thread-safe)
  • Module-level caching (no per-call allocation)

Reference: CommonMark 0.31.2 specification

Usage:

from patitas.parsing.charsets import ASCII_PUNCTUATION

if char in ASCII_PUNCTUATION:  # O(1) lookup
    ...
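
The benefits listed above can be illustrated with a minimal sketch. The names `ASCII_PUNCTUATION_SKETCH` and `count_punctuation` are hypothetical stand-ins, not part of `patitas`; the set is built from the standard library's `string.punctuation`, which is assumed here to match the module's `ASCII_PUNCTUATION`.

```python
import string

# Hypothetical stand-in for the module's ASCII_PUNCTUATION constant.
# Built once at import time, so no allocation happens per call.
ASCII_PUNCTUATION_SKETCH = frozenset(string.punctuation)


def count_punctuation(text: str) -> int:
    """Count ASCII punctuation characters using hash-based membership tests."""
    return sum(1 for ch in text if ch in ASCII_PUNCTUATION_SKETCH)


if __name__ == "__main__":
    print(count_punctuation("*emphasis* and _more_!"))
```

Because a frozenset cannot be mutated after creation, the same object can be shared by every caller without copying or locking.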

Functions

is_unicode_punctuation
def is_unicode_punctuation(char: str) -> bool

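Check whether a character belongs to a Unicode punctuation category (Pc, Pd, Pe, Pf, Pi, Po, Ps) or a Unicode symbol category (Sc, Sk, Sm, So).

CommonMark uses these categories in its emphasis flanking rules. Every ASCII punctuation character falls into one of them, so ASCII punctuation is covered as a subset.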

Parameters
Name Type Description
char str
Returns
bool
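
A check like the one described for `is_unicode_punctuation` could be sketched with the standard library's `unicodedata.category`. This is an illustrative approximation under the categories listed in the description, not the module's actual implementation; `sketch_is_unicode_punctuation` and `PUNCTUATION_CATEGORIES` are hypothetical names, and returning `False` for anything other than a single character is an assumption.

```python
import unicodedata

# Assumption: the general categories named in the description above.
PUNCTUATION_CATEGORIES = frozenset(
    {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps", "Sc", "Sk", "Sm", "So"}
)


def sketch_is_unicode_punctuation(char: str) -> bool:
    """Return True if char is one character in a punctuation or symbol category."""
    # unicodedata.category() requires exactly one character.
    if len(char) != 1:
        return False
    return unicodedata.category(char) in PUNCTUATION_CATEGORIES


if __name__ == "__main__":
    for ch in ["!", "$", "\u2014", "a", " "]:
        print(repr(ch), sketch_is_unicode_punctuation(ch))
```
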
is_unicode_whitespace
def is_unicode_whitespace(char: str) -> bool

Check if character is Unicode whitespace.

CommonMark uses Unicode whitespace in its emphasis flanking rules. The check covers ASCII whitespace and characters in Unicode category Zs (space separator). An empty string is also treated as whitespace, which supports boundary checks.

Parameters
Name Type Description
char str
Returns
bool
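
The behaviour described for `is_unicode_whitespace` could be approximated as follows. This is a sketch only, not the module's implementation: `sketch_is_unicode_whitespace` and `ASCII_WHITESPACE_SKETCH` are hypothetical names, and the choice of ASCII whitespace characters (space, tab, line feed, form feed, carriage return) is an assumption.

```python
import unicodedata

# Assumption: space, tab, line feed, form feed, and carriage return.
ASCII_WHITESPACE_SKETCH = frozenset(" \t\n\x0c\r")


def sketch_is_unicode_whitespace(char: str) -> bool:
    """Return True for an empty string, ASCII whitespace, or a Zs character."""
    # The empty string stands in for the start or end of input.
    if char == "":
        return True
    if char in ASCII_WHITESPACE_SKETCH:
        return True
    # unicodedata.category() requires exactly one character.
    return len(char) == 1 and unicodedata.category(char) == "Zs"


if __name__ == "__main__":
    for ch in ["", " ", "\t", "\u00a0", "a"]:
        print(repr(ch), sketch_is_unicode_whitespace(ch))
```

Treating the empty string as whitespace lets a caller pass the character before position 0 or after the last position as `""` without a separate branch.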