Module

lexers.yaml_sm

Hand-written YAML lexer using composable scanner mixins.

O(n) guaranteed, zero regex, thread-safe.

Language Support:

  • YAML 1.2 syntax
  • Block scalars (| and >)
  • Flow sequences and mappings
  • Anchors (&name) and aliases (*name)
  • Tags (!tag)
  • Multiple boolean spellings (yes/no, on/off, true/false)

Special Handling:

YAML is whitespace-sensitive, so the lexer tracks line-start context:

  • Keys are identified by a trailing ':'
  • Block indicators (|, >) start multiline scalars
  • Anchors/aliases use & and * prefixes
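The key and block-indicator rules above can be illustrated with a regex-free, single-pass scan. This is a minimal sketch, not the module's actual code: the helper names split_key and starts_block_scalar are hypothetical, and the sketch ignores quoted keys and flow context.

```python
from typing import Optional, Tuple


def split_key(line: str) -> Optional[Tuple[str, str]]:
    """Return (key, rest) if the line starts with `key:`, otherwise None.

    Hypothetical helper: one left-to-right pass, no regex.
    """
    text = line.rstrip("\r\n")
    start = len(text) - len(text.lstrip(" "))  # skip leading indentation
    n = len(text)
    i = start
    while i < n:
        ch = text[i]
        # A '#' at line start or after a space begins a comment.
        if ch == "#" and (i == start or text[i - 1] == " "):
            return None
        # A ':' counts as a key terminator only at end of line or before a space.
        if ch == ":" and (i + 1 == n or text[i + 1] == " "):
            return text[start:i], text[i + 1:].strip()
        i += 1
    return None


def starts_block_scalar(rest: str) -> bool:
    """True if the text after the key begins with a block indicator."""
    return rest[:1] in ("|", ">")
```

For example, split_key("  text: |") yields ("text", "|"), and starts_block_scalar("|") then signals that the following indented lines belong to a multiline scalar. A value such as "http://x" is not mistaken for a key because its colon is not followed by a space.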

Boolean values in YAML have many spellings (true, True, TRUE, yes, Yes, YES, on, On, ON, and the matching false/no/off forms). All are recognized as KEYWORD_CONSTANT.

Performance:

~60µs per 100-line file (YAML's complexity increases overhead).

Thread-Safety:

All lookup tables (_BOOL_VALUES, _NULL_VALUES) are frozen sets.

See Also:

  • rosettes.lexers.json_sm: JSON lexer (JSON is a subset of YAML)
  • rosettes.lexers.toml_sm: TOML lexer (similar config format)

Classes

YamlStateMachineLexer

YAML lexer using composable mixins.

Handles YAML's whitespace-sensitive syntax with context tracking.

Token Classification:

  • Keys: Identifiers followed by ':'
  • Booleans: true/false, yes/no, on/off (all case variants)
  • Null: null, Null, NULL, ~
  • Anchors: &name → NAME_LABEL
  • Aliases: *name → NAME_VARIABLE
  • Tags: !tag → NAME_TAG
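The value-level rules in this list can be sketched as a small classifier. This is an illustration under stated assumptions, not the lexer's implementation: classify_value is a hypothetical name, token types are shown as plain strings rather than the library's own token type objects, and mapping null spellings to KEYWORD_CONSTANT is an assumption (the text above states that type only for booleans). Keys are omitted because their token type is not given here.

```python
from typing import Optional

_BOOL_WORDS = frozenset({"true", "false", "yes", "no", "on", "off"})
_NULL_VALUES = frozenset({"null", "Null", "NULL", "~"})


def _is_bool(text: str) -> bool:
    """Accept only lower, Capitalized, and UPPER spellings (e.g. not 'tRuE')."""
    return text.lower() in _BOOL_WORDS and text in (
        text.lower(),
        text.capitalize(),
        text.upper(),
    )


def classify_value(text: str) -> Optional[str]:
    """Return a token type name for the value, or None if no rule applies."""
    if len(text) > 1 and text[0] == "&":
        return "NAME_LABEL"  # anchor: &name
    if len(text) > 1 and text[0] == "*":
        return "NAME_VARIABLE"  # alias: *name
    if text.startswith("!"):
        return "NAME_TAG"  # tag: !tag
    if _is_bool(text):
        return "KEYWORD_CONSTANT"
    if text in _NULL_VALUES:
        return "KEYWORD_CONSTANT"  # assumption: nulls share the boolean type
    return None
```

Checking the first character before consulting the tables keeps each classification to a constant number of comparisons, which fits the O(n) overall guarantee stated for the module.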

Methods

tokenize
def tokenize(self, code: str, config: LexerConfig | None = None, *, start: int = 0, end: int | None = None) -> Iterator[Token]
Parameters

  Name    Type                Default
  code    str                 (required)
  config  LexerConfig | None  None
  start   int                 0
  end     int | None          None

Returns

Iterator[Token]
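A hedged usage sketch based only on the signature above. SupportsTokenize and tokens_in_range are hypothetical names introduced for illustration. The helper forwards start and end unchanged, since this page does not say how those bounds are interpreted, and it materializes the iterator into a list for convenience.

```python
from typing import Any, Iterator, List, Optional, Protocol


class SupportsTokenize(Protocol):
    """Structural type mirroring the documented tokenize signature."""

    def tokenize(
        self,
        code: str,
        config: Optional[Any] = None,
        *,
        start: int = 0,
        end: Optional[int] = None,
    ) -> Iterator[Any]:
        ...


def tokens_in_range(
    lexer: SupportsTokenize,
    code: str,
    start: int = 0,
    end: Optional[int] = None,
) -> List[Any]:
    """Collect the tokens the lexer yields for the requested range."""
    # start and end are keyword-only in the documented signature.
    return list(lexer.tokenize(code, None, start=start, end=end))
```

Any object with a matching tokenize method can be passed in, including an instance of YamlStateMachineLexer when the library is installed. For large inputs, iterating over tokenize directly avoids building the full list.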