Module

lexers.json_sm

Hand-written JSON lexer optimized for speed.

O(n) guaranteed, zero regex, thread-safe.

Design Philosophy:

JSON has a minimal grammar (7 token types), so this lexer is optimized for raw speed rather than code reuse. All scanning is inlined to minimize function call overhead.
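To make the "all scanning inlined" idea concrete, here is a minimal sketch of a single-pass, regex-free JSON tokenizer. It is not the Rosettes implementation: the `Tok` record, the module-level character sets, and the deliberately permissive number scanning are all assumptions made for illustration. Only the token type names are taken from this page.

```python
from typing import Iterator, NamedTuple


class Tok(NamedTuple):
    """Hypothetical token record; the real Rosettes Token type may differ."""
    type: str
    value: str


_WS = " \t\r\n"
_PUNCT = "[]{}:,"
_DIGITS = "0123456789"
_NUM_CHARS = "0123456789+-.eE"  # permissive: does not validate number syntax


def tokenize(code: str) -> Iterator[Tok]:
    """Scan left to right once; `pos` only ever moves forward."""
    pos, n = 0, len(code)
    while pos < n:
        ch = code[pos]
        start = pos
        if ch in _WS:
            pos += 1
            while pos < n and code[pos] in _WS:
                pos += 1
            yield Tok("WHITESPACE", code[start:pos])
        elif ch in _PUNCT:
            pos += 1
            yield Tok("PUNCTUATION", ch)
        elif ch == '"':
            pos += 1
            while pos < n and code[pos] != '"':
                # Step over the escaped character so \" does not end the string.
                pos += 2 if code[pos] == "\\" else 1
            pos = min(pos + 1, n)  # consume the closing quote if there is one
            yield Tok("STRING", code[start:pos])
        elif ch == "-" or ch in _DIGITS:
            pos += 1
            while pos < n and code[pos] in _NUM_CHARS:
                pos += 1
            yield Tok("NUMBER", code[start:pos])
        elif code.startswith("true", pos) or code.startswith("null", pos):
            pos += 4
            yield Tok("KEYWORD_CONSTANT", code[start:pos])
        elif code.startswith("false", pos):
            pos += 5
            yield Tok("KEYWORD_CONSTANT", code[start:pos])
        else:
            pos += 1
            yield Tok("ERROR", ch)
```

Because each branch advances `pos` by at least one character and never backtracks, the running time of this sketch is linear in the input length. Joining the `value` fields of the tokens reproduces the input exactly, which is a convenient sanity check for any lexer written in this style.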

Language Support:

  • Standard JSON (RFC 8259)
  • Strings with escape sequences
  • Numbers (integers and floats with exponents)
  • Literals: true, false, null
  • Arrays and objects

Performance:

~25µs per 100-line file — fastest lexer in Rosettes due to JSON's simple grammar. No mixin overhead, all hot paths inlined.

Token Types Used:

  • STRING: "string values"
  • NUMBER: 123, 3.14, 1e10
  • KEYWORD_CONSTANT: true, false, null
  • PUNCTUATION: [ ] { } : ,
  • WHITESPACE: spaces, tabs, newlines
  • ERROR: invalid characters

Thread-Safety:

Uses only local variables in tokenize(). No class-level mutable state.
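The following sketch illustrates why keeping scan state in local variables allows one lexer instance to be shared across threads. `TinyLexer` and its `TEXT` token type are hypothetical stand-ins, not part of Rosettes; the point is only that `tokenize()` never writes to `self`, so concurrent calls cannot interfere with each other.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator


class TinyLexer:
    """Hypothetical stand-in lexer: tokenize() never assigns to self."""

    def tokenize(self, code: str) -> Iterator[tuple[str, str]]:
        pos, n = 0, len(code)  # all scan state lives in this call's locals
        while pos < n:
            start = pos
            if code[pos].isspace():
                while pos < n and code[pos].isspace():
                    pos += 1
                yield ("WHITESPACE", code[start:pos])
            else:
                while pos < n and not code[pos].isspace():
                    pos += 1
                yield ("TEXT", code[start:pos])


shared = TinyLexer()  # one instance used by every worker thread
inputs = [f'{{"id": {i}}}  [{i}, {i + 1}]' for i in range(200)]


def run(code: str) -> list[tuple[str, str]]:
    return list(shared.tokenize(code))


sequential = [run(code) for code in inputs]
with ThreadPoolExecutor(max_workers=8) as pool:
    concurrent = list(pool.map(run, inputs))

# Each call owns its own generator frame, so results match the sequential run.
assert concurrent == sequential
```

If the position were stored on the instance instead (for example as `self.pos`), two threads calling `tokenize()` on the same object could overwrite each other's progress.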

See Also:

  • rosettes.lexers.yaml_sm: YAML lexer (superset of JSON)
  • rosettes.lexers.toml_sm: TOML lexer (similar config format)

Classes

JsonStateMachineLexer

JSON lexer optimized for minimal overhead.

JSON has a simple grammar — this lexer optimizes for raw speed with all scanning inlined (no mixin overhead).

Methods

tokenize

def tokenize(self, code: str, config: LexerConfig | None = None) -> Iterator[Token]

Parameters

  • code: str
  • config: LexerConfig | None (default: None)

Returns

Iterator[Token]