Module

`lexers.json_sm`

Hand-written JSON lexer optimized for speed.

O(n) guaranteed, zero regex, thread-safe.

Design Philosophy:

JSON has a minimal grammar (7 token types), so this lexer is optimized for raw speed rather than code reuse. All scanning is inlined to minimize function call overhead.

Language Support:

Standard JSON (RFC 8259)
Strings with escape sequences
Numbers (integers and floats with exponents)
Literals: true, false, null
Arrays and objects

Performance:

~25µs per 100-line file — fastest lexer in Rosettes due to JSON's simple grammar. No mixin overhead, all hot paths inlined.

Token Types Used:

STRING: "string values"
NUMBER: 123, 3.14, 1e10
KEYWORD_CONSTANT: true, false, null
PUNCTUATION: [ ] { } : ,
WHITESPACE: spaces, tabs, newlines
ERROR: invalid characters
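
The token types above can be illustrated with a minimal single-pass scanner. This is a sketch written for this page, not the Rosettes implementation: the function name `tokenize_json` and the `(type, text)` tuple shape are assumptions, and number scanning is deliberately loose (it accepts any run of digit, sign, dot, and exponent characters). It uses no regex and advances one index through the input, so each character is visited a constant number of times.

```python
from typing import Iterator

PUNCTUATION = "[]{}:,"
WHITESPACE = " \t\r\n"
DIGITS = "0123456789"
LITERALS = ("true", "false", "null")


def tokenize_json(code: str) -> Iterator[tuple[str, str]]:
    """Yield (token_type, text) pairs in one left-to-right pass (hypothetical helper)."""
    pos = 0
    length = len(code)
    while pos < length:
        start = pos
        char = code[pos]
        if char in WHITESPACE:
            while pos < length and code[pos] in WHITESPACE:
                pos += 1
            kind = "WHITESPACE"
        elif char in PUNCTUATION:
            pos += 1
            kind = "PUNCTUATION"
        elif char == '"':
            pos += 1
            while pos < length and code[pos] != '"':
                # Skip the character after a backslash so \" does not end the string.
                pos += 2 if code[pos] == "\\" else 1
            pos = min(pos + 1, length)  # consume the closing quote if present
            kind = "STRING"
        elif char == "-" or char in DIGITS:
            pos += 1
            # Loose scan: does not validate the number, only finds where it ends.
            while pos < length and code[pos] in "0123456789.eE+-":
                pos += 1
            kind = "NUMBER"
        else:
            for literal in LITERALS:
                if code.startswith(literal, pos):
                    pos += len(literal)
                    kind = "KEYWORD_CONSTANT"
                    break
            else:
                # No literal matched: emit a single invalid character.
                pos += 1
                kind = "ERROR"
        yield kind, code[start:pos]


source = '{"a": [1, -2.5e3, true, null]}'
tokens = list(tokenize_json(source))
```

Because every character lands in exactly one token, joining the token texts reproduces the input, which is a convenient sanity check for any lexer of this shape.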

Thread-Safety:

Uses only local variables in tokenize(). No class-level mutable state.
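
The thread-safety property can be sketched as follows. `LocalStateLexer` is a hypothetical stand-in, not the Rosettes class: it only splits text into punctuation and non-punctuation runs, but it follows the same rule of keeping the scan position in local variables and storing nothing on the instance. Under that assumption, one shared instance can serve many threads at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator


class LocalStateLexer:
    """Hypothetical stand-in: splits text into punctuation and other runs."""

    PUNCTUATION = frozenset("[]{}:,")  # class-level, but immutable

    def tokenize(self, code: str) -> Iterator[str]:
        # pos and start live only in this call's frame; nothing is written to self.
        pos = 0
        length = len(code)
        while pos < length:
            start = pos
            if code[pos] in self.PUNCTUATION:
                pos += 1
            else:
                while pos < length and code[pos] not in self.PUNCTUATION:
                    pos += 1
            yield code[start:pos]


lexer = LocalStateLexer()  # one instance shared by every thread
documents = ['{"id":%d,"tags":["a","b"]}' % i for i in range(50)]

with ThreadPoolExecutor(max_workers=8) as pool:
    concurrent_results = list(
        pool.map(lambda doc: list(lexer.tokenize(doc)), documents)
    )

# The same work done on a single thread, for comparison.
sequential_results = [list(lexer.tokenize(doc)) for doc in documents]
```

Each call to `tokenize()` creates its own generator frame, so concurrent calls cannot observe each other's position, and the concurrent and sequential results should match.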

See Also:

rosettes.lexers.yaml_sm: YAML lexer (superset of JSON)
rosettes.lexers.toml_sm: TOML lexer (similar config format)

Classes

JsonStateMachineLexer

JSON lexer optimized for minimal overhead.

JSON has a simple grammar — this lexer optimizes for raw speed with all scanning inlined (no mixin overhead).

Methods

tokenize

def tokenize(self, code: str, config: LexerConfig | None = None, *, start: int = 0, end: int | None = None) -> Iterator[Token]

Parameters

Name	Type	Description
`code`	`str`	—
`config`	`LexerConfig | None`	Default: `None`
`start`	`int`	Default: `0`
`end`	`int | None`	Default: `None`

Returns

Iterator[Token]