Module

lexers.javascript_sm

Hand-written JavaScript lexer using composable scanner mixins.

O(n) guaranteed, zero regex, thread-safe.
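The O(n), regex-free guarantee comes from scanning with a single cursor that only ever moves forward. The sketch below illustrates that style and is not the rosettes implementation; scan_words and its token kinds are hypothetical names. Every branch consumes at least one character and nothing backtracks, so the total work is linear in len(code).

```python
def scan_words(code: str) -> list[tuple[str, str]]:
    """Split code into (kind, text) pairs using one forward-only cursor."""
    tokens: list[tuple[str, str]] = []
    i = 0
    n = len(code)
    while i < n:
        ch = code[i]
        start = i
        if ch.isspace():
            # Collapse a run of whitespace into a single token.
            while i < n and code[i].isspace():
                i += 1
            tokens.append(("ws", code[start:i]))
        elif ch.isalpha() or ch in "_$":
            # Identifiers: a letter, '_' or '$', then letters, digits, '_' or '$'.
            i += 1
            while i < n and (code[i].isalnum() or code[i] in "_$"):
                i += 1
            tokens.append(("name", code[start:i]))
        elif ch.isdigit():
            while i < n and code[i].isdigit():
                i += 1
            tokens.append(("number", code[start:i]))
        else:
            # Anything else becomes a one-character punctuation token.
            i += 1
            tokens.append(("punct", ch))
    return tokens
```

Because every character lands in exactly one token, joining the token texts reproduces the input, which is a cheap invariant to test in a hand-written lexer.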

Language Support:

  • ECMAScript 2024 (ES15) syntax
  • Template literals (backtick strings with ${} interpolation)
  • BigInt literals (123n suffix)
  • Optional chaining (?.) and nullish coalescing (??)
  • async/await, generators, classes
  • All standard operators, including ** (exponentiation)
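Template literals are the item in this list that a flat delimiter check cannot handle, because ${...} may contain nested braces and even nested template literals. A minimal sketch, assuming the lexer only needs the end offset of the literal (template_end and _interpolation_end are hypothetical helpers, not the rosettes scanner), tracks brace depth inside each interpolation. For brevity it ignores quoted strings and comments inside ${...}, so a "}" inside such a string would end the interpolation early.

```python
def template_end(code: str, start: int) -> int:
    """Return the index just past the template literal that opens at code[start]."""
    if code[start] != "`":
        raise ValueError("template literal must start with a backtick")
    i = start + 1
    n = len(code)
    while i < n:
        ch = code[i]
        if ch == "\\":
            i += 2  # skip the backslash and the escaped character
        elif ch == "`":
            return i + 1
        elif ch == "$" and i + 1 < n and code[i + 1] == "{":
            i = _interpolation_end(code, i + 2)
        else:
            i += 1
    raise ValueError("unterminated template literal")


def _interpolation_end(code: str, i: int) -> int:
    """Return the index just past the '}' that closes a ${...} interpolation."""
    depth = 1
    n = len(code)
    while i < n:
        ch = code[i]
        if ch == "`":
            i = template_end(code, i)  # nested template literal
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated ${...} interpolation")
```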

Architecture:

This lexer demonstrates the mixin composition pattern. Most scanning logic is inherited from reusable mixins:

  • CStyleCommentsMixin: // and /* */ comments
  • CStyleNumbersMixin: Hex, octal, binary, floats with exponents
  • CStyleStringsMixin: Double- and single-quoted strings with escapes, plus backtick strings
  • CStyleOperatorsMixin: Configurable multi-char operators

Only language-specific parts are implemented in this class:

  • Keyword classification
  • Identifier handling ($ is allowed in identifiers)
  • Language-specific token types

Performance:

~45µs per 100-line file, benefiting from optimized mixin code.

Thread-Safety:

All lookup tables (_KEYWORDS, _BUILTINS, etc.) are frozensets. Mixins use only local variables in their scanning methods, so no mutable state is shared between calls.

See Also:

  • rosettes.lexers._scanners: Mixin definitions and configuration
  • rosettes.lexers.typescript_sm: TypeScript extends this pattern

Classes

JavaScriptStateMachineLexer

JavaScript/ECMAScript lexer using composable mixins.

Supports ES2024 syntax with all modern features. Most scanning logic is inherited from C-style mixins; only the JS-specific parts live in this class.

Configuration:

  • NUMBER_CONFIG: Enables the BigInt suffix ('n')
  • STRING_CONFIG: Enables template literals (backticks)
  • OPERATOR_CONFIG: JS-specific operators (===, ??, ?., etc.)

Token Classification:

  • Declaration keywords: function, class, const, let, var
  • Namespace keywords: import, export, from
  • Constants: true, false, null, undefined, NaN, Infinity
  • Builtins: Array, Promise, console, window, etc.
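The classification above boils down to a few frozenset lookups checked in a fixed order. Below is a hypothetical sketch of a _classify_word-style function: the category names are plain strings rather than the real TokenType members, the tables are abbreviated, and both the "keyword" table contents and the check order are assumptions.

```python
_DECLARATIONS = frozenset({"function", "class", "const", "let", "var"})
_NAMESPACE = frozenset({"import", "export", "from"})
_CONSTANTS = frozenset({"true", "false", "null", "undefined", "NaN", "Infinity"})
_KEYWORDS = frozenset({"if", "else", "return", "async", "await", "new", "typeof"})
_BUILTINS = frozenset({"Array", "Promise", "console", "window"})


def classify_word(word: str) -> str:
    """Map an identifier to a coarse category (illustrative names only)."""
    if word in _DECLARATIONS:
        return "keyword.declaration"
    if word in _NAMESPACE:
        return "keyword.namespace"
    if word in _CONSTANTS:
        return "keyword.constant"
    if word in _KEYWORDS:
        return "keyword"
    if word in _BUILTINS:
        return "name.builtin"
    return "name"  # any other identifier
```

Each check is an O(1) average-case set lookup, so classification adds no more than a constant cost per identifier to the linear scan.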

Methods

tokenize
Tokenize JavaScript source code.
def tokenize(self, code: str, config: LexerConfig | None = None) -> Iterator[Token]
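Parameters

  • code (str): JavaScript source text to tokenize.
  • config (LexerConfig | None, default None): Optional lexer configuration.

Returns

  • Iterator[Token]: The tokens scanned from code.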
Internal Methods
_classify_word
Classify an identifier and return its TokenType.
def _classify_word(self, word: str) -> TokenType
Parameters

  • word (str): The identifier text to classify.

Returns

  • TokenType: The token type assigned to word.