Module

lexers.java_sm

Hand-written Java lexer using composable scanner mixins.

Guaranteed O(n) in the input length, uses no regular expressions, and is thread-safe.

Language Support:

  • Java 21 syntax
  • Text blocks (triple-quoted strings)
  • Records, sealed classes, pattern matching
  • Annotations (@Override, @FunctionalInterface)
  • Lambda expressions
  • All numeric literal formats including underscores
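As an illustration of the numeric-literal item above, here is a minimal sketch of how integer literals with underscores might be scanned without regex. The function name `scan_integer` and the digit tables are hypothetical and not taken from this module; the sketch covers only decimal, hex, and binary integers with an optional `L` suffix, and it is deliberately lenient (a highlighter does not need to reject malformed literals).

```python
# Hypothetical digit tables; the real module's tables may differ.
DEC_DIGITS = frozenset("0123456789")
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")
BIN_DIGITS = frozenset("01")


def scan_integer(code: str, pos: int) -> int:
    """Return the index just past an integer literal at pos, or pos if none."""
    n = len(code)
    if pos >= n or code[pos] not in DEC_DIGITS:
        return pos
    digits = DEC_DIGITS
    end = pos
    prefix = code[pos:pos + 2].lower()
    if prefix == "0x":
        digits, end = HEX_DIGITS, pos + 2
    elif prefix == "0b":
        digits, end = BIN_DIGITS, pos + 2
    # Consume digits and underscore separators in a single forward pass.
    while end < n and (code[end] in digits or code[end] == "_"):
        end += 1
    # Optional long suffix.
    if end < n and code[end] in "lL":
        end += 1
    return end
```

Each character is visited at most once, which is what keeps a scanner of this shape linear in the input length.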

Architecture:

Reuses the C-style scanner mixins and adds the following Java-specific handling:

  • Annotations: @Name → NAME_DECORATOR
  • Text blocks: triple-quoted multiline strings
  • Package/import classification
  • JavaDoc comments: /** ... */ get special handling
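The mixin composition described above can be sketched roughly as follows. This is not the module's actual code: the class names, the `Token` shape, and the `STRING` and `TEXT` token type names are assumptions made for illustration, and only `NAME_DECORATOR` comes from the list above. The sketch ignores escapes inside text blocks and treats every other character as plain text.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    # Hypothetical token shape; the real Token type may differ.
    type: str
    value: str


class AnnotationScannerMixin:
    """Hypothetical mixin: scans `@Name` without regex."""

    def scan_annotation(self, code: str, pos: int) -> int:
        # Return the index just past `@Name`, or pos if none starts here.
        if pos >= len(code) or code[pos] != "@":
            return pos
        end = pos + 1
        while end < len(code) and (code[end].isalnum() or code[end] in "_."):
            end += 1
        # A bare "@" with no identifier is not treated as an annotation.
        return end if end > pos + 1 else pos


class TextBlockScannerMixin:
    """Hypothetical mixin: scans triple-quoted text blocks."""

    def scan_text_block(self, code: str, pos: int) -> int:
        if not code.startswith('"""', pos):
            return pos
        close = code.find('"""', pos + 3)
        # An unterminated block runs to the end of the input.
        return len(code) if close == -1 else close + 3


class MiniJavaLexer(AnnotationScannerMixin, TextBlockScannerMixin):
    """Toy lexer composed from the two mixins above."""

    def tokenize(self, code: str) -> Iterator[Token]:
        pos = 0
        while pos < len(code):
            end = self.scan_annotation(code, pos)
            if end > pos:
                yield Token("NAME_DECORATOR", code[pos:end])
                pos = end
                continue
            end = self.scan_text_block(code, pos)
            if end > pos:
                yield Token("STRING", code[pos:end])
                pos = end
                continue
            # Everything else is emitted one character at a time.
            yield Token("TEXT", code[pos])
            pos += 1
```

Each scanner either consumes input and returns a later index or returns `pos` unchanged, so the main loop always advances and never rescans a character.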

Performance:

~50µs per 100-line file.

Thread-Safety:

All lookup tables are frozen sets, which are immutable and can therefore be shared across threads without locking.
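A small sketch of why frozen-set lookup tables are safe to share, assuming a hypothetical `KEYWORDS` table and `classify_word` helper (neither name is taken from this module):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module-level table; a frozenset cannot be mutated after creation.
KEYWORDS = frozenset({"class", "record", "sealed", "permits", "interface"})


def classify_word(word: str) -> str:
    # Read-only membership test; no lock is needed because nothing can write.
    return "KEYWORD" if word in KEYWORDS else "NAME"


def classify_all(words):
    # Several threads read the same table concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(classify_word, words))
```

Because `frozenset` exposes no mutating methods such as `add` or `remove`, concurrent readers cannot observe a partially updated table.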

See Also:

  • rosettes.lexers.kotlin_sm: Kotlin lexer (JVM language)
  • rosettes.lexers.scala_sm: Scala lexer (JVM language)

Classes

JavaStateMachineLexer
Java lexer using composable mixins.

Methods

tokenize → Iterator[Token]
def tokenize(self, code: str, config: LexerConfig | None = None) -> Iterator[Token]
Parameters
Name    Type                 Description
code    str
config  LexerConfig | None   Default: None
Returns
Iterator[Token]