Module

lexers.ruby_sm

Hand-written Ruby lexer using a state-machine approach.

Guaranteed O(n) in the input length, uses no regular expressions, and is thread-safe.

Language Support:

  • Ruby 3.x syntax
  • Symbols (:name, :"string")
  • Regular expressions (/pattern/)
  • Here-documents (<<EOF, <<-EOF, <<~EOF)
  • String interpolation (#{expr})
  • Instance variables (@var) and class variables (@@var)
  • Global variables ($var)
  • Percent strings (%q, %Q, %w, %W, %i, %I, %r, %s, %x)
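
To illustrate the single-pass, regex-free style described above, here is a minimal sketch that recognizes the three variable sigils from the list. The function name `scan_variables` and the token kind strings are hypothetical and are not taken from the rosettes source; the sketch also ignores strings, comments, and special globals such as `$!`.

```python
from typing import Iterator, Tuple


def scan_variables(code: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, text) for @var, @@var and $var in one left-to-right pass.

    Hypothetical helper: an illustration of the technique, not the library's code.
    """
    i, n = 0, len(code)
    while i < n:
        ch = code[i]
        if ch == "@":
            start = i
            i += 1
            kind = "instance_variable"
            # A second "@" upgrades the token to a class variable.
            if i < n and code[i] == "@":
                i += 1
                kind = "class_variable"
        elif ch == "$":
            start = i
            i += 1
            kind = "global_variable"
        else:
            i += 1
            continue
        # Consume the identifier that follows the sigil.
        name_start = i
        while i < n and (code[i].isalnum() or code[i] == "_"):
            i += 1
        if i > name_start:
            yield kind, code[start:i]


tokens = list(scan_variables("@a = @@count + $stdout"))
```

The index `i` only ever moves forward, so each character is examined a bounded number of times, which is what keeps a scanner of this shape linear in the input length.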

Special Handling:

Ruby has complex string/regex syntax with multiple delimiters:

  • Standard quotes: "string", 'string'
  • Percent literals: %q{string}, %r(regex), etc.
  • Here-documents: Multiline strings with custom delimiters

Symbols can be barewords (:name) or quoted (:"name with spaces").
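
The delimiter handling above can be sketched without regular expressions. The helpers below (`scan_percent_literal`, `scan_symbol`) are hypothetical names, written under the assumption that bracket-style delimiters nest and that a backslash escapes the next character; they are not the rosettes implementation and skip details such as operator symbols (`:+`) and interpolation.

```python
# Hypothetical tables for illustration only.
_PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}
_PERCENT_KINDS = frozenset("qQwWiIrsx")


def scan_percent_literal(code: str, pos: int) -> int:
    """Return the index just past the percent literal at pos, or pos if none."""
    if pos >= len(code) or code[pos] != "%":
        return pos
    i = pos + 1
    if i < len(code) and code[i] in _PERCENT_KINDS:
        i += 1
    if i >= len(code):
        return pos
    opener = code[i]
    if opener.isalnum() or opener.isspace():
        return pos  # e.g. the modulo operator in "a % b"
    closer = _PAIRS.get(opener, opener)
    depth = 1
    i += 1
    while i < len(code):
        ch = code[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == opener and closer != opener:
            depth += 1  # nested bracket, e.g. %q{a{b}c}
        elif ch == closer:
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return len(code)  # unterminated: consume to end of input


def scan_symbol(code: str, pos: int) -> int:
    """Return the index just past a :name or :"quoted" symbol, or pos if none."""
    if pos >= len(code) or code[pos] != ":":
        return pos
    i = pos + 1
    if i < len(code) and code[i] == '"':
        i += 1
        while i < len(code):
            if code[i] == "\\":
                i += 2
                continue
            if code[i] == '"':
                return i + 1
            i += 1
        return len(code)
    if i < len(code) and (code[i].isalpha() or code[i] == "_"):
        while i < len(code) and (code[i].isalnum() or code[i] == "_"):
            i += 1
        return i
    return pos
```

Returning `pos` unchanged signals "no literal here", which lets a caller fall through to its next rule without backtracking.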

Performance:

~60µs per 100-line file (Ruby's syntax complexity adds overhead).

Thread-Safety:

All lookup tables are frozen sets, which are immutable, so concurrent reads need no locking.
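
As a rough illustration of why frozen lookup tables help here, the sketch below classifies words from several threads against a shared `frozenset`. The table contents and the `classify` function are assumptions made for this example, not the library's actual tables.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table: a small subset of Ruby's reserved words.
KEYWORDS = frozenset(
    {"def", "end", "class", "module", "if", "else", "elsif",
     "unless", "while", "do", "return", "yield"}
)


def classify(word: str) -> str:
    """Read-only membership test; safe to call from any thread."""
    return "keyword" if word in KEYWORDS else "name"


words = ["def", "greet", "end", "puts"]
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves input order in its results.
    kinds = list(pool.map(classify, words))
```

Because a `frozenset` exposes no mutating methods, no thread can change the table while another is reading it.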

See Also:

  • rosettes.lexers.python_sm: Similar dynamic language
  • rosettes.lexers.perl_sm: Similar regex/string handling

Classes

RubyStateMachineLexer
Ruby lexer.

Methods

tokenize
def tokenize(self, code: str, config: LexerConfig | None = None) -> Iterator[Token]
Parameters
Name    Type                Description
code    str
config  LexerConfig | None  Default: None
Returns
Iterator[Token]