Module

lexers.ruby_sm

Hand-written Ruby lexer using a state-machine approach.

O(n) guaranteed, zero regex, thread-safe.

Language Support:

  • Ruby 3.x syntax
  • Symbols (:name, :"string")
  • Regular expressions (/pattern/)
  • Here-documents (<<EOF, <<-EOF, <<~EOF)
  • String interpolation (#{expr})
  • Instance variables (@var) and class variables (@@var)
  • Global variables ($var)
  • Percent strings (%q, %Q, %w, %W, %i, %I, %r, %s, %x)
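As an illustration of how sigil-prefixed variables from the list above can be recognized without regular expressions, here is a minimal sketch. The function name `scan_variable` and the returned kind labels are hypothetical and are not taken from the module. The sketch also ignores special globals such as `$1` or `$!`.

```python
from __future__ import annotations


def scan_variable(code: str, pos: int) -> tuple[str, int] | None:
    """Return (kind, end_index) for a variable starting at pos, else None.

    Hypothetical helper for illustration; not the module's actual API.
    """
    n = len(code)
    if pos >= n:
        return None
    # Check the two-character sigil before the one-character one.
    if code.startswith("@@", pos):
        kind, i = "class", pos + 2
    elif code[pos] == "@":
        kind, i = "instance", pos + 1
    elif code[pos] == "$":
        kind, i = "global", pos + 1
    else:
        return None
    # The name must begin with a letter or underscore.
    if i >= n or not (code[i].isalpha() or code[i] == "_"):
        return None
    # Consume the rest of the identifier in a single forward pass.
    while i < n and (code[i].isalnum() or code[i] == "_"):
        i += 1
    return kind, i
```

Each character is visited once and the index only moves forward, which is the property that keeps this style of scanner linear in the input length.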

Special Handling:

Ruby has complex string/regex syntax with multiple delimiters:

  • Standard quotes: "string", 'string'
  • Percent literals: %q{string}, %r(regex), etc.
  • Here-documents: Multiline strings with custom delimiters
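The multiple-delimiter problem for percent literals can be sketched as follows. This is an assumption about one reasonable approach, not the module's implementation: `scan_percent_literal`, `PAIRED`, and `PERCENT_KINDS` are hypothetical names, and the bare `%(...)` form is deliberately left out.

```python
from __future__ import annotations

# Bracket-style openers close with a different character and may nest.
PAIRED = {"(": ")", "[": "]", "{": "}", "<": ">"}
# Type letters from the Language Support list (%q, %Q, %w, ...).
PERCENT_KINDS = frozenset("qQwWiIrsx")


def scan_percent_literal(code: str, pos: int) -> int | None:
    """Return the index just past the closing delimiter, or None.

    Hypothetical helper for illustration only.
    """
    n = len(code)
    if pos + 2 >= n or code[pos] != "%" or code[pos + 1] not in PERCENT_KINDS:
        return None
    opener = code[pos + 2]
    if opener.isalnum() or opener.isspace():
        return None
    # Non-bracket delimiters such as | close with the same character.
    closer = PAIRED.get(opener, opener)
    depth = 1
    i = pos + 3
    while i < n:
        ch = code[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == closer:  # tested first so same-character delimiters close
            depth -= 1
            if depth == 0:
                return i + 1
        elif ch == opener:
            depth += 1  # nested bracket pair
        i += 1
    return None  # unterminated literal
```

Testing the closer before the opener lets one loop handle both bracket pairs and same-character delimiters without a separate code path.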

Symbols can be barewords (:name) or quoted (:"name with spaces").
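A rough sketch of distinguishing the two symbol forms is shown below. The name `scan_symbol` is hypothetical, and the sketch covers only bareword and double-quoted symbols. Operator symbols such as `:+` are out of scope.

```python
from __future__ import annotations


def scan_symbol(code: str, pos: int) -> int | None:
    """Return the index just past a symbol starting at pos, or None.

    Hypothetical helper for illustration only.
    """
    n = len(code)
    if pos + 1 >= n or code[pos] != ":":
        return None
    i = pos + 1
    if code[i] == '"':
        # Quoted form: scan to the closing quote, honoring backslash escapes.
        i += 1
        while i < n:
            if code[i] == "\\":
                i += 2
                continue
            if code[i] == '"':
                return i + 1
            i += 1
        return None  # unterminated quoted symbol
    if code[i].isalpha() or code[i] == "_":
        # Bareword form: identifier characters, optionally ending in ? or !.
        while i < n and (code[i].isalnum() or code[i] == "_"):
            i += 1
        if i < n and code[i] in "?!":
            i += 1
        return i
    # Anything else (for example the :: scope operator) is not a symbol here.
    return None
```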

Performance:

~60µs per 100-line file (Ruby's syntax complexity adds overhead).

Thread-Safety:

All lookup tables are frozen sets.
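To illustrate why frozen sets help with thread safety, here is a small sketch. The table contents and the names `KEYWORDS`, `classify_word`, and `classify_all` are assumptions made for this example, not the module's actual tables.

```python
from concurrent.futures import ThreadPoolExecutor

# A frozenset cannot be mutated after creation, so concurrent readers
# need no locking. The contents here are an illustrative subset only.
KEYWORDS = frozenset(
    {"def", "end", "class", "module", "if", "else", "elsif", "unless", "while"}
)


def classify_word(word: str) -> str:
    """Classify a bareword using the shared immutable table."""
    return "keyword" if word in KEYWORDS else "name"


def classify_all(words: list[str]) -> list[str]:
    """Classify words from several threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(classify_word, words))
```

Because the table exposes no mutating methods, no thread can change it while another thread is reading it.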

See Also:

  • rosettes.lexers.python_sm: Similar dynamic language
  • rosettes.lexers.perl_sm: Similar regex/string handling

Classes

RubyStateMachineLexer
Ruby lexer.

Methods

tokenize Iterator[Token]
def tokenize(self, code: str, config: LexerConfig | None = None, *, start: int = 0, end: int | None = None) -> Iterator[Token]
Parameters
Name    Type                 Description
code    str
config  LexerConfig | None   Default: None
start   int                  Default: 0
end     int | None           Default: None
Returns
Iterator[Token]