Module

lexers.go_sm

Hand-written Go lexer using composable scanner mixins.

Runs in guaranteed O(n) time in the input length, uses no regular expressions, and is thread-safe.

Language Support:

  • Go 1.21+ syntax
  • Raw strings (backtick literals)
  • Runes (character literals)
  • Imaginary numbers (1i, 3.14i)
  • Channel operator (<-)
  • Short variable declaration (:=)
  • All standard operators
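Operators such as `<-` and `:=` share a prefix with shorter operators (`<`, `:`), so a hand-written scanner typically tries the longest candidate first (maximal munch). The sketch below illustrates that idea; the operator tables and the `scan_operator` and `split_operators` helpers are hypothetical and not taken from rosettes.

```python
# Hypothetical operator tables, grouped by length so the longest match wins.
OPERATORS_3 = frozenset({"<<=", ">>=", "&^=", "..."})
OPERATORS_2 = frozenset({
    "<-", ":=", "==", "!=", "<=", ">=", "&&", "||", "++", "--",
    "+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=", "<<", ">>", "&^",
})
OPERATORS_1 = frozenset("+-*/%&|^<>=!()[]{},;.:~")


def scan_operator(code: str, pos: int) -> int:
    """Return the end index of the operator at code[pos], or pos if none."""
    for length, table in ((3, OPERATORS_3), (2, OPERATORS_2), (1, OPERATORS_1)):
        if code[pos:pos + length] in table:
            return pos + length
    return pos


def split_operators(code: str) -> list[str]:
    """Collect every operator in code, skipping all other characters."""
    ops, pos = [], 0
    while pos < len(code):
        end = scan_operator(code, pos)
        if end == pos:  # not an operator: move on by one character
            pos += 1
            continue
        ops.append(code[pos:end])
        pos = end
    return ops
```

For example, `split_operators("x := <-ch")` yields `[":=", "<-"]` rather than the four single-character tokens `:`, `=`, `<`, `-`.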

Go-Specific Features:

  • Exported Names: In Go, a name is exported (public) if it starts with an uppercase letter. The lexer detects this and classifies such names as NAME_CLASS for visual distinction.

  • Raw Strings: Backtick-delimited strings can span multiple lines and contain literal newlines without escaping.

  • Imaginary Numbers: Numbers can take an i suffix to form imaginary literals (e.g., 2i or 3.14i), which are used to build complex values.
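A minimal sketch of how the raw-string and imaginary-number rules can be scanned without regular expressions. The helpers `scan_raw_string` and `scan_decimal_number` are hypothetical, and the number scanner assumes decimal literals only (no `0x`, `0o`, or `0b` prefixes).

```python
DIGITS = frozenset("0123456789")


def scan_raw_string(code: str, pos: int) -> int:
    """Scan a raw string whose opening backtick is at code[pos].

    Raw strings have no escapes and may contain newlines, so the literal
    simply ends at the next backtick. Returns the index just past it, or
    len(code) if the literal is unterminated.
    """
    end = code.find("`", pos + 1)
    return len(code) if end == -1 else end + 1


def scan_decimal_number(code: str, pos: int) -> tuple[int, bool]:
    """Scan a decimal int or float literal starting at code[pos].

    Returns (end, is_imaginary); a trailing 'i' marks an imaginary literal.
    """
    n = len(code)
    i = pos
    while i < n and (code[i] in DIGITS or code[i] == "_"):
        i += 1
    if i < n and code[i] == ".":  # optional fraction
        i += 1
        while i < n and (code[i] in DIGITS or code[i] == "_"):
            i += 1
    if i < n and code[i] in "eE":  # optional exponent, only if digits follow
        j = i + 1
        if j < n and code[j] in "+-":
            j += 1
        if j < n and code[j] in DIGITS:
            i = j
            while i < n and (code[i] in DIGITS or code[i] == "_"):
                i += 1
    if i < n and code[i] == "i":  # imaginary suffix
        return i + 1, True
    return i, False
```

Both helpers only move forward through the input, which keeps each scan linear in the literal's length.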

Performance:

~40µs per 100-line file (Go has simple, regular syntax).

Thread-Safety:

All lookup tables are frozen sets. Scanning uses local variables only.

See Also:

  • rosettes.lexers.rust_sm: Similar systems language lexer
  • rosettes.lexers._scanners: Shared mixin implementations

Classes

GoStateMachineLexer

Go lexer using composable mixins.

Go has clean, regular syntax making it one of the simpler lexers.

Token Classification:

  • Declaration keywords: func, type, struct, interface, const, var
  • Namespace keywords: import, package
  • Constants: true, false, nil, iota
  • Types: Primitive types (int, string, bool, etc.)
  • Builtins: make, len, cap, append, etc.

Special Handling:

  • Exported names (starting with uppercase) → NAME_CLASS
  • Raw strings (backticks) can span multiple lines
  • Runes (character literals) use single quotes
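Rune literals need escape handling so that `'\''` and `'\\'` end at the correct quote. The following is a sketch of a hypothetical `scan_rune` helper, assuming the scanner is positioned on the opening single quote; it also stops at a newline, since a rune literal cannot span lines.

```python
def scan_rune(code: str, pos: int) -> int:
    """Scan a rune literal whose opening quote is at code[pos].

    A backslash skips the character after it, so escaped quotes do not end
    the literal. Returns the index just past the closing quote, the index
    of a newline for an unterminated literal, or len(code) at end of input.
    """
    n = len(code)
    i = pos + 1
    while i < n:
        ch = code[i]
        if ch == "\\":
            i += 2  # skip the escaped character
        elif ch == "'":
            return i + 1
        elif ch == "\n":
            return i  # unterminated: do not swallow the next line
        else:
            i += 1
    return n
```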

Methods

tokenize → Iterator[Token]
Tokenize Go source code.
def tokenize(self, code: str, config: LexerConfig | None = None) -> Iterator[Token]
Parameters

  • code (str): Go source code to tokenize.
  • config (LexerConfig | None, default None): Optional lexer configuration.
Returns
Iterator[Token]
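To illustrate the shape of this API, here is a much-simplified, hypothetical generator in the same style: it yields tokens lazily and keeps its scan position in local variables, which is what allows concurrent calls to run safely. `Kind`, `Tok`, and the four token categories are stand-ins rather than the rosettes `TokenType` and `Token` classes, and the `config` parameter is omitted.

```python
from collections.abc import Iterator
from enum import Enum, auto
from typing import NamedTuple


class Kind(Enum):  # hypothetical stand-in for rosettes' TokenType
    NAME = auto()
    NUMBER = auto()
    WHITESPACE = auto()
    PUNCT = auto()


class Tok(NamedTuple):  # hypothetical stand-in for rosettes' Token
    kind: Kind
    value: str


def tokenize(code: str) -> Iterator[Tok]:
    pos, n = 0, len(code)  # all state is local, so calls cannot interfere
    while pos < n:
        ch = code[pos]
        start = pos
        if ch.isspace():
            while pos < n and code[pos].isspace():
                pos += 1
            kind = Kind.WHITESPACE
        elif ch.isalpha() or ch == "_":
            while pos < n and (code[pos].isalnum() or code[pos] == "_"):
                pos += 1
            kind = Kind.NAME
        elif ch in "0123456789":
            while pos < n and code[pos] in "0123456789":
                pos += 1
            kind = Kind.NUMBER
        else:
            pos += 1  # any other character becomes a one-char token
            kind = Kind.PUNCT
        yield Tok(kind, code[start:pos])
```

Every branch advances `pos` by at least one character, so the loop always terminates, and concatenating the token values reproduces the input exactly.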
Internal Methods
_classify_word → TokenType
Classify an identifier.
def _classify_word(self, word: str) -> TokenType
Parameters

  • word (str): Identifier to classify.
word
Returns
TokenType
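A sketch of how `_classify_word` might map identifiers onto the categories listed for this class, using frozenset lookup tables. The `TokenKind` members (other than the documented NAME_CLASS), the exact table contents, and the check order are assumptions; the table entries follow the Go 1.21 keywords, predeclared types, and builtins.

```python
from enum import Enum, auto


class TokenKind(Enum):  # hypothetical stand-in for rosettes' TokenType
    KEYWORD = auto()
    KEYWORD_DECLARATION = auto()
    KEYWORD_NAMESPACE = auto()
    KEYWORD_CONSTANT = auto()
    KEYWORD_TYPE = auto()
    NAME_BUILTIN = auto()
    NAME_CLASS = auto()
    NAME = auto()


# Immutable tables: safe to share between threads.
DECLARATIONS = frozenset({"func", "type", "struct", "interface", "const", "var"})
NAMESPACE = frozenset({"import", "package"})
CONSTANTS = frozenset({"true", "false", "nil", "iota"})
TYPES = frozenset({
    "bool", "byte", "complex64", "complex128", "error", "float32", "float64",
    "int", "int8", "int16", "int32", "int64", "rune", "string",
    "uint", "uint8", "uint16", "uint32", "uint64", "uintptr",
})
BUILTINS = frozenset({
    "append", "cap", "clear", "close", "complex", "copy", "delete", "imag",
    "len", "make", "max", "min", "new", "panic", "print", "println", "real",
    "recover",
})
KEYWORDS = frozenset({
    "break", "case", "chan", "continue", "default", "defer", "else",
    "fallthrough", "for", "go", "goto", "if", "map", "range", "return",
    "select", "switch",
})


def classify_word(word: str) -> TokenKind:
    if word in DECLARATIONS:
        return TokenKind.KEYWORD_DECLARATION
    if word in NAMESPACE:
        return TokenKind.KEYWORD_NAMESPACE
    if word in CONSTANTS:
        return TokenKind.KEYWORD_CONSTANT
    if word in TYPES:
        return TokenKind.KEYWORD_TYPE
    if word in BUILTINS:
        return TokenKind.NAME_BUILTIN
    if word in KEYWORDS:
        return TokenKind.KEYWORD
    if word[:1].isupper():  # exported identifier: starts with an uppercase letter
        return TokenKind.NAME_CLASS
    return TokenKind.NAME
```

Each check is a constant-time set lookup, so classification adds no more than a fixed cost per identifier.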