Module

lexers.c_sm

Hand-written C lexer using composable scanner mixins.

O(n) guaranteed, zero regex, thread-safe.

Language Support:

  • C11/C17 syntax with common extensions
  • Preprocessor directives (#include, #define, etc.)
  • All standard types including stdint.h types
  • Integer suffixes (L, LL, U, UL, etc.)
  • Floating-point suffixes (f, F, l, L)
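
The suffix handling listed above can be illustrated with a small hand-written scanner. This is a minimal sketch and not the rosettes implementation: the names scan_number, INT_SUFFIX_CHARS, and FLOAT_SUFFIX_CHARS are hypothetical, and it only covers decimal literals (no hex, octal, or exponents) and does not reject invalid suffix combinations such as UU.

```python
# Hypothetical sketch; names and behavior are assumptions, not the library's API.
INT_SUFFIX_CHARS = frozenset("uUlL")
FLOAT_SUFFIX_CHARS = frozenset("fFlL")
DIGITS = frozenset("0123456789")


def scan_number(code: str, pos: int) -> tuple[int, str]:
    """Scan a decimal literal starting at pos; return (end_index, kind)."""
    n = len(code)
    i = pos
    # Integer part.
    while i < n and code[i] in DIGITS:
        i += 1
    # Optional fractional part marks the literal as floating-point.
    if i < n and code[i] == ".":
        i += 1
        while i < n and code[i] in DIGITS:
            i += 1
        # A float takes at most one suffix character (f, F, l, L).
        if i < n and code[i] in FLOAT_SUFFIX_CHARS:
            i += 1
        return i, "float"
    # Integer suffixes (L, LL, U, UL, ULL, ...): at most three characters.
    suffix_start = i
    while i < n and i - suffix_start < 3 and code[i] in INT_SUFFIX_CHARS:
        i += 1
    return i, "integer"
```

Each character is examined once and the index only moves forward, which is what keeps this style of scanner linear in the input length without any regular expressions.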

Architecture:

Uses the shared mixins for C-style languages to handle common patterns. C-specific additions:

  • Preprocessor directive handling (#include, #define, etc.)
  • Type suffixes on numeric literals
  • Standard C types as built-in keywords
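
As a rough illustration of the mixin composition described above, the sketch below combines two small scanner mixins into one lexer class. All names here (DirectiveMixin, WordMixin, TinyCLexer, C_TYPES, and the token kind strings) are hypothetical; the real mixins live in rosettes.lexers._scanners and may be structured differently.

```python
from typing import Iterator

# Hypothetical table of standard C types treated as built-in keywords.
C_TYPES = frozenset({"int", "char", "void", "uint8_t", "size_t"})


class DirectiveMixin:
    """Scans a preprocessor directive such as #include or #define."""

    def scan_directive(self, code: str, pos: int) -> int:
        # Consume to end of line; return the index just past the directive.
        end = code.find("\n", pos)
        return len(code) if end == -1 else end


class WordMixin:
    """Scans identifiers and classifies standard C types as keywords."""

    def scan_word(self, code: str, pos: int) -> int:
        i = pos
        while i < len(code) and (code[i].isalnum() or code[i] == "_"):
            i += 1
        return i

    def classify_word(self, word: str) -> str:
        return "keyword.type" if word in C_TYPES else "name"


class TinyCLexer(DirectiveMixin, WordMixin):
    """Composes the mixins; yields (kind, text) pairs in a single pass."""

    def tokenize(self, code: str) -> Iterator[tuple[str, str]]:
        pos, n = 0, len(code)
        while pos < n:
            ch = code[pos]
            if ch == "#":
                end = self.scan_directive(code, pos)
                yield "preproc", code[pos:end]
            elif ch.isalpha() or ch == "_":
                end = self.scan_word(code, pos)
                word = code[pos:end]
                yield self.classify_word(word), word
            else:
                end = pos + 1  # this sketch skips whitespace and punctuation
            pos = end
```

Because each mixin only exposes a scan method that takes a position and returns a new one, a C++ lexer could in principle reuse the same pieces and add its own, which matches the relationship noted under See Also.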

Performance:

~40µs per 100-line file.

Thread-Safety:

All lookup tables are frozen sets, which are immutable and can therefore be shared across threads without locking.
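
A minimal sketch of why frozen sets help here, assuming a hypothetical C_KEYWORDS table and count_keywords helper (neither is part of the documented API): several threads read the same module-level table concurrently, and since frozenset has no mutating methods, no lock is needed.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared table; frozenset cannot be modified after creation.
C_KEYWORDS = frozenset({"if", "else", "while", "return", "struct"})


def count_keywords(source: str) -> int:
    # Uses only local state plus the shared immutable table.
    return sum(1 for word in source.split() if word in C_KEYWORDS)


sources = ["if x return y", "while 1", "struct s else"]
with ThreadPoolExecutor(max_workers=3) as pool:
    # pool.map returns results in the same order as the inputs.
    counts = list(pool.map(count_keywords, sources))
```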

See Also:

  • rosettes.lexers.cpp_sm: C++ extends this lexer
  • rosettes.lexers._scanners: Shared mixin implementations

Classes

CStateMachineLexer
C lexer using composable mixins.

Methods

tokenize
def tokenize(self, code: str, config: LexerConfig | None = None) -> Iterator[Token]
Parameters
Name    Type                 Description
code    str
config  LexerConfig | None   Default: None
Returns
Iterator[Token]