Module

lexers.bash_sm

Hand-written Bash/Shell lexer using composable scanner mixins.

O(n) guaranteed, zero regex, thread-safe.

Language Support:

  • Bash 5.x syntax
  • POSIX shell compatibility
  • Variable expansion ($var,${var},$(cmd))
  • Here-documents (<<EOF)
  • Process substitution (<(cmd),>(cmd))
  • All control structures (if/fi,case/esac,for/done, etc.)
  • Function definitions
  • Arrays and associative arrays

Special Handling:

  • Single-quoted strings: No escape sequences, literal content
  • Double-quoted strings: Variable expansion, escape sequences
  • $'...'strings: ANSI-C quoting
  • Heredocs: Track delimiter for multiline content

Performance:

~55µs per 100-line file (complex quoting rules add overhead).

Thread-Safety:

All lookup tables are frozen sets.

See Also:

  • rosettes.lexers.powershell_sm: PowerShell lexer

Classes

BashStateMachineLexer 1
Bash/Shell lexer using composable mixins.

Bash/Shell lexer using composable mixins.

Methods

tokenize 4 Iterator[Token]
def tokenize(self, code: str, config: LexerConfig | None = None, start: int = 0, end: int | None = None) -> Iterator[Token]
Parameters
Name Type Description
code
config Default:None
start Default:0
end Default:None
Returns
Iterator[Token]