Classes
MarkdownStateMachineLexer
Markdown lexer with CommonMark syntax support.
Line-oriented lexer that tracks block-level context for accurate tokenization of headers, lists, and code blocks.
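Block context has to be tracked line by line because the same text means different things in different blocks. Inside a fenced code block, for instance, a line beginning with `#` is code, not a heading. The following is a minimal, simplified sketch of that idea, not the rosettes implementation. `BlockState`, `classify_lines`, and the string labels it yields are hypothetical, and only ATX headings and code fences are modeled.

```python
import re
from enum import Enum, auto
from typing import Iterator


class BlockState(Enum):
    """Hypothetical block-level states for a simplified line-oriented scanner."""
    NORMAL = auto()
    FENCED_CODE = auto()


# ATX heading: up to 3 leading spaces, 1-6 '#', then a space/tab or end of line.
HEADING_RE = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]|$)")
# Code fence: up to 3 leading spaces, then 3+ backticks or 3+ tildes.
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def classify_lines(text: str) -> Iterator[tuple[str, str]]:
    """Yield (kind, line) pairs. The kind labels are illustrative only."""
    state = BlockState.NORMAL
    fence = ""  # fence string that opened the current code block
    for line in text.splitlines():
        if state is BlockState.FENCED_CODE:
            m = FENCE_RE.match(line)
            # A closing fence uses the same character, is at least as long
            # as the opening fence, and has only whitespace after it.
            if (m and m.group(1)[0] == fence[0]
                    and len(m.group(1)) >= len(fence)
                    and not line[m.end():].strip()):
                state = BlockState.NORMAL
                yield ("fence", line)
            else:
                # Inside a fence, lines starting with '#' are code, not headings.
                yield ("code", line)
            continue
        m = FENCE_RE.match(line)
        if m:
            state = BlockState.FENCED_CODE
            fence = m.group(1)
            yield ("fence", line)
        elif HEADING_RE.match(line):
            yield ("heading", line)
        else:
            yield ("text", line)
```

Storing the opening fence string, rather than a plain boolean, lets the scanner ignore a shorter fence or a fence of the other character (for example, a backtick fence line inside a tilde-fenced block) until the matching close appears.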
Token Types:
- GENERIC_HEADING: Headers (`#` through `######`)
- STRING: Fenced code blocks and inline code
- GENERIC_STRONG: Bold text (`**text**`)
- GENERIC_EMPH: Italic text (`*text*`)
- NAME_TAG: Link/image markers and URLs
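The inline token types listed above can be illustrated with a simplified, regex-based scanner. This is a hypothetical sketch, not rosettes' lexer. `INLINE_PATTERNS` and `scan_inline` are illustrative names, the labels reuse the `TokenType` member names above as plain strings, and a whole link is emitted as a single `NAME_TAG` span, whereas the real lexer may separate the markers from the URL.

```python
import re

# Hypothetical, simplified inline patterns (not rosettes' internals).
# Order matters: '**' is tried before '*' so bold is not read as italic.
INLINE_PATTERNS = [
    ("STRING", re.compile(r"`[^`\n]+`")),                     # `inline code`
    ("GENERIC_STRONG", re.compile(r"\*\*[^*\n]+\*\*")),       # **bold**
    ("GENERIC_EMPH", re.compile(r"\*[^*\n]+\*")),             # *italic*
    ("NAME_TAG", re.compile(r"!?\[[^\]\n]*\]\([^)\s]*\)")),  # [text](url), ![alt](src)
]


def scan_inline(line: str) -> list[tuple[str, str]]:
    """Return (type_name, text) spans; unmatched text is labeled 'TEXT'."""
    spans: list[tuple[str, str]] = []
    pos = 0        # current scan position
    buf_start = 0  # start of pending plain text not yet emitted
    while pos < len(line):
        for name, pattern in INLINE_PATTERNS:
            m = pattern.match(line, pos)
            if m:
                # Flush any plain text that precedes this match.
                if buf_start < pos:
                    spans.append(("TEXT", line[buf_start:pos]))
                spans.append((name, m.group()))
                pos = buf_start = m.end()
                break
        else:
            # No pattern matched here; treat this character as plain text.
            pos += 1
    if buf_start < len(line):
        spans.append(("TEXT", line[buf_start:]))
    return spans
```

Trying patterns in a fixed priority order at each position is what keeps `**text**` from being split into two italic markers.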
Example:
>>> from rosettes import get_lexer
>>> lexer = get_lexer("markdown")
>>> tokens = list(lexer.tokenize("# Header"))
>>> tokens[0].type # '#' is a heading marker
<TokenType.GENERIC_HEADING: 'gh'>
Methods
tokenize
def tokenize(self, code: str, config: LexerConfig | None = None) -> Iterator[Token]
Parameters
| Name | Type | Description |
|---|---|---|
| `code` | `str` | Markdown source text to tokenize. |
| `config` | `LexerConfig \| None` | Optional lexer configuration. Default: `None` |
Returns
Iterator[Token]
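Because `tokenize` returns an iterator, its tokens can be consumed only once. Materialize them with `list()`, as the class example does, when you need indexing or several passes. A minimal single-pass consumer is sketched below. `count_token_types` is a hypothetical helper, not part of rosettes; it relies only on the `.type` attribute shown in the class example, so it should accept the returned iterator directly.

```python
from collections import Counter
from typing import Any, Iterable


def count_token_types(tokens: Iterable[Any]) -> Counter:
    """Tally tokens by their `.type` attribute in a single pass.

    Accepts any iterable of objects with a `.type` attribute, including a
    one-shot iterator. An iterator passed in is exhausted afterwards, so
    call tokenize() again if the tokens are needed a second time.
    """
    return Counter(tok.type for tok in tokens)
```

For example, `count_token_types(lexer.tokenize(source))` would tally the token types in a Markdown string `source`, assuming `lexer` was obtained with `get_lexer("markdown")`.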