Functions
get_thread_parser
def get_thread_parser(engine: str | None = None) -> BaseMarkdownParser
Get or create a MarkdownParser instance for the current thread.
Thread-Local Caching Strategy:
- Creates ONE parser per worker thread (expensive operation ~10ms)
- Caches it for the lifetime of that thread
- Each thread reuses its parser for all pages it processes
- Total parsers created = number of worker threads
Performance Impact:
With max_workers=N (from config):
- N worker threads created
- N parser instances created (one per thread)
- Each parser handles ~(total_pages / N) pages
Example with max_workers=10 and 200 pages:
- 10 threads → 10 parsers created
- Each parser processes ~20 pages
- Creation cost: 10ms × 10 = 100ms one-time
- Reuse savings: ~1.9 seconds (avoiding 190 × 10ms)
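The arithmetic in this example can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the ~10ms creation cost is the approximate figure quoted above, and the variable names are illustrative rather than part of the library.

```python
# Figures taken from the example above; the 10 ms cost is approximate.
total_pages = 200
max_workers = 10
creation_cost_ms = 10

# With thread-local caching: one parser per worker thread.
cached_cost_ms = max_workers * creation_cost_ms

# Without caching: one parser per page.
uncached_cost_ms = total_pages * creation_cost_ms

# Parser creations avoided, expressed as time saved.
saved_ms = uncached_cost_ms - cached_cost_ms
pages_per_parser = total_pages // max_workers

print(f"cached: {cached_cost_ms} ms, uncached: {uncached_cost_ms} ms")
print(f"saved: {saved_ms / 1000:.1f} s, ~{pages_per_parser} pages per parser")
```

Under these assumptions, caching avoids 190 parser creations, which is about 1.9 seconds of setup time.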
Thread Safety:
Each thread gets its own parser instance, so no locking is needed.
Read-only access to site config and xref_index is safe.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| engine | str \| None | None | Parser engine to use ('python-markdown', 'mistune', or None for default) |
Returns
BaseMarkdownParser: Cached MarkdownParser instance for this thread
get_created_dirs
def get_created_dirs() -> ThreadSafeSet
Get the thread-safe set of already created directories.
Returns
ThreadSafeSet
mark_dir_created
def mark_dir_created(dir_path: str) -> bool
Mark a directory as created, return True if it was new.
This is the preferred way to track directory creation. Uses atomic check-and-add to prevent race conditions.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| dir_path | str | — | Path to directory as string |
Returns
bool: True if directory was newly added, False if already tracked
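The atomic check-and-add behind mark_dir_created can be illustrated with a lock-protected set. This is a minimal sketch under stated assumptions: `ThreadSafeSetSketch`, its `add_if_new` method, `mark_dir_created_sketch`, and `ensure_dir` are hypothetical names, and the real ThreadSafeSet may expose a different interface.

```python
import os
import tempfile
import threading


class ThreadSafeSetSketch:
    """Hypothetical lock-protected set; the real ThreadSafeSet may differ."""

    def __init__(self):
        self._items = set()
        self._lock = threading.Lock()

    def add_if_new(self, item):
        # The membership check and the insert happen under one lock, so two
        # threads can never both observe the item as missing.
        with self._lock:
            if item in self._items:
                return False
            self._items.add(item)
            return True


_created_dirs = ThreadSafeSetSketch()


def mark_dir_created_sketch(dir_path: str) -> bool:
    """Return True only for the first caller to mark this path."""
    return _created_dirs.add_if_new(dir_path)


def ensure_dir(dir_path: str) -> bool:
    """Create the directory only if no caller has marked it yet."""
    if mark_dir_created_sketch(dir_path):
        os.makedirs(dir_path, exist_ok=True)
        return True
    return False


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "public", "docs")
    first = ensure_dir(target)    # newly tracked, directory created
    second = ensure_dir(target)   # already tracked, filesystem call skipped
    exists = os.path.isdir(target)

print(first, second, exists)
```

A separate "check, then add" pair of calls would leave a window in which two threads both see the path as untracked; folding both steps into one locked operation closes that window. In this sketch the path is marked before the directory is created, so a caller that receives False should not assume the directory already exists on disk.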