Module

rendering.asset_extractor

Asset extraction utilities for tracking page-to-asset dependencies.

Extracts references to assets (images, stylesheets, scripts, fonts) from rendered HTML to populate the AssetDependencyMap cache. This lets incremental builds identify only the assets that changed pages depend on.

Asset types tracked:

  • Images: <img src>, <picture> <source srcset>
  • Stylesheets: <link href> with rel=stylesheet
  • Scripts: <script src>
  • Fonts: <link href> with rel=preload type=font
  • Data URLs, IFrames, and other embedded resources

Classes

AssetExtractorParser

HTML parser for extracting asset references from rendered content.

Inherits from HTMLParser

Methods 5

handle_starttag
def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None

Extract asset references from opening tags.

Handles:

  • <img src>, <img srcset>
  • <script src>
  • <link href>
  • <source srcset>
  • <iframe src>
  • <picture> with sources
Parameters 2
tag str
attrs list[tuple[str, str | None]]
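The tag handling listed above could be sketched roughly as follows. This is a minimal illustration built on the standard library's HTMLParser, not the actual AssetExtractorParser source: the class name SketchAssetParser and the SRC_TAGS and SRCSET_TAGS constants are hypothetical, every <link href> is collected without checking rel, and srcset is split naively on commas.

```python
from html.parser import HTMLParser
from typing import Optional


class SketchAssetParser(HTMLParser):
    """Hypothetical stand-in; not the real AssetExtractorParser."""

    # Tags whose `src` attribute points at an asset (assumed set).
    SRC_TAGS = {"img", "script", "iframe"}
    # Tags whose `srcset` attribute lists candidate images (assumed set).
    SRCSET_TAGS = {"img", "source"}

    def __init__(self) -> None:
        super().__init__()
        self.assets: set = set()

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        # HTMLParser passes tag and attribute names already lowercased.
        attr_map = dict(attrs)
        if tag in self.SRC_TAGS and attr_map.get("src"):
            self.assets.add(attr_map["src"])
        if tag == "link" and attr_map.get("href"):
            # Simplification: no filtering on rel=stylesheet / rel=preload.
            self.assets.add(attr_map["href"])
        if tag in self.SRCSET_TAGS and attr_map.get("srcset"):
            # Each candidate is "URL [descriptor]"; candidates are comma-separated.
            for candidate in attr_map["srcset"].split(","):
                parts = candidate.split()
                if parts:
                    self.assets.add(parts[0])


parser = SketchAssetParser()
parser.feed(
    '<img src="a.png" srcset="a.png 1x, a@2x.png 2x">'
    '<link rel="stylesheet" href="s.css">'
)
print(sorted(parser.assets))
```

Splitting srcset on commas would mis-handle URLs that themselves contain commas (for example data URLs), so a production parser would likely need something stricter.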
handle_endtag
def handle_endtag(self, tag: str) -> None

Handle closing tags.

Parameters 1
tag str
handle_data
def handle_data(self, data: str) -> None

Extract @import URLs from style tag content.

Handles:

  • @import url('...')
  • @import url("...")
  • @import url(...) - without quotes
Parameters 1
data str
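The three @import forms listed above can be matched with a single regular expression. The sketch below is an assumption about one way to do it, not the module's own pattern; IMPORT_RE and find_css_imports are hypothetical names, and the tracking of whether the parser is currently inside a <style> tag is left out.

```python
import re

# Matches @import url('x'), @import url("x"), and @import url(x).
# Group 1 captures the optional opening quote so the same quote must close it.
IMPORT_RE = re.compile(
    r"""@import\s+url\(\s*(['"]?)([^'")]+?)\1\s*\)""",
    re.IGNORECASE,
)


def find_css_imports(css: str) -> list:
    """Return the URLs of all @import url(...) rules, in source order."""
    return [match.group(2).strip() for match in IMPORT_RE.finditer(css)]


css = """
@import url('a.css');
@import url("b.css");
@import url(c.css);
"""
print(find_css_imports(css))
```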
feed
def feed(self, data: str) -> AssetExtractorParser

Parse HTML and return self for chaining.

Parameters 1
data str
Returns

AssetExtractorParser

self, to allow the parser.feed(html).get_assets() chaining pattern
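The standard library's HTMLParser.feed returns None, so chaining only works if the subclass overrides it. A minimal sketch of that override, using a hypothetical ChainableParser class that records tag names rather than assets:

```python
from html.parser import HTMLParser


class ChainableParser(HTMLParser):
    """Hypothetical parser illustrating the return-self pattern."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def feed(self, data: str) -> "ChainableParser":
        super().feed(data)  # the base implementation returns None
        return self         # returning self enables method chaining


tags = ChainableParser().feed("<p><img src='x.png'></p>").tags
print(tags)
```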

get_assets
def get_assets(self) -> set[str]

Get all extracted asset URLs.

Filters out empty strings and returns a normalized set.

Returns

set[str]

Set of asset URLs/paths
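The source does not say what "normalized" involves. As an illustration only, the sketch below assumes it means trimming surrounding whitespace and de-duplicating; normalize_assets is a hypothetical helper, not part of the module.

```python
def normalize_assets(raw: list) -> set:
    """Trim whitespace, drop empty entries, and de-duplicate (assumed rules)."""
    return {url.strip() for url in raw if url and url.strip()}


cleaned = normalize_assets(["a.png", " a.png ", "", "   ", "b.css"])
print(sorted(cleaned))
```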

Internal Methods 1
__init__
def __init__(self) -> None

Initialize the asset extractor parser.

Functions

extract_assets_from_html
def extract_assets_from_html(html_content: str) -> set[str]

Extract all asset references from rendered HTML.

Parameters 1

Name          Type  Default  Description
html_content  str   -        Rendered HTML content

Returns

set[str]

Set of asset URLs/paths referenced in the HTML
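To show how the returned set might feed a page-to-asset dependency map, here is a rough sketch. The real AssetDependencyMap API is not documented in this section, so a plain dict stands in for it; build_dependency_map and assets_for_changed are hypothetical helpers, and the extractor is passed in as a callable so that extract_assets_from_html could be supplied in practice.

```python
from typing import Callable


def build_dependency_map(
    pages: dict,
    extract: Callable[[str], set],
) -> dict:
    """Map each page path to the assets its rendered HTML references."""
    return {path: extract(html) for path, html in pages.items()}


def assets_for_changed(dep_map: dict, changed: list) -> set:
    """Union of the assets needed by the changed pages only."""
    needed: set = set()
    for path in changed:
        needed |= dep_map.get(path, set())
    return needed
```

In a real build, the extract argument would presumably be extract_assets_from_html, and pages would hold each page's rendered HTML keyed by its output path.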