Module

health.linkcheck.async_checker

Async external link checker with retries, backoff, and concurrency control.

Classes

AsyncLinkChecker

Async HTTP link checker with retries, backoff, and concurrency control.

Features:

  • Global concurrency limit via semaphore
  • Per-host concurrency limit to avoid rate limits
  • Exponential backoff with jitter for retries
  • HEAD request first, falling back to GET on 405 or 403 responses
  • Connection pooling and DNS caching via httpx
  • Ignore policies for patterns, domains, and status ranges
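The combination of a global limit and a per-host limit can be sketched with two levels of asyncio semaphores. This is an illustrative sketch only, not the class's actual implementation: TwoLevelLimiter, run, and the demo URLs are hypothetical names, and only the default values 20 and 4 come from the constructor signature.

```python
import asyncio
from urllib.parse import urlsplit


class TwoLevelLimiter:
    """Hypothetical helper: one global semaphore plus one semaphore per host."""

    def __init__(self, max_concurrency: int = 20, per_host_limit: int = 4) -> None:
        self._global = asyncio.Semaphore(max_concurrency)
        self._per_host_limit = per_host_limit
        self._hosts: dict[str, asyncio.Semaphore] = {}

    def _host_semaphore(self, url: str) -> asyncio.Semaphore:
        # Create the per-host semaphore lazily, the first time a host is seen.
        host = urlsplit(url).netloc
        if host not in self._hosts:
            self._hosts[host] = asyncio.Semaphore(self._per_host_limit)
        return self._hosts[host]

    async def run(self, url, work):
        # Take the per-host slot first so requests queued behind a busy host
        # do not occupy global slots while they wait.
        async with self._host_semaphore(url), self._global:
            return await work(url)


async def demo() -> tuple[int, int]:
    limiter = TwoLevelLimiter(max_concurrency=3, per_host_limit=2)
    active = 0
    peak = 0

    async def work(url: str) -> str:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for the HTTP request
        active -= 1
        return url

    urls = [f"https://a.example/{i}" for i in range(5)]
    urls += [f"https://b.example/{i}" for i in range(5)]
    results = await asyncio.gather(*(limiter.run(u, work) for u in urls))
    return peak, len(results)


peak, total = asyncio.run(demo())
print(f"peak concurrency: {peak}, completed: {total}")
```

In this sketch the number of simultaneously running work calls never exceeds the global limit, and no single host gets more than per_host_limit of them.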

Methods 2

check_links async
async def check_links(self, urls: list[tuple[str, str]]) -> dict[str, LinkCheckResult]

Check multiple external URLs concurrently.

Parameters 1
urls list[tuple[str, str]]

List of (url, first_ref) tuples where first_ref is the page that first referenced this URL

Returns

dict[str, LinkCheckResult]

Dict mapping each URL to its LinkCheckResult
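As an illustration of the expected input shape, the sketch below builds a list of (url, first_ref) tuples from a mapping of pages to the links they contain. The helper collect_external_urls and the page_links data are hypothetical and not part of this module; the final call is shown only as a comment because it needs a configured checker and network access.

```python
def collect_external_urls(page_links: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (url, first_ref) pairs, keeping the first page that referenced each URL."""
    first_ref: dict[str, str] = {}
    for page, links in page_links.items():
        for url in links:
            # Only external http(s) links are of interest here.
            if url.startswith(("http://", "https://")):
                first_ref.setdefault(url, page)  # later references do not overwrite
    return list(first_ref.items())


# Hypothetical site data: two pages, one shared external link, one internal link.
page_links = {
    "docs/index.md": ["https://example.com/", "/local/page/"],
    "docs/guide.md": ["https://example.com/", "https://example.org/api"],
}
urls = collect_external_urls(page_links)
print(urls)

# With a configured checker (not constructed here), the call would look like:
# results = await checker.check_links(urls)   # dict keyed by URL
```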

from_config classmethod
def from_config(cls, config: dict[str, Any]) -> AsyncLinkChecker

Create AsyncLinkChecker from config dict.

Parameters 1
config dict[str, Any]

Configuration dict

Returns

AsyncLinkChecker

Configured AsyncLinkChecker instance
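The keys that from_config reads are not listed on this page. The sketch below shows one plausible pattern for a config-driven constructor, assuming the config keys match the __init__ parameter names. CheckerSettings is a hypothetical stand-in used so the example runs on its own, and the key names are an assumption rather than the documented schema.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class CheckerSettings:
    """Stand-in for the checker; defaults are copied from the __init__ signature."""

    max_concurrency: int = 20
    per_host_limit: int = 4
    timeout: float = 10.0
    retries: int = 2
    retry_backoff: float = 0.5
    user_agent: str = "Bengal-LinkChecker/1.0"

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> "CheckerSettings":
        # ASSUMPTION: keys mirror the parameter names; unknown keys are ignored
        # and missing keys fall back to the defaults above.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in config.items() if k in known})


settings = CheckerSettings.from_config(
    {"max_concurrency": 8, "timeout": 5.0, "unrelated": True}
)
print(settings)
```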

Internal Methods 4
__init__
def __init__(self, max_concurrency: int = 20, per_host_limit: int = 4, timeout: float = 10.0, retries: int = 2, retry_backoff: float = 0.5, ignore_policy: IgnorePolicy | None = None, user_agent: str = 'Bengal-LinkChecker/1.0')

Initialize async link checker.

Parameters 7
max_concurrency int

Maximum concurrent requests across all hosts

per_host_limit int

Maximum concurrent requests per host

timeout float

Request timeout in seconds

retries int

Number of retry attempts

retry_backoff float

Base backoff time for exponential backoff (seconds)

ignore_policy IgnorePolicy | None

Policy for ignoring links/statuses

user_agent str

User-Agent header value

_check_url async
async def _check_url(self, client: httpx.AsyncClient, url: str, refs: list[str]) -> LinkCheckResult

Check a single URL with retries and backoff.

Parameters 3
client httpx.AsyncClient

httpx.AsyncClient used to send the request

url str

URL to check

refs list[str]

List of pages that reference this URL

Returns

LinkCheckResult

Result of the check for this URL
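The class features mention trying HEAD first and falling back to GET on 405 or 403. A minimal sketch of that decision is shown below, written against a generic async send callable so it runs without a network. head_then_get, Send, and fake_send are hypothetical names; with httpx, send could wrap client.request(method, url) and return the response's status_code.

```python
import asyncio
from typing import Awaitable, Callable

# (method, url) -> HTTP status code
Send = Callable[[str, str], Awaitable[int]]


async def head_then_get(send: Send, url: str) -> tuple[str, int]:
    """Try HEAD; if the server answers 405 or 403, repeat the check with GET."""
    status = await send("HEAD", url)
    if status in (405, 403):
        # Some servers reject or block HEAD even though the page exists.
        return "GET", await send("GET", url)
    return "HEAD", status


async def fake_send(method: str, url: str) -> int:
    # Hypothetical server that rejects HEAD but serves GET.
    return 405 if method == "HEAD" else 200


method, status = asyncio.run(head_then_get(fake_send, "https://example.com/"))
print(method, status)
```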

_check_with_retries async
async def _check_with_retries(self, client: httpx.AsyncClient, url: str, refs: list[str]) -> LinkCheckResult

Check URL with exponential backoff retries.

Parameters 3
client httpx.AsyncClient

httpx.AsyncClient used to send the request

url str

URL to check

refs list[str]

List of pages that reference this URL

Returns

LinkCheckResult

Result of the check for this URL, after any retries
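A retry loop with exponential backoff might be structured roughly as follows. This is a sketch under stated assumptions, not the method's actual body: it assumes retries counts attempts after the first one, retries on any exception, and omits jitter for brevity. with_retries and flaky are hypothetical names.

```python
import asyncio


async def with_retries(op, retries: int = 2, retry_backoff: float = 0.5):
    """Await op(); on failure, sleep with a doubling delay and try again."""
    last_error: Exception | None = None
    for attempt in range(retries + 1):  # first try plus `retries` retries
        try:
            return await op()
        except Exception as exc:  # ASSUMPTION: every error is treated as retryable
            last_error = exc
            if attempt < retries:
                # Delay doubles each attempt: base, 2*base, 4*base, ...
                await asyncio.sleep(retry_backoff * (2 ** attempt))
    raise last_error


attempts: list[int] = []


async def flaky() -> str:
    # Hypothetical operation that fails twice before succeeding.
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = asyncio.run(with_retries(flaky, retries=2, retry_backoff=0.01))
print(result, len(attempts))
```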

_calculate_backoff
def _calculate_backoff(self, attempt: int) -> float

Calculate exponential backoff with jitter.

Delegates to the centralized calculate_backoff utility so that backoff behavior stays consistent across the codebase.

Parameters 1
attempt int

Attempt number (0-indexed)

Returns

float

Backoff time in seconds
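The signature of the shared calculate_backoff utility is not shown on this page. As a rough illustration of exponential backoff with jitter for a 0-indexed attempt number, the sketch below uses a hypothetical function, backoff_with_jitter, whose max_delay cap and jitter fraction are assumptions chosen for the example.

```python
import random


def backoff_with_jitter(
    attempt: int,
    base: float = 0.5,       # matches the retry_backoff default
    max_delay: float = 10.0,  # ASSUMPTION: cap chosen for illustration
    jitter: float = 0.25,     # ASSUMPTION: +/-25% random spread
) -> float:
    """Return base * 2**attempt, capped at max_delay, with random jitter applied."""
    delay = min(base * (2 ** attempt), max_delay)
    spread = delay * jitter
    # Jitter spreads retries out so many clients do not retry in lockstep.
    return max(0.0, delay + random.uniform(-spread, spread))


for attempt in range(4):
    print(attempt, round(backoff_with_jitter(attempt), 3))
```

With these assumed defaults, attempt 0 yields a delay between 0.375 and 0.625 seconds, and the unjittered delay doubles with each attempt until it reaches the cap.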