Classes
AsyncLinkChecker
Async HTTP link checker with retries, backoff, and concurrency control.
Features:
- Global concurrency limit via semaphore
- Per-host concurrency limit to avoid rate limits
- Exponential backoff with jitter for retries
- HEAD request first, falling back to GET on 405/403
- Connection pooling and DNS caching via httpx
- Ignore policies for patterns, domains, and status ranges
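The global and per-host limits listed above can be combined by nesting two semaphores. The following is a minimal sketch, not Bengal's implementation: `ConcurrencyLimiter`, `fake_fetch`, and `demo` are hypothetical names, and a simulated fetch replaces real httpx requests so the example runs offline.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable
from urllib.parse import urlsplit


class ConcurrencyLimiter:
    """Hypothetical helper: caps in-flight work globally and per host."""

    def __init__(self, max_concurrency: int = 20, per_host_limit: int = 4) -> None:
        self._global = asyncio.Semaphore(max_concurrency)
        self._per_host_limit = per_host_limit
        self._hosts: dict[str, asyncio.Semaphore] = {}

    def _host_semaphore(self, host: str) -> asyncio.Semaphore:
        # Lazily create one semaphore per hostname on first use.
        if host not in self._hosts:
            self._hosts[host] = asyncio.Semaphore(self._per_host_limit)
        return self._hosts[host]

    async def run(self, url: str, work: Callable[[str], Awaitable[int]]) -> int:
        host = urlsplit(url).hostname or ""
        # Take a per-host slot first, then a global slot.
        async with self._host_semaphore(host):
            async with self._global:
                return await work(url)


async def demo() -> dict[str, int]:
    """Run 12 simulated fetches across 3 hosts and record peak concurrency."""
    limiter = ConcurrencyLimiter(max_concurrency=3, per_host_limit=2)
    in_flight: dict[str, int] = {}
    peak = {"total": 0, "per_host": 0}

    async def fake_fetch(url: str) -> int:
        host = urlsplit(url).hostname or ""
        in_flight[host] = in_flight.get(host, 0) + 1
        peak["total"] = max(peak["total"], sum(in_flight.values()))
        peak["per_host"] = max(peak["per_host"], in_flight[host])
        await asyncio.sleep(0.01)  # stand-in for network latency
        in_flight[host] -= 1
        return 200

    urls = [f"https://host{i % 3}.example/page{i}" for i in range(12)]
    statuses = await asyncio.gather(*(limiter.run(u, fake_fetch) for u in urls))
    assert all(status == 200 for status in statuses)
    return peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Acquiring the per-host semaphore before the global one is a design choice of this sketch: a task queued behind a saturated host does not occupy one of the global slots while it waits.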
Methods
check_links
async def check_links(self, urls: list[tuple[str, str]]) -> dict[str, LinkCheckResult]
Check multiple external URLs concurrently.
Parameters
urls (list[tuple[str, str]]): List of (url, first_ref) tuples, where first_ref is the page that first referenced this URL.
Returns
dict[str, LinkCheckResult]: Dict mapping each URL to its LinkCheckResult.
—
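check_links expects each external URL once, together with the page that first referenced it. A minimal sketch of building that list from a hypothetical `{page: [urls]}` mapping follows; `first_refs` and the page paths are illustrative and not part of Bengal.

```python
from __future__ import annotations


def first_refs(page_links: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Hypothetical helper: build (url, first_ref) pairs from {page: [urls]}.

    Each URL appears once, paired with the first page (in dict iteration
    order) that references it.
    """
    first_seen: dict[str, str] = {}
    for page, links in page_links.items():
        for url in links:
            # setdefault only records the page the first time a URL is seen.
            first_seen.setdefault(url, page)
    return list(first_seen.items())


pairs = first_refs({
    "docs/index.md": ["https://example.com/a", "https://example.com/b"],
    "docs/guide.md": ["https://example.com/a", "https://example.com/c"],
})
print(pairs)
# [('https://example.com/a', 'docs/index.md'), ('https://example.com/b', 'docs/index.md'), ('https://example.com/c', 'docs/guide.md')]
```

Inside a coroutine, the resulting list has the shape check_links documents, so a call would look like `results = await checker.check_links(pairs)`.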
from_config
@classmethod
def from_config(cls, config: dict[str, Any]) -> AsyncLinkChecker
Create AsyncLinkChecker from config dict.
Parameters
config (dict[str, Any]): Configuration dict.
Returns
AsyncLinkChecker: Configured AsyncLinkChecker instance.
—
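The config keys that from_config reads are not listed in this section. The sketch below assumes, hypothetically, that keys share the names of the __init__ parameters and that missing keys fall back to the documented defaults. It builds a plain dataclass (`LinkCheckerSettings`, an illustrative name) rather than a real AsyncLinkChecker, and it leaves out ignore_policy because constructing an IgnorePolicy is not documented here.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any


@dataclass
class LinkCheckerSettings:
    """Hypothetical mirror of the documented __init__ parameters and defaults."""

    max_concurrency: int = 20
    per_host_limit: int = 4
    timeout: float = 10.0
    retries: int = 2
    retry_backoff: float = 0.5
    user_agent: str = "Bengal-LinkChecker/1.0"

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> LinkCheckerSettings:
        # Assumption: config keys match the constructor's parameter names.
        # Unknown keys are ignored; missing keys keep the documented defaults.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in config.items() if k in known})


if __name__ == "__main__":
    settings = LinkCheckerSettings.from_config({"timeout": 5.0, "retries": 3, "unrelated": True})
    print(settings.timeout, settings.retries, settings.max_concurrency)  # 5.0 3 20
```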
Internal Methods
__init__
def __init__(self, max_concurrency: int = 20, per_host_limit: int = 4, timeout: float = 10.0, retries: int = 2, retry_backoff: float = 0.5, ignore_policy: IgnorePolicy | None = None, user_agent: str = 'Bengal-LinkChecker/1.0')
Initialize async link checker.
Parameters
max_concurrency (int): Maximum concurrent requests across all hosts.
per_host_limit (int): Maximum concurrent requests per host.
timeout (float): Request timeout in seconds.
retries (int): Number of retry attempts.
retry_backoff (float): Base backoff time for exponential backoff, in seconds.
ignore_policy (IgnorePolicy | None): Policy for ignoring links and statuses.
user_agent (str): User-Agent header value.
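The ignore_policy parameter covers URL patterns, domains, and status ranges. IgnorePolicy's own interface is not documented in this section, so the sketch below uses a hypothetical `SimpleIgnorePolicy` with made-up method names to show one way the three checks could work.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class SimpleIgnorePolicy:
    """Hypothetical stand-in for IgnorePolicy; not its real API."""

    patterns: list[str] = field(default_factory=list)  # regexes searched in the full URL
    domains: list[str] = field(default_factory=list)  # hostnames; subdomains also match
    status_ranges: list[tuple[int, int]] = field(default_factory=list)  # inclusive ranges

    def should_ignore_url(self, url: str) -> bool:
        if any(re.search(p, url) for p in self.patterns):
            return True
        host = (urlsplit(url).hostname or "").lower()
        # The "." prefix keeps "notexample.org" from matching "example.org".
        return any(
            host == d.lower() or host.endswith("." + d.lower()) for d in self.domains
        )

    def should_ignore_status(self, status: int) -> bool:
        return any(lo <= status <= hi for lo, hi in self.status_ranges)


if __name__ == "__main__":
    policy = SimpleIgnorePolicy(
        patterns=[r"/private/"], domains=["example.org"], status_ranges=[(500, 599)]
    )
    print(policy.should_ignore_url("https://docs.example.org/page"))  # True
    print(policy.should_ignore_status(404))  # False
```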
_check_url
async def _check_url(self, client: httpx.AsyncClient, url: str, refs: list[str]) -> LinkCheckResult
Check a single URL with retries and backoff.
Parameters
client (httpx.AsyncClient): httpx AsyncClient used to send requests.
url (str): URL to check.
refs (list[str]): List of pages that reference this URL.
Returns
LinkCheckResult
—
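The HEAD-then-GET behavior listed under Features can be sketched independently of httpx. In this hypothetical version, `request` is any async callable that takes a method and URL and returns a status code; with httpx it could wrap `await client.request(method, url)` and return the response's `status_code`. The fallback statuses (403 and 405) come from the feature list; the function name and return shape are illustrative.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

# Hypothetical transport: takes (method, url) and returns an HTTP status code.
Request = Callable[[str, str], Awaitable[int]]

# Statuses that make the sketch retry with GET, per the documented 405/403 fallback.
FALLBACK_STATUSES = {403, 405}


async def head_then_get(request: Request, url: str) -> tuple[str, int]:
    """Try a cheap HEAD first; fall back to GET when the server rejects HEAD."""
    status = await request("HEAD", url)
    if status in FALLBACK_STATUSES:
        return "GET", await request("GET", url)
    return "HEAD", status


if __name__ == "__main__":
    async def fake_server(method: str, url: str) -> int:
        # Simulates a server that rejects HEAD with 405 but serves GET.
        return 405 if method == "HEAD" else 200

    print(asyncio.run(head_then_get(fake_server, "https://example.com/")))  # ('GET', 200)
```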
_check_with_retries
async def _check_with_retries(self, client: httpx.AsyncClient, url: str, refs: list[str]) -> LinkCheckResult
Check URL with exponential backoff retries.
Parameters
client (httpx.AsyncClient): httpx AsyncClient used to send requests.
url (str): URL to check.
refs (list[str]): List of pages that reference this URL.
Returns
LinkCheckResult
—
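_check_with_retries is documented as retrying with exponential backoff. A minimal sketch of such a loop follows, under assumptions: `TransientError` is a hypothetical stand-in for whatever failures the real method treats as retryable (for example, timeouts), the 10% jitter rule is made up for illustration, and `sleep` is injectable so the loop can be tested without waiting. The `retries` and `retry_backoff` parameters mirror the constructor's, here interpreting `retries` as extra attempts after the first.

```python
from __future__ import annotations

import asyncio
import random
from typing import Awaitable, Callable


class TransientError(Exception):
    """Hypothetical stand-in for retryable failures such as timeouts."""


async def with_retries(
    attempt_once: Callable[[], Awaitable[int]],
    retries: int = 2,
    retry_backoff: float = 0.5,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> int:
    """Call attempt_once up to retries + 1 times, backing off between failures."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return await attempt_once()
        except TransientError:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Exponential backoff (base * 2**attempt) plus up to 10% random jitter.
            delay = retry_backoff * (2 ** attempt)
            await sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable: the loop always returns or raises")
```

Passing a no-op coroutine as `sleep` in tests keeps the loop fast while still recording the delays it would have used.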
_calculate_backoff
def _calculate_backoff(self, attempt: int) -> float
Calculate exponential backoff with jitter.
Delegates to centralized calculate_backoff utility for consistent backoff behavior across the codebase.
Parameters
attempt (int): Attempt number (0-indexed).
Returns
float: Backoff time in seconds.
—
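The exact formula inside the centralized calculate_backoff utility is not shown in this section. As an illustration only, a common shape is delay = min(base * 2^attempt * u, max_delay), with u drawn uniformly from [1 - jitter, 1 + jitter]. The function below is a hypothetical sketch of that formula: `max_delay`, `jitter`, and `rng` are assumed parameters, and only the 0-indexed `attempt` and a base delay (such as `retry_backoff`) correspond to documented values.

```python
from __future__ import annotations

import random


def calculate_backoff_sketch(
    attempt: int,
    base: float = 0.5,
    max_delay: float = 30.0,
    jitter: float = 0.25,
    rng: random.Random | None = None,
) -> float:
    """Hypothetical exponential backoff with multiplicative jitter, in seconds.

    The raw delay base * 2**attempt is scaled by a random factor in
    [1 - jitter, 1 + jitter] and then capped at max_delay.
    """
    if attempt < 0:
        raise ValueError("attempt is 0-indexed and must be >= 0")
    if rng is None:
        rng = random.Random()
    factor = rng.uniform(1 - jitter, 1 + jitter)
    return min(base * (2 ** attempt) * factor, max_delay)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the run is repeatable
    print([round(calculate_backoff_sketch(a, rng=rng), 3) for a in range(4)])
```

Randomizing each delay keeps many checks that failed at the same moment from retrying in lockstep; applying the cap last guarantees no delay exceeds max_delay.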