Health Check System

Comprehensive build validation and health checks


Bengal includes a comprehensive health check system that validates source content and author-facing policy. Generated artifact correctness is now split out into bengal audit, while rendering owns the URL and anchor registries used by link validation.

Validation Surfaces

| Surface | Command/API | Owner | Purpose |
|---------|-------------|-------|---------|
| Source policy checks | bengal check | bengal/health/ | Config, directives, navigation, taxonomy, connectivity, and author-facing link checks |
| Compatibility alias | bengal health | bengal/cli/ | Legacy entrypoint for bengal check while automation migrates |
| Artifact audit | bengal audit | bengal/audit/ | Post-build scan of generated HTML references and output files |
| Reference truth | bengal/rendering/reference_registry.py | bengal/rendering/ | Rendered URLs, source paths, anchors, and auxiliary output URLs |

Health Check (bengal/health/health_check.py)

  • Purpose: Orchestrates validators and produces unified health reports
  • Features:
    • Modular validator architecture
    • Fast execution (< 100ms per validator)
    • Configurable per-validator enable/disable
    • Console and JSON report formats
    • Integration with build stats
  • Usage:
      from bengal.health import HealthCheck
    
      # `site` is the loaded site; `stats` is the value returned by site.build()
      health = HealthCheck(site)
      report = health.run(build_stats=stats)
      print(report.format_console())
    

Base Validator (bengal/health/base.py)

  • Purpose: Abstract base class for all validators

  • Interface: validate(site, build_context=None) -> list[CheckResult]

  • Features:

    • Independent execution (no validator dependencies)
    • Error handling and crash recovery
    • Performance tracking per validator
    • Configuration-based enablement
    • Access to cached build artifacts via build_context
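As a rough illustration of the interface above, the sketch below defines a validator-shaped class and a wrapper that turns a crash into an error result. Only the validate(site, build_context=None) -> list[CheckResult] shape comes from this page. CheckResult, FakePage, FakeSite, MissingTitleValidator, and run_validator_safely are hypothetical stand-ins written for this example, not Bengal's real classes.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class CheckResult:
    """Hypothetical stand-in for Bengal's CheckResult."""

    status: str  # "success", "warning", or "error" in this sketch
    message: str
    recommendation: Optional[str] = None


@dataclass
class FakePage:
    """Hypothetical page object used only in this sketch."""

    source_path: str
    title: str = ""


@dataclass
class FakeSite:
    """Hypothetical site object used only in this sketch."""

    pages: list = field(default_factory=list)


class MissingTitleValidator:
    """Illustrative validator that flags pages without a title."""

    name = "Missing titles"

    def validate(self, site: FakeSite, build_context: Any = None) -> list[CheckResult]:
        results = []
        for page in site.pages:
            if not page.title.strip():
                results.append(
                    CheckResult(
                        status="warning",
                        message=f"{page.source_path} has no title",
                        recommendation="Add a title to the page front matter.",
                    )
                )
        if not results:
            results.append(CheckResult("success", "All pages have titles"))
        return results


def run_validator_safely(validator, site, build_context=None):
    """Run one validator and convert any exception into an error result."""
    try:
        return validator.validate(site, build_context=build_context)
    except Exception as exc:  # a crashing validator should not stop the others
        return [CheckResult("error", f"{validator.name} crashed: {exc}")]
```

Because each validator only reads the site and returns a list, the wrapper can run them in any order and isolate failures, which is one way to get the independent execution and crash recovery listed above.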

Health Report (bengal/health/report/)

  • Purpose: Unified reporting structure for health check results

  • Components:

    • CheckStatus: SUCCESS, INFO, SUGGESTION, WARNING, ERROR (ordered by severity)
    • CheckResult: Individual check result with recommendation
    • ValidatorReport: Results from a single validator
    • HealthReport: Aggregated report from all validators
    • ReportEnvelope: Versioned CLI/machine envelope for Milo and Kida output
  • Formats:

    • Console output (colored, progressive disclosure)
    • JSON output (machine-readable)
    • Versioned result envelope (bengal.check.v1)
    • Summary statistics (pass/warning/error counts)
    • Quality scoring (0-100 with ratings)
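The severity ordering of CheckStatus lends itself to a small summary helper. The sketch below is a stand-in, not Bengal's implementation: the member names come from the list above, while the integer values and the summarize function are assumptions chosen so that comparisons follow severity.

```python
from collections import Counter
from enum import IntEnum


class CheckStatus(IntEnum):
    """Stand-in enum; the integer values are assumed and only encode severity order."""

    SUCCESS = 0
    INFO = 1
    SUGGESTION = 2
    WARNING = 3
    ERROR = 4


def summarize(statuses):
    """Return per-status counts and the most severe status seen."""
    statuses = list(statuses)
    counts = Counter(status.name for status in statuses)
    # An empty report is treated as SUCCESS in this sketch.
    worst = max(statuses, default=CheckStatus.SUCCESS)
    return {"counts": dict(counts), "worst": worst.name}
```

A caller could use the worst status to pick an exit code, for example failing a CI job only when it is ERROR.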

Artifact Audit (bengal/audit/)

Artifact audit is intentionally cheaper than a full health check and independent of it. It scans the generated output directory after a build, extracts href and src references from HTML, skips external protocols, and verifies that internal references resolve to files, clean URL directories, or .html files in the output tree.

bengal build
bengal audit
bengal audit --json

Audit output uses the bengal.audit.v1 envelope and renders through the same Kida validation report template as source checks.
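The resolution rules described in this section can be sketched with the standard library alone. This is a minimal illustration, not the code in bengal/audit/: the function names are hypothetical, and treating a directory as a clean URL only when it contains index.html is an assumption.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlsplit


class RefCollector(HTMLParser):
    """Collect href and src attribute values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.refs.append(value)


def is_internal(ref):
    """Refs with a scheme (https:, mailto:, ...) or a host count as external."""
    parts = urlsplit(ref)
    return not parts.scheme and not parts.netloc


def resolves(output_dir, page_path, ref):
    """Check a reference against files, clean URL directories, and .html files."""
    path = urlsplit(ref).path
    if not path:  # fragment-only or query-only reference to the same page
        return True
    if path.startswith("/"):
        target = output_dir / path.lstrip("/")
    else:
        target = page_path.parent / path
    if target.is_file():
        return True
    # Assumption: a clean URL directory is one that contains index.html.
    if target.is_dir() and (target / "index.html").is_file():
        return True
    return bool(target.name) and target.with_name(target.name + ".html").is_file()


def audit(output_dir):
    """Return (page, reference) pairs for internal references that do not resolve."""
    output_dir = Path(output_dir)
    broken = []
    for page_path in sorted(output_dir.rglob("*.html")):
        collector = RefCollector()
        collector.feed(page_path.read_text(encoding="utf-8"))
        for ref in collector.refs:
            if is_internal(ref) and not resolves(output_dir, page_path, ref):
                broken.append((page_path.relative_to(output_dir).as_posix(), ref))
    return broken
```

Because the scan only reads files already on disk, it needs no site model or build cache, which is what keeps this kind of check cheap.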

Remediation (bengal/health/remediation/)

The remediation subpackage provides automated fixes for common validation errors:

  • Purpose: Generate and apply fixes from health check results
  • Components:
    • AutoFixer: Framework for generating and applying fixes
    • FixAction: Single fix with metadata and application logic
    • FixSafety: Safety classification (SAFE, CONFIRM, UNSAFE)
  • Usage:
      from bengal.health.remediation import AutoFixer, FixSafety
    
      fixer = AutoFixer(report, site_root=site.root_path)
      fixes = fixer.suggest_fixes()
      safe_fixes = [f for f in fixes if f.safety == FixSafety.SAFE]
      results = fixer.apply_fixes(safe_fixes)
    

Validators (bengal/health/validators/)

Validators are registered in phases based on execution cost and dependencies.

Phase 1 - Core Validation:

| Validator | Validates |
|-----------|-----------|
| ConfigValidatorWrapper | Configuration validity, essential fields, common issues |
| URLCollisionValidator | Duplicate URL detection (catches conflicts early) |
| OwnershipPolicyValidator | URL ownership and content governance |

Phase 2 - Content Validation:

| Validator | Validates |
|-----------|-----------|
| RenderingValidator | HTML quality, unrendered template syntax, SEO/social metadata, JSON-LD syntax |
| DirectiveValidator | Directive syntax, completeness, and performance |
| NavigationValidator | Page navigation (next/prev, breadcrumbs, ancestors) |
| MenuValidator | Menu structure integrity, circular reference detection |
| TaxonomyValidator | Tags, categories, archives, pagination integrity |
| TrackValidator | Learning track structure and progression |
| LinkValidatorWrapper | Broken links detection (internal and external) |
| AnchorValidator | Explicit anchor targets and cross-reference integrity |

Phase 3 - Advanced Validation:

| Validator | Validates |
|-----------|-----------|
| CacheValidator | Incremental build cache integrity and consistency |
| PerformanceValidator | Build performance metrics and bottleneck detection |

Phase 4 - Production-Ready Validation:

| Validator | Validates |
|-----------|-----------|
| RSSValidator | RSS feed quality, XML validity, URL formatting |
| SitemapValidator | Sitemap.xml validity for SEO, no duplicate URLs |
| FontValidator | Font downloads, CSS generation, file sizes |
| AssetValidator | Asset optimization, minification hints, size analysis |

Phase 5 - Knowledge Graph Validation:

| Validator | Validates |
|-----------|-----------|
| ConnectivityValidator | Page connectivity using semantic link model and weighted scoring |

Specialized Validators (not auto-registered):

| Validator | Validates |
|-----------|-----------|
| AutodocValidator | API documentation HTML structure validation |
| OutputValidator | Page sizes, asset presence (redundant with build errors) |
| CrossReferenceValidator | Internal cross-reference resolution |
| AccessibilityValidator | WCAG compliance and accessibility checks |
| AssetURLValidator | Asset URL resolution and validation |

Utility Classes (not BaseValidator subclasses):

| Class | Purpose |
|-------|---------|
| TemplateValidator | Jinja2 template syntax validation (requires TemplateEngine) |

Connectivity Validator

The Connectivity Validator uses a semantic link model with weighted scoring to provide nuanced page connectivity analysis beyond binary orphan detection.

Link Types and Weights:

| Link Type | Weight | Description |
|-----------|--------|-------------|
| Explicit | 1.0 | Human-authored markdown links |
| Menu | 10.0 | Navigation menu items (high editorial intent) |
| Taxonomy | 1.0 | Shared tags/categories |
| Related | 0.75 | Algorithm-computed related posts |
| Topical | 0.5 | Section hierarchy (parent → child) |
| Sequential | 0.25 | Next/prev navigation |

Connectivity levels:

| Level | Score Range | Status |
|-------|-------------|--------|
| Well-connected | ≥ 2.0 | No action needed |
| Adequately linked | 1.0 - 2.0 | Could improve |
| Lightly linked | 0.25 - 1.0 | Should improve (only structural links) |
| Isolated | < 0.25 | Needs attention |
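To make the weights and levels concrete, here is a minimal sketch of how a score might be computed and classified. It assumes the score is the sum of weight × number of incoming links of each type, and that each level includes its lower bound. Neither detail is spelled out on this page, and the function names are hypothetical.

```python
# Default weights copied from the table above.
DEFAULT_WEIGHTS = {
    "explicit": 1.0,
    "menu": 10.0,
    "taxonomy": 1.0,
    "related": 0.75,
    "topical": 0.5,
    "sequential": 0.25,
}


def connectivity_score(link_counts, weights=DEFAULT_WEIGHTS):
    """Sum weight * count over link types (assumed scoring rule)."""
    return sum(weights[link_type] * count for link_type, count in link_counts.items())


def connectivity_level(score):
    """Map a score to a level, treating each lower bound as inclusive."""
    if score >= 2.0:
        return "well_connected"
    if score >= 1.0:
        return "adequately_linked"
    if score >= 0.25:
        return "lightly_linked"
    return "isolated"


# A page reached only through its section and next/prev navigation:
structural_only = connectivity_score({"topical": 1, "sequential": 1})
print(structural_only, connectivity_level(structural_only))
```

Under these assumptions a page with only structural links scores 0.75 and lands in the lightly linked band, matching the "only structural links" note in the levels table, while a single menu entry is enough to make a page well-connected.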

Configuration:

[health_check]
# Connectivity thresholds
isolated_threshold = 5      # Max isolated pages before error
lightly_linked_threshold = 20  # Max lightly-linked before warning

# Customize weights (optional)
[health_check.link_weights]
explicit = 1.0
menu = 10.0
taxonomy = 1.0
related = 0.75
topical = 0.5
sequential = 0.25

CLI Commands:

# Full connectivity report
bengal graph report

# List isolated pages
bengal graph orphans --level isolated

# List lightly-linked pages
bengal graph orphans --level lightly

# CI mode with exit code
bengal graph report --ci --threshold-isolated 5

Configuration

Health checks use a tiered validation system for optimal performance:

| Tier | Name | Time | Trigger | Validators |
|------|------|------|---------|------------|
| 1 | build | <100ms | Always | Config, URL Collisions, Rendering, Directives, Navigation, Menu, Taxonomy |
| 2 | full | ~500ms | --full flag | + Connectivity, Cache, Performance, Anchors |
| 3 | ci | ~30s | --ci flag or CI env | + External link checking (LinkValidatorWrapper) |
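The tiers are cumulative: each one adds validators to the tier before it. The sketch below illustrates that selection logic. The validator labels are copied from the table, but pick_tier, validators_for, and the use of a CI environment variable are assumptions made for this example rather than Bengal's actual implementation.

```python
import os

TIER_ORDER = ["build", "full", "ci"]

# Labels taken from the tier table; each tier lists only what it adds.
TIER_VALIDATORS = {
    "build": [
        "Config",
        "URL Collisions",
        "Rendering",
        "Directives",
        "Navigation",
        "Menu",
        "Taxonomy",
    ],
    "full": ["Connectivity", "Cache", "Performance", "Anchors"],
    "ci": ["External link checking"],
}


def pick_tier(full=False, ci=False, env=None):
    """Choose a tier from flags; a truthy CI variable is an assumed CI signal."""
    env = os.environ if env is None else env
    if ci or env.get("CI"):
        return "ci"
    if full:
        return "full"
    return "build"


def validators_for(tier):
    """Return the validators for a tier, including all lower tiers."""
    if tier not in TIER_ORDER:
        raise ValueError(f"unknown tier: {tier}")
    selected = []
    for name in TIER_ORDER[: TIER_ORDER.index(tier) + 1]:
        selected.extend(TIER_VALIDATORS[name])
    return selected
```

Keeping the expensive external link check in the last tier means a routine build never pays for network requests.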

Configuration via bengal.toml:

[health_check]
enabled = true
verbose = false
strict_mode = false

# Connectivity thresholds
isolated_threshold = 5        # Max isolated pages before error
lightly_linked_threshold = 20 # Max lightly-linked before warning

# Connectivity score thresholds
[health_check.connectivity_thresholds]
well_connected = 2.0    # Score >= 2.0
adequately_linked = 1.0 # Score 1.0-2.0
lightly_linked = 0.25   # Score 0.25-1.0
# Score < 0.25 = isolated

# Link type weights for scoring
[health_check.link_weights]
explicit = 1.0    # Human-authored markdown links
menu = 10.0       # Navigation menu items
taxonomy = 1.0    # Shared tags/categories
related = 0.75    # Algorithm-computed related posts
topical = 0.5     # Section hierarchy (parent → child)
sequential = 0.25 # Next/prev navigation

Per-profile validator filtering:

Validators run based on the active build profile:

| Profile | Validators Enabled |
|---------|--------------------|
| writer | Config, Menu (fast feedback) |
| theme-dev | + Rendering, Directives |
| dev | All validators (full observability) |

Validators can be explicitly enabled/disabled in config regardless of profile.

Integration

Health checks run automatically after builds in strict mode and can be triggered manually:

# Automatic validation in strict mode
site.config["strict_mode"] = True
stats = site.build()

# Manual validation
from bengal.health import HealthCheck
health = HealthCheck(site)
report = health.run(build_stats=stats)