Classes
GraphMetrics
dataclass
Metrics about the knowledge graph structure.
Attributes
| Name | Type | Description |
|---|---|---|
| total_pages | int | Total number of pages analyzed |
| total_links | int | Total number of links between pages |
| avg_connectivity | float | Average connectivity score per page |
| hub_count | int | Number of hub pages (highly connected) |
| leaf_count | int | Number of leaf pages (low connectivity) |
| orphan_count | int | Number of orphaned pages (no connections at all) |
PageConnectivity
dataclass
Connectivity information for a single page.
Attributes
| Name | Type | Description |
|---|---|---|
| page | Page | The page object |
| incoming_refs | int | Number of incoming references |
| outgoing_refs | int | Number of outgoing references |
| connectivity_score | int | Total connectivity (incoming + outgoing) |
| is_hub | bool | True if page has many incoming references |
| is_leaf | bool | True if page has few connections |
| is_orphan | bool | True if page has no connections at all |
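The attributes above can be derived from a plain list of links. The following is a minimal sketch, not Bengal's implementation: `ConnectivitySketch` and `classify` are hypothetical names, pages are represented by strings, and the rule that orphans are not also counted as leaves is an assumption.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ConnectivitySketch:
    """Illustrative stand-in for PageConnectivity, keyed by page name."""

    page: str
    incoming_refs: int
    outgoing_refs: int
    connectivity_score: int
    is_hub: bool
    is_leaf: bool
    is_orphan: bool


def classify(pages, links, hub_threshold=10, leaf_threshold=2):
    """Count references per page and derive the three flags."""
    incoming = Counter(target for _, target in links)
    outgoing = Counter(source for source, _ in links)
    result = {}
    for page in pages:
        inc, out = incoming[page], outgoing[page]
        score = inc + out
        result[page] = ConnectivitySketch(
            page=page,
            incoming_refs=inc,
            outgoing_refs=out,
            connectivity_score=score,
            is_hub=inc >= hub_threshold,
            # Assumption: orphans are reported separately, not as leaves.
            is_leaf=0 < score <= leaf_threshold,
            is_orphan=score == 0,
        )
    return result


pages = ["index", "guide", "faq", "draft"]
links = [("guide", "index"), ("faq", "index"), ("index", "guide")]
info = classify(pages, links, hub_threshold=2)
```

With the lowered `hub_threshold=2`, `index` is flagged as a hub, `guide` and `faq` as leaves, and `draft` as an orphan.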
KnowledgeGraph
Analyzes the connectivity structure of a Bengal site.
Builds a graph of all pages and their connections through:
- Internal links (cross-references)
- Taxonomies (tags, categories)
- Related posts
- Menu items
Provides insights for:
- Content strategy (find orphaned pages)
- Performance optimization (hub-first streaming)
- Navigation design (understand structure)
- SEO improvements (link structure)
Methods 26
build
def build(self) -> None
Build the knowledge graph by analyzing all page connections.
This analyzes:
- Cross-references (internal links between pages)
- Taxonomy references (pages grouped by tags/categories)
- Related posts (pre-computed relationships)
- Menu items (navigation references)
Call this before using any analysis methods.
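The build-then-query order can be sketched against any object that exposes the documented `build`, `get_metrics`, and `get_orphans` methods. `summarize` and `StubGraph` are hypothetical helpers so the example runs without a real site, and the error raised when `build()` is skipped is an assumption about how such a guard might look, not documented behaviour.

```python
from types import SimpleNamespace


def summarize(graph):
    """Build the graph, then read summary numbers from it."""
    graph.build()  # must run before any analysis method
    metrics = graph.get_metrics()
    orphans = graph.get_orphans()
    return (
        f"{metrics.total_pages} pages, {metrics.total_links} links, "
        f"{len(orphans)} orphaned"
    )


class StubGraph:
    """Hypothetical stand-in so the sketch runs without a real site."""

    def __init__(self):
        self.built = False

    def build(self):
        self.built = True

    def get_metrics(self):
        if not self.built:
            raise RuntimeError("call build() first")
        return SimpleNamespace(total_pages=3, total_links=2)

    def get_orphans(self):
        return ["drafts/old-notes"]


summary = summarize(StubGraph())
```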
get_analysis_pages
def get_analysis_pages(self) -> list[Page]
Get list of pages to analyze, excluding autodoc pages if configured.
Returns
list[Page]: List of pages to include in graph analysis
get_connectivity
def get_connectivity(self, page: Page) -> PageConnectivity
Get connectivity information for a specific page.
Parameters 1
| Name | Type | Description |
|---|---|---|
| page | Page | Page to analyze |
Returns
PageConnectivity: PageConnectivity with detailed metrics
get_hubs
def get_hubs(self, threshold: int | None = None) -> list[Page]
Get hub pages (highly connected pages).
Hubs are pages with many incoming references. These are typically:
- Index pages
- Popular articles
- Core documentation
Parameters 1
| Name | Type | Description |
|---|---|---|
| threshold | int \| None | Minimum incoming refs (defaults to self.hub_threshold) |
Returns
list[Page]: List of hub pages sorted by incoming references (descending)
get_leaves
def get_leaves(self, threshold: int | None = None) -> list[Page]
Get leaf pages (low connectivity pages).
Leaves are pages with few connections. These are typically:
- One-off blog posts
- Changelog entries
- Niche content
Parameters 1
| Name | Type | Description |
|---|---|---|
| threshold | int \| None | Maximum connectivity (defaults to self.leaf_threshold) |
Returns
list[Page]: List of leaf pages sorted by connectivity (ascending)
get_orphans
def get_orphans(self) -> list[Page]
Get orphaned pages (no connections at all).
Orphans are pages with no incoming or outgoing references. These might be:
- Forgotten content
- Draft pages
- Pages that should be linked from navigation
Returns
list[Page]: List of orphaned pages sorted by slug
get_connectivity_report
def get_connectivity_report(self, thresholds: dict[str, float] | None = None, weights: dict[LinkType, float] | None = None) -> ConnectivityReport
Get comprehensive connectivity report with pages grouped by level.
Uses weighted scoring based on semantic link types to provide nuanced analysis beyond binary orphan detection.
Parameters 2
| Name | Type | Description |
|---|---|---|
| thresholds | dict[str, float] \| None | Custom thresholds for connectivity levels. Defaults to DEFAULT_THRESHOLDS. |
| weights | dict[LinkType, float] \| None | Custom weights for link types. Defaults to DEFAULT_WEIGHTS. |
Returns
ConnectivityReport: ConnectivityReport with pages grouped by level and statistics.
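Weighted scoring can be illustrated with a small sketch. Everything named here is hypothetical: `LinkKind` stands in for `LinkType`, whose members are not listed in this reference, and the level names and cut-offs are invented for illustration. Only the 0.5 (section hierarchy) and 0.25 (next/prev navigation) weights come from the internal method descriptions; the explicit-link weight of 1.0 is an assumption.

```python
from enum import Enum


class LinkKind(Enum):
    """Hypothetical link types; the real LinkType members may differ."""

    EXPLICIT = "explicit"
    SECTION = "section"
    NAVIGATION = "navigation"


# 0.5 and 0.25 are the documented section and next/prev weights;
# the explicit-link weight is an assumption.
WEIGHTS = {
    LinkKind.EXPLICIT: 1.0,
    LinkKind.SECTION: 0.5,
    LinkKind.NAVIGATION: 0.25,
}

# Hypothetical level names and minimum scores.
THRESHOLDS = {"well_connected": 2.0, "adequately": 1.0, "lightly": 0.25}


def weighted_score(counts, weights=WEIGHTS):
    """Sum incoming link counts, each scaled by the weight of its type."""
    return sum(weights[kind] * n for kind, n in counts.items())


def level_for(score, thresholds=THRESHOLDS):
    """Return the highest level whose minimum the score reaches."""
    ordered = sorted(thresholds.items(), key=lambda kv: kv[1], reverse=True)
    for name, minimum in ordered:
        if score >= minimum:
            return name
    return "isolated"


score = weighted_score({LinkKind.SECTION: 1, LinkKind.NAVIGATION: 2})
level = level_for(score)
```

A page reached only through its section index and two next/prev links scores 1.0 here, so it lands in a middle level rather than being reported as either connected or orphaned.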
get_page_link_metrics
def get_page_link_metrics(self, page: Page) -> LinkMetrics
Get detailed link metrics for a specific page.
Parameters 1
| Name | Type | Description |
|---|---|---|
| page | Page | Page to get metrics for |
Returns
LinkMetrics: LinkMetrics with breakdown by link type
get_connectivity_score
def get_connectivity_score(self, page: Page) -> int
Get total connectivity score for a page.
Connectivity = incoming_refs + outgoing_refs
Parameters 1
| Name | Type | Description |
|---|---|---|
| page | Page | Page to analyze |
Returns
int: Connectivity score (higher = more connected)
get_layers
def get_layers(self) -> PageLayers
Partition pages into three layers by connectivity.
Layers enable hub-first streaming builds:
- Layer 0 (Hubs): High connectivity, process first, keep in memory
- Layer 1 (Mid-tier): Medium connectivity, batch processing
- Layer 2 (Leaves): Low connectivity, stream and release
Returns
PageLayers: PageLayers dataclass with hubs, mid_tier, and leaves attributes (supports tuple unpacking for backward compatibility)
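A dataclass can support tuple unpacking by defining `__iter__`. The sketch below shows one way a three-layer result of this shape could behave. `LayersSketch` and `partition` are hypothetical, and the score cut-offs used to split the layers are assumptions; the actual partitioning rule is not stated in this reference.

```python
from dataclasses import dataclass, field


@dataclass
class LayersSketch:
    """Illustrative stand-in for PageLayers."""

    hubs: list = field(default_factory=list)
    mid_tier: list = field(default_factory=list)
    leaves: list = field(default_factory=list)

    def __iter__(self):
        # Allows: hubs, mid_tier, leaves = layers
        return iter((self.hubs, self.mid_tier, self.leaves))


def partition(scores, hub_min=10, leaf_max=2):
    """Split pages by connectivity score; the cut-offs are assumptions."""
    layers = LayersSketch()
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for page, score in ranked:
        if score >= hub_min:
            layers.hubs.append(page)
        elif score <= leaf_max:
            layers.leaves.append(page)
        else:
            layers.mid_tier.append(page)
    return layers


layers = partition({"index": 14, "guide": 5, "changelog": 1})
hubs, mid_tier, leaves = layers  # tuple-style access still works
```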
get_metrics
def get_metrics(self) -> GraphMetrics
Get overall graph metrics.
Returns
GraphMetrics: GraphMetrics with summary statistics
format_stats
def format_stats(self) -> str
Format graph statistics as a human-readable string.
Returns
str: Formatted statistics string
get_actionable_recommendations
def get_actionable_recommendations(self) -> list[str]
Generate actionable recommendations for improving site structure.
Returns
list[str]: List of recommendation strings with emoji prefixes
get_seo_insights
def get_seo_insights(self) -> list[str]
Generate SEO-focused insights about site structure.
Returns
list[str]: List of SEO insight strings with emoji prefixes
get_content_gaps
def get_content_gaps(self) -> list[str]
Identify content gaps based on link structure and taxonomies.
Returns
list[str]: List of content gap descriptions
compute_pagerank
def compute_pagerank(self, damping: float = 0.85, max_iterations: int = 100, force_recompute: bool = False) -> PageRankResults
Compute PageRank scores for all pages in the graph.
PageRank assigns importance scores based on link structure. Pages that are linked to by many important pages get high scores.
Parameters 3
| Name | Type | Description |
|---|---|---|
| damping | float | Probability of following links vs random jump (default 0.85) |
| max_iterations | int | Maximum iterations before stopping (default 100) |
| force_recompute | bool | If True, recompute even if cached |
Returns
PageRankResults: PageRankResults with scores and metadata
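The roles of `damping` and `max_iterations` are easiest to see in a plain power-iteration sketch. This is a generic textbook formulation over a dict of page names, not the code behind `compute_pagerank`. The `tolerance` parameter and the even redistribution of rank from pages without outgoing links are assumptions.

```python
def pagerank(links, damping=0.85, max_iterations=100, tolerance=1e-9):
    """Power iteration over a dict mapping each page to its outgoing links."""
    pages = list(links)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(max_iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(scores[p] for p in pages if not links[p])
        base = (1.0 - damping) / n + damping * dangling / n
        updated = {page: base for page in pages}
        for page in pages:
            targets = links[page]
            for target in targets:
                updated[target] += damping * scores[page] / len(targets)
        change = sum(abs(updated[p] - scores[p]) for p in pages)
        scores = updated
        if change < tolerance:
            break  # converged before max_iterations
    return scores


site = {
    "index": ["guide", "faq"],
    "guide": ["index"],
    "faq": ["index"],
    "changelog": ["index"],
}
scores = pagerank(site)
```

In this toy site `index` receives the highest score because every other page links to it, while `changelog`, which nothing links to, keeps only the random-jump share.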
compute_personalized_pagerank
def compute_personalized_pagerank(self, seed_pages: set[Page], damping: float = 0.85, max_iterations: int = 100) -> PageRankResults
Compute personalized PageRank from seed pages.
Personalized PageRank biases random jumps toward seed pages, useful for finding pages related to a specific topic or set of pages.
Parameters 3
| Name | Type | Description |
|---|---|---|
| seed_pages | set[Page] | Set of pages to bias toward |
| damping | float | Probability of following links vs random jump |
| max_iterations | int | Maximum iterations before stopping |
Returns
PageRankResults: PageRankResults with personalized scores
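The bias toward seed pages amounts to changing where random jumps land. A minimal sketch under stated assumptions: pages are plain strings, every seed appears as a key in `links`, and pages without outgoing links also jump back to the seeds. It is not the library's implementation.

```python
def personalized_pagerank(links, seed_pages, damping=0.85, max_iterations=100):
    """PageRank where every random jump lands on one of the seed pages."""
    pages = list(links)
    share = 1.0 / len(seed_pages)
    jump = {p: (share if p in seed_pages else 0.0) for p in pages}
    scores = dict(jump)
    for _ in range(max_iterations):
        # Assumption: pages with no outgoing links jump back to the seeds.
        dangling = sum(scores[p] for p in pages if not links[p])
        restart = 1.0 - damping + damping * dangling
        updated = {p: restart * jump[p] for p in pages}
        for page in pages:
            for target in links[page]:
                updated[target] += damping * scores[page] / len(links[page])
        scores = updated
    return scores


site = {
    "python-intro": ["python-tips"],
    "python-tips": ["python-intro"],
    "rust-intro": ["rust-tips"],
    "rust-tips": ["rust-intro"],
}
scores = personalized_pagerank(site, seed_pages={"python-intro"})
```

Pages that cannot be reached from the seed (the two Rust pages here) end up with a score of zero, which is what makes the result useful for finding content related to the seeds.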
get_top_pages_by_pagerank
def get_top_pages_by_pagerank(self, limit: int = 20) -> list[tuple[Page, float]]
Get top-ranked pages by PageRank score.
Automatically computes PageRank if not already computed.
Parameters 1
| Name | Type | Description |
|---|---|---|
| limit | int | Number of pages to return |
Returns
list[tuple[Page, float]]: List of (page, score) tuples sorted by score descending
get_pagerank_score
def get_pagerank_score(self, page: Page) -> float
Get PageRank score for a specific page.
Automatically computes PageRank if not already computed.
Parameters 1
| Name | Type | Description |
|---|---|---|
| page | Page | Page to get score for |
Returns
float: PageRank score (0.0 if page not found)
detect_communities
def detect_communities(self, resolution: float = 1.0, random_seed: int | None = None, force_recompute: bool = False) -> CommunityDetectionResults
Detect topical communities using Louvain method.
Discovers natural clusters of related pages based on link structure. Communities represent topic areas or content groups.
Parameters 3
| Name | Type | Description |
|---|---|---|
| resolution | float | Resolution parameter (higher = more communities, default 1.0) |
| random_seed | int \| None | Random seed for reproducibility |
| force_recompute | bool | If True, recompute even if cached |
Returns
CommunityDetectionResults: CommunityDetectionResults with discovered communities
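The Louvain method searches for a partition that maximizes modularity, and `resolution` scales the penalty term in that quantity. The sketch below computes modularity only, for an undirected, unweighted edge list; it does not perform the Louvain search, and `modularity` and its arguments are hypothetical names rather than part of this API.

```python
from collections import defaultdict


def modularity(edges, community_of, resolution=1.0):
    """Modularity of an undirected, unweighted graph for a given partition."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # summed degrees of the community's nodes
    for a, b in edges:
        degree_sum[community_of[a]] += 1
        degree_sum[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1
    return sum(
        internal[c] / m - resolution * (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridge edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "all" for node in range(6)}

q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

Splitting the two triangles scores higher than treating the graph as one community. Raising `resolution` increases the penalty on large communities, which is why higher values tend to produce more, smaller communities.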
get_community_for_page
def get_community_for_page(self, page: Page) -> int | None
Get community ID for a specific page.
Automatically detects communities if not already computed.
Parameters 1
| Name | Type | Description |
|---|---|---|
| page | Page | Page to get community for |
Returns
int | None: Community ID or None if page not found
analyze_paths
def analyze_paths(self, force_recompute: bool = False, k_pivots: int = 100, seed: int = 42, auto_approximate_threshold: int = 500) -> PathAnalysisResults
Analyze navigation paths and centrality metrics.
Computes:
- Betweenness centrality: Pages that act as bridges
- Closeness centrality: Pages that are easily accessible
- Network diameter and average path length
For large sites (> auto_approximate_threshold pages), uses pivot-based approximation for O(k*N) complexity instead of O(N²).
Parameters 4
| Name | Type | Description |
|---|---|---|
| force_recompute | bool | If True, recompute even if cached |
| k_pivots | int | Number of pivot nodes for approximation (default: 100) |
| seed | int | Random seed for deterministic results (default: 42) |
| auto_approximate_threshold | int | Use the exact computation when the page count is <= this value (default: 500) |
Returns
PathAnalysisResults: PathAnalysisResults with centrality metrics
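Pivot-based approximation can be sketched with Brandes' algorithm, running the per-source pass from a random sample of pages instead of from every page. This is a generic illustration for a directed, unweighted graph, not the code behind `analyze_paths`. The parameter names mirror the signature above, but scaling the sampled totals by `n / k_pivots` is an assumption.

```python
import random
from collections import deque


def betweenness(adj, k_pivots=100, seed=42, auto_approximate_threshold=500):
    """Unnormalized betweenness; samples source pages on large graphs."""
    nodes = list(adj)
    n = len(nodes)
    if n > auto_approximate_threshold and k_pivots < n:
        sources = random.Random(seed).sample(nodes, k_pivots)
        scale = n / k_pivots  # extrapolate from the sampled sources
    else:
        sources, scale = nodes, 1.0

    score = {v: 0.0 for v in nodes}
    for s in sources:
        # Breadth-first search counting shortest paths from s.
        dist = {s: 0}
        sigma = {v: 0 for v in nodes}
        sigma[s] = 1
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest pages back to s.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w] * scale
    return score


chain = {"intro": ["setup"], "setup": ["deploy"], "deploy": []}
result = betweenness(chain)
```

In the three-page chain, only `setup` lies on a shortest path between two other pages, so it is the only page with a non-zero score.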
get_betweenness_centrality
def get_betweenness_centrality(self, page: Page) -> float
Get betweenness centrality for a specific page.
Automatically analyzes paths if not already computed.
Parameters 1
| Name | Type | Description |
|---|---|---|
| page | Page | Page to get centrality for |
Returns
float: Betweenness centrality score
get_closeness_centrality
def get_closeness_centrality(self, page: Page) -> float
Get closeness centrality for a specific page.
Automatically analyzes paths if not already computed.
Parameters 1
| Name | Type | Description |
|---|---|---|
| page | Page | Page to get centrality for |
Returns
float: Closeness centrality score
suggest_links
def suggest_links(self, min_score: float = 0.3, max_suggestions_per_page: int = 10, force_recompute: bool = False) -> LinkSuggestionResults
Generate smart link suggestions to improve site connectivity.
Uses multiple signals:
- Topic similarity (shared tags/categories)
- PageRank importance
- Betweenness centrality (bridge pages)
- Link gaps (underlinked content)
Parameters 3
| Name | Type | Description |
|---|---|---|
| min_score | float | Minimum score threshold for suggestions |
| max_suggestions_per_page | int | Maximum suggestions per page |
| force_recompute | bool | If True, recompute even if cached |
Returns
LinkSuggestionResults: LinkSuggestionResults with all suggestions
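How `min_score` and `max_suggestions_per_page` shape the output can be shown with a deliberately reduced sketch that uses only one of the signals listed above: topic similarity, measured here as Jaccard overlap of tags. How the real method combines its signals is not documented here, so `suggest`, its inputs, and the scoring are all assumptions.

```python
def suggest(tags_by_page, existing_links, min_score=0.3,
            max_suggestions_per_page=10):
    """Suggest link targets by tag overlap (Jaccard similarity) only."""
    suggestions = {}
    for source, source_tags in tags_by_page.items():
        scored = []
        for target, target_tags in tags_by_page.items():
            if target == source or (source, target) in existing_links:
                continue  # skip self-links and links that already exist
            union = source_tags | target_tags
            shared = source_tags & target_tags
            score = len(shared) / len(union) if union else 0.0
            if score >= min_score:
                reasons = [f"shared tag: {tag}" for tag in sorted(shared)]
                scored.append((target, score, reasons))
        scored.sort(key=lambda item: item[1], reverse=True)
        suggestions[source] = scored[:max_suggestions_per_page]
    return suggestions


tags = {
    "install": {"setup", "cli"},
    "quickstart": {"setup", "cli", "tutorial"},
    "theming": {"design"},
}
result = suggest(tags, existing_links={("quickstart", "install")})
```

Each suggestion is a `(target, score, reasons)` tuple. Here `install` is advised to link to `quickstart`, while the reverse link is skipped because it already exists.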
get_suggestions_for_page
def get_suggestions_for_page(self, page: Page, limit: int = 10) -> list[tuple[Page, float, list[str]]]
Get link suggestions for a specific page.
Automatically generates suggestions if not already computed.
Parameters 2
| Name | Type | Description |
|---|---|---|
| page | Page | Page to get suggestions for |
| limit | int | Maximum number of suggestions |
Returns
list[tuple[Page, float, list[str]]]: List of (target_page, score, reasons) tuples
Internal Methods 11
__init__
def __init__(self, site: Site, hub_threshold: int = 10, leaf_threshold: int = 2, exclude_autodoc: bool = True)
Initialize knowledge graph analyzer.
Parameters 4
| Name | Type | Description |
|---|---|---|
| site | Site | Site instance to analyze |
| hub_threshold | int | Minimum incoming refs to be considered a hub |
| leaf_threshold | int | Maximum connectivity to be considered a leaf |
| exclude_autodoc | bool | If True, exclude autodoc/API reference pages from analysis (default: True) |
_ensure_links_extracted
def _ensure_links_extracted(self) -> None
Extract links from all pages if not already extracted.
Links are normally extracted during rendering, but graph analysis needs them before rendering happens. This ensures links are available.
_analyze_cross_references
def _analyze_cross_references(self) -> None
Analyze cross-references (internal links between pages).
Uses the site's xref_index to find all internal links. Only analyzes links from/to pages included in analysis (excludes autodoc).
_resolve_link
def _resolve_link(self, link: str) -> Page | None
Resolve a link string to a target page.
Parameters 1
| Name | Type | Description |
|---|---|---|
| link | str | Link string (path, slug, or ID) |
Returns
Page | None: Target page or None if not found
_analyze_taxonomies
def _analyze_taxonomies(self) -> None
Analyze taxonomy references (pages grouped by tags/categories).
Pages in the same taxonomy group reference each other implicitly. Only includes pages in analysis (excludes autodoc).
_analyze_related_posts
def _analyze_related_posts(self) -> None
Analyze related posts (pre-computed relationships).
Related posts are pages that share tags or other criteria. Only includes pages in analysis (excludes autodoc).
_analyze_menus
def _analyze_menus(self) -> None
Analyze menu items (navigation references).
Pages in menus get a significant boost in importance. Only includes pages in analysis (excludes autodoc).
_analyze_section_hierarchy
def _analyze_section_hierarchy(self) -> None
Analyze implicit section links (parent _index.md → children).
Section index pages implicitly link to all child pages in their directory. This represents topical containment—the parent page defines the topic, children belong to that topic.
Weight: 0.5 (structural but semantically meaningful)
_analyze_navigation_links
def _analyze_navigation_links(self) -> None
Analyze next/prev sequential relationships.
Pages in a section often have prev/next relationships representing a reading order or logical sequence (e.g., tutorial steps, changelogs).
Weight: 0.25 (pure navigation, lowest editorial intent)
_build_link_metrics
def _build_link_metrics(self) -> None
Build detailed link metrics for each page.
Aggregates links by type into LinkMetrics objects for weighted connectivity scoring.
_compute_metrics
def _compute_metrics(self) -> GraphMetrics
Compute overall graph metrics.
Returns
GraphMetrics: GraphMetrics with summary statistics