Classes
PathAnalysisResults
dataclass
Results from path analysis and centrality computations.
Contains centrality metrics that identify …
PathAnalysisResults
dataclass Results from path analysis and centrality computations.
Contains centrality metrics that identify important pages in the site's link structure. High betweenness indicates bridge pages, high closeness indicates easily accessible pages.
Attributes
| Name | Type | Description |
|---|---|---|
betweenness_centrality |
dict[Page, float] |
Map of pages to betweenness scores (0.0-1.0) |
closeness_centrality |
dict[Page, float] |
Map of pages to closeness scores (0.0-1.0) |
avg_path_length |
float |
Average shortest path length between all page pairs |
diameter |
int |
Network diameter (longest shortest path) |
is_approximate |
bool |
True if approximation was used (for large sites) |
pivots_used |
int |
Number of pivot nodes used (if approximate) |
Methods 4
get_top_bridges
Get pages with highest betweenness centrality (bridge pages).
get_top_bridges
def get_top_bridges(self, limit: int = 20) -> list[tuple[Page, float]]
Get pages with highest betweenness centrality (bridge pages).
Parameters 1
limit |
int |
Number of pages to return |
Returns
List of (page, centrality) tuples sorted by centrality descendinglist[tuple[Page, float]]
—
get_most_accessible
Get most accessible pages (highest closeness centrality).
get_most_accessible
def get_most_accessible(self, limit: int = 20) -> list[tuple[Page, float]]
Get most accessible pages (highest closeness centrality).
Parameters 1
limit |
int |
Number of pages to return |
Returns
List of (page, centrality) tuples sorted by centrality descendinglist[tuple[Page, float]]
—
get_betweenness
Get betweenness centrality for specific page.
get_betweenness
def get_betweenness(self, page: Page) -> float
Get betweenness centrality for specific page.
Parameters 1
page |
Page |
Returns
float
get_closeness
Get closeness centrality for specific page.
get_closeness
def get_closeness(self, page: Page) -> float
Get closeness centrality for specific page.
Parameters 1
page |
Page |
Returns
float
PathSearchResult
dataclass
Result from find_all_paths including metadata about the search.
PathSearchResult
dataclass Result from find_all_paths including metadata about the search.
Attributes
| Name | Type | Description |
|---|---|---|
paths |
list[list[Page]] |
List of paths found (each path is a list of pages) |
complete |
bool |
True if search completed, False if terminated early |
termination_reason |
str | None |
Reason for early termination (if any) |
PathAnalyzer
Analyze navigation paths and page accessibility.
Computes centrality metrics to identify:
- Bridge…
PathAnalyzer
Analyze navigation paths and page accessibility.
Computes centrality metrics to identify:
- Bridge pages (high betweenness): Pages that connect different parts of the site
- Accessible pages (high closeness): Pages that are easy to reach from anywhere
- Navigation bottlenecks: Critical pages for site navigation
For large sites (>500 pages by default), uses pivot-based approximation to achieve O(k*N) complexity instead of O(N²). This provides ~100x speedup for 10k page sites while maintaining accurate relative rankings.
Methods 4
find_shortest_path
Find shortest path between two pages using BFS.
find_shortest_path
def find_shortest_path(self, source: Page, target: Page) -> list[Page] | None
Find shortest path between two pages using BFS.
Parameters 2
source |
Page |
Starting page |
target |
Page |
Destination page |
Returns
List of pages representing the path, or None if no path existslist[Page] | None
—
analyze
Compute path-based centrality metrics.
Computes:
- Betweenness centrality: How…
analyze
def analyze(self, progress_callback: ProgressCallback | None = None) -> PathAnalysisResults
Compute path-based centrality metrics.
Computes:
- Betweenness centrality: How often a page appears on shortest paths
- Closeness centrality: How close a page is to all other pages
- Network diameter: Longest shortest path
- Average path length
For large sites, automatically uses pivot-based approximation for O(k*N) complexity instead of O(N²).
Parameters 1
progress_callback |
ProgressCallback | None |
Optional callback for progress updates. Called as callback(current, total, phase) where phase is "betweenness" or "closeness". |
Returns
PathAnalysisResults with all metricsPathAnalysisResults
—
find_all_paths
Find all simple paths between two pages with safety limits.
This operation can…
find_all_paths
def find_all_paths(self, source: Page, target: Page, max_length: int = 10, max_paths: int = 1000, timeout_seconds: float | None = 30.0) -> PathSearchResult
Find all simple paths between two pages with safety limits.
This operation can be expensive for highly connected graphs.
Safety limits prevent runaway computation:
- max_length: Maximum path length to consider
- max_paths: Stop after finding this many paths
- timeout_seconds: Stop after this many seconds
Parameters 5
source |
Page |
Starting page |
target |
Page |
Destination page |
max_length |
int |
Maximum path length to search (default: 10) |
max_paths |
int |
Maximum number of paths to find (default: 1000) |
timeout_seconds |
float | None |
Maximum time in seconds (default: 30.0, None for no limit) |
Returns
PathSearchResult with paths found and completion statusPathSearchResult
—
find_all_paths_simple
Find all simple paths between two pages (legacy API).
This is the original API…
find_all_paths_simple
def find_all_paths_simple(self, source: Page, target: Page, max_length: int = 10) -> list[list[Page]]
Find all simple paths between two pages (legacy API).
This is the original API preserved for backward compatibility. For new code, prefer find_all_paths() which includes safety limits.
Parameters 3
source |
Page |
Starting page |
target |
Page |
Destination page |
max_length |
int |
Maximum path length to search |
Returns
List of paths (each path is a list of pages)list[list[Page]]
—
Internal Methods 4
__init__
Initialize path analyzer.
__init__
def __init__(self, graph: KnowledgeGraph, k_pivots: int = DEFAULT_K_PIVOTS, seed: int = DEFAULT_SEED, auto_approximate_threshold: int = DEFAULT_AUTO_THRESHOLD)
Initialize path analyzer.
Parameters 4
graph |
KnowledgeGraph |
KnowledgeGraph with page connections |
k_pivots |
int |
Number of pivot nodes to use for approximation (default: 100) |
seed |
int |
Random seed for deterministic pivot selection (default: 42) |
auto_approximate_threshold |
int |
Use exact algorithm if page count <= this (default: 500) |
_compute_betweenness_centrality
Compute betweenness centrality using Brandes' algorithm.
Betweenness measures …
_compute_betweenness_centrality
def _compute_betweenness_centrality(self, pages: list[Page], use_approximate: bool, progress_callback: ProgressCallback | None = None) -> dict[Page, float]
Compute betweenness centrality using Brandes' algorithm.
Betweenness measures how often a page appears on shortest paths between other pages. High betweenness indicates a "bridge" page.
For large sites, uses pivot-based approximation: only compute from k randomly selected source nodes and scale results.
Parameters 3
pages |
list[Page] |
List of pages to analyze |
use_approximate |
bool |
If True, use pivot-based approximation |
progress_callback |
ProgressCallback | None |
Optional progress callback |
Returns
Dictionary mapping pages to betweenness centrality scoresdict[Page, float]
—
_compute_closeness_centrality
Compute closeness centrality and network metrics.
Closeness measures how close…
_compute_closeness_centrality
def _compute_closeness_centrality(self, pages: list[Page], use_approximate: bool, progress_callback: ProgressCallback | None = None) -> tuple[dict[Page, float], float, int]
Compute closeness centrality and network metrics.
Closeness measures how close a page is to all other pages. Higher closeness = more accessible.
For large sites, uses pivot-based approximation: only compute distances from k randomly selected source nodes.
Parameters 3
pages |
list[Page] |
List of pages to analyze |
use_approximate |
bool |
If True, use pivot-based approximation |
progress_callback |
ProgressCallback | None |
Optional progress callback |
Returns
Tuple of (closeness_dict, avg_path_length, diameter)tuple[dict[Page, float], float, int]
—
_bfs_distances
Compute shortest path distances from source to all other pages.
_bfs_distances
def _bfs_distances(self, source: Page, pages: list[Page]) -> dict[Page, int]
Compute shortest path distances from source to all other pages.
Parameters 2
source |
Page |
|
pages |
list[Page] |
Returns
dict[Page, int]
Functions
analyze_paths
Convenience function for path analysis.
analyze_paths
def analyze_paths(graph: KnowledgeGraph, k_pivots: int = PathAnalyzer.DEFAULT_K_PIVOTS, seed: int = PathAnalyzer.DEFAULT_SEED, auto_approximate_threshold: int = PathAnalyzer.DEFAULT_AUTO_THRESHOLD, progress_callback: ProgressCallback | None = None) -> PathAnalysisResults
Convenience function for path analysis.
Parameters 5
| Name | Type | Default | Description |
|---|---|---|---|
graph |
KnowledgeGraph |
— | KnowledgeGraph with page connections |
k_pivots |
int |
PathAnalyzer.DEFAULT_K_PIVOTS |
Number of pivot nodes for approximation (default: 100) |
seed |
int |
PathAnalyzer.DEFAULT_SEED |
Random seed for deterministic results (default: 42) |
auto_approximate_threshold |
int |
PathAnalyzer.DEFAULT_AUTO_THRESHOLD |
Use exact if pages <= this (default: 500) |
progress_callback |
ProgressCallback | None |
None |
Optional progress callback |
Returns
PathAnalysisResults with centrality metricsPathAnalysisResults
—