path_analysis

Classes

PathAnalysisResults dataclass

Results from path analysis and centrality computations. Contains centrality metrics that identify …

Results from path analysis and centrality computations.

Contains centrality metrics that identify important pages in the site's link structure. High betweenness indicates bridge pages, high closeness indicates easily accessible pages.

Attributes

Name	Type	Description
`betweenness_centrality`	`dict[Page, float]`	Map of pages to betweenness scores (0.0-1.0)
`closeness_centrality`	`dict[Page, float]`	Map of pages to closeness scores (0.0-1.0)
`avg_path_length`	`float`	Average shortest path length between all page pairs
`diameter`	`int`	Network diameter (longest shortest path)
`is_approximate`	`bool`	True if approximation was used (for large sites)
`pivots_used`	`int`	Number of pivot nodes used (if approximate)

Methods 4

get_top_bridges

Get pages with highest betweenness centrality (bridge pages).

1 list[tuple[Page, float]]

def get_top_bridges(self, limit: int = 20) -> list[tuple[Page, float]]

Get pages with highest betweenness centrality (bridge pages).

Parameters 1

limit

int

Number of pages to return

Returns

list[tuple[Page, float]] —

List of (page, centrality) tuples sorted by centrality descending

get_most_accessible

Get most accessible pages (highest closeness centrality).

1 list[tuple[Page, float]]

def get_most_accessible(self, limit: int = 20) -> list[tuple[Page, float]]

Get most accessible pages (highest closeness centrality).

Parameters 1

limit

int

Number of pages to return

Returns

list[tuple[Page, float]] —

List of (page, centrality) tuples sorted by centrality descending

get_betweenness

Get betweenness centrality for specific page.

1 float

def get_betweenness(self, page: Page) -> float

Get betweenness centrality for specific page.

Parameters 1

page Page

Returns

float

get_closeness

Get closeness centrality for specific page.

1 float

def get_closeness(self, page: Page) -> float

Get closeness centrality for specific page.

Parameters 1

page Page

Returns

float

PathSearchResult dataclass

Result from find_all_paths including metadata about the search.

Attributes

Name	Type	Description
`paths`	`list[list[Page]]`	List of paths found (each path is a list of pages)
`complete`	`bool`	True if search completed, False if terminated early
`termination_reason`	`str \| None`	Reason for early termination (if any)

PathAnalyzer

Analyze navigation paths and page accessibility. Computes centrality metrics to identify: - Bridge…

Analyze navigation paths and page accessibility.

Computes centrality metrics to identify:

Bridge pages (high betweenness): Pages that connect different parts of the site
Accessible pages (high closeness): Pages that are easy to reach from anywhere
Navigation bottlenecks: Critical pages for site navigation

For large sites (>500 pages by default), uses pivot-based approximation to achieve O(k*N) complexity instead of O(N²). This provides ~100x speedup for 10k page sites while maintaining accurate relative rankings.

Methods 4

find_shortest_path

Find shortest path between two pages using BFS.

2 list[Page] | None

def find_shortest_path(self, source: Page, target: Page) -> list[Page] | None

Find shortest path between two pages using BFS.

Parameters 2

`source`	`Page`	Starting page
`target`	`Page`	Destination page

Returns

list[Page] | None —

List of pages representing the path, or None if no path exists

analyze

Compute path-based centrality metrics. Computes: - Betweenness centrality: How…

1 PathAnalysisResults

def analyze(self, progress_callback: ProgressCallback | None = None) -> PathAnalysisResults

Compute path-based centrality metrics.

Computes:

Betweenness centrality: How often a page appears on shortest paths
Closeness centrality: How close a page is to all other pages
Network diameter: Longest shortest path
Average path length

For large sites, automatically uses pivot-based approximation for O(k*N) complexity instead of O(N²).

Parameters 1

progress_callback

ProgressCallback | None

Optional callback for progress updates. Called as callback(current, total, phase) where phase is "betweenness" or "closeness".

Returns

PathAnalysisResults —

PathAnalysisResults with all metrics

find_all_paths

Find all simple paths between two pages with safety limits. This operation can…

5 PathSearchResult

def find_all_paths(self, source: Page, target: Page, max_length: int = 10, max_paths: int = 1000, timeout_seconds: float | None = 30.0) -> PathSearchResult

Find all simple paths between two pages with safety limits.

This operation can be expensive for highly connected graphs.

Safety limits prevent runaway computation:

max_length: Maximum path length to consider
max_paths: Stop after finding this many paths
timeout_seconds: Stop after this many seconds

Parameters 5

`source`	`Page`	Starting page
`target`	`Page`	Destination page
`max_length`	`int`	Maximum path length to search (default: 10)
`max_paths`	`int`	Maximum number of paths to find (default: 1000)
`timeout_seconds`	`float \| None`	Maximum time in seconds (default: 30.0, None for no limit)

Returns

PathSearchResult —

PathSearchResult with paths found and completion status

find_all_paths_simple

Find all simple paths between two pages (legacy API). This is the original API…

3 list[list[Page]]

def find_all_paths_simple(self, source: Page, target: Page, max_length: int = 10) -> list[list[Page]]

Find all simple paths between two pages (legacy API).

This is the original API preserved for backward compatibility. For new code, prefer find_all_paths() which includes safety limits.

Parameters 3

`source`	`Page`	Starting page
`target`	`Page`	Destination page
`max_length`	`int`	Maximum path length to search

Returns

list[list[Page]] —

List of paths (each path is a list of pages)

Internal Methods 4

__init__

Initialize path analyzer.

4 None

def __init__(self, graph: KnowledgeGraph, k_pivots: int = DEFAULT_K_PIVOTS, seed: int = DEFAULT_SEED, auto_approximate_threshold: int = DEFAULT_AUTO_THRESHOLD)

Initialize path analyzer.

Parameters 4

`graph`	`KnowledgeGraph`	KnowledgeGraph with page connections
`k_pivots`	`int`	Number of pivot nodes to use for approximation (default: 100)
`seed`	`int`	Random seed for deterministic pivot selection (default: 42)
`auto_approximate_threshold`	`int`	Use exact algorithm if page count <= this (default: 500)

_compute_betweenness_centrality

Compute betweenness centrality using Brandes' algorithm. Betweenness measures …

3 dict[Page, float]

def _compute_betweenness_centrality(self, pages: list[Page], use_approximate: bool, progress_callback: ProgressCallback | None = None) -> dict[Page, float]

Compute betweenness centrality using Brandes' algorithm.

Betweenness measures how often a page appears on shortest paths between other pages. High betweenness indicates a "bridge" page.

For large sites, uses pivot-based approximation: only compute from k randomly selected source nodes and scale results.

Parameters 3

`pages`	`list[Page]`	List of pages to analyze
`use_approximate`	`bool`	If True, use pivot-based approximation
`progress_callback`	`ProgressCallback \| None`	Optional progress callback

Returns

dict[Page, float] —

Dictionary mapping pages to betweenness centrality scores

_compute_closeness_centrality

Compute closeness centrality and network metrics. Closeness measures how close…

3 tuple[dict[Page, fl…

def _compute_closeness_centrality(self, pages: list[Page], use_approximate: bool, progress_callback: ProgressCallback | None = None) -> tuple[dict[Page, float], float, int]

Compute closeness centrality and network metrics.

Closeness measures how close a page is to all other pages. Higher closeness = more accessible.

For large sites, uses pivot-based approximation: only compute distances from k randomly selected source nodes.

Parameters 3

`pages`	`list[Page]`	List of pages to analyze
`use_approximate`	`bool`	If True, use pivot-based approximation
`progress_callback`	`ProgressCallback \| None`	Optional progress callback

Returns

tuple[dict[Page, float], float, int] —

Tuple of (closeness_dict, avg_path_length, diameter)

_bfs_distances

Compute shortest path distances from source to all other pages.

2 dict[Page, int]

def _bfs_distances(self, source: Page, pages: list[Page]) -> dict[Page, int]

Compute shortest path distances from source to all other pages.

Parameters 2

`source`	`Page`
`pages`	`list[Page]`

Returns

dict[Page, int]

Functions

analyze_paths

Convenience function for path analysis.

5 PathAnalysisResults

def analyze_paths(graph: KnowledgeGraph, k_pivots: int = PathAnalyzer.DEFAULT_K_PIVOTS, seed: int = PathAnalyzer.DEFAULT_SEED, auto_approximate_threshold: int = PathAnalyzer.DEFAULT_AUTO_THRESHOLD, progress_callback: ProgressCallback | None = None) -> PathAnalysisResults

Convenience function for path analysis.

Parameters 5

Name	Type	Default	Description
`graph`	`KnowledgeGraph`	—	KnowledgeGraph with page connections
`k_pivots`	`int`	`PathAnalyzer.DEFAULT_K_PIVOTS`	Number of pivot nodes for approximation (default: 100)
`seed`	`int`	`PathAnalyzer.DEFAULT_SEED`	Random seed for deterministic results (default: 42)
`auto_approximate_threshold`	`int`	`PathAnalyzer.DEFAULT_AUTO_THRESHOLD`	Use exact if pages <= this (default: 500)
`progress_callback`	`ProgressCallback \| None`	`None`	Optional progress callback

Returns

PathAnalysisResults —

PathAnalysisResults with centrality metrics

analysis.path_analysis

Classes

Attributes

Methods 4

Parameters 1

Returns

Parameters 1

Returns

Parameters 1

Returns

Parameters 1

Returns

Attributes

Methods 4

Parameters 2

Returns

Parameters 1

Returns

Parameters 5

Returns

Parameters 3

Returns

Parameters 4

Parameters 3

Returns

Parameters 3

Returns

Parameters 2

Returns

Functions

Parameters 5

Returns

`analysis.path_analysis`