Module

analysis.community_detection

Community Detection for Bengal SSG.

Implements the Louvain method for discovering topical clusters in content. The algorithm optimizes modularity to find natural groupings of pages.

The Louvain method works in two phases:

  1. Local optimization: Move nodes to communities that maximize modularity gain
  2. Aggregation: Treat each community as a single node and repeat

References:

  • Blondel, V. D., et al. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment.

Classes

Community dataclass
A community of related pages discovered through link structure. Represents a group of pages that a…
2

A community of related pages discovered through link structure.

Represents a group of pages that are densely connected to each other and share similar topics or themes. Useful for understanding content organization and identifying topic clusters.

Attributes

Name Type Description
id int

Unique community identifier

pages set[Page]

Set of pages belonging to this community

size

Number of pages in the community

density

Internal connection density (0.0-1.0)

Methods 2

size property
Number of pages in this community.
int
def size(self) -> int

Number of pages in this community.

Returns

int

get_top_pages_by_degree
Get most connected pages in this community.
1 list[Page]
def get_top_pages_by_degree(self, limit: int = 5) -> list[Page]

Get most connected pages in this community.

Parameters 1
limit int
Returns

list[Page]

CommunityDetectionResults dataclass
Results from community detection analysis. Contains discovered communities and quality metrics. Co…
3

Results from community detection analysis.

Contains discovered communities and quality metrics. Communities represent natural groupings of related pages based on link structure.

Attributes

Name Type Description
communities list[Community]

List of detected communities

modularity float

Modularity score (quality metric, -1.0 to 1.0, higher is better)

iterations int
num_communities

Total number of communities detected

Methods 3

get_community_for_page
Find which community a page belongs to.
1 Community | None
def get_community_for_page(self, page: Page) -> Community | None

Find which community a page belongs to.

Parameters 1
page Page
Returns

Community | None

get_largest_communities
Get largest communities by page count.
1 list[Community]
def get_largest_communities(self, limit: int = 10) -> list[Community]

Get largest communities by page count.

Parameters 1
limit int
Returns

list[Community]

get_communities_above_size
Get communities with at least min_size pages.
1 list[Community]
def get_communities_above_size(self, min_size: int) -> list[Community]

Get communities with at least min_size pages.

Parameters 1
min_size int
Returns

list[Community]

LouvainCommunityDetector
Detect communities using the Louvain method. The Louvain algorithm is a greedy optimization method…
7

Detect communities using the Louvain method.

The Louvain algorithm is a greedy optimization method that attempts to optimize the modularity of a partition of the network. It runs in two phases:

  1. Modularity Optimization: Each node is moved to the community that yields the largest increase in modularity.

  2. Community Aggregation: A new network is built where nodes are communities and edges represent connections between communities.

These phases are repeated until no further improvement is possible.

Methods 1

detect
Detect communities using Louvain method.
0 CommunityDetectionResults
def detect(self) -> CommunityDetectionResults

Detect communities using Louvain method.

Returns

CommunityDetectionResults

CommunityDetectionResults with discovered communities

Internal Methods 6
__init__
Initialize Louvain community detector.
3 None
def __init__(self, graph: KnowledgeGraph, resolution: float = 1.0, random_seed: int | None = None)

Initialize Louvain community detector.

Parameters 3
graph KnowledgeGraph

KnowledgeGraph with page connections

resolution float

Resolution parameter (higher = more communities)

random_seed int | None

Random seed for reproducibility

_build_edge_weights
Build edge weights from the graph. Uses frozenset to represent undirected edges.
1 dict[frozenset[Page…
def _build_edge_weights(self, pages: list[Page]) -> dict[frozenset[Page], float]

Build edge weights from the graph.

Uses frozenset to represent undirected edges.

Parameters 1
pages list[Page]
Returns

dict[frozenset[Page], float]

_compute_node_degrees
Compute weighted degree for each node.
2 dict[Page, float]
def _compute_node_degrees(self, pages: list[Page], edge_weights: dict[frozenset[Page], float]) -> dict[Page, float]

Compute weighted degree for each node.

Parameters 2
pages list[Page]
edge_weights dict[frozenset[Page], float]
Returns

dict[Page, float]

_get_neighboring_communities
Get communities that are neighbors of this page.
3 set[int]
def _get_neighboring_communities(self, page: Page, page_to_community: dict[Page, int], edge_weights: dict[frozenset[Page], float]) -> set[int]

Get communities that are neighbors of this page.

Parameters 3
page Page
page_to_community dict[Page, int]
edge_weights dict[frozenset[Page], float]
Returns

set[int]

_modularity_gain
Calculate modularity gain from moving page to new community. This uses the fas…
6 float
def _modularity_gain(self, page: Page, to_community: int, page_to_community: dict[Page, int], edge_weights: dict[frozenset[Page], float], node_degrees: dict[Page, float], total_weight: float) -> float

Calculate modularity gain from moving page to new community.

This uses the fast incremental formula for modularity change.

Parameters 6
page Page
to_community int
page_to_community dict[Page, int]
edge_weights dict[frozenset[Page], float]
node_degrees dict[Page, float]
total_weight float
Returns

float

_compute_modularity
Compute Newman's modularity Q.
4 float
def _compute_modularity(self, page_to_community: dict[Page, int], edge_weights: dict[frozenset[Page], float], node_degrees: dict[Page, float], total_weight: float) -> float

Compute Newman's modularity Q.

Parameters 4
page_to_community dict[Page, int]
edge_weights dict[frozenset[Page], float]
node_degrees dict[Page, float]
total_weight float
Returns

float

Functions

detect_communities
Convenience function to detect communities.
3 CommunityDetectionResults
def detect_communities(graph: KnowledgeGraph, resolution: float = 1.0, random_seed: int | None = None) -> CommunityDetectionResults

Convenience function to detect communities.

Parameters 3

Name Type Default Description
graph KnowledgeGraph

KnowledgeGraph with page connections

resolution float 1.0

Resolution parameter (higher = more communities)

random_seed int | None None

Random seed for reproducibility

Returns

CommunityDetectionResults

CommunityDetectionResults with discovered communities