Bengal includes an analysis system that helps you understand and optimize site structure, connectivity, and content relationships.
Overview
The analysis module provides tools for:
- Knowledge Graph Analysis: Build and analyze page connectivity through links, taxonomies, and menus
- PageRank: Compute page importance scores using Google's PageRank algorithm
- Community Detection: Discover topical clusters using the Louvain method
- Path Analysis: Identify navigation bridges and bottlenecks using centrality metrics
- Link Suggestions: Get smart recommendations for internal linking
- Graph Visualization: Generate interactive visualizations of site structure
- Performance Advisor: Analyze and recommend performance optimizations
Knowledge Graph (bengal/analysis/knowledge_graph.py)
Purpose: Analyzes the connectivity structure of a Bengal site by building a graph of all pages and their connections.
Tracks connections through:
- Internal links (cross-references)
- Taxonomies (tags, categories)
- Related posts
- Menu items
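As a rough illustration of the idea (not Bengal's actual implementation), the sketch below merges two of these connection sources into one directed graph and then sorts pages into hubs, leaves, and orphans. The function names, the input shapes, the `hub_threshold` parameter, and the exact definitions of hub, leaf, and orphan are all assumptions made for the example.

```python
from collections import defaultdict


def build_graph(links, tags):
    """Merge explicit links and shared-tag relations into one adjacency map.

    links: iterable of (source, target) page pairs (hypothetical input shape).
    tags:  dict mapping tag name -> list of pages carrying that tag.
    """
    outgoing = defaultdict(set)
    for source, target in links:
        if source != target:
            outgoing[source].add(target)
    # Assumption: pages sharing a tag count as connected in both directions.
    for pages in tags.values():
        for a in pages:
            for b in pages:
                if a != b:
                    outgoing[a].add(b)
    return outgoing


def classify(pages, outgoing, hub_threshold=3):
    """Split pages into hubs, leaves, and orphans by connection counts."""
    incoming = defaultdict(int)
    for source, targets in outgoing.items():
        for target in targets:
            incoming[target] += 1
    # Assumed definitions: a hub has many incoming connections,
    # an orphan has no connections at all, everything else is a leaf.
    hubs = [p for p in pages if incoming[p] >= hub_threshold]
    orphans = [p for p in pages if incoming[p] == 0 and not outgoing.get(p)]
    leaves = [p for p in pages if p not in hubs and p not in orphans]
    return hubs, leaves, orphans


pages = ["index", "guide", "api", "blog", "draft"]
links = [("guide", "index"), ("api", "index"), ("blog", "index"), ("index", "guide")]
tags = {"python": ["guide", "api"]}

graph = build_graph(links, tags)
hubs, leaves, orphans = classify(pages, graph)
```

In this toy site, `index` ends up as the only hub and `draft`, which nothing links to and which links to nothing, is the only orphan.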
Key Classes:
| Class | Description |
|---|---|
| `KnowledgeGraph` | Main analyzer that builds and queries the connectivity graph |
| `GraphMetrics` | Aggregated metrics (total pages, links, hubs, leaves, orphans) |
| `PageConnectivity` | Per-page connectivity information |
Key Features:
- Autodoc Filtering: Excludes API reference pages from analysis by default (`exclude_autodoc=True`)
- Actionable Recommendations: Provides specific, actionable suggestions for improvement
- Link Extraction: Automatically extracts links from pages before analysis
Provides insights for:
- Content strategy (find orphaned pages)
- Performance optimization (hub-first streaming)
- Navigation design (understand structure)
- SEO improvements (link structure)
PageRank (bengal/analysis/page_rank.py)
Purpose: Computes page importance scores using Google's PageRank algorithm with the iterative power method.
Algorithm considers:
- Number of incoming links (popularity)
- Importance of pages linking to it (authority)
- Damping factor for random navigation (user behavior)
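The three factors above map directly onto the power-method update. The following is a minimal, self-contained sketch of that iteration, not Bengal's `PageRankComputer`; the function name, the parameter defaults, and the handling of pages without outgoing links are assumptions for illustration.

```python
def pagerank(outgoing, damping=0.85, max_iterations=100, tolerance=1e-6):
    """Iterative power-method PageRank over a dict of page -> set of link targets."""
    pages = set(outgoing)
    for targets in outgoing.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(max_iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(scores[p] for p in pages if not outgoing.get(p))
        new_scores = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank to every page it links to.
        for source, targets in outgoing.items():
            for target in targets:
                new_scores[target] += damping * scores[source] / len(targets)
        change = sum(abs(new_scores[p] - scores[p]) for p in pages)
        scores = new_scores
        if change < tolerance:
            break  # converged
    return scores


links = {
    "index": {"guide", "blog"},
    "guide": {"index"},
    "blog": {"index"},
    "orphan": {"index"},
}
scores = pagerank(links)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Here `index` ranks first because every other page links to it, while `orphan` ranks last because it receives no links and keeps only the baseline share from the damping term.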
Key Classes:
| Class | Description |
|---|---|
| `PageRankComputer` | Computes PageRank scores iteratively |
| `PageRankResults` | Results with scores, convergence info, and ranking methods |
Applications:
- Prioritize important content
- Guide content promotion
- Optimize navigation structure
- Implement hub-first streaming
Community Detection (bengal/analysis/community_detection.py)
Purpose: Discovers topical clusters in content using the Louvain method for modularity optimization.
Algorithm: Two-phase iterative approach
- Local optimization: Move nodes to communities that maximize modularity gain
- Aggregation: Treat each community as a single node and repeat
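To make the first phase concrete, here is a small sketch of modularity and the local optimization step on an undirected, unweighted graph. It is a simplified illustration rather than Bengal's `LouvainCommunityDetector`: it recomputes modularity from scratch for every trial move (real implementations use an incremental gain formula), and it omits the aggregation phase entirely. All names are hypothetical.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    internal = {}  # number of edges with both endpoints in the community
    total = {}     # summed degree of the community's nodes
    for u, v in edges:
        if community[u] == community[v]:
            internal[community[u]] = internal.get(community[u], 0) + 1
    for node, deg in degree.items():
        total[community[node]] = total.get(community[node], 0) + deg
    return sum(internal.get(c, 0) / m - (total[c] / (2 * m)) ** 2 for c in total)


def local_optimization(edges):
    """Phase 1 only: move each node to the neighboring community that raises Q most."""
    nodes = sorted({n for edge in edges for n in edge})
    neighbors = {n: set() for n in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    community = {n: n for n in nodes}  # start with one community per node
    improved = True
    while improved:
        improved = False
        for node in nodes:
            best_label = community[node]
            best_q = modularity(edges, community)
            for candidate in sorted({community[nb] for nb in neighbors[node]}):
                trial = {**community, node: candidate}
                q = modularity(edges, trial)
                if q > best_q + 1e-12:  # accept strict improvements only
                    best_label, best_q = candidate, q
            if best_label != community[node]:
                community[node] = best_label
                improved = True
    return community


# Two triangles joined by a single edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
community = local_optimization(edges)
```

On this graph the pass settles on the two triangles as separate communities. In the full method, each community would then be collapsed into a single node and the pass repeated.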
Key Classes:
| Class | Description |
|---|---|
| `LouvainCommunityDetector` | Implements Louvain method for community detection |
| `Community` | Represents a community of related pages |
| `CommunityDetectionResults` | Results with communities, modularity score, and query methods |
Applications:
- Discover topical organization
- Guide content categorization
- Improve internal linking within topics
- Generate navigation menus
Path Analysis (bengal/analysis/path_analysis.py)
Purpose: Analyze navigation paths and page accessibility using centrality metrics.
Computes:
- Betweenness centrality: Pages that connect different parts of the site (bridges)
- Closeness centrality: Pages that are easy to reach from anywhere (accessible)
- Shortest paths: Navigation paths between pages
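Shortest paths and closeness can both be derived from a breadth-first search. The sketch below shows one way to do that; it is illustrative only and does not reflect the `PathAnalyzer` API. It assumes links are treated as undirected (each connection appears in both adjacency lists) and uses the common "reachable pages divided by summed distance" form of closeness.

```python
from collections import deque


def bfs_distances(neighbors, start):
    """Hop counts from start to every reachable page, plus BFS parents."""
    distance = {start: 0}
    parent = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in neighbors.get(page, ()):
            if nxt not in distance:
                distance[nxt] = distance[page] + 1
                parent[nxt] = page
                queue.append(nxt)
    return distance, parent


def shortest_path(neighbors, start, goal):
    """One shortest navigation path from start to goal, or None if unreachable."""
    _, parent = bfs_distances(neighbors, start)
    if goal not in parent:
        return None
    path = []
    node = goal
    while node is not None:  # walk parents back to the start page
        path.append(node)
        node = parent[node]
    return path[::-1]


def closeness(neighbors, page):
    """(number of reachable pages) / (sum of distances); 0.0 if isolated."""
    distance, _ = bfs_distances(neighbors, page)
    total = sum(distance.values())
    return (len(distance) - 1) / total if total else 0.0


site = {
    "index": ["docs", "blog"],
    "docs": ["index", "install"],
    "blog": ["index"],
    "install": ["docs"],
}
route = shortest_path(site, "blog", "install")
```

In this example `index` has a higher closeness than `install`, since every other page is at most two hops away from it.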
Key Classes:
| Class | Description |
|---|---|
| `PathAnalyzer` | Computes centrality metrics with auto-scaling approximation |
| `PathAnalysisResults` | Results with centrality scores and approximation metadata |
| `PathSearchResult` | Results from path search with termination metadata |
Scalability: For large sites (>500 pages by default), automatically uses pivot-based approximation to achieve O(k*N) complexity instead of O(N²). This provides ~100x speedup for 10k page sites while maintaining accurate relative rankings.
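One common way to realize a pivot-based approximation is to run Brandes's betweenness algorithm from a random sample of source pages and scale the result. The sketch below takes that approach as an assumption; Bengal's own implementation may differ in sampling and normalization. It expects every page to appear as a key in `neighbors`.

```python
import random
from collections import deque


def betweenness(neighbors, k_pivots=None, seed=42):
    """Brandes betweenness on an unweighted graph; samples k_pivots sources if given."""
    nodes = sorted(neighbors)
    sources = nodes
    if k_pivots is not None and k_pivots < len(nodes):
        sources = random.Random(seed).sample(nodes, k_pivots)
    score = {n: 0.0 for n in nodes}

    for s in sources:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {n: [] for n in nodes}
        sigma = {n: 0 for n in nodes}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest pages, accumulating dependencies.
        delta = {n: 0.0 for n in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]

    # Scale sampled totals so they estimate the all-sources sum.
    scale = len(nodes) / len(sources)
    return {n: value * scale for n, value in score.items()}


# A simple chain of five pages: a - b - c - d - e.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
exact = betweenness(chain)
approx = betweenness(chain, k_pivots=3)
```

With all five pages as sources, the middle page `c` gets the highest score. With three pivots, the work drops to three searches instead of five, at the cost of noisier estimates; the fixed `seed` keeps the sample, and therefore the result, reproducible.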
Configuration:
- `k_pivots`: Number of pivot nodes for approximation (default: 100)
- `seed`: Random seed for deterministic results (default: 42)
- `auto_approximate_threshold`: Use exact computation when the page count is ≤ this value (default: 500)
Applications:
- Identify navigation bottlenecks
- Optimize site structure
- Find critical pages for user flows
- Improve site accessibility
Link Suggestions (bengal/analysis/link_suggestions.py)
Purpose: Provides smart cross-linking recommendations based on multiple signals.
Considers:
- Topic similarity (shared tags, categories, keywords)
- PageRank importance (prioritize linking to high-value pages)
- Navigation patterns (betweenness, closeness)
- Current link structure (avoid over-linking, find gaps)
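A simple way to picture how such signals might be combined is a weighted score per candidate page. The sketch below uses only two of the signals (tag overlap and PageRank) and filters out links that already exist. The weights, function names, and input shapes are hypothetical and are not taken from `LinkSuggestionEngine`.

```python
def jaccard(a, b):
    """Overlap of two tag sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_links(page, tags, pagerank, existing_links, top_n=3,
                  similarity_weight=0.7, importance_weight=0.3):
    """Rank pages that `page` does not link to yet, highest score first."""
    max_rank = max(pagerank.values()) or 1.0
    suggestions = []
    for candidate in tags:
        if candidate == page or candidate in existing_links.get(page, set()):
            continue  # skip self-links and links that already exist
        similarity = jaccard(tags[page], tags[candidate])
        if similarity == 0.0:
            continue  # no shared topic, so nothing to suggest
        importance = pagerank.get(candidate, 0.0) / max_rank
        score = similarity_weight * similarity + importance_weight * importance
        suggestions.append((candidate, round(score, 3)))
    return sorted(suggestions, key=lambda item: item[1], reverse=True)[:top_n]


tags = {
    "install": {"setup", "cli"},
    "quickstart": {"setup", "cli", "tutorial"},
    "config": {"setup"},
    "changelog": {"release"},
    "cli-ref": {"cli"},
}
pagerank = {"install": 0.2, "quickstart": 0.4, "config": 0.1,
            "changelog": 0.1, "cli-ref": 0.2}
existing_links = {"install": {"config"}}

suggestions = suggest_links("install", tags, pagerank, existing_links)
```

For the `install` page this suggests `quickstart` first, then `cli-ref`; `config` is skipped because it is already linked, and `changelog` is skipped because it shares no tags.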
Key Classes:
| Class | Description |
|---|---|
| `LinkSuggestionEngine` | Generates smart link suggestions |
| `LinkSuggestion` | A suggested link with score and reasons |
| `LinkSuggestionResults` | Collection of suggestions with query methods |
Helps improve:
- Internal linking structure
- SEO through better site connectivity
- Content discoverability
- User navigation
Graph Visualization (bengal/analysis/graph_visualizer.py)
Purpose: Generate interactive visualizations of site structure using D3.js force-directed graphs.
Features:
- Node sizing by PageRank
- Node coloring by community
- Edge thickness by connection strength
- Interactive zoom and pan
- Node hover information
- Responsive design
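Force-directed layouts in D3 are commonly fed a JSON object with a `nodes` array and a `links` array. As a hedged sketch of how the first three features could be encoded in that data, the hypothetical helper below sizes nodes by PageRank, tags them with a community number, and counts repeated links as edge strength. The field names `radius`, `group`, and `value` are assumptions, not the format Bengal emits.

```python
import json


def to_d3_graph(links, pagerank, community, min_radius=4, max_radius=20):
    """Build a {"nodes": [...], "links": [...]} dict for a force-directed graph."""
    top = max(pagerank.values())
    nodes = [
        {
            "id": page,
            # Node size grows linearly with PageRank relative to the top page.
            "radius": round(min_radius + (max_radius - min_radius) * rank / top, 2),
            # The front end can map this integer to a color scale.
            "group": community[page],
        }
        for page, rank in sorted(pagerank.items())
    ]
    # Count repeated (source, target) pairs so edge thickness can reflect strength.
    weights = {}
    for source, target in links:
        weights[(source, target)] = weights.get((source, target), 0) + 1
    edges = [
        {"source": s, "target": t, "value": w}
        for (s, t), w in sorted(weights.items())
    ]
    return {"nodes": nodes, "links": edges}


links = [("index", "guide"), ("index", "guide"), ("guide", "api")]
pagerank = {"index": 0.5, "guide": 0.3, "api": 0.2}
community = {"index": 0, "guide": 0, "api": 1}

graph = to_d3_graph(links, pagerank, community)
payload = json.dumps(graph)  # string that a page script could load
```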
Performance Advisor (bengal/analysis/performance_advisor.py)
Purpose: Analyzes site structure and provides performance optimization recommendations.
Analyzes:
- Hub-first streaming opportunities
- Parallel rendering candidates
- Cache hit potential
- Link structure efficiency
CLI Integration
The analysis system is integrated into the CLI with dedicated commands.
Export Formats: All commands support multiple output formats:
- `table` (default) - Human-readable table format
- `json` - JSON for programmatic processing
- `csv` - CSV for spreadsheet analysis
- `summary` - Summary statistics (pagerank, communities, bridges)
- `markdown` - Markdown checklist (suggest command)
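To show what the first three formats look like for the same data, here is a small standalone sketch that renders a list of result rows as a table, JSON, or CSV. The `export` function and the row fields are hypothetical and only mirror the format names above; they are not Bengal's CLI code.

```python
import csv
import io
import json


def export(rows, fmt="table"):
    """Render a list of dict rows as a table, JSON, or CSV string."""
    if not rows:
        return ""
    columns = list(rows[0])
    if fmt == "json":
        return json.dumps(rows, indent=2)
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    if fmt == "table":
        # Pad every column to the width of its longest cell.
        cells = [columns] + [[str(row[c]) for c in columns] for row in rows]
        widths = [max(len(line[i]) for line in cells) for i in range(len(columns))]
        return "\n".join(
            "  ".join(cell.ljust(width) for cell, width in zip(line, widths)).rstrip()
            for line in cells
        )
    raise ValueError(f"unsupported format: {fmt}")


rows = [{"page": "index", "score": 0.42}, {"page": "guide", "score": 0.31}]
table_text = export(rows)
csv_text = export(rows, "csv")
json_text = export(rows, "json")
```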
Key Features:
- Actionable Recommendations: The `analyze` command provides specific recommendations
- Autodoc Filtering: API reference pages are excluded by default for cleaner analysis
- Multiple Export Formats: Export results for further analysis or reporting
See Graph Analysis for detailed usage examples and workflows.