Bengal includes a comprehensive analysis system that helps understand and optimize site structure, connectivity, and content relationships.
Overview
The analysis module is organized into focused subpackages:
bengal/analysis/
├── graph/ # Graph-based page analysis
│ ├── knowledge_graph.py # Central graph representation
│ ├── builder.py # Graph construction
│ ├── analyzer.py # Structure analysis (hubs, leaves, orphans)
│ ├── metrics.py # Connectivity metrics
│ ├── reporter.py # Human-readable insights
│ ├── visualizer.py # D3.js visualization
│ ├── page_rank.py # PageRank scoring
│ └── community_detection.py # Louvain clustering
├── links/ # Link analysis
│ ├── suggestions.py # Cross-linking recommendations
│ ├── patterns.py # Link pattern detection
│ └── types.py # Link classification
├── performance/ # Performance analysis
│ ├── advisor.py # Build optimization tips
│ └── path_analysis.py # Centrality metrics
├── content_intelligence.py # Content analysis utilities
└── results.py # Shared result types
The module provides tools for:
- Knowledge Graph Analysis: Build and analyze page connectivity through links, taxonomies, and menus
- PageRank: Compute page importance scores using Google's PageRank algorithm
- Community Detection: Discover topical clusters using the Louvain method
- Path Analysis: Identify navigation bridges and bottlenecks using centrality metrics
- Link Suggestions: Get smart recommendations for internal linking
- Graph Visualization: Generate interactive visualizations of site structure
- Performance Advisor: Analyze and recommend performance optimizations
Knowledge Graph (bengal/analysis/graph/knowledge_graph.py)
Purpose: Analyzes the connectivity structure of a Bengal site by building a graph of all pages and their connections.
Tracks connections through:
- Internal links (cross-references)
- Taxonomies (tags, categories)
- Related posts
- Menu items
Key Classes:
| Class | Description |
|---|---|
KnowledgeGraph |
Main analyzer that builds and queries the connectivity graph |
GraphMetrics |
Aggregated metrics (total pages, links, hubs, leaves, orphans) |
PageConnectivity |
Per-page connectivity information |
Usage:
from bengal.analysis import KnowledgeGraph
graph = KnowledgeGraph(site, exclude_autodoc=True) # Exclude API docs by default
graph.build()
# Find orphaned pages
orphans = graph.get_orphans()
print(f"Found {len(orphans)} orphaned pages")
# Find hub pages
hubs = graph.get_hubs(threshold=10)
print(f"Found {len(hubs)} hub pages")
# Get metrics
metrics = graph.metrics
print(f"Average connectivity: {metrics.avg_connectivity:.2f}")
# Get actionable recommendations
recommendations = graph.get_actionable_recommendations()
for rec in recommendations:
print(rec)
Key Features:
- Autodoc Filtering: Excludes API reference pages from analysis by default (
exclude_autodoc=True) - Actionable Recommendations: Provides specific, actionable suggestions for improvement
- Link Extraction: Automatically extracts links from pages before analysis
Provides insights for:
- Content strategy (find orphaned pages)
- Performance optimization (hub-first streaming)
- Navigation design (understand structure)
- SEO improvements (link structure)
PageRank (bengal/analysis/graph/page_rank.py)
Purpose: Computes page importance scores using Google's PageRank algorithm with the iterative power method.
Algorithm considers:
- Number of incoming links (popularity)
- Importance of pages linking to it (authority)
- Damping factor for random navigation (user behavior)
Key Classes:
| Class | Description |
|---|---|
PageRankCalculator |
Computes PageRank scores iteratively |
PageRankResults |
Results with scores, convergence info, and ranking methods |
Usage:
from bengal.analysis import KnowledgeGraph
graph = KnowledgeGraph(site)
graph.build()
# Compute PageRank (recommended: use the graph's convenience method)
results = graph.compute_pagerank(damping=0.85, max_iterations=100)
# Get top-ranked pages
top_pages = results.get_top_pages(limit=20)
for page, score in top_pages:
print(f"{page.title}: {score:.4f}")
# Get pages above 80th percentile
important_pages = results.get_pages_above_percentile(80)
Applications:
- Prioritize important content
- Guide content promotion
- Optimize navigation structure
- Implement hub-first streaming
Community Detection (bengal/analysis/graph/community_detection.py)
Purpose: Discovers topical clusters in content using the Louvain method for modularity optimization.
Algorithm: Two-phase iterative approach
- Local optimization: Move nodes to communities that maximize modularity gain
- Aggregation: Treat each community as a single node and repeat
Key Classes:
| Class | Description |
|---|---|
LouvainCommunityDetector |
Implements Louvain method for community detection |
Community |
Represents a community of related pages |
CommunityDetectionResults |
Results with communities, modularity score, and query methods |
Usage:
from bengal.analysis.graph.community_detection import LouvainCommunityDetector
detector = LouvainCommunityDetector(knowledge_graph)
results = detector.detect()
# Get largest communities
large_communities = results.get_largest_communities(limit=10)
for community in large_communities:
print(f"Community {community.id}: {community.size} pages")
# Find which community a page belongs to
community = results.get_community_for_page(page)
Applications:
- Discover topical organization
- Guide content categorization
- Improve internal linking within topics
- Generate navigation menus
Path Analysis (bengal/analysis/performance/path_analysis.py)
Purpose: Analyze navigation paths and page accessibility using centrality metrics.
Computes:
- Betweenness centrality: Pages that connect different parts of the site (bridges)
- Closeness centrality: Pages that are easy to reach from anywhere (accessible)
- Shortest paths: Navigation paths between pages
Key Classes:
| Class | Description |
|---|---|
PathAnalyzer |
Computes centrality metrics with auto-scaling approximation |
PathAnalysisResults |
Results with centrality scores and approximation metadata |
PathSearchResult |
Results from path search with termination metadata |
Scalability: For large sites (>500 pages by default), automatically uses pivot-based approximation to achieve O(k*N) complexity instead of O(N²). This provides ~100x speedup for 10k page sites while maintaining accurate relative rankings.
Usage:
from bengal.analysis.performance.path_analysis import PathAnalyzer
# Basic usage (auto-selects exact vs approximate)
analyzer = PathAnalyzer(knowledge_graph)
results = analyzer.analyze()
# With progress callback for long-running analysis
def on_progress(current, total, phase):
print(f"{phase}: {current}/{total}")
results = analyzer.analyze(progress_callback=on_progress)
print(f"Approximate: {results.is_approximate}, pivots: {results.pivots_used}")
# Find bridge pages (navigation bottlenecks)
bridges = results.get_top_bridges(10)
for page, score in bridges:
print(f"Bridge: {page.title} (score: {score:.4f})")
# Find shortest path between pages
path = analyzer.find_shortest_path(source_page, target_page)
if path:
print(" → ".join([p.title for p in path]))
# Safe path finding with limits (prevents runaway computation)
result = analyzer.find_all_paths(
source_page, target_page,
max_length=10,
max_paths=100,
timeout_seconds=5.0
)
if not result.complete:
print(f"Search stopped: {result.termination_reason}")
Configuration:
k_pivots: Number of pivot nodes for approximation (default: 100)seed: Random seed for deterministic results (default: 42)auto_approximate_threshold: Use exact if pages ≤ this (default: 500)
Applications:
- Identify navigation bottlenecks
- Optimize site structure
- Find critical pages for user flows
- Improve site accessibility
Link Suggestions (bengal/analysis/links/suggestions.py)
Purpose: Provides smart cross-linking recommendations based on multiple signals.
Considers:
- Topic similarity (shared tags, categories, keywords)
- PageRank importance (prioritize linking to high-value pages)
- Navigation patterns (betweenness, closeness)
- Current link structure (avoid over-linking, find gaps)
Key Classes:
| Class | Description |
|---|---|
LinkSuggestionEngine |
Generates smart link suggestions |
LinkSuggestion |
A suggested link with score and reasons |
LinkSuggestionResults |
Collection of suggestions with query methods |
Usage:
from bengal.analysis import KnowledgeGraph
graph = KnowledgeGraph(site)
graph.build()
# Generate suggestions (recommended: use the graph's convenience method)
results = graph.suggest_links(min_score=0.5, max_suggestions_per_page=5)
# Get suggestions for specific page
suggestions = results.get_suggestions_for_page(page, limit=10)
for suggestion in suggestions:
print(f"Suggest linking to: {suggestion.target.title}")
print(f" Score: {suggestion.score:.3f}")
print(f" Reasons: {', '.join(suggestion.reasons)}")
Helps improve:
- Internal linking structure
- SEO through better site connectivity
- Content discoverability
- User navigation
Graph Visualization (bengal/analysis/graph/visualizer.py)
Purpose: Generate interactive visualizations of site structure using D3.js force-directed graphs.
Features:
- Node sizing based on connectivity and content depth
- Node coloring by type (hub, orphan, regular, generated)
- Edge weight based on bidirectional links
- Interactive zoom and pan
- Search and filtering by title, tags, or type
- Theme-aware styling (light/dark mode)
Usage:
from pathlib import Path
from bengal.analysis import KnowledgeGraph, GraphVisualizer
graph = KnowledgeGraph(site)
graph.build()
visualizer = GraphVisualizer(site, graph)
html = visualizer.generate_html(title="My Site Graph")
Path("public/graph.html").write_text(html)
Performance Advisor (bengal/analysis/performance/advisor.py)
Purpose: Analyzes build statistics and provides performance optimization recommendations.
Analyzes:
- Parallel processing opportunities
- Incremental build potential
- Rendering bottlenecks
- Asset optimization
- Memory usage patterns
- Template complexity
Grading System: A (90-100), B (75-89), C (60-74), D (45-59), F (0-44)
Usage:
from bengal.analysis.performance import analyze_build
# After a build completes, analyze its statistics
advisor = analyze_build(stats)
# Get performance grade
grade = advisor.get_grade()
print(f"Performance Grade: {grade.grade} ({grade.score}/100)")
print(f"Category: {grade.category}")
# Get top recommendations
for suggestion in advisor.get_top_suggestions(3):
print(f"{suggestion.title}")
print(f" Impact: {suggestion.impact}")
print(f" Action: {suggestion.action}")
CLI Integration
The analysis system is integrated into the CLI underbengal graph:
# Analyze site structure and get report
bengal graph report
bengal graph report --brief
bengal graph report --format json > report.json
# Find orphaned pages
bengal graph orphans
bengal graph orphans --level lightly
bengal graph orphans --format json > orphans.json
# Compute PageRank scores
bengal graph pagerank
bengal graph pagerank --top-n 50
bengal graph pagerank --format json > pagerank.json
# Detect communities
bengal graph communities
bengal graph communities --min-size 10
bengal graph communities --resolution 2.0
# Find bridge pages (high betweenness centrality)
bengal graph bridges
bengal graph bridges --metric closeness
bengal graph bridges --format json > bridges.json
# Get link suggestions
bengal graph suggest
bengal graph suggest --min-score 0.5
bengal graph suggest --format markdown > TODO.md
Export Formats: All commands support multiple output formats:
table(default) - Human-readable table formatjson- JSON for programmatic processingmarkdown- Markdown format (suggest command)
Key Features:
- Actionable Recommendations: The
reportcommand provides specific recommendations - Autodoc Filtering: API reference pages are excluded by default for cleaner analysis
- Multiple Export Formats: Export results for further analysis or reporting
Refer to Graph Analysis for usage examples and workflows.