NewsDataHub Learning Center

How to Create Chord Diagrams in Python to Visualize Topic Co-occurrence in News Data

Quick Answer: This tutorial teaches you how to create chord diagrams in Python using Matplotlib to visualize topic co-occurrence patterns in real news data from the NewsDataHub API. You’ll learn to build co-occurrence matrices and create professional relationship visualizations.

Perfect for: Python developers, data scientists, journalists, and anyone building network visualizations or analyzing relationships in categorical data.

Time to complete: 20-25 minutes

Difficulty: Intermediate

Stack: Python, Matplotlib, mpl_chord_diagram, NewsDataHub API


You’ll create a chord diagram that visualizes how news topics co-occur across articles:

  • Co-occurrence matrix — Calculate how often topic pairs appear together in articles
  • Chord diagram — Circular visualization showing topic relationships
  • Data filtering — Focus on top N topics for clarity
  • Professional styling — High-resolution PNG output ready for reports and presentations

By the end, you’ll understand when to use chord diagrams for relationship visualization and have working code for analyzing news topic networks.

Topic Co-occurrence Chord Diagram


  • Python 3.7+
  • pip package manager
pip install requests matplotlib mpl_chord_diagram

For current API quotas and rate limits, visit newsdatahub.com/plans.

  • Basic Python syntax (loops, dictionaries, list comprehensions)
  • Understanding of categorical data
  • Familiarity with basic data visualization concepts

Understanding Chord Diagrams: Foundational Concepts


A chord diagram is a circular visualization that shows relationships and flows between entities. Unlike bar charts that compare individual values, chord diagrams reveal connections and co-occurrences between multiple categories.

Key elements:

  • Arcs (outer ring) — Represent categories (in our case, news topics)
  • Ribbons (inner connections) — Show relationships between categories
  • Ribbon thickness — Indicates the strength of the relationship

Chord diagrams are ideal when you want to:

  • Visualize many-to-many relationships between categories
  • Show the strength of connections between items
  • Identify clusters and patterns in network data
  • Display flow or co-occurrence data in a compact, intuitive way

Good use cases: Topic co-occurrence, migration flows, trade relationships, user journey flows, collaboration networks

Poor use cases: Simple comparisons (use bar charts), hierarchical data (use tree maps), continuous time series (use line charts), single-variable distributions (use histograms)

Co-occurrence measures how often two items appear together. In news analysis:

  • If an article is tagged with both “technology” and “business,” that’s one co-occurrence
  • Count all such pairs across your dataset to build a co-occurrence matrix
  • The matrix shows relationship strength: higher numbers mean topics frequently appear together

For example, if “climate” and “policy” appear together in 45 articles, while “climate” and “technology” appear together in 12 articles, the chord diagram would show a thicker ribbon between “climate” and “policy.”
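As a rough sketch of that counting step, the snippet below tallies topic pairs across a few made-up articles. The data and the names `toy_articles` and `pair_counts` are illustrative only and do not come from the NewsDataHub API.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each inner list holds the topic tags of one article.
toy_articles = [
    ["climate", "policy"],
    ["climate", "policy", "technology"],
    ["climate", "technology"],
    ["business"],  # a single topic contributes no pairs
]

pair_counts = Counter()
for topics in toy_articles:
    # sorted(set(...)) drops repeated tags and gives every pair one canonical order
    for a, b in combinations(sorted(set(topics)), 2):
        pair_counts[(a, b)] += 1

print(pair_counts[("climate", "policy")])      # 2
print(pair_counts[("climate", "technology")])  # 2
print(pair_counts[("policy", "technology")])   # 1
```

Storing each pair once in sorted order keeps the tally compact; the tutorial code later expands the same idea into a symmetric matrix.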

When reading a chord diagram:

  • Thicker ribbons = stronger connections — Topics that frequently co-occur have thick ribbons
  • Arc size = total involvement — Larger arcs indicate topics with many connections
  • Ribbon color — Usually gradient between connected topics, helping trace relationships
  • Symmetry — Ribbons connect both directions, so the diagram shows the full relationship network
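To make "total involvement" concrete, here is a small sketch that sums each row of a hypothetical symmetric matrix. It assumes arc length tracks the row total, which is how chord layouts are commonly drawn; gaps between arcs are ignored, and the numbers are invented for illustration.

```python
# Hypothetical symmetric co-occurrence counts for three topics.
toy_names = ["climate", "policy", "technology"]
toy_matrix = [
    [0, 45, 12],
    [45, 0, 5],
    [12, 5, 0],
]

# Total involvement of a topic = sum of its row (all co-occurrences it takes part in).
involvement = {name: sum(row) for name, row in zip(toy_names, toy_matrix)}
grand_total = sum(involvement.values())

for name, total in involvement.items():
    share = total / grand_total  # approximate fraction of the circle, ignoring gaps
    print(f"{name}: {total} co-occurrences, about {share:.0%} of the circle")
```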

Best Practices for Relationship Visualizations

  1. Limit categories — Too many categories (>15) create cluttered diagrams; focus on top N
  2. Use meaningful colors — Distinct colors for each category aid comprehension
  3. Add transparency — Semi-transparent ribbons prevent visual overload when many overlap
  4. Include clear labels — Ensure category names are readable around the circle
  5. Filter weak connections — Remove very small relationships to reduce noise
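One possible way to apply the last practice is to zero out small counts before plotting. The helper name `drop_weak_links` and the threshold value are assumptions made for this sketch, not part of any library.

```python
def drop_weak_links(matrix, min_count):
    """Return a copy of the matrix with entries below min_count set to zero."""
    return [[value if value >= min_count else 0 for value in row] for row in matrix]


# Hypothetical counts: the 2-article link is probably noise at this scale.
toy_matrix = [
    [0, 45, 2],
    [45, 0, 12],
    [2, 12, 0],
]
filtered = drop_weak_links(toy_matrix, min_count=5)
print(filtered)  # [[0, 45, 0], [45, 0, 12], [0, 12, 0]]
```

Because the same threshold is applied to every cell, a symmetric input stays symmetric after filtering.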

We’ll retrieve news articles to analyze. You have two options:

With an API key: The script fetches one page of live data (up to 100 articles) from NewsDataHub. To retrieve more, use cursor-based pagination to request additional pages.

Without an API key: The script downloads a sample dataset from GitHub, so you can follow along without signing up.

import requests
import matplotlib.pyplot as plt
from mpl_chord_diagram import chord_diagram
from collections import defaultdict, Counter
import json
import os

# Set your API key here (or leave empty to use sample data)
API_KEY = ""  # Replace with your NewsDataHub API key, or leave empty

# Check if API key is provided
if API_KEY and API_KEY != "your_api_key":
    print("Using live API data...")
    url = "https://api.newsdatahub.com/v1/news"
    headers = {
        "x-api-key": API_KEY,
        "User-Agent": "chord-diagram-topic-cooccurrence/1.0-py"
    }
    params = {"per_page": 100}

    # Fetch data
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()
    articles = response.json().get("data", [])
    print(f"Fetched {len(articles)} articles from API")
else:
    print("No API key provided. Loading sample data...")

    # Download sample data if not already present
    sample_file = "sample-news-data.json"
    if not os.path.exists(sample_file):
        print("Downloading sample data...")
        sample_url = "https://raw.githubusercontent.com/newsdatahub/newsdatahub-data-science-tutorials/main/tutorials/bar-charts-news-data/data/sample-news-data.json"
        response = requests.get(sample_url)
        response.raise_for_status()  # Fail early if the download did not succeed
        with open(sample_file, "w") as f:
            json.dump(response.json(), f)
        print(f"Sample data saved to {sample_file}")

    # Load sample data
    with open(sample_file, "r") as f:
        data = json.load(f)

    # Handle both formats: raw array or API response with 'data' key
    if isinstance(data, dict) and "data" in data:
        articles = data["data"]
    elif isinstance(data, list):
        articles = data
    else:
        raise ValueError("Unexpected sample data format")
    print(f"Loaded {len(articles)} articles from sample data")

Expected output (API mode):

Using live API data...
Fetched 100 articles from API

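Expected output (sample data mode, first run):

No API key provided. Loading sample data...
Downloading sample data...
Sample data saved to sample-news-data.json
Loaded 100 articles from sample data

On later runs the two download lines are skipped because the file already exists.
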
  • x-api-key header — Authenticates your request (replace with your actual key)
  • per_page parameter — Controls how many articles to fetch (max 100)
  • raise_for_status() — Throws an error for 4XX/5XX HTTP responses
  • .get("data", []) — Safely extracts article array from JSON response
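The two payload shapes handled above (a raw list, or a dict with a "data" key) can be folded into one small helper. This is only a sketch: the function name `extract_articles` is hypothetical, and the toy payloads stand in for real responses.

```python
def extract_articles(payload):
    """Return the article list from either a raw list or a {'data': [...]} dict."""
    if isinstance(payload, dict) and "data" in payload:
        return payload["data"]
    if isinstance(payload, list):
        return payload
    raise ValueError("Unexpected data format: expected a list or a dict with a 'data' key")


# Hypothetical payloads for illustration.
print(len(extract_articles({"data": [{"title": "a"}, {"title": "b"}]})))  # 2
print(len(extract_articles([{"title": "a"}])))                            # 1
```

Keeping the format check in one place means the API path and the sample-data path can share the same downstream code.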

Step 2: Extract Topics and Build Co-occurrence Matrix


Now we’ll extract topics from articles and calculate how often each pair of topics appears together.

# Extract topics from articles
article_topics = []
for article in articles:
    topics = article.get("topics", [])
    if topics and len(topics) >= 2:  # Need at least 2 topics for co-occurrence
        article_topics.append(topics)

print(f"Found {len(article_topics)} articles with 2+ topics")

What this does:

  • NewsDataHub returns topics as an array for each article
  • We only keep articles with 2 or more topics (required for co-occurrence)
  • Articles with a single topic or no topics are filtered out
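The filter can be checked on a handful of made-up records. In this sketch the helper name `multi_topic_lists` and the toy articles are assumptions; only the "topics" field name comes from the tutorial.

```python
def multi_topic_lists(articles):
    """Keep the topic list of every article that has at least two topics."""
    kept = []
    for article in articles:
        topics = article.get("topics") or []  # treats a missing key and None the same
        if len(topics) >= 2:
            kept.append(topics)
    return kept


# Hypothetical articles covering the edge cases described above.
toy_articles = [
    {"topics": ["technology", "business"]},  # kept
    {"topics": ["sports"]},                  # dropped: single topic
    {"topics": None},                        # dropped: no topics
    {},                                      # dropped: field missing
]
print(multi_topic_lists(toy_articles))  # [['technology', 'business']]
```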

# Count individual topic frequencies to get top topics
topic_counter = Counter()
for topics in article_topics:
    topic_counter.update(topics)

# Select top N topics for visualization clarity
TOP_N = 8
top_topics = [topic for topic, count in topic_counter.most_common(TOP_N)]
print(f"Top {TOP_N} topics: {top_topics}")

Why limit to top 8 topics?

  • Chord diagrams become cluttered with too many categories
  • 6-12 categories is the sweet spot for readability
  • You can adjust TOP_N based on your needs (try 6, 10, or 12)
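A quick sketch of how the cutoff behaves on toy data is shown below. The topic lists are invented; `Counter.most_common` is the standard-library call the tutorial already uses, and in Python 3.7+ it orders equal counts by first appearance, which decides ties at the cutoff.

```python
from collections import Counter

# Hypothetical topic lists, one per article.
toy_topic_lists = [
    ["politics", "economy"],
    ["politics", "technology"],
    ["economy", "politics", "health"],
]

counter = Counter()
for topics in toy_topic_lists:
    counter.update(topics)

top_two = [topic for topic, _ in counter.most_common(2)]
print(top_two)  # ['politics', 'economy']

# "technology" and "health" both appear once; with a cutoff of 3 the one seen first wins.
top_three = [topic for topic, _ in counter.most_common(3)]
print(top_three)  # ['politics', 'economy', 'technology']
```

If tie order matters for your analysis, sort the tied topics explicitly before slicing.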

# Build co-occurrence matrix
cooccurrence = defaultdict(lambda: defaultdict(int))
for topics in article_topics:
    # Only count co-occurrences within our top topics
    filtered_topics = [t for t in topics if t in top_topics]

    # Count each pair
    for i, topic1 in enumerate(filtered_topics):
        for topic2 in filtered_topics[i+1:]:
            # Make matrix symmetric (topic A-B same as B-A)
            cooccurrence[topic1][topic2] += 1
            cooccurrence[topic2][topic1] += 1

print(f"Built co-occurrence matrix for {len(top_topics)} topics")

Understanding the nested loop:

  • For each article, we look at all pairs of topics
  • filtered_topics[i+1:] prevents counting the same pair twice
  • We increment both [topic1][topic2] and [topic2][topic1] to make the matrix symmetric
  • Example: If “technology” and “business” co-occur, both matrix entries are incremented
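The same pairing can be written with `itertools.combinations`. The variant below is a sketch rather than the tutorial's own method: it also removes repeated tags within one article, on the assumption that a topic list could contain duplicates. The function name `build_cooccurrence` and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(topic_lists, top_topics):
    """Count symmetric co-occurrences among top_topics, ignoring repeated tags."""
    allowed = set(top_topics)
    counts = defaultdict(lambda: defaultdict(int))
    for topics in topic_lists:
        # dict.fromkeys removes repeated tags while preserving their order
        unique = [t for t in dict.fromkeys(topics) if t in allowed]
        for a, b in combinations(unique, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


toy_topic_lists = [
    ["technology", "business"],
    ["technology", "business", "technology"],  # repeated tag counted once
    ["business", "sports"],                    # "sports" is outside the top topics
]
counts = build_cooccurrence(toy_topic_lists, ["technology", "business"])
print(counts["technology"]["business"], counts["business"]["technology"])  # 2 2
```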

Convert the co-occurrence data into a matrix format for the chord diagram.

# Build a matrix for the chord diagram
matrix = []
for source in top_topics:
    row = []
    for target in top_topics:
        row.append(cooccurrence[source][target])
    matrix.append(row)

print(f"Created {len(top_topics)}x{len(top_topics)} co-occurrence matrix")

Why a matrix format?

  • mpl_chord_diagram expects a 2D matrix (list of lists)
  • Each row represents a source topic
  • Each column represents a target topic
  • Values indicate connection strength between topics
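Before plotting, it can help to confirm that the matrix really has this shape. The check below is a minimal sketch; the function name `check_square_symmetric` is an assumption, and the symmetry requirement reflects the undirected co-occurrence data built in this tutorial rather than a rule of the plotting library.

```python
def check_square_symmetric(matrix, names):
    """Raise ValueError unless matrix is n x n (n = len(names)) and symmetric."""
    n = len(names)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix must have one row and one column per name")
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] != matrix[j][i]:
                raise ValueError(f"asymmetric entry for {names[i]!r} / {names[j]!r}")
    return True


# Hypothetical 3x3 example.
toy_names = ["technology", "business", "politics"]
toy_matrix = [
    [0, 7, 3],
    [7, 0, 5],
    [3, 5, 0],
]
print(check_square_symmetric(toy_matrix, toy_names))  # True
```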

Step 4: Create and Style the Chord Diagram


Now we’ll create the chord diagram with professional styling using matplotlib.

# Create figure
fig, ax = plt.subplots(figsize=(10, 10))

# Create chord diagram
chord_diagram(matrix, names=top_topics, ax=ax)

# Style the visualization
ax.set_title('Topic Co-occurrence in Global News',
             fontsize=16, fontweight='bold', pad=20)

# Save the visualization
plt.tight_layout()
plt.savefig('topic_cooccurrence_chord.png', dpi=300, bbox_inches='tight')
print("Chord diagram saved to 'topic_cooccurrence_chord.png'")

Figure Setup:

  • figsize=(10, 10) — Square aspect ratio works best for chord diagrams
  • Creates matplotlib figure and axis objects for full control

Chord Diagram:

  • matrix — The co-occurrence data as a 2D array
  • names=top_topics — Labels for each topic around the circle
  • ax=ax — Specifies which matplotlib axis to use

Styling:

  • set_title() — Adds title with custom font size and weight
  • pad=20 — Adds spacing between title and diagram

Output:

  • Saves as high-resolution PNG (300 DPI)
  • bbox_inches='tight' — Removes excess whitespace
  • Perfect for reports, presentations, and publications

With 100 calls per day on the free tier, you can:

  • Run this analysis daily to track how topic relationships evolve
  • Experiment with different time periods using date filters
  • Build a time-series dataset by saving results daily

Avoid repeated calls during development by caching data:

import json

# Save response after fetching
with open('news_data.json', 'w') as f:
    json.dump(articles, f)

# Load cached data in subsequent runs
with open('news_data.json', 'r') as f:
    articles = json.load(f)
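The save and load steps can be combined so the API is only called when no cache file exists. This is a sketch under assumptions: `load_or_fetch` is a hypothetical helper, and `fetch` stands for any zero-argument function that returns the article list (for example, a wrapper around the request code from Step 1).

```python
import json
import os


def load_or_fetch(cache_path, fetch):
    """Return cached articles if cache_path exists; otherwise call fetch() and cache the result."""
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f)
    articles = fetch()
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(articles, f)
    return articles
```

Delete the cache file whenever you want fresh data; the next call will fetch and rewrite it.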

Benefits:

  • Iterate faster — Tweak visualizations without making new API calls
  • Preserve quota — Save API calls for fresh data collection
  • Reproducibility — Analyze the same dataset across sessions
  • Daily analysis — Fetch 100 articles/day for tracking relationship evolution
  • Weekly deep dives — Accumulate 700 articles over a week for robust patterns
  • Upgrade when needed — Visit newsdatahub.com/plans for higher limits

While mpl_chord_diagram is simple and effective, you might explore:

pyCirclize:

from pycirclize import Circos
# More advanced circular visualizations
# Highly customizable
# Supports multiple tracks and layers

NetworkX + Matplotlib:

import networkx as nx
import matplotlib.pyplot as plt
# Full control over network visualizations
# Can create custom circular layouts
# Good for complex network analysis

  • What types of data work best with chord diagrams?

Chord diagrams excel with categorical many-to-many relationships: topic co-occurrence, user journey flows, trade between countries, migration patterns, collaboration networks. They’re not suitable for hierarchical data, simple comparisons, or time series.

  • How many categories can I include before it becomes cluttered?

The sweet spot is 6-12 categories. Below 5 doesn’t show enough complexity. Above 15 becomes difficult to read. Adjust based on your specific data and audience.

  • Can I make chord diagrams with other tools?

Yes. Popular alternatives include HoloViews, Plotly, D3.js (JavaScript), Circos, pyCirclize, and custom implementations with NetworkX + Matplotlib. mpl_chord_diagram is chosen here for its simplicity and ease of use with Matplotlib.

  • Why use mpl_chord_diagram instead of pure Matplotlib?

mpl_chord_diagram provides built-in chord diagram support with minimal code. Pure Matplotlib would require significant custom code to create circular layouts, arcs, and ribbons. mpl_chord_diagram handles the complex geometry while giving you full Matplotlib styling control.

  • Can I analyze more than 100 articles?

Yes. Use cursor-based pagination to fetch multiple pages:

import requests

url = "https://api.newsdatahub.com/v1/news"
articles = []
cursor = None
headers = {
    "x-api-key": "YOUR_API_KEY",
    "User-Agent": "chord-diagram-topic-cooccurrence/1.0-py"
}
for _ in range(3):  # Fetch up to 3 pages (300 articles)
    params = {"per_page": 100}
    if cursor:
        params["cursor"] = cursor
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()  # Stop on 4XX/5XX instead of parsing an error body
    data = response.json()
    articles.extend(data.get("data", []))
    cursor = data.get("next_cursor")
    if not cursor:  # No more pages
        break

See How Does Cursor Pagination Work in NewsDataHub API for details.
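The cursor loop can also be separated from the HTTP call, which makes it easy to try out without spending quota. In this sketch `collect_pages` and `fetch_page` are hypothetical names; `fetch_page` is assumed to take a cursor (or None for the first page) and return a dict with the "data" and "next_cursor" keys used above. The fake pages are invented for the demonstration.

```python
def collect_pages(fetch_page, max_pages=3):
    """Follow next_cursor links for up to max_pages pages and return all articles."""
    articles = []
    cursor = None
    for _ in range(max_pages):
        payload = fetch_page(cursor)
        articles.extend(payload.get("data", []))
        cursor = payload.get("next_cursor")
        if not cursor:  # last page reached
            break
    return articles


# Fake two-page response keyed by cursor, standing in for real API calls.
fake_pages = {
    None: {"data": ["article-1", "article-2"], "next_cursor": "c1"},
    "c1": {"data": ["article-3"], "next_cursor": None},
}
result = collect_pages(lambda cursor: fake_pages[cursor])
print(result)  # ['article-1', 'article-2', 'article-3']
```

In real use, `fetch_page` would wrap the `requests.get` call and pass the cursor through the `cursor` parameter.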

  • What if my API key doesn’t work?

Verify:

  1. Key is correct (check your dashboard)
  2. Header name is x-api-key (lowercase, with hyphens)
  3. You haven’t exceeded rate limits
  4. Network/firewall isn’t blocking API requests
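The HTTP status code often narrows down which of these applies. The mapping below is a sketch based on general HTTP conventions (401/403 for authentication problems, 429 for rate limiting); whether NewsDataHub returns exactly these codes is an assumption, and `diagnose_status` is a hypothetical helper. Network or firewall problems usually raise a connection error before any status code is available.

```python
def diagnose_status(status_code):
    """Map an HTTP status code to a troubleshooting hint (general HTTP conventions)."""
    if 200 <= status_code < 300:
        return "Request succeeded."
    if status_code in (401, 403):
        return "Check the API key value and the x-api-key header name."
    if status_code == 429:
        return "Rate limit likely exceeded; wait before retrying."
    if 500 <= status_code < 600:
        return "Server-side error; retry later."
    return "Unexpected status; inspect the response body."


print(diagnose_status(401))
print(diagnose_status(429))
```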
  • How do I filter for specific topics?

Add a topic filter to the API request:

params = {
    "per_page": 100,
    "topic": "technology,business"  # Only fetch these topics
}

  • How fresh is the data?

NewsDataHub updates continuously. The free tier provides recent articles. For specific time ranges, use from_date and to_date parameters. Visit newsdatahub.com/plans for tier-specific freshness guarantees.


Olga S.

Founder of NewsDataHub — Distributed Systems & Data Engineering
