How to Create Chord Diagrams in Python to Visualize Topic Co-occurrence in News Data

Intermediate | 22 min read | December 1, 2025

Tags: chord diagram python, holoviews tutorial, data visualization, topic cooccurrence, network visualization, relationship graphs, bokeh python, news data analysis, circular visualization, python intermediate

Quick Answer: This tutorial teaches you how to create chord diagrams in Python using Matplotlib and the mpl_chord_diagram package to visualize topic co-occurrence patterns in real news data from the NewsDataHub API. You’ll learn to build co-occurrence matrices and create professional relationship visualizations.

Perfect for: Python developers, data scientists, journalists, and anyone building network visualizations or analyzing relationships in categorical data.

Time to complete: 20-25 minutes

Difficulty: Intermediate

Stack: Python, Matplotlib, mpl_chord_diagram, NewsDataHub API

What You’ll Build

You’ll create a chord diagram that visualizes how news topics co-occur across articles:

Co-occurrence matrix — Calculate how often topic pairs appear together in articles
Chord diagram — Circular visualization showing topic relationships
Data filtering — Focus on top N topics for clarity
Professional styling — High-resolution PNG output ready for reports and presentations

By the end, you’ll understand when to use chord diagrams for relationship visualization and have working code for analyzing news topic networks.

[Figure: Topic Co-occurrence Chord Diagram]

Prerequisites

Required Tools

Python 3.7+
pip package manager

Install Required Packages

pip install requests matplotlib mpl_chord_diagram

API Key

NewsDataHub API key (a free tier is available; the key is optional, because the tutorial also runs on a sample dataset)

For current API quotas and rate limits, visit newsdatahub.com/plans.

Knowledge Prerequisites

Basic Python syntax (loops, dictionaries, list comprehensions)
Understanding of categorical data
Familiarity with basic data visualization concepts

Understanding Chord Diagrams: Foundational Concepts

What Are Chord Diagrams?

A chord diagram is a circular visualization that shows relationships and flows between entities. Unlike bar charts that compare individual values, chord diagrams reveal connections and co-occurrences between multiple categories.

Key elements:

Arcs (outer ring) — Represent categories (in our case, news topics)
Ribbons (inner connections) — Show relationships between categories
Ribbon thickness — Indicates the strength of the relationship

When to Use Chord Diagrams

Chord diagrams are ideal when you want to:

Visualize many-to-many relationships between categories
Show the strength of connections between items
Identify clusters and patterns in network data
Display flow or co-occurrence data in a compact, intuitive way

Good use cases: Topic co-occurrence, migration flows, trade relationships, user journey flows, collaboration networks

Poor use cases: Simple comparisons (use bar charts), hierarchical data (use tree maps), continuous time series (use line charts), single-variable distributions (use histograms)

Understanding Co-occurrence

Co-occurrence measures how often two items appear together. In news analysis:

If an article is tagged with both “technology” and “business,” that’s one co-occurrence
Count all such pairs across your dataset to build a co-occurrence matrix
The matrix shows relationship strength: higher numbers mean topics frequently appear together

For example, if “climate” and “policy” appear together in 45 articles, while “climate” and “technology” appear together in 12 articles, the chord diagram would show a thicker ribbon between “climate” and “policy.”
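
The counting described above can be sketched on a toy dataset. The articles and tags below are made up for illustration, and `itertools.combinations` is used here as one convenient way to enumerate unordered pairs; the tutorial's own implementation in Step 2 uses a nested loop instead.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each inner list holds the topic tags of one article
toy_articles = [
    ["climate", "policy"],
    ["climate", "policy", "technology"],
    ["climate", "technology"],
    ["business"],  # single-topic article: contributes no pairs
]

pair_counts = Counter()
for topics in toy_articles:
    # sorted(set(...)) drops duplicate tags and gives every pair a stable order
    for pair in combinations(sorted(set(topics)), 2):
        pair_counts[pair] += 1

for (a, b), n in pair_counts.most_common():
    print(f"{a} + {b}: {n}")
```

Each key in `pair_counts` is an unordered topic pair, so a count maps directly to the thickness of one ribbon.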

How to Interpret Chord Diagrams

When reading a chord diagram:

Thicker ribbons = stronger connections — Topics that frequently co-occur have thick ribbons
Arc size = total involvement — Larger arcs indicate topics with many connections
Ribbon color — Usually gradient between connected topics, helping trace relationships
Symmetry — Ribbons connect both directions, so the diagram shows the full relationship network

Best Practices for Relationship Visualizations

Limit categories — Too many categories (>15) create cluttered diagrams; focus on top N
Use meaningful colors — Distinct colors for each category aid comprehension
Add transparency — Semi-transparent ribbons prevent visual overload when many overlap
Include clear labels — Ensure category names are readable around the circle
Filter weak connections — Remove very small relationships to reduce noise
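
As a rough illustration of the last point, weak connections can be zeroed out before plotting. This is a minimal sketch on a made-up matrix; the helper name `filter_weak_links` and the threshold value are assumptions, not part of the tutorial's script.

```python
def filter_weak_links(matrix, min_count):
    """Return a copy of a square matrix with entries below min_count set to 0."""
    return [
        [value if value >= min_count else 0 for value in row]
        for row in matrix
    ]

# Hypothetical 3x3 symmetric co-occurrence matrix
toy_matrix = [
    [0, 12, 1],
    [12, 0, 5],
    [1, 5, 0],
]

filtered = filter_weak_links(toy_matrix, min_count=3)
print(filtered)
```

Because the same threshold is applied to every entry, a symmetric input stays symmetric after filtering.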

Step 1: Fetch News Data

We’ll retrieve news articles to analyze. You have two options:

With an API key: The script fetches live data from NewsDataHub in a single request of up to 100 articles. To retrieve more, use cursor-based pagination as shown in the FAQ.

Without an API key: The script downloads a sample dataset from GitHub, so you can follow along without signing up.

Fetching Articles

import requests
import matplotlib.pyplot as plt
from mpl_chord_diagram import chord_diagram
from collections import defaultdict, Counter
import json
import os

# Set your API key here (or leave empty to use sample data)
API_KEY = ""  # Replace with your NewsDataHub API key, or leave empty

# Check if API key is provided
if API_KEY and API_KEY != "your_api_key":
    print("Using live API data...")

    url = "https://api.newsdatahub.com/v1/news"
    headers = {
        "x-api-key": API_KEY,
        "User-Agent": "chord-diagram-topic-cooccurrence/1.0-py"
    }
    params = {"per_page": 100}

    # Fetch data
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()
    articles = response.json().get("data", [])

    print(f"Fetched {len(articles)} articles from API")

else:
    print("No API key provided. Loading sample data...")

    # Download sample data if not already present
    sample_file = "sample-news-data.json"

    if not os.path.exists(sample_file):
        print("Downloading sample data...")
        sample_url = "https://raw.githubusercontent.com/newsdatahub/newsdatahub-data-science-tutorials/main/tutorials/bar-charts-news-data/data/sample-news-data.json"
        response = requests.get(sample_url)
        response.raise_for_status()  # Fail early rather than caching an error response
        with open(sample_file, "w") as f:
            json.dump(response.json(), f)
        print(f"Sample data saved to {sample_file}")

    # Load sample data
    with open(sample_file, "r") as f:
        data = json.load(f)

    # Handle both formats: raw array or API response with 'data' key
    if isinstance(data, dict) and "data" in data:
        articles = data["data"]
    elif isinstance(data, list):
        articles = data
    else:
        raise ValueError("Unexpected sample data format")

    print(f"Loaded {len(articles)} articles from sample data")

Expected output (API mode):

Using live API data...
Fetched 100 articles from API

Expected output (sample data mode, first run; later runs skip the two download lines):

No API key provided. Loading sample data...
Downloading sample data...
Sample data saved to sample-news-data.json
Loaded 100 articles from sample data

Understanding the Code

x-api-key header — Authenticates your request (replace with your actual key)
per_page parameter — Controls how many articles to fetch (max 100)
raise_for_status() — Throws an error for 4XX/5XX HTTP responses
.get("data", []) — Safely extracts article array from JSON response
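
If you prefer not to paste the key into the script, one option is to read it from an environment variable. This is a sketch only: the variable name `NEWSDATAHUB_API_KEY` and the helper `read_api_key` are hypothetical choices for this example, not something the API or the tutorial script requires.

```python
import os

def read_api_key(env=None, var_name="NEWSDATAHUB_API_KEY"):
    """Return the key from the environment, or "" if it is unset or blank."""
    env = os.environ if env is None else env
    return env.get(var_name, "").strip()

API_KEY = read_api_key()
mode = "live API" if API_KEY else "sample data"
print(f"Mode: {mode}")
```

An empty result keeps the script's existing behavior of falling back to the sample dataset.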

Step 2: Extract Topics and Build Co-occurrence Matrix

Now we’ll extract topics from articles and calculate how often each pair of topics appears together.

Extract Topics from Articles

# Extract topics from articles
article_topics = []
for article in articles:
    topics = article.get("topics", [])
    if topics and len(topics) >= 2:  # Need at least 2 topics for co-occurrence
        article_topics.append(topics)

print(f"Found {len(article_topics)} articles with 2+ topics")

What this does:

NewsDataHub returns topics as an array for each article
We only keep articles with 2 or more topics (required for co-occurrence)
Articles with a single topic or no topics are filtered out
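
The tutorial does not say whether topic lists can contain duplicates or mixed casing. If they can, a repeated tag would be paired with itself in the next step. The following defensive sketch (the helper name `clean_topics` is an assumption) normalizes a list before it is stored:

```python
def clean_topics(raw_topics):
    """Return lowercased, de-duplicated topic names in first-seen order."""
    seen = set()
    cleaned = []
    for topic in raw_topics or []:  # tolerate a missing (None) topics field
        if not isinstance(topic, str):
            continue  # skip non-string entries such as None
        name = topic.strip().lower()
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    return cleaned

print(clean_topics(["Technology", "business", "technology ", None]))
```

It could be applied as `topics = clean_topics(article.get("topics"))` before the length check.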

Count Topic Frequencies

# Count individual topic frequencies to get top topics
topic_counter = Counter()
for topics in article_topics:
    topic_counter.update(topics)

# Select top N topics for visualization clarity
TOP_N = 8
top_topics = [topic for topic, count in topic_counter.most_common(TOP_N)]

print(f"Top {TOP_N} topics: {top_topics}")

Why limit to top 8 topics?

Chord diagrams become cluttered with too many categories
6-12 categories is the sweet spot for readability
You can adjust TOP_N based on your needs (try 6, 10, or 12)
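
One way to choose TOP_N is to check how much of the data the top topics cover. The sketch below uses made-up counts and a hypothetical helper, `coverage`; in the tutorial the counts would come from `topic_counter`.

```python
from collections import Counter

# Hypothetical topic counts for illustration
toy_counter = Counter({
    "politics": 40, "business": 30, "technology": 15,
    "sports": 10, "health": 5,
})

def coverage(counter, n):
    """Fraction of all topic mentions covered by the n most common topics."""
    total = sum(counter.values())
    top = sum(count for _, count in counter.most_common(n))
    return top / total if total else 0.0

for n in (2, 3, 4):
    print(f"top {n}: {coverage(toy_counter, n):.0%} of mentions")
```

If coverage barely improves when N grows, the extra topics mostly add clutter.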

Build Co-occurrence Matrix

# Build co-occurrence matrix
cooccurrence = defaultdict(lambda: defaultdict(int))

for topics in article_topics:
    # Only count co-occurrences within our top topics
    filtered_topics = [t for t in topics if t in top_topics]

    # Count each pair
    for i, topic1 in enumerate(filtered_topics):
        for topic2 in filtered_topics[i+1:]:
            # Make matrix symmetric (topic A-B same as B-A)
            cooccurrence[topic1][topic2] += 1
            cooccurrence[topic2][topic1] += 1

print(f"Built co-occurrence matrix for {len(top_topics)} topics")

Understanding the nested loop:

For each article, we look at all pairs of topics
filtered_topics[i+1:] prevents counting the same pair twice
We increment both [topic1][topic2] and [topic2][topic1] to make the matrix symmetric
Example: If “technology” and “business” co-occur, both matrix entries are incremented

Step 3: Create Chord Diagram Matrix

Convert the co-occurrence data into a matrix format for the chord diagram.

# Build a matrix for the chord diagram
matrix = []
for source in top_topics:
    row = []
    for target in top_topics:
        row.append(cooccurrence[source][target])
    matrix.append(row)

print(f"Created {len(top_topics)}x{len(top_topics)} co-occurrence matrix")

Why a matrix format?

mpl_chord_diagram expects a 2D matrix (list of lists)
Each row represents a source topic
Each column represents a target topic
Values indicate connection strength between topics
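
Before plotting, it can help to sanity-check the matrix. The sketch below assumes that no article lists the same topic twice, in which case the matrix built above should be square, symmetric, and zero on the diagonal. The function name `check_cooccurrence_matrix` is hypothetical.

```python
def check_cooccurrence_matrix(matrix):
    """Raise ValueError unless matrix is square, symmetric, with a zero diagonal."""
    n = len(matrix)
    for i, row in enumerate(matrix):
        if len(row) != n:
            raise ValueError(f"row {i} has {len(row)} entries, expected {n}")
    for i in range(n):
        if matrix[i][i] != 0:
            raise ValueError(f"diagonal entry [{i}][{i}] is {matrix[i][i]}, expected 0")
        for j in range(i + 1, n):
            if matrix[i][j] != matrix[j][i]:
                raise ValueError(f"entries [{i}][{j}] and [{j}][{i}] differ")

check_cooccurrence_matrix([[0, 3], [3, 0]])
print("matrix looks consistent")
```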

Step 4: Create and Style the Chord Diagram

Now we’ll create the chord diagram with professional styling using matplotlib.

# Create figure
fig, ax = plt.subplots(figsize=(10, 10))

# Create chord diagram
chord_diagram(matrix, names=top_topics, ax=ax)

# Style the visualization
ax.set_title('Topic Co-occurrence in Global News',
             fontsize=16, fontweight='bold', pad=20)

# Save the visualization
plt.tight_layout()
plt.savefig('topic_cooccurrence_chord.png', dpi=300, bbox_inches='tight')

print("Chord diagram saved to 'topic_cooccurrence_chord.png'")

Understanding the Styling Options

Figure Setup:

figsize=(10, 10) — Square aspect ratio works best for chord diagrams
Creates matplotlib figure and axis objects for full control

Chord Diagram:

matrix — The co-occurrence data as a 2D array
names=top_topics — Labels for each topic around the circle
ax=ax — Specifies which matplotlib axis to use

Styling:

set_title() — Adds title with custom font size and weight
pad=20 — Adds spacing between title and diagram

Output:

Saves as high-resolution PNG (300 DPI)
bbox_inches='tight' — Removes excess whitespace
Perfect for reports, presentations, and publications

Working Within API Limits

With 100 calls per day on the free tier, you can:

Run this analysis daily to track how topic relationships evolve
Experiment with different time periods using date filters
Build a time-series dataset by saving results daily

Cache Your API Responses

Avoid repeated calls during development by caching data:

import json

# Save response after fetching
with open('news_data.json', 'w') as f:
    json.dump(articles, f)

# Load cached data in subsequent runs
with open('news_data.json', 'r') as f:
    articles = json.load(f)

Benefits:

Iterate faster — Tweak visualizations without making new API calls
Preserve quota — Save API calls for fresh data collection
Reproducibility — Analyze the same dataset across sessions
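
The save and load steps can be combined into a single helper that only fetches when no cache file exists. This is a sketch: `load_or_fetch` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for whatever function performs the real API request.

```python
import json
import os
import tempfile

def load_or_fetch(cache_path, fetch):
    """Return cached articles if cache_path exists; otherwise call fetch() and cache the result."""
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f)
    articles = fetch()
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(articles, f)
    return articles

def fake_fetch():
    # Stand-in for a real API call
    return [{"title": "example", "topics": ["technology", "business"]}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "news_data.json")
    first = load_or_fetch(path, fake_fetch)   # calls fake_fetch and writes the cache
    second = load_or_fetch(path, fake_fetch)  # reads the cache file instead
    print(first == second)
```

Deleting the cache file forces a fresh fetch on the next run.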

Plan Your Data Collection

Daily analysis — Fetch 100 articles/day for tracking relationship evolution
Weekly deep dives — Accumulate 700 articles over a week for robust patterns
Upgrade when needed — Visit newsdatahub.com/plans for higher limits

Alternative Visualization Libraries

While mpl_chord_diagram is simple and effective, you might explore:

pyCirclize

from pycirclize import Circos

# More advanced circular visualizations
# Highly customizable
# Supports multiple tracks and layers

NetworkX with Circular Layout

import networkx as nx
import matplotlib.pyplot as plt

# Full control over network visualizations
# Can create custom circular layouts
# Good for complex network analysis

FAQ

What types of data work best with chord diagrams?

Chord diagrams excel with categorical many-to-many relationships: topic co-occurrence, user journey flows, trade between countries, migration patterns, collaboration networks. They’re not suitable for hierarchical data, simple comparisons, or time series.

How many categories can I include before it becomes cluttered?

The sweet spot is 6-12 categories. Below 5 doesn’t show enough complexity. Above 15 becomes difficult to read. Adjust based on your specific data and audience.

Can I make chord diagrams with other tools?

Yes. Popular alternatives include HoloViews, Plotly, D3.js (JavaScript), Circos, pyCirclize, and custom implementations with NetworkX + Matplotlib. mpl_chord_diagram is chosen here for its simplicity and ease of use with Matplotlib.

Why use mpl_chord_diagram instead of pure Matplotlib?

mpl_chord_diagram provides built-in chord diagram support with minimal code. Pure Matplotlib would require significant custom code to create circular layouts, arcs, and ribbons. mpl_chord_diagram handles the complex geometry while giving you full Matplotlib styling control.

Can I analyze more than 100 articles?

Yes. Use cursor-based pagination to fetch multiple pages:

import requests

url = "https://api.newsdatahub.com/v1/news"
articles = []
cursor = None
headers = {
    "x-api-key": "YOUR_API_KEY",
    "User-Agent": "chord-diagram-topic-cooccurrence/1.0-py"
}
for _ in range(3):  # Fetch up to 3 pages (up to 300 articles)
    params = {"per_page": 100}
    if cursor:
        params["cursor"] = cursor

    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()  # Stop on 4XX/5XX instead of parsing an error body
    data = response.json()
    articles.extend(data.get("data", []))
    cursor = data.get("next_cursor")  # Cursor for the next page, if any
    if not cursor:
        break

See How Does Cursor Pagination Work in NewsDataHub API (https://newsdatahub.com/learning-center/article/newsdatahub-pagination-optimization) for details.

What if my API key doesn’t work?

Verify:

Key is correct (check your dashboard)
Header name is x-api-key (lowercase, with hyphens)
You haven’t exceeded rate limits
Network/firewall isn’t blocking API requests
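
The HTTP status code of a failed request often narrows down which item on this checklist applies. The mapping below follows general HTTP conventions (401/403 for authentication problems, 429 for rate limiting). The tutorial does not document which codes NewsDataHub actually returns, so treat this as an assumption, and note that `diagnose_status` is a hypothetical helper.

```python
def diagnose_status(status_code):
    """Map an HTTP status code to a likely cause, using general HTTP conventions."""
    if 200 <= status_code < 300:
        return "Request succeeded."
    if status_code in (401, 403):
        return "Check the API key value and the x-api-key header name."
    if status_code == 429:
        return "Rate limit likely exceeded; wait or check your plan's quota."
    if 500 <= status_code < 600:
        return "Server-side error; retry later."
    return "Unexpected status; inspect the response body."

for code in (200, 401, 429):
    print(code, "->", diagnose_status(code))
```

With requests, the code is available as `response.status_code`. A connection error raised before any response arrives points to the network or firewall item instead.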

How do I filter for specific topics?

Add topic filter to API request:

params = {
    "per_page": 100,
    "topic": "technology,business"  # Only fetch these topics
}

How fresh is the data?

NewsDataHub updates continuously. The free tier provides recent articles. For specific time ranges, use from_date and to_date parameters. Visit newsdatahub.com/plans for tier-specific freshness guarantees.
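
A date-range request might be assembled as follows. The parameter names `from_date` and `to_date` come from the answer above, but the ISO `YYYY-MM-DD` date format is an assumption that should be confirmed against the API documentation. `date_range_params` is a hypothetical helper.

```python
from datetime import date, timedelta

def date_range_params(days_back, today=None, per_page=100):
    """Build query params covering the last days_back days up to today."""
    today = today or date.today()
    start = today - timedelta(days=days_back)
    return {
        "per_page": per_page,
        # ISO 8601 dates are an assumption; confirm the expected format in the API docs
        "from_date": start.isoformat(),
        "to_date": today.isoformat(),
    }

print(date_range_params(7, today=date(2025, 12, 1)))
```

The returned dictionary can be passed as `params=` to `requests.get` in the same way as the other examples.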

Olga S.

Founder of NewsDataHub — Distributed Systems & Data Engineering
