What if my API key doesn't work?

Verify: 1. Key is correct ( check your dashboard ) 2. Header name is `x-api-key` (lowercase, with hyphens) 3. You haven't exceeded rate limits 4. Network/firewall isn't blocking API requests

How do I filter for specific topics or countries?

Add parameters to your API request: ```python params = { "per_page": 100, "topic": "technology", "country": "US" } ``` See [NewsDataHub Search & Filtering Guide](https://newsdatahub.com/learning-center/article/newsdatahub-search-filtering-guide) for all available filters.

How to Create Treemap Visualizations in Python to Display Topic Distribution

beginner 18 min read December 1, 2025

python treemapsquarify tutorialdata visualizationnews data analysispython data vizproportional data chartsnewsapi pythontopic distributionspace-efficient visualizationpython beginners

Quick Answer: This tutorial teaches you how to create professional treemap visualizations in Python using Squarify and real news data from the NewsDataHub API. You’ll learn to visualize topic distributions with space-efficient, colorful treemaps that make patterns instantly visible.

Perfect for: Python developers, data analysts, students, and anyone building news analytics dashboards or data visualization projects.

Time to complete: 15-20 minutes

Difficulty: Beginner

Stack: Python, Matplotlib, Squarify, NewsDataHub API

What You’ll Build

You’ll create a professional treemap visualization to analyze news data:

Topic distribution treemap — See which topics dominate your news dataset with proportional rectangles
Space-efficient visualization — Display 20+ categories in a single, readable chart
Publication-ready output — Export high-resolution images for reports and presentations

By the end, you’ll know when to use treemaps vs. bar charts or pie charts for categorical data.

Topic Distribution Treemap

Prerequisites

Required Tools

Python 3.7+
pip package manager

Install Required Packages

pip install requests matplotlib squarify

API Key (Optional)

You don’t need an API key to complete this tutorial. The code automatically downloads sample data from GitHub if no key is provided, so you can follow along and build all the charts right away.

If you want to fetch live data instead, grab a free key at newsdatahub.com/login. Note that some fields used in this tutorial (topics, keywords, source metadata) require a paid plan - the sample data includes these fields so you can explore the full analysis regardless.

For current API quotas and rate limits, visit newsdatahub.com/plans.

Knowledge Prerequisites

Basic Python syntax
Familiarity with lists and dictionaries
Understanding of loops and functions

Understanding Treemaps: When and Why to Use Them

What Are Treemaps?

A treemap is a visualization that displays hierarchical data using nested rectangles. Each rectangle’s area is proportional to the value it represents. Unlike bar charts that show data linearly, treemaps maximize space efficiency by filling the entire canvas with data.

Key characteristics:

Area-based encoding — Rectangle size directly corresponds to value magnitude
Space-efficient — No wasted space between categories
Pattern recognition — Dominant categories become immediately obvious through visual weight
Hierarchical capability — Can show multiple levels of categorization simultaneously

When to Use Treemaps vs. Other Charts

Use treemaps when:

You have many categories (10-30+) to compare
You want to show parts of a whole and their relative proportions
Space efficiency is important (displaying many categories in limited space)
You need to make dominant patterns immediately visible
Categories have significantly different magnitudes

Use bar charts when:

You have fewer categories (3-10)
Precise value comparison is critical
You’re showing rankings or ordered data
Exact numerical differences matter more than proportions

Use pie charts when:

You have very few categories (2-5)
You need to show simple percentage breakdown
Total equals 100% (pie charts are not suitable when parts don’t sum to a whole)

For a comprehensive guide on bar charts, see How to Create Bar Charts in Python Using Real News Data.

How to Interpret Treemaps

When reading a treemap:

Larger rectangles = higher values — The biggest rectangle represents the category with the most items
Compare areas, not dimensions — A rectangle with twice the area represents twice the quantity
Color encodes categories — Each color represents a distinct category or theme
Layout optimizes space — Rectangles are arranged algorithmically to minimize wasted space

Real-World Use Cases

Treemaps excel in scenarios like:

News topic analysis — Visualize which topics dominate media coverage
Market share visualization — Show company market shares in an industry
Disk space usage — Display file system storage consumption
Budget allocation — Represent spending across departments
Website analytics — Show page views across different sections

Step 1: Fetch News Data

We’ll retrieve news articles to analyze. You have two options:

With an API key: The script fetches live data from NewsDataHub, using cursor-based pagination to retrieve multiple pages.

Without an API key: The script downloads a sample dataset from GitHub, so you can follow along without signing up.

Fetching Articles

import requests
import matplotlib.pyplot as plt
import squarify
from collections import Counter
import json
import os

# Set your API key here (or leave empty to use sample data)
API_KEY = ""  # Replace with your NewsDataHub API key, or leave empty

# Check if API key is provided
if API_KEY and API_KEY != "your_api_key_here":
    print("Using live API data...")

    url = "https://api.newsdatahub.com/v1/news"
    headers = {
        "x-api-key": API_KEY,
        "User-Agent": "treemap-visualization-topic-distribution-with-newsdatahub/1.0-py"
    }

    # Fetch 100 articles
    params = {"per_page": 100}

    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()
    data = response.json()

    articles = data.get("data", [])
    print(f"Fetched {len(articles)} articles from API")

else:
    print("No API key provided. Loading sample data...")

    # Download sample data if not already present
    sample_file = "sample-news-data.json"

    if not os.path.exists(sample_file):
        print("Downloading sample data...")
        sample_url = "https://raw.githubusercontent.com/newsdatahub/newsdatahub-data-science-tutorials/main/tutorials/bar-charts-news-data/data/sample-news-data.json"
        response = requests.get(sample_url)
        with open(sample_file, "w") as f:
            json.dump(response.json(), f)
        print(f"Sample data saved to {sample_file}")

    # Load sample data
    with open(sample_file, "r") as f:
        data = json.load(f)

    # Handle both formats: raw array or API response with 'data' key
    if isinstance(data, dict) and "data" in data:
        articles = data["data"]
    elif isinstance(data, list):
        articles = data
    else:
        raise ValueError("Unexpected sample data format")

    print(f"Loaded {len(articles)} articles from sample data")

Expected output (with API key):

Using live API data...
Fetched 100 articles from API

Expected output (without API key):

No API key provided. Loading sample data...
Downloading sample data...
Sample data saved to sample-news-data.json
Loaded 100 articles from sample data

Understanding the Code

API_KEY - Set to your NewsDataHub API key for live data, or leave empty to use sample data

When API_KEY is provided:

x-api-key header — Authenticates your request (replace with your actual key)
per_page parameter — Controls how many articles to fetch (max 100 on free tier)
raise_for_status() — Throws an error for 4XX/5XX HTTP responses
data.get("data", []) — Safely extracts the articles array from the response

When API_KEY is empty, the else block runs:

os.path.exists() — Checks if sample data was already downloaded to avoid redundant requests
Downloads from GitHub — Fetches a curated sample dataset with 100 articles including topic metadata
json.dump() — Saves the sample data locally for reuse across multiple runs
Format handling — Uses isinstance() to handle both raw arrays and API response objects
ValueError — Raises clear error if data format is unexpected

Why this dual-mode pattern?

Lower barrier to entry — You can complete the tutorial without signing up or using API quota
Faster iteration — Sample data loads instantly vs. waiting for API calls
Consistent learning experience — Sample data includes all fields (topics, source metadata) regardless of API tier

Step 2: Extract and Aggregate Topic Data

NewsDataHub returns topics as an array because articles often cover multiple subjects. For treemap visualization, we need to count how many articles mention each topic.

Extract Topics from Articles

# ============================================================================
# Extract and Count Topics
# ============================================================================
# Extract topics - NewsDataHub returns 'topics' as an array
topics = []
for article in articles:
    article_topics = article.get("topics", [])
    if article_topics:
        # If topics is a list, extend our topics list
        if isinstance(article_topics, list):
            topics.extend(article_topics)
        else:
            topics.append(article_topics)

print(f"\nTotal topic mentions: {len(topics)}")

# Exclude 'general' topic (articles not yet categorized)
topics = [t for t in topics if t != 'general']

# Count topic occurrences
topic_counts = Counter(topics)
print(f"Found {len(topic_counts)} unique topics (excluding 'general')")

# Get top 20 topics for visualization
top_topics = dict(topic_counts.most_common(20))
print(f"Displaying top 20 topics out of {len(topic_counts)} total")

What this does:

NewsDataHub returns topics as an array, not a single value
Each article can have multiple topics (e.g., an AI startup funding story might be tagged as both “technology” and “business”)
We use extend() to add all topics from each article to our list
Excluding ‘general’ topic — The ‘general’ topic is applied to articles not yet categorized. Filtering it out ensures we only visualize meaningful content categories
Counter efficiently aggregates and counts topic occurrences
Limiting to top 20 topics — Prevents clutter from rare topics with only 1-2 mentions, making the treemap readable

Why 20 topics?

Treemaps can handle many categories, but beyond 20-30, smaller rectangles become too small to label clearly
Top 20 captures the dominant patterns while maintaining readability
You can adjust this number based on your data distribution

Step 3: Create a Professional Treemap Visualization

Now you’ll transform the aggregated topic counts into a beautiful, space-efficient treemap.

Prepare Data for Plotting

# Prepare data for plotting
labels = list(top_topics.keys())
sizes = list(top_topics.values())

print(f"Ready to visualize {len(labels)} topics")

Create the Treemap

# ============================================================================
# Create Treemap Visualization
# ============================================================================
# Prepare data for plotting
labels = list(top_topics.keys())
sizes = list(top_topics.values())

# Vibrant color palette for visual distinction
colors = [
    '#EF4444',  # Red
    '#3B82F6',  # Blue
    '#10B981',  # Green
    '#FBBF24',  # Yellow
    '#8B5CF6',  # Purple
    '#F59E0B',  # Orange
    '#EC4899',  # Pink
    '#14B8A6',  # Teal
    '#6366F1',  # Indigo
    '#F97316',  # Orange-red
    '#FF6B6B',  # Light red
    '#4ECDC4',  # Cyan
    '#45B7D1',  # Sky blue
    '#FFA07A',  # Light salmon
    '#98D8C8',  # Mint
    '#F7DC6F',  # Light yellow
    '#BB8FCE',  # Lavender
    '#85C1E2',  # Baby blue
    '#52B788',  # Forest green
    '#34D399'   # Emerald
]

# Create figure with appropriate size
plt.figure(figsize=(16, 10))

# Create treemap using squarify
squarify.plot(
    sizes=sizes,
    label=labels,
    color=colors[:len(labels)],
    text_kwargs={'fontsize': 11, 'weight': 'bold', 'color': 'white'},
    bar_kwargs={'edgecolor': 'white', 'linewidth': 3, 'alpha': 0.8}
)

# Style the chart
plt.title('Topic Distribution in Current News Coverage',
          fontsize=18, fontweight='bold', pad=20)
plt.axis('off')  # Remove axes for cleaner look

plt.tight_layout()
plt.savefig('topic-distribution-treemap.png', dpi=300, bbox_inches='tight')
print("\n✓ Treemap visualization saved: topic-distribution-treemap.png")

Styling breakdown:

Figure Size:

figsize=(16, 10) — Large canvas provides ample space for 20 rectangles and labels
Wide aspect ratio (16:10) works well for treemaps, balancing horizontal and vertical divisions

Color Selection:

Vibrant palette with 20 distinct colors — Each topic gets a unique, easily distinguishable color
High saturation — Makes categories stand out and creates visual impact
Colors chosen for maximum distinction — Adjacent rectangles have contrasting colors

Typography:

fontsize=11 — Balances readability with space constraints
weight='bold' — Makes text stand out against colored backgrounds
color='white' — Ensures text is visible on all background colors
Automatic text placement — Squarify library handles optimal label positioning

Visual Enhancements:

bar_kwargs={'alpha': 0.8} — Subtle transparency adds visual depth without compromising readability
bar_kwargs={'edgecolor': 'white', 'linewidth': 3} — Thick white borders create clear separation between categories
axis('off') — Removes unnecessary chart elements (x-axis, y-axis) for a clean, minimal design

Output Quality:

dpi=300 — High resolution for publication-quality images (suitable for reports, presentations, print)
bbox_inches='tight' — Removes excess whitespace around the visualization

Step 4: Enhance with Value Labels

For better context, you can include article counts alongside topic names in the labels.

Custom Labels with Counts

# ============================================================================
# Create Enhanced Treemap with Value Labels
# ============================================================================
# Create labels with topic names and counts
labels_with_counts = [f"{topic}\n({count})" for topic, count in top_topics.items()]

plt.figure(figsize=(16, 10))

squarify.plot(
    sizes=sizes,
    label=labels_with_counts,
    color=colors[:len(labels)],
    text_kwargs={'fontsize': 10, 'weight': 'bold', 'color': 'white'},
    bar_kwargs={'edgecolor': 'white', 'linewidth': 3, 'alpha': 0.8}
)

plt.title('Topic Distribution in Current News Coverage (with counts)',
          fontsize=18, fontweight='bold', pad=20)
plt.axis('off')

plt.tight_layout()
plt.savefig('topic-distribution-treemap-with-counts.png', dpi=300, bbox_inches='tight')
print("✓ Enhanced treemap with counts saved: topic-distribution-treemap-with-counts.png")

Why include counts?

Provides exact values — Viewers can see precise article counts, not just relative proportions
Enhances interpretability — Combines visual (area) and numerical (count) information
Aids decision-making — Exact numbers support data-driven conclusions

Label formatting:

\n separator — Places count on a new line for better readability
Parentheses — Visually distinguishes the count from the topic name
Reduced font size — fontsize=10 accommodates longer labels

Working Within API Rate Limits

NewsDataHub free tier offers 100 API calls per day. Here’s how to maximize your usage:

Cache Data During Development

import json

# Save fetched data to disk
with open("cached_news.json", "w") as f:
    json.dump(articles, f, indent=2)

# Load from cache instead of making API calls
with open("cached_news.json", "r") as f:
    articles = json.load(f)

Benefits:

Iterate faster — No waiting for API responses during chart tweaking
Preserve quota — Save API calls for fresh data collection
Reproducibility — Analyze the same dataset across sessions
Experiment freely — Try different visualization parameters without using API calls

Plan Your Data Collection

Daily topic tracking — Fetch 100 articles daily to monitor topic trends over time
Weekly deep dives — Accumulate data over a week for larger sample sizes
Upgrade when needed — Visit newsdatahub.com/plans for higher limits

Best Practices for Professional Treemaps

1. Choose Distinct, Vibrant Colors

Use high-contrast colors that are easy to distinguish:

# Professional color palette
colors = [
    '#EF4444', '#3B82F6', '#10B981', '#FBBF24', '#8B5CF6',
    '#F59E0B', '#EC4899', '#14B8A6', '#6366F1', '#F97316'
]

Color palette resources:

ColorBrewer for data viz: colorbrewer2.org
Tailwind CSS colors — Modern, accessible palette
Brand colors if building for a specific organization

2. Limit Categories for Readability

# Limit to top N categories
top_topics = dict(topic_counts.most_common(20))  # Sweet spot: 15-25 categories

# Or filter by minimum threshold
top_topics = {topic: count for topic, count in topic_counts.items() if count >= 3}

Why limit categories?

Too many small rectangles become unreadable (text won’t fit)
Aim for 15-30 categories maximum
Group rare categories into “Other” if needed

3. Ensure Text Readability

# Use white text on colored backgrounds
text_kwargs={'fontsize': 11, 'weight': 'bold', 'color': 'white'}

# Add thick white borders for separation
bar_kwargs={'edgecolor': 'white', 'linewidth': 3}

Readability tips:

White text works on most vibrant backgrounds
Bold weight improves legibility at small sizes
Borders prevent visual bleeding between adjacent rectangles

4. Optimize Figure Size

# Large canvas for many categories
plt.figure(figsize=(16, 10))  # For 20+ categories

# Smaller canvas for fewer categories
plt.figure(figsize=(12, 8))   # For 10-15 categories

5. Add Descriptive Titles

plt.title('Topic Distribution in Current News Coverage',
          fontsize=18, fontweight='bold', pad=20)

Title best practices:

Clearly describe what the visualization shows
Include time context if relevant (“Current”, “Weekly”, “November 2025”)
Use sentence case for readability
Add padding for visual separation from the chart

Extending This Tutorial

Once you’re comfortable with basic treemaps, try these enhancements:

1. Filter by Country or Source

Analyze regional differences in topic coverage:

# Filter for US news only
params = {"per_page": 100, "country": "US"}

# Filter by specific sources
params = {"per_page": 100, "source": "bbc-news,cnn"}

2. Compare Multiple Treemaps

Create side-by-side comparisons:

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))

# US news topics
plt.sca(ax1)
squarify.plot(sizes=us_sizes, label=us_labels, ...)
ax1.set_title('US News Topics')

# UK news topics
plt.sca(ax2)
squarify.plot(sizes=uk_sizes, label=uk_labels, ...)
ax2.set_title('UK News Topics')

3. Time Series Analysis

Track how topic distribution changes over time:

# Fetch data for multiple days
# Create separate treemaps for each day
# Compare to identify trending topics

4. Interactive Treemaps with Plotly

For web dashboards, use Plotly for hover effects:

import plotly.express as px

fig = px.treemap(
    names=labels,
    parents=[""] * len(labels),  # Flat hierarchy
    values=sizes,
    title='Topic Distribution'
)
fig.show()

5. Hierarchical Treemaps

Group topics into broader categories:

# Group topics into parent categories
categories = {
    'Politics': ['politics', 'world', 'government'],
    'Technology': ['technology', 'science', 'ai'],
    'Entertainment': ['entertainment', 'sports', 'lifestyle']
}

# Create hierarchical structure for plotting

FAQ

When should I use a treemap instead of a bar chart?

Use treemaps when you have 10+ categories and want to show proportions visually. Treemaps are more space-efficient and make dominant patterns immediately obvious. Use bar charts when you have fewer categories (3-10) or need precise value comparisons.

How many categories can a treemap handle?

Treemaps can theoretically handle unlimited categories, but practically limit to 20-30 for readability. Beyond this, smaller rectangles become too small to label clearly. Consider grouping rare categories into “Other”.

Can treemaps show hierarchical data?

Yes! Treemaps excel at hierarchical data. However, the squarify library only supports flat (non-hierarchical) layouts. For hierarchical treemaps with nested rectangles showing parent-child relationships, use Plotly’s px.treemap() which has built-in support for hierarchical data via the parents parameter.

Why use squarify instead of matplotlib’s built-in treemap?

Matplotlib doesn’t have a built-in treemap function. Squarify is the standard Python library for treemaps, using the squarified treemap algorithm which creates more readable layouts than simple rectangular subdivision.

How does the squarified algorithm work?

The squarified algorithm arranges rectangles to minimize their aspect ratios (closeness to squares). Squares are easier to compare visually than elongated rectangles, improving readability.

Can I customize the layout algorithm?

Squarify uses the squarified algorithm by default. For alternative layouts, explore libraries like plotly (which offers different tiling methods) or implement custom algorithms.

How do I handle negative values in treemaps?

Treemaps cannot display negative values (area cannot be negative). If your data includes negative values, either filter them out, use absolute values, or choose a different chart type like bar charts.

What if my API key doesn’t work?

Verify:

Key is correct (check your dashboard)
Header name is x-api-key (lowercase, with hyphens)
You haven’t exceeded rate limits
Network/firewall isn’t blocking API requests

How do I filter for specific topics or countries?

Add parameters to your API request:

params = {
    "per_page": 100,
    "topic": "technology",
    "country": "US"
}

See NewsDataHub Search & Filtering Guide for all available filters.

Can I fetch more than 100 articles on the free tier?

Yes, using pagination. Each API call is limited to 100 results, but you can make multiple calls (up to your daily limit) using cursor-based pagination. See How Does Cursor Pagination Work in NewsDataHub API for details.

Next Steps

Olga S.

Founder of NewsDataHub — Distributed Systems & Data Engineering

Connect on LinkedIn