NewsDataHub Learning Center

How to Create Treemap Visualizations in Python to Display Topic Distribution

Quick Answer: This tutorial teaches you how to create professional treemap visualizations in Python using Squarify and real news data from the NewsDataHub API. You’ll learn to visualize topic distributions with space-efficient, colorful treemaps that make patterns instantly visible.

Perfect for: Python developers, data analysts, students, and anyone building news analytics dashboards or data visualization projects.

Time to complete: 15-20 minutes

Difficulty: Beginner

Stack: Python, Matplotlib, Squarify, NewsDataHub API


You’ll create a professional treemap visualization to analyze news data:

  • Topic distribution treemap — See which topics dominate your news dataset with proportional rectangles
  • Space-efficient visualization — Display 20+ categories in a single, readable chart
  • Publication-ready output — Export high-resolution images for reports and presentations

By the end, you’ll know when to use treemaps vs. bar charts or pie charts for categorical data.

Topic Distribution Treemap


  • Python 3.7+
  • pip package manager
pip install requests matplotlib squarify

You don’t need an API key to complete this tutorial. The code automatically downloads sample data from GitHub if no key is provided, so you can follow along and build all the charts right away.

If you want to fetch live data instead, grab a free key at newsdatahub.com/login. Note that some fields used in this tutorial (topics, keywords, source metadata) require a paid plan; the sample data includes these fields, so you can explore the full analysis regardless.

For current API quotas and rate limits, visit newsdatahub.com/plans.

  • Basic Python syntax
  • Familiarity with lists and dictionaries
  • Understanding of loops and functions

Understanding Treemaps: When and Why to Use Them

A treemap is a visualization that displays hierarchical data using nested rectangles. Each rectangle’s area is proportional to the value it represents. Unlike bar charts that show data linearly, treemaps maximize space efficiency by filling the entire canvas with data.

Key characteristics:

  • Area-based encoding — Rectangle size directly corresponds to value magnitude
  • Space-efficient — No wasted space between categories
  • Pattern recognition — Dominant categories become immediately obvious through visual weight
  • Hierarchical capability — Can show multiple levels of categorization simultaneously

Use treemaps when:

  • You have many categories (10-30+) to compare
  • You want to show parts of a whole and their relative proportions
  • Space efficiency is important (displaying many categories in limited space)
  • You need to make dominant patterns immediately visible
  • Categories have significantly different magnitudes

Use bar charts when:

  • You have fewer categories (3-10)
  • Precise value comparison is critical
  • You’re showing rankings or ordered data
  • Exact numerical differences matter more than proportions

Use pie charts when:

  • You have very few categories (2-5)
  • You need to show simple percentage breakdown
  • Total equals 100% (pie charts are not suitable when parts don’t sum to a whole)

For a comprehensive guide on bar charts, see How to Create Bar Charts in Python Using Real News Data.

When reading a treemap:

  • Larger rectangles = higher values — The biggest rectangle represents the category with the most items
  • Compare areas, not dimensions — A rectangle with twice the area represents twice the quantity
  • Color encodes categories — Each color represents a distinct category or theme
  • Layout optimizes space — Rectangles are arranged algorithmically to minimize wasted space
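
To make the "compare areas, not dimensions" rule concrete, here is a minimal sketch of the scaling step behind area-based encoding. The counts and the 100 x 100 canvas are made up for illustration, and `areas_for` is a hypothetical helper, not part of Squarify (which ships its own `normalize_sizes` function for this kind of scaling).

```python
def areas_for(values, width, height):
    """Scale raw values so the resulting areas sum to width * height."""
    total = sum(values)
    canvas_area = width * height
    return [value / total * canvas_area for value in values]


# Made-up topic counts, for illustration only
counts = {"politics": 40, "technology": 20, "sports": 10}
areas = areas_for(list(counts.values()), width=100, height=100)

for topic, area in zip(counts, areas):
    print(f"{topic}: {area:.1f} square units")
```

Because "politics" has twice the count of "technology", it receives twice the area, whatever shape the layout gives each rectangle.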

Treemaps excel in scenarios like:

  • News topic analysis — Visualize which topics dominate media coverage
  • Market share visualization — Show company market shares in an industry
  • Disk space usage — Display file system storage consumption
  • Budget allocation — Represent spending across departments
  • Website analytics — Show page views across different sections

We’ll retrieve news articles to analyze. You have two options:

With an API key: The script fetches live data from NewsDataHub, requesting a single page of up to 100 articles.

Without an API key: The script downloads a sample dataset from GitHub, so you can follow along without signing up.

import requests
import matplotlib.pyplot as plt
import squarify
from collections import Counter
import json
import os

# Set your API key here (or leave empty to use sample data)
API_KEY = ""  # Replace with your NewsDataHub API key, or leave empty

# Check if API key is provided
if API_KEY and API_KEY != "your_api_key_here":
    print("Using live API data...")
    url = "https://api.newsdatahub.com/v1/news"
    headers = {
        "x-api-key": API_KEY,
        "User-Agent": "treemap-visualization-topic-distribution-with-newsdatahub/1.0-py"
    }
    # Fetch 100 articles
    params = {"per_page": 100}
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()
    data = response.json()
    articles = data.get("data", [])
    print(f"Fetched {len(articles)} articles from API")
else:
    print("No API key provided. Loading sample data...")
    # Download sample data if not already present
    sample_file = "sample-news-data.json"
    if not os.path.exists(sample_file):
        print("Downloading sample data...")
        sample_url = "https://raw.githubusercontent.com/newsdatahub/newsdatahub-data-science-tutorials/main/tutorials/bar-charts-news-data/data/sample-news-data.json"
        response = requests.get(sample_url)
        # Stop here on a failed download instead of caching an error page
        response.raise_for_status()
        with open(sample_file, "w") as f:
            json.dump(response.json(), f)
        print(f"Sample data saved to {sample_file}")
    # Load sample data
    with open(sample_file, "r") as f:
        data = json.load(f)
    # Handle both formats: raw array or API response with 'data' key
    if isinstance(data, dict) and "data" in data:
        articles = data["data"]
    elif isinstance(data, list):
        articles = data
    else:
        raise ValueError("Unexpected sample data format")
    print(f"Loaded {len(articles)} articles from sample data")

Expected output (with API key):

Using live API data...
Fetched 100 articles from API

Expected output (without API key):

No API key provided. Loading sample data...
Downloading sample data...
Sample data saved to sample-news-data.json
Loaded 100 articles from sample data

API_KEY - Set to your NewsDataHub API key for live data, or leave empty to use sample data

When API_KEY is provided:

  • x-api-key header — Authenticates your request (replace with your actual key)
  • per_page parameter — Controls how many articles to fetch (max 100 on free tier)
  • raise_for_status() — Throws an error for 4XX/5XX HTTP responses
  • data.get("data", []) — Safely extracts the articles array from the response

When API_KEY is empty, the else block runs:

  • os.path.exists() — Checks if sample data was already downloaded to avoid redundant requests
  • Downloads from GitHub — Fetches a curated sample dataset with 100 articles including topic metadata
  • json.dump() — Saves the sample data locally for reuse across multiple runs
  • Format handling — Uses isinstance() to handle both raw arrays and API response objects
  • ValueError — Raises clear error if data format is unexpected

Why this dual-mode pattern?

  • Lower barrier to entry — You can complete the tutorial without signing up or using API quota
  • Faster iteration — Sample data loads instantly vs. waiting for API calls
  • Consistent learning experience — Sample data includes all fields (topics, source metadata) regardless of API tier

NewsDataHub returns topics as an array because articles often cover multiple subjects. For treemap visualization, we need to count how many articles mention each topic.

# ============================================================================
# Extract and Count Topics
# ============================================================================
# Extract topics - NewsDataHub returns 'topics' as an array
topics = []
for article in articles:
    article_topics = article.get("topics", [])
    if article_topics:
        # If topics is a list, extend our topics list
        if isinstance(article_topics, list):
            topics.extend(article_topics)
        else:
            topics.append(article_topics)
print(f"\nTotal topic mentions: {len(topics)}")

# Exclude 'general' topic (articles not yet categorized)
topics = [t for t in topics if t != 'general']

# Count topic occurrences
topic_counts = Counter(topics)
print(f"Found {len(topic_counts)} unique topics (excluding 'general')")

# Get top 20 topics for visualization (fewer if the dataset has under 20)
top_topics = dict(topic_counts.most_common(20))
print(f"Displaying top {len(top_topics)} topics out of {len(topic_counts)} total")

What this does:

  • NewsDataHub returns topics as an array, not a single value
  • Each article can have multiple topics (e.g., an AI startup funding story might be tagged as both “technology” and “business”)
  • We use extend() to add all topics from each article to our list
  • Excluding ‘general’ topic — The ‘general’ topic is applied to articles not yet categorized. Filtering it out ensures we only visualize meaningful content categories
  • Counter efficiently aggregates and counts topic occurrences
  • Limiting to top 20 topics — Prevents clutter from rare topics with only 1-2 mentions, making the treemap readable

Why 20 topics?

  • Treemaps can handle many categories, but beyond 20-30, smaller rectangles become too small to label clearly
  • Top 20 captures the dominant patterns while maintaining readability
  • You can adjust this number based on your data distribution

Step 3: Create a Professional Treemap Visualization

Now you’ll transform the aggregated topic counts into a beautiful, space-efficient treemap.

# ============================================================================
# Create Treemap Visualization
# ============================================================================
# Prepare data for plotting
labels = list(top_topics.keys())
sizes = list(top_topics.values())
print(f"Ready to visualize {len(labels)} topics")

# Vibrant color palette for visual distinction
colors = [
    '#EF4444',  # Red
    '#3B82F6',  # Blue
    '#10B981',  # Green
    '#FBBF24',  # Yellow
    '#8B5CF6',  # Purple
    '#F59E0B',  # Orange
    '#EC4899',  # Pink
    '#14B8A6',  # Teal
    '#6366F1',  # Indigo
    '#F97316',  # Orange-red
    '#FF6B6B',  # Light red
    '#4ECDC4',  # Cyan
    '#45B7D1',  # Sky blue
    '#FFA07A',  # Light salmon
    '#98D8C8',  # Mint
    '#F7DC6F',  # Light yellow
    '#BB8FCE',  # Lavender
    '#85C1E2',  # Baby blue
    '#52B788',  # Forest green
    '#34D399'   # Emerald
]

# Create figure with appropriate size
plt.figure(figsize=(16, 10))

# Create treemap using squarify
squarify.plot(
    sizes=sizes,
    label=labels,
    color=colors[:len(labels)],
    text_kwargs={'fontsize': 11, 'weight': 'bold', 'color': 'white'},
    bar_kwargs={'edgecolor': 'white', 'linewidth': 3, 'alpha': 0.8}
)

# Style the chart
plt.title('Topic Distribution in Current News Coverage',
          fontsize=18, fontweight='bold', pad=20)
plt.axis('off')  # Remove axes for cleaner look
plt.tight_layout()
plt.savefig('topic-distribution-treemap.png', dpi=300, bbox_inches='tight')
print("\n✓ Treemap visualization saved: topic-distribution-treemap.png")

Styling breakdown:

Figure Size:

  • figsize=(16, 10) — Large canvas provides ample space for 20 rectangles and labels
  • Wide aspect ratio (16:10) works well for treemaps, balancing horizontal and vertical divisions

Color Selection:

  • Vibrant palette with 20 distinct colors — Each topic gets a unique, easily distinguishable color
  • High saturation — Makes categories stand out and creates visual impact
  • Colors chosen for maximum distinction — Adjacent rectangles have contrasting colors

Typography:

  • fontsize=11 — Balances readability with space constraints
  • weight='bold' — Makes text stand out against colored backgrounds
  • color='white' — Ensures text is visible on all background colors
  • Automatic text placement — Squarify library handles optimal label positioning

Visual Enhancements:

  • bar_kwargs={'alpha': 0.8} — Subtle transparency adds visual depth without compromising readability
  • bar_kwargs={'edgecolor': 'white', 'linewidth': 3} — Thick white borders create clear separation between categories
  • axis('off') — Removes unnecessary chart elements (x-axis, y-axis) for a clean, minimal design

Output Quality:

  • dpi=300 — High resolution for publication-quality images (suitable for reports, presentations, print)
  • bbox_inches='tight' — Removes excess whitespace around the visualization

For better context, you can include article counts alongside topic names in the labels.

# ============================================================================
# Create Enhanced Treemap with Value Labels
# ============================================================================
# Create labels with topic names and counts
labels_with_counts = [f"{topic}\n({count})" for topic, count in top_topics.items()]

plt.figure(figsize=(16, 10))
squarify.plot(
    sizes=sizes,
    label=labels_with_counts,
    color=colors[:len(labels)],
    text_kwargs={'fontsize': 10, 'weight': 'bold', 'color': 'white'},
    bar_kwargs={'edgecolor': 'white', 'linewidth': 3, 'alpha': 0.8}
)
plt.title('Topic Distribution in Current News Coverage (with counts)',
          fontsize=18, fontweight='bold', pad=20)
plt.axis('off')
plt.tight_layout()
plt.savefig('topic-distribution-treemap-with-counts.png', dpi=300, bbox_inches='tight')
print("✓ Enhanced treemap with counts saved: topic-distribution-treemap-with-counts.png")

Why include counts?

  • Provides exact values — Viewers can see precise article counts, not just relative proportions
  • Enhances interpretability — Combines visual (area) and numerical (count) information
  • Aids decision-making — Exact numbers support data-driven conclusions

Label formatting:

  • \n separator — Places count on a new line for better readability
  • Parentheses — Visually distinguishes the count from the topic name
  • Reduced font size: fontsize=10 accommodates longer labels
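
If relative proportions matter as much as raw counts, the same label pattern can carry a percentage as well. The sketch below is one possible variation, not part of the original script: `labels_with_share` is a hypothetical helper and the counts are invented. Note that the share is computed over the topic mentions being displayed, not over articles, since one article can carry several topics.

```python
def labels_with_share(counts):
    """Build 'topic / count (share)' labels from a {topic: count} mapping."""
    total = sum(counts.values())
    return [
        f"{topic}\n{count} ({count / total:.0%})"
        for topic, count in counts.items()
    ]


# Invented counts, for illustration only
example_counts = {"politics": 50, "technology": 30, "sports": 20}
for label in labels_with_share(example_counts):
    print(label.replace("\n", " | "))
```

The returned list can be passed to `squarify.plot` through the `label` argument in place of `labels_with_counts`.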

NewsDataHub free tier offers 100 API calls per day. Here’s how to maximize your usage:

import json

# Save fetched data to disk
with open("cached_news.json", "w") as f:
    json.dump(articles, f, indent=2)

# Load from cache instead of making API calls
with open("cached_news.json", "r") as f:
    articles = json.load(f)

Benefits:

  • Iterate faster — No waiting for API responses during chart tweaking
  • Preserve quota — Save API calls for fresh data collection
  • Reproducibility — Analyze the same dataset across sessions
  • Experiment freely — Try different visualization parameters without using API calls

Planning your API usage:

  • Daily topic tracking — Fetch 100 articles daily to monitor topic trends over time
  • Weekly deep dives — Accumulate data over a week for larger sample sizes
  • Upgrade when needed — Visit newsdatahub.com/plans for higher limits

Use high-contrast colors that are easy to distinguish:

# Professional color palette
colors = [
    '#EF4444', '#3B82F6', '#10B981', '#FBBF24', '#8B5CF6',
    '#F59E0B', '#EC4899', '#14B8A6', '#6366F1', '#F97316'
]

Color palette resources:

  • ColorBrewer for data viz: colorbrewer2.org
  • Tailwind CSS colors — Modern, accessible palette
  • Brand colors if building for a specific organization

# Limit to top N categories
top_topics = dict(topic_counts.most_common(20))  # Sweet spot: 15-25 categories

# Or filter by minimum threshold
top_topics = {topic: count for topic, count in topic_counts.items() if count >= 3}

Why limit categories?

  • Too many small rectangles become unreadable (text won’t fit)
  • Aim for 15-30 categories maximum
  • Group rare categories into “Other” if needed
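
Grouping rare categories into "Other" could look like the sketch below. The helper name `top_n_with_other`, the "Other" label, and the example counts are illustrative assumptions rather than part of the tutorial's script.

```python
from collections import Counter


def top_n_with_other(counts, n):
    """Keep the n most common items and sum the rest under 'Other'."""
    ranked = counts.most_common()
    top = dict(ranked[:n])
    remainder = sum(count for _, count in ranked[n:])
    if remainder > 0:
        top["Other"] = remainder
    return top


# Invented counts, for illustration only
example = Counter({"politics": 12, "technology": 9, "sports": 4, "health": 2, "travel": 1})
grouped = top_n_with_other(example, n=3)
print(grouped)
```

With real data, `top_n_with_other(topic_counts, 20)` would stand in for `dict(topic_counts.most_common(20))`, so the treemap still accounts for every topic mention.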

# Use white text on colored backgrounds
text_kwargs = {'fontsize': 11, 'weight': 'bold', 'color': 'white'}

# Add thick white borders for separation
bar_kwargs = {'edgecolor': 'white', 'linewidth': 3}

Readability tips:

  • White text works on most vibrant backgrounds
  • Bold weight improves legibility at small sizes
  • Borders prevent visual bleeding between adjacent rectangles

# Large canvas for many categories
plt.figure(figsize=(16, 10))  # For 20+ categories

# Smaller canvas for fewer categories
plt.figure(figsize=(12, 8))  # For 10-15 categories

plt.title('Topic Distribution in Current News Coverage',
          fontsize=18, fontweight='bold', pad=20)

Title best practices:

  • Clearly describe what the visualization shows
  • Include time context if relevant (“Current”, “Weekly”, “November 2025”)
  • Use sentence case for readability
  • Add padding for visual separation from the chart

Once you’re comfortable with basic treemaps, try these enhancements:

Analyze regional differences in topic coverage:

# Filter for US news only
params = {"per_page": 100, "country": "US"}
# Filter by specific sources
params = {"per_page": 100, "source": "bbc-news,cnn"}

Create side-by-side comparisons:

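fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))

# us_sizes/us_labels and uk_sizes/uk_labels are placeholders: build them from
# separate country-filtered requests, the same way as sizes and labels above.
# Add the color, text_kwargs, and bar_kwargs styling from earlier as needed.

# US news topics
squarify.plot(sizes=us_sizes, label=us_labels, ax=ax1)
ax1.set_title('US News Topics')
ax1.axis('off')

# UK news topics
squarify.plot(sizes=uk_sizes, label=uk_labels, ax=ax2)
ax2.set_title('UK News Topics')
ax2.axis('off')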

Track how topic distribution changes over time:

# Fetch data for multiple days
# Create separate treemaps for each day
# Compare to identify trending topics
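
As a rough sketch of the comparison step, the snippet below diffs two topic counters and ranks topics by how much their counts changed. The function name `topic_changes` and both days of counts are made up for illustration; in practice each counter would come from a separate day's cached articles.

```python
from collections import Counter


def topic_changes(previous, current):
    """Return (topic, delta) pairs sorted from biggest rise to biggest fall."""
    all_topics = set(previous) | set(current)
    deltas = {t: current.get(t, 0) - previous.get(t, 0) for t in all_topics}
    # Sort by delta descending, then by name so ties are deterministic
    return sorted(deltas.items(), key=lambda item: (-item[1], item[0]))


# Invented counts for two consecutive days
monday = Counter({"politics": 30, "technology": 20, "sports": 10})
tuesday = Counter({"politics": 25, "technology": 28, "health": 6})

for topic, delta in topic_changes(monday, tuesday):
    print(f"{topic}: {delta:+d}")
```

Topics at the top of the list are gaining coverage; topics at the bottom are fading or have disappeared.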

For web dashboards, use Plotly for hover effects:

import plotly.express as px

fig = px.treemap(
    names=labels,
    parents=[""] * len(labels),  # Flat hierarchy
    values=sizes,
    title='Topic Distribution'
)
fig.show()

Group topics into broader categories:

# Group topics into parent categories
categories = {
    'Politics': ['politics', 'world', 'government'],
    'Technology': ['technology', 'science', 'ai'],
    'Entertainment': ['entertainment', 'sports', 'lifestyle']
}
# Create hierarchical structure for plotting
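
One way to build that hierarchical structure is to produce the parallel `names`, `parents`, and `values` lists that `px.treemap` accepts. The sketch below is an assumption-laden illustration: `build_hierarchy` is a hypothetical helper, the counts are invented, and sending unmapped topics to an "Other" parent is a design choice of this example, not something the API prescribes.

```python
def build_hierarchy(categories, topic_counts, fallback="Other"):
    """Return parallel names/parents/values lists for a two-level treemap."""
    topic_to_parent = {
        topic: parent
        for parent, topics in categories.items()
        for topic in topics
    }
    names, parents, values = [], [], []
    used_parents = []
    for topic, count in topic_counts.items():
        parent = topic_to_parent.get(topic, fallback)
        if parent not in used_parents:
            used_parents.append(parent)
        names.append(topic)
        parents.append(parent)
        values.append(count)
    # Add each parent as a root-level node; 0 means "no value of its own"
    for parent in used_parents:
        names.append(parent)
        parents.append("")
        values.append(0)
    return names, parents, values


example_categories = {
    "Politics": ["politics", "world", "government"],
    "Technology": ["technology", "science", "ai"],
}
# Invented counts, for illustration only
example_counts = {"politics": 12, "world": 7, "technology": 9, "ai": 5, "travel": 3}

names, parents, values = build_hierarchy(example_categories, example_counts)
for name, parent, value in zip(names, parents, values):
    print(f"{name!r:14} parent={parent!r:14} value={value}")
```

The three lists can then be passed to `px.treemap(names=names, parents=parents, values=values)`. Parent nodes carry a value of 0 here on the assumption that Plotly's default `branchvalues` setting sizes a parent from its children; check the Plotly documentation if the rendered areas look off.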

  • When should I use a treemap instead of a bar chart?

Use treemaps when you have 10+ categories and want to show proportions visually. Treemaps are more space-efficient and make dominant patterns immediately obvious. Use bar charts when you have fewer categories (3-10) or need precise value comparisons.

  • How many categories can a treemap handle?

Treemaps can theoretically handle any number of categories, but in practice it is best to limit them to 20-30 for readability. Beyond this, smaller rectangles become too small to label clearly. Consider grouping rare categories into “Other”.

  • Can treemaps show hierarchical data?

Yes! Treemaps excel at hierarchical data. However, the squarify library only supports flat (non-hierarchical) layouts. For hierarchical treemaps with nested rectangles showing parent-child relationships, use Plotly’s px.treemap() which has built-in support for hierarchical data via the parents parameter.

  • Why use squarify instead of matplotlib’s built-in treemap?

Matplotlib doesn’t have a built-in treemap function. Squarify is a widely used Python library for treemaps; it implements the squarified treemap algorithm, which produces more readable layouts than simple slice-and-dice subdivision.

  • How does the squarified algorithm work?

The squarified algorithm arranges rectangles so that their aspect ratios stay as close to 1 as possible (that is, as close to squares as possible). Squares are easier to compare visually than elongated rectangles, improving readability.
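
The quantity being minimized can be sketched in a few lines. The function below follows the row-quality measure described in the squarified treemaps paper by Bruls, Huizing, and van Wijk; it is an illustration of the idea, not Squarify's actual source code, and the name `worst_aspect_ratio` is made up here. The example areas and the side length of 4 are taken from a 6 x 4 layout with areas 6, 6, 4, ...

```python
def worst_aspect_ratio(row, side):
    """Worst aspect ratio among rectangles in `row` (a list of areas)
    when they are stacked along a strip whose fixed side has length `side`."""
    total = sum(row)
    largest, smallest = max(row), min(row)
    return max(
        side ** 2 * largest / total ** 2,
        total ** 2 / (side ** 2 * smallest),
    )


# Try growing a row one rectangle at a time
for row in ([6], [6, 6], [6, 6, 4]):
    print(row, round(worst_aspect_ratio(row, side=4), 2))
```

The algorithm keeps adding rectangles to the current row while this value improves, and starts a new row as soon as adding one more would make it worse. In this example the second rectangle improves the row, while the third would make it worse, so the row is closed after two.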

  • Can I customize the layout algorithm?

Squarify uses the squarified algorithm by default. For alternative layouts, explore libraries like plotly (which offers different tiling methods) or implement custom algorithms.

  • How do I handle negative values in treemaps?

Treemaps cannot display negative values (area cannot be negative). If your data includes negative values, either filter them out, use absolute values, or choose a different chart type like bar charts.
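
Filtering before plotting might look like the sketch below. The helper name `positive_only` and the sample figures are invented for illustration; zero values are dropped too, since they would not produce a visible rectangle.

```python
def positive_only(counts):
    """Drop entries whose value is zero or negative."""
    return {label: value for label, value in counts.items() if value > 0}


# Invented figures, for illustration only
raw = {"revenue": 120, "refunds": -15, "grants": 30, "unused": 0}
clean = positive_only(raw)
print(clean)
```

The keys and values of `clean` can then be used as the `label` and `sizes` arguments for the treemap.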

  • What if my API key doesn’t work?

Verify:

  1. Key is correct (check your dashboard)
  2. Header name is x-api-key (lowercase, with hyphens)
  3. You haven’t exceeded rate limits
  4. Network/firewall isn’t blocking API requests

  • How do I filter for specific topics or countries?

Add parameters to your API request:

params = {
    "per_page": 100,
    "topic": "technology",
    "country": "US"
}

See NewsDataHub Search & Filtering Guide for all available filters.

  • Can I fetch more than 100 articles on the free tier?

Yes, using pagination. Each API call is limited to 100 results, but you can make multiple calls (up to your daily limit) using cursor-based pagination. See How Does Cursor Pagination Work in NewsDataHub API for details.
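
The general shape of a cursor-pagination loop is sketched below. To keep it runnable without a key, the page fetcher is passed in as a function and a fake one is used here. The `next_cursor` field name and the idea of sending the cursor back as a request parameter are assumptions of this sketch; confirm the exact names in the pagination guide before relying on them.

```python
def fetch_all(fetch_page, max_pages=3):
    """Collect articles page by page until the cursor runs out
    or max_pages is reached (each page costs one API call)."""
    articles, cursor = [], None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        articles.extend(page.get("data", []))
        cursor = page.get("next_cursor")  # assumed field name
        if not cursor:
            break
    return articles


def fake_fetch_page(cursor):
    """Stand-in for a real API call, returning two canned pages."""
    pages = {
        None: {"data": [{"id": 1}, {"id": 2}], "next_cursor": "abc"},
        "abc": {"data": [{"id": 3}], "next_cursor": None},
    }
    return pages[cursor]


all_articles = fetch_all(fake_fetch_page)
print(f"Collected {len(all_articles)} articles")
```

For live data, `fetch_page` would wrap the `requests.get` call from the first step, adding the cursor to `params` when it is not `None`. The `max_pages` cap keeps a single run from using up the daily quota.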


Olga S.

Founder of NewsDataHub — Distributed Systems & Data Engineering
