
How to Create Bar Charts in Python Using Real News Data

Quick Answer: This tutorial teaches you how to create professional bar charts in Python using Matplotlib and real news data from the NewsDataHub API. You’ll learn to visualize topic distributions, language breakdowns, and top news sources with clean, readable charts.

Perfect for: Python developers, data analysts, students, and anyone building news analytics dashboards or data visualization projects.

Time to complete: 15-20 minutes

Difficulty: Beginner

Stack: Python, Matplotlib, NewsDataHub API


You’ll create three bar charts to analyze news data, plus an optional fourth:

  • Topic distribution chart — See which topics dominate your news dataset
  • Language distribution chart — Analyze article distribution across languages using horizontal bars
  • Top 10 sources chart — Identify the most active news publishers
  • Optional: Political leaning analysis — Understand bias distribution across coverage

All Tutorial Bar Charts

By the end, you’ll understand when to use bar charts vs. other chart types for categorical data.


  • Python 3.7+
  • pip package manager
Install the required dependencies:
pip install requests matplotlib

You don’t need an API key to complete this tutorial. The code automatically downloads sample data from GitHub if no key is provided, so you can follow along and build all the charts right away.

If you want to fetch live data instead, grab a free key at newsdatahub.com/login. Note that some fields used in this tutorial (topics, keywords, source metadata) require a paid plan - the sample data includes these fields so you can explore the full analysis regardless.

For current API quotas and rate limits, visit newsdatahub.com/plans.

This tutorial assumes:

  • Basic Python syntax
  • Familiarity with lists and dictionaries
  • Understanding of loops and functions

Step 1: Fetch News Data

We’ll retrieve news articles to analyze. You have two options:

With an API key: The script fetches live data from NewsDataHub, using cursor-based pagination to retrieve multiple pages (up to 200 articles).

Without an API key: The script downloads a sample dataset from GitHub, so you can follow along without signing up.

import requests
import matplotlib.pyplot as plt
from collections import Counter
import json
import os

# Set your API key here (or leave empty to use sample data)
API_KEY = ""  # Replace with your NewsDataHub API key, or leave empty

# Check if API key is provided
if API_KEY and API_KEY != "your_api_key_here":
    print("Using live API data...")
    url = "https://api.newsdatahub.com/v1/news"
    headers = {
        "x-api-key": API_KEY,
        "User-Agent": "bar-charts-in-python-using-real-news-data/1.0-py"
    }

    articles = []
    cursor = None

    # Fetch 2 pages (up to 200 articles)
    for _ in range(2):
        params = {
            "per_page": 100,
            "country": "US,FR,DE,ES,BR",
            "source_type": "mainstream_news,digital_native"
        }
        if cursor:
            params["cursor"] = cursor

        response = requests.get(url, headers=headers, params=params)
        response.raise_for_status()
        data = response.json()

        articles.extend(data.get("data", []))
        cursor = data.get("next_cursor")
        if not cursor:
            break

    print(f"Fetched {len(articles)} articles from API")
else:
    print("No API key provided. Loading sample data...")

    # Download sample data if not already present
    sample_file = "sample-news-data.json"
    if not os.path.exists(sample_file):
        print("Downloading sample data...")
        sample_url = "https://raw.githubusercontent.com/newsdatahub/newsdatahub-data-science-tutorials/main/tutorials/bar-charts-news-data/data/sample-news-data.json"
        response = requests.get(sample_url)
        with open(sample_file, "w") as f:
            json.dump(response.json(), f)
        print(f"Sample data saved to {sample_file}")

    # Load sample data
    with open(sample_file, "r") as f:
        data = json.load(f)

    # Handle both formats: raw array or API response with 'data' key
    if isinstance(data, dict) and "data" in data:
        articles = data["data"]
    elif isinstance(data, list):
        articles = data
    else:
        raise ValueError("Unexpected sample data format")

    print(f"Loaded {len(articles)} articles from sample data")

Expected output:

Using live API data...
Fetched 200 articles from API

or if running without the API key:

No API key provided. Loading sample data...
Downloading sample data...
Sample data saved to sample-news-data.json
Loaded 100 articles from sample data

API_KEY - Set to your NewsDataHub API key for live data, or leave empty to use sample data

When API_KEY is provided:

  • x-api-key header — Authenticates your request (replace with your actual key)
  • per_page parameter — Controls batch size (max 100 on free tier)
  • country parameter — Fetches from multiple countries (US, France, Germany, Spain, Brazil) to get diverse languages and topics
  • source_type parameter — Filters for mainstream and digital-native sources for quality content
  • cursor parameter — Marks your position in the result set for the next page
  • next_cursor — Returned in the response; use it for the next request
  • raise_for_status() — Throws an error for 4XX/5XX HTTP responses

When API_KEY is empty, the else block runs: it downloads sample data from GitHub (or loads it locally if already downloaded), giving you the same dataset structure without needing API access.
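If you need more than the two pages fetched above, the same cursor logic generalizes into a small reusable helper. The sketch below is illustrative rather than part of the NewsDataHub client (the name fetch_all_pages and the max_pages cap are our own), but it reuses the url, headers, and params structure from the code above:

import requests

def fetch_all_pages(url, headers, base_params, max_pages=5):
    """Follow next_cursor until it runs out or max_pages is reached."""
    articles = []
    cursor = None
    for _ in range(max_pages):
        params = dict(base_params)  # copy so we don't mutate the caller's dict
        if cursor:
            params["cursor"] = cursor
        response = requests.get(url, headers=headers, params=params)
        response.raise_for_status()
        data = response.json()
        articles.extend(data.get("data", []))
        cursor = data.get("next_cursor")
        if not cursor:  # no more pages
            break
    return articles

# Usage with the same url, headers, and params as above:
# articles = fetch_all_pages(url, headers, {"per_page": 100, "country": "US,FR,DE,ES,BR"})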

Why multi-country filtering?

  • Creates meaningful language distribution — You’ll see English, French, German, Spanish, and Portuguese
  • Demonstrates global news coverage — Shows NewsDataHub’s international reach
  • Produces diverse topic data — Different countries cover different stories

Step 2: Topic Distribution Bar Chart

Now you’ll aggregate topics and create a vertical bar chart showing the most popular topics.

# Extract topics from articles
topics = []
for article in articles:
    article_topics = article.get("topics", [])
    if article_topics:
        # Topics is a list - add all topics from this article
        if isinstance(article_topics, list):
            topics.extend(article_topics)
        else:
            topics.append(article_topics)

# Exclude 'general' topic (articles not yet categorized)
topics = [t for t in topics if t != 'general']

topic_counts = Counter(topics)
print(f"Found {len(topic_counts)} unique topics (excluding 'general')")

# Get top 15 topics to avoid chart clutter
top_topics = dict(topic_counts.most_common(15))
print(f"Displaying top 15 topics out of {len(topic_counts)} total")

What this does:

  • NewsDataHub returns topics as an array, not a single value
  • Each article can have multiple topics, so we use extend() to add them all
  • Filters out 'general' — a placeholder for uncategorized articles
  • Counter aggregates and counts topic occurrences
  • Limits to top 15 topics — Prevents clutter from rare topics with only 1-2 occurrences, making the chart readable
# Color palette for data visualization
vibrant_colors = [
    '#EF4444',  # Red
    '#3B82F6',  # Blue
    '#10B981',  # Green
    '#FBBF24',  # Yellow
    '#8B5CF6',  # Purple
    '#F59E0B',  # Orange
    '#EC4899',  # Pink
    '#14B8A6',  # Teal
    '#6366F1',  # Indigo
    '#F97316'   # Orange-red
]

plt.figure(figsize=(12, 6))

categories = list(top_topics.keys())
values = list(top_topics.values())
colors = [vibrant_colors[i % len(vibrant_colors)] for i in range(len(categories))]

bars = plt.bar(categories, values, color=colors, edgecolor='white', linewidth=2)

plt.title("Top 15 Topics in News Coverage", fontsize=16, fontweight="bold", pad=20)
plt.xlabel("Topic", fontsize=12, fontweight="bold")
plt.ylabel("Article Count", fontsize=12, fontweight="bold")
plt.xticks(rotation=45, ha="right", fontsize=11)
plt.yticks(fontsize=11)
plt.grid(axis="y", linestyle="--", alpha=0.3)

# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{int(height)}', ha='center', va='bottom', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.savefig("topic-distribution-chart.png", dpi=300, bbox_inches="tight")
plt.show()

Styling breakdown:

  • figsize=(12, 6) — Creates a wide chart to accommodate 15 categories
  • vibrant_colors — Professional color palette optimized for data visualization
  • edgecolor='white', linewidth=2 — White borders make bars stand out
  • rotation=45, ha="right" — Rotates x-labels to prevent overlap
  • grid(axis="y") — Adds horizontal gridlines for easier value comparison
  • tight_layout() — Prevents label cutoff
  • Value labels — Shows exact counts on top of each bar
  • savefig() — Exports high-resolution PNG for reports

Bar charts are ideal for categorical data like topics because they:

  • Show discrete categories clearly — Each bar represents a distinct topic
  • Make comparisons easy — Heights directly correspond to frequency
  • Reveal distribution patterns — Spot dominant vs. niche topics instantly

Step 3: Language Distribution with Horizontal Bars


Language Distribution Horizontal Bar Chart

Horizontal bar charts work better when you have many categories or want to display labels without rotation.

Why this chart shows multiple languages: Because we filtered for multiple countries (US, France, Germany, Spain, Brazil) in Step 1, our dataset includes articles in English, French, German, Spanish, and Portuguese. If you filtered for a single country or language, this chart would only show one bar.

languages = [
    article.get("language")
    for article in articles
    if article.get("language")
]
lang_counts = Counter(languages)

plt.figure(figsize=(10, 6))

categories = list(lang_counts.keys())
values = list(lang_counts.values())
colors = [vibrant_colors[i % len(vibrant_colors)] for i in range(len(categories))]

bars = plt.barh(categories, values, color=colors, edgecolor='white', linewidth=2)

plt.title("Language Distribution in News Coverage", fontsize=16, fontweight="bold", pad=20)
plt.xlabel("Article Count", fontsize=12, fontweight="bold")
plt.ylabel("Language", fontsize=12, fontweight="bold")
plt.xticks(fontsize=11)
plt.yticks(fontsize=11)
plt.grid(axis="x", alpha=0.3, linestyle="--")

# Add value labels
for bar in bars:
    width = bar.get_width()
    plt.text(width, bar.get_y() + bar.get_height()/2.,
             f'{int(width)}', ha='left', va='center', fontsize=11, fontweight='bold',
             bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.7, edgecolor='none'))

plt.tight_layout()
plt.savefig("language-distribution-chart.png", dpi=300, bbox_inches="tight")
plt.show()

Prefer horizontal orientation when:

  • You have many categories (10+)
  • Labels are long (horizontal text is easier to read than rotated text)
  • Comparing similar values (horizontal layout makes small differences clearer)
  • Sorting alphabetically (creates a natural top-to-bottom reading flow; see the sketch below)
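One detail on alphabetical sorting: Matplotlib's barh() draws the first category at the bottom of the axis, so for an A-to-Z reading order from the top you sort in reverse before plotting. A minimal sketch reusing lang_counts from the chart above (the single fill color is just for brevity):

# Sort language codes Z-to-A, then let barh (which plots bottom-up)
# display them A-to-Z from the top of the chart
sorted_langs = sorted(lang_counts.items(), key=lambda item: item[0], reverse=True)

categories = [lang for lang, _ in sorted_langs]
values = [count for _, count in sorted_langs]

plt.figure(figsize=(10, 6))
plt.barh(categories, values, color='#3B82F6', edgecolor='white', linewidth=2)
plt.tight_layout()
plt.show()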

Step 4: Top 10 News Sources Chart

Analyzing source distribution helps identify the most active publishers and potential dataset biases.

Next, we extract the source name from each article and count how often each appears. In the NewsDataHub response, the source name is stored in the top-level source_title field. We use Python’s Counter to tally occurrences and grab the top 10 most frequent sources.

Note: The NewsDataHub API free tier does not include source metadata, topics, or keywords. But the sample data includes these fields, so you can follow along with the full analysis.

sources = [article.get("source_title") for article in articles if article.get("source_title")]
source_counts = Counter(sources)
top10 = source_counts.most_common(10)

print("Top 10 most active sources:")
for rank, (source, count) in enumerate(top10, 1):
    print(f"{rank}. {source}: {count} articles")

plt.figure(figsize=(12, 6))

categories = [x[0] for x in top10]
values = [x[1] for x in top10]
colors = [vibrant_colors[i % len(vibrant_colors)] for i in range(len(categories))]

bars = plt.bar(categories, values, color=colors, edgecolor='white', linewidth=2)

plt.title("Top 10 Most Active News Sources", fontsize=16, fontweight="bold", pad=20)
plt.xlabel("News Source", fontsize=12, fontweight="bold")
plt.ylabel("Article Count", fontsize=12, fontweight="bold")
plt.xticks(rotation=45, ha="right", fontsize=11)
plt.yticks(fontsize=11)
plt.grid(axis="y", alpha=0.3, linestyle="--")

# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{int(height)}', ha='center', va='bottom', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.savefig("top-sources-chart.png", dpi=300, bbox_inches="tight")
plt.show()

Understanding source distribution helps you:

  • Identify dominant publishers — See which outlets produce the most content
  • Detect dataset bias — Over-representation of certain sources may skew analysis (see the sketch below)
  • Assess coverage diversity — Balanced distribution suggests varied perspectives
  • Plan data collection — Adjust filters if you need more source diversity
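To put a rough number on the bias point above, check how much of your dataset comes from the few most active publishers. A minimal sketch reusing sources and source_counts from the code above; the cutoff of three sources is an arbitrary choice:

# Share of all articles contributed by the 3 most active sources
top3_total = sum(count for _, count in source_counts.most_common(3))
share = top3_total / len(sources) if sources else 0

print(f"Top 3 sources account for {share:.0%} of the dataset")
# A very high share suggests widening the country or source_type
# filters to diversify the sample.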

Bonus: Sort by Value for Better Readability

# Sort top 10 by count (descending)
# Note: most_common(10) already returns results sorted by count, so this
# matters most when your counts come from a plain dict instead of a Counter
top10_sorted = sorted(top10, key=lambda x: x[1], reverse=True)

categories = [x[0] for x in top10_sorted]
values = [x[1] for x in top10_sorted]
colors = [vibrant_colors[i % len(vibrant_colors)] for i in range(len(categories))]

plt.bar(categories, values, color=colors, edgecolor='white', linewidth=2)
# ... rest of chart code

Political Leaning Distribution

NewsDataHub includes political leaning metadata for sources, ranging from far-left to far-right. This enables bias analysis across your dataset.

Note: Political leaning data requires a paid NewsDataHub plan. Check newsdatahub.com/plans for feature availability.

leanings = [
    article.get("source", {}).get("political_leaning")
    for article in articles
    if article.get("source", {}).get("political_leaning")
]
leaning_counts = Counter(leanings)
print(f"Political leaning: {len(leanings)} out of {len(articles)} articles have leaning data")

# Define order: left to right political spectrum + nonpartisan
# 'nonpartisan' represents wire services (AP, Reuters, AFP) and fact-based outlets
order = ['far_left', 'left', 'center_left', 'center', 'center_right', 'right', 'far_right', 'nonpartisan']
categories = [cat for cat in order if cat in leaning_counts]
values = [leaning_counts[cat] for cat in categories]

plt.figure(figsize=(12, 6))
colors = [vibrant_colors[i % len(vibrant_colors)] for i in range(len(categories))]
bars = plt.bar(categories, values, color=colors, edgecolor='white', linewidth=2)

plt.title('Political Leaning Distribution of News Sources', fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Political Leaning', fontsize=12, fontweight='bold')
plt.ylabel('Article Count', fontsize=12, fontweight='bold')
plt.xticks(rotation=45, ha='right', fontsize=11)
plt.yticks(fontsize=11)
plt.grid(axis='y', alpha=0.3, linestyle='--')

# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{int(height)}', ha='center', va='bottom', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.savefig('political-leaning-chart.png', dpi=300, bbox_inches='tight')
plt.show()

Understanding the categories:

  • Political spectrum (far_left → far_right) — Sources with identifiable political bias
  • Nonpartisan — Wire services (AP, Reuters, AFP) and fact-based outlets that maintain editorial neutrality

Note: Nonpartisan sources often make up a significant portion of the dataset, as they include major wire services that distribute content globally.

Use cases for political leaning analysis:

  • Media bias research — Understand ideological balance in coverage
  • Comparative topic analysis — See how different sides cover the same story (sketched after this list)
  • Source diversity metrics — Ensure balanced representation in news aggregators
  • Trend analysis — Track how political coverage shifts over time
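As a starting point for the comparative analysis above, you can tally topics separately for each leaning and compare the top few. A minimal sketch built on the articles list from the fetch step; the defaultdict structure and the choice of three topics per leaning are our own, not anything the API prescribes:

from collections import Counter, defaultdict

# Count topic occurrences separately for each political leaning
topics_by_leaning = defaultdict(Counter)
for article in articles:
    leaning = article.get("source", {}).get("political_leaning")
    if not leaning:
        continue
    for topic in article.get("topics", []) or []:
        if topic != "general":
            topics_by_leaning[leaning][topic] += 1

# Print the three most covered topics for each leaning
for leaning, counts in topics_by_leaning.items():
    top = ", ".join(f"{topic} ({n})" for topic, n in counts.most_common(3))
    print(f"{leaning}: {top}")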

For more on filtering by political leaning, see Filter News by Political Leaning.


Best Practices for Professional Bar Charts


A consistent color palette makes your charts look polished and helps readers focus on the data rather than decoding random colors. Pick a palette and define it once at the top of your script:

# Define your color palette once at the top of your script
vibrant_colors = [
    '#EF4444',  # Red
    '#3B82F6',  # Blue
    '#10B981',  # Green
    '#FBBF24',  # Yellow
    '#8B5CF6',  # Purple
    '#F59E0B',  # Orange
    '#EC4899',  # Pink
    '#14B8A6',  # Teal
    '#6366F1',  # Indigo
    '#F97316'   # Orange-red
]

Then reference it in your charts - this keeps your visuals cohesive and makes future updates easier (change colors in one place, not ten).

Choosing a palette:

  • For general use: The vibrant palette above works well for most bar charts with up to 10 categories.
  • For accessibility: colorbrewer2.org offers palettes designed for colorblind-friendly visualization and print.
  • For branded reports: Use your organization’s brand colors to keep charts consistent with other materials.

The order of bars affects how easily readers interpret your chart. A random arrangement forces viewers to scan back and forth; a logical order tells a story at a glance.

# Sort by frequency (most common first) - best for rankings
sorted_items = topic_counts.most_common()
# Sort alphabetically - useful when readers need to find specific categories
sorted_items = sorted(topic_counts.items(), key=lambda x: x[0])
# Sort by custom order - use when categories have inherent sequence
custom_order = ["far-left", "left", "center", "right", "far-right"]
sorted_items = [(k, topic_counts[k]) for k in custom_order if k in topic_counts]

Which sorting to use:

  • Frequency: Default choice for most bar charts. Puts the most important data first and creates a natural visual hierarchy.
  • Alphabetical: Works well for reference charts where readers look up specific items (e.g., countries, source names).
  • Custom order: Essential when categories have a natural sequence - time periods, rating scales, or spectrums like political leaning.

Long or numerous category labels often collide, making your chart unreadable. Matplotlib won’t fix this automatically, so plan for it.

# Rotate x-axis labels
plt.xticks(rotation=45, ha="right")
# OR use horizontal bars for many categories
plt.barh(...)
# OR limit to top N categories
top_n = topic_counts.most_common(10)

When to use each approach:

  • Rotation (45°): Works for 5–10 categories with medium-length labels. Use ha="right" to align rotated text cleanly.
  • Horizontal bars: The best option when you have many categories (10+) or labels longer than a few words. Readers can scan top-to-bottom without tilting their heads.
  • Limit categories: If you have dozens of categories, showing all of them creates noise. Display the top 10 (or top 5) and consider grouping the rest as “Other” if the total matters.

Default exports often look blurry in presentations or print. A few extra parameters ensure your charts stay crisp.

# PNG at 300 DPI - standard for reports and slides
plt.savefig("chart.png", dpi=300, bbox_inches="tight")
# SVG - scalable format for web, stays sharp at any size
plt.savefig("chart.svg", bbox_inches="tight")

Choosing a format:

  • PNG (300 DPI): Best for documents, slide decks, and email. The dpi=300 setting produces print-quality resolution.
  • SVG: Ideal for websites or interactive dashboards. Vector format means no pixelation regardless of zoom level.
  • bbox_inches="tight": Prevents Matplotlib from cropping labels or adding excessive whitespace around your chart.

Tip: Save both formats if you’re unsure where the chart will end up - it takes one extra line and covers all use cases.


NewsDataHub free tier offers 100 API calls per day. Here’s how to maximize your usage:

import json

# Save fetched data to disk
with open("cached_news.json", "w") as f:
    json.dump(articles, f, indent=2)

# Load from cache instead of making API calls
with open("cached_news.json", "r") as f:
    articles = json.load(f)

Benefits:

  • Iterate faster — No waiting for API responses during chart tweaking
  • Preserve quota — Save API calls for fresh data collection
  • Reproducibility — Analyze the same dataset across sessions
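A convenient way to wire caching into your workflow is a load-or-fetch helper: read from the cache when the file exists, otherwise make a single request and save the result. This is a minimal sketch; the filename and helper name are our own choices, and it fetches just one page for simplicity:

import json
import os
import requests

CACHE_FILE = "cached_news.json"

def load_or_fetch(url, headers, params):
    """Return cached articles if available; otherwise fetch one page and cache it."""
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, "r") as f:
            return json.load(f)

    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()
    articles = response.json().get("data", [])

    with open(CACHE_FILE, "w") as f:
        json.dump(articles, f, indent=2)
    return articles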

Use the per_page query parameter to fetch up to 100 articles per call (available on all tiers, including free). Two well-structured requests can give you 200 articles for analysis.

import datetime

# Log each API call
def fetch_with_logging(url, headers, params):
    response = requests.get(url, headers=headers, params=params)
    print(f"[{datetime.datetime.now()}] API call made. Status: {response.status_code}")
    return response

# Count calls per session
api_calls = 0
for _ in range(2):
    response = fetch_with_logging(url, headers, params)
    api_calls += 1
print(f"Total API calls this session: {api_calls}")
  • Daily analysis — Fetch 100 articles/day for time-series tracking
  • Weekly deep dives — Accumulate 700 articles over a week
  • Upgrade when needed — Visit newsdatahub.com/plans for higher limits

  • What types of data work best with bar charts?

Bar charts excel with categorical data: topics, languages, sources, countries, sentiment categories, or any discrete grouping. For continuous numerical data (like stock prices over time), use line charts instead.

  • When should I use horizontal vs. vertical bars?

Use vertical bars for 3-10 categories with short labels. Switch to horizontal bars when you have 10+ categories, long category names, or want alphabetical sorting (easier to read top-to-bottom than rotated text).

  • How do I handle too many categories?

Limit to top N (e.g., most_common(10)), group rare categories into “Other”, or use a different chart type (treemap, word cloud).
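For the "Other" approach, here is a minimal sketch using topic_counts from the topic chart earlier; the cutoff of 10 is arbitrary:

# Keep the 10 most common topics and fold everything else into "Other"
top_n = topic_counts.most_common(10)
other_total = sum(topic_counts.values()) - sum(count for _, count in top_n)

categories = [topic for topic, _ in top_n] + ["Other"]
values = [count for _, count in top_n] + [other_total]
# Plot categories/values with plt.bar() as in the earlier charts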

  • Why use Counter instead of manual counting?

Counter from Python’s collections module is optimized for frequency counting, provides useful methods like most_common(), and handles missing keys gracefully.
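For comparison, here is the same tally done both ways; the manual version is what Counter saves you from writing:

from collections import Counter

topics = ["politics", "tech", "politics", "sports", "tech", "politics"]

# Manual counting with a plain dict
manual_counts = {}
for topic in topics:
    manual_counts[topic] = manual_counts.get(topic, 0) + 1

# The same result with Counter, plus most_common() for free
counts = Counter(topics)
print(counts.most_common(2))  # [('politics', 3), ('tech', 2)]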

  • How can I make charts interactive?

Switch from Matplotlib to Plotly for interactive charts:

import plotly.express as px
fig = px.bar(x=list(topic_counts.keys()), y=list(topic_counts.values()))
fig.show()
  • Can I combine multiple datasets in one chart?

Yes, use grouped or stacked bars:

# Grouped bars (side-by-side comparison)
import numpy as np

x = np.arange(len(categories))  # shared category positions for both datasets
width = 0.35

plt.bar(x - width/2, dataset1, width, label='Dataset 1')
plt.bar(x + width/2, dataset2, width, label='Dataset 2')
plt.xticks(x, categories)  # label each group with its category name
plt.legend()
  • What if my API key doesn’t work?

Verify the following (a quick diagnostic snippet follows the list):

  1. Key is correct (check your dashboard)
  2. Header name is x-api-key (lowercase, with hyphens)
  3. You haven’t exceeded rate limits
  4. Network/firewall isn’t blocking API requests
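A quick way to see what's going wrong is to print the status code and response body for a single request; a 401 usually means an invalid key, while a 429 means you've hit a rate limit. A minimal sketch, assuming the url and headers defined in the fetch code at the top of the tutorial:

import requests

response = requests.get(url, headers=headers, params={"per_page": 10})

print("Status:", response.status_code)  # 401 = invalid key, 429 = rate limited
if not response.ok:
    print(response.text)  # the API's error message, if any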
  • How do I filter for specific topics or countries?

Add parameters to your API request:

params = {
    "per_page": 100,
    "topic": "technology",
    "country": "US"
}

See NewsDataHub Search & Filtering Guide for all available filters.

  • How fresh is the data?

NewsDataHub updates continuously. Data freshness depends on your plan tier. Visit newsdatahub.com/plans for details.


Olga S.

Founder of NewsDataHub — Distributed Systems & Data Engineering

Connect on LinkedIn