
How to Create Bar Charts in Python Using Real News Data

Quick Answer: This tutorial teaches you how to create professional bar charts in Python using Matplotlib and real news data from the NewsDataHub API. You’ll learn to visualize topic distributions, language breakdowns, and top news sources with clean, readable charts.

Perfect for: Python developers, data analysts, students, and anyone building news analytics dashboards or data visualization projects.

Time to complete: 15-20 minutes

Difficulty: Beginner

Stack: Python, Matplotlib, NewsDataHub API


You’ll create three bar charts to analyze news data, plus an optional fourth:

  • Topic distribution chart — See which topics dominate your news dataset
  • Language distribution chart — Analyze article distribution across languages using horizontal bars
  • Top 10 sources chart — Identify the most active news publishers
  • Optional: Political leaning analysis — Understand bias distribution across coverage

All Tutorial Bar Charts

By the end, you’ll understand when to use bar charts vs. other chart types for categorical data.


  • Python 3.7+
  • pip package manager
Install the required dependencies:
pip install requests matplotlib

You don’t need an API key to complete this tutorial. The code automatically downloads sample data from GitHub if no key is provided, so you can follow along and build all the charts right away.

If you want to fetch live data instead, grab a free key at newsdatahub.com/login. Note that some fields used in this tutorial (topics, keywords, source metadata) require a paid plan - the sample data includes these fields so you can explore the full analysis regardless.

For current API quotas and rate limits, visit newsdatahub.com/plans.

This tutorial assumes:

  • Basic Python syntax
  • Familiarity with lists and dictionaries
  • Understanding of loops and functions

Step 1: Fetch News Data

We’ll retrieve news articles to analyze. You have two options:

With an API key: The script fetches live data from NewsDataHub, using cursor-based pagination to retrieve multiple pages (up to 200 articles).

Without an API key: The script downloads a sample dataset from GitHub, so you can follow along without signing up.

import requests
import matplotlib.pyplot as plt
from collections import Counter
import json
import os

# Set your API key here (or leave empty to use sample data)
API_KEY = ""  # Replace with your NewsDataHub API key, or leave empty

# Check if API key is provided
if API_KEY and API_KEY != "your_api_key_here":
    print("Using live API data...")
    url = "https://api.newsdatahub.com/v1/news"
    headers = {
        "x-api-key": API_KEY,
        "User-Agent": "bar-charts-in-python-using-real-news-data/1.0-py"
    }

    articles = []
    cursor = None

    # Fetch 2 pages (up to 200 articles)
    for _ in range(2):
        params = {
            "per_page": 100,
            "country": "US,FR,DE,ES,BR",
            "source_type": "mainstream_news,digital_native"
        }
        if cursor:
            params["cursor"] = cursor

        response = requests.get(url, headers=headers, params=params)
        response.raise_for_status()
        data = response.json()

        articles.extend(data.get("data", []))
        cursor = data.get("next_cursor")
        if not cursor:
            break

    print(f"Fetched {len(articles)} articles from API")
else:
    print("No API key provided. Loading sample data...")

    # Download sample data if not already present
    sample_file = "sample-news-data.json"
    if not os.path.exists(sample_file):
        print("Downloading sample data...")
        sample_url = "https://raw.githubusercontent.com/newsdatahub/newsdatahub-data-science-tutorials/main/tutorials/bar-charts-news-data/data/sample-news-data.json"
        response = requests.get(sample_url)
        with open(sample_file, "w") as f:
            json.dump(response.json(), f)
        print(f"Sample data saved to {sample_file}")

    # Load sample data
    with open(sample_file, "r") as f:
        data = json.load(f)

    # Handle both formats: raw array or API response with 'data' key
    if isinstance(data, dict) and "data" in data:
        articles = data["data"]
    elif isinstance(data, list):
        articles = data
    else:
        raise ValueError("Unexpected sample data format")

    print(f"Loaded {len(articles)} articles from sample data")

Expected output:

Using live API data...
Fetched 200 articles from API

or if running without the API key:

No API key provided. Loading sample data...
Downloading sample data...
Sample data saved to sample-news-data.json
Loaded 100 articles from sample data

API_KEY - Set to your NewsDataHub API key for live data, or leave empty to use sample data

When API_KEY is provided:

  • x-api-key header — Authenticates your request (replace with your actual key)
  • per_page parameter — Controls batch size (max 100 on free tier)
  • country parameter — Fetches from multiple countries (US, France, Germany, Spain, Brazil) to get diverse languages and topics
  • source_type parameter — Filters for mainstream and digital-native sources for quality content
  • cursor parameter — Marks your position in the result set for the next page
  • next_cursor — Returned in the response; use it for the next request
  • raise_for_status() — Throws an error for 4XX/5XX HTTP responses

When API_KEY is empty, the else block runs: it downloads sample data from GitHub (or loads it locally if already downloaded), giving you the same dataset structure without needing API access.
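If you need more than the two pages fetched above, the same cursor logic generalizes into a small reusable helper. The sketch below is illustrative rather than part of the NewsDataHub client (the name fetch_all_pages and the max_pages cap are our own), but it reuses the url, headers, and params structure from the code above:

import requests

def fetch_all_pages(url, headers, base_params, max_pages=5):
    """Follow next_cursor until it runs out or max_pages is reached."""
    articles = []
    cursor = None
    for _ in range(max_pages):
        params = dict(base_params)  # copy so we don't mutate the caller's dict
        if cursor:
            params["cursor"] = cursor
        response = requests.get(url, headers=headers, params=params)
        response.raise_for_status()
        data = response.json()
        articles.extend(data.get("data", []))
        cursor = data.get("next_cursor")
        if not cursor:  # no more pages
            break
    return articles

# Usage with the same url, headers, and params as above:
# articles = fetch_all_pages(url, headers, {"per_page": 100, "country": "US,FR,DE,ES,BR"})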

Why multi-country filtering?

  • Creates meaningful language distribution — You’ll see English, French, German, Spanish, and Portuguese
  • Demonstrates global news coverage — Shows NewsDataHub’s international reach
  • Produces diverse topic data — Different countries cover different stories

Step 2: Topic Distribution Bar Chart

Now you’ll aggregate topics and create a vertical bar chart showing the most popular topics.

# Extract topics from articles
topics = []
for article in articles:
    article_topics = article.get("topics", [])
    if article_topics:
        # Topics is a list - add all topics from this article
        if isinstance(article_topics, list):
            topics.extend(article_topics)
        else:
            topics.append(article_topics)

# Exclude 'general' topic (articles not yet categorized)
topics = [t for t in topics if t != 'general']

topic_counts = Counter(topics)
print(f"Found {len(topic_counts)} unique topics (excluding 'general')")

# Get top 15 topics to avoid chart clutter
top_topics = dict(topic_counts.most_common(15))
print(f"Displaying top 15 topics out of {len(topic_counts)} total")

What this does:

  • NewsDataHub returns topics as an array, not a single value
  • Each article can have multiple topics, so we use extend() to add them all
  • Filters out 'general' — a placeholder for uncategorized articles
  • Counter aggregates and counts topic occurrences
  • Limits to top 15 topics — Prevents clutter from rare topics with only 1-2 occurrences, making the chart readable
# Color palette for data visualization
vibrant_colors = [
    '#EF4444',  # Red
    '#3B82F6',  # Blue
    '#10B981',  # Green
    '#FBBF24',  # Yellow
    '#8B5CF6',  # Purple
    '#F59E0B',  # Orange
    '#EC4899',  # Pink
    '#14B8A6',  # Teal
    '#6366F1',  # Indigo
    '#F97316'   # Orange-red
]

plt.figure(figsize=(12, 6))

categories = list(top_topics.keys())
values = list(top_topics.values())
colors = [vibrant_colors[i % len(vibrant_colors)] for i in range(len(categories))]

bars = plt.bar(categories, values, color=colors, edgecolor='white', linewidth=2)

plt.title("Top 15 Topics in News Coverage", fontsize=16, fontweight="bold", pad=20)
plt.xlabel("Topic", fontsize=12, fontweight="bold")
plt.ylabel("Article Count", fontsize=12, fontweight="bold")
plt.xticks(rotation=45, ha="right", fontsize=11)
plt.yticks(fontsize=11)
plt.grid(axis="y", linestyle="--", alpha=0.3)

# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{int(height)}', ha='center', va='bottom', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.savefig("topic-distribution-chart.png", dpi=300, bbox_inches="tight")
plt.show()

Styling breakdown:

  • figsize=(12, 6) — Creates a wide chart to accommodate 15 categories
  • vibrant_colors — Professional color palette optimized for data visualization
  • edgecolor='white', linewidth=2 — White borders make bars stand out
  • rotation=45, ha="right" — Rotates x-labels to prevent overlap
  • grid(axis="y") — Adds horizontal gridlines for easier value comparison
  • tight_layout() — Prevents label cutoff
  • Value labels — Shows exact counts on top of each bar
  • savefig() — Exports high-resolution PNG for reports

Bar charts are ideal for categorical data like topics because they:

  • Show discrete categories clearly — Each bar represents a distinct topic
  • Make comparisons easy — Heights directly correspond to frequency
  • Reveal distribution patterns — Spot dominant vs. niche topics instantly

Step 3: Language Distribution with Horizontal Bars


Language Distribution Horizontal Bar Chart

Horizontal bar charts work better when you have many categories or want to display labels without rotation.

Why this chart shows multiple languages: Because we filtered for multiple countries (US, France, Germany, Spain, Brazil) in Step 1, our dataset includes articles in English, French, German, Spanish, and Portuguese. If you filtered for a single country or language, this chart would only show one bar.

languages = [
    article.get("language")
    for article in articles
    if article.get("language")
]
lang_counts = Counter(languages)

plt.figure(figsize=(10, 6))

categories = list(lang_counts.keys())
values = list(lang_counts.values())
colors = [vibrant_colors[i % len(vibrant_colors)] for i in range(len(categories))]

bars = plt.barh(categories, values, color=colors, edgecolor='white', linewidth=2)

plt.title("Language Distribution in News Coverage", fontsize=16, fontweight="bold", pad=20)
plt.xlabel("Article Count", fontsize=12, fontweight="bold")
plt.ylabel("Language", fontsize=12, fontweight="bold")
plt.xticks(fontsize=11)
plt.yticks(fontsize=11)
plt.grid(axis="x", alpha=0.3, linestyle="--")

# Add value labels
for bar in bars:
    width = bar.get_width()
    plt.text(width, bar.get_y() + bar.get_height()/2.,
             f'{int(width)}', ha='left', va='center', fontsize=11, fontweight='bold',
             bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.7, edgecolor='none'))

plt.tight_layout()
plt.savefig("language-distribution-chart.png", dpi=300, bbox_inches="tight")
plt.show()

Prefer horizontal orientation when:

  • You have many categories (10+)
  • Labels are long (horizontal text is easier to read than rotated text)
  • Comparing similar values (horizontal layout makes small differences clearer)
  • Sorting alphabetically (creates a natural top-to-bottom reading flow; see the sketch below)
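One detail on alphabetical sorting: Matplotlib's barh() draws the first category at the bottom of the axis, so for an A-to-Z reading order from the top you sort in reverse before plotting. A minimal sketch reusing lang_counts from the chart above (the single fill color is just for brevity):

# Sort language codes Z-to-A, then let barh (which plots bottom-up)
# display them A-to-Z from the top of the chart
sorted_langs = sorted(lang_counts.items(), key=lambda item: item[0], reverse=True)

categories = [lang for lang, _ in sorted_langs]
values = [count for _, count in sorted_langs]

plt.figure(figsize=(10, 6))
plt.barh(categories, values, color='#3B82F6', edgecolor='white', linewidth=2)
plt.tight_layout()
plt.show()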

Step 4: Top 10 News Sources Chart

Analyzing source distribution helps identify the most active publishers and potential dataset biases.

Next, we extract the source name from each article and count how often each appears. In the NewsDataHub response, the source name is stored in the top-level source_title field. We use Python’s Counter to tally occurrences and grab the top 10 most frequent sources.

Note: The NewsDataHub API free tier does not include source metadata, topics, or keywords. But the sample data includes these fields, so you can follow along with the full analysis.

sources = [article.get("source_title") for article in articles if article.get("source_title")]
source_counts = Counter(sources)
top10 = source_counts.most_common(10)

print("Top 10 most active sources:")
for rank, (source, count) in enumerate(top10, 1):
    print(f"{rank}. {source}: {count} articles")

plt.figure(figsize=(12, 6))

categories = [x[0] for x in top10]
values = [x[1] for x in top10]
colors = [vibrant_colors[i % len(vibrant_colors)] for i in range(len(categories))]

bars = plt.bar(categories, values, color=colors, edgecolor='white', linewidth=2)

plt.title("Top 10 Most Active News Sources", fontsize=16, fontweight="bold", pad=20)
plt.xlabel("News Source", fontsize=12, fontweight="bold")
plt.ylabel("Article Count", fontsize=12, fontweight="bold")
plt.xticks(rotation=45, ha="right", fontsize=11)
plt.yticks(fontsize=11)
plt.grid(axis="y", alpha=0.3, linestyle="--")

# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{int(height)}', ha='center', va='bottom', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.savefig("top-sources-chart.png", dpi=300, bbox_inches="tight")
plt.show()

Understanding source distribution helps you:

  • Identify dominant publishers — See which outlets produce the most content
  • Detect dataset bias — Over-representation of certain sources may skew analysis (see the sketch below)
  • Assess coverage diversity — Balanced distribution suggests varied perspectives
  • Plan data collection — Adjust filters if you need more source diversity
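To put a rough number on the bias point above, check how much of your dataset comes from the few most active publishers. A minimal sketch reusing sources and source_counts from the code above; the cutoff of three sources is an arbitrary choice:

# Share of all articles contributed by the 3 most active sources
top3_total = sum(count for _, count in source_counts.most_common(3))
share = top3_total / len(sources) if sources else 0

print(f"Top 3 sources account for {share:.0%} of the dataset")
# A very high share suggests widening the country or source_type
# filters to diversify the sample.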

Bonus: Sort by Value for Better Readability

# Sort top 10 by count (descending)
# Note: most_common(10) already returns results sorted by count, so this
# matters most when your counts come from a plain dict instead of a Counter
top10_sorted = sorted(top10, key=lambda x: x[1], reverse=True)

categories = [x[0] for x in top10_sorted]
values = [x[1] for x in top10_sorted]
colors = [vibrant_colors[i % len(vibrant_colors)] for i in range(len(categories))]

plt.bar(categories, values, color=colors, edgecolor='white', linewidth=2)
# ... rest of chart code

Political Leaning Distribution

NewsDataHub includes political leaning metadata for sources, ranging from far-left to far-right. This enables bias analysis across your dataset.

Note: Political leaning data requires a paid NewsDataHub plan. Check newsdatahub.com/plans for feature availability.

leanings = [
    article.get("source", {}).get("political_leaning")
    for article in articles
    if article.get("source", {}).get("political_leaning")
]
leaning_counts = Counter(leanings)
print(f"Political leaning: {len(leanings)} out of {len(articles)} articles have leaning data")

# Define order: left to right political spectrum + nonpartisan
# 'nonpartisan' represents wire services (AP, Reuters, AFP) and fact-based outlets
order = ['far_left', 'left', 'center_left', 'center', 'center_right', 'right', 'far_right', 'nonpartisan']
categories = [cat for cat in order if cat in leaning_counts]
values = [leaning_counts[cat] for cat in categories]

plt.figure(figsize=(12, 6))
colors = [vibrant_colors[i % len(vibrant_colors)] for i in range(len(categories))]
bars = plt.bar(categories, values, color=colors, edgecolor='white', linewidth=2)

plt.title('Political Leaning Distribution of News Sources', fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Political Leaning', fontsize=12, fontweight='bold')
plt.ylabel('Article Count', fontsize=12, fontweight='bold')
plt.xticks(rotation=45, ha='right', fontsize=11)
plt.yticks(fontsize=11)
plt.grid(axis='y', alpha=0.3, linestyle='--')

# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{int(height)}', ha='center', va='bottom', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.savefig('political-leaning-chart.png', dpi=300, bbox_inches='tight')
plt.show()

Understanding the categories:

  • Political spectrum (far_left → far_right) — Sources with identifiable political bias
  • Nonpartisan — Wire services (AP, Reuters, AFP) and fact-based outlets that maintain editorial neutrality

Note: Nonpartisan sources often make up a significant portion of the dataset, as they include major wire services that distribute content globally.

Use cases for political leaning analysis:

  • Media bias research — Understand ideological balance in coverage
  • Comparative topic analysis — See how different sides cover the same story (sketched after this list)
  • Source diversity metrics — Ensure balanced representation in news aggregators
  • Trend analysis — Track how political coverage shifts over time
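As a starting point for the comparative analysis above, you can tally topics separately for each leaning and compare the top few. A minimal sketch built on the articles list from the fetch step; the defaultdict structure and the choice of three topics per leaning are our own, not anything the API prescribes:

from collections import Counter, defaultdict

# Count topic occurrences separately for each political leaning
topics_by_leaning = defaultdict(Counter)
for article in articles:
    leaning = article.get("source", {}).get("political_leaning")
    if not leaning:
        continue
    for topic in article.get("topics", []) or []:
        if topic != "general":
            topics_by_leaning[leaning][topic] += 1

# Print the three most covered topics for each leaning
for leaning, counts in topics_by_leaning.items():
    top = ", ".join(f"{topic} ({n})" for topic, n in counts.most_common(3))
    print(f"{leaning}: {top}")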

For more on filtering by political leaning, see Filter News by Political Leaning.


Best Practices for Professional Bar Charts


A consistent color palette makes your charts look polished and helps readers focus on the data rather than decoding random colors. Pick a palette and define it once at the top of your script:

# Define your color palette once at the top of your script
vibrant_colors = [
    '#EF4444',  # Red
    '#3B82F6',  # Blue
    '#10B981',  # Green
    '#FBBF24',  # Yellow
    '#8B5CF6',  # Purple
    '#F59E0B',  # Orange
    '#EC4899',  # Pink
    '#14B8A6',  # Teal
    '#6366F1',  # Indigo
    '#F97316'   # Orange-red
]

Then reference it in your charts - this keeps your visuals cohesive and makes future updates easier (change colors in one place, not ten).

Choosing a palette:

  • For general use: The vibrant palette above works well for most bar charts with up to 10 categories.
  • For accessibility: colorbrewer2.org offers palettes designed for colorblind-friendly visualization and print.
  • For branded reports: Use your organization’s brand colors to keep charts consistent with other materials.

The order of bars affects how easily readers interpret your chart. A random arrangement forces viewers to scan back and forth; a logical order tells a story at a glance.

# Sort by frequency (most common first) - best for rankings
sorted_items = topic_counts.most_common()
# Sort alphabetically - useful when readers need to find specific categories
sorted_items = sorted(topic_counts.items(), key=lambda x: x[0])
# Sort by custom order - use when categories have inherent sequence
custom_order = ["far-left", "left", "center", "right", "far-right"]
sorted_items = [(k, topic_counts[k]) for k in custom_order if k in topic_counts]

Which sorting to use:

  • Frequency: Default choice for most bar charts. Puts the most important data first and creates a natural visual hierarchy.
  • Alphabetical: Works well for reference charts where readers look up specific items (e.g., countries, source names).
  • Custom order: Essential when categories have a natural sequence - time periods, rating scales, or spectrums like political leaning.

Long or numerous category labels often collide, making your chart unreadable. Matplotlib won’t fix this automatically, so plan for it.

# Rotate x-axis labels
plt.xticks(rotation=45, ha="right")
# OR use horizontal bars for many categories
plt.barh(...)
# OR limit to top N categories
top_n = topic_counts.most_common(10)

When to use each approach:

  • Rotation (45°): Works for 5–10 categories with medium-length labels. Use ha="right" to align rotated text cleanly.
  • Horizontal bars: The best option when you have many categories (10+) or labels longer than a few words. Readers can scan top-to-bottom without tilting their heads.
  • Limit categories: If you have dozens of categories, showing all of them creates noise. Display the top 10 (or top 5) and consider grouping the rest as “Other” if the total matters.

Default exports often look blurry in presentations or print. A few extra parameters ensure your charts stay crisp.

# PNG at 300 DPI - standard for reports and slides
plt.savefig("chart.png", dpi=300, bbox_inches="tight")
# SVG - scalable format for web, stays sharp at any size
plt.savefig("chart.svg", bbox_inches="tight")

Choosing a format:

  • PNG (300 DPI): Best for documents, slide decks, and email. The dpi=300 setting produces print-quality resolution.
  • SVG: Ideal for websites or interactive dashboards. Vector format means no pixelation regardless of zoom level.
  • bbox_inches="tight": Prevents Matplotlib from cropping labels or adding excessive whitespace around your chart.

Tip: Save both formats if you’re unsure where the chart will end up - it takes one extra line and covers all use cases.


NewsDataHub free tier offers 100 API calls per day. Here’s how to maximize your usage:

import json

# Save fetched data to disk
with open("cached_news.json", "w") as f:
    json.dump(articles, f, indent=2)

# Load from cache instead of making API calls
with open("cached_news.json", "r") as f:
    articles = json.load(f)

Benefits:

  • Iterate faster — No waiting for API responses during chart tweaking
  • Preserve quota — Save API calls for fresh data collection
  • Reproducibility — Analyze the same dataset across sessions
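A convenient way to wire caching into your workflow is a load-or-fetch helper: read from the cache when the file exists, otherwise make a single request and save the result. This is a minimal sketch; the filename and helper name are our own choices, and it fetches just one page for simplicity:

import json
import os
import requests

CACHE_FILE = "cached_news.json"

def load_or_fetch(url, headers, params):
    """Return cached articles if available; otherwise fetch one page and cache it."""
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, "r") as f:
            return json.load(f)

    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()
    articles = response.json().get("data", [])

    with open(CACHE_FILE, "w") as f:
        json.dump(articles, f, indent=2)
    return articles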

Use the per_page query parameter to fetch up to 100 articles per call (available on all tiers, including free). Two well-structured requests can give you 200 articles for analysis.

import datetime

# Log each API call
def fetch_with_logging(url, headers, params):
    response = requests.get(url, headers=headers, params=params)
    print(f"[{datetime.datetime.now()}] API call made. Status: {response.status_code}")
    return response

# Count calls per session
api_calls = 0
for _ in range(2):
    response = fetch_with_logging(url, headers, params)
    api_calls += 1
print(f"Total API calls this session: {api_calls}")
  • Daily analysis — Fetch 100 articles/day for time-series tracking
  • Weekly deep dives — Accumulate 700 articles over a week
  • Upgrade when needed — Visit newsdatahub.com/plans for higher limits

  • What types of data work best with bar charts?

Bar charts excel with categorical data: topics, languages, sources, countries, sentiment categories, or any discrete grouping. For continuous numerical data (like stock prices over time), use line charts instead.

  • When should I use horizontal vs. vertical bars?

Use vertical bars for 3-10 categories with short labels. Switch to horizontal bars when you have 10+ categories, long category names, or want alphabetical sorting (easier to read top-to-bottom than rotated text).

  • How do I handle too many categories?

Limit to top N (e.g., most_common(10)), group rare categories into “Other”, or use a different chart type (treemap, word cloud).
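For the "Other" approach, here is a minimal sketch using topic_counts from the topic chart earlier; the cutoff of 10 is arbitrary:

# Keep the 10 most common topics and fold everything else into "Other"
top_n = topic_counts.most_common(10)
other_total = sum(topic_counts.values()) - sum(count for _, count in top_n)

categories = [topic for topic, _ in top_n] + ["Other"]
values = [count for _, count in top_n] + [other_total]
# Plot categories/values with plt.bar() as in the earlier charts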

  • Why use Counter instead of manual counting?

Counter from Python’s collections module is optimized for frequency counting, provides useful methods like most_common(), and handles missing keys gracefully.
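For comparison, here is the same tally done both ways; the manual version is what Counter saves you from writing:

from collections import Counter

topics = ["politics", "tech", "politics", "sports", "tech", "politics"]

# Manual counting with a plain dict
manual_counts = {}
for topic in topics:
    manual_counts[topic] = manual_counts.get(topic, 0) + 1

# The same result with Counter, plus most_common() for free
counts = Counter(topics)
print(counts.most_common(2))  # [('politics', 3), ('tech', 2)]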

  • How can I make charts interactive?

Switch from Matplotlib to Plotly for interactive charts:

import plotly.express as px
fig = px.bar(x=list(topic_counts.keys()), y=list(topic_counts.values()))
fig.show()
  • Can I combine multiple datasets in one chart?

Yes, use grouped or stacked bars:

# Grouped bars (side-by-side comparison)
import numpy as np

x = np.arange(len(categories))  # shared category positions for both datasets
width = 0.35

plt.bar(x - width/2, dataset1, width, label='Dataset 1')
plt.bar(x + width/2, dataset2, width, label='Dataset 2')
plt.xticks(x, categories)  # label each group with its category name
plt.legend()
  • What if my API key doesn’t work?

Verify the following (a quick diagnostic snippet follows the list):

  1. Key is correct (check your dashboard)
  2. Header name is x-api-key (lowercase, with hyphens)
  3. You haven’t exceeded rate limits
  4. Network/firewall isn’t blocking API requests
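A quick way to see what's going wrong is to print the status code and response body for a single request; a 401 usually means an invalid key, while a 429 means you've hit a rate limit. A minimal sketch, assuming the url and headers defined in the fetch code at the top of the tutorial:

import requests

response = requests.get(url, headers=headers, params={"per_page": 10})

print("Status:", response.status_code)  # 401 = invalid key, 429 = rate limited
if not response.ok:
    print(response.text)  # the API's error message, if any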
  • How do I filter for specific topics or countries?

Add parameters to your API request:

params = {
    "per_page": 100,
    "topic": "technology",
    "country": "US"
}

See NewsDataHub Search & Filtering Guide for all available filters.

  • How fresh is the data?

NewsDataHub updates continuously. Data freshness depends on your plan tier. Visit newsdatahub.com/plans for details.


Olga S.

Founder of NewsDataHub — Distributed Systems & Data Engineering

Connect on LinkedIn