
How Does Cursor Pagination Work in NewsDataHub API? Efficient News Data Fetching

TL;DR: The NewsDataHub API uses cursor pagination: each response includes a next_cursor token that you pass on the next request to fetch the next batch of results, avoiding the performance and consistency problems of offset pagination.

Large datasets are common in APIs. Cursor-based pagination lets you fetch results efficiently by using a pointer instead of offsets. NewsDataHub returns a next_cursor you pass back to continue where you left off.

Instead of requesting pages 1, 2, 3 by offset, you receive a next_cursor token in each response. Send that token on your next request to fetch the next slice of data. This is stable, fast, and better suited to large result sets. Learn more about cursor-based vs. offset-based pagination in this article.

  • Endpoint: GET /v1/news
  • Params:
    • per_page: integer, up to 100 for Pro plan (use the maximum allowed by your plan)
    • next_cursor: string token returned in the previous response; omit it on the first request
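
A minimal sketch of the flow with the requests library (the API key placeholder and per_page value are illustrative):

import requests

API_KEY = "your-api-key"
URL = "https://api.newsdatahub.com/v1/news"
headers = {"x-api-key": API_KEY, "accept": "application/json"}

# First page: send no cursor at all
first = requests.get(URL, headers=headers, params={"per_page": 10}, timeout=30).json()

# Next page: echo back the cursor exactly as the server returned it
cursor = first.get("next_cursor")
if cursor:
    second = requests.get(URL, headers=headers,
                          params={"per_page": 10, "next_cursor": cursor},
                          timeout=30).json()

A sample response (with per_page=1 for brevity) looks like this: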
{
  "next_cursor": "My4wLDE3NTkzMzE1MDUwMDAsMjAxMzkzNTEz",
  "total_results": 23253078,
  "per_page": 1,
  "data": [
    {
      "id": "45d72061-17f7-4222-8092-219a3dc60f5d",
      "title": "Trump's pharmaceutical tariff threat loses bite after Pfizer deal reassures drugmakers",
      "source_title": "CNBC",
      "source_link": "https://cnbc.com",
      "article_link": "https://www.cnbc.com/2025/10/01/trump-pharmaceutical-tariffs-pfizer-deal.html",
      "keywords": [
        "pharmaceutical tariffs",
        "Pfizer deal",
        "drug pricing deal",
        "most-favored-nation policy"
      ],
      "topics": [
        "politics",
        "business",
        "healthcare",
        "tariffs"
      ],
      "description": "Pfizer's deal with Trump was a relief to the pharma industry, signaling that drugmakers could strike similar agreements that would make them immune to tariffs.",
      "pub_date": "2025-10-01T16:00:10",
      "creator": "CNBC",
      "content": "...",
      "media_url": "https://image.cnbcfm.com/api/v1/image/108205811-1759248258700-gettyimages-2238328629-wm029751_kj9ps1cp.jpeg?v=1759248296&w=1920&h=1080",
      "media_type": null,
      "media_description": null,
      "media_credit": null,
      "media_thumbnail": null,
      "language": "en",
      "sentiment": {
        "pos": 0.062,
        "neg": 0.037,
        "neu": 0.901
      },
      "source": {
        "id": "cnbc",
        "country": "US",
        "political_leaning": "center",
        "reliability_score": 8.0,
        "type": "mainstream_news"
      }
    }
  ]
}
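
The script below puts this together: it fetches up to two pages at per_page=100, retries transient non-2xx responses with capped exponential backoff, and treats 429 as a hard stop:
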
import time
import requests

API_KEY = "your-api-key"
URL = "https://api.newsdatahub.com/v1/news"  # GET /v1/news

# Up to 100 for Pro plan - use the maximum allowed by your specific plan
params = {"per_page": 100}
headers = {
    "x-api-key": API_KEY,
    "accept": "application/json",
    "User-Agent": "newsdatahub-pagination-optimization/1.0-py",
}

max_retries = 5
base_backoff = 1
page_count = 0
max_pages = 2  # Only fetch 2 pages

while page_count < max_pages:
    attempts = 0
    backoff = base_backoff
    while True:
        r = requests.get(URL, params=params, headers=headers, timeout=30)
        # Do not retry on 429 (likely out of quota)
        if r.status_code == 429:
            raise RuntimeError("Received 429 (likely out of quota). Stop or adjust your plan.")
        # Retry on non-2xx (except 429) with simple exponential backoff
        if not (200 <= r.status_code < 300):
            attempts += 1
            if attempts > max_retries:
                r.raise_for_status()
            time.sleep(backoff)
            backoff = min(backoff * 2, 60)
            continue
        # Success: proceed
        break

    payload = r.json()
    articles = payload.get("data", [])
    for a in articles:
        print(a.get("title"), a.get("source_title"), a.get("pub_date"), a.get("article_link"))

    page_count += 1

    # Check if there are more pages available
    next_cursor = payload.get("next_cursor")
    if not next_cursor:  # stop when server returns no cursor
        break
    # Set up for next page if we haven't reached max_pages
    params["next_cursor"] = next_cursor

A few practices keep pagination efficient and reliable:

1. Optimize request size: Use per_page=100 when possible to reduce the number of requests. Use smaller values only if memory-constrained.

2. Respect cursor integrity: Stop when next_cursor is null. Never create or modify cursors.

  • Cursors contain encoded pagination data only the server understands
  • Modified cursors break pagination and cause data issues

3. Maintain consistent filters: Keep filters unchanged during a pagination sequence.

  • Each cursor is calibrated to your specific search criteria
  • Changing filters invalidates cursors and creates inconsistent results
  • For new filters, start a fresh pagination run

4. Enable resumability: Log next_cursor and item IDs to create checkpoints (a minimal sketch follows this list).

  • Allows resuming after interruptions without data loss
  • Critical for large datasets and when working under rate limits
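
A minimal checkpointing sketch, assuming the same endpoint and headers as above; the checkpoint file name and its JSON layout are illustrative, not part of the API:

import json
import os
import requests

API_KEY = "your-api-key"
URL = "https://api.newsdatahub.com/v1/news"
CHECKPOINT = "ndh_checkpoint.json"  # illustrative file name

headers = {"x-api-key": API_KEY, "accept": "application/json"}
params = {"per_page": 100}

# Resume from the last saved cursor, if a checkpoint exists
if os.path.exists(CHECKPOINT):
    with open(CHECKPOINT) as f:
        saved = json.load(f)
    if saved.get("next_cursor"):
        params["next_cursor"] = saved["next_cursor"]

while True:
    r = requests.get(URL, params=params, headers=headers, timeout=30)
    r.raise_for_status()
    payload = r.json()

    for a in payload.get("data", []):
        print(a.get("id"), a.get("title"))  # log item IDs alongside the cursor

    # Persist the cursor before the next request, so an interruption
    # costs at most one re-fetched page on resume
    next_cursor = payload.get("next_cursor")
    with open(CHECKPOINT, "w") as f:
        json.dump({"next_cursor": next_cursor}, f)

    if not next_cursor:
        break
    params["next_cursor"] = next_cursor

Writing the cursor after every page keeps the checkpoint cheap while bounding the rework after a crash to a single page.
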
Troubleshooting common issues:

  • Seeing duplicates across pages? Make sure you pass the latest next_cursor returned by the server, unmodified.
  • Getting an unexpectedly empty page? Stop if next_cursor is absent; if it is present, retry the request on transient non-2xx statuses.
  • Hitting 429s? Treat it as a hard stop and review your quota or plan limits.
Frequently asked questions:

  • How do I know I’m done paginating?
    • When next_cursor is null. Be mindful of your monthly quota allowance: it is very easy to exhaust it when paginating until the collection runs out.
  • Can I change filters in the middle of pagination?
    • Start a new run. Changing filters changes the result set and invalidates the cursor.
  • Do cursors expire?
    • Cursors can become invalid if the underlying dataset changes significantly. If that happens, start a fresh run from the first page.
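
If you want to recover from an invalidated cursor automatically, one option is to restart the run from the first page. A sketch, under the assumption (not documented here) that a stale cursor surfaces as a non-429 4xx response:

import requests

def fetch_all(session, url, headers, base_params):
    # Paginate with next_cursor; if a saved cursor is rejected,
    # restart once from page one (expect some re-fetched items).
    params = dict(base_params)
    restarted = False
    while True:
        r = session.get(url, params=params, headers=headers, timeout=30)
        if (400 <= r.status_code < 500 and r.status_code != 429
                and "next_cursor" in params and not restarted):
            # Assumption: an invalid/stale cursor comes back as a 4xx client error
            params.pop("next_cursor")
            restarted = True
            continue
        r.raise_for_status()
        payload = r.json()
        yield from payload.get("data", [])
        next_cursor = payload.get("next_cursor")
        if not next_cursor:
            return
        params["next_cursor"] = next_cursor

session = requests.Session()
for article in fetch_all(session, "https://api.newsdatahub.com/v1/news",
                         {"x-api-key": "your-api-key"}, {"per_page": 100}):
    print(article.get("id"), article.get("title"))

Restarting at most once keeps a persistent server-side error from turning into an infinite loop.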

Use per_page up to your plan’s limit, keep filters stable, loop with next_cursor until it’s missing, and treat 429 as a stop signal. For other non-2xx responses, retry with capped backoff, then resume with the latest next_cursor. This yields fast, reliable pagination for large result sets.

Olga S.

Founder of NewsDataHub — Distributed Systems & Data Engineering

Connect on LinkedIn