How Does Cursor Pagination Work in NewsDataHub API? Efficient News Data Fetching
TL;DR: NewsDataHub API implements cursor pagination where each response includes a next_cursor token you pass in subsequent requests to efficiently fetch the next batch of results without offset limitations.
Large datasets are common in APIs. Cursor-based pagination lets you fetch results efficiently by using a pointer instead of offsets. NewsDataHub returns a next_cursor you pass back to continue where you left off.
What Is Cursor Pagination?
Instead of requesting page 1, 2, 3 by offset, you receive a next_cursor token in each response. Send that token on your next request to fetch the next slice of data. This approach is stable, fast, and better suited to large result sets. Learn more about cursor-based vs offset-based pagination in this article.
Endpoint and Minimal Params
- Endpoint: GET /v1/news
- Params:
  - per_page: integer, up to 100 for Pro plan (use the maximum allowed by your plan)
  - next_cursor: string returned by the previous response
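A minimal sketch of those two parameters in action (the endpoint, header name, and parameter names come from the examples below; the variable names are illustrative):

```python
import requests

API_KEY = "your-api-key"
URL = "https://api.newsdatahub.com/v1/news"
headers = {"x-api-key": API_KEY, "accept": "application/json"}

# First request: no cursor, so the server returns the first slice of results
first = requests.get(URL, params={"per_page": 100}, headers=headers, timeout=30).json()

# Follow-up request: pass the returned token back to continue where the first slice ended
params = {"per_page": 100, "next_cursor": first["next_cursor"]}
second = requests.get(URL, params=params, headers=headers, timeout=30).json()
```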
Example Response
Section titled “Example Response”{ "next_cursor": "My4wLDE3NTkzMzE1MDUwMDAsMjAxMzkzNTEz", "total_results": 23253078, "per_page": 1, "data": [ { "id": "45d72061-17f7-4222-8092-219a3dc60f5d", "title": "Trump's pharmaceutical tariff threat loses bite after Pfizer deal reassures drugmakers", "source_title": "CNBC", "source_link": "[https://cnbc.com](https://cnbc.com)", "article_link": "[https://www.cnbc.com/2025/10/01/trump-pharmaceutical-tariffs-pfizer-deal.html](https://www.cnbc.com/2025/10/01/trump-pharmaceutical-tariffs-pfizer-deal.html)", "keywords": [ "pharmaceutical tariffs", "Pfizer deal", "drug pricing deal", "most-favored-nation policy" ], "topics": [ "politics", "business", "healthcare", "tariffs" ], "description": "Pfizer's deal with Trump was a relief to the pharma industry, signaling that drugmakers could strike similar agreements that would make them immune to tariffs.", "pub_date": "2025-10-01T16:00:10", "creator": "CNBC", "content": "...", "media_url": "[https://image.cnbcfm.com/api/v1/image/108205811-1759248258700-gettyimages-2238328629-wm029751_kj9ps1cp.jpeg?v=1759248296&w=1920&h=1080](https://image.cnbcfm.com/api/v1/image/108205811-1759248258700-gettyimages-2238328629-wm029751_kj9ps1cp.jpeg?v=1759248296&w=1920&h=1080)", "media_type": null, "media_description": null, "media_credit": null, "media_thumbnail": null, "language": "en", "sentiment": { "pos": 0.062, "neg": 0.037, "neu": 0.901 }, "source": { "id": "cnbc", "country": "US", "political_leaning": "center", "reliability_score": 8.0, "type": "mainstream_news" } } ]}Paginating With next_cursor (Python)
```python
import time
import requests

API_KEY = "your-api-key"
URL = "https://api.newsdatahub.com/v1/news"  # GET /v1/news

# Up to 100 for Pro plan - use maximum allowed by your specific plan
params = {"per_page": 100}
headers = {
    "x-api-key": API_KEY,
    "accept": "application/json",
    "User-Agent": "newsdatahub-pagination-optimization/1.0-py",
}

max_retries = 5
base_backoff = 1

page_count = 0
max_pages = 2  # Only fetch 2 pages

while page_count < max_pages:
    attempts = 0
    backoff = base_backoff
    while True:
        r = requests.get(URL, params=params, headers=headers, timeout=30)

        # Do not retry on 429 (likely out of quota)
        if r.status_code == 429:
            raise RuntimeError("Received 429 (likely out of quota). Stop or adjust your plan.")

        # Retry on non-2xx (except 429) with simple exponential backoff
        if not (200 <= r.status_code < 300):
            attempts += 1
            if attempts > max_retries:
                r.raise_for_status()
            time.sleep(backoff)
            backoff = min(backoff * 2, 60)
            continue

        # Success: proceed
        break

    payload = r.json()
    articles = payload.get("data", [])

    for a in articles:
        print(a.get("title"), a.get("source_title"), a.get("pub_date"), a.get("article_link"))

    page_count += 1

    # Check if there are more pages available
    next_cursor = payload.get("next_cursor")
    if not next_cursor:
        # stop when server returns no cursor
        break

    # Set up for next page if we haven't reached max_pages
    params["next_cursor"] = next_cursor
```
Best Practices For Reliable Pagination
1. Optimize request size: Use per_page=100 when possible to reduce the number of requests. For example, the 23,253,078 total results in the response above would take roughly 232,500 requests at per_page=100, and ten times as many at per_page=10. Use smaller values only if memory-constrained.
2. Respect cursor integrity: Stop when next_cursor is null. Never create or modify cursors.
- Cursors contain encoded pagination data only the server understands
- Modified cursors break pagination and cause data issues
3. Maintain consistent filters: Keep filters unchanged during a pagination sequence.
- Each cursor is calibrated to your specific search criteria
- Changing filters invalidates cursors and creates inconsistent results
- For new filters, start a fresh pagination run
4. Enable resumability: Log next_cursor and item IDs to create checkpoints (see the sketch after this list).
- Allows resuming after interruptions without data loss
- Critical for large datasets and working with rate limits
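A minimal sketch combining practices 3 and 4: the filter set is built once per run and never mutated, and the latest next_cursor plus the last page's item IDs are checkpointed to a local file after each page. The checkpoint file and the paginate helper are illustrative, not part of the API:

```python
import json
import os

import requests

CHECKPOINT_FILE = "ndh_checkpoint.json"  # illustrative local file, not part of the API

def paginate(url, headers, base_filters, max_pages=10):
    """One pagination run with frozen filters and resumable checkpoints.

    base_filters is never mutated, so every page in the run uses identical
    filters; the latest cursor and last page's item IDs are written to disk
    after each page, so an interrupted run can resume where it stopped.
    """
    cursor = None
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            cursor = json.load(f).get("next_cursor")  # resume from the checkpoint

    for _ in range(max_pages):
        params = dict(base_filters)  # fresh copy: only the cursor varies per request
        if cursor:
            params["next_cursor"] = cursor
        r = requests.get(url, params=params, headers=headers, timeout=30)
        r.raise_for_status()
        payload = r.json()
        articles = payload.get("data", [])
        yield from articles

        cursor = payload.get("next_cursor")
        with open(CHECKPOINT_FILE, "w") as f:  # checkpoint after each page
            json.dump({"next_cursor": cursor,
                       "last_ids": [a.get("id") for a in articles]}, f)
        if not cursor:
            break
```

To search with different filters, delete the checkpoint file and call paginate with a new base_filters dict; the old cursor is never reused.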
Common Pitfalls (Quick Fixes)
- Seeing duplicates across pages? Ensure you pass the latest next_cursor returned by the server, without modification.
- Empty page returned unexpectedly? Stop if next_cursor is absent. If it is present, retry the request on transient non-2xx statuses.
- Hitting 429s? Treat it as a hard stop and review your quota or plan limits.
- How do I know I'm done paginating?
  - When next_cursor is null. Be mindful of your monthly quota allowance: it is easy to exhaust it when paginating until the collection runs out.
- Can I change filters in the middle of pagination?
  - Start a new run. Changing filters changes the result set and invalidates the cursor.
- Do cursors expire?
  - Cursors can become invalid if the underlying dataset changes significantly. If that happens, start a fresh run from the first page (see the sketch after this list).
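If you want that restart to be automatic, here is a hedged sketch. It assumes, without confirmation from the documentation above, that a stale cursor surfaces as a 4xx response other than 429:

```python
import requests

def fetch_page(url, params, headers):
    """Fetch one page; if a request carrying a cursor fails with a 4xx other
    than 429, assume the cursor went stale and retry once without it (i.e.,
    from the first page). The "4xx means stale cursor" signal is an
    assumption, not documented behavior.
    """
    r = requests.get(url, params=params, headers=headers, timeout=30)
    if "next_cursor" in params and 400 <= r.status_code < 500 and r.status_code != 429:
        params.pop("next_cursor")  # drop the suspected-stale cursor and start over
        r = requests.get(url, params=params, headers=headers, timeout=30)
    r.raise_for_status()
    return r.json()
```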
Summary
Use per_page up to your plan’s limit, keep filters stable, loop with next_cursor until it’s missing, and treat 429 as a stop signal. For other non-2xx responses, retry with capped backoff, then resume with the latest next_cursor. This yields fast, reliable pagination for large result sets.