How Does Cursor Pagination Work in NewsDataHub API? Efficient News Data Fetching
TL;DR: NewsDataHub API implements cursor pagination where each response includes a next_cursor token you pass in subsequent requests to efficiently fetch the next batch of results without offset limitations.
Large datasets are common in APIs. Cursor-based pagination lets you fetch results efficiently by using a pointer instead of offsets. NewsDataHub returns a next_cursor you pass back to continue where you left off.
What Is Cursor Pagination?
Instead of requesting page 1, 2, 3 by offset, you receive a next_cursor token in each response. Send that token on your next request to fetch the next slice of data. This approach is stable, fast, and better suited to large result sets. Learn more about cursor-based vs offset-based pagination in this article.
Endpoint and Minimal Params
- Endpoint: GET /v1/news
- Params:
  - per_page: integer, up to 100 for Pro plan (use the maximum allowed by your plan)
  - next_cursor: string returned by the previous response
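A minimal sketch of those two parameters in action (the endpoint, header name, and parameter names come from the examples below; the variable names are illustrative):

```python
import requests

API_KEY = "your-api-key"
URL = "https://api.newsdatahub.com/v1/news"
headers = {"x-api-key": API_KEY, "accept": "application/json"}

# First request: no cursor, so the server returns the first slice of results
first = requests.get(URL, params={"per_page": 100}, headers=headers, timeout=30).json()

# Follow-up request: pass the returned token back to continue where the first slice ended
params = {"per_page": 100, "next_cursor": first["next_cursor"]}
second = requests.get(URL, params=params, headers=headers, timeout=30).json()
```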
Example Response
Section titled “Example Response”{ "next_cursor": "My4wLDE3NTkzMzE1MDUwMDAsMjAxMzkzNTEz", "total_results": 23253078, "per_page": 1, "data": [ { "id": "45d72061-17f7-4222-8092-219a3dc60f5d", "title": "Trump's pharmaceutical tariff threat loses bite after Pfizer deal reassures drugmakers", "source_title": "CNBC", "source_link": "[https://cnbc.com](https://cnbc.com)", "article_link": "[https://www.cnbc.com/2025/10/01/trump-pharmaceutical-tariffs-pfizer-deal.html](https://www.cnbc.com/2025/10/01/trump-pharmaceutical-tariffs-pfizer-deal.html)", "keywords": [ "pharmaceutical tariffs", "Pfizer deal", "drug pricing deal", "most-favored-nation policy" ], "topics": [ "politics", "business", "healthcare", "tariffs" ], "description": "Pfizer's deal with Trump was a relief to the pharma industry, signaling that drugmakers could strike similar agreements that would make them immune to tariffs.", "pub_date": "2025-10-01T16:00:10", "creator": "CNBC", "content": "...", "media_url": "[https://image.cnbcfm.com/api/v1/image/108205811-1759248258700-gettyimages-2238328629-wm029751_kj9ps1cp.jpeg?v=1759248296&w=1920&h=1080](https://image.cnbcfm.com/api/v1/image/108205811-1759248258700-gettyimages-2238328629-wm029751_kj9ps1cp.jpeg?v=1759248296&w=1920&h=1080)", "media_type": null, "media_description": null, "media_credit": null, "media_thumbnail": null, "language": "en", "sentiment": { "pos": 0.062, "neg": 0.037, "neu": 0.901 }, "source": { "id": "cnbc", "country": "US", "political_leaning": "center", "reliability_score": 8.0, "type": "mainstream_news" } } ]}Paginating With next_cursor (Python)
```python
import time
import requests

API_KEY = "your-api-key"
URL = "https://api.newsdatahub.com/v1/news"  # GET /v1/news

# Up to 100 for Pro plan - use maximum allowed by your specific plan
params = {"per_page": 100}
headers = {
    "x-api-key": API_KEY,
    "accept": "application/json",
    "User-Agent": "newsdatahub-pagination-optimization/1.0-py",
}

max_retries = 5
base_backoff = 1

page_count = 0
max_pages = 2  # Only fetch 2 pages

while page_count < max_pages:
    attempts = 0
    backoff = base_backoff
    while True:
        r = requests.get(URL, params=params, headers=headers, timeout=30)

        # Do not retry on 429 (likely out of quota)
        if r.status_code == 429:
            raise RuntimeError("Received 429 (likely out of quota). Stop or adjust your plan.")

        # Retry on non-2xx (except 429) with simple exponential backoff
        if not (200 <= r.status_code < 300):
            attempts += 1
            if attempts > max_retries:
                r.raise_for_status()
            time.sleep(backoff)
            backoff = min(backoff * 2, 60)
            continue

        # Success: proceed
        break

    payload = r.json()
    articles = payload.get("data", [])

    for a in articles:
        print(a.get("title"), a.get("source_title"), a.get("pub_date"), a.get("article_link"))

    page_count += 1

    # Check if there are more pages available
    next_cursor = payload.get("next_cursor")
    if not next_cursor:
        # stop when server returns no cursor
        break

    # Set up for next page if we haven't reached max_pages
    params["next_cursor"] = next_cursor
```
Best Practices For Reliable Pagination
1. Optimize request size: Use per_page=100 when possible to reduce the number of requests. For example, the 23,253,078 total results in the response above would take roughly 232,500 requests at per_page=100, and ten times as many at per_page=10. Use smaller values only if memory-constrained.
2. Respect cursor integrity: Stop when next_cursor is null. Never create or modify cursors.
- Cursors contain encoded pagination data only the server understands
- Modified cursors break pagination and cause data issues
3. Maintain consistent filters: Keep filters unchanged during a pagination sequence.
- Each cursor is calibrated to your specific search criteria
- Changing filters invalidates cursors and creates inconsistent results
- For new filters, start a fresh pagination run
4. Enable resumability: Log next_cursor and item IDs to create checkpoints (see the sketch after this list).
- Allows resuming after interruptions without data loss
- Critical for large datasets and working with rate limits
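A minimal sketch combining practices 3 and 4: the filter set is built once per run and never mutated, and the latest next_cursor plus the last page's item IDs are checkpointed to a local file after each page. The checkpoint file and the paginate helper are illustrative, not part of the API:

```python
import json
import os

import requests

CHECKPOINT_FILE = "ndh_checkpoint.json"  # illustrative local file, not part of the API

def paginate(url, headers, base_filters, max_pages=10):
    """One pagination run with frozen filters and resumable checkpoints.

    base_filters is never mutated, so every page in the run uses identical
    filters; the latest cursor and last page's item IDs are written to disk
    after each page, so an interrupted run can resume where it stopped.
    """
    cursor = None
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            cursor = json.load(f).get("next_cursor")  # resume from the checkpoint

    for _ in range(max_pages):
        params = dict(base_filters)  # fresh copy: only the cursor varies per request
        if cursor:
            params["next_cursor"] = cursor
        r = requests.get(url, params=params, headers=headers, timeout=30)
        r.raise_for_status()
        payload = r.json()
        articles = payload.get("data", [])
        yield from articles

        cursor = payload.get("next_cursor")
        with open(CHECKPOINT_FILE, "w") as f:  # checkpoint after each page
            json.dump({"next_cursor": cursor,
                       "last_ids": [a.get("id") for a in articles]}, f)
        if not cursor:
            break
```

To search with different filters, delete the checkpoint file and call paginate with a new base_filters dict; the old cursor is never reused.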
Common Pitfalls (Quick Fixes)
- Seeing duplicates across pages? Ensure you pass the latest next_cursor returned by the server, without modification.
- Empty page returned unexpectedly? Stop if next_cursor is absent. If it is present, retry the request on transient non-2xx statuses.
- Hitting 429s? Treat it as a hard stop and review your quota or plan limits.
- How do I know I'm done paginating?
  - When next_cursor is null. Be mindful of your monthly quota allowance: it is easy to exhaust it when paginating until the collection runs out.
- Can I change filters in the middle of pagination?
  - Start a new run. Changing filters changes the result set and invalidates the cursor.
- Do cursors expire?
  - Cursors can become invalid if the underlying dataset changes significantly. If that happens, start a fresh run from the first page (see the sketch after this list).
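If you want that restart to be automatic, here is a hedged sketch. It assumes, without confirmation from the documentation above, that a stale cursor surfaces as a 4xx response other than 429:

```python
import requests

def fetch_page(url, params, headers):
    """Fetch one page; if a request carrying a cursor fails with a 4xx other
    than 429, assume the cursor went stale and retry once without it (i.e.,
    from the first page). The "4xx means stale cursor" signal is an
    assumption, not documented behavior.
    """
    r = requests.get(url, params=params, headers=headers, timeout=30)
    if "next_cursor" in params and 400 <= r.status_code < 500 and r.status_code != 429:
        params.pop("next_cursor")  # drop the suspected-stale cursor and start over
        r = requests.get(url, params=params, headers=headers, timeout=30)
    r.raise_for_status()
    return r.json()
```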
Summary
Use per_page up to your plan’s limit, keep filters stable, loop with next_cursor until it’s missing, and treat 429 as a stop signal. For other non-2xx responses, retry with capped backoff, then resume with the latest next_cursor. This yields fast, reliable pagination for large result sets.