hefftools.dev
Infrastructure-focused data utilities

CT Log Paging: Short Reads, Retries, Backoff, and Contiguous Indices

CT logs serve entries by index range (get-entries). The API looks simple. The failure modes are not.

Start by anchoring on tree size

Fetching by date usually starts by calling get-sth to obtain the current tree_size. Your ingestion window becomes an index range.

GET /ct/v1/get-sth  →  {"tree_size": 123456789, ...}

A common strategy is “last N entries”: compute end = tree_size - 1 and start = max(0, end - cap + 1).
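That arithmetic can be sketched as a small helper. The function name is illustrative, not from any library:

```python
# Sketch: derive a "last N entries" index window from the log's current
# tree_size (as reported by get-sth). Entries are 0-indexed.

def last_n_window(tree_size: int, cap: int) -> tuple[int, int]:
    """Return (start, end) inclusive indices for the newest `cap` entries."""
    end = tree_size - 1              # newest entry sits at tree_size - 1
    start = max(0, end - cap + 1)    # clamp at 0 for logs smaller than cap
    return start, end

print(last_n_window(123456789, 1000))  # → (123455789, 123456788)
```

The `max(0, ...)` clamp matters for young logs: a log with 5 entries and a cap of 1000 yields the full range (0, 4) instead of a negative start.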

Retries: 429 and 5xx are normal

CT operators rate limit. Networks fail. Servers throw 5xx. If you treat these as rare, your pipeline will break.

  • Retry on 429 and 5xx
  • Retry on transport/timeout exceptions
  • Use exponential backoff
Best practice: record retries and the last URL so you can debug operator-specific behavior without guessing.
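A minimal retry loop with exponential backoff might look like the following. The `RETRIABLE` set, function names, and error handling are assumptions to adapt to your HTTP client; the injectable `sleep` exists so the loop is testable:

```python
import time

RETRIABLE = {429, 500, 502, 503, 504}  # illustrative; tune per operator

def fetch_with_retries(do_request, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """do_request() returns (status, body); raises OSError on transport failure."""
    last = None
    for attempt in range(max_attempts):
        try:
            status, body = do_request()
        except OSError as exc:                  # transport/timeout failures
            last = ("exception", repr(exc))
        else:
            if status == 200:
                return body
            if status not in RETRIABLE:
                raise RuntimeError(f"non-retriable status {status}")
            last = ("status", status)           # record what we last saw
        if attempt + 1 < max_attempts:
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, 4s, ...
    raise RuntimeError(f"gave up after {max_attempts} attempts: {last}")
```

Recording `last` (and, in a real pipeline, the URL) is what makes operator-specific 429 behavior debuggable after the fact.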

Short reads: the API can return fewer entries than requested

You request start..end and expect (end-start+1) entries. Sometimes you get fewer. This is a short read.

  • Accept the page (don’t discard it)
  • Record it as a short read
  • Advance by the count actually returned
The hard-failure case is zero entries: if the server returns an empty page for a non-empty range, you can't make progress safely.
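The three rules above can be sketched as a drain loop. `fetch_page` and `drain_range` are hypothetical names; the key invariant is advancing by the count actually returned:

```python
def drain_range(fetch_page, start, end):
    """Fetch the inclusive index range start..end, tolerating short reads.

    fetch_page(lo, hi) returns a list of entries for the inclusive range;
    the server may return fewer than requested.
    """
    entries, short_reads, cur = [], 0, start
    while cur <= end:
        page = fetch_page(cur, end)
        if not page:
            # Empty page for a non-empty range: no safe forward progress.
            raise RuntimeError(f"empty page for {cur}..{end}")
        if len(page) < end - cur + 1:
            short_reads += 1                # accept the page, but record it
        entries.extend(page)
        cur += len(page)                    # advance by the count returned
    return entries, short_reads
```

Note that advancing by the *requested* count instead of `len(page)` would silently skip entries on every short read.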

Contiguous indices: assign indices explicitly

The CT API returns an array of entries without explicit indices. If you want deterministic replay, you must assign indices as idx = cur + i while writing.

This also protects you from oddities like a server returning more than requested — you bound by end_index.
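A sketch of that assignment, with the bound applied while writing (the function name is illustrative):

```python
def index_entries(page, cur, end_index):
    """Pair each entry with its explicit index, bounded by end_index.

    If a server returns more entries than requested, anything whose
    computed index would exceed end_index is dropped.
    """
    out = []
    for i, entry in enumerate(page):
        idx = cur + i
        if idx > end_index:        # bound: ignore surplus entries
            break
        out.append((idx, entry))
    return out
```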

Rate limiting: simple RPS pacing beats “hope”

If you don’t pace requests, you’ll self-induce 429s. A simple sleep(1/rps) per request works well. If rps=0, treat it as “no pacing.”
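One way to sketch that pacing is a small class with an injectable clock (the `Pacer` name and structure are assumptions, not from any library):

```python
import time

class Pacer:
    """Sleep between calls so requests average `rps` per second.

    rps == 0 disables pacing entirely.
    """
    def __init__(self, rps, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rps if rps > 0 else 0.0
        self.clock, self.sleep = clock, sleep
        self.next_at = clock()

    def wait(self):
        if self.interval == 0.0:
            return                           # rps=0: no pacing
        now = self.clock()
        if now < self.next_at:
            self.sleep(self.next_at - now)   # hold until the next slot
        self.next_at = max(now, self.next_at) + self.interval
```

Using `max(now, self.next_at)` lets the pacer reset after idle periods instead of "bursting" to catch up on missed slots.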

If you don’t want to maintain this, that’s the point

ct-cert-feed exists to turn these operator-specific ingestion headaches into a stable bulk artifact: records.jsonl.gz + stats.json, replayable by date.