CT Log Paging: Short Reads, Retries, Backoff, and Contiguous Indices
CT logs use index-range retrieval. The API looks simple. The failure modes are not.
Start by anchoring on tree size
Fetching by date usually starts by calling get-sth to obtain the current tree_size.
Your ingestion window becomes an index range.
GET /ct/v1/get-sth → {"tree_size": 123456789, ...}
A common strategy is “last N entries”: compute end = tree_size - 1 and
start = max(0, end - cap + 1).
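The window arithmetic above can be sketched as a small helper. This is a minimal illustration, not a reference implementation: the function name last_n_window and the choice to return None for an empty log or a non-positive cap are assumptions made for the example.

```python
from typing import Optional, Tuple


def last_n_window(tree_size: int, cap: int) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) indices covering the last `cap` entries.

    Hypothetical helper: returns None when there is nothing to fetch.
    """
    if tree_size <= 0 or cap <= 0:
        return None  # empty log, or caller asked for zero entries
    end = tree_size - 1                # indices are zero-based
    start = max(0, end - cap + 1)      # clamp so small logs start at 0
    return start, end
```

With the tree_size from the get-sth example and cap = 1000, this yields the range 123455789..123456788.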
Retries: 429 and 5xx are normal
CT operators rate limit. Networks fail. Servers throw 5xx. If you treat these as rare, your pipeline will break.
- Retry on 429 and 5xx
- Retry on transport/timeout exceptions
- Use exponential backoff
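One way to express the retry rules above, as a sketch built on the standard library's urllib exception types. The names is_retryable and with_retries, the attempt limit, and the delay values are all assumptions chosen for illustration; tune them to the log operator you are talking to.

```python
import time
import urllib.error


def is_retryable(exc: Exception) -> bool:
    """429 and 5xx are retryable; so are transport and timeout failures."""
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code == 429 or 500 <= exc.code < 600
    return isinstance(exc, (urllib.error.URLError, TimeoutError, ConnectionError))


def with_retries(call, max_attempts=5, base_delay=1.0, max_delay=60.0,
                 sleep=time.sleep):
    """Run call(), retrying retryable failures with exponential backoff.

    `sleep` is injectable so the backoff schedule can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on non-retryable errors and on the final attempt.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt: 1s, 2s, 4s, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

Other 4xx responses (for example 400 or 404) are re-raised immediately, since repeating the same request is unlikely to change the answer.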
Short reads: the API can return fewer entries than requested
You request start..end and expect (end-start+1) entries. Sometimes you get fewer.
This is a short read.
- Accept the page (don’t discard it)
- Record it as a short read
- Advance by the count actually returned
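The three rules above can be sketched as a paging loop. Here fetch_page(start, end) is a hypothetical callable standing in for a get-entries request that returns a list of entries; the batch size and the decision to raise on an empty page (so the loop cannot spin forever) are assumptions of this example.

```python
def fetch_range(fetch_page, start, end, batch=256):
    """Fetch entries start..end inclusive, tolerating short reads.

    Returns (pages, short_reads), where pages is a list of
    (first_index, entries) tuples in index order.
    """
    cur = start
    short_reads = 0
    pages = []
    while cur <= end:
        want_end = min(end, cur + batch - 1)
        requested = want_end - cur + 1
        entries = fetch_page(cur, want_end)[:requested]  # ignore any surplus
        if not entries:
            # Assumption: an empty page means no progress is possible.
            raise RuntimeError(f"empty page at index {cur}")
        if len(entries) < requested:
            short_reads += 1           # record it, but keep the page
        pages.append((cur, entries))
        cur += len(entries)            # advance by what actually arrived
    return pages, short_reads
```

Because the cursor moves by the returned count rather than the requested count, a server that caps pages below your batch size simply produces more, smaller pages.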
Contiguous indices: assign indices explicitly
The CT API returns an array of entries without explicit indices. If you want deterministic replay,
you must assign indices as idx = cur + i while writing.
This also protects you from oddities like a server returning more than requested: you stop writing once idx passes end_index.
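A minimal sketch of explicit index assignment follows. The function name index_entries and the record shape ({"index": ..., "entry": ...}) are hypothetical; only the idx = cur + i rule and the end_index bound come from the text above.

```python
def index_entries(entries, cur, end_index):
    """Yield records with explicit indices, never going past end_index."""
    for i, entry in enumerate(entries):
        idx = cur + i                  # position in the log, not in the page
        if idx > end_index:
            break                      # server over-delivered; drop the surplus
        yield {"index": idx, "entry": entry}
```

For example, a three-entry page starting at cur = 10 with end_index = 11 produces records for indices 10 and 11 only.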
Rate limiting: simple RPS pacing beats “hope”
If you don’t pace requests, you’ll self-induce 429s. A simple sleep(1/rps) per request works well.
If rps=0, treat it as “no pacing.”
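The pacing rule is small enough to show in full. This sketch assumes a function named pace that is called once before each request; the injectable sleep parameter is there only to make the behavior testable.

```python
import time


def pace(rps: float, sleep=time.sleep) -> None:
    """Sleep 1/rps seconds before a request; rps <= 0 disables pacing."""
    if rps <= 0:
        return                         # "no pacing" mode
    sleep(1.0 / rps)
```

This fixed sleep ignores the time the request itself takes, so the effective rate ends up somewhat below rps, which errs on the side of being polite to the log operator.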
If you don’t want to maintain this, that’s the point
ct-cert-feed exists to turn these operator-specific ingestion headaches into a stable bulk artifact:
records.jsonl.gz + stats.json, replayable by date.