hefftools.dev
Infrastructure-focused data utilities

x509 vs Precertificate CT Entries: What You Can Decode and How to Normalize

A common failure point in CT ingestion pipelines is treating precertificate entries like ordinary X.509 certificates.

CT leaf types

Each CT entry has a Merkle tree leaf. The leaf indicates an entry type:

  • x509: leaf contains a DER-encoded X.509 certificate
  • precert: leaf contains issuer key hash + TBSCertificate bytes for a precertificate

These are not interchangeable. If you treat them as the same thing, you will emit inconsistent fields and your schema will drift.
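
To make the difference concrete, here is a minimal sketch of reading the entry type out of raw leaf bytes. It assumes the RFC 6962 (v1) MerkleTreeLeaf layout and that you already have the base64-decoded leaf_input from the log; the names ParsedLeaf and parse_merkle_tree_leaf are illustrative, not part of any particular library.

```python
import struct
from dataclasses import dataclass
from typing import Optional, Tuple

X509_ENTRY = 0     # LogEntryType values from RFC 6962
PRECERT_ENTRY = 1


@dataclass
class ParsedLeaf:
    entry_type: str                        # "x509" or "precert"
    timestamp_ms: int                      # CT timestamp, milliseconds since epoch
    cert_der: Optional[bytes]              # set only for x509 entries
    issuer_key_hash: Optional[bytes]       # set only for precert entries
    tbs_certificate_der: Optional[bytes]   # set only for precert entries


def _read_opaque24(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Read a 3-byte big-endian length, then that many bytes."""
    if offset + 3 > len(buf):
        raise ValueError("truncated length prefix")
    length = int.from_bytes(buf[offset:offset + 3], "big")
    start, end = offset + 3, offset + 3 + length
    if end > len(buf):
        raise ValueError("truncated payload")
    return buf[start:end], end


def parse_merkle_tree_leaf(leaf_input: bytes) -> ParsedLeaf:
    if len(leaf_input) < 12:
        raise ValueError("leaf too short")
    # version (1 byte) | leaf_type (1) | timestamp (8) | entry_type (2)
    version, leaf_type, timestamp_ms, entry_type = struct.unpack_from(
        ">BBQH", leaf_input, 0
    )
    if version != 0 or leaf_type != 0:
        raise ValueError("unsupported leaf version or leaf type")
    offset = 12
    if entry_type == X509_ENTRY:
        cert_der, _ = _read_opaque24(leaf_input, offset)
        return ParsedLeaf("x509", timestamp_ms, cert_der, None, None)
    if entry_type == PRECERT_ENTRY:
        if offset + 32 > len(leaf_input):
            raise ValueError("truncated issuer_key_hash")
        issuer_key_hash = leaf_input[offset:offset + 32]
        tbs_der, _ = _read_opaque24(leaf_input, offset + 32)
        return ParsedLeaf("precert", timestamp_ms, None, issuer_key_hash, tbs_der)
    raise ValueError(f"unknown entry_type {entry_type}")
```

The two branches return different populated fields on purpose: nothing in the precert branch pretends to be a certificate.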

x509 entries: decode the leaf cert, optionally decode chain

For x509 entries, you can decode the leaf DER normally (subject, issuer, SAN, validity, key algorithm, etc.). The extra_data field usually carries the chain certificates; decoding them is optional, and a failure there should not fail the whole entry.

  • Decode leaf cert DER into normalized fields
  • Store leaf DER b64 if you want verifiability
  • Decode chain certs best-effort (non-fatal failures)
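
The best-effort chain step above can be sketched as follows. This assumes the RFC 6962 layout for x509-entry extra_data (a 3-byte total length, then repeated 3-byte length + DER certificate). The decode_cert argument is a hypothetical callable you supply, for example a thin wrapper around your X.509 library of choice; the result dictionary shape is also just an illustration.

```python
from typing import Any, Callable, Dict, List


def split_certificate_chain(extra_data: bytes) -> List[bytes]:
    """Split x509-entry extra_data into individual DER certificates."""
    if len(extra_data) < 3:
        raise ValueError("truncated chain length")
    total = int.from_bytes(extra_data[:3], "big")
    body = extra_data[3:3 + total]
    if len(body) != total:
        raise ValueError("truncated chain")
    certs: List[bytes] = []
    offset = 0
    while offset < len(body):
        if offset + 3 > len(body):
            raise ValueError("truncated certificate length")
        size = int.from_bytes(body[offset:offset + 3], "big")
        offset += 3
        if offset + size > len(body):
            raise ValueError("truncated certificate")
        certs.append(body[offset:offset + size])
        offset += size
    return certs


def decode_chain_best_effort(
    extra_data: bytes,
    decode_cert: Callable[[bytes], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Decode each chain cert; record failures instead of raising."""
    try:
        ders = split_certificate_chain(extra_data)
    except ValueError as exc:
        # The chain framing itself is broken: report it, keep the entry.
        return [{"position": None, "decoded": None, "error": str(exc)}]
    results: List[Dict[str, Any]] = []
    for position, der in enumerate(ders):
        try:
            decoded = decode_cert(der)
            results.append({"position": position, "decoded": decoded, "error": None})
        except Exception as exc:  # any decoder failure is non-fatal here
            message = f"{type(exc).__name__}: {exc}"
            results.append({"position": position, "decoded": None, "error": message})
    return results
```

Keeping the error string next to the failed position lets downstream see that a chain cert existed but could not be decoded, rather than silently dropping it.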

Precert entries: preserve TBSCertificate bytes and decode what you can

A precert leaf does not contain a normal x509 certificate. It contains:

  • issuer_key_hash
  • tbs_certificate DER bytes (TBSCertificate)

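A TBSCertificate is only the to-be-signed portion of a certificate, without the outer signature algorithm and signature value, so many libraries won’t parse it as a full certificate object. The safe move is: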

  • Store issuer_key_hash
  • Store tbs_certificate_der_b64
  • Decode chain certs from extra_data to get issuer/subject fields when possible

If you provide a “leaf guess” derived from extra_data, label it explicitly so downstream consumers never confuse it with the precert TBSCertificate.
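
A minimal sketch of that record, under some assumptions: in RFC 6962 logs, extra_data for a precert entry is a PrecertChainEntry, which begins with the precertificate as submitted and is followed by its chain. Here that submitted precertificate is what gets exposed as the leaf guess. The field names chain_der_b64, leaf_guess, and provenance are hypothetical; only issuer_key_hash and tbs_certificate_der_b64 come from the list above.

```python
import base64
from typing import Any, Dict, List, Tuple


def _read_opaque24(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Read a 3-byte big-endian length, then that many bytes."""
    if offset + 3 > len(buf):
        raise ValueError("truncated length prefix")
    length = int.from_bytes(buf[offset:offset + 3], "big")
    start, end = offset + 3, offset + 3 + length
    if end > len(buf):
        raise ValueError("truncated payload")
    return buf[start:end], end


def parse_precert_extra_data(extra_data: bytes) -> Tuple[bytes, List[bytes]]:
    """Return (submitted precertificate DER, chain DERs)."""
    pre_cert, offset = _read_opaque24(extra_data, 0)
    chain_blob, _ = _read_opaque24(extra_data, offset)
    chain: List[bytes] = []
    pos = 0
    while pos < len(chain_blob):
        cert, pos = _read_opaque24(chain_blob, pos)
        chain.append(cert)
    return pre_cert, chain


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def build_precert_record(
    issuer_key_hash: bytes, tbs_certificate_der: bytes, extra_data: bytes
) -> Dict[str, Any]:
    # Facts taken straight from the leaf: always stored.
    record: Dict[str, Any] = {
        "entry_type": "precert",
        "issuer_key_hash": issuer_key_hash.hex(),
        "tbs_certificate_der_b64": _b64(tbs_certificate_der),
        "chain_der_b64": [],
        "leaf_guess": None,
    }
    try:
        pre_cert, chain = parse_precert_extra_data(extra_data)
    except ValueError:
        return record  # extra_data is optional context; keep the leaf facts
    record["chain_der_b64"] = [_b64(c) for c in chain]
    # Explicitly labeled: this is NOT the TBSCertificate from the leaf.
    record["leaf_guess"] = {
        "provenance": "extra_data.pre_certificate",
        "is_leaf_tbs_certificate": False,
        "der_b64": _b64(pre_cert),
    }
    return record
```

If extra_data is missing or malformed, the record still carries the issuer key hash and TBSCertificate bytes, and leaf_guess stays null instead of being filled with something less trustworthy.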

Normalization strategy: don’t lie

Your normalized record should remain schema-stable across both entry types. That usually means:

  • Top-level common metadata: log, index, CT timestamp, entry_type
  • For x509: normalized x509 fields + optional chain
  • For precert: issuer key hash + TBSCertificate bytes + optional chain + optional explicitly-labeled leaf guess

Downstream can decide whether to treat precert leaf guesses as usable inventory signals. Your job is to publish facts and be honest about their provenance.
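
One way to keep the record schema-stable is to give both entry types the same top-level keys and leave the non-applicable ones null. The sketch below is an assumption about how that could look, not the actual ct-cert-feed record; every field name other than entry_type is illustrative.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class NormalizedEntry:
    # Common metadata, present for every entry.
    log: str
    index: int
    ct_timestamp_ms: int
    entry_type: str                               # "x509" or "precert"
    # Type-specific payloads: exactly one of these is populated.
    x509: Optional[Dict[str, Any]] = None         # decoded leaf cert fields
    precert: Optional[Dict[str, Any]] = None      # issuer key hash + TBS bytes
    # Optional context.
    chain: List[Dict[str, Any]] = field(default_factory=list)
    leaf_guess: Optional[Dict[str, Any]] = None   # precert only, labeled

    def __post_init__(self) -> None:
        if self.entry_type == "x509":
            if self.x509 is None or self.precert is not None:
                raise ValueError("x509 entry must carry x509 fields only")
            if self.leaf_guess is not None:
                raise ValueError("x509 entries have a real leaf, not a guess")
        elif self.entry_type == "precert":
            if self.precert is None or self.x509 is not None:
                raise ValueError("precert entry must carry precert fields only")
        else:
            raise ValueError(f"unknown entry_type: {self.entry_type!r}")

    def to_record(self) -> Dict[str, Any]:
        """Same key set for both entry types; absent data is None or []."""
        return asdict(self)
```

Because to_record always emits every key, a consumer can tell "this is a precert, so there are no decoded x509 fields" apart from "the producer forgot a field".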

Schema enforcement prevents drift

The fastest way to wreck a dataset is “just add a field.” ct-cert-feed enforces schemas by validating each normalized record against a versioned JSON Schema before it is written.
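
The validator ct-cert-feed actually uses is not shown here. As a rough stdlib-only stand-in for the idea (a real JSON Schema validator would replace it), the sketch below rejects unknown fields, missing fields, and wrong types against a versioned field table. The SCHEMAS table, its field names, and the schema_version key are all hypothetical.

```python
from typing import Any, Dict, Tuple

NoneType = type(None)

# Hypothetical versioned schemas: field name -> allowed Python types.
SCHEMAS: Dict[str, Dict[str, Tuple[type, ...]]] = {
    "1": {
        "schema_version": (str,),
        "log": (str,),
        "index": (int,),
        "ct_timestamp_ms": (int,),
        "entry_type": (str,),
        "x509": (dict, NoneType),
        "precert": (dict, NoneType),
        "chain": (list,),
        "leaf_guess": (dict, NoneType),
    },
}


def validate_record(record: Dict[str, Any]) -> None:
    """Raise ValueError unless the record matches its declared schema version."""
    version = record.get("schema_version")
    if not isinstance(version, str) or version not in SCHEMAS:
        raise ValueError(f"unknown schema_version: {version!r}")
    schema = SCHEMAS[version]
    unknown = sorted(set(record) - set(schema))
    if unknown:
        # "Just add a field" fails here until the schema version is bumped.
        raise ValueError(f"unknown fields: {unknown}")
    missing = sorted(set(schema) - set(record))
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name, allowed in schema.items():
        value = record[name]
        # bool is a subclass of int in Python; no field here accepts it.
        if isinstance(value, bool) or not isinstance(value, allowed):
            raise ValueError(f"field {name!r} has type {type(value).__name__}")
```

Calling validate_record immediately before the write means a record with an extra or missing field never reaches the published artifact.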

Publishing manifest.json with SHA-256 hashes serves the same goal: consumers can check each daily artifact against its recorded hash, which makes the artifacts verifiable.
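
As an illustration, a manifest could be produced and checked roughly like this. The layout used here (an "algorithm" key plus a "files" map of file name to hex digest) is an assumption, not the documented ct-cert-feed manifest format, and the function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import List

MANIFEST_NAME = "manifest.json"


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(artifact_dir: Path) -> Path:
    """Write manifest.json covering every other regular file in the directory."""
    files = sorted(
        p for p in artifact_dir.iterdir()
        if p.is_file() and p.name != MANIFEST_NAME
    )
    manifest = {
        "algorithm": "sha256",
        "files": {p.name: sha256_file(p) for p in files},
    }
    out = artifact_dir / MANIFEST_NAME
    out.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return out


def verify_manifest(artifact_dir: Path) -> List[str]:
    """Return the names of files that are missing or whose hash differs."""
    manifest = json.loads((artifact_dir / MANIFEST_NAME).read_text(encoding="utf-8"))
    failures = []
    for name, expected in manifest["files"].items():
        path = artifact_dir / name
        if not path.is_file() or sha256_file(path) != expected:
            failures.append(name)
    return sorted(failures)
```

A consumer who downloads a day's artifacts alongside the manifest can run verify_manifest and treat any non-empty result as a reason to discard or re-fetch.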