By AUJay
How Do Modern Proof Aggregation Layers Keep Latency Low When Batching Mixed Plonk, STARK, and zkVM Proofs?
Short description: Proof aggregation layers hit low end-to-end latency by normalizing heterogeneous proofs up front, routing them through parallel queues, chunked recursion trees, and GPU‑pipelined provers, then wrapping to a single cheap on‑chain verifier—with dynamic batch sizing, straggler mitigation, and dual settlement modes to balance UX and finality. This post distills concrete architectures, latency math, and emerging best practices from live systems and recent research.
Why this matters now
Teams increasingly need to batch heterogeneous proofs—Plonk/Halo2, STARKs, and zkVM receipts (SP1, RISC Zero)—for cross‑chain updates, rollup checkpoints, ZK coprocessors, and compliance flows. Done naively, batching adds seconds to minutes of queueing and proving time. Done well, today’s layers routinely confirm in a few seconds on L2s and settle in a single Ethereum transaction on L1, at a few hundred thousand gas per batch. (blog.alignedlayer.com)
The modern low‑latency architecture (what the fastest stacks actually do)
- Intake and normalization at the “verification layer” edge
- Parse and sanity‑check multiple proof families (Groth16/Plonk/Halo2, STARK receipts, zkVM receipts) into a canonical metadata schema: curve points, public IO sizes, versioned verification keys, and normalized byte layouts.
- Use per‑system parallel queues so a malformed or slow STARK submission doesn’t block fast Groth16 micro‑proofs.
This design shows up in production “verification layers” and internal aggregators. (docs.layeredge.io)
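A minimal intake sketch of this normalization step, with nothing assumed about any particular layer's API (the `CanonicalProof` fields, bounds, and queue naming are illustrative):

```python
from dataclasses import dataclass
from enum import Enum

class ProofSystem(Enum):
    GROTH16 = "groth16"
    PLONK = "plonk"
    HALO2_KZG = "halo2_kzg"
    STARK = "stark"
    ZKVM_RECEIPT = "zkvm_receipt"

@dataclass(frozen=True)
class CanonicalProof:
    """Normalized metadata the aggregator consumes; field names are illustrative."""
    system: ProofSystem
    vk_hash: bytes          # commitment to a versioned verification key
    public_inputs: bytes    # normalized byte layout
    proof_bytes: bytes
    queue: str              # per-system queue, so slow families don't block fast ones

def normalize(system: ProofSystem, vk_hash: bytes, public_inputs: bytes,
              proof_bytes: bytes, max_io_bytes: int = 4096) -> CanonicalProof:
    # Structural sanity checks only -- full verification happens downstream.
    if len(public_inputs) > max_io_bytes:
        raise ValueError("public IO exceeds bound")
    if not proof_bytes:
        raise ValueError("empty proof")
    return CanonicalProof(system, vk_hash, public_inputs, proof_bytes,
                          queue=f"queue:{system.value}")
```

Emitting canonical metadata here is what keeps the downstream aggregator branch‑free: every later stage sees one schema, not five proof formats.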
- Route into parallel, type‑aware fast paths
- SNARKs (Groth16/Plonk/Halo2): batch/aggregate directly (e.g., SnarkPack for Groth16; Halo2‑KZG aggregation).
- STARKs and zkVMs: first compose/recursively fold receipts, then wrap once to a succinct SNARK for cheap on‑chain verification.
- Mixed batches: reduce each proof to a “verification gadget” inside a common reduction circuit, then produce one super‑proof. Public implementations support Groth16/Plonk, Plonky2, Halo2‑KZG, SP1, and RISC0 in one pipeline. (research.protocol.ai)
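The three fast paths above amount to a small dispatch decision on the set of proof systems in a batch. A sketch, where the path names and system labels are illustrative rather than any vendor's API:

```python
def route(systems: set[str]) -> str:
    """Pick an aggregation path for a batch based on the proof families present."""
    SNARKS = {"groth16", "plonk", "halo2_kzg"}
    STARK_LIKE = {"stark", "sp1", "risc0"}
    if systems <= SNARKS:
        return "direct-snark-aggregation"   # e.g., SnarkPack / Halo2-KZG batching
    if systems <= STARK_LIKE:
        return "fold-then-wrap"             # recursive composition, then one SNARK wrap
    return "reduction-circuit"              # verify each proof inside a common circuit
```

Homogeneous batches take the cheapest specialized path; only genuinely mixed batches pay for the general reduction circuit.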
- Chunked recursion trees to avoid head‑of‑line blocking
- Aggregate in fixed‑size chunks (e.g., 256) and then aggregate the aggregations. This prevents a few very heavy receipts from delaying a full batch. Live systems have shipped “aggregate proofs of aggregated proofs” and chunked pipelines. (blog.alignedlayer.com)
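Structurally, a chunked recursion tree is just repeated fixed-size grouping until one root remains. A toy sketch in which a hash stands in for a recursive aggregation proof (`aggregate_tree` and `toy_agg` are illustrative names, not a real prover API):

```python
import hashlib
from typing import Callable, Sequence

def aggregate_tree(leaves: Sequence[str], chunk: int,
                   agg: Callable[[Sequence[str]], str]) -> str:
    """Aggregate fixed-size chunks, then aggregate the chunk roots, recursively.
    A heavy leaf only delays its own chunk, not the whole batch."""
    level = list(leaves)
    while len(level) > 1:
        level = [agg(level[i:i + chunk]) for i in range(0, len(level), chunk)]
    return level[0]

def toy_agg(xs: Sequence[str]) -> str:
    # Stand-in for producing a recursive aggregation proof over one chunk.
    return hashlib.sha256("".join(xs).encode()).hexdigest()
```

Chunks at the same level are independent, so they can run on separate GPUs; the critical path is the tree depth times the slowest chunk, not the sum of all chunks.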
- Dual settlement modes to trade latency vs. finality
- “Fast mode” (off‑chain verification layer): verify proofs off‑chain with a decentralized operator set, post a small attestation (e.g., aggregated BLS signature) to L1/L2 for near‑instant confirmations; throughput scales to thousands of verifications/sec.
- “Hard‑finality mode” (on‑chain aggregation): recursively compress many proofs into one SNARK verified on Ethereum (~300k gas), posted on a cadence that your app can tolerate. Deployed stacks expose both options. (blog.alignedlayer.com)
- GPU‑pipelined proving and witness/prover decoupling
- Keep GPUs busy with pipeline stages (sum‑check, Merkle, encoders) and overlap transfers with compute; recent systems report 259× throughput gains and sub‑second proof generation for targeted workloads.
- Separate witness generation from proving and auto‑partition large circuits to balance stage times, improving resource utilization and end‑to‑end latency. (eprint.iacr.org)
Where latency actually hides (and how the best teams shave it)
Latency = queueing to fill a batch + proving/aggregation + any wrapper + on‑chain inclusion.
- Queueing to fill batch N at arrival rate λ: the window takes ≈ N/λ to fill, so the first-arriving proof waits ≈ N/λ and the average proof ≈ N/(2λ); with a time cap T, median wait ≈ T/2.
- Aggregation/recursion: seconds if GPU‑pipelined and chunked; minutes if you over‑batch or wrap slowly.
- Wrapping: some stacks add a constant overhead to convert STARK→SNARK; for SP1 today, ~6s extra for Groth16 and ~70s for Plonk.
- On‑chain inclusion: L2s confirm in 1–2 blocks; Ethereum L1 finality is minutes, but many apps accept economic finality earlier. (7blocklabs.com)
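The four components above can be added up directly. A minimal p50 estimator, assuming steady arrivals at rate λ and an optional fill-window cap (the function and parameter names are illustrative):

```python
def batch_latency_s(n: int, arrival_rate: float, prove_s: float,
                    wrap_s: float, inclusion_s: float,
                    window_cap_s: float = float("inf")) -> float:
    """End-to-end estimate: queueing to fill + proving/aggregation + wrapper
    + on-chain inclusion. With a time cap T, the fill phase is min(n/rate, T)."""
    fill = min(n / arrival_rate, window_cap_s)
    return fill + prove_s + wrap_s + inclusion_s
```

For example, N = 32 at 10 proofs/s with 3 s of aggregation, a 6 s Groth16 wrap, and 2 s of L2 inclusion budgets to about 14 s, matching the L2 worked example later in this post.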
Concrete public numbers you can budget against:
- Groth16 aggregation (SnarkPack): aggregates 8,192 proofs in ≈8–9s on a 32‑core CPU; verifies in ≈33 ms off‑chain. On‑chain, Groth16 verifiers are typically a few hundred k gas. (research.protocol.ai)
- Halo2‑KZG aggregation (e.g., Nebra UPA): ≈350k gas base + ≈7k gas per included proof; ≈18–22k gas/proof at N≈32. (blog.nebra.one)
- zkVM receipts on EVM: SP1 Groth16 ≈270–300k gas; Plonk ≈300k gas; proof sizes ≈260 bytes (Groth16) or ≈868 bytes (Plonk). (docs.succinct.xyz)
- Mixed‑scheme aggregation on Ethereum via a reduction circuit (Electron): super‑proof verify ≈380k gas; inclusion checks ≈16k gas per micro‑proof. (docs.electron.dev)
- Off‑chain verification layer (Aligned): mainnet beta live since 2024 with ~200 verifications/s observed; tested >4,000/s; aggregated attestation ~113k gas; aggregation service: ~300k gas per aggregated proof (minutes‑scale cadence). (blog.alignedlayer.com)
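As a sanity check on figures like the Halo2‑KZG numbers above, amortized gas is a base-plus-linear formula divided by batch size:

```python
def gas_per_proof(base_gas: int, per_proof_gas: int, n: int) -> float:
    """Amortized on-chain gas for an aggregated batch of n proofs:
    (fixed verifier cost + marginal cost per included proof) / n."""
    return (base_gas + per_proof_gas * n) / n
```

Plugging in a ~350k base and ~7k per proof at N = 32 gives ≈17.9k gas/proof, consistent with the ≈18–22k range quoted above; the fixed cost dominates at small N, which is why batch size drives the economics.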
How layers keep heterogeneous batches fast in practice
- Normalize first, verify later
- Proofs arrive as Risc0 SuccinctReceipts, SP1 compressed/Groth16/Plonk proofs, Halo2‑KZG, Plonky2, Groth16 from gnark/circom, etc.
- The intake layer validates structure (curve checks, lengths, public IO bounds) and emits canonical metadata for the aggregator, keeping downstream logic branch‑free and parallel. (docs.layeredge.io)
- Build recursion trees that cap wait time
- Chunk aggregation so each chunk’s wall‑clock stays within your SLO (e.g., ≤1–3s on your GPU tier), then fold chunk roots. Aligned’s engineering notes describe splitting aggregation “in chunks” and then aggregating the chunk roots. (blog.alignedlayer.com)
- Stream with folding when arrivals are continuous
- Use folding/IVC (Nova/HyperNova/MicroNova/Mova families) to absorb each new proof at ~O(1) incremental cost, publishing frequent “heads” and SNARK‑compressing on a slower cadence. This minimizes queueing and smooths p95 tails. (eprint.iacr.org)
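A toy streaming accumulator, using a hash chain as a stand-in for a Nova-style folding step (the real accumulator is algebraic, but the control flow—constant-cost absorb, frequent cheap heads, slower compression—is the same):

```python
import hashlib

def fold(acc: bytes, proof: bytes) -> bytes:
    # Toy O(1) folding step; a hash chain stands in for the folded instance.
    return hashlib.sha256(acc + proof).digest()

def stream(proofs, publish_every: int = 8):
    """Absorb each arrival at constant incremental cost; publish frequent heads.
    SNARK-compression of the accumulator happens on a slower, separate cadence."""
    acc, heads = b"\x00" * 32, []
    for i, p in enumerate(proofs, 1):
        acc = fold(acc, p)
        if i % publish_every == 0:
            heads.append(acc)   # cheap head a consumer can track immediately
    return acc, heads
```

Because no batch window exists, a proof's queueing delay is bounded by the head-publication interval rather than by how fast the batch fills.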
- Treat STARK→SNARK as a constant in the budget
- zkVM stacks designed for L1 verification do recursion in STARK land and then emit a single Groth16 or Plonk proof:
- RISC Zero: STARK execution + STARK recursion + R1CS “STARK‑to‑SNARK” wrapper → Groth16 receipt verified by a canonical Solidity contract.
- SP1: STARK recursion + Groth16 (~260 B, ~270k gas) or Plonk (~868 B, ~300k gas) for on‑chain verification.
Account for the fixed wrapping overhead when setting batch windows. (dev.risczero.com)
- GPU pipelines and witness/prover decoupling
- Pipeline sum‑check/Merkle/encoding kernels and overlap PCIe copies; BatchZK reports >259× throughput vs. prior GPU systems.
- Partition circuits and run witness‑gen concurrently with proving (Yoimiya), aligning stage times to eliminate idle gaps. (eprint.iacr.org)
- Keep the on‑chain verifier cheap and future‑proof
- Ethereum’s bn128 precompiles (EIP‑1108) make Groth16 verification affordable and near‑constant in gas (the pairing check costs 34,000·k + 45,000 gas for k pairings); use Groth16 when you want the smallest proof and lowest gas.
- EIP‑4844 adds the 0x0A KZG point‑evaluation precompile (50k gas) for blob DA; combine with BLS12‑381 precompiles from EIP‑2537 (0x0b–0x11) for fast signature aggregation in off‑chain verification layers and bridges.
- EIP‑7623 increases calldata price for data‑heavy bundles, further favoring SNARK‑wrapped proofs over raw STARK verification on L1. (eips.ethereum.org)
- Offer two settlement paths (and let integrators choose)
- Fast confirmations: verify many proofs off‑chain (restaked operator set) and post one attestation (BLS aggregate) → sub‑second UX on many L2s.
- Hard L1 finality: post an aggregated SNARK every few minutes or at a target block cadence. Many production stacks expose both APIs. (blog.alignedlayer.com)
Worked latency budgets you can copy
A) Mixed batch to an L2 (target p50 ≤ 10–15 s)
- Arrival rate λ ≈ 10/s; choose chunk N = 32 → expected fill ≈ 3.2 s.
- Aggregation (GPU‑pipelined Halo2‑KZG) ≈ 2–4 s for the chunk.
- zkVM items in the mix: add STARK→Groth16 wrap if you don’t pre‑wrap receipts (SP1 Groth16 wrap ≈ +6 s).
- L2 inclusion: ~1–2 blocks (≈1–3 s).
Total: ~7–14 s to confirmation, with 20–40k gas per included proof on L2 when using Halo2‑KZG aggregation. If arrivals drop to 1/s, the window dominates—use folding or smaller N. (blog.nebra.one)
B) L1 “hard finality” batch (target cost and finality, not raw speed)
- Aggregate heterogeneous inputs into a single Groth16 super‑proof (~300–400k gas to verify).
- Users verify inclusion with a ~16–25k‑gas call later (Merkle/bitmap check).
- Post every X minutes or Y proofs, whichever first (common cadence: several times/day on testnets; production depends on fee markets). (docs.electron.dev)
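The “every X minutes or Y proofs, whichever first” cadence is a two-condition flush policy. A minimal sketch (names are illustrative):

```python
def should_post(pending: int, elapsed_s: float,
                max_proofs: int, max_wait_s: float) -> bool:
    """Flush when the count target is hit, or when the oldest pending proof
    has waited past the time cap; never flush an empty batch."""
    return pending >= max_proofs or (pending > 0 and elapsed_s >= max_wait_s)
```

The count condition caps gas amortization loss; the time condition caps worst-case latency when arrivals slow down.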
Concrete examples of mixed‑proof latency playbooks
- Low‑latency cross‑chain updates (oracle or intent settlement)
- Ingest publisher SNARKs (Groth16 via gnark/circom), RISC0 succinct receipts, and SP1 Groth16 receipts.
- Off‑chain verification layer verifies all receipts in parallel; posts one BLS‑aggregated attestation to Ethereum or to destination L2 for near‑instant consumption.
- Periodically, the aggregation service emits a recursive SNARK to Ethereum (~300k gas) to anchor hard finality. This hybrid keeps p50 confirmations to a few seconds while maintaining L1‑level checkpoints. (blog.alignedlayer.com)
- Rollup proof consolidation for fee and exit‑time reduction
- Rollups generating STARK proofs (and even non‑ZK stacks via pessimistic proofs) submit to an aggregation network.
- Recursive composition + SNARK wrapping collapses many L2 blocks into one L1 proof, cutting per‑block fixed L1 costs and exit windows; Polygon’s AggLayer additionally uses “pessimistic proofs” to make multi‑stack interop safe. (polygon.technology)
- ZK coprocessor hub (zkVM‑heavy, multi‑tenancy)
- Accept SP1 compressed proofs (or Groth16/Plonk receipts) and RISC0 Groth16 receipts.
- Use chunked recursion trees so different tenants don’t block each other; cap chunk wall‑time at ~2 s; emit a new head every block on your target L2; SNARK‑wrap and post to L1 every 5–10 min.
- Expect verifier gas of ~270–380k per batch on L1 and ~20k–25k per on‑chain inclusion check by end users. (docs.succinct.xyz)
Gas, bytes, and why wrapped proofs win on Ethereum L1
- A single raw STARK verify is multi‑million gas with large calldata; wrapped Groth16/Plonk proofs verify at ~230k–300k gas with sub‑KB calldata. If EIP‑7623 raises calldata price, the gap widens. That’s why STARK→SNARK wrapping is the default for L1 settlement unless you verify off‑chain. (7blocklabs.com)
- Groth16 got cheaper post‑EIP‑1108; keep public inputs tiny to minimize pairings/MSM. On EVM, that often beats verifying heterogeneous proofs separately—even when each micro‑proof is “small.” (eips.ethereum.org)
- EIP‑4844’s KZG precompile (0x0A, 50k gas) helps DA, but not proof calldata; it pairs nicely with EIP‑2537’s BLS precompiles for fast BLS aggregation in bridges/verification layers. (eips.ethereum.org)
Emerging best practices we’re implementing with clients
- Prefer streaming accumulation (folding/IVC) when arrivals are continuous. Publish a new recursive head often (e.g., per block on L2), SNARK‑compress on a slower cadence. This caps queueing and p95 tails. (eprint.iacr.org)
- Keep batch windows ≤ one‑third of target block time on your settlement chain. This rule‑of‑thumb maximizes “next‑block” inclusion probability while retaining amortization benefits. (7blocklabs.com)
- Pre‑wrap zkVM receipts if you need predictable p50. SP1’s Groth16/Plonk wrappers have fixed overheads; budget ~6 s extra for Groth16 and ~70 s for Plonk when wrapping on demand. (docs.succinct.xyz)
- Use chunked aggregation with hedged execution. Start a backup aggregation for slow chunks after a percentile cutoff (e.g., p80) to reduce straggler impact and keep the batch cadence tight. Live systems split aggregation into chunks for exactly this reason. (blog.alignedlayer.com)
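A hedged chunk run can be sketched with two workers and a cutoff; in production `task` would be a chunk aggregation job and the cutoff a measured p80 of chunk wall-clock (both assumptions here, not any specific scheduler's API):

```python
import concurrent.futures as cf

def hedged(task, hedge_after_s: float):
    """Run `task`; if it hasn't finished by the cutoff, launch a backup run
    and take whichever finishes first. Trades duplicated work for tail latency."""
    with cf.ThreadPoolExecutor(max_workers=2) as pool:
        primary = pool.submit(task)
        done, _ = cf.wait([primary], timeout=hedge_after_s)
        if done:
            return primary.result()          # fast path: no hedge needed
        backup = pool.submit(task)           # hedge: duplicate the slow chunk
        done, _ = cf.wait([primary, backup], return_when=cf.FIRST_COMPLETED)
        return done.pop().result()
```

The cutoff should sit above the healthy-case distribution (hence a percentile like p80), so hedges fire only on genuine stragglers and wasted duplicate work stays small.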
- Engineer the prover like a streaming system. Apply GPU pipelining (BatchZK) and split witness generation (Yoimiya) to align stage times and avoid idle hardware. (eprint.iacr.org)
- Treat verification as a product surface. Offer both “fast soft‑finality now” via an off‑chain verification layer and “hard L1 finality later” via recursive SNARK settlement. Many users will want both. (blog.alignedlayer.com)
- Mind fee markets and evolving precompiles. EIP‑2537 unlocks BLS‑based aggregation patterns; EIP‑7623 penalizes calldata‑heavy bundles; design verifiers/public inputs accordingly. (eips.ethereum.org)
Buyer’s checklist for an aggregation layer (with numbers)
- Proof types supported today (docs > code > audits): Groth16, Plonk/Halo2‑KZG, Plonky2, SP1 receipts, RISC0 receipts. Ask for gas numbers and inclusion‑proof API. (docs.electron.dev)
- On‑chain gas per aggregated batch: target ~300–400k; per‑proof inclusion query ~16–25k. (docs.electron.dev)
- Throughput and p50/p95 latency in production, not test rigs (look for hundreds–thousands verifications/sec off‑chain; minutes‑cadence on-chain aggregation only if you need it). (blog.alignedlayer.com)
- Chunk size and policy (e.g., 256) and SLO‑based flush (time‑or‑count, whichever first). (blog.alignedlayer.com)
- zkVM wrap options and overheads (SP1/R0VM Groth16 vs. Plonk) and whether pre‑wrapping is available for low‑latency paths. (docs.succinct.xyz)
- Audit posture for reduction/aggregation circuits (look for recent audits and formal‑methods progress on zkVM stacks). (veridise.com)
Final takeaway
“Low‑latency mixed‑proof aggregation” isn’t about one clever cryptographic trick. It’s a systems problem: normalize early, parallelize aggressively, chunk and fold to cap wait times, pipeline the GPU, and post one tiny, cheap proof when you must. With today’s verification layers and recursive wrappers, you can hit sub‑15‑second confirmations on L2s and a single ~300k‑gas transaction on Ethereum for hard finality—even when your batches mix Plonk, STARK, and zkVM proofs. (blog.alignedlayer.com)
References (selected)
- SnarkPack (Groth16 aggregation) performance. (research.protocol.ai)
- Nebra UPA gas math (Halo2‑KZG aggregation). (blog.nebra.one)
- Electron (mixed‑scheme super‑proofs) gas. (docs.electron.dev)
- SP1 proof types, gas, and wrapping behavior. (docs.succinct.xyz)
- RISC Zero recursion and STARK→SNARK pipeline. (dev.risczero.com)
- Aligned Layer throughput and dual‑mode settlement. (blog.alignedlayer.com)
- EIP‑1108 (bn128 repricing), EIP‑4844 (0x0A KZG precompile), EIP‑2537 (BLS12‑381 precompiles), EIP‑7623 (calldata cost). (eips.ethereum.org)
- BatchZK (GPU‑pipelined proving) and Yoimiya (pipeline witness/prover). (eprint.iacr.org)
- AggLayer pessimistic proofs for secure multi‑stack interop. (polygon.technology)