7Block Labs
Blockchain Technology

By AUJay

How Do Modern Proof Aggregation Layers Keep Latency Low When Batching Mixed Plonk, STARK, and zkVM Proofs?

Why this matters now

Teams increasingly need to batch heterogeneous proofs--Plonk/Halo2, STARKs, and zkVM receipts (SP1, RISC Zero)--for cross-chain updates, rollup checkpoints, ZK coprocessors, and compliance flows. Batch the wrong way and you add anywhere from seconds to minutes of queueing and proving time. Done right, today's layers deliver confirmations in a few seconds on L2s and settle with a single Ethereum L1 transaction costing only a few hundred thousand gas per batch.


The modern low‑latency architecture (what the fastest stacks actually do)

1) Intake and Normalization at the "Verification Layer" Edge

  • Intake parses and sanity-checks multiple proof families--Groth16, Plonk, Halo2, STARK receipts, and zkVM receipts--and maps them into a standard metadata schema: curve points, public IO sizes, versioned verification keys, and a normalized byte layout.
  • Separate parallel queues per proof system ensure that a slow or malformed STARK submission never holds up fast Groth16 micro-proofs.

This design appears in production "verification layers" as well as internal aggregators. (docs.layeredge.io)
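As a sketch, the intake stage can be modeled as a typed envelope plus per-system queues. The schema fields, class names, and validation rules below are illustrative assumptions, not any particular layer's API:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ProofEnvelope:
    """Normalized metadata for one submitted proof (illustrative schema)."""
    system: str          # e.g. "groth16", "plonk", "stark", "sp1", "risc0"
    vk_version: str      # versioned verification key identifier
    public_io_len: int   # number of public inputs, bounds-checked at intake
    payload: bytes       # proof bytes in a normalized layout


class IntakeRouter:
    """Per-system queues so a slow STARK cannot block fast Groth16 proofs."""

    def __init__(self, systems):
        self.queues = {s: deque() for s in systems}

    def submit(self, env: ProofEnvelope):
        if env.system not in self.queues:
            raise ValueError(f"unsupported proof system: {env.system}")
        if env.public_io_len < 0 or not env.payload:
            raise ValueError("malformed submission rejected at the edge")
        self.queues[env.system].append(env)


router = IntakeRouter(["groth16", "plonk", "stark", "sp1", "risc0"])
router.submit(ProofEnvelope("groth16", "vk-v2", 4, b"\x01" * 128))
router.submit(ProofEnvelope("stark", "vk-v1", 16, b"\x02" * 4096))
```

Because each queue drains independently, a downstream aggregator can pull from the fast queues without waiting on the slow ones.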

2) Route into Parallel, Type‑Aware Fast Paths

  • SNARKs (Groth16/Plonk/Halo2): batch or aggregate directly--see SnarkPack for Groth16 and Halo2‑KZG aggregation.
  • STARKs and zkVMs: compose or recursively fold the receipts, then wrap once more into a succinct SNARK so on-chain verification stays affordable.
  • Mixed batches: reduce each proof to a "verification gadget" inside a common reduction circuit, then produce one super-proof. Public implementations handle Groth16/Plonk, Plonky2, Halo2‑KZG, SP1, and RISC0 in a single pipeline. (research.protocol.ai)

3) Chunked Recursion Trees to Avoid Head-of-Line Blocking

  • Instead of aggregating everything at once, split the batch into fixed-size chunks (e.g. 256) and then aggregate the chunk roots. This keeps a few very heavy receipts from stalling the entire batch; live systems already run "aggregate proofs of aggregated proofs" and chunked pipelines. (blog.alignedlayer.com)
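A minimal sketch of the two-level chunked tree, with placeholder functions standing in for real recursive provers (the names, tuple shapes, and chunk size are assumptions for illustration only):

```python
CHUNK_SIZE = 256  # matches the fixed chunk size discussed above


def chunk(proofs, size=CHUNK_SIZE):
    """Split the incoming batch into fixed-size chunks."""
    return [proofs[i:i + size] for i in range(0, len(proofs), size)]


def aggregate_chunk(chunk_proofs):
    # Placeholder for real recursive aggregation of one chunk; in a live
    # system each chunk is proved independently (and in parallel).
    return ("chunk_root", tuple(chunk_proofs))


def aggregate_roots(roots):
    # "Aggregate proof of aggregated proofs": one super-root over chunk roots.
    return ("super_root", tuple(roots))


proofs = [f"proof_{i}" for i in range(600)]
roots = [aggregate_chunk(c) for c in chunk(proofs)]
super_proof = aggregate_roots(roots)
```

The key property: a straggler receipt delays only its own 256-proof chunk, never the other chunks or the final merge.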

4) Dual Settlement Modes: Balancing Latency and Finality

  • Fast Mode (off-chain verification layer): proofs are verified off-chain by a decentralized operator set, and a tiny attestation (e.g. an aggregated BLS signature) is posted to L1/L2 for quick confirmations. This scales to thousands of verifications per second.
  • Hard-Finality Mode (on-chain aggregation): many proofs are compressed into a single SNARK verified on Ethereum for roughly ~300k gas, with posting frequency tuned to what the application can tolerate. Deployed stacks offer both modes. (blog.alignedlayer.com)

5) GPU‑Pipelined Proving and Witness/Prover Decoupling

  • Keep GPUs saturated by pipelining stages (sum-check, Merkle, encoders) and overlapping transfers with compute. Recent systems report up to 259× throughput gains and sub-second proofs for specific workloads.
  • Splitting witness generation from proving and automatically partitioning large circuits balances time across stages, improving utilization and reducing end-to-end latency.

Where latency actually hides (and how the best teams shave it)

Latency breaks down into four parts: time waiting to fill a batch, proving and aggregation time, any wrapper overhead, and on-chain inclusion.

  • Queueing: filling batch N at arrival rate λ takes roughly N/λ; with a time cap T in play, the median wait is about T/2.
  • Aggregation and recursion: seconds with GPU pipelining and chunking; minutes if you over-batch or take your time.
  • Wrapping: some stacks add a constant overhead converting STARK to SNARK. For SP1 today, expect roughly +6 seconds for Groth16 and ~70 seconds for Plonk.
  • On-chain inclusion: L2s confirm in 1-2 blocks; Ethereum L1 takes minutes to finalize, though many applications accept economic finality earlier. (7blocklabs.com)
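The queueing bullets above reduce to simple arithmetic, sketched here (the function names are ours, and the flush-on-timeout median is a rough approximation):

```python
def expected_fill_wait_s(batch_size, arrivals_per_s):
    """Expected time to fill a batch of N proofs at arrival rate lambda: N/lambda."""
    return batch_size / arrivals_per_s


def median_wait_s(batch_size, arrivals_per_s, cap_s):
    """Rough median per-proof wait under a flush-on-timeout policy.

    With a time cap T, the batch flushes at min(N/lambda, T), and a proof
    arriving uniformly within the window waits about half of that.
    """
    return min(expected_fill_wait_s(batch_size, arrivals_per_s), cap_s) / 2


# Example: N = 32 at 10 proofs/s fills in ~3.2 s; with a 2 s cap the
# timeout fires first and the median wait is ~1 s.
fill = expected_fill_wait_s(32, 10)
median = median_wait_s(32, 10, cap_s=2.0)
```

This is why low-traffic deployments should shrink N or add a time cap: at 1 proof/s, the same N = 32 batch would sit in the queue for half a minute.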

Concrete public numbers you can plan your budget around:

  • Groth16 aggregation (SnarkPack): aggregates 8,192 proofs in roughly 8-9 seconds on a 32-core CPU and verifies off-chain in about 33 milliseconds; on-chain verification runs a few hundred thousand gas.
  • Halo2‑KZG aggregation (e.g. Nebra UPA): roughly 350k gas base plus ~7k gas per included proof; at N≈32 that amortizes to about 18-22k gas per proof.
  • zkVM receipts on EVM: SP1 Groth16 costs roughly 270-300k gas, Plonk around 300k. Proof sizes are ~260 bytes (Groth16) and ~868 bytes (Plonk).
  • Mixed-scheme aggregation on Ethereum via a reduction circuit (Electron): ~380k gas to verify a super-proof, plus ~16k gas per micro-proof inclusion check.
  • Off-chain verification layer (Aligned): mainnet beta live since 2024, handling ~200 verifications per second with tests above 4,000/s. The aggregated attestation costs ~113k gas, and the aggregation service runs ~300k gas per aggregated proof at a minutes-scale cadence.
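The amortization behind these figures is easy to reproduce; the base and per-proof costs below are the published numbers quoted above:

```python
def amortized_gas(base_gas, per_proof_gas, n_proofs):
    """Per-proof cost of a batch: fixed base spread over N, plus marginal cost."""
    return base_gas / n_proofs + per_proof_gas


# Halo2-KZG aggregation (Nebra UPA style): ~350k base + ~7k per proof.
upa_at_32 = amortized_gas(350_000, 7_000, 32)        # ~17.9k gas per proof

# Mixed-scheme super-proof (Electron style): ~380k verify + ~16k inclusion.
electron_at_32 = amortized_gas(380_000, 16_000, 32)  # ~27.9k gas per proof
```

The UPA result lands right at the low end of the quoted 18-22k range; larger N pushes it toward the 7k marginal floor.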

How layers keep heterogeneous batches fast in practice

1) Normalize First, Verify Later

  • Proofs arrive in many forms: RISC0 SuccinctReceipts, SP1 compressed proofs, Groth16 and Plonk proofs, Halo2‑KZG, Plonky2, plus Groth16 from gnark/circom, and more.
  • The intake layer validates structure--curves, lengths, public IO bounds--and emits metadata for the aggregator, keeping the downstream logic branch-free and parallel. (docs.layeredge.io)

2) Build Recursion Trees that Cap Wait Time

  • Split work into chunks sized so each chunk's wall-clock stays within your Service Level Objective (SLO)--roughly 1 to 3 seconds on your GPU tier--then merge the chunk roots. Aligned's engineering notes describe splitting aggregation "in chunks" and merging the chunk roots in exactly this way.
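One way to derive a chunk size from the SLO, assuming you have a measured per-proof aggregation time on your hardware (the 40 ms figure below is purely illustrative, not a benchmark):

```python
def max_chunk_for_slo(per_proof_ms, slo_ms, hard_cap=256):
    """Largest chunk whose estimated wall-clock fits the SLO, up to a cap.

    per_proof_ms: measured per-proof aggregation time on your GPU tier.
    slo_ms: chunk wall-clock budget.
    hard_cap: ceiling from circuit/memory limits (assumed value).
    """
    return max(1, min(slo_ms // per_proof_ms, hard_cap))


# E.g. ~40 ms/proof against a 2 s chunk SLO allows chunks of up to 50 proofs;
# a very fast 5 ms/proof path hits the hard cap instead.
n_slow = max_chunk_for_slo(40, 2_000)
n_fast = max_chunk_for_slo(5, 2_000)
```

Re-derive the chunk size whenever the GPU tier or circuit changes, since the per-proof cost is the only measured input.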
3) Stream with Folding for Continuous Arrivals

  • Use folding/IVC (the Nova/HyperNova/MicroNova/Mova families) to absorb each new proof at roughly O(1) added cost. This lets you publish updates frequently while SNARK-compressing at a more relaxed cadence, minimizing queueing and smoothing p95 tails. (eprint.iacr.org)

4) Treat STARK→SNARK as a Constant in Your Budget

  • zkVM stacks built for L1 verification run STARK recursion and then produce a single Groth16/Plonk output:
    • RISC Zero: combines STARK execution with STARK recursion, then an R1CS "STARK-to-SNARK" wrapper produces a Groth16 receipt verified by a standard Solidity contract.
    • SP1: runs STARK recursion and verifies on-chain via either Groth16 (~260 bytes, ~270k gas) or Plonk (~868 bytes, ~300k gas).

Factor the fixed wrapping overhead into your batch windows. (dev.risczero.com)

5) GPU Pipelines and Witness/Prover Decoupling

  • Pipeline sum-check/Merkle/encoding kernels alongside PCIe copies: BatchZK reports over 259× the throughput of earlier GPU systems.
  • Partition circuits and run witness generation concurrently with proving (Yoimiya) to align stage times and eliminate idle gaps. (eprint.iacr.org)

6) Keep the on-chain verifier cheap and future-proof

  • With Ethereum's bn128 precompiles (EIP-1108), Groth16 verifiers are essentially constant gas: the pairing precompile costs 34,000·k + 45,000 for k pairings. For the smallest proof and lowest gas, Groth16 is the way to go.
  • EIP-4844 added the 0x0A KZG point-evaluation precompile (~50k gas) for blob data availability. Combined with the BLS12-381 precompiles from EIP-2537 (0x0b-0x11), it enables fast signature aggregation in off-chain verification layers and bridges.
  • EIP-7623 raises calldata pricing for data-heavy bundles, making SNARK-wrapped proofs even more attractive than raw STARK verification on L1. (eips.ethereum.org)
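The EIP-1108 pairing price quoted above is straightforward to evaluate. A Groth16 verification equation typically reduces to a multi-pairing over 3-4 pairs plus a small MSM over the public inputs, which is why its gas cost is nearly constant:

```python
def pairing_gas(k):
    """EIP-1108 bn128 pairing precompile price for k pairings."""
    return 34_000 * k + 45_000


# A 4-pairing check (a typical Groth16 shape) costs 181k gas at the
# precompile, before adding MSM and dispatch overhead.
groth16_pairing_cost = pairing_gas(4)
```

Keeping public inputs small matters because the MSM term, not the pairing term, is the only part that grows with the statement.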

7) Offer two settlement paths (and let integrators choose)

  • Fast confirmations: verify many proofs off-chain with a restaked operator set, then post a single BLS-aggregated attestation--a sub-second user experience on many L2s.
  • Hard L1 finality: post an aggregated SNARK every few minutes or at a fixed block cadence. Many production stacks expose both APIs. (blog.alignedlayer.com)

Worked latency budgets you can copy

A) Mixed Batch to an L2 (Target p50 ≤ 10-15 s)

  • At arrival rate λ ≈ 10/s, a chunk size of N = 32 gives an expected fill time of ~3.2 seconds.
  • GPU-pipelined Halo2-KZG aggregation takes roughly 2-4 seconds per chunk.
  • If the batch mixes in zkVM items, add a STARK→Groth16 wrap unless the receipts arrive pre-wrapped (the SP1 Groth16 wrap adds ~6 seconds).
  • L2 inclusion: 1-2 blocks, roughly 1-3 seconds.

Total: roughly 7-14 seconds to confirmation, at about 20-40k gas per included proof on L2 with Halo2-KZG aggregation. If arrivals drop to 1/s, the window dominates--use folding or a smaller N to keep things moving. (blog.nebra.one)
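Summing budget A's stages as a quick sanity check (the stage times are the ranges given above; the wrap term applies only when a zkVM receipt arrives unwrapped):

```python
def budget_a_s(fill_s, aggregate_s, inclusion_s, wrap_s=0.0):
    """Total confirmation time for budget A: fill + aggregate + wrap + include."""
    return fill_s + aggregate_s + wrap_s + inclusion_s


# Best case: pre-wrapped receipts, fast aggregation, next-block inclusion.
best = budget_a_s(fill_s=3.2, aggregate_s=2.0, inclusion_s=1.0)

# Worst case: on-demand SP1 Groth16 wrap (+6 s), slow chunk, slow inclusion.
worst = budget_a_s(fill_s=3.2, aggregate_s=4.0, inclusion_s=3.0, wrap_s=6.0)
```

The two endpoints land near 6 s and 16 s, bracketing the ~7-14 s p50 target; the wrap term is the single biggest lever, which is why pre-wrapping receipts pays off.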

B) L1 "Hard Finality" Batch (Optimizing Cost and Finality, Not Raw Speed)

  • Aggregate heterogeneous inputs into one Groth16 super-proof (~300-400k gas to verify).
  • Users check inclusion later with a ~16-25k gas call (Merkle/bitmap check).
  • Post every X minutes or after Y proofs, whichever comes first. On testnets this commonly happens several times a day; production cadence depends on fee markets. (docs.electron.dev)
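The "every X minutes or after Y proofs, whichever comes first" policy is a few lines of state. This sketch uses illustrative thresholds (10 minutes / 512 proofs), which in practice come from fee-market and SLO tuning:

```python
import time


class FlushPolicy:
    """Flush a pending batch when either the time window or the count is hit."""

    def __init__(self, max_wait_s, max_proofs):
        self.max_wait_s = max_wait_s
        self.max_proofs = max_proofs
        self.count = 0
        self.window_start = time.monotonic()

    def record(self, n=1):
        """Account for newly queued proofs."""
        self.count += n

    def should_flush(self):
        elapsed = time.monotonic() - self.window_start
        return self.count >= self.max_proofs or elapsed >= self.max_wait_s

    def reset(self):
        """Start a new window after posting a batch."""
        self.count = 0
        self.window_start = time.monotonic()


# Illustrative thresholds: post every 10 minutes or every 512 proofs.
policy = FlushPolicy(max_wait_s=600, max_proofs=512)
```

A production version would also flush early when fee markets dip, but the whichever-comes-first core is the same.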

Concrete examples of mixed‑proof latency playbooks

1) Low-Latency Cross-Chain Updates (Oracle or Intent Settlement)

  • Inputs: publisher SNARKs (Groth16 via gnark/circom), RISC0 succinct receipts, and SP1 Groth16 receipts.
  • An off-chain verification layer checks all receipts concurrently and posts one BLS-aggregated attestation to Ethereum or the target L2 for fast use.
  • Periodically, the aggregation service posts a recursive SNARK to Ethereum (~300k gas) for hard finality. This keeps p50 confirmations to a few seconds while preserving L1-grade checkpoints. (blog.alignedlayer.com)

2) Rollup Proof Consolidation for Fee and Exit-Time Reduction

  • Rollups that produce STARK proofs (and even non-ZK stacks using pessimistic proofs) submit to an aggregation network.
  • Recursive composition and SNARK wrapping compress multiple L2 blocks into a single L1 proof, lowering the fixed L1 cost per block and shrinking exit windows. Polygon's AggLayer adds "pessimistic proofs" for safe multi-stack interoperability. (polygon.technology)

3) ZK Coprocessor Hub (zkVM-heavy, Multi-tenancy)

  • Accepts SP1 compressed proofs (or Groth16/Plonk receipts) alongside RISC0 Groth16 receipts.
  • Chunked recursion trees isolate tenants from one another: cap chunk wall-time at ~2 seconds, publish a new head every block on the chosen L2, and SNARK-wrap to L1 every 5-10 minutes.
  • Expect verifier gas around 270-380k per batch on L1 and roughly 20-25k for end-user inclusion checks. (docs.electron.dev)

Gas, bytes, and why wrapped proofs win on Ethereum L1

  • Verifying a single raw STARK can cost millions of gas and huge calldata, while wrapped Groth16/Plonk proofs land around 230-300k gas with sub-kilobyte data. As EIP-7623 raises calldata costs, that gap only widens--which is why wrapping STARKs into SNARKs has become the default for L1 settlement unless you verify off-chain. (7blocklabs.com)
  • Groth16 became cheaper after EIP-1108; keep public inputs small to minimize pairings/MSM. On the EVM this typically beats verifying heterogeneous proofs separately, even when each micro-proof is "small." (eips.ethereum.org)
  • The EIP-4844 KZG precompile (0x0A, 50k gas) helps data availability but does not cover proof calldata. Paired with the EIP-2537 BLS precompiles, it makes BLS aggregation fast for bridges and verification layers. (eips.ethereum.org)

Emerging best practices we’re implementing with clients

  • Prefer streaming accumulation (folding/IVC) for continuous arrivals. Publish a new recursive head frequently (e.g. every block on L2) and SNARK-compress at a relaxed cadence; this keeps queue lengths down and smooths p95 tails. (eprint.iacr.org)
  • Keep batch windows at or below one-third of the settlement chain's target block time. This improves next-block inclusion odds while preserving the benefits of amortization.
  • Pre-wrap zkVM receipts for consistent p50. SP1's Groth16/Plonk wrappers carry fixed overheads--about +6 seconds for Groth16 and ~70 seconds for Plonk when wrapping on demand. (docs.succinct.xyz)
  • Use chunked aggregation with hedged execution: past a percentile cutoff (e.g. p80), launch a backup aggregation for the slower chunks to blunt stragglers and keep batches on track. Live systems split aggregation into chunks for exactly this reason. (blog.alignedlayer.com)
  • Design the prover as a streaming system. GPU pipelining (BatchZK) plus split witness generation (Yoimiya) aligns stage times and keeps the hardware busy. (eprint.iacr.org)
  • Treat verification as a product feature: offer "fast soft-finality now" via an off-chain verification layer and "hard L1 finality later" via recursive SNARK settlement. Many users want both. (blog.alignedlayer.com)
  • Watch fee markets and precompile changes. EIP-2537 enables new BLS-based aggregation patterns, while EIP-7623 penalizes calldata-heavy bundles; design verifiers and public inputs with both in mind. (eips.ethereum.org)
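The hedged-execution practice above can be sketched with standard futures: if the primary chunk aggregation has not finished by your measured p80 latency, race a backup copy and take whichever completes first. The worker here is a stand-in lambda, not a real prover, and `hedge_after_s` would come from your latency telemetry:

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def hedged(task, hedge_after_s, pool):
    """Run task; if it misses the hedge deadline, race a backup attempt."""
    primary = pool.submit(task)
    done, _ = wait([primary], timeout=hedge_after_s)
    if done:
        return primary.result()          # finished within the p80 cutoff
    backup = pool.submit(task)           # straggler detected: hedge
    done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
    return done.pop().result()


with ThreadPoolExecutor(max_workers=4) as pool:
    # Stand-in for aggregating one chunk; a real system would hand each
    # attempt to a different prover worker.
    result = hedged(lambda: "chunk_root", hedge_after_s=0.5, pool=pool)
```

The cost of hedging is bounded (at most one duplicate aggregation per slow chunk), while the benefit is a much tighter p95 on batch completion.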

Buyer’s checklist for an aggregation layer (with numbers)

  • Supported proof types today (verify via docs > code > audits): Groth16, Plonk/Halo2‑KZG, Plonky2, SP1 receipts, and RISC0 receipts--and ask for gas numbers and the inclusion-proof API. (docs.electron.dev)
  • On-chain gas per aggregated batch: target ~300-400k, with ~16-25k per proof-inclusion query. (docs.electron.dev)
  • Throughput and latency: demand real production numbers, not test rigs--hundreds to thousands of off-chain verifications per second, with on-chain aggregation every few minutes when needed. (blog.alignedlayer.com)
  • Chunk size and policy: e.g. 256, with an SLO-based flush on time or count, whichever comes first. (blog.alignedlayer.com)
  • zkVM wrap options and their overheads (SP1/R0VM Groth16 vs. Plonk), and whether pre-wrapping is available for low-latency paths. (docs.succinct.xyz)
  • Audit posture for reduction/aggregation circuits: look for recent audits and formal-methods progress on zkVM stacks. (veridise.com)

Final takeaway

"Low-latency mixed-proof aggregation" isn't just about one fancy cryptographic trick. It's really more of a systems challenge. You need to normalize early, go all out with parallelization, break things into chunks, and fold them together to keep wait times low. Don’t forget to pipeline the GPU and, when it comes down to it, just post one small, inexpensive proof when necessary. Thanks to the latest verification layers and recursive wrappers, you can achieve sub-15-second confirmations on L2s and use a single ~300k-gas transaction on Ethereum for that hard finality--even when your batches are mixing Plonk, STARK, and zkVM proofs. Check out more on this here.


References (selected)

  • SnarkPack (Groth16 aggregation) performance. (research.protocol.ai)
  • Nebra UPA gas math (Halo2‑KZG aggregation). (blog.nebra.one)
  • Electron (mixed‑scheme super‑proofs) gas. (docs.electron.dev)
  • SP1 proof types, gas, and wrapping behavior. (docs.succinct.xyz)
  • RISC Zero recursion and STARK→SNARK pipeline. (dev.risczero.com)
  • Aligned Layer throughput and dual‑mode settlement. (blog.alignedlayer.com)
  • EIP‑1108 (bn128 repricing), EIP‑4844 (0x0A KZG precompile), EIP‑2537 (BLS12‑381 precompiles), EIP‑7623 (calldata cost). (eips.ethereum.org)
  • BatchZK (GPU‑pipelined proving) and Yoimiya (pipelined witness/prover). (eprint.iacr.org)
  • AggLayer pessimistic proofs for secure multi‑stack interop. (polygon.technology)

Like what you're reading? Let's build together.

Get a free 30-minute consultation with our engineering team.

7BlockLabs

Full-stack blockchain product studio: DeFi, dApps, audits, integrations.

7Block Labs is a trading name of JAYANTH TECHNOLOGIES LIMITED.

Registered in England and Wales (Company No. 16589283).

Registered Office address: Office 13536, 182-184 High Street North, East Ham, London, E6 2JA.

© 2026 7BlockLabs. All rights reserved.