Web3 Discovery and On-Chain Web3 Insight: Finding Hidden Signals in Indexed Blockchain Data

Web3 teams win by spotting on-chain signals early—sequencer revenues spiking on a new L2, blob fees compressing your rollup’s unit economics, restaking risk turning from hypothetical to enforceable. This guide shows decision‑makers exactly how to surface those signals from modern, indexed blockchain data and wire them into product and strategy.

Description: A 1,700-word, hands-on playbook for decision‑makers to discover hidden signals in on-chain data—covering Substreams, ClickHouse, Dune, BigQuery, blob markets, Superchain metrics, MEV orderflow auctions, EigenLayer restaking, and ERC‑4337—with concrete queries, architectures, and 2025–2026 realities.

Why “discovery” now looks different

In 2025–2026, three shifts changed what “on‑chain insight” must track:

Rollups and DA layers went mainstream. EIP‑4844 blobs became the default data rail for L2s; Ethereum’s blob capacity rose (target/max increased via Pectra/EIP‑7691 and subsequent BPO forks), and DA alternatives like EigenDA and Celestia reached meaningful throughput. If you’re not monitoring blob supply, DA mix, and finality windows, you’re behind. (eips.ethereum.org)
The OP Stack’s Superchain concentrated usage and revenue—Base led H1 2025 by transactions and sequencer revenue—so “chain discovery” equals “Superchain discovery.” (messari.io)
Orderflow and wallet UX matured: Flashbots MEV‑Share turned user orderflow into measurable refunds; AA (ERC‑4337) standardised smart accounts across v0.6/v0.7 EntryPoints you can index directly. (docs.flashbots.net)

The implication: build a discovery pipeline that’s DA‑aware, rollup‑aware, and orderflow‑aware—then wire it into alerts and product bets.

The modern on‑chain data stack (that actually scales)

Here’s the stack we deploy for clients that need sub‑minute discovery and auditable history.

1) Lossless ingest and replay with Substreams/Firehose

Firehose streams full block data with reorg‑safe chunking and file‑oriented backfill, built with The Graph’s StreamingFast team. This is the easiest way to keep “truth” and “speed” without reindexing your world on every fork. (firehose.streamingfast.io)
Substreams modules let you express transforms once and sink to multiple targets (SQL, Kafka, Parquet). Recent releases added more chains, improved Base performance, and Foundational Store primitives for better consistency. (forum.thegraph.com)

2) Low‑latency analytics with ClickHouse + Kafka

ClickHouse remains the best open columnar engine for time‑series blockchain analytics. Use the Kafka engine or Kafka Connect Sink for exactly‑once semantics (via KeeperMap) and predictable backpressure. (docs-content.clickhouse.tech)
Substreams:SQL maps protobuf → relational tables, supports inserts/updates/upserts, and handles reorgs (with a small delay on ClickHouse). Add dbt inside the sink for continuous materializations. (docs.substreams.dev)
Expect 10–100x faster scans than row stores; teams report tens of millions of rows/sec for interactive dashboards. (clickhouse.com)

3) Indexed APIs for speed to value

Sometimes you don’t need custom indexing yet:

Dune’s API exposes 500TB+ of curated on‑chain data, serverless querying, dbt connectors, and BI integrations—perfect for prototyping KPIs and alerting without infrastructure. (dune.com)
Covalent GoldRush gives structured REST/WS on 100+ chains, with decoded logs, pricing, and real‑time streams—handy for product teams that want “working data” in hours. (goldrush.dev)
Google BigQuery public datasets (Ethereum, Polygon, more) are still a gold standard for reproducible analysis and ML; OP Labs’ Superchain data is also accessible via BigQuery through community mirrors. (cloud.google.com)

4) Label enrichment (without paying a fortune)

Etherscan API v2 added “Nametags”/labels and unified multichain access—good enough for exchange/bridge labeling and basic entity stitching. Use it to enrich top counterparties in your data lake. (docs.etherscan.io)

5) Finality and data retention you can explain to Finance

On Ethereum, practical finality is ~15 minutes (two epochs) today; plan alerts and revenue recognition around this, not per‑slot proposals. SSF research is ongoing, but your near‑term SLOs should assume current finality. (ethereum.org)
Blobs only persist ~18 days; if you rely on them for audits/analytics, you must offload content and proofs yourself (design like Blobscan: indexer + API + multi‑provider blob storage). (migalabs.io)

What to discover: the hidden signals that move strategy

Ethereum’s DA share shows Base often as the largest poster; EigenDA and Celestia contribute meaningful throughput (Mantle/Eclipse commonly top posters). Watch daily share and concentration—this is your supply‑side of blockspace and a leading indicator for users and fees. (l2beat.com)

Why it matters: DA costs and throughput drive L2 margins. If your product depends on a chain’s gas economics, rising blob capacity (e.g., Pectra target 6/max 9, then BPO1 to target 10/max 15) can widen your unit cost envelope. Tie your pricing experiments to blob fee regimes. (l2beat.com)

B) Superchain growth and sequencer revenues

In H1 2025, OP Superchain processed 2.47B txs; Base led with 1.57B and generated $42.4M in sequencer revenue—87.2% of the OP‑chain cohort. That’s “product-market-chain fit.” Track net flows, usage, and revenue sharing to anticipate incentive shifts and listing catalysts. (messari.io)

C) MEV orderflow auctions (user refunds as a KPI)

Flashbots MEV‑Share lets wallets/apps auction orderflow and refund users by default. Index its event stream (BigQuery dataset from Eden) to quantify how much value your users reclaim and which partners maximize refunds. (docs.flashbots.net)

D) Restaking: from TVL hype to enforceable risk

EigenLayer turned on mainnet slashing in 2025, moving risk from docs to code. Discovery here means: which AVSs are live, which operators opt‑in, and how slashing conditions map to your counterparty risk. Track TVL and AVS count, but index operator/AVS opt‑ins and slashing events. (outposts.io)

E) Account Abstraction: measurable UX lift, concrete event trail

EntryPoint v0.6 address: 0x5FF1…2789; v0.7: 0x0000…7032 across many chains. Counting UserOperationEvent across chains is the most direct adoption gauge. Segment by Paymaster to see where “gasless” UX subsidizes retention. (alchemy.com)

Concrete examples you can run this week

1) Blob‑era cost pressure: alert on capacity changes and blob fees

Blob supply and fee dynamics directly alter L2 posting costs and, by extension, your app’s margin if you subsidize transactions.

Track Ethereum DA throughput and capacity usage daily; alert when capacity usage >60% or when target/max blobs change via “BPO” governance. L2BEAT’s Ethereum DA page documents current parameters and changes. (l2beat.com)
Store blob contents yourself. Design like Blobscan: ingest consensus/execution metadata, persist content to multi‑provider storage (e.g., GCS/S3), and keep an index keyed by versioned hash. (docs.blobscan.com)

Pseudocode for a Substreams‑to‑ClickHouse sink that flags high‑usage days:

# substreams.yaml
sink:
  module: map_blobs
  type: sf.substreams.sink.sql.v1.Service
  config:
    engine: clickhouse
    dbt_config:
      files: ./dbt
      run_interval_seconds: 300
      enabled: true

Then a dbt model that computes daily capacity used and flips an alert flag when above threshold (feed that to PagerDuty/Slack).

2) Measure MEV‑Share refunds per user cohort (BigQuery)

-- Requires BigQuery access to eden-data-public.flashbots.mev_share
-- MEV refunds by day and top orderflow providers
SELECT
  DATE(block_timestamp) AS day,
  orderflow_provider,
  COUNT(*) AS bundles,
  SUM(refund_amount_eth) AS eth_refunded
FROM `eden-data-public.flashbots.mev_share`
WHERE block_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY 1,2
ORDER BY day DESC, eth_refunded DESC;

This gives you a concrete KPI: “refunds per active wallet,” which you can correlate to retention. Dataset updates every ~15 minutes. (docs.edennetwork.io)

3) Multi‑chain AA adoption: count UserOperationEvent across chains

UserOperationEvent topic (v0.6) is 0x49628f…1419f. Count events by day on mainnet, Base, OP, Polygon using public datasets (or your own ClickHouse lake):

-- Ethereum mainnet in BigQuery
SELECT
  DATE(block_timestamp) AS day,
  COUNT(1) AS userops
FROM `bigquery-public-data.crypto_ethereum.logs`
WHERE topics[SAFE_OFFSET(0)] = '0x49628fd1471006c1482da88028e9ce4dbb080b815c9b0344d39e5a8e6ec1419f'
GROUP BY 1
ORDER BY day DESC
LIMIT 30;

Repeat for

crypto_optimism.logs

,

crypto_polygon.logs

, etc., then join by day to compare adoption. If you run your own infra, the same filter runs blazing‑fast in ClickHouse (JSONEachRow → materialized columns on

topics.0

). Topic reference seen on public explorers and consistent with the v0.6 EntryPoint. (cloud.google.com)

4) Enrich counterparties with Etherscan Nametags (labels)

When a wallet spikes interactions with a handful of destinations, add lightweight entity context:

curl "https://api.etherscan.io/v2/api?module=account&action=nametags&address=0x...&chainid=1&apikey=$KEY"

You’ll get labels like “Coinbase, Exchange” and metadata to join back into your fact tables—no expensive label subscription required. Use API v2 going forward. (docs.etherscan.io)

5) Real‑time pipeline: Substreams → Kafka → ClickHouse

Substreams to Kafka topics per module (e.g., erc20_transfers, blobs_meta).
ClickHouse Kafka engine tables ingest and merge into MergeTree targets; keep “reorg watermark” and only promote rows older than N blocks or epochs.
Kafka Connect Sink gives exactly‑once semantics; set batch sizes ≥1000 and JSON array batching to reduce CPU/IO. (docs-content.clickhouse.tech)

“What changed” checklists you can turn into dashboards

DA and blob market checklist

Ethereum DA capacity used (%) vs. target/max blobs; track parameter changes (Pectra, BPO1). (l2beat.com)
Largest DA posters per day across Ethereum/EigenDA/Celestia; note Base/Mantle/Eclipse dominance by day. (l2beat.com)
Blob retention coverage in your archive (≥18 days + redundancy). (migalabs.io)

Superchain fundamentals

Tx share and sequencer revenue per OP‑chain; Base typically leads both—watch for regime changes from incentives or performance features like Flashblocks preconfirmations (~200ms). Treat preconfirmations as soft state until finalized. (messari.io)

MEV and orderflow

MEV‑Share refunds per MAU, top orderflow partners, % of tx volume via protected routes. (docs.flashbots.net)

Restaking and AVS risk

AVSs live vs. in‑dev, operator opt‑ins, slashing events count/value. Slashing live on mainnet since April 2025—this is now a real risk surface. (coindesk.com)

AA adoption

UserOperationEvent per chain/day, Paymaster share, EntryPoint version mix (v0.6 vs. v0.7). Use canonical addresses: v0.6 0x5FF1…2789; v0.7 0x0000…7032. (alchemy.com)

Emerging best practices (2026‑ready)

Design for finality—not just speed

For Ethereum, model alerts and user‑visible states around ~15‑minute finality. Use “soft” indicators (e.g., Flashblocks preconfirmations) to improve UX, but don’t book revenue until the canonical window elapses. SSF may shorten this in the future; keep a feature flag. (ethereum.org)

Treat blobs as ephemeral; archive aggressively

Replicate blob content and metadata to your own storage with keyed versioned hashes. Blobscan’s reference architecture (indexer + API + multi‑provider blob storage) is a solid blueprint. (docs.blobscan.com)

Separate compute from storage in your lakehouse

Substreams → Kafka for transport; ClickHouse for fast OLAP; cold history in Parquet; dbt for models; and a small Postgres for joins that truly must be transactional. Substreams:SQL now ships dbt hooks to keep models fresh. (docs.substreams.dev)

Prefer open, testable interfaces for orderflow and MEV

MEV‑Share’s open specs and public datasets make it auditable; avoid black‑box “private relays” where you can’t quantify user refunds or fairness. (docs.flashbots.net)

Make label enrichment explainable

Use Etherscan API v2 Nametags for a baseline and log the “why” (source, timestamp, label slug). This beats opaque vendor labels when auditors ask for provenance. (docs.etherscan.io)

Restaking KPIs beyond TVL

Track “enforceability” (slashing conditions live?), AVS diversity, and operator concentration. TVL without slashing is soft security. Slashing is now live—measure it. (coindesk.com)

Account abstraction metrics that correlate with retention

Monitor Paymaster‑sponsored share and failure reasons. EntryPoint v0.7 reduces gas and improves validations—flag apps lagging on v0.6. (alchemy.com)

Two discovery playbooks to copy

Playbook A: “Which OP‑chain should we launch on next?”

Shortlist by DA costs and blob capacity trajectory (Ethereum DA parameters; chain’s share of postings). (l2beat.com)
Check sequencer economics: transactions, fee revenue split, and any profit‑sharing with the Collective (Messari’s H1 2025 report is a great baseline). (messari.io)
Validate UX speed: if Flashblocks‑style preconfirmations are live, instrument “time‑to‑preconfirm” vs. “time‑to‑finalize” in your client telemetry; document the trust/latency tradeoff for stakeholders. (docs.gelato.cloud)

Playbook B: “Are our gasless wallets actually moving the needle?”

Count UserOperationEvent by day and Paymaster label across your top chains; normalize by daily active wallets for a “gasless adoption index.” (hekla.taikoscan.io)
Correlate with MEV‑Share refunds earned by those users; if refunds are material, surface them inside the wallet to reinforce behavior. (docs.flashbots.net)
If you’re still on v0.6, set an internal deadline to migrate to v0.7 in 2026 and simplify operations; track version mix in your dashboards. (alchemy.com)

Implementation notes that save months

Don’t reimplement decoders if you can help it. Start with GoldRush/Dune/BigQuery to prototype, then “graduate” hot paths to Substreams/ClickHouse when KPIs harden. (goldrush.dev)
Partition by block number and chain ID everywhere; promote to “finalized” tables via materialized views that lag by a configurable number of slots/epochs per chain. Use ClickHouse TTLs to prune pre‑finalized buffers. (docs-content.clickhouse.tech)
Keep a “reorg ledger.” Every backfill, log the before/after block hash for changed heights and replay affected aggregates—your auditors will thank you. Substreams’ file‑first model makes this sane. (firehose.streamingfast.io)
For OP‑stack latency experiments, label preconfirmations separately from L2 blocks; wallets can show “preconfirmed” then “finalized,” mirroring how exchanges show “confirmations.” Treat it like optimistic UI. (docs.gelato.cloud)

Looking ahead: proof‑carrying insights

The line between “off‑chain analytics” and “on‑chain guarantees” is blurring:

ZK coprocessors like Succinct’s SP1 make it feasible to prove heavy computations (even ranges of L1 blocks) and verify them on‑chain, with proof aggregation to reduce cost. Expect “proof‑carrying metrics” to move from marketing to contracts that gate rewards. (docs.succinct.xyz)
Historical data proving to contracts is here: Axiom v2 enables verified queries over Ethereum history with on‑chain block hash roots. This unlocks “trust‑minimized leaderboards,” “verified loyalty,” and more. (github.com)

Build now so you can flip a switch later—turning your dashboards into attestable program logic when it matters.

TL;DR for decision‑makers

Index with Substreams/ClickHouse for speed and correctness; prototype with Dune/GoldRush/BigQuery to reduce time‑to‑insight. (docs.substreams.dev)
Track the signals that move strategy: DA share and blob capacity, Superchain revenues, MEV‑Share refunds, EigenLayer slashing/opt‑ins, and ERC‑4337 UserOps by chain and Paymaster. (l2beat.com)
Engineer for finality and retention realities: ~15‑minute Ethereum finality and 18‑day blob retention must be explicit in alerts, SLAs, and archives. (ethereum.org)

If you want help wiring these into your roadmap, 7Block Labs has shipped this stack across exchanges, wallets, and L2s—we’re happy to turn this playbook into dashboards and alerts your exec team will actually use.

Like what you're reading? Let's build together.

Get a free 30-minute consultation with our engineering team.

Talk to us View services

Web3 Discovery and On-Chain Web3 Insight: Finding Hidden Signals in Indexed Blockchain Data

Why “discovery” now looks different

The modern on‑chain data stack (that actually scales)

1) Lossless ingest and replay with Substreams/Firehose

2) Low‑latency analytics with ClickHouse + Kafka

3) Indexed APIs for speed to value

4) Label enrichment (without paying a fortune)

5) Finality and data retention you can explain to Finance

What to discover: the hidden signals that move strategy

B) Superchain growth and sequencer revenues

C) MEV orderflow auctions (user refunds as a KPI)

D) Restaking: from TVL hype to enforceable risk

E) Account Abstraction: measurable UX lift, concrete event trail

Concrete examples you can run this week

1) Blob‑era cost pressure: alert on capacity changes and blob fees

2) Measure MEV‑Share refunds per user cohort (BigQuery)

3) Multi‑chain AA adoption: count UserOperationEvent across chains

4) Enrich counterparties with Etherscan Nametags (labels)

5) Real‑time pipeline: Substreams → Kafka → ClickHouse

“What changed” checklists you can turn into dashboards

DA and blob market checklist

Superchain fundamentals

MEV and orderflow

Restaking and AVS risk

AA adoption

Emerging best practices (2026‑ready)

Two discovery playbooks to copy

Playbook A: “Which OP‑chain should we launch on next?”

Playbook B: “Are our gasless wallets actually moving the needle?”

Implementation notes that save months

Looking ahead: proof‑carrying insights

TL;DR for decision‑makers

Like what you're reading? Let's build together.

Related Posts

Building 'Algorithm-Audited' Financial Statements for DAOs

Developing 'Private Social Networks' with Onchain Keys

How to Tokenize 'Intellectual Property' for AI Models

Web3 Discovery and On-Chain Web3 Insight: Finding Hidden Signals in Indexed Blockchain Data

Why “discovery” now looks different

The modern on‑chain data stack (that actually scales)

1) Lossless ingest and replay with Substreams/Firehose

2) Low‑latency analytics with ClickHouse + Kafka

3) Indexed APIs for speed to value

4) Label enrichment (without paying a fortune)

5) Finality and data retention you can explain to Finance

What to discover: the hidden signals that move strategy

A) Data availability share: who’s writing where?

B) Superchain growth and sequencer revenues

C) MEV orderflow auctions (user refunds as a KPI)

D) Restaking: from TVL hype to enforceable risk

E) Account Abstraction: measurable UX lift, concrete event trail

Concrete examples you can run this week

1) Blob‑era cost pressure: alert on capacity changes and blob fees

2) Measure MEV‑Share refunds per user cohort (BigQuery)

3) Multi‑chain AA adoption: count UserOperationEvent across chains

4) Enrich counterparties with Etherscan Nametags (labels)

5) Real‑time pipeline: Substreams → Kafka → ClickHouse

“What changed” checklists you can turn into dashboards

DA and blob market checklist

Superchain fundamentals

MEV and orderflow

Restaking and AVS risk

AA adoption

Emerging best practices (2026‑ready)

Two discovery playbooks to copy

Playbook A: “Which OP‑chain should we launch on next?”

Playbook B: “Are our gasless wallets actually moving the needle?”

Implementation notes that save months

Looking ahead: proof‑carrying insights

TL;DR for decision‑makers

Like what you're reading? Let's build together.

Related Posts

Building 'Algorithm-Audited' Financial Statements for DAOs

Developing 'Private Social Networks' with Onchain Keys

How to Tokenize 'Intellectual Property' for AI Models