
By AUJay

Blockchain Intelligence API and Blockchain Intelligence Data Layer: Indexed Blockchain Data for BI

Summary: This post shows decision‑makers exactly how to stand up a “Blockchain Intelligence API” and an enterprise “Blockchain Intelligence Data Layer” that delivers indexed, enriched, and query-ready on‑chain data for BI and AI—using current tooling, schemas, and latency/finality guarantees across L1s and L2s.


Why this matters now

Three developments in 2025 made enterprise-grade blockchain analytics both possible and necessary:

  • Ethereum’s Pectra upgrade (activated May 7, 2025, epoch 364032) doubled blob throughput and shipped major account and validator improvements—driving L2 data volumes higher and making warehouse-grade pipelines a first-class need. (ethereum.org)
  • Google BigQuery expanded managed public blockchain datasets and standardized schemas across more chains (with recent additions like Arbitrum, Optimism, Tron, Polygon, and Hedera), making cross-chain SQL a practical baseline for BI. (cloud.google.com)
  • Ethereum clients began partial history expiry (EIP‑4444 roadmap), meaning node operators can prune older block data; relying on nodes for long-tail history is no longer a safe plan—warehousing and curated data shares are. (eips.ethereum.org)

Together, these shifts make it imperative to formalize your on‑chain data strategy: a real-time “Blockchain Intelligence API” for apps and operations, and a durable “Blockchain Intelligence Data Layer” for analysts, finance, risk, and AI.


What do we mean by “Blockchain Intelligence API” and “Data Layer”?

  • Blockchain Intelligence API: A developer-facing facade that returns normalized, enriched objects (e.g., AllTransfersByAddress, AddressRiskSummary, ContractMethodCalls), hides chain quirks, exposes finality and reorg semantics, and supports streaming/webhooks for low-latency use cases.
  • Blockchain Intelligence Data Layer: Your governed warehouse/lakehouse tables (Bronze/Silver/Gold) plus a semantic layer for BI and AI. It ingests from multiple sources, standardizes schemas (transactions/logs/traces/tokens/entities), captures finality states, enriches with labels/sanctions, and serves SQL to tools like Looker, Power BI, and Snowflake Cortex/AISQL.
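
A minimal sketch of how the two layers meet, assuming hypothetical silver.token_transfers and gold dataset names: the API gateway serves paged reads from a Gold view that already carries finality flags, applying the address filter at query time.

-- Hypothetical Gold view backing an AllTransfersByAddress endpoint
CREATE OR REPLACE VIEW gold.all_transfers_by_address AS
SELECT chain_id, tx_hash, log_index, token_address,
       from_address, to_address, value,
       commitment_level,  -- latest | safe | finalized, surfaced to API clients
       block_timestamp
FROM silver.token_transfers;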

Source options you can assemble today

You’ll likely mix at least two of these to balance cost, latency, coverage, and control.

  1. Managed data APIs for low-latency ingestion
  • Alchemy Enhanced APIs provide pre-indexed Transfers, Token, NFT, Receipts, Trace, and Webhook products across EVM chains—useful for app features and incremental backfills without running your own indexers. (alchemy.com)
  • QuickNode Streams pushes blocks/receipts/traces in finality order with exactly-once delivery semantics, supports server-side JavaScript filters, historical backfill, and destination webhooks—ideal for streaming Bronze tables. (quicknode.com)
  • Covalent GoldRush exposes structured multichain Wallet, Transactions, NFT, Cross-Chain Activity, Security, and DEX price endpoints, with sub-second streaming for trading/agent use cases and 100+ chain coverage. (docs.linea.build)
  2. Open indexing and stream-to-SQL
  • The Graph’s Firehose + Substreams let you write chain-specific Rust/Protobuf pipelines, then map directly to SQL or Parquet sinks; recent updates add SQL mapping from Protobuf, upsert semantics in the SQL sink, and foundational stores for consistent stateful derivations. Use this for reproducible, portable indexing at scale. (forum.thegraph.com)
  3. Warehouse-native datasets and data sharing
  • BigQuery public crypto datasets cover Ethereum, Bitcoin and a broad set of L1/L2s with a common schema for several chains; you can start with bigquery-public-data.crypto_ethereum (blocks, transactions, logs, traces, token_transfers) and crypto_bitcoin.* immediately. (cloud.google.com)
  • Snowflake Marketplace/Sharing brings curated blockchain data (e.g., Dune DataShare, Amberdata datasets) straight into your account with zero-ETL; Snowflake now shares Iceberg tables live, enabling open-format governance and interop. (community.dune.com)

Reference architecture we deploy at 7Block Labs

  • Ingest
    • Streaming: QuickNode Streams (blocks, receipts, traces) → Kafka → Delta/Parquet Bronze; Alchemy webhooks for address/contract activity; Covalent Streaming API for sub-second DEX pairs.
    • Batch: BigQuery public datasets; The Graph Substreams SQL sink to Postgres/BigQuery; Dune/Flipside exports for curated derived tables. (quicknode.com)
  • Standardize
    • Medallion model: Bronze (raw canonical: blocks, txs, logs, traces), Silver (decoded events, token balances, address facts), Gold (subject-area marts: revenue, cohorting, risk).
    • Schema governance: typed transactions (EIP‑2718), EIP‑1559 fee fields, and blob metadata (EIP‑4844) persisted with exact types. (eip.directory)
  • Enrich
    • ABI/event decoding via 4byte & OpenChain signatures; OFAC SDN joins; optional entity labels from trusted providers (a decoding sketch follows this list). (4byte.directory)
  • Serve
    • API gateway over Gold views (GraphQL/REST) with finality flags; Snowflake/BigQuery direct SQL; feature store for AI; scheduled extracts to BI tools.
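
For the Enrich step above, decoding reduces to a join once signatures are cached. A minimal sketch, assuming hypothetical silver.transactions and ref.signatures_4byte tables populated from 4byte.directory and, optionally, OpenChain:

-- The first 4 bytes of calldata ('0x' + 8 hex chars) identify the method
SELECT t.hash,
       t.to_address,
       s.text_signature AS decoded_method
FROM silver.transactions AS t
LEFT JOIN ref.signatures_4byte AS s
  ON s.hex_signature = SUBSTR(t.input, 1, 10)
WHERE t.block_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);

Selector collisions (one hex signature, several text signatures) do occur, so keep a signature_source column and rank trusted sources first.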

Schema details decision‑makers should insist on

EVM “core” tables across chains (Silver tier suggested columns)

  • transactions: hash, block_number, block_timestamp, from_address, to_address, value_wei, status, type (EIP‑2718), gas, max_fee_per_gas, max_priority_fee_per_gas, effective_gas_price, input, chain_id. (eip.directory)
  • receipts: hash, cumulative_gas_used, gas_used, contract_address, logs_bloom (or derived), effective_gas_price, blob_gas_used/fee if present. (eip.directory)
  • logs: tx_hash, log_index, address, topics[0..n], data, decoded_event_name, decoded_args (JSON), signature_source.
  • traces: tx_hash, trace_address, call_type, from_address, to_address, input, output, value, gas, gas_used, error.
  • blocks: number, hash, parent_hash, timestamp, base_fee_per_gas, blob sidecar refs (if applicable). (hackmd.io)
  • tokens: contract_address, symbol, name, decimals, standard (erc20/721/1155), first_seen_block.
  • token_transfers: tx_hash, log_index, token_address, from_address, to_address, value, decoded standard.
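
A minimal BigQuery DDL sketch for the transactions table above; the silver dataset name and the partitioning/clustering choices are assumptions to adapt:

CREATE TABLE IF NOT EXISTS silver.transactions (
  hash                     STRING NOT NULL,
  block_number             INT64,
  block_timestamp          TIMESTAMP,
  from_address             STRING,
  to_address               STRING,
  value_wei                BIGNUMERIC,  -- covers practical values; the full uint256 range needs STRING
  status                   INT64,
  type                     INT64,       -- EIP-2718 typed envelope
  gas                      INT64,
  max_fee_per_gas          BIGNUMERIC,
  max_priority_fee_per_gas BIGNUMERIC,
  effective_gas_price      BIGNUMERIC,
  input                    STRING,
  chain_id                 INT64,
  commitment_level         STRING       -- see finality columns below
)
PARTITION BY DATE(block_timestamp)
CLUSTER BY from_address, to_address;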

Non‑EVM chains get analogous tables (adapt for account model vs UTXO, program logs, etc.). For UTXO chains, mirror BigQuery’s common schema to simplify cross-chain joins. (cloud.google.com)

Finality columns you’ll need across all tables

  • commitment_level: latest | safe/justified | finalized for Ethereum; confirmations integer for Bitcoin; sequencer_accepted | L1_safe | L1_finalized for OP‑Stack L2s (with time-to-finality SLA targets per chain). (alchemy.com)

History retention and completeness fields

  • history_source: node | BigQuery_public | substreams_sql | provider_api; history_gap boolean for ranges pruned by EIP‑4444. (eips.ethereum.org)

Getting finality and reorgs right (this is where BI often breaks)

  • Ethereum: finality usually ~2 epochs ≈ 12.8 minutes; many teams accept “safe/justified” (~6.4 minutes) for ops dashboards and wait for finalized for finance books. Model both states in your marts. (ethos.dev)
  • Bitcoin: standard operational policy is 6 confirmations (~1 hour) for high-value settlement; parametrize this per risk appetite. (en.bitcoin.it)
  • OP‑Stack L2s (e.g., Base): “L1 batch finality” is typically ~20 minutes; withdrawals still honor 7‑day fraud window—model different finality semantics for transfers vs withdrawals. (docs.base.org)

Your API should:

  • return a finality_state per object,
  • emit idempotent “upsert” change events when state upgrades from latest → safe → finalized,
  • preserve reorganized rows with a reorg_of pointer for full auditability.

QuickNode Streams and The Graph Substreams SQL sink already support ordered delivery and upsert semantics that make this straightforward to implement. (quicknode.com)
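
A minimal sketch of the state upgrade as a warehouse MERGE, assuming a hypothetical bronze.tx_staging feed and a commitment_rank encoding (latest = 0, safe = 1, finalized = 2) so state only ever moves forward:

-- Upsert finality state; never downgrade a row
MERGE silver.tx_finality AS t
USING bronze.tx_staging AS s
  ON t.chain_id = s.chain_id AND t.hash = s.hash
WHEN MATCHED AND s.commitment_rank > t.commitment_rank THEN
  UPDATE SET commitment_level = s.commitment_level,
             commitment_rank  = s.commitment_rank
WHEN NOT MATCHED THEN
  INSERT (chain_id, hash, commitment_level, commitment_rank)
  VALUES (s.chain_id, s.hash, s.commitment_level, s.commitment_rank);

Reorged rows follow the same pattern: insert the replacement row and set reorg_of on the displaced one instead of deleting it.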


Practical examples you can copy

  1. Cross‑chain wallet P&L and gas spend (BigQuery)
  • Goal: 30‑day ETH gas spend and ERC‑20 netflows by wallet.
  • Sketch:
-- Gas spend (ETH) over last 30 days
-- Note: the public dataset prefixes receipt fields (receipt_gas_used, receipt_effective_gas_price)
SELECT from_address AS wallet,
       -- cast avoids INT64 overflow on the wei-denominated product
       SUM(CAST(receipt_gas_used AS BIGNUMERIC) * receipt_effective_gas_price) / 1e18 AS eth_gas_spend
FROM `bigquery-public-data.crypto_ethereum.transactions`
WHERE block_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND receipt_status = 1
GROUP BY wallet;

-- ERC-20 netflow by wallet
-- Note: value and decimals are STRING in the public dataset; cast before arithmetic.
-- tokens can contain duplicate rows per address; de-duplicate in a Silver view to avoid join fan-out.
WITH t AS (
  SELECT tt.to_address AS wallet,
         tt.token_address,
         CAST(tt.value AS BIGNUMERIC) / POWER(10, CAST(tok.decimals AS INT64)) AS amt
  FROM `bigquery-public-data.crypto_ethereum.token_transfers` AS tt
  JOIN `bigquery-public-data.crypto_ethereum.tokens` AS tok
    ON tok.address = tt.token_address
  WHERE tt.block_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  UNION ALL
  SELECT tt.from_address AS wallet,
         tt.token_address,
         -CAST(tt.value AS BIGNUMERIC) / POWER(10, CAST(tok.decimals AS INT64)) AS amt
  FROM `bigquery-public-data.crypto_ethereum.token_transfers` AS tt
  JOIN `bigquery-public-data.crypto_ethereum.tokens` AS tok
    ON tok.address = tt.token_address
  WHERE tt.block_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
)
SELECT wallet, token_address, SUM(amt) AS net_tokens_30d
FROM t
GROUP BY wallet, token_address;

Use the standardized crypto_ethereum schema; extend the same logic to crypto_ethereum_classic, crypto_polygon, etc., when available. (cloud.google.com)

  2. Exactly‑once streaming Bronze with QuickNode Streams → Delta
  • Configure a “receipts+traces” stream with server-side filters for the contracts you care about; deliver to a webhook that writes to Kafka. In Databricks, use Structured Streaming with checkpointing and Delta idempotent write options (txnAppId/txnVersion) for exactly‑once; a warehouse-side idempotency sketch follows this list. (quicknode.com)
  3. On‑the‑fly EVM method/event decoding
  • Keep a local cache table signatures_4byte(hex_signature, text_signature, source). Populate from 4byte and OpenChain on demand; batch backfill nightly. This unlocks human‑readable method names in BI without bundling full ABIs. (4byte.directory)
  4. OFAC sanctions joins and token blacklist checks
  • Maintain a sanctions_addresses table (address, asset, sdn_entity, first_seen). Refresh from OFAC SDN sources and audited mirrors; join against wallet tables when generating compliance/risk dashboards. (ofac.treasury.gov)
  5. Portfolio overview without indexing: flip to managed APIs
  • For quick wins, call Alchemy’s Transfers API to fetch complete external/internal/token transfers for user addresses; store responses into Bronze and reconcile to your ledger. Add the Token API for balances. This is often “day‑1” functionality before your Substreams pipelines are ready. (alchemy.com)
  6. Cross‑chain activity fingerprint for onboarding
  • Covalent’s Cross‑Chain Activity API returns which chains an address has touched and last-seen timestamps—useful for risk scoring and UX personalization. (docs.linea.build)
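
For example 2 above, a warehouse-side guard complements the writer-side txnAppId/txnVersion options: even if a webhook retry replays a batch, a key-matched MERGE keeps Bronze duplicate-free. A minimal sketch, assuming hypothetical bronze.receipts and bronze.receipts_staging tables keyed by (chain_id, tx_hash):

-- Idempotent append: replayed deliveries match on the key and are skipped
MERGE bronze.receipts AS t
USING bronze.receipts_staging AS s
  ON t.chain_id = s.chain_id AND t.tx_hash = s.tx_hash
WHEN NOT MATCHED THEN
  INSERT ROW;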

Handling scale, latency, and private orderflow realities

  • Latency targets: Sub‑second streams for DEX prices and pairs are feasible (Covalent co‑processor + streaming), while full L1 finalization will take minutes; expose both TTFB and “finalized-at” timestamps in your API. (covalenthq.com)
  • Private orderflow: A growing slice of transactions routes through private channels (e.g., Flashbots Protect/MEV‑Share) before inclusion; analytics based only on public mempools will miss this flow. Treat mempool analytics as directional, not authoritative; measure against on‑chain inclusion. (writings.flashbots.net)
  • L2 spam and MEV noise: Expect bursts that distort TPS/usage; filter by finalized L1 batches and de‑spam heuristics in Silver. (theblock.co)

Emerging best practices we recommend (2025–2026)

  • Use The Graph Substreams SQL sink upserts for deterministic state tables (e.g., latest token balances, pool states) instead of rebuilding from raw logs; this cuts compute and simplifies late‑arriving reorgs. (forum.thegraph.com)
  • Treat BigQuery public datasets as your “ground truth” backstop for history—especially as EIP‑4444 history expiry rolls forward—and layer managed APIs for low‑latency deltas. (eips.ethereum.org)
  • Adopt warehouse-native sharing for curated data. Dune DataShare on Snowflake and Amberdata Marketplace listings provide vetted, ready-to-join tables without ingestion or duplication. (community.dune.com)
  • Track post‑Pectra blob usage and L2 fee dynamics; schemas should include blob gas metrics for capacity planning and unit economics at the product line level. (blog.ethereum.org)
  • Normalize finality levels across chains in your semantic layer; do not hardcode “6 confirmations” or “2 epochs” into metrics—parameterize per chain/product (a policy-table sketch follows this list). (en.bitcoin.it)
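
A minimal sketch of that parameterization as a policy table; the names and thresholds are illustrative, not recommendations:

CREATE TABLE IF NOT EXISTS ref.chain_finality_policy (
  chain             STRING,  -- 'ethereum', 'bitcoin', 'base', ...
  settlement_rule   STRING,  -- 'commitment_level' or 'confirmations'
  required_level    STRING,  -- e.g. 'finalized' or 'L1_finalized'
  min_confirmations INT64    -- e.g. 6 for Bitcoin, tuned per risk appetite
);

-- Metrics join the policy instead of hardcoding thresholds
SELECT m.*
FROM gold.settled_transfers AS m
JOIN ref.chain_finality_policy AS p USING (chain)
WHERE (p.settlement_rule = 'commitment_level' AND m.commitment_level = p.required_level)
   OR (p.settlement_rule = 'confirmations' AND m.confirmations >= p.min_confirmations);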

Security, compliance, and governance

  • Sanctions: Join OFAC SDN lists to alert/block flows involving sanctioned addresses. Maintain provenance (hashes, publish dates) for audit; a join sketch follows this list. (ofac.treasury.gov)
  • PII: On‑chain data is public; however, entity resolution and labels can create personal data. Govern labels as sensitive and tag lineage to derived metrics/lakehouse tables.
  • Data retention: Keep raw Bronze indefinitely in compressed open formats (Parquet/Iceberg/Delta). Snowflake now lets you share Iceberg tables with zero‑ETL; use governance features (RBAC, data masking, access logs). (snowflake.com)
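
For the sanctions bullet above, a minimal join sketch, assuming hypothetical ref.sanctions_addresses and gold.wallet_netflows tables:

SELECT w.wallet,
       w.token_address,
       w.net_tokens_30d,
       s.sdn_entity
FROM gold.wallet_netflows AS w
JOIN ref.sanctions_addresses AS s
  ON LOWER(s.address) = LOWER(w.wallet);  -- EVM addresses vary in checksum casing

Store the list's publish date and a content hash alongside each row so every alert is auditable.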

RFP checklist for your Blockchain Intelligence API/Data Layer

Ask vendors (or your internal team) to commit to:

  • Coverage: Enumerate chains/networks, historical backfill completeness, and ABI coverage.
  • Latency & delivery: Finality-aware SLAs; streaming guarantees (exactly-once, ordered by finality). QuickNode Streams documents exactly‑once in finality order—make this a requirement. (quicknode.com)
  • Schema: Typed tx envelopes (EIP‑2718), EIP‑1559 and blob fields; stable semantic versioning of responses. (eip.directory)
  • Reorg handling: Upsert semantics, idempotent CDC for state tables (e.g., Substreams SQL sink upserts). (forum.thegraph.com)
  • Enrichment provenance: Sources for signatures (4byte/OpenChain), labels, and sanctions; update frequencies and hashes. (4byte.directory)
  • Warehouse delivery: BigQuery/Snowflake native tables or Parquet/Delta exports; support for Iceberg sharing for zero‑ETL consumption. (cloud.google.com)

Brief build/buy guidance

  • Buy first for speed: Use Alchemy Transfers/Token, QuickNode Streams, and Covalent GoldRush to stand up an MVP API in weeks. (alchemy.com)
  • Build for edge cases and cost control: Layer The Graph Substreams to replace hot paths with deterministic, portable mappers; push outputs to SQL/Parquet to feed your lakehouse. (forum.thegraph.com)
  • Anchor history in warehouses: Use BigQuery public datasets and Snowflake shares for history guarantees as EIP‑4444 rolls forward. (cloud.google.com)

How 7Block Labs can help

We design and implement production Blockchain Intelligence APIs and Data Layers with:

  • Substreams/Firehose pipelines mapped to SQL/Parquet with upserts, plus managed API integrations for low-latency deltas. (forum.thegraph.com)
  • Finality-aware CDC patterns to Databricks Delta or Snowflake with exactly‑once semantics. (docs.databricks.com)
  • Warehouse‑native assets on BigQuery/Snowflake with governance, lineage, and semantic models tailored to finance, risk, and growth teams. (cloud.google.com)

If your 2026 roadmap involves on‑chain BI or AI agents, the combination of a Blockchain Intelligence API and a governed Data Layer is the shortest path to reliable insights and measurable ROI.

Like what you're reading? Let's build together.

Get a free 30‑minute consultation with our engineering team.
