ByAUJay
Web3 Application Penetration Testing: A Practical Playbook for Dev Teams
Why this playbook now
- Wow, 2025 has really thrown blockchain security a curveball. Just in the first quarter, we saw a staggering $1.64 billion in losses, mostly due to one huge incident, but there was also a steady stream of exploits happening all over the place. It’s clear that decision-makers can't just count on “just audits” anymore. Now’s the time for thorough pen-testing that takes both on-chain and off-chain attack paths into account. (theblock.co)
- On another front, Ethereum’s Pectra upgrade, which went live on May 7, 2025, is shaking things up by introducing EIP-7702 programmable EOAs on mainnet. This changes wallet threat models and brings new testing needs for transaction flows, front-ends, and signers. (eips.ethereum.org)
- And let’s not forget about drainer kits and phishing--they’re still a big part of the loss story. In 2024, around $494 million was drained through those nasty malicious signatures, and 2025 is seeing waves of exploits using those 7702-style batch signatures. So, when you’re pen testing, it’s super important to validate the signature UX, simulation, and transaction policy--not just the bytecode. (drops.scamsniffer.io)
- On a brighter note, rollups are really coming into their own. OP Stack fault proofs are now permissionless on OP Mainnet, and both Base and Arbitrum are moving up to Stage 1 with permissionless validation (BoLD on Arbitrum). Make sure your tests include scenarios for forced-exit, challenge periods, and governance keys. (optimism.io)
Here’s a practical playbook we use with our engineering teams. It takes you through everything from threat modeling to detailed checks and tool commands, covering the entire stack that real attackers target.
Scope first: a web3 pen‑test includes more than contracts
Define the Attack Surface in Six Layers
When it comes to outlining the attack surface, it's all about breaking it down into six distinct layers. Each layer plays a crucial role in understanding where vulnerabilities might lurk. Here’s a quick rundown for your statement of work:
1. Physical Layer
This is the groundwork of your security strategy. It covers all the tangible elements--like servers, workstations, and networking equipment. Make sure to assess things like access control to buildings, locks, and surveillance systems. If someone can physically access your hardware, they can bypass a lot of digital defenses.
2. Network Layer
Next up is the network layer, which involves everything connected to your internal and external networks. Think firewalls, routers, switches, and even the protocols you use. Scope here should include assessing network configuration, identifying open ports, and monitoring for any unauthorized devices or traffic.
3. Perimeter Layer
The perimeter layer is your first line of defense against external threats. It includes your various gateway technologies such as firewalls, IDS/IPS systems, and gateways. In your scope, look into the strength of your existing defenses, how easily they can be bypassed, and any potential backdoors that could lead to your internal systems.
4. Application Layer
Now we zoom in on the applications themselves. This layer focuses on both web and mobile applications, assessing their code for vulnerabilities like SQL injection or cross-site scripting. Scope this area by conducting regular security audits and ensuring that all applications are kept up-to-date with the latest patches.
5. Data Layer
Here’s where the really sensitive stuff lives--your data. This layer focuses on data storage, transmission, and encryption. Clearly define the types of data you hold, how it’s protected, and how you ensure compliance with any relevant regulations. It’s crucial to understand the risks associated with data breaches.
6. User Layer
Finally, we have the user layer, which encompasses all your employees and their interactions with systems. It's vital to scrutinize user permissions, authentication methods, and training programs. In your scope, outline policies for password management, multi-factor authentication, and how you’ll educate users on recognizing phishing attempts.
By tackling each of these layers, you can get a clearer picture of your attack surface and make informed decisions on where to focus your security efforts. For a web3 engagement specifically, scope the following six areas:
1) Contracts and protocols
- We're diving into Solidity/Vyper code, tackling proxies, upgrade paths, timelocks, access control, tokenomics, and oracle logic.
- For mapping coverage, we’ll follow the standards laid out in the OWASP Smart Contract Top 10 (2025), EEA EthTrust Security Levels v3, and SCSVS v2 checklists. You can check them out here: (scs.owasp.org).
2) Wallets and Key Material
- EOAs, smart accounts (ERC‑4337), and EIP‑7702 programmable EOAs, across hardware, MPC/TSS, and hot wallets. (ethereum.org)
3) L2s, Bridges, and Cross-Chain Paths
- Canonical bridges, message passing, forced-exit mechanisms, challenge windows, and Security Council controls. To gauge how mature your setup is, use L2BEAT’s Stages framework.
4) Off-chain services
- Relayers, indexers, oracles, pricing feeds, lambdas, queues, and webhook consumers.
5) Front-ends and Supply Chain
When it comes to front-ends and their supply chain, we’re dealing with a bunch of important elements:
- Domains
- DNS
- CDN
- Signing modals
- NPM dependencies
- CI/CD secrets
- Analytics scripts
6) Transaction Routing and MEV Surface
- Public versus private mempool submission, refund and relay policies, and bundle behavior. (docs.flashbots.net)
The deliverables you should aim for include: a threat model, a test plan for each layer, evidence of exploit attempts, proofs of concept (PoCs) complete with reproduction steps, and validated remediation guidance aligned with standards like EthTrust levels, OWASP SC Top 10, and SCSVS control IDs. (entethalliance.org)
Threat modeling updates for 2025
- Programmable EOAs (EIP‑7702): Get a handle on every user journey that can tweak or update the delegation indicator (0xef0100 || address) and the duration of that delegation. Make sure to explicitly map out what happens during revocation, replay, and phishing scenarios, and adjust your signer policies to match. Check out the full details here: (eips.ethereum.org)
- Drainers and Signature Traps: Think of “what can a single blind signature authorize?” as a key risk factor. Make it a priority to ensure that your simulation layer and wallet prompts show net balance changes and NFT transfers before any signing happens. Keep an eye on recent drainer stats to help you focus on the most pressing scenarios. More info available here: (drops.scamsniffer.io)
- Rollup Governance and Exits: Double-check that your dependency chains meet the Stage 1 benchmarks, such as challenge windows of at least 7 days for optimistic systems and a Security Council model that only fails if ≥75% of its members are compromised. Then put those assumptions to the test on mainnet-like testnets. (forum.l2beat.com)
A step‑by‑step pen‑test playbook
1) Preparation and baseline
- Map standards → tests
- Take SC01-SC10 from the OWASP Smart Contract Top 10 and turn them into unit, fuzz, or formal checks.
- Use EthTrust v3 requirements to guide your static and dynamic checks; make sure to publish your coverage.
- Treat SCSVS v2 items as your go-to “pentest checklist” for contracts, integrations, and design.
- Toolchain and CI
- Static analysis: Use Slither for detecting issues and reviewing upgradeability; make sure to integrate this into your CI pipeline. (github.com)
- Fuzzing/invariants: Leverage Foundry for fuzz and invariant tests. For long-running fuzzing, try Echidna or Medusa; don't forget to capture the corpus to catch any regressions! (blog.trailofbits.com)
- Symbolic testing: Check out Halmos or Manticore to dive into path exploration for those tricky math and authorization flows. (github.com)
- Formal verification (selected properties): Use Certora Prover to ensure key invariants are solid, like equivalences, access controls, and token accounting. Keep an eye on CVL rule coverage in your CI process. (docs.certora.com)
- Transaction simulation: You can run single and bundle simulations with Tenderly--set up front-end hooks and CI checks for those “dangerous deltas.” (docs.tenderly.co)
- Data points to keep an eye on
- Check out the external risk trendlines (Immunefi/Chainalysis/TRM) to help us adjust our testing priorities every quarter--like looking at CeFi private-key compromises versus DeFi logic flaws. (theblock.co)
2) Contract layer: attack-driven testing
- Static quick pass
- Run Slither on the entire Foundry/Hardhat repository. Make sure to fail any builds that hit critical issues like reentrancy, delegate calls to untrusted sources, or unprotected upgrade functions. Don’t forget to add the built-in upgradeability review. (github.com)
- Invariant-driven fuzzing (IDD)
- Start by writing solid system-level invariants, such as sum(balances) == totalSupply, debt conservation, and collateralization limits. Use Echidna for its exploration and assertion features, and let long-running jobs crank out billions of iterations on cloud runners. Trail of Bits’ Curvance engagement showed how expanding invariants can reveal critical issues. (blog.trailofbits.com)
- Keep your corpora in check: rebases can throw a wrench in fuzz corpora, so preserve, port, and shrink sequences to maintain high coverage after any refactoring. (blog.trailofbits.com)
- Symbolic tests where fuzzing stalls
- Use Halmos for symbolic tests on authorization matrices or tricky batched flows, like complex approve/transferFrom ladders; it’s great for finding counterexamples. You can also use Manticore to work on crafted paths, specifically on revert edges.
- Formal specs for “cannot fail” properties
- Use Certora CVL rules for access control, pausing, and token accounting. If you moved to CLI 5.0 in 2025, keep an eye on the changes (parametric rules coverage). Include the rule reports in your pen-test deliverables.
- Refresh your taxonomy
- While SWC is still handy, it's no longer actively maintained. It's a good idea to sync it up with EthTrust v3 and the OWASP SC Top 10 (2025) to address the latest concerns, like oracle manipulation tiers and unchecked external call patterns in today's proxy designs. Check it out here: (github.com)
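As a tool-neutral illustration of the invariant-first approach in this section, here is a minimal Python sketch that fuzzes a toy token model and checks sum(balances) == totalSupply after every random action. The `ToyToken` class, the action set, and the seed are hypothetical stand-ins; in a real engagement the same property would be written as a Foundry invariant test or an Echidna property against the actual contracts.

```python
import random


class ToyToken:
    """Hypothetical in-memory token model, used only to illustrate the invariant."""

    def __init__(self):
        self.balances = {}
        self.total_supply = 0

    def mint(self, to, amount):
        self.balances[to] = self.balances.get(to, 0) + amount
        self.total_supply += amount

    def burn(self, owner, amount):
        if self.balances.get(owner, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[owner] -= amount
        self.total_supply -= amount

    def transfer(self, sender, to, amount):
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[to] = self.balances.get(to, 0) + amount


def supply_invariant(token):
    """System-level invariant: the sum of all balances equals total supply."""
    return sum(token.balances.values()) == token.total_supply


def fuzz(steps=2000, seed=1):
    """Apply random actions and check the invariant after each one."""
    rng = random.Random(seed)
    token = ToyToken()
    users = ["alice", "bob", "carol"]
    for step in range(steps):
        action = rng.choice(["mint", "burn", "transfer"])
        amount = rng.randint(0, 1000)
        try:
            if action == "mint":
                token.mint(rng.choice(users), amount)
            elif action == "burn":
                token.burn(rng.choice(users), amount)
            else:
                token.transfer(rng.choice(users), rng.choice(users), amount)
        except ValueError:
            pass  # a "reverted" call must leave state untouched
        assert supply_invariant(token), f"invariant broken at step {step} by {action}"
    return token
```

Fixing the seed makes a failing sequence reproducible, which is the same reason the playbook recommends preserving fuzz corpora across refactors.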
3) EIP‑7702 programmable EOA testing (post‑Pectra)
Key 7702 Mechanics to Validate in Tests
Type‑4 transactions can set a delegation indicator (0xef0100 || delegate) as the EOA’s code, which makes the delegate’s code run in the context of the EOA.
Take a close look at the authorization list, which is made up of tuples of the form [chain_id, address, nonce, y_parity, r, s]. Also keep in mind that some intrinsic gas and refund paths have changed from the pre‑Pectra rules. For more details, check out the full spec: (eips.ethereum.org).
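To make the delegation indicator concrete, the sketch below builds and parses the 23-byte value 0xef0100 || address in Python. It assumes the account code you fetch from a node (for example via `eth_getCode`) is the raw indicator; the helper names `build_indicator` and `parse_delegation` are illustrative and not part of any library.

```python
from typing import Optional

# 3-byte prefix defined by EIP-7702, followed by the 20-byte delegate address.
DELEGATION_PREFIX = bytes.fromhex("ef0100")


def build_indicator(delegate: str) -> bytes:
    """Return 0xef0100 || delegate for a hex address (with or without 0x)."""
    hex_part = delegate[2:] if delegate.startswith("0x") else delegate
    raw = bytes.fromhex(hex_part)
    if len(raw) != 20:
        raise ValueError("delegate must be a 20-byte address")
    return DELEGATION_PREFIX + raw


def parse_delegation(code: bytes) -> Optional[str]:
    """Return the delegate address if `code` is a delegation indicator, else None."""
    if len(code) == 23 and code.startswith(DELEGATION_PREFIX):
        return "0x" + code[3:].hex()
    return None
```

A test harness can call `parse_delegation` on an EOA's code before and after each user journey to assert which delegate, if any, is currently set.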
Concrete Test Cases
- Enforce a Delegate Allow-list: Make sure your front-end blocks any delegation to untrusted delegates. Pen-test this by hosting a malicious delegate and checking whether the prompts or simulations catch the resulting net asset drains.
- TTL and Revocation: Ensure that revoking a delegation (re-delegating to the null address) clears the delegation indicator. Also confirm that your UI and signer policy trigger a revocation after either a transaction batch or a time-boxed session.
- Simulation as a Policy Gate: Require simulations for Type-4 transactions and fail them if the predicted deltas involve approvals or transfers of unrelated assets. Include bundle simulations when the delegate performs batched calls.
- Phishing Drill: Run red-team drainer flows with “swap-looking” transactions that approve multiple tokens via the 7702 delegation. Make sure your wallet modal and back-end heuristics detect and block these attempts. Industry reports indicate attackers were already exploiting 7702 batch signatures in 2025.
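The allow-list and simulation checks above can be combined into a single policy decision. The following Python sketch is one possible shape for that gate, under stated assumptions: `SimulatedEffect` and `Type4Request` are hypothetical data structures, the allow-list is assumed to hold lowercase addresses, and the effects are assumed to come from whatever simulator you use (this is not Tenderly's response format).

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedEffect:
    kind: str    # "transfer" or "approval" (hypothetical labels)
    asset: str   # token contract address
    amount: int


@dataclass
class Type4Request:
    delegate: str                 # delegate named in the authorization
    intended_assets: set          # assets the user meant to touch
    effects: list = field(default_factory=list)  # simulator output


def evaluate(request: Type4Request, allow_list: set):
    """Return (allowed, reasons) for a Type-4 signing request."""
    reasons = []
    if request.delegate.lower() not in allow_list:
        reasons.append("delegate not on allow-list")
    intended = {a.lower() for a in request.intended_assets}
    for effect in request.effects:
        # Any approval or transfer of an asset the user did not intend is a red flag.
        if effect.asset.lower() not in intended:
            reasons.append(f"unexpected {effect.kind} of {effect.asset}")
    return (not reasons, reasons)
```

Returning the reasons alongside the verdict lets the wallet modal show the user why a request was blocked, which is also useful evidence in a phishing-drill report.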
Just a heads up: EIP‑3074 has been withdrawn. If you have legacy code that references AUTH/AUTHCALL paths or relies on those expectations, retire it and switch over to 7702 semantics.
4) L2 and bridge testing
- Stage conformance checks (design)
- Stick to L2BEAT’s Stage 1 principle: apart from bugs, only a compromise of ≥75% of the Security Council should be able to permanently block a valid L2→L1 message or approve an invalid one. Optimistic rollups should also implement a challenge period of at least 7 days. Make sure your tests focus on these aspects. (l2beat.com)
- Fault‑proofs and exits (execution)
- OP Stack: Permissionless fault proofs are live on OP Mainnet. If you’re working with OP-based chains like Base, confirm that they’ve upgraded to this mechanism. Also test inflight-withdrawal invalidation and reproving after any upgrades.
- Base: Validate permissionless proofs and review the Security Council controls introduced with its move to Stage 1. Include outage drills for sequencer downtime and forced withdrawals.
- Arbitrum: BoLD has been live since February 12, 2025. Pen-test challenge windows, validator participation, and time-bounded dispute assumptions, and make sure bridging UIs clearly show settlement timelines.
- Bridge Attack Readiness
- Since 2021, cross-chain bridge logic attacks have led to losses in the billions. To beef up security, it's essential to include tests for event mismatches, replay attacks, message ordering issues, and assumptions related to light-client verification. (arxiv.org)
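For the replay and ordering tests mentioned above, it helps to have a reference model of what the receiving side should reject. The Python sketch below is a hypothetical receiver (`InboundBridge` is not a real bridge API) that binds each message id to its source chain and enforces strict per-chain nonces. SHA-256 stands in for whatever hash your bridge actually uses.

```python
import hashlib


class InboundBridge:
    """Hypothetical receiver-side model of a bridge endpoint."""

    def __init__(self):
        self.processed = set()   # message ids already executed
        self.next_nonce = {}     # expected next nonce per source chain

    @staticmethod
    def message_id(src_chain: int, nonce: int, sender: str, payload: bytes) -> str:
        # Length-prefix every field so different field splits cannot collide.
        h = hashlib.sha256()
        parts = (str(src_chain).encode(), str(nonce).encode(),
                 sender.lower().encode(), payload)
        for part in parts:
            h.update(len(part).to_bytes(4, "big"))
            h.update(part)
        return h.hexdigest()

    def accept(self, src_chain: int, nonce: int, sender: str, payload: bytes) -> str:
        mid = self.message_id(src_chain, nonce, sender, payload)
        if mid in self.processed:
            raise ValueError("replay")
        expected = self.next_nonce.get(src_chain, 0)
        if nonce != expected:
            raise ValueError("out of order")
        self.processed.add(mid)
        self.next_nonce[src_chain] = expected + 1
        return mid
```

Differential testing against a model like this (feed the same message sequence to the model and to the forked bridge contract, then compare accept/reject outcomes) is one way to surface ordering and replay mismatches.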
5) MEV and transaction‑routing tests
- Private Orderflow Sanity
- Flashbots Protect: Test out those privacy “hint” settings in the staging environment, like choosing between hash-only or logs/calldata for refunds. Also, check out the mempool failover behavior and make sure there’s solid revert protection in place. It’s super important to ensure that any risky flows don’t leak into public mempools. (docs.flashbots.net)
- MEV Blocker: Benchmark the inclusion time and see how much you can improve prices on swaps. Don’t forget to set up full-privacy endpoints for those sensitive transactions. Plus, work rebates into your unit-economics, and make sure the RPC is only connected for swaps that allow data sharing. (docs.cow.fi)
- Decommissioned Products: If you’re still using those old RPCs (like Eden), it’s time to migrate. Just a heads-up, some private RPCs are shutting down in 2025! (theblock.co)
- Sandwich and backrun drills
- Replay large swaps through the public mempool on a fork to tune your slippage, deadline, and routing. Then route the same swaps through private RPCs and compare prices/output, refunds, and failure rates. Use the results to define a “routing SLO.” (docs.cow.fi)
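One way to turn those drill results into a “routing SLO” is to compute the median price improvement in basis points and the private-path failure rate, then compare both against thresholds. The Python sketch below assumes each sample is a dict with `public_out`, `private_out`, and `private_failed` keys; that record shape and the default thresholds are illustrative, not values taken from Flashbots or MEV Blocker documentation.

```python
from statistics import median


def improvement_bps(public_out: int, private_out: int) -> float:
    """Price improvement of the private route over the public one, in basis points."""
    return (private_out - public_out) * 10_000 / public_out


def routing_slo(samples, min_median_bps=0.0, max_failure_rate=0.05):
    """Summarize drill samples and decide whether the private route meets the SLO."""
    succeeded = [s for s in samples if not s["private_failed"]]
    failure_rate = (len(samples) - len(succeeded)) / len(samples)
    if succeeded:
        med = median(improvement_bps(s["public_out"], s["private_out"])
                     for s in succeeded)
    else:
        med = float("-inf")  # nothing landed privately, so the SLO cannot be met
    return {
        "median_improvement_bps": med,
        "failure_rate": failure_rate,
        "meets_slo": med >= min_median_bps and failure_rate <= max_failure_rate,
    }
```

Using the median rather than the mean keeps a single outlier swap from masking a route that is usually worse than the public path.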
6) Wallets, keys, and off‑chain services
- Key Management Tabletop
- Let’s run some simulations around theft and key rotation: we’ll switch up the deployer and governance keys, pause things using multisig, and then recover through either the guardians or the Security Council, just like we’ve mapped out. Plus, we should practice a signer-compromise scenario every quarter to keep our skills sharp.
- 4337 Paymasters and Aggregators
- Check out OpenZeppelin’s public audit notes for 4337 (think gas, deposit records, and signature aggregation) and make sure to retest your paymaster logic. Don’t forget to add some invariants for those fee-sponsored flows! (openzeppelin.com)
- Front-end integrity
- Create “modal invariants”: make sure that what the user signs actually aligns with the on-chain effects (like ensuring there are no sneaky extra approvals or NFT transfers). If your dependency graph drags in unsigned third-party scripts into crucial user flows, it should fail the build.
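The build-failure rule for third-party scripts can be approximated with a small static check. The Python sketch below interprets “unsigned” as “loaded from a non-first-party host without a Subresource Integrity `integrity` attribute”, which is an assumption about what the rule should mean; the function name and host list are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScriptAudit(HTMLParser):
    """Collect external <script> tags that lack an integrity attribute."""

    def __init__(self, first_party_hosts):
        super().__init__()
        self.first_party_hosts = set(first_party_hosts)
        self.violations = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attr = dict(attrs)
        src = attr.get("src")
        if not src:
            return  # inline script: out of scope for this check
        host = urlparse(src).netloc
        # Relative URLs have an empty host and count as first-party.
        if host and host not in self.first_party_hosts and not attr.get("integrity"):
            self.violations.append(src)


def unsigned_third_party_scripts(html: str, first_party_hosts):
    """Return the src of every third-party script without SRI."""
    audit = ScriptAudit(first_party_hosts)
    audit.feed(html)
    return audit.violations
```

In CI, a non-empty result for any page in a signing flow would fail the build. This only covers statically declared scripts, so dynamically injected ones still need runtime checks.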
Putting it into practice: minimal viable test packs
Here’s a handy list of concrete packs that teams can roll out during a sprint. You’ll find some examples right here that you can easily copy into your repositories or tools.
1) Static + Upgradeability Pack
- Set up Slither to run in your CI pipeline; make sure to add the upgradeability tool and check for ERC compliance.
Here's a sample command you could use:
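# <ContractName> and <ProxyName> are placeholders for your own contracts
slither . --checklist
slither . --print human-summary,inheritance
slither-check-upgradeability . <ContractName> --proxy-name <ProxyName>
slither-check-erc . <ContractName> --erc ERC20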
2) Invariant Fuzz Pack (Foundry + Echidna)
- Set up Foundry invariant tests along with Echidna properties that reflect your tokenomics.
- Store your corpora in object storage for rebase-aware replay. According to Trail of Bits, this approach significantly enhances bug detection over time. (blog.trailofbits.com)
3) Symbolic Pack (Halmos)
- For complex batched flows like DEX routers and multi-asset accounting, add Halmos symbolic tests alongside Foundry. This is especially useful for tricky paths that fuzzers often have a hard time reaching. (github.com)
4) Formal Pack (Certora)
- Create a compact CVL suite that includes supply conservation, access control, pause invariants, and upgrade preconditions, focusing solely on the critical contracts. Make sure to keep the rule reports as artifacts. (docs.certora.com)
5) 7702 Policy Pack
- Introduce a policy service that rejects Type‑4 transactions directed to delegates that aren't on the approved list. Also, make sure to require Tenderly simulations for any changes in delegation and automatically revoke access once a bundle gets mined. (eips.ethereum.org)
6) MEV Routing Pack
- Use private RPCs for swaps by default. For sensitive orders, set it to “hash-only” privacy, and make sure mempool failover is only activated for stale pending transactions after more than 25 blocks. Don't forget to benchmark your performance against the public mempool path every quarter. Check out the Flashbots documentation for more details!
Governance and rollup‑dependent drills
- Security Council exercises
- Stay on top of the latest best practices for Security Councils: think about key rotation timing, decision trees for emergency pauses, and keeping communication about incidents clear and open. Don't forget to add Council-triggered pause/unpause drills into your quarterly penetration testing. Check out this guide for more details: (blog.openzeppelin.com)
- Stage audits
- Ask your L2 providers to document their Stage 1/2 status and challenge periods. Pen-test forced-exit paths and message timeouts against what L2BEAT recommends (a ≥7-day optimistic challenge period).
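Once providers have documented those values, the two numeric thresholds cited in this playbook (a challenge period of at least 7 days and a ≥75% Security Council threshold) can be checked mechanically. The Python sketch below uses a hypothetical `RollupConfig` record with made-up example values; it covers only these two checks and is not a full L2BEAT Stage assessment.

```python
from dataclasses import dataclass

SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60


@dataclass
class RollupConfig:
    name: str
    optimistic: bool
    challenge_period_seconds: int
    council_members: int
    council_threshold: int  # signatures required to act


def stage1_findings(cfg: RollupConfig) -> list:
    """Return human-readable findings; an empty list means both checks pass."""
    findings = []
    if cfg.optimistic and cfg.challenge_period_seconds < SEVEN_DAYS_SECONDS:
        findings.append(f"{cfg.name}: challenge period shorter than 7 days")
    # Integer form of threshold / members >= 0.75, which avoids float rounding.
    if cfg.council_threshold * 4 < cfg.council_members * 3:
        findings.append(f"{cfg.name}: council threshold below 75%")
    return findings
```

Running this against every rollup your protocol depends on, and attaching the output to the quarterly drill report, gives reviewers a quick view of which dependencies fall short of the assumptions you tested.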
KPIs and what “good” looks like
- Coverage and signal
- Target: over 90% of critical contracts covered by invariants backed by long-run fuzz corpora, plus at least 3 formal properties proven on the top-tier contract. (blog.trailofbits.com)
- Time‑to‑detect (TTD) and time‑to‑revoke (TTR) for 7702 delegations
- The simulated deltas need to render in under 500 ms, and revocation flows should wrap up within one block right after bundle execution. (eips.ethereum.org)
- Exit drills
- Forced-exit drills run successfully on OP/Arbitrum-based rollups, accounting for challenge periods and post-upgrade reproving patterns. Track any failures that appear after system upgrades in OP’s fault-proof setup. (help.superbridge.app)
- MEV routing outcomes
- Private routing delivers measurably better swap outputs, with no sandwiches observed in staging. Document the price-improvement deltas and track refund-share policies. (docs.cow.fi)
Emerging practices to adopt in 2025
- Invariant‑Driven Development (IDD) as a product culture
- Kicking things off by writing invariants and then using fuzzing or formal tools to check them tends to do way better than the old “test after you code” approach. It’s a good idea to weave those invariants into your design docs and code reviews. Check it out here: (blog.trailofbits.com)
- Shift-left simulation
- Simulate every action that produces a signature, on both the front-end and the back-end. Surface balance and NFT changes, and catch “implicit approval” patterns that can siphon off unrelated assets. (docs.tenderly.co)
- Standards-aligned reporting
- Publish EthTrust v3 and SCSVS coverage alongside audit links. Also categorize issues by the OWASP SC Top 10 so non-security stakeholders can prioritize more easily. (entethalliance.org)
Budget and timeline guidance
- Two-week “MVP” pentest for an MVP protocol/wallet
- Day 1-3: Let’s kick things off with scoping, building a threat model, and mapping out the standards.
- Day 4-7: Time to dive into Slither and run a quick fuzzing pass, tackle 7702 policy checks, and check out those MEV routing benchmarks.
- Day 8-10: We’ll wrap it up with some targeted symbolic/formal checks on the critical flows and perform a rollup exit drill on the testnet.
- Deliverables: Expect to see prioritized findings, remediation diffs, policy configs, and a setup for CI.
- Six-week “production-ready” pentest
- This includes everything mentioned above plus a long-run fuzz test in the cloud, formal properties to ensure key invariants are solid, comprehensive L2/bridge exercises, a tabletop session with the Security Council, and a phishing red-team campaign focused on 7702.
Final word
In 2025, security really shook things up: EIP-7702 tweaked how wallets behave, Stage-1 rollups altered exit guarantees, and drainer kits took advantage of signature UX on a larger scale. Nowadays, a modern pen-test dives into reality as it stands--it's all about rigorously testing transaction policies, running simulations, exploring mempool routing, scrutinizing governance keys, and examining exits just like we would with bytecode.
Need a hand tweaking this playbook to fit your stack, whether it’s EVM, OP/Arbitrum, or something cross-chain? Look no further than 7Block Labs! They can help you transform it into a solid backlog, set up CI pipelines, and even create drills for your team to tackle every quarter.
References and data points
- Q1 2025 hack losses (the worst quarter on record) and monthly figures: Immunefi via The Block.
- 2024 wallet-drainer losses and 2025 phishing waves, including 7702 batch signatures: ScamSniffer and others.
- Pectra/EIP-7702 spec, current status, and Type-4 transaction details: ethereum.org and EIPs.
- EIP-3074 withdrawal.
- L2BEAT Stages framework and updates to the Stage-1 principles.
- OP Stack fault proofs and Stage-1 announcement: OP Labs.
- Base Stage-1 transition, plus outage context for resilience testing: CoinDesk.
- Arbitrum BoLD deployment: docs and news.
- Private RPC docs and setup details: Flashbots Protect and MEV Blocker.
- Echidna/Medusa invariant work and IDD guidance: Trail of Bits.
- Slither static analysis and upgradeability review.
- Certora Prover docs and 2025 CLI updates.
- OWASP Smart Contract Top 10 (2025), EEA EthTrust v3, and SCSVS v2.

