The previous post walked through an end-to-end implementation: a minimal token contract, off-chain state reconstruction, and a React frontend — all the way from `mint()` to MetaMask. This post picks up where that left off: how do you QA something like this?
I’m not a blockchain engineer (yet), but QA patterns port well across domains, and borrowing what already works elsewhere is how I learn fastest.
The contract only does three things: `mint`, `transfer`, and `burn`, but even that is enough to practice the full QA toolchain: static analysis, mutation testing, gas profiling, formal verification.
The code is in `egpivo/ethereum-account-state`.
Blockchain QA Pyramid: from static analysis at the base to formal verification at the top

Before adding anything new, the project already had:
All tests passed. Coverage looked fine. So why bother with more?
Because “all tests pass” does not mean “all bugs are caught.” 100% line coverage can still miss a real bug if no assertion checks the right thing.
Slither (Trail of Bits) catches issues that are invisible to tests: reentrancy, unchecked return values, interface mismatches.
./scripts/run-qa.sh slither
Result: 1 Medium finding: `erc20-interface`: `transfer()` doesn’t return `bool`.
This is expected. The contract is intentionally not a full ERC20: it is an educational state machine. But the finding is not academic:
If someone later imports this token into a protocol expecting ERC20, the interface mismatch would silently fail. Slither flags it now so the decision is conscious.
./scripts/run-qa.sh coverage

Coverage result:
One uncovered function: `BalanceLib.gt()`. We will come back to this.
forge coverage output: 24 tests passed, Token.sol coverage table

./scripts/run-qa.sh gas
Baseline gas costs for the three operations:
Gas in terms of operations

On subsequent runs, `forge snapshot --diff` compares against the baseline. A 20% gas regression in `transfer()` is a real cost to every user — catching it before merge is cheap.
This is where things got interesting. Gambit (Certora) generates mutants: copies of `Token.sol` with small deliberate bugs (`+=` to `-=`, `>=` to `>`, conditions negated). The pipeline runs the full test suite against each mutant. If a mutant survives (all tests still pass), that is a concrete test gap.
./scripts/run-qa.sh mutation
Result: 97.0% mutation score — 32 killed, 1 survived out of 33 mutants.
Gambit’s output log shows each mutant and what it changed. A few examples:
Generated mutant #7: BinaryOpMutation — Token.sol:168
totalSupply = totalSupply.add(amountBalance) → totalSupply = totalSupply.sub(amountBalance)
KILLED by test_Mint_Success
Generated mutant #19: RelationalOpMutation — Token.sol:196
if (!fromBalance.gte(amountBalance)) → if (fromBalance.gte(amountBalance))
KILLED by test_Transfer_Success
Generated mutant #28: SwapArgumentsMutation — Token.sol:81
return Balance.unwrap(a) > Balance.unwrap(b) → return Balance.unwrap(b) > Balance.unwrap(a)
SURVIVED ← no test caught this

Gambit mutation testing: 32 killed, 1 survived, mutation score 97.0%
The surviving mutant swapped `a > b` to `b > a` in `BalanceLib.gt()`. No test caught it because `gt()` is dead code. It is never called anywhere in `Token.sol`.
Coverage flagged 91.67% functions but could not explain the gap. Mutation testing did: `gt()` is dead code, nothing calls it, and nobody would notice if it were wrong.
Dead or unprotected code in smart contracts has real precedent.
The function was not intended to be callable, but nobody tested that assumption. Our `gt()` is harmless by comparison, but the pattern is the same: code that exists but is never exercised is code that nobody is watching.
Halmos (a16z) reasons about all possible inputs symbolically. Where fuzz tests sample random values and hope to hit edge cases, Halmos proves properties exhaustively.
./scripts/run-qa.sh halmos
Result: 9/9 symbolic tests pass — all properties proven for all inputs.
Properties verified:
Verified properties

One practical note: Halmos 0.3.3 does not support `vm.expectRevert()`, so I could not write revert tests the normal Foundry way. The workaround is a try/catch pattern — if the call succeeds when it should revert, `assert(false)` fails the proof:
function check_mint_reverts_on_zero_address(uint256 amount) public {
    vm.assume(amount > 0);
    try token.mint(address(0), amount) {
        assert(false); // should not reach here
    } catch {
        // expected revert - Halmos proves this path is always taken
    }
}
Not the prettiest, but it works — Halmos still proves the property for all inputs. This is the kind of thing you only find out by actually running the tool.
For context on why formal verification matters:
The vulnerability was in the code, reviewable by anyone, but no tool or test caught it before deployment. Symbolic provers like Halmos exist precisely to close that gap — they do not sample; they exhaust the input space.
Halmos output: 9 tests passed, 0 failed, symbolic test results

The test file is `contracts/test/Token.halmos.t.sol`.
The first post’s architecture has a TypeScript domain layer that mirrors the on-chain state machine. This phase tests whether the two actually agree.
I added fast-check property tests for the TypeScript domain layer, mirroring what Foundry’s fuzzer does for Solidity:
npm test -- tests/unit/property.test.ts
Result: 9/9 property tests pass after fixing a real bug.
Properties tested:
fast-check found a real cross-layer consistency bug in `Token.ts` `transfer()`. The shrunk counterexample was immediately clear:
Property failed after 3 tests
Shrunk 2 time(s)
Counterexample: transfer(from=0xaaa…, to=0xaaa…, amount=1n)
→ from == to (self-transfer)
→ verifyInvariant() returned false
Self-transfer (`from == to`) broke the `sum(balances) == totalSupply` invariant. `toBalance` was read before `fromBalance` was updated, so when `from == to`, the stale value overwrote the deduction:
// Before (buggy)
const fromBalance = this.getBalance(from);
const toBalance = this.getBalance(to); // ← stale when from == to
this.accounts.set(from.getValue(), fromBalance.subtract(amount));
this.accounts.set(to.getValue(), toBalance.add(amount)); // ← overwrites the subtraction
Fix: read `toBalance` after writing `fromBalance`, matching Solidity’s storage semantics:
// After (fixed)
const fromBalance = this.getBalance(from);
this.accounts.set(from.getValue(), fromBalance.subtract(amount));
const toBalance = this.getBalance(to); // ← now reads updated value
this.accounts.set(to.getValue(), toBalance.add(amount));
The Solidity contract was not affected: it re-reads storage after each write. But the TypeScript mirror had a subtle ordering dependency that no existing unit test covered.
Cross-layer mismatches at larger scale have been catastrophic.
Our self-transfer bug would not have lost anyone money, but the failure mode is structurally the same: two layers that are supposed to agree, don’t.
Running QA tools on an existing project is never just “install and run.” A few things broke before they worked:
Everything runs through two scripts:
./scripts/run-qa.sh slither gas # just static analysis + gas
./scripts/run-qa.sh mutation # just mutation testing
./scripts/run-qa.sh all # everything
Not every check is fast. Slither and coverage run on every commit. Mutation testing and Halmos are slower — better suited for weekly or pre-release runs.
Five QA layers, each catching a different class of problem.
Layer explanation

Gambit and fast-check gave the most actionable results in this round.
The QA checks are now wired into GitHub Actions as a six-stage pipeline:
CI Pipeline: Build & Lint fans out to Test, Coverage, Gas, Slither, and Audit stages

GitHub Actions pipeline: Build & Lint gates all downstream stages.
Stage explanation

Ethereum Account State: QA Pipeline for a Minimal Token was originally published in Coinmonks on Medium.


