What most teams get wrong before they test
The most common A/B testing mistake isn't a statistical error; it's a hypothesis error. Teams test the wrong things: button colours, stock photo selections, slight copy variations. These tests are easy to set up and rarely produce a meaningful result. The tests that produce significant, repeatable lifts tend to test structural decisions: the hierarchy of information, the specificity of the value proposition, the placement of social proof.
Before running any test, ask: what is the mechanism by which this change should affect conversion? If you can't articulate a plausible mechanism — 'changing this should improve conversion because [specific reason why this change affects visitor psychology or decision-making]' — the test probably won't produce a useful result even if it reaches statistical significance.
“If you can't articulate why a change should work before testing, you won't understand the result after.”
The 12 tests and what happened
Test 1: Moving social proof logos from below-fold to hero section. Result: +22% on primary CTA. The mechanism: credibility established earlier in the reading flow reduces the risk perception at the decision moment.
Test 3: Replacing generic benefit headlines with outcome-specific headlines matched to the ad keyword. Result: +34%.
Test 7: Adding a 'Who this is not for' section near the CTA. Result: +18%. The mechanism: explicit exclusion increases trust and purchase confidence among the included audience.
The six tests that produced no result: two button colour changes, one headline length variation, two image style changes (lifestyle vs. product), one navigation simplification. The pattern: aesthetic and stylistic changes almost never move conversion. Structural and messaging changes move it significantly.
What the data actually tells you
Statistical significance is not the same as practical significance. A test that reaches 95% confidence with a 0.3% lift is statistically significant and, on most pages, practically worthless. Always calculate the business impact of a lift before deciding a test is worth running. If a 10-percentage-point conversion improvement on a page with 100 visitors per month produces 10 additional leads, and each lead is worth £50, that's £500 per month. Is that worth the opportunity cost of the test?
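To make the distinction concrete, here is a minimal Python sketch of both checks. The function names, the assumed baseline of 10% rising to 20%, and the traffic figures in the second example are illustrative assumptions, not numbers from the tests described in this article. The significance check uses a standard pooled two-proportion z-test, which is one common choice rather than the only valid one.

```python
from math import erf, sqrt


def monthly_value_of_lift(visitors_per_month, baseline_rate, new_rate, value_per_lead):
    """Monthly value of the extra leads produced by a conversion-rate change.

    Rates are fractions (0.10 means 10%), so new_rate - baseline_rate is the
    lift in percentage points expressed as a fraction.
    """
    extra_leads = visitors_per_month * (new_rate - baseline_rate)
    return extra_leads * value_per_lead


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF evaluated at |z|, written with the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - cdf)


# The worked example from the text, assuming conversion moves from 10% to 20%:
# 100 visitors x 10 points = 10 extra leads, at 50 per lead.
value = monthly_value_of_lift(100, 0.10, 0.20, 50.0)

# Hypothetical high-traffic test: 10.0% vs 10.3% with 100,000 visitors per arm.
# The difference clears the 95% threshold, yet the lift is only 0.3 points.
p_value = two_proportion_p_value(10_000, 100_000, 10_300, 100_000)
significant = p_value < 0.05
```

Running the value calculation before the test starts shows whether even the best plausible outcome would justify the traffic and time the test will consume.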
The most valuable output of a testing programme isn't the individual test results — it's the accumulated understanding of what your audience responds to. Each test that reveals a mechanism builds a model of how your specific visitors make decisions. That model, built over dozens of tests, is a strategic asset that compounds in value over time.