Tips for low-volume A/B testing

Giff Constable
lean, startups

I tweeted about low-volume A/B testing the other day, and wanted to share a few thoughts. I also spent a few minutes on the phone with Hiten Shah (CEO of KISSMetrics), who probably knows as much about A/B testing as anyone out there. He passed on three tips that I wanted to share here:

1. Only test one variation at a time
2. The more radical the variation, the better
3. 100 conversions per variation is a good target, although you can sometimes see significant differences with fewer
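
To make "significant differences" concrete, here is a rough sketch of one common way to check whether two variations really differ: a two-proportion z-test. This is my own illustration, not something Hiten prescribed. The function name and the visitor and conversion counts are made up, and the normal approximation behind it gets shaky at very low volumes, so treat the p-value as a hint rather than a verdict.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are conversion counts, n_a / n_b are visitor counts.
    Uses the pooled normal approximation, which is only a rough guide
    when counts are small.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all in the data (e.g. zero conversions everywhere).
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 100 conversions from 2,000 visitors on A,
# 130 conversions from 2,000 visitors on B.
z, p = two_proportion_z_test(100, 2000, 130, 2000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value (0.05 is the usual convention) suggests the gap is unlikely to be noise. With only a handful of conversions, the qualitative feedback should still carry a lot of the weight.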

Two other really smart product designers also weighed in on Twitter:
From Laura Klein (@lauraklein): “I always rely more on qualitative in those situations. Bad data are worse than no data in some cases.”
Followed up by Amy Jo Kim (@amyjokim): “what she said”

I agree that when it comes to low-volume testing, the need to have qualitative feedback is paramount. A lot of judgement and intuition is required.

Teeing up a Real Case: Unpakt
As with all this stuff, context is everything. One project I have been working on (for Proof) is just starting to do A/B testing. The startup is called Unpakt, and they are creating a “Kayak for moving companies”.

It is early days for the company, but their beta site is now live for the New York market. This has been a whisper-soft launch, but the first revenue has already started coming through the door.

From a UX perspective, this has been a very tricky project. First, planning a move is immensely complex and data-intensive. Second, the target customer is hard to find for testing. The target customer is someone who is within one to four months of a move, and who wants to work with a professional moving company. If you aren’t in that context, you probably don’t feel the pain.

Within these constraints, we did lots of Krug-esque qualitative testing while getting the founder comfortable with launching early and imperfect, then iterating in the market.

The qualitative learnings have to be taken with a grain of salt because they are artificial. In artificial settings it is easy to test for usability, but it is much harder to test for value to the customer and thus desire.

Now that the product is live and in the wild, we get to see truth. However, here again the raw metrics have to be taken with a grain of salt, since not everybody coming to the site has actual intent. Google AdWords helps us carve out a group of users that we believe have actual purchase intent. Yes, we look at our overall conversion funnel, but we also look specifically at the funnel for AdWords-sourced users.

If I can stress one thing: when testing, be cognizant of who your testers are and what their context is. If you don’t take that into consideration, it can distort your thinking. Try not to rely on one source of people for your data.

Moving is so complex (at least if your goal is to provide real prices) that many of our simpler UX approaches failed early on. Counter-intuitively, more complex approaches have performed better in our usability tests. But we still believe that we can try new, simpler approaches that will perform even better. As is always the case, there is debate within the team, and this is where the A/B tests can really help out.

We’ve just built the infrastructure to run experiments, or sets of experiments, against each other. The team’s next step is to implement a series of changes to the UX that will hopefully simplify and improve things. We’ll do both qualitative testing and live A/B tests. As Hiten advises above, we’ll try things serially, which might require some patience but will make for better decisions. We’ll gather 50 to 100 data points for each approach. If the newer, simpler approach performs as well as or better than the current approach, then the new method will win (normally I would say don’t make a change without clear progress, but in this case, I would still choose simple over complex in a tie).
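
The "simple wins a tie" rule can be sketched in a few lines. This is only one way to read it, and it is not Unpakt's actual code: I am assuming "equivalent" means the data cannot show the new approach is worse, judged with a rough 95% interval on the difference in conversion rates, and I am treating a data point as one visitor. The function names and counts are hypothetical.

```python
import math

Z_95 = 1.96  # approximate 97.5th percentile of the standard normal


def difference_interval(conv_current, n_current, conv_new, n_new, z=Z_95):
    """Rough interval for (new rate - current rate), unpooled normal approximation."""
    p_cur = conv_current / n_current
    p_new = conv_new / n_new
    se = math.sqrt(p_cur * (1 - p_cur) / n_current + p_new * (1 - p_new) / n_new)
    diff = p_new - p_cur
    return diff - z * se, diff + z * se


def pick_winner(conv_current, n_current, conv_new, n_new):
    """Keep the current approach only if the new one looks clearly worse."""
    low, high = difference_interval(conv_current, n_current, conv_new, n_new)
    # The new, simpler approach loses only when even the optimistic end
    # of the interval is below zero. Ties and wins both go to "new".
    return "current" if high < 0 else "new"


# Hypothetical counts at roughly the volumes discussed above.
print(pick_winner(conv_current=12, n_current=100, conv_new=11, n_new=100))
```

At 50 to 100 data points the interval is wide, so this rule will hand most close calls to the simpler design. That matches the bias I described, but it also means only a big drop in conversion will stop a change, which is one more reason to pair it with qualitative testing.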

Ultimately, judgement calls will still be required, but at least they will be counterbalanced by real qualitative and quantitative data. All this makes for a much healthier process too, because we don’t have the ego battles that can plague a lot of product development projects.

p.s. Unpakt is looking for a head of product, a Rails developer and a marketing director!