
13 Easy AB and Split Test Screwups - Conversionista Meetup - Stockholm

My first deck showing all the common mistakes and screwups that plague AB testers. Learn how to avoid the problems of biased tests, broken recording and baffling results - be in control of your testing accuracy and great lifts!

  1. 1. 13 Easy split testing f***ups @OptimiseOrDie
  2. 2. Top Fuckups for 2013 1. Testing in the wrong place 2. Your hypothesis inputs are crap 3. No analytics integration 4. Your test will finish after you die 5. Not testing for long enough 6. No QA for your split test 7. Opportunities are not prioritised 8. Testing cycles are too slow 9. Your test fails 10. The result is ‘about the same’ 11. Test flips or moves around 12. Nobody ‘feels’ the test 13. You forgot you were responsive @OptimiseOrDie
  3. 3. @OptimiseOrDie • UX and Analytics (1999) • User Centred Design (2001) • Agile, Startups, No budget (2003) • Funnel optimisation (2004) • Multivariate & A/B (2005) • Conversion Optimisation (2005) • Persuasive Copywriting (2006) • Joined Twitter (2007) • Lean UX (2008) • Holistic Optimisation (2009) Was : Group eBusiness Manager, Belron Now : Consulting
  4. 4. Timeline - 1998 1999 - 2004 2004-2008 2008-2012 @OptimiseOrDie
  5. 5. #1 : You’re doing it in the wrong place @OptimiseOrDie
  6. 6. #1 : You’re doing it in the wrong place There are 4 areas a CRO expert always looks at: 1. Inbound attrition (medium, source, landing page, keyword, intent and many more…) 2. Key conversion points (product, basket, registration) 3. Processes and steps (forms, logins, registration, checkout) 4. Layers of engagement (search, category, product, add) For each area: 1. Use visitor flow reports for attrition – very useful. 2. For key conversion points, look at loss rates & interactions. 3. Processes and steps – look at funnels or make your own. 4. Layers and engagement – make a model. Let’s look at an example I’ve used recently
  7. 7. Examples – Concept Bounce Engage Outcome @OptimiseOrDie
  8. 8. Examples – Search or Category Product Page Add to basket View basket Checkout Bounce Complete @OptimiseOrDie
  9. 9. Examples – Login to Account Content Engage Start Application Type and Details Eligibility Bounce Photo Complete @OptimiseOrDie
  10. 10. 6.3 – Examples – Guide Dogs Content Engage Donation Pathway Donation Page Starts process Funnel steps Bounce Complete @OptimiseOrDie
  11. 11. 6.3 – Within a layer [Diagram labels: Page 1, Page 2, Page 3, Page 4, Page 5, Exit, Deeper Layer; micro conversions: Wishlist, Contact, Email, Like]
  12. 12. #1 : You’re doing it in the wrong place • Get to know the flow and loss (leaks) inbound, inside and through key processes or conversion points. • Once you know the key steps you’re losing people at and how much traffic you have – make a money model. • Let’s say 1,000 people see the page a month. Of those, 20% (200) convert to checkout. • Estimate the influence your test can bring. How much money or KPI improvement would a 10% lift in the checkouts deliver? • Congratulations – you’ve now built the world’s first IT plan with a return on investment estimate attached! • I’ll talk more about prioritising later – but here is a good real world analogy for you to use:
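The money model on slide 12 can be sketched in a few lines. The traffic (1,000 views a month), the 20% checkout rate and the 10% lift come from the slide; the value of 50 per checkout and the function name `monthly_uplift` are assumptions made up for illustration, so substitute your own figures.

```python
def monthly_uplift(visitors, conversion_rate, relative_lift, value_per_conversion):
    """Estimate the extra conversions and extra value per month from a relative lift."""
    baseline_conversions = visitors * conversion_rate
    extra_conversions = baseline_conversions * relative_lift
    return extra_conversions, extra_conversions * value_per_conversion


# Slide figures: 1,000 page views a month, 20% reach checkout, test aims for a 10% lift.
# value_per_conversion=50.0 is a hypothetical number, not taken from the deck.
extra_checkouts, extra_value = monthly_uplift(
    visitors=1000,
    conversion_rate=0.20,
    relative_lift=0.10,
    value_per_conversion=50.0,
)
print(f"{extra_checkouts:.0f} extra checkouts a month, worth about {extra_value:.0f}")
```

Running the same function over every candidate page gives a rough return estimate to put next to each test idea.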
  13. 13. Think like a store owner! If you can’t refurbish the entire store, which floors or departments will you invest in optimising? Wherever there is: • Footfall • Low return • Opportunity
  14. 14. #2 : Your hypothesis inputs are all wrong Insight - Inputs (#FAIL): Opinion, Cherished notions, Marketing whims, Cosmic rays, Not ‘on brand’ enough, Ego, IT inflexibility, Panic, Internal company needs, Competitor change, An article the CEO read, Some dumbass consultant, Competitor copying, Dice rolling, Guessing, Knee jerk reactions, Shiny feature blindness
  15. 15. #2 : These are the inputs you need… Insight - Inputs: Usability testing, Forms analytics, Search analytics, Voice of Customer, Market research, Eye tracking, Customer contact, A/B and MVT testing, Big & unstructured data, Social analytics, Session Replay, Web analytics, Segmentation, Sales and Call Centre, Surveys, Customer services, Competitor evals
  16. 16. #2 : Solutions • You need multiple tool inputs – Tool decks are here : • Usability testing and User facing teams – If you’re not using these properly, you’re hosed • Session replay tools provide vital input – Get vital additional customer evidence • Simple page Analytics don’t cut it – Invest in your analytics, especially event tracking • Ego, Opinion, Cherished notions – fill gaps – Fill these vacuums with insights and data • Champion the user – Give them a chair at every meeting @OptimiseOrDie
  17. 17. #3 : No analytics integration • Investigating problems with tests • Segmentation of results • Tests that fail, flip or move around • Tests that don’t make sense • Broken test setups • What drives the averages you see?
  18. 18. #4 : The test will finish after you die [Image captions: “These Danish porn sites are so hardcore!” / “We still keep watching our old AB tests in retirement”] • Use a test length calculator like this one:
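A test length calculator of the kind slide 18 points to typically does something like the following. This is a minimal sketch, assuming a two-sided two-proportion z-test; the 80% power default, the function names and the 33 visitors a day (roughly the 1,000 a month used earlier in the deck) are illustrative assumptions, not the calculator the slide links to.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed in each variant to detect the given relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def days_to_finish(baseline_rate, relative_lift, daily_visitors, variants=2):
    """Days of traffic needed when visitors are split evenly across all variants."""
    needed = visitors_per_variant(baseline_rate, relative_lift) * variants
    return ceil(needed / daily_visitors)


# Hypothetical low-traffic page: 20% baseline, hoping for a 10% relative lift.
print(visitors_per_variant(0.20, 0.10), "visitors per variant")
print(days_to_finish(0.20, 0.10, daily_visitors=33), "days at 33 visitors a day")
```

With those inputs the answer comes out at well over a year, which is the point of the slide: check the length before you launch, and test bolder changes or busier pages if the number is absurd.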
  19. 19. #5 : You don’t test for long enough
      • The minimum length
        – 2 business cycles (comparison)
        – Always test ‘whole’ not partial cycles
        – Don’t self stop!
        – Usually a week, 2 weeks, Month
        – Be aware of multiple cycles
      • How long after that
        – 95% confidence or higher is my aim – and often hit higher than this
        – I aim for a minimum 250 outcomes, ideally 350+ for each ‘creative’
        – If you test 4 recipes, that’s 1400 outcomes needed
        – You should have worked out how long each batch of 350 needs before you start!
        – If you segment, you’ll need more data
        – It may need a bigger sample if the response rates are similar*
        – Use a test length calculator but be aware of minimums
        – Important insider tip – watch the error bars! The +/- stuff – let’s explain
      * Stats geeks know I’m glossing over something here. That test time depends on how the two experiments separate in terms of relative performance as well as how volatile the test response is. I’ll talk about this when I record this one! This is why testing similar stuff sux.
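The rules of thumb on slide 19 (350 outcomes per recipe, at least 2 whole business cycles) combine into a quick planning check. This is a rough sketch: the function name, the 7-day cycle default and the 60 outcomes a day are assumptions for illustration, and it deliberately ignores the statistical caveat flagged in the footnote.

```python
from math import ceil


def minimum_test_days(recipes, outcomes_per_day, cycle_days=7,
                      outcomes_per_recipe=350, min_cycles=2):
    """Return (total outcomes needed, days to run), rounded up to whole business cycles."""
    total_outcomes = recipes * outcomes_per_recipe
    days_for_outcomes = ceil(total_outcomes / outcomes_per_day)
    # Never stop mid-cycle, and never run fewer than the minimum number of cycles.
    cycles = max(min_cycles, ceil(days_for_outcomes / cycle_days))
    return total_outcomes, cycles * cycle_days


# 4 recipes at 350 outcomes each is the slide's 1400; 60 outcomes a day is hypothetical.
outcomes, days = minimum_test_days(recipes=4, outcomes_per_day=60)
print(f"{outcomes} outcomes needed, run for {days} days")
```

Note that `outcomes_per_day` should be the conversions the tested page actually produces across all recipes, not a sitewide number.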
  20. 20. #2 : The tennis court – Let’s say we want to estimate, on average, what height Roger Federer and Nadal hit the ball over the net at. So, let’s start the match: @OptimiseOrDie
  21. 21. First Set Federer 6-4 – We start to collect values 63.5cm +/- 2cm 62cm +/- 2cm @OptimiseOrDie
  22. 22. Second Set – Nadal 7-6 – Nadal starts sending them low over the net 62.5cm +/- 1cm 62cm +/- 1cm @OptimiseOrDie
  23. 23. Final Set Nadal 7-6 – We start to collect values 62cm +/- .3cm 61.8cm +/- .3cm
  24. 24. Let’s look at this a different way 9.1 ± 0.3% 62.5cm +/- 1cm @OptimiseOrDie
  25. 25. Graph is a range, not a line: 9.1 ± 0.3%
  26. 26. #5 : Summary
      • The minimum length:
        – 2 business cycles minimum, regardless of outcomes
        – 250+, prefer 350+ outcomes in each
        – 95%+ confidence
        – Error bar separation between creatives
      • Pay attention to:
        – Time it will take for the number of ‘recipes’ in the test
        – The actual footfall to the test – not sitewide numbers
        – Test results that don’t separate – makes the test longer
        – This is why you need brave tests – to drive difference
        – The error bars – the numbers in your AB testing tool are not precise – they’re fuzzy regions that depend on response and sample size.
        – Sudden changes in test performance or response
        – Monitor early tests like a chef!
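The "fuzzy region" behind a figure like 9.1 ± 0.3% can be approximated by hand. The sketch below assumes a simple normal-approximation interval, which may differ from what your testing tool computes; the visitor and conversion counts are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def conversion_interval(conversions, visitors, confidence=0.95):
    """Normal-approximation confidence interval (low, high) for a conversion rate."""
    rate = conversions / visitors
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(rate * (1 - rate) / visitors)
    return rate - margin, rate + margin


def intervals_separate(a, b):
    """True when the two (low, high) ranges do not overlap at all."""
    return a[1] < b[0] or b[1] < a[0]


# Hypothetical counts: the same 9.1% vs 10.0% rates at two sample sizes.
early = intervals_separate(conversion_interval(910, 10_000),
                           conversion_interval(1_000, 10_000))
later = intervals_separate(conversion_interval(9_100, 100_000),
                           conversion_interval(10_000, 100_000))
print("separated early:", early, "| separated later:", later)
```

As in the tennis example, the rates barely move but the ranges shrink as data accumulates. Waiting for the bars to stop overlapping is a cautious rule of thumb rather than a formal significance test.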
  27. 27. #6 : No QA testing for the AB test?
  28. 28. #6 : What QA testing should I do? • Cross Browser Testing • Testing from several locations (office, home, elsewhere) • Testing the IP filtering is set up • Test tags are firing correctly (analytics and the test tool) • Test as a repeat visitor and check session timeouts • Cross check figures from 2+ sources • Monitor closely from launch, recheck, watch
  29. 29. #7 : Opportunities are not prioritised Once you have a list of potential test areas, rank them by opportunity vs. effort. The common ranking metrics I use include: • Opportunity (profit, revenue) • Dev resource • Time to market • Risk / Complexity Make yourself a quadrant diagram and plot them
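One way to turn slide 29's ranking into something you can sort is shown below. It is only a sketch: the 1 to 5 scales, the single `effort` score standing in for dev resource, time to market and risk, the quadrant labels and the candidate names are all assumptions, not the author's scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    opportunity: int  # 1 (small) to 5 (large): profit or revenue at stake
    effort: int       # 1 (easy) to 5 (hard): dev resource, time to market, risk combined


def quadrant(candidate, midpoint=3):
    """Place a candidate in one of four opportunity-vs-effort quadrants."""
    high_opportunity = candidate.opportunity >= midpoint
    low_effort = candidate.effort < midpoint
    if high_opportunity and low_effort:
        return "do first"
    if high_opportunity:
        return "plan"
    if low_effort:
        return "fill-in"
    return "avoid"


def rank(candidates):
    """Order candidates by opportunity per unit of effort, best first."""
    return sorted(candidates, key=lambda c: c.opportunity / c.effort, reverse=True)


# Hypothetical test areas, scored by hand.
ideas = [
    Candidate("site search", 4, 4),
    Candidate("checkout form", 5, 2),
    Candidate("footer links", 1, 4),
    Candidate("homepage banner", 2, 1),
]
for idea in rank(ideas):
    print(f"{idea.name}: {quadrant(idea)}")
```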
  30. 30. #8 : Your cycles are too slow [Chart: Conversion against time, 0 to 18 Months]
  31. 31. #8 : Solutions • Give Priority Boarding for opportunities – The best seats reserved for metric shifters • Release more often to close the gap – More testing resource helps, analytics ‘hawk eye’ • Kaizen – continuous improvement – Others call it JFDI (just f***ing do it) • Make changes AS WELL as tests, basically! – These small things add up • RUSH Hair booking – Over 100 changes – No functional changes at all – 37% improvement • In between product lifecycles? – The added lift for 10 days’ work, worth 360k
  32. 32. #8 : Make your own cycles @OptimiseOrDie
  33. 33. #9 : Your test fails
  34. 34. #9 : Your test fails • Learn from the failure! If you can’t learn from the failure, you’ve designed a crap test. • Next time you design, imagine all your stuff failing. What would you do? If you don’t know or you’re not sure, get it changed so that a negative becomes insightful. • So : failure itself at a creative or variable level should tell you something. • On a failed test, always analyse the segmentation and analytics • One or more segments will be over and under • Check for varied performance • Now add the failure info to your Knowledge Base: • Look at it carefully – what does the failure tell you? Which element do you think drove the failure? • If you know what failed (e.g. making the price bigger) then you have very useful information • You turned the handle the wrong way • Now brainstorm a new test @OptimiseOrDie
  35. 35. #10 : The test is ‘about the same’ • Analyse the segmentation • Check the analytics and instrumentation • One or more segments may be over and under • They may be cancelling out – the average is a lie • The segment level performance will help you (beware of small sample sizes) • If you genuinely have a test which failed to move any segments, it’s a crap test – be bolder • This usually happens when it isn’t bold or brave enough in shifting away from the original design, particularly on lower traffic sites • Get testing again!
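The "average is a lie" point on slide 35 is easy to see with numbers. The sketch below uses invented desktop and mobile counts, chosen so the two segments cancel exactly; the function names and data layout are assumptions for illustration.

```python
def relative_lift(control, variant):
    """Relative lift of variant over control; each argument is (conversions, visitors)."""
    control_rate = control[0] / control[1]
    variant_rate = variant[0] / variant[1]
    return (variant_rate - control_rate) / control_rate


def lift_by_segment(results):
    """Return the overall lift and the lift for each segment."""
    total_control = tuple(sum(r["control"][i] for r in results.values()) for i in (0, 1))
    total_variant = tuple(sum(r["variant"][i] for r in results.values()) for i in (0, 1))
    overall = relative_lift(total_control, total_variant)
    per_segment = {
        name: relative_lift(r["control"], r["variant"]) for name, r in results.items()
    }
    return overall, per_segment


# Made-up data: (conversions, visitors) for each arm of the test.
results = {
    "desktop": {"control": (100, 1000), "variant": (120, 1000)},
    "mobile": {"control": (100, 1000), "variant": (80, 1000)},
}
overall, per_segment = lift_by_segment(results)
print(f"overall: {overall:+.0%}")
for name, lift in per_segment.items():
    print(f"{name}: {lift:+.0%}")
```

The overall result looks flat while one segment is up and the other is down by the same amount. Real segments will have smaller samples than the whole test, so treat segment-level lifts with the same caution about error bars.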
  36. 36. #11 : The test keeps moving around • There are three reasons it is moving around – Your sample size (outcomes) is still too small – The external traffic mix, customers or reaction has suddenly changed or – Your inbound marketing driven traffic mix is completely volatile (very rare) • Check the sample size • Check all your marketing activity • Check the instrumentation • If no reason, check segmentation
  37. 37. #11 : The test has flipped on me • Something like this can happen: • Check your sample size. If it’s still small, then expect this until the test settles. • If the test does genuinely flip – and quite severely – then something has changed with the traffic mix, the customer base or your advertising. Maybe the PPC budget ran out? Seriously! • To analyse a flipped test, you’ll need to check your segmented data. This is why you have a split testing package AND an analytics system. The segmented data will help you to identify the source of the shift in response to your test. • I rarely get a flipped one and it’s always something changing on me, without being told. The heartless bastards.
  38. 38. #12 : Nobody feels the test • You promised a 25% rise in checkouts - you only see 2% • Traffic, Advertising, Marketing may have changed • Check they’re using the same precise metrics • Run a calibration exercise • I often leave a 5 or 10% stub running in a test • This tracks old creative once new one goes live • If conversion is also down for that one, BINGO! • Remember – the AB test is an estimate – it doesn’t precisely record future performance • This is why infrequent testing is bad • Always be trying a new test instead of basking in the glory of one you ran 6 months ago. You’re only as good as your next test.
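The 5 or 10% stub from slide 38 needs a stable way to keep a small share of visitors on the old creative. One common approach, sketched here as an assumption rather than the author's setup, is to hash a visitor identifier into 100 buckets; the function names and the `visitor-…` ids are hypothetical.

```python
import hashlib


def in_stub(visitor_id, stub_percent=10):
    """Deterministically place roughly stub_percent% of visitors in the holdback stub."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket from 0 to 99 for this visitor
    return bucket < stub_percent


def creative_for(visitor_id, stub_percent=10):
    """Stub visitors keep seeing the old creative; everyone else gets the new one."""
    return "old" if in_stub(visitor_id, stub_percent) else "new"


# The same visitor always lands in the same group, so the stub stays clean over time.
print(creative_for("visitor-42"), creative_for("visitor-42"))
```

If conversion on the stub has dropped as well since the test ended, the traffic has changed rather than the winning creative having stopped working.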
  39. 39. #13 : You forgot you were responsive • If you’re AB testing a responsive site, pay attention • Content will break differently on many screens • Know thy users and their devices • Use Bango or Google Analytics to define a test list • Make sure you test mobile devices & viewports • What looks good on your desk may not be for the user • Harder to design cross device tests • You’ll need to segment mobile, tablet & desktop response in the analytics or AB testing package • Your personal phone is not a device mix
  40. 40. Top Fuckups for 2013 1. Testing in the wrong place 2. Your hypothesis inputs are crap 3. No analytics integration 4. Your test will finish after you die 5. Not testing for long enough 6. No QA for your split test 7. Opportunities are not prioritised 8. Testing cycles are too slow 9. Your test fails 10. The result is ‘about the same’ 11. Test flips or moves around 12. Nobody ‘feels’ the test 13. You forgot you were responsive @OptimiseOrDie
  41. 41. BONUS : What is a good conversion rate? Higher than the one you had last month!
  42. 42. Is there a way to fix this then? Conversion Heroes!
  43. 43. Slides at Email : Twitter : @OptimiseOrDie :