The Startling Truth: Expert Ad Scoring Software Increases Search Engine Advertising Profitability

Google AdWords and Yahoo Sponsored Search Offer
Two Ways to Display Ads: Optimize or Rotate

Optimization

When Google or Yahoo optimizes the display of two or more PPC ads, the search
engine determines which of the ads has the highest click-through rate (CTR).
Over time, it shows the ad with the higher CTR more often than the other ads
in the ad group. Eventually, one of the ads will be shown 95% of the time due
to Google’s and Yahoo’s CTR optimization.

Ad Rotate

Google and Yahoo provide the option to specify that ads be shown in rotation.
For example, if an ad group has three ads, they will show ad 1, then ad 2,
then ad 3, and then begin the rotation again with ad 1. This option
distributes impressions roughly evenly across all 3 ads.

If your goal as an advertiser is to determine which ad
generates the greatest number of conversions, the highest
ROI and/or the most profit -- Ad Optimization is the wrong
choice.

CTR optimization does not optimize based on conversions
or profitability.

According to Forrester Research:

“More Than 50% of Search-Based Ads are Not Effective.”

Why?
Logic361 (C) Copyright April 2010 All Rights Reserved

Forrester Research

Forrester Research addressed ad testing issues surrounding the best and worst
of paid search in 2009, in a report of the same name. In the report, Forrester
judged the success or failure of paid search campaigns based on a system of
judging they called the “Search Marketing Review” (hereafter “the Review”).

The Review was Forrester’s objective means of reviewing 300 of the most
relevant search terms spread out among major industry verticals to diagnose
various “search program strengths, defects, and ways to improve
effectiveness.”* The report identified ways that paid search campaigns were
falling short throughout specific industry verticals.

According to the Review (which included five common search terms across six
different industries and then a qualitative evaluation of the first 10 Google
AdWords ads that appeared), more than half of search-based ads failed to be
effective.

More than 50% of Search-Based Ads Are Not Effective: Why?

Forrester stated that the ads failed in three main categories:

1. Keywords: they were inefficiently used or not used at all in the ad itself

2. Conversions: the ads failed to screen out irrelevant clickers or move them
to take action

3. Landing pages: visitors were taken to pages not relevant to their search,
or pages that offered “not enough content or too much detail*”

Forrester’s report can be purchased at:
http://www.forrester.com

Assessing A/B Ad Testing: Dollars & Sense

A study of Logic361’s 73 clients across 11 industries found that the
percentages of non-value-generating ads were consistent with Forrester
Research’s findings. The following is a sample of the data we reviewed (green
indicates a positive change from the previous time period and red a negative
change).

Over the last 3 years, Logic361’s software has analyzed over $100 million in
search-based advertising (19 billion impressions, 119 million clicks and 5.2
million conversions). Our findings include:

• 85% of our clients had 2 or more search ads in the majority of their ad
  groups.

• Search-based ads were displayed for an average of 77 days regardless of the
  number of impressions, clicks or conversions.

• 34% of clients’ ad groups had ads that had been running for 110 days or
  more.
www.logic361.com

If The Majority of Advertisers are Doing A/B
testing – Why are More Than Half of Search-
Based Ads Ineffective?

The answer to this important question can be found by answering the following
two questions:

1. What is the best methodology for determining the minimum number of
impressions, clicks and conversions necessary to confidently evaluate the
progress of an A/B ad test?

2. What are the total costs of A/B ad testing and why is it important to
conclude tests as quickly as possible?

Over the past 9 years, technology solutions have emerged to assist search
marketers with keyword research, automated bidding and account management.
Yet, ad testing continues to be measured and managed using inconsistent and
contradictory industry “rules of thumb.”

The answer to dramatically increasing the effectiveness of search-based
advertising is AdScoring™.

AdScoring™ Analysis Software: Dramatically increases the effectiveness of
search-based A/B ad testing and increases profitability.

In plain terms, Logic361 thinks of an “advertising impression” as a coin toss
with a low probability of coming up heads (being clicked). When we compare
search-based ads, we are comparing coins and asking: which one is most likely
to yield the most heads in the long term?

The simple approach is to pick the coin that has the highest proportion of
heads. We would like to be able to say not just that one coin is better, but
have some level of confidence in our judgment.

For example, in the following 30-day A/B ad test (actual results), which ad is
the most effective, and why?

                 Impressions  Clicks  Click Rate  Conversions  Conversion Rate  Cost Per Click  Spend   Cost Per Conversion
Challenger (B)   131,995      1,036   0.78%       11           1.06%            $0.77           $795    $72.24
Champion (A)     110,748      823     0.74%       21           2.55%            $0.80           $657    $31.28
Overall Results  242,743      1,859               32                                            $1,452  $45.38

The answer is the Champion (A) ad based on the number of conversions and
cost per conversion.
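
As a rough illustration, the rate and cost columns in the table can be re-derived from the raw counts. The sketch below assumes only the impressions, clicks, conversions and spend figures shown above; the `AdResult` class and its property names are hypothetical helpers introduced for this example, not part of Logic361's software.

```python
from dataclasses import dataclass


@dataclass
class AdResult:
    """Raw 30-day counts for one ad, copied from the table above."""
    name: str
    impressions: int
    clicks: int
    conversions: int
    spend: float  # total spend in dollars

    @property
    def click_rate(self) -> float:
        return self.clicks / self.impressions

    @property
    def conversion_rate(self) -> float:
        return self.conversions / self.clicks

    @property
    def cost_per_click(self) -> float:
        return self.spend / self.clicks

    @property
    def cost_per_conversion(self) -> float:
        return self.spend / self.conversions


challenger = AdResult("Challenger (B)", 131_995, 1_036, 11, 795.0)
champion = AdResult("Champion (A)", 110_748, 823, 21, 657.0)

for ad in (challenger, champion):
    print(
        f"{ad.name}: CTR {ad.click_rate:.2%}, "
        f"conversion rate {ad.conversion_rate:.2%}, "
        f"CPC ${ad.cost_per_click:.2f}, "
        f"cost per conversion ${ad.cost_per_conversion:.2f}"
    )
```

Because the spend figures in the table are rounded to whole dollars, the recomputed cost per conversion can differ from the published column by a few cents.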

What if you knew after 7 days that the Champion Ad (A) had a higher
statistical probability of being more effective than the Challenger Ad (B)?
Would you conclude the test? The answer is “yes,” especially given the
significant differences in effectiveness.

After 7 days, Logic361’s ad scoring algorithm determined that there was a
72% probability that the Champion Ad (A) would out-perform the Challenger
Ad (B) and by the 30th day the probability had increased to 93%.

What was the opportunity cost of not concluding the test at the end of 7
days? What would have been the impact of correspondingly shifting the
Challenger Ad impressions to the Champion Ad?

The Effectiveness of “Shifting” Impressions from “B” to “A”

                             Impressions  Clicks  Click Rate  Conversions  Conversion Rate  Cost Per Click  Spend   Cost Per Conversion
Shifted From Challenger (B)  101,636      752     0.74%       19           2.55%            $0.80           $602    $31.37
Champion (A)                 110,748      823     0.74%       21           2.55%            $0.80           $657    $31.28
Overall Results              212,384      1,575               40                                            $1,259  $31.33
Previous Overall Results     242,743      1,859               32                                            $1,452  $45.38

Net Conversion Gain: 8       Net Cost Per Conversion Decrease: $(14.05)

Had the search marketer concluded the test after 7 days, the result would
have been a 25% increase in conversions (from 32 to 40) and a 31%
decrease in cost per conversion ($31.33 versus $45.38).
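
The two percentages quoted here follow directly from the before and after figures. A minimal sketch of that arithmetic, where `percent_change` is a hypothetical helper name used only for this example:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


conversions_before, conversions_after = 32, 40
cpa_before, cpa_after = 45.38, 31.33  # cost per conversion, in dollars

conversion_gain = percent_change(conversions_before, conversions_after)
cpa_drop = percent_change(cpa_before, cpa_after)

print(f"Conversions: {conversion_gain:+.0%}")
print(f"Cost per conversion: {cpa_drop:+.0%}")
```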

The Hidden Cost of Not Concluding A/B Ad
Tests Quickly
                                    With AdScoring   Without AdScoring
                                    (40 Orders)      (32 Orders)
Revenue (Average Order $247.00)     $ 9,880          $ 7,904

Cost of Goods Sold (70%)              6,916            5,533
Search Advertising Cost               1,259            1,452

Contribution Margin                 $ 1,705          $   919

Contribution Margin Increase (85%)  $   786

The financial impact of making a decision sooner would have been
an 85% increase in contribution margin.
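
The contribution margin figures can be reproduced from the order counts, the average order value, the 70% cost of goods sold and the advertising spend in the table. The following is a sketch of that calculation; the function and constant names are illustrative assumptions.

```python
AVERAGE_ORDER_VALUE = 247.00  # dollars per order, from the table
COGS_RATE = 0.70              # cost of goods sold as a share of revenue


def contribution_margin(orders: int, ad_spend: float) -> float:
    """Revenue minus cost of goods sold minus search advertising cost."""
    revenue = orders * AVERAGE_ORDER_VALUE
    cost_of_goods = revenue * COGS_RATE
    return revenue - cost_of_goods - ad_spend


with_scoring = contribution_margin(orders=40, ad_spend=1_259)
without_scoring = contribution_margin(orders=32, ad_spend=1_452)
increase = with_scoring - without_scoring

print(f"With: ${with_scoring:,.0f}  Without: ${without_scoring:,.0f}")
print(f"Increase: ${increase:,.0f} ({increase / without_scoring:.1%})")
```

The same function can be reused with other order counts or spend levels to see how sensitive the margin is to the timing of the decision.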

Customer Case Study

When Logic361’s AdScoring™ is applied to an entire search advertising
account, dramatic financial results can be achieved.

Using Logic361’s software, our professional services consultants
analyzed 4,000 ad groups for an international retailer with 210 online stores.
Our software identified the one or two best-performing ads per ad group
(based on revenue and client profit targets) and modeled the impact of
pausing 37% of the low or non-performing ad inventory and shifting over 3
million under-monetized monthly impressions.

The actual results achieved for this client were:

• 26% increase (536) in monthly conversions (from 2,063 to 2,599)

• 37% decrease in average cost per conversion (from $35.54 to $22.39),
  which equated to an advertising cost savings of $34,176 per month
  (based on the increased conversions)
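
The figures in this list are mutually consistent, as the short check below shows. It assumes, as the text states, that the monthly savings are valued at the new, higher conversion volume; the variable names are illustrative.

```python
conversions_before, conversions_after = 2_063, 2_599
cpa_before, cpa_after = 35.54, 22.39  # average cost per conversion, dollars

extra_conversions = conversions_after - conversions_before
conversion_lift = extra_conversions / conversions_before
cpa_reduction = (cpa_before - cpa_after) / cpa_before

# Savings per conversion multiplied by the increased monthly conversions.
monthly_savings = (cpa_before - cpa_after) * conversions_after

print(f"Extra conversions: {extra_conversions} ({conversion_lift:.0%})")
print(f"Cost per conversion reduction: {cpa_reduction:.0%}")
print(f"Monthly savings: ${monthly_savings:,.2f}")
```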

In the third week after the low and non-performing ad inventory was paused,
the client recorded its highest week of sales in the history of the
company.

Summary

Why Should You Use Ad Scoring to Improve A/B Ad Testing?
Writing and determining effective ad copy is arguably the most important
responsibility a search marketer has. The effectiveness of search ads is the
keystone of search based advertising ROI. Given its importance, it’s
surprising that statistical ad scoring is not a standard practice on par with
bid management.

Logic361 is the first company (to our knowledge) to develop a search-based
ad scoring solution capable of systematically analyzing hundreds of
thousands of simultaneous A/B ad tests. The results that have been achieved
demonstrate that assessing ad inventory, scoring A/B ad tests and
redistributing under-monetized impressions can generate dramatic bottom-line
results.

We developed our ad scoring solution with two goals in mind. First, we
wanted to be 100% confident in our scoring methodology. Second, we
wanted the scoring algorithm/methodology to be precise but not so complex
that it was a “black box” that could not be easily understood and explained.

The addendum provides the opportunity to review our methodology. We
welcome your questions and comments.

About Logic361

Logic361 AdScoring™ software speeds decision-making and empowers
search engine advertisers with the unprecedented capability to quickly
assess, prioritize and respond to changes in paid search advertising
performance. The Company’s data driven, scientific approach combines
automated analyses, results-orientated prioritization and advanced decision-
making methodologies -- within a single, powerful application.

For information, or to schedule an analysis of your ad inventory, contact:

Stephen Schramke
stephen@logic361.com
(206) 842-0747

Copyright © 2010 Logic361 Corporation. Logic361™, the Logic361 logo, and AdScoring™ are
trademarks of Logic361 Corporation that may be registered in some jurisdictions. All other company
and product names are the property of their respective owners.

All rights reserved worldwide. No part of this publication may be reproduced, transmitted, transcribed,
stored in a retrieval system, or translated into any human or computer language in any form or by any
means without the express written permission of:

Logic361 Corporation
93 S Jackson Street, Suite 22340
Seattle, WA 98104

This publication is provided as is without warranty of any kind, express or implied, including, but not
limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-
infringement.

This publication could include technical inaccuracies or typographical errors. Changes are periodically
added to the information herein. These changes will be incorporated in new editions of the publication.
Logic361 Corporation may make improvements and/or changes at any time to the product(s) and/or
the program(s) described in this publication.

All terms mentioned in this publication that are known to be trademarks or service marks have been
appropriately capitalized. Use of a term in this publication should not be regarded as affecting the
validity of any trademark or service mark.

Addendum
How Logic361 Developed a Solution for Assessing Search Based A/B
Ad Tests

Logic361’s software engineering team partnered with the Dataspora
company to develop a programmable statistical methodology for assessing
paid search ad performance.

The Dataspora team was led by Michael E. Driscoll, who has a decade of
experience developing large-scale databases and data mining algorithms
within industry, government, and academic institutions. Michael has a Ph.D.
in Bioinformatics from Boston University and an A.B. from Harvard
University.

Michael was assisted by John Mount, Ph.D. John is an expert in web-scale
algorithms and statistics. His interests include optimization and recent
positions include directing research at online retailer Shopping.com. John
has a Ph.D. in Computer Science from Carnegie Mellon University and an
A.B. in Mathematics from U.C. Berkeley.

Logic361’s Statistical Methodology
For Assessing Ad Performance

Comparing the Click Rate Performance of Two Ads

When we compare the click rate performance of two ads, we are comparing
two binomial random distributions and asking: which one is better? The
graphic below shows two curves: (i) a champion ad in blue with a 5% click
rate based on 10,000 impressions, and (ii) a challenger ad in green with a
7% click rate based on 100 impressions.

We know the challenger ad is performing better on a click rate basis, but
because we have only 100 data points, we can’t say for certain that this is
not a result of chance.

What we’d like to do is have a measure that could tell us the likelihood that
our challenger ad click rate would end up lower than the champion ad by
chance; if we took samples from the blue and green distributions, how often
would green be higher?

Unfortunately, calculating this for two binomial distributions is a
non-trivial task. Fortunately, calculating this for two normal distributions
is easier, and more relevantly, can be implemented programmatically in a
straightforward way.

We start by approximating the binomial distribution to a normal distribution
with mean and variance given by:
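
The standard forms of this approximation, restated here as an assumption about what this step uses, are as follows. The first line describes the click count and the second the click rate (the count divided by N), which is the form needed to compare ads with different numbers of impressions. The subscripts are labels chosen for this illustration.

```latex
\mu_{\text{count}} = N p, \qquad \sigma^2_{\text{count}} = N p (1 - p)

\mu_{\text{rate}} = p, \qquad \sigma^2_{\text{rate}} = \frac{p (1 - p)}{N}
```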

Where N is the sample size (or in this case, impressions), and p is the
success rate (click rate).

As seen below, when the number of impressions N is large this
approximation is excellent (blue is binomial, red is normal) but not so when
N is much smaller than 100.
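
To give a feel for how the quality of the approximation depends on N, the sketch below compares the exact binomial probabilities with the normal curve that has mean Np and variance Np(1-p). It is an illustration only: the function names are hypothetical, the 5% click rate is taken from the earlier example, and the "gap" metric (largest absolute difference, relative to the binomial peak) is one arbitrary way to summarize the fit.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact probability of k clicks in n impressions (computed in log space)."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )
    return math.exp(log_pmf)


def normal_pdf(x: float, mean: float, variance: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(
        2 * math.pi * variance
    )


def max_relative_gap(n: int, p: float) -> float:
    """Largest |binomial - normal| over all k, relative to the binomial peak."""
    mean, variance = n * p, n * p * (1 - p)
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    peak = max(pmf)
    worst = max(abs(pmf[k] - normal_pdf(k, mean, variance)) for k in range(n + 1))
    return worst / peak


for impressions in (20, 100, 10_000):
    print(impressions, round(max_relative_gap(impressions, 0.05), 3))
```

A smaller gap means the normal curve tracks the binomial more closely; the gap shrinks as the number of impressions grows.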

The difference between two independent, normally distributed variables X and Y
is itself normally distributed, with mean and variance given by:

We can express this in terms of two sets of binomial parameters, N and p, as
follows:
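
The expressions for this step are standard results for the difference of two independent normal variables; they are restated here as an assumption rather than quoted from the original. The subscripts 1 and 2 (one per ad) are labels chosen for this illustration, with p the click rate and N the impressions of each ad.

```latex
X - Y \sim \mathcal{N}\left(\mu_X - \mu_Y,\; \sigma_X^2 + \sigma_Y^2\right)

\mu_{X-Y} = p_1 - p_2, \qquad
\sigma_{X-Y}^2 = \frac{p_1 (1 - p_1)}{N_1} + \frac{p_2 (1 - p_2)}{N_2}
```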

We now have a probability distribution that describes, given the sample sizes
and click rates of two ads, the likelihood of seeing a given margin of
difference by chance.

For our basic question – how confident are we of the observed difference
between challenger and champion ad – we can calculate a p value by way of
the z statistic.

The z statistic is a normalized measurement of deviation from the mean: for
a given value in a distribution, it’s the value’s distance from the mean
divided by the standard deviation. Thus z scores have a mean of zero and a
standard deviation of 1: the standard normal distribution. This is relevant
because, programmatically, once we have z scores we can easily convert
them into p values: percentages that say “there’s a 95% chance the champion
or challenger ad is better.”

z scores are calculated as:

Where the mean variance is defined as:

We can convert z scores into p values using the cumulative distribution
function for the normal. Given a value, it returns the quantile: the
probability that a standard normal variable falls at or below that value.
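
The conversion from click counts to a confidence figure can be sketched in a few lines. This is an illustration under stated assumptions, not Logic361's production algorithm: it uses the unpooled variance of the difference in click rates, computes the normal CDF from the error function, and the function names are hypothetical. It does not guard against a zero variance (for example, two ads with no clicks at all).

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def confidence_challenger_better(
    champ_impressions: int, champ_clicks: int,
    chall_impressions: int, chall_clicks: int,
) -> float:
    """Probability, under the normal approximation, that the challenger's
    click rate exceeds the champion's."""
    p_champ = champ_clicks / champ_impressions
    p_chall = chall_clicks / chall_impressions
    # Variance of the difference between the two click rates (assumed unpooled).
    variance = (
        p_champ * (1 - p_champ) / champ_impressions
        + p_chall * (1 - p_chall) / chall_impressions
    )
    z = (p_chall - p_champ) / math.sqrt(variance)
    return normal_cdf(z)


# The example from the text: champion at 5% on 10,000 impressions,
# challenger at 7% on 100 impressions.
confidence = confidence_challenger_better(10_000, 500, 100, 7)
print(f"Confidence that the challenger is better: {confidence:.1%}")
```

Although the challenger's observed click rate is higher, the small sample keeps the confidence well short of 95%, which is the point the example in the text is making.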

When 2 Ads Have a Small Number of Clicks: Fisher’s Exact Test

When our sample sizes are small (as is more often the case with clicks
rather than impressions), we calculate our confidence metric (that a
challenger is outperforming a champion or vice versa) using Fisher’s Exact
Test. We use Fisher’s Exact Test when the sum of observations is less than
20 (for N > 20 the test becomes computationally infeasible to perform on a
standard server platform).

Fisher’s Exact Test relies on exhaustively calculating all possible
outcomes, and identifying those that match or exceed our observed
difference between challenger and champion impressions and clicks. This
fraction is the p-value: the probability that our difference could have
occurred by chance. One minus this fraction is thus our confidence.

First we define the five standard Fisher quantities (Fisher’s test is usually
applied to 2 x 2 tables) as:

We can then calculate our p value as follows:
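
One common way to carry out this calculation is the one-sided form of Fisher's exact test on a 2 x 2 table of clicked versus not-clicked counts. The sketch below assumes that form; the function name, argument names and the example counts are hypothetical, and the original definitions of the five quantities are not reproduced here.

```python
from math import comb


def fisher_one_sided(clicks_a: int, misses_a: int,
                     clicks_b: int, misses_b: int) -> float:
    """One-sided Fisher exact p-value for the 2 x 2 table

            clicked   not clicked
    ad A    clicks_a  misses_a
    ad B    clicks_b  misses_b

    Returns the probability, with all row and column totals held fixed,
    of ad A receiving at least `clicks_a` of the clicks by chance.
    """
    row_a = clicks_a + misses_a
    row_b = clicks_b + misses_b
    total_clicks = clicks_a + clicks_b
    total = row_a + row_b
    denominator = comb(total, total_clicks)

    p_value = 0.0
    # Enumerate every table at least as favourable to ad A as the observed one.
    for k in range(clicks_a, min(row_a, total_clicks) + 1):
        p_value += comb(row_a, k) * comb(row_b, total_clicks - k) / denominator
    return p_value


# Hypothetical small sample: 19 observations in total.
p = fisher_one_sided(clicks_a=6, misses_a=3, clicks_b=2, misses_b=8)
print(f"p-value: {p:.3f}, confidence: {1 - p:.3f}")
```

The loop enumerates only the tables that are at least as extreme as the observed one, which is inexpensive for the small samples the test is applied to here.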

In the figure below, we show the implicit distributions for the click rates on
two ads, with 1.88% CTR and 2.53% CTR, respectively. Based on these
underlying distributions, we have 35% confidence that the higher ad (in
red) will outperform the lower (in blue) going forward.

This diagram illustrates our approach visually: that we can estimate our
confidence value by looking at the amount of overlap of our distributions.
The more overlap, the less confident we are that the higher one is higher.
We can quantify this value exactly as:

Because the functions are probability density functions, we know that the
red area sums to 1, so the expression reduces to simply 1 - area(intersection).
In our example, about 65% of the red area is in the overlap, thus only 35% is
strictly greater.

In general, we can calculate the area of overlap between the unimodal
probability density functions of two random variables X and Y via integration,
as follows:
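
A numerical version of this overlap calculation can be sketched as the integral of the pointwise minimum of the two density curves. The example below assumes normal densities and a simple trapezoid rule; the function names and the standard deviations in the usage lines are hypothetical, since the text gives only the two click rates.

```python
import math


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def overlap_area(mean_x: float, std_x: float,
                 mean_y: float, std_y: float, steps: int = 20_000) -> float:
    """Trapezoid-rule integral of min(f_X, f_Y) over a range covering both curves."""
    low = min(mean_x - 8 * std_x, mean_y - 8 * std_y)
    high = max(mean_x + 8 * std_x, mean_y + 8 * std_y)
    width = (high - low) / steps
    area = 0.0
    previous = min(normal_pdf(low, mean_x, std_x), normal_pdf(low, mean_y, std_y))
    for i in range(1, steps + 1):
        x = low + i * width
        current = min(normal_pdf(x, mean_x, std_x), normal_pdf(x, mean_y, std_y))
        area += 0.5 * (previous + current) * width
        previous = current
    return area


# Click rates from the text; the 0.7 percentage point spreads are made up.
overlap = overlap_area(0.0188, 0.007, 0.0253, 0.007)
print(f"overlap: {overlap:.2f}, confidence: {1 - overlap:.2f}")
```

Identical curves give an overlap close to 1 (no confidence that either is higher), while well-separated curves give an overlap close to 0.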

The following are additional examples:

Calculating the Confidence Metric for the Cost Distributions

We have observable data for impressions, clicks, and acquisitions. Average
cost per click is a constant (for a given ad). Given our observable data, we
first (i) infer the implicit distributions from which these values are drawn
(sometimes called the posterior distribution), and (ii) compare two
distributions and state a level of confidence that, for future observations,
one will remain higher than another.

We then extend our approach of analyzing probability functions to estimating
differences between other metrics, such as cost per acquisition (cpa). The
cpa is a function that depends on a random variable for acquisitions, a.
Given a pair of cpa measures for two ads, call them cpaX and cpaY, we can
generate distribution functions which are conditioned on a, the number of
acquisitions.

Where cpa(a) is a function defined (from our table) as:

And because cpc and c are constants, we only integrate over all the possible
values of a. We perform the same analysis for spend to value, plugging in
the values from our table, and integrating over all possible values of c while
holding other variables constant.

Variable              Symbol  Derivation  Example   Posterior Probability Distribution
Impressions           i                   100,000
Clicks                c                   1,100     binom(i, ctr)
Acquisitions          a                   22        binom(c, ar)
Click-thru rate       ctr                 1.1%      beta(i-c, c)
Acquisition rate      ar                  2.0%      beta(c-a, a)
Cost per click        cpc                 $2.00
Cost per acquisition  cpa                 $100
Total spend                               $2200
Total value                               $3300
Spend to value                            66.0%
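
The comparison of two cost-per-acquisition distributions can also be approximated by simulation instead of integration. The sketch below swaps in Monte Carlo sampling for the integral described above and makes two assumptions that are not from the original: the acquisition rate is given a Beta(a + 1, c - a + 1) posterior (a uniform prior updated with a acquisitions out of c clicks), and cost per click is treated as a constant. All function names are hypothetical, and the second ad in the usage lines is made up.

```python
import random


def sample_cpa(clicks: int, acquisitions: int, cost_per_click: float,
               rng: random.Random) -> float:
    """Draw one plausible cost per acquisition for an ad.

    Assumption: the acquisition rate has a Beta(a + 1, c - a + 1) posterior.
    """
    rate = rng.betavariate(acquisitions + 1, clicks - acquisitions + 1)
    return cost_per_click / rate


def prob_lower_cpa(ad_x, ad_y, draws: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(cpa_X < cpa_Y).

    Each ad is a tuple of (clicks, acquisitions, cost_per_click).
    """
    rng = random.Random(seed)
    wins = sum(
        sample_cpa(*ad_x, rng) < sample_cpa(*ad_y, rng) for _ in range(draws)
    )
    return wins / draws


# Ad X uses the example row from the table; ad Y is a hypothetical competitor.
ad_x = (1_100, 22, 2.00)
ad_y = (1_100, 11, 2.00)
print(f"P(ad X has the lower cost per acquisition): {prob_lower_cpa(ad_x, ad_y):.2f}")
```

Fixing the seed keeps the estimate reproducible; increasing `draws` tightens it at the cost of run time.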