Uncovering Bias in Ad Feedback Data Analyses & Applications ¶

Context¶

Provide a rewarding investment to advertisers
Minimize the negative impact on users

Annoying ads have a real cost to users beyond mere annoyance: fewer and shorter visits, fewer referrals, long-term user disengagement...

It has been shown that it is better to show no ads at all than to show irrelevant ones.

=> Explicit feedback from users can help capture all these effects. Once integrated directly into the ad ranking score, it allows ads to be ranked in terms of both short-term and long-term expected revenue.

Bias can come from

the fact that ads are targeted
the type of users (interacting a lot or not with content)

1. Analysis¶

2. Bias: explanation and correction¶

3. Ad Ranking¶

1. Analysis¶

Analysis¶

⇨ Investigate if the association between ads and ad feedback is affected by

ads being targeted to users with particular demographics, interests, behaviours
user behaviour (e.g. clicks, interaction with content)

Some users may dislike ads but not indicate this through a feedback option, whereas others may always give feedback, however minor the complaint.

Data¶

40 million distinct users and 200 000 distinct ads

Metric¶

$$ \text{Hide Rate} = \frac{\text{hides}}{\text{impressions}} $$

Feedback is generally a signal of bad quality: it is an explicit negative signal
CTR: the absence of a click does not necessarily indicate a low-quality ad, and a high CTR may not mean high quality
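As a minimal sketch of the hide rate metric above, the snippet below computes hides divided by impressions per ad. The function name `hide_rate`, the zero-impression fallback and the counts in `ad_logs` are illustrative assumptions, not values from the study.

```python
def hide_rate(hides: int, impressions: int) -> float:
    """Return hides / impressions; 0.0 when the ad was never shown (assumed convention)."""
    if hides < 0 or impressions < 0:
        raise ValueError("counts must be non-negative")
    if impressions == 0:
        return 0.0
    return hides / impressions


# Hypothetical (hides, impressions) counts per ad, for illustration only.
ad_logs = {"ad_1": (30, 10_000), "ad_2": (5, 10_000)}
rates = {ad: hide_rate(h, n) for ad, (h, n) in ad_logs.items()}
```

Unlike CTR, a high value here is an explicit negative signal, so ads can be compared directly on this ratio once impression counts are large enough.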

Features - to characterize users¶

User demographics: age, gender, interests, location
User behaviour: ad impressions, ad clicks, article clicks

Features - to characterize ads¶

Text-based: spam, readability, adult
Image-based: contains text, contains flesh
Advertiser: pagerank score of the ad landing page

Formula¶

Study the difference of ad hide behaviour for each user variable in turn

$$ HR_{var}(u) = \frac{HR(u) - mean(HR(U))}{mean(HR(U))} $$
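The relative deviation above can be sketched as follows. This assumes that $u$ ranges over the levels of one user variable and that the mean is an unweighted mean over those levels; the function name and the hide rates in `by_gender` are hypothetical.

```python
from statistics import mean
from typing import Dict


def relative_hide_rate(hr_by_level: Dict[str, float]) -> Dict[str, float]:
    """Relative deviation of each level's hide rate from the mean over all levels."""
    baseline = mean(list(hr_by_level.values()))
    return {level: (hr - baseline) / baseline for level, hr in hr_by_level.items()}


# Hypothetical hide rates per level of one user variable, for illustration only.
by_gender = {"male": 0.004, "female": 0.002}
deviation = relative_hide_rate(by_gender)
```

A positive value means that the level hides ads more often than the average level, a negative value less often.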

Results¶

User demographics
- users from different states have different behaviours
- demographic distributions differ per state
- female users are less likely to hide than male users
User interests
- "Retail" and "Technology" most likely to hide
- "Business/B2B" and "Telecommunication" less likely to hide
- feedback variations across these variables make them good candidates to identify bias due to targeting
Ad quality
- more likely to be spam: more likely to be hidden
- easier to read: more likely to be hidden
- most and least "adultness": less likely to be hidden

Conclusion¶

Some types of users that provide feedback may be more sensitive to ads and have a higher tendency to provide feedback
Since ads are targeted, the ad feedback may be from a group unrepresentative of the general population

2. Modelling and Correcting Bias¶

Modelling Bias¶

An ad quality model based on such biased data will consistently over- or underestimate the quality of ads.

⇨ develop a model able to determine the proportion of bias present in the feedback on ads.

Descriptive model
Only include variables able to explain the source of selection bias

Modelling Bias¶

Simple logistic regression based ad-user model

one ad feature a
one user selection feature u
associated weights

$$ f(\hat{p}) = w_0 + w_a \cdot a + w_u \cdot u + \epsilon $$

Ad-only model, with one ad feature a

$$ f(\hat{p}) = \hat{w}_0 + \hat{w}_a \cdot a + \epsilon $$

If both models are fit to the feedback data with the selection bias, then the coefficient of the ad-only model is biased:

$$ \hat{w}_a = w_a + \rho w_u $$

$\rho$ is the correlation between a and u
The bias in the ad-only model is the true user weight $w_u$ scaled by the correlation between the user feature and the ad feature
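The relation $\hat{w}_a = w_a + \rho w_u$ can be checked with a small simulation. The sketch below is an illustration under explicit assumptions: it uses a linear model (identity link) with standardized features, for which the relation holds in expectation, rather than the logistic model of the slides. All weights, the correlation and the sample size are made-up values.

```python
import math
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


random.seed(0)
n = 50_000
w0, w_a, w_u, rho = 0.1, 0.5, 0.8, 0.5  # hypothetical true weights and correlation

# User feature u and ad feature a, both with unit variance and correlation rho.
u = [random.gauss(0, 1) for _ in range(n)]
a = [rho * ui + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for ui in u]
y = [w0 + w_a * ai + w_u * ui + random.gauss(0, 0.1) for ai, ui in zip(a, u)]

w_a_hat = ols_slope(a, y)   # ad-only fit, which omits u
expected = w_a + rho * w_u  # value predicted by the bias formula
```

With these settings the ad-only slope lands close to `expected` instead of the true `w_a`, which is the over-estimation the bias formula describes when $\rho$ and $w_u$ have the same sign.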

⇨ Goal is to identify the user selection bias term $w_u \cdot u$

Modelling Bias¶

Deviance statistics for the models of interest

Model name
systemic structure
deviance statistic
difference between null model and current model
number of parameters used in the model
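To make the deviance columns concrete, here is a minimal sketch using the standard deviance for binary outcomes (minus twice the log-likelihood). Whether the table uses exactly this definition is an assumption; the labels `y` and the fitted probabilities `p_model` are hypothetical.

```python
import math


def binomial_deviance(y, p):
    """Deviance for binary labels y and predicted probabilities p, each strictly in (0, 1)."""
    return -2.0 * sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p)
    )


# Hypothetical hide labels (1 = ad hidden) and fitted probabilities.
y = [1, 0, 0, 1, 0, 0, 0, 0]
p_null = [sum(y) / len(y)] * len(y)  # null model: intercept only
p_model = [0.6, 0.1, 0.2, 0.7, 0.1, 0.2, 0.1, 0.2]

null_deviance = binomial_deviance(y, p_null)
model_deviance = binomial_deviance(y, p_model)
reduction = null_deviance - model_deviance  # difference between null and current model
```

A larger reduction for a given number of added parameters indicates that the added variables explain more of the feedback.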

table 3

Different age levels are more significant than gender or state
Interest variables are popular targeting criteria

⇨ Suggests that there is a selection bias due to targeting present in the feedback data

⇨ + selection bias due to user ad sensitivity (click behaviour variables explain additional feedback)

Modelling Bias¶

net effect $\beta$: how each individual variable level affects the selection bias present in the feedback
p : probability of hiding ads

$$ \frac{p}{1-p} = e^\beta $$

$$ p = \frac{e^\beta}{1 + e^\beta} $$
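As a quick worked example of the two formulas above, this sketch converts a net effect $\beta$ into a hide probability. The function name and the sample values of `beta` are illustrative assumptions.

```python
import math


def hide_probability(beta: float) -> float:
    """p = e^beta / (1 + e^beta), the inverse of the log-odds."""
    return math.exp(beta) / (1.0 + math.exp(beta))


# Hypothetical net effects, for illustration only.
for beta in (-1.0, 0.0, 1.0):
    p = hide_probability(beta)
    odds = p / (1.0 - p)  # recovers e^beta
    print(f"beta={beta:+.1f}  p={p:.3f}  odds={odds:.3f}")
```

A net effect of zero corresponds to even odds (p = 0.5); positive effects raise the hide probability and negative effects lower it.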

table 4

variables included in the model
levels of each variable
effects of each level ($\beta$)

Correcting Bias¶

Formula¶

Formula that explicitly models the user selection bias in addition to the ad features:

$$ f(\hat{p}) = w_0 + w_a \cdot a + I(w_u \cdot u) + \epsilon $$

I binarizes the user selection bias term using a threshold (0.5)
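One possible reading of the corrected formula, sketched under stated assumptions: $f$ is taken to be the logit link, and $I$ returns 1 when the user term strictly exceeds the threshold and 0 otherwise. Neither choice is confirmed by the slides, and the function name and weights below are hypothetical.

```python
import math


def corrected_hide_probability(a, u, w0, w_a, w_u, threshold=0.5):
    """Hide probability with a binarized user selection bias term.

    Assumes a logit link and an indicator that fires when w_u * u > threshold.
    """
    user_term = w_u * u
    indicator = 1.0 if user_term > threshold else 0.0
    linear = w0 + w_a * a + indicator
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical weights and feature values, for illustration only.
p_sensitive = corrected_hide_probability(a=0.0, u=1.0, w0=0.0, w_a=0.0, w_u=1.0)
p_other = corrected_hide_probability(a=0.0, u=0.2, w0=0.0, w_a=0.0, w_u=1.0)
```

Under this reading, the user term shifts the prediction only for users flagged as contributing selection bias, so the ad weight `w_a` no longer has to absorb that effect.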

Conclusions¶

⇨ ads with low pagerank, low readability, low adult and low spam levels are considered as low quality

⇨ single features do not characterize the quality of an ad

⇨ features such as adultness and pagerank: for high levels, likely to receive feedback from general population, but less likely from a specific segment of users.

3. Ad Ranking¶

Ad Auctions¶

ads are ranked according to a function of their bids
generally ranked by expected cost per impression eCPI: based on probability that ad is clicked given a user impression
some companies have started to incorporate a quality score

$$ eCPI = bid_a \cdot P(C_a = 1 | U = u) $$

$$ eCPI_q = bid_a \cdot P(C_a = 1 | U = u) \cdot P(Q_a = 1 | U = u) $$

$ P(C_a = 1 | U = u) $: probability that the ad is clicked given a user impression
$ P(Q_a = 1 | U = u) $: probability that the ad will provide a good quality experience, $= 1 - p$ (hide probability)
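The two ranking scores can be compared with a short sketch. The `Ad` class, the bids and the probabilities are hypothetical values chosen so that the quality term changes the ordering; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float
    p_click: float  # P(C_a = 1 | U = u)
    p_hide: float   # hide probability p, so P(Q_a = 1 | U = u) = 1 - p


def ecpi(ad: Ad) -> float:
    """Expected cost per impression: bid times click probability."""
    return ad.bid * ad.p_click


def ecpi_q(ad: Ad) -> float:
    """Quality-adjusted score: eCPI times the probability of a good experience."""
    return ad.bid * ad.p_click * (1.0 - ad.p_hide)


# Hypothetical ads: A bids more but is hidden far more often than B.
ads = [Ad("A", bid=2.0, p_click=0.05, p_hide=0.50),
       Ad("B", bid=1.5, p_click=0.05, p_hide=0.01)]

by_ecpi = [ad.name for ad in sorted(ads, key=ecpi, reverse=True)]
by_ecpi_q = [ad.name for ad in sorted(ads, key=ecpi_q, reverse=True)]
```

In this example A wins on eCPI alone, while B ranks first once the hide probability is taken into account.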

Using ad feedback in ad ranking: model comparison¶

(i) oracle: empirical, use available logs
(ii) biased: use estimates from the biased model
(iii) unbiased: use estimates from corrected model

fig 5

Ad feedback filtering on revenue¶

Fig 6
