# Resolved: CTR test comparison

Super learner
Posted on: 07 Mar 2024

Why do you directly compare the before and after without computing the mean by day?
This is the actual code:

```python
from scipy.stats import ttest_ind

res = ttest_ind(before.query('groupid == 0')['ctr'].to_numpy(),
                before.query('groupid == 1')['ctr'].to_numpy()).pvalue
print(res)
```

This is how I thought it would be:

```python
res = ttest_ind(before.query('groupid == 0').groupby('dt')['ctr'].mean(),
                before.query('groupid == 1').groupby('dt')['ctr'].mean()).pvalue
print(res)
```

The question comes up because I counted the number of users in each group and the counts differ: `len(before.query('groupid == 0')['ctr'].to_numpy())` returns 474947, while `len(before.query('groupid == 1')['ctr'].to_numpy())` returns 475928.

Are those really comparable, or do we need the same number of observations in each group?

2 answers (1 marked as helpful)

Instructor
Posted on: 08 Mar 2024

Hi Daniel!

Thanks for reaching out!

Comparing the individual observations directly, without computing daily means, is a common approach in A/B testing when the focus is on the overall effect of the change across the entire test period rather than on daily fluctuations. Every user-level observation counts as one sample point, which aggregates performance across the whole period and gives a broader view of the impact.

As for the different numbers of observations, this is common in A/B testing due to random assignment and doesn't inherently invalidate the results. Statistical tests like the t-test can accommodate groups of unequal sizes. However, extremely disproportionate group sizes can indeed affect the test's power.
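To illustrate that unequal group sizes are not a problem in themselves, here is a minimal sketch on synthetic click data (not the course dataset; the group sizes are just copied from the counts above). Passing `equal_var=False` to `ttest_ind` gives Welch's t-test, which is also robust to unequal variances:

```python
# Sketch with synthetic data: a t-test tolerates unequal group sizes.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
# Hypothetical per-user click indicators, both groups with the same true CTR
group_0 = rng.binomial(1, 0.10, size=474_947)
group_1 = rng.binomial(1, 0.10, size=475_928)

# equal_var=False -> Welch's t-test; valid despite the ~1,000-observation gap
res = ttest_ind(group_0, group_1, equal_var=False)
print(res.pvalue)
```

Since both synthetic groups share the same true CTR, the p-value here should be non-significant most of the time; the size difference alone does not bias the test.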

Additionally, if your test's integrity relies on daily patterns, or if there is a concern that underlying trends could affect the outcome, then averaging by day before testing can be more appropriate, and your approach would be the right one.
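The difference between the two aggregation levels can be seen on a toy DataFrame (hypothetical numbers, not the course data): per-observation testing uses every row as a sample point, while per-day testing collapses each day to its mean first, leaving far fewer sample points but absorbing day-to-day fluctuations.

```python
# Toy contrast of per-observation vs. daily-mean t-test inputs.
import pandas as pd
from scipy.stats import ttest_ind

df = pd.DataFrame({
    'dt':      ['d1', 'd1', 'd2', 'd2', 'd1', 'd1', 'd2', 'd2'],
    'groupid': [0, 0, 0, 0, 1, 1, 1, 1],
    'ctr':     [0.10, 0.12, 0.11, 0.13, 0.14, 0.16, 0.15, 0.17],
})

# Per-observation: every user-level CTR is one sample point (4 per group).
per_obs = ttest_ind(df.query('groupid == 0')['ctr'],
                    df.query('groupid == 1')['ctr'])

# Per-day: each day's mean CTR is one sample point (2 per group).
per_day = ttest_ind(df.query('groupid == 0').groupby('dt')['ctr'].mean(),
                    df.query('groupid == 1').groupby('dt')['ctr'].mean())

print(per_obs.pvalue, per_day.pvalue)
```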

Hope this helps.

Best,

Ivan

Super learner
Posted on: 08 Mar 2024

It sure helps, thanks a lot for the explanation, Ivan!