Paper: The Unreasonable Effectiveness of Monte Carlo Simulations in A/B Testing

Marton Trencseni - Thu 05 December 2024 • Tagged with ab-testing

I run Monte Carlo simulations of content production over random Watts-Strogatz graphs to show various effects relevant to modeling and understanding Randomized Controlled Trials on social networks: the network effect, spillover effect, experiment dampening effect, intrinsic dampening effect, clustering effect, degree distribution effect and the experiment size effect. I will also define some simple metrics to measure their strength. When running experiments these potentially unexpected effects must be understood and controlled for in some manner, such as modeling the underlying graph structure to establish a baseline.

Continue reading

The Unreasonable Effectiveness of Monte Carlo Simulations in A/B Testing

Marton Trencseni - Sun 20 October 2024 • Tagged with ab-testing

The article illustrates how Monte Carlo simulations serve as a powerful method for exploring statistical concepts in A/B testing, enhancing understanding and improving experimental design and analysis.

Monte Carlo casino

Continue reading

Techtalk: Why are A/B tests the gold standard of causal inference?

Marton Trencseni - Sun 29 September 2024 • Tagged with ab-testing, techtalk, kohavi

Recently, I delivered a techtalk on A/B testing to an audience of non-technical Product Managers and experienced Data Scientists.

Continue reading

Paper review: A Comparison of Approaches to Advertising Measurement

Marton Trencseni - Wed 01 May 2024 • Tagged with ab-testing, facebook, stratification, propensity

Why are Randomized Controlled Trials (RCTs, known as A/B testing in much of the industry) widely regarded as the gold standard of causal inference? What else can a Data Scientist do if A/B testing is not possible, and why are those alternatives inferior to A/B testing?

This paper shows, using 15 experiments (for ads on Facebook) where an RCT was conducted, that common observational methods (run on the Facebook data, ignoring the control group) severely mis-estimate the true treatment lift (as measured by the RCT), often by a factor of 3x or more. This holds even though Facebook has (i) very large sample sizes and (ii) very high-quality data (a per-user feature vector) about its users, which is used in the observational methods. This should be a major red flag for Data Scientists working on common marketing measurements (such as marketing campaigns) using observational methods.

Continue reading

Paper: Monte Carlo Experiments of Network Effects in Randomized Controlled Trials

Marton Trencseni - Tue 09 April 2024 • Tagged with ab-testing

I run Monte Carlo simulations of content production over random Watts-Strogatz graphs to show various effects relevant to modeling and understanding Randomized Controlled Trials on social networks: the network effect, spillover effect, experiment dampening effect, intrinsic dampening effect, clustering effect, degree distribution effect and the experiment size effect. I will also define some simple metrics to measure their strength. When running experiments these potentially unexpected effects must be understood and controlled for in some manner, such as modeling the underlying graph structure to establish a baseline.

Continue reading

Real-world experiments: 5 Lessons from Google, Bing, Netflix and Alibaba

Marton Trencseni - Sun 18 June 2023 • Tagged with ab-testing

I discuss five lessons from large-scale experiments conducted by Google, Bing, Netflix and Alibaba: Kohavi's 1 out of 3 rule, Google's 41 shades of blue, Bing's unexpected big win, Alibaba's personalization experiment and Netflix's movie image personalization.

Netflix

Continue reading

Five ways to reduce variance in A/B testing

Marton Trencseni - Sun 19 September 2021 • Tagged with ab-testing, variance, stratification, cuped

I use toy Monte Carlo simulations to demonstrate 5 ways to reduce variance in A/B testing: increase sample size, move towards a more even split, reduce variance in the metric definition, stratification and CUPED.

Historic lift

Continue reading

Correlations, seasonality, lift and CUPED

Marton Trencseni - Sun 05 September 2021 • Tagged with ab-testing, cuped

In this final blog post about CUPED, I will address some questions about CUPED, such as, is correlation between "before" and "after" the same as seasonality?

Continue reading

A/A testing and false positives with CUPED

Marton Trencseni - Sun 15 August 2021 • Tagged with ab-testing, cuped

I use Monte Carlo simulations of A/A tests to demonstrate how Data Scientists can incorrectly skew lift and p-values if they pick and choose between reporting traditional and CUPED results after the experiment has concluded.

Historic lift

Continue reading

Reducing variance in conversion A/B testing with CUPED

Marton Trencseni - Sat 07 August 2021 • Tagged with ab-testing, cuped

I use Monte Carlo simulations of conversion A/B tests to demonstrate how CUPED reduces measurement variance in conversion experiments.

Historic lift

Continue reading

Reducing variance in A/B testing with CUPED

Marton Trencseni - Sat 31 July 2021 • Tagged with ab-testing, cuped

I use Monte Carlo simulations of A/B tests to demonstrate CUPED, a method to use historic "before" data to reduce the variance in the measurement of the treatment lift.
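As a rough illustration of the idea, here is a minimal sketch of a CUPED-style adjustment on toy data. It assumes the common formulation where each "after" value is shifted by theta times the centered "before" value, with theta = cov(before, after) / var(before). The function name `cuped_adjust`, the toy distributions and the lift of 2.0 are all hypothetical choices for this sketch, not taken from the post.

```python
import random
import statistics

def cuped_adjust(before, after):
    """Return adjusted 'after' values: y - theta * (x - mean(x))."""
    mean_before = statistics.fmean(before)
    mean_after = statistics.fmean(after)
    # Sample covariance between the "before" and "after" measurements.
    cov = sum((x - mean_before) * (y - mean_after)
              for x, y in zip(before, after)) / (len(before) - 1)
    theta = cov / statistics.variance(before)
    return [y - theta * (x - mean_before) for x, y in zip(before, after)]

rng = random.Random(0)
N = 2000  # users; the first half is control, the second half is treatment

# Toy data (assumption): "after" is strongly correlated with "before",
# and treatment users get an additive lift of 2.0.
before = [rng.gauss(100, 20) for _ in range(N)]
after = [x + rng.gauss(0, 5) + (2.0 if i >= N // 2 else 0.0)
         for i, x in enumerate(before)]

adjusted = cuped_adjust(before, after)

def lift(values):
    """Difference of means between the treatment half and the control half."""
    return statistics.fmean(values[N // 2:]) - statistics.fmean(values[:N // 2])

raw_var = statistics.variance(after)
adj_var = statistics.variance(adjusted)
print(f"raw lift = {lift(after):.2f}, variance = {raw_var:.1f}")
print(f"CUPED lift = {lift(adjusted):.2f}, variance = {adj_var:.1f}")
```

The adjustment leaves the overall mean unchanged, because the centered "before" values sum to zero, while removing the part of the variance that the "before" data explains. The more strongly "before" and "after" are correlated, the larger the reduction.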

Historic lift

Continue reading

Building intuition for p-values and statistical significance

Marton Trencseni - Sun 25 April 2021 • Tagged with ab-testing

This is the transcript of a talk I did on experimentation and A/B testing to give the audience an intuitive understanding of p-values and statistical significance.

Coin flip

Continue reading

Making statistics lie for the 2020 Presidential election

Marton Trencseni - Thu 17 December 2020 • Tagged with ab-testing, trump, politics

After the 2020 US presidential election, the Trump campaign filed over 50 lawsuits and attacked the integrity of the elections by claiming there was voter fraud. One of the last lawsuits was filed in the Supreme Court of the United States by the state of Texas. Here I look at the statistical claims made in this lawsuit that were supposed to show irregularities in the Georgia vote.

Trump vs Biden

Continue reading

Comparing conversion at control and treatment sites

Marton Trencseni - Thu 03 December 2020 • Tagged with ab-testing

In real-life, non-digital situations, it's often not feasible to run true A/B tests. In such cases, we can compare conversions before and after rollout at a treatment site, while using a similar control site to measure and correct for seasonality. The post discusses how to compute increasingly correct p-values and Bayesian probabilities in such scenarios.

Monte Carlo simulated control lifts

Continue reading

Multi-armed bandits and false positives

Marton Trencseni - Fri 21 August 2020 • Tagged with ab-testing

I use Monte Carlo simulations to explore the false positive rate of Multi-armed bandits.

Epsilon-greedy

Continue reading

A/B testing and Multi-armed bandits

Marton Trencseni - Fri 07 August 2020 • Tagged with ab-testing

Multi-armed bandits minimize regret when performing A/B tests, trading off between exploration and exploitation. Monte Carlo simulations show that less exploration yields less statistical significance.
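To make the exploration versus exploitation trade-off concrete, here is a minimal sketch of an epsilon-greedy bandit, the strategy named in this entry's image caption. The function name `epsilon_greedy`, the conversion rates and the user count are hypothetical values chosen for this sketch; the post's own simulation may be set up differently.

```python
import random

def epsilon_greedy(true_rates, epsilon, n_users, seed=0):
    """Simulate an epsilon-greedy bandit; return (shows, conversions) per arm."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    shows = [0] * n_arms
    convs = [0] * n_arms
    for _ in range(n_users):
        # Explore with probability epsilon, and also until every arm
        # has been shown at least once (avoids division by zero below).
        if rng.random() < epsilon or 0 in shows:
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best observed conversion rate.
            arm = max(range(n_arms), key=lambda i: convs[i] / shows[i])
        shows[arm] += 1
        if rng.random() < true_rates[arm]:
            convs[arm] += 1
    return shows, convs

# Hypothetical arms: A converts at 5%, B at 6%.
for eps in (1.0, 0.1):
    shows, convs = epsilon_greedy([0.05, 0.06], epsilon=eps, n_users=10000)
    print(f"epsilon={eps}: shows={shows}, conversions={convs}")
```

With `epsilon=1.0` every user is assigned at random, which is an ordinary A/B test. With a small epsilon most traffic goes to whichever arm currently looks best, so the other arm collects few samples and the comparison between the two carries less statistical weight.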

Epsilon-greedy

Continue reading

Understanding Facebook’s PlanOut A/B testing framework

Marton Trencseni - Fri 22 May 2020 • Tagged with ab-testing

PlanOut is a framework for online field experiments. It was created by Facebook in 2014 to make it easy to run and iterate on sophisticated experiments in a statistically sound manner.

Planout

Continue reading

Validation checks for A/B tests

Marton Trencseni - Thu 16 April 2020 • Tagged with ab-testing

A/B tests go wrong all the time, even in sophisticated product teams. As this article shows, for a range of problems we can run automated validation checks to catch problems early, before they do too much harm to customers or the business. These validation checks compare various statistical properties of funnels A and B to catch likely problems. Large technology companies run such validation checks automatically and continuously for their online experiments.

Kolmogorov-Smirnov test

Continue reading

Running multiple A/B tests in parallel

Marton Trencseni - Mon 06 April 2020 • Tagged with ab-testing

I show using Monte Carlo simulations that randomizing user assignments into A/B test experiments makes it possible to run multiple A/B tests at once and measure accurate lifts on the same metric, assuming the experiments are independent.
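The claim can be sketched with a small toy simulation. It assumes two experiments with additive, independent effects on a normally distributed metric; the constants `LIFT_1` and `LIFT_2` and the helper `measured_lift` are hypothetical names and values for this sketch only.

```python
import random
import statistics

rng = random.Random(0)
N = 20000
LIFT_1, LIFT_2 = 1.0, 3.0  # hypothetical true lifts of the two experiments

rows = []
for _ in range(N):
    # Each user is randomized into each experiment independently.
    in_t1 = rng.random() < 0.5
    in_t2 = rng.random() < 0.5
    # Assumption: the effects are additive and do not interact.
    metric = rng.gauss(10, 2) + LIFT_1 * in_t1 + LIFT_2 * in_t2
    rows.append((in_t1, in_t2, metric))

def measured_lift(rows, idx):
    """Treatment mean minus control mean for the experiment at position idx."""
    treatment = [r[2] for r in rows if r[idx]]
    control = [r[2] for r in rows if not r[idx]]
    return statistics.fmean(treatment) - statistics.fmean(control)

lift_1 = measured_lift(rows, 0)
lift_2 = measured_lift(rows, 1)
print(f"experiment 1: measured {lift_1:.2f} vs true {LIFT_1}")
print(f"experiment 2: measured {lift_2:.2f} vs true {LIFT_2}")
```

Because assignment to one experiment is independent of assignment to the other, the second experiment's treatment users are spread evenly across the first experiment's control and treatment groups. Its effect adds noise to both groups equally and cancels out of the difference of means.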

Watts-Strogatz

Continue reading

Bayesian A/B conversion tests

Marton Trencseni - Tue 31 March 2020 • Tagged with bayesian, ab-testing

I compare probabilities from Bayesian A/B testing with Beta distributions to frequentist A/B tests using Monte Carlo simulations. In many circumstances, the Bayesian probability of the action hypothesis being true and the frequentist p-value are complementary.
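A minimal sketch of this comparison, under assumptions that are mine rather than the post's: a uniform Beta(1, 1) prior, a one-sided pooled two-proportion z-test, and made-up conversion counts. The function names `z_test_p_value` and `bayes_prob_b_better` are hypothetical.

```python
import math
import random

def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """One-sided p-value for 'B converts better than A' (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # 1 - Phi(z), using the error function for the normal CDF.
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

def bayes_prob_b_better(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior of each conversion rate is Beta(1 + successes, 1 + failures).
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if b > a:
            wins += 1
    return wins / draws

# Made-up outcome: A converts 100 of 1000 users, B converts 120 of 1000.
p_value = z_test_p_value(100, 1000, 120, 1000)
prob = bayes_prob_b_better(100, 1000, 120, 1000)
print(f"p-value = {p_value:.3f}, Bayesian P(B > A) = {prob:.3f}, "
      f"sum = {p_value + prob:.3f}")
```

With these sample sizes the two numbers should add up to roughly 1, which is one way to read "complementary". The agreement is approximate and depends on the prior and on the normal approximation behind the z-test holding reasonably well.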

Bayes vs z-test

Continue reading