[2312.01607] Monte Carlo Experiments of Network Effects in Randomized Controlled Trials

Marton Trencseni - Tue 09 April 2024 - monte-carlo-experiments-of-network-effects-in-randomized-controlled-trials

Márton Trencséni (mtrencseni@gmail.com)
Abstract

I run Monte Carlo simulations of content production over random Watts-Strogatz graphs to show various effects relevant to modeling and understanding Randomized Controlled Trials on social networks: the network effect, spillover effect, experiment dampening effect, intrinsic dampening effect, clustering effect, degree distribution effect and the experiment size effect. I will also define some simple metrics to measure their strength. When running experiments these potentially unexpected effects must be understood and controlled for in some manner, such as modeling the underlying graph structure to establish a baseline.

1 Introduction

In academic as well as industry domains, Randomized Controlled Trials (RCTs) are the gold standard tool for causal inference. Examples are clinical trials of new drugs and treatments, or experiments to improve online services and other online products. One of the core assumptions of traditional RCTs is the Independence Assumption (IA): the units of experimentation are independent of each other. In plain words, the assumption is that units don’t communicate with each other and do not affect other units’ outcomes based on their own experiences, whether they are in treatment or control. However, for many experiments run on online services, in particular social networks such as Facebook or Twitter/X, the IA does not hold. The simplest toy example, considered in this paper, is a treatment applied to a group of users that is intended to increase their content production. Increased content production by treatment users may boost content production of control users, since they will see their treatment friends’ additional posts in their feed, and thus be motivated or inclined to post more themselves. In this paper I consider this toy model, and examine various, potentially unexpected effects that affect (contaminate) an experiment run under such circumstances on a network where the IA does not hold. To aid the discussion, I will also define some simple metrics to measure the strength of these effects.

2 Effects considered in this paper

To discuss and understand these network effects, the paper uses Monte Carlo simulations of RCTs of content production over random Watts-Strogatz graphs. The effects considered and discussed in this paper are:

  1. Network effect: each node’s neighbours’ content production boosts the node’s own content production.
  2. Degree distribution effect: the network effect boost is a function of the graph’s degree distribution.
  3. Spillover effect: in an experiment that boosts intrinsic content production in a subset of the nodes (the treatment group), this boost will spill over to the rest of the nodes.
  4. Experiment dampening effect: due to the spillover effect, we underestimate the lift in the treatment group as compared to control.
  5. Intrinsic dampening effect: the full lift of the treatment group is not realized in the network due to content production in the rest of the network being lower.
  6. Clustering effect: a more tightly clustered treatment group leads to higher lift compared to control.
  7. Experiment size effect: as we add more nodes to the treatment group, absolute content production increases in the treatment group (and the entire network) due to the network effects, but not relative to the control group.

Figure 1: Watts-Strogatz graph with $(n=100, k=6, p=0.1)$, with $N=20$ random treatment nodes colored blue.

3 Models of content production

The network is modeled as a graph $G=(V,E)$, where $V$ are nodes and $E$ are edges. For simplicity we assume edges are bi-directional, in other words the modeled relationship between nodes (such as friendship) is symmetric. Each node $i$ has an associated random variable $R_i$, which describes the node’s propensity for content production. Furthermore, we assume the content production random variables $R_i$ follow the same distribution $R$ and are parameterized by a variable $\lambda_i$, such that $R_i = R(\lambda_i)$. Without loss of generality, we will also assume that the random variables have an expectation value, and that $E[R_i] = E[R(\lambda_i)] = \mu_i$. Specifically, we will parameterize the distributions such that $E[R_i] = E[R(\lambda_i)] = \lambda_i$. Note that in some of the Monte Carlo simulations, in cases when we are investigating the mean behaviour, we will replace the random variable with its mean (after verifying this is justified).

We construct a dynamic model of $T$ discrete time steps: let $c_i^t$ be the amount of content produced by node $i$ at time step $t$. At $t=0$ all nodes have 0 content: $c_i^{t=0} = 0$. In subsequent time steps, we use the random variables $R(\lambda_i)$ for each node $i$ to model content production, in other words $c_i^t \leftarrow R(\lambda_i^t)$, where $\leftarrow$ denotes drawing a value from the random variable. Note that we will omit indexes and superscripts at times to aid readability. We model the $\lambda_i$ as having two components: $\lambda_i = \lambda_{int} + \nu_i$. The first parameter $\lambda_{int}$ models the node’s intrinsic propensity to produce content, independent of the rest of the graph, and is the same for all nodes on a network. When we model experiments, it is the overall $\lambda$ that will get boosted by $\Delta\lambda$ for treatment nodes, like $\lambda_i = (\lambda_{int} + \nu_i)(1 + \Delta\lambda)$; this overall boost simplifies reasoning about the toy models. The second parameter $\nu_i$ models the propensity of the node to produce more content if its neighbours produce more content. We model $\nu_i$ as some function $f$ of the node’s neighbours’ content production in the previous time step: $\nu_i^t = f(c_j^{t-1})$, where $j$ runs over all of $i$’s neighbours. Specifically, we will use $\nu_i^t = \nu_{damp} \sum_{j \in V_i} c_j^{t-1}$, where $V_i$ is the set of nodes that are $i$’s neighbours. $\nu_{damp}$ is an important global dampening parameter, whose presence is needed to make the overall model stable in the steady-state (see next section).

The above model can be summed up as: at each time step $t$, for each node, we add up content produced by its neighbours in the previous time step, multiply by $\nu_{damp}$, add $\lambda_{int}$, and draw with this parameter from the random variable $R$ to get content production at $t$.
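The update rule can be sketched in a few lines of Python. This is a minimal illustration and not the notebook's implementation: the function name `simulate`, the `draw` callback standing in for $R$, and the 10-node toy ring are all assumptions made for this example, and no treatment boost is applied.

```python
import random

def simulate(adj, lam_int, nu_damp, steps, draw=lambda lam: lam):
    """Run the content production model; `draw` samples R(lambda).

    adj maps each node to a list of its neighbours. The default `draw`
    is the mean constant case R(lambda) = lambda.
    """
    c = {i: 0.0 for i in adj}            # c_i^0 = 0 for every node
    mean_history = []
    for _ in range(steps):
        # lambda_i^t = lambda_int + nu_damp * (neighbours' content at t-1)
        c = {i: draw(lam_int + nu_damp * sum(c[j] for j in nbrs))
             for i, nbrs in adj.items()}
        mean_history.append(sum(c.values()) / len(c))
    return c, mean_history

# Toy graph (an assumption for illustration): a ring of 10 nodes, each
# linked to its 2 nearest neighbours on either side, so every degree is 4.
n = 10
ring = {i: [(i + d) % n for d in (-2, -1, 1, 2)] for i in range(n)}

# Mean constant case, and a uniform draw U(0, 2*lambda) with mean lambda.
_, mean_const = simulate(ring, lam_int=1.0, nu_damp=0.1, steps=50)
_, uniform = simulate(ring, 1.0, 0.1, 50,
                      draw=lambda lam: random.uniform(0, 2 * lam))
```

The new content dictionary is built entirely from the previous step's values before it replaces `c`, so all nodes are updated synchronously, as the model description requires.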

Throughout this paper we will use connected Watts–Strogatz random graphs. A Watts–Strogatz graph is described by 3 parameters $(n, k, p)$. First, a regular ring lattice is constructed: a graph with $n$ nodes, each connected to $k<n$ neighbours. Then, "shortcuts" are created by replacing some edges as follows: for each original edge $(u,v)$ in the graph, with probability $p$ replace it with a new edge $(u,w)$, where $w$ is randomly chosen. If the process yields a graph that is not connected, we try again until we get a connected graph. Note that the final randomization process does not change the mean degree in the graph, which remains $k$ at every step.
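The construction can be sketched with the standard library alone. The code below is a hand-rolled illustration under stated assumptions ($k$ even, $n$ much larger than $k$, helper names invented here); in practice NetworkX's `connected_watts_strogatz_graph(n, k, p)` provides the same kind of graph.

```python
import random

def watts_strogatz(n, k, p, rng):
    """Ring lattice of n nodes with k (even) neighbours each, then rewiring."""
    adj = {i: set() for i in range(n)}
    half = k // 2
    for u in range(n):
        for d in range(1, half + 1):       # k/2 neighbours on each side
            v = (u + d) % n
            adj[u].add(v)
            adj[v].add(u)
    for d in range(1, half + 1):
        for u in range(n):
            if rng.random() >= p:
                continue                    # keep the original edge (u, v)
            v = (u + d) % n
            w = rng.randrange(n)
            while w == u or w in adj[u]:    # no self-loops, no duplicate edges
                w = rng.randrange(n)
            adj[u].discard(v)               # replace (u, v) ...
            adj[v].discard(u)
            adj[u].add(w)                   # ... with the shortcut (u, w)
            adj[w].add(u)
    return adj

def is_connected(adj):
    """Depth-first search from node 0; connected if every node is reached."""
    seen, stack = {0}, [0]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)

def connected_watts_strogatz(n, k, p, seed=0):
    """Retry the construction until the result is connected."""
    rng = random.Random(seed)
    while True:
        graph = watts_strogatz(n, k, p, rng)
        if is_connected(graph):
            return graph

graph = connected_watts_strogatz(n=100, k=6, p=0.1, seed=42)
mean_degree = sum(len(nbrs) for nbrs in graph.values()) / len(graph)
```

Each rewiring removes one edge and adds one, so the edge count, and with it the mean degree, stays fixed at $k$ while individual degrees drift.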

4 Mean behaviour, stability and the network effect

In the model constructed, there are two sources of randomness: (i) randomness introduced when the graph is constructed, and (ii) the random variables used in content production (if one is used). As we will see, to understand the network effects discussed in this paper (ii) is not important, and the random variable may be replaced by its mean.

First, to examine the mean behaviour of the model we run simulations on a Watts-Strogatz graph with $(n=10\,000, k=50, p=0.1)$. We look at three models of content production: (i) a uniform distribution $R(\lambda) = U(0, 2\lambda)$, (ii) a Poisson distribution $R(\lambda) = \mathrm{Pois}(\lambda)$, and (iii) the mean constant case $R(\lambda) = \lambda$. Note that in all three cases, the mean is $E[R(\lambda)] = \lambda$.

Figure 2 shows that content production stabilizes after about $T=10$ steps, and that in all three cases mean content production across the network is the same, apart from random fluctuations coming from the random variables. Note that in cases (i) and (iii) content production is modeled to be continuous, whereas in case (ii) it’s discrete, since the Poisson is a discrete random variable. Figure 3 shows the histogram of content production across nodes in the last time step, for all three models considered.

Figure 2: Content production on Watts-Strogatz graph with $(n=10\,000, k=50, p=0.1)$, $\lambda_{int}=1$ and $\nu_{damp}=0.01$. Mean content production converges.

Figure 3: Density histogram of content production during the last step of Monte Carlo simulation for the previous models. The y-axis is logarithmic to ease readability.

In Figure 2, mean content production clearly stabilized. Next, let’s re-run the same simulations, but on a Watts-Strogatz graph with $k=200$. The results are shown in Figure 4.

Figure 4: Content production on Watts-Strogatz graph with $(n=10\,000, k=200, p=0.1)$, $\lambda_{int}=1$ and $\nu_{damp}=0.01$. Mean content production diverges.

With these parameters content production diverges; it does not reach a steady-state as before. Consider the mean constant case to see why: assume that the model reaches a steady-state mean content production $c_{base}$. Then, from the way the model is constructed, it follows that (using $E[R(\lambda)] = \lambda$):

$$c_{base} = \lambda_{int} + c_{base} \cdot \bar{V_i} \cdot \nu_{damp} \tag{1}$$

where $\bar{V_i}$ is the average number of neighbours per node, which for the Watts-Strogatz graph is always $\bar{V_i} = k$. Solving for $c_{base}$ we get:

$$c_{base} = \frac{\lambda_{int}}{1 - k \cdot \nu_{damp}} \tag{2}$$

For the first model with $\lambda_{int}=1, k=50, \nu_{damp}=0.01$ this yields $c_{base}=2$, which matches the numeric results. Clearly, this equation yields the stability condition:

$$k \cdot \nu_{damp} < 1 \tag{3}$$

For the divergent case, the above inequality is not satisfied. It is interesting to see the case where the denominator in Equation (2) is exactly 0, which we can achieve by setting $k=100$. In this case, content production does not explode, but increases linearly indefinitely.

Figure 5: Same as before, with $k=100$.
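The three regimes (convergent, linearly growing, exploding) can be checked with the mean-field recursion $c^{t} = \lambda_{int} + k \cdot \nu_{damp} \cdot c^{t-1}$, which is what the mean constant model reduces to when every node has exactly $k$ neighbours. The function names below are assumptions made for this sketch.

```python
def c_base(lam_int, k, nu_damp):
    """Steady-state mean content production, or None if the model is unstable."""
    gain = k * nu_damp
    if gain >= 1:                      # stability condition k * nu_damp < 1 violated
        return None
    return lam_int / (1 - gain)

def mean_trajectory(lam_int, k, nu_damp, steps):
    """Iterate the mean-field recursion starting from c = 0."""
    c, out = 0.0, []
    for _ in range(steps):
        c = lam_int + k * nu_damp * c
        out.append(c)
    return out

stable = mean_trajectory(1.0, 50, 0.01, 60)      # approaches c_base = 2
linear = mean_trajectory(1.0, 100, 0.01, 5)      # grows by lam_int each step
explosive = mean_trajectory(1.0, 200, 0.01, 5)   # roughly doubles each step
```

At $k \cdot \nu_{damp} = 1$ the recursion becomes $c^{t} = \lambda_{int} + c^{t-1}$, so each step adds a constant $\lambda_{int}$; that is the linear growth seen at $k=100$.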

This simple experiment of Figure 2 shows the network effect: we parameterized intrinsic content production with $\lambda_{int}=1$, which by itself, without the boost from neighbours, would have yielded mean content production $c_{base} = \lambda_{int} = 1$ in this simple model. However, due to the network effect, which here is a function of $k$ and $\nu_{damp}$, the actual mean content production $c_{base}$ is significantly higher, by a factor of $\frac{1}{1 - k \cdot \nu_{damp}}$.

5 A note on regularizing models

Real networks can experience exponential growth for a while, but not indefinitely — some physical resource, such as people, attention or time runs out eventually. In this sense, the models above have unrealistic or un-physical domains because they can diverge. In principle, we can regularize the model in two places:

  1. regularize the network effect: assume that each node can only consume a finite amount $\nu_{max}$ of its neighbours’ content in a time step
  2. regularize content production: assume that each node can only produce a finite amount $c_{max}$ of content

In both cases, the assumption is physical in a real network, because only finite time is available per time step to read or write content.

The simplest way to regularize is to cut off with the $\min\{\}$ function, so that $\nu_i^t = \min\{\nu_{max}, \nu_{damp} \sum_{j \in V_i} c_j^{t-1}\}$ and $c_i^t \leftarrow \min\{c_{max}, R(\lambda_i^t)\}$. A smoother approach is to use a sigmoid function $\sigma(\cdot)$: $\nu_i^t = \nu_{max}\,\sigma\bigl(\nu_{damp} \sum_{j \in V_i} c_j^{t-1}\bigr)$ and $c_i^t \leftarrow c_{max}\,\sigma\bigl(R(\lambda_i^t)\bigr)$.
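Both cut-offs can be written as small helpers. A cut-off has to cap the value from above, so the hard variant in this sketch uses `min`. The logistic function is assumed for $\sigma(\cdot)$, since the text only says "a sigmoid", and the helper names are invented for illustration.

```python
import math

def sigmoid(x):
    """Logistic function; an assumed choice for the paper's sigma."""
    return 1.0 / (1.0 + math.exp(-x))

def capped_nu(neighbour_sum, nu_damp, nu_max, smooth=False):
    """Network term nu_i^t with a hard (min) or smooth (sigmoid) cut-off."""
    raw = nu_damp * neighbour_sum
    return nu_max * sigmoid(raw) if smooth else min(nu_max, raw)

def capped_content(drawn, c_max, smooth=False):
    """Content c_i^t, where `drawn` is a value already sampled from R(lambda)."""
    return c_max * sigmoid(drawn) if smooth else min(c_max, drawn)

below_cap = capped_nu(50.0, nu_damp=0.01, nu_max=1.0)    # 0.5, untouched by the cap
at_cap = capped_nu(500.0, nu_damp=0.01, nu_max=1.0)      # raw value 5.0 is cut to 1.0
smooth_cap = capped_nu(500.0, 0.01, 1.0, smooth=True)    # approaches 1.0 from below
```

Note that with the logistic function the smooth variant is not the identity for small inputs (it returns $\nu_{max}/2$ at zero), so it changes the model below the cap as well as above it.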

Figure 6: Same as Figure 5, but with regularized network effect using the sigmoid $\sigma(\cdot)$ and $\nu_{max}=1$. Content production is no longer divergent.

Figure 6 shows the same model as Figure 5, but with (just) the network effect regularized. In this case, the steady-state condition for $c_{base}$ becomes:

$$c_{base} = \lambda_{int} + \nu_{max}\,\sigma(c_{base} \cdot \bar{V_i} \cdot \nu_{damp}) \tag{4}$$

Due to the properties of $\sigma(\cdot)$, there is no closed form solution for $c_{base}$ in this case. In the rest of this paper, we will not use regularization, as it makes it harder to reason about the behaviour of the network. Instead, we will work with "raw" models, but always use parameters that yield a stable network.
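Although Equation (4) has no closed form solution, it can be solved numerically by iterating the right-hand side until it stops changing. The sketch below assumes the logistic function for $\sigma(\cdot)$ and uses the Figure 6 parameters ($\lambda_{int}=1$, $k=100$, $\nu_{damp}=0.01$, $\nu_{max}=1$); the function name is hypothetical.

```python
import math

def sigmoid(x):
    """Logistic function; an assumed choice for the paper's sigma."""
    return 1.0 / (1.0 + math.exp(-x))

def regularized_c_base(lam_int, k, nu_damp, nu_max, iterations=200):
    """Fixed-point iteration of c = lam_int + nu_max * sigmoid(c * k * nu_damp)."""
    c = lam_int
    for _ in range(iterations):
        c = lam_int + nu_max * sigmoid(c * k * nu_damp)
    return c

c_reg = regularized_c_base(lam_int=1.0, k=100, nu_damp=0.01, nu_max=1.0)
residual = c_reg - (1.0 + sigmoid(c_reg))   # should be ~0 at the fixed point
```

The iteration converges here because the logistic function's slope never exceeds 1/4, so with $\nu_{max} \cdot k \cdot \nu_{damp} = 1$ each step shrinks the error by at least a factor of four. For other parameter choices convergence is not guaranteed by this argument and should be checked.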

6 Degree distribution effect

In this section we will discuss a small, nuanced adjustment to the above formula for mean content production on a random graph that is due to the degree distribution of nodes. In a Watts-Strogatz graph with $p=0$, all nodes have exactly $k$ neighbours; however, at $p>0$, due to the random adjustment of edges, some nodes will end up with a lower, and some with a higher degree than $k$. Figure 7 shows the degree distribution of a Watts-Strogatz graph with $k=50$ at different $p$ values. It shows that with increasing $p$, the distribution spreads out around the mean of $k$.

Figure 7: Degree distribution of a Watts-Strogatz graph with $n=10\,000, k=50$ at different $p$ values.

When examining content production on these random graphs at different $p$ values, an interesting effect shows itself: mean content production increases in a concave way over the base value, which is only realized exactly in the $p=0$ case. To measure this, define the degree distribution effect as:

$$e_{degree\,distribution} = \bar{c}/c_{base} - 1 \tag{5}$$

where $\bar{c}$ is the actual content production measured in the network in steady-state and $c_{base}$ is the theoretically calculated value from Equation (2). Figure 8 shows the effect as a function of $p$.

Figure 8: Increased content production due to more spread out degree distribution of a Watts-Strogatz graph with $n=10\,000, k=50$ at different $p$ values.

The explanation is as follows: the mean degree of nodes remains exactly $k$ in a Watts-Strogatz graph, despite (and during) the final randomization of edges, since the total number of nodes and edges always remains the same. However, as the histogram in Figure 7 shows, some nodes will end up with higher, and some with lower degree, and this is not entirely symmetric: on the lower side, the distribution is more bunched around the mean, whereas on the higher side, the degree distribution moves away further from the mean $k$. As a result, some nodes will have a relatively high neighbour count and get a high network boost, and this is not cancelled out by the nodes having lower degree and experiencing lower network boost, since the distribution is not symmetric. This is illustrated in Figure 9, which shows the mean content production for the same graph as above, but as a function of node degree.

Figure 9: Content production in the previous network, as a function of node degree. The horizontal red dashed line shows the baseline content production we would have without a spread of degree distribution, the vertical red dashed line is the mean degree. Clearly content production is a function of degree, and the spread is wider in the positive direction.
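The degree distribution effect can be reproduced on a small graph. The following is a scaled-down sketch, not the paper's run: it assumes $n=1\,000$, $k=20$ and $\nu_{damp}=0.025$ (so $k \cdot \nu_{damp} = 0.5$ as in the paper), uses the mean constant model, and relies on a hand-rolled Watts-Strogatz generator without the connectivity retry.

```python
import random

def ws_graph(n, k, p, rng):
    """Hand-rolled Watts-Strogatz sketch (k even, no connectivity retry)."""
    adj = [set() for _ in range(n)]
    for d in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + d) % n
            adj[u].add(v)
            adj[v].add(u)
    for d in range(1, k // 2 + 1):
        for u in range(n):
            if rng.random() >= p:
                continue                     # edge (u, v) is kept
            v = (u + d) % n
            w = rng.randrange(n)
            while w == u or w in adj[u]:     # avoid self-loops and duplicates
                w = rng.randrange(n)
            adj[u].discard(v)
            adj[v].discard(u)
            adj[u].add(w)
            adj[w].add(u)
    return adj

def steady_state_mean(adj, lam_int, nu_damp, steps=60):
    """Mean content production after `steps` updates of the mean constant model."""
    c = [0.0] * len(adj)
    for _ in range(steps):
        c = [lam_int + nu_damp * sum(c[j] for j in nbrs) for nbrs in adj]
    return sum(c) / len(c)

def degree_distribution_effect(n, k, p, lam_int, nu_damp, seed=0):
    """Measured mean over the Equation (2) value, minus one (Equation (5))."""
    adj = ws_graph(n, k, p, random.Random(seed))
    c_base = lam_int / (1 - k * nu_damp)
    return steady_state_mean(adj, lam_int, nu_damp) / c_base - 1

e_regular = degree_distribution_effect(1000, 20, 0.0, 1.0, 0.025)   # p = 0
e_rewired = degree_distribution_effect(1000, 20, 0.1, 1.0, 0.025)   # p = 0.1
```

At $p=0$ every node has degree $k$ and the measured mean coincides with Equation (2); once edges are rewired the degrees spread out and the measured mean rises above it.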

In the rest of the paper we will use the primed notation $c_{base}'$ to refer to the "bare" content production on the network, taking into account the above effect (so slightly higher than $c_{base}$). Bare here means no treatment is applied and all nodes are identical. Since I know of no closed-form formula for $c_{base}'$, it is calculated from a bare Monte Carlo network simulation in the rest of the paper.

7 Spillover effect

In this section we will start simulating Randomized Controlled Trials (RCTs) on networks. It’s important to point out that RCTs are traditionally run with the Independence Assumption (IA), that the units of experimentation are independent of each other. In the network models considered here, the IA does not hold — the resulting behaviour is the topic and raison d’etre of this paper itself.

Consider a model on a Watts-Strogatz graph with $(n=500\,000, k=50, p=0.1)$, with identical parameters as before, without regularization. Per Equation (2), such a model has an expected steady-state content production of $c_{base}' \approx 2$. This base model is modified so that a random 2% treatment subset of nodes receives a $\Delta\lambda=0.05$ boost to their overall content production. Figure 10 shows content production separately for treatment, neighbours of treatment and the rest of the nodes (these 3 are distinct sets covering the whole graph).

Figure 10: Spillover effect in a Randomized Controlled Trial on a Watts-Strogatz graph with $(n=500\,000, k=50, p=0.1)$ with $N=10\,000$ treatment nodes receiving a $\Delta\lambda=0.05$ boost to their overall content production, for all three (mean constant, uniform, Poisson) content production models. The line types match the legend on previous figures. The first few time steps, where steady-state is approached, are not shown.

The following table shows the split of nodes and the average content production level $c$ of the various subsets in the network (control is the union of neighbours and rest):

group        cardinality   $c/c_{base}'$
treatment         10 000   1.0258
control          490 000   1.0005
neighbours       313 507   1.0012
rest             176 493   0.9991

What this shows is that the treatment effect spills over into the control group. The spillover is stronger into direct neighbours of treatment, and weaker for the rest of the nodes. It’s also worth noting that the treatment, control and neighbours subsets have higher content production than the simulated $c_{base}' = 2.0019$ base value, which shows that even a small treatment group can "contaminate" the entire network. However, rest (nodes that are not treatment and also not neighbours of treatment) has slightly lower ($c_{rest}/c_{base}' < 1$) content production. How is this possible? The explanation is statistical: the nodes in rest are biased to have fewer neighbours (in this Monte Carlo simulation their mean degree is $k=49.2$), and this results in this subset’s content production being slightly lower due to lower network effects (using the naive formula, $c_{base}(k=49.2) = 1.9686$). However, the overall conclusions are unaffected by this observation.

We can measure the strength of the spillover effect by defining:

$$e_{spillover} = c_{control}/c_{base}' - 1 \tag{6}$$

In other words, $e_{spillover}$ measures the degree of contamination (due to the treatment effect applied to the treatment group). In this scenario we measure $e_{spillover} = 0.0005$.
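A scaled-down version of this experiment can be sketched as follows. Everything about the setup is an assumption made to keep the run small: $n=2\,000$, $k=10$, $\nu_{damp}=0.05$, a 10% treatment group, the mean constant model, and a hand-rolled graph generator without the connectivity retry. The default `boost="overall"` follows the text's $\lambda_i = (\lambda_{int} + \nu_i)(1 + \Delta\lambda)$; the `"intrinsic"` option, which boosts only $\lambda_{int}$, is a hypothetical alternative added for comparison and is not taken from the paper.

```python
import random

def ws_graph(n, k, p, rng):
    """Hand-rolled Watts-Strogatz sketch (k even, no connectivity retry)."""
    adj = [set() for _ in range(n)]
    for d in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + d) % n
            adj[u].add(v)
            adj[v].add(u)
    for d in range(1, k // 2 + 1):
        for u in range(n):
            if rng.random() >= p:
                continue
            v = (u + d) % n
            w = rng.randrange(n)
            while w == u or w in adj[u]:
                w = rng.randrange(n)
            adj[u].discard(v)
            adj[v].discard(u)
            adj[u].add(w)
            adj[w].add(u)
    return adj

def steady_state(adj, lam_int, nu_damp, treated=frozenset(), d_lam=0.0,
                 boost="overall", steps=60):
    """Mean constant model; treated nodes receive the d_lam boost."""
    c = [0.0] * len(adj)
    for _ in range(steps):
        new_c = []
        for i, nbrs in enumerate(adj):
            nu = nu_damp * sum(c[j] for j in nbrs)
            factor = 1 + d_lam if i in treated else 1.0
            if boost == "overall":       # (lam_int + nu_i)(1 + d_lam), as in the text
                new_c.append((lam_int + nu) * factor)
            else:                        # hypothetical: boost lam_int only
                new_c.append(lam_int * factor + nu)
        c = new_c
    return c

def mean_over(c, nodes):
    return sum(c[i] for i in nodes) / len(nodes)

rng = random.Random(1)
n, k, p, lam_int, nu_damp, d_lam = 2000, 10, 0.1, 1.0, 0.05, 0.05
adj = ws_graph(n, k, p, rng)
treatment = set(rng.sample(range(n), 200))
neighbours = set().union(*(adj[u] for u in treatment)) - treatment
control = set(range(n)) - treatment
rest = control - neighbours

c_bare = steady_state(adj, lam_int, nu_damp)                  # no treatment
c_rct = steady_state(adj, lam_int, nu_damp, treatment, d_lam)
c_base_prime = sum(c_bare) / n
e_spillover = mean_over(c_rct, control) / c_base_prime - 1
```

Comparing `c_rct` with `c_bare` on the same control nodes isolates the spillover from any degree bias in the groups. The `boost` switch is there because the choice matters: under `"overall"` a treatment node and a control node with similar neighbourhoods differ by the full factor $1+\Delta\lambda$, whereas the `"intrinsic"` variant only scales the $\lambda_{int}$ part of $\lambda_i$.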

Clearly, the spillover effect must be a function of $N/n$ and $k$:

  1. as $N/n \rightarrow 0$, $e_{spillover} \rightarrow 0$, because the treatment group becomes insignificant in the overall network
  2. as $k \rightarrow 0$, $e_{spillover} \rightarrow 0$, because nodes have fewer neighbours and the model approaches a conventional RCT where the Independence Assumption holds

To illustrate the dependence on $N/n$ and $k$, let’s look at the same experiment as above, but with $k=10$ and a 1% treatment group of $N=5\,000$ nodes. In this scenario, where $c_{base}' = 1.1112$, the mean steady-state of the network is:

group        cardinality   $c/c_{base}'$
treatment          5 000   1.0450
control          495 000   1.0000
neighbours        47 275   1.0014
rest             447 725   0.9999

The spillover effect works out to be $e_{spillover} = 0.00005$, about 10x weaker than in the previous example.

Figure 11: Same as Figure 10, but with $k=10$ and $N=5\,000$.

8 Experiment dampening effect

In an RCT, we measure the treatment versus control group lift, and attribute it to the applied treatment, with the caveat that one must be careful to separate out signal from noise using statistical hypothesis testing or Bayesian inference. In the terminology of this paper, the treatment effect is defined as:

$$e_{treatment} = c_{treatment}/c_{control} - 1 \tag{7}$$

In the previous two experiments, we measure $e_{treatment} = 0.0253$ and $e_{treatment} = 0.0449$, respectively. Both are lower than the actual treatment effect $\Delta\lambda = 0.05$ we applied synthetically. This is the experiment dampening effect, a result of the spillover effect: because the treatment effect spills over into control and boosts control as well, we measure a lower treatment effect. We can define the experiment dampening effect as:

$$e_{dampening} = e_{treatment}/\Delta\lambda \tag{8}$$

In the cases above, we get $e_{dampening} = 0.5075$ and $e_{dampening} = 0.8990$, respectively.
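The three metrics defined so far are simple ratios, and can be recomputed from the rounded entries of the two tables above. The helper name is an assumption of this sketch. Because the table entries are already divided by $c_{base}'$ and rounded to four decimals, the base value is passed as 1.0 and the results agree with the quoted values only to about the precision of that rounding.

```python
def experiment_metrics(c_treatment, c_control, c_base, d_lam):
    """Return (e_spillover, e_treatment, e_dampening) per Equations (6)-(8)."""
    e_spillover = c_control / c_base - 1
    e_treatment = c_treatment / c_control - 1
    e_dampening = e_treatment / d_lam
    return e_spillover, e_treatment, e_dampening

# Inputs are the c / c_base' columns of the two tables, so c_base = 1.0.
first = experiment_metrics(1.0258, 1.0005, 1.0, d_lam=0.05)    # k = 50, 2% treatment
second = experiment_metrics(1.0450, 1.0000, 1.0, d_lam=0.05)   # k = 10, 1% treatment
```

Since $e_{dampening}$ divides a small difference by $\Delta\lambda = 0.05$, a rounding error of $10^{-4}$ in the table ratios shifts it by about $2 \cdot 10^{-3}$; reproducing the quoted dampening values more closely requires the unrounded simulation output.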

9 Intrinsic dampening effect

The reason experiments measure treatment against control in traditional RCTs is to factor out seasonal effects. If the experimenter were to compare the treatment group’s before-treatment and during-treatment (or after-treatment) metrics, she would not know whether the measured effect is due to the applied treatment, or something else that changed during the experiment. However, in our models, there is no seasonality, so we can examine $c_{treatment}/c_{base}'$. Let’s define the intrinsic dampening effect:

$$e_{intrinsic} = c_{treatment}/c_{base}' - 1 \tag{9}$$

This measures the dampening in the effect itself due to the connected, non-independent nature of the network. Naively, we may expect that this ratio should be just the applied treatment effect. However, as we will see, this is not the case. Specifically, for the first experiment considered previously, $e_{intrinsic} = 0.0258$, for the second it is $e_{intrinsic} = 0.0450$ (note these are not the same values as for $e_{treatment}$). The full lift is not realized in the treatment group due to content production in the rest of the network being lower; hence, the feedback the treatment nodes receive in their $\nu_i^t = \nu_{damp} \sum_{j \in V_i} c_j^{t-1}$ term is lower. If the entire network were to receive the $\Delta\lambda$ boost, in other words if $N=n$, then $e_{intrinsic} = \Delta\lambda$ would hold, as is evident from Equation (2).

Note that the experiment dampening effect and the intrinsic dampening effect are related, but not the same thing:

  1. experiment dampening effect: we underestimate the lift in a treatment versus control measurement due to the treatment effect leaking to the control group in the network at treatment time
  2. intrinsic dampening effect: the full lift of the treatment group is not realized in the network due to content production being lower in the rest of the network (the control group) at treatment time; note that the control group does receive some of the treatment lift, so its content production also increases, but not to the level of the treatment group

10 Clustering effect

The clustering effect is a third effect closely related to the previous two: if the treatment nodes are more densely clustered than the overall graph (or the control group), then the treatment nodes will achieve higher content production, compared to both the control group’s $c_{control}$ and the base rate $c_{base}'$.

In a randomized test, on average the treatment and control’s clustering should be the same, but variations may produce higher clustering in treatment; stratification controls for this in conventional RCTs. Alternatively, an experiment may have selection bias and thus result in a more clustered treatment group.

To illustrate the effect, let’s look at a Monte Carlo run where the treatment nodes are not randomly selected, but significantly more clustered on purpose. Accomplishing tighter clustering in a Watts-Strogatz graph is straightforward: instead of randomly selecting nodes, pick a contiguous set of nodes from the original ring (before randomization). At the relatively low edge randomization values $p$ that we’re using, this will result in a highly clustered subset.
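The difference between the two selection schemes can be seen without running the content model at all, by counting how many of the treatment nodes' edges lead back into treatment. The sketch below is a scaled-down illustration ($n=2\,000$, $k=10$, $N=200$) with a hand-rolled generator; the helper `treated_neighbour_share` is an invented name, and it counts edge endpoints, which may not be the same statistic as the percentages quoted in this section.

```python
import random

def ws_graph(n, k, p, rng):
    """Hand-rolled Watts-Strogatz sketch; node labels follow the original ring."""
    adj = [set() for _ in range(n)]
    for d in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + d) % n
            adj[u].add(v)
            adj[v].add(u)
    for d in range(1, k // 2 + 1):
        for u in range(n):
            if rng.random() >= p:
                continue
            v = (u + d) % n
            w = rng.randrange(n)
            while w == u or w in adj[u]:
                w = rng.randrange(n)
            adj[u].discard(v)
            adj[v].discard(u)
            adj[u].add(w)
            adj[w].add(u)
    return adj

def treated_neighbour_share(adj, treated):
    """Fraction of edge endpoints leaving treatment nodes that land in treatment."""
    ends = [v for u in treated for v in adj[u]]
    return sum(v in treated for v in ends) / len(ends)

rng = random.Random(7)
n, k, p, N = 2000, 10, 0.1, 200
adj = ws_graph(n, k, p, rng)

random_group = set(rng.sample(range(n), N))   # conventional random assignment
clustered_group = set(range(N))               # contiguous arc of the original ring

share_random = treated_neighbour_share(adj, random_group)
share_clustered = treated_neighbour_share(adj, clustered_group)
```

For a random group the share should sit near $N/n$, while for the contiguous arc most ring edges stay inside the group, so the treatment nodes mostly see each other's boosted content.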

Figure 12: The same Watts-Strogatz graph from Figure 1, but the treatment group is the first 20 nodes from the original ring.

We run an experiment with identical parameters as in Figure 10, except with a much more clustered treatment group: a Watts-Strogatz graph with $(n=500\,000, k=50, p=0.1)$, $\lambda_{int}=1$ and $\nu_{damp}=0.01$, with a 2% treatment group of $N=10\,000$ nodes receiving a $\Delta\lambda=0.05$ boost to their content production.

Figure 13: The same Watts-Strogatz graph from Figure 10, but the treatment group is tightly clustered, resulting in significantly higher content production.

Previously, with a random treatment group, the 10 000 treatment nodes had 313 507 non-treatment neighbours and 37.2% of treatment nodes’ neighbours were also in treatment; now the 10 000 treatment nodes only have 46 554 non-treatment neighbours and 90.6% of treatment nodes’ neighbours are also in treatment. This highly clustered model yields:

group        cardinality   $c/c_{base}'$
treatment         10 000   1.0457
control          490 000   1.0001
neighbours        46 554   1.0052
rest             443 446   0.9995

The highly clustered case achieves $e_{dampening} = 0.9136$, so there is barely any dampening: the treatment lift is 91% of the actual $\Delta\lambda$ lift. The explanation is that 90% of the treatment group nodes’ neighbours are also in treatment, so they get "their own boost back", and there is relatively little dampening. It is also worth noting that $e_{spillover} = 0.0001$, 5x lower than with a truly random treatment group. The explanation is that there are fewer edges running between treatment and control, so fewer ways for the spillover to happen.

11 Experiment size effect

The last effect we consider is the experiment size effect. The experiment size effect is closely related to the spillover effect, and is simply the observation that with a larger treatment group, there is stronger spillover, which yields stronger feedback, which results in higher content production in the treatment group. Figure 14 shows the variation of $e_{intrinsic}$ with $N/n$ for a Watts-Strogatz graph with $n=10\,000, p=0.1$ and $\Delta\lambda=0.5$.

Figure 14: As $N/n$ approaches 1, the intrinsic effect (the ratio of treatment content production to base content production) approaches the experiment effect $\Delta\lambda$. Experiment run on a Watts-Strogatz graph with $n=10\,000$ and $p=0.1$, at different $k$ values.

However, as pointed out earlier, in an RCT, the experimenter compares treatment content production to control’s (and not base). Figure 15 shows $e_{dampening}$ (the ratio of the treatment effect $e_{treatment}$ to the actual $\Delta\lambda$ effect) for the same experiment. Remarkably, the dampening is essentially constant with $N/n$, which means that even with increasing $N$, the experimenter would measure the same treatment effect, because due to spillover, control’s content production also goes up.

Figure 15: The dampening effect (the ratio of the treatment effect to $\Delta\lambda$) is constant with $N/n$, and is only a function of $k$.

Both plots split as a function of $k$, the mean degree: the lower $k$, the closer we are to the Independence Assumption, and the more the measured treatment effect approaches $\Delta\lambda$. Separate experiments (not discussed here) show no dependence on $p$.

12 Conclusion

In traditional RCTs where the Independence Assumption holds, the experimenters’ main focus is on study design and controlling for statistical fluctuations between the various experiment groups using techniques such as A/A tests, sample size mismatch tests, p-values, Bayesian testing, and so on. This paper showed that when running experiments on social networks or other graphs, where the IA does not hold, various other effects, all a result of the network effect, must be considered and controlled for. As the last section showed, even with high treatment group sizes, due to the treatment effect spilling over to the control group, the experimenter will never measure the true treatment effect when comparing treatment to control. This suggests that in networked conditions measuring or estimating the underlying graph structure and network effects, and then running simulations (similar to the ones in this paper) may be a good method to establish a baseline to compare treatment group behaviour with.

13 Code

The paper’s accompanying code, including all figures in the paper, is available as a Python notebook on GitHub. The random seed has been set to a fixed value in the code, so all numerical results are reproducible by re-running the notebook. Use the link below to access the notebook: https://github.com/mtrencseni/monte-carlo-network-effects-rct-2023

References

  • Kohavi, Tang, and Xu (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press.
  • Watts, Strogatz (1998). Collective dynamics of ’small-world’ networks. Nature. 393 (6684): 440–442.
  • Albert, Barabási (2002). Statistical mechanics of complex networks. Reviews of Modern Physics. 74 (1): 47–97. arXiv:cond-mat/0106096.