A/B testing and networks effects

Marton Trencseni - Sat 21 March 2020 - Data

Introduction

In the previous post I calculated the expected average posts on a social network. The model had 2 components:

intrinsic: users intrinsically create posts
network effects: users create more posts if they see their friends’ posts

But, there is actually a third, even more nuanced effect: the strength of the network effect depends on the degree distribution in the network.

The code shown below is up on Github.

Degree dependence of the network effect

Simulations showed that even a weak network effect boosts overall post production significantly. If $c_{int}$ is intrinsic post production, and for each friend's post users create an additional $c_{net}$ post the next day, and on average users have $k$ friends, then in the steady state $c = c_{int} + c \times c_{net} \times k $. Solving this we get $ c = \frac{ c_{int} }{ 1 - c_{net} \times k } $.

In the last post, I used a $U(0, 1)$ random variable to multiply both $c_{int}$ and $c_{net}$. This is not neccessary and just introduces noise, so let’s leave it out. For this post let’s use $c_{int}=1/4$ and $c_{net}=3/200$, with these values $c=1$.

In the above formula, the structure of the graph only appears in the parameter $k$. This is a good first approximation, but it’s not entirely accurate. A more accurate way to think about it is to write $c = c_{int} + c \times f(c_{net}, g) $, where $f$ is some function, and $g$ is the graph ($f=c_{net} \times k$ is our initial approximation). Let's explore this dependence further.

We are using Watts-Strogatz graphs for these experiment, which have a re-wiring randomization parameter $p$. When the Watts-Strogatz graph is constructed, initially a ring graph is created (large diameter) with $k$ edges from each node, and then each edge is randomly re-wired with probability $p$ to get a small-world network (small diameter). If we set $p=0$, the degree distribution of the graph is exactly $k$. As we increase $p$, the degree distribution becomes a gaussian around $k$. Code to visualize this:

Note: some of these simulations take a while to run, so I now use multiprocessing. This makes the code harder to read, but the speed-up is significant.

def mp_degree_distribution(args):
    (n, k, p) = args
    g = connected_watts_strogatz_graph(n, k, p)
    ds = Counter(sorted([d for n, d in g.degree()], reverse=True))
    d, c = zip(*ds.items())
    return p, d, c

n=100*1000
k=50

results = mp_map(
    network_scaling_worker.mp_degree_distribution,
    [(n, k, p/100.0) for p in range(10, 110, 20)])

plt.figure(figsize=(10,5))
plt.xlabel('degree')
plt.ylabel('node count')
for x in results:
    plt.plot(x[1], x[2])
plt.legend(['p=%s' % x[0] for x in results])
plt.show()

Shows:

Watts–Strogatz degree distribution

Now comes the interesting bit. Let’s calculate the average post production for a Watts-Strogatz graph for different parameters $(n, p)$, with fixed $k=50$:

def mp_lifts(args):
    (n, k, p, N, T) = args
    g = connected_watts_strogatz_graph(n, k, p).to_directed()
    population_A = set(sample(g.nodes, N))
    stats = compute_stats(g, population_A, T) # see the previous post
    return (args, stats)

ns = [10*1000, 50*1000, 100*1000, 300*1000]
k=50
ps = range(0, 120, 20)
T=50
N=0

results = []
for n in ns:
    print('Running simulations for n=%s...' % n)
    start_time = time.time()
    results.extend(mp_map(
        network_scaling_worker.mp_lifts,
        [(n, k, p/100.0, N, T) for p in ps],
    ))
    elapsed_time = time.time() - start_time
    print('Done! Elapsed %s' % time.strftime("%M:%S", time.gmtime(elapsed_time)))

Shows:

Post production depends on Watts-Strogatz p

This shows the (third) effect:

At $p=0$, when all nodes have exactly $k$ neighbours, the mean post production is exactly 1.
At higher $p$, there is an additional boost to post production, due to some nodes having more neighbours than $k$; and this boost is not canceled by some nodes having less neighbours.
The maximum boost due to uneven degree distribution is about 2.5% at $p=1$ for $k=50$.
The effect does not seem to depend on the $n$ network size.

This has implications for A/B testing: when we compute the lift in our simulations, we need to use this adjusted baseline.

Another way to get a sense for this is: assume half the nodes are have degree $k=50-d$, half have $k=50+d$ after randomization, so the mean is still $k=50$. We can take the original formula $ c = \frac{ c_{int} }{ 1 - c_{net} \times k } $ and make an improved version: $ c(d) = \frac{1}{2} ( \frac{ c_{int} }{ 1 - c_{net} \times (k+d) } + \frac{ c_{int} }{ 1 - c_{net} \times (k-d) } ) $. This is not the exact formula (the experimental result above is concave, and this is convex), but it’s good for intuition:

Post production as a function of degree distribution

Initially I suspected the effect is due to the graph diameter decreasing with increasing $p$, so I checked how the mean shortest path depends on $p$ and $n$:

def mp_shortest_paths(args):
    (n, k, p, num_samples) = args
    g = connected_watts_strogatz_graph(n, k, p)
    samples = []
    for _ in range(num_samples):
        st = sample(g.nodes, 2)
        l = nx.shortest_path_length(g, st[0], st[1])
        samples.append(l)
    mean_shortest_path = np.mean(samples)
    return (args, mean_shortest_path)

ns = [10*1000, 50*1000, 100*1000, 300*1000]
k=50
ps = range(0, 120, 20)
num_samples = 1000

results = []
for n in ns:
    print('Running simulations for n=%s...' % n)
    start_time = time.time()
    results.extend(mp_map(
        network_scaling_worker.mp_shortest_paths,
        [(n, k, p/100.0, num_samples) for p in ps],
    ))
    elapsed_time = time.time() - start_time
    print('Done! Elapsed %s' % time.strftime("%M:%S", time.gmtime(elapsed_time)))

Shows:

Mean shortest path as a function of Watts-Strogatz p

This does not appear to be the reason for the additional boost. The mean shortest path in the graph strongly depends on $n$, as expected, but the effect does not. Intuitively, since all nodes are equal for now (there is no A population producing more content), path length shouldn’t matter.

So, to get more accurate lift readings for our A/B tests, we need to first calculate the correct baseline mean posts for a Watts-Strogatz graph with those $(k, p)$ params (no strong $n$ dependence, experimentally, as we just saw):

n=100*1000
k=50
p=0.1
T=50

g = connected_watts_strogatz_graph(n=n, k=k, p=p).to_directed()
posts = prev_posts = None
for t in range(T):
    posts = step_posts(g, prev_posts)
    prev_posts = posts
baseline_avg_posts = np.mean(list(posts.values()))
print("Correct baseline for a (%s, %s, %s) Watts-Strogatz graph after %s steps = %.4f" % (n, k, p, T, baseline_avg_posts))

Prints:

Correct baseline for a (100000, 50, 0.1) Watts-Strogatz graph after 50 steps = 1.0044

A/B testing

Let’s re-run the A/B test experiment from the last post and calculate the lifts, but compared to the new, corrected baseline. Let’s have an experimental group of $N$ people, whose intrinsic post production $c_{int}$ is lifted by 5%, using both $N=100$ and $N=1,000$:

ns = [5*1000, 10*1000, 50*1000, 100*1000]
k=50
p=0.1
T=50
num_simulations = 50

params = chain(
    prepend(100,  ns),
    prepend(1000, ns),
)

def thread_count(n):
    if n <= 100*1000: return 24
    if n == 200*1000: return 12
    else:             return 8

stats_list = []
if __name__ ==  '__main__':
    for N, n in params:
        print('Running simulations for n=%s...' % n)
        start_time = time.time()
        results = mp_map(
            network_scaling_worker.mp_lifts,
            [(n, k, p, N, T) for _ in range(num_simulations)],
            thread_count(n),
        )
        avg_stats = [np.mean([x[1][i] for x in results]) for i in range(len(results[0][1]))]
        stats_list.append((N, n, avg_stats))
        elapsed_time = time.time() - start_time
        print('Done! Elapsed %s' % time.strftime("%M:%S", time.gmtime(elapsed_time)))

Experimental post production lift on a Watts-Strogatz network

What we expect to see:

as $N/n$ goes to 0, the rest and overall lifts tend to 0.
as $N/n$ goes to 1, we expted to see a bigger lift for population A (if $N=n$, we released to 100% and lifted everybody's post production)

Conclusion

A simple toy post production model on a Watts-Strogatz graph shows multiple interesting effects (also see previous post):

network effect: boosts post production by a significant factor
degree distribution effect: the network effect boost is a function of the graph's degree distribituion, which for a Watts-Strogatz graph is a function of $p$
dampening effect: we underestimate the true intrinsic lift of the A/B test, because A’s non-A friends don’t get the intrinsic post production boost, so As don’t get the boost “back” through these edges
spillover effect: we measure a lift due to the network effect for friends of A, and further down the network, depending on the distance from As
clustering effect: if the A group is more tightly clustered, we measured a higher lift
experiment size effect: as $N/n$ goes to 1, the effect size approaches the true effect size