Understanding Facebook’s PlanOut A/B testing framework

Marton Trencseni - Fri 22 May 2020 • Tagged with ab-testing

PlanOut is a framework for online field experiments. It was created by Facebook in 2014 to make it easy to run and iterate on sophisticated experiments in a statistically sound manner.


Continue reading

Validation checks for A/B tests

Marton Trencseni - Thu 16 April 2020 • Tagged with ab-testing

A/B tests go wrong all the time, even in sophisticated product teams. As this article shows, for a range of problems we can run automated validation checks to catch problems early, before they have too bad of an effect on customers or the business. These validation checks compare various statistical properties of the funnels A and B to catch likely problems. Large technology companies are running such validation checks automatically and continuously for their online experiments.

Kolmogorov-Smirnov test

Continue reading

Running multiple A/B tests in parallel

Marton Trencseni - Mon 06 April 2020 • Tagged with ab-testing

I show using Monte Carlo simulations that randomizing user assignments into A/B test experiments makes it possible to run multiple A/B tests at once and measure accurate lifts on the same metric, assuming the experiments are independent.


Continue reading

Bayesian A/B conversion tests

Marton Trencseni - Tue 31 March 2020 • Tagged with bayesian, ab-test

I compare probabilities from Bayesian A/B testing with Beta distributions to frequentist A/B tests using Monte Carlo simulations. In many circumstances, the Bayesian probability of the action hypothesis being true and the frequentist p value are complementary.

Bayes vs z-test

Continue reading

A/B testing and the G-test

Marton Trencseni - Mon 23 March 2020 • Tagged with ab-testing

The G-test for conversion A/B tests is similar to the Chi-squared test. Monte Carlo simulations show that the two are indistinguishable in practice.

G-test vs Chi-squared p differences

Continue reading

A/B testing and network effects

Marton Trencseni - Sat 21 March 2020 • Tagged with ab-testing

I use Monte Carlo simulations to explore how A/B testing on Watts–Strogatz random graphs depends on the degree distribution of the social network.

Watts-Strogatz degree distribution

Continue reading

A/B testing on social networks

Marton Trencseni - Mon 09 March 2020 • Tagged with ab-testing

I use Monte Carlo simulations to show that experimentation on social networks is a beautiful statistical problem with unexpected nuances due to network effects.


Continue reading

Early stopping in A/B testing

Marton Trencseni - Thu 05 March 2020 • Tagged with ab-testing

The increased false positive rate due to early stopping is a beautiful nuance of statistical testing. It is equivalent to running at an overall higher alpha. Data scientists need to be aware of this phenomenon so they can control it and keep their organizations honest about their experimental results.
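
A minimal sketch of this effect, not the code from the post: it simulates A/A tests (no true difference) and compares the false positive rate when checking significance only at the end against "peeking" at regular intervals and counting any significant peek. The pooled two-proportion z-test, the conversion rate, the sample sizes and the number of peeks are all assumptions chosen for illustration.

```python
import math
import random


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p value of a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to detect
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(num_experiments=500, n_per_arm=1000, peeks=10,
             p=0.1, alpha=0.05, seed=0):
    """Return (peeking false positive rate, end-only false positive rate).

    Both arms convert at the same rate p, so every significant
    result is a false positive.
    """
    rng = random.Random(seed)
    step = max(1, n_per_arm // peeks)
    flagged_any_peek = 0
    flagged_at_end = 0
    for _ in range(num_experiments):
        conv_a = conv_b = 0
        flagged = False
        for i in range(1, n_per_arm + 1):
            conv_a += rng.random() < p
            conv_b += rng.random() < p
            if i % step == 0 or i == n_per_arm:
                significant = z_test_p_value(conv_a, i, conv_b, i) < alpha
                flagged = flagged or significant
                if i == n_per_arm and significant:
                    flagged_at_end += 1
        flagged_any_peek += flagged
    return flagged_any_peek / num_experiments, flagged_at_end / num_experiments
```

Calling `simulate()` returns the two rates; because the final check is also one of the peeks, the peeking rate can never be lower than the end-only rate.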

Early stopping

Continue reading

A/B testing and Fisher's exact test

Marton Trencseni - Tue 03 March 2020 • Tagged with ab-testing

Fisher’s exact test computes the p value directly, without the approximation that the Chi-squared test uses, so it does not rely on the Central Limit Theorem to hold.

Fisher's test, Fisher Monte Carlo and Chi-squared test p values

Continue reading

A/B testing and the Chi-squared test

Marton Trencseni - Fri 28 February 2020 • Tagged with ab-testing

In an earlier post, I wrote about A/B testing conversion data with the Z-test. The Chi-squared test is a more general test for conversion data, because it can work with multiple conversion events and multiple funnels being tested (A/B/C/D/..).

Chi-squared distribution

Continue reading

A/B testing and the t-test

Marton Trencseni - Sun 23 February 2020 • Tagged with ab-testing

The t-test is better than the z-test for time-spent A/B tests, because it explicitly models the uncertainty of the variance due to sampling. Using Monte Carlo simulations I show that around N=100, the t-test becomes the z-test.

Normal distribution vs t-distribution

Continue reading

A/B testing and the Z-test

Marton Trencseni - Sat 15 February 2020 • Tagged with ab-testing

I discuss the Z-test for A/B testing and show how to compute parameters such as sample size from first principles. I use Monte Carlo simulations to validate significance level and statistical power, and visualize parameter scaling behaviour.

Conversion difference vs N

Continue reading

Beyond the Central Limit Theorem

Marton Trencseni - Thu 06 February 2020 • Tagged with data, ab testing, statistics

In the previous post, I talked about the importance of the Central Limit Theorem (CLT) to A/B testing. Here we will explore cases when we cannot rely on the CLT to hold.

Running mean for Cauchy distribution

Continue reading

A/B testing and the Central Limit Theorem

Marton Trencseni - Wed 05 February 2020 • Tagged with data, ab testing, statistics

When working with hypothesis testing, the descriptions of statistical methods often include normality assumptions. For example, the Wikipedia page for the z-test starts like this: "A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution". What does this mean? How do I know it’s a valid assumption for my data?

Normal distribution from uniform

Continue reading

Optimizing waits in Airflow

Marton Trencseni - Sat 01 February 2020 • Tagged with data, airflow, python

Sometimes I get to put on my Data Engineering hat for a few days. I enjoy this because I like to move up and down the Data Science stack and I try to keep myself sharp technically. Recently I was able to spend a few days optimizing our Airflow ETL for speed.

Airflow DAG

Continue reading

SQL best practices for Data Scientists and Analysts

Marton Trencseni - Sun 26 January 2020 • Tagged with data, programming, sql

My list of SQL best practices for Data Scientists and Analysts, or, how I personally write SQL code. I picked this up at Facebook, and later improved it at Fetchr.

SQL code

Continue reading

How I write SQL code

Marton Trencseni - Fri 24 January 2020 • Tagged with data, programming, sql

This is a simple post about SQL code formatting. Most of this comes from my time as a Data Engineer at Facebook.

SQL code

Continue reading

Small team planning

Marton Trencseni - Fri 10 January 2020 • Tagged with planning, teams, goaling

I’ve worked at 5-10 different organizations, most of them startups or startuppy companies. I’ve done a lot of planning in small teams, and also taken part in company-wide leadership planning. Here I will describe what has worked well for me in small team settings, focusing on time estimation.

Reaching the peak

Continue reading

Personal goaling

Marton Trencseni - Sun 22 December 2019 • Tagged with self help, goaling

The meta-goal of goaling is to stretch yourself to achieve more, and to feel good about what you’ve achieved. Whatever happened this year, it’s always possible to achieve a lot more and feel better about yourself next year. To hijack a Feynman quote, there is plenty of room at the top.

2019 running

Continue reading

Pytorch in 2019

Marton Trencseni - Thu 12 December 2019 • Tagged with pytorch

2019 was another big year for Pytorch, one of the most popular Deep Learning libraries out there. Pytorch has become the de facto deep learning library used for research thanks to its dynamic graph model which allows fast model experimentation. It’s also become production ready, with support for mobile and infrastructure tooling such as Tensorboard.

Pytorch Google Trends 2019

Continue reading

Warren Buffett style fundamental metrics of long-term company performance

Marton Trencseni - Mon 02 December 2019 • Tagged with investing, stocks, warren buffett

I look at some fundamental charts of Apple, Activision Blizzard and Intel.

AAPL shareholder wealth curve

Continue reading

Calibration curves for delivery prediction with Scikit-Learn

Marton Trencseni - Thu 21 November 2019 • Tagged with machine, learning, fetchr, skl, calibration

I show calibration curves for four different binary classification Scikit-Learn models we built for delivery prediction at Fetchr, trained using real-world data: LogisticRegression, DecisionTree, RandomForest and GradientBoosting.

Logistic regression calibration curve

Continue reading

Using simulated self-play to solve all OpenAI Gym classic control problems with Pytorch

Marton Trencseni - Thu 14 November 2019 • Tagged with python, pytorch, reinforcement, learning, openai, gym

I use simulated self-play by ranking episodes by summed reward. Game outcomes are divided in two by cutting at the median, winners are assigned +1 rewards, losers are assigned -1 rewards, like in games like Go and Chess. Unlike naive policy gradient descent used in previous posts, this version solves all OpenAI classic control problems, albeit slowly.
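
The labeling step described above can be sketched as follows. This is an illustration rather than the post's implementation: the function name `label_episodes` is hypothetical, and treating episodes exactly at the median as losers is an assumption.

```python
from statistics import median


def label_episodes(episode_returns):
    """Turn summed episode rewards into +1 / -1 outcomes by cutting at the median.

    Episodes strictly above the median count as winners (+1); the rest,
    including ties at the median, count as losers (-1). The tie rule is
    an assumption made for this sketch.
    """
    cut = median(episode_returns)
    return [1 if total > cut else -1 for total in episode_returns]


# Four episodes with summed rewards 10, 50, 20 and 40: the median is 30.
labels = label_episodes([10, 50, 20, 40])  # [-1, 1, -1, 1]
```

The resulting labels would then stand in for the raw environment rewards when computing the policy gradient update.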

OpenAI mountaincar

Continue reading

Applying policy gradient to OpenAI Gym classic control problems with Pytorch

Marton Trencseni - Tue 12 November 2019 • Tagged with python, pytorch, reinforcement, learning, openai, gym

I try to generalize the policy gradient algorithm as introduced earlier to solve all the OpenAI classic control problems. It works for CartPole and Acrobot, but not for Pendulum and MountainCar environments.

OpenAI classic control environments

Continue reading

Machine Learning at Fetchr

Marton Trencseni - Tue 29 October 2019 • Tagged with machine, learning, fetchr, skl

Opportunities for automating, optimizing and enabling processes with ML at a delivery company such as Fetchr are plentiful. We put three families of ML models into production. These 3 areas are: Scheduling, Notifications and Operational choice.

Operational choice

Continue reading

Solving the CartPole Reinforcement Learning problem with Pytorch

Marton Trencseni - Tue 22 October 2019 • Tagged with python, pytorch, reinforcement, learning, openai, gym, cartpole

The CartPole problem is the Hello World of Reinforcement Learning, originally described in 1985 by Sutton et al. The environment is a pole balanced on a cart. CartPole is one of the environments in OpenAI Gym, so we don't have to code up the physics. Here I walk through a simple solution using Pytorch.

Cartpole animation

Continue reading

Metrics Atlas

Marton Trencseni - Thu 29 August 2019 • Tagged with data, fetchr

The idea is simple: write a document which helps new and existing people—both managers and individual contributors—get an objective, metrics-based picture of the business. This is helpful when new people join, when people start working in new segments of the business, and to understand other parts of the company.

Metrics atlas

Continue reading

Playing Go with supervised learning in Pytorch

Marton Trencseni - Sun 25 August 2019 • Tagged with python, pytorch, cnn, go

Using historic gameplay between strong Go players as training data, a CNN model is built to predict good Go moves on a standard 19x19 Go board.

Go prediction sample

Continue reading

Arabic name classification with Scikit-Learn and Pytorch

Marton Trencseni - Fri 02 August 2019 • Tagged with pytorch, skl, arabic, fetchr

While working on Arabic-vs-rest classification, I was curious how well out-of-the-box models perform with publicly available data, and how that compares with what we can achieve with internal data / features derived from millions of deliveries. We train Scikit-learn and Pytorch models for this classification task and achieve 90% prediction accuracy on publicly available data and out-of-the-box models, while internally 99% is achievable.

ROC curve

Continue reading

Exploring prior beliefs with MCMC

Marton Trencseni - Sat 06 July 2019 • Tagged with python, math, pymc3

I use PyMC3 to solve the food delivery toy problem and explore some alternative priors.

PyMC3 traceplot()

Continue reading

A/B tests: Moving Fast vs Being Sure

Marton Trencseni - Mon 01 July 2019 • Tagged with ab-testing, fetchr

Most A/B testing tools default to α=0.05, meaning the expected false positive rate is 5%. In this post I explore the trade-offs between moving fast, i.e. using higher α, versus being sure, i.e. using lower α.

14. slide

Continue reading

Food deliveries, Bayes and Computational Statistics

Marton Trencseni - Sat 22 June 2019 • Tagged with python, math, fetchr

I was grabbing a burger at Shake Shack, Mall of the Emirates in Dubai, when I noticed this notebook on the counter. The staff is using it to track food deliveries and each service (Carriage, Talabat, UberEats, Deliveroo) has its own column with the order numbers. Let's assume this is the only page for the day, and ask ourselves: given this data, what is the probability that UberEats is the most popular food delivery service?

Shake shack food deliveries

Continue reading

The Collatz conjecture

Marton Trencseni - Sun 02 June 2019 • Tagged with python, math

The Collatz conjecture is a conjecture in mathematics that concerns a sequence defined as follows: start with any positive integer n. Then each term is obtained from the previous term as follows: if the previous term is even, the next term is one half the previous term. If the previous term is odd, the next term is 3 times the previous term plus 1. The conjecture is that no matter what value of n, the sequence will always reach 1.
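
The rule above translates directly into a few lines of Python. This is a minimal sketch for illustration; the function name `collatz_sequence` is hypothetical, and the loop terminates only for starting values whose sequence reaches 1, which is exactly what the conjecture claims for all of them.

```python
def collatz_sequence(n):
    """Return the Collatz sequence starting at n and ending at the first 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    sequence = [n]
    while n != 1:
        # Even terms are halved; odd terms are tripled and incremented.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        sequence.append(n)
    return sequence


print(collatz_sequence(6))  # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```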


Continue reading

MNIST pixel attacks with Pytorch

Marton Trencseni - Sat 01 June 2019 • Tagged with python, pytorch, cnn, torchvision, mnist, skl

It’s easy to build a CNN that does well on MNIST digit classification. How easy is it to break it, to distort the images and cause the model to misclassify?

MNIST attack accuracy

Continue reading

Solving CIFAR-10 with Pytorch and SKL

Marton Trencseni - Tue 14 May 2019 • Tagged with python, pytorch, cnn, torchvision, cifar, skl

CIFAR-10 is a classic image recognition problem, consisting of 60,000 32x32 pixel RGB images (50,000 for training and 10,000 for testing) in 10 categories: plane, car, bird, cat, deer, dog, frog, horse, ship, truck. Convolutional Neural Networks (CNN) do really well on CIFAR-10, achieving 99%+ accuracy. The Pytorch distribution includes an example CNN for solving CIFAR-10, at 45% accuracy. I will use that and merge it with a Tensorflow example implementation to achieve 75%. We use torchvision to avoid downloading and data wrangling the datasets. Like in the MNIST example, I use Scikit-Learn to calculate goodness metrics and plots.

CIFAR examples

Continue reading

Solving MNIST with Pytorch and SKL

Marton Trencseni - Thu 02 May 2019 • Tagged with python, pytorch, cnn, torchvision, mnist, skl

MNIST is a classic image recognition problem, specifically digit recognition. It contains 70,000 28x28 pixel grayscale images of hand-written, labeled images, 60,000 for training and 10,000 for testing. Convolutional Neural Networks (CNN) do really well on MNIST, achieving 99%+ accuracy. The Pytorch distribution includes a 4-layer CNN for solving MNIST. Here I will unpack and go through this example. We use torchvision to avoid downloading and data wrangling the datasets. Finally, instead of calculating performance metrics of the model by hand, I will extract results in a format so we can use SciKit-Learn's rich library of metrics.

MNIST example digits

Continue reading

SVM with Pytorch

Marton Trencseni - Tue 16 April 2019 • Tagged with pytorch, svm, iris

I use the standard Iris dataset for supervised learning with a Support Vector Machine model using Pytorch's autograd.


Continue reading

Hacker News Embeddings with PyTorch

Marton Trencseni - Tue 12 March 2019 • Tagged with pytorch, embedding

A PyTorch model is trained on public Hacker News data, embedding posts and comments into a high-dimensional vector space, using the mean squared error (MSE) of dot products as the loss function. The resulting model is reasonably good at finding similar posts and recommending posts for users.

Vector space

Continue reading

rxe: literate and composable regular expressions

Marton Trencseni - Sat 02 March 2019 • Tagged with python

rxe is a thin wrapper around Python's re module. The various rxe functions are wrappers around corresponding re patterns. For example, rxe.digit().one_or_more('a').whitespace() corresponds to \da+\s. Because rxe uses parentheses but wants to avoid unnamed groups, the internal (equivalent) representation is actually \d(?:a)+\s. This pattern can always be retrieved with get_pattern().
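
To see that the two patterns mentioned above are interchangeable, the following sketch uses only Python's standard re module (it does not call rxe itself). The sample strings are made up for illustration.

```python
import re

plain = re.compile(r"\da+\s")        # the hand-written pattern
grouped = re.compile(r"\d(?:a)+\s")  # the non-capturing form described above

# Hypothetical sample inputs: the first two should match, the rest should not.
samples = ["1a ", "7aaa\t", "a1 ", "1 ", "9ab "]

results = [
    (s, bool(plain.fullmatch(s)), bool(grouped.fullmatch(s)))
    for s in samples
]

for text, plain_ok, grouped_ok in results:
    print(repr(text), plain_ok, grouped_ok)

# (?:...) groups without capturing, so no unnamed groups are created.
print(grouped.groups)  # 0
```

Both patterns agree on every sample, and the non-capturing group leaves the compiled pattern with zero capture groups.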

rxe example code

Continue reading

PyTorch Basics: Solving the Ax=b matrix equation with gradient descent

Marton Trencseni - Fri 08 February 2019 • Tagged with pytorch

I will show how to solve the standard A x = b matrix equation with PyTorch. This is a good toy problem to show some guts of the framework without involving neural networks.

PyTorch computational graph

Continue reading

Automating a Call Center with Machine Learning

Marton Trencseni - Sun 27 January 2019 • Tagged with fetchr, machine-learning, call-center

Over a period of 6 months, we rolled out a Machine Learning model to predict a customer’s delivery (latitude, longitude). During the recent holiday peak, this ML model handled most of Fetchr’s order scheduling.

Share of ML scheduled versus Call center scheduled deliveries

Continue reading

5 things that happened in Data Science in 2018

Marton Trencseni - Wed 09 January 2019 • Tagged with data, openai, waymo, deepmind, tesla, reinforce

2018 was a hot year for Data Science and AI. Here we picked out 5 highlights, which in our opinion shaped the field in the past year.

Deepmind playing CTF

Continue reading

Warehouse locations with k-means

Marton Trencseni - Wed 26 September 2018 • Tagged with data, data-science, metrics, fetchr

Sometimes, the seven gods of data science, Pascal, Gauss, Bayes, Poisson, Markov, Shannon and Fisher, all wake up in a good mood, and things just work out. Recently we had such an occurrence at Fetchr, when the Operational Excellence team posed the following question: if we could pick our Saudi warehouse locations, where would we put them? What is the ideal number of warehouses, and what does ideal even mean? Also, what should our “delivery radius” be?

Continue reading

Growth Accounting and Backtraced Growth Accounting

Marton Trencseni - Sun 16 September 2018 • Tagged with data, data-science, metrics, growth-accounting, fetchr

Previously I wrote two articles about data infra and data engineering at Fetchr. This time I want to move up the stack and talk about a simple piece of metrics engineering that proved to be very impactful: Growth Accounting and Backtraced Growth Accounting.

Backtraced Growth Accounting

Continue reading

Fetchr Data Science Infra at 1 year

Marton Trencseni - Tue 14 August 2018 • Tagged with data, etl, workflow, airflow, fetchr, model, ml

A description of our Analytics+ML cluster running on AWS, using Presto, Airflow and Superset.

Fetchr Data Science Infra

Continue reading

What not to spend time on

Marton Trencseni - Mon 23 July 2018 • Tagged with warren, buffett, self, help, physics, haskell

Warren Buffett says deciding what not to spend time on is just as important as deciding what to spend time on.

Warren Buffett

Continue reading

Beat the averages

Marton Trencseni - Sat 07 July 2018 • Tagged with statistics, data

When working with averages, we have to be careful. There are pitfalls lurking to pollute our statistics and results reported.

Probability distribution

Continue reading

Building the Fetchr Data Science Infra on AWS with Presto and Airflow

Marton Trencseni - Wed 14 March 2018 • Tagged with data, etl, workflow, airflow, fetchr

We used Hive/Presto on AWS together with Airflow to rapidly build out the Data Science Infrastructure at Fetchr in less than 6 months.

Warehouse DAG

Continue reading

Don’t build cockpits, become a coach

Marton Trencseni - Wed 09 November 2016 • Tagged with data, science, product, analytics

I used to think that a good analogy for using data is the instrumentation of a cockpit in an airliner. Lots of instruments, and if they fail, the pilot can’t fly the plane and bad things happen. There’s no autopilot for companies. The problem with this analogy is that planes aren’t built in mid-air. Product teams and companies constantly need to build and ship new products.

A big complicated cockpit

Continue reading

Beautiful A/B testing

Marton Trencseni - Sun 05 June 2016 • Tagged with ab-testing, strata, statistics, data

I gave this talk at the O’Reilly Strata Conference London in 2016 June, mostly based on what I learned at Prezi from 2012-2016.

14. slide

Continue reading

Hack, HHVM and avoiding the Second-system effect

Marton Trencseni - Sat 14 May 2016 • Tagged with books, programming, hhvm, brooks

I read this book on my first vacation after I started working at Facebook and thus became a semi-regular Hack/HHVM user. I highly recommend reading (parts of) it. But not to learn Hack/PHP, which is irrelevant to most people. Instead, it’s to learn about how Facebook improved its www codebase and performance without rewriting the old PHP code in one big effort, and thus avoided the famous Second-system effect.

Hack book

Continue reading

Einstein's amazing theory

Marton Trencseni - Tue 16 February 2016 • Tagged with physics, einstein, relativity

This post is about the amazing success of Einstein's general theory of relativity. The theory predicts, among other things, the accelerating Universe, black holes, gravitational lensing and gravitational waves. The real shocker is to remember that Einstein didn't invent general relativity to explain these. He didn’t know about these, they didn't exist at that time!

Continue reading

Heisengames and the importance of patience in business

Marton Trencseni - Mon 08 February 2016 • Tagged with heisengames, business

Most bets businesses take, be it hiring, features, products or strategy, don't work out. Still, many businesses are successful despite setbacks. A negative attitude---even when the analysis of the situation is in fact correct---may be missing the bigger picture.

Continue reading

Cloud9: Cloud coding that actually works

Marton Trencseni - Sun 07 February 2016 • Tagged with coding, ide, c9

For the past 2 months I've been using Cloud9 for writing code in the cloud, and I can wholeheartedly recommend it: it just works for me. It's basically Docker plus an IDE: you get a Docker container running Ubuntu that you can access over a web IDE.

Continue reading

Luigi vs Airflow vs Pinball

Marton Trencseni - Sat 06 February 2016 • Tagged with data, etl, workflow, luigi, airflow, pinball

A spreadsheet comparing the three opensource workflow tools for ETL.


Continue reading

Pinball review

Marton Trencseni - Sat 06 February 2016 • Tagged with data, etl, workflow, pinball

Pinball is an ETL tool written by Pinterest. Like Airflow, it supports defining tasks and dependencies as Python code, executing and scheduling them, and distributing tasks across worker nodes. It supports calendar scheduling (hourly/daily jobs, also visualized on the web dashboard). Unfortunately, I found Pinball has very little documentation, very few recent commits in the Github repo and few meaningful answers to Github issues by maintainers, while its architecture is complicated and undocumented.

Continue reading

How to make a blog like this

Marton Trencseni - Thu 07 January 2016 • Tagged with blog, pelican

Make a simple blog with Github Pages and Pelican.

Continue reading

Airflow review

Marton Trencseni - Wed 06 January 2016 • Tagged with data, etl, workflow, airflow

Airflow is a workflow scheduler written by Airbnb. It supports defining tasks and dependencies as Python code, executing and scheduling them, and distributing tasks across worker nodes. It supports calendar scheduling (hourly/daily jobs, also visualized on the web dashboard), so it can be used as a starting point for traditional ETL. It has a nice web dashboard for seeing current and past task state, querying the history and making changes to metadata such as connection strings.


Continue reading

Systems thinking and system traps

Marton Trencseni - Wed 06 January 2016 • Tagged with systems, books

Thinking in Systems, written by the late Donella Meadows, is a book about how to think about systems, how to control systems and how systems change and control themselves. A system can be anything from a heating furnace to a social system. The gem of the book is the part about system traps. System traps are ways a system can go wrong; examples are drift to low performance, seeking the wrong goals, shifting the burden, etc.

Thinking in systems

Continue reading

Luigi review

Marton Trencseni - Sun 20 December 2015 • Tagged with data, etl, workflow, luigi

I review Luigi, an execution framework for writing data pipes in Python code. It supports task-task dependencies, it has a simple central scheduler with an HTTP API and an extensive library of helpers for building data pipes for Hadoop, AWS, Mysql etc. It was written by Spotify for internal use and open sourced in 2012. A number of companies use it, such as Foursquare, Stripe, Asana.

Continue reading

Cargo Cult Data

Marton Trencseni - Mon 26 January 2015 • Tagged with data

Cargo cult data is when you're collecting and looking at data when making decisions, but you're only following the forms and outside appearances of scientific investigation and missing the essentials, so it doesn't work.

Continue reading