Testing randomness extractors
Marton Trencseni - Sun 02 June 2024 • Tagged with randomness, extractors, biased, coin, fair
I apply NIST's suite of statistical tests to my randomness extractor implementations from the previous posts.
Marton Trencseni - Sun 26 May 2024 • Tagged with randomness, extractors, biased, coin, fair
I discuss various models of biased bit sequences, and how to extract uniform random (or close to it) output bit sequences from them, illustrated with Python code.
Marton Trencseni - Wed 01 May 2024 • Tagged with ab-testing, facebook, stratification, propensity
Why are Randomized Controlled Trials (RCTs, known as A/B testing in much of the industry) widely regarded as the gold standard of causal inference? What else can a Data Scientist do if A/B testing is not possible, and why are those alternatives inferior to A/B testing?
This paper shows, using 15 experiments (for ads on Facebook) where an RCT was conducted, that common observational methods (run on the Facebook data, ignoring the control group) severely mis-estimate the true treatment lift (as measured by the RCT), often by a factor of 3x or more. This is true even though Facebook has (i) very large sample sizes and (ii) very high-quality data (a per-user feature vector) about its users, which is used in the observational methods. This should be a major red flag for Data Scientists working on common marketing measurements (such as marketing campaigns) using observational methods.
Marton Trencseni - Sun 11 February 2024 • Tagged with outlook, 2024, datahub, capm
It is the beginning of the year, a good time to reflect on the previous year and make plans for the year ahead. I wrote this document for my team members in January 2024 to kick off the year. This is an abridged version with sensitive content removed.
Marton Trencseni - Sun 23 July 2023 • Tagged with mmm, marketing, mixed, model, lightweight_mmm, google, python
I describe the concept of Marketing Mix Modeling using Google's LightweightMMM library.
Marton Trencseni - Sun 18 June 2023 • Tagged with ab-testing
I discuss five lessons from large-scale experiments conducted by Google, Bing, Netflix and Alibaba: Kohavi's 1 out of 3 rule, Google's 41 shades of blue, Bing's unexpected big win, Alibaba's personalization experiment and Netflix's movie image personalization.
Marton Trencseni - Sun 11 June 2023 • Tagged with probability, statistics, simpsons, paradox
I give examples of "unintuitive" conditional probabilities and discuss Simpson's paradox.
Marton Trencseni - Sat 01 October 2022 • Tagged with interviewing
I will attempt to enumerate all the categories of questions commonly asked in technical interview loops, and my experience with them.
In my experience, many Data Scientists struggle to write SQL queries in interviews.
Marton Trencseni - Mon 18 October 2021 • Tagged with meta
A review of, and reflection on, the first 100 articles written on Bytepawn.
Marton Trencseni - Fri 23 July 2021 • Tagged with data, fallacies
What are the core skills a data scientist needs to sustainably achieve bottom-line impact, without blocking on external help from other roles?
Marton Trencseni - Sat 10 July 2021 • Tagged with yolo, yolov5, vision, object detection
I discuss the YOLO neural network architecture for object detection.
Marton Trencseni - Fri 02 July 2021 • Tagged with yolo, yolov5, vision, object detection
I run object detection experiments with pre-trained YOLOv5 models.
Marton Trencseni - Sun 20 June 2021 • Tagged with statistics, trump, politics, fasttext, twitter
I train a fasttext classifier on 1.2M data points to predict US politicians' party affiliations from their Twitter messages.
Marton Trencseni - Sat 29 May 2021 • Tagged with statistics
The post explores the distribution of digits of random and non-random numbers from receipts, verifying Benford's law of first digit distribution.
Marton Trencseni - Fri 14 May 2021 • Tagged with mlflow, tracking
What's the best way to iterate from 0 to 1 in steps of 0.1 in Python, and what are the potential pitfalls?
Marton Trencseni - Sun 25 April 2021 • Tagged with ab-testing
This is the transcript of a talk I did on experimentation and A/B testing to give the audience an intuitive understanding of p-values and statistical significance.
Marton Trencseni - Sat 17 April 2021 • Tagged with bayesian, ab-test
The base $e$ of the natural logarithm shows up in an unexpected place. Let's derive why!
Marton Trencseni - Thu 18 March 2021 • Tagged with pytorch, autoencoder, mnist
I build an Autoencoder network to categorize MNIST digits in PyTorch.
Marton Trencseni - Wed 03 March 2021 • Tagged with python, pytorch, torchvision, mnist, gan
I train a PyTorch Wasserstein MNIST GAN on Google Colab to generate beautiful MNIST digits.