The Full Stack Data Scientist

Marton Trencseni - Fri 23 July 2021 • Tagged with data, fallacies

What are the core skills a data scientist needs to sustainably achieve bottom-line impact, without blocking on external help from other roles?

Data Scientist

Continue reading

YOLO object detection architecture

Marton Trencseni - Sat 10 July 2021 • Tagged with yolo, yolov5, vision, object detection

I discuss the YOLO neural network architecture for object detection.

YOLO architecture

Continue reading

YOLOv5 object detection experiments

Marton Trencseni - Fri 02 July 2021 • Tagged with yolo, yolov5, vision, object detection

I run object detection experiments with pre-trained YOLOv5 models.

YOLO object detection example

Continue reading

Predicting party affiliation of US politicians using fasttext

Marton Trencseni - Sun 20 June 2021 • Tagged with statistics, trump, politics, fasttext, twitter

I train a fasttext classifier on 1.2M data points to predict US politicians' party affiliations from their Twitter messages.

Trump Schiff

Continue reading

Random digits and Benford's law

Marton Trencseni - Sat 29 May 2021 • Tagged with statistics

The post explores the distribution of digits of random and non-random numbers from receipts, verifying Benford's law of first digit distribution.
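As a rough illustration of the first-digit law mentioned above: Benford's law says a leading digit d occurs with probability log10(1 + 1/d). The sketch below compares that prediction with observed first-digit frequencies. The powers of two are a stand-in dataset chosen only because they are easy to generate; they are not the receipt data from the post, and the function names are made up for this example.

```python
import math
from collections import Counter


def benford_probability(digit: int) -> float:
    """Benford's law: P(d) = log10(1 + 1/d) for a leading digit d in 1..9."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


def first_digit(n: int) -> int:
    """Leading decimal digit of a positive integer."""
    if n <= 0:
        raise ValueError("expected a positive integer")
    return int(str(n)[0])


def first_digit_frequencies(values):
    """Relative frequency of each leading digit 1..9 in `values`."""
    counts = Counter(first_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Stand-in data (an assumption for this sketch, not the post's receipts).
powers_of_two = [2 ** k for k in range(1000)]
observed = first_digit_frequencies(powers_of_two)

for d in range(1, 10):
    print(d, round(observed[d], 3), round(benford_probability(d), 3))
```

Swapping `powers_of_two` for any other list of positive integers gives the same side-by-side comparison.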

Early stopping

Continue reading

10 ways to iterate from 0 to 1 with deciles

Marton Trencseni - Fri 14 May 2021 • Tagged with mlflow, tracking

What's the best way to iterate from 0 to 1 in steps of 0.1 in Python, and what are the potential pitfalls?
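A minimal sketch of one such pitfall and one way around it, not necessarily among the approaches the post itself covers: repeatedly adding 0.1 accumulates floating point error, while dividing an integer counter by 10 rounds each value independently.

```python
# Pitfall: adding 0.1 repeatedly accumulates rounding error,
# because 0.1 has no exact binary floating point representation.
accumulated = []
x = 0.0
for _ in range(11):
    accumulated.append(x)
    x += 0.1

# Safer: iterate over integers and divide, so each value is rounded only once.
by_division = [i / 10 for i in range(11)]

print(accumulated[-1])   # 0.9999999999999999
print(by_division[-1])   # 1.0
```

A loop condition such as `x <= 1.0` inherits the same accumulated drift, which is why iterating over an integer counter is usually the more predictable choice.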

Iterating from 0 to 1 in steps of 0.1

Continue reading

Building intuition for p-values and statistical significance

Marton Trencseni - Sun 25 April 2021 • Tagged with ab-testing

This is the transcript of a talk I did on experimentation and A/B testing, to give the audience an intuitive understanding of p-values and statistical significance.
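To make the idea of a p-value concrete, here is a small sketch using a coin flip. The numbers (8 heads in 10 flips) and the function name are assumptions chosen for illustration, not taken from the talk. The one-sided p-value is the probability that a fair coin produces a result at least as extreme as the one observed.

```python
from math import comb


def one_sided_p_value(heads: int, flips: int) -> float:
    """P(at least `heads` heads in `flips` tosses of a fair coin)."""
    if not 0 <= heads <= flips:
        raise ValueError("heads must be between 0 and flips")
    # Count the outcomes with `heads` or more heads, out of 2**flips equally likely ones.
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips


# Hypothetical observation: 8 heads out of 10 flips.
p = one_sided_p_value(8, 10)
print(p)  # 0.0546875, i.e. 56 / 1024
```

Under the common 0.05 threshold this result would narrowly fail to count as statistically significant, even though 8 heads out of 10 may feel like strong evidence of a biased coin.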

Coin flip

Continue reading

Random numbers, the natural logarithm and higher dimensional simplexes

Marton Trencseni - Sat 17 April 2021 • Tagged with bayesian, ab-test

The base $e$ of the natural logarithm shows up in an unexpected place. Let's derive why!

Simplex

Continue reading

Building a Pytorch Autoencoder for MNIST digits

Marton Trencseni - Thu 18 March 2021 • Tagged with pytorch, autoencoder, mnist

I build an Autoencoder network to categorize MNIST digits in Pytorch.

Conversion difference vs N

Continue reading

Training a Pytorch Wasserstein MNIST GAN on Google Colab

Marton Trencseni - Wed 03 March 2021 • Tagged with python, pytorch, torchvision, mnist, gan

I train a Pytorch Wasserstein MNIST GAN on Google Colab to generate beautiful MNIST digits.

Wasserstein GAN Generated MNIST digits

Continue reading

Training a Pytorch Lightning MNIST GAN on Google Colab

Marton Trencseni - Sat 20 February 2021 • Tagged with python, pytorch, gan, mnist, google-colab

I explore MNIST digits generated by a Generative Adversarial Network trained on Google Colab using Pytorch Lightning.

Softmax GAN after 5 epochs, 100 samples.

Continue reading

Automatic MLFlow logging for Pytorch

Marton Trencseni - Sun 24 January 2021 • Tagged with mlflow, tracking

I explore the automatic logging capabilities of MLFlow for Pytorch.

MLFlow Pytorch loss example.

Continue reading

Automatic MLFlow logging for Scikit Learn

Marton Trencseni - Fri 15 January 2021 • Tagged with mlflow, tracking

I explore the automatic logging capabilities of MLFlow for Scikit Learn. In the process I found a bug in MLFlow, reported it and wrote a pull request to fix it.

MLFlow scatter plot.

Continue reading

Getting Started with MLFlow

Marton Trencseni - Sun 10 January 2021 • Tagged with mlflow, tracking

For the last few months I’ve been using MLFlow in production, specifically its Tracking component. MLFlow is an open source project for lifecycle tracking and serving of ML models, coming out of Databricks. MLFlow is model agnostic, so you can use it with SKLearn, XGBoost, Pytorch, Tensorflow, FBProphet, anything.

MLFlow overview

Continue reading

Making statistics lie for the 2020 Presidential election

Marton Trencseni - Thu 17 December 2020 • Tagged with ab-testing, trump, politics

After the 2020 US presidential election, the Trump campaign filed over 50 lawsuits and attacked the integrity of the elections by claiming there was voter fraud. One of the last lawsuits was filed in the Supreme Court of the United States by the state of Texas. Here I look at the statistical claims made in this lawsuit that were supposed to show irregularities in the Georgia vote.

Trump vs Biden

Continue reading

Comparing conversion at control and treatment sites

Marton Trencseni - Thu 03 December 2020 • Tagged with ab-testing

In real-life, non-digital situations, it's often not feasible to run true A/B tests. In such cases, we can compare before and after rollout conversions at a treatment site, while using a similar control site to measure and correct for seasonality. The post discusses how to compute increasingly correct p-values and Bayesian probabilities in such scenarios.

Monte Carlo simulated control lifts

Continue reading

Unevenness at the edges

Marton Trencseni - Fri 30 October 2020 • Tagged with stats, data

Sometimes we look at the top performers in a field and see obviously uneven representations of groups (gender, ethnicity, etc). There are a multitude of factors that can lead to this, such as unfair bias in access to opportunities. Here I will show one unintuitive mathematical effect that can contribute to such unevenness in the case of normal distributions.

Continue reading

Effective Data Visualization Part 3: Line charts and stacked area charts

Marton Trencseni - Tue 01 September 2020 • Tagged with charts, dashboards, data, visualization

Most charts should be line charts or stacked area charts, because they communicate valuable trend information and are easy for the human eye and brain to parse.

Continue reading

Effective Data Visualization Part 2: Formatting numbers

Marton Trencseni - Sun 23 August 2020 • Tagged with charts, dashboards, data, visualization

Format numbers for human consumption. What is more readable, 1.539e+5 or 153,859? Showing numbers effectively on spreadsheets, charts, dashboards, reports is a basic ingredient for readability, like formatting code in programming.
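As a small sketch of the difference in Python (the variable names are illustrative), the same value can be rendered in scientific notation or with thousands separators using standard format specifiers:

```python
value = 153859

# Scientific notation with 3 decimals: compact, but hard to read at a glance.
scientific = f"{value:.3e}"

# Thousands separators: the form most readers parse immediately.
grouped = f"{value:,}"

print(scientific)  # 1.539e+05
print(grouped)     # 153,859
```

The `,` format specifier also combines with a precision for floats, for example `f"{1234567.891:,.2f}"` produces `1,234,567.89`.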

Continue reading

Effective Data Visualization Part 1: Categorical data

Marton Trencseni - Sat 22 August 2020 • Tagged with charts, dashboards, data, visualization

Making clear, readable charts is part of the craftsmanship minimum for any data-related role. In part one, I look at how to present categorical data.

A pie chart

Continue reading