2024 Data Outlook

Marton Trencseni - Sun 11 February 2024 • Tagged with outlook, 2024, datahub, capm

It is the beginning of the year, a good time to reflect on the previous year and make plans for the year ahead. I wrote this document for my team members in January 2024 to kick off the year. This is an abridged version with sensitive content removed.

Introduction to Marketing Mix Modeling

Marton Trencseni - Sun 23 July 2023 • Tagged with mmm, marketing, mixed, model, lightweight_mmm, google, python

I describe the concept of Marketing Mix Modeling using Google's LightweightMMM library.

Real-world experiments I: 5 Lessons from Google, Bing, Netflix and Alibaba

Marton Trencseni - Sun 18 June 2023 • Tagged with ab-testing

I discuss five lessons from large-scale experiments conducted by Google, Bing, Netflix and Alibaba: Kohavi's 1 out of 3 rule, Google's 41 shades of blue, Bing's unexpected big win, Alibaba's personalization experiment and Netflix's movie image personalization.

Conditional Probabilities and Simpson's Paradox

Marton Trencseni - Sun 11 June 2023 • Tagged with probability, statistics, simpsons, paradox

I give examples of "unintuitive" conditional probabilities and discuss Simpson's paradox.

Common patterns in technical interviewing

Marton Trencseni - Sat 01 October 2022 • Tagged with interviewing

I will attempt to enumerate all the categories of questions commonly asked in technical interview loops, and my experience with them.

More Data Scientists should learn SQL

Marton Trencseni - Sun 29 May 2022 • Tagged with data, sql

In my experience, many Data Scientists struggle to write SQL queries in interviews.

100 articles

Marton Trencseni - Mon 18 October 2021 • Tagged with meta

A review of, and reflection on, the first 100 articles written on Bytepawn.

The Full Stack Data Scientist

Marton Trencseni - Fri 23 July 2021 • Tagged with data, fallacies

What are the core skills a data scientist needs to sustainably achieve bottom-line impact, without blocking on external help from other roles?

YOLO object detection architecture

Marton Trencseni - Sat 10 July 2021 • Tagged with yolo, yolov5, vision, object detection

I discuss the YOLO neural network architecture for object detection.

YOLOv5 object detection experiments

Marton Trencseni - Fri 02 July 2021 • Tagged with yolo, yolov5, vision, object detection

I run object detection experiments with pre-trained YOLOv5 models.

Predicting party affiliation of US politicians using fasttext

Marton Trencseni - Sun 20 June 2021 • Tagged with statistics, trump, politics, fasttext, twitter

I train a fasttext classifier on 1.2M data points to predict US politicians' party affiliations from their Twitter messages.

Random digits and Benford's law

Marton Trencseni - Sat 29 May 2021 • Tagged with statistics

The post explores the distribution of digits of random and non-random numbers from receipts, verifying Benford's law of first digit distribution.
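
For reference, Benford's law says the first digit $d$ appears with probability $\log_{10}(1 + 1/d)$. The sketch below compares that prediction with observed first-digit frequencies. It is only an illustration: the powers of two are a stand-in data set chosen for this example (not the receipt data from the post), and the function names are hypothetical.

```python
import math
from collections import Counter


def benford_probs():
    """Expected first-digit probabilities under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_freqs(numbers):
    """Observed relative frequency of each leading digit (positive integers)."""
    counts = Counter(int(str(n)[0]) for n in numbers)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Stand-in data (an assumption for this sketch): the first 1000 powers of two.
sample = [2 ** k for k in range(1, 1001)]

expected = benford_probs()
observed = first_digit_freqs(sample)

for d in range(1, 10):
    print(f"{d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

Uniformly random digits would instead give each first digit a frequency close to 1/9, which is the contrast the post examines.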

10 ways to iterate from 0 to 1 with deciles

Marton Trencseni - Fri 14 May 2021 • Tagged with mlflow, tracking

What's the best way to iterate from 0 to 1 in steps of 0.1 in Python, and what are the potential pitfalls?
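
As a minimal sketch of the main pitfall (not necessarily one of the ten approaches the article covers): repeatedly adding 0.1 accumulates floating-point rounding error, while deriving each value from an integer counter rounds only once per value. The function names are illustrative assumptions.

```python
def deciles_by_addition():
    """Accumulate 0.1 step by step; rounding error builds up."""
    values, x = [], 0.0
    for _ in range(11):
        values.append(x)
        x += 0.1
    return values


def deciles_by_division():
    """Derive each value from an integer counter; one rounding per value."""
    return [i / 10 for i in range(11)]


added = deciles_by_addition()
divided = deciles_by_division()

print(added[3] == 0.3)     # False: accumulated error
print(divided[3] == 0.3)   # True
print(divided[-1] == 1.0)  # True
```

The same reasoning is why a loop condition such as `while x <= 1.0` combined with `x += 0.1` is fragile: whether the final value is included depends on how the rounding errors happen to fall.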

Building intuition for p-values and statistical significance

Marton Trencseni - Sun 25 April 2021 • Tagged with ab-testing

This is the transcript of a talk I did on experimentation and A/B testing to give the audience an intuitive understanding of p-values and statistical significance.

Random numbers, the natural logarithm and higher dimensional simplexes

Marton Trencseni - Sat 17 April 2021 • Tagged with bayesian, ab-test

The base $e$ of the natural logarithm shows up in an unexpected place. Let's derive why!

Building a Pytorch Autoencoder for MNIST digits

Marton Trencseni - Thu 18 March 2021 • Tagged with pytorch, autoencoder, mnist

I build an Autoencoder network to categorize MNIST digits in Pytorch.

Training a Pytorch Wasserstein MNIST GAN on Google Colab

Marton Trencseni - Wed 03 March 2021 • Tagged with python, pytorch, torchvision, mnist, gan

I train a Pytorch Wasserstein MNIST GAN on Google Colab to generate beautiful MNIST digits.

Training a Pytorch Lightning MNIST GAN on Google Colab

Marton Trencseni - Sat 20 February 2021 • Tagged with python, pytorch, gan, mnist, google-colab

I explore MNIST digits generated by a Generative Adversarial Network trained on Google Colab using Pytorch Lightning.

Automatic MLFlow logging for Pytorch

Marton Trencseni - Sun 24 January 2021 • Tagged with mlflow, tracking

I explore the automatic logging capabilities of MLFlow for Pytorch.

Automatic MLFlow logging for Scikit Learn

Marton Trencseni - Fri 15 January 2021 • Tagged with mlflow, tracking

I explore the automatic logging capabilities of MLFlow for Scikit Learn. In the process I found a bug in MLFlow, reported it and wrote a pull request to fix it.
