Similar posts recommendation with Doc2Vec - Part II

Marton Trencseni - Sun 04 December 2022 • Tagged with similarity, python, gensim, word2vec, doc2vec, pyml

In the previous post, I used the Doc2Vec neural network architecture to compute the similarities between my blog posts. In this second post I investigate the results further by examining clusters in graphs.

Continue reading

Similar posts recommendation with Doc2Vec - Part I

Marton Trencseni - Sat 03 December 2022 • Tagged with similarity, python, gensim, word2vec, doc2vec, pyml

One of the things I learned at Facebook is the power of recommendations. Examples are People You May Know (PYMK), Groups You May Like (GYML) and Pages You May Like (PYML). Inspired by these, I am planning to add an Articles You May Like widget to Bytepawn, based on the semantic similarity of blog posts. I use the Doc2Vec neural network architecture to compute the similarity between my blog posts, and return the top 3 recommendations for each page.
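As a rough illustration of the "top 3 recommendations" step, here is a minimal sketch that ranks posts by cosine similarity between embedding vectors. The post names and three-dimensional vectors are made up for the example; real Doc2Vec vectors would come from a trained model and have many more dimensions, and the helper names `cosine` and `top_k_similar` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def top_k_similar(post_id, vectors, k=3):
    """Return the k posts most similar to post_id, excluding the post itself."""
    scores = [
        (other, cosine(vectors[post_id], vec))
        for other, vec in vectors.items()
        if other != post_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:k]]

# Hypothetical embeddings, invented for illustration only.
vectors = {
    "doc2vec-part-1": [0.9, 0.1, 0.0],
    "doc2vec-part-2": [0.8, 0.2, 0.1],
    "decorators": [0.1, 0.9, 0.2],
    "dataclass": [0.2, 0.8, 0.3],
    "airflow": [0.0, 0.1, 0.9],
}

recs = top_k_similar("doc2vec-part-1", vectors)
print(recs)
```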

Continue reading

Ask HN: Data Scientists, what libraries do you use for timeseries forecasting?

Marton Trencseni - Wed 30 November 2022 • Tagged with timeseries, prophet, darts, python

One of the most common Data Science tasks in a business setting is timeseries forecasting. I was curious what methods and libraries other Data Scientists use, so I posted an "Ask HN" on Hacker News. The post generated 89 comments, most of them high-quality. This is my summary of the discussion.

Continue reading

Useful Python decorators for Data Scientists

Marton Trencseni - Sun 22 May 2022 • Tagged with python, decorators

I show toy implementations of Python decorator patterns that may be useful for Data Scientists.

Python decorators

Continue reading

Building a toy Python @dataclass decorator

Marton Trencseni - Thu 12 May 2022 • Tagged with python, dataclass, decorator

I write a toy implementation of Python's @dataclass decorator to improve my Python fu and learn more about decorators and metaprogramming.

Python enum

Continue reading

Python decorator patterns

Marton Trencseni - Sun 08 May 2022 • Tagged with python, decorators

I show toy implementations of Python decorator patterns such as @measure, @repeat, @trace, @count, @singleton, and @app.route (made famous by Flask).
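To give a flavor of these patterns, here is a minimal sketch of what `@repeat` and `@count` could look like. These are assumed toy versions written for this listing, not necessarily the implementations from the post.

```python
import functools

def repeat(n):
    """Decorator factory: call the wrapped function n times, return the last result."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(n):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator

def count(func):
    """Decorator: track how many times the wrapped function has been called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count
def square(x):
    return x * x

log = []

@repeat(3)
def record():
    log.append("x")
    return len(log)

square(2)
square(3)
last = record()
print(square.calls, last)
```

`@repeat` takes an argument, so it needs an extra layer of nesting compared to `@count`, which is applied directly to the function.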

Python enum

Continue reading

Building a toy Python Enum class - Part II

Marton Trencseni - Thu 05 May 2022 • Tagged with python, enum

I extend my previous toy implementation of Python's Enum class to add more features.

Python enum

Continue reading

Building a toy Python Enum class - Part I

Marton Trencseni - Tue 03 May 2022 • Tagged with python, enum

I write a toy implementation of Python's Enum class to learn about Python metaclasses.

Python enum

Continue reading

Python types for Data Scientists - Part III

Marton Trencseni - Fri 22 April 2022 • Tagged with python, types

I show slightly more advanced aspects of type checking in Python for Data Scientists.

Mypy

Continue reading

Python types for Data Scientists - Part II

Marton Trencseni - Sun 17 April 2022 • Tagged with python, types

I show slightly more advanced uses of type checking in Python.

Python snake

Continue reading

Python types for Data Scientists - Part I

Marton Trencseni - Fri 08 April 2022 • Tagged with python, types

I show how to use basic type hints and get type checking working in IPython notebooks.

Python types for Data Scientists

Continue reading

Solving 5 algorithmic interview questions

Marton Trencseni - Sat 26 March 2022 • Tagged with interview, python

Recently I was considering whether to introduce some CS-style algorithmic interview questions into our Data Science hiring loop, since an understanding of algorithms and data structures can be useful for Data Scientists. Not having done this sort of interview for a few years, I picked up my copy of Daily Coding Problem and started solving a few problems to refresh my sense of what it feels like as a candidate, and to see whether it would give us any useful signals.

Daily coding problem

Continue reading

Sometimes brute forcing just works

Marton Trencseni - Thu 06 May 2021 • Tagged with python

I describe a real-world use case where a simple solution based on brute-force search worked really well, making more sophisticated Machine Learning unnecessary.

Sample receipt

Continue reading

Classification accuracy of quantized Autoencoders with Pytorch and MNIST

Marton Trencseni - Fri 09 April 2021 • Tagged with python, pytorch, cnn, torchvision, mnist, autoencoder

I measure how the classification accuracy of a quantized Autoencoder neural network varies with the number of encoding bits on MNIST digits.

Classifier accuracy on quantized Autoencoder output after quantization

Continue reading

Investigating information storage in quantized Autoencoders with Pytorch and MNIST

Marton Trencseni - Sun 04 April 2021 • Tagged with python, pytorch, cnn, torchvision, mnist, autoencoder

I investigate how much information an Autoencoder neural network encodes for MNIST digits.

Pytorch Autoencoder loss with encoding dimension and quantization bits

Continue reading

Training a Pytorch Wasserstein MNIST GAN on Google Colab

Marton Trencseni - Wed 03 March 2021 • Tagged with python, pytorch, torchvision, mnist, gan

I train a Pytorch Wasserstein MNIST GAN on Google Colab to generate beautiful MNIST digits.

Wasserstein GAN Generated MNIST digits

Continue reading

Training a Pytorch Classic MNIST GAN on Google Colab

Marton Trencseni - Tue 02 March 2021 • Tagged with python, pytorch, torchvision, mnist, gan

I train a Pytorch Classic MNIST GAN on Google Colab to generate MNIST digits.

Classic GAN Generated MNIST digits

Continue reading

Training a Pytorch Lightning MNIST GAN on Google Colab

Marton Trencseni - Sat 20 February 2021 • Tagged with python, pytorch, gan, mnist, google-colab

I explore MNIST digits generated by a Generative Adversarial Network trained on Google Colab using Pytorch Lightning.

Softmax GAN after 5 epochs, 100 samples.

Continue reading

Optimizing waits in Airflow

Marton Trencseni - Sat 01 February 2020 • Tagged with data, airflow, python

Sometimes I get to put on my Data Engineering hat for a few days. I enjoy this because I like to move up and down the Data Science stack and I try to keep myself sharp technically. Recently I was able to spend a few days optimizing our Airflow ETL for speed.

Airflow DAG

Continue reading

Using simulated self-play to solve all OpenAI Gym classic control problems with Pytorch

Marton Trencseni - Thu 14 November 2019 • Tagged with python, pytorch, reinforcement, learning, openai, gym

I use simulated self-play by ranking episodes by summed reward. Game outcomes are divided in two by cutting at the median: winners are assigned +1 rewards and losers are assigned -1 rewards, as in games such as Go and Chess. Unlike the naive policy gradient descent used in previous posts, this version solves all OpenAI classic control problems, albeit slowly.
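The median-cut reward assignment can be sketched in a few lines. This is a minimal illustration, not the post's code: the function name `assign_selfplay_rewards` is hypothetical, and treating episodes that tie with the median as losers is an assumption.

```python
from statistics import median

def assign_selfplay_rewards(episode_returns):
    """Map each episode's summed reward to +1 (winner) or -1 (loser)."""
    cut = median(episode_returns)
    # Assumption: episodes exactly at the median are counted as losers.
    return [1 if r > cut else -1 for r in episode_returns]

# Hypothetical summed rewards for four episodes.
returns = [12.0, 30.0, 7.0, 45.0]
labels = assign_selfplay_rewards(returns)
print(labels)
```

The resulting +1/-1 labels would then replace the raw environment rewards when computing the policy gradient update.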

OpenAI mountaincar

Continue reading