Python types for Data Scientists - Part I
Marton Trencseni - Fri 08 April 2022 • Tagged with python, types
I show how to use basic type hints and get type checking working in ipython notebooks.
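Here is a minimal sketch of basic type hints. The function mean and its behaviour are illustrative and do not come from the post. Python does not enforce annotations at runtime. A separate static checker such as mypy is needed to flag a call like mean("abc") before the code runs.

```python
from typing import List, Optional


def mean(values: List[float]) -> Optional[float]:
    """Return the arithmetic mean of values, or None if the list is empty."""
    if not values:
        return None
    return sum(values) / len(values)


# Annotations are stored on the function object but are not checked at runtime.
print(mean.__annotations__)
print(mean([1.0, 2.0, 3.0]))  # 2.0
```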
Marton Trencseni - Sat 26 March 2022 • Tagged with interview, python
Recently I was considering whether to introduce some CS-style algorithmic interview questions into our Data Science hiring loop, since an understanding of algorithms and data structures can be useful for Data Scientists. I had not done this sort of interview for a few years. So I picked up my copy of Daily Coding Problem and started solving a few problems to remind myself what it feels like as a candidate, and to judge whether it would give us any useful signals.
Marton Trencseni - Thu 06 May 2021 • Tagged with python
I describe a real world use-case where a simple, brute force search based solution worked really well, making more sophisticated Machine Learning unnecessary.
Marton Trencseni - Fri 09 April 2021 • Tagged with python, pytorch, cnn, torchvision, mnist, autoencoder
I measure how the classification accuracy of a quantized Autoencoder neural network varies with the number of encoding bits on MNIST digits.
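Quantizing an encoding to a given number of bits can be sketched as snapping each latent value to one of 2**bits evenly spaced levels. This is a minimal illustration that assumes the encoder outputs values in [0, 1], for example after a sigmoid. The helper name quantize is hypothetical, and the post's own scheme may differ.

```python
from typing import List


def quantize(values: List[float], bits: int) -> List[float]:
    """Snap each value in [0, 1] to the nearest of 2**bits evenly spaced levels."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    steps = 2 ** bits - 1  # gaps between the 2**bits levels, which include 0.0 and 1.0
    quantized = []
    for v in values:
        v = min(max(v, 0.0), 1.0)  # clamp, assuming a sigmoid-style encoder output
        quantized.append(round(v * steps) / steps)
    return quantized


code = [0.07, 0.49, 0.93]
print(quantize(code, 1))  # [0.0, 0.0, 1.0]
print(quantize(code, 2))
```

With fewer bits, more distinct encodings collapse onto the same quantized vector. The decoder and classifier then have less information to work with.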
Marton Trencseni - Sun 04 April 2021 • Tagged with python, pytorch, cnn, torchvision, mnist, autoencoder
I investigate how much information an Autoencoder neural network encodes for MNIST digits.
Marton Trencseni - Wed 03 March 2021 • Tagged with python, pytorch, torchvision, mnist, gan
I train a Pytorch Wasserstein MNIST GAN on Google Colab to generate beautiful MNIST digits.
Marton Trencseni - Tue 02 March 2021 • Tagged with python, pytorch, torchvision, mnist, gan
I train a Pytorch Classic MNIST GAN on Google Colab to generate MNIST digits.
Marton Trencseni - Sat 20 February 2021 • Tagged with python, pytorch, gan, mnist, google-colab
I explore MNIST digits generated by a Generative Adversarial Network trained on Google Colab using Pytorch Lightning.
Marton Trencseni - Sat 01 February 2020 • Tagged with data, airflow, python
Sometimes I get to put on my Data Engineering hat for a few days. I enjoy this because I like to move up and down the Data Science stack and I try to keep myself sharp technically. Recently I was able to spend a few days optimizing our Airflow ETL for speed.
Marton Trencseni - Thu 14 November 2019 • Tagged with python, pytorch, reinforcement, learning, openai, gym
I use simulated self-play by ranking episodes by summed reward. Game outcomes are divided into two halves by cutting at the median. Winners are assigned a +1 reward and losers a -1 reward, as in games like Go and Chess. Unlike the naive policy gradient descent used in previous posts, this version solves all OpenAI classic control problems, albeit slowly.
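The median cut can be sketched in a few lines. The function name and the tie rule are assumptions: in this sketch, episodes exactly at the median count as losers. The post's own code may differ. During training, each +1 or -1 label would take the place of the episode's raw summed reward.

```python
import statistics
from typing import List


def median_split_rewards(episode_returns: List[float]) -> List[int]:
    """Label each episode +1 (winner) or -1 (loser) relative to the median return."""
    median = statistics.median(episode_returns)
    # Assumption: episodes exactly at the median count as losers.
    return [1 if r > median else -1 for r in episode_returns]


returns = [12.0, 200.0, 35.0, 180.0]
print(median_split_rewards(returns))  # [-1, 1, -1, 1]
```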
Marton Trencseni - Tue 12 November 2019 • Tagged with python, pytorch, reinforcement, learning, openai, gym
I try to generalize the policy gradient algorithm introduced earlier to solve all the OpenAI classic control problems. It works for CartPole and Acrobot, but not for the Pendulum and MountainCar environments.
Marton Trencseni - Tue 22 October 2019 • Tagged with python, pytorch, reinforcement, learning, openai, gym, cartpole
The CartPole problem is the Hello World of Reinforcement Learning, originally described in 1983 by Barto, Sutton and Anderson. The environment is a pole balanced on a cart. CartPole is one of the environments in OpenAI Gym, so we don't have to code up the physics. Here I walk through a simple solution using Pytorch.
Marton Trencseni - Sun 25 August 2019 • Tagged with python, pytorch, cnn, go
Using historic gameplay between strong Go players as training data, a CNN model is built to predict good Go moves on a standard 19x19 Go board.
Marton Trencseni - Sat 06 July 2019 • Tagged with python, math, pymc3
I use PyMC3 to solve the food delivery toy problem and explore some alternative priors.
Marton Trencseni - Sat 22 June 2019 • Tagged with python, math, fetchr
I was grabbing a burger at Shake Shack, Mall of the Emirates in Dubai, when I noticed this notebook on the counter. The staff use it to track food deliveries, and each service (Carriage, Talabat, UberEats, Deliveroo) has its own column with the order numbers. Let's assume this is the only page for the day, and ask ourselves: given this data, what is the probability that UberEats is the most popular food delivery service?
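Here is one possible way to answer the question, sketched under assumptions that do not come from the post. Each order is treated as going independently to one service. The four services' shares get a uniform Dirichlet prior. A Monte Carlo estimate then measures how often UberEats has the largest share under the posterior. The counts below are made up for illustration and are not the numbers on the notebook page.

```python
import random
from typing import Dict


def prob_most_popular(counts: Dict[str, int], target: str,
                      n_samples: int = 10_000, seed: int = 0) -> float:
    """Estimate P(target has the largest share) under a Dirichlet posterior.

    Assumes a uniform Dirichlet(1, ..., 1) prior over the services' shares,
    so the posterior is Dirichlet(count + 1) over the services.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_samples):
        # A Dirichlet draw is a vector of Gamma(alpha, 1) draws, normalised;
        # normalising does not change which entry is largest, so it is skipped.
        draws = {name: rng.gammavariate(c + 1, 1.0) for name, c in counts.items()}
        if max(draws, key=draws.get) == target:
            wins += 1
    return wins / n_samples


# Hypothetical counts for illustration only.
counts = {"Carriage": 4, "Talabat": 6, "UberEats": 8, "Deliveroo": 5}
print(prob_most_popular(counts, "UberEats"))
```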
Marton Trencseni - Sun 02 June 2019 • Tagged with python, math
The Collatz conjecture is a conjecture in mathematics that concerns a sequence defined as follows. Start with any positive integer n. Then each term is obtained from the previous term as follows: if the previous term is even, the next term is one half of the previous term; if the previous term is odd, the next term is 3 times the previous term plus 1. The conjecture is that, no matter the starting value of n, the sequence will always reach 1.
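The rule above translates directly into a short loop. This is a minimal sketch, and the function name is illustrative. The loop only terminates if the conjecture holds for the given n. That has been verified computationally for very large ranges of n but is not proven.

```python
from typing import List


def collatz_sequence(n: int) -> List[int]:
    """Return the Collatz sequence starting at n and ending at 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    sequence = [n]
    # Terminates only if the conjecture holds for this n.
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        sequence.append(n)
    return sequence


print(collatz_sequence(6))  # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```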
Marton Trencseni - Sat 01 June 2019 • Tagged with python, pytorch, cnn, torchvision, mnist, skl
It’s easy to build a CNN that does well on MNIST digit classification. How easy is it to break it, to distort the images and cause the model to misclassify?
Marton Trencseni - Tue 14 May 2019 • Tagged with python, pytorch, cnn, torchvision, cifar, skl
CIFAR-10 is a classic image recognition problem, consisting of 60,000 32x32 pixel RGB images (50,000 for training and 10,000 for testing) in 10 categories: plane, car, bird, cat, deer, dog, frog, horse, ship, truck. Convolutional Neural Networks (CNN) do really well on CIFAR-10, achieving 99%+ accuracy. The Pytorch distribution includes an example CNN for solving CIFAR-10, at 45% accuracy. I will use that and merge it with a Tensorflow example implementation to achieve 75%. We use torchvision to avoid downloading and data wrangling the datasets. Like in the MNIST example, I use Scikit-Learn to calculate goodness metrics and plots.
Marton Trencseni - Thu 02 May 2019 • Tagged with python, pytorch, cnn, torchvision, mnist, skl
MNIST is a classic image recognition problem, specifically digit recognition. It contains 70,000 labeled 28x28 pixel grayscale images of hand-written digits, 60,000 for training and 10,000 for testing. Convolutional Neural Networks (CNN) do really well on MNIST, achieving 99%+ accuracy. The Pytorch distribution includes a 4-layer CNN for solving MNIST. Here I will unpack and go through this example. We use torchvision to avoid downloading and data wrangling the datasets. Finally, instead of calculating performance metrics of the model by hand, I will extract the results in a format that lets us use SciKit-Learn's rich library of metrics.
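Extracting results for scikit-learn comes down to collecting two flat lists: true labels and predicted labels, with one entry per test image. As a dependency-free sketch, the code below computes a confusion matrix and accuracy from those lists. The helper names are hypothetical. scikit-learn's sklearn.metrics.confusion_matrix(y_true, y_pred) uses the same convention: rows are true labels and columns are predictions.

```python
from typing import List


def confusion_matrix(y_true: List[int], y_pred: List[int], num_classes: int) -> List[List[int]]:
    """Rows are true labels, columns are predicted labels."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels standing in for MNIST digit classes.
y_true = [0, 1, 1, 2]
y_pred = [0, 1, 2, 2]
print(confusion_matrix(y_true, y_pred, 3))  # [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
print(accuracy(y_true, y_pred))  # 0.75
```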
Marton Trencseni - Sat 02 March 2019 • Tagged with python
rxe is a thin wrapper around Python's re module. The various rxe functions are wrappers around corresponding re patterns. For example, rxe.digit().one_or_more('a').whitespace() corresponds to \da+\s. Because rxe uses parentheses internally but wants to avoid creating unnamed capturing groups, the internal (equivalent) representation is actually \d(?:a)+\s. This pattern can always be retrieved with get_pattern().
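The standard re module alone is enough to check that the two patterns are equivalent. This sketch does not call rxe itself, and the sample strings are arbitrary. The (?:...) group is non-capturing, so the compiled pattern reports zero groups.

```python
import re

plain = re.compile(r"\da+\s")
grouped = re.compile(r"\d(?:a)+\s")  # non-capturing group around the repeated 'a'

samples = ["1a ", "9aaa\t", "a ", "1b ", "12a "]
for s in samples:
    # Both patterns accept or reject exactly the same strings.
    assert bool(plain.fullmatch(s)) == bool(grouped.fullmatch(s))
    print(repr(s), bool(grouped.fullmatch(s)))

print(grouped.groups)  # 0
```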