Calibration curves for delivery prediction with Scikit-Learn

Marton Trencseni - Thu 21 November 2019 • Tagged with machine, learning, fetchr, skl, calibration

I show calibration curves for four different binary classification Scikit-Learn models we built for delivery prediction at Fetchr, trained using real-world data: LogisticRegression, DecisionTree, RandomForest and GradientBoosting.

Logistic regression calibration curve

Continue reading

Machine Learning at Fetchr

Marton Trencseni - Tue 29 October 2019 • Tagged with machine, learning, fetchr, skl

Opportunities for automating, optimizing and enabling processes with ML at a delivery company such as Fetchr are plentiful. We put three families of ML models into production. These 3 areas are: Scheduling, Notifications and Operational choice.

Operational choice

Continue reading

Metrics Atlas

Marton Trencseni - Thu 29 August 2019 • Tagged with data, fetchr

The idea is simple: write a document which helps new and existing people—both managers and individual contributors—get an objective, metrics-based picture of the business. This is helpful when new people join, when people start working in new segments of the business, and to understand other parts of the company.

Metrics atlas

Continue reading

Arabic name classification with Scikit-Learn and Pytorch

Marton Trencseni - Fri 02 August 2019 • Tagged with pytorch, skl, arabic, fetchr

While working on arabic-vs-rest classification, I was curious how good out-of-the-box models perform with publicly available data, and then compare that with what we can achieve with internal data / features derived from millions of deliveries. We train Scikit-learn and Pytorch models for this classification task and achieve 90% prediction accuracy on publicly available data and out-of-the-box models, while internally 99% is achievable.

ROC curve

Continue reading

A/B tests: Moving Fast vs Being Sure

Marton Trencseni - Mon 01 July 2019 • Tagged with ab-testing, fetchr

Most A/B testing tools default to α=0.05, meaning the expected false positive rate is 5%. In this post I explore the trade-offs between moving fast, ie. using higher α, versus being sure, ie. using lower α.

14. slide

Continue reading

Food deliveries, Bayes and Computational Statistics

Marton Trencseni - Sat 22 June 2019 • Tagged with python, math, fetchr

I was grabbing a burger at Shake Shack, Mall of the Emirates in Dubai, when I noticed this notebook on the counter. The staff is using it to track food deliveries and each service (Carriage, Talabat, UberEats, Deliveroo) has its own column with the order numbers. Let's assume this is the only page for the day, and ask ourselves: given this data, what is the probability that UberEats is the most popular food delivery service?.

Shake shack food deliveries

Continue reading

Automating a Call Center with Machine Learning

Marton Trencseni - Sun 27 January 2019 • Tagged with fetchr, machine-learning, call-center

Over a period of 6 months, we rolled out a Machine Learning model to predict a customer’s delivery (latitude, longitude). During the recent holiday peak, this ML model handled most of Fetchr’s order scheduling.

Share of ML scheduled versus Call center scheduled deliveries

Continue reading

Warehouse locations with k-means

Marton Trencseni - Wed 26 September 2018 • Tagged with data, data-science, metrics, fetchr

Sometimes, the seven gods of data science, Pascal, Gauss, Bayes, Poisson, Markov, Shannon and Fisher, all wake up in a good mood, and things just work out. Recently we had such an occurence at Fetchr, when the Operational Excellence team posed the following question: if we could pick our Saudi warehouse locations, where would be put them? What is the ideal number of warehouses, and, what does ideal even mean? Also, what should our “delivery radius” be?

.

Continue reading

Growth Accounting and Backtraced Growth Accounting

Marton Trencseni - Sun 16 September 2018 • Tagged with data, data-science, metrics, growth-accounting, fetchr

Previously I wrote two articles about data infra and data engineering at Fetchr. This time I want to move up the stack and talk about a simple piece of metrics engineering that proved to be very impactful: Growth Accounting and Backtraced Growth Accounting.

Backtraced Growth Accounting

Continue reading

Fetchr Data Science Infra at 1 year

Marton Trencseni - Tue 14 August 2018 • Tagged with data, etl, workflow, airflow, fetchr, model, ml

A description of our Analytics+ML cluster running on AWS, using Presto, Airflow and Superset.

Fetchr Data Science Infra

Continue reading

Building the Fetchr Data Science Infra on AWS with Presto and Airflow

Marton Trencseni - Wed 14 March 2018 • Tagged with data, etl, workflow, airflow, fetchr

We used Hive/Presto on AWS together with Airflow to rapidly build out the Data Science Infrastructure at Fetchr in less than 6 months.

Warehouse DAG

Continue reading