In the first year of Data Science at Fetchr, we built out a data science infrastructure that allows us to understand the data, our business and operations from a quantitative perspective. We built a Presto data warehouse on AWS, built approx. 100-200 Airflow pipelines to feed it from our production systems, and approx. 50-100 Superset dashboards to visualize it all. This has been a great success and a big step towards making the company data-driven.
While doing this work, we identified a number of opportunities to deploy Machine Learning. During our second year at Fetchr, we put three families of ML models into production. These 3 areas are:
- Scheduling
- Notifications
- Operational choice
In all cases, we used A/B tests and percentage releases, both when first putting a model into production and when rolling out a new version and deprecating an older one.
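A percentage release can be sketched with a stable hash of the order id. This is a minimal illustration and not Fetchr's actual rollout code: the function names, the salt string and the bucket count of 100 are all assumptions made for the example.

```python
import hashlib

def bucket(order_id: str, salt: str = "ml-scheduling-v2") -> int:
    """Map an order id to a stable bucket in [0, 100). The salt is a hypothetical experiment name."""
    digest = hashlib.sha256(f"{salt}:{order_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100

def in_release(order_id: str, percent: int, salt: str = "ml-scheduling-v2") -> bool:
    """True if the order falls inside the first `percent` buckets, i.e. gets the new model."""
    return bucket(order_id, salt) < percent

# Ramping from 10% to 50%: every order already in the 10% release stays in it.
orders = [f"order-{i}" for i in range(10_000)]
at_10 = {o for o in orders if in_release(o, 10)}
at_50 = {o for o in orders if in_release(o, 50)}
print(len(at_10), len(at_50), at_10 <= at_50)
```

Because the bucket depends only on the salt and the order id, the assignment is reproducible, and using a different salt per experiment keeps separate A/B tests from sharing the same split.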
Scheduling is a critical step in the delivery funnel in the Middle East. It refers to looking at an order’s data (recipient name, phone number, freetext address, etc.) and trying to figure out where our courier has to go (latitude, longitude) to deliver the package. This is trivial in Europe or the US, where addressing systems are mature and an address can often be resolved to a (latitude, longitude) perfectly with a Google Maps API call. In the Middle East, addressing is a challenge: there are no zip codes, and street names and numbers are unclear and/or people don’t know them. Google Maps has very limited coverage, so querying its APIs doesn’t help (the same goes for OSM-based services). Also, because people know this, they often don’t try to put their actual address into the address field; instead they put down a nearby point of interest and/or instructions. For this reason, the (address) -> (latitude, longitude) mapping was originally performed manually:
- either through self-scheduling (recipient gets an SMS, clicks through, and drops a pin in Google Maps on our scheduling mweb page)
- or by a call center agent, either by reading the address (“Blind”) or by calling the recipient and talking with them on the phone while dropping a pin in Google Maps
We realized this mapping can be automated with Machine Learning for a large majority of orders. I’ve described the automated scheduling on the blog before, and described the models we use.
By now we have a lot more models in production. They are (see earlier post for details):
- address matching to:
- manually maintained rules
- ML rules (single text fragment)
- multi-level ML rules (multi text fragment)
- ML rules for Arabic text
- provided locations
- zip codes (in KSA)
As described in an earlier post, we have lots of knobs to tune to move the models in the (conversion, Delivery Performance) space, i.e. to schedule more or fewer orders at a lower or higher Delivery %. Scheduling more orders at an overall lower Delivery % makes sense if alternative scheduling channels (such as the call center) are experiencing technical difficulties (e.g. lines are down), because in this case there is no next-best alternative for scheduling.
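The trade-off can be illustrated with a single hypothetical knob: a confidence threshold below which the model declines to schedule an order. The actual knobs are described in the earlier post; the threshold, the `operating_point` function and the history data below are made up for this sketch.

```python
from typing import List, Tuple

def operating_point(history: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Return (conversion, delivery_rate) if every order with score >= threshold is ML-scheduled."""
    if not history:
        raise ValueError("history must not be empty")
    scheduled = [delivered for score, delivered in history if score >= threshold]
    conversion = len(scheduled) / len(history)
    delivery_rate = sum(scheduled) / len(scheduled) if scheduled else 0.0
    return conversion, delivery_rate

# Made-up history: (model confidence, was the order delivered?)
history = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False),
    (0.70, True), (0.60, False), (0.50, True), (0.40, False),
    (0.30, False), (0.20, False),
]

for threshold in (0.9, 0.7, 0.5, 0.3):
    conversion, delivery_rate = operating_point(history, threshold)
    print(f"threshold={threshold:.1f} conversion={conversion:.0%} delivery={delivery_rate:.0%}")
```

With this toy data, lowering the threshold from 0.9 to 0.3 raises conversion from 20% to 90%, while the delivery rate of the scheduled orders drops from 100% to about 56%.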
In September 2019, ML scheduling was Fetchr’s biggest scheduling channel globally, handling approx. 37% of all orders dispatched. In terms of Delivery Performance, it outperforms call center scheduling, and trails our best channel, recipient self-scheduling, by only approx. 5%.
As mentioned above, our best scheduling channel in terms of Delivery Performance is self-scheduling. This makes sense: self-scheduling means the recipient visits our website and explicitly tells us the (day, time, location) they want us to deliver the order. These are recipients who really want their orders, and are willing to invest time to give us high-quality scheduling coordinates. So this is a biased, but highly valuable group.
Clearly, more self-scheduling is better for any delivery company. How can we get more self-scheduling? The basic scheduling flow is for us to send out notifications to the recipients that their orders are ready to go in our last mile warehouse in their city. There are various notification channels, the biggest one is SMS. The message contains a link to our mweb scheduling page, where the recipients can self-schedule. If we can get more people to click and convert, we get more self-scheduling, which means we will have higher overall Delivery Performance (since this is the best channel wrt Delivery Performance).
We ran many A/B tests on notifications, and found that getting the language right matters a lot. In our markets, the biggest split is between English and Arabic (the third would be Hindi). So the challenge is, given a name like “Marton Trencseni” (Hungary), “Mohit Ahuja” (India), or “Tariq Sanad” (Bahrain), all expats living in the UAE, what is the right language? In the first 2 cases it should be English, in the last it should be Arabic.
We experimented with numerous Scikit Learn models, but in the end we went with a hand-rolled one. We were not able to use public datasets for this classification task, because these datasets are highly polluted:
- many common Arabic names are also common in non-Arabic countries (e.g. India)
- popular Arabic names also show up in the name databases of English-speaking countries (e.g. Ali and Ahmed are very common in the US/UK)
So in the end we used names from our own delivery dataset (10M+ deliveries) to bootstrap a classification dataset, where we used frequency in mostly homogeneous countries as an initial signal and went from there. This worked, but then we realized that the algorithms we used to do the bootstrapping (n-gram frequency and co-occurrence counting) can also be re-used for the classification task itself, so there’s no need for Scikit Learn. The resulting hand-tuned model is 99% accurate.
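To give a feel for how n-gram frequency counting can classify names, here is a minimal sketch. It is not the production model: it assumes character trigrams, add-one smoothing and a tiny made-up training set, and it leaves out the co-occurrence counting entirely. The labels are the notification language ("en" or "ar"), and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

def char_ngrams(name: str, n: int = 3) -> List[str]:
    """Lowercase the name and return its character n-grams, padded with '^' and '$'."""
    padded = "^" + name.lower().strip() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def train(samples: Iterable[Tuple[str, str]], n: int = 3) -> Dict[str, Counter]:
    """Count n-gram occurrences separately for each language label."""
    counts: Dict[str, Counter] = {}
    for name, label in samples:
        counts.setdefault(label, Counter()).update(char_ngrams(name, n))
    return counts

def classify(name: str, counts: Dict[str, Counter], n: int = 3) -> str:
    """Return the label whose smoothed n-gram frequencies give the name the highest log-score."""
    vocab = len({gram for c in counts.values() for gram in c})
    best_label, best_score = "", -math.inf
    for label, c in counts.items():
        total = sum(c.values())
        score = sum(math.log((c[g] + 1) / (total + vocab)) for g in char_ngrams(name, n))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Tiny illustrative training set; a real one would be bootstrapped from millions of names.
samples = [
    ("Tariq Sanad", "ar"), ("Mohammed Abdullah", "ar"),
    ("Abdulrahman Khalid", "ar"), ("Khalid Tariq", "ar"),
    ("Marton Trencseni", "en"), ("Mohit Ahuja", "en"),
    ("John Smith", "en"), ("Peter Johnson", "en"),
]
model = train(samples)
for name in ("Khalid Abdullah", "John Peterson"):
    print(name, "->", classify(name, model))
```

The two test names do not appear in the training set, but their trigrams overlap with only one of the two groups, so the log-scores separate them. With realistic data the overlap is much messier, which is where hand-tuning comes in.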
Delivery prediction is the latest family of models we introduced at Fetchr. From a Machine Learning perspective this is the most straightforward: we have a large number of features (essentially columns in our data warehouse) available for our historic dispatches:
- sender’s information
- recipient’s information (address, etc.)
- recipient’s historic information
- scheduling channel
For each dispatch, we know whether it was successfully delivered or not. Given our historic data, we can build a classifier which predicts which orders will be delivered (or not) tomorrow (or a later date), of all orders scheduled for dispatch. After one-hot encoding, our feature vector length is in the 1000s, and we can achieve 90%+ accuracy with out-of-the-box Scikit Learn models. In other words, perhaps not too surprisingly, it is possible to predict the chances of delivery success quite well.
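One-hot encoding is what turns a handful of categorical columns into a long feature vector: every distinct (column, value) pair gets its own 0/1 position. The sketch below shows the idea in plain Python; the column names and values are invented for the example, and in practice a library encoder such as scikit-learn's `OneHotEncoder` would do this job.

```python
from typing import Dict, List, Tuple

def fit_one_hot(rows: List[Dict[str, str]]) -> Dict[Tuple[str, str], int]:
    """Assign one vector position to every (column, value) pair seen in the data."""
    index: Dict[Tuple[str, str], int] = {}
    for row in rows:
        for column, value in sorted(row.items()):
            index.setdefault((column, value), len(index))
    return index

def one_hot(row: Dict[str, str], index: Dict[Tuple[str, str], int]) -> List[int]:
    """Encode one row as a 0/1 vector; values not seen during fitting stay all zeros."""
    vector = [0] * len(index)
    for column, value in row.items():
        position = index.get((column, value))
        if position is not None:
            vector[position] = 1
    return vector

# Hypothetical dispatch rows with three categorical columns.
rows = [
    {"scheduling_channel": "self", "city": "Dubai", "sender": "shop_a"},
    {"scheduling_channel": "ml", "city": "Riyadh", "sender": "shop_b"},
    {"scheduling_channel": "call_center", "city": "Dubai", "sender": "shop_a"},
]
index = fit_one_hot(rows)
print(len(index))             # number of distinct (column, value) pairs
print(one_hot(rows[0], index))
```

The vector length equals the number of distinct (column, value) pairs, so columns with many distinct values, such as senders or areas, are what push it into the thousands.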
What are the use-cases for delivery prediction? Not dispatching orders is not an option, even if they have a low predicted probability of delivery success; it's our job to attempt the delivery! But we can use the relative probabilities to prioritize orders to increase the chances of success and improve efficiency. Another potential use-case, currently not planned at Fetchr, is differential pricing.
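One possible reading of prioritization is to rank the orders by predicted probability while keeping every order in the list. The sketch below assumes exactly that. The `rank_for_dispatch` name and the probabilities are hypothetical, and how the ranking would feed into actual dispatch planning is not covered here.

```python
from typing import List, Tuple

def rank_for_dispatch(orders: List[Tuple[str, float]]) -> List[str]:
    """Return all order ids, highest predicted delivery probability first. No order is dropped."""
    ranked = sorted(orders, key=lambda order: order[1], reverse=True)
    return [order_id for order_id, _ in ranked]

# Made-up (order id, predicted delivery probability) pairs.
orders = [("A17", 0.42), ("B03", 0.91), ("C88", 0.67)]
print(rank_for_dispatch(orders))
```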
I worked at Facebook in 2016-17 and experienced a very effective Data Science culture as part of a product team. We’ve been building Data Science at Fetchr based on this template and it has worked out well so far. We had significant impact in the past 2 years, both with our Analytics and our Machine Learning projects. Opportunities for automating, optimizing and enabling processes with ML are plentiful.