Getting Started with MLFlow

Marton Trencseni - Sun 10 January 2021 - Data


For the last few months I’ve been using MFlow in production, specifically its Tracking component. MLFlow is an open source project for lifecycle tracking and serving of ML models, coming out of Databricks. MLFlow is model agnostic, so you can use with SKLearn, XGBoost, Pytorch, Tensorflow, FBProphet, anything. The best way to get an initial understanding of MLFlow Tracking is to watch the first 25 minutes of this talk by Corey Zumar, who works on MLFlow at Databricks.

MLFlow overview.

MLFlow Tracking is a simple client/server solution to log, track and share model metrics at scale. Using MLFlow makes sense both during the development of the model and also once the model is running in production.

Using MLFlow for model development

In this scenario, I’m using an ipython notebook and iterating on a model. I’m either reading training data from a files or straight from the database, training models, evaluating on test data, rinse and repeats potentially 100s of times, over days, weeks or months. With MLFlow, I can continuously track various metrics and artifacts of my models runs, such as:

  • the current source code for the model training/running
  • size/shape of the data I'm using for training
  • various metrics and distributions of the training and test data
  • model runtime and memory usage
  • goodness metrics such as accurary, f1 score, precision, recall, TPR, FPR, etc.
  • various plots such as ROC

With MLFlow I achieve a kind of raw versioning of my model development.

Using MLFlow in production

In production can mean several things:

  1. Some models are batchy, and are re-trained at pre-specified intervals (eg. hourly, daily or weekly) on new data, and then emit their prediction for existing data into a table, which is then read out by other processes.
  2. Other models are re-trained at regular intervals like above, but then serve prediction requests in real-time, eg. over an HTTP API.
  3. In advanced use-cases models are also updated in real-time as new training data comes in, but this is not typical outside of advanced Internet companies.

MLFlow Tracking is agnostic to the above, so it can be used in all scenarios. Personally I'm using it in scenario #1 use-cases so far, from Airflow.

Setting up MLFlow and the artifact gotcha

Setting up MLFlow is easy. Just create a new venv and pip install mlflow. You will need to pip install mlflow on both the server and all the clients that will be emitting tracking metrics to MLFlow. Remember that you can !pip install mlflow from ipython notebooks.

Now, we can start the server, like:

mlflow server --backend-store-uri XXX --default-artifact-root YYY -h -p 5000

What is backend-store-uri and default-artifact-root?

We will use MLFlow for storing metrics like accuracy, essentially in a flat key-value manner. It will store these in the backend store, which can be a filesystem location or a database like Postgresql. We will also send files to MLFlow for tracking, such as the source code currently running, or a plot we generated (eg. an ROC curve). These will be sent and stored as files under artifact root**, which has to be a filesystem or an S3 bucket.

The big gotcha is that the MLFlow client library:

  • when sending metrics, it sends those to the MLFlow server, which stores them in the backend store; the client does not access the backend store directly
  • when sending artifacts, the MLFlow client library puts them directly into the artifact store, so both the client and server needs to have access to it! This is very unexpected because it goes against basic client/server principles. For this reason, although you can start up MLFlow by specifying a directory on the server as the artifact root, client calls to save artifacts will fail. You need to specify something like an S3 bucket, and both the client and server needs an AWS access key to read/write!

I have set up MLFlow to use the server's local filesystem for backend store and a shared (between server and clients) S3 bucket for artifact storage.

Once the server is up and running, you can open it in your browser, you should see an empty page like this:

Empty MLFlow page.

Sending tracking data to MLFlow

Now we can send tracking data to the MLFlow server. Fundamentally the steps are:

  • set up a connection
  • set the experiment; if the experiment doesn’t exist, it will be created automatically (the experiment is essentially the project you’re working on)
  • open a new experiment run
  • log params, metrics and artifacts
  • check the results on the web console!

In code:

import os
import mlflow

mlflow.set_tracking_uri('')  # set up connection
mlflow.set_experiment('test-experiment')          # set the experiment
with mlflow.start_run():                          # open new experiment run
        mlflow.log_artifact(os.path.abspath(__file__), 'source code')
        mlflow.log_param('some parameter', 1)
        mlflow.log_metric('accuracy', 0.9)

The experiment test-experiment will show up on the web console:

MLFlow experiment.

Inside the experiment are the experiment runs. Selecting the latest one we will see our logged data:

MLFlow experiment contents.


This was a quick introduction on how to set up MLFlow and how to get some initial data into it. Surprisingly, there is not much more handholding that you will get from MLFlow. Although there is experimental integration of MLFlow into common ML libraries such as SKL (mlflow.sklearn), Pytorch (mlflow.pytorch) and many others, most of the relevant metrics for your model you will have to log manually using the functions above: log_artifact(), log_param(), log_metric(). For example, to save an ROC curve, you will have to generate the plot using pyplot, save it to a .png file, and then save that file as an artifact. In later posts I plan to discuss useful patterns for MLFlow.

Happy MLFlowing!