The ingredients of a reproducible machine learning model

Chloe Mawer, PhD

Principal Data Scientist, Lineage Logistics
Adjunct Lecturer, Master of Science in Analytics, Northwestern University

Irreproducibility in the wild

Steps of a machine learning model

How to write unmaintainable code by Roedy Green


Why is this so hard?

Randomness is everywhere

  • Sampling of data for training
  • Train/test split
  • Model initialization
  • Sampling of data within algorithm
  • Order of exposure of the model in training to the data
  • Sampling of data for evaluation and cross validation
  • And more!

The path is long

Steps of a machine learning model



Ingredients of a reproducible model


Find every random state and parameterize it
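
As a minimal sketch of what parameterizing a random state can look like, the example below passes an explicit seed from a configuration dictionary into a train/test split. It uses only the Python standard library; the function name `split_train_test` and the `random_state` key are illustrative assumptions, not names from the talk or its template repo.

```python
import random


def split_train_test(rows, test_fraction=0.25, random_state=0):
    """Shuffle rows with an explicit seed and split them into train and test sets."""
    # A dedicated Random instance avoids depending on hidden global state.
    rng = random.Random(random_state)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical configuration: the seed lives next to the other parameters,
# so it can be versioned with them.
config = {"split": {"test_fraction": 0.25, "random_state": 42}}

train_a, test_a = split_train_test(range(100), **config["split"])
train_b, test_b = split_train_test(range(100), **config["split"])
print(train_a == train_b and test_a == test_b)
```

The same pattern applies to every other source of randomness listed earlier: each one gets its own named parameter rather than relying on a global seed set somewhere out of sight.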

Versioning of everything

Versioning code is not enough

├── src                               <- Source code for the model 
│   ├──                <- Script for ingesting data from different sources 
│   ├──          <- Script for cleaning and transforming data for use in training and scoring.
│   ├──                <- Script for training machine learning model(s)
│   ├──                <- Script for scoring new predictions using a trained model.
│   ├──                <- Script for postprocessing predictions and model results
│   ├──             <- Script for evaluating model performance 
├──                            <- Simplifies the execution of one or more of the src scripts 
├── requirements.txt                  <- Python package dependencies 

Parameters and settings



  • At minimum, version an explicit query and include any filters used in the configuration.
  • Source data can change, so even this is not sufficient in many cases.
  • Ideally, version the entire training dataset with tools like Git LFS or S3, or in your own tables in HDFS or the database of your choosing.
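
When the full dataset is stored elsewhere, one lightweight complement is to record a content hash of the exact training extract, so a later run can detect that the data has changed. The sketch below assumes the extract is a single file; the file name and the helper `file_sha256` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo with a throwaway file standing in for a training data extract.
with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "training_data.csv"
    data_path.write_text("id,label\n1,0\n2,1\n")
    fingerprint_before = file_sha256(data_path)

    # Any change to the data, however small, changes the fingerprint.
    data_path.write_text("id,label\n1,0\n2,0\n")
    fingerprint_after = file_sha256(data_path)

print(fingerprint_before != fingerprint_after)
```

A hash only detects change; it does not let you recover the old data, which is why versioning the dataset itself remains the ideal.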


  • If a feature changes, the downstream models change too.
  • Often a feature is the output of another model.
  • Ideally each feature should be treated this way and managed accordingly.

Auxiliary data

  • Models can be highly dependent on auxiliary data, such as the options for categorical variables.
  • If this data gets out of sync with the model files or code, it can cause code to fail.
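
A small check at scoring time can surface this kind of drift before it causes a failure. The sketch below compares incoming records against a list of categorical options that is assumed to be stored and versioned alongside the model; the column names and the helper `check_categories` are illustrative only.

```python
import json


def check_categories(records, allowed_options):
    """Return a dict mapping each column to the values not in its versioned options."""
    unexpected = {}
    for record in records:
        for column, allowed in allowed_options.items():
            value = record.get(column)
            if value not in allowed:
                unexpected.setdefault(column, set()).add(value)
    return unexpected


# Hypothetical auxiliary data, as it might be loaded from a versioned JSON file.
allowed_options = json.loads('{"color": ["red", "green"], "size": ["S", "M", "L"]}')

records = [
    {"color": "red", "size": "M"},
    {"color": "blue", "size": "L"},  # "blue" is not among the versioned options
]
problems = check_categories(records, allowed_options)
print(problems)
```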

Trained model objects


  • Something needs to remember how the steps were strung together.
  • Use tools like Makefiles, Airflow, or Luigi, and version them.

Version them all together

  • Commit hashes
  • Manually cultivated version list
  • Dates
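
One possible way to tie these together is a small manifest written next to each trained model, recording which code, data, and configuration produced it. The sketch below is an assumption about how such a manifest could look, not the format used in the template repo; the field names are hypothetical and the commit and data hashes are placeholders passed in by the caller.

```python
import hashlib
import json
from datetime import date


def config_hash(config):
    """Hash a config dict deterministically by serializing it with sorted keys."""
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_manifest(code_commit, data_sha256, config, created=None):
    """Collect the versions of code, data, and config behind one model."""
    return {
        "code_commit": code_commit,
        "data_sha256": data_sha256,
        "config_sha256": config_hash(config),
        "created": (created or date.today()).isoformat(),
    }


manifest = build_manifest(
    code_commit="<commit hash>",    # placeholder, e.g. from `git rev-parse HEAD`
    data_sha256="<dataset hash>",   # placeholder for a dataset fingerprint
    config={"model": {"random_state": 42}, "split": {"test_fraction": 0.25}},
    created=date(2020, 1, 1),       # placeholder date for the demo
)
print(json.dumps(manifest, indent=2))
```

Serializing the configuration with sorted keys means two configs with the same content hash identically regardless of key order.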

Reproducibility testing

Traditional software testing is not enough

(Though you should definitely do it still!)

Model testing
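
A reproducibility test can be as simple as running the same training step twice with identical inputs and asserting identical results. The sketch below uses a toy stand-in for training, since the talk's actual model is not shown here; `train_toy_model` and its parameters are hypothetical.

```python
import random


def train_toy_model(data, random_state, n_iter=50):
    """Toy stand-in for training: a running estimate updated from random samples."""
    rng = random.Random(random_state)
    estimate = rng.random()  # random initialization
    for _ in range(n_iter):
        sample = rng.choice(data)  # random order of exposure to the data
        estimate += 0.1 * (sample - estimate)
    return estimate


def test_training_is_reproducible():
    data = [1.0, 2.0, 3.0, 4.0]
    first = train_toy_model(data, random_state=7)
    second = train_toy_model(data, random_state=7)
    # Exact equality is intended: same seed, same code, same data.
    assert first == second


test_training_is_reproducible()
```

For a real model, the comparison might be on predictions for a fixed set of inputs or on evaluation metrics, and a tolerance may be needed where hardware or library versions introduce floating-point differences.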

Environment management



Code alignment

Steps of a machine learning model - offline

Steps of a machine learning model - online

Thank you!

You can find these slides at

and the reproducible model template repo at

Chloe Mawer | Lineage Logistics | @chloemawer