Developing a good machine learning model is not straightforward, but rather an iterative process involving many steps. Most data scientists start by building a so-called baseline, which serves as a reference point for comparing other models. This baseline can be created by simply calculating the average or using some simple model. After that, a data scientist will probably try different models to see how they perform before doing some kind of hyper-parameter tuning to improve the most promising ones. Even if those models achieve good results, there are still plenty of options for improvement, such as adding data (pre-)processing steps, creating additional features, applying some form of dimensionality reduction or even using stacking strategies.
Have a look at our 2022 update comparing frameworks for machine learning experiment tracking and see how MLflow, ClearML, neptune.ai and DAGsHub hold up!
Clearly this is an explorative process which requires expertise and flexibility in tooling. Data scientists therefore mostly use notebooks to quickly try new ideas, rapidly train models and compare them, often with simple print statements. This works well at first, but becomes confusing as the number of models and parameters increases. A common approach is to write the prediction results to a table or dataframe, but even then it is difficult to track all the important information, such as the hyper-parameters used, data sources, time of execution and so on.
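For illustration, such hand-rolled tracking often looks roughly like the following sketch (the helper function and column names are made up for this example):

import pandas as pd

# A hand-rolled experiment log: every model/parameter combination is appended manually.
# Easy to start with, but hard to keep complete and consistent over time.
results = pd.DataFrame(columns=["model", "params", "mae"])

def log_result(model_name, params, mae):
    results.loc[len(results)] = [model_name, str(params), mae]

# After each training run you would call something like:
# log_result("LinearRegression", {}, mae)
# log_result("Ridge", {"alpha": 1.0}, mae)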
While big companies such as Google, Facebook and Uber develop custom machine learning platforms to support data scientists with this challenge, smaller projects have also emerged over the last months, for example DVC, Sacred or Databricks' MLflow. While we are currently evaluating these, we also tested another alternative named ModelDB earlier this year. The following blog article was created during that evaluation in May this year. While ModelDB might not be the best choice at this point in time, its evaluation explains the concept of machine learning model management and will serve as a baseline in another article to follow in this series.
ModelDB: Architecture and installation
ModelDB was developed at the Computer Science and Artificial Intelligence Laboratory at MIT, integrates tightly with scikit-learn and SparkML, and offers additional visualisation tools to evaluate model performance. It consists of a backend which stores the data, a frontend for visualisation purposes and some client libraries. For installation it is easiest to clone the git repo and use docker-compose to set up the infrastructure. Besides the backend and the frontend containers you will find one which includes a MongoDB instance, but at the time of writing this is not really used; instead all data is stored in a SQLite database within the backend container.
The backend service is defined using the interface definition language Apache Thrift and compiled to Java. This way the backend provides REST endpoints to access the data within the SQLite database. The frontend service is provided by NodeJS using the Express framework, with Backbone as well as Vega to display charts.
To interact with these services, ModelDB provides clients for two ML frameworks: scikit-learn and SparkML. Sadly, the compilation with Scala 2.13 does not work.
The installation of the scikit-learn client, on the other hand, works via pip: pip install modeldb. However, if you want to follow this tutorial, you will need to build it from source until my pull request gets accepted. So clone the modeldb repo and run client/python/setup.py.
Train a Regression model
ModelDB already provides some examples, which mostly cover classification tasks, so let's try some regression. We will use the Boston Housing Dataset, which is contained within scikit-learn. If you want to run the notebook yourself, you can find it on GitHub. I'll walk you through it below.
First, let’s make some imports and check the data:
import pandas as pd
from IPython.display import display as ipd
from sklearn.datasets import load_boston

boston = load_boston()
data = pd.DataFrame(boston.data, columns=boston.feature_names)
data['target'] = pd.Series(boston.target)
ipd(data.sample(5))

         CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS   RAD    TAX  PTRATIO       B  LSTAT  target
447   9.92485   0.0  18.10   0.0  0.740  6.251  96.6  2.1980  24.0  666.0     20.2  388.52  16.44    12.6
307   0.04932  33.0   2.18   0.0  0.472  6.849  70.3  3.1827   7.0  222.0     18.4  396.90   7.53    28.2
356   8.98296   0.0  18.10   1.0  0.770  6.212  97.4  2.1222  24.0  666.0     20.2  377.73  17.60    17.8
272   0.11460  20.0   6.96   0.0  0.464  6.538  58.7  3.9175   3.0  223.0     18.6  394.96   7.73    24.4
308   0.49298   0.0   9.90   0.0  0.544  6.635  82.5  3.3175   4.0  304.0     18.4  396.90   4.54    22.8
The dataset has 14 columns, of which 13 are attributes and one is the target variable. The columns do not contain any nulls and are all numeric, even though CHAS is actually a Boolean value.
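If you want to verify these properties yourself, a few quick pandas checks on the dataframe built above are enough (nothing ModelDB-specific):

# Quick sanity checks on the dataframe built above
print(data.shape)                     # (506, 14): 13 features plus the target
print(data.isnull().sum().sum())      # 0 -> no missing values
print(sorted(data['CHAS'].unique()))  # [0.0, 1.0] -> effectively a Boolean flag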
Let's see how a simple linear regression without ModelDB would look:
from sklearn.cross_validation import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Do a train_test_split
x_train, x_test, y_train, y_test = train_test_split(
    data.iloc[:, :-1], data.iloc[:, -1], test_size=10, random_state=42)

# Create and fit regression
linreg = LinearRegression()
linreg.fit(x_train, y_train)

# Do prediction and calculate mean absolute error
test_pred = linreg.predict(x_test)
mean_absolute_error(y_test, test_pred)
We get a mean absolute error of 2.499. To do the same using ModelDB, we first need to import the library and then create a syncer object by providing a project, an experiment and an experiment-run object (if you get an error, make sure your docker containers are up and running).
import modeldb.sklearn_native.ModelDbSyncer as mdb

project = mdb.NewOrExistingProject(
    name="ModelDB Evaluation",
    author="Nico",
    description="using Bosten Housing Dataset")
experiment = mdb.NewOrExistingExperiment(name="Simple model training", description="")
syncer = mdb.Syncer(project, experiment, mdb.NewExperimentRun("Linear Regression"))
After that, we can reuse the linear regression code from above with a few minor changes: we do not use the scikit-learn classes directly, but through ModelDB, which extends them with *_sync variants. These functions tell the syncer object to keep track of the calculated object. You could even minimise the changes by simply overwriting the default scikit-learn objects using an import like
from modeldb.sklearn_native.ModelDbSyncer import *
, but we will stick with the mdb prefix to make clear which feature is being used. Finally, we calculate the mean absolute and the mean squared error and tell the syncer to synchronise these results with the backend service.
from modeldb.sklearn_native import SyncableMetrics
from sklearn.metrics import mean_squared_error  # needed below; mean_absolute_error was imported earlier

# Do a train_test_split
x_train, x_test, y_train, y_test = mdb.cross_validation.train_test_split_sync(
    data.iloc[:, :-1], data.iloc[:, -1], test_size=10, random_state=42)

# Create and fit regression
linreg = mdb.linear_model.LinearRegression()
linreg.fit_sync(x_train, y_train)

# Do prediction and calculate mean absolute and mean squared error
test_pred = linreg.predict_sync(x_test)
mae = SyncableMetrics.compute_metrics(linreg, mean_absolute_error, y_test, test_pred,
                                      data.iloc[:, :-1].values, "predictionCol", 'target')
mse = SyncableMetrics.compute_metrics(linreg, mean_squared_error, y_test, test_pred,
                                      data.iloc[:, :-1].values, "predictionCol", 'target')

# Sync with the backend service
syncer.sync()
After that, the ModelDB frontend should show you a project called ModelDB Evaluation, probably at http://127.0.0.1:3000. Within this project you will find a simple diagram showing two dots, representing the two error scores we just calculated. You can also click on those and see further information about the model in the sidebar on the right. Feel free to adapt the code and try some other parameters and models to see what happens.
Add additional regressors to ModelDB
If you played around a bit and tried to exchange the linear regression for an alternative with regularisation such as Ridge or Lasso, you will have encountered an error such as:
ImportError: No module named 'Ridge'
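For context, a hypothetical attempt that runs into this error could look like the following; the exact call depends on how you swap the regressor in:

# Hypothetical attempt: swapping LinearRegression for Ridge the same way as above.
# Somewhere along this path ModelDB raises the ImportError shown above.
ridge = mdb.linear_model.Ridge()
ridge.fit_sync(x_train, y_train)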
This happens because ModelDB does not support any regressors other than linear regression yet. Luckily, it is really easy to add them: open the file client/python/modeldb/sklearn_native/ModelDbSyncer.py and find the enable_sklearn_sync_functions function. Within it you will find an array containing all models that the fit_sync and predict_sync functionality should be attached to. Just add ElasticNet, Ridge, Lasso and any other model you want to use, as sketched below. After that, we can increment the version number of the library, since we added a new feature; you'll find it in client/python/setup.py.
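For illustration, the relevant spot in enable_sklearn_sync_functions might end up looking roughly like the sketch below; the actual variable names and surrounding code in ModelDbSyncer.py may differ:

# Sketch of the change inside enable_sklearn_sync_functions
# (client/python/modeldb/sklearn_native/ModelDbSyncer.py).
# The list name is illustrative -- adapt it to the array you find in the file.
from sklearn.linear_model import LinearRegression, ElasticNet, Ridge, Lasso

models_to_patch = [
    LinearRegression,
    ElasticNet,  # newly added
    Ridge,       # newly added
    Lasso,       # newly added
]
# fit_sync and predict_sync get attached to every class in this list.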
Now we run client/python/build_client.sh to package the new version and import it into our notebook using:
import pkg_resources
pkg_resources.require("modeldb==0.0.1a31")
Compare multiple models
Using ModelDB for only a handful of models is like cracking a nut with a sledgehammer. To become useful, we will need some more model-parameter combinations. So let's apply grid search using ModelDB.
import sklearn.metrics  # needed for make_scorer below

experiment = mdb.NewOrExistingExperiment(name="Grid Search", description="")
syncer = mdb.Syncer(project, experiment, mdb.NewExperimentRun("ElasticNet"))

model = mdb.linear_model.ElasticNet()
parameters = {'alpha': (10, 5, 2, 1, 0.5, 0.2, 0.1, 0)}
scorer = sklearn.metrics.make_scorer(mean_absolute_error)
clf = mdb.GridSearchCV(model, parameters, cv=2, scoring=scorer, error_score=100)

# Fit the gridsearch
clf.fit_sync(x_train, y_train)
test_pred = clf.predict(x_test)

# Compute various metrics on the testing set
#mae = SyncableMetrics.compute_metrics(clf, mean_absolute_error, y_test, test_pred, data.iloc[:,:-1].values, "predictionCol", 'target')
#mse = SyncableMetrics.compute_metrics(clf, mean_squared_error, y_test, test_pred, data.iloc[:,:-1].values, "predictionCol", 'target')

syncer.sync()
We are going to use ElasticNet, which is a regularised version of linear regression applying both L1 and L2 regularisation. The parameter alpha defines the overall strength of the regularisation, while l1_ratio defines the mixture between L1 and L2 regularisation. We use a couple of different values together with cross-validation. The remaining steps are similar to before, until we finally compute the mean absolute and mean squared error of the best predictor on the test set. After the code has completed, the ModelDB UI should look something like this:
We find one point per parameter/fold combination, so 5 * 5 * 5 = 125 points (a grid along the lines of the sketch after this paragraph). That is quite a few models to compare, and ModelDB supports us in doing so with an additional tool just below the default chart. In the select fields, choose continuous as the metric to display on the y-axis and the parameters alpha and l1_ratio as the x-axis and group-by values. After clicking compare you should get a bar chart comparing the average model performance across the calculated folds.
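For reference, a grid that would produce those 125 models (five values each for alpha and l1_ratio, 5-fold cross-validation) might look roughly like this; the specific parameter values are assumptions and not taken from the original run:

# Hypothetical grid covering both ElasticNet parameters:
# alpha scales the overall regularisation strength, l1_ratio mixes L1 and L2.
parameters = {
    'alpha': (10, 1, 0.5, 0.1, 0.01),
    'l1_ratio': (0.0, 0.25, 0.5, 0.75, 1.0),
}
clf = mdb.GridSearchCV(model, parameters, cv=5, scoring=scorer, error_score=100)
clf.fit_sync(x_train, y_train)  # 5 * 5 * 5 = 125 fitted models end up in ModelDB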
If you are more of a numbers person, you can also compare your models using a table. To do so, switch from the "Models and Charts" page to the "Models" page. You will find a table of all stored models which can be filtered and grouped using the same drag-and-drop mechanism. However, it is difficult to compare results using this view, since parameters don't get their own column. Just use "create table" and add all columns you're interested in to generate a customised table. First, drag the experiment_run_id to the filter section in the left sidebar to reduce the set of values. Then place the fields you are interested in in the Customize panel to generate a table based on these. However, you should not place too many combinations there, since the HTML table lacks scrolling functionality.
Summary
ModelDB makes it easy to structure your models and helps you analyse them to find the best combination. It is limited to scikit-learn (and SparkML) algorithms, but provides an easy and minimally invasive way of integration for the methods it implements. However, only a subset of scikit-learn's features is supported, and ModelDB supports neither complex models nor random search as an alternative to grid search. Even the regularised versions of linear regression weren't supported until recently, although it is easy to add such functionality.
But the concept of tightly integrated clients struggles by design: ModelDB overwrites scikit-learn's native functionality with custom extensions which are likely to break as scikit-learn releases new versions. Without adaptations this will prevent ModelDB users from updating to scikit-learn versions newer than 0.17.2. Extending an independent and changing framework is troublesome in general, especially if there is no powerful vendor behind it. ModelDB did not progress much this year, even though there are definitely areas that could use improvement, such as a more intuitive UI, missing components in the library and the lack of extensive documentation.
Conclusion
The concept of a framework which provides structure to machine learning experiments looks really promising. ModelDB uses a modern technology stack and provides many features for model comparison, but lacks maturity and documentation. The project works as inspiration and for experiments, but will need some more supporters to stay alive, especially since the clients will need continuous work to stay in sync with the machine learning libraries they support. Therefore, be careful using it in a production environment or be prepared to contribute some work. Luckily, some similar projects have popped up recently, such as datmo, DVC or Sacred. We'll probably have a look at those in the future.
Note: The info here is outdated, especially concerning ModelDB's support for only SparkML and scikit-learn; it refers to an older version of ModelDB. A slightly updated view on this topic can be found at https://www.inovex.de/blog/machine-learning-model-management/