10 MLops platforms to manage the machine learning lifecycle

For most professional software developers, using application lifecycle management (ALM) is a given. Data scientists, many of whom do not have a software development background, often have not used lifecycle management for their machine learning models. That’s a problem that’s much easier to fix now than it was a few years ago, thanks to the advent of “MLops” environments and frameworks that support machine learning lifecycle management.

What is machine learning lifecycle management?

The easy answer to this question would be that machine learning lifecycle management is the same as ALM, but that would also be wrong. That’s because the lifecycle of a machine learning model is different from the software development lifecycle (SDLC) in a number of ways.

To begin with, software developers more or less know what they are trying to build before they write the code. There may be a fixed overall specification (waterfall model) or not (agile development), but at any given moment a software developer is trying to build, test, and debug a feature that can be described. Software developers can also write tests that make sure that the feature behaves as designed.

By contrast, a data scientist builds models by doing experiments in which an optimization algorithm tries to find the best set of weights to explain a dataset. There are many kinds of models, and currently the only way to determine which is best for a given dataset is to try the candidates and compare the results. There are also several possible criteria for model “goodness,” and no real equivalent to software tests.
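As a minimal sketch of what "trying them all" looks like in practice, the Python below fits two candidate models to the same training data and scores each on held-out data using two different criteria. The data, the two model types, and the function names are all made up for illustration; a real project would use a machine learning library and far more candidates.

```python
from statistics import mean

def fit_constant(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    avg = mean(ys)
    return lambda x: avg

def fit_linear(xs, ys):
    """Single-feature least squares: y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

def mse(model, xs, ys):
    """Mean squared error: penalizes large misses heavily."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def max_abs_error(model, xs, ys):
    """Worst-case miss: a different notion of model 'goodness'."""
    return max(abs(model(x) - y) for x, y in zip(xs, ys))

# Toy data, invented for this example.
train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
test_x, test_y = [5, 6], [10.1, 11.9]

candidates = {"constant": fit_constant, "linear": fit_linear}
results = {}
for name, fit in candidates.items():
    model = fit(train_x, train_y)
    results[name] = {
        "mse": mse(model, test_x, test_y),
        "max_abs_error": max_abs_error(model, test_x, test_y),
    }

# Pick a winner by one criterion; another criterion could rank differently.
best_by_mse = min(results, key=lambda name: results[name]["mse"])
print(best_by_mse, results)
```

Even in this toy case, the winner is only known after every candidate has been trained and scored, which is why the number of experiments grows so quickly.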

Unfortunately, some of the best models (deep neural networks, for example) take a long time to train, which is why accelerators such as GPUs, TPUs, and FPGAs have become important to data science. In addition, a great deal of effort often goes into cleaning the data and engineering the best set of features from the original observations, in order to make the models work as well as possible.

Keeping track of hundreds of experiments and dozens of feature sets isn’t easy, even when you are using a fixed dataset. In real life, it’s even worse: Data often drifts over time, so the model needs to be tuned periodically.
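One rough way to notice drift is to compare recent values of a feature against the values seen at training time. The sketch below is a crude heuristic, not a formal statistical test: it measures how far the recent mean has moved, in units of the training standard deviation. The function names, the threshold of 2.0, and the sample values are assumptions chosen for illustration.

```python
from statistics import mean, stdev

def mean_shift_in_std_units(reference, current):
    """How far the current mean sits from the reference mean,
    measured in reference standard deviations."""
    return abs(mean(current) - mean(reference)) / stdev(reference)

def has_drifted(reference, current, threshold=2.0):
    """Flag drift when the shift exceeds an (arbitrary) threshold."""
    return mean_shift_in_std_units(reference, current) > threshold

# Invented values for one numeric feature.
training_values = [10.0, 11.0, 9.0, 10.5, 9.5]
recent_similar = [10.2, 9.8, 10.1, 9.9]
recent_shifted = [14.0, 15.0, 13.5, 14.5]

print(has_drifted(training_values, recent_similar))
print(has_drifted(training_values, recent_shifted))
```

A check like this only looks at one feature's mean; production monitoring typically tracks many features and the model's own prediction quality as well.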

There are several different paradigms for the machine learning lifecycle. Often, they start with ideation, continue with data acquisition and exploratory data analysis, move from there to R&D (those hundreds of experiments) and validation, and finally to deployment and monitoring. Monitoring may periodically send you back to step one to try different models and features or to update your training dataset. In fact, any of the steps in the lifecycle can send you back to an earlier step.

Machine learning lifecycle management systems try to rank and keep track of all your experiments over time. In the most useful implementations, the management system also integrates with deployment and monitoring.
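To make "rank and keep track" concrete, here is a minimal in-memory sketch of an experiment log in Python. It is not the API of any particular MLops product; the `Run` and `ExperimentLog` names, the run names, the parameters, and the accuracy figures are all hypothetical. Real systems add persistent storage, artifact and dataset versioning, and the deployment hooks mentioned above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Run:
    """One experiment: what was tried and how it scored."""
    name: str
    params: dict
    metrics: dict
    started_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

class ExperimentLog:
    """Hypothetical tracker that records runs and ranks them by a metric."""

    def __init__(self):
        self.runs = []

    def record(self, name, params, metrics):
        run = Run(name, params, metrics)
        self.runs.append(run)
        return run

    def ranked(self, metric, higher_is_better=True):
        # Ignore runs that never reported this metric.
        scored = [r for r in self.runs if metric in r.metrics]
        return sorted(scored, key=lambda r: r.metrics[metric],
                      reverse=higher_is_better)

# Made-up runs and scores, for illustration only.
log = ExperimentLog()
log.record("logreg-baseline", {"C": 1.0}, {"accuracy": 0.81})
log.record("gbt-depth3", {"max_depth": 3}, {"accuracy": 0.86})
log.record("gbt-depth6", {"max_depth": 6}, {"accuracy": 0.84})

leaderboard = [run.name for run in log.ranked("accuracy")]
print(leaderboard)
```

Storing the parameters next to the metrics is the important design choice here: it lets you reproduce the best run later instead of only knowing that it existed.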

Copyright © 2020 IDG Communications, Inc.
