By Andrea Spina — CTO at Radicalbit
Event stream processing systems’ main features are high throughput and low latency; this is the reason why we put Kubernetes, Seldon, and Apache Kafka together to develop a brand-new MLOps platform — Helicon, which is able to serve together thousands of ML models, regardless of the underlying technology.
I’m quite sure that some of you struggled to understand if your ML models in production would become obsolete. On top of the above-described platform, Radicalbit Helicon tracks down when it is the right time to train your model again and identifies the proper subset of fresh data needed. All of that, kindly provided by event streaming.
Finally, I’ll tell you about the ultimate Radicalbit’s goal: introduce the first production-ready serving layer for streaming machine learning models: special algorithms that are able to learn while they produce inference, and by that adapting over time to the unpredictable changes of our input data behaviour (because domains change continuously over time), enabling the highest prediction performance ever reached before at ridiculously fast pace.
How can any model be shipped by containers?
Radicalbit’s solutions are Kubernetes-based products; for this reason, we need to ship by containers our clients ML models.
In the IT world, where new technologies arise every day, the easiest method to ship intelligent and scalable models in production is by leveraging containers. What we need is just the list of the minimum technology requirements the model needs in order to run. We use this information to wrap the trained models into consistent containers. And that’s it. The model is already able to (1) run in production, (2) scale horizontally, and (3) produce inference. The good news is the awesome MLFlow project is able to give us these capabilities (and many others), so we make only sure we use them the right way.
However, in a containerized world, this is not enough anymore. Most of the time, in a production system, some sophisticated rollout strategies for ML models (like canary deployment) might be necessary, e.g. deploying different instances based on users’ subscription tiers. Luckily, in the last months, we measured the rise of a new generation of DevOps-driven MLOps tools, which enable an unprecedented set of features for production AI-driven systems.
We wanted to pick the best technology, so we picked Seldon core. This great open-source project is able to serve any kind of ML assets (even custom ones), offers out-of-the-box support to modern cloud-native technologies like service meshes, and even accepts Kafka as a communication bus.
We integrated Seldon in our Kubernetes orchestration and today backs up most of our MLOps features like dynamic serving, continuous training, and streaming machine learning, but how do we make interact streaming workloads and ML models? Radicalbit Helicon offers a codeless pipeline editor by which engineers and scientists work together to seamlessly manage and process data in motion.
Once the pipeline is built and tested (we provide a novel codeless debugging feature), the users can deploy the topology on top of the most beloved streaming engine. We provide out-of-the-box support for Apache Flink, Spark Streaming, and the Kafka streams library. Don’t worry, at the end what you will have is a new horizontally-scaled deployment on Kubernetes. We come up then with two main deployment families:
1) Your streaming workload topology
2) A set of Seldon orchestrated ML trained models
While the deployment runs, the topology asks the models for predictions and the models respond with high-throughput and low-latency. The communication channel is Kafka, of course. Through this solution, we can exchange inference asynchronously using Kafka messages conveyed by Kafka topics; therefore, we can leverage fault-tolerant and highly scalable features. Consequently, it is possible to scale infinitely, the only limit is our (or yours if you run us on-prem) Kafka’s infrastructure availability.
Let’s talk about Continuous Training
For Radicalbit, “Continuous Training” means providing our customers with a set of DevOps tools, to help them in establishing when it is the right time to retrain a model in production. We are driven by one key metric only: drift. For us, drift is not only upstream changes in the data structure but a change in data behavior and its domain evolution.
We identify two kinds of drifts in data. The first one is the so-called “Data Drift Concept”; at this stage, the input data behavior is monitored. As you can see in the figure below, we track the behavior of the value of the feature x we are monitoring, and we do not consider the inference values at all. However, when the input data distribution changes in unprecedented ways, your model might not be good anymore, so a new training session would be required. ADWIN (Adaptive Windowing) is a well-known method to spot these drifts by monitoring data streams and — in the figure below — the vertical red lines represent the ADWIN activation.
However, this approach might not properly work in some cases — for instance, for stock market data, where the trend concept is as meaningful as the values the feature assumes. Where “how the values change over time” represents information as well, data drift is not a perfect fit anymore. So we need to move to a new monitoring strategy, where the values of the inference are also important. Thus, we built a detection model for the second kind of drift we identified, the so-called “Concept Drift”.
By monitoring the prediction trend, we can understand how well we are predicting. However, in order to understand, we need feedback, thus a piece of information stating the true class for a determined and already emitted prediction. By this hint, the platform is then able to compute the error rate. Methods like drift detection model track down sudden changes in the error rate value (as in the figure below as vertical red lines).
If detected, the platform alerts the clients that it is time to retrain their ML model with fresh data. Radicalbit Helicon offers a tiered set of notifications or triggers useful to activate actions (e.g. execute remote continuous integration pipelines aimed to re-train a model).
Let’s talk about Online Learning
How do we train machine learning models with streaming data?
Like I eagerly anticipated in this article preface, Radicalbit aims to build the first production-ready platform for streaming machine learning models. This is a completely different approach for many reasons, but the obvious one is that it changes the input data that our models’ crunch (therefore, goodbye traditional datasets!) while they learn. We do not have historical and finished data, but an unfinished and continuous flow of information that changes over time. Therefore the chances of making valuable a-priori assumptions on the input data are very limited. The objective is to make our models dynamic in behavior so they can adapt to the input data changes over time.
On the other hand, concerning Supervised Learning, we are driven by the same requirement of drifts, which is feedback. In this situation, also the paradigm is changing; indeed, for online machine learning, the prediction is firstly computed, then delivered to the downstream, and finally, in reaction to incoming feedback, the models update the learning functions based on the measured loss.
The main difference with traditional batch learning tasks is that online models predict and train the algorithm at the same time. In our latest R&D project, we built one fit for an online neural network algorithm based on a hedge algorithm. The main feature of hedging lies in the expert-advice approach: each network layer is able to provide an output prediction, shallower layers will be more valuable in earlier stages of the algorithm lifetime; then, the more events the model crunches, the more credibility will be assigned to deeper layers output predictions.
Radicalbit also proposes a first distributed implementation for it, as shown in the picture below.
The main advantage of using streaming machine learning algorithms in our platform is that you can dramatically reduce the effort of periodically re-train models because they fix themselves over time. Additionally, by building the right serving layer for streaming models, we are about to prove that for many contexts streaming algorithms can outperform traditional batch learning applications.