Sorry, Batman is Busy

Ian Hellström | 10 May 2021 | 9 min read

The Riddler is back in town and no one can deploy a single machine learning model. Throwing more resources at the problem won’t solve anything and Commissioner Gordon knows it. There is only one man who can save Gotham from the pitfalls of perennial productionless prototypes: Batman!

“Sorry, Batman is busy,” Alfred informs Commissioner Gordon.

That’s right. Batman is busy. Busy being fictional.

Portions of this article are also available as a recorded conference talk (28 min) at Data Summit Connect.

The success of machine learning lies in its operationalization: it is where companies see any sustained return on investment, but it also happens to be where the majority of initiatives fail. Reported failure rates lie in the region of 85% to 87%, or roughly 9 out of 10, in line with the failure rates of data transformation projects. Model deployment issues are a barrier to adoption for more than a quarter of companies, and only one in ten companies reports significant pay-offs from machine learning.

What’s stopping companies? GOTHAM!

When it comes to the operationalization of machine learning models, there are six (5 + 1) traps that companies can learn to spot and subsequently avoid:

Governance, or the Lack Thereof

Every time the Riddler is caught by the dynamic duo and put back in prison, he somehow manages to escape to terrorize Gotham City again and again.

While it makes a great setup for another episode, it illustrates a pattern common enough in business to be worth mentioning: dealing with symptoms instead of the cause. Issues related to data fall squarely into that category, especially when they are dealt with as one-offs.

Data quality is by far the most important and most overlooked aspect of machine learning. It must be assessed along several dimensions:

  • availability
  • timeliness
  • completeness
  • validity
  • consistency
  • correctness
  • trustworthiness

Please see the accompanying post for more details on each of these dimensions.
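Several of these dimensions lend themselves to automated checks inside the data pipeline. As a minimal sketch (the record schema, field names, and thresholds are all hypothetical), a per-record validator might look like this:

```python
from datetime import datetime, timedelta, timezone

# Minimal sketch of automated data quality checks along a few of the
# dimensions above. The record schema ("user_id", "amount", "event_time")
# and the thresholds are made up for illustration.

def check_record(record, now=None):
    """Return a list of failed quality dimensions for one record."""
    now = now or datetime.now(timezone.utc)
    failures = []

    # Completeness: all expected fields must be present and non-null.
    for field in ("user_id", "amount", "event_time"):
        if record.get(field) is None:
            failures.append(f"completeness:{field}")

    # Validity: values must fall within sensible ranges.
    amount = record.get("amount")
    if amount is not None and not (0 <= amount <= 1_000_000):
        failures.append("validity:amount")

    # Timeliness: data older than a day is stale for this hypothetical use case.
    event_time = record.get("event_time")
    if event_time is not None and now - event_time > timedelta(days=1):
        failures.append("timeliness:event_time")

    return failures


record = {"user_id": "u1", "amount": -5, "event_time": datetime.now(timezone.utc)}
print(check_record(record))  # ['validity:amount']
```

In practice, such checks would run continuously as part of the DataOps layer rather than as one-off scripts, which is exactly the difference between governance and symptom-chasing.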

Operationalization as an Afterthought

You bought the tech, you brought in the talent, now you just lean back and wait for the magic to happen. Only nothing happens.

The essence of the operationalization as an afterthought trap is the misguided belief that a Jupyter notebook on a data scientist’s laptop is a solid enough foundation for a production-grade solution.

D/MLOps

To deploy a machine learning model to production repeatedly, you need:

  • ML: the model and all relevant features
  • DevOps: the ability to automatically test and deploy software
  • MLOps (or ModelOps): the ability to automatically test and deploy entire reproducible machine learning pipelines replete with orchestration and model performance management
  • DataOps: the ability to automatically check, version, and annotate data sets, deploy entire data pipelines with orchestration, and manage schemas
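To make the MLOps and DataOps bullets concrete, here is a minimal sketch of a reproducible pipeline: each step is a plain function, and the runner records a hash of its input data so the run can be reproduced. Every name is illustrative, and a real setup would delegate orchestration, versioning, and deployment to dedicated tooling rather than a single script.

```python
import hashlib
import json

# Illustrative pipeline steps: validate -> train -> evaluate.

def validate(data):
    # Schema check: every row must be a (feature, target) pair.
    assert all(len(row) == 2 for row in data), "schema check failed"
    return data

def train(data):
    # Stand-in "model": the mean of the target column.
    return sum(y for _, y in data) / len(data)

def evaluate(model, data):
    # Mean absolute error of the stand-in model.
    return sum(abs(y - model) for _, y in data) / len(data)

def run_pipeline(data):
    # Hash the input so the exact run can be reproduced later.
    data_hash = hashlib.sha256(json.dumps(data).encode()).hexdigest()[:8]
    model = train(validate(data))
    error = evaluate(model, data)
    return {"data_hash": data_hash, "model": model, "error": error}


result = run_pipeline([[1, 2.0], [2, 4.0], [3, 6.0]])
print(result["model"], round(result["error"], 2))  # 4.0 1.33
```

The point is not the toy model but the shape: data checks, training, and evaluation live in one versioned, rerunnable unit rather than in a notebook.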

As Google already noted in 2014, a production machine learning system consists of at least 95% glue code; at most 5% of any production ML system consists of the model code. The model itself is but a small circuit in a much larger Batcomputer.

That is why I advocate the combination of DataOps and MLOps: D/MLOps. Data can exist without machine learning, but machine learning without data is impossible.

The problem therefore is not so much that data scientists cannot operationalize models, it is that companies hire data scientists with the expectation that it’s a trivial matter, whereas the operationalization is anything but.

There are three options to move forward:

  1. Introduce a hard split between science and engineering with the requisite handovers from data engineering to data science and from data science to machine learning engineering. I do not recommend this approach; we shall come back to its consequences in the M of GOTHAM: management.
  2. Train data and/or machine learning engineers to build models. More on that when we hit the H of GOTHAM: hiring practices.
  3. Give data scientists the tools to deploy production-grade pipelines for training, tuning, and deploying. That requires data science-friendly technologies that go from data science all the way up to D/MLOps through DevOps, MLOps, and DataOps.

However, even with data governance and data science-friendly tools for operations, companies are still liable to four additional stumbling blocks within the operationalization as an afterthought trap.

Measure Business Metrics: Before and After

Measure before and after in terms of business metrics, not model performance metrics. Model performance metrics rarely map one-to-one to KPIs, so tweaks that eke out the last drop of model performance may have no measurable effect outside the lab. Unless what happens in the lab also applies to real life, the effort is wasted.

Operationalization Constraints

It may still be that the intended operationalization cannot be accomplished in practice due to constraints that have little to do with the machine learning infrastructure. For instance, you have a fixed time window for payment verification, but the (mobile) networks are too slow or unreliable; or you want to grab faulty intermediate goods off an assembly line, but quality control does not want a robot arm moving quickly over delicate components during assembly.

Such cases can be avoided if making the model fully operational is part of the prototype or at least part of the problem definition. Thinking a problem through is an easy step to skip when you’re itching to build a fancy model. It sounds like common sense, but all too often common sense is a superpower.

Change Management

Companies that see exponential returns on their machine learning investments spend more than half of their budgets on operationalization and change management, yet fewer than one in twelve are in that illustrious group.

Even if you can deploy models to production as intended, the business may need to adapt its processes while algorithms continue to garner trust. It is important for machine learning products to show the benefits to their immediate users in the business. If users perceive such automated solutions as threats, they may fight them. Productivity improvements are an extremely sensitive topic, as they can easily be seen as a means to automate people’s livelihoods away. Showing off the value of machine learning with humans in the loop through internal A/B testing can allay such fears.

Sharing the credit with the business is crucial: measure KPIs before and keep monitoring these after. Ultimately, it’s the business that gets to claim any of the monetary benefits though.
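The internal A/B testing described above can be sketched with a simple two-proportion z-test on a business KPI such as conversion rate. The groups and all numbers below are hypothetical:

```python
import math

# Hypothetical internal A/B test: the control group keeps the existing
# process, the treatment group adds the model with a human in the loop.

def z_test_two_proportions(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test on conversion counts and sample sizes."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


z = z_test_two_proportions(1200, 50_000, 1350, 50_000)
print(f"z = {z:.2f}")  # |z| > 1.96 -> significant at the 5% level
```

Running the comparison on the KPI itself, rather than on offline model metrics, is what lets the business see and claim the benefit.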

Business Context

Copying without thinking about the business context is the final obstacle within operationalization as an afterthought. Out of the box, many standard evaluation tools and techniques look at average performance. That is unacceptable in healthcare, medicine, finance, and manufacturing, where safety and compliance are paramount and the worst-case performance of a machine learning model often matters more. So, don't copy blindly, assuming everything has already been figured out: adapt evaluation criteria to the business context.
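To illustrate why worst-case performance matters, here is a sketch contrasting average accuracy with per-group (worst-case) accuracy. The group labels and data are made up; the point is that a model with good average performance can still fail completely on a critical subgroup:

```python
from collections import defaultdict

def group_accuracies(y_true, y_pred, groups):
    """Accuracy per group label."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        hits[g] += int(t == p)
    return {g: hits[g] / totals[g] for g in totals}


y_true = [1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "A", "A", "B", "B"]

per_group = group_accuracies(y_true, y_pred, groups)
average = sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)
worst = min(per_group.values())
print(f"average={average:.2f} worst-case={worst:.2f}")  # average=0.75 worst-case=0.00
```

An average accuracy of 75% hides the fact that group B is misclassified every single time, which in a safety-critical setting is the only number that counts.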

Technology as a Universal Solution

Batman may be able to spring any trap thanks to his unique utility belt, but that’s only because the laws of physics do not apply to fiction.

It is a common misconception that a new piece of technology will change everything for the better, but what it really does is add more work without a huge pay-off. We can trace a lot of irrationality around technology to various cognitive biases.

Dunning-Kruger Effect

Newcomers to production machine learning underestimate its complexity, whereas experts from related fields, such as DevOps, underestimate how much existing technology must be adapted to suit ML's unique needs. This is the Dunning-Kruger effect applied to machine learning infrastructure.

Overconfidence in enterprise software is not unheard of, and machine learning is no exception: more than half of companies identify as mature adopters of ML technologies, despite the high failure rates mentioned earlier.

Present Bias

Most of the cost is up front with infrastructure, integration, customization, and of course training. The spoils of machine learning are almost entirely delayed. This does not square with the desire to see results almost immediately. Our tendency to settle for smaller rewards right now rather than delay gratification may force data scientists to rush to build prototypes that never see the light of day, because operationalization is, well, an afterthought.

IKEA Effect

Many prefer to build a platform from scratch due to the IKEA effect, which makes people value what they build or assemble themselves more, regardless of the result, and without knowing what they are getting themselves into. It took engineers inside tech companies many years and many iterations to arrive at the platforms that so many others now seek to emulate without considering their own needs. DIY'd platforms are often held together by duct tape and prayers, but mostly the latter.

Even stitching together components requires quite a bit of effort, and it depends on who does the stitching. It's easy for data scientists to ignore security or rely entirely on brittle shell scripts that must be started manually and supervised at all times. Likewise, forcing the Dev(Sec)Ops stack onto data scientists is no solution either, as operations engineers already feel burnt out and overwhelmed by the complexity of the ecosystem.

Whatever the solution, it is not entirely technological. Talent is crucial too.

Hiring Practices

No Batman is an island: full stack is a fantasy. It’s all about teamwork.

Plenty of companies seek Batpersons who can do everything but do not exist. What companies really need are cross-functional product teams of data engineers, data scientists, machine learning engineers, backend and frontend engineers, a product manager, and possibly designers. Such a team focuses on a particular domain or product area.

Too often, machine learning is run like a project, a one-off initiative with an end date rather than a product with a team that is responsible for ongoing development, operations, support, and maintenance. The outcome of a project is either handed off to engineers to make it production-worthy and scalable, which is the operationalization as an afterthought trap, or it remains a notebook linked to from a slide deck, where it generates no value.

When viewed as a collection of projects, machine learning will never be anything but a series of one-offs that fail to live up to expectations, since ownership is hazy and operationalization an afterthought.

Batman is not for hire. But what about training people inside the company?

Analytics is often where people look first. A lot of people mistake machine learning for advanced analytics, which it is not. Yes, SQL and data visualization are important to both, but that’s where the similarities end. Going from a Tableau dashboard and a bit of SQL to a production machine learning system is more than a few Coursera courses away.

It is possible to train people, but it’s often much easier to turn a great data or machine learning engineer into someone who can build models as well as productionize them than to turn data scientists into people who care about operations, especially without proper abstractions or the right attitude.

Attitude(s)

The flip side of unrealistic expectations towards data scientists is the often unrealistic expectations of data scientists themselves. Academia, online courses, data science competitions, and the media paint a picture of machine learning as being restricted to mere modelling. Novices only want to try out every fashionable technology instead of doing the hard work that leads to successful ML-powered products. If that attitude persists, it turns a data scientist into what I call a rogue data scientist, someone who focuses on machine learning modelling and ignores anything considered a nuisance, such as operations.

Organizations contribute to the problem too. In quite a few companies, engineering is a cost centre, whereas data science is attached to the business. Data and machine learning engineers are therefore viewed differently from those who analyse data and build models, even if those models never make it to production.

It’s only been five years since the first details of tech companies’ cloud-native end-to-end machine learning platforms were published. While plenty of commercial platforms exist, the full user experience is not quite there yet, especially since data scientists are often indifferent to productionization, and operations engineers are comfortable with the technology as is. Such attitudes can in fact reinforce the division between science and engineering.

Machine learning has long since moved from the lab to the real world, where it affects billions of people daily. The same attitudes that worked in research do not necessarily work in industry. We must reward the often invisible yet essential work that goes into making machine learning successful: data collection, data cleansing, documentation, ongoing maintenance, labelling, and so on.

That is where management comes in.

Management

While often ignored, management is crucial, as no machine learning product can be handled by a single team that is decoupled from the rest of the business. Such a solo team often consists mostly of data scientists, who can easily go rogue without supervision, and that's a failure of management.

Support from executives is important, but only if they understand data and machine learning as a core aspect of and an asset to the organization, and they don’t just run after the latest fad. Management plays a starring role in both data governance and organizational structure. An artificial split between research or science and engineering is an anti-pattern that leads to many unnecessary handovers that slow down development and often lead to failures in deployment.

Making data ML-ready and machine learning successful takes a lot of effort, a lot of team effort. That is only possible if management sets the team up for success with the right mix of skills, all the necessary technology, and of course support.

Summary

To make machine learning stick and succeed, think in terms of machine learning products, not one-off projects. Production usage must be the goal: operationalization and ongoing maintenance and support cannot be ignored until afterwards. With a product, there is no afterwards.

Work backwards from that goal: a machine learning solution in production, embedded in the infrastructure and integrated into the business. Then include all stages, from data through model development to the deployment of the entire workflow that leads up to a model served in production: deploy the factory, not just the final product.

Because Batman won’t save anyone.

And you know it.

Think product: from data through development to deployment.