MLOps

Eliminate technical debt with iterative, reproducible pipelines

Photo by EJ Strat on Unsplash

Today, machine learning powers some of the world's most valuable organizations (Facebook, Alphabet, Amazon, Netflix, etc.). However, the vast majority of enterprises still struggle to productionize ML, even when they possess hyper-specific datasets and exceptional data science departments.

Going one layer deeper into how ML propagates through an organization reveals the problem in more detail. The graphic below shows an admittedly simplified representation of a typical setup for machine learning:

Optimize ML training costs on the cloud using spot instances

Spot instances can cut cloud training costs by more than 70%. [Source: Unsplash]

Every organization, at any scale, understands that leveraging the public cloud is a trade-off between convenience and cost. While cloud providers like Google, Amazon, and Microsoft have dramatically lowered the barrier to entry for machine learning, GPU costs still come at a premium.

There is a growing fear in the machine learning community that the true power of machine learning still rests in the hands of a few. The flagship example is OpenAI’s massive GPT-3 model, containing 175 billion parameters, with a memory footprint of 350 GB and a reported training cost of at least $4.6 million. The trend also looks…
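To make the spot-instance trade-off concrete, here is a minimal cost-comparison sketch. The hourly rates below are hypothetical placeholders chosen for illustration, not actual cloud prices:

```python
# Illustrative comparison of on-demand vs. spot pricing for a training job.
# The rates are hypothetical examples, not real cloud provider prices.

def training_cost(hourly_rate: float, hours: float) -> float:
    """Total cost of a training job at a given hourly rate."""
    return hourly_rate * hours

on_demand_rate = 3.00   # hypothetical $/hour for an on-demand GPU instance
spot_rate = 0.90        # hypothetical discounted spot $/hour

hours = 100  # length of the training job in hours
on_demand = training_cost(on_demand_rate, hours)
spot = training_cost(spot_rate, hours)

savings = 1 - spot / on_demand
print(f"On-demand: ${on_demand:.2f}, Spot: ${spot:.2f}, Savings: {savings:.0%}")
# → On-demand: $300.00, Spot: $90.00, Savings: 70%
```

The catch, of course, is that spot instances can be interrupted at any time, so the savings only materialize if your training jobs checkpoint and resume gracefully.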

Strike a balance between rapid results and high-quality coding.

A wall which says “Until debt tear us apart”
Technical Debt can be hard to manage in machine learning. Source: Unsplash

Okay, let's make this clear from the start: this post is NOT intended for people doing one-off, siloed projects, such as participating in Kaggle competitions or learning the trade through hobby projects in Jupyter notebooks. Quick, throw-away script code has obvious value there, and it has its place. Rather, this post is intended for ML practitioners working in a production setting. So if you’re on an ML team that is struggling to manage technical debt while pumping out ML models, this one’s for you.

A typical workflow

It can be frustrating to reproduce machine learning [Source: Unsplash]

It is now widely agreed that reproducibility is an important aspect of any scientific endeavor. Since machine learning is a scientific discipline as well as an engineering one, reproducibility matters just as much here.

There is widespread fear in the ML community that we are living through a reproducibility crisis. Efforts like the Papers with Code Reproducibility Challenge signaled a clear call to action for practitioners after a 2016 Nature survey revealed that more than 70% of researchers had tried and failed to reproduce another scientist's experiments.
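One of the most basic building blocks of reproducibility is fixing random seeds so that a "random" result can be regenerated exactly. The sketch below shows this for a train/test shuffle; real ML pipelines also need to pin data versions, library versions, and hardware-dependent operations:

```python
# A minimal sketch of one reproducibility basic: seeding randomness so a
# shuffle (e.g. for a train/test split) is exactly repeatable.
import random

def sample_split(n_items: int, seed: int) -> list:
    """Deterministically shuffle item indices for a train/test split."""
    rng = random.Random(seed)  # isolated, seeded generator
    indices = list(range(n_items))
    rng.shuffle(indices)
    return indices

# The same seed always yields the same shuffle order.
run_a = sample_split(10, seed=42)
run_b = sample_split(10, seed=42)
print(run_a == run_b)  # → True
```

Using an isolated `random.Random(seed)` instance, rather than seeding the global generator, also keeps the shuffle deterministic even when other code consumes random numbers in between.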

While a lot of the talk amongst the community has centered on reproducing machine learning results in research, there has been less focus on the…

Predictive maintenance can help but does it always make sense to use machine learning? (Photo by rawpixel on Unsplash)

Predictive maintenance has diffused through the innovation cycle to the point that you have very likely encountered the term in one way or another. While the jury is still out on exactly how best to implement predictive maintenance, there is no doubt about the business value that drives much of the fanfare around the buzzword.

This, coupled with the increasing accessibility of the machine learning toolbox for handling large-scale sensor data, has led to many attempts at building ML pipelines to predict failures and optimize maintenance schedules. …
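Before reaching for ML at all, it is worth noting how far a simple baseline can go. The sketch below flags a machine when the rolling mean of a sensor reading drifts past a threshold; the readings and the threshold are made up for illustration, and a real pipeline would learn thresholds (or richer models) from labeled failure data:

```python
# A deliberately simple, non-ML baseline for failure prediction: alert when
# the rolling mean of a sensor reading exceeds a fixed threshold.
# Sensor values and threshold are hypothetical, for illustration only.
from collections import deque

def rolling_alerts(readings, window=3, threshold=80.0):
    """Return indices where the rolling mean exceeds the threshold."""
    buf = deque(maxlen=window)  # keeps only the last `window` readings
    alerts = []
    for i, value in enumerate(readings):
        buf.append(value)
        if len(buf) == window and sum(buf) / window > threshold:
            alerts.append(i)
    return alerts

# Vibration readings trending upward toward a (hypothetical) failure.
sensor = [60, 62, 61, 70, 78, 85, 90, 95]
print(rolling_alerts(sensor))  # → [6, 7]
```

If an ML model cannot beat a baseline like this on your failure data, the added pipeline complexity may not be worth it.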

Hamza Tahir

Software Engineer turned ML Engineer. Interested in building tech products end-to-end. Co-creator of PicHance, you-tldr, and ZenML.
