Optimize ML training costs on the cloud using spot instances

Save more than 70% on costs by using spot instances on the cloud. [Source: Unsplash]

Organizations of every size understand that using the public cloud is a trade-off between convenience and cost. While cloud providers like Google, Amazon, and Microsoft have greatly lowered the barrier to entry for machine learning, GPUs still come at a premium.

There is a growing fear in the machine learning community that the true power of machine learning still rests in the hands of a few. The flagship example is OpenAI’s massive GPT-3 model, which has 175 billion parameters and a memory footprint of 350GB, and reportedly cost at least $4.6 million to train. The trend also looks…


Strike a balance between rapid results and high-quality coding.

A wall which says “Until debt tear us apart”
Technical Debt can be hard to manage in machine learning. Source: Unsplash

Okay, let's make it clear at the start: this post is NOT intended for people doing one-off, siloed projects, such as entering Kaggle competitions or building hobby projects in Jupyter notebooks to learn the trade. Throwaway, quick-and-dirty script code has obvious value there, and it has its place. Rather, this post is intended for ML practitioners working in a production setting. So if you’re working in an ML team that is struggling to manage technical debt while pumping out ML models, this one’s for you.

A typical workflow


It can be frustrating to reproduce machine learning results [Source: Unsplash]

It is now widely agreed that reproducibility is an important aspect of any scientific endeavor. With Machine Learning being a scientific discipline, as well as an engineering one, reproducibility is equally important here.

There is widespread fear in the ML community that we are living through a reproducibility crisis. Efforts like the Papers with Code Reproducibility Challenge signaled a clear call to action for practitioners, after a 2016 Nature survey found that more than 70% of researchers had tried and failed to reproduce another scientist's experiments.

While a lot of the talk amongst the community has centered on reproducing machine learning results in research, there has been less focus on the…


Predictive maintenance can help, but does it always make sense to use machine learning? (Photo by rawpixel on Unsplash)

Predictive maintenance has spread far enough through the innovation cycle that you have most likely come across the term in one way or another. While the jury is still out on the best practices for implementing predictive maintenance, there is no doubt about the business value that drives much of the fanfare around the buzzword.

This, coupled with the increasing accessibility of machine learning tools for handling large-scale sensor data, has led to many attempts at building ML pipelines for predicting failures and optimizing maintenance schedules. …

Hamza Tahir

Software Engineer turned ML Engineer. Interested in building tech products end-to-end. Co-creator of PicHance, you-tldr, and ZenML.
