Access and Persist MLflow running in Google Colab
Conducting experiments for ML projects involves constant feedback loops and changes at almost every step of the process: tuning a large number of parameters, adding new features or deleting existing ones, and handcrafting more sophisticated ones. Every model also comes with a myriad of additional metadata, and keeping track of it all manually would be a nightmare. Early in my career I did exactly that, first with pen and paper, then with complex Excel files, which were at least searchable, unlike the physical medium. Managing this metadata cost me a lot of time and mental energy. Then I discovered MLflow, which has been my savior ever since.
I have much more to say about the platform, but I’ll save that for another article. For now, I’ll just note that MLflow was created by Databricks, the company founded by the creators of Apache Spark. Its main goal is to make it easier for data scientists (like me) and engineers (also me) to manage the complexities involved in machine learning projects, such as experiment tracking, model versioning, and deployment, allowing us to focus on the more crucial aspects of the work.
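To make the idea of experiment tracking concrete, here is a minimal sketch of what logging a run with MLflow looks like. The run name, parameters, and metrics are placeholders for illustration, not values from any real experiment:

```python
import mlflow

# Open a run and log a few illustrative parameters and metrics.
# Names and values below are made up for demonstration purposes.
with mlflow.start_run(run_name="baseline-model"):
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("val_accuracy", 0.87)
    mlflow.log_metric("val_loss", 0.42)
```

Everything logged this way ends up in MLflow's tracking store, which is exactly the piece we need to keep alive when the environment itself is temporary.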
In this short blog post, I will show how to use MLflow in an ephemeral environment like Google Colab and still be able to save the tracking data so that your experiments survive after the session ends.
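As a preview of the general idea, the sketch below keeps MLflow's tracking data in a SQLite database stored on Google Drive, so runs are not lost when the Colab VM is recycled. The Drive path and experiment name are placeholders, and this is just one way to persist the data, not necessarily the exact setup walked through later:

```python
import mlflow
from google.colab import drive

# Mount Google Drive so files written there outlive the Colab session.
drive.mount("/content/drive")

# Point MLflow at a SQLite backend stored on Drive (path is a placeholder).
mlflow.set_tracking_uri("sqlite:////content/drive/MyDrive/mlflow/mlruns.db")
mlflow.set_experiment("colab-demo")

# Any run logged from now on is recorded in the Drive-backed database.
with mlflow.start_run():
    mlflow.log_param("example_param", 1)
    mlflow.log_metric("example_metric", 0.5)
```

Note that this only persists the tracking metadata; logged artifacts still default to a local directory on the Colab VM, so they would need their own location on Drive (or another store) if you want to keep them as well.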