Hassam Tahir
--> Expertise required = Go & Python
Hyperparameter tuning through standalone Katib
Katib (also known as "Kubeflow Katib") is an open-source hyperparameter tuning system developed by the Kubeflow community. It automates hyperparameter optimization for machine learning models running on Kubernetes clusters. Hyperparameter tuning is a critical step in the machine learning workflow: it searches for the combination of hyperparameters that gives the model its best performance and generalization.
Here are some key features and concepts of Katib:
Hyperparameter Tuning: Katib allows users to define the hyperparameters and their search spaces for a machine learning model. It supports various search algorithms, including random search, grid search, and Bayesian optimization, among others. Katib automatically explores different combinations of hyperparameters to find the configuration that yields the best performance.
Kubernetes Integration: As part of the Kubeflow ecosystem, Katib leverages Kubernetes for orchestrating the hyperparameter tuning jobs. It takes advantage of Kubernetes' scalability and resource management capabilities, allowing users to efficiently conduct distributed hyperparameter tuning experiments.
Early Stopping: Katib supports early stopping strategies that terminate underperforming trials before they finish, saving time and resources.
Metric Collection: During hyperparameter tuning, Katib collects and logs metrics from each training run. These metrics are used to evaluate the performance of different hyperparameter configurations.
Integration with Other Kubeflow Components: Katib works with training code written in machine learning frameworks such as TensorFlow and PyTorch, and with other Kubeflow components, enabling seamless experimentation and training on Kubernetes.
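To make the search idea concrete, here is a minimal sketch of random search, one of the algorithms listed above. It is not Katib's implementation: the search space, the parameter names (lr, batch_size) and the toy objective are all hypothetical stand-ins for a real training run.

```python
import random

# Hypothetical search space; names and ranges are illustrative only.
SEARCH_SPACE = {
    "lr": ("double", 0.001, 0.1),
    "batch_size": ("int", 16, 128),
}


def sample_params(space, rng):
    """Draw one random configuration from the search space."""
    params = {}
    for name, (kind, low, high) in space.items():
        if kind == "int":
            params[name] = rng.randint(low, high)
        else:
            params[name] = rng.uniform(low, high)
    return params


def toy_objective(params):
    """Stand-in for a training run; highest near lr=0.01, batch_size=64."""
    return 1.0 - abs(params["lr"] - 0.01) * 5 - abs(params["batch_size"] - 64) / 1000


def random_search(space, objective, max_trials, seed=0):
    """Evaluate max_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(max_trials):
        params = sample_params(space, rng)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    best, score = random_search(SEARCH_SPACE, toy_objective, max_trials=20)
    print(best, score)
```

In Katib each call to the objective would be a separate trial running as a Kubernetes job, and the reported metric would take the place of the returned score.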
Experimental Steps
--> Pre-requisite: Add the ML model file that you want to fine-tune/train on a custom dataset
Step#1: Cross-check the Katib installation and its components:
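One way to script this check is sketched below. It assumes a default standalone install in the kubeflow namespace with components named katib-controller, katib-db-manager, katib-mysql and katib-ui; these names are assumptions and should be adjusted if your install differs. The input is the JSON printed by kubectl get pods -n kubeflow -o json.

```python
import json
import subprocess

# Assumed component names for a default standalone install; adjust as needed.
EXPECTED_COMPONENTS = ["katib-controller", "katib-db-manager", "katib-mysql", "katib-ui"]


def missing_components(pod_list, expected=EXPECTED_COMPONENTS):
    """Return the expected components that have no pod in the Running phase."""
    running = [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") == "Running"
    ]
    return [c for c in expected if not any(name.startswith(c) for name in running)]


def fetch_pods(namespace="kubeflow"):
    """Run kubectl and return the parsed pod list (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling missing_components(fetch_pods()) on a machine with cluster access returns an empty list when every expected component has a running pod.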
Step#2: Define hierarchy of experimentation:
Step#3: Check the Python code to be executed:
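As a rough illustration of what such a script can look like, the sketch below accepts a hyperparameter on the command line and prints the metric as a name=value line, which is the form Katib's default stdout metrics collector is understood to parse. The training itself is a placeholder, and the flag names (--lr, --epochs) and the metric name (accuracy) are assumptions, not the code used in this experiment.

```python
import argparse
import random


def train_one_epoch(lr, epoch, rng):
    """Placeholder for real training; returns a made-up accuracy in [0, 1]."""
    base = 1.0 - abs(lr - 0.01) * 5        # toy curve, best near lr=0.01
    progress = 1.0 - 0.5 ** (epoch + 1)    # toy curve, improves each epoch
    noise = rng.uniform(-0.01, 0.01)
    return max(0.0, min(1.0, base * progress + noise))


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=3)
    args = parser.parse_args(argv)

    rng = random.Random(0)
    accuracy = 0.0
    for epoch in range(args.epochs):
        accuracy = train_one_epoch(args.lr, epoch, rng)
        print(f"Epoch {epoch + 1}/{args.epochs}")
        # One metric per line in name=value form for the metrics collector.
        print(f"accuracy={accuracy:.4f}")
    return accuracy


if __name__ == "__main__":
    main()
```

The metric name printed here has to match the objective metric name declared in the experiment YAML, otherwise no observations are recorded for the trial.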
Step#4: Define the hyperparameter YAML file:
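The structure of such a file can be sketched as a Python dictionary and written out as JSON, which kubectl accepts in place of YAML. The field layout follows the kubeflow.org/v1beta1 Experiment API as commonly documented; the experiment name, image, script path, parameter range and trial counts are hypothetical placeholders, not the values used in this experiment.

```python
import json


def build_experiment(name="random-example", namespace="kubeflow",
                     image="docker.io/example/train:latest"):
    """Return a Katib Experiment manifest as a dict (all values illustrative)."""
    return {
        "apiVersion": "kubeflow.org/v1beta1",
        "kind": "Experiment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Metric to optimize; must match what the training code prints.
            "objective": {
                "type": "maximize",
                "goal": 0.99,
                "objectiveMetricName": "accuracy",
            },
            "algorithm": {"algorithmName": "random"},
            "parallelTrialCount": 2,
            "maxTrialCount": 6,
            "maxFailedTrialCount": 2,
            # Search space: one entry per hyperparameter.
            "parameters": [
                {"name": "lr", "parameterType": "double",
                 "feasibleSpace": {"min": "0.001", "max": "0.1"}},
            ],
            "trialTemplate": {
                "primaryContainerName": "training-container",
                "trialParameters": [
                    {"name": "learningRate", "description": "Learning rate",
                     "reference": "lr"},
                ],
                # Each trial runs as a plain Kubernetes Job.
                "trialSpec": {
                    "apiVersion": "batch/v1",
                    "kind": "Job",
                    "spec": {"template": {"spec": {
                        "containers": [{
                            "name": "training-container",
                            "image": image,
                            "command": ["python", "/opt/train.py",
                                        "--lr=${trialParameters.learningRate}"],
                        }],
                        "restartPolicy": "Never",
                    }}},
                },
            },
        },
    }


def to_json(manifest):
    """Serialize the manifest; the result can be saved and applied with kubectl."""
    return json.dumps(manifest, indent=2)
```

The ${trialParameters.learningRate} placeholder is substituted by Katib for each trial, so every job receives a different value drawn from the declared search space.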
Step#5: Command to apply the YAML file: kubectl apply -f
Step#6: Run & Check experiment status
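The status can also be read programmatically. The sketch below takes the JSON produced by a command such as kubectl get experiment <name> -n <namespace> -o json and reports the current phase. It assumes the experiment status carries a conditions list with type values such as Created, Running, Succeeded and Failed; the sample input is hand-written for illustration and is not output from a real run.

```python
def experiment_phase(experiment):
    """Return the most advanced condition whose status is "True"."""
    conditions = experiment.get("status", {}).get("conditions", [])
    active = {c["type"] for c in conditions if c.get("status") == "True"}
    # Check terminal phases first, then in-progress ones.
    for phase in ("Succeeded", "Failed", "Running", "Restarting", "Created"):
        if phase in active:
            return phase
    return "Unknown"


# Hand-written sample, shaped like the assumed status block.
sample = {
    "status": {
        "conditions": [
            {"type": "Created", "status": "True"},
            {"type": "Running", "status": "True"},
        ]
    }
}

if __name__ == "__main__":
    print(experiment_phase(sample))
```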
Step#7: Access the container inside the pod to observe the logs and per-epoch accuracy of the machine learning model:
--> Check the logs of the metrics collector container
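Once the log text has been saved, for example from kubectl logs <pod> -c metrics-logger-and-collector (the container name is an assumption based on Katib's usual sidecar naming), the per-epoch values can be pulled out with a small parser. This sketch assumes each metric sits on its own line in name=value form with no timestamp prefix, and the sample log is hand-written for illustration.

```python
import re

# One metric per line, e.g. "accuracy=0.93" or "loss=1.2e-1".
METRIC_LINE = re.compile(
    r"^\s*([\w-]+)\s*=\s*([+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)\s*$"
)


def parse_metrics(log_text):
    """Collect every name=value line into a dict of name -> list of floats."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_LINE.match(line)
        if match:
            metrics.setdefault(match.group(1), []).append(float(match.group(2)))
    return metrics


# Hand-written sample log, not output from a real trial.
SAMPLE_LOG = "Epoch 1/2\naccuracy=0.5\nEpoch 2/2\naccuracy=0.75\n"

if __name__ == "__main__":
    print(parse_metrics(SAMPLE_LOG))
```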
Step#8: Best trial and result collection according to user:
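A minimal sketch for extracting the best trial from the experiment JSON (as returned by kubectl get experiment <name> -n <namespace> -o json) is given below. It assumes the status block exposes a currentOptimalTrial entry with bestTrialName, parameterAssignments and observation.metrics fields; the sample values are hand-written placeholders and not results from this experiment.

```python
def best_trial_summary(experiment):
    """Summarize the best trial: its name, parameters and latest metrics."""
    optimal = experiment.get("status", {}).get("currentOptimalTrial", {})
    return {
        "trial": optimal.get("bestTrialName"),
        "parameters": {
            p["name"]: p["value"]
            for p in optimal.get("parameterAssignments", [])
        },
        "metrics": {
            m["name"]: m.get("latest")
            for m in optimal.get("observation", {}).get("metrics", [])
        },
    }


# Hand-written sample, shaped like the assumed status block.
sample_experiment = {
    "status": {
        "currentOptimalTrial": {
            "bestTrialName": "example-trial",
            "parameterAssignments": [{"name": "lr", "value": "0.02"}],
            "observation": {"metrics": [{"name": "accuracy", "latest": "0.9"}]},
        }
    }
}

if __name__ == "__main__":
    print(best_trial_summary(sample_experiment))
```

The values arrive as strings in the JSON, so convert them with float() before comparing trials numerically.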