Predict Remaining Life of Pipe

Michael Tawk

Data Scientist

ML Engineer

pandas

scikit-learn

TensorFlow

Objective

The client sought to predict how long a pipe transporting fluids would remain in service before leaking. The goal was to develop, train, and tune machine learning algorithms and identify the best-performing one. The client had a large dataset of pipeline failures with different causes.

Methodology

The dataset contained many outliers and missing values, as well as many redundant predictors that added no value to the problem at hand. After dropping outliers and filling missing data with the most frequent values, the numerical predictors were normalized so that predictors with a larger scale range would not gain an undue advantage, and the categorical predictors were transformed to numerical classes.
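The cleaning steps described above can be sketched with pandas and scikit-learn. This is a minimal illustration, not the project's actual pipeline: the column names (`diameter_mm`, `material`), the IQR rule for detecting outliers, min-max scaling as the normalization method, and the `preprocess` helper itself are all assumptions made for the example.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder


def preprocess(df, numeric_cols, categorical_cols, iqr_factor=1.5):
    """Hypothetical helper: drop outliers, impute with the mode, scale, encode."""
    df = df.copy()

    # Drop rows whose numeric values fall outside the IQR fences
    # (assumed outlier rule). Missing values are kept for imputation.
    for col in numeric_cols:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        in_range = df[col].between(q1 - iqr_factor * iqr, q3 + iqr_factor * iqr)
        df = df.loc[in_range | df[col].isna()].copy()

    # Fill missing entries with the most frequent value of each column.
    for col in numeric_cols + categorical_cols:
        df[col] = df[col].fillna(df[col].mode().iloc[0])

    # Scale numeric predictors to [0, 1] so no predictor dominates by range.
    scaled = MinMaxScaler().fit_transform(df[numeric_cols])
    for i, col in enumerate(numeric_cols):
        df[col] = scaled[:, i]

    # Map each category to a numerical class.
    encoded = OrdinalEncoder().fit_transform(df[categorical_cols])
    for i, col in enumerate(categorical_cols):
        df[col] = encoded[:, i]

    return df.reset_index(drop=True)


# Tiny made-up sample: one extreme diameter and two missing entries.
sample = pd.DataFrame({
    "diameter_mm": [100.0, 150.0, 150.0, None, 200.0, 5000.0],
    "material": ["steel", "pvc", None, "steel", "steel", "pvc"],
})
clean = preprocess(sample, ["diameter_mm"], ["material"])
print(clean)
```

In a real project the scaler and encoder would be fitted on the training split only and reused on the test split, to avoid leaking information between the two.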

Once the data was ready, the search for the best algorithm began. Deep neural networks with different architectures were evaluated, along with ensemble models such as ExtraTrees, RandomForest, and XGB. The ensemble models proved to perform better for this specific problem.
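A comparison of this kind can be sketched with cross-validation in scikit-learn. The example below is only an illustration under stated assumptions: it treats the task as regression on remaining life, uses synthetic data from `make_regression` in place of the client's dataset, and scores by mean absolute error. The `compare_models` helper is hypothetical. Other candidates, such as `XGBRegressor` from the xgboost package, could be added to the same dictionary because they follow the scikit-learn estimator interface.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score


def compare_models(models, X, y, cv=5):
    """Hypothetical helper: rank models by cross-validated MAE (lower is better)."""
    results = {}
    for name, model in models.items():
        scores = cross_val_score(
            model, X, y, cv=cv, scoring="neg_mean_absolute_error"
        )
        # scikit-learn returns negated errors, so flip the sign back.
        results[name] = -scores.mean()
    return dict(sorted(results.items(), key=lambda item: item[1]))


# Synthetic stand-in data (assumption): 300 samples, 8 numeric predictors.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

models = {
    "ExtraTrees": ExtraTreesRegressor(n_estimators=50, random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
}

ranking = compare_models(models, X, y)
for name, mae in ranking.items():
    print(f"{name}: MAE = {mae:.2f}")
```

Scoring every candidate on the same folds with the same metric keeps the comparison fair. Hyperparameter tuning, for example with `GridSearchCV`, would normally be run for each model before the final ranking.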