Bulldozer Auction Price Prediction with Machine Learning
The network for creativity
Join 1.25M professional creatives like you
Connect with clients, get discovered, and run your business 100% commission-free
Creatives on Contra have earned over $150M and we are just getting started
🏗️ Bulldozer Price Prediction: End-to-End Machine Learning Project
🚜 Overview
How do you accurately estimate the value of heavy equipment at auction?
In this project, I built a machine learning model to predict the sale price of bulldozers using historical auction data, effectively creating a data-driven “blue book” valuation system.
This project simulates a real-world ML workflow:
- Handling messy, real-world data
- Engineering meaningful features
- Iterating through models and tuning
- Evaluating performance using proper metrics
🎯 Problem Statement
Auction prices for heavy equipment can vary significantly based on:
- Machine specifications
- Usage and configuration
- Market conditions over time
The objective is to build a model that can accurately predict the SalePrice, enabling:
- Better pricing decisions
- Reduced uncertainty in auctions
- Data-driven valuation systems
📊 Dataset
The dataset is split into three time-based sets:
- Train Set → Data through the end of 2011
- Validation Set → Jan 2012 – April 2012
- Test Set → May 2012 – Nov 2012
This structure mimics real-world forecasting, where models are trained on past data and evaluated on future data.
⚠️ Due to size limitations, the dataset is not included in this repository. πŸ‘‰ Download here: https://www.kaggle.com/competitions/bluebook-for-bulldozers/data
⚙️ Machine Learning Workflow
🧹 Data Preprocessing
- Converted saledate to datetime format
- Extracted time-based features: Year, Month, Day, Day of Week
- Handled missing values:
  - Numerical → median imputation
  - Categorical → encoded as numerical values
🧠 Feature Engineering
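As a rough sketch of how the date-derived features and the missing-value handling described under Data Preprocessing might be implemented with pandas: the new column names (`saleYear`, `saleMonth`, `saleDay`, `saleDayOfWeek`) and the function name `preprocess` are assumptions for illustration, and the notebook may differ in detail.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Return a model-ready copy of df (hypothetical helper)."""
    df = df.copy()

    # Parse the sale date and derive calendar features from it.
    sale_date = pd.to_datetime(df["saledate"])
    df["saleYear"] = sale_date.dt.year
    df["saleMonth"] = sale_date.dt.month
    df["saleDay"] = sale_date.dt.day
    df["saleDayOfWeek"] = sale_date.dt.dayofweek  # Monday = 0
    df = df.drop(columns="saledate")

    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            # Numerical columns: fill gaps with the column median.
            if df[col].isna().any():
                df[col] = df[col].fillna(df[col].median())
        else:
            # Categorical columns: integer codes, shifted by one so that
            # missing values (code -1) become 0.
            df[col] = pd.Categorical(df[col]).codes.astype("int64") + 1
    return df
```

One design note: in a stricter setup the medians and category mappings would be learned from the training set only and then reused on the validation and test sets, so that no information leaks from future data.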
- Created time-based features from sale date
- Leveraged machine attributes and configuration data
- Improved model performance through iterative feature refinement
🌲 Model Used
RandomForestRegressor
Why?
- Handles non-linear relationships well
- Works great with structured/tabular data
- Robust to noisy features (missing values are handled separately during preprocessing)
📈 Results
| Metric | Training | Validation |
|--------|----------|------------|
| MAE    | 2953.82  | 5951.25    |
| RMSLE  | 0.1447   | 0.2452     |
| R²     | 0.9588   | 0.8818     |
🔍 Iteration Journey (What Actually Happened)
This project wasn’t a straight line, and that’s where the real learning happened.
| Stage                 | Validation RMSLE |
|-----------------------|------------------|
| Baseline Model        | 0.2936           |
| First Tuning Attempt  | 0.5638 ❌        |
| Final Optimized Model | 0.2452 ✅        |
💡 Key Takeaway:
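Tuning hyperparameters doesn’t guarantee better performance: the first tuning attempt nearly doubled validation RMSLE (0.2936 to 0.5638), and it took further experimentation to bring it down to 0.2452.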
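For reference, the three metrics reported in the tables above can be computed with scikit-learn along these lines. This is a minimal sketch; the helper name `score` is an assumption, and RMSLE is taken here as the square root of scikit-learn's `mean_squared_log_error`, which compares `log(1 + value)` for predictions and actuals.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_log_error,
    r2_score,
)


def score(y_true, y_pred) -> dict:
    """Compute MAE, RMSLE and R² for one set of predictions."""
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        # RMSLE penalises relative errors, so a $5k miss on a $10k machine
        # counts for more than the same miss on a $100k machine.
        "RMSLE": np.sqrt(mean_squared_log_error(y_true, y_pred)),
        "R2": r2_score(y_true, y_pred),
    }
```

Calling it once on the training predictions and once on the validation predictions gives the two columns of the results table, and the gap between them is a quick indicator of overfitting.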
🔧 Hyperparameter Tuning
Used RandomizedSearchCV (100 iterations) to explore the parameter space.
Best parameters:
n_estimators=40
min_samples_leaf=1
min_samples_split=14
max_features=0.5
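A randomized search of this kind could be set up roughly as follows. This is a sketch under stated assumptions: the search space in `PARAM_DISTRIBUTIONS` is hypothetical (the original grid is not given), and scoring on the held-out validation period via `PredefinedSplit` is one possible choice rather than necessarily what the notebook did. Only the four best parameter values in `build_reported_model` come from the results above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import PredefinedSplit, RandomizedSearchCV

# Hypothetical search space, for illustration only.
PARAM_DISTRIBUTIONS = {
    "n_estimators": np.arange(10, 100, 10),
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": np.arange(2, 20, 2),
    "min_samples_leaf": np.arange(1, 20, 2),
    "max_features": [0.5, 1.0, "sqrt"],
}


def tune_random_forest(X_train, y_train, X_valid, y_valid, n_iter=100, seed=42):
    """Randomly sample n_iter settings and score each on the validation period."""
    X = np.vstack([X_train, X_valid])
    y = np.concatenate([y_train, y_valid])
    # -1 keeps a row in the training part; 0 marks the single validation fold.
    fold = np.r_[np.full(len(X_train), -1), np.zeros(len(X_valid), dtype=int)]
    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=seed),
        param_distributions=PARAM_DISTRIBUTIONS,
        n_iter=n_iter,
        cv=PredefinedSplit(fold),
        scoring="neg_mean_squared_log_error",
        random_state=seed,
        refit=False,  # only the best settings are needed here
    )
    search.fit(X, y)
    return search.best_params_


def build_reported_model(seed=42):
    """Random forest configured with the best parameters reported above."""
    return RandomForestRegressor(
        n_estimators=40,
        min_samples_leaf=1,
        min_samples_split=14,
        max_features=0.5,
        random_state=seed,
    )
```

Scoring every candidate on the later validation period, instead of on shuffled folds, keeps the tuning step consistent with the time-based split used for the final evaluation.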
💡 Key Insights
- Feature engineering had the biggest impact on performance
- Poor tuning can significantly degrade model accuracy
- Time-based splits are crucial for realistic evaluation
- Iteration and experimentation are core ML skills
🚀 Future Improvements
- Apply log transformation to improve RMSLE
- Experiment with LightGBM / XGBoost
- Build a deployment-ready app (Streamlit)
- Add feature importance visualization
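As one possible starting point for the first and last items, the sketch below wraps the random forest so that it trains on `log(1 + SalePrice)` and converts predictions back to prices. It assumes the log transformation is meant for the target, which the list does not state explicitly; the helper name `make_log_target_model` is hypothetical.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor


def make_log_target_model(**rf_params):
    """Random forest fitted on log1p(target); predictions are mapped back with expm1."""
    return TransformedTargetRegressor(
        regressor=RandomForestRegressor(**rf_params),
        func=np.log1p,
        inverse_func=np.expm1,
    )
```

Because the forest then minimises squared error on the log scale, its training objective lines up more closely with RMSLE. After fitting, the underlying forest is available as `model.regressor_`, and `model.regressor_.feature_importances_` provides the values a feature importance chart would plot.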
📁 Project Structure
bulldozer-price-prediction/
│
├── notebook.ipynb
├── README.md
├── .gitignore
└── requirements.txt
🧑‍💻 Author
Toby Chuks
GitHub: https://github.com/tobychuks01
LinkedIn: https://www.linkedin.com/in/toby-chuks-630b44217
⭐ Final Note
This project reflects more than just building a model: it demonstrates the importance of iteration, experimentation, and learning from failure in machine learning.
