addoikenna/Mexico-City-Real-Estate-Price-Prediction

Ikenna Addo

Data Scientist

Data Analyst

Mexico-City-Real-Estate-Price-Prediction

This repository contains Jupyter notebooks that predict the price of apartments in Mexico City from their size, location (latitude and longitude), and neighborhood. The notebooks use linear regression and ridge regression models.

Predicting Apartment Price with Size

The predict_price_with_size notebook first explores the relationship between apartment prices and size using descriptive statistics and visualizations. It then splits the data into a training set and a test set. The linear regression model is trained on the training set and evaluated on the test set.

The results show that the linear regression model predicts apartment prices with a mean absolute error of 1,100 USD. The model also shows that larger apartments tend to have higher prices.
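
The split, train, and evaluate workflow described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the notebook's actual code: the data is synthetic, and the variable names (`area_m2`, `price_usd`) and the 80/20 split are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the repository's dataset):
# apartment area in square meters and price in USD.
rng = np.random.default_rng(42)
area_m2 = rng.uniform(40, 150, size=200)
price_usd = 20_000 + 900 * area_m2 + rng.normal(0, 5_000, size=200)

X = area_m2.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
y = price_usd

# Hold out 20% of the rows for testing (the split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# Evaluate on data the model has not seen during training.
mae_test = mean_absolute_error(y_test, model.predict(X_test))
slope = model.coef_[0]
print(f"Test MAE: {mae_test:,.0f} USD")
print(f"Price change per extra square meter: {slope:,.0f} USD")
```

A positive slope is what "larger apartments tend to have higher prices" looks like in the fitted model.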

The notebook also includes a section on communicating the results of the analysis. This section includes the model equation, the model intercept and coefficient, and a visualization of the model.
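
As a rough sketch of how a model equation can be assembled from the intercept and coefficient, the example below fits a line to noise-free synthetic data, so the fitted values are known in advance. The names `apt_price` and `surface_covered` in the equation string are hypothetical labels chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data so the fitted line is known in advance:
# price = 15000 + 1000 * area (illustrative values only).
area_m2 = np.array([[50.0], [75.0], [100.0], [125.0]])
price_usd = 15_000 + 1_000 * area_m2.ravel()

model = LinearRegression().fit(area_m2, price_usd)

# Round to hide floating-point noise in the fitted parameters.
intercept = round(float(model.intercept_), 2)
coefficient = round(float(model.coef_[0]), 2)

# Hypothetical variable names used only to make the equation readable.
equation = f"apt_price = {intercept} + {coefficient} * surface_covered"
print(equation)
```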

Predicting Apartment Price with Location (Longitude and Latitude)

The predict_price_with_location notebook contains code and analysis for predicting apartment prices in Mexico City using location data.

Data

The notebook uses the following fields:

Price: apartment price in USD

Location: latitude and longitude coordinates

Analysis

The following analyses are included in the notebook:

Exploratory data analysis through data visualization

Training a baseline mean price model

Building a regression pipeline with imputation and model

Evaluating model performance on training and test sets
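
A baseline mean price model simply predicts the average training price for every apartment, and its mean absolute error (MAE) is the number any real model has to beat. The sketch below uses made-up prices, not the repository's data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Illustrative prices in USD (made up for this example).
y_train = np.array([60_000, 80_000, 100_000, 120_000, 140_000], dtype=float)

y_mean = y_train.mean()
# The baseline predicts the same value, the mean, for every apartment.
y_pred_baseline = np.full_like(y_train, y_mean)

baseline_mae = mean_absolute_error(y_train, y_pred_baseline)
print(f"Mean price: {y_mean:,.0f} USD")
print(f"Baseline MAE: {baseline_mae:,.0f} USD")
```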

Model

A linear regression pipeline is implemented with the following steps:

Impute missing values using mean imputation

Fit a linear regression model to predict price based on latitude and longitude
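
The two steps above can be sketched as a scikit-learn pipeline. This is an illustration under assumptions, not the notebook's code: the coordinates and prices are invented, and the feature order (latitude, then longitude) is a choice made for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Made-up (latitude, longitude) pairs; np.nan marks missing values.
X_train = np.array([
    [19.36, -99.17],
    [19.40, np.nan],
    [np.nan, -99.15],
    [19.43, -99.13],
    [19.30, -99.20],
])
y_train = np.array([95_000.0, 110_000.0, 102_000.0, 130_000.0, 80_000.0])

# Step 1: replace missing values with the column mean.
# Step 2: fit a linear regression on the imputed features.
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    LinearRegression(),
)
model.fit(X_train, y_train)

# The pipeline imputes new data with the means learned from the training set.
X_new = np.array([[19.38, np.nan]])
y_pred = model.predict(X_new)
print(y_pred)
```

Wrapping both steps in one pipeline means the imputation means are learned from the training set only and then reused at prediction time.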

Results

The location features turn out to be weak predictors of apartment price: the model performs about the same as simply predicting the mean price.

Predicting Apartment Price with Neighborhood

The predict_price_with_neighborhood notebook builds a model to predict apartment prices in Mexico City based on the neighborhood (borough).

Data

The data comes from a CSV file with the following features:

borough - the neighborhood or borough in Mexico City

price_aprox_usd - the apartment price in USD

Approach

The steps taken in the notebook are:

Import and explore the data

Split the data into the feature (borough) and the target (price)

Create a baseline prediction using the mean price

One-hot encode the categorical borough feature

Build a linear regression model pipeline

Evaluate on the training data

Predict on the test data

Extract model coefficients and feature importances

Switch to Ridge regression to reduce overfitting

Extract Ridge model coefficients and feature importances

Visualize Ridge feature importances
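
The encoding, ridge, and coefficient-extraction steps above might look roughly like the sketch below. It uses scikit-learn's `OneHotEncoder`, which may differ from the encoder used in the notebook. The column names `borough` and `price_aprox_usd` come from the data description, while the rows and the `alpha` value are invented for illustration.

```python
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up rows; only the column names follow the data description.
df = pd.DataFrame({
    "borough": [
        "Benito Juárez", "Benito Juárez",
        "Coyoacán", "Coyoacán",
        "Iztapalapa", "Iztapalapa",
    ],
    "price_aprox_usd": [
        150_000.0, 160_000.0,
        120_000.0, 130_000.0,
        70_000.0, 80_000.0,
    ],
})
X_train = df[["borough"]]
y_train = df["price_aprox_usd"]

# One-hot encode the borough, then fit a ridge model (alpha is an assumption).
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    Ridge(alpha=1.0),
)
model.fit(X_train, y_train)

training_mae = mean_absolute_error(y_train, model.predict(X_train))

# Pair each borough with its coefficient to read off feature importances.
boroughs = model.named_steps["onehotencoder"].categories_[0]
coefficients = model.named_steps["ridge"].coef_
feat_imp = pd.Series(coefficients, index=boroughs).sort_values()
print(f"Training MAE: {training_mae:,.0f} USD")
print(feat_imp)
```

Each coefficient is the model's price adjustment for that borough relative to the intercept, so boroughs with large positive or negative values matter most. A horizontal bar chart of `feat_imp` (for example via `feat_imp.plot(kind="barh")`, which requires matplotlib) is one way to visualize them.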

Key Findings

The linear model reduces the training MAE to around

Ridge regularization further reduces overfitting

The most important features are boroughs like San Ángel, Del Valle Centro, Escandón, etc.

Future Work

Some ways the model could be improved:

Add more features such as size, number of bedrooms, and other amenities

Try different regularization techniques

Try ensemble methods such as random forests

Broaden the data to more Mexican cities


Posted Mar 29, 2024
