Undergraduate Capstone Project

Predicting House Prices with Advanced Regression Logic

A machine learning project that uses Gradient Boosting and Lasso Regression to estimate property values on the Ames Housing Dataset, reaching an RMSLE of 0.13 and an R² of 0.92.

1460 Data Points
79 Features Analyzed
0.13 RMSLE Score
92% Variance Explained (R²)

Methodology & Workflow

From raw data to a deployable model, here is the technical pipeline used in this project.

1. Data Preprocessing

Handled missing values using KNN imputation. Corrected skewness in the target variable (sale price) using a log transformation (`np.log1p`) so that its distribution is closer to normal for the regression models.
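
A minimal sketch of these two preprocessing steps, assuming scikit-learn's `KNNImputer` for the imputation (the project does not say which implementation it used). The feature matrix, the prices, and `n_neighbors=2` are illustrative values, not taken from the project.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy numeric feature matrix with missing entries (np.nan); values are made up.
X = np.array([
    [60.0, 8000.0, 2001.0],
    [75.0, 9500.0, 1978.0],
    [np.nan, 11000.0, 1999.0],
    [55.0, np.nan, 1920.0],
    [90.0, 14000.0, 2005.0],
])

# Replace each missing value with the mean of that column over the
# k nearest rows (distance computed on the non-missing columns).
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

# Made-up sale prices; log1p compresses the long right tail.
prices = np.array([205000.0, 180000.0, 225000.0, 140000.0, 250000.0])
y_log = np.log1p(prices)

# expm1 is the exact inverse of log1p, so predictions made on the
# log scale can be mapped back to dollars.
prices_back = np.expm1(y_log)
```

In practice the features would usually be scaled before KNN imputation so that large-valued columns do not dominate the distance; that step is omitted here for brevity.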

2. Feature Engineering

Created combined features (e.g., `TotalSF` = Basement + 1stFlr + 2ndFlr square footage). Encoded categorical variables using One-Hot Encoding and addressed high-cardinality categories.
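
The feature-engineering step could look roughly like the following pandas sketch. It assumes the three areas map to the Ames columns `TotalBsmtSF`, `1stFlrSF`, and `2ndFlrSF`. Grouping rare levels into `"Other"` is only one possible way to handle high cardinality and is an assumption here, as are the `min_count` threshold and the toy rows.

```python
import pandas as pd

# Toy rows; values are illustrative only.
df = pd.DataFrame({
    "TotalBsmtSF": [856, 1262, 920, 756],
    "1stFlrSF": [856, 1262, 920, 961],
    "2ndFlrSF": [854, 0, 866, 756],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", "Crawfor"],
})

# Combined square footage across basement and both floors.
df["TotalSF"] = df["TotalBsmtSF"] + df["1stFlrSF"] + df["2ndFlrSF"]

# Assumed high-cardinality handling: levels seen fewer than
# min_count times are collapsed into a single "Other" level.
min_count = 2
counts = df["Neighborhood"].value_counts()
rare = counts[counts < min_count].index
df["Neighborhood"] = df["Neighborhood"].where(~df["Neighborhood"].isin(rare), "Other")

# One-hot encode the remaining levels as 0/1 integer columns.
encoded = pd.get_dummies(df, columns=["Neighborhood"], dtype=int)
```

Collapsing rare levels before encoding keeps the number of one-hot columns bounded, which matters most for the Lasso component of the ensemble.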

3. Modeling (Ensemble)

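Stacked XGBoost, LightGBM, and Lasso Regression. Used GridSearchCV for hyperparameter tuning to minimize Root Mean Squared Error (RMSE). Because the target is log-transformed, RMSE on that scale corresponds to RMSLE on the original prices.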
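
A hedged sketch of stacking plus grid search using scikit-learn only. To keep it dependency-free, `GradientBoostingRegressor` stands in for XGBoost and LightGBM; the `RidgeCV` meta-learner, the parameter grid, and the synthetic data are all assumptions, not the project's actual configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Lasso, RidgeCV
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the engineered features and log-scale target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 12.0 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.05, size=200)

# Base learners produce out-of-fold predictions; the final estimator
# learns how to combine them.
stack = StackingRegressor(
    estimators=[
        ("gbr", GradientBoostingRegressor(n_estimators=50, random_state=0)),
        ("lasso", Lasso(alpha=0.001)),
    ],
    final_estimator=RidgeCV(),
    cv=3,
)

# Nested parameter names follow the "<estimator name>__<parameter>" convention.
param_grid = {
    "gbr__max_depth": [2, 3],
    "lasso__alpha": [0.001, 0.01],
}

# scikit-learn maximizes scores, so RMSE is exposed as a negated scorer.
search = GridSearchCV(stack, param_grid, scoring="neg_root_mean_squared_error", cv=3)
search.fit(X, y)

best_rmse = -search.best_score_
best_params = search.best_params_
```

XGBoost's `XGBRegressor` and LightGBM's `LGBMRegressor` follow the scikit-learn estimator interface, so they should be usable as entries in `estimators` in place of the stand-in.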

INTERACTIVE DEMO

Estimate Property Value

Enter the details of a hypothetical property below. This form uses a JavaScript approximation of the trained model's coefficients to generate a real-time estimate.
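
The idea behind such a coefficient-based approximation can be sketched as a linear score on the log-price scale that is mapped back to dollars. The sketch is in Python rather than the demo's JavaScript; the function name, the feature names other than `OverallQual`, the intercept, and every coefficient value are hypothetical placeholders, not the project's fitted values.

```python
import math
from typing import Mapping

# Hypothetical placeholder values on the log1p(price) scale.
INTERCEPT = 10.5
COEFFICIENTS = {
    "OverallQual": 0.10,
    "GrLivArea": 0.0003,
    "TotalSF": 0.0001,
}

def estimate_price(features: Mapping[str, float]) -> float:
    """Linear score on the log scale, mapped back to dollars with expm1."""
    log_price = INTERCEPT + sum(
        coef * features.get(name, 0.0) for name, coef in COEFFICIENTS.items()
    )
    return math.expm1(log_price)

estimate = estimate_price({"OverallQual": 7, "GrLivArea": 1500, "TotalSF": 2500})
```

A linear form like this can only mimic the Lasso part of a stacked model; the tree-based components have no fixed coefficients, which is why the demo is described as an approximation.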

Model Performance Analysis

Visualizing feature importance and prediction accuracy on test data.

Data Source: Ames Housing

Top 5 Predictive Features

* `OverallQual` is the dominant predictor, followed by living area.
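
A ranking like this can be read from a fitted tree ensemble's importance scores. The sketch below assumes scikit-learn's `GradientBoostingRegressor` and uses synthetic data in which the first column is made dominant by construction. All column names except `OverallQual` are assumed, and the resulting scores say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Column names are assumed labels attached to synthetic data.
feature_names = ["OverallQual", "GrLivArea", "TotalSF", "GarageCars", "YearBuilt", "LotArea"]

rng = np.random.default_rng(0)
X = rng.normal(size=(300, len(feature_names)))
# Synthetic target: column 0 dominates, column 1 contributes less, the rest are noise.
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=300)

model = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ is normalized to sum to 1; sort descending and keep five.
order = np.argsort(model.feature_importances_)[::-1][:5]
top5 = [(feature_names[i], float(model.feature_importances_[i])) for i in order]
```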

Actual vs. Predicted Prices

* Points clustered tightly along the diagonal indicate that predicted prices closely track actual prices.
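
The two headline metrics can be computed from paired actual and predicted prices as sketched below. The helper names `rmsle` and `r_squared` and the four sample prices are illustrative, not taken from the project.

```python
import numpy as np

def rmsle(actual, predicted):
    """Root mean squared error between log1p(predicted) and log1p(actual)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2)))

def r_squared(actual, predicted):
    """Share of variance in the actual values explained by the predictions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up prices for illustration.
actual = np.array([200000.0, 150000.0, 320000.0, 95000.0])
predicted = np.array([210000.0, 140000.0, 300000.0, 100000.0])

score_rmsle = rmsle(actual, predicted)
score_r2 = r_squared(actual, predicted)
```

RMSLE penalizes relative rather than absolute errors, so a $10,000 miss on a $95,000 house counts more than the same miss on a $320,000 house.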