Linear Regression Cont. (Multiple Linear Regression Model)

Updated: Sep 20, 2021

Please check Part A (Linear Regression, Simple version) for dataset information and the link to GitHub for Part B.




This post continues the previous project, using the Birthweight_reduced.csv dataset.


Research question:


In this part, we would like to investigate which other variables are significant in predicting birthweight, and to check how close the predictions are to the actual values.


Step 1: Some data cleanup may be necessary. Let us go ahead and drop the features “id”, “LowBirthWeight”, and “lowbwt”.

Verify that these columns are now gone.
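Step 1 could look roughly like the sketch below. The real CSV is not loaded here, so a tiny made-up DataFrame stands in for it; the "Gestation" column and all values are assumptions for illustration only.

```python
import pandas as pd

# Tiny stand-in for Birthweight_reduced.csv. The values and the
# "Gestation" column are made up for illustration.
# With the real file you would presumably start from:
#   df = pd.read_csv("Birthweight_reduced.csv")
df = pd.DataFrame({
    "id": [1, 2, 3],
    "Birthweight": [7.1, 6.4, 8.0],
    "Gestation": [40, 38, 41],
    "LowBirthWeight": ["Normal", "Low", "Normal"],
    "lowbwt": [0, 1, 0],
})

# Drop the three features named in Step 1.
df = df.drop(columns=["id", "LowBirthWeight", "lowbwt"])

# Verify that the columns are gone.
print(df.columns.tolist())  # ['Birthweight', 'Gestation']
```

Dropping "LowBirthWeight" and "lowbwt" matters because both are derived from the target, so leaving them in would leak the answer into the predictors.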

Step 2: Set up the X matrix with all the remaining independent variables. The prediction target, “Birthweight”, goes into y.
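A minimal sketch of Step 2, again on a made-up DataFrame. The predictor names "Gestation" and "mage35" are hypothetical placeholders, not confirmed column names from the dataset.

```python
import pandas as pd

# Stand-in for the cleaned DataFrame from Step 1 (values are made up).
df = pd.DataFrame({
    "Birthweight": [7.1, 6.4, 8.0, 7.5],
    "Gestation": [40, 38, 41, 39],  # hypothetical predictor
    "mage35": [0, 1, 0, 0],         # hypothetical "mother over 35" flag
})

# X holds every column except the target; y holds the target only.
X = df.drop(columns=["Birthweight"])
y = df["Birthweight"]

print(X.shape, y.shape)  # (4, 2) (4,)
```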

Step 3: Split the dataset into training and testing sets with a 75:25 ratio.
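One way to get the 75:25 split is scikit-learn's train_test_split with test_size=0.25. The data below is synthetic, and the random_state value is an arbitrary choice made here so the split is repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 predictors (not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# test_size=0.25 gives the 75:25 split; random_state is an arbitrary seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)  # (75, 3) (25, 3)
```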

Step 4: After importing the necessary Linear Regression libraries, create and fit the model.
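Creating and fitting the model might look like this sketch. It assumes scikit-learn's LinearRegression and uses synthetic data with known coefficients, since the real dataset is not loaded here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a known linear relationship plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 + X @ np.array([1.5, -0.8, 0.3]) + rng.normal(scale=0.1, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Create the model and fit it on the training portion only.
model = LinearRegression()
model.fit(X_train, y_train)

# R^2 on the training data, as a quick sanity check of the fit.
print(model.score(X_train, y_train))
```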

Step 5: Print intercept value.

Step 6: Print all coefficients.
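Steps 5 and 6 can be sketched together. The example below uses noise-free synthetic data with hypothetical column names ("Gestation", "mage35") and skips the train/test split for brevity, so the fitted intercept and coefficients should come back as the values used to build y.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data with hypothetical column names.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Gestation": rng.normal(39, 2, 50),
    "mage35": rng.integers(0, 2, 50),
})
y = 2.0 + 0.15 * X["Gestation"] - 0.8 * X["mage35"]

model = LinearRegression().fit(X, y)

# Step 5: the intercept is a single number.
print("Intercept:", model.intercept_)

# Step 6: pair each coefficient with its column name for readability.
coefs = pd.Series(model.coef_, index=X.columns)
print(coefs)
```

Labelling the coefficients with X.columns avoids having to remember which position in model.coef_ belongs to which variable.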

Step 7: Make predictions. Remember to provide X_test data now.
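A sketch of the prediction step, once more on synthetic data. The point is only that predict receives X_test, the rows the model did not see during fitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the Birthweight dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 + X @ np.array([1.5, -0.8, 0.3]) + rng.normal(scale=0.5, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Predict on the held-out rows only.
y_pred = model.predict(X_test)
print(y_pred[:5])
```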

Step 8: Compare the predicted values with the actual values.

Do a scatter plot to see how they align.
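The comparison and scatter plot could be done along these lines. This is a sketch on synthetic data; the output file name is a placeholder, and the dashed diagonal (where predicted equals actual) is an addition made here to make the alignment easier to judge.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the Birthweight dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 7.0 + X @ np.array([1.5, -0.8, 0.3]) + rng.normal(scale=0.5, size=100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
y_pred = LinearRegression().fit(X_train, y_train).predict(X_test)

# Side-by-side table of actual and predicted values.
comparison = pd.DataFrame({"Actual": y_test, "Predicted": y_pred})
print(comparison.head())

# Scatter plot: points near the dashed diagonal are well predicted.
fig, ax = plt.subplots()
ax.scatter(y_test, y_pred)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax.plot(lims, lims, linestyle="--")
ax.set_xlabel("Actual")
ax.set_ylabel("Predicted")
fig.savefig("actual_vs_predicted.png")  # placeholder name; plt.show() also works interactively
```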

Step 9: Import the metrics module (from sklearn import metrics).

Show the MAE, MSE, and RMSE values.
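The three metrics can be computed as in the sketch below. The two short arrays are made-up stand-ins for y_test and the predictions, chosen so the results are easy to check by hand. RMSE is taken as the square root of MSE with NumPy.

```python
import numpy as np
from sklearn import metrics

# Made-up stand-ins for y_test and the model's predictions.
y_true = np.array([7.0, 6.5, 8.0, 7.5])
y_pred = np.array([7.5, 6.0, 7.0, 7.5])

mae = metrics.mean_absolute_error(y_true, y_pred)  # mean of |error|
mse = metrics.mean_squared_error(y_true, y_pred)   # mean of error^2
rmse = np.sqrt(mse)                                # back in the target's units

print("MAE:", mae)    # 0.5
print("MSE:", mse)    # 0.375
print("RMSE:", rmse)
```

Here the errors are -0.5, 0.5, 1.0, and 0, so MAE is 2.0 / 4 = 0.5 and MSE is 1.5 / 4 = 0.375.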


Result Interpretation

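  • The intercept is negative, about -12.34. This is the expected Birthweight when all of the independent variables are 0. A negative weight is not physically meaningful; the intercept is an extrapolation far outside the range of the observed data, so it should not be read literally.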

  • Among the coefficients, the value that stands out most is -0.803 for the mother-over-35 variable. Holding the other variables constant, the model predicts a Birthweight about 0.803 lower when the mother is over 35.

  • Comparing the predictions with the actual values above, we can see that the two are reasonably close to each other.

  • The mean absolute error (MAE) is 0.85, meaning that on average a prediction is 0.85 away from the true value, in the units of Birthweight. The closer MAE is to 0, the better the model predicts the outputs.

  • The root mean squared error (RMSE), the square root of the MSE, measures how spread out the residuals are. Here it is 1.075.

  • The mean squared error (MSE) shows how close the regression line is to the data points. It takes the distances from the points to the regression line, squares them, and averages the result. Here the MSE is 1.156.

  • Taken together, the MAE, RMSE, and MSE indicate that the model above is a moderately good predictor.
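As a quick consistency check on the numbers reported above, RMSE squared should match MSE, and MAE can never exceed RMSE. The short sketch below only re-uses the three reported values and does not recompute anything from the data.

```python
import math

# Values as reported in the interpretation above.
mae, mse, rmse = 0.85, 1.156, 1.075

# RMSE is the square root of MSE, so the two should agree after rounding.
print(round(rmse ** 2, 3))       # 1.156
print(round(math.sqrt(mse), 3))  # 1.075

# MAE is always less than or equal to RMSE for the same residuals.
print(mae <= rmse)               # True
```

The gap between MAE (0.85) and RMSE (1.075) suggests that some errors are noticeably larger than the typical one, since squaring weights large residuals more heavily.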

 
