Please see Part A (Linear Regression, Simple version) for dataset information,
and the GitHub link for the Part B code.
This part continues the previous project, using the same Birthweight_reduced.csv dataset.
Research question:
In this part, we investigate which other variables are significant in predicting birthweight. We also check how close the predictions are to the actual values.
Step 1: Some data cleanup may be necessary. Let us go ahead and drop the features “id”, “LowBirthWeight”, and “lowbwt”.
Verify that these columns are now gone.
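A minimal sketch of Step 1 is shown here. It uses a tiny stand-in DataFrame so that it runs without the CSV file; with the real data you would load it using pd.read_csv("Birthweight_reduced.csv") instead. The column names "Gestation" and "mage35" and all of the values are assumptions made for illustration only.

```python
import pandas as pd

# Stand-in for Birthweight_reduced.csv. Only "id", "LowBirthWeight", "lowbwt"
# and "Birthweight" are named in the text; the other columns and all values
# are hypothetical.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Gestation": [38, 40, 36, 41],
    "mage35": [0, 1, 0, 0],
    "Birthweight": [7.1, 7.9, 5.8, 8.4],
    "lowbwt": [0, 0, 1, 0],
    "LowBirthWeight": ["Normal", "Normal", "Low", "Normal"],
})

# Drop the identifier and the two columns derived from the target.
df = df.drop(columns=["id", "LowBirthWeight", "lowbwt"])

# Verify that the columns are gone.
print(df.columns.tolist())
```

Dropping "LowBirthWeight" and "lowbwt" matters because both are derived from the birthweight itself, so leaving them in would leak the target into the features.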
Step 2: Set up the X matrix with all of the remaining independent variables. The prediction target, “Birthweight”, goes into y.
Step 3: Split the dataset into training and testing sets with a 75:25 ratio.
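Steps 2 and 3 could look like the sketch below. The data is synthetic (40 made-up rows with hypothetical feature names), and random_state=42 is an assumption added only to make the split reproducible; the original does not state a seed.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned dataset; feature names and values
# are hypothetical.
rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "Gestation": rng.integers(34, 43, size=n),
    "mage35": rng.integers(0, 2, size=n),
})
df["Birthweight"] = (
    0.3 * df["Gestation"] - 0.5 * df["mage35"] - 4 + rng.normal(0, 0.5, size=n)
)

# Step 2: every remaining column except the target goes into X.
X = df.drop(columns=["Birthweight"])
y = df["Birthweight"]

# Step 3: 75% training, 25% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)
```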
Step 4: After importing the necessary Linear Regression libraries, create and fit the model.
Step 5: Print intercept value.
Step 6: Print all coefficients.
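Steps 4 to 6 can be sketched as follows. To make the output easy to check, the synthetic target here is built without noise from a known intercept (2.0) and known coefficients (0.3 and -0.8), so the fitted model should recover them. These numbers and the feature names are assumptions for illustration, not the values from the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic training data with hypothetical feature names.
rng = np.random.default_rng(0)
X_train = pd.DataFrame({
    "Gestation": rng.integers(34, 43, size=30),
    "mage35": rng.integers(0, 2, size=30),
})
# Noise-free target built from a known intercept and known coefficients.
y_train = 2.0 + 0.3 * X_train["Gestation"] - 0.8 * X_train["mage35"]

# Step 4: create and fit the model.
model = LinearRegression()
model.fit(X_train, y_train)

# Step 5: print the intercept.
print("Intercept:", model.intercept_)

# Step 6: print the coefficients, labelled by feature name.
coefficients = pd.Series(model.coef_, index=X_train.columns)
print(coefficients)
```

Pairing model.coef_ with X_train.columns makes it much easier to read which coefficient belongs to which variable.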
Step 7: Make predictions. Remember to provide X_test data now.
Step 8: Compare the predicted values with the actual values.
Do a scatter plot to see how they align.
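A possible sketch of Steps 7 and 8 is given below, again on synthetic data with hypothetical feature names. The Agg backend and the output file name predicted_vs_actual.png are assumptions so that the script runs without a display; in a notebook you could drop the backend line and call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; names and values are hypothetical.
rng = np.random.default_rng(0)
n = 40
X = pd.DataFrame({
    "Gestation": rng.integers(34, 43, size=n),
    "mage35": rng.integers(0, 2, size=n),
})
y = 2.0 + 0.3 * X["Gestation"] - 0.8 * X["mage35"] + rng.normal(0, 0.5, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Step 7: predict on the held-out X_test, not on the training data.
y_pred = model.predict(X_test)

# Step 8: put actual and predicted values side by side.
comparison = pd.DataFrame({"Actual": y_test.to_numpy(), "Predicted": y_pred})
print(comparison.head())

# Scatter plot; points close to the dashed diagonal are good predictions.
fig, ax = plt.subplots()
ax.scatter(y_test, y_pred)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax.plot(lims, lims, linestyle="--")
ax.set_xlabel("Actual Birthweight")
ax.set_ylabel("Predicted Birthweight")
fig.savefig("predicted_vs_actual.png")
```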
Step 9: Import the metrics module (from sklearn import metrics) and show the MAE, MSE, and RMSE values.
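Here is a small sketch of Step 9. The four actual and predicted values are made up so that the arithmetic can be followed by hand; in the project, y_test and y_pred would come from the earlier steps.

```python
import numpy as np
from sklearn import metrics

# Hypothetical actual and predicted values, chosen for easy arithmetic.
y_test = np.array([7.0, 6.5, 8.0, 7.5])
y_pred = np.array([7.5, 6.0, 7.0, 7.5])

# Absolute errors are 0.5, 0.5, 1.0 and 0.0.
mae = metrics.mean_absolute_error(y_test, y_pred)   # mean of |error|
mse = metrics.mean_squared_error(y_test, y_pred)    # mean of error squared
rmse = np.sqrt(mse)                                 # square root of MSE

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
```

Taking the square root of the MSE with NumPy gives the RMSE directly and keeps it in the same units as the target.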
Result Interpretation
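The intercept is negative: -12.339 (about -12.34). This is the expected value of Birthweight when all of the independent variables are 0. A negative weight is not physically meaningful, so the intercept should be read as a baseline offset for the fitted line rather than as a realistic birthweight.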
Among the coefficients, the value that stands out most is -0.803 for the Mother over 35 variable. This indicates that, with the other variables held constant, a mother being over 35 is associated with a decrease of about 0.803 in the predicted birthweight.
Based on the comparison of predicted and actual values above, the predictions are not far from the actual values.
The mean absolute error (MAE) is 0.85, which means that, on average, a prediction is 0.85 away from the true value, in the same units as Birthweight. The closer this value is to 0, the better the model predicts the outputs.
The root mean squared error (RMSE) measures how spread out the residuals are. Here it is 1.075.
The mean squared error (MSE) shows how close the regression line is to a set of points. It takes the distances from the points to the regression line, squares them, and averages the results. The MSE here is 1.156.
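As a quick consistency check on the reported numbers, the RMSE should equal the square root of the MSE. The sketch below uses only the two values quoted above.

```python
import math

mse = 1.156            # MSE reported above
reported_rmse = 1.075  # RMSE reported above

# RMSE is defined as the square root of MSE.
rmse_from_mse = math.sqrt(mse)
print(round(rmse_from_mse, 3))
```

The two reported values agree to three decimal places.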
Taken together, the MAE, MSE, and RMSE results indicate that the model above has moderate predictive performance.