
Linear Regression (Simple and Multiple)

Updated: Aug 20, 2021

Link to GitHub:



Regression is an analytical technique that can identify and quantify linear relationships between variables. Simple or multiple linear regression can be an effective tool in machine learning. There is an important difference between classification and regression problems: fundamentally, classification is about predicting a label, and regression is about predicting a quantity. In regression, the Dependent Variable (DV) is typically a continuous variable.


This project consists of two parts:

Part A - Create a Simple Linear Regression (SLR) model, a predictive model with a single predictor variable, using the birthweight data provided, and discuss the findings.


Part B - Create a Multiple Linear Regression (MLR) model, a predictive model with multiple predictor variables, using the birthweight data provided, and discuss the findings.



Source Dataset


The dataset Birthweight_reduced.csv will be used for creating and testing the two models: (i) Simple Linear Regression and (ii) Multiple Linear Regression. This (real) dataset contains information on newborn babies and their parents. All of its variables are continuous except for one, “LowBirthWeight”, which is categorical, so the dataset is well suited to correlation and regression. The attribute metadata is available in the table below. Birthweight is the dependent variable.

Research questions


1. Check whether the baby's birthweight depends on the mother's pre-pregnancy weight

2. Check whether the baby's length depends on the mother's height


I created two separate models in a Python Jupyter Notebook to investigate the two research questions above, as follows.


Step 1: Import the standard libraries, including seaborn, then:


1. Load the CSV file into a dataframe df

2. Explore the top 5 rows of this data frame

3. Check how many rows and columns it has

4. See all the columns this dataset has
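Step 1 could look roughly like the sketch below. The helper name `load_and_inspect` is hypothetical, and the file location is an assumption (the CSV is taken to sit next to the notebook); the original notebook may well have run these calls directly in separate cells.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def load_and_inspect(csv_path):
    """Load a CSV into a DataFrame and print a quick overview of it."""
    df = pd.read_csv(csv_path)
    print(df.head())            # top 5 rows
    print(df.shape)             # (number of rows, number of columns)
    print(df.columns.tolist())  # all column names
    return df


# Assumed usage (path is an assumption, not taken from the post):
# df = load_and_inspect("Birthweight_reduced.csv")
```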


Step 2: Using seaborn, visualize the pair plots between 'headcirumference', 'length', 'Birthweight', 'mppwt', and 'mheight'


Step 3: Compute the correlations between all of these variables, then show a heatmap of the correlations.
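One way to do this is sketched below. The function name `correlation_heatmap`, the colour map, and the three-decimal annotation format are illustrative choices, not details taken from the original notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def correlation_heatmap(df, cols):
    """Compute Pearson correlations for cols and draw an annotated heatmap."""
    corr = df[cols].corr()  # pandas uses Pearson correlation by default
    ax = sns.heatmap(corr, annot=True, fmt=".3f",
                     cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_title("Correlation matrix")
    return corr


# Assumed usage, with df loaded from the CSV:
# corr = correlation_heatmap(
#     df, ["headcirumference", "length", "Birthweight", "mppwt", "mheight"])
# plt.show()
```

Returning the correlation matrix makes it easy to read off exact values next to the plot.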


Step 4: Develop two SLR Models to answer the two research questions:


1. For research question 1, configure the attribute “mppwt” as the predictor (independent) variable and “Birthweight” as the outcome (dependent) variable.

2. For research question 2, configure the attribute “mheight” as the predictor (independent) variable and “length” as the outcome (dependent) variable.
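The two configurations above can be written down as a small sketch. The names `SLR_MODELS` and `get_xy` are hypothetical and only serve to show how a predictor and an outcome might be pulled from the DataFrame in the shape scikit-learn expects.

```python
import pandas as pd

# One predictor/outcome pair per research question.
SLR_MODELS = {
    "rq1": {"predictor": "mppwt", "outcome": "Birthweight"},
    "rq2": {"predictor": "mheight", "outcome": "length"},
}


def get_xy(df, predictor, outcome):
    """Return X as a one-column DataFrame (2-D) and y as a Series."""
    X = df[[predictor]]  # double brackets keep X two-dimensional
    y = df[outcome]
    return X, y


# Assumed usage, with df loaded from the CSV:
# X, y = get_xy(df, **SLR_MODELS["rq1"])
```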


Step 5: I will repeat the following steps to first check the relationship between “mppwt” and “Birthweight” and later between “mheight” and “length” of the baby.


Step 6: Construct the training and testing sets for the two SLR models.
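A possible version of the split-and-fit step with scikit-learn is sketched below. The 70/30 split and `random_state=42` are assumptions made for reproducibility, since the post does not state the values used, and `fit_slr` is a hypothetical helper name.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_slr(X, y, test_size=0.3, random_state=42):
    """Split the data, fit a simple linear regression, and score it.

    test_size and random_state are assumed values, not taken from the post.
    Returns the fitted model and its R^2 on the held-out test set.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    model = LinearRegression().fit(X_train, y_train)
    r2_test = model.score(X_test, y_test)
    return model, r2_test


# Assumed usage for research question 1, with df loaded from the CSV:
# model, r2 = fit_slr(df[["mppwt"]], df["Birthweight"])
# print(model.intercept_, model.coef_[0], r2)
```

The same function can then be called with `df[["mheight"]]` and `df["length"]` for the second research question.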


Result Interpretation


According to the pair plot, some of the variables have linear relationships with one another, while others show no clear relationship.



  1. `headcirumference` has a weak, positive linear relationship with Birthweight

  2. `length` has a positive relationship with Birthweight

  3. `mppwt` has a positive linear relationship with `mheight`

The correlations, as shown in the heatmap below, indicate the following:



  • `headcirumference` has positive relationships with `length`, `Birthweight`, `mppwt`, and `mheight`. Its relationship with `Birthweight` is strong (0.736), and its relationship with `length` is moderate (0.565)

  • `length` has a moderate positive relationship with `headcirumference`, a strong positive relationship with `Birthweight`, and weaker relationships with `mppwt` and `mheight`

  • `Birthweight` has strong positive relationships with `headcirumference` (0.736) and `length` (0.697), and weaker positive relationships with `mppwt` and `mheight`

  • `mppwt` has a strong positive relationship with `mheight` (0.671) and weaker positive relationships with `headcirumference`, `length`, and `Birthweight`

  • `mheight` has a positive relationship with `mppwt` and weaker relationships with `headcirumference`, `length`, and `Birthweight`

Then we configured the features and ran a linear regression model.

The result shows that the mother's pre-pregnancy weight (`mppwt`) and the baby's weight (`Birthweight`) have a positive linear relationship:

  • The intercept of the model is 2.074. This means that the expected mean weight of the baby is 2.074 lb when the mother's pre-pregnancy weight equals 0. However, this intercept has no intrinsic meaning, since a mother's pre-pregnancy weight is never 0.

  • For every 1 lb increase in the mother's pre-pregnancy weight, the model predicts that the baby's weight increases by 0.0425 lb.
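To make the interpretation concrete, the reported intercept and slope can be plugged into the fitted line, Birthweight = 2.074 + 0.0425 × mppwt. The sketch below does this for a few mother's weights that are hypothetical and chosen only for illustration; they are not rows from the dataset.

```python
INTERCEPT = 2.074  # reported intercept, in lb
SLOPE = 0.0425     # reported slope: lb of birthweight per lb of mppwt


def predict_birthweight(mppwt_lb):
    """Predicted birthweight (lb) from the reported SLR coefficients."""
    return INTERCEPT + SLOPE * mppwt_lb


# Hypothetical pre-pregnancy weights, for illustration only.
for weight in (110, 130, 150):
    predicted = predict_birthweight(weight)
    print(f"mppwt = {weight} lb -> predicted Birthweight = {predicted:.3f} lb")
```

Each additional pound of pre-pregnancy weight shifts the prediction by exactly the slope, which is what the second bullet above states.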

