
Machine Learning Topic: Mental Health in Tech Workplace

Updated: Aug 20, 2021

Authors: Tram Nguyen

Claremont Graduate University, 2021

 


You can find my Jupyter Notebook (Python) here:





INTRODUCTION

Mental health has long been a central topic in conversations about workplace culture and wellness. A 2010 study, "Mental well-being at the workplace," notes that employees' mental health is increasingly recognized as a crucial determinant of their overall health. Poor mental health and workplace stressors can contribute to a range of physical illnesses and can affect employees' personal and professional lives (Rajgopal). Furthermore, mental health conditions such as depression and anxiety have a significant economic impact: the estimated cost to the global economy is US$1 trillion per year in lost productivity (who.int).

Technology is a fast-moving, high-stakes industry. Tech workers are often under intense pressure to keep up with a competitive field while contributing their skills and knowledge to build their companies' value and meet the demands of the digital age.

Factors such as long working days, late nights, tight deadlines, gender gaps, and a lack of inclusion and diversity contribute to poor mental health in tech. In 2019, BIMA published its Tech Inclusivity and Diversity Report, which stated that mental health in the tech industry is in a poor state; some would go as far as saying it is reaching a crisis point. The report found that more than 50% of tech employees had suffered from anxiety or depression at some point, and more than 60% of respondents were stressed by their work.

Organizations and employers need to recognize how mental wellness affects their employees' productivity and performance (diversityintech.co.uk). The WHO has found that every $1 invested in scaled-up treatment for common mental disorders yields a return of $4 in improved health and productivity (who.int).

Though mental health issues in the workplace are important and need to be discussed, mental health is still heavily stigmatized. Mental illness is often viewed as a shameful personal deficiency, a failure, or a weakness. Many people don't feel comfortable discussing their mental health problems, or even admitting that they are dealing with them, for fear of being judged by others. People who can't share their experiences or seek treatment are left isolated, under intense pressure to balance their personal and professional lives.

The CDC and WHO encourage organizations or employers to promote health awareness and provide employees with support and resources, such as low-cost medical benefits for mental health counseling or mental health programs for prevention and treatments, to improve mental health in the workplace.

Our objective in this report is to analyze the attitudes of tech workers towards mental health and the main predictors of mental health illness in the U.S. tech workplace. We explore and analyze the dataset "Mental Health in Tech workplace survey 2014", whose questions pertain to how mental health is perceived in tech workplaces by employees and their employers.


PROBLEM DEFINITION

1. Research question

  • Do tech employees seek mental health treatments?

  • What are the main predictors of mental health illness in the tech workplace?

2. Dataset description

The dataset, "Mental Health in Tech Survey," is available on the Kaggle website as a single CSV file. It was collected in 2014 and open-sourced by Open Sourcing Mental Illness, LTD. It measures attitudes towards mental health and the frequency of mental health disorders in the tech workplace across different countries and regions.

This dataset contains the following data:

  • Timestamp

  • Age

  • Gender

  • Country

  • state: If you live in the United States, which state or territory do you live in?

  • self_employed: Are you self-employed?

  • family_history: Do you have a family history of mental illness?

  • treatment: Have you sought treatment for a mental health condition?

  • work_interfere: If you have a mental health condition, do you feel that it interferes with your work?

  • no_employees: How many employees does your company or organization have?

  • remote_work: Do you work remotely (outside of an office) at least 50% of the time?

  • tech_company: Is your employer primarily a tech company/organization?

  • benefits: Does your employer provide mental health benefits?

  • care_options: Do you know the options for mental health care your employer provides?

  • wellness_program: Has your employer ever discussed mental health as part of an employee wellness program?

  • seek_help: Does your employer provide resources to learn more about mental health issues and how to seek help?

  • anonymity: Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources?

  • leave: How easy is it for you to take medical leave for a mental health condition?

  • mental_health_consequence: Do you think that discussing a mental health issue with your employer would have negative consequences?

  • phys_health_consequence: Do you think that discussing a physical health issue with your employer would have negative consequences?

  • coworkers: Would you be willing to discuss a mental health issue with your coworkers?

  • supervisor: Would you be willing to discuss a mental health issue with your direct supervisor(s)?

  • mental_health_interview: Would you bring up a mental health issue with a potential employer in an interview?

  • phys_health_interview: Would you bring up a physical health issue with a potential employer in an interview?

  • mental_vs_physical: Do you feel that your employer takes mental health as seriously as physical health?

  • obs_consequence: Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?

  • comments: Any additional notes or comments

There are 27 columns in the dataset, as listed above. Only the "Age" column has an integer data type; all other columns have the object data type.
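A quick way to confirm the column count and data types is to inspect the loaded frame. The sketch below uses a hypothetical four-column sample built in memory instead of the real CSV, so the names `sample`, `numeric_cols`, and `text_cols` are illustrative only.

```python
import pandas as pd

# Hypothetical three-row sample; the real file has many more rows and columns.
sample = pd.DataFrame({
    "Age": [37, 44, 32],
    "Gender": ["Female", "M", "Male"],
    "Country": ["United States", "United States", "Canada"],
    "treatment": ["Yes", "No", "No"],
})

# Shape gives (rows, columns); select_dtypes separates numeric from text columns.
n_rows, n_cols = sample.shape
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
text_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(n_rows, n_cols)
print("numeric:", numeric_cols)
print("text:", text_cols)
```

On the real dataset the same calls would be expected to report "Age" as the only numeric column.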

EXPERIMENTAL EVALUATION

1. Methodology

To answer the research questions above, we use the "treatment" variable as our dependent variable. This variable is the response to the question "Have you sought treatment for a mental health condition?". Our goal is to understand which factors may encourage tech employees to seek mental health treatment and improve their mental health (answer = "Yes" in "treatment").

To define the appropriate independent variables (the factors), we used Spearman's correlation to evaluate the statistical relationships between "treatment" and the other variables. We picked the variables whose Spearman's correlation with "treatment" is greater than 0.07 (moderately low) as our independent variables.
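The selection step can be sketched as follows. This is a minimal illustration on a made-up toy frame, not the notebook's actual code: the integer encoding via category codes and the use of the absolute correlation (so that negatively correlated variables can also pass the cutoff) are assumptions.

```python
import pandas as pd

# Hypothetical toy survey; the values are made up for illustration only.
toy = pd.DataFrame({
    "treatment":      ["Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No"],
    "family_history": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"],
    "remote_work":    ["Yes", "No", "Yes", "No", "Yes", "No", "No", "Yes"],
})

# Turn each answer into an integer code so a rank correlation can be computed.
encoded = toy.apply(lambda col: col.astype("category").cat.codes)

# Spearman correlation of every feature with the target.
corr = encoded.corr(method="spearman")["treatment"].drop("treatment")

THRESHOLD = 0.07  # cutoff quoted in the text
selected = corr[corr.abs() > THRESHOLD].index.tolist()

print(corr.round(2).to_dict())
print(selected)
```

In this toy frame only family_history passes the cutoff, because remote_work is split evenly across both treatment answers.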

Next, we statistically analyzed the responses for each of the related factors above to understand more about the respondents' attitudes toward mental health programs and features in their workplace. We also researched the current state of mental health in the tech industry to compare with our findings.

Finally, we built six different models: Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, AdaBoost, and K-Nearest Neighbors. We split the dataset into a 70% training set and a 30% testing set. Lastly, we compared the accuracy results and picked the model with the highest accuracy score for our prediction.


2. Data Cleaning

Before applying Machine Learning models and methods, we need to remove the corrupted, incorrectly formatted, duplicate, or incomplete data within the dataset (tableau.com). Without this step, the outcomes and algorithms are unreliable.

Data geographic distribution

We examined the dataset and found that the majority of respondents came from the US and UK (each with more than 100 respondents). In this analysis, we target only these top countries and drop every country with fewer than 100 respondents.

Fig. 4 Data geographic distribution

Since we chose not to analyze the U.S. at the state level, we drop the "state" column so we can focus on the country level.
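A minimal sketch of this filtering step is shown below. The frame is a hypothetical miniature of the survey, and the threshold `MIN_RESPONDENTS` is lowered from 100 to 1 only so that the toy data has something to keep.

```python
import pandas as pd

# Hypothetical miniature of the survey (made-up rows).
df = pd.DataFrame({
    "Country": ["United States", "United States", "United Kingdom",
                "United Kingdom", "Canada"],
    "state": ["CA", "NY", None, None, None],
    "treatment": ["Yes", "No", "Yes", "No", "Yes"],
})
MIN_RESPONDENTS = 1  # the text uses 100 on the real data

# Keep only countries with more than MIN_RESPONDENTS answers, then drop "state".
counts = df["Country"].value_counts()
keep = counts[counts > MIN_RESPONDENTS].index
df = df[df["Country"].isin(keep)].drop(columns=["state"]).reset_index(drop=True)

print(sorted(df["Country"].unique()))
print(df.columns.tolist())
```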

Missing values (nulls) and irrelevant data


Fig. 5 Missing values in the dataset

Based on our observation, the comments column contains a lot of missing values (86%). The comment section in a survey is usually an optional text box, and respondents often leave it blank. Because it doesn't contribute any useful information to our analysis, we drop this column.

The column with the second most missing values is work_interfere, with 20% of its values missing. We checked which input values this column contains and replaced the missing values with "NA".


Fig. 6a Current values in work_interfere
Fig. 6b work_interfere after transforming missing values into the new category NA

Similarly, 'self_employed' contains only 13 missing values. We transform these values into 'No', assuming these respondents aren't self-employed.
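The three decisions above (drop comments, fill work_interfere with "NA", fill self_employed with "No") can be sketched like this. The four-row frame is hypothetical and only illustrates the calls involved.

```python
import pandas as pd

# Hypothetical rows with gaps in the same three columns discussed above.
df = pd.DataFrame({
    "work_interfere": ["Often", None, "Never", None],
    "self_employed": ["No", "Yes", None, "No"],
    "comments": [None, None, "n/a", None],
})

# Share of missing values per column, largest first.
missing_share = df.isna().mean().sort_values(ascending=False)

df = df.drop(columns=["comments"])                        # mostly empty free text
df["work_interfere"] = df["work_interfere"].fillna("NA")  # new explicit category
df["self_employed"] = df["self_employed"].fillna("No")    # assume not self-employed

print(missing_share.round(2).to_dict())
print(int(df.isna().sum().sum()))
```

Note that "NA" here is an ordinary string category; if the cleaned frame is written to CSV and read back with pandas defaults, that string would be parsed as missing again unless `keep_default_na=False` is passed to `read_csv`.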

Inconsistency in input values

We noticed quite a few inconsistent values in the "Gender" column.


To clean this column, we transformed all values to lowercase and then recoded male variants to 'male', female variants to 'female', and any other gender identification to 'other'.



# Lower-case the raw answers so they match the spellings listed below
data['Gender'] = data['Gender'].str.lower()

# Every spelling found in the survey, grouped into three categories
M = ['male','m','make','male-ish', 'maile','something kinda male?', 'cis male','mal','male (cis)','guy (-ish) ^_^','male ','man','msle','mail','malr','cis man']
F = ['female','f','woman','cis female','femake','female ', 'cis-female/femme','female (cis)','femail']
Others = ['trans-female','queer/she/they','non-binary','nah', 'all', 'enby', 'fluid', 'genderqueer','androgyne','agender', 'male leaning androgynous','trans woman','neuter', 'female (trans)', 'queer', 'a little about you','p','ostensibly male, unsure what that really means']

# Recode with a single .loc call per group (avoids chained assignment)
data.loc[data['Gender'].isin(M), 'Gender'] = 'male'
data.loc[data['Gender'].isin(F), 'Gender'] = 'female'
data.loc[data['Gender'].isin(Others), 'Gender'] = 'other'

The result shows there are 716 males, 206 females, and 14 of other gender identities.


Another value that needs to be transformed for a better analysis process is in the no_employees column. We converted the values 'More than 1000' to '>1000'.


Values out of Ranges

We found that the Age column contains out-of-range or illogical values, such as negative values, values larger than 100, and values smaller than 16 (assuming the legal working age is 16 and the retirement age is 75).



Fig. 7 Age original data

As Fig. 8 below shows, Age now ranges from 18 to 65 years, with no negative values.


Fig 8 Ages are in range
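One possible way to apply the age rule is a simple range filter. The text does not say whether out-of-range rows were removed or replaced, so dropping them is an assumption here, as are the sample values; the bounds come from the working-age assumption stated above.

```python
import pandas as pd

# Hypothetical ages, including the kinds of bad values described above.
ages = pd.DataFrame({"Age": [-29, 5, 18, 32, 65, 329, 99999999999]})

MIN_AGE, MAX_AGE = 16, 75  # assumed working-age bounds from the text

# between() is inclusive on both ends by default.
in_range = ages["Age"].between(MIN_AGE, MAX_AGE)
cleaned = ages[in_range].reset_index(drop=True)

print(cleaned["Age"].tolist())
print(cleaned["Age"].min(), cleaned["Age"].max())
```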

Irrelevant columns

Several columns don't contribute much to the analysis, such as "Timestamp".

As mentioned in figure 2, "Timestamp" contains the date respondents submitted the survey, which is not an essential factor and does not contribute to the analysis or prediction. Therefore, we drop this column.

3. Data Exploration

3.1 Target of Data

We evaluate the results of the question "Have you sought treatment for a mental health condition?", which is our target variable "treatment". As shown in Fig. 9, about 54% of respondents have sought treatment and around 46% have not. These results indicate that tech workplaces should promote mental health and support employees with mental health issues. According to the CDC website, workplace programs that promote mental health have proven successful, especially when they combine psychological and physical health interventions. Poor mental health and stress can negatively affect employees' job performance and productivity, engagement with their work, communication with coworkers, and physical capability and daily functioning.



Fig. 9 Get Treatment of Survey Respondents


3.2 Respondent Demographics

3.2.1 Type of companies

The majority of the respondents from the U.S. and U.K. are not in technology-based companies even though they work in tech positions. Most of these tech employees are from mid-size to large companies.


Fig. 10


3.2.2 Age

We plot the age distribution of the survey respondents with a histogram and a boxplot (Fig. 11a and 11b). The youngest respondent is 18 years old and the oldest is 65. The boxplot indicates that most respondents are in their mid-20s to early 40s. The skewness of 0.90 means the distribution is positively (right) skewed, which is expected as the tech industry tends to have younger employees. The boxplot also shows no notable difference in age between respondents who did and did not seek treatment.



Fig. 11a Respondents' Age distribution
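The skewness figure quoted above can be obtained with pandas' `Series.skew()`. The ages in this sketch are made up to have a long right tail and are not the survey values, so the printed number will not match 0.90.

```python
import pandas as pd

# Hypothetical ages with a long tail toward older respondents.
ages = pd.Series(
    [22, 24, 25, 26, 27, 28, 29, 30, 31, 33, 35, 38, 42, 50, 62],
    name="Age",
)

skewness = ages.skew()     # sample skewness; positive means a right tail
summary = ages.describe()  # count, mean, std, min, quartiles, max

print(round(skewness, 2))
print(summary[["min", "25%", "50%", "75%", "max"]].to_dict())
```

A positive value goes together with a mean above the median, which is the pattern the histogram of a right-skewed distribution shows.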




3.2.3 Gender


The survey question related to gender is: "What are your gender identities?"

More than 76% of the respondents are male, which is expected as the tech industry is male-dominated (Fig. 12). Female respondents and respondents of other gender identities make up only a small share, which reflects how under-represented these groups are in the tech industry.

According to a 2020 study from Western Governors University, the gender gap in the tech industry has long been an issue, as many women face inequality in the tech workplace. These issues include salary discrepancies, harassment, stereotypes, and more. The lack of balance can be a breeding ground for gender bias within a company or organization and can potentially create mental health issues in the workplace.



Fig.12 Gender proportion


3.3 Insights from tech industry mental health


Correlations

The relationships between "treatment" and the other features are not linear. Therefore, we use Spearman's correlation to measure the relationships between these variables. We eliminate variables whose correlation score with the target variable is less than 0.07, treating them as having no relationship with it.


Fig. 13 Correlation heatmap

As shown in the heatmap above, there is no relationship between the target variable "treatment" and the variables Age, Country, self_employed, remote_work, tech_company, mental_health_consequence, no_employees, phys_health_consequence, coworkers, phys_health_interview, and mental_vs_physical.

There are weak positive relationships between treatment and wellness_program (0.07), seek_help (0.10), anonymity (0.18), leave (0.10), obs_consequence (0.15), and mental_health_interview (0.10). There are moderate positive relationships between treatment and family_history (0.35), benefits (0.23), and care_options (0.25). There is a weak negative relationship between treatment and Gender (-0.14).

Lastly, there is a strong relationship between treatment and work_interfere (0.63). We need to understand the respondents' reactions and attitudes toward mental health in their workplace. Hence, we statistically analyze each of the variables that have a relationship with treatment.
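The per-factor breakdowns in the following subsections (for example, "X% of those who answered yes seek treatment") can be produced with a row-normalized cross-tabulation. This is a sketch on made-up rows; the helper name `treatment_rate_by` is hypothetical.

```python
import pandas as pd


def treatment_rate_by(df, factor):
    """Percentage of each treatment answer within every level of `factor`."""
    table = pd.crosstab(df[factor], df["treatment"], normalize="index")
    return table.mul(100).round(1)


# Hypothetical answers, not the survey data.
toy = pd.DataFrame({
    "family_history": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"],
    "treatment":      ["Yes", "Yes", "Yes", "No",  "Yes", "No", "No", "No"],
})

rates = treatment_rate_by(toy, "family_history")
print(rates)
```

Each row of `rates` sums to 100, so the values read directly as "share of this group that did or did not seek treatment".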


Family History

"Do you have a family history of mental illness?"

Around 42% of the respondents said that they have a family history of mental illness. The plots in Fig. 14 show that these respondents are also more likely to seek treatment than those without a family history of mental illness. This is understandable, since respondents with a family history of mental illness have more awareness of, experience with, and concern about mental health. They are more open to seeking treatment when facing mental health issues. Some research suggests that mental illness can run in families and may be passed on for different reasons, not just genes (rethink.org). Another article, "Inheriting Mental Disorders," states that the chance of an individual having a specific mental disorder is higher if other family members have that same disorder (healthychildren.org).



Fig. 14 Family History of Survey Respondents

Work Interference

"If you have a mental health condition, do you feel that it interferes with your work?"

The respondents were asked how often their mental health condition interferes with their work.

We want to see whether respondents who said that mental health issues interfere with their work seek treatment (see Fig. 15). The responses were "Sometimes" with 39%, "Never" with 17%, "Rarely" with 14%, "Often" with 10%, and NA with 20%. Overall, about 63% of respondents said that mental health issues interfere with their work sometimes, rarely, or often.

The second plot also shows that people who acknowledge this interference tend to seek treatment for their mental health. This is a good sign that people consider mental health treatment a way to improve their work environment or performance.

Surprisingly, some people who answered "Never" still seek treatment. These may be early or preventive treatments for mental health issues (for example, due to stress) before work or a project starts.


Fig. 15 Work_interfere of Respondents

Observed Consequence

"Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?"

About 13% of respondents said they have heard of or observed negative consequences for coworkers with mental health issues, and almost 70% of them seek treatment (Fig. 16). Employers should know that employees who observe coworkers suffering from mental health issues may be encouraged to seek treatment as prevention.

Fig. 16 Consequence observed for disclosing Mental Health

Employers' mental health benefits

"Does your employer provide mental health benefits?"

More than 45% of the respondents know about the mental health benefits offered by their employers, and 60% of them seek treatment. Many respondents (33%) don't know whether their employers offer mental health benefits. Surprisingly, nearly 45% of this group still seek mental health treatment. Companies should aim to raise awareness of these benefits so employees can take advantage of them to improve their mental health in the workplace.


Fig. 17 Benefits awareness

Care options

"Do you know the options for mental health care your employer provides?"

37% of respondents said no, and 26% left the answer blank (Fig. 18). More than half of these respondents did not seek treatment, possibly because they don't know about the mental health programs their companies offer. On the other hand, 37% of respondents said yes: they know that their companies provide care options for mental health, and more than 72% of them seek treatment. As with the benefits section above, this suggests that employees who are aware of their companies' health care options are more likely to seek professional help for mental health issues.

Fig. 18 Care options acknowledgement

Wellness Program

"Has your employer ever discussed mental health as part of an employee wellness program?"

About 20% of the respondents said that mental health is discussed as part of their companies' wellness programs, and about 60% of them chose to seek treatment. However, it is concerning that 65% of the respondents said mental health is not part of the wellness programs their companies provide. More than 60% of them still seek treatment, which suggests that companies need to include mental health in their wellness programs.

Fig.19 Mental health wellness program

Seek help

"Does your employer provide resources to learn more about mental health issues and how to seek help?"

Fig. 20 Mental health in Wellness Program

Like care_options, wellness_program, and benefits, this question relates to companies' mental health support systems. 44% said their companies don't provide resources about mental health, while only 23% said their companies do provide them. Whether or not companies provide such resources, employees still choose to seek treatment. Only those who answered that they don't know tend not to seek treatment. Companies should raise awareness of mental health issues and give their employees more information about the support available.

Anonymity

"Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources?"

This question asks whether respondents' anonymity is protected if they choose to take advantage of mental health or substance abuse treatment resources.

About 31% of the responses are "yes." These respondents know their anonymity is protected while using mental health or substance abuse treatment resources, and more than 65% of them seek treatment. One explanation may be that employees feel safer when their confidential and private information is protected by their companies. However, more than 66% of the respondents don't know whether their anonymity is protected and decide not to seek treatment. As with benefits awareness, companies need to communicate this information to their employees to build trust and encourage them to seek treatment if issues occur.

Fig. 21 Anonymity for Mental Health

Leave

"How easy is it for you to take medical leave for a mental health condition?"

Nearly 50% of respondents said they don't know whether they can take medical leave for a mental health condition; 45% of them seek treatment anyway. A small percentage of people (7%) said that it is difficult for them to take leave for a mental health condition; 75% of them seek treatment. Among respondents who said it is "somewhat easy" or "very easy" to take leave for a mental health condition, more than 50% seek treatment.

Fig. 22 Medical leave for Mental Health reasons

Mental Health in Interview

"Would you bring up a mental health issue with a potential employer in an interview?"

Mental health can be a very sensitive topic to bring up during a job interview. 83% of respondents choose not to discuss their mental health with the interviewer. Only 3% of the respondents think it is a good idea to talk about their mental health with the interviewer.


Fig. 23 Mental Health in Interview

DATA MODELING


As mentioned, we built several prediction models: Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting, and AdaBoost. After examining Spearman's correlations, we selected the features with correlation scores above 0.07 as X (the independent variables). The dependent variable y is "treatment".

We use the elbow method to find the k-value that gives the lowest error rate for K-Nearest Neighbors. As the graph below shows (Fig. 24), the lowest points are at k = 14 and k = 22.


Fig. 24 K-value elbow method
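A curve like the one in Fig. 24 can be produced by fitting K-Nearest Neighbors for a range of k values and recording the test error of each. The sketch below uses synthetic data from scikit-learn's `make_classification` as a stand-in for the encoded survey features, so the best k it reports says nothing about the real dataset; the range of 1 to 30 is also an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the encoded survey features (not the real data).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Error rate = share of test rows the model misclassifies, for each k.
k_values = range(1, 31)
error_rates = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    error_rates.append(float(np.mean(knn.predict(X_test) != y_test)))

best_k = k_values[int(np.argmin(error_rates))]
print(best_k, min(error_rates))
```

Choosing k on the same test set that is later used for the final score tends to make that score optimistic; cross-validation on the training set (for example with `cross_val_score`) is a common alternative.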

Figure 25 shows our algorithms set up.


Fig. 25 Algorithms set up

The results show that KNN has the highest accuracy score (0.84) and Decision Tree has the lowest.


Fig. 26 Result
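The comparison can be sketched as a loop over the six scikit-learn classifiers with a 70/30 split. This is not the notebook's setup from Fig. 25: the data is synthetic, the hyperparameters are library defaults apart from `n_neighbors=14` (one of the two k values named above) and `max_iter=1000`, and the random seeds are arbitrary, so the scores will differ from those reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the selected, encoded features and the target.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)  # 70% train, 30% test

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=14),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Fit each model on the training set and score it on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best_model = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```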

FINDINGS AND INSIGHTS

Through this study, we find that mental health illness exists in the tech workplace among the U.S. and U.K. respondents. Predictors such as 'Gender,' 'work_interfere,' 'family_history,' 'care_options,' 'benefits,' 'anonymity,' 'obs_consequence,' 'seek_help,' and 'mental_health_interview' give an overall accuracy of 84% in predicting whether an employee would seek treatment. More importantly, "work_interfere" has the highest correlation (0.63) with "treatment," which suggests a direct link between suffering from mental disorders and work. Interestingly, "Age" has a very low correlation with "treatment" (0.04), which may indirectly point to the influence of work, and not age, on mental illness. Although this study does not have a medical focus, we found that family history has a moderate relationship with an employee's mental health condition, which in turn affects the decision to seek treatment.

Based on the data exploration and analysis, gender and family history both influence respondents' decisions about whether to seek treatment for mental health issues. Companies with a younger demographic may suffer from more mental health issues at the workplace; however, their employees are more open-minded about seeking treatment.

According to the correlations and data exploration, work interference is the most influential factor among employees who seek treatment. Companies should consider providing facilities and support to help employees manage stress at the workplace.

Besides, as mentioned earlier, companies need to promote and provide good health benefits and keep their employees' personal information private. These are also ways to build trust and improve employees' mental health.

In the data modeling and evaluation, we are satisfied that after thoroughly exploring, cleaning, and encoding the data, all six algorithms (Logistic Regression, KNN, Decision Tree, Random Forest, Gradient Boosting, and AdaBoost) achieved above 77% accuracy on the dataset. However, we were surprised that KNN performed best among the six algorithms, since we believed Logistic Regression would be the most suitable for the binary "treatment" variable. After additional exploration, we think that because KNN is a non-parametric model that supports non-linear solutions, it can perform better than Logistic Regression. The better predictive performance of KNN on this dataset also suggests that its neighbor-based technique fits the features we selected better than Logistic Regression's linear formulation. KNN is also sensitive to outliers, so eliminating outliers in the earlier data processing stage likely improved its performance in this study.


FUTURE WORK

In future work, we are interested in seeing how mental health in the tech workplace has improved in recent years.

The dataset owner has since uploaded the Mental Health in Tech workplace surveys for 2016 and 2019. We can run a similar analysis on these datasets to see whether companies have improved their mental health programs. The features to look at include age, work interference, family history, care options, benefits, anonymity, observed consequences, seek help, and mental health interview.


CONCLUSION

In conclusion, we found that mental health issues are prevalent in the tech workplace among the surveyed respondents, especially in the U.S. They have a direct impact on employees' overall well-being and influence their daily work performance. In this project, machine learning helped us substantiate our findings and identify the features that affect an employee's decision to seek mental health-related treatment.


REFERENCES


Data cleaning: The benefits and steps to creating and using clean data. (n.d.). Retrieved May 10, 2021, from https://www.tableau.com/learn/articles/what-is-data-leaning#:~:text=Data%20cleaning%20is%20the%20process,to%20be%20duplicated%20or%20mislabeled

Does mental illness run in families? (n.d.). Retrieved May 10, 2021, from https://www.rethink.org/advice-and-information/carers-hub/does-mental-illness-run-in-families/


Inheriting mental disorders. (2015, November 25). Retrieved May 10, 2021, from https://www.healthychildren.org/English/health-issues/conditions/emotional-problems/Pages/Inheriting-Mental-Disorders.aspx


Logistic regression. (n.d.). Retrieved May 10, 2021, from https://ml-cheatsheet.readthedocs.io/en/latest/logistic_regression.html

Mental health in the workplace. (2019, April 10). Retrieved May 10, 2021, from https://www.cdc.gov/workplacehealthpromotion/tools-resources/workplace-health/mental-health/index.html


Mental health in the workplace. (n.d.). Retrieved May 10, 2021, from https://www.who.int/teams/mental-health-and-substance-use/mental-health-in-the-workplace


Open Sourcing Mental Illness, LTD. (2016, November 03). Mental health in tech survey. Retrieved May 10, 2021, from https://www.kaggle.com/osmi/mental-health-in-tech-survey


Pant, A. (2019, January 22). Introduction to logistic Regression. Retrieved May 10, 2021, from https://towardsdatascience.com/introduction-to-logistic-regression-66248243c148


Rajgopal, T. (2010, September). Mental well-being at the workplace. Retrieved May 10, 2021, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3062016/

Workplace diversity and mental health in tech. (2019, November 25). Retrieved May 10, 2021, from https://www.diversityintech.co.uk/workplace-diversity-and-mental-health-in-tech





















