Do CEOs Have Deeper Voices?

Analyzing the voice frequency, projection, and fluctuation of Fortune 100 male CEOs compared to average male voices using R.

Posted May 31, 2019



Are Male CEO Voices Different from the Average Male Voice?

Introduction

For my senior project, I wanted to analyze the voices of male CEOs at Fortune 100 companies and see whether they differ significantly from the average male voice. At the start, I had no idea how to analyze sound, and I had to learn some acoustics before I could really begin. I decided to use the R programming language to gather and analyze the sound. I am very familiar with the language, and I found that, compared with many other free and open-source programming languages, it had more resources for analyzing sound. Specifically, I wanted to see whether there was any difference in voice frequency, projection, and fluctuation.

Getting the Data

I used a data set from a study, compiled with help from Richie Zweigenhaft¹, that gathered information about CEOs. From it I took the name, gender, and company of each CEO. I also used data from Mozilla’s Common Voice project to gather samples of male voices to compare with the CEOs’ voices. The CEO voices themselves came from YouTube videos.

The Process

Here is the process that I went through to do my analysis:

I used an R package called RSelenium to find the CEO videos that I needed. Once I had found those videos, I used two command-line tools, youtube-dl and ffmpeg, to extract the audio from them. I then went through the tedious process of listening to the audio files one by one, finding the CEO’s voice, and slicing out an excerpt. The clips that I sliced contain only the voice of the CEO, and I tried to pick segments with as little background noise as possible.
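A minimal sketch of how the download-and-slice step could be driven from R is shown here. It is not the exact script from this project: the helper names, file names, URL, and timestamps are hypothetical, and it assumes youtube-dl and ffmpeg are installed and on the PATH. By default it only builds the command so it can be inspected before anything is run.

```r
# Hypothetical helper: arguments for youtube-dl to extract the audio as WAV.
download_audio_args <- function(url, out_template = "%(id)s.%(ext)s") {
  c("--extract-audio", "--audio-format", "wav", "-o", out_template, url)
}

# Hypothetical helper: arguments for ffmpeg to cut an excerpt that starts at
# start_sec and lasts duration_sec seconds.
slice_audio_args <- function(infile, outfile, start_sec, duration_sec) {
  c("-i", infile,
    "-ss", as.character(start_sec),
    "-t", as.character(duration_sec),
    outfile)
}

# Either return the command as a string (run = FALSE) or execute it.
run_tool <- function(cmd, args, run = FALSE) {
  if (run) {
    system2(cmd, shQuote(args))
  } else {
    paste(cmd, paste(shQuote(args), collapse = " "))
  }
}

# Example with placeholder values; nothing is downloaded or cut here.
dl_cmd  <- run_tool("youtube-dl", download_audio_args("https://example.com/video"))
cut_cmd <- run_tool("ffmpeg", slice_audio_args("talk.wav", "ceo_clip.wav", 30, 15))
```

Finding the right start time and duration for each clip still has to be done by ear, which is the tedious part.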

To analyze the sound in R, I used the tuneR, seewave, and warbleR packages. I relied heavily on a modified version of the warbleR function spec_an from Kory Becker², which computes many acoustic summary statistics for a sound wave. I then tested some of the resulting variables using t-tests and rank-based tests (the Wilcoxon rank-sum test, the two-group counterpart of Kruskal-Wallis). I was also hoping that
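To give a feel for what a summary statistic such as a mean frequency measures, here is a small base R sketch. It is only an illustration of the general idea (an amplitude-weighted average over the frequency spectrum), not the spec_an code, and it uses a synthetic tone with arbitrary values instead of a real voice clip.

```r
# Synthetic 1-second tone at 120 Hz (an arbitrary, illustrative value),
# sampled at 8000 Hz. A real clip would be read from a WAV file instead.
sample_rate <- 8000
time <- (0:(sample_rate - 1)) / sample_rate
wave <- sin(2 * pi * 120 * time)

# Amplitude spectrum: keep only the bins below the Nyquist frequency.
n <- length(wave)
amp <- Mod(fft(wave))[1:(n / 2)]
freq <- (0:(n / 2 - 1)) * sample_rate / n   # frequency of each bin, in Hz

# Normalise the amplitudes so they sum to 1, then take the weighted mean.
weights <- amp / sum(amp)
mean_freq <- sum(freq * weights)
mean_freq
```

For this pure tone the result is roughly 120 Hz, because all of the energy sits in one bin. A real voice spreads energy over many frequencies, so its mean frequency summarises the whole spectrum and is not the same thing as the pitch.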

Having covered the process, I’ll now turn to the exploratory analysis and the results that I found.

Exploratory Analysis

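library(tidyverse)  # %>% pipe and ggplot2
library(ggpubr)     # ggarrange(); assumed to come from ggpubr

# Boxplot of mean frequency (meanfreq) by CEO status
mfrq <- dat %>%
  ggplot(aes(is_ceo, meanfreq, color = as.factor(is_ceo))) +
  geom_boxplot() +
  labs(title = "Mean Frequencies of CEOs and Average Males", y = "Mean Frequency (kHz)", color = "CEO", x = "Is CEO")

# Boxplot of dominant frequency range (dfrange) by CEO status
rang <- dat %>%
  ggplot(aes(is_ceo, dfrange, color = as.factor(is_ceo))) +
  geom_boxplot() +
  labs(title = "Range of CEOs and Average Males", y = "Range (Hz)", color = "CEO", x = "Is CEO")

# Show the two plots side by side
ggarrange(mfrq, rang, ncol = 2)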

The boxplots above tell us a little more about the data. As far as mean frequency goes, there is no apparent difference between a CEO voice and an average male voice. This finding is also supported by a paper from researchers at Duke University³. The more interesting plot is the one for voice fluctuation, where it looks like there could be a difference. I am going to test this variable to see whether it really differs between the two groups.

Testing

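library(pander)  # pander() formats the test output as a table

# Independent-samples t-test on mean frequency (kept for reference)
# pander(t.test(meanfreq ~ is_ceo, data = dat, mu = 0, alternative = "two.sided",
#        conf.level = 0.95))
# Wilcoxon rank-sum test on dfrange (the two-group counterpart of Kruskal-Wallis)
pander(wilcox.test(dfrange ~ is_ceo, data = dat, mu = 0, alternative = "two.sided", 
            conf.level = 0.95))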


Wilcoxon rank sum test with continuity correction: `dfrange` by `is_ceo`

Test statistic    P value     Alternative hypothesis
14150             0.0203 *    two.sided

After conducting an independent-samples t-test, I couldn’t find any significant evidence that the mean frequency is lower for CEOs than for average males. For the range, the residuals were not normally distributed, so a t-test was not appropriate. However, when I ran a Wilcoxon rank-sum test (the two-group counterpart of the Kruskal-Wallis test), which compares ranks rather than means and is often read as a comparison of medians, it showed a difference in fluctuation between the groups (p = 0.0203). Having explored the data and run these tests, I am now going to move on to machine learning to see whether it can pick out any other variables that distinguish the two groups.
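The decision between the two tests can be sketched in base R as follows. The data here are simulated and purely hypothetical (a right-skewed measure for two groups of unequal size), and the 0.05 cut-off for the Shapiro-Wilk check is a common convention that I am assuming, not a rule.

```r
set.seed(1)  # arbitrary seed so the simulated example is reproducible

# Simulated, hypothetical data: a right-skewed "range" measure for two groups.
sim <- data.frame(
  is_ceo  = factor(rep(c(0, 1), times = c(200, 40))),
  dfrange = c(rexp(200, rate = 1), rexp(40, rate = 0.6))
)

# Shapiro-Wilk test per group; small p-values suggest non-normal data.
normality_p <- tapply(sim$dfrange, sim$is_ceo,
                      function(x) shapiro.test(x)$p.value)

# If either group looks non-normal, fall back to the rank-based test.
if (any(normality_p < 0.05)) {
  result <- wilcox.test(dfrange ~ is_ceo, data = sim)
} else {
  result <- t.test(dfrange ~ is_ceo, data = sim)
}
result$method

# With only two groups, kruskal.test() asks the same rank-based question.
kw <- kruskal.test(dfrange ~ is_ceo, data = sim)
```

Because wilcox.test() and kruskal.test() both work on ranks, they are robust to the skew that rules out the t-test here.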

Machine Learning

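library(tidyverse)     # dplyr verbs and as_tibble()
library(caTools)       # sample.split()
library(randomForest)  # randomForest(), varImpPlot()

set.seed(42)  # arbitrary seed, added so the split and forest are reproducible

# Drop the file-name column; the label must be a factor for classification
dat1 <- dat %>% 
  as_tibble() %>%
  dplyr::select(-sound.files) %>%
  mutate(is_ceo = as.factor(is_ceo))

# 70/30 train/test split that preserves the class ratio
ind <- sample.split(Y = dat1$is_ceo, SplitRatio = 0.7)
trainDF <- dat1[ind,]
testDF <- dat1[!ind,]

modelRandom <- randomForest(is_ceo ~ ., data = trainDF, mtry = 9, ntree = 20)
# importance(modelRandom)
varImpPlot(modelRandom, main = "Random Forest for Predicting CEO Voice")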

The machine learning algorithms that I used to try to predict CEO voices were a random forest and k-means clustering. With the random forest, there were too many false negatives (85%) when detecting CEOs for it to be useful, even though it reported 91% accuracy. Also, the variables that it ranks as most important have to do with background noise rather than the voice itself. Even removing those variables doesn’t improve the accuracy. The k-means clustering was not able to separate the clips into CEO and average male groups. Having more CEO voice samples could help with detection, so that the model isn’t trained mostly on average male voices.
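High accuracy and a high false negative rate can coexist when one class is much larger than the other. The sketch below uses made-up counts, chosen only to reproduce that pattern; they are not the actual counts from my test set.

```r
# Hypothetical test set: 200 average male clips and 20 CEO clips.
classes <- c("not_ceo", "ceo")
actual <- factor(rep(classes, times = c(200, 20)), levels = classes)

# Hypothetical predictions: 197 of 200 non-CEOs are classified correctly,
# but only 3 of the 20 CEOs are.
predicted <- factor(
  c(rep("not_ceo", 197), rep("ceo", 3),   # predictions for the non-CEO clips
    rep("not_ceo", 17),  rep("ceo", 3)),  # predictions for the CEO clips
  levels = classes
)

# Confusion matrix: rows are the true class, columns the predicted class.
cm <- table(actual = actual, predicted = predicted)

accuracy <- sum(diag(cm)) / sum(cm)                             # 200 / 220
false_negative_rate <- cm["ceo", "not_ceo"] / sum(cm["ceo", ])  # 17 / 20

# Accuracy of always guessing the majority class, for comparison.
majority_baseline <- max(table(actual)) / length(actual)        # 200 / 220
```

With these counts the accuracy is about 91% and the false negative rate is 85%, yet a model that always answers "not a CEO" would reach the same accuracy. That is why accuracy alone is a poor yardstick for unbalanced data like this.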

Conclusion

In conclusion, it seems that CEOs have more range in their voices, but they don’t have a lower voice than the average male. There are many things that could be done to improve this research. One is to collect more samples of CEO voices in different settings, recorded with the same equipment, to get more accurate results. The average male speakers in my samples were either reading from a book or answering an on-screen question, while the CEOs were usually talking to someone. There could also be other acoustic variables that could be used to detect differences. There must be reasons other than their voice that they are chosen as CEOs.

References

1. The Faces of American Power, Nearly as White as the Oscar Nominees. The New York Times. https://www.nytimes.com/interactive/2016/02/26/us/race-of-american-power.html
2. K. Becker. Identifying the Gender of a Voice using Machine Learning. http://www.primaryobjects.com/2016/06/22/identifying-the-gender-of-a-voice-using-machine-learning/
3. W. J. Mayew, C. A. Parsons, and M. Venkatachalam (2013). Voice Pitch and the Labor Market Success of Male Chief Executive Officers. Evolution and Human Behavior, 34. doi:10.1016/j.evolhumbehav.2013.03.001



This post is licensed under CC BY 4.0 by the author.