An R Function for the Blau Index of Diversity
In diversity research, one is often interested in how an individual feature is distributed among the members of a group, that is, in how diverse a group is with regard to that feature. If the feature is metric, e.g. age or organizational tenure, researchers quantify the diversity of a group with measures of dispersion. For example, the standard deviation of the group members' ages can be used to indicate the age diversity of a group.
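As a minimal sketch of the metric case, the standard deviation can be computed per group with `tapply`. The data and the variable names `age` and `age.sd` are made up for illustration.

```r
# Hypothetical data: two groups of three members each
groupid <- c(1, 1, 1, 2, 2, 2)
age     <- c(30, 30, 30, 25, 35, 45)

# Standard deviation of age within each group
age.sd <- tapply(age, groupid, sd)
age.sd
```

Group 1 has no age diversity (standard deviation 0), while group 2 has a standard deviation of 10.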
If researchers wish to quantify the diversity of a group with regard to a nominal feature, such as ethnicity, gender, or education, they usually employ the Blau Index (Blau, 1977). The Blau Index is calculated by
$$B = 1 - \sum_{i=1}^{k} p_i^2$$
where $p_i$ is the proportion of group members in category $i$ and $k$ is the number of different categories of the feature across all groups. If a group is homogeneous with regard to the feature in question, e.g., if all group members have the same nationality, the Blau Index of the group for nationality is 0. If every member of a group of $n$ members has a different nationality, the Blau Index of that group for nationality is $1 - 1/n$, so it approaches 1 as the group grows. The maximum Blau Index for a feature in a given data set depends on the number of categories of that feature in the data set: with $k$ categories it is $(k - 1)/k$, reached when the group members are spread evenly across all categories.
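The definition can be checked by hand for a single group. The sketch below uses a hypothetical helper, `blau.single`, and made-up nationality codes. It assumes the feature vector contains no missing values.

```r
# Blau Index of a single group, computed directly from the definition.
# blau.single is a hypothetical helper name used only for this illustration.
blau.single <- function(feat) {
  p <- table(feat) / length(feat)  # proportion of members in each category
  1 - sum(p^2)                     # one minus the sum of squared proportions
}

blau.single(c("DE", "DE", "DE", "DE"))  # homogeneous group: 0
blau.single(c("DE", "FR", "NL", "US"))  # all members differ: 1 - 4 * 0.25^2 = 0.75

# Upper bound when members are spread evenly over k categories
k <- 4
(k - 1) / k                             # 0.75
```

With four members who all differ, the index is 0.75 rather than 1, because four categories cap the index at 3/4.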
A number of studies have linked the Blau Index of (management) teams to team processes and team outcomes (e.g., Bantel & Jackson, 1989; Richard, Barnett, Dwyer, & Chadwick, 2004; Chandler, Honig, & Wiklund, 2005; Pitts, 2005). Therefore, I wanted to include the Blau Index for various features in the analysis of data I had collected in an attempt to replicate and extend a study by Homan, van Knippenberg, van Kleef, and De Dreu (2007).
In doing so, I was unable to locate an R function for calculating the Blau Index. I therefore wrote my own and thought that others might also find it useful.
The function takes two arguments:
A numeric vector, groupid, denoting the group of every person/participant in the data set.
A second vector, feat, either numeric or string, giving the value (category) of the feature for each person/participant in the data set.
The function returns a numeric vector with one Blau Index per group, so its length equals the number of groups.
Example:
groupid <- c(1,1,1,2,2,2,2)
feature <- c("male", "male", "male", "female", "female", "male", "male")
blau.index(groupid, feature)
[1] 0.0 0.5
Here is the code:
blau.index <- function(groupid, feat) {
  # Convert the feature to a factor so that numeric and string codings are
  # handled the same way; missing values form a category of their own.
  feat <- addNA(as.factor(feat), ifany = TRUE)
  # Cross-tabulate: one row per group (in sorted order of the group labels),
  # one column per category. Group labels need not be 1, 2, 3, ...
  counts <- table(as.factor(groupid), feat)
  # Proportion of each category within each group
  props <- prop.table(counts, margin = 1)
  # Blau Index: one minus the sum of squared proportions, per group
  unname(1 - rowSums(props^2))
}
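Since an analysis often involves several nominal features at once, here is a hedged sketch of computing the index for more than one column of a data frame. The data set, the column names, and the helper `blau.by.group` are hypothetical. The helper is a compact stand-in that keeps the block self-contained, and it assumes there are no missing values.

```r
# Hypothetical data set: group labels plus two nominal features
dat <- data.frame(
  groupid   = c(1, 1, 1, 2, 2, 2, 2),
  gender    = c("male", "male", "male", "female", "female", "male", "male"),
  education = c("BSc", "MSc", "PhD", "BSc", "BSc", "BSc", "MSc"),
  stringsAsFactors = FALSE
)

# Blau Index per group for one feature (hypothetical helper for this sketch)
blau.by.group <- function(groupid, feat) {
  tapply(feat, groupid, function(x) 1 - sum((table(x) / length(x))^2))
}

# One column per feature, one row per group
features <- c("gender", "education")
blau.table <- sapply(dat[features], function(f) blau.by.group(dat$groupid, f))
blau.table
```

The result is a matrix with groups in rows and features in columns, which can be merged back into group-level data for further analysis.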
I would appreciate suggestions for improvements.
References
Bantel, K., & Jackson, S. (1989). Top management and innovations in banking: Does the composition of the top team make a difference? Strategic Management Journal, 10, 107–124.
Blau, P. M. (1977). Inequality and heterogeneity. New York, NY: Free Press.
Chandler, G. N., Honig, B., & Wiklund, J. (2005). Antecedents, moderators, and performance consequences of membership change in new venture teams. Journal of Business Venturing, 20, 705–725.
Homan, A. C., van Knippenberg, D., van Kleef, G. A., & De Dreu, C. K. W. (2007). Bridging faultlines by valuing diversity: Diversity beliefs, information elaboration, and performance in diverse work groups. Journal of Applied Psychology, 92(5), 1189–1199.
Pitts, D. (2005). Diversity, representation, and performance: Evidence about race and ethnicity in public organizations. Journal of Public Administration Research and Theory, 15, 615–631.
Richard, O., Barnett, T., Dwyer, S., & Chadwick, K. (2004). Cultural diversity in management, firm performance, and the moderating role of entrepreneurial orientation dimensions. Academy of Management Journal, 47, 255–266.