2009-08-12

An R Function for the Blau Index of Diversity

In diversity research, one is often interested in how an individual feature is distributed among the members of a group, in other words, how diverse a group is with regard to that feature. If the feature is measured on a metric scale, e.g., age or organizational tenure, researchers use measures of dispersion to quantify the group's diversity with regard to that feature. For example, the standard deviation of the group members' ages can serve as an indicator of the group's age diversity.
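Here is a minimal sketch of this per-group dispersion measure in base R. The vectors `groupid` and `age` contain hypothetical, illustrative data and do not come from any study. `tapply()` splits the ages by group and applies `sd()` to each subset.

```r
# Hypothetical example data: group membership and age of seven participants
groupid <- c(1, 1, 1, 2, 2, 2, 2)
age     <- c(25, 25, 25, 22, 35, 41, 58)

# Age diversity per group, measured as the standard deviation of members' ages
age.diversity <- tapply(age, groupid, sd)
age.diversity
```

Group 1 is homogeneous in age, so its standard deviation is 0. Note that `sd()` returns `NA` for a group with only one member.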

If researchers wish to quantify the diversity of a group with regard to a nominal feature, such as ethnicity, gender, or education, they usually employ the Blau Index (Blau, 1977). The Blau Index is calculated as

$$B = 1 - \sum_{i=1}^{k} p_i^2$$

where p_i is the proportion of group members in category i and k is the number of different categories of the feature across all groups. If a group is homogeneous with regard to the feature in question, e.g., if all group members have the same nationality, the Blau Index of the group for nationality is 0. If every member of the group has a different nationality, the Blau Index of a group of n members is 1 − 1/n, which approaches 1 as the group grows. The maximum Blau Index for a feature in a given data set depends on the number of categories of that feature in the data set: with k categories it cannot exceed 1 − 1/k, a bound reached when the group members are spread evenly across all k categories.
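To make the formula concrete, here is a small sketch in base R. The helper `blau.single()` and the nationality codes are hypothetical and serve only this illustration. The helper computes 1 minus the sum of squared category proportions for the members of a single group and reproduces the three cases described above.

```r
# Blau Index of a single group: 1 minus the sum of squared category proportions
blau.single <- function(x) {
  p <- table(x) / length(x)   # proportion p_i of members in each category i
  1 - sum(p^2)
}

blau.single(c("DE", "DE", "DE", "DE"))  # homogeneous group: 0
blau.single(c("DE", "NL", "FR", "ES"))  # all different, n = 4: 1 - 1/4 = 0.75
blau.single(c("DE", "DE", "NL", "NL"))  # evenly split over k = 2 categories: 1 - 1/2 = 0.5
```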

A number of studies have linked the Blau Index of (management) teams to team processes and team outcomes (e.g., Bantel & Jackson, 1989; Richard, Barnett, Dwyer, & Chadwick, 2004; Chandler, Honig, & Wiklund, 2005; Pitts, 2005). Therefore, I also wanted to include the Blau Index for various features in my analysis of the data I collected in an attempt to replicate and extend a study by Homan, van Knippenberg, van Kleef, & De Dreu (2007).

In doing so, I was unable to locate an R function for calculating the Blau Index. I therefore wrote my own and thought that others might also find it useful.

The function takes two arguments:
A numeric vector, groupid, denoting the group of every person/participant in the data set.
A second vector, feat, which can be either numeric or character (string), denoting the value of the feature for each person/participant in the data set.

The function returns a numeric vector with one Blau Index per group, ordered by ascending group ID.

Example:

groupid <- c(1,1,1,2,2,2,2)
feature <- c("male", "male", "male", "female", "female", "male", "male")

blau.index(groupid, feature)

[1] 0.0 0.5

Here is the code:

blau.index <- function(groupid, feat){
  groups <- levels(as.factor(groupid))     # group IDs; need not be 1, 2, ..., G
  categories <- levels(as.factor(feat))    # categories of the feature (numeric codes or strings)
  # code each person's category as an integer 1..k; missing values form category k + 1
  feat.num <- match(feat, categories, nomatch = length(categories) + 1)
  blau.index <- rep(0, length(groups))
  for (i in seq_along(groups)){
    in.group <- groupid == groups[i]       # members of group i
    for (j in unique(feat.num)){
      # add the squared proportion of group i's members that fall into category j
      blau.index[i] <- blau.index[i] + (sum(in.group & feat.num == j) / sum(in.group))^2
    }
  }
  blau.index <- (1 - blau.index)
  return(blau.index)
}
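The loops above can also be replaced by a single cross-tabulation. The following is a sketch of that alternative, not the function above: the name `blau.table()` is hypothetical, and the approach relies on base R's `table()` and `prop.table()`. It tabulates group by category, converts the counts into proportions within each group (row), and then applies 1 minus the sum of squared proportions row-wise.

```r
# Hypothetical vectorized variant: one Blau Index per group via a group x category table
blau.table <- function(groupid, feat) {
  counts <- table(groupid, feat)            # rows: groups, columns: feature categories
  p <- prop.table(counts, margin = 1)       # proportions within each group (row)
  1 - rowSums(p^2)                          # named vector, one value per group
}

groupid <- c(1, 1, 1, 2, 2, 2, 2)
feature <- c("male", "male", "male", "female", "female", "male", "male")
blau.table(groupid, feature)
```

For the example data, this returns 0 for group 1 and 0.5 for group 2, with the group IDs as names. One design consequence to keep in mind is that `table()` drops missing values of `feat` by default, so they do not enter the proportions.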


I would appreciate suggestions for improvements.

References

Bantel, K., & Jackson, S. (1989). Top management and innovations in banking: Does the composition of the top team make a difference? Strategic Management Journal, 10, 107–124.

Blau, P. M. (1977). Inequality and heterogeneity. New York, NY: Free Press.

Chandler, G. N., Honig, B., & Wiklund, J. (2005). Antecedents, moderators, and performance consequences of membership change in new venture teams. Journal of Business Venturing, 20, 705–725.

Homan, A. C., van Knippenberg, D., van Kleef, G. A., & De Dreu, C. K. W. (2007). Bridging faultlines by valuing diversity: Diversity beliefs, information elaboration, and performance in diverse work groups. Journal of Applied Psychology, 92(5), 1189–1199.

Pitts, D. (2005). Diversity, representation, and performance: Evidence about race and ethnicity in public organizations. Journal of Public Administration Research and Theory, 15, 615–631.

Richard, O., Barnett, T., Dwyer, S., & Chadwick, K. (2004). Cultural diversity in management, firm performance, and the moderating role of entrepreneurial orientation dimensions. Academy of Management Journal, 47, 255–266.

10 comments:

  1. Stephan Kolassa, 9:57 AM

    Here you go:

    blau <- function (features) { 1-sum((table(features)/length(features))^2) }
    by(data=feature,INDICES=groupid,FUN=blau)

    The first line defines a function "blau", which calculates the Blau index for a single group, by tabulating the feature using table(), getting the relative frequencies of the features by dividing by the length of the features vector, squaring and subtracting from one.

    The second line uses by() to apply blau() separately to the features as indexed by the groupid vector. We even get a nice tabulated output. by() is often quite helpful...

    This solution also seems to be faster:

    nn <- 1000000
    set.seed(2009)
    groupid <- sample(seq(1,10),size=nn, replace=TRUE)
    feature <- sample(c("male","female"),size=nn, replace=TRUE)

    system.time(blau.index(groupid, feature))
    system.time(by(data=feature,INDICES=groupid,FUN=blau))

    yields user/system/elapsed times of 5.09/0.27/5.39 seconds for blau.index() and 0.67/0.05/0.72 seconds for by(blau).

    Good luck with your diversity research!

  2. Anonymous, 10:37 PM

    Hello, I am a PhD student writing a dissertation about board diversity. I have data on gender, age, tenure, and education level for about 180 Spanish boards. I want to calculate an aggregate diversity index per board. I am sorry for my ignorance, as I am just starting my work, but I wonder whether there is a model for calculating this, or whether I have to calculate the Blau index per attribute (for gender, for age, for tenure...). I would be very grateful to receive information at adaktiva@yahoo.es. Thanks, Brita Wergeland
