Theoretical/empirical framework

Data

The original dataset is “Mexican political elite”. Its characteristics, as given on the dataset's web page, are as follows:

“Mexican_power.net: 35 vertices (Mexican presidents and close collaborators), 117 edges (political, kinship, friendship, or business ties), no arcs, no loops, no line values.

Mexican_military.clu: a classification of the (35) politicians according to their professional background (1 - military in class, 2 - civilians).

Mexican_year.clu: the first year (minus 1900) in which the actor occupied a significant governmental position.”

Setting up the data

Loading all required packages

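library(network)   # read.paj()
library(migraph)   # as_tidygraph(), to_undirected(), autographr()
library(igraph)
library(tidygraph) # mutate() on graph objects and the %>% pipe
library(ggplot2)   # ggtitle()
library(corrplot)  # corrplot()
library(sna)       # netlogit()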

Importing the data from the original Pajek file: the ties as an adjacency matrix, the attributes as vectors

mexican_power <- read.paj(file = "mexican_power.paj")

m_p_network <- as.matrix(mexican_power$networks$mexican_power.net)
year <- unlist(mexican_power$partitions$mexican_year.clu)
military <- unlist(mexican_power$partitions$mexican_military.clu)

Transforming the data and adding the attributes

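The year attribute is rescaled relative to its minimum (min(year)/year) to turn it into a seniority score: the actor who entered the political power apparatus first scores 1, and later entrants receive progressively smaller values. This is a relative measure of seniority, not a duration in years.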
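To make the effect of this rescaling concrete, the following sketch applies it to a few hypothetical entry years (not taken from the dataset). It assumes, as in the data, that all year values are positive. It also shows that the scale is non-linear, which matters later when distances between seniority scores are computed.

```r
# Hypothetical first years in office (minus 1900), for illustration only;
# the rescaling assumes all values are positive
entry_year <- c(10, 20, 40, 25)

# Same transformation as in the analysis: earliest entrant -> 1, later entrants -> smaller values
seniority <- min(entry_year) / entry_year
print(seniority)
# [1] 1.00 0.50 0.25 0.40

# The scale is non-linear: equal 10-year gaps give unequal seniority gaps
gap_early <- 10 / 10 - 10 / 20  # entry in year 10 vs. year 20: 0.5
gap_late  <- 10 / 30 - 10 / 40  # entry in year 30 vs. year 40: about 0.083
```

Differences between early entrants are therefore weighted more heavily than equally long gaps between later entrants.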

year <- min(year)/year

# Initially, this line didn't work, since m_p_network is still a plain matrix at this point:
# m_p_network <- m_p_network %>% mutate(year = year,
#                                       military = military)

# Re-installing migraph and retrying didn't make it work, but converting the matrix into an undirected tidygraph first solved the problem:

m_p_network <- to_undirected(as_tidygraph(m_p_network))
m_p_network <- m_p_network %>% mutate(year = year,
                                      military = military)

Visualising with suitable parameters

# Visualising the network with the following line worked, but the size was off.

# autographr(to_undirected(m_p_network), labels = FALSE,
#            node_shape = "military", node_size = year)

# Using "year" directly as the node size produces an unsatisfying result, and an initial attempt to scale it down (*0.1 or /10) was not successful. Using log(year) initially helped. After transforming "year" into a seniority value, which I only implemented later in the drafting process, the following adjustment worked:

autographr(to_undirected(m_p_network), labels = FALSE,
           node_shape = "military", node_size = 20*year) + ggtitle("Mexican Power Network (I)", subtitle = "based on Seniority (size) and Military (circle) / Civilian (square)")

I am adding another attribute: place of birth. To keep the attribute manageable for the given number of nodes, I use a division of the 32 Mexican states into three regions, North, Centre, and South (Viesti 2015, Regional Development Scan: Mexico, EUROsociAL Programme for Social Cohesion in Latin America, p. 8). In the source, three states are categorised as “hinge states”; I have decided to assign each of them to the respective peripheral region (North or South).

Pulling a list of node names

# It took several attempts to figure out how to pull the node names needed for collecting the birthplace data:

# names <- mexican_power$networks$mexican_power.net
# names <- names$val

# m_p_matrix <- as.matrix(mexican_power$networks$mexican_power.net)
# m_p_graph <- graph_from_adjacency_matrix(m_p_matrix)
# V(m_p_graph)$names # NULL: the igraph vertex attribute is called "name", so V(m_p_graph)$name should have returned the names
# row.names(m_p_matrix)

# Finally, this worked:

# write.csv(row.names(m_p_matrix),
#           file = "Mexican power names.csv")

Importing the csv file with the new attribute, adding it to the network object, and visualising the network

# For the data collection, see the CSV file named "Mexican power data collection", which includes the birthplace of each node according to its page on the Spanish Wikipedia (for two nodes, an additional web search was necessary), the state, and the region code (1 = North, 2 = Centre, 3 = South). I created a reduced file, region.csv, with the node names in the first column and the region codes in the second.

region <- read.csv(file = "region.csv")
region <- unlist(region$region)
m_p_network <- m_p_network %>% mutate(region = region)

BaseGraph <- autographr(to_undirected(m_p_network), labels = FALSE,
                        node_shape = "military", node_size = 20*year, node_color = "region") + ggtitle("Mexican Power Network (II)", subtitle = "based on Seniority (size), Military (circle) / Civilian (square),\nand Origin (North=red, Centre=blue, South=green)")


# Despite some web research, I was unable to implement a legend with autographr. The shapes and colors were identified using node_group = "[military/region]".
region
##  [1] 1 1 1 1 3 1 1 3 3 2 3 3 1 2 2 3 3 3 2 2 3 3 2 3 3 2 3 1 2 3 1 2 2 3 2
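The region vector above is attached to the network purely by position, which only works if region.csv lists the nodes in exactly the same order as the network. A slightly more defensive option, sketched here with a hypothetical toy matrix and a hypothetical name column called `name`, is to align the codes by node name and stop if any node is missing.

```r
# Toy adjacency matrix with node names (hypothetical names)
adj <- matrix(0, 3, 3, dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Hypothetical CSV content, deliberately in a different row order
region_df <- data.frame(name = c("C", "A", "B"), region = c(3, 1, 2))

# Align the attribute to the network's node order by name instead of by position
region_aligned <- region_df$region[match(rownames(adj), region_df$name)]

# Fail loudly if a node has no entry in the CSV
stopifnot(!anyNA(region_aligned))
print(region_aligned)
# [1] 1 2 3
```

Applied to the real data, rownames(adj) would correspond to row.names of the network matrix, and the column names would have to match those actually used in region.csv.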

Full network data visualisation

BaseGraph

Research question/Hypothesis

The overall guiding question is whether the attributes have significant explanatory power for the ties between actors. We hypothesise that homophily (sharing an attribute value) and temporal proximity have a positive effect on whether two nodes are tied.


Analysis based on (Multiple regression) Quadratic Assignment Procedure ((MR)QAP)

Exploring correlations between the variables

Next, I explore the correlations between the ties and the nodal attributes. For this, matrices indicating whether two nodes share the same military/civilian and region attribute are needed. Furthermore, the year attribute is coerced into a matrix that expresses the temporal distance between two actors' entry into a significant governmental position. Since year has already been transformed into a seniority score, this distance is measured on that scale rather than in years.
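The dyadic matrices described above can be illustrated on a hypothetical four-actor example. The sketch shows how matrix() and t() produce the ego (row) and alter (column) matrices, and how base R's outer() builds the homophily and distance matrices directly, as an alternative to the subtraction-based workaround used in the code below.

```r
# Hypothetical attribute vectors for 4 actors (1 = military, 2 = civilian)
milit  <- c(1, 2, 2, 1)
senior <- c(1, 0.5, 0.25, 0.4)
n <- length(milit)

# Ego ("row") matrix: entry [i, j] holds actor i's value, because matrix() fills column-wise
ego <- matrix(milit, n, n)
# Alter ("column") matrix: entry [i, j] holds actor j's value
alter <- t(ego)

# Homophily matrix: 1 if actors i and j share the value, 0 otherwise
same <- outer(milit, milit, "==") * 1

# Absolute distance between the two actors' seniority scores
dist <- abs(outer(senior, senior, "-"))
```

With the variables of this analysis, the same idea would read sameMilit <- (militarymat == militarymatT) * 1. The diagonal of such a homophily matrix is 1 (every actor shares its own value); netlogit ignores the diagonal by default (diag = FALSE).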

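# matrix() fills column-wise, so entry [i, j] holds actor i's value (ego/row matrix);
# its transpose holds actor j's value (alter/column matrix)
militarymat <- matrix(military,35,35)
militarymatT <- t(matrix(military,35,35))

regionmat <- matrix(region,35,35)
regionmatT <- t(matrix(region,35,35))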

# A binary matrix indicating whether two nodes share an attribute value requires a "1" for the same value, but subtracting the row- and column-value matrices yields "0" for the same value and numbers != 0 for differences. I therefore use the following workaround to transform the matrix.
sameMilit <- militarymat-militarymatT # produces values of -1, 0 and 1
# The following approach did not work because a single "=" is not a comparison operator in R; with "==" it would work
# sameMilit[sameMilit = 0] <- 2 # temporary placeholder for same attribute value
# sameMilit[sameMilit < 2] <- 0 # "1"s and "-1"s turned to "0" (= not same value)
# sameMilit[sameMilit = 2] <- 1 # final value for same attribute value
sameMilit[sameMilit < 0] <- 1 # different military/civilian
sameMilit[sameMilit < 1] <- 2 # same military/civilian
sameMilit[sameMilit < 2] <- 0 # final value for different military/civilian
sameMilit[sameMilit > 1] <- 1 # final value for same military/civilian

sameRegion <- regionmat-regionmatT # produces values of -2, -1, 0, 1 and 2
sameRegion[sameRegion < 0] <- 1 # different region
sameRegion[sameRegion > 0] <- 1 # different region
sameRegion[sameRegion < 1] <- 2 # same region
sameRegion[sameRegion < 2] <- 0 # final value for different region
sameRegion[sameRegion > 1] <- 1 # final value for same region

yearmat <- matrix(year,35,35)
yearmatT <- t(matrix(year,35,35))
distanceYear <- yearmat-yearmatT
distanceYear <- abs(distanceYear)

network_ties <- as.matrix(mexican_power$networks$mexican_power.net) # same code as for pulling the data for m_p_network (see above)
# colnames(network_ties) <- NULL
# rownames(network_ties) <- NULL

corrDataFrame <- data.frame(ties = as.numeric(network_ties),
                            sameMilitary = as.numeric(sameMilit),
                            sameRegion = as.numeric(sameRegion),
                            distanceYear = as.numeric(distanceYear))
# cor(corrDataFrame)
par(mfrow = c(1,1))
corrplot(cor(corrDataFrame))

The correlation plot shows positive correlations among the first three variables (ties, sameMilitary, and sameRegion). For distanceYear, we expect its inverse, i.e. the temporal proximity of two actors' entry into the political power structure, to correlate positively with ties. The negative correlations of distanceYear with the other variables are therefore consistent with this expectation, and they are indeed the strongest. The correlation between sharing the same region and sharing the same military/civilian status is the weakest.
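The correlations above are computed over all 35 × 35 matrix cells. Because all matrices are symmetric, counting every dyad twice does not change a correlation coefficient, but the 35 diagonal self-pairs (no tie, same attribute, zero distance) do enter the calculation. The following sketch, using hypothetical random data, compares the full-matrix correlation with one computed on each unordered dyad once.

```r
set.seed(1)
n <- 10

# Hypothetical undirected (symmetric) tie matrix without self-ties
ties <- matrix(rbinom(n * n, 1, 0.3), n, n)
ties[lower.tri(ties)] <- t(ties)[lower.tri(ties)]
diag(ties) <- 0

# Hypothetical homophily matrix; its diagonal is 1 (every actor "shares" its own value)
grp <- rep(1:2, length.out = n)
same <- outer(grp, grp, "==") * 1

# Full-matrix version: includes the n self-pairs (and every dyad twice)
cor_full <- cor(as.numeric(ties), as.numeric(same))

# All ordered pairs i != j: every dyad twice, but no self-pairs
offdiag <- row(ties) != col(ties)
cor_offdiag <- cor(ties[offdiag], same[offdiag])

# Dyad-level version: each unordered pair once, no self-pairs
keep <- upper.tri(ties)
dyads <- data.frame(ties = ties[keep], same = same[keep])
cor_dyads <- cor(dyads$ties, dyads$same)

c(full = cor_full, offdiag = cor_offdiag, dyads = cor_dyads)
```

The off-diagonal and dyad-level values coincide, so the only difference to the full-matrix version comes from the diagonal.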

Logistic (MR)QAP model

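Since the ties are binary (present/absent), the linear model fitted by netlm is not appropriate here; we therefore resort to netlogit, which fits a logistic regression and tests its coefficients against a QAP null hypothesis. We use its default settings, which are the same as for netlm (nullhyp = "qap", reps = 1000). Note that netlogit's default mode = "digraph" treats every undirected tie as two directed ones, which is why the model output further below reports 1190 (= 35 × 34) observations; mode = "graph" would count each dyad only once.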

What is tested here is whether having the same military/civilian status, originating from the same (macro-)region, and having entered a significant governmental position at close points in time have a significant effect on whether two nodes are tied. For the first two attributes, ego and alter effects are tested as well:

Model 1 tests the military/civilian and the region attribute with regard to homophily. Model 2 tests these attributes with regard to their ego and alter effects. Model 3 combines both. All three models include the temporal distance (distanceYear).
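To clarify what a QAP test does, here is a minimal base-R sketch of a logistic QAP under simplifying assumptions: it uses the plain Y-permutation variant (rows and columns of the tie matrix are permuted together, then the model is refitted). As far as I can tell from the sna documentation, netlogit's default "qap" option uses a semi-partialling procedure instead, so p-values from this sketch will not match netlogit exactly. The function name qap_logit and the toy data are hypothetical.

```r
# Sketch: logistic regression with a Y-permutation QAP test (base R only)
qap_logit <- function(y, xs, reps = 200) {
  n <- nrow(y)
  off <- row(y) != col(y)  # all ordered pairs i != j, like netlogit's default digraph mode

  # Fit a logistic regression of the ties on the (vectorised) covariate matrices
  fit_coef <- function(ymat) {
    d <- data.frame(tie = ymat[off], sapply(xs, function(x) x[off]))
    coef(glm(tie ~ ., data = d, family = binomial))
  }

  observed <- fit_coef(y)

  # Relabel the nodes at random: permuting rows and columns of y together keeps
  # the network structure intact but breaks its link to the covariates
  permuted <- replicate(reps, {
    p <- sample(n)
    fit_coef(y[p, p])
  })

  # Two-sided p-value: share of permuted |b| at least as large as the observed |b|
  p_values <- rowMeans(abs(permuted) >= abs(observed))
  list(coef = observed, p = p_values)
}

# Usage on hypothetical data: ties are more likely within groups
set.seed(42)  # fixes the random permutations, making the p-values reproducible
n <- 20
grp <- rep(1:2, each = n / 2)
same <- outer(grp, grp, "==") * 1
prob <- ifelse(same == 1, 0.5, 0.1)
y <- matrix(rbinom(n * n, 1, prob), n, n)
y[lower.tri(y)] <- t(y)[lower.tri(y)]  # symmetrise (undirected network)
diag(y) <- 0

res <- qap_logit(y, list(same = same), reps = 200)
res$coef
res$p
```

Because the p-values rest on random permutations, they vary slightly between runs unless a seed is set, which also applies to netlogit.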

matrixlist1 <- list(sameMilit, sameRegion, distanceYear)
matrixlist2 <- list(militarymat, militarymatT, distanceYear, regionmat, regionmatT)
matrixlist3 <- list(sameMilit, sameRegion, distanceYear, regionmat, regionmatT, militarymat, militarymatT)

results1 <- sna::netlogit(network_ties, matrixlist1)
# summary(results1) # works just the same as print.summary.netlogit()

results2 <- sna::netlogit(network_ties, matrixlist2)
# summary(results2)

results3 <- sna::netlogit(network_ties, matrixlist3)
# summary(results3)
# Figuring out if turning the distance into proximity makes any difference...
# max(distanceYear)
dummyMatrix <- matrix(max(distanceYear), nrow = 35, ncol = 35)
proximityYear <- dummyMatrix - distanceYear

results1a <- sna::netlogit(network_ties, list(sameMilit, sameRegion, proximityYear))
# summary(results1a)

# No; as suspected, the only difference between "distanceYear" and "proximityYear" is the sign of the estimate (positive instead of negative, with the same magnitude); the intercept shifts accordingly.

# To keep the summary(...) outputs from taking up a lot of space, I summarise them in the following table ("p-value" stands for "Pr(>=|b|)" in the original output, which caused issues with the table formatting):
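The distance/proximity result follows from the fact that a logistic regression is unaffected by an affine recoding of a covariate: with proximity = c − distance, the linear predictor b0 + b·distance equals (b0 + b·c) − b·proximity. The following sketch checks this with glm on simulated (hypothetical) data, using the same construction as proximityYear.

```r
set.seed(7)
n_obs <- 300
distance <- runif(n_obs)

# Hypothetical ties: less likely at larger distance
tie <- rbinom(n_obs, 1, plogis(1 - 3 * distance))

# Same construction as proximityYear: maximum distance minus distance
proximity <- max(distance) - distance

fit_dist <- glm(tie ~ distance, family = binomial)
fit_prox <- glm(tie ~ proximity, family = binomial)

coef(fit_dist)
coef(fit_prox)
```

The slopes are equal in magnitude with opposite signs, the intercept of the proximity model equals b0 + b·max(distance), and the fit (deviance) is identical.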

Overview of results (I)

| Variable | Estimate (M1) | p-value (M1) | Estimate (M2) | p-value (M2) | Estimate (M3) | p-value (M3) |
|---|---|---|---|---|---|---|
| (intercept) | -1.4628191 | 0.000 | 2.5686686 | 0.040 | 2.5036728 | 0.062 |
| sameMilit | 0.4745965 | 0.069 | / | / | 0.7238885 | 0.006 |
| sameRegion | 0.6478313 | 0.003 | / | / | 0.7751641 | 0.004 |
| distanceYear | -3.2905480 | 0.007 | -5.9440693 | 0.000 | -5.5577365 | 0.000 |
| regionmat | / | / | -0.3425002 | 0.045 | -0.3813261 | 0.032 |
| regionmatT | / | / | -0.3425002 | 0.046 | -0.3813261 | 0.028 |
| militarymat | / | / | -0.4970676 | 0.072 | -0.6631996 | 0.026 |
| militarymatT | / | / | -0.4970676 | 0.074 | -0.6631996 | 0.023 |

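Ego and alter effects have identical estimates here (and nearly identical p-values). This is attributable to the undirectedness of the network: since the tie matrix is symmetric and netlogit uses every dyad in both directions, the ego and alter matrices carry the same information. Models 2 and 3 are therefore reduced to the ego effects only (Models 2a and 3a).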
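Why the ego and alter estimates coincide can be checked with a small simulation. With a symmetric tie matrix and all ordered pairs (i, j) and (j, i) in the data, swapping the ego and alter coefficients leaves the likelihood unchanged, so the unique maximum-likelihood estimate assigns both the same value. The sketch below uses base R's glm on hypothetical random data as a stand-in for netlogit.

```r
set.seed(3)
n <- 25
# Hypothetical attribute codes (e.g. 1 = military, 2 = civilian)
x <- sample(rep(1:2, length.out = n))

# Hypothetical symmetric ties (undirected network) without self-ties
y <- matrix(rbinom(n * n, 1, 0.3), n, n)
y[lower.tri(y)] <- t(y)[lower.tri(y)]
diag(y) <- 0

ego   <- matrix(x, n, n)   # [i, j] = x[i]
alter <- t(ego)            # [i, j] = x[j]
off   <- row(y) != col(y)  # all ordered pairs i != j, as in netlogit's default digraph mode

fit <- glm(y[off] ~ ego[off] + alter[off], family = binomial)
coef(fit)
```

The ego and alter coefficients agree up to floating-point error, which mirrors the regionmat/regionmatT and militarymat/militarymatT rows in the table.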

matrixlist2a <- list(militarymat, distanceYear, regionmat)
matrixlist3a <- list(sameMilit, sameRegion, distanceYear, regionmat, militarymat)

results2a <- sna::netlogit(network_ties, matrixlist2a)
# summary(results2a)

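results3a <- sna::netlogit(network_ties, matrixlist3a)
# In the output, x1-x5 follow the order of matrixlist3a:
# x1 = sameMilit, x2 = sameRegion, x3 = distanceYear, x4 = regionmat, x5 = militarymat
summary(results3a)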
## 
## Network Logit Model
## 
## Coefficients:
##             Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept)  0.5323624 1.70295063 0.788   0.212   0.412    
## x1           0.6492038 1.91401622 0.997   0.003   0.004    
## x2           0.7328251 2.08095118 1.000   0.000   0.000    
## x3          -4.4340427 0.01186642 0.000   1.000   0.000    
## x4          -0.3860645 0.67972671 0.008   0.992   0.013    
## x5          -0.6934540 0.49984661 0.004   0.996   0.004    
## 
## Goodness of Fit Statistics:
## 
## Null deviance: 1649.69 on 1190 degrees of freedom
## Residual deviance: 1048.442 on 1184 degrees of freedom
## Chi-Squared test of fit improvement:
##   601.2482 on 6 degrees of freedom, p-value 0 
## AIC: 1060.442    BIC: 1090.932 
## Pseudo-R^2 Measures:
##  (Dn-Dr)/(Dn-Dr+dfn): 0.3356588 
##  (Dn-Dr)/Dn: 0.3644613 
## Contingency Table (predicted (rows) x actual (cols)):
## 
##          Actual
## Predicted     0     1
##         0   952   226
##         1     4     8
## 
##  Total Fraction Correct: 0.8067227 
##  Fraction Predicted 1s Correct: 0.6666667 
##  Fraction Predicted 0s Correct: 0.8081494 
##  False Negative Rate: 0.965812 
##  False Positive Rate: 0.0041841 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Distribution Summary:
## 
##        (intercept)        x1        x2        x3        x4        x5
## Min      -4.945555 -4.162289 -3.570691 -4.624606 -4.190369 -5.138946
## 1stQ     -0.981142 -1.061858 -1.020036 -1.318859 -1.071289 -0.929046
## Median    0.166713 -0.077749 -0.105973 -0.134307 -0.029692 -0.004278
## Mean      0.107210 -0.079452 -0.033513  0.106749 -0.023757 -0.031568
## 3rdQ      1.124053  0.869296  0.855048  1.287952  0.985075  0.915415
## Max       4.289912  4.988686  4.038852  5.706692  4.110261  3.591326

Overview of results (II)

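| Variable | Estimate (M1) | p-value (M1) | Estimate (M2a) | p-value (M2a) | Estimate (M3a) | p-value (M3a) |
|---|---|---|---|---|---|---|
| (intercept) | -1.4628191 | 0.000 | 0.7747908 | 0.188 | 0.5323624 | 0.375 |
| sameMilit | 0.4745965 | 0.069 | / | / | 0.6492038 | 0.007 |
| sameRegion | 0.6478313 | 0.003 | / | / | 0.7328251 | 0.001 |
| distanceYear | -3.2905480 | 0.007 | -4.9655892 | 0.000 | -4.4340427 | 0.000 |
| regionmat | / | / | -0.3180530 | 0.027 | -0.3860645 | 0.012 |
| militarymat | / | / | -0.4746983 | 0.061 | -0.6934540 | 0.007 |

Note: the Model 3a p-values in this table differ slightly from the summary(results3a) output above (e.g. 0.375 vs. 0.412 for the intercept), whereas the estimates are identical. QAP p-values are computed from random permutations and therefore change between runs unless a seed is set with set.seed() before calling netlogit.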

Interpretation

In Model 1, sharing the military/civilian attribute is not significant (p = 0.069 > 0.05), whereas originating from the same region and entering government at close points in time are significant and positive. The latter shows up as a negative estimate for distanceYear: the greater the temporal distance between two actors, the lower the probability of a tie.

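In Model 2, distance/proximity is again significant. The ego effects of the region and the military/civilian attribute are both negative; however, only the region ego effect is significant at the 5% level (p = 0.045 in Model 2 and 0.027 in Model 2a, versus 0.072 and 0.061 for the military/civilian ego effect). Since regionmat and militarymat contain the raw attribute codes (military = 1, civilian = 2; North = 1, Centre = 2, South = 3), a negative ego estimate means that actors with higher codes, i.e. civilians and actors from more southern regions, tend to have fewer ties, other things being equal. For region, this treats a nominal variable as if it were ordered.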

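In Model 3, the intercept is not significant, unlike in Model 1 and the full Model 2 (in the reduced Model 2a, it is not significant either). All variables are significant, with the same sign as in Model 1 or 2 respectively and with a larger absolute estimate, except for distanceYear, whose magnitude lies between its values in Models 1 and 2. Notably, the simultaneous use of same[Attribute] and ego effects increases the homophily estimates (sameMilit from 0.47 in Model 1 to 0.72 in Model 3 and 0.65 in Model 3a; sameRegion from 0.65 to 0.78 and 0.73).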
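Logit estimates are easier to read as odds ratios. The Exp(b) column of the summary(results3a) output already contains them; the snippet below only makes the conversion explicit for the Model 3a estimates copied from that output. For example, sharing the military/civilian status multiplies the odds of a tie by roughly 1.9, other things being equal. Since seniority scores lie between 0 and 1, a one-unit change in distanceYear is larger than any observed distance, so its odds ratio is best read as indicating a steep negative gradient rather than a realistic contrast.

```r
# Estimates of Model 3a, copied from the summary(results3a) output above
est <- c(sameMilit    =  0.6492038,
         sameRegion   =  0.7328251,
         distanceYear = -4.4340427,
         regionmat    = -0.3860645,
         militarymat  = -0.6934540)

# Odds ratio = exp(estimate): multiplicative change in the odds of a tie per unit change
odds_ratio <- exp(est)
round(odds_ratio, 3)
```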

The interpretation of the given data set is clearly limited by a number of factors. First, the network collapses different kinds of ties (political, kinship, friendship, business) into uniform, unweighted ties. Furthermore, information about when the ties were established, especially in combination with the year attribute, would have allowed more insightful results (e.g. by employing a Stochastic Actor-oriented Model).


Conclusion

The analysis has shown that military/civilian homophily is a significant variable for explaining ties between actors in the Mexican power elite only when controlling for the ego effect (though I am unsure about the interpretation of the negative estimates of the latter). This is consistent with background information from the original data source (the dataset's web page, see the Data section), which notes that “[t]he main opposition seems to be situated between civilians and members of the military. After the revolution, the political elite was dominated by the military but gradually the civilians have assumed power.” The latter development, the gradual shift of power to civilians, is unsurprisingly not directly reflected in the network since, as noted, the timing of tie formation is missing. Region homophily has a significant positive explanatory effect on ties (with and without controlling for the ego effect). This also applies to the temporal proximity of two actors' entry into a significant governmental position.