Introduction and Motivation

This report analyzes the network data set collected by Gill et al. (2014) on the connections between members of the Provisional Irish Republican Army (PIRA). Their paper analyzes four networks corresponding to four different periods of PIRA's history. This report focuses on the second period (1977-1980), which the authors describe as a period in which PIRA decided it needed to undergo structural change, emphasizing "a return to secrecy and stricter discipline."

The authors do not provide a replication package for their analysis, so the exact specification of their model cannot be recovered. For this reason, the goal of this analysis is not to replicate the original table. Instead, we estimate a simpler model that is (inevitably) specified differently and uses additional ERGM effects (such as isolatededges) to achieve a close fit between the simulated and the observed networks.

Data

The data set contains 260 nodes (PIRA members) and 340 ties (connections between them). The authors also collected node attributes, including gender, education, and age at recruitment. Because the attribute matrix contained a number of missing values, missing entries were imputed with the most common value (the mode) of each attribute.
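
The imputation step can be sketched as follows. This is a minimal Python illustration, assuming the attributes are stored as a list of records with None marking a missing value; the column names mirror the term names in the results table (gender, university, rec_age), and the toy rows are hypothetical, not the PIRA data.

```python
from collections import Counter


def impute_mode(values):
    """Replace None entries with the most common non-missing value."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    # most_common breaks ties by first occurrence; the report does not say how ties were handled
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def impute_table(rows, columns):
    """Apply mode imputation column by column to a list of dict records."""
    filled = {c: impute_mode([r.get(c) for r in rows]) for c in columns}
    return [{**r, **{c: filled[c][i] for c in columns}} for i, r in enumerate(rows)]


# Hypothetical toy records (not the real attribute matrix)
members = [
    {"gender": 0, "university": 0, "rec_age": 19},
    {"gender": None, "university": 0, "rec_age": 22},
    {"gender": 0, "university": None, "rec_age": None},
    {"gender": 1, "university": 1, "rec_age": 19},
]
cleaned = impute_table(members, ["gender", "university", "rec_age"])
print(cleaned)
```

Mode imputation keeps every node in the model but can understate variation in attributes such as age at recruitment, which is worth bearing in mind when reading the rec_age estimates.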

Theoretical Framework

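The main focus of this analysis is the role of homophily in the formation of connections between PIRA members. To this end, we focus on the demographic characteristics of nodes (gender, education, age at recruitment). Standard network theory suggests that similar nodes are more likely to form connections with each other than nodes that do not share similar attributes. As such, we test the following homophily-related hypotheses:

H1: PIRA members of the same gender are more likely to share a tie.
H2: PIRA members with the same education level are more likely to share a tie.
H3: PIRA members recruited at the same age are more likely to share a tie.

Furthermore, we also test hypotheses that deal with structural factors:

H4: Triad closure increases the probability of a tie.
H5: Preferential attachment increases the probability of a tie.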

Visualizing the Network

We can visualize the network to get a sense of how it is organized, and at the same time examine how nodes are distributed by gender, education level, and age at recruitment.

Modelling

To test these hypotheses, this report uses exponential family random graph models (ERGMs). ERGMs let us test whether dyadic covariates (matching or similarity of attributes) and structural properties of the network (degree distribution, triad closure) help explain the observed ties and the overall structure of the network.
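
For reference, the general ERGM form is written below in standard notation (not taken from Gill et al.): g(y) is the vector of network statistics in the model (edges, nodematch, gwesp, and so on) and θ is the vector of coefficients. The second line, where y⁺ᵢⱼ and y⁻ᵢⱼ denote the network with the tie between i and j forced present or absent, is why each coefficient is read as the change in the conditional log-odds of a tie per unit change in its statistic.

```latex
\begin{aligned}
P_\theta(Y = y) &= \frac{\exp\{\theta^\top g(y)\}}{\kappa(\theta)},
  \qquad \kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^\top g(y')\}, \\
\operatorname{logit} P\bigl(Y_{ij} = 1 \mid Y^{c}_{ij} = y^{c}_{ij}\bigr)
  &= \theta^\top \bigl[\, g(y^{+}_{ij}) - g(y^{-}_{ij}) \,\bigr].
\end{aligned}
```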

We begin with a naïve ERGM whose only term is the edge count of the network.

Naive Model
                  Naive Model
edges             -4.59***
                  (0.05)
AIC               3803.44
BIC               3811.86
Log Likelihood    -1900.72
***p < 0.01; **p < 0.05; *p < 0.1
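
Because an edges-only ERGM treats every dyad as an independent Bernoulli trial, its estimates can be reproduced by hand from the node and tie counts alone. The Python sketch below does that arithmetic, assuming an undirected network with 260 × 259 / 2 = 33,670 dyads and using the number of dyads as the sample size for BIC (the choice that reproduces the table above).

```python
import math

n_nodes = 260
n_edges = 340
n_dyads = n_nodes * (n_nodes - 1) // 2  # undirected dyads: 33,670

density = n_edges / n_dyads
theta_edges = math.log(density / (1 - density))  # MLE of the edges term = logit(density)

# Bernoulli log-likelihood evaluated at the MLE
log_lik = n_edges * math.log(density) + (n_dyads - n_edges) * math.log(1 - density)

k = 1  # number of parameters (edges only)
aic = 2 * k - 2 * log_lik
bic = k * math.log(n_dyads) - 2 * log_lik  # sample size taken as the number of dyads

# Standard error of a binomial logit from its Fisher information
se = math.sqrt(1 / (n_dyads * density * (1 - density)))

print(f"edges = {theta_edges:.2f} ({se:.2f})")  # edges = -4.59 (0.05)
print(f"AIC = {aic:.2f}, BIC = {bic:.2f}, logLik = {log_lik:.2f}")
# AIC = 3803.44, BIC = 3811.86, logLik = -1900.72
```

The match confirms that the naive model simply encodes the network's density of about 1%.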

Unsurprisingly, the edges coefficient is highly significant; its large negative value reflects how sparse the network is. To check how well this model reproduces the observed network, we run goodness-of-fit diagnostics.

Clearly, this model is not a good fit: it both overestimates and underestimates the log-odds over parts of the degree, edge-wise shared partner, and triad census distributions.
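
One way to see where the naive model goes wrong on degree is to compute the degree distribution it implies analytically: under an edges-only model each node's degree is Binomial(259, p) with p = 340/33,670. This Python sketch is a back-of-the-envelope check, not the simulation-based gof() output from the R ergm package.

```python
import math

n_nodes, n_edges = 260, 340
p = n_edges / (n_nodes * (n_nodes - 1) / 2)  # fitted tie probability of the edges-only model


def expected_degree_count(k):
    """Expected number of nodes with degree k when each degree is Binomial(n - 1, p)."""
    return n_nodes * math.comb(n_nodes - 1, k) * p**k * (1 - p) ** (n_nodes - 1 - k)


for k in range(6):
    print(k, round(expected_degree_count(k), 1))
```

The model expects roughly 19 isolates, whereas the observed network has none, which is one concrete source of the misfit at the low end of the degree distribution.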

We now attempt to improve these goodness-of-fit measures by estimating a more complex model with additional effects: gwdegree and gwesp, plus nodematch, nodefactor, and nodecov terms built from gender, education, and age at recruitment.
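
To make the attribute terms concrete, the sketch below computes the statistics that nodematch, nodefactor, and nodecov contribute, following their definitions in the R ergm package (whose naming conventions the results table uses). The five-node network and its attribute values are hypothetical; in the actual models these counts are computed by ergm itself.

```python
def nodematch(edges, attr):
    """Number of ties whose two endpoints share the same attribute value."""
    return sum(1 for i, j in edges if attr[i] == attr[j])


def nodefactor(edges, attr, level):
    """Number of tie endpoints at `level`; a tie between two such nodes counts twice."""
    return sum((attr[i] == level) + (attr[j] == level) for i, j in edges)


def nodecov(edges, attr):
    """Sum of a numeric attribute over both endpoints of every tie."""
    return sum(attr[i] + attr[j] for i, j in edges)


# Hypothetical toy network: five members, gender coded 0/1, age at recruitment
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
gender = {0: 1, 1: 1, 2: 0, 3: 0, 4: 1}
rec_age = {0: 18, 1: 20, 2: 19, 3: 25, 4: 22}

stats = {
    "edges": len(edges),
    "nodematch.gender": nodematch(edges, gender),
    "nodefactor.gender.1": nodefactor(edges, gender, 1),
    "nodecov.rec_age": nodecov(edges, rec_age),
}
print(stats)
```

Each coefficient in the model multiplies one of these counts, so a positive nodematch coefficient raises the probability of networks with more same-attribute ties.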

Every metric improves immediately. However, the model still overestimates and underestimates parts of the three distributions mentioned above.

The network plot shows no nodes without ties (isolates) but many nodes with exactly one tie. To account for this, we add the isolatededges term, which counts ties whose two endpoints have no other ties.
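
The distinction between isolates and isolated edges can be made concrete with a small sketch. The helper functions below are hypothetical and written only for illustration; isolated_edges follows the definition of ergm's isolatededges term (ties whose endpoints both have degree 1).

```python
def degrees(nodes, edges):
    """Degree of every node in an undirected edge list (isolates get 0)."""
    deg = {v: 0 for v in nodes}
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def isolates(nodes, edges):
    """Number of nodes with no ties."""
    return sum(1 for d in degrees(nodes, edges).values() if d == 0)


def isolated_edges(nodes, edges):
    """Number of ties whose two endpoints both have degree 1 (a pair cut off from the rest)."""
    deg = degrees(nodes, edges)
    return sum(1 for i, j in edges if deg[i] == 1 and deg[j] == 1)


# Hypothetical toy network: a triangle plus two disconnected pairs
nodes = range(7)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (5, 6)]
print(isolates(nodes, edges), isolated_edges(nodes, edges))  # 0 2
```

A network like this one has no isolates yet several degree-1 nodes, which is the pattern the isolatededges term is meant to capture.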

In the simulated networks, the log-odds of an edge decline too quickly as the number of edge-wise shared partners rises. To correct this, we increase the decay parameter of gwesp() from 0.25 to 0.4.
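
To see what raising the decay parameter does, the sketch below computes the edge-wise shared partner (ESP) distribution and the fixed-decay gwesp statistic, assuming the standard geometrically weighted form used by ergm's gwesp term, in which the count of edges with k shared partners receives weight e^α[1 - (1 - e^(-α))^k]. The weight is 1 at k = 1 and levels off at e^α, so a larger decay (0.4 rather than 0.25) lets edges with many shared partners keep contributing more. The function names are hypothetical.

```python
import math


def gw_weight(k, decay):
    """Geometric weight for a count at value k: exp(decay) * (1 - (1 - exp(-decay))**k)."""
    return math.exp(decay) * (1 - (1 - math.exp(-decay)) ** k)


def esp_counts(edges):
    """Map k -> number of ties whose endpoints share exactly k partners."""
    nbrs = {}
    for i, j in edges:
        nbrs.setdefault(i, set()).add(j)
        nbrs.setdefault(j, set()).add(i)
    counts = {}
    for i, j in edges:
        k = len(nbrs[i] & nbrs[j])
        counts[k] = counts.get(k, 0) + 1
    return counts


def gwesp(edges, decay):
    """Fixed-decay gwesp: weighted sum of ESP counts for k >= 1."""
    return sum(n * gw_weight(k, decay) for k, n in esp_counts(edges).items() if k >= 1)


for decay in (0.25, 0.4):
    print(decay, [round(gw_weight(k, decay), 3) for k in range(1, 6)])

# A four-clique: every tie has two shared partners
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(round(gwesp(k4, 0.25), 3), round(gwesp(k4, 0.4), 3))
```

With decay 0.25 the weights flatten out almost immediately, so a second or third shared partner adds little; with 0.4 they keep rising a little longer, which is what the adjusted model needed.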

After these changes, the goodness-of-fit diagnostics are as follows:

Again, we see substantial improvement across the board. We could not get the log-odds of a node having degree 0 to track the observed data, but the model now comes closer. The edge-wise shared partner and triad census distributions are also closer to the observed data than in the previous model.

MCMC Diagnostics

The MCMC diagnostics are broadly acceptable. The autocorrelations of the sampled statistics stay close to 0, and the differences between observed and simulated values are roughly normally distributed around 0, with nodematch.gender and nodefactor.gender.1 as exceptions.
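
The autocorrelation check can be illustrated with a short sketch of the lag-k sample autocorrelation, the quantity that mcmc.diagnostics() in the R ergm package summarizes for each sampled statistic. The two chains below are synthetic (independent draws versus a strongly autocorrelated AR(1) series) and are not output from the fitted model; they only show what good and poor mixing look like.

```python
import random


def autocorrelation(chain, lag):
    """Sample autocorrelation of a sequence of sampled statistics at a given lag."""
    n = len(chain)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(chain) - 1")
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain)
    if var == 0:
        raise ValueError("chain is constant; autocorrelation is undefined")
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return cov / var


rng = random.Random(42)
iid = [rng.gauss(0, 1) for _ in range(5000)]  # well-mixing: draws are independent
ar1 = [0.0]
for _ in range(4999):
    ar1.append(0.9 * ar1[-1] + rng.gauss(0, 1))  # poorly mixing: each draw depends on the last

print(round(autocorrelation(iid, 1), 3), round(autocorrelation(ar1, 1), 3))
```

Values near 0, as in the first chain, indicate that successive sampled networks are close to independent; values near 1 would call for a longer interval between samples.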

Results

ERGM results table
                         Naive Model   Complex Model   Adjusted Complex Model
edges                    -4.59***      -7.55***        -7.57***
                         (0.05)        (0.65)          (0.62)
gwdeg.fixed.0.25                       2.21***         2.63***
                                       (0.26)          (0.32)
gwesp.fixed.0.25                       2.52***
                                       (0.11)
nodematch.gender                       0.64**          0.59**
                                       (0.25)          (0.25)
nodefactor.gender.1                    0.59***         0.53***
                                       (0.20)          (0.19)
nodematch.university                   -0.25           -0.26
                                       (0.53)          (0.51)
nodefactor.university.1                -0.21           -0.23
                                       (0.51)          (0.48)
nodematch.rec_age                      0.05            0.04
                                       (0.22)          (0.21)
nodecov.rec_age                        0.01            0.00
                                       (0.00)          (0.00)
gwesp.fixed.0.4                                        2.61***
                                                       (0.12)
isolatededges                                          1.60***
                                                       (0.25)
AIC                      3803.44       3109.99         2978.52
BIC                      3811.86       3185.81         3062.76
Log Likelihood           -1900.72      -1545.99        -1479.26
***p < 0.01; **p < 0.05; *p < 0.1

Interpretation

Homophily hypotheses:

The coefficient for nodematch.gender is positive and significant at the 5% level, which supports H1: PIRA members of the same gender appear more likely to share a tie. The coefficient for nodefactor.gender.1 is also positive and significant, this time at the 1% level, indicating that female members tend to form more ties.
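
Because ERGM coefficients are conditional log-odds, exponentiating them gives a more intuitive scale. The sketch below converts the two gender coefficients from the Adjusted Complex Model column into odds ratios with Wald 95% intervals (estimate ± 1.96 standard errors); since the table reports rounded values, the results are approximate.

```python
import math

# Coefficients and standard errors from the Adjusted Complex Model column (rounded)
estimates = {
    "nodematch.gender": (0.59, 0.25),
    "nodefactor.gender.1": (0.53, 0.19),
}


def odds_ratio_ci(coef, se, z=1.96):
    """Conditional odds ratio exp(coef) with a Wald confidence interval."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


for term, (coef, se) in estimates.items():
    orr, lo, hi = odds_ratio_ci(coef, se)
    print(f"{term}: OR = {orr:.2f} [{lo:.2f}, {hi:.2f}]")
```

On this scale, a same-gender tie has roughly 1.8 times the conditional odds of a mixed-gender tie, holding the rest of the network fixed; both intervals stay above 1, consistent with the significance stars in the table.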

The coefficients for nodematch.university and nodematch.rec_age are not significant, so the data do not support H2 or H3: sharing the same education level, or having been recruited at the same age, does not appear to make two members more likely to share a tie. Likewise, the main effects of education (nodefactor.university.1) and age at recruitment (nodecov.rec_age) are not statistically significant predictors of ties.

Structural hypotheses:

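The positive and significant coefficient for gwesp indicates that triad closure increases the probability of a tie, providing evidence in favor of H4. The coefficient for gwdegree is also positive and significant, which could be read as evidence for preferential attachment (H5). That reading needs caution, however: because the gwdegree change statistic shrinks as the endpoints' degrees grow, a positive coefficient is more commonly interpreted as ties spreading toward low-degree members, which would cut against H5 rather than support it.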
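
The sign of the gwdegree coefficient is easy to misread, so it helps to look at what the term rewards. This Python sketch assumes the standard fixed-decay gwdegree definition used by the R ergm package and computes both the statistic and its change statistic for adding one tie. For a tie between nodes of current degree d_i and d_j, the change works out to (1 - e^(-α))^(d_i) + (1 - e^(-α))^(d_j), which falls as either degree grows.

```python
import math


def gwdegree(degrees, decay):
    """gwdegree statistic: sum over nodes of exp(decay) * (1 - (1 - exp(-decay))**degree)."""
    r = 1 - math.exp(-decay)
    return sum(math.exp(decay) * (1 - r**d) for d in degrees)


def gwdegree_change(d_i, d_j, decay):
    """Change in gwdegree from adding one tie between nodes of current degree d_i and d_j."""
    r = 1 - math.exp(-decay)
    return r**d_i + r**d_j


# With decay 0.25 (the value in the results table), the change shrinks quickly with degree
for d in range(5):
    print(d, round(gwdegree_change(d, d, 0.25), 4))
```

Multiplied by a positive coefficient, this change adds the most to a tie's log-odds when the tie involves low-degree members, and almost nothing when it joins two hubs.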

Conclusions and limitations

Some of the estimated coefficients differ markedly from those in the original paper. In particular, the structural terms are significant in our analysis but not in theirs, while age at recruitment is significant in their analysis but not in ours. These discrepancies could stem from a number of factors, from advances in ERGM estimation to differences in model specification.

Our approach also has limitations. The goodness-of-fit diagnostics are not perfect (in particular, the model still does not reproduce the observed degree-0 count), so the estimates should be interpreted with caution. Furthermore, this analysis does not include all of the effects used in the original paper and may therefore suffer from omitted variable bias.