rm(list = ls())
library(ergm)
library(sna)
library(coda)
library(latticeExtra)
library(migraph) #Version 0.8.13
library(sand)
library(RColorBrewer)
data(lazega)
This poster uses an Exponential Random Graph Model (ERGM) to analyze the Lazega network, which represents lawyers in a US law firm on the East Coast. ERGMs aspire to “describe parsimoniously the local selection forces that shape the global structure of a network”. The network used in this study is a subset of the data Emmanuel Lazega collected for his study “The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership”. The undirected network consists of 36 nodes and 115 edges. Each node carries nine attributes: ‘name’, ‘Seniority’, ‘Status’, ‘Gender’, ‘Office’, ‘Years’ (years with the firm), ‘Age’, ‘Practice’, and ‘School’.
summary(lazega)
## IGRAPH 3e8b2bf UN-- 36 115 --
## + attr: name (v/c), Seniority (v/n), Status (v/n), Gender (v/n), Office
## | (v/n), Years (v/n), Age (v/n), Practice (v/n), School (v/n)
It is unclear what exactly an edge refers to: participants were asked three different questions, about their strong co-worker network, their basic advice network, and their friendship network, and the answers do not appear to be disaggregated. This analysis therefore assumes that an edge marks a meaningful connection between two nodes, whether that connection arises from a strong professional relationship, mentorship, or friendship.
This research asks how an individual’s age and the school they attended shape the likelihood of a tie forming between two people working at the law firm.
The main research question is: What structures and attributes predict the network structure of a law firm?
The plot uses three colors to mark the three schools people attended. The size of a node reflects the age of the person: the bigger the node, the older the person.
There are two isolates and a main component.
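A plot along these lines could be produced roughly as follows. This is a minimal sketch rather than the original plotting code: the `Set1` palette, the age-to-size divisor, the layout, and the seed are all assumptions, and `upgrade_graph()` is included only in case the stored object predates the installed igraph version.

```r
library(igraph)
library(RColorBrewer)

data(lazega, package = "sand")
lazega <- upgrade_graph(lazega)  # harmless if the object is already current

# One color per school (School is coded 1, 2, 3); the palette is an assumption
school_pal <- brewer.pal(3, "Set1")
node_col <- school_pal[V(lazega)$School]

# Node size proportional to age; the divisor is an arbitrary scaling choice
node_size <- V(lazega)$Age / 4

set.seed(42)  # arbitrary seed so the layout is reproducible
plot(lazega,
     vertex.color = node_col,
     vertex.size = node_size,
     vertex.label = NA,
     layout = layout_with_fr(lazega))
legend("topleft", legend = paste("School", 1:3), fill = school_pal, bty = "n")

# Count isolates (degree zero) and connected components
n_isolates <- sum(degree(lazega) == 0)
n_components <- components(lazega)$no
```

`n_isolates` and `n_components` give a quick numerical check on what the picture suggests about isolates and the main component.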
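This research uses an ERGM to test four hypotheses: two structural ones (clustering, captured by the gwesp term, and a degree effect, captured by the gwdegree term) and two attribute-related ones (an age effect and homophily by school).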
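The fitting step might look like the sketch below. The model formula is taken from the `Call:` shown in the model summary; everything else is an assumption: the igraph-to-network conversion via an adjacency matrix is one of several possible routes, the object name `lazega.attr` is borrowed from the commented-out code at the end of this document, and the seed is arbitrary. Estimates will vary slightly between runs because the fit is based on MCMC.

```r
library(ergm)  # also attaches the network package

data(lazega, package = "sand")
lazega <- igraph::upgrade_graph(lazega)

# Convert the igraph object to a network object via its adjacency matrix
adj <- igraph::as_adjacency_matrix(lazega, sparse = FALSE)
vertex_attrs <- igraph::as_data_frame(lazega, what = "vertices")

lazeganetwork <- network::as.network(adj, directed = FALSE)
lazeganetwork %v% "Age" <- vertex_attrs$Age
lazeganetwork %v% "School" <- vertex_attrs$School

set.seed(1)  # arbitrary; MCMC-based estimates differ slightly across seeds
lazega.attr <- ergm(
  lazeganetwork ~ edges +
    gwesp(0.8, fixed = TRUE) +
    gwdegree(0.8, fixed = TRUE) +
    nodecov("Age") +
    nodematch("School", diff = TRUE)
)

# MCMC diagnostics, goodness-of-fit, and coefficient table
mcmc.diagnostics(lazega.attr)
lazega.attr.gof <- gof(lazega.attr, GOF = ~ degree + triadcensus + espartners)
lazega.attr.gof
summary(lazega.attr)
```

Leaving `model` out of the `GOF` formula keeps the default, which also reports goodness-of-fit for the model statistics themselves.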
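After running the ERGM, the fitting log reports a convergence test p-value of < 0.0001, that is, the model has converged with 99% confidence.
The two usual MCMC issues, dependence on starting values and autocorrelation within the Markov chain, appear limited in the diagnostics below. The Geweke burn-in diagnostic gives a joint p-value of 0.44, and the autocorrelation, which reaches up to 0.53 at lag 2048, falls below 0.07 for every statistic by lag 8192. The sample statistics do differ significantly from the observed ones (overall p = 0.003), but as the closing note of the output points out, these diagnostics come from the last simulation round before the final estimates were computed.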
## Sample statistics summary:
##
## Iterations = 612352:4894720
## Thinning interval = 2048
## Number of chains = 1
## Sample size per chain = 2092
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## edges 2.8499 23.469 0.51312 0.9312
## gwesp.fixed.0.8 6.7323 53.192 1.16296 2.0703
## gwdeg.fixed.0.8 0.4547 5.327 0.11647 0.2019
## nodecov.Age 270.4082 2189.825 47.87717 86.0190
## nodematch.School.1 0.2772 4.488 0.09812 0.1326
## nodematch.School.2 0.3432 4.251 0.09295 0.1283
## nodematch.School.3 0.2844 5.288 0.11562 0.1576
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## edges -47.00 -12.000 5.000 19.000 44.000
## gwesp.fixed.0.8 -106.03 -27.760 10.000 44.061 102.320
## gwdeg.fixed.0.8 -12.28 -2.513 1.327 4.303 8.604
## nodecov.Age -4342.60 -1156.500 392.000 1795.000 4201.525
## nodematch.School.1 -7.00 -3.000 0.000 3.000 10.000
## nodematch.School.2 -8.00 -3.000 0.000 3.000 9.000
## nodematch.School.3 -9.00 -4.000 0.000 4.000 11.000
##
##
## Are sample statistics significantly different from observed?
## edges gwesp.fixed.0.8 gwdeg.fixed.0.8 nodecov.Age
## diff. 2.849904398 6.732334102 0.45469705 2.704082e+02
## test stat. 3.060333480 3.251889885 2.25207668 3.143588e+00
## P-val. 0.002210907 0.001146404 0.02431743 1.668901e-03
## nodematch.School.1 nodematch.School.2 nodematch.School.3
## diff. 0.27724665 0.34321224 0.28441683
## test stat. 2.09017610 2.67575826 1.80460873
## P-val. 0.03660198 0.00745604 0.07113593
## Overall (Chi^2)
## diff. NA
## test stat. 21.580539268
## P-val. 0.003270433
##
## Sample statistics cross-correlations:
## edges gwesp.fixed.0.8 gwdeg.fixed.0.8 nodecov.Age
## edges 1.0000000 0.9916708 0.9004306 0.9963389
## gwesp.fixed.0.8 0.9916708 1.0000000 0.8523068 0.9855116
## gwdeg.fixed.0.8 0.9004306 0.8523068 1.0000000 0.9048509
## nodecov.Age 0.9963389 0.9855116 0.9048509 1.0000000
## nodematch.School.1 0.5060076 0.4986980 0.4656503 0.5340938
## nodematch.School.2 0.4944579 0.4921203 0.4396115 0.4771588
## nodematch.School.3 0.5490601 0.5501791 0.4823859 0.5296334
## nodematch.School.1 nodematch.School.2 nodematch.School.3
## edges 0.50600765 0.49445790 0.54906013
## gwesp.fixed.0.8 0.49869802 0.49212033 0.55017905
## gwdeg.fixed.0.8 0.46565026 0.43961146 0.48238588
## nodecov.Age 0.53409378 0.47715878 0.52963344
## nodematch.School.1 1.00000000 0.02030184 0.04217687
## nodematch.School.2 0.02030184 1.00000000 0.05228600
## nodematch.School.3 0.04217687 0.05228600 1.00000000
##
## Sample statistics auto-correlation:
## Chain 1
## edges gwesp.fixed.0.8 gwdeg.fixed.0.8 nodecov.Age
## Lag 0 1.00000000 1.00000000 1.00000000 1.00000000
## Lag 2048 0.53403589 0.52009786 0.46675644 0.52678867
## Lag 4096 0.29316794 0.28411461 0.25206402 0.28572589
## Lag 6144 0.14560655 0.13472930 0.13503274 0.13905920
## Lag 8192 0.06859972 0.06296458 0.06489611 0.06649309
## Lag 10240 0.02950423 0.02812623 0.03854327 0.03037485
## nodematch.School.1 nodematch.School.2 nodematch.School.3
## Lag 0 1.000000000 1.00000000 1.00000000
## Lag 2048 0.252586350 0.31119163 0.30006376
## Lag 4096 0.103828941 0.11252911 0.10006358
## Lag 6144 0.033330758 0.06091694 0.01354401
## Lag 8192 0.016920925 0.03963811 -0.01079892
## Lag 10240 -0.008602328 0.04065951 -0.02951285
##
## Sample statistics burn-in diagnostic (Geweke):
## Chain 1
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## edges gwesp.fixed.0.8 gwdeg.fixed.0.8 nodecov.Age
## 0.7395 0.6502 1.4109 0.7514
## nodematch.School.1 nodematch.School.2 nodematch.School.3
## 1.2591 0.8767 0.1251
##
## Individual P-values (lower = worse):
## edges gwesp.fixed.0.8 gwdeg.fixed.0.8 nodecov.Age
## 0.4595941 0.5155815 0.1582829 0.4523986
## nodematch.School.1 nodematch.School.2 nodematch.School.3
## 0.2079788 0.3806570 0.9004144
## Joint P-value (lower = worse): 0.4433411 .
##
## MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).
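The summary table above reports both the standard deviation and a time-series standard error for each statistic, and the two together give a rough effective sample size (ESS), which is one way to quantify how much the autocorrelation costs. The sketch below only re-uses the printed numbers; the data frame and column names are hypothetical, and the relation ESS = (SD / time-series SE)^2 follows from the time-series SE being SD / sqrt(ESS).

```r
n_draws <- 2092  # "Sample size per chain" in the output above

# SD and time-series SE copied from the summary table above
diag_tab <- data.frame(
  term = c("edges", "gwesp.fixed.0.8", "gwdeg.fixed.0.8", "nodecov.Age",
           "nodematch.School.1", "nodematch.School.2", "nodematch.School.3"),
  sd = c(23.469, 53.192, 5.327, 2189.825, 4.488, 4.251, 5.288),
  ts_se = c(0.9312, 2.0703, 0.2019, 86.0190, 0.1326, 0.1283, 0.1576)
)

# Time-series SE = SD / sqrt(ESS), so ESS = (SD / time-series SE)^2
diag_tab$ess <- (diag_tab$sd / diag_tab$ts_se)^2

# Share of the stored draws that count as effectively independent
diag_tab$ess_share <- diag_tab$ess / n_draws

print(diag_tab, digits = 3)
```

Every statistic ends up with an effective sample size between roughly 600 and 1,200 of the 2,092 stored draws, so the chain is autocorrelated but still carries a usable amount of information.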
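The goodness-of-fit is acceptable but not ideal. None of the Monte Carlo p-values below falls under 0.05, yet the model misses some counts noticeably, such as esp3 (17 observed against a simulated mean of 26.62) and degree9 (5 observed against 2.43). This indicates that there is room to improve the model (see the brief discussion below about taking more attributes into account).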
##
## Goodness-of-fit for degree
##
## obs min mean max MC p-value
## degree0 2 0 2.22 7 1.00
## degree1 3 0 2.17 7 0.72
## degree2 2 0 2.28 6 1.00
## degree3 4 0 2.69 6 0.60
## degree4 2 0 3.10 9 0.82
## degree5 4 0 3.36 8 0.88
## degree6 4 0 3.40 8 0.88
## degree7 1 0 3.16 9 0.36
## degree8 1 0 2.66 8 0.44
## degree9 5 0 2.43 7 0.22
## degree10 1 0 2.23 8 0.70
## degree11 1 0 1.67 5 1.00
## degree12 2 0 1.36 6 0.74
## degree13 3 0 0.93 4 0.16
## degree14 0 0 0.91 4 0.80
## degree15 1 0 0.46 3 0.76
## degree16 0 0 0.33 3 1.00
## degree17 0 0 0.22 2 1.00
## degree18 0 0 0.12 1 1.00
## degree19 0 0 0.09 1 1.00
## degree20 0 0 0.06 1 1.00
## degree21 0 0 0.08 2 1.00
## degree22 0 0 0.03 1 1.00
## degree23 0 0 0.02 1 1.00
## degree25 0 0 0.02 1 1.00
##
## Goodness-of-fit for triad census
##
## obs min mean max MC p-value
## triadcensus.0 4036 2727 3985.10 4889 0.92
## triadcensus.1 2418 1888 2425.12 3000 0.92
## triadcensus.2 566 271 606.20 1153 0.90
## triadcensus.3 120 48 123.58 260 0.96
##
## Goodness-of-fit for edgewise shared partner
##
## obs min mean max MC p-value
## esp0 5 0 4.61 11 0.98
## esp1 16 2 15.85 31 1.00
## esp2 29 14 26.73 41 0.80
## esp3 17 14 26.62 44 0.14
## esp4 23 6 20.14 37 0.70
## esp5 11 1 12.20 29 0.94
## esp6 10 0 6.36 29 0.46
## esp7 4 0 2.89 16 0.60
## esp8 0 0 1.42 7 0.70
## esp9 0 0 0.65 7 1.00
## esp10 0 0 0.27 4 1.00
## esp11 0 0 0.11 2 1.00
## esp12 0 0 0.02 1 1.00
## esp13 0 0 0.02 1 1.00
##
## Goodness-of-fit for model statistics
##
## obs min mean max MC p-value
## edges 115.00000 77.00000 117.89000 179.00000 0.90
## gwesp.fixed.0.8 192.60814 100.07134 198.72708 345.21164 0.94
## gwdeg.fixed.0.8 67.92695 56.05722 68.55178 78.31059 0.82
## nodecov.Age 10526.00000 7260.00000 10809.52000 16563.00000 0.90
## nodematch.School.1 10.00000 2.00000 10.28000 23.00000 1.00
## nodematch.School.2 11.00000 2.00000 10.34000 19.00000 0.80
## nodematch.School.3 15.00000 4.00000 16.14000 29.00000 0.90
## Call:
## ergm(formula = lazeganetwork ~ edges + gwesp(0.8, fixed = T) +
## gwdegree(0.8, fixed = T) + nodecov("Age") + nodematch("School",
## diff = T))
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -2.837923 0.852744 0 -3.328 0.000875 ***
## gwesp.fixed.0.8 1.147451 0.213771 0 5.368 < 1e-04 ***
## gwdeg.fixed.0.8 0.697980 0.628398 0 1.111 0.266685
## nodecov.Age -0.014988 0.006172 0 -2.428 0.015163 *
## nodematch.School.1 0.115268 0.317578 0 0.363 0.716635
## nodematch.School.2 -0.084793 0.322954 0 -0.263 0.792895
## nodematch.School.3 -0.091273 0.272982 0 -0.334 0.738112
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 873.4 on 630 degrees of freedom
## Residual Deviance: 510.6 on 623 degrees of freedom
##
## AIC: 524.6 BIC: 555.7 (Smaller is better. MC Std. Err. = 0.2072)
Structure-related hypotheses:
Hypothesis 1: accepted - The gwesp term is positive and statistically significant (estimate 1.147, p < 0.0001), indicating that there are clusters in the law firm.
Hypothesis 2: rejected - The gwdegree term is not statistically significant (estimate 0.698, p = 0.267), so there is no evidence that a node that is already well connected is more likely to form further ties.
Attribute-related hypotheses:
Hypothesis 3: accepted - The Age term is negative and statistically significant (estimate -0.015, p = 0.015), which shows that the older a person is, the less likely they are to form ties.
Hypothesis 4: rejected - None of the three School terms is statistically significant (p between 0.72 and 0.79), so attending the same university does not predict tie formation and there is no evidence of a homophily effect.
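ERGM coefficients are on the log-odds scale, so exponentiating them makes the hypothesis results easier to read. The sketch below re-uses the estimates and standard errors printed in the model summary. The data frame and variable names are hypothetical, and the 1.96 multiplier assumes a normal approximation for the 95% interval. For `nodecov.Age` the statistic sums the ages of both endpoints of each tie, so the odds ratio applies per additional year of age of either person, conditional on the rest of the network. For `gwesp` it applies per unit change in the gwesp statistic, not per triangle.

```r
# Estimates and standard errors copied from the model summary above
coefs <- data.frame(
  term = c("edges", "gwesp.fixed.0.8", "gwdeg.fixed.0.8", "nodecov.Age",
           "nodematch.School.1", "nodematch.School.2", "nodematch.School.3"),
  estimate = c(-2.837923, 1.147451, 0.697980, -0.014988,
               0.115268, -0.084793, -0.091273),
  se = c(0.852744, 0.213771, 0.628398, 0.006172,
         0.317578, 0.322954, 0.272982)
)

# Conditional odds ratios with approximate 95% confidence intervals
coefs$odds_ratio <- exp(coefs$estimate)
coefs$ci_low <- exp(coefs$estimate - 1.96 * coefs$se)
coefs$ci_high <- exp(coefs$estimate + 1.96 * coefs$se)

# Wald z statistics and two-sided p-values, as in the summary table
coefs$z <- coefs$estimate / coefs$se
coefs$p <- 2 * pnorm(-abs(coefs$z))

# Odds multiplier for a person who is ten years older, all else equal
or_age_10 <- exp(10 * coefs$estimate[coefs$term == "nodecov.Age"])

print(coefs, digits = 3)
or_age_10
```

The gwesp odds ratio comes out at about 3.15, and ten additional years of age multiply the conditional odds of a tie by about 0.86. The confidence intervals for the gwdegree and School terms all include 1, which matches the rejection of hypotheses 2 and 4.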
These findings mean that age correlates negatively with a node’s number of connections and that this effect is significant. One could explain this phenomenon by arguing that the older and more senior a person becomes in the firm, the less interested they are in having connections. New people are perhaps more dependent on forming connections. Nevertheless, this could also mean that the firm suffers from ageism and that older colleagues get ignored by new colleagues, who are probably younger.
Hypotheses 1 and 3 have been accepted, meaning that there are clusters and that older people are less likely to form ties. Hypotheses 2 and 4 have been rejected, demonstrating that existing connections do not lead to more connections and that school is not a predictor of tie formation.
Nevertheless, one would need to run a broader analysis that includes all attributes in order to check whether these findings hold or arose by chance. The code below could be run on a more powerful computer than mine. Additionally, the whole data set could be analyzed instead of just a subset of it. A limitation is that the data was collected through a survey, which carries the risk of biased answers.
However, if H3 is true, one could explain it by new people being more keen to expand their network, while older people are more established in their careers and do not need many connections. Alternatively, one could generate a new hypothesis and study to what extent older people are discriminated against at this law firm, which is a prevalent phenomenon at law firms in the US.
# lazega.attr2 <- ergm(lazeganetwork ~ edges + gwesp(0.8, fixed = TRUE) +
#   gwdegree(0.8, fixed = TRUE) + nodematch('Status') + nodefactor('Gender') +
#   nodematch('Office') + nodecov('Years') + nodecov('Age') +
#   nodematch('Practice') + nodematch('School'))
#
# # Diagnostics
# mcmc.diagnostics(lazega.attr2)
#
# # Goodness-of-Fit (run on the extended model, lazega.attr2)
# lazega.attr2.gof <- gof(lazega.attr2, GOF = ~ degree + triadcensus + espartners - model)
# lazega.attr2.gof
# plot(lazega.attr2.gof)
#
# # Summary
# summary(lazega.attr2)