A Social Network Analysis of a US Law Firm

Library and Data

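rm(list = ls())          # start from a clean workspace
library(ergm)            # exponential random graph models (also loads network)
library(sna)             # statnet social network analysis tools
library(coda)            # MCMC diagnostics
library(latticeExtra)    # lattice graphics extensions
library(migraph) #Version 0.8.13
library(sand)            # network data sets, including lazega
library(RColorBrewer)    # color palettes for plotting
data(lazega)             # loads `lazega` as an undirected igraph object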
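The ERGM reported in this poster is fitted to an object called `lazeganetwork`, but its creation is not shown. `data(lazega)` loads an igraph object, and `ergm()` expects a statnet `network` object, so some conversion step is needed. The sketch below assumes the intergraph package (not loaded above) handles that conversion. The author may have used a different route.

```r
# Assumption: intergraph is installed; it is not among the packages loaded above.
library(sand)         # provides the `lazega` igraph object
library(network)
library(intergraph)

data(lazega)

# Convert the undirected igraph object into a statnet `network` object,
# carrying over vertex attributes such as Age and School.
lazeganetwork <- intergraph::asNetwork(lazega)

# Quick checks against the summary: undirected, 36 nodes, 115 edges.
network::is.directed(lazeganetwork)
network::network.size(lazeganetwork)
network::network.edgecount(lazeganetwork)
head(network::get.vertex.attribute(lazeganetwork, "Age"))
```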

Theoretical Framework

This poster uses an Exponential Random Graph Model (ERGM) to analyze the Lazega network, which represents lawyers in a US law firm on the East Coast. ERGMs aspire to “describe parsimoniously the local selection forces that shape the global structure of a network”. The Lazega network used in this study is a subset of the data collected by Emmanuel Lazega for his study “The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership”. The undirected network consists of 36 nodes and 115 edges. It also has nine vertex attributes: ‘name’, ‘Seniority’, ‘Status’, ‘Gender’, ‘Office’, ‘Years’ (years with the firm), ‘Age’, ‘Practice’, and ‘School’.

summary(lazega)
## IGRAPH 3e8b2bf UN-- 36 115 -- 
## + attr: name (v/c), Seniority (v/n), Status (v/n), Gender (v/n), Office
## | (v/n), Years (v/n), Age (v/n), Practice (v/n), School (v/n)
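The figures quoted above can be checked directly on the igraph object, along with the two attributes this study focuses on. The sketch below uses only standard igraph accessors.

```r
library(igraph)
library(sand)   # provides `lazega`
data(lazega)

# Size and type of the network.
n <- igraph::vcount(lazega)          # lawyers (nodes)
m <- igraph::ecount(lazega)          # ties (edges)
igraph::is_directed(lazega)          # FALSE: ties are undirected

# Density: observed ties over the n(n - 1)/2 possible pairs.
possible_pairs <- n * (n - 1) / 2
c(nodes = n, edges = m, pairs = possible_pairs, density = m / possible_pairs)

# Distribution of the two attributes used in the hypotheses.
summary(igraph::V(lazega)$Age)
table(igraph::V(lazega)$School)
```

With 36 lawyers there are 630 possible pairs. This is why the model summary reports the null deviance on 630 degrees of freedom. Since 115 of these pairs are tied, the density is about 0.18.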

It is unclear exactly what the edges represent. Participants were asked three different questions, about their strong co-worker network, their basic advice network, and their friendship network, and the answers do not appear to be disaggregated in this data set. This analysis therefore assumes that an edge denotes a meaningful connection between two lawyers. That connection could arise from a strong professional relationship, a mentorship, or a friendship.

Research Question and Hypotheses

This research examines how two attributes, an individual's age and the school they attended, shape the likelihood of a tie forming between two lawyers at the firm.

The main research question is: What structural tendencies and individual attributes predict tie formation in the network of a law firm?

  • Structural hypotheses:
    • H1: The law firm network exhibits clustering: lawyers who share contacts tend to be connected themselves.
    • H2: The more connected a node is, the more likely it is to form new ties (preferential attachment).
  • Attribute hypotheses:
    • H3: An individual's age influences how likely they are to form ties.
    • H4: People who went to the same school are more likely to form a tie (homophily).

Visualization of the Network

In the network plot, the three colors represent the three schools the lawyers attended. Node size reflects age: the bigger the node, the older the person.

The network consists of one main component and two isolates.
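The plotting code for this figure is not shown. The sketch below reproduces its encoding with igraph: color by School and node size by Age. The palette ("Set2"), the size scaling (Age / 4) and the layout are assumptions chosen for illustration, not the author's settings.

```r
library(igraph)
library(sand)          # provides `lazega`
library(RColorBrewer)
data(lazega)

set.seed(42)  # the layout algorithm is stochastic; fix it for reproducibility

# One color per school (School is coded 1, 2, 3 in the data).
school_pal <- RColorBrewer::brewer.pal(3, "Set2")
node_col   <- school_pal[igraph::V(lazega)$School]

# Node size proportional to age; the divisor is an arbitrary scaling choice.
node_size  <- igraph::V(lazega)$Age / 4

plot(lazega,
     vertex.color = node_col,
     vertex.size  = node_size,
     vertex.label = NA,
     layout       = igraph::layout_with_fr(lazega))
legend("topleft", legend = paste("School", 1:3), fill = school_pal, bty = "n")

# Component structure: one main component plus isolated lawyers.
comps <- igraph::components(lazega)
sort(comps$csize, decreasing = TRUE)
sum(igraph::degree(lazega) == 0)   # number of isolates
```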

Modelling Approach

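This research fits an ERGM to test the four hypotheses, using the following terms:

  • Structural parameters:
    • Density (edges): baseline tendency to form ties
    • Clustering (gwesp, decay α = 0.8): H1
    • Preferential attachment (gwdegree, decay α = 0.8): H2
  • Attribute parameters:
    • Age (nodecov): H3
    • School (nodematch, with a separate term for each of the three schools): H4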
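For reference, the standard ERGM form behind these terms is sketched below. The notation is mine:

- g(y) is the vector of model statistics and θ the coefficients.
- δ_ij(y) is the change in g(y) when the tie between i and j is switched on.
- EP_k(y) counts tied pairs with exactly k shared partners.
- D_k(y) counts nodes of degree k.

The conditional log-odds line is what lets the coefficients be read as log-odds. For nodecov("Age") the change statistic is Age_i + Age_j. For gwdegree it is (1 - e^{-α})^{d_i} + (1 - e^{-α})^{d_j}, which shrinks as degrees grow. A negative gwdegree coefficient therefore pushes ties towards high-degree nodes, while a positive one spreads them out.

```latex
\begin{gathered}
P_\theta(Y = y) = \frac{\exp\{\theta^\top g(y)\}}{\kappa(\theta)}, \qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^\top g(y')\} \\
\operatorname{logit} P(Y_{ij} = 1 \mid Y_{ij}^{c} = y_{ij}^{c}) = \theta^\top \delta_{ij}(y) \\
\mathrm{GWESP}(y;\alpha) = e^{\alpha} \sum_{k=1}^{n-2} \bigl\{1 - (1 - e^{-\alpha})^{k}\bigr\}\, \mathrm{EP}_k(y) \\
\mathrm{GWD}(y;\alpha) = e^{\alpha} \sum_{k=1}^{n-1} \bigl\{1 - (1 - e^{-\alpha})^{k}\bigr\}\, D_k(y)
\end{gathered}
```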

After the ERGM is fitted, ergm's convergence test reports a p-value below 0.0001 and concludes that the model has converged with 99% confidence.
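The model can be fitted with a call like the one below. The formula is taken from the `Call:` line of the model summary, and the object name `lazega.attr` comes from the commented code at the end of this poster. The intergraph conversion and the random seed are assumptions. Because estimation is MCMC-based, a fresh run will give estimates slightly different from those reported.

```r
library(ergm)
library(sand)
library(intergraph)   # assumption: used only to build `lazeganetwork`
data(lazega)
lazeganetwork <- intergraph::asNetwork(lazega)

set.seed(1234)   # MCMC-based estimation: fix the seed so results are reproducible

# Terms mirror the list above: edges (density), gwesp (H1), gwdegree (H2),
# nodecov("Age") (H3) and one nodematch term per school (H4).
lazega.attr <- ergm(lazeganetwork ~ edges +
                      gwesp(0.8, fixed = TRUE) +
                      gwdegree(0.8, fixed = TRUE) +
                      nodecov("Age") +
                      nodematch("School", diff = TRUE))

summary(lazega.attr)
```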

Diagnostics

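MCMC estimation usually raises two concerns: dependence on starting values (insufficient burn-in) and autocorrelation within the Markov chain. Neither appears to be a serious problem here. The Geweke burn-in diagnostic finds no significant difference between the early and late parts of the chain (joint p = 0.44). The autocorrelation of the sampled statistics falls from between 0.25 and 0.53 at the first lag (2048 iterations) to below 0.05 by the fifth lag (10240 iterations).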
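The output below is what ergm's `mcmc.diagnostics()` prints for a fitted model. One way to gauge how much the autocorrelation costs is to convert it into an effective sample size. The sketch below does this by hand for the `edges` statistic, using only figures from that output. The first estimate follows from the two standard errors coda reports. The second is a rough AR(1) approximation, not coda's own estimator.

```r
# Figures copied from the "edges" rows of the diagnostics output below.
n_draws  <- 2092         # sample size per chain
naive_se <- 0.51312      # SE assuming independent draws
ts_se    <- 0.9312       # time-series SE, which accounts for autocorrelation
rho_lag1 <- 0.53403589   # autocorrelation at the first lag (2048 iterations)

# The variance of the mean is inflated by (ts_se / naive_se)^2,
# so the implied effective sample size is n divided by that factor.
ess_from_se <- n_draws / (ts_se / naive_se)^2

# Rough AR(1) approximation: ESS ~ n * (1 - rho) / (1 + rho).
ess_ar1 <- n_draws * (1 - rho_lag1) / (1 + rho_lag1)

round(c(from_se = ess_from_se, ar1_approx = ess_ar1))
```

Both estimates come out at roughly 635 of the 2,092 stored draws. The chain is autocorrelated but still yields several hundred effectively independent samples.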

## Sample statistics summary:
## 
## Iterations = 612352:4894720
## Thinning interval = 2048 
## Number of chains = 1 
## Sample size per chain = 2092 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                        Mean       SD Naive SE Time-series SE
## edges                2.8499   23.469  0.51312         0.9312
## gwesp.fixed.0.8      6.7323   53.192  1.16296         2.0703
## gwdeg.fixed.0.8      0.4547    5.327  0.11647         0.2019
## nodecov.Age        270.4082 2189.825 47.87717        86.0190
## nodematch.School.1   0.2772    4.488  0.09812         0.1326
## nodematch.School.2   0.3432    4.251  0.09295         0.1283
## nodematch.School.3   0.2844    5.288  0.11562         0.1576
## 
## 2. Quantiles for each variable:
## 
##                        2.5%       25%     50%      75%    97.5%
## edges                -47.00   -12.000   5.000   19.000   44.000
## gwesp.fixed.0.8     -106.03   -27.760  10.000   44.061  102.320
## gwdeg.fixed.0.8      -12.28    -2.513   1.327    4.303    8.604
## nodecov.Age        -4342.60 -1156.500 392.000 1795.000 4201.525
## nodematch.School.1    -7.00    -3.000   0.000    3.000   10.000
## nodematch.School.2    -8.00    -3.000   0.000    3.000    9.000
## nodematch.School.3    -9.00    -4.000   0.000    4.000   11.000
## 
## 
## Are sample statistics significantly different from observed?
##                  edges gwesp.fixed.0.8 gwdeg.fixed.0.8  nodecov.Age
## diff.      2.849904398     6.732334102      0.45469705 2.704082e+02
## test stat. 3.060333480     3.251889885      2.25207668 3.143588e+00
## P-val.     0.002210907     0.001146404      0.02431743 1.668901e-03
##            nodematch.School.1 nodematch.School.2 nodematch.School.3
## diff.              0.27724665         0.34321224         0.28441683
## test stat.         2.09017610         2.67575826         1.80460873
## P-val.             0.03660198         0.00745604         0.07113593
##            Overall (Chi^2)
## diff.                   NA
## test stat.    21.580539268
## P-val.         0.003270433
## 
## Sample statistics cross-correlations:
##                        edges gwesp.fixed.0.8 gwdeg.fixed.0.8 nodecov.Age
## edges              1.0000000       0.9916708       0.9004306   0.9963389
## gwesp.fixed.0.8    0.9916708       1.0000000       0.8523068   0.9855116
## gwdeg.fixed.0.8    0.9004306       0.8523068       1.0000000   0.9048509
## nodecov.Age        0.9963389       0.9855116       0.9048509   1.0000000
## nodematch.School.1 0.5060076       0.4986980       0.4656503   0.5340938
## nodematch.School.2 0.4944579       0.4921203       0.4396115   0.4771588
## nodematch.School.3 0.5490601       0.5501791       0.4823859   0.5296334
##                    nodematch.School.1 nodematch.School.2 nodematch.School.3
## edges                      0.50600765         0.49445790         0.54906013
## gwesp.fixed.0.8            0.49869802         0.49212033         0.55017905
## gwdeg.fixed.0.8            0.46565026         0.43961146         0.48238588
## nodecov.Age                0.53409378         0.47715878         0.52963344
## nodematch.School.1         1.00000000         0.02030184         0.04217687
## nodematch.School.2         0.02030184         1.00000000         0.05228600
## nodematch.School.3         0.04217687         0.05228600         1.00000000
## 
## Sample statistics auto-correlation:
## Chain 1 
##                edges gwesp.fixed.0.8 gwdeg.fixed.0.8 nodecov.Age
## Lag 0     1.00000000      1.00000000      1.00000000  1.00000000
## Lag 2048  0.53403589      0.52009786      0.46675644  0.52678867
## Lag 4096  0.29316794      0.28411461      0.25206402  0.28572589
## Lag 6144  0.14560655      0.13472930      0.13503274  0.13905920
## Lag 8192  0.06859972      0.06296458      0.06489611  0.06649309
## Lag 10240 0.02950423      0.02812623      0.03854327  0.03037485
##           nodematch.School.1 nodematch.School.2 nodematch.School.3
## Lag 0            1.000000000         1.00000000         1.00000000
## Lag 2048         0.252586350         0.31119163         0.30006376
## Lag 4096         0.103828941         0.11252911         0.10006358
## Lag 6144         0.033330758         0.06091694         0.01354401
## Lag 8192         0.016920925         0.03963811        -0.01079892
## Lag 10240       -0.008602328         0.04065951        -0.02951285
## 
## Sample statistics burn-in diagnostic (Geweke):
## Chain 1 
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##              edges    gwesp.fixed.0.8    gwdeg.fixed.0.8        nodecov.Age 
##             0.7395             0.6502             1.4109             0.7514 
## nodematch.School.1 nodematch.School.2 nodematch.School.3 
##             1.2591             0.8767             0.1251 
## 
## Individual P-values (lower = worse):
##              edges    gwesp.fixed.0.8    gwdeg.fixed.0.8        nodecov.Age 
##          0.4595941          0.5155815          0.1582829          0.4523986 
## nodematch.School.1 nodematch.School.2 nodematch.School.3 
##          0.2079788          0.3806570          0.9004144 
## Joint P-value (lower = worse):  0.4433411 .

## 
## MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).

Goodness-of-Fit Diagnostics

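The goodness-of-fit is acceptable but not ideal, which indicates that there is room to improve the model (see the brief discussion below about taking more attributes into account). None of the MC p-values falls below 0.1, but the simulated networks vary widely (between 77 and 179 edges, against 115 observed). Some features are also reproduced poorly. For instance, the model predicts on average 26.6 edges with three shared partners where 17 are observed (p = 0.14), and 0.9 nodes of degree 13 where 3 are observed (p = 0.16).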
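The tables below come from ergm's `gof()`, which simulates networks from the fitted model and compares them with the observed network. The self-contained sketch below refits the model and requests the same four sets of statistics. The intergraph conversion and the seed are assumptions. Because the simulations are random, the MC p-values will vary slightly between runs.

```r
library(ergm)
library(sand)
library(intergraph)   # assumption: used only to build the network object
data(lazega)
lazeganetwork <- intergraph::asNetwork(lazega)

set.seed(1234)
lazega.attr <- ergm(lazeganetwork ~ edges + gwesp(0.8, fixed = TRUE) +
                      gwdegree(0.8, fixed = TRUE) + nodecov("Age") +
                      nodematch("School", diff = TRUE))

# Simulate networks from the fitted model and compare them with the observed
# network on degree, triad census, edgewise shared partners and model terms.
lazega.attr.gof <- gof(lazega.attr,
                       GOF = ~ degree + triadcensus + espartners + model)
lazega.attr.gof          # printed tables of observed vs simulated statistics
plot(lazega.attr.gof)    # boxplots of simulated statistics with observed values
```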

## 
## Goodness-of-fit for degree 
## 
##          obs min mean max MC p-value
## degree0    2   0 2.22   7       1.00
## degree1    3   0 2.17   7       0.72
## degree2    2   0 2.28   6       1.00
## degree3    4   0 2.69   6       0.60
## degree4    2   0 3.10   9       0.82
## degree5    4   0 3.36   8       0.88
## degree6    4   0 3.40   8       0.88
## degree7    1   0 3.16   9       0.36
## degree8    1   0 2.66   8       0.44
## degree9    5   0 2.43   7       0.22
## degree10   1   0 2.23   8       0.70
## degree11   1   0 1.67   5       1.00
## degree12   2   0 1.36   6       0.74
## degree13   3   0 0.93   4       0.16
## degree14   0   0 0.91   4       0.80
## degree15   1   0 0.46   3       0.76
## degree16   0   0 0.33   3       1.00
## degree17   0   0 0.22   2       1.00
## degree18   0   0 0.12   1       1.00
## degree19   0   0 0.09   1       1.00
## degree20   0   0 0.06   1       1.00
## degree21   0   0 0.08   2       1.00
## degree22   0   0 0.03   1       1.00
## degree23   0   0 0.02   1       1.00
## degree25   0   0 0.02   1       1.00
## 
## Goodness-of-fit for triad census 
## 
##                obs  min    mean  max MC p-value
## triadcensus.0 4036 2727 3985.10 4889       0.92
## triadcensus.1 2418 1888 2425.12 3000       0.92
## triadcensus.2  566  271  606.20 1153       0.90
## triadcensus.3  120   48  123.58  260       0.96
## 
## Goodness-of-fit for edgewise shared partner 
## 
##       obs min  mean max MC p-value
## esp0    5   0  4.61  11       0.98
## esp1   16   2 15.85  31       1.00
## esp2   29  14 26.73  41       0.80
## esp3   17  14 26.62  44       0.14
## esp4   23   6 20.14  37       0.70
## esp5   11   1 12.20  29       0.94
## esp6   10   0  6.36  29       0.46
## esp7    4   0  2.89  16       0.60
## esp8    0   0  1.42   7       0.70
## esp9    0   0  0.65   7       1.00
## esp10   0   0  0.27   4       1.00
## esp11   0   0  0.11   2       1.00
## esp12   0   0  0.02   1       1.00
## esp13   0   0  0.02   1       1.00
## 
## Goodness-of-fit for model statistics 
## 
##                            obs        min        mean         max MC p-value
## edges                115.00000   77.00000   117.89000   179.00000       0.90
## gwesp.fixed.0.8      192.60814  100.07134   198.72708   345.21164       0.94
## gwdeg.fixed.0.8       67.92695   56.05722    68.55178    78.31059       0.82
## nodecov.Age        10526.00000 7260.00000 10809.52000 16563.00000       0.90
## nodematch.School.1    10.00000    2.00000    10.28000    23.00000       1.00
## nodematch.School.2    11.00000    2.00000    10.34000    19.00000       0.80
## nodematch.School.3    15.00000    4.00000    16.14000    29.00000       0.90

Summary

## Call:
## ergm(formula = lazeganetwork ~ edges + gwesp(0.8, fixed = T) + 
##     gwdegree(0.8, fixed = T) + nodecov("Age") + nodematch("School", 
##     diff = T))
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                     Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges              -2.837923   0.852744      0  -3.328 0.000875 ***
## gwesp.fixed.0.8     1.147451   0.213771      0   5.368  < 1e-04 ***
## gwdeg.fixed.0.8     0.697980   0.628398      0   1.111 0.266685    
## nodecov.Age        -0.014988   0.006172      0  -2.428 0.015163 *  
## nodematch.School.1  0.115268   0.317578      0   0.363 0.716635    
## nodematch.School.2 -0.084793   0.322954      0  -0.263 0.792895    
## nodematch.School.3 -0.091273   0.272982      0  -0.334 0.738112    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 873.4  on 630  degrees of freedom
##  Residual Deviance: 510.6  on 623  degrees of freedom
##  
## AIC: 524.6  BIC: 555.7  (Smaller is better. MC Std. Err. = 0.2072)

  • Structural hypotheses:

    • Hypothesis 1: accepted. The gwesp coefficient is positive (1.147) and statistically significant (p < 0.0001). Ties tend to cluster: two lawyers who share colleagues are more likely to be connected themselves.

    • Hypothesis 2: rejected. The gwdegree coefficient (0.698) is not statistically significant (p = 0.27), so there is no evidence that well-connected lawyers are more likely to form further ties. In ergm, preferential attachment would usually show up as a negative gwdegree coefficient, so the positive point estimate also points away from H2.

  • Attribute hypotheses:

    • Hypothesis 3: accepted. The Age coefficient is negative (-0.015) and statistically significant (p = 0.015). Holding the other terms constant, the older a lawyer is, the less likely they are to form ties.

    • Hypothesis 4: rejected. None of the three school-matching coefficients is statistically significant (all p > 0.7). Attending the same school does not predict tie formation, and there is no evidence of school homophily.
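As a descriptive complement to Hypothesis 1, the observed triad census in the goodness-of-fit output is enough to compute the global clustering coefficient and compare it with the density. Unlike the gwesp term, this raw comparison does not control for the other model terms, so it is only a sanity check.

```r
# Observed triad census from the goodness-of-fit output:
# number of node triples containing 0, 1, 2 and 3 ties.
triads <- c(t0 = 4036, t1 = 2418, t2 = 566, t3 = 120)
n <- 36
n_edges <- 115

# Consistency checks: every triple is counted once, and each edge
# appears in n - 2 triples.
stopifnot(sum(triads) == choose(n, 3))
stopifnot(triads[["t1"]] + 2 * triads[["t2"]] + 3 * triads[["t3"]] == n_edges * (n - 2))

# Global clustering coefficient (transitivity):
# 3 x triangles / connected triples (paths of length two).
# A triple with two ties holds one such path; a triangle holds three.
triangles  <- triads[["t3"]]
two_paths  <- triads[["t2"]] + 3 * triads[["t3"]]
clustering <- 3 * triangles / two_paths

# Density: observed ties over all possible pairs.
density <- n_edges / choose(n, 2)

round(c(clustering = clustering, density = density), 3)
# clustering 0.389, density 0.183
# With igraph loaded, igraph::transitivity(lazega, type = "global")
# should give the same clustering value.
```

The observed clustering is roughly twice the density. Two lawyers who share a contact are about twice as likely to be tied as a randomly chosen pair, in line with the significant gwesp effect.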

These findings mean that age is negatively and significantly associated with a lawyer's likelihood of forming ties. One possible explanation is that the older and more senior lawyers become, the less interested they are in building connections, whereas newcomers may depend more on forming them. Alternatively, the firm may suffer from ageism, with older colleagues being ignored by newer, probably younger, colleagues.
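Because ERGM coefficients are conditional log-odds, exponentiating them gives odds ratios that are easier to read. The sketch below uses the estimates and standard errors printed in the model summary. The Wald intervals (estimate ± 1.96 standard errors) are an approximation. Only the Age and School terms are shown, because their change statistics have a direct interpretation.

```r
# Estimates and standard errors copied from the model summary above.
est <- c(age = -0.014988, school1 = 0.115268, school2 = -0.084793, school3 = -0.091273)
se  <- c(age =  0.006172, school1 = 0.317578, school2 =  0.322954, school3 = 0.272982)

# Odds ratios with approximate (Wald) 95% confidence intervals.
or_table <- cbind(
  OR    = exp(est),
  lower = exp(est - 1.96 * se),
  upper = exp(est + 1.96 * se)
)
round(or_table, 3)

# nodecov("Age") changes a tie's log-odds by coef * (Age_i + Age_j), so one
# lawyer being ten years older multiplies the odds of a given tie by:
exp(10 * est[["age"]])   # about 0.86
```

Ten additional years of age for one lawyer multiply the odds of any given tie by about 0.86. All three school intervals include 1, consistent with the non-significant homophily terms.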

Brief Conclusion and Reflections on Next Steps

Hypotheses 1 and 3 have been accepted: the network is clustered, and older lawyers are less likely to form ties. Hypotheses 2 and 4 have been rejected: existing connections do not lead to further connections, and shared schooling does not predict tie formation.

Nevertheless, a broader analysis that includes all attributes would be needed to check whether these findings are confounded by other attributes or arise by chance. The code below could be run on a more powerful computer than mine. The whole data set could also be analyzed instead of just a subset of it. A further limitation is that the data were collected through a survey, so the answers may be biased.

If H3 holds, one explanation is that newcomers are keener to expand their networks, while older lawyers are more established in their careers and need fewer connections. Alternatively, one could formulate a new hypothesis and study to what extent older people are discriminated against at this law firm, which is a prevalent phenomenon at law firms in the US.

# lazega.attr2 <- ergm(lazeganetwork ~ edges + gwesp(0.8, fixed = TRUE) + gwdegree(0.8, fixed = TRUE) +
#                        nodematch('Status') + nodefactor('Gender') + nodematch('Office') +
#                        nodecov('Years') + nodecov('Age') + nodematch('Practice') + nodematch('School'))
# 
# # Diagnostics
# mcmc.diagnostics(lazega.attr2)
# 
# # Goodness-of-Fit (on the broader model, not the original lazega.attr)
# lazega.attr2.gof <- gof(lazega.attr2, GOF = ~ degree + triadcensus + espartners - model)
# lazega.attr2.gof
# plot(lazega.attr2.gof)
# 
# # Summary
# summary(lazega.attr2)