Statistical Mechanics of Complex Networks - Review
Réka Albert and Albert-László Barabási wrote a seminal paper in 2001 that I have read recently, and am now just getting around to collecting my thoughts:
The authors begin by introducing random graphs - by way of introducing Erdos and Renyi’s work, where graphs are constructed with some number of nodes, and then random edges are distributed throughout the graph. However, intuitively, real world networks are not random - they must have some underlying organization that governs their topology.
They also describe the small world phenomenon, which, despite the large scale of real networks, there are usually small paths between nodes (Milgram’s 1967 experiment) - which we briefly referenced in class. This does not necessarily differentiate random graphs and real world graphs, because paths between any given pair of nodes scales logarithmically with the size of the random graph.
However, a true distinguishing factor is the amount of clustering, (which I’ve sometimes seen referred to as triangles), quantified by clustering coefficient (Watts and Strogatz). This differentiation is due to the fact that clustering coefficients tend to be higher in real world graphs.
The authors also introduce the power law degree distribution. I had heard of this before in reading other papers, but this finally made me go research this phenomenon from the original source, where it was discovered using the WWW (also by Albert and Barabasi, such as in [1]) that real-world graphs obey a power law distribution in the number of connections between nodes on the network. As I understand the concept, some few number of nodes exhibit very high connectivity (called scale-free) while the majority of nodes have low degree (of connectivity). This the distribution over the entire graph follows a power-law.
Now for an admission - I largely skipped Section II of the paper, because I have already seen large networks in my work, where we routinely use large graphs like IMDB and DBLP and so forth, so I was already sold on their motivation for figuring out the common underlying parameters for a unified statistical model for such networks. That said, Tables I and II inspired me to analyse graphs a little more rigorously - in that I think they provide metrics useful in characterizing the topology of new graphical data, and I plan on incorporating them into my future methodology when looking at new sets of data.
I have already read about Erdos and Renyi’s algorithm for random graph generation, primarily because I was thinking about generating graphs in a recent project, so I was already researching tools that generated graphs. Since random graphs have been studied since the 50’s or so, and the Erdos-Renyi algorithm has been incorprated in most of the tools I was planning to use, it was natural for me to look into it at the time. Briefly, they start with a set of nodes and randomly connect them with edges based on some probability.
One thing that jumped out at me from section III, was the discussion of graph spectra. While relatively terse, it did convey that complex networks can be measured in terms of the eigenvalues of their adjacency matrices. I am not sure of the relevance of the semi-circular distribution of the densities however.
I found the percolation discussion too much of a distraction. From what I understood, the graph could be converted into a percolation, and then additional characteristics of the graph, such as the structure of clusters, could be discovered. I found that a poor segue into the discussion of generalized graphs, which were more palatable, in that they were simply random graphs with limitations imposed on the randomized placement of edges, such that the final graph obeyed the power log degree distribution. I was also surprised to see Newman et al’s “one-mode” bipartite representation illustrated here, where non-homogeneous nodes in the graph form bipartite graphs, and then the graph is transformed into a homogeneous graph. The example shown involved using movies and actors, where co-stars were connected to movies but not to one another in the bipartite representation, but were connected (by virtue of their shared association with a movie) in the unipartite “one-mode” transformation.
Since the above merely restricts by power law distribution, the next section describes attempts to add another parameter to the mix - the clustering coefficient (as well as other clustering characteristics, like path length and spectral properties). I skipped this section and that of the properties of scale free networks, because I was more interested in the section on error tolerance and attacks (removing nodes from the graph, and the subsequent effects on topology).
In my skimming through, I did not miss the fact that, at some gamma, the network suddenly forms a super cluster. What is most striking about this section, is that strategic removal of nodes can actually break large clusters into disjoint groups. When translated to real-world networks, this could mean that certain proteins could be disrupted in a disease, or a terrorist network could be thwarted (sadly only in a static network). This is tempered as well by the note that scale-free networks are resilient to random node removal, so such removal must be strategic in targeting highly connected (large degree) nodes.
[1] Barabási, Albert-László and Albert, Réka. “Emergence of scaling in random networks“. Science, 286:509-512, October 15, 1999.