Review of “The Link Prediction Problem for Social Networks”
This paper[1] discusses the problem of predicting how new edges will form in social networks. This problem is interesting to me, because it can be combined with other techniques to completely solve the counter-terrorism problem. Here’s what we would need:
- A method for discovering covert networks hidden in the larger social network. The larger social network might be call logs for a region, financial transaction records, or border crossings, etc. There is already research being done in automated community discovery in graphs. If that were solved, this problem goes away.
- A method for determining importance of a node to such a discovered community. If we know how information flows in the sub-network, we can determine relative importance, and thereby maximize the effectiveness in efforts to disrupt nefarious plots.
- A method for predicting link occurrence would reveal the presence of a link despite its being hidden covertly through subterfuge. As a result, this would expose previously unknown avenues of investigation.
Taken together, these three can present a means to disrupt terrorist networks, through revealing the size and membership of terrorist cells, how they are organized (such as who is in charge and how they acquire necessary resources), and how to maximize countermeasures. If we can make these inferences with any reasonable degree of accuracy, and we can determine the information flow in the network, we can also inject sensors into the network, thereby making them more likely to be incorporated as replacements for nodes removed. Doing so effectively neutralizes a large portion (if not all) of the organization.
The problem is, forecasting the future is hard. This paper in particular talks about some of the challenges that arise when attempting this feat with regard to deciding whether a pair of nodes will likely link to one another. As the authors note, social networks are especially dynamic, with new information constantly updating the network. Consider a typical call graph, and one can see that the scale is both highly dynamic and massive in size.
The authors restrict the link prediction problem in terms of the network topology. In other words, just what can be observed in the network, disregarding out-of-band communications or other factors, like geospatial proximity. As a consequence, Liben-Nowell and Kleinberg attempted to use proximity metrics between pairs of nodes as a means of inferring new link formation.
The paper formalizes the problem beautifully (well worth the read for that alone). To evaluate their prediction accuracy, they use five author co-citation graphs (all subsets of arXiv) snapshotted for three time intervals (per graph). One alarming note is that they remark that the disambiguation problems (they use first initial and last name to identify authors, which is non-unique) in their methodology are minor. I would argue that point, since distinct authors become indistinguishable in their resultant graph.
A large portion of the paper talks about the nine proximity measures they used. I was a bit surprised to see PageRank, as I thought other spectral methods might be more powerful. I was also impressed with their comparisons of the resultant predictions - they compared their predictions for the nine different techniques to a random predictor as well as to each other, in addition to assessing their relative accuracies.
Perhaps most interesting was the revelation that by incorporating more diverse data, (by adding computer science co-citations to the arXiv data sets) they were able to do much better. Further, the authors suggest that adding attribute data (such as keywords of papers or geographic locations) might yield better results.
[1] Liben-Nowell, D. and Kleinberg, J. 2003. The link prediction problem for social networks. In Proceedings of the Twelfth international Conference on information and Knowledge Management (New Orleans, LA, USA, November 03 - 08, 2003). CIKM ‘03. ACM, New York, NY, 556-559.
June 10th, 2008 at 2:16 am
I just discovered this paper:
Hasan, M., Chaoji, V., Salem, S., and Zaki, M. J., (2006). “Link Prediction using Supervised Learning,” Workshop on Link Analysis, Counter-terrorism and Security (with SIAM Data Mining Conference), Bethesda, MD, April 2006.
…which discusses an attempt to use attribute data as I suggested in my review (they beat me to it!). It does occur to me however, that doing so sacrifices generality. Part of the beauty of Liben-Nowell and Kleinberg’s approach is that it can be applied to any social network, without having to do feature selection.