The Large Scale Network Analysis Project

Posted by Brandon Klein

We live in exciting times. Never before have we had the opportunity to mine such a wealth of information on the complex behavior of human societies. Within the last decade, new methods of quantifying people's interactions have resulted in behavioral datasets, primarily concerning email and other online communities, many orders of magnitude larger than anything previously possible. Furthermore, in the next decade these datasets will no longer be limited to human behavior occurring online. With the ubiquity of mobile phones, credit cards, RFIDs, and a growing suite of additional tracking technologies, it is rapidly becoming possible to quantify detailed dynamics of large-scale complex social systems.

While estimates differ, most agree that at this very moment there are over 2.4 billion people carrying a mobile telephone. In databases distributed throughout the world, mobile phone service providers are storing behavioral and social network data for one out of three people on Earth. While it is important not to understate the privacy implications associated with commercial companies recording a time series of locations and communication events for billions of people, the analysis of this data will have far-reaching implications for a variety of industries and academic disciplines. Although the social sciences have become adept at analyzing sparse datasets involving discrete observations over relatively short periods of time, the field is not prepared to deal with continuous behavioral data from thousands - and soon millions - of people over the course of their lifetimes. Current analytical tools simply won't scale.

The solution to this problem is inherently trans-disciplinary. Dealing with the quantity of continuous human behavioral data that will be available in the 21st century will require drawing on a range of fields, from traditional social network analysis to machine learning and statistical mechanics. In our own work on these new datasets, we have found that the relationship between two people can be inferred with startling accuracy using probabilistic classifiers from the field of machine learning. For datasets capturing the behavioral dynamics of hundreds of millions of people, we have been using tools from the burgeoning discipline of complex network analysis. It is our hope as engineers that these new insights into our own behavior will enable the development of applications that better support both the individual and society. Indeed, by increasing our understanding of complex social systems, we can better inform the design of social structures such as organizations, cities, office buildings, and schools, so that they conform to how we in aggregate actually behave rather than how CEOs, architects, or city planners think we do.
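
To make the idea of a probabilistic classifier concrete, here is a minimal sketch in Python. It is not the method used in our work. It is a plain Gaussian naive Bayes model, chosen only because it is short. The two features (fraction of calls made outside working hours, and hours per week spent in proximity at the workplace), the labels, and every number in `TOY_PAIRS` are invented for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (off_hours_call_fraction, weekly_work_proximity_hours), label.
# These values are made up for illustration and come from no real dataset.
TOY_PAIRS = [
    ((0.80, 2.0), "friend"),
    ((0.70, 1.0), "friend"),
    ((0.90, 3.0), "friend"),
    ((0.10, 30.0), "colleague"),
    ((0.20, 25.0), "colleague"),
    ((0.15, 35.0), "colleague"),
]


def fit(samples):
    """Estimate a prior plus per-feature mean and variance for each label."""
    rows_by_label = defaultdict(list)
    for features, label in samples:
        rows_by_label[label].append(features)

    model = {}
    for label, rows in rows_by_label.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-6
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(samples), means, variances)
    return model


def predict_proba(model, features):
    """Return a dict mapping each label to its posterior probability."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(features, means, variances):
            # Log of the Gaussian density, features treated as independent.
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        log_scores[label] = score

    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


model = fit(TOY_PAIRS)
print(predict_proba(model, (0.75, 2.5)))
```

The output is a probability for each relationship type rather than a single hard label, which is what makes this kind of classifier convenient when behavioral evidence is noisy or incomplete.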