MLFL Wiki
Causal Estimation In Social Media And Distance Estimation In Social Networks

Abstract: In this talk, I will discuss two recent projects. The first is about causal estimation in social media systems, and the second is about distance estimation for very large networks.

(1) Social media systems generate increasing amounts of data that measure and record online interactions among users. Careful analysis of such data can provide information about cause-and-effect dependence in these systems, which in turn can guide system administration and design. Quasi-experimental designs (QEDs) are commonly used in the social sciences to discover causal knowledge from observational data, and they can be exploited to discover causal knowledge about social media systems. In this talk, we report results from applying three different QEDs to demonstrate how one can gain causal knowledge of a social media system.

(2) Distance estimation is key to many network mining applications, such as centrality and clustering. As available networks grow to millions of nodes and edges, distance calculation becomes a bottleneck for such applications. One way to overcome this bottleneck is to use the MapReduce parallel processing framework, though increasing resources linearly does not scale well for many network mining applications. In this work, we propose a network structure index (NSI) that extends the basic breadth-first search algorithm to accurately estimate shortest distances using MapReduce. We demonstrate the accuracy of our method for estimating the shortest distance between node pairs with NSIs on synthetic and real networks. We use distance estimation along with progressive sampling to address two specific applications for very large networks: closeness centrality and betweenness centrality.
We first evaluate our distance estimation method on relatively small networks, and then report our observations about the most central nodes of a Twitter network with more than 40 million nodes.

Bio: Huseyin Oktay is a Ph.D. candidate in the Department of Computer Science at UMass Amherst, where he is a member of the Knowledge Discovery Laboratory directed by Prof. David Jensen. His research interests include designing and developing automated computational methods to estimate causal effects in complex relational domains, as well as developing scalable methods for large-scale graph mining using the MapReduce parallel processing framework. He has applied basic quasi-experimental designs to social media data for causal estimation. He has also developed efficient and accurate distance estimation techniques for very large graphs using MapReduce.
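
Illustrative sketch: the abstract describes estimating shortest distances with an index built from breadth-first search, and then using sampled distance estimates for closeness centrality. The code below is a minimal single-machine sketch of that general idea, not the speaker's NSI or its MapReduce implementation. It assumes a simple landmark scheme: run BFS from a few chosen nodes, then estimate the distance between two nodes as the smallest sum of their distances to a shared landmark, which by the triangle inequality is an upper bound on the true distance. All function names, the landmark choice, and the fixed-size sampling (standing in for progressive sampling) are hypothetical.

```python
from collections import deque
import random


def bfs_distances(graph, source):
    """Exact hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def build_landmark_index(graph, landmarks):
    """Assumed index layout: one BFS distance table per landmark."""
    return {lm: bfs_distances(graph, lm) for lm in landmarks}


def estimate_distance(index, u, v):
    """Upper-bound estimate: min over landmarks of d(u, lm) + d(lm, v).

    Returns None when no landmark reaches both nodes.
    """
    if u == v:
        return 0
    best = None
    for dist in index.values():
        if u in dist and v in dist:
            candidate = dist[u] + dist[v]
            if best is None or candidate < best:
                best = candidate
    return best


def estimate_closeness(graph, index, node, sample_size, rng):
    """Closeness estimated from a random sample of target nodes.

    Uses (number of sampled targets reached) / (sum of estimated distances).
    """
    others = [n for n in graph if n != node]
    sample = rng.sample(others, min(sample_size, len(others)))
    dists = [estimate_distance(index, node, t) for t in sample]
    dists = [d for d in dists if d is not None]
    if not dists:
        return 0.0
    return len(dists) / sum(dists)
```

For example, on the five-node path graph 0-1-2-3-4 with node 2 as the only landmark, the estimate for the pair (0, 4) is exact at 4 hops, while the pair (0, 1) is overestimated as 3 hops because both nodes lie on the same side of the landmark. Adding landmarks tightens such estimates at the cost of one more BFS each.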