Detecting highly overlapping communities with Model-based Overlapping Seed Expansion
Aaron McDaid and Neil Hurley
March 10, 2010
- Clique percolation
- Faster than alternatives (preliminary finding)
- GCE
- Model-based community finding
- MOSES (Model-based Overlapping Seed ExpanSion)
- What next?
  - Benchmarks
  - NMI
  - Follow the research basic structure
  - Kronecker?
MOSES
- (Relatively) simple model
- Optimize objective using heuristics
- Scales
  - (a lot more to do though)
MOSES
Model that:
- Every pair of nodes has a (small) chance of having an edge.
- Communities increase the probability of edges forming between member nodes.
- Try to put most edges inside communities.
- Try to put most disconnected pairs outside communities.

Let s(i, j) be the number of communities in common between nodes i and j.
- Maximize s(i, j) where i and j are connected.
- Minimize s(i, j) where i and j are not connected.
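The quantity s(i, j) and its effect on edge probability can be sketched in a few lines of Python. This is a minimal sketch that assumes a specific form for the model, which the bullets only describe qualitatively: a background edge chance p_o, plus an independent chance p_in for each shared community, so that a pair sharing s communities stays disconnected with probability (1 - p_o)(1 - p_in)^s. The function names are hypothetical.

```python
def shared_communities(i, j, communities):
    """s(i, j): number of communities that contain both node i and node j."""
    return sum(1 for c in communities if i in c and j in c)


def edge_probability(s, p_o, p_in):
    """Probability of an edge between two nodes that share s communities.

    Assumed form: the pair stays disconnected only if the background chance
    p_o and each of the s per-community chances p_in all fail.
    """
    q_o, q_in = 1.0 - p_o, 1.0 - p_in
    return 1.0 - q_o * q_in ** s


communities = [{0, 1, 2}, {1, 2, 3}]          # two overlapping communities
print(shared_communities(1, 2, communities))  # 2
print(shared_communities(0, 1, communities))  # 1
print(shared_communities(0, 3, communities))  # 0
for s in range(3):
    # Edge probability grows with the number of shared communities.
    print(s, round(edge_probability(s, p_o=0.01, p_in=0.5), 4))
```

Under this form, every extra shared community raises the chance of an edge, which is why the objective rewards large s(i, j) on connected pairs and penalizes it on disconnected ones.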
MOSES
Placing a community around every edge maximizes P(edges | grouping).
MOSES
- So, maximizing P(edges | grouping) is not good enough. We must also consider P(grouping), i.e. maximize P(edges | grouping) × P(grouping).
- Need a prior on the grouping.
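- Arbitrary choices made so far. Might overhaul the model.

$$F_X(G) = \prod_{i<j} \left(q_o\, q_i^{s_Z(i,j)}\right)^{1-X_{ij}} \times \prod_{i<j} \left(1 - q_o\, q_i^{s_Z(i,j)}\right)^{X_{ij}} \times Q_Z! \prod_{1 \le c \le Q_Z} \frac{1}{(N+1)\binom{N}{n_c}} \qquad (1)$$

where X_{ij} is 1 if nodes i and j are connected and 0 otherwise, s_Z(i, j) is the number of communities in grouping Z shared by i and j, Q_Z is the number of communities, n_c is the size of community c, N is the number of nodes, q_o = 1 - p_o and q_i = 1 - p_in.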
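Equation (1) can be evaluated in log space with a short Python sketch, which also illustrates the point of the previous slide: the likelihood term alone prefers "a community around every edge", while the prior term reverses that preference. The sketch assumes q_o = 1 - p_o and q_i = 1 - p_in, treats the parameters as fixed rather than fitted, and uses a hypothetical toy graph (two triangles joined by a bridge); all function names are illustrative.

```python
import math
from itertools import combinations


def log_likelihood(n_nodes, edges, communities, p_o, p_in):
    """log P(edges | grouping): the two products over pairs i < j in equation (1)."""
    edge_set = {frozenset(e) for e in edges}
    q_o, q_i = 1.0 - p_o, 1.0 - p_in  # assumed definitions of q_o and q_i
    total = 0.0
    for i, j in combinations(range(n_nodes), 2):
        s = sum(1 for c in communities if i in c and j in c)  # s_Z(i, j)
        p_absent = q_o * q_i ** s  # probability that i and j are NOT connected
        if frozenset((i, j)) in edge_set:
            total += math.log(1.0 - p_absent)
        else:
            total += math.log(p_absent)
    return total


def log_prior(n_nodes, communities):
    """log of Q_Z! * prod_c 1 / ((N + 1) * C(N, n_c))."""
    N = n_nodes
    total = math.lgamma(len(communities) + 1)  # log Q_Z!
    for c in communities:
        n_c = len(c)
        log_binom = math.lgamma(N + 1) - math.lgamma(n_c + 1) - math.lgamma(N - n_c + 1)
        total -= math.log(N + 1) + log_binom
    return total


def log_objective(n_nodes, edges, communities, p_o, p_in):
    """log F_X(G): likelihood term plus prior term."""
    return log_likelihood(n_nodes, edges, communities, p_o, p_in) + log_prior(n_nodes, communities)


# Two triangles joined by a bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
triangles = [{0, 1, 2}, {3, 4, 5}]
per_edge = [set(e) for e in edges]  # "a community around every edge"

for name, grouping in [("triangles", triangles), ("per_edge", per_edge)]:
    print(name,
          round(log_likelihood(6, edges, grouping, 0.01, 0.5), 2),
          round(log_objective(6, edges, grouping, 0.01, 0.5), 2))
```

On this toy graph the per-edge grouping scores higher on the likelihood alone (only the bridge pair differs), but the prior charges each of its seven communities, so the two-triangle grouping wins on the full objective.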
Heuristics
- Experimenting with various heuristics to maximize the objective:
  - Seed expansion: start with an edge.
  - Deletions.
  - Iterative update (à la Louvain method).
- But many more things should be experimented with to get better results.
- Easy to efficiently optimize (with the current model).
  - Oklahoma Facebook network: 892,528 edges, 2,088 seconds.
  - A lot more could be done to speed this up even further.
MOSES
[Figure: seed expansion, with candidate frontier node A]
Current seed, C, in black; frontier nodes in blue. The next node selected will probably be A, as it has the most connections into the seed. But that depends on p_in and p_o, and on the values of s_Z(i, j) for the edges.
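A simplified version of the seed-expansion step can be sketched as a greedy loop. This is a hypothetical stand-in rather than the MOSES selection rule: it ranks frontier nodes only by their number of links into the seed, ignoring p_in, p_o and s_Z(i, j), which the caption notes the real choice depends on. The stopping rule (`min_fraction`) is also an assumption made for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def expand_seed(adj, seed_edge, min_fraction=0.5):
    """Greedily grow a community from a seed edge.

    Hypothetical stopping rule: stop once the best frontier node links to
    fewer than min_fraction of the current seed's members.
    """
    seed = set(seed_edge)
    while True:
        # Frontier: neighbours of the seed that are not yet in it.
        frontier = set().union(*(adj[u] for u in seed)) - seed
        if not frontier:
            break
        # Most links into the seed wins; ties go to the smallest label.
        best = min(frontier, key=lambda v: (-len(adj[v] & seed), v))
        if len(adj[best] & seed) < min_fraction * len(seed):
            break
        seed.add(best)
    return seed


edges = [(0, 1), (0, 2), (1, 2), (0, 3)]
adj = build_adjacency(edges)
print(sorted(expand_seed(adj, (0, 1))))  # [0, 1, 2]
```

Here node 2 plays the role of A: it links to both seed nodes, so it is added first, while node 3 (one link into a three-node seed) falls below the assumed threshold.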
Evaluation
[Figure: 20-node communities (cliques), p_o = 0.005]
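A synthetic graph in the spirit of this evaluation can be generated as planted 20-node cliques on top of background edges with probability p_o = 0.005. The slide does not say how many communities were planted or how their members were chosen, so the uniform random membership, node count and community count below are assumptions, and the generator is hypothetical.

```python
import random
from itertools import combinations


def planted_cliques(n_nodes, n_communities, size=20, p_o=0.005, seed=0):
    """Hypothetical benchmark: overlapping planted cliques plus background noise.

    Each community is `size` nodes drawn uniformly at random, so communities
    may overlap; every pair inside a community is connected (a clique), and
    every pair also receives a background edge independently with probability p_o.
    """
    rng = random.Random(seed)
    communities = [set(rng.sample(range(n_nodes), size)) for _ in range(n_communities)]
    edges = set()
    for c in communities:
        edges.update(combinations(sorted(c), 2))  # clique edges, stored as (u, v) with u < v
    for u, v in combinations(range(n_nodes), 2):  # O(N^2) loop: fine for small test graphs
        if rng.random() < p_o:
            edges.add((u, v))
    return communities, edges


communities, edges = planted_cliques(n_nodes=500, n_communities=40)
print(len(communities), len(edges))
```

Because the ground-truth communities are returned alongside the edges, a detected grouping can then be scored against them, for example with an overlapping NMI.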