Following the Follower: Detecting Communities with Common Interests on Twitter

Kwan Hui Lim and Amitava Datta
School of Computer Science and Software Engineering
The University of Western Australia
Crawley, WA 6009, Australia
[email protected], [email protected]

ABSTRACT

We propose an efficient approach for detecting communities that share common interests on Twitter, based on linkages among followers of celebrities representing an interest category. This approach differs from existing ones, which detect all communities before determining the interests of these communities, a computationally intensive process given the large scale of online social networks. In addition, we study the characteristics of these communities and the effects of deepening or specialization of interest.

Categories and Subject Descriptors
J.4 [Computer Applications]: Social and behavioral sciences

General Terms
Theory

Keywords
Twitter, Social Networks, Community Detection, Graph Mining

Copyright is held by the author/owner(s). HT'12, June 25–28, 2012, Milwaukee, Wisconsin, USA. ACM 978-1-4503-1335-3/12/06.

1. INTRODUCTION

One important problem in the application of target advertising and viral marketing to online social networks is the efficient identification of communities with common interests. Current approaches involve detecting all communities, then determining the interests of these communities [2, 5]. These approaches require a lengthy and intensive process of detecting communities for the entire social network, and many of the detected communities may not share the interest we are looking for. We propose a method to identify communities comprising like-minded individuals with common interests on Twitter. Our method also avoids unnecessarily detecting communities that do not share any specific interest.

2. DATASET AND METHODS

The Twitter dataset collected by Kwak et al. [1] is used for our experiments. A followership link $(i, j)$ indicates that user $i$ is a follower of user $j$, while a friendship link $Fr_{i,j}$ indicates $(i, j) = (j, i)$, i.e., users $i$ and $j$ follow each other. We define celebrities as users with more than 10,000 followers. The interest of a user, $Int_{cat}$, is inferred by the number of celebrities (of category $cat$) that the user follows.

Suppose we identify a set of $k$ celebrities $c_1, c_2, \ldots, c_k$. We next identify all the followership links for the individual celebrities in this set. Consider celebrity $c_j$, $1 \le j \le k$, and all the followership links for this celebrity, $\bigcup_i \mathrm{link}(i, c_j)$. We construct the set:

$$P = \bigcap_{j} \Big( \bigcup_{i} \mathrm{link}(i, c_j) \Big), \quad \text{for } 1 \le j \le k$$

$P$ is the set of fans who follow all the $k$ celebrities $c_j$, for $1 \le j \le k$. We consider only friendship links for community detection, as friendship links are stronger and more reflective of real-life interactions. Next, we try to detect communities among the members of $P$ using the Infomap algorithm and the Clique Percolation Method (CPM) at a k-value of 3. Refer to [3] and [4] for more details on the CPM and Infomap respectively.

3. INVESTIGATING COMMON INTERESTS

For our study, we selected Film & TV, Music, Hosting, News and Blogging as categories of interest due to their popularity. For each category, we selected the six most popular celebrities based on their number of followers. The categories that these celebrities represent were determined using information from Google and Wikipedia. We then selected users with $Int_{cat} = 6$, for $cat \in \{$Film & TV, Music, Hosting, News, Blogging$\}$. As a control group, we randomly chose 200,858 users to represent the group with no shared interest.

We now use our approach and compare the detected communities with common interests against the control group in terms of the total number of communities, size of largest community, and average community size. Figs. 1 and 2 show that users with common interests form larger and more communities than users without a common interest. Similarly, users with common interests form larger communities on average, as shown in Fig. 3. The exception is the News category under CPM, where many cliques of three nodes were detected as communities, decreasing the average community size.

Table 1: Network statistics of the communities

Category   Path Length   Clustering Coefficient   Diameter   Average Degree
Control    2.83          0.60                     6          7.81
Film/TV    3.03          0.62                     7          6.80
Music      2.82          0.63                     8          7.29
Hosting    3.09          0.59                     8          8.17
News       3.35          0.58                     8          9.15
Blogging   3.09          0.62                     7          7.51
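As a rough illustration of the four statistics in Table 1, the sketch below computes them for a small undirected graph using only the Python standard library. The paper does not state which variant of the clustering coefficient was used or over which subgraph the values were taken, so the choices here (average local clustering coefficient, a single connected graph with at least two nodes) are assumptions, and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def network_statistics(adj):
    """adj maps each node to the set of its neighbours (undirected).

    Assumes a connected graph with at least two nodes.
    """
    n = len(adj)
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)

    # Average local clustering coefficient (an assumed definition).
    coeffs = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2.0 * links / (k * (k - 1)))

    return {
        "path_length": total / pairs,
        "clustering": sum(coeffs) / n,
        "diameter": diameter,
        "avg_degree": sum(len(v) for v in adj.values()) / n,
    }


# Toy graph: a triangle a-b-c with one extra node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
stats = network_statistics(toy)
```

On the toy graph the diameter is 2 and the average degree is 2.0. For graphs at the scale of the dataset used here, the all-pairs breadth-first search would need to be replaced by sampling or a dedicated graph library.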

Users with common interests also form communities that are more cohesive than those of users without a common interest. Table 1 shows this trend: the communities with a common interest have a higher clustering coefficient than our control group with no common interest, except for the Hosting and News categories. However, users interested in Hosting and News have a higher average degree, which suggests that these users are better connected than users in the control group. These results show that our community detection approach finds communities that are larger, more cohesive and share common interests.
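The approach evaluated above (the fan set P from Section 2, restricted to friendship links, followed by CPM at a k-value of 3) can be sketched as follows. This is a minimal standard-library illustration rather than the implementation used for the experiments: the input format (a set of directed `(follower, followee)` pairs), the function names and the toy data are all assumptions, and Infomap is not covered. For k = 3, CPM reduces to merging triangles that share an edge.

```python
from itertools import combinations


def fan_set(follows, celebrities):
    """P: users who follow every celebrity in the (non-empty) list."""
    per_celebrity = [{i for (i, j) in follows if j == c} for c in celebrities]
    return set.intersection(*per_celebrity)


def friendship_graph(follows, users):
    """Keep only mutual (friendship) links among the given users."""
    adj = {u: set() for u in users}
    for i, j in follows:
        if i != j and i in adj and j in adj and (j, i) in follows:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def cpm_k3(adj):
    """Clique percolation for k = 3: union triangles sharing an edge.

    Node labels must be mutually comparable (e.g. all strings).
    """
    triangles = set()
    for u in adj:
        for v, w in combinations(adj[u], 2):
            if w in adj[v]:
                triangles.add(frozenset((u, v, w)))
    triangles = list(triangles)

    parent = list(range(len(triangles)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edge_owner = {}
    for idx, tri in enumerate(triangles):
        for edge in combinations(sorted(tri), 2):
            if edge in edge_owner:
                parent[find(idx)] = find(edge_owner[edge])
            else:
                edge_owner[edge] = idx

    groups = {}
    for idx, tri in enumerate(triangles):
        groups.setdefault(find(idx), set()).update(tri)
    return list(groups.values())


# Hypothetical toy data: users a-e follow both celebrities, f follows one.
celebs = ["c1", "c2"]
follows = {(u, c) for u in "abcde" for c in celebs} | {("f", "c1")}
follows |= {("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("a", "c"),
            ("c", "a"), ("c", "d"), ("d", "c"), ("d", "e"),
            ("f", "a"), ("a", "f")}

P = fan_set(follows, celebs)
graph = friendship_graph(follows, P)
communities = cpm_k3(graph)
```

In the toy data, user f is excluded because f follows only one celebrity, the one-way link from d to e is dropped because it is not a friendship link, and the only community found is the triangle formed by a, b and c. With six celebrities per category, the fan set P corresponds to the users with $Int_{cat} = 6$ selected in this section.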

Figure 1: Total Communities (y-axis: No of Communities; x-axis: Control Group, Film & TV, Music, Hosting, News, Blogging; series: CPM, Infomap)

Figure 2: Size of Largest Community (y-axis: No of Nodes; same categories and series)

Figure 3: Average Size of Communities (y-axis: No of Nodes; same categories and series)

4. SPECIALIZING/DEEPENING INTERESTS

We show that users sharing a specialized interest (i.e., Country Music) form a more tightly-coupled community than users sharing a general interest (i.e., Music). The control group comprises the users interested in the general Music category, as discussed in Section 3. The celebrities representing the Country Music category are seven Country Music singers who won various awards at the Country Music Awards between 2001 and 2008 and have more than 10,000 followers.

We investigate the changes in community formation as users specialize in their common interest (from Music to Country Music). The results are normalized by the number of users in each respective group to give an accurate representation of the community characteristics of each interest group. This Normalized Average Community Size (NACS) allows us to compare whether users with specialized interests form larger communities than users with a general interest, without the bias of the base population size. We observe that the NACS of the $Int_{Country} = 6$ group is 23 and 28 times larger than that of the $Int_{Music} = 6$ group using CPM and Infomap respectively, as shown in Table 2. In addition, users with a lower level of interest in a specialized category are also more likely to form larger communities on average compared to users with a higher level of interest in a general category.

Table 2: Comparison of General and Specialized Interest

Statistic                General (Music)   Specialized (Country)
NACS (CPM)               0.00032           0.00750
NACS (Infomap)           0.00041           0.01151
Path Length              2.82              2.10
Clustering Coefficient   0.63              0.76
Diameter                 8                 4
Avg. Degree              7.29              5.52

Communities comprising users with a specialized interest are also more cohesive and well-connected than those with a general interest. Table 2 illustrates this: users with a specialized interest in Country Music form communities with a shorter average path length and diameter but a higher clustering coefficient compared to those with a general interest in Music.

Next, we investigate the changes in communities as their interest in a category grows deeper, which is indicated by an increasing $Int_{cat}$ value. Specifically, we report on the changes in number of communities, community size, clustering coefficient and path length among users as their interest deepens. The size and number of communities show how likely users with common interests are to form communities, while clustering coefficient and path length give an indication of connectedness within the communities.

An increase in interest level among users corresponds to an increase in their average community size. We observe an increasing NACS with increasing $Int_{Country}$ values. This result supports our original observation that communities are more likely to be formed among like-minded individuals. In addition, the average size and number of communities formed increase as the interest level of the users increases.

Communities comprising users with a common interest become more tightly coupled as their level of interest increases. We observe a gradual increase in clustering coefficient among the largest communities with increasing $Int_{Country}$ values. Similarly, the largest communities at varying values of $Int_{Country}$ have an average path length of 1.7 to 3.0 hops, illustrating that users sharing common interests form communities that are better connected.

Even considering only friendship links for community detection, the communities detected still display the characteristics of scale-free networks. Upon closer examination, we observe that many individuals with a high degree are also country music artists, but with fewer fans than the celebrities we have chosen. The fact that there are other minor country singers among these communities shows that our method effectively detects communities comprising users with a common interest.

In conclusion, we proposed a method to efficiently detect communities comprising individuals with common interests for application in target advertising and viral marketing. As Twitter has no explicit options for users to state their interest, we derived a measurement of interest based on the number of celebrities in an interest category that the user follows. Our approach detects communities that are larger, more cohesive and only comprise users that share a common interest. We also observed how their community structures become more connected and cohesive with specializing or deepening of interest in a given category.
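The interest measure and the NACS comparison used in Section 4 can be made concrete with a short sketch. The text states only that results are normalized by the number of users in each group, so the formula below (average community size divided by group size) is an assumed reading, and the function names and toy inputs are hypothetical. The two ratios at the end are recomputed directly from the NACS values reported in Table 2.

```python
def interest_level(followed, category_celebrities):
    """Int_cat: how many celebrities of a category a user follows."""
    return len(set(followed) & set(category_celebrities))


def nacs(communities, group_size):
    """Normalized Average Community Size (assumed definition):
    mean community size divided by the number of users in the group."""
    average_size = sum(len(c) for c in communities) / len(communities)
    return average_size / group_size


# Hypothetical toy inputs.
level = interest_level({"c1", "c3", "x"}, ["c1", "c2", "c3"])  # follows 2 of 3
toy_nacs = nacs([{1, 2, 3}, {4, 5, 6, 7, 8}], group_size=100)

# Ratios of specialized to general NACS, using the Table 2 values.
ratio_cpm = 0.00750 / 0.00032
ratio_infomap = 0.01151 / 0.00041
```

Rounding the two ratios gives 23 for CPM and 28 for Infomap, which agrees with the factors quoted for the $Int_{Country} = 6$ group relative to the $Int_{Music} = 6$ group.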

5. ACKNOWLEDGMENTS

Kwan Hui Lim was supported by the Australian Government, University of Western Australia (UWA) and School of Computer Science and Software Engineering (CSSE) under the International Postgraduate Research Scholarship, Australian Postgraduate Award, UWA CSSE Ad-hoc Top-up Scholarship and UWA Safety Net Top-Up Scholarship.

6. REFERENCES

[1] H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In Proc. of WWW, pages 591–600, 2010.
[2] D. Li, B. He, Y. Ding, J. Tang, C. Sugimoto, Z. Qin, E. Yan, J. Li, and T. Dong. Community-based topic modeling for social tagging. In Proc. of CIKM, pages 1565–1568, 2010.
[3] G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435:814–818, 2005.
[4] M. Rosvall and C. T. Bergstrom. Maps of random walks on complex networks reveal community structure. PNAS, 105(4):1118–1123, 2008.
[5] S. H. Yang, B. Long, A. Smola, N. Sadagopan, Z. Zheng, and H. Zha. Like like alike - Joint friendship and interest propagation in social networks. In Proc. of WWW, pages 537–546, 2011.
