Characterizing Humans and Bots in Social Media

Richard J. Oentaryo, Jia-Wei Low, Arinto Murdopo, Philips K. Prasetyo, Ee-Peng Lim
Living Analytics Research Centre, Singapore Management University, Singapore 178902
{roentaryo, jwlow, arintom, pprasetyo, eplim}@smu.edu.sg

1. INTRODUCTION

In recent years, we have witnessed immense growth in human activities taking place in social media. One of the most popular social media platforms is Twitter, which was originally used as a personal microblogging site and has now evolved into a news/information publishing venue. On Twitter, users share content via short text messages called tweets (up to 140 characters each), which may include web links (URLs), images or videos, user mentions, or hashtags. The connectivity of the Twitter network also reflects social relationships among people, such as user communities and common interest groups.

The popularity and openness of Twitter, however, have made it susceptible to infiltration by automated programs, known as bots. Upon successful account registration, one can set up a bot (e.g., app service, RSS feed, blog widget) to post tweets on his/her behalf. The proliferation of bots has become a double-edged sword for Twitter [2, 3]. On one hand, bots can help generate many benign tweets, such as news and blog updates, which conforms to Twitter’s goal of becoming an information dissemination platform. Such bots can be helpful, e.g., bots that aggregate content from news feeds, or those that automatically respond to customer inquiries. On the other hand, spammers may exploit bots to spread malicious links, send unsolicited messages, or hijack trending topics. Such bots usually follow human users randomly, hoping some of them will follow back and then see spam tweets that entice them to visit malicious sites. In addition to degrading the user experience and the network of trust, malicious bots may cause more severe impacts, such as creating panic during emergencies, affecting the stock market, exposing private information, biasing political opinions, or damaging corporate reputation [5, 3].

It is thus critically important to characterize different types of bots and understand how they differ from human users. Such an endeavor brings several benefits. From a social science perspective, a more accurate understanding of human relationships and information diffusion patterns can be achieved by filtering out noisy, biased, or false content generated by bots. From a service standpoint, understanding the traits of humans and different types of bots can help Twitter develop new methods/strategies to increase user engagement, improve trust and security, and build more effective search and recommendation engines. These would in turn benefit the overall user community as well.

Several studies have recently been conducted to identify bots [5, 2, 4, 3]. However, these works have focused mainly on spam bots, and do not include detailed examinations of other types of bots. In this work, we present a new, more comprehensive categorization of bots in Twitter and compare them with human users. To the best of our knowledge, this work is the first to study different kinds of Twitter bots as well as human users, with detailed analyses of their posting patterns, network connectivity, and temporal signatures. The findings presented in this work would contribute to the creation of informative features for classifying bots and humans in social media, which are useful for social science and network mining research.

2. CATEGORIZATION OF BOTS

We define a bot as a Twitter account that generates content and interacts with other users automatically, at least according to human judgment. This covers both benign and malicious bots. Based on our long-term studies on Twitter, we propose to categorize bots into four main classes:

Spam bot. This type of bot posts many random contents and/or promotes certain sites/brands aggressively. Such bots can be either malicious (e.g., tricking people by hijacking certain accounts or redirecting them to malicious sites), or purely promotional with no harmful/tricky content.

Broadcast bot. This type of bot aims at disseminating information to a general audience by providing, e.g., benign links to news, blogs, or sites. Such a bot is typically operated by an organization or a group of people (e.g., bloggers).

Consumption bot. The main purpose of this bot is content aggregation for personal use. This is usually achieved via online services such as IFTTT (https://ifttt.com) or Twitterfeed (http://twitterfeed.com). Typically, this type of bot aggregates information from multiple sources/sites.

App/service bot. This type of bot provides services to users, such as updates on horoscope readings, weather, or follower/followee status, and includes automatic inquiry responders and chatbots. Such bots usually post tweets on a focused set of topics regularly, with specific timings.

Figure 1 illustrates a typical setting of the four bot types, where the arrow direction represents the flow of information.

Figure 1: Different types of bots in Twitter. Panels: broadcast bot, consumption bot, app/service bot, spam bot. Diagram labels: operator, bot agent, content, organization, individual, unfollower.

3. DATA AND FINDINGS

In our study, we use Twitter data collected over a 4-month period (1 January–30 April 2014). Starting from 159,724 seed users, we crawled their followers and followees, obtaining a total of 24.6 million Twitter accounts. To identify bots, we checked active accounts that tweeted at least 15 times within a month from a specified set of tweet sources (e.g., IFTTT, Twitterfeed), and manually classified them into one of the four bot types. To identify human users, we randomly sampled the remaining accounts and manually labeled them. In total, this gave us 5,838 labeled accounts (596 bots and 5,242 humans).

Figure 2 summarizes the key findings of our preliminary study. Figure 2(a) shows the cumulative distribution functions (CDFs) of different variables. A faster increase in the CDF value implies a more skewed variable distribution. We focus on several variables indicating the content and social aspects of users:

$$\text{popularity} = \frac{|F|}{|E| + |F|}, \qquad \text{reciprocity} = \frac{|E \cap F|}{|E \cup F|}, \qquad \text{retweet ratio} = \frac{|R|}{|T|},$$

$$\text{url ratio} = \frac{|U|}{|T|}, \qquad \text{mention ratio} = \frac{|M|}{|T|}, \qquad \text{hashtag ratio} = \frac{|H|}{|T|},$$

where $E$, $F$, $R$, $T$, $U$, $M$, $H$ are the sets of followees, followers, retweets, tweets, URLs, user mentions, and hashtags for a given account, respectively. Figure 2(b) shows heatmaps of tweet frequencies for different days and hours across the 4 months. Finally, Figure 2(c) shows subsets of the follow and mention networks, illustrating the connectivity among humans and different bots. The node size reflects the node importance, computed using PageRank [1].

Figure 2: Key characteristics of humans and bots in our Twitter dataset. (a) Cumulative distribution functions of popularity, reciprocity, retweet_ratio, url_ratio, mention_ratio, and hashtag_ratio for spam, broadcast, consumption, app/service, and human accounts. (b) Temporal dynamics: day-by-hour heatmaps of tweet frequency for each account type, Jan'14 to Apr'14. (c) Network connectivity: follow network and mention network among broadcast, app/service, consumption, spam, and human accounts.

Based on the results in Figure 2, we can readily answer several important research questions (RQs), such as:

RQ1: How are humans different from bots (or bots from one another)? From Figure 2(a), we can distinguish the posting and social patterns of human users and different bot types. For instance, the popularity and reciprocity results suggest that bots are generally more popular (except for app/service bots) but less reciprocal (i.e., less often in mutual follow relationships) than humans. From the retweet ratio and mention ratio results, we can also observe that humans tend to reshare content from, and mention (talk to), other accounts more than bots do. Additionally, the url ratio and hashtag ratio results suggest that bots include more diverse web links and topics than humans. Meanwhile, comparisons among the bots show that broadcast bots are the most popular and post the most diverse URLs and hashtags, but are the least reciprocal and very rarely mention others. This can be attributed to the fact that broadcast bots are typically operated by organizations, which have little interest in interacting with other users.

RQ2: How do activities of humans and bots change over time? As shown in Figure 2(b), seasonality exists in humans’ tweeting activities (footnote 1). During weekdays, humans rarely tweet from early morning to noon (busy working/school hours), and then the traffic increases until roughly 10pm.
On weekends, the activities are prolonged, starting earlier, from 8am. We also notice higher regularity in the activities of app/service bots, suggesting that their posts follow certain schedules. In contrast, spam bots are active on all days and at all hours, and their timings are random. Finally, broadcast and consumption bots similarly tweet more during weekdays, usually after working/school hours. In summary, different bots serve different purposes, and their temporal patterns reflect these purposes and how they are distinct from humans.

RQ3: Do humans (bots) connect or talk with bots (humans)? The statistics in Figure 2(a) (top left) suggest that bots generally have higher popularity scores than humans, and hence are less likely to follow other accounts (comprising mostly humans). However, there are exceptions, especially the bots at the tail end of the distribution. For example, Figure 2(c) (upper part) shows a group of malicious spam bots that follow one another. As mentioned, malicious bots would intuitively try to follow other accounts indiscriminately, expecting them to follow back. However, it is not normally easy to lure humans, and such bots may end up attracting other malicious bots instead. As for the mention activity, Figure 2(c) (lower part) indicates that bots rarely talk to humans. This aligns with the overall results given in Figure 2(a) (bottom left).

Finally, we note that the above RQs are by no means complete or exhaustive. Further studies shall be carried out to obtain in-depth insights and address other important RQs.

Footnote 1: The exceptionally low tweet frequencies in the first week of January and on 12-14 February are due to major downtime of our servers.
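
The six account-level variables used in this section, and the empirical CDFs plotted in Figure 2(a), can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function names `account_features` and `empirical_cdf` are hypothetical, the choice to return 0.0 when a denominator is zero is an assumption, and the text does not say whether repeated URLs, mentions, or hashtags are counted once or many times, so the sketch simply counts whatever collection it is given.

```python
from bisect import bisect_right


def account_features(followees, followers, retweets, tweets, urls, mentions, hashtags):
    """Compute the six per-account variables from E, F, R, T, U, M, H."""
    E, F = set(followees), set(followers)

    def ratio(numerator, denominator):
        # Assumption: accounts with an empty denominator get 0.0.
        return numerator / denominator if denominator else 0.0

    n_tweets = len(tweets)
    return {
        "popularity": ratio(len(F), len(E) + len(F)),
        "reciprocity": ratio(len(E & F), len(E | F)),
        "retweet_ratio": ratio(len(retweets), n_tweets),
        "url_ratio": ratio(len(urls), n_tweets),
        "mention_ratio": ratio(len(mentions), n_tweets),
        "hashtag_ratio": ratio(len(hashtags), n_tweets),
    }


def empirical_cdf(values):
    """Return a function mapping x to the fraction of values that are <= x."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("empirical_cdf needs at least one value")

    def cdf(x):
        return bisect_right(ordered, x) / len(ordered)

    return cdf


# Toy account (made-up identifiers, for illustration only).
features = account_features(
    followees={"a", "b", "c"},
    followers={"b", "c", "d", "e", "f"},
    retweets=["t2"],
    tweets=["t1", "t2", "t3", "t4"],
    urls=["u1", "u2"],
    mentions=["a"],
    hashtags=["h1", "h2", "h3"],
)
print(features)

# CDF of one variable over a toy group of accounts.
popularity_cdf = empirical_cdf([0.1, 0.4, 0.4, 0.9])
print(popularity_cdf(0.4))
```

Computing one CDF per account group (spam, broadcast, consumption, app/service, human) for each variable would give curves of the kind compared under RQ1.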

4. IMPLICATIONS AND EXTENSIONS

Our study aims to provide valuable insights into the content, social, and temporal characteristics of humans and bots in social media. Armed with a richer understanding of the behaviors of humans and bots, we envisage broader, long-term implications that benefit social science studies and facilitate the emergence of better services in social media. Moving forward, we wish to use the current results to identify a comprehensive set of discriminative features for explaining the characteristics of humans and bots. We are also interested in developing generalized principles and statistical models of human/bot behaviors in social media, and in testing them on larger-scale Twitter data as well as other social networks.

Acknowledgments. This research is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office, Media Development Authority (MDA).

5. REFERENCES

[1] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In WWW, pages 107–117, 1998.
[2] Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia. Who is tweeting on Twitter: Human, bot, or cyborg? In ACSAC, pages 21–30, 2010.
[3] E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini. The rise of social bots. ArXiv, 2014.
[4] K. Lee, B. D. Eoff, and J. Caverlee. Seven months with the devils: A long-term study of content polluters on Twitter. In ICWSM, pages 185–192, 2011.
[5] A. H. Wang. Detecting spam bots in online social networking sites: A machine learning approach. In DBSec, pages 335–342, 2010.
