Building Anti-Virus Email and File Storing Service Based on Grid Computing

Tan Dang Cao, Van Tam Vo, Tai Duc Son, Son Ngoc Le, Phuong Duy Pham, Tuan Anh Dao, Son Tuan Pham, Tri Minh Vu, Tuan Minh Dang
Faculty of Information Technology, University of Natural Sciences, Ho Chi Minh City, Vietnam
{cdtan, vtvan, ldtai, lnson, pdphuong, datuan, ptson, vmtri, dmtuan}@fit.hcmuns.edu.vn

Abstract - Grid Computing is a new trend in Information Technology. It takes advantage of underused processing, storage and other resources to provide an environment with high processing capacity and plentiful storage, which can be used to solve complicated problems at very low cost. Computers and the Internet have become widespread in both work and daily life around the world. Network security is a serious problem: users face computer virus infection when transmitting files, virus emails, spam emails, DoS/DDoS attacks and other threats on the Internet. This work analyses and builds a system based on Grid Computing technology that first addresses the most important of these problems: protecting the system from virus infection when large quantities of files and emails are transmitted and stored.

I. INTRODUCTION
Grid Computing technology is a new trend in Information Technology. Its appearance marks a great achievement in the development of high performance computing. It takes advantage of underused processing, storage and other resources to provide an environment with high processing capacity and plentiful storage, and uses it to solve complicated problems that current technology has difficulty dealing with or can only solve at high financial cost [1], [2], [3], [4].
In the past years, a number of large IT organisations and corporations have chosen grid computing as their development strategy and have invested much time in researching its practical applications [6], [18], [26].

II. THE NETWORK SECURITY ISSUES
The growth and popularity of computing and the Internet in both work and daily life is a worldwide phenomenon, and Vietnam is no exception. Network security is an urgent need and so has become a major interest of Vietnamese IT companies. The ability of a network environment to share data easily is being exploited ever more widely and deeply. In some situations this ability is crucial, yet it also makes it easy for the network to pick up a computer virus infection. At work, Internet users often receive many emails every day, any of which may introduce a virus to their computer system. This is always inconvenient and potentially disastrous. When setting up a system to which users connect, and to satisfy users' desire for a safe network, the importance of avoiding unexpected security problems and inconveniences cannot be overstated, especially if the users belong to a large company, organization or ISP. This matters particularly for people with limited IT experience, since in reality most people who use IT do not know much about it. This work analyses and builds a system based on Grid Computing technology that first addresses the most important problem: protecting the system from virus infection when large quantities of files and emails are transmitted and stored.

III. PROPOSED SOLUTION
As mentioned above, a system that supports anti-virus scanning of files and emails must scan hundreds to thousands of emails/files every day on average, which can amount to some tens of GB.

K. Elleithy (ed.), Innovations and Advanced Techniques in Systems, Computing Sciences and Software Engineering, 454–459. © Springer Science+Business Media B.V. 2008


Nowadays, most deployments in businesses and other organizations follow the conventional model, which uses powerfully configured machines as mail/FTP servers. However, these systems have some weaknesses:
• The scale and range of operation of companies and corporations keep growing, so their network systems face increasingly difficult processing and storage problems.
• These models are also prone to bottlenecks and low reliability.
There are several ways to solve these problems, such as buying expensive specialised hardware or using Cluster/Grid Computing technology. Using Grid Computing technology has the following benefits:
• Load balancing across servers can be managed actively: jobs are scheduled to get the most benefit while avoiding server overload in the grid.
• Replacing or updating some servers during operation does not delay the system.
• The cost of deploying the system is low, and the system can process very large quantities of data (emails/files).

IV. ANTI-VIRUS EMAIL SERVICE BASED ON GRID COMPUTING
A. Anti-virus problem in emails
Email systems in large companies and organizations, taken together, receive and process from hundreds of thousands to millions of emails each day. These systems often integrate anti-virus functions, but scanning such an enormous quantity of emails takes time. Grid Computing technology takes full advantage of the processing ability of the computers in the grid in order to reduce the time spent processing emails. This scheme does not build the whole system from scratch but makes use of freely available open source tools/software. The following open source software is chosen to build the anti-virus email system based on Grid Computing:
1. QMail
Nowadays there are many Mail Transfer Agents (MTAs) in the UNIX environment, where Sendmail is often the one supplied; however, Sendmail is insecure and the software is very bulky. Qmail, written by Dan Bernstein, fixes many of these security problems. Qmail is an open source MTA that is widely used and highly valued for its ease of operation and high performance.
2. Clam AntiVirus
Clam AntiVirus (ClamAV) is an anti-virus tool designed especially for scanning emails on an email gateway/server. ClamAV provides many flexible, extensible tools: it handles multiple processes, scans for viruses from the command line, and includes a convenient tool for updating its data frequently.
B. The current email service model
The following figure describes the basic components and operating principles of the current email service:

[Figure 1 diagram; labels: Internet, SMTP Server, Storage System, Pop3 Server, User1, User2, User3]

Figure 1: Current email service model
The above model is used by most large companies/organisations. The mail system consists of an SMTP Server (receives/sends mail), a Storage System (stores mail) and a POP3 Server (through which users retrieve mail by POP3). The POP3 Server and SMTP Server can run on the same computer. Anti-virus or anti-spam checking of emails is carried out at the SMTP Server.
C. Email service model based on Grid Computing technology
With the email model in Figure 1, it is easy to see that the more mail arrives, the more time is needed to process it. The following figure describes the elements and operating principles of an improved email service that addresses this problem:


[Figure 2 diagram; labels: Internet, SMTP Server, POP3 Server, User, Compute Grid (Compute Broker, Compute Node, ClamAV, Qmail Broker, Qmail Information provider, Broker, MDS), Data Grid (Data Broker, Data Node, Mail Storage, MDS); arrows numbered 0 to 12]

Figure 2: Email service model based on Grid Computing technology
The system is based on Grid Computing technology and uses open source software:
• The Globus Toolkit middleware [7], [8], [9], [11] supports managing system information and services in the network through the MDS service (Monitoring and Discovery System) [5], [13].
• Ganglia provides information about the local host, such as CPU speed, number of CPUs and free RAM, to MDS (0) and to the Broker, which uses it to schedule network processes (1).
• Qmail and ClamAV.
In this model, when emails from the Internet come into the system (2), the SMTP Server receives them and sends them to the Grid System (3) via the Compute Broker, where they are scanned before being stored. The Broker program in the Compute Broker system monitors emails coming from the SMTP Server and sends them to suitable Compute Nodes to be virus scanned (4). This distribution is based on the information that MDS holds about the Compute Nodes. After scanning, emails that contain a virus are deleted; virus-free emails are sent to the Data Grid system (5). The Data Broker receives these emails and selects a suitable Data Node to store them.
When users want to receive mail, they send requests through a Mail Client to the POP3 Server (7). After authenticating the user account successfully, the POP3 Server queries the Data Grid system that stores the emails (8), retrieves the emails from it, and sends them back to the users (10).
When users want to send mail, the emails are sent to the SMTP Server (11). The SMTP Server analyses the destination addresses: if they belong to the internal network, the emails are sent to the Grid System (3) to be processed. Otherwise, the emails are forwarded to the appropriate SMTP Server on the Internet (12).

D. Experiment Model
The following figure describes the real system deployed from the general model:

[Figure 3 diagram; labels: COMPUTE BROKER Globus5.gfit.hcmuns.edu.vn; COMPUTE NODE Globus2.gfit.hcmuns.edu.vn, Globus3.gfit.hcmuns.edu.vn, Globus6.gfit.hcmuns.edu.vn, Globus7.gfit.hcmuns.edu.vn, Globus8.gfit.hcmuns.edu.vn, Globus9.gfit.hcmuns.edu.vn; DATA NODE Globus4.gfit.hcmuns.edu.vn]

Figure 3: Experiment model for antivirus email service
The system is deployed at the lab of the Network Computer and Telecommunication Department, Faculty of IT, University of Natural Sciences. It consists of 9 PCs: three Pentium IV 2.4 GHz machines with 256MB-512MB of RAM and six Pentium II 400MHz machines with 192MB-256MB of RAM. The Compute Grid system is built with all the functions analysed in the general model. The Data Grid system can currently store data only at one assigned host.

V. ANTI-VIRUS FILE STORING SERVICE BASED ON GRID COMPUTING
A. An anti-virus problem in file storing system
Nowadays there are many online file storing services on the Internet, which give top priority to the security and confidentiality of the information passing into their servers. It is therefore necessary to have a method to control these data files and to make sure that each file is free of virus infection before it is accepted and stored in the system. With the choices and technology mentioned above, the main purpose of this project is to build a system that provides online storage and virus scanning for files uploaded by users.
B. General composition model
The following figure describes the basic elements and operating principles of the anti-virus file storing service and the relations among them:

Figure 4: General composition model of the anti-virus file storing service
The system is built on Grid Computing technology and uses open source software:
• The Globus Toolkit middleware [7], [8], [9], [11] supports managing system information and services in the network through the MDS service (Monitoring and Discovery System) [5], [13].
• Ganglia provides information about the local host, such as CPU speed, number of CPUs and free RAM, to MDS (0) and to the Broker, which uses it to schedule network processes (1).
• The Gridbus middleware [15], [19], [21], [22], [23] supports scheduling tasks to hosts in the grid network environment.
Using this system, a user sends a request to store a file through the web interface over HTTP (2). After the user signs in successfully (3), the system saves the file name (4) and then starts receiving data. The data flow is sent to the Grid system by FTP (5) and is virus scanned before it is accepted. The Broker program in the Compute Broker system watches for incoming files and distributes them to suitable Compute Nodes for virus scanning (6, 7), based on information received from MDS. The scanning result is written to a managed file (9, 10) and is used to decide whether the system should save the file. If a virus is detected, the system tries to remove it (or deletes the infected file if the virus cannot be removed). If there is no virus, or the virus has been removed completely, the file is transmitted to the Data Grid system (11). In the Data Grid there is also a Broker that monitors new files and distributes them to suitable Data Nodes for storage.
When a user logs in to download a file (14), the request is sent to the Data Grid system (15). The Broker finds the most suitable version of the file on the storage devices (16) and then creates a data flow to the Web Portal by FTP (17). The data flow is transmitted to the user by HTTP (18). The Grid system's operations, such as sending files to the virus scanning system, receiving scanning results and storing files on the distributed system using GridFTP [12], [14], are transparent to users, who only receive a notice saying whether the file has been accepted for storage.
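The upload path described above (choose a Compute Node from host information, scan, then store or reject) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Broker: the field names in `NodeInfo`, the scoring heuristic in `node_score`, the `min_free_ram_mb` threshold, and the stand-in scanner `demo_scan` are all hypothetical. A real deployment would obtain host facts from MDS and run ClamAV on the selected node.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class ScanResult(Enum):
    CLEAN = "clean"              # no virus found
    DISINFECTED = "disinfected"  # virus found and removed
    INFECTED = "infected"        # virus found and could not be removed


@dataclass
class NodeInfo:
    """Host facts of the kind Ganglia reports to MDS (field names are assumptions)."""
    host: str
    cpu_mhz: int
    cpu_count: int
    free_ram_mb: int
    queued_jobs: int = 0


# A scanner takes the chosen node and the file bytes, and returns the verdict
# together with the (possibly cleaned) bytes.
Scanner = Callable[[NodeInfo, bytes], Tuple[ScanResult, bytes]]


def node_score(node: NodeInfo) -> float:
    """Hypothetical heuristic: CPU capacity shared among queued jobs plus the new one."""
    return node.cpu_mhz * node.cpu_count / (node.queued_jobs + 1)


def select_node(nodes: List[NodeInfo], min_free_ram_mb: int = 64) -> Optional[NodeInfo]:
    """Pick the best-scoring node that has enough free RAM, or None if none qualifies."""
    candidates = [n for n in nodes if n.free_ram_mb >= min_free_ram_mb]
    return max(candidates, key=node_score) if candidates else None


def demo_scan(node: NodeInfo, data: bytes) -> Tuple[ScanResult, bytes]:
    """Stand-in scanner for illustration only; b"VIRUS" and b"LOCKED" are made-up markers."""
    if b"VIRUS" not in data:
        return ScanResult.CLEAN, data
    if b"LOCKED" in data:
        return ScanResult.INFECTED, data
    return ScanResult.DISINFECTED, data.replace(b"VIRUS", b"")


def handle_upload(name: str, data: bytes, nodes: List[NodeInfo],
                  scan: Scanner, store: Dict[str, bytes]) -> bool:
    """Scan an uploaded file on a selected node; store it only if it ends up virus-free."""
    node = select_node(nodes)
    if node is None:
        return False  # no node can take the job right now
    node.queued_jobs += 1
    try:
        result, cleaned = scan(node, data)
    finally:
        node.queued_jobs -= 1
    if result is ScanResult.INFECTED:
        return False  # the infected file is dropped and never reaches storage
    store[name] = cleaned
    return True
```

The return value of `handle_upload` corresponds to the only thing the user sees in the described design: whether the file was accepted for storage. The dictionary `store` merely stands in for the Data Grid.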

C. Experiment model
The following figure describes the real system deployed from the general model:

Figure 5: Experiment model for antivirus file storing service
The system is deployed at the lab of the Network Computer and Telecommunication Department, Faculty of IT, University of Natural Sciences. It consists of 9 PCs: three Pentium IV 2.4 GHz machines with 256MB-512MB of RAM and six Pentium II 400MHz machines with 192MB-256MB of RAM. The Compute Grid system is built with all the functions analysed in the general model. The Data Grid system can currently store data only at one assigned host. The Broker used to manage distributed storage has not been developed completely.
D. Experiment results
• The main goal is to test that the whole system operates according to the above design. The experiment shows that the design is logical and appropriate.


• The second goal is to test the system's performance as the number of files gradually increases. The experiments therefore assume that users have uploaded their files to the system in advance and only wait for the system to scan the files for viruses before storing them.
• Each scanned file is 1MB in size, and it is assumed that the antivirus programs are already installed on the Compute Nodes, so they do not need to be downloaded to the Compute Nodes for each scan.
• The experiment result is shown in the following figure:

Figure 6: Average execution time per scanned file
The above figure shows that as the number of tasks increases, the time taken per task decreases. The system is therefore more effective with a large quantity of tasks.

VI. CONCLUSION AND TREND OF DEVELOPMENT
• Detailed models of two services based on Grid Computing, anti-virus email and anti-virus file storage, have been designed successfully.
• A restricted system (without the Data Grid) providing the two services above has been implemented successfully. The system operates well with the designed functions. The performance of the anti-virus file storing subsystem has been tested, and the result is consistent with the official experiments of the Gridbus Broker group for data storage applications that use the Gridbus Broker [20]. However, the performance of the anti-virus mail subsystem has not been tested yet.
• Open source software suitable for implementing the above system has been researched, chosen and used, but there was not enough time to correct and modify it to optimise the system's performance.
• The above services can extend storage onto the Data Grid in order to increase flexibility, storage capacity and security, and to reduce processing time.
• More functions, such as anti-spam based on Grid Computing technology, can be added to the email service.

REFERENCES
[1]. Ian Foster, The Grid, CLUSTERWORLD, vol. 1, 2001.
[2]. Ian Foster, Carl Kesselman, Steven Tuecke, The Anatomy of the Grid, Intl J. Supercomputer Applications, 2001.
[3]. Ian Foster, What is the Grid? A Three Point Checklist, Argonne National Laboratory & University of Chicago, 20/06/2002.
[4]. Ian Foster, Carl Kesselman, Jeffrey M. Nick, Steven Tuecke, The Physiology of the Grid - An Open Grid Services Architecture for Distributed Systems Integration, Version: 6/22/2002.
[5]. Karl Czajkowski, Steven Fitzgerald, Ian Foster, Carl Kesselman, Grid Information Services for Distributed Resource Sharing, Proc. 10th IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, 2001.
[6]. Bart Jacob, How Grid infrastructure affects application design, Redbooks, IBM, 06/2003.
[7]. Luis Ferreira, Viktors Berstis, Jonathan Armstrong, Mike Kendzierski, Andreas Neukoetter, Introduction to Grid Computing with Globus, Redbooks, IBM Corp, 09/2003, www.ibm.com/redbooks
[8]. Bart Jacob, Luis Ferreira, Norbert Bieberstein, Candice Gilzean, Jean-Yves Girard, Roman Strachowski, Seong (Steve) Yu, Enabling Applications for Grid Computing with Globus, Redbooks, IBM Corp, 06/2003.
[9]. Borja Sotomayor, The Globus Toolkit 3 Programmer's Tutorial, 2003-2004, www.globus.org
[10]. Luis Ferreira, Arun Thakore, Michael Brown, Fabiano Lucchese, Huang RuoBo, Linda Lin, Paul Manesco, Jeff Mausolf, Nasser Momtaheni, Karthik Subbian, Olegario Hernandez, Grid Services Programming and Application Enablement, Redbooks, IBM Corp, 05/2004, www.ibm.com/redbooks

[11]. Luis Ferreira, Arun Thakore, Michael Brown, Fabiano Lucchese, Huang RuoBo, Linda Lin, Paul Manesco, Jeff Mausolf, Nasser Momtaheni, Karthik Subbian, Olegario Hernandez, Grid Services Programming and Application Enablement, Redbooks, IBM Corp, 05/2004, www.ibm.com/redbooks
[12]. William Allcock, Programming with GridFTP Client Library, CLUSTERWORLD, vol. 2, no. 9, 10/2004.
[13]. Globus Alliance, MDS ver 2.2 User's Guide, Globus Alliance, www.globus.org, 3/10/2003.
[14]. William Allcock, GridFTP: Protocol Extensions to FTP for the Grid, Argonne National Laboratory, 03/2003.
[15]. Rajkumar Buyya and Srikumar Venugopal, The Gridbus Toolkit for Service Oriented Grid and Utility Computing: An Overview and Status Report, Proceedings of the First IEEE International Workshop on Grid Economics and Business Models (GECON 2004, April 23, 2004, Seoul, Korea), pp. 19-36, ISBN 0-7803-8525-X, IEEE Press, New Jersey, USA.
[16]. Qmail Project, www.qmailrocks.net.
[17]. ClamAV Project, www.clamav.net.
[18]. Son Ngoc Le, Tai Duc Le, "Grid Computing technology and an experiment application", Bachelor thesis 2006, Ho Chi Minh University Natural Science.
[19]. Tri Minh Vu, Tuan Minh Dang, "The Gridbus Broker and an experiment application", Bachelor thesis 2007, Ho Chi Minh University Natural Science.
[20]. Srikumar Venugopal, Krishna Nadiminti, Hussein Gibbins and Rajkumar Buyya, Designing a Resource Broker for Heterogeneous Grids, The University of Melbourne, Australia, 2006, www.gridbus.org.
[21]. Krishna Nadiminti, Hussein Gibbins, Xingchen Chu, Srikumar Venugopal and Rajkumar Buyya, The Gridbus Grid Service Broker and Scheduler (v.3.0) User Guide, The University of Melbourne, Australia, 2006, www.gridbus.org/broker/.
[22]. Srikumar Venugopal, Scheduling Distributed Data-Intensive Applications on Global Grids, Doctor of Philosophy, The University of Melbourne, Australia, July 2006.
[23]. Srikumar Venugopal, Rajkumar Buyya, and Lyle Winton, A Grid Service Broker for Scheduling e-Science Applications on Global Data Grids, The University of Melbourne, Australia, 2005.
[24]. Rajkumar Buyya, The Gridbus Toolkit: Creating and Managing Utility Grids for eScience and eBusiness Applications, The University of Melbourne, Australia, 2005.
[25]. S. Venugopal, K. Nadiminti, R. Buyya, The Gridbus Broker for Service-Oriented Computational and Data Grids, Gridbus Project, University of Melbourne, Australia, 2005.
[26]. Rajkumar Buyya, Introduction to Grid Computing: Trends, Challenges, Technologies, and Applications, The University of Melbourne, Australia, 2005.
[27]. Rajkumar Buyya, Grid Resource Management and Application Scheduling, The University of Melbourne, Australia, 2005, www.gridbus.org.
