An Ant Colony Optimization Algorithm for the Network Inference and Parameter Estimation of S-Systems

Philip Christian Zuniga1

Maia Malonzo1,2

Henry Adorna1

Prospero Naval1

Several modeling frameworks are used in the analysis of biochemical pathways. Some involve polynomial models, while others use artificial neural networks. In this paper, we concentrate on the S-System model. This model involves two kinds of parameters: kinetic orders, usually represented by g and h, and rate constants, usually written as α and β. These parameters map one-to-one onto the structure of the metabolic network. Therefore, the problem of determining the structure of the model depends only on the estimation of the parameters.

ABSTRACT
In this paper, we propose an Ant Colony Optimization (ACO) algorithm for the network inference and parameter estimation of S-Systems. ACO has been used for various problems and, with several improvements, it can also be applied to the problems that we consider here.

Keywords
Parameter Estimation, Network Inference, Local Minimum, S-Systems

We now look at the structure of an S-system based biochemical model.

1. INTRODUCTION
Biochemical Systems Theory (BST) has found many applications in metabolic engineering, drug development, and related fields. The GMA and S-System formulations of BST are nonlinear differential equations that involve parameters whose values have to be estimated. Unlike linear systems, for which effective parameter estimation methods exist, there is currently no available algorithm that can handle S-System or GMA models.

Given n biochemical species, with concentrations at time t denoted by $X_1(t), \ldots, X_n(t)$, the biochemical model can be represented as a system of ordinary differential equations of the form

$$\frac{dX_i(t)}{dt} = \alpha_i \prod_{j=1}^{n} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n} X_j^{h_{ij}}$$

In this study, we address two problems: network inference and parameter estimation. Network inference involves determining which chemical species interact within the biochemical network, usually by means of some fit to data. The network inference problem can be treated as a combinatorial optimization problem, since the goal is to determine whether or not a certain metabolite is involved in a reaction.

Each differential equation in the system can be viewed as a model representing a single reaction. The system of differential equations, on the other hand, represents a biochemical network consisting of a series of reactions. In a given differential equation, a kinetic order of 0 for a metabolite implies that the metabolite is not involved in the reaction.
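As a minimal sketch, the S-System form can be evaluated numerically as shown here. The function names, the explicit Euler integrator, and the two-species parameter values are illustrative assumptions and are not taken from the paper; the example also shows that a kinetic order of 0 removes a metabolite from a term.

```python
from math import prod  # available in Python 3.8+


def s_system_rhs(x, alpha, beta, g, h):
    """Return dX_i/dt for every species of an S-System at state x."""
    n = len(x)
    return [
        alpha[i] * prod(x[j] ** g[i][j] for j in range(n))
        - beta[i] * prod(x[j] ** h[i][j] for j in range(n))
        for i in range(n)
    ]


def euler_simulate(x0, alpha, beta, g, h, dt=0.001, steps=1000):
    """Integrate the system with explicit Euler (chosen only for brevity)."""
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        dx = s_system_rhs(x, alpha, beta, g, h)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        trajectory.append(list(x))
    return trajectory


# Hypothetical two-species system (values chosen for illustration only):
#   dX1/dt = 2 * X2^-0.5 - X1^0.5
#   dX2/dt =     X1^0.5  - X2^0.5
alpha = [2.0, 1.0]
beta = [1.0, 1.0]
g = [[0.0, -0.5], [0.5, 0.0]]  # g[0][0] = 0: X1 plays no role in its own production
h = [[0.5, 0.0], [0.0, 0.5]]

rates = s_system_rhs([1.0, 1.0], alpha, beta, g, h)
trajectory = euler_simulate([1.0, 1.0], alpha, beta, g, h, dt=0.01, steps=10)
```

In an estimation setting, a simulated trajectory of this kind would be compared against the measured concentrations to score a candidate parameter set.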

The parameter estimation problem involves determining the actual parameter values of the model, and thus requires a real-valued algorithm. For this problem, we propose to use a continuous variant of the Ant Colony Optimization algorithm.

2. PROBLEM

The main problem that this research tries to solve is how to effectively produce the model given the concentrations of the metabolites. We want to find the set of parameters (kinetic orders and rate constants) that would produce the given concentrations. This is considered an inverse problem, because we are given the results of a model and need to find the structure of the model.

This inverse problem is difficult, since we need to estimate a large number of parameters given only the time series of the metabolite concentrations. In general, given a set of n metabolites, we need to estimate $2n$ rate constants and $2n^2$ kinetic orders.

3. RELATED LITERATURE
The problem of finding the structure and formulation of an S-System given the concentrations of metabolites has generated interest among scientists around the world. As a result, a group of researchers from UP Diliman and Munich, Germany has formed a weekly seminar group that aims to create a benchmarking framework for solving biochemical systems. This group is collectively known as the MAD [6].

In their paper, they presented several methods that were used in solving the network inference and parameter estimation problems. Among the methods used are Particle Swarm Optimization (P. Naval), Simulated Annealing (O. Gonzales, M. Echavez), Newton Flow (M. de Paz, R. del Rosario), and Genetic Algorithm (M. Bargo). All the methods mentioned, and many of the other methods previously used, are stochastic. This is because the hardness of the problem has made it impossible for a deterministic algorithm to solve it.

The results produced by the various algorithms are nearly identical. They solved the same networks (HA 96) with accuracy, but almost all of the methods failed on the CM06 network.

4. ANT COLONY OPTIMIZATION
Ants are known to leave behind a chemical called pheromone. This chemical acts as a means of communication among the ants, so that they know which paths are being used by other ants. A higher concentration of pheromone in an area implies that more ants have passed through it, and hence that it should be a better route to food. Marco Dorigo used this idea to develop a novel optimization algorithm now known as the Ant Colony Optimization algorithm.

4.1 Discrete ACO Algorithm
The discrete Ant Colony Optimization algorithm can be characterized by two things:

- The probabilistic transition rule that determines the direction of each of the ants.
- The pheromone update mechanism.

In solving the Traveling Salesman Problem (TSP), the algorithm considers a set of n solutions (called ants) and performs several iterations until a termination condition is reached. For the TSP, a solution is any tour that does not pass through any city more than once and that terminates at the starting point. In the first iteration, every node has the same probability of being selected. While making a tour, each ant leaves a pheromone trail on its path. The amount of pheromone left on the path depends on the distance traveled: the shorter the distance, the more pheromone is left; the longer the distance, the less pheromone is dropped. In succeeding iterations, the selection of nodes depends on the amount of pheromone left on the trail. The more pheromone was left, the higher the probability that a certain path will be chosen.

The amount of pheromone left on the ground is updated depending on the cost value of the solution. Consider a general optimization problem where, given a set of discrete choices, the goal is to find the best combination of choices depending on a given fitness function. Usually the fitness function is the cost function between the true solution and the solution generated by the algorithm.

4.2 Continuous ACO Algorithm
A similar approach is used for solving continuous optimization problems. This time we are given a continuous fitness function $f(x_1, \ldots, x_n)$. The objective is to find the vector x that will minimize (or sometimes maximize) the fitness function. Note that the elements of x are real numbers.

The continuous ACO algorithm solves this problem by first generating a set of m random vectors. These solutions are generated from a multivariate Gaussian distribution whose initial mean and variance are arbitrary. After the vectors are generated, the fitness of each vector is computed and the resulting fitness values are sorted. The vector with the best fitness value is then used for the next iteration of the algorithm.

New solutions are generated in the next iteration, but this time, instead of an arbitrary mean, the best vector generated in the previous iteration is used as the mean of the multivariate Gaussian distribution. The process is repeated until a given number of iterations is reached or until a given threshold value is obtained.

5. PROPOSED ALGORITHM FOR S-SYSTEMS

5.1 Network Inference
We propose to use the discrete ACO algorithm for the network inference problem. The network inference problem involves selecting the metabolites that are involved in each reaction in the S-System. We made some modifications to the discrete ACO algorithm to make it fit the network inference problem. In the proposed algorithm, instead of making choices one at a time, we select all the choices of each solution at once and then compute its fitness value. This is done because the order in which metabolites are selected is immaterial to the problem. Also, to ensure that a proportional amount of pheromone is assigned to each possible solution, we use the following function to assign pheromones:

$$P(X) = \frac{1}{F(X) + 1}$$

where F is the fitness function. In this equation, the higher the fitness value, the smaller the amount of pheromone left for the solution. It is easy to see that the function limits the amount of pheromone to the range from 0 to 1.
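The following minimal sketch illustrates how the pheromone function P(X) = 1 / (F(X) + 1) could drive a discrete ACO in which each ant selects all of its choices at once. It is an illustration under stated assumptions, not the authors' implementation: the per-choice pheromone table `tau`, the function names, the colony size, and the toy fitness (the number of mismatches against a hypothetical target mask) are all invented for the example.

```python
import random


def pheromone(fitness_value):
    """P(X) = 1 / (F(X) + 1): a lower fitness (cost) deposits more pheromone."""
    return 1.0 / (fitness_value + 1.0)


def infer_mask(fitness, n_choices, n_ants=20, n_iters=50, seed=0):
    """Search for a 0/1 inclusion mask with a low fitness value."""
    rng = random.Random(seed)
    # tau[j][v]: pheromone for setting choice j to v (0 = excluded, 1 = included).
    # Equal initial values give every choice the same initial probability.
    tau = [[1.0, 1.0] for _ in range(n_choices)]
    best_mask, best_f = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            # All choices are drawn at once; their order does not matter.
            mask = [
                1 if rng.random() < tau[j][1] / (tau[j][0] + tau[j][1]) else 0
                for j in range(n_choices)
            ]
            f = fitness(mask)
            deposit = pheromone(f)
            for j, v in enumerate(mask):
                tau[j][v] += deposit
            if f < best_f:
                best_mask, best_f = mask, f
    return best_mask, best_f


# Toy fitness (hypothetical): number of positions that differ from a target mask.
target = [1, 0, 1, 1, 0]


def toy_fitness(mask):
    return sum(a != b for a, b in zip(mask, target))


best_mask, best_f = infer_mask(toy_fitness, len(target))
```

In the network inference setting, the mask would mark which metabolites take part in a reaction, and the toy fitness would be replaced by the fit of the resulting model to the data.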

Select nodes randomly
While termination condition is false
    For each ant
        Construct solution according to pheromone values
        Compute fitness value of the constructed solution
        Assign pheromone values based on fitness value. Use fitness value equation.

Figure 1: Discrete ACO for Network Inference.

The fitness function used in this research is the relative error of the model output produced with the parameters found by the algorithm, compared with the model output produced with the true parameter values. If I is the output based on the algorithm and I' is the output based on the true values, then the fitness function is given by:

$$F = \frac{I - I'}{I'}$$

5.2 Parameter Estimation
We propose to use the ACO algorithm for continuous problems in the parameter estimation problem. Several innovations were made to ensure that the algorithm is useful for S-Systems.

1) We used the same process of distributing pheromones as in the discrete case. This function was used to ensure that the amount of pheromone is normalized.
2) We use the fitness value to determine the variance of the multivariate Gaussian distribution. This allows the variance to converge to 0 if the fitness value is low enough, and to remain high for high fitness values, so that the Gaussian distribution has a wider spread when the fitness value is high.

Consider an S-System model with rate constants A and kinetic parameters B.

1) Generate random values for the set of parameters.
2) Compute the resulting cost of the randomly generated parameters.
3) Store the parameters and their cost in a table.
4) Repeat steps 1 to 3 m times.
5) Sort the table based on the cost values.
6) From the table, randomly select new parameters A and B. The selection should be biased toward the parameters with lower costs.
7) Use the selected parameters as the mean of a multivariate Gaussian distribution.
8) Generate new solutions from the distribution.
9) Compute the cost.
10) Sort the table.
11) Update the variance.
12) Repeat steps 6 to 11 until the terminating condition is satisfied.

Figure 2: Continuous ACO for Parameter Estimation Problems

5.3 Jumping Ants
Falling into local minima is a common problem in parameter estimation, and the parameter estimation of S-Systems is no exception. In this research, we propose the use of Jumping Ants. In the algorithm, if the variance starts converging to 0, the algorithm is approaching a solution. If the resulting fitness value is still high, the algorithm is converging to a local minimum rather than a global minimum. When this condition occurs, the algorithm should automatically set the variance to a larger constant c, to allow the Gaussian distribution to choose a solution that is significantly farther from the current solution. The algorithm will then start searching for solutions other than the current one.
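The continuous procedure of Figure 2, together with the jumping-ants rule, might be sketched as follows. This is a sketch under stated assumptions rather than the authors' implementation: the Gaussian is taken to have independent components with one shared variance, the cost is assumed to be non-negative, the variance update (multiplying by the best cost, capped at 1) is only one possible rule, and all names, thresholds, and the sphere test function are hypothetical.

```python
import random


def continuous_aco(cost, dim, m=30, iters=200, var0=1.0,
                   var_eps=1e-6, cost_tol=1e-3, jump_var=1.0, seed=0):
    """Return the best (cost, vector) pair found for a non-negative cost."""
    rng = random.Random(seed)
    # Steps 1-5: build the initial table of (cost, vector) rows and sort it.
    table = []
    for _ in range(m):
        v = [rng.gauss(0.0, var0 ** 0.5) for _ in range(dim)]
        table.append((cost(v), v))
    table.sort(key=lambda row: row[0])
    var = var0
    for _ in range(iters):
        # Step 6: selection biased toward low cost via 1 / (F + 1) weights.
        weights = [1.0 / (c + 1.0) for c, _ in table]
        _, mean = rng.choices(table, weights=weights, k=1)[0]
        # Steps 7-9: sample around the selected mean and compute the costs.
        for _ in range(m):
            v = [rng.gauss(mu, var ** 0.5) for mu in mean]
            table.append((cost(v), v))
        # Step 10: sort and keep the m best rows.
        table.sort(key=lambda row: row[0])
        table = table[:m]
        best_cost = table[0][0]
        # Step 11 (assumed rule): shrink the variance by the best cost, capped at 1.
        var *= min(1.0, best_cost)
        # Jumping ants: variance collapsed while the cost is still high.
        if var < var_eps and best_cost > cost_tol:
            var = jump_var
    return table[0]


# Hypothetical test function: the sphere, with its global minimum at the origin.
def sphere(v):
    return sum(x * x for x in v)


best_cost, best_vector = continuous_aco(sphere, dim=2)
```

For S-Systems, `cost` would wrap a simulation of the model with the candidate rate constants and kinetic orders and return the fitness against the observed time series.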

6. RESULTS
We are currently in the process of developing the discrete ACO algorithm for the network inference of biochemical networks. We have already applied the continuous algorithm to the VA04 network. We are currently studying how the variance should decrease so that the algorithm converges to a solution while retaining enough variability to reach the best solution possible.

The second innovation, in which the fitness value determines the variance, is one of the most critical steps in this algorithm. The challenge right now is to find a suitable relationship between the fitness value and the variance.

The VA04 network consists of 1 independent variable (X0) and 4 dependent variables.


$$\begin{aligned}
\frac{dX_1}{dt} &= 20\,X_0(t)\,X_3(t)^{-0.8} - 10\,X_1(t)^{-0.8} \\
\frac{dX_2}{dt} &= 8\,X_1(t)^{0.5} - 3\,X_1(t)^{0.75} \\
\frac{dX_3}{dt} &= 3\,X_3(t)^{0.75} - 5\,X_3(t)^{0.5}\,X_4(t)^{0.2} \\
\frac{dX_4}{dt} &= 2\,X_1(t)^{0.5} - 6\,X_4(t)^{0.8}
\end{aligned}$$

We used the fitness function that was presented above. In our experiments, we considered two cases:

Case 1: The variance converges independently of the fitness value.
Case 2: The variance converges with the fitness value.

For Case 1, we let the variance converge independently. We used two methods to reduce the variance. The first method subtracts a constant from the variance. The second method multiplies the variance by a constant. With the first method, the fitness function converges too slowly, thus allowing the algorithm to find other possible solutions.

Figure 3: Linear Depreciation for the Variance

The second method, on the other hand, makes the variance converge to 0 too quickly. In this case, the algorithm always produces a local minimum: once the algorithm finds a solution, it keeps converging to that solution.

Figure 4: Exponential Depreciation for the Variance

For Case 2, the variance was decreased by multiplying it by the fitness value of the past iterations, so that the variance becomes dependent on the fitness value. A high fitness value makes the variance larger, so the algorithm is more open to other solutions. A low fitness value makes the variance smaller, so the algorithm converges to the solution producing the low fitness value. The results show that the fitness value oscillates while converging. This is because the variance depends on the fitness value: if the fitness value is high, the variance is also high, hence more variation; if the fitness value is low, so is the variance.

Figure 5: Fitness Value Dependence of the Variance

Another observation is that, despite its oscillatory behavior, the magnitude of the fitness value steadily decreases, implying convergence of the solution produced by the algorithm.
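The three variance schedules compared in these experiments can be written as small update rules. This is a sketch only: the function names, the floor at zero in the linear rule, and the constants in the usage lines are assumptions made for the example, not values reported in the paper.

```python
def linear_decay(var, step):
    """Case 1, first method: subtract a constant (floored at 0 by assumption)."""
    return max(var - step, 0.0)


def exponential_decay(var, factor):
    """Case 1, second method: multiply by a constant factor between 0 and 1."""
    return var * factor


def fitness_scaled(var, fitness_value):
    """Case 2: multiply the variance by the fitness value of the last iteration."""
    return var * fitness_value


def run_schedule(update, var, arg, n_iters):
    """Apply one update rule n_iters times and return the final variance."""
    for _ in range(n_iters):
        var = update(var, arg)
    return var


# Hypothetical constants: ten iterations starting from a variance of 1.0.
var_linear = run_schedule(linear_decay, 1.0, 0.05, 10)
var_exponential = run_schedule(exponential_decay, 1.0, 0.5, 10)
```

With these illustrative constants the multiplicative rule shrinks the variance by about three orders of magnitude in ten iterations while the subtractive rule only halves it, which matches the qualitative contrast between the slow first method and the fast second method.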

7. CONCLUSION
We are currently in the process of fine-tuning the ACO so that it can be used for the network inference and parameter estimation of S-Systems. We plan to use the solution of the network inference algorithm to reduce the search space of the parameter estimation algorithm.

The results of the experiments have shown the possibility of using the fitness value as a determinant of the variance. The ACO algorithm can be used for the network inference and parameter estimation of S-Systems: the discrete ACO algorithm for the network inference problem, and the continuous ACO algorithm for the parameter estimation problem. Several innovations are implemented to make these algorithms suitable for S-Systems.

8. REFERENCES
[1] S. Tsutsui, An Enhanced Aggregation Pheromone System for Real-Parameter Optimization in the ACO Metaphor, ANTS Workshop, (2006), pp. 60-71.
[2] S. Tsutsui, M. Pelikan and A. Ghosh, Performance of Aggregation Pheromone System on Unimodal and Multimodal Problems, Proc. of IEEE, (2005).
[3] M. Dorigo and K. Socha, An Introduction to Ant Colony Optimization, Approximation Algorithms and Metaheuristics, (2007).
[4] M. Dorigo and G. Di Caro, The Ant Colony Optimization Metaheuristic, New Ideas in Optimization, McGraw-Hill, New York (1999), pp. 11-32.
[5] E. O. Voit and J. Almeida, Decoupling Dynamical Systems for Pathway Identification of Metabolic Profiles, Bioinformatics, 20 (2004), pp. 1670-1681.
[6] R. Del Rosario, P. Naval et al., MADBUG: Framework for the Network Inference and Parameter Estimation of Biochemical Systems, ICMSB 2008.

9. AUTHORS' AFFILIATION
1) Department of Computer Science, University of the Philippines – Diliman
2) Marine Science Institute, University of the Philippines – Diliman
