
Computational Chemistry Development of a Unified Free Energy Markov Model for the Distribution of 1300 Chemicals to 38 Different Environmental or Biological Systems

MAYKEL CRUZ-MONTEAGUDO,1 HUMBERTO GONZÁLEZ-DÍAZ,2,3 GUILLERMÍN AGÜERO-CHAPÍN,3 LOURDES SANTANA,2 FERNANDA BORGES,1 ELENA ROSA DOMÍNGUEZ,4 GIANNI PODDA,3 EUGENIO URIARTE2

1 Physico-Chemical Molecular Research Unit, Department of Organic Chemistry, Faculty of Pharmacy, University of Porto 4050-047, Porto, Portugal
2 Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela 15782, Santiago de Compostela, Spain
3 Dipartimento Farmaco Chimico Tecnologico, Università Degli Studi di Cagliari, 09124 Cagliari, Italy
4 CEQA and CAP, Faculty of Chemistry and Pharmacy, UCLV, 54830, Cuba

Received 9 January 2007; Revised 14 February 2007; Accepted 15 February 2007
DOI 10.1002/jcc.20730
Published online 2 April 2007 in Wiley InterScience (www.interscience.wiley.com).

Abstract: Predicting the tissue and environmental distribution of chemicals is of major importance for environmental and life sciences. Most of the molecular descriptors used in the computational prediction of the partition behavior of chemicals consider molecular structure but ignore the nature of the partition system. Consequently, the computational models derived to date are restricted to the specific system under study. Here, a free energy-based descriptor (ΔGk) is introduced, which circumvents this problem. Based on ΔGk, we developed for the first time a single linear classification model to predict the partition behavior of a large number of structurally diverse drugs and other chemicals (1300) for 38 different partition systems of biological and environmental significance. The model presented training/predicting set accuracies of 91.79/88.92%. Parametrical assumptions were checked. Desirability analysis was used to explore the levels of the predictors that produce the most desirable partition properties. Finally, inversion of the partition direction for each one of the 38 partition systems shows that our models correctly classified 89.08% of compounds with an uncertainty of only ±0.17%, independently of the direction of the partition process used to seek the model. Ten other classification models (linear, neural networks, and genetic algorithms) were also tested for the same purposes. None of these computational models compares favorably with the linear model, indicating that our approach captures the main aspects that govern the partition of chemicals in different systems. © 2007 Wiley Periodicals, Inc.

J Comput Chem 28: 1909–1923, 2007

Key words: chem-informatics; quantitative structure–property relationships; Markov models; free energy; partition coefﬁcients; chemicals environmental distribution

Introduction

The distribution of chemicals, measured as their respective partition coefficients, is an efficient tool for the quantitative determination of their toxicological/biological activity. Since Overton1 and Meyer2 demonstrated this idea over 100 years ago, this measure has been used frequently to assess the biological activity of chemicals in all areas of science. Its uses today range from medicinal chemistry to environmental sciences.3,4 For instance, the breast milk to plasma concentration ratio5 of a drug is generally used to estimate the infant's exposure to drugs through breast milk. As many women need to take medication while breast feeding, the relevance of this measure is revealed

This article contains Supplementary Material available at http://www.interscience.wiley.com/jpages/0192-8651/suppmat
Correspondence to: H. González-Díaz; e-mail: [email protected] or [email protected]
Contract/grant sponsor: Xunta de Galicia; contract/grant numbers: PXIB20304PR, BTF20302PR
Contract/grant sponsor: Ministerio de Sanidad y Consumo; contract/grant numbers: PI061457
Contract/grant sponsor: Dirección Xeral de Investigación y Desenvolvemento of Xunta de Galicia


Cruz-Monteagudo et al. • Vol. 28, No. 11 • Journal of Computational Chemistry

by estimating whether the risk to the infant can exceed the benefits of breast feeding.6 Physiologically based pharmacokinetic models that describe the absorption, distribution, and elimination of volatile organic compounds in animals and humans make frequent use of blood/air, saline/air, olive oil/air, and tissue/blood partition coefficients to derive metabolic rate constants. Equally, the solubility of a volatile organic chemical in blood, expressed as the blood/air partition coefficient, is one of the most important physicochemical properties for understanding the pharmacokinetics of organic solvents.7 In ecosystem management, there is a need to establish scientifically credible risk assessments for chemical stressors. Aquatic toxicity tests are used to evaluate the potential toxicological effects of chemicals on aquatic organisms.8 In addition to their toxicological features, the bioconcentration factor (BCF) constitutes an important parameter for establishing the hazard potential of a chemical.9 Another interesting environmental partition system is the snow surface/air partition coefficient. To date, only a few attempts have been made to include the compartments snow or ice in environmental fate models.10–16 One of the major uncertainties in these models concerns the sorption from the air (interstitial or atmospheric) to the snow or ice surface.17 The importance of this interface in the environment has been demonstrated,12,14,15,18 and the need for a better understanding of interfacial sorption has been pointed out.11,16,19,20

As illustrated in the paragraphs above, assessing this kind of value is important in most fields of science. However, experimental determination of these properties is, on many occasions, expensive and restricted to the goals of specific studies. Consequently, the availability of this kind of data is limited.
In this sense, the use of computational tools represents a very attractive and practical option for generating reliable predictions of these values based on the experimental data available. Specifically, quantitative structure–property/activity relationship (QSPR/QSAR) studies have been efficiently applied in several areas. Marrero-Ponce et al. have successfully conducted several QSPR/QSAR studies.21,22 Helguera et al. recently published a quantitative structure–toxicity relationship (QSTR) study applied to the prediction of rodent carcinogenicity.23 Several works have been conducted in the specific area of modeling chemical partition behavior for diverse systems,7,24–26 as treated in this work. The aforementioned literature represents a very small fraction of the work conducted in this area, which demonstrates the utility and success of QSAR methods in environmental and life sciences. Despite their success in predicting chemical partition properties, the molecular descriptors used in these models only codify structural information; consequently, they are restricted to the partition system under study, and the resulting predictions do not go beyond their field of application. In a previous work, our group attempted to overcome these limitations.27 That work principally considered systems of biological significance, whereas systems of environmental significance were poorly represented.

The present article introduces a new methodology, implemented in the Markovian Chemicals In Silico Design (MARCH-INSIDE) software*,28 that is able to correlate more than one partition system in a single classification model. In addition to this "multisystem correlation property," the free energy-based stochastic indices used in this work encompass information concerning the partition system and biological species, besides molecular structural information. The stochastic partition free energy (ΔGk) offers a direct interpretation of predictions in terms of free energy. This descriptor has been successfully applied by our group in the prediction of multiple drug side-effects.29,30 Together, these features make the MARCH-INSIDE methodology a computational chemistry tool with evident advantages over most of the QSPR/QSAR methods developed to date, with multiple applications in medicinal chemistry and environmental sciences. Specifically, in the current work, we developed a unified thermodynamic Markov model able to classify, through a single equation, a structurally diverse data set of 1300 chemicals and drugs included in 38 biphasic partition systems according to their partition behavior.

Methods

Computer Software

The calculation of the molecular descriptors was implemented in the in-house software MARCH-INSIDE.28 This software has a graphical interface that makes it user-friendly for medicinal chemists.

Data Base

In total, we collected 1300 compounds including chemicals, drugs, and environmental pollutants. The compounds were grouped in 38 different biological and environmental biphasic partition systems (BPS) measured in several biological species (human, rat, fish, rabbit). Table 1 shows some system atom partition coefficients, the system break point (SBP), the number of compounds in each system, and the references consulted to collect the data for each of the 38 systems under study. Unfortunately, the data come from highly diverse sources5,7,9,31–44 and were determined by very different methods with different errors. So, instead of directly predicting partition coefficients, it may be more consistent in terms of accuracy to classify molecules into two groups for a specific BPS: partition-like (PL), when the experimental log p value is > SBP, and partition-unlike (PUL) otherwise. For the sake of simplicity, all the partition coefficients were written in such a way that partition phenomena are studied from a more organized system to a less organized one, that is, from water to air, or from tissue to plasma.27 Names, BPS, data distribution for statistical analysis, experimental log p values in the specific BPS, and the SBP of PL/PUL appear in the supporting information material. SBP is determined here as SBP = (log p valuemax + log p valuemin)/2, where log p valuemax and log p valuemin are the highest and lowest partition coefficient values in the system studied.

*This is a preliminary experimental version. A future professional version shall be available to the public. For any information about it, send an e-mail to the corresponding author at [email protected] or [email protected].

Table 1. Some System Atom Partition Coefficients, System Break Point (SBP), Number of Compounds (N) on Each System, and References Consulted to Collect the Data of Compounds of the Systems Under Study.

System | Ref. | N | SBP^a | Γ(C) | Γ(N) | Γ(O) | Γ(S) | Γ(H) | Γ(Cl)
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Olive oil/air^b | 7 | 109 | 1.75 | 9.98 | 1.44 | 3.16 | 1.78 | 12.79 | 5
Saline/air^b | 7 | 100 | 0.04 | 1.48 | 1.44 | 23.5 | 2 | 1.71 | 0.95
Soil/air^b | 31 | 118 | 2.82 | 1.99 | 2.81 | 2.14 | 1.08 | 2.2 | 0.61
Snow surface/air^b | 39 | 40 | 3.28 | 0.49 | 1.44 | 1.22 | 1.78 | 0.43 | 1.56
Active carbon/air^b | 38 | 12 | 1.34 | 2.19 | 1.44 | 3 | 1.78 | 2.14 | 0.8
Vegetable oil/water^b | 34 | 53 | 0.83 | 1.76 | 0.92 | 0.81 | 1.25 | 1.68 | 1.56
Water/air^b | 34 | 39 | 5.99 | 0.66 | 4 | 3.75 | 0.33 | 0.71 | 1.56
Water/dimyristoyl phosphatidyl choline^b | 44 | 43 | 3.39 | 0.54 | 5 | 5.5 | 1.78 | 1.22 | 0.01
Water/isopropyl miristate^b | 42 | 99 | 1.23 | 1.16 | 0.3 | 0.68 | 0.63 | 1.21 | 1.56
CF3C6F11/CH3C6H5^b | 43 | 77 | 0.06 | 1.11 | 1.29 | 0.73 | 0.57 | 0.72 | 1.56
BMC retention on 0.04 M Brij35^b | 9 | 20 | 0.62 | 1.96 | 1.6 | 1.78 | 2 | 1.74 | 1.56
Fish/water^c | 9 | 43 | 0.87 | 0.28 | 0.35 | 0.53 | 0.2 | 0.25 | 0.31
Adipose tissue/air^d | 32 | 9 | 2.41 | 1.5 | 1.44 | 0.33 | 1.78 | 1.1 | 3
Blood/air^d | 32 | 16 | 1.62 | 0.7 | 1.44 | 7 | 1.78 | 0.81 | 1.56
Liver/air^d | 32 | 7 | 1.17 | 0.67 | 1.44 | 3.16 | 1.78 | 0.33 | 3
Muscle/air^d | 32 | 16 | 1.9 | 0.5 | 1.44 | 7 | 1.78 | 0.69 | 1.56
Skin/air^d | 37 | 63 | 2.68 | 0.8 | 1.38 | 1.55 | 1.78 | 0.92 | 1.56
Breast milk/plasma^d | 5 | 91 | 0.73 | 3.63 | 3.63 | 2.34 | 1.07 | 4.09 | 5.75
Adipose tissue/air^e | 33, 35, 36, 40 | 12 | 2.61 | 0.7 | 1.44 | 0.33 | 1.78 | 0.44 | 1.56
Blood/air^e | 7, 33, 35, 36, 40 | 87 | 0.8 | 1.96 | 1.6 | 1.78 | 2 | 1.74 | 1.56
Liver/air^e | 33, 35, 36, 40 | 16 | 2.1 | 0.7 | 1.44 | 7 | 1.78 | 0.81 | 1.56
Muscle/air^e | 33, 35, 36, 40 | 17 | 1.81 | 0.42 | 1.44 | 7 | 1.78 | 0.63 | 1.56
Brain/plasma^e | 33, 35, 36, 40 | 15 | 0.2 | 1.09 | 1.24 | 4.71 | 1 | 0.95 | 0.67
Heart/plasma^e | 33, 35, 36, 40 | 20 | 0.38 | 0.66 | 0.6 | 1.12 | 8 | 0.58 | 2
Intestine/plasma^e | 33, 35, 36, 40 | 11 | 0.06 | 1.1 | 1.56 | 1.9 | 4 | 0.99 | 1.56
Lung/plasma^e | 33, 35, 36, 40 | 20 | 0.26 | 2.94 | 3.38 | 6.73 | 1.78 | 2.8 | 2
Muscle/plasma^e | 33, 35, 36, 40 | 34 | 0.27 | 0.59 | 0.58 | 0.92 | 4 | 0.55 | 0.22
Skin/Plasma^e | 33, 35, 36, 40 | 11 | 0.08 | 1.75 | 1.41 | 3.67 | 6 | 1.58 | 1.56
Spleen/plasma^e | 33, 35, 36, 40 | 8 | 0.3 | 4.7 | 2 | 4.83 | 1.78 | 4.19 | 1.56
Bone/plasma^f | 33, 35, 36, 40 | 8 | 0.01 | 0.29 | 0.24 | 0.57 | 0.5 | 0.27 | 0.67
Brain/plasma^f | 33, 35, 36, 40 | 13 | 0.55 | 0.44 | 0.5 | 2.83 | 1.78 | 0.45 | 1.56
Heart/plasma^f | 33, 35, 36, 40 | 12 | 0.61 | 0.74 | 0.81 | 4.4 | 1 | 0.77 | 0.25
Lung/plasma^f | 33, 35, 36, 40 | 14 | 0.81 | 0.48 | 0.5 | 3 | 0.33 | 0.56 | 1.56
Muscle/plasma^f | 33, 35, 36, 40 | 12 | 0.22 | 0.42 | 0.38 | 2 | 0.33 | 0.53 | 1.56
Skin/plasma^f | 33, 35, 36, 40 | 8 | 0.44 | 0.45 | 0.5 | 1.75 | 0.5 | 0.36 | 0.67
Vitreous humor/epithelium-plus-stroma^f | 41 | 5 | 4.3 | 1.91 | 1.75 | 1.14 | 1.78 | 1.58 | 1.56
Vitreous humor/stroma^f | 41 | 5 | 3.42 | 0.48 | 0.83 | 0.25 | 1.78 | 0.37 | 1.56
Vitreous humor/whole cornea^f | 41 | 17 | 4.14 | 2.08 | 1.67 | 1.52 | 1.78 | 1.95 | 1.56

Γ(C) to Γ(Cl): system atom partition coefficients (Γ) of the corresponding atom.
a SBP = (log p valuemax + log p valuemin)/2. log p valuemax is the highest partition coefficient value for the specific biphasic partition system and log p valuemin is the lowest value. PL = log p value > SBP; PUL otherwise.
b Abiotic.
c Measured on fish.
d Measured on human.
e Measured on rat.
f Measured on rabbit.
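The SBP rule and the PL/PUL labelling described in the Data Base section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the log p values are hypothetical and are not taken from the data set or from MARCH-INSIDE.

```python
def system_break_point(log_p_values):
    """SBP = (log p value_max + log p value_min) / 2 for one partition system."""
    return (max(log_p_values) + min(log_p_values)) / 2.0


def classify(log_p_values):
    """Label each compound PL when its log p value is > SBP, PUL otherwise."""
    sbp = system_break_point(log_p_values)
    return ["PL" if value > sbp else "PUL" for value in log_p_values]


# Hypothetical log p values for the compounds of a single biphasic system.
example_log_p = [0.5, 1.0, 2.0, 3.0]

print(system_break_point(example_log_p))  # 1.75
print(classify(example_log_p))            # ['PUL', 'PUL', 'PL', 'PL']
```

Because the SBP is computed per system, the same compound may be PL in one system and PUL in another; a compound whose log p value equals the SBP is labelled PUL under the strict inequality used here.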

LDA Classification Model

Using the MARCH-INSIDE methodology, as defined previously, we can attempt to develop a simple linear QSPR with the general formula:

$$ PL = b + b_{S_1,1}\,\Delta G_1(S_1) + b_{S_2,2}\,\Delta G_2(S_2) + \cdots + b_{S_x,k}\,\Delta G_k(S_x) = b + \sum_{S_x,k} b_{S_x,k}\,\Delta G_k(S_x) \qquad (1) $$

Here, the ΔGk values act as the molecular descriptors. We selected linear discriminant analysis (LDA)45 to fit the discriminant function. The model classifies molecules into two general groups;


namely partition-like (PL = 1, when the partition coefficient > SBP) and partition-unlike compounds (PL = −1, otherwise) for a specific system. In eq. (1), bSx,k represents the coefficients of the classification function, determined by the least squares method as implemented in the LDA module of the STATISTICA 6.0 software package.46 Sx represents a specific group of atoms in the molecule. When Sx contains all the atoms in the molecule, ΔGk(Sx) becomes a global molecular index, which we write as ΔGk(T). We can calculate different families of molecular descriptors by selecting different Sx conditions; for this purpose, atoms were grouped in sets or classes (sx) that describe local aspects of molecular structure: s0 = CSat = saturated carbon atoms; s1 = CInst = unsaturated carbon atoms; s2 = Hal = halogens; s3 = Het = heteroatoms; and s4 = HX = hydrogen atoms bonded to a heteroatom. All the variables to be explored were standardized.47 Forward stepwise, forward entry, backward stepwise, backward removal, and best subset methods were used for variable selection.45,48 The quality of the LDA models was determined by examining Wilks' U statistic, the Fisher ratio (F), the squared Mahalanobis distance (D2), the p-level (p), the eigenvalues, and the canonical regression coefficient (RCAN). We also inspected the percentage of good classification, the cases/variables ratio, and the number of variables to be explored to avoid overfitting or chance correlation.45,49 Randić's orthogonalization procedure was carried out to avoid collinearity.50–54 Resubstitution of cases in four external predicting series was used to test the predictability of the classification models.29,30,47 First, we selected at random 75% of the compounds for the training set and the rest for the predicting set. We call this experiment validation set 1. Afterwards, compounds in the predicting series were iteratively interchanged with those in the training series (validation set 2, validation set 3, validation set 4). We report accuracy for the training, predicting, and full sets as well as averaged results.29,30,47 Finally, we carried out a desirability analysis. Using the Profiler option implemented in the general discriminant analysis (GDA) module, one can specify desirability functions for the dependent variables and search for the levels of the independent variables that produce the most desirable responses on the dependent variables.46

PL/PUL Classification Based on Artificial Neural Networks and Genetic Algorithms Regression Models

The ability of artificial neural networks (ANN) to capture nonlinear relationships has been widely reported in the literature. The work of Fernandez and Caballero illustrates the efficacy of this method well.55–59 Accordingly, ANN60,61 classification models were also developed using the variables included in the best LDA model found. The ANN models were built using five different network architectures [three-layer perceptrons (MLP3), four-layer perceptrons (MLP4), linear network training, probabilistic neural network (PNN), and radial basis function (RBF)] included in the intelligent problem solver analysis implemented in the neural network module of the STATISTICA 6.0 software package.46 On the other hand, the compounds in the data set were also classified by using a regression model with the same general formula as eq. (1), with log p as the dependent variable, classifying the compounds according to the model predictions. If the predicted log p is > SBP, the compound is classified as PL; otherwise, it is classified as PUL. The most significant variables were identified from the data set using the genetic algorithm (GA) method.62 The GA settings were a linear polynomial equation term type, a mutation probability of 30%, an equation length of seven terms, and a constant population size of 300.

Results and Discussion

Unified Markov Model

The MARCH-INSIDE approach (Markovian chemicals in silico design) can be used to calculate different molecular properties, including kinetic constants and thermodynamic free energies ΔGk63 for drug partition processes:27,29,30,64–85

$$ \Delta G_k = -RT \log\left({}^{k}\Gamma\right) \qquad (2) $$

where R represents the universal gas constant and T is the absolute temperature of the system. Here, $^{k}\Gamma$ represents the average molecular kinetic constant characterizing the extent of partition at time tk = k. It can be defined as the sum, over every jth atom in the molecule, of the initial (k = 0) kinetic constant $^{0}\Gamma(j,ps,bs)$ multiplied by the average probability $^{A}p_k(j,ps,bs)$ of the atom undergoing partition at time tk = k. Every $^{0}\Gamma(j,ps,bs)$ depends on the specific partition system (ps) and, if applicable, on the given biological species (bs):

$$ {}^{k}\Gamma = \sum_{j=1}^{n} {}^{A}p_k(j,ps,bs)\;{}^{0}\Gamma(j,ps,bs) \qquad (3) $$

These probabilities introduce the idea of decomposing the drug partition process in a step-by-step manner, passing one atom from one phase to the other in each step at discrete intervals of time tk. Figure 1 illustrates, for a simple molecule, several alternative ways in which this process may take place. One may easily calculate the probabilities $^{A}p_k(j,ps,bs)$ using the so-called Chapman–Kolmogorov equations. First, we express the sum in eq. (3) as a vector–vector product $\mathbf{A}'_k \cdot {}^{0}\boldsymbol{\Gamma}$, where $\mathbf{A}'_k$ and ${}^{0}\boldsymbol{\Gamma}$ are vectors listing the elements $^{A}p_k(j,ps,bs)$ and $^{0}\Gamma(j,ps,bs)$, respectively. Afterwards, one may decompose $\mathbf{A}'_k$ as a vector–matrix product $\mathbf{A}'_0 \cdot {}^{k}\mathbf{P}$, where $\mathbf{A}'_0$ is a vector listing the initial absolute probabilities of each atom to undergo partition and ${}^{k}\mathbf{P}$ are the natural powers of ${}^{1}\mathbf{P}$, which is the atom–atom partition probability matrix:

$$ {}^{k}\Gamma = \sum_{j=1}^{n} {}^{A}p_k(j,ps,bs)\;{}^{0}\Gamma(j,ps,bs) = \mathbf{A}'_k \cdot {}^{0}\boldsymbol{\Gamma} = \mathbf{A}'_0 \cdot \left({}^{1}\mathbf{P}\right)^{k} \cdot {}^{0}\boldsymbol{\Gamma} \qquad (4) $$


Figure 1. Stochastic step-by-step partition process. [Color ﬁgure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

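$$ {}^{k}\Gamma = \begin{bmatrix} {}^{0}p_1 & {}^{0}p_2 & \cdots & {}^{0}p_n \end{bmatrix} \cdot \begin{bmatrix} {}^{1}p_{11} & {}^{1}p_{12} & \cdots & {}^{1}p_{1n} \\ {}^{1}p_{21} & \ddots & & \vdots \\ \vdots & & {}^{1}p_{ij} & \vdots \\ {}^{1}p_{n1} & \cdots & \cdots & {}^{1}p_{nn} \end{bmatrix}^{k} \cdot \begin{bmatrix} {}^{0}\Gamma_1 \\ {}^{0}\Gamma_2 \\ \vdots \\ {}^{0}\Gamma_n \end{bmatrix} \qquad (5) $$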

These important equations allow us to calculate the change of the partition coefficient with time, $^{k}\Gamma$, considering only the initial kinetic constants ($^{0}\Gamma(j,ps,bs)$), the initial absolute probabilities ($^{0}p(j,ps,bs)$), and the conditional atom–atom probabilities $^{1}p_{ij}$ of the jth atom in the molecule to undergo partition. Fortunately, these elements can be calculated in MARCH-INSIDE theory with simple equations:

$$ {}^{0}\Gamma(j,ps,bs) = \frac{n_{PL}(j,ps,bs)}{n_{PUL}(j,ps,bs)} = e^{-\,{}^{0}g(j,ps,bs)/RT} \qquad (6) $$

$$ {}^{0}p(j,ps,bs) = \frac{{}^{0}\Gamma(j,ps,bs)}{\sum_{j=1}^{m} {}^{0}\Gamma(j,ps,bs)} \qquad (7) $$

$$ {}^{1}p(ij,ps,bs) = \frac{\alpha_{ij}\;{}^{1}\Gamma(ij,ps,bs)}{\sum_{m=1}^{\delta+1} \alpha_{im}\;{}^{1}\Gamma(im,ps,bs)} \qquad (8) $$

where $n_{PL}(j,ps,bs)$/$n_{PUL}(j,ps,bs)$ are the numbers of jth atoms in PL/PUL molecules, $\delta$ is the valence of the jth atom, m represents all the atoms in the molecule including the jth atom, and $\alpha_{ij}$ represents the adjacency relationship between the ith and jth atoms (if the ith and jth atoms are adjacent, then $\alpha_{ij}$ = 1; otherwise $\alpha_{ij}$ = 0). Figure 2 illustrates the matrix calculation for a simple molecule.
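The chain of eqs. (2) to (8) can be followed numerically with a small sketch. It is not the MARCH-INSIDE implementation; it rests on several assumptions that the text does not fix: each atom is treated as adjacent to itself, so that a row of the transition matrix has δ + 1 non-zero entries; the weight of atom j in eq. (8) is taken to be its atom partition coefficient; "log" is read as the natural logarithm; T = 298.15 K; and the molecule is a hypothetical hydrogen-suppressed C–C–O chain using the olive oil/air carbon and oxygen values listed in Table 1. All function names are illustrative.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol K)


def initial_probabilities(gamma0):
    """Eq. (7): initial absolute probability of each atom."""
    total = sum(gamma0)
    return [g / total for g in gamma0]


def transition_matrix(adjacency, gamma0):
    """Eq. (8), assuming self-adjacency and atomic coefficients as weights."""
    n = len(gamma0)
    matrix = []
    for i in range(n):
        weights = [gamma0[j] if (i == j or adjacency[i][j]) else 0.0
                   for j in range(n)]
        row_total = sum(weights)
        matrix.append([w / row_total for w in weights])
    return matrix


def k_gamma(adjacency, gamma0, k):
    """Eqs. (4)/(5): A'_0 . (1P)^k . 0Gamma, by repeated vector-matrix products."""
    n = len(gamma0)
    probs = initial_probabilities(gamma0)
    matrix = transition_matrix(adjacency, gamma0)
    for _ in range(k):
        probs = [sum(probs[i] * matrix[i][j] for i in range(n))
                 for j in range(n)]
    return sum(p * g for p, g in zip(probs, gamma0))


def delta_g(adjacency, gamma0, k, temperature=298.15):
    """Eq. (2), with log taken as the natural logarithm (assumption)."""
    return -GAS_CONSTANT * temperature * math.log(k_gamma(adjacency, gamma0, k))


# Hypothetical C-C-O heavy-atom chain; 9.98 (C) and 3.16 (O) are the
# olive oil/air atom partition coefficients listed in Table 1.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
gamma0 = [9.98, 9.98, 3.16]

for k in (0, 1, 5):
    print(k, k_gamma(adjacency, gamma0, k), delta_g(adjacency, gamma0, k))
```

Because every probability vector sums to one, the value returned by `k_gamma` is always a weighted average of the atomic coefficients, so it stays between the smallest and largest entry of `gamma0` for any k.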

A particular advantage of using average molecular kinetic constants (kΓ) characterizing the extent of partition at time tk = k to derive thermodynamic partition free energies (ΔGk) lies in the simplicity of the solution offered to complicated chemistry problems usually addressed by means of molecular dynamics and Monte Carlo simulations.86,87 Because, in general, molecular systems consist of a large number of particles, it is impossible to find the properties of such complex systems analytically. Molecular dynamics simulation (MDS) circumvents this problem by using numerical methods. It represents an interface between laboratory experiments and theory and can be understood as a virtual experiment.88 On the other hand, Monte Carlo simulation (MCS) is distinguished from MDS by being stochastic, that is, nondeterministic in some manner, usually by using random numbers (or, more often, pseudo-random numbers), as opposed to the deterministic algorithms used in MDS.89 A common limitation of MDS and MCS is that their design must account for the available computational power. Simulation size, time step, and total time duration must be selected so that the calculation can finish within a reasonable time period. However, the simulations should be long enough to be relevant to the time scales of the natural processes being studied. Such simulations usually require a very high computational cost and time.90 In contrast, by using ΔGk, it is possible to encode information related not only to the molecular structure but also to other features involved in the phenomenon under study, at very low computational cost and time. The analysis carried out from now on illustrates this ability, which we have called the "multisystem correlation property."

In Silico Prediction of Distribution Profile: LDA Classification Model

Several variable selection methods were applied to ﬁnd the best LDA equation able to discriminate between PL and PUL


Figure 2. Definition and calculation of the 1P matrix for a specific compound. The element symbol is used to denote the value of the system atom partition coefficient (Γ) [i.e., C represents the atom partition coefficient (ΓC) of the carbon atom for the specific biphasic partition system and biological species].

compounds.45,49 The best subset variable selection method proved to be the best selection method for our data set. The model obtained by this method correctly classified the same average percentage of compounds in the training set as the "backward" models (91.79%), but with two fewer variables (a four-parameter equation) (Table 2). The average percentage of correctly classified compounds in the predicting series for this best subset model was very close to the averages obtained with the aforementioned "backward" models (backward models = 89.85%; best subset model = 88.92%), despite having only four parameters in the discriminant equation. These results support the predictive ability of the five models obtained, and specifically that of the best subset model. Having selected the best subset model [eq. (9)] as the most predictive and statistically significant LDA equation, it is worthwhile to

Table 2. Summary of the Best LDA Models Assessed by Five Different Variable Selection Methods.

Variables

Model | B0 | ΔG5(T) | ΔG1(CSat) | ΔG5(HX) | ΔG0(T) | ΔG0(HX) | ΔG0(Hal) | ΔG0(CSat)
--- | --- | --- | --- | --- | --- | --- | --- | ---
1^a | 0.36 | 3.10 | 0.89 | 1.01 | 1.54 | – | – | –
2 | 0.35 | 3.80 | 1.56 | 0.94 | – | – | – | 1.41
3 | 0.35 | 3.80 | 1.56 | 0.94 | – | – | – | 1.41
4 | 0.38 | 3.16 | 1.04 | 1.47 | 1.22 | 0.94 | 0.40 | –
5 | 0.38 | 3.16 | 1.04 | 1.47 | 1.22 | 0.94 | 0.40 | –

Statistics

Model | N | D2 | F | p | U | RCAN | Eigenvalue | %TTrain | %TPred.
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1^a | 1300 | 5.88 | 355.32 | 0.00 | 0.41 | 0.7709 | 1.4652 | 91.79 | 88.92
2 | 1300 | 5.84 | 352.78 | 0.00 | 0.41 | 0.7698 | 1.4548 | 89.85 | 86.77
3 | 1300 | 5.84 | 352.78 | 0.00 | 0.41 | 0.7698 | 1.4548 | 89.85 | 86.77
4 | 1300 | 6.42 | 258.09 | 0.00 | 0.38 | 0.7844 | 1.5997 | 91.79 | 89.85
5 | 1300 | 6.42 | 258.09 | 0.00 | 0.38 | 0.7844 | 1.5997 | 91.79 | 89.85

1: Best Subset Model; 2: Forward Stepwise Model; 3: Forward Entry Model; 4: Backward Stepwise Model; 5: Backward Removal Model; %TTrain: Global average of compounds correctly classified in training set; %TPred.: Global average of compounds correctly classified in external predicting set.
a Best LDA model according to the parsimony principle (Occam's Razor).


Table 3. Average of Correctly Classified Compounds and Resubstitution Validation in Four Different External Predicting Sets Using Best Subset Model.

Set | %PL | %PUL | %Total
--- | --- | --- | ---
Validation Set 1 | | |
Full Set (1300, 692, 608)^a | 91.33 | 90.79 | 91.08
Train Set (975, 522, 453)^a | 91.95 | 91.61 | 91.79
Predicting Set (325, 170, 155)^a | 89.41 | 88.39 | 88.92
Validation Set 2 | | |
Full Set (1300, 692, 608)^a | 91.33 | 91.12 | 91.23
Train Set (975, 518, 457)^a | 90.93 | 91.25 | 91.08
Predicting Set (325, 174, 151)^a | 92.53 | 90.73 | 91.69
Validation Set 3 | | |
Full Set (1300, 692, 608)^a | 91.18 | 91.78 | 91.46
Train Set (975, 517, 458)^a | 91.88 | 91.27 | 91.59
Predicting Set (325, 175, 150)^a | 89.14 | 93.33 | 91.08
Validation Set 4 | | |
Full Set (1300, 692, 608)^a | 90.90 | 92.60 | 91.69
Train Set (975, 519, 456)^a | 89.98 | 92.32 | 91.08
Predicting Set (325, 173, 152)^a | 93.64 | 93.42 | 93.54
Validation Averaged Results | | |
Full Set | 91.18 | 91.57 | 91.37
Train Set | 91.18 | 91.61 | 91.38
Predicting Set | 91.18 | 91.47 | 91.31

Statistics: N = 1300; D2 = 5.88; F = 355.32; P = 0.0000; U = 0.4100; RCAN = 0.7709; Eigenvalue = 1.4652

a Total number of compounds, number of PL compounds, number of PUL compounds.

offer a more detailed validation of its statistical significance and predictability.

$$ PL = 0.36 + 3.1\,\Delta G_5(T) - 1.01\,\Delta G_5(HX) + 1.54\,\Delta G_0(T) - 0.89\,\Delta G_1(CSat) \qquad (9) $$

In this sense, we decided to report, besides the results reached in the training and predicting sets, the averaged results of the four validation sets used here as a resubstitution technique to test the predictive ability of the model. As can be seen, the values in all validation sets remain stable around a global averaged value of correctly classified compounds of 91.38%/91.31%/91.37% in the training/predicting/full sets. Results for the training, predicting, and full sets in the four different validation sets are shown in Table 3.

Parametrical assumptions (normality, homoscedasticity or homogeneity of variances, and noncollinearity) as well as the correct specification of the mathematical form of the model are very important aspects in the application of multivariate statistical techniques to QSPR.49,91 The validity and statistical significance of any model are strongly conditioned by these factors. Details on these parametrical assumptions can be found in refs. 49 and 91. In our case, the mathematical form of the model was chosen to be linear because, in the absence of prior information, this is the simplest mathematical form to assume. In support of this choice, visual examination of the distribution of the residuals for the 1300 cases (residuals against cases) did not show any characteristic pattern.92
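The resubstitution scheme behind the four validation sets can be sketched as follows. This is one plausible reading of the description in the Methods, not the authors' exact procedure: the compounds are shuffled once, cut into four blocks of equal size, and each block serves once as the external predicting set (25%) while the remaining blocks form the training set (75%). The function name and the random seed are hypothetical.

```python
import random


def rotating_validation_sets(n_compounds, n_sets=4, seed=0):
    """Return (training, predicting) index lists for each validation set."""
    indices = list(range(n_compounds))
    random.Random(seed).shuffle(indices)

    block_size = n_compounds // n_sets
    blocks = [indices[i * block_size:(i + 1) * block_size]
              for i in range(n_sets)]
    # Any remainder left by the integer division goes into the last block.
    blocks[-1].extend(indices[n_sets * block_size:])

    splits = []
    for held_out in range(n_sets):
        predicting = blocks[held_out]
        training = [index
                    for number, block in enumerate(blocks)
                    if number != held_out
                    for index in block]
        splits.append((training, predicting))
    return splits


splits = rotating_validation_sets(1300)
for number, (training, predicting) in enumerate(splits, start=1):
    # Each line reports train=975 and predict=325, the sizes given in Table 3.
    print(f"validation set {number}: train={len(training)}, predict={len(predicting)}")
```

With 1300 compounds this reproduces the 975/325 split sizes reported in Table 3; the PL/PUL composition of each set depends on the random draw and is not reproduced here.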

With respect to normality, in our case, not all the variables included in the equation exhibit adequate values of skewness and kurtosis,91,93 which is a sign of deviation from the normal distribution. In addition, the most significant tests of normality, Kolmogorov–Smirnov (d), Shapiro–Wilk (W), and chi-square (χ2),47 also rejected the hypothesis of normal distribution of the explored variables (see details in supporting information material). However, the variables look normally distributed on visual examination of the frequency distribution histograms (see supporting information material). These results indicate only a slight deviation from the normal distribution. So, the predictive ability and the inferences reached should not be affected, considering the robustness of multivariate statistical techniques.49,91 As can be noted in the scatter plots of the four predictive variables against their respective squared residuals (see supporting information material), no systematic pattern is observed, which indicates that the homoscedasticity assumption is fulfilled.91 As was detailed in the above paragraphs, collinearity of the variables within the model is an undesirable property, which complicates model interpretation.47,94 A very simple way to inspect the collinearity of our variables is to check the resulting correlation matrix. If the correlation coefficients (R) between pairs of variables are under 0.8, then multicollinearity is not a serious problem in the model.91 In our case, the highest R value (0.798) is very close to 0.8, but the rest are between 0.678 and 0.225 (see correlation matrix in supporting information material).
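The pairwise correlation check described above can be illustrated with a short sketch. The descriptor values below are invented for illustration only (they are not the values of the 1300 compounds), and the function names are hypothetical; the 0.8 cut-off is the one quoted in the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(descriptors, threshold=0.8):
    """List descriptor pairs whose |R| reaches the collinearity threshold."""
    names = list(descriptors)
    flagged = []
    for position, first in enumerate(names):
        for second in names[position + 1:]:
            r = pearson_r(descriptors[first], descriptors[second])
            if abs(r) >= threshold:
                flagged.append((first, second, r))
    return flagged


# Invented descriptor columns, one value per compound.
descriptors = {
    "dG5(T)": [1.0, 2.0, 3.0, 4.0, 5.0],
    "dG0(T)": [2.1, 3.9, 6.2, 8.1, 9.8],
    "dG5(HX)": [5.0, 1.0, 4.0, 2.0, 3.0],
}

for first, second, r in collinear_pairs(descriptors):
    print(f"{first} vs {second}: R = {r:.3f}")
```

In this toy data only the first two columns are flagged, because they were constructed to be almost proportional; pairs below the threshold are simply not reported.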


Table 4. Summary of the Best Neural Network Models.

Neural network model | Profile^a | Train perf.^b | Select perf.^c | Test perf.^d | Correct (%) PUL^f | Wrong (%) PUL^g | Correct (%) PL^h | Wrong (%) PL^i
--- | --- | --- | --- | --- | --- | --- | --- | ---
Three-layer perceptrons (MLP3)^e | 4:4-7-1:1 | 94.23 | 93.08 | 93.46 | 93.91 | 6.09 | 93.79 | 6.21
Four-layer perceptrons (MLP4) | 4:4-10-7-1:1 | 93.46 | 92.69 | 93.08 | 93.26 | 6.74 | 93.21 | 6.79
Linear network training | 4:1:1 | 91.28 | 92.69 | 89.23 | 91.12 | 8.88 | 91.19 | 8.82
Probabilistic neural network (PNN) | 4:4-780-2-2:1 | 78.59 | 79.62 | 76.54 | 56.09 | 43.91 | 97.98 | 2.02
Radial basis function (RBF) | 4:1:1 | 52.44 | 52.31 | 50.38 | 51.97 | 48.03 | 52.02 | 47.98

a Network profile, for instance: 4:4-10-7-1:1 means 4 variables (inputs) in the NN model:4 neurons in the first layer-10 neurons in the second (hidden) layer-7 neurons in the third (hidden) layer-1 output in the fourth layer:1 output.
b Performance of correctly classified compounds in training set.
c Performance of correctly classified compounds in selection set.
d Performance of correctly classified compounds in predicting set.
e Best NN model.
f Average of correctly classified PUL compounds.
g Average of misclassified PUL compounds.
h Average of correctly classified PL compounds.
i Average of misclassified PL compounds.

In any case, we applied the Randić orthogonalization procedure to eliminate any possible collinearity.50–52 Since model interpretation is always attractive, eq. (9) was orthogonalized [eq. (10)]. The coefficient values varied only minimally compared with the nonorthogonalized model [eq. (9)], which confirms the low correlation between the variables.

$$ PL = 0.36 + 3.5\;{}^{1}O_5(T) - 0.86\;{}^{2}O_5(HX) + 1.55\;{}^{3}O_0(T) - 0.89\;{}^{4}O_1(CSat) \qquad (10) $$

ANN and GA Classiﬁcation Models

The best seven-parameter regression equation found through a GA selection method is shown in eq. (11), together with its statistical parameters. The model correctly classified 511 out of 692 PL compounds (73.84%) and 455 out of 608 PUL compounds (74.84%); the total accuracy was 74.31%. These results are reasonably good, but they do not surpass those of the best subset LDA model.

$$\log SPC = 0.218(0.08) + 0.692(0.046)\,G_{1}(T) - 0.232(0.046)\,G_{0}(CSat) + 0.232(0.023)\,G_{5}(Hal) - 0.194(0.03)\,G_{1}(Het) - 19.367(3.057)\,G_{3}(HX) + 41.463(6.633)\,G_{4}(HX) - 22.363(3.606)\,G_{5}(HX) \qquad (11)$$

R = 0.574; R2 = 0.330; F = 90.926; S = 1.92; p < 0.0000

Additionally, several ANN classification models were developed in an attempt to improve on the LDA results. As can be noted in Table 4, the RBF model did not show promising results. However, the remaining network architectures showed satisfactory performances. The best results were achieved with multilayer perceptron (MLP) architectures such as MLP3 (see Table 4 for details). Nevertheless, all ANN models were more complicated than the LDA model and only as accurate. Therefore, having obtained more than 10 classification models, we consider the best subset LDA model [eq. (10)] to be the most parsimonious one, which confirms the linearity of the problem.

After concluding that best subset LDA was the best model, we decided to shed light on the set of conditions (levels of the predictive variables) that increases the partition of the compound. To this end, we used desirability analysis to find the balance of molecular properties that optimizes the overall partition of the final product. According to the desirability graph (Fig. 3), the optimal values should be 2.028, 1.93, 1.1247, and 1.372 for ${}^{1}O_{5}(T)$, ${}^{2}O_{5}(HX)$, ${}^{3}O_{0}(T)$, and ${}^{4}O_{1}(CSat)$, respectively.

Additionally, it is interesting to interpret the effects on overall partition of different combinations of levels of all pairs of independent variables, depicted in Figure 4 as contour plots. This procedure involves transforming the scores on each of the four variables into partition scores that range from 0.0 (represented in green) for undesirable (low partition) to 1.0 (red) for very desirable (high partition). For instance, there is an interaction between the partition free energies of saturated carbon atoms and heteroatom-bound hydrogen atoms [${}^{4}O_{1}(CSat)$ and ${}^{2}O_{5}(HX)$], which indicates that both values should be balanced at low levels to increase partition, regardless of the partition system under study.
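The idea behind the desirability analysis can be sketched as follows. The example assumes a simple linear "larger is better" desirability function and scans a hypothetical grid of levels for two of the four predictors, with the other two held at zero. The coefficients are those read from eq. (10); the grid, the bounds, and all function names are illustrative assumptions rather than the settings used to produce Figures 3 and 4.

```python
def desirability(score, low, high):
    """Map a response onto [0, 1]: 0 at or below `low`, 1 at or above `high`."""
    if score <= low:
        return 0.0
    if score >= high:
        return 1.0
    return (score - low) / (high - low)


def pl_score(o5_hx, o1_csat, o5_t=0.0, o0_t=0.0):
    """Linear score with the coefficients read from eq. (10).

    The two T-based predictors are held at zero by default (an assumption
    made only to keep the scan two-dimensional).
    """
    return 0.36 + 3.5 * o5_t - 0.86 * o5_hx + 1.55 * o0_t - 0.89 * o1_csat


# Hypothetical predictor levels; the real ranges are not restated here.
levels = [-2.0, -1.0, 0.0, 1.0, 2.0]

scores = {(hx, csat): pl_score(hx, csat) for hx in levels for csat in levels}
low, high = min(scores.values()), max(scores.values())

# Desirability surface over the (O5(HX), O1(CSat)) grid.
surface = {pair: desirability(s, low, high) for pair, s in scores.items()}
best_pair = max(surface, key=surface.get)
worst_pair = min(surface, key=surface.get)
```

Under these assumptions the most desirable cell of the grid is the one in which both predictors take their lowest level, which mirrors the interaction described above for the saturated carbon and heteroatom-bound hydrogen terms.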

Checking Physical Coherence of the Proposed Partition Probabilities

In this work, the probability with which the jth atom passes from phase A to phase B is calculated as the ratio between the affinity ($w_j$) of the jth atom for phase B and the sum of these affinities

Journal of Computational Chemistry

DOI 10.1002/jcc

Computational Chemistry Development of a Uniﬁed Free Energy Markov Model


Figure 3. Proﬁles for posterior probabilities and desirability by using eq. (10). [Color ﬁgure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

for all atoms in the molecule. The nature of the parameter $w_j$ is discussed below. To ensure physical coherence, the calculated probabilities have to be consistent with the following properties: (1) the order in which atoms undergo partition (ij or ji); (2) the direction in which the partition proceeds (⇒ or ⇐); (3) invariance to a change of the system of reference for the partition phenomenon, that is, to whether the concentration of the drug is measured in phase A or B (see Table 5 and Fig. 5).27,65,69 These three properties can be explained in detail as follows:

1. It is important to consider the order in which atoms undergo partition, mathematically expressed as the change of $w_j$ with $1/w_j$. Let the partition of a molecule proceed from phase A to B (represented as ⇒) or from phase B to A (⇐). To be physically coherent, the probabilities defined have to be asymmetrical with respect to this factor. That is to


Cruz-Monteagudo et al. • Vol. 28, No. 11 • Journal of Computational Chemistry

Figure 4. Desirability surface/contours plots by using eq. (10). [Color ﬁgure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

say, the probability [${}^{k}p_{ij}(A,\Rightarrow)$] with which the jth atom undergoes partition from A to B (⇒), given that the ith atom has passed, is not necessarily equal to the probability [${}^{k}p_{ji}(A,\Rightarrow)$] of the ith atom passing, given that the jth atom has passed. This is a consequence of the different propensities of the ith and jth atoms to pass from A to B. For instance, the probability [${}^{k}p_{CO}(A,\Rightarrow)$] with which the oxygen atom in carbon monoxide passes from the water phase to the n-octanol phase, given that the carbon atom has passed, is not necessarily equal to the probability [${}^{k}p_{OC}(A,\Rightarrow)$] of the carbon atom passing, given that the oxygen atom has passed. See Table 5 for the relationship between probabilities, for instance (c) ≠ (d), as well as Figure 5 for a graphic illustration.

2. The propensity of the ith atom to pass from A to B (⇒) is different from its propensity to pass from B to A (⇐). So, the probability [${}^{k}p_{ij}(A,\Rightarrow)$] of the jth atom undergoing partition from A to B (⇒) has to be, in general, different from the probability [${}^{k}p_{ij}(A,\Leftarrow)$] of the jth atom undergoing partition from B to A (⇐). For instance, the probability [${}^{k}p_{CO}(A,\Rightarrow)$] with which the oxygen


atom in carbon monoxide passes from the water phase to the n-octanol phase is not necessarily equal to the probability [${}^{k}p_{CO}(A,\Leftarrow)$] of the oxygen atom passing from the n-octanol phase to the water phase. See Table 5 for the relationship between probabilities, for instance, (c) ≠ (e) and (d) ≠ (f).

3. We defined the system of reference in such a way that partition takes place from the more organized phase (A) to the less organized one (B), and we measure the concentration of the drug in A. An experiment measuring the probability of partition in one direction has to give the same result independently of the system of reference. That is, no matter whether we measure the concentration of the drug in A or in B, the partition probability, and hence the partition coefficient, in one direction has to be the same. This is straightforward to verify by replacing $w_j$ with $-w_j$ in the expression of the probability [see Table 5 and note that (a) = (c), (b) = (d), (e) = (g), and (f) = (h)].

As mentioned above, some theoretical modifications have to be made with respect to the previous work.27 In that work, $w_j$ is the atomic standard partition free energy (${}^{1}g_{ij}$) with which the jth atom passes from A to B. This is not coherent, because inverting the sign of ${}^{1}g_{ij}$ physically means inverting the direction of partition, whereas in mathematical terms it indicates a change of the system of reference. See properties (2) and (3) above and Table 5 and Figure 5 for the relationships among the probabilities (a), (c), and (e). Alternatively, we could use as $w_j$ the number of jth atoms (nPL(j,ps,bs)) in partition-like molecules, but in that case the partition probabilities depend on the size of the sample used. Therefore, we chose to estimate $w_j$ as the atomic partition coefficient (${}^{1}G(ij,ps,bs)$). Contrary to ${}^{1}g_{ij}$, the value $-G$ physically means a change of the system of reference and $1/G$ indicates inversion of the direction of partition, in concordance with the mathematical definition (see Table 5 and Fig. 5).

Table 5. Equality and Inequality Relationships Among Different Mathematical Formulations of Atomic Partition Probabilities After Inversion of the Partition Direction, Change of the Reference System, or Change in the Order in Which Atoms Undergo Partition.

(a) ${}^{k}p_{ij}(B,\Rightarrow) = \frac{-w_j}{\sum_m (-w_m)} = \frac{w_j}{\sum_m w_m}$ ≠ (b) ${}^{k}p_{ji}(B,\Rightarrow) = \frac{-w_i}{\sum_m (-w_m)} = \frac{w_i}{\sum_m w_m}$

(c) ${}^{k}p_{ij}(A,\Rightarrow) = \frac{w_j}{\sum_m w_m}$ ≠ (d) ${}^{k}p_{ji}(A,\Rightarrow) = \frac{w_i}{\sum_m w_m}$

(e) ${}^{k}p_{ij}(A,\Leftarrow) = \frac{1/w_j}{\sum_m 1/w_m}$ ≠ (f) ${}^{k}p_{ji}(A,\Leftarrow) = \frac{1/w_i}{\sum_m 1/w_m}$

(g) ${}^{k}p_{ij}(B,\Leftarrow) = \frac{-1/w_j}{\sum_m (-1/w_m)} = \frac{1/w_j}{\sum_m 1/w_m}$ ≠ (h) ${}^{k}p_{ji}(B,\Leftarrow) = \frac{-1/w_i}{\sum_m (-1/w_m)} = \frac{1/w_i}{\sum_m 1/w_m}$

(c) ≠ (e); (d) ≠ (f)
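The relationships collected in Table 5 can be checked numerically with a short sketch. The affinities assigned to the carbon and oxygen atoms of carbon monoxide are hypothetical numbers, and `partition_probabilities` is an illustrative helper, not part of the MARCH-INSIDE software.

```python
import math


def partition_probabilities(affinities):
    """Return w_j / sum(w_m) for every atom j."""
    total = sum(affinities.values())
    return {atom: w / total for atom, w in affinities.items()}


# Hypothetical affinities of the two atoms of CO for phase B.
w = {"C": 2.0, "O": 0.5}

# Direction A => B, concentration measured in A: forms (c) and (d).
forward = partition_probabilities(w)
# Same direction, reference system changed (w -> -w): forms (a) and (b).
other_reference = partition_probabilities({a: -v for a, v in w.items()})
# Direction inverted (w -> 1/w): forms (e) and (f).
reverse = partition_probabilities({a: 1.0 / v for a, v in w.items()})

# Property 1: the order of the atoms matters, (c) != (d).
order_matters = not math.isclose(forward["O"], forward["C"])
# Property 2: inverting the direction changes the probability, (c) != (e).
direction_matters = not math.isclose(forward["O"], reverse["O"])
# Property 3: changing the reference system does not, (a) == (c).
reference_invariant = math.isclose(forward["O"], other_reference["O"])
```

With these numbers all three flags are true: the sign change cancels between numerator and denominator, whereas the reciprocal does not.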

Having discussed the physical coherence of this approach in theoretical terms, we carried out a final experiment to verify it. As mentioned before, we defined the system of reference in such a way that partition takes place from the more organized phase (A) to the less organized one (B). In the present work, we demonstrate property (2) for the first time in a QSAR study. To demonstrate the stability of our model to the inversion of the partition direction, we selected one of the five LDA models proposed in Table 2, specifically the LDA forward stepwise model (model 2). We then changed, directly in the QSAR equation, the sign of the molecular partition energies and the PL/PUL classification of the drugs for a given partition system, and recalculated the coefficients of the equation. This process was applied, one at a time, to each of the 38 partition systems. As can be noted in Table 6, the coefficients vary only slightly. This test provides evidence that our models correctly classified 89.08% of compounds, with an uncertainty of only ±0.17%, independently of the direction of the partition process used to seek the model.
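The summary statistics of this experiment can be reproduced from the %T column of Table 6. The sketch below uses only the 38 accuracies listed there; the variable names are illustrative.

```python
import statistics

# %T obtained after reverting each of the 38 partition systems (Table 6).
accuracies = (
    [89.08] * 10
    + [89.15, 89.15, 89.15, 89.15, 89.00, 89.00, 89.15, 89.00, 89.00, 89.15]
    + [89.23] * 6
    + [88.92] * 3
    + [89.31] * 3
    + [89.38] * 3
    + [89.46] * 2
    + [88.62]
)

reference = 89.08  # %T of the original forward stepwise model

mean_accuracy = statistics.mean(accuracies)
spread = statistics.stdev(accuracies)  # sample standard deviation
largest_residual = max(abs(a - reference) for a in accuracies)
```

Rounded to two decimals, the mean and the standard deviation agree with the last two rows of Table 6 (89.14 and 0.17), and the largest deviation from the reference accuracy is 0.46 percentage points.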

Conclusions

In summary, by means of average molecular kinetic constants (kG) characterizing the extent of partition at time $t_k = k$, it was possible to derive thermodynamic partition free energies ($\Delta G_k$). This free energy-based descriptor encodes information related to the partition system and the biological species in addition to structural information. This feature made it possible to develop, for the first time, a single linear classification model able to predict the partition behavior of a broad number of structurally diverse drugs and other chemicals for numerous and dissimilar partition systems and biological species. This unified model exhibits excellent accuracy and predictability, fulfills the principal parametrical assumptions, and at the same time overcomes the main limitations of the classic QSPR models developed to date.


Figure 5. Schematic representation of different atomic partition probabilities formulated in Table 5. Blue arrow represents partition direction, a wave line the interface. A pipette is placed over the measured systems A or B (reference system). Atoms in gray/light blue are those ith/jth atoms that have/do not have undergone partition.


Table 6. Details of the Experiment Performed to Check Physical Coherence of the Model.

Forward stepwise classification model: $PL = 0.35 + 3.8\,\Delta G_{5}(T) + 1.41\,\Delta G_{0}(CSat) - 1.56\,\Delta G_{1}(CSat) - 0.94\,\Delta G_{5}(HX)$; N = 1300; %T = 89.08

Reverted system | b0 | ΔG5(T) | ΔG0(CSat) | ΔG1(CSat) | ΔG5(HX) | %T | Res.
Air/blood (a) | 0.33 | 3.80 | 1.41 | −1.56 | −0.94 | 89.08 | 0.00
Plasma/brain (b) | 0.34 | 3.80 | 1.41 | −1.56 | −0.95 | 89.08 | 0.00
Plasma/heart (c) | 0.34 | 3.81 | 1.40 | −1.56 | −0.95 | 89.08 | 0.00
Air/liver (b) | 0.33 | 3.80 | 1.41 | −1.56 | −0.94 | 89.08 | 0.00
Air/muscle (a) | 0.32 | 3.80 | 1.40 | −1.56 | −0.94 | 89.08 | 0.00
Plasma/muscle (c) | 0.35 | 3.80 | 1.40 | −1.55 | −0.94 | 89.08 | 0.00
Air/saline (1) | 0.40 | 3.82 | 1.44 | −1.60 | −0.94 | 89.08 | 0.00
Plasma/skin (c) | 0.36 | 3.80 | 1.41 | −1.55 | −0.95 | 89.08 | 0.00
Plasma/skin (b) | 0.32 | 3.80 | 1.41 | −1.55 | −0.94 | 89.08 | 0.00
Epithelium-plus-stroma/vitreous humor (c) | 0.34 | 3.79 | 1.41 | −1.55 | −0.94 | 89.08 | 0.00
Plasma/brain (c) | 0.35 | 3.81 | 1.40 | −1.56 | −0.95 | 89.15 | 0.08
Plasma/breast milk (a) | 0.29 | 3.83 | 1.41 | −1.55 | −0.92 | 89.15 | 0.08
Plasma/heart (b) | 0.38 | 3.81 | 1.40 | −1.55 | −0.95 | 89.15 | 0.08
Air/liver (a) | 0.33 | 3.79 | 1.41 | −1.55 | −0.94 | 89.15 | 0.08
Air/muscle (b) | 0.33 | 3.80 | 1.40 | −1.56 | −0.94 | 89.00 | 0.08
Plasma/muscle (b) | 0.39 | 3.81 | 1.40 | −1.56 | −0.96 | 89.00 | 0.08
Air/skin (a) | 0.50 | 3.82 | 1.42 | −1.57 | −0.98 | 89.15 | 0.08
Plasma/spleen (b) | 0.34 | 3.80 | 1.41 | −1.55 | −0.94 | 89.00 | 0.08
Stroma/vitreous humor (c) | 0.35 | 3.80 | 1.41 | −1.55 | −0.95 | 89.00 | 0.08
Dimyristoyl phosphatidyl choline/water (d) | 0.36 | 3.82 | 1.40 | −1.58 | −0.94 | 89.15 | 0.08
Air/active carbon (d) | 0.31 | 3.78 | 1.42 | −1.55 | −0.94 | 89.23 | 0.15
Air/adipose tissue (a) | 0.35 | 3.79 | 1.42 | −1.56 | −0.94 | 89.23 | 0.15
Plasma/bone (c) | 0.34 | 3.79 | 1.40 | −1.55 | −0.94 | 89.23 | 0.15
Water/fish (e) | 0.31 | 3.75 | 1.40 | −1.53 | −0.97 | 89.23 | 0.15
Plasma/lung (c) | 0.36 | 3.81 | 1.40 | −1.55 | −0.95 | 89.23 | 0.15
Air/soil (d) | 0.19 | 3.75 | 1.38 | −1.53 | −0.92 | 89.23 | 0.15
Plasma/intestine (b) | 0.33 | 3.79 | 1.41 | −1.56 | −0.94 | 88.92 | 0.15
Plasma/lung (b) | 0.32 | 3.80 | 1.41 | −1.55 | −0.94 | 88.92 | 0.15
Isopropyl miristate/water (d) | 0.77 | 4.00 | 1.40 | −1.62 | −0.97 | 88.92 | 0.15
BMC retention on 0.04 M Brij35 (d) | 0.26 | 3.78 | 1.40 | −1.54 | −0.94 | 89.31 | 0.23
Air/snow surface (d) | 0.27 | 3.76 | 1.39 | −1.54 | −0.92 | 89.31 | 0.23
Whole cornea/vitreous humor (c) | 0.29 | 3.78 | 1.40 | −1.54 | −0.94 | 89.31 | 0.23
Air/adipose tissue (b) | 0.31 | 3.77 | 1.41 | −1.55 | −0.94 | 89.38 | 0.31
CH3C6H5/CF3C6F11 (d) | 0.34 | 3.80 | 1.36 | −1.51 | −0.94 | 89.38 | 0.31
Air/water (d) | 0.27 | 3.77 | 1.40 | −1.55 | −0.93 | 89.38 | 0.31
Air/blood (b) | 0.27 | 3.73 | 1.45 | −1.55 | −0.93 | 89.46 | 0.38
Water/vegetable oil (d) | 0.25 | 3.78 | 1.39 | −1.53 | −0.93 | 89.46 | 0.38
Air/olive oil (d) | 0.52 | 3.87 | 1.63 | −1.63 | −0.97 | 88.62 | 0.46
Mean | 0.34 | 3.80 | 1.41 | −1.56 | −0.94 | 89.14 | –
Standard deviation | 0.12 | 0.04 | 0.04 | 0.02 | 0.01 | 0.17 | –

BMC retention: retention in biopartitioning micellar chromatography, a chromatographic system constituted by polyoxyethylene (23) lauryl ether (Brij35) micellar mobile phases and a C18 reversed stationary phase.
Brij35: 3,6,9,12,15,18,21,24,27-Nonaoxanonatriacontan-1-ol or Polidocanol. A mixture of monolauryl ethers of polyoxyethylene glycols having a statistical average of eight ethylene oxide groups per molecule, used as a pharmaceutic aid (surfactant).
%T: Global average of compounds correctly classified, N = 1300.
(a) Measured on human. (b) Measured on rat. (c) Measured on rabbit. (d) Abiotic. (e) Measured on fish.

In addition, the desirability analysis explored the levels of the predictors that produce the most desirable partition properties. The inversion of the partition direction for each of the 38 partition systems provides evidence that our models correctly classified 89.08% of compounds, with an uncertainty of only ±0.17%, independently of the direction of the partition process used to seek the model. Ten other classification models (linear, neural networks, and genetic algorithms) were also tested for the same purposes. None of these computational models compares favorably with the linear model, indicating that our approach captures the main aspects that govern chemical partition in different systems. Finally, the analysis carried out in this study demonstrates that the distribution profile of a broad number of structurally diverse drugs and other chemicals (1300) for 38 different partition systems of toxicological/biological and environmental significance can be predicted by using a single and physically coherent four-parameter equation.

Acknowledgments

H. González-Díaz acknowledges scholarship funding from the "Dirección Xeral de Investigación y Desenvolvemento" of the "Xunta de Galicia" for a 1-year post-doctoral position in the Dipartimento Farmaco Chimico Tecnologico of the University of Cagliari, Italy. H. González-Díaz also acknowledges two contracts as guest professor in the Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, Spain. M. Cruz-Monteagudo thanks the Department of Organic Chemistry, Faculty of Pharmacy, University of Porto, Portugal, for its support. The authors sincerely thank the editor, Prof. Gernot Frenking, and the two anonymous referees for their help, including the improvement of the English.

References

1. Overton, E. Studien über die Narkose, zugleich ein Beitrag zur allgemeinen Pharmakologie; Verlag von Gustav Fischer: Jena, Germany, 1901.
2. Meyer, H. Theorie der Alkoholnarkose. Arch Exp Pathol Pharmakol 1899, 42, 109.
3. Taylor, P. J. Comprehensive Medicinal Chemistry; Pergamon Press: Oxford, 1990.
4. Livingstone, D. J. J Chem Inf Comput Sci 2000, 40, 195.
5. Katritzky, A. R.; Dobchev, D. A.; Hur, E.; Fara, D. C.; Karelson, M. Bioorg Med Chem 2005, 13, 1623.
6. Larsen, L. A.; Ito, S.; Koren, G. Ann Pharmacother 2003, 37, 1299–1306.
7. Katritzky, A. R.; Kuanar, M.; Fara, D. C.; Karelson, M.; Acree, W. E., Jr. Bioorg Med Chem 2004, 12, 4735.
8. Rand, G. M. Fundamentals of Aquatic Toxicology; Taylor & Francis: Florida, 1995.
9. Escuder-Gilabert, L.; Martín-Biosca, Y.; Sagrado, S.; Villanueva-Camañas, R. M.; Medina-Hernández, M. J. Anal Chim Acta 2001, 448, 173.
10. Hoff, J. T.; Wania, F.; Mackay, D.; Gillham, R. Environ Sci Technol 1995, 29, 1982.
11. Wania, F.; Hoff, J. T.; Jia, C. Q.; Mackay, D. Environ Pollut 1998, 102, 25.
12. Wania, F.; Hoff, J. T.; Jia, C. Q.; Mackay, D. Environ Sci Technol 1999, 33, 195.
13. Franz, T. P.; Eisenreich, S. J. Environ Sci Technol 1998, 32, 1771.
14. Grannas, A. M.; Shepson, P. B.; Guimbaud, C.; Sumner, A. L.; Albert, M.; Simpson, W.; Domine, F.; Boudries, H.; Bottenheim, J.; Beine, H. J.; Honrath, R.; Zhou, X. L. Atmos Environ 2002, 36, 2733.
15. Wania, F. Chemosphere 1997, 35, 2345.
16. Wania, F.; Semkin, R.; Hoff, J. T.; Mackay, D. Hydrol Process 1999, 13, 2245.
17. Petrenko, V. F.; Whitworth, R. W. Physics of Ice; Oxford University Press: Oxford, 1999.
18. Houdier, S.; Perrier, S.; Domine, F.; Cabanes, A.; Legagneux, L.; Grannas, A. M.; Guimbaud, C.; Shepson, P. B.; Boudries, H.; Bottenheim, J. W. Atmos Environ 2002, 36, 2609.
19. Domine, F.; Shepson, P. B. Science 2002, 297, 1506.
20. Boudries, H.; Bottenheim, J. W.; Guimbaud, C.; Grannas, A. M.; Shepson, P. B.; Houdier, S.; Perrier, S.; Domine, F. Atmos Environ 2002, 36, 2573.
21. Marrero Ponce, Y.; Cabrera Perez, M. A.; Romero Zaldivar, V.; Gonzalez-Diaz, H.; Torrens, F. J Pharm Pharm Sci 2004, 7, 186.
22. Marrero-Ponce, Y.; Medina-Marrero, R.; Torrens, F.; Martinez, Y.; Romero-Zaldivar, V.; Castro, E. A. Bioorg Med Chem 2005, 13, 2881.
23. Helguera, A. M.; Cabrera Perez, M. A.; Gonzalez, M. P.; Ruiz, R. M.; Gonzalez-Diaz, H. Bioorg Med Chem 2005, 13, 2477.
24. Hemmateenejad, B.; Miri, R.; Safarpour, M. A.; Mehdipour, A. R. J Comput Chem 2006, 27, 1125.
25. Yang, F.; Wang, Z. D.; Huang, Y. P. J Comput Chem 2004, 25, 881.
26. Yang, F.; Wang, Z. D.; Huang, Y. P.; Zhu, H. L. J Comput Chem 2003, 24, 1812.
27. Gonzalez-Diaz, H.; Aguero, G.; Cabrera, M. A.; Molina, R.; Santana, L.; Uriarte, E.; Delogu, G.; Castanedo, N. Bioorg Med Chem Lett 2005, 15, 551.
28. Hernandez, I.; Gonzalez-Diaz, H. MARCH-INSIDE version 1.0 (Markovian Chemicals In Silico Design). Chemicals Bio-actives Center, Central University of Las Villas: Cuba, 2002.
29. Gonzalez-Diaz, H.; Cruz-Monteagudo, M.; Molina, R.; Tenorio, E.; Uriarte, E. Bioorg Med Chem 2005, 13, 1119.
30. Cruz-Monteagudo, M.; Gonzalez-Diaz, H. Eur J Med Chem 2005, 40, 1030.
31. Gramatica, P.; Corradi, M.; Consonni, V. Chemosphere 2000, 41, 763.
32. Poulin, P.; Krishnan, K. Hum Exp Toxicol 1995, 14, 273.
33. Abraham, M. H.; Kamlet, M. J.; Taft, R. W.; Doherty, R. M.; Weathersby, P. K. J Med Chem 1985, 28, 865.
34. Kamlet, M. J.; Doherty, R. M.; Abboud, J. L.; Abraham, M. H.; Taft, R. W. J Pharm Sci 1986, 75, 338.
35. Paterson, S.; Mackay, D. Br J Ind Med 1989, 46, 321.
36. Abraham, M. H.; Weathersby, P. K. J Pharm Sci 1994, 83, 1450.
37. Moss, G. P.; Cronin, M. T. Int J Pharm 2002, 238, 105.
38. Burg, P.; Fydrych, P.; Abraham, M. H.; Matt, M.; Gruber, R. Fuel 2000, 79, 1041.
39. Roth, C. M.; Goss, K.; Schwarzenbach, R. P. Environ Sci Technol 2004, 38, 4079.
40. Degim, T.; Pugh, J. W.; Hadgraft, J. Int J Pharm 1998, 170, 129.
41. Martín-Biosca, Y.; Molero-Monfort, M.; Sagrado, S.; Villanueva-Camañas, R. M.; Medina-Hernández, M. J. Eur J Pharm Sci 2003, 20, 209.
42. Abraham, M. H.; Acree, W. E., Jr. Int J Pharm 2005, 294, 121.
43. Duchowicz, P. R.; Fernández, F. M.; Castro, E. A. J Fluor Chem 2004, 125, 43.
44. Patel, H.; Schultz, T. W.; Cronin, M. T. D. J Mol Struct 2002, 593, 9.
45. Van Waterbeemd, H. In Chemometric Methods in Molecular Design; Van Waterbeemd, H., Ed.; Wiley-VCH: New York, 1995; pp. 265–282.
46. Hill, T.; Lewicki, P. STATISTICS Methods and Applications; Statsoft: Tulsa, OK, 2006.
47. Van Waterbeemd, H. Chemometric Methods in Molecular Design; Wiley-VCH: New York, 1995.
48. Van de Waterbeemd, H.; Testa, B.; Carrupt, P. A.; el Tayar, N. Prog Clin Biol Res 1989, 291, 123.


49. Bisquerra Alzina, R. Introducción Conceptual al Análisis Multivariante: Un Enfoque Informático con los Paquetes SPSS-X, BMDP, LISREL y SPAD; PPU: Barcelona, 1989.
50. Randic, M. J Chem Inf Comput Sci 1991, 31, 311.
51. Randic, M. New J Chem 1991, 15, 517.
52. Randic, M. J Mol Struct 1991, 233, 45.
53. Lucic, B.; Nikolic, S.; Trinajstic, N.; Juric, D. J Chem Inf Comput Sci 1995, 35, 532.
54. Estrada, E.; Perdomo, I.; Torres-Lavandeira, J. J Chem Inf Comput Sci 2001, 41, 1561.
55. Fernandez, M.; Caballero, J.; Helguera, A. M.; Castro, E. A.; Gonzalez, M. P. Bioorg Med Chem 2005, 13, 3269.
56. Caballero, J.; Fernandez, M. J Mol Model (Online) 2005, 1.
57. Fernandez, M.; Caballero, J. J Mol Graph Model 2006, 25, 410.
58. Fernandez, M.; Caballero, J.; Tundidor-Camba, A. Bioorg Med Chem 2006, 14, 4137.
59. Fernandez, M.; Tundidor-Camba, A.; Caballero, J. J Chem Inf Model 2005, 45, 1884.
60. Havel, J.; Madden, E.; Haddad, P. R. Chromatographia 1999, 49, 481.
61. Zupan, J.; Gasteiger, J. Neural Networks in Chemistry and Drug Design; Wiley-VCH: Weinheim, 1999.
62. Rogers, D.; Hopfinger, A. J. J Chem Inf Comput Sci 1994, 34, 854.
63. Moreira, I. S.; Fernandes, P. A.; Ramos, M. J. J Comput Chem 2007, 28, 644.
64. Gonzalez-Diaz, H.; Aguero-Chapin, G.; Varona-Santos, J.; Molina, R.; de la Riva, G.; Uriarte, E. Bioorg Med Chem Lett 2005, 15, 2932.
65. Gonzalez-Diaz, H.; Bastida, I.; Castanedo, N.; Nasco, O.; Olazabal, E.; Morales, A.; Serrano, H. S.; de Armas, R. R. Bull Math Biol 2004, 66, 1285.
66. Gonzalez-Diaz, H.; Cruz-Monteagudo, M.; Vina, D.; Santana, L.; Uriarte, E.; De Clercq, E. Bioorg Med Chem Lett 2005, 15, 1651.
67. Gonzalez-Diaz, H.; de Armas, R. R.; Molina, R. Bull Math Biol 2003, 65, 991.
68. Gonzalez-Diaz, H.; de Armas, R. R.; Molina, R. Bioinformatics 2003, 19, 2079.
69. Gonzalez-Diaz, H.; Gia, O.; Uriarte, E.; Hernandez, I.; Ramos, R.; Chaviano, M.; Seijo, S.; Castillo, J. A.; Morales, L.; Santana, L.; Akpaloo, D.; Molina, E.; Cruz, M.; Torres, L. A.; Cabrera, M. A. J Mol Model (Online) 2003, 9, 395.
70. Gonzalez-Diaz, H.; Marrero, Y.; Hernandez, I.; Bastida, I.; Tenorio, E.; Nasco, O.; Uriarte, E.; Castanedo, N.; Cabrera, M. A.; Aguila, E.; Marrero, O.; Morales, A.; Perez, M. Chem Res Toxicol 2003, 16, 1318.
71. Gonzalez-Diaz, H.; Molina, R.; Uriarte, E. Bioorg Med Chem Lett 2004, 14, 4691.
72. Gonzalez-Diaz, H.; Molina, R.; Uriarte, E. FEBS Lett 2005, 579, 4297.
73. Gonzalez-Diaz, H.; Olazabal, E.; Castanedo, N.; Sanchez, I. H.; Morales, A.; Serrano, H. S.; Gonzalez, J.; de Armas, R. R. J Mol Model (Online) 2002, 8, 237.
74. Gonzalez-Diaz, H.; Perez-Bello, A.; Uriarte, E.; Gonzalez-Diaz, Y. Bioorg Med Chem Lett 2006, 16, 547.
75. Gonzalez-Diaz, H.; Ramos de Armas, R.; Uriarte, E. Online J Bioinformatics 2002, 1, 83.
76. Gonzalez-Diaz, H.; Sanchez, I. H.; Uriarte, E.; Santana, L. Comput Biol Chem 2003, 27, 217.
77. Gonzalez-Diaz, H.; Tenorio, E.; Castanedo, N.; Santana, L.; Uriarte, E. Bioorg Med Chem 2005, 13, 1523.
78. Gonzalez-Diaz, H.; Torres-Gomez, L. A.; Guevara, Y.; Almeida, M. S.; Molina, R.; Castanedo, N.; Santana, L.; Uriarte, E. J Mol Model (Online) 2005, 11, 116.
79. Gonzalez-Diaz, H.; Uriarte, E. Bioorg Med Chem Lett 2005, 15, 5088.
80. Gonzalez-Diaz, H.; Uriarte, E. Biopolymers 2005, 77, 296.
81. Gonzalez-Diaz, H.; Uriarte, E.; Ramos de Armas, R. Bioorg Med Chem 2005, 13, 323.
82. Gonzalez-Diaz, H.; Vina, D.; Santana, L.; de Clercq, E.; Uriarte, E. Bioorg Med Chem 2006, 14, 1095.
83. Ramos de Armas, R.; Gonzalez-Diaz, H.; Molina, R.; Perez Gonzalez, M.; Uriarte, E. Bioorg Med Chem 2004, 12, 4815.
84. Ramos de Armas, R.; Gonzalez-Diaz, H.; Molina, R.; Uriarte, E. Proteins 2004, 56, 715.
85. Saiz-Urra, L.; Gonzalez-Diaz, H.; Uriarte, E. Bioorg Med Chem 2005, 13, 3641.
86. Velikson, B.; Garel, T.; Niel, J.-C.; Orland, H.; Smith, J. C. J Comput Chem 1992, 13, 1216.
87. Yu, H.; Geerke, D. P.; Liu, H.; van Gunsteren, W. F. J Comput Chem 2006, 27, 1494.
88. Rapaport, D. C. The Art of Molecular Dynamics Simulation; Cambridge University Press: Cambridge, 1996.
89. Robert, C. P.; Casella, G. Monte Carlo Statistical Methods; Springer-Verlag: New York, 2004.
90. Schlick, T. Molecular Modeling and Simulation; Springer: New York, 2002.
91. Stewart, J.; Gill, L. Econometrics; Prentice Hall: London, 1998.
92. Dillon, W. R.; Goldstein, M. Multivariate Analysis: Methods and Applications; Wiley: New York, 1984.
93. Cliff, N. Analyzing Multivariate Data; Harcourt Brace Jovanovich: New York, 1987.
94. Kowalski, R. B.; Wold, S. In Handbook of Statistics; Krishnaiah, P. R.; Kanal, L. N., Eds.; North Holland Publishing Company: Amsterdam, 1982; pp. 673–697.
