QSPR predictions of heat of fusion of organic compounds using Bayesian regularized artificial neural networks

Mohammad Goodarzi,a Tao Chenb and Matheus P. Freitasc,*

a Department of Chemistry, Faculty of Sciences and Young Researchers Club, Islamic Azad University, Arak Branch, Arak, Markazi, Iran
b School of Chemical and Biomedical Engineering, Nanyang Technological University, 62 Nanyang Drive, Singapore 637459, Singapore
c Departamento de Química, Universidade Federal de Lavras, CP 3037, 37200-000, Lavras, MG, Brazil
Abstract

Computational approaches for predicting the properties of environmental pollutants have great potential for rapid environmental risk assessment and management at reduced experimental cost. A quantitative structure-property relationship (QSPR) study was conducted to predict the heat of fusion of a set of organic compounds that have adverse effects on the environment. The forward selection (FS) strategy was used for descriptor selection. We examined the feasibility of multiple linear regression (MLR), artificial neural networks (ANN) and Bayesian regularized artificial neural networks (BRANN) as linear and nonlinear methods. The QSPR models were validated with an external set of compounds that were not used in the model development stage. All models reliably predicted the heat of fusion of the organic compounds under study, with the most accurate results obtained by the BRANN model.

Keywords: heat of fusion; QSPR; forward selection; MLR; BRANN
1. Introduction
In the last two decades, human life has been increasingly affected by environmental pollution, in which various organic pollutants, such as benzene derivatives, phenolic derivatives and organic acids, have been recognized to play a major role. These compounds are important environmental contaminants because of their high toxicity, widespread occurrence, and capability of long-distance transfer, precipitation and accumulation in the environment [1]. They affect the growth and decay of plants and the health of humans and animals. Their adverse effects on human health, some of which are highly detrimental, have already been documented in the literature [2-4]. The heat of fusion has been correlated with the concentration of polycyclic aromatic hydrocarbons [5] as a key thermodynamic property for the Freundlich equation [6]. It is defined as the sum of the heat of melting and the heats of all polymorphic transitions, i.e. the amount of heat required to convert a unit mass of a solid at its melting point into a liquid without an increase in temperature. In contrast to other thermodynamic properties, the heat of fusion is difficult to estimate accurately by group contribution methods [7-9]. One of the major goals in the energetic materials field is to predict the performance, sensitivity, and physical and thermodynamic properties of materials prior to their actual synthesis. Quantitative structure-activity/property relationship (QSAR/QSPR) techniques have been used to achieve this objective, proving to be powerful tools in many fields of materials and compound design [8-11]. QSAR/QSPRs are indispensable in current drug discovery (and other computational chemistry applications), since their predictive capability can greatly facilitate the virtual design of compound libraries, including combinatorial libraries with appropriate
absorption, distribution, metabolism and excretion properties. Altogether, QSAR/QSPR technology considerably saves time and money during the drug development process.
The predictive accuracy of QSPR analyses is typically affected by two aspects: the selection of descriptors that sufficiently represent the structural information of the molecules, and the choice of a specific predictive model. Several regression methods have been used in the field of QSPR, among which artificial neural networks (ANN) are particularly popular. ANNs are computer-based systems derived from a simplified concept of the human brain; the building unit of a neural network is a simplified model of the functional behavior of an organic neuron. Detailed explanations of ANN theory and its application to chemical problems can be found in previous studies [12-16]. The Bayesian regularized ANN (BRANN) is a multilayer feed-forward neural network trained using a Bayesian algorithm. In contrast to the traditional ANN, where the network weights are assumed to be fixed quantities, the Bayesian approach considers a probability distribution of these weights and infers the posterior distribution over them. BRANN has been shown to attain more reliable and accurate predictions than the traditional ANN in many applications [17-19]. In this work, BRANN is implemented within the Levenberg-Marquardt algorithm; the combination of the two methods can accelerate convergence and determine the optimum weights for the network [20, 21], as briefly described below. A Bayesian structure applied directly to neural networks was proposed by MacKay [22] to overcome the problem of interpolating noisy data. In back-propagation learning, the mean-square error (MSE) is the function to be minimized, and the adoption of this performance
measuring index may lead to overfitting problems because of the unbounded values of the network weights. The performance function in the Bayesian-regularized (BR) method is therefore modified by adding a term consisting of the sum of squares of the network weights and biases:

F = βE_D + αE_W

where F is the network performance function, E_D is the sum of squared errors, E_W is the sum of squares of the network weights and biases, and α and β are objective function parameters that dictate the emphasis placed on obtaining a smoother network response [23]. This modification of the performance function aims to improve the ANN model's generalization capability. In this context, it is assumed that the weights and biases of the ANN are random variables following Gaussian distributions, and the parameters α and β are related to the unknown variances associated with these distributions. Based on Bayes' rule, the density function for the weights can be updated after the data are taken:

P(w | D, α, β, M) = P(D | w, β, M) P(w | α, M) / P(D | α, β, M)

where D represents the data set, M is the particular neural network, and w is the vector of network weights. P(w | D, α, β, M) is the plausibility of the weight distribution given the data, P(D | w, β, M) is the likelihood function, P(w | α, M) is the prior density, and P(D | α, β, M) is a normalization factor which guarantees that the total probability is one. Assuming that the noise in the training data, as well as the prior distribution of the weights, is Gaussian, then

P(w | D, α, β, M) = (1/Z_F) exp(-F)

where Z_F depends on the objective function parameters; under this structure, minimizing F is equivalent to finding the (locally) most probable parameters [24].
In the present work, multiple linear regression (MLR), ANN and BRANN were used as linear and nonlinear techniques to predict the heat of fusion of environmental pollutants. The aim of this work was to build QSPR models that can predict the heat of fusion of these compounds from their molecular structure alone, and also to test the performance of the above methods and evaluate their applicability as chemometric tools for predicting thermodynamic properties.
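As a numerical illustration of the regularized performance function F, the short sketch below evaluates it for given α and β; in the actual BRANN procedure these parameters are inferred within the Bayesian framework rather than supplied by the user, so the function name and fixed parameter values here are purely illustrative.

```python
import numpy as np

def br_objective(y, y_hat, weights, alpha, beta):
    """Bayesian-regularized performance function F = beta*E_D + alpha*E_W,
    where E_D is the sum of squared errors and E_W the sum of squared
    network weights (the weight-decay term)."""
    e_d = np.sum((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2)  # data misfit
    e_w = np.sum(np.asarray(weights, float) ** 2)                         # weight penalty
    return float(beta * e_d + alpha * e_w)

# With alpha = 0 the objective reduces to the plain sum of squared errors.
```

Larger α relative to β penalizes large weights more strongly, favoring a smoother network response, which is the mechanism behind the improved generalization discussed above.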
2. Computational methods
The experimental values were taken from the literature [25, 26]; repeated compounds in Table 1 reflect different experimental values reported in the literature. The 2D structures of the molecules were drawn using the HyperChem 7 software [27], and the final optimized geometries were obtained with the semi-empirical AM1 method in HyperChem. All calculations were carried out at the restricted Hartree-Fock level with no configuration interaction. The molecular structures were optimized using the Polak-Ribiere algorithm until the root mean square gradient reached 0.001 kcal mol-1 [28]. The resulting geometries were transferred into the Dragon program package [29] to compute descriptors of the Constitutional, Topological, Geometrical, Charge, GETAWAY (GEometry, Topology and Atoms-Weighted AssemblY), WHIM (Weighted Holistic Invariant Molecular), 3D-MoRSE (3D-Molecular Representation of Structure based on Electron diffraction), Molecular Walk Count, BCUT, 2D-Autocorrelation, Aromaticity Index, Randic Molecular Profile, Radial Distribution Function, Functional Group and Atom-Centred Fragment classes. The
calculated descriptors were first analyzed for constant or near-constant variables; the descriptors so detected, 800 in total, were removed since they do not provide sufficient information on the molecular structures. In addition, to decrease the redundancy in the descriptor data matrix, the correlations of the descriptors with each other and with the property of the molecules were examined, and collinear descriptors (r > 0.9) were detected. Among each group of collinear descriptors, the one presenting the highest correlation with the property was retained and the others were removed from the data matrix. Finally, 237 descriptors were used for the next step. The Bayesian regularized ANN was implemented in Matlab using the NNet toolbox [30]. TABLE 1
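The descriptor preprocessing described above (dropping near-constant columns, then resolving collinear pairs in favor of the descriptor better correlated with the property) can be sketched as follows. This is our minimal Python illustration, not the original Matlab code; the function name, variance tolerance and tie-breaking order are assumptions, while the 0.9 collinearity cutoff follows the text.

```python
import numpy as np

def filter_descriptors(X, y, var_tol=1e-8, r_cut=0.9):
    """Drop near-constant columns of X, then resolve collinear pairs
    (|r| > r_cut) by keeping the descriptor better correlated with the
    property y. Returns the sorted indices of the retained columns."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    # 1) remove (near-)constant descriptors
    keep = [j for j in range(X.shape[1]) if X[:, j].std() > var_tol]
    # 2) correlation of each surviving descriptor with the property
    prop_r = {j: abs(np.corrcoef(X[:, j], y)[0, 1]) for j in keep}
    retained = []
    for j in sorted(keep, key=lambda j: -prop_r[j]):  # best-correlated first
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= r_cut
               for k in retained):
            retained.append(j)
    return sorted(retained)
```

For example, a constant column is dropped outright, and of two descriptors correlated above 0.9 with each other, only the one with the higher correlation to y survives.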
3. Results and discussion
QSAR/QSPR analyses are particularly important for the design of new compounds, because their predictive capability can reduce the time and cost involved in purely experimental studies. Researchers have therefore paid increasing attention to such studies on many compounds, but many of these works focused mainly on the fitting ability of the QSAR/QSPR models and paid little attention to model validation and the applicability domain, which are essential for assessing reliability. Among the techniques available for validating multivariate models, one is based on cross-validation and another on the use of an external set; we performed both. A further point is that the descriptors in the model should represent the maximum information on structural variation, while collinearity among them must be kept to a minimum. It should be noted that we used forward selection, a common and simple technique for feature selection.
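Forward selection of this kind can be sketched in a few lines: at each step the candidate descriptor that most reduces the residual standard deviation S of an ordinary least-squares fit is added, and the loop stops when no candidate improves S. This Python sketch is illustrative only (the original selection was not performed with this code), and the improvement tolerance is our assumption.

```python
import numpy as np

def residual_std(X, y):
    """Standard deviation of residuals from an OLS fit (with intercept)
    of y on the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.std(y - A @ coef))

def forward_select(X, y, n_max=5, tol=1e-6):
    """Greedily add the descriptor that most reduces the residual standard
    deviation S; stop when no candidate improves S (or n_max is reached)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    selected, best_s = [], float(np.std(y))
    while len(selected) < n_max:
        trials = {j: residual_std(X[:, selected + [j]], y)
                  for j in range(X.shape[1]) if j not in selected}
        j_best = min(trials, key=trials.get)
        if best_s - trials[j_best] < tol:
            break  # no meaningful improvement left
        selected.append(j_best)
        best_s = trials[j_best]
    return selected
```

When one descriptor already explains the property exactly, the procedure selects it and stops, mirroring the stopping criterion described below.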
The forward stepwise (FS) procedure is an effective and efficient approach for the selection of informative descriptors in QSPR. It consists of the stepwise addition of the best molecular descriptors to the model so as to minimize the standard deviation (S), until no variable outside the model satisfies the selection criterion. Five descriptors were selected using the FS method; their values are shown in Table 1. The correlation matrix of these five descriptors is given in Table 2, which shows that there is no significant correlation between the selected descriptors. Problems would arise in the model if the correlation coefficients among descriptors were high (collinearity) or if correlations with the property occurred by chance; Table 2 shows that these problems are not present in our models. TABLE 2
In order to develop and validate the models, the data set of 74 compounds was divided into a training set of 56 compounds and a test set of 18 compounds. The data were split based on the range of heat of fusion values: 12.12 to 56.60 for the training set (56 compounds) and 16.99 to 37.44 for the test set (18 compounds). This avoids extrapolation, since the training set range covers that of the test set. With the five selected descriptors, the following linear model was built on the training set data:
Hfus = -11.762 + (53.969 × RDF010m) + (101.46 × R3e+) + (6.8494 × BEHm7) – (9.6472 × Mor20e) + (14.975 × Gs)
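Applied directly, the equation above reproduces the MLR predictions listed in Table 3. As a sketch (our own encoding of the published coefficients, not the original fitting code), the prediction for compound 1, 1,2,3-trichlorobenzene, using its descriptor values from Table 1 is:

```python
def hfus_mlr(rdf010m, r3e_plus, behm7, mor20e, gs):
    """MLR model for the heat of fusion built on the five FS-selected
    descriptors, with the fitted coefficients from the text."""
    return (-11.762 + 53.969 * rdf010m + 101.46 * r3e_plus
            + 6.8494 * behm7 - 9.6472 * mor20e + 14.975 * gs)

# Compound 1 (1,2,3-trichlorobenzene), descriptor values from Table 1:
print(round(hfus_mlr(0.093, 0.159, 0.755, 0.618, 0.621), 2))  # → 17.9 (Table 3: 17.90)
```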
Thus, the built model was used to carry out validation and predict the test set data. The prediction results are given in Table 3. We constructed linear models with
different numbers of descriptors. Figure 1 shows that the model with all five descriptors is more powerful than the others, since the squared correlation coefficient of experimental versus fitted/predicted heats of fusion (r2) increases and the standard error (SE) decreases as more descriptors are added to the model. TABLE 3 FIGURE 1
To gain insight into the nonlinearity of the model, a three-layer back-propagation ANN model was constructed and trained with the Levenberg-Marquardt (LM) algorithm. The input values were autoscaled before training the network, the initial weights were selected randomly between -0.3 and 0.3, and the number of neurons in the hidden layer, the weight and bias learning rates, and the momentum values were then optimized. The proper number of neurons in the hidden layer was 5; it was determined by training the network with different numbers of hidden neurons and comparing the outputs with the target values in terms of the root mean square error (RMSE). After optimization of all ANN parameters, the network was trained to adjust the weight and bias values. Additionally, a BRANN model was built to verify the enhancement obtained by using Bayesian regularization compared to the ANN alone for the five-parameter model; a brief overview of the BRANN modeling is presented here. First, a prior distribution is assigned over the network weights. After the data are collected, the posterior distribution of the network weights is determined by Bayesian inference. If a Gaussian prior distribution that penalizes large network weights is applied, and the data are assumed to come from a smooth function with additive Gaussian noise, maximizing the posterior distribution is equivalent to minimizing the standard sum-of-squares error together with a weight-decay regularizer [31]. In the prediction stage, both the
mean μ and variance σ² of the predictive distribution can be calculated to provide a confidence bound on the predicted values. In this work, the Bayesian regularization was implemented within the Levenberg-Marquardt algorithm (LMBR); the combination of the two methods can accelerate convergence and determine the optimum weights for the network [32, 33]. Before training the networks, the input values were normalized between -1 and 1. The initial weights were selected randomly between -0.3 and 0.3, and the number of nodes (neurons) in the hidden layer and the weight and bias learning rates were then optimized. We used one hidden layer, and the proper number of nodes was determined by training the network with different numbers of hidden neurons; the root mean square error (RMSE) measures how good the outputs are in comparison with the target values. In the BRANN process we used 8 compounds as a validation set to detect overfitting: training of the network must stop when the RMSE of the validation set begins to increase while the RMSE of the training set continues to decrease, so training was stopped when overtraining began. In general, training stops when one of several criteria is met: the maximum number of epochs (iterations) is reached, the maximum training time is exceeded, or the performance goal is attained; however, the best way to prevent overtraining is to stop training based on the validation set. After all parameters were optimized and the model trained, the external test set, which did not contribute to any model development step, was used to evaluate the model. To evaluate the MLR-, ANN- and BRANN-based models, we used several statistical parameters, such as the F statistic, t-test, squared correlation coefficient (r2), root mean square error of prediction (RMSEP), relative standard error of prediction
(RSEP) and mean absolute error (MAE) values [34], in addition to other validation parameters reported elsewhere [35]. The statistical results for all three models (Table 4) show that all performed reasonably well, but FS-BRANN was the most reliable in predicting the heat of fusion values. Figure 2 shows the experimental values versus the heats of fusion obtained by FS-BRANN, whilst the residuals of the FS-BRANN predicted values are plotted against the experimental values in Figure 3. The distribution of residuals on both sides of the zero line indicates that no systematic error exists in the development of the FS-BRANN model. TABLE 4 FIGURE 2 FIGURE 3
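The evaluation statistics can be computed with the short functions below. This is an illustrative sketch: RMSEP and MAE are standard, but the exact RSEP formula used in the study follows ref. [34], so the definition shown here (a common one, normalizing by the sum of squared observations) is an assumption.

```python
import numpy as np

def rmsep(y, y_hat):
    """Root mean square error of prediction."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def rsep(y, y_hat):
    """Relative standard error of prediction, in percent (one common
    definition; the exact formula in the paper follows ref. [34])."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(100 * np.sqrt(np.sum((y - y_hat) ** 2) / np.sum(y ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))
```

Applying such functions to the experimental and predicted columns of Table 3 yields figures of the kind summarized in Table 4.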
4. Conclusion
In this study, predictive QSPR models for the heat of fusion of some environmental organic pollutants were developed using Bayesian regularized artificial neural networks (BRANN), artificial neural networks (ANN) alone and multiple linear regression (MLR), with five descriptors computed by the Dragon software. The descriptors were selected by a common feature selection technique from a large pool; they were RDF010m, R3e+, BEHm7, Mor20e and Gs, which captured enough information on the molecular structures. A single descriptor can capture only one aspect of the property of interest or of some underlying process, which is in many cases far from satisfactory. On the other hand, we cannot use the
whole set of descriptors because of the overfitting problems of statistical modeling; the use of multivariate regression instead is a great improvement for correlating physical properties with molecular parameters. We constructed an MLR-based model as a simple and fast linear approach, and a BRANN-based model as a powerful nonlinear approach, on the five descriptors selected by forward selection; based on the prediction results obtained, the QSPR models developed here appear quite useful for predicting the heat of fusion of environmental organic pollutants.
Acknowledgements CNPq is gratefully acknowledged for the fellowship (to M.P.F.), as is FAPEMIG for the financial support.
References
[1] R.V. Galiulin, V.N. Bashkin, R.A. Galiulina, Water Air Soil Pollut. 137 (2002) 179-191.
[2] J.P. Giesy, K. Kannan, Crit. Rev. Toxicol. 28 (1998) 511-569.
[3] A. Katsoyiannis, A. Zouboulis, C. Samara, Chemosphere 65 (2006) 1634-1641.
[4] F. Flores-Céspedes, M. Fernández-Pérez, M. Villafranca-Sànchez, E. González-Pradas, Environ. Pollut. 142 (2006) 449-456.
[5] C. Plaza, B. Xing, J.M. Fernández, N. Senesi, A. Polo, Environ. Pollut. 157 (2009) 257-263.
[6] A.M. Carmo, L.S. Hundal, M.L. Thompson, Environ. Sci. Technol. 34 (2000) 4363-4369.
[7] R.C. Reid, J.M. Prausnitz, B.E. Poling, The Properties of Gases and Liquids, 4th ed., McGraw-Hill, New York, 1987.
[8] P. Simamora, S.H. Yalkowsky, Ind. Eng. Chem. Res. 33 (1994) 1405-1409.
[9] J.F. Krzyzaniak, P.B. Myrdal, P. Simamora, S.H. Yalkowsky, Ind. Eng. Chem. Res. 34 (1995) 2530-2535.
[10] M. Goodarzi, M.P. Freitas, QSAR Comb. Sci. 27 (2008) 1092-1098.
[11] M. Goodarzi, M.P. Freitas, J. Phys. Chem. A 112 (2008) 11263-11265.
[12] M. Goodarzi, M.P. Freitas, Chemom. Intell. Lab. Syst. 96 (2009) 59-62.
[13] M.P. Freitas, E.F.F. da Cunha, T.C. Ramalho, M. Goodarzi, Curr. Comput.-Aid. Drug Des. 4 (2008) 273-282.
[14] G. Kateman, Chemom. Intell. Lab. Syst. 19 (1993) 135-142.
[15] J. Zupan, J. Gasteiger, Neural Networks in Chemistry and Drug Design, VCH, Weinheim, 1999.
[16] S.P. Niculescu, J. Mol. Struct. (Theochem) 622 (2003) 71-83.
[17] F.R. Burden, D.A. Winkler, Chem. Res. Toxicol. 13 (2000) 436-440.
[18] F.R. Burden, D.A. Winkler, J. Med. Chem. 42 (1999) 3183-3187.
[19] M.J. Polley, D.A. Winkler, F.R. Burden, J. Med. Chem. 47 (2004) 6230-6238.
[20] Y.H. Wang, Y. Li, Y.H. Li, S.L. Yang, L. Yang, Bioorg. Med. Chem. Lett. 15 (2005) 4076-4084.
[21] J. Caballero, M. Garriga, M. Fernandez, Bioorg. Med. Chem. 14 (2006) 3330-3340.
[22] D.J.C. MacKay, Neural Comput. 4 (1992) 448-472.
[23] M. Fernandez, J. Caballero, L. Fernandez, J.I. Abreu, M. Garriga, J. Mol. Graph. Model. 26 (2007) 748-759.
[24] D.J.C. MacKay, Neural Comput. 4 (1992) 415-447.
[25] M.H. Keshavarz, J. Hazard. Mater. 150 (2008) 387-393.
[26] C. Chiou, D.W. Schmedding, M. Manes, Environ. Sci. Technol. 39 (2005) 8840-8846.
[27] HyperChem version 7.0, Hypercube, Inc., Gainesville, 2007.
[28] D.C. Young, Computational Chemistry: A Practical Guide for Applying Techniques to Real-World Problems, John Wiley & Sons, New York, 2001.
[29] R. Todeschini, V. Consonni, M. Pavan, Dragon software, Milano, 2002.
[30] Matlab Version 7.6, MathWorks Inc., Natick, MA, 2007.
[31] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
[32] D.J.C. MacKay, Neural Comput. 4 (1992) 448-472.
[33] F.D. Foresee, M.T. Hagan, Gauss-Newton Approximation to Bayesian Learning, in: Proceedings of the 1997 IEEE International Conference on Neural Networks, Houston, 1997, pp. 1930-1935.
[34] M. Goodarzi, T. Goodarzi, N. Ghasemi, Ann. Chim. 97 (2007) 303-312.
[35] M. Goodarzi, M.P. Freitas, N. Ghasemi, Eur. J. Med. Chem. 45 (2010) 3911-3915.
Figure Captions
Figure 1. Correlation coefficient and standard error of linear models with different numbers of descriptors.
Figure 2. Plot of the heats of fusion calculated by the BRANN-based model against the experimental values.
Figure 3. Plot of the residuals of the BRANN-predicted values versus the experimental heats of fusion.
Table 1. Descriptor values used for model construction.

No.  Compound                                         RDF010m  R3e+   BEHm7  Mor20e  Gs
1    1,2,3-Trichlorobenzene                           0.093    0.159  0.755  0.618   0.621
2    1,2,3,4-Tetrachlorobenzene                       0.062    0.156  0.783  0.657   0.614
3    1,2,4,5-Tetrachlorobenzene                       0.062    0.149  1.549  0.552   1
4    Biphenyl                                         0.309    0.053  0.935  1.15    1
5    Naphthalene                                      0.248    0.064  0.802  0.951   1
6    2,6-Dimethylnaphthalene                          0.305    0.052  1.897  1.137   0.758
7    Phenanthrene                                     0.309    0.052  2.291  1.304   0.593
8    2,4,5-Pcb                                        0.217    0.092  2.482  1.015   0.346
9    2,2',5-Pcb                                       0.217    0.068  2.51   1.356   0.346
10   2,2',4,5,5'-Pcb                                  0.155    0.088  2.821  1.304   0.338
11   2,2',3,3',4,4'-Pcb                               0.124    0.088  3.076  1.201   1
12   Chlorpyrifos                                     0.213    0.122  2.764  0.789   0.193
13   Lindane                                          0.106    0.139  2.174  0.307   0.362
14   P,P'-DDT                                         0.259    0.085  2.951  1.157   0.191
15   1,2,4,5-Tetramethylbenzene                       0.301    0.031  1.586  0.755   1
16   Hexachlorobenzene                                0        0.143  2.146  0.856   1
17   Pyrene                                           0.309    0.044  2.29   1.324   1
18   2,4,6-Pcb                                        0.217    0.068  2.532  1.098   0.588
19   2,2',3,3',4,5,5',6,6'-Pcb                        0.031    0.086  3.242  0.747   0.57
20   2,8-Dichlorodibenzofuran                         0.186    0.071  2.381  1.007   0.588
21   Dieldrin                                         0.193    0.103  2.959  0.494   0.191
22   Leptophos                                        0.27     0.085  2.79   0.934   0.188
23   Aldicarb                                         0.379    0.071  1.861  0.856   0.218
24   Carbaryl                                         0.378    0.079  2.407  1.181   0.204
25   Alachlor                                         0.427    0.049  2.532  1.778   0.193
26   Linuron                                          0.313    0.098  2.36   0.922   0.204
27   Nitrobenzene                                     0.212    0.077  0.789  0.558   0.524
28   2-Nitrophenol                                    0.273    0.074  0.848  0.516   0.377
29   3-Nitroaniline                                   0.432    0.051  3.25   0.909   0.168
30   1-Nitronaphthalene                               0.246    0.094  1.331  0.204   0.213
31   3-Nitrophenol                                    0.283    0.087  0.863  0.526   0.377
32   1-Nitroaniline                                   0.359    0.071  0.793  0.656   0.377
33   2-Nitrobenzoic Acid                              0.28     0.107  1.331  0.374   0.362
34   3-Nitrophthalic Anhydride                        0.171    0.065  1.331  0.314   0.351
35   4-Nitrophthalic Anhydride                        0.165    0.061  1.331  0.303   0.351
36   1-Methyl-2,4-Dinitrobenzene                      0.26     0.061  1.331  0.593   0.356
37   2-Methyl-1,3-Dinitrobenzene                      0.26     0.06   1.331  0.423   0.356
38   1,4-Dinitrobenzene                               0.214    0.065  1.331  0.399   1
39   1,2-Dinitrobenzene                               0.234    0.069  1.331  0.371   0.424
40   2,4-Dinitrophenol                                0.288    0.083  1.331  0.442   0.356
41   2-Methyl-4,6-Dinitrophenol                       0.326    0.056  1.514  0.394   0.351
42   2,6-Dinitrophenol                                0.288    0.064  1.331  0.512   0.372
43   4-Methyl-1,2-Dinitrobenzene                      0.263    0.053  1.331  0.452   0.356
44   2,5-Dinitrophenol                                0.288    0.076  1.331  0.331   0.356
45   1-Methyl-2,3-Dinitrobenzene                      0.26     0.065  1.331  0.391   0.356
46   3,4-Dinitrophenol                                0.305    0.081  1.331  0.233   0.356
47   2,3-Dinitrophenol                                0.311    0.078  1.331  0.256   0.356
48   1,8-Dinitronaphthalene                           0.288    0.071  2.225  0.464   0.31
49   1,5-Dinitronaphthalene                           0.288    0.051  2.058  0.866   1
50   1,3,5-Trinitrobenzene                            0.246    0.044  1.331  0.526   0.346
51                                                    0.246    0.044  1.331  0.526   0.346
52   2,4,6-Trinitroresorcinol                         0.406    0.056  1.672  0.356   0.338
53                                                    0.406    0.056  1.672  0.356   0.338
54   1-Methyl-2,4,6-Trinitrobenzene                   0.286    0.043  1.854  0.342   0.342
55                                                    0.286    0.043  1.854  0.342   0.342
56                                                    0.286    0.043  1.854  0.342   0.342
57                                                    0.286    0.043  1.854  0.342   0.342
58   1-Methoxy-2,4,6-Trinitrobenzene                  0.276    0.042  2.242  0.252   0.338
59   1-Methyl-3-Hydroxy-2,4,6-Trinitrobenzene         0.359    0.053  1.887  0.241   0.338
60                                                    0.359    0.053  1.887  0.241   0.338
61   1-Amino-2,4,6-Trinitrobenzene                    0.417    0.043  1.755  0.511   0.342
62   1,3-Diamino-2,4,6-Trinitrobenzene                0.545    0.04   1.847  -0.036  0.49
63   1,3,5-Triamino-2,4,6-Trinitrobenzene             0.702    0.036  1.986  -0.356  0.334
64   2,4,6-Trinitrobenzoic Acid                       0.333    0.077  2.123  0.084   0.334
65   1,4,5-Trinitronaphthalene                        0.316    0.049  2.437  0.709   0.331
66   1-(Methylnitramino)-2,4,6-Trinitrobenzene        0.351    0.073  2.343  0.449   0.198
67                                                    0.351    0.073  2.343  0.449   0.198
68   2,2',4,4',6,6'-Hexanitrobiphenyl                 0.442    0.047  2.963  1.04    0.819
69   2,2',4,4',6,6'-Hexanitrobibenzyl                 0.504    0.044  2.963  1.04    0.819
70   2,2',4,4',6,6'-Hexanitrodiphenylamine            0.646    0.044  3.01   1.481   0.168
71   2,2',4,4',6,6'-Hexanitrostilbene                 0.492    0.046  3.164  0.947   0.193
72   2,2',4,4',6,6'-Hexanitrodiphenylsulfide          0.432    0.05   3.25   0.933   0.168
73   2,2',4,4',6,6'-Hexanitrodiphenylsulfone          0.469    0.045  3.397  0.787   0.186
74   3,3'-Dimethyl-2,2',4,4',6,6'-Hexanitrobiphenyl   0.511    0.045  2.996  0.748   0.178
Table 2. Correlation matrix for the five selected descriptors.a

          RDF010m  R3e+    BEHm7   Mor20e  Gs
RDF010m   1
R3e+      0.4448   1
BEHm7     0.061    0.0102  1
Mor20e    0.0033   0.0004  0.2023  1
Gs        0.1594   0.0193  0.064   0.0419  1

a Radial distribution function - 1.0 / weighted by atomic masses (RDF010m); R maximal autocorrelation of lag 3 / weighted by atomic Sanderson electronegativities (R3e+); highest eigenvalue n. 7 of Burden matrix / weighted by atomic masses (BEHm7); 3D-MoRSE signal 20 / weighted by atomic Sanderson electronegativities (Mor20e); and G total symmetry index / weighted by atomic electrotopological states (Gs).
Table 3. Experimental and calculated heats of fusion (ΔHfus) by the BRANN, ANN and MLR models.a

No.   Compound                                         Exp.   BRANN  ANN    MLR
1     1,2,3-Trichlorobenzene                           17.36  15.13  17.24  17.90
2*    1,2,3,4-Tetrachlorobenzene                       16.99  17.08  17.41  15.63
3**   1,2,4,5-Tetrachlorobenzene                       24.10  26.68  25.27  26.96
4*    Biphenyl                                         17.49  18.54  17.78  20.58
5     Naphthalene                                      18.99  17.36  17.99  19.41
6     2,6-Dimethylnaphthalene                          24.27  21.72  23.21  23.35
7     Phenanthrene                                     18.62  21.38  20.78  22.18
8*    2,4,5-Pcb                                        22.80  22.82  24.31  21.67
9     2,2',5-Pcb                                       17.91  16.07  18.59  16.14
10**  2,2',4,5,5'-Pcb                                  18.78  18.78  19.46  17.34
11    2,2',3,3',4,4'-Pcb                               29.20  29.51  29.13  28.32
12    Chlorpyrifos                                     25.94  24.31  23.91  26.32
13*   Lindane                                          23.59  22.47  22.52  25.41
14    P,P'-DDT                                         26.36  23.31  22.10  22.75
15*   1,2,4,5-Tetramethylbenzene                       21.00  24.60  24.39  26.18
16    Hexachlorobenzene                                28.74  24.80  27.02  24.16
17    Pyrene                                           23.51  24.34  25.53  27.27
18    2,4,6-Pcb                                        16.48  16.91  23.12  16.40
19*   2,2',3,3',4,5,5',6,6'-Pcb                        22.63  21.29  22.18  22.17
20    2,8-Dichlorodibenzofuran                         25.19  22.48  22.71  20.88
21    Dieldrin                                         20.08  23.14  23.69  27.47
22    Leptophos                                        19.49  24.07  23.75  24.35
23*   Aldicarb                                         25.94  24.76  24.17  23.65
24    Carbaryl                                         24.27  25.49  25.11  24.80
25**  Alachlor                                         17.74  17.60  18.62  19.34
26    Linuron                                          28.66  25.96  26.66  25.40
27    Nitrobenzene                                     12.12  14.24  15.41  15.36
28    2-Nitrophenol                                    17.45  16.45  16.02  16.96
29**  3-Nitroaniline                                   23.69  22.46  31.85  32.73
30*   1-Nitronaphthalene                               18.43  20.82  20.94  21.39
31    3-Nitrophenol                                    19.20  18.05  19.58  18.82
32    1-Nitroaniline                                   16.11  17.38  17.03  19.57
33*   2-Nitrobenzoic Acid                              27.99  25.64  24.88  25.14
34    3-Nitrophthalic Anhydride                        18.40  16.84  16.98  15.41
35    4-Nitrophthalic Anhydride                        17.14  16.35  16.51  14.78
36*   1-Methyl-2,4-Dinitrobenzene                      20.12  18.23  18.29  17.19
37*   2-Methyl-1,3-Dinitrobenzene                      19.28  20.19  19.39  18.72
38*   1,4-Dinitrobenzene                               28.12  27.00  28.78  26.62
39**  1,2-Dinitrobenzene                               28.12  25.15  22.02  19.75
40    2,4-Dinitrophenol                                24.17  23.46  25.14  22.39
41    2-Methyl-4,6-Dinitrophenol                       19.41  24.52  23.43  23.34
42    2,6-Dinitrophenol                                19.58  21.36  21.08  20.02
43    4-Methyl-1,2-Dinitrobenzene                      18.83  19.48  18.13  17.90
44    2,5-Dinitrophenol                                23.73  23.67  24.84  22.75
45    1-Methyl-2,3-Dinitrobenzene                      17.57  20.87  20.67  19.54
46    3,4-Dinitrophenol                                25.38  25.39  26.30  25.12
47*   2,3-Dinitrophenol                                26.24  25.37  26.36  24.91
48    1,8-Dinitronaphthalene                           35.20  32.38  26.79  26.39
49    1,5-Dinitronaphthalene                           33.03  28.94  29.66  29.67
50    1,3,5-Trinitrobenzene                            15.69  16.82  16.47  15.20
51**                                                   16.74  16.82  16.47  15.20
52    2,4,6-Trinitroresorcinol                         33.50  29.21  28.70  28.91
53*                                                    28.80  29.21  28.70  28.91
54    1-Methyl-2,4,6-Trinitrobenzene                   21.86  22.70  20.86  22.56
55*                                                    19.58  22.70  20.86  22.56
56**                                                   21.94  22.70  20.86  22.56
57*                                                    23.43  22.70  20.86  22.56
58    1-Methoxy-2,4,6-Trinitrobenzene                  19.64  23.35  22.63  25.38
59    1-Methyl-3-Hydroxy-2,4,6-Trinitrobenzene         26.74  27.61  28.09  28.65
60**                                                   26.01  27.61  28.09  28.65
61    1-Amino-2,4,6-Trinitrobenzene                    28.15  28.59  25.06  27.32
62    1,3-Diamino-2,4,6-Trinitrobenzene                35.25  38.11  37.23  42.05
63    1,3,5-Triamino-2,4,6-Trinitrobenzene             56.60  52.36  54.11  51.82
64    2,4,6-Trinitrobenzoic Acid                       31.60  29.02  29.46  32.76
65*   1,4,5-Trinitronaphthalene                        27.49  25.58  22.42  25.07
66    1-(Methylnitramino)-2,4,6-Trinitrobenzene        25.85  27.37  28.23  29.27
67                                                     25.85  27.37  28.23  29.27
68*   2,2',4,4',6,6'-Hexanitrobiphenyl                 37.44  36.81  36.39  39.39
69    2,2',4,4',6,6'-Hexanitrobibenzyl                 43.85  43.54  45.63  40.30
70    2,2',4,4',6,6'-Hexanitrodiphenylamine            37.38  36.66  38.51  36.41
71    2,2',4,4',6,6'-Hexanitrostilbene                 40.21  35.78  35.36  34.88
72    2,2',4,4',6,6'-Hexanitrodiphenylsulfide          38.00  32.17  31.09  32.40
73    2,2',4,4',6,6'-Hexanitrodiphenylsulfone          40.36  38.07  39.67  36.58
74    3,3'-Dimethyl-2,2',4,4',6,6'-Hexanitrobiphenyl   33.69  38.03  37.92  36.35

a Compounds marked with * pertain to the test set; compounds marked with ** were used as a monitoring set during construction of the ANN models.
Table 4. Comparison of the statistical parameters obtained using the FS-BRANN, FS-ANN and FS-MLR models.

Parameter     Set           FS-BRANN  FS-ANN    FS-MLR
RMSEP         Training set  2.4942    3.0756    3.6511
              Test set      1.6781    2.0177    2.3258
RSEP (%)      Training set  9.3819    11.5692   13.734
              Test set      6.9166    8.3165    9.5866
MAE (%)       Training set  18.996    20.5577   22.592
              Test set      27.636    29.0280   33.229
r2            Training set  0.9142    0.8641    0.8082
              Test set      0.8901    0.8423    0.7956
F statistic   Training set  575.54    343.3514  227.52
              Test set      129.54    85.46904  62.286
t-test        Training set  23.990    18.52974  15.083
              Test set      11.382    9.244947  7.8921
R02           Training set  0.8961    0.8384    0.7711
              Test set      0.8608    0.8164    0.7852
R'02          Training set  0.9123    0.8638    0.8081
              Test set      0.8877    0.8423    0.7837
R0m2          Training set  0.7912    0.7256    0.6525
              Test set      0.7377    0.7067    0.7144
R'0m2         Training set  0.8743    0.8491    0.8001
              Test set      0.8465    0.8423    0.7088
Ra2           Training set  0.9056    0.8505    0.7890
              Test set      0.8443    0.7766    0.7104
Figure 1
Figure 2
Figure 3