Lorant Bodis¹, Alfred Ross², Ernö Pretsch¹

¹ Laboratory of Organic Chemistry, ETH Hönggerberg, CH-8093 Zürich, Switzerland
² Pharmaceuticals Division, F. Hoffmann-La Roche Ltd, CH-4070 Basel, Switzerland

Novel Similarity Measure for Comparing Spectra

Introduction

Tests with Artificial Spectra

•Performance of the bin method in comparison with other similarity criteria:

•Ten arbitrarily chosen compounds and the corresponding predicted 1H NMR spectra:

[Figure: similarity values (0 to 1.0) for unrelated spectra pairs.]

•Additionally, for each structure, two further spectra were calculated in which the multiplets were randomly shifted according to a normal distribution with a standard deviation (SD) of 0.2 ppm and 0.4 ppm, respectively.
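The random perturbation of the multiplets can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the representation of a spectrum as a list of (center in ppm, integral) tuples, the function name `shift_multiplets`, the seed, and the example values are all hypothetical.

```python
import random

def shift_multiplets(multiplets, sd_ppm, seed=None):
    """Return a copy of `multiplets` with every center shifted by a draw
    from a normal distribution N(0, sd_ppm).

    `multiplets` is a hypothetical representation: a list of
    (center_ppm, integral) tuples, one per signal group.
    """
    rng = random.Random(seed)
    return [(center + rng.gauss(0.0, sd_ppm), integral)
            for center, integral in multiplets]

# Made-up multiplet list for illustration only (not taken from the poster).
original = [(1.25, 3.0), (2.40, 2.0), (7.30, 5.0)]
shifted_02 = shift_multiplets(original, sd_ppm=0.2, seed=1)
shifted_04 = shift_multiplets(original, sd_ppm=0.4, seed=1)
```

Only the positions change; the integrals are left untouched, so the total number of H atoms represented by the spectrum is preserved.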

•For each division, the similarity index, SIn, is calculated:

[Figure: SIn and SIn* (0.00 to 1.00) versus the number of bins, n (0 to 50); top axis: bin width in ppm.]

Similarity of two functions f(x) and g(x) [3]:

$$S_{fg} = \frac{\int w(r)\, c_{fg}(r)\, dr}{\sqrt{\int w(r)\, c_{ff}(r)\, dr \int w(r)\, c_{gg}(r)\, dr}}, \qquad c_{fg}(r) = \int f(x)\, g(x + r)\, dx$$

with c_fg(r) as the cross-correlation function, c_ff(r) and c_gg(r) as the autocorrelation functions, and w(r) as the triangular weighting function.

[Figure: similarity values of the spectra pairs (entries up to 65); legend: SD, 0.4 ppm; SD, 0.2 ppm.]

•Cross-correlation method: triangle weighting, 1.4 ppm cut-off range; overlap: 306 (27%)

[Figure: contingency diagram; S axis from 0 to 1.0; regions: true positive, true negative, false positive, false negative.]

Conclusions

•Similarity of related 1H NMR spectra has been successfully detected by a novel method based on dividing the spectra into bins.

•It has been shown that the correlation coefficient does not provide a useful similarity measure and that the recently introduced cross-correlation-based method performs less well than our novel similarity measure.

•Applying the new method to spectra of two or more dimensions, including image analysis, is straightforward.


•Bin method: minimal bin width of 0.4 ppm; overlap: 138 (12%)

[Figure: number of cases versus S.]

Cross-correlation Method


•Bin method: minimal bin width of 0.4 ppm

[Figure: number of cases versus S, with regions labelled true positive, true negative and false positive.]

•Cross-correlation method: triangle weighting, 1.4 ppm cut-off range
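A discrete version of the weighted cross-correlation similarity of de Gelder et al. [3] can be sketched as follows. The sketch rests on assumptions that are not stated on the poster: spectra are sampled on a common uniform grid, lags and the cut-off are counted in grid points (a 1.4 ppm cut-off would first have to be converted using the grid spacing), and the triangular weight is taken as 1 − |r|/cutoff. All function names are hypothetical.

```python
import math

def cross_correlation(f, g, r):
    """Discrete c_fg(r) = sum over x of f(x) * g(x + r) for an integer lag r."""
    n = len(f)
    return sum(f[i] * g[i + r] for i in range(n) if 0 <= i + r < n)

def triangle_weight(r, cutoff):
    """Assumed triangular weight: 1 at r = 0, falling linearly to 0 at |r| = cutoff."""
    return max(0.0, 1.0 - abs(r) / cutoff)

def weighted_cc_similarity(f, g, cutoff):
    """S_fg: weighted cross-correlation normalized by the geometric mean
    of the two weighted autocorrelations."""
    lags = range(-cutoff + 1, cutoff)

    def weighted(a, b):
        return sum(triangle_weight(r, cutoff) * cross_correlation(a, b, r)
                   for r in lags)

    return weighted(f, g) / math.sqrt(weighted(f, f) * weighted(g, g))

# Made-up single-peak "spectra" on an 8-point grid.
f = [0, 0, 1, 0, 0, 0, 0, 0]
g = [0, 0, 0, 1, 0, 0, 0, 0]   # same peak, shifted by one grid point
h = [0, 0, 0, 0, 0, 0, 0, 1]   # peak outside the cut-off range
s_near = weighted_cc_similarity(f, g, cutoff=3)
s_far = weighted_cc_similarity(f, h, cutoff=3)
```

In this toy case the slightly shifted peak still scores well above zero, while the peak beyond the cut-off contributes nothing.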


•Histogram of similarity values, S, of measured and calculated spectra using correct and random structure assignments:

•Comparison with contingency diagrams: threshold values of S that are too low classify incorrect pairs as correct ones (false positives), whereas threshold values that are too high increase the number of false negatives.
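The effect of the threshold on the contingency counts can be illustrated with a small sketch. It assumes that a pair is accepted as matching when its similarity is at or above the threshold; the function name and all score values are made up for illustration and are not results from the poster.

```python
def contingency_counts(normal_scores, random_scores, threshold):
    """Count contingency cells for one threshold on S.

    normal_scores: S values for correct (normal) assignments.
    random_scores: S values for random assignments.
    Assumption: a pair counts as "matching" when S >= threshold.
    """
    tp = sum(s >= threshold for s in normal_scores)
    fp = sum(s >= threshold for s in random_scores)
    return {"TP": tp, "FN": len(normal_scores) - tp,
            "FP": fp, "TN": len(random_scores) - fp}

# Made-up similarity values, for illustration only.
normal_scores = [0.9, 0.8, 0.75, 0.4]
random_scores = [0.1, 0.3, 0.5, 0.2]
low = contingency_counts(normal_scores, random_scores, threshold=0.25)
high = contingency_counts(normal_scores, random_scores, threshold=0.85)
```

With the low threshold the false positives dominate the errors; with the high threshold the false negatives do.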

•Ideally, the comparison of spectra belonging to different structures should result in a low similarity, and the comparison with the randomly modified spectrum of the same structure in a high one.

•The ten spectra are compared with those corresponding to other structures (entries 1–45) and with those having randomly shifted signal groups (entries 46–55: SD = 0.4 ppm and entries 56–65: SD = 0.2 ppm). The last two sets correspond to an average of the results obtained with 100 randomly shifted spectra.

[Figure: axis labels "Spectra pairs" and "Number of cases"; panel title "Correlation coefficient".]

•1146 1H NMR spectra derived from a library of Chemical Concepts [4].

•Each measured spectrum was compared with two predicted spectra:

•one on the basis of the correct structure (normal assignment) and

•the other based on a randomly selected structure from the library (random assignment)

•Ideally, all normal comparisons should lead to a high, and the random ones to a low similarity value.

Bin Method

Similarity of two spectra x and y:

•The total integral of each individual spectrum is normalized to the number of H atoms in the corresponding molecule.

•The spectra are successively divided into n bins (n = 1, …, N, N being the maximal number of bins).

$$SI_n = \frac{I_{xy}(n)}{I_x + I_y - I_{xy}(n)}, \qquad I_{xy}(n) = \sum_{i=1}^{n} \min\bigl(I_x(i),\, I_y(i)\bigr)$$

where I_x and I_y are the total integrals of the spectra x and y; I_x(i) and I_y(i) are the integrated intensities of the respective spectra within bin i.

•Similarity value:

$$S = \frac{1}{N} \sum_{n=1}^{N} SI_n^{*}$$
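The steps of the bin method (normalize the integrals, divide into n bins, compute SI_n, average over n) can be sketched as follows. This is a hedged illustration, not the authors' implementation. It assumes spectra sampled on a common uniform grid and bins of nearly equal width defined on that grid. The poster's figure labels mention a starred index SIn* whose definition does not appear in the text, so the sketch simply averages the plain SI_n; that substitution is an assumption. All function names and example spectra are hypothetical.

```python
def normalize(intensities, n_protons):
    """Scale a spectrum so that its total integral equals the number of H atoms."""
    total = sum(intensities)
    return [v * n_protons / total for v in intensities]

def bin_integrals(intensities, n):
    """Integrate a uniformly sampled spectrum over n nearly equal-width bins."""
    m = len(intensities)
    edges = [round(k * m / n) for k in range(n + 1)]
    return [sum(intensities[edges[k]:edges[k + 1]]) for k in range(n)]

def similarity_index(x, y, n):
    """SI_n = I_xy(n) / (I_x + I_y - I_xy(n)), with I_xy(n) the sum of bin-wise minima."""
    bx, by = bin_integrals(x, n), bin_integrals(y, n)
    i_xy = sum(min(a, b) for a, b in zip(bx, by))
    return i_xy / (sum(x) + sum(y) - i_xy)

def bin_similarity(x, y, max_bins):
    """Average of SI_n over n = 1..max_bins.

    Assumption: plain SI_n is averaged, because the starred index is
    not defined in the text.
    """
    return sum(similarity_index(x, y, n)
               for n in range(1, max_bins + 1)) / max_bins

# Made-up single-peak spectra on an 8-point grid, one H atom each.
a = normalize([0, 0, 4, 0, 0, 0, 0, 0], 1)
b = normalize([0, 0, 0, 4, 0, 0, 0, 0], 1)   # peak shifted by one grid point
c = normalize([0, 0, 0, 0, 0, 0, 0, 4], 1)   # peak far away
s_same = bin_similarity(a, a, 8)
s_near = bin_similarity(a, b, 8)
s_far = bin_similarity(a, c, 8)
```

Coarse divisions still place a slightly shifted peak in the same bin, so a small shift lowers S less than a large one. The maximal number of bins follows from the minimal bin width: for instance, a 10 ppm range with a minimal bin width of 0.4 ppm would give N = 25.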


Most available vector comparison methods, such as the correlation coefficient [1] and the Tanimoto coefficient [2], can only detect pointwise similarities. Similarity criteria for comparing spectra should also take into account the neighborhood of corresponding points so that shifted signals are identified as well. So far, only a few such methods have been described. A recent method by de Gelder et al. [3] is based on a locally weighted cross-correlation function normalized by the geometric mean of the individual autocorrelation functions. Our novel similarity criterion, the bin method, performs considerably better.
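A small sketch shows why a pointwise measure cannot recognize shifted signals. It computes the Pearson correlation coefficient for made-up single-peak vectors; the function name and the data are illustrative assumptions, not material from the poster.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up single-peak vectors on an 8-point grid.
peak = [0, 0, 1, 0, 0, 0, 0, 0]
shifted = [0, 0, 0, 1, 0, 0, 0, 0]   # peak moved by one point
distant = [0, 0, 0, 0, 0, 0, 0, 1]   # peak moved by five points
r_shifted = pearson(peak, shifted)
r_distant = pearson(peak, distant)
```

Because the peaks never coincide point by point, the small and the large shift receive the same (slightly negative) coefficient, so the measure carries no information about how far the signal has moved.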

Tests with Measured Spectra

[Figure: contingency diagrams; x axis: S (0.0 to 1.0); regions labelled true positive and false negative.]

•The ten true positive pairs result from comparing the original spectra with those having randomly shifted signal groups, applying SD = 0.4 ppm (left) and SD = 0.2 ppm (right).

References

[1] K. Varmuza, M. Karlovits, W. Demuth, Anal. Chim. Acta 2003, 490, 313.
[2] P. Willett, J. B. Barnard, G. M. Downs, J. Chem. Inf. Comput. Sci. 1998, 38, 983.
[3] R. de Gelder, R. Wehrens, J. A. Hageman, J. Comp. Chem. 2001, 22, 273.
[4] Chemical Concepts GmbH, P.O. Box 100202, D-69442 Weinheim.
