High-Throughput Multi-dimensional Scaling (HiT-MDS) for cDNA-Array Expression Data

M. Strickert¹, S. Teichmann², N. Sreenivasulu¹, and U. Seiffert¹

¹ Pattern Recognition Group, Gene Expression Group, Institute of Plant Genetics and Crop Plant Research Gatersleben, {stricker, srinivas, seiffert}@ipk-gatersleben.de
² University of Osnabrück, [email protected]

Abstract. Multidimensional Scaling (MDS) is a powerful dimension reduction technique for embedding high-dimensional data into a low-dimensional target space. Thereby, the distance relationships in the source space are reconstructed in the target space as faithfully as possible according to a given embedding criterion. Here, a new stress function with intuitive properties and very good convergence behavior is presented. Its optimization is combined with an efficient implementation for calculating dynamic distance matrix correlations, and this implementation can be transferred to other related algorithms. The suitability of the proposed MDS for high-throughput data (HiT-MDS) is studied in applications to macroarray analysis with up to 12,000 genes.

Keywords: Multi-dimensional scaling, clustering, gene expression analysis.

1 Introduction

Data analysis is a multi-stage process that usually starts with the inspection of the raw data, from which iteratively more sophisticated data models are formulated. Thus, for unknown data, such as gene expression levels from a first series of screening experiments, many subsequent assumptions are based upon the groundwork of the initial results. Visualization techniques in particular can help to gain basic knowledge about the data. For a moderate number of continuous-valued attributes, scatter plots provide very direct access to the data. For high-dimensional input, linear mapping techniques like principal component analysis (PCA) or projection pursuit (PP) can help to focus on data projections which are 'interesting' from the mathematical point of view, e.g. the directions of largest variance or views on the central data mass [3,14]. Self-organizing maps (SOM) are well-established non-linear neural tools for clustering data on a rectangular or a hexagonal lattice of specialized neurons [9]. The SOM, however, imposes a data model that is structurally different from linear mappings. Since data are represented by prototypes, the one-to-one correspondence between input data and their low-dimensional counterparts vanishes: in the case of Ultsch's emergent SOM there are more prototypes than input points, but usually input samples

W. Duch et al. (Eds.): ICANN 2005, LNCS 3696, pp. 625–633, 2005. © Springer-Verlag Berlin Heidelberg 2005


are mapped to a smaller number of specialized neurons. As a result, prototype-based representations give a more abstract view of the data, which is beneficial for model generalization and simplification but not for revealing details of the data distribution or of the data granularity. A nonlinear method that assigns each data point a related copy in the target space is multi-dimensional scaling (MDS) [10]. Its most common objective for visualization is arranging the low-dimensional copies in such a way that their mutual distances best match the original distances. This optimization with respect to good surrogate locations is realized by stress function minimization, for which several alternatives exist [2]. Here, MDS is realized by a stochastic gradient descent on a stress function with superior convergence properties. The proposed high-throughput HiT-MDS method is suitable for embedding very large data sets in a distance-preserving way into Euclidean space, usually with 2 or 3 dimensions for visualization. Using MDS for large data sets is closely related to the suggested speed-up of Sammon's mapping by Pekalska et al. [12] and to the work of Naud and Duch, who combine MDS with the data prototypes from a learning vector quantizer [11]. The topic presented here, MDS for gene expression analysis, coincides with a recent publication of Taguchi and Oono introducing a non-metric MDS variant called nMDS [6]. Like other MDS approaches, the proposed HiT-MDS can be used for non-metric data too, for example by considering the rank order of mutual data items; non-symmetric dissimilarity measures and sparse distance matrices can also be handled to obtain global reconstructions of incomplete local distance information.

2 Improved Multi-dimensional Scaling

The challenge of multi-dimensional scaling lies in the optimization of the free parameters x̂_i ∈ X̂_{n×d}, which are the locations of points x̂_i = (x̂_{i1}, …, x̂_{id}) in the d-dimensional target space corresponding to the i = 1…n input points x_i ∈ X_{n×q}. The most canonic approach to forcing the mutual distances d̂_{ij} = d(x̂_i, x̂_j) of all data pairs indexed by (i, j) towards the original distances d_{ij} = d(x_i, x_j) is defined by minimizing the raw stress function of classical scaling [5]:

  s = Σ_{i<j} (d_{ij} − d̂_{ij})² = min,   with distances   d_{ij}(x_i, x_j) = Σ_{k=1}^{q} (x_{ik} − x_{jk})².

For computational convenience, the squared Euclidean distance is considered, and symmetry d_{ij} = d_{ji} can be assumed for the distance matrix. However, as pointed out by Basalaj [1], the straightforward minimization of this stress by gradient descent is prone to getting stuck in local optima. For Euclidean distances, though, eigenvalue methods can be used to find a solution, making classical scaling equivalent to PCA [5]. The freely available XGvis package by Buja et al. [2] implements different types of MDS for which eigenmethods fail; XGvis comes with very good documentation of the technical details. Their stress function is of the general form s = (1 − cos²ϕ)^{1/2}, which is based on the squared cosine between the original distance matrix D and the reconstructed distance


matrix D̂; thereby, the variables in ϕ are parameters of the used Minkowski metric, for data weighting, and for the specification of the order of the moments of the original distances. Here, an approach is taken which is less versatile than the one of Buja et al., but which yields very faithful overall data representations: such most 'honest' data displays are aimed at by maximizing the Pearson correlation of distances:

  r(D, D̂) = Σ_{i≠j} (d_{ij} − µ_D) · (d̂_{ij} − µ_D̂) / sqrt( Σ_{i≠j} (d_{ij} − µ_D)² · Σ_{i≠j} (d̂_{ij} − µ_D̂)² ) =: B(d̂) / sqrt( C · D(d̂) ) ∈ [−1; 1].

Matrix D = (d_{ij})_{i,j=1…n} contains the pattern distances and D̂ = (d̂_{ij})_{i,j=1…n} those of the reconstructions. Maximizing r yields a balanced consideration of all distance matrix entries; to the authors' knowledge, this maximization cannot be solved with eigenmethods, which motivates the proposed heuristic solution. The right fraction is a convenient one-to-one correspondence to the left term used in the following: B(d̂) is related to the mixed summation of both original and reconstructed distances, D(d̂) refers to the dissimilarities dependent on the choices of the reconstructions X̂, and C denotes the connection to the initially calculated and thus constant input pattern distances. In order to formulate an efficient correlation-based stress function for minimization, the tight range [−1; 1] of the Pearson correlation (−1 denoting anti-correlation, 0 uncorrelatedness, and 1 perfect correlation) is swapped and widened by an inverse power transform:
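The correlation of distance matrices above can be sketched in NumPy, with the decomposition into the terms B(d̂), C and D(d̂) made explicit (a minimal illustration; the function and variable names are ours, not from the paper):

```python
import numpy as np

def dist_corr(D, D_hat):
    """Pearson correlation r(D, D_hat) over all off-diagonal entries,
    decomposed into the mixed term B, the constant input term C and the
    reconstruction term D(d_hat) named in the text."""
    off = ~np.eye(D.shape[0], dtype=bool)   # exclude the i == j entries
    cd = D[off] - D[off].mean()
    cdh = D_hat[off] - D_hat[off].mean()
    B = np.sum(cd * cdh)                    # mixed term B(d_hat)
    C = np.sum(cd ** 2)                     # constant input term C
    Dd = np.sum(cdh ** 2)                   # reconstruction term D(d_hat)
    return B / np.sqrt(C * Dd)

# a distance matrix correlates perfectly with any positively scaled copy
rng = np.random.default_rng(0)
X = rng.random((10, 5))
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
print(dist_corr(D, 3.0 * D))   # 1.0 up to rounding
```

Note that any positive rescaling of the reconstructed distances leaves r unchanged, which is why the embedding only needs to reproduce distance relations, not absolute distances.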

  s = r(D, D̂)^{−K},   for Euclidean distances   D̂ = ( sqrt( Σ_{l=1}^{d} (x̂_{il} − x̂_{jl})² ) )_{j≠i, i,j=1…n}.

Integer K > 0 is assumed. For even K, best 'inverse' point configurations with anti-correlated distance relationships might be found. This theoretical situation, however, has never been encountered experimentally; perhaps the applied random projection initialization of the embedded points prevents such a defect. Thus, no matter which relationships are coded by D, the inverse power r^{−K} can be minimized by optimally arranging the reconstructions X̂ in the Euclidean target space. This is achieved by a gradient descent on the stress function s, which requires finding zeros of the derivatives of s w.r.t. the free parameters x̂_{ik}:

  s = r^{−K} ∘ D̂ ∘ X̂ = min   ⇒   ∂s/∂x̂_{ik} = Σ_{j≠i, j=1…n} ∂r^{−K}/∂d̂_{ij} · ∂d̂_{ij}/∂x̂_{ik} = 0,   i = 1…n.   (1)

Solutions are found by iterative updates Δx̂_{ik} = −γ · ∂s/∂x̂_{ik} of step size γ into the direction of the steepest gradient of s. The missing derivatives in Eqn. 1 are

  ∂r^{−K}/∂d̂_{ij} = ∂( (C · D(d̂))^{K/2} / B^{K}(d̂) ) / ∂d̂_{ij} = K · r^{−K} · [ (d̂_{ij} − µ_D̂) · B⁻_{ij}(d̂) − (d_{ij} − µ_D) · D⁻_{ij}(d̂) ] / ( B(d̂) · D(d̂) ),

  with   B⁻_{ij}(d̂) = B(d̂) − (d_{ij} − µ_D) · (d̂_{ij} − µ_D̂),   D⁻_{ij}(d̂) = D(d̂) − (d̂_{ij} − µ_D̂)²,

  and   ∂d̂_{ij}/∂x̂_{ik} = 2 · (x̂_{ik} − x̂_{jk}) / ( 2 · sqrt( Σ_{l=1}^{d} (x̂_{il} − x̂_{jl})² ) ) = (x̂_{ik} − x̂_{jk}) / d̂_{ij}.
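As a sanity check, the closed-form derivative of the stress with respect to a single target distance can be compared against a central finite difference. The sketch below assumes the simplified form in which the leave-one-out terms B⁻, D⁻ are replaced by the full B(d̂) and D(d̂), which is exact for the plain correlation of mean-centered distance lists; all names are illustrative:

```python
import numpy as np

def r_of(d, dh):
    """Pearson correlation of two flattened distance lists."""
    cd, cdh = d - d.mean(), dh - dh.mean()
    return np.sum(cd * cdh) / np.sqrt(np.sum(cd ** 2) * np.sum(cdh ** 2))

def stress(d, dh, K):
    return r_of(d, dh) ** (-K)

def stress_grad(d, dh, K):
    """Analytic d s / d dh_j for s = r^(-K), using the full terms B and
    D(dh) in place of the leave-one-out B-, D- (assumption of this sketch)."""
    cd, cdh = d - d.mean(), dh - dh.mean()
    B, Dd = np.sum(cd * cdh), np.sum(cdh ** 2)
    r = B / np.sqrt(np.sum(cd ** 2) * Dd)
    return K * r ** (-K) * (cdh / Dd - cd / B)

rng = np.random.default_rng(1)
d = rng.random(20)                        # fixed input distances (i != j pairs)
dh = d + 0.1 * rng.standard_normal(20)    # target distances, well correlated
g = stress_grad(d, dh, K=2)

# central finite difference on a single entry
eps = 1e-6
dp, dm = dh.copy(), dh.copy()
dp[3] += eps
dm[3] -= eps
num = (stress(d, dp, 2) - stress(d, dm, 2)) / (2 * eps)
print(abs(num - g[3]))   # close to zero
```

Note that the gradient accounts for the dependence of the means µ on the distances: since the centered lists sum to zero, the mean terms drop out of the derivative.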


The embedding procedure is outlined in Algorithm 1. The high-dimensional data items to be embedded are used for the initialization of their low-dimensional counterparts by random linear mapping. Random mappings provide a first shot at loosely distance-preserving initial target point configurations [8]; here, the matrix elements of R (Alg. 1, l. 2) are randomly taken from [−1; 1]. The input data are also used for calculating the distance matrix D which, in an alternative initialization setup, could be taken directly from file. For computational convenience the mean is subtracted from D, which does not affect the Pearson correlation computations r(D, D̂). Target point adaptation is done iteratively in a neural network training manner: in each step a random point is taken and its distance relations to the other points are improved by changing its coordinates. Convergence takes place when the stress function cannot be further minimized for any point, i.e. when the correlation r(D, D̂) is maximum.

Algorithm 1 HiT-MDS
1: Read input data X.
2: Initialize X̂ by random projection X̂_{n×d} = X_{n×q} · R_{q×d}.
3: Calculate distance matrix D and subtract its mean ⇒ constant C.
4: Calculate D̂ ⇒ initial B(d̂), D(d̂).
5: repeat
6:   Draw a pattern index 1 ≤ i ≤ n from a randomly shuffled list.
7:   for all j ≠ i do
8:     Δx̂_i ← Δx̂_i − ∂r^{−K}/∂d̂_{ij} · ∂d̂_{ij}/∂x̂_i   { accumulate derivatives corresponding to Eqn. 1 }
9:   end for
10:  x̂_i ← x̂_i + γ · Δx̂_i   { adapt location of target point }
11:  Recalculate distances d̂(x̂_i, x̂_j) influenced by the new position of point x̂_i;
12:  thereby, keep track of changes in B(d̂) and D(d̂).
13: until convergence criterion is met.
14: Postprocess: center and normalize X̂ by largest dimension variance.

A substantial speedup is obtained if, for each update step, the distance matrix correlation is not completely recalculated but incrementally adjusted. An unoptimized implementation would take all O(n²) matrix elements into consideration, but since only n − 1 distances d̂_{ij} change during each iteration on pattern i, much computing power is saved by tracking only the changes Δd̂_{ij} in the constituents B(d̂) and D(d̂) of r. Generally, new values are expressed by means of existing old values plus the specific changes caused by the adaptation of the i-th target point. Reformulating the average target distance yields:

  µ(D̂^new) = µ(D̂^old + ΔD̂^i) = µ(D̂^old) + µ(ΔD̂^i),   ΔD̂^i = (Δd̂_{kj})_{k,j=1…n},

whose elements Δd̂_{kj} are zero unless k = i or j = i. In the same manner, incremental update formulas can be given for the constituents of the correlation r and thus for the terms in the stress function s that depend on the target distances (Alg. 1, l. 12):

  B^new(d̂) = B^old(d̂) + 2 · Σ_{j≠i, j=1…n} Δd̂_{ij} · d_{ij},

  D^new(d̂) = D^old(d̂) + 2 · Σ_{j≠i, j=1…n} Δd̂_{ij} · ( Δd̂_{ij} + 2 · d̂^old_{ij} ) − (n² − n) · T,   with T := µ(ΔD̂^i) · ( 2 · µ(D̂^old) + µ(ΔD̂^i) ).
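The incremental bookkeeping can be verified numerically against a full recomputation. The sketch below is our own reading of the update formulas (the factor 2 accounts for the symmetric row and column of the moved point i); all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
Xh = rng.random((n, 2))
off = ~np.eye(n, dtype=bool)

def dists(Y):
    return np.sqrt(((Y[:, None] - Y[None]) ** 2).sum(-1))

# fixed, mean-subtracted input distances (their off-diagonal sum is 0)
d = dists(rng.random((n, 3)))
d = d - d[off].mean()

Dh_old = dists(Xh)
mu_old = Dh_old[off].mean()
B_old = (d[off] * Dh_old[off]).sum()
D_old = ((Dh_old[off] - mu_old) ** 2).sum()

# move point i; only row and column i of the distance matrix change
i = 3
Xh[i] += rng.normal(scale=0.1, size=2)
Dh_new = dists(Xh)
delta = Dh_new[i] - Dh_old[i]            # Delta d_ij (the j = i entry is 0)

N = n * n - n
mu_delta = 2 * delta.sum() / N           # mu(Delta Dh^i); factor 2: row and column
mu_inc = mu_old + mu_delta
B_inc = B_old + 2 * (delta * d[i]).sum()
T = mu_delta * (2 * mu_old + mu_delta)
D_inc = D_old + 2 * (delta * (delta + 2 * Dh_old[i])).sum() - N * T

# full recomputation for comparison
mu_full = Dh_new[off].mean()
B_full = (d[off] * Dh_new[off]).sum()
D_full = ((Dh_new[off] - mu_full) ** 2).sum()
print(abs(mu_inc - mu_full), abs(B_inc - B_full), abs(D_inc - D_full))
```

The incremental forms touch only the n − 1 changed entries instead of all n² − n, which is what turns each update step from O(n²) into O(n).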

MDS with the new stress function in combination with the above speedup formulas is designed for high-throughput operation and is therefore referred to as HiT-MDS.¹ The free parameters of HiT-MDS are the exponent K of the stress function and the step size γ, which can be regarded as a learning rate. Empirical studies on different data sets have shown that the proposed algorithm is very robust with respect to the choices of γ and K. A two-phase schedule (first a rough ordering with K = 4 and increasing γ = 10⁻³, 10⁻², 10⁻¹, then fine tuning with K = 1 and γ = 5, 25, 100, 250) is likely to meet the requirements even of unknown data sets; usually, between 10 and 500 training epochs suffice to obtain good convergence for the current parameter set.

3 Application to cDNA Array Expression Data

Two studies are presented that show the suitability of HiT-MDS for the analysis of high-throughput expression data. In the experimental design, several thousand gene expression patterns were analyzed which correspond to barley seed development in the time span 0–26 days after flowering, in three distinct tissues: pericarp (p), endosperm (end), and embryo (e). The conducted experiments resulted in high-throughput expression data with a relatively small number of experiments and a large number of gene expression levels. The two major questions of interest are: 1. How are the experiments, representing the tissues at a particular developmental stage, characterized with respect to the transcriptome similarity of specifically expressed genes? 2. Can inter-experimental clusters of coexpressed genes be identified to which, afterwards, regulatory functions can be assigned? In order to address these key questions it is necessary to bring the high-dimensional data into a lower dimension for visualization, thereby keeping their original distances. HiT-MDS is applied to obtain such visualizations, first for grouping the experiments with similar transcriptomes, and second for identifying hot-spots of coexpressed genes in unfiltered expression data.

1. Experiment Clustering. The first HiT-MDS application aims at the visualization of the inter-experiment relationships: a two-dimensional embedding is wanted that reconstructs the distances of the original space of gene expression intensities. The common way to do this is principal component analysis (PCA). More recently, independent component analysis (ICA), a linear decomposition technique, has been used [7], or locally linear embeddings (LLE) are computed [13]. However, none of these methods is explicitly formulated, like MDS, for the most natural approach to the original data, which is the best available

¹ A C-language command line version is available at http://hitmds.webhop.net/.

distance reconstruction. The analyzed data set comprises 28 experiments covering 7 developmental time points of endosperm/embryo (e) and pericarp (p) tissue, each reproduced by two independent experiments (1, 2); each experiment is represented by 1,421 log-transformed gene expression values.

Fig. 1. Gene expression experiments embedded in 2D. Left: PCA. Right: HiT-MDS.

Fig. 1 compares the experiment visualization of PCA (left) and HiT-MDS (right). Both embeddings show a temporal ordering from 00 to 12 days, but the tissues e vs. p are more strictly separated by HiT-MDS, and the 00–04e1/2 cluster as well as the 08p1/2 pair are more faithfully spread. Distance correlations characterize the reconstruction quality; they are r(X, X̂_PCA) = 0.973 and r(X, X̂_HiT-MDS) = 0.981, which supports higher confidence in the HiT-MDS embedding. For these obtained groupings, the identification of regulating key genes will be of interest in future investigations.

2. Gene Clustering. A much more demanding experiment is conducted with data from 12k-arrays (11,786 genes) available for 31 developmental stages of three tissue types. The left panel of Fig. 2 shows a density plot of the embedded 31-dimensional genes, zoomed to the region of highest interest. Dark patches represent areas of high density, i.e. many points with small Euclidean distance, leading to an overall correlation of r(X, X̂_HiT-MDS) = 0.979. In contrast, PCA yields a correlation of only r(X, X̂_PCA) = 0.937, which would result in misleading interpretations of a PCA-based density plot (omitted). HiT-MDS becomes even more useful for coexpression analysis if the Euclidean distance measure for the 31-dimensional gene vectors is replaced by (1 − correlation) between them; then the maximum dissimilarity is 2 (anti-correlated profiles) and the best value is 0 (perfectly correlated profiles). Thus, a 2D embedding of such a dissimilarity matrix will cluster gene profiles of high correlation opposed to anti-correlated groups of genes. This is illustrated in the right panel of Fig. 2 for a preselected subset of 2000 embedded genes.
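The r(X, X̂) quality figures reported here can in principle be reproduced by correlating the pairwise-distance matrices of the data and of an embedding. A minimal sketch with a plain SVD-based PCA on random stand-in data (all names and the data are illustrative, not the barley measurements):

```python
import numpy as np

def embed_corr(X, Xh):
    """r(X, Xh): correlation between the pairwise distances of the data X
    and of its low-dimensional embedding Xh (the quality figure above)."""
    def pdists(Y):
        n = Y.shape[0]
        D = np.sqrt(((Y[:, None] - Y[None]) ** 2).sum(-1))
        return D[~np.eye(n, dtype=bool)]
    a, b = pdists(X), pdists(Xh)
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())

def pca_embed(X, dim=2):
    """Plain PCA: project centered data onto the top principal axes."""
    Xc = X - X.mean(0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T

# stand-in data: 28 'experiments' with 40 expression values each
rng = np.random.default_rng(3)
X = rng.random((28, 40))
print(embed_corr(X, pca_embed(X)))   # quality of the linear 2D embedding
```

The same `embed_corr` score applies to any embedding, which is what makes the PCA-vs-HiT-MDS comparison above a like-for-like one.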
By focusing on these fewer points, i.e. by loosening the global embedding constraints, more details become visible. The gene profiles associated with the points inside the encircled high-density areas, picked as examples, are plotted in Fig. 3. A good spatial separation and specificity is observed for the embedded genes and their corresponding relevance profiles, making the proposed approach a valuable tool for gene expression analysis: it is demonstrated that by HiT-MDS hot-spots of

Fig. 2. Genes embedded in 2D. Left: Euclidean inter-gene embedding of 12k genes. Right: correlation-based similarity on a 2000-gene subset with finer clustering details.

coexpressed genes can be identified despite the presence of noise in the expression data by simultaneously processing all available data. Such coexpressed genes will be further examined for their functional role.
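The (1 − correlation) dissimilarity used for the right panel of Fig. 2 is straightforward to construct; a small sketch (illustrative names) in which coexpressed profiles get dissimilarity near 0 and anti-correlated ones near 2:

```python
import numpy as np

def corr_dissim(G):
    """Dissimilarity 1 - corr(g_i, g_j) between gene profiles (rows of G):
    0 for perfectly correlated profiles, 2 for anti-correlated ones."""
    return 1.0 - np.corrcoef(G)

rng = np.random.default_rng(4)
g = rng.random(31)                     # a 31-stage expression profile
G = np.vstack([g, 2.0 * g + 1.0, -g])  # the profile, a scaled copy, its negation
Dis = corr_dissim(G)
print(Dis[0, 1], Dis[0, 2])            # ~0 (coexpressed) and ~2 (anti-correlated)
```

Because correlation ignores offsets and positive scalings, this measure groups genes by the shape of their temporal profile rather than by absolute expression level.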

4 Conclusions

The proposed improvement of multi-dimensional scaling, HiT-MDS, allows embedding very large data sets into a low-dimensional Euclidean space, making the method suitable for visual data inspection. As a nonlinear method, HiT-MDS is able to capture even minor differences in the data by faithfully reconstructing given distance relations. The corresponding low-dimensional surrogates are obtained by a neural-network type of incremental training that minimizes a fast-converging stress function. This function is based on the correlation between input and output distances; thereby, functional terms contributing to the correlation calculation are also updated in an incremental way. In contrast to SOM learning, HiT-MDS is not a mapping technique, which restricts its applicability to static data sets. However, no topological parameters have to be chosen; consequently, HiT-MDS is not subject to topological restrictions of an underlying

Fig. 3. Gene profiles corresponding to cluster C1 (left) and C2 (right) of Fig. 2.

neuron grid, and the success of the data model can be well evaluated by the correlation between the input distances and those of the embedded points. Obviously, HiT-MDS can be applied to many problems beyond bioinformatics, e.g. for the visualization of high-dimensional items, for graph layout [4], and also for space conversions like the illustrated one from correlation space to Euclidean space. Future attention is put on further improvements of the stress function: first preliminary results show that even faster initial convergence and more tolerant parameter choices are achieved by replacing the presented exponentiated correlations by Fisher's Z transform. Finally, parallel implementations on SMP machines are under consideration for making interactive zooming into large subsets of the analyzed data easily possible.

Acknowledgement. Thanks to Dr. Volodymyr Radchuk for helping with probe preparation for the macroarray experiments. Thanks also to the anonymous reviewers for helping to improve this contribution with their valuable comments. The work is supported by BMBF grant FKZ 0313115, GABI-SEED-II.

References

1. W. Basalaj. Proximity Visualization of Abstract Data. PhD thesis, Computer Lab, Univ. of Cambridge, URL: http://www.pavis.org/essay/pavis.pdf, 2001.
2. A. Buja, D. Swayne, M. Littman, N. Dean, and H. Hofmann. Interactive Data Visualization with Multidimensional Scaling. Report, University of Pennsylvania, URL: http://www-stat.wharton.upenn.edu/~buja/, 2004.
3. D. Cook, A. Buja, and J. Cabrera. Grand tour and projection pursuit. Journal of Computational and Graphical Statistics, 4(3):155–172, 1995.
4. E. Gansner, Y. Koren, and S. North. Graph drawing by stress majorization. In Lecture Notes in Computer Science, Proc. of 12th Int. Symp. Graph Drawing (GD'04). Springer, 2004.
5. J. Gower. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53:325–338, 1966.
6. Y.-h. Taguchi and Y. Oono. Relational patterns of gene expression via non-metric multidimensional scaling analysis. Bioinformatics, 21(6):730–740, 2005.
7. A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. John Wiley & Sons, 1st edition, 2001.
8. S. Kaski. Dimensionality reduction by random mapping: Fast similarity computation for clustering. In Proc. of the International Joint Conf. on Neural Networks (IJCNN'98), volume 1, pages 413–418. IEEE Service Center, Piscataway, NJ, 1998.
9. T. Kohonen. Self-Organizing Maps. Springer-Verlag, Berlin, 3rd edition, 2001.
10. J. Kruskal and M. Wish. Multidimensional Scaling. Sage Publications, New Park, California, 1978.
11. A. Naud and W. Duch. Visualization of large data sets using MDS combined with LVQ. In L. Rutkowski and J. Kacprzyk, editors, Advances in Soft Computing, pages 632–637. Physica Verlag, 2002.
12. E. Pekalska, D. de Ridder, R. Duin, and M. Kraaijveld. A new method of generalizing Sammon mapping with application to algorithm speed-up. In M. Boasson, J. Kaandorp, J. Tonino, and M. Vosselman, editors, Proc. 5th Annual Conf. of the Advanced School for Computing and Imaging, pages 221–228. ASCI, 1999.


13. S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
14. M. Strickert, S. Teichmann, N. Sreenivasulu, and U. Seiffert. 'DiPPP' Online Self-Improving Linear Map for Distance-Preserving Macro-Array Data Analysis. To appear at WSOM 2005.
