Introduction To DCA

K.K M (HUST)
[email protected]

December 12, 2016


Contents

• Distance And Correlation
• Maximum-Entropy Probability Model
• Maximum-Likelihood Inference
• Pairwise Interaction Scoring Function
• Summary


Distance And Correlation

Definition of Distance

A statistical distance quantifies the distance between two statistical objects: two random variables, two probability distributions or samples, or an individual sample point and a population or a wider sample of points. A metric d satisfies:

• d(x, x) = 0
  ▶ Identity of indiscernibles
• d(x, y) ≥ 0
  ▶ Non-negativity
• d(x, y) = d(y, x)
  ▶ Symmetry
• d(x, k) + d(k, y) ≥ d(x, y)
  ▶ Triangle inequality


Distance And Correlation

Different Distances

• Minkowski distance
• Euclidean distance
• Manhattan distance
• Chebyshev distance
• Mahalanobis distance
• Cosine similarity
• Pearson correlation
• Hamming distance
• Jaccard similarity
• Levenshtein distance
• DTW distance
• KL-divergence


Distance And Correlation

Minkowski Distance

For P = (x_1, x_2, …, x_n) and Q = (y_1, y_2, …, y_n) ∈ R^n:

• Minkowski distance: d_p(P, Q) = ( Σ_{i=1}^n |x_i − y_i|^p )^{1/p}
• Euclidean distance: p = 2
• Manhattan distance: p = 1
• Chebyshev distance: p = ∞

  lim_{p→∞} ( Σ_{i=1}^n |x_i − y_i|^p )^{1/p} = max_{i=1,…,n} |x_i − y_i|

• z-transform: (x_i, y_i) ↦ ( (x_i − µ_x)/σ_x , (y_i − µ_y)/σ_y ), treating the components x_i and y_i as independent
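As a quick check of the definitions above, a small numpy sketch (the helper name `minkowski` and the sample vectors are illustrative, not from the slides):

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance d_p(x, y) = (sum_i |x_i - y_i|^p)^(1/p)."""
    return float(np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** p) ** (1.0 / p))

x, y = np.array([1.0, 2.0, 3.0]), np.array([4.0, 0.0, 3.0])

d_manhattan = minkowski(x, y, 1)      # p = 1
d_euclidean = minkowski(x, y, 2)      # p = 2
d_chebyshev = float(np.max(np.abs(x - y)))  # the p -> infinity limit

# For large p the Minkowski distance approaches the Chebyshev distance.
assert np.isclose(minkowski(x, y, 200), d_chebyshev, atol=1e-6)
```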


Distance And Correlation

Mahalanobis Distance

• Mahalanobis distance: d_M(x, y) = √( (x − y)^T C^{-1} (x − y) )
• Covariance matrix: C = L L^T
• Transformation: x ↦ x′ = L^{-1}(x − µ), which reduces d_M to a Euclidean distance
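The Cholesky factorization C = L L^T above suggests a direct implementation: whitening with L^{-1} turns the Mahalanobis distance into an ordinary Euclidean one. A minimal numpy sketch on made-up correlated data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D sample with a known population structure (illustrative).
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[4.0, 1.8], [1.8, 1.0]], size=2000)
mu = X.mean(axis=0)
C = np.cov(X, rowvar=False)           # empirical covariance matrix C
L = np.linalg.cholesky(C)             # C = L L^T

def mahalanobis(x, mu, C):
    """d_M(x, mu) = sqrt((x - mu)^T C^{-1} (x - mu))."""
    d = x - mu
    return float(np.sqrt(d @ np.linalg.solve(C, d)))

# Whitening x -> x' = L^{-1}(x - mu) turns d_M into a Euclidean norm.
x = np.array([2.0, 1.0])
x_white = np.linalg.solve(L, x - mu)
assert np.isclose(mahalanobis(x, mu, C), np.linalg.norm(x_white))
```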


Distance And Correlation

Mahalanobis Distance

Example for the Mahalanobis distance

Figure: Euclidean distance

Figure: Mahalanobis distance


Distance And Correlation

Pearson Correlation

• Inner product: Inner(x, y) = ⟨x, y⟩ = Σ_i x_i y_i
• Cosine similarity: CosSim(x, y) = Σ_i x_i y_i / ( √(Σ_i x_i²) √(Σ_i y_i²) ) = ⟨x, y⟩ / (‖x‖ ‖y‖)
• Pearson correlation:

  Corr(x, y) = Σ_i (x_i − x̄)(y_i − ȳ) / ( √(Σ_i (x_i − x̄)²) √(Σ_i (y_i − ȳ)²) )
             = ⟨x − x̄, y − ȳ⟩ / (‖x − x̄‖ ‖y − ȳ‖)
             = CosSim(x − x̄, y − ȳ)
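The identity Corr(x, y) = CosSim(x − x̄, y − ȳ) can be verified directly; the sample vectors below are illustrative:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

def cos_sim(a, b):
    """Cosine similarity <a, b> / (||a|| ||b||)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pearson correlation = cosine similarity of the mean-centered vectors.
corr = cos_sim(x - x.mean(), y - y.mean())
assert np.isclose(corr, np.corrcoef(x, y)[0, 1])
```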


Distance And Correlation

Pearson Correlation

Different Values for the Pearson Correlation


Distance And Correlation

Limits of Pearson Correlation

Q: Why not use the Pearson correlation for pairwise associations?
A: The Pearson correlation is a misleading measure of direct dependence: it reflects only the association between two variables and ignores the influence of the remaining ones.


Distance And Correlation

Partial Correlation

For M given samples of L measured variables, x^1 = (x_1^1, …, x_L^1)^T, …, x^M = (x_1^M, …, x_L^M)^T ∈ R^L, the Pearson correlation coefficient is

  r_ij = Ĉ_ij / √(Ĉ_ii Ĉ_jj)

where Ĉ_ij = (1/M) Σ_{m=1}^M (x_i^m − x̄_i)(x_j^m − x̄_j) is the empirical covariance matrix. Scaling each variable to zero mean and unit standard deviation,

  x_i ↦ (x_i − x̄_i) / √(Ĉ_ii)

simplifies the correlation coefficient to the sample mean of the product, r_ij ≡ (1/M) Σ_{m=1}^M x_i^m x_j^m.

K.K M (HUST)

Introduction To DCA

.

.

.

.

.

. . . . . . . .

. . . . . . . .

. . . . . . . .

. .

December 12, 2016

.

. .

.

.

.

.

11 / 44

.

Distance And Correlation

Partial Correlation of a Three-Variable System

The partial correlation between X and Y given a set of n controlling variables Z = {Z_1, Z_2, …, Z_n}, written r_{XY·Z}, is the correlation between the N residuals r_{X,i} and r_{Y,i} from regressing X and Y on Z:

  r_{XY·Z} = [ N Σ_{i=1}^N r_{X,i} r_{Y,i} − Σ_{i=1}^N r_{X,i} Σ_{i=1}^N r_{Y,i} ] / [ √( N Σ_{i=1}^N r_{X,i}² − (Σ_{i=1}^N r_{X,i})² ) √( N Σ_{i=1}^N r_{Y,i}² − (Σ_{i=1}^N r_{Y,i})² ) ]

In particular, when Z is a single variable, the partial correlation of x_A and x_B given x_C is

  r_{AB·C} = (r_{AB} − r_{AC} r_{BC}) / ( √(1 − r_{AC}²) √(1 − r_{BC}²) ) ≡ − (Ĉ^{-1})_{AB} / √( (Ĉ^{-1})_{AA} (Ĉ^{-1})_{BB} )
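The equivalence between the three-variable formula and the normalized inverse correlation (precision) matrix can be checked numerically; the chain A ← C → B below is a made-up example in which the Pearson correlation between A and B is entirely indirect:

```python
import numpy as np

rng = np.random.default_rng(1)
# x_C drives both x_A and x_B, so their Pearson correlation is indirect.
c = rng.normal(size=5000)
a = c + 0.3 * rng.normal(size=5000)
b = c + 0.3 * rng.normal(size=5000)

R = np.corrcoef([a, b, c])            # 3 x 3 correlation matrix
r_ab, r_ac, r_bc = R[0, 1], R[0, 2], R[1, 2]

# Three-variable formula for the partial correlation r_{AB.C} ...
r_ab_c = (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))

# ... equals minus the normalized inverse correlation matrix entry.
P = np.linalg.inv(R)
r_ab_c_inv = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])
assert np.isclose(r_ab_c, r_ab_c_inv)
```

Controlling for x_C removes almost all of the apparent correlation between x_A and x_B, which is exactly the point of the slide that follows.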


Distance And Correlation

Pearson vs. Partial Correlation

Reaction System Reconstruction


Maximum-Entropy Probability Model

Entropy

The thermodynamic definition of entropy was developed in the early 1850s by Rudolf Clausius:

  ∆S = ∫ δQ_rev / T

In 1948, Shannon defined the entropy H of a discrete random variable X with possible values {x_1, …, x_n} and probability mass function p(x) as:

  H(X) = − Σ_x p(x) log p(x)


Maximum-Entropy Probability Model

Joint & Conditional Entropy

Joint & Conditional Entropy

• Joint entropy: H(X, Y)
• Conditional entropy: H(Y|X)

  H(X, Y) − H(X) = − Σ_{x,y} p(x, y) log p(x, y) + Σ_x p(x) log p(x)
                 = − Σ_{x,y} p(x, y) log p(x, y) + Σ_x ( Σ_y p(x, y) ) log p(x)
                 = − Σ_{x,y} p(x, y) log p(x, y) + Σ_{x,y} p(x, y) log p(x)
                 = − Σ_{x,y} p(x, y) log [ p(x, y) / p(x) ]
                 = − Σ_{x,y} p(x, y) log p(y|x)
                 = H(Y|X)

K.K M (HUST)

Introduction To DCA

.

.

.

.

.

. . . . . . . .

. . . . . . . .

. . . . . . . .

. .

December 12, 2016

.

. .

.

.

.

.

15 / 44

.

Maximum-Entropy Probability Model

Relative Entropy & Mutual Information

• Relative entropy (KL-divergence): D(p‖q) = Σ_x p(x) log [ p(x) / q(x) ]
  ▶ D(p‖q) ≠ D(q‖p)
  ▶ D(p‖q) ≥ 0
• Mutual information:

  I(X, Y) = D( p(x, y) ‖ p(x)p(y) ) = Σ_{x,y} p(x, y) log [ p(x, y) / (p(x)p(y)) ]


Maximum-Entropy Probability Model

Relative Entropy & Mutual Information

Mutual Information

  H(Y) − I(X, Y) = − Σ_y p(y) log p(y) − Σ_{x,y} p(x, y) log [ p(x, y) / (p(x)p(y)) ]
                 = − Σ_{x,y} p(x, y) log p(y) − Σ_{x,y} p(x, y) log [ p(x, y) / (p(x)p(y)) ]
                 = − Σ_{x,y} p(x, y) log [ p(x, y) / p(x) ]
                 = H(Y|X)

Together with H(X, Y) − H(X) = H(Y|X):

  I(X, Y) = H(X) + H(Y) − H(X, Y)
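The identity I(X, Y) = H(X) + H(Y) − H(X, Y) can be checked on a small joint table (the probabilities below are illustrative):

```python
import numpy as np

# A small joint distribution p(x, y) as a table (rows: x, columns: y).
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

def H(p):
    """Shannon entropy -sum p log p (natural log), skipping zero entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Mutual information, directly as a KL divergence ...
I_direct = float((p_xy * np.log(p_xy / np.outer(p_x, p_y))).sum())
# ... and via the entropy identity I(X,Y) = H(X) + H(Y) - H(X,Y).
I_entropy = H(p_x) + H(p_y) - H(p_xy.ravel())
assert np.isclose(I_direct, I_entropy)
```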

K.K M (HUST)

Introduction To DCA

.

.

.

.

.

. . . . . . . .

. . . . . . . .

. . . . . . . .

. .

December 12, 2016

.

. .

.

.

.

.

17 / 44

.

Maximum-Entropy Probability Model

MI & DI

• MI_ij = Σ_{σ,ω} f_ij(σ, ω) ln [ f_ij(σ, ω) / (f_i(σ) f_j(ω)) ]
• DI_ij = Σ_{σ,ω} P^dir_ij(σ, ω) ln [ P^dir_ij(σ, ω) / (f_i(σ) f_j(ω)) ]

where P^dir_ij(σ, ω) = (1/z_ij) exp( e_ij(σ, ω) + h̃_i(σ) + h̃_j(ω) )


Maximum-Entropy Probability Model

Principle of Maximum Entropy

• The principle was first expounded by E. T. Jaynes in two 1957 papers, in which he emphasized a natural correspondence between statistical mechanics and information theory.
• The principle of maximum entropy states that, subject to precisely stated prior data, the probability distribution which best represents the current state of knowledge is the one with the largest entropy.
• Maximize S = − ∫_x P(x) ln P(x) dx


Maximum-Entropy Probability Model

Principle of Maximum Entropy

Example

• Suppose we want to estimate a probability distribution p(a, b), where a ∈ {x, y} and b ∈ {0, 1}.
• Furthermore, the only fact known about p is that p(x, 0) + p(y, 0) = 0.6.


Maximum-Entropy Probability Model

Principle of Maximum Entropy

Unbiased Principle

Figure: One way to satisfy constraints

Figure: The most uncertain way to satisfy constraints

K.K M (HUST)

Introduction To DCA

.

.

.

.

.

. . . . . . . .

. . . . . . . .

. . . . . . . .

. .

December 12, 2016

.

. .

.

.

.

.

21 / 44

.

Maximum-Entropy Probability Model

Principle of Maximum Entropy

Maximum-Entropy Probability Model

• Continuous random variables: x = (x_1, …, x_L)^T ∈ R^L
• Constraints:
  ▶ ∫_x P(x) dx = 1
  ▶ ⟨x_i⟩ = ∫_x P(x) x_i dx = (1/M) Σ_{m=1}^M x_i^m =: x̄_i
  ▶ ⟨x_i x_j⟩ = ∫_x P(x) x_i x_j dx = (1/M) Σ_{m=1}^M x_i^m x_j^m =: x̄_ij
• Maximize: S = − ∫_x P(x) ln P(x) dx


Maximum-Entropy Probability Model

Lagrange Multipliers Method

Convert the constrained optimization problem into an unconstrained one by means of the Lagrangian L:

  L = S + α(⟨1⟩ − 1) + Σ_{i=1}^L β_i (⟨x_i⟩ − x̄_i) + Σ_{i,j=1}^L γ_ij (⟨x_i x_j⟩ − x̄_ij)

  δL/δP(x) = 0 ⇒ − ln P(x) − 1 + α + Σ_{i=1}^L β_i x_i + Σ_{i,j=1}^L γ_ij x_i x_j = 0

Pairwise maximum-entropy probability distribution:

  P(x; β, γ) = exp( −1 + α + Σ_{i=1}^L β_i x_i + Σ_{i,j=1}^L γ_ij x_i x_j ) = (1/Z) e^{−H(x; β, γ)}

K.K M (HUST)

Introduction To DCA

.

.

.

.

.

. . . . . . . .

1 −H(x ;β,γ) e Z . . . .

. . . .

. . . . . . . .

. .

December 12, 2016

.

. .

.

.

.

.

23 / 44

.

Maximum-Entropy Probability Model

Lagrange Multipliers Method

• Partition function as normalization constant:

  Z(β, γ) := ∫_x exp( Σ_{i=1}^L β_i x_i + Σ_{i,j=1}^L γ_ij x_i x_j ) dx ≡ exp(1 − α)

• Hamiltonian:

  H(x) := − Σ_{i=1}^L β_i x_i − Σ_{i,j=1}^L γ_ij x_i x_j


Maximum-Entropy Probability Model

Maximum-Entropy Probability Model

Categorical Random Variables

• Jointly distributed categorical variables x = (x_1, …, x_L)^T ∈ Ω^L, where each x_i takes values in the finite set Ω = {σ_1, …, σ_q}.
• In the concrete example of modeling protein co-evolution, this set contains the 20 amino acids, represented by a 20-letter alphabet, plus one gap element:

  Ω = {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, −} and q = 21


Maximum-Entropy Probability Model

Maximum-Entropy Probability Model

Binary Embedding of Amino Acid Sequence
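A minimal sketch of the binary (one-hot) embedding this slide illustrates; the helper name `one_hot` is an assumption for illustration:

```python
import numpy as np

# 20 amino acids plus the gap symbol: q = 21 categories.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
q = len(ALPHABET)

def one_hot(seq):
    """Embed a sequence of length L as a binary L x q matrix x_i(sigma)."""
    idx = np.array([ALPHABET.index(s) for s in seq])
    X = np.zeros((len(seq), q), dtype=np.int8)
    X[np.arange(len(seq)), idx] = 1
    return X

X = one_hot("ACD-A")
assert X.shape == (5, 21)
assert (X.sum(axis=1) == 1).all()   # exactly one category per position
```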


Maximum-Entropy Probability Model

Maximum-Entropy Probability Model

Pairwise MEP Distribution on Categorical Variables

• Single and pairwise marginal probabilities:

  ⟨x_i(σ)⟩ = Σ_{x(σ)} P(x(σ)) x_i(σ) = P(x_i = σ) = P_i(σ)
  ⟨x_i(σ) x_j(ω)⟩ = Σ_{x(σ)} P(x(σ)) x_i(σ) x_j(ω) = P(x_i = σ, x_j = ω) = P_ij(σ, ω)

• Single-site and pair frequency counts:

  (1/M) Σ_{m=1}^M x_i^m(σ) = f_i(σ)
  (1/M) Σ_{m=1}^M x_i^m(σ) x_j^m(ω) = f_ij(σ, ω)


Maximum-Entropy Probability Model

Maximum-Entropy Probability Model

Pairwise MEP Distribution on Categorical Variables

• Constraints:

  Σ_x P(x) = Σ_{x(σ)} P(x(σ)) = 1
  P_i(σ) = f_i(σ), P_ij(σ, ω) = f_ij(σ, ω)

• Maximize:

  S = − Σ_{x(σ)} P(x(σ)) ln P(x(σ))

• Lagrangian:

  L = S + α(⟨1⟩ − 1) + Σ_{i=1}^L Σ_{σ∈Ω} β_i(σ) (P_i(σ) − f_i(σ)) + Σ_{i,j=1}^L Σ_{σ,ω∈Ω} γ_ij(σ, ω) (P_ij(σ, ω) − f_ij(σ, ω))

K.K M (HUST)

Introduction To DCA

.

.

.

.

.

. . . . . . . .

. . . . . . . .

. . . . . . . .

. .

December 12, 2016

.

. .

.

.

.

.

28 / 44

.

Maximum-Entropy Probability Model

Maximum-Entropy Probability Model

Pairwise MEP Distribution on Categorical Variables

  P(x(σ); β, γ) = (1/Z) exp( Σ_{i=1}^L Σ_{σ∈Ω} β_i(σ) x_i(σ) + Σ_{i,j=1}^L Σ_{σ,ω∈Ω} γ_ij(σ, ω) x_i(σ) x_j(ω) )

Let h_i(σ) := β_i(σ) + γ_ii(σ, σ) and e_ij(σ, ω) := 2γ_ij(σ, ω). Then

  P(x_1, …, x_L) ≡ (1/Z) exp( Σ_{i=1}^L h_i(x_i) + Σ_{1≤i<j≤L} e_ij(x_i, x_j) )


Maximum-Entropy Probability Model

Gauge Fixing

There are L(q − 1) + [L(L − 1)/2](q − 1)² independent constraints, compared to Lq + [L(L − 1)/2] q² free parameters to be estimated:

  1 = Σ_{σ∈Ω} P_i(σ), i = 1, …, L
  P_i(σ) = Σ_{ω∈Ω} P_ij(σ, ω), i, j = 1, …, L

Two gauge-fixing choices:

• Gap to zero: e_ij(σ_q, ·) = e_ij(·, σ_q) = 0, h_i(σ_q) = 0
• Zero-sum gauge: Σ_σ e_ij(σ, ω) = Σ_σ e_ij(ω′, σ) = 0, Σ_σ h_i(σ) = 0
.

K.K M (HUST)

Introduction To DCA

.

.

.

.

.

. . . . . . . .

. . . . . . . .

. . . . . . . .

. .

December 12, 2016

.

. .

.

.

.

.

30 / 44

.

Maximum-Entropy Probability Model

Closed-Form Solution for Continuous Variables

Rewrite the exponent of the pairwise maximum-entropy probability distribution:

  P(x; β, γ̃) = (1/Z) exp( β^T x − ½ x^T γ̃ x )
             = (1/Z) exp( ½ β^T γ̃^{-1} β − ½ (x − γ̃^{-1}β)^T γ̃ (x − γ̃^{-1}β) )

where γ̃ := −2γ. Let z = (z_1, …, z_L)^T := x − γ̃^{-1}β; then

  P(x) = (1/Z̃) exp( −½ (x − γ̃^{-1}β)^T γ̃ (x − γ̃^{-1}β) ) ≡ (1/Z̃) e^{−½ z^T γ̃ z}

Normalization condition:

  1 = ∫_x P(x) dx ≡ (1/Z̃) ∫_z e^{−½ z^T γ̃ z} dz


Maximum-Entropy Probability Model

Closed-Form Solution for Continuous Variables

Use the point symmetry of the integrand,

  ∫_z e^{−½ z^T γ̃ z} z_i dz = 0

then

  ⟨x_i⟩ = ∫_x P(x) x_i dx ≡ (1/Z̃) ∫_z e^{−½ z^T γ̃ z} ( z_i + Σ_{j=1}^L (γ̃^{-1})_{ij} β_j ) dz = Σ_{j=1}^L (γ̃^{-1})_{ij} β_j

  ⟨x_i x_j⟩ = ∫_x P(x) x_i x_j dx ≡ (1/Z̃) ∫_z e^{−½ z^T γ̃ z} ( z_i + ⟨x_i⟩ )( z_j + ⟨x_j⟩ ) dz = ⟨z_i z_j⟩ + ⟨x_i⟩⟨x_j⟩

so

  C_ij = ⟨x_i x_j⟩ − ⟨x_i⟩⟨x_j⟩ ≡ ⟨z_i z_j⟩


Maximum-Entropy Probability Model

Closed-Form Solution for Continuous Variables

The term ⟨z_i z_j⟩ is solved using a spectral decomposition of γ̃ (eigenvalues λ_k, eigenvectors v_k):

  ⟨z_i z_j⟩ = (1/Z̃) Σ_{l,n=1}^L (v_l)_i (v_n)_j ∫_y exp( −½ Σ_{k=1}^L λ_k y_k² ) y_l y_n dy
            = Σ_{k=1}^L (1/λ_k) (v_k)_i (v_k)_j ≡ (γ̃^{-1})_{ij}

With C_ij = (γ̃^{-1})_{ij}, the Lagrange multipliers β and γ are

  β = C^{-1} ⟨x⟩,  γ = −½ γ̃ = −½ C^{-1}

and the real-valued maximum-entropy distribution is

  P(x; ⟨x⟩, C) = (2π)^{−L/2} det(C)^{−1/2} exp( −½ (x − ⟨x⟩)^T C^{-1} (x − ⟨x⟩) )

K.K M (HUST)

Introduction To DCA

.

.

.

.

.

. . . . . . . .

. . . . . . . .

. . . . . . . .

. .

December 12, 2016

.

. .

)

.

.

.

.

33 / 44

.

Maximum-Entropy Probability Model

Closed-Form Solution for Continuous Variables

Pair Interaction Strength

The pair interaction strength is evaluated by the already introduced partial correlation coefficient between x_i and x_j given the remaining variables {x_r}_{r ∈ {1,…,L}\{i,j}}:

  ρ_{ij·{1,…,L}\{i,j}} = γ_ij / √(γ_ii γ_jj) ≡ − (C^{-1})_ij / √( (C^{-1})_ii (C^{-1})_jj )  if i ≠ j,  and 1 if i = j.


Maximum-Entropy Probability Model

Solution for Categorical Variables

Mean-Field Approximation

The empirical covariance matrix is

  Ĉ_ij(σ, ω) = f_ij(σ, ω) − f_i(σ) f_j(ω)

Applying the closed-form solution for continuous variables to the categorical variables, with C^{-1}(σ, ω) ≈ Ĉ^{-1}(σ, ω), yields the so-called mean-field (MF) approximation:

  γ_ij^MF(σ, ω) = −½ (Ĉ^{-1})_ij(σ, ω) ⇒ e_ij^MF(σ, ω) = −(Ĉ^{-1})_ij(σ, ω)

The same solution has been obtained by using a perturbation ansatz to solve the q-state Potts model, termed (mean-field) Direct Coupling Analysis (DCA or mfDCA).
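A toy sketch of the mean-field estimate e^MF = −Ĉ^{-1} on one-hot data. The random alignment, the small ridge term standing in for pseudocounts, and the choice to drop one state per site (so Ĉ is invertible, matching the gap-to-zero gauge) are all assumptions for illustration; real mfDCA pipelines also reweight similar sequences and use strong pseudocounts:

```python
import numpy as np

rng = np.random.default_rng(2)
M, L, q = 500, 6, 4                   # toy alignment: M sequences, L sites
msa = rng.integers(0, q, size=(M, L))

# One-hot encode, dropping the last state per site so that the
# empirical covariance matrix stays invertible.
X = np.zeros((M, L * (q - 1)))
for i in range(L):
    for s in range(q - 1):
        X[:, i * (q - 1) + s] = (msa[:, i] == s)

f = X.mean(axis=0)                        # single-site frequencies f_i(sigma)
C = (X.T @ X) / M - np.outer(f, f)        # C_ij = f_ij - f_i f_j
C += 0.01 * np.eye(C.shape[0])            # small ridge, a stand-in for pseudocounts

e_mf = -np.linalg.inv(C)                  # mean-field couplings e_ij = -(C^-1)_ij
assert e_mf.shape == (L * (q - 1), L * (q - 1))
assert np.allclose(e_mf, e_mf.T, atol=1e-6)   # symmetric, since C is
```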


Maximum-Likelihood Inference

Maximum-Likelihood

What is Maximum Likelihood?

• A well-known approach to estimating the parameters of a model is maximum-likelihood inference.
• The likelihood is a scalar measure of how likely the model parameters are, given the observed data; the maximum-likelihood solution is the parameter set maximizing the likelihood function.


Maximum-Likelihood Inference

Likelihood Function

To use the method of maximum likelihood, one first specifies the joint density function for all observations. For an independent and identically distributed sample, this joint density function is

  f(x_1, x_2, …, x_n | θ) = f(x_1|θ) × f(x_2|θ) × ⋯ × f(x_n|θ)

Now let θ be the function's variable, allowed to vary freely; this function is called the likelihood:

  L(θ; x_1, …, x_n) = f(x_1, x_2, …, x_n | θ) = Π_{i=1}^n f(x_i|θ)

Log-likelihood:

  ln L(θ; x_1, …, x_n) = Σ_{i=1}^n ln f(x_i|θ)

Average log-likelihood: ℓ̂ = (1/n) ln L

K.K M (HUST)

Introduction To DCA

.

.

.

.

.

. . . . . . . .

. . . . . . . .

. . . . . . . .

. .

December 12, 2016

.

. .

.

.

.

.

37 / 44

.

Maximum-Likelihood Inference

Maximum-Likelihood Inference

For a pairwise model with parameters h(σ) and e(σ, ω), the likelihood of the given observed data x^1, …, x^M ∈ Ω^L, assumed independent and identically distributed, is

  l(h(σ), e(σ, ω) | x^1, …, x^M) = Π_{m=1}^M P(x^m; h(σ), e(σ, ω))

  {h^ML(σ), e^ML(σ, ω)} = argmax_{h(σ),e(σ,ω)} l(h(σ), e(σ, ω)) = argmin_{h(σ),e(σ,ω)} [ − ln l(h(σ), e(σ, ω)) ]

With the maximum-entropy distribution as model distribution,

  ln l(h(σ), e(σ, ω)) = Σ_{m=1}^M ln P(x^m; h(σ), e(σ, ω))
                     = −M [ ln Z − Σ_{i=1}^L Σ_σ h_i(σ) f_i(σ) − Σ_{1≤i<j≤L} Σ_{σ,ω} e_ij(σ, ω) f_ij(σ, ω) ]

Maximum-Likelihood Inference

Maximum-Likelihood Inference

Taking the derivatives, evaluated at {h(σ), e(σ, ω)}:

  ∂ ln l / ∂h_i(σ) = −M [ ∂ ln Z / ∂h_i(σ) − f_i(σ) ] = 0
  ∂ ln l / ∂e_ij(σ, ω) = −M [ ∂ ln Z / ∂e_ij(σ, ω) − f_ij(σ, ω) ] = 0

Partial derivatives of the partition function:

  ∂ ln Z / ∂h_i(σ) = (1/Z) ∂Z/∂h_i(σ) = P_i(σ; h(σ), e(σ, ω))
  ∂ ln Z / ∂e_ij(σ, ω) = (1/Z) ∂Z/∂e_ij(σ, ω) = P_ij(σ, ω; h(σ), e(σ, ω))
.

K.K M (HUST)

Introduction To DCA

.

.

.

.

.

. . . . . . . .

. . . . . . . .

. . . . . . . .

. .

December 12, 2016

.

. .

.

.

.

.

39 / 44

.

Maximum-Likelihood Inference

Maximum-Likelihood Inference

Maximum-Likelihood Inference

The maximizing parameters h^ML(σ) and e^ML(σ, ω) are those matching the distribution's single and pair marginal probabilities with the empirical single and pair frequency counts:

  P_i(σ; h^ML(σ), e^ML(σ, ω)) = f_i(σ),  P_ij(σ, ω; h^ML(σ), e^ML(σ, ω)) = f_ij(σ, ω)

In other words, matching the moments of the pairwise maximum-entropy probability distribution to the given data is equivalent to maximum-likelihood fitting of an exponential family.


Maximum-Likelihood Inference

Pseudo-Likelihood Maximization

Besag introduced the pseudo-likelihood as an approximation to the likelihood function in which the global partition function is replaced by computationally tractable local estimates. In this approach, the probability of the m-th observation, x^m, is approximated by the product of the conditional probabilities of x_r = x_r^m given the observations of the remaining variables x_\r := (x_1, …, x_{r−1}, x_{r+1}, …, x_L)^T ∈ Ω^{L−1}:

  P(x^m; h(σ), e(σ, ω)) ≃ Π_{r=1}^L P(x_r = x_r^m | x_\r = x_\r^m; h(σ), e(σ, ω))


Maximum-Likelihood Inference

Pseudo-Likelihood Maximization

Each factor is of the following analytical form:

  P(x_r = x_r^m | x_\r = x_\r^m; h(σ), e(σ, ω)) = exp( h_r(x_r^m) + Σ_{j≠r} e_rj(x_r^m, x_j^m) ) / Σ_{σ∈Ω} exp( h_r(σ) + Σ_{j≠r} e_rj(σ, x_j^m) )

By this approximation, the log-likelihood becomes the pseudo-log-likelihood:

  ln l_PL(h(σ), e(σ, ω)) := Σ_{m=1}^M Σ_{r=1}^L ln P(x_r = x_r^m | x_\r = x_\r^m; h(σ), e(σ, ω))

An ℓ2 regularizer is added to select for small absolute values of the inferred parameters:

  {h^PLM(σ), e^PLM(σ, ω)} = argmin_{h(σ),e(σ,ω)} { − ln l_PL(h(σ), e(σ, ω)) + λ_h ‖h(σ)‖₂² + λ_e ‖e(σ, ω)‖₂² }
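The local factor above only needs a sum over the q states at one site, never the global partition function. A sketch of that conditional probability on made-up fields and couplings (the random parameters and helper name `cond_prob` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
L, q = 5, 3                                   # toy model: L sites, q states
h = rng.normal(size=(L, q))                   # fields h_i(sigma)
e = rng.normal(size=(L, L, q, q)) * 0.1
e = (e + e.transpose(1, 0, 3, 2)) / 2         # symmetrize: e_ij(s,w) = e_ji(w,s)
for i in range(L):
    e[i, i] = 0.0                             # no self-couplings

def cond_prob(r, x):
    """P(x_r = sigma | all other sites of x), for every sigma."""
    logits = h[r].copy()
    for j in range(L):
        if j != r:
            logits += e[r, j, :, x[j]]
    w = np.exp(logits - logits.max())         # local partition sum only
    return w / w.sum()

x = rng.integers(0, q, size=L)
p = cond_prob(2, x)
assert np.isclose(p.sum(), 1.0)               # properly normalized

# Pseudo-log-likelihood of this single sequence.
pseudo_loglik = sum(np.log(cond_prob(r, x)[x[r]]) for r in range(L))
```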

K.K M (HUST)

Introduction To DCA

.

.

.

.

.

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

.

.

December 12, 2016

.

. .

.

.

.

.

42 / 44

.

Scoring Functions for Pairwise Interaction Strengths

Scoring Functions

Scoring Functions for Pairwise Interaction Strengths

• Direct information: DI_ij = Σ_{σ,ω∈Ω} P^dir_ij(σ, ω) ln [ P^dir_ij(σ, ω) / (f_i(σ) f_j(ω)) ]
• Frobenius norm: ‖e_ij‖_F = ( Σ_{σ,ω∈Ω} e_ij(σ, ω)² )^{1/2}
• Average product correction: APC-FN_ij = ‖e_ij‖_F − ‖e_i·‖_F ‖e_·j‖_F / ‖e_··‖_F

where

  ‖e_i·‖_F := (1/(L−1)) Σ_{j=1}^L ‖e_ij‖_F
  ‖e_·j‖_F := (1/(L−1)) Σ_{i=1}^L ‖e_ij‖_F
  ‖e_··‖_F := (1/(L(L−1))) Σ_{i,j=1}^L ‖e_ij‖_F
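The average product correction is a one-liner on a score matrix. A sketch on a stand-in matrix of Frobenius-norm scores (the random symmetric matrix F is illustrative, not real coupling data):

```python
import numpy as np

rng = np.random.default_rng(4)
L = 8
F = np.abs(rng.normal(size=(L, L)))           # stand-in Frobenius-norm scores
F = (F + F.T) / 2
np.fill_diagonal(F, 0.0)                      # no self-scores

# Average product correction: APC-FN_ij = F_ij - F_i. * F_.j / F_..
row_mean = F.sum(axis=1) / (L - 1)            # ||e_i.||_F
col_mean = F.sum(axis=0) / (L - 1)            # ||e_.j||_F
all_mean = F.sum() / (L * (L - 1))            # ||e_..||_F
apc = F - np.outer(row_mean, col_mean) / all_mean

assert apc.shape == (L, L)
assert np.allclose(apc, apc.T)                # symmetric input, symmetric scores
```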

K.K M (HUST)

Introduction To DCA

.

.

.

.

.

. . . . . . . .

∥ei· ∥F ∥e·j ∥F ∥e·· ∥F

. . . . . . . .

. . . . . . . .

. .

December 12, 2016

.

. .

.

.

.

.

43 / 44

.

Summary


