A non-Gaussian approach for causal discovery in the presence of hidden common causes⋆

Shohei Shimizu
The Institute of Scientific and Industrial Research, Osaka University
8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan
https://sites.google.com/site/sshimizu06/

Abstract. We discuss the problem of estimating the causal direction between two observed variables in the presence of hidden common causes. Managing hidden common causes is essential when studying causal relations based on observational data. We previously proposed a Bayesian estimation method for estimating the causal direction using the non-Gaussianity of data. This method does not require us to explicitly model hidden common causes. The experiments on artificial data presented in this paper imply that Bayes factors could be useful for selecting a better causal direction when using a non-Gaussian method.

Keywords: causal discovery, hidden common causes, structural equation models, non-Gaussianity

1  Introduction

We consider the problem of estimating causal relations based on observational data [2,33,22]. Assume that we are interested in the causal relations between two observed random variables x1 and x2. We are particularly interested in the causal direction between these two variables assuming one-way causation. We use the framework of structural causal models [22] to represent their causal relations. We want to estimate which of the following two models (Models 1 and 2) is better than the other by using a dataset of x1 and x2 randomly sampled from either of these two models:

    Model 1:  x1 = e1
              x2 = b21 x1 + e2,                                        (1)

    Model 2:  x1 = b12 x2 + e1
              x2 = e2,                                                 (2)

where e1 and e2 are unobserved or hidden random variables, typically called error variables, exogenous variables, or external influences. b21 and b12 are constants that represent the magnitude of causation from x1 to x2 and from x2 to x1,

⋆ To appear in Proc. Second Workshop on Advanced Methodologies for Bayesian Networks (AMBN2015), 2016.


respectively. For simplicity, here we assume that b21 and b12 are non-zero. In Model 1, x1 causes x2 and the causal direction is x1 → x2, whereas in Model 2, x2 causes x1 and the causal direction is x2 → x1. Note that these two models describe the data-generating process of x1 and x2 rather than simply defining the probability distribution of x1 and x2. For example, Model 1 states that the values of e1 and e2 are first generated, that the value of e1 is observed as that of x1 as it is, and that the value of x2 is then generated as a linear combination of those of x1 and e2. The major difficulty with estimating causal directions based on observational data is that the error variables e1 and e2 are dependent in general. Then, even if we know that the right causal direction is x1 → x2, we cannot obtain the right estimate of the coefficient b21 by using the regression coefficient obtained when regressing x2 on x1. Such dependency between the error variables e1 and e2 is typically introduced by unobserved variables that cause both x1 and x2. These unobserved variables are known as hidden common causes. Assume that we have a single hidden common cause f1 that makes e1 and e2 dependent. Then, we rewrite Models 1 and 2 as follows:

    Model 1′:  x1 = λ11 f1 + e′1                                       (3)
               x2 = b21 x1 + λ21 f1 + e′2

    Model 2′:  x1 = b12 x2 + λ11 f1 + e′1                              (4)
               x2 = λ21 f1 + e′2

(in both models, e1 = λ11 f1 + e′1 and e2 = λ21 f1 + e′2),

where λ11 and λ21 are constants that represent the magnitudes of causation. Now, the new error variables e′1 and e′2 are statistically independent. A well-known guideline [24,21] is to observe the hidden common cause f1, incorporate it into the model, and carry out a three-variable analysis so that the error variables are independent. Although we should certainly follow this guideline, doing so can be hard since a large number of hidden common causes may exist and we often have no idea what they are. In this paper, we discuss the problem of estimating the causal direction between two observed variables in the presence of hidden common causes. We use structural causal models [22] to represent causal relations and make causal inferences. We assume linear functional relations, acyclic causal relations, and the non-Gaussianity of the error variables. Further, we assume that the number of hidden common causes is unknown. Under these assumptions, we previously proposed a method for estimating the causal direction between two observed variables [26]. This method compares two models of two observed variables with opposite causal directions in a Bayesian model selection framework. The method does not require us to explicitly model hidden common causes and allows the number of hidden common causes to remain unspecified. In the remainder of this


paper, we first briefly review the method. Second, we consider using a set of prior distributions for cases in which the observed variables are standardized. Finally, we conduct experiments on artificial data. This paper thus supplements our previous work [26].
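The difficulty with dependent error variables described above can be illustrated with a small simulation. The following sketch uses made-up coefficients (true b21 = 2 and loadings λ11 = λ21 = 1; these numbers are ours, not from the paper) to generate data from Model 1′ with a single hidden common cause and show that the naive regression coefficient of x2 on x1 does not recover b21:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical coefficients: true b21 = 2, lambda11 = lambda21 = 1.
f1 = rng.laplace(size=n)    # hidden common cause (non-Gaussian)
e1p = rng.laplace(size=n)   # independent error e1'
e2p = rng.laplace(size=n)   # independent error e2'

x1 = f1 + e1p               # Model 1': x1 = lambda11*f1 + e1'
x2 = 2.0 * x1 + f1 + e2p    #           x2 = b21*x1 + lambda21*f1 + e2'

# Regressing x2 on x1 ignores f1, so the estimate converges to
# b21 + Cov(f1, x1)/Var(x1) = 2 + 2/4 = 2.5 here, not the true b21 = 2.
b21_hat = np.cov(x1, x2)[0, 1] / np.var(x1)
print(round(b21_hat, 2))
```

The bias term Cov(f1, x1)/Var(x1) vanishes only when the hidden common cause is absent, which is exactly why observing f1 and including it in the model restores consistent estimation.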

2  A non-Gaussian causal model for hidden common cause cases

We previously developed a linear structural causal model for causal discovery in the presence of hidden common causes [11]. This model is an extension of a linear non-Gaussian acyclic structural equation model known as LiNGAM [28,31]. Let us denote by x1, ..., xp the observed variables, by f1, ..., fQ the hidden common causes, and by e1, ..., ep the error variables. All these are continuous variables. Then, we write the model as follows:

    xi = µi + Σ_{j≠i} bij xj + Σ_{q=1}^{Q} λiq fq + ei,                (5)

where bij and λiq are constants that represent the magnitudes of causation and µi are intercepts. We assume that the causal relations are acyclic, i.e., there is no feedback relation. We further assume that the hidden common causes fq (q = 1, ..., Q) and error variables ei (i = 1, ..., p) are non-Gaussian and independent. Although the independence assumption on hidden common causes fq looks strong, we can make this assumption without loss of generality under some common assumptions including linearity. See [11] for the details of the independence assumption on hidden common causes. By using the model in Eq. (5), we compare the following two models with opposite directions of causation:

    Model 3:  x1 = µ1 + Σ_{q=1}^{Q} λ1q fq + e1
              x2 = µ2 + b21 x1 + Σ_{q=1}^{Q} λ2q fq + e2,              (6)

    Model 4:  x1 = µ1 + b12 x2 + Σ_{q=1}^{Q} λ1q fq + e1
              x2 = µ2 + Σ_{q=1}^{Q} λ2q fq + e2.                       (7)

Figure 1 presents graphical representations of these two models. Note that we assume the number of hidden common causes Q to be unknown. In [26], we related the model in Eq. (5) to a model having individual-specific intercepts instead of explicitly having hidden common causes. A major advantage of this approach is that we do not estimate the number of hidden common causes Q. To explain the idea, we first rewrite the model in Eq. (5) for observation l as follows:

    xi(l) = µi + Σ_{q=1}^{Q} λiq fq(l) + Σ_{j≠i} bij xj(l) + ei(l).    (8)


[Figure 1 about here]

Fig. 1. Models 3 and 4: Two models with different causal directions in the presence of three hidden common causes

Now, let us denote the sums of the hidden common causes by µ̃i(l) = Σ_{q=1}^{Q} λiq fq(l). Then, we have the following model with individual-specific intercepts:

    xi(l) = µi + µ̃i(l) + Σ_{j≠i} bij xj(l) + ei(l),                   (9)

where µi are the intercepts common to all the observations and µ̃i(l) are the individual-specific intercepts. The distributions of ei(l) (l = 1, ..., n) are assumed to be identical for every i. In this model, the observations are generated from the model with no hidden common causes, possibly with different parameter values of the means µi + µ̃i(l). This model has the intercepts µi and the coefficients bij that are common to all the observations as well as the individual-specific intercepts µ̃i(l). This is similar to mixed models [5]. Thus, we call this a mixed-LiNGAM. Now, the problem of comparing Models 3 and 4 in Eqs. (6) and (7) becomes that of comparing Models 3′ and 4′:

    Model 3′:  x1(l) = µ1 + µ̃1(l) + e1(l)
               x2(l) = µ2 + µ̃2(l) + b21 x1(l) + e2(l),                (10)

    Model 4′:  x1(l) = µ1 + µ̃1(l) + b12 x2(l) + e1(l)
               x2(l) = µ2 + µ̃2(l) + e2(l),                            (11)

where µ̃1(l) = Σ_{q=1}^{Q} λ1q fq(l) and µ̃2(l) = Σ_{q=1}^{Q} λ2q fq(l) (l = 1, ..., n). We apply a Bayesian approach to compare Models 3′ and 4′ and estimate the possible causal direction between the two observed variables x1 and x2. We assume that the prior probabilities of the two candidate models are uniform. Then, we may simply compare the log-marginal likelihoods of the two models to assess their plausibility. The model with the larger log-marginal likelihood is considered as the closest to the true model [15].

3  Likelihood

Let D be the observed data set [x(1), ..., x(n)]T, where x(l) = [x1(l), x2(l)]T. Denote Models 3′ and 4′ by M3′ and M4′ and their likelihoods by p(D|θr, Mr) (r = 3′, 4′). Then, their log-likelihoods are given by

    log p(D|θr, Mr) = log Π_{l=1}^{n} p(x(l)|θr, Mr)                   (12)
                    = Σ_{l=1}^{n} log p(x(l)|θr, Mr)                   (13)

    = Σ_{l=1}^{n} log p_{e1(l)}(x1(l) − µ1 − µ̃1(l) | θ3′, M3′)
      + Σ_{l=1}^{n} log p_{e2(l)}(x2(l) − µ2 − µ̃2(l) − b21 x1(l) | θ3′, M3′)    for M3′,
                                                                       (14)
    = Σ_{l=1}^{n} log p_{e1(l)}(x1(l) − µ1 − µ̃1(l) − b12 x2(l) | θ4′, M4′)
      + Σ_{l=1}^{n} log p_{e2(l)}(x2(l) − µ2 − µ̃2(l) | θ4′, M4′)                for M4′.

The distributions of the error variables e1(l) and e2(l) are modeled by Laplace distributions with zero mean and variances var(e1(l)) = h1² and var(e2(l)) = h2² (h1, h2 > 0) as follows:

    p_{e1(l)} = Laplace(0, h1/√2),                                     (15)
    p_{e2(l)} = Laplace(0, h2/√2).                                     (16)

Here, we simply use a super-Gaussian distribution, the Laplace distribution, to model p_{e1(l)} and p_{e2(l)}. Super-Gaussian distributions have often been reported to work well in non-Gaussian estimation methods, including independent component analysis [12] and linear non-Gaussian structural causal models, if the actual error distributions are super-Gaussian [12,13].
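Equations (14)–(16) can be sketched in code as follows. This is a minimal sketch that takes the individual-specific intercepts as given; the function and variable names are ours for illustration, not from the distributed code:

```python
import numpy as np

def laplace_logpdf(r, h):
    """Log-density of Laplace(0, h/sqrt(2)) at r, so that the variance is h**2."""
    b = h / np.sqrt(2.0)                 # Laplace scale, as in Eqs. (15)-(16)
    return -np.log(2.0 * b) - np.abs(r) / b

def loglik_m3(x1, x2, mu1, mu2, mu1_t, mu2_t, b21, h1, h2):
    """Eq. (14), upper branch: log-likelihood of Model 3' (x1 -> x2)."""
    r1 = x1 - mu1 - mu1_t
    r2 = x2 - mu2 - mu2_t - b21 * x1
    return laplace_logpdf(r1, h1).sum() + laplace_logpdf(r2, h2).sum()

def loglik_m4(x1, x2, mu1, mu2, mu1_t, mu2_t, b12, h1, h2):
    """Eq. (14), lower branch: log-likelihood of Model 4' (x2 -> x1)."""
    r1 = x1 - mu1 - mu1_t - b12 * x2
    r2 = x2 - mu2 - mu2_t
    return laplace_logpdf(r1, h1).sum() + laplace_logpdf(r2, h2).sum()
```

Each branch simply evaluates the Laplace log-density of the two residual series and sums over observations, so the two candidate directions differ only in which variable carries the regression term.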

4  Prior distributions

The parameter vectors θ3′ and θ4′ in Eq. (14) are written as follows:

    θ3′ = [µi, b21, hi, µ̃i(l)]T    (i = 1, 2; l = 1, ..., n),         (17)
    θ4′ = [µi, b12, hi, µ̃i(l)]T    (i = 1, 2; l = 1, ..., n).         (18)

We first standardize the two observed variables x1 and x2 to have zero means and unit variances before computing the log-marginal likelihoods. We prefer that the inference not be sensitive to the means and scales of the observed variables. Then, following [9], we model the prior distributions of the parameters


common to all the observations as follows:

    b12 ~ N(0, 1),  b21 ~ N(0, 1),  h1 ~ lnN(0, 1),  h2 ~ lnN(0, 1).

Further, we set the intercepts µi (i = 1, 2) to be zeros following [9] since the observed variables have been standardized. Next, we use an informative prior distribution for the individual-specific intercepts µ̃i(l) (i = 1, 2; l = 1, ..., n). Those individual-specific intercepts µ̃i(l) are the sums of many non-Gaussian independent hidden common causes fq and are dependent. The central limit theorem states that the sum of independent variables becomes increasingly close to the Gaussian [1]. Motivated by this observation, we approximate the non-Gaussian distributions of the individual-specific intercepts µ̃i(l), which are the sums of many non-Gaussian independent hidden common causes, by a bell-shaped curve distribution. We here model the prior distribution of the individual-specific intercepts by the multivariate t-distribution as follows:

    [µ̃1(l), µ̃2(l)]T = diag(√τ1indvdl, √τ2indvdl) C^{-1/2} u,          (19)

where τ1indvdl and τ2indvdl are constants, u ~ tν(0, Σ), and Σ = [σab] is a symmetric scale matrix whose diagonal elements are 1s. C is a diagonal matrix whose diagonal elements give the variances of the elements of u, i.e., C = ν/(ν−2) diag(Σ) for ν > 2. The degree of freedom ν is here taken to be eight. The hyper-parameters are τ1indvdl, τ2indvdl, and σ21. We take an empirical Bayesian approach to select the hyper-parameters. We test τiindvdl = 0, 0.2², ..., 0.8², 1.0² (i = 1, 2) and σ21 = 0, ±0.3, ±0.5, ±0.7, ±0.9. We take the ordinary Monte Carlo sampling approach to compute the log-marginal likelihoods with 10,000 samples for the parameter vectors θr (r = 3′, 4′).
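As an illustration of the Monte Carlo step, the following simplified sketch estimates log p(D|M3′) by averaging the Laplace likelihood over draws from the priors above. For brevity it fixes µi = 0 and omits the individual-specific intercepts (i.e., it drops the t-prior of Eq. (19)), so only b21, h1, and h2 receive priors; this simplification is ours for illustration, not the authors' distributed code:

```python
import numpy as np

def log_mean_exp(v):
    """Numerically stable log of the mean of exp(v)."""
    m = v.max()
    return m + np.log(np.mean(np.exp(v - m)))

def log_marginal_m3(x1, x2, n_draws=10_000, seed=0):
    """Monte Carlo estimate of log p(D | M3') for standardized data,
    with mu_i = 0 and no individual-specific intercepts (simplified)."""
    rng = np.random.default_rng(seed)
    b21 = rng.normal(0.0, 1.0, n_draws)          # b21 ~ N(0, 1)
    h1 = np.exp(rng.normal(0.0, 1.0, n_draws))   # h1 ~ lnN(0, 1)
    h2 = np.exp(rng.normal(0.0, 1.0, n_draws))   # h2 ~ lnN(0, 1)
    s1 = h1[:, None] / np.sqrt(2.0)              # Laplace scales (variance h^2)
    s2 = h2[:, None] / np.sqrt(2.0)
    r1 = x1[None, :]                             # residuals under Model 3'
    r2 = x2[None, :] - b21[:, None] * x1[None, :]
    ll = (-np.log(2 * s1) - np.abs(r1) / s1).sum(axis=1) \
         + (-np.log(2 * s2) - np.abs(r2) / s2).sum(axis=1)
    return log_mean_exp(ll)                      # log (1/m) sum_k p(D | theta_k)
```

By symmetry, an estimate of log p(D|M4′) is obtained by exchanging the roles of x1 and x2, and the two log-marginal likelihood estimates are then compared directly.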

5  Experiments on artificial data

We generated data using the following non-Gaussian model with three hidden common causes:

    x1 = 5 + f1 + f2 + 1.5 f3 + e1
    x2 = 10 + f1 + 2 f2 + 0.5 f3 + 3 x1 + e2.

We tested three distributions of the error variables e1, e2 and hidden common causes f1, f2, f3: the Laplace distribution, the exponential distribution with the parameter value 1/√2, and the uniform distribution. The Laplace distribution and exponential distribution have positive kurtoses and are super-Gaussian


distributions, whereas the uniform distribution has negative kurtosis and is a sub-Gaussian distribution. The Laplace and uniform distributions are symmetric, whereas the exponential is asymmetric. The variances of e1 and e2 were set to 9 and those of fq were ones. We permuted the variables according to a random ordering to hide the true orderings. We conducted 200 trials with sample sizes of 50, 100, 200, and 500. We counted the numbers of successful discoveries of the causal directions and computed precisions. We also computed the Bayes factor. Let us denote by K the Bayes factor of the two models compared, M3′ and M4′. For notational simplicity, we assume that we compute K so that the larger likelihood comes to the numerator and the smaller to the denominator. In [15], Kass and Raftery proposed that if 2 log K is 0 to 2, the evidence is not worth more than a bare mention; if 2 log K is 2 to 6, it is positive; if 2 log K is 6 to 10, it is strong; and if 2 log K is more than 10, it is very strong. Tables 1 to 3 show the results when the actual error variables follow the Laplace, exponential, and uniform distributions, respectively. Table 4 shows the result when the actual distribution of each error variable was randomly selected from the three distributions for every trial. Overall, if the sample size increased or the Bayes factors rose, the numbers of successful discoveries and precisions improved. This finding implies that considering Bayes factors is useful when selecting a better model by using our method. When the actual distribution was the Laplace or exponential, the performance seemed to be satisfactory (see Tables 1 and 2) because the Laplace and exponential distributions are super-Gaussian, as is the postulated distribution, the Laplace.
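The data-generating setup above can be sketched as follows. This is our reconstruction of the setup, not the authors' distributed code; in particular, the exact parameterizations used to center each distribution and match the stated variances are our assumptions:

```python
import numpy as np

def generate_data(n, dist, seed=0):
    """One artificial dataset as in Section 5: three hidden common causes with
    unit variance, error variables with variance 9, and true direction x1 -> x2."""
    rng = np.random.default_rng(seed)

    def draw(std):
        # Zero-mean variates with standard deviation `std` (our parameterizations).
        if dist == "laplace":
            return rng.laplace(0.0, std / np.sqrt(2.0), n)
        if dist == "exponential":
            return rng.exponential(std, n) - std          # centered
        if dist == "uniform":
            return rng.uniform(-np.sqrt(3.0) * std, np.sqrt(3.0) * std, n)
        raise ValueError(dist)

    f1, f2, f3 = draw(1.0), draw(1.0), draw(1.0)
    e1, e2 = draw(3.0), draw(3.0)
    x1 = 5 + f1 + f2 + 1.5 * f3 + e1
    x2 = 10 + f1 + 2 * f2 + 0.5 * f3 + 3 * x1 + e2
    return x1, x2
```

Because f1, f2, and f3 enter both equations, the error terms of any two-variable regression of x2 on x1 (or vice versa) are dependent, which is the situation the compared models are designed to handle.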
When the actual distribution was the uniform, the performance (Table 3) was much worse than in the cases where the actual distribution was the Laplace or exponential, because the uniform distribution is sub-Gaussian, unlike the Laplace. When each of the actual error distributions was randomly selected, the performance again became worse than in the Laplace and exponential distribution cases (but was not terrible). This occurs because two of the three distributions used in this experiment were super-Gaussian, as was the postulated error distribution.
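The Kass–Raftery scale used to stratify the tables is easy to encode. This is a small helper we add for illustration; the boundary conventions at exactly 2, 6, and 10 are our choice, since [15] states the scale in terms of open intervals:

```python
def evidence_category(two_log_k):
    """Kass and Raftery's (1995) scale for 2 log K, where K is the Bayes factor
    with the larger marginal likelihood in the numerator (so 2 log K >= 0)."""
    if two_log_k < 0:
        raise ValueError("expected the larger marginal likelihood in the numerator")
    if two_log_k <= 2:
        return "not worth more than a bare mention"
    if two_log_k <= 6:
        return "positive"
    if two_log_k <= 10:
        return "strong"
    return "very strong"
```

The rows of Tables 1–4 correspond to thresholding trials at each successive boundary of this scale.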

6  Related work

For the past 10 years, many semi-parametric methods for estimating causal directions under the assumption of no hidden common causes have been developed [6,28,30,10,35,14,13,4,29,27,23]. In contrast to non-parametric methods [33,22], semi-parametric methods make some assumptions on the function forms of causal relations and/or the error distributions to make the models identifiable. Those semi-parametric methods have recently been applied to empirical research including economics [20,16], neuroscience [19,18], epidemiology [25], and chemistry [3]. See [31] for a review of semi-parametric methods and [32,34] for


Table 1. Numbers of successful discoveries and precisions when the actual distributions are the Laplace. K is the Bayes factor.

                          N. successes   N. findings   Precisions
  n = 50   2 log K > 0         143            200          0.71
           2 log K > 2          11             13          0.85
           2 log K > 6           0              0          N/A
           2 log K > 10          0              0          N/A
  n = 100  2 log K > 0         142            200          0.71
           2 log K > 2          52             60          0.87
           2 log K > 6           0              0          N/A
           2 log K > 10          0              0          N/A
  n = 200  2 log K > 0         161            200          0.81
           2 log K > 2         114            127          0.90
           2 log K > 6          20             21          0.95
           2 log K > 10          1              1          1.00
  n = 500  2 log K > 0         167            200          0.83
           2 log K > 2         144            164          0.88
           2 log K > 6         108            114          0.95
           2 log K > 10         56             57          0.98

recent reviews of non-parametric methods. Links to most of the papers related to semi-parametric methods are available on the web.1 In practice, those methods assuming no hidden common causes seem to work well in those papers. However, what distinguishes observational studies from experimental studies is the existence of hidden common causes. Therefore, in some applications, empirical researchers hesitate to accept the estimation results of those methods that assume no hidden common causes. We could take a non-Gaussian approach [11] that uses an extension of independent component analysis with more latent independent components than observed variables (overcomplete ICA [17]) to formally consider hidden common causes in semi-parametric methods. Unfortunately, however, current versions of the overcomplete ICA algorithms are computationally unreliable since they often suffer from local optima [7]. In [8], Henao and Winther proposed a Bayesian approach to estimate the model. Their method seems to work for larger numbers of variables than the overcomplete ICA-based method. However, both these methods need to explicitly model all the hidden common causes. This approach could sometimes be computationally tough since the number of hidden common causes can be large, while specifying the exact number of hidden common causes might also be challenging. Thus, in [26], we proposed an alternative approach that does not require us to specify the number of hidden common causes or explicitly model them.

1 https://sites.google.com/site/sshimizu06/home/lingampapers


Table 2. Numbers of successful discoveries and precisions when the actual distributions are the exponential. K is the Bayes factor.

                          N. successes   N. findings   Precisions
  n = 50   2 log K > 0         137            200          0.69
           2 log K > 2          16             20          0.85
           2 log K > 6           0              0          N/A
           2 log K > 10          0              0          N/A
  n = 100  2 log K > 0         151            200          0.76
           2 log K > 2          56             64          0.88
           2 log K > 6           0              0          N/A
           2 log K > 10          0              0          N/A
  n = 200  2 log K > 0         161            200          0.81
           2 log K > 2         120            136          0.88
           2 log K > 6          32             33          0.97
           2 log K > 10          3              3          1.00
  n = 500  2 log K > 0         165            200          0.82
           2 log K > 2         152            174          0.87
           2 log K > 6         106            111          0.95
           2 log K > 10         78             78          1.00

7  Conclusions

In this paper, we discussed a non-Gaussian approach for estimating causal directions in the presence of hidden common causes. The experiments on artificial data implied that looking at Bayes factors could be useful for selecting a better causal direction. We distribute the Python codes under the MIT license at https://sites.google.com/site/sshimizu06/mixedlingamcode. Acknowledgments. This work was partially supported by JSPS KAKENHI Grant Numbers 24700275 and 24300106 and the Center of Innovation Program from the Japan Science and Technology Agency, JST.

References

1. Billingsley, P.: Probability and Measure. Wiley-Interscience (1986)
2. Bollen, K.: Structural Equations with Latent Variables. John Wiley & Sons (1989)
3. Campomanes, P., Neri, M., Horta, B.A., Röhrig, U.F., Vanni, S., Tavernelli, I., Rothlisberger, U.: Origin of the spectral shifts among the early intermediates of the rhodopsin photocycle. Journal of the American Chemical Society 136(10), 3842–3851 (2014)
4. Chen, Z., Chan, L.: Causality in linear nonGaussian acyclic models in the presence of latent Gaussian confounders. Neural Computation 25(6), 1605–1641 (2013)
5. Demidenko, E.: Mixed Models: Theory and Applications. Wiley-Interscience (2004)
6. Dodge, Y., Rousson, V.: Direction dependence in a regression line. Communications in Statistics - Theory and Methods 29(9-10), 1957–1972 (2000)


Table 3. Numbers of successful discoveries and precisions when the actual distributions are the uniform. K is the Bayes factor.

                          N. successes   N. findings   Precisions
  n = 50   2 log K > 0          77            200          0.39
           2 log K > 2           0              1          0.00
           2 log K > 6           0              0          N/A
           2 log K > 10          0              0          N/A
  n = 100  2 log K > 0          65            200          0.33
           2 log K > 2           3             24          0.13
           2 log K > 6           0              0          N/A
           2 log K > 10          0              0          N/A
  n = 200  2 log K > 0          60            200          0.30
           2 log K > 2          10             74          0.14
           2 log K > 6           0              3          0.00
           2 log K > 10          0              0          N/A
  n = 500  2 log K > 0          54            200          0.27
           2 log K > 2          29            144          0.20
           2 log K > 6           6             47          0.13
           2 log K > 10          0             10          0.00

7. Entner, D., Hoyer, P.O.: Discovering unconfounded causal relationships using linear non-Gaussian models. In: New Frontiers in Artificial Intelligence, Lecture Notes in Computer Science, vol. 6797, pp. 181–195 (2011)
8. Henao, R., Winther, O.: Sparse linear identifiable multivariate modeling. Journal of Machine Learning Research 12, 863–905 (2011)
9. Hoyer, P.O., Hyttinen, A.: Bayesian discovery of linear acyclic causal models. In: Proc. 25th Conference on Uncertainty in Artificial Intelligence (UAI2009), pp. 240–248 (2009)
10. Hoyer, P.O., Janzing, D., Mooij, J., Peters, J., Schölkopf, B.: Nonlinear causal discovery with additive noise models. In: Advances in Neural Information Processing Systems 21, pp. 689–696 (2009)
11. Hoyer, P.O., Shimizu, S., Kerminen, A., Palviainen, M.: Estimation of causal effects using linear non-Gaussian causal models with hidden variables. International Journal of Approximate Reasoning 49(2), 362–378 (2008)
12. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley, New York (2001)
13. Hyvärinen, A., Smith, S.M.: Pairwise likelihood ratios for estimation of non-Gaussian structural equation models. Journal of Machine Learning Research 14, 111–152 (2013)
14. Hyvärinen, A., Zhang, K., Shimizu, S., Hoyer, P.O.: Estimation of a structural vector autoregressive model using non-Gaussianity. Journal of Machine Learning Research 11, 1709–1731 (2010)
15. Kass, R.E., Raftery, A.E.: Bayes factors. Journal of the American Statistical Association 90(430), 773–795 (1995)
16. Lai, P.C., Bessler, D.A.: Price discovery between carbonated soft drink manufacturers and retailers: A disaggregate analysis with PC and LiNGAM algorithms. Journal of Applied Economics 18(1), 173–197 (2015)
17. Lewicki, M., Sejnowski, T.J.: Learning overcomplete representations. Neural Computation 12(2), 337–365 (2000)


Table 4. Numbers of successful discoveries and precisions when the actual distributions are randomly selected from the Laplace, exponential, and uniform distributions. K is the Bayes factor.

                          N. successes   N. findings   Precisions
  n = 50   2 log K > 0         119            200          0.60
           2 log K > 2          12             14          0.86
           2 log K > 6           0              0          N/A
           2 log K > 10          0              0          N/A
  n = 100  2 log K > 0         118            200          0.59
           2 log K > 2          48             63          0.76
           2 log K > 6           4              4          1.00
           2 log K > 10          0              0          N/A
  n = 200  2 log K > 0         122            200          0.61
           2 log K > 2          71            109          0.65
           2 log K > 6          18             22          0.82
           2 log K > 10          1              1          1.00
  n = 500  2 log K > 0         136            200          0.68
           2 log K > 2         123            168          0.73
           2 log K > 6          80            102          0.78
           2 log K > 10         41             45          0.91

18. Liu, Y., Wu, X., Zhang, J., Guo, X., Long, Z., Yao, L.: Altered effective connectivity model in the default mode network between bipolar and unipolar depression based on resting-state fMRI. Journal of Affective Disorders 182, 8–17 (2015)
19. Mills-Finnerty, C., Hanson, C., Hanson, S.J.: Brain network response underlying decisions about abstract reinforcers. NeuroImage 103, 48–54 (2014)
20. Moneta, A., Entner, D., Hoyer, P., Coad, A.: Causal inference by independent component analysis: Theory and applications. Oxford Bulletin of Economics and Statistics 75(5), 705–730 (2013)
21. Pearl, J.: Causal diagrams for empirical research. Biometrika 82(4), 669–688 (1995)
22. Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press (2000), (2nd ed. 2009)
23. Peters, J., Janzing, D., Schölkopf, B.: Causal inference on discrete data using additive noise models. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(12), 2436–2450 (2011)
24. Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41–55 (1983)
25. Rosenström, T., Jokela, M., Puttonen, S., Hintsanen, M., Pulkki-Råback, L., Viikari, J.S., Raitakari, O.T., Keltikangas-Järvinen, L.: Pairwise measures of causal direction in the epidemiology of sleep problems and depression. PLoS ONE 7(11), e50841 (2012)
26. Shimizu, S., Bollen, K.: Bayesian estimation of causal direction in acyclic structural equation models with individual-specific confounder variables and non-Gaussian distributions. Journal of Machine Learning Research 15, 2629–2652 (2014)
27. Shimizu, S., Hoyer, P.O., Hyvärinen, A.: Estimation of linear non-Gaussian acyclic models for latent factors. Neurocomputing 72, 2024–2027 (2009)
28. Shimizu, S., Hoyer, P.O., Hyvärinen, A., Kerminen, A.: A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research 7, 2003–2030 (2006)


29. Shimizu, S., Hyvärinen, A.: Discovery of linear non-Gaussian acyclic models in the presence of latent classes. In: Proc. 14th International Conference on Neural Information Processing (ICONIP2007), pp. 752–761 (2008)
30. Shimizu, S., Inazumi, T., Sogawa, Y., Hyvärinen, A., Kawahara, Y., Washio, T., Hoyer, P.O., Bollen, K.: DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research 12, 1225–1248 (2011)
31. Shimizu, S.: LiNGAM: Non-Gaussian methods for estimating causal structures. Behaviormetrika 41(1), 65–98 (2014), Special Issue on Causal Discovery
32. Shpitser, I., Evans, R.J., Richardson, T.S., Robins, J.M.: Introduction to nested Markov models. Behaviormetrika 41(1), 3–39 (2014), Special Issue on Causal Discovery
33. Spirtes, P., Glymour, C., Scheines, R.: Causation, Prediction, and Search. Springer Verlag (1993), (2nd ed. MIT Press 2000)
34. Tillman, R.E., Eberhardt, F.: Learning causal structure from multiple datasets with similar variable sets. Behaviormetrika 41(1), 41–64 (2014), Special Issue on Causal Discovery
35. Zhang, K., Hyvärinen, A.: On the identifiability of the post-nonlinear causal model. In: Proc. 25th Conference on Uncertainty in Artificial Intelligence (UAI2009), pp. 647–655 (2009)
