University of Liège – Montefiore Institute

Variable selection for Dynamic Treatment Regimes (DTR)

Raphael Fonteneau, Louis Wehenkel and Damien Ernst
Department of Electrical Engineering and Computer Science
University of Liège

27th Benelux Meeting on Systems and Control, Heeze, The Netherlands, March 18-20, 2008


Outline

● Introduction
● An example: Nefazodone CBASP trial
● Many difficulties
● Problem formulation
● Dynamic programming
● Approach for solving the inference problem
● Algorithm
● Validation: 'Car on the hill' problem
● Conclusion and future work




Introduction

● Chronic diseases need long-term treatments.
● Dynamic Treatment Regimes (DTR): treatments are operationalized as series of decisions specifying how treatment level and type should vary over time.
● Nowadays, DTR are based on clinical judgment and medical instinct rather than on a formal and systematic data-driven process.
● Over the last ten years, however, a research field has emerged that specifically addresses the inference of DTR from clinical data.


An example: Nefazodone CBASP trial

● A clinical trial set up to determine optimal DTR for chronic depression.
● More than 60 variables: gender, racial category, marital status, body mass index, medication for current depression, number of depressive episodes, alcohol, drug, ...

  1     Gender
  2     Racial category
  3-4   Marital status
  5     Body mass index
  6     Age in years at screening
  7     Treated current depression
  8     Medication current depression
  9     Psychotherapy current depression
  10    Treated past depression
  11    Medication past depression
  12    Psychotherapy past depression
  ...

● 681 patients, 12 weeks of treatment, 3 types of treatments:
  ○ Nefazodone (200, 300, 400, 500 then 600 mg per day until the end)
  ○ Cognitive behavioral-analysis system of psychotherapy (16 to 20 sessions): twice-weekly sessions (weeks 1 to 4, or to 8 if there is a problem), weekly sessions (weeks 5 to 12)
  ○ Both
● Tests are performed at time $t$ to evaluate the state of the patient, with a reward $r_t$.




Many difficulties

● Preference elicitation: defining a criterion that assesses the 'well-being' of patients.
● Confounding issues: in the Nefazodone CBASP trial, experiments are highly sensitive to the environment.
● Inference problem.
● Selecting a concise set of variables for representing the Dynamic Treatment Regime, since a policy defined on more than 60 variables is not convenient.


Problem formulation (I)

● This problem can be seen as a discrete-time problem:

  $x_{t+1} = f(x_t, u_t, w_t, t)$

● State: $x_t \in X$ (assimilated to the state of the patient)
● Actions: $u_t \in U$
● Disturbances: $w_t \in W$ (disturbance space), where $w_t$ is generated by the probability distribution $P_w(w \mid x, u, t)$
● To the transition from $t$ to $t+1$ is associated an instantaneous reward signal $r_t = r(x_t, u_t, w_t, t)$, where $r$ is the (real-valued) reward function, bounded by $B_r$.


Problem formulation (II)

● The goal is to find a policy $\pi_T(t, x) : \{0, \ldots, T-1\} \times X \to U$ that maximises the rewards obtained over a given time horizon $T$:

  $$J_T^{\pi_T}(x) = \underset{w_t,\ t = 0, 1, \ldots, T-1}{E} \left[ \sum_{t=0}^{T-1} r(x_t, \pi_T(t, x_t), w_t, t) \,\middle|\, x_0 = x \right]$$

● The 'system dynamics' $f$ is unknown and replaced by an ensemble $F$ of $p$ trajectories:

  $$(x_0^1, u_0^1, r_0^1, x_1^1, \ldots, x_{T-1}^1, u_{T-1}^1, r_{T-1}^1, x_T^1),$$
  $$(x_0^2, u_0^2, r_0^2, x_1^2, \ldots, x_{T-1}^2, u_{T-1}^2, r_{T-1}^2, x_T^2),$$
  $$\ldots$$
  $$(x_0^p, u_0^p, r_0^p, x_1^p, \ldots, x_{T-1}^p, u_{T-1}^p, r_{T-1}^p, x_T^p)$$

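The formulation above can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the scalar dynamics `f`, the reward `r`, the uniform disturbance, the horizon `T = 5` and the policy `push_to_zero` are hypothetical toy choices, not the clinical system. It builds an ensemble `F` of trajectories with random actions and estimates the expected return of a policy by Monte Carlo averaging.

```python
import random

T = 5                      # horizon (hypothetical value)
ACTIONS = (-1.0, 1.0)      # toy action space U (assumption)

def f(x, u, w, t):
    """Toy dynamics x_{t+1} = f(x_t, u_t, w_t, t); an assumption, not the clinical system."""
    return 0.9 * x + 0.5 * u + w

def r(x, u, w, t):
    """Toy bounded reward in (0, 1]: largest when the state is at 0."""
    return 1.0 / (1.0 + x * x)

def sample_w(rng):
    """Disturbance w_t; here uniform noise that does not depend on (x, u, t)."""
    return rng.uniform(-0.1, 0.1)

def rollout(x0, choose_action, rng):
    """One trajectory (x_0, u_0, r_0, x_1, ..., u_{T-1}, r_{T-1}, x_T) as a flat list."""
    traj, x = [x0], x0
    for t in range(T):
        u = choose_action(t, x)
        w = sample_w(rng)
        traj += [u, r(x, u, w, t), f(x, u, w, t)]
        x = traj[-1]
    return traj

def estimate_return(x0, policy, n_runs, rng):
    """Monte Carlo estimate of the expected sum of rewards when starting from x0."""
    total = 0.0
    for _ in range(n_runs):
        traj = rollout(x0, policy, rng)
        total += sum(traj[2::3])     # rewards sit at indices 2, 5, 8, ...
    return total / n_runs

rng = random.Random(0)

# Ensemble F of p = 20 trajectories gathered with uniformly random actions.
F = [rollout(rng.uniform(-1, 1), lambda t, x: rng.choice(ACTIONS), rng)
     for _ in range(20)]

def push_to_zero(t, x):
    """A hand-written policy pi_T(t, x): push the state toward 0."""
    return -1.0 if x > 0 else 1.0

J_hat = estimate_return(1.0, push_to_zero, 200, rng)
```

In the inference setting of the slides only `F` would be available; `f`, `r` and the disturbance law appear here solely to generate data.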

Dynamic programming

● Let us define recursively the sequence of $Q_N$-functions:

  $$Q_N(x, u) = \underset{w}{E} \left[ r(x, u, w, t) + \max_{u' \in U} Q_{N-1}(f(x, u, w, t), u') \right], \quad \text{with } Q_0 \equiv 0$$

● The policy defined by

  $$\forall t \in \{0, 1, \ldots, T-1\},\ \forall x \in X, \quad \pi_T^*(t, x) = \arg\max_{u \in U} Q_{T-t}(x, u)$$

  is a $T$-step optimal policy.
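
When the model is known and the spaces are finite, the recursion above can be evaluated exactly. The following is a minimal sketch on a hypothetical 5-state chain: the states, the two actions, the two-valued disturbance and the time-invariant `f` and `r` are all assumptions made for illustration. It computes $Q_1, \ldots, Q_T$ and reads off the policy as the argmax of $Q_{T-t}$.

```python
STATES = range(5)               # toy state space X (assumption)
ACTIONS = (-1, +1)              # toy action space U (assumption)
W = ((1, 0.8), (0, 0.2))        # (disturbance value, probability): the move succeeds or not
T = 3                           # horizon (hypothetical value)

def f(x, u, w):
    """Toy time-invariant dynamics: move by u when w = 1, stay when w = 0, clipped to [0, 4]."""
    return min(max(x + u * w, 0), 4)

def r(x, u, w):
    """Toy reward: 1 when the transition ends in the goal state 4, else 0."""
    return 1.0 if f(x, u, w) == 4 else 0.0

def q_functions(horizon):
    """Return [Q_0, Q_1, ..., Q_horizon], each a dict mapping (x, u) to a value."""
    Q = [{(x, u): 0.0 for x in STATES for u in ACTIONS}]      # Q_0 is identically 0
    for _ in range(horizon):
        prev = Q[-1]
        Q.append({
            (x, u): sum(p * (r(x, u, w) + max(prev[(f(x, u, w), u2)] for u2 in ACTIONS))
                        for w, p in W)
            for x in STATES for u in ACTIONS
        })
    return Q

Q = q_functions(T)

def optimal_policy(t, x):
    """T-step optimal action at time t in state x: argmax over u of Q_{T-t}(x, u)."""
    return max(ACTIONS, key=lambda u: Q[T - t][(x, u)])
```

Here the expectation over the disturbance is an explicit weighted sum, which is exactly what becomes unavailable once $f$ is replaced by sampled trajectories.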


Approach for solving the inference problem

● An algorithm called fitted-Q iteration computes the successive $Q_N$-functions from the ensemble of trajectories.
● This algorithm performs particularly well when the $Q_N$-functions are approximated using tree-based supervised learning methods.
● Problem: how can a good policy be defined on a small subset of variables?
● Approach:
  (1) Run fitted-Q iteration with trees.
  (2) Compute the variance reduction associated with each variable.
  (3) Rerun the algorithm, considering that states are made only of the components leading to the highest variance reduction.
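
The fitted-Q iteration loop itself is short, and a sketch may help fix ideas. To stay dependency-free, the example below swaps the tree-based learner for a much cruder stand-in: `fit_table` simply averages targets per (state bin, action) cell. The one-dimensional system that generates the four-tuples, the action set and all numeric values are hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = (-1, +1)   # toy action space (assumption)

def fit_table(inputs, targets, n_bins=10):
    """Stand-in regressor (NOT tree-based): mean target per (state bin, action) cell."""
    sums, counts = defaultdict(float), defaultdict(int)

    def cell(x, u):
        return (min(int(x * n_bins), n_bins - 1), u)

    for (x, u), y in zip(inputs, targets):
        sums[cell(x, u)] += y
        counts[cell(x, u)] += 1

    def predict(x, u):
        key = cell(x, u)
        return sums[key] / counts[key] if key in counts else 0.0

    return predict

def fitted_q_iteration(four_tuples, horizon, fit=fit_table):
    """Return [Q_1, ..., Q_horizon] learned from (x, u, r, x_next) samples."""
    def q_prev(x, u):
        return 0.0                                   # Q_0 is identically 0
    inputs = [(x, u) for x, u, _, _ in four_tuples]
    q_list = []
    for _ in range(horizon):
        # Regression targets: r + max over u' of Q_{N-1}(x_next, u')
        targets = [rew + max(q_prev(x_next, a) for a in ACTIONS)
                   for _, _, rew, x_next in four_tuples]
        q_prev = fit(inputs, targets)
        q_list.append(q_prev)
    return q_list

# Hypothetical data: state in [0, 1], reward 1 when the next state exceeds 0.9.
rng = random.Random(0)
samples = []
for _ in range(2000):
    x, u = rng.random(), rng.choice(ACTIONS)
    x_next = min(max(x + 0.1 * u + rng.uniform(-0.02, 0.02), 0.0), 1.0)
    samples.append((x, u, 1.0 if x_next > 0.9 else 0.0, x_next))

Q = fitted_q_iteration(samples, horizon=5)

def greedy(x):
    """Greedy action with respect to the last Q-function."""
    return max(ACTIONS, key=lambda u: Q[-1](x, u))
```

Any regressor exposing the same `fit(inputs, targets)` shape could be passed in place of `fit_table`, which is where a tree-based method would plug in.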

 


Algorithm

(1) Compute the $Q_N$-functions (from $N = 1$ to $N = T$) by running the fitted-Q iteration algorithm on the four-tuple set $F$.

(2) Compute the relevance of each attribute $a$ using the score:

  $$\text{score}(a) = \frac{\sum_{N=1}^{T} \sum_{\text{tree} \in Q_N} \sum_{\text{node} \in \text{tree}} \delta(a, \text{node}) \cdot \text{redv}(\text{node}) \cdot |\text{node}|}{\sum_{N=1}^{T} \sum_{\text{tree} \in Q_N} \sum_{\text{node} \in \text{tree}} \text{redv}(\text{node}) \cdot |\text{node}|}$$

  where:
  ○ redv(node) is the variance reduction when splitting the tree node node,
  ○ |node| is the size of the subset before the splitting of node,
  ○ δ(a, node) = 1 if a is used to split node, else 0.

(3) Rerun the fitted-Q iteration algorithm on the 'best attributes'.
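
The score can be illustrated on a small supervised problem. The sketch below makes several assumptions that should be read as such: a plain greedy variance-reduction regression tree stands in for the tree-based learner, the trees are grown on bootstrap resamples of a toy regression task rather than on the $Q_N$ targets, and the data (two informative attributes, one irrelevant attribute in [-2, 2]) are synthetic. Each split node is recorded as (attribute, redv, size before splitting), and the score is the normalised sum of redv times size.

```python
import random

def variance(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def best_split(rows, ys):
    """Return (redv, attribute, threshold) of the split with the largest variance reduction."""
    best = (0.0, None, None)
    n, v = len(ys), variance(ys)
    for a in range(len(rows[0])):
        for thr in sorted({row[a] for row in rows})[1:]:
            left = [y for row, y in zip(rows, ys) if row[a] < thr]
            right = [y for row, y in zip(rows, ys) if row[a] >= thr]
            redv = v - len(left) / n * variance(left) - len(right) / n * variance(right)
            if redv > best[0]:
                best = (redv, a, thr)
    return best

def grow(rows, ys, nmin, splits):
    """Grow one tree, appending (attribute, redv, node size) for every split node."""
    if len(ys) < nmin:
        return
    redv, a, thr = best_split(rows, ys)
    if a is None:
        return
    splits.append((a, redv, len(ys)))
    for go_left in (True, False):
        idx = [i for i, row in enumerate(rows) if (row[a] < thr) == go_left]
        grow([rows[i] for i in idx], [ys[i] for i in idx], nmin, splits)

def scores(trees, n_attributes):
    """score(a): sum of redv * size over nodes split on a, divided by the sum over all split nodes."""
    per_attr = [0.0] * n_attributes
    for splits in trees:
        for a, redv, size in splits:
            per_attr[a] += redv * size
    total = sum(per_attr)
    return [s / total for s in per_attr]

# Synthetic data: attributes 0 and 1 drive the output, attribute 2 is irrelevant.
rng = random.Random(0)
rows = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-2, 2)) for _ in range(200)]
ys = [x0 + 0.5 * x1 for x0, x1, _ in rows]

trees = []
for _ in range(5):                                   # 5 trees on bootstrap resamples
    idx = [rng.randrange(len(rows)) for _ in rows]
    splits = []
    grow([rows[i] for i in idx], [ys[i] for i in idx], nmin=20, splits=splits)
    trees.append(splits)

s = scores(trees, n_attributes=3)
```

In the procedure of the slides, `trees` would instead collect the split records of every tree of every $Q_N$, and step (3) would keep the attributes with the highest entries of `s`.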


Validation: 'Car on the hill' problem (I)

● A car, represented by a point, is riding on a slope represented by the following graph.

  [Figure: the car on the hill; labels R, mg and u]

● Problem: starting from the lowest point, the car has to reach the top of the hill, using only the values 4 or -4 for $u$, in a minimum number of iterations, and without running too fast.
● $x$ = (position, speed)
● Originally, the problem is deterministic.
● We have added some non-informative components to the original state to set up an experimental protocol.

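The experimental protocol of adding non-informative components can be sketched as follows. The range [-2, 2] is taken from the 'rand [-2,2]' labels of the results; drawing the components uniformly and redrawing them independently at every time step are assumptions, as are the helper names and the sample values, which are made up for illustration.

```python
import random

def augment(state, n_irrelevant, rng):
    """Append n_irrelevant components drawn uniformly from [-2, 2] to a (position, speed) state."""
    return tuple(state) + tuple(rng.uniform(-2.0, 2.0) for _ in range(n_irrelevant))

def augment_four_tuple(sample, n_irrelevant, rng):
    """Augment both x_t and x_{t+1} of a four-tuple (x, u, r, x_next) with fresh noise."""
    x, u, rew, x_next = sample
    return (augment(x, n_irrelevant, rng), u, rew, augment(x_next, n_irrelevant, rng))

rng = random.Random(0)
sample = ((-0.5, 0.0), 4, 0.0, (-0.48, 0.3))   # made-up numbers, not from the slides
aug = augment_four_tuple(sample, n_irrelevant=3, rng=rng)
```

A selection method that works should then give these extra components low scores compared with position and speed.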

Validation: 'Car on the hill' problem (II)

● Results: variable relevance

Subset  k   nb of   nmin   nb of irrelevant   nb of        Scores
size        trees          variables          iterations   position   speed   u      rand [-2,2]   rand [-2,2]   rand [-2,2]
10000   3   15      2      0                  50           0.30       0.36    0.34   /             /             /
5000    3   15      2      0                  50           0.24       0.35    0.41   /             /             /
4000    3   15      2      0                  50           0.23       0.33    0.44   /             /             /
3000    3   15      2      0                  50           0.28       0.28    0.44   /             /             /
2000    3   15      2      0                  50           0.23       0.34    0.43   /             /             /
50000   4   50      2      1                  50           0.21       0.24    0.44   0.11          /             /
40000   4   15      2      1                  30           0.24       0.25    0.39   0.12          /             /
20000   4   15      8      1                  30           0.30       0.20    0.48   0.10          /             /
10000   4   15      4      1                  30           0.16       0.34    0.41   0.09          /             /
8000    4   15      2      1                  30           0.21       0.24    0.44   0.11          /             /
5000    4   15      2      1                  30           0.23       0.18    0.47   0.12          /             /
5000    4   15      4      1                  30           0.27       0.30    0.35   0.08          /             /
20000   5   15      2      2                  30           0.11       0.26    0.46   0.08          0.09          /
10000   5   15      4      2                  30           0.15       0.24    0.43   0.08          0.10          /
20000   6   50      4      3                  50           0.15       0.21    0.41   0.08          0.08          0.07
10000   6   15      2      3                  30           0.10       0.28    0.42   0.08          0.06          0.06


Conclusion and future work

● A simple method of variable selection for reinforcement learning problems.
● Incorporation of variable selection into the fitted-Q algorithm: possibility to compute a policy depending only on the most informative variables.
● Application to the Nefazodone CBASP trial.
● Could this process also help in designing algorithms with better inference capabilities?
