Patient-Reported Outcome Measures (PROMs) for routine use in Treatment Centres: recommendations based on a review of the scientific evidence

Sarah C Smith, Stefan Cano, Donna L Lamping, Sophie Staniszewska1, John Browne, James Lewsey, Jan van der Meulen, John Cairns, Nick Black

Health Services Research Unit, London School of Hygiene & Tropical Medicine, London, UK

1 Royal College of Nursing Institute, Oxford, UK

Final report to the Department of Health December 2005

Table of Contents

Executive Summary
Acknowledgements
List of abbreviations
Chapter 1 Background and methods
Chapter 2 Disease-specific PROMs in cataract surgery
Chapter 3 Disease-specific PROMs in varicose vein surgery
Chapter 4 Disease-specific PROMs in hernia repair
Chapter 5 Disease-specific PROMs in hip and knee replacement surgery
Chapter 6 Generic PROMs for utility assessment
Chapter 7 Post-operative complications
Chapter 8 Recommendations
References
Appendix 1 Final set of text search terms for PROMs
Appendix 2 Final set of MESH search terms for PROMs
Appendix 3 Final set of text & MESH search terms - patient-reported post-operative complications
Appendix 4 Data extraction sheet
Appendix 5 Clinician survey


Executive Summary

The advent of Treatment Centres creates a new opportunity to provide efficient elective surgery to NHS patients. In the increasing culture of accountability within the health service, clinicians, patients, managers and policy-makers are concerned that Treatment Centres are not only efficient but also of high quality. The use of patient-reported outcome measures (PROMs) ensures that patients’ assessments of outcomes are included in the assessment of performance. These complement assessments made by clinicians.

The objectives were to:

• identify generic and disease-specific PROMs in five areas of surgery
• review the psychometric properties (reliability, validity and responsiveness) of the identified PROMs
• recommend one disease-specific measure for use in each of the five areas of surgery
• recommend one generic measure that could be used for utility assessment across all five areas of surgery
• review the available patient-reported measures for assessing post-operative complications and make a recommendation

To identify potentially relevant articles we searched 11 electronic databases, checked the reference lists of selected articles to identify further relevant articles, consulted experts in each of the five areas of surgery, and hand-searched the last 5 years of three journals. Articles were selected for inclusion using explicit criteria. Data were extracted from included articles using a standard form and, where the developers were contactable, a summary of the data was sent to them for verification. We appraised each PROM using explicit criteria, based on gold standard psychometric methods and an explicit rating scheme. Each measure was appraised independently by two reviewers. To ensure that our short-listed PROMs were acceptable to patients we considered three operational criteria. Finally, to ensure any proposed measures would have clinical credibility, we invited surgeons to comment.


As a result of that process we recommend use of the following disease (procedure)-specific measures:

• VF-14 (cataract surgery)
• Aberdeen Varicose Vein Questionnaire (varicose vein surgery)
• SF-36 (hernia repair)
• OHS (hip replacement surgery)
• OKS (knee replacement surgery)

In addition, we recommend the use of the EQ5D (generic utility measure) and relevant questions from the Patients' Experiences of Surgery Questionnaire to cover post-operative complications.


Acknowledgements

We would like to thank the members of the Clinical Advisory Group (Professor Paul Edwards, Professor Paul Gregg, Professor Andrew Kingsnorth, Dr Eileen Scott, Dr John Sparrow) for their input. We are also grateful to the library staff of the London School of Hygiene & Tropical Medicine (LSHTM), particularly Janice Ward, for invaluable assistance in obtaining relevant articles.


List of abbreviations

ADVS - Activities of Daily Vision Scale
AIMS - Arthritis Impact Measurement Scale
Alpha - Cronbach’s alpha
BP - Bodily pain subscale of SF-36
EQ5D - EUROQOL
ES - Effect size
GH - General health perception subscale of SF-36
HAQ - Health Assessment Questionnaire
HVAT - Houston Vision Assessment
ICC - Intra-class correlation coefficient
KR-20 - Kuder Richardson-20 statistic
LEFS - Lower Extremity Functional Scale
MCS - Mental component summary score of the SF-36
MH - Mental health subscale of SF-36
MIDs - Minimally important differences
OHS - Oxford Hip Score
OKS - Oxford Knee Score
PCS - Physical component summary score of the SF-36
PF - Physical functioning subscale of SF-36
PSI - Patient-Specific Index
RE - Role-emotional subscale of SF-36
RP - Role-physical subscale of SF-36
SF - Social functioning subscale of SF-36
SF-36 - Short-Form 36 Health Status Questionnaire
SRM - Standardised response mean
VCM1 - Vision Core Module 1
VDA - Visual Disability Assessment
VEINES-QOL/Sym - Venous Insufficiency Epidemiological and Economic Study Quality of life/Symptoms Questionnaire
VF-14 - Visual Function Index
VSQ - Visual Symptoms and Quality of Life Questionnaire
VT - Vitality subscale of SF-36
WOMAC™ - Western Ontario and McMaster Osteoarthritis Index


Chapter 1 Background and methods

1.1 Background

The advent of Treatment Centres creates a new opportunity to provide efficient elective surgery to NHS patients. In the increasing culture of accountability within the health service, clinicians, patients, managers and policy-makers are concerned that Treatment Centres are not only efficient but also of high quality. The use of patient-reported outcome measures (PROMs) ensures that patients’ assessments of outcomes are included in the assessment of performance. These complement assessments made by clinicians.

A PROM is any measure of the outcome of treatment that is reported directly by patients. This includes post-operative complications, health or functional status, health-related quality of life (HRQL) and satisfaction with the outcome. PROMs are useful tools for evaluating health changes following interventions but must be shown to be fit for purpose according to standard scientific criteria and practical considerations. Most importantly, as with all outcome measures used to evaluate health care, PROMs must be shown to be scientifically robust measuring instruments. This involves a rigorous and systematic assessment of psychometric properties (e.g. reliability, validity, responsiveness). For PROMs that have been shown to be psychometrically robust, practical/operational issues, such as patient burden, costs and clinicians' views about acceptability, then need to be considered. PROMs that are judged to be psychometrically rigorous and practical for routine use can be considered fit for purpose for evaluating outcomes.

1.2 Aims

The first aim was to undertake a comprehensive and systematic review of the international literature on disease (or procedure) specific PROMs in five areas of surgery (cataract surgery, varicose vein procedures, hip replacement, knee replacement and hernia repair) that could be used to assess effectiveness.
The second aim was to identify generic measures applicable to all surgical areas that can be used to assess efficiency. In addition, we aimed to undertake a limited review of instruments that assess post-operative complications.

1.3 Objectives

The objectives were to:

• identify generic and disease-specific PROMs in five areas of surgery
• review the psychometric properties (reliability, validity and responsiveness) of the identified PROMs
• recommend one disease-specific measure for use in each of the five areas of surgery
• recommend one generic measure that could be used for utility assessment across all five areas of surgery
• review the available patient-reported measures for assessing post-operative complications and make a recommendation

1.4 Search strategies

We used four strategies to search for relevant papers that describe use and/or evaluation of either disease-specific or generic measures in any of the five areas of surgery. First, we searched nine electronic databases (Web of Knowledge, IBSS, Medline, Cochrane Library, HMIC, PubMed, PsycINFO, Embase and Cinahl). This was supplemented with two additional databases (SIGLE and NY Academy of Grey Literature) specifically focused on grey literature. The search terms used for each of these databases are described below. All searches were conducted from 1950 to the present (or from whenever the database started, if later than 1950). Second, we checked the reference lists of selected articles to identify further relevant articles. Third, we consulted experts in each of the five areas of surgery to elicit suggestions of the measures that are most commonly used and further references. Finally, we hand-searched the last 5 years of three journals (Annals of the Royal College of Surgeons, Journal of Clinical Epidemiology, Medical Care) to identify additional articles not already identified.

Text searches

We used an iterative process to develop the search terms used for the electronic databases. This combined tried-and-tested search terms used by other reviews 1 and existing Cochrane reviews, extensive consultation between members of the research team, and input from the project's clinical advisory group and other members of the Royal College of Surgeons (England). Our intention was to develop a search that was as inclusive as possible, whilst at the same time yielding a manageable number of articles within the time frame for the review. To avoid excluding measures that may have been developed and/or tested on a relevant disease but not tested in relation to the specific surgical procedure, our initial searches focused on disease-related rather than procedure-related terms. After each revision to the search terms, searches were re-run to determine the effect of each revision. Our final search terms described six domains: body part (e.g. hip or knee), disease (e.g. osteoarthritis), respondent (e.g. patient, self-report), content area (e.g. health or functional status, health-related quality of life), elicitation method (e.g. measure, scale, questionnaire), and measure of scientific quality (e.g. psychometric, reliability, validity, responsiveness). The full details of the final set of search terms are described in Appendix 1. Terms were combined using Boolean operators and were truncated where necessary.

MESH or thesaurus terms

MESH or thesaurus terms were used to search the five databases which have this facility (Medline, Cochrane Library, PubMed, Embase and Cinahl). The MESH/thesaurus terms were initially identified from the MESH headings available in Medline that most closely matched each text word from the text search. Potential MESH terms were then reviewed by members of the steering group and clinical advisory group.
The final selection of terms was intended to reflect the most relevant terms for each procedure and to avoid the selection of terms describing infrequently used procedures. The final list of MESH terms is described in Appendix 2.

Searches for measures of post-operative complications

To search for complications measures we conducted text and MESH searches on a more limited number of databases (Medline, HMIC and King's Fund). These databases were selected to give broad representation of medical literature and the wider professional health care literature. We developed a search strategy based on the clinical terms from our previous searches and combined these with 'complications' terms obtained by consultation with the steering group. The details of the searches are presented in Appendix 3. The searches were run from 1950 to the present, or from whenever the database started if after 1950. All database searches were run in early 2005.

1.5 Refinement of search strategy and selection of articles for review

As described above, we used an iterative process to define the optimal search strategy for identifying potentially relevant articles for review. The first and most inclusive iteration of the search strategy was not sensitive enough; it yielded 10244 articles. Subsequent iterations of various combinations of search terms yielded from 144 to 4957 articles. After several iterations of the search strategy, in which we repeatedly evaluated the sensitivity and inclusiveness of our results with each iteration, we identified 2956 potentially relevant articles (759 cataracts, 1178 hips and knees, 426 hernias, 593 veins).

The criteria for selecting an article for review were related to the specific objectives of this work, i.e. to recommend the most scientifically robust generic and disease-specific PROMs in five areas of surgery for use in Treatment Centres in the UK. Therefore, to be selected, an article had to report, in English and in a culturally relevant setting, on a PROM (either generic or disease-specific) applied in surgery or in a surgical context in an adult sample. We included any PROM except measures based on a visual analogue scale (VAS) or other single item rating scale, as the measurement properties of VAS are well documented elsewhere 2,3. We also included studies of patients who were listed for but who had not yet actually undergone surgery.

Thus, articles were excluded, for example, that reported: i) in a language other than English, as the PROM would not be appropriate for use in a UK setting; ii) using a PROM to evaluate outcomes in patients with rheumatoid arthritis or varicose veins undergoing medical (not surgical) treatment; iii) in a sample limited to children, as the PROM could not be used in adults; and iv) the development/use of a PROM outside Europe, North America or Australia, as the PROM would not be culturally appropriate for use in a UK setting.
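The selection criteria in section 1.5 can be illustrated with a short screening sketch. This is not the procedure the reviewers used (abstracts were screened by two reviewers on a consensus basis); it only shows how the stated criteria combine. The `Article` record, its field names and the function names are hypothetical and were introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Regions treated as culturally relevant for a UK setting in section 1.5.
RELEVANT_REGIONS = {"Europe", "North America", "Australia"}


@dataclass
class Article:
    """Hypothetical screening record; field names are illustrative only."""
    language: str             # language of the report, e.g. "English"
    region: str               # where the PROM was developed or used
    surgical_context: bool    # surgery, or patients listed for surgery
    adult_sample: bool        # False if the sample is limited to children
    single_item_or_vas: bool  # VAS or other single-item rating scale


def exclusion_reasons(article: Article) -> list[str]:
    """Return every stated criterion that the article fails."""
    reasons = []
    if article.language != "English":
        reasons.append("not reported in English")
    if article.region not in RELEVANT_REGIONS:
        reasons.append("setting outside Europe, North America or Australia")
    if not article.surgical_context:
        reasons.append("PROM not applied in surgery or a surgical context")
    if not article.adult_sample:
        reasons.append("sample limited to children")
    if article.single_item_or_vas:
        reasons.append("VAS or other single-item rating scale")
    return reasons


def is_selected(article: Article) -> bool:
    """An article is selected only if no exclusion reason applies."""
    return not exclusion_reasons(article)
```

Returning the list of reasons rather than a bare yes/no keeps a record of why an article was dropped, which is useful when two screeners compare their judgements.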

From the pool of 2956 articles identified in the searches, we selected articles for review as follows. First, to develop and test the selection criteria, two members of the team (SC and SS) independently reviewed a subsample of abstracts (approx 10%) to judge whether the article should be selected for review. The two reviewers then met to compare their selection judgements and finalise the selection criteria. All remaining abstracts were then jointly reviewed by the two reviewers on a consensus basis to select articles for review. Additional articles that were subsequently identified from reference lists of included papers, consultation with experts, and by hand-searching journals were reviewed for inclusion by one of the reviewers. Where the abstract did not provide sufficient information to decide whether the article should be included, the decision was made based on the information in the full article. Where a measure was originally developed for use with a wider condition (e.g. osteoarthritis) we cited the original development paper but the appraisal was based only on its use in surgery.

1.6 Data extraction

Data were extracted from each included article using a standard form (Appendix 4). Initially, data from a small number of articles were extracted by two reviewers (SS and SC) to ensure there were no inter-rater discrepancies. Minor amendments were subsequently made to the data extraction form for clarity. Subsequent data extraction was undertaken by one of the two reviewers. Where the developer was contactable, a summary of the data extracted for each disease-specific measure was sent to the developer for verification.

1.7 Psychometric criteria

Appraisal of disease-specific measures

For disease-specific measures, we undertook formal psychometric appraisal for each measure for which there was at least one piece of psychometric evidence (reliability, validity or responsiveness) available. To do this we used an appraisal framework developed in our previous work 4, 5 which included the following components: reliability (internal consistency, test-retest reliability); validity (content validity, criterion-related validity, construct validity [within scale analyses, convergent/discriminant, known groups], other hypothesis testing); and responsiveness. 1, 6, 7 The framework and criteria are described in more detail in Table 1a. The framework does not include the component of cross-cultural adaptation or translation as this was not considered relevant for the current review.

For each of the components in our appraisal framework, two psychometric experts (SS and SC) independently rated the psychometric evidence for each PROM on a 4-point rating scale (0 = not reported or no evidence in favour; + = some limited evidence in favour; ++ = some good evidence in favour, but some aspects do not meet criteria or some aspects not reported; +++ = good evidence in favour). For example, a +++ rating for internal consistency indicates that the article reported both the alpha coefficient and item-total correlations, whereas a ++ rating indicates that the article reported only the alpha coefficient. Any discrepancies between the two independent raters were discussed and a consensus rating agreed. [For components rated ‘0’, no distinction was made between those for which the component had never been evaluated and those in which their evaluation had revealed evidence of unsatisfactory performance. While in some circumstances this is an important distinction, in the context of this report it makes no material difference as regardless of the cause of a 0 rating, an instrument would be deemed inappropriate.]

For each surgical procedure, the two raters reviewed the psychometric evidence for each PROM and agreed a short-list of measures with the strongest psychometric properties. As is standard practice in psychometric appraisals, the comparative assessment of measures and subsequent short-listing of the most scientifically robust PROMs was determined by a consensus decision, based on expert opinion about the strength of the psychometric evidence, rather than on quantitative algorithms or an explicit weighting scheme. Thus, a PROM with several +++ or ++ ratings across the range of psychometric components would be appraised as being psychometrically stronger than a PROM with largely ++ or + ratings. For a procedure where two measures showed strong psychometric properties, both were short-listed and then judged in terms of operational characteristics and clinical credibility.

Appraisal of generic measures for utility assessment


Psychometric appraisal of generic measures used the same framework and procedure as for disease-specific measures. We undertook formal appraisal only of generic measures that had been used in all five areas of surgery.

1.8 Operational criteria

Three operational criteria (acceptability, interpretability and feasibility/burden) were also appraised on a two-point scale: 0 = no evidence, + = some evidence. These are described in more detail in Table 1b.

1.9 Clinicians’ views

To ensure that our short-listed PROMs were also acceptable and credible to clinicians we consulted three surgeons in each surgical area, identified by the clinical advisory group. Clinicians were sent a short description of each short-listed PROM in their surgical area and asked their view of the acceptability of this choice. A sample of the materials sent to clinicians is in Appendix 5.

1.10 Recommendations

The final decision about which measure to recommend in each area was informed by the psychometric evidence, operational evidence and clinicians' views. Contenders had to first satisfy the psychometric criteria. For those that did, the operational criteria were then addressed. Finally, for those that met the practical requirements of the operational criteria, the views of clinicians were sought to establish whether or not the measure had clinical credibility.

1.11 Structure of the report

Each of the next four chapters describes and appraises the psychometric properties of disease-specific PROMs: cataract surgery (Chapter 2); varicose vein surgery (Chapter 3); hernia repair (Chapter 4); and hip and knee replacement (Chapter 5). For each measure we provide a summary of the available surgery-specific evidence. Where measures were not originally developed for use in surgery, we cite the main development paper(s) but only describe the surgery-specific evidence in detail. Each chapter describes the format and content of each measure and then reviews the evidence on reliability (internal consistency and test-retest), validity (content, construct, criterion-related and known groups) and responsiveness. A particular component is not described in the text if there was no available evidence. Appraisal of the evidence is summarised in a table using the rating system described above.

Generic PROMs applicable for use in economic evaluation of all five surgical areas are described and appraised in Chapter 6. We provide a general description of each generic measure and an overview of the general population-based evidence, obtained from existing compendia 3, 8-10. The relevant surgery-specific evidence for each measure is then described in detail. In Chapter 7, measures of post-operative complications are discussed and the overall recommendations are presented in Chapter 8.
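The independent double rating described in section 1.7 can be sketched as a comparison that flags the components on which the two raters differ, leaving the consensus discussion to the raters. The function name, the dictionary layout and the example ratings in the usage note are assumptions made for illustration; the report does not describe any software used for this step. In line with the report, the sketch deliberately does not rank or short-list measures, since that was a consensus decision rather than an algorithm.

```python
# Ordered levels of the 4-point rating scale described in section 1.7.
RATING_LEVELS = ["0", "+", "++", "+++"]


def find_discrepancies(rater_a: dict, rater_b: dict) -> dict:
    """Return the components on which the two raters disagree.

    Each argument maps an appraisal component (e.g. "internal consistency")
    to one of RATING_LEVELS. A component rated by only one rater is also
    flagged, with None recorded for the missing rating.
    """
    flagged = {}
    for component in sorted(set(rater_a) | set(rater_b)):
        a = rater_a.get(component)
        b = rater_b.get(component)
        for rating in (a, b):
            if rating is not None and rating not in RATING_LEVELS:
                raise ValueError(f"unknown rating {rating!r} for {component}")
        if a != b:
            flagged[component] = (a, b)
    return flagged
```

For instance, with invented ratings `{"internal consistency": "+++", "responsiveness": "+"}` and `{"internal consistency": "++", "responsiveness": "+"}`, only internal consistency would be returned for discussion.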


Table 1a Psychometric criteria for evaluation of PROMs

Each entry gives the appraisal component, its Definition/Test and its Criteria for Acceptability.

Reliability

Internal consistency
Definition/Test: The extent to which items comprising a scale measure the same construct (e.g. homogeneity of items in the scale); assessed by Cronbach's alphas 11 and item-total correlations
Criteria for Acceptability:
• Cronbach's alphas for summary scores ≥ 0.70 for group comparisons 6
• item-total correlations ≥ 0.20 12

Test-retest reliability
Definition/Test: The stability of a measuring instrument over time; assessed by administering the instrument to respondents on two different occasions and examining the correlation between test and retest scores
Criteria for Acceptability:
• test-retest reliability correlations for summary scores ≥ 0.70 for group comparisons 6

Validity

Content validity
Definition/Test: The extent to which the content of a scale is representative of the conceptual domain it is intended to cover; assessed qualitatively during the questionnaire development stage through pre-testing with patients, expert opinion and literature review
Criteria for Acceptability:
• qualitative evidence from pre-testing with patients, expert opinion and literature review that items in the scale are representative of the construct being measured

Criterion-related validity

Concurrent validity
Definition/Test: evidence that the scale predicts a gold standard criterion that is measured at the same time; assessed on the basis of correlations between the scale and the criterion measure
Criteria for Acceptability:
• high correlation between the scale and the criterion measure

Predictive validity
Definition/Test: evidence that the scale predicts a gold standard criterion that is measured in the future; assessed on the basis of correlations between the scale and the criterion measure
Criteria for Acceptability:
• high correlation between the scale and the criterion measure

Construct validity

Within-scale analyses
Definition/Test: evidence that a single entity (construct) is being measured and that items can be combined to form a single score; assessed on the basis of evidence of good internal consistency and correlations between scale scores (which purport to measure related aspects of the construct)
Criteria for Acceptability:
• internal consistency (Cronbach’s alpha) ≥ 0.70
• moderate to high correlations between scale scores where appropriate (if there is only one scale in the measure, internal consistency alone is considered sufficient evidence)
• factor analysis may also be undertaken to empirically confirm the conceptual framework; factors should support the hypothesised scales
• item convergent/discriminant analyses to confirm hypothesised scale structure

Analyses against external criteria

Convergent validity
Definition/Test: evidence that the scale is correlated with other measures of the same or similar constructs in the hypothesised direction; assessed on the basis of correlations between the measure and other similar measures
Criteria for Acceptability:
• Correlations are expected to vary according to the degree of similarity between the constructs that are being measured by each instrument. Specific hypotheses are formulated and predictions tested on the basis of correlations.

Discriminant validity
Definition/Test: evidence that the scale is not correlated with measures of different constructs; assessed on the basis of correlations with measures of different constructs
Criteria for Acceptability:
• low correlations between the instrument and measures of different constructs

Known groups differences
Definition/Test: The ability of a scale to differentiate known groups; assessed by comparing scores for subgroups who are expected to differ on the construct being measured (e.g. a clinical group and a control group)
Criteria for Acceptability:
• significant differences between known groups and/or a difference of expected magnitude

Responsiveness
Definition/Test: The ability of a scale to detect significant change over time; assessed by comparing scores before and after an intervention of known efficacy (on the basis of various methods including t-tests, 12 effect sizes (ES), 13,14 standardised response means (SRM) 15 or responsiveness statistics 16)
Criteria for Acceptability:
• statistically significant change in scores from pre- to post-treatment and/or a difference of expected magnitude

*Adapted from Lamping et al. 2002; 2003
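The quantitative criteria in Table 1a can be made concrete with a minimal sketch of the statistics they refer to. It uses the standard textbook definitions: Cronbach's alpha from item and total-score variances, the effect size as mean change divided by the standard deviation of baseline scores, and the standardised response mean as mean change divided by the standard deviation of change scores. Two points are assumptions rather than anything stated in the report: the item-total correlation is computed in its "corrected" form (each item against the sum of the remaining items), and all function names are illustrative.

```python
from statistics import mean, stdev, variance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def cronbach_alpha(items):
    """items: list of k lists, one per item, each holding one score per respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def corrected_item_total(items):
    """Correlation of each item with the sum of the remaining items (assumed variant)."""
    correlations = []
    for i, item in enumerate(items):
        rest = [sum(scores) - scores[i] for scores in zip(*items)]
        correlations.append(pearson(item, rest))
    return correlations


def effect_size(pre, post):
    """Mean change divided by the standard deviation of the baseline scores."""
    change = [b - a for a, b in zip(pre, post)]
    return mean(change) / stdev(pre)


def standardised_response_mean(pre, post):
    """Mean change divided by the standard deviation of the change scores."""
    change = [b - a for a, b in zip(pre, post)]
    return mean(change) / stdev(change)
```

Against the thresholds in Table 1a, a scale would be checked with `cronbach_alpha(items) >= 0.70` and `min(corrected_item_total(items)) >= 0.20`. The effect size and standardised response mean differ only in the denominator, so they can diverge noticeably when change scores vary much more or less than baseline scores.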

Table 1b Operational criteria for evaluation of PROMs

Each entry gives the appraisal component, its Definition/Test and its Criteria for Acceptability.

Acceptability
Definition/Test: the quality of data; assessed by completeness of data and score distributions
Criteria for Acceptability:
• missing data for summary scores <5%
• even distribution of endorsement frequencies across response categories
• floor/ceiling effects for summary scores <10%

Interpretability
Definition/Test: a description of what a numerical score actually means (e.g. the mapping of a point difference to a textual description). Interpretability data should be based on an appropriately large sample to ensure generalisability.
Criteria for Acceptability:
• There are various methods for providing information about interpretability of a score, including clinical/minimally important differences (MIDs). As a minimum, mean and standard deviation statistics should be provided.

Feasibility/burden
Definition/Test: the time, energy, financial resources, personnel or other resources required of respondents or those administering the instrument.
Criteria for Acceptability:
• low rates of non-completion
• short time to administer/complete
• reasonable time and resources to collect, process and analyse data
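The numerical acceptability thresholds in Table 1b (missing data below 5%, floor/ceiling effects below 10%) can be checked with a few lines of code. The sketch below is illustrative only. It assumes that missing summary scores are recorded as `None` and that floor and ceiling percentages are taken over respondents with an observed score; the table does not specify the denominator. The function name and the returned keys are hypothetical.

```python
def acceptability_summary(scores, min_score, max_score):
    """Summarise data quality for one set of summary scores.

    scores: one summary score per respondent, with None where it is missing.
    min_score, max_score: the lowest and highest possible scores on the scale.
    """
    observed = [s for s in scores if s is not None]
    if not scores or not observed:
        raise ValueError("at least one observed score is required")
    missing_pct = 100 * (len(scores) - len(observed)) / len(scores)
    # Assumption: floor/ceiling percentages use observed scores as denominator.
    floor_pct = 100 * sum(s == min_score for s in observed) / len(observed)
    ceiling_pct = 100 * sum(s == max_score for s in observed) / len(observed)
    return {
        "missing_pct": missing_pct,
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "meets_missing": missing_pct < 5,
        "meets_floor_ceiling": floor_pct < 10 and ceiling_pct < 10,
    }
```

The third acceptability criterion, an even distribution of endorsement frequencies across response categories, is an item-level judgement and is not covered by this sketch.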

Chapter 2 Disease-specific PROMs in cataract surgery

Our review identified eight disease-specific measures for use in cataract surgery. (We also identified four papers that have used visual analogue scales (VAS) or single question assessments in cataract surgery and four measures that had been developed/evaluated for use in cataract but not in cataract surgery. These were not included in the review.) The eight identified measures are described below together with the relevant psychometric evidence. Where a measure has been used/evaluated for cataract surgery but was not necessarily originally developed for this purpose, we cite the original development paper but only review the surgery-specific evidence in detail.

2.1 Visual Function Index (VF-14)

The Visual Function Index (VF-14) 13 consists of 14 items describing visual functional impairment resulting from cataract. It is a self-reported measure which can either be self- or interviewer-administered and was developed in US English. The measure is reported to take approximately 8 minutes to complete 14. Each item for which the patient has difficulty is reported on a 4-point Likert-type response scale and the time frame is the present. The measure generates a single overall score which is transformed to 0-100, where a higher score describes better outcome.

The VF-14 was developed on the basis of a review of the literature, consultation with clinical experts and qualitative interviews with patients. The measure has acceptable internal consistency (alpha = 0.74-0.85, item-total correlations = 0.32-0.61) 13 15 and test-retest reliability (ICC = 0.79) 16. The VF-14 shows evidence of construct validity (convergent): moderate correlations with the Sickness Impact Profile (SIP) 13 15, visual acuity 15 17 and standard gamble preferences expressed for visual health 14, and correlations with the VR Sickness Impact Scale (a vision-related modification of the generic SIP) 15 17 and verbal ratings of visual health 14.

Other validity evidence 18 has indicated that higher pre-operative VF-14 scores predict the likelihood of worse post-operative outcome. The measure has acceptable evidence of responsiveness (significant post-operative improvement 15; effect sizes pre/post surgery 0.87-1.49 16 17) and shows greater responsiveness with cataract surgery patients than the SIP 16. Means and standard deviations are available to aid interpretation of the VF-14 16.

Four subsequent studies 19 20 21 22 have considered shorter English versions of the VF-14, either using classical item-reduction techniques 19 20 or Rasch analyses 21 22. Although some psychometric evaluation has been conducted for each of these shortened versions, there is no consensus on a reduced set of items that should be included in a shortened version of the VF-14. The VF-14 has also been used to compare visual function in cataract patients implanted with multifocal lenses with those implanted with mono-focal lenses 23. We identified one additional study 24 that had used the VF-14 to evaluate functional outcomes from cataract surgery in centres where there were substantial differences in the way that care and clinical practice were organised.

2.2 Activities of Daily Vision Scale (ADVS)

The Activities of Daily Vision Scale (ADVS) 25 consists of 20 items describing vision-related activities. It is a self-reported measure that is administered by interviewer either in person or by telephone. The ADVS was developed in US English for use with cataract surgery patients. Items are reported on a 5-point scale. The measure generates five sub-scale scores (night driving, day driving, far vision, near vision, glare disability) and an overall score. Sub-scales can be computed if at least half of the items in the sub-scale are complete. All scores are transformed to 0-100, where higher scores represent better outcome.

The content of the ADVS was developed on the basis of clinical expert opinion and consultation with patients.
Three studies have reported ceiling effects 26 and high missing data. 27 28 The measure has acceptable internal consistency (alpha = 0.910.94).25 27 Test-retest reliability is high for the overall score and four of the five sub19

scores was high (r = 0.94-0.77) though one sub-scale (day time driving) has lower test-retest reliability (r = 0.65).25 Construct validity (within scale) is supported by demonstration of one overall factor in principal component analysis 25. Construct validity (convergent) has been demonstrated (moderate correlations with SF-36, visual loss and global rating question, 25 visual acuity, 29 28 glare and spatial contrast sensitivity. 29 Known groups validity has also been demonstrated.30 The ADVS has acceptable evidence of responsiveness (significant post-surgery improvements) 26 29 and change in ADVS scores at 12 months is significantly greater for patients undergoing first and only cataract extraction than for those undergoing a second extraction and for those who had a second extraction within 12 months. 31 Relative responsiveness has been evaluated (change in ADVS is not related to change in visual acuity, but is related to change in SF-36 dimension scores, except mental health).31 28 Information is available on interpretability of scores (median values for a small sample of healthy controls, 26 average pre-operative scores31 prediction models for patients most likely to benefit.28 One study has considered an item-reduced version of the ADVS using Rasch analysis. 27

This version consists of 15 items and a revised response scale. It has acceptable internal consistency (alpha = 0.91) and construct validity (moderate correlations with visual acuity and contrast sensitivity), but has not been tested for test-retest reliability or responsiveness, and the authors suggest this version needs further refinement.

2.3 Vision Core Module 1 (VCM1)

The Vision Core Module 1 (VCM1) 32 consists of 10 items describing vision-related quality of life. It is intended to be a core vision-related questionnaire with general applicability, to which other vision-related, disease-specific modules can be added. It is a self-reported measure that was developed in the UK for general use with visually impaired patients, including a small number of cataract surgery patients. Items are reported on a 6-point Likert-type scale. The measure generates a single, overall score, where higher scores indicate worse quality of life.

The VCM1 was developed after an extensive phase of questionnaire development and item reduction. Initial items were developed on the basis of a review of the existing literature, expert opinion and qualitative interviews with patients, and the preliminary measure was rigorously pre-tested. Item reduction (based on general applicability of the items, prevalence of the activity and social desirability) resulted in the final 10-item measure. Missing data are reported to be low. The VCM1 has acceptable internal consistency (alpha = 0.93; item-total correlations 0.65-0.77) and high test-retest reliability (ICC = 0.89) 32. Content validity has been confirmed by correlations between the final items and the original items in the item pool (most correlations >0.6). Construct validity (within scale) has been shown in factor analysis (which confirms that there is one underlying factor, accounting for 68% of the variance). Construct validity (convergent) has also been shown in moderate to high correlations with other vision-related measures (r = -0.49 to -0.80) and moderate correlations with generic quality of life measures (r = -0.34 to -0.53).32 Other types of validity and responsiveness have not been evaluated.

2.4 Visual Disability Assessment (VDA)

The Visual Disability Assessment (VDA) 33 consists of 18 items which describe visual disability. It was developed in Australia for use with cataract surgery patients. The VDA is a self-reported measure that is interviewer-administered and is reported to take 3-9 minutes to complete.33 Items are reported on 4-point Likert-type scales and the time frame is the present. Items are summed to produce an overall total score and 3 sub-scores (mobility visual disability, distance/lighting/reading visual disability, near and related tasks visual disability). For all scores, a higher score indicates greater impact.
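Internal consistency is summarised throughout this chapter by Cronbach's alpha and item-total correlations (for example, alpha = 0.93 for the VCM1). The sketch below shows how these two statistics are conventionally computed from a respondents-by-items matrix of complete responses. The function names and the toy data are illustrative assumptions and are not taken from any of the studies reviewed.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data (one row per respondent)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(rows: Sequence[Sequence[float]], j: int) -> float:
    """Correlation of item j with the sum of the remaining items."""
    item = [r[j] for r in rows]
    rest = [sum(r) - r[j] for r in rows]
    return pearson(item, rest)


if __name__ == "__main__":
    # Hypothetical responses: 4 respondents, 3 items.
    data: List[List[int]] = [[2, 3, 3], [4, 3, 4], [3, 5, 4], [5, 5, 6]]
    print(round(cronbach_alpha(data), 3))
    print([round(corrected_item_total(data, j), 3) for j in range(3)])
```

The corrected form of the item-total correlation (item against the sum of the other items) is used here so that an item is not correlated with itself; the reviewed papers do not always state which form they report.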


The content of the VDA was developed following a review of the literature, input from cataract patients and piloting. Item reduction (based on item relevance and factor analysis) resulted in the final 18-item measure. The VDA has acceptable internal consistency (alpha = 0.92-0.94) and high test-retest reliability (ICC = 0.96-0.98). Factor analysis suggests a 3-factor model, which supports the three sub-scores. The VDA shows some evidence of construct validity (correlation with ADVS, r = -0.53 to -0.84). Other types of validity and responsiveness have not been evaluated with the VDA.

2.5 Visual Symptoms and Quality of Life Questionnaire (VSQ)

The Visual Symptoms and Quality of Life Questionnaire (VSQ) 34 consists of 26 items describing visual symptoms and quality of life and was developed in the UK for use with cataract surgery patients. It is a self-reported measure that is reported to take less than 10 minutes to complete.34 Items are reported on Likert-type scales (4-7 points) and the time frame is the last month. The items are summed to generate two scores (visual symptoms/dysfunction and impact on vision-related quality of life). For both, higher scores indicate greater impact.

The content of the VSQ was developed using a review of the literature, consultation with experts and patient interviews. Item reduction was conducted on a preliminary version to achieve the final 26-item version of the measure. Missing data are reported to be low. The measure has acceptable internal consistency (alpha = 0.84-0.87) and test-retest reliability (r = 0.95-0.96).34 Construct validity (within scales) has been assessed using factor analysis,34 which supported a single underlying factor but did not support a 2-factor model. The VSQ has evidence of construct validity (convergent) (expected magnitude of correlations with SF-36 sub-scales, but mixed associations with visual acuity).
Limited responsiveness evidence is available at the item level (all except one item show significant improvement post-surgery).34
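Test-retest reliability is reported for several of the measures above as an intraclass correlation coefficient (ICC). The source papers summarised here do not always state which form of ICC was used, so the sketch below assumes one common choice: the two-way random effects, absolute agreement, single measurement form, often written ICC(2,1). The function name and the example scores are hypothetical.

```python
from typing import Sequence


def icc_absolute_agreement(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) from a subjects-by-occasions table of complete scores."""
    n = len(scores)        # subjects
    k = len(scores[0])     # occasions (2 for test-retest)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical test and retest scores for five patients.
    test_retest = [[62, 65], [80, 78], [45, 50], [90, 88], [70, 73]]
    print(round(icc_absolute_agreement(test_retest), 3))
```

Because this form measures absolute agreement, a systematic shift between test and retest lowers the coefficient even when patients keep the same rank order, whereas a Pearson r (as reported for some of the measures) would be unaffected by such a shift.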


Other types of validity have not been evaluated with the VSQ. A shorter version of the VSQ (14 items) 34 is also available, but has not been evaluated psychometrically.

2.6 Houston Vision Assessment (HVAT)

The Houston Vision Assessment (HVAT) 35 36 assesses health-related functional outcome and consists of 10 items describing vision-related activities. An eleventh item asks respondents to rate the degree of confidence they have in their answers. Items are reported on Likert-type scales (3-6 points). The HVAT is a self-reported measure that was developed in US English for use with cataract patients. It is reported to take less than 8 minutes to complete 35. The measure generates a total score and two sub-scores (visual disabilities and non-visual (physical) disabilities). Scores are transformed to 0-100, where a higher score indicates greater impairment.

The content of the HVAT was developed on the basis of qualitative interviews with cataract patients, and item reduction was undertaken (redundant items were discarded). The HVAT has acceptable internal consistency (alpha = 0.94-0.96; item-total correlations 0.54-0.84). 35 Construct validity (convergent) has been reported (moderate association with ophthalmic indicators at baseline).35 Known groups validity (HVAT differs with visual acuity groups) has also been reported.35 Limited responsiveness has been demonstrated for the visual impairment score but not for the physical impairment score (significant differences pre-post surgery).35 Other types of validity have not been evaluated with the HVAT.

2.7 Cataract TyPE specification

The Cataract TyPE specification 37 consists of 13 items that assess visual functioning and cover 5 domains (distance vision, near vision, daytime vision, night-time driving and glare). It is a self-reported measure developed in US English that can be either self-administered or interviewer-administered. Completion time is reported to be approximately 17 minutes. The Cataract TyPE specification has also been adapted for a UK population.38 The items are self-reported on a Likert-type intensity scale (5 points). The measure generates 5 domain scores (distance vision, near vision, daytime vision, night-time driving and glare), which are transformed to 0-100. An overall total score can also be calculated.

The content of the Cataract TyPE specification was developed after a literature review, consultation with experts and focus groups with patients. Missing data are reported to be low.39 The measure has high internal consistency (alpha = 0.91-0.94) 37 40 and test-retest reliability (r and kappa >0.7), though test-retest reliability is substantially worse when the respondent is helped by a different person to complete the retest questionnaire. 38 Acceptable construct validity (convergent) has been reported (high correlations with rating of vision, moderate correlations with visual acuity and rating of vision in the better eye, and moderate correlations with SF-36 PCS).37 40 Known groups validity has also been reported (participants receiving multifocal lens implants report less limitation in visual functioning).40 Limited responsiveness evidence is available (significant association between change in Cataract TyPE specification and change in visual acuity and rating of vision 37 and with independent interview ratings 38). Means and standard deviations are available to aid interpretation of the Cataract TyPE.37 Other types of validity have not been reported with the Cataract TyPE specification.

2.8 Quality of Vision Questionnaire

The Quality of Vision Questionnaire 41 is an adaptation of the VF-14 13 for use with pseudophakic patients. It was developed in English and consists of 17 items describing pseudophakic visual symptoms. It is a self-reported measure that can be administered by an interviewer if necessary. Each item is reported on a Likert-type response scale (4-5 points) and the time frame is the present. Items are summed to generate a single score, which is converted to a percentage. A higher score indicates better function.

The content of the Quality of Vision Questionnaire was developed through review of existing measures, consultation with experts and piloting with patients. Content validity was ensured by qualitative comparison with the VF-14. Two items are reported to have possible floor effects, but these data are not available at the scale level. The measure has acceptable internal consistency (alpha = 0.93; item-total correlations >0.3) and is reported to have acceptable test-retest reliability (Bland Altman method).41 Evidence of construct (convergent) validity has been reported (high correlation of glare items with photographic image of photic phenomena). Known groups validity is also reported (significant difference between patients with posterior capsular opacification (PCO) and those without PCO). Other types of validity and responsiveness have not been evaluated with the Quality of Vision Questionnaire.

2.9 Conclusions

Psychometric appraisal of the disease-specific PROMs for use in cataract surgery is presented in Table 2. While six of the measures have evidence of reasonable psychometric properties, the VF-14 stands out as significantly better than the others. This measure was therefore short-listed, based upon the psychometric evidence, and is discussed further in Chapter 8.


Table 2  Appraisal of psychometric and operational criteria of disease-specific PROMs for use in cataract surgery

Criterion                                          Cataract TyPE   QVQ*   VCM1   VSQ    VDA    HVAT
Reliability: Internal consistency                  ++              +++    +++    ++     ++     +++
Reliability: Test-retest reliability               ++              +++    +++    +++    +++    0
Validity: Content validity                         +++             +++    +++    +++    ++     +
Validity: Criterion-related validity               0               0      0      0      0      0
Validity: Construct validity: Within scale         ++              ++     +++    ++     ++     ++
  analyses
Validity: Construct validity: Convergent/          ++              ++     ++     ++     +      +
  Discriminant
Validity: Construct validity: Known groups         +               ++     0      0      0      ++
Validity: Other hypothesis testing                 0               0      0      0      0      0
Responsiveness                                     ++              0      0      +      0      +
Acceptability                                      ++              0      +      ++     0      0
Interpretability                                   +               0      0      0      0      +
Feasibility/burden                                 ++              +      +      ++     ++     +

VF-14 and ADVS: the assignment of ratings to criteria for these two measures cannot be recovered from the table layout. The 24 ratings given for the two measures are: +++, +, ++, ++, +, 0, ++, 0, +++, +, +++, 0, 0, ++, ++, ++, +++, 0, ++, ++, ++, 0, +++, +++.

* Quality of Vision Questionnaire

Psychometric criteria (shaded area) 0 = not reported or no evidence in favour; + limited evidence in favour; ++ some acceptable evidence in favour, but some aspects fail criteria or not reported; +++ acceptable evidence in favour.

Operational criteria (unshaded area) 0 = no evidence; + = some evidence

Chapter 3 Disease-specific PROMs in varicose vein surgery

Our review identified only two disease-specific measures for use in varicose vein surgery. (In addition, we identified two measures that had been developed/evaluated for use in venous-related disease but not in varicose vein surgery; these were excluded from the review.) The two identified measures are described below together with the relevant psychometric evidence. Where a measure has been used/evaluated for varicose vein surgery but was not necessarily originally developed for this purpose, we cite the original development paper but only review the surgery-specific evidence in detail.

3.1 Aberdeen Varicose Veins Questionnaire

The Aberdeen Varicose Veins Questionnaire 42 43 measures health-related quality of life for varicose vein patients. It consists of 13 items and is a self-reported measure developed for use with varicose vein patients; it has been validated with those undergoing surgery and is reported to take approximately 5 minutes to complete.43 Items are reported on Likert-type scales (2-4 points) and the time frame is the last 2 weeks. The measure generates a single overall score (higher is more severe).

The Aberdeen Varicose Veins Questionnaire was developed after a review of the clinical literature and consultation with clinical experts, and was rigorously pre-tested. It has acceptable evidence of internal consistency (Cronbach's alpha = 0.72-0.74) 42 43 and test-retest reliability (ICC = 0.79).44 Criterion-related validity has been shown in significant relationships with clinical CEAP scores.9 45 46 Evidence of construct validity (convergent) has been reported (moderate correlation with 4 sub-scales of SF-36 and self-rated symptoms/concerns).47 48 Known groups validity has also been reported.44


The measure has evidence of responsiveness (SRM = 0.66 44; significant improvements at 6 months and 2 years post-surgery 45 46).

3.2 Venous Insufficiency Epidemiological and Economic Study Quality of life/Symptoms Questionnaire (VEINES-QOL/Sym)

The VEINES-QOL/Sym 49 measures patient-reported outcome in chronic venous disorders of the leg. It consists of 26 items describing symptoms (10 items), limitations in daily activities (9 items), time of day of greatest impact (1 item) and psychological impact (5 items). It is a self-reported measure developed for use with patients who have venous insufficiency of the lower limb, but has been validated with samples that include patients undergoing surgery for varicose veins. It is reported to take less than 10 minutes to complete. Items are reported on Likert-type response scales (2-7 points) and the time frame is the last 4 weeks. The measure generates two summary scores (symptoms and quality of life) and high scores indicate better outcome.

The VEINES-QOL/Sym was developed after a literature review, a review of existing outcome measures and consultation with clinical experts, and has undergone pre-testing. The measure has low missing data, well distributed endorsement frequencies and low floor/ceiling effects. It also has high internal consistency (Cronbach's alpha 0.87-0.91 and item-total correlations >0.2) and test-retest reliability (ICC = 0.86-0.89). The measure has acceptable evidence of construct validity, both within scale (high internal consistency and inter-scale correlations) and convergent/discriminant (moderate correlation with SF-36 PCS, and higher correlation between SF-36 PCS and VEINES-QOL than between SF-36 MCS and VEINES-Sym) 49. It also has a significant relationship with clinical classification of severity 50. Known groups validity has been shown (patients with more severe disease have lower scores) 51. Evidence of responsiveness has been reported (significant differences between those clinically classified as improved and non-improved) 49.
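Responsiveness in this report is usually summarised as an effect size or a standardised response mean (SRM), such as the SRM of 0.66 quoted for the Aberdeen Varicose Veins Questionnaire. The sketch below uses the conventional definitions: the effect size divides the mean change by the standard deviation of baseline scores, and the SRM divides it by the standard deviation of the change scores. The function names, the sign convention (post minus pre) and the data are illustrative assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def effect_size(pre: Sequence[float], post: Sequence[float]) -> float:
    """Mean change divided by the SD of baseline (pre-operative) scores."""
    change = [b - a for a, b in zip(pre, post)]
    return mean(change) / stdev(pre)


def standardised_response_mean(pre: Sequence[float], post: Sequence[float]) -> float:
    """Mean change divided by the SD of the change scores."""
    change = [b - a for a, b in zip(pre, post)]
    return mean(change) / stdev(change)


if __name__ == "__main__":
    # Hypothetical paired scores; higher = better in this example.
    pre_op = [40, 55, 30, 62, 48]
    post_op = [58, 60, 52, 70, 66]
    print(round(effect_size(pre_op, post_op), 2))
    print(round(standardised_response_mean(pre_op, post_op), 2))
```

For measures on which a higher score indicates greater severity, improvement yields a negative value under this sign convention, so the direction of scoring needs to be checked before values from different instruments are compared.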
3.3 Conclusions


Psychometric appraisal of the two disease-specific PROMs for use in varicose vein surgery is presented in Table 3. Both measures meet all the criteria for reliability, validity and responsiveness (with the exception of validity based on testing other hypotheses). Although VEINES-QOL/Sym has stronger evidence of internal consistency, construct validity and known group differences, we did not feel these were sufficient in themselves to favour this instrument over the alternative. It was therefore decided to short-list both and subject them to consideration of operational criteria and the views of clinicians. The results are discussed in Chapter 8.


Table 3  Psychometric appraisal of disease-specific PROMs for use in varicose vein surgery

Criterion                                          Aberdeen Varicose Veins Questionnaire   VEINES-QOL/Sym
Reliability: Internal consistency                  ++                                      +++
Reliability: Test-retest reliability               +++                                     +++
Validity: Content validity                         +++                                     +++
Validity: Criterion-related validity               +++                                     +++
Validity: Construct validity Within-scale          ++                                      +++
  analyses
Validity: Construct validity Convergent/           ++                                      +++
  Discriminant
Validity: Known groups differences                 ++                                      +++
Validity: Other hypothesis testing                 0                                       0
Responsiveness                                     +++                                     +++
Acceptability                                      0                                       +++
Interpretability *                                 +                                       +
Feasibility/burden                                 ++                                      ++

Psychometric criteria (shaded area) 0 = not reported or no evidence in favour; + limited evidence in favour; ++ some acceptable evidence in favour, but some aspects fail criteria or not reported; +++ acceptable evidence in favour.

Operational criteria (unshaded area) 0 = no evidence; + = some evidence

Chapter 4 PROMs in hernia repair

Our review identified only one disease-specific measure for use in hernia repair. (We also identified 50 papers that have used visual analogue scales or single question assessments, but these were mainly restricted to an assessment of pain and therefore were not included.) The single identified measure is described below.

4.1 RCS (England) National Hernia Outcome Questionnaire

The RCS (England) National Hernia Outcome Questionnaire 52 includes 36 questions covering 5 domains (pain, surgical support, work, daily routines, leisure time). It is a self-reported measure developed for use in a national audit of groin hernia repair two weeks after surgery. The items use a variety of Likert-type scales (2-6 points). The time frame is the present. No information is available about scoring of the measure and no psychometric evaluation has been undertaken.

Much of the RCS questionnaire is in fact a modification of the SF-36. It was therefore decided, in the absence of other contenders, to review the use of the SF-36 in hernia repair. The SF-36 has acceptable internal consistency in hernia repair patients (alpha 0.90-0.92) 141. Construct validity (known groups differences) is supported by several studies that have evaluated the effectiveness of surgical techniques 142-148 and of anaesthesia types for hernia surgery 149, and have evaluated pain following hernia surgery 150.

Responsiveness has been documented in both laparoscopic and open hernia repair 151-155, though the magnitude and direction of reported effects vary between studies and with the timing of post-operative follow-up. The 8 dimension scores and the 2 summary scores (PCS and MCS) show a similar pattern of change 6 weeks after surgery (improvement after laparoscopic surgery and worsening after open surgery).


Moderate to large effects were found for bodily pain (improved after laparoscopic surgery), role physical (worsened after open surgery) and energy/vitality (improved after open surgery) 151. However, other studies 152 153 report large negative effect sizes for SF-36 at 1 week and 10 days after both laparoscopic and open surgery. There is also evidence that effects for physical dimensions (pain, role limitation - physical, and physical function) remain large at 6 months post surgery 155. Relative responsiveness has been shown against comparable dimensions of the Dartmouth COOP charts 154 and also the SF-12 151.

Published data are available to aid interpretability of the SF-36 in hernia patients. One study 149 has compared hernia SF-36 scores with population norms. Descriptive statistics (mean and standard deviation) for a large sample of UK hernia patients at baseline and 6 week follow-up are also available 154. Other forms of validity and test-retest reliability have not been evaluated in hernia repair patients.

4.2 Conclusions

Given the shortcomings of the only disease-specific instrument, it is proposed that the SF-36 be considered for assessing the outcome of hernia repair.


Chapter 5 Disease-specific PROMs in hip and knee replacement surgery

Our review identified 13 disease-specific measures for use in hip and/or knee replacement. (We also identified 35 papers that had only used visual analogue scales or single questions and 19 measures that had been developed/evaluated for use in joint-related disease or other joint surgery but not in hip or knee replacement. These were not included in the review.) The 13 identified measures are described below together with the relevant psychometric evidence. Measures that have been developed for use with both hip and knee replacement patients are described first, followed by measures developed only for use with hip replacement patients and then measures developed only for use with knee replacement patients. Where a measure has been used/evaluated for hip or knee replacement but was not necessarily originally developed for this purpose, we cite the original development paper but only review the surgery-specific evidence in detail.

5.1 Western Ontario and McMaster Osteoarthritis Index (WOMAC TM)

The Western Ontario and McMaster Osteoarthritis Index (WOMAC TM) 53 54 consists of 24 items that reflect three areas of disability of the hip or knee (physical function, pain and stiffness). The measure is self-reported and was originally developed in Canadian English for use with osteoarthritis patients, but has subsequently been evaluated with hip and knee replacement patients. It is reported to take less than 5 minutes to complete. WOMAC TM can be used with Likert-type (5-point), visual analogue or 11-point numerical rating response scales. Items are transformed (0-100) and then summed to generate 3 scores (physical function, pain and stiffness), where higher scores indicate better outcome.
A computerised version of WOMAC TM has been developed, but not evaluated with hip/knee replacement patients.54

WOMAC TM was developed on the basis of a literature review and qualitative interviews with a large number of patients (Bowling 1995). Missing data are reported to be low for WOMAC TM (<10%). 55 WOMAC TM is reported to show ceiling effects post-operatively for the pain subscale in knee replacement patients 56 and ceiling effects for pain and stiffness subscales in hip replacement patients, though ceiling effects were also seen for Oxford Hip Score, BP, RP, RE, MH and SF dimensions of the SF-36 and also the EQ 5D. 57 Telephone interviews are reported to significantly reduce the amount of missing data with hip replacement patients.58

High internal consistency has been reported for hip and knee replacement patients (alpha >0.8 and item-total correlations >0.53 for knee replacement patients and >0.47 for hip replacement patients). 59 55 60 Test-retest reliability was found to be acceptable amongst hip and knee replacement patients (ICC >0.7), except for the stiffness sub-scale (ICC 0.43).59 Construct validity (within scales) has been demonstrated with knee replacement patients 56 (mean inter-scale correlations = 0.71, compared with 0.50 for SF-36) and also with a combined sample of hip and knee replacement patients (inter-scale correlations 0.55-0.98).60

Construct validity (convergent) has been evaluated with WOMAC TM. Several studies have compared WOMAC TM with clinical measures, including clinician-rated function, stiffness and pain, 61 gait, 62 knee movement 63 64 and walking.64 Clinician-rated function and pain are highly associated with WOMAC TM function and pain scales respectively.61 Changes in gait distance and velocity are moderately associated with change in WOMAC TM, 62 though associations with change in gait symmetry are lower (<0.5). Knee movement has low to moderate association with WOMAC TM at 41 months post knee replacement surgery. 63 64 Walking is moderately associated with WOMAC TM (0.53-0.68) in men but not women after knee replacement.64 Clinician-rated knee scores (Knee Society Clinical Rating System) are moderately correlated with WOMAC TM both pre- and post-operatively (r = 0.33-0.68).65
Construct validity (convergent) has also been evaluated by comparison with the SF-36.56 57 66 67 68 WOMAC TM has significant moderate correlations with similar scales on the SF-36 56 66 69 70 though dissimilar scales also had correlations of similar magnitude. Moderate correlations both pre- and post-operatively were found between SF-36 PCS and MCS and WOMAC TM pain and functioning scales.65 Head to head comparisons with other disease-specific PROMs also indicate construct validity of WOMAC TM: in a sample of hip replacement patients, WOMAC TM has moderate to high correlations with the Oxford Hip Score.57

Construct validity (discriminant) is suggested by a lack of association between WOMAC TM and gender, 71 though other studies 72 have shown that non-Hispanic whites have significantly lower pre-operative WOMAC TM scores in a mixed sample of hip and knee replacement patients. No significant relationship has been found between waiting time and WOMAC TM 73 74 or between age and WOMAC TM (though post-operatively younger patients perform better) in a sample of hip replacement patients. 74

Some evidence suggests that the 3-scale structure of WOMAC TM may not be supported by factor analysis in a combined sample of hip and knee replacement patients. 75 60 This may reflect overlap between the stiffness WOMAC TM scale and the other two scales. However, factor analysis in more general (non-surgical) osteoarthritis samples has shown several conflicting models and has not provided a basis for modification of WOMAC TM.76-81 82 83 More conclusive evidence is therefore needed about the factor structure of the WOMAC TM amongst hip and knee replacement patients.

Known groups validity has been shown in significant differences between total knee arthroplasty patients and controls with no knee disability 64 and with controls who have osteoarthritis.71 Significant differences have been shown between knee replacement patients who have further surgery and those who have no further surgery (both WOMAC TM and SF-36 significant, but larger t-value for WOMAC TM), and between patients reporting satisfaction with knee replacement and those not satisfied (significant difference for both WOMAC TM and SF-36, but larger t-value for WOMAC TM). Significant differences are also reported for different levels of general health rating and for patients with upper extremity limitations versus no upper extremity limitation (larger t-values for SF-36). Significant post-operative differences have also been found between hip arthroplasty patients undergoing a post-surgery exercise programme and those undergoing routine in-hospital physical therapy. 84 WOMAC TM also discriminates better between patients with differing degrees of knee disability than the SF-36, though SF-36 discriminates levels of general health better than WOMAC TM. 85 The stiffness and pain scales of WOMAC TM also significantly discriminate between differing levels of co-morbidity, 55 but a similar effect was not found for the function scale of WOMAC TM nor for SF-36 RP, RE, SF and PF scales.

Further evidence of validity (hypothesis testing) comes from regression models used to predict WOMAC TM outcome 86 68 87. In general these have accounted for a small proportion of the variance (R² = 0.20-0.30) and have identified clinical variables as significant predictors of pre-operative WOMAC TM (BMI, pre-operative pain relief, type of prosthesis and post-operative complications).86 Post-operative WOMAC TM is predicted by BMI, pre-operative pain relief, type of prosthesis, post-operative complications, PT intensity and age, 86 walking ability 87 and pre-operative WOMAC TM scores, level of education and co-morbidity. 67 SF-36 scores have also been found to predict post-operative WOMAC TM. 88 Similar models have also been found to predict SF-36 67 87 and RAND SF-36. 86 Patient expectations have also been shown to predict WOMAC TM 68 in a sample of hip and knee replacement patients.

Responsiveness of WOMAC TM has been demonstrated for both hip and knee replacement patients. Significant differences have been found pre-post-surgery 73 89 and SRM values are reported to be small (0.16 to -0.18) 16 days post-surgery, 60 1.3 at 3 months and 1.4 at 12 months. 90 For both hip and knee replacement patients, most improvement is reported to be for the function sub-scale and between 0 and 3 months post-surgery (effect size 1.1-1.4).89 Effect sizes are reported to be 1.02 at 0-2 months post-surgery for knee replacement patients, 91 larger at 6 months post-surgery in a mixed sample of hip and knee replacement patients (effect size = 1.42-2.40; SRM 1.25-3.10) and also at 12 months (effect size = 1.42-2.34; SRM 1.17-2.74).92 Significant effects are reported to be still evident at 2-7 years post-surgery in a sample of hip replacement patients.93

Several studies have investigated relative responsiveness compared with SF-36 and clinical scales. In general WOMAC TM has high responsiveness relative to other measures for both hip and knee replacement patients 89 94 91 92 95 and compares well with SF-36 physical dimensions.89 95 96 65 In general WOMAC TM is more responsive than clinically rated scales. 91 92 94 96 65 Two studies 57 55 have directly compared responsiveness of WOMAC TM with another disease-specific patient-reported outcome measure. In a sample of hip replacement patients the OHS has larger effect sizes at 1 year post-surgery compared with WOMAC TM (effect sizes = 3.1 and 1.7 to 2.6 respectively). 57 For knee replacement patients, WOMAC TM has higher responsiveness than the Health Assessment Questionnaire (HAQ) at 6 months post-surgery and also SF-36 and EQ 5D (SRM = 0.63-1.06 for WOMAC TM; 0.33 for HAQ; 1.19-0.71 for SF-36 and 0.01-0.56 for EQ 5D). However, in a rheumatology control group who did not undergo surgery the SF-36 was more responsive.55

Means and standard deviations are available from several studies to aid interpretability of WOMAC TM in hip and knee replacement. 95 73 62 66 It has also been suggested that hip replacement patients are likely to show improvement after surgery of at least 10 points. 95 One study has applied Rasch analysis to WOMAC TM.77

A reduced (12 item) version of WOMAC TM has been developed on the basis of clinician selection 90 which includes 7 function items and 5 pain items. This reduced measure correlates 0.96 with the original WOMAC TM (criterion-related validity). It has also been evaluated for construct validity (moderate to high correlations with SF-36 PF, r = 0.52-0.72 for knee patients and 0.69 for hip patients, and high correlation with clinician-rated Harris Hip Score, r = 0.69). Responsiveness has been evaluated for both hip and knee replacement patients (SRM = 1.3 at 3 months and 1.6 at 12 months). However, reliability and other forms of validity have not been evaluated.

In addition we identified one paper that has used WOMAC TM in routine practice for physical therapy.97 One paper has developed an adaptation of WOMAC TM.

5.2 Lower Extremity Functional Scale (LEFS)

The Lower Extremity Functional Scale (LEFS) 98 99 consists of 20 items describing lower limb function. It is a self-reported measure that is reported to take 2 minutes to complete.
It was developed in English for use with lower extremity orthopaedic patients and has been additionally validated with patients receiving hip or knee arthroplasty.99 Each item is reported on a 5-point scale and the time frame is the present. The measure generates one overall score (0-80, items summed), where higher scores describe better function.
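The LEFS total described above (20 items on a 5-point scale, summed to a 0-80 score) implies that each item contributes 0-4 points. The sketch below makes that inference explicit. The 0-4 coding and the refusal to score incomplete questionnaires are assumptions for illustration, as the handling of missing items is not described here, and the function name is hypothetical.

```python
from typing import Sequence

N_ITEMS = 20          # number of LEFS items reported in the text
MAX_ITEM_SCORE = 4    # assumed 0-4 coding, inferred from the 0-80 range


def lefs_total(responses: Sequence[int]) -> int:
    """Sum 20 item responses (each 0-4) into a 0-80 total; higher = better function."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    for r in responses:
        if r not in range(MAX_ITEM_SCORE + 1):
            raise ValueError(f"response {r} outside 0-{MAX_ITEM_SCORE}")
    return sum(responses)


if __name__ == "__main__":
    print(lefs_total([3] * 20))   # a respondent answering 3 on every item
```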


The LEFS has high internal consistency (alpha = 0.93) and high test-retest reliability (ICC = 0.93). Construct validity (convergent) has been evaluated (correlations with clinical measures (timed up and go test, locomotion and stairs items from FIM, timed stair test, pain intensity scale, pain limitation scale and 6 minute walk test), r = 0.16-0.68).99 Construct validity has also been examined longitudinally using the same clinical measures (correlation of change scores at 4 weeks follow-up, r = 0.07-0.64). Other types of validity and responsiveness have not been evaluated with the LEFS.

5.3 The Arthritis Impact Measurement Scale (AIMS)

The Arthritis Impact Measurement Scale (AIMS) 100 consists of 45 items representing 9 domains (pain, mobility, physical activity, household activities, activities of daily living, anxiety, depression, social activity and dexterity). It is self-reported and was originally developed in English for use with osteoarthritis patients, but has subsequently been specifically evaluated with hip and knee replacement patients. Each item is reported on a 2-point scale. The measure generates 9 sub-scales (items transformed so that each scale is scored 0-10) and an overall score (0-90), where a higher score indicates greater disability.

Construct (convergent) validity of AIMS has been demonstrated in hip or knee replacement patients (low to high correlations with generic scales FSI, IWB and SIP, r = 0.21-0.87 and moderate to high correlations with disease-specific scales HAQ, r = 0.52-0.84 101).
Discriminant validity has been assessed in pre-operative scores in hip replacement patients in terms of gender, (males more mobile), pathology (osteoarthritis patients have more pain), living alone (lower ADL ability), education level (high school graduates tend to have lower mobility, household activity ability and ADL ability).102 Known groups validity has been shown in differences between pain management group and a control group post hip-replacement.103 AIMS has been shown to have comparable responsiveness with the Functional Status Index, HAQ, SIP and the Index of Well-Being in hip or knee replacement patients at 3, 12 and 15 months post-surgery 101 104 (AIMS SRM range 0.17-1.11). At 3 months 38

post-surgery, AIMS is relatively more responsive for mobility and less for social domains.101 AIMS has also been shown to be responsive in a sample of hip replacement patients (significant differences 14 weeks after total hip replacement for pain, effect size 0.91 and physical activity domains, effect size 0.88, but not for other domains).102 However, post-operative scores in hip replacement patients have been shown to be significantly influenced by gender (males more mobile), pathology (osteoarthritis patients have more post-op pain, better at household activities and ADL, less anxiety and less depression) and living alone (more post-op mobility).102 Mean and standard deviation data are available to aid interpretability.103 Reliability and other forms of validity have not been evaluated in either hip or knee replacement patients. A revised version of AIMS has been developed specifically for use after hip replacement surgery.105 This consists of 57 items that include two additional questions about previous surgery and a transition question that compared health after surgery with health before surgery, but do not include dexterity. The revised version of the instrument is scored using a revised weighted scoring system to derive 4 subscales (physiologic function, self concept, role function, interdependence). The content validity of the revised instrument was evaluated qualitatively by a group of nurses and pre-testing undertaken. Construct validity of the revised AIMS has been demonstrated by moderate to high inter-scale correlations.105 Responsiveness has also been shown descriptively (improvements at 12-24 months post-surgery).105 Reliability and other forms of validity have not been evaluated with the revised version of AIMS. 5.4 Health Assessment Questionnaire (HAQ) The Health Assessment Questionnaire (HAQ) 106 consists of 45 items that describe functional ability, pain and illness affect. 
It is a self-reported measure that was originally developed for arthritis, but has since been evaluated with total knee and hip arthroplasty patients 107 108 109. Functional ability items are reported on Likert-type response scales (4 points) and pain and affect items are rated using visual analogue scales. The measure generates 3 scores: functional ability (0-3), pain (0-3), affect (0-100). Higher scores indicate more impact.

Three studies have evaluated responsiveness of the HAQ with total hip or knee replacement patients 107 108 109. Two studies found significant differences at 6 weeks and 6 months after surgery 107 108 (effect sizes -0.16-1.53 at 6 weeks and 0.81.80 at 6 months) and these were found to be similar for AIMS and MPQ. One study 109 has suggested that maximum change can take a year or more to occur, but no statistical data are available. Mean and standard deviation data are available to aid interpretation of HAQ data 108. Reliability and validity have not been evaluated with the HAQ in hip or knee replacement patients.

5.5 Mayo Scale

The Mayo Scale 110 consists of 7 items describing pain, function, mobility and strength. It is a self-report measure, developed in US English for use with hip and knee replacement patients, which was very widely used from 1969-1994. No information is available about the response scales or the time frame. The measure generates two scores: one is a self-reported equivalent to the clinically rated Hospital for Special Surgery (HSS) Knee Score (range 0-80) 111 and the other is a self-reported equivalent to the clinically rated Mayo Clinical Hip Score (range 0-80) 112. For both scores, higher indicates better outcome.

Construct validity (convergent) has been evaluated (high correlation between the self-reported version and the HSS and Mayo Clinical Hip Score, r = 0.69-0.80), though physician ratings tended to be significantly higher for total knee replacement patients 110.

Reliability, other types of validity and responsiveness have not been reported for this measure.

5.6 Vigour Assessment Instrument

The Vigour Assessment Instrument 113 measures post-operative vigour in total joint arthroplasty patients, including hip and knee replacement. Two different versions have been developed, one to assess pre-operative characteristics (12 items describing 3 domains: well-being, readiness/ability to do activities, functional difficulty) and the other to assess post-operative characteristics (14 items describing 3 domains: well-being, readiness to resume activities and functional ability). Each version is reported to take 5 minutes to complete and was developed in US English. Each item is reported on a Likert-type scale (5-6 points). No information is available about scoring.

The Vigour Assessment Instrument was developed on the basis of a review of existing instruments (SF-36, WOMAC™) and consultation with an expert panel. An initial 23-item instrument was pre-tested with a small group of patients and item reduction undertaken (based on endorsement rates, item clarity, factor analysis and preliminary psychometric evaluation). No evaluation has been undertaken of the final item-reduced version of the Vigour Assessment Instrument.

5.7 Oxford Hip Score

The Oxford Hip Score (OHS) 114 115 116 consists of 12 items describing symptoms related to hip replacement. It is a self-administered measure that was developed in English for use with total hip replacement patients. Each item is rated on a Likert-type scale (5 points) and the time frame is the last 4 weeks. The measure generates a single overall score ranging from 12 to 60 (summed items) where a higher score indicates more difficulty.

The OHS was developed on the basis of an extensive review of the existing literature and qualitative interviews with patients. Pre-testing with patients reduced the initial questionnaire to 12 items 115 116. However, some concerns have been raised about the clarity and content validity of the OHS 117. One study has reported ceiling effects at 1 year post-surgery 57.

The measure has acceptable internal consistency (alpha = 0.84-0.92; item-total correlations >0.25) and test-retest reliability (Bland Altman coefficient 7.27) 115.
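Internal consistency figures such as the alpha values quoted above are usually Cronbach's alpha. The sketch below shows the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), using only the Python standard library. The function name and the small data set are illustrative assumptions; they are not data from any study cited in this review.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of item scores."""
    n_respondents = len(scores)
    if n_respondents < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in scores):
        raise ValueError("all respondents must answer the same number of items")

    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical responses: 4 respondents, 3 items on a 1-5 scale.
    example = [[1, 2, 1], [3, 3, 2], [4, 5, 4], [5, 5, 5]]
    print(round(cronbach_alpha(example), 3))
```

Alpha approaches 1 when items move together across respondents and falls towards 0 when they vary independently.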


Construct validity (convergent) has been evaluated (low to moderate correlations with Charnley (r = -0.15 to -0.58), SF-36 (r = -0.19 to -0.71) and AIMS (r = 0.03 to 0.66) 115, and high correlations with EQ5D (r = -0.67 preoperatively and -0.77 postoperatively) 118). Moderate correlations have also been reported between OHS and clinical measures of postural stability, strength and range of movement (r = 0.56-0.58) 119. One study 57 has directly compared OHS with WOMAC™ and found high association (r = 0.63-0.88). Construct validity (discriminant) has been assessed 118 and no differences were found with respect to age or gender. Known groups validity has been demonstrated (overall significant differences in OHS between patients reporting different degrees of satisfaction) 118, though this effect was also found for the EQ5D.

Acceptable responsiveness has been reported (effect size = 2.75 at 6 months; SRM = 2.0 114 115 120; effect size = 2.5 at 3 months and effect size = 3.05-3.1 at 12 months 116 57). The OHS is reported to be generally more responsive than AIMS or SF-36 at 6 months post-surgery 114 115 120 and than the EQ5D at 1 year after surgery, particularly for revision hip replacements 118. Mean and standard deviation data are available to aid interpretation of the OHS 114. Alternative scoring has been explored using Rasch analysis 121 122 123 and some gains in precision and discrimination have been reported.

5.8 Hip Rating Scale

The Hip Rating Scale 124 consists of 14 items which describe hip function. It is a self-reported measure and was developed in US English for use with patients with arthritis of the hip, though it has been validated with hip replacement patients. The measure generates an overall score (16-100) and 4 sub-scales (ability to walk, ability to perform tasks, pain, and overall impact of arthritis). For all scores, higher scores indicate better function. Items are reported on Likert-type scales (3-5 points) for pain, walking and function domains and a visual analogue scale for the global domain.


The Hip Rating Scale has high test-retest reliability (kappa for the total score = 0.70) 124.

Construct validity (convergent) has been reported (moderate to high correlations with AIMS overall score (r = -0.69), between Hip Rating Scale pain score and AIMS pain score (r = -0.71), and between Hip Rating Scale walking score and clinical walking test (r = 0.60)) 124. Responsiveness (3, 6 and 12 months post-surgery) is suggested but no statistics are available 124. Mean and standard deviation data are available to aid interpretation of the Hip Rating Scale.

Internal consistency, other types of validity and responsiveness have not been evaluated for the Hip Rating Scale.

5.9 Harris Hip Score

The Harris Hip Score 125, 126 was originally developed as a clinician-reported measure, but has been modified for self-report 127. The self-report version consists of 7 items relating to pain, support for walking, limping, walking distance, climbing stairs, putting on shoes and socks, and sitting. It was developed in English for use with total hip replacement patients. Items are reported on Likert-type scales (3-7 points) and the time frame is the present. The measure generates a single overall score (summed items), transformed to 0-100, where a higher score indicates greater difficulty.

Criterion-related validity has been reported (high correlations with original clinical Harris Hip Score, r = 0.99). Construct validity (convergent) has also been evaluated (correlation with WOMAC™, r = 0.90-0.96; SF-36, r = 0.71-0.97) 127.

Reliability, other types of validity and responsiveness have not been reported with the self-reported version of the Harris Hip Score.

5.10 Total Hip Arthroplasty Outcome Evaluation Questionnaire

The Total Hip Arthroplasty Outcome Evaluation Questionnaire has separate versions for baseline (15 items administered pre-operatively), history (26 items administered both pre- and post-operatively) and post-operative (13 items administered after surgery). It is self-reported and was developed in US English for use with total hip replacement patients. The system describes 4 domains (pain, work, activities of daily living and satisfaction). All items are reported on Likert-type scales (3-8 points).

Test-retest reliability has been reported at the item level (satisfaction correlations = 0.41-1.0; pain = 0.72-0.97; work = 0.71-1.0; activities of daily living = 0.58-1.0) 128, but no evidence is available at the scale level. Construct validity (within scale) has been reported (selected within scale correlations >0.45). Construct validity (convergent) has also been assessed by correlations between individual items and SIP (r = 0.32-0.56) 128.

Test-retest reliability at the scale level, internal consistency, other forms of validity and responsiveness have not been evaluated for the Total Hip Arthroplasty Outcome Evaluation Questionnaire.

5.11 Patient-Specific Index

The Patient-Specific Index (PSI) 129, 130 consists of 22 (interviewer-administered version) or 24 (self-administered version) items that describe concerns associated with hip replacement. Each item is rated for both severity and importance using a Likert-type scale (7 points). There is also an opportunity for respondents to nominate extra person-specific items. It was originally designed to be interviewer administered 129, but a self-administered version is also available 130. The measure was developed in English for use with total hip arthroplasty patients and is reported to take approximately 16 minutes to complete 130. The measure generates a single overall score (transformed to 0-100) where higher scores indicate better outcome.


The PSI has acceptable test-retest reliability (ICC interviewer-administered version = 0.73-0.96; self-administered version = 0.79) 129 130. Alternate forms reliability has also been assessed (no significant differences between self- and interviewer-administered versions; agreement between the two versions = 0.78) 130.

Construct validity (convergent) has been demonstrated in moderate correlations with Harris Hip Score (r = 0.40-0.50), WOMAC™ disability scale (r = 0.56-0.72), WOMAC™ pain score (r = 0.43-0.55), SF-36 PF (r = 0.50-0.67), SF-36 BP (r = 0.40-0.48) and MACTAR (r = 0.61-0.63) 129.

Responsiveness has been reported (SRM = 1.0-1.8) and the PSI has been shown to have similar responsiveness to the Harris Hip Score, WOMAC™ and the SF-36 (PF and BP dimensions) 129.

Internal consistency and other types of validity have not been evaluated for the PSI.

5.12 Oxford Knee Score

The Oxford Knee Score (OKS) 131 consists of 12 items describing experience and problems associated with knee replacement. It is a self-administered measure that was developed in English for use with total knee replacement patients. Each item is rated on a Likert-type scale (5 points) and the time frame is the last 4 weeks. Items are summed to create a single overall score (12-60) where a higher score indicates greater difficulty.

The OKS 131 was developed on the basis of qualitative interviews and extensive pre-testing with patients. Items were reduced (from 20 to 12) on the basis of patient pre-testing. Rates of missing data are reported to be low 131.

The OKS has high internal consistency (alpha = 0.87 pre-surgery and 0.97 post-surgery, and item-total correlations >0.45) and high test-retest reliability (r = 0.92; item kappa = 0.26-0.68 132) 131.

Construct validity (convergent) has been demonstrated in moderate to high correlations with clinical ratings (AKS) and other self-reported outcome measures (SF-36 and HAQ) (correlations -0.47 to -0.78) 131.

Known groups validity has been shown in significantly lower scores for a sample of hip or lumbar spine patients who deny knee problems compared with a sample of knee replacement patients 133. No differences were found between patients receiving unicompartmental knee replacement and total knee replacements 134.

Responsiveness has been evaluated and large effect sizes demonstrated at 6 months post-surgery (effect size = 2.19), compared with smaller effect sizes for SF-36 (effect sizes = 0.03 to -1.40) 131. Mean values are available to aid interpretation of the OKS 131.

5.13 McKnee System

The McKnee System 135 is an adaptation of the Health Utilities Index (HUI) 136 and consists of 8 dimensions (vision, hearing, speech, mobility, dexterity, emotion, cognition and pain). The measure was developed in English, is interviewer administered and was designed for use with knee replacement patients. The time frame is the last week. Three clinical marker states have been developed which describe mild, moderate and severe manifestations of the 8 dimensions. The respondent is asked to rate their own health state on each of the dimensions. The three clinical marker states and the respondent's own health state are rated using a visual analogue scale rating (thermometer) and also the standard gamble technique. The measure generates a utility score for the respondent's health state according to a specified algorithm.

The McKnee System was based on an extensive development phase, including a review of existing instruments (HUI2 and HUI3), consultations with clinicians involved in the care of total knee replacement patients and pre-testing with patients 135. The measure is reported to have relatively low test-retest reliability for the clinical marker states (ICC = 0.001 to 0.57, though no significant differences were found between the means). It is suggested that this may be high enough to justify use of the clinical marker states as reference points for group use, but that individual scores would be difficult to interpret 135.


Responsiveness has been evaluated at 3 months after surgery relative to the SF-36 135. McKnee System self health utility scores did not show change after surgery, whereas significant change was observed for the BP dimension and the single health transition question of the SF-36.

Reliability of self health states and other types of validity have not been evaluated for the McKnee System.

5.14 Conclusions

Psychometric appraisal of the disease-specific PROMs for use in hip and/or knee replacement is presented in Table 4. The WOMAC™ and OHS/OKS stand out as having the strongest evidence in all three psychometric components. As there is little difference between them, both WOMAC™ and OHS/OKS were short-listed and are discussed further in Chapter 8.


Table 4 Appraisal of psychometric and operational criteria of disease-specific PROMs for use in hip and/or knee replacement

Instruments appraised: WOMAC™, OHS, OKS, LEFS, PSI, AIMS, THOEQ*, McKnee, Harris, HRS*, Mayo, HAQ, VAI*

Psychometric criteria (shaded area): Reliability: Internal consistency; Reliability: Test-retest reliability; Validity: Content validity; Validity: Criterion-related validity; Validity: Construct validity: Within scale analyses; Validity: Analyses against external criteria: Construct validity: Convergent/Discriminant; Validity: Construct validity: Known groups; Validity: Other hypothesis testing; Responsiveness. Ratings: 0 = not reported or no evidence in favour; + = limited evidence in favour; ++ = some acceptable evidence in favour, but some aspects fail criteria or not reported; +++ = acceptable evidence in favour.

Operational criteria (unshaded area): Interpretability; Acceptability; Feasibility/burden. Ratings: 0 = no evidence; + = some evidence

* THOEQ = Total Hip Arthroplasty Outcome Evaluation Questionnaire; HRS = Hip Rating Scale; VAI = Vigour Assessment Instrument

[Table 4 body: individual ratings by instrument and criterion not recoverable from the rotated page layout]

Chapter 6 Generic PROMs for utility assessment

PROMs that are developed for use in economic evaluation are usually developed within a different measurement paradigm from the classical psychometric criteria (described in Chapter 1) that we have used here. The standard psychometric criteria have some similarities with the criteria for judging whether or not a particular measure is suitable for economic evaluation, but there are also important differences in the way that the two types of measures are evaluated. Firstly, measures for use in economic evaluation should reflect individual or societal preferences. Secondly, it is necessary for the economic measure to have cardinal properties (specifically interval properties), as most economic evaluation relates changes in the outcome measure to changes in resource use. Finally, the classical psychometric approach sets minimum standards that any candidate measure must achieve, but these do not exist for measures used in economic evaluation. For use in economic evaluation a measure must simply be better on balance than any alternative. There may therefore be a trade-off between psychometrically and economically derived measures. Measures that have strong psychometric properties may not necessarily have the best economic measurement properties and vice versa.

Our review identified three generic PROMs that can be used for utility assessment and that have been evaluated and/or used in at least one of the five areas of surgery of interest. Each generic measure is described separately for each of the five areas of surgery for which it has been evaluated/used. For each measure, we describe the instrument, summarise the existing psychometric evidence from the general population and review the specific evidence available for each of the five areas of surgery.

6.1 SF-36/SF-6D

The SF-36 137 is a generic measure of health status. It consists of 36 items that represent 8 domains (physical functioning (PF), role-physical (RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), role-emotional (RE), mental health (MH)). There is also a single health transition question that does not contribute to any of the domains. It is a self-reported measure that can be either self- or interviewer administered and was developed in US English. A UK English version is also available 47, 138, 139. Each item is self-reported on a Likert-type scale (2-6 points) and the time frame is the last 4 weeks (an acute version is also available with a time frame of 1 week). The measure generates 8 dimension scores (PF, RP, BP, GH, VT, SF, RE, MH) and also two summary scores (physical component score (PCS) and mental component score (MCS)). For all scores a higher score indicates a better outcome.

The SF-36 was developed on the basis of an extensive literature review and development phase 140 and the 8 dimensions were reduced from an original pool of over 40 health concepts.

Internal consistency has been evaluated in many types of patients in addition to the general population (alpha = 0.62-0.96) and test-retest reliability has also been evaluated (ICC = 0.43-0.90; studies described in Ware et al 1993 137).

Numerous studies have evaluated content, criterion-related, construct (convergent and discriminant) and known groups validity of the SF-36 in a variety of samples 137. Construct validity (within scales) has been demonstrated in factor analyses which support a two-scale solution. Construct validity (convergent) has been evaluated in comparisons with a number of other health-related measures (summarised in 137), including SIP and NHP (correlations between similar scales range from 0.51-0.85). The SF-36 dimensions have been compared with a range of other measures including ability to work, symptoms, utilization of care and various mental health criteria. Associations are generally significant and consistent 3.

Responsiveness has been evaluated in a variety of samples and the SF-36 found to be sensitive to change. Large scale normative data are available for both US and UK samples to aid interpretation of SF-36 scores.

A shorter form of the SF-36 has been developed consisting of 12 items (SF-12), which has also been extensively validated. The SF-12 covers the same 8 dimensions as the SF-36 and also generates two summary scores (PCS and MCS). For utility assessment, the SF-6D has been developed but, as yet, there is little information available as to its psychometric properties.

SF-36 in cataract surgery

There is limited evidence of construct validity (convergent) (significant correlations between GH and PF and utilities for visual health obtained via standard gamble, r = 0.26-0.30), but these were lower than those obtained with the disease-specific VF-14. Small significant correlations were also demonstrated between RE and VE and a verbal rating of visual health (r = 0.21-0.28) 14. Reliability, other types of validity and responsiveness have not been evaluated in cataract surgery patients.

SF-36 in varicose vein surgery

The SF-36 has acceptable internal consistency (alpha = 0.74-0.87) 48 and generally acceptable test-retest reliability (ICC >0.7 for all dimensions except RE) 44.

Construct validity (known groups differences) has been evaluated. SF-36 PCS decreases with increasing severity, but there is no similar effect for MCS. PCS scores differ with categories of CEAP classification, but there is no similar effect for MCS and neither PCS nor MCS is predicted by CEAP scores 50. There is no difference on PCS or MCS for varicose vein versus no varicose vein groups 51. No dimensions showed differences between referred/not referred patients, only two dimensions (PF and RE) showed significant differences with severity, and no dimensions showed differences between groups with complications/no complications 44.

Several studies have evaluated responsiveness of the SF-36 in varicose vein surgery. All 8 dimensions of the SF-36 show significant change 6 weeks after varicose vein surgery 48. All dimensions except GH also show significant change based on transition question categories over 1 year 44. However, other studies report limited evidence of responsiveness 49. MCS shows significant change in relation to change in varicose vein related pain and PCS shows significant change in relation to change in CEAP classification, but neither score showed change in relation to change in clinically measured varicose veins or edema.

Varicose vein surgery data on the SF-36 have been compared with large scale UK population norms 47.


Other forms of validity have not been evaluated with surgical varicose vein patients.

SF-36 in hernia repair (see Chapter 4).

SF-36 in hip and/or knee replacement surgery

The SF-36 has acceptable internal consistency in knee replacement patients (alpha = 0.77-0.90), but has not been evaluated for hip replacement patients 156.

The SF-36 has documented floor and ceiling effects with both hip and knee replacement samples 157, 158. For hip replacements, baseline floor effects have been found for RP, RE (marginal) and ceiling effects for GH (marginal), SF (marginal), RE, RP; at 3 months floor effects for RP and ceiling effects for RP, BP (marginal), GH, SF, RE; and at 6 months floor effects for RP, RE (marginal) and ceiling effects for RP, BP, GH, SF, RE. For knee replacements, baseline floor effects have been found for RP, RE and ceiling effects for SF, RE, MH; at 3 months floor effects for RP, RE and ceiling effects for RP, BP, GH, SF, RE, MH; and at 6 months floor effects for MH and ceiling effects for RP, BP, SF, RE, MH. Baseline floor effects for RP, REM, ceiling effects for RP, REM and SF.

The SF-36 has been shown to have high test-retest reliability amongst hip replacement patients 127 (r = 0.71-0.97).

Construct validity (convergent/discriminant) has been evaluated in hip replacement patients by agreement with other generic measures (agreement with HUI scores = 0.47) 159. Construct validity (discriminant) has been evaluated in terms of association with age (no differences), gender (women take longer to recover) and living alone (no differences in people who live alone compared with those who do not live alone) 160.

Several studies have evaluated known groups validity 161 158 162 163 164 165. In a mixed sample of hip and knee replacement patients, significant differences were found between arthroplasty patients and controls 165 for SF-36 PF and RP dimensions. In hip replacement patients, differences have been shown between patients with acceptable physical performance at 3 months 161 and between different self-ratings of health for GH, BP, PF, SF and MH (magnitude and direction as expected) 158. In knee replacement patients, significant differences have been shown between patients with unicondylar knee prostheses and controls for SF-36 PF, RP, BP dimensions 163, and between patients with arthrodesis and arthroplasty for SF-36 PF dimension 164. Patients with worse pre-operative knee scores (clinically assessed) have been found to have significantly worse SF-36 PCS scores but not MCS scores 162. The SF-36 does not significantly differentiate between patients receiving surgery and those not receiving surgery, nor between acute, moderate and chronic patients 98 162.

A large number of studies have assessed responsiveness of the SF-36 in hip and knee replacement patients. Several studies have shown significant improvements for both hip and knee replacement samples: 3 months after surgery 166; 6 months after surgery 167, 168; 1, 2, 4, 8 and 12 weeks after surgery (for PCS component score) and 1 and 2 weeks after surgery (MCS component score) 160; 2 years after surgery 169 (though no difference was found for GH); and for selected individual questions at 5 days to 12 weeks after surgery 170. Magnitude of effects varies by dimension; the largest effects are generally found for the BP dimension and the smallest effects for the GH and MH dimensions.

For hip replacement patients PF and RP show worse scores at 1 month, but by 6 months all dimensions show improvement, though effects are still small for GH. Largest effects at 6 months are for BP (relative efficiency = 1) and these effects are sustained at 12 months 158. At 3 and 6 months post-surgery magnitude of effects varies from moderate to large depending on the dimension (effect size = 0.30 (MH) to 1.21 (BP)) 171 (SRM = 0.20 (GH) to 1.77 (BP)) 157. At 6 months effect sizes range between -0.07 (GH) and -2.13 (BP) 171 (SRM = 0.10 (GH) to 1.98 (BP)) 157.

In knee replacement patients, significant differences in PCS but not MCS component scores have been shown at 3 months post-surgery 172. Significant differences have also been shown for PF, BP in men and BP, VE, RE and MH in women 173. At 3 months magnitude varies with dimension, with lower SRM for VE (0.01) and higher for PF (-0.63) 157. Effect sizes at 3 months show a similar range 131. At 6 months, the lowest SRM is for GH (0.17) and the highest for PF (1.68) 157.

For total hip replacement patients mean PCS and MCS scores are available 174. For hip replacement patients mean and standard deviation data are available at baseline 158. Several studies have compared hip and/or knee replacement data with population norms and found knee replacement patients 173, 163 to have lower scores.
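The effect sizes and standardised response means (SRM) quoted above are both ratios of mean change to a standard deviation. The sketch below assumes the conventional definitions (effect size = mean change divided by the standard deviation of baseline scores; SRM = mean change divided by the standard deviation of the change scores). The cited studies may have used variants, and the function names and example data are illustrative only.

```python
from statistics import mean, stdev
from typing import List, Sequence


def _change_scores(pre: Sequence[float], post: Sequence[float]) -> List[float]:
    """Paired post-minus-pre differences, one per patient."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain the same patients")
    if len(pre) < 2:
        raise ValueError("need at least two patients")
    return [after - before for before, after in zip(pre, post)]


def effect_size(pre: Sequence[float], post: Sequence[float]) -> float:
    """Mean change divided by the SD of baseline (pre-operative) scores."""
    change = _change_scores(pre, post)
    return mean(change) / stdev(pre)


def standardised_response_mean(pre: Sequence[float], post: Sequence[float]) -> float:
    """Mean change divided by the SD of the change scores."""
    change = _change_scores(pre, post)
    return mean(change) / stdev(change)


if __name__ == "__main__":
    # Hypothetical scores for three patients before and after surgery.
    baseline = [10, 20, 30]
    follow_up = [20, 35, 50]
    print(effect_size(baseline, follow_up))                 # 15 / 10 = 1.5
    print(standardised_response_mean(baseline, follow_up))  # 15 / 5 = 3.0
```

Because the two statistics use different denominators, the same data can yield quite different values, which is one reason effect sizes and SRMs reported for the same instrument are not directly interchangeable.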


We identified one study that has used the SF-12 with hip and/or knee replacement patients 175. Results were descriptive and reported mean SF-12 PCS scores of 41.1 and mean SF-12 MCS scores of 28.7.

6.2 Euroqol (EQ5D)

The Euroqol (EQ5D) 176 is a generic scale that measures health status. The most recent version consists of 15 items representing 5 domains (mobility, self-care, usual activities, pain/discomfort, anxiety/depression). For each domain, respondents are asked to indicate which of the three items best describes them, resulting in 243 possible permutations of the responses. Items in the EQ5D are weighted by a utility function, either based on the respondent's own health preferences, obtained via a visual analogue rating of the respondent's current health state, or by using published population health preferences. A valuation task may also be included as part of the questionnaire. The EQ5D generates a single index score where a higher score indicates better health 3.

The EQ5D has acceptable test-retest reliability (r = 0.84) 177. Early evidence of construct validity was obtained by moderate to high correlations with HAQ (0.46-0.76) 178 and moderate correlations with HADS (0.44-0.51). Head to head comparisons have suggested that EQ5D may be more prone to ceiling effects than the SF-36 179. Large scale normative data from the UK are available to aid interpretation of EQ5D 179.

EQ5D in cataract surgery

We did not identify any papers that have used/evaluated the EQ5D with cataract surgery patients.

EQ5D in varicose vein surgery

We identified one paper that has used EQ5D to evaluate treatment (including surgery) for varicose veins. Construct validity was not supported (no difference between surgical and non-surgical treatment groups at 4 or 12 weeks). Limited responsiveness (significant improvement at 12 weeks post-surgery) is demonstrated 180.
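The figure of 243 possible EQ5D response patterns quoted in the description of the instrument follows directly from 5 domains with 3 response options each (3^5 = 243). A minimal sketch that enumerates them is given below; the level labels 1 to 3 are an assumed coding used only for illustration, and no utility weights are applied.

```python
from itertools import product
from typing import List, Tuple

# Domain names as listed in the text.
DOMAINS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)
LEVELS = (1, 2, 3)  # assumed coding of the three response options per domain


def all_health_states() -> List[Tuple[int, ...]]:
    """Every combination of one level per domain, in lexicographic order."""
    return list(product(LEVELS, repeat=len(DOMAINS)))


if __name__ == "__main__":
    states = all_health_states()
    print(len(states))  # 3 ** 5 = 243
    print(dict(zip(DOMAINS, states[0])))  # best level assumed on every domain
```

A utility index would map each of these states to a single number using a published or respondent-derived value set, which is outside the scope of this sketch.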


Reliability and other forms of validity have not been evaluated with surgical varicose vein patients.

EQ5D in hernia repair

We did not identify any papers that have used/evaluated the EQ5D with hernia patients.

EQ5D in hip and/or knee replacement surgery

There is limited evidence for 2-week test-retest reliability (Cohen’s Kappa = 0.29-0.61) 181. There is also some evidence for construct validity (convergent validity): ‘acceptable’ correlations have been reported with pain VAS, mobility and ADL 182, and the EQ5D is described as being able to discriminate between groups defined by pain scores, mobility and self care (known groups validity) 183. There is some evidence for responsiveness: effect sizes were 0.90 at 4 months post-surgery 182 and >0.86 at 1 year post hip replacement 118. Scores have been described as decreasing from 0.78 to 0.59 at 4 months and 0.51 at 17 months post-surgery 183. Other tests of reliability and validity have not been evaluated for the EQ5D with hip and/or knee replacement surgery patients.

6.3 Health Utilities Index (HUI)

The Health Utilities Index (HUI) is a generic measure of health status and health-related quality of life. In its most widely used current format (HUI3) it consists of 8 domains (vision, hearing, speech, ambulation, dexterity, emotion, cognition, pain). [Note: newer versions – HUI4 and HUI5 – have been developed but have not been widely used and there is little information about them on the HUI website]. Each domain has five or six levels (known as attributes). Respondents choose the level of each domain that describes them best. This generates a series of health states for each respondent to which utility scores (based on large population samples) are applied.

Test-retest reliability of the HUI3 has been evaluated 184 and found to be high (ICC = 0.77). Reliability of the scoring formula has also been investigated for HUI2 185 and found to be satisfactory.

Criterion-related validity (predictive) has been established by the ability of HUI to predict future health outcomes 186. Known groups validity has been demonstrated by significant differences between low birth weight and control children at age 8 187, 188.

Health Utilities Index (HUI) in cataract surgery

We did not identify any papers that have used/evaluated the HUI with cataract surgery patients.

Health Utilities Index (HUI) in varicose vein surgery

We did not identify any papers that have used/evaluated the HUI with varicose vein surgery patients.

Health Utilities Index (HUI) in hernia repair

We did not identify any papers that have used/evaluated the HUI with hernia repair patients.

Health Utilities Index (HUI) in hip and/or knee replacement surgery

Construct validity (convergent) of the HUI has been extensively evaluated in total hip arthroplasty patients 174. Eighty-seven a priori hypotheses were tested. In general, three quarters of these hypotheses were confirmed, indicating acceptable convergent validity. Construct validity (convergent) has also been evaluated by comparing HUI2 and HUI3 with SF-6D and standard gamble in a sample of patients referred for total hip arthroplasty 189. Agreement between pre-operative scores for HUI2 and SF-6D was moderate (0.47) and between HUI3 and SF-6D was low (0.28).

Responsiveness has been demonstrated with hip arthroplasty patients for some domains of HUI (effect sizes range from 0.00 (HUI3 vision, speech, dexterity, cognition) to -0.22 (HUI3 emotion) to -0.56 (HUI3 ambulation) to -1.3 (HUI3 pain)) 190. In general HUI was less responsive than WOMAC TM and slightly less responsive than the SF-36 190. Relative responsiveness (compared with SF-6D) has been evaluated 189 and similar effect sizes found for HUI2, HUI3 and SF-6D (effect sizes = 1.1, 1.08 and 1.06 respectively).

6.4 Conclusions


A recent review of the use of health status measures in economic evaluation 208 concluded that “the best preference-based measures at the moment would seem to be the EQ5D and the HUIs.” It is important to note that this review predates the study introducing the SF-6D 209 and that it might thus be appropriate to consider the SF-6D alongside the EQ5D and the HUI. The SF-6D is of potential importance because it is based on responses to 11 items from the SF36, and the SF36 is the only generic measure to have been used with respect to all five areas of surgery considered in this report. However, to date, the SF-6D has not been very widely used, and the sample used to derive valuations of the different health states is much smaller than that used for the EQ5D (though larger than that underlying the HUI). In addition to the study already mentioned in chapter 6 159 that directly compared HUI2, HUI3 and SF-6D, we also identified one additional study that had used SF-6D with a sample of leg ulcer patients 210. Although it compared EQ5D with SF-6D, this study was excluded from our main review as the sample did not undergo surgery for varicose veins. The authors concluded that further research is needed to investigate the similarities and differences between these types of measures.

The SF-36 is the only generic measure to have been used in all five areas of surgery. However, as the purpose of including a generic measure is to assess efficiency, we shortlisted the EQ-5D because it is the most widely used health state measure applicable for utility evaluation. It has not been used with cataract or hernia patients.
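As a check on the size of the EQ5D descriptive system described in section 6.2 (five domains, each answered at one of three levels), the following sketch enumerates every possible response pattern. The domain names are those listed in section 6.2; coding the levels as 1 to 3 and the function name are assumptions made for illustration only.

```python
from itertools import product

# Domain names as listed in section 6.2; the 1-3 level coding is an assumption.
DOMAINS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)
LEVELS = (1, 2, 3)


def all_health_states():
    """Return every combination of one level per domain."""
    return list(product(LEVELS, repeat=len(DOMAINS)))


states = all_health_states()
print(len(states))  # 3 ** 5 = 243
```

Each tuple is one health state, so a utility weight would need to be available for every one of these patterns before an index score could be assigned to all respondents.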


Chapter 7 Post-operative complications

Little work has formally evaluated patients’ ability to report on post-surgical complications. A recent review 203 notes the absence of patient-reported measures to assess surgical complications and argues that there is a need to formally assess the reliability of self-diagnosis of surgical wound infection by patients. We identified only one measure of surgical complications (wound infection), which is as yet unevaluated. Assessment of surgical complications therefore tends to be based on individual survey-type questions that have not undergone formal psychometric evaluation. Examples of these are described below.

7.1 The Cataract Outcome Study Questionnaire

It includes 8 questions describing readmission, use of hospital casualty departments, outpatient visits, GP consultations and consultations with eye specialists 204. It is administered four months after surgery.

7.2 The Royal College of Surgeons (England) National Hernia Outcomes Questionnaire

It includes 5 items describing surgical wound complications (bruising, groin problems, scrotum problems, oozing from the wound, antibiotics) 52. It was developed in UK English. Each item is reported on a 3-4 point Likert-type response scale and the questionnaire is administered two weeks after surgery.

7.3 The National Total Hip Replacement Questionnaire

This includes two questions describing hip dislocation and any other problems after surgery 205.

In addition, we identified two generic measures of post-operative complications:

7.4 The Patients’ Experiences of Surgery Questionnaire

It includes 5 questions describing pain, wound infection, allergy or reaction to drug, bleeding and other problems 196-198. Each is reported using a Likert-type scale (2-4 points). Individual questions are scored separately. This was developed for the Audit Commission and was used extensively in the 1990s with a 'basket' of common day-surgery procedures.

7.5 ASEPSIS Wound Infection Questionnaire

This is currently in use at University College London Hospitals 206. The patient-completed measure consists of 9 items describing post-surgical wound infection. It was developed to complement and provide additional information to the clinician-rated ASEPSIS measure 207. Each item is reported on a Likert-type scale (2 points) and refers to the time since leaving hospital. The measure generates a single score (higher is greater infection). Formal evaluations of validity, reliability and responsiveness have not been undertaken, but are planned 206.

7.6 Conclusions

Given the need for a generic measure for post-operative complications, the only option is the appropriate set of questions from the Patients' Experiences of Surgery Questionnaire.


Chapter 8 Recommendations

On the basis of their having the best psychometric properties, we short-listed the following:

• VF-14 (cataract surgery)
• VEINES-QOL/Sym and Aberdeen VV Questionnaire (varicose vein surgery)
• SF-36 (hernia repair)
• WOMAC TM, OHS, OKS (hip and/or knee replacement surgery)

In addition, the EQ5D was short-listed as the generic utility measure, and relevant questions from the Patients' Experiences of Surgery Questionnaire were short-listed to cover post-operative complications. In this chapter we discuss the advantages and disadvantages of using each of these measures. For each measure we consider: i) any gaps in the existing psychometric evidence for the relevant surgery; ii) assessment against the operational criteria; and iii) clinicians’ views.

8.1 Cataract surgery - VF-14

The VF-14 13 measures visual functional impairment and has acceptable evidence of reliability and responsiveness and several forms of validity evidence. It does not have evidence of discriminant validity or criterion-related validity, and these should be addressed in PROMs phase 2. However, it may be difficult to obtain evidence of criterion-related validity as there is no gold standard measure against which to compare the VF-14. As it measures function, the VF-14 does not include psychological or social domains. While this is not a problem for a measure of function, an assessment of psychosocial aspects of outcome may be important for cataract surgery. If these domains are considered important for cataract surgery outcome, then the VF-14 should be used in conjunction with a generic measure (e.g. SF-36) that includes these domains. There is a small amount of evidence that the VF-14 could be used for comparing health care providers 24. The measure can be obtained by contacting the developer.


Clinicians considered the VF-14 to be acceptable, though an additional assessment of glare was also suggested.

8.2 Varicose vein surgery - VEINES-QOL/Sym and Aberdeen VV questionnaire

Both instruments have acceptable evidence of reliability, validity and responsiveness in varicose vein patients. There is no difference in two of the operational criteria (Table 3) but, as regards the third, there is no explicit published evidence as to the acceptability of the Aberdeen instrument. However, anecdotal evidence suggests it is well accepted by patients. It also needs to be noted that the acceptability of VEINES-QOL/Sym is based on samples including a wide mix of cases. Its acceptability specifically for patients with varicose veins, a milder condition than some of the other venous diseases, is therefore uncertain.

Clinicians were not supportive of VEINES-QOL/Sym, partly because of lack of familiarity. In contrast, they were familiar with the Aberdeen instrument, which was developed in the UK, unlike the VEINES-QOL/Sym. They indicated that they would not find the VEINES-QOL/Sym an acceptable choice and would prefer the Aberdeen Varicose Vein Questionnaire. Given that the psychometric and operational characteristics of the two instruments do not differ substantially, we recommend the Aberdeen Questionnaire be used.

8.3 Hernia repair - SF-36

While the SF-36 has been used in several studies of hernia repair outcome, some psychometric properties have not been adequately tested. These include: test-retest reliability; criterion-related validity; construct validity (within scale analyses and convergent/discriminant). In addition, further information is needed on two operational criteria: acceptability and feasibility. Despite these shortcomings, clinicians supported the use of the SF-36 for this purpose (rather than an instrument designed specifically for hernia repair). We therefore recommend its use.

8.4 Hip and/or knee replacement - WOMAC TM, OHS, OKS


Psychometrically there is little to choose between WOMAC TM and the combination of OKS and OHS. WOMAC TM is widely used and has a high volume of associated literature, but is perhaps less widely used in the UK than the US. In contrast, the OHS and OKS were developed in the UK and are widely used in British orthopaedics.

WOMAC TM has acceptable reliability, validity and responsiveness in both hip and knee replacement patients. There is some question about the factor structure of the measure, though the acceptable internal consistency found for all three sub-scales supports their continued use. WOMAC TM has not been evaluated for criterion-related validity, though without an obvious gold standard measure, this may be difficult to evaluate. WOMAC TM measures disability of the hip/knee, but does not address psychological or social domains. If these are considered useful outcomes for hip and/or knee replacement then WOMAC TM will need to be supplemented with a generic measure (e.g. SF-36). There is no evidence that WOMAC TM has been used for comparing health care providers. The measure can be obtained by contacting the developers.

OHS and OKS both have acceptable reliability and responsiveness and most forms of validity. Discriminant validity has not been evaluated in OKS. Neither OHS nor OKS has been evaluated for criterion-related validity, though without an obvious gold standard this may not be possible. Like WOMAC TM, both OHS and OKS measure physical and functioning aspects of hip/knee outcome and neither addresses psychological or social domains. It will therefore be necessary to supplement them with a generic measure that covers these domains if they are considered important. There is no evidence that OHS/OKS has been used for comparing health care providers. The measures can be obtained from the developers.
Clinicians expressed a preference for OHS/OKS on the basis of a perceived reluctance of patients to complete WOMAC TM, increasing the possibility of missing data. We therefore recommend the OHS/OKS be used.

8.5 Generic measures - EQ-5D

The EQ-5D met all the operational criteria, which is not surprising given its brevity and clarity. An additional benefit of using it is its endorsement by the National Institute for Health and Clinical Excellence, which requires “that health states should be measured in patients using a generic and validated classification system for which reliable UK population preference values, elicited using a choice-based method such as the time trade-off or standard gamble (but not rating scale), are available.” 211

We are, however, aware of the current testing of the SF-6D and the belief that it will prove to be more responsive than the EQ5D. Given the considerable uncertainties as to the performance of the SF-6D and the need to derive it from the full length SF-36 (though this may be reduced to 16 questions in the future), we cannot recommend its routine use at present. We do, however, accept its potential benefits and suggest it be added in for one procedure group in Phase II of this work so as to be able to contribute to knowledge of its methodological properties.

8.6 Post-operative complications

In the absence of a well-developed measure of surgical complications, we suggest the use of a small number of individual patient-reported questions asking about complications, drawn from the Patients' Experiences of Surgery Questionnaire. Questions should cover complications relevant to the particular type of surgery, e.g. wound infection, urinary tract infection, chest infection, pain, pressure sores and deep vein thrombosis (DVT), and be reported on a simple yes/no scale. The opportunity to report 'other' complications should be included.
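The response format suggested in section 8.6 (yes/no items plus an opportunity to report 'other' complications) could be captured along the following lines. This is a minimal sketch only: the item names are the examples given in section 8.6, while the class, its methods and the idea of tracking unanswered items are hypothetical and are not part of any existing questionnaire.

```python
from dataclasses import dataclass, field

# Example complications named in section 8.6; the items actually asked would
# depend on the type of surgery. All identifiers here are hypothetical.
EXAMPLE_ITEMS = (
    "wound infection",
    "urinary tract infection",
    "chest infection",
    "pain",
    "pressure sores",
    "deep vein thrombosis",
)


@dataclass
class ComplicationResponse:
    answers: dict = field(default_factory=dict)  # item -> True (yes) / False (no)
    other: str = ""  # free-text 'other' complications

    def record(self, item, yes):
        """Store a yes/no answer for one of the listed items."""
        if item not in EXAMPLE_ITEMS:
            raise ValueError(f"unknown item: {item}")
        self.answers[item] = bool(yes)

    def reported(self):
        """Items answered 'yes', in questionnaire order."""
        return [i for i in EXAMPLE_ITEMS if self.answers.get(i)]

    def missing(self):
        """Items the patient left unanswered."""
        return [i for i in EXAMPLE_ITEMS if i not in self.answers]


response = ComplicationResponse()
response.record("pain", True)
response.record("wound infection", False)
response.other = "nausea"
print(response.reported(), response.missing(), response.other)
```

Keeping unanswered items distinct from 'no' answers matters here because missing data was one of the concerns clinicians raised about longer instruments.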


References

1. Fitzpatrick R, Davey C, Buxton M, Jones D. Evaluating patient-based outcome measures for use in clinical trials. Southampton: National Coordinating Centre for Health Technology Assessment, 1998:1-73.
2. McCormack HM, Horne DJ, Sheather S. Clinical applications of visual analogue scales: a critical review. Psychol Med 1988;18(4):1007-19.
3. McDowell I, Newell C. Measuring health: A guide to rating scales and questionnaires. New York: Oxford University Press, 1996.
4. Lamping D, Schroter S, Marquis P, Marrel A, Duprat-Lomon I, Sagnier P. The Community-Acquired Pneumonia Symptom Questionnaire. A new, patient-based outcome measure to evaluate symptoms in patients with community acquired pneumonia. Chest 2002;122:920-929.
5. Lamping D, Schroter S, Kurz X, Kahn S, Abenhaim L. Evaluating outcomes in chronic venous disorders of the leg: Development of a scientifically rigorous, patient-reported measure of symptoms and quality of life. Journal of Vascular Surgery 2003;37:410-419.
6. Scientific Advisory Committee of the Medical Outcomes Trust. Assessing health status and quality of life instruments: Attributes and review criteria. Quality of Life Research 2002;11:193-205.
7. McDowell I, Jenkinson C. Development standards for health measures. J Health Serv Res Policy 1996;1(4):238-246.
8. Bowling A. Measuring health: A review of quality of life measurement scales. Second ed. Buckingham: Open University Press, 1991.
9. Bowling A. Measuring disease. Buckingham: Open University Press, 1995.
10. Bowling A. Measuring disease. Second ed. Buckingham: Open University Press, 2001.
11. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika 1951;16:297-334.
12. Streiner DL, Norman GR. Health measurement scales: A practical guide to their development and use. Second ed. Oxford: Oxford University Press, 1995.
13. Steinberg EP, Tielsch JM, Schein OD, Javitt JC, Sharkey P, Cassard SD, et al. The VF-14: An index of functional impairment in patients with cataract. Arch Ophthalmol 1994;112(5):630-638.
14. Lee JE, Fos PJ, Zuniga MA, Kastl PR, Sung JH. Assessing health-related quality of life in cataract patients: the relationship between utility and health-related quality of life measurement. Qual Life Res 2000;9(10):1127-35.
15. Desai P, Reidy A, Minassian DC, Vafidis G, Bolger J. Gains from cataract surgery: visual function and quality of life. Br J Ophthalmol 1996;80(10):868-73.
16. Cassard SD, Patrick DL, Damiano AM, Legro MW, Tielsch JM, Diener-West M, et al. Reproducibility and responsiveness of the VF-14. An index of functional impairment in patients with cataracts. Arch Ophthalmol 1995;113(12):1508-13.


17. Alonso J, Espallargues M, Andersen TF, Cassard SD, Dunn E, Bernth Petersen P, et al. International applicability of the VF-14: An index of visual function in patients with cataracts. Ophthalmology 1997;104(5):799-807.
18. Schein OD, Steinberg EP, Cassard SD, Tielsch JM, Javitt JC, Sommer A. Predictors of outcome in patients who underwent cataract surgery. Ophthalmology 1995;102(5):817-23.
19. Friedman DS, Tielsch JM, Vitale S, Bass EB, Schein OD, Steinberg EP. VF-14 item specific responses in patients undergoing first eye cataract surgery: can the length of the VF-14 be reduced? Br J Ophthalmol 2002;86(8):885-91.
20. Pager CK. Assessment of visual satisfaction and function after cataract surgery. J Cataract Refract Surg 2004;30(12):2510-2516.
21. Mallinson T, Stelmack J, Velozo C. A comparison of the separation ratio and coefficient alpha in the creation of minimum item sets. Med Care 2004;42(1 Suppl):I17-24.
22. Velozo CA, Lai JS, Mallinson T, Hauselman E. Maintaining instrument quality while reducing items: application of Rasch analysis to a self-report of visual function. J Outcome Meas 2000;4(3):667-80.
23. Brydon KW, Tokarewicz AC, Nichols BD. Amo array multifocal lens versus monofocal correction in cataract surgery. J Cataract Refract Surg 2000;26(1):96-100.
24. Norregaard JC, Bernth Petersen P, Alonso J, Andersen TF, Anderson GF. Visual functional outcomes of cataract surgery in the United States, Canada, Denmark, and Spain: report of the International Cataract Surgery Outcomes Study. J Cataract Refract Surg 2003;29(11):2135-42.
25. Mangione CM, Phillips RS, Seddon JM, Lawrence MG, Cook EF, Dailey R, et al. Development of the 'Activities of Daily Vision Scale'. A measure of visual functional status. Med Care 1992;30(12):1111-26.
26. Elliott DB, Patla A, Bullimore MA. Improvements in clinical and functional vision and perceived visual disability after first and second eye cataract surgery. Br J Ophthalmol 1997;81(10):889-95.
27. Pesudovs K, Garamendi E, Keeves JP, Elliott DB. The Activities of Daily Vision Scale for cataract surgery outcomes: re-evaluating validity with Rasch analysis. Invest Ophthalmol Vis Sci 2003;44(7):2892-9.
28. Mangione CM, Orav EJ, Lawrence MG, Phillips RS, Seddon JM, Goldman L. Prediction of visual function after cataract surgery. A prospectively validated model. Arch Ophthalmol 1995;113(10):1305-11.
29. Superstein R, Boyaner D, Overbury O. Functional complaints, visual acuity, spatial contrast sensitivity, and glare disability in preoperative and postoperative cataract patients. J Cataract Refract Surg 1999;25(4):575-81.
30. McGwin G, Jr., Scilley K, Brown J, Owsley C. Impact of cataract surgery on self-reported visual difficulties: comparison with a no-surgery reference group. J Cataract Refract Surg 2003;29(5):941-8.


31. Mangione CM, Phillips RS, Lawrence MG, Seddon JM, Orav EJ, Goldman L. Improved visual function and attenuation of declines in health-related quality of life after cataract extraction. Arch Ophthalmol 1994;112(11):1419-25.
32. Frost NA, Sparrow JM, Durant JS, Donovan JL, Peters TJ, Brookes ST. Development of a questionnaire for measurement of vision-related quality of life. Ophthalmic Epidemiol 1998;5(4):185-210.
33. Pesudovs K, Coster DJ. An instrument for assessment of subjective visual disability in cataract patients. Br J Ophthalmol 1998;82(6):617-24.
34. Donovan JL, Brookes ST, Laidlaw DAH, Hopper CD, Sparrow JM, Peters TJ. The development and validation of a questionnaire to assess visual symptoms/dysfunction and impact on quality of life in cataract patients: The Visual Symptoms and Quality of life (VSQ) Questionnaire. Ophthalmic Epidemiol 2003;10(1):49-65.
35. Prager TC, Chuang AZ, Slater CH, Glasser JH, Ruiz RS, Baribeau AD, et al. The Houston Vision Assessment Test (HVAT): An assessment of validity. Ophthalmic Epidemiol 2000;7(2):87-102.
36. Prager T, Chuang A, Slater C, Glasser J, Ruiz R. The Cataract Outcome Study Group. Health-related quality of life assessment in cataract populations - the Houston Vision Assessment Test (HVAT). Invest Ophthalmol Vis Sci 1995;36:S841.
37. Javitt JC, Jacobson G, Schiffman RM. Validity and reliability of the Cataract TyPE Spec: an instrument for measuring outcomes of cataract extraction. Am J Ophthalmol 2003;136(2):285-90.
38. Lawrence DJ, Brogan C, Benjamin L, Pickard D, Stewart Brown S. Measuring the effectiveness of cataract surgery: the reliability and validity of a visual function outcomes instrument. Br J Ophthalmol 1999;83(1):66-70.
39. Javitt JC, Steinert RF. Cataract extraction with multifocal intraocular lens implantation: a multinational clinical trial evaluating clinical, functional, and quality-of-life outcomes. Ophthalmology 2000;107(11):2040-8.
40. Javitt JC, Wang F, Trentacost DJ, Rowe M, Tarantino N. Outcomes of cataract extraction with multifocal intraocular lens implantation: functional status and quality of life. Ophthalmology 1997;104(4):589-99.
41. Aslam TM, Gilmour D, Hopkinson S, Patton N, Aspinall P. The development and assessment of a self-perceived quality of vision questionnaire to test pseudophakic patients. Ophthalmic Epidemiol 2004;11(3):241-53.
42. Garratt AM, Macdonald LM, Ruta DA, Russell IT, Buckingham JK, Krukowski ZH. Towards measurement of outcome for patients with varicose veins. Qual Health Care 1993.
43. Smith JJ, Garratt AM, Guest M, Greenhalgh RM, Davies AH. Evaluating and improving health-related quality of life in patients with varicose veins. J Vasc Surg 1999.
44. Garratt AM, Ruta DA, Abdalla MI, et al. Responsiveness of the SF-36 and a condition specific measure of health for patients with varicose veins. Quality of Life Research 1996.


45. MacKenzie RK, Paisley A, Allan PL, Lee AJ, Ruckley CV, Bradbury AW. The effect of long saphenous vein stripping on quality of life. J Vasc Surg 2002;35(6):1197-203.
46. Mackenzie RK, Lee AJ, Paisley A, Burns P, Allan PL, Ruckley CV, et al. Patient, operative, and surgeon factors that influence the effect of superficial venous surgery on disease-specific quality of life. J Vasc Surg 2002;36(5):896-902.
47. Garratt AM, Ruta DA, Abdalla MI, Buckingham JK, Russell IT. The SF36 health survey questionnaire: an outcome measure suitable for routine use within the NHS? BMJ 1993;306(6890):1440-4.
48. Smith JJ, Garratt AM, Guest M, Greenhalgh RM, Davies AH. Evaluating and improving health-related quality of life in patients with varicose veins. J Vasc Surg 1999;30(4):710-9.
49. Lamping DL, Schroter S, Kurz X, Kahn SR, Abenhaim L. Evaluation of outcomes in chronic venous disorders of the leg: development of a scientifically rigorous, patient-reported measure of symptoms and quality of life. J Vasc Surg 2003.
50. Kahn SR, M'Lan CE, Lamping DL, Kurz X, Berard A, Abenhaim LA. Relationship between clinical classification of chronic venous disease and patient-reported quality of life: results from an international cohort study. J Vasc Surg 2004;39(4):823-8.
51. Kurz X, Lamping DL, Kahn SR, Baccaglini U, Zuccarelli F, Spreafico G, et al. Do varicose veins affect quality of life? Results of an international population-based study. J Vasc Surg 2001.
52. Browne J. Royal College of Surgeons of England National Hernia Outcomes Project Patient's Two Week Questionnaire. In: Cano S, editor. London, 2005.
53. Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol 1988;15(12):1833-40.
54. Bellamy N, Campbell J, Stevens J, Pilch L, Stewart C, Mahmood Z. Validation study of a computerized version of the Western Ontario and McMaster Universities VA3.0 Osteoarthritis Index. J Rheumatol 1997;24(12):2413-5.
55. Brazier JE, Harper R, Munro J, Walters SJ, Snaith ML. Generic and condition-specific outcome measures for people with osteoarthritis of the knee. Rheumatology (Oxford) 1999;38(9):870-7.
56. Bombardier C, Melfi CA, Paul J, Green R, Hawker G, Wright J, et al. Comparison of a generic and a disease-specific measure of pain and physical function after knee replacement surgery. Med Care 1995;33(4 Suppl):AS131-44.
57. Ostendorf M, van Stel HF, Buskens E, Schrijvers AJ, Marting LN, Verbout AJ, et al. Patient-reported outcome in total hip replacement. A comparison of five instruments of health status. J Bone Joint Surg Br 2004;86(6):801-8.
58. McGrory BJ, Shinar AA, Freiberg AA, Harris WH. Enhancement of the value of hip questionnaires by telephone follow-up evaluation. J Arthroplasty 1997.
59. Stucki G, Sangha O, Stucki S, Michel BA, Tyndall A, Dick W, et al. Comparison of the WOMAC (Western Ontario and McMaster Universities) osteoarthritis index and a self-report format of the self-administered Lequesne-Algofunctional index in patients with knee and hip osteoarthritis. Osteoarthritis Cartilage 1998;6(2):79-86.
60. Kennedy D, Stratford PW, Pagura SMC, Wessel J, Gollish JD, Woodhouse LJ. Exploring the factorial validity and clinical interpretability of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC). Physiotherapy Canada 2003;55(3):160-8.
61. McGrory BJ, Harris WH. Can the Western Ontario and McMaster Universities (WOMAC) osteoarthritis index be used to evaluate different hip joints in the same patient? J Arthroplasty 1996.
62. Boardman DL, Dorey F, Thomas BJ, Lieberman JR. The accuracy of assessing total hip arthroplasty outcomes: a prospective correlation study of walking ability and 2 validated measurement devices. J Arthroplasty 2000;15(2):200-4.
63. Witvrouw E, Victor J, Bellemans J, Rock B, Van Lummel R, Van Der Slikke R, et al. A correlation study of objective functionality and WOMAC in total knee arthroplasty. Knee Surg Sports Traumatol Arthrosc 2002.
64. Finch E, Walsh M, Thomas SG, Woodhouse LJ. Functional ability perceived by individuals following total knee arthroplasty compared to age-matched individuals without knee disability. Journal of Orthopaedic and Sports Physical Therapy 1998;27(4):255-63.
65. Lingard EA, Katz JN, Wright RJ, Wright EA, Sledge CB. Validity and responsiveness of the Knee Society Clinical Rating System in comparison with the SF-36 and WOMAC. J Bone Joint Surg Am 2001;83-A(12):1856-64.
66. Anderson JG, Wixson RL, Tsai D, Stulberg SD, Chang RW. Functional outcome and patient satisfaction in total knee patients over the age of 75. J Arthroplasty 1996;11(7):831-40.
67. Fortin PR, Penrod JR, Clarke AE, St Pierre Y, Joseph L, Belisle P, et al. Timing of total joint replacement affects clinical outcomes among patients with osteoarthritis of the hip or knee. Arthritis Rheum 2002;46(12):3327-30.
68. Mahomed NN, Liang MH, Cook EF, Daltroy LH, Fortin PR, Fossel AH, et al. The importance of patient expectations in predicting functional outcomes after total joint arthroplasty. J Rheumatol 2002;29(6):1273-9.
69. Conner Spady BL, Arnett G, McGurran JJ, Noseworthy TW. Prioritization of patients on scheduled waiting lists: Validation of a scoring system for hip and knee arthroplasty. Can J Surg 2004;47(1):39-46.
70. Conner Spady B, Estey A, Arnett G, Ness K, McGurran J, Bear R, et al. Prioritization of patients on waiting lists for hip and knee replacement: validation of a priority criteria tool. Int J Technol Assess Health Care 2004.
71. Thomas SG, Pagura SMC, Kennedy D. Physical activity and its relationship to physical performance in patients with end stage knee osteoarthritis. Journal of Orthopaedic and Sports Physical Therapy 2003;33(12):745-54.
72. Lavernia CJ, Lee D, Sierra RJ, Gomez-Marin O. Race, ethnicity, insurance coverage, and preoperative status of hip and knee surgical patients. Journal of Arthroplasty 2004;19(8):978-985.


73. Williams JI, Llewellyn Thomas H, Arshinoff R, Young N, Naylor CD. The burden of waiting for hip and knee replacements in Ontario. Ontario Hip and Knee Replacement Project Team. J Eval Clin Pract 1997;3(1):59-68.
74. Nilsdotter AK, Lohmander LS. Age and waiting time as predictors of outcome after total hip replacement for osteoarthritis. Rheumatology (Oxford) 2002;41(11):1261-7.
75. Stratford PW, Kennedy DM. Does parallel item content on WOMAC's pain and function subscales limit its ability to detect change in functional status? BMC Musculoskelet Disord 2004.
76. Faucher M, Poiraudeau S, Lefevre-Colau MM, Rannou F, Fermanian J, Revel M. Assessment of the test-retest reliability and construct validity of a modified Lequesne index in knee osteoarthritis. Joint Bone Spine 2003;70(6):521-5.
77. Davis AM, Badley EM, Beaton DE, Kopec J, Wright JG, Young NL, et al. Rasch analysis of the Western Ontario McMaster (WOMAC) Osteoarthritis Index: results from community and arthroplasty samples. J Clin Epidemiol 2003;56(11):1076-83.
78. Ryser L, Wright BD, Aeschlimann A, Mariacher-Gehler S, Stucki G. A new look at the Western Ontario and McMaster Universities Osteoarthritis Index using Rasch analysis. Arthritis Care Res 1999;12(5):331-5.
79. Guermazi M, Poiraudeau S, Yahia M, Mezganni M, Fermanian J, Habib Elleuch M, et al. Translation, adaptation and validation of the Western Ontario and McMaster Universities osteoarthritis index (WOMAC) for an Arab population: the Sfax modified WOMAC. Osteoarthritis Cartilage 2004;12(6):459-68.
80. Faucher M, Poiraudeau S, Lefevre Colau MM, Rannou F, Fermanian J, Revel M. Algo-functional assessment of knee osteoarthritis: Comparison of the test-retest reliability and construct validity of the Womac and Lequesne indexes. Osteoarthritis Cartilage 2002;10(8):602-610.
81. Bellamy N. Womac TM. In: Smith S, editor. London, 2005.
82. Faucher M, Poiraudeau S, Lefevre Colau MM, Rannou F, Fermanian J, Revel M. Assessment of the test-retest reliability and construct validity of a modified WOMAC index in knee osteoarthritis. Joint Bone Spine 2004;71(2):121-7.
83. Bellamy N, Wells G, Campbell J. Relationship between severity and clinical importance of symptoms in osteoarthritis. Clin Rheumatol 1991;10(2):138-43.
84. Gilbey HJ, Ackland TR, Wang AW, Morton AR, Trouchet T, Tapper J. Exercise improves early functional recovery after total hip arthroplasty. Clin Orthop 2003(408):193-200.
85. Hawker G, Melfi C, Paul J, Green R, Bombardier C. Comparison of a generic (SF-36) and a disease specific (WOMAC) (Western Ontario and McMaster Universities Osteoarthritis Index) instrument in the measurement of outcomes after knee replacement surgery. J Rheumatol 1995;22(6):1193-6.
86. Braeken AM, Lochhaas-Gerlach JA, Gollish JD, Myles JD, Mackenzie TA. Determinants of 6-12 month postoperative functional status and pain after elective total hip replacement. Int J Qual Health Care 1997;9(6):413-8.

87. Jones CA, Voaklander DC, Suarez Almazor ME. Determinants of function after total knee arthroplasty. Physical Therapy 2003;83(8):696-706.
88. Lingard EA, Katz JN, Wright EA, Sledge CB. Predicting the outcome of total knee arthroplasty. J Bone Joint Surg Am 2004.
89. Bachmeier CJ, March LM, Cross MJ, Lapsley HM, Tribe KL, Courtenay BG, et al. A comparison of outcomes in osteoarthritis patients undergoing total hip and knee replacement surgery. Osteoarthritis Cartilage 2001;9(2):137-46.
90. Whitehouse SL, Lingard EA, Katz JN, Learmonth ID. Development and testing of a reduced WOMAC function scale. J Bone Joint Surg Br 2003;85(5):706-11.
91. Parent E, Moffet H. Comparative responsiveness of locomotor tests and questionnaires used to follow early recovery after total knee arthroplasty. Arch Phys Med Rehabil 2002;83(1):70-80.
92. Theiler R, Sangha O, Schaeren S, Michel BA, Tyndall A, Dick W, et al. Superior responsiveness of the pain and function sections of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) as compared to the Lequesne-Algofunctional Index in patients with osteoarthritis of the lower extremities. Osteoarthritis Cartilage 1999;7(6):515-9.
93. Hawker G, Wright J, Coyte P, Paul J, Dittus R, Croxford R, et al. Health-related quality of life after knee replacement. J Bone Joint Surg Am 1998;80(2):163-73.
94. Kreibich DN, Vaz M, Bourne RB, Rorabeck CH, Kim P, Hardie R, et al. What is the best way of assessing outcome after total knee replacement? Clin Orthop 1996(331):221-5.
95. Jones CA, Voaklander DC, Johnston WC, Suarez Almazor ME. Health related quality of life outcomes after total hip and knee arthroplasties in a community based population. Journal of Rheumatology 2000;27(7):1745-52.
96. Nilsdotter A, Roos EM, Westerlund JP, Roos HP, Lohmander LS. Comparative responsiveness of measures of pain and function after total hip replacement. Arthritis and Rheumatism 2001;45(3):258-62.
97. Peerbhoy D, Keane P, Maciver K, Shenkin A, Hall GM, Salmon P. The systematic assessment of short-term functional recovery after major joint arthroplasty. J Qual Clin Pract 1999;19(3):165-71.
98. Binkley JM, Stratford PW, Lott SA, Riddle DL, North American Orthopaedic Rehabilitation Research N. The Lower Extremity Functional Scale (LEFS): scale development, measurement properties, and clinical application. Physical Therapy 1999;79(4):371-83.
99. Stratford PW, Binkley JM, Watson J, Heath Jones T. Validation of the LEFS on patients with total joint arthroplasty. Physiotherapy Canada 2000;52(2):97-105, 110.
100. Meenan RF. The AIMS approach to health status measurement: conceptual background and measurement properties. J Rheumatol 1982;9(5):785-8.
101. Liang MH, Larson MG, Cullen KE, Schwartz JA. Comparative measurement efficiency and sensitivity of five health status instruments for arthritis research. Arthritis Rheum 1985.

102. Brodie LJ, Sloman RM. Changes in health status of elderly patients following hip replacement surgery. J Gerontol Nurs 1998;24(3):5-12.
103. Berge DJ, Dolin SJ, Williams AC, Harman R. Pre-operative and post-operative effect of a pain management programme prior to total hip replacement: a randomized controlled trial. Pain 2004;110(1-2):33-9.
104. Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Med Care 1990.
105. Selman SW. Impact of total hip replacement on quality of life. Orthop Nurs 1989;8(5):43-9.
106. Fries JF, Spitz P, Kraines RG, Holman HR. Measurement of patient outcome in arthritis. Arthritis Rheum 1980;23(2):137-45.
107. Block JE, Westlake SM, Meredith LM, Sheppard MS. Total knee arthroplasty: the effect of early discharge on outcome at 6-8 weeks postoperative. Physiotherapy Canada 1999;51(1):45-51.
108. Kelly HK. Patient perceptions of pain and disability after joint arthroplasty. Orthopaedic Nursing 1991;10(6):43-50, 70.
109. Kirwan JR, Currey HL, Freeman MA, Snow S, Young PJ. Overall long-term impact of total hip and knee joint replacement surgery on patients with osteoarthritis and rheumatoid arthritis. Br J Rheumatol 1994;33(4):357-60.
110. McGrory BJ, Morrey BF, Rand JA, Ilstrup DM. Correlation of patient questionnaire responses and physician history in grading clinical outcome following hip and knee arthroplasty. A prospective study of 201 joint arthroplasties. J Arthroplasty 1996;11(1):47-57.
111. Insall J, Ranawat C, Aglietti P, Shine J. A comparison of four models of total knee-replacement prosthesis. J Bone Joint Surg 1976;58A:754.
112. Kavanagh BF, Fitzgerald RH, Jr. Clinical and roentgenographic assessment of total hip arthroplasty. A new hip score. Clin Orthop 1985(193):133-40.
113. Keating EM, Ranawat CS, Cats Baril W. Assessment of postoperative vigor in patients undergoing elective total joint arthroplasty: A concise patient- and caregiver-based instrument. Orthopedics 1999;22(Suppl.):S119-S128.
114. Dawson J, Fitzpatrick R, Murray D, Carr A. Comparison of measures to assess outcomes in total hip replacement surgery. Qual Health Care 1996.
115. Dawson J, Fitzpatrick R, Carr A, Murray D. Questionnaire on the perceptions of patients about total hip replacement. J Bone Joint Surg Br 1996;78(2):185-90.
116. Fitzpatrick R, Morris R, Hajat S, Reeves B, Murray DW, Hannen D, et al. The value of short and simple measures to assess outcomes for patients of total hip replacement surgery. Qual Health Care 2000;9(3):146-50.
117. McMurray R, Heaton J, Sloper P, Nettleton S. Measurement of patient perceptions of pain and disability in relation to total hip replacement: the place of the Oxford hip score in mixed methods. Qual Health Care 1999;8(4):228-33.
118. Dawson J, Fitzpatrick R, Frost S, Gundle R, McLardy Smith P, Murray D. Evidence for the validity of a patient-based instrument for assessment of outcome after revision hip replacement. J Bone Joint Surg Br 2001.

119. Trudelle Jackson E, Emerson R, Smith S. Outcomes of total hip arthroplasty: a study of patients one year postsurgery. Journal of Orthopaedic and Sports Physical Therapy 2002;32(6):260-7.
120. Fitzpatrick R, Dawson J. Health-related quality of life and the assessment of outcomes of total hip replacement surgery. Psychology and Health 1997;12(6):793-803.
121. Fitzpatrick R, Norquist JM, Dawson J, Jenkinson C. Rasch scoring of outcomes of total hip replacement. J Clin Epidemiol 2003;56(1):68-74.
122. Fitzpatrick R, Norquist JM, Jenkinson C, Reeves BC, Morris RW, Murray DW, et al. A comparison of Rasch with Likert scoring to discriminate between patients' evaluations of total hip replacement surgery. Qual Life Res 2004.
123. Norquist JM, Fitzpatrick R, Dawson J, Jenkinson C. Comparing alternative Rasch-based methods vs raw scores in measuring change in health. Med Care 2004;42(1 Suppl):I25-36.
124. Johanson NA, Charlson ME, Szatrowski TP, Ranawat CS. A self-administered hip-rating questionnaire for the assessment of outcome after total hip replacement. J Bone Joint Surg Am 1992;74(4):587-97.
125. Harris WH, Sledge CB. Total hip and total knee replacement (1). N Engl J Med 1990;323(11):725-31.
126. Harris WH, Sledge CB. Total hip and total knee replacement (2). N Engl J Med 1990;323(12):801-7.
127. Mahomed NN, Arndt DC, McGrory BJ, Harris WH. The Harris hip score: comparison of patient self-report with surgeon assessment. J Arthroplasty 2001.
128. Katz JN, Phillips CB, Poss R, Harrast JJ, Fossel AH, Liang MH, et al. The validity and reliability of a total hip arthroplasty outcome evaluation questionnaire. J Bone Joint Surg Am 1995;77(10):1528-1534.
129. Wright JG, Young NL. The patient-specific index: asking patients what they want. J Bone Joint Surg Am 1997;79(7):974-83.
130. Wright JG, Young NL, Waddell JP. The reliability and validity of the self-reported patient-specific index for total hip arthroplasty. J Bone Joint Surg Am 2000;82(6):829-37.
131. Dawson J, Fitzpatrick R, Murray D, Carr A. Questionnaire on the perceptions of patients about total knee replacement. J Bone Joint Surg Br 1998;80(1):63-9.
132. Liow RY, Walker K, Wajid MA, Bedi G, Lennox CM. Functional rating for knee arthroplasty: comparison of three scoring systems. Orthopedics 2003.
133. Harcourt WGV, White SH, Jones P. Specificity of the Oxford knee status questionnaire. J Bone Joint Surg Br 2001;83(3):345-347.
134. Weale AE, Halabi OA, Jones PW, White SH. Perceptions of outcomes after unicompartmental and total knee replacements. Clin Orthop 2001(382):143-53.
135. Bennett KJ, Torrance GW, Moran LA, Smith F, Goldsmith CH. Health state utilities in knee replacement surgery: the development and evaluation of McKnee. J Rheumatol 1997;24(9):1796-805.

136. Feeny D, Furlong W, Boyle M, Torrance GW. Multi-attribute health status classification systems. Health Utilities Index. Pharmacoeconomics 1995;7(6):490-502.
137. Ware JEJ, Snow KK, Kosinski M, Gandek B. SF-36 Health Survey: Manual and interpretation guide. Boston: The Health Institute, New England Medical Centre, 1993.
138. Brazier JE, Harper R, Jones NM, O'Cathain A, Thomas KJ, Usherwood T, et al. Validating the SF-36 health survey questionnaire: new outcome measure for primary care. BMJ 1992;305(6846):160-4.
139. Jenkinson C, Coulter A, Wright L. Short form 36 (SF36) health survey questionnaire: normative data for adults of working age. BMJ 1993;306(6890):1437-40.
140. Ware JE, Jr., Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care 1992;30(6):473-83.
141. Jenkinson C, Gray A, Doll H, Lawrence K, Keoghane S, Layte R. Evaluation of index and profile measures of health status in a randomized controlled trial. Comparison of the Medical Outcomes Study 36-Item Short Form Health Survey, EuroQol, and disease specific measures. Med Care 1997;35(11):1109-18.
142. Velanovich V. Laparoscopic vs open surgery: a preliminary comparison of quality-of-life outcomes. Surg Endosc 2000;14(1):16-21.
143. Neumayer L, Giobbie-Hurder A, Jonasson O, Fitzgibbons R, Jr., Dunlop D, Gibbs J, et al. Open mesh versus laparoscopic mesh repair of inguinal hernia. N Engl J Med 2004;350(18):1819-27.
144. Kingsnorth AN, Wright D, Porter CS, Robertson G. Prolene Hernia System compared with Lichtenstein patch: a randomised double blind study of short-term and medium-term outcomes in primary inguinal hernia repair. Hernia 2002;6(3):113-9.
145. Nienhuijs SW, van Oort I, Keemers-Gels ME, Strobbe LJ, Rosman C. Randomized trial comparing the Prolene Hernia System, mesh plug repair and Lichtenstein method for open inguinal hernia repair. Br J Surg 2005;92(1):33-8.
146. Post S, Weiss B, Willer M, Neufang T, Lorenz D. Randomized clinical trial of lightweight composite mesh for Lichtenstein inguinal hernia repair. Br J Surg 2004;91(1):44-8.
147. Zieren J, Kupper F, Paul M, Neuss H, Muller JM. Inguinal hernia: obligatory indication for elective surgery? A prospective assessment of quality of life before and after plug and patch inguinal hernia repair. Langenbecks Arch Surg 2003;387(11-12):417-20.
148. Bringman S, Heikkinen TJ, Wollert S, Osterberg J, Smedberg S, Granlund H, et al. Early results of a single-blinded, randomized, controlled, Internet-based multicenter trial comparing Prolene and Vypro II mesh in Lichtenstein hernioplasty. Hernia 2004;8(2):127-34.
149. Burney RE, Jones KR, Coon JW, Blewitt DK, Herm A, Peterson M. Core outcomes measures for inguinal hernia repair. J Am Coll Surg 1997;185(6):509-15.

150. Poobalan AS, Bruce J, King PM, Chambers WA, Krukowski ZH, Smith WC. Chronic pain and quality of life following open inguinal hernia repair. Br J Surg 2001;88(8):1122-6.
151. Jenkinson C, Layte R, Jenkinson D, Lawrence K, Petersen S, Paice C, et al. A shorter form health survey: can the SF-12 replicate results from the SF-36 in longitudinal studies? J Public Health Med 1997;19(2):179-86.
152. Jenkinson C, Layte R, Lawrence K. Development and testing of the Medical Outcomes Study 36-Item Short Form Health Survey summary scale scores in the United Kingdom. Results from a large-scale survey and a clinical trial. Med Care 1997;35(4):410-6.
153. Jenkinson C. Comparison of UK and US methods for weighting and scoring the SF-36 summary measures. J Public Health Med 1999;21(4):372-6.
154. Jenkinson C, Lawrence K, McWhinnie D, Gordon J. Sensitivity to change of health status measures in a randomized controlled trial: comparison of the COOP charts and the SF-36. Qual Life Res 1995;4(1):47-52.
155. Lawrence K, McWhinnie D, Goodwin A, Gray A, Gordon J, Storie J, et al. An economic evaluation of laparoscopic versus open inguinal hernia repair. J Public Health Med 1996;18(1):41-8.
156. Kantz ME, Harris WJ, Levitsky K, Ware JE, Jr., Davies AR. Methods for assessing condition-specific and generic functional status outcomes after total knee replacement. Med Care 1992.
157. Shields RK, Enloe LJ, Leo KC. Health related quality of life in patients with total hip or knee replacement. Arch Phys Med Rehabil 1999;80(5):572-9.
158. Mangione CM, Goldman L, Orav EJ, Marcantonio ER, Pedan A, Ludwig LE, et al. Health-related quality of life after elective surgery: measurement of longitudinal changes. J Gen Intern Med 1997;12(11):686-97.
159. Feeny D, Wu L, Eng K. Comparing short form 6D, standard gamble, and Health Utilities Index Mark 2 and Mark 3 utility scores: results from total hip arthroplasty patients. Qual Life Res 2004;13(10):1659-70.
160. McMurray A, Grant S, Griffiths S, Letford A. Health-related quality of life and health service use following total hip replacement surgery. J Adv Nurs 2002;40(6):663-72.
161. Stucki G, Liang MH, Phillips C, Katz JN. The Short Form-36 is preferable to the SIP as a generic health status measure in patients undergoing elective total hip arthroplasty. Arthritis Care Res 1995;8(3):174-81.
162. Wasielewski RC, Weed H, Prezioso C, Nicholson C, Puri RD. Patient comorbidity: relationship to outcomes of total knee arthroplasty. Clin Orthop 1998(356):85-92.
163. Fuchs S, Sandmann C, Gerdemann G, Skwara A, Tibesku CO, Bottner F. Quality of life and clinical outcome in salvage revision total knee replacement: hinged vs total condylar design. Knee Surg Sports Traumatol Arthrosc 2004;12(2):140-3.
164. Benson ER, Resine ST, Lewis CG. Functional outcome of arthrodesis for failed total knee arthroplasty. Orthopedics 1998;21(8):875-9.
165. Whitehouse JD, Friedman ND, Kirkland KB, Richardson WJ, Sexton DJ. The impact of surgical-site infections following orthopedic surgery at a community hospital and a university hospital: adverse quality of life, excess length of stay, and extra cost. Infection Control and Hospital Epidemiology 2002;23(4):183-9.
166. March LM, Cross MJ, Lapsley H, Brnabic AJ, Tribe KL, Bachmeier CJ, et al. Outcomes after hip or knee replacement surgery for osteoarthritis. A prospective cohort study comparing patients' quality of life before and after surgery with age-related population norms. Med J Aust 1999;171(5):235-8.
167. Dierick F, Aveniere T, Cossement M, Poilvache P, Lobet S, Detrembleur C. Outcome assessment in osteoarthritic patients undergoing total knee arthroplasty. Acta Orthop Belg 2004;70(1):38-45.
168. Chiu HC, Chern JY, Shi HY, Chen SH, Chang JK. Physical functioning and health-related quality of life: before and after total hip replacement. Kaohsiung J Med Sci 2000;16(6):285-92.
169. McGuigan FX, Hozack WJ, Moriarty L, Eng K, Rothman RH. Predicting quality-of-life outcomes following total joint arthroplasty. Limitations of the SF-36 Health Status Questionnaire. J Arthroplasty 1995;10(6):742-7.
170. Mandy A, Pearman A, Ross K. Postdischarge support for elective hip arthroplasty patients. Physiotherapy Theory and Practice 2000;16(3):161-8.
171. Harwood RH, Ebrahim S. A comparison of the responsiveness of the Nottingham extended activities of daily living scale, London handicap scale and SF-36. Disabil Rehabil 2000;22(17):786-93.
172. Bert JM, Gross M, Kline C. Outcome results after total knee arthroplasty: does the patient's physical and mental health improve? Am J Knee Surg 2000;13(4):223-7.
173. van Essen GJ, Chipchase LS, O'Connor D, Krishnan J. Primary total knee replacement: short-term outcomes in an Australian population. J Qual Clin Pract 1998;18(2):135-42.
174. Blanchard CM, Cote I, Feeny D. Comparing short form and RAND physical and mental health summary scores: results from total hip arthroplasty and high-risk primary-care patients. Int J Technol Assess Health Care 2004.
175. Moran M, Khan A, Sochart DH, Andrew G. Evaluation of patient concerns before total knee and hip arthroplasty. J Arthroplasty 2003;18(4):442-5.
176. The EuroQol Group. EuroQol: a new facility for the measurement of health-related quality of life. Health Policy 1990;16:199-208.
177. van Agt H, Essinck-Bot M, Krabbe P, et al. Test-retest reliability of health state valuations collected with the EuroQol questionnaire. Social Science and Medicine 1994;39:1537-1544.
178. Hurst N, Jobanputra P, Hunter M, et al. Validity of Euroqol - a generic health status instrument - in patients with rheumatoid arthritis. British Journal of Rheumatology 1994;33:655-662.
179. Brazier J, Jones N, Kind P. Testing the validity of the Euroqol and comparing it with the SF-36 Health Survey questionnaire. Quality of Life Research 1993;2:169-180.
180. Loftus S. A longitudinal, quality of life study comparing four layer bandaging and superficial venous surgery for the treatment of venous leg ulcers. J Tissue Viability 2001.

181. Luo N, Chew LH, Fong KY, Koh DR, Ng SC, Yoon KH, et al. Validity and reliability of the EQ-5D self-report questionnaire in English-speaking Asian patients with rheumatic diseases in Singapore. Quality of Life Research: An International Journal of Quality of Life Aspects of Treatment, Care and Rehabilitation 2003;12(1):87-92.
182. Tidermark J, Bergstrom G, Svensson O, Tornkvist H, Ponzer S. Responsiveness of the EuroQol (EQ 5-D) and the SF-36 in elderly patients with displaced femoral neck fractures. Qual Life Res 2003;12(8):1069-79.
183. Tidermark J, Zethraeus N, Svensson O, Tornkvist H, Ponzer S. Femoral neck fractures in the elderly: functional outcome and quality of life according to EuroQol. Qual Life Res 2002;11(5):473-81.
184. Boyle M, Furlong W, Torrance GW. Reliability of the Health Utilities Index-Mark III used in the 1991 cycle 6 General Social Survey Health Questionnaire: McMaster University Centre for Health Economics and Policy Analysis, 1994.
185. Feeny D, Torrance GW, Furlong W. Health Utilities Index. In: Spilker B, editor. Quality of life and pharmacoeconomics in clinical trials. Philadelphia: Lippincott-Raven, 1996:239-252.
186. Franks P, Gold M, Erickson P. Do utility based measures of health-related quality of life predict future health states? Longitudinal evidence from a nationally representative cohort: University of Rochester, 1992.
187. Saigal S, Rosenbaum P, Stoskopf B. Comprehensive assessment of the health status of extremely low birth weight children at 8 years of age: comparison with a reference group. Journal of Pediatrics 1994;125:411-417.
188. Saigal S, Feeny D, Furlong W. Comparison of the health related quality of life of extremely low birth weight children and a reference group of children at 8 years of age. Journal of Pediatrics 1994;125:418-425.
189. Feeney SL. The relationship between pain and negative affect in older adults: anxiety as a predictor of pain. J Anxiety Disord 2004;18(6):733-44.
190. Blanchard C, Feeny D, Mahon JL, Bourne R, Rorabeck C, Stitt L, et al. Is the Health Utilities Index responsive in total hip arthroplasty patients? J Clin Epidemiol 2003;56(11):1046-1054.
191. Scott J, Huskisson E. Vertical or horizontal visual analogue scales. Ann Rheum Dis 1979;38:560.
192. Downie W, Leatham P, Rhind V. Studies with pain rating scales. Ann Rheum Dis 1978;37:378-381.
193. Crow R, Gage H, Hampson S, Hart J, Kimber A, Storey L, et al. The measurement of satisfaction with healthcare: implications for practice from a systematic review of the literature. Health Technol Assess 2002;6(32):1-244.
194. Staniszewska S, Henderson L. Patients evaluations of their health care: the expression of negative evaluation and the role of adaptive strategies. Patient Education and Counselling 2004;55:185-192.
195. Mancuso CA, Salvati EA. Patients' satisfaction with the process of total hip arthroplasty. J Healthc Qual 2003;25(2):12-8.

196. The Audit Commission. Measuring quality: the patients' view of day surgery. London: The Audit Commission, 1991.
197. Black N, Sanderson C. Day surgery: development of a questionnaire for eliciting patients' experiences. Qual Health Care 1993;2(3):157-61.
198. Black N, Petticrew M, Hunter D, Sanderson C. Day surgery: development of a national comparative audit service. Qual Health Care 1993;2(3):162-6.
199. Jenkinson C, Coulter A, Bruster S. The Picker Patient Experience Questionnaire: development and validation using data from in-patient surveys in five countries. Int J Qual Health Care 2002;14(5):353-8.
200. Jenkinson C, Coulter A, Bruster S, Richards N, Chandola T. Patients' experiences and satisfaction with health care: results of a questionnaire study of specific aspects of care. Qual Saf Health Care 2002;11(4):335-9.
201. Bruster S, Jarman B, Bosanquet N, Weston D, Erens R, Delbanco TL. National survey of hospital patients. BMJ 1994;309(6968):1542-6.
202. NHS. NHS Trust Patient Surveys, 2005.
203. Bruce J, Russell EM, Mollison J, Krukowski ZH. The measurement and monitoring of surgical adverse events. Health Technol Assess 2001;5(22):1-194.
204. Desai P. The outcomes of cataract surgery: the relationships between visual acuity, visual function and quality of life [PhD]. University of London, 1996.
205. Lewsey J. National Joint Registry. In: Smith S, editor. London, 2005.
206. Reeves B. ASEPSIS patient reported wound infection questionnaire. In: Cano S, editor. London, 2005.
207. Wilson A, Treasure T, Sturridge E, Gruneberg R. A scoring method (ASEPSIS) for post-operative wound infections for use in clinical trials of antibiotic prophylaxis. Lancet 1986;i:311-3.
208. Brazier JE, Deverill M, Green C, Harper R, Booth A. A review of the use of health status measures in economic evaluation. Health Technol Assess 1999;3(9):1-158.
209. Brazier JE, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. Journal of Health Economics 2002;21(2):271-292.
210. Brazier J, Roberts J, Tsuchiya A, Busschbach J. A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ 2004;13(9):873-84.
211. NICE. Guide to the methods of technology appraisal. London: National Institute for Clinical Excellence, 2004.

Appendix 1: Final set of text search terms for PROMs

Cataract surgery
((((mobility) or (ADL) or (symptom*) or (activities of daily living) or (satisfaction) or (pain) or (performance status) or (disability scale) or (functional status) or (quality of life) or (health status)) or ((patient based) or (self report*) or (patient report*) or (patient related) or (patient*)) or ((score*) or (questionnaire*) or (scale*) or (measure*) or (instrument*))) and ((valid*) or (reliab*) or (psychomet*) or (test retest) or (repeatability) or (acceptability) or (reproducib*) or (sensitivity near change) or (effect size) or (responsive*))) and ((eye disease*) or (cataract*))

Varicose veins surgery
((((mobility) or (ADL) or (symptom*) or (activities of daily living) or (satisfaction) or (pain) or (performance status) or (disability scale) or (functional status) or (quality of life) or (health status)) or ((patient based) or (self report*) or (patient report*) or (patient related) or (patient*)) or ((score*) or (questionnaire*) or (scale*) or (measure*) or (instrument*))) and ((valid*) or (reliab*) or (psychomet*) or (test retest) or (repeatability) or (acceptability) or (reproducib*) or (sensitivity near change) or (effect size) or (responsive*))) and ((lower limb and (ischaemia) or (ischemia)) or ((superficial vein*) or (varicose vein*) or (venous insufficiency) or (venous disease*) or (saphenous vein*)) or (claudication))

Hernia repair
((((mobility) or (ADL) or (symptom*) or (activities of daily living) or (satisfaction) or (pain) or (performance status) or (disability scale) or (functional status) or (quality of life) or (health status)) or ((patient based) or (self report*) or (patient report*) or (patient related) or (patient*)) or ((score*) or (questionnaire*) or (scale*) or (measure*) or (instrument*))) and ((valid*) or (reliab*) or (psychomet*) or (test retest) or (repeatability) or (acceptability) or (reproducib*) or (sensitivity near change) or (effect size) or (responsive*))) and (herni*)

Hip and/or knee replacement surgery
((((mobility) or (ADL) or (symptom*) or (activities of daily living) or (satisfaction) or (pain) or (performance status) or (disability scale) or (functional status) or (quality of life) or (health status)) or ((patient based) or (self report*) or (patient report*) or (patient related) or (patient*)) or ((score*) or (questionnaire*) or (scale*) or (measure*) or (instrument*))) and ((valid*) or (reliab*) or (psychomet*) or (test retest) or (repeatability) or (acceptability) or (reproducib*) or (sensitivity near change) or (effect size) or (responsive*))) and ((knee* and osteoarth*) or (hip* and osteoarth*) or (knee* and rheum*) or (hip* and arth*) or (knee* and arth*) or (hip replacement) or (knee replacement) or (knee* arthroplast*) or (hip* arthroplasty))
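
The four search strings in Appendix 1 share the same first two blocks (outcome, respondent and instrument terms, combined with measurement-quality terms) and differ only in the final condition clause. The sketch below shows one way such strings could be assembled from term lists. The function and variable names are illustrative assumptions, not part of the review's methods, and the output is plain text rather than a query for any particular database interface.

```python
"""Sketch: assemble the Appendix 1 text search strings from term lists.

All names here (CONTENT_TERMS, or_group, build_query, ...) are hypothetical
helpers introduced for illustration only.
"""

CONTENT_TERMS = [
    "mobility", "ADL", "symptom*", "activities of daily living",
    "satisfaction", "pain", "performance status", "disability scale",
    "functional status", "quality of life", "health status",
]
RESPONDENT_TERMS = [
    "patient based", "self report*", "patient report*",
    "patient related", "patient*",
]
INSTRUMENT_TERMS = ["score*", "questionnaire*", "scale*", "measure*", "instrument*"]
QUALITY_TERMS = [
    "valid*", "reliab*", "psychomet*", "test retest", "repeatability",
    "acceptability", "reproducib*", "sensitivity near change",
    "effect size", "responsive*",
]


def or_group(terms):
    """Wrap each term in parentheses and join the terms with 'or'."""
    return "(" + " or ".join(f"({term})" for term in terms) + ")"


def build_query(condition_clause):
    """Combine the shared blocks with a condition-specific clause."""
    # (content OR respondent OR instrument)
    measure_block = "(" + " or ".join(
        or_group(group)
        for group in (CONTENT_TERMS, RESPONDENT_TERMS, INSTRUMENT_TERMS)
    ) + ")"
    # ... AND measurement-quality terms
    core = f"({measure_block} and {or_group(QUALITY_TERMS)})"
    # ... AND the condition clause
    return f"{core} and {condition_clause}"


hernia_query = build_query("(herni*)")
cataract_query = build_query(or_group(["eye disease*", "cataract*"]))
print(hernia_query)
```

Keeping the shared blocks in one place makes it easier to check that every condition is searched with the same outcome and measurement-quality terms; only the condition clause passed to `build_query` changes.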

Appendix 2: Final set of MESH search terms for PROMs

1. Body part/disease

Cataract surgery
TEXT term                 MESH term
Cataract                  Cataract; Cataract extraction

Varicose veins surgery
TEXT term                 MESH term
Superficial vein          Femoral Vein
Venous insufficiency      Venous Engorgement; Venous Insufficiency; Venous Ulcer
Saphenous vein            Saphenous Vein
Varicose vein             Varicose Veins
Venous disease            Venous Insufficiency

Hernia repair
TEXT term                 MESH term
Hernia                    Hernia, Inguinal; Hernia, Femoral

Hip and/or knee replacement surgery
TEXT term                 MESH term
Arthroplast*              Arthroplasty; Arthroplasties; Hip Replacement; Knee Replacement
Knee replacement          Knee Injuries; Knee; Knee Dislocation; Knee Joint; Knee Prosthesis; Osteoarthritis, Knee
Hip replacement           Hip; Hip Dislocation; Hip Fractures; Hip Injuries; Hip Joint; Hip Prosthesis; Osteoarthritis, Hip
Rheum*                    Rheumatism; Rheumatic Diseases; Arthritis, Rheumatoid
Arthrit*                  Arthritis
Osteoarth*                Osteoarthritis; Arthritis, Degenerative; Arthritis, Post-Infectious; Arthritis, Postinfectious; Osteoarthritis; Osteoarthritis, Hip; Osteoarthritis, Knee; Osteoarthrosis

Content area
TEXT term                 MESH term
Symptom*                  Signs and Symptoms
ADL                       Activities of Daily Living
Satisfaction              Patient Satisfaction
Pain                      Abdominal Pain; Analogue Pain Scale; Pain Measurement; Assessment, Pain; Arthralgia; Joint Pain; Postoperative Pain
Performance status        Karnofsky Performance Status
Disability scale          Disability Evaluation
Quality of life           Quality-Adjusted Life Years; Quality of Life

Respondent
TEXT term                 MESH term
Patient based             Assessment, Patient Outcome; Outcome Assessment (Health Care); Assessment, Patient Outcomes; Patient Compliance; Patient Preference; Patient Satisfaction

Elicitation method
TEXT term                 MESH term
Questionnaire*            Questionnaire(s); Questionnaire Design
Scale*                    Visual Analog Scale
Measure*                  Outcome Measures; Outcome Assessment (Health Care)

Measure quality
TEXT term                 MESH term
Reproducib*               Reproducibility of Findings; Reproducibility of Results
Sensitivity near change   Sensitivity and Specificity
Valid*                    Reliability and Validity; Validity of Results; Validity (Epidemiology)
Reliab*                   Reliability and Validity; Reproducibility of Results; Reliability (Epidemiology)
Psychomet*                Psychometrics
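
Appendix 2 pairs each free-text term with one or more MESH headings. As a rough illustration of how such a pairing might be held and expanded into a single "or" clause, the sketch below uses three rows from the "Measure quality" group. The report does not state how text and MESH terms were combined in the database interface, so the simple "or" combination, the quoting of headings, the row pairing read off the column order, and all names in the code are assumptions.

```python
# Hypothetical lookup built from the "Measure quality" rows of Appendix 2.
# The pairing of text terms with headings is read off the column order
# and is an assumption, as is the quoting style for MESH headings.
MESH_FOR_TEXT_TERM = {
    "Reproducib*": ["Reproducibility of Findings", "Reproducibility of Results"],
    "Sensitivity near change": ["Sensitivity and Specificity"],
    "Psychomet*": ["Psychometrics"],
}


def expand_term(text_term):
    """Return an 'or' clause covering a text term and its MESH headings."""
    headings = MESH_FOR_TEXT_TERM.get(text_term, [])
    parts = [f"({text_term})"] + [f'("{heading}")' for heading in headings]
    return "(" + " or ".join(parts) + ")"


psychomet_clause = expand_term("Psychomet*")
reproducib_clause = expand_term("Reproducib*")
unmapped_clause = expand_term("effect size")  # no MESH heading listed
print(psychomet_clause)
```

A term with no listed heading simply falls back to its text form, which mirrors the fact that not every text term in Appendix 1 has a MESH counterpart in Appendix 2.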

Appendix 3: Final set of text search and MESH terms for patient-reported post-operative complications

Disease terms as in Appendices 1 and 2.

Search terms
((complication*) or (adverse event*)) and ((patient based) or (self report*) or (patient report*) or (patient related))

MESH terms
(("Postoperative-Complications" / all SUBHEADINGS in MIME,MJME) or ("Intraoperative-Complications" / all SUBHEADINGS in MIME,MJME)) and ((patient based) or (self report*) or (patient report*) or (patient related))

Appendix 4: Data extraction sheet

Reference number:
Reviewer:
Date of extraction:

Name of measure:
Instrument in the file?
Type of measure (i.e. generic, disease specific, domain specific)
Target Population (i.e. disease, country/language)
Description of measure (including conceptual model (cm), description of subscales, response scales, time frame)
Administration (i.e. patient-reported, interviewer administered, other proxies)
Scoring (i.e. overall score, subscales)
Respondent burden/feasibility (i.e. completion rates, time to administer, other resource implications)
Aims of paper
Study aims and design (psychometric, RCT, observational etc)
Sample characteristics (N, age, gender, disease, country)
Timings of administrations
Outcome Measures (if RCT or observational) or validation measures (if psychometric)

Methods (i.e. psychometric methods, statistics, criteria)
  Questionnaire Development
  Item Reduction
  Acceptability
  Validity
    Content Validity
    Criterion-related Validity
    Construct Validity
    Convergent Validity
    Discriminant Validity
    Known Groups
  Reliability
    Internal Consistency
    Test retest
  Responsiveness
  Interpretability
  Statistical Methods for RCT or Observational

Results (e.g. psychometric/statistical results)
  Questionnaire Development
  Item Reduction
  Acceptability
  Validity
    Content Validity
    Criterion-related Validity
    Construct Validity
    Convergent Validity
    Discriminant Validity
    Known Groups
  Reliability
    Internal Consistency
    Test retest
  Responsiveness
  Interpretability
  Statistical Results for RCT or Observational

Publication details
Availability
Used for benchmarking? (details)
Other versions/cultural & language adaptations (details in this paper)
Contact details for authors
Other notes

Appendix 5: Clinician survey

London School of Hygiene & Tropical Medicine
(University of London)
Keppel Street, London, WC1E 7HT
Switchboard: 0207-636 8636
Telex 8953474

Health Services Research Unit
Department of Public Health & Policy
Telephone: +44 (0) 207 5668
Fax: +44 (0) 207 580 8183
E-Mail: [email protected]

June 2005

Dear

Assessing patient outcomes after xxxxxxx

A research team from the London School of Hygiene & Tropical Medicine, the Clinical Effectiveness Unit of the Royal College of Surgeons (England) and the RCN Institute has been commissioned by the Department of Health to undertake a systematic review and scientific appraisal of the literature on patient-reported outcome measures (PROMS) in five areas of surgery: cataract surgery, hip replacement, knee replacement, procedures for varicose veins, hernia repair. The team will shortly provide recommendations about the measures that are to be used in a pilot study of PROMS in UK treatment centres later this year.

Now that the team has completed a rigorous review of measures, they would like to know surgeons’ views before making their recommendations to the Department of Health. As someone who has worked extensively in xxxxx surgery we are keen to have your input about the measures we are considering recommending. A summary of users’ views will be included in the final report, but no individual responses or names will be included.

Attached is a short description of the two measures that the team has recommended for use in xxxxx surgery: measures to be named here. These measures have been chosen on scientific grounds, i.e. they meet rigorous gold standard criteria for reliability, validity and responsiveness. We want to know your views about the acceptability of each questionnaire; this involves rating each measure on a 3-point scale and should only take a few minutes. We will not be asking you for any further information.

Please return your ratings to Dr Sarah Smith at the above address (either by post or email) as soon as possible and at the latest by xxxxxxx. If you have any queries, please do not hesitate to contact Dr Smith on 0207 637 5668 or by email at [email protected].

Yours sincerely

Professor xxxxxx

On behalf of the PROMS Study team: Dr Sarah Smith, Dr John Browne, Dr Donna Lamping, Professor Nick Black, Dr Jan van der Meulen, Dr Sophie Staniszewska, Dr James Lewsey, Professor John Cairns, Dr Stefan Cano


Name of disease-specific questionnaire

Hypothetical example:

Questionnaire name: Madeupqol

Target population: people who have had xxxx surgery; available in English

Description of measure:
- 14 items
- 2 domains (physical and psychosocial)
- questionnaire is reported by the patient
- each question is rated on a 5-point scale (never to very often)
- questionnaire generates 2 domain scores (physical and psychosocial) and an overall score

Time to complete:
- 10 minutes

Please consider the following statements in relation to the above questionnaire and choose the one statement that most closely describes your view:

[ ] I definitely consider xxxx to be an acceptable disease-specific measure for use in xxxx surgery.

[ ] I probably consider xxxx to be an acceptable disease-specific measure for use in xxxx surgery, but I would also consider using a different measure. Please specify which measure you would use instead xxxxxxx.

[ ] I do not consider xxxx to be an acceptable disease-specific measure for use in xxxx surgery and would definitely use a different measure. Please specify which measure you would use instead xxxxxxx.

