The Road to Results: Designing and Conducting Effective Development Evaluations

Linda Morra-Imas and Ray C. Rist

June 2008

© 2008 The World Bank. All rights reserved. The findings, interpretations, and conclusions expressed herein are those of the authors and do not necessarily reflect the views of the Board of Executive Directors of the World Bank or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work. The boundaries, colors, denominations, and other information shown on any map in this work do not imply any judgment on the part of the World Bank concerning the legal status of any territory or the endorsement or acceptance of such boundaries.

Rights and Permissions
The material in this work is copyrighted. Copying and/or transmitting portions or all of this work without permission may be a violation of applicable law. The World Bank encourages dissemination of its work and will normally grant permission promptly.

Printed at Carleton University, Ottawa, Ontario, Canada. June 2008.

Authors:

Dr. Linda Morra-Imas
Chief Evaluation Officer
Independent Evaluation Group (IEG)
World Bank Group
2121 Pennsylvania Ave.
Washington, D.C. 20433, U.S.A.

Dr. Ray C. Rist
Senior Evaluation Officer
Independent Evaluation Group (IEG)
The World Bank
1818 H Street, N.W.
Washington, D.C. 20433, U.S.A.

Curriculum Developer/Instructional Designer:
Diane Schulz Novak, M.S.
Instructional Designer
Brandon, Manitoba R7B 0W8, Canada

Revision Date: June 2008

Acknowledgments

The authors of this manual would like to thank the following for their contributions to the curriculum. These colleagues instructed at IPDET, reviewed drafts of the curriculum, and/or provided material from their workshops or other sources to this manual. Their contributions are greatly appreciated.

Doha Abdelhamid, Martin Abrams, Marie-Hélène Adrien, Suresh Balakrishnan, Michael J. Bamberger, Michael Barzelay, Janet Mancini Billson, Jennifer Birch-Jones, Bob Boruch, Heather Buchanan, Colin Campbell, Soniya Carvalho, Ajay Chhibber, Harry Cummings, Niels Dabelstein, Sid Edelmann, Jouni Eerikainen, Mari Fitzduff, Steen Folke, Ted Freeman, Sulley Gariba, Patrick G. Grasso, Penny Hawkins, John Heath, Greg Ingram, Gregg B. Jackson, Edward T. Jackson, Gail Johnson, John Johnson, Yusuf Kassam, James (Jed) Edwin Kee, Elizabeth M. King, Colin Kirk, Ted Kliest, Jody Kusek, Jeanne LaFortune, Frans L. Leeuw, Harvey Lithwick, Norman T. London, Charles Lusthaus, Nancy MacPherson, Mohamed Manai, Ghazala Mansuri, John Mayne, Roland Michelitsh, Britha Mikkelsen, Joseph J. Molnar, Karen Odhiambo, Eric Oldsman, Michael Q. Patton, Ted Paterson, Robert Picciotto, Nancy Porteous, Sukai Prom-Jackson, Kim Scott, Michael Scriven, James Sanders, Terry Smutylo, Martin Steinmeyer, William E. Stevenson, Susan Stout, Gene Swimmer, Robert D. van den Berg, Nicholas M. Zacchea

Table of Contents

Overview of the Textbook .... 1
    Relationship of Textbook to IPDET .... 3
    Organization of the Text .... 4

Foundations .... 7

Chapter 1 .... 9
Introducing Development Evaluation .... 9
    Part I: Evaluation: What Is It? .... 10
        Defining Evaluation .... 10
        Purpose of Evaluation .... 14
        Benefits of Evaluation .... 15
        What to Evaluate .... 17
        Uses of Evaluation .... 18
        Relation between Monitoring and Evaluation .... 20
        Roles and Activities of Professional Evaluators .... 22
            Who Conducts the Evaluation? .... 22
            Evaluator Activities .... 24
    Part II: The Origins and History of the Evaluation Discipline .... 24
        Influences from New Efforts .... 26
        Evolution of Development Evaluation .... 29
            Audit Tradition .... 29
            Audit and Evaluation .... 31
            Social Science Tradition .... 31
    Part III: The Development Evaluation Context .... 33
        The Growth of Development Evaluation .... 35
        Growth of Professional Evaluation Associations .... 37
    Part IV: Principles and Standards for Development Evaluation .... 38
        Evaluation and Independence .... 48
    Summary .... 52
    Chapter 1 Activities .... 53
        Application Exercise 1.1 .... 53
    References and Further Reading .... 54

Chapter 2 .... 63
Understanding Issues Driving Development Evaluation .... 63
    Part I: Evaluation in Developed and Developing Countries: An Overview .... 64
        Evaluation in Developed Countries .... 64
        Whole-of-Government, Enclave, and Mixed Approaches .... 67
            The Whole-of-Government Approach .... 67
            Enclave Approach .... 68
            Mixed Approach .... 68
        Evaluation in Developing Countries .... 76
            IPDET .... 78
            New Evaluation Systems .... 80
    Part II: Emerging Development Issues: What Are the Evaluation Implications? .... 88
        Millennium Development Goals (MDGs) .... 91
        Monterrey Consensus .... 94
        Paris Declaration on Aid Effectiveness .... 96
        Debt Initiative for Heavily-Indebted Poor Countries (HIPC) .... 97
        The Emergence of New Actors in International Development Assistance .... 99
        Conflict Prevention and Post-Conflict Reconstruction .... 100
        Governance .... 102
        Anti-Money Laundering and Terrorist Financing .... 103
        Workers' Remittances .... 105
        Gender: From Women in Development (WID) to Gender and Development (GAD) to Gender Mainstreaming .... 106
        Private Sector Development (PSD) and Investment Climate .... 108
            Responding to PSD Initiatives .... 113
        Environmental and Social Sustainability .... 115
        Global Public Goods .... 116
    Summary .... 119
    References and Further Reading .... 121

Preparing .... 135

Chapter 3 .... 137
Building a Results-Based Monitoring and Evaluation System .... 137
    Part I: Importance of Results-based M&E .... 138
    Part II: What Is Results-based Monitoring and Evaluation? .... 140
    Part III: Traditional vs. Results-Based M&E .... 142
        Brief Introduction to Theory of Change .... 142
            Performance Indicators .... 145
    Part IV: The Ten Steps to Building a Results-Based M&E System .... 147
        Step One: Conducting a Readiness Assessment .... 147
        Step Two: Agreeing on Performance Outcomes to Monitor and Evaluate .... 150
        Step Three: Developing Key Indicators to Monitor Outcomes .... 152
        Step Four: Gathering Baseline Data on Indicators .... 155
        Step Five: Planning for Improvements—Setting Realistic Targets .... 157
        Step Six: Monitoring for Results .... 160
        Step Seven: The Role of Evaluations .... 163
        Step Eight: Reporting Findings .... 165
        Step Nine: Using Findings .... 167
        Step Ten: Sustaining the M&E System within the Organization .... 170
        Concluding Comments .... 172
            Last Reminders .... 172
            Going Forward .... 173
    Summary .... 173
    Chapter 3 Activities .... 175
        Application Exercise 3.1: Get the Logic Right .... 175
        Application Exercise 3.2: Identifying Inputs, Activities, Outputs, Outcomes, and Impacts .... 177
        Application Exercise 3.3: Developing Indicators .... 178
    References and Further Reading .... 180

Chapter 4 .... 183
Understanding the Evaluation Context and Program Theory of Change .... 183
    Part I: Front-end Analysis .... 184
        Balancing Potential Costs and Benefits of the Evaluation .... 185
        Pitfalls Involved in the Front-End Planning Process .... 186
    Part II: Identifying the Main Client and Key Stakeholders .... 187
        The Main Client .... 187
        Stakeholders .... 188
        Identifying and Involving Key Stakeholders .... 190
        Stakeholder Analysis .... 191
        Stakeholders: Diverse Perspectives .... 194
    Part III: Understanding the Context .... 194
        Existing Theoretical and Empirical Knowledge of the Project, Program, or Policy .... 196
    Part IV: Constructing, Using, and Assessing a Theory of Change .... 198
        Why Use a Theory of Change .... 200
        Constructing a Theory of Change .... 202
            Is There Research Underlying the Program? .... 203
            What Is the Logic of the Program? .... 205
            What Are the Key Assumptions Being Made? .... 206
            Theory of Change Template to Assist with Discussions .... 208
        Graphic Representations of Theory of Change .... 214
            Basic Model .... 214
            Different Graphic Formats to Depict Theory of Change Models .... 215
            Logical Framework (Logframe) .... 222
        Assessing a Theory of Change .... 225
    Summary .... 227
    Chapter 4 Activities .... 228
        Application Exercise 4.1: Applying the Theory of Change/Logic Model .... 228
        Application Exercise 4.2: Applying the Theory of Change/Logic Model .... 229
        Application Exercise 4.3: Your Program .... 230
    Resources and Further Reading .... 232

Chapter 5 .... 247
Considering the Evaluation Approach .... 247
    Part I: Introduction to Evaluation Approaches .... 248
    Part II: Development Evaluation Approaches .... 249
        Prospective Evaluation .... 249
        Evaluability Assessment .... 251
        Goal-based Evaluations .... 253
        Goal-free Evaluations .... 254
        Multi-site Evaluations .... 256
            Challenges for Multi-Site Evaluations .... 257
        Cluster Evaluations .... 258
        Social Assessment .... 259
        Environment and Social Assessment .... 263
            E&S Guidelines/Standards/Strategies .... 263
            The Equator Principles .... 264
            ISO 14031 .... 264
            Sustainable Development Strategies: A Resource Book .... 265
        Participatory Evaluation .... 266
            Challenges of Participatory Evaluation in Developing Countries .... 268
            Benefits of Participatory Evaluation in Developing Countries .... 269
            Importance of Participatory Evaluation .... 270
        Outcome Mapping .... 272
        Rapid Assessment .... 275
        Evaluation Synthesis and Meta-evaluation .... 277
            Meta-evaluation .... 279
        Emerging Approaches .... 280
            Utilization Focused Evaluation .... 281
            Empowerment Evaluation .... 281
            Realist Evaluation .... 283
            Inclusive Evaluation .... 284
            Beneficiary Assessment .... 284
            Horizontal Evaluation .... 285
    Part III: Challenges Going Forward .... 286
    Summary .... 288
    Chapter 5 Activities .... 294
        Application Exercise 5.1: Describing the Approaches .... 294
        Application Exercise 5-2 .... 294
        Application Exercise 5-3 .... 295
    References and Further Reading .... 296

Designing & Conducting .... 303

Chapter 6 .... 305
Developing Evaluation Questions and Starting the Design Matrix .... 305
    Part I: Sources of Questions .... 306
    Part II: Three Types of Questions .... 308
        Descriptive Questions .... 309
        Normative Questions .... 310
        Cause and Effect Questions .... 312
        Relationship of Question Types to Outcome Models .... 317
    Part III: Identifying and Selecting Questions .... 318
    Part IV: Keys for Developing Good Evaluation Questions .... 322
    Part V: Suggestions for Developing Questions .... 323
    Part VI: Evaluation Design .... 325
        The Evaluation Design Process .... 326
        The Evaluation Design Matrix .... 331
    Summary .... 337
    Chapter 6 Activities .... 338
    References and Further Reading .... 340

Chapter 7 .... 341
Selecting Designs for Cause and Effect, Normative and Descriptive Evaluation Questions .... 341
    Part I: Connecting Questions to Design .... 342
        Broad Categories of Design .... 343
            Experimental Design .... 343
            Quasi-experimental Design .... 344
            Non-experimental Design .... 344
        Design Notation .... 345
    Part II: Experimental Designs for Cause and Effect Questions .... 346
        Control Groups .... 349
        Random Assignment .... 351
    Part III: Quasi-experimental Designs and Threats to Validity for Cause and Effect Questions .... 355
        Internal Validity .... 355
        Quasi-experimental Designs .... 360
            Matched and Non-equivalent Comparison Design .... 362
            Time Series and Interrupted Time Series Design .... 363
            Correlational Design Using Statistical Controls .... 363
            Longitudinal Design .... 364
            Panel Design .... 365
            Before-and-After Designs .... 365
            Cross Sectional Designs .... 366
            Propensity Score Matching .... 367
        Causal Tracing Strategies .... 368
    Part IV: Designs for Descriptive Questions .... 371
        One-Shot Designs .... 371
        Cross Sectional Designs .... 372
        Before and After Designs .... 373
        Simple Time Series Designs .... 373
        Longitudinal Design .... 374
        Case Study Designs .... 375
    Part V: Designs for Normative Questions .... 377
    Part VI: The Gold Standard Debated .... 378
        Making Design Decisions .... 380
    Summary .... 385
    Chapter 7 Activities .... 387
        Application Exercise 7-1: Selecting an Evaluation Design .... 387
        Application Exercise 7-2: Selecting an Evaluation Design and Data Collection Strategy .... 388
    References and Further Reading .... 389

Chapter 8 .... 395
Selecting and Constructing Data Collection Instruments .... 395
    Part I: Data Collection Strategies .... 396
        Structured Approach .... 397
        Semi-structured Approach .... 398
        Data Collection General Rules .... 399
    Part II: Key Issues about Measures .... 399
        Credibility .... 400
        Validity .... 400
        Relevance .... 401
        Reliability .... 401
        Precision .... 401
    Part III: Quantitative and Qualitative Data .... 403
        Obtrusive vs. Unobtrusive Methods .... 407
    Part IV: Common Data Collection Approaches: The Toolkit .... 408
        Combinations .... 408
        Measurement Considerations .... 409
        Tool 1: Participatory Data Collection .... 410
            Community Meetings .... 410
            Mapping .... 411
            Transect Walks .... 414
        Tool 2: Available Records and Secondary Data Analysis .... 415
        Tool 3: Observation .... 419
        Tool 4: Surveys .... 426
            Techniques for Developing Questions .... 430
            Techniques for Conducting Face-to-Face Interviews .... 439
            Techniques for Developing Self-Administered Questionnaires .... 449
        Tool 5: Focus Groups .... 455
            Purpose of Focus Groups .... 455
            Typical Elements of Focus Groups .... 457
            Techniques for Focus Group Evaluation Design .... 458
            Techniques for Planning Focus Groups .... 460
            Focus Group Protocol .... 463
            Techniques for Moderating Focus Groups .... 467
        Tool 6: Diaries, Journals, and Self-reported Checklists .... 468
            Diaries or Journals .... 468
            Self-reported Checklist .... 469
        Tool 7: Expert Judgment .... 471
            Selecting Experts .... 472
        Tool 8: Delphi Technique .... 473
        Tool 9: Citizen Report Cards .... 476
        Final Statement on Tools .... 479
    Summary .... 481
    Chapter 8 Activities .... 482
        Application Exercise 8-1: Data Collection from Files .... 482
        Application Exercise 8.2: Mapping .... 482
        Application Exercise 8-3: Data Collection: Interview .... 483
        Application Exercise 8-4: Data Collection — Focus Groups .... 484
    References and Further Reading .... 485

Chapter 9 .... 491
Deciding on the Sampling Strategy .... 491
    Part I: Introduction to Sampling .... 492
    Part II: Types of Samples: Random and Non-random .... 493
        Random Sampling .... 493
            Generating Random Samples .... 493
            Types of Random Samples .... 494
        Non-Random Sampling .... 501
            Types of Non-random Samples .... 502
            Bias and Non-random Sampling .... 503
        Combinations .... 504
    Part III: How Confident and Precise Do You Need to Be? .... 504
    Part IV: How Large a Sample Do You Need? .... 506
    Part V: When Do You Need Help from a Statistician? .... 509
    Part VI: Sampling Glossary .... 510
    Summary .... 512
    Chapter 9 Activities .... 513
        Application Exercise 9-1: Using a Random Number Table .... 513
        Application Exercise 9-2: Sampling Strategy .... 515
    References and Further Reading .... 516

Chapter 10 .... 519
Planning Data Analysis and Completing the Design Matrix .... 519
    Part I: Data Analysis Strategy .... 520
    Part II: Analyzing Qualitative Data .... 521
        Making Good Notes .... 522
        Organizing Qualitative Data for Analysis .... 524
        Reading and Coding Data .... 525
        Interpreting Data .... 529
            Content Analysis .... 529
            Techniques for Analyzing Data from Focus Groups .... 540
        Summarizing Qualitative Data .... 543
        Controlling for Bias .... 545
            Affinity Diagram Process .... 545
            Challenges to Qualitative Data Analysis .... 546
        Concluding Thoughts on Qualitative Data Analysis .... 547
    Part III: Analyzing Quantitative Data .... 547
        Elements of Descriptive Statistics .... 548
            Measures of Central Tendency, the 3-M's .... 548
            Measures of Dispersion .... 551
        Analyzing Quantitative Data Results .... 555
        Commonly Used Descriptive Statistics .... 557
        Describing Two Variables at the Same Time .... 559
            Measures of Relationship .... 561
        Inferential Statistics .... 563
            Chi Square .... 563
            T-Test .... 564
            ANOVA (Analysis of Variance) .... 564
            The Logic of Statistical Significance Testing .... 565
            Simple Regression Models .... 566
            Propensity Score Matching .... 566
        Data Cleaning .... 567
    Part IV: Linking Qualitative Data and Quantitative Data .... 569
    Summary .... 575
    Chapter 10 Activities .... 583
        Application Exercise 10-1: Affinity Diagram Process .... 583
        Application Exercise 10-2: Qualitative Data Coding and Analysis .... 584
        Application Exercise 10-3: Common Mistakes in Interpreting Quantitative Data .... 585
        Application Exercise 10-4: Analyzing Results from a Questionnaire .... 586
    References and Further Reading .... 587

Leading .... 595

Chapter 11 .... 597
Presenting Results .... 597
    Part I: Communication Basics .... 598
        Communication Strategy .... 599
            Innovative Communication Strategies .... 600
    Part II: Writing Evaluation Reports for Your Audience .... 602
        Writing the Evaluation Report .... 603
            The Executive Summary .... 603
            The Body of the Report .... 605
        Summary of Report Writing .... 612
    Part III: Using Visual Information .... 614
        Pictures and Illustrations .... 614
            Maps .... 615
            Sketches .... 616
            Line Drawings .... 618
            Photographs .... 618
        Charts and Graphs .... 619
            Organization Charts .... 619
            Gantt Charts .... 620
            Graphs and Data Charts .... 621
            Choosing a Chart or Graph Type .... 628
        Tables .... 630
            Data Tables .... 630
            Classification Tables (Matrices) .... 632
        Illustrating Evaluation Concepts .... 632
            Illustrating Evaluation Design .... 632
            Illustrating Impact .... 633
            Program Logic Charts .... 634
        Visual Information Design from Tufte .... 635
        Tips and Tricks for Effective Tables and Charts .... 640
        Summary of Graphs, Charts, and Tables .... 641
    Part IV: Making Oral Presentations .... 642
        Planning for Your Audience .... 642
        Preparing Your Presentation .... 643
        Enhancing Your Presentation .... 643
        Using Presentation Programs .... 644
        Practicing Your Presentation .... 647
            Presentation Tips from Tufte .... 647
    Part V: Peer Review and Meta-evaluation .... 649
    Summary .... 651
    Chapter 11 Activities .... 652
        Application Exercise 11.1: Review Evaluation Reports .... 652
        Application Exercise 11-2: Tailor Reports to Audiences .... 653
    References and Further Reading .... 654

Chapter 12 .... 657
Managing for Quality and Use .... 657
    Part I: Managing the Design Matrix .... 658
    Part II: Managing an Evaluation .... 659
        The Evaluation Team .... 659
        Terms of Reference .... 660
        Contracting Evaluations .... 663
        Roles and Responsibilities .... 665
            Evaluation Manager .... 665
            Evaluator .... 668
            Main Client .... 673
            Stakeholders .... 673
    Part III: Project Management Process .... 673
    Part III: Managing Effectively .... 677
        Managing People Effectively .... 677
            Meeting with Client for Contextual Information .... 679
            Techniques for Teamwork .... 684
            Teamwork Skills .... 686
            Brainstorming .... 687
            Affinity Diagrams .... 687
            Concept Mapping .... 687
            Conflict Resolution .... 688
            Communication Strategies .... 689
            Working with Groups of Stakeholders .... 690
        Managing Tasks Effectively .... 696
    Part IV: Assessing the Quality of an Evaluation .... 699
        Using a Meta-evaluator .... 701
            Helpful Hints for Meta-evaluation .... 701
    Part V: Using Evaluation Results .... 701
        Influence and Effects of Evaluation .... 706
    Summary .... 707
    Chapter 12 Activities .... 708
        Application Exercise 12.1: Individual Activity — Terms of Reference .... 708
        Application Exercise 12-2: Are You Ready to Be a Manager? .... 710
    References and Further Reading .... 711

Chapter 13 .... 717
Evaluating Complex Interventions .... 717
    Part I: Big Picture Views of Development Evaluation .... 718
        Move to a Higher Plane .... 719
    Part II: Country Program Evaluations .... 720
        Example of Country Program Evaluation Methodology .... 722
            Evaluating in Three Dimensions .... 722
            Evaluating Assistance Program Impact .... 723
            Using a Ratings Scale .... 724
            Retrospective .... 724
            Institutional Development Impact .... 726
    Part III: Sector Program Evaluations .... 730
    Part IV: Thematic Evaluations .... 736
        Gender in Development .... 738
            The Importance of Gender in Development Evaluation .... 738
            The Elements of a Gender-Responsive Evaluation Approach .... 742
    Part V: Joint Evaluations .... 745
    Part VI: Global and Regional Partnership Programs (GRPP) Evaluation .... 750
    Part VII: Evaluation Capacity Development .... 756
        Towards Evaluation Capacity Development .... 759
        Concluding Comments .... 760
    Summary .... 761
    Chapter 13 Activities .... 763
        Application Exercise 13.1: Building Evaluation Capacity .... 763
    References and Further Reading .... 765

Acting Professionally ................................................................ ............................................................................................... ............................................................... 771 Chapter 14 ................................................................ ................................................................................................ .............................................................................. .............................................. 773 Guiding the Evaluator: Evaluation Ethics, Politics, Standards, and Guiding Principles 773 Part I: Ethical Behavior ........................................................................................ 774 Evaluation Corruptibility and Fallacies ............................................................. 774 Identifying Ethical Problems ............................................................................. 776 Part II: Politics and Evaluation ............................................................................. 779 Causes of Politics in Evaluation ........................................................................ 779 Technical Weaknesses ................................................................................... 780 Human Weaknesses ...................................................................................... 781 Identifying Political Games ................................................................................ 782 Political Games of People Being Evaluated ..................................................... 783 Political Games of Other Stakeholders ........................................................... 783 Political Games of Evaluators ........................................................................ 784 Managing Politics in Evaluations ...................................................................... 784 Building Trust ............................................................................................... 784 Building Theory of Change or Logic Models .................................................... 785 Balancing Stakeholders with Negotiation........................................................... 786 Principles for Negotiating Evaluation ............................................................. 786 Negotiation Evaluation Practice ..................................................................... 787 Part III: Evaluation Standards and Guiding Principles ........................................... 789 Program Evaluation Standards ......................................................................... 790 International Views of Evaluation Standards .................................................. 791 Guiding Principles for Evaluators ...................................................................... 793 Evaluation Ethics for the UN System ................................................................ 795 Conflict of Interest ............................................................................................ 796


Summary ..... 796
Chapter 14 Activities ..... 797
Application Exercise 14.1: Ethics: Rosa and Agricultural Evaluation ..... 797
Resources and Further Reading ..... 799


Table of Tables

Table 1.1: Examples of Evaluations ..... 17
Table 1.2: Comparisons of Monitoring and Evaluation ..... 21
Table 1.3: Changing Development Concepts ..... 34
Table 1.2: Rating Template for Governance of Evaluation Organizations. Evaluation Cooperation Group of the Multilateral Development Banks, 2002 ..... 49
Table 3.1: Components of a Graphic Representation of a Theory of Change ..... 143
Table 3.2a: Developing Outcomes for Education Policy ..... 151
Table 3.2b: Developing Outcomes for Education Policy (continued, showing indicators) ..... 154
Table 3.2c: Developing Outcomes for Education Policy (continued, showing baseline data) ..... 157
Table 3.2d: Developing Outcomes for Education Policy (continued, showing performance targets) ..... 159
Table 3.3: Outcomes Reporting Format ..... 166
Table 4.1: Checklist of Stakeholder Roles ..... 189
Table 4.2: Example of Stakeholder Analysis ..... 193
Table 4.3: Typology of the Life of a Project, Program, or Policy ..... 195
Table 5.1: Types of GAO Forward Looking Questions ..... 250
Table 5.2: Participatory versus Traditional Evaluation Techniques ..... 268
Table 5.3: Matrix Comparing the Evaluation Approaches ..... 289
Table 7.1: Comparison of Broad Design Categories ..... 345
Table 7.2: Main Types of Performance Audit ..... 377
Table 8.1: Decision Table for Data Collection Method for Adult Literacy Intervention ..... 396
Table 8.2: Key Issues about Data ..... 402
Table 8.3: When to Use Quantitative vs. Qualitative Approaches ..... 406
Table 8.4: Levels of Refinement in Determining Questionnaire Content ..... 431
Table 8.5: Developing an Interview ..... 440
Table 8.6: Conducting Interviews ..... 446
Table 8.7: Questionnaire Tips and Tricks ..... 450
Table 8.8: Comparison of Mail or Internet Survey, Structured Interviews, and Semistructured Interview Data Collection Options ..... 454
Table 8.9: Guidelines for Using Diaries or Journals ..... 469
Table 8.10: Example of a Citizen Report Card Reporting Overall Satisfaction with Services ..... 477
Table 8.11: Overview of evaluation portfolio by sector ..... 480
Table 9.1: Drawing a Simple Random Sample ..... 495
Table 9.2: Drawing a Random Interval Sample ..... 496
Table 9.3: Summary of Random Sampling Process ..... 501
Table 9.2: Guide to Minimum Sample Size (95% confidence level, +/- 5% margin of error) ..... 506
Table 9.3: Sampling Sizes for Large Populations (one million or larger) ..... 508
Table 9.4: 95% Confidence Intervals for a Population of 100 ..... 508
Table 9.5: 95% Confidence Intervals for a Population of 50 ..... 508
Table 10.1: Key Considerations in the Early Phase of Qualitative Data Analysis ..... 523
Table 10.2: Summary of Suggestions for Interpreting Qualitative Data ..... 544
Table 10.3: Affinity Diagram Process ..... 546
Table 10.4: Distribution of Respondents by Gender ..... 548
Table 10.5: Urban Percent Populations, Sample Data ..... 549


Table 10.6: When to Use Types of Data ..... 550
Table 10.7: Calculating Standard Deviation ..... 554
Table 10.8: Guidelines for Analyzing Quantitative Survey Data ..... 555
Table 10.9: Client Views on Health Care Services at the Local Clinic ..... 556
Table 10.10: Client Views on Health Care Services at the Local Clinic ..... 556
Table 10.11: Commonly Used Descriptive Statistics (with illustrative examples using fake data from a new university) ..... 557
Table 10.12: Crosstab Results ..... 559
Table 10.13: Comparison of Mean Incomes by Gender ..... 560
Table 10.14: Common Tests for Statistical Significance ..... 565
Table 10.15: Structure of the database of initial evidence for the visited projects ..... 573
Table 10.16: Example Design Matrix ..... 577
Table 11.1: Checklist for Communication Strategy ..... 599
Table 11.2: Recommendation Tracking System (RTS) ..... 610
Table 11.3: Summary of General Guidelines for Writing Reports ..... 612
Table 11.4: Parts of Graphs ..... 621
Table 11.4: Comparison of Chart and Graph Types ..... 629
Table 11.5: Example of Data in a Table with Many Lines ..... 631
Table 11.6: Example of Data in a Table with Few Lines ..... 631
Table 11.7: Example of Classification Table ..... 632
Table 12.1: Essential Competencies for Program Evaluators (ECPE) ..... 671
Table 12.2: Twenty Key Project Manager Activities ..... 675
Table 12.3: Values to Check for Relevance ..... 683
Table 12.4: Hypothetical Task Map – first portion (7/1 to 8/31) ..... 696
Table 12.5: Example of Gantt Chart ..... 697
Table 12.6: Three Primary Uses of Evaluation Findings ..... 703
Table 12.7: Four Primary Uses of Evaluation Logic and Processes ..... 704
Table 13.1: Descriptions of IEG Ratings Scale ..... 724
Table 13.3: Checklist for Assessing the Gender Sensitivity of an Evaluation Design ..... 744


Table of Figures

Fig. 0.1: Organization of the Blocks and Chapters ..... 5
Fig. 3.1: Ten Steps to Designing, Building and Sustaining a Results-based Monitoring and Evaluation System ..... 138
Fig. 3.2: Program Theory of Change (Logic Model) to Achieve Results — Outcomes and Impacts ..... 144
Fig. 3.3: Program Theory of Change (Logic Model) to Reduce Childhood Morbidity via use of ORT ..... 144
Fig. 3.4: Matrix for Selecting Indicators ..... 153
Fig. 3.5: Summary of Data Collection Methods ..... 156
Fig. 3.6: Identifying Expected or Desired Level of Improvement Requires Selecting Performance Targets ..... 158
Fig. 3.7: Key Types of Monitoring ..... 160
Fig. 3.8: Implementation Monitoring Links to Results Monitoring ..... 161
Fig. 3.9: Linking Implementation Monitoring to Results Monitoring ..... 161
Fig. 3.10: Achieving Results through Partnership ..... 162
Fig. 4.1: Systems Model ..... 199
Fig. 4.2: Potential Influences on the Program Results ..... 200
Fig. 4.3: Process for Constructing a Theory of Change ..... 202
Fig. 4.4: Simple Theory of Change Diagram ..... 206
Fig. 4.5: Simple Theory of Change Diagram with Assumptions Identified ..... 207
Fig. 4.7: Theory of Change Template ..... 208
Fig. 4.6: Example of Program Theory Model: Teacher Visits to Students’ Homes ..... 211
Fig. 4.7: Schematic Representation of Core Elements of WBI’s Underlying Program Logic ..... 213
Fig. 4.8: Example of a Theory of Change Model for Research Grant Proposal ..... 214
Fig. 4.9: Structure of a Flow Chart or Classic Logic Model ..... 216
Fig. 4.10: Logic Model Emphasizing Assumptions ..... 217
Fig. 4.11: Structure of a Results Chain Model ..... 218
Fig. 4.12: Simple Logic Model for a Micro-Lending Program ..... 219
Fig. 4.13: More Complex Logic Model for a Micro-Lending Program ..... 220
Fig. 4.14: Logic Model for a Training Program ..... 221
Fig. 4.15: Example of Logframe for a Program Goal ..... 222
Fig. 4.16: Logical Framework for a Childcare Program Embedded in a Women in Development Project ..... 224
Fig. 6.1: Using a Logic Model to Frame Evaluation Questions ..... 307
Fig. 6.2: Example of Theory of Change of a Micro-lending Program Showing Categories of Questions Generated ..... 319
Fig. 6.3: Example of Theory of Change for a Training Program, and Question Generation ..... 319
Fig. 6.4: Matrix for Ranking and Selecting Evaluation Questions ..... 321
Fig. 6.5: The Evaluation Process ..... 327
Fig. 6.6: Approach to Development Evaluation ..... 330
Fig. 8.1: Google Earth Image of Dacca (Dhaka), India ..... 413
Fig. 8.2: Example of a Data Collection Instrument ..... 417
Fig. 8.3: Issue-based Observation Form for Case Studies in Science Education (Source: Stake, 1995, p. 50) ..... 424
Fig. 8.4: Obtaining Good Response Rates for Mail Surveys ..... 453


Fig. 8.5: Model for Focus Group Evaluation Design ..... 460
Fig. 8.6: The Protocol Funnel ..... 464
Fig. 9.1: Stratified Random Sample ..... 497
Fig. 10.1: Data Collection vs. Data Analysis over Time ..... 520
Fig. 10.2: Flowchart for the Typical Process of Content Analysis Research ..... 534
Fig. 10.3a: Qualitative Data Analysis Worksheet (example, blank) ..... 537
Fig. 10.3b: Qualitative Analysis Worksheet (example showing topics recorded) ..... 538
Fig. 10.3c: Qualitative Data Analysis Worksheet (example showing quotes and findings) ..... 539
Fig. 10.4: A Normal Distribution ..... 552
Fig. 10.5: Examples of Data Not Matching the Normal Distribution ..... 552
Fig. 10.6: Standard Deviation ..... 553
Fig. 10.7: Frequently Used Descriptive Analyses ..... 558
Fig. 11.1: IFC’s Two Stage MATR to Track Recommendations ..... 611
Fig. 11.2: Example of a Sketch Used in an Evaluation Report ..... 617
Fig. 11.3: Example of a Line Drawing Used in an Evaluation Report ..... 618
Fig. 11.4: Example of a Photograph Used in an Evaluation Report (Students Working on a Cooperative Learning Assignment) ..... 619
Fig. 11.5: Example of Organization Chart ..... 620
Fig. 11.6: Example of a Gantt Chart ..... 620
Fig. 11.7: The Parts of a Graph ..... 622
Fig. 11.8: Example of Line Graph for One Kind of Data over Time ..... 624
Fig. 11.9: Example of Multiple Line Chart with Legend ..... 624
Fig. 11.10: Example of Single Bar Graph in Horizontal Format ..... 625
Fig. 11.11: Example of Multiple Bar Graph in Vertical Format ..... 626
Fig. 11.12: Example of Pie Chart ..... 627
Fig. 11.13: Example of Scatter Diagram ..... 628
Fig. 11.14: Experimental Design ..... 633
Fig. 11.15: Quasi-Experimental Design ..... 633
Fig. 11.16: Retrospective Design ..... 633
Fig. 11.17: Example of Bar Chart Showing Impact ..... 633
Fig. 11.18: Example of Graphic of Results Chain ..... 634
Fig. 11.19: Example of Graphic of Logic Model ..... 634
Fig. 11.20: Example of Graphic of Logical Framework ..... 635
Fig. 11.21: Graph Showing Improper Data Ink and Chartjunk ..... 637
Fig. 11.22: Graph Showing Better Use of Data Ink and No Chartjunk ..... 638
Fig. 12.1: Michael Greer’s Five Phase Project Management Process ..... 674
Fig. 13.1: Conceptual Framework for Harnessing Synergies with the Public Sector ..... 734



Overview of the Textbook

Development Evaluation: Where Are We Today?

The analytical, conceptual, and political framework of development has changed dramatically. The new development agenda calls for broader understandings of sectors, countries, development strategies, and policies. It emphasizes learning and continuous feedback at all phases of the development cycle. Indeed, development evaluation can be considered a kind of public good.

“… evaluation extends beyond the boundaries of any single organization. A good evaluation study can have positive spillover effects throughout the development community. Development evaluation has the characteristics of an international public good” (Picciotto & Rist, 1995, p. 23).

As the development agenda has grown in scope and complexity, development evaluation has followed suit. Development evaluators have moved from traditional implementation-focused evaluation models to results-based evaluation models. The primary evaluation unit of analysis has shifted from the project to the country, sector, theme, policy, and global levels. The process of development has changed from an emphasis on individual projects, or a partial approach, to a more comprehensive approach.

Development partnerships are another important factor that evaluators must take into consideration. Given the growing number of partnerships involved in development assistance, the performance of individual partners now needs to be evaluated according to their respective contributions and obligations.

With the advent of a more demanding, fragmented, and participatory approach to development, evaluation has also become more difficult to design. It encompasses more intricate methodological demands and sets very different standards for establishing impacts. Indeed, evaluation tools and methods need to be adapted to a more difficult, complex environment. For example, quantitative methods are becoming more sophisticated, and qualitative methods are becoming more prevalent in development evaluation. The increasing use of joint evaluations means that organizations and institutions will need to come together to form coherent evaluations. Joint evaluations, while beneficial in many respects, also add to the complexity of development evaluation (OECD/DAC, 2003).

Demand for new evaluation approaches and a new mix of skills goes beyond economics and draws again from the social sciences. For example, the special problems of the environment call for new and innovative approaches to evaluating environmental sustainability. The scope of environmental problems, multinational consequences, difficulties in obtaining comparable measurements, and persistent evidence of unanticipated consequences all necessitate a complex and multi-method approach to evaluation.

“It may well be that no single discipline can be expected to dominate in an endeavor that deals with the multiple challenges, hopes, and exertions of the majority of humankind. In the absence of a single intellectual rallying point, trespassing across disciplinary boundaries is common and evaluators are increasingly eclectic and venturesome in their use of social science instruments” (Picciotto & Rist, 1995, p. 169).

Study of cultural context, institutional change, and means of empowerment are but three examples of the kinds of topics that need to be considered by development evaluators. More attention is needed to issues of implementation, to documenting different strategies of local participation and empowerment, to ensuring that the voice of the people is heard in assessing a development initiative, and to studying the cultural context of development initiatives.


The creation of evaluation units trained in development evaluation practices and methods is a considerable challenge facing many developing countries. The IPDET course is aimed at addressing this challenge by seeking to build and enhance capacity in development evaluation around the world. Evaluators are increasingly thinking about and exploring new evaluation architectures and ways to bring together donor countries and institutions, recipient countries and entities, the UN system, civil society, the private sector, NGOs, and others to focus on results in meeting the challenges of reducing poverty on a global scale. The creation of national and regional evaluation associations, and inter-linkages between them, can help move the development community in this direction.

Relationship of Textbook to IPDET

The International Program for Development Evaluation Training (IPDET) was initiated by the Operations Evaluation Department (now the Independent Evaluation Group) of the World Bank in partnership with Carleton University to meet the needs of evaluation and audit units of bilateral and multilateral development agencies and banks, developed and developing country governments, and evaluators working in development and nongovernmental organizations. The overall goal of this training program is to enhance the knowledge, skills, and abilities of participants in development evaluation. It is our intention that by the end of the training program, IPDET participants will be able to:

• describe the development evaluation process
• discuss development evaluation concepts, techniques, and issues
• analyze different options for planning development evaluations, including data collection, analysis, and reporting
• design a development evaluation.

IPDET is built on 14 chapters that together map the road to effective evaluation of development interventions. This textbook builds and expands on these chapters, presenting a comprehensive discussion of issues facing development evaluators, as well as a guide to undertaking development evaluation.


Organization of the Text

This text is organized into 14 chapters, grouped into five blocks. Each block covers a related set of topics in development evaluation.

The first block is Foundations. The two chapters in this block discuss the foundations of development evaluation:

• Introducing Development Evaluation
• Understanding the Issues Driving Development Evaluation.

The second block is Preparing. The three chapters in this block present practical information for organizing and planning a development evaluation. They are:

• Building a Results-based Monitoring and Evaluation System
• Understanding the Evaluation Context and Program Theory of Change
• Considering the Evaluation Approach.

The third block is named Designing and Conducting. The five chapters in this block discuss techniques for designing an evaluation and present practical ways to design the development evaluation. They help evaluators see the choices and the rationale for design decisions. Throughout the five chapters in this block, the evaluation design matrix is discussed and constructed, and many practical suggestions for developing the design are offered. The chapters in this block are:

• Developing Evaluation Questions and Starting the Design Matrix
• Selecting Designs for Cause and Effect, Normative, and Descriptive Evaluation Questions
• Selecting and Constructing Data Collection Instruments
• Deciding on the Sampling Strategy
• Planning Data Analysis and Completing the Design Matrix.


The fourth block is titled Leading. The first of the three chapters in this block introduces techniques for sharing information and results through written reports, presentations, and other means. The second chapter discusses ways to organize and plan the work of everyone involved in the evaluation. The third chapter discusses evaluating complex interventions. The chapters in this block are:

• Presenting Results
• Managing for Quality and Use
• Evaluating Complex Interventions.

The fifth and last block is named Acting Professionally. It discusses an issue that affects every evaluation and every decision an evaluator faces: professionalism. The single chapter in this block is:

• Guiding the Evaluator: Evaluation Ethics, Politics, Standards, and Guiding Principles.

The diagram in Figure 0.1 shows the relationships among the blocks and chapters.

[Figure 0.1 is a diagram grouping the 14 chapters under the five blocks: Foundations, Preparing, Designing & Conducting, Leading, and Acting Professionally.]

Fig. 0.1: Organization of the Blocks and Chapters.



Foundations

“True genius resides in the capacity for evaluation of uncertain, hazardous, and conflicting information.”
WINSTON CHURCHILL

Chapter 1: Introducing Development Evaluation

• Evaluation: What Is It?
• Origin and History of the Evaluation Discipline
• The Development Evaluation Context
• Principles and Standards

Chapter 2: Understanding Issues Driving Development Evaluation

• A Look at Evaluation in Developed and Developing Countries
• Emerging Development Trends: What Are the Evaluation Implications?


Chapter 1
Introducing Development Evaluation

Introduction

History shows that people have used evaluation in many ways, for many reasons, and for many centuries. International development is no exception. This chapter first introduces the definition and general concepts behind evaluation of projects, programs, and policies. It then turns to evaluation of development interventions, often called development evaluation. This chapter has four parts:

• Evaluation: What Is It?
• The Origins and History of the Evaluation Discipline
• The Development Evaluation Context
• Principles and Standards for Development Evaluation.


Part I: Evaluation: What Is It?

To begin understanding development evaluation, it is important to understand what is meant by evaluation, its purposes, and how it can be used. This section discusses:

• defining evaluation
• purpose of evaluation
• benefits of evaluation
• what to evaluate
• uses of evaluation
• relation between monitoring and evaluation
• roles and activities of professional evaluators.

Defining Evaluation

Evaluation has been defined in many ways. The straightforward dictionary definition of evaluation is:

1. the action of appraising or valuing (goods, etc.); a calculation or statement of value;
2. the action of evaluating or determining the value of (a mathematical expression, a physical quantity, etc.), or of estimating the force of (probabilities, evidence, etc.) (Oxford English Dictionary, 2007).

Yet within the evaluation discipline, the term has come to have a variety of meanings. Differences in definitions reflect differing emphases on the purpose of evaluation – accountability versus learning – or the timing of evaluation in relation to the maturity of the program, project, or policy. Indeed, there is no universal agreement on the definition itself. In fact, in considering the role of language in evaluation, Michael Scriven, one of the founders of modern evaluation, recently noted that there are nearly sixty different terms for evaluation that apply to one context or another. These include: adjudge, appraise, analyze, assess, critique, examine, grade, inspect, judge, rate, rank, review, score, study, test… (Fitzpatrick, Sanders, & Worthen, 2004, p. 5).


Most definitions include some concept of valuing or assessing that also differentiates evaluation from research and monitoring activities. A common definition is from the Organization for Economic Co-operation and Development (OECD) / Development Assistance Committee (DAC) Glossary. It is the definition used in this text, and is as follows:

OECD Evaluation Definition

“Evaluation refers to the process of determining the worth or significance of an activity, policy, or program.”

“An assessment, as systematic and objective as possible, of a planned, on-going, or completed intervention.”

Source: OECD/DAC, 2002, Glossary of Key Terms in Evaluation, p. 21.

In this context, it is important to introduce the notions of formative evaluations, summative evaluations, and prospective evaluations for projects, programs, and policies.

Formative evaluations are evaluations intended to improve performance, [and] are most often conducted during the implementation phase of projects or programs. Formative evaluations may also be conducted for other reasons, such as compliance, legal requirements, or as part of a larger evaluation initiative. Summative evaluations, by contrast, are studies conducted at the end of an intervention (or a phase of that intervention) to determine the extent to which anticipated outcomes were produced. Summative evaluation is intended to provide information about the worth of a program (OECD/DAC, Glossary, 2002, pp. 21-22).

A formative evaluation looks into the ways in which the program, policy, or project is implemented, whether or not the assumed ‘operational logic’ corresponds with the actual operations, and what (immediate) consequences the implementation (stages) produces. This type of evaluation is conducted during the implementation phase of projects or programs. Formative evaluations are sometimes called “process” evaluations because they focus on operations.

One type of formative evaluation is a mid-term or mid-point evaluation. As implied, a mid-term evaluation is conducted approximately half-way through a project, program, or change in policy. The purpose of a mid-term evaluation is to help identify which features are working and which are weak or not working.


Mid-term evaluations can begin to focus on lessons learned, as well as relevance, effectiveness, and efficiency. Lessons learned are also important in guiding future interventions or changing current ones. For example, the International Development Research Centre (IDRC) was involved in managing a natural resources program initiative in Latin America and the Caribbean (known as MINGA) (Adamo, 2003, p. ii). The general objective of the MINGA program initiative was to contribute to the formation of natural resource management (NRM) professionals, women and men, in Bolivia, Ecuador, and Peru. One component of the program initiative was a commitment to mainstream gender into its programming and supported research. To learn more about how gender was being mainstreamed into the program, IDRC contracted an evaluator to do a formative evaluation. The methodology for the formative evaluation began with a review of the program documents that related to gender mainstreaming and activities to date. The evaluators also reviewed trip reports to assess the extent to which gender was being addressed during visits. Interviews were also organized with program staff to examine their individual efforts and experiences in mainstreaming gender into their work and the lessons they learned along the way.

A summative evaluation, often called an outcome or impact evaluation, is conducted at the end of an intervention, or on a mature intervention, to determine the extent to which anticipated results were realized. Summative evaluation is intended to provide information about the worth and the impact of the program. Summative evaluations may be impact evaluations, cost-effectiveness investigations, quasi-experiments, randomized experiments, or case studies. An example of a summative evaluation is one completed by the Asian Development Bank (ADB) to evaluate the Second Financial Sector Program in Mongolia (ADB, 2007). The program involved financial sector reforms that included restructuring and transforming the financial sector from a mono-banking system into a two-tier system supported by the ADB. This particular evaluation was completed at the end of the second phase of this program. Summative evaluations are used to answer questions of relevance, performance, impacts, sustainability, external utility, and lessons learned.


Stated another way:

• formative evaluations focus on project/program/policy implementation and improvement
• summative evaluations focus on results, enabling decision makers to decide whether to continue, replicate, scale up, or end a given project/program/policy.

Typically, both kinds of evaluation are needed and used by organizations at different times in the cycle of a project, program, or policy.

Prospective evaluations assess the likely outcomes of proposed projects/programs/policies. Prospective evaluations are somewhat similar to evaluability assessments. An evaluability assessment answers the questions “Is this program or project worth evaluating?” and “Will the gains be worth the effort/resources expended?” A prospective evaluation synthesizes evaluation findings from earlier studies to assess the likely outcomes of proposed new projects [programs/policies]… For example, US Congressional committees frequently ask the US Government Accountability Office (GAO)1 for advice in forecasting the likely outcomes of proposed legislation.

A relatively dated but nevertheless interesting example of a prospective evaluation is the 1986 US GAO study “Teenage Pregnancy: 500,000 Births a Year but Few Tested Programs” (GAO, 1986). The GAO’s evaluation used four procedures. It analyzed the main features of two congressional bills, reviewed available statistics on the extent of teenage pregnancy, examined the characteristics of federal and non-federal programs, and reviewed evaluation studies on the effectiveness of prior programs for assisting pregnant and parenting teenagers, as well as teenagers at risk of becoming pregnant. The evaluators reconstructed the underlying program theory and the operational logic of both congressional bills to find out why it was believed that these initiatives would work as proposed in the legislation. They then compared the evidence found to the features of the proposed legislation.

This type of prospective evaluation is also sometimes called an ex ante (before the fact) evaluation (Rossi & Freeman, 1993, p. 422). Ex ante or prospective evaluations often include program theory reconstruction or assessment and scenario studies, as well as summaries of existing research and evaluation to ascertain the empirical support for proposed initiatives.

1 The United States General Accounting Office changed its name to the Government Accountability Office in July 2004. It still uses the acronym GAO.


Purpose of Evaluation

Evaluation can be used for a variety of purposes. Again, within the discipline, there are different views about what the purpose or goal of evaluation should be within a given context. A prevalent view that has emerged in the literature is that evaluation has four distinct purposes:

• an ‘ethical’ purpose: to report to political leaders and citizens on how a policy or program has been implemented and the results achieved. This purpose combines the objectives of better accountability, information, and the serving of democracy
• a ‘managerial’ purpose: to achieve a more rational distribution of financial and human resources among different “competing” programs, and to improve the management of the programs and benefits by those entrusted with accomplishing them
• a ‘decisional’ purpose: to pave the way for decisions on the continuation, termination, or reshaping of a policy or program
• an ‘educative and motivational’ purpose: to help in educating and motivating public agencies and their partners by enabling them to understand the processes in which they are engaged and identify themselves with their objectives (National Council of Evaluation, 1999, p. 12).

Prominent evaluators in the field describe the purposes of evaluation as being to:

• obtain social betterment
• promote the fostering of deliberative democracy
• provide oversight and compliance
• ensure accountability and transparency
• build, share, and manage knowledge
• further organizational improvement
• promote dialogue and cooperation among key stakeholders
• determine project, program, and/or policy relevance, implementation, efficiency, effectiveness, impact, and sustainability
• generate lessons learned.


Chelimsky and Shadish (1997) take a global perspective. They extend evaluation’s context in the new century to worldwide challenges rather than domestic ones. The challenges they cite include the impact of new technologies, demographic imbalances across nations, environmental protection, sustainable development, terrorism, human rights, and other issues that extend beyond one program or even one country. Ultimately, most agree, the purpose of any evaluation is to provide information to decision makers to enable better decisions about projects, programs, or policies.

Benefits of Evaluation

Evaluation helps answer questions about interventions. For example, an evaluation might answer questions such as:

• What are the impacts of the intervention?
• Is the specific intervention working as planned?
• Are there differences across sites in how the program is performing?
• Who is benefiting from this intervention?

People benefit from evaluations in different ways. Direct beneficiaries are those who are the direct recipients of the benefit of the intervention. Some people are indirect beneficiaries: they are not involved in the intervention but still receive a benefit from it. Interventions are intended to benefit people, but occasionally people are unintended beneficiaries. Some interventions give short-term benefits; others provide benefits over the long term.

To illustrate how people benefit in different ways, consider the following example. The United States Department of Housing and Urban Development (1997) evaluated a midnight basketball program for boys and girls aged 16 to 20 residing in public housing. Surveys were administered to the youths both before and after the program was implemented. The survey findings showed that prior to the program:

• 92 percent of the youths surveyed reported that they expected to get into some kind of trouble in the next 3 months.
• 66 percent of the youths thought that they would be victims of violent acts during that same period.


Following implementation of the basketball program:

• 20 percent of the youths surveyed stated that they expected to get into some kind of trouble.
• Only 5 percent of the youths expected to be crime victims.

The evaluation of the midnight basketball program showed a 78 percent reduction in the juvenile offender crime rate among youths 16 to 20 years old in the precinct where the public housing development was located. The primary reason the youths gave for their survey responses was that having a midnight basketball program gave them something positive to do.

In this example, the youths are the direct beneficiaries, as they are the direct recipients of the benefit of the midnight basketball program. They believe they will stay out of jail and are less likely to be victims of violent crime. For the evaluation, community residents were also surveyed and responded that they felt both their community and their children were safer because of the midnight basketball program. Community residents are indirect and at least short-term beneficiaries (depending on how long the gains last), as they are not involved in the program at all but believe they are safer because of it.

The summary findings above could be used to demonstrate to residents and the community at large that this program was successful in preventing and reducing violence. The midnight basketball program administrators could also present the findings to the city council to justify a request for continued funding. The program administrators are indirect beneficiaries if their request for continued funding is granted, as they keep their jobs longer. In the long term, society at large also benefits if the young people stay out of jail, because society does not need to cover the costs of incarceration and lost productivity; rather, these youths can perhaps become good, employable, productive, taxpaying citizens.

An evaluation can provide information on the process of implementing a program, as well as on its outcomes. Other public housing agencies (unintended beneficiaries) will be able to benefit from lessons learned during the program implementation phase and the subsequent evaluation.
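A note on summarizing pre/post survey results such as those above: the change in a reported rate is often expressed as a relative (percentage) reduction from the baseline. The minimal Python sketch below illustrates the calculation using the survey percentages quoted in this example; it is not taken from the evaluation itself, and the 78 percent reduction in the offender crime rate cited above is a separate, precinct-level statistic.

    # Relative reduction: (baseline - follow-up) / baseline, expressed as a percent.
    def relative_reduction(before, after):
        """Percent reduction of a rate, relative to its baseline value."""
        return 100 * (before - after) / before

    # Survey percentages reported before and after the program (from the example above).
    print(round(relative_reduction(92, 20)))  # expecting trouble: roughly a 78% relative drop
    print(round(relative_reduction(66, 5)))   # expecting to be crime victims: roughly a 92% relative drop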


In a second example, an intervention builds and maintains a water treatment plant that brings safe drinking water to residents. The direct, immediate benefit is clean drinking water for residents. An indirect, medium-term benefit might be industries moving to the community because it has safe drinking water. A longer-term benefit, to individuals and to the community at large, would be a decreased incidence of waterborne diseases.

What to Evaluate

Evaluations can look at many different facets of development. The following are some facets that can be evaluated:

• Projects: a single intervention in one location, or a single project implemented in several locations.
• Programs: an intervention comprising various activities or projects, which are intended to contribute to a common goal.
• Policies: evaluations of the standards, guidelines, or rules set up by an organization to regulate development decisions.
• Organizations: multiple intervention programs delivered by an organization.
• Sectors: evaluations of interventions across a specific policy arena, such as education, forestry, agriculture, and health.
• Themes: evaluations of particular issues, often crosscutting, such as gender equity or global public goods.
• Country assistance: evaluations of country progress relative to a plan, the overall effect of aid, and lessons learned.

Table 1.1 gives examples of evaluations.

Table 1.1: Examples of Evaluations

Policy evaluations
  Privatizing water systems: Comparing model approaches to privatizing public water supplies
  Resettlement: Comparing strategies used for resettlement of villagers to new areas

Program evaluations
  Privatizing water systems: Assessing fiscal management of government systems
  Resettlement: Assessing the degree to which resettled village farmers maintain their previous livelihood

Project evaluations
  Privatizing water systems: Assessing the improvement in water fee collection rates in two provinces
  Resettlement: Assessing the farming practices of resettled farmers in one province


Uses of Evaluation

The results of evaluation can be used in many ways. Evaluations give clients, government agencies, NGOs, the public, and many others feedback on policies, programs, and projects. The results provide information about how public funds are being used. They can give managers and policymakers information on what is working well and what is not working well, according to original or revised objectives. Evaluations can help to make projects, programs, and policies accountable for how they use public funds. Evaluations can identify projects, programs, and policies for replication, scaling up, improvements, or for possible termination.

Carol Weiss (2004, p. 1) stresses the importance of identifying the intended uses for an evaluation from the initial planning stage. She writes:

… if you cannot identify and articulate the primary intended users and uses of the evaluation you should not conduct the evaluation. Unused evaluation is a waste of precious human and financial resources.

Weiss (2004) asserts that, from beginning to end, the evaluation process is designed and carried out around the needs of the primary intended user. These primary users will have the responsibility for implementing change based on their involvement in the process or with the evaluation findings.

Evaluation can be used to:

• help analyze why intended results were or were not achieved
• explore why there may have been unintended results or consequences
• assess how and why results were affected by specific activities
• shed light on implementation processes, failures, or successes that may occur at any level
• help to provide lessons, highlight areas of accomplishment and potential, and offer specific recommendations for improvement and reform.

Pragmatic uses of evaluation are summarized in the following box.


Pragmatic Uses of Evaluation

• Help make resource allocation decisions.
• Help rethink the causes of a problem.
• Identify emerging problems.
• Support decision-making on competing or best alternatives.
• Support public sector reform and innovation.
• Build consensus on the causes of a problem and how to respond.

Source: Kusek and Rist, 2005, p. 115.

As the next box summarizes, evaluation can help provide information on strategy, operations, and learning.

Evaluation Provides Information on:

Strategy: Are the right things being done?
• rationale or justification
• clear theory of change

Operations: Are things being done right?
• effectiveness in achieving expected outcomes
• efficiency in optimizing resources
• client satisfaction

Learning: Are there better ways?
• alternatives
• best practices
• lessons learned

Source: Kusek and Rist, 2005, p. 117.


In summary, evaluations can be useful in focusing on:

• the broad political strategy and design issues (“Are we doing the right things?”)
• operational and implementation issues (“Are we doing things right?”)
• whether there are better ways of approaching the problem (“What are we learning?”).

Relation between Monitoring and Evaluation

To be consistent, we use the definitions in the OECD/DAC Glossary of Key Terms in Evaluation. Monitoring is:

a continuing function that uses systematic collection of data on specified indicators to provide management and the main stakeholders of an ongoing development intervention with indications of the extent of progress and achievement of objectives and progress in the use of allocated funds (2002, pp. 27-28).

In other words, monitoring is a routine, ongoing, and internal activity. It is used to collect information on a program’s activities, outputs, and outcomes in order to measure the performance of the program.

An example of a monitoring system is the Malawi Ministry of Health’s use of 26 indicators to monitor the quality of health care provided at Central Hospital. Indicators include, for example, the:

• number of patients seen by specialists within four weeks of referral
• total number of in-patient deaths
• number of direct obstetric deaths in the facility
• number of in-patient days (Government of Malawi, 2007).

Regular provision of data on the indicators provides the Minister with a trend line. Any dramatic swings can be investigated. For example, a marked increase in the number of in-patient deaths may reflect a high hospital infection rate that needs to be decreased with swift action. Alternatively, a marked decrease in infection rates may suggest that the use of a new disinfectant is quite effective and its use should be promoted.
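To make this concrete, the sketch below shows one simple way a monitoring unit might flag such dramatic swings in an indicator series for follow-up. It is a minimal Python illustration using hypothetical monthly counts and an arbitrary 30 percent threshold; it is not drawn from the Malawi system or from any particular M&E software.

    # Minimal sketch: flag months where an indicator swings sharply from the
    # previous month, so the change can be investigated (hypothetical data).
    monthly_inpatient_deaths = [14, 15, 13, 14, 22, 21, 12]  # illustrative counts only

    def flag_swings(values, threshold=0.30):
        """Return (month_index, previous, current) for month-to-month changes beyond the threshold."""
        flags = []
        for i in range(1, len(values)):
            prev, curr = values[i - 1], values[i]
            if prev and abs(curr - prev) / prev > threshold:
                flags.append((i, prev, curr))
        return flags

    for month, prev, curr in flag_swings(monthly_inpatient_deaths):
        print(f"Month {month}: changed from {prev} to {curr} -- investigate")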


Evaluations are generally conducted to answer the “why” question behind the monitoring of data: for example, why caesarean sections are up in five hospitals, or why three of 50 sites have particularly high survival rates for premature babies. Evaluations are also needed to attribute results to a specific intervention rather than to other possible causes.

Both monitoring and evaluation measure and assess performance, but they do this in different ways and at different times:

• Monitoring takes place throughout program or project implementation.
• Evaluation is the periodic assessment of the performance of the program or project. In a monitoring and evaluation (M&E) system, evaluation seeks to answer the question “why”.

Monitoring is an internal activity carried out by project staff, and it is generally project management’s responsibility to see that it happens and that the results are used. Evaluation, on the other hand, can be carried out internally or externally, and it is the responsibility of the evaluator together with program staff members (Insideout, 2005, p. 2). Table 1.2 compares characteristics of monitoring and evaluation.

Table 1.2: Comparisons of Monitoring and Evaluation

Monitoring                                        Evaluation
Ongoing, continuous                               Periodic and time bound
Internal activity                                 Can be internal, external, or participatory
Responsibility of management                      Responsibility of evaluator together with staff and management
Continuous feedback to improve program performance    Periodic feedback

Source: Insideout, Monitoring and Evaluation Specialists, 2005, p. 2.


Roles and Activities of Professional Evaluators

As the concept and purposes of evaluation have evolved over time, so too have the roles and activities of evaluators. Evaluators have played a multitude of roles and engaged in numerous activities. Their role depends on the nature and purpose of the evaluation. Evaluators play many roles including scientific expert, facilitator, planner, [as well as judge, trusted person, teacher, and social change agent], collaborator, aid to decision makers, and critical friend (Fitzpatrick, Sanders, & Worthen, 2004, p. 28).

Who Conducts the Evaluation?

Evaluators may be part of internal, external, or participatory evaluations.

The OECD/DAC Glossary defines internal evaluation as:

Evaluation of a development intervention conducted by a unit or individuals reporting to the management of the donor, partner, or implementing organization (2002, p. 26).

The OECD/DAC Glossary defines external evaluation as:

Evaluation of a development intervention conducted by entities and/or individuals outside the donor, partner and implementing organization (2002, p. 23).

There are advantages and disadvantages to using internal and external evaluators. Internal evaluators usually know more about the program, project, or policy than an outsider. In fact, the person who develops and manages the intervention may also be charged with its evaluation. Internal evaluators usually know more about the history, organization, culture, people involved, and the problems and successes. Because of this knowledge, they may be able to ask the most relevant and pertinent questions and know where to go “backstage” in the organization to find out what is really going on.

This advantage, however, can also be a disadvantage. Internal evaluators may be so close to the program, project, or policy that they do not see it clearly and might not recognize solutions or changes that others might see. They may also be more subject to pressure or influence from program decision makers who also make personnel decisions, and they may see the whole organization only from their own position inside it. Lastly, external stakeholders may perceive their findings as less credible.

External evaluators usually have more credibility and give the perception of objectivity to an evaluation. In addition, most external evaluators have more specialized skills that may be needed to perform effective evaluations. They are also independent from the administration and financial decisions about the program (Fitzpatrick, Sanders, & Worthen, 2004, pp. 23-24). However, an external evaluation is no guarantee of independent and credible results: external consultants may have prior program ties, or may be overly accommodating to management in the hope of future work.

Participatory evaluators work together with representatives of agencies and stakeholders to design, carry out, and interpret an evaluation (OECD/DAC, 2002, Glossary, p. 28). Participatory evaluation is increasingly considered a third evaluation method. It differs from internal and external evaluation in some fundamental ways:

Participatory evaluation represents a further and more radical step away from the model of independent evaluation… [it] is a form of evaluation where the distinction between experts and layperson, researcher and researched, is deemphasized and redefined… evaluators… [act as] mainly facilitators and instructors helping others to make the assessment (Molund & Schill, 2004, p. 19).

Note the distinction between participatory evaluation and participatory methods; the latter may be used in internal and external evaluations.


Evaluator Activities

Evaluators carry out a number of activities to match their various roles. For example, internal evaluators may work on project or program design, implementation, and outreach strategies for the intervention to be assessed, while external evaluators would typically limit their involvement in program management. In general, evaluators:

• consult with all major stakeholders
• manage evaluation budgets
• plan the evaluation
• perform or conduct the evaluation, or hire contract staff to perform it
• identify standards for effectiveness (relying on authorizing documents or other sources)
• collect, analyze, interpret, and report on data and findings.

As such, evaluators are expected to have a diverse skill set. In this context, evaluators can also play the role of knowledge creator/builder and disseminator of lessons learned.

Part II: The Origins and History of the Evaluation Discipline

The modern discipline of evaluation emerged from social science research and the scientific method. Evaluation, however, has ancient roots; its earliest forms date back thousands of years. Archaeological evidence shows that the ancient Egyptians regularly monitored their country's outputs in grain and livestock production more than 5,000 years ago. In the public sector, formal evaluation was evident as early as 2000 BC, when Chinese officials conducted civil service examinations to measure the proficiency of applicants for government positions. And in education, Socrates used verbally mediated evaluations as part of the learning process (Fitzpatrick, Sanders, & Worthen, 2004, p. 31).


Some experts trace the emergence of modern evaluation methods to the advent of the natural sciences and the attendant emphasis on observed phenomena (the "empirical method") in the 17th century. In Sweden, ad hoc policy commissions that performed some kind of evaluation came into being in the 17th century:

Traditionally, appointed ad hoc policy commissions have played a great part in preparing the ground for many decisions. The commissions have been important for providing briefing and (evaluative) background materials both with respect to fundamental policy decisions and in connection with the day-to-day fine-calibration of the arsenal of means available to various spheres of activity (Furubo & Sandahl, 2002, p. 116).

Indeed, the commission system is still used in Sweden today, with several hundred commissions currently in existence.

In the 1800s, evaluation of education and social programs began to take root in several Anglo-Saxon countries. In England, program evaluation was conducted by government-appointed commissions called upon to investigate dissatisfaction with educational and social programs. The current-day external inspectorates for schools grew out of these earlier commissions. In the United States, pioneering efforts were made during the 1800s to examine the quality of the school system using achievement tests. These efforts continue to the present day, where student achievement scores remain a key measure of the quality of education in schools. Also during this period, accreditation of secondary schools and universities began in the United States.

In the early 1900s, formal evaluation and accreditation of medical schools took place in the United States and Canada. Other areas of investigation, measurement, and evaluation during this period included health, housing, work productivity, democratic and authoritarian leadership, and standardized educational testing. Most were small-scale efforts conducted by government agencies and social services. In the development arena, "Dodd's attempt to introduce water boiling as a public health practice in villages in the Middle East is one of the landmark studies in the pre-World War II empirical sociological literature" (Rossi & Freeman, 1993, p. 10).


Applied social research grew rapidly following the Great Depression, when U.S. President Roosevelt instituted the New Deal socio-economic programs. The U.S. federal government began to grow rapidly as new agencies were created to manage and implement national programs such as agricultural subsidies to protect farmers, public works and job creation schemes, rural electrification, and social security administration. As these large-scale programs were new and experimental in nature, the need to evaluate their effectiveness in jump-starting the economy, creating jobs, and instituting social safety nets grew in tandem. The need for evaluation increased during and after World War II, as more large-scale programs were designed and undertaken for the military, urban housing, job and occupational training, and health. It was also during this time that major commitments were made to international programs that included family planning, health and nutrition, and rural community development. Expenditures were very large and consequently were accompanied by demands for "knowledge of results."

Influences from New Efforts

In the 1950s and 1960s, evaluation became more routinely used in the United States and Europe to assess programs related to education, health, human services, mental health, prevention of delinquency, and rehabilitation of criminals. In addition, U.S. President Johnson's "War on Poverty" program during the 1960s stimulated increased interest in evaluation. Work in developing countries around the world also expanded, with some evaluation activity for programs in agriculture, community development, family planning, health care, and nutrition. Again, for the most part, these assessments relied on traditional social science tools, such as surveys and statistical analysis.

In the United States in 1949, the first Hoover Commission recommended that budget information for the national government be structured in terms of activities rather than line items. It also recommended that performance measurements be provided along with performance reports (Burkhead, 1961; Mikesell, 1995, p. 171). This phase of budget reform became known as performance budgeting (Tyler & Willand, 1997, Section: Performance Budgeting).


In 1962, also in the United States, the Department of Defense, under Secretary of Defense Robert McNamara, developed the Planning, Programming, and Budgeting System (PPBS). The purpose of the PPBS was to increase efficiency and improve government operations. It involved:

• establishing long-range planning objectives

• analyzing the costs and benefits of alternative programs that would meet those objectives

• translating programs into budget and legislative proposals and long-term projections.

The PPBS changed the traditional budgeting process by:

• emphasizing objectives

• linking planning and budgeting (Office of the Secretary of Defense, 2007, Section: PPBS Introduced to Improve the Budgeting Process).

The early efforts of the PPBS would eventually lead to the monitoring-for-results movement. In the late 1960s, many Western European countries began to undertake program evaluation. In the Federal Republic of Germany, for example, Parliament started to require the federal government to report on the implementation and impact of various socio-economic and tax programs. These included reports on the: joint federal-state program to improve the regional economic structure (1970); labor market and employment act (1969)…; hospital investment program (1971)…; general educational grants law (1969)…; and various reports on subsidies and taxes (1970 to present) (Derlien, 1999, p. 41). During this time, the Canadian government also began to take steps toward evaluating government programs and performance. Canadian government departments were encouraged to establish planning and evaluation units. However, early efforts did not yield significant results. In the Canadian, German, and Swedish cases, "…despite institutionalization of program evaluation in various policy areas, their systems remained rather fragmented and the number of studies carried out seems to be relatively low" (Derlien, 1999, p. 146).


The Elementary and Secondary Education Act (ESEA) of 1965 was a landmark for evaluation in the United States. This legislation mandated the government to assess student performance and teacher quality standards and provided resources (the first U.S. government budgetary set-aside for evaluation) to undertake these activities, thereby institutionalizing evaluation. With federal money going into evaluation in the late 1960s and early 1970s, numerous articles and books on the topic of evaluation began to appear in the United States and some OECD countries. University graduate programs focusing on evaluation were developed to train a new cadre of evaluators to meet the increasing demands for accountability and effectiveness in government-financed socio-economic programs, such as elementary and secondary education grants and "Great Society" programs (poverty reduction, Head Start preschools, civil rights, job corps, food stamps, etc.).

Canada, Sweden, and the Federal Republic of Germany (FRG) undertook program evaluation in the 1960s to assess new government-financed education, health, and social welfare programs. In this context formal planning systems emerged, which either were limited to medium-term financial planning (in the FRG) or even attempted to integrate budgeting with programming (in Sweden and Canada). In any case, evaluation was either regarded logically as part of these planning systems or as necessitated by the information needs of the intervention programs… Evaluations, then, were primarily used by program managers to effectuate existing and new programs (Derlien, 1999, pp. 153-154).

From the mid-1970s to the mid-1980s, evaluation became a full-fledged profession in many OECD countries. Professional evaluation associations were created, more programs to train evaluators were introduced, evaluation journals were started, and evaluation began to expand beyond the purview of government-financed programs to corporations, foundations, and religious institutions. In France, for example, public policy evaluation has been systematically developed, with many universities, including the Grandes Ecoles, offering courses and/or information about evaluation as part of their curriculum.


Many OECD countries have established evaluation-training programs for civil servants, either within the government or with outside contractors. In addition, new methodologies and models have been explored, with greater emphasis on the information needs of consumers, the examination of unintended outcomes, and the development of values and standards. The evaluation literature has also grown in quantity and quality (Fontaine & Monnier, 2002, p. 71). Since 1985, computers and technology have vastly increased the ability of evaluators to collect, analyze, and report on evaluation findings and to share them with others.

Evolution of Development Evaluation

Development evaluation evolved out of the audit and social science traditions. There are important similarities and differences, as well as linkages, between these two traditions.

Audit Tradition

Auditing traces its roots to 19th-century Britain, when growing commercial and industrial development gave rise to the …need for verifiably accurate and dependable financial records… Auditors' work lent credibility to the growing capitalist infrastructure of the West. Auditors' opinions carried weight because of their technical craftsmanship and because auditors were outsiders (Brooks, 1996, p. 16).

The audit tradition has an investigative, financial management, and accounting orientation: did the program do what was planned, and was the money spent within the rules, regulations, and requirements? It uses concepts such as internal controls, good management and governance, and verification. Its emphasis is on accountability and compliance, and auditors are traditionally independent from program management.

From Afghanistan to Zimbabwe, almost every developed and developing country belongs to the International Organization of Supreme Audit Institutions (INTOSAI), which as of 2008 counts 188 countries among its members. The strength of the audit tradition in developing countries has led to a strong compliance orientation in evaluation. In Malaysia, for example, the National Audit Department (NAD) has played a role in ensuring public accountability for 100 years. Among other things, NAD believes that "due attention and compliance must be taken seriously at all times by all parties with regard to procedural and legislated requirements." There are several different types of auditing:


• standard audit: an independent, objective assurance activity designed to add value and improve an organization's operations. It helps an organization accomplish its objectives by bringing a systematic, disciplined approach to assess and improve the effectiveness of risk management, control, and governance processes (OECD/DAC, 2002, Glossary)

• financial audit: an audit that focuses on compliance with applicable statutes and regulations (OECD/DAC, 2002, Glossary, p. 17)

• performance audit: an audit that is concerned with relevance, economy, efficiency, and effectiveness (OECD/DAC, 2002, Glossary, p. 17).

The auditing profession, unlike the younger evaluation profession, has a common set of standards by which auditors abide. Indeed, auditing

…gets much of its strength from the fact that it has a largely agreed-upon set of standards (Institute of Internal Auditors and national standards). It delivers a range of products, from comprehensive to compliance audits, dealing with different aspects of an organization, and moves outward to the organization's activities and products (Treasury Board of Canada Secretariat, 1993).

Furthermore, internal auditing "…encompass[es] [a wide array of] financial activities and operations including systems, production, engineering, marketing, and human resources" (Institute of Internal Auditors, 2007). Auditing also gains strength from the fact that professional accreditation is offered, which is not yet the case with evaluation.

Development evaluation drew from the audit profession a strong focus on compliance with legal and procedural requirements. This can be observed in the objectives-based project evaluation frameworks of the bilateral donors and development banks. For example, the "Good Practice Standards for Evaluation of MDB-Supported Public Sector Operations," adopted by the Evaluation Cooperation Group of the Multilateral Development Banks (MDBs), include "Achievement of Objectives" as a good practice standard; the justification is that "evaluation against objectives enhances accountability" (p. 9).


Audit and Evaluation

Auditing and evaluation can be viewed as a kind of continuum, providing related but different kinds of information about compliance, accountability, impact, and results. There is some ...overlap in areas such as efficiency of operations and cost effectiveness…with evaluation concerned with analysis of policy and outputs, and auditing with internal financial controls and management systems (Treasury Board of Canada Secretariat, 1993, Forms of Linkage: History, Trends, and Theory area, ¶ 3). Auditing and evaluation have common objectives in that both aim to help decision-makers "…by providing them with systematic and credible information that can be useful in the creation, management, oversight, change, and occasionally abolishment of programs" (Wisler, 1996, p. 1).

Much has been written on the differences and overlap between auditing and evaluation. Differences stem from their origins, with auditing deriving much of its approach from financial accounting, and evaluation deriving much of its approach from the social sciences. Thus, auditing has tended to focus on compliance with program requirements, while evaluation tends to focus on attributing observed changes to the program intervention. In other words, we observe the "inclination of auditors toward normative questions [what is versus what should be] and [the inclination] of evaluators toward descriptive and impact [cause and effect] questions" (Wisler, 1996, p. 3).

Social Science Tradition

As governments and organizations moved from emphasizing questions about verification and compliance to emphasizing questions about impact, social science techniques were incorporated into evaluation. Development evaluation, like evaluation in developed countries, drew on scientific and social research methods. The scientific method "refers to research methodologies that pursue verifiable knowledge through the analysis of empirical data" (Calhoun, 2002). It grew out of the natural sciences, drawing on Aristotle's notion of induction, and is an approach to acquiring information. Procedures for using the scientific method vary from one field of inquiry to another, but they share features that distinguish them from other ways of investigating phenomena or acquiring or correcting knowledge.


Scientific researchers:

1. Identify a problem, consider it, try to make sense of it, and review previous explanations.

2. Develop a hypothesis that states a possible explanation.

3. Make a prediction based on the hypothesis: if the hypothesis is true, what consequences would be expected to follow?

4. Test the prediction through data collection and analysis.

The advantage of the scientific method is that it can provide unbiased and replicable or verifiable evidence. A minimal sketch of this workflow, applied to a hypothetical evaluation question, appears at the end of this subsection.

Evaluation also uses a variety of methods from the social sciences, including sociology, anthropology, statistics, and political science. Indeed,

The application of social research methods to evaluation coincides with the growth and refinement of the methods themselves, as well as with ideological, political, and demographic changes that have occurred this [past] century. Of key importance were the emergence and increased standing of the social sciences in universities and increased support for social research. Social science departments in universities became centers of early work in program evaluation and have continued to occupy an influential place in the field (Rossi & Freeman, 1993, p. 9).

Evaluation also draws heavily from social science research in areas such as theory construction, design, approach, data collection methodology, analysis and interpretation, statistics, surveys, and sampling.
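To make the four steps concrete, the following is a minimal, hypothetical Python sketch of the workflow for an evaluation-style question (does a training program raise household income?). The data are simulated, and the two-sample test is only one of many analyses an evaluator might choose; nothing in the sketch comes from the studies cited in this chapter.

```python
# Hypothetical illustration of the four steps of the scientific method.
# All data are simulated; the program, effect size, and test choice are
# assumptions made for this sketch only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

# Steps 1-2: problem and hypothesis -- households that received training
# are hypothesized to earn more, on average, than comparable households.
# Step 3: prediction -- if the hypothesis holds, mean income in the treated
# group should exceed mean income in the comparison group.
treated = rng.normal(loc=120.0, scale=30.0, size=400)      # simulated incomes
comparison = rng.normal(loc=110.0, scale=30.0, size=400)

# Step 4: test the prediction through data collection and analysis
# (here, a two-sample t-test on the simulated "collected" data).
t_stat, p_value = stats.ttest_ind(treated, comparison, equal_var=False)
print(f"difference in means: {treated.mean() - comparison.mean():.1f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")   # a small p supports the hypothesis
```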


Part III: The Development Evaluation Context

Development evaluation has emerged as a sub-discipline of evaluation. It began mainly with the post-World War II efforts of reconstruction and development. The World Bank was formed in 1944 and established the first independent evaluation group in 1972. The European Bank for Reconstruction and Development (EBRD) was founded in 1991, and other multilateral development banks followed. These are the origins of development evaluation. Bilateral organizations, such as the Department for International Development (DfID) in the United Kingdom and the Canadian International Development Agency (CIDA), also contributed to its origins, as ministries in developing countries needed to meet donor-imposed requirements for reporting project findings through project evaluation systems.

As the notion of development has changed over the past decades, so has development evaluation. The World Bank, for example, has expanded since its inception in 1944 from a single institution to a closely associated group of five development institutions, and its shifting emphasis has had increasing implications for the complexity of development evaluation:

• 1950s: the period after World War II was characterized by a focus on rebuilding, reconstruction, technical assistance, and engineering.

• 1960s: as many newly independent countries were created, the development world placed primary emphasis on economic growth, financing, and the creation of projects, in the hope that higher economic growth would lift more people out of poverty.

• 1970s: the emphasis shifted to the social sectors or basic needs (education, health, and social welfare). The development community began to do longer-term planning and to make social sector investments.

• 1980s: further shifts occurred toward structural adjustment policies and lending. Adjustment lending was used to support major policy reforms and to help countries cope with financial and debt crises, and it was linked to specific conditionality.

• 1990s: the focus shifted to country-level assistance, that is, more comprehensive country-based programs as opposed to an emphasis on individual projects. More emphasis was given to building capacity and institutions within developing countries.

As we move through the first decade of the 21st century, trends in development highlight poverty reduction, partnerships, participation, and a results orientation. Approaches at the sector-wide, country, and global levels are increasingly being used. Sector-wide approaches, or SWAPs, are a new way of organizing development aid that progressively gives governments more control of procedures and the disbursement of funds. With SWAPs, development partners collaborate to support a government-led program for a sector or sub-sector, such as health or education. The development partners agree on priorities and strategies, plan together, and progressively hand control of procedures and funds to the government. SWAPs therefore also bring new challenges for funding strategies and for donor partnership and coordination at the broader sectoral level. The current decade has also seen a shift from project-level to country-level assistance programs and national poverty reduction strategies; that is, from partial to more comprehensive development approaches. Table 1.3 summarizes these transitions.

Table 1.3: Changing Development Concepts

Decades | Objectives        | Approaches           | Discipline
1950s   | Reconstruction    | Technical assistance | Engineering
1960s   | Growth            | Projects             | Finance
1970s   | Basic needs       | Sector investment    | Planning
1980s   | Adjustment        | Adjustment lending   | Neoclassical economics
1990s   | Capacity-building | Country assistance   | Multi-disciplinary
2000s   | Poverty reduction | Partnerships         | Results-based management

Source: Robert Picciotto, PowerPoint presentation, World Bank, 2002.


In addition, the importance of addressing global and regional public goods that affect development, through both country-level actions and global initiatives, has grown. The number of Global and Regional Partnership Programs (GRPPs), which seek to provide and enhance global public goods and address related development issues, has grown to more than 150. GRPPs address these issues through partnerships that lead to coordinated strategies, which can achieve results more effectively than activities undertaken at the country or local level alone. The largest of these GRPPs address such global public goods as curbing the spread of communicable diseases (e.g., the Global Fund to Fight AIDS, Tuberculosis and Malaria [GFATM]) and preserving the global environmental commons and mitigating climate change (e.g., the Global Environment Facility [GEF]). Others help facilitate trade, preserve world financial stability, and sponsor both scientific and social science research and knowledge sharing important to global sustainable development.

The Growth of Development Evaluation

Development evaluation is no longer an approach that views the program from the detached perspective of an outsider; increasingly, it casts the evaluator as the facilitator of a participatory, collaborative endeavor. The role of the evaluator has also expanded from evaluator as auditor and facilitator to evaluator as researcher, and evaluators are now expected to have a broader and more diverse skill set (Sonnichsen, 2000). The relationships between participants, donors, and evaluators are also changing: where evaluations were once top-down events, they are becoming more collaborative, joint undertakings that bring the key stakeholders together in designing and carrying out evaluations.

The OECD, established in 1961, has as its mission to "help governments achieve sustainable economic growth and employment and rising standards of living in member countries while maintaining financial stability, so contributing to the development of the world economy" (OECD, 2007, Overview of the OECD: What is it? History? Who does what? Structure of the organisation? How has it developed?). The members of the OECD meet in specialized committees. One of these, the Development Assistance Committee (DAC), has long had a working group on development evaluation (currently the DAC Network on Evaluation).


The OECD's official definition of development evaluation has been widely adapted. It differs somewhat from the generic definition of evaluation given at the beginning of this chapter:

The systematic and objective assessment of an on-going or completed project, program or policy, its design, implementation and results. The aim is to determine the relevance and fulfillment of objectives, development efficiency, effectiveness, impact, and sustainability. An evaluation should provide information that is credible and useful, enabling the incorporation of lessons learned into the decision-making process of both recipients and donors.

A wide variety of methodologies and practices have been used in the development evaluation community. It is generally accepted that a mix of theories, analysis strategies, and methodologies often works best in development evaluation, especially given the growing scale and complexity of development projects, programs, and policies. Mixing approaches can help strengthen the evaluation. This mix of methods is called methodological triangulation, which refers to:

the use of several theories, sources or types of information, and/or types of analysis to verify and substantiate an assessment. By combining multiple data sources, methods, analyses, or theories, evaluators seek to overcome the bias that comes from single informants, single methods, single observer, or single theory studies (OECD/DAC, 2002, Glossary, p. 37).

Development evaluation remains grounded in the OECD/DAC criteria, and this is likely to persist for years to come, even as evaluation becomes ever more methodologically diverse. It is by now well established that the full array of social science methods belongs in the evaluator's methodological toolkit: tools from psychology, statistics, education, sociology, political science, anthropology, and economics (Chelimsky & Shadish, 1997). Ultimately, the choice of which evaluation design and methodology, or combination of designs and methodologies, to use will be determined by the questions being asked and the information sought.
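Because triangulation boils down to comparing independent lines of evidence on the same question, a very small sketch can illustrate the idea. The sketch below is hypothetical: the data sources, figures, and the choice to summarize agreement as a simple spread are invented for illustration and are not drawn from any evaluation or guidance cited here.

```python
# A minimal, hypothetical sketch of methodological triangulation: the same
# outcome (share of households with access to clean water) is estimated from
# three independent sources and compared. All figures are illustrative.
from statistics import mean

household_survey = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]   # 1 = access reported by household
facility_records = {"district_a": 0.68, "district_b": 0.74}  # administrative estimates
key_informant_estimate = 0.65                        # rough figure from interviews

estimates = {
    "survey": mean(household_survey),
    "admin_records": mean(facility_records.values()),
    "key_informants": key_informant_estimate,
}

spread = max(estimates.values()) - min(estimates.values())
print(estimates)
# A small spread suggests the sources corroborate one another; a large spread
# flags a finding that needs further investigation before it is reported.
print(f"spread across sources: {spread:.2f}")
```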


Growth of Professional Evaluation Associations

In the 1980s there were only three regional or national evaluation societies. Since then, national, regional, and international evaluation associations have sprung up around the world; currently there are more than 75 such associations in developing and developed countries alike (IOCE, January 2008). Much of the growth comes from the establishment of evaluation associations in developing countries. Professional evaluation associations create a support system and foster professionalism within the evaluation community, and this support contributes to capacity development in development evaluation. At the national level, to name just a few, the Malaysia Evaluation Society was established in 1999, the Sri Lanka Evaluation Association in 1999, and the Uganda Evaluation Society in 2002. At the regional level, the Australasian Evaluation Society was established in 1991, the European Evaluation Society in 1994, and the African Evaluation Association in 1999.

An important international organization for evaluation is the International Organisation for Cooperation in Evaluation (IOCE), a loose alliance of regional and national evaluation organizations (associations, societies, and networks) from around the world. Its members collaborate to:

• build evaluation leadership and capacity in developing countries

• foster the cross-fertilization of evaluation theory and practice around the world.

The IOCE assists the evaluation profession to take a more global approach to contributing to the identification and solution of world problems (IOCE, 2006). Another important international organization for evaluation is the International Development Evaluation Association (IDEAS), created in 2001 to help build evaluation capacity in developing countries (IDEAS, 2004). IDEAS's mission is "to advance and extend the practice of development evaluation by refining methods, strengthening capacity, and expanding ownership."


IDEAS is the only association for those who practice development evaluation. Its strategy is to:

• Promote development evaluation for results, transparency and accountability in public policy and expenditure

• Give priority to evaluation capacity development (ECD)

• Foster the highest intellectual and professional standards in development evaluation.

Part IV: Principles and Standards for Development Evaluation

Most development-related organizations use the five OECD/DAC criteria, or variations of them, for evaluating development assistance.

OECD/DAC Criteria for Evaluating Development Assistance

• Relevance: The extent to which the objectives of a development intervention are consistent with beneficiaries' requirements, country needs, global priorities, and partners' and development agencies' policies.

• Effectiveness: A measure of the extent to which an aid activity attains its objectives.

• Efficiency: Efficiency measures the outputs – qualitative and quantitative – in relation to the inputs. It is an economic term that signifies that the aid uses the least costly resources possible in order to achieve the desired results. This generally requires comparing alternative approaches to achieving the same outputs, to see whether the most efficient process has been adopted.

• Impact: The positive and negative changes produced by a development intervention, directly or indirectly, intended or unintended. This involves the main impacts and effects resulting from the activity on the local social, economic, environmental, and other development indicators. The examination should be concerned with both intended and unintended results and must include the positive and negative impact of external factors, such as changes in terms of trade and financial conditions.

• Sustainability: Sustainability is concerned with the resilience to risk of the net benefit flows over time. It is particularly relevant to assess (not measure) whether the benefits of an activity or program are likely to continue after donor funding has been withdrawn. Projects and programs need to be environmentally as well as financially sustainable.

Source: OECD/DAC Criteria for Evaluating Development Assistance, 2007, p. 1.
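As a small illustration of how the five criteria might be carried through an evaluation's working papers, the following sketch records a rating against each criterion. The four-point scale, the field names, and the example scores are assumptions made for this illustration; the DAC criteria themselves do not prescribe any particular rating scale.

```python
# A minimal, hypothetical sketch of a ratings record keyed to the five
# OECD/DAC criteria. Scale and scores are illustrative only.
from dataclasses import dataclass, field

SCALE = {1: "highly unsatisfactory", 2: "unsatisfactory",
         3: "satisfactory", 4: "highly satisfactory"}

@dataclass
class DacRatings:
    relevance: int
    effectiveness: int
    efficiency: int
    impact: int
    sustainability: int
    notes: dict = field(default_factory=dict)   # free-form evidence per criterion

    def summary(self) -> str:
        rows = [f"{name:<14} {SCALE[score]}"
                for name, score in vars(self).items() if name != "notes"]
        return "\n".join(rows)

# Illustrative use only -- not a rating of any real project.
print(DacRatings(relevance=4, effectiveness=3, efficiency=2,
                 impact=3, sustainability=2).summary())
```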


The following examples, adapted from several evaluation reports (ALNAP, 2006, pp. 24-25, 52-53, 46-47, and 58-59), illustrate how the DAC criteria are discussed in practice.

Example of Relevance: WFP evaluation of food aid for relief and recovery in Somalia

Source: WFP (2002) Full Report of the Evaluation of PRRO Somalia 6073.00, Food Aid for Relief and Recovery in Somalia. Rome: WFP.

Background
The evaluation was carried out by two expatriates who visited Somalia for a three-week period in mid-July 2001. The evaluation assesses three years of support to 1.3 million people, with 63,000 tonnes of food commodities distributed at a cost of US$55 million. Of this support, 51 per cent was projected to go toward rehabilitation and recovery, 30 per cent to emergency relief, and 19 per cent to social institutions. The primary aim of the protracted relief and recovery operation (PRRO) was to "contribute to a broader framework for integrated rehabilitation programs in Somalia, while maintaining flexibility to both grasp development opportunities and respond to emergency situations." The evaluation therefore needed to examine the relevance of this mix of allocations as well as the appropriateness of each type of intervention.

Evaluation of relevance
First, the overall relevance of the intervention is considered in the context of the political economy of aid in Somalia. The evaluation considered the rationale for providing food aid in Somalia. Arguments in favor of food aid included: the country is usually in food deficit, and populations in many locations are isolated from customary markets and doubly disadvantaged by the loss of primary occupations and assets. Then again, it may make more sense to give the affected population funds to buy local food where this is available, whether as cash-for-work or food-for-work. However… these commitments of their nature tend to be longer-run than most R&R projects… it was not clear what the exit strategy was in a number of instances. This evaluation's examination of both wider and specific issues means that its analysis of relevance is comprehensive.


Example of Effectiveness: DFID evaluation of support to WFP in Bangladesh

Source: DFID (2001) Emergency Food Aid to Flood-Affected People in South-western Bangladesh: Evaluation report. London: DFID.

Background
In September 2000, floods in six southwestern districts of Bangladesh seriously affected about 2.7 million people. DFID supported the World Food Programme (WFP) in providing three distributions of food consisting of a full ration of rice, pulses, and oil. In the first distribution, 260,000 beneficiaries received the food support; in the second and third distributions, 420,000 beneficiaries received it. The DFID evaluation provided a comprehensive analysis of whether the project objectives were met in relation to the distribution of food aid, with particular reference to ration sizes, commodity mixes, and distribution schedules, the latter being one of the factors contributing to timeliness.

Evaluation of effectiveness
The evaluation included both quantitative and qualitative methods. Quantitative data were collected in 2,644 randomly selected households in villages throughout the project zone. Qualitative data were collected during livelihood assessments in six representative villages, covering livelihood systems, status, and prospects in flood-affected communities. A second, smaller evaluation team was deployed about five weeks after the end of the first qualitative assessment to explore community perceptions and behaviors related to the food ration, including issues such as timeliness of distribution, desirability of commodities, and usage patterns. The quantitative and qualitative data sets were used in combination in the analysis. The report includes most key elements for the evaluation of effectiveness, including:

• an examination of the development of the intervention objectives, including analysis of the logframe

• an assessment of criteria used for selection of beneficiaries, including primary stakeholder views of these criteria (an area which can also be assessed under coverage)

• an analysis of implementation mechanisms, including levels of community participation

• targeting accuracy, disaggregated by sex and socioeconomic grouping (again, this could be considered under coverage)

• an examination of the resources provided – both the size of the ration and the commodity mix – including the reasons why they were provided (this could also be assessed under the relevance/appropriateness criterion)

• an examination of the adequacy of distribution schedules

• an analysis of the affected population's view of the intervention.

Example of Efficiency: Evaluation of DEC Mozambique Flood Appeal Funds

Source: DEC (2001) Independent Evaluation of DEC Mozambique Floods Appeal Funds: March 2000 – December 2000. London: DEC.

Background
After the 2000 floods in Mozambique, the Disasters Emergency Committee (DEC) evaluation took a close look at the humanitarian response undertaken by DEC agencies. The purpose of the evaluation was to report to the UK public on how and where its funds were used and to identify good practice for future emergency operations. The method for the evaluation included extensive interviews, background research and field visits, and a detailed beneficiary survey.

Evaluation of efficiency
The chapter dedicated to efficiency contains many of the key elements necessary for evaluation, including analysis of:

• the use of military assets by the DEC agencies, assessed in terms of: the lack of collaborative use of helicopters to carry out needs assessment; the high costs of using Western military forces for humanitarian relief, as compared to commercial facilities; and the comparative costs of the Royal Air Force, US military, and South African National Defence Force (the report notes that expensive military operations consumed large amounts of funding, which limited later donor funding of NGO projects)

• the effects on efficiency of an underdeveloped market for contracted services; for example, although the use of national contractors enabled agencies to implement equipment-heavy works, such as road repairs, without having to make large capital investments, the contractors used by the DEC agencies often failed to meet their obligations in a timely manner

• the efficiency of the choice of response, i.e. intervening directly with operational programs, working through local partners, or working through international network members – the evaluation found that staff composition was a more important factor determining efficiency than the choice of response (this area could also have been considered under appropriateness)

• whether it was more efficient for agencies to build their response on existing capacity in-country or through international staff

• whether agencies with existing partners were more efficient than those without

• how investment in preparedness led to a more efficient response

• the efficiency of accounting systems.

An attempt was made to compare input costs between the different agencies, for example for emergency kits, but this proved impossible given the different items provided and delivery channels used. Instead, the evaluation examines the cost implications of the general practices followed, such as warehousing practices and transportation costs. In addition, the evaluation includes a breakdown of the expenditure of funds by sector and, for each of the DEC agencies, by supplies and material, non-personnel, personnel, and agency management costs.
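The kind of expenditure breakdown described above is easy to sketch. The following hypothetical example computes, for each agency, the share of total spending in each cost category; the agency names and amounts are invented and do not come from the DEC evaluation.

```python
# Hypothetical expenditure breakdown per agency by cost category,
# expressed as shares of each agency's total. Figures are illustrative.
expenditure = {                       # amounts in thousands of GBP (invented)
    "Agency A": {"supplies_and_material": 820, "non_personnel": 310,
                 "personnel": 270, "agency_management": 95},
    "Agency B": {"supplies_and_material": 540, "non_personnel": 410,
                 "personnel": 350, "agency_management": 120},
}

for agency, costs in expenditure.items():
    total = sum(costs.values())
    shares = {category: amount / total for category, amount in costs.items()}
    breakdown = ", ".join(f"{category} {share:.0%}" for category, share in shares.items())
    print(f"{agency} (total {total}k): {breakdown}")
```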

Example of Impact: Joint evaluation of emergency assistance to Rwanda

Source: JEEAR (1996) The International Response to Conflict and Genocide: Lessons from the Rwanda Experience. 5 volumes. Copenhagen: Steering Committee of JEEAR.

Background
The Joint Evaluation of Emergency Assistance to Rwanda (JEEAR) was the largest and most comprehensive evaluation of humanitarian action, involving 52 consultants and researchers. It also set standards for the joint assessment of the impact of political action, and of its absence, in complex emergencies.


JEEAR assessed impact mainly in terms of the lack of intervention in Rwanda by the international community, despite significant signs that forces in Rwanda were preparing the climate and structures for genocide and political assassinations. It employed a definition of humanitarian action that included both political and socio-economic functions, a definition that led to an analysis of the political structures that largely determine humanitarian response and impact.

Evaluation of impact
In the Rwanda case, the lack of intervention is considered in two parts: (i) an analysis of the historical factors that explained the genocide; and (ii) a detailed account of the immediate events leading up to the genocide. The value of the joint evaluation is that it allowed an assessment that went beyond the confines of single-sector interventions to an analysis of political economy. The political-economy approach is then linked to the evaluation of the effectiveness of the humanitarian response.

This approach can be contrasted with that used in evaluations of other crises: conflict and its aftermath in Kosovo, the effects of Hurricane Mitch, and interventions in Afghanistan. In each of these cases, decisions were made to carry out single-agency and single-sector evaluations, which largely missed the political nature of the event and the response to it. In the Kosovo and Afghanistan cases, this led to a lack of attention by evaluators to issues of protection and human rights (ALNAP, 2004; 2001); in the Central American case, it led to a lack of attention to how far humanitarian action supported the transformative agenda proposed in the Stockholm Declaration (ALNAP, 2002).

JEEAR is unusual in its assessment of impact because it places a strong emphasis on why there was little interest in intervening in Rwanda – principally because of its lack of geopolitical significance – rather than listing events and their consequences. One of the lessons of JEEAR for evaluators is that evaluations of impact need to look not only at what interventions took place, but also at what might have happened under other circumstances and with different kinds of intervention.


Example of Sustainability: "The Third Country Training Program on Information and Communication Technology"

Source: JICA (2005) Results of Evaluation, Achievement of the Project.

Background
The Japan International Cooperation Agency (JICA) conducted an evaluation of this project in the Philippines. The project aimed to provide an opportunity for participants from Cambodia, Lao PDR, Myanmar, and Vietnam (CLMV) to improve their knowledge and techniques in the field of information and communication technology (ICT) for entrepreneurship.

Evaluation of sustainability
Sustainability is high, as there are strong indications that project benefits will continue. This is clearly shown by FIT-ED's commitment to take on future training programs to achieve the project objectives. It has established an e-group to allow networking among participants and for FIT-ED to share knowledge and enhance their capacities. As an institution committed to helping increase IT awareness in the government and business sectors of the ASEAN countries, FIT-ED will continue to be at the forefront of ASEAN activities related to ICT. On the financial side, FIT-ED's adequate and timely allocation of resources for the three training courses demonstrates its commitment to sustain the training program.

The participants have likewise expressed commitment to support the initiatives of the project. They have acknowledged the importance of ICT in their businesses, and a substantial number (84% of those interviewed) have already applied knowledge and skills absorbed from the training program in their work and businesses, in areas including website development, communications, textile and apparel, exports and imports of handicrafts, construction, coffee production, and government undertakings. The respondents said they had benefited much from the course, as they had foreseen at the beginning of the training program. Aside from using the strategic e-business plan drafted during the training program as a reference, the participants also made use of the internet and World Wide Web to apply the knowledge gained from the course to promote the sectors cited above.


DAC Principles for Evaluation of Development Assistance

The DAC also developed specific principles for the evaluation of development assistance (OECD/DAC, 1991, pp. 2-8) that address the following issues:

• the purpose of evaluation

• impartiality and independence

• credibility

• usefulness

• participation of donors and recipients

• donor cooperation

• evaluation programming

• design and implementation of evaluations

• reporting, dissemination and feedback

• application of these principles.

A review of the DAC Principles for Evaluation of Development Assistance was conducted in 1998. The review compared the DAC principles to those of other organizations and looked for consistency and possible areas to expand. Members responded with recommendations for possible revisions to the Principles, including:

• modifying the statement of purpose

• directly addressing the question of decentralized evaluations and participatory evaluations

• elaborating more on the principles and practices for recipient participation and donor cooperation

• introducing recent developments in evaluation activity such as performance measure[ment], [status], and success rating systems, and developing a typology of evaluation activity (OECD/DAC, 1998, p. 18).


These revisions have not, however, been finalized. Why does the evaluation community need principles and standards? They:

• promote accountability

• facilitate comparability

• enhance reliability and quality of services provided (Picciotto, 2005).

In 1994, the Joint Committee on Standards for Educational Evaluation, in which the American Evaluation Association (AEA) participates, published the Program Evaluation Standards in the United States. These standards were approved by the American National Standards Institute (ANSI) as the American National Standards for program evaluation. They were updated in 1998 and have been adapted by other evaluation associations, including those of developing countries.

In March 2006, the DAC Evaluation Network established the DAC Evaluation Quality Standards (OECD/DAC, 2006), which are currently being applied on a trial basis during a test phase. These standards identify the key pillars needed for a quality evaluation process and product. They are intended to:

• provide standards for the process (conduct) and products (outputs) of evaluations

• facilitate the comparison of evaluations across countries (meta-evaluation)

• facilitate partnerships and collaboration on joint evaluations

• better enable member countries to make use of each others' evaluation findings and reports (including good practice and lessons learned)

• streamline evaluation efforts.


The Ten Parts of the Evaluation Quality Standards:

• Rationale, purpose, and objectives of an evaluation

• Evaluation scope

• Context

• Evaluation methodology

• Information sources

• Independence

• Evaluation ethics

• Quality assurance

• Relevance of the evaluation results

• Completeness.

The Web site describing the new DAC Evaluation Quality Standards can be found at: http://www.oecd.org/dataoecd/30/62/36596604.pdf

In addition to the standards mentioned above, the World Bank's Independent Evaluation Group, at the request of the OECD/DAC Evaluation Network and other evaluation networks, developed indicative consensus principles and standards for evaluating Global and Regional Partnership Programs (GRPPs), which have some unique features that make evaluation complex. These indicative principles and standards are being tested through use and will be revised and endorsed in a few years. The Web site for these can be found at: www.worldbank.org/ieg/grpp/Sourcebook.html

Principles and standards are a major part of ethical considerations and are discussed further in Chapter 14: Guiding the Evaluator: Evaluation Ethics, Politics, Standards, and Guiding Principles of this text.


Evaluation and Independence

An important component of the credibility of development evaluation is independence. The OECD/DAC Glossary defines an independent evaluation as "an evaluation carried out by entities and persons free of the control of those responsible for the design and implementation of the development intervention." The definition also includes the following note:

The credibility of an evaluation depends in part on how independently it has been carried out. Independence implies freedom from political influence and organizational pressure. It is characterized by full access to information and by full autonomy in carrying out investigations and reporting findings (OECD/DAC, 2002, Glossary, p. 25).

Independence, however, is not isolation: interaction between evaluators, program managers, staff, and beneficiaries can enhance an evaluation and its use. An evaluation can be conducted internally or externally by evaluators who are organizationally under those responsible for making decisions about the design and implementation of the program interventions (i.e., management). Such evaluations are not independent, and as such they serve a learning purpose better than an accountability purpose. Earlier in this chapter we discussed internal and external evaluations; the point of that discussion was that neither is intrinsically independent, but the degree of independence of an evaluation unit or evaluation can be rated on certain characteristics. The heads of evaluation of the multilateral development banks (MDBs), who meet regularly as members of the Evaluation Cooperation Group (ECG), have identified four dimensions, or criteria, of evaluation independence:

• organizational independence

• behavioral independence

• avoidance of conflicts of interest

• protection from external influence.

Table 1.2 lists, for each of these criteria, the aspects considered and the indicators used to review an evaluation unit's level of independence.


Introducing Development Evaluation Table 1.2: Rating Template for Governance of Evaluation Organizations. Evaluation Cooperation Cooperation Group of the Multilateral Development Banks, 2002 Criterion

Aspects

Indicators

I. Organizational independence

The structure and role of the evaluation unit

Whether the evaluation unit has a mandate statement that clarifies that its scope of responsibility extends to all operations of the organization, and that its reporting line, staff, budget and functions are organizationally independent from the organization’s operational, policy, and strategy departments and related decision-making

The unit is accountable to, and reports evaluation results to, the head or deputy head of the organization or its governing Board

Whether there is a direct reporting relationship between the unit and the Management and/or Board of the institution

The unit is located organizationally outside the staff or line management function of the program, activity or entity being evaluated

The unit’s position in the organization relative to the program, activity or entity being evaluated

The unit reports regularly to the larger organization’s audit committee or other oversight body

Reporting relationship and frequency of reporting to the oversight body

The unit is sufficiently removed from political pressures to be able to report findings without fear of repercussions

Extent to which the evaluation unit and its staff are not accountable to political authorities, and are insulated from participation in political activities

Unit staffers are protected by a personnel system in which compensation, training, tenure and advancement are based on merit

Extent to which a merit system covering compensation, training, tenure and advancement is in place and enforced

Unit has access to all needed information and information sources

Extent to which the evaluation unit has unrestricted access to the organization’s staff, records, co-financiers and other partners, clients, and those of programs, activities or entities it funds or sponsors

The Road to Results: Designing and Conducting Effective Development Evaluations

Page 49

Chapter 1

Criterion

Aspects

Indicators

II. Behavioral Independence

Ability and willingness to issue strong, uncompromising reports

Extent to which the evaluation unit has issued reports that invite public scrutiny (within appropriate safeguards to protect confidential or proprietary information and to mitigate institutional risk) of the lessons from the organization’s programs and activities; propose standards for performance that are in advance of those in current use by the organization; and critique the outcomes of the organization’s programs, activities and entities

Ability to report candidly

Extent to which the organization’s mandate provides that the evaluation unit transmits its reports to the Management/Board after review and comment by relevant corporate units but without management-imposed restrictions on their scope and comments

Transparency in the reporting of evaluation findings

Extent to which the organization’s disclosure rules permit the evaluation unit to report significant findings to concerned stakeholders, both internal and external (within appropriate safeguards to protect confidential or proprietary information and to mitigate institutional risk).

Proper design and execution of an evaluation

Extent to which the evaluation unit is able to determine the design, scope, timing and conduct of evaluations without Management interference

Evaluation study funding

Extent to which the evaluation unit is unimpeded by restrictions on funds or other resources that would adversely affect its ability to carry out its responsibilities

Judgments made by the evaluators

Extent to which the evaluator’s judgment as to the appropriate content of a report is not subject to overruling or influence by an external authority

Evaluation unit head hiring/firing, term of office, performance review and compensation

Mandate or equivalent document specifies procedures for the hiring, firing, term of office, performance review and compensation of the evaluation unit head that ensure independence from operational management

Staff hiring, promotion or firing

Extent to which the evaluation unit has control over staff hiring, promotion, pay increases, and firing, within a merit system

Continued staff employment

Extent to which the evaluator’s continued employment is based only on reasons related to job performance, competency or the need for evaluator services

III. Protection from outside interference


IV. Avoidance of conflicts of interest

Aspect: Official, professional, personal, or financial relationships that might cause an evaluator to limit the extent of an inquiry, limit disclosure, or weaken or slant findings
Indicator: Extent to which there are policies and procedures in place to identify evaluator relationships that might interfere with the independence of the evaluation; these policies and procedures are communicated to staff through training and other means; and they are enforced

Aspect: Preconceived ideas, prejudices, or social/political biases that could affect evaluation findings
Indicator: Extent to which policies and procedures are in place and enforced that require evaluators to assess and report personal prejudices or biases that could imperil their ability to bring objectivity to the evaluation, and to which stakeholders are consulted as part of the evaluation process to ensure against evaluator bias

Aspect: Current or previous involvement with a program, activity, or entity being evaluated at a decision-making level, or in a financial management or accounting role, or seeking employment with such a program, activity, or entity while conducting the evaluation
Indicator: Extent to which rules or staffing procedures are present and enforced that prevent staff from evaluating programs, activities, or entities for which they have or had decision-making or financial management roles, or with which they are seeking employment

Aspect: Financial interest in the program, activity, or entity being evaluated
Indicator: Extent to which rules or staffing procedures are in place and enforced to prevent staff from evaluating programs, activities, or entities in which they have a financial interest

Aspect: Immediate or close family member is involved in or is in a position to exert direct and significant influence over the program, activity, or entity being evaluated
Indicator: Extent to which rules or staffing procedures are in place and enforced to prevent staff from evaluating programs, activities, or entities over which family members have influence

Sources: U.S. General Accounting Office, Government Auditing Standards, Amendment 3 (2002); OECD/DAC, Principles for Evaluation of Development Assistance (1991); INTOSAI, Code of Ethics and Auditing Standards (2001); Institute of Internal Auditors, Professional Practices Framework (2000); European Federation of Accountants, The Conceptual Approach to Protecting Auditor Independence (2001); Danish Ministry of Foreign Affairs, Evaluation Guidelines (1999); Canadian International Development Agency, CIDA Evaluation Guide (2000).
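
The criteria, aspects, and indicators above can be read as a self-assessment checklist. The minimal sketch below assumes a 0-2 rating per indicator and a simple average per criterion; the indicator labels, scale, and review threshold are simplifications introduced for this illustration, not part of the MDB framework itself.

```python
# Illustrative only: a minimal self-assessment of evaluation independence.
# The four criteria follow the MDB dimensions described above; the indicator
# labels, the 0-2 scale, and the 1.5 review threshold are assumptions made
# for this sketch, not part of the framework itself.

CRITERIA = {
    "organizational independence": [
        "reports to an oversight body",
        "insulated from political pressure",
        "unrestricted access to information",
    ],
    "behavioral independence": [
        "issues strong, uncompromising reports",
        "reports candidly without management-imposed restrictions",
    ],
    "protection from outside interference": [
        "controls design, scope, and timing of evaluations",
        "funding free of restrictions",
    ],
    "avoidance of conflicts of interest": [
        "conflict-of-interest policies enforced",
        "staff do not evaluate programs they managed or funded",
    ],
}

def assess(ratings: dict) -> None:
    """Print the average 0-2 rating per criterion and flag weak areas."""
    for criterion, indicators in CRITERIA.items():
        scores = [ratings.get(name, 0) for name in indicators]
        average = sum(scores) / len(scores)
        flag = "  <-- review" if average < 1.5 else ""
        print(f"{criterion}: {average:.1f}{flag}")

# Ratings for a hypothetical evaluation unit (0 = absent, 2 = fully met).
assess({
    "reports to an oversight body": 2,
    "insulated from political pressure": 1,
    "unrestricted access to information": 2,
    "issues strong, uncompromising reports": 1,
    "reports candidly without management-imposed restrictions": 2,
    "controls design, scope, and timing of evaluations": 2,
    "funding free of restrictions": 1,
    "conflict-of-interest policies enforced": 2,
    "staff do not evaluate programs they managed or funded": 2,
})
```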


Summary

Although evaluation has taken place for many centuries, only recently has it examined the effects of interventions on development. There are three kinds of evaluation: formative, summative, and prospective. Evaluations can be used for four purposes: ethical, managerial, decisional, and educational and motivational. Evaluation can provide information on strategy (are the right things being done?), operations (are things being done right?), and learning (are there better ways?). Monitoring is a routine, ongoing, internal activity used to collect information on an intervention. Both monitoring and evaluation measure and assess performance, but they do so in different ways. Evaluation can be conducted through internal evaluation, external evaluation, and/or participatory evaluation.

Development evaluation evolved from social science research, the scientific method, and the audit tradition. The role of the evaluator has changed over time, from an emphasis on the evaluator as auditor, accountant, and certifier to the evaluator as researcher and facilitator of participatory evaluations. Development evaluation is based on the OECD/DAC criteria of relevance, effectiveness, efficiency, impact, and sustainability. The OECD/DAC has also developed specific principles for evaluation of development assistance and evaluation quality standards. One indication of the growth of development evaluation is the increasing number of professional organizations and networks; there are now more than 75 such associations. An important part of credibility is independence. The degree of independence of an evaluation unit or evaluation can be rated against specific characteristics. The heads of the multilateral development banks (MDBs) identify four dimensions, or criteria, of evaluation independence: organizational independence, behavioral independence, avoidance of conflicts of interest, and protection from outside interference.


Chapter 1 Activities

Application Exercise 1.1

1. Imagine that you have been asked to justify why development evaluation should be a budgeted expense for a new national program. The program was designed to improve the education of families about effective health practices. What would you say in defense of development evaluation?

2. Interview an evaluator in your field (or review recent evaluation reports conducted in your field) to determine the extent to which standards and guiding principles are addressed in the evaluations this individual has seen. Where do the strengths seem to be? Where are the weaknesses? Share your findings with evaluation colleagues and listen to their comments and experiences. Do you see any patterns?


References and Further Reading

Active Learning Network for Accountability and Performance in Humanitarian Action (ALNAP) (2006). Evaluating humanitarian action using the OECD/DAC criteria. London: Overseas Development Institute. Retrieved March 19, 2008 from http://www.odi.org.uk/alnap/publications/eha_dac/pdfs/eha_2006.pdf

ALNAP (2002). Humanitarian Action: Improved Performance through Improved Learning. ALNAP Annual Review 2002. London: ALNAP/ODI.

ALNAP (2001). Humanitarian Action: Learning from Evaluation. ALNAP Annual Review 2001. London: ALNAP/ODI.

Adamo, Abra (2003). Mainstreaming gender in IDRC's MINGA program initiative: A formative evaluation. Retrieved March 19, 2008 from http://idrinfo.idrc.ca/archive/corpdocs/121759/5460.pdf

Asian Development Bank (ADB) (2007). Mongolia: Second financial sector program. Retrieved March 19, 2008 from http://www.oecd.org/dataoecd/59/35/39926954.pdf

Bhy, Y. Tan Sri Data' Setia Ambrin bin Huang, Auditor General of Malaysia (2006). The Role of the National Audit Department of Malaysia in Promoting Government Accountability. Paper presented at the 3rd ASOSAI Symposium, Shanghai, China, 13 September 2006. Retrieved March 18, 2008 from http://apps.emoe.gov.my/bad/NADRole.htm

Brooks, R. A. (1996). "Blending two cultures: State legislative auditing and evaluation," in Evaluation and auditing: Prospects for convergence, Carl Wisler, editor. New Directions for Evaluation, Number 71, Fall 1996. San Francisco: Jossey-Bass Publishers.

Burkhead, J. (1956 and 1961). Government budgeting. New York: John Wiley and Sons.

Callow-Heusser, Catherine (2002). Digital Resources for Evaluators. Retrieved July 17, 2007 from http://www.resources4evaluators.info/CommunitiesOfEvaluators.html

Chelimsky, Eleanor and William R. Shadish (1997). Evaluation for the 21st Century: A handbook. Thousand Oaks, CA: Sage Publications.


Chelimsky, Eleanor (1997). "The coming transformations in evaluation," in E. Chelimsky and W. R. Shadish (Eds.), Evaluation for the 21st Century: A handbook. Thousand Oaks, CA: Sage.

Chelimsky, Eleanor (1995). "Preamble: New dimensions in evaluation," in Evaluating country development policies and programs: New approaches for a new agenda, Number 67, Fall 1995. Robert Picciotto and Ray C. Rist, editors. San Francisco: Jossey-Bass Publishers (publication of the American Evaluation Association).

Derlien, Hans-Ulrich (1999). "Program evaluation in the Federal Republic of Germany," in Program evaluation and the management of government: Patterns and prospects across eight nations (Editor, Ray C. Rist). New Brunswick, NJ: Transaction Publishers.

DFID on SWAps: http://www.keysheets.org/red_7_swaps_rev.pdf

Evaluation Center, Western Michigan University: http://www.wmich.edu/evalctr/

Feuerstein, M. T. (1986). Partners in Evaluation: Evaluating Development and Community Programs with Participants. London: MacMillan, in association with Teaching Aids At Low Cost.

Fitzpatrick, Jody L., James R. Sanders, and Blaine R. Worthen (2004). Program evaluation: Alternative approaches and practical guidelines, Third Edition. New York: Pearson Education, Inc.

Fontaine, C. and E. Monnier (2002). "Evaluation in France," in International Atlas of Evaluation, J.-E. Furubo, R. C. Rist, and R. Sandahl, editors. New Brunswick, NJ: Transaction Publishers.

Furubo, Jan-Eric and Rolf Sandahl (2002). "Coordinated Pluralism," in International Atlas of Evaluation, Jan-Eric Furubo, Ray Rist, and Rolf Sandahl, editors. New Brunswick, NJ: Transaction Publishers.

Furubo, Jan-Eric, Ray Rist, and Rolf Sandahl, editors (2002). International Atlas of Evaluation. New Brunswick, NJ: Transaction Publishers.

General Accounting Office (GAO) (1986). Teenage pregnancy: 500,000 births a year but few tested programs. Washington, D.C.: U.S. GAO.

Government of Malawi (2007). National indicators for routine monitoring of quality of health services at Central Hospital. Retrieved September 7, 2007 from http://www.malawi.gov.mw/MoHP/Information/Central%20Hospital%20Indicators.htm


Human Rights Education. Retrieved March 18, 2008 from http://www.hrea.org/pubs/EvaluationGuide/

Insideout, Monitoring and Evaluation Specialists (2005). M&E In's and Out's. Insideout, Issue #3, October/November 2005. Retrieved July 17, 2007 from http://www.insideoutresearch.co.za/docs/newsletter3.pdf

International Organisation for Cooperation in Evaluation (IOCE). Members – Evaluation Associations around the World. Retrieved March 18, 2008 from http://ioce.net/members/eval_associations.shtml

International Organisation for Cooperation in Evaluation (IOCE) (2006). Retrieved July 17, 2007 from http://ioce.net/overview/general.shtml

International Development Evaluation Association (IDEAS) (2004). Retrieved July 17, 2007 from http://www.ideas-int.org/

The Institute of Internal Auditors. Retrieved July 17, 2007 from http://www.theiia.org

The International Organization of Supreme Audit Institutions. Retrieved July 17, 2007 from http://www.gao.gov/cghome/parwi/img4.html

Japan International Cooperation Agency (JICA) (2005). JICA Evaluation – Information and Communication. Retrieved March 19, 2008 from http://www.jica.go.jp/english/evaluation/project/term/as/2004/phi_03.html

KRA Corporation (1997). A guide to evaluating crime control of programs in public housing. Prepared for the U.S. Department of Housing and Urban Development, Office of Policy Development and Research. Retrieved September 10, 2007 from http://www.ojp.usdoj.gov/BJA/evaluation/guide/documents/guide_to_evaluating_crime.html

Kusek, Jody Zall and Ray C. Rist (2004). Ten steps to a results-based monitoring and evaluation system. Washington, D.C.: World Bank.

Lawrence, J. (1989). "Engaging Recipients in Development Evaluation—the 'Stakeholder' Approach." Evaluation Review, 13(3).

"Linkages between Audit and Evaluation in Canadian Federal Developments," Treasury Board of Canada. Retrieved March 18, 2008 from http://www.tbs-sct.gc.ca/pubs_pol/dcgpubs/TB_h4/evaluation03_e.asp


Mikesell, J. L. (1995). Fiscal administration: Analysis and applications for the public sector, 4th ed. Belmont, CA: Wadsworth Publishing Company.

Monitoring and Evaluation of Population and Health Programs, MEASURE Evaluation Project, University of North Carolina at Chapel Hill. Retrieved March 18, 2008 from http://www.cpc.unc.edu/measure

Molund, Stefan and Göran Schill (2004). Looking back, moving forward: SIDA evaluation manual. Stockholm: SIDA.

OECD (Organisation for Economic Co-operation and Development) (2007). Retrieved September 10, 2007 from http://www.oecd.org/document/48/0,3343,en_2649_201185_1876912_1_1_1_1,00.html

OECD (2007a). Development Co-operation Directorate (DCD-DAC). Retrieved September 10, 2007 from http://www.oecd.org/department/0,2688,en_2649_33721_1_1_1_1_1,00.html

OECD/DAC (2006). Evaluation Quality Standards (for test phase application). 30-31 March 2006. Retrieved July 17, 2007 from http://www.oecd.org/dataoecd/30/62/36596604.pdf

OECD/DAC (2002). OECD glossary of key terms in evaluation and results based management. Paris: OECD Publications.

OECD/DAC (1998). Review of the DAC Principles for Evaluation of Development Assistance. Retrieved July 17, 2007 from http://www.oecd.org/dataoecd/31/12/2755284.pdf

OECD/DAC (1991). Criteria for Evaluating Development Assistance. Retrieved July 17, 2007 from http://www.oecd.org/document/22/0,2340,en_2649_34435_2086550_1_1_1_1,00.html

OECD/DAC (1991). Principles for Evaluation of Development Assistance. Retrieved July 17, 2007 from http://siteresources.worldbank.org/EXTGLOREGPARPRO/Resources/DACPrinciples1991.pdf

Office of the Secretary of Defense (OSD) Comptroller Center (2007). The Historical Context. Retrieved September 10, 2007 from http://www.defenselink.mil/comptroller/icenter/budget/histcontext.htm


Picciotto, Robert (2005). "The value of evaluation standards: A comparative assessment." Journal of Multidisciplinary Evaluation, Number 3. Retrieved July 17, 2007 from http://evaluation.wmich.edu/jmde/content/JMDE003content/PDFs%20JMDE%20003/4_%20The_Value_of_Evaluation_Standards_A_Comparative_Assessment.pdf

Picciotto, Robert (2001). "Development Evaluation as a Discipline." Paper presented at the International Program for Development Evaluation Training (IPDET), 2001.

Proposal for Sector-wide Approaches (SWAp): http://enet.iadb.org/idbdocswebservices/idbdocsInternet/IADBPublicDoc.aspx?docnum=509733

Patton, Michael Quinn (1997). Utilization-focused Evaluation (3rd Ed.). Thousand Oaks, CA: Sage Publications.

Quesnel, Jean Serge (2006). "The Importance of Evaluation Associations and Networks," in UNICEF Regional Office for CEE/CIS and IPEN, New Trends in Development Evaluation, Issue #5. Retrieved July 17, 2007 from http://www.unicef.org/ceecis/New_trends_Dev_EValuation.pdf

Rossi, Peter and Howard Freeman (1993). Evaluation: A systematic approach. Thousand Oaks, CA: Sage.

Simpson, John and Graeme Diamond (Eds.) (2007). Oxford English Dictionary. Oxford: Oxford University Press.

Sonnichsen, R. C. (2000). High impact internal evaluation. Thousand Oaks, CA: Sage.

Treasury Board of Canada Secretariat (1993). "Linkages between Audit and Evaluation in Canadian Federal Developments." Retrieved July 17, 2007 from http://www.tbs-sct.gc.ca/pubs_pol/dcgpubs/TB_h4/evaluation03_e.asp

Tyler, C. and J. Willand (1997). "Public budgeting in America: A twentieth century retrospective." Journal of Public Budgeting, Accounting and Financial Management, Vol. 9, No. 2 (Summer 1997). Retrieved September 10, 2007 from http://www.ipspr.sc.edu/publication/Budgeting_in_America.htm

UNFPA List of Evaluation Reports and Findings. United Nations Population Fund. Retrieved March 18, 2008 from http://www.unfpa.org/publications/index.cfm

United Nations Development Programme Evaluation Office. Retrieved March 18, 2008 from www.undp.org/eo/


UNICEF Regional Office for CEE/CIS and IPEN (2006). New Trends in Development Evaluation, Issue #5. Retrieved March 18, 2008 from http://www.unicef.org/ceecis/New_trends_Dev_EValuation.pdf

United States Department of Housing and Urban Development, Office of Policy Development and Research (1997). A Guide to Evaluating Crime Control of Programs in Public Housing. Washington, DC: Prepared for the U.S. Department of Housing and Urban Development by KRA Corporation, pp. 1.2-1.4. Retrieved March 17, 2008 from http://www.ojp.usdoj.gov/BJA/evaluation/guide/documents/benefits_of_evaluation.htm

United States General Accounting Office (GAO) reports, free of charge. Retrieved March 18, 2008 from www.gao.gov/docsearch

Weiss, Carol (2004). Identifying the intended use(s) of an evaluation. Evaluation Guideline 6. The IDRC Web site, p. 1. Retrieved July 17, 2007 from http://www.idrc.ca/ev_en.php?ID=58213_201&ID2=DO_TOPIC

Wisler, Carl, editor (1996). Evaluation and auditing: Prospects for convergence. New Directions for Evaluation, Number 71, Fall 1996. San Francisco: Jossey-Bass Publishers.

World Bank. Retrieved March 18, 2008 from www.worldbank.org

The World Bank Participation Sourcebook (HTML format). Retrieved March 18, 2008 from http://www.worldbank.org/wbi/sourcebook/sbhome.htm


Web Sites

Evaluation Organizations (as of January 17, 2008; retrieved March 18, 2008 from http://ioce.net/members/eval_associations.shtml)

African Evaluation Association – www.afrea.org/
American Evaluation Association – www.eval.org
Australasian Evaluation Society – www.aes.asn.au
Brazilian Evaluation Association – www.avaliabrasil.org.br
Canadian Evaluation Society – www.evaluationcanada.ca
Danish Evaluation Society – www.danskevalueringsselskab.dk
Dutch Evaluation Society – www.videnet.nl/
European Evaluation Society – www.europeanevaluation.org
Finnish Evaluation Society – www.finnishevaluationsociety.net/
French Evaluation Society – www.sfe.asso.fr/
German Society for Evaluation (DeGEval) – http://www.degeval.de/
International Development Evaluation Association (IDEAS) – http://www.ideas-int.org/
IDEAS Web site with links to many organizations – http://www.ideas-int.org/Links.aspx
International Organisation for Cooperation in Evaluation – http://ioce.net/overview/general.shtml
International Organization of Supreme Audit Institutions (INTOSAI) – http://www.intosai.org/
International Program Evaluation Network (Russia and Newly Independent States) – www.eval-net.org/
Israeli Association for Program Evaluation – www.iape.org.il
Italian Evaluation Association – http://www.valutazioneitaliana.it/
Japan Evaluation Society – http://ioce.net/members/eval_associations.shtml
Latin American and Caribbean Programme for Strengthening the Regional Capacity for Evaluation of Rural Poverty Alleviation Projects (PREVAL) – www.preval.org/
Malaysian Evaluation Society – www.mes.org.my


Niger Network of Monitoring and Evaluation (ReNSE) – www.pnud.ne/rense/
Polish Evaluation Society – http://www.pte.org.pl/x.php/1,71/Strona-glowna.html
Quebec Society for Program Evaluation – www.sqep.ca
Red de Evaluacion de America Latina y el Caribe (ReLAC) – www.relacweb.org
South African Evaluation Network (SAENet) – www.afrea.org/webs/southafrica/
South African Monitoring and Evaluation Association (SAMEA) – www.samea.org.za
Spanish Evaluation Society – www.sociedadevaluacion.org
Sri Lanka Evaluation Association (SLEvA) – www.nsf.ac.lk/sleva/
Swedish Evaluation Society – www.svuf.nu
Swiss Evaluation Society (SEVAL) – www.seval.ch/de/index.cfm
Uganda Evaluation Association (UEA) – www.ueas.org
United Kingdom Evaluation Society – www.evaluation.org.uk
Wallonian Society for Evaluation (Belgium) – www.prospeval.org

Evaluation Standards

American Evaluation Association Guiding Principles – http://www.eval.org/Publications/GuidingPrinciples.asp
African Evaluation Association Evaluation Standards and Guidelines – http://www.afrea.org/
Australasian Evaluation Society ethical guidelines for evaluators – http://www.aes.asn.au/content/ethics_guidelines.pdf
DAC Evaluation Quality Standards (for test phase application), 30-31 March 2006 – http://www.oecd.org/dataoecd/30/62/36596604.pdf
German Society for Evaluation (DeGEval) standards – http://www.degeval.de/standards/standards.htm
Italian Evaluation Association guidelines – http://www.valutazioneitaliana.it/statuto.htm#Linee
Program Evaluation Standards (updated 1998) – http://www.eval.org/EvaluationDocuments/progeval.html
Swiss Evaluation Society (SEVAL) standards – http://seval.ch/


United Nations Evaluation Group (UNEG) Norms for Evaluation in the UN System – http://www.uneval.org/docs/ACFFC9F.pdf
United Nations Evaluation Group (UNEG) Standards for Evaluation in the UN System – http://www.uneval.org/docs/ACFFCA1.pdf
M&E In's and Out's, InsideOut, Issue #3, October/November 2005 – http://www.insideoutresearch.co.za/docs/newsletter3.pdf


Chapter 2

Understanding Issues Driving Development Evaluation

Introduction

The previous chapter discussed the history of development evaluation. Development evaluation is a relatively new field that changes in response to emerging issues in developed and developing countries. This chapter looks at some of the current issues affecting both developed and developing countries. The chapter has two parts:

• Evaluation in Developed and Developing Countries: An Overview
• Emerging Development Issues: What Are the Evaluation Implications?


Part I: Evaluation in Developed and Developing Countries: An Overview

Evaluation can help countries learn how well, and to what extent, they are achieving their development goals, such as the Millennium Development Goals. Policy makers and others can use key insights and recommendations drawn from evaluation findings to initiate change. Evaluation enables countries to learn from experience so as to improve the design and delivery of current projects, programs, and policies and/or to change future directions. Many developed and developing countries have put monitoring and evaluation systems in place to assist with development. These systems can be set up in different ways, depending on needs and the resources available. This section begins by looking at what developed countries are doing and the different approaches they have taken to evaluation; it then turns to the kinds of evaluation taking place in developing countries.

Evaluation in Developed Countries

A large majority of the thirty OECD countries now have mature monitoring and evaluation (M&E) systems. Arriving there was neither an easy nor a linear process, and these countries differ, often substantially, in their paths, approach, style, and level of development. Furubo, Rist, and Sandahl (2002) edited the Evaluation Atlas as an attempt to map evaluation cultures in OECD countries and to explain the observed patterns. In the descriptive part of the Evaluation Atlas, 23 countries were studied using nine different variables. Each country was given a score between 0 and 2 for each of the nine variables, with a score of 2 indicating a high level of evaluation maturity and 0 a low level.


The nine variables they rated were:

1. Evaluation takes place in many policy domains.
2. There is a supply of evaluators specializing in different disciplines who have mastered different evaluation methods, and who conduct evaluations.
3. There is a national discourse concerning evaluation in which more general discussions are adjusted to the specific national environment.
4. There is a profession with its own societies or frequent attendance at meetings of international societies, and at least some discussion concerning the norms and ethics of the profession.
5. [There are] institutional arrangements in the government for conducting evaluations and disseminating their results to decision makers.
6. Institutional arrangements are present in Parliament [or other legislative bodies] for conducting evaluations and disseminating them to decision makers.
7. An element of pluralism exists: that is, within each policy domain there are different people or agencies commissioning and performing evaluations.
8. Evaluation activities [also take place] within the Supreme Audit Institution.
9. The evaluations done should not just be focused on the relation between inputs/outputs or technical production. Some public sector evaluations must have program or policy outcomes as their object (Furubo et al., 2002, pp. 7-9).

According to these criteria, Australia, Canada, the Netherlands, Sweden, and the United States had the highest "evaluation culture rankings" among OECD countries (Furubo, Rist, & Sandahl, 2002, p. 10).

The OECD countries have developed evaluation cultures and M&E systems in response to varying degrees of internal and external pressures. For example, France, Germany, and the Netherlands developed such a culture in response to both strong internal and external (mostly EU-related) pressures, while countries such as Australia, Canada, Korea, and the United States were motivated mostly by strong internal pressures (Furubo et al., 2002). Interestingly, the first wave of OECD countries was motivated to adopt evaluation cultures mostly by strong internal pressures, such as domestic planning, programming, and budgeting imperatives for new socio-economic spending programs, as well as parliamentary oversight.
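
To make the scoring scheme concrete, the sketch below works through the arithmetic: nine variables, each rated 0 to 2 and summed, for a maximum "evaluation culture" score of 18 per country. The country names and ratings shown are invented for illustration; they are not the Atlas data.

```python
# Hypothetical illustration of the scoring logic: nine variables, each rated
# 0 (low) to 2 (high) and summed per country, for a maximum score of 18.
# The country names and ratings below are invented, not the Atlas data.

VARIABLES = [
    "evaluation in many policy domains",
    "supply of evaluators across disciplines",
    "national discourse on evaluation",
    "professional societies and norms",
    "government institutional arrangements",
    "parliamentary institutional arrangements",
    "pluralism of commissioners and performers",
    "evaluation within the Supreme Audit Institution",
    "focus on outcomes, not just inputs/outputs",
]

ratings = {
    "Country A": [2, 2, 2, 2, 2, 1, 2, 2, 2],
    "Country B": [1, 1, 0, 1, 1, 0, 1, 1, 0],
}

for country, scores in ratings.items():
    assert len(scores) == len(VARIABLES), "one rating per variable"
    print(f"{country}: {sum(scores)} out of {2 * len(VARIABLES)}")
```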


Several factors contributed to the adoption of an evaluation culture in the pioneering countries in particular. Many of the earliest adopters of evaluation systems were predisposed to do so because they had democratic political systems, strong empirical traditions, civil servants trained in the social sciences (as opposed to strict legal training), and efficient administrative systems and institutions. Countries with high levels of expenditure on education, health, and social welfare also adopted evaluation mechanisms in these areas, which then spilled over into other areas of public policy. The OECD countries that were early adopters of an evaluation culture were also instrumental in spreading it to other countries by disseminating evaluation ideas and information and by launching evaluation organizations, training institutes, networks, and consulting firms.

In contrast, many of the latecomer OECD countries in developing an evaluation culture (e.g., Italy, Ireland, and Spain) tended to respond to evaluation issues mainly because of strong external pressures, primarily EU membership requirements, including access to EU structural development funds. These latecomers were also heavily influenced by the evaluation culture of the first-wave countries, as well as by the evaluation culture rooted in the international organizations with which they interact.

The Tavistock Institute (2003, p. 86) describes an idealized model, or map for a journey, with four stages and intermediate destinations towards developing evaluation capacity. The Institute concedes that this is only an idealized model; actual cases differ, some considerably. The four stages are:

• Stage One: Mandating Evaluation
• Stage Two: Coordinating Evaluation
• Stage Three: Institutionalizing Evaluation
• Stage Four: Towards an Evaluation System (Tavistock, 2003, pp. 86-98).

Stage one usually begins with external pressure that requires evaluation through norms, regulations, or policy objectives. Even when the driving force comes from within, a certain degree of external scrutiny is likely. Stage two includes two kinds of actions in response to the formal and rule-based first-stage evaluation policy:

• providing guidelines and basic tools
• professionalizing staff as a way of improving quality.


Stage three usually begins after a central unit is up and running. It moves towards institutionalization and includes two steps, usually taking place simultaneously:

• creating decentralized units
• improving the supply of evaluation expertise.

Stage four is building a fully operative evaluation system in which evaluation is incorporated into policy making, program management, and governance. It includes:

• establishing stronger internal links within the system
• opening up the network to external stakeholders.

Whole-of-Government, Enclave, and Mixed Approaches

The pioneering and the latecomer OECD countries also differed in their approach to creating evaluation systems. There are essentially three approaches:

• the whole-of-government approach
• the enclave approach
• mixed approaches.

The Whole-of-Government Approach

The whole-of-government approach was adopted in some of the early M&E countries, such as Australia. This approach involves a broad-based, comprehensive establishment of M&E across the government, and sustained government commitment is essential. A whole-of-government framework cannot be developed overnight. For it to succeed, it is first necessary to:

• win support from the government
• develop the necessary skills
• set up civil service structures and systems to make full use of M&E findings.

For developing countries, sustained government support must be complemented by steady support from development assistance agencies. It can take at least a decade to embed such a framework at the whole-of-government level in a sustainable manner (World Bank OED, 1999, p. 3).


With the adoption of the Millennium Development Goals (MDGs), many developing countries are looking to design and implement comprehensive, whole-of-government evaluation systems. In addition, with the growing emphasis on results in international aid lending, more donors, governments, and other institutions are providing support to developing countries to build evaluation capacity and systems. Often, different ministries are at different stages in their ability to take on the establishment of an evaluation system. A whole-of-government strategy, then, may not be able to move all ministries simultaneously; there may be a need for sequencing among ministries in developing these systems. Frequently, innovations at one level will filter both horizontally and vertically to other levels of government.

Enclave Approach

The second approach, the enclave approach, is more limited, focusing on one part or sector of government. For example, the focus might be on a single ministry, as has been the case in Mexico (social development), Jordan (planning), and the Kyrgyz Republic (health). Many countries, especially developing countries, may not yet be in a position to adopt the kind of sweeping change that underlies a whole-of-government approach. Working with one ministry that has a strong champion may be the best course of action.

Mixed Approach

Other countries, such as Ireland, have adopted a third, blended, or mixed approach to evaluation. While some areas are evaluated comprehensively (e.g., projects financed by the EU Structural Funds), other areas receive more sporadic attention. The government of Ireland began its evaluation system with an enclave approach but has moved towards a more comprehensive approach with respect to government expenditure programs (Lee, 1999, pp. 78-79). The mixed approach may also be a valid alternative for some developing countries.

Examples of Evaluation in OECD Countries

We will now look at examples of how evaluation is conducted in several OECD countries: France, the United States, Australia, and Ireland.


France: Adopting a New Program Approach to Evaluation

France had been among the group of OECD countries slowest to move towards a mature evaluation system; indeed, it lagged behind many transition and developing economies. Various incremental reform efforts were attempted during the late 1980s and throughout the 1990s. Then, in 2001, the French government passed sweeping legislation that replaced the 1959 financial constitutional by-law, eliminated line-item budgeting, and instituted a new program approach. The new constitutional by-law, phased in over a five-year period (2001-2006), had two primary aims:

• to reform the public management framework, to make it results- and performance-oriented
• to strengthen parliamentary supervision.

As former Prime Minister Lionel Jospin noted: "The budget's presentation in the form of programs grouping together expenditure by major public policy should give both members of Parliament and citizens a clear picture of the government's priorities and the cost and results of its action." Approximately 100 programs were identified, and financial resources were budgeted against them. Every program budget submitted to Parliament must have a statement of precise objectives and performance indicators. Public managers have greater freedom and autonomy in allocating resources, but in return they are held more accountable for results. The new budget process is thus results-driven: future budget requests are to include annual performance plans detailing the expected versus actual results for each program, and annual performance reports are to be included in budgetary reviews. As a consequence, parliamentarians will be better able to evaluate the performance of government programs.

In line with observations about the political nature of evaluation, this reform initiative alters some of the political and institutional relationships within the French government. Parliament has been given increased budgetary powers. "Article 40 of the Constitution previously prohibited members of Parliament from tabling amendments that would increase spending and reduce revenue. They are able to change the distribution of appropriations among programs in a given mission." Parliament was able to vote on:

• revenue estimates
• appropriations for each mission
• limits on the number of state jobs created
• special accounts and specific budgets.


In addition, the parliamentary finance committees have monitoring and supervisory responsibilities concerning the budget.

Public servants reacted to the changes immediately. A new bureaucracy of control emerged: new accountants, more audits, more questionnaires from the audit offices and the inspectors, and more requests for reporting. Managers had difficulty adapting to the constraint of achieving quantitative output targets, which crowded out the quality of services that did not appear in the objectives. When there are only quantitative objectives, people do what is asked of them, but no more. As for the quality of service, "no mechanism of competition or of strong consumerist pressures make it possible to guarantee it" (Trosa, 2008, p. 8). Societies are complex: to assist development, some people need financial assistance, others need trust and the assumption of responsibility, and others are living happily. The question is how to summarize these contradictions in one formula (Trosa, 2008, p. 13).

Combining the previous model of evaluation with the new one did not allow freedom of management, creativity, and innovation; an alternative model will be needed. Another lesson from the French experience is that "enhancing internal management can not be done without linking it to the internal governance of public sectors" (Trosa, 2008, p. 2). Trosa also suggests that the new system does not need to be demolished but widened, by clearly discussing required purposes while encouraging logics of action and not only the use of tools (Trosa, 2008, p. 13).

(Sources: Towards New Public Management Newsletter, 2001; Trosa, 2008.)


United States: The Government Performance and Results Act (GPRA) of 1993

One key development in United States government evaluation in recent years is the Government Performance and Results Act (GPRA) of 1993, which instituted results-based evaluation in all U.S. government agencies and has directly affected how evaluation is deployed across the U.S. government. GPRA took a whole-of-government approach, beginning with pilots and then phasing in the changes. Performance measurement in the United States began with local governments in the 1970s, spread to state governments, and eventually reached the federal level with the enactment of GPRA in 1993. "The US federal government actually joined the performance game later than other governments in the US and abroad" (Newcomer, 2001, p. 338).

The purposes of GPRA are to:

1) improve the confidence of the American people in the capability of the Federal Government, by systematically holding Federal agencies accountable for achieving program results
2) initiate program performance reform with a series of pilot projects in setting program goals, measuring program performance against those goals, and reporting publicly on their progress
3) improve Federal program effectiveness and public accountability by promoting a new focus on results, service quality, and customer satisfaction
4) help Federal managers improve service delivery, by requiring that they plan for meeting program objectives and by providing them with information about program results and service quality
5) improve congressional decision making by providing more objective information on achieving statutory objectives and on the relative effectiveness and efficacy of Federal programs and spending
6) improve internal management of the Federal Government.

GPRA forced federal agencies to focus on their missions and goals, how to achieve them, and how to improve their structures and business processes. Agencies are required to submit a five-year strategic plan for their programs and to update it every three years. They must also identify any "key external factors" that might have a significant effect on their ability to achieve goals and objectives. Agencies must publish annual program performance reports and must measure their performance to ensure that they are meeting goals and making informed decisions. Performance measures need to be based on program-related characteristics and must be complete, accurate, and consistent. The data collected must be used to improve organizational processes, identify gaps, and set performance goals (GAO, 2003, GPRA-GGD-96-118).

A 2003 survey of 16 programs across 12 United States government agencies found that many federal programs had already made use of regularly collected outcome data to help them improve their programs. For example, outcome data were used to trigger corrective action; identify and encourage "best practices"; motivate [and recognize staff]; and plan and budget.


At the same time, the survey found some continuing obstacles, ones that can affect any organization, to the use of outcome data:

1) lack of authority or interest to make changes
2) limited understanding of the use of outcome data
3) outcome data problems (e.g., old data, non-disaggregated data, lack of specificity, need for intermediate data)
4) fear of "rocking the boat" (Hatry, Morley, Rossman, & Wholey, 2003, pp. 11-13).

GPRA was later extended to integrate performance and budgeting. Efforts were also made across the government to align the timing of GPRA strategic and annual planning and reporting more closely. ChannahSorah (2003, pp. 11-13) summarizes GPRA as "just 'good business.' Its requirements have provided government departments with tools for very basic ways of conducting business in sensible ways: set performance goals and measure both long and short-term outcomes. Any organization seeking to provide improved quality of life, greater quantity of services, and enhanced overall quality of customer services must have a vision and a mission, set goals and objectives, and must measure results."

The GAO found that many U.S. agencies face significant challenges in implementing GPRA and in establishing an agency-wide results orientation (GAO, 2003, pp. 7-8). Federal managers surveyed reported that agency leaders did not consistently demonstrate a strong commitment to achieving results. Furthermore, these managers believed that agencies did not always positively recognize employees for helping the agency accomplish its strategic goals. The GAO also reported that high-performing organizations seek to shift the focus of management and accountability from activities and processes to contributions and achieving results. However, although many federal managers surveyed reported that they were held accountable for the results of their programs, only a few reported that they had the decision-making authority they needed to help their agencies accomplish strategic goals. Finally, the GAO found that although managers increasingly reported having results-oriented performance measures for their programs, the extent to which they reported using performance information for key management activities mostly declined from earlier survey levels.

The GAO (2003, p. 8) also reports that agencies need to transform their organizational cultures so that they are more results-oriented, customer-focused, and collaborative. Leading public organizations in the United States and abroad have found that strategic human capital management must be the centerpiece of any serious change management initiative and of efforts to transform the cultures of government agencies. Performance management systems are integral to strategic human capital management. Such systems can be key tools for maximizing performance by aligning institutional performance measures with individual performance and creating a "line of sight" between individual and organizational goals. Leading organizations use their performance management systems as a key tool for aligning institutional, unit, and employee performance; achieving results; accelerating change; managing the organization day to day; and facilitating communication throughout the year so that discussions about individual and organizational performance are integrated and ongoing.
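
The expected-versus-actual comparison that GPRA-style annual performance reports call for reduces to very simple arithmetic. The sketch below is illustrative only; the program names, indicators, targets, and actuals are invented for the example, not drawn from any agency report.

```python
# Illustrative sketch of an expected-versus-actual performance comparison
# of the kind GPRA-style annual performance reports require.
# Program names, indicators, targets, and actuals are invented.

programs = [
    # (program, performance indicator, target, actual)
    ("Adult literacy", "learners completing the course", 5000, 4600),
    ("Rural clinics", "children fully immunized", 12000, 12750),
]

for name, indicator, target, actual in programs:
    achievement = actual / target * 100
    status = "target met" if actual >= target else "target not met"
    print(f"{name} - {indicator}: {actual:,}/{target:,} ({achievement:.0f}%, {status})")
```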


Australia's Whole-of-Government Model

Australia was one of the early pioneers in developing evaluation systems, starting in 1987. It had intrinsic advantages conducive to building a sound evaluative culture and structure:

• strong human, institutional, and management capacity in the public sector
• a public service known for integrity, honesty, and professionalism
• well-developed financial, budgetary, and accounting systems
• a tradition of accountability and transparency
• credible, legitimate political leaders.

Two main factors contributed to Australia's success in building strong evaluation systems. Initially, budgetary constraints prompted the government to look at ways of achieving better value for money. Australia also had two important institutional champions for evaluation: the Department of Finance and the Australian National Audit Office. Australia chose a whole-of-government strategy (as opposed to an enclave strategy, whereby one or two agencies would first pilot the evaluative approach); such a strategy aims to bring all ministries on board, both the leading and the reluctant. It also had the support of Cabinet members and key ministers, who placed importance on using evaluation findings to better inform decision making (Mackay, 2002).

The first generation (1987-1997) began during a time of severe budget pressures. Many public sector reforms had given line departments and agencies autonomy, but they failed to conduct monitoring and evaluation; for this reason, the government required departments and agencies to evaluate. The objectives of this first generation of M&E systems were to:

• aid budget decision making
• strengthen accountability within the government
• aid managers in ministries and agencies.

The first-generation M&E system was designed and managed by the Department of Finance. Evaluations were mandatory every three to five years for every program, and sector ministries had to prepare rolling, three-year plans for major evaluations. A broad range of evaluation types was used; by the mid-1990s some 160 evaluations were under way at any one time. There were few formal requirements for collecting or reporting performance indicators. Because evaluation findings went to Cabinet, they influenced nearly 80 percent of new policy proposals and two-thirds of savings options in Cabinet's budget decision making. Other strengths of the system were the heavy use of evaluation findings by sector departments and agencies and the collaborative nature of the evaluation work. The first-generation system also had weaknesses: the quality of evaluations was uneven, central support for advanced evaluation training was insufficient, formal requirements for the collection and reporting of performance indicators were lacking, and ministries complained that the system placed a burden on their administration.


The second generation, in Mackay's account, began when a new conservative government made sweeping changes: it significantly reduced the size of the civil service, dismantled the policy-advising system for the budget process, considerably reduced central oversight and "bureaucratic" rules, and significantly downsized the Department of Finance, reducing its role in providing advice during the budget process. In response to these changes, the M&E system also had to change. The old evaluation strategy was dismantled and evaluation was "deregulated": it was encouraged but no longer required. The emphasis shifted to performance monitoring (i.e., performance indicators) of outputs and outcomes, reported to Parliament both ex ante and ex post.

The Australian National Audit Office reported that the performance of this second-generation system was highly inadequate. Data collection was poor because standards were weak. Little use was made of targets or benchmarking. A great deal of data was collected on government outputs, but little information was available on outcomes, and real analysis of performance information was lacking. These shortcomings left the parliamentary committees very unhappy with the information they received. Even so, a few departments (e.g., Family and Community Services, Education, and Health) still learned from good-practice evaluations.

The third generation (from 2006) began with ongoing concerns about the difficulty of implementing complex government programs and about "connectivity," that is, coordination across ministries and agencies, both federal and state. There was also a desire on the part of the Department of Finance to rebuild its role in budget and policy advising (Mackay, 2007). Two types of review were set up to ensure that spending was efficient, effective, and aligned with government priorities:

• strategic reviews (seven per year), focusing on major policy and spending areas and on their purpose, design, management, results, and future improvements
• program reviews (seven per year), focusing on individual programs and their alignment with government objectives, effectiveness, duplication, overlap, and savings.

The Department of Finance was to manage the reviews, and the decision was made to mainstream the system. US$17 million was committed for reviews over four years. Retired senior civil servants were hired to head two pilot reviews, and the performance monitoring framework requirements of the second generation were retained.

What lessons were learned from the evolution of Australia's evaluation system?

• Program coordination, implementation, and performance (results) are permanent challenges for all governments.
• The nature of government decision making determines the level of demand for M&E (and review) information.
• It takes time to build an M&E (and review) system.
• It is difficult to strike a balance between centralized, top-down needs and devolved, bottom-up ones.
• Most departments are not naturally inclined to do evaluations; they consider them costly and dangerous.


Evaluation Capacity Development in Ireland

In the late 1960s, Ireland, along with many other countries, took an interest in rational analysis and its application to planning and budgeting. A need was identified for objective studies of social programs and for those involved to acquire the skills to carry out such studies (Friis, 1965; Public Services Organisation Review Group, 1969). A number of initiatives were undertaken to develop evaluation skills. Despite these initiatives, the scope of the resulting evaluations was limited, and they had little influence on decision making until the late 1980s. This was partly due to the lack of a strong tradition of evaluating policies and programs in Ireland. At the time there was also an economic crisis, with high taxation, and interest in evaluation as a tool of good governance took second place to the drive to control public expenditure.

An exception was European Union (EU) expenditure in Ireland. The EU funds are applied through a number of operational programs run under a jointly agreed Irish-EU Community Support Framework (CSF) plan. The EU has been a major source of funding support, and it demanded evaluation consistently and systematically. The EU-funded program evaluations significantly influenced two main policy areas: industrial training and employment-creation schemes, and anti-poverty and other community development programs. The labor market area tended to focus on quantitative measurement of outcomes, while the community development initiatives focused on qualitative methods concerned with description rather than outcome measurement (Boyle, 2005, pp. 1-2).

From 1989 to 1993, two independent evaluation units were established: one by the European Social Fund and one for industry evaluation. Since 1989, evaluation of the EU Structural Funds has been a formal requirement for those receiving assistance and has led to further developments in evaluation. During the period 1994-1999, a central evaluation unit was established under the Department of Finance, a third evaluation unit was established to cover evaluations in agriculture and rural development, and external evaluators were appointed for operational program expenditures and the CSF plan. Beyond EU expenditure, from 1999 to 2006 there was renewed interest in evaluating public expenditure in Ireland. The independent evaluation units were abolished, the capacity of the central evaluation unit was increased to take on extra responsibilities, and significant use was made of external evaluators to conduct the mid-term evaluation of the operational programs and the National Development Plan (Boyle, 2005, pp. 2-3).



Evaluation in Developing Countries

Developing countries face challenges both similar to and different from those of developed countries in moving towards and building their own evaluation systems. Demand for and ownership of such a system, the most basic requirement, may be more difficult to establish in developing countries. For an evaluation system to be established and take hold in any country, minimum requirements must be met: interested stakeholders and commitments to transparency and good governance. Remember the nine dimensions of evaluation culture introduced near the beginning of this chapter:

1. Evaluation takes place in many policy domains.
2. There is a supply of evaluators specializing in different disciplines who have mastered different evaluation methods, and who conduct evaluations.
3. There is a national discourse concerning evaluation in which more general discussions are adjusted to the specific national environment.
4. There is a profession with its own societies or frequent attendance at meetings of international societies, and at least some discussion concerning the norms and ethics of the profession.
5. [There are] institutional arrangements in the government for conducting evaluations and disseminating their results to decision makers.
6. Institutional arrangements are present in Parliament [or other legislative bodies] for conducting evaluations and disseminating them to decision makers.
7. An element of pluralism exists: that is, within each policy domain there are different people or agencies commissioning and performing evaluations.
8. Evaluation activities [also take place] within the Supreme Audit Institution.
9. The evaluations done should not just be focused on the relation between inputs/outputs or technical production. Some public sector evaluations must have program or policy outcomes as their object (Furubo et al., 2002, pp. 7-9).


Weak political will and institutional capacity may slow progress. So too can difficulties in inter-ministerial cooperation and coordination, which can impede progress toward strategic planning. Indeed, a lack of sufficient governmental cooperation and coordination can be a factor in both developed and developing countries. To emerge and mature, evaluation systems need political will in the government and champions who are highly placed and willing to assume the political risks of advocating evaluation. The presence of a national champion or champions can go a long way towards helping a country develop and sustain evaluation systems. We know of no instance where an M&E system has emerged in the public sector of a developing country without a champion present.

Many developing countries are still struggling to put together strong, effective institutions. Some may require civil service reform or reform of legal and regulatory frameworks. In this context, the international development community is trying to help put these basic building blocks in place. The challenge is building institutions, undertaking administrative and civil service reforms, and/or revamping legal and regulatory codes while at the same time establishing evaluation systems. Instituting evaluation systems could itself help inform and guide the government in undertaking needed reforms in all of these areas.

Developing countries must first have or establish a foundation for evaluation, and many are moving in this direction. Establishing a foundation means having basic statistical systems and data, as well as key budgetary systems. Data and information must be of appropriate quality and quantity. Developing countries, like developed ones, need to know their baseline conditions; that is, they need to know where they currently stand in relation to a given program or policy. Additionally, capacity in the workforce is needed to develop, support, and sustain these systems: officials need to be trained in modern data collection, monitoring methods, and analysis. This can be difficult for many developing countries (Schacter, 2002, p. 18).

In response to these challenges, many aid organizations have ramped up their efforts to build institutional capacity. The methods may include technical and financial assistance to build statistical systems, training in monitoring and evaluation, diagnostic readiness assessments, and results- or performance-based budget systems. The trend towards results-based country assistance strategies may help model good practice. Moreover, assistance to developing countries in producing country-led Poverty Reduction Strategies (PRS) may be another way to build such capacity.


As part of the effort to support local capacity in developing countries, development organizations are also creating development networks, such as new online networks and participatory communities that share expertise and information, including, for example, the Development Gateway. It can still be argued that circumstances in Bangladesh, China, Costa Rica, or Mali are unique and distinct, and that the experience of one country will not necessarily translate to another. But once it is accepted that there is very little generic development knowledge — that all knowledge has to be gathered and then analyzed, modified, disassembled, and recombined to fit local needs — the source is immaterial. The new motto is: "Scan globally, reinvent locally" (UNDP, 2002, p. 18).

IPDET

In 1999, the Independent Evaluation Group of the World Bank conducted a survey to identify training in development evaluation. It found little beyond a few one-off programs for development organizations. Based on these findings, the International Program for Development Evaluation Training (IPDET) was created in 2000 as a major effort to build evaluation capacity in developing countries and in organizations focused on development issues. It trains those working, or about to work, in development evaluation to design and conduct such evaluations. Held annually in Ottawa, Canada, on the Carleton University campus, the four-week training program is a collaboration between the Independent Evaluation Group (IEG) of the World Bank Group, the Faculty of Public Affairs at Carleton University, and other supporting organizations, including the Canadian International Development Agency (CIDA), the United Kingdom's Department for International Development (DfID), the Swiss Agency for Development and Cooperation (SDC), the Norwegian Agency for Development Cooperation (Norad), the International Development Research Centre (IDRC), the Geneva International Centre for Humanitarian Demining, the Swedish International Development Cooperation Agency (SIDA), the Commonwealth Secretariat, and the Dutch Ministry of Foreign Affairs.


The IPDET course begins with a two-week Core Program. The Core Program devotes special attention to monitoring and evaluating the implementation of poverty reduction strategies, emphasizes results-based monitoring and evaluation and stakeholder participation, and covers the basic concepts and practices common to most development evaluations. It offers more than 80 instructional hours, complete with tools, case studies, discussion groups, and readings. Almost one-third of instructional time is devoted to structured work group sessions, which give participants with similar subject matter interests the opportunity to work together on real-world evaluation issues and problems. This is the IPDET way of maximizing the opportunity for peer learning. Following the two-week Core Program are two weeks of customizable training through 28 workshops offered by an array of highly respected and well-known instructors. Examples of workshops include:

• designing impact evaluations under constraints
• designing and building results-based monitoring and evaluation systems
• World Bank country, sector, and project evaluation approaches
• using mixed methods for development evaluations
• participatory monitoring and evaluation
• evaluating post-conflict situations
• conducting international joint evaluations
• evaluating HIV/AIDS programs
• evaluation with hidden and marginal populations.

IPDET remains one of the few programs in development evaluation that can be counted on to occur annually. Entering its eighth year in 2008, it has trained over 2,000 participants from government ministries, bilateral and multilateral aid organizations, non-governmental organizations (NGOs), and other institutions. The IPDET Core Program is now offered regionally and nationally. Customized versions of IPDET have been delivered in Botswana, Uganda, South Africa, Tunisia, China, India, Thailand, Canada, Australia (for the Australasian NGO community), and Trinidad and Tobago. Over time these shorter regional programs have come to be known as mini-IPDETs. Mini-IPDETs are highly interactive, employing a mix of presentations, discussion groups, group exercises, and case studies. The programs are designed to maximize opportunities for peer learning, with a focus on discussing real-world issues and learning practical solutions.


Based on the demonstrated success of IPDET through annual and impact evaluations, IEG is now partnering with the Chinese Ministry of Finance (International Department), the Asia-Pacific Finance and Development Center, and the Asian Development Bank on the first institutionalized regional offering, the Shanghai IPDET (SHIPDET), offered twice each year, once nationally and once regionally. This illustrates efforts to help develop evaluation capacity within developing countries and regions.

New Evaluation Systems

Attempts to develop an evaluation system and to shed light on resource allocation and actual results may meet with political resistance, hostility, and opposition. In addition, given the nature of many developing country governments, building an evaluation system can lead to a considerable reshaping of political relationships. Creating a mature evaluation system requires interdependency, alignment, and coordination across multiple governmental levels, which can be a challenge. In many developing countries, governments are loosely interconnected and are still working toward building strong administrative cultures and transparent financial systems. As a result, some governments may have only vague information about the amount and allocation of available resources. They may also need more information about whether these resources are being used for their intended purposes. Measuring government performance in such an environment is an approximate exercise.

Many developed and developing countries are still working toward linking performance to a public expenditure framework or strategy. If these linkages are not made, there is no way to determine whether the programs that budgetary allocations ultimately support are a success or a failure. Some developing countries are beginning to make progress in this area. For example, in the 1990s, Indonesia started to link evaluation to the annual budgetary allocation process. "Evaluation is seen as a tool to correct policy and public expenditure programs through more direct linkages to the National Development Plan and the resource allocation process" (Guerrero, 1999, p. 5). In addition, some developing countries, for example Brazil, Chile, and Turkey, have made progress with respect to linking expenditures to output and outcome targets. The government of Brazil issues separate governmental reports on outcome targets (OECD, PUMA, 2002, pp. 5-7).


Many developing countries still operate with two budget systems: one for recurrent expenditures and another for capital/investment expenditures. Until recently, Egypt had its Ministry of Finance overseeing the recurrent budget and its Ministry of Planning overseeing the capital budget. Consolidating these budgets within one ministry has made it easier for the government to consider a broad-based evaluation system to help ensure that the country's goals and objectives are met.

Given the particular difficulties of establishing evaluation systems in developing countries, adopting an enclave or partial approach, by which a few ministries or departments first pilot and adopt evaluation systems, may be preferable to a whole-of-government approach. Attempting to institute a whole-of-government approach toward evaluation, as in Australia, Canada, and the United States, may be too ambitious. For example, in the Kyrgyz Republic, a 2002 World Bank readiness assessment recommended that the Ministry of Health, where some evaluation capacity already exists, be supported as a potential model for eventual government-wide implementation of an evaluation system.

Examples of Evaluation in Developing Countries

The following section examines three developing countries: Malaysia, Uganda, and China.


Malaysia − Outcome-Based Budgeting, Nation-Building, and Global Competitiveness

Among developing countries, Malaysia has been at the forefront of public administration reforms, especially in the area of budget and finance. These reforms were initiated in the 1960s as part of an effort by the government to strategically develop the country. The public sector was seen as the main vehicle of development, and consequently the need to strengthen the civil service through administrative reform was emphasized.

In 1969, Malaysia adopted the Program Performance Budgeting System (PPBS) and continued to use it until the 1990s. The PPBS replaced line-item budgeting with an outcome-based budgeting system. Although agencies used the program-activity structure, in practice implementation still resembled line-item budgeting and an incremental approach. Budgetary reform focused on greater accountability and financial discipline among the various government agencies entrusted to carry out the country's socioeconomic development plans. In addition to greater public sector accountability and improved budgetary system performance, the government undertook a number of additional reforms, including improvements in financial compliance, quality management, productivity, efficiency in governmental operations, and management of national development efforts.

Malaysia's budget reform efforts have been closely linked with the efforts at nation building and global competitiveness associated with the program Vision 2020, which aimed to make Malaysia a fully developed country by the year 2020. In 1990, the government introduced the Modified Budgeting System (MBS) to replace the PPBS. Greater emphasis was placed on the outputs and impact of government programs and activities. Under the prior PPBS system, there were minimal links between outputs and inputs; as a result, policies continued to be funded even when no results were being systematically measured.

In the late 1990s, Malaysia developed its integrated results-based management system (IRBM). The components of the system were a results-based budgeting system (RBB) and a results-based personnel performance system (PPS). Malaysia also developed two other results-based systems to complement its management system: a results-based management information system (MIS) and a results-based monitoring and evaluation framework (M&E) (Thomas, 2007, p. 7).


The IRBM system provides a framework for planning, implementing, monitoring, and reporting on organizational performance. It is also able to link organizational performance to personnel performance. A number of countries (Zimbabwe, Mauritius, India, Namibia, Botswana, and Afghanistan) are integrating an IRBM system in stages, with results-based budgeting and results-based M&E at the forefront. In Malaysia, the budget system is the main driver of the IRBM system (Thomas, 2007, p. 7).

The IRBM system becomes more dynamic when driven by the MIS and M&E, which provide the performance measurement dimension to the strategic planning framework, using accurate, reliable, and timely information geared toward decision making. In this system, the MIS and M&E work closely together to ensure that the system produces the right information for the right people at the right time. Indicators must be both operational and results-based. An electronic version of the integrated performance management framework has been developed and used in Malaysia (Thomas, 2007, p. 8).

Malaysia identified several lessons learned, among them:

• a capacity-building program for key levels needs to be sustained
• monitoring and reporting for the system are time consuming
• there is a need to do more development work
• there is a need to strengthen the performance planning process to be more zero-based rather than incremental
• the rewards and sanctions are not commensurate at all levels
• there is limited integration with other initiatives (Rasappan, 2007, slides 24-25).

Malaysia also identified several recommendations, among them:

• working towards full vertical and horizontal linkages
• avoiding disillusionment at both policy and operational levels
• reviewing and strengthening all support policies and systems
• working towards an integrated results-based management system that focuses on whole-of-government performance (Rasappan, 2007, slides 26-27).

Although Malaysia has been at the forefront of public administration and budget reforms, these reform efforts have not been smooth or consistent over the years. Nonetheless, the MBS was a bold initiative on the part of the Malaysian government, demonstrating foresight, innovativeness, dynamism, and commitment to ensuring value for money in the projects and policies being implemented (World Bank, 2001).


Uganda and Poverty Reduction — Impetus toward Evaluation

The government of Uganda has committed itself to effective public service delivery in support of its poverty-reduction priorities. The recognition of service delivery effectiveness as an imperative of national development management is strong evidence of a commitment to results, which is also evident in several public management priorities and activities that are currently ongoing.

Over the past decade, Uganda has undergone comprehensive economic reform and achieved macroeconomic stabilization. Uganda developed a Poverty Eradication Action Plan (PEAP) in response to the Comprehensive Development Framework; it is now incorporated into the Poverty Reduction Strategy Paper. The PEAP calls for a reduction in the absolute poverty rate from 44 percent (as of the late 1990s) to 10 percent by the year 2017. The PEAP and the Millennium Development Goals (MDGs) are broadly similar in focus and share the overall objective of holding government and development partners responsible for development progress.

Uganda was the first country to be declared eligible for and to benefit from the Heavily Indebted Poor Countries (HIPC) Initiative. Most recently, Uganda qualified for enhanced HIPC relief in recognition of the effectiveness of its poverty reduction strategy, its consultative process involving civil society, and the government's continuing commitment to macroeconomic stability.

Uganda has introduced new measures to make the budget process more open and transparent to internal and external stakeholders. The government is modernizing its fiscal systems and embarking on a program of decentralizing planning, resource management, and service delivery to localities. The Ministry of Finance, Planning and Economic Development (MFPED) is also introducing output-oriented budgeting. In addition, government institutions will be strengthened and made more accountable to the public.

The country is still experiencing coordination and harmonization difficulties with respect to evaluation and the PEAP. "The most obvious characteristic of the PEAP M&E regime is the separation of poverty monitoring and resource monitoring, albeit both coordinated by the MFPED. The two strands of M&E have separate actors, reports, and use different criteria of assessment. Financial resource monitoring is associated with inputs, activities and, increasingly, outputs, whereas poverty monitoring is based on analyzing overall poverty outcomes." Other evaluation coordination issues revolve around the creation of a new National Planning Authority and the sector working groups.

At the end of 2007, the Office of the Prime Minister (OPM) presented a Working Note for discussion on the monitoring and evaluation of the National Development Plan. Two aims of this paper were to review the strengths and weaknesses of the PEAP and to propose a way forward for the monitoring and evaluation of the new national plan (Uganda OPM, 2007).


The following is a short summary of what the Working Note reported:

• There is a "lack of clear sector ministry outcomes and outputs, measurable indicators with associated baselines and targets, efficient monitoring systems, and the strategic use of evaluation to determine performance and causality" (Uganda OPM, 2007, p. 4).
• Accountability is based on spending rather than on substantive performance measures.
• The amount of data being collected has not been balanced by the actual demand for these data or the capacity to use them.
• Duplicative and uncoordinated monitoring has created a complex and formidable burden of inspection activity, indicator data collection, and reporting formats. The result is a large volume of data on compliance with rules and regulations that often does not yield a sufficiently clear basis for assessing value for money and cost-effectiveness in public sector delivery. A study done in 2005 reported that "the reasons for poor coordination and duplication of effort may relate more to the incentive structure of the civil service, where monitoring activities are driven in part by the desire for per diems as important salary supplements" (Uganda OPM, 2007, p. 6).
• It has been a challenge to convene a National M&E Working Group effectively around addressing M&E challenges, owing to a lack of incentives and to overlapping mandates on planning and related M&E issues.
• Although numerous evaluations are conducted, they are largely done within sectors and ministries without the use of common standards.
• At the local level, people still do not feel involved in decision making (Uganda OPM, 2007, pp. 4-7).

The Working Note proposed a few areas for consideration:

• link budget allocations to the achievement of results
• consider establishing public service agreements or performance contracts
• provide information on results in a timely and usable manner to policy makers
• ensure that information and data demands are reflected in the data supply
• establish mechanisms for quality control and assurance
• address analysis for policy use
• separate the monitoring and evaluation functions
• clarify roles and responsibilities across government for planning, monitoring, evaluation, and other related quality assurance functions (Uganda OPM, 2007, pp. 7-8).

Regarding future evaluation, Uganda faces the challenge of keeping track of, and learning from, its progress toward poverty reduction via the PEAP/National Poverty Reduction Strategy. Evaluation cannot be isolated from the decision-making practices and incentives that underpin national development systems and processes (Hauge, 2001, pp. 6-16).


Evaluation in China

Evaluation is a relatively new phenomenon in China. Indeed, before the early 1980s, it was almost unknown. This unfamiliarity reflected the orientation of the social sciences at that time, the virtual absence of evaluation literature published in Chinese, and the lack of systematic contact between Chinese practitioners and those practicing evaluation in other parts of the world. The Chinese did conduct some activities related to evaluation, including policy analysis, economic and management studies, survey research, project completion reviews, and the summarizing of experience. Social science institutional and technical/analytical capacity exists at some economic policy and research institutes.

It was not until 1992 that key central agencies, including the State Audit Administration, the Ministry of Finance, and the State Planning Commission, began to develop and put forward to the State Council specific proposals for building performance monitoring and evaluation capacity. The Center for Evaluation of Science & Technology conducted a first joint evaluation of science and technology programs with the Netherlands (NCSTE, IOB). With capital and development assistance flowing into China over the last twenty years, the country has seen an increase in capability in, and understanding of, technological and engineering analysis, financial analysis, economic analysis and modeling, social impact analysis, environmental impact analysis, sustainability analysis, and implementation studies.

The driving force for evaluation in China is the massive and sustained surge in national development and economic growth. GDP has grown by over 7.8 percent per year for the last nine years. The country's attention to, and capability for, addressing evaluation questions comes from this concern with development. Some central agencies, including the China International Engineering Consulting Company (CIECC, a government-owned consulting firm), the Ministry of Construction, and the State Development Bank, have now established evaluation capacities at the highest levels of their organizations.

Although most evaluation is ex post project assessment, there is increasing recognition that evaluation issues are embedded in all stages of the development project cycle, and thus growing awareness within China that the evaluation function is applicable to every stage of that cycle. There is now interest in linking evaluation to project and program formulation and implementation, and some ongoing evaluation has already been undertaken, though doing so comprehensively is still infrequent. One notable exception is that in 2006, China for the first time built a systematic M&E component into its Five-Year Plan. China had no formal M&E system in the ten previous five-year plans, but the 11th Five-Year Plan embeds a detailed M&E system, based on the ten steps elaborated in Kusek and Rist (2004).


In April 2006, China launched the twice-yearly Shanghai International Program for Development Evaluation Training (SHIPDET) to train national and regional evaluators. Partners in the training are the Ministry of Finance, the World Bank, the Asian Development Bank, and the Asia-Pacific Finance and Development Center.

China is also building a foundation for evaluation, although no grand edifice is yet in place. Within the Chinese governmental structure and administrative hierarchy, several key tasks appear necessary at this time if evaluation is to continue to develop. These include:

• the need to establish a strong central organization for overall evaluation management and coordination
• the need to establish formal evaluation units, policies, and guidelines in all relevant ministries and banks
• recognition that the time is right for provincial-level and local governments to start their own evaluations
• the need to set up in the State Audit Administration an auditing process for the evaluation function, so that there can be ongoing oversight and auditing of the evaluations undertaken within line ministries and of the evaluation policies and guidelines issued by the central evaluation organizations, the relevant ministries, provinces, and the banks
• the need to develop advanced evaluation methods across units and organizational entities
• the need to strengthen the monitoring and supervision function in investment agencies
• the need to develop a supply of well-trained evaluators for the many national ministries, provinces, and banks moving into the evaluation arena (Houqi & Rist, 2002, pp. 249-259).

China has identified the important issue of raising demand for evaluation results. This is a key challenge in many countries without a tradition of transparent government.


Part II: Emerging Development Issues: What Are the Evaluation Implications?

This section introduces many of the emerging development issues. It is meant to be a brief overview that highlights their evaluation implications, not an in-depth treatment of any of the issues; references are offered for those who would like further information.

Michael Q. Patton (2006) begins a discussion of recent trends in evaluation by identifying evaluation as a global public good. He describes the growth of professional organizations, associations, and societies for evaluation around the world and the standards and guidelines being established by these organizations. Patton also points to the development of more than 100 new, distinct models for evaluation as an emerging trend; examples include participatory evaluation, utilization-focused evaluation, and empowerment evaluation. Another recent trend Patton credits to Rist is that of going beyond studies to streams (Rist & Stame, 2006): evaluation now relies on systems of evaluative knowledge, not individual evaluators, to produce evaluative knowledge.

Patton uses an analogy to help illustrate the emerging complexity of evaluation. In the past, evaluators could often follow a kind of "recipe" in doing evaluations. Patton describes the simple recipe process as follows:




• the recipe is essential
• recipes are tested to assure replicability of later efforts
• while no particular expertise is needed, knowing how to cook increases success
• recipes produce standard products
• there is certainty of the same results each time.


But recipes do not get standard results in development. Patton describes how evaluators need a more complex model, one in which the evaluator must react to complex questions. His analogy for the emerging trend in development evaluation is that of raising a child. Whereas a recipe is a step-by-step process, raising a child is a highly complex process in which caregivers use knowledge to help them make decisions and react to new situations.

Patton describes another trend in development evaluation: moving to more formative situations. The kinds of formative situations he discusses are evaluations in which:

• the intended and hoped-for outcomes are specified, but the measurement techniques are being piloted
• a model for attaining outcomes is hypothesized and being tested/refined
• the implementation of the intervention is not standardized, but it is being studied and improved as problems in the intervention are worked out – a kind of iterative approach
• the attribution is formulated, with the possibility of testing causality as part of the challenge.

The development agenda will continue to evolve in response to emerging issues. Among current issues are globalization; the growing incidence of conflict around the world; terrorism and money laundering; the widening gap between the world's rich and poor; the increasing number of players on the development scene; the drive toward debt reduction; and the focus on improved governance. Addressing these issues in turn places new demands on the evaluator: what happens in development affects evaluation.


The global drive toward comprehensive, coordinated, participatory development and the demonstration of tangible results presents new challenges to the development evaluation community. There have been significant shifts from partial to comprehensive development, from an individual to a coordinated approach (partnerships), from growth promotion to poverty reduction, and from a focus on implementation to a focus on results. With respect to comprehensive development, for example, bilateral and multilateral "donors must now position each project within a larger context and examine its sustainability and potential effects on society, politics and the broad economy" (Takamasa & Masanori, 2003, p. 6). They note:

Development theorists have also come to believe that the most important factor for economic development is not capital but appropriate policies and institutions. This shift was caused by the tremendous impact that economists such as North…, Stiglitz… and Sen… had on the discipline of economics, including development economics. These developments resulted in the current situation where the central theme of international development assistance is poverty reduction in a broad sense which includes the expansion of human dignity and political and economic freedom for people in developing countries (Takamasa & Masanori, 2003, p. 6).

The Millennium Development Goals are one concrete manifestation of this new thinking in development. Indeed, the recent World Development Report focused on what governments can do to create better investment climates in their societies and measured countries' progress through sets of indicators designed to tap elements of business climates. The report recommended institutional and behavioral improvements: well-designed regulation and taxation, reducing barriers to competition, improving business incentives, tackling corruption, fostering public trust and legitimacy, and ensuring proper implementation of regulations and laws (World Bank, 2005b, p. 25).

Many of the new issues in development assistance involve multiple bilateral and multilateral development partners and the potential burden of their multiple evaluations on developing countries. This creates a strong rationale for conducting joint international evaluations. Such joint evaluations can be conducted at the project, country, sector, or thematic level. They may yield efficiencies of cost and scale for the development organizations, as well as harmonization of evaluation methods that facilitates comparison of results.


What follows is a brief discussion of some of the major drivers of the international development agenda and their implications for evaluation. These include:

• Millennium Development Goals (MDGs)
• Monterrey Consensus
• Paris Declaration on Aid Effectiveness
• Debt Initiative for Heavily Indebted Poor Countries (HIPC)
• The Emergence of New Actors in International Development Assistance
• Conflict Prevention and Post-Conflict Reconstruction
• Governance
• Anti-Money Laundering and Terrorist Financing
• Workers' Remittances
• Gender: From Women in Development (WID) to Gender and Development (GAD) to Gender Mainstreaming
• Private Sector Development (PSD) and Investment Climate
• Environmental and Social Sustainability
• Global Public Goods.

Millennium Development Goals (MDGs)

In September 2000, 189 United Nations member countries and numerous international organizations adopted the United Nations Millennium Declaration, from which the MDGs were, in part, derived. The MDGs are a series of development goals for the international community, to be achieved by the year 2015 with the active participation of developed and developing countries alike. These ambitious goals are aimed at poverty reduction, human development, and the creation of global partnerships to achieve them. They also represent a shift away from the development community's earlier emphasis on economic growth, when it was hoped that economic growth itself would lift people out of poverty. The MDGs specifically target a series of measures aimed at poverty reduction and better living conditions for the world's poor.


Millennium Development Goals (MDGs)

1. Eradicate extreme poverty and hunger.
2. Achieve universal primary education.
3. Promote gender equality and empower women.
4. Reduce child mortality.
5. Improve maternal health.
6. Combat HIV/AIDS, malaria, and other diseases.
7. Ensure environmental sustainability.
8. Develop a global partnership for development.

The eight MDGs include a detailed set of 18 targets and 48 indicators by which to measure progress. The MDGs are results-based goals that must be measured, monitored, and evaluated accordingly. In this context, the MDGs pose major challenges to the evaluation systems of all countries. Developing countries have different mixes of the 18 targets and different dates for achieving them, depending on their situations. Many developing countries, however, have lacked the capacity to perform monitoring and evaluation. The result has been a growing effort by development organizations to provide statistical and M&E capacity building, technical assistance, and support.

The MDGs symbolize a focus on results… The new development paradigm emphasizes results, partnership, coordination, and accountability… [It] combines a results-orientation; domestic ownership of improved policies; partnerships between governments, the private sector, and the civil society; and a long-term, holistic approach that recognizes the interaction between development sectors and themes (Picciotto, 2002a, p. 3).
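Because the MDGs are results-based, much of the associated monitoring work comes down to comparing an indicator's latest value against its baseline and its 2015 target. The sketch below is a minimal, hypothetical Python illustration of that calculation; it is not an official MDG tool, and the country figures are invented for the example (only the "halve the 1990 rate" logic reflects the first MDG target).

# Minimal sketch of results-based indicator tracking: given a baseline, a target,
# and the latest observed value, report the share of the required change achieved.
# The indicator figures below are hypothetical.
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str
    baseline_year: int
    baseline: float
    target_year: int
    target: float
    latest_year: int
    latest: float

    def progress(self) -> float:
        """Fraction of the baseline-to-target change achieved so far."""
        required_change = self.target - self.baseline
        achieved_change = self.latest - self.baseline
        return achieved_change / required_change if required_change else 0.0


if __name__ == "__main__":
    # Hypothetical country figures; MDG 1 calls for halving the 1990 poverty rate by 2015.
    poverty = Indicator(
        name="Population living on less than $1 a day (%)",
        baseline_year=1990, baseline=44.0,
        target_year=2015, target=22.0,
        latest_year=2004, latest=34.0,
    )
    print(f"{poverty.name}: {poverty.progress():.0%} of the required reduction achieved")

For these invented figures the script reports that about 45 percent of the required reduction has been achieved, the kind of summary a results-based M&E system would feed back to policy makers.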


The MDGs are driving developing countries to build M&E capacity and systems, and development organizations are being called upon to provide technical assistance and financing for these efforts. As noted earlier, many developing countries are in the early stages of building M&E systems and are slowly working their way toward results-based systems that will help determine the extent to which the MDGs are being achieved.

Assessing success toward meeting the MDGs will require the development and effective use of evaluation systems. The evaluation system will, in turn, need to be integrated into the policy arena of the MDGs so that it is clear to all why it is important to collect the data, and how the information will be used to inform the efforts of the government and civil society to achieve the MDGs (Kusek, Rist, and White, 2004, pp. 17-18).

Each year, the World Bank and the International Monetary Fund publish a Global Monitoring Report on the MDGs. The Global Monitoring Report 2004 focused on how the world is doing in implementing the policies and actions needed to achieve the MDGs and related development outcomes; it provides a framework for accountability in global development policy. The report highlights several priorities for strengthening the monitoring exercise. These include:

• Strengthening the underlying development statistics, including timely implementation of the action plan agreed upon among international statistical agencies…
• Conducting research on the determinants of the MDGs, on critical issues such as effectiveness of aid, and on development of more robust metrics for key policy areas such as governance and for the impact on developing countries of rich country policies
• Deepening collaboration with partner agencies in this work, building on respective comparative advantage and ensuring that the approach to monitoring and evaluation is coherent across agencies (IMF and World Bank, 2004).


The 2005 report pointed to opportunities created by recently improved economic performance in many developing countries. It outlined a five-point agenda designed to accelerate progress:

• Ensure that development efforts are country-owned. Scale up development impact through country-owned and country-led poverty reduction strategies.
• Improve the environment for private sector-led economic growth. Strengthen fiscal management and governance, ease the business environment, and invest in infrastructure.
• Scale up delivery of basic human services. Rapidly increase the supply of health care workers and teachers, provide larger, more flexible, and more predictable financing for these recurrent cost-intensive services, and strengthen institutional capacity.
• Dismantle barriers to trade through an ambitious Doha Round, including major reform of agricultural trade policies, and increase "aid for trade."
• Double development aid in the next five years and improve the quality of aid, with faster progress on aid coordination and harmonization (World Bank, 2005).

The 2006 report highlighted economic growth, better quality aid, trade reforms, and governance as essential elements for achieving the MDGs (World Bank, 2006). The 2007 report highlighted two key thematic areas: gender equality and empowerment of women (the third MDG) and the special problems of fragile states, where extreme poverty is increasingly concentrated (World Bank, 2007e).

Monterrey Consensus

In March 2002, government representatives from more than 170 countries, including more than 50 heads of state, met to discuss a draft of the "Monterrey Consensus" on Financing for Development. The draft reflected an attempt to distribute more money to the world's poorest people, those who live on less than US$1 per day. Most significantly for development evaluation, the Monterrey Consensus stressed mutual responsibilities in the quest for the MDGs. It called on developing countries to improve their policies and governance, and on developed countries to step up their support, especially by opening access to their markets and providing more and better aid.


The document recognized the need for greater financial assistance to raise the living standards of the poorest countries but did not set any firm goals for increasing aid, relieving debt burdens, or removing trade barriers (Qureshi, 2004).

At the midway point between the year in which the MDGs were adopted and their 2015 target date, the Economic Commission for Africa published a paper assessing progress toward meeting the commitments made to Africa under the Monterrey Consensus. The paper concluded that the evidence suggests substantial progress in the area of external debt relief but very limited progress in the other core areas of the Consensus. Of interest to evaluators, the paper summarizes:

There is the understanding that monitoring of the commitments made by both African countries and their development partners is essential if the objectives of the Monterrey Consensus are to be realized. African leaders have recognized this and put in place a mechanism to monitor progress in the implementation of their commitments as well as those of their development partners. The recent institutionalization of an African Ministerial Conference on Financing for Development is a bold step by African Leaders in this area. The international community has also put in place a mechanism to monitor donor performance. For example, they have established an African Partnership Forum and an African Progress Panel, both of which will monitor progress in the implementation of key commitments on development finance. Ultimately, the effectiveness of these monitoring mechanisms shall be assessed in terms of how they are able to turn promises made by development partners into deeds. For it is only through the implementation of these commitments that African countries and the international community can reduce poverty in the region and lay the foundation for a brighter future for its people (Kavazeua, Osakwe, Shimeles, and Verick, 2007, p. vi).


Paris Declaration on Aid Effectiveness

The Paris Declaration on Aid Effectiveness, usually called simply the Paris Declaration, is an international agreement to continue and increase efforts in managing aid to developing countries. More than one hundred ministers, heads of agencies, and other senior officials endorsed the agreement on March 2, 2005. One feature of this declaration important to evaluation was the agreement to use monitorable actions and indicators as part of its implementation. Twelve indicators were developed to help track and encourage progress toward more effective aid, and targets for 2010 have been set for eleven of the twelve indicators (OECD, Development Co-operation Directorate [DCD/DAC], 2005, p. 1). The indicators and targets that were endorsed are organized around five key principles:

• Ownership: Partner countries exercise effective leadership over their development policies and strategies and coordinate development actions.
• Alignment: Development organizations base their overall support on partner countries' national development strategies, institutions, and procedures.
• Harmonization: Development organizations' actions are more harmonized, transparent, and collectively effective.
• Managing for results: Managing resources and improving decision making for results.
• Mutual accountability: Development organizations and partners are accountable for development results (Joint Progress toward Enhanced Aid Effectiveness, 2005, pp. 2-6).

In 2007, the OECD published a landmark report summarizing the results of a baseline survey of the state of affairs in 2005. The report assesses the effectiveness of aid, not only globally, but also for development organizations. (See OECD, 2007.)


The Organisation for Economic Co-operation and Development (OECD) conducted a survey to monitor progress in improving aid effectiveness, as emphasized in the Monterrey Consensus and made more concrete in the Paris Declaration. The survey led to the following conclusions:

• The Paris Declaration has increased awareness and promoted dialogue at the country level on the need to improve the delivery and management of aid.
• The pace of progress in changing donor attitudes and practices on aid management has been very slow, and the transaction costs of delivering and managing aid are still very high.
• There is a need to strengthen national development strategies, improve the alignment of donor support to domestic priorities, increase the credibility of the budget as a tool for governing and allocating resources, and increase the degree of accuracy in budget estimates of aid flows.
• Changing the way in which aid is delivered and managed involves new costs, and this should be taken into account by donors and partners.
• Countries and donors should use performance assessment frameworks and more cost-effective results-oriented reporting. In this regard, donors need to contribute to capacity building and make more use of country reporting systems.
• More credible monitoring systems need to be developed to ensure mutual accountability (Kavazeua et al., 2007, p. 18).

Debt Initiative for Heavily Indebted Poor Countries (HIPC)

In 1996, the World Bank and the IMF proposed the Heavily Indebted Poor Countries (HIPC) Initiative, the first comprehensive approach to reducing the external debt of the world's poorest and most heavily indebted countries. The Initiative is designed to reduce debt to sustainable levels for poor countries that pursue economic and social policy reforms. It is used specifically in cases where traditional debt relief mechanisms are not enough to help countries exit the rescheduling process. In this way, HIPC reduces debt stock, lowers debt-service payments, and boosts social spending.


HIPC has been endorsed by 180 countries and includes both bilateral and multilateral debt relief. External debt servicing for HIPC countries is expected to be cut by about US$50 billion. As of April 2007, 41 countries were receiving debt relief under the HIPC Initiative (International Development Fund & International Monetary Fund, 2007, p. i).

HIPC is linked to larger comprehensive national poverty reduction strategies. In 1999, the international development community agreed that national Poverty Reduction Strategy Papers (PRSPs) should be the basis for concessional lending and debt relief. These strategies set out agreed-upon development goals over a three-year period, with a policy matrix, an attendant set of measurable indicators, and a monitoring and evaluation system by which to measure progress. In short, if a country meets its goals, its debt is reduced, creating incentives to speed up reforms and increase country ownership.

As a condition for debt relief, recipient governments must be able to monitor, evaluate, and report on reform efforts and progress toward poverty reduction. This condition created demand for M&E capacity building and assistance. Some developing countries, such as Uganda, have made progress in evaluation and qualified for enhanced HIPC relief. However, lack of evaluation capacity has been a particular problem for participating HIPC countries such as Albania, Madagascar, and Tanzania, which require additional capacity-building assistance to develop their evaluation capacity.

By providing that countries with very high levels of debt distress be given concessional loans or grants to mitigate the risk of future debt crises, HIPC raised an additional evaluation issue: how would the effectiveness of a grant, as opposed to a loan, be evaluated, and according to what criteria? Again, this raises new challenges for development evaluators.

September 2006 marked ten years of the HIPC Initiative. Since 1999, the poverty-reducing expenditures of HIPC countries have increased, while debt-service payments have declined (World Bank, 2007f). This finding suggests that HIPC is yielding progress.


The Emergence of New Actors in International Development Assistance

A recent OECD study attempted to estimate the amount of funds given by philanthropic foundations to developing countries. The study "attempted a serious estimate of the amount of funds distributed by 15 of the largest philanthropic foundations with some international giving, for 2002. The total was almost $4 billion dollars and the total international giving was about $2 billion dollars. This represents about 4% of all development aid and is about one-half of the contributions attributed by the official Development Assistance Committee to …NGOs as a whole (a group which includes the foundations)" (Oxford Analytica, 2004).

Foundations have thus emerged as important players in international development. The US Council on Foundations counts 56,000 private and community foundations in the United States, distributing US$27.5 billion annually. The European Foundation Centre found some 25,000 foundations in nine EU countries spending over US$50 billion annually. Several large foundations tend to dominate the global scene, among them:

• The Bill and Melinda Gates Foundation
• The Ford Foundation
• The Susan Thompson Buffett Foundation
• The Soros Foundation/Open Society Institute.

The Soros Foundation/Open Society Institute network is an influential foundation player on the international development scene, with programs in more than 50 countries. Programs provide support for education, media, public health, women, human rights, arts and culture, and social, economic, and legal reforms (Soros Foundations Network, 2007, Open Society Institute). The large-scale contributions and activities of foundations pose another new challenge for development evaluators: recognizing the contribution that foundations are making in developing countries.


Conflict Prevention and Post-Conflict Reconstruction

In the post-Cold War era, from 1989 through 2001, there were 56 major armed conflicts in 44 different locations. In 2003, conflicts were estimated to affect more than one billion people. Most conflicts have proved difficult to end; the majority during this period lasted seven years or more. The global costs of civil wars in particular are great: "By creating territory outside the control of any recognized government, armed conflicts foster drug trafficking, terrorism and the spread of disease" (Collier et al., 2003). Poverty is both a cause and a consequence of conflict. Sixteen of the world's 20 poorest countries have experienced a major civil war over the past 15 years, and on average, countries coming out of a war face a 44 percent chance of relapsing in the first five years of peace.

Dealing with post-conflict reconstruction involves the coordination of large numbers of bilateral and multilateral development organizations. For example, 60 development organizations were active in Bosnia-Herzegovina, 50 in the West Bank and Gaza, and 82 in Afghanistan. Rebuilding after a conflict has placed strains on aid coordination mechanisms to ensure that needs are met and that duplication and gaps in aid are avoided.

Post-conflict reconstruction is not "business as usual." It is not only about rebuilding infrastructure. Reconstruction often involves support for such activities as institution building; technical assistance; democracy and elections; NGOs and civil society; civilian police forces; budgetary start-up and recurrent costs; debt relief; balance of payments; de-mining; refugees and internally displaced people; children and youth; gender; and the demobilization and reintegration of ex-combatants.


Because of concerns about corruption and the need to leverage official development assistance (ODA), post-conflict reconstruction has often entailed the creation of new development organization lending instruments and mechanisms. In the case of the West Bank and Gaza, a multilateral development organization trust fund was created to support start-up and recurrent budgetary expenditures for the new Palestinian administration. Such instruments and mechanisms are now commonplace in post-conflict regions in other parts of the world.

Increasingly, bilateral and multilateral development organizations are looking at the economic causes and consequences of conflict and are seeking ways to prevent conflict. As a result, there is greater emphasis on social, ethnic, and religious communities and relations; governance and political institutions; human rights; security; economic structures and performance; the environment and natural resources; and external factors.

Post-conflict reconstruction brings a new level of difficulty and scale to evaluation (Kreimer et al., 1998). Reconstruction programs are multi-sector programs that cost billions of dollars and are funded by 50 to 80 bilateral and multilateral development organizations. Evaluators must examine the impact that such heavily front-loaded development approaches have on post-conflict reconstruction and reconciliation. They are challenged in new ways to examine a coordination process that brings together such a large and diverse group of supporters. New projects and programs in untraditional areas of development assistance, such as de-mining and the demobilization and reintegration of ex-combatants, must also be evaluated. So too, evaluators are now being asked to evaluate new types of development organization mechanisms and lending instruments, such as multilateral development organization trust funds. And evaluators must consider what is being done, and what can be done, in the development context to prevent such conflicts from erupting.


Governance

While often implicitly acknowledged behind closed doors, the issue of governance and corruption came publicly to the forefront of the international community's attention in the mid-1990s. Since then, international conventions have been signed to address the problem of corruption around the world. The UN and the OECD have adopted conventions on corruption that include provisions on prevention, criminalization, international cooperation on asset recovery, anti-bribery measures, and multinational corporations. Multilateral development banks have also instituted anticorruption programs. Lending is directed toward helping countries build efficient and accountable public sector institutions, and governance and anti-corruption measures are addressed in country assistance strategies. Specifically, governance programs seek to promote:

• anti-corruption
• public expenditure management
• civil service reform
• judicial reform
• administration, decentralization, e-government, and public services delivery.

Transparency International (TI), an NGO whose aim is to put "corruption on the global agenda," was launched in the early-to-mid 1990s. TI publishes an annual "Corruption Perception Index," which ranks approximately 140 countries by perceived levels of corruption among public officials, as well as an annual "Bribe Payers Index," which ranks exporting countries according to their incidence of bribery. TI has chapters in 88 countries and works with local, national, regional, and international partners (governmental and non-governmental) to combat corruption (Transparency International, 2007). Some estimates suggest that more than US$1 trillion is lost to corruption annually.


Measuring corruption and its costs has been a challenge for the international community. But the "increasing availability of surveys and polls by many institutions, containing data on different dimensions of governance, has permitted the construction of a worldwide governance databank" (World Bank, 2007b, p. 1, ¶ 3). "Utilizing scores of different sources and variables, as well as a novel aggregation technique, the databank now covers 200 countries worldwide, and contains key aggregate indicators in areas such as rule of law, corruption, regulatory quality, government effectiveness, voice and accountability, and political instability" (World Bank, 2007c, p. 1, ¶ 3).

Development organizations and evaluators can use these data as a measure of aid effectiveness. Findings suggest that where corruption is higher, the likelihood of aid being wasted is commensurately higher. Results-based management is being used to identify and monitor the most vulnerable determinants and institutions in a country's governance structure. The data help to demystify issues of governance that were previously obscured and to treat them more objectively, and they will also aid evaluators in compiling more quantitative evaluation findings related to lessons learned. At the same time, evaluating investment climates and business environments will involve difficult and thorny concepts (see the section below on private sector development). This is an area that is evolving quickly and will require that evaluators address new developments and data in a timely fashion.
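As a simplified illustration only, and not the aggregation technique actually used for the governance databank, the sketch below standardizes two hypothetical perception-based ratings reported on different scales and averages them into a rough composite score per country. All source names and values are invented for the example.

# Simplified illustration: convert ratings from sources that use different scales into
# z-scores across countries, then average them into a rough composite governance score.
# This is NOT the databank's actual aggregation method; the data below are hypothetical.
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw source scores for one governance dimension into z-scores across countries."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    return {country: (value - mu) / sigma if sigma else 0.0 for country, value in scores.items()}


# Hypothetical "control of corruption" ratings from two sources on different scales.
survey_a = {"Country X": 3.1, "Country Y": 6.8, "Country Z": 5.0}     # 0-10 scale
survey_b = {"Country X": 28.0, "Country Y": 71.0, "Country Z": 55.0}  # 0-100 scale

z_a, z_b = standardize(survey_a), standardize(survey_b)
composite = {country: mean([z_a[country], z_b[country]]) for country in survey_a}

for country, score in sorted(composite.items(), key=lambda item: item[1], reverse=True):
    print(f"{country}: composite corruption-control score {score:+.2f}")

Standardizing before averaging keeps the source with the larger numeric scale from dominating the composite; the actual databank draws on many more sources and the more elaborate aggregation method described in the quotation above.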

Anti-Money Laundering and Terrorist Financing

Money laundering and terrorist financing are part of the broader anti-corruption landscape. Money laundering is the practice of engaging in financial transactions in order to conceal the identities, sources, and destinations of the money in question. In the past, the term “money laundering” was applied only to financial transactions related to otherwise criminal activity. Today its definition is often expanded by government regulators, such as the United States Securities and Exchange Commission (SEC), to encompass any financial transaction that is not transparent under the law. As a result, the illegal activity of money laundering is now commonly practised by average individuals, small and large businesses, corrupt officials, and members of organized crime, such as drug dealers or Mafia members (Investor Dictionary.com, 2006).


With an estimated US$1 trillion (2 to 5 percent of world gross domestic product, according to IMF estimates) laundered annually, this is a serious and growing international problem, affecting developing and developed countries alike. Globalization and the opening and easing of borders have facilitated transnational criminal activities and the attendant illegal financial flows. Further, global anti-money laundering initiatives have taken on new importance with the spread of terrorism. Money laundering can take an especially heavy toll on developing economies and countries. Emerging financial markets and developing countries are

…important targets and easy victims for money launderers, who continually seek out new places and ways to avoid the watchful eye of the law. The consequences of money laundering operations can be particularly devastating to developing economies. Left unchecked, money launderers can manipulate the host’s financial systems to operate and expand their illicit activities…and can quickly undermine the stability and development of established institutions (International Federation of Accountants, 2004, p. 5).

The United Nations General Assembly adopted the “Convention Against Transnational Organized Crime” in November 2000 (United Nations, 2006). The OECD’s Financial Action Task Force on Money Laundering (FATF) was created in 1989 by the G-7 and now comprises 31 member countries and territories and two regional organizations. It is an intergovernmental, policy-making body aimed at developing and promoting national and international policies to combat money laundering and terrorist financing. Monitoring and evaluation of implementation is part of the FATF mandate and is carried out multilaterally, by peer review, and by mutual evaluation. For example, the monitoring and evaluation process entails the following:

Each member country is examined in turn by the FATF on the basis of an on-site visit conducted by a team of three or four selected experts in the legal, financial and law enforcement fields from other member governments. The purpose of the visit is to draw up a report assessing the extent to which the evaluated country has moved forward in implementing an effective system to counter money laundering and to highlight areas in which further progress may still be required (FATF, 2007, p. 1, ¶ 3).

The FATF also has a series of measures to be taken in the event of non-compliance.


Workers’ Remittances

A study tracking the rising trend of global remittances over the past decade found that annual remittances sent by migrant workers to their countries of origin now outpace the volume of annual official development assistance (ODA). Workers’ remittances have been rising dramatically, from US$60 billion a year in 1998 to US$80 billion in 2002 and an estimated US$100 billion in 2003/04 – as compared with approximately US$50-60 billion a year in ODA and US$143 billion in private capital flows in 2002. Remittances also tend to be more stable than private capital flows (The World Bank, 2003, and Oxford Analytica, 2004a). Remittances sent through the banking system are likely to rise further because of:

• more careful monitoring regulations, intended to stem the financing of terrorist organizations through informal transfer mechanisms
• a decrease in banking fees as a result of increased competition to capture the global remittance market.

Global remittances have been found to have a strong impact on poverty reduction. “On average, a 10 percent increase in the share of international migrants in a country’s population will lead to a 1.9 percent decline in the share of people living in poverty ($1.00/person/day)” (Adams and Page, 2003, p. 1). Global remittances help to fund local consumption in housing, agriculture, and industry, and the creation of new small and medium-sized enterprises in recipient countries. Developed and developing countries and organizations are cognizant of these trends and are now seeking ways to capitalize on these flows for investment purposes. A recent G-8 Summit Plan called on members and developing country governments to:

Facilitat[e] remittance flows from communities overseas to help families and small entrepreneurs [businesses], including by: encouraging the reduction of the cost of remittance transfers, and the creation of local development funds for productive investments; improving access by remittance recipients to financial services; and enhancing coordination (G-8, 2004, p. 7).
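The Adams and Page figure can be used for back-of-the-envelope projections. The sketch below simply applies the reported average relationship to an invented country; it is illustrative arithmetic, not an evaluation method, and the country figures are hypothetical.

```python
# Back-of-the-envelope use of the Adams and Page (2003) average relationship:
# a 10 percent increase in the migrant share of the population is associated
# with a 1.9 percent decline in the share of people living on under $1/day.
ELASTICITY = -1.9 / 10.0   # % change in poverty share per % change in migrant share

def projected_poverty_share(poverty_share, migrant_share_change_pct):
    """Apply the average cross-country relationship to a hypothetical country.

    poverty_share            -- current poverty headcount share (0.30 means 30%)
    migrant_share_change_pct -- relative change in the migrant share, in percent
    """
    relative_change = ELASTICITY * migrant_share_change_pct / 100.0
    return poverty_share * (1.0 + relative_change)

# Hypothetical country: 30% poverty headcount, migrant share rises by 10%.
print(projected_poverty_share(0.30, 10.0))   # -> 0.2943, i.e. about 29.4%
```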


Tracking global remittances and funneling them into new types of investments and funds will pose new questions and issues for evaluators. Development practitioners have not yet devised ways in which remittances can be captured and leveraged for poverty reduction, so evaluators will watch this area with great interest. Evaluation designs must recognize that the impact of remittances on developing countries has yet to be fully articulated and tested.

Gender: From Women in Development (WID) to Gender and Development (GAD) to Gender Mainstreaming

Gender refers to the socially constructed roles ascribed to females and males. Gender analysis examines the access to and control over resources that men and women have. It also refers to a systematic way of determining men’s and women’s often differing development needs and preferences, and the different impacts of development on women and men. Gender analysis takes into account how class, race, ethnicity, and other factors interact with gender to produce discriminatory results. Gender analysis has traditionally been directed toward women because of the gender gap: the gap between men and women in terms of how they benefit from education, employment, services, and so on. Women comprise half of the world’s population and play a key role in economic development, yet their full potential to participate in socio-economic development has yet to be realized. Indeed, women and children still comprise the majority of the world’s poor.

Women produce half the food in some parts of the developing world, bear most of the responsibility for household food security, and make up a quarter of the workforce in industry and a third in services… Yet, because of more limited access to education and other opportunities, women’s productivity relative to that of men remains low. Improving women’s productivity can contribute to growth, efficiency and poverty reduction – key development goals everywhere (World Bank, 1994).


Approaches to the role of women in development have evolved from the traditional Women in Development (WID) approach to Gender and Development (GAD), and now to a more comprehensive gender mainstreaming approach. The early WID strategy focused on women as a special target or interest group of beneficiaries in projects, programs, and policies.

WID recognizes that women are active, if often unacknowledged, participants in the development process, providing a critical contribution to economic growth…as an untapped resource; women must be brought into the development process (Moser, 1997, p. 107).

The Gender and Development (GAD) approach focuses on the social, economic, political, and cultural forces that determine how men and women participate in, benefit from, and control project resources and activities. It highlights the often different needs and preferences of women and men. This approach shifts the focus from women as a group to the socially determined relations between men and women. Progress in gender equality and the empowerment of women is embodied in the MDGs, which include specific goals, targets, and indicators for measuring and evaluating progress. The OECD’s Development Assistance Committee (DAC) has also produced key and auxiliary guiding questions to assist managers in the evaluation of development activities. Questions include:

• Has the project succeeded in promoting equal opportunities and benefits for men and women?
• Have women and men been disadvantaged or advantaged by the project?
• Has the project been effective in integrating gender into the development activity? (Woroniuk & Schalkwyk, 1998)

Gender budgeting is another way of implementing gender mainstreaming and assessing how much of the national budget benefits men and women. According to still another approach, assistance can be measured and evaluated by examining the extent to which development assistance benefits sectors

…that involve women, help women, empower women, and generate results for women. Literacy, health, population and micro-credit are key areas to measure (Jalan, 2000, p. 75).


Given the current emphasis on comprehensive approaches and partnerships, it is important to note that evaluation of gender mainstreaming policies must also be conducted, integrated and coordinated within and between development partner countries, organizations and agencies. In every evaluation, it is important to look at how the project, program, or policy affects men and women. What are the differences in how the intervention affects men and women?

Private Sector Development (PSD) and Investment Climate

Private sector development (PSD) and the investment climate encompass a host of issues, including: the role of the private sector and foreign direct investment in poverty reduction; privatization; private participation in infrastructure services; public-private partnerships; creation of micro-, small-, and medium-sized enterprises (SMEs); support for micro and SME finance through financial intermediaries; and stimulating entrepreneurship. Private sector investment has become increasingly recognized as critically important to reducing poverty in the developing world. In 1990, private sector investment in developing countries was about US$30 billion a year, while development assistance amounted to about US$60 billion – twice the private sector total. By 1997, private sector investment in developing countries had reached US$300 billion, while development assistance had fallen to US$50 billion. In other words, private sector investment grew from half the size of development assistance to six times its size in less than ten years.

Another measure of the investment climate is Official Development Assistance (ODA). The OECD Glossary of Statistical Terms (2002) defines ODA as:

Flows of official financing administered with the promotion of the economic development and welfare of developing countries as the main objective, and which are concessional in character with a grant element of at least 25 percent (using a fixed 10 percent rate of discount). By convention, ODA flows comprise contributions of donor government agencies, at all levels, to developing countries (“bilateral ODA”) and to multilateral institutions. ODA receipts comprise disbursements by bilateral donors and multilateral institutions.
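The 25 percent concessionality threshold in this definition rests on a grant-element calculation: the grant element is the gap between a loan’s face value and the present value of its future debt-service payments, expressed as a share of face value and discounted at the fixed 10 percent rate mentioned above. The sketch below works this out for a simple hypothetical loan with equal annual principal repayments and no grace period; actual DAC calculations handle repayment schedules and grace periods in more detail.

```python
# Grant element of a hypothetical loan, using the fixed 10% discount rate
# cited in the ODA definition. Equal annual principal repayments, interest
# charged on the outstanding balance, no grace period.
def grant_element(face_value, interest_rate, maturity_years, discount_rate=0.10):
    principal_payment = face_value / maturity_years
    outstanding = face_value
    pv_debt_service = 0.0
    for year in range(1, maturity_years + 1):
        payment = principal_payment + outstanding * interest_rate
        pv_debt_service += payment / (1 + discount_rate) ** year
        outstanding -= principal_payment
    return (face_value - pv_debt_service) / face_value

# A hypothetical $10 million loan at 2% interest repaid over 20 years.
ge = grant_element(10_000_000, 0.02, 20)
print(f"Grant element: {ge:.1%}")                 # roughly 46%
print("Concessional enough for ODA?", ge >= 0.25)  # well above the 25% threshold
```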


Aid levels declined during the late 1990s, hitting a trough that lasted until 2001. Total ODA from DAC members then rose by 7 percent in 2001 and a further 5 percent in 2003. The increases in 2003 were attributed to continuing growth in general bilateral grants and in debt forgiveness grants, partly offset by a cyclical fall in contributions to multilateral concessional funds and by reduced net lending (OECD, 2004). In 2005, ODA from DAC members rose by 32 percent, driven mainly by aid following the 2004 Indian Ocean tsunami and by debt relief for Iraq and Nigeria (OECD, 2005). In 2006, final ODA fell by 4.6 percent, a fall that had been predicted given the exceptionally high levels of debt relief and humanitarian aid the year before (OECD, 2006). Overall, ODA has grown over the past decade and is expected to continue to rise, as donors have committed to significantly scale up aid to achieve the MDGs. To make effective use of such scaled-up ODA at the country level, a number of implementation challenges will need to be addressed by donors and recipients. The most immediate challenges include:

• achieving complementarity across national, regional, and global development priorities and programs
• strengthening recipient countries’ ability to make effective use of potentially scaled-up, fast-disbursing ODA, such as budget support (World Bank, 2007g).

Another measure of the investment climate is Foreign Direct Investment (FDI). FDI plays an extraordinary and growing role in global business. FDI is a cross-border investment made with a view to establishing a lasting financial interest in an enterprise and exerting a degree of influence on that enterprise’s operations, where the foreign investor holds at least 10 percent of equity capital. FDI is often cited as a lead driver of economic growth and is thought to bring certain benefits to national economies (InvestorDictionary.com, 2006). The most profound effect of FDI has been seen in developing countries, where yearly FDI flows increased from an average of less than US$10 billion in the 1970s and less than US$20 billion in the 1980s to explode in the 1990s, rising from US$26.7 billion in 1990 to US$179 billion in 1998 and US$208 billion in 1999, by which point they comprised a large portion of global FDI (Graham & Spaulding, 2005). The United Nations Conference on Trade and Development (UNCTAD) (2008) reported global growth in foreign direct investment in its 2007 year-end figures. The report noted a continued rise of FDI in all three groups of economies: developed countries, developing economies, and South-East Europe and the Commonwealth of Independent States (CIS).


The results largely reflected the high growth propensities of transnational corporations and strong economic performance in many parts of the world. Another reported cause was increased corporate profits and an abundance of cash, which boosted the value of the cross-border mergers and acquisitions that constitute a large portion of FDI flows, although the value of mergers and acquisitions declined in the latter half of 2007. According to the 2007 report, FDI flows to developed countries rose for the fourth consecutive year, reaching US$1 trillion. FDI inflows to developing countries and economies in transition (the latter comprising South-East Europe and the CIS) rose by 16 percent and 41 percent, respectively, reaching new record levels.

Privatization of state-owned enterprises was a particularly strong theme in the 1990s, as many countries sought to move from socialist to market-oriented economies. It remains a major force in many countries, where the state continues to own and operate many economic assets.

More than 100 countries, on every continent, have privatized some or most of their state-owned companies, in every conceivable sector of infrastructure, manufacturing and services… an estimated 75,000 medium and large-sized firms have been divested around the world, along with hundreds of thousands of small business units... Total generated proceeds are estimated at more than US$735 billion (Nellis, 1999).

Privatization is controversial in many respects; there are downsides and tradeoffs. The debate over whether, when, and how best to privatize continues. Privatization is not a panacea for economic ills, but it has proved to be a useful tool in promoting net welfare gains and improved services for the economy and society.

There is a potentially large role for private participation in infrastructure, especially in developing countries. Traditionally, the public sector delivered infrastructure and related services for electricity, energy, telecommunications, transport, and water and sewerage in most developing countries. Government monopolies provided such services from the 1950s until the 1990s. Progress was slow, and in the 1990s many governments began to look to the private sector to play a larger role in financing, building, owning, and/or operating infrastructure.


Public-private partnerships aimed at facilitating the provision of infrastructure services are becoming increasingly important in many developing countries. Similarly, infrastructure investments that cross sovereign borders, such as cross-country pipelines, dams, and telecom and transport networks, are becoming increasingly common; nearly all [of them] …involved large scale private financing to complement public funding (Development Committee, IFC, 2004, p. 4). Such partnerships can take a variety of forms, including contracting out services, joint ventures, build-operate-transfer (BOT) schemes, and build-own-operate (BOO) arrangements. Public-private partnerships use and leverage the best of both sectors.

The private sector can contribute its innovativeness, its management skills, its efficiency, its finance and investment potential; whilst government can continue to meet its responsibilities to its citizens, to ensure the provision of public services, to regulate areas of the economy and to shoulder risks which private sector entities cannot bear (United Nations, 1998, p. 1).

Support from development organizations for the creation and sustenance of the small and medium-sized enterprise (SME) sector has grown along with the recognition of the significant contribution that SMEs can make to innovation, economic growth, job creation, and, ultimately, poverty alleviation. SMEs comprise the majority of companies in the private sector of most developing countries. At the same time, many SMEs face ongoing obstacles in obtaining access to financing and global markets. Governments can do more to reform legal and regulatory environments so that the SME sector can flourish. In 2004, an OECD meeting adopted a declaration on “Fostering Growth of Innovative and Internationally Competitive SMEs” (OECD, 2004). Evaluations in this area have focused on various forms of technical assistance and advisory services to the SME sector.

Micro-enterprise development and micro-credit form a related area of high interest in the development community. Micro-enterprise development targets very small, traditionally self-employed entrepreneurs and family-owned businesses (including those started or owned by women) that do not have access to formal credit institutions. Over 500 million poor people are engaged in profitable micro-enterprise activities with the support of micro-credit provided by development agencies and NGOs.


Micro-enterprise development and credit have large potential to improve socio-economic conditions in poor countries.

Microfinance is a proven, effective tool in the fight against poverty. The poor have displayed a capability to repay loans, pay the real cost of loans, and generate savings that are reinvested in their business. Income earned through micro-enterprises enables families to increase their spending on education, health care, and improved nutrition (USAID, 2002, p. 1, ¶ 5).

In 2006, Muhammad Yunus of Bangladesh and the Grameen Bank were jointly awarded the Nobel Peace Prize. Mr. Yunus founded the Grameen Bank, one of the pioneers of micro-credit lending, in 1976 with just US$27 from his own pocket. Thirty years on, the bank has 6.6 million borrowers, 97 percent of whom are women (Grameen, 2006).

Micro-enterprise programs are not without criticism. Although several problems have been cited, we concentrate on three issues: banking and interest rates, gender issues, and context. Interest rates on micro-loans may far exceed standard measures of affordability. According to a 2004 survey in the Microbanking Bulletin, real annual interest rates (that is, after controlling for inflation) on group loans range between 30 and 50 percent. These rates may be lower than what moneylenders typically charge, but they remain very high. The finance world contends that, accounting for the risks to the lender, these rates are appropriate, and that anything less will not attract profit-seeking bankers into this market. In addition, the Grameen Bank has claimed repayment rates as high as 95 percent, but the accuracy of these figures has been disputed. Some observers contend that Grameen allows distressed borrowers to roll over or stretch out their repayments rather than declaring them in default. This may be the most effective and humane approach, but it is clearly inconsistent with the business model supported by an increasing share of microfinance enthusiasts (Pollin, 2007, How the Grameen Model Works, ¶ 3-4).

The second criticism concerns gender issues. In many traditional societies, women are legally perceived as minors and are not allowed to take out ordinary bank loans without the signature of their husbands. In many cases, the husbands are absent because they are away doing migrant work. Even when women do manage to start small businesses, they must continually fight against a repressive patriarchal social structure. And because many women have had limited access to schooling, they may lack the skills needed to run a successful business (Meade, 2001, A. Problems of Microcredit, Turning a Profit on the Loan, ¶ 5).


Another gender problem arises with loan officers, who are usually men from outside the family and who have authority over the women borrowers; the relationship may cause social problems. In addition, in most cases the loans finance some type of “women’s work” seen by most as not fit for men to do. As a result, the women begin to rely on their female children to take care of the family at home, increasing the pressure on these young women to stay away from school (Khander, 1998, pp. 57, 59).

The third concern relates to the context in which micro-enterprises operate. Whether the credit terms are low or high, micro-enterprises run by poor people cannot succeed simply because they have more opportunities to borrow money. They also need:

• access to decent roads, affordable means of moving their products to markets, and ways to reach customers
• a vibrant, well-functioning domestic market that encompasses enough people with enough money to buy what they have to sell.

In addition, micro-businesses benefit greatly from an expanding supply of decent wage-paying jobs in their local economies. When the wage-paying job market is strong, the number of people trying to survive as micro-entrepreneurs falls. This reduces competition among micro-businesses and thereby improves the chances that any given micro-enterprise will succeed (Pollin, 2007, Context Is Everything, ¶ 1).
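The “real annual interest rates (after controlling for inflation)” cited in the first criticism can be recovered from a nominal lending rate with the standard Fisher relationship. The figures in the sketch below are hypothetical, not Microbanking Bulletin data.

```python
# Convert a nominal lending rate into a real (inflation-adjusted) rate using
# the Fisher relationship: (1 + nominal) = (1 + real) * (1 + inflation).
# The rates below are hypothetical, not Microbanking Bulletin figures.
def real_rate(nominal_rate, inflation_rate):
    return (1 + nominal_rate) / (1 + inflation_rate) - 1

nominal = 0.52    # 52% a year charged on a hypothetical group loan
inflation = 0.08  # 8% annual inflation

print(f"Real annual interest rate: {real_rate(nominal, inflation):.1%}")
# -> roughly 40.7%, within the 30-50% range reported for group loans
```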

Responding to PSD Initiatives

Within private sector development, we have examined private sector and foreign direct investment, privatization, public-private partnerships, SMEs, micro-enterprises, and credit. How has the development evaluation community responded to these initiatives? The International Finance Corporation (IFC) evaluates the effects of interventions at the project level. The IFC uses Business Environment Snapshots (BE Snapshots) to “present measurable indicators across a wide range of business environment issues and over time” (International Finance Corporation, 2007, ¶ Objective). This new tool compiles disparate data, indicators, and project information on the business environment for a country and makes them easily accessible, consistent, and usable. Development practitioners and policymakers can use BE Snapshots to obtain a comprehensive picture of the business environment in particular countries, and as a monitoring or planning tool (International Finance Corporation, 2007, ¶ Audience).


How does one go about evaluating these kinds of activities? On a general level, one may look at four possible indicators:

• business performance
• economic sustainability
• environmental effects
• private sector development.

A recent World Development Report (2005) highlighted investment climate surveys and business environment and firm performance surveys, which can show how governments can create better investment climates for firms of all types – from farmers and micro-entrepreneurs to local manufacturing companies and multinationals. The surveys covered 26,000 firms in 53 developing countries, and 3,000 micro and informal enterprises in 11 countries. These surveys allow for the comparison of existing conditions and the benchmarking of conditions to monitor changes over time. The survey instrument is composed of a core set of questions and several modules that can be used to explore in greater depth specific aspects of the investment climate and its links to firm-level productivity. Questions can be categorized into three distinct groups:

• those generating information for the profiling of businesses
• those used for the profiling of the investment climate in which businesses operate
• those generating indicators of firm performance.

Indicators used were:

• policy uncertainty (major constraint, unpredictable interpretation of regulations)
• corruption (major constraint, report bribes are paid)
• courts (major constraint, lack confidence courts uphold property rights)
• crime (major constraint, report losses from crime, average loss from crime as percentage of sales).

Other sources of investment climate indicators included: a business risk service; country credit ratings (Euromoney Institutional Investor); country risk indicators (World Markets Research Center); Country Risk Service (Economist Intelligence Unit); Global Competitiveness Report (World Economic Forum), and so forth.
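As a hedged illustration of how indicators like those above might be tabulated from firm-level responses, the sketch below computes the share of firms citing crime as a major constraint and the average loss from crime as a percentage of sales. The records and field names are hypothetical and do not reflect the actual survey instrument.

```python
# Tabulating two crime-related investment climate indicators from hypothetical
# firm-level survey records (not the actual WDR 2005 survey data format).
firms = [
    {"crime_major_constraint": True,  "crime_losses": 12_000, "annual_sales": 400_000},
    {"crime_major_constraint": False, "crime_losses": 0,      "annual_sales": 250_000},
    {"crime_major_constraint": True,  "crime_losses": 5_000,  "annual_sales": 100_000},
]

# Share of firms reporting crime as a major constraint.
share_major_constraint = sum(f["crime_major_constraint"] for f in firms) / len(firms)

# Average loss from crime, expressed as a share of each firm's annual sales.
loss_shares = [f["crime_losses"] / f["annual_sales"] for f in firms]
avg_loss_pct_of_sales = sum(loss_shares) / len(loss_shares)

print(f"Firms citing crime as a major constraint: {share_major_constraint:.0%}")
print(f"Average loss from crime as % of sales:    {avg_loss_pct_of_sales:.1%}")
```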


Multilateral development banks, international financial institutions, development organizations, and the private sector are all involved in such surveys, providing valuable information and advice (World Development Report, 2005). There are also ongoing and periodic assessments and evaluations of investment climates in countries around the world. One notable example is the “Doing Business Database,” which provides objective measures of business regulations and their enforcement. Indicators are comparable across 145 economies. They indicate the regulatory costs of business and can be used to analyze specific regulations that enhance or constrain investment, productivity and growth (World Bank, 2007d).

Environmental and Social Sustainability

Corporate social responsibility (CSR) involves actively taking into account the economic, environmental, and social impacts and consequences of business activities. Private sector companies, organizations, and governments are looking at new ways of ensuring that business activities and services do not harm the economy, society, or environment in the countries and sectors in which they operate. For example, the British government has adopted various policies and legislation to encourage corporate social responsibility in general, and environmental and social sustainability in particular.

The Government sees CSR as the business contribution to our sustainable development goals. Essentially it is about how business takes account of its economic, social and environmental impacts in the way it operates—maximizing the benefits and minimizing the downsides…The Government’s approach is to encourage and incentivize the adoption and reporting of CSR through best practice guidance, and, where appropriate, intelligent regulation and fiscal incentives (CSR, 2004, What is CSR? ¶ 1).

An example of an international effort to promote environmental and social sustainability is the signing of the Equator Principles by ten Western financial institutions in 2003. Developed by private sector banks, the Equator Principles are an industry approach for financial institutions to determine, assess, and manage environmental and social risk in project financing. A revised version of the Equator Principles was adopted in 2006, reflecting revisions to the International Finance Corporation’s (IFC) own Performance Standards on Social and Environmental Sustainability. The 2006 version applies to all countries and sectors, and to all project financings with capital costs above US$10 million.


The IFC and 61 (and growing) leading commercial banks in Europe, North America, Japan, and Australia have voluntarily adopted the Equator Principles in their financing of projects around the world. These institutions seek to ensure that the projects they finance are developed in a socially responsible manner and reflect sound environmental management practices. The Equator Principles are intended to serve as a common baseline and framework for the implementation of individual, internal environmental and social procedures and standards for project financing activities across all industry sectors globally. In adopting the principles, the institutions undertake to review carefully all proposals for which their customers request project financing. They pledge not to provide loans directly to projects where the borrower will not, or is unable to, comply with their environmental and social policies and processes. Standards cover the environment, health and safety, indigenous peoples, natural habitats, and resettlement (The Equator Principles, 2007).

BankTrack, a network of 18 international non-governmental organizations (NGOs) specializing in the financial sector, has played an important role in monitoring the way the Equator Principles are implemented. It points out that making a public commitment to the principles is one thing, but applying them in good faith is quite another. Its main criticism concerns the way compliance by the financial institutions committed to the Equator Principles is monitored and reported (BankTrack, 2008).

Global Public Goods

Another emerging issue is the notion of global public goods. To understand global public goods, we first look at two terms: private goods and public goods. Economists define private goods as those for which consumption by one person reduces the amount available for others, at least until more is produced (Linux Information Project, 2006). For example, a mango is a private good: if one person has the mango, others have one less mango. Private goods tend to be tangible items that can be touched. Other examples of private goods are olive oil, chickens, trousers, books, computers, and gold. Most products are private goods.


Public goods are goods that are in the public domain – goods that are there for all to consume (gpgNet, 2008). Economists define public goods as products that anyone can consume as much as desired without reducing the amount available for others (Linux Information Project, 2006). For example, clean air is a public good: breathing clean air does not leave less air for others. Public goods tend to be intangible items, things you cannot touch – many fall into the category of information or knowledge. Other examples of public goods are languages, stories, songs, and history. Global public goods are public goods that have a fairly universal impact on a large number of countries, a large number of people, or several generations. Examples of global public goods are property rights, predictability, safety, financial stability, and a clean environment. Indeed, development evaluation can itself be considered a kind of public good:

… evaluation extends beyond the boundaries of any single organization. A good evaluation study can have positive spillover effects throughout the development community. Development evaluation has the characteristics of an international public good (Picciotto & Rist, 1995, p. 23).

Global public goods are important because, with the increased openness of national borders, the public domains of countries have become interlocked. A public good in one country often depends both on domestic policy and on events and policy choices made by other countries or internationally (gpgNet, 2008). Everyone depends on public goods – neither markets nor the wealthiest person can do without them. According to Picciotto (2002b), at the global level evaluation is:

…largely absent. Collaborative programs designed to deliver global public goods are not subjected to independent appraisal and, as a result, often lack clear objectives and verifiable performance indicators. In addition, the impact of developed country policies on poor countries is not assessed systematically even though aid, debt, foreign investment, pollution, migration patterns, and intellectual property regimes are shaped by the decisions of developed country governments.

Controlling the spread of, and ultimately eliminating, HIV/AIDS is another example of a global public good at the top of many international agendas. In this context, the impact of globalization on the poor has yet to be assessed. In short, development evaluation needs to become “…more indigenous, more global and more transnational” (Chelimsky & Shadish, 1997).


In 2004, the Independent Evaluation Group (IEG) of the World Bank released an evaluation of the World Bank’s involvement in global programs. The report, Addressing the Challenges of Globalization, investigated 26 Bank-supported global programs and drew lessons about the design, implementation, and evaluation of global programs (World Bank, IEG, 2004, Introduction section, ¶ 1). The report contained 18 findings, of which the following five are highlighted here:

• The Bank’s strategy for global programs is poorly defined.
• Global programs have increased overall aid very little.
• The voices of developing countries are inadequately represented.
• Global programs reveal gaps in investment and global public policy.
• Independent oversight of global programs is needed. (World Bank, IEG, 2004, Main report box, Findings)

The report also identified the following recommendations:

• establish a strategic framework for the Bank’s involvement in global programs
• link financing to priorities
• improve selectivity and oversight of the global program portfolio
• improve governance and management of individual programs
• conduct additional evaluation. (World Bank, IEG, 2004, Main report box, Recommendations)


Summary

Evaluation increasingly has to address more complex issues. A large majority of OECD countries now have mature monitoring and evaluation systems. They took different paths to get where they are and often differ considerably in approach, style, and level of development. To describe the maturity of these evaluation systems, the OECD uses nine criteria. A number of factors contributed to the adoption of an evaluation culture. Many of the earliest adopters of evaluation systems were predisposed to do so because they had democratic political systems, strong empirical traditions, civil servants trained in the social sciences, and efficient administrative systems and institutions. Once they embarked on building an evaluation system, countries followed one of three approaches: whole-of-government, enclave, or mixed. The whole-of-government approach involves a broad-based, comprehensive establishment of the system across government. It takes time: the government’s support must first be won, the necessary skills developed, and civil service structures and systems set up to make full use of the evaluation findings. The enclave approach is more limited; it focuses on one part or one sector of government. The mixed approach is a blend: some parts or sectors of government are comprehensively evaluated, while other areas receive more sporadic treatment. Developing countries face a more difficult path to achieving an evaluation system because they often lack democratic political systems, strong empirical traditions, civil servants trained in the social sciences, and efficient administrative systems and institutions. Development organizations are focusing on capacity development to assist the move to evaluation systems.


Many complex issues in development are influencing evaluation. These issues involve multiple bilateral and multilateral development partners. Some of the major drivers of the development agenda include:

• Millennium Development Goals (MDGs)
• Monterrey Consensus
• Paris Declaration on Aid Effectiveness
• Debt Initiative for Heavily Indebted Poor Countries (HIPC)
• the emergence of new actors in international development assistance
• conflict prevention and post-conflict reconstruction
• governance
• anti-money laundering and terrorist financing
• workers’ remittances
• gender: from Women in Development (WID) to Gender and Development (GAD) to gender mainstreaming
• private sector development (PSD) and investment climate
• environmental and social sustainability
• global public goods.


References and Further Reading

Adams, Richard H., Jr. and John Page (2003). “International Migration, Remittances and Poverty in Developing Countries.” World Bank Policy Research Working Paper 3179, December 2003.

BankTrack (2008). The Equator Principles. Retrieved March 26, 2008 from http://www.banktrack.org/

Boyle, Richard (2005). Evaluation capacity development in the Republic of Ireland. ECD Working Paper Series 14. Washington, D.C.: The World Bank.

Boyle, Richard (2002). “A two-tiered approach: Evaluation practice in the Republic of Ireland.” In J.E. Furubo, R.C. Rist, and R. Sandahl (eds.), International Atlas of Evaluation. New Brunswick, NJ: Transaction Publishers.

ChannahSorah, Vijaya Vinita (2003). “Moving from measuring processes to outcomes: Lessons learned from GPRA in the United States.” Presented at the World Bank and Korea Development Institute joint conference on performance evaluation systems and guidelines with application to large-scale construction, R&D, and job training investments. Seoul, South Korea, July 24-25.

CGAP (2003a). Consultative Group to Assist the Poor. Retrieved July 17, 2007 from http://www.cgap.org/

CGAP (2003b). CGAP publications on assessment and evaluation. Retrieved July 17, 2007 from http://www.cgap.org/portal/site/CGAP/menuitem.9fab704d4469eb0167808010591010a0/

Collier, Paul (2003). Breaking the conflict trap. Washington, D.C.: World Bank.

Chelimsky, Eleanor and William R. Shadish (eds.) (1997). Evaluation for the 21st century: A handbook. Thousand Oaks, CA: Sage.

Chemin, Matthieu (2008). “The benefits and costs of microfinance: Evidence from Bangladesh.” Special section on microfinance. Journal of Development Studies 44(4): 463-484. DOI: 10.1080/00220380701846735

China National Center for Evaluation of Science & Technology (NCSTE) and the Netherlands’ Policy and Operations Evaluation Department (IOB) (2004). A country-led joint evaluation of the ORET/MILIEV Programme in China. Retrieved March 26, 2008 from http://www.euforic.org/iob/docs/200610201336433964.pdf

Collier, Paul, V.L. Elliott, Håvard Hegre, Anke Hoeffler, Marta Reynal-Querol, and Nicholas Sambanis (2003). Breaking the conflict trap: Civil war and development policy. Oxford: Oxford University Press and Washington, D.C.: The World Bank.

CSR (2004). What is CSR? Retrieved July 17, 2007 from http://www.csr.gov.uk/whatiscsr.shtml

Development Committee, IFC (2004). “Strengthening the foundations for growth and private sector development: Investment climate and infrastructure development.” Retrieved July 17, 2007 from http://siteresources.worldbank.org/DEVCOMMINT/Documentation/20259614/DC2004-00011(E)Growth%20Agenda.pdf

The Equator Principles (2007). A milestone or just good PR? Retrieved July 17, 2007 from http://www.equator-principles.com/principles.shtml

Financial Action Task Force on Money Laundering (FATF) (2007). Monitoring the Implementation of the Forty Recommendations. Retrieved July 17, 2007 from http://www.fatfgafi.org/document/60/0,3343,en_32250379_32236920_34039228_1_1_1_1,00.html

Financial Action Task Force on Money Laundering (FATF) (2004). Retrieved July 17, 2007 from http://www.fatfgafi.org/dataoecd/14/53/38336949.pdf

Fitzpatrick, J.L., J.R. Sanders, and B.R. Worthen (2004). Program evaluation: Alternative approaches and practical guidelines. New York: Pearson Education Inc.

Feuerstein, M.T. (1986). Partners in evaluation: Evaluating development and community programs with participants. London: MacMillan, in association with Teaching Aids At Low Cost.

Friis, H. (1965). Development of social research in Ireland. Dublin: Institute of Public Administration.

Furubo, Jan-Eric and Rolf Sandahl (2002). “Coordinated Pluralism.” In Jan-Eric Furubo, Ray Rist, and Rolf Sandahl (eds.), International atlas of evaluation. New Brunswick, NJ: Transaction Publishers.

G-8 (2004). G-8 Plan of Support for Reform. Sea Island Summit, June 2004.

G-8 (2004). G-8 Action Plan: Applying the Power of Entrepreneurship to the Eradication of Poverty. Sea Island Summit, June 2004.

Gerrard, Christopher (2006). “Global partnership programs: Addressing the challenge of evaluation.” PowerPoint presentation, March 31, 2006. Retrieved April 3, 2008 from http://www.oecd.org/secure/pptDocument/0,2835,en_21571361_34047972_36368404_1_1_1_1,00.ppt

Government Accountability Office (GAO) (2003). Executive guide: Effectively implementing the Government Performance and Results Act. Retrieved March 25, 2008 from http://www.gao.gov/special.pubs/gpra.htm

gpgNet (2008). The global network on global public goods. Retrieved March 27, 2008 from http://www.sdnp.undp.org/gpgn/#

Graham, Jeffrey P. and R. Barry Spaulding (2005). Going global: Understanding foreign direct investment. JPG Consulting. Retrieved March 29, 2008 from http://www.goingglobal.com/articles/understanding_foreign_direct_investment.htm

Grameen Bank (2006). The Nobel Peace Prize 2006. Retrieved July 17, 2007 from http://www.grameeninfo.org/Media/mediadetail6.html

Guerrero, R. Pablo (1999). “Evaluation capacity development: Comparative insights from Colombia, China, and Indonesia.” In Richard Boyle and Donald Lemaire (eds.), Building effective evaluation capacity: Lessons from practice. New Brunswick, NJ: Transaction Publishers.

Hatry, Harry P., Elaine Morley, Shelli B. Rossman, and Joseph P. Wholey (2003). “How federal programs use outcome information: Opportunities for federal managers.” Washington, D.C.: IBM Endowment for the Business of Government.

Hauge, Arild (2001). “Strengthening capacity for monitoring and evaluation in Uganda: A results based perspective.” World Bank Operations Evaluation Department, ECD Working Paper Series, Number 8. Washington, D.C.

Hougi, Hong and Ray C. Rist (2002). “Evaluation capacity building in the People’s Republic of China.” In Furubo, Rist, and Sandahl (eds.), International atlas of evaluation. New Brunswick, NJ: Transaction Publishers.

Institute of Development Studies (IDS) (2008). Impact evaluation: The experience of official agencies. IDS Bulletin 39(1), March 2008. Retrieved May 28, 2008 from http://www.ntd.co.uk/idsbookshop/details.asp?id=1030

International Development Association and International Monetary Fund (2007). Heavily Indebted Poor Countries (HIPC) Initiative and Multilateral Debt Relief Initiative (MDRI) – status of implementation. Retrieved March 31, 2008 from http://siteresources.worldbank.org/DEVCOMMINT/Documentation/21510683/DC2007-0021(E)HIPC.pdf

International Federation of Accountants (2004). “Anti-money laundering.” Second edition, March 2004.

International Finance Corporation (2007). Business Environment Snapshots. Retrieved April 1, 2008 from http://rru.worldbank.org/documents/BES_Methodology_Note_External.pdf

International Monetary Fund and World Bank (2004). Global Monitoring Report 2004. Retrieved October 19, 2007 from http://www.imf.org/external/np/pdr/gmr/eng/2004/041204.htm

Investor Dictionary.com (2006). “Money laundering.” Retrieved July 17, 2007 from http://www.investordictionary.com/definition/money+laundering.aspx

Jalan, Bimal (2000). “Reflections on gender policy.” In Evaluating Gender Impact of Bank Assistance. World Bank Operations Evaluation Department.

Joint Progress Toward Enhanced Aid Effectiveness, High Level Forum (2005). Paris Declaration on Aid Effectiveness: Ownership, harmonization, alignment, results, and mutual accountability. Retrieved July 11, 2007 from http://www1.worldbank.org/harmonization/Paris/FINALPARISDECLARATION.pdf

Barslund, Mikkel and Finn Tarp (2008). “Formal and informal rural credit in four provinces of Vietnam.” Journal of Development Studies 44(4): 485-503. DOI: 10.1080/00220380801980798

Katjomulse, Kavazeua, Patrick N. Osakwe, Abebe Shimeles, and Sher Verick (2007). The Monterrey Consensus and development in Africa: Progress, challenges and way forward. Addis Ababa, Ethiopia: Financing Development Section; Trade, Finance, and Economic Commission for Africa; UN Economic Commission for Africa. Retrieved March 31, 2008 from http://www.uneca.org/eca_programmes/trade_and_regional_integration/documents/MonterreyConsensusMainReport.pdf

Khander, Shahidur R. (1998). Fighting poverty with microcredit: Experience in Bangladesh. New York: Oxford University Press.

Kreimer, Alcira, John Eriksson, Robert Muscat, Margaret Arnold, and Colin Scott (1998). The World Bank’s Experience with Post-Conflict Reconstruction. Retrieved July 17, 2007 from http://www.reliefweb.int/library/documents/2002/wbpostconflict-jun98.pdf

Kusek, Jody Zall, Ray C. Rist, and Elizabeth M. White (2004). How will we know Millennium Development results when we see them? Building a results-based monitoring and evaluation system to give us the answer. World Bank Africa Region Working Paper Series, Number 66, May 2004.

Lawrence, J. (1989). “Engaging recipients in development evaluation – the ‘stakeholder’ approach.” Evaluation Review 13(3).

Lee, Yoon-Shik (1999). “Evaluation coverage.” In Richard Boyle and Donald Lemaire (eds.), Building effective evaluation capacity: Lessons from practice. New Brunswick, NJ: Transaction Publishers.

Linux Information Project (2006). Public goods: A brief introduction. Retrieved March 27, 2008 from http://www.linfo.org/public_good.html

Nellis, John (1999). “Time to rethink privatization in transition economies?” IFC Discussion Paper Number 38.

Moser, Caroline O.N. (1995). “Evaluating gender impacts.” In Robert Picciotto and Ray C. Rist (eds.), Evaluating Country Development Policies and Programs: New Approaches for a New Agenda, Number 67, Fall 1995. Jossey-Bass Publishers.

Mackay, Keith (2008). M&E systems to improve government performance: Lessons from Australia, Chile and Colombia. PowerPoint presentation to a high-level delegation from the People’s Republic of China, Washington, D.C., March 6, 2008.

Mackay, Keith (2007). Three generations of national M&E systems in Australia. PowerPoint presentation to the Third Latin America & Caribbean Regional Conference on Monitoring and Evaluation, Lima, Peru, 23-24 July 2007.

Mackay, Keith (2002). “The Australian government: Success with a central, directive approach.” In Furubo, Rist, and Sandahl (eds.), International atlas of evaluation. New Brunswick, NJ: Transaction Publishers.

Meade, Jason (2001). An examination of the microcredit movement. Yahoo Geocities. Retrieved March 27, 2008 from http://www.geocities.com/jasonmeade3000/Microcredit.html

OECD (2007). Final ODA flows in 2006. Retrieved March 31, 2008 from http://www.oecd.org/dataoecd/7/20/39768315.pdf

OECD (2007a). Monitoring the Paris Declaration. Retrieved October 19, 2007 from http://www.oecd.org/department/0,3355,en_2649_15577209_1_1_1_1_1,00.html

OECD (2006). Final ODA data for 2005. Retrieved March 31, 2008 from http://www.oecd.org/dataoecd/52/18/37790990.pdf

OECD (2005). Aid rising sharply, according to latest OECD figures. Retrieved March 31, 2008 from http://www.oecd.org/dataoecd/0/41/35842562.pdf

OECD (2004). The Istanbul Ministerial Declaration on Fostering Growth of Innovative and Internationally Competitive SMEs. Retrieved July 17, 2007 from http://www.oecd.org/document/16/0,3343,en_2649_201185_32020176_1_1_1_1,00.html

OECD (2004a). Final ODA data for 2003. Retrieved March 31, 2008 from http://www.oecd.org/dataoecd/19/52/34352584.pdf

OECD (2002). Public Management and Governance. “Overview of results-focused management and budgeting in OECD member countries.” Twenty-third annual meeting of OECD Senior Budget Officials, Washington, D.C., June 3-4.

OECD (2002a). OECD glossary of statistical terms. Retrieved March 31, 2008 from http://stats.oecd.org/glossary/index.htm

OECD/DAC (2003). Joint OECD DAC/Development Centre Experts’ Seminar on aid effectiveness and selectivity: Integrating multiple objectives into aid allocations. Retrieved July 17, 2007 from http://www.oecd.org/document/51/0,2340,en_2649_34435_2501555_119808_1_1_1,00.html

OECD Development Co-operation Directorate (2007). Development aid from OECD countries fell 5.1% in 2006. Retrieved March 31, 2008 from http://www.oecd.org/document/17/0,3343,en_2649_33721_38341265_1_1_1_1,00.html

OECD Development Co-operation Directorate (2005). The Paris Declaration. Retrieved July 11, 2007 from http://www.oecd.org/document/18/0,2340,en_2649_3236398_35401554_1_1_1_1,00.html

Oxford Analytica (2004). “Foundations muscle into aid arena.” Oxford Analytica, August 10, 2004.

Oxford Analytica (2004a). “Remittances fund investment growth.” Oxford: Oxford Analytica, September 7, 2004.

Patton, M.Q. (2006). Recent trends in evaluation. Presentation to the International Finance Corporation, May 8, 2006.

Patton, M.Q. (1997). Utilization-focused evaluation (3rd ed.). Thousand Oaks, CA: Sage.

Picciotto, Robert (2003). “International trends and development evaluation: The need for ideas.” American Journal of Evaluation 24: 227-234.

Picciotto, Robert (2002a). “Development cooperation and performance evaluation: The Monterrey challenge.” World Bank.

Picciotto, Robert (2002b). “Development evaluation as a discipline.” Paper and presentation to IPDET, June 2002.

Picciotto, Robert and Ray C. Rist (1995). Evaluating country development policies and programs: New approaches and a new agenda. Jossey-Bass Publishers, Number 67, Fall 1995.

Pollin, Robert (2007). Microcredit: False hopes and real possibilities. Retrieved March 28, 2008 from http://www.fpif.org/fpiftxt/4323

Public Services Organisation Review Group (1969). Report of the Public Services Organisation Review Group. Dublin: Stationery Office.

Qureshi, Zia (2004). Millennium Development Goals and Monterrey Consensus: From vision to action. Washington, D.C.: World Bank. Retrieved October 19, 2007 from http://wbln0018.worldbank.org/eurvp/web.nsf/Pages/Paper+by+Qureshi/$File/MOHAMMED+QURESHI.PDF

Rasappan, Arunaselam (2007). Implementation strategies and lessons learnt with results based budgeting (Malaysia). Training course on program and performance budgeting, October 1-5, 2007. Retrieved April 2, 2008 from http://blogpfm.imf.org/pfmblog/files/rasappan_implementation_strategies_lessons_malaysia.pdf

Rist, Ray C. and Nicoletta Stame (eds.) (2006). From studies to streams: Managing evaluative systems. New Brunswick, NJ: Transaction Books.

Schacter, Mark (2000). “Sub-Saharan Africa: Lessons from experience in supporting sound governance.” World Bank Operations Evaluation Department, ECD Working Paper Series, Number 7. Washington, D.C.

Soros Foundations Network (2007). About OSI and the Soros Foundation Network. Retrieved July 17, 2007 from http://www.soros.org/about/overview

Takamasa, Akiyama and Kondo Masanori (eds.) (2003). Global ODA since the Monterrey Conference. Foundation for Advanced Studies on International Development (FASID), International Development Research Institute, Japan.

Tavistock Institute, in association with GHK and IRS (2003). The evaluation of socio-economic development: The GUIDE. Retrieved November 13, 2007 from http://coursenligne.sciencespo.fr/2004_2005/g_martin/guide2.pdf

Tedeschi, Gwendolyn Alexander (2008). “Overcoming selection bias in microcredit impact assessments: A case study in Peru.” Journal of Development Studies 44(4): 504-518. DOI: 10.1080/00220380801980822. Retrieved May 28, 2008 from http://www.informaworld.com/smpp/content~content=a792696580~db=all~order=page (The article also covers evaluation of Mibanco, an IFC client MFI in Peru.)

Thomas, Koshy, Deputy Undersecretary, Ministry of Finance, Malaysia (2007). “Integrated results based management in Malaysia.” In Results matter: Ideas and experiences on managing for development results. Asian Development Bank, December 2007. Retrieved April 2, 2008 from http://www.adb.org/Documents/Periodicals/MfDR/dec2007.pdf

Transparency International (2007). Retrieved July 17, 2007 from http://www.transparency.org/

Trosa, Sylvie (2008). Towards a post-bureaucratic management in France.

United Nations (2006). The United Nations Convention Against Transnational Organized Crime and its Protocols. Retrieved July 17, 2007 from http://www.unodc.org/unodc/en/crime_cicp_convention.html

United Nations (1998). Public-Private Partnerships: A New Concept for Infrastructure Development. United Nations.

USAID (2002). USAID-supported microcredit institution receives World Bank investment. Europe and Eurasia, Central Asian Republics Success Stories, Success Stories Archive, July 2002. Retrieved July 17, 2007 from http://www.usaid.gov/locations/europe_eurasia/car/successarchive/0207carsuccess.html

The World Bank (2007a). Conflict Prevention and Reconstruction. Retrieved July 17, 2007 from http://lnweb18.worldbank.org/ESSD/sdvext.nsf/67ByDocName/ConflictPreventionandReconstruction

The World Bank (2007b). The data revolution: Measuring governance and corruption. Retrieved July 17, 2007 from http://web.worldbank.org/WBSITE/EXTERNAL/NEWS/0,contentMDK:20190210~menuPK:34457~pagePK:34370~piPK:34424~theSitePK:4607,00.html

The World Bank (2007c). “The Data Revolution: Measuring Governance and Corruption.” Retrieved July 17, 2007 from http://web.worldbank.org/WBSITE/EXTERNAL/NEWS/0,,contentMDK:20190210~menuPK:34457~pagePK:34370~piPK:34424~theSitePK:4607,00.html

The World Bank (2007d). Doing business: Economy profile reports. http://rru.worldbank.org/DoingBusiness/

The World Bank (2007e). Global Monitoring Report 2007. Retrieved October 19, 2007 from http://web.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTGLOBALMONITOR/EXTGLOMONREP2007/0,,menuPK:3413296~pagePK:64218926~piPK:64218953~theSitePK:3413261,00.html

The World Bank (2007f). The enhanced HIPC Initiative – Overview. Retrieved November 13, 2007 from http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTDEBTDEPT/0,,contentMDK:21254881~menuPK:64166739~pagePK:64166689~piPK:64166646~theSitePK:469043,00.html

The World Bank (2007). Aid Architecture: An overview of the main trends in official development assistance flows, Executive Summary. Retrieved March 31, 2008 from http://siteresources.worldbank.org/IDA/Resources/Aidarchitecture-execsummary.pdf

The World Bank (2005). World Development Report 2005: “A better investment climate for everyone.” Washington, D.C.: World Bank.

Chapter 2 The World Bank (2005a) Global Monitoring Report 2005). Retrieved October 19, 2007 from http://www.worldbank.org/features/2005/gmr_0405.htm The World Bank, IEG (2004). Evaluating the World Bank’s approach to global programs Addressing the challenges of globalization. Retrieved July 11, 2007 from http://www.worldbank.org/oed/gppp/ The World Bank (2003). Global Development Finance 2003, Retrieved on July 17, 2007 from: http://siteresources.worldbank.org/INTRGDF/Resources/ GDF0slide0show103010DC0press0launch.pdf The World Bank (2001). “Strategic Directions for FY02–FY04.” Washington, D.C. Retrieved July 11, 2007, from http://lnweb18.worldbank.org/oed/oeddoclib.nsf/24cc3bb 1f94ae11c85256808006a0046/762997a38851fa0685256f8 200777e15/$FILE/gppp_main_report_phase_2.pdf#page=2 1 The World Bank (1999). “Monitoring and evaluation capacity development in Africa” in Précis Spring 1999, Number 183. Retrieved November 13, 2007 from http://wbln0018.worldbank.org/oed/oeddoclib.nsf/7f2a29 1f9f1204c685256808006a0025/34b9bade34aca617852567 fc00576017/$FILE/183precis.pdf The World Bank (1994). Enhancing Women’s Participation in Economic Development. Washington, D.C.: World Bank. World Development Report (2005b): A Better investment climate for everyone. Co-publication of World Bank and Oxford University Press. The World Bank (2006) Global Monitoring Report 2006. Retrieved October 19, 2007 from: http://web.worldbank.org/external/default/main?menuPK =2186472&pagePK=64218926&piPK=64218953&theSitePK =2186432 Uganda Office of the Prime Minister (2007-a). “Working Note: Monitoring and evaluation of the National Development Plan”. October 2007. Uganda Office of the Prime Minister (2007-b). National integrated monitoring and evaluation strategy (NIMES): 20062007 bi-annual implementation progress report. Reporting period: January to June 2007. United Nations Conference on Trade and Development (UNCTD) (2008). Press Release: Foreign direct investment reached new record in 2007. Retried March 27, 2008 from http://www.unctad.org/Templates/Webflyer.asp?docID=94 39&intItemID=2068&lang=1 Page 130

The Road to Results: Designing and Conducting Effective Development Evaluations

Understanding Issues Driving Development Evaluation The World Bank (2008) Online atlas of the Millennium Development Goals: Building a better world. Retrieved April 1, 2008 from http://devdata.worldbank.org/atlas-mdg/ Woroniuk, B., and Schalkwyk, J. (1998). OECD DAC, Gender Tipsheet, Evaluation, Prepared for Sida. Retrieved July 17, 2007 from: http://www.oecd.org/dataoecd/2/13/1896352.pdf

Web Sites

Conflict
Conflict Prevention and Reconstruction Unit, World Bank: http://lnweb18.worldbank.org/ESSD/sdvext.nsf/67ByDocName/ConflictPreventionandReconstruction

Crime
UN Convention Against Transnational Organized Crime and its Protocols: http://www.unodc.org/unodc/en/crime_cicp_convention.html

Environment
The Equator Principles: http://www.equator-principles.com/ga1.shtml

Finance
CGAP: http://www.cgap.org/
CGAP assessment and evaluation: http://www.cgap.org/publications/assessment_evaluation.html
Tedeschi, Gwendolyn Alexander (2008). "Overcoming selection bias in microcredit impact assessments: A case study in Peru," 504-518. DOI: 10.1080/00220380801980822. (The article also covers an evaluation of Mibanco, an IFC client MFI in Peru.) Retrieved May 28, 2008 from http://www.informaworld.com/smpp/content~content=a792696580~db=all~order=page
The World Bank, Doing Business: Economy Profile Reports: http://rru.worldbank.org/DoingBusiness/

Gender
OECD DAC, Gender Tipsheet, Evaluation: http://www.oecd.org/dataoecd/2/13/1896352.pdf
The World Bank, "Enhancing Women's Participation in Economic Development": http://www.worldbank.org/gender/overview/enhance.htm


Governance
Transparency International: http://www.transparency.org/
Governance Matters 2007 Web site: http://info.worldbank.org/governance/wgi2007/
Governance Matters Report 2007: http://info.worldbank.org/governance/wgi2007/pdf/booklet_decade_of_measuring_governance.pdf
Governance Matters Video 2007: http://web.worldbank.org/WBSITE/EXTERNAL/NEWS/0,,contentMDK:21400275~menuPK:51416191~pagePK:64257043~piPK:437376~theSitePK:4607,00.html
UK Policy, Corporate Social Responsibility: Policy and Legislation-UK: http://www.societyandbusiness.gov.uk/ukpolicy.shtml

Impact Evaluation
Institute of Development Studies (IDS) (2008). Impact Evaluation: The Experience of Official Agencies. IDS Bulletin, vol. 39, no. 1, March 2008. Retrieved May 28, 2008 from http://www.ntd.co.uk/idsbookshop/details.asp?id=1030

Millennium Development Goals
Online atlas of the Millennium Development Goals: Building a Better World: http://devdata.worldbank.org/atlas-mdg/
Millennium Development Goals: http://www.un.org/millenniumgoals/
OECD, "Istanbul Ministerial Declaration on Fostering Growth of Innovative and Internationally Competitive SMEs," June 2004: http://www.oecd.org
OECD DAC, joint evaluations: http://www.oecd.org/document/51/0,2340,en_2649_34435_2501555_119808_1_1_1,00.html
Global Monitoring Report 2004: http://www.imf.org/external/np/pdr/gmr/eng/2004/041204.htm
Global Monitoring Report 2005: http://www.worldbank.org/features/2005/gmr_0405.htm
Global Monitoring Report 2006: http://web.worldbank.org/external/default/main?menuPK=2186472&pagePK=64218926&piPK=64218953&theSitePK=2186432


Global Monitoring Report 2007: http://web.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTGLOBALMONITOR/EXTGLOMONREP2007/0,,menuPK:3413296~pagePK:64218926~piPK:64218953~theSitePK:3413261,00.html

Monterrey Consensus
Qureshi, Zia (2004). Millennium Development Goals and Monterrey Consensus: From Vision to Action. World Bank: http://wbln0018.worldbank.org/eurvp/web.nsf/Pages/Paper+by+Qureshi/$File/MOHAMMED+QURESHI.PDF

Money Laundering
Financial Action Task Force on Money Laundering: http://www1.oecd.org/fatf/AboutFATF_en.htm

New Actors
Soros Foundation: http://www.soros.org/

Paris Declaration
OECD Development Co-operation Directorate (DCD-DAC) (2005). The Paris Declaration: http://www.oecd.org/document/18/0,2340,en_2649_3236398_35401554_1_1_1_1,00.html
OECD (2007). Monitoring the Paris Declaration: http://www.oecd.org/department/0,3355,en_2649_15577209_1_1_1_1_1,00.html

Population
Population Issues and the Role of the World Bank: http://web.worldbank.org/WBSITE/EXTERNAL/NEWS/0,,contentMDK:21415943~pagePK:34370~piPK:34424~theSitePK:4607,00.html

Poverty
Poverty-Environment Web site: http://www.povertyenvironment.net
PovertyNet newsletter (from The World Bank): http://www.worldbank.org/poverty
AdePT software to make poverty analysis easier and faster: http://econ.worldbank.org/programs/poverty/adept


Other and General Information
OECD, "Istanbul Ministerial Declaration on Fostering Growth of Innovative and Internationally Competitive SMEs," June 2004: http://www.oecd.org
OECD DAC, joint evaluations: http://www.oecd.org/document/51/0,2340,en_2649_34435_2501555_119808_1_1_1,00.html
UK Policy, Corporate Social Responsibility: Policy and Legislation-UK: http://www.societyandbusiness.gov.uk/ukpolicy.shtml
The World Bank Participation Sourcebook (HTML format): http://www.worldbank.org/wbi/sourcebook/sbhome.htm
The World Bank, "About Private Participation in Infrastructure": http://www.worldbank.org/infrastructure/ppi/


Preparing

"Our plans miscarry because they have no aim. When a man does not know what harbour he is making for, no wind is the right wind." (Seneca)

Chapter 3: Building a Results-Based Monitoring and Evaluation System
• Importance of Results-Based M&E
• What Is Results-Based M&E?
• Traditional vs. Results-Based M&E
• The Ten Steps to Building a Results-Based M&E System

Chapter 4: Understanding the Evaluation Context and Program Theory of Change
• Front-end Analysis
• Identifying the Main Client and Key Stakeholders
• Understanding the Context
• Constructing, Working with, and Assessing a Theory of Change

Chapter 5: Considering the Evaluation Approach
• Introduction to Evaluation Approaches
• Development Evaluation Approaches
• Challenges Going Forward


Chapter 3
Building a Results-Based Monitoring and Evaluation System

Introduction
In all parts of the world, governments are attempting to address demands and pressures for improving the lives of their citizens. Internal and external pressures and demands on governments and development organizations are causing them to seek ways to improve public management. Improvements might include greater accountability and transparency and enhanced effectiveness of interventions. Results-based monitoring and evaluation (M&E) is a management tool to help track progress and demonstrate the impact of development projects, programs, and policies. This chapter is the one place in this textbook where we explicitly address the function of monitoring; after this chapter, our attention turns exclusively to evaluation. There are four parts in this chapter:
• Importance of Results-based M&E
• What Is Results-based M&E?
• Traditional vs. Results-based M&E
• The Ten Steps to Building a Results-based M&E System


Part I: Importance of Results-based M&E
There are growing pressures in developing countries to improve the performance of their public sectors. Responding to these pressures leads countries to begin to develop performance management systems. These new systems involve reform in budgeting, human resources, and organizational culture. To assess whether public sector efforts are working, there is also a need for performance measurement. This is where monitoring and evaluation (M&E) systems come in: they track the results produced (or not) by governments and other entities. Many of the international and external initiatives covered in Chapter 2 of this text are pushing governments to adopt public management systems that show results. The Millennium Development Goals (MDGs) and the Heavily Indebted Poor Countries (HIPC) Initiative are two examples of these initiatives. This chapter describes a ten-step approach to the design and construction of a results-based M&E system that is currently being implemented in a number of developing countries (see Figure 3.1).

Fig. 3.1: Ten Steps to Designing, Building, and Sustaining a Results-based Monitoring and Evaluation System
1. Conducting a Readiness Assessment
2. Agreeing on Outcomes to Monitor and Evaluate
3. Selecting Key Indicators to Monitor Outcomes
4. Baseline Data on Indicators—Where Are We Today?
5. Planning for Improvement—Setting Realistic Targets
6. Monitoring for Results
7. The Role of Evaluations
8. Reporting Findings
9. Using Findings
10. Sustaining the M&E System within the Organization


The overall strategy outlined in this chapter builds on the experiences of developed countries—especially those in the OECD—but also reflects the particular challenges and difficulties faced by many developing countries as they, too, try to initiate performance measurement systems. Their challenges can range from a lack of skill capacity to poor governance structures to systems that are far from transparent. Although the primary focus of this chapter is on improving government effectiveness and accountability using a sound monitoring and evaluation system, the principles and strategies apply equally well at the level of organizations, policies, programs, and projects.

The Power of Measuring Results
• If you do not measure results, you cannot tell success from failure.
• If you cannot see success, you cannot reward it.
• If you cannot reward success, you are probably rewarding failure.
• If you cannot see success, you cannot learn from it.
• If you cannot recognize failure, you cannot correct it.
• If you can demonstrate results, you can win public support. (Osborne & Gaebler, 1992)

A results-based M&E system provides crucial information about public sector or organizational performance. It can help policy makers, decision makers, and other stakeholders answer the fundamental questions of whether promises were kept and outcomes achieved. If governments are promising improved performance, monitoring and evaluation is the means by which improvements – or a lack of improvements – can be demonstrated. By reporting the results of various interventions, governments and other organizations can promote credibility and public confidence in their work. Such practices also support a development agenda that is shifting towards greater accountability for aid lending.


A good results-based M&E system can be extremely useful as a management and motivational tool. It helps focus people's attention on achieving outcomes that are important to the organization and its stakeholders, and it provides an impetus for establishing key goals and objectives that address these outcomes. It also provides managers with crucial information on whether the theory of change guiding the intervention is appropriate, correct, and adequate to the changes being sought through the intervention. Once outcomes are established, indicators selected, and targets set, and the organization is striving to achieve them, the M&E system can provide timely information to staff about progress and can help with the early identification of any weaknesses that require corrective action. A good M&E system is an essential source of information for streamlining and improving interventions to maximize the likelihood of success. It also helps identify promising interventions early so that they can potentially be implemented elsewhere. Having data available about how well a particular project, practice, program, or policy works provides useful information for formulating and justifying budget requests. It also allows judicious allocation of scarce resources to the interventions that will provide the greatest benefits. Monitoring data also provide information on outliers – those sites, be they clinics, hospitals, schools, or neighbourhoods, that are doing particularly well or poorly. Evaluation can then be undertaken to find out why and to learn from it.

Part II: What Is Results-based Monitoring and Evaluation?
Results-based information can come from two sources: a monitoring system and an evaluation system. Both systems are essential for effective performance measurement; they are distinct but complementary. Results-based monitoring is a continuous process of measuring progress toward explicit short-term, intermediate, and long-term results by tracking indicators. It can provide feedback on progress (or lack thereof) to staff and decision makers, who can use the information in various ways to improve performance.


Definition of Results-based Monitoring
Results-based monitoring (what we call "monitoring") is a continuous process of collecting and analyzing information on key indicators and comparing actual results to expected results in order to measure how well a project, program, or policy is being implemented.

Definition of Results-based Evaluation
Results-based evaluation is an assessment of a planned, ongoing, or completed intervention to determine its relevance, efficiency, effectiveness, impact, and sustainability. The intention is to provide information that is credible and useful, enabling the incorporation of lessons learned into the decision-making process of recipients. The main differences between results-based monitoring and evaluation are that:
• Monitoring focuses on tracking evidence of movement towards the achievement of specific, predetermined targets through the use of indicators.
• Evaluation takes a broader view of an intervention, asking whether progress towards the target or explicit result is caused by the intervention or whether there is some other explanation for the changes showing up in the monitoring system. Other types of evaluation questions include:
− Are the targets and outcomes relevant and worthwhile in the first place?
− How effectively and efficiently are they being achieved?
− What unanticipated effects have been caused by the intervention?
− Does the intervention as a package represent the most cost-effective and sustainable strategy for addressing a particular set of identified needs?


Part III: Traditional vs. Results-Based M&E
It is not a new phenomenon for governments to monitor and evaluate their own performance. Governments have, over time, tracked their:
• expenditures and revenues
• staffing levels and resources
• program and project activities
• numbers of participants
• goods and services produced, etc.

A theoretical distinction needs to be drawn, however, between traditional M&E and results-based M&E.

Traditional M&E focuses on the monitoring and evaluation of inputs, activities, and outputs (i.e., on project or program implementation). Results-based M&E, in contrast, combines the traditional approach of monitoring implementation with the assessment of results.

It is this linking of implementation progress with progress in achieving the desired objectives or results of government policies and programs that makes results-based M&E most useful as a tool for public management. Implementing this type of M&E system allows the organization to modify and make adjustments to the theory of change as well as the implementation processes in order to more directly support the achievement of desired objectives and outcomes.

Brief Introduction to Theory of Change
One way to see the difference between traditional and results-based M&E is to consider the theory of change. A theory of change is a representation of how an organization or initiative is expected to lead to results, together with an identification of the underlying assumptions being made. More information about theories of change, and how to construct one, is provided in Chapter 4: Understanding the Evaluation Context and Program Theory of Change.


While theory of change models can vary considerably in how they look, they typically have three main components: activities, outputs, and results. Table 3.1 summarizes the components of a logic model.

Table 3.1: Components of a Graphic Representation of a Theory of Change
• Activities (key attribute: what we do) – The main actions of the project. The description may begin with an action verb (e.g., market, provide, facilitate, deliver).
• Outputs (key attribute: what we produce) – The tangible products or services produced as a result of the activities. They are usually expressed as nouns, typically without modifiers; they are tangible and can be counted.
• Results (key attribute: why we do it) – The changes or differences that result from the project outputs. There can be up to three levels of results (immediate, intermediate, and ultimate or final). Results are usually modified (e.g., increased, decreased, enhanced, improved, maintained).
− Immediate results: those changes that result from the outputs; these are most closely associated with, or attributed to, the project.
− Intermediate results: those changes that result from the immediate results and will lead to the ultimate results.
− Ultimate results: those changes that result from the intermediate results; generally considered a change in overall "state" and can be similar to strategic objectives. Final results link to the agency's strategic results as specified in the MRRS.

Some theory of change models also include other features, such as:
• Reach or target groups – To which target groups/clients are the activities directed?
• Inputs – What resources are used?
• Internal/external factors – The identification of factors within and outside the intervention's control or influence.

Fig. 3.2: Program Theory of Change (Logic Model) to Achieve Results – Outcomes and Impacts
Results:
• Impacts – long-term, widespread improvement in society
• Outcomes – behavioral changes, both intended and unintended, positive and negative
Implementation:
• Outputs – products and services produced/delivered
• Activities – tasks personnel undertake in order to transform inputs into outputs
• Inputs – financial, human, and material resources
(Read the model from the bottom up: inputs lead to activities, activities to outputs, outputs to outcomes, and outcomes to impacts.)

Let us use this model to frame a results-based approach to reducing childhood morbidity, using oral re-hydration therapy (ORT) as an example, as shown in Figure 3.3.

Fig. 3.3: Program Theory of Change (Logic Model) to Reduce Childhood Morbidity via Use of ORT
Results:
• Impacts – child morbidity reduced
• Outcomes – improved use of ORT in the management of childhood diarrhea (behavioral change)
Implementation:
• Outputs – increased maternal awareness of and access to ORT services
• Activities – media campaigns to educate mothers, health personnel trained in ORT, etc.
• Inputs – funds, ORT supplies, trainers, etc.
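As a purely illustrative aside (not part of the original text), the ORT example in Figure 3.3 could be recorded in a simple data structure so that each level of the logic model is explicit. This is a minimal sketch; the Python class and field names are hypothetical.

from dataclasses import dataclass, field
from typing import List

@dataclass
class LogicModel:
    """Minimal sketch of a program theory of change (logic model)."""
    inputs: List[str] = field(default_factory=list)      # financial, human, and material resources
    activities: List[str] = field(default_factory=list)  # what we do
    outputs: List[str] = field(default_factory=list)     # what we produce
    outcomes: List[str] = field(default_factory=list)    # behavioral changes
    impacts: List[str] = field(default_factory=list)     # long-term, widespread improvement

# The ORT example from Figure 3.3
ort_model = LogicModel(
    inputs=["funds", "ORT supplies", "trainers"],
    activities=["media campaigns to educate mothers",
                "training of health personnel in ORT"],
    outputs=["increased maternal awareness of and access to ORT services"],
    outcomes=["improved use of ORT in management of childhood diarrhea"],
    impacts=["child morbidity reduced"],
)

Writing the chain down this explicitly makes it easier to check, level by level, whether each link in the theory of change is plausible.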


Performance Indicators
Monitoring involves measuring progress towards achieving an outcome or impact (results). The outcome, however, cannot usually be measured directly; it must first be translated into a set of indicators that, when regularly measured, provide information about whether or not the outcomes or impacts are being achieved. A performance indicator (sometimes simply called an indicator) is "a variable that allows the verification of changes in the development intervention or shows results relative to what was planned" (OECD/DAC 2002, p. 29). For example, if country X selects the target of improving the health of children by reducing childhood morbidity from infectious diseases by 30 percent over the next five years, it must first identify a set of indicators that translate changes in the incidence of childhood morbidity from infectious diseases into more specific measurements. Indicators are the measures that evaluators track to see whether the change they seek has actually occurred. Indicators that can help assess changes in childhood morbidity might include:
• the incidence and prevalence of infectious diseases, such as hepatitis (a direct determinant)
• the level of maternal health (an indirect determinant)
• the degree to which children have access to sanitary water supplies
• the number of calories consumed.

It is the cumulative evidence of a cluster of indicators that managers examine to see if their program is making progress. We strongly urge that no outcome or impact be measured by just one indicator. Measuring a disaggregated set of indicators (a set of indicators that has been divided into constituent parts) provides important information about how well government programs and policies are working to achieve the intended outcome or impact. Disaggregated indicators are also used to identify program outliers (those sites that are performing better or worse than the average) and to identify policies that are not performing well.

If, for example, measurement of a set of indicators shows that, over time, fewer and fewer children have clean water supplies available to them, the government can use this information to reform programs aimed at improving water supplies, or to strengthen those programs that provide information to parents about the need to sanitize water before giving it to their children.
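A minimal sketch of this idea, with hypothetical data and a made-up structure (not drawn from the text), shows how tracking a disaggregated indicator over time can flag the constituent group in which performance is slipping:

from typing import Dict

# Hypothetical yearly values (%) for one indicator, disaggregated by area:
# the degree to which children have access to sanitary water supplies
clean_water_access: Dict[str, Dict[int, int]] = {
    "urban": {2004: 82, 2005: 83, 2006: 84},
    "rural": {2004: 61, 2005: 58, 2006: 55},
}

for group, series in clean_water_access.items():
    years = sorted(series)
    first, last = series[years[0]], series[years[-1]]
    trend = "declining" if last < first else "improving or stable"
    print(f"{group}: {first}% -> {last}% ({trend})")
# urban: 82% -> 84% (improving or stable)
# rural: 61% -> 55% (declining)  <- a signal that water supply programs may need reform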

It is important to note that performance information obtained from a monitoring system reveals only the performance of what is being measured at that time, although it can be compared against both past performance and some planned level of present or anticipated performance (targets). Monitoring data do not reveal why that level of performance occurred, nor do they provide causal explanations about changes in performance from one reporting period to another or from one site to another. That information comes from an evaluation system. An evaluation system serves a complementary but distinct function from that of a monitoring system within a results-based management framework. Building an evaluation system allows for:
• a more in-depth study of results-based outcomes and impacts
• bringing in data sources other than existing indicators
• addressing factors that are too difficult or expensive to monitor continuously
• tackling the question of why and how the trends being tracked with monitoring data are moving in the directions they are (perhaps most important).

Such data on impacts and causal attribution are not to be taken lightly; they can play an important role in an organization's strategic resource allocation decisions.


Part IV: The Ten Steps to Building a Results-Based M&E System
Building a quality results-based M&E system involves ten steps (see also Figure 3.1, presented at the beginning of the chapter):
1. Conducting a Readiness Assessment
2. Agreeing on Performance Outcomes to Monitor and Evaluate
3. Selecting Key Indicators to Monitor Outcomes
4. Baseline Data on Indicators—Where Are We Today?
5. Planning for Improvement—Setting Realistic Targets
6. Monitoring for Results
7. The Role of Evaluations
8. Reporting Findings
9. Using Findings
10. Sustaining the M&E System within the Organization

Step One: Conducting a Readiness Assessment

A readiness assessment is a way of determining the capacity and willingness of a government and its development partners to construct a results-based M&E system. This assessment addresses such issues as the presence or absence of champions and incentives, roles and responsibilities, organizational capacity, and barriers to getting started.


Incentives. The first part of the readiness assessment involves understanding what incentives exist for moving forward to construct the M&E system and, conversely, what disincentives will hinder progress. Specific questions to consider include:
1. What is driving the need for building an M&E system?
2. Who are the champions for building and using an M&E system?
3. What is motivating those who champion building an M&E system?
4. Who will benefit from the system?
5. Who will not benefit?

Roles and Responsibilities. Next, it is important to identify who is currently responsible for producing data in the organization and in other relevant organizations, and who the main users of data are. For example:
1. What are the roles of central and line ministries in assessing performance?
2. What is the role of parliament?
3. What is the role of the supreme audit agency?
4. Do ministries and agencies share information with one another?
5. Is there a political agenda behind the data produced?
6. Who in the country produces data?
7. Where at different levels in the government are data used?

Organizational Capacity. A key element driving the organization's readiness for a results-based monitoring and evaluation system relates to the skills, resources, and experience the organization has available.


Questions to ask when assessing organizational capacity include:
1. Who in the organization has the technical skills to design and implement such a system?
2. Who has the skills to manage an M&E system?
3. What data systems currently exist within the organization, and of what quality are they?
4. What technology is available to support the necessary data system? Database capacity, availability of data analysis and reporting software, and the like should be part of the assessment.
5. What fiscal resources are available to design and implement an M&E system?
6. What experience does the organization have with performance reporting systems?

Barriers. As with any organizational change intervention, it is important to consider what could potentially stand in the way of effective implementation. Questions to ask here include:
1. Do any of the following immediate barriers to getting started in building an M&E system now exist?
− lack of fiscal resources
− lack of political will
− lack of a champion for the system
− lack of an outcome-linked strategy
− lack of prior experience
2. How do we confront these barriers?


Key Questions for Predicting Success in the Construction of an M&E System
• Does a clear mandate exist for M&E at the national level (e.g., Poverty Reduction Strategy Papers (PRSPs), laws and regulations, civil society demands, other)?
• Is there strong leadership and support at the most senior levels of the government?
• How reliable is the information that may be used for policy and management decision making?
• How involved is civil society as a partner with government in building and tracking performance information?
• Are there pockets of innovation that can serve as beginning practices or pilot programs?

At the end of the readiness assessment, senior government officials confront the question of whether to move ahead with constructing a results-based M&E system. Essentially, the question is "go or no go?" (now, soon, or maybe later). If the decision is to move forward, we are ready to consider Step Two.

Step Two: Agreeing on Performance Outcomes to Monitor and Evaluate

As mentioned previously, it is important to generate an interest in assessing the outcomes and impacts the organization or government is trying to achieve, rather than simply focusing on implementation issues (inputs, activities, and outputs). After all, outcomes are what tell you whether or not the specific intended benefits have been realized.


Strategic outcomes and impacts should focus and drive the resource allocation and activities of the government and its development partners. These impacts should be derived from the strategic priorities of the country. Issues to consider when generating a list of outcomes include:
• Are there stated national or sectoral goals (for example, Vision 2016)?
• Have political promises been made that specify improved performance in a particular area?
• Do citizen polling data (e.g., "citizen scorecards") indicate specific concerns?
• Is aid lending linked with specific goals?
• Is authorizing legislation present?
• Is the government making a serious commitment to achieving the Millennium Development Goals (MDGs)?

There are many different strategies available for gathering information about the concerns of major stakeholder groups, including brainstorming sessions, interviews, focus groups, and surveys. When using these methods, the focus should be on existing concerns rather than potential future problems. Table 3.2a shows the first step in developing outcomes for one policy area (education): setting clear outcomes. Here are hypothetical examples of two outcomes a Ministry of Education might set for itself.

Table 3.2a: Developing Outcomes for Education Policy
Outcomes:
1. Nation's children have improved coverage with preschool programs.
2. Primary school learning outcomes for children are improved.
(Indicators, baselines, and targets are completed in Tables 3.2b through 3.2d.)


Summary of Outcomes: Why an Emphasis on Outcomes?
• Outcomes make explicit the intended results of government action ("know where you are going before you get moving").
• Outcomes are the results that governments hope to achieve.
• Clearly setting outcomes is essential to designing and building results-based M&E systems.
• Important: budget to outputs, manage to outcomes!
• Outcomes are usually not directly measured – only reported on.
• Outcomes must be translated into a set of key indicators.

Step Three: Developing Key Indicators to Monitor Outcomes

As the old adage goes, “What gets measured gets done.” Specification of exactly what is to be measured in order to gauge achievement of outcomes not only helps us track progress; it can also be a powerful motivating tool to focus efforts and create alignment within an organization if it is done early enough in the process. An indicator is a specific measure that, when tracked systematically over time, indicates progress (or not) towards a specific target. And, just to be explicit, we recommend that in new M & E systems, all the indicators should be numerical. Qualitative indicators can come later when the M & E system is more mature.


An outcome indicator answers the question:

How will we know success when we see it?

Indicator development is a core activity in building an M&E system; it drives all subsequent data collection, analysis, and reporting. The political and methodological issues in creating credible and appropriate indicators are not to be underestimated. According to Schiavo-Campo (1999), good indicators have the "CREAM" of good performance; that is, they should be:
• Clear (precise and unambiguous)
• Relevant (appropriate to the subject at hand)
• Economic (available at reasonable cost)
• Adequate (able to provide a sufficient basis to assess performance)
• Monitorable (amenable to independent validation).

Sometimes it is possible to minimize costs by using predesigned indicators. However, it is important to consider how relevant they are (and will be perceived to be) to the specific country's context. Some may need to be adapted to fit, or supplemented with others that are more locally relevant. In general, it is a good idea to select more than one indicator, but fewer than seven, for each outcome. Indicators, once selected, are not cast in stone: expect to add new ones and drop old ones over time as you improve and streamline the monitoring system. How many indicators are enough? The minimum number that answers the question:

Has the outcome been achieved?

Figure 3.4 shows a matrix that must be completed before an indicator is used in an M&E system. For each indicator, the matrix records:
• the data source
• the data collection method
• who will collect the data
• the frequency of data collection
• the cost to collect the data
• the difficulty of collection
• who will analyze and report the data
• who will use the data.

Fig. 3.4: Matrix for Selecting Indicators


The matrix shown in Figure 3.4 is important. Completing each cell in the matrix, for each indicator, gives a better idea of the feasibility of actually deploying each indicator. Data systems may not be available for every indicator. The performance indicators selected and the data collection strategies used to collect information on them need to be grounded in reality (Kusek & Rist, 2004, p. 83). Consider (see the illustrative sketch after this list):
• what data systems are in place
• what data can presently be produced
• what capacity exists to expand the breadth and depth of data collection and analysis.
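For illustration only (the class and the sample answers are hypothetical, not from the text), one row of the Figure 3.4 matrix could be recorded like this, which forces each planning question to be answered before the indicator is adopted:

from dataclasses import dataclass

@dataclass
class IndicatorPlan:
    """One row of the Figure 3.4 matrix for selecting indicators."""
    indicator: str
    data_source: str
    collection_method: str
    who_collects: str
    frequency: str                 # e.g., "annual", "quarterly"
    cost_to_collect: str           # e.g., "low", "medium", "high"
    difficulty_to_collect: str
    who_analyzes_and_reports: str
    who_uses: str

# Hypothetical answers for the urban pre-school indicator from Table 3.2b
preschool_urban = IndicatorPlan(
    indicator="% of eligible urban children enrolled in pre-school education",
    data_source="Ministry of Education enrolment records",
    collection_method="review of administrative records",
    who_collects="district education offices",
    frequency="annual",
    cost_to_collect="low",
    difficulty_to_collect="low",
    who_analyzes_and_reports="ministry M&E unit",
    who_uses="ministry managers and development partners",
)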

Table 3.2b shows the second step in developing outcomes for education policy: identifying the indicators that will be used to measure performance.

Table 3.2b: Developing Outcomes for Education Policy (continued, showing indicators)
Outcome 1: Nation's children have improved coverage with pre-school programs.
• Indicator 1: % of eligible urban children enrolled in pre-school education.
• Indicator 2: % of eligible rural children enrolled in pre-school education.
Outcome 2: Primary school learning outcomes for children are improved.
• Indicator 1: % of Grade 6 students scoring 70% or better on standardized math and science tests.
• Indicator 2: Mean % score of Grade 6 students on standardized math and science tests compared with baseline data.
(Baselines and targets are completed in Tables 3.2c and 3.2d.)

Summary of Developing Indicators
• You will need to develop your own indicators to meet your own needs.
• Developing good indicators takes more than one try!
• Arriving at the final indicators will take time!
• All indicators should be stated neutrally, not as "increase in" or "decrease in."
• Pilot, pilot, and pilot!


Step Four: Gathering Baseline Data on Indicators

The measurement of progress (or the lack of it) towards outcomes begins with describing and measuring the initial conditions that the outcomes address. Collecting baseline data essentially means taking the first measurements of the indicators to find out "Where are we today?" A performance baseline is information (qualitative or quantitative) about performance on the chosen indicators at the beginning of (or immediately prior to) the intervention. In fact, one consideration when choosing indicators is the availability of baseline data, which allows subsequent performance to be tracked relative to that initial baseline.

Sources of baseline data can be either primary (gathered specifically for this measurement system) or secondary (collected for another purpose but usable for reporting on one or more of the selected indicators). Secondary data can come from within your organization, from the government, or from international data sources. Secondary data can save money, but they need to be checked to confirm that they really provide the information needed; it will be extremely difficult to go back and collect primary baseline data if the secondary source later proves inadequate. Possible sources of data include:
• written records (paper and electronic)
• individuals involved with the intervention
• the general public
• trained observers
• mechanical measurements and tests
• geographic information systems.


Once the sources of baseline data for the indicators are chosen, evaluators decide who is going to collect the data and how. They identify and develop data collection instruments, such as forms for gathering information from files or records, interview protocols, surveys, and observational instruments. As they develop the collection instruments, they keep practical issues in mind:
• Are quality data currently available (or easily accessible)?
• Can data be procured on a regular and timely basis, to allow tracking of progress?
• Is the planned primary data collection feasible and cost-effective?

There are many ways to collect data. We will learn more about these in Chapter 9: Planning Data Analysis and Completing the Design Matrix. Figure 3.5 summarizes many techniques, which range from informal, less structured methods (also less costly) to more structured and formal methods (also more costly).

Fig. 3.5: Summary of Data Collection Methods
Ranging from informal and less structured to more structured and formal, the methods include: conversations with concerned individuals, community interviews, field visits, key informant interviews, participant observation, focus group interviews, direct observations, reviews of official records (MIS and administrative data), one-time surveys, panel surveys, censuses, and field experiments.

Table 3.2c shows the third step in developing outcomes for education policy: establishing baselines.


Table 3.2c: Developing Outcomes for Education Policy (continued, showing baseline data)
Outcome 1: Nation's children have improved coverage with pre-school programs.
• Indicator 1: % of eligible urban children enrolled in pre-school education. Baseline: 75% urban in 1999.
• Indicator 2: % of eligible rural children enrolled in pre-school education. Baseline: 40% rural in 2000.
Outcome 2: Primary school learning outcomes for children are improved.
• Indicator 1: % of Grade 6 students scoring 70% or better on standardized math and science tests. Baseline: in 2002, 47% scored 70% or better in math and 50% scored 70% or better in science.
• Indicator 2: Mean % score of Grade 6 students on standardized math and science tests compared with baseline data. Baseline: in 2002, the mean score for Grade 6 students was 68% in math and 53% in science.
(Targets are completed in Table 3.2d.)

Step Five: Planning for Improvement—Setting Realistic Targets

The next step, the final one in building the performance framework, is establishing targets. According to Kusek and Rist (2004, p. 91), "In essence, targets are the quantifiable levels of the indicators that a country, society, or organization wants to achieve by a given time." Most outcomes and nearly all impacts in international development are long term, complex, and not quickly achieved. There is therefore a need to establish interim targets that specify how much progress towards an outcome is to be achieved, in what time frame, and with what level of resource allocation. Measuring performance against these targets can involve both direct and proxy indicators, as well as both quantitative and qualitative data.


Referring back to the theory of change in Figure 3.2 (earlier in this chapter), one can think of impacts as the long-term goals the intervention is ultimately striving to achieve, and of targets as a set of sequential and feasible levels of the indicators (against the baselines) that we hope to achieve along the way, within a specified and realistic (political and budgetary) time frame. Stated differently, if we reach our sequential set of targets over time, we will reach our outcome (provided we have a good theory of change and successful implementation). When setting targets for indicators, it is important to have a clear understanding of:
• the exact baseline starting point (e.g., the average of the last three years, last year, or the average trend)
• the theory of change and how to disaggregate it into a set of time-bound achievements
• the levels of funding and personnel resources over the time frame for the target
• the amount of outside resources expected to supplement the program's current resources
• relevant political concerns
• organizational (especially managerial) experience in delivering projects and programs in this substantive area.

Figure 3.6 shows how to identify the expected or desired level of project, program, or policy results to be achieved, as one step along a chain of performance targets that will eventually lead, over time, to the outcome.

Baseline indicator level + desired level of improvement (assuming a finite and expected level of inputs, activities, and outputs) = target performance (the desired level of performance to be reached within a specific time).

Fig. 3.6: Identifying the Expected or Desired Level of Improvement Requires Selecting Performance Targets
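As a small worked illustration of Figure 3.6 (a sketch only; the function is ours, not the authors'), the urban pre-school enrolment indicator from Tables 3.2c and 3.2d starts from a baseline of 75 percent, and a desired improvement of 10 percentage points over the target period yields the 85 percent target:

def performance_target(baseline: float, desired_improvement: float) -> float:
    """Baseline indicator level + desired level of improvement = target performance,
    assuming a finite and expected level of inputs, activities, and outputs."""
    return baseline + desired_improvement

# Urban pre-school enrolment: 75% in 1999, desired gain of 10 percentage points by 2006
target = performance_target(baseline=75.0, desired_improvement=10.0)
print(f"Target: {target:.0f}% of eligible urban children enrolled by 2006")
# prints: Target: 85% of eligible urban children enrolled by 2006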

Be sure to set only one target for each indicator. If the indicator has never been used before, be cautious about setting a specific target; set a range instead. Targets can be set for the intermediate term, but for no longer than three years; the important thing is to be realistic about how long it will take to achieve the target and about whether it is achievable at all. Most targets are set yearly, but some can be set quarterly and others for longer periods (again, not more than three years).


It takes time to observe the effects of improvements, so be realistic when setting targets. The following are two examples of targets related to development issues:
• Goal: economic well-being. Outcome target: by 2012, reduce the proportion of people living in extreme poverty by 20 percent against the baseline.
• Goal: social development. Outcome target: by 2012, increase the primary education enrolment rate in the Kyrgyz Republic by 30 percent.

Table 3.2d shows the final step in developing outcomes for education policy: setting performance targets.

Table 3.2d: Developing Outcomes for Education Policy (continued, showing performance targets)
Outcome 1: Nation's children have improved coverage with preschool programs.
• Indicator 1: % of eligible urban children enrolled in pre-school education. Baseline: 75% urban in 1999. Target: 85% urban by 2006.
• Indicator 2: % of eligible rural children enrolled in pre-school education. Baseline: 40% rural in 2000. Target: 60% by 2006.
Outcome 2: Primary school learning outcomes for children are improved.
• Indicator 1: % of Grade 6 students scoring 70% or better on standardized math and science tests. Baseline: in 2002, 45% scored 70% or better in math and 50% scored 70% or better in science. Target: 80% in math and 67% in science by 2006.
• Indicator 2: Mean % score of Grade 6 students on standardized math and science tests compared with baseline data. Baseline: in 2002, the mean score for Grade 6 students was 68% in math and 53% in science. Target: mean math test score of 78% and mean science test score of 65% in 2006.

This completed matrix now becomes the results framework. It defines the outcomes and provides a plan for determining whether the intervention has been successful in achieving them. The completed matrix includes the outcomes, indicators, baselines, and targets for the project, program, or policy.


The framework defines the design of a results-based M&E system that will, in turn, begin to provide information about whether interim targets are being achieved on the way to the longer-term outcome. The framework helps evaluators design the evaluation. It can also assist managers with budgeting, resource allocation, staffing, and so forth. Managers should consult the framework frequently to ensure that the intervention is moving towards the desired outcomes. Performance targeting is critical to reaching policy outcomes; a participatory, collaborative process that works from baseline indicator levels to desired levels of improvement over time is key to results-based M&E.

Step Six: Monitoring for Results

As mentioned before, a results-based monitoring system tracks both implementation (inputs, activities, outputs) and results (outcomes and impacts). Figure 3.7 shows these key types of monitoring and how each fits in with the model.

Fig. 3.7: Key Types of Monitoring
• Results monitoring covers the results levels of the model: impacts and outcomes.
• Implementation monitoring (means and strategies) covers the implementation levels: outputs, activities, and inputs.


Each outcome will have a number of indicators, and each indicator will have a target. To achieve those targets, a series of activities and strategies needs to be coordinated and managed. Figure 3.8 illustrates the relationship of outcomes to targets and shows how implementation monitoring links to results monitoring.

Fig. 3.8: Implementation Monitoring Links to Results Monitoring
An outcome (tracked through results monitoring) is broken down into several targets (Target 1, Target 2, Target 3). Each target is pursued through its own means and strategies, set out in multi-year and annual work plans, which are tracked through implementation monitoring.

Figure 3.9 links implementation monitoring to results monitoring, using the example of reducing child mortality.

Fig. 3.9: Linking Implementation Monitoring to Results Monitoring
• Impact: children's mortality reduced.
• Outcome: children's morbidity reduced.
• Target: reduce the incidence of childhood gastrointestinal disease by 20% over three years against the baseline.
• Means and strategies: improve cholera prevention programs; provide vitamin A supplements; use oral re-hydration therapy.


Working with partners is increasingly the norm for development work. This is illustrated in Figure 3.10. Notice the number of partners included at the lowest level of this hierarchy, each potentially contributing inputs, activities, and outputs as part of a strategy to achieve targets.

Fig. 3.10: Achieving Results through Partnership
The hierarchy runs from impact, to outcomes, to targets, to means and strategies. At the level of means and strategies, several partners (Partner 1, Partner 2, Partner 3) each contribute to achieving a given target.

A strong M&E system, like the program itself, must be supported through the use of management tools – a budget, staffing plans, and activity planning. Building an effective M&E system involves administrative and institutional tasks such as:
• establishing data collection, analysis, and reporting guidelines
• designating who will be responsible for which activities
• establishing means of quality control
• establishing timelines and costs
• working through the roles and responsibilities of the government, the other development partners, and civil society
• establishing guidelines on the transparency and dissemination of the information and analysis.

To be successful, every monitoring system needs the following:


• ownership
• management
• maintenance
• credibility.


Step Seven: The Role of Evaluations

Although this chapter has concentrated thus far on the development of a monitoring system, it is important to emphasize the role that evaluation plays in supplementing information on progress toward outcomes and impacts. Whereas monitoring will tell us what we are doing relative to indicators, targets, and outcomes, evaluation will tell us whether:
• we are doing the right things (strategy)
• we are doing things right (operations)
• there are better ways of doing it (learning).

Evaluation can address many important issues that go beyond a simple monitoring system. For example, the design of many interventions is based on certain causal assumptions about the problem or issue being addressed. Evaluation can confirm or challenge these causal assumptions using theory-based evaluation and logic models (see Chapter 4: Understanding the Evaluation Context and Program Theory of Change). Evaluation can also delve deeper into an exciting or troubling result or trend that emerges from the monitoring system (e.g., finding out why girls are dropping out of a village school years earlier than boys).


When should evaluation be used in addition to monitoring? Here are nine possibilities:
• any time there is an unexpected result or performance outlier that requires further investigation
• when resource or budget allocations are being made across projects, programs, or policies
• when a decision is being made about whether or not to expand a pilot
• when there is a long period with no improvement and it is not clear why
• when similar programs or policies are reporting divergent outcomes (or when indicators for the same outcome show divergent trends)
• when comparing alternative ways to get the same results
• when attempting to understand the side effects of interventions
• when learning about the merit, worth, and significance of what was done
• when looking carefully at costs vis-à-vis benefits.

If governments and organizations are going to rely on the information gathered through an M&E system, they must be able to depend on the quality and trustworthiness of that information. Poor, inaccurate, and biased information is of no use to anyone. As noted at the beginning of this chapter, in subsequent chapters we will devote our attention to the "E" in M&E. Here we make only a brief note of evaluation to emphasize its complementarity with a performance-based monitoring system.


Step Eight: Reporting Findings

Analysis and reporting of M&E findings is a crucial step in the process, because it determines what findings are reported to whom, in what format, and at what intervals. It requires thinking carefully about the demand for information at each level of the organization, the form in which that information will be most useful, and the stage(s) of the project or program at which findings should be reported. Analyzing and reporting data:
• gives information on the status of projects, programs, and policies
• provides clues to problems
• creates opportunities to consider improvements in implementation strategies (for projects, programs, or policies)
• provides important information over time on trends and directions
• helps confirm or challenge the theory of change behind the project, program, or policy.

Be sure to find out what the main decision points are at the project, program, and policy levels, so that it is clear when M&E findings will be most useful for decision makers.


When analyzing and presenting data, be sure to do the following:
• Compare indicator data with the baseline and targets, and provide this information in an easy-to-understand visual display.
• Compare current information with past data, and look for patterns and trends.
• Be careful about drawing sweeping conclusions based on small amounts of information. The more data points you have, the more certain you can be that trends are real.

Be sure to report all important results, whether positive or negative. A good M&E system should provide an early warning system to detect problems or inconsistencies, as well as being a vehicle for demonstrating the value of an intervention. Performance reports should include explanations for poor or disappointing outcomes and document any steps already under way to address them. Also, be sure to protect the messenger: do not punish people for delivering bad results. Uncomfortable findings can indicate new trends or alert managers to problems early on, giving them the time needed to solve those problems.

Table 3.3 shows an outcomes reporting format that compares actual results with targets.

Table 3.3: Outcomes Reporting Format

Outcome indicator | Baseline (%) | Current (x) (%) | Target (y) (%) | Difference (%)
Rates of hepatitis (N=6,000) | 30 | 35 | 20 | -15
Percentage of children with improved overall health status (N=9,000) | 20 | 20 | 24 | -4
Percentage of children who show 4 out of 5 positive scores on physical exams (N=3,500) | 50 | 65 | 65 | 0
Percentage of children with improved nutritional status (N=14,000) | 80 | 85 | 83 | +2

(Fictional data)
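For teams that keep indicator data in a spreadsheet or a simple script, the comparison in Table 3.3 can be generated automatically. This is a minimal sketch rather than a prescribed tool: the percentages are the fictional data from the table, while the sign convention (a negative difference flags a missed target, which requires knowing whether higher or lower values are desirable) and the helper function are assumptions made for illustration.

```python
# Produce a simple outcomes report comparing current values with targets.
indicators = [
    # (indicator, baseline %, current %, target %, higher_is_better)
    ("Rates of hepatitis (N=6,000)", 30, 35, 20, False),
    ("Children with improved overall health status (N=9,000)", 20, 20, 24, True),
    ("Children with 4 of 5 positive physical exam scores (N=3,500)", 50, 65, 65, True),
    ("Children with improved nutritional status (N=14,000)", 80, 85, 83, True),
]

def outcomes_report(rows):
    """Return one line per indicator; negative differences flag missed targets."""
    lines = []
    for name, baseline, current, target, higher_is_better in rows:
        difference = (current - target) if higher_is_better else (target - current)
        lines.append(f"{name}: baseline {baseline}%, current {current}%, "
                     f"target {target}%, difference {difference:+d}%")
    return lines

for line in outcomes_report(indicators):
    print(line)
```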

Data analysis and reporting are covered in considerable detail in later chapters.


Step Nine: Using Findings

(Figure: the ten-step model, highlighting Step 9, Using Findings.)

The crux of an M&E system is not simply generating results-based information, but getting that information to the appropriate users in a timely fashion so that they can take it into account (as they choose) in managing projects, programs, or policies. Development partners and civil society have important roles in using the information to strengthen accountability, transparency, and resource allocation procedures.

Ten Uses of Results Findings
1. Responds to elected officials' and the public's demands for accountability
2. Helps formulate and justify budget requests (we want to avoid funding failure)
3. Helps in making operational resource allocation decisions
4. Triggers in-depth examinations of what performance problems (with the theory of change or implementation) exist and what corrections are needed
5. Helps motivate personnel to continue making program improvements
6. Monitors the performance of contractors and grantees (it is no longer enough for them to document how busy they are)
7. Provides data for special, in-depth program evaluations
8. Helps track service delivery against precise outcome targets (are we doing things right?)
9. Supports strategic and other long-term planning efforts (are we doing the right things?)
10. Communicates with the public to build public trust.


Some strategies for sharing information that can be implemented at any government level include:

• Empowering the media: The media can be an important partner in disseminating the findings generated by a results-based M&E system. They can also be helpful in exposing corruption and calling for better governance.
• Enacting "freedom of information" legislation: Freedom of information legislation is a powerful tool that can be used to share information with concerned stakeholders.
• Instituting e-government: E-government involves the use of information technology to provide better accessibility, outreach, information, and services. It allows stakeholders to interact directly with the government to obtain information and even transact business online.
• Posting information on internal and external Web sites: Information, including published performance findings, can be shared on internal (agency or government) and external Web sites. Many agencies are also developing searchable databases of M&E findings.
• Publishing annual budget reports: The best way to communicate how taxpayer money is being spent is to publish the budget. Citizens then have the opportunity to "compare" the quality and level of services being provided by the government and the priority the government gives to a given service or program.
• Engaging civil society and citizen groups: Engaging civil society and citizen groups encourages them to be more action oriented and accountable, and makes them more likely to agree on the information they need.
• Strengthening parliamentary oversight: Parliaments in many countries, both developed and developing, are asking for information about performance as part of their oversight function. They are looking to see that budgets are used effectively.
• Strengthening the office of the auditor general: Countries are also finding the office of the auditor general a key partner in determining how effectively the government is functioning. As audit agencies demand more information about how well the public sector is performing, we are starting to see better implementation of projects, programs, and policies.
• Sharing and comparing results findings with development partners: With the introduction of national poverty reduction strategies and other similar strategies and policies, development partners (especially bilateral and multilateral aid agencies) are sharing and comparing results and findings.

Understanding the utility of performance information for various users is a key reason for building an M & E system in the first place. Key potential users in many societies, such as citizens, NGO groups, and the private sector, are often left out of the information flow. The point is that monitoring and evaluation data have both internal (governmental) and external (societal) uses that need to be recognized and legitimated.


Step Ten: Sustaining the M&E System within the Organization

(Figure: the ten-step model, highlighting Step 10, Sustaining the M&E System within the Organization.)

Ensuring the longevity and utility of a results-based M&E system is a serious challenge. Six components are critical to constructing a sustainable system:

• demand
• clear roles and responsibilities
• trustworthy and credible information
• accountability
• capacity
• incentives.

Each of these components needs continued attention over time to ensure the viability of the system.

Demand. One way of building demand for M&E information is to build in a formal structure that requires regular reporting of performance results (e.g. legislation, regulation, or an annual reporting requirement for organizational units). Another useful strategy is to publicize the availability of this information through the media, thereby generating demand from government bodies, citizen groups, donors, and the general public. Third, if the organization makes a practice of translating strategy into specific goals and targets, those interested in its strategic direction will also be interested in monitoring and evaluation against those goals and targets.


Clear Roles and Responsibilities. One of the most important structural interventions for institutionalizing an M&E system is the creation of clear, formal lines of authority and responsibility for collecting, analyzing, and reporting performance information. Second, issue clear guidance on who is responsible for which components of the M&E system, and build this into their performance reviews. Third, build a system that links the central planning and finance functions with the line/sector functions to encourage a link between budget allocation cycles and the provision of M&E information, essentially a performance budgeting system. Finally, build a system in which there is demand for information at every level (i.e. there is no part of the system that information simply "passes through" without being used).

Trustworthy and Credible Information. The performance information system must be able to produce both good and bad news. Accordingly, the producers of this information will need protection from political reprisals. The information produced by the system should be transparent and subject to independent verification (e.g. a review by the national audit office or an independent group of university professors).

Accountability. Consider the external stakeholders who have an interest in performance information, and find ways to share transparent information with them. Key stakeholder groups to consider include civil society organizations, the media, the private sector, and the government.

Capacity. Undertaking a readiness assessment and focusing on organizational capacity were among the first things we considered in building an M&E system. Key elements to build on include sound technical skills in data collection and analysis, managerial skills in strategic goal setting and organizational development, existing data collection and retrieval systems, the ongoing availability of financial resources, and institutional experience with monitoring and evaluation.

Incentives. Incentives need to be introduced to encourage the use of performance information. This means that success is acknowledged and rewarded, problems are addressed, messengers are not punished, organizational learning is valued, and budget savings are shared. Corrupt or ineffective systems cannot be counted on to produce quality information and analysis.


Concluding Comments

There is no requirement that an M&E system be built according to these ten steps; one can develop strategies with more steps or with fewer. The challenge is to ensure that key functions and activities are recognized, clustered together in a logical manner, and then done in an appropriate sequence.

Governments and organizations can use results-based M&E systems as powerful management tools. Results-based M&E systems can help build and foster political and financial change in the way governments and organizations operate. They can also help build a solid knowledge base. A results-based M&E system should be iterative. To ensure the system remains viable and sustainable, it must receive continuous attention, resources, and political commitment. It takes time to build the cultural shift to a results orientation, but the effort is worth it and the rewards can be many.

Last Reminders




• The demand for capacity building never ends! The only way an organization can coast is downhill.
• Keep your champions on your side and help them.
• Establish the understanding with the Ministry of Finance and Parliament that an M&E system needs sustained resources, just as does the budget system.
• Look for every opportunity to link results information to budget and resource allocation decisions.
• Begin with pilot efforts to demonstrate effective results-based monitoring and evaluation.
• Begin with an enclave strategy (e.g. islands of innovation) as opposed to a whole-of-government approach.
• Monitor both implementation progress and results achievements.
• Complement performance monitoring with evaluations to ensure better understanding of public sector results.


Going Forward

Once the framework for an evaluation is developed (Step 5), it can be used to construct the theory of change, choose an approach, begin writing questions, and choose a design for the evaluation. These topics are covered in Chapters 4, 5, 6, and 7.

Summary

A results-based monitoring and evaluation system can be a valuable tool to assist policy makers and decision makers in tracking the outcomes and impacts of projects, programs, and policies. Unlike traditional evaluation, results-based monitoring and evaluation moves beyond an emphasis on inputs and outputs to focus on outcomes and impacts. It is the key architecture for any performance measurement system. Results-based monitoring and evaluation systems:

• use baseline data to describe a problem before an initiative begins
• track indicators for the outcomes to be achieved
• collect data on inputs, activities, and outputs and their contributions to achieving outcomes
• assess the robustness and appropriateness of the deployed theory of change
• report systematically to stakeholders
• are done with strategic partners
• capture information on the success or failure of the partnership strategy in achieving intended results
• constantly strive to provide credible and useful information as a management tool.


The following ten steps are recommended to design and build a results-based monitoring and evaluation system:

1. Conducting a Readiness Assessment
2. Agreeing on Performance Outcomes to Monitor and Evaluate
3. Selecting Key Indicators to Monitor Outcomes
4. Baseline Data on Indicators – Where Are We Today?
5. Planning for Improvement – Setting Realistic Targets
6. Building a Monitoring System
7. The Role of Evaluations
8. Reporting Your Findings
9. Using Your Findings
10. Sustaining the M&E System within Your Organization

Results-based M&E has become a global phenomenon as national and international stakeholders in the development process seek greater accountability, transparency, and results from governments and organizations. Building and sustaining a results-based M&E system is not easy. It requires continuous commitment, champions, time, effort, and resources. There may be organizational, technical, and political challenges, and the original system may need several revisions to tailor it to the needs of the organization. But it is possible, and it is worth the effort.


Chapter 3 Activities

Application Exercise 3.1: Get the Logic Right

Instructions: How ready is your organization to design and implement a results-based monitoring and evaluation system? Rate your organization on each of the following dimensions, giving comments to explain your rating. Discuss with a colleague any barriers to implementation, and how they might be addressed.

1. Incentives (circle one rating): plenty of incentives / a few incentives / several disincentives

Comments:

Strategies for improvement:

2. Roles and Responsibilities (circle one rating): very clear / somewhat clear / quite unclear

Comments:

Strategies for improvement:


3. Organizational Capacity (circle one rating): excellent / adequate / weak

Comments:

Strategies for improvement:

4. Barriers (circle one rating): no serious barriers / very few barriers / serious barriers

Comments:

Strategies for improvement:


Application Exercise 3.2: Identifying Inputs, Activities, Outputs, Outcomes, and Impacts

Instructions: For each item in the list below, identify whether it is an input, an activity, an output, an outcome, or a long-term impact. If possible, discuss with a colleague and explain the basis on which you categorized each one.

• Women-owned micro-enterprises are significantly contributing to poverty reduction in the communities where they are operating
• Government makes available funds for micro-enterprise loans
• Government approves 61 applications from program graduates
• Course trainers identified
• 72 women complete training
• Income of graduates increases 25% in first year after course completion
• 100 women attend training in micro-enterprise business management
• Information provided to communities on availability of micro-enterprise program loans


Application Exercise 3.3: Developing Indicators

Instructions: Identify a program or policy with which you are familiar. What is the main impact the program is trying to achieve? What are two outcomes you would expect to see if the intervention were on track to achieve that impact?

Impact: _________________________________

Outcome 1: ______________________________________________

Outcome 2: ______________________________________________

Starting with the outcomes, identify two or three indicators you would use to track progress against each of the above. (In actual practice, the number of indicators should be at least two and no more than seven.)

Outcome 1: ____________________________________________
Indicator (a): _______________________________
Indicator (b): _______________________________
Indicator (c): _______________________________

Outcome 2: ____________________________________________
Indicator (a): _______________________________
Indicator (b): _______________________________
Indicator (c): _______________________________

Impact: ____________________________________________
Indicator (a): _______________________________
Indicator (b): _______________________________
Indicator (c): _______________________________


References and Further Reading:

Boyle, R. and D. Lemaire, editors (1999). Building effective evaluation capacity. New Brunswick, NJ: Transaction Books.

IFAD (International Fund for Agricultural Development) (2002). "A guide for project M&E: Managing for impact in rural development." Rome: IFAD. Retrieved April 1, 2008 from http://www.ifad.org/evaluation/guide/

Furubo, Jan-Eric, Ray C. Rist, and Rolf Sandahl, editors (2002). International atlas of evaluation. New Brunswick, NJ: Transaction Books.

Khan, M. Adil (2001). A guidebook on results based monitoring and evaluation: Key concepts, issues and applications. Monitoring and Progress Review Division, Ministry of Plan Implementation, Government of Sri Lanka, Colombo, Sri Lanka.

Kusek, Jody Z. and Ray C. Rist (2004). Ten steps to building a results-based monitoring and evaluation system. Washington, D.C.: The World Bank.

Kusek, Jody Z. and Ray C. Rist (2003). "Readiness assessment: Toward performance monitoring and evaluation in the Kyrgyz Republic." Japanese Journal of Evaluation Studies 3(1): 17-31.

Kusek, Jody Zall and Ray C. Rist (2001). "Building a performance-based monitoring and evaluation system: The challenges facing developing countries." Evaluation Journal of Australasia 1(2): 14-23.

Malik, Khalid and Christine Roth, editors (1999). Evaluation capacity development in Asia. New York: United Nations Development Programme Evaluation Office.

Osborne, David and Ted Gaebler (1992). Reinventing government. Boston, MA: Addison-Wesley Publishing.

Organization for Economic Co-operation and Development, Development Co-operation Directorate (2002). Glossary of key terms in evaluation and results-based management. OECD/DAC.

Schiavo-Campo, Salvatore (1999). "'Performance' in the public sector." Asian Journal of Political Science 7(2): 75-87.

UNFPA (United Nations Population Fund) (2002). "Monitoring and evaluation toolkit for program managers." Office of Oversight and Evaluation. Retrieved April 1, 2008 from http://www.unfpa.org/monitoring/toolkit.htm


Valadez, Joseph and Michael Bamberger (1994). Monitoring and evaluating social programs in developing countries: A handbook for policymakers, managers, and researchers. Washington, D.C.: The World Bank.

Wholey, Joseph S., Harry Hatry, and Kathryn Newcomer (2001). "Managing for results: Roles for evaluators in a new management era." 22(3): 343-347.

The World Bank (1997). World Development Report: The State in a Changing World. Washington, D.C.

Web Sites:

Calvani, Sandro and Sanong Chinnanon (2003). A manual on monitoring and evaluation for alternative development projects. Regional Centre for East Asia and the Pacific. http://www.unodc.org/pdf/Alternative%20Development/Manual_MonitoringEval.pdf

International Development Research Centre (2004). Evaluation planning in program initiatives. Ottawa, Canada. http://web.idrc.ca/uploads/userS/108549984812guideline-web.pdf

MandE News. http://www.mande.co.uk/

Specialist monitoring and evaluation Web sites. http://www.mande.co.uk/specialist.htm

W.K. Kellogg Foundation (1998). W.K. Kellogg Evaluation Handbook. http://www.wkkf.org/Pubs/Tools/Evaluation/Pub770.pdf

Uganda Communications Commission (2005). "Chapter 8: Monitoring and evaluation," in Funding and implementing universal access: Innovation and experience from Uganda. International Development Research Centre (IDRC). http://www.idrc.ca/en/ev-88227-201-1-DO_TOPIC.html

Upgrading Urban Communities – Resource Framework (2001). Tools: Capturing experience monitoring and evaluation. The World Bank Group. http://web.mit.edu/urbanupgrading/upgrading/issuestools/tools/monitoring-eval.html#Anchor-Monitoring-56567

Vernooy, Ronnie, Sun Qiu, and Xu Jianchu (2003). Voices for change: Participatory monitoring and evaluation in China. IDRC. http://www.idrc.ca/en/ev-26686-201-1-DO_TOPIC.html


The World Bank (2008). Data & Statistics: Rural Development Indicators. Washington, D.C. http://web.worldbank.org/WBSITE/EXTERNAL/DATASTATISTICS/0,,contentMDK:21725423~pagePK:64133150~piPK:64133175~theSitePK:239419,00.html

The World Bank (2008). Online Atlas of the Millennium Development Goals: Building a Better World. http://devdata.worldbank.org/atlas-mdg/

The World Bank. Core Welfare Indicators Questionnaire. http://www4.worldbank.org/afr/stats/cwiq.cfm

International Fund for Agricultural Development (IFAD). Practical guide on monitoring and evaluation of rural development projects. http://www.ifad.org/evaluation/oe/process/guide/index.htm

World Bank Operations Evaluation Department. Monitoring and evaluation capacity development in Africa. http://lnweb18.worldbank.org/oed/oeddoclib.nsf/View+to+Link+WebPages/34B9BADE34ACA617852567FC00576017/$File/183precis.pdf?OpenElement


Chapter 4

Understanding the Evaluation Context and Program Theory of Change

Introduction

This chapter is the first of two chapters discussing evaluation planning. It begins the discussion on designing an evaluation and emphasizes understanding the context of the evaluation. Chapter 12: Managing for Quality and Use discusses work plan techniques for implementing the design. This chapter is about the beginning, that is, the front end of an evaluation, or how to start. An evaluation that begins with a well-planned design is more likely to finish on time and on budget and to meet the needs of the main client and other stakeholders. A front-end analysis investigates and identifies lessons from the past, confirms or casts doubt on the theory behind the program, and sets out the context influencing the program. This chapter has four parts:

• Front-end Analysis
• Identifying the Main Client and Key Stakeholders
• Understanding the Context
• Constructing, Using, and Assessing a Theory of Change.


Part I: Front-end Analysis

Where to begin? If one wants to get to the correct destination, it is best to begin by finding out what direction to go and what others have already learned about the path to that destination. One will want to collect critical information for decisions about time frames, costs, hazards, and processes (US GAO, 1991, p. 6). To reach this understanding, the evaluator conducts a front-end analysis: an investigation of an issue or problem to determine what is known about it and how to proceed in developing an evaluative approach to it. In simple words, it is what we do to figure out what to do. In a front-end analysis, the evaluator investigates:

• Who is the main client for the evaluation? Who are other important stakeholders?
• How will the timing of the evaluation, in relation to project, program, or policy implementation, affect the evaluation?
• How much time is available to complete the evaluation?
• What is the nature and extent of available resources?
• Does social science theory have relevance for the evaluation?
• What have been the findings of similar evaluations?
• What is the theory of change behind the project, program, or policy?
• What existing data can be used for this evaluation?

Many evaluators are impatient to finish the evaluation planning and rush into data collection, trying to do exploratory work at the same time as data collection. But completing a good front-end analysis is critical to learning about an intervention. It can save time and money on the evaluation, ensure that the evaluation meets client needs, and sustain or build relationships, not only with the client but also with key stakeholders. Most importantly, a good front-end analysis can make sure the evaluation addresses the right questions and gathers the information that is needed, rather than collecting data that may never be used.


At the beginning of an evaluation, many evaluators make assumptions, some of which may be correct and some not. For example, they may assume that there is a rich data infrastructure when, in fact, little data is available. Or they might assume that experienced consultants with extensive country knowledge will assist them, only to find that the people they counted on are busy with other projects. Determining whether a joint evaluation is appropriate and possible should also be done at this front-end stage. If there is interest and it is appropriate, the partners will need to determine what roles each will take on. They will also need to agree on the timing of the evaluation.

Balancing Potential Costs and Benefits of the Evaluation

Expected costs and benefits of the evaluation, and how to balance them, should be on the agenda during front-end planning. Examples of benefits are diverse, including:

• 'strong knowledge'
• intended use by the main client, such as a decision to replicate or scale up
• a reauthorization or replenishment decision, versus a routine mid-point evaluation
• potential for learning what works in an innovative intervention
• context specification of what works
• building local capacity through a participatory evaluation
• answers to questions of concern to stakeholders.

Costs of evaluations, however, are important too. Think of:

• the cost of the evaluation in relation to the cost of the program. They should be in line; spending US$50,000 to evaluate a US$25,000 program does not make sense
• costs in terms of the social burden on program officials and evaluands ('respondents') as a consequence of participating in the study
• reputation costs to the evaluator and the evaluation community if results are likely to be disputed because the evaluation is of a highly political, controversial program or the time available is insufficient to do a comprehensive evaluation.


Pitfalls Involved in the Front-End Planning Process

There are also potential pitfalls in the front-end planning process. They include:




• The belief that everything can be covered up front, including the belief that if front-end planning has taken place, everything will be okay.
• The "to-do-ism" fixation: the danger that the evaluator becomes fixed on the original planning document and can only do what has already been planned, with resistance to modifying the plan (Leeuw, 2003).
• "McDonaldization": Ritzer (1993, p. 1) coined this concept, defining it as "the process by which the principles of the fast-food restaurant are coming to dominate more and more sectors of American society as well as of the rest of the world." The phrase is particularly applicable when checklists, to-do lists, and frameworks 'take over' and replace reflective thinking.
• Truisms that pop up while doing the front-end planning. For example: "Randomized experiments... no way: too complicated, too expensive, too conservative." Or: "A logical framework analysis is the least we can do and therefore we should do it."
• Group think. Front-end planning does not prevent people from agreeing with the group position in order to remain part of the group, or because criticism is not encouraged.
• Power matters. While it is important to consider the positions of the participants, automatically weighting the value of suggestions by power positions can be misleading.


Part II: Identifying the Main Client and Key Stakeholders

An important part of the front-end analysis is identifying the main client and the key stakeholders of the project, program, or policy. There may be several clients, but one of them will be the main client. And separating key stakeholders from all stakeholders may not be straightforward.

The Main Client

When is a stakeholder a client? Are clients also stakeholders of the evaluation? Are all clients stakeholders of the program? Often there is one key stakeholder sponsoring or requesting the evaluation. This stakeholder is the main client for the evaluation, and its needs will have great influence on the evaluation. The main client may be:

• the entity authorizing and funding the program
• the entity authorizing and funding the evaluation
• the entity accountable to the public for the intervention
• the entity to which the evaluators are accountable.

But there will be one main client. It is important to meet with the main client early on to help identify issues for the evaluation from the main client's perspective. During this meeting, it is important to learn of any critical timing needs of the main client and the client's intended use of the evaluation. The evaluator, after first listening and probing the evaluation's focus, can summarize, prepare written notes, and provide the client with broad options for ways the evaluation could be approached.


Stakeholders

Stakeholders are all those people or representatives of organizations that have a "stake" in the intervention. Typically, they are those who are affected by an intervention, either during its lifetime or in subsequent years. It is important to include as stakeholders those who would typically not be asked to participate in an evaluation. Stakeholders can include:

• Participants: those who participate or have participated in the intervention
• Direct beneficiaries: those who directly and currently benefit from the intervention
• Indirect beneficiaries: those who are not recipients of the intervention but who benefit from others who are beneficiaries; for example, employers benefit from educational programs because they are able to hire better-trained people
• Development organizations that provide funding
• Government officials, elected officials, and government employees with a relevant interest, such as planners and public health nurses
• Program directors, staff, board members, managers, and volunteers
• Policy makers
• Community and interest groups or associations, including those that might have a different agenda from the program officials.

Table 4.1, adapted from Fitzpatrick, Sanders, and Worthen (2004, p. 202), provides a means of determining the centrality of different stakeholders to the program and its evaluation.


Table 4.1: Checklist of Stakeholder Roles

For each individual, group, or agency, check the role(s) it plays: to make policy, to make operational decisions, to provide input to the evaluation, to react, or for interest only.

Individuals, groups, or agencies:
• Developer of the program
• Funder of the program
• Person/agency who identified the local need
• Boards/agencies who approved delivery of the program at the local level
• Local funder
• Other providers of resources (facilities, supplies, in-kind contributions)
• Top managers of agencies delivering the program
• Program managers
• Program directors
• Sponsor of the evaluation
• Direct clients of the program
• Indirect beneficiaries of the program (parents, children, spouses, employers)
• Potential adopters of the program
• Groups excluded from the program
• Groups perceiving negative side effects of the program or the evaluation
• Groups losing power as a result of use of the program
• Groups suffering from lost opportunities as a result of the program
• Public/community members
• Others


Identifying and Involving Key Stakeholders

The challenge is to identify the key stakeholders. This can be done by looking at documents about the intervention and talking with the main evaluation client, program sponsors, program staff, local officials, and program participants. Stakeholders can be interviewed initially or brought together in small groups. In contacting stakeholders about the evaluation, the evaluation planner must be clear about what is needed from each stakeholder. Is it just to make a specific stakeholder aware of the upcoming evaluation? Is it to get the stakeholders to identify issues they have about the project, program, or policy? For some evaluations, key stakeholder meetings are held periodically, or an even more formal structure is established; the evaluation manager may set up an advisory or steering committee.

By engaging key stakeholders early on, the evaluator will have a better understanding of the intervention, what it was meant to accomplish, and the issues and challenges it faced in doing so. The evaluation team will be better informed about possible key issues for the evaluation and what information will be needed, when, and from whom. Meeting with key stakeholders helps ensure that the evaluation does not miss major critical issues. It also helps get "buy-in" on the evaluation. If the evaluator indicates that the questions stakeholders would like answered will be carefully considered, stakeholders are far more likely to be supportive of and interested in the evaluation.

The extent to which stakeholders are actively involved in the design and implementation of the evaluation depends on several factors, such as resources and relationships. For example, stakeholders may not be able to afford to take time away from their regular duties, or there may be political reasons why the evaluation needs to be seen as independent. This text takes the position that increasing utilization of an evaluation is a process that begins with a meeting with the main client and with engaging key stakeholders in the evaluation design; it is not something that happens only with the final evaluation report and its dissemination.


Stakeholder Analysis

On their Web site "A Guide to Managing for Quality," Management Sciences for Health (MSH) and the United Nations Children's Fund (UNICEF) (1998) lay out stakeholder analysis as a technique for identifying and assessing the importance of the key people, groups, or institutions that may significantly influence the success of an evaluation. They suggest the following reasons for doing a stakeholder analysis:

• identify people, groups, and institutions that can influence the evaluation (either positively or negatively)
• anticipate the kind of influence, positive or negative, these groups will have on the evaluation
• develop strategies to get the most effective support possible for the initiative and reduce obstacles to successful implementation of the evaluation.


How to Do a Stakeholder Analysis

One procedure for a stakeholder analysis follows:

1. Make a matrix with four columns, with the following headings:
   • Stakeholder
   • Stakeholder interest(s) in the project, program, or policy
   • Assessment of the potential impact of (a) the evaluation on the stakeholder and (b) the stakeholder on the evaluation
   • Potential strategies for obtaining support or reducing obstacles

2. Brainstorm to identify all the people, groups, and institutions that will affect or be affected by the intervention.

3. List the people, groups, and institutions in the "Stakeholder" column.

4. Once you have a list of all potential stakeholders, review the list and identify the specific interests these stakeholders have in the evaluation. Consider issues like:
   • the potential evaluation benefit(s) to the stakeholder
   • the changes that the evaluation might require the stakeholder to make
   • the project activities that might cause damage or conflict for the stakeholder.
   Record these under the column "Stakeholder Interest(s) in the Project."

5. Again, review each stakeholder listed. This time, ask: How important are the stakeholder's interests to the success of the evaluation? Consider:
   • the role the key stakeholder must play for the evaluation to be successful, and the likelihood that the stakeholder will play this role
   • the likelihood and impact of a stakeholder's negative response to the evaluation.
   Record your assessment under the column "Assessment of Impact" for each stakeholder, assigning an "A" for extremely important, a "B" for fairly important, and a "C" for not very important.

6. Finally, consider the kinds of things you could do to gain stakeholder support and reduce opposition. Consider how you might approach each of the stakeholders:
   • What kinds of issues will they want the evaluation to address?
   • How important is it to involve the stakeholder in the planning process?
   • Are there other groups or individuals that might influence the stakeholder to support your evaluation?
   Record your strategies for obtaining support or reducing obstacles in the last column of the matrix.

The Road to Results: Designing and Conducting Effective Development Evaluations

It is important to distinguish key stakeholders from others. Efforts to involve those on the periphery may only result in irritation at why their time and participation are being solicited. A stakeholder analysis helps identify the key stakeholders and the contributions of other stakeholders. Table 4.2 shows an example of a stakeholder analysis matrix before information is entered.

Table 4.2: Example of a Stakeholder Analysis Matrix

Columns of the matrix:
• Stakeholder
• Stakeholder interest(s) in the project, program, or policy
• Assessment of the potential impact of (a) the evaluation on the stakeholder and (b) the stakeholder on the evaluation
• Potential strategies for obtaining support or reducing obstacles
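Teams that prefer to keep the matrix outside a word processor can represent it with a simple data structure. The sketch below is illustrative only: the two stakeholder entries and the key_stakeholders() helper are hypothetical and not taken from the handbook; only the four columns and the A/B/C rating scheme come from the procedure above.

```python
# A stakeholder analysis matrix as a list of records, one per stakeholder.
stakeholder_matrix = [
    {
        "stakeholder": "Ministry of Health",  # hypothetical example
        "interests": "Wants evidence that the program justifies its budget",
        "impact_assessment": "A",  # A = extremely important, B = fairly, C = not very
        "strategies": "Brief the minister's office early; invite comment on the evaluation questions",
    },
    {
        "stakeholder": "Community health volunteers",  # hypothetical example
        "interests": "Concerned about the extra reporting burden the evaluation may add",
        "impact_assessment": "B",
        "strategies": "Keep data collection short; share findings in the local language",
    },
]

def key_stakeholders(matrix, ratings=("A",)):
    """Return the rows rated most important to the evaluation's success."""
    return [row for row in matrix if row["impact_assessment"] in ratings]

for row in key_stakeholders(stakeholder_matrix):
    print(row["stakeholder"], "->", row["strategies"])
```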

Normally, a stakeholder analysis will show the diversity of the stakeholders. Stakeholders will have different interests, and in some cases their interests may be contradictory. In these instances the evaluator needs to determine which groups are most directly affected by the evaluation. Sometimes evaluators are asked to directly involve one or more stakeholders not only in planning but also in conducting the evaluation. Mikkelsen (2005, pp. 283-285) discusses the difficulty of this situation. In these participatory evaluation situations, the evaluator needs to ensure that stakeholders are involved in:

• formulating the terms of reference
• selecting the evaluation team
• analyzing data
• formulating conclusions and recommendations.

This text returns to participatory evaluations in Chapter 12: Considering the Evaluation Approach.


Stakeholders: Diverse Perspectives

Stakeholders approach the intervention from different perspectives, and this is a good thing. It helps to understand that initial discussions may reflect those perspectives. A donor may be concerned that the money is spent appropriately and that the intervention is effective. A program manager may be concerned that the intervention is well managed and is generating lessons learned. Program participants may want to get more or better services. Policy makers may be most concerned with whether the intervention is having its intended impact. Some in the community may want to replicate or expand the intervention, while others may want to limit what they perceive to be its negative consequences.

Disagreement is a normal part of the process of people working together. People who feel passionately often have somewhat different visions of how the world is and should be. As a facilitator, the evaluation manager or evaluator should help the group set ground rules about disagreement that make sense within the cultural context. It is essential that disagreements about issues and ideas be brought into the open, discussed, and resolved in a way that everyone feels is fair. Working with stakeholders with diverse perspectives is now a key component of all evaluation work, whether participatory or not.

Part III: Understanding the Context

A front-end analysis also investigates the relationship between program stages and the broad evaluation purpose. The life of a project, program, or policy can be thought of as a developmental progression in which different evaluation questions are asked at different stages. Pancer & Westhues (1989) present such a typology, shown in Table 4.3. These are only examples of questions; there are many potential questions for each stage.


Table 4.3: Typology of the Life of a Project, Program, or Policy

Stage of program development | Evaluation question to be asked
1. Assessment of social problem and needs | To what extent are community needs and standards met?
2. Determination of goals | What must be done to meet those needs and standards?
3. Design of program alternatives | What services could be used to produce the desired changes?
4. Selection of alternative | Which of the possible program approaches is most robust?
5. Program implementation | How should the program be put into operation?
6. Program operation | Is the program operating as planned?
7. Program outcomes/effects/impact | Is the program having the desired effects?
8. Program efficiency | Are program effects attained at a reasonable cost?

Source: S. Mark Pancer and Anne Westhues (1989). "A developmental stage approach to program planning and evaluation." Evaluation Review 13(1): 56-77. (Adapted by Rossi, Freeman, and Lipsey [1999].)

A further step in a front-end analysis is to determine the policy context. Research can help determine what evaluations have been conducted on other programs in similar contexts. The evaluator begins by reviewing the issues addressed, the type of design selected by others, related instruments used, and findings. If the evaluation is for a completely new, innovative intervention, then the evaluation may need to be designed without roadmaps from previous evaluations; however, this is rarely the case.


The same is true of the level of complexity of the programs and policies involved. For example, there is now programmatic lending to developing countries, which gives them broad discretion compared with the specificity of financing projects. According to Pawson (2006): "it looks like an important change in public policy in recent years has been the rise of complex, multiobjective, multi-site, multi-agency, multi-subject programs. The reasons are clear. The roots of social problems intertwine. A health deficit may have origins in educational disadvantage, labor market inequality, environmental disparities, housing exclusion, differential patterns of crime victimization, and so on. Decision makers have, accordingly, begun to ponder whether single-measure, single-issue interventions might be treating just the symptoms."

Pawson (2006) outlines the approach to follow when dealing with what he calls a new breed of "super interventions." This approach highlights a focus on:

• the underlying program theory
• using existing evidence through research synthesis
• interpreting a complex program as intervention chains, with one set of stakeholders providing resources (material, social, cognitive, or emotional) to other sets of stakeholders, in the expectation (or 'theory') that behavioral change will follow. The success of the intervention is thus a matter of the integrity of the sequence of program theories and, in particular, of how different stakeholders choose to respond to them (Pawson, 2006).

Existing Theoretical and Empirical Knowledge of the Project, Program, or Policy

The next step in a front-end analysis is to investigate the existing theoretical and empirical knowledge about the project, program, or policy. This is also known as tapping the "knowledge fund."


The knowledge coming from evaluations and other social science research, including economic studies, increases every day. Journals often contain articles synthesizing the accumulated explanatory knowledge on a specific topic, such as the effect of class size on learning, or of nutritional programs for expectant mothers on infant birth weights. Problem-oriented research into how organizations function combines theories and research from such diverse disciplines as organizational sociology, cognitive psychology, public choice, and law and economics (Scott, 2003; Swedberg, 2003). Organizations such as the Campbell Collaboration review the quality of evaluations on a given topic and synthesize those that meet their criteria to determine findings. Repositories of randomized experiments in the fields of criminal justice and crime prevention, social welfare programs, and health and educational programs indicate that more than 10,000 such "experiments" have been done (Petrosino, 2003). See the following example from the field of criminal justice programs.

Example of a knowledge fund: See Sherman, L. W. et al., editors, Evidence-based crime prevention, London: Routledge, 2002, with a concluding chapter on:

• what works
• what does not work
• what is promising.

It appeared that 29 programs worked, 25 did not, 28 were promising, and for 68 programs it was unknown what they 'did'. More than 600 evaluations were synthesized, covering such diverse fields as:

• school and family based crime prevention
• reducing burglary programs
• drug arrests
• policing/hot spots
• CCTV (closed-circuit TV) initiatives
• neighborhood wardens
• mentoring
• types of (prison) sanctions/corrections (anger management, training programs, cognitive programs focused on reducing recidivism, boot camps, etc.)

Therefore, it is crucial that when organizing and planning for an evaluation, attention is paid to identifying and reviewing the “Knowledge Fund”.


Part IV: Constructing, Using, and Assessing a Theory of Change

The last part of a front-end analysis discussed here is constructing a theory of change and how to use and assess it. The underlying logic, or theory of change, of a program is an important topic for evaluation, whether during the ex ante or the ex post stage of a study. This section addresses why to use a theory of change, how to construct one, and how to assess one.

One definition states that a theory of change is "an innovative tool to design and evaluate social change initiatives," a kind of "blueprint of the building blocks" needed to achieve a social change initiative's long-term goals (ActKnowledge and Aspen Institute Roundtable on Community Change, 2003, What is Theory of Change and why should I care?). Another way to look at a theory of change is as a representation of how the organization or initiative is expected to lead to results, together with an identification of the underlying assumptions being made. In this textbook, we also take the position that a theory of change must:




• depict the sequence of inputs the project, program, or policy will use, the activities the inputs will support, the outputs toward which the project, program, or policy is budgeting (from a single activity or a combination of activities), and the results anticipated in terms of outcomes and impacts
• depict other events or conditions in the project, program, or policy context that might affect attainment of the outcomes
• identify the assumptions the program is making about how things will work in terms of cause and effect
• identify the critical assumptions that, based on the policy and environmental context and a review of the literature, need to be examined by the evaluation.


When an evaluation is being planned, attention must be paid to how the theory of change underlying the program will be constructed and tested. Visuals should be used to give an overview of the key components of the project, program, or policy and their interactions. The visual shows the cause-and-effect relationships: generally a diagram of the links in a chain of reasoning about "what causes what," in relation to the desired impact or goal. The desired impact or goal is usually shown as the last link in the model.

The value of a theory of change is that it visually conveys beliefs about why the project, program, or policy is likely to succeed in reaching its objectives. The theory of change also specifies the components of a program and their relationships to each other. Resources are provided to enable an organization to engage in activities in order to achieve specific objectives; the resources, activities, outputs, intended outcomes, and impacts are interrelated.

In some cases, an evaluator may find that a project, program, or policy already has a theory of change. If so, the evaluator needs to review it carefully. In many cases it will be necessary to refine or rework the existing theory of change and confirm it with those directly involved. If no theory of change exists, the evaluator should create one and validate it with the program manager and staff, if possible. With any theory of change, the assumptions being made must also be identified, along with the most critical of these assumptions that the evaluation should test, based on the prevailing political and policy environment as well as a literature review. Theories of change open the "black box": how an intervention expects to take inputs, conduct activities, and produce outputs, and then convert these into results, as shown in the simple model in Figure 4.1.

Fig. 4.1: Systems Model. (Inputs → Activities → Outputs → Outcomes → Impacts, operating within an environment; the outcomes and impacts are the results, and the conversion of outputs into results takes place in the "black box.")
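The chain in Figure 4.1 can also be written down as a simple data structure, which some teams find useful for keeping the theory of change and its assumptions in one place. This is a sketch under stated assumptions: the TheoryOfChange class and the example entries (drawn loosely from the home-visit example later in this chapter) are illustrative, not part of the model itself.

```python
# A results chain with its assumptions, mirroring the systems model categories.
from dataclasses import dataclass, field
from typing import List

@dataclass
class TheoryOfChange:
    inputs: List[str] = field(default_factory=list)
    activities: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    outcomes: List[str] = field(default_factory=list)
    impacts: List[str] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)  # what the "black box" relies on

    def chain(self) -> str:
        """Render the causal chain from inputs to impacts."""
        stages = [self.inputs, self.activities, self.outputs, self.outcomes, self.impacts]
        return " -> ".join(", ".join(stage) for stage in stages)

toc = TheoryOfChange(
    inputs=["teacher time", "transport budget"],
    activities=["teachers visit students' homes"],
    outputs=["number of home visits completed"],
    outcomes=["improved student attendance and engagement"],
    impacts=["higher student achievement"],
    assumptions=["parents are home and willing to meet teachers"],
)
print(toc.chain())
print("Key assumptions:", "; ".join(toc.assumptions))
```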


It is important to identify what is happening in the broader context: the environment in which the program operates, leading to the result. The environment can influence the inputs, the activities, and the black box itself. The diagram in Figure 4.2 gives a fuller sense of the relationship between the context discussed earlier and its potential influence on the program's results.

Fig. 4.2: Potential Influences on the Program Results. (The inputs → activities → outputs → outcomes → impacts chain operates within an environment shaped by the political environment and governance, the policy context, the macroeconomic picture, public attitudes, and aid players.)

Behind every project, program, and policy there is a theory of change, often waiting to be articulated. The theory of change may be visually represented in different ways, using different formats or models. Sometimes these are called theory models, logic models, or outcome models; each is a variation on a theme depicting the theory of change. The graphic illustration of the theory of change may also be called a change framework (ActKnowledge and the Aspen Institute Roundtable on Community Change, 2003, Glossary). All theory of change depictions should lay out a causal chain, show influences, and identify key assumptions.

Why Use a Theory of Change

A theory of change is valuable to both evaluators and stakeholders because it requires them to work together to build "a commonly understood vision of the long-term goals, how they will be reached, and what will be used to measure progress along the way" (ActKnowledge and the Aspen Institute Roundtable on Community Change, 2003, Basics, Theory of What?).


A theory of change has many possible uses for an evaluator, including:

• as a framework to check milestones and stay on course
• to document lessons learned about what really happens
• to keep the process of implementation and evaluation transparent, so everyone knows what is happening and why
• as a basis for reports to funders, policy makers, and boards (ActKnowledge and the Aspen Institute Roundtable on Community Change, 2003, Benefits).

Another use of a theory of change is in reporting the results of an evaluation. The W.K. Kellogg Foundation (2004, p. 6) discusses the importance of communication in reporting a program's success and sustainability, identifying three primary ways a depiction of a theory of change helps with strategic marketing efforts:

• describing programs in language clear and specific enough to be understood and evaluated
• focusing attention and resources on priority program operations and key results for the purposes of learning and program improvement
• developing targeted communication and marketing strategies.

Articulating the theory of change for a project, program, or policy has several benefits:

• It helps identify elements of programs that are critical to success.
• It helps build a common understanding of the program and expectations among stakeholders, based on a common language.
• It provides a foundation for the evaluation.
• It identifies, and subsequently measures progress on, intervening variables on which outcomes depend.
• It identifies assumptions that may become the basis of evaluation questions.


Constructing a Theory of Change

Frequently, project or program managers develop theory of change models as they conceptualize the project or program, sometimes including stakeholders in the process. For other interventions, no theory of change has been constructed, and the evaluator will need to construct one. Examining the theory of change should form a part, indeed the basis, of every evaluation.

It is important to highlight that a theory of change is not always made explicit, nor is it always or necessarily consistent from start to finish of a given intervention. Evaluation practitioners must uncover and reconstruct the theory of change and discover any inconsistencies within the theory and between the theory and the outcome. Before beginning to review or construct a theory of change, evaluators must have a clear understanding of the purpose and goals of the project, program, or policy. There are three critical questions to consider when constructing a theory of change:

Is there research underlying the program?



What is the logic of the program?



What are the key assumptions being made?

The process begins with learning as much as possible about related interventions and evaluations. With the new information, the process of drawing out the logic of the program and the key assumptions begins. As the logic is identified, it is placed in a chain of events and mapped or drawn. Key assumptions are then identified. Figure 4.3 shows the process in graphic form.

Fig. 4.3: Process for Constructing a Theory of Change (research → logic → key assumptions).


Is There Research Underlying the Program?
The theory of change begins by identifying and reviewing the evaluative research literature. For example, research may show that students do better academically when their parents are involved in their homework, or that teachers who visit students' homes become more empathetic. Once research findings are identified, the theory of change can be constructed to predict what should take place as a result of a similar intervention. A review of literature begins by identifying the literature that is available. Evaluative research literature to search includes:
• evaluative studies in one's own organization
• the OECD DAC repository of Publications and Documents or Information by Country, available on the Internet at the following Web site: http://www.oecd.org/department/0,2688,en_2649_33721_1_1_1_1_1,00.htm
• evaluation studies done by development organizations, development banks (e.g., the Asian Development Bank), nongovernmental organizations (NGOs), etc.
• evaluation studies in related fields
• research on theories of development.

For example, evaluators have been asked to evaluate an education program in Africa whose goal is to increase the achievement of students in the classroom. The development organization learned that when primary school teachers in a neighboring country visited the homes of their students, their achievement scores increased. The evaluators began by learning what the research literature had to say. They first looked through the research literature in their own development organization for primary school education and achievement scores to see what other interventions had been attempted. They found several other projects and programs and requested copies of evaluation reports from the interventions. They also checked the OECD DAC Publications and Documents on the Internet and found a copy of the evaluation report of the program in the neighboring country that had primary school teachers visiting the homes of their students. They also found two other descriptions of similar programs, one in the Caribbean and another in Bangladesh.


As a part of their research, they also searched for evaluation reports in the medical field where doctors and/or nurses visited the homes of their patients, and for other programs aimed at improving the achievement of primary school students. The evaluation team read through the research literature they found to learn as much as they could from the literature related to this program. Once evaluators locate relevant research, they examine it carefully. Executive summaries and conclusions or lessons learned are a good place to begin reading evaluation reports. With limited time available, evaluators can use the skills of skimming and scanning to find important information. Skimming is a way to look for the general, main ideas and important points in a reading. Most academic writing is structured into three parts: introduction, body, and conclusion. The introduction usually gives an overview of the ideas that will be covered in the article. Skimming begins with reading the introduction to learn what to expect in the article and then going through the article to survey the headings, graphics, and other components. Once the evaluator has a general idea of what is in the article, he or she goes back to the beginning. Within the body there will be paragraphs that have key words and sentences. In many cases the body also has supporting points such as explanations, quotations, examples, statistics, and other details about the topic. These supporting points are used to support the key idea. When skimming an article or chapter, evaluators begin by looking for the main ideas in the introduction and conclusions. They also skim down and identify topic sentences in the body paragraphs. If they have a printed copy of the material, they can use a highlighter or a pen or pencil to mark ideas so they can find them later. If they are reading from a computer screen, they can highlight key ideas, then copy and paste them into a word processing or spreadsheet document. After skimming the article or chapter, the evaluator can then scan it. Scanning is looking for the supporting points and details in the paragraphs. The evaluator can take notes as he or she scans the article or chapter.


What Is the Logic of the Program?
The logic of a program identifies the events expected to occur between the intervention and the goal of a project, program, or policy. These can initially be thought of as "if… then…" statements. The logic looks at the purpose and the goals of an intervention and identifies "if X happens, then Y should happen". When identifying the logic, those constructing it specify a number of details about the nature of the desired change, including specifics about the target population, the amount of change required to signal success, and the timeframe over which such change is expected to occur. They then identify the chain of events they expect to occur (if... then…). This set of relationships is called a theory of change. Often they work backward from the long-term goal, identifying the logical chain of events until they return to the situation at the beginning of the intervention. If the theory of change has already been constructed, the evaluator goes through a similar process to reconstruct it. When constructing or reconstructing a theory of change, small pieces of paper or cards can be used for recording events in the chain of activities. These can then easily be moved around, added, or removed to assist with building the chain. Consider the example of an intervention to train people in the techniques of development evaluation. The expected result of this intervention would be that more high-quality evaluations are conducted. For this intervention there may be an expected chain of relationships. A simple chain might be the following: if evaluators are better trained, they will conduct better evaluations, which should result in useful information for policymakers. The useful information should result in better decisions by decision makers. A theory of change provides an understanding of the intended workings of a project, program, or policy. It does not assume simple linear cause-and-effect relationships. Most theory of change models are not linear; they usually have boxes and/or arrows that link back to earlier, or ahead to later, parts of the theory of change, showing complex relationships. Theory of change models can be drawn in many different ways. The boxes and arrows can move vertically or horizontally. Alternatively, the model might be circular or take the form of a storyboard of what is expected to happen. While constructing or reconstructing the logic, evaluators begin to construct a simple map, drawing, or storyboard of what should happen to lead to the long-term goal. Again, one suggestion to assist with mapping is to work backward from the long-term goal to the situation at the beginning of the intervention (ActKnowledge and the Aspen Institute Roundtable on Community Change, 2003, Outcomes).
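The if-then chaining can also be recorded very simply in software. Below is a minimal Python sketch based on the training example just described; the class and function names are illustrative assumptions, not a prescribed tool.

# Minimal sketch: a theory-of-change chain recorded as an ordered list of
# events, from the intervention to the long-term goal. The events follow
# the training example in the text; names are illustrative.
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    cause: str   # the earlier event in the chain
    effect: str  # the event it is expected to lead to


def chain_of_events(events: List[str]) -> List[Link]:
    """Pair each event with the next one to form the causal chain."""
    return [Link(cause=a, effect=b) for a, b in zip(events, events[1:])]


if __name__ == "__main__":
    training_chain = chain_of_events([
        "evaluators are better trained",
        "better evaluations are conducted",
        "policymakers receive useful information",
        "decision makers make better decisions",
    ])
    for link in training_chain:
        print(f"If {link.cause}, then {link.effect}.")

Listing the events from the goal backward to the intervention and then reading the printed statements forward is one quick way to test whether each step plausibly follows from the one before it.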

The chain of events for the training example might be mapped as shown in Figure 4.4.

Fig. 4.4: Simple Theory of Change Diagram (Training → High Quality Evaluations → Useful Information → Better Decisions).

As the discussion of the logic progresses, the simple drawing may become more and more complex, showing relationships among the component parts of the map or drawing of the theory of change. Once again, using small pieces of paper or cards to help construct the logic and the map can be helpful. It allows the participants to quickly move items around and to change relationships.

What Are the Key Assumptions Being Made?
When constructing the logic, evaluators may find that their logic chains at first appear linear. When they further consider the many factors that interact with their projects, programs, and policies, the theory of change becomes more complex. When identifying the logic of the program, they must also identify the assumptions they are making about the change process. Each assumption can then be examined and tested to determine whether any key assumptions are hard to support (or even false). Assumptions usually fall into one of three important types:
• assertions about the connections between long-term, intermediate, and early outcomes on the map
• substantiation for the claim that all of the important preconditions for success have been identified
• justifications supporting the links between program activities and the outcomes they are expected to produce.

A fourth type of assumption is often an additional important factor in illustrating the complete theory of change: the contextual or environmental factors that will support or hinder progress toward the realization of outcomes in the pathway of change (ActKnowledge and the Aspen Institute Roundtable on Community Change, 2003, Overview).


Evaluators study the emerging logic and investigate their assumptions. Possible questions to ask are:
• Is this theory of change plausible? Is the chain of events likely to lead to the long-term goal?
• Is this theory of change feasible? Are the capabilities and resources available to implement the strategies and produce the outcomes?
• Is this theory testable? Are the measurements of how success will be determined specified? (Anderson, 2004, p. 25)

The assumptions are written down and then included in the chain of events. Again, small pieces of paper can be used so they can be reorganized to match the emerging theory. Not all assumptions need to be identified; the list would be very long. However, the key assumptions, those presenting the most risk to program success if they do not hold, must be identified. In the example of the training program, key assumptions might be that the training is appropriate for the needs of the evaluators and that the evaluators value the training and are motivated to learn. Another assumption is that the evaluators will be given the resources they need to use what they learned in the training. For example, they may need more time to implement the new skills than they were given in the past, or they may need access to technology to use the new skills. Another assumption is that the evaluators' report-writing skills are adequate to communicate the information effectively to the government agency, and that the agency will use the results of the evaluation to make better decisions. For this chain to be effective, the assumptions must also be addressed. The assumptions can simply be listed alongside the theory of change diagram, or they can be drawn into the diagram itself. Figure 4.5 shows a simple diagram with a few assumptions identified.

Fig. 4.5: Simple Theory of Change Diagram with Assumptions Identified (Training → High Quality Evaluations → Useful Information → Better Decisions, with the assumptions: needs of trainees met; enough time to learn; resources given to apply what was learned; report-writing skills adequate to communicate with government).
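One illustrative way to keep each key assumption attached to the step it puts at risk, rather than in a separate list, is sketched below in Python; the chain and the assumptions come from the training example above, while the structure and names are assumptions made for illustration.

# Illustrative sketch: each step of the training-example chain carries the
# key assumptions that must hold for the step to be reached.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    result: str
    assumptions: List[str] = field(default_factory=list)


training_theory = [
    Step("Training delivered",
         ["the training is appropriate for the needs of the evaluators",
          "the evaluators value the training and are motivated to learn"]),
    Step("High-quality evaluations conducted",
         ["evaluators are given the time and resources to use what they learned"]),
    Step("Useful information communicated to the government agency",
         ["report-writing skills are adequate to communicate the findings"]),
    Step("Better decisions made",
         ["the agency uses the results of the evaluation"]),
]

if __name__ == "__main__":
    for step in training_theory:
        print(step.result)
        for assumption in step.assumptions:
            print(f"  assumes: {assumption}")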


Theory of Change Template to Assist with Discussions
The W.K. Kellogg Foundation (2004, pp. 28-31) suggests using a template to help evaluators explain the theory of change. Figure 4.7 shows their Theory of Change Template.

Fig. 4.7: Theory of Change Template (1: Problem or Issue; 2: Community Needs/Assets; 3: Desired Results, i.e., outputs, outcomes, and impact; 4: Influential Factors; 5: Strategies; 6: Assumptions). Source: W.K. Kellogg Foundation (2004, p. 28).

To use the Theory of Change Template, begin in the middle of the model, Problem or Issue, where the number 1 sits. This is the heart of the template and will be the core of the theory of change. In this space, the evaluator writes a clear statement of the problem or issue. It is helpful to know what others have done successfully and to learn from their best practices. The research done earlier in the process can help at this step.


The second step moves to the box below step 1, Community Needs/Assets. In this box, the evaluator specifies the needs and/or assets of the community or organization relevant to addressing the problem or issue. If a needs assessment has been conducted, or if the needs of the community or organization have been prioritized, that is the kind of information to include here. The third step moves to the right, to Desired Results. Here the evaluator identifies what the project, program, or policy is expected to achieve in both the near term and the long term. This might be a vision of the future. These will become the outputs, outcomes, and impact. The fourth step moves to the far left of the model, Influential Factors. In this area, the evaluator lists the potential barriers and/or supports that might affect the hoped-for change. These may be risk factors, the existing policy environment, or other factors. The fifth step moves to the upper right, Strategies. Here the evaluator lists general, successful strategies the research has identified that have helped similar communities or organizations achieve the kinds of results the project, program, or policy is attempting to elicit. These may be called "best practices" in the research. The sixth and last step moves to the right, Assumptions. Here the evaluator states the assumptions about how and why the identified change strategies will work in the community or organization. These may be principles, beliefs, and/or ideas. The Theory of Change Template can then be used to draw out the graphic representation of the theory of change. If a group of people is involved in constructing the theory of change, each person (or group) can be given a blank template to complete. When all are completed, the templates can be brought together for discussion until agreement is reached.
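For teams that want to capture the template electronically before drawing it, the Python sketch below records the six fields in the order just described; only the field labels come from the Kellogg template, and the data structure, names, and completeness check are illustrative assumptions.

# Illustrative sketch: the six fields of the Theory of Change Template as a
# simple record. The comments note the step at which each field is filled.
from dataclasses import dataclass, field
from typing import List


@dataclass
class TheoryOfChangeTemplate:
    problem_or_issue: str = ""                                        # step 1
    community_needs_assets: List[str] = field(default_factory=list)   # step 2
    desired_results: List[str] = field(default_factory=list)          # step 3: outputs, outcomes, impact
    influential_factors: List[str] = field(default_factory=list)      # step 4: barriers and supports
    strategies: List[str] = field(default_factory=list)               # step 5: "best practices"
    assumptions: List[str] = field(default_factory=list)              # step 6

    def is_complete(self) -> bool:
        """A template is ready for group discussion once every field is filled in."""
        return all([self.problem_or_issue, self.community_needs_assets,
                    self.desired_results, self.influential_factors,
                    self.strategies, self.assumptions])

If each person or group fills in one such record, comparing the records field by field is a straightforward way to surface disagreements before the group settles on a single template.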


Example of Constructing a Theory of Change for Teacher Visits to Student Homes
Looking at another example, a program did not have an existing theory of change, and the evaluator needed to construct one. The desired goal of the program is better academic performance of students. The intervention is teachers visiting the homes of students. The logic of the situation runs as follows: if teachers (input) visit the homes of their students (activity) and talk to parents (output), they will become more empathetic to the child (outcome); the parents will better understand the school's need for homework to be completed on time and for the child to attend school every day, and will make sure both happen (outcomes). Because the child does homework, attends school regularly, and has an empathetic teacher, the student's achievement will increase (impact). The evaluator creating the theory of change began with the intended result, "Achievement in Reading", and placed it at the bottom of the diagram. Next, the evaluator identified the intervention at the top by writing "Visits by teachers to students' homes". From there, the evaluator identified three possible results from home visits:
• teachers gaining an understanding of the home culture of the student
• parents gaining information about what the school expects of the students
• both teachers and parents being able to identify special problems that keep the student from doing well at school.

From each of these three possible results, the evaluator identified further results, creating chains of results and interacting results. For example, from the result "Teachers' understanding of the home culture" the evaluator identified a chain of results that included:
• the teacher gaining sympathy with children and their view of the world
• teaching in ways that are more comfortable to the student (words and environment)
• improved student morale
• improved student achievement in reading.

The evaluator went on to identify other possible results from each of the original three possible results, all ending with achievement in reading. Some of the chains of results also interact with other possible results. The theory of change also identified several assumptions. In this case, they are listed instead of being drawn into the diagram. The key assumptions the evaluator identified were:
• parents will welcome teachers into their homes
• parents will feel comfortable discussing their views of educating their child
• teachers will better understand the home culture and learn to adjust their teaching to what they learned
• parents want to be involved in student learning and want their children to attend school.

Figure 4.6 is the visual representation of this theory of change.

Fig. 4.6: Example of Program Theory Model: Teacher Visits to Students' Homes. Visits by teachers to students' homes lead to sharing of views by parent and teacher, teachers' understanding of the home culture, parents' knowledge of the school's expectations for students, and identification of special problems that retard the student's achievement (health, emotional, etc.). These lead in turn to teachers' sympathy with children and their view of the world, teaching in terms comfortable and understandable to students, parental support and encouragement with the child's homework and school assignments, parental support for better attendance at school, the student's receipt of special help and improvement of the condition (health, emotional), conscientiousness of work by students, student attendance, and student morale, all converging on achievement in reading.


Example: Anti-corruption program emphasizing participatory workshops
Consider a different example. This program is attempting to use participatory workshops to reduce corruption in a government. To construct the theory of change, the evaluator began by writing the long-term goal of reducing corruption at the bottom of the diagram and the intervention at the top. The main events that are predicted to occur are placed in a chain of events between the two. Although the evaluator could identify over 15 assumptions, the following three key assumptions were identified:
• the participatory workshops are effective and meet the needs of the learners and the program
• the learners have the skills, attitude, and motivation to participate in the workshop
• the people will develop "local ownership" and a trickle-down effect will occur.

Figure 4.7 presents the articulated theory of change underlying a World Bank Institute (WBI) anti-corruption program (Haarhuis, 2005, p. 43).


An anti-corruption program emphasizing (participatory) workshops:
• will foster policy dialogues
• will help establish a 'sharing and learning' process of 'best practices' and 'good examples' that will have behavioral impacts (like signing integrity pledges)
• will include a learning process that is more than ad hoc or single shot, while also helping to steer 'action research'
• will empower participants
• will involve partnerships and networks with different stakeholders within civil society and will therefore establish (or strengthen) 'social capital' between partners fighting corruption
• when these activities help realize 'quick wins', they will encourage others to also become involved in the fight against corruption
• when these activities also help to establish 'islands of integrity' that can have an exemplary function, they will indeed have such a function, developing 'local ownership' of anti-corruption activities, and a trickle-down effect from these workshops to other segments of society will take place.
These effects lead to increased public awareness of the cons of corruption, increased awareness of the cons of corruption within civil society, institution building through establishing or strengthening the different pillars of integrity, a transparent society and a transparent and accountable state, and an exit strategy for the World Bank. This will help establish (or strengthen) a national integrity system, which will help establish good governance, which will reduce corruption.

Fig. 4.7: Schematic Representation of Core Elements of WBI's Underlying Program Logic.


Graphic Representations of Theory of Change
Although most theory of change models use similar components, they may have different visual presentations. Here we present a basic theory of change model, followed by other variations of visual presentation, and end with a discussion of the logical framework, or logframe, a specific variant of the theory of change model.

Basic Model
Figure 4.8 shows a very basic example of a theory of change model, called a logic model. This example is for a research grant proposal. Notice that the components are identified on the left of the model. The figure is followed by an example of short- and long-term results.

Fig. 4.8: Example of a Theory of Change Model for a Research Grant Proposal (components: activities, outputs, immediate results, intermediate results, final results; chain: selection criteria and selecting applicants → enhanced selection process → improved research quality → increased use of research findings).


The following examples of short- and long-term results come from the National Parole Board's (NPB) logic model for the Aboriginal Corrections Component of the Effective Corrections Initiative in Canada.
Examples of short-term results:
• Communities are better informed about the NPB and conditional release.
• Hearing processes for offenders in the Nunavut Territory are culturally appropriate.
Examples of long-term results:
• The conditional release decision-making process is responsive to the diversity within the Aboriginal offender population.
• The NPB has better information for decision making, including information on the effects of their history, when conducting hearings.

Different Graphic Formats to Depict Theory of Change Models
Theory of change models vary considerably in how they look. They can flow horizontally or vertically. The graphic format should be appropriate to the agency and stakeholders, and it should provide sufficient direction and clarity for planning and evaluation purposes. Keep in mind that the logic model should help focus the evaluation on the results of the program. As program theory has become a major force in evaluation (Rogers et al., 2000), there has been confusion about terminology. Patton (2002, p. 163), for example, distinguishes a logic model from a theory of change. Patton states that the only criterion for a logic model is that it portrays a reasonable, defensible, and sequential order from inputs through activities to outputs, outcomes, and impacts. In contrast, a theory of change must also specify and explain assumed, hypothesized, or tested causal linkages. Although there are many formats for depicting a theory of change, two are common: the flow chart or classic logic model, and the results chain model.


Flow Chart or Classic Logic Model
Flow charts or tables are the most common formats used to illustrate a theory of change. The flow chart or classic logic model illustrates the sequence of results that flow (or result) from activities and outputs. It is a very flexible logic model as long as the three core components are presented: activities, outputs, and results. Any number of result levels can be used to ensure that a logic model accurately depicts the sequence of outcome results. The cause-effect linkages can be explained by using "if-then" statements. For example, if the activity is implemented, then these outputs will be produced. If the outputs are achieved, then they lead to the first level of immediate results, and so on. Figure 4.9 shows the structure of a flow chart or classic logic model.

Fig. 4.9: Structure of a Flow Chart or Classic Logic Model.


Another visual design for depicting the theory of change is shown in Figure 4.10. This model includes the assumptions as the principles behind the design of the initiative.

Fig. 4.10: Logic Model Emphasizing Assumptions (Assumptions: the beginnings; Inputs, Activities, Outputs: the planned work; Outcomes: the intended results). Source: Adapted from W.K. Kellogg Foundation (2004, p. 11).

The flow chart logic model carefully directs thinking about the linkages between specific activities, outputs, and outcomes. It helps answer questions such as: What outputs result from each activity? What outcomes result from the outputs?

Results Chain Model
The results chain model is also referred to as a performance chain. While it is similar to the flow chart model, it does not isolate the specific activities, outputs, or results. The results chain, therefore, does not show the same detail with respect to the causal sequence of outputs and results. Both types of logic models, however, are used as a structure for describing the expectations of a program and as a basis for reporting on performance. Like the flow chart model, the results chain is based on the rationale or theory of the program. The logic model can be used as a basis for measuring efficiency and effectiveness: the inputs, activities, and outputs can be used as measures of efficiency, whereas the results (outcomes) can be used as measures to evaluate program effectiveness.
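As a rough, hypothetical illustration of that distinction (the figures and function names below are invented for the example, not taken from the text), efficiency can be expressed as cost per output and effectiveness as the share of a targeted outcome actually achieved.

# Hypothetical illustration of efficiency versus effectiveness measures.
def cost_per_output(cost_of_inputs: float, outputs_produced: int) -> float:
    """Efficiency: inputs, activities, and outputs (here, cost per output)."""
    return cost_of_inputs / outputs_produced


def outcome_attainment(outcome_achieved: float, outcome_targeted: float) -> float:
    """Effectiveness: results (outcomes) achieved relative to the target."""
    return outcome_achieved / outcome_targeted


if __name__ == "__main__":
    # Invented example: $50,000 spent to train 200 people; 60 of a targeted
    # 100 trainees move into better employment.
    print(f"Cost per trained participant: {cost_per_output(50_000, 200):.2f}")
    print(f"Share of outcome target achieved: {outcome_attainment(60, 100):.0%}")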


Figure 4.11 shows the structure of a results chain model.

Fig. 4.11: Structure of a Results Chain Model (area of control, internal to the organization: inputs (resources) → activities → outputs; outputs reach direct beneficiaries; area of influence, external to the organization: short-term results (direct) → intermediate results (indirect) → long-term result; external factors act on the chain). Source: Six Easy Steps to Managing for Results: A Guide for Managers, April 2003, Evaluation Division, Department of Foreign Affairs and International Trade, Canada.

Consider the following when comparing flow chart and results chain models:
• The results chain is less time-consuming to develop.
• The flow chart logic model enhances understanding of how specific activities might lead to results.
• Either can include one, two, or three result levels, depending on what is relevant to the program or organization.

Many examples of completed logic models are available on the University of Wisconsin-Extension Web site: http://www.uwex.edu/ces/pdande/evaluation/evallogicmodelexamples.html


Example: Micro-Lending Program
Many micro-lending programs intend to promote new livelihoods and improve household well-being by helping women enter the labor force and build entrepreneurial skills, thereby increasing household income. The long-term goal is to promote private sector development and further economic growth. The programs often focus on the rural poor. These programs provide not only financing but also technical assistance. Loans average US$225, and all are at or below US$500. They are lump sums for investing in a microenterprise or providing working capital. Loan maturities range from 1 to 10 years, with an average of 2-3 years. A grace period of one year is offered. Many programs also include capacity-building activities for clients. Topics covered might include literacy, basic bookkeeping, business plan development, and financial management. Figure 4.12 shows a simplified logic model depicting the micro-lending program described above.

Fig. 4.12: Simple Logic Model for a Micro-Lending Program (elements: access to startup funds for small business, financial management advice and support, skills in business and financial management, income and employment for local people, improved living conditions, reduced family poverty).

Figure 4.13 shows a more detailed graphical depiction of the theory of change for the same micro-lending program. Besides the chain of events, notice the items in circles. The circled items show other major factors in the environment that might influence the program's attainment of its goal.


Fig. 4.13: More Complex Logic Model for a Micro-Lending Program. The starting condition is that women have limited access to economic opportunities because of lack of access to credit and other productive resources and because of social control. The project offers credit, technical assistance, and group information; women create businesses, generate profits, and achieve short-term improvements in household welfare; profits are re-invested, businesses are sustained, and household welfare improves permanently, with impacts on nutrition, health, and clothing, improved housing, improved education for girls, and broader economic improvements. Circled environmental influences include government-funded related programs, worker remittances, the macroeconomic environment, other bi-lateral micro-financed MDM programs, foundation programs, and NGOs' programs.

Key Assumptions:
• Women will create businesses. They will have the time and family support to do so.
• The profits generated will not be diverted (e.g., used to pay for dowry).
• The businesses will succeed because there is strong demand for the product at the price to be charged and supply is short.
• The businesses will succeed despite possible constraints on women's time or social pressures.
• The businesses will succeed as women are provided financial management skills and peer support.

Figure 4.14 shows an example of a logic model for a training program. Along with this graphic depiction, other major environmental events that may affect the program should be identified; this program would be particularly vulnerable to the macroeconomic situation.

Fig. 4.14: Logic Model for a Training Program (Inputs/resources: money, staff, volunteers, supplies, eligible participants. Activities/services: training, education, counseling. Outputs/products: total number of classes, hours of service, number of participants completing the course. Outcomes/benefits: new knowledge, increased skills, changed attitudes, new employment opportunities. Impacts/changes: trainees earn more over five years than those not receiving training; trainees have a higher standard of living than the control group.)

The key assumptions for the training program include:
• a growing economy that needs skilled workers
• training is provided in areas that are currently, and are likely to remain, in high demand
• the program does not train those likely to have succeeded in obtaining good jobs without the program
• the training is of sufficient length
• the program does not focus on those at the top of the eligibility range
• the program has the specialized equipment needed
• the program has the specialized staff needed.


Logical Framework (Logframe)
A specific variant of the theory of change model or framework is the logical framework, or logframe. A logical framework links the activities, results, purpose, and objectives of a program, policy, or project in a hierarchy. For each component, the evaluator identifies the indicators that are needed, the sources of information for them, and the assumptions. The logframe is a specific type of logic model or approach. It helps to clarify the objectives of a given project, program, or policy and to identify the causal links between inputs, processes, outputs, outcomes, and impact. Performance indicators are drawn up for each stage of the intervention. Key assumptions are articulated, and the manner in which evaluation and supervision will be undertaken is explained. The logframe is essentially a 4×4 matrix containing a summary of the critical elements of a project, program, or policy. The approach addresses key questions in a methodical manner according to causal logic. Figure 4.15 contains one example of the way in which a logical framework can be used for a program goal.

Fig. 4.15: Example of a Logframe for a Program Goal. Rows: program goal; project development objective; outputs; activities. Columns: narrative summary; performance indicators; M&E/supervision/verification; key assumptions.
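Because the logframe is essentially a matrix, it can also be captured in a simple tabular structure. The Python sketch below is illustrative only: the row and column labels follow Figure 4.15, while the dictionary layout and the completeness check are assumptions.

# Illustrative sketch: a logframe as a small matrix, one row per level of the
# hierarchy and one entry per column of Figure 4.15.
ROWS = ["Program goal", "Project development objective", "Outputs", "Activities"]
COLUMNS = ["Narrative summary", "Performance indicators",
           "M&E/supervision/verification", "Key assumptions"]

# An empty logframe: every cell still to be filled in by the design team.
logframe = {row: {column: "" for column in COLUMNS} for row in ROWS}

if __name__ == "__main__":
    # Listing unfilled cells is one crude completeness check.
    missing = [(row, column) for row in ROWS for column in COLUMNS
               if not logframe[row][column]]
    print(f"{len(missing)} of {len(ROWS) * len(COLUMNS)} cells still to be completed")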

The logframe can be used for a variety of purposes:
• improving the quality of project, program, and/or policy design by requiring the specification of clear objectives, the use of performance indicators, and assessment of risks
• summarizing the design of complex activities
• assisting the preparation of detailed operational plans
• providing an objective basis for activity review, monitoring, and evaluation (which is also true of other logic models) (World Bank OED/ECD, 2004, p. 8).

There are several important criticisms of the logframe. When developing a logframe, a person can get lost in the details and lose sight of the bigger picture. A second criticism is that logframes do not emphasize the need for baseline data. Additionally, the logframe is frequently too simple, even for simple project designs: "Not everything important can be captured in a one to three pages, four or five level diagram" (Gasper, 1997). Many logframe users have underestimated the fact that a 'frame' includes some things and leaves others out, and that a 'frame-work' is meant to help the required work, not substitute for it. Finally, after a logframe has been prepared, it tends to be fixed and not updated, and thus becomes a 'lock-frame' (Gasper, 1997).


Example: Logframe for Childcare
Figure 4.16 illustrates the logframe approach, using the example of a childcare component of a women's development project.

GOAL (general objective): Improve the economic and social welfare of women and their families.
Indicators: improvements in family income in x% of participating families; improvements in measures of health status, nutritional status, and educational participation.
Verification: household surveys of the economic, social, and health condition of all family members.

PURPOSE (specific objective): Provide women with opportunities to earn and learn while their children are cared for in home day care centers.
Indicators: day care homes functioning, providing accessible, affordable care of adequate quality during working hours and thus allowing shifts in women's employment and education activities.
Verification: from surveys, changes in women's employment and education and their evaluations of the care provided; evaluations of the quality of care provided, based on observation.
Assumptions: other family members maintain or improve their employment and earnings; economic conditions remain stable or improve.

OUTPUTS: Trained caregivers, supervisors, and directors; day care homes upgraded and operating; materials developed; administrative system in place; MIS in place.
Indicators: caregivers trained; homes upgraded and operating; materials created and distributed; a functioning MIS.
Verification: data from the MIS on trainees, homes, and materials; evaluations of trainees after initial training and during the course of continuous training.
Assumptions: family conditions allow home day care mothers to carry through on their agreements to provide care.

ACTIVITIES: Select caregivers and supervisors and provide initial training; upgrade homes; develop materials; develop administrative system; deliver home day care; provide continuous training and supervision; develop monitoring and evaluation system.
Resources: budget, technology, human resources.
Verification: plan of action, budgets, and accounting records; studies showing that the chosen model and curriculum work; evaluations to see that the activities were not only carried out but done well; survey o…

Source: Inter-American Development Bank, http://www.iadb.org/sds/soc/eccd/6example.html#ex1
Fig. 4.16: Logical Framework for a Childcare Program Embedded in a Women in Development Project.


If a logical framework is developed for a well-baby clinic, it might include immunizations as one of its activities, with a target result of immunizing 50% of all children under age six in a particular district. If this target is achieved, then the incidence of preventable childhood diseases should decrease. This ultimately should achieve the overall objective of reducing the number of deaths of children under age six. The second column identifies the indicators that verify the extent to which each objective has been achieved. The third and fourth columns specify where the data will be obtained in order to assess performance against the indicators, and any assumptions made about the nature and accessibility of those data. Logic models are extremely useful in showing how a program is supposed to work and achieve its intended outcomes and impacts. They are also useful in identifying, through the assumptions, the threats to the program working as it is supposed to and achieving the desired outcomes and impacts. But when conducting an evaluation based on a logic model, the evaluator must also look for unintended outcomes and impacts, both positive and negative.

Assessing a Theory of Change
Once the theory of change is constructed, evaluators need to step back and assess the quality of the theory. Evaluators can assess the theory from different viewpoints. These viewpoints or frameworks include assessment:
• in relation to social needs
• of logic and plausibility
• through comparison with research and practice
• by confronting the program theory with one or more relevant scientific theories
• via preliminary observation (Rossi, Freeman, and Lipsey, 1999, pp. 174-183).


Whichever framework the evaluator uses to view the theory of change, all models should be able to answer the following questions:
• Is the model an accurate depiction of the program?
• Are all elements well defined?
• Are there any gaps in the logical chain of events?
• Are elements necessary and sufficient?
• Are relationships plausible and consistent?
• Is it realistic to assume that the program will result in the attainment of stated goals in a meaningful manner?

The W.K. Kellogg Foundation (2004, p. 23) developed a checklist to assess the quality of a logic model. The following list of criteria is adapted from that checklist (a simple sketch of applying some of these checks follows the list):
• Major activities needed to implement the program are listed.
• Activities are clearly connected to the specified program theory.
• Major resources needed to implement the program are listed.
• Resources match the type of program.
• All activities have sufficient and appropriate resources.
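Below is a minimal Python sketch of applying some of these checks, assuming the logic model has been recorded as plain lists of activities and resources plus a mapping of activities to the resources they need; the structure and names are illustrative, and criteria that call for judgment (such as whether activities connect clearly to the program theory) are left to the evaluator.

# Minimal sketch: mechanical checks drawn from the adapted checklist above.
from typing import Dict, List


def check_logic_model(activities: List[str],
                      resources: List[str],
                      needs: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems found against the checklist criteria."""
    problems = []
    if not activities:
        problems.append("No major activities listed.")
    if not resources:
        problems.append("No major resources listed.")
    for activity in activities:
        required = needs.get(activity, [])
        if not required:
            problems.append(f"Activity '{activity}' has no resources assigned.")
        for resource in required:
            if resource not in resources:
                problems.append(
                    f"Activity '{activity}' needs '{resource}', which is not listed.")
    return problems


if __name__ == "__main__":
    # Invented training-program example.
    for problem in check_logic_model(
            activities=["Deliver evaluation training", "Provide follow-up coaching"],
            resources=["Trainers", "Training budget"],
            needs={"Deliver evaluation training": ["Trainers", "Training budget"],
                   "Provide follow-up coaching": ["Coaching staff"]}):
        print(problem)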


Summary
Evaluators need to resist the urge to rush into the task of evaluating without a plan. A front-end analysis is a valuable part of planning that helps evaluators get a larger picture of the project, program, or policy. The front-end analysis can answer important questions about the evaluation, including timing, time to complete, who will be involved, resources, design, program theory and logic, and what already exists. One part of the front-end analysis is identifying who else is involved in the project, program, or policy. A stakeholder analysis is one way to identify the key evaluation stakeholders and determine what they know, what they can contribute, and what they expect. Another part of a front-end analysis for an evaluation is looking at the context of the evaluation. Evaluators ask different questions at different stages of a project, program, or policy, so the evaluator must identify the stage of the intervention. Evaluators also need to keep informed about what others have learned from similar interventions and the latest theories for improving development conditions. Constructing the theory of change underlying the program helps evaluators and stakeholders visualize the program and decide what evaluation questions could be addressed. The process of constructing a theory of change begins with research: what have others learned about similar situations? What is learned from the research can help evaluators make more informed decisions for evaluating the project, program, or policy. The next step in the process involves three activities: constructing the logic of the program, identifying the key assumptions being made when constructing the logic, and mapping or drawing out the chain of events and their relationships. There are different ways to depict a theory of change, but all should be based on the research, depict the logical flow, show competing events that might influence the results, and show a causal chain of events. While the visual depictions can look very different, they typically have three main components: activities, outputs, and results. The logic model is a common framework for visualizing a theory of change; the logical framework, or logframe, is another.


Chapter 4 Activities

Application Exercise 4.1: Applying the Theory of Change/Logic Model
Instructions: Below is a list of inputs, activities, outputs, outcomes, and impacts in random order. For each statement, check the column (input, activity, output, outcome, or impact) that best describes it.
A. Poverty rates decline in areas which have the project schools.
B. Built new educational facilities.
C. 5,000 students attended classes last year.
D. 80% of graduates are hired at above-poverty wages.
E. Hired 20 new teachers.
F. Provided $6 million in loans and grants for construction.
G. Implemented new curriculum to teach more practical skills for the marketplace.
H. Test scores improved by 20%.
I. 1,000 students graduated last year.
J. Employers are satisfied with the skills of graduates.
L. Provided 500 students with classes in job search strategies.


Application Exercise 4.2: Applying the Theory of Change/Logic Model
Now that you are able to identify the inputs, activities, outputs, outcomes, and impacts from an existing list (see the previous exercise), the next step is to generate a list of inputs, activities, outputs, outcomes, and impacts for a program you are evaluating.

Instructions: Suppose you have been asked to evaluate an agricultural assistance program in your region. The purpose of the program is to provide technical assistance and equipment to farmers to help them improve sustainable farming practices. Identify the inputs, activities, outputs, outcomes, and impacts of the program. Then compare and discuss your lists with a colleague.

Inputs:

Activities:

Outputs:

Outcomes:

Impacts:


Application Exercise 4.3: Your Program
Instructions: Think about the program you are currently working with or one that you are familiar with.
1. What are its goals?

2. What are its objectives?

3. What are its major activities?

4. Why is it important to know whether this program is making a difference?

5. Identify its:
Inputs:
Activities:
Outputs:
Outcomes:
Impacts:


6. What is the theory of change?
a) What does research have to say about the expected relationships?

b) What factors external to the project, program, or policy might explain the results or influence them?

c) What critical assumptions are being made?

7. Draw a program logic model to show how or why you expect your intervention to work.


Resources and Further Reading
ActKnowledge and the Aspen Institute Roundtable on Community Change (2003). Theory of Change. Retrieved April 16, 2008 from http://www.theoryofchange.org/
Anderson, Andrea A. (2004). The community builder's approach to theory of change: A practical guide to theory development. New York: The Aspen Institute Roundtable on Community Change. Retrieved April 18, 2008 from http://www.aspeninstitute.org/atf/cf/%7BDEB6F227659B-4EC8-8F848DF23CA704F5%7D/rcccommbuildersapproach.pdf
Bell, P. (1997). "Using argument representations to make thinking visible for individuals and groups." In R. Hall, N. Miyake, and N. Enyedy (Eds.), Proceedings of CSCL '97: The Second International Conference on Computer Support for Collaborative Learning, pp. 10-19. Toronto: University of Toronto Press.
Bruning, R. H., G. J. Schraw, M. M. Norby, and R. R. Ronning (2004). Cognitive psychology and instruction (4th ed.). Upper Saddle River, NJ: Pearson Merrill Prentice Hall.
Canadian International Development Agency (CIDA) (2005). Case Study #1: Jamaica Environmental Action Program (ENACT). Caribbean Division, Americas Branch. Retrieved April 15, 2008 from http://www.acdicida.gc.ca/CIDAWEB/acdicida.nsf/En/EMA-218131811PHY#1
Eggen, P. and D. Kauchak (1999). Educational psychology: Windows on classrooms (4th ed.). Upper Saddle River, NJ: Merrill/Prentice Hall.
Evaluation Models (2001). New directions for evaluation, No. 89. San Francisco, CA: Jossey-Bass.
Fitzpatrick, Jody L., James R. Sanders, and Blaine R. Worthen (2004). Program evaluation: Alternative approaches and practical guidelines. New York: Pearson Education.
Funnell, S. (1997). Program logic: An adaptable tool for designing and evaluating programs. Evaluation news and comment, Vol. 6(1), pp. 5-7.
Gagne, R. M. and W. D. Rohwer Jr. (1969). Instructional psychology. Annual Review of Psychology, 20, 381-418.
Gasper, D. (1997). Logical Frameworks: A Critical Assessment. Managerial Theory, Pluralistic Practice. Working Paper Series No. 264, Institute of Social Studies. The Hague: ISS.


Gasper, D. (2000). "Evaluating the 'logical framework approach' – towards learning-oriented development evaluation." Public Administration and Development, 20:1, pp. 17-28.
Haarhuis, Carolien Klein (2005). Promoting anti-corruption reforms: Evaluating the implementation of a World Bank anti-corruption program in seven African countries. Retrieved August 15, 2007 from http://igiturarchive.library.uu.nl/dissertations/2005-0714200002/full.pdf, p. 43.
Healy, A. F., and D. S. McNamara (1996). Verbal learning memory: Does the modal model still work? Annual Review of Psychology, 47, 143-172.
Heubner, T. A. (2000). Program theory in evaluation: Challenges and opportunities. New directions for evaluation, No. 87. San Francisco, CA: Jossey-Bass.
Kassinove, H., and M. Summers (1968). The developmental attention test – A preliminary report on an objective test of attention. Journal of Clinical Psychology, 24(1), 76-78.
Kellogg Foundation on Logic Model Creation.
Leeuw, Frans L. (2003). "Reconstructing Program Theories: Models Available and Problems to be Solved." The American Journal of Evaluation, Volume 24, Number 1, Spring 2003, pp. 5-20.
Leeuw, Frans L. (1991). Policy theories, knowledge utilization, and evaluation. Knowledge and policy, 4: 73-92.
Management Sciences for Health (MSH) and the United Nations Children's Fund (UNICEF), a joint effort (1998). "Quality guide: Stakeholder analysis" in the Guide to managing for quality. Retrieved August 14, 2007 from http://bsstudents.uce.ac.uk/sdrive/Martin%20Beaver/Week%202/Quality%20Guide%20%20Stakeholder%20Analysis.htm
Mikkelsen, B. (2005). Methods for development work and research: A new guide for practitioners. Thousand Oaks, CA: Sage Publications.
Newman, David Kent (2007). Theory: Write-up for conceptual framework. Retrieved November 15, 2007 from http://deekayen.net/theory-write-conceptual-framework
Ormrod, J. E. (2006). Essentials of educational psychology. Upper Saddle River, NJ: Pearson Merrill Prentice Hall.
Owen, J. M., & Rogers, P. J. (1999). Program evaluation: Forms and approaches. London: Sage.


Pancer, S. Mark & Anne Westhues (1989). "A developmental stage approach to program planning and evaluation." Evaluation Review, Vol. 13, No. 1, 56-77. Sage Publications.
Pawson, Ray (2006). Evidence-based policy: A realistic perspective. New Brunswick, NJ: Sage Publications.
Petrosino, Anthony, Robert A. Boruch, Cath Rounding, Steve McDonald, and Iain Chalmers (2003). "The Campbell Collaboration Social, Psychological, Educational, & Criminological Trials Register (C2-SPECTR)." Retrieved August 14, 2007 from http://www.campbellcollaboration.org/papers/unpublished/petrosino.pdf
Porteous, Nancy L., B. J. Sheldrick, and P. J. Stewart (2002). "Introducing program teams to logic models: Facilitating the learning process." The Canadian journal of program evaluation, Vol. 17, No. 3, pp. 113-141.
Porteous, Nancy L., B. J. Sheldrick, and P. J. Stewart (1997). Program evaluation tool kit: A blueprint for public health management. Ottawa, Canada: Ottawa-Carleton Health Department.
Prensky, Marc (2001). Digital natives, digital immigrants. On the Horizon 9(5). Retrieved November 15, 2007 from http://www.marcprensky.com/writing/Prensky%20%20Digital%20Natives,%20Digital%20Immigrants%20%20Part1.pdf
Ritzer, George (1993). The McDonaldization of Society, revised edition. Thousand Oaks: Pine Forge Press.
Rogers, Patricia J., T. A. Hacsi, A. Petrosino, & T. A. Huebner (eds.) (2000). Program theory in evaluation: challenges and opportunities. New directions in evaluation, Number 87. San Francisco: Jossey-Bass Publishers.
Rossi, P., H. Freeman, & M. Lipsey (1999). Evaluation: A systematic approach. Thousand Oaks: Sage Publications.
Scott, M. (2003). The Benefits and Consequences of Police Crackdowns. Problem-Oriented Guides for Police, Response Guide No. 1. Washington, D.C.: U.S. Department of Justice, Office of Community Oriented Policing Services.
Scriven, Michael (2007). Key evaluation checklist, p. 3. http://www.wmich.edu/evalctr/checklists/kec_feb07.pdf
Shadish Jr., W. R., T. D. Cook, and L. C. Leviton (1991). Foundations of program evaluation. Thousand Oaks, CA: Sage Publications.
Sherman, L. W. et al. (editors) (2002). Evidence-based crime prevention. London: Routledge.


Stufflebeam, D. L., G. F. Madaus, and T. Kellaghan (Eds.) (2000). Evaluation models (2nd ed.). Boston: Kluwer Academic Publishers.
Suthers, D. D. and A. Weiner (1995). Belvédère. Retrieved August 15, 2007 from http://lilt.ics.hawaii.edu/belvedere/index.html
Suthers, D. D., E. E. Toth, & A. Weiner (1997). An integrated approach to implementing collaborative inquiry in the classroom. In R. Hall, N. Miyake, & N. Enyedy (Eds.), Proceedings of CSCL '97: The Second International Conference on Computer Support for Collaborative Learning (pp. 272-279). Toronto: University of Toronto Press.
Swedberg, Richard (2003). Principles of Economic Sociology. Princeton, NJ: Princeton University Press.
Taylor-Powell, Ellen (2005). Logic models: A framework for program planning and evaluation. University of Wisconsin-Extension, Program Development and Evaluation. Retrieved August 15, 2007 from http://www.uwex.edu/ces/pdande/evaluation/pdf/nutritionconf05.pdf
US GAO (1991). Designing evaluations. Retrieved August 14, 2007 from http://www.gao.gov/special.pubs/10_1_4.pdf
W.K. Kellogg Foundation (2004). Logic Model Development Guide. Battle Creek, MI. Retrieved April 17, 2008 from http://www.wkkf.org/Pubs/Tools/Evaluation/Pub3669.pdf
Weiss, Carol H. (1997). Evaluation: Methods for studying programs and policies. New Jersey: Prentice Hall.
The World Bank, OED/ECD (2004). Monitoring and evaluation: Some tools, methods and approaches. Retrieved August 15, 2007 from http://lnweb18.worldbank.org/oed/oeddoclib.nsf/b57456d58aba40e585256ad400736404/a5efbb5d776b67d285256b1e0079c9a3/$FILE/MandE_tools_methods_approaches.pdf
The World Bank Group. The World Bank participation sourcebook. Involving Stakeholders. Retrieved August 14, 2007 from http://www.worldbank.org/wbi/sourcebook/sb0303t.htm
Worthen, B. R., Sanders, J. R., and Fitzpatrick, J. L. (1997). Program evaluation. New York: Longman.


Web Sites
CDC Evaluation Working Group: Logic Model Resources. http://www.cdc.gov/eval/resources.htm#logic%20model
Campbell Collaboration. http://www.campbellcollaboration.org/
Community Toolbox. A Framework for Program Evaluation: A Gateway to Tools. http://ctb.lsi.ukans.edu/tools/EN/sub_section_main_1338.htm
International Development Research Centre (2004). Evaluation Planning in Program Initiatives. Ottawa, Ontario, Canada. http://web.idrc.ca/uploads/userS/108549984812guideline-web.pdf
W.K. Kellogg Foundation Logic Model Development Guide. http://www.wkkf.org/Pubs/Tools/Evaluation/Pub3669.pdf
Klein Haarhuis, Carolien (2005). Promoting anti-corruption reforms: Evaluating the implementation of a World Bank anti-corruption program in seven African countries. http://igitur-archive.library.uu.nl/dissertations/20050714-200002/full.pdf
Management Sciences for Health (MSH) and the United Nations Children's Fund (UNICEF), "Quality guide: Stakeholder analysis" in Guide to managing for quality. http://bsstudents.uce.ac.uk/sdrive/Martin%20Beaver/Week%202/Quality%20Guide%20%20Stakeholder%20Analysis.htm
Porteous, Nancy L., Sheldrick, B. J., and Stewart, P. J. (1997). Program evaluation tool kit: A blueprint for public health management. Ottawa, Canada: Ottawa-Carleton Health Department. Available online at http://www.phac-aspc.gc.ca/php-psp/tookit.html (English) or http://www.phac-aspc.gc.ca/php-psp/toolkit_fr.html (French)
Suthers, D. and A. Weiner (1995). Groupware for developing critical discussion skills. http://www-cscl95.indiana.edu/cscl95/suthers.html


Taylor-Powell, Ellen (2005). "Logic models: A framework for program planning and evaluation". Madison, WI: University of Wisconsin-Extension, Program Development and Evaluation. http://www.uwex.edu/ces/pdande/evaluation/pdf/nutritionconf05.pdf
University of Ottawa. Program Evaluation Toolkit. Program Logic Model: http://www.uottawa.ca/academic/med/epid/excerpt.htm
University of Wisconsin-Extension (UWEX). Logic Model: http://www.uwex.edu/ces/pdande/evaluation/evallogicmodel.html
University of Wisconsin-Extension examples of logic models: http://www.uwex.edu/ces/pdande/evaluation/evallogicmodelexamples.html
The Evaluation Center, Western Michigan University. The Checklist Project: http://www.wmich.edu/evalctr/checklists/checklistmenu.htm#mgt
The World Bank Participation Sourcebook. Online (HTML format): http://www.worldbank.org/wbi/sourcebook/sbhome.htm

From the M & E News Web Site
Wider Discussions of Logic Models from the M and E News Website
2007
Program logic – an introduction; provided by Audience Dialogue (15/06/07). http://www.audiencedialogue.org/proglog.htm
Enhancing Program Performance with Logic Models. This course introduces a holistic approach to planning and evaluating education and outreach programs. Module 1 helps program practitioners use and apply logic models. Module 2 applies logic modeling to a national effort to evaluate community nutrition education. Provided by the University of Wisconsin (15/06/07). http://www.uwex.edu/ces/lmcourse/
2006
Online Logic Model training: an audiovisual presentation by Usable Knowledge, USA. Twenty minutes long, with a menu that can be used to navigate to the sections of interest (09/12/06). http://www.usablellc.net/Logic%20Model%20%28Online%29/Presentation_Files/index.html


2004
W.K. Kellogg Evaluation Logic Model Development Guide: Using Logic Models to Bring Together Planning, Evaluation, and Action. Updated (original was published in 1998). "The program logic model is defined as a picture of how your organization does its work – the theory and assumptions underlying the program. A program logic model links outcomes (both short- and long-term) with program activities/processes and the theoretical assumptions/principles of the program." http://www.wkkf.org/pubs/tools/evaluation/pub3669.pdf
2003
Network Perspective in the Evaluation of Development Interventions: More than a Metaphor. Rick Davies, for the EDAIS Conference, November 24-25, 2003: New Directions in Impact Assessment for Development: Methods and Practice (Posted 21/05/05). http://www.enterpriseimpact.org.uk/conference/Abstracts/Davies.shtml [Full text also at www.mande.co.uk/docs/nape.doc]
2001
The Temporal Logic Model: A Concept Paper, by Molly den Heyer. On the IDRC Website. July 2001. http://www.idrc.ca/uploads/userS/10553603900tlmconceptpaper.pdf
A Bibliography for Program Logic Models/Logframe Analysis. December 18, 2001. Compiled by Molly den Heyer, Evaluation Unit, International Development Research Centre. http://www.idrc.ca/uploads/userS/10553606170logframebib2.pdf
2000
Application of Logic Modeling Processes to Explore Theory of Change from Diverse Cultural Perspectives. PowerPoint presentation by Ricardo Millett, Sharon Dodson, and Cynthia Phillips. American Evaluation Association, November 4, 2000. http://www.mande.co.uk/docs/Phillips.ppt
1997
The Logic Model for Program Planning and Evaluation, Paul F. McCawley, 1997, University of Idaho Extension. http://www.uidaho.edu/extension/LogicModel.pdf


Explanations of the Logical Framework from the M and E News Website
2006
Wikipedia entry: Logical framework approach (09/12/06). http://en.wikipedia.org/wiki/Logical_framework_approach
2005
Logical Framework Approach, as explained by IAC Wageningen UR on their PPM&E Resource Portal. http://portals.wdi.wur.nl/ppme/index.php?Logical_Framework_Approach
The Rosetta Stone of Logical Frameworks. It shows how different agencies' terms relate to each other. Produced by Jim Rugh of CARE. http://www.mande.co.uk/docs/Rosettastone.doc
The newly updated AusGuidelines. See Section 3.3 The Logical Framework Approach [267KB] and Section 2.2 Using the Results Framework Approach [135KB] (posted 23/12/2005). http://www.ausaid.gov.au/ausguide/default.cfm
Logical Framework Analysis: A Planning Tool for Government Agencies, International Development Organizations, and Undergraduate Students. Andrew Middleton. http://www.undercurrentjournal.ca/2005II2%20%20middleton.pdf
2004
The Logical Framework Approach: A summary of the theory behind the LFA method. SIDA, January 2004, Kari Örtengren. The aim of this booklet is to provide practical guidance for Sida partners in project planning procedures. It contains a description of the theory of LFA, which summarizes approaches and principles, the different planning steps and how they can be implemented, as well as the roles of different stakeholders in a planning procedure. http://www.sida.se/shared/jsp/download.jsp?f=SIDA1489en_web.pdf&a=2379
Constructing a Logical Framework, produced by the Knowledge and Research Programme on Disability and Healthcare Technology. July 2004. http://www.kar-dht.org/logframe.html


Logical Framework (LogFRAME) Methodology, produced by JISC infoNet, Providing Expertise in Planning and Implementing Information Systems. Undated. http://www.jiscinfonet.ac.uk/InfoKits/projectmanagement/InfoKits/infokit-related-files/logicalframework-information
2003
Annotated Example of a Project Logframe Matrix, by IFAD (actually Irene Guijt and Jim Woodhill, consultants to IFAD). These two Web pages "provide an example of how to develop and improve the logframe matrix for an IFAD-supported project by giving a 'before revision' and 'after revision' comparison. The 'before' logframe matrix is shown with comments on the problems and how these could be overcome. The 'after' logframe matrix shows the partial reworking of the original logframe matrix. The example is based on several IFAD-supported projects and so represents a fictitious project." This Annex is part of "A Guide for M&E", whose main text also includes a section on "Linking Project Design, Annual Planning and M&E" [http://www.ifad.org/evaluation/guide/3/3.htm], which has sub-sections specifically on the Logical Framework. http://www.ifad.org/evaluation/guide/annexb/index.htm
2002
The Logical Framework: Making it Results-Oriented, produced by CIDA. http://www.acdicida.gc.ca/cida_ind.nsf/0/c36ebd571b6fa02985256c620066cd6f?OpenDocument
Tools for Development: A handbook for those engaged in development activity. Performance and Effectiveness Department for International Development, September 2002. See Section 5, Logical Frameworks: 5.1 Introduction; 5.2 What is a logframe and how does it help?; 5.3 Advantages; 5.4 Limitations; 5.5 How to develop a logframe; Box 1: Key points to completing the logframe; Box 2: The If/And/Then logic that underlies the logframe approach; 5.6 Types of Indicators; Box 3: The logframe matrix; Box 4: Indicators; 5.7 Living logframes; Box 5: Logframe program planning for primary education; Box 6: Learning logframe principles; Box 7: Checklist for Objectives column of the logframe; Box 8: Checklist for Risks and Assumptions; Box 9: Checklist for Indicators and Means of Verification; Box 10: The Logical Framework: Project Design; Box 11: The Logical Framework: Project Indicators, Monitoring, Evaluation and Reporting. (Posted 17/06/05) http://www.dfid.gov.uk/pubs/files/toolsfordevelopment.pdf


2001
Engendering the Logical Framework, produced by Helen Hambly Odame, Research Officer, ISNAR. August 2001. http://www.jiscinfonet.ac.uk/InfoKits/projectmanagement/InfoKits/infokit-related-files/logicalframework-information
BOND Guidance Notes Series I: Beginner's Guide to Logical Framework Analysis, 2001. These guidance notes are drawn from training on LFA conducted for BOND by Laurence Taylor, Neil Thin, and John Sartain. http://www.ngosupport.net/graphics/NGO/documents/english/273_BOND_Series_1.doc
1999
The Logical Framework Approach, Handbook for Objectives-Oriented Planning. Fourth edition, NORAD, 1999, ISBN 82-7548-160-0. http://www.norad.no/default.asp?V_ITEM_ID=1069
1997
Guidance on the DFID Logical Framework, as received by CARE in 1997 [includes matrix]. http://www.mande.co.uk/docs/DFID1997CARELogFrameGuide.doc
1996
The Third Generation Logical Framework Approach: Dynamic Management for Agricultural Research Projects, R. Sartorius. http://library.wur.nl/ejae/v2n4-6.html
1987
Coleman, G. (1987). Logical Framework Approach to the Monitoring and Evaluation of Agricultural and Rural Development Projects. Project Appraisal 2(4): 251-259.

Critiques of the Logical Framework from the M and E News Website
2006
The Use and Abuse of the Logical Framework Approach: A Review of International Development NGOs' Experiences. A report for Sida. November 2005. Oliver Bakewell and Anne Garbutt, of INTRAC. "In this review, we have attempted to take stock of the current views of international development NGOs on the LFA and the ways in which they use it. We start in the next section by considering the different meanings and connotations of the term logical framework approach as different actors use it. In Section 3 we look at how LFAs are used by INGOs in both planning and project management. The next section reviews some of the debates and critiques around the LFA arising both from practice and the literature. In response to these challenges, different organizations have adapted the LFA and these variations on the LFA theme are outlined in Section 5. We conclude the paper by summarizing the findings and reflecting on ways forward. This review has been commissioned by Sida as part of a larger project which aims to establish new guidelines for measuring results and impact and reporting procedures for Swedish development NGOs receiving support from Sida." (Posted 05/02/06) http://www.sida.se/shared/jsp/download.jsp?f=LFAreview.pdf&a=21025
2005
Methodological Critique and Indicator Systems. MISEREOR. http://www.misereor.org/index.php?id=4495
2001
Thinking about Logical Frameworks and Sustainable Livelihoods: A short critique and a possible way forward, by Kath Pasteur, with ideas and input from Robert Chambers, Jethro Pettit and Patta Scott-Villiers. August 22nd, 2001. http://www.livelihoods.org/post/Docs/logframe.rtf
Logical Frameworks: Problems and Potentials, by Des Gasper. http://winelands.sun.ac.za/2001/Papers/Gasper,%20Des.htm
Programme and Project Cycle Management (PPCM): Lessons from DFID and other organizations. Phillip Dearden. http://www.mande.co.uk/docs/dearden.pdf
2000
Logical frameworks, Aristotle and soft systems: a note on the origins, values and uses of logical frameworks, in reply to Gasper. Simon Bell, Open University, UK. Correspondence to Simon Bell, Southern Cottage, Green Lane, Wicklewood, Norfolk NR18 9ET, UK.
Evaluating the "logical framework approach" - towards learning-oriented development evaluation. Des Gasper, Public Administration and Development, 20(1), 2000, pp. 17-28. E-mail: [email protected]. Abstract: "The logical framework approach has spread enormously, including increasingly to stages of review and evaluation. Yet it has had little systematic evaluation itself. Survey of available materials indicates several recurrent failings, some less easily countered than others. In particular: focus on achievement of intended effects by intended routes makes logframes a very limiting tool in evaluation; an assumption of consensual project objectives often becomes problematic in public and inter-organizational projects; and automatic choice of an audit form of accountability as the priority in evaluations can be at the expense of evaluation as learning."

Alternative versions of the Logical Framework from the M and E Website
2006
Logical Framework Approach – with an appreciative approach. April 2006, SIDA Civil Society Centre. "As a part of its effort to realize the intentions of Sweden's Policy on Global Development, Sida Civil Society Center (SCSC) initiated a development project in 2005 together with PMU Interlife (the Swedish Pentecostal Mission's development cooperation agency) and consultant Greger Hjelm of Rörelse & Utveckling. The goal was to create a working model which combines the goal hierarchy and systematics from the Logical Framework Approach (LFA) with the approach used in the Appreciative Inquiry tool (AI). AI is both a working method and an approach. In analyzing strengths and resources, motivation and driving forces, the focus is placed on the things which are working well, and on finding positive action alternatives for resolving a situation. LFA, which is an established planning model in the field of international development, is found by many to be an overly problem-oriented model. Using this approach, one proceeds based on a situation in which something is lacking, formulates the current situation as a “problem tree”, and thus risks failing to perceive resources which are actually present, and a failure to base one's support efforts on those resources. Working in close cooperation, we have now formulated a new working method for planning using LFA, one which is built on appreciative inquiry and an appreciative approach. The model was tested by PMU Interlife's program officers and their cooperating partners in Niger, Nicaragua and Tanzania during the autumn of 2005. Their experiences have been encouraging, and it is our hope that more Swedish organizations and their cooperating partners will try our model and working method." (Posted 01/07/06) http://www.sida.se/shared/jsp/download.jsp?f=SIDA28355en_LFA_web.pdf&a=23355


2005
No more log frames!! People-Focused Program Logic. Two day workshop, Monday 19th and Tuesday 20th of September 2005, in Melbourne, Australia. "Purpose of the workshop:
• to understand what 'people-focused' program logic is and how to use it
• to build a people-focused program logic for their own project.
Who should attend? People with monitoring and evaluation interests who are working on projects with capacity building components.
Course description: In this workshop, participants will build their own 'people-focused' logic model. To do this they will analyze the key beneficiaries of their project, build their program logic model around this analysis, and consider assumptions made in the logic. The program logic will be built around a generic theory of how capacity building works, that can be modified to include elements of advocacy and working with or through partners. Participants will also learn how this logic can be used to form the spine of their monitoring, evaluation and improvement framework. As participants will be invited to develop their own program logic model, they are encouraged to bring along others from the same project team. Examples of frameworks, and a workbook will be provided to participants." For additional information: Jo Leddy of Clear Horizon. Phone: 03 9783 3662. E-mail: [email protected]. Website: www.clearhorizon.com.au. See the rest of the flyer at: http://www.mande.co.uk/docs/PFPflyer.pdf (Posted 21/06/05)
Intertwining participation, Rights Based Approach and Log Frame: A way forward in Monitoring and Evaluation for Rights Based Work. Partha Hefaz Shaikh (mailto:[email protected]). Initial Draft - Circulated for discussion. "Programme implementation through Rights Based Approach (RBA) in ActionAid Bangladesh started in 2000 and it took us quite a while to understand what it meant to implement programs in a RBA environment. Side by side we were also grappling with issues of monitoring and evaluation of programs implemented through a rights based approach. In order to develop a more meaningful framework that has all the elements of participation, RBA and log-frame we developed what we call “Planning and Implementation Framework Analysis (PIFA)”." (Posted 20/05/05) http://www.mande.co.uk/docs/PIFA_Article_PDF.pdf


A Modified Logframe for Use in Humanitarian Emergencies, by Bernard Broughton. http://www.mande.co.uk/docs/EmergencyLogframeBroughton.doc
Family Planning Logical Framework (with two parallel processes, one feeding back into the other). http://www.mande.co.uk/docs/FP%20Logframe.doc


Chapter 5
Considering the Evaluation Approach

Introduction
Development has moved from a project approach to a focus on programs and policies, with an emphasis on sustainable development. To address these broader and more complex subjects, a wide variety of approaches to designing and conducting evaluations have been used. In this chapter we will look at some of these approaches. This chapter has three parts:
• Introduction to Evaluation Approaches
• Development Evaluation Approaches
• Challenges Going Forward.

Part I: Introduction to Evaluation Approaches
An evaluation approach is a "general way of looking at or conceptualizing evaluation, which often incorporates a philosophy and a set of values" (Duigen, 2007, Evaluation Approaches). From the 1960s, development assistance focused on projects, and evaluation of these development projects focused on efficiency and effectiveness. While project-level evaluations remained the predominant approach, in the 1980s more attention was paid to structural adjustment policies, moving beyond evaluating only at the project level to looking at programs. Since the 1990s, the international community has been developing partnership approaches to development assistance. These cooperative approaches involve more stakeholders, greater stakeholder involvement, and more complex operations. In addition, development organizations increasingly enter into joint evaluations. Since 2000, evaluation has focused on the Millennium Development Goals (MDGs), discussed in Chapter 2: Understanding Issues Driving Development Evaluation. Increasingly, as seen in Chapter 4: Understanding the Evaluation Context and Program Theory of Change, there is demand to know what works, with an emphasis on impact evaluation. A variety of approaches have been developed to meet the changing nature of development evaluation. The choice of evaluation approach depends in part on the context. Approaches are not necessarily mutually exclusive, and evaluations may combine elements of two or more approaches. Still, as we move from the project level to broader and more complex interventions, evaluation challenges increase. Some approaches (e.g. evaluability assessment, rapid assessment, and evaluation synthesis) have been used and tested for many years and continue to be valuable.

No matter what approach is chosen, each requires the same planning steps: all approaches define evaluation questions, identify measures, collect and analyze data, and report and use findings.
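To make these common planning steps concrete, the sketch below represents an evaluation plan as a small data structure linking questions to measures and data sources. It is purely illustrative: the class and field names, and the example program, are hypothetical and are not drawn from any evaluation standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvaluationQuestion:
    # One question the evaluation must answer, with the measures and
    # data sources that will be used to answer it.
    text: str
    measures: List[str] = field(default_factory=list)
    data_sources: List[str] = field(default_factory=list)


@dataclass
class EvaluationPlan:
    approach: str  # e.g. "goal-based", "participatory" (hypothetical labels)
    questions: List[EvaluationQuestion] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A plan is workable only if every question has at least one
        # measure and at least one data source attached to it.
        return bool(self.questions) and all(
            q.measures and q.data_sources for q in self.questions
        )


# Hypothetical example: one question from a goal-based evaluation of a
# rural credit program.
plan = EvaluationPlan(
    approach="goal-based",
    questions=[
        EvaluationQuestion(
            text="Did access to credit increase for low-income farmers?",
            measures=["share of low-income households with an active loan"],
            data_sources=["household survey", "lender records"],
        )
    ],
)
print(plan.is_complete())  # True
```

A structure like this is only a planning aid; the substance of each approach lies in how the questions are chosen and answered.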


Part II: Development Evaluation Approaches
In this part of the chapter we will discuss many of the development evaluation approaches. The approaches discussed are:
• Prospective Evaluation
• Evaluability Assessment
• Goal-based Evaluation
• Goal-free Evaluation
• Multi-site Evaluation
• Cluster Evaluation
• Social Assessment
• Environmental and Social Assessment
• Participatory Evaluation
• Outcome Mapping
• Rapid Assessment
• Evaluation Synthesis and Meta-evaluation
• Emerging Approaches.

Prospective Evaluation
A prospective evaluation is done ex ante; that is, a proposed program is reviewed before it begins, in an attempt to:
• analyze its likely success
• predict its cost
• analyze alternative proposals and projections.

Prospective evaluations have been done by evaluation organizations within legislative branches. An example is the United States Government Accountability Office (GAO), formerly named the General Accounting Office, which reports to the U.S. Congress. GAO evaluators sometimes assist government decision-makers by providing analytical information on issues and options for potential programs (GAO, 1990, Chapter 1, p. 8). Often, the GAO is asked about the likely success of proposed new programs. In addition, the GAO reviews information on alternative proposals and analyzes the results of similar programs that may be ongoing or completed. Table 5.1 identifies four kinds of forward-looking questions the GAO is asked to address.

Table 5.1: Types of GAO Forward-Looking Questions
(What GAO is asked to do, by question type)

Question type: Anticipate the future
• Critique others' analyses: 1. How well has the administration projected future needs, costs, and consequences?
• Do analyses themselves: 3. What are future needs, costs, and consequences?

Question type: Improve future actions
• Critique others' analyses: 2. What is the likely success of an administration or congressional proposal?
• Do analyses themselves: 4. What course of action has the best potential for success?

(Source: GAO, 1990, p. 11)

Most prospective evaluations involve the following kinds of activities:
• a careful, skilled textual analysis of the proposed program or policy
• a review and synthesis of evaluation studies from similar programs or policies
• a prediction of likely success or failure, given a future context that is not too different from the past, and suggestions on strengthening the proposed program and policy if the decision-makers want to go forward (GAO, 1990, p. 11).

A prospective evaluation differs from an evaluability assessment: it is done before a program exists, whereas an evaluability assessment determines the evaluability of a proposed or recently designed program. See the following Web site for an example of a prospective evaluation, "Textbooks and Test Scores: Evidence from a Prospective Evaluation in Kenya": http://www.econ.yale.edu/~egcenter/infoconf/kremer_paper.pdf


Evaluability Assessment
An evaluability assessment is a brief preliminary study to determine whether an evaluation would be useful and feasible. This type of preliminary study can also help refine the purpose of the evaluation, identify what data resources are currently available and accessible, identify key stakeholders and clarify their information needs, and consider different methods for conducting the evaluation. The process can save time and help avoid costly mistakes. Joseph S. Wholey and his colleagues developed the evaluability assessment approach in the early 1970s to address their belief that many evaluations failed because of discrepancies between "rhetoric and reality" (Nay & Kay, 1982, p. 225). Wholey and his colleagues saw evaluability assessment as a means of facilitating communication between evaluators and stakeholders. They proposed evaluability assessment as a way of determining whether a program was "evaluable" and of focusing the evaluation (Fitzpatrick, Sanders, & Worthen, 2004, p. 182). Although evaluability assessment was originally developed as a precursor to summative evaluation, its role has expanded. It is now also used to clarify the purposes of a formative study or as a planning tool (Smith, 1989). The decision rests on whether an intervention is sufficiently coherent for an evaluation to be conducted, and evaluators do preliminary work to ascertain whether this is the case. For example, if an objectives-based, or goal-based, evaluation is proposed, it may be problematic if program objectives are not sufficiently clear or there is no shared agreement on them among stakeholders. Sometimes measures are not available and need to be developed, or data may be inaccessible. Evaluability assessment thus focuses on the feasibility of conducting an evaluation. Sometimes it is not feasible to design an evaluation from the available information on the logic of the intervention, the theory of change, and the intended outcomes. If this occurs, it is a warning sign that key gaps exist – in the description of the goals, in clarity about the target population, in what outcomes should be expected, and so on. The evaluability assessment then serves a useful purpose in helping a proposed intervention sharpen its goals and objectives, outputs, target population, and outcomes so that it is clear what is to be achieved. Evaluability assessments are often conducted by a group that includes stakeholders, such as implementers and administrators, as well as evaluators.


Evaluability assessment includes the following activities (a minimal checklist sketch follows this list):
• reviewing materials that define and describe the intervention
• identifying modifications to the implemented intervention from what was originally planned
• obtaining manager and staff perceptions of the intervention's goals and objectives
• interviewing stakeholders on their perceptions of goals and objectives
• developing a theory of change model
• identifying sources of data
• identifying the people and organizations that can implement any possible recommendations from the evaluation.
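The steps above are matters of judgment rather than calculation, but the underlying logic of an evaluability assessment, checking a set of preconditions and flagging gaps, can be sketched in a few lines. The precondition names and the pass/fail flags below are illustrative assumptions, not a standard instrument.

```python
# Minimal sketch of an evaluability checklist with hypothetical
# precondition names; a real assessment rests on document review and
# interviews, not boolean flags.
preconditions = {
    "objectives are clear and shared by stakeholders": True,
    "a theory of change can be articulated": True,
    "intended outcomes and target population are defined": False,
    "relevant data exist or can feasibly be collected": False,
    "someone can act on the evaluation's recommendations": True,
}

gaps = [name for name, met in preconditions.items() if not met]

if gaps:
    print("Evaluation not yet feasible; gaps to address first:")
    for gap in gaps:
        print(f"  - {gap}")
else:
    print("The intervention appears evaluable; proceed to evaluation design.")
```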

One of the potential benefits of evaluability assessment is that it can lead to a more realistic and appropriate evaluation. Smith (1989) and Wholey (1987) also point out that it can improve:
• the ability to distinguish between program failure and evaluation failure
• accurate estimation of longer-term outcomes
• stakeholder investment in the program
• program performance
• the program development and evaluation skills of staff
• the visibility of and accountability for the program
• administrative understanding of the program
• policy choices
• continued support.

The challenge of an evaluability assessment is that it can be time consuming and costly, especially if the group conducting the assessment does not work well together. See the following Web site for an example of an evaluability assessment, "An Evaluability Assessment of Responsible Fatherhood Programs": http://fatherhood.hhs.gov/evaluaby/intro.htm


Goal-based Evaluations
According to Patton (1990), a goal-based (or objectives-based) evaluation measures the extent to which a program or intervention has attained clear and specific objectives. The focus of the evaluation is on the stated outcomes (the goals or objectives) of the project, program, or policy. This is the typical evaluation with which most people are familiar, as it is the basis of most development organizations' project evaluation systems. One criticism of the goal-based approach is that these evaluations concentrate on economic and technical aspects instead of social and human aspects (Hirshheim & Smithson, 1988). A second criticism is that they focus only on stated goals. There may be other important goals that are implicit rather than explicit, or that were discussed during board or division meetings but never reflected in the stated project goals. This can be a serious oversight. For example, a new water treatment plant may have to relocate people who were living on the land that will now be used for the plant. It is a serious omission if the project does not articulate a goal or objective for the relocation that leaves people with improved, sustainable livelihoods. The evaluation compounds the problem if it does not ask questions about the relocation because it was not a formal, explicit project objective. Results-based evaluation, the method we advocate in this textbook, looks for results whether or not they were articulated as goals or objectives. Ideally, results have been established as goals or objectives with measures specified, but they may be implicit, intended or unintended, positive or negative. Goal-based evaluations can be strengthened by being open to unexpected positive or negative results. See the following Web site for an example of a goal-based evaluation from the International Fund for Agricultural Development (IFAD), the Country Program Evaluation of the People's Republic of Bangladesh: http://www.ifad.org/evaluation/public_html/eksyst/doc/country/pi/bangladesh/bangladesh.htm


Goal-free Evaluations
Goal-free evaluation arose as a reaction to goal-based (or objectives-based) evaluation. Scriven (1972b) first proposed goal-free evaluation and has been a well-known advocate of the approach. In goal-free evaluation, the evaluator makes a deliberate attempt to avoid all rhetoric related to program goals. The evaluator does not discuss goals with the staff or read program brochures or proposals. The evaluator studies only the program's observable outcomes and documentable effects in relation to participant needs (Patton, 2002, p. 169). Scriven (1972b, p. 2) states:

It seemed to me, in short, that consideration and evaluation of goals was an unnecessary but also a possibly contaminating step…. The less the external evaluator hears about the goals of the project, the less tunnel-vision will develop, the more attention will be paid to looking for actual effects (rather than checking on alleged effects).

Whereas goal-based evaluations must have clearly stated goals and objectives, goal-free evaluations open up the option of gathering data on the effects and effectiveness of programs without being constrained by a narrow focus on stated goals or objectives. Goal-free evaluation requires directly capturing the actual experiences of program participants in their own terms. It also requires the evaluator to suspend judgment about what the program is trying to do and to focus instead on finding out what is actually occurring. For these reasons, it is especially compatible with qualitative inquiry, although it can employ both quantitative and qualitative methods (Patton, 2002, p. 170). Scriven (1997) proposed that goal-free evaluation might use separate evaluators, one doing a goal-free evaluation and one doing a goal-based evaluation. He suggested this might maximize the strengths and minimize the weaknesses of each approach.


Wholey, Hatry, and Newcomer (1994) describe the following characteristics of goal-free evaluation:
• The evaluator purposefully avoids becoming aware of the program goals.
• Predetermined goals are not permitted to narrow the focus of the evaluation study.
• Goal-free evaluation focuses on actual outcomes rather than intended program outcomes.
• The goal-free evaluator has minimal contact with the program manager and staff.
• Goal-free evaluation increases the likelihood that unanticipated side effects will be noted.

For example, an evaluator may be given the following goals for a program: 1) bring school dropouts into a vocational training program, 2) train them in productive vocations, and 3) place them in stable jobs. The goal-free approach grew out of concern that an objectives-based evaluation is likely to miss important unanticipated outcomes – either positive or negative – because it looks only at the stated objectives. If, for example, the stated objective of a water treatment plant is to provide clean water to a certain number of city dwellers, that is what gets measured. If an unanticipated consequence is also a reduction in water-borne diseases, the objectives-based evaluation is likely to miss that important benefit. The evaluator may choose a design that measures only stated objectives. If there are additional, unanticipated effects of the program, such as a rising crime rate among those not participating in the program, these will not be measured in a goal-based evaluation. A goal-free evaluator is more likely than an objectives-oriented evaluator to identify such effects and to investigate them in more depth. Examples of goal-free evaluations from the Evaluation Center of Western Michigan University can be found at the following Web site: http://www.wmich.edu/evalctr/project-pub.html


Multi-site Evaluations
In a larger-scale intervention, it is often necessary to look selectively at the intervention as it has been implemented in a variety of locations. These are called multi-site evaluations. The intervention may have been implemented in the same way in all locations or somewhat differently in some locations. A multi-site evaluation provides information about the overall experience of the intervention as well as a deeper understanding of the variations that have occurred. It may answer questions such as:
• What features of the intervention implementation are common to all locations?
• Which features vary, and why?
• Are there differences in outcomes based on those variations?

This deeper information is key, and case studies are often used for multi-site evaluations. Sites are generally selected for study because they represent certain characteristics (e.g. size, ethnicity, socio-economic status) that may result in systematic differences in intervention implementation and results. Of course, it may be hard to determine whether it was the variations in the interventions that caused the differences in results. Sometimes interventions show impacts because of unique features of a setting, such as strong leadership or a community with active citizens. Other times, changes may be explained by systematic differences in implementation, such as regional differences. These may have implications for replication. The evaluation must capture the climate in which the interventions operate, as well as any cultural, geographic, economic, size, or other systematic differences that might affect variation in experiences and outcomes. Stakeholder participation is important, since stakeholders can help the evaluator better understand the local situation. A multi-site evaluation is typically a stronger design than an evaluation of a single intervention in a single location. It can more credibly summarize across a larger population because it includes a larger sample and a more diverse set of intervention situations, and it can address "within-site" as well as "between-site" analyses. Overall findings, as well as consistent findings across interventions, provide stronger evidence of intervention effectiveness.

The comparisons of the interventions within their contexts are likely to provide a range of lessons learned and strategies for dealing with a variety of situations. Good practices may also emerge from a multi-site evaluation. It is important to keep in mind, however, that sites selected using judgment are not statistically representative of the population, so not all good practices will necessarily be identified.

Challenges for Multi-Site Evaluations
Conducting multi-site evaluations poses unique challenges. First, data collection must be as standardized as possible: the same data, collected in much the same way, are necessary for comparisons to be meaningful. This requires well-trained staff, access to all sites, and sufficient information ahead of time to design the data collection instruments. It also assumes that the same data are generally available at every site. Yet each location is different. Some indicators may be comparable (such as the amount of resources invested, infant mortality rates, incidence of infectious diseases, fertility rates, or utilization of health care resources), but each site may have a slightly different focus. When looking across countries (or regions of a country), the political, social, economic, and historical contexts are notable in shaping project implementation and therefore its evaluation. This can be seen in "Investing in Health: Development Effectiveness in the Health, Nutrition, and Population Sector" (Johnston & Stout, 1999).
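To make the point about standardization concrete, the fragment below sketches a within-site and between-site comparison on one shared indicator. The site names and figures are invented for illustration; only indicators defined and collected identically at every site can be pooled this way.

```python
from statistics import mean

# Hypothetical infant mortality rates (deaths per 1,000 live births),
# collected with the same definition at each project site over three years.
site_rates = {
    "Site A": [41, 38, 35],
    "Site B": [55, 52, 50],
    "Site C": [30, 29, 27],
}

# Within-site view: the trend at each location.
for site, rates in site_rates.items():
    change = rates[-1] - rates[0]
    print(f"{site}: {rates[0]} -> {rates[-1]} (change {change:+d})")

# Between-site view: each site's average against the pooled average.
pooled = mean(r for rates in site_rates.values() for r in rates)
for site, rates in site_rates.items():
    print(f"{site}: average {mean(rates):.1f} vs pooled average {pooled:.1f}")
```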

The following Web sites give examples of multi-site evaluations. "Multi-site evaluation of four anti-HIV-1/HIV-2 enzyme immunoassays", by the Australian HIV Test Evaluation Group, can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=7882108&dopt=Abstract. A multi-site evaluation from SRI International, "A Multi-site Evaluation of the Parents as Teachers (PAT) Project", can be found at: http://www.sri.com/policy/cehs/early/pat.html


Cluster Evaluations
Cluster evaluations are similar to multi-site evaluations, but the intention is different. Like multi-site evaluations, cluster evaluations focus on interventions that share a common mission, strategy, and target population. However, a cluster evaluation is not intended to determine whether an intervention works or to ensure accountability. It does not evaluate the success or failure of individual interventions, nor does it identify interventions to be terminated. Its intent is to learn about what happened across the cluster and to ascertain lessons learned. Information is reported only in aggregate, so that no one project is identified. As with multi-site evaluations, stakeholder participation is a key element. Cluster evaluations differ from multi-site evaluations in that they are not concerned with generalizability or replicability. Variation is viewed as positive, because individual projects are adjusting to their contexts, and the evaluation is more focused on learning than on drawing overall conclusions about program quality or value. While there is no specific methodology, cluster evaluations are more likely to use qualitative approaches to supplement the quantitative data collected. It is possible to think of cluster evaluations as multiple case studies, with the sharing of information across cases through networking conferences a significant characteristic of the approach. Like any evaluation, a cluster evaluation must identify the evaluation questions, determine appropriate measures, develop data collection strategies, analyze and interpret the data, and report the findings back to the stakeholders. A disadvantage of cluster evaluations is that they do not show results for individual sites or take into account planned or unplanned variation; the data show only aggregate information. An example of a cluster evaluation, "Governance in PNG: A cluster evaluation of three public sector reform activities", can be found at the following Web site: http://www.ausaid.gov.au/publications/pdf/governance_in_png_qc35.pdf


Social Assessment
Social assessment has become an important part of many evaluations. A social assessment looks at various social structures, processes, and changes within a group or community. It can also look at trends that may affect the group. A social assessment is the main instrument used to ensure that the social impacts of development projects, programs, and policies are taken into account. It is used to understand key social issues and risks and to determine the social impacts of an intervention on different stakeholders. In particular, social assessments are intended to determine whether a project is likely to cause adverse social impacts, such as requiring the relocation of those living on land needed for a power plant or other facility. Strategies can be put in place to mitigate adverse impacts if they are known and acknowledged early, and these mitigation strategies can then be monitored and assessed as part of the evaluation. The World Bank Participation Sourcebook (1996) discusses social assessments and identifies the following purposes of social assessment:

• Identify key stakeholders and establish an appropriate framework for their participation in project selection, design, and implementation.
• Ensure that project objectives and incentives for change are acceptable to the range of people intended to benefit and that gender and other social differences are reflected in project design.
• Assess the social impact of investment projects and, where adverse impacts are identified, determine how they can be overcome or at least substantially mitigated.
• Develop ability at the appropriate level to enable participation, resolve conflict, permit service delivery, and carry out mitigation measures as required.

The World Bank Participation Sourcebook also identifies the following common questions asked during social assessment:
• Who are the stakeholders? Are the objectives of the project consistent with their needs, interests, and capacities?
• What social and cultural factors affect the ability of stakeholders to participate or benefit from the operations proposed?
• What is the impact of the project or program on the various stakeholders, particularly on women and vulnerable groups? What are the social risks (lack of commitment or capacity and incompatibility with existing conditions) that might affect the success of the project or program?
• What institutional arrangements are needed for participation and project delivery? Are there adequate plans for building the capacity required for each?

Social assessment tools and approaches include:
• stakeholder analysis
• gender analysis
• participatory rural appraisal
• observation, interviews, and focus groups
• mapping, analysis of tasks, and wealth ranking
• workshops: objective-oriented project planning, teamup.

The following are a few examples of key indicators for social impact monitoring (a simple tabulation sketch follows the list):
• participation rate by social group in voluntary testing and counseling activities, and reports of desirable behavior change
• percent of community members participating in care for HIV/AIDS victims and their families
• reduction in AIDS-related violence (by or towards AIDS victims).
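As an illustration of how such indicators might be tabulated once the underlying data have been collected, the snippet below computes a participation rate by social group from hypothetical survey records. The field names and figures are invented and do not come from any actual assessment.

```python
from collections import defaultdict

# Hypothetical survey records: (social group, participated in voluntary
# testing and counseling).
records = [
    ("women", True), ("women", False), ("women", True),
    ("men", True), ("men", False),
    ("youth", True), ("youth", True), ("youth", False),
]

totals = defaultdict(int)
participants = defaultdict(int)
for group, participated in records:
    totals[group] += 1
    if participated:
        participants[group] += 1

for group in sorted(totals):
    rate = 100 * participants[group] / totals[group]
    print(f"Participation rate, {group}: {rate:.0f}%")
```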


The following Web site shows an example of a social assessment by the World Bank, "Morocco: Fez Medina Rehabilitation Project": http://www.worldbank.org/wbi/sourcebook/sba108.htm#D

Case Study 7-5: Azerbaijan Agricultural Development and Credit Project
The Farm Privatization Project, an intervention to provide more flexible and adaptable loans, was implemented to restore Azerbaijan's farming areas to former levels of productivity. The project focuses on real estate registration, the development of land markets, and the provision of credit and information to a larger group of rural women and men, especially those of low income. The purpose of the social assessment was to ensure that the proposed intervention was based on stakeholder ownership (commitment) and that the anticipated benefits were socially acceptable. The information helped design a participatory monitoring and evaluation process. The first phase of the social assessment covered several areas in which the Farm Privatization Project was being implemented. The approaches used included:
• a review of secondary data, including earlier assessments
• surveys of households and women in three of the six regions, following a qualitative rapid assessment
• semi-structured interviews with individuals (farmers, farm managers, unemployed workers, community leaders, women's groups, local associations, technicians, government officials)
• on-site observation by staff (a member of the team lived with a farming family to conduct an in-situ observation of the impact of farm privatization)
• five focus groups with homogeneous groups of stakeholders
• consultations with policy makers and administrators, and local and international NGOs
• discussions with ex-managers of state farms and community leaders
• a stakeholder seminar.


Case Study 7-5: Azerbaijan Agricultural Development and Credit Project (continued)
The assessment was organized around four pillars:
• Social Development: Key concerns focused on poverty, gender, and social exclusion.
• Institutions: The power base of the rural areas was changing, making it difficult to identify the key stakeholders. There was limited research about the social organizations and a lack of analysis of the impacts of rural migration.
• Participation: Confusion and ambiguities in the land reform process were reported. Land distribution had reduced poverty, curtailed the influence of ex-farm managers, and helped empower the rural population. Access to credit increased, but interest rates were high (15-18%).
• Monitoring/Evaluation: Performance indicators are used to monitor implementation. The indicators link the project's inputs and activities with quantified measures of expected outputs and impacts. The assessment also looked at impact: increased productivity, increased income, reduced poverty, and participant satisfaction.
(Source: Kudat & Ozbilgin, 1999, pp. 119-172)
More information about this project can be found at the following Web site: http://web.worldbank.org/external/projects/main?pagePK=64283627&piPK=73230&theSitePK=40941&menuPK=228424&Projectid=P090887


Environmental and Social Assessment
Another form of assessment is the environmental and social (E&S) assessment. Increasingly, social assessment and environmental assessment are viewed as inseparable. Development organizations recognize the need for programs and projects to address E&S issues, as well as to evaluate the attainment of E&S-related objectives. Most development organizations adhere to core E&S standards (e.g. the Equator Principles) and evaluate their implementation in programs and projects. Development organizations have also recognized the role that local people must play in the design and implementation of interventions for the environment and natural resources: local people and other stakeholders are partners in conservation and natural resource management. Environmental evaluation may be the sole purpose of the exercise, or it may be embedded in the program evaluation. The intervention itself may be environmental, or an E&S assessment may be targeted at a broader intervention. For example, an environmental project might be the implementation of waste management or an investment in electrostatic precipitation. A pulp and paper mill, steel mill, or oil pipeline project in an environmentally sensitive area is an example of a project with strong environmental components and potential impacts. Other interventions, such as the building of a new school or the funding of a credit line, still need an E&S review.

E&S Guidelines/Standards/Strategies
Three major generic publications can assist evaluators in assessing the environmental and social aspects of an intervention if the organization does not have its own standards:
• The Equator Principles
• ISO 14031
• Sustainable Development Strategies: A Resource Book.


The Equator Principles
The Equator Principles are an approach that assists financial institutions in determining, assessing, and managing environmental and social risk in project financing. The principles are intended to serve as a common baseline and framework for the implementation of individual, internal environmental and social procedures and standards for development projects. The Equator Principles apply only to projects with a total capital cost of US$50 million or more. For more information about the Equator Principles, see the following Web sites: http://www.equator-principles.com/ and http://www.ifc.org/ifcext/equatorprinciples.nsf/Content/ThePrinciples

ISO 14031
The International Organization for Standardization, more often known as ISO, has developed and currently maintains international standards for environmental management. These standards, the ISO 14031 Environmental Management Guidelines, were first published in 1999. The subject of this international standard is environmental performance evaluation (EPE). The standard describes an internal management process and tool designed to provide management with reliable and verifiable information on an ongoing basis. It helps determine whether an organization's environmental performance is meeting the criteria set by the organization's management. EPE and environmental audits help the management of an organization assess the status of its environmental performance and identify areas for improvement (ISO, 1999, p. v). EPE assists by establishing processes for (a minimal sketch of the criteria-checking step follows this list):
• selecting indicators
• collecting and analyzing data
• assessing information against environmental performance criteria (objectives)
• reporting and communicating
• periodically reviewing and improving this process.
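ISO 14031 describes a management process, not software, but the core assessment step, comparing indicator values against criteria set by management, can be sketched as follows. The indicators, targets, and readings are hypothetical and are not taken from the standard.

```python
# Sketch of comparing environmental performance indicator readings against
# management-set criteria (all names and values are hypothetical).
criteria = {
    "energy use per unit of output (kWh)": 12.0,   # target maximum
    "untreated water discharged (m3/day)": 0.0,    # target maximum
    "solid waste recycled (%)": 60.0,              # target minimum
}
readings = {
    "energy use per unit of output (kWh)": 10.5,
    "untreated water discharged (m3/day)": 3.2,
    "solid waste recycled (%)": 48.0,
}

# For the recycling indicator, higher values are better; for the others,
# lower values are better.
higher_is_better = {"solid waste recycled (%)"}

for indicator, target in criteria.items():
    value = readings[indicator]
    meets = value >= target if indicator in higher_is_better else value <= target
    status = "meets criterion" if meets else "needs improvement"
    print(f"{indicator}: {value} (target {target}) -> {status}")
```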

More information about ISO 14031 can be found at the following Web sites:
http://www.iso.org/iso/catalogue_detail?csnumber=23149
http://www.iso-14001.org.uk/iso-14031.htm
http://www.altech-group.com/ftp/EPEarticle.pdf

Sustainable Development Strategies: A Resource Book
The OECD and UNDP have published a resource book to provide flexible, non-prescriptive guidance on how to develop, assess, and implement national strategies for sustainable development in line with the principles outlined in the guidelines on strategies for sustainable development. It contains ideas and case studies on the main tasks in the strategy process. It is targeted at countries, organizations, and individuals concerned with sustainable development at the national or local level, as well as at international organizations concerned with supporting such development. The PDF files for this publication are available at the following Web site: http://www.nssd.net/res_book.html#contents
The following Web site provides an example of an E&S assessment by Lanco Amarkantak Thermal Power, prepared for the IFC Board of Directors' consideration of the proposed transaction, named "Environmental & Social Review": http://www.ifc.org/IFCExt/spiwebsite1.nsf/DocsByUNIDForPrint/30D71C7753448974852572A000676512?opendocument


Participatory Evaluation
Participatory evaluation is a different way to approach an evaluation: it takes the notion of stakeholder involvement to a new level. The responsibilities for evaluation planning, implementation, and reporting are shared. Stakeholders are involved not only in defining the evaluation questions and reviewing the report, but frequently also in data collection, analysis, and drafting of the report. Hubert E. Paulmer (2005, p. 19) describes participatory evaluations as:

… a collective assessment of a program by stakeholders and beneficiaries. They are also action-oriented and build stakeholder capacity and facilitate collaboration and shared decision making for increased utilization of evaluation results.

There can be different levels of participation by beneficiaries in an evaluation. There are two primary objectives of participation and participatory approaches:
• participation as a product, where the act of participation is an objective and is one of the indicators of success
• participation as a process by which to achieve a stated objective.

According to Patton (1997), the basic principles of participatory evaluation are:
• the evaluation process involves participants in learning skills in goal setting, establishing priorities, selecting questions, analyzing data, and making decisions based on the data
• participants own (commit to) the evaluation, as they make decisions and draw their own conclusions
• participants ensure that the evaluation focuses on methods and results that they consider important
• people work together, so group unity is facilitated and promoted
• all aspects of the evaluation are understandable and meaningful to participants
• self-accountability is highly valued
• facilitators act as resources for learning, and participants act as decision makers and evaluators.

The participatory evaluation approach is receiving increased attention in the development context. It is being used more often for development projects, especially community-based initiatives. Participatory evaluation is another step in the move away from the model of independent evaluation, or the evaluator as "expert". In a participatory evaluation, stakeholders might be asked to keep diaries or journals of their own experiences with the intervention. They might help interview others in the community. They might also be involved in analyzing the data, interpreting findings, and helping to develop recommendations. Consequently, conducting a participatory evaluation is more time consuming and involves more transaction costs, because participants meet more often. Planning decisions, such as identifying the questions, measures, and data collection strategies, are made together; it is a joint process rather than a traditional top-down process. The participatory approach usually increases the credibility of the evaluation results in the eyes of program staff, and with it the likelihood that the results will be used. Advocates of participatory evaluation also see it as a tool for empowering participants and increasing capacity at the local level for engagement in the development process. Participatory evaluation does pose considerable challenges. It can be time-consuming, with many meetings and the need to make sure everyone understands what is expected. It also takes considerable skill to help the group clarify roles, responsibilities, and processes. Groups tend to go through a process in which differences are reconciled and group norms develop before the group focuses on achieving the tasks at hand. This group dynamic is sometimes referred to as "forming, storming, norming, and performing." After forming, it is natural for a group to hit a period of conflict (storming). If the group works through these conflicts, it will establish more specific agreements about how members will work together (norming). Once these agreements are established, the group moves on to performing the tasks at hand (performing). There may also be challenges in creating an egalitarian team in a culture where different members have different status in their community. The evaluator wanting to conduct a participatory evaluation must have facilitation, collaboration, and conflict management skills (or have someone with those skills take the lead). Additionally, the evaluator must be able to provide just-in-time training on the basic skills and techniques of evaluation and on the group processes inherent in participation.


Table 5.2 compares one view of participatory evaluation to traditional evaluation techniques.

Table 5.2: Participatory versus Traditional Evaluation Techniques

Participatory                           Traditional
• Participant focus and ownership       • Donor focus and ownership
• Focus on learning                     • Focus on accountability and judgment
• Flexible design                       • Predetermined design
• More informal methods                 • Formal methods
• Outsiders are facilitators            • Outside evaluators

Challenges of Participatory Evaluation in Developing Countries
Those trained in and conducting traditional evaluations may be concerned that a participatory evaluation will not be objective. There is a risk that those closest to the intervention may not be able to see what is actually happening if it is not what they expect to see. The evaluation may indeed become "captured" and lose objectivity. Participants may be fearful of raising negative views, because they fear that others in the group may ostracize them, that the intervention will be terminated and the community will lose money, or that they will never get the development organization to work with them again. While approaching participatory evaluation from a learning perspective may help reduce these fears, they still need to be addressed. Evaluators should seriously consider the degree to which credibility may be compromised (in the view of outsiders) by choosing a participatory rather than an independent evaluation approach.


Benefits of Participatory Evaluation in Developing Countries
Gariba (1998, Chapter 4) describes how, in the development context, the word evaluation often causes mixed reactions among donors and implementers. Donors may worry about how the evaluation will affect the project, that is, whether it will cause the project to be extended or terminated. For project implementers, an evaluation may feel like an exercise in vindicating or vilifying their approach to project management. In either case, evaluation can cause discomfort, and the evaluator is caught in the middle of these feelings. Gariba describes how participatory evaluation can be a successful and systematic way of learning from experience. With participatory evaluation, the partners in the development intervention can draw lessons from their interaction and take corrective actions to improve the effectiveness or efficiency of their ongoing and future activities.

Gariba (1998) describes three critical elements of participatory evaluation:

•	Evaluation as a Learning Tool. This principle forms the main paradigm of choice. The purpose is not to investigate but to create an opportunity for all the stakeholders, donors included, to learn from their particular roles in the development intervention.

•	Evaluation as Part of the Development Process. The evaluation activity is not discrete and separable from the development process itself. The results and corresponding tools become, in effect, tools for change rather than historical reports.

•	Evaluation as a Partnership and Sharing of Responsibility. This is in sharp contrast to the tendency for evaluators to establish a syndrome of "we" the professionals and "they" the project actors and beneficiaries. In the participatory impact assessment methodology, all actors have more or less equal weight.

As Gariba describes it, in this context the evaluator is transformed from investigator into promoter and participant.


Importance of Participatory Evaluation
According to the Canadian International Development Agency (CIDA) Evaluation Guide 2004 (pp. 22-23), if stakeholders participate in the development of results, they are more likely to contribute to the implementation of the intervention. CIDA believes that participatory evaluation also:

•	builds accountability within communities

•	gives a more realistic orientation to evaluation

•	increases cooperation

•	empowers local participants by getting them involved in the evaluation process.

In participatory evaluation, key stakeholders become integrally involved in:

•	setting up frameworks for measuring and reporting on results

•	reflecting on results achieved

•	proposing solutions

•	responding to challenges

•	promoting the implementation of evaluation recommendations.


Case 5-1: Morocco: Engaging Women, Building Trust: Guess Who Knows?

•	the group gathered in a circle and joined hands

•	the facilitator asked them to entangle themselves without letting go of hands

•	two outsiders were asked to give instructions to untangle the group
	−	time it took: six minutes

•	the group was then asked to repeat the exercise and entangle themselves

•	the facilitator was asked to give instructions, and simply said "untangle yourselves"
	−	time it took: ten seconds

Conclusions:

•	Local people know better how to get out of their own mess, because they live in it.

•	The role of "outsiders" is to be facilitators and catalysts rather than leaders.

The following Web site has an example of a participatory evaluation, "Preventing Chronic Disease: A Participatory Evaluation Approach," by Contra Costa Health Services: http://www.cchealth.org/groups/chronic_disease/guide/evaluation.php

"Picturing Impact: Participatory Evaluation of Community IPM in Three West Java Villages" is at the following Web site: http://www.communityipm.org/docs/Picturing%20Impact/Picturing%20Impact%20top%20page.html


Outcome Mapping
The International Development Research Centre (IDRC) has developed an innovative approach to evaluation. Its outcome mapping approach does not attempt to replace more traditional forms of evaluation but to supplement them by focusing on related behavioral change. Much of the information in this section is adapted from Earl, Carden, and Smutylo (2001, pp. 1-5), Outcome Mapping.

In brief, outcome mapping focuses on one specific type of result: outcomes as behavioral change. Recall that outcomes are defined as changes in the behavior, relationships, activities, or actions of the people, groups, and organizations with whom a program works directly. Outcomes can be logically linked to a project, program, or policy's activities. When using outcome mapping, the focus is on outcomes rather than on the achievement of development impacts, because impacts are too "downstream" and are the result of many efforts and interventions. IDRC argues that trying to assess accurately any one organization's contribution to impact is futile. Instead, outcome mapping looks at behaviors, resulting from multiple efforts, to help improve the performance of projects, programs, and policies by providing new tools, techniques, and resources that contribute to the development process. While recognizing the importance of impact as the ultimate goal, outcome mapping seeks to provide information that programs can use to improve their performance.

Under outcome mapping, boundary partners are identified. These are the individuals, groups, and organizations that interact with projects, programs, and policies, and those who may have the most opportunities for influence. Outcome mapping assumes that the boundary partners control change. It also assumes that the program, as an external agent, only provides access to new resources, ideas, or opportunities for a certain period of time. By focusing on these behavior changes, outcome mapping argues, the most successful programs are those that transfer power and responsibility to people acting within the project or program.

The focus of outcome mapping is people. It represents a shift away from assessing the development impact of a project or program and toward describing changes in the way people behave, through actions and relationships, alone or within groups and organizations. Outcome mapping provides a way to model what a program intends to do; however, it differs from most traditional logic models in recognizing that different boundary partners operate within different logic and responsibility systems.


Outcome mapping offers a method for monitoring changes in the boundary partners and in the program as an organization. It also encourages the program to regularly assess how it can improve its performance. Outcome mapping can also be used as an end-of-program assessment tool when the purpose of the evaluation is to study the program as a whole.

Many programs, especially those focusing on capacity building, can better plan for and assess their contributions to development by focusing on behavior. For example, a program may have the objective of providing communities with access to cleaner water by installing purification filters. With the traditional method of evaluation, the results might be measured by counting the number of filters installed and measuring the changes in the level of contaminants in the water before and after the filters were installed. An outcome mapping approach would focus on behavior. It would start with the premise that water does not remain clean without people being able to maintain its quality over time. The outcomes of the program would then be evaluated by focusing on the behavior of those responsible for water purity: specifically, changes in their acquisition and use of the appropriate tools, skills, and knowledge. Outcome mapping would evaluate how people monitor contaminant levels, change filters, or bring in experts when required.

The following is a song about outcome mapping.


The Output Outcome Downstream Impact Blues
Outputs, Outcomes, Impacts – For Whom, by Whom, Says Who?
Written by Terry Smutylo, former Director of Evaluation (International Development Research Centre, Ottawa)

Coda
Don't look for impact with attribution (4x)

Well there's a nasty little word getting too much use
In development programs it's prone to abuse
It's becoming an obsession now we're all in the act
Because survival depends on that elusive impact.

REFRAIN I
Because it's impact any place, impact any time
Well you may find it 'round the corner or much farther down the line
But if it happens in a way that you did not choose
You get those Output Outcome Downstream Impact Blues.

Now when donors look for impact what they really wanna see
Is a pretty little picture of their fantasy
Now this is something that a good evaluator would never do
Use a word like impact without thinking it through.

But now donors often say this is a fact
Get out there and show us your impact
You've got to change peoples' lives and help us take the credit
Or next time you want funding – huh hmm
You might not just get it.

REFRAIN I
Because it's impact any place, impact any time
Well you can find it 'round the corner or much farther down the line
But if it happens in a way that you did not choose
You get those Output Outcome Downstream Impact Blues.

Well recipients are always very eager to please
When we send our evaluators overseas
To search for indicators of measurable impact
Surprising the donors what, what they bring back.

Well impact they find when it does occur
Comes from many factors and we're not sure
Just what we can attribute to who
Cause impact is the product of what many people do.

REFRAIN II
Because it's impact any place, impact any time
Well you can find it 'round the corner or much farther down the line
But if you look for attribution you're never going to lose
Those Output Outcome Downstream Impact Blues.

So donors wake up from your impossible dream
You drop in your funding a long way upstream
Then in the waters they flow, they mingle, they blend
So how can you take credit for what comes out in the end.

REFRAIN II
Because it's impact any place, impact any time
Well you can find it 'round the corner or much farther down the line
But if you look for attribution you're never going to lose
Those Output Outcome Downstream Impact Blues.

Coda (4x then fade)


A recording of Terry Smutylo performing the song is available at the following Web page: http://www.idrc.ca/en/ev-65284-201-1-DO_TOPIC.html

The following Web site has an example of outcome mapping, “African Health Research Fellowship Program”: http://www.idrc.ca/fr/ev-34425-201-1-DO_TOPIC.html

Rapid Assessment
Rapid assessments used in the development evaluation context meet the need for fast and low-cost evaluations. In developing countries, it sometimes is not possible, because of time and other resource constraints, to conduct a more thorough evaluation study. The country may lack baseline data, may not have an accurate or complete listing of everyone in the population, or may have beneficiaries with low literacy, which means that only interviewer-administered questionnaires can be used.

While there is no fixed definition of a rapid assessment, it is generally described as a bridge between formal and informal data collection: a "fairly quick and fairly clean" approach rather than a "quick and dirty" one. It can be described as a systematic, semi-structured approach. It is used in the field, typically with a team of evaluators. Ideally, the team is diverse so that a variety of perspectives will be reflected. Rapid assessment is best used when looking at processes rather than outcomes or impacts. Generally, it seeks to gather only the most essential information, the "must know" rather than the "nice to know," and tends to use both quantitative and qualitative approaches.

Its basic orientation in development evaluation is to "seek to understand," because a nonjudgmental approach is more likely to elicit open and honest conversations. Observation of the intervention within its setting can provide clues as to how well the intervention is working. Listening skills are essential. A key task is to identify people who have a range of experiences and perspectives, especially those who would most likely be overlooked in an evaluation. A small but highly diverse group of informants can be very effective in obtaining a holistic view of the situation.


Rapid assessments must use more than one source of information. Multiple sources increase credibility, reduce bias, and provide a holistic perspective. Rapid assessment can use the same data collection and data analysis methods as any other evaluation; the difference is usually one of scope. Typically, rapid assessments are small in scope: a few people in face-to-face data collection in a few locations. Existing data (prior reports and studies, records, and documents) supplement and corroborate data collected by observation, interviews, and focus groups. Surveys, usually administered by interviewers, are also sometimes used.

To the extent that qualitative methods are used, strong note-taking skills are essential. It helps if the evaluator maintains a journal to record observations, feelings, hunches, and interpretations, as well as any incidents that happen during the field visit. These notes need to be shared with other team members to help identify common themes.

While rapid appraisal is not limited to one particular method, following a few principles will help:

•	Conduct a review of secondary data before going into the field.

•	Once in the field, observe, converse, and record.

•	Maintain good notes throughout the process; not only are good notes essential for the report, they will help make sense of the information gathered across the team.

Some strategies and lessons learned in doing rapid appraisals include using a diverse, multidisciplinary team with both men and women as members. When possible, recruit insiders, who are familiar with the intervention and the local area, as well as outsiders, who will see things fresh. Other strategies and lessons include:

•	using small teams, rather than large ones, to maximize interaction

•	dividing time between collecting data and making sense of it

•	being willing to go where needed: fields, marketplaces, off the main road

•	remaining flexible and adaptable, since new information can change the evaluation plan (Food and Agriculture Organization of the United Nations, 1997).


An example of a rapid assessment, prepared by the United Nations Interregional Crime and Justice Research Institute (UNICRI) and the Australian Institute of Criminology (AIC), "Global Programme Against Trafficking in Human Beings, Rapid Assessment: Human Smuggling and Trafficking from the Philippines," can be found at the following Web site: http://www.unodc.org/pdf/crime/trafficking/RA_UNICRI.pdf

Evaluation Synthesis and Meta-evaluation
An evaluation synthesis is a useful approach in situations where many evaluations of a particular intervention have already been done. It is most useful for looking at similar interventions addressing a similar issue or theme, and when the evaluation seeks to determine the overall effectiveness of an intervention. Chelimsky and Morra (1984) brought the method to wider policy issues (Smith & Brandon, 2007, p. 97). For an evaluation synthesis, it is necessary to:

•	locate all relevant studies

•	establish criteria to determine the quality of the studies

•	include only the studies that meet the quality criteria

•	combine the results: chart the quality of the studies and the key measures of impact.

While each individual evaluation may provide useful information about a specific intervention, it typically is too weak to allow for a general statement about intervention impact. However, when the results of many studies are combined, it is possible to make general statements about the intervention's (and even a policy's) impact. One advantage of an evaluation synthesis is that it uses available research, making it cheaper to do. It also creates a much larger base for assessing an intervention's impact: more people and more data. It is possible to be fairly confident in making general statements about intervention impact. The challenges are in locating all relevant studies and obtaining permission to use the data. There also is some risk of bias in selecting studies. The criteria for selection must be stated explicitly.
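To make the screening and combining steps concrete, the following is a minimal sketch in Python of one way a synthesis team might pool impact estimates from the studies it retains. It is illustrative only: the chapter does not prescribe a particular statistical formula, and the study names, the 0-10 quality scale, and the fixed-effect inverse-variance weighting used here are assumptions introduced for this example.

# Minimal sketch of an evaluation synthesis step: screen studies against a
# quality threshold, then pool their impact estimates with inverse-variance
# weights (a simple fixed-effect combination). Field names and the quality
# scale are illustrative assumptions, not from the chapter.
from dataclasses import dataclass
from math import sqrt

@dataclass
class Study:
    name: str
    effect: float      # estimated impact (e.g., change in an outcome measure)
    std_error: float   # standard error of that estimate
    quality: int       # reviewer-assigned quality score, 0-10 (assumed scale)

def synthesize(studies, min_quality=7):
    """Keep only studies meeting the quality criterion, then pool their effects."""
    included = [s for s in studies if s.quality >= min_quality]
    if not included:
        raise ValueError("No studies meet the quality criterion")
    weights = [1.0 / s.std_error ** 2 for s in included]
    pooled = sum(w * s.effect for w, s in zip(weights, included)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return included, pooled, pooled_se

studies = [
    Study("Country A pilot", effect=0.30, std_error=0.10, quality=8),
    Study("Country B rollout", effect=0.15, std_error=0.05, quality=9),
    Study("Country C review", effect=0.45, std_error=0.20, quality=5),  # screened out
]
included, pooled, se = synthesize(studies)
print(f"{len(included)} studies included; pooled effect {pooled:.2f} (SE {se:.2f})")

In this sketch the pooled estimate simply weights each retained study by the inverse of its variance, so more precise studies count for more. A real synthesis would also chart study quality and consider qualitative evidence, as described above.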


For example, in 2000 the Department for International Development (DFID) of the United Kingdom published an evaluation synthesis study on the environment (DFID, 2000). During the 1990s, DFID successfully managed a substantial portfolio of environmental projects but felt that the environmental benefits were "generally assumed rather than critically examined" (p. 1). The Environmental Synthesis Study was commissioned to examine 49 DFID-supported projects in five countries, looking at the implementation and impact of DFID bilateral project support for environmental improvement and protection. The projects were not primarily environmental projects but covered a wide range of environmental interventions (energy efficiency, industrial, forestry, biodiversity, agriculture, and urban improvement). After looking across the 49 studies, one conclusion was that there was a "gap between high policy priority attached by DFID to environmental issues… and what has actually been delivered in terms of positive environmental impact." The study also concluded that the key challenge is to use evidence and research, not assertion, to evaluate environmental issues. In discussing environmental interventions, it stated:

If they are not to be further sidelined, environmental considerations and interventions need to become demonstrably effective as a means of achieving poverty reduction, not just a worthy add-on or a risk to be avoided (DFID, p. 4).

An evaluation report for an evaluation synthesis will include the following:

•	results

•	citations for all studies

•	clearly stated procedures for identifying the studies

•	criteria for inclusion in the synthesis

•	descriptions of the studies

•	gaps or limitations of the analysis.

There are both advantages and challenges to using evaluation synthesis. The advantages of an evaluation synthesis are that it:

•	uses available studies

•	avoids original data collection

•	yields stronger conclusions than any of the individual studies, increasing the power of the findings

•	is cost effective.


The potential challenges of evaluation synthesis are:

•	difficulty in locating all the relevant studies

•	difficulty in obtaining permission to use the data

•	the possibility that the same group conducted several of the studies

•	difficulty in developing a credible measure of quality

•	the risk of bias in selecting studies.

An evaluation synthesis can also be qualitative. In 1997, an evaluation study looking at the impact of nongovernmental organization (NGO) interventions, as well as the evaluation methods used, was published (OECD/DAC, 1997). The purpose of the study was to assess the impact of the development interventions (the evaluation synthesis component of the study) and to assess the evaluation methods and approaches used (a meta-evaluation component, that is, evaluating evaluations). A qualitative evaluation synthesis differs from a literature review in that it implies a quality test: only studies meeting quality criteria are included.

The following Web site shows an example of an evaluation synthesis by the Institute of Development Studies at the University of Helsinki, "Searching for Impact and Methods: NGO Evaluation Synthesis Study": http://www.eldis.org/static/DOC5421.htm (Note: Eldis is one of a family of knowledge services from the Institute of Development Studies, Sussex. Eldis is core funded by Sida, Norad, SDC, and DFID.)

Meta-evaluation
Some use the terms evaluation synthesis and meta-evaluation interchangeably. In this textbook, we distinguish between the two. As used here, evaluation synthesis refers to an analytic summary of results across evaluations that meet minimum quality standards. An evaluation synthesis systematically analyzes and summarizes the results of evaluation studies; the focus is on results, and only evaluations meeting quality standards are included in the synthesis. In contrast, meta-evaluation is an expert review of one or more evaluations against professional quality standards. Patton (1997, p. 143) describes meta-evaluation as "evaluating the evaluation based on the profession's standards and principles."


Meta-evaluation is a way of reanalyzing the results of one or more evaluations. Some consider a meta-evaluation to also serve the purpose of an evaluation synthesis, while others distinguish the two. The Campbell Collaboration is an international organization that conducts such reviews and then synthesizes the results of the evaluations that meet high quality standards.

The following Web site links to the Campbell Collaboration Online Library where you can view many of their documents. http://www.campbellcollaboration.org/frontend.aspx

Emerging Approaches
As the field struggles with the increased politicization of evaluation, evaluator cultural competence, and the cross-cultural expansion of evaluation, new approaches continue to emerge. Patton (2008) sees a resulting proliferation of evaluation approaches, theories, and models. Among these are:

•	utilization-focused evaluation

•	empowerment evaluation

•	realist evaluation

•	inclusive evaluation

•	beneficiary evaluation

•	horizontal evaluation.


Utilization-Focused Evaluation
Utilization-focused evaluation (U-FE) proposes that an evaluation should be judged by its utility and how it is actually used. Patton (2002, p. 173) describes utilization-focused evaluation in this way:

Utilization-focused evaluation begins with identification and organization of specific, relevant decision makers and information users (not vague, passive audience) who will use the information that the evaluation produces.

The Utilization-Focused Evaluation Checklist is located at the following Web site: http://www.wmich.edu/evalctr/checklists/ufechecklist.htm

A description of the textbook Utilization-Focused Evaluation by Michael Q. Patton, and excerpts from it, can be found at the following Web site: http://www.sagepub.com/booksProdDesc.nav?prodId=Book229324

Empowerment Evaluation
Fetterman (1996, p. 16) defines empowerment as "an enabling and emancipatory concept." Zimmerman (2000) describes empowerment as the process by which people take charge of their environment. Empowerment is usually associated with political or decision-making power and contributes to psychological power. By helping people achieve their goals as members of a community and improve their lives, empowerment can create a sense of well-being and positive growth (Fetterman and Wandersman, 2004, p. 10).

Empowerment evaluation (EE) is the use of evaluation concepts, techniques, and findings to foster improvement and self-determination (Fetterman, Kaftarian, & Wandersman, 1996). Empowerment evaluation goes beyond participatory evaluation. It acknowledges a deep respect for people's capacity to create knowledge about, and solutions for, their own experience.


Nine empowerment evaluators got together and defined empowerment evaluation in the following way:

Empowerment evaluation: An evaluation approach that aims to increase the probability of achieving program success by (1) providing program stakeholders with tools for assessing the planning, implementation, and self-evaluation of their program, and (2) mainstreaming evaluation as part of the planning and management of the program/organization (Wandersman et al., 2005, p. 28).

Fetterman and Wandersman (2004, p. 4) describe the role of an empowerment evaluator as a "critical friend." They advocate that community members remain in charge of the evaluation and that the evaluator play the role of a facilitator who influences the evaluation, rather than an authority who controls it.

Empowerment evaluation was influenced by collaborative and participatory evaluation, as well as by other approaches. Both participatory and empowerment evaluation focus on participation and active engagement. Both are committed to local control and capacity building, asking evaluators to listen, share, and work together with other participants. Empowerment evaluation is also similar to utilization-focused evaluation, because both approaches are designed to be helpful, constructive, and useful at every stage of the evaluation (Fetterman & Wandersman, 2004, p. 6). In describing the difference between empowerment and participatory evaluation, Alkin and Christie (2004, p. 56) state that:

Since participatory evaluation emerges from a utilization framework, the goal of participatory evaluation is increased utilization through these activities [design, implementation, analysis, and interpretation] as opposed to empowering those that have been oppressed, which is political or emancipatory in nature.

The Empowerment Evaluation Blog is available at the following Web site: http://eevaluation.blogspot.com/

A link to the article "Empowerment Evaluation: Yesterday, Today, and Tomorrow" in the American Journal of Evaluation can be found at the following Web site: http://homepage.mac.com/profdavidf/documents/EEyesterday.pdf


Realist Evaluation
Realism is a "concern for fact or reality and rejection of the impractical and visionary" (Realism, 2008). Pawson and Tilley (2004, p. 1) describe realist evaluation as a "species of theory-driven evaluation." They relate it to theory of change and program theory because they regard social programs as products of the human imagination, that is, as hypotheses of social betterment. As in a theory of change, a realist evaluation draws out a chain of events in which "wrongs might be put to rights, deficiencies of behaviour corrected, inequalities of condition alleviated" (Tilley & Pawson, 2004, p. 1).

According to Tilley and Pawson (2004, p. 22), realist evaluation provides a "coherent and consistent framework" for the way evaluations engage with programs. Realist evaluation understands the importance of stakeholders to program development and delivery, but it:

… steers a course between disregard for stakeholders on account of their self-interested biases and their craven treatment as omniscient and infallible on account of their inside knowledge. Stakeholders are treated as fallible experts whose understanding needs to be formalized and tested (Tilley & Pawson, 2004, pp. 20-21).

Realist evaluation is derived from a wide range of research and evaluation approaches and draws on parts or all of other approaches. It has no preference for qualitative or quantitative research methods; it considers both and often chooses to combine them. Tilley and Pawson (2004, p. 21) also recognize that realist evaluation can be enormously challenging, because there is no simple formula that can "provide tick-box recipes for delivering findings." In the past, most of the work on realist evaluation has related to the examination of individual programs, but the field is moving toward formative realist evaluation and realist meta-evaluation (p. 10).

The following Web sites link to papers by Pawson and Tilley:

"Realistic Evaluation" by Ray Pawson and Nick Tilley: http://www.dprn.nl/uploads/thematic_meetings/Realistic%20Evaluation.pdf

"Realistic Evaluation: An Overview" by Nick Tilley: http://www.danskevalueringsselskab.dk/pdf/Nick%20Tilley.pdf


Inclusive Evaluation
Inclusive evaluation aims at involving the least advantaged as part of a systematic investigation of the merit or worth of a project, program, or policy. Inclusive evaluation is data based, but the data are generated from the least advantaged stakeholders, those who have been traditionally underrepresented; it does not focus on those who have been traditionally included in evaluations (Mertens, 1999, p. 5).

Beneficiary Assessment
According to Salmen, beneficiary assessment is:

… a qualitative research tool used to improve the impact of development operations by gaining the views of intended beneficiaries regarding a planned or ongoing intervention (Salmen, 1999, p. 1).

Similar to inclusive evaluation, the rationale for this approach is to involve a group that is often overlooked when considering stakeholders. Beneficiary assessment involves the ultimate client: the project beneficiaries. The rationale is that with increased participation, the beneficiaries gain ownership and become key players in producing the needed and desired changes in their own development. The objective of beneficiary assessment is to assess the value of an activity as perceived by project beneficiaries and to integrate these findings into project activities. Beneficiary assessment plays a central part in social assessment by helping to bridge culture with decision-making (Salmen, 1999, pp. 1-2).

The following Web site links to the "Beneficiary Assessment Manual for Social Funds" by Lawrence F. Salmen: http://lnweb18.worldbank.org/ESSD/sdvext.nsf/07ByDocName/BeneficiaryAssessmentManualforSocialFunds/$FILE/%5BEnglish%5D+Beneficiary+Assessment+Manual.pdf


Horizontal Evaluation
The horizontal evaluation approach is a combination approach: it combines an internal assessment process with an external review by peers. The combination was designed to neutralize:

… lopsided power relations that prevail in traditional external evaluations, creating a more favorable atmosphere for learning and subsequent program improvement (Thiele, Devaux, Velasco, & Manrique, 2006, p. 1).

Horizontal evaluation has often been used to learn about and improve research and development methodologies that are under development. The approach has been used in an Andean regional program developing new research and development methodologies and in Uganda to assess the Participatory Market Chain Approach (PMCA). The key to the horizontal evaluation approach is two separate groups of stakeholders:

•	local participants, who present and critique the process under investigation and make recommendations on how to improve it

•	visitors (peers from other organizations or projects who work on similar themes), who assess the process, identify strengths and weaknesses, and make suggestions for improvement (Thiele, Devaux, Velasco, & Manrique, 2006, p. 4).

A component of horizontal evaluation is a three-day workshop that allows the two groups to come together. The following is a link to "Horizontal Evaluation: Stimulating Social Learning among Peers," International Potato Center/Papa Andina Program, Lima, Peru: http://www.dgroups.org/groups/pelican/docs/Hor_Evaln_18_05.doc?ois=no


Part III: Challenges Going Forward
The Millennium Development Goals (MDGs) have had major implications for development evaluation. They have created a shift from goal-based evaluations that assess project goals and objectives to evaluations of the MDGs at the country level. As a result, responsibility for evaluation should also shift from the development organizations sponsoring interventions to the developing countries receiving the aid (Picciotto, 2007, p. 509). Picciotto identifies four challenges that now face development evaluation managers:

•	to measure development success in terms of global goals, which means that the main unit of account for evaluation has shifted from the project level to the country level, with new metrics derived from the commonly agreed MDGs

•	to acknowledge that poverty reduction and improved governance are the primary responsibility of developing countries, which assumes a readiness to make development evaluation a country-based and country-owned process instead of a donor imposition

•	to recognize that rich countries have an obligation to contribute to poverty reduction by improving the enabling environment for development, which suggests that, along with the quantity and quality of the aid provided by rich countries, the development friendliness of their non-aid policies should be open to evaluation scrutiny

•	to commit to the MDGs, which implies that the realism and fairness of the goals themselves and the soundness of their underlying theories of change constitute legitimate objects of evaluation (pp. 511-512).

These four challenges have not yet been addressed by the development evaluation community. Most evaluators working in development are still operating in a "business as usual" mode, focusing on the project rather than on its effects on the larger picture of the country or the world. Because of this, many evaluators have not noticed that the development environment has changed. Picciotto identifies five major methodological consequences of this changed environment:

•	Alignment: all development interventions should be assessed in terms of their impact on the MDGs as articulated in development programs owned by developing countries.

•	Aggregation: the results of development interventions should be measured at the country level through country-based arrangements.

•	Accountability: given the global partnership for development embedded in the MDGs, the performance of all development actors should be evaluated in terms of their distinctive accountabilities and reciprocal obligations.

•	Attribution: neither the evaluation community nor the economics profession has managed to convince a skeptical public that "aid works." More robust evidence is needed to demonstrate a causal link between aid operations and country-level economic performance.

•	Asymmetry: the imbalance between the seven MDGs that address poor countries' performance and the one (the eighth MDG) that addresses the need to improve the enabling global policy environment for development should be redressed; that is, the MDGs themselves (and the processes that underlie their monitoring and oversight) should be evaluated (p. 512).

Rich countries have obligations beyond providing financial aid to developing countries; they also need to transform their evaluation priorities (p. 509). To reach these new priorities, Picciotto promotes redeploying resources away from the evaluation of individual interventions and toward:

•	evaluation at a higher plane

•	evaluation capacity building

•	citizens' participation in evaluation

•	evaluations of the mutual accountability between aid donors and recipients

•	evaluation-research partnerships designed to capture the poverty impact of policies and standards shaped by rich countries and the polity networks they control (p. 520).

Many of the new approaches to development evaluation are striving to help the field make this shift: toward evaluation conducted by the developing country; toward addressing the MDGs as well as project goals and objectives; toward evaluating many interventions at a time to see a larger picture; and toward assessing how all of these affect the country, the region, or the world.


Summary
There are many approaches to development evaluation. An evaluation approach is a way of looking at or conceptualizing evaluation in a general way. It often incorporates a philosophy and a set of values. Some approaches have been in use for years, while others are newly developed. The following evaluation approaches were covered:

•	prospective evaluation

•	evaluability assessment

•	goal-based evaluation

•	goal-free evaluation

•	multi-site evaluation

•	cluster evaluation

•	social assessment

•	environmental and social assessment

•	participatory evaluation

•	outcome mapping

•	rapid assessment

•	evaluation synthesis and meta-evaluation

•	emerging approaches
	−	utilization-focused evaluation
	−	empowerment evaluation
	−	realist evaluation
	−	inclusive evaluation
	−	beneficiary evaluation
	−	horizontal evaluation.

With the emphasis on the Millennium Development Goals, evaluations need to begin addressing these goals as well as the goals and objectives of their specific projects, programs, or policies. Many of the new approaches are an attempt to move development evaluation to this higher level: evaluating how the intervention affects the larger picture of the whole country, the region, and possibly the world. Table 5.3 summarizes the evaluation approaches discussed in this chapter.


Table 5.3: Matrix Comparing the Evaluation Approaches

Prospective
Purpose/Philosophy: Program review before it begins; used for forward-looking questions.
Characteristics/Activities: Careful, skilled textual analysis of the program or policy; review and synthesis of other evaluation studies; prediction of success or failure, with suggestions on strengthening the proposed program or policy if it does go forward.

Evaluability assessment
Purpose/Philosophy: Brief study to determine whether an evaluation would be useful and feasible; facilitates communication between evaluators and stakeholders; clarifies the purposes of a formative study or serves as a planning tool.
Characteristics/Activities: Review materials; interview intervention managers, stakeholders, and the people and organizations that can implement recommendations; develop a theory of change model.
Strengths: Can lead to a more realistic and appropriate evaluation.
Challenges: Can be time consuming and costly, especially if the group conducting the assessment does not work well together.

Goal-based (objective-based)
Purpose/Philosophy: Measure whether goals and objectives have been met; the typical evaluation with which most are familiar, as it is the basis of most donor project evaluation systems.
Characteristics/Activities: Identifies goals and objectives; measures whether the intervention reaches the goals and objectives (normative evaluation).
Strengths: Clear methodology for comparing actual results against standards.
Challenges: Important effects may be missed because they are not explicitly stated goals or objectives.

Goal-free
Purpose/Philosophy: Opens the option of gathering data on the effects and effectiveness of programs without being constrained by a narrow focus on stated goals or objectives.
Characteristics/Activities: Purposefully avoids becoming aware of the program goals; predetermined goals are not permitted to narrow the focus of the evaluation study; evaluator has limited contact with the program manager and staff.
Strengths: Increases the likelihood that unanticipated side effects will be noted.
Challenges: Limited contact with program staff; quite difficult to avoid awareness of program goals and objectives.

Multi-site
Purpose/Philosophy: Investigate interventions with standard implementation in all locations, or with planned variation, to determine the conditions under which the program best achieves its goals and objectives; must capture the climate in which the interventions operate, as well as cultural, geographic, economic, size, or other systemic differences that might affect variations.
Characteristics/Activities: Stakeholder participation is important; generally a strong design that gathers deeper information; sites are selected because they represent characteristics that may result in systematic differences in intervention implementation and results; comparison of the interventions within their contexts is likely to provide a range of lessons learned and strategies for dealing with a variety of situations.
Strengths: Investigates many interventions implemented in the same way; good practices often emerge.
Challenges: Data collection must be standardized; requires well-trained staff, access to all sites, and sufficient information ahead of time to design data collection instruments; each location is different, and the differences must be considered.

Cluster
Purpose/Philosophy: Focus on interventions that share a common mission, strategy, and target population; the intent is to learn what happened, not whether it was successful.
Characteristics/Activities: Stakeholder participation is important; not concerned with generalizability or replicability; variation is viewed as positive; more likely to use multiple case studies, with sharing of information across cases through networking conferences.
Strengths: Evaluation is focused more on learning than on drawing overall conclusions about program quality or value.
Challenges: Does not show results for individual sites; the data show only aggregate information.

Social assessment
Purpose/Philosophy: Looks at various social structures, processes, and changes within a group or community; the main instrument for ensuring that the social impacts of development interventions are taken into account.
Characteristics/Activities: Investigates consistency between the objectives of the intervention and the needs, interests, and capacities of the stakeholders; addresses the effect of social and cultural factors on the ability of stakeholders to participate in or benefit from the intervention; investigates the impact of the intervention on the stakeholders.
Strengths: Often now included as part of an environmental and social assessment.

Environmental and social assessment
Purpose/Philosophy: Evaluate the attainment of environmental and social objectives.
Characteristics/Activities: Environmental evaluation may be the sole purpose of the exercise, or it may be embedded in the evaluation.
Strengths: Many evaluations are now considered flawed if they do not include an environmental and social assessment.
Challenges: Often requires technical expertise.

Participatory
Purpose/Philosophy: Stakeholders share responsibility for evaluation planning, implementing, and reporting; participation as a product, where the act of participation is an objective and one of the indicators of success; participation as a process by which to achieve a stated objective.
Characteristics/Activities: The evaluation process involves participants' skills in goal setting, establishing priorities, selecting questions, analyzing data, and making decisions on the data; people work together, group unity is facilitated and promoted, and self-accountability is highly valued; facilitators act as resources for learning, and participants are decision makers and evaluators.
Strengths: Increases the credibility of evaluation results in the eyes of program staff and the likelihood that the results will be used; a tool for empowering participants and increasing capacity at the local level.
Challenges: There may be challenges in creating an egalitarian team; the evaluator must have facilitation, collaboration, and conflict management skills, as well as skills to train participants in evaluation techniques; often not viewed as "independent" evaluation.

Outcome mapping
Purpose/Philosophy: Supplements more traditional forms of evaluation by focusing on related behavioral change; the focus is on people and outcomes rather than on the achievement of development impacts.
Characteristics/Activities: Looks at behaviors resulting from multiple efforts to help improve the performance of projects, programs, or policies by providing new tools, techniques, and resources; boundary partners are identified to see their influence on change; describes changes in the way people behave through actions and relationships, alone or within groups and/or organizations.
Strengths: Gets out of the "Downstream Impact Blues."
Challenges: For accountability purposes, the demand is for impact evaluation.

Rapid assessment
Purpose/Philosophy: Meets the need for fast and low-cost evaluations; generally seeks to gather only the most essential information to provide clues.
Characteristics/Activities: Usually uses a systematic, semi-structured approach; typically uses document analysis, interviews, and a short site visit; must use more than one source of information; good listening and note taking are essential.
Strengths: A kind of bridge between formal and informal data collection; best used when looking at processes and issues.
Challenges: Typically small in scope; gives limited, mostly descriptive information.

Evaluation synthesis
Purpose/Philosophy: Useful when other evaluations of a particular intervention have already been done; locates and studies all relevant studies and combines their results.
Characteristics/Activities: Locate all relevant studies; establish criteria to determine the quality of the studies; include only quality studies; combine the results, charting the quality of the studies and the key measures of impact; may use qualitative and/or quantitative data.
Strengths: Uses available evaluation and research, making it less costly; creates a much larger base for assessing an intervention's impact.
Challenges: Locating all relevant studies and obtaining the data; some risk of bias in selecting "quality" studies.

Meta-evaluation
Purpose/Philosophy: Evaluating an evaluation based on professional standards.
Characteristics/Activities: A professional reviews an evaluation for meeting quality standards.
Strengths: Improves the quality of evaluations; helps new evaluators learn the processes and quality standards for evaluations.

Utilization-focused
Purpose/Philosophy: An evaluation should be judged by its utility and how it is actually used.
Characteristics/Activities: Begins with identification and organization of specific, relevant decision makers and information users (not vague, passive audience) who will use the information that the evaluation produces.

Empowerment
Purpose/Philosophy: The use of evaluation concepts, techniques, and findings to foster improvement and self-determination.
Characteristics/Activities: Goes beyond participatory evaluation by acknowledging a deep respect for people's capacity to create knowledge about, and solutions for, their own experience.
Challenges: A variant of participatory evaluation.

Realist
Purpose/Philosophy: Provides a coherent and consistent framework for the way evaluations engage with programs.
Characteristics/Activities: Derived from a wide range of research and evaluation approaches and draws on parts or all of other approaches; has no preference for qualitative or quantitative research methods; considers both and often chooses to combine them.

Inclusive
Purpose/Philosophy: Aims at involving the least advantaged as part of a systematic investigation of the merit or worth of a project, program, or policy.
Characteristics/Activities: Data based, but the data are generated from the least advantaged stakeholders, those who have been traditionally underrepresented; does not focus on those who have been traditionally included in evaluations.

Beneficiary
Purpose/Philosophy: A qualitative research tool used to improve the impact of development operations by gaining the views of intended beneficiaries regarding a planned or ongoing intervention.
Characteristics/Activities: Involves the ultimate client, the project beneficiaries; with increased participation, the beneficiaries gain ownership.

Horizontal
Purpose/Philosophy: Combines an internal assessment process with an external review by peers to neutralize the lopsided power relations that prevail in traditional external evaluations.
Characteristics/Activities: Has often been used to learn about and improve research and development methodologies that are under development.


Chapter 5 Activities

Application Exercise 5.1: Describing the Approaches
Instructions: Assume you are working with a group of stakeholders and you are trying to explain one of the approaches described in this chapter. Using the case examples in this chapter, prepare a 5-minute presentation you would give to the stakeholders. The presentation should include:

•	a description of the approach: what it is, its benefits, and its challenges

•	a description of the application in the example

•	why this approach might be useful, or not useful, in evaluating your project.

Application Exercise 5.2: You are convinced that one of the approaches outlined in Exercise 5.1 will be more effective for evaluating your intervention. What reasons would you give your superior to convince him or her that this approach is the most appropriate? What are his or her likely concerns, and how would you respond?


Application Exercise 5.3: For each of the following five short scenarios, select an evaluation approach and describe your rationale for choosing it.

Scenario 1 A development bank wants to assess a country’s strategic focus of technical assistance based on the findings of five country studies completed by different development organizations.

Scenario 2 A development organization is seeking to learn which educational interventions implemented in its projects and programs were successful or highly successful, in order to improve educational systems in the region.

Scenario 3 A development organization is seeking to evaluate the most significant issues concerning the country’s natural resources and environment sector.

Scenario 4 A development bank is investigating the development of the rice sector in a country. It plans to investigate the importance of rice in the current cultural, social, and economic contexts; rice production systems; constraints facing rice farmers; research conducted and technologies developed; and future priorities for further rice development.

Scenario 5 A development bank has invested millions of dollars in numerous projects and programs implemented by one development organization in international agricultural research over more than 30 years. It wants to evaluate the evaluations completed by the development organization in the agriculture sector.


References and Further Reading

Alkin, M. & Christie, C. (2004). An evaluation theory tree. In M. Alkin (Ed.), Evaluation roots. Thousand Oaks, CA: Sage Publications.

Canadian International Development Agency (2004). CIDA evaluation guide 2004. Ottawa, Ontario: Canadian International Development Agency.

Chambers, R. (1991). Shortcut and participatory methods for gaining social information for projects. In M. M. Cernea (Ed.), Putting people first: Sociological variables in rural development (2nd ed., pp. 515-537). Washington, D.C.: Oxford University Press, World Bank.

Chelimsky, E. and L. G. Morra (1984). Evaluation synthesis for the legislative user. In W. H. Yeaton and P. M. Wortman (Eds.), Issues in data synthesis. New Directions for Program Evaluation, No. 24. San Francisco: Jossey-Bass.

Christie, C. & Alkin, M. (2004). Objectives-based evaluation. In S. Mathison (Ed.), Encyclopedia of evaluation. Thousand Oaks, CA: Sage Publications.

Cousins, J. B. and L. M. Earl (Eds.) (1995). Participatory evaluation in education. Bristol, PA: Falmer Press.

DFID (2000). Environmental evaluation synthesis study: "Environment: Mainstreamed or sidelined?" EVSM EV626, January 2000. Retrieved May 8, 2008 from http://www.dfid.gov.uk/aboutdfid/performance/files/ev626s.pdf

Duignan, Paul (2007). Introduction to strategic evaluation: Section on evaluation approaches, purposes, methods, and designs. Retrieved May 8, 2008 from http://www.strategicevaluation.info/se/documents/104f.html

Earl, Sarah, Fred Carden, and Terry Smutylo (2001). Outcome mapping: Building learning and reflection into development programs. Ottawa, Ontario: International Development Research Centre. http://www.dgroups.org/groups/pelican/docs/Mapping_M&E_capacity_080606.pdf

Eerikainen, Jouni and Roland Michelitsh (2005). Environmental and social sustainability. Methodology and toolkit: Various approaches. Presentation at IPDET, July 2005.


European Centre for Development Policy Management (ECDPM) (2006). Study on capacity, change and performance: Mapping of approaches towards M&E of capacity and capacity development. Retrieved May 9, 2008.

Fetterman, David M., S. Kaftarian, and A. Wandersman (Eds.) (1996). Empowerment evaluation: Knowledge and tools for self-assessment and accountability. Thousand Oaks, CA: Sage Publications.

Fetterman, David M. (2001). Foundations of empowerment evaluation. Thousand Oaks, CA: Sage Publications.

Fetterman, David M. and Abraham Wandersman (2004). Empowerment evaluation principles in practice. New York: Guilford Publications.

Fetterman, David M. and Abraham Wandersman (2007). Empowerment evaluation: Yesterday, today, and tomorrow. American journal of evaluation, 28, 179. Retrieved May 9, 2007 from http://homepage.mac.com/profdavidf/documents/EEyesterday.pdf

Fitzpatrick, J. L., J. R. Sanders, and B. R. Worthen (2004). Program evaluation: Alternative approaches and practical guidelines. New York: Pearson.

Food and Agriculture Organization (FAO) of the United Nations (1997). Marketing research and information systems (Marketing and Agribusiness Texts-4). Chapter 8: Rapid rural appraisal. Retrieved July 16, 2007 from http://www.fao.org/docrep/W3241E/w3241e09.htm

Gariba, Sulley (1998). Participatory impact assessment: Lessons from poverty alleviation projects in Africa. In E. T. Jackson and Y. Kassam (Eds.), Knowledge shared: Participatory evaluation in development cooperation. Bloomfield, CT: Kumarian Press.

Glass, Gene V. and Mary Lee Smith (1979). Meta-analysis of research on class size and achievement. Educational evaluation and policy analysis, January-February, 2-16.

Government Accountability Office (GAO) (1990). Prospective evaluation methods: The prospective evaluation synthesis. Retrieved July 16, 2007 from http://www.gao.gov/special.pubs/10_1_10.PDF

Government Accountability Office (GAO) (1992). The evaluation synthesis. Retrieved May 28, 2008 from http://www.gao.gov/special.pubs/pemd1012.pdf


Hirschheim, R. and S. Smithson (1988). "A critical analysis of information systems evaluation." In N. Bjorn-Andersen and G. B. Davis (Eds.), IS Assessment: Issues and Changes. Amsterdam: North-Holland.
The International Bank for Reconstruction and Development/The World Bank (2004). Turning Bureaucrats into Warriors. Chapter 24: Social assessment. Washington, D.C.: The World Bank. Retrieved July 16, 2007 from http://www.worldbank.org/afr/aids/gom/manual/GOMChapter%2024.pdf
ISO (1999). Environmental management – Performance evaluation – Guidelines. Standard; Reference ISO 14301. Geneva, Switzerland: ISO.
Johnston, Timothy, and Susan Stout (1999). "Investing in Health: Development in Health, Nutrition, and Population Sector." The World Bank, Operations Evaluation Department. Retrieved July 16, 2007 from http://wbln0018.worldbank.org/oed/oeddoclib.nsf/6e14e487e87320f785256808006a001a/daf8d4188308862f852568420062f332/$FILE/HNP.pdf
Khon Kaen University (1987). Proceedings of the 1985 International Conference on Rapid Rural Appraisal. Khon Kaen, Thailand: Rural Systems Research and Farming Systems Research Projects.
Kumar, Krishna (Ed.) (1993). Rapid appraisal methods. Washington, D.C.: World Bank.
Kudat, Ayse and Bykebt Ozbilgin (1999). "Azerbaijan Agricultural Development and Credit Program." Retrieved July 16, 2007 from http://lnweb18.worldbank.org/ESSD/sdvext.nsf/61ByDocName/AzerbaijanAgriculturalDevelopmentandCreditProject/$FILE/AzerbaijanAgriculturalDevelopmentandCreditProject424KbPDF.pdf
Light, R. J. and D. B. Pillemer (1984). Summing up: The science of reviewing research. Cambridge, MA: Harvard University Press.
McKnight, J. (1992). Asset mapping. Evanston, IL: Northwestern University.
Mertens, D. (1999). Inclusive evaluation: Implications of transformative theory for evaluation. American Journal of Evaluation, 20(1), 1-14.
Nay, J. and P. Kay (1982). Government oversight and evaluability assessment. Lexington, MA: Heath.


OECD/DAC (1997). Searching for Impact and Methods: NGO Evaluation Synthesis Study. Retrieved July 16, 2007 from http://www.eldis.org/static/DOC5421.htm
Patton, Michael Q. (2002). Qualitative evaluation and research methods, 3rd ed. Thousand Oaks, CA: Sage Publications.
Patton, Michael Q. (1997). Utilization-focused evaluation: The new century text. Beverly Hills: Sage Publications.
Patton, Michael Q. (1990). Qualitative evaluation and research methods, 2nd ed. Thousand Oaks, CA: Sage Publications.
Paulmer, Hubert E. (2005). "Evaluation guidelines of international development aid agencies: A comparative study." Paper presented to the Faculty of Graduate Studies, University of Guelph, Guelph, Ontario.
Pawson, Ray and Nick Tilley (2004). "Realistic evaluation." Retrieved May 9, 2008 from http://www.dprn.nl/uploads/thematic_meetings/Realistic%20Evaluation.pdf
Picciotto, Robert (2007). The new environment for development evaluation. American Journal of Evaluation, 24(4), 309-521.
Preskill, Hallie and Darlene Russ-Eft (2005). Building evaluation capacity: 72 activities for teaching and training. Thousand Oaks, CA: Sage Publications.
Salmen, Lawrence F. (1999). Beneficiary assessment manual for social funds. Social Protection Team, Human Development Network, World Bank. Retrieved May 9, 2008 from http://lnweb18.worldbank.org/ESSD/sdvext.nsf/07ByDocName/BeneficiaryAssessmentManualforSocialFunds/$FILE/%5BEnglish%5D+Beneficiary+Assessment+Manual.pdf
Sanders, J. R. (1998). Cluster evaluation. In E. Chelimsky and W. R. Shadish, Jr. (Eds.), Evaluation for the 21st century: A resource book. Thousand Oaks, CA: Sage Publications.
Scrimshaw, N. and G. R. Gleason (1992). Rapid assessment procedures: Qualitative methodologies for planning and evaluation of health related programmes. Boston: International Nutrition Foundation for Developing Countries.
realism. (2008). In Merriam-Webster Online Dictionary. Retrieved May 9, 2008 from http://www.merriam-webster.com/dictionary/realism


Scriven, Michael (1972a). "Objectivity and subjectivity in educational research." In L. G. Thomas (Ed.), Philosophical redirection of educational research: The seventy-first yearbook of the National Society for the Study of Education. Chicago: The University of Chicago Press.
Scriven, Michael (1972b). "Pros and cons about goal-free evaluation." Evaluation Comment, 3, 1-7.
Scriven, Michael (1991). Evaluation thesaurus, 4th ed. Thousand Oaks, CA: Sage Publications.
Smith, M. F. (1989). Evaluability assessment: A practical approach. Boston: Kluwer Academic Press.
Smith, Mary Lee, and Gene V. Glass (1980). Meta-analysis of research on class size and its relationship to attitudes and instruction. American Educational Research Journal, 17: 419-433.
Smith, Nick L., and Paul R. Brandon (Eds.) (2007). Fundamental issues in evaluation.
Thiele, G., A. Devaux, C. Velasco, and K. Manrique (2006). Horizontal evaluation: Stimulating social learning among peers. International Potato Center/Papa Andina Program, Lima, Peru. Draft of 18/05/06. Retrieved May 9, 2008 from http://www.dgroups.org/groups/pelican/docs/Hor_Evaln_18_05.doc?ois=no
Thiele, G., A. Devaux, C. Velasco, and D. Horton (2007). Horizontal evaluation: Fostering knowledge sharing and program improvement within a network. American Journal of Evaluation, 28(4), 493-508.
Tilley, Nick (2000). Realistic evaluation: An overview. Paper presented at the Founding Conference of the Danish Evaluation Society, September 2000. Retrieved May 9, 2008 from http://www.danskevalueringsselskab.dk/pdf/Nick%20Tilley.pdf
Turpin, R. S. and J. M. Sinacore (Eds.) (1991). Multisite evaluations. New Directions for Program Evaluation, No. 50. San Francisco, CA: Jossey-Bass.
The World Bank (1996). The World Bank Participation Sourcebook. Appendix I: Methods and tools. Retrieved July 16, 2007 from http://www.worldbank.org/wbi/sourcebook/sba108.htm#D


The World Bank Group. Social aspects of environment. Retrieved July 16, 2007 from http://wbln0018.worldbank.org/LAC/LAC.nsf/ECADocByUnid/9ED289DC14A17E1B85256CFD00633D5E?Opendocument
United States General Accounting Office (GAO) (1992). The evaluation synthesis. GAO/PEMD-10.1.2. Retrieved May 8, 2008 from http://www.gao.gov/special.pubs/pemd1012.pdf
Wandersman, A., J. Snell-Johns, B. Lentz, D. Fetterman, D. C. Keener, M. Livet, et al. (2005). The principles of empowerment evaluation. In D. M. Fetterman and A. Wandersman (Eds.), Empowerment evaluation principles in practice (pp. 27-41). New York: Guilford.
Wholey, J. S. (1987). Evaluability assessment: Developing program theory. In L. Bickman (Ed.), Using program theory in evaluation. New Directions for Program Evaluation, No. 33. San Francisco: Jossey-Bass.


Web Sites:
Empowerment Evaluation: Yesterday, Today, and Tomorrow: http://homepage.mac.com/profdavidf/documents/EEyesterday.pdf
Equator Principles: http://www.equator-principles.com/ and http://www.ifc.org/ifcext/equatorprinciples.nsf/Content/ThePrinciples
The Utilization-Focused Evaluation Checklist: http://www.wmich.edu/evalctr/checklists/ufechecklist.htm
International Finance Corporation (IFC), Environmental and social policies and guidelines: http://www.ifc.org/ifcext/enviro.nsf/Content/PoliciesandGuidelines
IUCN (The World Conservation Union), Sustainability assessment: http://www.iucn.org/themes/eval/search/iucn/sustassess.htm
NSF's User-Friendly Handbook for Mixed-Method Evaluations: http://www.ehr.nsf.gov/EHR/REC/pubs/NSF97153/start.htm
The World Bank Group, Social Aspects of Environment: http://wbln0018.worldbank.org/LAC/LAC.nsf/ECADocByUnid/9ED289DC14A17E1B85256CFD00633D5E?Opendocument
The World Bank, Social Assessment: http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTSOCIALDEVELOPMENT/EXTSOCIALANALYSIS/0,,menuPK:281319~pagePK:149018~piPK:149093~theSitePK:281314,00.html


Designing and Conducting

"There's no limit to how complicated things can get, on account of one thing always leading to another." – E. B. White

Chapter 6: Developing Evaluation Questions and Starting the Design Matrix
• Sources of Questions
• Three Types of Questions
• Identifying and Selecting Questions
• Keys for Writing Good Questions
• Suggestions for Developing Questions
• The Evaluation Design Process

Chapter 7: Selecting Designs for Cause and Effect, Normative, and Descriptive Evaluation Questions
• Connecting Questions to Design
• Design Elements
• Answering Evaluation Questions
• Key Points about Design

Chapter 8: Selecting and Constructing Data Collection Instruments

Chapter 9: Deciding on the Sampling Strategy
• Introduction to Sampling
• Sampling Glossary
• Types of Samples: Random and Non-random
• How Confident and Precise Do You Need to Be?
• How Large a Sample Do You Need?

Chapter 10: Planning Data Analysis and Completing the Design Matrix
• Data Analysis Strategy
• Analyzing Qualitative Data
• Analyzing Quantitative Data
• Linking Quantitative Data and Qualitative Data

Chapter 6
Developing Evaluation Questions and Starting the Design Matrix

Introduction

This is the first of five chapters that discuss specific steps in designing an evaluation. This chapter discusses evaluation questions: the different types of evaluation questions and when to use each type. It also covers how to write good questions and how to structure questions for evaluating development projects, programs, and policies. Knowing the type of question to ask is important for selecting an appropriate evaluation design to answer it. This chapter has six parts:
• Sources of Questions
• Three Types of Questions
• Identifying and Selecting Questions
• Keys for Developing Good Evaluation Questions
• Suggestions for Developing Questions
• Evaluation Design.


Part I: Sources of Questions

We place considerable emphasis on evaluation questions. Why? One reason is that evaluation questions give direction to an evaluation. They are the critical element that helps key individuals and groups improve efforts, make decisions, and provide information to the public. Fitzpatrick, Sanders, and Worthen (2004, pp. 233-234) state that careful reflection and investigation are needed to complete the critical process of identifying and defining the questions to be answered by an evaluation. Evaluation questions are the questions evaluators ask to learn about the project, program, or policy being evaluated. A frequent problem in developing questions is assuming that everyone involved shares the same understanding of the evaluation's goals. For example, if the question is "Did the program assist the participants?", different stakeholders may interpret the term "assist" (and "participate," for that matter) differently. Getting agreement on the theory of change, discussed in the previous chapter, can remedy this problem. To ensure that the evaluator gets diverse viewpoints, Fitzpatrick et al. (2004, p. 234) give the following list of sources the evaluator should use:

• questions, concerns, and values of stakeholders
• evaluation "models," frameworks, and approaches, such as heuristic (trial and error) approaches
• research and evaluation models, findings, or important issues raised in the literature in the field of the program, project, or policy
• professional standards, checklists, guidelines, instruments, or criteria developed or used elsewhere
• views and knowledge of expert consultants
• the evaluator's own professional judgment.

Chapter 4: Understanding the Evaluation Context and Program Theory of Change covered techniques for identifying and working with stakeholders to learn their views on the issues they believe are important to the program being evaluated. Chapter 4 also covered the importance of reviewing prior evaluation studies to identify questions. The results chain helps visualize the relationships among the key elements and identifies the operating assumptions. Evaluators can use theory of change models to help them identify areas of focus for the evaluation; these will come from the major assumptions underlying the model.


The W. K. Kellogg Evaluation Logic Model Development Guide (2004, Chapter 4) discusses how to use a theory of change – the logic model – to form evaluation questions: "A clear logic model illustrates the purpose and content of your program and makes it easier to develop meaningful evaluation questions from a variety of program vantage points: context, implementation, and results (which include outputs, outcomes, and impact)." Figure 6.1, adapted from the W. K. Kellogg Evaluation Logic Model Development Guide (2004, p. 36), shows the types of evaluation questions that are appropriately asked at different points in the causal chain.

Fig. 6.1: Using a Logic Model to Frame Evaluation Questions. [The figure shows a results chain running from inputs (resources) and activities, within the organization's area of control, through outputs that reach direct beneficiaries, to short-term (direct), intermediate (indirect), and long-term results in the organization's area of influence, subject to external factors. Formative evaluation questions sit toward the left of the chain – for example, "What aspects of our situation most shaped our ability to do the work we set out to do in our community?" and "What did our program accomplish in our community?" – while summative evaluation questions sit toward the right – for example, "What is our assessment of what resulted from our work in the community?" and "What have we learned about doing this kind of work in a community like ours?"]


The generic questions at the bottom of the diagram show that formative questions can be drawn from the activities and outputs, and summative questions from intermediate and long-term results. Questions derived from short-term results can be written as either formative or summative questions. Questions should flow from the major assumptions in the logic model about how the program will work and what benefits and/or outcomes will be achieved. As discussed in Chapter 4: Understanding the Evaluation Context and Program Theory of Change, questions will also come from the review of completed evaluations of similar programs, as well as from stakeholders' diverse perspectives on the project, program, or policy. But a main reason we are concerned about formulating questions is that the type of question asked has implications for the evaluation design selected. We cover these implications more fully in Chapter 7: Selecting Designs for Cause and Effect, Normative, and Descriptive Evaluation Questions.

Part II: Three Types of Questions

Many possible questions can be considered in planning an evaluation, and these questions can be categorized in different ways. The type of question chosen has implications for the design of the study. In this chapter, we will be putting questions into one of three baskets:
• descriptive questions
• normative questions
• cause-effect questions.


Descriptive Questions

Descriptive questions represent "what is." They might describe aspects of a process, a condition, a set of views, or a set of organizational relationships or networks. Descriptive questions have the following characteristics:
• they seek to understand or describe a program or process
• they provide a "snapshot" of what is
• they are straightforward questions: who? what? where? when? how? how much or how many?
• they can be used to describe inputs, activities, and outputs
• they are frequently used to gather opinions from program clients.

According to Patton (2002, p. 438), "description forms the bedrock of qualitative data." Descriptive questions for qualitative evaluation include:
• What are the goals of the program from the perspectives of different stakeholders?
• What are the primary activities of the program?
• How do people get into the program?
• Where has the program been implemented?
• What services does the program provide to men? To women?
• What are the effects of the program on participants?


Other examples of descriptive questions include:
• Who receives the program?
• What are the characteristics of the program?
• What services are provided, and to whom?
• Where is the program delivered?
• When was the program implemented?
• How do the participants feel about the usefulness of the program?
• How much did the program cost?
• How many women participated in the program?
• How were participants selected?
• How well did participants score on the final exam?
• What are the informal communication channels inside the organization?

It is important to stress that questions about the proportion of clients who find the program useful or the proportion that like the training are still descriptive questions.

Normative Questions

Normative questions compare "what is" to "what should be." They compare the current situation with a specified target, goal, or benchmark; that is, there is a standard or criterion against which to compare achieved performance. In comparing what is to what should be, normative questions ask: Are we doing what we are supposed to be doing? Are we hitting our target? Did we accomplish what we said we would accomplish? Normative evaluation questions are similar to those asked in performance auditing. If the program has a results-based monitoring system, with targets specified for indicators and timeframes by which they are to be accomplished, normative questions can be used to answer questions about inputs, activities, and outputs.
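Where such targets already exist, the core of a normative answer is a simple comparison of achieved values against the targets. The short Python sketch below illustrates that arithmetic; the indicator names, targets, and achieved values are hypothetical, invented only for illustration.

```python
# Hypothetical indicators with targets and achieved values (illustrative only).
indicators = [
    {"name": "children vaccinated (%)", "target": 80, "achieved": 72},
    {"name": "students admitted per year", "target": 5000, "achieved": 5240},
    {"name": "hectares of land drained", "target": 100000, "achieved": 96500},
]

for ind in indicators:
    met = ind["achieved"] >= ind["target"]
    shortfall = ind["target"] - ind["achieved"]
    status = "target met" if met else f"short of target by {shortfall}"
    # Each line answers a normative question: did we reach the specified target?
    print(f'{ind["name"]}: achieved {ind["achieved"]} against target {ind["target"]} ({status})')
```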


What to do if there are no standards?

Sometimes the evaluator will find that a program has objectives but no apparent criteria for determining how attainment will be measured, what minimum level must be attained, or by when. For example, objectives might be:
• Children in the selected school districts will improve their reading skills. (What proportion of children, at which school levels, need to increase which specific skills, according to what standardized reading test, by how much, and by what time?)
• The program will increase awareness of HIV/AIDS and prevention methods. (Whose awareness, how will awareness be measured, how much does it need to increase, and by when?)
• Micro-enterprises will expand and increase profits. (How many micro-enterprises, defined how, and by how much do they need to expand? By when?)

What can the evaluator do in such circumstances? Several options are frequently used, but none is without risk of challenge. Typically, the evaluator will work with the program "owners" – those officials responsible administratively for the program or for its implementation. The evaluator might ask: "What is a reasonable level of performance for this program to attain?" "Where is the line above which you would deem this program to be successful?" A concern, however, is that one group may not "buy" the standards another group has set. For example, those with oversight responsibility may not agree with the standard proposed by the program implementers; they may argue that it has been set too low. Another approach is to bring in one or more experts in the particular program area and have them agree on a standard that could be used. A potential criticism is that the standard will reflect the personal biases of the expert. This criticism can be diminished by using several experts, but in that case it is important that the expert group be viewed as politically neutral or balanced and that its members have no prior involvement with the specific program. The weakest and riskiest alternative – to be avoided – is for the evaluator to set the standard, based on personal experience. In such a situation, the evaluator is only setting himself or herself up for accusations of bias or inexperience.

Criteria are generally found in program authorizing documents, such as legislation or governing board approval documents. Criteria may also be specified as indicators with specific targets in results-based management systems. Other sources that may establish standards include accreditation systems, blue-ribbon panels, professional organizations, and other commissions.


Examples of normative questions:
• Did we spend as much as we had budgeted?
• Did we reach the goal of admitting 5,000 students per year?
• Did we vaccinate 80 percent of children, as required?
• Did we meet the objective of draining 100,000 hectares of land?
• Was the process for selecting participants fair and equitable?

Cause and Effect Questions

Cause and effect questions determine what difference the intervention makes. Often referred to in the literature as "outcome questions" or "impact questions," they attempt to measure what has changed because of the intervention. Cause and effect questions seek to determine the effects of a project, program, or policy; they are the "so what" questions. Sometimes also called attributional questions, they ask whether the desired results have been achieved as a result of the program: is it the intervention that caused the results? Results are outcomes and impacts, and program theory of change models depict the desired outcomes and impacts of a particular program. But there is a need for caution here: outcomes may or may not be stated as cause and effect questions. For example, in a program to introduce farmers to a new improved seed, an outcome question might be whether grain yield increased. As stated, this is a descriptive question; it asks simply how much the crop increased. If the evaluation asks whether the crop increased as a result of the program – and not, for example, as a result of unusually ideal weather for the grain crop – then it is asking a clear cause and effect question. Cause and effect questions imply a comparison of performance on one or more measures or indicators, not only before and after the intervention, but also with and without it.


Examples of cause and effect questions:
• As a result of the program, do participants have higher-paying jobs than they otherwise would have?
• Do program graduates have higher literacy levels than before the program, and can the gains be attributed to the program?
• Did the microenterprise program reduce the poverty rate in the township compared to nearby townships not participating in the program?
• Did draining the land result in crop production greater than that of surrounding areas?
• Did the new road increase traffic for trade and increase incomes more than would otherwise have been the case?
• What other impacts or side effects (positive or negative) did this intervention have on the wider community?

Cause-effect questions are frequently difficult to answer. That is why it is important to word them explicitly as cause and effect questions. Because many activities occur at the same time, it is difficult to demonstrate that the outcomes are solely, or at least primarily, the result of the intervention. When developing designs to answer cause and effect questions, evaluators need to exercise great care to eliminate other possible explanations for whatever changes they measure.

Example of eliminating other possible explanations: A project to increase crop production in order to reduce poverty has been implemented. The evaluation team collects data on family income and finds that family income increased after the project was implemented. But did the project cause the increase, or was something else occurring at the same time that really caused it? Perhaps prices for the particular crops rose dramatically because of shortages caused by droughts in other areas. Cause-effect questions are about causality and the attribution of causality: did the intervention cause something to happen? To determine whether there is a causal relationship, it is generally necessary to have the following:

• A theory of change: the connection between the intervention and the outcomes should make sense (as covered in Chapter 4: Understanding the Evaluation Context and Program Theory of Change). It is logical to expect that training people in agricultural methods would increase crop production, provided the assumptions underlying the model are met.


• Time order: the intervention should come before the outcome. The training should come before we see an increase in crop production.
• Co-variation: both the intervention and the outcome should have the ability to change. If we compared people who had the training with those who did not (variation in program participation), we would look for corresponding changes in crop production (variation in the amount of crops produced).
• Elimination of rival explanations: we need to be able to establish that it is the intervention, rather than other factors (rainfall, soil, and so on), that explains the changes we have measured. To do this, cause and effect questions generally need one of the following:
  − comparison to a baseline for pre- and post-intervention differences
  − comparison to a similar group that did not receive the intervention.

While cause-effect questions are more challenging to address than descriptive and normative questions, all questions have to be clearly defined in measurable ways. All questions require that relevant and accurate data be collected and analyzed. The type of question, the data available, and the amount of time and money will drive the type of design selected.
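One way to see why the with-and-without comparison matters is a simple difference-in-differences calculation, sketched below in Python. The crop-yield figures are invented for illustration, and a real evaluation would also have to establish that the comparison group is genuinely similar to the program group.

```python
# Illustrative (invented) average crop yields, in tonnes per hectare.
program_before, program_after = 2.0, 3.1        # villages that received the training
comparison_before, comparison_after = 2.1, 2.6  # similar villages without the training

change_with_program = program_after - program_before            # before/after, with intervention
change_without_program = comparison_after - comparison_before   # before/after, without intervention

# The difference between the two changes estimates the program's effect,
# net of factors (weather, prices) that affected both groups alike.
estimated_effect = change_with_program - change_without_program
print(f"Estimated effect attributable to the program: {estimated_effect:.2f} t/ha")
```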


Examples of questions used to evaluate policy

Example 1
Policy: Ensure that all children receive preventative health care.
Goal: To reduce infant and pre-school child mortality.
Intervention: Preventative Health Care Program for Children.
Intervention questions:
• What percent of all children have received preventative health care since the program's inception last year? (Descriptive question)
• Have the intended groups of low-income children received preventative health care? (Normative question)
• Have child mortality rates decreased as a result of the program? (Cause and effect question)

Example 2
Policy: Ensure that secondary schools teach the knowledge and skills needed for employment in local markets.
Goal: To ensure that graduates are able to get well-paying skilled jobs.
Intervention: Development and Implementation of a New Secondary School Curriculum.
Intervention questions:
• How are secondary schools preparing students for jobs in the local market? (Descriptive question)
• To what extent are secondary schools making decisions on training areas based on forecasts of market needs, as required? (Normative question)
• How much more are recent graduates getting paid in their jobs than non-graduates? (Weak descriptive question that should be restated as a cause and effect question)
• Are recent graduates receiving higher wages compared to those who dropped out of the program? (Cause and effect question)


Examples of questions used to evaluate a program intervention

Example 1
Intervention: Family clinics provide free immunization against measles to all children under the age of five in three regions of the country in one year.
Evaluation questions:
• How did the clinics reach out to parents and children? (Descriptive question)
• Did 100 percent of all children under the age of five in the three regions receive immunization against measles last year? (Normative question)
• Did the program follow the agreed-upon procedures to reach the children most at risk? (Normative question)
• Has the proportion of children contracting measles decreased over the pre-intervention proportion in the three regions, as compared with other regions without the program? (Cause and effect question)
• Has there been a decline in child mortality from measles-related complications as a result of this program? (Cause and effect question)

Example 2
Intervention: Three secondary schools within three cities implement a market-based curriculum.
Evaluation questions:
• How different is the curriculum from that used by nonparticipating schools? (Descriptive question)
• Was the curriculum market-based, as required? (Normative question)
• To what extent did graduates of these schools obtain high-paying jobs? (Poorly worded cause and effect question)
• What was the proportion of graduates who obtained high-paying jobs, as compared to similar graduates from other schools using the traditional curriculum? (Cause and effect question)


The questions chosen for the evaluation depend upon the information needs of the main client(s) for the evaluation, the amount of time and resources available, and the accessibility of the information needed to answer the questions. Note that evaluation questions can be rephrased to change them from one type to another. Frequently, questions need rewriting to make them clearer.

Relationship of Question Types to Outcome Models

In Chapter 4: Understanding the Evaluation Context and Program Theory of Change, we illustrated the theory of change. Confusion sometimes exists about how questions on outputs and results – outcomes and impacts – fit with the three types of questions. Questions about outputs are typically normative questions: we want to know whether the project or program delivered what was budgeted for, and in what quantity. Questions about the attainment of outcomes are typically cause and effect questions, though they may also be descriptive questions. They reflect the program logic that if we do "X" activities and attain "Y" outputs, then we will see these results. In Figure 6.1, earlier in this chapter, questions about the extent to which the program increases the income and employment of local people would be cause and effect questions. Questions about impacts are almost always cause and effect questions, such as "To what extent did the program contribute to improved family living conditions?" Remember that questions about how much measures of outcome (or impact) changed are either descriptive questions or poorly worded cause and effect questions. If meant to be cause and effect questions, they may need to be rewritten to indicate not just a before-and-after comparison, but also a with-and-without the intervention comparison.


Part III: Identifying and Selecting Questions

At this point, there may be a long list of questions. How does the evaluator decide which are the most important ones to pose? Cronbach (1982, pp. 210-213) suggests two phases for identifying and selecting questions: the divergent phase and the convergent phase. In the divergent phase, a comprehensive list of potentially important questions and concerns is developed; few questions are eliminated, and many sources are consulted. Cronbach (1982) summarizes the divergent phase of planning an evaluation as follows:

The first step is opening one's mind to questions to be entertained at least briefly as prospects for investigation. This phase constitutes an evaluative act in itself, requiring collection of data, reasoned analysis, and judgment. Very little of this information and analysis is quantitative. The data come from informal conversations, casual observations, and review of extant records. Naturalistic and qualitative methods are particularly suited to this work because, attending to the perceptions of participants and interested parties, they enable the evaluator to identify hopes and fears that may not yet have surfaced as policy issues…. The evaluator should try to see the program through the eyes of the various sectors of the decision-making community, including the professionals who would operate the program if it is adopted and the citizens who are to be served by it.

There will come a time when no new questions are being generated. At this point, the evaluator should stop, examine the questions obtained, and begin to organize them. Here is where the evaluator can use the evaluation frameworks prepared earlier.


Classify each question as it fits under the labels of the model or framework. Figure 6.2 shows the theory of change (logic model) example from Chapter 4: Understanding the Evaluation Context and Program Theory of Change for a micro-lending program, with the categories of questions it generates.

Fig. 6.2: Example of Theory of Change of a Micro-lending Program Showing Categories of Questions Generated. [The model links access to start-up funds for small businesses, and financial management advice and support, to income and employment for local people and to skills in business and financial management, leading to reduced family poverty and improved living conditions. Questions can be grouped under each element – for example, questions about access to start-up funds for small businesses, and questions about income and employment for local people.]

Figure 6.3 depicts the theory of change for a training program. The information or labels in the boxes can assist in classifying the questions – for example, questions about resources, questions about services, and so on.

Fig. 6.3: Example of Theory of Change for a Training Program, and Question Generation. [The logic model runs: Inputs (resources: money, staff, volunteers, supplies); Activities (services: training, education, counseling); Outputs (products: total number of classes, hours of service, number of participants completing the course, scores on the knowledge test); Outcomes (benefits as immediate or intermediate results of the intervention: new knowledge, increased skills, new employment opportunities, changed attitudes); Impacts (longer-term changes or goals: trainees earn more over five years than those not receiving training, trainees have a higher standard of living than the control group). Questions can be generated for each element.]
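As a rough illustration of this classification step, the sketch below groups candidate questions under hypothetical logic-model components so that components with no questions stand out. The component labels and questions are invented, not drawn from any actual evaluation.

```python
# Hypothetical logic-model components and candidate evaluation questions.
components = ["inputs", "activities", "outputs", "outcomes", "impacts"]

candidate_questions = [
    ("inputs", "How much was spent on trainers and materials?"),
    ("outputs", "How many participants completed the course?"),
    ("outcomes", "Did participants' business skills increase?"),
    ("impacts", "Do trainees earn more five years on than non-trainees?"),
]

# Group the questions by component so gaps (components with no questions) are visible.
by_component = {c: [] for c in components}
for component, question in candidate_questions:
    by_component[component].append(question)

for component in components:
    questions = by_component[component] or ["(no questions yet)"]
    print(f"{component}: {questions}")
```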


Questions may also be categorized according to the OECD/DAC criteria for evaluating development assistance (OECD/DAC, 2007). As presented in Chapter 1: Introducing Development Evaluation, these are:
• relevance
• effectiveness
• efficiency
• impact
• sustainability
• and sometimes institutional development impact.

We might start by identifying questions that help determine whether the project, program, or policy is relevant, and then those, for example, that address efficiency. In the convergent phase, the evaluator narrows the list of questions generated in the divergent phase. The task is to identify the most critical questions. How does the evaluator decide which questions are most critical? According to Fitzpatrick et al. (2004, p. 241), stakeholders will mostly focus on outcomes. While outcomes are very important, the evaluator needs to look at the entire model and think about which questions might explain the attainment, or lack of attainment, of the outcomes. Questions on inputs, activities, and outputs are potentially as important as those on outcomes and impacts. The evaluator needs to select the most appropriate and central questions. Fitzpatrick et al. (2004, pp. 247-248) propose the following criteria for determining which proposed evaluation questions should be investigated:




Who would use the information? Who wants to know? Who will be upset if this evaluation question is dropped?



Would an answer to the question reduce present uncertainty or provide information not now readily available?



Would the answer to the question yield important information? Have an impact on the course of events?



Is this question merely of passing interest to someone, or does it focus on critical dimensions or continued interest?



Would the scope or comprehensiveness of the evaluation be seriously limited if this question were dropped?

The Road to Results: Designing and Conducting Effective Development Evaluations

Developing Evaluation Questions and Starting the Design Matrix •

Is it feasible to answer this question, given available financial and human resources, time, methods, and technology?

This list of criteria can be put into a matrix to help the evaluator and client narrow the original list of questions down to a manageable set. Figure 6.4 illustrates the type of matrix that can be used to rank or select evaluation questions: each candidate evaluation question forms a column, and each row asks whether that question would (1) be of interest to key audiences, (2) reduce present uncertainty, (3) yield important information, (4) be of continuing (not fleeting) interest, (5) be critical to the study's scope and comprehensiveness, (6) have an impact on the course of events, and (7) be answerable in terms of financial and human resources, time, and available methods and technology.

Source: Fitzpatrick et al. (2004), p. 249.
Fig. 6.4: Matrix for Ranking and Selecting Evaluation Questions.
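One possible way to operationalize such a matrix is to score each candidate question against the criteria and rank the results, as in the Python sketch below. The candidate questions and the yes/no judgments are hypothetical, and scores of this kind would supplement, not replace, discussion with the client and stakeholders.

```python
# Criteria adapted from the matrix above; scores are illustrative yes (1) / no (0) judgments.
criteria = [
    "of interest to key audiences",
    "reduces present uncertainty",
    "yields important information",
    "of continuing interest",
    "critical to scope and comprehensiveness",
    "could affect the course of events",
    "answerable with available resources, time, and methods",
]

# Hypothetical candidate questions with one judgment per criterion.
candidates = {
    "Did vaccination coverage reach the 80% target?": [1, 1, 1, 1, 1, 1, 1],
    "How do staff feel about the new reporting form?": [0, 1, 0, 0, 0, 0, 1],
}

# Rank candidates by the number of criteria they satisfy, highest first.
for question, scores in sorted(candidates.items(), key=lambda kv: sum(kv[1]), reverse=True):
    print(f"{sum(scores)}/{len(criteria)}  {question}")
```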

The evaluator should pay particular attention to the questions of the client (usually the party requesting the evaluation). The evaluator and the client will want to agree on the questions, and it is important to resolve any differences at this early stage in planning the evaluation. Additionally, the process helps the evaluator and client establish a "shared ownership" or "partnership" that can be valuable during later stages of the evaluation.


Part IV: Keys for Developing Good Evaluation Questions

Begin by identifying the major issues addressed by the project, program, or policy. Major issues are generally identified through a review of the related literature, including evaluations of similar programs; development or review of the theory of change; program documents; and discussions with program stakeholders and the client funding the evaluation. Examples of major issues:
• multiple causes of infant mortality
• a competing on-going program, such as a campaign (not related to the program) using the media to increase women's awareness of free health screenings and of better nutrition for mothers through food supplements
• effectiveness of the methods used for outreach to low-income mothers
• the extent and nature of use of food supplements for non-intended purposes.

Next, ask the questions that will help determine whether the issues have been affected by the policy or intervention. Example questions to learn about issues:

• What outreach methods has the program used?
• Which outreach methods have been the most effective?
• What has been the incidence of life-threatening infant diseases during the time the program has been operational?
• How much have mortality rates decreased?
• What other related efforts to improve maternal health have been ongoing?

Compound questions are not good: it is always better to separate the issues and write a separate question for each.

Example of a compound question:
• How many women have received health screenings and nutritional supplements?

The compound question separated:
• How many women have received health screenings?
• How many women have received nutritional supplements?

Questions about an issue can be addressed using all three question types by adjusting the wording.

Example of a key issue:
• Reduce injury and death from land mines.

Examples of question types:
• Descriptive: Where are the majority of accidents involving land mines occurring?
• Normative: Did the project reach the goal of eliminating 1,000 land mines in the area in the given time?
• Cause and effect: Has the number of people injured or killed by land mines decreased as a result of the intervention?

Part V: Suggestions for Developing Questions

The following suggestions can help in writing better questions:
• There should be a clear link between each evaluation question and the purpose of the study.
• The issues of greatest concern should be addressed by the evaluation questions.
• The questions should be answerable; if not, change the question.
• The number of questions that can be answered in a single evaluation must be realistic.
• Focus on the important questions – the ones that must be answered, as opposed to those that would be nice to know.
• Questions should be answerable within the evaluation's timeframe and available resources.
• Lastly, consider the timing of the evaluation relative to the program cycle. Questions about impact, for example, are best answered after the intervention has been fully operational for a few years.


The evaluation questions may relate not only to a project or program, but also to:
• the overarching policy issue
• a specific policy
• and/or a specific intervention associated with a policy.

For example, if the overall concern (the policy issue) is reducing poverty, a number of program interventions may be launched. Each policy gets translated into action – an intervention designed to achieve specific objectives. Ultimately, if the policy and the interventions are carried out effectively and the theory of change is correct, then the overall outcomes should be attained. If not, both the interventions and the policy need to be reassessed; one, or both, may need to be changed. For instance, poverty (an overarching policy issue) is caused and perpetuated by many problems. Policymakers take actions that they believe will reduce poverty, based on what they believe are its most important causes. One policy to address the poverty problem might be to ensure that young unemployed people are given the knowledge and skills necessary to obtain high-wage, skilled jobs. The policymakers' implicit "theory" is that, if unemployed young people can obtain good-paying jobs, poverty will decline. It may also be true that, if skilled workers are available, new businesses might choose to locate in the city, thereby creating more employment opportunities. To translate the policy into action, money is appropriated to test a job skills training intervention in three community-based organizations. Policy questions will focus on overall outcomes:
• Do graduates of the community-based training programs obtain higher-skilled jobs and higher wages compared to those who attend other programs for the unemployed and those unemployed during the same period who receive no program at all?
• Is poverty reduced over time?

Evaluators charged with assessing the intervention are likely to ask questions about the intervention itself:
• Does it teach the skills needed by employers, and at the right level?


Ideally, the evaluation would also compare the jobs and wages of graduates of the community-based training with those obtained by program drop-outs. Exploratory questions would also be asked about program delivery, such as:
• To what extent do the programs focus on the skills and information required by the marketplace?
• Do businesses believe the graduates of this program are more highly skilled than other prospective hires?

There may also be a problem with the theory of change. Perhaps, for example, the problem is not the shortage of a skilled workforce, but rather a lack of jobs needing a highly skilled workforce. The policy may be aimed only at the supply side, when it might be more effective in addressing the demand side as well.

Part VI: Evaluation Design

Much as an architect designs a building, an evaluator designs an evaluation. An evaluation design is the plan for what the evaluation will include; it is not the full work plan for the study. An evaluation design consists of the major issue or question the evaluation is to address, the general evaluation approach to be taken, the specific evaluation questions and sub-questions, the operationalization (measures or indicators), the data source(s), the methodological strategies for the type of data collection to be used, the analysis planned, and the dissemination strategy. Patton (1987, pp. 44-45) describes two kinds of issues to consider in evaluation design: conceptual issues and technical issues. Conceptual issues focus on how the people involved will think about the evaluation – for example, determining the purpose of the evaluation and its primary stakeholders, as well as the political issues that should be taken into account. We addressed these issues in Chapter 4: Understanding the Evaluation Context and Program Theory of Change, and in this chapter. Patton describes the technical design as a plan for data collection and analysis. Technical design issues for Patton include, for example, determining the sampling strategy and the kinds of data to be collected. These technical issues, which define the specifics of the evaluation, comprise the heart of the design matrix.


For each question (or sub-question, if sub-questions are used) the design matrix requires:
• determining the type of question or sub-question being asked – descriptive, normative, or cause and effect
• specifying the measure (indicator or determinant) by which the question or sub-question will be answered (for example, percent growth in local housing or number of children vaccinated); sometimes this is an agreed-on indicator with a clear target. In any event, the presence or absence of a baseline must be indicated
• identifying a methodological strategy or design that will provide information appropriate for answering the descriptive, normative, or cause and effect question
• identifying the source(s) of data for each question or sub-question
• determining whether a sampling framework is needed and, if so, what kind will be used
• selecting data collection instruments for each question or sub-question
• identifying how the data will be analyzed and presented.

We refer to this process as the evaluation design process. The completed evaluation matrix represents the evaluation design. This is not the complete work plan: it does not indicate all the tasks, or who will perform each task and when. For example, data collection instruments need to be developed or adapted, tested, and refined; data collectors must be identified, possibly hired, and trained. The complete work plan is covered in Chapter 12: Managing for Quality and Use.

The Evaluation Design Process

Ideally, the evaluation process begins (ex ante) with the initial program design. As the actual evaluation is developed, there are several distinct and important stages (see Figure 6.5).


Initial Planning or Scoping
• Get a thorough understanding of the program, project, or policy:
  − Meet with the main client for the evaluation
  − Identify and meet with other key stakeholders
  − Explore the program context and gather background materials
  − Search for related, relevant evaluations
  − Review prior evaluations to identify the issues raised and the designs and data collection strategies used
  − Meet with program staff (if external to the program)
  − Review and refine or develop the theory of change for the program

Designing
• Determine the questions and issues:
  − Meet with the client and identify the main purpose of the evaluation, issues of concern, and critical timing needs
  − Identify and meet with other key stakeholders to identify issues and concerns for possible inclusion in the evaluation
  − Determine the general resources available for the evaluation, such as the budget for consultants and travel, and the team members and skill mix
  − Assess stakeholders' needs, including timing
• Prepare the terms of reference and the evaluation design matrix:
  − Identify the type of evaluation
  − Identify specific evaluation questions and sub-questions
  − Select measures for each question or sub-question
  − Identify data sources for addressing each question or sub-question
  − Identify an appropriate design for each question or sub-question
  − Develop the data collection strategy, including instruments and sampling if needed, again by question or sub-question
  − Develop the data analysis strategy
  − Determine what sampling strategy, if any, is needed
  − Determine resource and time requirements

Doing
• Brief the client and key stakeholders on the evaluation design
• Prepare the work plan, including reviewing and testing the methodology (pre-testing instruments, training data collectors, and developing protocols)
• Gather the data
• Prepare the data for analysis:
  − Develop table shells (if not done as part of the evaluation design)
  − Clean the data
• Analyze the data
• Develop graphics
• Formulate the findings

Reporting
• Hold a story conference to identify themes
• Identify the major findings: what works, what does not, and what needs improvement
• Write the report
• Brief the client on the findings and statements of fact
• Brief program officials on the findings and statements of fact, and make corrections as needed
• Determine who will receive what kind of study product (briefing, two- to four-page summary, full report, in-depth workshop, etc.)
• Have program officials review the draft report and comment; they may suggest recommendations
• Develop recommendations that are clear and specific, indicate who should do what and when, and are linked to the evidence

Disseminating
• Product: written report to the client, briefings
• Process: contributes to learning
• Request that program officials develop a plan for implementing the recommendations (if any)

Fig. 6.5: The Evaluation Process.


The initial planning or scoping phase clarifies the nature and scope of the evaluation. During this phase, some decisions are made: for example, the main purpose of the evaluation, the stakeholders to be consulted, who will conduct the evaluation, and the time frame for the results. It is an exploratory period. Key issues are identified from the perspectives of the main client and other stakeholders, from the literature review, and from mapping of related interventions that may influence the program and its results. The theory of change and the assumptions underlying it are developed or refined. At the end of the initial planning or scoping phase, there should be enough knowledge of the context for the evaluation that a general approach can be decided. The heart of evaluation planning is the evaluation design phase, which culminates in the evaluation design matrix. This is the framework that specifies the questions and sub-questions, the strategy for answering each question, the measures, the data collection techniques and instruments, the sampling to be undertaken, and the data analysis to be conducted. If the overall design is flawed, it will limit the ability to draw conclusions about the performance of the intervention. Generally, it is good practice to present and discuss the overall design with the evaluation sponsor (client) and other key stakeholders beforehand. This ensures that there are "no surprises" and builds buy-in and support for the evaluation. An advisory group and peer reviewers are also good sounding boards for ensuring the soundness of the evaluation design. The design matrix can be used as the basis for a terms of reference (TOR). The TOR may serve as the basis for a request for proposals, or as a guide for the evaluation team if the evaluation is done internally. In some cases, when there is to be an external evaluation, a TOR is written requesting the consultants to produce the evaluation design matrix. When the scoping and background work for the evaluation are done by an external consultant, it is wise to have a TOR solely for production of the evaluation design; based on the design produced, another TOR can then be developed for its implementation. The doing phase relates to the actual gathering and analysis of the data. If different kinds of data are to be collected, different instruments must be developed and tested. Analysis is often interactive with data collection, rather than sequential.


About two-thirds of the way through data collection, the evaluation team should hold a "story conference" to identify emerging themes and main messages. It is useful as part of this process to think of the three to five major messages one would give to the agency or ministerial head, or to the press. These messages or themes should provide the basis for a report outline, and later the initial set of findings should be organized around them. The purpose of the story conference is to ensure early agreement on the major themes and to check that the main issue or question behind the evaluation has been addressed. In the reporting phase, initial findings or statements of fact can be shared and discussed with the program "owners" so that any factual errors can be corrected and any new information considered before a report is drafted and recommendations are developed. Once the analysis is completed, the results are written up, drafts are reviewed, comments are incorporated as appropriate, and a final report is presented to the client and key stakeholders. The report will typically provide background and context for the evaluation, the purpose of the evaluation, a description of the evaluation's scope and methodology, and the findings (including both intended and unintended outcomes). The report generally also includes information about lessons learned and recommendations. Understanding what works well and why is as important as understanding what does not work well and why; both should be clear. The report should be written with the audience in mind; it should be free of jargon and easy to read. This is discussed further in Chapter 11: Presenting Results. Finally, a good evaluation design will include a dissemination process. As indicated above, not all reporting should come at the end of an evaluation, nor is it always in the form of printed material. Verbal briefings are useful for communicating findings earlier, especially when there are unexpected or critical findings.


A good evaluation is used, and this may mean that a plan is developed to implement its recommendations. Many evaluations result in action to:

• modify the intervention
• remove barriers
• inform future policy or interventions (modify the theory of change)
• show others the way
• reshape people's thinking about the nature of the problem.

The relationship between the different components can be seen in Figure 6.6. The process typically is not linear; there is often some back and forth between the elements.

Fig. 6.6: Approach to Development Evaluation. The figure links five components, with the main activities under each:

Focus the Evaluation
• Identification and meeting with stakeholders
• Purpose – meeting with the client
• Research – other studies and program documentation
• Theory of Change
• Specification of evaluation questions
• Terms of Reference

Design & Methodology
• Evaluation questions
• Measurement strategy
• Data collection design
• Data collection strategy
• Sampling strategy
• Develop data collection instruments
• Develop analysis plan

Gather & Analyze Data
• Test instruments, develop protocols, train as needed
• Gather data according to protocols
• Prepare data for analysis
• Analyze and interpret data
• Hold message conference
• Draft statement of findings

Report Findings
• Write report, review and quality checks
• Make recommendations, incorporate feedback/refine
• Brief client and stakeholders
• Deliver

Use Evaluation
• Develop communication strategy, brief on evaluation design, update brief on evaluation progress
• Communicate findings, feedback, decision-making
• Action plan, follow-up, recommendations, tracking


The Evaluation Design Matrix

An evaluation design matrix is another organizing tool to help plan an evaluation, and one we highly recommend. The matrix organizes the evaluation questions and the plans for collecting the information to answer them, linking descriptive, normative, and impact evaluation questions to the design and methodologies. The evaluation design matrix is a very simple tool, but it can play a powerful role in planning and implementing an evaluation.

Which tool an evaluator uses to think through a program, its context, measurable objectives, and data collection and analysis strategies will vary; some evaluators may decide to create their own. The point is that the evaluator needs some tool to identify the necessary pieces of the evaluation, to ensure that they connect, and to make the connections clear at every step. The purpose of the design matrix is to organize the evaluation purpose and questions and to match what is to be evaluated with appropriate data collection techniques. Although there is no hard and fast rule, a design matrix usually includes the following linked elements:

• main evaluation issue
• general approach
• questions
• sub-questions
• type of (sub)question
• measure or indicator
• target or standard (if normative)
• presence or absence of baseline data
• design strategy
• data sources
• sample or census
• data collection instrument
• data analysis and graphics
• comments.

Data collection protocols, evaluation work assignments and schedules, terms of reference, and communication plans may be added to the matrix or kept as separate but linked tools.
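One convenient way to keep the matrix rows easy to update as the evaluation progresses is to store them in a simple machine-readable form. The following is a minimal sketch in Python; the field names and the single illustrative row are hypothetical, not taken from any actual evaluation, and the text does not prescribe any particular format.

```python
import csv

# Columns of the evaluation design matrix, one dict per question/sub-question row.
COLUMNS = [
    "question", "sub_question", "type",              # descriptive, normative, or cause and effect
    "measure_or_indicator", "target_or_standard", "baseline_data",
    "design_strategy", "data_source", "sample_or_census",
    "data_collection_instrument", "data_analysis", "comments",
]

# Hypothetical example row, for illustration only.
rows = [{
    "question": "Did the program reach rural women?",
    "sub_question": "How many women received services in year 1?",
    "type": "descriptive",
    "measure_or_indicator": "number of women served",
    "target_or_standard": "",            # filled in only for normative (sub)questions
    "baseline_data": "no",
    "design_strategy": "one-shot review of records",
    "data_source": "program records",
    "sample_or_census": "census",
    "data_collection_instrument": "record review sheet",
    "data_analysis": "frequencies",
    "comments": "",
}]

# Write the matrix to a CSV file that the team can review and revise.
with open("design_matrix.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

Keeping the matrix in one shared file (rather than scattered notes) makes it easier to review, update, and use as a guide during implementation, as described below.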


Although each evaluation question is unique, the same data collection method may address more than one question, and several data collection methods may be used to address a single question. The design matrix incorporates both known and planned sources of information; as the process moves from planning to implementation, sources can be expanded and clarified. Beyond its immediate usefulness as a planning tool, the matrix can promote use of the evaluation and cooperation between evaluators and program staff.

Figure 6.7 shows an example of an evaluation design matrix. Note that the matrix template is worked horizontally. As we proceed through later chapters, we cover adding information to this matrix. The evaluation matrix is not cast in stone; like any other planning tool, it will almost certainly need modification as the evaluation progresses. During the evaluation, evaluators can review the matrix, update it, and use it as a guide for implementing the evaluation. While up-front planning should minimize the surfacing of problems, even the best planning cannot prevent surprises.


Fig. 6.7: Evaluation Design Matrix template (blank). The template is headed "Design Matrix for: ___" and provides space to record the main evaluation issue and the general evaluation approach, followed by one row per question with columns for: Questions; Sub-Questions; Type of (Sub)Question; Measures or Indicators; Target or Standard (if normative); Baseline Data?; Data Sources; Sample or Census; Data Collection Instrument; Data Analysis; and Comments.

Summary

Evaluators need to work with the main client and key stakeholders to identify possible questions. After completing the background research, meeting with the client and key stakeholders, developing the program theory of change, and identifying the major assumptions underlying the program, the evaluator can begin selecting evaluation questions from the long list of those generated. Evaluation questions should be checked against the major evaluation issue to ensure that it is being addressed. Evaluators will likely use descriptive questions, normative questions, and cause and effect questions. The wording of each question is important because it helps determine the means for finding the answer.

The recommended way to organize the evaluation is to use a design matrix. The matrix helps organize questions, design, and data collection and analysis strategies, among other things. The following chapters of this textbook are a step-by-step guide to completing the design matrix.


Chapter 6 Activities

Types of Questions

Instructions: Identify whether each of the following questions about a rural women's preventative health initiative is descriptive, normative, or cause-and-effect. Do some questions need to be rewritten to make their type clearer? How would you rewrite them? This activity is important so that you use the kinds of questions you want and ones for which you will be able to collect data.

1. Did the health initiative provide the required advice, support, and other services to 30 rural women in its first month of operation?
2. Were the services associated with the initiative delivered at a location and time that maximized the number of women who could participate?
3. What were the best methods for reaching women in remote areas and making the program accessible to them?
4. Are health problems among rural women detected earlier among those who participated in the women's health initiative?
5. Since its inception, how many women have received what types of services?
6. How effective is the women's health initiative compared to other interventions for improving the health of rural women?
7. What is the impact of the health initiative on the women, their families, and the wider rural community in which they live?
8. How satisfied are participants with the advice, information, support, and other services they receive?
9. Is the rural women's health initiative meeting the government's required efficiency standards?
10. What do participants say are the impacts of the program on them?
11. To what extent did women receiving services meet eligibility requirements?
12. Did the program meet its objective of increasing women's knowledge of preventative health techniques?


Modifying Question Types

Instructions: For each of the following situations, write three different questions: one descriptive, one normative, and one cause-and-effect question. You should have nine questions when you are done.

1. A program to provide vocational training and job training to young men.
2. A program to build roads linking three communities to a central market.
3. A program to improve corporate governance in private sector companies.




Chapter 7
Selecting Designs for Cause and Effect, Normative, and Descriptive Evaluation Questions

Introduction

After determining the evaluation questions, the next step is to select the evaluation design approach that is most appropriate for each question. This chapter presents some guidelines, along with the strengths and weaknesses of various design options, but it is important to keep in mind that every situation is unique. There is no "one and only" way to address an evaluation question.

This chapter has six sections:

• Connecting Questions to Design
• Experimental Design for Cause and Effect Questions
• Quasi-experimental Designs and Threats to Validity for Cause and Effect Questions
• Designs for Descriptive Questions
• Designs for Normative Questions
• The Gold Standard Debated.


Part I: Connecting Questions to Design

When we evaluate, we seek to answer questions. The last chapter identified three main types of questions (descriptive, normative, and cause and effect) and noted that the type of question relates to the design selected. Consider the following story from a Southern African context.

Elephants in the Village

Every year, elephants came through the village. The villagers knew they would come, but did not know what day to expect them. Every person had their own strategy to drive the elephants away. Some banged pots and pans; others whistled, shouted, or screamed, each making lots of noise. Others kicked up dust and moved around, establishing their ownership of the land. They were all involved in driving the elephants away, and they celebrated when the elephants changed direction and moved off. One small girl (who was destined to become an evaluator) asked them, "Why did the elephants leave?" The villagers said, "Because we drove them away." The girl asked another question: "But what is it that made them leave, the sound of the whistle or the dust in the air?"

Development organizations seek solutions to questions about development issues, just as the young girl is seeking to learn what caused the elephants to leave. But in attempting to answer questions, they have not always taken the right steps.

The first misstep is starting to address the question by choosing a strategy for collecting data. To answer the young girl's question about the elephants, the village elder, who is quite experienced with survey techniques, might say: "Let's do a survey and find out what the villagers say made the elephants leave." Leading with a data collection strategy is almost certainly not going to provide the information needed. Often there is a push and a rush to get an issue addressed, and the leap is made. Will knowing how many villagers think it was the banging of pots that made the elephants depart answer the young girl's question? Not really.

The second misstep is to think that each evaluation has a single design. Typically, an evaluation seeks to address several questions, and each question needs an appropriate design. An evaluation will usually need to address descriptive and normative questions and sometimes cause and effect questions. The evaluator needs to avoid the "method in search of an application" approach, that is, asking what questions can be answered by doing a multiple regression or focus groups, or thinking that if one is addressing a cause and effect question, one does not also have to address descriptive and normative questions. Perhaps an in-depth case study of the elephants' movements would show that neither the pot banging nor the dust made the elephants leave the village: the village was simply in the elephants' migration path, and they were moving through.


Broad Categories of Design

Evaluators have three broad categories of designs from which to select:

• experimental
• quasi-experimental
• non-experimental.

Experimental Design

An experimental design may also be called a randomized or true experiment design. Experimental designs are often called the evaluation "gold standard" and are considered the most "rigorous" of all designs. When asking a cause and effect evaluation question, the evaluator is attempting to determine whether an intervention is the cause of the desired result. For example, using the elephants in the village, a question might be: does the villagers' banging on pots and making noise cause the elephants to leave the village? In a true experimental design, evaluators must also show that if the intervention does not occur, the desired result does not happen. Using the same example, the evaluator must also show that if the villagers had not banged on the pots, the elephants would not have left the village.

To show that the intervention is the cause, an experimental design uses two groups, one that receives the intervention and one that does not, and then compares the results. The main criterion for distinguishing experimental designs from other designs is random assignment to groups. To compare groups, the groups should be as "equivalent" as possible: similar in composition, background, gender, context, time frame, and so on. Using an experimental design, people from a common pool are randomly assigned to one group or the other before the intervention is introduced. The crucial point is that the members of each group are assigned randomly (Trochim & Land, 2006, Experimental Design). For example, all of the names of the villagers could be written on slips of paper and placed in a bowl. To select the group that receives the intervention, the village elder might pull out the names one at a time: the first name going to one group, the second to the other group, the third to the first group, the fourth to the second group, and so on. In this way, the groups are randomly assigned.
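The paper-slips lottery can also be reproduced programmatically. The sketch below is a minimal illustration in Python; the villager names are hypothetical, and shuffling the list plays the role of mixing the slips in the bowl.

```python
import random

def random_assignment(names, seed=None):
    """Randomly split a list of names into a treatment and a control group,
    mimicking the draw-from-a-bowl lottery described in the text."""
    rng = random.Random(seed)
    shuffled = names[:]          # copy so the original list is untouched
    rng.shuffle(shuffled)        # equivalent to mixing the slips in the bowl
    treatment = shuffled[0::2]   # 1st, 3rd, 5th, ... names drawn
    control = shuffled[1::2]     # 2nd, 4th, 6th, ... names drawn
    return treatment, control

# Hypothetical villager names, for illustration only.
villagers = ["Amina", "Buhle", "Chipo", "Dumi", "Esi", "Farai"]
treatment_group, control_group = random_assignment(villagers, seed=42)
print("Treatment:", treatment_group)
print("Control:  ", control_group)
```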


Quasi-experimental Design

Quasi-experimental design is similar to experimental design but does not use random assignment to groups. A quasi-experimental design can compare groups that are similar but not equivalent. The groups may exist in different but similar villages, or the design may use the same group at different times (before and after). For example, for the elephants in the village story, a quasi-experimental design might choose two similar villages: from the same region, with the same climate, the same number of elephants in the area, a similar number of villagers, the same number of homes, and so on. One village would receive the treatment (banging pots) and the other would have no intervention. Alternatively, the two groups might be defined by time, before and after. The groups would contain the same villagers in the same village, but before and after the intervention: the first time the elephants arrive, the villagers respond with the intervention, banging pots; the second time the elephants arrive, they do not.

In both examples, the two groups are similar but not equivalent. In the first example, the people and the environment have differences that might affect the results. In the second example, the timing, before and after, is the difference between the two situations. With a quasi-experimental design, evaluators cannot definitively link the intervention to the result or demonstrate a cause and effect relationship, but they can learn more about what the likely cause is. Quasi-experimental designs can also be more feasible than experimental designs.

Non-experimental Design

Evaluations that do not compare groups are called non-experimental designs, or sometimes descriptive designs. Non-experimental designs provide an extensive description of the relationship between an intervention and its effects. In a non-experimental study, the evaluator seeks a representative sample, not two equivalent or similar samples. A non-experimental evaluation might use an analysis of existing data or information, a survey, or a focus group. The evaluator chooses a representative sample and then focuses specifically on what is to be studied within that sample. Non-experimental designs look at identifying characteristics, frequency, and associations (Project STAR, 2006, pp. 5-6).


Table 7.1 summarizes the broad characteristics of each of the three design categories.

Table 7.1: Comparison of Broad Design Categories.

                        Control Group    Random Assignment
Experimental            Yes              Yes
Quasi-experimental      Yes              No
Non-experimental        No               No

Design Notation

Evaluation designs are sometimes represented using X's and O's. In these representations, an X represents an intervention or treatment and an O represents an observation. Each treatment and observation is given a subscript to identify it. For example, the following is the notation for an evaluation design with one treatment followed by one observation:

X O1

The following is the notation for an evaluation design with an observation, followed by the treatment, followed by two observations:

O1 X O2 O3

Each group in the design is given a separate line. If the evaluation design has two groups, there will be two lines in the notation. The following is the notation for an evaluation design with two groups, one that receives the treatment and one that does not. Both groups are observed once before the treatment and twice afterward:

O1 X O3 O5
O2   O4 O6


Part II: Experimental Designs for Cause and Effect Questions

Cause and effect questions pose the greatest challenge; this is where a well-thought-out design is needed (as opposed to a particular approach to answering a relatively straightforward descriptive question). In any evaluation with questions involving cause and effect, or impact, the evaluation design attempts to rule out feasible explanations for the observed results other than the intervention, in order to conclude that it was the intervention that made the impact. In other words, we try to be sure that any observed changes can be attributed to the intervention rather than to something else.

When addressing cause and effect questions, the evaluation design needs to address the question: "What would the situation have been if the intervention had not taken place?" It is not possible to observe this directly, but it is possible to estimate what might have happened, for example by constructing a comparison group of non-participants in the program.

The experimental model has its roots in medical research, where it is often used to test drugs and treatment protocols. In applying this design to a health-related evaluation, a cause and effect question is asked. For example, a development organization wants to lower the incidence of malaria in a region. It poses the question "What is the best way to reduce the incidence of malaria in the region?" A sub-question might be "Do treated bed nets reduce the incidence of malaria in the region?" The experimental design takes the question and turns it into a proposition. In the malaria example, the proposition might be: if people in the region are given treated bed nets, then there will be fewer cases of malaria in the region. As presented in Chapter 4: Understanding the Evaluation Context and Program Theory of Change, a theory of change might be developed for the proposition.

Randomization is the essential factor in a medical experimental design. For example, a group of patients might be selected from those volunteering for randomized trials who have the same stage and type of disease, same gender, and so on. Individuals are then assigned by chance, as in a lottery, to one of several drug regimens. One sub-group might take the drug that represents current knowledge; another sub-group might receive a promising new drug. Not only are the individuals themselves unaware of which drug they are receiving, but the medical staff are unaware as well.


In the malaria example, households might be randomly assigned to Group 1 and Group 2. Households in Group 1 might receive no treatment, while households in Group 2 might receive treated mosquito bed netting.

One of the trends in development evaluation is towards using experimental designs. The movement comes from frustration with the lack of knowledge, despite many years of evaluation, of what works in the development context and under what conditions. We will return to this discussion later in this chapter. Our immediate point here is that experimental designs are generally used to address cause and effect questions. A classic experiment has six steps:

• formulate a hypothesis
• measure the dependent variable (i.e., obtain a baseline)
• randomly assign cases to the intervention and non-intervention (i.e., control) groups
• introduce the treatment or independent variable in the intervention group
• measure the dependent variable again (post-test)
• calculate the differences between the groups and test for significance.

Again, using our malaria example, the question is whether treated bed netting reduces the incidence of malaria in a region. The six steps might be:

• hypothesis: households' use of bed nets treated with mosquito repellent will reduce the incidence of malaria
• establish the baseline (dependent variable): determine the number of cases of malaria in the region over a two-month period
• random assignment: using a lottery, households in the region are randomly assigned to treatment and non-treatment groups
• introduce the treatment: one group is given bed nets (treatment group) and the other is not (control group)
• measure again: after two months, determine the number of cases of malaria in each group during those two months
• calculate the difference between the two groups and test for significance (a small sketch of this final step follows the list).
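The final step can be written out directly. The sketch below uses made-up counts of households with and without malaria cases over the two months; the chi-square test shown is one common choice for comparing proportions, not a method prescribed by the text.

```python
from scipy.stats import chi2_contingency

# Hypothetical two-month results: [households with a malaria case, households without].
treatment = [18, 482]   # households given treated bed nets
control = [45, 455]     # households without bed nets

# Build the 2x2 table and test whether the difference could plausibly be chance.
chi2, p_value, dof, expected = chi2_contingency([treatment, control])

rate_t = treatment[0] / sum(treatment)
rate_c = control[0] / sum(control)
print(f"Incidence, treatment group: {rate_t:.1%}")
print(f"Incidence, control group:   {rate_c:.1%}")
print(f"Difference: {rate_c - rate_t:.1%}, chi-square p-value: {p_value:.4f}")
```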


The design notation for the classic experiment with two treatment groups and one control group looks like this:

O1 X1 O4
O2 X2 O5
O3    O6

In some cases, evaluators also place an R at the start of the line for each group to show that the group was randomly assigned. For an experimental design, the notation would look like this:

R O1 X O3
R O2    O4

Humans are complex, however, and social situations are difficult to analyze. The experimental design works well when testing drugs on human bodies, which are relatively similar. But when looking at complex human behavior we need, even in this simple example, to be aware that we may be seeing false positives or false negatives. False positives occur when the study indicates that the intervention caused the effect when it actually did not. False negatives occur when a study shows no link between an intervention and success when the intervention actually did produce the success.

For example, we might get a false positive because household data are self-reported: the region is quite aware of the experiment, and the treatment households under-report cases of malaria to please government officials. No real positive has occurred. False negatives may be more common. We may find no difference between the two groups because our post-intervention two-month measurement period fell in the dry season, with little incidence of malaria for either group. Or a false negative might occur because treatment households failed to use the nets every night, or because those in the non-intervention group purchased nets themselves, not wanting to risk malaria when there was an easy means of prevention.

If the study asked no additional questions, it would be hard to interpret results even in this simple random-assignment intervention. It would have been helpful in explaining the results, or lack thereof, if the evaluators had also asked, for example:

• What information did households receive on the use of the bed netting?
• How was the "household" determined?
• What implementation issues were raised by the intervention and non-intervention groups?
• How did the incidence of malaria in the region compare historically for the two two-month periods?
• Were any other malaria prevention efforts going on in the region at the time of this intervention?

Notice that these are all descriptive questions that could use simpler designs.

Control Groups

An experimental design attempts to rule out or control for other factors that may be competing explanations for the results of the "experiment". When using an experimental design, evaluators compare equivalent groups. Using control groups allows groups that were exposed to the intervention to be compared with groups that were not.

• The group that receives the treatment may be called the treatment group.
• The group that does NOT receive the treatment is exposed to the usual conditions and may be called the control group.

If an intervention causes change, then those in the treatment group will show more change than those in the control group. For example, suppose a country has had an outbreak of skin infections. A development organization in the health sector learns that a pharmaceutical company has developed a new ointment to treat skin infections and wants to see whether it will work on the skin infections in the country. The organization designs an intervention to see if the ointment heals the skin rash. People with the rash are placed into two groups: the control group does not have the ointment applied to the rash, while the treatment group does. If those receiving the new ointment show more improvement in the rash than those without it, then the intervention (applying the ointment) might be the reason for the improvement.

Sometimes the control and treatment groups receive the same intervention, but the treatment group receives it first. In the ointment example, the treatment group uses the ointment first; if its members seem to be improving more than those without the ointment, the ointment can then be given to the control group as well.


However, not all other factors have been ruled out. What if the people who received the ointment differed in some important ways from the people who did not? Maybe the control group was in an area with polluted water, while the treatment group lived on a lake with fresh, clean water. Or maybe some of the people in the treatment group also applied a locally grown herb to the rash.

Control groups often involve withholding an intervention from some of those in need. Sometimes withholding the intervention is justified because there are not enough resources to serve all those in need. In other cases the intervention is unproven, so it is uncertain whether something of value is being withheld. It can be politically difficult to explain why a group of people in need is being denied an intervention (Patton, 2008). Case 7-1 shows another example of using control groups.

Case 7-1: Impact of Job Training Programs for Laid-Off Workers

Many developing countries face the problem of retraining workers when state-owned enterprises are downsized. In one such case, a training program was put into place and subsequently evaluated. Evaluating training programs is challenging because they frequently have several components serving different constituencies, and several ways to measure outcomes: employment, self-employment, monthly earnings, and hourly earnings.

Questions: Are participants more successful in re-entering the labor market than non-participants? What is the cost-effectiveness of the different training programs?

Comparison Groups: Participants receiving training were matched with a similar group of non-participants. Administrative data and survey data were used, and statistical techniques were used to measure the impact of the training program. However, it is possible that the people who participated were different in some way that made a difference to their outcomes. Maybe they were more motivated or had more job experience, so it was easier for them to find new employment. To strengthen this design, it would be necessary to randomly assign those eligible to participate or not participate in the program, so as to eliminate the possibility of differential attributes of the two groups.


Random Assignment

Experimental design involves randomly assigning potential program participants to intervention and non-intervention groups to maximize the probability that the groups are equal on factors that could influence the program or intervention results. These could be age, gender, education, attitudes, past history, villages, roads, schools, farms, and so forth. Think of a new drug treatment where patients are randomly assigned to the new drug, current alternative drug therapies, or a placebo. Patients and the health-care providers do not know who is getting which drug. Through random selection, some eligible participants are assigned to a group receiving new drug A, others are randomly assigned to a group continuing with the old drug B, and others, also randomly selected, take a placebo.

In the ideal world, we would be able to decide randomly who will receive (or not receive) the intervention. A problem in evaluation is to identify a credible control group. One way to do this is by allocating the project or program resources in a random manner. The project or program beneficiaries are then a random sample of the population as a whole. This sample can then be compared with another randomly drawn sample of non-beneficiaries of the project or program (the control group) (White, 2007, p. 11). In this situation, random assignment may not only enable the use of a strong design to measure impact, it may also be more equitable; no bias or favoritism is in play when assignment is based on chance.

Although random assignment is more applicable to development interventions than may be thought, it is not always an option. Sometimes all eligible people are to receive the intervention, and/or it would be unethical to withhold it from some of them. It is possible, however, to be in a situation where the intervention is not large enough to accommodate all those who apply to participate. A manager might want to assign those with the best chance of benefiting from the intervention to participate; this may be a way to get the most benefit from limited program dollars. For example, a manager of a training program understands that choosing highly motivated people to participate is likely to yield the best results. However, from an evaluation perspective, if the best people are assigned to the program, there is likely to be a bias in the results. In a different design, the manager might take a random sample from the motivated people: within the group of motivated people, a sub-group can be selected randomly to receive the intervention and compared with those who do not.


When random selection is not possible, one option is to collect data about factors that might differ between the two groups and that seem likely to affect outcomes. These variables are then built into the data analysis as control variables. Using control variables allows us to rule out some alternative explanations even when random assignment is not possible.

When selecting groups, consider the problem of selection bias. With selection bias, a difference between participants and non-participants may be due to unobserved differences between the groups rather than to the effects of the intervention. The process of randomization ensures that, before the intervention takes place, the treatment and control groups are statistically equivalent, on average, with respect to all characteristics. Randomized experiments solve the problem of selection bias by generating an experimental control group of people who would have participated in a program but who were randomly denied access to the program or treatment. The random assignment does not remove the selection bias but instead balances it between the participant (treatment) and non-participant (control) groups, so that it cancels out when calculating the mean impact estimate. Any differences in the average outcomes of the two groups after the intervention can then be attributed to the intervention (World Bank, 2008, Evaluation Design, Selection Bias).

Selection bias occurs in two ways:

• self-selection of participants (those who want the intervention choose to be involved)
• program managers selecting the participants most likely to succeed.

For example, consider again the treated bed netting program. If the program intervention was to introduce treated bed nets into the market at a very low cost, there would be selection bias because only those who purchased the bed netting would be in the treatment group. Those who could not afford the netting, or who did not learn about it, would not be in the treatment group. Thus the groups are not equivalent. It would also be very difficult to compare the treatment and control groups because there may be no record of who uses bed netting and who does not.
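A small simulation can make this concrete. In the hypothetical sketch below, wealthier households are assumed to be both more likely to buy nets and less exposed to malaria in the first place; even when the nets are given no effect at all, a naive comparison of buyers and non-buyers shows a large "benefit". All numbers are invented for illustration.

```python
import random

random.seed(1)
households = []
for _ in range(10_000):
    wealthy = random.random() < 0.4
    buys_net = random.random() < (0.7 if wealthy else 0.2)   # self-selection into "treatment"
    # Assume the net itself has NO effect; only wealth changes baseline malaria risk.
    risk = 0.05 if wealthy else 0.20
    malaria = random.random() < risk
    households.append((buys_net, malaria))

def incidence(group):
    return sum(case for _, case in group) / len(group)

buyers = [h for h in households if h[0]]
non_buyers = [h for h in households if not h[0]]
print(f"Incidence among net buyers: {incidence(buyers):.1%}")
print(f"Incidence among non-buyers: {incidence(non_buyers):.1%}")
# The gap is due entirely to who chose to buy, not to the nets themselves.
```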


In development evaluation, many believe that they cannot use these experimental designs because it is unethical to withhold the "best" alternative from a group of participants, or because political decisions about who gets the intervention prevent randomization. The World Bank and others have shown, however, that randomization is more frequently possible than perhaps thought, since there are often not enough resources for an intervention to be delivered to all eligible participants initially (Boruch, 2004). For example, a new textbook and curriculum reform may be introduced only in some districts initially. The key is randomly assigning districts to the intervention and non-intervention groups so as to systematically test for eventual differences in academic performance.

To reduce the possibility of being fooled into thinking we know something to be true when it is not, we can borrow from social science methods. Using experimentation, it is possible to control the implementation of a program, policy, or project, who receives it, and the environment in which it is delivered. When the evaluation can reasonably control everything but the intervention, evaluators can be fairly certain that any observed differences are the result of that intervention.

Suppose the evaluation is of an intervention using a fertilizer intended to increase the crop yield of corn. The intervention has a greenhouse where the villagers control the temperature, water, and soil conditions. As part of the evaluation design, two separate growing areas are created within the greenhouse, and seeds are randomly assigned for planting in the test area or the control area. Both areas receive the same temperature, sunlight, and water and are planted in exactly the same soil mixture. The test area receives the fertilizer and the control area does not. At harvest, the yields are measured. If the test area has a higher yield than the control area, the evaluator can conclude that the fertilizer made a difference.

Now consider what happens when the intervention operates in the field instead of the controllable environment of the greenhouse. What happens if the two areas are close together and the fertilizer runs off into the non-test area, giving an imperfect measure of the impact of the fertilizer? The non-test area could be moved to a different part of the field, but then the soil, light, temperature, or rain may be slightly different. The two fields may also receive different amounts of attention. While the evaluation can still measure impact, it will likely be more tentative about concluding that the fertilizer alone caused those results.


In the complex world in which development interventions take place, it becomes difficult to determine attribution in the midst of other factors. In the agricultural case, suppose an irrigation intervention was implemented during a time of ideal weather and strong demand for the crops. Income in the area where the irrigation intervention was implemented increased over prior years. But is the higher income a result of the intervention? Or is it caused by other factors, such as increased rainfall, general economic good times, or an unusual period of political stability?

Ideally, we would take those eligible for the irrigation intervention within a defined area and randomly assign them to intervention and non-intervention groups. But what are the options when random assignment is not possible and an experimental design is thus not an option? In many such cases, quasi-experimental designs are used for attribution. In quasi-experimental designs there is comparison, but without random assignment of groups. Patton (2007), Bamberger and White (2008), and Chatterji (2007) have excellent papers on the limitations of experimental designs in the real world.


Part III: Quasi-experimental Designs and Threats to Validity for Cause and Effect Questions

Sometimes it is not possible to randomly assign people to groups for comparison purposes. However, it might be possible to make other comparisons. For example, we might introduce the intervention in phases: the sites receiving the intervention first can be compared to those receiving it later, if sites are randomly assigned to the phases. Without random assignment, there is still a good chance that the comparison group differs in important ways from the group receiving the intervention. If that is true, the intervention might look effective (or ineffective) simply because of pre-existing differences between recipients and non-recipients.

Internal Validity

When we talk about eliminating other possible explanations, we are talking about internal validity. Internal validity refers to the design's ability to rule out other explanations for the observed results. An evaluation design with strong internal validity enables evaluators to be more confident in their conclusion that the intervention did or did not cause the observed results. A design with weak internal validity makes it harder to convince others that the intervention caused the observed results. It is important to keep in mind that these threats are only possible rival explanations; they might not actually exist. Internal validity is thus context-related. Quasi-experimental designs need to address threats to internal validity.

The United Kingdom Evaluation Society (2003, Glossary of Terms) defines internal validity as "the confidence one can have in one's conclusions about what the intervention actually did accomplish. A threat to internal validity is an objection that the evaluation design allows the causal link between the intervention and the observed effects to remain uncertain. It may be thought of as a question of the following nature: could not something else besides the intervention account for the difference between the situation after the intervention and the counterfactual?"


In what has become a classic text, Cook and Campbell (1979) identified several common threats to internal validity:

• history
• maturation
• repeated testing
• selection
• mortality
• regression to the mean
• instrumentation.

The history effect refers to the possibility that outside events occurring during the course of the intervention, or between repeated measures, may have influenced the results. History is always a threat in longitudinal research. It is also perhaps the most difficult threat to detect, because the evaluator must investigate events that occurred during the intervention and that may have affected the results. When looking at the results for an individual, historical events that affect the results are quite probable: personal history involves a succession of events, some of which may be trait changing. For a group of individuals, on the other hand, a historical threat to internal validity must involve an event that simultaneously affected most, or at least some, of the individuals enough to appreciably change the measured trait. If all individuals are members of some group, the search for this event may be conducted through interview or observation; if the participants are independent, the likelihood of such an event simultaneously changing the trait is small unless the event occurs in a common setting where all participants are located, such as a hospital (Brossart, Clay, & Willson, 2002).

For example, during the course of a program aimed at high-risk youth, a heinous crime is committed by a juvenile offender. The situation brings about a corresponding outcry for tougher responses to high-risk juveniles, which may alter the types of youth referred to the program and presumably affect the results. Attitude surveys are particularly subject to an influence of this type, since opinions may be largely influenced by recent events and media presentation of topical issues (Office of Juvenile Justice and Delinquency Prevention, 1989, pp. 38-40).


Another common example is an intervention such as the introduction of a new seed or improved cultivation training for farmers in a particular province. The dependent variable, or outcome measure, might be increased income from crops compared to the pre-intervention year. But it could be that farmers stuck to their old ways of cultivation and their tried-and-true seed, even though their income increased on average. Deeper investigation might show a year of excellent climate for crop production: climate, rather than the intervention, was the cause of the results. Events outside the intervention, that is, history, have influenced the results. Before-and-after designs often suffer from the history effect.

The maturation effect occurs when results are due to aging or development. As people age, they mature; as they mature, they may feel or act differently in response to questions or situations, and they may be able to do things more quickly or better. Changes that naturally occur with the passage of time, such as becoming older, smarter, or more experienced, are maturation effects. This can occur for groups as well, either between or within groups. Children will be able to jump further at age 8, on average, than at age 6; they may also be better readers over the two-year period even without additional training. Just as people mature, organizations develop and change. These changes may have nothing to do with the intervention but may be part of a natural cycle of growth and development. Before-and-after designs are often weak because of the maturation effect.

Maturation may occur in two forms, short-term and long-term. Short-term maturation shows up as fatigue and learning. Long-term maturation involves psychophysical development, cultural changes, and environmental changes that can affect psychological constructs. In situations when measurements are made several months apart, long-term maturation is potentially important. For example, an evaluation may investigate the effects of a two-year reading program on reading scores for primary school children. Over two years, as the children grow, their cognitive skills will increase with or without the reading program. How can the evaluator be sure the increased reading scores were due to the reading program and not to maturation of the students?


The repeated testing effect (short-term) occurs when subjects are given the same test pre- and post-intervention, or multiple times. The subjects learn the questions and may learn how to respond to them, so that improved scores reflect familiarity with the test rather than any change caused by the intervention. For example, suppose an intervention is attempting to improve the skills of teachers in rural schools. Teachers are given performance tests at the end of each month, are rated by selected members of the evaluation committee using a checklist, and are given monthly feedback on their performance. The teachers may improve on the skills measured by the standard checklist simply because of the repeated testing.

Selection bias is another threat to internal validity that can limit the validity of findings when multiple groups are used: the observed change is due to differences between the groups. A selection threat arises when one group self-selects into a program and is compared with another group made up of those who did not volunteer or self-select into the program. The two groups are not equivalent: those who self-select are more likely to improve their skills or change their attitudes, even without the program intervention, than those who do not. Selection bias may even be present between those who choose to complete a survey and those who do not respond. Self-selection bias is possible in any program where people or firms learn about and sign up for the program. The resulting groups are unalike, which is a risk for the quasi-experimental designs often applied to social programs.

The mortality effect refers to dropouts from an intervention. Losing participants can create a false treatment effect that appears to be the result of the intervention. Just as selection can be a source of bias, so can the differential dropout rate, or mortality, among participants. There is a strong temptation to present results only for those who successfully complete the program, but this too produces a biased group: those who drop out likely had lower or poorer performance than those who completed it. While program completion and obtaining the full treatment effect are important inputs to an evaluation, they should not cloud the comparison with the performance of a control group.


For example, a teacher education program with 400 participants used graduation rates as one way to determine the success of the program. Over the three-year program, 25 of the participants died from HIV/AIDS. The loss of these participants artificially lowers the graduation rate, creating the impression that the program was less successful than it may actually have been. Consider another example using the same teacher education program. Suppose the teachers' college sponsoring the program had a policy that pregnant women could not attend classes or sit for examinations. Women who became pregnant would not be allowed to complete their studies and sit for exams, so they would not be included in the graduation rates.

The regression to the mean effect may make scores on a test higher or lower. Regression is the natural tendency for individuals who score either very high or very low to score closer to the middle when retested. Also, if a measure is not reliable, there will be some variation between repeated measures, and the chances are that the measurements will move towards the middle rather than towards the extremes. Thus, in programs that select individuals or groups based on their extreme scores, changes in performance could be expected simply because the "extreme" group will regress towards the mean, whether or not it has benefited from the program. For example, a program to improve book-keeping skills in a micro-credit intervention chooses participants based on their scores on a short test of arithmetic ability, selecting those with the very highest scores. If these same participants were given the same short arithmetic test after the intervention, their scores might well be lower simply because they regress towards the mean.

The instrumentation effect occurs if the reliability of the instrument changes, for example through changes in the calibration of a measuring device. An evaluation of a program trying to improve adult weight by providing nutritional information may show no significant effects if the scales used to measure body weight have not been calibrated or vary in how and when they were calibrated.
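A short simulation, using purely hypothetical scores, shows why the selected high scorers would look worse on the retest even if the book-keeping training had no effect at all: each test score is stable ability plus noise, and those picked for extreme first scores were partly picked for lucky noise.

```python
import random

random.seed(7)
people = []
for _ in range(5_000):
    ability = random.gauss(50, 10)            # stable arithmetic ability
    test1 = ability + random.gauss(0, 8)      # noisy pre-test
    test2 = ability + random.gauss(0, 8)      # noisy retest, no intervention effect
    people.append((test1, test2))

# Select the top 10% on the first short test, as in the micro-credit example.
people.sort(key=lambda p: p[0], reverse=True)
selected = people[: len(people) // 10]

mean1 = sum(t1 for t1, _ in selected) / len(selected)
mean2 = sum(t2 for _, t2 in selected) / len(selected)
print(f"Selected group, first test: {mean1:.1f}")
print(f"Selected group, retest:     {mean2:.1f}")  # lower, purely from regression to the mean
```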


Quasi-experimental Designs

Quasi-experimental design methods can be used to carry out an evaluation when it is not possible to construct treatment and control groups using random assignment. In a quasi-experimental design, the evaluator constructs groups that are as equivalent as possible on important characteristics (e.g., age, gender, income, socioeconomic background). These groups are then compared. Sometimes the evaluator can create a comparison group by matching key characteristics. Other times the evaluator will find a comparison group that is not exactly the same as the group that received the intervention, but is similar enough to provide some comparison. For example, an evaluation might compare a village that received an economic development intervention with a village in the same geographic area that did not. The evaluators look for a good match with the characteristics of those in the treatment group. In these designs, they may have to collect more information to make the case that the intervention outcomes cannot be explained by factors other than the intervention.

The notation for a quasi-experimental design looks the same as for an experimental design. The difference is that subjects are not randomly assigned, so the groups are non-equivalent; in some cases an N is placed at the start of the line for each non-equivalent group. A basic quasi-experimental design would look like the following:

N O1 X1 O3
N O2    O4

To answer whether an intervention made a difference, the evaluation has to show that performance on the key measures or indicators changed as a result of the intervention. There are many quasi-experimental designs, some stronger than others. Some of the same basic designs used to answer cause-and-effect questions can be used to answer descriptive questions. The key difference is that for cause and effect questions, comparison groups are needed, either through random assignment (experimental designs) or constructed comparisons (quasi-experimental designs). An example of a quasi-experimental design is the evaluation of El Salvador's EDUCO Program (World Bank, 1998) (see Case 7-2).


Case 7-2: Do Community-Managed Schools Work? An Evaluation of El Salvador's EDUCO Program

This evaluation was intended to measure the effects on student outcomes of decentralizing educational responsibility to communities and schools. El Salvador's Community-Managed Schools Program (EDUCO) was designed to expand rural education rapidly following a civil war. The evaluation compares the standardized test scores and school attendance of rural students in EDUCO schools with those of students in traditional schools, controlling for student characteristics and selection bias using statistical controls.

In 1991, the Minister of Education expanded education in rural areas through the EDUCO program, an innovative program for both pre-primary and primary education that decentralizes education by strengthening the direct involvement and participation of parents and community groups. An elected Education Association drawn from the parents of the students manages each EDUCO school. The question is whether quick expansion to rural areas has come at the expense of learning. The study compares outcome measures of 3rd graders in EDUCO and traditional schools. Outcome measures are based on standardized tests in mathematics and language. However, because test scores may be unresponsive in the short term, the evaluators also looked at the school days missed by students due to teacher absence.

Differences in educational outcomes, however, can be affected by factors other than the school. These include differences in household background, the school's inputs, and organizational factors. The evaluators needed to determine whether differences in test scores (as a measure of student achievement) were due to differences in the type of school or to other factors. Factors apart from type of school (EDUCO or traditional) that might explain student achievement are:

• household characteristics (education, family size, income)
• student characteristics (gender, age, number of siblings)
• school data (enrollment, teacher quality, school facilities and finances)
• teacher characteristics (educational background, years of experience).

The evaluators used data collected by surveys administered by the Ministry of Education to construct a model that would measure the independent impact of the type of school while controlling for those other factors. Using complex statistical modeling that controlled for all of the above factors the evaluators concluded that the achievement scores of children in EDUCO and traditional schools are about the same. The rapid expansion did not have an adverse impact on learning, even controlling for a range of other variables. In other words, the community-managed schools were as effective as regular schools.


Common examples of quasi-experimental designs are:
• matched and non-equivalent comparison design
• time series and interrupted time series design
• correlational design using statistical controls
• longitudinal design
• panel design
• before-and-after design
• cross-sectional design
• propensity score matching.

Matched and Non-equivalent Comparison Design
While in quasi-experimental designs subjects are not assigned randomly to groups, there can still be a control group or a comparison group. These groups can then be called non-equivalent groups. The groups can still be compared, but evaluators need to carefully consider the threats to internal validity presented earlier. In an attempt to make the groups more equivalent, the evaluators try to match them. Matching can be done using skills tests, performance tests, judgment scores, and the like. Evaluators may give all subjects a pre-test and then form the groups using their scores, matching from the highest scores through the lowest so that similar scores are distributed equally across both groups. For example, for an intervention to improve awareness of gender issues, a pre-test was administered covering concepts and principles of gender awareness. All of the scores were ranked from highest to lowest. Of the two highest scores, one was placed in one group and the other in the second group. This matching would continue for similar scores, one in each group. The evaluators would attempt to match the groups as closely as possible to each other using the scores on the pre-test. If the intervention (gender awareness training) was successful, the group receiving the training should perform better on a post-test. Although the groups are not equivalent, they were matched as closely as possible on the pre-test scores. The notation for a matched, non-equivalent comparison design would look like the following:

O1 X O3
O2      O4
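The following is a minimal sketch of the matching step described above, using hypothetical pre-test scores; the assignment rule (alternating ranked scores between the two groups) mirrors the gender-awareness example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-test scores for 20 participants in a gender-awareness course.
pretest = rng.integers(40, 100, size=20)

# Rank participants from highest to lowest pre-test score, then alternate
# assignment: of each successive pair of similar scores, one goes to the
# group that will receive the training and one to the comparison group.
order = np.argsort(pretest)[::-1]     # indices, highest score first
training_group = order[0::2]          # every other participant
comparison_group = order[1::2]

print("Mean pre-test, training group:  ", pretest[training_group].mean())
print("Mean pre-test, comparison group:", pretest[comparison_group].mean())
# After the training, a post-test would be given to both groups and the
# post-test means compared; the matching makes the groups similar on the
# pre-test, though not equivalent in the way randomization would.
```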


Time Series and Interrupted Time Series Design
A time series design measures the performance of one group multiple times before the intervention, administers the intervention, and then measures the same group multiple times after the intervention. For example, for the gender awareness program, the evaluators might administer a pre-test to a group three times, on the first day of May, June, and July. The group would then receive the gender awareness training during August and would be given a post-test on the first of September, October, and November. If the training was successful, there should be a significant increase in scores on the September 1 test, and the gain should be maintained (or grow) on the October and November tests. The time series design can be done with one group or with more than one group. The notation for a time series design within a group would look like the following:

O1 O2 O3 X O4 O5 O6

A time series design between groups would look like this:

O1 O3 O5 X O7 O9 O11
O2 O4 O6      O8 O10 O12
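As an illustration of how such a series might be analyzed, the sketch below uses invented monthly scores and fits a simple level-shift (segmented) regression; this is one common way of summarizing an interrupted time series, not the only one.

```python
import numpy as np

# Hypothetical monthly test scores: three observations before the training
# (May, June, July) and three after it (September, October, November).
scores = np.array([54.0, 55.0, 53.0, 71.0, 72.0, 74.0])
post = np.array([0, 0, 0, 1, 1, 1])   # 1 = observation taken after training
time = np.arange(len(scores))         # 0..5

# Simple segmented regression: score = a + b*time + c*post, where c
# estimates the level shift that coincides with the intervention.
X = np.column_stack([np.ones(len(scores)), time, post])
coeffs, *_ = np.linalg.lstsq(X, scores, rcond=None)
a, b, c = coeffs

print(f"Pre-intervention mean:  {scores[post == 0].mean():.1f}")
print(f"Post-intervention mean: {scores[post == 1].mean():.1f}")
print(f"Estimated level shift at the intervention: {c:.1f} points")
```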

Correlational Design Using Statistical Controls
Ethical and practical problems occasionally make it impossible to evaluate an intervention using an experimental design. An experimental design manipulates the variables for the group that receives the treatment. A correlational design looks at variables that cannot be manipulated for the study. Each subject is measured on a number of variables, and the analysis examines the sometimes complex relationships among those variables. These analyses are usually carried out by a data analyst. For example, an evaluation may seek to learn whether there is a link between occupation and the incidence of HIV/AIDS. In this case an extensive questionnaire can be distributed to a large sample of the population asking about occupation, whom they have contact with, where they spend time away from home, and so on. The data can be analyzed to see whether there are patterns linking particular occupations to higher rates of HIV/AIDS, and why.


Correlational designs are often used when we are seeking to answer questions about relationships. Correlational designs can be used with data already available or with new data. For example, if an evaluation wants to find out whether having women in political office is related to more honest government, a correlational design could be used. Data on the proportion of women in political office in different areas within a country could be correlated with the amount of reported corruption in those areas. Correlational evidence alone cannot establish causality; even if governments with more women in office are correlated with less corruption, it would still be necessary to rule out plausible alternative explanations for the relationship. Since a correlational design can be set up in different ways, the notation may look different. The first notation shows a design with three groups and one observation for each. The second shows two groups, one receiving the treatment. The third shows three different treatments (signified by X, Y, and Z), each followed by an observation.

O1
O2
O3

X O1
    O2

X O1
Y O2
Z O3
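A minimal sketch of the women-in-office example follows, using invented district-level figures; it computes a simple correlation coefficient, which, as noted above, describes a relationship but does not establish cause.

```python
from scipy.stats import pearsonr

# Hypothetical data for ten districts: share of political offices held by
# women (percent) and corruption complaints reported per 10,000 residents.
women_in_office = [5, 8, 12, 15, 18, 22, 25, 30, 34, 40]
corruption_reports = [9.1, 8.7, 8.9, 7.4, 7.8, 6.2, 6.5, 5.1, 4.8, 4.0]

r, p_value = pearsonr(women_in_office, corruption_reports)
print(f"Correlation coefficient: {r:.2f} (p = {p_value:.3f})")
# A strong negative correlation is consistent with the hypothesis, but by
# itself it cannot rule out alternative explanations for the relationship.
```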

Longitudinal Design
A longitudinal design occurs over a long period of time. Subjects are assessed at several points over that period. The purpose of the design is to see how things change over time. For example, the health care field might be interested in investigating the long-term health concerns of children who were born to mothers with HIV/AIDS but who received a drug to prevent transmission of the virus. The longitudinal design would track the children as they grew and matured, noting any health concerns that arose. The results would be compared to see whether there were any similarities in health problems among these children. Longitudinal studies can be expensive and difficult to conduct, and they suffer from attrition (as subjects die or lose contact). Despite these difficulties, they can provide a wealth of information that cannot be found using other designs. The notation for a longitudinal design would look similar to the following:

X O1 O2 O3 O4 …


Panel Design One form of a longitudinal design is a panel design. Instead of following individuals, as in the longitudinal study, a panel design follows a sample of subjects over time. The sample might be a family, a unit in a hospital, a faculty in a college, or other group of people. For example, a program might be investigating the shifting attitudes and patterns of behavior about gender over time for students at a school. The panel design would collect information about gender attitudes for each member of one class from Grade 1 to Grade 6. The notation for a panel design might look like the following: O1 O2 O3 O4 O5 O6

Before-and-After Designs
A before-and-after design is one way to measure change. Change is measured by comparing key measures taken after the intervention began against measures taken before the intervention began. Pre- and post-tests are common before-and-after measures. The "before" measure often is called the baseline, and collecting baseline data is sometimes called a baseline study. However, a design with only one before-and-after measure is insufficient by itself to demonstrate that the intervention alone caused the change. Maybe the people changed their behavior because they were being observed, or maybe something else occurred at the same time as the intervention, and that something was the real cause of the changes we observed. It is also important to remember that, in a situation where there is little change in performance on the measures, evaluators should be hesitant to conclude that the intervention did not work. For example, an intervention to reduce poverty was implemented in a country. Everyone was eligible to receive the intervention, so there was no comparison group. At the end of ten years, the proportion of people in poverty had not changed. Can the evaluator conclude that the poverty reduction intervention did not work? It could be that without the intervention, a greater proportion of people would have been in poverty. Before-and-after measures are not usually regarded as giving credible answers to cause-and-effect questions because they do not control for other factors affecting the outcomes. They only compare before with after; there is no comparison of with and without the intervention (White, 2007, p. 10). This design element should therefore be combined with other design elements when answering cause-and-effect questions.


The notation for a before-and-after design can be represented as:

O1 X O2

That is, an observation is made first, followed by a treatment or intervention, followed by a second observation.
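The following sketch illustrates the before-and-after comparison with hypothetical pre- and post-test scores and a paired t-test; the caveat above still applies, since nothing here rules out other explanations for the change.

```python
from scipy.stats import ttest_rel

# Hypothetical parenting-knowledge scores for the same eight participants
# at program entry (O1) and at program completion (O2).
before = [12, 15, 11, 14, 13, 10, 16, 12]
after = [18, 17, 15, 19, 16, 14, 20, 15]

t_stat, p_value = ttest_rel(after, before)
mean_change = sum(a - b for a, b in zip(after, before)) / len(before)
print(f"Average change: {mean_change:.1f} points (t = {t_stat:.2f}, p = {p_value:.3f})")
# A significant change shows that the measure moved, not that the program
# caused it: with no comparison group, other explanations remain possible.
```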

Cross-Sectional Designs
Cross-sectional designs show a snapshot at one point in time, but in this case evaluators are interested in sub-group responses. This design is often used with a survey method. The sub-groups may be based on characteristics such as age, gender, income, education, ethnicity, or amount of intervention received. For example, evaluation questions may focus on whether sub-groups of citizens or beneficiaries of an intervention are satisfied with the services they received, or why they do not use services. Evaluators use this design to learn how the sub-groups compare at one point in time on variables such as services received, use of services, or opinions of services. Sometimes the question may be to find out the current status of people who participated in an intervention a few years ago. The notation for a cross-sectional design is represented as:

X O1 O2 O3 …


Propensity Score Matching
Another quasi-experimental design is propensity score matching (PSM). Propensity score matching is used to measure a program's effect on participants compared with non-participants who have similar characteristics (White & Masset, 2005, slides 7-8). To use this technique, baseline data must first be collected. The next step is to identify observable characteristics that are likely to be linked to the evaluation question. For example, consider the question "Do girls living near the school have higher graduation rates than those who walk more than 5 km from their home to the school?" The observable characteristics might be gender, age, marital status, distance from home to school, room and board arrangements, other responsibilities, number of siblings who graduated from secondary school, birth order, and so on. Once the variables are selected, the treatment group and the comparison group can be constructed by matching each person in the treatment group with the person in the comparison group who is most similar on the identified observable characteristics. The result is pairs of individuals or households who are as similar to one another as possible, except on the treatment variable (White, 2007, p. 10). PSM is particularly useful for the evaluation of voluntary programs, where selection bias is a concern. It requires large data sets and computing capabilities, and it provides a more reliable assessment of the project or program effect on the participants (White & Masset, 2005, slide 19). There are software tools to assist with implementing the matching for propensity scores; Stata is the most commonly used tool (Caliendo & Kopeinig, 2005, p. 19).
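The sketch below illustrates the basic PSM steps (estimate propensity scores, match nearest neighbors, compare outcomes) on invented data using general-purpose Python libraries; it is not the Stata workflow referred to above, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 200

# Hypothetical baseline data: distance to school (km), household income,
# and number of siblings, plus whether the girl participated in the program.
X = np.column_stack([
    rng.uniform(0, 10, n),        # distance to school
    rng.normal(100, 25, n),       # household income
    rng.integers(0, 6, n),        # number of siblings
])
treated = rng.integers(0, 2, n)     # 1 = participant, 0 = non-participant
graduated = rng.integers(0, 2, n)   # outcome of interest (invented)

# Step 1: estimate each girl's propensity to participate, given the
# observable characteristics.
propensity = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: for every participant, find the non-participant with the closest
# propensity score (nearest-neighbor matching with replacement).
treated_idx = np.where(treated == 1)[0]
control_idx = np.where(treated == 0)[0]
matches = control_idx[
    np.abs(propensity[control_idx][None, :] - propensity[treated_idx][:, None]).argmin(axis=1)
]

# Step 3: compare outcomes across the matched pairs.
effect = graduated[treated_idx].mean() - graduated[matches].mean()
print(f"Estimated effect on graduation rate: {effect:.3f}")
```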

Structural Equation Modeling
Structural equation modeling (SEM) is a statistical technique for testing and estimating causal relationships using a combination of statistical data and qualitative causal assumptions. SEM models translate a series of hypothesized cause-and-effect relationships between variables into a composite hypothesis concerning patterns of statistical dependencies (Shipley, 2000).
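Full SEM estimation normally relies on specialized software; the toy sketch below shows only the simplest special case, a path model among observed variables estimated with ordinary least squares, using invented data and hypothetical variable names.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Toy data for a hypothesized causal chain (all variables observed):
# training hours -> knowledge -> income.
training = rng.normal(0, 1, n)
knowledge = 0.6 * training + rng.normal(0, 1, n)
income = 0.5 * knowledge + rng.normal(0, 1, n)

def ols(y, *predictors):
    """Least-squares coefficients (after the intercept) of y on the predictors."""
    X = np.column_stack([np.ones(n)] + list(predictors))
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

a = ols(knowledge, training)[0]                # path: training -> knowledge
b, direct = ols(income, knowledge, training)   # paths: knowledge -> income, training -> income
print(f"Indirect effect of training on income via knowledge: {a * b:.2f}")
print(f"Direct effect of training on income:                 {direct:.2f}")
```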


Causal Tracing Strategies
Many of the strategies for determining whether observed changes are due to the intervention (as opposed to some other cause) require a fairly structured and quantitative data collection strategy. For the evaluator who is conducting a rapid assessment, or one who is evaluating a very small, new, or untried intervention, such strategies may be neither practical nor advisable. Although it is best to choose a stronger evaluation design, in these situations a weaker design using causal tracing strategies may be the only option. What options are available when the sample size is small, the data collection strategies are largely open-ended, and/or sophisticated statistical analysis is not feasible? There are some resources available on how to use causal tracing strategies, which are often paired with qualitative and mixed-method (qualitative and quantitative) studies. Broadly speaking, the principle is the same: systematically rule out alternative explanations, one by one, until it is most likely that the changes being observed are indeed caused (primarily or at least substantially) by the intervention – or are not. Recall the "Elephant in the Village" story at the beginning of this chapter. To use causal tracing strategies to answer the evaluation question "What causes the elephants to leave the village?" the evaluator must rule out, one by one, each of the possible causes. That is, use only the whistle, see the reaction of the elephants, and record the results. Then use the banging of the pots and pans, see the reaction and record it, and so forth. The art of evaluation design is making the strongest, most credible case that we can, ruling out alternative explanations. The following is a list of eight possible sources of evidence to consider when using causal tracing (adapted from Davidson, 2000):
1. Causal list inference – we know that this particular outcome is almost always caused by one of the following: A, B, C, or D, and on this occasion none of B, C, or D occurred, so we can be almost sure the cause was A. While we cannot apply randomization, we can draw from studies that did.


Example: For each of these causal tracing strategies, the evaluation question is "Did blowing whistles cause the elephants to leave the village?" The villagers know that the elephants turned away when they A) blew whistles, B) hit pots and pans, C) shouted out, and D) ran around kicking up dust. IF, when the elephants arrived, the villagers responded only by doing A) and none of the other techniques, they can be almost sure the cause of the elephants leaving was A) blowing whistles.
2. Modus operandi (MO) inference – or pattern of behavior, useful if more than one possible cause occurred:
• We know that this outcome is almost always caused by one of the following: A, B, C, or D, and on this occasion neither C nor D occurred, which narrows the cause down to A or B. In addition, only the characteristic causal chain/MO/telltale pattern of events for A was present.
• This inference is strengthened if the MO for A is highly distinctive and very different from that for B.

Example: IF the villagers learned from another village that elephants did NOT leave when villagers chased toward them and ran around kicking up dust, AND when the elephants arrived the villagers both A) blew whistles and D) ran around kicking up dust, they can be almost sure the cause of the elephants leaving was A) blowing whistles. This inference is strengthened because blowing whistles is very different from kicking up dust.
3. Temporal precedence – the observed effect only happened after the intervention had begun, not before.
Example: IF the elephants arrived, AND THEN the villagers began blowing the whistles, AND THEN the elephants left the village, the villagers can believe there may be some connection between the whistles and the elephants' departure. BUT if they were blowing whistles before the elephants came and the elephants still came to the village, then the whistle blowing probably did not cause the elephants to leave.
4. Constant conjunction – the effect was observed everywhere the intervention was implemented.
Example: IF the villagers met with villagers from the entire region and shared their hypothesis that blowing whistles causes the elephants to leave, AND the other villages tried this, AND when elephants came to those villages and the villagers blew whistles the elephants left, THEN they can be almost sure that blowing whistles causes elephants to leave villages.


5. Strength of association – the observed change was much stronger where the program was implemented than it was where other possible causes were present.
Example: IF all of the villages in the region used many different techniques to drive elephants from their villages, AND the villages that used whistle blowing as one of the techniques were more successful in driving the elephants away, THEN they can associate the elephants leaving with whistle blowing.
6. Biological gradient – the more treatment received, the larger the observed change.
Example: IF the villagers used more than one technique to drive the elephants away, AND when they blew multiple whistles very loudly the elephants left, BUT when they blew only one whistle the elephants did not leave, THEN they can associate the elephants leaving the village with loud whistle blowing.
7. Coherence – the relationship we see between the intervention and the observed change fits logically with other things we know about the intervention and this particular outcome.
Example: IF the villagers had noticed that other animals (hippopotami, crocodiles, and hyenas) also ran away when they blew whistles, AND those other animals are also dangerous, THEN the villagers can logically associate whistle blowing with dangerous animals leaving the village. They may want to begin driving elephants from the village by using whistle blowing.
8. Analogy – the pattern we see between the intervention and the observed changes resembles a well-established pattern between a related intervention and its effects.
Example: IF the villagers heard a story about a village in South America that sounded a high-pitched, loud whistle whenever a puma (also known as a mountain lion, panther, or cougar) was observed in the area, AND that village believed the noise kept the puma away, THEN the African villagers could connect the story (analogy) to their problem with elephants and see a relationship between loud, sharp noises and driving elephants away.


When designing a data collection strategy, consider which of the above pieces of evidence it is feasible and necessary to gather, and plan how to obtain them. Not all are needed to be able to make causal attributions; evaluators gather the pieces that make the most sense, and that together will give sufficient certainty about the findings, given the decisions that will be based on the evaluation.

Part IV: Designs for Descriptive Questions
In Chapter 6: Developing Evaluation Questions and Starting the Design Matrix, we indicated that descriptive questions ask questions such as "how many?" or "how much?" They may also ask for perceptions or opinions. Descriptive questions generally use descriptive or non-experimental designs. When used to answer descriptive questions, these designs do not involve a comparison group that did not receive the intervention; they focus only on those who received the intervention. Some of the designs used for descriptive questions are the same as those for cause-and-effect questions. To answer descriptive questions, the most common designs are:
• one-shot
• cross-sectional
• before-and-after
• simple time series
• longitudinal design
• case studies.

One-Shot Designs
A one-shot design looks at a group receiving an intervention at one point in time, following the treatment or intervention. We can think of a one-shot design as a camera taking a snapshot with the date printed on it: it is a picture as of that date. We may use this design to answer questions such as:
• How many women were trained?
• How many participants received job counseling as well as vocational training?
Implicit in the above questions is an "as of" date. The date may be the end of the last calendar year or month, the date of project completion, or a time interval. The time or period covered is variable, but some period of time must be specified.


Evaluators may also use a one-shot design to ask program participants questions about how well they liked a program, for example, or to determine how they found out about the services offered: was it the program's outreach effort or something else? The notation for a one-shot design can be represented as:

X O1
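A one-shot question such as "How many women were trained as of a given date?" amounts to filtering and counting records; the sketch below shows this with a small invented participant table and hypothetical column names.

```python
import pandas as pd

# Hypothetical participant records for a vocational training program.
participants = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "gender": ["F", "F", "M", "F", "M"],
    "completed_training": ["2007-03-01", "2007-06-15", "2007-07-02", "2008-01-10", None],
    "received_job_counseling": [True, False, True, True, False],
})
participants["completed_training"] = pd.to_datetime(participants["completed_training"])

as_of = pd.Timestamp("2007-12-31")
trained = participants[participants["completed_training"] <= as_of]

print("Women trained as of", as_of.date(), ":", (trained["gender"] == "F").sum())
print("Trained and counseled:", trained["received_job_counseling"].sum())
```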

Cross-Sectional Designs
Cross-sectional designs were introduced under designs for answering cause-and-effect questions. They show a snapshot at one point in time. A cross-sectional survey selects a sample of citizens, intervention beneficiaries, or former intervention participants at one point in time, gathers data from them, and reports what they said. A cross-sectional design might answer questions such as:
• Do participants with different levels of education have different views on the value of the training?
• Did women receive different training services than their male counterparts?
The notation for a cross-sectional design is represented as:

X O1 O2 O3 …

Example of cross-sectional design for descriptive questions
In evaluating a program designed to economically empower women to launch their own small businesses, the evaluators wanted to find out the views of women who have been through the program. Their views can shed light on whether what they learned in the economic empowerment program helped them launch a viable business, the kind of business they went into, and whether what they learned in the program was useful for running the business. With limited resources, the evaluators opted to conduct a short survey of recent program graduates (a one-shot design). The survey instrument contained questions on the demographic characteristics of the participants so that responses could be compared for women by level of education, age bracket, and ethnicity.
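With survey data like these in hand, the cross-sectional comparison is essentially a sub-group summary; the sketch below illustrates it with a few invented responses and hypothetical column names.

```python
import pandas as pd

# Hypothetical survey responses from graduates of the empowerment program.
# "business_viable" is a self-reported yes/no; "usefulness" is a 1-5 rating.
responses = pd.DataFrame({
    "education": ["primary", "secondary", "primary", "none", "secondary", "primary"],
    "age_bracket": ["18-29", "30-44", "30-44", "45+", "18-29", "45+"],
    "business_viable": [1, 1, 0, 0, 1, 1],
    "usefulness": [4, 5, 3, 2, 5, 4],
})

# Compare sub-groups at a single point in time, as a cross-sectional design
# does: share reporting a viable business and mean usefulness rating.
summary = responses.groupby("education")[["business_viable", "usefulness"]].mean()
print(summary)
```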


Before-and-After Designs
Before-and-after designs were introduced in a previous section discussing designs for cause-and-effect questions. These designs can also be used to answer descriptive questions. In the before-and-after design, often called a pre- and post-design, we ask about group characteristics before and after the intervention. There is no comparison group. Thus, we may ask whether our program participants increased their knowledge of parenting techniques, and test them at program entry and following program completion. The notation for before-and-after designs can be shown as:

O1 X O2

Example of before-and-after design for descriptive questions
We could look at the wages of our vocational training program participants before our training intervention and two years following the program to address the question of how much, on average, wages have increased. We could also make this a cross-sectional before-and-after design by asking questions about the relation of wage increases to different types of vocations.
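A minimal sketch of this wage example follows, with invented figures; adding the group-by step turns the simple before-and-after comparison into the cross-sectional variant mentioned above.

```python
import pandas as pd

# Hypothetical monthly wages for training participants, before the program
# and two years after completion, with the vocation they trained in.
wages = pd.DataFrame({
    "vocation": ["carpentry", "carpentry", "tailoring", "tailoring", "welding", "welding"],
    "wage_before": [80, 95, 60, 70, 100, 110],
    "wage_after": [130, 140, 90, 95, 150, 170],
})
wages["change"] = wages["wage_after"] - wages["wage_before"]

print("Average wage increase, all participants:", wages["change"].mean())
# The sub-group comparison below is the cross-sectional before-and-after variant.
print(wages.groupby("vocation")["change"].mean())
```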

Simple Time Series Designs
Time series designs were first introduced under designs for cause-and-effect questions. A simple time series design can also be used to answer descriptive questions. Simple time series designs look for changes over time, generally to identify trends. When used to answer descriptive questions, the purpose is to explore and describe changes over time either after, or before and after, the intervention. Thus, time series designs can be used to discern trends; they can be simple time series designs or cross-sectional designs. The notation for a simple time series is represented as:

O1 O2 O3 X O4 O5 O6

Example of simple time series design for descriptive questions
Child mortality rates might be examined several times before and after an intervention providing maternal nutritional supplements, to identify trends. Or changes over time in participants' attitudes towards women entrepreneurs might be examined.
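One simple way to describe such a trend is to fit a line to the observations before and after the intervention; the sketch below does this with invented yearly mortality figures.

```python
import numpy as np

# Hypothetical under-five mortality rate (per 1,000 live births), measured
# yearly: four observations before the nutrition intervention, four after.
years = np.arange(2000, 2008)
mortality = np.array([88, 86, 85, 84, 76, 72, 69, 65], dtype=float)
pre, post = slice(0, 4), slice(4, 8)

# Fit a straight line to each segment to describe the trend before and
# after the intervention (np.polyfit returns [slope, intercept]).
slope_pre = np.polyfit(years[pre], mortality[pre], 1)[0]
slope_post = np.polyfit(years[post], mortality[post], 1)[0]
print(f"Trend before intervention: {slope_pre:+.1f} deaths/1,000 per year")
print(f"Trend after intervention:  {slope_post:+.1f} deaths/1,000 per year")
```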


Longitudinal Design
Longitudinal design, introduced under designs for cause-and-effect questions, can also be used to answer descriptive questions. A longitudinal design is another type of time series design, one in which repeated measures of the same variable are taken from the same people (or from sample groups in the same population). When used for descriptive questions, a longitudinal design might be used to answer, for example, whether children attending an enrichment program maintain their learning gains over time. A panel design (also introduced under designs to answer cause-and-effect questions) can likewise be used to answer descriptive questions. Recall that a panel design is a special type of longitudinal design in which a smaller group of people is tracked at multiple points in time and their experiences are recorded in considerable detail. Panel designs almost always use qualitative data (open-ended survey questions, in-depth interviews, and observation) as well as quantitative data. Panel designs can give a more in-depth perspective on any changes people may be experiencing as a result of an intervention. The notation for a longitudinal design can be depicted as:

X O1 O2 O3 …

Example of longitudinal design for descriptive questions
A study looking at Poland's family allowance used panel data gathered from the same people between 1993 and 1996 to find out how families receiving social benefits transitioned into and out of poverty.
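A minimal sketch of this kind of panel tabulation follows, with invented poverty-status data for two waves; it counts families moving into and out of poverty between the waves.

```python
import pandas as pd

# Hypothetical panel data: poverty status (1 = poor) for the same five
# families observed in 1993 and 1996.
panel = pd.DataFrame({
    "family": ["F1", "F2", "F3", "F4", "F5", "F1", "F2", "F3", "F4", "F5"],
    "year":   [1993, 1993, 1993, 1993, 1993, 1996, 1996, 1996, 1996, 1996],
    "poor":   [1, 1, 0, 1, 0, 0, 1, 0, 0, 1],
})

# Reshape so each row is one family with its status in both waves, then
# classify the transition between waves.
wide = panel.pivot(index="family", columns="year", values="poor")
moved_out = ((wide[1993] == 1) & (wide[1996] == 0)).sum()
moved_in = ((wide[1993] == 0) & (wide[1996] == 1)).sum()
print(f"Families that moved out of poverty: {moved_out}")
print(f"Families that moved into poverty:   {moved_in}")
```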


Case Study Designs
The case study is also an evaluation design. It is a non-experimental design; that is, it does not use random selection or control or constructed comparison groups. A case study design is frequently used when the evaluator wants to gain an in-depth understanding of a process, event, or situation and explain the "why" of results. It is useful when the question deals with how something works or why something happens, and it is especially useful when the intervention is relatively innovative or experimental, or not well understood. Case studies provide not only descriptions but also interpretations of situations by those most knowledgeable about them. Case studies are frequently used in evaluating development interventions. The case study design is particularly useful for describing what implementation of the intervention looked like on the ground and why things happened the way they did. A descriptive case study may be used to examine program extremes or a typical intervention. Case studies can use qualitative and/or quantitative methods to collect data. They can consist of a single case or of multiple cases. They often focus on in-depth understanding of the effects of an intervention on organizations, communities, programs, cities, and/or nations. If we were interested in evaluating public transportation in a country, we might simply track key indicators against the baseline and targets. We might do a national study if the indicators are the number of miles covered by public transportation, capacity, the number of people who use the system, and revenues received. However, if we wanted to answer other kinds of questions that require more in-depth data collection, we would opt for a case study. We might choose a single case study, or we might choose a few different locations to gain a wider range of experience. For instance, if we were asked to evaluate a program to improve transportation to rural areas, we might want to investigate people's choices about using public transportation. The design might stipulate that we gather the data directly from them. More resources would be required to collect these data on a national scale; it is more manageable to gather the data in a more narrowly defined geographic area – a single case. Alternatively, evaluators might opt for a multiple case study, in which several cities might be selected.


The cases might be selected in several ways:
• randomly
• judgmentally or purposively, based on some specific criteria
− best case, typical case, worst case, or a mix
− only large cities, or cities of varying sizes.
The same data collection strategies used in a single case study can be used in multiple case studies. Case studies make sense in development where the intention is to understand a specific situation in order to make or adjust policy or practice. Not only are case studies more practical than trying to do large national studies, they also provide in-depth information that is often appropriate to the decision-maker's focus. A comparative case study of the use of free immunization clinics, for example, would provide greater understanding about why one approach is more or less successful than another. The notation for a case study design may be represented as:

O1 O2 O3

Example of case study design for descriptive questions
A study investigating a micro-lending program in India wanted to explore the ways the women involved conceptualized and initiated their marketing ideas. The case study selected five women and their projects and followed their progress from the beginning and over three years.


Part V: Designs for Normative Questions
What designs work best for normative questions? The logic for normative questions is similar to that for descriptive questions, except that normative questions are always assessed against a criterion or standard. The difference between answering normative questions and descriptive questions is that there is a specified desired or mandatory goal or standard to be reached. Indicators may have been specified, targets set against those indicators, and the actual findings compared to that standard; these may be part of an M&E system with indicators and targets. The overall question is: Was the standard met? (A minimal sketch of this comparison follows Table 7.2.) Generally, the same designs work for normative questions as for descriptive questions. Normative questions are similar to performance audit questions. Performance auditing (Mayne, 2005, 2006) concerns itself with some aspect of performance of, or within, an organization. Performance audits can be very similar to evaluation. Barzelay (1997) identifies seven types of performance audits, based on a survey of OECD member countries. Table 7.2 shows the four most relevant ones.

Table 7.2: Main Types of Performance Audit
• Efficiency audit – Unit of analysis: organizational or jurisdictional function, process, or program element. Focus: identify opportunities to lower the budgetary cost of delivering program outputs.
• Effectiveness audit – Unit of analysis: policy, program, or major program element. Focus: assess the impact of public policies; evaluate policy or program effectiveness.
• Performance management capacity audit – Unit of analysis: organization or jurisdiction public management issue. Focus: assess the capacity of the systems and procedures of a jurisdiction, organization, or program to achieve intended goals.
• Performance information audit – Unit of analysis: organization. Focus: attest to the quality of performance information provided by the organization.
(Source: Adapted from Barzelay, 1997)
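As mentioned above, the core normative comparison is simply actual values checked against targets; the sketch below illustrates this with invented indicators and targets.

```python
import pandas as pd

# Hypothetical M&E indicators with targets, illustrating the normative
# comparison "was the standard met?"
indicators = pd.DataFrame({
    "indicator": ["girls enrolled (%)", "clinics built", "immunization rate (%)"],
    "target":    [90, 25, 80],
    "actual":    [86, 27, 81],
})
indicators["standard_met"] = indicators["actual"] >= indicators["target"]
print(indicators)
```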


Part VI: The Gold Standard Debated
Recently, many leaders in the field of international development evaluation (Scriven, 2006; Davidson, 2006; Bamberger & White, 2007) have been debating the need for more "rigorous" program evaluation in all sectors. They have observed that the majority of evaluations carried out by official development agencies have been mostly process evaluations, which focus on how well a program is operating. They also noted the increase in participatory evaluations, which add more opinions from beneficiaries but "did not produce data amenable to quantitative analysis of impact" (Bamberger & White, 2007, p. 58). Another development they noted was the rise of results-based approaches and the focus on the MDGs, resulting in greater calls to demonstrate impact. Impact evaluations are concerned with the results that are caused by the project, program, or policy. Among the calls for more rigor in the assessment of development outcomes and greater accountability in the use of aid are the following:
• 2002 Monterrey Conference on Financing for Development – called for more use of results-based management in development agencies
• 2005 Paris Accords – encouraged multi-donor cooperation in the promotion of impact evaluations
• Poverty Action Lab (PAL) – promotes the use of randomized designs and offers training programs for developing countries on these designs
• Center for Global Development (CGD) – advocates strongly for more rigorous evaluation designs, notably in its publication When Will We Ever Learn? (CGD, 2006), and has issued a call to action for an independent evaluation agency to ensure more independence and rigor in development evaluations (Bamberger & White, 2007, pp. 58-59).
This pushing of the bounds of current thinking about international development and development evaluation is encouraging the use of impact evaluation. For example, Spain is increasing its development assistance and is concerned about improving aid effectiveness and promoting results-based management in its own agency and in partner developing countries.


To assist, Spain has implemented a new program to support the World Bank in evaluating the impact of innovative programs to improve human development outcomes. Called the Spanish-World Bank Trust Fund for Impact Evaluation (SIEF), it is the largest trust fund ever established in the World Bank focused on impact evaluation and results (World Bank, 2007f). The point is that evaluations should strive for more rigor in evaluation design, but the design must be appropriate for each situation. Patton (2007, pp. 1-2) discusses matching designs to different kinds of development interventions. He discusses the importance of considering purposes, intended uses, and appropriate evidence over the method used. He states that "different impact situations and different evaluation purposes have different implications for methods" (p. 1). Evaluators need to begin by clarifying the situation; the design will emerge from the situation. He further states that "there are multiple development impact situations and therefore diverse approaches to impact evaluation" (p. 2). If the situation is appropriate, evaluators should consider ways to make their evaluations more rigorous. Bamberger and White (2007, pp. 66-69) offer some advice on cost-effective ways of strengthening an evaluation design:
• consider building the evaluation design on a program theory model
− helps explain links in the causal chain and identify assumptions
− may identify local economic, political, institutional, environmental, and socio-cultural factors that explain differences in performance and outcomes
− only works well when there is a sound theory on which to build
• adopt a good mixed-method design, combining qualitative and quantitative approaches
− use qualitative data for triangulation to provide additional evidence in support of the quantitative results, to help frame the research, and to help interpret quantitative results
• make maximum use of available secondary data, including project monitoring data
• whenever time and budget permit, include collection of data at additional points in the project cycle.


Bamberger and White (2007, pp. 69-71) also discuss ways to address time and budget constraints. These suggestions include:
• consider eliminating one or more of the four data collection points (pre-test or post-test for the project or the control group) – this is a trade-off that must be assessed
• simplify data collection instruments
• use secondary data creatively
− for example, existing data from already completed surveys can provide baseline data or control groups
• consider reducing the sample size
• reduce the costs of data collection:
− use less expensive interviewers
− use direct observation rather than household surveys
− piggyback on or synchronize with other evaluations by adding to another planned survey.

Making Design Decisions
Each evaluation is different. Each will have different evaluation questions, available data, time constraints, and resource limitations. An evaluator must explore the options for each design in an attempt to give the most robust results. Evaluation is both an art and a science. While making design decisions, consider best practices, but follow good sense. The following are key points to keep in mind when designing evaluations:
• There is no perfect design.
• Each design has strengths and weaknesses.
• There are always trade-offs in terms of time, cost, and practicality.
• Acknowledge trade-offs and potential weaknesses.
• Provide some assessment of their likely impact on the results and conclusions.

Patton (2002, p. 255) states: “What is certain is that different methods can produce quite different findings. The challenge is to figure out which design and methods are most appropriate, productive, and useful in a given situation.”


Michael Scriven's (2007, Methodology section, ¶ 1) Key Evaluation Checklist offers the following examples of questions that have to be answered in the design phase:
• Do you have adequate domain expertise? If not, how will you add it to the evaluation team (via consultants, an advisory panel, or full-team membership)?
• Can you use control or comparison groups to determine causation of supposed effects?
• If there is to be a control group, can you randomly allocate subjects to it?
• If none of these is possible, how will you determine causation?
• If judges are to be involved, what reliability and bias controls will you need (for credibility as well as validity)?
• How will you search for (anticipated and unanticipated, positive and negative) side effects?

On the following pages, several tables summarize information in this chapter.


Linkages between Question Type and Design
Descriptive questions: non-experimental or quasi-experimental approaches.
Normative questions: non-experimental or quasi-experimental approaches, plus goals, standards, needs assessment, or other criteria.
Cause-and-effect questions: experimental, quasi-experimental, or non-experimental approaches with in-depth causal tracing.

Summary of Common Designs

Designs for data collection:
Experimental designs: always use random assignment to treatment and control groups. A true experiment collects data before and after treatment; variations sometimes collect data only after treatment.
Quasi-experimental designs: compare intervention and non-intervention groups; no random assignment.
• Matched: the groups are matched on key characteristics.
• Non-equivalent groups: comparison of a group with the intervention to a group without the intervention.
• Correlational design: collects data from all or a sample of people, cases, units, etc., and uses statistical techniques to determine whether there are relationships.
• Cross-sectional design: collects variables from a sample of cases or people at one point in time; uses statistical controls to separate cases into those who received the intervention and those who did not.
• Interrupted time series: collects the same data at many points in time, before and after the intervention, from different people or the same people.
• Longitudinal design: collects the same data at a few points in time from the same people or from different samples of the same population.
• Panel design: collects in-depth qualitative and quantitative data from the same people at various points in time.

Non-experimental designs – designs for descriptive questions:
• Cross-sectional design: collects variables from a sample of cases or people at one point in time.
• Time-series design: collects the same data over time, before and after an intervention, to observe trends.
• Descriptive case studies: in-depth information across a few sites.
• Before-and-after design: collects data on key measures before and after the intervention.
• One-shot: a snapshot; no before measures and no comparison.


IPDET terms and design types, with visual representation, key advantages, and key disadvantages:

Experimental designs (characterized by random assignment to control and intervention groups):
• Randomized comparison group (before and after) – Notation: O1 X O3 / O2 O4. Advantages: strong internal validity; identifies change over time both with and without the intervention. Disadvantages: costly; ethical considerations; difficult to generalize.
• Randomized comparison group, after only (no before test) – Notation: X O3 / O4. Advantages: good internal validity; slightly more practical; useful for comparing outcomes. Disadvantages: does not identify change over time.

All quasi-, pre-, and non-experimental designs are slightly weaker than experimental designs with respect to validity, or have lower internal validity.

Quasi-experimental designs (involve comparisons but without random assignment):
• Matched and non-equivalent comparison group (before and after, between groups) – Notation: O1 X O3 / O2 O4. Advantages: helps rule out the effect of history; greater confidence than a within-group comparison; context must be considered. Disadvantages: difficult to control for all the variables that make the comparison groups non-equivalent.
• Interrupted time series, within group (good for descriptive questions) – Notation: O1 O2 O3 X O4 O5 O6. Advantages: threat of history partially controlled; maturation controlled. Disadvantages: threat of testing bias.
• Time series between groups (non-equivalent comparison) – Notation: O1 O3 O5 X O7 O9 O11 / O2 O4 O6 O8 O10 O12. Advantages: rules out threats of history; regression toward the mean reduced. Disadvantages: costly; time consuming; difficult to keep track of people over time.
• Correlational – Notation: O1 / O2 / O3. Advantages: uses statistics to determine correlations between cases and to isolate potential threats; identifies important relationships and potentially confounding variables. Disadvantages: requires large sample sizes; no statement about cause can be made; speculative.
• After only with non-equivalent comparison group – Notation: X O1 / O2. Advantages: practical; context must be considered; control of effects of testing, instrumentation, regression, and history. Disadvantages: ethical considerations; selection threatens validity.
• After only with different treatments – Notation: X O1 / Y O2 / Z O3. Advantages: can compare many interventions; must take context into consideration. Disadvantages: threats remain.
• Longitudinal (no baseline) – Notation: X O1 O2 O3 O4 … Advantages: follows individuals over time. Disadvantages: costly; difficult to keep track of individuals over time.
• Panel (measures the same people over time) – Notation: X O1 O2 O3 O4 O5 … Advantages: in-depth information. Disadvantages: can be costly.

Non-experimental designs (ideal for description):
• One shot – Notation: X O1. Advantages: ease, practicality. Disadvantages: many threats to validity; weak design.
• Before and after, within group (good for descriptive questions) – Notation: O1 X O2. Advantages: practical; context must be considered. Disadvantages: testing, instrumentation, and regression threats.
• Cross-sectional (within and between groups) – Notation: X O1 / O2 / O3. Advantages: clear picture of a point in time. Disadvantages: no clear indication of what is happening over time.
• Case study – Notation: O1 / O2 / O3. Advantages: in-depth contextual information. Disadvantages: time consuming; little internal validity.

(Tables on the previous page and this page adapted from Grembowski, D. (2001). The Practice of Health Program Evaluation. London: Sage Publications. Adapted for IPDET June 21, 2004.)


No Perfect Design

Experimental design:
• Controls for internal threats to validity.
• Hard to do in the public sector.

Before-and-after design:
• Useful in giving context for measuring change.
• Depending on the situation, it may have some weaknesses: testing, instrumentation, regression to the mean, attrition, history, and maturation may be threats.

Comparison design:
• Useful in looking at differences between groups.
• Controls for history and maturation if the comparison group is a close match.
• Selection and attrition are threats.

One-shot design:
• Useful for descriptive and normative questions.
• Very weak for cause-and-effect questions: many threats.
• Multiple one-shot designs begin to build a case.


Summary
An evaluation design is a "plan" to answer the evaluation questions. The "gold standard" design is the experimental design. An experimental design attempts to control all factors in the "experiment" to determine or predict what may occur; it uses randomized assignment of subjects and control groups. Quasi-experimental design is similar to experimental design in that it uses control groups, but it does not use randomized assignment of subjects to groups. Non-experimental designs are more descriptive; they use neither randomized assignment nor control groups. For most development interventions, it is difficult to design for cause-and-effect questions because of the complexity of the situation: it is difficult to prove that one intervention causes the observed effect. Evaluation designs can help us determine the impact of a program to the extent that they give us control over the implementation and measurement of the program. The intent is to eliminate other possible explanations for what we observe. For cause-and-effect questions, consider one or more of these evaluation designs:
• matched and non-equivalent comparison design
• time series and interrupted time series design
• correlational design using statistical controls
• longitudinal design
• panel design
• before-and-after design
• cross-sectional design
• propensity score matching.

Descriptive questions generally use descriptive or non-experimental designs. Designs for descriptive questions focus only on those who have received the intervention. Some of the designs used for descriptive questions are the same as those for cause-and-effect questions.


To answer descriptive questions, the most common designs are:
• one-shot
• cross-sectional
• before-and-after
• simple time series
• longitudinal design
• case studies.

The logic for normative questions is similar to that for descriptive questions, except that normative questions are always assessed against a criterion. Many leaders in international development evaluation are calling for more rigor in evaluation design. As projects, programs, and policies move towards results-based management and impact evaluation, evaluations should attempt to move towards stronger designs when appropriate.


Chapter 7 Activities
Application Exercise 7-1: Selecting an Evaluation Design
Scenario: You have been asked to measure the impact of building a community health clinic to teach parents how to treat common family illnesses and how to identify conditions that might be more serious. The goals are to increase the number of parents with a basic understanding of preventative healthcare, first aid, and early treatment strategies, and to reduce the number of children and elderly people whose illnesses become serious.
1. What is the program?

2. What are the desired outcomes?

3. How would you write a cause and effect question for this evaluation?

4. How would you write a normative question for this evaluation?

5. How would you write a descriptive question for this evaluation?

6. What design might you use for each of these approaches?

7. Why? What are the strengths and limits of these designs? Why is each one better than other possible designs?


Application Exercise 7-2: Selecting an Evaluation Design and Data Collection Strategy
Scenario: You have been asked to create an evaluation design for a six-month study to assess the effectiveness of a preventative health information campaign in your country. You have a moderate budget that will allow some assessment of outcomes, and you have a team of six research assistants to help you with the details. The campaign consists of two-day seminars conducted by health professionals in communities throughout your country. The purpose of your evaluation is to determine whether the information campaign resulted in improved health practices by citizens. Is your primary evaluation question a descriptive, a normative, or a cause-and-effect question? Explain.

Should your data collection strategy be more structured, more open-ended, or a combination of both? Why?

How would you identify the most important outcomes to measure, and how would you measure them?

What evaluation design elements would you use (e.g., inclusion of a comparison group, controlling for other variables, causal tracing strategies, etc)? What are strengths and weaknesses associated with your design?


References and Further Reading
Caliendo, Marco and Sabine Kopeinig (2005). Some practical guidance for the implementation of propensity score matching. Discussion Paper No. 1588, May 2005. Bonn, Germany: IZA. Retrieved May 5, 2008 from http://ftp.iza.org/dp1588.pdf
Bamberger, Michael and Howard White (2007). "Using strong evaluation designs in developing countries: Experience and challenges." Journal of MultiDisciplinary Evaluation, Vol. 4, No. 8, pp. 58-73.
Barzelay, M. (1997). "Central audit institutions and performance auditing: A comparative analysis of organizational strategies in the OECD." Governance: An International Journal of Policy and Administration, 10(3): 235-260.
Boruch, Robert (2004). Ethics and randomized trials. Presentation made at the International Program for Development Evaluation Training (IPDET), 2004.
Brossart, Daniel F., Daniel L. Clay, and Victor L. Willson (2002). Methodological and statistical considerations for threats to internal validity in pediatric outcome data: Response shift in self-report outcomes. Journal of Pediatric Psychology, Vol. 27, No. 1, pp. 97-107.

Brown, Randall S. and Ellen Eliason Kisker (1997). Nonexperimental designs and program evaluation. Children and Youth Services Review, 19(7): 541-66. Retrieved July 16, 2007 from http://www.aei.org/publications/pubID.17770/pub_detail.asp
Campbell, D. T., and J. C. Stanley (1963). Experimental and quasi-experimental designs for research. In N. L. Gage (Ed.), Handbook of Research on Teaching. Chicago: Rand McNally.
Center for Global Development (CGD) (2006). When Will We Ever Learn? Improving Lives through Impact Evaluation. Washington, D.C.: Center for Global Development.
Chatterji, M. (2007). "Grades of evidence: Variability in quality of findings in effectiveness studies of complex field interventions." American Journal of Evaluation, 28(3): 239-255.
Cohen, M. (2001). Evaluating microfinance's impact: Going down market. In O. N. Feinstein & R. Picciotto (Eds.), Evaluation and Poverty Reduction (pp. 193-203). New Brunswick, NJ: Transaction Publishers.


Cook, T. D., & Campbell, D. T. (1979). Quasi-Experimentation: Design and Analysis for Field Settings. Boston: Houghton Mifflin.
Davidson, E. J. (2006). The RCTs-only doctrine: Brakes on the acquisition of knowledge? Journal of MultiDisciplinary Evaluation, 6, ii-v.
Davidson, E. J. (2000). Ascertaining causality in theory-based evaluation. New Directions for Evaluation, No. 87: 17-26.
Homer-Dixon, Thomas (1995). Strategies for studying causation in complex ecological political systems. Occasional paper, Project on Environment, Population, and Security. Washington, D.C.: American Association for the Advancement of Science and the University of Toronto. Retrieved January 3, 2008 from www.library.utoronto.ca/pcs/eps/method/methods1.htm
Independent Evaluation Group (IEG), The World Bank (2006). Conducting Quality Impact Evaluations under Budget, Time, and Data Constraints. Washington, D.C.: The International Bank for Reconstruction and Development/The World Bank.
Independent Evaluation Group (IEG), The World Bank (2006). Impact Evaluation – The Experience of the Independent Evaluation Group of the World Bank. Washington, D.C.: The International Bank for Reconstruction and Development/The World Bank.
Mayne, John (2005). Ensuring quality for evaluation: Lessons from auditors. The Canadian Journal of Program Evaluation, Vol. 20, No. 1, pp. 37-64.
Mayne, John (2006). Audit and evaluation in public management: Challenges, reforms, and different roles. The Canadian Journal of Program Evaluation, Vol. 21, No. 1, Spring 2006, pp. 11-45.
Miles, M. B. and A. M. Huberman (1994). Qualitative Data Analysis: An Expanded Sourcebook (2nd ed.). Thousand Oaks, CA: Sage Publications.
National Institute for Occupational Safety and Health (NIOSH) (1999). A model for research on training effectiveness (TIER). Retrieved July 16, 2007 from http://www.cdc.gov/niosh/99-142.html
Office of Juvenile Justice and Delinquency Prevention (1989). Evaluating Juvenile Justice Programs: A Design Monograph for State Planners. Washington, D.C.: Prepared for the U.S. Department of Justice, Office of Juvenile Justice and Delinquency Prevention by Community Research Associates, Inc.

Patton, Michael Q. (2005). The debate about randomized controls in evaluation: The gold standard question. PowerPoint presentation to IPDET, July 2005.

Patton, Michael Q. (2007). "Design options and matching the type of impact evaluation and attribution issue to the nature of the intervention: Background discussion on impact evaluation for international development efforts." November 2007.

Patton, Michael Q. (2008). The logic of experimental designs and ten common criticisms: The gold standard debate. In Utilization-Focused Evaluation (4th ed.). Thousand Oaks, CA: Sage Publications.

Prennushi, Giovanna, Gloria Rubio, and Kalanidhi Subbarao (2002). PRSP Sourcebook core techniques, Chapter 3: Monitoring and evaluation. Retrieved January 8, 2008 from http://go.worldbank.org/3I8LYLXO80

Project STAR (2006). Study designs for program evaluation. Aguirre Division, JBS International, Inc. Retrieved May 5, 2008 from http://www.nationalserviceresources.org/filemanager/download/performanceMeasurement/Study_Designs_for_Evaluation.pdf

Schweigert, F. J. (2006). The meaning of effectiveness in assessing community initiatives. American Journal of Evaluation 27(4): 416. http://aje.sagepub.com/cgi/content/abstract/27/4/416

Scriven, Michael (2007). Key evaluation checklist, February 2007. Retrieved July 16, 2007 from http://www.wmich.edu/evalctr/checklists/kec_feb07.pdf

Scriven, Michael (2006). Converting perspective to practice. Journal of MultiDisciplinary Evaluation 6: 8-9.

Stake, R. E. (1995). The art of case study research. Thousand Oaks, CA: Sage Publications.

Stufflebeam, Daniel L. (2004). Evaluation design checklist. The Evaluation Center, Western Michigan University, November 2004. Retrieved January 3, 2008 from http://www.wmich.edu/evalctr/checklists/evaldesign.pdf

Stufflebeam, D. L., G. F. Madaus, and T. Kellaghan (Eds.) (2000). Evaluation models: Viewpoints on educational and human services evaluation. Boston: Kluwer.

Trochim, W., and D. Land (1982). Designing designs for research. The Researcher 1(1): 1-16. Retrieved July 16, 2007 from http://www.socialresearchmethods.net/kb/desdes.htm

United Kingdom Evaluation Society (2003). Glossary of evaluation terms. Retrieved January 2, 2008 from http://www.evaluation.org.uk/Pub_library/Glossary.htm

Wadsworth, Y. (1997). Everyday evaluation on the run. St. Leonards, NSW, Australia: Allen and Unwin.

White, Howard (2007). Challenges in evaluating development effectiveness. Working paper. Washington, D.C.: World Bank.

White, Howard, and Edoardo Masset (2005). Quasi-experimental evaluation. PowerPoint presentation, February 16, 2005.

The World Bank, Development Research Group (1998). "Do community-managed schools work? An evaluation of El Salvador's EDUCO program." Impact Evaluation of Education Reforms, Paper No. 8, February 1998.

The World Bank (2008). PovertyNet: Impact evaluation, methods and techniques, evaluation designs. Retrieved January 9, 2008 from http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTPOVERTY/EXTISPMA/0,,contentMDK:20188242~menuPK:415130~pagePK:148956~piPK:216618~theSitePK:384329,00.html

The World Bank (2004). PovertyNet: Evaluation designs. Retrieved August 16, 2007 from http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTPOVERTY/EXTISPMA/0,,contentMDK:20188242~menuPK:412148~pagePK:148956~piPK:216618~theSitePK:384329,00.html

Yin, R. K. (1984). Case study research. Thousand Oaks, CA: Sage Publications.

Web Sites

The Campbell Collaboration. Retrieved July 16, 2007 from http://www.campbellcollaboration.org/

Doing Impact Evaluation Series, The World Bank. http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTPOVERTY/EXTISPMA/0,,menuPK:384336~pagePK:149018~piPK:149093~theSitePK:384329,00.html#doingIE

Schweigert, F. J. (2006). The meaning of effectiveness in assessing community initiatives. American Journal of Evaluation 27(4): 416. Retrieved November 21, 2007 from http://aje.sagepub.com/cgi/content/abstract/27/4/416

Scriven, Michael (2007). Key evaluation checklist. Retrieved July 16, 2007 from http://www.wmich.edu/evalctr/checklists/kec_feb07.pdf

Stufflebeam, Daniel L. (2004). Evaluation design checklist. The Evaluation Center, Western Michigan University, November 2004. Retrieved January 3, 2008 from http://www.wmich.edu/evalctr/checklists/evaldesign.pdf


Chapter 8
Selecting and Constructing Data Collection Instruments

Introduction
Previous chapters discussed evaluation questions and evaluation designs to match these questions. This chapter looks at how to collect the data to answer evaluation questions. The chapter begins with information about data collection and ends with a "toolkit" of methods for collecting data. This chapter has four parts:
• Data Collection Strategies
• Key Issues about Measures
• Quantitative and Qualitative Data
• Common Data Collection Approaches: The Toolkit.

Part I: Data Collection Strategies
Data can be collected in many ways; no single way is the best. The decision about which method to use depends upon:
• what you need to know
• where the data reside
• the resources and time available
• the complexity of the data to be collected
• the frequency of data collection.

Table 8.1 shows how the data collection method depends upon the situation. It is a decision table to help an evaluator consider data collection methods for a program involving, for illustrative purposes, adult literacy in villages.

Table 8.1: Decision Table for Data Collection Method for Adult Literacy Intervention

If you need to know whether villagers with low literacy levels who participated in the program can read and write better than those with low literacy levels who did not participate, then consider:
• collecting samples of writing before and after the intervention
• using test results from before and after the intervention.

If you need to know whether literacy intervention participants are more actively engaged in their children's education, then consider:
• observing parent-child interactions before and after the program
• asking children, parents, and teachers whether this is the case.

If you need to know whether literacy program participants were satisfied with the quality of the literacy workshops and follow-up, then consider:
• using a structured interview of participants
• using a survey if literacy levels are high enough.

As mentioned briefly in Chapter 6, Developing Evaluation Questions and Starting the Design Matrix, the choice of methods hinges partly on the evaluation question to be answered, partly on how well the intervention is understood, and partly on the time and resources available. There is a trade-off between the in-depth understanding that comes from a case study, for example, and data collected in a systematic and precise way through a survey that allows valid comparisons to be made. As each evaluation question requires its own mini-design, different data collection methods may be used to answer different questions within the overall evaluation design.

Choices:
1. Do you want to obtain some information across all participants in the program (breadth), or do you want to obtain a more in-depth, qualitative understanding? It helps to know what the main client for the evaluation identifies as most important. Do they want representative data on the condition of the nation's schools, or a more in-depth sense of the situation in the poorest urban areas? Sometimes both are important.
2. How structured do you want to be? There are two choices for the amount of structure in collecting data:
• If you want precision, then more structure is better.
• If you want depth and nuance, or if you are uncertain about what you specifically want to measure, then a semi-structured or even unstructured approach is better.

Structured Approach
Structured data collection approaches require that all data be collected in exactly the same way. This is particularly important for multi-site and cluster evaluations, in which evaluators need to be able to compare findings across sites in order to draw conclusions about what is working where. Structure is also important when making comparisons with alternative interventions to determine which is most cost-effective.

Consider the example of an evaluation of an agricultural intervention. To address one evaluation question, the evaluators decide to use the moisture content of the soil as a measure of successful land drainage. The evaluation will collect measures of moisture content from multiple sites in the region, before and after the drainage, over the same period of time (and under the same weather conditions). To address a second question, the evaluators want to investigate the views of all affected farmers about the project's effects by asking specific, narrowly focused questions whose answers they can "count" (e.g., 20 percent of the farmers indicated…). The questions should be precisely worded, with a fixed set of responses in a multiple-choice format, so that everyone is asked the question in exactly the same way and chooses from exactly the same set of responses.
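To make the tallying step concrete, here is a minimal sketch in Python showing how before-and-after moisture readings from several sites and fixed-choice farmer responses might be summarized. The site names, readings, and response options are invented for illustration; they are not taken from the evaluation described above.

    # Minimal sketch: summarizing structured measurements and fixed-choice responses.
    # All site names, readings, and responses below are invented for illustration.
    from collections import Counter
    from statistics import mean

    # Soil moisture (percent) at each site, before and after drainage.
    moisture = {
        "Site A": {"before": [32, 35, 31], "after": [21, 22, 20]},
        "Site B": {"before": [29, 30, 33], "after": [24, 23, 25]},
    }

    for site, readings in moisture.items():
        drop = mean(readings["before"]) - mean(readings["after"])
        print(f"{site}: average moisture fell by {drop:.1f} percentage points")

    # Fixed-choice responses to "How did the project affect your yields?"
    responses = ["increased", "increased", "no change", "increased", "decreased"]
    counts = Counter(responses)
    for answer, n in counts.items():
        print(f"{answer}: {n} of {len(responses)} farmers ({n / len(responses):.0%})")

Because every respondent chooses from the same fixed set of answers, this kind of count can be compared directly across sites.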

For a third question, the evaluators plan to use records of crop production and prices over time, before and after the intervention, in the drained area, compared with a similar area in the region where the land has not yet been drained. For a fourth question, they ask a sample of 100 of the 2,600 small farmers their views about the project and its effects. The evaluation will likely want semi-structured questions so the evaluators can probe responses but still tabulate them. If the evaluation does not need "counting information," the questions can be more open-ended or semi-structured.

Structured data collection approaches are used to collect quantitative data when the evaluation:
• has a need for precision
• has a large sample or population
• knows what it needs to measure
• needs to show its results numerically
• needs to make comparisons across different sites or interventions.

Semi-structured Approach
Semi-structured data collection approaches are still systematic and follow general procedures, but data are not collected in the same way every time. These approaches are more open and fluid. People can tell you what they want in their own way, and the evaluation may vary questions or probe and ask for more detail, since the evaluators are not following a rigid script. Semi-structured data collection methods are generally qualitative and are used when the evaluation:
• is conducting exploratory work in a new development area (for example, empowerment interventions or those targeted at women)
• is seeking understanding, themes, and/or issues
• wants participant narratives or in-depth information
• wants in-depth, rich, and "back stage" information.

If the evaluation is of a Community Driven Development (CDD) project, for example, the evaluators may choose to use a semi-structured approach to data collection. CDD is an approach that gives control over planning decisions and investment resources to community groups and local governments (World Bank, Community Driven Development, ¶ 1). Because CDD programs give control of planning decisions to local groups, the evaluator cannot use a fully structured approach: the same questions are not appropriate for all groups, since the specific interventions selected will differ.

Data Collection General Rules
The following are general rules to help with data collection:
• Use multiple data collection methods when possible.
• Use available data if you can. (Using available data is faster, less expensive, and easier than generating new data.)
• If using available data, be sure to find out how earlier evaluators:
− collected the data
− defined the variables
− ensured accuracy of the data.
• If you must collect original data:
− establish procedures and follow them (a protocol)
− maintain accurate records of definitions and coding
− pre-test, pre-test, pre-test
− verify the accuracy of coding and data input.

Part II: Key Issues about Measures
The art of collecting data involves measures. We measure opinions, performance, skills, and attitudes; each question in a data collection instrument takes a measure. In determining how we will measure the variable of interest and collect data on it, five key issues must be kept in mind:
• Are the measures credible?
• Are the measures valid?
• Are the measures relevant?
• Are the measures reliable?
• Are the measures precise?

Credibility
Credibility refers to how trustworthy or believable the collected data are. In other words, do the data collected give information about the actual situation? For example, teacher opinions may not be the most credible measure for learning the reasons for high dropout rates; the opinions of the dropouts themselves are a more credible measure.

Validity
Validity describes whether a measurement actually measures what it is supposed to measure. Are the questions giving accurate information? For example, using waiting lists to measure the demand for certain early childhood education programs is a measure with weak validity: waiting lists are frequently out of date, parents place children on multiple waiting lists, and when a child is placed, his or her name does not necessarily come off the other lists. Two specific kinds of validity are face validity and content validity.
• Face validity addresses the extent to which the content of a test or procedure looks like it is measuring what it is supposed to measure. For example, if the evaluation is measuring physical fitness, how fast one runs 100 meters may indeed look like one measure of physical fitness.
• Content validity addresses the extent to which the content of a test or procedure adequately measures the variable of interest. If an evaluator were trying to develop a measure of health status, for example, he or she might best consult with health professionals to ensure that a measure with high content validity is selected. Proportion of body fat might be a more valid measure of fitness than, for example, knowledge of healthy eating habits. Self-reports of applying that knowledge might be somewhat more valid, but still weak, in that respondents may not want to report poor eating habits and so give a better picture than is actually the case.

Relevance
Relevance means measuring what counts. It is important to make sure that the data being collected are relevant, that is, that they measure the most important information. Avoid the trap of measuring what is easy instead of what is needed, or of trying to measure everything. Indeed, the design matrix is itself a tool for making sure the data collected will be relevant.

Reliability
Reliability describes the stability of a measurement: it measures the same thing, in the same way, in repeated tests. For example, the measurement tools for some sporting events need to be reliable. The tape that measures the distance of a jump must measure the distance in the same way each time it is used. If it does, it is a reliable measure; if it does not, the results of the competition would be flawed and could be questioned. Birth weights of newborn infants are an example of a reliable measure, assuming the scales are calibrated. Attendance rates at schools are an example of a measure with low reliability unless attendance is precisely defined, because attendance rates are known to vary depending on when in the school year the measure is taken.
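One rough way to check reliability is a test-retest comparison. The minimal sketch below, in Python, uses invented values (not drawn from the text) to correlate two rounds of the same measurement; a correlation near 1 suggests a stable, reliable measure.

    # Minimal sketch: test-retest reliability as a simple correlation
    # between two rounds of the same measurement. All values are hypothetical.
    from math import sqrt

    def pearson(x, y):
        # Pearson correlation coefficient between two equal-length lists.
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
        sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
        return cov / (sd_x * sd_y)

    # Two weighings of the same ten newborns on the same calibrated scale (kg).
    first_round = [2.9, 3.4, 3.1, 2.7, 3.8, 3.0, 3.3, 2.8, 3.6, 3.2]
    second_round = [2.9, 3.5, 3.1, 2.7, 3.7, 3.0, 3.3, 2.8, 3.6, 3.2]

    print(f"Test-retest correlation: {pearson(first_round, second_round):.2f}")

A value close to 1 would support treating the scale as reliable; a markedly lower value would signal the kind of instability described above.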

Precision
Precision describes how well the language used in the data collection matches the measure. For example, if the question is about countries, then the measures must be at the national level; if the question is about people, then the measures must be at the individual level.

Examples of precise measures of community health status (a brief worked example follows this list):
• infant mortality rates
• number of physicians per 1,000 people
• number of deaths from communicable diseases for age-specific cohorts
• percentage of pre-school children immunized
• number of clinics per square mile.
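To show how such rate-based measures are computed from raw counts, here is a minimal sketch in Python. All of the counts are invented for a hypothetical district; they are not taken from the text.

    # Minimal sketch: computing rate-based community health measures from raw counts.
    # All counts below are invented for a hypothetical district.
    infant_deaths = 42      # deaths before age one during the reference year
    live_births = 1680      # live births during the same year
    physicians = 36
    population = 90000

    # Infant mortality is conventionally expressed per 1,000 live births,
    # and physician supply per 1,000 people.
    infant_mortality_rate = infant_deaths / live_births * 1000
    physicians_per_1000 = physicians / population * 1000

    print(f"Infant mortality rate: {infant_mortality_rate:.1f} per 1,000 live births")
    print(f"Physicians: {physicians_per_1000:.2f} per 1,000 people")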

Examples of precise measures of educational quality:
• teacher-student ratio
• educational background of teachers
• number of books and other learning resources per student
• number of certified teachers
• students' test scores
• satisfaction of parents
• number of students who graduate as a proportion of those who begin the last year of secondary school.

Table 8.2: Key Issues about Data

Key issue: Are measures credible?
Question: Are your data giving you information about the actual situation?

Key issue: Are your findings valid?
Question: Do your findings reflect what you set out to measure? If your question is about people's behavior, are you really measuring behavior? If your question is about people's perceptions, are you measuring perceptions?

Key issue: Are you measuring what is most important?
Question: "Measure what counts." Are you measuring what really matters, as opposed to what is easiest?

Key issue: Are your findings reliable?
Question: Are your data collected in the same way, using the same decision rules, every time, so that you have consistency in your measures?

Key issue: Are your measures precise?
Question: If your question is about crop production, then your measures should be based on the quality and quantity of crops. We might want to compare:
• buyers' assessments of crop quality from the newly drained land vs. the quality of crops prior to the drainage project
• price per kilo obtained for the crop at market before vs. after the drainage project, with an additional comparison with the price obtained by un-drained farms over the same period
• total yield (in kilos per hectare) before and after the project.
If your question is about how farms perform financially, then your measures should be based on farm income. We might want to compare:
• farm incomes before and after the land was drained
• farm incomes for drained vs. un-drained farms in comparable areas.

Part III: Quantitative and Qualitative Data
Data can be classified as quantitative or qualitative. Quantitative data are typically defined as data in numerical form; qualitative data are data in non-numerical form.

Quantitative data deal with numbers. These are data that generally can be precisely measured. Think of the word quantitative: it is derived from quantity. Examples of ways quantitative data can be described are age, cost, percentage, length, height, area, volume, weight, speed, time, and temperature.

Qualitative data deal with descriptions. They are data that can be observed or self-reported, but not necessarily precisely measured. Examples of ways qualitative data are described are appearance, color, texture, smell, taste, relationships, and behavior (Roberts, 2007, Qualitative vs. Quantitative Data).

For example, suppose an evaluation is investigating a group of villagers involved in a micro-lending program. Quantitative data for this program might include: number of participants, number of participants by gender, age, number of children, income, inventory of product, cost of product, and sales. Qualitative data might include descriptions of: products, family relationships, demeanor of participants, relationships with the community, and self-efficacy (feelings of control).

Patton (2002, p. 4) categorizes three data collection methods that produce qualitative findings:

• in-depth, open-ended interviews
• direct observation
• analysis of written documents.

Patton further identifies the kinds of information evaluators learn from each of the three methods. Interviews yield direct quotations about experiences, opinions, feelings, and knowledge. Observations give detailed descriptions of activities, behaviors, actions, and the full range of interpersonal interactions and organizational processes. Document analysis includes studying excerpts, quotations, or entire passages from records, memoranda and correspondence, official publications and reports, personal diaries, and open-ended written responses to questionnaires and surveys.

Most qualitative data collection comes from the evaluator spending time in the setting under study. The evaluator makes firsthand observations of activities and interactions, sometimes engaging personally in those activities as a participant observer. The extensive notes taken during data collection constitute the raw data. These data are then organized into readable narrative descriptions with major themes, categories, and illustrative case examples (Patton, 2002, pp. 4-5).

The quality of the qualitative data collected depends on the evaluator. According to Patton: "Systematic and rigorous observation involves far more than just being present and looking around. Skillful interviewing involves much more than just asking questions. Content analysis requires considerably more than just reading to see what's there. Generating useful and credible qualitative findings through observation, interviewing, and content analysis requires discipline, knowledge, training, practice, creativity, and hard work" (2002, p. 5).

How do you know when qualitative methods are appropriate for an evaluation? Patton (1987, pp. 39-41) developed a checklist of twenty questions to help decide whether qualitative methods are an appropriate evaluation strategy. If the answer to any question is "yes," then the collection of at least some qualitative data is likely to be appropriate.

Patton's 20-Question Qualitative Checklist (1987, pp. 39-41)
1. Does the program emphasize individual outcomes – that is, are different participants expected to be affected in qualitatively different ways? And is there a need or desire to describe and evaluate these individualized client outcomes?
2. Are decision makers interested in elucidating and understanding the internal dynamics of programs – program strengths, program weaknesses, and overall program processes?
3. Is detailed, in-depth information needed about certain client cases or program sites, for example, particularly successful cases, unusual failures, or critically important cases for programmatic, financial, or political reasons?
4. Is there interest in focusing on the diversity among, idiosyncrasies of, and unique qualities exhibited by individual clients and programs (as opposed to comparing all clients or programs on standardized, uniform measures)?
5. Is information needed about the details of program implementation: What do clients in the program experience? What services are provided to clients? How is the program organized? What do staff do? Do decision makers need to know what is going on in the program and how it has developed?
6. Are program staff and other stakeholders interested in the collection of detailed, descriptive information about the program for the purpose of improving the program (i.e., is there interest in formative evaluation)?

7. Is there a need for information about the nuances of program quality – descriptive information about the quality of program activities and outcomes, not just levels, amounts, or quantities of program activity and outcomes?
8. Does the program need a case-specific quality assurance system?
9. Are legislators or other decision makers or funders interested in having evaluators conduct program site visits so that the evaluators can be the surrogate eyes and ears for decision makers who are too busy to make such site visits themselves and who lack the observing and listening skills of trained evaluators? Is legislative monitoring needed on a case basis?
10. Is the obtrusiveness of evaluation a concern? Will the administration of standardized measuring instruments (questionnaires and tests) be overly obtrusive in contrast to data gathering through natural observations and open-ended interviews? Will the collection of qualitative data generate less reactivity among participants than the collection of quantitative data? Is there a need for unobtrusive observations?
11. Is there a need and desire to personalize the evaluation process by using research methods that emphasize personal, face-to-face contact with the program – methods that may be perceived as "humanistic" and personal because they do not label and number the participants, and feel natural, informal, and understandable to participants?
12. Is a responsive evaluation approach appropriate – that is, an approach that is especially sensitive to collecting descriptive data and reporting information in terms of differing stakeholder perspectives based on direct, personal contact with those different stakeholders?
13. Are the goals of the program vague, general, and nonspecific, indicating the possible advantage of a goal-free evaluation approach that would gather information about what effects the program is actually having rather than measure goal attainment?
14. Is there a possibility that the program may be affecting clients or participants in unanticipated ways and/or having unexpected side effects, indicating the need for a method of inquiry that can discover effects beyond those formally stated as desirable by program staff (again, an indication of the need for some form of goal-free evaluation)?
15. Is there a lack of proven quantitative instrumentation for important program outcomes? Is the state of measurement science such that no valid, reliable, and believable standardized instrument is available or readily capable of being developed to measure quantitatively the particular program outcomes for which data are needed?
16. Is the evaluation exploratory? Is the program at a pre-evaluation stage, where goals and program content are still being developed?
17. Is an evaluability assessment needed to determine a summative evaluation design?
18. Is there a need to add depth, detail, and meaning to statistical findings or survey generalizations?
19. Has the collection of quantitative evaluation data become so routine that no one pays much attention to the results anymore, suggesting a possible need to break the old routine and use new methods to generate new insights about the program?
20. Is there a need to develop a program theory grounded in observations of program activities and impacts, and the relationship between treatment and outcomes?

Data collection usually includes both quantitative and qualitative data, but one approach may be dominant. Each approach can be characterized in the following ways.

A quantitative approach:

• is more structured
• attempts to provide precise measures
• emphasizes reliability
• is harder to develop
• is easier to analyze.

A qualitative approach:
• is less structured
• is easier to develop
• can provide "rich data" (detailed information that can be widely applied and linked to other data)
• is challenging to analyze
• is labor intensive to collect
• emphasizes validity.

Table 8.3: When to Use Quantitative vs. Qualitative Approaches

Use a quantitative approach if you:
• want to do statistical analysis
• want to be precise
• know what you want to measure
• want to cover a large group.

Use a qualitative approach if you:
• want narratives or in-depth information
• are in an exploratory situation and are not sure what you are able to measure
• do not need to quantify.

In reality, quantitative and qualitative data are related to each other. According to Trochim, "All quantitative data is [sic] based upon qualitative judgments; and all qualitative data can be described and manipulated numerically" (2006, Types of Data, ¶ 3). When creating instruments for collecting data, evaluators make many qualitative decisions about which questions to ask, how to ask them, and so on. On the other hand, all qualitative data can be converted into quantitative data by dividing the qualitative information into units and numbering them (Trochim, 2006, Types of Data, ¶ 3-4).
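As a simple illustration of that conversion, the following sketch in Python assigns open-ended interview answers to numbered thematic codes and then counts them. The responses and the coding scheme are invented for illustration; they are not drawn from the text.

    # Minimal sketch: converting qualitative responses into quantitative data
    # by assigning each response a numbered thematic code (hypothetical data).
    from collections import Counter

    # Coding scheme agreed on by the (hypothetical) evaluation team.
    codes = {1: "income increased", 2: "no change", 3: "income decreased", 9: "unclear"}

    # Open-ended answers to "How has the program affected your income?"
    responses = [
        "I sell more vegetables now and earn a bit more",   # coded 1
        "Nothing has really changed for my family",          # coded 2
        "My earnings went up after the second loan",         # coded 1
        "Hard to say; some months are better, some worse",   # coded 9
    ]
    assigned = [1, 2, 1, 9]  # codes assigned by a trained coder

    counts = Counter(assigned)
    for code, label in codes.items():
        share = counts[code] / len(assigned) * 100
        print(f"{code} ({label}): {counts[code]} responses, {share:.0f}%")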

Obtrusive vs. Unobtrusive Methods
Data can be collected obtrusively or unobtrusively. Obtrusive methods are approaches in which observations of behavior are made with the participants' knowledge. Examples of obtrusive methods are perceptions, opinions, and attitudes gathered through interviews, surveys, and focus groups, as well as observations done with the knowledge of those being observed. Unobtrusive methods are observations made without the knowledge of the participant. Examples of unobtrusive methods are historical, document, and archival data and observing participants without their knowledge.

If an evaluation uses questionnaires to collect data, the subjects know they are being studied, which may produce artificial results. According to Patton: "The instrument itself can create a reaction which, because of its intrusiveness and interference with normal program operation and client functioning, fails to reflect accurately what has been achieved in the program" (1987, p. 33). If those who are being studied know they are being studied, a risk of error is introduced: those being studied might change their behavior and/or what they say. One legendary unobtrusive measure gauged the popularity of museum exhibits by studying the amount of wear on the floor tiles in front of the exhibits and where the tiles most often needed to be replaced.

Part IV: Common Data Collection Approaches: The Toolkit
This part of the chapter discusses a number of data collection strategies that you can add to your "toolkit," along with guidance on when to use each technique. The data collection technique chosen will depend on the situation: each technique is more appropriate in some situations than in others. All can be used, however, even though they vary in their amount of structure.

Caution: Gathering data from people
No matter which method is chosen to gather data from people, the information gathered is potentially subject to bias. Bias means that when asked to provide information about themselves, respondents may not tell the whole truth, whether unintentionally or intentionally. They may not remember accurately, or they may fear the consequences of providing a truthful answer. They may also be embarrassed or uncomfortable about admitting things they feel will not be socially acceptable. All self-reported data share this vulnerability. For example, if the evaluation asks questions about the use of protection in sexual intercourse, or about when the subject last visited a doctor, he or she may feel embarrassed and not answer accurately, or may describe what he or she thinks the evaluator wants to hear rather than the truth. There is also some concern that people who choose to participate in a program, or in its evaluation, may differ from those who choose not to. These issues arise in surveys, interviews, and focus groups, whether they use structured or unstructured approaches.

Combinations
Typically, a variety of data collection approaches is used in combination, to answer different evaluation questions or to provide multiple sources of data in response to a single evaluation question. The evaluation may, for example, collect available data from farmers' crop yield records, interview buyers of farm produce, and survey the farmers themselves. Sometimes evaluators use focus groups or case studies to help develop themes for a questionnaire or to make sense of survey results.

Collecting the same information using different approaches, in order to get more accurate answers to an evaluation question, is called methodological triangulation. Denzin (1978) identified several types of triangulation. One type involves the convergence of multiple data sources. Another is triangulation of methods, which involves the convergence of data from multiple data collection methods. Evaluator triangulation, in which multiple evaluators are involved in an investigation, is a third type. Related to evaluator triangulation is participant corroboration, or triangulation by participants.

Measurement Considerations
If people know they are being studied, they may act differently, so the very act of collecting data may introduce some error into the measurements. The data collection method may also have an effect: for example, women may respond differently to a male interviewer than to a female interviewer, and they may respond differently if they are interviewed alone or with their spouses. These are the kinds of considerations to weigh when choosing a data collection method.

Evaluators generally have several options in the kinds of measures they choose. For example, evaluators studying economic opportunities for women might need to define the range of possible opportunities, such as starting their own businesses, high-paying jobs, or executive positions. They might also want to know what percentage of the women started businesses that were financially successful and created jobs for other women or family members. Do they want to know how many women are passing their new skills and confidence on to their daughters? As evaluators specify what they want to measure, they become clearer about what is important in answering their questions.

The following sections present a set of "tools" that introduce different methods of collecting data:
• Tool 1: Participatory Data Collection
• Tool 2: Available Records and Secondary Analysis
• Tool 3: Observation
• Tool 4: Surveys
• Tool 5: Focus Groups
• Tool 6: Diaries, Journals, and Self-reported Checklists
• Tool 7: Expert Judgment
• Tool 8: Delphi Technique
• Tool 9: Citizen Report Cards.

Tool 1: Participatory Data Collection
Participatory data collection approaches involve groups or communities heavily in the data collection. Examples of participatory data collection techniques are:
• community meetings
• mapping
• transect walks.

Community Meetings
One of the most common methods of participatory data collection is the community meeting. These meetings allow members of the community to ask questions, make comments, and discuss issues of importance to the community. For meetings to yield usable data, they must be well organized. The evaluator and stakeholders should agree on the purpose of the meeting and commit to being present. Before the meeting, the evaluator should establish ground rules, announce them before the meeting, and then follow them during the meeting. Items to consider for ground rules are:
• how to identify speakers
• time allotted for speakers
• format for questions and answers.

The ground rules should be put in writing and be available at the meeting for latecomers. The community meeting should be widely publicized. Consider using flyers at community-based organizations, ads in newspapers serving particular communities, and radio station announcements. Members of the community can also be responsible for spreading the word. Choose the location for the meeting with the idea of encouraging community participation while still meeting the comfort, access, and safety needs of the participants (Minnesota Department of Health, 2007, Community Engagement, Community Forums and Public Hearings).

The following are some of the advantages of holding community meetings:
• can raise the credibility of the process by enhancing openness and inclusion
• inexpensive and relatively easy to arrange
• allows for broad participation
• may provide a more relaxed setting, yielding greater participation
• can raise the level of awareness and understanding of the evaluation and build support
• can increase the evaluator's knowledge of important program issues
• may reveal issues that warrant further investigation.

Community meetings have pitfalls as well. These include:
• community members who choose to participate may not be representative of the community; some people with good ideas or a clear understanding of the issues do not like to attend or to speak at such events
• those who attend community meetings may be those with strong likes or dislikes about the program.

Because of these disadvantages, community meetings should not be the primary data collection method for the evaluation.

Mapping
Social mapping can be used to present information on village layout, infrastructure, demography, ethno-linguistic groups, community facilities, health patterns, wealth, and other community issues.

One approach that can be used when working in communities is mapping – "drawing" a conceptual picture of the various elements that make up a community, including resources and assets, and how they interact with one another. This approach brings together members of the community to better understand the community and how the intervention fits (or does not fit) within it. It can be used as part of any approach if appropriate to the evaluation questions. Mapping is especially useful for collecting and plotting information on the distribution, access, and use of resources within a community.

Mapping is a useful tool in participatory evaluation, or in any approach involving stakeholders, because it provides them with a way to work together. At the same time, it increases everyone's understanding of the community; people may have different understandings of the community based on their status and experience. Mapping is also particularly useful with non-literate groups. The process of mapping can be used to generate discussion about local development priorities, to verify secondary sources of information, and to capture changes or perceived changes over time (e.g., before and after an intervention). There are many kinds of mapping, including:
• resource mapping
• historical mapping
• social mapping
• health mapping
• wealth mapping
• land use mapping
• demographic mapping.

While the process of mapping often is applied to the planning of interventions, it can also be used in evaluations. It may help, for example, to understand whether the project being evaluated is located in the areas of greatest need or how it is co-located with other resources in the community. If co-located with other resources, do they work collaboratively? If not, what are the barriers?

Tools for Mapping
The global positioning system (GPS) is a navigation system that uses satellites to identify locations on Earth. GPS has become a vital global utility, indispensable for modern navigation on land, sea, and air around the world. It can also be a valuable tool to assist with mapping: a GPS device can pinpoint a location and give its latitude and longitude.

Another tool that can assist evaluators is Google Earth, a free-of-charge program downloadable from the Internet. It maps the Earth by overlaying images obtained from satellite imagery, aerial photography, and geographic information systems (GIS) onto a three-dimensional globe. Many large cities are available at a resolution high enough to see individual buildings, houses, and even cars. The degree of resolution available is based somewhat on the points of interest, but all land is covered at a resolution of at least 15 meters. There are two ways to locate an area on the globe: enter the coordinates, or simply use the mouse to browse to a location.

Google Earth can be helpful for collecting baseline and trend data. Locate an area with Google Earth, save the image, and print it. The pictures show the locations of buildings, forests, rivers, lakes, and so on. Data collected at later dates can then be compared with this baseline image, which is useful for showing changes over time, for example, in transportation improvements (e.g., roads). Figure 8.1 shows an image from Google Earth of Dhaka (Dacca), Bangladesh. Notice that the image shows buildings, a river, and boats or barges on the river.

Fig. 8.1: Google Earth Image of Dhaka (Dacca), Bangladesh.

Google Earth is available in a free version and in licensed versions for commercial use. It is currently available officially for Windows XP, Mac OS X, and Linux at http://earth.google.com/

Transect Walks
Transect walks are walks that evaluators take around a community to observe the people, surroundings, and resources. They are a kind of spatial data-gathering tool, and good observation skills are essential for performing them. Transect walks are usually done after mapping; they provide an evaluator with a "big picture" view of the community and help identify issues that need further investigation. A transect walk can take as little as an hour or as long as a day.

A transect walk is planned by drawing a "transect line" through a map of the community. The line should go through, or transect, all zones of the community in order to provide a representative view. The evaluator, accompanied by several community members, walks along the area represented by the transect line, talking to the community members while observing conditions, people, problems, and opportunities across the community (Empowering Communities: Participatory Techniques for Community-Based Programme Development, 2007, p. 41). The following are examples of things that can be observed during a transect walk:

• housing conditions
• presence of "street children"
• informal street commerce and prostitution
• availability of public transportation
• types of non-governmental organizations or church organizations
• types of stores
• interactions between men and women
• food sold in open markets
• sanitary conditions
• children's labor
• presence of health facilities
• community facilities (Empowering Communities: Participatory Techniques for Community-Based Programme Development, 2007, p. 42).

Tool 2: Available Records and Secondary Data Analysis
Sometimes data that can be used to answer our questions have already been collected. When using data gathered by others, find out how they carried out the data collection, how they measured each variable, the decision rules they used to code and clean the data, and how they treated missing data, nonresponses, and low response rates. Newspapers, television, Web pages, and Internet discussion groups give access to vast amounts of information; some of it can be valuable, and some of it can be wrong or misleading.

Examples of typical sources of available data:
• files/records
• computer databases
• industry reports
• government reports
• other reports or prior evaluations
• census data and household survey data
• electronic mailing lists and discussion groups
• documents (budgets, policies and procedures, organizational charts, maps)
• newspaper and television reports.

Using Existing Records
Government agencies, clinics, schools, associations, and development organizations are but a few of the organizations that produce records; these can be a mainstay of evaluations. Organizational records are a common source of evaluation information. Most organizations have already collected data from clients and communities and have organized these data into internal information systems. They may also have summarized and reported the information in the form of:
• internal management reports
• budget documents
• reports to the public or funding agencies
• evaluation or monitoring reports.

Data may also be available for secondary analysis.

Key Issues to Consider:
• Are the available data valid?
• Are the available data reliable?
• Are the available data accurate?

Collecting Data from Paper Files, Records, or Documents
Sometimes the data are available but not in a form that is easy to analyze. The evaluators may have to collect information that sits in files, documents, or newspapers, for example. In this situation, they develop a data collection instrument (DCI) that specifies exactly what data to collect from the file or record and how to code it. A DCI is like a closed-ended questionnaire, with specific items that have fixed responses. The objective is to develop a DCI that is easy, simple, and clear. Once the instrument is developed, it should be pre-tested.

When working with documents that describe current activities or practices, the evaluator should verify that the documents accurately reflect what is actually practiced. Observations and interviews may help in verifying actual practices. For example, when observing a training program: do they really hold classes five days every week, are materials available, and are the participants diverse?

Suppose the evaluation asks whether critical care nurses trained in a government-sponsored training program are more effective than other critical care nurses. A data collection instrument could be used to systematically gather relevant data from their files. The evaluators could select a sample of critical care clinics with one or more nurses trained through the government program and review the records of all the nurses. These records might include their educational background, how long they have been nursing, and their performance ratings.

The data collection instrument might look like the one shown in Figure 8.2.

ID #: ____________
1. Highest year of education completed: ____________
2. Registered nurse? Yes ____ No ____
3. Completed government training? Yes ____ No ____
4. If yes, year training completed: ____________
5. How many years nursing at this clinic? ____________
6. How many years nursing elsewhere? ____________
7. Performance ratings for the past 5 years:
   Year: ______ Rating: ______
   Year: ______ Rating: ______
   Year: ______ Rating: ______
   Year: ______ Rating: ______
   Year: ______ Rating: ______
8. Performance award received during past 5 years? Yes ____ No ____
   If yes, number of awards received in past 5 years: ____
9. Gender: ____ Male ____ Female
10. Comments: ________________________________________

Fig. 8.2: Example of a Data Collection Instrument.
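To illustrate how records captured with such an instrument might be coded and summarized, here is a minimal sketch in Python. The field names mirror the items in Figure 8.2, but the records themselves are invented, and the comparison is only illustrative, not a method prescribed by the text.

    # Minimal sketch: coding DCI records and comparing two groups of nurses.
    # The records below are invented; field names mirror the items in Figure 8.2.
    from statistics import mean

    records = [
        {"id": 1, "gov_trained": True, "years_at_clinic": 4, "ratings": [3, 4, 4, 5, 4]},
        {"id": 2, "gov_trained": False, "years_at_clinic": 6, "ratings": [3, 3, 4, 3, 4]},
        {"id": 3, "gov_trained": True, "years_at_clinic": 2, "ratings": [4, 4, 5, 5, 5]},
        {"id": 4, "gov_trained": False, "years_at_clinic": 8, "ratings": [4, 3, 3, 4, 3]},
    ]

    def average_rating(group):
        # Mean of each nurse's own five-year mean rating within the group.
        return mean(mean(r["ratings"]) for r in group)

    trained = [r for r in records if r["gov_trained"]]
    others = [r for r in records if not r["gov_trained"]]

    print(f"Government-trained nurses, average rating: {average_rating(trained):.2f}")
    print(f"Other nurses, average rating: {average_rating(others):.2f}")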

Sometimes it is necessary to read and analyze official documents rather than files in order to describe current activities or practices.

For example, to determine the factors that contribute to a good practice in delivering a public service, it would be helpful to look at documents and ask:
• When was the program started?
• What were the goals and objectives in the enabling legislation or the authorizing document?
• How many people were involved?
• What departments were involved?
• How did they go about implementing the program?
• What measures did they use to track success?
• What was the budget?
• Have prior evaluations been conducted?

It is essential to try to verify that the documents accurately reflect what is actually practiced. Verification can come in the form of supporting documents that report similar information or through interviews of people who are knowledgeable about the program, its history, and implementation. It helps to seek out people who have different roles, including external people, such as budget staff and clients. Their perspectives can help shed light on the information obtained, as well as provide insights into the unwritten history.

Collecting Computer Data
Often the data we need come from large computer databases – household survey data, for example, or records of loans made by a financial intermediary to small and medium enterprises (SMEs). We may need access to the data to run a different analysis than has been done to date, or to verify data we have been given. In such cases, the evaluator must do the following (a short verification sketch follows the list):

• Obtain the database structure, data dictionary, and coding schemes.
• Find out what is needed to transfer the data to your computer. Sometimes the organization holding the database will prefer to do the analyses for you; sometimes this is the only option, but if possible and practical, it is preferable to transfer the file.
• Verify the accuracy of the data.
• Transfer the data with a minimum of effort (no re-typing) to avoid introducing new errors from data entry procedures.
• Be sure to check for viruses before transferring data to your computer.
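As a rough illustration of such verification, the sketch below loads a transferred extract and checks it against the documented coding scheme and for missing values. The file name, column names, and valid codes are all hypothetical, and the sketch assumes the Python pandas library is available.

    # Minimal sketch: basic checks on a transferred survey extract (hypothetical).
    import pandas as pd

    # Codes documented in the (hypothetical) data dictionary.
    VALID_REGION_CODES = {1, 2, 3, 4}

    df = pd.read_csv("household_survey_extract.csv")  # hypothetical file name

    # 1. How many records and variables arrived?
    print(f"Rows: {len(df)}, Columns: {len(df.columns)}")

    # 2. Missing values per variable.
    print(df.isnull().sum())

    # 3. Values outside the documented coding scheme.
    bad_codes = df[~df["region_code"].isin(VALID_REGION_CODES)]
    print(f"Records with undocumented region codes: {len(bad_codes)}")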

Advantages and Challenges of Using Available Data
Advantages: Using available data is often less expensive and faster than collecting the original data yourself.
Challenges: There may be coding errors or other problems. The data may not be exactly what is needed. You may have difficulty getting access. You have to verify the validity and reliability of the data.

Tool 3: Observation
Observation enables us to see what is happening. By just using our eyes, we can observe traffic patterns, land use patterns, the layout of city and rural environments, the quality of housing, the condition of roads, or who comes to a rural health clinic for medical services. Observation is a useful data collection tool when the evaluation is securing benchmark and descriptive data and documenting program activities, processes, and outputs. Observation is appropriate in the following conditions:
• when you want direct information, for example:
− paying random visits to schools, homes, farms (or other sites) and observing, rather than simply asking people
− observing operations and procedures in offices, schools, hospitals (or other sites) rather than relying on reports
− unobtrusively recording numbers of occurrences, such as ethnic diversity, gender, or age groups
• when you are trying to understand an ongoing behavior, process, unfolding situation, or event, for example:
− observing and describing what is done in each phase of a project or program
− observing parents dealing with children, teachers dealing with students, health care workers dealing with patients, or managers dealing with employees
− observing managers conducting business meetings before and after officer training programs

• when there is physical evidence, or there are products or outputs that can be readily seen, for example:
− observing food and other items being sold in the market
− periodically observing along the coastline of a lake involved in a clean-up or rebuilding program
− having a team of experts inspect the quality of grasses and legumes in a pasture
− inspecting gardens, newsletters, project books, etc.
• when written or other data collection procedures seem inappropriate, for example:
− several participants volunteer to observe and report on your program delivery rather than having all participants fill out a questionnaire
− a perceptive observer records dynamics and concerns during a workshop for new immigrants
− trainers observe each other's classes, noting dynamics, questions, and level of participation (The University of Wisconsin – Extension: Cooperative Extension, 1996, p. 1).

When using observation techniques, the observer can be unobtrusive, a participant, or obtrusive.

Unobtrusive observer: No one knows you are observing. For example, if you visit a local market that has been given resources for development, you can observe the activity within the shops, the traffic in the area, and the general sanitation, and you may even enter into casual conversations with shoppers. Of course, to be unobtrusive you must look like someone who would be likely to be seen in that marketplace.

Participant observer: You actually participate in the activity, typically without anyone knowing you are observing. For example, you may make some purchases in the local market as if you were just a regular shopper, when you really are evaluating the costs of merchandise in the market versus in established stores. Participant data are often used to check the quality of customer service, such as tax advice.

Obtrusive observer: The people being observed know you are there to observe them. For example, if you come into the marketplace with a clipboard and video camera and are introduced as an observer, everyone knows you are there as an observer. If you observe a classroom, teachers and students are aware of your presence.

When people know they are being observed, they may change their behavior. For instance, if you observe shopping transactions in the market, the merchants might make a particularly strong effort at bartering and promoting their best products. Based on observation alone, you will not know whether this reflects your presence or whether the merchants normally engage with their customers so energetically. Over time, these effects diminish; the children you are observing on the playground, for example, may become oblivious to your presence. It is good to minimize your influence on behavior by using unobtrusive observation. But observation at one point in time may not give an accurate picture: the activity in the marketplace may be influenced by a recent crime, so the observer may mistakenly conclude that not many people come to the marketplace. Observation over time, and/or follow-up with other measurement strategies, should help give a clearer picture of behavior.

Within any of these options, observations can be structured or semi-structured. Structured observations use a specific checklist to precisely count events or instances according to a specific schedule, or a stopwatch to time activities. Semi-structured observations may simply note what the evaluator found interesting, typical, unusual, and/or important. Alternatively, an evaluator using a semi-structured observation might engage in continuous note taking about transactions as they occur, or focus on specific actions of shoppers or merchants. The possible variations are many; the method selected will depend upon the situation and the style and preferences of the evaluator.

"What is not optional is the taking of field notes" (Patton, 2002, p. 302). Lofland (1972, p. 102) states that field notes are "the most important determinant of later bringing off a qualitative analysis. Field notes provide the observer's raison d'être. If… not doing them, [the observer] might as well not be in the setting".
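To make the structured option concrete, here is a minimal sketch in Python of tallying checklist events recorded during timed observation intervals. The event codes, intervals, and log entries are hypothetical, not an instrument drawn from the text.

    # Minimal sketch: tallying structured-observation checklist entries.
    # Each log entry is (interval, event_code); all values are hypothetical.
    from collections import Counter

    EVENTS = {"Q": "shopper asks a question", "B": "bartering occurs", "S": "sale completed"}

    # Codes recorded by the observer during three 10-minute intervals.
    log = [(1, "Q"), (1, "B"), (1, "S"), (2, "B"), (2, "B"), (2, "S"), (3, "Q"), (3, "S")]

    counts_by_interval = {}
    for interval, code in log:
        counts_by_interval.setdefault(interval, Counter())[code] += 1

    for interval, counts in sorted(counts_by_interval.items()):
        summary = ", ".join(f"{EVENTS[c]}: {n}" for c, n in counts.items())
        print(f"Interval {interval}: {summary}")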


Patton (2002, pp. 302-303) discusses techniques for keeping field notes during observations. He stresses the importance of recording everything the observer believes to be worth noting and of not trusting anything to future recall. As soon as possible, the observer should capture in the field notes any information that helps in understanding the context, the setting, and what went on. The field notes should contain descriptive information that, during analysis, will allow the evaluator to mentally return to the observation and re-experience it. Patton suggests recording basic information, such as:
• where the observation took place
• who was present
• what the physical setting was like
• what social interactions occurred
• what activities took place.

Patton goes on to illustrate the importance of using specific rather than general terms; words such as poor, anger, and uneasy are not sufficiently descriptive. "Such interpretive words conceal what actually went on rather than reveal the details of the situation." Patton summarizes field notes in the following way: "Field notes, then, contain the ongoing data that are being collected. They consist of descriptions of what is being experienced and observed, quotations from the people observed, the observer's feelings and reactions to what is observed, and field-generated insights and interpretations" (Patton, 2002, p. 305).
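As a purely illustrative aid (not part of the source text), the sketch below shows one way the descriptive elements Patton lists, plus quotations, observer reactions, and insights, might be captured as a simple field-note record. The field names and the sample entry are hypothetical.

```python
# A simple field-note record; a sketch only. The fields follow the descriptive
# elements discussed above, and the sample entry is hypothetical.
from dataclasses import dataclass, field
from datetime import date


@dataclass
class FieldNote:
    observation_date: date
    location: str                     # where the observation took place
    people_present: list
    physical_setting: str
    social_interactions: str
    activities: str
    quotations: list = field(default_factory=list)   # words of those observed
    observer_reactions: str = ""      # the observer's feelings and reactions
    insights: str = ""                # field-generated insights and interpretations


note = FieldNote(
    observation_date=date(2008, 3, 14),
    location="Village market, stall area near the east entrance",
    people_present=["merchants", "shoppers", "two municipal inspectors"],
    physical_setting="Roughly 30 covered stalls; newly paved walkway",
    social_interactions="Extended bartering at the vegetable stalls",
    activities="Unloading of produce; cleaning of the drainage channel",
    quotations=["'Business is better since the road was repaired.'"],
    observer_reactions="Surprised by the amount of foot traffic at midday",
    insights="Sanitation appears tied to the new drainage work",
)
print(note.location)
```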


The following are examples of program components to observe:
• Characteristics of participants (individually and as a group)
  − gender, age, profession/vocation, dress, appearance, ethnicity
  − attitude toward subject, toward others, about self
  − skill and knowledge levels
• Interactions
  − level of participation, interest
  − power relationships, decision-making
  − general climate for learning, problem-solving
  − levels of support, cooperation
• Nonverbal behavior (learners, presenters)
  − facial expressions
  − gestures
  − postures
• Program leader(s)/presenters
  − clarity of communication
  − group leadership skills, encouraging participation
  − awareness of group climate
  − flexibility, adaptability
  − knowledge of subject, use of aids, other teaching/learning techniques
  − sequence of activities
• Physical surroundings
  − the room: space, comfort, suitability
  − amenities: beverages, etc.
  − seating arrangements
• Products of a program
  − demonstrations, facility, plan
  − brochures, manuals, newsletters (Cloutier et al., 1987, section III, p. 50).


Patton (2002, p. 260) points out that simply being equipped with functioning senses does not make someone a skilled observer. He goes on to discuss the importance of training and preparing observers, and identifies six components of training for observers:
• learning to pay attention, see what there is to see, and hear what there is to hear
• practice in writing descriptively
• acquiring discipline in recording field notes
• knowing how to separate detail from trivia, so that the former is captured without being overwhelmed by the latter
• using rigorous methods to validate and triangulate observations
• reporting the strengths and limitations of one's own perspective, which requires both self-knowledge and self-disclosure.

Stake (1995, p. 50) developed an issue-based observation form. While the example is designed for case studies in education, the form can be adapted for other kinds of observations. An adapted version of Stake's example is shown in Figure 8.3. The example uses coding for quick recording; an evaluator can change the coding to meet the needs of different observations. In this example, the issues being studied are identified in a large box on the right of the form.

[Figure 8.3 is a one-page coded form with spaces for the observer, school, teacher characteristics (gender, age, experience, amount of direct instruction), date, time, grade, number of students, subject, time of write-up, and a synopsis of the lesson and activities; low-to-high rating scales for the description of the room (as a learning place, science place, and competition place), pedagogic orientation (textbook, standardized testing, problem solving), and teacher aim (didactic, heuristic, philetic); ratings for references made to the scientific method, technology, and ethics/religion; and a box listing the science education issues under study: 1) response to budget cuts, 2) locus of authority, 3) teacher preparation, and 4) hands-on materials.]

Fig. 8.3: Issue-based Observation Form for Case Studies in Science Education (Source: Stake, 1995, p. 50)


Examples of observations in development evaluation

The following are some examples of ways of using observation to collect data:
• observe classroom activities to measure the amount of time spent on hands-on learning activities
• observe the amount of traffic on a road from the village to a major town
• observe the amount of male versus female participation in meetings for information on gender issues
• observe program offices to document relationships and interactions among people in the offices.

Planning how you are going to collect the data is another important part of data collection. Developing a way to record the information gathered during observations is a major component of this plan. Three ways to record information are:
• observation guide – a printed form that provides space for recording observations
• recording sheet or checklist – a form used to record observations with yes/no options or on a rating scale indicating the extent or quality of something; these are used when there are specific, observable items, actors, or attributes to be observed
• field notes – the least structured way to record observations; observations are recorded in a narrative, descriptive style whenever the observer notices or hears something important (The University of Wisconsin – Extension: Cooperative Extension, 1996, p. 2).

The appendix of the article by The University of Wisconsin – Extension: Cooperative Extension (1996, pp. 6-8) contains sample observation guides. Another part of the plan is choosing the number of observers. Whenever feasible, use more than one observer. Whatever the number of observers used, all should be trained so that they observe according to agreed-upon procedures.


Pilot test the observation data collection instrument. To do this, have two observers go to the same area and complete their rating sheets, and then compare the completed sheets. If there are big differences, provide more training and clarification; if the differences are small, proceed with the larger study. Always try to pilot test before beginning actual data collection; skipping this step tends to produce data that are more diverse, because less emphasis has been placed on the uniformity of data collection. It is important to note that in determining whether a project or program is meeting evaluation standards, observation is the major tool.
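To make the pilot comparison concrete, the short sketch below computes simple percent agreement between two observers' rating sheets, one way of judging whether "big differences" remain. The checklist items and ratings are hypothetical, not from the text.

```python
# A minimal sketch for comparing two observers' pilot rating sheets.
# The checklist items and ratings below are hypothetical.

def percent_agreement(sheet_a: dict, sheet_b: dict) -> float:
    """Share of checklist items on which the two observers gave the same rating."""
    items = sheet_a.keys() & sheet_b.keys()
    if not items:
        return 0.0
    agreed = sum(1 for item in items if sheet_a[item] == sheet_b[item])
    return agreed / len(items)


observer_1 = {"sanitation": "high", "shop activity": "medium", "foot traffic": "high"}
observer_2 = {"sanitation": "high", "shop activity": "low",    "foot traffic": "high"}

agreement = percent_agreement(observer_1, observer_2)
print(f"Agreement: {agreement:.0%}")   # Agreement: 67%
# Low agreement would signal the need for more observer training or clearer
# definitions before the full study proceeds.
```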

Advantages and Challenges of Observational Data Collection: Advantages:

Collects data on actual behavior rather than self-reported behavior or perceptions. It is real-time rather than retrospective.

Challenges:

Interpretation and coding challenges; sampling can be a problem; observation data collection can be labor intensive.

Tool 4: Surveys

Surveys are excellent tools for collecting data about people's perceptions, opinions, and ideas. They are less accurate in measuring behavior, because what people say they do may or may not reflect what they actually do. A key component of a survey is the decision on sample selection; ideally, the sample taken is representative of the population as a whole (sampling is covered in Chapter 9: Deciding on the Sampling Strategy). Surveys can be structured or semi-structured, administered in person or by telephone, or self-administered. Structured surveys are precisely worded, with a range of predetermined responses from which the respondent can select. Everyone is asked exactly the same questions in exactly the same way and is given exactly the same choices to answer the questions.


The following shows an example of a question that might be found in a structured survey. Note that the number of response options should generally be an odd number (i.e., 3, 5, or 7) so that the neutral response is readily apparent to the respondent. Two options are appropriate only when a simple "yes" or "no" answer is needed. (Sometimes, however, development organizations use even-numbered scales in their project evaluation ratings to require the respondent to make a choice between, for example, a "satisfactory" and a "partly unsatisfactory" rating.) Semi-structured surveys ask the same general set of questions but allow many, if not all, of the responses to be open-ended. The examples that follow illustrate the difference between structured and semi-structured survey questions.

Examples of structured questions

1. To what extent, if at all, has this workshop been useful in helping you to learn how to evaluate your program?
• little or no extent
• some extent
• moderate extent
• great extent
• very great extent
• no opinion
• not relevant

2. Do all people in the village have a source of clean water within 500 meters of their homes? (Yes/No)

Examples of semi-structured questions

1. What are three things you learned from the program evaluation workshop that you have used on the job?
2. Where are the sources for clean water for the villagers?
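As an illustration only (this representation is not from the text), the sketch below shows one way a structured question with an odd number of scaled options plus "exit" choices, and a semi-structured open-ended question, might be represented when assembling a survey instrument electronically. The class and field names are hypothetical.

```python
# A sketch only: one way to represent structured and semi-structured questions.
from dataclasses import dataclass


@dataclass
class SurveyQuestion:
    text: str
    options: list = None   # None marks an open-ended (semi-structured) item

    @property
    def is_structured(self) -> bool:
        return self.options is not None


workshop_usefulness = SurveyQuestion(
    text="To what extent, if at all, has this workshop been useful in helping "
         "you to learn how to evaluate your program?",
    # Five scaled options keep an odd number with a visible midpoint;
    # "no opinion" and "not relevant" serve as exits.
    options=["little or no extent", "some extent", "moderate extent",
             "great extent", "very great extent", "no opinion", "not relevant"],
)

lessons_used = SurveyQuestion(
    text="What are three things you learned from the program evaluation "
         "workshop that you have used on the job?",
)

for q in (workshop_usefulness, lessons_used):
    kind = "structured" if q.is_structured else "semi-structured"
    print(f"{kind}: {q.text[:50]}...")
```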


With all surveys and interviews, it is essential to pilot test (pretest) the data collection instrument early on. This means asking a small but fairly representative sample of potential respondents to take the survey and to highlight any areas where the questions need clarification. One of the most useful strategies is to sit down with someone while they fill out the questionnaire and ask them to reason aloud as they do so. This can give excellent insights into how people interpret questions. More often than not, things considered crystal clear turn out to be confusing or ambiguous in ways that were never anticipated. Evaluators need to pay particular attention when respondents misunderstand a question; if this happens, they need to revise the question and retest before going "live" with the instrument.

The following compares structured and semi-structured surveys. Structured surveys are:
• harder to develop: the survey needs to be absolutely certain to cover all possible pieces of information, since there are no "catch-all" open-ended questions to fill in the gaps
• easier to complete: checking a box takes less time than writing a narrative response
• easier to analyze
• more efficient when working with large numbers of people.

Semi-structured surveys are:
• somewhat easier to develop: the survey can include fairly broad open-ended questions that will capture anything missed in the structured sections, so there is less danger of leaving something out
• harder to analyze, but a rich source of data
• subject to bias in the interpretation of the open-ended responses
• burdensome for people to complete as a self-administered questionnaire.

Three of the most frequent means of conducting surveys are:
• in person (group or individual)
• self-administered
• mail, phone, or Internet.


In-person surveys are useful for gaining an in-depth understanding of experiences, opinions, or individual descriptions of a process. They are also useful when other approaches will not work; for instance, self-administered surveys only work when the population is able to read the language of the survey and is motivated enough to respond. In-person surveys can be done individually or in groups. Self-administered surveys should be short and take no more than 20 minutes to complete (shorter is better).

Another choice for collecting survey data is to use the postal system or a technology such as the telephone or the Internet. These can save travel costs during data collection. Surveys can be sent to any area that has access to these methods of distribution; if an area does not receive postal service regularly or has limited telephone or Internet capability, it will be excluded from the survey. In addition, issues of literacy make these approaches less feasible in many development contexts.

Self-administered surveys can be structured, semi-structured, or a combination. A combination survey is a written survey in which the respondent completes mostly closed-ended questions, but one or two open-ended questions are included at the end. These often make people feel more comfortable, as they can add anything they feel the survey missed or comment on the survey itself. One or two open-ended questions are all that is needed; more may be burdensome, and the respondent may not complete the survey. Open-ended responses can be time consuming to analyze but can provide useful insights and quotes to illustrate major themes of the report. Of all the approaches, research suggests that people are most likely to give honest responses to sensitive questions in a self-administered survey.

Advantages and Challenges of Surveys Advantages:

Best when you want to know what people think, believe, or perceive; only they can tell you that.

Challenges:

People may not accurately recall their behavior or may be reluctant to reveal their behavior if it is illegal or stigmatized. What people think they do or say they do is not always the same as what they actually do.


Techniques for Developing Questions

Whether the survey is conducted in person, by mail, phone, or Internet, or is self-administered, the person conducting the survey needs to know about developing questions. Developing questions includes:
• choosing the questions
• wording the questions and responses
• sequencing questions
• choosing a layout for the survey
• reviewing, translating, and field testing.

Choosing the Questions

When developing questions, evaluators can choose among many different forms of questions. Questions can be open-ended or closed-ended. Open-ended questions cannot be answered with a simple response or a simple selection; respondents are often uncomfortable or unwilling to answer open-ended questions in writing. Closed-ended questions can be answered with one simple piece of information. The question "What is your date of birth?" is a closed-ended question because it can be answered with just the date of birth: no details, just a simple piece of information. A second form of closed-ended question is the dichotomous question, whose response has two choices and only one answer, such as a yes/no or true/false response. A third form of closed-ended question is the multiple choice question, which has more than one choice for the response but only one answer.

Many experts advise using mostly closed-ended questions, while considering one or two open-ended questions at the end of the survey (Burgess, 2001, Questionnaire Design). Using too many different forms of questions can be confusing for respondents. Jackson (2007, slide 49) suggests using no more than three forms of questions for general audiences and no more than five for college-educated respondents.


According to the Living Standards Measurement Study (LSMS) survey (World Bank, 1996, pp. 21-52), it is helpful to develop questions in levels:
• The first level of developing the survey is identifying the important issues to be covered.
• Once the important issues are identified, this helps establish the relative weight of the different sections in the survey.
• Important issues can then be identified within sections.
• Question writers may need to learn more about how specific programs work.
• Once this background work is done, the actual writing of the survey may begin.

Table 8.4 shows how progressively more detail is required at each level of the process.

Table 8.4: Levels of Refinement in Determining Questionnaire Content

• Overarching objectives – Define the objectives: for example, to study poverty or to understand the effects of government policies on households.
• Balance between sections – Define which issues are most important: for example, the incidence of food price subsidies; the effect of changes in the accessibility or cost of government health and education services; the effect of changes in the economic climate due to structural adjustment or transition from a centrally planned to a market economy.
• Balance within sections – Within the education sector, for example, define which of the following are most important for the country and the moment: the levels and determinants of enrollment, poor attendance, learning, and differences in male and female indicators; the impact of the number of years of schooling on earnings in the formal sector and in agriculture, and how or whether they differ; which children have textbooks or receive school lunches or scholarships; how much parents have to pay for schooling.
• Write questions to study specific issues or programs – In a case where it is decided that it is important to study who has access to textbooks, for example, the question writer will need to know: how many different subjects are supposed to have textbooks available; whether the books given out by the government are to be given to each child individually or shared; whether they are to be taken home or used only in the classroom; whether they are to be used for only one year or several; whether they are to be paid for; when the books are supposed to be available; and whether textbooks bought from bookshops are better or worse than those provided by the school.

(Source: World Bank, 1996, p. 23)


Many times, the step of communicating and consulting with policymakers is not given enough attention. Many policymakers are not familiar with surveys and may have difficulty interpreting complicated data derived from them; they may also find it hard to imagine how the answers to the questions will be analyzed. For this reason, it is important to show policymakers and program managers examples of tables or other analyses that could be produced from the survey, as well as the draft survey itself.

Wording the Questions and Responses

Patton (2002, pp. 348-351) describes six kinds of questions that can be asked of people. Distinguishing the type of question forces the evaluator to be clear about what is being asked and helps the respondent answer appropriately. The six types of questions are:
• experience and behavior questions – about what a person does or aims to do; they elicit behaviors, experiences, actions, and activities
• opinion and values questions – aimed at understanding people's cognitive and interpretive processes ("head stuff," as opposed to actions and behaviors)
• feeling questions – aimed at eliciting people's emotional responses to their experiences and thoughts, looking for adjective responses such as anxious, happy, afraid, intimidated, and confident; make clear to the person being interviewed whether the question is asking for feelings or for opinions, beliefs, and considered judgments
• knowledge questions – questions that inquire about the respondent's factual information, what the respondent knows
• sensory questions – questions about what is seen, heard, touched, tasted, and smelled
• background/demographic questions – age, education, occupation, and the like, which describe the characteristics of the person being interviewed.

Each of the six types of questions can be asked in the present, past, or future. For example, an evaluator can ask, "What did you know about HIV/AIDS treatment five years ago?" (past); "What did you learn today from the presentation about HIV/AIDS?" (present); or "What would you like to learn about HIV/AIDS?" (future) (Patton, 2002, pp. 351-352).


It is critically important that the evaluator make clear to the respondent what question is being asked. If a question is unclear, the respondent may feel uncomfortable, ignorant, confused, or hostile. Also, find out what special terms the persons being surveyed use when talking about the project, program, or policy, and use those words when writing the survey (Patton, 2002, p. 361).

Evaluators need to word questions carefully and clearly. The following suggestions for wording questions, adapted from the TC Evaluation Center, U.C. Davis (2007), help ensure that the respondent knows what information is requested:
• Use simple words that will have the same meaning for all respondents.
• Make questions and response categories specific.
• Avoid questions and response options that are double-barreled (avoid questions or responses that use the words "and" or "or").
• Avoid questions that assume knowledge. If necessary, provide relevant information in the introduction to the question.
• Be wary of double negatives in question-response combinations; for example, when "not" appears in the stem of a yes/no question.
• Make response options mutually exclusive.
• Make response options balanced.
• Avoid objectionable, intrusive, and/or condescending questions.
• If there is room, list the response options vertically instead of across the page; this makes them easier to read.


Besides following the above suggestions for wording questions, the evaluator must also be careful not to lead the respondent into giving a desired answer. Many evaluators have confirmed that slight changes in the way questions are worded can have a significant impact on how people respond. Several investigators have looked at the effects of modifying adjectives and adverbs. Words like usually, often, sometimes, occasionally, seldom, and rarely are commonly used in questionnaires, although it is clear that they do not mean the same thing to all people. Some adjectives are highly variable in meaning and others less so. The following have highly variable meanings and should be avoided in surveys: a clear mandate, most, numerous, a substantial majority, a minority of, a large proportion of, a significant number of, many, a considerable number of, and several. Other adjectives produce less variability and generally have more shared meaning: lots, almost all, virtually all, nearly all, a majority of, a consensus of, a small number of, not very many of, almost none, hardly any, a couple, and a few (StatPac, 2007, Question Wording).

Frary (1996) also offers many suggestions for designing effective questionnaires. Among his suggestions for rated responses are to:
• order the responses from a lower level on the left to a higher level on the right, for example: 1) Never 2) Seldom 3) Occasionally 4) Frequently
• consider combining response categories if respondents would be very unlikely to mark "never" and "seldom" would connote an almost equivalent level of activity, for example: 1) Seldom or never 2) Occasionally 3) Frequently
• ask respondents to rate both positive and negative stimuli (items worded from both ends of the scale); this way respondents must evaluate each item rather than uniformly agreeing or disagreeing with all of them
• if possible, use fewer response options; for example, if respondents' opinions are likely to be clear, use a simple 1) Agree 2) Disagree, but when many respondents have opinions that are not strong or well formed, use more options, such as 1) Agree 2) Tend to Agree 3) Tend to Disagree 4) Disagree
• avoid the response option "other."


After writing questions, read each question and response to check for language and logic. Is the grammar correct? Does the response follow the question logically?

Sequencing Questions

Beginning the survey with questions that are of interest to the respondent makes it more likely that the survey will be completed. Keep the flow of questions logical and avoid complex branching. Group similar or related questions together and try to establish a logical sequence for the questions (Burgess, 2001). Locate personal or confidential questions at the end of the survey; if they are placed at the beginning, some respondents may be unsettled and not continue with the remaining questions (Frary, 1996). Jackson (2007, slide 49) suggests sequencing questions:
• from easy and interesting to difficult and uncomfortable
• from general to specific
• in chronological order
• so that the most important questions come by two-thirds of the way through the survey, because some respondents may not complete it.

Jackson also advises evaluators to prioritize questions and consider dropping low-priority ones. The purpose of a survey is to collect the data needed, not to gather more and more information. He suggests that a self-administered survey for most people be one page long; for professionals with at least an undergraduate college degree, he suggests a two-page (front and back) self-administered survey (Jackson, 2007, slide 38).

Choosing a Layout for the Survey

The physical layout of the printed survey deserves consideration. The page should have a title and the revision date, and it should be uncluttered and free of unnecessary headings. Contact information and return information must be included with the survey, and many evaluators also include a brief introductory statement in case the cover letter is misplaced. To make the survey easier for respondents to follow, the questions and responses should be formatted and worded consistently and neatly. Each question should be numbered. Instructions for how to answer each question must also be included; for example, the instructions should indicate how many responses may be chosen and how to indicate a selection (tick a box, circle the answer, or write in a short answer) (Burgess, 2001).


Jackson (2007, slides 51-52) suggests placing check-off boxes or response lines with enough space between them that the evaluator can identify which selection was made, because people can be sloppy when completing surveys. He also discusses the importance of giving the survey a "professional look," because professional-looking surveys usually yield a higher response rate. To help raise response rates, make it easy to return the completed survey by including a self-addressed, stamped envelope or establishing a procedure for retrieving the survey. For the letter introducing the survey, Jackson (2007, slides 53-54) suggests including:
• the purpose of the survey (and how it is in the respondent's interest)
• how the person was selected for the survey
• how the data will be used
• whether anonymity will be maintained
• how much of the respondent's time is needed
• any incentives
• contact information for the survey staff, for more information or clarification.

Reviewing, Translating, and Pilot Testing

The process of survey development is an iterative one. Once the initial version of the survey is drafted, the various interested parties should review the draft in detail, making notes and sharing their criticisms and suggestions. Revisions are then made to the original draft. This process may need to be repeated several times.

In cases where one or more different languages are needed to collect data, the surveys must be translated, and those responsible for the survey need to make sure that accurate translations of the instruments have been done. When translating surveys, a person who knows both languages and is familiar with the purpose of the questions should do the first translation. If feasible, a person who was intimately involved in designing the questionnaire should do a back-up translation; this helps avoid contaminating the interpretations with prior knowledge. Once the survey is translated into the local language, another person should translate it back into the original language. This process checks for gaps or misunderstandings in what was translated and is the only way to assure an accurate translation. Any lack of agreement on parts of the survey needs to be reconciled before pilot testing can be done.


In most cases, surveys are printed only in the official language(s) of the country, and teams of interviewers with skills in communicating in a number of local languages are used to collect data. In these cases, a few key questions or phrases are translated into local languages and presented in the survey manual. For less commonly spoken languages, local interpreters may have to be used.

Questions used in surveys should always be worded in the simple terms of the language as it is commonly spoken; they should not be written in an academic or formal style. In local languages, the gap between the spoken and written language and the difficulty of balancing simplicity and precision may be great. This is especially true for languages that are not commonly used in writing.

Once the survey is agreed upon, it should go through a field or pilot test with a small number of subjects. Based on the results of the field test, revisions may be needed. If the first field test suggests many changes, another field test may be needed, covering some or all of the survey; this may be especially so if there were multiple translation problems. The field test is one of the most critical steps in preparing a survey. Its goal is to ensure that the survey is capable of collecting the information it is intended to collect. A good field test will look at the survey at three levels:
• the survey as a whole – Are all parts of the survey consistent? Do any areas ask the same question?
• each section – If the survey has more than one section, does each section collect the intended information? Are all major activities accounted for? Are there any questions that are not relevant?
• individual questions – Is the wording clear? Does the question allow ambiguous responses? Are there alternative interpretations?

When conducting a field or pilot test, it is important to test the survey with samples from diverse areas and from all major language and socioeconomic groups, for example:
• rural and urban
• individuals employed in the formal sector and the informal sector
• all key language groups.


If the field test is needed in only one language, it usually takes about one week to complete. If the final survey is to be done in more than one language, it will take more time, because a version in each language should be field-tested. At the end of the field test, one or two weeks should be set aside to review the results. The team(s) working on the field test should meet to discuss and agree upon the changes needed to the questions or instructions, addressing the survey as a whole, each section of the survey, and each question and instruction. The following is a list of guidelines for conducting surveys.

General Guidelines for Conducting Surveys

The following are general guidelines for conducting surveys:
• Keep it simple, clear, easy, and short.
• Locate other people who have done the kind of evaluation you are interested in, and find surveys similar to what you think you want to do.
• Make sure people know why you are asking them to participate.
• Ask questions that are easy to answer and do not frustrate the respondent's desire to be clear in his or her responses.
• Do not ask respondents for information that requires them to go to a file or other source. If you must do this, let them know in advance so the material can be assembled before the survey is administered.
• Respect respondents' privacy. Treat surveys confidentially and have procedures in place to ensure privacy. Never promise confidentiality unless it can be absolutely delivered.
• Respect respondents' time and intelligence.
• Tell respondents how they were selected and why their participation is important.
• Do no harm: keep responses confidential. For example, use aggregate responses in the report, assign an identification number to the data, and destroy the link to the person's name.


Techniques for Conducting Face-to-Face Interviews

Developing questions for an interview is similar to developing questions for other kinds of surveys, but evaluators have additional considerations when conducting interviews, including:
• sequencing questions for interviews
• techniques for interviewing.

Sequencing Questions for Interviews

Carter McNamara, author of Field Guide to Consulting and Organizational Development with Nonprofits (2007), offers the following suggestions for sequencing questions in interviews:
• Get the respondents involved in the interview as soon as possible.
• Before asking about controversial matters (such as feelings and conclusions), first ask about some facts. This way, respondents can more easily engage in the interview before warming up to matters that are more personal.
• Intersperse fact-based questions throughout the interview to avoid long lists of them, which tend to leave respondents disengaged.
• Ask questions about the present before questions about the past or future; it is usually easier for respondents to talk about the present and then work into the past or future.

The last questions might allow respondents to provide any other information they would like to add, as well as their impressions of the interview.

Patton (2002, pp. 352-353) also addresses sequencing questions for interviews. He likes to begin an interview with questions about noncontroversial present behaviors, activities, and experiences, because such questions are easy to answer and encourage the respondent to talk descriptively. Once an experience or activity has been described, he asks about opinions and feelings about those behaviors, activities, and experiences; in this way, respondents feel less threatened and the answers are likely to be more grounded and meaningful. Patton also prefers to start with questions about the present because they are easier to answer than questions about the past, and he then suggests asking about the past using the answers about the present as a baseline.


Only once he has answers to the present and past questions will he ask questions about the future, because answers to those questions involve speculation and are less reliable. According to Patton (2002, p. 353), "background and demographic questions are basically boring; they epitomize what people hate about interviews." He also feels they can sometimes make the respondent uncomfortable if they are too personal. For these reasons, he keeps background and demographic questions to a minimum and advises never beginning an interview with a long list of them, which prevents the respondent from becoming actively involved. If these kinds of questions are needed to make sense of the rest of the interview, they should be tied to descriptive information about the respondent's present life experience.

Techniques for Interviewing

This section covers techniques for conducting interviews to collect survey data, including:
• developing an interview
• suggestions for interviewing people
• obtaining participation for in-person interviews
• dealing with cultural differences.

Developing an interview

Table 8.5 summarizes the steps for developing an interview.

Table 8.5: Developing an Interview

1. Define the purpose of the interview. Link the purpose to the evaluation objectives.
2. Decide whether to ask open-ended or closed-ended questions.
3. Draft the interview questions and sequence them so they flow.
4. Prepare an introduction and closure for the interview, including: the purpose of the interview; how and why the respondents were selected; closing by asking whether they have questions or comments; and a thank you and any follow-up.
5. Prepare to record responses.
6. Pre-test the instrument.


Suggestions for interviewing people

Much can be learned from listening to others. Interviews can be informal conversations, semi-structured interviews, or highly structured interviews, and the lines between these kinds of interviews can become blurred. Interviews are usually best when they are conducted like conversations. The participants need to feel comfortable; it helps if they know why their views are being sought and by whom, and they are typically more comfortable if the interview is confidential. Porteous, Sheldrick, and Stewart (1997) suggest that people conducting interviews should be able to:
• engage and encourage people to share their views
• start and maintain discussions with strangers
• refrain from expressing their own opinions
• maintain confidentiality
• speak clearly
• read, write, and speak the language of data collection
• deal with difficult people
• provide consistency.

They also suggest that people who conduct interviews should understand the purpose of the evaluation and the specific evaluation questions, and should be familiar with the data collection technique and their role in it (previous experience is preferable). Good interviewers possess the following traits:
• an excellent memory
• flexibility
• friendliness
• a good sense of timing
• good listening skills.

How many interviewers are needed? This depends on many things. Before deciding, consider:
• the type and length of the survey instrument
• the number of people from whom data will be collected, and their schedules
• how difficult it is to reach people
• the overall timeline of the evaluation.


For example, consider a survey to collect information by telephone about attitudes toward injury prevention. For this survey, the "guesstimate" of the number of interviewers needed might involve the following reasoning (a brief worked version of this arithmetic follows the list):
• each interview takes about 10 minutes to complete
• the survey needs to reach parents who work outside the home
• one interviewer can do roughly three 10-minute interviews an hour (including the time it takes to successfully contact and enlist respondents as well as the interview itself)
• the best times to reach most people are probably between 5 p.m. and 9 p.m. on weekdays and from about 10 a.m. to 5 p.m. on weekends
• data can therefore be collected for four hours a day on five weekdays and seven hours a day on weekends (a total of 34 hours of available interviewing time per week)
• 200 interviews need to be completed; at three interviews an hour, that will take about 66 hours
• one person working full-time could complete the interviews in a little over two weeks, which may be a bit draining and intense for one person
• the data need to be gathered and analyzed as quickly as possible.
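The sketch below simply restates the arithmetic above in code; the figures are the ones given in the text, and the script itself is only an illustration.

```python
# A rough "guesstimate" of interviewer workload, following the reasoning above.

interviews_needed = 200
interviews_per_hour = 3                      # contact time plus the 10-minute interview
hours_available_per_week = 4 * 5 + 7 * 2     # weekday evenings plus weekend days = 34

total_hours = interviews_needed / interviews_per_hour            # about 66-67 hours
weeks_for_one_interviewer = total_hours / hours_available_per_week

print(f"Total interviewing hours needed: {total_hours:.0f}")
print(f"Weeks for one interviewer:       {weeks_for_one_interviewer:.1f}")
print(f"Weeks for two interviewers:      {weeks_for_one_interviewer / 2:.1f}")
# One interviewer would need roughly two weeks; two interviewers can finish
# in a little over one week.
```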

Based on this reasoning, you decide to use two interviewers, each completing about 100 interviews, so that the work is finished in a little over one week. Generally, aim for consistency and involve as few data collectors as possible.

Once the data collectors are selected, establish a protocol to help maximize the consistency of data collection. An interview protocol is a detailed plan for conducting an interview. It is important because it identifies the background information interviewers need, as well as a set of primary questions and a set of probing (secondary) questions associated with each primary question.


An interview protocol can contain the following kinds of information:
• a description of the program and respondents
• a clearly stated purpose of the evaluation and of the data collection tool
• how to introduce and explain the tool
• how to record answers
• an outline of what the data collector is supposed to do, when, why, where, with whom, and how
• whom to refer the respondent to if the subject matter is upsetting
• how to answer questions respondents ask
• how to reach a supervisor if the respondent requests it.

Before starting the interview process, the interviewers need to be trained. Porteous et al. (1997) suggest including the following when training data collectors:
• cover all the information in the protocol
• meet together
• start by giving a brief overview
• explain roles and responsibilities
• walk data collectors through their tasks
• review the data collection technique
• do a trial run or walk-through of the interview.

During the trial run, be sure to provide feedback and to capture what is learned about the survey instrument. If the data collectors are not comfortable, they can receive additional training. How much pre-testing is needed? The number of tests depends on the type of tool and its complexity; in addition, if there are significant changes after the first pre-test, it may be best to pre-test again.

An interview is a dialogue between a skilled interviewer and the person being interviewed. The goal is to elicit information that can be used to answer evaluation questions. The dynamics of interviewing are similar to those of a conversation, except that the interviewer guides the conversation and is an attentive listener. The quality of the information obtained is largely dependent on the interviewer's skill and personality (Lofland & Lofland, 1995). The key to a good interview is being a good listener and a good questioner.


Computer-assisted telephone interviewing (CATI) is an interactive computer system that aids interviewers as they ask questions over the telephone. With a CATI system, an interviewer begins the interview and the computer program controls branching to, or skipping among, questions. The choice of questions is based on the answers to other questions, allowing more personalized and sophisticated interviews than with paper questionnaires. During the interview, interviewers enter responses, along with simple coding, directly into the computer system. Most questions are in a multiple choice format, so the interviewer simply points and clicks on the correct answer, and the computer system translates it into a code and stores it in a database (UNESCAP, 1999, pp. 25-26).

If the respondents have telephones, CATI can save considerable time and money. With traditional data collection, field interviewers make door-to-door visits, incurring costs for transportation and travel; in addition, when respondents are not available, interviewers need to return at a different time, adding considerable time to the data collection (UNESCAP, 1999, p. 25).

Semi-structured or unstructured in-person interviews are useful when the evaluator wants an in-depth understanding of reactions to experiences or of the reasons for holding particular attitudes. It is often more practical to interview people about the steps in a process, the roles and responsibilities of various members of a community or team, or how a program works than to attempt to develop a written survey that captures all the possible variations. With good rapport and interesting questions, people will often be willing to be interviewed for an hour or more, whereas they would be very unlikely to spend that amount of time filling out a questionnaire.

It is better to have two people conduct the interview. That way the interviewers can compare notes, and it helps later in resolving any disputes about what was said. Semi-structured interviews should have a purpose: know what questions to ask and what information to obtain. Taking good notes is essential. It is hard to write as quickly as people speak, so it is important to capture the key points and the words that will spark your memory.
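To make the CATI skip logic described above concrete, here is a minimal sketch; the questions, answer codes, and branching rules are hypothetical and do not come from any particular CATI product.

```python
# A minimal sketch of the kind of skip logic a CATI system automates.
# The questions, answer codes, and branching rules are hypothetical.

QUESTIONS = {
    "q1": {"text": "Do you have children under 12 living at home?",
           "options": {"1": "yes", "2": "no"},
           "next": {"1": "q2", "2": "q3"}},     # branch on the coded answer
    "q2": {"text": "Does your child use a bicycle helmet?",
           "options": {"1": "always", "2": "sometimes", "3": "never"},
           "next": {"1": "q3", "2": "q3", "3": "q3"}},
    "q3": {"text": "May we contact you again for a follow-up survey?",
           "options": {"1": "yes", "2": "no"},
           "next": {}},                          # end of interview
}


def run_interview(answer_source=input):
    """Walk through the questions, branching on coded answers."""
    responses = {}
    current = "q1"
    while current:
        q = QUESTIONS[current]
        prompt = q["text"] + " " + str(q["options"]) + " > "
        code = answer_source(prompt).strip()
        if code not in q["options"]:
            print("Please enter a listed code.")
            continue
        responses[current] = code          # store the code, as a CATI system would
        current = q["next"].get(code)      # branch to the next question or end
    return responses


# Example: simulate an interview where the respondent has no children at home,
# so the helmet question is skipped automatically.
scripted = iter(["2", "1"])
print(run_interview(lambda prompt: next(scripted)))   # {'q1': '2', 'q3': '1'}
```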


Leave time after each interview to review the notes and make additions, and write up the notes soon after the interview. When traveling to other countries or to distant regions within a country, some think the way to maximize the use of time is to pack in as many interviews as possible. However, it is extremely important to leave time between interviews to do at least a preliminary write-up of your notes; it is surprising how difficult it is to make sense of notes taken during an interview even just a day or two earlier. For some key interviews, consider letting the interviewees read the summary of your notes, once prepared, to ensure you have correctly captured their words.

Sometimes when you are interviewing government officials, they assume their responses are public. There are times when they want to say something but do not want it attributed to them. When you guarantee confidentiality, it is your moral obligation to protect your sources.

Consider tape recording the interview; be sure to check with the interviewee and get permission before recording. Consider sending out the questions ahead of time, so people feel comfortable with the line of questions that will be asked, can assemble needed information, and are aware of the time needed for the interview.

What helps make a good interview? Being a good listener. Avoid dominating the conversation or using visual cues, such as nodding or shaking the head, that encourage certain answers. Accept whatever interviewees say with empathy and without judgment; the interviewer is there to ask questions and record answers. Instead of writing up your notes, consider dictating them onto a tape recorder. Be sure to thank the interviewees, and consider sending thank-you notes. Table 8.6 summarizes the steps for conducting interviews.


Table 8.6: Conducting Interviews

1. Let interviewees know:
   • the purpose and timing of the study
   • why they are being interviewed
   • how they were selected
   • how the data will be used
   • whether the interview is confidential
   • how long the interview will take
   • the content of the interview, by sharing questions in advance
   • whether you might want to talk to them again
   • whether they will get a copy of the final report
   • that a summary of your notes will be made available to them if desired.
2. Try to pick a time and place that is quiet and free of distractions.
3. Ideally, have a second person to help take notes.
4. Consider tape recording the interview. If you do, be sure to check with the interviewee and get permission before recording.
5. Stick to your script:
   • if asking closed-ended questions, ask them exactly the way they were written
   • if asking open-ended questions, "go with the flow" rather than always directing it.
6. Be aware of cultural norms, such as eye contact, direct questions, or gender issues.
7. Balance: if you ask what they think are the major supports, follow with what they think are the major barriers.
8. Try to avoid asking "why" questions if doing so is seen as aggressive or critical.
9. Accept whatever interviewees say with empathy and without judgment.
10. Take good notes without distracting from the conversation:
   • write while maintaining eye contact
   • write key words or phrases, not verbatim text
   • if someone is saying something you want to capture, it is OK to ask them to repeat it or to finish what you are writing before asking the next question
   • if someone says something important, you may want to ask, "Would you mind if I use your exact words?"
11. Write up the interview:
   • every word and idea is valuable
   • take time to write up your notes as carefully and in as much depth as possible
   • it is best to do at least a brief clean-up of notes immediately afterwards (leave an hour between interviews)
   • write up full notes within a day of the interview.
12. Share the write-up with the respondent to check their memory of the responses and to agree on what was said.


Obtaining participation for in-person interviews

• Identify who you are, the purpose of the study, why you want to survey them, how they were selected, and how the information will be used.
• It is general practice to send a letter ahead of time explaining all of this to those you want to interview, before calling to set up an appointment; you may want to include a copy of the interview guide as well.
• Keep it simple and respect their time. Tell them ahead of time about how much time it will take, and stick to that.
• Offer to share a summary of what you understand from the interview. This can be especially useful to give the interviewee, particularly a high-ranking official.
• In-person interviews can take up to an hour, or even longer if the topic is interesting to the interviewee.

Dealing with cultural differences

When conducting an interview, pay attention to the reactions of the person you are interviewing. Each culture has its own values and customs; a question or gesture may offend people from one culture and not another. The interviewer must remain vigilant so as to learn information without offending people. Before each interview, learn more about the culture of the person you will be interviewing. If you are interviewing someone from a different culture, consider discussing interview techniques with someone who is familiar with that culture, or using an interviewer from the same culture as the interviewee. Try to find out about interview protocol, such as:
• the amount of physical space to keep between people who are talking with each other
• the amount of eye contact that is appropriate
• the significance of voice inflections when asking questions, and the significance, if any, of head movements and other body language during a conversation
• what is appropriate professional clothing
• what is an appropriate form of greeting.


One key area to investigate for each culture is the role of gender. For example, in some cultures it may be inappropriate for a male interviewer to be alone in a room with a woman being interviewed, or even to interview her at all. In such situations, a general practice of always having an opposite-gender witness present is a good procedure. Another example is a female interviewing a male: in certain cultures the male might react adversely, "clamming up" or giving only minimal responses. Regardless of cultural differences, there are some constants across cultures from all parts of the world:
• Every person appreciates being treated with respect.
• Even those who come from cultures noted for self-sacrifice and community thinking have a sense of self-value and appreciate being treated as individuals.
• Every person appreciates feeling that his or her opinion matters to you.

Advantages and Challenges of Interviewing Advantages:

Can be structured, unstructured, or a combination. Can explore complex issues in depth. Forgiving of mistakes: unclear questions can be clarified during the interview and changed for subsequent interviews. Can last an hour or more, depending on perceived importance and interest. Can provide evaluators with an intuitive sense of the situation.

Challenges:

Can be expensive, labor intensive, and time consuming. May not be able to explore why people have different viewpoints. Selective hearing on the part of the interviewer may miss information that does not conform to pre-existing beliefs. Requires cultural sensitivity, for example around gender issues.


Techniques for Developing Self-Administered Questionnaires

Writing survey questions is hard to do because they have to be understandable to everyone, and words have multiple meanings and connotations. If a question is asked in a way that people do not understand, or understand in a variety of ways, people will essentially be responding to different questions. There is also a risk of getting useless data when questions are poorly constructed. For example, an agency head may want to find out how much computer training people in the organization have had. You might ask a series of questions:

A. Have you had any training in the past three months?
B. Have you had any training in the past six months?
C. Have you had any training in the past year?

The problem with this set of questions is that everyone who had training within the past three months will answer "yes" to all three questions, and when they check yes to A, B, and C, the data are essentially useless. When writing a survey, it is important to make sure the "gates are closed," so that people cannot slip through to other questions with the same information. How can these questions be saved? One possibility is to ask: How many training courses have you attended in each of the following time periods?

a. 1-3 months ago: _____________
b. 4-6 months ago: _____________
c. 6-9 months ago: _____________
d. 10-12 months ago: _____________
e. 12-24 months ago: _____________

Poorly worded questions may frustrate respondents, causing them to guess at answers or even throw the survey away. Either way, the results will be compromised. Since poorly constructed questions cannot be saved in analysis, prevention is the best strategy: leave plenty of time to have people review the survey and to pre-test it. Some tips and tricks for writing effective survey questions are given in Table 8.7. These are intended as guidelines rather than an exhaustive list of procedures.
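Before turning to the table, the "closed gates" fix above can be illustrated with a small tally. This is a sketch only: the respondent data are hypothetical, and the category boundaries are adjusted slightly (7-9, 13-24) so that they do not overlap, in keeping with the mutual-exclusivity guideline in Table 8.7.

```python
# A small illustration of the "closed gates" fix: with non-overlapping time
# periods, each training course is counted in exactly one category.
from collections import Counter

CATEGORIES = ["1-3 months ago", "4-6 months ago", "7-9 months ago",
              "10-12 months ago", "13-24 months ago"]
BOUNDS = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 24)]   # non-overlapping ranges


def categorize(months_ago: int) -> str:
    """Place one training course into exactly one time-period category."""
    for label, (low, high) in zip(CATEGORIES, BOUNDS):
        if low <= months_ago <= high:
            return label
    return "more than 24 months ago"


# One respondent's training courses, in months since each course (hypothetical).
courses_months_ago = [2, 2, 5, 11, 20]

tally = Counter(categorize(m) for m in courses_months_ago)
for label in CATEGORIES:
    print(f"{label}: {tally.get(label, 0)}")
# Unlike the overlapping yes/no questions A-C, these counts do not count the
# same recent course in several categories.
```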


Table 8.7: Questionnaire Tips and Tricks

A. If possible, use an existing questionnaire as a guide, modifying it as needed to fit your situation. It is easier to modify a questionnaire than to create one from scratch.
B. Basic guidelines on writing questions:
   • Use simple, clear language that is appropriate for the respondents.
   • Ask only one question at a time: for example, "To what extent, if at all, is the material clear?" rather than "To what extent, if at all, is the material clear and helpful?" If the material is clear but not helpful, there is no way the person can give an accurate answer.
   • Write your questions so that everyone feels their responses are acceptable. Lead into your question by giving the range: "To what extent, if at all, ..." or "How important or unimportant are ...?"
   • Provide appropriate response categories that are mutually exclusive. If asking about age groups, make the categories 20-30, 31-40, 41-50 rather than 20-30, 30-40, 40-50.
   • When possible, write questions so that responses range from negative to positive: "To what extent, if at all, was ..." with a scale running from Very Unhelpful to Very Helpful.
   • Avoid "yes" or "no" responses. Instead, try to capture a range of views by asking people to answer along a scale; for example, provide a 5-point scale ranging from "little or no extent" to "very great extent."
   • Avoid absolutes at either end of the scale (few people are absolute about anything). For example, soften the ends by using "always or almost always" at one end of the scale and "never or almost never" at the other.
   • Ask questions about the current situation; memory decays over time.
   • Leave exits (use "no basis to judge" and "no opinion" categories). If you do not provide an exit, respondents may give meaningless responses and you will not know it.
   • Avoid using double negatives.
C. Make the survey easy for people to complete. Provide boxes they can check, and provide sufficient instructions so respondents know what to do: indicate "check only one" or "check all that apply" when appropriate.
D. Ask general questions first, then demographic questions, then more specific questions, and then one or two final open-ended questions: "Any comments or anything else we should know?"
E. Demographic questions: ask only what you will use, and be sensitive that some people can be identified by their demographics.
F. Have your draft questions reviewed by experts.
G. If the questions need to be translated, have that done, and then have them translated back into the original language to check the translation.
H. Pre-test, pre-test, pre-test! Do as many rounds of pre-testing as it takes until you feel you have caught the major errors. Have typical respondents answer the questionnaire rather than just read it; then go back through each question to get their feedback: Is each question clear? Did they understand what was being asked? Are there unknown words or unclear phrases? Is there a better way to ask each question?


Remember, a self-administered survey should take no more than 20 minutes to complete. Keep the questionnaire simple, using as few questions as possible. It is sometimes helpful to go through and decide which questions are essential, which are nice to know if there is room, and which are not needed.

It is also important to develop a plan to analyze the data. This helps eliminate unnecessary questions and is a check that everything needed has been asked.

Pre-test the questionnaire. Select a few typical respondents and be present when they complete the survey. Keep track of how long it takes to complete, and observe whether they seem to have difficulty or turn back to previous pages. After they have completed the pre-test survey, debrief them to gather further insight. Ask:
• What was clear, and what was not?
• What questions are missing?
• What questions are unnecessary?

Make changes based on the results of the pre-test, and pre-test again.

Response Rates

One of the major issues in survey research is the response rate: the percentage of people who actually participate out of the total number asked. A good evaluator always reports the number of people (or units, such as organizations) surveyed, the number who responded, the response rate, and the efforts made to increase the response rate (e.g., follow-up telephone calls or letters). What counts as a desirable response rate varies, depending on the circumstances and uses of the survey data.

The problem with a low response rate is that the respondents become a volunteer, or self-selected, sample, and people who choose to participate might be different from those who choose not to. Perhaps only people who are really angry at management will choose to answer a survey; this will result in a more negative assessment of management than would have been the case if everyone, or at least a more representative group, had participated.

For example, suppose an organization conducts an attitude survey of all employees but only 30 percent of them complete it (a 30 percent response rate). If the most dissatisfied tended to answer the survey while those who were satisfied did not, it will be hard to use the data to understand the views of all employees. It would be a mistake to make decisions based on these results without getting more information on who responded and who did not.
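Because the response rate is simply the number of completed surveys divided by the number surveyed, it is straightforward to compute and report alongside the raw counts. A minimal sketch, using invented figures for illustration:

# Minimal sketch: compute and report a survey response rate.
surveys_sent = 400        # illustrative figures, not from the text
surveys_completed = 120

response_rate = surveys_completed / surveys_sent * 100
print(f"Surveyed: {surveys_sent}, responded: {surveys_completed}, "
      f"response rate: {response_rate:.0f}%")
# Surveyed: 400, responded: 120, response rate: 30%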


With a low response rate, it is sometimes possible to check demographics to determine whether the respondents are generally similar to the larger population. For example, an evaluator can look at the demographics of the survey respondents to see whether they roughly match the larger population on characteristics such as age and gender. If they do, the evaluator may cautiously proceed; if they differ, the survey results should be interpreted with caution. Evaluators should report such analyses of nonrespondents and how nonresponse may affect the results.

Survey results with low response rates must always be reported in terms of the number of respondents, recognizing that the results may or may not accurately reflect the larger group. That is, the data should be expressed as "of the 87 respondents" or "75 percent (N=60) of the survey respondents reported ..." (Do not report percentages based on the total number of people sent the survey if fewer than 50 responded.)
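One simple way to run such a check is to compare the demographic breakdown of the respondents with that of the full survey population and flag large gaps. The following is a minimal sketch; the groups, proportions, and the 10-percentage-point threshold are illustrative assumptions, not rules from the text:

# Minimal sketch: flag demographic groups where respondents differ
# noticeably from the surveyed population (all figures are illustrative).
population = {"female": 0.52, "male": 0.48, "under_30": 0.35, "30_plus": 0.65}
respondents = {"female": 0.61, "male": 0.39, "under_30": 0.22, "30_plus": 0.78}

THRESHOLD = 0.10  # flag gaps of more than 10 percentage points

for group, pop_share in population.items():
    gap = respondents[group] - pop_share
    flag = "CHECK" if abs(gap) > THRESHOLD else "ok"
    print(f"{group:10s} population {pop_share:.0%}  respondents "
          f"{respondents[group]:.0%}  gap {gap:+.0%}  {flag}")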

Surveys with response rates under 70 percent should not be reported unless the evaluator has analyzed response bias and demonstrated the survey’s validity.

Participation in surveys is usually voluntary, so the strategy must be to increase people's willingness to participate. For all survey methods, it is essential to let respondents know who is sponsoring the evaluation, the purpose of the survey, how they were selected, how the information will be used, and whether they will receive a copy of the report. An accurate estimate of the time needed to complete the survey should be given, and a contact person and phone number should be provided in case respondents wish to verify the legitimacy of the study.

Both confidentiality and anonymity should be assured and absolutely protected. Confidentiality is an ethical principle that protects information by allowing it to be accessed only by those who are authorized. Anonymity means that the personal identity of, or personal information about, a person will not be shared. Many evaluators use a statement on surveys such as, "Your responses will be treated with confidence and at all times data will be presented in such a way that your identity cannot be connected with specific published data" (Burgess, 2001). If you promise confidentiality and anonymity, it is important to follow through on that promise.


Figure 8.4 presents a checklist for obtaining good response rates for mail surveys.

Checklist for Obtaining Good Response Rates for Mail Surveys
• The survey must look professional: printed, perhaps in booklet form or on colored paper, and error free.
• Make sure there is a person respondents can call if they have questions or want to verify the legitimacy of the survey.
• Induce people to participate through the cover letter, which identifies who you are, the purpose of the survey, and why their participation is important.
• Make it personal: personally address the letter and envelope.
• If you assure anonymity, mean it. Never ask people to self-identify on a survey; do not ask for names or identifying numbers.
• Provide a self-addressed, stamped envelope.
• Be prepared to do one, two, or more follow-ups to strengthen your response rate.

Fig. 8.4: Obtaining Good Response Rates for Mail Surveys.

When considering whether to collect data by mail, by phone, by e-mail, or through an in-person interview, it is important to bear in mind the trade-offs involved. Table 8.8 presents guidelines on the pros and cons of each method and when each should be used. Internet surveys are now increasingly common, but in the development context issues of literacy, telephone access, and Internet access may make these approaches less feasible.

Cover Letters

Whether you are sending a questionnaire by mail or e-mail, or administering it in person, include a cover letter. The cover letter introduces the sponsor and gives participants more information about the purpose of the questionnaire. The cover letter should:
• be personally addressed
• identify who is sponsoring the survey
• state the purpose of the survey and give an overview of the questions
• state how the information will be used
• assure confidentiality and/or anonymity
• provide the name and phone number (or e-mail address) of a contact person
• provide instructions for returning the survey
• indicate whether the respondent will receive a copy of the report.


Table 8.8: Comparison of Mail or Internet Surveys, Structured Interviews, and Semi-structured Interviews as Data Collection Options

Mail or Internet survey
• When to use: to reach many people; respondents are dispersed; moderately complex questions; sensitive questions.
• Characteristics: structured; easy to fill out; takes less than 20 minutes.
• Strengths: inexpensive; easy to analyze; reaches a large sample; comparable data; consistency; greater anonymity.
• Limitations: time consuming to develop and conduct; you must know what you want; unforgiving of mistakes; response rate is a challenge; assumes literacy; impersonal; assumes access to the Internet or postal system.

Structured interviews
• When to use: complex questions.
• Characteristics: structured; all interviewers ask the same questions.
• Strengths: allows for clarification of questions.
• Limitations: time consuming to conduct; expensive; interviewers could bias responses.

Semi-structured interviews
• When to use: complex questions; complex processes; exploratory work.
• Characteristics: a handful of broad, general questions.
• Strengths: allows for probes and clarification; does not filter, preclude, or limit responses; more forgiving of mistakes.
• Limitations: time consuming to conduct; potential interviewer bias; requires interviewer skill; requires tape recording; data are not comparable; analysis is labor intensive.



Tool 5: Focus Groups

A focus group is a qualitative evaluation methodology in which small groups of people are brought together to discuss specific topics under the guidance of a moderator. Although a focus group may feel like an informal conversation, its structure is anything but informal: there is a script, a set of open-ended questions prepared ahead of time. The moderator can improvise, though, with probes or additional questions as warranted by the situation.

The group process tends to elicit more information than individual interviews because people express different views and engage in dialogue with one another. The moderator facilitates the dialogue and explores the reasons and feelings behind the differences. The conversation is often not linear; participants may bring up information or different perspectives at any time. A focus group is not simply a group interview!

Purpose of Focus Groups

Billson and London (2004) describe the purpose of focus groups as eliciting reliable data, not just interesting information. Focus groups can:
• help develop a survey questionnaire
• clarify sample selection
• contextualize survey data
• be used in tandem with surveys
• be used as a separate data collection tool.

The purposes of focus groups can also be grouped into four categories:
• exploring
− explore issues and language
− test methodological approaches
− understand the setting
− formulate hypotheses
• triangulating
− use multiple methods to enhance the validity of data
− enrich and broaden data
• pre-testing
− test questionnaire items
− assess initial reactions to programs, products, or ideas
− explore impacts on relevant groups
• uncovering meaning
− identify meaning on multiple levels
− explore unexpressed meanings, beliefs, values, and motivations
− elaborate upon complex accounts.

Billson and London also identify the following areas of inquiry where one could consider using focus groups to collect data:
• group interaction
• complexity of resources
• "how" and "why" rather than "whether" and "how much"
• contextual responses rather than "yes" or "no" responses
• triangulation of multiple methods
• immediate feedback
• complexity of behaviors and motivations
• the range and intensity of opinions
• views on sensitive topics
• situations in which respondents are not comfortable with "paper and pencil" methods.

Billson also lists situations in which focus groups should not be used:
• if you want to gather statistical data
• if language barriers are insurmountable
• if you have little control over the situation
• if you cannot establish trust
• if you cannot ensure free expression
• if confidentiality is critical.

Focus groups can contribute critical data for organizational analysis but should not be asked to directly:
• solve problems
• resolve conflicts
• make decisions
• build consensus.


For example, an evaluation might use a focus group to gather data for a needs assessment, drawing out the expectations of staff, participants, and/or beneficiaries to help identify and define needs. Focus groups can also support program evaluations during workshops and conferences, rapid assessments, and participatory evaluations.

Typical Elements of Focus Groups

Focus groups generally involve small groups of 6 to 12 people. The composition of a focus group depends upon its purpose. Most focus groups are homogeneous, but some are diverse. It is important to consider status: most focus groups are made up of people of similar status (e.g., teachers in one group and students in another, or supervisors in one group and employees in another). Who can and cannot be in the same focus group will depend upon the situation and the culture(s) involved.

The following are common elements of focus groups:
• comfortable, safe surroundings
• refreshments (essential)
• monetary incentives may be used
• transportation and/or childcare arrangements are often needed
• a skilled moderator (or facilitator)
• a note taker (takes notes, manages the audio taping, and handles whatever comes up)
• sessions are tape-recorded and, ideally, a verbatim transcript is prepared for each focus group
• the session begins with a very clear explanation of the purpose, why participants' views are important, how they were selected, what a focus group is, and the rules of the process
• a key rule is stated and understood: "what is said in this room stays in this room"
• the moderator guides the process, keeps the group focused, and makes sure everyone has the opportunity to voice their views and that a few people do not dominate the conversation
• few questions are asked by the moderator, who follows a guide developed specifically for the session
• all questions are open-ended, moving from an easy, conversational question to more serious questions, and ending with summary and wrap-up questions that allow impressions to be corrected if necessary and any additional comments and ideas to be recorded.


Advantages and Challenges of Focus Groups

Advantages: relatively quick and easy; may take less staff time than in-depth, in-person interviews; provides flexibility to make changes in process and questions; allows different perspectives to be explored. It is fun, and it is different from a group interview.

Challenges: analysis is time consuming; participants might be different from the rest of the population; there is a risk of bias in interpreting the data; and the group may be influenced by the moderator or by dominant members.

Techniques for Focus Group Evaluation Design

Billson notes that, when setting up a focus group, the evaluators are responsible for deciding all of its logistics, including:
• who should be in the focus group
• how many focus group sessions to conduct
• where to conduct the sessions
• when to conduct the sessions
• what to ask the participants
• how to analyze and present the data.

A typical focus group project requires six to eight weeks of lead time. The length of time depends upon logistical factors and how complex or urgent the project is. It also depends on the accessibility of the decision makers and the difficulty of recruiting the desired sample of participants.


The process of focus group evaluation works best if it flows from one key step to another. The following is a short description of each of the key steps (adapted from Billson, 2004, pp. 14-16).

• Step 1: Clarify the key evaluation questions
− Conceptualization comes first, which means clarifying the key evaluation questions. If clients and evaluators are not clear about the key questions that the focus groups are supposed to answer, the entire process will be frustrating.
• Step 2: Design the evaluation approach
− Based on the evaluation purpose and key questions, design the general approach and flow of topics to get at the key questions.
• Step 3: Develop your protocol (moderator's guide)
− The moderator's guide sets out the protocol (structure or code of behavior) for the evaluation; it should not bias responses but should direct the group toward the key issues.
• Step 4: Recruit your participants
− You need to recruit the appropriate respondents.
• Step 5: Specify your moderation techniques
− You need to identify and use good moderation techniques during the focus group sessions.
• Step 6: Debrief observers/evaluators/clients and record additional information
− Immediately after each focus group, while the data are fresh, share insights generated during the focus group with clients and other interested parties. Record additional information not openly discussed (impressions, conclusions, etc.) for use in the next step.
• Step 7: Analyze your data
− If the focus group has worked well, it will produce a mountain of words and ideas. These are qualitative data requiring special analytical techniques, particularly content analysis.
• Step 8: Present your findings
− Report findings in a way that is meaningful and useful to others, particularly your client. Use oral, written, or video formats, or a combination.


Figure 8.5 illustrates this process for focus group evaluation design: Conceptualization → Design → Recruitment → Protocol → Moderation → Debriefing → Data Analysis → Reporting.

(Source: adapted from Billson, p. 15)
Fig. 8.5: Model for Focus Group Evaluation Design.

Techniques for Planning Focus Groups

Focus group sessions normally range from one to two hours. For most projects, 100 minutes allows time to follow through on each major line of questioning without exhausting or boring the participants or the moderator (Billson, 2004, pp. 21-25).

Evaluators sometimes use a room with a two-way mirror so the participants can be observed, but this is typically not available because of budget constraints. Focus groups should be held in a neutral setting if possible; it is sometimes better, however, to choose an easily accessible location over a neutral one. Typically, participants sit around a table or in chairs arranged in a circle, to facilitate conversation among participants.

It is possible for a few observers to be present in the room while the focus group is conducted. They should be introduced at the beginning of the session, and it should be explained why they are there.

Facilities for Focus Groups

The ideal focus group setting is a commercial facility specifically designed and staffed to place participants, clients, and moderators at ease. These facilities contain high-quality audio and video recording equipment installed and monitored by a technician to ensure the quality of the recordings; these are the optimal conditions for a session.

Such commercial facilities are often not available in developing countries, and the evaluator will need to find a facility and adapt it to meet the needs of the group and to capture information from the group. If a specially equipped focus group room is not available, consider using a hotel meeting room, a school or church meeting area, or some other informal setting. The room should have a large area where the group can gather for discussion; some sessions use a conference table and chairs, others comfortable chairs arranged in a circle. The room should have access to beverage and food preparation, and it is helpful to have a separate area from which observers and recorders can watch the session and record information. Probably most important, try to find a facility that is convenient and accessible for the majority of the participants.

Materials Needed

Audio or video recorders are extremely useful for providing an accurate transcript of a session. If professional services are unavailable, it may be possible to arrange for an amateur audio or video recording, or the evaluator may have only the notes taken by a colleague sitting in the room with the focus group participants. Although in-room observers and notetakers can create problems with confidentiality and the perceived "safety" of the group, most participants soon forget that they are being taped or that an observer is taking notes. Taking notes on a laptop computer can speed up the initial data analysis and report writing.

If audio or video taping is not used, it is strongly recommended to have two notetakers document the sessions. The moderator should not have to take notes; this takes the pressure off the moderator, who can then concentrate on one thing: ensuring that participants address the evaluation questions.


To assist with communication during the meeting, consider using name tents (place cards printed in large type and folded to sit upright on a surface; MS Word has templates for creating these). Alternatively, use name badges, although some people feel uncomfortable wearing them. Either option allows the person collecting data to refer to participants by name.

Number of Focus Group Sessions

There is no fixed rule about how many focus group sessions to conduct. The general rule is to continue until the same themes emerge or no new information emerges, which usually happens after three to six sessions. It may be useful to ask a core set of questions of every focus group and then add different questions or more extensive probes once it becomes clear that the themes are consistent. It helps if the evaluation team debriefs after each session so that decisions can be made about adjusting the protocols in subsequent sessions; this is a very adaptable and fluid approach to data collection.

Do not over-schedule: two or three focus group sessions in one day are the maximum one facilitator can achieve.

Recruitment

The key factor in recruitment is that the approach to selecting participants must not bias the evaluation results in any predictable way. No category of potential participants should be overlooked if their responses could be expected to alter or expand the results. When selecting participants for focus groups, consider the following:
• participants should reflect diverse constituencies and diverse views
• homogeneous groups may be needed, because:
− mixing gender or race may be an issue
− mixing social class may be an issue
− mixing managers with staff may be an issue
− mixing clients with staff may be an issue
• cultural norms are important.


Dawson and Manderson (1993, Part II, Section 7) suggest, as a courtesy in some communities, contacting the local leader, or perhaps a high-ranking person in the organization, to obtain permission to enter the community. This person can also help locate participants and can be of great use in arranging a site for the session. This contact is a courtesy to explain your purpose, but try not to give details of the session, as this could influence responses.

When recruiting, consider the daily activities of the participants and be sensitive to the amount of time they can give up for a two-hour session. For example, a focus group scheduled in the late afternoon might interfere with the preparation of an evening meal and may limit the number of people who are willing to participate. If participants need to travel some distance to attend the meeting, linger for a while talking to friends, and then return home, they may easily lose half a day or more.

Focus Group Protocol

The protocol for a focus group is called the moderator's guide (Billson, 2004, pp. 35-54). This guide must give the session a structure that directs the group toward exploring the key issues and provides for the collection of relevant and unbiased data.

A focus group session does not consist of the moderator going around the room and asking each person to respond to each question. That style of questioning is more like a survey and will not produce the same quality of data as a focus group. The questions in a focus group should inspire each person to consider how they feel, what they believe, and what they think; the moderator then explores the complexities behind each response.

Focus Group Questions

Before beginning to design questions, clarify how the data will be used. The following are additional principles to follow when designing questions:
• Avoid vague, confusing wording.
• Ask one question at a time.
• Avoid assumptions that are leading or misleading.
• Avoid questions that introduce bias into the thinking of the respondents, skewing the responses.
• Avoid supplying alternative responses.
• Make it interesting.


Mechanical questions elicit mechanical responses. In a focus group, a more effective approach is to ask open-ended questions, using what is called the "protocol funnel." A protocol funnel begins with questions asking for a broad conceptualization and moves through stages to finish with probing questions. Figure 8.6 illustrates the protocol funnel.

Broad conceptualization → key evaluation questions → general questions → specific questions → probes

Fig. 8.6: The Protocol Funnel.

The focus group protocol follows the principles of qualitative interviewing:
• clarify the concepts to be explored
• use open-ended questions
• keep questions to a minimum number of topic areas
• end with closure-type questions
• pre-test and refine the interview.

The actual focus group session can be broken into four phases:
• Phase I: Preamble or Opening Statement
− puts participants at ease
− the moderator explains the purpose
− the moderator provides ground rules
− all persons in the room introduce themselves.
• Phase II: Introductions and Warm-up
− participants relate their experience and roles to the topic
− the moderator stimulates group interaction and thinking about the topic.
• Phase III: Main Body of Group Discussion
− deep responses are elicited
− emergent data are connected to create a complex, integrated basis for analysis
− broad participation is ensured.
• Phase IV: Closure
− key themes are summarized and refined
− theories, impressions, and hunches are presented to group members for reaction
− participants are invited to provide a round of final comments and/or insights ("key lessons learned").

When planning the sequence of questions, begin with ice-breaking questions. After those, move from the least threatening to the most threatening or sensitive questions, from the simplest to the most complex, and from the least controversial or political to the most controversial or political. End with closure-type questions.

It is important not to rush through the introduction. The moderator should welcome all participants, introduce the members of the evaluation team present and their roles, discuss the purpose of the activity, and provide an overview of the focus group process. The moderator should also let participants introduce themselves. All ground rules should be covered, and participants should have an opportunity to ask questions. The intent is to make sure people feel comfortable.

Sometimes participants will answer a scripted question before it is asked. The moderator needs to acknowledge that the question was answered previously and decide whether to ask it again, ask a follow-up question, or skip it altogether.


At the end of the questions, summarize your understanding of the substantive issues the session has covered and ask for confirmation or correction. Consider also asking:
• for "last thoughts" or anything else respondents would like you to take back to the evaluation team
• whether you missed anything important that anyone would like to add.

Such closure questions may help additional data emerge. At the close of the session, thank the participants and distribute any incentive they were promised.

Sample Questions

The following examples may be useful in understanding the kinds of questions to ask during a focus group (Billson, 2004, slides 49-54).

Introductions: Let us start by asking you to introduce yourselves. Please mention your title and where you work, as well as the nature of your direct experience in providing services for this program.

On Perceptions: Critics of affirmative action allege that the appearance of "favoritism" toward minorities and women may actually undermine their confidence and self-respect. How would you describe the situation in your organization?

On Governance: Recent surveys suggest that the current budgets decentralized to districts do not reach local schools or clinics. What is your assessment of these problems? What needs to be done?

On Reinventing Government: Some governments have tried to improve efficiency through decentralization. How well has this strategy worked?

On Product Satisfaction: You have been using the Bank's Country Assessments for at least three years. What is your assessment of this report's usefulness?

On Key Lessons: What are the key lessons you would like us to take away from this discussion?


Techniques for Moderating Focus Groups

The facilitator or moderator of the focus group has a very important role: directing the meeting and managing the time (Billson, 2004, slides 73-88). Facilitators will:
• be familiar with the script, rather than reading it, so the session appears conversational
• make sure everyone is heard, rather than allowing one or two people to dominate the discussion, by:
− asking "What do other people think?"
− stating "We have heard from a few people; do others have the same views or different views?"
• manage time, closing off discussion and moving to the next topic when appropriate
• set ground rules, such as:
− there is no such thing as a wrong comment
− no criticism of others is permitted
• say as little as possible, letting conversation flow across the table with minimal direction
• keep personal views outside the room
• use active listening
• accept all views while managing differences of opinion:
− "So, we have different perspectives."
• probe for elaboration:
− "Tell me more."

After the focus group session, the moderator writes impressions immediately. The write-up should include all of the major issues and major points of the discussion. It can also capture anything unusual that happened during the focus group. Ideally, a focus group will be audio taped or video taped. If so, the focus group tape can be transcribed verbatim. If this is not possible, the facilitator should listen to the tape afterwards while writing in-depth notes. Most people are surprised about how much they did not hear during the actual focus group. Once the write-up is complete, the facilitator should compare the write-up with the notes of the “scribe” who took notes during the entire session. Chapter 10: Planning Data Analysis and Completing the Design Matrix discusses analysing focus group data.


Tool 6: Diaries, Journals, and Self-reported Checklists

Another method of collecting data is to use a diary (also called a journal) or a self-reported checklist.

Diaries or Journals

Diaries can be used to capture detailed information about events in people's daily lives: for example, how people use their time during the day, or what a typical day looks like for people in a community. Consider asking teachers to keep a diary of their workload, activities, and teaching topics. Ideally, diary entries are made daily so that people can remember events more accurately. A diary is a useful tool to supplement other data collection, but it requires that people be literate and willing to take the time to maintain it.

Typically, participants are given a booklet that provides a clear set of instructions as well as an example of a completed set of diary entries, with any unusual terms explained. The last page should ask whether this was a typical period or whether anything unusual occurred, and should give participants an opportunity to share any other comments.

For example, a diary might be used as part of a measurement strategy in an evaluation comparing students' experiences and reactions in a traditional, lecture-style classroom with those in a non-traditional classroom emphasizing hands-on, active learning. By keeping diaries, students capture their experiences in real time rather than being surveyed at the end of the school year. The data from the diaries could be used to supplement other data collection.

Examples of uses of diaries or journals:
• travel
• social networks
• health, illness, and associated behavior
• diet and nutrition
• farm work
• study habits
• contraception use
• child-rearing practices.


Table 8.9 summarizes the guidelines for using diaries or journals.

Table 8.9: Guidelines for Using Diaries or Journals

1. Recruit people face-to-face.
• Encourage participation using highly motivated, personal interviewers.
• Appeal to altruism and helpfulness.
• Assure confidentiality.
• Provide an incentive for completing the diary.

2. Provide a booklet to each participant.
• The cover page should have clear instructions.
• Include an example of a completed diary entry.
• Include short memory-joggers.
• Explain any terms, such as "event" or "session".
• On the last page, ask whether this was a typical or atypical time period, and ask for any comments or clarifications.
• Include a calendar indicating when each entry is due.

3. Consider the time period for collecting data.
• If the period is too long, it may become burdensome for the participants.
• If the period is too short, you may miss the behavior or event.

Self-reported Checklists

A related strategy is the self-reported checklist, a cross between a questionnaire and a diary. Participants are asked to keep track of a specific set of activities or events, which are listed so that respondents can easily check them off. Checklists can be completed on a daily or weekly basis, or every time a particular event or activity occurs. For example, a checklist might capture the times at which malaria pills were taken, or track the number of trips to a water connection and the time of each trip. A self-reported checklist is easier to complete and analyze than a diary, but it requires that the evaluator understand the situation in enough detail to create the checklist.


Advantages and Challenges of Diaries and Self-reported Checklists

Advantages: can capture in-depth, detailed, reliable ("rich") data that might otherwise be quickly forgotten; good for collecting data on how people use their time; helps in collecting sensitive information; supplements interviews and provides richer data.

Challenges: requires literacy; may change behavior, because people know their behavior is being observed; requires commitment, self-discipline, and accurate and honest recording by the participant; the recorded data may be incomplete, or the participant may have waited to record information and not remembered it correctly ("tomorrow diaries"); reading people's handwriting and understanding phrases in diaries may be difficult, although it may be possible for the evaluator to obtain clarification.


Tool 7: Expert Judgment

Sometimes it makes sense to engage experts as the source of information or opinion. Consider the role of a book critic or a movie critic: they are considered experts in their field, and people use their judgments to help inform their own decisions or choices. Government task forces are another form of expert judgment evaluation.

Expert judgment can be used to gather information. Experts can be interviewed separately or brought together as a panel, and interviews can be structured or unstructured. A group process has the advantage of dialogue and discussion that may explore differences in perspective. The group process can take the form of a group interview in which everyone answers a set of specific questions; it can be more free-flowing, taking the form of a focus group; or it can take the form of a panel in which the experts make formal presentations about specific issues and then discuss the issues among themselves.

Some of the ways expert judgment has been used in evaluation are:
• formal professional review systems
• informal professional review systems
• ad hoc panel reviews
• ad hoc individual reviews (Fitzpatrick, Sanders, and Worthen, 2004, pp. 114-125).

One form of formal professional review is accreditation, which has relied on expert judgment for many years. In this process, an organization grants (or denies) approval of institutions. A group of experts visits a school, university, or hospital and investigates its programs, facilities, and staff. The experts usually have formal, published standards and instruments to ensure that they ask questions and evaluate in a consistent manner (Fitzpatrick et al., 2004, p. 114).

Informal professional review systems also use expert judgment but lack a set of formal standards or instruments; in these situations, the group of experts determines the standards for judging (Fitzpatrick et al., 2004, pp. 118-119).

Ad hoc panel reviews are done on irregular schedules, when circumstances demand a review. They are usually done for a specific purpose and have no predetermined standards. A funding agency review panel is an example of an ad hoc panel review.


Ad hoc individual reviews are conducted by an individual contracted for his or her expertise in an area to review a program or activity. They are often used to evaluate textbooks, training programs, media products, job-placement tests, and program plans (Fitzpatrick et al., 2004, p. 120).

A common example of expert judgment is the use of school inspectors, who visit schools to evaluate the school, its teachers, and its administration. They write a report and submit it to the government, and the reports can be used in a variety of ways.

Whichever technique is used with experts, it is important to have an informed dialogue. The expert judgment process should include:
• clearly stated expectations
• clearly stated evaluation issues, including terms of reference
• meetings between agency officials and the experts
• data provided by agency staff
• a series of meetings among the experts until consensus is achieved.

Selecting Experts

The selection of experts should pass the reasonable person test: would a reasonable person think this group is credible? The group of experts should reflect a diverse set of views, experiences, and roles. It is important to establish selection criteria based on one or more of the following:
• recognized expertise
• areas of expertise
• diverse perspectives
• diverse political views
• diverse technical expertise.

Once the experts are selected, state the rationale for the selection of each expert.


Examples of experts include:
• managers and administrators
• front-line staff
• current and former clients
• managers from other programs
• policy experts
• donors
• evaluators in the field.

While an expert panel is not considered a strong evaluation approach, it may be the best approach given time and resource constraints. Expert panels are better used at the design and early-to-mid implementation stages than for providing an impact evaluation, and they are especially useful in rapid assessments.

Advantages and Challenges of Expert Judgment for Data Collection

Advantages: fast and relatively inexpensive.

Challenges: weak for impact evaluation; may be based mostly on perceptions; the worth of the data collected will be proportional to how credible the group's expertise is perceived to be.

Tool 8: Delphi Technique The Delphi technique enables experts who live in different locations to engage in a dialogue and reach consensus through an iterative process. Experts are asked specific questions; their answers are returned to a central source for the evaluator to summarize and feed back to the experts. No one knows who said what, so that conflict is avoided. The experts can then comment on the summary and are free to challenge particular points of view or to add new perspectives by providing additional information. According to Randall Dunham (1996), the purpose of the Delphi technique is to elicit information and judgments from participants to facilitate problem-solving, planning, and decision-making.


The Delphi technique does this without requiring the participants to meet physically; they share information by mail, fax, or e-mail. It is a method for helping groups reach consensus on issues, strategies, and priorities. The Delphi technique requires a coordinator, who:
• organizes requests for information
• organizes the information received
• is responsible for communication with the participants.

The coordinator's job can take substantial time; recent experience suggests that coordinating a Delphi exercise by e-mail with 20 participants and processing three questionnaires can take 30-40 hours of the coordinator's time.

An example of the Delphi technique is a health study in Kenya, "Economic evaluation in schistosomiasis: Using the Delphi technique to assess effectiveness," which used the technique to establish priorities based on the costs and benefits of all the interventions that might help eradicate the disease schistosomiasis. Kenya could not afford to implement all of the interventions, so expert judgments were gathered on the effectiveness of the then-known schistosomiasis interventions (Kirigia, 1997).

The Delphi technique includes the following steps:

1. Identify the issue and solicit ideas. For example: "What action could be taken to provide faster response to patient inquiries between visits?" Prepare and send the first questionnaire, which asks each participant to engage in individual brainstorming to generate as many ideas as possible for dealing with the issue.

2. Respond to the first questionnaire. Each participant lists the ideas generated by Questionnaire #1 in a brief, concise manner. These ideas need not be fully developed; in fact, it is preferable to have each idea expressed in one brief sentence or phrase, with no attempt to evaluate or justify it. The participant then returns the list anonymously to the coordinator.

3. Create and send Questionnaire #2. The coordinator prepares and sends a second questionnaire that contains all of the ideas sent in response to the first questionnaire and provides space for participants to refine each idea, comment on each idea's strengths and weaknesses for addressing the issue, and identify new ideas.


4. Respond to the second questionnaire. Participants anonymously record their responses to Questionnaire #2 and return them to the coordinator.

5. Create and send Questionnaire #3. The coordinator creates and sends a third questionnaire that summarizes the input from the previous step and asks for additional clarifications, strengths, weaknesses, and new ideas.

6. Continue the process. If desired, the coordinator repeats the preceding process until it becomes clear that no new ideas are emerging and that all strengths, weaknesses, and opinions have been identified.

7. Reach resolution. Resolution may occur in one of two ways.
• If dominant, highly evaluated ideas emerge by consensus, the exercise is declared finished. The end product is a list of ideas with their concomitant strengths and weaknesses.
• If there is no consensus, the coordinator conducts a formal assessment of the group's opinions of the merits of the ideas. There are several ways to do this. In one method, the coordinator prepares a questionnaire that lists all the ideas and asks participants to rate each one on a scale, for example from 0 (no potential for dealing with the issue) to 7 (very high potential for dealing with the issue). Participants send the rating forms to the coordinator, who compiles the results and rank-orders the ideas based on the ratings.
• A second approach to evaluating the ideas is the "voting" procedure used in the nominal group technique. With this approach, the coordinator asks each member to identify the top five ideas and assign 5 points to the most promising idea, 4 points to the next most promising, and 3, 2, and 1 points to the third-, fourth-, and fifth-best ideas. These votes are returned to the coordinator, who tallies the results and prepares a report. The report notes the rank order of the ideas based on the total number of points received and indicates the number of people who voted for each idea (Dunham, 1996).

This process may be done iteratively until there is consensus.
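As a sketch of the "voting" arithmetic described in the resolution step, the tally below sums each idea's points across participants and rank-orders the ideas; the idea labels and ballots are invented for illustration:

from collections import Counter

# Minimal sketch: tally nominal-group-style Delphi votes.
# Each ballot maps an idea to the points assigned (5 = most promising ... 1).
# Idea labels and ballots are illustrative only.
ballots = [
    {"idea A": 5, "idea C": 4, "idea B": 3, "idea E": 2, "idea D": 1},
    {"idea C": 5, "idea A": 4, "idea D": 3, "idea B": 2, "idea E": 1},
    {"idea A": 5, "idea B": 4, "idea C": 3, "idea D": 2, "idea E": 1},
]

totals = Counter()
votes = Counter()
for ballot in ballots:
    for idea, points in ballot.items():
        totals[idea] += points
        votes[idea] += 1

# Report rank order by total points, with the number of people who voted for each idea.
for rank, (idea, points) in enumerate(totals.most_common(), start=1):
    print(f"{rank}. {idea}: {points} points ({votes[idea]} voters)")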


Advantages and Challenges of the Delphi Technique for Data Collection (Michigan State University Extension, 1994)

Advantages: allows participants to remain anonymous; inexpensive; free of social pressure, personality influence, and individual dominance; conducive to independent thinking and gradual formulation of ideas; allows sharing of information and reasoning among participants.

Challenges: relies on the judgments of a selected group of people, which may not be representative; tends to eliminate extreme positions and force a middle-of-the-road consensus; requires skill in written communication; requires adequate time and participant commitment.

Tool 9: Citizen Report Cards

Citizen report cards are a means of assessing the performance of public services. They are being used in development; in Bangalore, India, for example, a group of private citizens has used them to address problems with public services. They have also been used for many years in the United States to grade the performance of public agencies. The success of these efforts has led to the further expansion and development of this data collection technique.

Citizen report cards (CRCs), sometimes called citizen score cards, are a tool to:
• collect citizen feedback on public services from actual users of a service
• assess the performance of individual service providers and/or compare performance across providers
• generate a database of feedback on services that can then be placed in the public domain (Asian Development Bank and Asian Development Bank Institute, 2004).


Table 8.10 shows a fictitious example of a citizen report card reporting overall satisfaction with services.

Table 8.10: Example of a Citizen Report Card Reporting Overall Satisfaction with Services

Agency | Number of users | Percent satisfied | Percent dissatisfied
Power | 1,024 | 43 | 15
Water | 775 | 41 | 19
Telephone | 203 | 87 | 12
Police | 98 | 45 | 36
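A report card like Table 8.10 can be tallied directly from individual user ratings. The following is a minimal sketch that assumes each response simply records an agency and whether the user was satisfied, dissatisfied, or neutral; the sample data are invented:

from collections import defaultdict

# Minimal sketch: aggregate individual ratings into a citizen report card.
# Each record is (agency, rating); the sample data are illustrative only.
responses = [
    ("Power", "satisfied"), ("Power", "dissatisfied"), ("Power", "neutral"),
    ("Water", "satisfied"), ("Water", "satisfied"), ("Water", "dissatisfied"),
]

counts = defaultdict(lambda: {"satisfied": 0, "dissatisfied": 0, "neutral": 0})
for agency, rating in responses:
    counts[agency][rating] += 1

print(f"{'Agency':<10}{'Users':>7}{'% Satisfied':>13}{'% Dissatisfied':>16}")
for agency, c in counts.items():
    users = sum(c.values())
    print(f"{agency:<10}{users:>7}{c['satisfied']/users:>12.0%} "
          f"{c['dissatisfied']/users:>15.0%}")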

Citizen report cards commonly collect feedback on:
• availability of the service
• access to the service
• reliability of the service
• quality of the service
• satisfaction with the service
• responsiveness of the service provider
• hidden costs
• corruption and support systems
• willingness to pay (World Bank, 2006).

The development of a citizen report card involves the following stages:
Stage 1: Assessment of local conditions
Stage 2: Pre-survey groundwork
Stage 3: Conducting the survey
Stage 4: Post-survey analysis
Stage 5: Dissemination of findings
Stage 6: Improving services (Asian Development Bank and Asian Development Bank Institute, 2004).


Citizen report cards can be used in many ways. Examples, and the locations where they have been used, include:
• studies of satisfaction with urban services (seven Indian cities; the Kampala report card by the Uganda Management Institute)
• studies of satisfaction with provincial and national service delivery (India, the Philippines, and part of the National Service Delivery Survey (NSDS) in Uganda)
• sector studies (public hospitals in Bangalore)
• program evaluations (rural food security in Tamil Nadu, India; rural water and sanitation in Maharashtra, India)
• governance reform projects (Bangladesh, the Philippines, Peru, Sri Lanka, Ukraine, and Vietnam) (World Bank, 2006).

Advantages and Challenges of Citizen Report Cards for Data Collection

Advantages: mixes focus groups and questionnaires; provides summative feedback on performance; structured for simple communication; helps reduce bias in data collection; increases response rates.

Challenges: capacity must exist to understand this kind of assessment; there are limits to comparability across services; a large sample is required for heterogeneous populations and lesser-used services; there is a lack of predictability in how different players will respond.


Final Statement on Tools

Before finalizing your data collection decisions, look back at the information from your front-end analysis, particularly your theory of change and the purpose of the evaluation, and consider them when choosing a data collection strategy. Both quantitative and qualitative data should be considered: each has strengths and weaknesses, but together they are complementary and can give a more balanced view of the situation. Evaluations should use multiple techniques to collect data on the project, program, or policy. Whichever data collection technique is chosen, the evaluator must use the correct tool to meet the needs of the question being addressed.

Do not let the tool drive your work. Choose the appropriate tool and use it to meet the needs of the evaluation.

Case 8-1 shows an example of an evaluation that used multiple data collection methods.


Case 8-1: Methodology for the ORET/MILIEV Programme in China

"The Development and Environment Related Export Transactions (ORET/MILIEV) Program is a program to finance certain types of projects through a combination of development cooperation grants and commercial loans. The program is designed to help generate employment, boost trade and industry, and improve environmental quality in developing countries" (Chinese National Centre for Science and Technology Evaluation, CNCSTE, 2006, p. 15).

This evaluation, conducted by the Chinese National Centre for Science and Technology Evaluation (CNCSTE), used the following data collection methods to examine the ORET/MILIEV projects funded:
• desk study
• stakeholder workshops
• questionnaire survey.

Table 8.11 summarizes the methods used to collect data on the funded projects, by sector (Chinese National Centre for Science and Technology Evaluation, 2006, p. 29). The evaluation covered the key issues listed in the TOR against the four evaluation criteria: efficiency, effectiveness, relevance, and impact.

Table 8.11: Overview of the evaluation portfolio by sector (number of cases)

Sector | Field study | Desk study | End-user questionnaires | Supplier questionnaires | Total cases
Agriculture and water conservation | 4 | 11 | 9 | 10 | 11
Energy and transportation | 5 | 10 | 10 | 6 | 10
Environment and waste treatment | 5 | 10 | 9 | 6 | 10
Factory equipment | 2 | 6 | 4 | 2 | 6
Farm produce processing and equipment | 7 | 21 | 13 | 15 | 21
Medical equipment | 3 | 7 | 5 | 4 | 7
Water treatment/supply | 6 | 15 | 15 | 13 | 15
Others | 3 | 4 | 3 | 4 | 4
Total | 35 | 84 | 68 | 60 | 84

For example, out of 11 agriculture and water conservation projects that were funded under the program, the CNCSTE desk reviewed all of them and also undertook a field study of four of the projects.



Summary

There is no one best way to collect data. The method chosen must meet the needs of the evaluation and the specific question being addressed, and evaluations may use more than one data collection technique. Decisions about which data collection techniques to use are based upon:
• what you need to know
• where the data reside
• the resources and time available
• the complexity of the data to be collected
• the frequency of data collection.

Key issues in measurement are validity, reliability, and precision. Nine approaches to data collection can become part of an evaluator's toolkit:
• participatory data collection
• available records and secondary data analysis
• observation
• surveys
• focus groups
• diaries, journals, and self-reported checklists
• expert judgment
• the Delphi technique
• citizen report cards.


Chapter 8 Activities

Application Exercise 8-1: Data Collection from Files

You want to look at the qualifications and experience of those admitted to a horticultural training workshop; the data are in their admission files. Develop a short form of five questions that a team of three evaluation assistants can use to collect data from those admission files.

2.

3.

4.

5.

Application Exercise 8-2: Mapping

Consider the area around where you are living or staying: what are the assets in this geographic area? Draw a map of the area. Interview key informants; observe people, traffic, types of businesses, and so forth.


Application Exercise 8-3: Data Collection: Interview

Instructions: You have been asked to develop a short interview to evaluate participant reactions to the quality of a workshop based on this chapter, or of a training workshop or conference you have recently attended. Develop five open-ended questions that address the content, level, and delivery of the workshop. If possible, find a partner who attended the same workshop or conference and interview each other. Next, write a readable, in-depth write-up of the interview you conducted and have your partner critique it for accuracy, readability, and coverage. This is confidential; do not include names or any other identifying information.

2.

3.

4.

5.


Application Exercise 8-4: Data Collection: Focus Groups

You have been asked to design a focus group to evaluate the impact of a series of workshops and a financial assistance package intended to help women start and run their own small businesses. Assume a geographic location with which you are familiar. Develop a set of five questions that would be appropriate to ask women who completed the program six months ago. Focus not only on the intended effects, but also probe how else the program has affected their lives, friends, and families.

2.

3.

4.

5.


References and Further Reading
Asian Development Bank and Asian Development Bank Institute. Improving local governance and service delivery: Citizen Report Card Learning Tool Kit. Retrieved August 7, 2007 from: http://www.citizenreportcard.com/index.html#
Billson, Janet Mancini (2002). The power of focus groups for social and policy research. Barrington, Rhode Island: Skywood Press.
Billson, Janet Mancini (2004). The power of focus groups: A training manual for social, policy, and market research – Focus on international development. Barrington, Rhode Island: Skywood Press.
Billson, J. and N. T. London (2004). The power of focus groups. Presentation at IPDET, July 2004.
Burges, Thomas F. (2001). Guide to the design of questionnaires. Edition 1.1, 5 Questionnaire Design. Retrieved May 12, 2008 from http://www.leeds.ac.uk/iss/documentation/top/top2/top2-5.html
Chinese National Centre for Science and Technology Evaluation (CNCSTE) (China) and Policy and Operations Evaluation Department (IOB) (the Netherlands) (2006). Country-led joint evaluation of the ORET/MILIEV programme in China. Amsterdam: Aksant Academic Publishers.
Cloutier, Dorothea, Bill Lilley, Devon Phillips, Bill Weber, and David Sanderson (1987). A guide to program evaluation and reporting. Orono, Maine: University of Maine Cooperative Extension Service.
Dawson, Susan and Lenor Manderson (1993). Methods for social research in disease: A manual for the use of focus groups. Boston: International Nutrition Foundation for Developing Countries. Retrieved December 14, 2007 from http://www.unu.edu/Unupress/food2/UIN03E/uin03e00.htm
Denzin, K. (1978). The research act. New York: McGraw-Hill.
Dunham, Randall B. (1996). The Delphi technique. Retrieved August 7, 2007 from: http://www.medsch.wisc.edu/adminmed/2002/orgbehav/delphi.pdf


Empowering Communities: Participatory Techniques for Community-Based Programme Development, Session 10: Transect walks and observation (2007). Retrieved December 12, 2007 from http://pcs.aed.org/manuals/cafs/handbook/sessions1012.pdf
EuropeAid Co-operation Office (2005). Evaluation methods. Retrieved January 4, 2008 from http://ec.europa.eu/europeaid/evaluation/methodology/egeval/index_en.htm
Fitzpatrick, J. L., J. R. Sanders, and B. R. Worthen (2004). Program evaluation: Alternative approaches and practical guidelines. New York: Pearson.
Frary, Robert B. (1996). Hints for designing effective questionnaires. Practical Assessment, Research & Evaluation, 5(3). Retrieved May 12, 2008 from http://PAREonline.net/getvn.asp?v=5&n=3
Kirigia, J. M. (1997). Economic evaluation in schistosomiasis: Using the Delphi technique to assess effectiveness. Retrieved August 7, 2007 from: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=9107365&dopt=Abstract
Krueger, R. A. and M. A. Casey (2000). Focus groups (3rd ed.). Thousand Oaks, CA: Sage Publications.
Hofstede, G. (2001). Culture's consequences (2nd ed.). Newbury Park, CA: Sage Publications.
Jackson, Gregg (2007). Guidelines for developing survey instruments as part of the Designing and conducting surveys workshop. Presented at IPDET, 2007.
Living Standards Measurement Study (May 1996). A manual for planning and implementing the Living Standards Measurement Study survey. Working Paper No. 126. Washington, D.C.: The World Bank.
Lofland, John and L. H. Lofland (1995). Analyzing social settings: A guide to qualitative observation and analysis (3rd ed.). Belmont, CA: Wadsworth.
Lofland, John (1971). Analyzing social settings. Belmont, CA: Wadsworth.
McNamara, Carter (2007). General guidelines for conducting interviews. Retrieved August 7, 2007 from: http://www.managementhelp.org/evaluatn/intrview.htm#anchor615874


Michigan State University Extension (1994). Delphi technique. Issue Identification Information: III00006, 10.01/94. Retrieved August 7, 2007 from: http://web1.msue.msu.edu/msue/imp/modii/iii00006.html
Miles, Matthew B. and A. Michael Huberman (1994). Qualitative data analysis: An expanded sourcebook. Thousand Oaks, CA: Sage Publications.
Minnesota Department of Health (2007). Community engagement: Community forums and public hearings. Retrieved December 12, 2007 from http://www.health.state.mn.us/communityeng/needs/needs.html
Patton, Michael Q. (2002). Qualitative research and evaluation methods (3rd ed.). Thousand Oaks, CA: Sage Publications.
Patton, Michael Q. (1987). How to use qualitative methods in evaluation. Thousand Oaks, CA: Sage Publications.
Porteous, Nancy L., B. J. Sheldrick, and P. J. Stewart (1997). Program evaluation tool kit: A blueprint for public health management. Ottawa, Canada: Ottawa-Carleton Health Department.
Roberts, Donna (2007). Regents Exam Prep Center: Qualitative vs. quantitative data. Retrieved December 14, 2007 from http://regentsprep.org/Regents/math/ALGEBRA/AD1/qualquant.htm
Sanders, J. R. (2000). Evaluating school programs (2nd ed.). Thousand Oaks, CA: Sage Publications.
Stake, R. (1995). The art of case study research. Thousand Oaks, CA: Sage Publications.
StatPac (2007). StatPac survey software: Question wording. Retrieved February 18, 2008 from http://www.statpac.com/surveys/index.htm#toc
TC Evaluation Center, University of California, Davis (2007). Wording questions: Some general guidelines. Retrieved February 18, 2008 from http://ucce.ucdavis.edu/files/filelibrary/5715/27621.pdf
Trochim, William M. K. (2006). Types of data. Research methods knowledge base. Retrieved December 14, 2007 from http://www.socialresearchmethods.net/kb/datatype.php


United Nations Economic and Social Commission for Asia and the Pacific (ESCAP) (1999). Guidelines on the application of new technology to population data collection and capture. Chapter 4: Computer Assisted Telephone Interviewing (CATI). Retrieved May 12, 2008 from http://www.unescap.org/stat/pop-it/popguide/capture_ch04.pdf
The University of Wisconsin–Extension, Cooperative Extension (1996). G3658-5 Program development and evaluation, Collecting evaluation data: Direct observation. Retrieved December 13, 2007 from http://learningstore.uwex.edu/pdf/G3658-5.pdf
Wadsworth, Y. (1997). Do it yourself social research (2nd ed.). St. Leonards, NSW, Australia: Allen and Unwin.
Wadsworth, Y. (1997a). Everyday evaluation on the run. St. Leonards, NSW, Australia: Allen and Unwin.
Wengraf, Tom (2001). Qualitative research interviewing: Biographic narrative and semi-structured methods. Thousand Oaks, CA: Sage Publications.
The World Bank (1996). Living Standards Measurement Study (LSMS): A manual for planning and implementing the LSMS survey. Working Paper No. 126. Washington, DC: World Bank.
The World Bank, Participation and Civic Engagement Group, Social Development Department (2006). Citizen report cards: A presentation on methodology. Retrieved August 7, 2007 from: http://info.worldbank.org/etools/docs/library/94360/Tanz_0603/Ta_0603/CitizenReportCardPresentation.pdf
The World Bank (2007). Community driven development. Retrieved August 7, 2007 from: http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTSOCIALDEVELOPMENT/EXTCDD/0,,menuPK:430167~pagePK:149018~piPK:149093~theSitePK:430161,00.html


Web Sites:
Asian Development Bank and Asian Development Bank Institute, Improving Local Governance and Service Delivery: Citizen Report Card Learning Tool Kit: http://www.citizenreportcard.com/index.html#
Dunham, Randall B. (1998). The Delphi Technique, from Organizational Behavior, University of Wisconsin School of Business: http://instruction.bus.wisc.edu/obdemo/readings/delphi.htm
EuropeAid Co-operation Office, Evaluation Methods: http://ec.europa.eu/europeaid/evaluation/methodology/egeval/index_en.htm
Evaluation Portal: http://www.evaluation.lars-balzer.name/
Evaluators' Instruments Exchange: http://141.218.173.232:120/xchange/default.htf
Harrell, A., Evaluation Strategies for Human Services Programs: A Guide for Policymakers and Providers: http://www.bja.evaluationwebsite.org/html/documents/evaluation_strategies.html
Carter McNamara, MBA, PhD, General Guidelines for Conducting Interviews: http://www.managementhelp.org/evaluatn/intrview.htm#anchor615874
Empowering Communities: Participatory Techniques for Community-Based Programme Development, Volume 1: Trainer's Manual and Volume 2: Participant's Handbook. Retrieved December 12, 2007 from http://pcs.aed.org/manuals/cafs/handbook/sessions1012.pdf
Google Earth: http://earth.google.com/
Nielsen, J. (1997). The Use and Misuse of Focus Groups: http://www.useit.com/papers/focusgroups.html


Porteous, Nancy L., B. J. Sheldrick, and P. J. Stewart (1997). Program Evaluation Tool Kit: A Blueprint for Public Health Management. Ottawa-Carleton Health Department: http://ottawa.ca/residents/funding/toolkit/index_en.html, http://www.phac-aspc.gc.ca/php-psp/tookit.html (English) or http://www.phac-aspc.gc.ca/php-psp/toolkit_fr.html (French)
The Measurement Group, Evaluation/Research Tools: http://www.themeasurementgroup.com/evalbttn.htm
The University of Wisconsin–Extension, Cooperative Extension (1996). G3658-5 Program Development and Evaluation, Collecting Evaluation Data: Direct Observation (contains an appendix with example observation guides): http://learningstore.uwex.edu/pdf/G3658-5.pdf
W. K. Kellogg Evaluation Handbook: http://www.wkkf.org/Pubs/Tools/Evaluation/Pub770.pdf
The World Bank, Citizen Report Cards – A Presentation on Methodology, Participation and Civic Engagement Group, Social Development Department: http://info.worldbank.org/etools/docs/library/94360/Tanz_0603/Ta_0603/CitizenReportCardPresentation.pdf
The World Bank, Community Driven Development: http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTSOCIALDEVELOPMENT/EXTCDD/0,,menuPK:430167~pagePK:149018~piPK:149093~theSitePK:430161,00.html



Chapter 9
Deciding on the Sampling Strategy

Introduction
The last several chapters of this book covered writing evaluation questions, choosing an evaluation design, and approaches to data collection. This chapter discusses how to decide how much data to collect and where to collect them, so that the data closely reflect the population and help answer the evaluation questions.

This chapter has six sections:
• Introduction to Sampling
• Types of Samples: Random and Non-random
• How Confident and Precise Do You Need to Be?
• How Large a Sample Do You Need?
• When Do You Need Help from a Statistician?
• Sampling Glossary


Part I: Introduction to Sampling
Part of a data collection strategy is deciding where to collect data, from whom, and how much. That is, is it possible – or, if possible, feasible – to collect data from the entire study population? Can we read every document, interview every farmer, investigate every mile of a road system, review every file, or observe every teacher? If we can, we will then be able to report accurately the qualifications of every teacher in the school system, the number of paved miles on all the roads, or the views of all the farmers in the area. If we collect all the data accurately and reliably, there is little chance of error. This type of complete coverage of the population in question is called a census. (See the glossary at the end of this chapter for the technical sampling terms used in this chapter.)

For most evaluations, however, it is not possible or not feasible to collect data from every file, farmer, or service provider; it takes too much time and costs too much. Instead, we take a sample – a subset of the entire population. We then want to draw inferences about the population based on the sample results; that is, to estimate what the population is like from what we find in the sample. We call this "generalizing to a population."

We use samples all the time. For example, when we have a blood test to check on our health, the laboratory, fortunately for us, takes a sample rather than all of our blood. Tests are run on that sample, and it is assumed that what is found in the sample accurately reflects what is in all of our blood.

Sampling is not just something that applies to large, quantitative studies. Even when conducting a highly qualitative, one-week field visit to assess a program spread over a large geographic region, evaluators still need to be thoughtful about which areas of that region to investigate. Consider the biases that might be introduced if program officials, anxious to show the best picture, select the sites and participants to be studied. Those biases could be avoided with a randomly selected sample. An understanding of the basic concepts of systematic sampling can enhance the extent to which the evaluation accurately reflects what is really going on in the program.


Part II: Types of Samples: Random and Non-random
When we cannot collect data from every country, every person, or every school, we select a smaller subset. Recall that this subset is called a sample. There are two kinds of samples: random and non-random.

Random Sampling
Random samples are samples in which each unit in the population has an equal chance of being selected. Random samples can be drawn from any group, such as files, roads, farms, or people. A lottery is often used as an example, because every number has an equal chance of being selected as the "winning" number. One advantage of random sampling is that it eliminates selection bias: since every unit has an equal chance of being selected, evaluators cannot select only those people who look like them or who share a particular viewpoint. An appropriately sized random sample should be representative of the population as a whole, enabling evaluators to generalize to the population from which the sample was drawn.

Generating Random Samples
A complete list of every unit in the population of interest, called a sampling frame, is needed to select a random sample. Each unit in the population is assigned a unique number; the numbering does not have to start at 1, but it should be sequential. Units are then selected from the total population using a random schedule; typically, we would use a table of random numbers and keep selecting units until we reach the sample size we set.

Random numbers can be generated fairly simply using any major spreadsheet program. To generate a random whole number between 0 and 100 in Microsoft Excel®, for example:
1. Enter the formula =RAND()*100.
2. Format the cell as a number with zero decimal places.
3. Copy both the formula and the format to as many other cells as needed (that is, as many cells as the size of the sample to be drawn).
To create random numbers between 100 and 200, simply add 100 to the formula: =RAND()*100+100.
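The same idea can be expressed outside a spreadsheet. The following is a minimal sketch in Python, assuming only the standard library; the sample size of 10 is illustrative and not from the text:

    import random

    # Ten random whole numbers between 0 and 100,
    # mirroring the Excel formula =RAND()*100 formatted with zero decimal places.
    numbers = [round(random.random() * 100) for _ in range(10)]
    print(numbers)

    # Shifting the range to 100-200 works the same way as =RAND()*100+100.
    shifted = [n + 100 for n in numbers]
    print(shifted)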


An Excel®-generated table of two-digit random numbers is included at the end of this chapter for use in Application Exercise 9-1 (whether two-, three-, or four-digit random number tables are used depends on the size of the population). An example of an application would be to use the random numbers in that table to select a sample from an enumerated list of project sites numbered sequentially from 20 to 55.

Sometimes it is not possible to do a truly random selection. In these cases, a systematic sampling technique can be used, starting from a random place and then selecting every nth case.

Example of systematic sampling
The evaluators need to review records, but the records are stored in 20 boxes and there is no way to examine and number them all sequentially in order to select a sample. A systematic selection with a random start is acceptable, as long as there is nothing systematic about the original order of the documents. For example, the evaluators could take a random start and then pick every 20th file until they reach the total number of files they need. When this kind of sample is drawn, the evaluator must be sure that the files are not already in some systematic order. Suppose they are medical records, filed alphabetically by patient last name. In that case the sample might be completely selected before the names at the end of the alphabet are reached; systematic sampling would then be inappropriate, because each unit would not have an equal chance of being selected.
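A sketch of the procedure in Python follows. It assumes the records can at least be pulled in their existing order; the box and file labels are invented for illustration:

    import random

    def systematic_sample(units, sample_size):
        # Systematic sampling with a random start: choose a start within the
        # first interval, then take every k-th unit after it.
        k = len(units) // sample_size                # sampling interval
        start = random.randrange(k)                  # random start
        return units[start::k][:sample_size]

    # Illustrative population: 400 files spread across 20 boxes.
    files = [f"box{box:02d}-file{pos:02d}"
             for box in range(1, 21) for pos in range(1, 21)]
    print(systematic_sample(files, 20))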

Types of Random Samples
There are six types of random samples:
• simple random samples
• random interval samples
• stratified random samples
• random cluster samples
• multi-stage samples
• combination random samples.


Simple Random Samples
A simple random sample is the most common form of random sampling and is the simplest. It is used when the evaluator's primary objective is to make inferences about the whole population rather than about specific sub-populations. Simple random samples are good for drawing samples of 50 to 500 from homogeneous populations, or large samples (more than 500) from more heterogeneous populations (Jackson, 2008, p. 11).

A complete list of every unit in the population of interest, called a sampling frame, is needed to select a simple random sample. The units are selected using a random schedule; typically, evaluators would use a table of random numbers and keep selecting units until reaching the sample size that was set. Table 9.1 summarizes the procedure for drawing a simple random sample.

Table 9.1: Drawing a Simple Random Sample
Step  Procedure
1     Define the population carefully, indicating what is within and outside the population.
2     Find or generate a sampling frame that lists all of the units in the population (or a very large proportion of them) and assigns each a unique number. The numbers do not have to be contiguous (using each consecutive number), but drawing the sample will go faster if there are not large gaps among the assigned numbers.
3     Decide on the size of the sample.
4     Determine the number of digits in the largest assigned number.
5     Acquire a random number table.
6     Decide on a fixed pattern for reading the numbers (e.g., top to bottom and then right to the next column, reading the first digits).
7     Blindly select a starting point in the table.
8     Read the numbers in the selected pattern, always reading the number of digits determined in Step 4.
9     Each time you read a number that corresponds to the number of a unit in your sampling frame, mark that unit for selection.
10    Continue until you have selected the number of units you decided were needed.
11    If you come upon a random number that is the same as one already used, go on to the next number; you cannot select the same unit twice.
(Source: Jackson, 2008, p. 11)


For example, an evaluation team selects 100 files from a population of 500. All the files have been numbered consecutively from 001 to 500 and are filed in numerical order. The team can then use a random numbers table, mentally block it off into three-digit numbers, and select the first 100 numbers that fall between 001 and 500. These are the files they select for their study. Note that the evaluation team could have chosen the files using simple random sampling, as shown, or the systematic sampling approach illustrated previously.
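Where a computer is available, the same draw can be made directly, without a printed random number table. A minimal sketch in Python, assuming the files are identified simply by their numbers 001-500:

    import random

    # Draw 100 of the 500 file numbers without replacement.
    selected = sorted(random.sample(range(1, 501), 100))
    print([f"{n:03d}" for n in selected])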

Random Interval Samples
Random interval sampling can be used when there is a sequential population that is not already enumerated and would be difficult or time-consuming to enumerate. A random interval sample also uses a random numbers table; however, instead of using the table to select the units for the sample, it uses the table to select the interval (the count between selections). For example, an evaluator has files containing data for an entire population. The evaluator can follow the sequence described in Table 9.2 to draw a random interval sample. Random interval samples can also be generated using computer programs.

Table 9.2: Drawing a Random Interval Sample
Step  Procedure
1     Estimate the number of units in the population.
2     Determine the desired sample size.
3     Divide the estimated number of units by the desired sample size and round to two decimal places. This is the average interval length needed to yield the desired sample size when sampling through the whole population.
4     Multiply the result of Step 3 by 1/5 and round to two decimal places to get the "multiplier."
5     Blindly designate a starting point in the population.
6     Select a single-digit random number (using a random number table, as in Table 9.1), multiply it by the multiplier, round to the nearest whole number, count that many places from the starting point, select that unit for the sample, and place a marker where it was drawn.
7     Take the next single-digit random number, multiply it by the multiplier, round it to the nearest whole number, and count that many more places.
8     Continue in the same manner through the entire population until you reach the point where you started.
9     WARNING: If it is important to return the units to their original positions, mark each place from which you draw a file with a file identifier so you can return the files to their proper places.
(Source: Jackson, 2008, p. 14)
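A rough rendering of these steps in Python is below. It is a sketch only: it assumes single-digit random numbers of 0-9 and makes one pass through the population from the starting point, so the number of selections will vary somewhat around the target.

    import random

    def random_interval_positions(estimated_population, desired_sample_size):
        avg_interval = estimated_population / desired_sample_size    # Step 3
        multiplier = round(avg_interval / 5, 2)                      # Step 4
        start = random.randrange(estimated_population)               # Step 5
        positions, travelled = [], 0
        while True:
            step = max(1, round(random.randrange(10) * multiplier))  # Steps 6-7
            travelled += step
            if travelled >= estimated_population:                    # Step 8: back at the start
                return positions
            positions.append((start + travelled) % estimated_population)

    print(len(random_interval_positions(2000, 50)))   # typically a bit more than 50 selections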


Stratified Random Samples
Sometimes evaluators want to make sure that specific groups, usually ones that make up a small proportion of the population, are included; a simple random sample might otherwise miss them. In this case, the population is divided into strata based on some meaningful characteristic, such as gender, age, ethnicity, or location. This kind of sample is called a stratified random sample (Jackson, 2008, p. 16).

For example, an evaluation team may want to make sure it has enough people from rural areas in its study. If selected by a simple random sample, rural people may be too few if they are a small proportion of all the people in the area. This is especially important if the team wants sufficient numbers in each stratum to make meaningful comparisons, for example a stratified sample of farmers at various distances from a major city.

To draw a stratified random sample, divide the population into non-overlapping but exhaustive groups (strata) of sizes n1, n2, n3, ..., ni, such that n1 + n2 + n3 + ... + ni = n, the total population. Then draw a simple random sample in each stratum. The number selected randomly from each stratum is equivalent to that stratum's proportion of the total population. Figure 9.1 illustrates this process.

Figure 9.1: Stratified Random Sample. The total population is divided into sub-populations (strata), and a simple random sample is drawn from each sub-population.
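A minimal sketch of a proportional stratified sample in Python; the strata and their sizes are invented for illustration, and rounding may shift the total sample by a unit or two:

    import random

    def stratified_sample(strata, total_sample_size):
        # Proportional allocation: each stratum contributes in proportion
        # to its share of the total population.
        population = sum(len(units) for units in strata.values())
        return {name: random.sample(units,
                                    round(total_sample_size * len(units) / population))
                for name, units in strata.items()}

    # Illustrative strata: farmers grouped by distance from a major city.
    strata = {
        "under 10 km": [f"farm-near-{i}" for i in range(600)],
        "10 to 50 km": [f"farm-mid-{i}" for i in range(300)],
        "over 50 km":  [f"farm-far-{i}" for i in range(100)],
    }
    sample = stratified_sample(strata, 100)
    print({name: len(units) for name, units in sample.items()})  # about 60, 30, 10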


Random Cluster Sampling
Random cluster sampling is another form of random sampling. A "cluster" is any naturally occurring aggregate of the units that are to be sampled: households (or homes) are clusters of people, and towns are clusters of households. Cluster samples are most often used when:
• there is no complete list of everyone in the population of interest, but there is a complete list of the clusters in which they occur, or
• there is a complete list of everyone, but the people are so widely dispersed that it would be too time-consuming and expensive to send data collectors out to a simple random sample.

In a random cluster sample, the clusters (such as towns or households) are randomly sampled, and then data are collected on all the target units within them. For instance, if the evaluation needs data on the height and weight of children ages 2-5 in program sites scattered across a large rural region, the evaluators might randomly sample 20 villages from the 100 villages receiving the program and then collect data on all the children ages 2-5 in those villages.

Here is another example. An evaluation team wants to interview about 200 AIDS patients. There is no compiled list of all AIDS patients, but the patients are served by 25 clinics scattered over a large region with poor roads. The evaluators know that most clinics serve about 50 AIDS patients. They therefore randomly sample four of the 25 clinics and then study all AIDS patients in those four clinics, which yields about 200 patients.

The main drawback of random cluster samples is that they are likely to yield somewhat less accurate estimates of the population parameter than simple random samples or stratified random samples of the same size (N). It is possible that the selected clinics serve clients who differ in economic or religious characteristics from those not included in the sample; if so, the sample will provide biased estimates of the full population of AIDS patients served by clinics.
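A sketch of the clinic example in Python, with invented clinic and patient identifiers:

    import random

    # Illustrative sampling frame of clusters: 25 clinics, each serving about 50 patients.
    clinics = {f"clinic-{c:02d}": [f"patient-{c:02d}-{p:02d}" for p in range(50)]
               for c in range(1, 26)}

    # Randomly sample 4 clinics, then include every patient served by them.
    sampled_clinics = random.sample(list(clinics), 4)
    patients = [p for clinic in sampled_clinics for p in clinics[clinic]]
    print(sampled_clinics, len(patients))   # about 200 patients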


Multi-stage Random Sampling
Multi-stage random sampling combines two or more forms of random sampling, applying a second form of random sampling to the results of a first. Most commonly, it begins with random cluster sampling and then applies simple random sampling or stratified random sampling within the clusters.

In the example above of sampling HIV/AIDS patients, a multi-stage random sample might initially draw a cluster sample of eight clinics instead of four and then draw a simple random sample of 25 patients from each clinic. That provides a sample of 200 patients, just as in the earlier example, but they come from a larger number of clinics. This would be somewhat more expensive than collecting data from just four clinics, but it is likely to provide less biased estimates of the population parameter. It is also possible to combine non-random and random sampling in a multi-stage sample; for instance, the clinics might be sampled non-randomly and the HIV/AIDS patients then sampled randomly from each selected clinic.

The drawback of multi-stage and cluster samples, as discussed, is that they may not yield an accurate representation of the population. For example, an evaluation team may want to interview 200 HIV/AIDS patients, but these 200 may be selected from only four randomly sampled clinics because of resource constraints. It is possible that the clinics serve populations that are too similar in economic background or other characteristics, or too similar in their level of care, to accurately represent the total population of patients.

Cluster sampling can also reduce travel. It is difficult, for example, to travel to all of the people living on small, dispersed, and remote farms and interview each one where they reside. In a cluster sample, evaluators might sample 10 of the 50 farms and then interview all the people at each of those sampled farms.
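Continuing the clinic sketch, the two stages can be expressed as two draws; again, the clinic and patient identifiers are illustrative:

    import random

    clinics = {f"clinic-{c:02d}": [f"patient-{c:02d}-{p:02d}" for p in range(50)]
               for c in range(1, 26)}

    # Stage 1: random cluster sample of 8 of the 25 clinics.
    stage_one = random.sample(list(clinics), 8)

    # Stage 2: simple random sample of 25 patients within each selected clinic.
    stage_two = {clinic: random.sample(clinics[clinic], 25) for clinic in stage_one}
    print(sum(len(patients) for patients in stage_two.values()))   # 200 patients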

Combination Random Samples
Sometimes combinations of methods are used. The total group may be divided into strata; everyone in one stratum might be selected (a census), and a random sample drawn from each of the other strata. The program in Ghana described in Case 9-1 gives an example of a complex combination random sample.


Case 9-1: Impact on Nutrition: Lower Pra Rural Bank Credit with Education Program in Ghana
The intent of the program is to increase the nutritional status and food security of poor households in Ghana. The Credit with Education Program combines (1) providing credit to participants with (2) education on the basics of health, nutrition, birth timing, and small business skills.
Evaluation questions: Did the program have an impact on the nutritional status of children, on women's economic capacity, on women's knowledge of health issues, and on their ability to offer a healthy diet to their children?
Overall design: A quasi-experimental design using two surveys. Nineteen communities that did not yet have the Credit with Education Program were the focus of this study. The communities were divided into groups (strata) based on set criteria. Within each stratum, communities were assigned either to a treatment group (which would receive the Credit with Education Program) or to a control group (which would not). They were not randomly assigned; three were assigned for political reasons and three were assigned as matched controls.
Sampling within the communities: Three groups of women with children were surveyed: those who had participated for at least one year (all participants were selected); those who did not participate but lived in the program communities (random sample); and those in control communities (random sample). In all, ten mother/child pairs with children aged 12-23 months were chosen from each of the small communities and 30 from the large communities.


Examples of selecting random samples
For example, an evaluation team wants to observe classroom activities to measure the amount of time spent on hands-on learning activities. The team can:
• randomly select classrooms
• randomly select times of day
• randomly select days of the week.
In another example, evaluators want to observe the amount of traffic on the road from the village to a major town. They can:
• randomly select times and days of the week
• randomly select times of the year
• randomly select observation points along the road, or select a single observation point.

Table 9.3 summarizes the random sampling process.

Table 9.3: Summary of Random Sampling Process
Step  Process
1     Obtain a complete listing of the entire population.
2     Assign each case a unique number.
3     Randomly select the sample using a random numbers table.
4     When no numbered listing exists or it is not practical to create one, use systematic random sampling: make a random start, then select every nth case.

Non-random Sampling
When random sampling is not possible, a different approach must be used. Non-random sampling approaches enable evaluators to use a group smaller than the total population. While there are many names for different non-random sampling techniques, they are all limited in their ability to generalize findings back to the larger population. However, even when there is a non-random sample (limited to a particular school, for example), evaluators can still use random sampling within it to make the evaluation more credible.


Types of Non-random Samples
Non-random samples are commonly classified into three types: convenience, snowball, and purposeful.

Convenience Samples
In a convenience sample, selections are made based on convenience to the evaluator. Principals from local schools may be selected because they are near where the evaluators are located. In development evaluation, common examples of convenience samples are:
• visiting whichever project sites are closest to an airport
• interviewing whichever project managers are available on the day of a visit
• observing whichever physical areas project officials choose to show
• talking with whichever NGO representatives and town leaders are encountered (Jackson, 2008, pp. 20-25).

Although convenience samples are convenient for the evaluators, they are inferior for every other purpose: there is no way to know how different such samples are from the population as a whole.

Snowball Samples
Snowball samples (also known as chain referral samples) are used when evaluators do not know whom or what to include, that is, when the boundaries of the population are unknown and there is no sampling frame. Typically used in interviews, the approach is to ask interviewees whom else the evaluators should talk to, and to continue asking until no new suggestions are obtained. Snowball samples carry several serious potential biases; they should be used only when purposeful and random samples are not feasible.


Purposeful Samples
In purposeful samples (also called "judgment" samples), selections are made to meet the specific purposes of the study. The selection is based on predetermined criteria that, in the judgment of the evaluators, will provide the data needed; the results are not generalized to the whole population. For example, an evaluation team may want to interview primary school principals and decide to interview some from rural areas as well as some from urban areas (but with no quota established). The following are the most widely used forms of purposeful samples (Jackson, 2008, pp. 21-25):
• Typical cases (median) sample: Units are deliberately drawn from those with the typical or most common characteristics of the population. On a bell curve, they come from the middle range of the curve. The purpose of the study is to look closely at the typical items, not those that are atypical.
• Maximum variation (heterogeneity) sample: Units are drawn deliberately to represent the full range of a characteristic of interest. On a bell curve, they are taken from all parts of the curve.
• Quota sample: Units are drawn deliberately so that there are equal numbers or an equal proportion from each stratum; for example, the evaluator chooses five units from the top third, five from the middle third, and five from the lower third of a distribution.
• Extreme case sample: Units are drawn deliberately from the extreme cases of a distribution; on a bell curve, they would be selected from the left and right ends of the curve. Extreme case samples look at the least common units.
• Confirming and disconfirming cases sample: Units are drawn deliberately from cases that are known to confirm or contradict conventional wisdom, principle, or theory; for example, from well-prepared projects that succeeded and well-prepared projects that failed.

Bias and Non-random Sampling
Non-random samples are sometimes called "unscientific sampling." Calling them unscientific may be appropriate for convenience samples, but it is not appropriate for the other types of non-random samples. In fact, case study evaluators often use purposeful samples for qualitative case studies.


When using a non-random sample, it is important to examine the issue of bias. Is there something about the particular sample that might make it different from the population as a whole? Is it possible to gather demographic information so that the characteristics of the sample can be described? Ideally, there will be no obvious differences between the sample and the population; when the evaluator fully reports the demographics of the sample, the audience can judge how similar the sample is to the population.

Non-random samples are the weakest form of sampling and should be avoided if the evaluator needs to generalize to the whole population. When using a non-random sample, report the results in terms of the respondents, for example: "Of the mothers interviewed, 70 percent are satisfied with the quality of the health care their children are receiving." Without random sampling, there is no basis for generalizing to a larger population. The data may nonetheless be very useful and may be the best available in the situation. Always make the sample selection criteria and procedures clear.

Combinations
Random and non-random methods can be combined. For example, an evaluation team may be collecting data on schools. The team might select two schools from the poorest communities and two from the wealthiest communities, and then randomly select students from those four schools for data collection.

Part III: How Confident and Precise Do You Need to Be?
Even when using a random sample, there is some possibility of error: the sample may differ from the population. This is where statistics come in (see Chapter 10, Planning Data Analysis and Completing the Design Matrix). The narrowest definition of statistics concerns the validity of data derived from random samples; more specifically, it is concerned with estimating the probability that the sample results are representative of the population as a whole. Statisticians have developed theories and formulas for making these estimates and for selecting sample sizes. While we present some statistics in the next chapter, we do not present or discuss statistical formulas here. Rather, we focus on understanding the basic concepts of statistical analysis and how to apply them to designing evaluations.


There are some options in deciding how accurate and precise the evaluation needs to be in inferring results to the larger population.

First, the evaluators must decide how confident they need to be that the sample results are an accurate estimate of what is true for the entire population. The confidence level generally used is 95 percent. This means they want to be certain that 95 times out of 100 the sample results are an accurate estimate of the population as a whole. If they are willing to be 90 percent certain (certain 90 times out of 100), the sample size can be smaller. If they want to be 99 percent confident (only a 1 percent chance that the sample is very different from the population as a whole), they will need a larger sample.

The next choice is how precise the estimates need to be. This is sometimes called sampling error or margin of error. We often see this when results from polls are reported. For example, a newspaper might report that 48 percent favor raising taxes and 52 percent oppose raising taxes (+/- 3 percent). This means that if everyone in the population were asked, the actual proportions would be somewhere between 45 and 51 percent (48 +/- 3) favoring raising taxes and between 49 and 55 percent (52 +/- 3) opposing. Most evaluations accept a sampling error of 5 percent. In the tax example, with a 5 percent margin of error, the true picture of opinions would be from 43 to 53 percent favoring raising taxes and from 47 to 57 percent opposing. Notice that there is more variability (less precision) in the estimates with a +/- 5 percent margin of error than with a +/- 3 percent margin. The more precise the evaluators want to be, the larger the sample will need to be. Note also that in this example the ranges overlap, which means the results are too close to call.

When working with real numbers, such as age or income, precision is presented in terms of a confidence interval (not to be confused with the concept of confidence level explained above). We use this when we want to estimate the mean of the population based on the sample results. For example, if the average per capita income of the rural poor in our sample is 2,000 South African Rand per year, the computer might calculate a 95 percent confidence interval of R 1,800 to R 2,200. We can then say that we are 95 percent certain (the confidence level) that the true population average income is between 1,800 and 2,200 (the confidence interval).
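A small sketch of the income example in Python. The sample standard deviation of R 1,000 and the sample size of 100 are assumptions chosen only so that the output roughly reproduces the R 1,800 to R 2,200 interval in the text:

    import math

    def confidence_interval_for_mean(sample_mean, sample_sd, sample_size, z=1.96):
        # 95% confidence interval for a population mean estimated from a sample.
        half_width = z * sample_sd / math.sqrt(sample_size)
        return sample_mean - half_width, sample_mean + half_width

    print(confidence_interval_for_mean(2000, 1000, 100))   # roughly (1804, 2196)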


Part IV: How Large a Sample Do You Need?
Sample size is a function of the size of the population of interest, the desired confidence level, and the desired level of precision. There are two ways to determine the appropriate sample size: calculate it with a formula or use a tool such as Table 9.4, a guide to minimum sample size. The table shows the sample size needed when estimating a population percentage (or proportion) at the 95 percent confidence level with a +/- 5 percentage point confidence interval.

Notice that the smaller the population, the higher the proportion of cases needed. If the population is 300, a sample size of 169 – just over half the total population – is needed to obtain a 95 percent confidence level. A population of 900 requires a sample size of 269 – just under a third. When the population is larger than 100,000, a sample of 385 – a much smaller fraction – is needed.

Table 9.4: Guide to Minimum Sample Size (95% confidence level, +/- 5% margin of error)
Population Size   Sample Size    Population Size   Sample Size
10                10             550               226
20                19             600               234
40                36             700               248
50                44             800               260
75                63             900               269
100               80             1,000             278
150               108            1,200             291
200               132            1,300             297
250               152            1,500             306
300               169            3,000             341
350               184            6,000             361
400               196            9,000             368
450               207            50,000            381
500               217            100,000+          385
Source: R. V. Krejcie and D. W. Morgan, "Determining sample size for research activities," Educational and Psychological Measurement, Vol. 30: 607-610, 1970.

Note that these are minimum sample sizes. Whenever possible, select a larger sample size to compensate for the likelihood of a lower than 100 percent response rate.
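The formula behind this kind of table can also be applied directly. The sketch below uses the finite-population formula from Krejcie and Morgan (1970), with chi-square = 3.841 (the 95 percent level) and an assumed proportion of 0.5; the printed values match the corresponding rows of Table 9.4:

    def minimum_sample_size(population, margin_of_error=0.05,
                            chi_square=3.841, proportion=0.5):
        # Krejcie & Morgan (1970): minimum sample size for estimating a
        # proportion at the 95% confidence level (chi-square = 3.841).
        numerator = chi_square * population * proportion * (1 - proportion)
        denominator = (margin_of_error ** 2 * (population - 1)
                       + chi_square * proportion * (1 - proportion))
        return round(numerator / denominator)

    for population in (300, 900, 1000):
        print(population, minimum_sample_size(population))   # 169, 269, 278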


An important point is that a low response rate always carries the threat of non-response bias, and over-sampling cannot control for it: no amount of over-sampling will compensate for response bias if the response rate is low (say, only 20 percent). Rather than over-sampling, the evaluator should put extra resources into doing everything possible to obtain a high response rate, including incentives and multiple follow-ups with non-respondents. Any response rate of less than 70 percent greatly increases the likelihood of response bias.

While samples are used to keep the costs of data collection down, go for as large a sample as you can manage; this will make the estimates of the population as accurate as possible. If it is possible, cover the entire population, because there will then be no sampling error involved. Keep in mind, however, that censuses can also yield biased data if response rates are low, and that larger samples are not always the best use of evaluation resources: you sample to save cost and time.

Summary of Sample Size
• Accuracy and precision can be improved by increasing the sample size: to increase accuracy and reduce the margin of error, increase the sample size.
• The standard to aim for is a 95 percent confidence level and a margin of error of +/- 5 percent.
• The larger the margin of error, the less precise the results will be.
• The smaller the population, the larger the needed ratio of the sample size to the population size (see Table 9.4).

Table 9.5 summarizes sample sizes for very large populations (those of 1 million or larger). Many national surveys use samples of about 1,100, because that yields a margin of error of +/- 3 percentage points at a 95 percent confidence level.


Table 9.5: Sampling Sizes for Large Populations (one million or larger)
Precision                     Confidence Level
(margin of error, +/- %)      99%        95%        90%
+/- 1%                        16,576     9,604      6,765
+/- 2%                        4,144      2,401      1,691
+/- 3%                        1,848      1,067      752
+/- 5%                        666        384        271
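For very large populations the margin of error depends essentially only on the sample size, so the table can be checked with the usual approximation for a proportion. A sketch, assuming the worst-case proportion of 0.5:

    import math

    def margin_of_error(sample_size, proportion=0.5, z=1.96):
        # Half-width of a 95% confidence interval for a proportion.
        return z * math.sqrt(proportion * (1 - proportion) / sample_size)

    print(round(margin_of_error(384) * 100, 1))    # about 5.0 percentage points
    print(round(margin_of_error(1100) * 100, 1))   # about 3.0 percentage points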

Tables 9.6 and 9.7 show 95 percent confidence intervals for two small populations (100 and 50), a few sample sizes, and various proportions found in the sample. Table 9.6 shows the confidence intervals for a population of 100; Table 9.7 shows them for a population of 50 (Jackson, 2008, p. 28).

Table 9.6: 95% Confidence Intervals for a Population of 100
                 Proportion Found in the Sample
Sample Size      .5        .4 or .6   .3 or .7   .2 or .8   .1 or .9
75               ±.06      ±.06       ±.05       ±.05       ±.03
50               ±.10      ±.10       ±.09       ±.08       ±.06
30               ±.15      ±.15       ±.14       ±.12       ±.09

Table 9.7: 95% Confidence Intervals for a Population of 50
                 Proportion Found in the Sample
Sample Size      .5        .4 or .6   .3 or .7   .2 or .8   .1 or .9
30               ±.11      ±.11       ±.10       ±.09       ±.07
20               ±.17      ±.17       ±.16       ±.14       ±.10

There are tools available on the Internet that can help identify the sample size needed to achieve a given confidence level and margin of error. Two examples:
To determine the sample size (and to find the confidence level), use the Sample Size Calculator from Creative Research Systems at: http://www.surveysystem.com/sscalc.htm
Another sample size calculator, the Proportion Difference Power/Sample Size Calculation, can be found at: http://statpages.org/proppowr.html


Part V: When Do You Need Help from a Statistician?
If an evaluation needs to use a complex, multi-stage sampling strategy, evaluators may want to ask for assistance. The American Statistical Association (ASA) has a directory of statistical consultants, located at: http://www.amstat.org/consultantdirectory/index.cfm
At that Web site, searches can be conducted by expertise, specialty, language fluency, or location (in North and Central America).
The Alliance of Statistics Consultants offers assistance with data management, data analysis, and thesis and dissertation statistics consultation, as well as statistics training and tutoring, and provides free cost estimates for its services: http://www.statisticstutors.com/#statistical-analysis
An additional Web site, HyperStat Online: Help with Statistics: Statistical Consultants and Tutors, links to many other resources for assistance with statistics: http://davidmlane.com/hyperstat/consultants.html


Part VI: Sampling Glossary

accidental sample: A sample in which the units are selected "by accident."

census: A count of (or collection of information from) the entire population.

confidence interval: The calculated range within which the true population value is estimated to lie, usually stated with 95 percent confidence (the standard, although other levels may be used).

confidence level: The degree of certainty that the statistic obtained from the sample is an accurate estimate of the population as a whole.

confirming and disconfirming cases sample: A purposeful sample in which units are drawn deliberately from cases that are known to confirm or contradict conventional wisdom, principle, or theory.

convenience sample: A sample in which selections are based on convenience to the evaluator (e.g., easy geographic or organizational access).

extreme case sample: A purposeful sample in which units are drawn deliberately from the extreme cases of a distribution; on a bell curve, the units would be selected from the left and right ends of the curve.

maximum variation (heterogeneity) sample: A purposeful sample in which units are drawn deliberately to represent the full range of a characteristic of interest; on a bell curve, they are taken from all parts of the curve.

multi-stage random sample: A sample that combines two or more random sampling procedures in a sequential manner. Usually a random cluster sample is followed by some form of random sampling within the clusters; for instance, there might be a cluster sample of villages and then random sampling of families within each village.

parameter: A characteristic of the population.

population: The total set of units. It could be all the citizens in a country, all farms in a region, or all children under the age of five living without clean running water in a particular area.

purposeful sample (also known as judgment sample): A sample in which selections are made based on predetermined criteria.

quota sample: A purposeful sample in which a specific number of different types of units are selected; units are drawn deliberately so that there are equal numbers or an equal proportion from each stratum.

random cluster sample: A random sample drawn from a sampling frame of naturally occurring clusters of the unit of analysis (villages are a cluster of people). It is used when there is no good sampling frame of the units of analysis but there is one of the clusters in which they occur, or when it would be very expensive to collect data from a simple random sample (usually because of wide geographic dispersion).

random interval sample: A sample chosen by using random distances (intervals) between selections.

random sample: A sample in which each unit in the population has an equal chance of being selected.

response rate: The percentage of the intended sample for which data are actually collected.

sample design: The method of sample selection.

sample: A subset of units selected from a larger set of the same units.

sampling frame: The list from which a sample is selected.

simple random sample: A sample in which each unit of analysis has (or is assigned) a unique identification number, and a random number table or generator indicates the identification numbers of the units to be selected.

snowball sample: A sampling strategy, typically used in interviews, in which interviewees are asked whom else the evaluators should talk to.

statistic: A characteristic of a sample.

typical cases (median) sample: A purposeful sample in which units are deliberately drawn from those with the typical or most common characteristics of a population; on a bell curve, they come from the middle range of the curve. The purpose of the study is to look at the typical items, not those that are atypical.

stratified random sample: A sample in which the sampling frame is divided into two or more strata (sub-populations) and a random sample is drawn from each stratum.

systematic sampling: A sample drawn from a list using a random start followed by a fixed sampling interval.

units of analysis: The type of entity on which data are sought and on which results will be reported (people, sites, etc.).

(Sources: Adapted from Easton & McColl, 2007, and Jackson, 2008)


Summary
Often it is not possible or practical to collect data from every data source, so evaluators collect data from a sample of sources. There are two forms of sampling: random sampling and non-random sampling.

Random samples are samples in which each unit in the population has an equal chance of being selected. Random samples can be drawn from any group, such as files, roads, farms, or people. Random sampling approaches use a group smaller than the total population and then generalize to the larger population. There are six types of random samples:
• simple random samples
• random interval samples
• stratified random samples
• random cluster samples
• multi-stage random samples
• combination random samples.

Non-random samples are commonly classified as convenience, snowball, or purposeful (judgment) samples, the latter including forms such as quota samples.

Evaluations do not need to use only one kind of sampling; they can use combinations of sampling techniques. Sample size is a function of the size of the population of interest, the desired confidence level, and the desired level of precision. Even when using a random sample, there is some possibility of error: the sample is likely to be somewhat different from the population, so we need to understand and test for that variation with statistical approaches that determine our confidence levels and confidence intervals.


Chapter 9 Activities
Application Exercise 9-1: Using a Random Number Table
Instructions: You have been asked to select 20 case files from a group of 300 files on urban and rural men and women. All the files are numbered from 1 to 300. List three different ways you could use the random number table that follows to select 20 case files for review. Under what conditions would you use each one?
Strategy 1: Use if:

Strategy 2: Use if:

Strategy 3: Use if:

Finally, list the cases you would select using the simplest random sampling strategy: ____

____

____

____

____

____

____

____

____

___

____

____

____

____

____

____

____

____

____

___


Table of Random Numbers for Exercise 9-1 (generated using Microsoft Excel)

44 14 12 12 03

12 73 72 62 33

35 62 80 34 77

69 59 54 90 01

50 04 93 76 69

43 95 47 60 80

23 95 24 95 24

55 69 89 41 18

12 94 43 21 43

40 76 50 38 18

05 44 23 72 61

58 67 99 05 75

54 05 51 52 04

34 25 64 90 95

02 86 51 14 37

36 82 03 65 38

93 49 64 06 93

01 30 62 05 68

96 19 97 24 16

26 94 14 17 45

22 51 09 92 16

75 85 18 50 50

60 80 52 42 11

05 70 89 53 38

57 78 12 98 55

51 48 77 54 07

66 15 33 44 64

58 20 10 51 62

06 25 56 63 67

73 73 79 05 65

55 84 17 67 52

38 16 29 05 24

12 05 35 87 31

92 44 84 04 17

47 18 78 54 40

02 59 74 06 73

86 96 79 86 75

67 31 41 40 20

87 17 85 98 70

78 84 03 69 43

38 43 98 90 75

56 49 88 52 78

25 05 76 72 06

59 37 56 24 36

95 05 30 62 02

26 67 04 13 77

37 21 57 77 41

82 30 32 80 09

87 50 88 39 26

92 72 07 07 79

64 30 91 94 34

44 56 72 31 73

50 31 60 61 92

25 83 10 59 12

42 38 35 20 97

80 90 15 96 10

36 26 08 69 40

72 01 08 61 49

52 89 23 29 43

66 48 23 50 79

11 87 48 55 60

23 16 06 63 49

58 12 02 87 92

54 50 07 21 42

60 57 63 64 82

31 00 76 35 49

09 26 43 03 52

49 89 85 48 15

78 10 98 87 90

27 16 06 60 48

24 82 63 16 18

59 66 20 02 70

09 27 37 97 64

95 95 61 84 52

20 96 33 60 12

42 31 92 39 40

06 66 32 71 29

13 48 31 76 43

07 61 35 67 17

45 47 39 40 06

10 85 24 93 91

10 16 62 33 78

48 06 83 22 55

84 21 44 10 67

27 76 80 20 44

27 62 56 95 92

55 31 57 47 74

58 27 64 85 98

91 58 06 70 18

35 35 58 08 41


Application Exercise 9-2: Sampling Strategy
Instructions: Working in small groups if possible, identify an appropriate measure or statistic for each evaluation question below, decide what sampling strategy you would use for each situation, and explain why.
1. How would you determine the quality of the roads in villages in a particular region of Cambodia immediately after the rainy season?
Measure:

Sampling strategy (and reasoning):

2. How would you find out what proportion of children contract at least one case of malaria before the age of 10 in a specific region of India?
Measure:

Sampling strategy (and reasoning):


References and Further Reading

Easton, V. J., and J. H. McColl. Statistics Glossary: Sampling. Retrieved August 16, 2007, from http://www.cas.lancs.ac.uk/glossary_v1.1/samp.html
Guba, E., and Y. S. Lincoln (1989). Fourth Generation Evaluation. Beverly Hills, CA: Sage Publications.
Henry, G. T. (1990). Practical Sampling. Thousand Oaks, CA: Sage Publications.
Jackson, Gregg B. (2008). Sampling in Development Evaluations. Presentation at IPDET, June 30 and July 1, 2007.
Jackson, Gregg B. (2007). Sampling for IEG Managers. Presentation, December 18, 2007.
Kish, L. (1995). Survey Sampling. New York: John Wiley & Sons.
Krejcie, R. V., and D. W. Morgan (1970). "Determining Sample Size for Research Activities." Educational and Psychological Measurement 30: 607-610.
Kumar, R. (1999). Research Methodology: A Step-by-Step Guide for Beginners. London: Sage Publications.
Laws, S., with C. Harper and R. Marcus (2003). Research for Development: A Practical Guide. London: Sage Publications.
Levy, P., and S. Lemeshow (1999). Sampling of Populations, 3rd edition. New York: John Wiley & Sons.
Lipsey, M. W. (1990). Design Sensitivity: Statistical Power for Experimental Research. Newbury Park, CA: Sage Publications.
Lohr, S. (1998). Sampling: Design and Analysis. Pacific Grove, CA: Duxbury Press.
Merriam-Webster (2008). Merriam-Webster Online. Retrieved January 17, 2008, from http://www.merriamwebster.com/dictionary
Neuman, W. Lawrence (2006). Social Research Methods: Qualitative and Quantitative Approaches, 6th edition. Boston: Allyn and Bacon.
Patton, M. Q. (2002). Qualitative Research and Evaluation Methods. Thousand Oaks, CA: Sage Publications.
Scheyvens, R., and D. Storey, editors (2003). Development Fieldwork: A Practical Guide. London: Sage Publications.
Tryfos, P. (1996). Sampling Methods for Applied Research. New York: John Wiley & Sons.


Web Sites
Alliance of Statistics Consultants: http://www.statisticstutors.com/#statistical-analysis
American Statistical Association Directory: http://www.amstat.org/consultantdirectory/index.cfm
Dr. Drott's Random Sampler: http://drott.cis.drexel.edu/sample/content.html
HyperStat Online, Chapter 11: Power: http://davidmlane.com/hyperstat/ch11_contents.html
HyperStat Online, Help with Statistics: Statistical Consultants and Tutors: http://davidmlane.com/hyperstat/consultants.html
Probability Sampling: http://www.socialresearchmethods.net/kb/sampprob.htm
Power Analysis: www.statsoft.com/textbook/stpowan.html
Research Randomizer: www.randomizer.org
Survey Research Methods Section: www.fas.harvard.edu/~stats/survey-soft/surveysoft.html
The Survey System, Sample Size Calculator: www.surveysystem.com/sscalc.htm
UCLA Statistics Calculator: http://calculators.stat.ucla.edu
Web Pages that Perform Statistical Calculations: www.StatPages.net


Chapter 10
Planning Data Analysis and Completing the Design Matrix

Introduction
Once the data are collected, evaluators need to go through them and find meaning in the words and numbers. Techniques are available to help with this task. Analysis begins with a data analysis strategy; qualitative and quantitative data demand different strategies and techniques. This chapter has four parts:

• Data Analysis Strategy
• Analyzing Qualitative Data
• Analyzing Quantitative Data
• Linking Quantitative Data and Qualitative Data.


Part I: Data Analysis Strategy
Developing the data analysis strategy is an important part of the planning process. It helps to know the options for data analysis, with their respective strengths and weaknesses, while the evaluation is being planned. In the design matrix, the objective is to be specific: indicate the analysis, and the graphic that will result, for the information that will be collected to answer each question or sub-question. A common mistake is collecting vast amounts of data that are never used. Whether the evaluation design emphasizes mostly qualitative or mostly quantitative data, data collection and data analysis will overlap. Figure 10.1 plots time spent on data collection and data analysis over the course of an evaluation. At the start of data collection, only a small amount of time is spent on data analysis, especially if a pilot test is being done first. As the evaluation continues, more time is spent on data analysis and less on data collection.

Fig. 10.1: Data Collection vs. Data Analysis over Time (hours spent on each activity, plotted over the course of the evaluation, with a pilot phase at the start; data collection declines as data analysis increases).

There are two key types of data analysis: qualitative and quantitative. As might be expected, qualitative analysis is used with qualitative data, and quantitative analysis is most frequently used with more structured, quantitative data collection strategies. Qualitative analysis is best used in situations where, for example, a semi-structured interview guide is used to gain a fairly in-depth understanding of the intervention, including cases where something relatively new is being evaluated. It would be used to analyze responses to questions such as:

• What are some of the difficulties faced by staff?
• Why do participants say they dropped out early?
• What is the experience like for participants?


Quantitative analysis would be used to answer questions for which structured data collection instruments, such as a survey, were used. It might be used to analyze data collected to answer questions such as:

• What are the mean scores for the different groups of participants?
• How do participants rate the relevance of the intervention on a scale of one to five?
• How much variability is there in the responses to the item?
• Are the differences between the two groups statistically significant?

Part II: Analyzing Qualitative Data
Patton (2002, p. 436) discusses the break between data collection and data analysis, and how it differs depending on the qualitative or quantitative nature of the data. He states:

For data collection based on surveys, standardized tests, and experimental designs, the lines between data collection and analysis are clear. But the fluid and emergent nature of naturalistic inquiry makes the distinction between data gathering and analysis far less absolute. In the course of fieldwork, ideas about directions for analysis will occur. Patterns take shape. Possible themes spring to mind. Hypotheses emerge that inform subsequent fieldwork. While earlier stages of fieldwork tend to be generative and emergent, following wherever the data lead, later stages bring closure by moving toward confirmatory data collection — deepening insights into and confirming (or disconfirming) patterns that seem to have appeared.

Qualitative data analysis is used for any non-numerical data collected as part of the evaluation. Unstructured observations, open-ended interviews, analyses of written documents, and focus group transcripts all require qualitative techniques. Analyzing qualitative data is challenging, although many people find it interesting. Great care has to be taken in accurately capturing and interpreting qualitative data.


According to Patton (2002, pp. 436-437), qualitative data analysis begins while still in the field. Insights may emerge while data are being collected, and recording and tracking these analytical insights is both part of fieldwork and the beginning of qualitative data analysis. At the same time, rushing to premature conclusions should be avoided. Data collection and analysis should overlap, so long as the evaluator takes care not to let initial interpretations overly confine the analytical possibilities. Patton also notes that fieldwork may not be over once data collection and writing are under way: sometimes gaps or ambiguities are found during analysis, and if the schedule, budget, and other resources allow, an evaluator may return to the field to collect more data to clarify or deepen responses or to make new observations. Often, while collecting data, members of a team confer daily or weekly to discuss emerging themes and adapt protocols if indicated.

Making Good Notes
When collecting qualitative data, it is important to accurately capture all observations; good notes are essential. This means paying close attention to language: what people say and how they say it. Try not to interpret what people say when writing notes. Write down anything observed, including body language and anything that happened while collecting data (for example, any interruptions during the interview). Capture immediate thoughts, reactions, and interpretations, but keep them in a separate section of the notes.
As mentioned in an earlier chapter, it is extremely important to set aside time immediately after an interview, observation, mapping exercise, or focus group to review the preliminary notes, make additions, and write up the notes so they will make sense when read later on. It is surprising how difficult it is to make sense of notes taken in an interview, focus group, or observation session even a day later. Even if the session is tape-recorded, invest at least a small amount of time in a preliminary write-up while the session is still fresh. Doing so can save hours of listening to or watching tapes or poring over transcripts.


Triangulation is the use of three or more theories, sources, or types of information or analysis to verify and substantiate an assessment by cross-checking results. It is useful in qualitative data analysis. Consider, for example, the following mixes of data sources:

• interviews, focus groups, and questionnaires
• questionnaires, available data, and expert panels
• observations, program records, and mapping
• interviews, diaries, and available data.

Table 10.1 is a summary of the key considerations in the early phase of qualitative data analysis.

Table 10.1: Key Considerations in the Early Phase of Qualitative Data Analysis

While collecting data:
• Keep good records.
• Write up interviews, impressions, and notes from focus groups.
• Make constant comparisons as you progress.
• Meet with the team regularly to compare notes, identify themes, and make adjustments.

Write a contact summary report:
• Write a one-page summary immediately after each major interview or focus group.
• Include all the main issues.
• Include any major information obtained.
• What was the most interesting, illuminating, or important issue discussed or information obtained?
• What new questions need to be explored?

Use tools to help you:
• Create a separate file for your own reactions during the study, including your feelings, hunches, and reactions.
• File your ideas as they emerge.
• Keep a file of quotations from the collection process for use in bringing your narrative to life when you write your report.

Begin the data analysis:
• Make sure all of the information is in one place.
• Make copies and place originals in a central file.
• Use copies to write on, cut, and paste as needed.


Organizing Qualitative Data for Analysis
After collecting qualitative data, the evaluator will have many pages of notes and/or transcriptions from observations, interviews, and other data sources. It is often a challenge to organize this information and make sense of it. There are no firm rules for how the data should be analyzed, but there are guidelines for organization. One is that evaluators should report their analytical procedures in detail; documenting this process is important to demonstrate the validity of the findings to the audience (IDRC, 2008, Analysis of Data). To begin organizing, evaluators should:

• check to make sure all of the data are complete
• make several copies of all data
• organize the data into different files (IDRC, 2008, Step 1 – Getting Started).

With several copies of the data, the evaluator can keep files organized in different ways. For example, some evaluators recommend keeping one set of data in a file in chronological order, another as an analytical file or "journal notes," a third file with relevant notes about research methodology, and a copy of the notes in a fourth, separate file (IDRC, 2008, Organizing files).
Patton (2002, pp. 438-440) presents another option for organizing and reporting descriptive findings. He classifies the presentation of data into different motifs, beginning with three broad categories, each with further subdivisions. The following is a summary of the options for organizing and reporting qualitative data:

• storytelling approaches – presenting the data by telling a story, either:
  − chronologically (telling the story from start to finish), or
  − as a flashback (starting at the end, then working backward to describe how the ending emerged)
• case study approaches – presenting the data as a case study organized by:
  − people (people or groups are the primary unit of analysis)
  − critical incidents (major events or critical incidents, usually presented in order of importance rather than in sequence of occurrence)
  − various settings (describing various places, sites, settings, or locations before doing cross-setting pattern analysis)
• analytical framework approaches – illuminating an analytical framework through:
  − processes (describing important processes)
  − issues (illuminating key issues, often the equivalent of the primary evaluation questions)
  − questions (organizing by questions, especially where standardized questions were used)
  − sensitizing concepts (organizing by sensitizing concepts, such as "leadership" versus "followership").

Patton reminds evaluators that analysis of the data should stem from the evaluation questions; the way the data are organized should be chosen to help answer those questions.

Reading and Coding Data
Using these categories to organize the information is a beginning step. The next step is to read through all of the data carefully. After several readings, the evaluator should be able to make some sense of the data. Patton (2002) says that coming up with topics is like constructing an index for a book or labels for a filing system: look at what is there and give it a name or a label (IDRC, 2008, Reading and Coding). Once the data have been organized using the system developed by the evaluator, the data need to be described and coded. This can be done manually, using colored pens, pencils, or papers to identify classifications for all of the data, and then using scissors to cut the data apart and sort it into different piles.


The data can also be organized using a computer program to help identify and sort information. Different software programs can help with different stages of the evaluation process. For example:

• word processors:
  − assist with writing up and editing field notes
  − graphic mapping: creating diagrams
  − search and retrieval: locating text segments when required
  − writing reflective comments on various segments of the data
• presentation software (e.g., PowerPoint):
  − displaying data: condensing and organizing data for display
  − graphic mapping: creating diagrams
• databases, spreadsheets, and/or qualitative data analysis software:
  − storing: keeping text in an organized database
  − linking data: assembling relevant data into clusters, categories, and networks
  − analyzing content: counting frequencies, sequences, or locations of words or phrases
  − coding: attaching codes to segments of the text for later retrieval
  − theory building: developing coherent explanations of findings (IDRC, 2008, Using Computers for Data Management and Analysis).

Evaluators often enter the data into a data file on a computer to help organize them. Coding the data into numeric responses allows them to be processed in a meaningful way. If an evaluator is gathering data on height, weight, age, number of days absent from school, and the like, these data do not need coding; the values themselves can be entered. Other data, however, will need a numeric code to allow for analysis. For example, a questionnaire may ask participants whether they have a bank account. A computer cannot easily analyze answers of "yes" or "no," so the evaluator assigns a numeric code: yes = 1 and no = 2. The analysis can then count the number of 1's and 2's (O'Rourke, 2000b, ¶ 2-6).


Other data may be collected in ranges or as opinions. For example, the question might be "To which age group do you belong?" The responses might be:

• less than 18
• 18-25
• 26-35
• 36-50
• 51-65
• over 65.

Each of the age groups can be given a code, as shown below:

• less than 18 = 1
• 18-25 = 2
• 26-35 = 3
• 36-50 = 4
• 51-65 = 5
• over 65 = 6.

Coding can also be used for other data, such as country of origin: each country can be given a numerical code, and that code recorded. Codes can be established for many different kinds of data; they need to be established and entered consistently and accurately. In some cases, respondents do not know the answer to a question or refuse to answer; in other cases, a respondent may inadvertently skip a question. One convention many evaluators use for these responses is as follows:

• do not know = 8
• refused or missing data = 9.

If the response to a question might have more than one digit, the code can have multiple digits. For age, for example, a person can have a two-digit age, such as 32, so the missing-data code might be 99. The same logic applies to three, four, or more digits: the code might be 999 for responses with up to three digits, 9999 for those with up to four digits, and so on. Each person or record is called a "case." Data files are made up of variables and values for each variable; variables are the items to be analyzed (O'Rourke, 2000b, ¶ 3-4).
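The sketch below shows one way such a coding scheme might be applied in software. The variable names and the small codebook are hypothetical, but the numeric conventions (yes = 1, no = 2, don't know = 8, refused or missing = 9) follow the text above.

```python
# Hypothetical codebook: maps raw survey answers to numeric codes.
CODEBOOK = {
    "has_bank_account": {"yes": 1, "no": 2, "don't know": 8, "refused": 9},
    "age_group": {"less than 18": 1, "18-25": 2, "26-35": 3,
                  "36-50": 4, "51-65": 5, "over 65": 6,
                  "don't know": 8, "refused": 9},
}

def code_response(variable, answer):
    """Return the numeric code for a raw answer; skipped answers get the missing-data code."""
    codes = CODEBOOK[variable]
    if answer is None:
        return codes["refused"]                       # respondent skipped the question
    return codes.get(answer.strip().lower(), codes["refused"])

# One respondent ("case") coded for two variables
case = {"has_bank_account": "Yes", "age_group": "26-35"}
coded = {var: code_response(var, ans) for var, ans in case.items()}
print(coded)   # {'has_bank_account': 1, 'age_group': 3}
```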


It is extremely important for evaluators to indicate how each variable is coded and in what field it appears. The codes should be recorded in a codebook (sometimes called a data dictionary). Those doing the coding must have access to the codebook and be trained to use it.

Quick Data Quality Tips:
• Always be sure data variables are labeled.
• Make sure the data dictionary is updated for any changes made to data labels or response codes. Good documentation is essential!
• Create a backup of the data set in case of emergency. Also, create temporary and permanent data sets wisely. Think about what needs to be done if the data are lost.
• Always keep a copy of the original data set.
Source: The Child and Adolescent Health Measurement Initiative (CAHMI), p. 64.

Example of Coding
The following are first-cut coding examples, taken from the field note margins of an evaluation of an education program (adapted from Patton, 2002). (P stands for participants; S stands for staff.)
Code: Ps Re Prog (participants' reactions to the program)
Code: Ps Re Ps (participants' reactions to other participants)
Code: Ob PP (observations of participants' interactions)
Code: Ob SS (observations of staff interactions)
Code: Ob SP (observations of staff/participant interactions)
Code: Phil (statements about program philosophy)
Code: Prc (examples of program processes)
Code: P/outs (effects of program on participants/outcomes)
Code: S-G (subgroup formations)
Code: GPrc (group process)
Code: C! (conflicts)
Code: C-PP (conflicts among participants)
Code: C-SP (conflicts between staff and participants)
Code: C-SS (conflicts among staff)

The abbreviations are written in the margins directly on the relevant data passages or quotations. The full labels in parentheses are the designations for separate files that contain all similarly coded passages.
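The same code-and-retrieve logic can be mirrored electronically. The sketch below attaches some of the margin codes listed above to a few invented passages and then pulls together everything tagged with a given code, which is what the separate files accomplish on paper.

```python
from collections import defaultdict

# Each entry pairs a passage of field notes with the margin codes applied to it.
# The codes follow the first-cut scheme above; the passages are invented placeholders.
coded_passages = [
    ("Participants said the schedule was hard to follow.", ["Ps Re Prog"]),
    ("Two staff members disagreed about the session plan.", ["Ob SS", "C-SS"]),
    ("A participant helped another complete the exercise.", ["Ob PP"]),
]

# "Retrieval": gather all passages that share a code, the electronic
# equivalent of a separate file of similarly coded material.
by_code = defaultdict(list)
for passage, codes in coded_passages:
    for code in codes:
        by_code[code].append(passage)

print(by_code["Ob SS"])   # all observations of staff interactions
```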


Interpreting Data
Before qualitative data are analyzed, they need to be presented clearly and descriptively, with reference to field notes and other data sources. Interpreting data means finding causal links, making inferences, attaching meanings, and dealing with cases that contradict the analysis. The evaluator looks at both description (what the data say) and interpretation (what the data mean) (IDRC, 2008, Data Presentation and Interpretation).
Patton (2002, pp. 453-455) describes two kinds of qualitative analysis: inductive and deductive. Inductive analysis involves discovering patterns, themes, and categories in the data. Deductive analysis involves analyzing data according to an existing framework. Typically, qualitative analysis is inductive in the early stages (figuring out categories, patterns, and themes). Once these categories, patterns, and themes are established, the next stage may be deductive: testing and affirming the authenticity and appropriateness of the inductive analysis.

Content Analysis
Content analysis is used to identify the presence of certain words or concepts within text or speech. It is a systematic approach to qualitative data analysis that identifies and summarizes the messages the data are sending. The term content analysis usually refers to the analysis of books, brochures, written documents, speeches, transcripts, news reports, and visual media. Sometimes content analysis is used with narratives such as diaries or journals, or to analyze qualitative responses to open-ended questions on surveys, interviews, or focus groups. Content analysis can start with coding the data.
An example of content analysis might be examining children's textbooks to see whether they cover the necessary material for learning a particular subject, presented in a way that is appropriate for the reading level and fits the context in which the children live and study. A deeper analysis might examine whether the textbooks convey a specific political agenda or a biased interpretation of history.
Content analysis assumes that the words and phrases mentioned most often are those reflecting important concerns in the communication. Therefore, content analysis starts with word frequencies, space measurements (column centimeters or inches in the case of newspapers), time counts (for radio and television), and keyword frequencies. However, content analysis extends far beyond plain word counts.
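As a small illustration of the word-frequency starting point described above, the sketch below counts non-trivial words in a snippet of text. The snippet and the stop-word list are invented for the example and would need to be adapted to real material.

```python
import re
from collections import Counter

text = """The teacher said the textbook was clear, but several parents
said the textbook did not fit the local context."""     # illustrative snippet only

STOP_WORDS = {"the", "was", "but", "did", "not", "a", "of", "to"}  # illustrative list

words = re.findall(r"[a-z']+", text.lower())
frequencies = Counter(w for w in words if w not in STOP_WORDS)

print(frequencies.most_common(5))
# e.g. [('said', 2), ('textbook', 2), ('teacher', 1), ...]
```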


To analyze the content, the words are coded and organized into manageable categories. These coded categories are then examined for frequency of response and relationships. Content analysis can be classified into two types: conceptual analysis and relational analysis. A conceptual content analysis looks at the frequency of occurrence of selected terms within a text or texts. A relational analysis goes beyond frequency and explores relationships among the concepts identified (Busch, De Maret, Flynn, Kellum, Le, Meyers, Saunders, White, & Palmquist, 2004, Writing Guides, Content Analysis, Types of Content Analysis).
Stemler (2001) comments that concept analysis goes far beyond simple word frequency counts; it relies on coding and categorizing the data. Weber (1990) states that "A category is a group of words with similar meaning or connotations" (p. 37). For example, a concept analysis might identify categories covering:

• shared language on the topic – what was taken for granted and what other participants asked to have clarified
• beliefs and myths about the topic – which are shared and taken for granted, and which are challenged
• arguments participants call upon when their views are challenged
• sources of information people call upon to justify their views and experiences, and how others respond to these
• arguments, sources, and types of information that stimulate changes of opinion or reinterpretation of experience
• tone of voice, body language, and the degree of emotional engagement involved when participants talk to each other about the topic (Catterall & Maclaran, 1997, 4.6).


The Process of Content Analysis
According to Krippendorff (1980 and 2004), six questions must be addressed in every content analysis:

• Which data are analyzed?
• How are they defined?
• What is the population from which they are drawn?
• What is the context relative to which the data are analyzed?
• What are the boundaries of the analysis?
• What is the target of the inferences?

Once these questions are addressed, choices can be made about relevant and irrelevant data. The next step is to code the transcripts. One way is to do it manually, reading through the available transcripts and tallying occurrences of variables, concepts, words, or units of analysis.

Computer Help for Content Analysis
It can be difficult to obtain reliable and valid results from content analysis. For this reason, computer programs have been developed to assist with developing category systems and coding large amounts of data. To use them, all information must be in files the program can read (for example, MS Word documents); evaluators may need to type up results, scan documents, or recreate data files.
There are software packages to help organize data derived from observations, individual interviews, and focus group interviews. These include text-oriented database managers, word processors, and automatic-indexing software developed specifically for working with text. Transcripts entered into a word processor can be organized, indexed, and coded. Qualitative analysis software can be a powerful tool for organizing the vast amounts of data produced through focus groups or individual interviews. These programs are called Computer-Assisted Qualitative Data Analysis Software (CAQDAS), or sometimes simply Qualitative Data Analysis Software (QDAS or QDA software).


CAQDAS programs search, organize, categorize, and annotate textual and visual data. Programs of this type allow evaluators to visualize relationships among data and/or theoretical constructs and help them build theories. Packages on the market include Ethnograph, Qualpro, Hyperqual, Atlas-ti, QSR's N6 (formerly NUD*IST), AnSWR, HyperRESEARCH, Qualrus, and others. CAQDAS can help code data in several ways (Loughborough University Department of Social Sciences, 2007, CAQDAS):

• Memos are the most basic way to annotate data. Like small electronic stick-up notes, memos can be attached to all sorts of data bits.
• Free coding allows evaluators to mark sections of data and attach a code to these sections.
• Automatic coding procedures work in various ways. The most common is automatic coding of search results; other procedures also exist, such as automatic recoding of data according to previously specified queries (so-called "supercodes" in N6).
• Software-generated coding suggestions are a novel feature of Qualrus, in which an algorithm suggests codes on the basis of previously occurring codes.
• Multimedia coding is offered by N6, HyperRESEARCH, and Qualrus. These programs allow coding of sequences of audio or video files and parts of pictures. Some other CAQDAS packages allow linking to external multimedia files.

As with the search functions, the range of coding functions available varies with the specific CAQDAS package. For further information and Web sites, see the citations at the end of this chapter. The American Evaluation Association's Web site contains a page devoted to computer software for analyzing qualitative data of all kinds, including text, audio, and video. The site links to many developer sites that describe each software package; in many cases it also links to free downloads or free trial downloads. The American Evaluation Association's qualitative software page is available at: http://www.eval.org/Resources/QDA.htm


Kimberly A. Neuendorf (2006, A Flowchart for the Typical Process of Content Analysis Research) presents a flowchart showing the process of performing a content analysis; Figure 10.2 is adapted from it. Neuendorf's flowchart begins with considering the theory and rationale for the analysis. When doing a concept analysis for an evaluation, go back to the theory of change and other front-end analysis, look at the evaluation questions, and determine what you want to learn. Library work may be helpful here: look for relationships between the items in your questions and what the research literature shows. The second step is to make conceptualization decisions, identifying the variables to be used in the evaluation and how to define them. The third step is to operationalize the measures.


1) Theory and rationale: What content will be examined, and why? Do you have research questions? Hypotheses?

2) Conceptualization decisions: What variables will be used in the study, and how do you define them conceptually?

3) Operational measures: Your measures should match your conceptualizations (internal validity). What unit of data collection will you use? You may have more than one unit. Are the variables measured with a high level of measurement, with categories that are exhaustive and mutually exclusive? An a priori coding scheme describing all measures must be created. Both face validity and content validity may be assessed at this point.

4a) Coding schemes (human coding): Create a codebook (with all variable measures fully explained) and a coding form.

4b) Coding schemes (computer coding): With computer text content analysis, you still need a codebook of sorts – a full explanation of your dictionaries and the method of applying them. You may use standard dictionaries (e.g., those in Hart's program Diction) or originally created dictionaries. When creating original dictionaries, be sure to first generate a frequencies list from your text sample and examine it for key words and phrases.

5) Sampling: Is a census of the content possible? (If yes, go to step 6.) How will you randomly sample a subset of the content? This could be by time period, by issue, by page, by channel, and so on.

6) Training and initial reliability: During a training session in which coders work together, find out whether they can agree on the coding of variables. Then, in an independent coding test, note the reliability on each variable. At each stage, revise the codebook or coding form as needed.

7a) Coding (human): Use at least two coders, in order to establish inter-coder reliability. Coding should be done independently, with at least 10% overlap for the reliability test.

7b) Coding (computer): Apply the dictionaries to the sample text to generate per-unit frequencies for each dictionary. Do some spot checking for validation.

8) Final reliability: Calculate a reliability figure for each variable.

9) Tabulation and reporting: Can be done in many different ways. May report figures and statistics one at a time or cross-tabulated. Over-time trends may also be reported.

Fig. 10.2: Flowchart for the Typical Process of Content Analysis Research (Source: Neuendorf, 2006)


Advantages and Disadvantages of Content Analysis
There are both advantages and disadvantages to using content analysis.

Advantages of Content Analysis
Content analysis:
• looks directly at communication using texts or transcripts, and hence gets at a central aspect of social interaction
• can allow for both quantitative and qualitative operations
• can provide valuable historical and cultural insights over time through the analysis of texts
• allows a closeness to the text that can alternate between specific categories and relationships, while also statistically analyzing the coded form of the text
• can be used to interpret texts for purposes such as the development of expert systems (since knowledge and rules can both be coded in terms of explicit statements about the relationships among concepts)
• is an unobtrusive means of analyzing interactions
• provides insight into complex models of human thought and language use.
(Source: Busch et al., 2004, Advantages of Content Analysis)

On the other hand, content analysis also has several disadvantages, both theoretical and procedural.


Disadvantages of Content Analysis
Content analysis:
• can be extremely time consuming
• is subject to increased error, particularly when relational analysis is used to attain a higher level of interpretation
• is often devoid of a theoretical base, or attempts too liberally to draw meaningful inferences about the relationships and impacts implied in a study
• is inherently reductive, particularly when dealing with complex texts
• tends too often to consist simply of word counts
• often disregards the context that produced the text, as well as the state of things after the text is produced
• can be difficult to automate or computerize.
(Source: Busch et al., 2004, Disadvantages of Content Analysis)

Robert Philip Weber (1990) describes validity for content analysis: "To make valid inferences from the text, it is important that the classification procedure be reliable in the sense of being consistent: Different people should code the same text in the same way" (Weber, 1990, p. 12).

Suggestions for Analysis by Hand
Begin setting up for analysis before starting data collection. Go back to the theory of change, look at the evaluation questions, and do some library work. Use the information from these and other sources to begin identifying the words, phrases, and impressions that may assist with coding. Once data collection starts, Nancy Porteous et al. (1997, pp. 66-68) suggest the following process for analyzing qualitative data by hand (that is, without computer assistance). Have the following materials available:

• several highlighters (a different colour for each evaluation question)
• a worksheet for each evaluation question
• the data, including notes, transcripts, and/or tapes from interviews or focus groups
• collection tools for self-completed questionnaires, registration forms, observations, or chart reviews.


Figure 10.3a shows an example of a blank qualitative data analysis worksheet. The worksheet has spaces at the top for the evaluation question and for a color, code, or symbol, and columns for topics, quotes, and findings.

Fig. 10.3a: Qualitative Data Analysis Worksheet (example, blank) (Source: Porteous et al., 1997)

Use at least one worksheet for each evaluation question, then follow this process:

• Write each evaluation question in the space provided at the top of each worksheet.
• For each question, choose a code to identify the data related to that evaluation question. It might be the color of a pen, pencil, or highlighter, or it might be a symbol. Record the color or symbol in the second space at the top of each worksheet.

Once the worksheets are identified and ready, begin going through the notes and materials collected thus far and code the information on the worksheets.


Follow this procedure to record information:

• Read all completed tools or notes and transcripts in one sitting.
• Use the highlighters to mark the parts that deal with each evaluation question.
• Go back and carefully read all of the data that pertain to the first evaluation question.
• In the "Topics" column of the worksheet, write down each opinion, idea, or feeling that pertains to the expectations for that evaluation question.
• Leave a space between each topic, giving room to keep track of how frequently each point is raised.
• Keep a tally of the number of times an opinion, idea, or feeling is mentioned.

Figure 10.3b shows the worksheet after this step is completed.

Evaluation Question: Were participants satisfied with the content of the training workshops?
Color, code, or symbol: Yellow

Topics (with tallies; the Quotes and Findings columns are still blank at this stage):
• Parents decide on topics: //// //// //// //// //// //// //// ////
• Cover a couple of topics per session: //// //// //// ///
• Not enough time spent on each topic: //// //// //// ///

Fig. 10.3b: Qualitative Data Analysis Worksheet (example showing topics recorded) (Source: Porteous et al., 1997)


While working through the "Topics" column, address the rest of the worksheet in the following way:

• From the notes, extract and insert quotes that best represent each topic.
• Make preliminary conclusions about specific points and write them in the "Findings" column.
• Organize the findings by type or category.
• Use numbers of responses to give precision and a sense of magnitude.

Occasionally, the minority view is important and needs to be reported. Use your judgment, but always make it clear that only one or a few respondents expressed that opinion. Figure 10.3c shows the completed worksheet.

Evaluation Question: Were participants satisfied with the training workshops?
Color, code, or symbol: Yellow

• Topic: Parents decide on topics (//// //// //// //// //// //// //// ////)
  Quote: "I think the process of deciding would be valuable."
  Finding: There was a strong feeling that parents should be more involved in the choice of topics.
• Topic: Cover a couple of topics per session (//// //// //// ///)
• Topic: Not enough time spent on each topic (//// //// //// ///)
  Quote: "Sometimes we just got into a topic and then it was time to leave or move to something else. We need more time to discuss."
  Finding: Many participants (38 of 52 interviewed) thought there should be more time for discussion.

Fig. 10.3c: Qualitative Data Analysis Worksheet (example showing quotes and findings) (Source: Porteous et al., 1997)

Remember: whether using a software package or note cards to analyze qualitative data, the goal is to summarize what has been seen or heard in terms of common words, phrases, themes, or patterns. New themes may appear later, and earlier material may need to be re-read to check whether a theme was actually there from the beginning but was missed because its significance was not yet clear.


When identifying the words, issues, themes, or patterns, note where they are located so they can be found again if needed, to verify exact quotes or context. This may be tedious the first time it is done; with experience, it becomes easier to locate potentially important information quickly. The University of the West of England Web site on data analysis discusses the term "sock bag" in the following way:

Life is rarely neatly packaged up into tidy bundles. There are always cul-de-sacs, themes which peter out or are inconsistent with one another. The temptation in qualitative research is to ignore the odd categories that do not fit neatly into the emerging theory. These oddments are like the solo socks you find in your drawers, hence the sock bag phenomenon. All qualitative research projects will have oddments that defy characterisation; rather than air brush them from the picture, they need to be acknowledged as part of the whole (2007, Data Analysis, Analysis of Textual Data, Sock bag).

Techniques for Analyzing Data from Focus Groups
When analyzing data from focus groups, the following sources of data may be available:

• transcripts (verbatim) or the notes taken by a second note taker
• moderator notes, including a separate list of impressions and reactions (care should be taken in recording agreement within a group: for example, never report that 30% of the group "said…", because not all participants in a group respond to every question)
• evaluator notes.


With a focus group, analysis is an ongoing process that begins as soon as the evaluator enters the field or begins the project and continues until the final report. It is very important to have this firmly in mind before beginning the evaluation: if the analysis is left to the very end, large gaps in the results can appear, and at that stage it is too late to correct any problems. Early and continuous analysis serves three main purposes:

• to enable the study to focus quickly on the main issues that are important to the participants, and then explore these issues more closely
• to check that the focus group discussions are being conducted in the best possible way (i.e., a naturally flowing discussion, participants not forced into answering in a particular way, and so on)
• to examine the results of the discussions early enough to check that the information required to meet the project objective is actually being collected (Dawson & Manderson, 1993, Part I, Section 7).

Dawson and Manderson (1993, Part II, Section 7) suggest a four-phase process for analyzing focus group data.

Orientation
At the beginning of the project, spend time talking to people about the topic in casual conversation. Although this information should always be written down, it can simply be jotted in note form and may not need to be expanded upon too much. It helps to build up a picture of the topic under study. Evaluators can discuss what they are hearing in the meetings held during the planning phase to design the project; make special time for this activity in those meetings.

Debrief
Time must be set aside at the end of every focus group to examine the focus group activities and results. This quickly feeds back information that can be acted on immediately and builds on the developing picture.


Analysis of Transcripts
The analysis of the transcripts or session notes should be carried out as the transcripts become available, not after all the focus groups have been conducted. Read the transcripts in several different ways. First, read them as a whole to get general impressions, keeping the objectives in mind, and look for the major opinions and attitudes expressed by the groups. Second, read each transcript looking for very specific things, keeping the objectives of the focus group and the design matrix in mind; note information in the transcripts that matches the objectives, as well as other information that arises that was not predicted. Third, read each transcript and remove any responses that were forced from participants by poor moderating skills. During the group meeting, there may also be some statements that seem to be made simply because others have made them; the recorder should identify these statements, and during the analysis they can be marked as having less importance than other responses. Also remove any sections that were poorly transcribed and do not make much sense. Fourth, do a concept analysis and code the transcripts.

Analysis of All Focus Group Discussions
Once all the focus groups have been conducted, it is time to look at all the discussions together and begin to describe findings that apply to the study as a whole. If the evaluation also uses other methods (such as individual interviews and observation), it is useful to report the findings from the focus groups separately. At the end of this activity the evaluator should have produced a set of results with a detailed description of how he or she believes the results relate to the objectives.


Summarizing Qualitative Data
Generally, evaluators report qualitative data in terms of "common themes" or "most of the interviewed participants said…". Sometimes, however, there is an isolated idea or perspective that evaluators want to highlight, even if it is not a common theme. There is no single rule to apply; evaluators need to tailor their reporting to the specific context and data. Sometimes it is useful to count the incidence of specific themes to give a sense of how often a particular line of thinking comes up among respondents. For example, an evaluator might want to report that X% of the front-page news stories in the major national newspapers over a certain period had a liberal bias, compared with Y% that had a conservative bias.
Typically, evaluations use both qualitative and quantitative data. When using mixed-method (qualitative and quantitative) data collection approaches, evaluators will want to find themes and comments that help clarify and illuminate some of the quantitative data. For example, if 55% of respondents were dissatisfied with the accessibility of the intervention, it helps to have a representative mix of comments that illustrate the sources of the dissatisfaction. The evaluator will also want to capture the "quotable quotes": the actual statements of participants, chosen because they clearly present a theme or an important point to emphasize. There is power in these words, so select them carefully; many report recipients will remember a quote but not a page of description. Be careful NOT to introduce bias here: present several different quotes that show the range of issues and perspectives about the same theme. Table 10.2 summarizes suggestions for interpreting qualitative data.


Table 10.2: Summary of Suggestions for Interpreting Qualitative Data

Develop Categories:
• Use recurrent themes, ideas, words, and phrases.
• Use categories that are large enough to capture a range of views but not so large as to be meaningless.
• Make categories distinct from each other.

Code the Data:
• Develop a coding scheme.
• Develop decision rules for coding; they must be exhaustive and unambiguous.
• Train your coders to use the coding scheme.

Check for Reliability (if using more than one observer):
• Do a pre-test with a small sample of qualitative data.
• Check for inter-rater reliability – do people measuring the same thing, in the same way, get the same results?
• If problems exist, fix them, then pre-test again.

Analyze the Data:
• Bring order to the data.
• Consider placing the data on cards.
• Consider placing the data on a spreadsheet.
• Consider using a computer to assist with data analysis.
• Sort the data to reveal patterns and themes.

Interpret the Data:
• Teams of at least two people, when possible, should review and categorize the material. They should compare their findings and, if the findings differ, review and revise.
• Look for meaning and significance in the data.
• Link themes and categories to the processes of the program and/or to the outcomes. Are some themes more prevalent when discussing process issues? Are some themes more relevant when discussing outcome issues?
• Look for alternative explanations and other ways of understanding the data.

Share and Review:
• Share information early and often with key informants.
• Have others review early drafts with the intention of obtaining information, questions, other ways of interpreting the data, and other possible sources of data.

Write the Report:
• Describe major themes (thematic approach) or present material as it reflects what happened over time (natural history approach).
• Highlight interesting perspectives even if voiced by only one or two people.
• Stay focused; with so much data, it is easy to get lost.
• Include only important information. Ask yourself: does this information answer the evaluation questions? Is it useful to the stakeholders?


Controlling for Bias
There is some risk of bias in working with qualitative data, particularly if software is not used: we often see what we want to see and genuinely miss things that do not conform to our expectations. It helps (though it does not always completely remedy the situation) to have another person analyze the data as well; by comparing the two analyses, new themes or different ways of understanding the data may emerge. When reporting qualitative data, it is sometimes not possible or meaningful to present a count of how many or what percent said or did something: since not all participants were asked the same question, it is difficult to know how everyone felt about that question.
Another way of controlling for bias is to have two coders review the same documents and code them in terms of themes. Having two people read and code the same set of documents helps control for individual differences in perception. If the coders are well trained and the operational definitions and rating systems are clear and agreed upon in advance, the two should have a high rate of agreement in their ratings of the material. Analysts call this inter-rater reliability, and a high rate is an indicator of credibility. A low rate of agreement between raters indicates a need to revise the operational definitions and/or rating systems.
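A basic percent-agreement check between two coders can be computed as in the sketch below. The theme codes and ratings are hypothetical; for a chance-corrected measure, evaluators often report Cohen's kappa instead of raw agreement.

```python
# Theme codes assigned by two coders to the same ten passages (hypothetical data).
coder_a = ["access", "cost", "access", "quality", "cost",
           "access", "quality", "cost", "access", "quality"]
coder_b = ["access", "cost", "quality", "quality", "cost",
           "access", "quality", "access", "access", "quality"]

matches = sum(a == b for a, b in zip(coder_a, coder_b))
agreement = matches / len(coder_a)

print(f"Inter-rater agreement: {agreement:.0%}")  # 80% here; low values signal unclear definitions
```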

Affinity Diagram Process
When several people are working on the evaluation, it may be helpful to have each person identify what they believe are the common themes or interesting points to report. An affinity diagram is a good strategy to use here (see Table 10.3). Have people write their ideas on file cards or sticky notes (one idea per card or note, but as many cards per person as they think important), and then have everyone place their cards or notes on a wall. As a group, they then sort them into similar ideas and themes. This process ensures that everyone's ideas are considered and that there is less ownership of any single idea. It is also a very quick way to develop an organizing structure for the analysis and perhaps even the final report.


Table 10.3: Affinity Diagram Process

Step 1. In silence, all team members identify ideas or themes that they observed and consider interesting or important, and write them on cards or sticky notes – only one idea or theme per card, but as many ideas as a member considers important.
Step 2. In silence, the team members place their ideas on a wall.
Step 3. In silence, each person on the team begins to sort the cards so that similar ideas are placed together.
Step 4. Once it appears that the cards have been sorted into general themes, the team discusses the groupings.
Step 5. The initial groups are not fixed: new groupings can be made and ideas can be shifted.
Step 6. The team discusses names for each of the large themes.

Challenges to Qualitative Data Analysis
Although qualitative evaluation can produce comprehensive and rich data, there are several challenges to collecting and analyzing qualitative data. Qualitative data analysis can be time consuming, and it may be difficult to develop a coding scheme. Constable et al. (2005, Disadvantages of Qualitative Observational Research) identified several challenges for qualitative data analysis; the following are adapted from their list:

• The evaluators can bias the design of the study and contribute to bias in collecting the data.
• Evaluators must consider that sources of data or subjects may not be equally credible.
• The groups the evaluators are studying may not be representative of the larger population.
• Analysis of observations can be biased. If more than one person assigns codes, it is essential to train the coders to maintain reliability between and among them.
• Any group that is studied is affected to some degree by the very presence of the evaluator; therefore, any data collected are somewhat distorted.
• It takes time to build the trust with participants that facilitates full and honest self-representation. Short-term observational studies are at a particular disadvantage where trust building is concerned.


Concluding Thoughts on Qualitative Data Analysis
For a number of reasons, many people are afraid of using statistics, so there is a strong tendency to think that using qualitative methods is somehow the easier option. But as we have seen in this section, there is a lot more to good qualitative data analysis than meets the eye of the casual observer. Analyzing qualitative data is labor intensive and time consuming, but it can reveal highly valuable information. Be sure to plan enough time to do it well. As noted in an earlier chapter, qualitative methods can be powerful tools for looking at causality – whether observed changes are due to the intervention or to something else. An excellent resource with a step-by-step guide to systematic qualitative data analysis (descriptive, causal, and other) is Qualitative Data Analysis: An Expanded Sourcebook, 2nd edition (Miles & Huberman, 1994).

Part III: Analyzing Quantitative Data
Quantitative data are numerical and are analyzed using statistics. This section introduces some of the most important statistical concepts to know as a user and conductor of development evaluations. Statisticians divide statistics into two large categories:

• Descriptive statistics (in the narrowest definition) are typically used to analyze census or sample data by summarizing the data collected on a quantitative variable.
• Inferential statistics are typically used to analyze random sample data by predicting a range of population values for a quantitative or qualitative variable, based on the information for that variable in the random sample. Part of the prediction includes a reliability statement, which states the probability that the true population value lies within a specified range of values.

While some data analysis techniques are used only with inferential statistics, many can be used with both kinds of data. This overview starts with the most common data analysis techniques for descriptive data and then turns to techniques commonly used for data obtained from random samples.


Elements of Descriptive Statistics
Descriptive statistics summarize the distribution by frequency or proportion. They may be used with sample or census data. A given variable might be gender, marital status, or citizenship. Descriptive statistics describe how many and what percent, as in: 33% of the respondents are male and 67% are female (see Table 10.4).

Table 10.4: Distribution of Respondents by Gender

             Number    Percent
  Male         100       33%
  Female       200       67%
  Total        300      100%

Source: Fabricated Data, 2008

There are two kinds of measures used to summarize a distribution:

•  A measure of central tendency shows how similar the characteristics are.

•  A measure of dispersion shows how different the characteristics are.

Measures of Central Tendency, the 3 M's
The three measures of central tendency that are most often used are sometimes called the three M's:


Mode:    Most frequent response.

Median:  Mid-point or middle value in a distribution; half the values are larger, half are smaller. Note that in even-numbered data sets, there will be no identifiable single case that represents the midpoint. In such situations, the median is defined as the average of the two middle cases (the sum of the two middle cases divided by 2).

Mean:    Average – the sum of all collected values divided by the number of values collected (sample size), calculated in the following way: mean = Σ(Xi) ÷ n


The Mean and the Median
The two most commonly used statistics are the mean and the median. The examples below are drawn from Swimmer (2005, pp. 2-3). Table 10.5 shows the proportion of the population in each of six countries living in urban areas (cities and suburbs).

Table 10.5: Urban Percent Populations, Sample Data

  Country                      % Urban
  Bolivia                         65
  Algeria                         60
  Central African Republic        41
  Georgia                         61
  Panama                          58
  Turkey                          75

Source: Fabricated data, 2008 survey.

Suppose we wanted to summarize the information for the urban (quantitative) variable across the six countries. The mean would be (65+60+41+61+58+75) ÷ 6 = 60. The two middle cases are 60 and 61; therefore, the median would be (61 + 60) ÷ 2 = 60.5

Notice that the mean is greatly affected by extreme values in the sample, while the median is not. Suppose the urban percentage for Turkey had been 87 instead of 75. The mean would increase to 62, but the median would be unaffected. For this reason, the median is the preferred measure for summarizing variables that are potentially distorted by extremely high or low values. For example, median income usually gives a clearer picture of the center of the income distribution than the mean, because income is spread over a huge range in many countries. Alternatively, it can be argued that expressing data in terms of the median wastes information (precisely because it is not affected by extremes); for that reason, the sample mean might be considered a better predictor of the center of the population than the sample median.
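To make the calculation concrete, here is a minimal sketch in Python using the standard library's statistics module; the values are the fabricated urban percentages from Table 10.5.

    # Minimal sketch: mean and median of the urban percentages in Table 10.5.
    import statistics

    values = [65, 60, 41, 61, 58, 75]   # Bolivia, Algeria, C.A.R., Georgia, Panama, Turkey
    print(statistics.mean(values))      # 60
    print(statistics.median(values))    # 60.5 (average of the two middle values, 60 and 61)

    # Replacing Turkey's 75 with 87 shifts the mean to 62 but leaves the median unchanged.
    values_with_extreme = [65, 60, 41, 61, 58, 87]
    print(statistics.mean(values_with_extreme))    # 62
    print(statistics.median(values_with_extreme))  # 60.5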


Types of Data
Which measure of central tendency to use depends on the type of data: nominal, ordinal, or interval/ratio.

•  Nominal data are sometimes called categorical data. The data "fit" into one of multiple non-overlapping categories, such as gender (male, female), religion (Buddhist, Christian, Jewish, Muslim), or country of origin (Botswana, China, Ethiopia, Kenya, Peru, Zimbabwe).

•  Ordinal data can be placed on a scale that has an order to it, but the "distance" between consecutive responses is not necessarily the same. For example, scales that go from "most important" to "least important" or from "strongly agree" to "strongly disagree" illustrate ordinal data. Ordinal data, like I.Q. scores, lack a true zero point.

•  Interval/ratio data are referred to as real numbers. They have a zero point and fixed intervals, like a ruler, and can be divided and compared to other ratio numbers.

Table 10.6 indicates the preferred measure(s) of central tendency by type of data.

Table 10.6: When to Use Types of Data

  If you have this type of data:   Choose this measure of central tendency:
  Nominal data                     mode
  Ordinal data                     mode or median
  Interval/ratio data              mode, median, or mean

For interval/ratio data, the choice also depends on the distribution itself. If the distribution is normal, the mean, median, and mode should be very close, and the mean is the best description of central tendency. However, if there are a few very high or very low scores, the mean will no longer be close to the center. In that situation, the median is a better descriptor of where the center of the distribution lies.


Measures of Dispersion
Two measures are commonly used to describe the spread of quantitative variables: the range and the standard deviation (Swimmer, 2005, pp. 4-5).

Range
The range is defined as the difference between the highest and lowest values of a variable. Using the data in Table 10.5, the range for the percent urban population is 75 – 41 = 34. Although it is simple to calculate, the range is not very revealing: it is determined exclusively by two observations, with all other cases ignored, and when the two end values are extreme it gives no sense of where the other scores lie.

Standard Deviation
The most commonly used measure of dispersion for interval or ratio data is the standard deviation. The standard deviation measures the spread of the scores on either side of the mean: the more the scores differ from the mean, the larger the standard deviation will be.

To understand the standard deviation, it helps to understand the normal distribution, sometimes called the "bell curve" because it resembles a bell shape. In a normal distribution, the majority of the data occur in the middle of the range, with fewer and fewer data at either end. That is, most of the cases in a data set are close to the mean, while fewer cases lie toward one extreme or the other. Consider data on the heights of a large group of people. The frequency of each height will probably follow a normal distribution: most people will be toward the middle (the mean), with fewer people who are shorter or taller, and the very shortest and the very tallest people at the two extremes.

Figure 10.4 shows a normal distribution. The y-axis shows the frequency of the data; the x-axis shows the value of the data. If this were a normal distribution of the heights of a population, the x-axis would show the value (measure) of height and the y-axis would show the number of people with that measure.


Fig. 10.4: A Normal Distribution (y-axis: frequency of the data; x-axis: value of the data).

Not all data sets match the normal distribution. Some have steeper curves, others have flatter curves, and in others the curve sits closer to one end than the other. Figure 10.5 shows examples of some of these differences. But all normally distributed data will look very similar to the normal distribution.

Fig. 10.5: Examples of Data Not Matching the Normal Distribution.

The standard deviation is a statistic that measures how closely all the data in a set of data are clustered around the mean. When the data in a data set closely match the normal distribution and cluster close to the mean, the standard deviation is small. When the data in a data set are spread differently than the normal distribution (e.g. more towards each end of the distribution), the standard deviation gets larger. The computation of standard deviation can be complicated, but is easier to understand visually. (See Figure 10.6).


Fig. 10.6: Standard Deviation (a normal curve marked at one, two, and three standard deviations from the mean, covering roughly 68%, 95%, and 98% of the cases).

One standard deviation away from the mean in either direction on the horizontal axis (between the bold dashed lines in Figure 10.6) accounts for somewhere around 68 percent of the people in this group. Two standard deviations away from the mean (between the lighter dashed lines) account for roughly 95 percent of the people, and three standard deviations (between the dotted lines) account for about 98 percent of the people. If the curve from a data set is flatter and more spread out, the standard deviation has to be larger to account for those 68 percent or so of the people.

The value of the standard deviation indicates how spread out the cases in a data set are from the mean. If everyone scored 75 on a test, the mean would be 75 and the standard deviation would be 0. If everyone scored between 70 and 80 (also giving a mean of 75), the standard deviation would be smaller than if everyone scored between 40 and 90 (still with a mean of 75). Put another way: a small standard deviation means not much dispersion; a large standard deviation means lots of dispersion. The standard deviation is superior to the range because it allows every case to have an impact on its value. It measures the average distance of the measurements from the mean of the variable.


Table 10.7 gives a quick summary of how the standard deviation is calculated.

Table 10.7: Calculating Standard Deviation

  Step   Procedure
  1.     Calculate the mean for the data.
  2.     Go back to the list of numbers and subtract the mean from each number. The result is a list of new numbers indicating the deviation of each score from the mean.
  3.     Square each of the deviation scores.
  4.     Sum all of the squared deviations.
  5.     Subtract 1 from the number of items in the list.
  6.     Divide the sum of the squared deviations by the result of step 5 (the number of items in the list minus 1).
  7.     Take the square root of the result of step 6.

The formula for calculating the standard deviation is as follows:

  σ = √[ Σ(x − x̄)² ÷ (N − 1) ]

where:
  σ = the standard deviation
  Σ = the sum of
  x̄ = the mean
  N = the number of values in the data set

Even with a small sample, calculating the standard deviation by hand is time consuming. Thankfully, most statistical programs, including SPSS for Windows and Excel, can do the calculation.
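The steps in Table 10.7 translate directly into a few lines of code. The sketch below (Python) applies them to the fabricated urban percentages from Table 10.5; the standard library's statistics.stdev() uses the same N − 1 formula and is shown only as a cross-check.

    # Minimal sketch: standard deviation by the steps in Table 10.7.
    import math
    import statistics

    data = [65, 60, 41, 61, 58, 75]

    mean = sum(data) / len(data)                  # step 1
    deviations = [x - mean for x in data]         # step 2
    squared = [d ** 2 for d in deviations]        # step 3
    sum_of_squares = sum(squared)                 # step 4
    denominator = len(data) - 1                   # step 5
    variance = sum_of_squares / denominator       # step 6
    std_dev = math.sqrt(variance)                 # step 7

    print(round(std_dev, 2))                      # about 11.10
    print(round(statistics.stdev(data), 2))       # same result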


Analyzing Quantitative Data Results
Closed-ended survey results can be reported as the percent giving each answer (for example, 52% women, 48% men) or as the proportion of those asked to respond who actually responded. Sometimes the questions ask for specific counts: "Were you employed in the past week?" or "How many goats do you own?" These can also be reported as percents, absolute numbers, or, optimally, both. At other times, people are asked to give opinions along a scale. For example, one may ask whether the respondents have been able to apply what they have learned, using a five-point scale ranging from "not at all" to "a lot". When analyzing this type of data, establish a decision rule: will the analysis focus on the percent who answered at the extreme ends of the scale, on those who answered on either side of the middle category, or on the average response? Some guidelines may be helpful, but there are no firm rules here (see Table 10.8).

Table 10.8: Guidelines for Analyzing Quantitative Survey Data

  1. Choose a standard way to analyze the data and apply it consistently.
  2. Do not combine the middle category with categories at either end of the scale.
  3. Do not report an "agree" or "disagree" category without also reporting the "strongly agree" or "strongly disagree" category (if used).
  4. Analyze and report both percentages and numbers.
  5. Provide the number of respondents as a point of reference.
  6. If there is little difference in results, raise the benchmark: what do the results look like when the focus is on the questions for which a majority answered "very satisfied" or "strongly disagree"?
  7. Remember that data analysis is an art and a skill; it gets easier with training and practice.


To gain further understanding, let us look at an example: a survey of clients of a health center. The data are shown in Table 10.9.

Table 10.9: Client Views on Health Care Services at the Local Clinic

1. Considering your experiences with the local health clinic, do you agree or disagree with the following statements?

                                                   Strongly                             Strongly
                                                   disagree  Disagree  Neither  Agree   agree
  I wait a long time before being seen.               10%       20%      10%     35%     25%
  The staff are willing to answer my questions          5        10       30      30      25
  I receive good health care at the clinic             15        25       10      25      25

N=36
Source: Fabricated Data, 2008 Survey.

One way to analyze these data is to report that half the respondents agree or strongly agree that they receive good health care and 55% agree or strongly agree that clinic staff are willing to answer questions. However, 60% agree or strongly agree that they wait a long time before being seen. In this analysis, the decision rule was to report the combined percentages of agree and strongly agree. If the data were different, one might use a different strategy. For example, consider if the results looked like those in Table 10.10.

Table 10.10: Client Views on Health Care Services at the Local Clinic

1. Considering your experiences with the local health clinic, do you agree or disagree with the following statements?

                                                   Strongly                             Strongly
                                                   disagree  Disagree  Neither  Agree   agree
  I wait a long time before being seen.               50%       20%      10%     15%      5%
  The staff are willing to answer my questions          0         5        0      30      65
  I receive good health care at the clinic              0        20        0      55      25

N=36
Source: Fabricated Data, 2008 Survey.


The analysis in this case might read that 80% (the combination of agree and strongly agree) of the respondents agree or strongly agree that they receive good health care. The greatest strength appears to be the staff's willingness to answer questions, with 95% reporting that they agree or strongly agree.
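A minimal sketch of this decision rule in Python, using the fabricated percentages from Table 10.10; the dictionary is simply one convenient way to hold the responses.

    # Minimal sketch: report the combined "agree" + "strongly agree" percentage per item.
    responses = {
        "I wait a long time before being seen":         {"agree": 15, "strongly agree": 5},
        "The staff are willing to answer my questions": {"agree": 30, "strongly agree": 65},
        "I receive good health care at the clinic":     {"agree": 55, "strongly agree": 25},
    }

    for item, pcts in responses.items():
        combined = pcts["agree"] + pcts["strongly agree"]
        print(f"{combined}% agree or strongly agree: {item}")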

Commonly Used Descriptive Statistics
A range of descriptive statistics can be used. Some of the more frequently used statistics are listed in Table 10.11.

Table 10.11: Commonly Used Descriptive Statistics (with illustrative examples using fake data from a new university)

  Frequencies (numbers, a count of how many):
    50 students graduated from the university last year.
  Percent (proportion) distributions:
    20% of the students at the university are women.
  Mean (average):
    The average age of the students was 25 years.
  Median (mid-point):
    Ages ranged from 18 to 40, with the mid-point at 24 years.
  Mode (the most frequent value):
    The most frequently reported age was 22.
  Money (costs, revenues, expenditures, etc.; total or average amount):
    Costs of running the program increased by US$100K (or 50%) over the past 5 years; the average cost per student fell by $22, or 8%.
  Percent change over two points in time (sometimes called rate of change):
    The university increased its enrollment by 40% over the past year.
  Ratio (the number of students per faculty member):
    The student/faculty ratio is 15:1.
  Comparisons (could be numbers, percents, or means):
    The average salary for graduates from the new university was 20% higher than they had been receiving in their previous jobs; 90% of the employers of the university graduates report being very satisfied with their employees, compared to 75% for those who employed graduates from other universities.


Which One to Use?
Figure 10.7 lists some of the most frequently used descriptive analyses, what information they provide, and some examples of when they might be used.

Frequently Used Descriptive Analyses

Frequency distributions: number and percent.

Describing parts of a whole (100%):
•  percent: parts of a whole expressed as a percent, for example 75%
•  proportion: parts of a whole expressed as a decimal rather than a percent, for example .75

Rates: number of occurrences that are standardized, which allows comparison, for example:
•  deaths of infants per 1,000 births
•  crop yield per acre

Ratio: another way to show the relationship between two numerical variables; shows relative proportions, for example:
•  a student to teacher ratio of 15:1

Rates of change or percentage change:
•  show change over time, for comparing two items
•  [(new value − older value) ÷ (older value)] × 100 gives the percent rate of change; for example, the rate of change from 1980 to 1985 is [(12,000 − 10,000) ÷ 10,000] × 100 = 20%

Rates of change from prior year:

  Year    Acres Made Available    Rate of Change between Measures
  1980          10,000            Baseline
  1985          12,000            20%
  1990          19,000            58%
  1995          28,000            47%

Therefore, acres made available increased 20% from 1980 to 1985.

Take care when using the lower and upper case "n/N" with statistics:
•  n = an indefinite number
•  N = a set of ranked data (ordinal scale)

Fig. 10.7: Frequently Used Descriptive Analyses.
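The percentage-change formula in Figure 10.7 is easy to script. A minimal sketch in Python, using the acres-made-available series from the figure:

    # Minimal sketch: percent rate of change between successive measurements.
    years = [1980, 1985, 1990, 1995]
    acres = [10000, 12000, 19000, 28000]

    for i in range(1, len(years)):
        change = (acres[i] - acres[i - 1]) / acres[i - 1] * 100
        print(f"{years[i - 1]}-{years[i]}: {change:.0f}% change")   # 20%, 58%, 47%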


Describing Two Variables at the Same Time
At times evaluators want to describe two variables at the same time. For example, suppose they want to describe the composition of the hands-on and lecture classes. For each class, they want to know what percent were boys and what percent were girls. Analysis of the data shows that the hands-on classes consist of 55% boys and 45% girls, while the traditional lecture classes consist of 55% girls and 45% boys.

A cross tabulation (often abbreviated as crosstab) displays the joint distribution of two or more variables. Crosstabs are usually presented as a contingency table in a matrix format. Whereas a frequency distribution provides the distribution of one variable, a contingency table describes the distribution of two or more variables simultaneously. Each cell shows the number of respondents who gave a specific combination of responses; that is, each cell contains a single cross tabulation. Crosstabs are used when working with nominal and ordinal data, or when the data are categorized interval/ratio data. One interpretation of the crosstab results below: in this sample, boys are somewhat more likely (55%) than girls (45%) to take the hands-on classes. Table 10.12 shows an example of crosstab results.

Table 10.12: Crosstab Results

            Hands-on Classes    Traditional Classes    Total %
  Boys        55% (N=28)          45% (N=34)            100%
  Girls       45% (N=22)          55% (N=41)            100%

N=125
Source: Fabricated data, 2004 survey.
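A crosstab is also easy to produce in software. The sketch below is a minimal example using Python's pandas library on a handful of invented records; pd.crosstab with normalize="index" reports each row as proportions, the layout used in Table 10.12.

    # Minimal sketch: counts and row percentages for two categorical variables.
    import pandas as pd

    df = pd.DataFrame({
        "gender":     ["boy", "boy", "girl", "girl", "boy", "girl", "boy", "girl"],
        "class_type": ["hands-on", "traditional", "hands-on", "traditional",
                       "hands-on", "traditional", "traditional", "hands-on"],
    })

    counts = pd.crosstab(df["gender"], df["class_type"])
    row_percents = pd.crosstab(df["gender"], df["class_type"], normalize="index") * 100
    print(counts)
    print(row_percents.round(0))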


There appears to be some relationship, but how strong is it? When looking at measures of association, it is important to understand the concepts of independent and dependent variables. Independent variables are variables that explain a change in the dependent variable. For example, in a program evaluation training course, the variables in the program itself are the independent variables: the experience of the instructors, the backgrounds of the training participants, the curriculum used, the length of the training, the modality used, and so forth. Dependent variables are the variables to be explained. For the program evaluation, the dependent variables might be scores on a knowledge test, the grade on a design matrix for an evaluation, or improved evaluation designs.

To illustrate with a second example, to determine whether South African women have a lower per capita income than South African men, we could compare the mean income of women to the mean income of men, as shown in Table 10.13.

Table 10.13: Comparison of Mean Incomes by Gender

                     Mean income
  Women (N=854)      27,800 South African Rand
  Men (N=824)        32,400 South African Rand

N=1149
Source: Fabricated data, 2008 survey.

In this example, the dependent variable is annual income, and the independent variable is gender (Swimmer, 2005, p. 22). In many evaluations, evaluators are interested in whether there is a difference in the average values of a quantitative variable for a pair of samples. For example, we might be interested in these questions:




As a result of an irrigation project, are the average crop yields higher than before the project?



Is there a difference in the proportion of patients who are satisfied with their care, comparing an older existing hospital and a new hospital built under a development project?


The big question is whether the difference indicates an actual difference in the population means (or proportions), or whether the result occurred simply by chance variation in the samples taken from the two populations. In a statistical test, it is commonly assumed that there is no difference between the two population means (or proportions). This issue is addressed in the upcoming section on inferential statistics.

Measures of Relationship
Measures of relationship (or association) tell how strongly variables are related. Association never proves cause, but it can suggest the possibility of a causal relationship if (and only if) there is a strong measure of association. While there are many kinds of measures of association, they are usually reported on a zero to 1 scale that indicates the strength of the relationship. A perfect relationship would score 1; a relationship showing no association at all would score zero. In other words, the closer the measure is to zero, the weaker the relationship, and the closer it is to 1, the stronger the relationship.

Measures of correlation are calculated on a scale of -1 to +1. These measures show the direction of the relationship through the sign (positive or negative). A positive sign means that the variables change in the same direction: both go up or both go down. This is called a direct relationship; for example, as years of education increase, individual income increases. A negative sign indicates that the variables have an inverse relationship, meaning that they move in opposite directions; for example, as age increases, health status decreases. A measure of association of -1 would therefore mean a perfect inverse relationship, while a measure of -.1 is close to zero and indicates a very weak inverse relationship.

A correlation between two interval or ratio variables, or a multiple regression technique, can be used to estimate the influence of several variables simultaneously on the dependent variable. These techniques work with interval and ratio level data.


The Pearson product-moment correlation coefficient is used with interval or ratio data and is one of the most common ways of determining relationships. Simply put, a correlation coefficient determines the extent to which the values of two variables are "proportional" to each other (StatSoft, 2008, Pearson Correlation). The Pearson product-moment correlation between two variables Y1 and Y2 is calculated as follows:

  r12 = Σ[(Yi1 − Ȳ1)(Yi2 − Ȳ2)] ÷ [Σ(Yi1 − Ȳ1)² × Σ(Yi2 − Ȳ2)²]^(1/2)
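A minimal sketch of the calculation in Python; the paired values are invented for illustration, and statistics.correlation() (available in Python 3.10 and later) is shown only as a cross-check.

    # Minimal sketch: Pearson's r from the formula above, on invented paired data.
    import math
    import statistics

    y1 = [2, 4, 6, 8, 10]       # e.g., years of education (invented)
    y2 = [15, 25, 30, 45, 55]   # e.g., income in thousands (invented)

    mean1, mean2 = statistics.mean(y1), statistics.mean(y2)
    numerator = sum((a - mean1) * (b - mean2) for a, b in zip(y1, y2))
    denominator = math.sqrt(sum((a - mean1) ** 2 for a in y1) *
                            sum((b - mean2) ** 2 for b in y2))
    r = numerator / denominator
    print(round(r, 3))                                   # close to 1: strong direct relationship
    print(round(statistics.correlation(y1, y2), 3))      # same value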

Other techniques can be used for ordinal and nominal data, but they are less commonly used. These include:

•  Spearman rank-order correlation (used when one variable is ordinal and the second is at least ordinal)

•  scatter plots.

According to Swimmer (2005, p. 29), ...most social science research focuses on the relationship between two or more variables. In fact, many social science theories are based on the idea that changes in one variable cause the other variable to change in a similar or opposite direction, for example: increases in nutrition cause infant mortality rates to fall and increases in exports lead to higher levels of economic growth. This is important for evaluation research, because a development project is sometimes aimed at changing a variable, which will in turn generate a benefit, by changing another factor – a project aimed at increasing literacy rates in order to reduce the spread of AIDS, for example. Although statistical analysis can never prove causality (i.e., that nutrition causes a decline in infant mortality), it can determine whether the data supports the theory: that is, countries with better nutrition usually have lower infant mortality rates.


Inferential Statistics
Inferential statistics enable evaluators to make an estimate about a population based on a random sample selected from that population. Whenever sample data are used, the major concern is whether the results reflect some quirk of the sample rather than an accurate picture of the population. If the evaluators had picked a different sample, would their results be fairly similar – or quite different? Statisticians have developed tests, called statistical significance tests, to estimate this. They do a very simple thing: they allow you to estimate how likely it is that you obtained the results you see in your analysis by chance alone.

Statistical tests come in more than 100 varieties; two of the more common are the chi square test and the t-test. The good news is that all of the different statistical tests are interpreted using the same guidelines. Evaluators typically set the benchmark for statistical significance at the .05 level, sometimes called the alpha level or the p value (the probability of error). That is, we set the benchmark so that we are at least 95% certain that the sample results are not the result of random chance. If we want to raise the bar, we set the level at .01 to be 99% certain that the sample results are not due to chance alone.

All tests of statistical significance are partly based on sample size. If the sample is very large, small differences are likely to be statistically significant. Evaluators still need to decide whether the differences are important, given the nature of their research. Importance is always a judgment call.

Chi Square
The chi square test is one of the most popular statistical tests because it is easy to calculate and interpret, although it is not the strongest measure. Its purpose is to determine whether the observed frequencies (counts) differ markedly from the frequencies that would be expected by chance. Chi square is used to compare two nominal variables (for example, marital status and religious affiliation). It can also be used to compare two ordinal variables (scaled responses), or a combination of a nominal and an ordinal variable.


The chi square statistic is the sum of the contributions from each of the individual cells in a data table. Every cell in the table contributes something to the overall chi square statistic. If a given cell differs markedly from the expected frequency, its contribution to the overall chi square is large; if a cell is close to the expected frequency, its contribution is small. A large chi square statistic indicates that, somewhere in the table, the observed frequencies differ markedly from the expected frequencies. It does not tell which cell (or cells) is causing the high chi square, only that such cells exist. Chi square tests whether two variables are independent of one another, based on the observed data.
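In practice the chi square statistic is rarely computed by hand. A minimal sketch in Python using scipy's chi2_contingency, applied to the invented counts from Table 10.12:

    # Minimal sketch: chi square test of independence on a 2x2 table of counts.
    from scipy.stats import chi2_contingency

    observed = [[28, 34],   # boys:  hands-on, traditional
                [22, 41]]   # girls: hands-on, traditional

    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(round(chi2, 2), round(p_value, 3))
    # A p_value below .05 would suggest the two variables are not independent.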

T-Test
A t-test is a statistical technique that can determine whether one group of numerical scores is statistically higher or lower than another group of scores. This analysis is appropriate whenever the means of two groups are compared. It is especially appropriate for a project evaluation that compares the mean scores of the group affected by the project with the mean scores of the control group (unaffected by the project). This leads to an important point: when looking at the difference between the scores of two groups, we have to judge the difference between their means relative to the spread or variability of their scores. The t-test does just this.
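A minimal sketch of an independent-samples t-test in Python using scipy; the two sets of scores are invented for illustration.

    # Minimal sketch: compare the means of two groups with a t-test.
    from scipy.stats import ttest_ind

    project_group    = [72, 75, 78, 80, 74, 77, 79, 81]   # invented scores
    comparison_group = [68, 70, 73, 71, 69, 72, 70, 74]   # invented scores

    t_statistic, p_value = ttest_ind(project_group, comparison_group)
    print(round(t_statistic, 2), round(p_value, 4))
    # A p_value below .05 would suggest the difference in means is statistically significant.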

ANOVA (Analysis of Variance)
The previous section described how the t-test is used to compare the means of two groups. The t-test is cumbersome for situations calling for the comparison of three or more groups (Weiss & Sosulski, 2003, ANOVA). When an evaluation needs to compare the means of several different groups at one time, it is best to use an analysis of variance (ANOVA). Like the t-test, ANOVA is used to test hypotheses about differences in the average value of an outcome between two groups; but ANOVA can also be used to examine differences among the means of several different groups at one time. ANOVA is a statistical technique for assessing how nominal independent variables influence a continuous dependent variable.


ANOVA rests on two assumptions:

•  The populations for all groups being compared have equal standard deviations (the assumption of homogeneity of variance).

•  The samples are randomly selected from the population.

It is important to check that these assumptions hold before using ANOVA. The tests in an ANOVA are based on the F-ratio: the variation due to an experimental treatment or effect, divided by the variation due to experimental error. The null hypothesis is that this ratio equals 1.0 – that is, the treatment effect is the same as the experimental error. This hypothesis is rejected if the F-ratio is large enough that the possibility of it equaling 1.0 is smaller than some pre-assigned criterion, such as 0.05 (one in twenty) (Washington State University, 2000, What is an ANOVA?).
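A minimal sketch of a one-way ANOVA in Python using scipy's f_oneway; the three groups of values are invented for illustration.

    # Minimal sketch: compare the means of three groups at once with a one-way ANOVA.
    from scipy.stats import f_oneway

    region_a = [55, 60, 58, 62, 57]   # invented outcome values
    region_b = [65, 70, 68, 72, 66]
    region_c = [54, 59, 61, 56, 58]

    f_ratio, p_value = f_oneway(region_a, region_b, region_c)
    print(round(f_ratio, 2), round(p_value, 4))
    # A small p_value (e.g., below .05) would indicate that at least one group mean differs.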

The Logic of Statistical Significance Testing
These tests are set up to measure the probability of getting the same results if there really were no difference in the population as a whole. Evaluators call this the null hypothesis, and it is always based on zero difference in the population. Suppose a survey based on a random sample of people in Pakistan shows a 5,000 rupee difference in annual income between men and women. Our test might be expressed in this way: if there really is no difference in the population, what is the probability of finding a 5,000 rupee difference in income between the men and women in a random sample? If there is a 5% chance (.05) or less (that is our benchmark), then we conclude that the sample results are an accurate estimate of the population: there is indeed a difference of about 5,000 rupees, and that difference is statistically significant. Most reports do not go beyond a benchmark of .05 or 5%. This means that we are 95% certain that our sample results are not due to chance, or in other words that the results are statistically significant at the .05 level. Table 10.14 shows common tests for statistical significance and the kind of data each tests.

Table 10.14: Common Tests for Statistical Significance

  Statistical Test    Kind of Data Needed
  chi square          nominal or ordinal data
  t-test              dependent variable: ratio data; independent variable: 2 categories


Simple Regression Models
Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables:

•  One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.

•  The other variable, denoted y, is regarded as the response, outcome, or dependent variable.

Simple linear regression gets its adjective "simple" because it concerns the study of only one predictor variable. In contrast, multiple linear regression gets its adjective "multiple" because it concerns the study of two or more predictor variables.
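A minimal sketch of a simple linear regression fit in Python using numpy's polyfit; the x and y values are invented for illustration.

    # Minimal sketch: fit y = intercept + slope * x by least squares.
    import numpy as np

    x = np.array([1, 2, 3, 4, 5, 6], dtype=float)        # predictor (invented)
    y = np.array([10, 14, 15, 19, 22, 24], dtype=float)  # response (invented)

    slope, intercept = np.polyfit(x, y, deg=1)
    print(round(slope, 2), round(intercept, 2))

    # Predicted value of y for x = 7 under the fitted line:
    print(round(intercept + slope * 7, 2))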

Propensity Score Matching
Propensity is an often intense natural inclination or preference. Propensity score matching is a statistical method used with observational studies. It is effective in situations in which a group of units has been exposed to a well-defined treatment but no systematic experimental design was used to maintain a control group.

Earlier, we discussed the problems of experimental bias. It is commonly recognized that problems such as self-selection, or some systematic judgment by the evaluator in selecting units to be assigned to the treatment, can bias the estimate of a causal effect obtained by comparing a treatment group with a nonexperimental comparison group. Propensity score-matching methods can be used to correct for sample selection bias due to observable differences between the treatment and comparison groups.

Matching involves pairing treatment and comparison units. With a small number of characteristics (for example, two binary variables), matching is straightforward (one would group units into four cells). However, when there are many variables, it is difficult to determine along which dimensions to match units or which weighting scheme to adopt. Under these circumstances, propensity score-matching methods are especially useful because they provide a natural weighting scheme that yields unbiased estimates of the treatment impact (Dehejia & Wahba, 2001, Introduction).


Propensity score matching allows evaluators to designate certain characteristics, such as race, income, region, and/or educational attainment, to act as matching criteria. These matching criteria serve as independent variables, or controls, that are used to generate the propensity score. Based on the criteria, a counterfactual group can be created: a matching variable divides respondents into control and treatment groups, while the background control variables generate the propensity score that is used to match respondents into pairs. Propensity score matching thus provides an important alternative for addressing selection bias. Future research needs to explore potential limitations such as sample size, the extent of overlap between treatment and control groups, and hidden bias.
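The basic mechanics can be sketched in a few lines. The example below (Python, using scikit-learn and numpy) estimates propensity scores with a logistic regression on background covariates and then pairs each treated unit with the nearest-scoring comparison unit. All data and variable names are invented, and a real application would also check overlap between groups and covariate balance after matching.

    # Minimal sketch: propensity scores via logistic regression + nearest-neighbor matching.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 200
    X = rng.normal(size=(n, 3))                      # background covariates (e.g., income, age, education)
    treated = (rng.random(n) < 0.4).astype(int)      # treatment indicator (invented assignment)

    model = LogisticRegression().fit(X, treated)
    scores = model.predict_proba(X)[:, 1]            # propensity score = P(treated | X)

    treated_idx = np.where(treated == 1)[0]
    control_idx = np.where(treated == 0)[0]

    # Match each treated unit to the comparison unit with the closest score (with replacement).
    matches = {i: control_idx[np.argmin(np.abs(scores[control_idx] - scores[i]))]
               for i in treated_idx}
    print(len(matches), "treated units matched to comparison units")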

Data Cleaning
Data cleaning (also called data cleansing or data scrubbing) is the process of removing errors and inconsistencies from data in a database in order to improve the quality of the data (Rahm & Do, p. 1, Introduction). Data with errors or inconsistencies are often called "dirty data". Data analysts estimate that up to half of the time needed for analysis is typically spent cleaning the data, and this time is usually underestimated. Once a clean data set is achieved, the analysis itself is quite straightforward (P.A.N.D.A., 2000, Chapter 2, Data Cleaning). Common sources of data errors in a database are:

•  missing data
•  "not applicable" or "blank" entries
•  typing errors on data entry
•  data formatted incorrectly
•  column shift (data for one variable entered under the adjacent column)
•  fabricated ("made up" or contrived) data
•  coding errors
•  measurement and interview errors
•  out-of-date data (adapted from P.A.N.D.A., 2000).


For example, consider data that have been entered into a database by several different people. One person typed all of the data in capital letters; another capitalized only the initial letters of proper names and entered everything else in lower case. One person entered each address as a single entry; another entered the street address, the township, and the country as separate entries. In addition, data were merged from a previous evaluation done ten years ago. The older data collection had fewer questions, so there are no responses for some of the new questions. Another problem is that two questions confused many of the respondents, who did not understand how to answer them; the persons recording the answers made their best guesses about how to deal with these questions, but each data recorder used different coding rules. And in several cases, respondents asked to select from a scale of 1 to 5 went outside the scale and used decimal points (e.g. 2.5); some coders rounded this down to 2, some rounded up to 3, and some treated it as missing data.

All of these problems are due to human error in responding to items or in entering the data. Many of them can be minimized if there are rules for coding data for data entry and the rules are strictly enforced. However, there will always be errors in data entry, and data entry needs to be checked. Evaluators need to set up rules for coding responses and keep track of the original questionnaires (to refer to when possible errors are identified, so they can be checked). Evaluators then need to inspect the data and try to identify any "dirty data". What can one do if data are found that have typographical errors, formatting errors, or other incorrect entries? Once data are entered, they should be "screened" and "cleaned" before they are analyzed. For example, consider a data set of school records. For a question about gender, the only valid codes might be 1 for male and 2 for female; any other response would be an error. For a question about having had a physical exam, the only possible values might be 1 for yes, 2 for no, 8 for "do not know", or 9 for missing or refused. For height information, the evaluator can look for heights or weights far above or below those expected for the age of the student (O'Rourke, 2000, ¶ 3).


How can evaluators deal with cleaning data? There are several ways, from low tech to high tech. A low-tech approach is to visually scan the data, either on a printout or on the computer screen, looking for impossible data – values that could not be legitimate answers to the questions. Once a possible error is identified, go back and check the original data. Data records often have incomplete or missing data. Statisticians handle incomplete data in the following ways:

•  deleting incomplete entries

•  filling in incomplete entries based on the most similar complete entry ("hot deck imputation")

•  filling in incomplete entries with the sample mean ("mean substitution")

•  using a learning algorithm or criterion (EM, maximum likelihood) to infer a missing entry (Gao, Langberg, & Schulman, 2006, Introduction).

Computer software programs can help clean data by checking for out-of-range values. Most important, however, is keeping a record of the data-cleaning decisions that were made.
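Two of these routine checks can be scripted. A minimal sketch in Python using pandas, with invented column names and validity rules:

    # Minimal sketch: flag out-of-range codes and fill missing values ("mean substitution").
    import pandas as pd

    records = pd.DataFrame({
        "gender":    [1, 2, 2, 3, 1],            # valid codes: 1 = male, 2 = female
        "had_exam":  [1, 2, 9, 1, 2],            # valid codes: 1, 2, 8, 9
        "height_cm": [152, None, 160, 148, 15],  # 15 cm is an implausible height
    })

    # Flag out-of-range or implausible values (and keep a log of what was flagged).
    bad_gender = records[~records["gender"].isin([1, 2])]
    bad_height = records[(records["height_cm"] < 50) | (records["height_cm"] > 220)]
    print(bad_gender.index.tolist(), bad_height.index.tolist())

    # Mean substitution for missing height values.
    records["height_cm"] = records["height_cm"].fillna(records["height_cm"].mean())
    print(records)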

Part IV: Linking Qualitative Data and Quantitative Data
Miles and Huberman (1994, pp. 40-43) discuss how qualitative and quantitative data may be linked in a study. They begin their discussion with a quote from Fred Kerlinger, a highly regarded quantitative researcher: "There's no such thing as qualitative data. Everything is either 1 or 0." They then offer an opposing view: that all data are basically qualitative. They also discuss the idea that all research has a qualitative grounding. The argument over quantitative versus qualitative data has been going on for some time and will probably continue long into the future. Development evaluation uses both quantitative and qualitative data to understand the world; that is, "...quantities are of qualities, and a measured quality has just the magnitude expressed in its measure" (Miles & Huberman, 1994).

Miles and Huberman ask: "should the two sorts of data and associated methods be linked during study design, and, if so, how can it be done, and, for what purposes?" They identify others' views on these questions; the following is a summary of information from other authors on this topic.


Linking the two sorts of data allows the evaluator to:

•  confirm or corroborate each via triangulation

•  elaborate or develop the analysis, providing richer detail

•  initiate new lines of thinking through attention to surprises or paradoxes, "turning ideas around" and providing fresh insight (Rossman & Wilson, 1994, pp. 315-327).

Greene & Caracelli (1997, p. 7) describe two levels of the value of mixing methods in evaluation:

•  epistemological – we can know something better if we bring multiple ways of knowing to bear on it

•  political – all ways of knowing are partial; hence multiple, diverse ways of knowing are to be valued and respected.

They also state: “Good mixed-method evaluation actively invites diverse ways of thinking and valuing to work in concert toward better understanding.” and “Different kinds of methods are best suited to learning about different kinds of phenomena.” Hawkins (2005, pp. 8-9) lists the following benefits of using an integrated mixed method approach to evaluation.




Consistency checks can be built in with triangulation procedures that permit two or more independent estimates to be made for key variables (such as income, opinions about projects, reasons for using or not using services, etc.)



Different perspectives can be obtained. For example, while evaluators may consider income or consumption to be the key indicators of household welfare, case studies may reveal that women are more concerned about vulnerability (defined as lack of access to social support systems in times of crisis), powerlessness, or exposure to violence.


Analysis can be conducted on different levels. Survey methods can provide good estimates of individual, household, and community-level welfare, but they are much less effective for analysing social processes (e.g. social conflict) or for institutional analysis (how effectively public services operate or how they are perceived by the community). Many qualitative methods are designed to analyse issues such as social processes, institutional behaviour, social structure, and conflict.



Opportunities can be provided for feedback to help interpret findings. Survey reports frequently include references to apparent inconsistencies in findings, or interesting differences between groups that cannot be explained by analysis of the data. In most quantitative evaluations, once the data collection phase is completed, it is not possible to return to the field to gather additional data.



Survey evaluators frequently refer to the use of qualitative methods to check on outliers – responses that diverge from the general patterns. In many cases the data analyst has to make an arbitrary decision about whether a respondent who reports conditions significantly above or below the norm should be excluded (on the assumption of a reporting error) or whether the figures should be adjusted. Qualitative methods permit a rapid follow-up in the field to check on these cases.



Benefits and implementation strategies vary according to the professional orientation of the evaluator. The perceived benefits of integrated approaches depend on the evaluator’s background. From the perspective of the quantitative evaluator, a qualitative component will help to identify key issues to be addressed in the evaluation; refine questions to match the views of respondents; and provide information about the social, economic and political context within which the evaluation takes place. It is also possible to return to the field to follow up on interesting findings.



Qualitative evaluators will see different benefits to using quantitative methods. Sampling methods can be used to allow findings to be generalised to the wider population. Sample selection can be co-ordinated with ongoing or earlier survey work so that findings from qualitative work can be compared with survey findings. Statistical analysis can be used to control for household characteristics and the socioeconomic conditions of different study areas, thereby eliminating alternative explanations of the observed changes.


Qualitative methods provide more context, while quantitative approaches allow generalization of the findings to other situations. Hawkins (2005, pp. 10-11) discusses when and when not to use a mixed method approach:

"When to use mixed methods:

−  When an in-depth understanding of the intervention and its context is required.

−  When there is a limited budget and significant time constraints and, for example, triangulation can help to validate information collected from different sources using different methods and small samples.

When not to use mixed methods:

−  There are a few questions that would be relatively easy to answer using a single method approach, for example:
   •  secondary analysis of existing data
   •  participant-observer operations
   •  critical incident case studies of a unique situation (e.g. Foot and Mouth infection)

−  Generalisability of the findings is required and the indicators/measures are straightforward – for example, a survey alone would suffice, or analysis of an administrative database.

−  Less likely to be successful when evaluators with expertise in the chosen methods are not available for the whole study – better to only use those methods for which the expertise is available. Better to do a mono-method well than to create a mess with mixed methods.

−  When there is a strong commitment held by key stakeholders to a particular paradigm to the exclusion of all others (and they will therefore remain doggedly unconvinced by a MM approach no matter how good it is).

−  Time available at the analysis and interpretation stage is very limited."

Case 10-2 gives another example of a mixed method approach.


Case 10-2: ORET/MILIEV Programme in China
"The Development and Environment-Related Export Transactions (ORET/MILIEV) Program is a program to finance certain types of projects through a combination of development cooperation grants and commercial loans. The program is designed to help generate employment, boost trade and industry, and improve environmental quality in developing countries" (Chinese National Centre for Science and Evaluation, 2006, pp. 1-5).

Table 10.15 shows the structure of the database of initial evidence for the 35 projects visited during the evaluation.

Table 10.15: Structure of the database of initial evidence for the visited projects
(for each issue, the database records the evidence for Project 1, Project 2, and so on)

  Evaluation criterion: Policy Relevance
  Issues:
    - Listed as a priority in the local/ministries' development plans
    - In line with sector development strategies at national level
    - O/M info-sources
    - Main reasons for applying for O/M
    - Without support, would the project be implemented
    - Who initiated the project
    - Effect on poverty alleviation (during appraisal)
    - Effect on W&D (during appraisal)
    - Effect on environment (during appraisal)
    - Whether the project/program objectives be adjusted

  Evaluation criterion: Efficiency
  Issues:
    - State of the project
    - Achieved the objectives on schedule and within budget
    - Delays during implementation compared with fact sheet
    - Delays during implementation according to end user
    - Main causes of delays
    - Price of the equipment
    - Main causes resulting in higher prices
    - Period of appraisal by Dutch side
    - Duration of application procedures on the Dutch side
    - Duration of (whole) application procedures in China
    - On-lending procedure (time-consuming)
    - Acquiring tariff exemption
    - Cooperation between end user and supplier
    - Comments on efficiency by end user

(Source: Chinese National Centre for Science and Evaluation, 2006, pp. 36-37) (continued on next page)




Case 10-2 continued

  Evaluation criterion: Effectiveness
  Issues:
    - Achievement of the short-term objectives
    - Achievement of the long-term objectives
    - Successfully trained personnel
    - Quality of delivered goods/services
    - Spare parts supply
    - Prices of spare parts
    - After-sales service (degree of satisfaction)
    - Number of jobs created in the project
    - Increasing indirect employment
    - Equipment still functioning x years after FCC signed
    - Project sustainability
    - Repayment of loans
    - New supplier representative offices/joint ventures/agents, etc. in China

  Evaluation criterion: Impacts
  Issues:
    - Overall impacts on end user
    - Additional Dutch exports to China
    - Demonstration and replication effect
    - Poverty alleviation
    - Gender and development
    - Environmental impacts
    - Local economic promotion
    - Future cooperation opportunities

  Evaluation criterion: Suggestions
  Issues:
    - On appraisal by the Dutch side
    - On application procedures on the Dutch/Chinese side
    - On on-lending procedures
    - On lowering prices
    - On 60% content
    - On working capital/matching funds
    - On qualification of Dutch suppliers
    - On bidding
    - On purchasing procedure
    - On evaluation/supervising


Summary
After collecting data, evaluators have a great deal of data to sort through and analyze. Many techniques can assist, some appropriate for qualitative data and others for quantitative data. A plan for data analysis is important; with one, evaluators do not collect data that they never use.

Qualitative data analysis is used for non-numerical data. The data can be gathered using unstructured observations, open-ended interviews, analysis of written documents, and focus group transcriptions. With qualitative data, the notes taken during data collection are extremely important, so they must be detailed and capture everything that goes on. After qualitative data are collected, they need to be organized. These data can be sorted, manually or with the help of computers, so that patterns and commonalities appear. Once sorted, the data can be coded and then interpreted. Content analysis and the affinity diagram process are techniques to assist with analyzing qualitative data. Analyzing qualitative data is labor intensive and time consuming, but it can reveal valuable information.

Quantitative data are numerical and are analyzed using statistics. Statisticians divide statistics into two large categories: descriptive and inferential. Descriptive statistics describe the frequency and/or percentage distribution of a single variable within a sample. Measures of central tendency are the mean, median, and mode. Common measures of dispersion are the range and standard deviation. Other commonly used descriptive statistics include:

•  frequency distributions (number or percent)

•  describing parts of a whole (percent or proportion)

•  rates (number of occurrences that are standardized)

•  ratios (the relationship between two numerical values, showing relative proportions)

•  rates of change or percentage change (change over time, for comparing two items).

Measures of relationship tell how strongly variables are related using measures of correlation. Inferential statistics enable evaluators to make an estimate about a population based on a random sample selected from that population. Common inferential statistical tools are Chi square, t-tests, and ANOVA.


Generally, evaluators use both qualitative and quantitative methods; both have their advantages and disadvantages. Using mixed methods (more than one method) has many benefits. Mixed methods are best used when an in-depth understanding of the intervention and its context is required. They are also valuable when there are a limited budget and significant time constraints, because triangulation across information collected from different sources strengthens validity. Mixed methods may not be helpful if there are only a few questions and they are relatively easy to answer using a single approach.

Completed Design Matrix
On the following pages is Table 10.16, an example of a completed design matrix. The example is from a secondary school vocational training program in Outer Baldonia. It illustrates using a matrix to assist with planning the evaluation by making decisions on evaluation questions, indicators, targets, design, data sources, data collection, and data analysis.


Table 10.16: Example Design Matrix
Design for: Outer Baldonia Secondary School Vocational Training Program
Main Evaluation Issue: Should this program be reauthorized?
General Approach: Quasi-experimental Impact Evaluation

Matrix page 1: questions, sub-questions, measures, targets, and baseline data

Question 1: What services did the program provide to whom?

  1.A.1. In what vocational skill areas were participants trained?
         Type: Descriptive
         Measures or indicators: Vocational skill areas the program offered to trainees for certification
         Target or standard: NA    Baseline data: NA

  1.A.2. Were there changes over time in participation by skill area?
         Type: Descriptive
         Measures or indicators: Same as above, by number of participants each year
         Target or standard: NA    Baseline data: Yes

  1.B.1. What support services were offered by the program?
         Type: Descriptive
         Measures or indicators: Support services (e.g. literacy, counseling) offered by the program
         Target or standard: NA    Baseline data: NA

  1.B.2. What proportion of participants received support services?
         Type: Same as above
         Measures or indicators: Number and percent of trainees receiving each type of support
         Target or standard: NA    Baseline data: NA

  1.C.   What were the most popular certification areas selected by trainees in the vocational training program?
         Type: Descriptive
         Measures or indicators: Number of trainees by certificate area
         Target or standard: NA    Baseline data: NA

  1.D.   To what extent did the program provide certification in areas forecast as high demand for the next 5-10 years?
         Type: Descriptive
         Measures or indicators: List of vocational areas forecast as high demand over the next 5-10 years
         Target or standard: NA    Baseline data: NA

Question 2: To what extent was there gender equity in the services delivered?

  2.A.1. Did equal numbers of males and females participate in the program?
         Type: Normative
         Measures or indicators: Number and proportion of males/females receiving vocational training
         Target or standard: Program authorizing documents indicate a 50% participation goal for females
         Baseline data: NA

  2.A.2. Is receipt of support services related to gender?
         Type: Normative
         Measures or indicators: Proportions by gender receiving each of the support services offered
         Target or standard: Program authorizing documents indicate gender equality is a goal
         Baseline data: NA


Matrix page 1 continued: design, data sources, data collection, and analysis

  1.A.1. Design: One shot
         Data sources: Program records (MIS); Program Director
         Sample or census: For each of past 5 years
         Data collection instrument: Record Retrieval Document 1; Program Officials Interview Guide
         Data analysis: Frequency count; content analysis
         Comments: Data sources should match; note any discrepancy and explain

  1.A.2. Design: Time series
         Data sources: Same as above
         Sample or census: Same as above
         Data collection instrument: Same as above
         Data analysis: Same as above, by year
         Comments: Graphic would be good here

  1.B.1. Design: One shot
         Data sources: Program records (MIS)
         Sample or census: Census over past 5 years
         Data collection instrument: Record Retrieval Document 2
         Data analysis: List
         Comments: Check for duplicates such as M. Smith and Mary Smith

  1.B.2. Design: Same as above
         Data sources: Same as above
         Sample or census: Same as above
         Data collection instrument: Same as above
         Data analysis: Frequency count
         Comments: Note that participants can receive more than one support service

  1.C.   Design: One shot
         Data sources: Program records (MIS)
         Sample or census: Census over past 5 years
         Data collection instrument: Record Retrieval Document 2
         Data analysis: Frequency count
         Comments: Graphic

  1.D.   Design: Time series
         Data sources: Labor Ministry Annual Reports on Short-, Mid-, and Long-Term Labor projections
         Sample or census: Reports for each of past 5 years
         Data collection instrument: Record Retrieval Document 3
         Data analysis: Trend analysis and forecast for each certification area offered over the past 5 years
         Comments: Note changes in trends and the program's responsiveness to them; note whether there were potential growth areas in which the program did not offer training

  2.A.1. Design: Time series
         Data sources: Program records (MIS)
         Sample or census: For each of past 5 years
         Data collection instrument: Record Retrieval Document 1
         Data analysis: Frequency counts by gender; present as line chart so the trend over 5 years is clear; compare to standard
         Comments: Show the standard as a heavy black line across the line chart so it is easy to see

  2.A.2. Design: Same as above
         Data sources: Same as above
         Sample or census: Same as above
         Data collection instrument: Same as above
         Data analysis: Same as above
         Comments: Note if there were changes over time

The Road to Results: Designing and Conducting Effective Development Evaluations

Part 2. Questions, sub-questions, type, measures or indicators, target or standard (if normative), and baseline data (questions 3-5).

Question 3. Was the program effective?
• 3.A. To what extent were the annual job placement targets met or exceeded? Type: normative. Measures or indicators: job placement rates by year. Target or standard: yes; 80% of those completing the program. Baseline data: NA.
• 3.B. To what extent were the annual average placement wage rates met or exceeded? Type: normative. Measures or indicators: job placement wages for each year. Target or standard: yes; $2 per hour in years 1-3 and $3 per hour in years 4 and 5. Baseline data: NA.
• 3.C. To what extent were participants placed in jobs that matched their certification areas? Type: descriptive. Measures or indicators: trainee certificate area and job placement area. Target or standard: implicit standard only, so treated as a descriptive question. Baseline data: NA.
• 3.D. What was the program's drop-out rate? Type: normative. Measures or indicators: number entering the program each year and number graduating each year. Target or standard: program documents indicate it should not be more than 10%. Baseline data: NA.

Question 4. Was the program cost-efficient?
• 4.A. Was the program cost per participant reasonable in relation to similar programs? Type: descriptive. Measures or indicators: cost per placed trainee compared with other similar training programs. Target or standard: implicit standard only, so treated as a descriptive question. Baseline data: NA.

Question 5. To what extent was instructor turnover a problem?
• 5.A. What was the turnover rate of instructors? Type: descriptive. Measures or indicators: instructor turnover rate annually and in total. Target or standard: none set; the implicit standard is that it should be low. Baseline data: NA.
• 5.B. How long did instructor positions stay vacant? Type: descriptive. Measures or indicators: average length of vacancies and range. Target or standard: none set; the implicit standard is that it should be low. Baseline data: NA.
• 5.C. Were equally qualified instructors found as replacements? Type: descriptive. Measures or indicators: years of teaching experience; years working in the area of certification. Target or standard: none set; the implicit standard is that replacements should be comparable. Baseline data: NA.

Part 2 (continued). Questions, sub-questions, type, measures or indicators, target or standard (if normative), and baseline data (questions 6-7).

Question 6. To what extent were trainee dropouts a concern?
• 6.A. What were the numbers and percentages of males and females dropping out of the program each year? Type: descriptive (proportions by gender). Measures or indicators: number and proportion of males/females starting the program by year and dropout rates by year. Target or standard: background documents indicate 10% is acceptable; it is implicit that the rate would be the same for each gender. Baseline data: NA.
• 6.B. What were the certification areas from which they dropped out? Type: descriptive. Measures or indicators: the above, by certification area. Target or standard: NA. Baseline data: NA.
• 6.C. What were the common reasons for dropping out of the program? Type: descriptive. Measures or indicators: most frequent reasons for dropping out. Target or standard: NA. Baseline data: NA.
• 6.D. How concerned were program officials about drop-out rates? Type: descriptive. Measures or indicators: awareness of drop-out rates; opinion on whether they are a problem; actions taken. Target or standard: none specified. Baseline data: NA.

Question 7. To what extent do trainees placed in jobs earn more than they would have absent the training program?
• 7.A. What are the job retention, salary increase, promotion, and firing rates of placed participants compared with others with similar characteristics who did not participate in the training program? Type: cause and effect. Measures or indicators: placed participants' job retention, starting salary, salary increases, and promotion and firing rates over two years compared with (i) others hired by the firms for similar positions over a comparable period, (ii) pre-program earnings, and (iii) earnings of program drop-outs. Target or standard: none specified. Baseline data: yes, on placement wages.
• 7.B. What are employers' views of the performance of placed employees compared with others hired with similar characteristics who did not receive the training? Type: cause and effect. Measures or indicators: (i) likelihood of hiring training participants absent the program, (ii) willingness to hire more program graduates, and (iii) views on job performance compared with others. Target or standard: none specified. Baseline data: initial placement wages and pre-training wage, if any.

Part 3. Design, data sources, sample or census, data collection instrument, data analysis, and comments (questions 3-5).

• 3.A. Design: time series; one shot. Data sources: trainee records by year for each of 5 years (MIS); trainee records for the last year (MIS); employers/employers' records. Sample or census: census of those placed, by year; random sample of employers. Data collection instruments: Record Retrieval Form 4; Employer Interview Guide; Employer Record Form 1. Data analysis: comparison to the standard each year and cumulatively across the 5 years; match of MIS information with employer information. Comments: need to validate the information in the records by confirming placements and starting rates with employers for a sample of trainees; recall will likely be a problem for the past two years.
• 3.B. Design: time series; one shot. Data sources: trainee records by year for each of 5 years (MIS); trainee records for the last two years (MIS); employers/employers' records. Sample or census: census of those placed, by year; random sample of employers. Data collection instruments: Record Retrieval Form 4; Employer Interview Guide; Employer Record Form. Data analysis: comparison to the standard each year and cumulatively across the 5 years; match of MIS information with employer information. Comments: same validation and recall concerns as for 3.A.
• 3.C. Design: one shot. Data sources: trainee records for the past 2 years showing certification area (MIS); employers hiring trainees over the past 2 years. Sample or census: census of those placed; random sample of employers. Data collection instruments: Record Retrieval Form 4; Employer Interview Guide; Employer Record Form 1. Data analysis: frequency count of matches by year and cumulatively.
• 3.D. Design: time series. Data source: trainee records by year for each of 5 years (MIS). Sample or census: census. Data collection instrument: Record Retrieval Form 4. Data analysis: comparison to the standard each year and cumulatively.
• 4. Design: one shot. Data sources: program financial office; program financial statements; placement rates (see 3.A); existing evaluations of similar training programs. Sample or census: census of all 5 years of financial statements. Data collection instruments: interviews; literature review. Data analysis: cost per participant; cost per participant placed; content analysis. Comments: hope to be able to compare cost per placed trainee with that of other similar training programs.
• 5.A. Design: time series. Data sources: program employment records; program financial records. Sample or census: all 5 years. Data collection instrument: Record Retrieval Form 5. Data analysis: frequency counts, range, and average.
• 5.B. Design: time series. Data sources and sample: same as above. Data collection instrument: Record Retrieval Form 5. Data analysis: frequency counts, range, and average.
• 5.C. Design: time series. Data source: program employment records (c.v.s). Sample or census: all 5 years. Data collection instrument: Record Retrieval Form 6. Data analysis: comparisons of staff c.v.s across the 5 years.

Part 3 (continued). Design, data sources, sample or census, data collection instrument, data analysis, and comments (questions 6-7).

• 6.A. Design: time series. Data source: trainee records by year for each of 5 years. Sample or census: census, by year. Data collection instrument: Record Retrieval Form 4. Data analysis: frequency distribution of dropouts by year; percent dropping out by year, by gender, and in total. Comments: might run a test of significance (chi-square?).
• 6.B. Design: time series. Data source: trainee records by year for each of 5 years. Sample or census: census, by year. Data collection instrument: Record Retrieval Form 4. Data analysis: cross-tabulation of certification areas by frequency of program dropout, annually and cumulatively.
• 6.C. Design: one shot. Data sources: trainee records for the past two years; training program officials; program drop-outs. Sample or census: census of program dropouts for the two-year period; senior officials. Data collection instruments: former participant survey; Program Officials Interview Guide. Data analysis: frequency distribution of reasons for dropping out of the program from (i) participants' perspectives and (ii) training program officials' views. Comments: triangulate.
• 6.D. Design: one shot. Data sources: trainee records for the past two years; training program officials. Sample or census: census of program dropouts for the two-year period; senior officials. Data collection instrument: Program Officials Interview Guide. Data analysis: content analysis and frequency counts.
• 7.A. Design: quasi-experimental, non-equivalent groups. Data sources: employers and employer records. Sample or census: census for the prior two years. Data collection instruments: Employer Interview Guide; Employer Record Form 1; former participant survey. Data analysis: content analysis and frequencies. Comments: no comparison group was formed at program initiation, so proxies need to be used; note the cause-and-effect design limitations.
• 7.B. Design: one shot. Data sources: employers and employer records. Sample or census: census for the prior two years. Data collection instruments: Employer Interview Guide; Employer Record Form 1. Data analysis: content analysis and frequencies. Comments: employee performance evaluations are confidential; no access.
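The completed matrix above is normally maintained as a table in a planning document. Purely as an illustration (the field names, file name, and code below are not prescribed by this text), a minimal Python sketch of keeping the same rows in a machine-readable form, so they can be sorted, filtered, or shared, might look like this:

```python
# Hypothetical sketch: store design-matrix rows as dictionaries and export to CSV.
import csv

FIELDS = ["question", "sub_question", "type", "measures", "target_or_standard",
          "baseline_data", "design", "data_sources", "sample_or_census",
          "instruments", "analysis", "comments"]

rows = [
    {
        "question": "1. What services did the program provide to whom?",
        "sub_question": "1.A.1. In what vocational skill areas were participants trained?",
        "type": "Descriptive",
        "measures": "Vocational skill areas offered to trainees for certification",
        "target_or_standard": "NA",
        "baseline_data": "NA",
        "design": "One shot",
        "data_sources": "Program records (MIS); program director",
        "sample_or_census": "Each of past 5 years",
        "instruments": "Record Retrieval Document 1; Program Officials Interview Guide",
        "analysis": "Frequency count; content analysis",
        "comments": "Data sources should match; note and explain any discrepancy",
    },
    # ...one dictionary per sub-question...
]

with open("design_matrix.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```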

Chapter 10 Activities

Application Exercise 10-1: Affinity Diagram Process

Instructions: Obtain three or four fairly substantial newspaper articles about an issue in the area where you work. Read each article to determine its major themes. If you are working with others on this chapter, have them do the same and then use the affinity diagram process together to identify themes. (If you are working alone, skip this exercise and go on to Exercise 10-2.)

You will be given small pieces of paper. Write one theme or idea on each piece of paper; this is done in silence. Once you have written all your themes, stick the pieces of paper randomly on the wall, and have the others add theirs. Still in silence, the group begins to arrange the pieces of paper into groups of similar ideas. Once the groupings begin to take shape, people may talk as they continue to rearrange the pieces, discussing until they reach consensus on which pieces belong where. The large group then identifies and names the themes.


Application Exercise 10-2: Qualitative Data Coding and Analysis

Instructions: Obtain three or four fairly substantial newspaper articles about an issue in the area where you work. Read each article to determine its major themes. If you are working by yourself and have access to a computer, set up a grid in Excel (or another spreadsheet program, or even a large handwritten paper grid if you prefer). In the top left cell write "article," and at the top of the second column write "excerpt." Enter each excerpt from the articles, one per row, in the "excerpt" column. Identify themes from the articles, enter them as additional column headings, and mark the appropriate cell wherever a particular excerpt contains that theme. Finally, write a narrative summarizing the findings from the three articles.
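If you prefer to work programmatically, a minimal sketch of the same coding grid using Python and pandas is shown below; the excerpts and theme names are invented placeholders, not part of the exercise:

```python
# Hypothetical coding grid: one row per excerpt, one 0/1 column per theme.
import pandas as pd

excerpts = pd.DataFrame({
    "article": ["Article 1", "Article 1", "Article 2", "Article 3"],
    "excerpt": [
        "Clinic staff report longer waiting times since the reform.",
        "Local officials say funding arrived late in both districts.",
        "Parents praise the new school feeding scheme.",
        "Funding delays again cited by district health officers.",
    ],
})

themes = ["service delivery", "funding delays", "community views"]
for theme in themes:
    excerpts[theme] = 0  # mark 1 where the theme appears in the excerpt

# Manual coding step: mark cells after reading each excerpt.
excerpts.loc[[0], "service delivery"] = 1
excerpts.loc[[1, 3], "funding delays"] = 1
excerpts.loc[[2], "community views"] = 1

# Theme frequencies by article and overall support the narrative summary.
print(excerpts.groupby("article")[themes].sum())
print(excerpts[themes].sum())
```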


Application Exercise 10-3: Common Mistakes in Interpreting Quantitative Data

1. A survey asked participants to report their perceptions, and 80% said the program was helpful. Which is the better way to report the findings: (a) "The program is helpful" or (b) "The participants found the program helpful"?
2. The respondents were asked to identify the barriers to and supports for the program. What is the problem with reporting the results in terms of "pros" and "cons"?
3. A survey asked students to rate various components of the course, and most rated each of the components positively. What is the problem with writing that "most (70%) of the students felt the course was a success"?
4. Forty percent of the women and 30% of the men favored curriculum changes. Is it accurate to report the findings as "a majority of women favored curriculum changes"?
5. Fifty-one percent favored changing the curriculum. Is it accurate to say "more than half of the respondents said..."?
6. The survey was completed by 5 of the 20 instructors. All five agreed that they were well prepared. Is it accurate to say "all of the instructors were well prepared"?
7. In the same survey, is it accurate to report that "eighty percent of the participants said the materials were organized"?
8. Is it accurate to report a 100% increase of women in political office when the actual number increased from 2 to 4 out of 50 elected positions? (See the arithmetic sketch after this exercise.)
9. A training program found that those who participated in the program earned 20% more than those not in the program. Is it accurate to report that "the program caused a 20% increase in salary"?
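Question 8 turns on the difference between a relative (percentage) change and a change in percentage points. A small arithmetic check, using only the numbers given in the question, is sketched below:

```python
# From 2 to 4 women out of 50 elected positions: a 100% relative increase,
# but only a 4 percentage point change in the share of positions held by women.
before, after, total_positions = 2, 4, 50

relative_increase = (after - before) / before * 100   # 100.0 %
share_before = before / total_positions * 100          # 4.0 %
share_after = after / total_positions * 100             # 8.0 %
point_change = share_after - share_before               # 4.0 percentage points

print(f"Relative increase: {relative_increase:.0f}%")
print(f"Share of positions: {share_before:.0f}% -> {share_after:.0f}% "
      f"({point_change:.0f} percentage points)")
```

Both statements are arithmetically true; the reporting question is which one gives readers an accurate sense of the change.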


Application Exercise 10-4: Analyzing Results from a Questionnaire

Instructions: Complete the survey below, and have two colleagues who are also working through these training chapters complete it as well. (If no one else is working through these chapters, have colleagues complete question 1.) Collect the surveys and tally the results. Working by yourself or with others, summarize all the results and include some conclusions about the overall findings. (A brief tally sketch follows the survey form.)

IPDET Survey (ID: _______)

1. To what extent, if at all, would you say that you currently have the analytic capacity to do each of the following?
(Rate each item: little or no extent, some extent, moderate extent, great extent, very great extent.)
a. Design an evaluation
b. Analyze data
c. Develop a survey
d. Conduct a focus group
e. Facilitate a stakeholders' meeting
f. Write an evaluation report
g. Prepare an oral briefing

2. At this point in this training program, how strongly would you agree or disagree with each of the following statements?
(Rate each item: strongly disagree, disagree, neither, agree, strongly agree.)
a. The material is new to me
b. The material is interesting
c. There is sufficient lecture
d. There is sufficient class discussion
e. The exercises are helpful
f. I am learning material I can use

Any comments?
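The tally can be done by hand or with a short script. A minimal sketch is shown below, assuming (hypothetically) that the responses have been entered into a CSV file with one row per respondent and one column per item; the file and column names are illustrative only:

```python
# Hypothetical tally of Likert-scale survey responses stored in a CSV file
# with columns such as q1a..q1g and q2a..q2f holding the scale labels.
import pandas as pd

responses = pd.read_csv("ipdet_survey_responses.csv")

scale_q1 = ["Little or no extent", "Some extent", "Moderate extent",
            "Great extent", "Very great extent"]
scale_q2 = ["Strongly Disagree", "Disagree", "Neither", "Agree", "Strongly Agree"]

def tally(df, columns, scale):
    """Count responses per scale point for each item, in scale order."""
    counts = {col: df[col].value_counts().reindex(scale, fill_value=0)
              for col in columns}
    return pd.DataFrame(counts).T  # items as rows, scale points as columns

q1_items = [c for c in responses.columns if c.startswith("q1")]
q2_items = [c for c in responses.columns if c.startswith("q2")]

print(tally(responses, q1_items, scale_q1))
print(tally(responses, q2_items, scale_q2))
```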


References and Further Reading

Catterall, M., and P. Maclaran (1997). "Focus group data and qualitative analysis programs: Coding the moving picture as well as the snapshots." Sociological Research Online, vol. 2, no. 1. Retrieved January 16, 2008, from http://www.socresonline.org.uk/socresonline/2/1/6.html

Babbie, E., F. Halley, and J. Zaino (2000). Adventures in Social Research. Thousand Oaks, CA: Pine Forge Press, p. 27.

Busch, Carol, Paul S. De Maret, Teresa Flynn, Rachel Kellum, Sheri Le, Brad Meyers, Matt Saunders, Robert White, and Mike Palmquist (2005). Content Analysis. Writing@CSU, Colorado State University Department of English. Retrieved January 15, 2008, from http://writing.colostate.edu/guides/research/content/

Child and Adolescent Health Measurement Initiative (CAHMI). "Step 4: Monitor survey administration and prepare for analysis." Retrieved January 21, 2008, from www.cahmi.org

Chinese National Centre for Science and Technology Evaluation (NCSTE, China) and Policy and Operations Evaluation Department (IOB, the Netherlands) (2006). Country-led Joint Evaluation of the ORET/MILIEV Programme in China. Amsterdam: Aksant Academic Publishers.

Constable, Rolly, Marla Cowell, Sarita Zornek Crawford, David Golden, Jake Hartvigsen, Kathryn Morgan, Anne Mudgett, Kris Parrish, Laura Thomas, Erika Yolanda Thompson, Rosie Turner, and Mike Palmquist (2005). Ethnography, Observational Research, and Narrative Inquiry. Writing@CSU, Colorado State University Department of English. Retrieved January 18, 2008, from http://writing.colostate.edu/guides/research/observe/

Dehejia, Rajeev H., and Sadek Wahba (2001). Propensity Score Matching Methods for Nonexperimental Causal Studies. Retrieved January 22, 2008, from http://www.nber.org/~rdehejia/papers/matching.pdf

Denzin, N., and Y. Lincoln (eds.) (2000). Handbook of Qualitative Research (2nd ed.). Thousand Oaks, CA: Sage Publications.

Firestone, W. (1987). "Meaning in method: The rhetoric of quantitative and qualitative research." Educational Researcher, 16.


Glass, G., and K. Hopkins (1996). Statistical Methods in Education and Psychology (3rd ed.). Boston: Allyn and Bacon.

Gao, Jie, Michael Langberg, and Leonard J. Schulman (2006). Analysis of Incomplete Data and an Intrinsic-Dimension Helly Theorem. Retrieved January 21, 2008, from http://www.cs.sunysb.edu/~jgao/paper/clustering_lines.pdf

Greene, J. C., V. J. Caracelli, and W. F. Graham (1989). "Prevention effectiveness: A guide to decision designs." Educational Evaluation and Policy Analysis, 11.

Greene, J. C., and V. J. Caracelli (1997). "Defining and describing the paradigm issue in mixed-method evaluation." In Advances in Mixed-Method Evaluation: The Challenges and Benefits of Integrating Diverse Paradigms. New Directions for Evaluation, no. 74, Summer 1997. San Francisco: Jossey-Bass.

Hawkins, Penny (2005). "Thinking about mixed method evaluation." Presentation at IPDET, 2005.

International Development Research Centre (IDRC) (2008). Qualitative Research for Tobacco Control, Module 6: Qualitative Data Analysis. Retrieved February 19, 2008, from http://www.idrc.ca/en/ev-106563-201-1-DO_TOPIC.html

Jaeger, R. M. (1990). Statistics: A Spectator Sport (2nd ed.). Thousand Oaks, CA: Sage Publications.

Krippendorff, Klaus (2004). Content Analysis: An Introduction to Its Methodology (2nd ed.). Thousand Oaks, CA: Sage Publications.

Loughborough University Department of Social Sciences (2007). New Methods for the Analysis of Media Content. CAQDAS: A Primer. Retrieved August 9, 2007, from http://www.lboro.ac.uk/research/mmethods/research/software/caqdas_primer.html#what

Merriam-Webster Online (2008). Definition of "propensity." Retrieved January 22, 2008, from http://www.merriam-webster.com/dictionary/propensity

Miles, Matthew B., and A. Michael Huberman (1994). Qualitative Data Analysis: An Expanded Sourcebook (2nd ed.). Thousand Oaks, CA: Sage Publications.

Morse, Janice M., and Lyn Richards (2002). "The integrity of qualitative research." In Read Me First for a User's Guide to Qualitative Methods, by J. M. Morse and L. Richards. Thousand Oaks, CA: Sage Publications.


Neuendorf, Kimberly A. (2006). The Content Analysis Guidebook Online. Retrieved January 15, 2008, from http://academic.csuohio.edu/kneuendorf/content/index.htm

O'Rourke, Thomas W. (2000a). "Data analysis: The art and science of coding and entering data." American Journal of Health Studies. Retrieved January 21, 2008, from http://findarticles.com/p/articles/mi_m0CTG/is_3_16/ai_72731731

O'Rourke, Thomas W. (2000b). "Techniques for screening and cleaning data for analysis." American Journal of Health Studies. Retrieved January 21, 2008, from http://findarticles.com/p/articles/mi_m0CTG/is_4_16/ai_83076574

Practical Analysis of Nutritional Data (P.A.N.D.A.) (2000). Chapter 2, Data Cleaning. Retrieved January 21, 2008, from http://www.tulane.edu/~panda2/Analysis2/datclean/dataclean.htm

Patton, M. Q. (2002). Qualitative Research and Evaluation Methods (3rd ed.). Thousand Oaks, CA: Sage Publications.

Porteous, Nancy L., B. J. Sheldrick, and P. J. Stewart (1997). Program Evaluation Tool Kit: A Blueprint for Public Health Management. Ottawa, Canada: Ottawa-Carleton Health Department.

Rahm, Erhard, and Hong Hai Do (2000). Data Cleaning: Problems and Current Approaches. Retrieved January 21, 2008, from http://homepages.inf.ed.ac.uk/wenfei/tdd/reading/cleaning.pdf

Rist, R. C. (2000). "Influencing the policy process with qualitative research." In N. K. Denzin and Y. S. Lincoln (eds.), Handbook of Qualitative Research, pp. 1001-1025. Thousand Oaks, CA: Sage Publications.

Rossman, G. B., and B. L. Wilson (1994). "Numbers and words revisited: Being 'shamelessly methodologically eclectic.'" Quality and Quantity, 28.

Sieber, S. D. (1973). "The integration of fieldwork and survey methods." American Journal of Sociology, 78(6).

StatSoft (2008). Electronic Textbook. Retrieved May 30, 2008, from http://www.statsoft.com/textbook/glosp.html

Stemler, Steve (2001). "An overview of content analysis." Practical Assessment, Research & Evaluation, 7(17). Retrieved January 16, 2008, from http://PAREonline.net/getvn.asp?v=7&n=17


Swimmer, Gene (2005). "Qualitative Data Analysis, Part I." IPDET Handbook 2005. Presented at IPDET, 2006.

U.S. General Accounting Office (1996). Content Analysis: A Methodology for Structuring and Analyzing Written Material. GAO/PEMD-10.3.1. Washington, DC. (Available free from the GAO.)

University of the West of England (2006). Analysis of Textual Data. Retrieved May 30, 2008, from http://hsc.uwe.ac.uk/dataanalysis/qualTextData.asp

Washington State University (2000). A Field Guide to Experimental Designs: What Is an ANOVA? Retrieved January 22, 2008, from http://www.tfrec.wsu.edu/ANOVA/basic.html

Weber, Robert Philip (1990). Basic Content Analysis (2nd ed.). Thousand Oaks, CA: Sage Publications.

Weiss, Christopher, and Kristen Sosulski (2003). Quantitative Methods in Social Science (QMSS) e-lessons. Columbia Center for New Media Teaching and Learning (CCNMTL). Retrieved January 22, 2008, from http://www.columbia.edu/ccnmtl/projects/qmss/anova_about.html

Wolcott, H. F. (1990). "On seeking – and rejecting – validity in qualitative research." In E. W. Eisner and A. Peshkin (eds.), Qualitative Inquiry in Education: The Continuing Debate. New York: Teachers College Press.

Web Sites

Online texts and tutorials:
• CAQDAS (2008).
• Lane, D. M. HyperStat Online Textbook. http://davidmlane.com/hyperstat/index.html
• Faculty of Health and Social Care, University of the West of England, Bristol (2006). Data Analysis. http://hsc.uwe.ac.uk/dataanalysis/
• Neuendorf, Kimberly A. (2006). The Content Analysis Guidebook Online. http://academic.csuohio.edu/kneuendorf/content/index.htm
• Practical Analysis of Nutritional Data (P.A.N.D.A.) (2000). Chapter 2, Data Cleaning. Retrieved January 21, 2008, from http://www.tulane.edu/~panda2/Analysis2/datclean/dataclean.htm


• Porteous, Nancy L., B. J. Sheldrick, and P. J. Stewart (1997). Program Evaluation Tool Kit: A Blueprint for Public Health Management. Ottawa, Canada: Ottawa-Carleton Health Department. Available online at http://www.phac-aspc.gc.ca/php-psp/tookit.html (English) or http://www.phac-aspc.gc.ca/php-psp/toolkit_fr.html (French)
• QMSS e-Lessons. Quantitative Methods in Social Sciences. Columbia Center for New Media Teaching and Learning (CCNMTL). http://www.columbia.edu/ccnmtl/projects/qmss/credits.html
• StatSoft, Inc. (2001). Electronic Statistics Textbook. Tulsa, OK: StatSoft. http://www.statsoft.com/textbook/stathome.html
• Statistics at Square One. http://bmj.bmjjournals.com/collections/statsbk/index.shtml

• StatPrimer. http://www2.sjsu.edu/faculty/gerstman/StatPrimer

Computer software for qualitative data:
• American Evaluation Association, Qualitative Software: http://www.eval.org/Resources/QDA.htm
• AnSWR: developer site: http://www.cdc.gov/hiv/topics/surveillance/resources/software/answr/index.htm; free download: http://www.cdc.gov/hiv/software/answr/ver3d.htm

• Atlas-ti: http://www.atlasti.com/
• CDC EZ-Text: developer site: http://www.cdc.gov/hiv/topics/surveillance/resources/software/ez-text/index.htm; free trial download: http://www.cdc.gov/hiv/software/ez-text.htm
• Ethnograph: http://www.qualisresearch.com/default.htm
• Hyperqual: http://home.satx.rr.com/hyperqual/


• Loughborough University, CAQDAS: A Primer: http://www.lboro.ac.uk/research/mmethods/research/software/caqdas_primer.html#what
• Overview/summary of qualitative software programs: http://www.quarc.de/software_overview_table.pdf
• QSR software, N6 (formerly NUD*IST): http://www.qsr.com.au/
• Qualpro: http://www.qualproinc.com/

General site for statistics:
• Interactive Statistical Calculations Pages: http://members.aol.com/johnp71/javastat.html

Statistics computer programs:
• SPSS (free 30-day download): http://www.spss.com
• OpenStat version 4 (similar to SPSS): http://www.statpages.org/miller/openstat/

Online tutorials for SPSS:
• Tutorial for SPSS v. 11.5: http://www.datastep.com/SPSSTraining.html/
• Getting Started with SPSS for Windows: http://www.indiana.edu/~statmath/stat/spss/win/

Sites with examples:
• U.S. Census Bureau, International Programs Center: http://www.census.gov/ipc/www/idbnew.html
• Carleton University, Canadian Foreign Policy (journal), The WWW Virtual Library: http://www.carleton.ca/npsia/cfpj
• World Health Organization (WHO): http://www.who.int/health-systems-performance
• WHO Statistical Information System (WHOSIS): http://www.who.int/topics/statistics/en/
• The North-South Institute: http://www.nsi-ins.ca/ensi/research/index.html
• United Nations Department of Economic and Social Affairs, Statistics Division: http://unstats.un.org/unsd/databases.htm
• United Nations Development Program, Human Development Report 2002: http://www.undp.org/hdr2002


• United Nations Environment Programme, Division of Early Warning and Assessment (DEWA): http://www.grid.unep.ch
• United Nations High Commissioner for Refugees, Statistics and Research/Evaluation: http://www.unhcr.ch/cgi-bin/texis/vtx/home
• UNESCO Institute for Statistics: http://www.uis.unesco.org/en/stats/stats0.htm
• The United States Agency for International Development: http://www.usaid.gov/educ_training/ged.html
• The United States Agency for International Development: http://www.dec.org/partners/eval.cfm
• International Monetary Fund: http://www.imf.org/external/pubs/res/index.htm
• International Monetary Fund: http://www.imf.org/external/np/sta/index.htm
• World Bank Data and Statistics: http://www.worldbank.org/data
• World Bank World Development Indicators 2005: http://worldbank.org/data/wdi2005
• Organisation for Economic Co-operation and Development, Development Indicators: http://www1.oecd.org/dac/
• International Institute for Sustainable Development, Measurement and Indicators for Sustainable Development: http://www.iisd.org/measure/default.htm


Leading

"When we do the best we can, we never know what miracle is wrought in our life, or in the life of another."
Helen Keller

Chapter 11: Presenting Results
• Communication Basics
• Writing Evaluation Reports for Your Audience
• Using Visual Information
• Making Oral Presentations
• Peer Review and Meta-evaluation

Chapter 12: Managing for Quality and Use
• Managing the Design Matrix
• Managing an Evaluation
• Managing Effectively
• Assessing the Quality of an Evaluation
• Using Evaluation Results

Chapter 13: Evaluating Complex Interventions
• Big Picture Views
• Country Program Evaluations
• Thematic Evaluations
• Sector Program Evaluations
• Joint Evaluations
• Global and Regional Partnership Program (GRPP) Evaluations
• Evaluation Capacity Development


Chapter 11: Presenting Results

Introduction

Once data collection and analysis are largely complete, it is time to share preliminary results and finalize plans for communicating the final results. Sharing what was learned is one of the most important parts of an evaluation; it is a critical precondition for effecting change. Results can be presented in writing, through memos and reports, or verbally, through briefings and presentations. This chapter has five parts:

• Communication Basics
• Writing Evaluation Reports for Your Audience
• Using Visual Information
• Making Oral Presentations
• Peer Review and Meta-evaluation


Part I: Communication Basics

An evaluation that is not used to inform decisions is of little value. When designing an evaluation, it helps to begin with the end (the ultimate goal) in mind: providing stakeholders with useful information that leads to decisions about the program, such as funding, accountability, and learning. This is a key difference between research and evaluation: evaluation is not knowledge for knowledge's sake. It is therefore essential that the results of an evaluation be communicated clearly, accurately, and appropriately so that the audience(s) can make use of the information.

A communication strategy is an essential component of development evaluation. It is helpful not only to involve the main stakeholder(s) in planning the evaluation but also to engage them in developing the process and frequency of feedback and communication. Throughout this text we have emphasized that good communication starts at the very beginning and continues throughout the evaluation; it is not just an activity that takes place at the end. Always remember:

• the goal is to communicate, not to impress
• make it easy for the reader to get your point
• keep the purpose and audience(s) in mind.

Choose words and visuals wisely. The real meaning of an evaluation report lies not in the writer's words or visuals themselves but in the mind of the audience. For successful communication, the words used in the message must mean the same thing to both the writer and the reader. People interpret messages on the basis of their past experiences and perceptions, so different people can understand the same words or visuals differently. For this reason, learn as much as possible about the audience and write the report in a style that will best communicate with them. Try to put yourself in the audience's place: how would you like to receive this message if you were in the audience? Have you explained everything clearly, or have you taken too much for granted?


Use words that are:
• simple
• active
• positive
• short and concise
• conversational
• familiar
• direct
• culturally sensitive.

Communication Strategy

The point of evaluation is to provide knowledge that can support decision making, such as policymaking, program changes, or program replication. When planning an evaluation, develop a communication strategy. This strategy should identify who needs to receive the results of the evaluation, in what format, and when. A communication plan will likely use several different communication tools: a donor, for example, might want an in-depth formal report, the local program staff an overview report with a briefing, and the participants themselves a presentation. Use a checklist such as the one in Table 11.1 to organize the communication tools needed.

Table 11.1: Checklist for Communication Strategy

Audience | Product | Who is responsible | Due date
Donor | Formal report | Team leader | 6/1
Advisory board | Oral briefing | Team member A | 6/4
Local stakeholders | Executive summary; oral briefing | Team member B | 6/8
Program staff | Copy of formal report; executive summary | Team member C | 6/11
Local government officials | Oral briefing | Team leader | 6/15
Participants | Oral briefing | Team leader | 6/15
Development evaluation community | Article for publication | Team leader | 8/1


If possible, develop the communication strategy before the evaluation begins to ensure that everyone involved understands what will be required and what will be provided. During the evaluation, make sure that everyone is kept informed of its progress, using informal communications such as phone calls, e-mails, faxes, and conversations. For the final report, choose among communication vehicles such as briefings, presentations, and written reports. Press releases are often used to disseminate information to a wider audience. If a press release is planned, its timing should be discussed with the main stakeholder(s), along with who will be responsible for the release; the same holds true for press conferences and media requests. Be sure to include a feedback process that brings stakeholders and evaluators together to discuss the findings, insights, alternative actions, and next steps. If the evaluation plans to use large group discussions, consider all the stakeholders connected with the program and identify any challenges in communicating evaluation results to different stakeholders.

Innovative Communication Strategies

Torres, Preskill, and Piontek (1997) surveyed internal and external evaluators' communication and reporting practices. They found that most evaluators wrote final technical reports, but some thought these reports were not effective in promoting the use of evaluation findings. Insufficient time was identified as the most common factor hindering effective reporting. They suggested using alternative communication tools that take less time to create, including brochures, shorter summaries with charts and graphs, and short memos summarizing findings.

Lawrenz, Gullickson, and Toal (2007) studied ways of disseminating evaluation results so that they are more likely to be used. They considered new ways of disseminating information that might better meet the needs of stakeholders, describing a reflective case narrative that used alternative dissemination tools, including the Internet, to better match the needs of those who might use the results of their study. The case narrative described site visits to 13 different projects by teams of evaluators. As they prepared the case study, they looked not only at the needs of the client (the organization requesting the evaluation) but also at others who might benefit from the information the evaluation uncovered.


The evaluation committee thought a traditional case study would not be helpful in addressing important issues surrounding the program. They decided instead to organize the important information into separate issue papers for different audiences. The purpose of these issue papers was to "synthesize the site visit reports and survey data, along with existing research" (Lawrenz et al., 2007, p. 284) into separate documents, each highlighting a different issue. These were posted on the project's Web site. The researchers also decided that the site visit handbook, containing the procedures used to conduct the site visits, might be useful to others; they posted it on their Web page and found that many organizations and other researchers were interested in it.

The evaluators created a small overview brochure as a "teaser to build interest in the issue papers" (Lawrenz et al., 2007, p. 286). The brochure stirred up interest in the papers and led to publication of the whole set of issue papers in a journal, disseminating the results further than originally planned. They also developed a three-fold brochure with key action steps for the sustainability of the program they studied, saved it in PDF format, and posted it on their Web site. This brochure was well received by the field because it was easy to use and provided pertinent information about improving projects.

Another dissemination technique was a videoconference that provided more in-depth discussion of the ideas about sustainability presented in the issue papers. The final technique was an electronic synthesis, which brought the videoconference proceedings together with other information about sustainability by hyperlinking them to a single document. This enabled people who had not attended the conference to learn from the study. The synthesis included links to information about the authors of the study, material from key presenters in the videoconference, and supporting documents and video materials. It was posted on the Web site and was also available on compact disc. These innovative approaches to information sharing were chosen to fit the needs of the stakeholders and to increase the likelihood of the results being used (Lawrenz et al., 2007, p. 288).


Part II: Writing Evaluation Reports for Your Audience

The following are keys to writing a good evaluation report:

• Keep it simple.
• Avoid acronyms.
• Provide enough information about the research design and methods so that others can judge their credibility.
• Be clear about the limitations of the evaluation, and caution the audience about interpreting the findings in ways that may not be valid.
• Place technical information in an appendix, including the design matrix and any survey instruments or questionnaires used.
• Limit background information to what is needed to introduce the report and to make clear that you understand its context; additional context can be included as an annex, if necessary.
• Organize the report material into sections that address major themes or answer each of the key research questions.
• Place major points first in each section and minor points later; open each paragraph by stating the point it addresses.
• Leave time to revise, revise, and revise!
• Support conclusions and recommendations with evidence.
• Find a proofreader for the draft who has not seen any of the material before, ideally a detail-oriented person who will make sure every "i" is dotted and every "t" is crossed, and ask the proofreader to identify anything that was left out or is not clear.
• If possible, ask a colleague who is familiar with the evaluation process to review the final draft (a "peer review") and suggest any final changes before the report is presented.


Writing the Evaluation Report

An evaluation report usually contains an executive summary followed by the remainder of the report (which we call the body of the report). The report should be formatted so that readers can quickly find the information they want.

Indented Text, Headings to the Left

One way to draw attention to key findings is to indent the narrative text, with headings standing out. In the report, indent (space over) narrative text like this paragraph and keep headings tight to the left margin, where they are visible, so the reader can easily scan the report. Keep in mind that the headings also appear in the table of contents at the beginning of the report. Use descriptive headings for chapter and section titles; many readers scan the table of contents to gain a quick sense of the report from its chapter and section headings.

The Executive Summary

An executive summary provides a quick overview of the study: the issues studied, the questions asked, the methods used, and a brief summary of the findings and recommendations. It allows the reader to grasp the major highlights and points quickly. The executive summary is not just a condensed version of the conclusions section, nor should it be a "teaser" that promises to reveal information later. It should work as a stand-alone document that can be published as a summary. The client should be able to read the executive summary, know all the basic facts about the evaluation, and easily find the supporting data for those facts using the table of contents. According to Michael Scriven (2007, p. 1), the aim of the executive summary is to summarize the results, not just the process. Throughout the evaluation, keep asking yourself how the overall summary is going to look based on what you have learned so far and how it relates to the needs of the client, stakeholders, and audiences; this helps you focus on what still needs to be done to learn about what matters most. Check whether the main client has a preferred format for the executive summary and, if so, use it.


Length

The executive summary should be short: two pages is great, and more than four is too much. Let us emphasize this again: two pages is great, and more than four pages is too much!

Basic Components of an Executive Summary

• Brief overview or introductory paragraph, stating:
  − the purpose of the study and the situation or issue of concern, written in such a way as to grab the reader's attention, if possible.
• Description of the evaluation, stating:
  − the major questions addressed, plus a brief statement about how the evaluation was conducted.
• Background information, providing:
  − only enough information to place the study in context.
• Summary of major findings:
  − ensure that the major findings relate to the purpose or evaluation questions as stated in the introduction; use your judgment about what the audience would think is most important
  − present individual findings in bullet format or in a narrative
  − use simple, clear, jargon-free language
  − refer readers to the text or to an appendix for more detail, especially technical detail
  − refer readers to the page numbers in the text that correspond to the information in the executive summary.
• Major conclusions and recommendations:
  − key conclusions and recommendations should clearly relate to the findings
  − present the evidence that supports each conclusion or recommendation.

Keep in mind that there is no single format for an executive summary; it will depend upon the audience.


The Body of the Report

The body of an evaluation report should contain the following components, usually divided into chapters (or sections of a shorter report):

• introduction
• description of the evaluation
• findings
• conclusions
• recommendations.

Introduction

The introduction to the report discusses the purpose of the report and the questions it will answer. It should create interest, using what writers call a "hook" (a way of attracting attention or interest) that draws the reader into the body of the report. An introduction typically includes the following components:

• purpose of the report
• background of the program
• program goals and objectives
• evaluation questions and goals.

Description of the Evaluation

After the introduction, present a brief description of the evaluation, including the following components:

• evaluation focus
• evaluation design
• evaluation questions
• methodology and strategy for analysis
• limitations of the methodology
• who was involved in the evaluation and the time frames.

Further technical details should be placed in an appendix.


Findings

Now that the audience has the "big picture" of the evaluation, present the findings. Be sure to:

• present data so that the audience can understand them
• present data selectively: what are the most important points?
• organize the findings around study questions, major themes, or program components
• use charts, tables, or other illustrations to help highlight the major points.

Conclusions and Recommendations

The final part of the report presents the conclusions and recommendations. Drawing conclusions and making recommendations is the main goal of writing the report, and this is often the part readers turn to in order to understand the meaning of the whole report.

Conclusions

Writers often have difficulty distinguishing findings from conclusions.

• Findings describe what was found in the evaluation; they answer the question "What did I learn from the evidence gathered?" The findings may relate to whether a criterion has or has not been met. Findings should be supported by persuasive evidence; in some cases they may be nearly certain, while in others a degree of professional judgment is needed.
• Conclusions are summary opinions drawn from the findings. They rest to a greater degree on professional judgment and relate to the evaluation's higher-level objectives. Conclusions should be made against each evaluation sub-objective and the overall objective (Office of the Victorian Privacy Commissioner, 2007, p. 11).

Conclusions are only a summary discussion of the evaluation; they should not advocate action (that is the role of recommendations). Conclusions must connect the findings to the evaluation questions or evaluation focus by clearly stating the key evidence that supports them, and they must be based on the evidence presented in the body of the report. Drawing conclusions is a goal of the report, and many readers go directly to the conclusions to draw meaning from the whole report.

When writing conclusions, consider the following (Druker, 2006, Conclusions):

• Keep conclusions short (usually a couple of paragraphs).
• Write in simple terms; do not use jargon or many technical terms.
• Emphasize what the report means:
  − focus on the main results and what they mean
  − bring the analyses of the results together
  − interpret the overall meaning of the results for the reader
  − explain the inferences you want readers to draw from the report.
• Add no new details.
• Do not merely summarize the report.

Scriven (2007, pp. 15-20) suggests considering five sections in the conclusions:

• overall significance
• recommendations and explanations (possible)
• responsibility and justification (possible)
• report and support
• meta-evaluation.

For overall significance, Scriven suggests combining what he calls sub-evaluations (process, outcomes, costs, comparisons, and generalizability) into an overall evaluation or profile. The focus should usually be the present and future impact on consumers' needs, subject to the constraints of ethics and the law, and it should also consider feasibility and all the other relevant values. Usually there will also be some conclusions that address the main client's and other stakeholders' needs for information; if feasible, they may also consider their wants or hopes, for example the goals met and any unrealized value, if calculable. According to Scriven, recommendations and explanations are not always relevant or feasible and very often require extra time and/or cost. The merit, worth, or significance of a program is often hard enough to determine; working out how to improve it, why it works or fails to work, and what one should do with it are separate tasks.


Similar to recommendations and explanations, Scriven describes responsibility and justification as not always relevant or feasible and as often requiring extra time and/or cost. Responsibility and justification, where they exist, can be determined and should be considered; some versions of accountability that stress the accountability of individuals require it. Evaluating disasters, for example, recently an area of considerable activity, usually involves determining responsibility. Allocating blame or praise requires extensive knowledge of:

• the main players' knowledge-state at the time of key decision making
• their resources and responsibilities
• an ethical analysis of their options, and the excuses they may have.

Not many evaluators have the qualifications to do this kind of analysis. The "blame game" is very different from evaluation in most cases and should not be undertaken lightly. Still, sometimes mistakes are made, are demonstrable, and have major consequences, and they should be pointed out. Sometimes justified choices with good or bad effects are made and attacked, and they should be praised or defended as part of an evaluation. Scriven describes report and support as an appropriate way of conveying conclusions. Reports should include post-report help, such as:

• handling questions when they turn up, immediately or later
• explaining the report's significance to different groups, including users, staff, funders, and other people affected by the intervention.

Scriven also discusses the importance of trying to identify "lessons learned" and of getting the results and incidental knowledge findings:




• entered into the relevant databases, if there are any
• included, if possible, in a journal publication
• entered into a newly created database or information channel, such as a newsletter, where beneficial
• disseminated through wider channels if appropriate, such as presentations, online discussions, scholarly meetings, or hard-copy posters.


Scriven's final section under conclusions is the role of meta-evaluation. Meta-evaluations are done to identify the strengths, limitations, and other uses of evaluations; they are discussed further near the end of this chapter.

Recommendations

The evidence supporting each recommendation must be presented, or page references given. Recommendations are often difficult to draft. They should not be so prescriptive that they remove management's prerogative to remedy the problem as it sees fit, yet they cannot be so general that there is no clear sense of how one would know whether they were implemented. They should be clear and specific, identifying who needs to take action and when. Evaluators do not specify how to implement recommendations; it is the job of management to resolve issues and implement them. For example, a recommendation may be to develop a pricing policy for technical assistance activities, but it is not the job of the evaluator to draft the policy or specify its content. In the recommendations section, answer these questions:

• What do you want the key stakeholder/user to do?
• What action(s) should be taken?

To make recommendations work:

• Base recommendations on conclusions.
• Keep them simple.
• Use a list for emphasis if there are two or more recommendations.
• Keep the number of major recommendations small: four or five at most. Do not make a long "laundry list" of recommendations; it is better to group them into a manageable number (four to six), with sub-parts as needed.
• Consider tone: remember that reports do not make decisions; people do.


Recommendation Tracking System

Recommendations serve little purpose if they are not acted upon. One way to encourage follow-up is to establish a recommendation tracking system (RTS). An RTS allows stakeholders to monitor the implementation of evaluation recommendations. It tracks each recommendation from an evaluation and the progress made in implementing it, including:

• date of the recommendation
• who is responsible for taking action
• response/progress.

Table 11.2 shows a simple matrix that can be used for a recommendation tracking system.

Table 11.2: Recommendation Tracking System (RTS)

Recommendation | Date | Who is responsible | Response/Progress
1. | | |
2. | | |
3. | | |
4. | | |

In this matrix, the evaluators can fill out the first two columns, but managers have to keep track and ensure that recommendations are followed up on by whomever they designate.

The International Finance Corporation (IFC) has its own RTS. Evaluation studies prepared by IFC's Independent Evaluation Group (IEG) include recommendations for IFC Management, together with IFC's Management Response. These recommendations are discussed by the IFC Board's Committee on Development Effectiveness (CODE), which expects periodic status reports on each recommendation, including its status and level of adoption. To track these recommendations, IEG and IFC developed an RTS called the Management Action Tracking Record (MATR, pronounced "matter").


The MATR is designed to maintain the integrity of the reporting process: neither IEG nor IFC can change finalized ratings. Figure 11.1 illustrates the two stages of the MATR. In the first stage, IEG and IFC agree on indicators to assess the implementation of each new recommendation. In the second stage, the status and level of adoption of each active recommendation is periodically updated and reported to CODE. IEG and IFC ratings need not be the same. When recommendations are implemented, superseded, or no longer relevant, they are made inactive. IEG recommendations that are not accepted by IFC management are not tracked.

Figure 11.1: IFC's Two-Stage MATR to Track Recommendations (stage one: indicator cycle; stage two: monitoring cycles)


Summary of Report Writing

Table 11.3 summarizes general guidelines for writing reports.

Table 11.3: Summary of General Guidelines for Writing Reports

1. Keep the report simple, clear, and easy to understand. (Remember: writing a long and confusing report is a hostile act against the reader!)
2. Avoid using acronyms and jargon.
3. Provide the minimum of background needed to establish the context.
4. Present the most important material.
5. Place major points at the beginning of a section.
6. Organize your findings and recommendations (if any) around research questions or themes.
7. Put detailed data analysis material in a technical appendix.
8. Leave time to revise, revise, and revise.
9. Have a proofreader review a preliminary draft of the report.
10. Have a knowledgeable reader review the draft.

Case 11-1 shows an example from the ORET/MILIEV Programme in China. It lists the planned products that were part of the evaluation; notice that different audiences receive different report products based on their needs.


Case 11-1: The ORET/MILIEV Programme in China

The plan for this evaluation included the following products as a part of the evaluation (Chinese National Centre for Science and Technology Evaluation, 2006, pp. 166-167):
•	Main evaluation written report
•	Field study report and desk study report
•	Results from the stakeholder dialogue approach workshops
•	Annexes:
	−	glossary of terms
	−	terms of reference
	−	additional statistical tables and background materials
	−	notes on the evaluation methodology
	−	evaluation process (procedures, persons interviewed)
	−	bibliographic references
	−	database for the programme
	−	the Policy and Operation Evaluation Department (IOB)
	−	National Centre for Science and Technology Evaluation (NCSTE).

The main evaluation report will be produced in both Chinese and English. The evaluation report will be officially presented to:
•	Dutch parliament
•	National People's Congress of China
•	Ministry of Finance of China
•	Ministry of Foreign Affairs of the Netherlands.

Copies of the report will also be disseminated to:
•	State Development and Reform Commission of China
•	Ministry of Foreign Affairs of China
•	Ministry of Commerce of China
•	Ministry of Science and Technology of China
•	local financial administrations in China
•	Netherlands Development Finance Company
•	end users
•	suppliers
•	related banks
•	industry associations
•	research institutes
•	other stakeholders.


Part III: Using Visual Information

They say a picture is worth a thousand words, and in many cases this is true. Harry Cummings (2003, Section 3, slides 2-3) describes the following reasons to use graphics:
•	to add interest
•	to communicate information more clearly and effectively
•	to "lighten" the density of continuous text
•	to provide a focal point that attracts the audience to key points.

Good graphics are:
•	simple
•	able to communicate information without needing text
•	easily reproduced
•	clearly labeled
•	designed so that patterns can be easily distinguished
•	culturally appropriate
•	correctly placed in the text
•	consistently numbered and titled
•	provided with correct references (sources).

When using visual aids in a report, also include a list of figures (maps, graphics, tables, and charts) at the beginning (or end) of the document (the "index of tables and figures"). Each individual figure or table must have:
•	a title that clearly describes the idea the visual aid is attempting to communicate to the reader
•	a number and name for the figure/table within the section.

Pictures and Illustrations

Levin, Anglin, and Carney (1987) summarized key information about using pictures and illustrations in materials. They drew two conclusions about the effects pictures have on learning from prose:
•	When illustrations are relevant to the content, moderate to substantial gains in learning can be expected.
•	When illustrations are not relevant to the content or, even worse, conflict with it, expect no gain in learning and possibly even confusion.

When applying this research to choosing pictures and illustrations for reports, be sure that any pictures or illustrations are relevant to the content of the report. Within the report, they should be used for a reason, not just for decoration. Pictures or illustrations on the cover of the report can illuminate the overall theme, but pictures used within the report should have a specific and concrete reason for being there. With each picture or illustration, use the narrative of the report to tell readers what they are supposed to see: direct them to the picture and tell them what to look for. The following are examples of pictures or illustrations that can be used in a report:
•	maps
•	sketches
•	line drawings
•	photographs
•	graphic art.

Maps

According to Cummings (2005, Section 6, slides 2-3), the following are reasons to use maps to display information in a report:
•	to indicate the geographic location of a program
•	to provide context
•	to indicate the geographic reach or spread of a program
•	to serve as a basis for a sampling system for surveys (for example, populations within 5 km of a water source)
•	to indicate rates or levels of a phenomenon across the topography of an area using patterns or isolines (flooding, for example, or the spread of a new strain of disease).

For maps to display information effectively, they must be easy to read and understand. For this reason, Cummings suggests the following guidelines for displaying information using maps:
•	When maps indicate different areas, be sure the graphic patterns or colors used to show those areas can be clearly distinguished.
•	If the report will be printed in black and white only, use different textures (dots versus stripes, for example).
•	Budget for the expense of color printing.
•	Make sure to include:
	−	the source of the map
	−	a compass arrow indicating north
	−	the scale (1 cm = 1 km, for example).
Also, do not forget to provide a list of figures, maps, and tables at the front of the report, and make sure the most current map is used.
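Where digital boundary data are available, a simple thematic map can also be produced with scripting tools rather than drawn by hand. The following is a minimal sketch using geopandas, an open-source Python library for geographic data; the file name districts.shp and the column coverage_rate are hypothetical placeholders for whatever boundary file and indicator an evaluation actually has, and the sketch omits the compass arrow and scale bar, which would still need to be added.

import geopandas as gpd   # geopandas and matplotlib must be installed

# Read a (hypothetical) district boundary file and shade each district by a
# (hypothetical) indicator column; grayscale shading photocopies cleanly.
districts = gpd.read_file("districts.shp")
ax = districts.plot(column="coverage_rate", cmap="Greys",
                    legend=True, edgecolor="black")
ax.set_title("Program Coverage Rate by District")
ax.set_axis_off()                                    # drop latitude/longitude tick clutter
ax.figure.text(0.02, 0.02, "Source: hypothetical data, 2008", fontsize=8)
ax.figure.savefig("coverage_map.png", dpi=200)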

Sketches

As a part of data collection, some evaluators collect sketches made by participants showing their impressions. Cummings offers the following considerations for using sketches. Sketches:
•	can add interest
•	can personalize a report
•	may be part of certain methodological approaches
•	may work in situations where the technical capacity does not exist to create more sophisticated illustrations
•	will depend on the talent of the artist
•	are a way of introducing humor
•	may give participants a voice that would not otherwise be heard in such detail.
Remember, use sketches only if they are relevant to the content. Also, when using a sketch, provide a description of what readers are supposed to see in it. Figure 11.2 shows an example of a sketch drawn for one project. It shows a child's impression of life before and after an intervention. In this study, drawing exercises were part of the qualitative evaluation.


Source: Nine-year-old child involved in MICAN.
Fig. 11.2: Example of a Sketch Used in an Evaluation Report.


Line Drawings

Line drawings can be used in a report to illustrate how something operates or how one object relates to another. Line drawings show objects with simple lines and eliminate unimportant detail. They simplify the situation and the objects so that the reader can focus on the key details. Figure 11.3 shows a line drawing.

Source: Busuladzic and Trevelyan (1999), Demining Research.
Fig. 11.3: Example of a Line Drawing Used in an Evaluation Report (An Ergonomic Aspect of Humanitarian Demining: Deminer's Position).

Photographs

With the availability of digital cameras, it is now easy to include digital photographs in written reports. Recall from earlier in this part to choose photographs only if they are relevant to the data and add information; they should not be used for decoration alone. Cummings (2005, Section 6, slide 7) suggests that photographs are best used:
•	to provide context
•	to indicate the extent or progress of field work
•	as a tool for direct observation (for example, house types or crowded conditions in a neighborhood)
•	to familiarize the audience with the field situation
•	to provide evidence for an evaluation.

Be sure to obtain permissions, as needed.


Figure 11.4 shows an example of a photograph that might be used in an evaluation. It shows a group of secondary school children in a Community Day Secondary School involved in a cooperative learning assignment.

Source: SSTEP, Malawi (photo by D. S. Novak).
Fig. 11.4: Example of a Photograph Used in an Evaluation Report (Students Working on a Cooperative Learning Assignment).

Charts and Graphs

Charts and graphs are among the many graphics that can be used very effectively to present evaluation findings. They provide a visual representation of size, proportionality, and relationships between diverse sets of data, for example. If properly created, they will require no text to enhance their meaning. However, like all illustrative tools, they must be clearly titled, referenced, and indexed. Descriptions of some types of charts and graphs follow.

Organization Charts

Organization charts illustrate the structure of organizations or other entities. Understanding that structure may be the first step to understanding a program – for evaluators and audience alike. Organizations are often studied to determine how efficiently or effectively they operate, and such an analysis requires a clear idea of responsibilities, reporting structure, and so on, which can be clearly and concisely represented in an organization chart.


Most word processing programs have a feature that assists with quickly creating and revising organization charts. Figure 11.5 gives an example of an organization chart.

[Figure 11.5 shows a sample organization chart containing a chairperson, an advisory board, internal audit and evaluation and monitoring units, three directors each with an assistant, and managers reporting to the directors.]

Source: Fabricated information, 2008.
Fig. 11.5: Example of an Organization Chart.

Gantt Charts

Gantt charts illustrate the timeline associated with a program or an evaluation. They are often used for planning and are especially useful for project management. Figure 11.6 shows an example of a Gantt chart.

[Figure 11.6 is a Gantt chart plotting five evaluation activities (acquisition of baseline and survey data; sorting, tabulating, and analyzing data; field work preparation; field work overseas; analysis and report writing) against the months May through December.]

Source: Fabricated information, 2008.
Fig. 11.6: Example of a Gantt Chart.
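A Gantt chart like the one above can be drawn with ordinary charting tools rather than built by hand. The following is a minimal sketch in Python using matplotlib; the activities, start months, and durations are fabricated for illustration and are not taken from an actual evaluation plan.

import matplotlib.pyplot as plt

# Fabricated work plan: each activity has a start month (5 = May) and a
# duration in months.
activities = ["Acquire baseline and survey data",
              "Sort, tabulate, and analyze data",
              "Prepare field work",
              "Field work overseas",
              "Analysis and report writing"]
start_month = [5, 6, 6, 8, 10]
duration = [2, 3, 2, 2, 3]

fig, ax = plt.subplots(figsize=(8, 3))
ax.barh(activities, duration, left=start_month, color="0.6", edgecolor="black")
ax.set_xticks(range(5, 13))
ax.set_xticklabels(["May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])
ax.invert_yaxis()                     # list the first activity at the top
ax.set_title("Evaluation Work Plan (sketch)")
plt.tight_layout()
plt.savefig("gantt_chart.png", dpi=200)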


Graphs and Data Charts

In our context, charts refer to graph-like displays such as line charts, pie charts, and bar charts. A graph is a visual representation of a relationship between variables, most often two. A graph generally takes the form of a one- or two-dimensional figure such as a scatter diagram. Although three-dimensional graphs are possible, they are usually considered too complex to understand easily.

The Parts of Graphs

All graphs have component parts. Table 11.4 describes each of the parts of a graph or chart.

Table 11.4: Parts of Graphs
title: All graphs and charts should have titles so that the audience knows the message of the graph immediately.
horizontal or x-axis: The horizontal line of a line or bar chart, representing one variable (e.g., time).
vertical or y-axis: The vertical line of a line or bar chart, representing a second variable (e.g., costs).
origin: The point where the vertical and horizontal axes meet.
grid lines: Many charts include grid lines to help compare data by clearly showing levels. Only a limited number of grid lines should be used to avoid a cluttered look.
axis titles: The x-axis and y-axis titles are very important. They identify what is being measured and the units of measurement (years, meters, pounds, square miles, cubic tons, dollars, degrees, etc.), for example, "Costs (in USD)" or "Distance (in km)".
axis scales: The x-axis and y-axis need appropriate scales to show values. Choose each scale carefully to include the full range of values in the data, and choose the proportions between axis scales to best illustrate the relationship between the variables.
actual values: Many graphs and charts also include the actual values for the entries, shown as additional text within the graphic. These additions help the reader grasp the real situation.
coordinate: The point on a graph where the x-value of the data meets the y-value; how this is represented (a point, a peak, the top of a bar, etc.) depends on the type of graphic chosen.


A point on a graph represents a relationship between the two variables represented by the axes. Each point is defined by a pair of numbers, its coordinates (x and y). Figure 11.7 illustrates the parts of a graph.

[Figure 11.7 is an annotated chart ("Fig. 23: Orphanage Food Costs Fluctuate over Six Months," food costs in dollars by month, January through June) with callouts labeling the identifier number, title, vertical (y) axis, horizontal (x) axis, axis label, axis titles, grid lines, origin, a coordinate, an actual value, and the source and date.]

Source: Fabricated data, 2008.
Fig. 11.7: The Parts of a Graph.
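The elements listed in Table 11.4 can all be set explicitly when a chart is built with a charting library. The following is a minimal sketch in Python using matplotlib; the monthly food-cost figures are fabricated for illustration.

import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
costs = [2200, 1900, 1700, 2000, 1500, 1400]        # fabricated data

fig, ax = plt.subplots()
x = range(len(months))
ax.bar(x, costs, color="0.7", edgecolor="black")
ax.set_title("Orphanage Food Costs Fluctuate over Six Months")   # title
ax.set_xlabel("Month")                                           # x-axis title
ax.set_ylabel("Food Costs in Dollars")                           # y-axis title
ax.set_xticks(x)
ax.set_xticklabels(months)                                       # axis scale labels
ax.grid(axis="y", linewidth=0.5)                                 # a few grid lines
ax.set_axisbelow(True)
for i, cost in enumerate(costs):                                 # actual values
    ax.text(i, cost + 30, str(cost), ha="center", fontsize=8)
fig.text(0.01, 0.01, "Source: Fabricated data, 2008", fontsize=8)  # source and date
plt.tight_layout()
plt.savefig("parts_of_a_graph.png", dpi=200)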

The following list of standards to follow when preparing graphs is adapted from Cummings (2005, Section 8, slides 2-4). Graphs must have:
•	a standard title and number
•	the source clearly indicated
•	the year the data were collected clearly indicated
•	data in chronological order wherever possible
•	data portrayed using comparable definitions where possible.


Standards for the format and style of data graphics include:
•	no overlapping categories (for example, 1-19, 20-39 instead of 1-20, 20-30)
•	patterns (visual textures) that are clear even when photocopied
•	patterns that are clearly labelled as to meaning (use a separate legend or "key" outside the graph to explain them; see Figure 13.8, for example)
•	no extra lines and patterns – only what is necessary
•	no large areas of black, which are difficult to reproduce accurately
•	a font size no smaller than 10 points, for legibility and reading ease.

Types of Graphs

There are at least three types of graphs or charts that might be useful for presenting data:
•	line graph
•	bar graph
•	pie graph or pie chart.

Line Graphs

Line graphs summarize how two pieces of information are related and how they vary depending on one another. They are usually used to show how data change over time. For example, evaluators might use a line graph to show food costs rising or falling over the months of the year, population changes over many years, or student grades each day over a six-week term. Line graphs can show one item or multiple items as they change over the same period of time. Line graphs are a good way to show continuous data, that is, interval or ratio data. Interval data are divided into ranges in which the distance between the intervals is meaningful; examples are counts, such as income, years of education, or number of votes. Ratio data are interval data that also have a true zero point: income is ratio data because zero dollars is truly "no income."


Figure 11.8 shows a line graph with one kind of data plotted over time.

[Figure 11.8 is a line graph ("Fig. 9: Average Temperatures for Six Months," degrees C by month, February through June).]

Source: Fabricated data, 2008.
Fig. 11.8: Example of a Line Graph for One Kind of Data over Time.

A line graph can also help compare more than one kind of data over a common time frame. Vary the line type (dotted, broken, etc.) and add a legend to help the audience understand the information better. Figure 11.9 shows a multiple-line chart with a legend.

[Figure 11.9 is a multiple-line chart ("Fig. 31: School 3 Shows Strong Gains in Reading Scores," grades out of 100 by quarter for Schools 1, 2, and 3 in the 2005 school year, N=523, with a legend identifying the three lines).]

Source: Fabricated data, 2008.
Fig. 11.9: Example of a Multiple-Line Chart with a Legend.
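A chart like Figure 11.9 can be scripted as follows. This is a minimal sketch in Python using matplotlib; the quarterly reading scores are fabricated placeholder values, not the data behind the original figure.

import matplotlib.pyplot as plt

quarters = ["1st Qtr", "2nd Qtr", "3rd Qtr", "4th Qtr"]
scores = {"School 1": [62, 65, 64, 68],        # fabricated data
          "School 2": [55, 58, 63, 66],
          "School 3": [48, 60, 75, 90]}
styles = {"School 1": "-", "School 2": "--", "School 3": ":"}   # vary line types

fig, ax = plt.subplots()
for school, values in scores.items():
    ax.plot(quarters, values, linestyle=styles[school], marker="o", label=school)
ax.set_title("School 3 Shows Strong Gains in Reading Scores")
ax.set_xlabel("2005 School Year")
ax.set_ylabel("Grades (out of 100)")
ax.legend()                                     # the legend identifies each line
fig.text(0.01, 0.01, "N=523. Source: Fabricated data, 2008", fontsize=8)
plt.tight_layout()
plt.savefig("multiple_line_chart.png", dpi=200)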


Bar Graphs

Bar graphs use bars (elongated rectangles) to represent quantities and allow numbers to be compared. They should be carefully titled to ensure that the reader understands what factors are being represented. There are two kinds of bar graphs: single bar graphs, which show information about a single variable, and multiple bar graphs, which give information for more than one variable. The bars can be formatted vertically or horizontally. Use a multiple bar graph to compare two or more groups' data on the same variable. For example, an evaluation team wanting to compare the rate of land mine recovery in three different regions of a country might use a multiple bar graph to depict the information. In another example, a double bar graph might be used to compare the responses of boys and girls to a questionnaire. Bar graphs are often used to show nominal or categorical data. Nominal or categorical data have no order, and the assignment of numbers to categories is purely arbitrary (e.g., 1 = East, 2 = North, 3 = South). These categories must be clearly explained in the legend. The following are examples of bar graphs. Figure 11.10 shows the scores earned on a parenting test given to four pregnant women; it is an example of the use of horizontal bars.

[Figure 11.10 is a horizontal bar graph ("Fig. 5: Scores on Parenting Test Vary Widely," score in percent for subjects A through D, N=93).]

Source: Fabricated data, 2008.
Fig. 11.10: Example of a Single Bar Graph in Horizontal Format.


Figure 11.11 shows an example of a multiple bar graph in vertical format.

[Figure 11.11 is a vertical multiple bar graph ("Fig. 25: Responses to Questionnaire Show Directors' Responses Vary from Assistants' and Laborers'," mean answer to questions 1 through 4 for directors, assistants, and laborers, N=173).]

Source: Fabricated data, 2008.
Fig. 11.11: Example of a Multiple Bar Graph in Vertical Format.
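A grouped (multiple) bar chart like Figure 11.11 can be built by shifting each group's bars sideways around each question's position. The following is a minimal sketch in Python using matplotlib; the group means are fabricated placeholder values, not the data behind the original figure.

import matplotlib.pyplot as plt

questions = [1, 2, 3, 4]
means = {"Directors":  [90.0, 45.9, 45.0, 43.9],   # fabricated data
         "Assistants": [30.6, 46.9, 34.6, 31.6],
         "Laborers":   [27.4, 38.6, 20.4, 20.4]}
width = 0.25                                        # width of each bar

fig, ax = plt.subplots()
for i, (group, values) in enumerate(means.items()):
    positions = [q + (i - 1) * width for q in questions]   # offset each group
    ax.bar(positions, values, width=width, label=group, edgecolor="black")
ax.set_xticks(questions)
ax.set_xlabel("Question Number")
ax.set_ylabel("Mean Answer to Question")
ax.set_title("Responses to Questionnaire by Group")
ax.legend()
fig.text(0.01, 0.01, "N=173. Source: Fabricated data, 2008", fontsize=8)
plt.tight_layout()
plt.savefig("multiple_bar_graph.png", dpi=200)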

Pie Charts

A pie chart is a circle graph divided into pieces, each displaying the size of some related piece of information. Pie charts are used to display the sizes of parts that make up some whole. Pie charts should include a legend, and avoid dividing the pie into more than eight sections. Like bar charts, pie charts use categorical or nominal data. Figure 11.12 shows an example of a pie chart.


[Figure 11.12 is a pie chart ("Fig. 3: Third Quarter Shows Highest Electricity Costs in US$ for 2008," quarterly electricity costs of $204, $313, $905, and $678, with a legend identifying the four quarters).]

Source: Fabricated data, 2008. N = $2,100 total cost for the year.
Fig. 11.12: Example of a Pie Chart.
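The following is a minimal sketch of a pie chart in Python using matplotlib, with fabricated quarterly electricity costs; the assignment of costs to quarters is illustrative only, chosen so that the third quarter is highest, as in the figure's title.

import matplotlib.pyplot as plt

quarters = ["1st Qtr", "2nd Qtr", "3rd Qtr", "4th Qtr"]
costs = [204, 313, 905, 678]     # fabricated data, US$; total = 2,100 for the year

fig, ax = plt.subplots()
# Label each slice with its dollar value; the quarter labels act as the legend.
ax.pie(costs, labels=quarters, startangle=90,
       autopct=lambda pct: "${:.0f}".format(pct * sum(costs) / 100))
ax.set_title("Third Quarter Shows Highest Electricity Costs in US$")
fig.text(0.01, 0.01, "Source: Fabricated data, 2008", fontsize=8)
plt.tight_layout()
plt.savefig("pie_chart.png", dpi=200)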

Scatter Diagram

A scatter diagram is similar to a line graph, except that the coordinates are plotted without any connecting lines. A scatter diagram is used to see whether there is a relationship between two different sets of data. For example, suppose a number of people were interviewed and the evaluator compared their wages to their educational levels; educational level would be plotted on the x-axis and salary or wages on the y-axis. When the data are plotted, the graph shows visually whether there appears to be a relationship. A scatter diagram can also be usefully combined with a line that plots, for example, a mean or another trend over time. The graph then communicates the change in the mean while showing the reader the dispersion of the data around it. In other words, this technique combines linear and non-linear data. Figure 11.13 gives an example of a scatter diagram. This diagram shows scores on a test compared to the grade level the students completed.


[Figure 11.13 is a scatter diagram ("Fig. 20: Mean Scores Show Relationship between Test and Grade Level," mean score on test plotted against completed grade, 0 through 12, N=377).]

Source: Fabricated data, 2006.
Fig. 11.13: Example of a Scatter Diagram.
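The following is a minimal sketch of a scatter diagram in Python using matplotlib, with fabricated test scores, plus the kind of mean-per-grade line described above.

import random
import matplotlib.pyplot as plt

# Fabricated data: completed grade (0-12) and a test score that rises with grade.
random.seed(1)
grades = [random.randint(0, 12) for _ in range(200)]
scores = [20 + 5 * g + random.gauss(0, 8) for g in grades]

# Mean score for each grade that appears in the data, to overlay as a trend line.
present = sorted(set(grades))
means = [sum(s for g2, s in zip(grades, scores) if g2 == g) / grades.count(g)
         for g in present]

fig, ax = plt.subplots()
ax.scatter(grades, scores, s=12, color="0.4")        # the scatter itself
ax.plot(present, means, color="black", label="Mean score by grade")
ax.set_xlabel("Completed Grade")
ax.set_ylabel("Score on Test")
ax.set_title("Scores Rise with Completed Grade (fabricated data)")
ax.legend()
plt.tight_layout()
plt.savefig("scatter_diagram.png", dpi=200)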

Choosing a Chart or Graph Type

Table 11.4 compares types of charts and graphs and indicates when to use each one.


Table 11.4: Comparison of Chart and Graph Types
If you want to:                                                      Then choose this chart type:
Show trends over time                                                Line chart
Show percent distribution of a single variable                       Single bar chart
Compare several items                                                Multiple bar chart
Show parts of a whole                                                Pie chart
Show trends or relationships between data (especially nonlinear)     Scatter diagram


Tables

Tables can be used to present information in an organized manner. There are two types of tables to consider using in a report:
•	data tables
•	classification tables (matrix or matrices).

Data Tables

It is useful to present numerical information in tables, called data tables. Data tables often provide the basis for presenting data in other formats, such as line and bar charts, and are often put in annexes. As with pictures and illustrations, the audience will not automatically know what to look for in a table. It helps if the title of a table describes what readers should see and how they can relate the information; a short description of what to look for should also be included in the narrative of the report. Whenever presenting data in a table, include the sources of the data and the year in which the data were collected. Ehrenberg (1977, pp. 277-279) summarizes principles to guide the design of tables:
•	Round off numbers to no more than two significant digits. This helps the audience make comparisons.
•	Provide sums and means of rows and columns (as appropriate) to help the audience compare individual cell entries.
•	Put the most important data into columns, because this allows the reader to make comparisons easily.
When deciding on the format of the table, keep in mind that too many lines (dividing the cells of the table) will make it difficult to read. This is shown in the next two tables: Table 11.5 gives an example of data in a table with many lines, and Table 11.6 shows the same data in a table with fewer lines. Notice how the data, not the lines, become the focus in the second table. Also notice that the last row of each table shows the means of the columns with numeric data.


Table 11.5: Example of Data in a Table with Many Lines
Demographic Information on Participants
Participant number    Height    Weight    Age    District
1                     44        30        7.2    North
2                     46        35        7.1    East
3                     40        20        7.6    North
4                     32        22        7.2    South
5                     29        23        7.0    South
6                     50        38        7.8    North
7                     44        30        7.3    West
8                     44        28        7.3    West
9                     42        30        7.5    East
10                    48        45        7.9    South
Mean                  41.9      30.1      7.4
N=10. Source: Fabricated data, 2008.

Table 11.6: Example of Data in a Table with Few Lines
Demographic Information on Participants
Participant number    Height    Weight    Age    District
1                     44        30        7.2    North
2                     46        35        7.1    East
3                     40        20        7.6    North
4                     32        22        7.2    South
5                     29        23        7.0    South
6                     50        38        7.8    North
7                     44        30        7.3    West
8                     44        28        7.3    West
9                     42        30        7.5    East
10                    48        45        7.9    South
Average               41.9      30.1      7.4
N=10. Source: Fabricated data, 2008.
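Summary rows like the means above are easy to get wrong when computed by hand; a small script can build the table and append them automatically. The following is a minimal sketch in Python using the pandas library, applying Ehrenberg's advice about columns, rounding, and column means to the fabricated data in Tables 11.5 and 11.6.

import pandas as pd

# The fabricated participant data, with variables in columns for easy comparison.
df = pd.DataFrame({
    "Height":   [44, 46, 40, 32, 29, 50, 44, 44, 42, 48],
    "Weight":   [30, 35, 20, 22, 23, 38, 30, 28, 30, 45],
    "Age":      [7.2, 7.1, 7.6, 7.2, 7.0, 7.8, 7.3, 7.3, 7.5, 7.9],
    "District": ["North", "East", "North", "South", "South",
                 "North", "West", "West", "East", "South"],
}, index=range(1, 11))
df.index.name = "Participant number"

# Append the rounded means of the numeric columns as a final row.
df.loc["Average"] = df.mean(numeric_only=True).round(1)
print(df.to_string(na_rep=""))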


Classification Tables (Matrices)

A classification table, or matrix, shows how a list of things has been organized according to different factors. At least two sorting factors indicate similarity or difference among the things being classified. Classification tables can also help illustrate complex information. A design matrix is a classification table. Table 11.7 shows another example of a classification table or matrix.

Table 11.7: Example of a Classification Table
Poverty Reduction Strategies: Case Study Countries
Country        PRSP Date      Years of Implementation    HIPC    OED/IEO
Ethiopia       17-Sep-2002    4.7                        Yes     OED
Guinea         25-Jul-2002    4.9                        Yes     IEO
Mauritania     6-Feb-2001     6.3                        Yes     OED
Mozambique     25-Sep-2001    5.7                        Yes     OED/IEO
Tanzania       30-Nov-2000    6.3                        Yes     OED/IEO
As of May 2007. Source: World Bank OED, 2007.

Illustrating Evaluation Concepts

Graphics can also help visualize evaluation concepts. Cummings (2005, Section 9, slides 2-13) suggests using graphics to show research design, impact, and/or program logic.

Illustrating Evaluation Design

Diagrams can also help illustrate evaluation design. In this case, we used a computer's table-drawing feature to create the diagrams. Figure 11.14 shows an experimental design matrix. Figure 11.15 shows a quasi-experimental design; notice that one of the cells in the matrix is removed to show that there are no baseline data for the comparison group. Figure 11.16 shows a historical or retrospective design, whose matrix has no comparison group.


[Figure 11.14 is a matrix with rows for baseline and baseline + time and columns for the program group and a comparison group, with all four cells present.]
Fig. 11.14: Experimental Design

[Figure 11.15 is the same matrix with the comparison group's baseline cell removed.]
Fig. 11.15: Quasi-Experimental Design

[Figure 11.16 is a matrix with rows for baseline and baseline + time and a single column for the program group; there is no comparison group.]
Fig. 11.16: Retrospective Design

Illustrating Impact

Impact can also be shown using a graphic. In this case, set up the graphic to compare the measurement at baseline with the follow-up results, using a line or bar graph. Figure 11.17 shows a bar chart illustrating the impact found in a study.

[Figure 11.17 is a grouped bar chart ("Fig. 46: MICAH Shows Decrease in Stunting," percent of the population classified as stunted, wasted, and underweight at baseline (1996) and follow-up (2000)).]

Source: World Vision Canada, 2002.
Fig. 11.17: Example of a Bar Chart Showing Impact.


Program Logic Charts

Graphics can be used to illustrate the program theory model. They show the program elements and their logical links, presenting a "picture" of the program and the way it is supposed to work. Such graphics can be used as a framework for program descriptions and/or for evaluation reporting. Figure 11.18 shows an illustration of a results chain.

Inputs/Resources → Activities → Outputs → Outcomes

Source: HCA's Standard Results Chain Model.
Fig. 11.18: Example of Graphic of Results Chain.

Figure 11.19 shows an example of a graphic illustration of a logic model.

Inputs: human, organization, and physical resources contributed directly or indirectly by stakeholders
Activities: technical assistance and training tasks organized, coordinated, and executed by project personnel
Outputs: completion of activities
Outcomes: development result that is the logical consequence of achieving a combination of outputs
Impact: development result that is the logical consequence of achieving a combination of outputs and outcomes

Fig. 11.19: Example of Graphic of Logic Model.


Figure 11.20 shows a graphic illustration of a logical framework.

Narrative Summary    Expected Results    Performance Measurement     Assumptions / Risk Indicators
Goal                 Impact              Performance indicators      Assumptions
Purpose              Outcomes            Performance indicators      Risk indicators
Resources            Outputs             Performance indicators

Source: HCA's Recommended Model.
Fig. 11.20: Example of Graphic of Logical Framework.

Visual Information Design from Tufte

Edward Tufte (pronounced Tuf-tea) is a well-respected figure in the field of visual information, specifically data design. Tufte has published several books on information design; much of his work concerns the problem of presenting large amounts of information succinctly. He writes, designs, and self-publishes his books on analytical design, which have received more than 40 awards for content and design (The Work of Edward Tufte and Graphics Press, 2007). In his first book, The Visual Display of Quantitative Information, Tufte (1983, p. 13) describes graphical excellence in statistical graphics as "complex ideas communicated with clarity, precision, and efficiency". He states that "graphics reveal data" and goes on to say that "graphics can be more precise and revealing than conventional statistical computations".


Tufte identifies the following characteristics of excellent graphical displays. They:
•	show the data
•	induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production, or something else
•	avoid distorting what the data have to say
•	present many numbers in a small space
•	make large data sets coherent
•	encourage the eye to compare different pieces of data
•	reveal the data at several levels of detail, from a broad overview to the fine structure
•	serve a reasonably clear purpose: description, exploration, tabulation, or decoration
•	are closely integrated with the statistical and verbal descriptions of the data set (Tufte, 1983, p. 13).

According to Tufte (1983, p. 91), "data graphics should draw the attention to the substance of the data, not something else". The display should put less detail into grid lines and labels and more into the actual data; the graphic should focus on the data series itself, not on the structure of the graphic (gridlines, data labels). Tufte calls the amount of ink used to present information "data ink". Most of the visual representation should be devoted to the actual data, and much less should go to the frame, the grid lines, and the data labels. Tufte also identifies "chartjunk": interior decoration of graphics that generates a lot of ink but adds no additional information (Tufte, 1983, p. 107). Chartjunk includes fills that add nothing to the data. Figure 11.21 shows examples of wasted data ink and chartjunk. Notice the amount of ink devoted to the background (the shading and the gridlines); most of it does not add to the value of the data. A few gridlines to help the observer see the values are all that is needed in the background. The data labels at the top of each column also distract from helping the audience see the data and how they relate to each other. Notice the diagonal lines, grids, waves, and dots used to fill the boxes: they are examples of chartjunk. They add no information but may distract from the data by shifting emphasis from the data to the chart structure.


[Figure 11.21 is a cluttered multiple bar chart (frequency by region: East, West, North, South, for three data series) with a shaded background, heavy gridlines, patterned fills, and a data label above every bar.]

Fig. 11.21: Graph Showing Improper Data Ink and Chartjunk.

According to Tufte (1983, p. 92), "Data graphics should draw the viewer's attention to the sense and substance of the data, not to something else." The following is a summary of ways to maximize data ink:
•	Avoid heavy grids.
•	Replace the enclosing box with an x/y grid.
•	Use white space to indicate grid lines in bar charts.
•	Use tics (without a full line) to show the actual locations of x and y data.
•	Prune graphics by replacing bars with single lines, erasing non-data ink, eliminating lines from the axes, and starting the x/y axes at the data values (range frames).
•	Avoid overly busy grids, excess ticks, redundant representations of simple data, boxes, shadows, pointers, and legends. Concentrate on the data, not the data containers.
•	Always provide as much scale information as is needed, but in muted form (Tufte, 1983, pp. 124-137).

Figure 11.22 shows the same data as Figure 11.21, redrawn using many of Tufte's principles.


[Figure 11.22 is the same bar chart redrawn with a plain background, no fills or data labels, and only the score scale (0 to 60) and the region labels East, West, North, and South.]

Fig. 11.22: Graph Showing Better Use of Data Ink and No Chartjunk.
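A charting script can apply the same ideas directly: suppress the frame, keep only a few muted gridlines, and leave the rest of the ink to the data. The following is a minimal sketch in Python using matplotlib; the regional scores are fabricated for illustration.

import matplotlib.pyplot as plt

regions = ["East", "West", "North", "South"]
scores = [27, 38, 49, 41]                         # fabricated data

fig, ax = plt.subplots()
ax.bar(regions, scores, color="0.5", width=0.5)   # plain fill, no patterns or labels
ax.set_ylabel("Score")
ax.set_yticks(range(0, 61, 20))                   # a few ticks instead of a heavy grid
ax.grid(axis="y", linewidth=0.5)
ax.set_axisbelow(True)                            # keep gridlines behind the data
for side in ("top", "right"):                     # erase non-data ink (the box)
    ax.spines[side].set_visible(False)
plt.tight_layout()
plt.savefig("data_ink.png", dpi=200)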

The principles in Tufte’s first book are valuable for those presenting information from evaluations. Tufte has three other books that those with a particular interest in visual design might find helpful. Tufte’s second book, Envisioning Information (1990), discusses ways to take information placed on flat (paper, computer screen) media and making it high-dimensional. The book describes ways to present complex material in a visual way. Tufte’s third book, Visual Explanations: Images and Quantities, Evidence and Narrative (1997, p. 9), discusses “presenting information about motion, process, mechanism, cause and effect.” Tufte’s fourth book, Beautiful Evidence (2006) is about “how seeing turns into showing”, how empirical observations turn into explanations and evidence presentations.” He believes good graphics can provoke stores, thoughts, and memories. Good examples of effective graphics are in the newspaper: sports, weather, and finance. They can be used as models for presenting data information. Tufte teaches many seminars on visual communication. Many participants in his seminars post summaries of the valuable information they learned in the seminars on the Internet. The following is a list of tips one participant posted.


9 Tips for Effective Visual Display of Information (Source: M. Lentz, 2007, Tufte and Visual Presentation)
1. Show comparisons.
2. Show causality.
3. Show multivariate information (more than one or two variables).
4. Completely integrate word, number, and image.
5. Document everything and tell people about it. (This provides credibility.)
6. Content counts most of all. Design cannot salvage bad or dull content.
7. Do your important analysis linked in space and not stacked. (View as much as possible in your visual display.)
8. Use small multiples. (Put lots of information adjacent in space.)
9. Put everything on a universal grid. Put everything in context.

Graphic Design Tips from Tufte
•	A good graphic can provoke stories, thoughts, and memories.
•	Display graphics the way newspapers do; use the weather, sports, and finance pages as models.
•	Show data in a contextual order, such as by performance, rather than alphabetically.
•	Approach design as a research problem, not a creative process.
•	Codes and legends are impediments to learning. Put names on your objects.
•	Clutter is a failure of design and a source of confusion.
•	There is no relationship between the amount of detail and the difficulty of reading. To simplify, add detail.
•	The single biggest threat to learning the truth is cherry-picking the evidence. As a consumer of presentations, ask where the data came from, and trust your intuition.


Tips and Tricks for Effective Tables and Charts

The following are important to remember when using tables, graphs, or charts:
•	Use a readable typeface (font), no smaller than 10 points in size:
	−	use upper- and lower-case lettering
	−	avoid using too many sizes and types of lettering
	−	make it easy for the audience to read (horizontal orientation if possible, no interference from other elements of the graphic).
•	Avoid busy and unnecessary patterns.
•	Be generous with the use of white space, to provide visual relief.
•	Keep scales honest, clearly specified, and appropriately sized.
•	Present sufficient data to communicate the message, but limit the amount of data presented in a single graphic.
•	The message (and the graphic that communicates it) should make sense to a reasonable person.
•	Include data tables to support the charts, and place those tables in an appendix.
•	Indicate the source of the information in parentheses, directly below the table at the left margin.


Summary of Graphs, Charts, and Tables

Use charts and tables to:
•	Communicate complex ideas clearly, precisely, and efficiently.
•	Present data in a way that makes it easy to understand.
•	Give your message impact.
•	Increase audience acceptance.
•	Increase memory retention.
•	Show the big picture, patterns, and trends.
•	Provide visual relief from narrative.

A table or chart should:
•	Show the data simply and accurately.
•	Encourage the audience to think about the information.
•	Be carefully designed to avoid presenting distorted data or communicating distorted messages about findings.
•	Make large data sets coherent – orderly, logical, and consistent in the relationships among data.
•	Encourage the reader to compare different pieces of the data.
•	Enhance the statistical and prose descriptions of the data.
•	Serve a clear purpose:
	−	to describe
	−	to explore
	−	to tabulate
	−	to elaborate
	−	to compare
	−	to elucidate (to make clear).


Part IV: Making Oral Presentations

Fear of speaking in public is said to be one of most people's greatest fears. One way to ease presentation fears is to be well prepared. Here, we look at some of the ways to do this, including:
•	planning for your audience
•	preparing your presentation
•	enhancing your presentation
•	practicing your presentation.

Planning for Your Audience

If results will be presented orally to an audience, the presenter needs to prepare carefully. Begin by asking the following questions about the audience and the message:
•	Who is your audience?
•	What do they expect? How much detail do they want?
•	Are there any specific language or technical challenges to communicating this information to this audience?
•	What is the point of the presentation? What are the three things you want the audience to remember?
•	Can you get any feedback from members of the potential audience to check your expectations against theirs?

Next, consider the logistics for the presentation:
•	How much time do you have?
•	What are the resources of the room for delivery of the presentation: slides, overheads, PowerPoint, poster boards?


Preparing Your Presentation

When preparing a presentation, always keep in mind the audience and the message you want to send. For the actual presentation, follow three simple rules:
1. Tell them what you will tell them.
2. Tell them.
3. Tell them what you told them.
In other words, begin the presentation by introducing the audience to what you will tell them, perhaps by listing the major topics to be covered. After the quick overview, deliver the report and its message to the audience. As a rule, a short report is always better than a long one; stick to the key issues. After delivering the report, finish the presentation by summarizing and reviewing the most important information.

Enhancing Your Presentation

Visual elements can enhance a presentation. Key data tables or graphs can be enlarged and shared with the audience. Depending upon the resources available, consider slides, overheads, PowerPoint, and/or poster boards. When planning to use visuals or displays to help communicate a message, be sure to have a back-up plan in case the electricity or the equipment fails (for example, full-page copies of the visuals in an appendix, ready to distribute if needed). It is a good idea to have a few well-chosen handouts. Make copies of the most important information so the audience can take it away. If there is a lot of information on the slides, the audience will appreciate having copies of them; printing two slides per page is more readable than printing six or even nine slides to a page. Some presenters distribute handouts as they speak about the information on them; others distribute handouts at the end of the presentation. If there are complex data or tables, hand out the tables as you talk about them. Note, however, that people tend to look ahead in handouts, and some may not attend to the presentation while they read ahead.


Keep the following points in mind when designing both overheads and handouts:
•	Use few words.
•	Use clear visuals.
•	Use lots of white space (blank space around the information).
•	Limit the amount of text to no more than eight lines for a single presentation slide or overhead.

When making overheads and handouts, keep in mind that the audience is there to listen to the presentation. The presenter does not need to put everything into the overheads and handouts; if the audience is reading them while the presenter is presenting, they may miss important information in the presentation.

Using Presentation Programs

Microsoft Office PowerPoint and other presentation programs can be powerful tools for presenting information, but presentations need to be thoughtfully prepared with the audience in mind. Among the advantages of using a presentation program are that it:
•	may encourage the use of visuals, because it saves the time of hand-drawing information on slides, blackboards, whiteboards, or overhead transparencies
•	may help a presenter feel more confident about presenting
•	helps presenters organize their information.

Presentation programs have been criticized by many, among them Edward Tufte, best known for his books on visual information design. Tufte argues that PowerPoint tends to elevate "format over content, betraying an attitude of commercialism that turns everything into a sales pitch" (Tufte, 2003), and that its "pushy style seeks to set up a speaker's dominance over the audience. The speaker, after all, is making power points with bullets to followers" (Tufte, 2003).


Tufte adds: "Audiences consequently endure a relentless sequentiality, one damn slide after another. When information is stacked in time, it is difficult to understand context and evaluate relationships. Visual reasoning usually works more effectively when relevant information is shown side by side. Often, the more intense the detail, the greater the clarity and understanding. This is especially so for statistical data, where the fundamental analytical act is to make comparisons" (Tufte, 2003). While presentation programs can be overused, we believe that, when done well, they are effective. To be visible to the audience, a good slide contains a maximum of 40 words. When an entire presentation is put into a presentation program, with so little information per slide, many slides are needed. Evaluators want to communicate the results of their evaluation; the key word is communicate. Posting bulleted lists on an overhead may help organize thoughts, but it can hinder communication. During a presentation, people want contact with a person; they want to see someone who knows the evaluation and is eager to talk about it. They want to see a presentation, not just read slides. If you want them to read, give them the report in print.


Limit the number of slides, and put only the most important information on them. Most of the presentation should be the evaluator speaking about the evaluation, giving it more life. Here are some tips for using presentation programs (adapted from Taylor, 2007):
•	Begin with a slide that catches the audience's attention. Let them read it (about 10 seconds), then spend the next five minutes talking about what you mean and why it is important to them.
•	Use a minimum number of slides that reinforce one key point for a given section of the presentation. Consider slides that present the following:
	−	an important point
	−	an amazing fact that the audience did not realize
	−	an amazing fact that maybe they knew
	−	a fact that they might have known but did not realize was relevant
	−	a fact that needs to be stated because you cannot just say it and expect them to remember it
	−	another point, and so on
	−	the important conclusion.
•	Show the slide, and then tell them the rest.

Tufte has additional comments on PowerPoint: "Presentations largely stand or fall on the quality, relevance, and integrity of the content. If your numbers are boring, then you've got the wrong numbers. If your words or images are not on point, making them dance in color won't make them relevant. Audience boredom is usually a content failure, not a decoration failure. The practical conclusions are clear. Presentation programs are competent slide managers and include a projector. But rather than supplementing a presentation, they have become a substitute for good presentations. Such misuse ignores the most important rule of speaking: 'Respect your audience'" (Tufte, 2003).


Practicing Your Presentation

One of the best ways to ensure a good presentation is to practice. Rehearse the presentation alone at first, and then rehearse in front of another person or persons. Get feedback after the rehearsal, and adjust the presentation based on what you felt and what others had to say. During the practice, keep track of the time spent presenting: the presentation should fit into the time slot, and audiences do not like presentations that run over the allotted period. During the presentation, talk to the audience, not to your notes. It is important to make eye contact with many people in the audience.

Presentation Tips from Tufte

This collection of tips was transcribed from a seminar given in Seattle by Edward Tufte (Kaplan, 2003, Presentation Tips).
•	Show up early, and something good is bound to happen. You may have a chance to head off a technical or ergonomic problem. Also, whereas at the end of a talk people are eager to rush off and avoid traffic, at the beginning they filter in slowly. It's a great time to introduce yourself.
•	Have a strong opening. Tufte offers a few ideas for structuring your opening:
	−	Never apologize. If you're worried the presentation won't go well, keep it to yourself and give it your best shot. Besides, people are usually too preoccupied with their own problems to notice yours.
	−	Open by addressing three questions: What's the problem? Who cares? What's your solution? As an alternative but more sophisticated technique, Tufte offers the following anecdote. A high-school mathematics teacher was giving a lecture to an intimidating audience: a group of college math professors. Early in the presentation, the teacher made a mathematical error. The professors immediately noticed and corrected the problem. And for the rest of the lecture, they were leaning forward, paying attention to every word, looking for more errors!
	−	PGP: with every subtopic, move from the Particular to the General and back to the Particular. Even though the purpose of a subtopic is to convey general information, bracing it with particulars is a good way to draw attention and promote retention.




•	Not so much a tip as a law: give everyone at least one piece of paper. A piece of paper is a record, an artifact from your presentation. People can use that artifact to help recall the details of the presentation, or better yet to tell others about it.
•	Know your audience. This is of course a general piece of advice for public speaking, but Tufte adds his own twist: know your audience by what they read. Knowing what they read tells you what styles of information presentation they are most familiar and comfortable with. Adapting your presentation to those styles will leave fewer barriers to the direct communication of your material.
•	Rethink the overhead. Tufte spent a lot of time explaining why the overhead projector is the worst thing in the world. There's a lot of truth to what he said. Bulleted lists are almost always useless; slides with bulleted lists are often interchangeable between talks.
•	The audience is sacred. Respect them. Don't condescend by "dumbing down" your lecture. Show them respect by saying what you believe and what you know to be the whole story.
•	Humour is good, but be careful with it. Humour in a presentation works best when it actually drives the presentation forward. If you find you're using canned jokes that don't depend on the context of the presentation, eliminate them. Also, be very careful about jokes that put down a class of people. If you're going to alienate your audience, do it on the merits of your content.
•	Avoid masculine (or even feminine!) pronouns as universals. It can be a nuisance to half the audience. As a universal, use the plural "they"; the Oxford English Dictionary has allowed "they" as a gender-neutral singular pronoun for years.
•	Take care with questions. Many people judge the quality of your talk not by the twenty minutes of presentation but by the thirty seconds you spend answering their question. Be sure to allow long pauses for questions. Ten seconds may seem like a long pause when you're at the front of the room, but it flows naturally from the audience's point of view.
•	Let people know you believe your material. Speak with conviction. Believing your subject matter is one of the best ways to speak more effectively!


•	Finish early, and something good is almost bound to happen. If nothing else, people will be able to leave early, and suddenly they'll have an extra couple of minutes to do things they didn't think they'd get to. People will really like you if you do that.
•	Practice. Practice over and over and over. If you can, record your presentation. Play it back and watch yourself. You'll discover a thousand horrible things you never knew about yourself. Now watch it again without the sound. Why are your hands flying around like that? Now listen to it without the picture. Get rid of those ums! Now watch it at twice the normal speed. This emphasizes low-frequency cycles in your gestures.
•	The two most dehydrating things you can do in modern civilization are live presentations and air travel. In both, the way to stay sharp is to drink lots of water. Take care of your body, especially your voice. If possible, avoid alcohol too.

Part V: Peer Review and Meta-Evaluation

Some evaluations will undergo a peer review. Peer review is a process for checking work performed by one's equals (peers), who evaluate the work against specific criteria. Many professions use peer review so that peers can read a work and identify errors quickly and easily, resulting in a better and more accurate final product. Meta-evaluations (Scriven, 2007, pp. 7-10) are evaluations of an evaluation. They are done to identify the strengths, limitations, and/or other uses of an evaluation. Meta-evaluations should always be performed as a separate quality-control step:
•	by the evaluator, after completion of the final draft of any report
•	whenever possible, also by an external evaluator of the evaluation (called the meta-evaluator).


The primary criteria of merit for evaluations are:
•	validity
•	usefulness (usually to clients, audiences, and stakeholders)
•	credibility (to select stakeholders, especially sources of funds and regulatory agencies, and usually also to program staff)
•	cost-effectiveness
•	ethicality (whether the ethical considerations that arise have been addressed).

There are several ways to go about a meta-evaluation, including the following:
•	apply the Key Evaluation Checklist (KEC) to the evaluation itself
•	use a special meta-evaluation checklist (several are available; the bibliography at the end of this chapter provides sources)
•	apply the Program Evaluation Standards to it.
It is highly desirable to employ more than one of these approaches.


Presenting Results

Summary Presenting the results of an evaluation accurately and clearly is a key part of the evaluation. A communication plan is a way to determine who needs to learn what, how much they need to learn, and when will they receive the information. Written and oral presentations are the two most common methods of communicating reports. Whatever method is used, the communication must be aimed at the audience, what they already know and what they need to know. Written reports should include: • executive summary − brief overview or introductory paragraph: − description of the evaluation, stating: − background: − summary of major findings − summary of conclusions and recommendations: • body of report − introduction − description of the evaluation − findings − conclusions − recommendations. Both written and oral reports can use visuals to assist with the communication. These include: •

pictures and illustrations



charts, graphs, and data charts



tables.

For oral presentations, follow the simple rules for making presentations:
•	Tell them what you will tell them.
•	Tell them.
•	Tell them what you told them.


Chapter 11 Activities

Application Exercise 11.1: Review Evaluation Reports

Instructions: Develop a list of criteria for judging an evaluation report in terms of how well it conveys evaluation findings to its intended audience. Ease of reading, clarity, use of tables and charts, and visual appeal might be among these criteria. Based on those criteria, assess a report that has recently been written in your field (in a group, if possible). Give it an overall grade based on your assessment of each of the criteria you identified: A for excellent, B for very good, C for adequate, and NI for needs improvement. Next, identify the most important improvements that could be made to the report so that it more effectively communicates the evaluation findings. If possible, present your findings to a group of colleagues.


Application Exercise Exercise 1111-2: Tailor Reports to Audiences Instructions: For the report you reviewed in Exercise 11-1, identify the various audiences that might be interested in the evaluation findings and/or the methodology. What information needs and other characteristics distinguish each audience group you have identified? Consider: •

Which aspects of the evaluation will be of greatest interest to each audience group and why?



At what point would this information be most useful to them?



What level of detail will be of interest to each audience group? [Consider the likely levels of expertise in international development, research methods, and evaluation; also the probable time available to peruse the report; you may think of other factors.]



What is the best way to communicate your findings so that it fits each group’s needs and preferences? [Consider literacy level, time available, likelihood of wanting to raise questions, any communication preferences you might be aware of, technical resources and challenges, etc.]

Based on your analysis, create a checklist to show which audiences should receive what kind of report/presentation, when, and from whom.
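As a purely illustrative aid for this exercise, the sketch below shows one possible shape for such a checklist; it is a minimal sketch, and the audiences, products, timing, and senders are invented placeholders rather than recommendations from the text.

```python
# Illustrative only: a skeleton communication checklist for evaluation reporting.
# Audiences, products, timing, and senders are hypothetical placeholders.

communication_plan = [
    # (audience, product, when, from whom)
    ("program managers",   "briefing on preliminary findings", "end of data analysis",             "team leader"),
    ("funding agency",     "full written report",              "one month before budget decision", "evaluation manager"),
    ("community partners", "two-page summary and oral brief",  "after report approval",            "field evaluator"),
]

print(f"{'Audience':20s}{'Product':36s}{'When':36s}{'From whom'}")
for audience, product, when, sender in communication_plan:
    print(f"{audience:20s}{product:36s}{when:36s}{sender}")
```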


References and Further Reading

Busuladzic, Dino, and James Trevelyan (1999). An ergonomic aspect of humanitarian demining. University of Western Australia, Demining Research. Retrieved January 24, 2008, from http://www.mech.uwa.edu.au/jpt/demining/tech/dino/ergonomics.html

Chinese National Centre for Science and Technology Evaluation (NCSTE) (China) and Policy and Operations Evaluation Department (IOB) (the Netherlands) (2006). Country-led Joint Evaluation of the ORET/MILIEV Programme in China. Amsterdam: Aksant Academic Publishers.

Cummings, Harry (2003). Using graphics in development evaluations. Presentation at IPDET, July 10, 2003.

Druker, Phil (2006). Advanced Technical Writing. Course from the University of Idaho. Retrieved August 9, 2007, from http://www.class.uidaho.edu/adv_tech_wrt/week14/conclusion_recommendation_final_report.htm

Ehrenberg, A. S. C. (1977). Rudiments of numeracy. Journal of the Royal Statistical Society A, 140, 277-297. Cited by Wright, P. (1982). A user-oriented approach to the design of tables and flowcharts. In D. H. Jonassen (ed.), The Technology of Text: Principles for Structuring, Designing, and Displaying Text (pp. 317-340). Englewood Cliffs, NJ: Educational Technology Publications.

Kaplan, Craig S. (2003). "Presentation Tips." The Craig Web Experience (weblog). Retrieved January 23, 2008, from http://www.cgl.uwaterloo.ca/~csk/presentations.html

Lawrenz, Frances, Arlen Gullickson, and Stacie Toal (2007). "Dissemination: Handmaiden to evaluation use." American Journal of Evaluation, 28(3), September 2007, 275-289.

Lentz, Michelle (2007). "Tufte and visual presentation." Write Technology (weblog), posted January 16, 2007. Retrieved June 2, 2008, from http://www.writetech.net/2007/02/presentation_st.html

Lester, P. M. (2000). Visual Communication: Images with Messages (2nd ed.). Canada: Wadsworth.

Levin, J. R., G. J. Anglin, and R. N. Carney (1987). "On empirically validating functions of pictures in prose." In D. A. Willows and H. A. Houghton (eds.), The Psychology of Illustration, Volume 1. Hong Kong: Springer-Verlag.


Office of the Victorian Privacy Commissioner (2007). Privacy Audit Manual, November 2007. Melbourne, Australia: Office of the Victorian Privacy Commissioner.

Scriven, Michael (2007). Key Evaluation Checklist, February 2007. Retrieved August 9, 2007, from http://www.wmich.edu/evalctr/checklists/kec_feb07.pdf

Taylor, Dave (2007). Use PowerPoint to enhance your presentation, not cripple it. Retrieved January 25, 2008, from http://www.intuitive.com/blog/use_powerpoint_to_enhance_your_presentation_not_cripple_it.html

Torres, R., H. S. Preskill, and M. E. Piontek (1996). Evaluation Strategies for Communicating and Reporting. Thousand Oaks, CA: Sage Publications.

Tufte, Edward R. (1983). The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.

Tufte, Edward R. (1990). Envisioning Information. Cheshire, CT: Graphics Press.

Tufte, Edward R. (1997). Visual Explanations: Images and Quantities, Evidence and Narrative. Cheshire, CT: Graphics Press.

Tufte, Edward R. (2003). "PowerPoint is evil: Power corrupts. PowerPoint corrupts absolutely." Wired, Issue 11.09, September 2003. Retrieved January 25, 2008, from http://www.wired.com/wired/archive/11.09/ppt2.html

Tufte, Edward R. (2006). Beautiful Evidence. Cheshire, CT: Graphics Press.

The Work of Edward Tufte (2008). Retrieved January 24, 2008, from http://www.edwardtufte.com/tufte/

Wallgren, Anders, Britt Wallgren, Rolf Persson, Ulf Jorner, and Jan-Aage Haaland (1996). Graphing Statistics and Data. Thousand Oaks, CA: Sage Publications.


Web Sites

Creating Tables in Microsoft Word – Part I Video Tip. http://www.helpmerick.com/node/1120

National Center for Education Statistics. Create a Graph. http://nces.ed.gov/nceskids/Graphing/

Oldfield, F. (2001). Educational Resources for Adults (ERforA): Learning about Charts and Graphs. http://www.fodoweb.com/erfora/readtext.asp?txtfile=communications/charts.toc

OSHA Office of Training and Education (1996, May). Presenting Effective Presentations with Visual Aids. U.S. Department of Labor. http://www.osha-slc.gov/doc/outreachtraining/htmlfiles/traintec.html

Scriven, Michael. Key Evaluation Checklist. http://www.wmich.edu/evalctr/checklists/kec_feb07.pdf

Statistics Canada. Using Graphs. http://www.statcan.ca/english/edu/power/ch9/using/using.htm

Torok, George. Presentation Skills Success. http://www.presentationskills.ca/

Zawitz, M. W. (2000). Data Presentation: A Guide to Good Graphics. Washington Statistical Society Methodology Seminars, Bureau of Justice Statistics. http://www.science.gmu.edu?~wss/methods/zawitz


Chapter 12
Managing for Quality and Use

Introduction

Evaluations can be complicated projects. Keeping everyone on task, meeting deadlines, and doing quality work can be challenging. The design matrix is a valuable tool for designing an evaluation, but other tools are available to help manage the evaluation. Planning is a large part of managing, and for a person managing an evaluation project it is critical. This chapter discusses ways for evaluators to plan, manage, meet quality standards, and share results so that the evaluation is used by policy makers to effect change. This chapter has five parts. They are:

• Managing the Design Matrix
• Managing an Evaluation
• Managing Effectively
• Assessing the Quality of an Evaluation
• Using Evaluation Results.


Part I: Managing the Design Matrix

The key to successful development evaluations is planning. If the evaluation is poorly planned, no amount of later analysis – no matter how sophisticated it is – will save it.

According to a Chinese adage, even a thousand-mile journey must begin with the first step. The likelihood of reaching one's destination is much enhanced if the first step and the subsequent steps take the traveller in the correct direction. Wandering about here and there without a clear sense of purpose or direction consumes time, energy, and resources. It also diminishes the possibility that one will ever arrive. So it is wise to prepare for a journey by collecting the necessary maps, studying alternative routes, and making informed estimates of the time, costs, and hazards one is likely to confront – in other words, "think before you leap."

Even the best of evaluation designs may result in a low-quality evaluation if, for example, people without the needed skills try to implement it, if it is delivered too late for a critical decision, or if the evaluation runs out of budget during data collection.

Recall that the evaluation design matrix is a visual way to map out an evaluation. The matrix is itself adaptable to best suit the needs of the evaluation, and it focuses attention on each of the major components in designing an evaluation. Like any plan, however, it will likely need updating and revising. It is unlikely that the evaluators will have all the information needed as they go through each step of the evaluation design process. As new information is learned – such as what secondary data sources can be used or the reliability of a project database on trainees – some of the initial ideas and approaches will generally need revising.

It takes time to complete a design matrix, since not all the information needed is available at the outset of the process. Planning is an iterative process. Sometimes evaluators run into dead ends – information they thought would be available is not – or the methods that seemed appropriate or practical are not. As the design develops, some of the original assumptions may need to be adjusted. Alternatively, it may be possible to state the information more accurately and in greater depth than first expected. The comment section of the matrix may be helpful in keeping track of unresolved issues, concerns, or names of contacts that might be helpful. Refining the evaluation is an ongoing process through most of the life of an evaluation.


In Chapter 10: Planning Data Analysis and Completing the Design Matrix, Table 10.16 presented an example of the types of information that go into each column of the evaluation design matrix, which links descriptive, normative, and impact evaluation questions to an evaluation design, data collection sources and methods, and analysis. In addition to the design matrix, useful tools for planning evaluations are:

Evaluation Plans and Operations Checklist (Stufflebeam, 1999)



Checklist for Program Evaluation Planning (McNamara, 2007).

Both are available on the Internet (see references at the end of this chapter). As with all the tools presented and referenced in these chapters, it is often useful to look carefully at several of them, and then create a version that will fit a particular situation.
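As a small, hypothetical illustration of how the columns of a design matrix line up for a single question, the sketch below lays out one invented row; the question and its entries are assumptions made up for the example and are not taken from Table 10.16 or from either checklist.

```python
# Illustrative only: one hypothetical row of an evaluation design matrix.
design_matrix = [
    {
        "question": "Did the training program increase participants' incomes?",
        "question_type": "cause-and-effect (impact)",
        "design": "quasi-experimental, with a matched comparison group",
        "data_sources": ["trainee database", "household survey"],
        "data_collection": "structured survey of trainees and comparison group",
        "analysis": "compare income changes between the two groups",
        "comments": "check reliability of the trainee database first",
    },
]

# Print the row column by column, the way it would appear across the matrix.
for row in design_matrix:
    for column, entry in row.items():
        print(f"{column:16s}: {entry}")
```

In practice the matrix is usually kept in an ordinary word-processing or spreadsheet table; the point of the sketch is only to show how each question carries its own design, data, and analysis decisions.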

Part II: Managing an Evaluation

It is of limited use to have a design matrix that, in its totality, far exceeds the financial resources available to implement it. A good design matrix also makes it easier to identify the skills and resources needed to carry out the evaluation. It is, for example, part of the job of the evaluation manager to identify the available budget for the evaluation, and revise the evaluation matrix as necessary to fit the available budget. Similarly, the evaluation manager must determine how to fill any skill gaps. This chapter covers these and other aspects of managing the evaluation.

The Evaluation Team

Conducting an evaluation involves working with other people. Evaluators work with the client (the person or organization commissioning the evaluation) and other stakeholders. For some evaluations, evaluators collaborate with other evaluators in a variety of jobs, with a variety of skills, knowledge, and responsibilities. Establishing and agreeing upon what is to be done, who will do it, and when it will be done is essential for managing evaluations. Clear terms of reference, roles, and responsibilities should be established, including who will produce which reporting outputs.


The following is a list of skills that evaluation teams should possess to enable team members to work together effectively (Northeast and the Islands Regional Technology in Education Consortium, NEIR TEC, 2004):

communicating the purposes and importance of the specified work



active listening – helping others voice their thinking through effective paraphrasing



collaborating with colleagues and others in positive inquiry practices, valuing each person’s unique contribution and viewpoint



putting aside personal biases for the sake of seeking answers to hard questions for the good of the persons affected by the project, program, or evaluation



putting aside defensive postures in the face of evaluation findings



reflecting on policies and practices with an open mind.

Evaluators also have to manage the fears of the program "owners." Some people are concerned that the evaluation, and the possible consequences of any negative findings it may produce, may be detrimental to a good program. Others fear evaluator bias, or fear that the evaluators will only be looking for, or concentrating on, negatives. Fears may also arise that a failure to detect difficult-to-measure program impacts will lead to the erroneous conclusion that the program is ineffective, with consequences for its funding. Evaluators should recognize such fear of evaluation. Involving program managers in planning the evaluation and giving them opportunities to review evaluation work plans, findings, and recommendations are ways to address these fears.

Terms of Reference

Terms of Reference (TOR) describe the overall evaluation and establish the initial agreements. They may include the design matrix, or production of the design matrix may be part of the tasks to be accomplished. The process of developing the Terms of Reference can be very useful in ensuring that all stakeholders are included in the discussion and in decision-making about which evaluation issues will be addressed. The TOR establishes the basic guidelines so that everyone involved understands the expectations for the evaluation and the context in which the evaluation will take place.


According to the Glossary of Key Terms in Evaluation and Results Based Management (OECD/DAC, 2002, p. 36), Terms of Reference are a written document that presents:

the purpose and scope of the evaluation



the methods to be used



the standard against which performance is to be assessed or analyses are to be conducted



the resources and time allocated



reporting requirements.

Terms of Reference typically include: •

Title: short and descriptive



Project or Program Description



Reasons for the evaluation and expectations



Scope and focus of the evaluation: the issues to be addressed and questions to be answered



Stakeholder involvement: who will be involved, defined responsibilities, and accountability process



Evaluation Process: what will be done



Deliverables: typically an evaluation work plan, interim report, final report and presentations



Evaluator qualifications: education, experience, skills and abilities required



Cost projection based on activities, time, number of people, professional fees, travel, and any other related costs.

According to the “Planning and Managing an Evaluation” Web site from the UNDP: It is always good to have written a TOR. The TOR serves as the basic tool for an evaluation manager to ensure the high quality of the exercise at different points – from the time the evaluation team is organized to the time that the exercise itself is conducted and the final report is prepared. Of course, the TOR has to be well-written, emanating from consultations with evaluation stakeholders and clearly directed at some very specific issues (UNDP, Planning and managing an evaluation, 2005).


The following guidelines for writing evaluation terms of reference are modified from the UNDP "Planning and Managing an Evaluation" Web site (2006).

State clearly the objectives of the evaluation, identifying:
− the stakeholders of the evaluation
− the products expected from the evaluation
− how the products are to be used
− the specific issues to be addressed
− the methodology
− the expertise required from the evaluation team
− arrangements for the evaluation.



Do not simply state the objectives in technical or process terms. Be clear on how the evaluation is expected to help the organization.



Focus on key questions to be addressed by the evaluation.



Avoid too many questions. It is better to have an evaluation that examines a few issues in-depth rather than one that looks into a broad range of issues superficially.

The authors of this book believe strongly that the design matrix should be included in the TOR or, if it is not provided, that the TOR should require development of the design matrix as a product. Fitzpatrick, Sanders, and Worthen (2004, p. 286) suggest that clients and evaluators should also consider agreeing on ethics and standards in contract agreements. There are some useful checklists available for drawing up evaluation contracts and budgets. In particular, the following resources should be helpful:

Evaluation Contracts Checklist (Stufflebeam)



Checklist for Developing and Evaluating Evaluation Budgets (Horn)



Key Evaluation Checklist (Scriven).

These checklists are available from The Checklist Project at The Evaluation Center, Western Michigan University, at the following Web site: http://www.wmich.edu/evalctr/checklists/checklistmenu.htm#mgt
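To complement the budgeting checklists above, here is a minimal, hypothetical sketch of how the kind of cost projection called for in a TOR might be tallied from activities, person-days, fees, and travel; all activities, rates, and amounts are invented for the example, and a real budget would follow the organization's own cost categories and checklists.

```python
# Illustrative only: a minimal cost projection for an evaluation TOR.
# All activities, rates, and amounts below are hypothetical.

activities = [
    # (activity, person-days, daily fee in US$, travel/other costs in US$)
    ("Document review and design matrix", 10, 400, 0),
    ("Field visits and interviews",       20, 400, 3500),
    ("Data analysis",                     10, 400, 0),
    ("Report writing and presentation",    8, 400, 1200),
]

def project_costs(items, contingency_rate=0.10):
    """Sum professional fees and travel, then add a contingency allowance."""
    fees = sum(days * rate for _, days, rate, _ in items)
    travel = sum(other for *_, other in items)
    subtotal = fees + travel
    contingency = subtotal * contingency_rate
    return {"fees": fees, "travel": travel,
            "contingency": contingency, "total": subtotal + contingency}

if __name__ == "__main__":
    for line, amount in project_costs(activities).items():
        print(f"{line:12s} US$ {amount:10,.2f}")
```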



Contracting Evaluations

There may be times when the human resources or specific technical skills needed to complete the evaluation are not available in-house. In such cases, hiring one or more people on contract to assist with the evaluation is a possibility (Hawkins, 2005). Of course, contractors should have no prior program involvement. Contractors can be brought in for the whole study or only parts of the study. Using contractors has advantages and disadvantages.



Advantages: −

contractor may have in-depth knowledge of the type of program being evaluated



may have more local knowledge or fluency in the local language



if a competitive process is used to select the contractor, the best proposal, at the least cost, may be selected

Disadvantages: −

expensive (tender process can cost more than the price of the contract)



loss of in-house knowledge building.

The process for hiring a contractor generally involves two main steps:
• developing a request for proposal (RFP)
• using a selection panel to choose a contract evaluator based on pre-specified criteria.


Hawkins (2005, slide 6) suggests that the following items be included in an RFP for contract evaluators:

purposes of the evaluation



background and context for the study



key information requirements



evaluation objectives



deliverables required



time frame



criteria for tender selection



contract details for the project manager



deadline for proposals



budget and other resources.

After establishing the terms of reference, begin setting up a process to hire a contractor. Hawkins suggests using the following selection process: •

Select a panel comprising people with: −

evaluation knowledge and experience



knowledge of the program area



knowledge of the culture



ownership of the findings and their uses.



Have the panel select the proposal using the criteria in the RFP.



Keep a record of the selection process.

Hawkins also suggests using the following criteria for selecting the contractor:




What is the contractor’s record of accomplishment?



Has the RFP been adequately addressed?



Is there a detailed explanation of implementation?



What is the communication and reporting strategy?



Is there evidence of competencies?



What is the cost — is it specified in detail?


Once a contractor is hired, the evaluator still has responsibilities towards the contractor and the evaluation. Hawkins (2005, slide 18) suggests the evaluator has the following responsibilities:

keeping goals and objectives clear



maintaining ownership of the study



monitoring the work and providing timely feedback



decision making – in good time



if changes are required to the contract, being open to negotiation with the contractor.

Roles and Responsibilities

Multiple people can work on an evaluation. They will have different capacities and will fill different roles and responsibilities. People can be engaged with an evaluation as:

evaluation manager



evaluator



client



provider of information (stakeholder)



consumer.

The important thing is that each person has their roles and responsibilities clearly defined and has agreed to them.

Evaluation Manager

The evaluation manager is the person who will manage the design, preparation, implementation, and follow-up of an evaluation. The evaluation manager may have several evaluations to manage at the same time. In some cases, where there is no evaluation manager, an evaluator will have the dual responsibilities of an evaluation manager as well as those of an evaluator.



Responsibilities of the Evaluation Manager

The following is a list of responsibilities that may be required of an evaluation manager. The list is adapted from UNFPA, Programme Manager's Planning, Monitoring and Evaluation Toolkit (2007, Box 1).

Preparation:




determine the purpose and users of the evaluation results



determine who needs to be involved in the evaluation process



together with the key stakeholders, define the evaluation design, objectives, and questions



draft the terms of reference for the evaluation; indicate a reasonable time-frame for the evaluation



identify the mix of skills and experiences required in the evaluation team



oversee the collection of existing information/data; be selective and ensure that existing sources of information/data are reliable and of sufficiently high quality to yield meaningful evaluation results; information gathered should be manageable



select, recruit, and brief the evaluator(s)



ensure that background documentation/materials compiled are submitted to the evaluator(s) well in advance of the evaluation exercise so that the evaluator(s) have time to digest the materials



decide whose views should be sought



propose an evaluation field visit plan



ensure availability of funds to carry out the evaluation



brief the evaluator(s) on the purpose of the evaluation; use this opportunity to go over documentation and review the evaluation work plan.


Implementation: •

ensure that the evaluator(s) have full access to files, reports, publications, and any other relevant information



follow the progress of the evaluation; provide feedback and guidance to the evaluator(s) through all phases of implementation



assess the quality of the evaluation report(s) and discuss strengths and limitations with the evaluator(s) to ensure that the draft report satisfies the TOR, and that evaluation findings are defensible, and recommendations are realistic



arrange for a meeting with the evaluator(s) and key stakeholders to discuss and comment on the draft report



approve the end product, ensure presentation of evaluation results to stakeholders.

Follow-up:

evaluate the performance of evaluator(s) and place it on record



disseminate evaluation results to the key stakeholders and other audiences



promote the implementation of recommendations and use of evaluation results in present and future programming; monitor regularly to ensure that recommendations are acted upon.

A major responsibility of evaluation managers is helping the evaluators do their work. If not in the same location as the evaluation team, they may find themselves using the telephone, electronic mail, or conferences to communicate with the evaluation team to: •

clarify the TOR for the evaluation team



answer questions



check on status of responsibilities



ask if they need additional resources



help the evaluation team to learn more about each other, their responsibilities, their areas of strength, and means of contacting each other.


The evaluation manager may serve as a facilitator during team meetings. As a facilitator, the evaluation manager will enable all participants to share their views and ideas. A facilitator is responsible for:

setting an agenda



helping the group stick to the agenda (topics and time schedule)



ensuring that all views are heard



overseeing a process for decision-making (a consensus or a voting process).

Evaluation managers will likely select the staff – either in-house or consultant – who will work on evaluation projects. One of the most important responsibilities of an evaluation manager is to review the strength of the evidence underlying the evaluation findings and recommendations made by the evaluation team. The evaluation manager also checks that:

the final report represents the findings and recommendations of the team as a whole



all of the issues specified in the TOR are addressed



there is a clear explanation if one or more issues have been dropped.

On some evaluations, one member of the team may have strong, dissenting views on a particular issue. In such cases especially, the evaluation manager must carefully review the evidence before proceeding.

Evaluator

Evaluators are the people who do the main work in an evaluation. There may be from one to many evaluators on an evaluation, and they may be internal or external; that is, in-house or contract evaluators. The number of people involved depends on the size of the evaluation, the budget, and the number of people available.


Evaluators may be chosen to participate in an evaluation for different reasons. The UNDP (2006) suggests characteristics that might be important in an evaluator. The following list is adapted from the UNDP materials:

expertise in the specific subject matter



knowledge of key development issues especially those relating to the main goals or the ability to see the "big picture"



familiarity with organization’s business and the way such business is conducted



evaluation skills in design, data collection, data analysis, and preparing reports



skills in the use of information technology.

Some organizations have evaluators on staff. Those organizations will usually use their evaluators. Other organizations may need to contract with people outside of their organization to work on evaluations. In either case, there are advantages and disadvantages for working with in-house evaluators and contract evaluators. If what the organization needs is relatively straightforward and the organization has someone on their staff with the capabilities to do the evaluation, they should be able to complete this evaluation internally. If the needs of the evaluation go beyond the in-house expertise, then the organization will need to hire one or more outside evaluation experts to supplement the existing staff expertise.



Responsibilities of Evaluators

The following is a list of potential responsibilities of evaluators, modified from the UNFPA (2007, Box 2):

provide inputs regarding evaluation design; bring refinements and specificity to the evaluation objectives and questions



conduct the evaluation



review information/documentation made available



design/refine instruments to collect additional information as needed; conduct or coordinate additional information gathering



undertake site visits; conduct interviews



in the case of a participatory evaluation, facilitate stakeholder participation



provide regular progress reporting/briefing to the evaluation manager



analyze and synthesize information; interpret findings, develop and discuss conclusions and recommendations; draw lessons learned



participate in discussions of the draft evaluation report; correct or rectify any factual errors or misinterpretations



guide reflection/discussions if expected to facilitate a presentation of evaluation findings in a seminar/workshop setting



finalize the evaluation report and prepare a presentation of evaluation results.

Evaluation involves many important skills, and evaluators must use sound judgment. Evaluation, however, is not yet a fully established profession. The field is currently attempting to establish professional competency criteria for individuals who engage in evaluation and to apply them to the appropriate tools and specific situations (Treasury Board of Canada, Improving the Professionalism of Evaluation, Overview and Summary of Findings). Table 12.1 summarizes the Essential Competencies for Program Evaluators (ECPE) (Stevahn, King, Ghere, & Minnema, 2005).


Table 12.1: Essential Competencies for Program Evaluators (ECPE)
(Source: Stevahn, King, Ghere, & Minnema, 2005)

1.0 Professional Practice
1.1 Applies professional evaluation standards
1.2 Acts ethically and strives for integrity and honesty in conducting evaluations
1.3 Conveys personal evaluation approaches and skills to potential clients
1.4 Respects clients, respondents, program participants, and other stakeholders
1.5 Considers the general and public welfare in evaluation practice
1.6 Contributes to the knowledge base of evaluation

2.0 Systematic Inquiry
2.1 Understands the knowledge base of evaluation (terms, concepts, theories, assumptions)
2.2 Knowledgeable about quantitative methods
2.3 Knowledgeable about qualitative methods
2.4 Knowledgeable about mixed methods
2.5 Conducts literature reviews
2.6 Specifies program theory
2.7 Frames evaluation questions
2.8 Develops evaluation designs
2.9 Identifies data sources
2.10 Collects data
2.11 Assesses validity of data
2.12 Assesses reliability of data
2.13 Analyzes data
2.14 Interprets data
2.15 Makes judgments
2.16 Develops recommendations
2.17 Provides rationales for decisions throughout the evaluation
2.18 Reports evaluation procedures and results
2.19 Notes strengths and limitations of the evaluation
2.20 Conducts meta-evaluations

3.0 Situational Analysis
3.1 Describes the program
3.2 Determines program evaluability
3.3 Identifies the interests of relevant stakeholders
3.4 Serves the information needs of intended users
3.5 Addresses conflicts
3.6 Examines the organizational context of the evaluation
3.7 Analyzes the political considerations relevant to the evaluation
3.8 Attends to issues of evaluation use
3.9 Attends to issues of organizational change
3.10 Respects the uniqueness of the evaluation site and client
3.11 Remains open to input from others
3.12 Modifies the study as needed


Table 12.1 (continued)

4.0 Project Management
4.1 Responds to requests for proposals
4.2 Negotiates with clients before the evaluation begins
4.3 Writes formal agreements
4.4 Communicates with clients throughout the evaluation process
4.5 Budgets an evaluation
4.6 Justifies cost given information needs
4.7 Identifies needed resources for evaluation, such as information, expertise, personnel, instruments
4.8 Uses appropriate technology
4.9 Supervises others involved in conducting the evaluation
4.10 Trains others involved in conducting the evaluation
4.11 Conducts the evaluation in a non-disruptive manner
4.12 Presents work in a timely manner

5.0 Reflective Practice
5.1 Aware of self as an evaluator (knowledge, skills, dispositions)
5.2 Reflects on personal evaluation practice (competencies and areas for growth)
5.3 Pursues professional development in evaluation
5.4 Pursues professional development in relevant content areas
5.5 Builds professional relationships to enhance evaluation practice

6.0 Interpersonal Competence
6.1 Uses written communication skills
6.2 Uses verbal/listening communication skills
6.3 Uses negotiation skills
6.4 Uses conflict resolution skills
6.5 Facilitates constructive interpersonal interaction (teamwork, group facilitation, processing)
6.6 Demonstrates cross-cultural competence


Main Client

The main client is the person who officially requests the evaluation and, if it is a paid evaluation, pays for or arranges payment for the evaluation. The evaluator may report to this same person. An evaluation may have many stakeholders that are also clients, but usually one main client. The client's planned use of the evaluation and the corresponding timing implications generally frame the evaluation, along with the specific issues that the client wants to have addressed. As discussed in Chapter 4: Understanding the Evaluation Context and Program Theory of Change, the evaluation team must manage its relationship with the main client, in part by developing a communication plan.

Stakeholders

Stakeholders' involvement can be limited to identifying issues and questions for the evaluation to consider addressing, or they can share responsibilities for the evaluation at various levels. Participatory evaluations include selected stakeholders in the evaluation team and engage them in question development, data collection, and analysis. As you may remember, stakeholders are more likely to support the evaluation and act on results and recommendations if they are involved in the evaluation process. It can also be beneficial to engage the project, program, or policy critics in the evaluation. In some cases, a program's critics can help identify issues that could discredit the evaluation if they are not addressed, thus helping to strengthen the evaluation process (Centers for Disease Control and Prevention, 2001, Chapter 1, p. 17).

The Project Management Process

Project management is about managing all the facets of a project at the same time. This includes managing:

time: duration of tasks, dependencies, and critical paths



scope: project size, goals, requirements



money: costs, contingencies



resources: people, equipment, material.

Project management is analogous to juggling. Like a juggler who must keep many balls continuously in the air, the project manager must keep track of many things at one time and be responsible for their success.
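To make the "duration of tasks, dependencies, and critical paths" item in the list above concrete, the following is a minimal sketch, with invented tasks and durations, of how the overall duration and critical path of a small evaluation work plan might be computed; it illustrates the general technique rather than any specific project management tool.

```python
# Illustrative only: earliest-finish scheduling for a tiny evaluation work plan.
# Task names, durations (in weeks), and dependencies are hypothetical.
from functools import lru_cache

tasks = {
    "design matrix":        (3, []),
    "instruments":          (2, ["design matrix"]),
    "data collection":      (6, ["instruments"]),
    "data analysis":        (3, ["data collection"]),
    "draft report":         (2, ["data analysis"]),
    "stakeholder workshop": (1, ["draft report"]),
}

@lru_cache(maxsize=None)
def earliest_finish(name):
    """Earliest finish = task duration plus the latest finish among its prerequisites."""
    duration, deps = tasks[name]
    return duration + max((earliest_finish(d) for d in deps), default=0)

def critical_path():
    """Walk back from the task that finishes last, always through the latest-finishing prerequisite."""
    current = max(tasks, key=earliest_finish)
    path = [current]
    while tasks[current][1]:
        current = max(tasks[current][1], key=earliest_finish)
        path.append(current)
    return list(reversed(path))

print("Project duration (weeks):", max(earliest_finish(t) for t in tasks))
print("Critical path:", " -> ".join(critical_path()))
```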


Most experts in the field agree that project management is a process involving phases or stages. There are many project management models. Michael Greer, a well-known authority on project management, has developed a model that emphasizes actions and results. An organization may have its own model, or it can choose another model. The important thing is to understand the many things for which a manager is responsible and the actions and results that should be included. Greer's model divides project management into five phases. The information in this section is adapted from his Project Management Resources Web site (Greer, 2008). His five phases are:

initiating



planning



executing



controlling



closing.

Figure 12.1 shows a diagram of this five-phase process:

Initiating → Planning → Executing → Controlling → Closing

Figure 12.1: Michael Greer's Five-Phase Project Management Process

Each of these phases is divided into steps, which Greer calls actions, and for each action he includes a description of its results. In total, Greer identifies 20 actions across this process. The 20 key project manager actions are organized according to their support of the five essential project management processes: initiating, planning, executing, controlling, and closing (Greer, 2001, 20 Key Project Manager Actions and Results).


Table 12.2: Twenty Key Project Manager Actions
(Columns: Action / Results of Successful Performance)

Initiating Phase 1. Demonstrate project need and feasibility.

2. Obtain project authorization.

3. Obtain authorization for the phase.

A document confirming that there is a need for the project deliverables and describing, in broad terms: the deliverables, means of creating the deliverables, costs of creating and implementing the deliverables, benefits to be obtained by implementing the deliverables. • A "go/no go" decision is made by the sponsor. • A project manager is assigned. Formally recognizes the project Is issued by a manager external to the project and at a high enough organizational level so that he or she can meet project needs Authorizes the project manager to apply resources to project activities • A "go/no go" decision is made by the sponsor which authorizes the project manager to apply organizational resources to the activities of a particular phase • Written approval of the phase is created which: Formally recognizes the existence of the phase Is issued by a manager external to the project and at a high enough organizational level so that he or she can meet project needs

Planning Phase 4. Describe project scope. 5. Define and sequence project activities.

6. Estimate durations for activities and resources required. 7. Develop a project schedule. 8. Estimate costs.

9. Build a budget and spending plan.

10. Create a formal quality plan. (optional) 11. Create a formal project communications plan. (optional)

• Statement of project scope • Scope management plan • Work breakdown structure • An activity list (list of all activities that will be performed on the project) • Updates to the work breakdown structure (WBS) • A project network diagram • Estimate of durations (time required) for each activity and assumptions related to each estimate • Statement of resource requirements • Updates to activity list • Supporting details, such as resource usage over time, cash flow projections, order/delivery schedules, etc. • Cost estimates for completing each activity • Supporting detail, including assumptions and constraints • Cost management plan describing how cost variances will be handled • A cost baseline or time-phased budget for measuring/monitoring costs • A spending plan, telling how much will be spent on what resources at what time • Quality management plan, including operational definitions • Quality verification checklists • A communication management plan, including: Collection structure Distribution structure Description of information to be disseminated Schedules listing when information will be produced method for updating the communications plan


Table 12.2 (continued)

12. Organize and acquire staff.

13. Identify risks and plan to respond. (optional) 14. Plan for and acquire outside resources. (optional)

15. Organize the project plan. 16. Close out the project planning phase. 17. Revisit the project plan and re-plan if needed.

• Role and responsibility assignments • Staffing plan • Organizational chart with detail as appropriate • Project staff • Project team directory • A document describing potential risks, including their sources, symptoms, and ways to address them • Procurement management plan describing how contractors will be obtained • Statement of work (SOW) or statement of requirements (SOR) describing the item (product or service) to be procured • Bid documents, such as RFP (request for proposal), IFB (invitation for bid), etc. • Evaluation criteria -- means of scoring contractor's proposals • Contract with one or more suppliers of goods or services • A comprehensive project plan that pulls together all the outputs of the preceding project planning activities • A project plan that has been approved, in writing, by the sponsor A "green light" or okay to begin work on the project • Confidence that the detailed plans to execute a particular phase are still accurate and will effectively achieve results as planned

Executing Phase 18. Execute project activities.

• Work results (deliverables) are created. • Change requests (i.e., based on expanded or contracted project) are identified • Periodic progress reports are created • Team performance is assessed, guided, and improved if needed • Bids/proposals for deliverables are solicited, contractors (suppliers) are chosen, and contracts are established • Contracts are administered to achieve desired work results

Controlling Phase 19. Control project activities.

• Decision to accept inspected deliverables • Corrective actions such as rework of deliverables, adjustments to work process, etc. • Updates to project plan and scope • List of lessons learned • Improved quality • Completed evaluation checklists (if applicable)

Closing Phase 20. Close out project activities.

• Formal acceptance, documented in writing, that the sponsor has accepted the product of this phase or activity • Formal acceptance of contractor work products and updates to the contractor's files • Updated project records prepared for archiving • A plan for follow-up and/or hand-off of work products

(Source: The Project Manager’s Partner © Copyright 1996, 2001 Michael Greer & HRD Press)



Part III: Managing Effectively

Many analogies have been drawn in defining project management. Some use the analogy of juggling: like a circus performer tossing and catching objects in the air, the project manager needs to keep an eye on people, tasks, time, budget, and quality. Others use the analogy of an orchestra, in which the project manager stands at a distance and directs the activity. Each analogy shows the need for a person with the knowledge and ability to control a project. To look more closely at how to manage effectively, we need to address how a manager deals with people and tasks.

Managing People Effectively

Typically, a group, rather than an individual, works together to complete an evaluation. When more than one client or key stakeholder is involved, more time is necessarily spent on communication. Participatory evaluations add to the complexity: still more people are involved, and communication becomes a central and major responsibility of the manager. The evaluation manager must work with the evaluation team to clearly articulate the goals, objectives, and values of the evaluation. The roles and responsibilities of each group member must be clearly articulated, and team members need to know what is expected of them, when it is expected, and in what form. For a team to do its best, the project manager must give the team all of the information needed to succeed.

Managers manage people, NOT evaluations.

People are complicated. They are not machines, and their behavior will change from day to day. Be sure to stay alert to what is going on with each person. At the beginning of each evaluation, sit down and get to know the staff. Find out what other projects they are working on, what their goals are, what they like to do in their free time, and so on. Managers do not need to put their unique handprint on everything; some things probably work just fine already. Good managers also do not think or act as if they know everything, because nothing breeds resentment more than arrogance.


Anyone promoted from evaluator to manager needs to let go of his or her previous responsibilities as an evaluator. The manager has different responsibilities now and is responsible for everything that happens within his or her scope of authority. Managers must not think that just because they may not be doing the actual work, they are not responsible – the manager is still responsible. The United Nations Development Programme (UNDP) provides the following tips to managers for working with evaluators (UNDP, 2006).




Clarifying the TOR for the evaluation team is important. Electronic mail and other means of communications should be used fully to get this done even before the traditional briefing that is held for the team upon arrival in the countries to be visited.



Providing basic documentation that the team should analyze well ahead of time should help clarify some issues at an early stage.



Agreeing on the program for the evaluation mission is critical. Remember that it is not enough that evaluators visit relevant institutions. Make sure that they interview the "right" people in those institutions, e.g., those who are experts on the subject, familiar with the project and its beneficiaries, and at a level of authority that allows them to speak adequately about certain policy issues.



Getting the evaluation team to know each other builds teamwork. Having the team members share their CVs and contact information even before they meet each other is usually helpful in breaking the ice. It also enables them to have an idea of the strengths and contributions that each one of them can offer to the exercise.


Meeting with the Client for Contextual Information

Michael Scriven's Key Evaluation Checklist (2007) describes the importance of meeting with the client before beginning work on the evaluation. He writes about meeting with the client to identify the details of the job or jobs as the client sees them, or to encourage the client to clarify their thinking on details where that has not yet occurred. This theme has been echoed throughout this textbook. We encourage meeting early with the client to expand on the:

nature and context of the request



need, issue, or interest leading to the evaluation request



critical timing needs for the evaluation findings



questions of major importance for the evaluation to address from the client’s perspective



communication schedule and frequency.

Scriven suggests asking questions such as: •

Is the request or the need for this evaluation to investigate worth or merit — or both worth and merit? An evaluation of worth involves serious cost analysis. An evaluation of merit investigates the significance.



Exactly what are you supposed to be evaluating?



How much of the context is to be included?



Are you supposed to be evaluating the effects of the program as a whole or the contribution of each of its components, or perhaps additionally the client’s theory of how the components work?



Are you to consider impact in all relevant respects or just some respects?



Is the evaluation to be formative, summative, descriptive, or more than one of these?



Should the evaluation yield grades, ranks, scores, profiles, or apportionments?



Are recommendations, explanations (i.e., your own theory), fault-finding, or predictions requested, or expected, or feasible?



Is the client really willing and anxious to learn from faults or is that just a pose? −

The evaluator’s contract, or, for an internal evaluator, job, may depend on getting the answer to this question right.



Foundations

When working with the client, learn as much as you can about the foundations of the project, program, or policy. Scriven's Key Evaluation Checklist goes on to identify these as:

background and context



descriptions and definitions



consumers



resources



values.

Background and Context

After meeting with the client and clarifying the needs and parameters of the evaluation, continue investigating the context and nature of the evaluation. To Scriven (2007), this means identifying historical, recent, simultaneous, and any projected settings for the program. To do this:




Identify any ‘upstream stakeholders’ and their stakes – other than clients. That is, identify people, groups, or organizations that assisted in implementation of the program or its evaluation. For example, people who assisted with funding or advice or housing.



Identify enabling (and any more recent relevant) legislation/policy, and any legislative/executive/practice or attitude changes since the start-up.



Identify the underlying rationale, also known as the official program theory, and the political logic (if either exists or can be reliably inferred). Neither is necessary, but they are sometimes useful.



Identify the general results of the literature review on similar interventions. Include 'fugitive' studies (those not published in standard media) and searches of the Internet, including the 'invisible web' and the latest group and web log (blog) search engines.



Identify previous evaluations, if any.



Identify their impact, if any.


Descriptions and Definitions

Another important part of the meeting with the client is to standardize descriptions and definitions. We advise sharing and using the OECD/DAC Glossary for this. Scriven (2007) suggests the following:

Record any official description of program, components, context/environment, but do not assume it is correct.



Develop a correct and complete description in enough detail to recognize the evaluand, and if needed, to replicate it.



Explain the meaning of any ‘technical terms,’ such as those that will not be in the prospective audiences’ vocabulary.



Note any significant patterns/analogies/metaphors that are used by (or implicit in) participants' accounts, or that occur to you. These are potential descriptions and may be more enlightening than literal prose, whether or not they can be justified.



Distinguish the initiator’s efforts in trying to start up a program from the program itself. Both the effort and program itself are interventions; only the program itself is (normally) the evaluand.

Resources

Resources are, of course, another focus for the evaluation manager. Scriven (2007) also advises learning about the resources for the evaluation, including:

financial assets



physical assets



intellectual-social-relational assets.

While investigating, be sure to look at the abilities, knowledge, and goodwill of: •

staff



volunteers



community members



other supporters.


Your review of resources should cover what the evaluation could now use or what could have been used, not just what was used. This defines the "possibility space," that is, the range of what could have been done – often an important element in the comparisons that an evaluation considers. It may be helpful to list specific resources that were not used or available in this implementation. For example, to what extent were potential impactees, stakeholders, fund-raisers, volunteers, and possible donors not recruited or not involved as much as they could have been? As a check, and as a complement, consider all constraints on the program.

Values The last of Scriven’s foundations from his Key Evaluation Checklist is values. He states that knowledge of the values is important for learning about the context of the evaluation. He suggests checking the values shown in Table 12.3 for relevance and look for others.


Table 12.3: Values to Check for Relevance

• Needs of the impacted population: Use a needs assessment. Distinguish performance needs from treatment needs, met needs from unmet needs, and meetable needs from ideal but impractical or impossible-with-present-resources needs.

• Criteria of merit from the definition of the evaluand and from standard usage: Since a program is usually regarded as better (by definition) if it reaches more people and has a larger good effect on them (other things being equal), the criteria of merit typically include the number of people impacted by the program and the depth of desirable impact.

• Logical requirements: For example, consistency.

• Legal and ethical requirements (they overlap): Usually including (reasonable) safety, confidentiality, and perhaps anonymity for all impactees.

• Personal and organizational goals/desires: If not in conflict with ethical/legal/practical considerations (and if you are not doing a goal-free evaluation); these are usually much less important than the needs of the impactees, but are enough by themselves to drive the inference to an evaluative conclusion about, for example, which apartment to rent.

• Fidelity to alleged specifications: Also known as "authenticity," "adherence," or "compliance." It is often usefully expressed via an "index of implementation" and, a different but related matter, consistency with the supposed program model (if you can establish this beyond reasonable doubt).

• Sub-legal but still important legislative preferences

• Professional standards of quality that apply to the evaluands; expert judgment

• Historical/traditional/cultural standards

• Scientific merit (or worth or significance)

• Technological merit, worth, or significance

• Marketability

• Political merit: If you can establish it beyond a reasonable doubt.

• Resource economy: How low-impact is the program with respect to money, space, time, labor, contacts, expertise, and the eco-system?



Techniques for Teamwork

There may be times when team members need to work closely together to make decisions. There are techniques to help managers. They include:

communication skills



teamwork skills



brainstorming



affinity diagrams



concept mapping



conflict resolution



communication strategies.

Communication Skills

Communication skills are essential for the success of any evaluation. Managers must be able to communicate with team members, stakeholders, and subjects. There are two types of communication, non-verbal and verbal, and managers continually use both.

Giving People Time

Setting aside a specific time for meetings and regular communications is good practice. This allows time for everyone involved to prepare. Also keep in mind that, when working to communicate effectively, listening is often much more productive than talking. Allow everyone involved the time they need to communicate effectively.



Moderating Group Activities

Moderators of group activities must be able to communicate with participants and listen to what they have to say. Moderators should do what they can to improve communication within the group. The following are some suggestions (Mancini Billson, 2004):

Use name tents (cards) so people can refer to each other using names.



Respond positively to a person’s initial attempts to communicate and invite further contributions – this will affect whether the participant will risk contributing again.



Avoid passing over group members.



Respond in a positive manner to comments that are not quite on the mark and invite further input:





“now let’s take a step further”



“keep going”



“that will become important later”



“don’t forget what you had in mind.”

Avoid “put down” and close-off comments.

Moderation depends on very good listening skills, and to be a good moderator many people need to improve theirs. The listening process is more than hearing. Use active and reflective listening: active listening involves paying attention to what is being said and then paraphrasing what was heard back to the speaker, asking for acknowledgement that that was what the speaker meant. There are many techniques for improving listening skills. The following is a short list:

Recognize that both the sender and the receiver share the responsibility for effective communication.



Listen actively and neutrally.



Listen with an inner ear for what is actually meant, rather than for what is said.



Tune in to the speaker’s non-verbal cues.



Be aware that your posture affects your listening.



Restate or paraphrase the main ideas to ensure that you have heard them correctly.


Moderators need to pay close attention to non-verbal cues. People communicate by their actions as well as their words. The following non-verbal cues may indicate that there is a problem with communication:
• silence
• arms folded
• head nodding
• finger tapping
• yawning
• looking at a watch
• frowning
• working with a cell phone or other personal digital assistant (PDA)
• making frequent trips out of the room.

Teamwork Skills
Working with others can be challenging as well as rewarding. When working closely with others, use teamwork skills. The following are some of the most important skills for working with others in a team:
• listening – good active listening skills by all on the team can be a team’s most valuable asset
• questioning – team members should ask questions to clarify and elaborate on what others are saying
• persuading – team members may need to exchange, elaborate, defend, and rethink their ideas
• respecting – team members should respect the opinions of others and should encourage and support their ideas and efforts
• helping – team members should help each other by offering assistance when it is needed.


Brainstorming
Brainstorming is a technique used to gather large amounts of information in a short time from a group of people. In brainstorming, each person contributes an idea for an evaluation question, which is written on a flip chart. Each person in turn offers one idea, and the facilitator keeps circling the group until no more ideas are offered. The basic rule is that every idea goes up on the flip chart—there are no bad ideas, and no discussion of the ideas occurs at this point. In this way, all ideas are heard without regard to status. The group as a whole then begins to identify common ideas (in this situation, common questions), and a new list is created that captures all the questions.

Affinity Diagrams
An alternative to brainstorming is an approach called an affinity diagram. In this approach, everyone writes his or her ideas for evaluation questions on a piece of paper or a note card. Only one idea can go on each card or piece of paper. This occurs in silence. When people have listed all their comments, suggestions, or questions, they place their cards or pieces of paper on a wall; again, this is done in silence. Then the group begins to arrange the ideas into common themes. This process begins in silence, and once there is a rough sort, the facilitator goes through what is on the wall and leads the group in identifying the common themes. The choice between brainstorming and an affinity diagram depends on the group. Brainstorming works well if people cannot write and if the facilitator can handle dominant people; it works less well for shy people. The affinity diagram works well as a fairly anonymous process, so that everyone can get his or her idea posted regardless of status or any fears about speaking. However, it requires that people be able to write and be comfortable with the process.

Concept Mapping
When working with stakeholders, one approach that might be useful is concept mapping. Concept mapping is a group process that provides a way for everyone’s ideas to be heard and considered. The first step is to generate ideas; one way is to brainstorm, another is to use affinity diagrams. In either case, concept mapping then includes a validation process: if the ideas are grouped together, they should represent a similar concept or theme.


The group can then discuss the concepts (big evaluation questions) and why they are important or not. Next, the group can rate each concept in terms of importance, with 1 being not important and 5 being very important. Or people can rate each of the questions as being essential, important but not essential, or nice to know but not important. Each person rates each question posted on the wall. Again, this provides some anonymity so everyone can feel free to express his or her view.
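Tallying the ratings can be done by hand on the wall chart or with a few lines of code. The sketch below (Python) is illustrative only; the candidate questions and the 1–5 scores are hypothetical, and averaging is just one reasonable way to summarize the group's ratings.

```python
from statistics import mean

# Hypothetical ratings: each list holds one participant's 1-5 score
# for a candidate evaluation question generated during concept mapping.
ratings = {
    "Did the program reach the intended beneficiaries?": [5, 4, 5, 3],
    "Were activities implemented on schedule?": [3, 2, 4, 3],
    "What unintended effects occurred?": [4, 5, 4, 5],
}

# Average each question's ratings and rank from most to least important.
ranked = sorted(
    ((mean(scores), question) for question, scores in ratings.items()),
    reverse=True,
)

for avg, question in ranked:
    print(f"{avg:.1f}  {question}")
```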

Conflict Resolution
When working with people in groups, there is a good chance that conflicts will occur among team members. Not all conflict should end with a winner and a loser; the most constructive conflicts end with both parties “winning.” The two skills most needed in resolving conflict are communication skills and listening skills. Important communication skills include using “I” statements instead of “you” language. People in conflict should discuss their own feelings; owning your own feelings and your own communication is a much more effective way to communicate and goes a long way toward reducing conflict. Use active listening skills: active listening involves trying to understand what the other person is saying and then communicating to the sender that you do indeed understand what was said. You might say, “I hear you saying … is that correct?” The following are suggestions for ways to help resolve conflict.

• Bring those in conflict to a meeting. Allow each to briefly summarize their point of view, without interruption. If one person does not allow the other to finish or begins to criticize, the manager needs to stop him or her. Each must present their side.
• Ask each person involved to describe the actions they would like to see the other person take.
• Listen to both sides. Ask yourself whether anything about the work situation is causing this conflict. If so, consider ways of changing the work situation to solve the conflict.
• Do NOT choose sides. Remind the participants of the goal or objective of the evaluation and strive to find a way to help both sides reach it.
• Expect the participants to work to solve their dispute. Allow them to continue to meet to address the conflict. Set a time to review their progress.


Communication Strategies
A communication strategy helps plan the way to communicate with the client, key stakeholders, and the public. It defines the why, what, where, and how for giving and receiving information. If the evaluation involves sensitive information, establishing a communication strategy is especially important. A communication strategy establishes a plan for communicating and usually involves different media for different audiences. The Economic and Social Research Council (ESRC, 2007, Top Ten Tips) offers ten tips for putting together a communication strategy. The following, adapted from the ESRC list, are most applicable to evaluations:
1. Begin with a statement of the objectives in communicating the project; do not simply restate the objectives of the project itself. Make them clear, simple, and measurable.
2. Develop some simple messages and model how these might work in different contexts – a press release, a report, a newspaper article, a website page. Make sure the project is branded in line with the communication objectives.
3. Be clear about the target audiences and user groups, then prioritise them according to importance and influence relative to the objectives. Do not just think about the ‘usual suspects’.
4. Think about both the actual and preferred channels the target audiences might use. Check to ensure that planning uses the right ones for maximum impact.
5. Include a full list of all the relevant communications activities, developed into a working project plan with deadlines and responsibilities. Keep it flexible but avoid being vague.


Working with Groups of Stakeholders
The World Bank’s Web pages (1996, Involving Stakeholders) offer advice on involving stakeholders. This advice is particularly appropriate for participatory evaluations, and the techniques suggested are summarized below. Once stakeholders have been identified, the next step is to enlist their participation. Evaluators have sought to work with affected stakeholders through a variety of approaches, but “special” measures are needed to ensure that groups that are normally excluded from the decision-making process have a voice. To achieve this, evaluators may have to first organize the “voiceless,” arrange for their representation, hold exclusive participatory sessions with them, employ “leveling” techniques that allow stakeholders at all levels to be heard, and use surrogates – intermediaries with close links to the affected stakeholders. But what happens when opposition exists? The following are examples of techniques to ensure that important stakeholders are involved. The extent to which the evaluator uses some or all of these techniques depends on the size and needs of the program.

Building Trust
To many of the identified stakeholders, an outsider bringing offers of “participatory development” may seem suspect. Prior experience with public agencies, public servants, and donor projects has, in many places, created negative impressions that need to be rectified. The following are ways of building trust.

Sharing Information
One way to build trust is to share information about what is intended by the evaluation. You can do this in individual meetings or with large groups, such as a “town meeting.” During these meetings, a representative of the project can share information about the hows and whys of the evaluation. The participants in the meeting have the opportunity to express their expectations and concerns. Once trust is established, participants can be invited to form their own committees and participate in the evaluation.


Interacting Repeatedly
Another way to build trust is through intensive and repeated interaction between the evaluators and the stakeholders. As both sides develop a feel for and understanding of one another through iterative planning sessions, suspicion begins to dissipate and the basis for trust, respect, and cooperation can be established.

Working through Intermediaries
In some instances distrust is so great that intermediaries may be required to bridge the gap. In these cases, a person or organization that is respected by the stakeholders can use its unique position to bring the different parties together.

Involving Directly Affected Stakeholders
A great deal can still be learned about how to work with directly affected stakeholders. The following are approaches for enabling intended beneficiaries – as well as those likely to be adversely affected – to participate in planning and decision-making.

Working with the Community
There are several ways of working with the community. One way is to use information campaigns, in which the evaluator shares important information about the evaluation with the community. The community will feel more comfortable and be more willing to assist if its members feel they are involved. In another approach, the organization in charge of the intervention designs the intervention first and then negotiates it with the stakeholders. Some evaluators use yet another approach: they involve the community from the very beginning, so that the community assists with the design as the intervention emerges. Mapping is another way to involve the community: the evaluator asks people to help draw maps that show important routes, interactions, and places in the community.


Working with the Representatives
After meeting with the entire group, you may want to have the stakeholders form committees of representatives from each area. The members of these committees can give input into the evaluation.

Working with Surrogates
Another approach to involving directly affected stakeholders is through intermediaries or surrogates. Surrogates may be any group or individual with close links to the affected population who is capable of representing its views and interests during participatory planning. Be sure to exercise caution in selecting surrogates to speak for the directly affected: in some cases, surrogates represent their own interests instead of those of the stakeholders they are meant to represent.

Seeking Feedback
In cases in which stakeholders participate through their representatives or surrogates, evaluators should follow the rule of thumb that one should trust those who speak for the ultimate clients but, from time to time, verify directly with those whose opinion really counts. The following are ways to cross-check and to see whether the approach also facilitates broader ownership and commitment among those affected by the evaluation.

Making On-site Visits
When looking for feedback from stakeholders, consider the value of direct interaction with communities to ensure that their interests are being accurately represented in the evaluation. The direct interaction can occur both formally and informally. Formal sessions can be arranged and facilitated by representatives of the stakeholders. Evaluators can report to stakeholders on the progress of the evaluation. Open discussion can follow, in which individual stakeholders can express their opinions and ask questions. Their feedback can then be incorporated in the final evaluation. In addition to these formal meetings, the evaluators can visit stakeholders unannounced. They can introduce themselves and ask whether stakeholders have heard about the evaluation and what they think of it. This informal feedback can be compared with what the evaluator hears at the more formal level. It serves as a way of verifying consistency and checking for biases.


Stakeholder Review of Documents
Another way evaluators can obtain feedback is by providing the opportunity for stakeholders to review and revise draft documents prepared by the design team. Evaluators find this follow-up to be crucial in fostering broader ownership and commitment beyond just those who were present at the participatory planning events.

Involving the Voiceless
Some groups – especially the very poor, women, indigenous people, or others who may not be fully mobilized – may not have the organizational or financial wherewithal to participate effectively. These are often the exact stakeholders whose interests are critical to the implementation success and sustainability of interventions. Special efforts need to be made to level the disequilibrium of power, prestige, wealth, and knowledge when stronger and more established stakeholders are meant to collaborate with weaker, less organized groups. The following are ways of involving the voiceless.

Building Capacity
Evaluators can build capacity to involve the “voiceless” by helping local people form and strengthen their own organizations. By organizing communities, local people learn how to work together to take care of their individual and communal needs. Once organized, and having clarified their own interests, their willingness and ability to use the new power and skill of speaking with one, unified voice increases significantly. For example, in many rural situations, women are left out of decision-making. Evaluators can make the rule that at least one mother is included on each committee.

Organizing Separate Events
Another way to include the “voiceless” is to organize separate events. If a large group is left out, you may want to organize a separate event just for that group. For example, if a religious, ethnic, or gender group is not invited to the planning event but has a large stake in the evaluation, you may set up a meeting with the group or groups alone, in a separate event where they can articulate their priorities and concerns.


Leveling Techniques
Power differences among stakeholders can be diminished with participatory techniques. Skilled design and facilitation of participatory processes can promote “level” interactions. Small working groups, governed by facilitator-monitored “behavioral rules” that ensure that all participants speak and receive respect for their contributions, are one way of doing this. “Leveling” is facilitated when people listen to or observe quietly what others say without criticism or opposition. Quiet observation encourages the “voiceless” to express themselves through non-verbal representations. Similarly, role reversal can help level the playing field. Role-playing exercises are another means of leveling.

Using Surrogates
In some cases it may be logistically infeasible to bring the “voiceless” to meetings. They may live long distances away or may not be able to leave their homes because of family responsibilities. In other cases, it may not be feasible because of the power differences between the “voiceless” and the people in power. When making presentations to the minister and other senior government officials, the “voiceless” might feel intimidated and overwhelmed and might not be able to articulate their needs effectively. Bringing in surrogates – people who are very familiar with the problems and have experience working with bureaucrats and local government officials – may be a solution. It can be easier for them to speak to more powerful stakeholders and to participate more equally in preparing action plans on behalf of the “voiceless.”


Involving the Opposition
Sometimes, however, collaboration among different stakeholders may not be possible. In these cases, either resources should not be committed to the proposed activity or a group of stakeholders may have to be left out, generally by modifying the concern being addressed. Stakeholder conflict is often produced by the external expert stance: when external experts formulate a complete, fully developed proposal and present it to the people it affects, immense room for misunderstanding exists on the part of those who were not involved in preparing the proposal. Ways of involving the opposition include:

Starting Early and Broadly
In most instances, fully developed evaluation designs are really “take-it-or-leave-it” propositions, no matter how much lip service is paid afterward to collaborative decision making. After considerable time and effort is spent preparing the evaluation design, evaluators are not likely to be open to significant changes. For those who perceive a loss for themselves in the evaluation design, outright opposition may appear to be the only possible stance; the greater the loss, the stronger the opposition is likely to be. Once opposition mobilizes, it is difficult – if not impossible – to resolve the matter. When all stakeholders collaborate in designing their collective evaluation, the chances increase that former differences will be resolved and a new consensus will emerge around issues everyone can agree on. This is probably so because people who have to live and work together can often find ways to agree if given the chance. Unfortunately, people do not often get the chance to work together. Development evaluations prepared in the external expert stance do not provide that chance. The participatory process, however, facilitates working together. So early participation can be a “conflict avoidance” process to the degree that it helps stakeholders with different interests explore and potentially find common interests.


Finding Common Ground
By focusing on common interests, most evaluations result in sustainable collaborative action. Despite the success stories, consensus will sometimes be unattainable and no basis will exist for future action, especially in situations with a long history of entrenched conflict and divisiveness among the parties. In such cases, the result is no action, which is probably better than action that will fall apart during implementation for want of consensus.

Dealing with Deadlock
Alternatively, when strong opposition to the evaluation design exists from one set of stakeholders, an evaluator may, in certain circumstances, proceed by leaving out that set of opposition stakeholders and working with the others. Employing this approach has many potential dangers, but it does happen from time to time and has worked.

Managing Tasks Effectively
Managing the tasks may be easier than managing people. It helps to stay focused on the evaluation goal and the most important tasks. A task map can help by listing everyone’s particular assignments along with the start and completion dates (see Table 12.4).

Table 12.4: Hypothetical Task Map – first portion (7/1 to 8/31)

Task | Name | Start date | Due date
Review prior reports | Linda | 7/1 | 7/31
Schedule meeting with stakeholders | Ed | 7/15 | 7/31
Conduct stakeholder meetings | Linda and Ed | 8/1 | 8/15
Design the evaluation | Ray | 7/1 | 8/31
Develop data collection instruments | Ray | 8/1 | 8/31
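The same task map can also be kept in a simple machine-readable form so that overdue assignments are easy to flag. The sketch below (Python) mirrors the hypothetical entries in Table 12.4; the year and the overdue check are illustrative additions, not part of the original tool.

```python
from datetime import date

# Hypothetical task map, mirroring Table 12.4 (dates assume year 2008).
task_map = [
    {"task": "Review prior reports", "name": "Linda",
     "start": date(2008, 7, 1), "due": date(2008, 7, 31)},
    {"task": "Schedule meeting with stakeholders", "name": "Ed",
     "start": date(2008, 7, 15), "due": date(2008, 7, 31)},
    {"task": "Conduct stakeholder meetings", "name": "Linda and Ed",
     "start": date(2008, 8, 1), "due": date(2008, 8, 15)},
    {"task": "Design the evaluation", "name": "Ray",
     "start": date(2008, 7, 1), "due": date(2008, 8, 31)},
]

def overdue(tasks, today):
    """Return the tasks whose due date has passed as of `today`."""
    return [t for t in tasks if t["due"] < today]

for t in overdue(task_map, date(2008, 8, 20)):
    print(f'{t["task"]} ({t["name"]}) was due {t["due"]:%m/%d}')
```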


Another tool is the Gantt chart (see Table 12.5). A Gantt chart is a commonly used chart that shows how projects, schedules, and other time-related activities progress over time. In project management, a Gantt chart shows the task assignments and when the tasks start and finish. Activities must be monitored to ensure assignments are completed in a timely fashion. If expected progress is not being made, the evaluator needs to identify the barriers and how to remove them. It is important that team members feel safe to report problems; it is often easier to fix a problem that is detected early. While it is important to have a plan, it is also important to remain flexible in the face of insurmountable obstacles. Adjustments can be made: more time or resources may be needed, or fewer tasks will be included.

Table 12.5: Example of a Gantt Chart (the tasks Review, Meetings, Design, and Implement are plotted against months 1–7, with bars and symbols showing when each task starts and finishes)
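A rough text-only Gantt view can also be generated directly from the schedule data. The sketch below (Python) is illustrative; the tasks and month ranges are hypothetical, chosen only to echo the shape of Table 12.5.

```python
# Hypothetical schedule: task name mapped to (start month, end month).
schedule = {
    "Review":    (1, 2),
    "Meetings":  (2, 3),
    "Design":    (3, 5),
    "Implement": (5, 7),
}

MONTHS = 7

def text_gantt(tasks):
    """Render a simple text Gantt chart, one row per task."""
    header = "Task        " + " ".join(f"{m:>2}" for m in range(1, MONTHS + 1))
    rows = [header]
    for name, (start, end) in tasks.items():
        cells = ["==" if start <= m <= end else "  " for m in range(1, MONTHS + 1)]
        rows.append(f"{name:<12}" + " ".join(cells))
    return "\n".join(rows)

print(text_gantt(schedule))
```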

The Gantt chart needs to be monitored to ensure that the work is done as planned. It is likely, however, that unanticipated events will occur; some flexibility will be needed. Ultimately, the managing evaluator is responsible for the overall quality of the evaluation and for ensuring that the findings are defensible (capable of being justified). If the team wants to make recommendations, the managing evaluator must ensure that they flow from the evidence and are realistic. Case 12.1 describes a plan to manage the tasks set out for a country implementation review.


Case 12.1: Mozambique: Country Implementation Review
A team set forth to conduct a country implementation review. The team leader decided to do it in a participatory way. The first task was to identify the stakeholders. Bank officials and government officials were easily identified. At the country level, core ministries and agencies implementing the projects were identified. The target was to get all the Central Bank and Finance staff handling World Bank-supported projects, plus the project director and/or coordinator of each project. All were invited to a four-day workshop with an outside facilitator. An icebreaker exercise was used to break through the formal relationships. Using a facilitated process, the group developed the agenda for the review:
• The role of project implementation agencies
• Procurement
• Disbursements
• Planning and monitoring (budget, accounting, audit, and evaluation).
Additional dialogue surfaced more issues:
• Information needs
• Pay and remuneration.

People could work on the agenda items of greatest interest to them. They wrote the summary reports and the annexes. A long list of recommendations emerged. Three objectives were achieved: identifying obstacles to project implementation, developing ways to overcome those obstacles, and creating a spirit of teamwork and dialogue. “We have no choice but to practice participation because development is a social process. It occurs when people come together and choose new behaviors that they have learned about by working together. There is simply no other way to build ownership and a productive network of relationships other than by involving the relevant stakeholders in participatory sessions…it is the process of collaboration that creates ownership and lasting relationships.” (Jacomina de Regt, 1996, p. 87)


Part IV: Assessing the Quality of an Evaluation
The final step in pulling it all together is to critically assess the quality of the draft evaluation. A good evaluation:
• meets stakeholder needs and requirements
• has a relevant and realistic scope
• uses appropriate methods
• produces reliable, accurate, and valid data
• includes appropriate and accurate analysis of results
• presents impartial conclusions
• conveys results clearly – in oral or written form
• meets professional standards (see Chapter 1).
Kusek and Rist (2004, pp. 126-127) discuss six characteristics of quality evaluations:

• Impartiality: The evaluation information should be free of political or other bias and deliberate distortions. The information should be presented with a description of its strengths and weaknesses. All relevant information should be presented, not just information that reinforces the views of the manager or client.
• Usefulness: Evaluation information needs to be relevant, timely, and written in an understandable form. It also needs to address the questions asked and be presented in a form desired and best understood by the client and stakeholders.
• Technical adequacy: The information needs to meet relevant technical standards – appropriate design, correct sampling procedures, accurate wording of questionnaires and interview guides, appropriate statistical or content analysis, and adequate support for conclusions and recommendations, to name but a few.
• Stakeholder involvement: There should be adequate assurance that the relevant stakeholders have been consulted and involved in the evaluation effort. If the stakeholders are to trust the information, take ownership of the findings, and agree to incorporate what has been learned into ongoing and new policies, programs, and projects, they have to be included in the political process as active partners. Creating a façade of involvement, or denying involvement to stakeholders, is a sure way of generating hostility and resentment toward the evaluation – and even toward the manager who asked for the evaluation in the first place.




• Feedback and dissemination: Sharing information in an appropriate, targeted, and timely fashion is a frequent distinguishing characteristic of evaluation utilization. There will be communication breakdowns, a loss of trust, and either indifference or suspicion about the findings themselves if:
  − evaluation information is not appropriately shared and provided to those for whom it is relevant
  − the evaluator does not plan to systematically disseminate the information and instead presumes that the work is done when the report or information is provided
  − no effort is made to target the information appropriately to the audiences for whom it is intended.

• Value for money: Spend what is needed to gain the information desired, but no more. Gathering expensive data that will not be used is not appropriate – nor is using expensive strategies for data collection when less expensive means are available. The cost of the evaluation needs to be proportional to the overall cost of the initiative.

Several checklists are available for assessing the quality of an evaluation, and it is helpful to apply at least two of them when reviewing work to make sure everything needed is there. Some particularly useful checklists are available at http://www.wmich.edu/evalctr/checklists/checklistmenu.htm#meta

They include:
• The Key Evaluation Checklist (Scriven)
• Program Evaluations Meta-evaluation Checklist (based on The Program Evaluation Standards) (Stufflebeam)
• Utilization-Focused Evaluation Checklist (Patton)
• Guidelines and Checklist for Constructivist (a.k.a. Fourth Generation) Evaluation (Guba & Lincoln)
• Deliberative Democratic Evaluation Checklist (House & Lowe)
• Guiding Principles Checklist (for evaluating evaluations in consideration of The Guiding Principles for Evaluators) (Stufflebeam).
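If several reviewers apply a checklist to the same draft, recording their ratings in a small data structure makes gaps easy to spot. The sketch below (Python) is illustrative only: the criteria are paraphrased from the quality characteristics above, and the ratings and threshold are hypothetical, not drawn from any published checklist.

```python
# Hypothetical quality review: criterion -> rating from 1 (weak) to 5 (strong).
review = {
    "Meets stakeholder needs": 4,
    "Relevant and realistic scope": 3,
    "Appropriate methods": 5,
    "Reliable, accurate, valid data": 2,
    "Impartial conclusions": 4,
}

THRESHOLD = 3  # ratings below this flag a criterion for follow-up

gaps = [criterion for criterion, rating in review.items() if rating < THRESHOLD]
print("Criteria needing attention:", ", ".join(gaps) or "none")
```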


Using a Meta-evaluator
If possible, build hiring an experienced meta-evaluator into the budget. This is someone with evaluation expertise who is not involved in conducting the evaluation but can be a sounding board, advisor, and helpful critic at any stage of the evaluation process.

Helpful Hints for Meta-evaluation
Unable to afford a meta-evaluator? Here are some creative options for those on a low budget.
1. Consider getting a “rapid assessment” meta-evaluator: ask an expert to look over the evaluation plan (or report) quickly, identify any gaps, and make suggestions. For a stronger enhancement, have two evaluators with complementary skills and perspectives look at your work; the combined value of their feedback will be much more than double.
2. Offer to act as meta-evaluator/reviewer for someone else, provided they will return the favor sometime. Even a quick look from a fresh set of eyes can add real value.
Evaluation is a very challenging (sometimes daunting) task, so the more feedback and advice an evaluator receives, the better the quality of the product that can be delivered to stakeholders. After all, being an evaluator is all about believing in the value of feedback to maximize quality and effectiveness. What better way to convey its importance than to seek it yourself?

Part V: Using Evaluation Results
Recall that the purpose of an evaluation is to provide information that can be used to:
• learn what we did not know
• modify, expand, or cancel projects, programs, or policies
• develop new programs and/or policies.

In the early stages of planning an evaluation, evaluators spend time identifying how the evaluation will be used. If they cannot identify the primary intended users and how the information in the evaluation will be used, they should not conduct the evaluation (Weiss, 2004, p 1, ¶ 2).


The purpose of formative evaluations is to study and facilitate implementation. For this reason, most formative evaluations focus on improvement and tend to be more open-ended. They usually gather information from a variety of data sources about the strengths and weaknesses of a project, program, or policy, and encourage reflection and innovation to improve outcomes. The purpose of a summative evaluation is accountability. For this reason, summative evaluations are usually used by “third party” interests, such as donor organizations, board members, key stakeholders, and so on. They will form an opinion about the overall effectiveness, merit, or worth of the project, program, or policy (Weiss, 2004, p. 2). Kusek and Rist (2004, p. 117) describe the following kinds of information that evaluations can supply:
• Strategy: are the right things being done?
  − rationale or justification
  − clear theory of change.
• Operations: are things being done right?
  − effectiveness in achieving expected outcomes
  − efficiency in optimizing resources
  − client satisfaction.
• Learning: are there better ways?
  − alternatives
  − best practices
  − lessons learned.

Kusek and Rist (2004, pp. 115-118) also describe the following uses for evaluations:
• Pragmatic uses of evaluation:
  − help make resource allocation decisions
  − help rethink the causes of a problem
  − identify emerging issues
  − support decision-making on competing or best alternatives
  − support public sector reform and innovations
  − build consensus on the causes of a problem and how to respond.

• Answering eight types of management questions:
  − descriptive
  − normative or compliance
  − correlational
  − impact or cause-and-effect
  − program logic
  − implementation or process
  − performance
  − appropriate use of policy tools.

Michael Quinn Patton (IPDET, 2005) identifies additional uses for the findings from evaluations, as shown in Table 12.6.

Table 12.6: Three Primary Uses of Evaluation Findings
• Judge merit or worth – examples: summative evaluation; accountability; audits; quality control; cost-benefit decisions; decide a program’s future; accreditation/licensing.
• Improve programs – examples: formative evaluation; identify strengths and weaknesses; continuous improvement; quality enhancement; being a learning organization; manage more effectively; adapt a model locally.
• Generate knowledge – examples: generalizations about effectiveness; extrapolate principles about what works; theory building; synthesize patterns across programs; scholarly publishing; policy making.
Source: Patton, 2005

Patton also discusses four primary uses of evaluation logic and processes, as shown in Table 12.7. These uses describe situations where the impact comes primarily from applying evaluation thinking and from engaging in an evaluation process. Note the contrast with situations where impacts come from using the content of evaluation findings.

Table 12.7: Four Primary Uses of Evaluation Logic and Processes
• Enhancing shared understandings – examples: specifying intended uses to provide focus and general shared commitment; managing staff meetings around explicit outcomes; sharing criteria for equity/fairness; giving voice to different perspectives and valuing diverse experiences.
• Supporting and reinforcing the program intervention – examples: building evaluation into program delivery processes; participants monitoring their own progress; specifying and monitoring outcomes as integral to working with program participants.
• Increasing engagement, self-determination, and ownership – examples: participatory and collaborative evaluation; empowerment evaluation; reflective practice; self-evaluation.
• Facilitating program and organizational development – examples: developmental evaluation; action research; mission-oriented, strategic evaluation; evaluability assessment; model specification.
Source: Patton, 2005


The following are some suggestions for ways to improve the use of evaluations.
• Gain support from the top.
  − Increase the awareness of upper-level personnel of the role that evaluations can play and the ways that the evaluation can help them.
  − Help upper-level personnel set realistic expectations.
• Involve the stakeholders.
• Integrate the evaluation into the workings of the institution.
  − Use formal mechanisms and incentives, and wherever possible link the recommendations to budget processes.
• Plan evaluations.
  − Planning is a major success factor; the evaluation should be well designed from the outset, with each step of the process anticipated and planned for, including the final presentation.
  − Set a high standard for quality in methodology.
  − Be sure to identify everyone involved, particularly the people who are most likely to be willing to implement changes and have the ability to make change happen, and plan to meet their needs.
• Consider the timing: timing is everything.
  − Evaluations must be timed appropriately to the life of a program. Do not do an impact evaluation too soon.
  − A good evaluation that arrives after the decision has been made is useless.
  − A politically sensitive evaluation might be better received after an election.
• Communication is important.
  − Present an early draft of the evaluation to stakeholders for comment and revision.
  − Make final reports available to the public. Include information with negative findings.
• Maintain credibility at every step.
  − How well the evaluation is used is always proportional to its credibility. Credibility is increased by trust in the competence of the evaluator.


Influence and Effects of Evaluation
Once the evaluation is complete, evaluators face a major challenge: how do they bring the information they have learned to the attention of decision-makers at the time decisions are being made? How can evaluators be in the right place, at the right time, with the right information? An ideal way to accomplish this is to attend the meeting where decisions on the issues addressed in the evaluation will be made, and to brief the decision-makers on the results. By doing so, the evaluators ensure that everyone is clear about the results, the implications, and the recommendations they are advocating, if any. But keep in mind that evaluation results are often only one important source of information among several that will be considered. Carol Weiss (2004) suggests that a primary step in getting the information from the evaluation used is to identify the “evaluation users”: the people with the willingness, authority, and ability to put what they have learned from the evaluation to work in some way. Once evaluators identify the users of an evaluation, they need to foster evaluative thinking among them. Most evaluation users will be unfamiliar with evaluations and how best to use them. Introduce a general awareness and understanding of the practices and procedures of evaluation. When stakeholders participate in an evaluation, the process can:
• draw their attention to issues they have not considered
• create dialogue among the stakeholders.

This process can produce intended and unintended results long after the evaluation results are presented (Kirkhart, 2000).


Summary
Evaluations are complex activities. Planning is essential to ensure that the evaluation meets its objectives. Managers are the people who coordinate activities and people. Many evaluations use project managers to improve the quality of the evaluation as well as to meet schedules and budgets. Managers manage people to control scope, time, money, and resources. Each person working on an evaluation needs to understand his or her roles and responsibilities, and those of all others involved in the evaluation. The terms of reference or scope of work help by putting these out in print for all to review and agree on. Managers need to keep the goals (objectives) of evaluations in mind as they make decisions for the evaluation. As a manager addresses questions of personnel, time, tasks, and costs, the goal of the evaluation must be considered: each decision should help reach the goal of the evaluation and attain an evaluation that meets quality standards. Meta-evaluators are an asset to an evaluation. They can assist at all stages by identifying ways to improve the evaluation.


Chapter 12 Activities
Application Exercise 12.1: Individual Activity — Terms of Reference
Instructions: Review the following Terms of Reference and critique the Integrated Framework for Trade Development’s Terms of Reference.

Review of the First Two Years: At the suggestion of the World Trade Organization (WTO), the interagency group that manages the Integrated Framework (IF) program asked the World Bank to take the lead in conducting a review of the implementation of the Integrated Funding project over the past two years. The IF is a joint undertaking of several agencies, and its objective is to help the least-developed countries take advantage of opportunities offered by the international trade system. Its ambit (sphere) is trade-related assistance, from seminars on WTO rules to improvement of ports and harbors. It functions basically by helping individual countries to identify their needs and then to bring a program of requested assistance to a Round Table meeting for support from donors. As a result of several meetings, the interagency group agreed that the review should cover the following six topics:
1. Identify perceptions of the objectives of the IF by exploring the views of involved parties;
2. Evaluate the implementation of the IF with regard to the process of the IF, output, implementation, pledges, assistance and new money, and the impact of the IF in terms of its relevance to enhancing the contribution of trade to the development of least developed countries;
3. Review of trade-related assistance: institution-building, building human and enterprise capacity, and infrastructure;
4. Policy considerations, including the enlargement of the IF, and the trade and macroeconomic policy environment;
5. Administration of the IF; and
6. Recommendations for the future.


In covering these topics, the consultants should assess the relevance of the IF operations to IF objectives. The cost-effectiveness of the IF in achieving its objectives should also be assessed. The consultant should also assess the effectiveness of the coordination between the core agencies that oversee the IF, the Round Tables, and other activities.

The consultant is expected to examine documentation available on IF implementation, carry out interviews with operational staff of all agencies involved, and seek out the reviews of the representatives of the Least Developed Countries, as well as government and business representatives in at least two Least Developed Countries who have benefited from the IF, one of which should be from Africa (Bangladesh and Uganda are proposed). Representatives of the key donor will also be consulted. The report will be about 20 pages long, with annexes (appendices) as needed.

Group Activity: Working in pairs (if possible), answer these questions:
1. Does the TOR have all the necessary elements?
2. Which elements are complete?
3. Which elements could be improved?


Application Exercise 12.2: Are You Ready to Be a Manager?
Instructions: Read through this list of characteristics of a manager compiled by F. John Reh (2007). Identify the skills you have and those you need to improve.

As a person:
• You have confidence in yourself and your abilities. You are happy with who you are, but you are still learning and getting better.
• You are something of an extrovert. You do not have to be the life of the party, but you cannot be a wallflower. Management is a people skill – it is not the job for someone who does not enjoy people.
• You are honest and straightforward. Your success depends heavily on the trust of others.
• You are an “includer,” not an “excluder.” You bring others into what you do. You don’t exclude others because they lack certain attributes.
• You have a ‘presence’. Managers must lead. Effective leaders have a quality about them that makes people notice when they enter a room.

On the job:
• You are consistent, but not rigid; dependable, but can change your mind. You make decisions, but easily accept input from others.
• You are a little bit crazy. You think out-of-the-box. You try new things and if they fail, you admit the mistake, but don’t apologize for having tried.
• You are not afraid to “do the math”. You make plans and schedules and work toward them.
• You are nimble and can change plans quickly, but you are not flighty.
• You see information as a tool to be used, not as power to be hoarded.


References and Further Reading
Bemelmans-Videc, M.L., R.C. Rist, and E. Vedung (1997). Sticks, Carrots and Sermons: Policy Instruments and their Evaluation. Piscataway, NJ: Transaction Publishers.
Billson, Janet Mancini. The power of focus groups: A training manual for social, policy, and market research: Focus on international development.
Centers for Disease Control and Prevention (2001). Introduction to Program Evaluation. Retrieved February 5, 2008 from http://www.cdc.gov/tobacco/tobacco_control_programs/surveillance_evaluation/evaluation_manual/00_pdfs/Chapter1.pdf
Chelimsky, E. (1987). "The politics of program evaluation." Social Science and Modern Society, 25, 24-32.
Chinese National Centre for Science and Technology Evaluation (NCSTE) (China) and Policy and Operations Evaluation Department (IOB) (the Netherlands) (2006). Country-led Joint Evaluation of the ORET/MILIEV Programme in China. Amsterdam: Aksant Academic Publishers.
de Regt, Jacomina (1996). "Mozambique: Country Implementation Review." In The World Bank Participation Sourcebook. Retrieved August 13, 2007 from: www.worldbank.org/wbi/sourcebook/sb0211.pdf
Economic and Social Research Council (ESRC) (2007). Welcome to ESRC today: Top ten tips. Retrieved August 13, 2007 from: http://www.esrc.ac.uk/ESRCInfoCentre/Support/Communications_Toolkit/communications_strategy/index.aspx
Evalnet (2000). Retrieved August 13, 2007 from: http://www.evalnet.co.za/services/
Feuerstein, M. T. (1986). Partners in Evaluation: Evaluating Development and Community Programs with Participants. London: MacMillan, in association with Teaching Aids at Low Cost.
Fitzpatrick, Jody L., James R. Sanders, and Blaine R. Worthen (2004). Program evaluation: Alternative approaches and practical guidelines. New York: Pearson Education.
Greer, Michael (2001). The project manager's partner. Amherst, MA: HRD Press. Retrieved June 2, 2008 from http://www.michaelgreer.com/20-actns.htm
Greer, Michael (2008). Michael Greer's Project Management. Retrieved June 3, 2008 from http://www.michaelgreer.com
Hawkins, Penny (2005). "Contracting evaluation." IPDET workshop, June 2005. Slides 1-18.
Horn, Jerry (2001). A Checklist for Developing and Evaluating Evaluation Budgets. Retrieved August 13, 2007 from: http://www.wmich.edu/evalctr/checklists/evaluationbudgets.pdf
King, Jean A., Laurie Stevahn, Gail Ghere, and Jane Minnema (2001). "Toward a taxonomy of essential evaluator competencies." American Journal of Evaluation, 22(2), 229-247. Retrieved June 2, 2008 from http://www.nbowmanconsulting.com/Establishing%20Essential%20Program%20Evaluator%20Competencies.pdf
Kirkhart, K. E. (2000). "Reconceptualizing evaluation use: An integrated theory of influence." In V.J. Caracelli and H. Preskill (Eds.), The expanding scope of evaluation use. New Directions for Evaluation, No. 88. San Francisco: Jossey-Bass.
Kusek, Jody Zall and Ray C. Rist (2004). Ten steps to a results-based monitoring and evaluation system. Washington, D.C.: The World Bank.
Lawrence, J. (1989). "Engaging recipients in development evaluation: the 'stakeholder' approach." Evaluation Review, 13(3).
Leeuw, Frans (1991). "Policy theories, knowledge utilization and evaluation." OECD World Forum on Statistics: Knowledge and Policy, 4, 73-91.
McNamara, Carter (2007). Checklist for Program Evaluation Planning. Retrieved August 10, 2007 from http://www.managementhelp.org/evaluatn/chklist.htm
Muir, Edward (1999). "'They blinded me with political science': On the use of non-peer reviewed research in education policy." Political Science and Politics, 32(4), 762-764.
Northeast & the Islands Regional Technology in Education Consortium (NEIR TEC) (2004). Gathering together and planning: Exploring useful skills for educators to develop through collaborative evaluation. Retrieved February 5, 2008 from http://www.neirtec.org/evaluation/PDFs/GatherTogether3.pdf
OECD/DAC (2002). Glossary of key terms in evaluation and results based management.
Patton, Michael Q. (1977). "In search of impact: An analysis of the utilization of Federal health evaluation research." In C. H. Weiss (Ed.), Using social research in public policy making. Lexington: Lexington Books.
Patton, Michael Q. (1983). Cited in Weiss, Identifying the intended use(s) of an evaluation (2004).
Patton, Michael Q. (1997). Utilization-focused Evaluation (3rd ed.). Thousand Oaks, CA: Sage.
Patton, Michael Q. (2005). Presentation to the International Program for Development Evaluation Training (IPDET), June 2005.
Reh, F. John (2007). Management Tips. Retrieved August 13, 2007 from: http://management.about.com/cs/generalmanagement/a/mgt_tips03.htm
Rist, Ray C. and N. Stame (2006). From studies to streams: Managing evaluative systems. Piscataway, NJ: Transaction Publishers.
Rossi, Peter Henry, Howard E. Freeman, and Mark W. Lipsey (1999). Evaluation: A systematic approach. Thousand Oaks: Sage Publications.
Rutman, L. (1980). Planning useful evaluations: Evaluability assessment. Thousand Oaks: Sage Publications.
Schwartz, R. (1998). "The politics of evaluation reconsidered: A comparative study of Israeli programs." Evaluation, 4, 294-309.
Schwartz, R. and J. Mayne (2004). Quality matters: Seeking confidence in evaluating, auditing, and performance reporting. Piscataway, NJ: Transaction Publishers.
Scriven, Michael (2007). Key Evaluation Checklist (KEC). Retrieved August 13, 2007 from: http://www.wmich.edu/evalctr/checklists/kec_feb07.pdf
Stevahn, Laurie, Jean A. King, Gail Ghere, and Jane Minnema (2005). "Establishing essential competencies for program evaluators." American Journal of Evaluation, 26(1), 43-59.
Stufflebeam, Daniel L. (1999). Evaluation Plans and Operations Checklist. Retrieved August 10, 2007 from: http://www.wmich.edu/evalctr/checklists/plans_operations.pdf
Tilley, Nick (2004). Applying theory-driven evaluation to the British Crime Reduction Programme: The theories of the programme and of its evaluations. Thousand Oaks: Sage Publications.
Treasury Board of Canada Secretariat (2005). Improving the professionalism of evaluation. Final report, May 31, 2005. Retrieved June 2, 2008 from http://www.tbs-sct.gc.ca/eval/dev/Professionalism/profession_e.asp
UNDP. Planning and managing an evaluation (Website). Retrieved February 2005 from: http://www.undp.org/eo/evaluation_tips/evaluation_tips.html
UNFPA (2004). Programme Manager's Planning, Monitoring and Evaluation Toolkit. Retrieved February 5, 2007 from http://www.unfpa.org/monitoring/toolkit/5managing.pdf
United Nations Population Fund (UNFPA) (2007). Programme manager's planning, monitoring and evaluation toolkit. Tool Number 5: Planning and Managing an Evaluation. Retrieved August 13, 2007 from: http://www.unfpa.org/monitoring/toolkit/5managing.pdf
Universalia—WBI, World Bank training, based on exercise in Chapter 3, pp. 12-13.
Weiss, Carol H. (1973). "Where politics and evaluation research meet." Evaluation, 1, 37-45.
Weiss, Carol H. (2004). Identifying the intended use(s) of an evaluation. IDRC. Retrieved August 10, 2007 from: http://www.idrc.ca/ev_en.php?ID=58213_201&ID2=DO_TOPIC
Wholey, J. S. (1979). Evaluation: Promise and performance. Washington, DC: The Urban Institute.
Wholey, J. S. (1994). "Assessing the feasibility and likely usefulness of evaluation." In Wholey, Hatry, and Newcomer (Eds.), Handbook of practical program evaluation. San Francisco: Jossey-Bass.
Wildavsky, A. (1972). "The self-evaluating organization." Public Administration Review, 32, 509-520.

Web Sites
Conflict Resolution Information Source: http://www.crinfo.org/index.jsp
Conflict Resolution Network: http://www.crnhq.org/
Economic and Social Research Council (ESRC). ESRC Society Today: Communication Strategy: http://www.esrc.ac.uk/ESRCInfoCentre/Support/Communications_Toolkit/communications_strategy/index.aspx
International Development Research Centre (2004). Evaluation Planning in Program Initiatives. Ottawa, Ontario, Canada: http://web.idrc.ca/uploads/userS/108549984812guideline-web.pdf
Management Sciences for Health (MSH) and the United Nations Children's Fund (UNICEF). "Quality guide: Stakeholder analysis," in Guide to managing for quality: http://bsstudents.uce.ac.uk/sdrive/Martin%20Beaver/Week%202/Quality%20Guide%20%20Stakeholder%20Analysis.htm
McNamara, C. (1999). Checklist for program evaluation planning: http://www.managementhelp.org/evaluatn/chklist.htm
Reh, F. John. How to be a better Manager: http://management.about.com/cs/midcareermanager/a/htbebettermgr.htm
The Evaluation Center, Western Michigan University. The Checklist Project: http://evaluation.wmich.edu/checklists
The Evaluation Center, Western Michigan University. The Checklist Project: http://www.wmich.edu/evalctr/checklists/checklistmenu.htm#mgt
The World Bank Participation Sourcebook. Online (HTML format): http://www.worldbank.org/wbi/sourcebook/sbhome.htm
UNDP. Planning and Managing an Evaluation: http://www.undp.org/eo/evaluation_tips/evaluation_tips.html
UNFPA. Programme Manager's Planning, Monitoring and Evaluation Toolkit: http://www.unfpa.org/monitoring/toolkit.htm
W.K. Kellogg Foundation (1998). W.K. Kellogg Evaluation Handbook. Online: http://www.wkkf.org/Pubs/Tools/Evaluation/Pub770.pdf
Weiss, Carol. Evaluating capacity development: Experiences from research and development organizations around the world. Chapter 7, "Using and Benefiting from an Evaluation": http://www.agricta.org/pubs/isnar2/ECDbood(H-ch7).pdf
Weiss, Carol (2004). Identifying the intended use(s) of an evaluation. The International Development Research Centre: http://www.idrc.ca/ev_en.php?ID=58213_201&ID2=DO_TOPIC


Chapter 13
Evaluating Complex Interventions

Introduction
Previous chapters of this text presented a comprehensive discussion of development evaluation, how to plan an evaluation, and how to implement one. As difficult as it may be to design and conduct an evaluation of a program with several major components, it is even more challenging to examine the effects of multiple interventions taken together. Development interventions are becoming more complex. Interventions present evaluators with new challenges, new expectations (e.g., the MDGs), new paradigms for dealing with poverty, new methods, and new clients. This chapter discusses the relation of evaluation at the project/program level to the country, thematic, sector, or global levels. This chapter has seven sections:
• Big Picture Views of Development Evaluation
• Country Program Evaluations
• Sector Program Evaluations
• Thematic Evaluations
• Joint Evaluations
• Global and Regional Partnership Program (GRPP) Evaluations
• Evaluation Capacity Development.


Part I: Big Picture Views of Development Evaluation

Evaluation must sometimes take a big picture view: what is the overall experience and impact of development interventions within a particular sector, such as health, education, or transportation? A cabinet ministry might want to determine the overall impact of, and lessons to be learned from, interventions aimed at improving the economic well-being of children, women, or those in rural areas, or a donor may want to examine the effectiveness of its education sector strategy. Increasingly, donor-lending programs are being based on sector-wide approaches.

Complex economic, political, and social factors affect development activities and evaluations. Development and development evaluation are increasingly becoming pluralistic enterprises, involving non-governmental actors such as the commercial private sector, not-for-profit advocacy and implementation agencies, and civil society as a whole.

We have encouraged the reader in prior chapters not to think of a development intervention as a linear chain. While there is an "if … then …" reasoning behind development interventions, we have stressed that there are many factors (economic, political, and social) that interact with a program and that may affect outcomes. We have encouraged evaluators to identify these factors and to design the evaluation to look for their influence. As we increase the complexity of evaluation by looking across groups of programs, the interaction of these factors also becomes more complex. Countries and their partners are seeking to determine the cumulative effects of interventions in bringing about changes in a sector (health, forestry, agriculture, etc.), in a country, or on a cross-cutting theme such as climate change. Evaluators have to manage evaluations in the face of this complexity and do so in an ethical way. But the complexity of evaluation is increasing not only because of the broadening scope of what is being evaluated: as donors have become more aware of the demands these evaluations place on fledgling or weak country capacities, they have called for joint evaluations, increasing complexity in another way.


Move to a Higher Plane

Development has evolved toward a more comprehensive agenda, increasingly addressing country policy reforms, capacity building, and global concerns. In turn, evaluation has expanded by:

• reorienting the focus of evaluation from just the project, program, or activity level to the country, thematic, sector, and/or global levels
• determining how best to aggregate outcomes of interventions at the activity and country level to assess global or program-wide results
• finding ways to assess the influence of program-level design, partnership approach, and governance on overall results
• seeking replicability at a higher level and applicability at the system level (Heath, Grasso, & Johnson, 2005, p. 2).

Country evaluations are one way to gain an overall understanding of what is happening and to provide insights about the overall effect and experience of development within a country. Sector or thematic evaluations are other ways to get a big picture view. They might focus on a single country or compare several countries (generally using a case study approach). They are likely to use multiple methods: some combination of available data, interviews, field visits, surveys, and/or focus groups. Although there are others, in this chapter we look more closely at five of these big picture views:

• Country Program Evaluations
• Thematic Evaluations
• Joint Evaluations
• Sector Program Evaluations
• Evaluations of Global and Regional Partnership Programs (GRPPs).


Part II: Country Program Evaluations

"Big picture" views often focus on country assistance. A country program evaluation (sometimes called a country assistance evaluation) focuses on an organization's entire aid program to one country. Typically, a country program evaluation is largely a normative study that compares what is being done with what was planned. It may seek to:

• assess the strategic relevance of the country assistance program relative to the country's needs
• test the implementation of agency-wide goals to determine whether the intended outcomes were obtained
• identify the successes and failures in different sectors, or of the approaches used in the country, and the factors contributing to performance
• identify the effectiveness of the donor's aid to a given country (OECD/DAC, 1999).

Country assistance evaluations usually focus on the OECD/DAC dimensions of relevance, efficiency, impact, and sustainability. They can look at donor performance and/or country performance. Country program evaluations may face substantial challenges:

• The overall country assistance may lack coherent goals and instead reflect an opportunistic approach.
• Similar development interventions may be funded by several sources, making attribution difficult. (Cofunding is one issue; donors funding similar interventions is another.)
• Typically, there is no mapping of in-country assistance, so it is difficult to know what others are doing in the area of the intervention.
• As noted by the OECD/DAC, "…as in any evaluation, there are reputations at stake and fears of the consequences can threaten the morale and commitment of programme and partner staff. Country Program Evaluation, like any evaluation, must proceed with sensitivity" (DAC, 1999, p. 18).


Recommendations for Country Program Evaluations

The following recommendations are adapted from the DAC Network on Development Evaluation:

• A greater proportion of evaluations should be undertaken jointly, with full and active participation of the aid recipients and other partners.
• Developing countries should show greater initiative in taking the lead in planning, coordinating, and scheduling evaluations.
• Developing countries should be supported to build their institutional capacity for initiating and leading joint evaluations.
• Better coordination and knowledge sharing are needed among the various partners within aid recipient countries. National M&E networks and professional associations need to be built and expanded.
• When a large joint evaluation is undertaken with the participation of several developing countries, those countries should be facilitated to meet together to coordinate their views and inputs.

Source: DAC Network on Development Evaluation, 2005, p. 7.

The country program evaluation should start with clearly defined terms of reference to determine exactly what the stakeholders expect. Countries have multitudes of interventions over time, so the time period to be covered should be specified. The terms of reference should:

• clearly state the purpose of the evaluation, the evaluation criteria, and the way in which the findings will be used
• specify the organization's original priorities for the country program (for example, poverty reduction, increased crop production, etc.)
• specify reporting, dissemination, and follow-up procedures (full disclosure is ideal).

Because of the difficulty of finding a counterfactual, benchmarking is important in country program evaluations. Typically, comparisons are made with the organization's programs in other countries, matched on various characteristics such as region. As more country program evaluations are publicly disclosed, more cross-organizational comparisons may become feasible.
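To make the idea of matched comparisons concrete, the short Python sketch below selects comparator country programs matched on region and income group and lists their outcome ratings. The data, field names, and rating values are illustrative assumptions, not an actual evaluation dataset.

# Illustrative sketch: benchmarking a country program against matched comparators.
# All data and field names are hypothetical.
programs = [
    {"country": "A", "region": "Africa", "income": "low", "outcome": "moderately satisfactory"},
    {"country": "B", "region": "Africa", "income": "low", "outcome": "satisfactory"},
    {"country": "C", "region": "Africa", "income": "middle", "outcome": "unsatisfactory"},
    {"country": "D", "region": "Asia", "income": "low", "outcome": "satisfactory"},
]

def comparators(target, pool):
    """Return other country programs matched on region and income group."""
    return [p for p in pool
            if p["country"] != target["country"]
            and p["region"] == target["region"]
            and p["income"] == target["income"]]

target = programs[0]
matches = comparators(target, programs)
print("Comparators for country", target["country"], ":", [m["country"] for m in matches])
print("Their outcome ratings:", [m["outcome"] for m in matches])

The matching variables (region, income group) are only examples; in practice the evaluation team would choose characteristics relevant to the program being benchmarked.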


It is very important to conduct the evaluation in partnership with the stakeholders. Consider using focus groups, interviews, client surveys, and participatory evaluation as part of your toolkit – again depending on your questions and how you choose to answer them.

Example of Country Program Evaluation Methodology

By 2005, the Independent Evaluation Group (IEG) of the World Bank had completed over 70 country program evaluations and developed a clearly articulated methodology (Heath, Grasso, & Johnson, 2005, pp. 6-9). Its evaluation methodology combines bottom-up and top-down approaches, including:

• Evaluating in Three Dimensions
• Evaluating Assistance Program Impact
• Using a Rating Scale.

Evaluating in Three Dimensions

IEG examines the country assistance program across three dimensions:

• Products and Services Dimension, involving a "bottom-up" analysis of major program inputs: loans, AAA (analytical and advisory activities), and aid coordination.
• Development Outcome Dimension, involving a "top-down" analysis of the principal country program objectives for relevance, efficacy, outcome, sustainability, and institutional impact.
• Attribution Dimension, in which responsibility for the program outcome is assigned to four categories of actors: the client, the Bank, partners and other stakeholders, and external forces.


Evaluating Assistance Program Impact

When IEG evaluates the expected development impact of an assistance program, it gauges the extent to which the major strategic objectives were relevant and achieved, with no shortcomings. Typically, programs express their goals using higher order objectives, such as the MDGs and poverty reduction. The country assistance strategy may also establish intermediate goals, such as improved targeting of social services or promotion of integrated rural development, and may specify how the programs are expected to contribute toward achieving the higher order objective. The task for IEG becomes validating whether the intermediate objectives produced satisfactory net benefits, and whether the results chain specified in the country assistance strategy was valid. Where causal linkages were not specified in the country assistance strategy, it becomes the evaluator's task to reconstruct the causal chain from the available evidence. The evaluator also needs to assess the relevance, efficacy, and outcome of the intermediate and higher order objectives.

Evaluators also assess the degree to which clients demonstrate ownership of international development priorities. Examples of such priorities are the MDGs and corporate advocacy priorities, such as safeguards. Ideally, these issues would be identified and addressed by the country assistance strategy, allowing the evaluator to focus on whether the approaches adopted for this assistance were appropriate. In other instances, however, the strategy may be found to have glossed over certain conflicts or avoided addressing key client development constraints. In either case, the consequences could include a decrease in program relevance, a loss of client ownership, and/or unwelcome side effects, such as safeguard violations, all of which must be taken into account in judging program outcome. The important point is that, even if the World Bank's projects did well, if the country did not do well, the assistance could be judged unsatisfactory.


Using a Rating Scale

IEG uses six rating categories for assistance program outcome, ranging from highly satisfactory to highly unsatisfactory. These categories are described in Table 13.1.

Table 13.1: Descriptions of IEG Rating Scale

Highly Satisfactory: The assistance program achieved at least acceptable progress toward all major relevant objectives, and had best practice development impact on one or more of them. No major shortcomings were identified.

Satisfactory: The assistance program achieved acceptable progress toward most of its major relevant objectives. No best practice achievements or major shortcomings were identified.

Moderately Satisfactory: The assistance program achieved acceptable progress toward most of its major relevant objectives. No major shortcomings were identified.

Moderately Unsatisfactory: The assistance program did not make acceptable progress toward most of its major relevant objectives, and either (a) did not take into adequate account a key development constraint or (b) produced a major shortcoming, such as a safeguard violation.

Unsatisfactory: The assistance program did not make acceptable progress toward most of its major relevant objectives, and either (a) did not take into adequate account a key development constraint or (b) produced a major shortcoming, such as a safeguard violation.

Highly Unsatisfactory: The assistance program did not make acceptable progress toward any of its major relevant objectives and did not take into adequate account a key development constraint, while also producing at least one major shortcoming, such as a safeguard violation.
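For readers who need to summarize many such ratings across a portfolio of country programs (as the Retrospective discussed next does), a minimal Python sketch of one possible ordinal encoding follows. The numeric mapping, the sample ratings, and the cut-off for "satisfactory or better" are illustrative assumptions, not an IEG convention.

# Illustrative sketch: encoding the six-point outcome scale as ordinal values
# so ratings can be tallied across a portfolio. The numeric mapping is an
# assumption for illustration only.
RATING_SCALE = {
    "highly unsatisfactory": 1,
    "unsatisfactory": 2,
    "moderately unsatisfactory": 3,
    "moderately satisfactory": 4,
    "satisfactory": 5,
    "highly satisfactory": 6,
}

portfolio = ["satisfactory", "moderately unsatisfactory", "highly satisfactory",
             "moderately satisfactory", "unsatisfactory"]

scores = [RATING_SCALE[r] for r in portfolio]
share_satisfactory = sum(s >= 4 for s in scores) / len(scores)

print("Mean score:", round(sum(scores) / len(scores), 2))
print("Share rated moderately satisfactory or better:", round(share_satisfactory, 2))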

The Retrospective

In 2007, IEG issued a retrospective on its experience. The Retrospective indicated that, in one-third of the country programs IEG evaluated, the outcomes of the individual projects were successful, but the overall outcome of the country program was unsuccessful. It also noted that, while outcomes in the health and education sectors were relatively successful, outcomes in private sector development, public sector management, and rural development were less successful.


The Retrospective identified three factors responsible for the lack of success in those sectors:

• Reforms in these sectors seem to face stronger opposition from vested interests and potential losers.
• Institutional capacity constraints take time to resolve. Many projects in these sectors attempt to improve the legal, institutional, and regulatory framework, but implementing change requires overcoming inertia and adverse incentives in the bureaucracies of many countries.
• These sectors also suffer larger adverse effects from exogenous events and macroeconomic shocks.

Improving outcomes in all sectors would imply focusing more on measuring and supporting results-based indicators (IEG, 2007a).

From this experience, IEG developed a number of lessons for improving country assistance strategies:

• A key component of successful country programs is that they are tailored to the country context; an understanding of the political economy of reform is essential. Domestic politics and vested interests largely determine the pace and content of reforms in countries.
• Country knowledge is strongly associated with success. Analytical and advisory activities can also be an effective vehicle for engaging governments in policy dialogue and informing civil society, but adequate attention needs to be paid to dissemination.
• Technical assistance and investment loans can play a large role in promoting institutional development and capacity building, but the sustainability of benefits requires that these operations be part of a broader macroeconomic stabilization and reform program. Linking technical assistance and investment loans with policy reforms supported by adjustment loans also improves the probability of success.
• Adjustment lending can be successful, especially when combined with a strong government commitment to macroeconomic stabilization and structural reform. Without sustained progress on stabilization and reform, however, adjustment lending may increase debt and weaken the incentive to reform (IEG, 2007a).


The Retrospective identified several strategies that would serve to improve the outcomes of Bank assistance programs:

• undertaking more robust risk analysis to carefully assess borrower commitment to reform and implementation capacity
• reducing the level of planned assistance when faced with clear evidence of policy slippage
• lending more prudently in turnaround situations.

To summarize the Retrospective: generalizing these points, optimistic projections or expectations with inadequate risk analysis often weaken the performance of country strategies. Programs should not be based on the best possible forecasted outcomes. And, finally, country strategies need to be flexible, not rigid and narrow with only one path (IEG, 2007a).

Institutional Development Impact

The institutional development impact (IDI) can be rated as high, substantial, modest, or negligible. IDI measures the extent to which the program strengthened the client's ability to use its human, financial, and natural resources more efficiently, equitably, and sustainably. Examples of areas in which IEG judges the institutional development impact of the program are the:

• soundness of economic management
• structure of the public sector, in particular the civil service
• institutional soundness of the financial sector
• soundness of legal, regulatory, and judicial systems
• extent of monitoring and evaluation systems
• effectiveness of aid coordination
• degree of financial accountability
• extent of building NGO capacity
• level of social and environmental capital.


Sustainability can be rated as highly likely, likely, unlikely, or highly unlikely. If available information is insufficient, the rating can be non-evaluable. Sustainability measures the resilience of the development benefits of the country assistance program to risk over time, taking into account the following eight factors:

• technical resilience
• financial resilience
• economic resilience
• social support (including conditions subject to safeguard policies)
• environmental resilience
• ownership by governments and other key stakeholders
• institutional support (including a supportive legal/regulatory framework and organizational and management effectiveness)
• resilience to external forces, such as international economic shocks and changes in the political and security environments.

Johnson (2007, slides 2-6) of IEG discusses three challenges in assessing a country program evaluation:

• clarifying the object of the evaluation:
  − country development performance
  − program performance
  − donor performance
• reaching agreement on counterfactuals
• attributing program results correctly.

To address the first of the three challenges, the evaluator must be clear on the objective of the evaluation. Is it to evaluate country development performance, program performance, or the development organization's performance?


In this context, IEG defines counterfactuals as occurrences that would have happened in the absence of the intervention. Johnson (2007, slide 4) discusses approaches to finding the right counterfactuals. He begins with the "ideal approach," which estimates the development impact that would have occurred for each intervention in the targeted area with no intervention, then compares the estimate with the result that occurred with the intervention. The ideal approach would use control group monitoring. Johnson states that it should be a standard feature of all assistance programs, but it is often not done for political, ethical, and cost reasons. Johnson also discusses three alternative approaches to the ideal approach (a minimal numerical sketch of these comparisons follows the list):

• comparing conditions before and after the program intervention
• comparing benefits projected by the program strategy document with the actual results achieved
• comparing development results achieved with those achieved in similar sectors with similar beneficiaries in comparator countries.
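The sketch below works through the three alternative comparisons on invented numbers; the indicator, figures, and comparator values are hypothetical and are only meant to show the arithmetic each approach implies.

# Illustrative sketch of the three alternative counterfactual comparisons.
# All values are hypothetical (e.g., an outcome indicator such as primary
# school completion rate, in percent).
baseline_before = 62.0        # observed before the program
observed_after = 71.0         # observed after the program
projected_in_strategy = 75.0  # benefit projected in the program strategy document
comparator_countries = {"Country X": 66.0, "Country Y": 69.0}  # similar sectors/beneficiaries

# 1. Before-and-after comparison
print("Change vs. before:", observed_after - baseline_before)

# 2. Projected vs. actual comparison
print("Shortfall vs. projection:", observed_after - projected_in_strategy)

# 3. Comparison with comparator countries
comparator_mean = sum(comparator_countries.values()) / len(comparator_countries)
print("Difference vs. comparator mean:", observed_after - comparator_mean)

Each comparison can point in a different direction, which is one reason evaluators report them together rather than relying on any single one.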

Another challenge identified by Johnson is that of attributing program results correctly, that is, accurately isolating the impact of multiple interventions aimed at the same result. He asks the following questions to illustrate attribution issues:

• What if one partner assumes primary responsibility for design of the intervention, but provides little else; a second supplies most of the financing, but little else; a third mostly administrative oversight, but little else; a fourth mostly monitoring, but little else, etc.? How should credit for the development result, or blame for its absence, be apportioned among the donors, the client government, and other stakeholders? (2007, slide 6)

Because of these complexities, Johnson explains, IEG has abandoned hope of measuring attribution. Instead it measures the Bank's contribution, as judged by aid providers, beneficiaries, civil society, the client government, internal records on inputs provided, and corporate best practice. Evaluators use triangulation to help determine contribution: triangulation by evaluator and triangulation by dimension (Johnson, 2007, slides 7-16).


Triangulation by evaluator includes comparing findings from:

• self-evaluation through country assistance strategy completion reports (CASCRs)
• evaluation by the client government and other local stakeholders as part of the country program evaluation
• independent evaluation in two modes, through country program evaluations and CASCR reviews.

Triangulation by dimension includes comparing findings on:

• products and services
  − loans, credits, and grants
  − analytical and advisory services
  − aid coordination and resource mobilization
• development outcome
  − main country-level program objectives, derived from the statement of objectives contained in the country assistance strategy document
  − major ratings, the same as in project evaluations
• contribution from:
  − the Bank
    - professional quality of services
    - prudence and probity
    - participation and partnership
    - selectivity
    - creativity, initiative, and efficiency
  − client stakeholders (not publicly rated)
    - ownership of the assistance program
    - effective support for design and implementation of the country assistance strategy and relevant international development priorities, notably the MDGs
    - observance of safeguards on protecting the environment, humane resettlement, etc.
  − aid partners (not publicly rated)
    - impact on design of the country assistance strategy
    - impact on implementation of the country assistance strategy
  − exogenous factors (not publicly rated)
    - world economic shocks
    - events of nature
    - war/civil disturbances
    - other.
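As a purely illustrative aid, the short Python sketch below compares outcome ratings gathered from different evaluator sources and flags where they diverge, which is the basic move behind triangulation by evaluator. The sources, ratings, ordinal scale, and divergence threshold are hypothetical.

# Illustrative sketch: triangulation by evaluator. Ratings from different
# sources are compared and disagreements are flagged for follow-up.
SCALE = {"unsatisfactory": 1, "moderately unsatisfactory": 2,
         "moderately satisfactory": 3, "satisfactory": 4}

ratings_by_source = {
    "self-evaluation (CASCR)": "satisfactory",
    "client government / local stakeholders": "moderately satisfactory",
    "independent evaluation (country program evaluation)": "moderately unsatisfactory",
}

scores = {source: SCALE[r] for source, r in ratings_by_source.items()}
spread = max(scores.values()) - min(scores.values())

for source, rating in ratings_by_source.items():
    print(source, "->", rating)

# A spread of more than one step on the scale suggests the evaluators should
# probe why the sources disagree before settling on a contribution judgment.
if spread > 1:
    print("Ratings diverge; investigate the sources of disagreement.")
else:
    print("Ratings broadly agree.")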

Part III: Sector Program Evaluations

Sector program evaluations are evaluations of major program sectors, such as education, health, housing, or transportation. The International Organization for Migration (IOM) evaluation guidelines give the following definition of a sector evaluation:

An evaluation of a variety of aid actions all of which are located in the same sector, either in one country or cross-country. A sector covers a specific area of activities such as health, industry, education, transport, or agriculture (Office of the Inspector General, International Organization for Migration [IOM], 2005, p. 30).

Because sector program evaluations look at many projects with different objectives and different donors, they are more complex than project evaluations. They can be as complex as country program evaluations, or more so if done across countries. As with evaluations of country programs, sector evaluations are generally normative evaluations. In sector program evaluations, ownership and partner responsibility are key issues (Danish Ministry of Foreign Affairs, 1999, p. 30). Both development organizations and the partner institution are concerned with improving the delivery of aid and accountability, and with improving development in the sector.

Sector program evaluations often include several development organizations. For example, in 2005, the African Development Bank completed a review of the education sector in Morocco. Development organizations involved included the World Bank, the European Union, the United Nations Development Programme, the United Nations Development Fund for Women, and the Joint United Nations Programme on HIV/AIDS (African Development Bank, 2005, p. 1).


Example of Joint Sector Evaluation: Joint External Evaluation of the Health Sector in Tanzania, 1999-2006 (Ministry of Foreign Affairs of Denmark, 2007, pp. 14-15)

During the 1990s, the health sector in Tanzania faced a period of stagnation. Local health services were characterized by severe shortages of essential drugs, equipment, and supplies. They were also characterized by deteriorating infrastructure and were plagued by poor management, lack of supervision, and lack of staff motivation. The sector also faced stagnating or deteriorating hospital care. There was little cooperation in health service delivery between the public sector, faith-based organizations, and private service providers. Health services were severely under-funded, with public health sector spending at USD 3.46 per capita. There was also little coordination of support to the health sector by Development Partners.

The Government of Tanzania and Development Partners (Belgium, Canada, Denmark, Germany, the Netherlands, and Switzerland) responded to this situation together in a process beginning with a joint planning mission convened by the Government in mid-decade. By 1999, this process resulted in the first major health sector strategic plan, the Health Sector Program of Work (POW), and an agreement that support to the health sector would take place in the framework of a Sector Wide Approach (SWAp). The POW and the subsequent Health Sector Strategic Plan 2 (HSSP2) articulated a process of health sector reform aimed at addressing the recognizable deficiencies in the sector and achieving specific goals and targets in health as set out in the Millennium Development Goals (MDGs) and the National Strategy for Growth and Reduction of Poverty (NSGRP/MKUKUTA).

The POW and HSSP2 identified priority areas of strategic intervention:

• strengthening district health services;
• transforming the role of the central Ministry of Health and Social Welfare into a facilitative policy organization;
• reforming and strengthening regional and national referral hospitals;
• improving central support systems (including infrastructure, health management information systems, drug supplies, transport, and communications and information technology);
• dealing more effectively with issues in human resources for health;
• improving the level, appropriateness, and sustainability of health care financing;
• promoting public-private partnership;
• adopting and implementing a national strategy for combating HIV/AIDS as an explicit element of HSSP2; and
• improving Government and Development Partner relations to improve harmonization and alignment of external and Tanzanian resources in a more effective partnership.


Evaluation Focus and Approach

The evaluation focused on four thematic areas:

1. The relevance of health sector strategic and implementation plans to achievement of the MDGs in health and to the National Strategy for Growth and Reduction of Poverty (MKUKUTA, in Swahili) health sector goals and targets, and the appropriateness and relevance of external support;
2. The extent of progress and achievements under each of the nine strategic priorities of the health sector reform process, as listed immediately above;
3. Achievements in improving access, service quality, and health outcomes during the evaluation period; and
4. Changes in partnership during the evaluation period, including the evolution of efforts at harmonization and alignment and the use of different aid modalities.

The evaluation was conducted from December 2006 to September 2007 by a team of eight international health and evaluation consultants, three of whom were nationals of Uganda and Malawi. The main methodologies used were an extensive document review; key informant interviews at the national level; district self-assessments carried out by Ministry of Health and Social Welfare (MOHSW) staff in 16 districts in Tanzania; in-depth case studies in six districts (including discussions with community members) carried out by the evaluation team; an analysis of financial and other resource flows to the health sector at the national, regional, and council levels; and a review of national health outcomes data. Finally, evaluation information from all the methodologies used was tested and triangulated to identify contradictions and strengthen the validity of findings and conclusions.


Example of Sector Evaluation: Private Sector Development and Operations: Harnessing Synergies with the Public Sector (Asian Development Bank, 2007)

Introduction. Research confirms a strong link between poverty reduction, economic growth, improvements in the investment climate, and levels of private sector investment. In 2005, the Development Effectiveness Committee recommended that the Operations Evaluation Department (OED) assess the strategic direction and performance of the Asian Development Bank's (ADB) private sector development and investment operations. OED prepared this Special Evaluation Study to determine: (i) the extent to which ADB is pursuing the correct objectives; (ii) whether the efficiency and effectiveness of its operations to support the private sector can be improved; and (iii) the extent to which ADB has been creating value in its private sector development activities.

Context of ADB's Private Sector Support. Over the last four decades, there has been a dramatic shift in the way multilateral development banks operate. In the mid-1960s, there was a lack of foreign currency capital in the Asia-Pacific Region. Multilateral agencies such as ADB were established to mobilize this capital from members of the Organization for Economic Cooperation and Development and make it available through sovereign loans to developing member countries. There are now tens of trillions of dollars in funds available in international capital markets and regional banks. The main problem in mobilizing funding in the Asia-Pacific Region is finding investment opportunities of an acceptable quality that balance risks and rewards. Also, governments are withdrawing from direct provision of services and focusing on strengthening the policy and regulatory environment for the private sector. These developments have meant that the private sector is becoming an increasingly important client of ADB, a trend that is expected to continue for the foreseeable future.

In many cases, money is no longer the core product being sought from ADB. Increasingly, what is being sought is support for building the enabling environment for the private sector (e.g., rule of law, access to finance, appropriate policy/legal/regulatory frameworks) and lowering the risk associated with individual transactions through ADB's involvement. As a result, there has been a shift within ADB to create value for transactions by providing more intangible services such as private sector development, knowledge transfer, and risk management. ADB is uniquely placed to provide this combination of services through its base in the region and the presence of public and private sector operations under one roof, creating a "one stop shop" for private sector development and commercial risk management services. Fully exploiting this potential comparative advantage requires the development of strong synergies between the public and private sector parts of ADB.

Study Objectives, Scope, and Methodology. The Private Sector Development Strategy was meant to help realize ADB's pro-poor growth objectives and catalyze private sector development (PSD) and private sector investment. To capture the effects of this strategy, operations were evaluated over the period 1995 to 2005, five years before and five years after the policy was adopted.


This evaluation is designed to determine: (i) the extent to which ADB is producing the right outputs in its enabling environment and related investment activities to achieve private sector development outcomes; (ii) whether there are opportunities to improve the efficiency and effectiveness of its operations; and (iii) the degree to which ADB's private sector activities are adding value. The conceptual framework for the evaluation is illustrated in Figure 13.1.

[Figure 13.1: Conceptual Framework for Harnessing Synergies with the Public Sector. The diagram links ADB's strategic direction for PSD and the prioritization and selection of PSD projects (activities) to public sector loans/TA that improve the enabling environment and non-sovereign private sector direct investment (outputs); these lead to policy and institutional reform, lower transaction costs and risks, strengthened firm capacity, and better firm decisions (outcomes), and ultimately to increased competition and private investment (impacts). The external context asks "Is ADB doing the right thing?", the internal context asks "Is ADB doing things right?", and the overall objective asks "To what extent is ADB adding value?"]


Issues and Recommendations. Despite strong growth in the size of the Private Sector Operations Department (PSOD) portfolio, many developing member countries (DMCs) are complaining that ADB is not responding adequately to this demand. In most countries in the Asia-Pacific Region, the private sector's role in financing, managing, and delivering services has increased, particularly in sectors such as finance, energy, transport, and water utilities. Governments are shifting their operational focus to policy and regulatory functions. This trend requires changes in the scope of ADB's operations, as well as greater synergy between public and private sector operations.

As discussed in ADB's Medium-Term Strategy II for 2006-2008 and the presentation of the Private Sector Development Strategy Task Force to the Board in 2006, PSOD's role within the context of ADB's overall operations is expected to grow substantially over the next five years. The final report of the 2007 Eminent Persons Group confirmed this view. It highlighted the importance of economic growth, public-private partnerships in infrastructure, and the need to strengthen financial intermediation. This vision requires changes in the roles, products, and responsibilities; the level and type of resources; and ADB's organizational structure related to private sector operations.

This type of change will not be straightforward. ADB has struggled to find ways of merging public and private sector operations since the early 1980s, as it requires changes in culture and organization structures and strong leadership from management. It is a particular challenge to graft a private sector culture onto an organization like ADB that is dominated by a public sector culture. Many corporations involved in merger and acquisition activities have found that merging different corporate cultures is a difficult exercise. It is not surprising that ADB is no exception in this regard.

This evaluation is the first of four related studies. It will be followed by OED reviews of (i) investments in private equity funds, starting in 2007; (ii) the private infrastructure portfolio, in 2008; and (iii) the effectiveness of strategies for developing an enabling environment for the private sector, in 2009. Together, these four evaluations will provide a comprehensive assessment of ADB's private sector operations and its efforts to improve the business climate in DMCs.

Within this context, this evaluation has found that at the corporate level, ADB needs to look at organizational effectiveness to ensure optimal efficiency in its business procedures and delivery of services. At the departmental level, ADB needs to develop country-level business plans for its private sector operations within the framework of a country partnership strategy and medium-term strategic plans. There are opportunities to further harmonize operations with other similar multilateral development agencies in developing and fine-tuning guidelines and practices for preparing country and departmental business plans and implementing environmental and social safeguard policies and procedures. ADB's aspirations for expanded private sector operations should be accompanied by the resources required to achieve this goal. The following key recommendations are put forward for Management consideration. Further details on these recommendations are presented in the main text of the report (Chapter V).


Part IV: Thematic Evaluations

According to DANIDA's evaluation guidelines, "thematic evaluations deal with selected aspects or themes in a number of development activities" (Danish Ministry of Foreign Affairs, 1999, p. 30). These themes emerge from policy statements. The development organization may decide, for example, that all projects or programs will address a specific issue or issues: the policy may be that all projects and programs will address issues around gender, environmental and social sustainability, and/or poverty alleviation. By policy, these issues must be addressed at all stages of the project or program and for all forms of aid.

Thematic evaluations might also focus on issues such as improving the investment climate or the use of guarantees. The relation of such issues to organizational policies may be less direct, in that facilitation of an improved investment climate may be only an implicit, rather than explicit, part of an organization's mission.

As with country and sector evaluations, the approach is both bottom-up and top-down. Themes are evaluated on a project-by-project basis, and the information in these project evaluations provides a wealth of material for thematic evaluations. But thematic evaluations also go beyond the project level: they usually look across all projects, and they also select countries for in-depth study (case studies). A thematic evaluation will look at many different kinds of information and then extract aggregate findings from these sources.


Example of Thematic Evaluation: Child Labour in Scavenging: Africa, Asia, and Europe Assessment

WASTE is an organization that works toward sustainable improvement of the urban poor's living conditions and the urban environment in general. WASTE was contracted by the International Labour Organisation (ILO) to carry out a thematic evaluation of child labor in waste picking. The purpose of this evaluation was to provide guidance to the ILO on how best to address the exploitation of children in this sector. The thematic evaluation identified and critically assessed scavenging and the various approaches to addressing the problem of child labor in relation to waste picking. The information was drawn from the various projects carried out in this sector by IPEC (the International Programme on the Elimination of Child Labour), as well as from similar efforts of other agencies, institutions, and governments. The issue of child labor in waste and rag-picking is regarded from the perspective of waste picking as a livelihood activity in the context of the overall socio-economic conditions in cities in the South, in particular:

• general economic developments in Southern cities over the past 10-20 years
• specific developments in the solid waste management sector
• the role of waste picking in solid waste management
• the position and condition of waste pickers, and the distribution of sexes, ethnicities, ages, etc.
• the risks and hazards faced by waste pickers, and their decisionmaking in light of these
• the function of waste picking in their overall livelihood activities.

The project resulted in a strategic assessment and recommendations for the ILO. The evaluation was titled Addressing the Exploitation of Children in Scavenging (Waste Picking): A Thematic Evaluation of Action on Child Labour. It can be downloaded from the ILO Web site and is available in English and Spanish. Additionally, five reports have been written by field researchers in the following countries:

• Egypt
• Tanzania
• Thailand
• India
• Romania

Source: WASTE, 2005, Thematic Evaluation on Child Labour in Scavenging: Africa, Asia, and Europe Assessment.


Gender in Development

A thematic evaluation using the theme of gender allows the evaluators to look at many development activities and their effect on gender development. Most evaluations of development interventions do not systematically examine differences in the impacts of policies and projects on men and women. Even fewer examine how interventions affect gender relations, such as the economic roles and responsibilities of different household members or the relative contributions of men and women to the household economy. This lack of attention to gender has important consequences for the quality and operational utility of impact evaluations (Bamberger, 2005, p. 3). Evaluations that address the theme of gender are one way to assess how well the organization reflects its gender focus across its work.

The Importance of Gender in Development Evaluation

A substantial body of literature demonstrates that gender inequality is detrimental to development (UNFPA, 2005). Inequality reduces economic growth and limits access to public services. An understanding of gender differences is essential for evaluating the efficiency and equity impacts of development policies and programs for several reasons. Because of their different economic roles and responsibilities, men and women experience poverty differently: they have different priorities concerning programs to reduce poverty, and they face different constraints in their efforts to improve their economic or social conditions. Equally important, especially for impact evaluation design, is the fact that development programs often affect men and women differently. A program (e.g., a microenterprise program) that focuses on women without considering the support of their spouses, for example, may have a higher chance of failing. Gender is also integral to data collection: studies relying mainly on male interviewers who, in many cultures, are not able to speak to women in the community will produce an incomplete, flawed picture (Bamberger, 2005, p. 9).


Men and women have socially constructed gender roles, based on rules and norms assigning them economic, social, and political roles and responsibilities. Sometimes gender roles are a matter of custom; at other times they are supported by law. For example, in some countries, women cannot legally work, go to school, or own property. Men and women have different experiences because of their gender roles and different access to resources. For example, women, because of their social role as caregivers, have different development needs. Women need public transportation that will take them where they need to go: to markets, clinics, and schools. They need to have access during non-peak hours, and they need transportation that is safe and reliable. Access to safe and affordable childcare is also important in the lives of women, especially those who work outside the home. Medical care for women, provided at times and places that are convenient, with medical professionals they trust, is also an important issue.

But women are not all the same. Old and young, rich and poor, single and married, and those with and without children will vary in their needs and ability to participate in and benefit from development. Development needs to be responsive to this variation. Development interventions increasingly recognize that development has to include and involve women, who have traditionally been marginalized and given limited access to social, economic, and political resources. It is believed that the presence of women in public life will make a difference: one study found that as women gain influence, the level of corruption decreases (World Bank, 2001, p. 122). Development interventions focusing on the needs of women have shown some success (see Cases 13.1 and 13.2).


Case 13.1: Achievements in the Advancement of Women: Local Economic Development Agencies

Women are among the most vulnerable groups where yearly per capita income is estimated at only US$150. They make up 60 percent of the adult population, head up to 35 percent of households, and constitute the bulk of the labor force. Many women depend on small-scale, income-generating activities in the informal sector, but their earnings are limited by lack of literacy, vocational and business skills, and capital. The UNDP-supported Employment Generation Programme covers vocational training, labor-based infrastructure rehabilitation, and small enterprise and informal sector promotion. Over 18 months, the NGO conducted 108 small business training programs for 1,786 trainees, of whom 60 percent were women; over 1,000 of the trainees later started or expanded a small business (68 percent of them women). Based on new jobs generated and average family size, an estimated 25,000 low-income people now enjoy a higher standard of living thanks to the program.

Case 13.2: Achievements in the Advancement of Women: Strengthening the Kenya Women Finance Trust

An affiliate of Women's World Banking provides women with access to credit and technical assistance. With support from UNDP, more loans were made in 18 months than during the prior 10 years. The trust has nearly doubled the number of women trained each year and has more than 2,000 women entrepreneur clients. Loan monitoring and record keeping have improved, and loan recovery stands at 100 percent.


Too often, development evaluation has ignored gender issues. Bamberger (2001) states that many of the tools used in evaluation are not gender-sensitive:

• Many household surveys collect information only from the "household head," who is usually a man.
• Even when women are interviewed in the house, other family members may be present, and the woman may be inhibited from speaking freely.
• Women may speak only the local language.
• Women may not attend community meetings, may not be allowed to speak, or may be expected to agree with their male relatives.
• Many studies rely mainly on male interviewers, who in many cultures are not able to speak to women in the community.

Some of the gender-related goals that development interventions seek to achieve, and that can be the subject of an evaluation, include:

• Goal: social equity in governance and civic involvement
  − increase women's access to social, economic, and political resources
  − value women's contributions to households and community maintenance
  − ensure equality under the law
  − ensure equal opportunities
  − ensure equal voice
  − increase political participation and representation
  − increase participation in government, NGOs, and advocacy organizations.
• Goal: gender equity in partnerships
  − build on shared interests, reciprocal support, mutual benefit, and respect
  − include women in all phases of development, including planning, resource allocation, implementation, and evaluation.


The Elements of a Gender-Responsive Evaluation Approach

Bamberger (2005, p. 9) describes a gender-responsive evaluation approach. It draws on all of the conventional evaluation tools for data collection and analysis. The distinguishing characteristics of a gender-responsive approach are the following:

• a conceptual framework recognizing the gendered nature of development and the contribution of gender equity to economic and social development
• creation of a gender database at the national, sectoral, or local level, which synthesizes available gender-relevant data and identifies the key gender issues to be addressed in the design and evaluation of projects.

The unavailability of sex-disaggregated data is often used to justify a lack of attention to gender. In these situations it is important to define strategies for developing the appropriate databases to make it possible to conduct better gender analysis in future studies and project planning:




• ensuring that data collection methods generate information on both women and men and that key gender issues (such as the gender division of labor, time-use analysis, and control of resources and decisionmaking at the household and community levels) are incorporated into the research design
• ensuring that information is collected about, and from, different household members and that the "household head" (usually defined as a male) is not the only source of information
• complementing conventional data collection methods with gender-inclusive methods, where required
• ensuring that the research team includes a balance of men and women
• ensuring that stakeholders are consulted during the design, analysis, and dissemination of the evaluation, and that the consultations include groups representing both men and women (World Bank, 2007, Gender section).


Development planning frequently uses "gender neutral" approaches, which assume that men and women have the same development needs. In most societies, men tend to dominate community and household decisionmaking, so "gender-neutral" approaches largely respond to male priorities. Ignoring women's needs and capacities will significantly reduce the efficiency and equity of policies and programs. Moreover, the way in which many development planning and evaluation tools are applied is not gender-responsive, so even when community consultations or household surveys are intended to capture the views of all sectors of the community, women's concerns will often not be fully captured.

Table 13.3 gives a checklist for assessing the gender sensitivity of an evaluation design.


Table 13.3: Checklist for Assessing the Gender Sensitivity of an Evaluation Design

Each item is rated on how well it is addressed: not applicable, poor or not addressed, adequate, or good.

1. Conceptual framework and research design
   1-1 Evaluation includes a gender analysis framework
   1-2 Evaluation addresses gender issues and hypotheses where appropriate
   1-3 Stakeholder consultations with all key groups, including women's groups
   1-4 Use (where appropriate) of rapid assessment/diagnostic studies during evaluation design
   1-5 Ensure focus on gender, not just women
2. Organization of the research
   2-1 Both sexes included at all levels of the research team
   2-2 Local language speakers involved
3. Sample design
   3-1 Both male and female household members interviewed
   3-2 Special modules to interview other (non-household-head) members of the household
   3-3 Monitoring of who participates (both attends and speaks) in community meetings
   3-4 Follow-up sample if key groups are missing
   3-5 Focus groups selected to ensure all key groups are represented
   3-6 Follow-up sample for missing groups
4. Data collection methods
   4-1 Data collected (where appropriate) on both sexes
   4-2 Key gender issues are covered
   4-3 Information on gender division of labor
   4-4 Time use
   4-5 Control of resources
   4-6 Information collected about, and from, different household members
   4-7 Use of gender-sensitive data collection methods where required
   4-8 Mixed-method data collection strategy
   4-9 Systematic use of triangulation
5. Data analysis and presentation
   5-1 Ensure sex-disaggregation of data
   5-2 Follow up (if possible in the field) when triangulation reveals inconsistencies
   5-3 Ensure findings reach, and are commented on by, all key groups (including groups representing both men and women)

Source: Michael Bamberger (2005). Handbook. Evaluating gender impacts of development policies and programs. Presented at IPDET, July 4 and 5, 2005. p. 33.
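To make the idea of sex-disaggregated analysis (checklist items 4-1 and 5-1) concrete, here is a minimal, hypothetical Python sketch that disaggregates a survey outcome by the sex of the respondent. The dataset, variable names, and outcome measure are invented for illustration only.

# Illustrative sketch: disaggregating a survey outcome by sex of respondent.
# The data and variable names are hypothetical.
import pandas as pd

survey = pd.DataFrame({
    "respondent_sex": ["female", "male", "female", "male", "female", "male"],
    "household_head": [False, True, False, True, True, False],
    "benefited_from_program": [1, 1, 0, 1, 0, 1],  # 1 = reports benefiting
})

# Share reporting a benefit, disaggregated by sex (checklist item 5-1).
by_sex = survey.groupby("respondent_sex")["benefited_from_program"].mean()
print(by_sex)

# Check whether information came only from household heads (checklist item 4-6).
print("Share of respondents who are household heads:", survey["household_head"].mean())

A real evaluation would, of course, work with a properly sampled survey and would disaggregate by other characteristics (age, marital status, household headship) as the checklist suggests.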


Part V: Joint Evaluations

According to the OECD/DAC glossary, joint evaluations are evaluations "to which different donors and/or partners participate" (OECD/DAC, 2002, p. 26). The glossary also notes that "there are various degrees of 'jointness' depending on the extent to which individual partners cooperate in the evaluation process, merge their evaluation resources and combine their evaluation reporting. Joint evaluations can help overcome attribution problems in assessing the effectiveness of programs and strategies, the complementarity of efforts supported by different partners, the quality of aid coordination, etc." (OECD/DAC, 2002, p. 26).

Joint evaluations have been conducted since the early 1990s, but evaluators have recently been partnering to conduct them with increasing frequency. The DAC has been promoting joint evaluations to enhance coordination and cooperation among development organizations (Breier, 2005, p. 12). When discussing joint evaluations, it is important to look at how the joint members come together, that is, what is the relationship between donors and partners? Here, donors are those donating the resources for the intervention, and partners are those working together to implement the intervention or evaluation. Chen and Slot (2007, slide 3) categorize four ways of joining to work on evaluations:

• donor + donor
• donor + partner country
• multi-donor + multi-partner
• partner + partner.

Breier (2005, pp. 16-17) identifies a different typology for joint evaluations, based on the mode of work:

• Classic multi-partner: Participation is open to all stakeholders. All partners participate and contribute actively and on equal terms.
• Qualified multi-partner: Participation is open to those who qualify, in the sense that there may be restrictions or the need for "entry tickets," such as membership of a certain grouping (e.g., EU, Nordics, UNEG, ECG, Utstein) or a strong stake in the subject matter of the evaluation (e.g., active participation within a SWAp that is being evaluated).
• Hybrid multi-partner: This category includes a wide range of more complex ways of joint working. For example:
  − work and responsibility may be delegated to one or more agencies while other actors take a "silent partnership" role
  − some parts of an evaluation may be undertaken jointly while other parts are delivered separately
  − various levels of linkage may be established between separate but parallel and inter-related evaluations
  − the joint activities may focus on agreeing on a common framework, with responsibility for implementation of the evaluation devolved to different partners.

Key Steps in Planning and Delivering Joint Evaluations

Planning a joint evaluation is critical to its success. Breier (2005, p. 38) discusses the importance of clarifying the purpose, objectives, focus, and scope of a joint evaluation as part of the planning process. According to Breier:

One of the near-universal lessons learned from recent experience with joint evaluations is that it is imperative to allow sufficient time at the beginning of the process to develop and agree on this framework of common understanding about the proposed evaluation's purpose, objectives, focus and scope. When this is not done with the necessary time and patience, as is the case in a number of the evaluations analyzed, there is a strong chance that the evaluation process will run into difficulties later on (2005, p. 40).

When a joint evaluation involves only a few agencies, the management structure can be simple. The evaluators may decide to meet regularly and to share in all management decisions. Partners may decide to have all agencies equally involved in the management but identify one or more agencies to serve in a leadership role. Other partners may decide to delegate management responsibility to one agency, giving the other agencies the role of reviewing key outputs (OECD/DAC, 2006, p. 19).


For larger joint evaluations, a two-tier management system can work. In a two-tier system, there is a broad-membership steering committee and a smaller management group. Another option for large joint evaluations is a flexible or decentralized approach, in which one agency manages each discrete subcomponent of the overall evaluation in a sort of "jig-saw" fashion: each agency is responsible for one piece, and when all have completed their parts the evaluation is complete. A mixed approach is also possible, in which some parts are undertaken jointly and others separately (OECD/DAC, 2006, p. 19).

Both decentralized and centralized management structures have strengths and weaknesses. A decentralized structure, in which management of the evaluation is organized around different issues, makes it easier to delegate or divide responsibilities, resulting in a more efficient management process. Decentralized structures, however, may also create duplication of effort or cause important issues to be missed. A centralized management structure enables each partner to have input on, and influence over, all components of the evaluation process.

The most common management structure for joint evaluations is a two-tier system consisting of (a) a broad-membership steering committee and (b) a smaller management group that runs the day-to-day business of the evaluation. Within this structure there is significant leeway for deciding whether some agencies will participate as silent partners, at what level of detail the steering committee should be involved in decision making, how many partners should sit on the management group, and how much responsibility should be delegated (OECD/DAC, 2006, p. 20, Box 5). Partners need to consider the pluses and minuses of each structure and then decide which best fits the needs of the evaluation.

The OECD/DAC identifies the following key areas on which evaluation partners must reach agreement:

• common ground rules for the evaluation
• the terms of reference for the evaluation team
• selection of the evaluation team: bidding and contracting
• budgeting, costing, and financing
• collecting and analyzing data and reporting findings (OECD/DAC, 2006, pp. 21-30).


Breier (2005, pp. 52-53) discusses ways of coping with the legal issues involved in joint evaluations. He stresses the importance of considering and agreeing on the legal issues involved in working together. These can be challenging and may cause delays, but they can be overcome. Legal issues include:

• an agreed-upon contract, which normally reflects the legal system, the requirements, and the established practice of the agency that is taking the lead on behalf of the group
• lump-sum agreements versus negotiated contracts, including cancellation clauses (for poor performance)
• contractual needs among the partners
• stipulations to submit progress reports showing that funds are being put to proper use.

Freeman (2007) describes 16 rules for organizing and managing the external evaluation team:

1. While encompassing important expertise on the sector, subsector, and geographic areas under evaluation, ensure that the core expertise in complex, large-scale evaluations of development cooperation is as solid as possible.
2. In most cases a consortium of firms and/or research institutions will be needed. Keep the organization as simple as possible and, whenever possible, work with organizations you have worked with before.
3. The lead organization in the consortium should be one with a strong commitment to, and track record in, the evaluation of international development cooperation. The project should be a natural fit with its core business and markets.
4. Commitment to the evaluation should be made clear at the board level of the main external evaluation organization.
5. National consultants should be integrated into the process of the international competitive bid and should take part in methodology selection and design.
6. In multi-country studies, each field team should combine resources from different organizations in the consortium rather than having each organization specialize geographically or institutionally.
7. Evaluation team workshops to develop a common approach to measurement and reporting are invaluable.
8. The evaluation team is ultimately responsible to the overall evaluation steering committee; whenever possible, it should work with and through a smaller management group.
9. Because of the expense of assembling the steering committee, meetings with it will be less frequent, so sufficient time will be needed to allow for full discussion and the working out of a common position.
10. It is always useful to present preliminary evaluation findings to the steering committee (along with basic evidence) in advance of the presentation of the draft report itself.
11. In joint evaluations, it is essential that the external evaluators operate openly and transparently and are able, in presentations and reports, to directly link methods to the evidence gathered, to findings, conclusions, and recommendations.
12. In negotiations for additional resources, when they are clearly needed, the evaluation team and the management group will need to begin by agreeing on the split between work that should be undertaken under the original contract (and using the original resource envelope) and work that results from new issues and interests or arises from unforeseeable circumstances. This will require the team to prepare detailed, costed, and time-bound plans for any new work required.
13. Large, lengthy, complex, and high-stakes joint evaluations require all stakeholders to maintain a strong positive orientation throughout the exercise.
14. It is essential that the external evaluation team be responsive to all members of the steering committee as having essentially equal weight. It is equally essential that the evaluation team demonstrate an absence of institutional bias.
15. Draft reports are never perfect. Both the evaluation team and the steering committee should enter discussions of drafts with an open attitude toward improvements that can be made. At the same time, the evaluators must be able and willing to maintain their objective responsibility for evaluation findings and conclusions.
16. The cost, complexity, and duration of joint evaluations argue strongly for investing a substantial proportion of the budget in dissemination and follow-up activities.


The OECD/DAC has published Guidance for Managing Joint Evaluations. The publication is available at the following Web site: http://www.oecd.org/dataoecd/29/28/37512030.pdf

Part VI: Global and Regional Partnership Program (GRPP) Evaluations

Global and Regional Partnership Programs (GRPPs) are an increasingly important modality for channeling and delivering development assistance to address pressing global and/or regional issues. Most GRPPs are specific to a certain sector or theme, such as agriculture, environment, health, finance, or international trade. GRPPs are programmatic partnerships in which:

• the partners contribute and pool resources (financial, technical, staff, and reputational) toward achieving agreed-upon objectives over time
• the activities of the program are global, regional, or multi-country (not single-country) in scope
• the partners establish a new organization with a governance structure and management unit to deliver these activities (IEG, 2007c, p. 18).

A new GRPP is usually established after a group of global and/or regional actors identify a problem or opportunity that requires public action but that crosses national sovereign boundaries. The actors may be donors and/or recipients and may come from the public and/or private sectors. Consider the example of water resource management in the Nile basin: the issue cannot be addressed effectively at a single-country level because many countries share in the management of the resource. In addition, substantial economies can result from collective action at the global or regional level (for example, the generation of new technologies or good practice relevant to development in many countries).

GRPPs are thus often focused on the production or preservation of "global public goods" (e.g., research that yields improved seed technology, or biodiversity) or the reduction of "global public bads" (e.g., climate change or pollution). The global public goods aspect refers to the fact that the goods provided are non-rival (provision to one person or country does not decrease the supply available to others) and/or non-excludable (once provided to one person or country, all benefit), and that the reach of the benefits extends beyond national boundaries.


Because of the differing perspectives and incentives of the various partners who have banded together to establish the program (and usually fund it or lend technical expertise to it), a formal governance structure is set up that sets out formal objectives, criteria for membership or participation, decision-making procedures, and sometimes allocation criteria.

Examples of GRPPs include the Consultative Group on International Agricultural Research, the West Africa HIV/AIDS and Transport program, the Global Water Partnership, the Integrated Framework for Trade, and the Medicines for Malaria Venture. About 150 multi-partner GRPPs have been identified; their annual disbursements range from a few million dollars to over one billion dollars. Their activities may be limited to knowledge sharing and networking, or may also encompass technical assistance, investment, and market or trade activities.

The features of GRPPs that make them complex to evaluate are the following:

• The nature of the programs as partnerships has two implications: first, the differing perspectives of different partners may need to be taken into account; second, the effectiveness of the governance structure itself needs to be evaluated, both on intrinsic criteria related to good practice in partnership and in terms of its contributions to program results (positive or negative).

• The programs, unlike projects, do not have a fixed time frame in which to achieve objectives; indeed, the objectives and strategies often evolve over time as funding increases (or decreases), the donor or partnership composition changes, or external conditions change.

• Costs are incurred and benefits accrue at different levels: local, country, and global/program. An evaluation of effectiveness needs to consider the contributions of, and interdependence among, the different levels; this makes the construction of the results chain complex and poses a results-aggregation problem.

• The global and regional public goods dimension of GRPPs means that there is a divergence between the costs and benefits captured at the national and global levels, making evaluation of cost-effectiveness complex and making assessment of the effectiveness of global-country coordination and linkages essential.

• Because of this open-ended nature and longer time frame, evaluators are often asked to consider progress toward complex medium-term strategic goals such as resource mobilization aimed at scaling up, devolution to local implementers, independence from the host agency, closure or exit, or changed patterns of collaboration with new and emerging global programs.

Given the growing participation of the World Bank Group in GRPPs, and their increasing importance in achieving global sustainable development results, IEG has recently launched two new activities to support the continued examination and improvement of their performance.

First, in response to a request from the OECD/DAC Evaluation Network (supported by the UN and multilateral development bank evaluation networks), IEG led an effort to develop consensus principles and standards for evaluating GRPPs. The first product of this work was the Sourcebook for Evaluating Global and Regional Partnership Programs (GRPPs); work is ongoing on research that will contribute to a second, accompanying report giving more detailed guidelines and examples.

Second, IEG has begun reviewing independent evaluations of GRPPs in which the Bank is involved, in order to:

• assess the quality of the evaluation and give feedback
• provide an independent assessment of the findings and recommendations
• assess the performance of the Bank as a partner.

Seven of these Global Program Reviews (GPRs) have been completed and released to the public, and lessons are emerging.


Example of a GRPP Evaluation and Subsequent Global Program Review: the Medicines for Malaria Venture (MMV)

Background on the program: The objective of MMV, which was established in 1999, is to reduce the burden of malaria in disease-endemic countries by discovering, developing, and delivering new affordable antimalarial drugs through effective public-private partnerships. Cumulative contributions were $151 million through 2006. The secretariat of the program, which is organized as an independent nonprofit organization under Swiss law, is in Geneva, with a satellite office in New Delhi, India. Its board of directors consists of eleven distinguished individuals from industry, academia, and the World Health Organization, as well as the Gates Foundation, which provides 60 percent of its funding.

The evaluation: MMV's donors commissioned an external evaluation in 2005. MMV's board cooperated with the evaluation, although it did not commission it. The evaluation was carried out by a four-person team led by a distinguished African public health professor. The terms of reference called for use of the standard evaluation criteria, with the evaluation questions drilling down to more detail on aspects specific to GRPPs in general and to the nature of the activities funded by MMV in particular. The following illustrates how such criteria are applied in complex GRPP evaluations.

Relevance: Questions were posed relating to the relevance of the objectives of the program. Was there an international consensus on the problem and the need for the program to address it? (The evaluation pointed out the relevance of the program to achieving the Millennium Development Goal of preventing the spread of communicable diseases, which is also a "global public good," or GPG.) The evaluation's assessment of relevance also applied the "subsidiarity principle" criterion: programs should be implemented at the most local level possible. (The evaluation asked: Does the program provide services that cannot be provided at the country or local level? Because of the GPG being addressed, the evaluation concluded that the program met the subsidiarity principle.) The evaluation considered MMV's value added relative to other sources of supply and services (considering both other programs dealing with malaria and purely private sector drug development programs). It also assessed the relevance of the objectives of the program to beneficiaries' needs. (The program was judged relevant by this criterion, since it addressed not only drug development issues but also delivery and access issues.) Finally, the consideration of relevance covered not only objectives but also strategy and design: Are the program strategy and design relevant (choice of interventions, geographic scope and reach, use of public-private partnerships, or PPPs)? Would an alternative design be more relevant?



Efficacy (achievement of objectives): The evaluation examined both the outputs and the outcomes achieved, relative to expectations, as well as the factors that helped or hindered the achievement of objectives. This aspect of the evaluation was highly dependent on the quality of the monitoring and evaluation (M&E) framework and data collection. The contributions of the following factors were assessed:

• stakeholder involvement
• effectiveness of portfolio management
• effectiveness of PPP management
• scientific approach
• effectiveness of key approaches to product development.

Finally, the evaluation noted any unintended outcomes.

Efficiency: Though not explicitly called for as a separate criterion in the terms of reference, the evaluators recognized the importance of considering the opportunity costs of more efficient means of delivering services. The evaluators analyzed the costs of management and administration relative to program expenditures, and their trends over time, and compared these to benchmarks such as the cost of developing drugs for other diseases and the costs of a typical product development process in a large private pharmaceutical company (an illustrative sketch of this ratio calculation follows the example).

Governance: The evaluation assessed the legitimacy of MMV's governance by looking at (a) the representation of endemic countries on the governing body; (b) the engagement of scientific expertise, such as that of WHO and endemic-country researchers; and (c) evidence of the confidence of the partners, for example, in successful resource mobilization campaigns and in reactions to evaluations. The evaluation also assessed the effectiveness of the governance structure in performing the following important functions:

• setting strategy (How well did the governance structure inform deliberations on the new direction of addressing drug access at the country level?)
• collaborating with other agencies with similar mandates (The evaluation recommended more attention to this aspect.)
• mobilizing resources (The evaluation recommended acting more proactively to raise needed resources and diversify donor support.)

Sustainability and medium-term strategic questions: The evaluation asked whether the benefits of the program were likely to be sustained under current conditions. (It addressed sustainability both in terms of financing needs and in terms of sustained advocacy and support of objectives.) The evaluation also assessed whether changes could be made to make conditions more favorable for sustainability. (Should the scale of the program be changed? Can relationships with other organizations be made more effective? Should alternative organizational or governance arrangements be considered?) Finally, the evaluation addressed improvements in M&E. The terms of reference called for the evaluators to test a template proposed by donors for assessing the effectiveness of PPPs. In addition, the evaluators made recommendations on improving monitoring to support future evaluations.


The methodology of the evaluation involved (a) a review of documents about the program; (b) extensive interviews with stakeholders at both the program and activity levels (e.g., members of the governing body, the secretariat, donors, and implementing staff); (c) observation of a Scientific Advisory Committee meeting; and (d) in-country site visits.

IEG's Global Program Review (GPR):

• The GPR assessed the independence and quality of the evaluation. It considered who commissioned the evaluation and how it was managed, to judge the degree of independence. It examined whether the evaluation covered all aspects of MMV's work, in accordance with the terms of reference. And it looked at the quality of the evaluation and its ultimate impact (To what degree did program management implement the recommendations?).

• The GPR provided an independent assessment of the findings of the evaluation and confirmed that it provided a sound basis for follow-up actions by program management and the Bank.

• The GPR assessed the World Bank's performance as a partner, considering its use of convening power to help establish the program, its subsequent financial support (which was initially important but shrank over time relative to other donors), and the Bank's participation in deliberations of the governing bodies and committees. The GPR also pointed out issues likely to arise in the Bank's future role in the program.

Finally, the GPR summarized lessons arising from the evaluation and the GPR itself, including lessons for the program (establishing an appropriate M&E framework), lessons applicable to other GRPPs (the need for effective coordination and consultation with other key players at the global and country levels), and lessons applicable to other aid instruments (lessons for projects that rely on PPPs to deliver services).

The World Bank's Board received a copy of this GPR, and its Committee on Development Effectiveness discussed a summary report providing common lessons from the Bank's seven GPRs finalized so far, along with recommendations regarding the Bank's engagement with GRPPs. The GPRs have been released to the public and are available on the World Bank's external Web site (www.worldbank.org/ieg/grpp).
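The efficiency analysis described in the example above rests on simple arithmetic: management and administration costs as a share of total program expenditures, tracked over time and compared with external benchmarks. The sketch below illustrates that calculation in Python; the yearly figures and the 15 percent benchmark are hypothetical assumptions for illustration, not MMV data.

```python
# Illustrative only: the yearly figures and the benchmark below are assumptions,
# not actual MMV financial data.
expenditures = {
    # year: (management and administration cost, total program expenditure), US$ millions
    2003: (2.1, 18.0),
    2004: (2.4, 24.0),
    2005: (2.6, 31.0),
}
BENCHMARK_RATIO = 0.15  # hypothetical comparator ratio, e.g. from a similar program

for year in sorted(expenditures):
    admin, total = expenditures[year]
    ratio = admin / total
    status = "above" if ratio > BENCHMARK_RATIO else "within"
    print(f"{year}: admin share = {ratio:.1%} ({status} the benchmark)")
```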


Part VII: Evaluation Capacity Development

As the demand for monitoring performance and measuring results continues to increase, so does the demand for evaluation capacity. Evaluation that relies on a few experts cannot meet this demand, so evaluation capacity at all levels within a country has to be enhanced. Having more people with evaluation skills not only increases the quality and quantity of evaluations; it also helps in program development. Once people are attuned to the concept of measuring performance and results, they are more likely to establish clear and specific goals and objectives at the program planning stage. Programs that are developed with concrete measures are more likely to stay focused, and focused evaluations can be more helpful because they provide more specific feedback relevant to the intervention's goals and objectives. These kinds of evaluations are more likely to contribute to sound governance and high performance.

Evaluation capacity development (ECD) encompasses many types of actions to build and strengthen monitoring and evaluation (M&E) systems in developing countries, with a particular focus on the national and sector levels (Mackay, 2007, pp. 65-80). It covers many related concepts and tools: capacities to keep score on development effectiveness, specification of project and program objectives and results chains, performance information (including basic data collection), program and project monitoring and evaluation, beneficiary assessment surveys, sector reviews, and performance auditing. ECD focuses on measuring the performance of governments at the ministry, program, and project levels. It also supports M&E capacity building for civil society, as contributors to M&E information, as users of it, and, in some cases, as producers of it. The priority of ECD has been highlighted by the renewed emphasis on results through Poverty Reduction Strategies and the Comprehensive Development Framework.
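One building block of the M&E systems that ECD seeks to strengthen is basic performance information: indicators with baselines, targets, and current values that can be tracked over time. The sketch below shows one minimal way such records might be represented and scored in Python; the indicator names, numbers, and the progress measure itself are hypothetical illustrations, not a prescribed ECD tool.

```python
from dataclasses import dataclass

@dataclass
class Indicator:
    """A performance indicator with a baseline, a target, and the latest observed value."""
    name: str
    baseline: float
    target: float
    latest: float

    def progress(self) -> float:
        """Share of the baseline-to-target distance achieved so far (may exceed 1.0)."""
        span = self.target - self.baseline
        return (self.latest - self.baseline) / span if span else 0.0

# Hypothetical sector indicators, for illustration only.
indicators = [
    Indicator("Primary completion rate (%)", baseline=62.0, target=80.0, latest=70.0),
    Indicator("Children fully immunized (%)", baseline=55.0, target=90.0, latest=75.0),
]

for ind in indicators:
    print(f"{ind.name}: {ind.progress():.0%} of the way from baseline to target")
```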


ECD helps ensure that evaluation findings are available to assist countries in four key areas:

• Evaluation findings can be an important input into government resource allocation, including planning, decision making, and prioritization, particularly in the budget process.

• Evaluation assists government managers by revealing the performance of ongoing activities at the sector, program, or project level. It is therefore a management tool that leads to learning and improvement in the future (results-based management).

• Evaluation findings are an input into accountability mechanisms, so that managers can be held accountable for the performance of the activities they manage and governments can be held accountable for performance. The notion of accountability encompasses the recognition that economic governance and a sound public sector are central to national economic competitiveness: markets reward countries that are able to manage and screen public expenditures, and evaluation offers a tool to help do that.

• Evaluation findings demonstrate the extent to which development activities have been successful. This is proving increasingly important for countries in attracting external resources, particularly given the pressure on international development assistance agencies to channel their assistance to countries where past development efforts have been successful. Moreover, the increasing emphasis by development assistance agencies on a whole-of-government approach to development increases the premium on having country-wide measures of performance available.

Developing evaluation capacity requires the recognition that evaluation is helpful; it cannot be imposed on a government. Governments have to create the demand for increased evaluation capacity and own the evaluation system. Development evaluation also needs a supply of people with the necessary skills and a system in place to support their work. Lastly, there needs to be an information infrastructure so that data can be routinely collected and results can be disseminated and used.


If evaluation capacity is not in place, a plan must be developed. This plan should include the following nine-step process:

1. Identify the key stakeholders for performance measurement, measuring for results, and/or evaluation.
2. Examine and diagnose problems in the public sector environment.
3. Understand the factors that actually influence budget and management decisions at each ministry.
4. Determine the existing need.
5. Assess the evaluation activities and capabilities of central and line ministries and other organizations (such as universities, businesses, and NGOs).
6. Consider the evaluation activities of multilateral and bilateral development assistance agencies in the country.
7. Identify major public sector reforms that might support performance measurement or measuring-for-results efforts.
8. Map the options for developing evaluation capacity.
9. Prepare a realistic evaluation capacity development action plan.

Henrik Schaumberg-Müller (1996, pp. 18-19), a consultant to DANIDA, wrote a report for the DAC Expert Group on Aid Evaluation discussing experiences in evaluation capacity building. In his paper, he discusses what the evaluation community has learned about supporting evaluation capacity development. He identifies the need for the design and formulation of evaluation functions to be specific to the individual country or organization: there may be common objectives at the general level, but the specific systems and approaches need to be considered for each context.

Schaumberg-Müller also identifies another area of agreement: building usable evaluation systems may take a long time, because they require political and institutional changes. Over time, institutions need to come to understand that evaluations are not control systems but tools to improve performance and decision making. It took the United States ten years to fully implement the Government Performance and Results Act (GPRA) and all the M&E systems within it; the effort began with local governments, spread to state governments, and eventually reached the federal level (Kusek & Rist, 2004, p. 156).


Another issue Schaumberg-Müller identified is the importance of leaving more of the initiative and design of evaluation to the participants. He acknowledges that donors and host countries have legitimate interests in evaluation, but these may differ from those of the participants. Donors can continue to evaluate both for accountability and to draw lessons about the adequacy of their delivery systems, but if evaluation is to become important to the countries or institutions themselves, they must have more control over its initiative and design.

Porteous, Sheldrick, and Stewart (1999, pp. 137-154) identify the importance of empowering managers to improve their knowledge and skills in program evaluation. They identify the following five principles for building evaluation capacity:

• taking stock of what is needed
• building on shared values
• valuing different perspectives
• integrating planning and evaluation into routine program management
• maximizing adult learning.

Towards Evaluation Capacity Development

Developing capacity requires a change of focus: from the short term to the long term. Institutional change takes time to develop. By building on common ground, evaluators can increase capacity and empower people. Kusek and Rist (2004, p. 177) offer three questions to help learn more about the capacity-building requirements for a performance-based M&E system:

• How would you assess the skills of civil servants in the national government in each of the following six areas:
  − project and program management?
  − data analysis?
  − policy analysis?
  − setting project and program goals?
  − budget management?
  − performance auditing?

• Are you aware of any technical assistance, capacity building, or training in M&E now under way, or done in the past two years, for any level of government (national, regional, or local)? Has it been related to:
  − the CDF or PRSP process?
  − strengthening of budget systems?
  − strengthening of public sector administration?
  − government decentralization?
  − civil service reform?
  − individual central or line ministry reform?

• Are you aware of any institutes, research centers, private organizations, or universities in the country that have some capacity to provide technical assistance and training for civil servants and others in performance-based M&E?

Concluding Comments

Development evaluators around the world face a unique set of complex economic, political, and social factors as they strive to conduct high-quality evaluations that meet the needs of diverse stakeholders. Many development interventions and policies are long-term in nature, cover broad geographic areas, and interact with other interventions. Evaluations of these interventions not only need to be technically well designed (sometimes within very tight time and resource constraints); they also need to be conducted in an ethical manner that is sensitive to local conditions as well as to difficult development issues such as gender roles and poverty reduction.

Recently, development evaluation has been attempting to investigate the effects of many interventions together, with a conceptual focus on a sector, a country, and/or a theme. These "big picture" approaches try to determine how interventions affect the larger picture, not just how successful a single intervention has been.


Summary

As international banks and development organizations investigate the effects of development efforts, evaluations are moving to a more complex, "big picture" view. Focusing only on single projects or programs does not show the whole picture. The more complex "big picture" approaches look at multiple projects, programs, and/or policies to identify interactions and ways to share resources. These complex evaluations may be joint evaluations, conducted by multiple organizations. Four types of complex interventions are:

• country program evaluations
• sector program evaluations
• thematic evaluations
• global or regional partnership program (GRPP) evaluations.

A country program evaluation focuses on an organization's entire aid program to one of its main partner countries. Typically, a country program evaluation is partly a normative study that compares what is being done to what was planned. But it also goes beyond normative questions to:

• assess the strategic relevance of the country assistance program relative to the country's needs
• test the implementation of agency-wide goals to determine whether the intended outcomes were obtained
• identify successes and failures in different sectors or in the approaches used in the country, and identify the factors contributing to performance
• identify the effectiveness of the donor's aid to a given country, and use this to bolster the case for aid (OECD/DAC, 1999).


Sector program evaluations are evaluations of major program sectors, such as education, health, housing, or transportation. The International Organization for Migration (IOM) evaluation guidelines give the following definition of sector evaluation:

An evaluation of a variety of aid actions all of which are located in the same sector, either in one country or cross-country. A sector covers a specific area of activities such as health, industry, education, transport, or agriculture (Office of the Inspector General, International Organization for Migration [IOM], 2005, p. 30).

Thematic evaluations deal with selected aspects or themes in a number of development activities. These themes emerge from policy statements, which require that the issues be addressed at all stages of a project or program and for all forms of aid. Examples of such themes include:

• gender
• environmental and social sustainability
• poverty alleviation.

Global and regional partnership programs (GRPPs) are programmatic partnerships in which:

• the partners contribute and pool resources (financial, technical, staff, and reputational) toward achieving agreed-upon objectives over time
• the activities of the program are global, regional, or multi-country (not single-country) in scope
• the partners establish a new organization with a governance structure and management unit to deliver these activities.

Most GRPPs are specific to a certain sector or theme, such as agriculture, environment, health, finance, or international trade.

Evaluation capacity development (ECD) encompasses many types of actions to build and strengthen monitoring and evaluation (M&E) systems in developing countries, with a particular focus on the national and sector levels. It covers many related concepts and tools: capacities to keep score on development effectiveness, specification of project and program objectives and results chains, performance information (including basic data collection), program and project monitoring and evaluation, beneficiary assessment surveys, sector reviews, and performance auditing (Mackay, 2007, pp. 65-80).


Chapter 13 Activities

Application Exercise 13.1: Building Evaluation Capacity

Instructions: Suppose you (or your group) have been asked by the government to create a strategic plan for increasing evaluation capacity in your home country. Use the checklist below to guide your discussion.

1. What are the two or three most difficult development issues to be tackled in the next several years?

2. What evaluation capacity already exists, to the best of your knowledge (think about the availability of evaluators, skills, resources, infrastructure, etc.)?

3. Given current and future development needs and issues, and your assessment of current evaluation capacity, list the six most important enhancements that would improve evaluation capacity in your country:
   a.
   b.
   c.
   d.
   e.
   f.


4. In your country, what is driving the need for monitoring and evaluation systems?

5. Where in your government does accountability for effective (and efficient) delivery of programs lie?

6. Is there a codified (through statute or mandate) strategy or organization in the government for tracking development goals?

7. Where does capacity lie with the requisite skills for designing and using monitoring and evaluation systems in your country? How has this capacity (or lack thereof) contributed to the use of monitoring and evaluation in your country context?

8. Now, prioritize the items on your list above by labeling them: critical, very important, or important.


References and Further Reading

African Development Bank (2005). Morocco: Evaluation of Bank assistance to the education sector. Retrieved February 11, 2008 from http://www.oecd.org/dataoecd/36/19/37968449.pdf

Asian Development Bank (2006). Project performance evaluation report for Indonesia: Capacity Building Project in the Water Resources Sector. Retrieved February 11, 2008 from http://www.adb.org/Documents/PPERs/INO/26190-INOPPER.pdf

Asian Development Bank (2006). Private sector development and operations: Harnessing synergies with the public sector. Retrieved February 11, 2008 from http://www.oecd.org/dataoecd/6/59/39519572.pdf

Bamberger, Michael, Mark Blackden, Lucia Fort, and Violeta Manoukian (2001). "Integrating gender into poverty reduction strategies." Chapter 10 in The PRSP sourcebook. Washington, D.C.: World Bank. Retrieved February 8, 2008 from http://povlibrary.worldbank.org/files/4221_chap10.pdf

Bamberger, Michael (2005). IPDET handbook: Evaluating gender impacts of development policies and programs. July 2005.

Breier, Horst (2005). Joint evaluations: Recent experiences, lessons learned and options for the future. Draft report to the DAC Network on Development Evaluation, presented at the 3rd meeting of the DAC Network on Development Evaluation, June 2-3, 2005.

CDF Secretariat (2001). Design paper for a multi-partner evaluation of the Comprehensive Development Framework. Retrieved February 8, 2008 from http://www.worldbank.org/evaluation/cdf/cdf_evaluation_design_paper.pdf

Chen, Jhaoying and Hans Slot (2007). Country-led joint evaluation: Dutch ORET/MILIEV Programme in China. Presentation at the Sixth Meeting of the DAC Network on Development Evaluation, June 2007, Paris. Retrieved February 12, 2008 from www.oecd.org/dataoecd/63/28/38851957.ppt


Compton, Donald W., M. Baizerman, and S. H. Stockdill, editors (2002). "The art, craft, and science of evaluation capacity building." New directions for evaluation, no. 93 (Spring 2002). A publication of the American Evaluation Association.

DAC Network on Development Evaluation (2005). Workshop on joint evaluations: Challenging the conventional wisdom – the view from developing country partners, Nairobi, April 20-21, 2005. Workshop report. Retrieved August 13, 2007 from http://www.oecd.org/dataoecd/20/44/34981186.pdf

DAC (1999). Evaluating country programmes, Vienna workshop, 1999. Retrieved August 13, 2007 from http://www.oecd.org/dataoecd/41/58/35340748.pdf

Danish Ministry of Foreign Affairs (1999). Evaluation guidelines, 2nd edition. DANIDA. Retrieved August 13, 2007 from http://www.um.dk/NR/rdonlyres/4C9ECE88-D0DA4999-9893371CB351C04F/0/Evaluation_Guidelines_1999_revised.pdf

Freeman, Ted (2007). Joint evaluations. Presentation at IPDET, June-July 2007, Ottawa, Ontario.

Fullan, M. (1993). Change forces. London: Falmer Press.

Guiding principles for evaluators (1995). New directions for program evaluation, no. 66. San Francisco: Jossey-Bass.

Gesellschaft für Technische Zusammenarbeit (GTZ) (2004). National monitoring of sustainable poverty reduction strategy papers (PRSPs). Retrieved August 13, 2007 from http://siteresources.worldbank.org/INTISPMA/Resources/Training-Events-and-Materials/summary_MainReport.pdf

Heath, John, Patrick Grasso, and John Johnson (2005). World Bank country, sector, and project evaluation approaches. Presented at IPDET, 2005.

Independent Evaluation Group (IEG) (2007a). Impact evaluations: Bangladesh maternal and child health. Retrieved August 13, 2007 from http://www.worldbank.org/oed/ie/bangladesh_ie.html

Independent Evaluation Group (IEG) (2007b). Country assistance evaluation (CAE) retrospective. Retrieved February 8, 2008 from http://www.worldbank.org/ieg/countries/cae/featured/cae_retrospective.html


Independent Evaluation Group (IEG) (2007c). Sourcebook for evaluating global and regional partnership programs: Indicative principles and standards. Washington, D.C.: World Bank. Retrieved May 20, 2008 from http://siteresources.worldbank.org/EXTGLOREGPARPRO/Resources/sourcebook.pdf

Independent Evaluation Group (IEG) (2007d). Global program review: Medicines for Malaria Venture. Retrieved May 20, 2008 from http://lnweb18.worldbank.org/oed/oeddoclib.nsf/24cc3bb1f94ae11c85256808006a0046/d591aea3bbb897de852573130077a0cb?OpenDocument

Johnson, John (2007). Confronting the challenges of country assistance evaluation. Presentation at IPDET, June 26-27, 2007, Carleton University, Ottawa, Ontario, Canada.

Kusek, Jody Zall and Ray C. Rist (2004). Ten steps to a results-based monitoring and evaluation system. Washington, D.C.: The World Bank.

Mackay, Keith (2007). How to build M&E systems to support better government. Retrieved February 11, 2008 from http://www.worldbank.org/ieg/ecd/docs/How_to_build_ME_gov.pdf

Mackay, Keith (2006). Institutionalization of monitoring and evaluation systems to improve public sector management. Retrieved February 11, 2008 from http://siteresources.worldbank.org/INTISPMA/Resources/ecd_15.pdf

Mackay, Keith (1999). Evaluation capacity development: A diagnostic guide and action framework. ECD working paper series no. 6, January 1999. Retrieved August 13, 2007 from http://lnweb18.worldbank.org/oed/oeddoclib.nsf/a4dd58e444f7c61185256808006a0008/7f2c924e183380c5852567fc00556470?OpenDocument

Ministry of Foreign Affairs of Denmark (2007). Joint external evaluation: The health sector in Tanzania, 1999-2006. Retrieved February 11, 2008 from http://www.oecd.org/dataoecd/53/46/39837890.pdf

Office of the Inspector General, International Organization for Migration (IOM) (2006). IOM evaluation guidelines. Retrieved August 13, 2007 from http://www.iom.int/EN/PDF_Files/evaluation/Evaluation_Guidelines_2006_1.pdf


OECD/DAC (2006). DAC evaluation series: Guidance for managing joint evaluations. Retrieved February 7, 2008 from http://www.oecd.org/dataoecd/29/28/37512030.pdf

OECD/DAC (2002). OECD glossary of key terms in evaluation and results based management. Paris: OECD Publications.

Operations Evaluation Department (OED) (2005). Evaluation of World Bank's assistance to primary education. Retrieved August 13, 2007 from http://www.worldbank.org/oed/education/evaluation_design.html

Porteous, Nancy L., Barbara J. Sheldrick, and Paula J. Stewart (1999). "Enhancing managers' evaluation capacity: A case study for Ontario public health." The Canadian journal of program evaluation, special issue, pp. 137-154.

Royal Danish Ministry of Foreign Affairs (2006). Evaluation guidelines. Copenhagen: DANIDA.

Schaumberg-Müller, Henrik (1996). Evaluation capacity building: Donor support and experiences. Copenhagen. Retrieved August 13, 2007 from http://www.oecd.org/dataoecd/20/52/16546669.pdf

United Nations Population Fund (UNFPA) (2005). State of world population 2005. Retrieved August 14, 2007 from http://www.unfpa.org/swp/2005/english/notes/index.htm

WASTE (2005). Thematic evaluation on child labour in scavenging: Africa, Asia, and Europe assessment. Retrieved February 8, 2008 from http://www.waste.nl/page/720

The World Bank (2001). Engendering development: Through gender equality in rights, resources, and voice. A copublication of the World Bank and Oxford University Press.

The World Bank (2007). PRSP sourcebook: Gender. Retrieved August 13, 2007 from http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTPOVERTY/EXTPRS/0,,contentMDK:20175742~pagePK:210058~piPK:210062~theSitePK:384201,00.html

The World Bank (2007a). Poverty monitoring systems. Retrieved August 13, 2007 from http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTPOVERTY/EXTPAME/0,,contentMDK:20203848~menuPK:435494~pagePK:148956~piPK:216618~theSitePK:384263,00.html

The World Bank Group (2007). Core welfare indicators questionnaire (CWIQ). Retrieved August 13, 2007 from http://www4.worldbank.org/afr/stats/cwiq.cfm


Web Sites

AdePT software: Making poverty analysis easier and faster. http://econ.worldbank.org/programs/poverty/adept

DAC Network on Development Evaluation. Workshop on joint evaluations: Challenging the conventional wisdom – the view from developing country partners, Nairobi, April 20-21, 2005, workshop report. http://www.oecd.org/dataoecd/20/44/34981186.pdf

DANIDA, Royal Danish Ministry of Foreign Affairs. Evaluation guidelines. http://www.um.dk/en/menu/DevelopmentPolicy/Evaluations/Guidelines/

Gesellschaft für Technische Zusammenarbeit (GTZ). National monitoring of sustainable poverty reduction strategy papers (PRSPs). http://siteresources.worldbank.org/INTISPMA/Resources/Training-Events-and-Materials/summary_MainReport.pdf

IEG. Impact evaluations: Bangladesh maternal and child health. http://www.worldbank.org/oed/ie/bangladesh_ie.html

OECD/DAC (2006). DAC evaluation series: Guidance for managing joint evaluations. http://www.oecd.org/dataoecd/29/28/37512030.pdf

Office of the Inspector General, International Organization for Migration (IOM) (2006). IOM evaluation guidelines. http://www.iom.int/EN/PDF_Files/evaluation/Evaluation_Guidelines_2006_1.pdf

Operations Evaluation Department (OED). Evaluation of World Bank's assistance to primary education. http://www.worldbank.org/oed/education/evaluation_design.html

Mackay, Keith (2007). How to build M&E systems to support better government. http://www.worldbank.org/ieg/ecd/docs/How_to_build_ME_gov.pdf

Mackay, Keith (1999). Evaluation capacity development: A diagnostic guide and action framework. ECD working paper series no. 6, January 1999, pp. 2-3. http://lnweb18.worldbank.org/oed/oeddoclib.nsf/a4dd58e444f7c61185256808006a0008/7f2c924e183380c5852567fc00556470?OpenDocument


Porteous, Nancy L., Barbara J. Sheldrick, and Paula J. Stewart (1999). "Enhancing managers' evaluation capacity: A case study for Ontario public health." The Canadian journal of program evaluation, special issue, pp. 137-154. http://www.phac-aspc.gc.ca/phppsp/pdf/toolkit/enhancing_managers_evaluation%20_capacity%20_CJPE_1999.pdf

PovertyNet: http://www.worldbank.org/poverty

Royal Danish Ministry of Foreign Affairs. Evaluation guidelines. http://www.um.dk/en/menu/DevelopmentPolicy/Evaluations/Guidelines/ or http://www.um.dk/NR/rdonlyres/4BA486C7-994F-4C45-A084-085D42B0C70E/0/Guidelines2006.pdf

Schaumberg-Müller, Henrik (1996). Evaluation capacity building: Donor support and experiences. Copenhagen. http://www.oecd.org/dataoecd/20/52/16546669.pdf

Stufflebeam, D. L. (1999). Evaluation plans and operations checklist. http://www.wmich.edu/evalctr/checklists/plans_operations.htm

Stufflebeam, D. L. (2001). Guiding principles checklist. http://www.wmich.edu/evalctr/checklists/guiding_principles.pdf

The World Bank. Poverty monitoring systems. http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTPOVERTY/EXTPAME/0,,contentMDK:20203848~menuPK:435494~pagePK:148956~piPK:216618~theSitePK:384263,00.html

The World Bank (2001). The PRSP sourcebook. http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTPOVERTY/EXTPRS/0,,menuPK:384207~pagePK:149018~piPK:149093~theSitePK:384201,00.html

World Bank. Core welfare indicators questionnaire (CWIQ). http://www4.worldbank.org/afr/stats/cwiq.cfm


Acting Professionally

"Action should culminate in wisdom."
BHAGAVAD GITA

Chapter 14: Guiding the Evaluator: Evaluation Ethics, Politics, Standards, and Guiding Principles

• Ethical Behavior
• Politics and Evaluation
• Evaluation Standards and Guiding Principles


Chapter 14
Guiding the Evaluator: Evaluation Ethics, Politics, Standards, and Guiding Principles

Introduction

As we have seen in prior chapters, evaluators have many tasks, including planning, organizing, designing, collecting data, analyzing data, and presenting data. They also have to deal with internal and external pressures. They might be asked to make changes to the plan, organization, or reporting of the evaluation to meet the needs of others. Sometimes these proposed modifications are not a problem, and may even be welcome; at other times they may raise ethical and/or political considerations. Ethics and politics are an issue for all evaluators, and especially for those working in countries with poor governance and a history of corruption. But internal pressures in any development organization can also raise ethical issues. This chapter discusses ethical issues and political considerations in evaluations.

This chapter has three parts:

• Ethical Behavior
• Politics and Evaluation
• Evaluation Standards and Guiding Principles.


Part I: Ethical Behavior

Evaluators often face difficult situations in which the right thing to do is not clear. Ethics are a set of values and beliefs that guide choices. Ethics are complicated; no laws or standards can cover every possible situation. Behavior can be legal but still unethical (e.g., accepting a small gift from those you are about to evaluate, or changing the tone of a report to make it more positive or negative despite the strength of the evidence behind it). There are many gray areas. Still, evaluators are expected to conduct evaluations ethically.

Evaluation Corruptibility and Fallacies

Worthen, Fitzpatrick, and Sanders (2004, pp. 423-424) present five forms of "evaluation corruptibility," by which they mean the ways that evaluators may be convinced to go against ethical standards. They describe the following forms, based on ethical compromises or distortions:

• a willingness to twist the truth and produce positive findings, due to conflict of interest or other perceived payoffs or penalties (such willingness may be conscious or unconscious)
• an intrusion of unsubstantiated opinions because of sloppy, capricious, and unprofessional evaluation practices
• "shaded" evaluation "findings" as a result of the intrusion of the evaluator's personal prejudices or preconceived notions
• obtaining the cooperation of clients or participants by making promises that cannot be kept
• failure to honor commitments that could have been honored.

When looking at these five forms of corruptibility, we see that evaluators may behave in unprofessional ways.


E. R. House (1995, p. 29) looks at corruptibility from a slightly different perspective. He suggests that evaluators may simply misunderstand their responsibilities. He calls these misunderstandings evaluation fallacies and identifies five of them:

• clientism: the fallacy that doing whatever the client requests, or whatever will benefit the client, is ethically correct
• contractualism: the fallacy that the evaluator must follow the written contract without question, even if doing so is detrimental to the public good
• methodologicalism: the belief that following acceptable inquiry methods ensures that the behavior of the evaluator will be ethical, even when some methodologies may actually compound the evaluator's ethical dilemmas
• relativism: the fallacy that the opinion data the evaluator collects from various participants must all be given equal weight, as if there were no basis for appropriately giving the opinions of peripheral groups less priority than those of more pivotal groups
• pluralism/elitism: the fallacy of giving powerful voices higher priority because the evaluator feels they hold more prestige and potency than the powerless or voiceless.

When looking at the five fallacies, we see that evaluators may have the best intentions for doing what is right, correct, or ethical, but may have a misunderstanding about their roles and/or responsibilities.


Identifying Ethical Problems

Morris and Cohn (1993, pp. 621-642) surveyed the members of the American Evaluation Association about their views on ethical issues. The following list of ethical problems is adapted from their survey:

• Main client problems:
  − Prior to the evaluation taking place, the client has already decided what the findings "should be" or plans to use the findings in an ethically questionable fashion.
  − The client declares certain research questions "off-limits" in the evaluation, despite their substantive relevance.
  − Findings are deliberately modified by the client prior to release.
  − The evaluator is pressured by the client to alter the presentation of findings.
  − Findings are suppressed or ignored by the client.
  − The evaluator is pressured by the client to violate confidentiality.
  − Unspecified misuse of findings by the client.
  − Legitimate stakeholders are omitted from the planning process.

• Other problems:
  − The evaluator discovers behavior that is illegal, unethical, or dangerous.
  − The evaluator is reluctant to present findings fully, for unspecified reasons.
  − The evaluator is unsure of his or her ability to be objective or fair in presenting findings.
  − Although not pressured by the client or stakeholders to violate confidentiality, the evaluator is concerned that reporting certain findings could represent such a violation.
  − Findings are used as evidence against someone.


If evaluations are to be useful to managers, development organizations, participants, and citizens, the work must be honest, objective, and fair. It is the evaluator's job to ensure that the data are collected accurately and that they are analyzed and reported honestly and fairly. It is not surprising that some people may try to influence the way information is presented or the recommendations that are made. While most evaluators would quickly recognize a bribe, subtler forms of influence are not always easy to recognize. Offers of friendship, dinner, or recreational activities can be a kindly gesture to someone who is a long way from home; they can also be an attempt to influence the evaluator's perspective and, ultimately, the report.

Influence at the beginning of an evaluation may be subtle. Sometimes there is pressure to avoid asking certain kinds of evaluation questions or to steer the evaluation onto less sensitive ground ("That question is not the important one to ask."). Certain issues that might reflect negatively on the organization or the program may not be brought up, or the message may be: "We know, and you know, that we have a problem to fix, and we have already started corrective action, but we do not need to make this public and jeopardize the program's support." There may be resistance to surveying staff, program participants, or citizens because sensitive (negative) issues might be revealed. In other situations, particular people may be excluded from meetings or interviews, or field trips may be limited because of "time constraints". The evaluator should strive to raise the issues that are being avoided, avoid being co-opted, and make sure that all points of view are heard or considered.

Sometimes someone provides the evaluator with leads about corruption or fraud. The evaluator has to sort out whether this information is plausible, an attempt to direct attention away from other issues, or an attempt by the informant to get even with someone.

The motto "Do No Harm" certainly applies to evaluation. Except when there is an issue of fraud or abuse, evaluations should not harm participants. People who participate should never be identified or placed in threatening situations. Protecting confidentiality is essential, but there may be situations in which it is difficult.


For instance, evaluators of an education program are told by several interviewees that the director is allegedly spending program money for personal benefit. What should they do? Or they are told that the director seems to be sexually harassing a staff member. In either case, revealing these findings runs the risk of exposing those who reported the behavior in confidence. On the other hand, the alleged behavior is illegal in the first instance and, at a minimum, misconduct in the second.

In most development organizations, procedures exist for reporting misconduct or fraud. The World Bank Group, for example, has a Department of Institutional Integrity to investigate allegations of fraud and corruption in Bank Group operations, as well as allegations of staff misconduct. Often there is a "hotline" for reporting misconduct, fraud, or abuse. The evaluation team must be familiar with its organization's policies and standards for handling allegations of fraud, abuse, and misconduct. These may be part of a staff manual, a special brochure, or a contract of employment. The evaluation team needs to keep in mind that it is bound by the organization's policies and standards. Further, evaluators are not trained as special investigators. They should not proceed with an investigation to determine whether the allegations are true, decide on their own whether the behavior constitutes fraud or abuse, or decide what should be done about it.

It is recommended that evaluators keep a written record of the off-the-record information and their responses to it until deciding what to do, and that this record be kept separate from the evaluation material. It may be helpful to talk with a supervisor or manager (not involved in the allegation) about the situation and the options. Note also that in some countries (for example, the United States) sexual harassment is a crime, and an evaluator who is made aware of it and does nothing to report it may be legally liable.


Part II: Politics and Evaluation
Evaluation is always carried out in a political context, for some purpose, and for some person or position; it should therefore be considered a political act. Webster defines the word "politic" as "characterized by shrewdness in managing, contriving, or dealing". Here, we are not talking about government politics but about politics in the sense of the behavior that occurs when at least one party in a relationship perceives a conflict (Tassie et al., 1996, pp. 347-363).

Politics can undermine the integrity of an evaluation and can determine the extent to which, and how, an evaluation is used. Evaluations are an important source of information for those who make decisions about projects, programs, and policies, but not the only source. A positive evaluation can help secure more funds and build careers for those involved in the intervention. Evaluations that identify serious problems can improve interventions and future results, but they may also lead to reduced budgets or to programs not being renewed.

It is important to recognize the political nature of evaluation. Is it possible to identify and manage politics in an evaluation? The following sections address:

causes of politics in evaluation



identifying political “games”



managing politics in evaluation



balancing stakeholders.

Causes of Politics in Evaluation
It is often said that "knowledge is power". Evaluation is a form of organizational knowledge, so power struggles over definitions of reality are inherent in the evaluation process. These struggles cause politics in evaluation. Murray (2002, pp. 2-3) identifies the reason politics are inevitable in evaluation: there is so much room for subjectivity. That subjectivity leads to differences among the people involved in the evaluation. Evaluators gather perceptions of reality from stakeholders and from those being evaluated. These perceptions may differ, often causing disagreements at different stages of the evaluation and thus giving rise to political behavior.


Murray attributes the basis for the disagreements to "inherent problems with technical elements of evaluation methods and very common frailties in many human beings". He identifies the following points at which disagreements occur:

What is the purpose of the evaluation?



What will be considered a success or failure?



So what? How will the information be used in subsequent decision-making?

Murray also gives a good description of some of the minor weaknesses in evaluations that can have a political effect. He classifies them as technical weaknesses and human weaknesses.

Technical Weaknesses
Most evaluations work best when performance is measured against stated goals, objectives, and standards. But evaluators, clients, and other stakeholders may find it difficult to agree on what to measure. In addition, it can be difficult to determine the focus of the evaluation.

As we have learned, a good evaluation will identify the theory of change and the underlying assumptions about the program being evaluated. A "logic model" may be developed to show the "if this..., then this... will result" causal chain of reasoning. It may further list the key assumptions underlying the program. If developed with program stakeholders, the logic model can ensure a common understanding of the program's goal and objectives, its activities, and the results for which the program should be accountable. Thus, the theory of change or logic model can help resolve potential conflicts about how the program is understood before they become a political problem.

Murray identifies a second common technical problem that leads to political problems: measuring one level of an organization but generalizing about another. This causes problems when the underlying assumptions linking the performance of individuals, programs, or functions to the organization as a whole have not been worked out. Again, a theory of change model is one way to help identify these underlying assumptions (Murray, 2002, p. 4).
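To make the "if this..., then this..." chain concrete, the following is a minimal sketch of how a logic model might be recorded as a simple data structure. It is not drawn from the text: the LogicModel class, its element names, and the agricultural example entries are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LogicModel:
    """A minimal 'if this..., then this...' causal chain for one program."""
    inputs: List[str] = field(default_factory=list)       # resources the program uses
    activities: List[str] = field(default_factory=list)   # what the program does
    outputs: List[str] = field(default_factory=list)      # direct products of activities
    outcomes: List[str] = field(default_factory=list)     # short- and medium-term changes
    impacts: List[str] = field(default_factory=list)      # long-term results
    assumptions: List[str] = field(default_factory=list)  # conditions the chain depends on

    def causal_chain(self) -> str:
        """Render the chain as a readable 'inputs -> activities -> ...' statement."""
        stages = [
            ("inputs", self.inputs),
            ("activities", self.activities),
            ("outputs", self.outputs),
            ("outcomes", self.outcomes),
            ("impacts", self.impacts),
        ]
        parts = [f"{name}: {', '.join(items)}" for name, items in stages if items]
        return " -> ".join(parts)

# Hypothetical agricultural-program example (illustrative only).
model = LogicModel(
    inputs=["irrigation funds", "extension staff"],
    activities=["build irrigation systems", "train farmers"],
    outputs=["hectares irrigated", "farmers trained"],
    outcomes=["higher crop yields"],
    impacts=["increased household income"],
    assumptions=["rainfall stays within normal range", "trained farmers remain in the area"],
)
print(model.causal_chain())
print("Key assumptions:", "; ".join(model.assumptions))
```

Writing the chain and its assumptions down in even this crude form forces stakeholders to state what they expect to lead to what, which is precisely the shared understanding the text argues can defuse later political conflict.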


Human Weaknesses
As humans, we often act in our own self-interest, and this self-interest underlies several psychological tendencies. Cutt and Murray (2000) identify three human factors that can affect politics:

“Look-Good-Avoid-Blame” (LGAB) mindset



“Subjective Interpretation of Reality” (SIR) phenomenon



trust factors.

The look-good-avoid-blame (LGAB) mindset identifies a common human characteristic: people want to succeed and to avoid being associated with failure. Most evaluations are intended to reveal successes where they exist, but also to reveal problems and to provide information that helps address them. However, people believe that someone will be blamed for the problems that are identified, and they do not want to be the one blamed. Of course, if the evaluation finds positive results, they want to take credit for them. Whenever an LGAB situation occurs, it is likely to make the situation a political one. People will focus on what makes them look good. If there are negative outcomes, people will go to great lengths to explain the results as "beyond their control". Alternatively, they may challenge the evaluation's scope and approach (one reason why up-front agreements on evaluation design are so important).

The subjective interpretation of reality (SIR) phenomenon arises during the interpretation and explanation of evaluation data. Any time we look at human behavior, there are multiple variables and little control over them, and any human behavior may be explained by different theories. Thus two people witnessing the same event may describe it as "a teacher losing control of a classroom" or "a teacher who has fully engaged a class on an issue". Evaluators have pre-existing beliefs and attitudes about what works, which means that evaluation results may be mixed with subjective interpretations based on those beliefs and attitudes. For example, the evaluator who believes that a good teacher is one who has children in their seats, speaking one at a time when called upon, and who maintains discipline may be more likely to see the event as chaos or loss of control. The evaluator who has his or her own child in an "open classroom" may be more likely to see the same event as an example of a teacher engaging students.


The SIR phenomenon is one of the reasons evaluators need to do literature reviews, to learn what works and what the issues have been in similar programs. Another way to lessen the bias of subjective interpretation is to use multiple data collection methods, such as questionnaires, interviews, and structured observations, with multiple sources, as a triangulation strategy.

The third factor identified by Cutt and Murray is the trust factor, which can trigger the LGAB or SIR factors. Trust is the belief in the integrity or ability of a person. If people feel another person lacks integrity or ability, they may mistrust that person and fear that he or she can do them harm. Trust comes in degrees, varying from partial trust (only in certain contexts or about certain matters) to full trust (in all things). When distrust occurs, it is likely that the LGAB or SIR phenomenon will bring politics into the relationship. Keep in mind when reading the following sections that these biases may be unconscious: the people displaying them may not be aware of doing so.
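As a concrete illustration of triangulation, the sketch below (not part of the original text; the triangulate function, the min_sources threshold, and the example findings are illustrative assumptions) cross-checks findings gathered through different methods and flags those supported by only a single source.

```python
from typing import Dict, List

def triangulate(findings_by_source: Dict[str, List[str]], min_sources: int = 2) -> Dict[str, List[str]]:
    """Group findings by how many independent sources report them.

    Returns 'corroborated' findings (reported by at least `min_sources`
    sources) and 'uncorroborated' findings (reported by fewer).
    """
    counts: Dict[str, int] = {}
    for findings in findings_by_source.values():
        for finding in set(findings):  # count each source at most once per finding
            counts[finding] = counts.get(finding, 0) + 1

    corroborated = sorted(f for f, n in counts.items() if n >= min_sources)
    uncorroborated = sorted(f for f, n in counts.items() if n < min_sources)
    return {"corroborated": corroborated, "uncorroborated": uncorroborated}

# Hypothetical inputs from three data collection methods (illustrative only).
result = triangulate({
    "questionnaire": ["training sessions well attended", "equipment arrives late"],
    "interviews": ["equipment arrives late", "extension staff overstretched"],
    "observation": ["equipment arrives late"],
})
print(result)
```

Findings that only one method or informant supports are not necessarily wrong, but they are the ones most likely to reflect a single subjective interpretation and therefore deserve further probing.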

Identifying Political Games
It is impossible to keep evaluation completely separate from politics, and evaluators should not assume they can; the issue is always how to manage whatever political situation evaluators find themselves in. But there are ways evaluators can take some control over political situations. We have described many of them in these chapters, for example, involving those being evaluated in identifying the questions the evaluation will ask. Murray (2002) suggests that a first step in managing political gaming in evaluations is to classify the games by the role of the people involved:




people being evaluated games



evaluator games



other stakeholder games.


Political Games of People Being Evaluated
Often, people being evaluated want to avoid unwanted formal scrutiny of their activities. They may respond by:

denying the need for an evaluation (questioning who is asking for it)



claiming the evaluation will take too much time away from their normal workload



claiming evaluation is a good thing but introducing delaying tactics (for example, the timing is not good, or they have just surveyed themselves)



seeking to form close personal relationships with the evaluators in order to win their trust.

Once the evaluation has begun and data are being collected, the people being evaluated may play political games by:

omitting or distorting the information they are asked to provide so they do not look bad



giving the evaluator(s) huge amounts of information so they have difficulty sorting out what is relevant and what is not (can be called a “snow job”)



coming up with new data towards the end of the evaluation.

Once the data are collected and the evaluators begin to interpret them and what they mean, people being evaluated may respond by:

denying the problem exists



downplaying the importance of the problem, claiming they already knew about it and are implementing changes, or attributing it to others or to forces beyond their control



arguing that the information is now irrelevant because things have changed.

Political Games of Other Stakeholders
Other stakeholders may also affect the politics of an evaluation. Different stakeholders have different agendas and concerns, and they play many of the political games used by those being evaluated. If the stakeholders were not involved in identifying the evaluation's major questions, they may conclude that the evaluation looked at the wrong things. They may also try to get others, such as the media, to criticize the organization and to argue that the evaluation should have been done differently.


Political Games of Evaluators
Evaluators can also play evaluation "games". Some of the games evaluators play during the design of the evaluation are:

insisting that evaluations be quantitative (statistics don’t lie)



using the “experts know best” line (i.e. evaluators do not trust those being evaluated).

During data collection, some evaluators may subvert the process by collecting their own information "off the record" without defining what that means. This informal information can then enter into the interpretation phase of the evaluation. Most evaluator game playing occurs during the interpretation phase. Some of the games evaluators play during interpretation are:

not stating, or else shifting, the measurement standards



applying unstated criteria to decision-making



applying unstated values and ideological filters to the data interpretation – such as deciding that one source of data is not to be trusted



ignoring data that do not conform to the evaluator’s conclusion.

Managing Politics in Evaluations
Since politics in evaluation is inevitable, it is important to learn how to manage it. Trust is a large part of politics, and throughout the entire evaluation process the evaluator should be building trust. Ideally, during each phase of an evaluation there would be open discussions giving all players involved a chance to air their concerns and, at the least, to agree to disagree about their differences. They would use logic models and standards to discuss the evaluation and see where everyone stands on the important issues (Murray, 2002, pp. 8-10).

Building Trust
If trust does not exist, the different players become more concerned with their own interests and try to win the political games against those they consider their opponents.


How do you build trust? It usually takes time and many encounters among all of the players. Murray suggests building trust consciously by involving all interested parties in the process, particularly those who are to be evaluated. He identifies six questions in whose answers all parties must have a voice:

What is the purpose of the evaluation?



What should be measured?



What evaluation methods should be used?



What standards or criteria should be applied to the analysis of the information obtained?



How should the data be interpreted?



How will the evaluation be used?

This approach will help with many evaluations. In some, the information gathered may simply be treated as input for the evaluation team's decisions. But if the evaluation involves a strong suspicion of malfeasance, those being evaluated may consciously suppress or distort information; in such cases, the likely solution is an independent external evaluation. Recall that building trust was covered in more detail in Chapter 12, Managing for Quality and Use, under the topic "Working with Groups of Stakeholders".

Building Theory of Change or Logic Models
Murray discusses the importance of making sure all parties involved in the evaluation fully understand the underlying logic. He suggests that the theory of change or logic model is one way to articulate that logic so there is little room for misunderstanding. He views the six evaluation questions as key to providing the information needed to build the theory of change or logic model.


Balancing Stakeholders with Negotiation
One of the biggest challenges for evaluators is dealing with multiple stakeholders. Evaluators need strong negotiating skills to manage multiple stakeholders' interests and often competing agendas. Anne Markiewicz (2005, pp. 13-21) describes a negotiation model for evaluation in two parts: principles and practice.

Principles for Negotiating Evaluation
The following is adapted from Markiewicz's (2005) list of principles for negotiating evaluations:

recognize the inherently political nature of evaluation



value the contribution of multiple stakeholders



assess stakeholder positions and plan the evaluation



ensure that the evaluator is an active player within the stakeholder community



develop the skills of the evaluator as negotiator responding to conflict



develop skills in managing conflict with multiple stakeholders.

Recall the earlier discussion of the political nature of evaluation and the value of the contribution of multiple stakeholders. Markiewicz (2005) suggests that one key strategy is to organize the stakeholders into reference groups, steering committees, or advisory committees to oversee the evaluation process. Note that this moves toward the "evaluator as facilitator" model; it is not as applicable to independent evaluations in which accountability is a major purpose. It is important that these groups have clearly defined roles and functions, and the reference group needs ground rules defining how active its members are to be in the evaluation process.

According to Markiewicz (2005), once the evaluator establishes a level of credibility and acceptance with the stakeholders, the evaluator needs skills to negotiate areas of conflict or dispute among them. The evaluator needs to act as a catalyst to assist stakeholders in arriving at their own solutions. To do this, the evaluator needs strong communication skills, including active and reflective listening, asking appropriate questions, and checking understanding. The evaluator also needs to keep the negotiation process focused, as well as to facilitate and encourage interaction among all stakeholders.


Evaluators need to develop negotiating skills, which many do not have. In some cases, evaluators may need to arrange for additional training and practice in negotiation (the evaluator-as-facilitator model). Another way for evaluators to develop negotiating skills is to work with peers to share experiences of conflict resolution, both successful and unsuccessful. Michael Q. Patton (1997, pp. 355-357) suggests that a minimum of four meetings take place, with longer-term projects requiring more. During the meetings, the group would consider the following:

first meeting: focus of the evaluation



second meeting: methods and measurement tools



third meeting: instrumentation developed prior to data collection



fourth meeting: review the emergent data to find agreement on interpretations which will lead to findings.

Markiewicz (2005) discusses the active role evaluators should play with stakeholders. Two characteristics she describes as valuable are responsiveness and flexibility, which enable the stakeholders to engage in the process. She also discusses the difficulties that arise if the evaluator becomes too close to the stakeholders or has too much interpersonal interaction with them. Patton suggests remaining focused on the empirical process and assisting stakeholders to do so as well; this helps keep relationships objective and avoids the intrusion of bias or the misuse of findings.

Negotiation Evaluation Practice
Markiewicz (2005) identifies three stages in a model for evaluation negotiation. The model includes:

initial stage: positions are put on the table



middle stage: active negotiation



last stage: steps are taken to reach consensus.

To use this model, the evaluation negotiator needs a range of skills that are both empathetic and assertive. The empathetic skills create a climate that is conducive to the negotiation process, while the assertive skills provide structure to the process. This is a delicate balance.


Empathy can be defined as "the process of demonstrating an accurate, non-judgmental understanding of the other side's needs, interests, and positions" (Mnookin, Peppet, and Tulumello, 1996, pp. 20-35). They suggest two components to empathy in negotiation:

The first component is to see the world through the eyes of the other; that is, evaluators put themselves in the place of the other person and try to see how that person feels.



The second component of empathy is to express the other person’s viewpoint. That is, to actually state it in words.

This technique involves translating the understanding of the other's experience into a shared response. Markiewicz (2005) believes empathy is an important characteristic for acquiring information about others' goals, values, and priorities. Empathy becomes the catalyst for inspiring openness in others and a persuasive tool in negotiating. Once the evaluator has a good understanding of the views of each stakeholder, he or she needs to paraphrase (restate) that understanding to the stakeholders (Hale, 1996, pp. 147-162), ask the parties whether the understanding is correct, and clarify any differences. Active and reflective listening help the evaluator attend to what is being said, ask appropriate questions, and check the understanding of what the stakeholders say.

Assertiveness is very different from empathy. Assertiveness is the ability to express and advocate for one's own needs, interests, and positions (Mnookin et al., 1996). In negotiating evaluations, it might also be described as facilitator authority. It can be difficult to balance empathy and assertiveness. Mnookin et al. (1996) see them as two interdependent dimensions of negotiation behavior; used together, they can produce substantial benefits in negotiation and bring a better understanding of the needs of stakeholders.


Part III: Evaluation Standards and Guiding Principles
Professional associations develop standards or guidelines to help their members make ethical decisions. Professional associations in Europe and in many countries, including the United States, Canada, and Australia, have established ethical codes for evaluators. Currently, the standards and principles developed by the American Evaluation Association (AEA) serve as the platform that other groups, such as the African Evaluation Association (AfrEA), modify and adapt to their local circumstances. The two AEA documents are:

Program Evaluation Standards



Guiding Principles for Evaluators.

The Joint Committee on Standards for Educational Evaluation developed the Program Evaluation Standards, which were designed to assist both evaluators and consumers in judging the quality of a particular evaluation. The American Evaluation Association developed the Guiding Principles for Evaluators to provide guidance for evaluators in their everyday practice. The biggest difference between these two documents is their purpose. The Standards are concerned with professional performance, while the Guiding Principles are concerned with professional values. The Standards focus on the product of the evaluation, while the Guiding Principles focus on the behavior of the evaluator. Both documents inform us about ethical and appropriate ways to conduct evaluations. The Web site addresses for accessing the Standards and the Guiding Principles are listed in the bibliography.


Program Evaluation Standards
The AEA Program Evaluation Standards are grouped into four categories:

• utility
• feasibility
• propriety, including:
− service orientation
− formal agreements
− rights of human subjects
− human interactions
− complete and fair assessment
− disclosure of findings
− conflict of interest
− fiscal responsibility
• accuracy.

To better understand the AEA Program Evaluation Standards, let us look more closely at each of the eight specific standards under propriety (American Evaluation Association Program Standards, 2005).




Service orientation: addresses the need for evaluators to serve not only the interests of the agency sponsoring the evaluation but also the learning needs of program participants, community, and society.



Formal agreements: includes such issues as following protocol, having access to data, clearly warning clients about the evaluation limitations, and not promising too much.



Rights of human subjects: includes such things as obtaining informed consent, maintaining rights to privacy, and assuring confidentiality.



Human interactions: an extension of the rights of human subjects standard. It holds that evaluators must respect human dignity and worth in all interactions, and that no participants in the evaluation should be humiliated or harmed.



Complete and fair assessment: this standard aims to ensure that both the strengths and weaknesses of a program are portrayed accurately. The evaluator needs to ensure that he or she does not “tilt” the study to satisfy the sponsor or appease other groups.


Disclosure of findings: deals with the evaluator's obligation to serve the broader public who benefit from both the program and its accurate evaluation, not just the clients or sponsors. Findings should be publicly disclosed.



Conflict of interest: evaluators must make their biases and values explicit in as open and honest a way as possible, so that clients are alert to the biases that may unwittingly creep into the work of even the most honest evaluators.



Fiscal responsibility: includes evaluators making sure all expenditures are appropriate, prudent, and well documented. It also covers the nontrivial costs imposed on the personnel of the program being evaluated, including the time and effort spent providing, collecting, or facilitating the collection of information requested by evaluators and the time and energy spent explaining the evaluation to various constituencies.

International Views of Evaluation Standards
The AEA has always maintained that the Program Evaluation Standards are uniquely American and may not be appropriate for use in other countries without adaptation. In 2000, the W. K. Kellogg Foundation funded a residency meeting of regional and national evaluation organizations in Barbados, West Indies. Several international evaluation organizations were represented. One of the issues discussed was the AEA Program Evaluation Standards and whether and how they relate to other countries. One result of the meeting was the publication of Occasional Papers discussing this issue (Russon, 2000, p. 1).

In the first Occasional Paper, Taut (2000, p. 6) begins by stating the assumption that the Program Evaluation Standards developed by the AEA are values based. She also states, based on a large body of research on cultural values, that values differ across cultures. In the paper she examines the theory behind the use of the AEA Standards in international settings; her study investigates how differing values influence the usability of the AEA Standards in societies with differing values. As evaluation becomes a stronger profession, standards are becoming an important issue. The process used to develop standards takes much effort and can be very costly, so it is logical for evaluators to use the AEA Standards as a model for their work (Joint Committee, 1994, pp. xvi-xviii; Stufflebeam, 1986, p. 2). Taut (2000, p. 7) summarizes the use of the AEA Standards by other countries in this way:


The Standards were translated into German, with minor adaptations; they were found to be transferable to the German cultural context.



The Swiss Evaluation Society developed Standards closely following the North American model – however, they did not specifically address the degree of transferability of the Standards.



The Standards were translated into Spanish; however, the degree of their transferability was not specifically addressed.



Of the nine African evaluators attending a focus group discussion at a UNICEF evaluation workshop, the majority found the Standards to be transferable if some modifications were made. In their eyes, the Standards are not laden with values incompatible with African values. During the discussion, the participants suggested adaptations of twelve Standards. A minority held that African evaluators should develop their own Standards.



The Standards were also used in Israel, Brazil, and Sweden where they concluded the Standards need at least some adaptation to fit the host culture.

She goes on to examine the values in the Standards and compares them to the cultural value dimensions identified in the cross-cultural literature (Taut, 2000, p. 8). The cultural value dimensions she identifies as the most used are:

individualism vs. collectivism



hierarchy vs. egalitarianism (or power distance)



conservatism vs. autonomy



mastery vs. harmony



uncertainty avoidance.

Additional, less common psychological concepts she uses to describe differences between cultures are:

direct vs. indirect communication



high-context vs. low-context



seniority.

In her summary, Taut states that "it seems that what is useful and ethical differs to a greater extent across cultures than what is feasible and accurate" (2000, p. 24). She later concludes that "it becomes clear that Propriety Standard issues are highly dependent on both political and cultural influences" (p. 24).


Finally, she concludes her paper with a recommendation for those evaluators who see the use of a set of standards as beneficial but do not feel the existing Standards meet their needs. She recommends that evaluators from cultures outside the United States describe their societies with regard to the cultural dimensions discussed in her paper, and that they consult with colleagues for their perceptions and with cultural experts to guide their analysis.

Guiding Principles for Evaluators
The American Evaluation Association strives to promote ethical practice in the evaluation of programs, personnel, and policy. Toward that end, the AEA developed the Guiding Principles to assist evaluators in their professional practice.

The Joint Committee on Standards for Educational Evaluation was founded in 1975 to develop standards for educational evaluation. Originally initiated by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education, the Joint Committee now includes many other organizations in its membership. The AEA is one of those organizations and has a representative on the Joint Committee. The Joint Committee has developed a set of standards for the evaluation of educational programs as well as for evaluating personnel.

The American Evaluation Association's (1995) Guiding Principles (listed below) contain many of the common elements now found in the various sets of ethical guidelines subsequently developed around the world:

Systematic inquiry: that evaluators conduct systematic, data-based inquiries



Competence: that evaluators provide competent performance to stakeholders



Integrity/honesty: that evaluators ensure the honesty and integrity of the entire evaluation process



Respect for people: that evaluators respect the security, dignity, and self-worth of respondents, program participants, clients, and other stakeholders with whom they interact



Responsibilities for general and public welfare: that evaluators articulate and take into account the diversity of interests and values that may be related to the general public welfare.


Further information about the Joint Committee's work and requests for reprints may be addressed to: The Joint Committee on Standards for Educational Evaluation, The Evaluation Center, Western Michigan University, Kalamazoo, MI 49008-5178, USA.

The AEA Ethics Committee oversaw a major review and update of the Principles in 2004 and subsequent vetting with the membership. The full version of the Guiding Principles is available online at http://www.eval.org/Publications/GuidingPrinciples.asp. An abbreviated version, in brochure form, is available for free downloading for use with clients, in the classroom, or in other professional venues. It is called the Guiding Principles for Evaluators. This publication and further information about the Guiding Principles and the Program Evaluation Standards can be found at the AEA website, www.eval.org.

The Australasian Evaluation Society has produced a similar set of ethical guidelines for evaluators, available on its website at http://www.aes.asn.au/content/ethics_guidelines.pdf. The Canadian Evaluation Society has established Guidelines for Ethical Conduct, available online at http://www.evaluationcanada.ca. The European Evaluation Society has yet to develop a set of guidelines or principles for evaluators, but the Swiss Evaluation Society (SEVAL) has standards available on its website, http://seval.ch/. The German Society for Evaluation (DeGEval) has also adopted a set of standards (http://www.degeval.de/standards/standards.htm). The Italian Evaluation Association has a set of guidelines comparable to the AEA Guiding Principles (see http://www.valutazioneitaliana.it/statuto.htm#Linee). The African Evaluation Association has draft Evaluation Standards and Guidelines at http://www.afrea.org/. The Asian Development Bank has contributed to good practice by articulating its own ethical standards.


Evaluation Ethics for the UN System
The United Nations has also addressed evaluation ethics in its Norms for Evaluation in the UN System (2005, p. 10). These include the following:

Evaluators must have personal and professional integrity.



Evaluators must respect the right of institutions and individuals to provide information in confidence and ensure that sensitive data cannot be traced to its source. Evaluators must take care that those involved in evaluations have a chance to examine the statements attributed to them.



Evaluators must be sensitive to beliefs, manners, and customs of the social and cultural environments in which they work.



In light of the United Nations Universal Declaration of Human Rights, evaluators must be sensitive to and address issues of discrimination and gender inequality.



Evaluations sometimes uncover evidence of wrongdoing. Such cases must be reported discreetly to the appropriate investigative body. Also, the evaluators are not expected to evaluate the personal performance of individuals and must balance an evaluation of management functions with due consideration for this principle.

The United Nations Evaluation Group (UNEG) has also established Standards for Evaluation in the UN System. These include standards concerning ethics, some of which relate to the norms discussed above. The UN standards for ethics are as follows:

Evaluators should be sensitive to beliefs, manners, and customs and act with integrity and honesty in their relationships with all stakeholders.



Evaluators should ensure that their contacts with individuals are characterized by the same respect with which they themselves would want to be treated.



Evaluators should protect the anonymity and confidentiality of individual informants.



Evaluators are responsible for their performance and their product(s).


Conflict of Interest
Explicit or implicit conflict of interest (see the AEA Program Evaluation Standards) is a major issue potentially affecting the credibility of evaluators and the soundness of the evaluation. Evaluators should self-attest, for each evaluation, whether they are free from conflict of interest or the appearance of conflict of interest. Some organizations have developed guidelines; one example is the Operations Evaluation Group of the Asian Development Bank, whose guidelines prohibit staff or consultants from evaluating work they were involved in. This is a good practice for evaluation units. The Operations Evaluation Group guidelines are available at http://www.adb.org/documents/guidelines/evaluation/independent-evaluation.pdf.
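As an illustration of what such a per-evaluation self-attestation might record, here is a minimal sketch. It is not drawn from the ADB guidelines or the AEA Standards; the ConflictOfInterestDeclaration class, its fields, and the clearance rule are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ConflictOfInterestDeclaration:
    """A minimal self-attestation record an evaluation unit might keep on file (hypothetical)."""
    evaluator: str
    evaluation: str
    prior_involvement_in_intervention: bool   # e.g., helped design or implement the work
    financial_or_personal_interest: bool      # e.g., family ties, consulting income
    notes: List[str] = field(default_factory=list)

    def is_clear(self) -> bool:
        """True only if no actual or apparent conflict is declared."""
        return not (self.prior_involvement_in_intervention or self.financial_or_personal_interest)

# Hypothetical declaration (illustrative only).
declaration = ConflictOfInterestDeclaration(
    evaluator="R. Ortiz",
    evaluation="Agricultural irrigation program, 2008",
    prior_involvement_in_intervention=False,
    financial_or_personal_interest=False,
)
print("Clear to proceed:", declaration.is_clear())
```

In practice an evaluation unit would define the declared items and the clearance rule itself; the point is simply that the attestation is recorded per evaluator and per evaluation, before the work begins.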

Summary
Ethics are a set of values and beliefs that guide choices. What causes people to go against their ethical standards? They may have a conflict of interest, engage in sloppy or unprofessional evaluation practices, allow personal prejudices to intrude, make promises they cannot keep, or fail to honor commitments. Misunderstandings can also cause unethical behavior, including clientism, contractualism, methodologicalism, relativism, and pluralism/elitism. Good communication skills are essential for understanding what is expected and what stakeholders and others on the evaluation team understand; this is a good start toward ethical behavior. The United Nations, the American Evaluation Association, and other agencies and associations have developed standards, guidelines, and norms to help define and measure quality and ethics.


Chapter 14 Activities

Application Exercise 14.1: Ethics: Rosa and Agricultural Evaluation
Instructions: Imagine that Rosa calls you for advice and tells you the following story. What are the major ethical issues here, and how would you advise Rosa to address them?

Rosa met with local officials, program officials, and landowners to brief them on the upcoming evaluation of the agricultural program. Over the years, the community has received a substantial amount of money to build irrigation systems, buy fertilizer, build roads, and purchase equipment. This was Rosa's first visit, but the local team member, Eduardo, had visited the area several times and knew many of the landowners. He suggested that they all go out to dinner after the presentation to begin to build rapport.

During the dinner, Rosa listened to the conversation between Eduardo and the landowners. The landowners appeared to have a close relationship with Eduardo, presenting him with a box of cigars. They discussed the needs of the area; the landowners felt that they needed more resources to use the land effectively. They wanted to bring in more equipment to replace some of the farm workers. In addition, they wanted to use more fertilizer but were prohibited from doing so by environmental laws. Eduardo agreed and told them the upcoming evaluation could help, because it could recommend that they be given an exception.

The dinner ended with an invitation for Rosa to join one of the landowners for a tour of the area, followed by lunch with his family. Rosa felt it would be rude not to accept and made plans to meet the next day. She briefly spoke with Eduardo after the dinner and asked why he had agreed with the landowner. Eduardo said that he felt it would make the landowners more cooperative if they felt they would get something positive from the evaluation.

During the tour the next day, the landowner explained how hard they had worked and the progress they had made against great odds. The landowner told Rosa that he counted on her to support their efforts; if there were a negative evaluation, he and his family could not survive. As a token of his appreciation, he gave her a necklace that he said had been in his family for generations.


After the tour and lunch with the landowner's family, Rosa met with the program manager. He had mapped out a schedule of whom she was to meet with during the three remaining days. He had also set up two community meetings; these included the landowners, several agricultural extension workers, several members of the business community who sell agricultural equipment and fertilizer, and several exporters of agricultural products. When Rosa asked why none of the farm workers or their families was included, she was told they had nothing of value to contribute to evaluating the effectiveness of the project. She asked whether there were others in the community she should talk to. She was told that the program manager had taken pains to make sure that all the right people were included so she would have an easy job in assessing the program.


Resources and Further Reading

American Evaluation Association Program Standards. (2005). Retrieved August 14, 2007 from: http://www.eval.org/EvaluationDocuments/progeval.html

Cutt, James, and Vic Murray (2000). Accountability and Effectiveness Evaluation in Nonprofit Organizations. London: Routledge.

Fitzpatrick, Jody L., James R. Sanders, and Blaine R. Worthen (2004). Program Evaluation: Alternative Approaches and Practical Guidelines. New York: Pearson Education Inc.

Hale, K. (1998). "The language of co-operation: Negotiation frames". Mediation Quarterly, Vol. 16, No. 2, pp. 147-162.

House, E. R. (1995). Principled evaluation: A critique of the AEA Guiding Principles. In W. R. Shadish, D. L. Newman, M. A. Scheirer, and C. Wye (eds.), Guiding Principles for Evaluators, New Directions for Program Evaluation, No. 66, pp. 27-34. San Francisco: Jossey-Bass.

Joint Committee on Standards for Educational Evaluation (1994). The Program Evaluation Standards. Thousand Oaks, CA: Sage.

Markiewicz, Anne (2005). "'A balancing act': Resolving multiple stakeholder interests in program evaluation". Evaluation Journal of Australasia, Vol. 4 (new series), Nos. 1 & 2, March/April 2005, pp. 13-21.

Mnookin, R., S. Peppet, and A. Tulumello (1996). "The tension between empathy and assertiveness". Negotiation Journal, Vol. 12, No. 3, pp. 20-35.

Molund, Stefan, and Göran Schill (2004). Looking Back, Moving Forward: SIDA Evaluation Manual. Stockholm: SIDA.

Morris, M., and R. Cohn (1993). "Program evaluators and ethical challenges: A national survey". Evaluation Review, Vol. 17, pp. 621-642.

Patton, M. Q. (1997). Utilization-Focused Evaluation: The New Century Text, 3rd ed. Thousand Oaks, CA: Sage Publications.

Russon, Craig (2000). The Program Evaluation Standards in International Settings. Retrieved February 12, 2008 from http://www.wmich.edu/evalctr/pubs/ops/ops17.pdf

Stufflebeam, D. L. (1986). Standards of practice for evaluators. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco, CA.


Tassie, A. W., V. V. Murray, J. Cutt, and D. Gragg (1996). "Rationality or politics: What really goes on when funders evaluate the performance of fundees?" Nonprofit and Voluntary Sector Quarterly, Vol. 25, No. 3, September 1996, pp. 347-363.

Taut, Sandy (2000). "Cross-cultural transferability of The Program Evaluation Standards". In The Program Evaluation Standards in International Settings, ed. Craig Russon. Retrieved February 12, 2008 from http://www.wmich.edu/evalctr/pubs/ops/ops17.pdf

Murray, Vic V. (2002). Evaluation Games: The Political Dimension in Evaluation and Accountability Relationships. Retrieved June 2006 from: http://www.vserp.ca/pub/CarletonEVALUATIONGAMES.pdf

United Nations Evaluation Group (UNEG) (2005). Norms for Evaluation in the UN System. Retrieved August 14, 2007 from: http://www.uneval.org/index.cfm?module=UNEG&Page=UNEGDocuments&LibraryID=96

Web Sites

African Evaluation Association: http://www.geocities.com/afreval/

American Evaluation Association: www.eval.org

AEA Guiding Principles: http://www.eval.org/Publications/GuidingPrinciples.asp

Canadian Evaluation Society: www.evaluationcanada.ca

DFID on SWaps: http://www.keysheets.org/red_7_swaps_rev.pdf

European Evaluation Society: www.europeanevaluation.org

Evaluation Center, Western Michigan University: http://www.wmich.edu/evalctr/

Government Organizations and NGOs: http://www.eval.org/Resources/govt_orgs_&_ngos.htm

Human Rights Education: www.hrea.org/pubs/EvaluationGuide/

The Institute of Internal Auditors: http://www.theiia.org

The International Organization of Supreme Audit Institutions: http://www.gao.gov/cghome/parwi/img4.html


"Linkages Between Audit and Evaluation in Canadian Federal Developments", Treasury Board of Canada: http://www.tbs-sct.gc.ca/pubs_pol/dcgpubs/TB_h4/evaluation03_e.asp

Monitoring and Evaluation of Population and Health Programs, MEASURE Evaluation Project, University of North Carolina at Chapel Hill: http://www.cpc.unc.edu/measure

Murray, Vic V. Evaluation Games: The Political Dimension in Evaluation and Accountability Relationships: http://www.vserp.ca/pub/CarletonEVALUATIONGAMES.pdf

National Aeronautics and Space Act of 1958: http://www.hq.nasa.gov/office/pao/History/spaceact.html

OECD DAC, Principles for Evaluation of Development Assistance: http://www.oecd.org/dataoecd/31/12/2755284.pdf

OECD, DAC Criteria for Evaluating Development Assistance: http://www.oecd.org/document/22/0,2340,en_2649_34435_2086550_1_1_1_1,00.html

Participatory Monitoring and Evaluation: Learning from Change, IDS Policy Briefing, Issue 12, November 1998: http://www.ids.ac.uk/ids/bookshop/briefs/brief12.html

Proposal for Sector-wide Approaches (SWap): http://enet.iadb.org/idbdocswebservices/idbdocsInternet/IADBPublicDoc.aspx?docnum=509733

UNFPA List of Evaluation Reports and Findings, United Nations Population Fund: http://www.unfpa.org/publications/index.cfm

United Nations Development Programme Evaluation Office: www.undp.org/eo/

United Nations Evaluation Group (UNEG), Norms for Evaluation in the UN System, and Standards for Evaluation in the UN System: http://www.uneval.org/index.cfm?module=UNEG&Page=UNEGDocuments&LibraryID=96

World Bank: www.worldbank.org

The World Bank Participation Sourcebook, online (HTML format): http://www.worldbank.org/wbi/sourcebook/sbhome.htm

