
Evaluation in Development: An Overview of Aims, Lexicon, & Emerging Trends

Mark A. Constas, Associate Professor, Charles H. Dyson School of Applied Economics & Management, Cornell University. Presented at the Workshop on Evaluation, Cornell International Institute for Food and Agriculture.



Presentation Transcript


  1. Evaluation in Development: An Overview of Aims, Lexicon, & Emerging Trends Mark A. Constas, Associate Professor Charles H. Dyson School of Applied Economics & Management Cornell University Presented at the Workshop on Evaluation Cornell International Institute for Food and Agriculture November 5, 2011

  2. Topics Addressed and Overall Aim: A quick and frenzied tour of evaluation aims, purposes, terminology, criteria, planning, and resources
  • I. Definitions/distinctions: monitoring & evaluation
  • II. Lexicon: the language of monitoring & evaluation
  • III. Factors that influence evaluation planning: evidence and scale
  • IV. Evaluation standards: qualities of evaluations
  • V. Trends: the current context of development; issues in development policy that have influenced or will influence evaluation; evaluation priorities/preferences & mandates
  • VI. Resources: publications, reports, and guides for evaluation
  • VII. Questions & discussion: during or after

  3. I. Distinctions between Monitoring & Evaluation
  • Monitoring focuses on how program resources are allocated and how an intervention is implemented.
  • Evaluation focuses on the results produced by those resources, processes, and implementation.

  4. I. Monitoring: Definition
  • Monitoring is a continuous function providing managers and key stakeholders with regular feedback on the consistency or discrepancy between planned and actual activities and programme performance and on the internal and external factors affecting results.
  • Monitoring provides an early indication of the likelihood that expected results will be attained. It provides an opportunity to validate the programme theory and logic and to make necessary changes in programme activities and approaches.
  • Information from systematic monitoring serves as a critical input to evaluation.
  From the UNDP Evaluation Policy (2011): http://www.undp.org/evaluation/documents/Evaluation-Policy.pdf

  5. I. Evaluation: Definition
  • Evaluation is a judgment made of the relevance, appropriateness, effectiveness, efficiency, impact and sustainability of development efforts, based on agreed criteria and benchmarks among key partners and stakeholders.
  • It involves a rigorous, systematic and objective process in the design, analysis and interpretation of information to answer specific questions.
  • It provides assessments of what works and why, highlights intended and unintended results, and provides strategic lessons to guide decision-makers and inform stakeholders.
  From the UNDP Evaluation Policy (2011): http://www.undp.org/evaluation/documents/Evaluation-Policy.pdf

  6. II. Key/Common Terms: My Top 30 Terms (the sequence of M&E events)
  • Programs/interventions (9): How are programs/interventions modeled?
  • Evaluation approaches (7): How are programs/interventions studied?
  • Evaluation designs (7): How are data collection/analysis events arranged over time, within/between samples, across indicators, and in relation to program participants?
  • Technical features (7): What are the technical features of evaluation that allow one to draw conclusions/make judgments about a program's operations/results?

  7. II. Key/Common Terms: Glossaries of Evaluation Terms
  • USAID: http://pdf.usaid.gov/pdf_docs/PNADO820.pdf
  • UNDP: http://www.un.org/Depts/oios/mecd/mecd_glossary/index.htm
  • Western Michigan University: http://ec.wmich.edu/glossary/glossaryList.htm

  8. II. Key/Common Terms: Programs/Interventions
  • Program - A set of interventions, activities or projects that are typically implemented by several parties over a specified period of time and may cut across sectors, themes and/or geographic areas.
  • Input - Resources provided for program implementation. Examples are money, staff, time, facilities, equipment, etc.
  • Output - The products, goods, and services which result from an intervention.
  • Outcome - A result or effect that is caused by or attributable to the project, program or policy. Outcome is often used to refer to more immediate and intended effects.
  • Impact - A result or effect that is caused by or attributable to a project or program. Impact is often used to refer to higher-level effects of a program that occur in the medium or long term, and can be intended or unintended and positive or negative.

  9. II. Key/Common Terms: Programs/Interventions (cont.)
  • Processes - The programmed, sequenced set of things actually done to carry out a program or project.
  • Logic model - A logic model, often a visual representation, provides a road map showing the sequence of related events connecting the need for a planned program with the program's desired outcomes and results.
  • Results framework - A management tool that presents the logic of a project or program in diagrammatic form. It links higher-level objectives to intermediate and lower-level objectives. …The results framework is used by managers to ensure that the overall program is logically sound and considers all the inputs, activities and processes needed to achieve the higher-level results.
  • Program theory - The arguments (ideally based in data) that justify causal connections between one part of an intervention/component and another (rel. knowledge of mechanism).
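  As a concrete illustration of the results-chain vocabulary above (an added sketch, not taken from the presentation), the short Python example below represents a logic model as a simple data structure; the example program and all of its inputs, activities, outputs, outcomes and impact are hypothetical.

```python
# Hypothetical sketch: a logic model's results chain as a simple data structure.
# Field names (inputs, activities, outputs, outcomes, impact) follow the
# definitions on slides 8-9; the example program is invented for illustration.
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogicModel:
    program: str
    inputs: List[str] = field(default_factory=list)      # resources provided
    activities: List[str] = field(default_factory=list)  # things actually done
    outputs: List[str] = field(default_factory=list)     # products/services delivered
    outcomes: List[str] = field(default_factory=list)    # more immediate effects
    impact: List[str] = field(default_factory=list)      # higher-level, longer-term effects

    def describe(self) -> str:
        """Render the results chain in order, from inputs to impact."""
        stages = ["inputs", "activities", "outputs", "outcomes", "impact"]
        lines = [f"Logic model for: {self.program}"]
        for stage in stages:
            items = ", ".join(getattr(self, stage)) or "(not yet specified)"
            lines.append(f"  {stage}: {items}")
        return "\n".join(lines)


if __name__ == "__main__":
    model = LogicModel(
        program="Improved seed distribution (hypothetical)",
        inputs=["funding", "extension staff", "seed stock"],
        activities=["train farmers", "distribute seed"],
        outputs=["farmers trained", "hectares planted with improved seed"],
        outcomes=["higher yields among participating households"],
        impact=["reduced rural poverty (medium/long term)"],
    )
    print(model.describe())
```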

  10. II. Key/Common Terms: General Types of Evaluation
  • Evaluability assessment - A study conducted to determine (a) whether the program is at a stage at which progress towards objectives is likely to be observable; (b) whether and how an evaluation would be useful to program managers and/or policy makers; and (c) the feasibility of conducting an evaluation.
  • Formative evaluation - An evaluation conducted during the course of project implementation with the aim of improving performance during the implementation phase.
  • Process evaluation - An assessment conducted during the implementation of a program to determine if the program is likely to reach its objectives, by assessing whether or not it is reaching its intended beneficiaries (coverage) and providing the intended services using appropriate means (processes).
  • Summative evaluation - Evaluation of an intervention or program in its later stages or after it has been completed, to (a) assess its impact, (b) identify the factors that affected its performance, (c) assess the sustainability of its results, and (d) draw lessons that may inform other interventions.

  11. II. Key/Common Terms: General Types of Evaluation (cont.)
  • Impact evaluation - A systematic study of the change that can be attributed to a particular intervention, such as a project, program or policy. Impact evaluations typically involve the collection of baseline data for both an intervention group and a comparison or control group, as well as a second round of data collection after the intervention, sometimes even years later.
  • Sector program evaluation - An evaluation of a cluster of interventions in a sector within one country or across countries, all of which contribute to the achievement of a specific goal.
  • Meta-evaluation - A systematic and objective assessment that aggregates findings and recommendations from a series of evaluations.
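  To make the baseline/follow-up logic of an impact evaluation concrete, the sketch below (not part of the original slides) simulates outcomes for a treatment and a comparison group before and after an intervention and computes a difference-in-differences estimate; the effect size, sample sizes, and outcome model are assumptions chosen purely for illustration.

```python
# Hypothetical sketch of an impact-evaluation calculation, assuming the design
# described on slide 11: baseline and follow-up data for a treatment group and
# a comparison group. The numbers and the difference-in-differences estimator
# are illustrative, not taken from the presentation.
import random

random.seed(0)

N = 500
TRUE_EFFECT = 5.0  # assumed program impact on the outcome, for simulation only


def mean(xs):
    return sum(xs) / len(xs)


def follow_up(baseline: float, treated: bool) -> float:
    """Follow-up outcome: baseline plus a common trend, plus the program effect, plus noise."""
    trend = 2.0  # change that would have happened anyway (the counterfactual trend)
    effect = TRUE_EFFECT if treated else 0.0
    return baseline + trend + effect + random.gauss(0, 1)


treat_base = [random.gauss(50, 5) for _ in range(N)]
comp_base = [random.gauss(50, 5) for _ in range(N)]
treat_follow = [follow_up(b, treated=True) for b in treat_base]
comp_follow = [follow_up(b, treated=False) for b in comp_base]

# Difference-in-differences: the change in the treatment group minus the change
# in the comparison group, which nets out the common trend.
did = (mean(treat_follow) - mean(treat_base)) - (mean(comp_follow) - mean(comp_base))
print(f"Estimated impact (difference-in-differences): {did:.2f}")
```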

  12. II. Key/Common Terms: Evaluation Designs
  • Participatory evaluation - An evaluation in which managers, implementing staff and beneficiaries work together to choose a research design, collect data, and report findings.
  • Experimental design - A methodology in which research subjects are randomly assigned to either a treatment or a control group, data are collected both before and after the intervention, and results for the treatment group are benchmarked against a counterfactual established by results from the control group (ref. RCTs, RFTs).
  • Quasi-experimental design - A methodology in which research subjects are assigned to treatment and comparison groups, typically through some sort of matching strategy that attempts to minimize the differences between the two groups in order to approximate random assignment.
  • Regression discontinuity - Comparison groups constructed using a cutoff-score method (e.g., high vs. low group).
  • Observational methods/secondary data analysis - Evaluations that rely on existing data sets to draw conclusions about the effects of an intervention/program (rel. secondary data).
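  As an illustration of the quasi-experimental matching idea above (my own sketch, not from the slides), the code below pairs each treated unit with the comparison unit whose baseline characteristic is closest and contrasts the matched estimate with a naive difference in means; the baseline variable, group sizes, and true effect are invented for the simulation.

```python
# Hypothetical sketch of quasi-experimental matching: treated households are
# simulated to have higher baseline values than comparison households, which is
# exactly the imbalance that matching tries to correct. All data are simulated.
import random

random.seed(2)

TRUE_EFFECT = 4.0  # assumed effect, used only to simulate outcomes

treated = [{"baseline": random.gauss(55, 5)} for _ in range(200)]
comparison = [{"baseline": random.gauss(50, 5)} for _ in range(800)]

for unit in treated:
    unit["outcome"] = unit["baseline"] + TRUE_EFFECT + random.gauss(0, 1)
for unit in comparison:
    unit["outcome"] = unit["baseline"] + random.gauss(0, 1)

# Naive comparison ignores the baseline imbalance and overstates the effect.
naive = (sum(u["outcome"] for u in treated) / len(treated)
         - sum(u["outcome"] for u in comparison) / len(comparison))

# Nearest-neighbour matching on the baseline characteristic.
diffs = []
for t in treated:
    match = min(comparison, key=lambda c: abs(c["baseline"] - t["baseline"]))
    diffs.append(t["outcome"] - match["outcome"])
matched = sum(diffs) / len(diffs)

print(f"Naive difference in means:  {naive:.2f}")
print(f"Matched estimate of effect: {matched:.2f}")
```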

  13. II. Key/Common Terms: Evaluation Designs (cont.)
  • Qualitative/interpretative - Relies primarily on data collection tools that produce textual (rather than quantitative) accounts of phenomena. A qualitative perspective also values localized, interpretative perspectives that may help explain the operations/successes/failures of programs.
  • Ethnographic - Similar to qualitative but employs culture as a construct through which the experiences/successes/failures of an intervention/program are understood.
  • Mixed methods evaluation - Use of both quantitative and qualitative methods of data collection in an evaluation (rel. triangulation).

  14. II. Key/Common Terms: Technical Features
  • Counterfactual - A hypothetical statement of what would have happened (or not) had the program not been implemented.
  • Sample - The method and rationale used to guide decisions about who/what is used as the target for data collection.
  • Cluster sampling - A sampling method conducted in two or more stages in which each unit is selected as part of some natural group rather than individually (such as all persons living in a state, city block, or a family).
  • Bias - Non-random/systematic error that may distort indicators, distributions, or estimates of an intervention's effect.
  • Internal validity - The degree to which conclusions about causal linkages are appropriately supported by the evidence collected.
  • External validity - The degree to which findings, conclusions, and recommendations produced by an evaluation are applicable to other settings and contexts.
  • Reliability - Consistency or dependability of data with reference to the quality of the instruments and procedures used. Data are reliable when the repeated use of the same instrument generates the same results.
  • Validity - The extent to which data measure what they purport to measure and the degree to which those data provide sufficient evidence for the conclusions made by an evaluation.
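  To illustrate the two-stage cluster sampling defined above (an added sketch, not from the presentation), the code below first samples natural groups (hypothetical villages) and then samples households within each selected village; the village and household identifiers and the sample sizes are invented.

```python
# Illustrative sketch of two-stage cluster sampling: select clusters (villages)
# first, then select households within each selected cluster. The sampling frame
# here is simulated; all labels and sizes are hypothetical.
import random

random.seed(3)

# Stage 0: a hypothetical sampling frame of villages, each with its households.
frame = {
    f"village_{v:02d}": [f"hh_{v:02d}_{h:03d}" for h in range(random.randint(40, 120))]
    for v in range(30)
}

# Stage 1: randomly select clusters (villages).
selected_villages = random.sample(list(frame), k=6)

# Stage 2: randomly select households within each selected village.
sample = []
for village in selected_villages:
    households = frame[village]
    sample.extend(random.sample(households, k=min(20, len(households))))

print(f"Selected villages: {selected_villages}")
print(f"Total households sampled: {len(sample)}")
```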

  15. III. Factors that Influence Evaluation Planning: Two Questions
  • What is the scope of the evaluation? (geographic reach, substantive focus)
  • What kind of evidence is desired in connection with the program? (type of inference, type of data)

  16. III. Factors that Influence Evaluation Planning: What is the scope of the evaluation?
  • Geographic scope: household or collection of households; village; province; country; region
  • Substantive scope: intervention; project comprised of multiple interventions; program comprised of multiple projects; sector-wide evaluation comprised of multiple programs; aid effectiveness, comprised of …
  • Implications for evaluation: modeling, sampling, comparisons, resources

  17. III. Factors that Influence Evaluation Planning: What kind of evidence is desired in connection with the program/intervention?
  • Type of data: quantitative; qualitative; contextualized
  • Types of inference: basic description; causal description; causal explanation
  • Implications for evaluation: data collection tasks, immersion issues, counterfactuals, analysis of mechanisms, basis of comparisons, theoretical components

  18. IV. Evaluation Standards (American Evaluation Association)
  • Utility (8) • Feasibility (4) • Propriety (7) • Accuracy (8) • Accountability (3)
  Source: Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards: A guide for evaluators and evaluation users (3rd ed.). Thousand Oaks, CA: Sage.
  American Evaluation Association: http://www.eval.org/evaluationdocuments/progeval.html

  19. V. Trends in Development that Influence Evaluation
  • Evaluation policies: USAID, OECD, UNDP, CGIAR, World Food Program, World Bank
  • Paris Declaration on Aid Effectiveness (2005)
  • Accra Agenda for Action (2008)

  20. V. Trends in Development that Influence Evaluation: Evaluation Policy
  • Rajiv Shah, USAID Administrator (extract from a January 11, 2011 letter): "Our success will depend on our ability to use evaluation findings to strengthen our efforts and sharpen our decision-making. With the implementation of this policy, we expect a step change in the quantity and quality of evaluation findings that inform our own strategies, program design, and resource allocation decisions; and we will contribute to the global community with new, practical and rigorous knowledge."
  • Organization for Economic Cooperation and Development, Development Cooperation Directorate (OECD-DCD): "Robust, independent evaluation of development programmes provides information about development effectiveness of aid and helps hold donors and partner country governments accountable for results."

  21. V. Trends in Development that Influence Evaluation: Five Principles in the Paris Declaration
  • 1. Ownership: Developing countries set their own strategies for poverty reduction, improve their institutions and tackle corruption.
  • 2. Alignment: Donor countries align behind these objectives and use local systems.
  • 3. Harmonisation: Donor countries coordinate, simplify procedures and share information to avoid duplication.
  • 4. Results: Developing countries and donors shift focus to development results and results get measured.
  • 5. Mutual accountability: Donors and partners are accountable for development results.

  22. V. Trends in Development that Influence Evaluation: Accra Agenda for Action (2008)
  • Ownership: Countries have more say over their development processes through wider participation in development policy formulation, stronger leadership on aid co-ordination and more use of country systems for aid delivery.
  • Inclusive partnerships: All partners - including donors in the OECD Development Assistance Committee and developing countries, as well as other donors, foundations and civil society - participate fully.
  • Delivering results: Aid is focused on real and measurable impact on development.

  23. Upcoming Event to Track Developments in Development Evaluation

  24. V. Trends in Development that Influence Evaluation: Across these documents and others
  • Impact assessment is called for
  • Counterfactuals are noted as the "most rigorous" way to construct comparison groups
  • Random assignment is prioritized
  AND
  • Collaboration and participation are prioritized
  • Local capacity building is required
  • The importance of ensuring the sustainability of programs is highlighted

  25. Resources & Publications: Evaluation Guidelines, Evaluation Standards, Evaluation Reports, Evaluation Initiatives, Dedicated Publications, Evaluation Networks

  26. Developments in Development Evaluation: Evaluation Standards

  27. Evaluation Guidelines: 2010 Evaluation Standards. http://browse.oecdbookshop.org/oecd/pdfs/free/4310171e.pdf

  28. Evaluation Guidelines: http://siteresources.worldbank.org/EXTEVACAPDEV/Resources/4585672-1251461875432/conduct_qual_impact.pdf

  29. Developments in Development Evaluation: Evaluation Guidelines

  30. Consultative Group on International Agricultural Research Standing Panel on Impact Assessment (SPIA) http://impact.cgiar.org/ http://impact.cgiar.org/sites/default/files/images/SPIAstrategy2011-13.pdf

  31. Guidance Network: What Is NONIE? NONIE is a Network of Networks for Impact Evaluation comprised of the Organisation for Economic Co-operation and Development's Development Assistance Committee (OECD/DAC) Evaluation Network, the United Nations Evaluation Group (UNEG), the Evaluation Cooperation Group (ECG), and the International Organization for Cooperation in Evaluation (IOCE). (2009)

  32. New Initiative: http://www.3ieimpact.org/what_3ie_does.html

  33. Developments in Development Evaluation: Dedicated Publication (first issue, March 2011)

  34. Evaluation Resources
  • United Nations Development Programme (UNDP): http://www.undp.org/evaluation/policy.htm
  • United Nations Office of Internal Oversight Services (OIOS), Managing for Results: http://www.un.org/Depts/oios/pages/manage_results.pdf
  • OECD Development Co-operation Directorate, DAC Evaluation Resource Centre (DEReC): http://www.oecd.org/document/12/0,3746,en_2649_34435_46582796_1_1_1_1,00.html
  • IFPRI: http://www.ifpri.org/sites/default/files/pubs/pubs/ib/ib5.pdf and http://www.ifpri.org/book-7770/ourwork/researcharea/program-evaluation
  • CGIAR Standing Panel on Impact Assessment: http://impact.cgiar.org/
  • USAID Evaluation Resources: http://www.usaid.gov/policy/evalweb/evaluation_resources.html
  • Network of Networks on Impact Evaluation (NONIE): http://www.worldbank.org/ieg/nonie/

  35. Regression Discontinuity
  [Figure: pretest scores on a 0-100 scale with a cutoff score; units scoring below the cutoff are assigned to the comparison group and units scoring above the cutoff are assigned to the treatment group]

  36. Regression-Discontinuity Design with a Ten-Point Treatment Effect
  [Figure from the Research Methods Knowledge Base: http://www.socialresearchmethods.net/kb/quasird.php]
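  As an illustration of the regression-discontinuity idea pictured in the two figures above (an added sketch, not part of the original slides), the code below simulates pretest scores on a 0-100 scale, assigns treatment by an assumed cutoff of 60, and recovers an assumed ten-point treatment effect by comparing simple regression fits on either side of the cutoff.

```python
# Hypothetical regression-discontinuity sketch: pretest scores run from 0 to 100,
# units at or above an assumed cutoff receive the treatment, and the simulated
# treatment effect is ten points. All data, the cutoff, and the effect size are
# assumptions for illustration only.
import random

random.seed(4)

CUTOFF = 60
TRUE_EFFECT = 10.0

pretest = [random.uniform(0, 100) for _ in range(2000)]
treated = [x >= CUTOFF for x in pretest]
posttest = [0.8 * x + (TRUE_EFFECT if t else 0.0) + random.gauss(0, 3)
            for x, t in zip(pretest, treated)]


def ols(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


below = [(x, y) for x, y, t in zip(pretest, posttest, treated) if not t]
above = [(x, y) for x, y, t in zip(pretest, posttest, treated) if t]

b_slope, b_int = ols([x for x, _ in below], [y for _, y in below])
a_slope, a_int = ols([x for x, _ in above], [y for _, y in above])

# The estimated treatment effect is the jump between the two fitted lines at the cutoff.
effect = (a_int + a_slope * CUTOFF) - (b_int + b_slope * CUTOFF)
print(f"Estimated discontinuity at the cutoff: {effect:.1f} points")
```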
