Program Evaluation Methods
Chapter 4 - DATA COLLECTION METHODS
4.1 Introduction
The relationship between a program and its results can be established only to the extent that relevant data are available. Methods used to collect data must be selected on the basis of the nature of the data required and the sources available. The nature of the data required, in turn, will depend upon the evaluation design, the indicators used to capture the program's results and the type of analysis to be conducted.
There are several ways to classify data. For example, a distinction is often made between quantitative and qualitative data. Quantitative data are numerical observations. Qualitative data are observations related to categories (for example, colour: red, blue; sex: female, male).
The terms objective and subjective are also used in classifying data. Subjective data involve personal feelings, attitudes and perceptions, while objective data are observations based on facts that, in theory at least, involve no personal judgement. Both objective and subjective data can be qualitatively or quantitatively measured.
Data can also be classified as longitudinal or cross-sectional. Longitudinal data are collected over time, while cross-sectional data are collected at the same point in time, but over differing entities, such as provinces or schools.
Finally, data can be classified by their source: primary data are collected by the investigator directly at the source; secondary data have been collected and recorded by another person or organization, sometimes for altogether different purposes.
This chapter discusses the six data collection methods used in program evaluation: literature search, file review, natural observation, surveying, expert opinion and case studies. The first two methods involve the collection of secondary data, while the latter four deal with the collection of primary data. Each of the methods can involve either quantitative or qualitative data, and each could be used with any of the designs discussed in the previous chapter. However, certain data collection methods lend themselves better to some designs than to others.
Note that while data collection methods are discussed in this chapter largely as elements of a research strategy, data collection is also useful to other aspects of an evaluation. In particular, several collection techniques lend themselves to the initial development of ideas for the evaluation strategies themselves, and other exploratory research related to the evaluation study. For example, a survey might help to focus the evaluation issues; a file review may assist in determining which data sources are available or most easily accessible.
References: Data Collection Methods
Cook, T.D. and C.S. Reichardt. Qualitative and Quantitative Methods in Evaluation Research. Thousand Oaks: Sage Publications, 1979.
Delbecq, A.L., et al. Group Techniques for Program Planning: A Guide to Nominal Group and Delphi Processes. Glenview: Scott, Foresman, 1975.
Dexter, L.A. Elite and Specialized Interviewing. Evanston, IL: Northwestern University Press, 1970.
Gauthier, B., ed. Recherche Sociale: de la Problématique à la Collecte des Données. Montreal: Les Presses de l'Université du Québec, 1984.
Kidder, L.H. and M. Fine. "Qualitative and Quantitative Methods: When Stories Converge." In Multiple Methods in Program Evaluation. V. 35 of New Directions in Program Evaluation. San Francisco: Jossey-Bass, 1987.
Levine, M. "Investigative Reporting as a Research Method: An Analysis of Bernstein and Woodward's All The President's Men," American Psychologist. V. 35, 1980, pp. 626-638.
Miles, M.B. and A.M. Huberman. Qualitative Data Analysis: A Sourcebook and New Methods. Thousand Oaks: Sage Publications, 1984.
Patton, M.Q. Qualitative Evaluation Methods. Thousand Oaks: Sage Publications, 1980.
Martin, Michael O. and V.S. Mullis, eds. Quality Assurance in Data Collection. Chestnut Hill: Center for the Study of Testing, Evaluation, and Educational Policy, Boston College, 1996.
Stouthamer-Loeber, Magda and Welmoet Bok van Kammen. Data Collection and Management: A Practical Guide. Thousand Oaks: Sage Publications, 1995.
Webb, E.J., et al. Nonreactive Measures in the Social Sciences, 2nd edition. Boston: Houghton Mifflin, 1981.
Weisberg, Herbert F., Jon A. Krosnick and Bruce D. Bowen, eds. An Introduction to Survey Research, Polling, and Data Analysis. Thousand Oaks: Sage Publications, 1996.
4.2 Literature Search
A literature search enables the evaluator to make the best use of previous work in the field under investigation, and hence to learn from the experiences, findings and mistakes of those who have previously carried out similar or related work. A literature search can provide invaluable insight into the program area being evaluated and should, consequently, always be undertaken at an early phase of an evaluation study.
A literature search involves an examination of two types of documents. The first consists of official documents, general research reports, published papers and books in the program area. Reviewing these documents lets the evaluator explore theories and concepts related to the program and examine generalizations that might apply to the issues being considered. A literature search may identify evaluation questions and methodologies not considered by the evaluator, thus leading to a potentially more effective evaluation. For example, past research into industrial assistance programs might suggest major differences in the effectiveness of a program based on a firm's size. This would imply that any sampling procedure used in the evaluation should ensure the proper representation of all sizes of firms (through blocked randomization), so that the evaluation results could be generalized.
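To make the firm-size example concrete, here is a minimal sketch of how such blocked (stratified) sampling might be carried out. The firm identifiers, size categories and sample size are all invented for illustration.

```python
import random

# Hypothetical sampling frame: (firm_id, size_category) pairs.
frame = [(f"firm-{i}", random.choice(["small", "medium", "large"]))
         for i in range(500)]

def stratified_sample(frame, strata_key, total_n, seed=42):
    """Draw a proportionally allocated sample from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(strata_key(unit), []).append(unit)
    sample = []
    for units in strata.values():
        # Proportional allocation: each size class is represented in the
        # sample roughly as it is in the population.
        n = round(total_n * len(units) / len(frame))
        sample.extend(rng.sample(units, min(n, len(units))))
    return sample

sample = stratified_sample(frame, strata_key=lambda u: u[1], total_n=60)
```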
The second area examined through a literature search will include specific studies in the area of interest, including past evaluations. This will involve compiling and summarizing previous research findings. This information can then serve as input into various components of the evaluation study. For example, in studying an industrial assistance program, an evaluator might find past research that yields data on employment in areas that have benefited very differently from industrial assistance. A quasi-experimental design might then incorporate this data into the evaluation, where regions receiving high amounts of aid would serve as one group, and regions receiving smaller amounts of aid would become the control group.
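A rough sketch of the group comparison implied by such a quasi-experimental design follows. All region names and employment figures are invented, and a real analysis would also adjust for pre-existing differences between the groups.

```python
# Hypothetical employment change (%) by region, split by aid level.
high_aid = {"Region A": 4.1, "Region B": 2.7, "Region C": 3.5}
low_aid = {"Region D": 1.2, "Region E": 0.8, "Region F": 1.9}

def mean(values):
    return sum(values) / len(values)

# Raw difference between group means; a fuller quasi-experimental analysis
# would use before-and-after data (e.g. a difference-in-differences model).
effect = mean(high_aid.values()) - mean(low_aid.values())
print(f"Difference in mean employment change: {effect:.2f} points")
```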
Strengths and Weaknesses
A literature search early in the evaluation process can save time, money and effort. Indeed, several benefits consistently result from a thorough search.
- Past research may suggest hypotheses to be tested or evaluation issues to be examined in the current study.
Chapter 3 emphasized the importance of identifying, as early as possible, competing explanations for any observed result other than the program intervention. A review of past research may reveal potential competing explanations (threats to validity) for the results observed. The strategy adopted would then have to isolate the impact of the program from these alternative explanations.
- A search may identify specific methodological difficulties and may uncover specific techniques and procedures for coping with them.
- In some cases, evaluation questions can be directly answered on the basis of past work, and redundant data collection can be avoided.
- Sources of usable secondary data may be uncovered, thus lessening the need to collect primary data.
Even when secondary data cannot directly provide the answer to the evaluation question, they might be used with primary data as input to the evaluation strategy, or as benchmark data to check validity.
A literature search is a relatively economical and efficient way of collecting relevant data and has a high potential for payoff; such a search should always be conducted during the assessment phase of an evaluation. A literature search is also useful as a source of new hypotheses, a means of identifying potential methodological difficulties, a basis for drawing or solidifying conclusions, and an input to other data collection techniques.
The weaknesses of the data from a literature search are those associated with most secondary data: the data are usually generated for a purpose other than the specific evaluation issues at hand.
- Data and information gathered from a literature search may not be relevant or compatible enough with the evaluation issues to be usable in the study.
Relevance refers to the extent to which the secondary data fit the problem. The data must be compatible with the requirements of the evaluation. For instance, secondary data available on a national level would not be helpful for an evaluation that required provincial data. Also, the scales of measurement must be compatible. If the evaluator needs data on children 8 to 12 years old, secondary data based on children aged 5 to 9 or 10 to 14 would not suffice. Finally, time greatly affects relevance; quite often secondary data are just too dated to be of use. (Keep in mind that data are usually collected between one and three years before publication).
- It is often difficult to determine the accuracy of secondary data.
This problem goes to the very root of secondary data. The evaluator obviously has no control over the methodology used to collect the data, but still must assess their validity and reliability. For this reason, the evaluator should use the original source of secondary data (in other words, the original report) whenever possible. The original report is generally more complete than a second- or third-hand reference to it, and will often include the appropriate warnings, shortcomings and methodological details not reported in references to the material.
In summary, a comprehensive literature search is a quick and relatively inexpensive means of gaining conceptual and empirical background information for the evaluation. Consequently, an evaluator should do a literature search at the outset of an evaluation study. However, he or she should carefully assess, to the extent possible, the relevance and accuracy of the data yielded by the literature search. Evaluators should be wary of relying too heavily on secondary data for which few methodological details are provided.
References: Literature Searches
Goode, W.J. and Paul K. Hatt. Methods in Social Research. New York: McGraw-Hill, 1952, Chapter 9.
Katz, W.A. Introduction to Reference Work: Reference Services and Reference Processes, Volume II. New York: McGraw-Hill, 1982, Chapter 4.
4.3 File Review
As with the literature search, a file review is a data collection method aimed at discovering pre-existing data that can be used in the evaluation. A file review, however, seeks insight into the specific program being evaluated. Data already collected on and about the program and its results may reduce the need for new data, much as is the case in a literature search.
Two types of files are usually reviewed: general program files, and files on individual projects, clients and participants. The types of files program managers retain will depend on the program. For example, a program subsidizing energy conservation projects might produce files on the individual projects, the clients (those who initiated the project) and the participants (those who worked on the project). On the other hand, a program providing training to health professionals in northern communities may only retain the individual files on the health professionals who attended the training sessions. The distinction between types of files retained leads to two different types of file review: general reviews of program files and more systematic reviews of individual project, client or participant files.
File reviews can cover the following types of program documents:
- Cabinet documents, documents about memoranda of understanding negotiated and implemented with the Treasury Board, Treasury Board submissions, departmental business plans or performance reports, reports of the Auditor General and minutes of departmental executive committee meetings;
- administrative records, which include the size of program or project, the type of participants, the experience of participants, the post-project experience, the costs of the program or project, and the before-and-after measures of participants' characteristics;
- participants' records, which include socio-economic data (such as age, sex, location, income and occupation), critical dates (such as entry into a program), follow-up data, and critical events (such as job and residence changes);
- project and program records, including critical events (such as start-up of projects and encounters with important officials), project personnel (such as shifts in personnel), and events and alterations in project implementation; and
- financial records.
File data may be retained by a program's computerized management information system or in hard copy. The file data may also have been collected specifically for evaluation purposes if there is an agreement beforehand on an evaluation framework.
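As a hypothetical illustration of how file data of this kind might be structured for a systematic review, the following sketch defines a participant record using fields drawn from the list above; all field names are assumptions, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ParticipantRecord:
    """One row in a systematic review of participant files (illustrative)."""
    participant_id: str
    age: int
    sex: str
    location: str
    income: float
    occupation: str
    program_entry: date                  # critical date
    follow_up_date: Optional[date] = None
    job_change: bool = False             # critical event
    residence_change: bool = False       # critical event

record = ParticipantRecord("p-001", 34, "F", "Ontario", 41000.0,
                           "technician", date(1995, 3, 1))
```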
Strengths and Weaknesses
File reviews can be useful in at least three ways.
1. A review of general program files can provide invaluable background data and information on the program and its environment, and hence put program results in context.
A file review can provide basic background information about the program (such as program terms, history, policies, management style and constraints) that ensures the evaluator's familiarity with the program. As well, such a review can provide key information for outside experts in the program area (see Section 4.6) and provide input to a qualitative analysis (see Section 5.4).
2. A review of individual or project files can indicate program results.
For example, in a study of an international aid program, project files can provide results measures such as product/capital ratio, value added/unit of capital, productivity of capital employed, capital intensity, employment/unit of capital, value added/unit of total input, and various production functions. Although these measures do not directly assess program effectiveness, they are indicators that could serve as inputs into the evaluation. Data of this kind may be sufficient for a cost-benefit or cost-effectiveness analysis (see Section 5.6).
3. A file review may produce a useful framework and basis for further data gathering.
A file review, for example, may establish the population (sampling frame) from which the survey sample is to be drawn. Background information from the files may be used in designing the most powerful sample, and in preparing the interviewer for an interview. Asking for information on a survey that is already available in files is a sure way of discouraging cooperation; the available file information should be assembled before the survey.
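The following sketch illustrates, under assumed record formats, how file information might be assembled into a sampling frame before a survey, dropping duplicate and incomplete records so that information already on file need not be asked again.

```python
# Hypothetical raw file records: dicts with possibly missing fields.
raw_records = [
    {"id": "c-01", "name": "Firm A", "region": "West"},
    {"id": "c-02", "name": "Firm B", "region": None},      # incomplete
    {"id": "c-01", "name": "Firm A", "region": "West"},    # duplicate
]

def build_frame(records, required=("id", "name", "region")):
    """Deduplicate by id and keep only records complete enough to use."""
    frame = {}
    for rec in records:
        if all(rec.get(field) is not None for field in required):
            frame[rec["id"]] = rec  # later duplicates overwrite earlier ones
    return list(frame.values())

frame = build_frame(raw_records)
# Fields already on file (here: name, region) can be pre-filled on the
# questionnaire rather than asked again, encouraging cooperation.
```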
In terms of feasibility, a file review has major strengths.
- A file review can be relatively economical.
There is minimal interference with individuals and groups outside the program administration. As with a literature search, file reviews are a basic and natural way of ensuring an evaluator's familiarity with the program. Furthermore, an initial file review ensures that the evaluator does not collect new and more expensive data when adequate data already exist.
There are, however, certain problems associated with a file review.
- Program files are often incomplete or otherwise unusable.
More often than not, a central filing system is relegated to a secondary position, containing brief memos from committees, agendas of final decisions and so forth. In retrospect, these files tell an incomplete story.
When researching the material that has given shape to a policy, program or project, the evaluator may find that this information is contained in files held by separate individuals, instead of in a central repository for program files. This can create several problems. For instance, experience suggests that once the project life-cycle moves beyond a working group's terms of reference, participating individuals will dispense with their files instead of keeping them active. Similarly, when a particular person stops participating in the project's implementation, his or her files are often lost; and because of the rapidly shifting role of participants at the outset of a program, this may significantly affect the comprehensiveness of files on the program.
- A file review rarely yields information on control groups, except in special cases, such as when files on rejected applicants to a program exist.
To assess impact effectively, evaluators must have access to a control group of some sort. For a file review, this implies a requirement for file information about program participants before they entered the program, or information about non-participants. It is rare for such information to exist, except where an evaluation framework was approved and implemented beforehand. The lack of such data may make it necessary to collect new data, but these data may not be comparable with the original file data.
A file review can, however, provide information on control groups when program levels vary (which is useful for a post-program-only different treatment design). It may also yield the basic information needed to identify and select a control group.
Despite its limitations, a file review should always be undertaken as part of an evaluation assessment, in order to determine the type of data available and their relevance to the evaluation issues. This exercise will also yield information necessary for addressing specific evaluation issues (such as background information and potential indicators of program results).
References: Secondary Data Analysis
Boruch, R.F., et al. Reanalyzing Program Evaluations - Policies and Practices for Secondary Analysis for Social and Education Programs. San Francisco: Jossey-Bass, 1981.
Weisler, Carl E., U.S. General Accounting Office. Review Topics in Evaluation: What Do You Mean by Secondary Analysis?
4.4 Observations
"Seeing is believing" as the old saying goes; direct observation generally provides more powerful evidence than that which can be obtained from secondary sources. Going into the "field" to observe the evaluation subject first-hand can be an effective way of gathering evidence. The results of field observation, recorded through photos or videos, can also be helpful and may have a powerful impact on the reader if used in the evaluation report.
Observation involves selecting, watching and recording objects, events or activities that play a significant part in the administration of the program being evaluated. The observed conditions can then be compared with pre-established criteria, and deviations from these criteria analyzed for significance.
In some cases, direct observation can be an essential tool for gaining an understanding of how the program functions. For example, a team evaluating customs clearance at airports might observe long lines of incoming passengers whenever two 747s arrive at the same time. Such peak-load problems would hinder the effectiveness of inspection, as well as the quality of service. Another example might be a case where dangerous chemicals were stored improperly, indicating unsafe working conditions for staff and a violation of health and safety regulations. Neither of these findings would have become apparent from examining written records only.
Observational data describe the setting of a program, the activities that take place in the setting, the individuals who participate in the activities and the meaning of these activities to the individuals. The method has been extensively used by behavioural scientists, such as anthropologists and social psychologists. It enables an evaluator to obtain data about a program and its impact holistically.
The technique involves on-site visits to locations where the program is operating to observe activities and to take notes. Program participants and staff may or may not know that they are being observed.
Observations should be written up immediately after the visit and should include enough descriptive detail to allow the reader to understand what has occurred and how it occurred. Descriptions must be factual, accurate and thorough, without being filled with irrelevant items. Observational data are valuable in evaluation projects because evaluators and users can understand program activities and effects through detailed descriptive information about what occurred and how people have reacted.
Strengths and Weaknesses
- Observation provides only anecdotal evidence unless it is combined with a planned program of data collection. A random walk provides no basis for generalization. Some first-hand observation can be justified in almost every evaluation, but it can be expensive to plan and carry out field trips to collect representative data.
- Observation permits the evaluator to understand a program better, particularly when a complex or sophisticated technology or process is involved.
Through direct, personal observation, evaluators are able to create for themselves a complete picture of the program's functioning. Furthermore, direct observation permits the evaluator to move beyond the selective perceptions gained through such means as interviews. Evaluators, as field observers, will also have selective perceptions, but by making their own perceptions part of the data available, evaluators may be able to present a more comprehensive view of the program.
- The evaluator will have the chance to see things that may escape staff members, or to notice issues that they are reluctant to raise in an interview.
Most organizations involve routines which participants take for granted. Subtleties may be apparent only to those not fully immersed in these routines. This often makes it possible for an outsider, in this case the evaluator, to provide a "fresh" view. Similarly, outsiders may observe things that participants and staff are unwilling to discuss in an interview. Thus, direct experience with and observations of the program will allow evaluators to gain information that might otherwise be unavailable.
- The reliability and validity of observations depend on the skills of the observer and on the observer's awareness of any bias he or she brings to the task.
Direct observation cannot be repeated: another person carrying out a similar set of on-site observations may observe the same phenomena differently. This implies limits to both the internal and external validity of direct observation data. (A simple check on observer agreement is sketched after this list.)
- Program staff may behave quite differently from their usual patterns if they know that they are being observed by an evaluator.
The evaluator must be sensitive to the fact that staff, participants or both may act differently if they know they are being observed. Evaluators should take appropriate steps to prevent this problem from occurring, or to account for its effect.
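As one illustrative safeguard for observer reliability, referred to in the reliability point above, the following sketch computes Cohen's kappa, a chance-corrected measure of agreement between two observers who have independently coded the same sites. The codes shown are invented.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two observers' categorical codes."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance, given each observer's marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned independently by two observers at the same sites.
obs_1 = ["ok", "ok", "unsafe", "ok", "unsafe", "ok"]
obs_2 = ["ok", "unsafe", "unsafe", "ok", "ok", "ok"]
print(f"kappa = {cohens_kappa(obs_1, obs_2):.2f}")
```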
References: Observations
Guba, E.G. "Naturalistic Evaluation." In Cordray, D.S., et al., eds. Evaluation Practice in Review. V. 34 of New Directions for Program Evaluation. San Francisco: Jossey-Bass, 1987.
Guba, E.G. and Y.S. Lincoln. Effective Evaluation: Improving the Usefulness of Evaluation Results through Responsive and Naturalistic Approaches. San Francisco: Jossey-Bass, 1981.
Office of the Auditor General of Canada. Bulletin 84-7: Photographs and Other Visual Aids. (While aimed at the end use of photographs in the annual report, this bulletin also helps explain what makes an effective photograph for evidence purposes.)
Patton, M.Q. Qualitative Evaluation Methods. Thousand Oaks: Sage Publications, 1980.
Pearsol, J.A., ed. "Justifying Conclusions in Naturalistic Evaluations," Evaluation and Program Planning. V. 10, N. 4, 1987, pp. 307-358.
Van Maanen, J., ed. Qualitative Methodology. Thousand Oaks: Sage Publications, 1983.
Webb, E.J., et al. Nonreactive Measures in the Social Sciences, 2nd edition. Boston: Houghton Mifflin, 1981.
Williams, D.D., ed. Naturalistic Evaluation. V. 30 of New Directions for Program Evaluation. San Francisco: Jossey-Bass, 1986.
4.5 Surveys
Surveys, in an evaluation context, are systematic ways of collecting primary data (quantitative, qualitative or both) on a program and its results from persons (or from other sources, such as files) associated with the program. The term "survey" refers to a planned effort to collect needed data from a sample (or a complete census) of the relevant population. The relevant population is composed of those persons from whom the data and information are required. When properly conducted, a survey offers an efficient and accurate means of ascertaining the characteristics (physical and psychological) of almost any population of interest.
Surveys are used extensively in evaluation because of their versatility: they can be used to gather data on almost any issue. Note, however, that a survey on its own is not an evaluation strategy; it is a data collection method that provides input data for other analytic techniques.
Developing a survey for use in an evaluation requires care and expertise. Numerous textbooks, some of which are listed at the end of this chapter, explain how to develop a useful survey. In Appendix 1, the basic elements of survey research are described and discussed. What follows here is a brief description of how surveys should be used in evaluation.
Evaluators should follow three basic steps before implementing a survey. First, define the evaluation information needs. Second, develop the instrument to meet these needs. And third, pre-test the instrument. These steps, in fact, apply to all data collection techniques. They are discussed here because surveys are such a common presence in evaluative work.
(a) Defining the Evaluation Information Needs
The first and most fundamental step in preparing a survey is to identify, as precisely as possible, what specific information will address a given evaluation issue.
First, the evaluator must thoroughly understand the evaluation issue so that he or she can determine what kind of data or information will provide adequate evidence. The evaluator must consider what to do with the information once it has been collected. What tabulations will be produced? What kinds of conclusions will the evaluator want to draw? Without care at this stage, one is likely either to gather too much information or to find out afterward that key pieces are missing.
Next, the evaluator must ensure that the required data are not available elsewhere, or cannot be collected more efficiently and appropriately by other data collection methods. In any program area, there may be previous or current surveys. A literature search is therefore essential to determine that the required data are not available elsewhere.
A third consideration relates to economy and efficiency. There is always a temptation to gather "nice-to-know" information. The evaluator should realize that defining the scope and nature of a survey determines in large part its cost and that collecting "extra" data will add to the total cost.
(b) Developing the Survey
The development of the actual survey is discussed in Appendix 1, "Survey Research." It involves determining the sample, deciding on the most appropriate survey method and developing the questionnaire. These steps tend to be iterative rather than sequential, based on information needs as they are determined.
(c) Pre-testing the Survey
Surveys that have not been properly pre-tested often turn out to have serious flaws when used in the field. Both the questionnaire and the procedures to be used in conducting the survey should be pre-tested on a representative sample of the survey population. Pre-testing will provide information on the following (a sketch for tabulating simple pre-test results appears after the list).
- Clarity of questions
Is the wording of questions clear? Does every respondent interpret the question in the same way? Does the sequence of questions make sense?
- Response rate
Is there any question that respondents find objectionable? Does the interview technique annoy respondents? Do respondents refuse to answer parts of the questionnaire?
- Time and length
How long does the questionnaire take to complete?
- Survey method
If the survey is conducted by mail, does it yield an adequate response rate? Does a different method yield the required response rate?
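A minimal sketch of how simple pre-test results might be tabulated follows; the respondent records and question identifiers are invented.

```python
# Hypothetical pre-test results: one dict per respondent.
pretest = [
    {"completed": True,  "minutes": 18, "refused": []},
    {"completed": True,  "minutes": 25, "refused": ["q7"]},
    {"completed": False, "minutes": 6,  "refused": ["q7", "q9"]},
    {"completed": True,  "minutes": 21, "refused": []},
]

n = len(pretest)
completion_rate = sum(r["completed"] for r in pretest) / n
avg_minutes = sum(r["minutes"] for r in pretest) / n

# Per-question refusal counts help flag objectionable items (here, q7).
refusals = {}
for r in pretest:
    for q in r["refused"]:
        refusals[q] = refusals.get(q, 0) + 1

print(f"completion rate {completion_rate:.0%}, avg length {avg_minutes:.0f} min")
print("refusals by question:", refusals)
```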
Strengths and Weaknesses
The strengths and weaknesses of various survey methods are discussed in Section A.5 of Appendix 1. Nevertheless, some general points are made here.
- A survey is a very versatile method for collecting data from a population.
Using a survey, one can obtain attitudinal data on almost any aspect of a program and on its results. The target population can be large or small and the survey can involve a time series of measurements or measurements across various populations.
- When properly done, a survey produces reliable and valid information.
A great number of sophisticated techniques are available for conducting surveys. Many books, courses, experts and private-sector consulting firms are available to help ensure that the information collected is pertinent, timely, valid and reliable.
However, as a data collection method, surveys do have several drawbacks.
- Surveys require expertise in their design, conduct and interpretation. They are easily misused, resulting in invalid data and information.
Survey procedures are susceptible to a number of pitfalls that threaten the reliability and validity of the data collected: sampling bias, non-response bias, sensitivity of respondents to the questionnaire, interviewer bias and coding errors. Each potential problem must be controlled for. Statistics Canada has prepared a compendium of methods that can be used to assess the quality of data obtained from surveys (1978).
Surveys must be rigorously controlled for quality. Often, evaluators will contract out survey field work. In these instances, it is wise to test the contractor's work through independent call backs to a small sample of respondents.
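The following sketch shows one way such a call-back sample might be drawn; the respondent identifiers, verification rate and minimum sample size are assumptions for illustration.

```python
import random

def callback_sample(respondent_ids, fraction=0.05, minimum=10, seed=1):
    """Draw a small random subsample of respondents for independent call-backs."""
    rng = random.Random(seed)
    n = max(minimum, round(len(respondent_ids) * fraction))
    return rng.sample(list(respondent_ids), min(n, len(respondent_ids)))

# Hypothetical respondent identifiers from the contractor's field work.
completed = [f"r-{i:04d}" for i in range(1, 801)]
to_verify = callback_sample(completed)  # 40 of 800 at the assumed 5% rate
```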
References: Surveys
Babbie, E.R. Survey Research Methods. Belmont: Wadsworth, 1973.
Bradburn, N.M. and S. Sudman. Improving Interview Methods and Questionnaire Design. San Francisco: Jossey-Bass, 1979.
Braverman, Mark T. and Jana Kay Slater. Advances in Survey Research. V. 70 of New Directions for Program Evaluation. San Francisco: Jossey-Bass, 1996.
Dexter, L.A. Elite and Specialized Interviewing. Evanston, IL: Northwestern University Press, 1970.
Fowler, Floyd J. Improving Survey Questions: Design and Evaluation. Thousand Oaks: Sage Publications, 1995.
Gliksman, Louis, et al. "Responders vs. Non-Responders to a Mail Survey: Are They Different?" Canadian Journal of Program Evaluation. V. 7, N. 2, October-November 1992, pp. 131-138.
Kish, L. Survey Sampling. New York: Wiley, 1965.
Robinson, J.P. and P.R. Shaver. Measurement of Social Psychological Attitudes. Ann Arbor: Survey Research Center, University of Michigan, 1973.
Rossi, P.H., J.D. Wright and A.B. Anderson, eds. Handbook of Survey Research. Orlando: Academic Press, 1985.
Statistics Canada. A Compendium of Methods of Error Evaluation in Censuses and Surveys. Ottawa: 1978, Catalogue 13-564E.
Statistics Canada. Quality Guidelines, 2nd edition. Ottawa: 1987.
Treasury Board of Canada, Secretariat. Measuring Client Satisfaction: Developing and Implementing Good Client Satisfaction Measurement and Monitoring Practices. Ottawa: October 1991.
Warwick, D.P. and C.A. Lininger. The Survey Sample: Theory and Practice. New York: McGraw-Hill, 1975.
4.6 Expert Opinion
Expert opinion, as a data gathering technique, uses the perceptions and knowledge of experts in given functional areas as evaluation information. Essentially, this method consists of asking experts in a given subject area for their opinions on specific evaluation issues. Evaluators use this information to determine program outcomes. Eliciting opinions from experts is really a specific type of survey, and all the comments described in the survey section are relevant here. However, because of the frequent use of this technique, a separate discussion of it is warranted.
Note that expert opinion is a method best suited to supplementing (or replacing, in the absence of more objective indicators) other measures of program outcomes. It should be emphasized that expert opinion is a data collection method. It does not refer to the use of an expert on the evaluation team, but rather to the use of experts as a source of data for addressing evaluation issues.
Expert opinions can be collected and summarized systematically, though the results of this process will remain subjective. For example, suppose an evaluator was trying to measure how a particular support program advanced scientific knowledge. One way of measuring this hard-to-quantify variable would be through questions put to appropriate scientific experts. Using specific survey methods, which can be administered through the mail or personal interviews, the evaluator could obtain quantitative measures. The procedures used could either be a one-shot survey, an interactive method such as Delphi (see Linstone and Turoff, 1975) or a qualitative controlled feedback process (see Press, 1978).
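As an illustration of the aggregation-and-feedback step in a Delphi-style process, the following sketch summarizes one round of panel ratings; the ratings are invented, and a real exercise would repeat rounds until opinions converge.

```python
import statistics

def delphi_feedback(ratings):
    """Summarize one Delphi round: the median and spread fed back to the panel."""
    quartiles = statistics.quantiles(ratings, n=4)  # [Q1, median, Q3]
    return quartiles[1], (quartiles[0], quartiles[2])

# Hypothetical 1-to-5 ratings of a program's contribution to scientific knowledge.
round_1 = [2, 4, 3, 5, 3, 4, 2, 3]
median, iqr = delphi_feedback(round_1)
print(f"round 1: median {median}, interquartile range {iqr}")
# Panellists would see these figures, reconsider their views and re-rate;
# rounds repeat until the ratings stabilize.
```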
Strengths and Weaknesses
- Expert opinion can be used to carry out measurements in areas where objective data are deficient. It is a relatively inexpensive and quick data collection technique.
Because of its flexibility and ease of use, expert opinion can be used to gauge almost any program outcome or, indeed, any aspect of a program. Its credibility is enhanced if it is done as systematically as possible. Expert opinion is, however, subject to several serious drawbacks.
- There may be a problem in identifying a large enough group of qualified experts if the evaluator wishes to ensure statistical confidence in the results.
- There may be a problem in obtaining agreement from the interested parties on the choice of experts.
- Experts are unlikely to be equally knowledgeable about a subject area, and so weights should be assigned to the results.
Although there are statistical methods that try to adjust for unequal expertise by using weights, these methods are fairly imprecise. Thus, the evaluator runs the risk of treating all responses as equally important. (A minimal weighting sketch follows this list.)
- As with any verbal scaling, the validity of the measurement can be questioned.
Different experts may make judgements on different bases, or they may be using numbers in different ways on rating scales. For example, an individual who, on a 1 to 5 scale, rates a project's contribution to scientific knowledge as 3 may view the project no differently than does an individual who rates it at 4. The only difference may be in the way they use numerical scales.
- Like any subjective assessment, expert opinion presents a credibility problem.
Disputes over who the experts were and how they were chosen can easily undermine the best collection of expert opinion.
- As a result of these weaknesses, expert opinion should not be used as the sole source of data for an evaluation.
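Referred to in the weighting point above, the following sketch shows one simple way to combine expert ratings: each expert's ratings are first standardized to offset differing uses of the scale (the verbal-scaling caveat), then averaged with expertise weights. Both the ratings and the weights are invented.

```python
import statistics

# Hypothetical 1-to-5 ratings of four projects by two experts who may use
# the rating scale differently.
ratings = {
    "expert_a": [3, 4, 2, 5],
    "expert_b": [4, 4, 3, 5],
}
weights = {"expert_a": 0.6, "expert_b": 0.4}  # assumed expertise weights

def standardize(xs):
    """Rescale one expert's ratings to mean 0, sd 1, to offset scale-use habits."""
    mu, sd = statistics.mean(xs), statistics.stdev(xs)
    return [(x - mu) / sd for x in xs]

z = {expert: standardize(r) for expert, r in ratings.items()}
combined = [sum(weights[e] * z[e][i] for e in ratings) for i in range(4)]
print(combined)  # weighted, standardized score per project
```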
References: Expert Opinion
Boberg, Alice L. and Sheryl A. Morris-Khoo. "The Delphi Method: A Review of Methodology and an Application in the Evaluation of a Higher Education Program," Canadian Journal of Program Evaluation. V. 7, N. 1, April-May 1992, pp. 27-40.
Delbecq, A.L., et al. Group Techniques for Program Planning: A Guide to Nominal Group and Delphi Processes. Glenview: Scott, Foresman, 1975.
Shea, Michael P. and John H. Lewko. "Use of a Stakeholder Advisory Group to Facilitate the Utilization of Evaluation Results," Canadian Journal of Program Evaluation. V. 10, N. 1, April-May 1995, pp. 159-162.
Uhl, Norman and Carolyn Wentzel. "Evaluating a Three-day Exercise to Obtain Convergence of Opinion," Canadian Journal of Program Evaluation. V. 10, N. 1, April-May 1995, pp. 151-158.
4.7 Case Studies
When a program is made up of a series of projects or cases, in-depth studies of a selected sample of those cases can assess (and explain) program results. As with expert opinion, case studies are really a form of survey, but they are important enough to be dealt with separately.
Case studies assess program results through in-depth, rather than broad, coverage of specific cases or projects. Unlike the data collection techniques discussed so far, a case study usually involves a combination of various data collection methods. Case studies are usually chosen when it is impossible, for budgetary or practical reasons, to choose a large enough sample, or when in-depth data are required.
A case study usually examines a number of specific cases or projects, through which the evaluator hopes to reveal information about the program as a whole. Selecting appropriate cases is therefore a crucial step. The cases may be chosen so that the conclusions can apply to the target population. Unfortunately, cases are often chosen in a non-scientific manner, or too few are selected, for valid statistical inferences to be made.
Alternatively, a case may be chosen because it is considered a critical example, perhaps the purported "best case". If a critical case turned out badly, the effectiveness of the whole program might be seriously questioned, regardless of the performance of other cases. Both selection criteria, the representative case and the critical case, are discussed below.
Suppose that a determination of the results of an industrial grant can be based only on a detailed examination of corporate financial statements and comprehensive interviews of corporate managers, accountants and technical personnel. These requirements would likely make any large sample prohibitively expensive. The evaluator might then choose to take a small sample of those cases that are felt to represent the whole population. The evaluator could apply the results thereby obtained to the entire population, assuming that similar circumstances prevailed in cases not studied. Of course, this is not always an easy assumption to make; questions could arise and cast doubt on the credibility of any conclusions reached.
To measure program results, the case study of a critical case may be more defensible than the case study of a representative sample. Suppose, for example, that one company received most of the program's total funds for a given industrial project. Assessing the effect of the grant on this project (did it cause the project to proceed and, if so, what benefits resulted?) may go a long way toward measuring overall program results. Thus, the critical case study can be a valid and important tool for program evaluation.
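A small sketch of how such a dominant (critical) case might be flagged from file data follows; the grant figures and the dominance threshold are invented.

```python
# Hypothetical grant amounts by recipient firm for one program component.
grants = {"Firm A": 8_400_000, "Firm B": 950_000, "Firm C": 650_000}

total = sum(grants.values())
critical, amount = max(grants.items(), key=lambda kv: kv[1])
share = amount / total

# If one case dominates program spending, studying it in depth may say more
# about overall results than a small "representative" sample would.
if share > 0.5:  # dominance threshold is an arbitrary assumption
    print(f"{critical} received {share:.0%} of funds: candidate critical case")
```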
However, case studies are usually used in evaluation less for specific measurement than for insight into how the program operated, and why things happened as they did.
More often than not, the results are not as straightforward as anticipated. Evaluators may attribute these unanticipated results to "complex interactions", "intervening variables" or simply "unexplained variance". What this typically means is that some important factor was overlooked at the evaluation assessment stage. This is likely to happen fairly often, because prior knowledge of the process that links inputs to outputs to outcomes is seldom complete. Such knowledge is nonetheless important, and evaluators can gain it by using data collection methods that provide insight into the unanticipated; the case study method is clearly one of these.
In fact, case studies can be used for many purposes, including the following:
- to explore the manifold consequences of a program;
- to add sensitivity to the context in which the program actions are taken;
- to identify relevant "intervening variables"; and
- to estimate program consequences over the long term (Alkin, 1980).
Strengths and Weaknesses
- Case studies allow the evaluator to perform an in-depth analysis that would not be possible with more general approaches.
This is probably the most important attribute of case studies, since practical considerations often limit the amount of analysis that can be done with broader approaches. The depth of analysis often makes the results of a case study quite valuable. In addition, case studies can generate explanatory hypotheses for further analysis.
- Case studies are typically expensive and time consuming to carry out. It is, therefore, usually not possible to analyze a statistically reliable sample of cases. As a result, the set of case studies will usually lack a statistical basis from which to generalize the conclusions.
The in-depth analysis possible with case studies usually requires significant resources and time, limiting the number which can be carried out. Hence, they are not normally expected to provide results that can be generalized statistically. Their main function is, rather, to provide a broader overview and insights into the unfolding of the program. Because of this, it is usually recommended that case studies be carried out before (or at least in parallel with) other, more generalizable, procedures for collecting data.
References: Case Studies
Campbell, D.T. "Degrees of Freedom and the Case Study," Comparative Political Studies. V. 8, 1975, pp. 178-193.
Campbell, D.T. and J.C. Stanley. Experimental and Quasi-experimental Designs for Research. Chicago: Rand-McNally, 1963.
Cook, T.D. and C.S. Reichardt. Qualitative and Quantitative Methods in Evaluation Research. Thousand Oaks: Sage Publications, 1979, Chapter 3.
Favaro, Paul and Marie Billinger. "A Comprehensive Evaluation Model for Organizational Development," Canadian Journal of Program Evaluation. V. 8, N. 2, October-November 1993, pp. 45-60.
Maxwell, Joseph A. Qualitative Research Design: An Interactive Approach. Thousand Oaks: Sage Publications, 1996.
McClintock, C.C., et al. "Applying the Logic of Sample Surveys to Qualitative Case Studies: The Case Cluster Method." In Van Maanen, J., ed. Qualitative Methodology. Thousand Oaks: Sage Publications, 1979.
Yin, R. The Case Study as a Rigorous Research Method. Thousand Oaks: Sage Publications, 1986.
4.8 Summary
This chapter has discussed six data collection methods used in program evaluation: literature searches, file reviews, observation, surveys, expert opinion and case studies.
The first two methods collect secondary data and the remaining four collect primary data. For the sake of discussion and presentation ease, each method was treated separately here. However, in the context of a program evaluation, these methods should be used together to support the various evaluation research strategies employed.
A literature search and a file review are indispensable in any evaluation exercise. They should be undertaken during the evaluation assessment phase and at the earliest stage of the evaluation itself. These methods will define the context of the program under review, and will also suggest plausible ways of attributing observed results to a given program. What is more, they can prevent unnecessary data collection by suggesting or identifying relevant or equivalent data already available elsewhere.
Many of the methods discussed in this chapter collect attitudinal data. Evaluators should be aware, however, that attitudes change over time, depending on contextual factors. Attitudes are also subjective. For example, a survey asking people about the results of a program gives the evaluator, at best, the aggregate opinion of the target population about the program result. This may or may not be of interest in determining the actual results of the program. Attitudinal data are best interpreted in light of the given historical and socio-economic context. This background data should therefore be collected to support a proper analysis of the attitudinal data.
Evaluators should be aware of the potential subjectivity of the data obtained through particular collection methods, especially through observation, expert opinion and, at times, case studies. This is not necessarily a disadvantage, but it does require that the external validity of any conclusions be carefully assessed. On the other hand, these collection methods are the best ways to generate holistic and in-depth information on the impact of programs. Used with quantitative data, qualitative data are quite effective in verifying the link between a program and its results.
Typically, no single data collection method will be completely satisfactory for a program evaluation. When constraints permit, it is better to use several different collection methods and sources of data.