Journal of Teacher Education, Vol. 57, No. 1, January/February 2006

10.1177/0022487105284475

EVIDENCE IN TEACHER PREPARATION: ESTABLISHING A FRAMEWORK FOR ACCOUNTABILITY

Mona S. Wineburg
American Association of State Colleges and Universities

This article reports on a survey examining the state of knowledge and practice about how universities provide evidence of the effectiveness of their programs to schools, parents, policy makers, and the public. The project asked three questions: What is happening? What is promising? What is believable? The survey focused on two areas: results and outcomes, and issues in measurement. Data from this study reveal that state colleges and universities are expending extraordinary energy and resources assessing prospective teachers and compiling data about teacher preparation programs. The survey data uncover the myriad issues that confound the data collection process, the difficulties around establishing validity and reliability, and the extraordinary demands placed on programs to produce data for a variety of constituencies. Recommendations are made for the development of a national framework for evidence, guidelines that institutions can use to proactively develop data systems that promote a culture of evidence on their campuses.

Keywords: teacher education; teacher preparation program effectiveness; evidence-based teacher education; teacher preparation program improvement; accountability

A rising chorus from many different quarters demands that university-based teacher education programs prove their effectiveness (Finn, 2003; Paige, 2002). State legislators wonder if university-based teacher education is worth the money being invested (Education Commission of the States, 2000). Hiring officials and parents wonder about the competence of recent teacher candidates (Education Testing Service, 2002). Advocates of alternative programs imply that quality programs can be delivered in less time for less money (Feistritzer, 2004; Hess, 2001).

And state officials implement new strategies such as the American Board for Certification of Teacher Excellence Passport to Certification test to assess teachers' capabilities (for more information, see http://www.abcte.org/passport/index/html). The demand is always the same: Produce evidence to prove the effectiveness of teacher preparation programs. Feeling the mounting pressure to demonstrate the effectiveness of teacher preparation programs with solid evidence, university administrators and teacher educators are trying to respond to growing expectations.

Author's Note: The work of a national association necessarily depends on the time, talents, and goodwill of many people. In that regard, I would like to particularly thank David Wright, associate director, Evaluation and Assurance of Teacher Education Programs, California State University, for his willingness to help in developing and reviewing the survey; Lee Shulman, president of the Carnegie Foundation for the Advancement of Teaching, for his hospitality in hosting a key meeting in Palo Alto; and the advisory panel for devoting their time, sharing their expertise, and participating in national meetings. Finally, I thank the member institutions of the American Association of State Colleges and Universities for their willingness to complete the survey and for their commitment to improving public education and preparing high-quality teachers. This project received funding support from the Carnegie Corporation of New York.

Journal of Teacher Education, Vol. 57, No. 1, January/February 2006, 51-64. DOI: 10.1177/0022487105284475. © 2006 by the American Association of Colleges for Teacher Education


In this article, I present the initial results of a survey of the members of the American Association of State Colleges and Universities (AASCU), which was intended to find out what kinds of data are being collected to meet the demand for evidence. The survey reveals that many of the administrators and faculty members involved in teacher education at institutions across the country are expending extraordinary energy and resources in the data collection process. However, the survey also shows that educators are responding to the demand for evidence in the absence of a shared consensus about what should be measured and how, and may well be collecting information that is of dubious utility. In this article, I suggest that what is needed is a national framework for evidence of teacher education program effectiveness, including guidelines that institutions could use proactively to develop data systems that promote a culture of evidence on their campuses. To be useful, such a framework for evidence would need to be developed collaboratively, broadly agreed on, and implemented on a state-by-state basis.

BACKGROUND: THE SEARCH FOR CREDIBLE AND PERSUASIVE EVIDENCE

Linking teacher practice to pupil outcomes has proven particularly challenging for teacher educators. Profound methodological problems occur when linking individual teacher actions with subsequent pupil performance, including substantial intervening variables, questions about appropriate measures of student learning, issues regarding the lack of test standardization between schools and districts, and problems in the mechanics of tracking candidates and accessing data (Zeichner & Conklin, 2005). Alternate measures of student learning, such as whole school scores, or proxies for student learning, such as teacher behavior, only add to the attribution complexity (Rice, 2003). For AASCU member institutions, the accountability pressure is particularly acute. AASCU has a historic commitment to public education, especially to the preparation of teachers. Many AASCU institutions began as normal schools (from approximately the 1840s to the 1930s), and they take pride in their sustained contributions to public education.

AASCU represents more than 400 public colleges, universities, and systems of higher education throughout the United States and its territories, and its members enroll more than 3 million students, or 55% of the enrollment at all public 4-year institutions. In the 2002-2003 academic year, member institutions conferred 35%—more than one third—of the undergraduate degrees awarded nationwide, and they conferred half of the undergraduate degrees in education (55,105 bachelor's, 58,656 master's, and 1,354 doctorates). In 2001, AASCU reinvigorated the Christa McAuliffe Excellence in Teacher Education Award, which, although created in the aftermath of the Challenger disaster to identify outstanding teacher education programs, had been discontinued in the early 1990s (for more information, see http://www.aascu.org/programs/mcauliffe/default.htm). The revised award is intended to highlight winning institutions' efforts to hold themselves accountable for the quality of the teachers they prepare. The award is designed to identify and recognize institutions that demonstrate the effectiveness of their programs at enhancing candidates' and pupils' knowledge. Since 2002, when the award was reinstated, 13 institutions have been honored.

For example, East Carolina University created the Latham Clinical Schools Network, a university–public school partnership between East Carolina University and 16 public school systems in rural eastern North Carolina. Its mission was to provide a regional partnership in which public schools and the university could collaborate to improve the quality of teacher preparation and increase P-12 student achievement. Through this partnership, East Carolina University was able to receive student test data that could be directly linked to program graduates to demonstrate student achievement.

Central Michigan University was honored for the Michigan Schools in the Middle Program. The university was able to demonstrate the program's success by comparing seventh- and eighth-grade student achievement at 13 of its participating high-need schools with the achievement of students at 71 other Michigan middle schools with identical grade-level configurations and comparable demographics. These comparisons, which included results in reading, writing, mathematics, science, and social studies, were based on the percentage of students passing the Michigan Education Assessment Program annual tests at "satisfactory" or higher achievement levels.

The Liberal Studies–Elementary Partnership Program at Longwood University in rural Virginia built on the work of the Renaissance Group Title II Improving Teacher Quality Grant that supported the collaborative development of the teacher work sample assessment rubrics. Pretesting and posttesting determined entry- and exit-level pupil knowledge on the content of a teaching unit, and data were then aggregated to demonstrate changes in pupil learning.
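The Longwood example above rests on pretest-posttest comparisons aggregated across a teaching unit. The article does not specify how those gains were computed, so the following sketch is only illustrative: hypothetical pupil records in Python, summarized as a mean gain and the share of pupils who improved.

# Illustrative only: aggregate pretest/posttest scores from one teaching unit
# into a simple summary of pupil learning gains. Records, scales, and the
# mean-gain summary are hypothetical, not Longwood's actual rubric.
from statistics import mean

# Each record: one pupil's score before and after the candidate's unit.
unit_scores = [
    {"pupil": "A", "pretest": 42, "posttest": 71},
    {"pupil": "B", "pretest": 55, "posttest": 80},
    {"pupil": "C", "pretest": 60, "posttest": 64},
]

gains = [r["posttest"] - r["pretest"] for r in unit_scores]
summary = {
    "n_pupils": len(unit_scores),
    "mean_pretest": mean(r["pretest"] for r in unit_scores),
    "mean_posttest": mean(r["posttest"] for r in unit_scores),
    "mean_gain": mean(gains),
    "pct_improved": 100 * sum(g > 0 for g in gains) / len(gains),
}
print(summary)

A real teacher work sample analysis would also have to weigh test alignment, missing data, and class composition, none of which this sketch addresses.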


AASCU has now reviewed 130 applications for the McAuliffe Award. It became apparent to those reviewing the applications that institutions are collecting data on a wide variety of measures, in a wide variety of ways, using a wide variety of methods. Some are measuring individuals, whereas others measure groups. Some are using local measures, whereas others are using national measures. Some measure only once, whereas others measure multiple times. Some are using only a single measure, whereas others are using multiple measures. And some measure results only against themselves, whereas others measure their progress against others. It is clear that institutions not only are facing an incredible burden but also are working essentially in isolation, without national guidelines or national perspectives. In effect, everyone is trying to invent (or reinvent) wheels to produce evidence, albeit some of the wheels are without tires, some are broken, and some are not even round.

To expand on what we had learned from the applications for the McAuliffe Award, AASCU leaders developed a project to discover what all AASCU campuses (and others) are doing to provide credible and persuasive evidence of the effectiveness of their programs to schools, parents, policy makers, and the public. This project is based on two premises. First, we assumed that teacher education accountability is important and legitimate; public institutions have a public obligation to be accountable.

Second, the project is based on the premise that robust evidence systems must be in place to achieve educational outcomes, to guide program improvement, and to assure and protect the public. However, we are skeptical about whether any of the current approaches to collecting data have the power to provide such robust evidence. Given current limitations on design and data collection, we are concerned about the capacity of most teacher education programs and states to provide evidence about the impact of their programs at this point in time. Although some states are developing data systems that will be capable of tracking the achievement of individual students and teacher education program graduates, it is imperative that we not wait until these more elegant systems are developed before we focus on developing systems that demonstrate, in credible, persuasive, and useful ways, the impact of teacher education programs.

THE AASCU SURVEY

AASCU is committed to finding promising pathways for institutions to pursue based on current practice and emerging consensus in the field about the role of evidence in teacher education. We had three guiding principles. First, we wanted to understand the real world of practitioners, including teacher educators, state officials, and others who have the responsibility to prepare candidates. Second, we wanted to be informed by scholars and theoreticians who are working on the issues of evidence in teacher education. And third, we wanted to find out from policy makers and legislators what evidence they would find credible and persuasive.

A review of the research on teacher preparation and student achievement helped inform both the need and the design for the project. In a review of research on teacher preparation, Wilson, Floden, and Ferrini-Mundy (2001) concluded that the design and reporting of research on teacher preparation should be explicit about connections to improving student achievement:


Research on teacher preparation, like other education research, should contribute to our understanding of how to improve student achievement. . . . To help practitioners and policymakers see the contributions of the research, reports should make the connections to student achievement explicit, using measures of teacher knowledge, skill, and practice that are thought important for effective teaching. Because the effects of teacher preparation on student achievement are distant in time and complicated by other intervening events, it is seldom practical to gather student achievement data as part of teacher preparation research. But improving student achievement remains the ultimate goal. (p. 33)

Additional reviews comment on the dearth of research and information on the connections between teacher preparation and student achievement. Jennifer King Rice (2003) reviewed research on teaching quality and concluded that not much is known about the dynamics among multiple facets of teaching, teacher effectiveness with special populations, and teacher effects in elementary schools and in high school subjects outside of math and science. Michael Allen (2003) reviewed 92 studies for the Education Commission of the States to answer eight questions about teacher preparation that are particularly important to policy and education leaders. Two of the report's recommendations are particularly relevant to the AASCU project: make education research more responsive to the needs of policy makers and practitioners and more accessible to all stakeholders, and make the connections to student achievement as explicit as possible. Finally, the report from the American Educational Research Association's Panel on Research and Teacher Education (Cochran-Smith & Zeichner, 2005) identifies specific research needs in teacher education. Few studies were found that address the connection between teacher preparation and student learning:

Given the current political context . . . where pressures on teacher education programs have intensified to demonstrate the connection of their work to student learning, we think that greater efforts need to be made by researchers to connect teacher education to student learning. (Cochran-Smith & Zeichner, 2005, p. 743)

SURVEY RESULTS: METHODS FOR COLLECTING EVIDENCE OF PROGRAM EFFECTIVENESS

Informed by the research reviews noted above, AASCU leaders asked chief academic officers at AASCU institutions to help in gathering information about their teacher preparation programs.

They were asked to forward the request to the person who would be most knowledgeable about teacher preparation and able to complete the survey. The survey asked how institutions assess the content knowledge, the classroom performance, and the P-12 student learning of their program graduates; how programs track the retention of graduates; what data collection and analysis procedures are used; what mandates institutions are under to collect and report information; and what issues exist in relation to accessing data (see Table 1). Survey questions were open-ended to allow for descriptive responses. Terms were not defined because we were interested in what is actually occurring and did not want responses to be modified to fit preexisting categories. As a result of this approach, we made some judgments when assigning categories of response. For example, teacher work sample and work sample were considered to be similar, as were evaluation by the cooperating teacher and ratings of the student teacher by the cooperating teacher. To analyze the data, the survey questions and responses were entered into an Excel database. Multiple readings of responses and searches for common language yielded categories and themes that enabled us to calculate counts and percentages of responses.

We received responses from 240 institutions, or 65% of all AASCU institutions that prepare teachers. Taken together, these institutions represent full-time equivalent enrollments approaching 2,000,000 students and confer more than 65,000 education degrees each year. Of the institutions, 38% are located in rural areas, 38% in metropolitan areas, and 24% in urban areas. Table 2 illustrates the types of institutions that responded; this distribution is reasonably representative of all AASCU institutions.
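The analysis described above reduces open-ended survey responses to categories, counts, and percentages. As a rough illustration only (the category labels and responses below are hypothetical, not the AASCU data), the tallying step can be sketched in a few lines of Python:

# Illustrative sketch of tallying categorized open-ended survey responses.
# Categories and responses here are hypothetical, not the AASCU survey data.
from collections import Counter

# After multiple readings, each open-ended response is assigned a category;
# near-synonyms (e.g., "work sample" and "teacher work sample") share one label.
coded_responses = [
    "teacher work sample", "survey", "observation by experts",
    "teacher work sample", "portfolio", "survey", "observation by experts",
]

counts = Counter(coded_responses)
total = len(coded_responses)
for category, n in counts.most_common():
    print(f"{category}: {n} responses ({100 * n / total:.0f}%)")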

The survey reveals that institutions are using similar measures and instruments to collect effectiveness data, such as work samples and surveys.


TABLE 1

Survey Questions

Results and Outcomes
1. Do you assess the content knowledge of your program graduates? If so,
   a. When is content knowledge measured within your teacher preparation programs?
   b. If you use a state test that is required for licensure, answer the following questions:
      i. What tests do you use?
      ii. Do you receive the scores when the candidates take the tests?
      iii. How do you use the scores?
   c. If you don't use a state test, answer the following questions:
      i. What measures do you use?
      ii. What types of evidence do you collect? Please provide examples.
      iii. How do you ensure validity and reliability of your assessments?
   d. Do you use multiple sources of evidence?
2. Do you assess the classroom performance of your program graduates? If so,
   a. What measures do you use?
   b. What types of evidence do you collect? Please provide examples of the evidence.
   c. Do you use multiple sources of evidence?
   d. How do you ensure validity and reliability of your assessments?
   e. Do you solicit judgments about the adequacy of teacher preparation?
      i. From whom do you collect data? (e.g., state education agency, licensing board)
      ii. What types of data do you collect?
      iii. Is the data aggregated?
      iv. Can you trace the performance data to individual programs?
      v. How do you know if the data is valid?
3. Do you use measures of P-12 student learning to assess your program graduates? If so,
   a. What measures do you use?
   b. Who develops the measures?
   c. Who administers the measures?
   d. Are these assessments intended for district, school, or individual assessment?
   e. What types of evidence do you collect?
   f. Do you use multiple sources of evidence? Please provide examples.
   g. How do you ensure validity and reliability of your assessments?
4. Do you track both the teacher retention and participation of program graduates? If so,
   a. What do you count as participation?
   b. How many years do you track graduates?
   c. What methods do you use?
   d. What are the sources of evidence?

Issues in Measurement
1. Mandates and Expectations
   a. Do you have specific legislative or accreditation requirements, mandates, or expectations that you will measure the performance of program graduates?
   b. If so, please describe both the measurement expectations (what is to be measured) and from whom (which agencies, organizations)?
2. Data Collection and Analysis
   a. What unit collects data for assessment of program graduates?
   b. Are P-12 partner schools involved?
   c. What issues have you encountered in data collection and analysis?
   d. Do you use state or district P-12 standards to measure the performance of your graduates?
3. Unit of Measurement
   a. Do you track the performance of individual graduates or do you use aggregate measures (such as whole school scores, etc.)?
4. Access to Data
   a. Who has access to the data?
   b. Are the data used only in the college/school of education, or shared with others, such as Arts and Science faculty or P-12 partners?
   c. How do you handle public access to teacher education evaluation?
5. Use of Data
   a. How are the data used?
      i. Program review?
      ii. Program improvement?
      iii. Other purposes?


TABLE 2

Survey Responders

Carnegie Classification                          Number    Percentage
Doctoral/Research universities—Extensive              7         3.0
Doctoral/Research universities—Intensive             25        10.8
Masters colleges and universities—I                  156        67.2
Masters colleges and universities—II                  12         5.2
Baccalaureate colleges—Liberal arts                    9         3.9
Baccalaureate colleges—General                        20         8.6
Baccalaureate/Associates colleges                      2         0.9
Teachers colleges                                      1         0.4

NOTE: Complete data were not available for all institutions.

They are responding to mandates from their respective state education departments and from national accrediting agencies that they be accountable for the learning of both teacher candidates and the P-12 students they teach. They are aware of the need to demonstrate this accountability, but many are still conceptualizing what accountability means and what constitutes evidence of accountability, particularly for P-12 student learning. Many institutions are in the planning stages or involved in piloting systems to collect performance data. Others appear to be revising methods they had used previously to conform to new expectations. Although some have difficulty accessing the data they need, others are able to access information, often with assistance from their states or their university systems.

Most individual institutions are struggling to respond to outside mandates for evidence of program effectiveness in isolation from other institutions. They do not appear to be able to organize and interpret the data in ways that would provide an effective response to outside mandates. Different data requirements and differing definitions for all of the formal reporting requirements of state, federal, and national accreditation make analysis extremely difficult for many respondents. Nor is it clear that there are structures in place to use the data to inform ongoing change. Lack of data management systems, lack of access to data, and lack of a consistent methodology to gather and analyze data were often cited as impediments.

Across the 240 institutions we surveyed, we found that evidence of program effectiveness is gathered through four primary methods: (a) observation systems more or less supported by faculty-developed rubrics and program standards from professional associations (e.g., International Reading Association; National Council of Teachers of Mathematics; Association for Childhood Education International; for more information, see http://www.ncate.org/public/standards.asp); (b) surveys of P-12 cooperating teachers, school principals, and program graduates, both during the program and in follow-up data gathering; (c) work samples/portfolios of teacher candidates, usually developed during methods courses or student teaching; and (d) state teacher certification tests, particularly Praxis I and II.

We were interested in knowing how institutions assess the depth of content knowledge that teacher candidates possess. We found that virtually all institutions assess content knowledge of their teacher candidates in some systematic way. Those few that do not assess content knowledge reported they are currently developing ways to do so. Institutions that do assess content knowledge generally use either Praxis tests from the Educational Testing Service or state-designed tests used for teacher certification purposes (e.g., Illinois, Massachusetts, Texas, Michigan, Oklahoma, California). These content tests are used in a variety of ways and at different points in the preservice program: program entrance or exit, admittance to student teaching, or recommendation for certification. Programs seemed to be using these tests to verify that their teacher candidates had sufficient content knowledge to be effective in classrooms. However, the recent report from the American Educational Research Association's Panel on Research and Teacher Education (Cochran-Smith & Zeichner, 2005) recommends that the content and concurrent validity of such tests be assessed and that research is needed concerning their predictive and consequential validity (Wilson & Youngs, 2005). Although the No Child Left Behind Act of 2001 requires that teachers demonstrate their content knowledge to be considered "highly qualified," it is not clear that passing a content test provides sufficient evidence that teacher candidates possess the knowledge they need to be successful with P-12 students.


We also found that most programs assess the classroom performance of their program graduates, although in a wide variety of ways. Assessments include portfolios, teacher work samples, surveys, self-assessment, observation by experts, and anecdotal information. Observation by experts is used by all of the institutions surveyed as a way to assess the classroom performance of teacher candidates. Our observation by experts category includes student teacher supervisors or teacher educators reviewing videotapes of classroom performance and assessing that performance using an observational instrument; microteaching (practice lessons, usually taught to peers during classes on specific pedagogy) with feedback provided by the professor; observing candidates during student teaching or evaluating practicum experiences; state-developed evaluation systems; national, state, or systemwide testing (e.g., Praxis III and other statewide observation protocols); and induction programs. The following three responses, from survey respondents at three different institutions, reflect the range and variety of assessment methods used to gather classroom performance data:

Classroom performance is assessed throughout the programs. There are five levels of clinical experiences. In-course simulations, individual tutoring, practicum experiences, candidate internships, and a two-year follow up for those needing on-the-job assistance.

The standards we use in our teacher education programs require candidates to provide evidence of their competence in professional knowledge and skills. We use a portfolio assessment for such as well as an internship final evaluation that is based upon our standards.

The teacher education unit has developed assessments in the form of evaluation of candidate performance which are completed by K-12 mentor teachers/administrators and university-based supervisors. Additional assessments include teacher work samples at the undergraduate level and similar comprehensive assessments at the graduate level.

To assess classroom performance, the institutions use pretesting and posttesting of P-12 students, teacher work samples developed during student teaching, field experience assignments, portfolios with artifacts demonstrating teacher candidate actions, surveys, narratives, testimonials, anecdotes, interview data, checklist ratings, classroom observations with observation protocols, evaluations from residency year, Praxis III data, videotape analysis, state-aggregated data, principal-aggregated data, and self-report. All of the institutions indicated they use multiple sources of evidence to assess classroom performance, usually combinations of the types listed above. It is clear that great amounts of data are collected, but the survey does not reveal how the different types of evidence are aggregated or how they are used to demonstrate effectiveness. The following examples from three different completed surveys show the wide range of types of evidence collected:

The School of Education collects observation data, using ADEPT observation forms. The School of Education also uses various artifacts such as candidate work samples of pre-, post- student assessment evaluations.

Observation data are collected as well as candidate evaluation of impact on student learning. A final evaluation includes assessment of competence in teaching tasks. Examples include asking open questions, lesson planning and delivery, classroom management, professional behavior.

Types of evidence collected include critical performances; course projects like TWS and various components of the TWS; feedback from student presentations/products/models; feedback from classroom teachers who house students during field experiences; feedback on recommendation forms submitted by faculty.

The survey asked about validity and reliability of classroom performance assessments because we were interested in how programs ensure that other stakeholders will see their instruments as credible. Of the respondents, 15% did not respond to this question at all. Of those who did respond, approximately 10% said they use face validity, content validity, or construct validity. Approximately 20% identified interrater reliability, triangulation of data, or correlation studies as methods for ensuring the validity and reliability of their assessment measures.


Some responses demonstrate sophisticated attempts to ensure reliability and validity, others demonstrate reliance on outside sources (e.g., state-validated instruments) to take care of the issue, whereas still others demonstrate the use of fairly simplistic methods to attend to validity and reliability. For some institutions, the process of gathering evidence of classroom performance is a new endeavor, and they have not addressed the validity and reliability issue fully. The following excerpts from different survey responses reflect the range of responses:

There has been no formal determination of the reliability or validity of the assessments we use. Our measures are new. We only have content validation up to now.

The PEPE (Alabama Professional Education Personnel Evaluation Program) has been through a validation process (state).

State tests and other tests used in our program have established validity and reliability. For some assessments, content experts establish reliability and validity.

PRAXIS (III) tests are constructed by ETS and meet the organization’s expectations for reliability and validity. Trained observers collect candidate’s data and multiple evaluators evaluate all candidate work samples.

We conduct inter-rater reliability studies and faculty continually examines the validity (usefulness, predictive validity, content validity) of assessments and makes necessary revisions on a 2-year cycle.
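Several respondents mention interrater reliability studies without describing the computation. The following minimal Python sketch, with hypothetical ratings from two observers of the same lessons, shows one common pair of statistics: simple percent agreement and Cohen's kappa, which adjusts agreement for chance.

# Illustrative only: interrater agreement between two observers who rated the
# same set of lessons on a 3-point scale. Ratings are hypothetical.
from collections import Counter

rater_a = ["meets", "exceeds", "meets", "below", "meets", "exceeds", "meets", "below"]
rater_b = ["meets", "exceeds", "below", "below", "meets", "meets", "meets", "below"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # percent agreement

# Cohen's kappa corrects agreement for chance, using each rater's marginal rates.
freq_a, freq_b = Counter(rater_a), Counter(rater_b)
expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(rater_a) | set(rater_b))
kappa = (observed - expected) / (1 - expected)

print(f"percent agreement = {observed:.2f}, Cohen's kappa = {kappa:.2f}")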

Teacher education programs, by design, involve a large number of professionals from outside the university. Some, such as cooperating teachers, mentor teachers, and principals, are directly involved with program graduates. Others, such as local school districts and parents, develop opinions about teacher education programs after experiencing the work of program graduates. We were interested in how institutions solicit judgments about the adequacy of the classroom performance of their graduates from various stakeholders and how they validate those judgments. Survey responses indicate that virtually all institutions solicit judgments about their programs from a wide range of sources, including program graduates, P-12 schools (cooperating teachers, school administrators, mentor teachers, induction programs, and district superintendents), local advisory boards, state licensing boards and education agencies, national and regional accrediting agencies, and colleges of arts and sciences. Approximately 50% use surveys, and the remainder use a variety of methods to gather information, including licensure and hiring reports, portfolios and K-12 student work samples, state assessments of teaching scores, accreditation reports, and reports from districts and states.


However, about a quarter of the institutions did not respond at all to questions about validity of judgment data. For those institutions that do attempt to validate judgment data, face validity, content validity, and triangulation of data were the methods mentioned most often. Institutions in states that had developed classroom performance assessment systems rely on them to validate the instruments. The following four responses from survey completers illustrate the variety of responses:

Forms have face and content validity. Reliability is established by using a single clinical evaluation instrument across the unit and by providing training to evaluators in the proper use of the instrument. This is an ongoing concern.

It is often difficult to develop reliability with judgment data and extremely difficult, if not impossible, when judgments are sought from such a wide range of perspectives. It is not surprising, therefore, that so many programs find it a challenging task.

We were particularly interested in how programs use measures of P-12 student learning to assess their programs and program graduates. However, we received the fewest responses to this question in our survey. Almost 50% of the responders either did not complete this question or indicated that they do not use measures of P-12 learning to assess their graduates. Approximately 20% of responders use teacher work sample methods to gather P-12 student learning data. Only 10% of responders indicated they use some form of P-12 student test data to assess effectiveness of their graduates.


Responses from others indicate that gathering data is difficult and often subjective. Measures of P-12 student learning include performance assessments/portfolios (including samples of student work, in-class test results), teacher work samples/pretest-posttest measures completed by teacher candidates (usually done during student teaching, including prescribed components and analysis of student learning during a period of instruction), and state tests (standard skills K-12 and end of grade). Other measures include observations with rubrics (e.g., Pathwise, which is a comprehensive mentoring and support program for beginning teachers developed by the Educational Testing Service); state-designed performance assessments; and teacher evaluations. These types of measures typically address the link between the teachers' behavior and students' learning, asking, for example, whether the teacher reflects on the extent to which the learning goals were met or if the teacher adjusts learning activities to aid in student learning. The following responses from three different survey completers reveal some of the ways institutions are gathering P-12 student learning data as well as what is planned by other institutions:

During student teaching—Candidates complete a Teacher Work Sample that focuses on their ability to document their impact on P-12 learning. They are observed several times and evaluated using the current KTIP observation instrument. After graduation—currently state-testing data is only reported at the district and school level with no direct link between student achievement data and the teacher of record.

At the current time we only track P-12 student learning while candidates are in our program. We are working with the state to track P-12 student data once candidates graduate from our program. However, we do survey principals and graduates as to what impact they have on P-12 student learning. Although subjective, it does give us some information.

(Our university system) is undertaking a one-time study of the effects of teacher preparation on student learning in state public schools. The dependent variables in this study will be two sets of standardized examinations that are administered in all K-12 schools: one set of norm-referenced examinations and one set of criterion-referenced examinations. Both sets of examinations yield information about the teaching and learning of literacy, other language skills, mathematics, science and history–social studies beginning in grade 2 and ending in grade 11. In this study, we are assessing the effectiveness of each teacher education program by aggregating learning data and proficiency data among all students whose teachers are graduates of a particular teacher preparation program. Additionally, we are assembling student demographic data (e.g. English language proficiency) and student socio-economic data (e.g. eligibility for reduced-price school meals) for the purpose of controlling these factors in the analysis of teacher preparation effects.
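The system study quoted above aggregates student outcomes by preparation program while controlling for demographic and socioeconomic factors, but the survey response does not describe the model. The sketch below is a hypothetical illustration of that general shape, using invented columns and a simple ordinary least squares fit with statsmodels; an actual study of this kind would require far more careful specification (prior achievement, nesting of students within classrooms, and so on).

# Illustrative only: estimate program-level differences in student test scores
# while controlling for student background factors. Data and column names are
# hypothetical; real analyses would involve much more careful modeling.
import pandas as pd
import statsmodels.formula.api as smf

students = pd.DataFrame({
    "score": [212, 230, 198, 241, 225, 205, 219, 233],
    "program": ["A", "A", "B", "B", "A", "B", "A", "B"],  # teacher's prep program
    "english_learner": [1, 0, 1, 0, 0, 1, 0, 0],
    "reduced_price_meals": [1, 0, 1, 0, 1, 1, 0, 0],
})

# OLS with program as a categorical predictor and background controls;
# the program coefficients indicate adjusted differences between programs.
model = smf.ols(
    "score ~ C(program) + english_learner + reduced_price_meals",
    data=students,
).fit()
print(model.params)

# Unadjusted comparison, for contrast: mean score by program.
print(students.groupby("program")["score"].mean())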

Institutions in some states indicated that they are able to use data supplied by the state in addition to the data gathered during candidates' programs. They reported using these data to assess individual candidates. Most of the data appear to come from induction programs. States that make databases available included Colorado (survey Years 1 and 3), Connecticut (BEST), Illinois (beginning in 2005), Kentucky (KTIP), Louisiana (statewide induction, LATAAP), Maine (Standards for Initial Teacher Certification), Missouri (Missouri Assessment Program), North Carolina (surveys every year), Ohio (Pathwise and Praxis III), Oklahoma (residency year program), South Carolina (ADEPT), Tennessee (Framework for Evaluation and Professional Growth), Texas (Accountability System for Educator Preparation–TAKS), Vermont (tracks performance data through surveys during the course of 5 years), and Virginia (working on database for all teachers).

Almost 50% of institutions did not respond when asked how they ensure validity and reliability of their measures of P-12 student learning to assess program graduates. Approximately 25% reported they do not assess for validity and reliability of their in-house–developed instruments. Programs that use state tests assume validity and reliability have been determined by the state. The remainder of the programs use a range of methods, including interrater reliability and correlation analysis, to validate their instruments. The following excerpts from four different surveys suggest this range:


We ensure validity & reliability through multiple measures for correlation analysis.

Inter-rater reliability is re-established at each faculty training session on using the portfolio rubric.

The validity of work sampling methodology/protocols has been widely validated for content validity in Oregon and elsewhere. We are in the process of assessing the consistency in the way in which protocols are used and assessed across student teachers and their university supervisors.

Not sure of the reliability or validity because we only have one year of data at this point.

We found that state colleges and universities are expending extraordinary energy and resources assessing prospective teachers and compiling data about teacher preparation programs. Although programs are collecting voluminous data about the classroom performance of their graduates, many do not appear to be paying sufficient attention to ensuring that their assessments are reliable and valid. This is of particular concern because the measures typically used to gather data are designed in-house by individual institutions, and they must be reliable and valid if they are to be credible to others.

We were interested in knowing whether institutions track the retention and participation of their program graduates. We also wanted to know whether institutions track their graduates to find out whether they are actively involved in the profession in some way and whether they gather data about the effectiveness of their graduates as they advance in their careers. The survey asked how participation in the profession is defined as well—whether data are differentiated enough to identify those program graduates who stay in the profession full-time, go on to positions that are not classroom-based, or work less than full-time. However, most institutions do not differentiate between retention and participation. There is no consensus on how participation is defined, which makes comparison across programs virtually impossible. The following three excerpts reflect the lack of consensus:

(We) annually track participation in K-12 teaching by the graduates of all of the systems' teaching credential programs. As "participation," we count teaching any grade from kindergarten through grade 12 in any school (public or private) in any location throughout the world. We count substitute teaching as well as full- and part-time teaching for more than five months during the year immediately following the completion of the preparation.

Participation is employment as teacher of record in an accredited school.

Participation is teaching at least 1/2 time within 2 years of program completion.

Approximately 50% of the institutions reported they track graduates after program completion. Some institutions gather data only 1 year out, whereas one institution has more than 20 years of tracking data. Typically, data are gathered at 1, 3, and 5 years beyond program completion. We discovered that almost all of the retention data gathered by programs is gathered through surveys. Many institutions reported difficulty in locating graduates, resulting in poor response rates on their surveys. There are also indications that institutions are working with states and systems to develop ways to share data. Early attempts to share data reveal some of the difficulties that need to be overcome, as this example suggests:

We are attempting to (track graduates) with lists of employed teachers provided by the department of education. This is difficult, however, because in our state the workforce database and department of education databases are not linked.

As tracking systems are developed, more and richer kinds of data should become available. Collaboration with states varied, as these excerpts suggest:

We are currently working with the state to have them develop this database, as they host all this information anyway, due to the state teacher test bank.

We are using data available through the Illinois Teacher Service Record and Teacher Certification Information System to track program graduates. The system taps electronic employment databases that are maintained by the State.

We work closely with as many as 800 state school districts and county offices in our search for our teaching graduates. We also tap private vendors who specialize in producing the current addresses of large groups of people.
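The excerpts above all amount to linking completer records against a state employment file. As a rough sketch of that basic operation (hypothetical files keyed on a license number; real state databases often lack such a shared key, which is exactly the difficulty respondents report), a merge might look like this:

# Illustrative only: link program completers to a state employment file to
# estimate how many graduates are teaching. Files, columns, and IDs are
# hypothetical; actual state databases differ and may not share identifiers.
import pandas as pd

completers = pd.DataFrame({
    "license_id": ["K101", "K102", "K103", "K104"],
    "completion_year": [2003, 2003, 2004, 2004],
})
state_employment = pd.DataFrame({
    "license_id": ["K101", "K103", "K900"],
    "district": ["Riverton", "Lakeside", "Elm County"],
    "assignment": ["teacher of record", "teacher of record", "substitute"],
})

linked = completers.merge(state_employment, on="license_id", how="left")
employed = linked["assignment"].notna()
print(linked)
print(f"located in state employment file: {employed.sum()} of {len(linked)} "
      f"({100 * employed.mean():.0f}%)")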

Rather than having each individual institution track graduates in the workforce, pooling the expertise and resources of each state to develop sophisticated databases would likely be more accurate, cost-effective, and useful to teacher education programs and the public.


Our survey asked institutions about a variety of issues in measurement, particularly concerning mandates to collect data, methods for collection and analysis, and ability to access data. We knew that programs had been asked to produce specific evidence of their graduates' knowledge and abilities, mainly as a result of Title II reporting requirements and the highly qualified teacher provisions of the No Child Left Behind Act of 2001. We were particularly interested in who is asking for evidence, what kinds of evidence they are requesting, and what it looks like from the institutional perspective. Almost 90% of respondents indicated they have specific legislative or accreditation requirements, mandates, or expectations to measure the performance of program graduates. Some states require national accreditation; others have state requirements that include performance expectations. The National Council for Accreditation of Teacher Education was mentioned most often as mandating specific requirements.

Despite historic criticism about teacher preparation programs' lack of connection to P-12 schools, most institutions reported they work with P-12 schools to assess their program graduates. Approximately 75% of institutions indicated they work with P-12 partner schools to gather data about the performance of their program graduates, and almost 63% of those use state or district P-12 standards to measure graduate performance.

There are a number of difficult issues involving data collection and analysis that are of concern to programs. Institutions appear to be frustrated by the amount of time and money needed to build and maintain sophisticated data systems, the difficulties they encounter in accessing data, and the lack of compatibility among various databases. The following four responses suggest some of their specific concerns:

1) The amount of time to collect and analyze is considerable, especially for a small program; 2) Different data requirements and differing definitions to be used in collecting and summarizing data for all of the formal reporting requirements of state, federal, and national accreditation make analysis extremely time consuming; and 3) It is a continuing challenge to ensure the reliability and validity of measures of classroom performance by students and graduates.

Major issues include availability of data and reluctance to release data, even aggregated, because of union guidelines and confidentiality issues.

Time, technology and personnel to input, aggregate, analyze and interpret the volume of data generated by a large program and communicating the information in a timely manner to all who need to have it to make better decisions.

Database compatibility within and across organizations; program components assessed at multiple times, by different stakeholders, so results are not comparable; requests, format and questions worded differently for similar information; participant return rate on surveys; lack of state database support and access.

Approximately 50% of programs use aggregated measures, such as whole school scores, as the unit of measurement to track performance. Respondents seemed to assume that comparisons of schoolwide improvement on standardized test scores in schools where graduates teach with improvement in schools without program graduates could be used to demonstrate program effectiveness. Most existing data systems are not able to track individual student scores and match them to individual program graduates. And even when such sophisticated systems exist, as in the Tennessee Value Added Assessment System (for more information, see http://www.shearonforschools.com/TVAAS_index.html), individual institutions are not able to access the P-12 scores of their graduates' students because of privacy regulations. However, almost 35% of the respondents indicated they track individual performance. This appears to be done through teacher work samples and other data they collect while the teacher candidates are still in the preparation program.

A common criticism of teacher preparation programs has been their lack of connection with arts and sciences faculty. It is interesting, however, that 80% of the respondents to this survey indicated they share their data with faculty in arts and sciences, as well as with their P-12 partners. Finally, almost all institutions reported using their data for program review and improvement.


CONCLUSION

The AASCU survey indicates that state colleges and universities across the nation are collecting voluminous amounts of data. AASCU member institutions compile data about a multitude of program variables: they collect data from prospective students; they survey, observe, assess, and test students during program enrollment; and they require students to submit portfolios and various bits of information as prerequisites for credential recommendations. The Christa McAuliffe Award applications, however, also suggest that data collection is idiosyncratic to individual institutions. Some institutions measure individual teachers; others measure schools or grades. Some institutions use standardized tests; others use observational strategies.

However, most of the student achievement data collected, even for evidence of program success, focus on only narrowly defined outcomes, usually on math and language arts skills. Missing are measures of teaching and learning in a variety of academic subjects not tested by standardized tests. The No Child Left Behind Act of 2001 requires annual testing in mathematics and reading and will soon require testing in science. As a result, gathering student achievement data to demonstrate program effectiveness in other subject areas is difficult. But beyond content, also missing are measures of outcomes such as democratic skills, measures of social skills, and measures of self-esteem or confidence as a learner. There is little discussion of programs gathering evidence on a broader set of important indicators, such as those identified by Michelli and Keiser (2005, p. xviii): preparing students to be active, involved participants in democracy; preparing students to have access to knowledge and critical thinking within the disciplines; preparing students to lead rich and rewarding personal lives and to be responsible and responsive community members; and preparing students to assume their highest possible place in the economy.

A number of institutions reported difficulties related to accessing data. In some areas, confidentiality agreements and state privacy laws limit access to pupil data. But beyond these obvious limits, many computer systems are old, databases are formatted in incompatible ways or unable to link to other systems, and a host of other technical issues make the practical problem of sharing data, even if releasable, virtually impossible. It is clear that evidence of the effectiveness of programs will not be possible unless there are collaborative efforts between universities and the school districts that own the data and have a natural interest in learning about the effectiveness of candidates coming to them. Beyond the local partnerships, however, we believe that data systems must also be developed at the state level. Some states are moving ahead with interesting approaches; it is clear that the most advanced state at this point is Louisiana, whose 5-year development effort is moving toward a comprehensive, articulated system that should provide a rich set of data for both program improvement and public assurance.

Our sense from reviewing the survey results, particularly the section on issues in measurement, is that institutions are besieged by demands for data and frustrated by the amount of time and energy they are devoting to the collection process. The wide array of data required by different groups makes it difficult for programs to build data systems that are useful for program development, teacher quality improvement, and the development of public trust. We would argue that institutions need proactively to develop data systems to guide continuous program improvement, to achieve educational outcomes, and to assure their publics that they are graduating teachers who can positively affect the learning of P-12 students.

Evidence of teacher program effectiveness is needed by a variety of relevant constituencies, including policy makers such as regents/boards of education, higher education agencies, legislators who allocate resources, local communities, and parents. Evidence is also needed for licensure to determine whether a teacher possesses a sufficient level of knowledge and skill to perform effectively and responsibly.


Finally, and perhaps most important, evidence is needed for program improvement to inform institutions as they evaluate their work and modify or redesign programs based on the data gathered. These different purposes cannot all be served by the same data. What is useful and credible to one audience might be unacceptable to another. For example, licensing agencies might want data on state and national tests of teacher knowledge; teacher education programs might want more specific data on the efficacy of their courses in methodology; local education agencies might want teacher retention data, principal satisfaction survey data, and student achievement data; whereas legislators might want data about subject-matter preparation and expenditures to prepare teachers.

The mere compilation of data is insufficient for program improvement and accountability. Data must be organized in a way that makes them valuable in the conceptualization, planning, assembly, analysis, interpretation, and use of evidence for accountability and program improvement. To be successful, institutions need the assistance and support of all stakeholders. There is movement in this direction. Leaders in states and systems are creating policy environments and statewide frameworks to support the development of a culture of evidence in teacher preparation. They are establishing the partnerships and developing the accountability systems, together with their teacher preparation institutions, that are necessary for institutional success. They are creating and pilot testing systems to assess teacher program effectiveness as well as building technology that will allow data to be shared in useful ways. The states of Louisiana, Virginia, Ohio, South Carolina, and Georgia as well as the California State University, Texas A&M University, and City University of New York systems are leading examples. Without supportive policy environments, particularly at the state level, the work of teacher preparation programs to use evidence to achieve educational outcomes, to guide program improvement, and to assure and protect the public is difficult if not impossible.

We began this project to help our member institutions identify ways they could provide evidence of the effectiveness of their programs to schools, parents, policy makers, and the public. We had hoped to identify promising pathways, strategies, and methods for collecting evidence that would be both credible and persuasive to policy makers and the public. However, we learned that this was not to be so simple. Mounting external pressure on institutions has resulted in the prevalence of tests and assessments of prospective teachers, new teachers, and teacher preparation. It is ironic that the pervasiveness of these assessments may not be beneficial for teachers, teaching, or schools. Demands from various constituencies require programs to produce different kinds of data, often in different formats, for different purposes. Institutions report demands and expectations from state licensing agencies, national accrediting agencies, specialty program areas, and the federal government. They also must develop data in response to institutional needs, as well as to individual college and program needs. Meeting all of these demands takes valuable time, personnel, and resources away from the most important work of the programs.

We would urge universities to work to develop evidence-based cultures that will demonstrate the quality of teacher education program graduates and be valuable to institutions as well as to the public at large. They need to identify the data needed by various constituencies to provide evidence of quality and areas for improvement. They need to work together with state agencies and professional practice boards, the federal government and national accrediting agencies, university faculty and administrators, K-12 partners, and policy makers to accomplish this task. Consensus about what data are useful, at what levels, and for what purposes needs to be reached.

All stakeholders, including policy makers, should develop a national framework collaboratively. It should be broadly agreed on by legislators and the public. It will need to be operationalized state-by-state, given the various state agencies and policy environments that exist. It must be reliable and valid to ensure credibility.


Cost should certainly be a consideration but not a deterrent. Finally, it should be able to measure the effectiveness of individual programs in ways that are meaningful. A transparent framework, one in which it is clear that student learning is the centerpiece, would help teacher preparation programs clarify to all stakeholders that what they do matters in P-12 student learning. It would enable them to redesign or refocus their programs without guessing what data would be considered credible and persuasive evidence of the quality of their graduates to their publics. They would be able to focus their time and energy on improving the quality of the teachers they prepare, guided by their data, rather than reacting to the demands of policy makers. They would be able to work together for quality and aggregate new knowledge, as do professionals in other fields. A framework would allow for policy and licensure oversight, increase the ability to compare programs, and lead to public assurance that universities are preparing quality teachers. Such a framework would result in increased numbers of highly effective teachers, increased professionalization of teacher education programs, and growing public confidence.

REFERENCES

Allen, M. (2003). Eight questions on teacher preparation: What does the research say? Denver, CO: Education Commission of the States.

Cochran-Smith, M., & Zeichner, K. (Eds.). (2005). Studying teacher education: The report of the AERA Panel on Research and Teacher Education. Washington, DC: American Educational Research Association.

Education Commission of the States. (2000). Two paths to quality teaching. Denver, CO: Author.

Education Testing Service. (2002). A national priority: Americans speak on teacher quality. Princeton, NJ: Author.

Feistritzer, E. (2004). Alternative teacher certification: A state by state analysis 2004. Washington, DC: National Center for Education Information.


Finn, C. (2003). High hurdles. Education Next. Retrieved August 9, 2004, from http://www.educationnext.org/20032/62.html

Hess, F. (2001). Tear down this wall: The case for a radical overhaul of teacher certification. Washington, DC: Progressive Policy Institute.

Michelli, N. M., & Keiser, D. L. (Eds.). (2005). Teacher education for democracy and social justice. New York: Routledge Falmer.

No Child Left Behind Act of 2001, Pub. L. No. 107-110, 115 Stat. 1425 (2002).

Paige, R. (2002). Meeting the highly qualified teachers challenge: The secretary's annual report on teacher quality. Washington, DC: U.S. Department of Education, Office of Postsecondary Education.

Rice, J. K. (2003). Teacher quality: Understanding the effectiveness of teacher attributes. Washington, DC: Economic Policy Institute.

Wilson, S., Floden, R., & Ferrini-Mundy, J. (2001). Teacher preparation research: Current knowledge, gaps, and recommendations. Washington, DC: Center for the Study of Teaching and Policy.

Wilson, S. M., & Youngs, P. (2005). Research on accountability processes in teacher education. In M. Cochran-Smith & K. Zeichner (Eds.), Studying teacher education: The report of the AERA Panel on Research and Teacher Education (pp. 591-643). Washington, DC: American Educational Research Association.

Zeichner, K. M., & Conklin, H. G. (2005). Teacher education programs. In M. Cochran-Smith & K. Zeichner (Eds.), Studying teacher education: The report of the AERA Panel on Research and Teacher Education (pp. 645-735). Washington, DC: American Educational Research Association.

Mona S. Wineburg is Director of Teacher Education at the American Association of State Colleges and Universities in Washington, DC. She earned a Ph.D. from the University of Maryland–College Park, an M.Ed. from Boston University, and a B.A. from Temple University. She served as Director of Teacher Education at American University, was a teacher education program approval specialist at the Maryland State Department of Education, and taught in the College of Education at the University of Maryland–College Park. Additionally, she was a classroom teacher and learning specialist in K-12 schools before beginning her work in higher education.
