A Context for Science with a Commitment to Behavior Change

VOLUME 8, ISSUE 2 ISSN: 1539-4352 TABLE OF CONTENTS

Page 111: Editorial - Beholden To Other Professions - Joe Cautilli and Mike Weinberg
Page 113: Behavioral Health Management of Space Dwelling Groups: Safe Passage Beyond Earth Orbit - Henry H. Emurian & Joseph V. Brady
Page 136: Evaluating Features of Behavioral Treatments in the Nonhuman Animal Laboratory - John C. Borrero, Timothy R. Vollmer, Andrew L. Samaha, Kimberly N. Sloman, and Monica T. Francisco
Page 145: A Behavior Analytic Look at Contemporary Issues in the Assessment of Child Sexual Abuse - W. Joseph Wyatt
Page 163: Extending Research on the Validity of Brief Reading Comprehension Rate and Level Measures to College Course Success - Robert L. Williams, Christopher H. Skinner, and Kathryn E. Jaspers
Page 175: Within Session FR Pausing - Adam Derenne and Kathryn A. Flannery
Page 187: Evolutionary Psychology and Behavior Analysis: Toward Convergence - Jeremy E. C. Genovese
Page 196: Aggregating Single Case Results - Wim Van den Noortgate and Patrick Onghena
Page 210: Psychotherapy Equivalence - Robin Westmacott & John Hunsley
Page 226: Relative Efficiency Of Stimulation - Carl J. Dunst, Melinda Raab, Orelena Hawks, Linda L. Wilson, and Cindy Parkey
Page 237: Persistent Preference in Concurrent-Chain Schedules - Paul Neuman et al.
Page 253: CE Questions for Daffern Article in BAT 8.1 on Assessing Aggression, written by Michele Katz

THE BEHAVIOR ANALYST TODAY

VOLUME 8, ISSUE 2, 2007

THE BEHAVIOR ANALYST TODAY PUBLISHER’S STATEMENT VOLUME NUMBER 8, ISSUE NUMBER 2 ISSN: 1539-4352

Published: June 30, 2007

The Behavior Analyst Today (BAT) is published quarterly by Joseph Cautilli. BAT is an online, electronic publication of general circulation to the scientific community. BAT’s mission is to provide a concentrated behavior analytic voice among voices that are more cognitive and structural. BAT emphasizes functionalism and behavioral approaches to verbal behavior. Additionally, BAT hopes to highlight the importance of conducting research from a strong theoretical base. BAT areas of interest include, but are not limited to, Clinical Behavior Analysis, Behavior Models of Child Development, Community-based behavior analytic interventions, and Behavioral Philosophy. BAT is an independent publication and is in no way affiliated with any other publications. The materials, articles, and information provided in this journal have been prepared by the staff of the Behavior Analyst Today for informational purposes only. The information contained in this journal is not intended to create any kind of patient-therapist relationship or representation whatsoever.

For a free subscription to The Behavior Analyst Today, send the webmaster an e-mail containing your name, e-mail address, and the word "subscribe" in the subject box, and you will be added to the subscription list. You will receive notice of publication of each new issue via an e-mail that contains a hyperlink to the latest edition. You may also subscribe to the BAT journal by visiting http://www.behavior-analyst-today.com/subscribe.html.

Our Mission

The Behavior Analyst Today is committed to increasing communication between the subdisciplines within behavior analysis, such as behavioral assessment, work with various populations, and basic and applied research. Through achieving this goal, we hope to see less fractionation and greater cohesion within the field. The Behavior Analyst Today strives to be a high-quality journal that also brings up-to-the-minute information on current developments within the field to those who can benefit from those developments. Founded as a newsletter for master's-level practitioners in Pennsylvania, those represented in the clinical behavior analysis SIG at ABA, and those who comprised the BA SIG at the Association for the Advancement of Behavior Therapy, BAT has evolved into a primary form of communication between researchers and practitioners, as well as a primary form of communication for those outside behavior analysis. Thus the Behavior Analyst Today will continue to publish original research, reviews of subdisciplines, theoretical and conceptual work, applied research, program descriptions, research in organizations and the community, clinical work, and curriculum developments. In short, we strive to publish all that is behavior analytic. Our vision is to become the voice of the behavioral community.


THE BEHAVIOR ANALYST TODAY
Meet Our Staff

Lead Editor:

Michael Weinberg, Ph.D., BCBA - Vice President, Professional Education Resources, Milford, MA, USA [email protected]

Senior Associate Editors:

Joseph D. Cautilli, Ph.D., BCBA - Children’s Crisis Treatment Center - [email protected]

Jack Apsche, Ed.D., ABPP - Apsche Center for Evidenced Based Practices - Capital Academy [email protected]

Associate Editors:

Paul Malanga, Ph.D. - University of South Dakota [email protected]

James Cordova, Ph.D. - Clark University [email protected]


EDITORIAL BOARD

Janet Sloand Armstrong, M.Ed., BCBA - Pennsylvania Training and Technical Assistance Network
Erik Arntzen, Ph.D. - University of Oslo, Norway
Warren K. Bickel, Ph.D. - University of Arkansas
Ann Branstetter
Claudia Cardinal, Ph.D. - University of Nevada, Reno
Michael Lamport Commons, Ph.D. - Harvard University
Thomas Critchfield, Ph.D. - Illinois State University
Michael Dougher, Ph.D. - University of New Mexico
Stephen Eversole, Ed.D., BCBA - Behavior Development Solutions
David Feeney, Ed.D. - Temple University
David Greenway, Ph.D. - University of Louisiana at Lafayette
Dana Lewis Haraway, Ph.D. - University of West Florida
Raymond Reed Hardy, Ph.D. - St. Norbert’s College
Jonathan Kanter, Ph.D. - University of Wisconsin, Milwaukee
Lee Kern, Ph.D. - Lehigh University
Mareile Koenig, Ph.D., CCC-SLP, BCBA - West Chester University
Dr. Chris Krägeloh - School of Psychology, Auckland University of Technology
Richard Kubina, Ph.D. - Penn State University
Stephen Ledoux, Ph.D. - SUNY-Canton
Ethan Long, Ph.D. - The Association of University Centers on Disabilities (AUCD)
Stein Lund - Bancroft Neurohealth
Charles Merbitz, Ph.D., BCBA, CRC - The Chicago School of Professional Psychology
Frances McSweeney
Sherry Milchick, M.Ed., BCBA - Pennsylvania Training and Technical Assistance Network
Edward K. Morris, Ph.D. - University of Kansas
Daniel J. Moran, Ph.D. - MidAmerican Psychological Institute, P.C.
John T. Neisworth, Ph.D. - Penn State University
Martha Pelaez, Ph.D. - Regional Advisor on Aging and Health, World Health Organization
Lillian Pelios, Ph.D. - Elwyn Institute
Patrick Progar, Ph.D. - Caldwell College
David Reitman, Ph.D. - Nova University
Lynn Santilli Connor, MSW, LSW, BCBA - Community Treatment Solutions
Sherry Serdikoff, Ph.D., BCBA - Pennsylvania Training and Technical Assistance Network
Jerzy Siuta, Ph.D. - Jagiellonian University, Poland
Ralph Spiga, Ph.D. - Temple University at Episcopal Hospital
Monika Suchowierska, Ph.D. - The Warsaw School of Social Psychology
Chris Riley-Tillman, Ph.D. - University of East Carolina
Darlene Crone-Todd, Ph.D. - Assistant Professor in Psychology, Salem State College
Luc Vandenberghe, Ph.D.
Tom Waltz, Ph.D. - University of Nevada, Reno
Richard Weissman, Ph.D., BCBA - Media Bureau
Vince Winterling, Ed.D. - Devereux Foundation

BAO JOURNALS STAFF

Melissa Apsche - Managing Editor
Halina Dziewolska - Director of Advertising
National Capitol Area Paralegal Concierge - Legal Affairs
Richard Weissman & Craig Thomas - BAO Journals Publishing Management Committee


Author Submission Information

Most contributions are by invitation, and all are then peer-reviewed and edited. The editors, however, welcome unsolicited manuscripts; in that case, we suggest that potential authors send an abstract or short summary of contents, and we will respond as to our interest in a full manuscript submission. In all cases, manuscripts should be submitted electronically, saved in “rich text format” (.rtf), to BOTH Michael Weinberg at [email protected] and Joe Cautilli at [email protected]. Please adhere to APA format and use “Times New Roman” font in 11 pt. throughout. In references, however, please italicize the places where APA format would have you underline. Abstracts and keywords must be included, as well as full contact information for all authors.

All submissions are peer-reviewed and must be accompanied by a signed Assignment of Rights (AOR) form. After peer review and follow-up, all articles are copyedited. Authors have an opportunity to review and approve their manuscript prior to publication. Once approved, authors are responsible for all statements made in their work, including changes made by the copy editor prior to approval.

Peer Review Process

All submitted manuscripts are reviewed initially by the Lead Editor. Manuscripts with insufficient priority for publication will be rejected promptly. Other manuscripts will be sent to the Senior Associate Editor, who will distribute them to editorial consultants with relevant expertise. The editorial consultants will read the papers and evaluate (1) the importance of the topic addressed by the paper; (2) the paper’s conformity to standards of evidence and scholarship; and (3) the clarity of writing style. Comments from the editorial consultants will then be provided to the author(s) for follow-up.

Formatting Requirements:

To support the electronic copy-editing process, authors must honor all of the following guidelines:
• The page set-up for manuscripts must be set for 1-inch margins on all 4 borders.
• All pages must be in portrait orientation. There can be no pages in landscape orientation.
• Manuscripts must be typed in single-spacing using size 11 “Times New Roman” font.
• Manuscripts must be submitted as one continuous document rather than in sections or sub-documents.
• Each manuscript must include 7 elements in the following order: title, name(s) of author(s), abstract, key words, body, references, author(s)’ contact information.
• Do not insert pagination, headers, or footers. (These are inserted in the copy-editing process.)
• The use of headings is encouraged and should be structured according to the guidelines described in the Publication Manual of the American Psychological Association (5th edition).
• If graphics, figures, and tables are used, they must be created in *.jpg or *.bmp format. No Excel graphs will be accepted.
• Graphics, figures, and tables, if used, may be embedded in the body of the manuscript or they may be submitted in a separate MS Word document. If the latter option is chosen, then author(s) must indicate clearly the intended location of each item (graphic, figure, table) within the manuscript so that the copy editor can make the insertions.
• Individual graphics, figures, and tables, when used, may not be larger than one page.
• The caption for a table must be printed above the table. The caption for a figure must be printed below the figure.


• In the references section, please use italics where APA style would allow underlining (e.g., the titles of journals and books).
• Author contact information must include the following 4 elements for each author: name, mailing address, phone, and e-mail.
• Manuscripts must be saved and submitted in RTF format.
• When there is a conflict between the requirements of APA style (see below) and the formatting rules listed here, the formatting rules will supersede the APA requirements.

Manuscript Style Requirements:

With the exception of the above (formatting) guidelines, authors must write their manuscripts in a style that is consistent with the Publication Manual of the American Psychological Association (APA Manual) (5th edition). A copy of this manual may be ordered at http://www.apastyle.org/

Consistent with APA style, authors must use non-sexist language. Please refer to Table 2.1 in the APA Manual for “Guidelines for Unbiased Language.” Also consistent with APA style, authors must use person-first language for referring to individuals with potentially stigmatizing characteristics. Person-first language requires an author to name the individual first, followed by descriptive information (e.g., "child with autism"), rather than to use an adjectival form (e.g., "autistic child") or a nominal form (e.g., "the autistic"). As noted above, when there is a conflict between the requirements of APA style and the formatting rules listed in the above section, the formatting rules will supersede the APA requirements.

General Guidelines for Preparing Abstracts:

The following general guidelines must be honored to ensure that JSLP-ABA will be accepted into the major psych databases. (See PsycINFO website: http://www.apa.org/psycinfo/about/covinfo.html)
• An abstract may not exceed 960 characters and spaces (approximately 120 words). Characters can be conserved by using digits for numbers (except at the beginning of sentences); by using well-known abbreviations; and by using the active voice.
• Begin the abstract with the most important information, but don’t repeat the title.
• Include only the four or five most important concepts, findings, or implications.
• Embed as many key words and phrases in the abstract as possible.
• Include in the abstract only information that appears in the body of the manuscript.
• For the sake of clarity, define all acronyms and abbreviations except for measurements; spell out the names of tests; use generic names for drugs (when possible); and define unique terms.
• Use the present tense to describe results with continuing applicability or conclusions drawn and the past tense to describe variables manipulated or tests applied.
• As much as possible, use the third person rather than the first person.
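Authors who want to check an abstract against the length limit before submitting could use something like the following minimal sketch. It assumes the limit is counted over the text after runs of whitespace are collapsed to single spaces; the guidelines do not say how whitespace is counted, so that normalization is an assumption. The function name `check_abstract` and the constant `MAX_ABSTRACT_CHARS` are hypothetical names chosen for illustration.

```python
MAX_ABSTRACT_CHARS = 960  # limit stated in the guidelines (characters and spaces)


def check_abstract(text: str, max_chars: int = MAX_ABSTRACT_CHARS) -> dict:
    """Report the character count, word count, and whether the limit is met.

    Assumption: runs of whitespace (including line breaks) are collapsed to a
    single space before counting.
    """
    normalized = " ".join(text.split())
    words = normalized.split(" ") if normalized else []
    return {
        "chars": len(normalized),
        "words": len(words),
        "within_limit": len(normalized) <= max_chars,
    }


if __name__ == "__main__":
    sample = "Behavior analysis continues to struggle with professional recognition."
    print(check_abstract(sample))
```

The word count is informational only; the guidelines state the hard limit in characters, with 120 words given as an approximation.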

Abstracts for Empirical Studies:

Abstracts for empirical studies are also generally about 100 to 120 words in length. They should include the following information:
• Problem under investigation (in one sentence)
• Pertinent characteristics of participants (e.g., number, type, age, sex, genus and species)
• Experimental method, including apparatus, data-gathering procedures, and complete test names and complete generic names and dosage and routes of administration of any drugs (particularly if the drugs are novel or important to the study)
• Findings, including statistical significance levels
• Conclusions and implications or applications

Abstracts for Literature Reviews and Theoretical Articles

Abstracts for review or theoretical articles are generally about 75 to 100 words in length, and they include the following information:
• The topic (in one sentence)
• The purpose, thesis, or organizing construct and the scope (comprehensive or selective) of the article
• Sources used (e.g., personal observation, published literature)
• Conclusions

Thank you! The Behavior Analyst Online Journals Department

YOUR AD HERE!
Advertising in the Behavior Analyst Today

Advertising is available in The Behavior Analyst Today. All advertising must be paid for in advance. Make your check payable to Joseph Cautilli. The ad copy should be in our hands at least 3 weeks prior to publication. Copy should be in MS Word or WordPerfect, RTF format, and the advertiser should include graphics or logos with the ad copy.

The prices for advertising in one issue are as follows:
1/4 Page: $50.00
1/2 Page: $100.00
Full Page: $200.00

If you wish to run the same ad in multiple issues/titles for the year, you are eligible for the following discount:
1/4 Page: $40.00 per issue
1/2 Page: $75.00 per issue
Full Page: $150.00 per issue

An additional one-time layout/composition fee of $25.00 is applicable.

For more information, or to place an ad, contact Halina Dziewolska by phone at (215) 462-6737 or e-mail at: [email protected]


Editorial on Being Beholden to Other Professions

Joseph Cautilli, Ph.D., BCBA
Michael Weinberg, Ph.D., BCBA

Behavior analysis continues to struggle with professional recognition. This struggle persists despite mounting research evidence that shows behavior analysis to be the treatment of choice for a variety of problems. One area in which behavior analytic studies have shown particular success is lowering recidivism among offenders.

Keywords: Behavior analysis, professional recognition, recidivism, offenders

A recent meta-analytic article looking at program impact for offender populations found that Behavior Therapy (operant and respondent conditioning principles, antecedent control strategies, self-control training, etc.) and Cognitive Behavior Therapy were the only two treatments that produced an effect (Redondo-Illescas, Sanchez-Meca, & Garrido-Genovaes, 2001). These two interventions led to a 12-15% decrease in recidivism over two years post-treatment. All the other interventions, including therapeutic communities, non-behavioral treatments, dissuasion, and diversion programs, produced substantially lower effects. With crime so out of control across this nation, it would seem that a call for behavioral programs in prison would be one of the top public outcries, but it is not.

Part of the reason for this is that behavioral programs have received a great deal of bad press in the treatment communities and are often beholden to other professions in the community. For example, behavior analysts might work under nurses or psychiatrists in the hospital. This has an effect on the behavior targeted for intervention programs. Take, for example, the literature by behavior analysts on token systems in institutions in the 1970s. The whole body of literature was focused on building behaviors, a constructional view (Goldiamond, 1974), with patients setting goals and interventions directed toward community living on the outside (Atthowe, 1973; Bassett, Blanchard, & Koshland, 1975; Milby, Pendergrass, & Clarke, 1975; Fairweather, Sanders, Cresseler, & Maynard, 1969; Rybolt, 1975; Schwartz & Bellack, 1975). Yet, when Page, Caron, and Yates (1975) did their famous survey of token systems, they found that out of 280 programs surveyed, almost all token systems were co-opted to just enforce nursing routines.

This is the problem when one field is beholden to another. Being beholden to other professions is harmful or at least not in the consumer's best interest. The tools of behavior analysts become secondary to helping the other profession achieve its professional goals. Thus, instead of building behaviors that would prevent rehospitalization, the behavior analyst finds himself or herself designing programs focused solely on getting the client to take medication and comply with hospital routines. Our argument is not that these elements are unimportant, just that they should not be the sole focus of behavior analytic treatment. If they became the sole focus, that would violate the ethical code of behavior analysts.

References

Atthowe, J.M. (1973). Token economies come of age. Behavior Therapy, 4, 646-654.


Atthowe, J.M. (1975). Legal and ethical accountability in everyday practice. Behavioral Engineering, 3, 35-38.
Bassett, J.E., Blanchard, E.B., & Koshland, E. (1975). Applied behavior analysis in the penal setting: Targeting “free world” behaviors. Behavior Therapy, 6, 639-648.
Fairweather, G.W., Sanders, D.H., Cresseler, D.L., & Maynard, H. (1969). Community life for the mentally ill. Chicago: Aldine.
Goldiamond, I. (1974). Toward a constructional approach to social problems: Ethical and constitutional issues raised by applied behavior analysis. Behaviorism, 2(1), 38-54.
Milby, J.B., Pendergrass, P.E., & Clarke, C.H. (1975). Token economy versus control world: A comparison of staff and patient attitudes toward ward environments. Behavior Therapy, 6, 22-29.
Page, S., Caron, P., & Yates, E. (1975). Behavior modification methods and institutional psychology. Professional Psychology, 29, 263-291.
Redondo-Illescas, S., Sanchez-Meca, J., & Garrido-Genovaes, V. (2001). Treatment of offenders and recidivism: Assessment of the effectiveness of programmes applied in Europe. Psychology in Spain, 5(1), 47-62.
Rybolt, G.A. (1975). Token reinforcement therapy with chronic psychiatric patients: A three-year evaluation. Journal of Behavior Therapy and Experimental Psychiatry, 6, 188-191.
Schwartz, J., & Bellack, A.S. (1975). A comparison of a token economy with standard inpatient treatment. Journal of Consulting and Clinical Psychology, 43, 107-108.

Author Contact Information

Michael Weinberg, Ph.D., BCBA, Lead Editor
Vice President, Professional Education Resources and Conference Services
321 Fortune Boulevard
Milford, MA 01757
Tel.: 508-473-3422 ext. 295
Fax: 508-478-8794
Web: www.percs.info

Joseph D. Cautilli, Ph.D., BCBA, LPC, Senior Associate Editor
Children’s Crisis Treatment Center
1823 Callowhill St.
Philadelphia, PA 19130-4197
[email protected]


Behavioral Health Management of Space Dwelling Groups: Safe Passage Beyond Earth Orbit

Henry H. Emurian & Joseph V. Brady

Plans to pursue space expeditionary missions beyond Earth orbit have occasioned renewed concern that crew behavioral health and performance effectiveness, along with spacecraft habitability, will present major challenges to the success of spaceflight initiatives involving unprecedented increases in time and distance on interplanetary voyages. A programmed environment methodological approach that implements supportive performance and research-based behavioral technologies can contribute to meeting these challenges in furtherance of overcoming the ecologically constrained and inherently stressful circumstances of long-duration spaceflight missions by members of confined microsocieties. This paper presents the background context and rationale for applying behavior analytic methods and procedures to support individual and crew performance effectiveness and adaptation for long-duration spaceflight missions beyond Earth orbit, such as a mission to Mars.

Keywords: Programmed environment, behavioral program, confined microsocieties.

NASA’s Vision for Space Exploration calls for humans to return to the moon by the end of the next decade, paving the way for eventual journeys to Mars and beyond[1]. Orion is the vehicle that NASA’s Constellation Program is developing to carry a new generation of explorers back to the moon and later to Mars. Orion will succeed the space shuttle as NASA's primary vehicle for human space exploration. According to a recent statement by Robert Zubrin, President of The Mars Society and advocate of the Mars Direct plan (Zubrin, 2000), “We could be on Mars in 10 years without a doubt” (Sullivan, 2006). And a conclusion stated within the 2004 Garriott-Griffin report[2] on a strategy for the proposed U.S. space exploration policy was as follows: “We believe that human landings on the Moon or on Mars can begin about 2020” (p. 8). In that regard, Manzey (2004) estimates that a low-energy trajectory mission to Mars will require a minimum of 800 days, to include 200 days to reach Mars, 400 days on the surface of Mars, and 200 days to return to Earth.

Despite these encouraging developments, expectations, and estimates that are based on the overwhelming technological success of previous manned space initiatives, one consideration remains almost constant: life in space will not be easy for space dwelling groups. Evidence from many international sources supports this conclusion, but two recent committee reports are especially compelling, as noted below.

First, in response to a request from NASA, the Institute of Medicine convened a committee to address astronaut health during long-duration missions. The Committee on Creating a Vision for Space Medicine During Travel Beyond Earth Orbit was charged with making recommendations regarding the infrastructure for a health system in space to deal with such problems as radiation, loss of bone mineral density, and behavioral adaptation (“behavioral health”). The full report is available in Ball and Evans (2001), and the basic findings were as follows:

[1] http://www.nasa.gov/mission_pages/exploration/main/
[2] The Garriott-Griffin Report: Extending Human Presence into the Solar System, The Planetary Society, 2004. http://www.planetary.org/programs/projects/aim_for_mars/study-report.pdf

1. Not enough is yet known about the risks to humans of long-duration missions, such as to Mars, or about what can effectively mitigate those risks to enable humans to travel and work safely in the environment of deep space.
2. Everything reasonable should be done to gain the necessary information before humans are sent on missions of space exploration.

Second, in 2003 a NASA-funded workshop (New Directions in Behavioral Health: A Workshop Integrating Research and Application) consisting of behavioral researchers, operational support personnel, and NASA managers convened at the University of California, Davis to promote a dialogue among these representative participants to expand understanding of psychological, interpersonal, and cultural adaptation to space. The resulting 28 reports generated by this workshop were published in 2005 in a special issue of Aviation, Space, and Environmental Medicine, edited by Williams and Davis (2005). In an overview of the workshop, Harrison (2005) warned as follows: “We have to be wary of the expedient belief that ‘nice to have’ items such as private crew quarters, and separate areas for eating, and crew hygiene, time for recreation and other items that enhance the psychological health of the crew can be omitted or cut due to cost and schedule. In fact such items may be important, even crucial for mission success” (p. B10).

As reported by Kanas, Salnitskiy, Gushin, Weiss, Grund, Flynn, Kozerenko, Sled, and Marmar (2001), Russian social scientists use the term “asthenia” to describe fatigue, emotional lability, decreased work capacity, and sleep disturbances that have been observed in cosmonauts. Similar decrements in vigor and concentration effectiveness have been reported in military participants exposed to stressful events over time (Harris, Hancock, & Harris, 2005), indicating a continuity of individual performance consequences across disparate antecedent and stress-provoking circumstances.
Moreover, the explosive confrontations that occurred among the multinational crew participants during the 240-day SFINCSS-99 simulation study[3] (Karash, 2000; Sandal, 2004), leading to the withdrawal of one crewmember volunteer, call attention to the “human behavior element” as the most complex component of plans and designs for extended long-duration space exploration missions beyond Earth orbit (Brady, 2005). In addition, experts continue to warn that previous success of spaceflight missions, to include stays in space for over a year, should not be taken to indicate that current approaches to spacelife management will be successful for the unprecedented durations associated with an expeditionary mission to Mars. For example, within NASA’s Man-Systems Integration Standards (NASA-STD-3000[4]) is presented the following warning: “The user must keep in mind that much is still unknown about the over-all, long term effects of various space environments on performance capabilities.” As stated by Manzey (2004), “Our current psychological knowledge derived from orbital spaceflight and analogue environments is not sufficient to assess the specific risks of missions into outer space” (p. 781).

The history of manned orbital spaceflight missions to date, however, shows clearly that humans are capable of enduring demanding spacelife work schedules in isolation and confinement for periods lasting more than a year in orbit. For example, cosmonaut Valery Polyakov holds the record for the longest stay in orbit (438 days) in 1994-1995[5]. On occasion, however, space dwelling crews have not been able to keep pace with scheduled work, as evidenced by the infamous 24-hour “vacation” taken by the Skylab-4[6] astronauts during the then record-breaking 84-day manned flight ending on February 8, 1974[7]. Much later, former Apollo 8 astronaut James Lovell commented on that event: “The people on the ground have to realize what the conditions are in the spacecraft to be able to accomplish the tasks that you give the crew. In the early days, this was a lot of times not thought about until the crews sort of rebelled and went back to the controllers or mission planners and said, ‘Look. Here’s what we can do, and here’s how we have to stretch out the agenda’” (Dick & Cowing, 2004, p. 35). And as the distance traveled and the time spent in space habitats increase for expeditionary missions, the needs and aspirations of those “sent” may be anticipated to become increasingly autonomous from the expectations and directives of the “senders.” It must be acknowledged, however, that despite the corrective crew events onboard Skylab-4, that mission concluded with unprecedented productivity by the crew[8], although those astronauts never again participated in a spaceflight mission for, perhaps, obvious reasons.

At the very least, then, these observations indicate that the design of space dwelling microsocieties for long-duration spaceflight must give realistic consideration to the limitations of even highly trained and motivated astronauts to sustain overbearing work-related schedules. Although conditions aboard the International Space Station (ISS) favor intense work schedules by the crew to maximize the scientific returns of such infrequent and hugely expensive undertakings[9], perhaps with the expectation that mission participants will be living close to the edge of their short-term endurance, a three-year expeditionary mission to Mars obviously requires favoring crew behavioral health throughout the duration of the mission, even when that means fewer scientific returns.

In general, enhancing human performance in long-duration spaceflight missions involves consideration of selection, training, equipment, pharmacology, and even surgery, which might involve prophylactic appendectomy and corneal remodeling (Gibson, 2006).

[3] http://www.imbp.ru/WebPages/engl/SFINCSS-99/sfincss_e.html
[4] These standards, which were codified in 1995, are in the process of being updated: http://hefd.jsc.nasa.gov/standards.htm
[5] http://liftoff.msfc.nasa.gov/news/2001/news-EndIsMir.asp
[6] http://chapters.marssociety.org/usa/oh/aero5.htm
[7] http://www-pao.ksc.nasa.gov/history/skylab/skylab-4.htm

NASA’s Bioastronautics Roadmap addresses areas of risk associated with long-duration spaceflight and proposes interventions (“countermeasures”) to address or overcome them10. Countermeasures may be grouped as proactive or reactive. Proactive countermeasures may include crew selection and training, onboard approaches to overcome the effects of radiation and microgravity, and work and habitat design, “as design is a critical strategy in ensuring behavioral health during extended-duration space missions” (Williams & Davis, 2005). Reactive countermeasures would address such problems as medical emergencies, depression and related emotional problems, circadian desynchronization, crew autonomy from mission control, group fragmentation and interpersonal conflict, and loss of skilled performance. There is, however, an interplay between proactive and reactive countermeasures, and a human performance technology applied to space dwelling groups would be anticipated to encompass both types. The ongoing calls for the development of evidence-based countermeasures, however, reflect the challenges associated with proposing specific recommendations for interventions that qualify to meet the urgent demands for reversing or preventing untoward individual and group events that might take place during a long-duration spaceflight mission. Current inflight countermeasures include monitoring of individual behavior, intervening directly or through the flight surgeon when necessary and appropriate, and facilitating crewmember contact with clinical and social support systems (Palinkas, 2001, p. 27). Existing countermeasures intended to promote psychological adaptation include inflight support such as leisure activities, arrangement of communication with family members, “care packages” that serve as reminders of loved ones on the ground, and adjustments to work schedules. 
An exception to these approaches that are targeted to brief-duration orbital missions, perhaps, is the development of computer-based training modules to manage interpersonal conflict and depression that might occur during long-duration spaceflight missions (Carter, Buckey, Greenhalgh, Holland, & Hegel, 2005).

8 http://spaceline.org/flightchron/skylab4.html
9 http://spaceflight.nasa.gov/living/spacework/index.html
10 http://bioastroroadmap.nasa.gov/User/risk.jsp


And the ever-increasing refinements of measurements of space crew behavior, which now include assessments such as WinSCAT (Kane, Short, Sipes, & Flynn, 2005), MiniCog (Shephard & Kosslyn, 2005), and related behavioral test batteries (e.g., Kelly, Hienz, Zarcone, Wurster, & Brady, 2005), have not always led to recommendations in terms of proposing specific countermeasures or spacelife schedule designs. For example, based on a series of ethological observations of isolated and confined teams, Tafforin (2005) concluded: “Optimizing such human factors is one of the challenges we will face in order for Mars teams to be efficient” (p. 1087). Additionally, Carl Walz, acting director of the Advanced Capabilities Office in NASA's Exploration Systems Mission Directorate stated recently that "psychological and physical effects on the astronauts for a Mars mission are a major concern" (Ramstack, 2006). These are only representative comments within the context of ongoing considerations of how to meet the behavioral health challenges associated with extended-duration spaceflight missions (Williams & Davis, 2005). In the distant future, even genetic manipulation might be considered an ethical approach to human performance enhancement for spaceflight (Gibson, 2006). These conditions, then, create an unprecedented and compelling need for extending the evidence base and technology on the organization of general living conditions and the performance requirements for small groups of humans traveling, living, and working together in isolated and confined microsocieties over extended time intervals. Importantly, journeying in a spacecraft on extended exploratory missions beyond Earth orbit does not constitute an ecological setting to which familiar preflight routines of living are easily applied. 
This unique and foreign ecology requires an applied human systems engineering technology functionally relevant to inherently unfamiliar settings that provides for a comprehensive status-assessment of a confined microsociety beyond what is available from even a fine-grained, multi-dimensional individual evaluation. Unforeseen events taking place on such expeditionary missions beyond Earth orbit require possible "countermeasure" interventions at the integrative human systems engineering level rather than at the level of an individual crewmember. Despite uncertainties regarding the requirements of projected spaceflight initiatives beyond Earth orbit, a common feature of such expeditionary endeavors over the next half century will be extended stays by human groups in extraterrestrial vehicles and habitats. The imperatives and opportunities associated with the development and configuration of functional ecological models for such space dwelling human microsocieties must be based upon sound scientific principles for the behavioral management of semipermanent as well as permanent groups with both operational and space science missions (Brady, 1992, in press).

CURRENT APPROACH TO SPACELIFE MANAGEMENT

Astronauts aboard the International Space Station (ISS) follow a precisely controlled schedule of activities that are intended to maximize the scientific returns of a mission and to ensure the crewmembers’ physiological health as evidenced by exercise, sleep, and nutrition requirements. Figure 1, for example, taken from NASA’s daily posting of the ISS flight plan timelines11 in GMT, shows an approximately four-hour interval of activities for the three astronauts as scheduled for February 16, 2007. The schedule is precisely controlled to maximize work performance within a setting that requires sharing of resources, including exercise equipment and personal hygiene facilities. Some activities are scheduled for as few as five minutes (e.g., Payload status check for FE-2), and others for as many as 90 minutes (e.g., Physical Exercise for CDR). The continued presence of others is an enforced socialization.

Figure 1. A four-hour interval of activities scheduled for astronauts aboard the ISS.

Observations of space crews living and working under such rigorously imposed schedules, however, indicate that compliance with the timelines and work demands is not always easily accomplished, if at all12. Variability exists with respect to scheduled events, and there is an ongoing interaction between crew participants and mission control regarding adjustments that are essential when imposed schedules cannot be met. Moreover, the rigorous time-oriented schedule offers meals when crewmembers may not be hungry and expects sleep when crewmembers may not be sleepy. An alternative to such time-based activity requirements will almost certainly be required for long-duration spaceflight beyond Earth orbit, and later sections of this paper will propose such an alternative. Over time within the isolated and confined conditions of long-duration spaceflight, superordinate consequences for regimen/schedule compliance can be anticipated to lose force in sustaining effective individual and team mission-critical performances. Contingency management operations that involve such outcomes as financial rewards or directive and exhortational interactions with mission control will likely be ineffective.
Given the ecologically impoverished conditions of space life, inter-operant management of scarce resources may be promising in the design of a confined microsociety that is topographically and functionally prosthetic, at the levels of individual and group, for its crewmember participants. The importance of activity as a countermeasure to the stresses of isolation and confinement was acknowledged by a NASA astronaut who identified being “meaningfully busy” as the single most important factor on a long-duration flight (Herring, 1997, p. 44; cited in Kanas & Manzey, 2003).

The implausibility, if not impossibility, of pre-mission ground-based empirical verification of an optimal set of spacelife parameters supportive of members of expeditionary missions beyond Earth orbit suggests the need for a heuristic approach to the design of an isolated and confined microsociety. Such a heuristic, technological, and analytic behavioral application will reflect the “...process of applying sometimes tentative principles of behavior to the improvement of specific behaviors” (Baer, Wolf, & Risley, 1968, p. 91). In that regard, multi-operant models of inter-operant relationships (Findley, 1962) were extended to the design of a mono-inhabited microsociety that supported the multi-dimensional repertoire of a single human volunteer residing in a programmed environment for over six months (Findley, 1966). The orderliness observed in the animal models provided a heuristic context for extending the underlying principles and technology to the generation and support of a multi-operant human performance repertoire. Such an extension follows the principle of “systematic replication” (Sidman, 1960).

In much the same way that behavior analysis is challenged to predict the exact instant when a laboratory animal may press a lever for food under a variable-interval schedule of reinforcement or the exact response topography, the orderliness of a human multi-operant repertoire need not require prediction of each instance of an activity selection or engagement. Behavioral “control,” in the sense of knowledge of the conditions that promote a desired performance steady-state, is evidenced at the meta-operant level of analysis. The steady-state, obtained with a variable-interval schedule for the laboratory animal and with a multi-operant repertoire for a human, is to be understood as a function of at least one set of antecedent conditions whose implementation is taken to “control” the production of the desired performance repertoires.

11 http://spaceflight.nasa.gov/living/index.html
12 J.V. Brady, personal communication.
A spacelife systems engineering approach, then, reflects the goal of programming inter-operant schedules to optimize the value and access impact of available resources under conditions that promote intra-system motivation and novelty within the context of superordinate steady-states that are operationalized by the disposition of crewmembers to exhibit mission-critical performances during recurrent traversals of a flexibly oriented regimen to achieve mission objectives.

MULTI-OPERANT APPROACHES

One of the first groups of psychologists to be involved in considering the challenges of undertaking experiments with space-going animals consisted of C. B. Ferster, D. Meyer, C. G. Mueller, F. Ratliffe, and H. Schlosberg, and it was convened by the Behavioral Sciences Advisory Committee of the USAF Air Research and Development Command (Rohles, 1960). Early efforts targeted mice, but the transition to primates was soon underway, supported by scientists from the Walter Reed Army Institute of Research (Brady, 2005). Although Skinner was perhaps the first aerospace comparative psychologist (Skinner, 1960), scientists from the latter Institute were instrumental in providing the behavioral training for the primates named Able and Baker, launched in the nose cone of a rocket in 1959 (Brady, 1990). Operant techniques were instrumental in evaluating the performance effects of subsequent flights by the rhesus monkey named SAM (School of Aerospace Medicine), on December 4, 1959, and the chimpanzee named HAM (Holloman AeroMedical Laboratory), on January 31, 1961 (Rohles, 1992). Both discrete and free operant shock-avoidance tasks were programmed in these initial flights, with subsequent flights using more complex schedules of reinforcement (Rohles, 1966).

Figure 2 shows chimpanzee Ham in the flight couch for the Mercury-Redstone 2 (MR-2) suborbital test flight.13 On January 31, 1961, a Mercury-Redstone launch from Cape Canaveral carried him over 640 kilometers down range in an arching trajectory that reached a peak of 254 kilometers above Earth. The flight lasted 16 minutes, 39 seconds.14 Figure 3 shows the test apparatus supporting the discrete and free-operant avoidance tasks performed by Ham during his journey15. Figure 4 shows Ham greeting a handler at the conclusion of his flight.16

Figure 2. Ham in the flight couch.

Figure 3. Test apparatus for Ham.

Figure 4. Ham at the conclusion of his flight.

As stated in the report, Ham's flight on MR-2 met all of its objectives, and it was a significant accomplishment toward manned U.S. space flight. The results of the flight showed no significant change in Ham's physiological state or psychomotor performance17, and an examination of Figure 4 would suggest that Ham experienced no enduring ill effects of his journey into space and back.

Mercury-Atlas 5 (MA-5), the second chimpanzee flight, was launched on November 29, 1961. During this flight, which lasted 3 hours and 21 minutes, a chimpanzee named Enos performed a complex multiple operant task while orbiting the earth twice. As stated in the description of MA-518, the performance test panel used for the MA-5 flight was specifically designed for the orbital flight. The MA-5 performance test panel consisted of three miniature inline digital displays and three levers. The panel also controlled a pellet feeder that was incorporated into the panel and a lip-lever activated drinking tube that was attached to the flight couch near the head of Enos. The performance task for the MA-5 flight consisted of a five-component multiple-operant schedule combining appetitive and avoidance tasks. At the conclusion of this flight, the data showed no significant disturbance in Enos’ performance that could be attributed to the weightless state, to the other conditions accompanying the flight, or even to a lever malfunction during the second orbital pass.

A general finding based upon the flights by Ham and Enos was as follows. A 7-minute (MR-2) and a 3-hour (MA-5) exposure to weightlessness was experienced by the subjects in the context of an experimental design that left visual and tactile references unimpaired. There was no significant change in either animal's physiological state or performance as measured during a series of tasks of graded motivation and difficulty.19

13 http://lsda.jsc.nasa.gov/scripts/photoGallery/detail_result.cfm?image_id=1813 Use of this and other similar photographs follows NASA’s guidelines regarding the use or reproduction of NASA material obtained from a JSC web page: http://www.jsc.nasa.gov/policies.html#Guidelines
14 http://lsda.jsc.nasa.gov/scripts/experiment/exper.cfm?exp_index=907
15 http://lsda.jsc.nasa.gov/scripts/photoGallery/detail_result.cfm?image_id=1817
16 http://lsda.jsc.nasa.gov/scripts/photoGallery/detail_result.cfm?image_id=1804
17 http://lsda.jsc.nasa.gov/scripts/mission/miss.cfm?mis_index=164
18 http://lsda.jsc.nasa.gov/scripts/experiment/exp_descrp_pop_up.cfm?exp_id=CHIMP&string=&current_string=
19 http://lsda.jsc.nasa.gov/scripts/experiment/exp_descrp_pop_up.cfm?exp_id=CHIMP&string=&current_string=


These remarkable findings with Ham and Enos clearly showed the applicability of basic operant techniques to sustain a complex set of performances under obviously difficult circumstances. From the perspective of behavior analysis, both Ham and Enos were residents in a programmed environment during training and during the rigors of their respective spaceflights. The environments of both of these chimpanzees were designed to support an increasingly complex set of performances under increasingly challenging schedules of reinforcement. Along with these early developments with space-going animals, an attempt was being made to advance behavior analysis by complicating simple performance units into sequences of operants (Findley, 1962), where an operant is operationalized as a class of responses, any of which will produce a specified consequence under a given set of environmental circumstances (Skinner, 1953, p. 65). This important work led to the conceptualization of multi-operant behavior as “the experimental demonstration of several related operants, where each operant is defined in terms of explicit operations and experimental control” (Findley, 1962, p. 114). In contrast to a strictly serial sequence of operants, this research program led to the demonstration of stable performance repertoires maintained by the interacting reinforcing value of operant engagement alternatives that were available to the organism at various transition points in a sequence. The resulting steady-state performances observed within the context of iterations that involved different choices at the transitions points (“trees”) were shown to exhibit properties of a unitary operant, based upon the sensitivity of a tree to stimulus control and influence by DRL and FR schedules of reinforcement. The extension of this multi-operant model to the challenge of maintaining the health and productivity of a human was undertaken by Findley, Migler, and Brady (1963). 
A male volunteer entered a programmed environment for what was to be an indeterminate period of time, and the schedules in effect for work, sleep, and recreation showed continuity between the multi-operant procedures used in animals and those in this single-human study. Figure 5 presents a diagram of the environment, which consisted of a central living and sleeping area, an adjacent work room, and an adjacent bathroom. Figure 6 presents a diagram of the behavioral program that determined how the resources in the environment were to be accessed and used. Each box in the behavioral program represents an activity unit, and all requirements associated with a given activity unit had to be met before the occupant could transition from one activity to another in the sequence. Beginning at the left of the diagram, a “Sleep trip” consisted of all activities between and including Toilet Operations (T.O.) and Sleep (SLP). The arrow from SLP to T.O. designates a minimum loop of fixed activities designed to assess and maintain the participant’s health. The multi-operant aspects of the program are reflected in the alternatives available to the resident within the right-most three columns of activity units. The first set of alternatives came after Food One (FD1), which was a light meal. After FD1, one of the following three activities was available for selection: (1) Work One (WK1), which was a tracking task within the work room; (2) SLP, providing access to the bunk for an unlimited period of time; or (3) Work Two (WK2), which required a range of intellectual, clerical, or manual behaviors. The final two columns were intended to provide a progressively greater selection of activities having increasing reinforcing strength. For example, Programmed Instruction (P.I.) provided access to reading material in frames, and Manual Behavior (M.B.) provided access to art material. Within the last column are nine activity units, and the participant was allowed to select and complete four units before transitioning to T.O. at the beginning of the program. The sequence of selections constituted a “Wake Trip.” Reinforcing activities within the last column included Food Two (FD2), which was a major meal, access to Music (M.U.), and access to Cigarettes (CIG.), which were earned by operating a manipulandum. A full description of the environment and the behavioral program is presented in Findley, Migler, and Brady (1963) and Findley (1966).

Figure 5. Schematic diagram of the single-subject programmed environment. Taken from NASA Report NASA-CR-52913 (Findley, Migler, & Brady, 1963, p. 12).20

Figure 6. A diagram of the behavioral program. Taken from NASA Report NASA-CR-52913 (Findley, Migler, & Brady, 1963, p. 16).21

20 NASA Terms and Conditions of Use: http://ntrs.nasa.gov/NTRS.NACA.2005.Copyright.htm
21 NASA Terms and Conditions of Use: http://ntrs.nasa.gov/NTRS.NACA.2005.Copyright.htm

The behavioral program was designed to support the full range of the participant’s work, recreational, and health maintenance activities: behavioral health. The boundaries between successive activities ensured that all requirements for a given activity unit were satisfied before transition to a subsequent fixed or alternative activity was permitted. This also provided the opportunity to assess performance within any given activity, such as error frequency on the tracking task, and to assess activity choices and durations over time. More importantly, perhaps, the behavioral program was designed so that all incentives to sustain performance were intrinsic to the programmatic sequencing of the fixed and alternative selections. The participant’s full behavioral repertoire was maintained throughout the 152 days with no incentives external to the programmatic sequence.
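The gating rule described above, in which every requirement of an activity unit must be met before a transition is permitted, can be sketched as a small state machine. The following Python sketch is illustrative only: the unit abbreviations come from the text, but the class names, the requirement labels, and the reduced layout (which omits the final two columns) are assumptions and do not reproduce the program specified in Findley, Migler, and Brady (1963).

```python
from dataclasses import dataclass, field


@dataclass
class ActivityUnit:
    """One box in the behavioral program (hypothetical representation)."""
    name: str
    requirements: set = field(default_factory=set)  # illustrative labels only
    next_options: tuple = ()  # units that may be selected after completion


class BehavioralProgram:
    """Tracks the current unit and refuses transitions until it is satisfied."""

    def __init__(self, units, start):
        self.units = {u.name: u for u in units}
        self.current = start
        self.done = set()        # requirements met so far in the current unit
        self.history = [start]   # record of units entered, for later assessment

    def complete(self, requirement):
        """Record that one requirement of the current unit has been met."""
        if requirement not in self.units[self.current].requirements:
            raise ValueError(f"{requirement!r} is not required by {self.current}")
        self.done.add(requirement)

    def transition(self, target):
        """Move to `target` only if the current unit is fully satisfied."""
        unit = self.units[self.current]
        if self.done != unit.requirements:
            raise RuntimeError(f"{self.current}: requirements not yet met")
        if target not in unit.next_options:
            raise RuntimeError(f"{target} is not available after {self.current}")
        self.current, self.done = target, set()
        self.history.append(target)


# Assumed, reduced layout: T.O. -> FD1 -> one of (WK1, SLP, WK2) -> T.O.
units = [
    ActivityUnit("T.O.", {"health check"}, ("FD1",)),
    ActivityUnit("FD1", {"light meal"}, ("WK1", "SLP", "WK2")),
    ActivityUnit("WK1", {"tracking task"}, ("T.O.",)),
    ActivityUnit("WK2", {"clerical task"}, ("T.O.",)),
    ActivityUnit("SLP", set(), ("T.O.",)),
]

program = BehavioralProgram(units, "T.O.")
program.complete("health check")
program.transition("FD1")
program.complete("light meal")
program.transition("SLP")      # choosing SLP here closes the minimum loop
program.transition("T.O.")
print(program.history)
```

Because each transition is recorded in `history`, the same structure that enforces the activity boundaries also yields the record of choices over time that the text describes as available for assessment.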
This 152-day outcome clearly showed the value of the behavioral program’s design to support an individual’s motivation to engage in tasks essential to the “mission” and to the welfare of the participant over an extended time period in isolation and confinement. The programmed environment reflected the following: “In summary, then, the techniques of the animal laboratory, that is, the specification of contingencies, the stimulus control of behavior, the organization of complex sequences of behavior, and the use of the continuous experimental environment all combine to provide, in principle, most of the elements employed here for the design of an experimental environment for human research” (Findley et al., 1963, p. 8). Among the report’s recommendations to NASA was the following: “First and foremost is the fact that this environment sustained the subject in good health and maintained good work performance at a variety of tasks under conditions of extreme social isolation and confinement for an unprecedented duration of 152 days, or approximately five months” (Findley et al., 1963, p. 111).22

22 The participant, Whilden Breen, Jr., is identified by name in the report. Life Magazine published a description of his experiences in its May 17, 1963 issue.

To extend this approach to groups, a residential laboratory was established at The Johns Hopkins University School of Medicine, and early reports of this work were given by Bigelow, Emurian, and Brady (1973) and Brady, Bigelow, Emurian, and Williams (1975). Figure 7 presents a diagram of this three-person residential laboratory. The laboratory supported the implementation of a continuously programmed environment methodology as a tool to implement interbehavioral research and applications programs supporting individual and group adjustment to the rigors of isolation and confinement (Brady, 1992). The methodological approach is a direct descendant of the work reported in Findley et al. (1963), and it constitutes a systematic replication (Sidman, 1960). The resulting research methodology brings within the laboratory a broad range of complex and naturalistic features of the habitation/behavior environment for experimental analysis, permits programming, monitoring, objective recording, and quantitative measurement of interaction patterns, and provides for controlled study of both individuals and groups under experimental conditions of long durations without sacrifice of methodological rigor (Brady, Bernstein, Foltin, & Nellis, 1988).

Figure 7. A diagram of the three-person programmed environment.

Similar to Findley et al. (1963), a behavioral program, which is implemented within the context of a continuously programmed environment, is operationalized by an array of individual and group activities or behavioral units and the rules determining the relationships among them. Figure 8 presents a representation of a behavioral program designed for several typical applications within the residential environment. Figure 9 presents a brief description of the activities represented by the acronyms displayed in Figure 8. This behavioral program is structurally and functionally similar to the one adopted in the single-participant study with the notable addition of social episodes.

Figure 8. A prototypical behavioral program used within the residential environment for group analyses.


Figure 9. An inventory of activities constituting a typical behavioral program.

Beginning with Health Check, participants follow the behavioral program sequentially from left to right. In general, activities that were heuristically judged to show relatively high reinforcing “force” are positioned later in the sequence, where more choices are available. The circled “1” indicates that one choice could be made among those activities designated by the adjacent arrows. At the completion of either Sleep or an activity within the last column, the participant returns to Health Check. There are, then, two iterative sequences within this program. The first sequence, ranging from Health Check to Sleep, is designed to maintain and assess the participant’s health if he or she were otherwise indisposed to engage in the broader selection of activities. The second sequence commenced with a choice other than Sleep when Food One was terminated. That latter sequence consisted of alternative activity opportunities, and successive iterations through the program potentially consisted of different sequences of activities.

Several activity selections are outside the scope of the sequential design of the full behavioral program. For example, the individual multiple task performance battery (MTPB), which was a variant of that developed by Morgan and Alluisi (1972), could be selected between any two activities within the full program, and it was presented on a computer terminal within the private room. In most studies, a participant’s remuneration was a function of earning performance “points” on that task. In some studies (e.g., Emurian, Brady, Ray, Meyerhoff, & Mougey, 1984), a team version of the MTPB was programmed such that three participants were required simultaneously to enter a correct response on designated subtasks while working on separate terminals, all located within the workshop area of the laboratory.

Access to social activities required participants to select the activity concurrently. For example, Food Three was a major social meal in the recreation room, and access to that opportunity required either two or three participants’ schedules to be synchronized on any given access occasion. Such a contingency required communications between and among the participants to ensure that schedules were, in fact, synchronized for that social activity opportunity.

A behavioral program provides a promising solution to the problem of structuring the limited resources and information that may be available to members of a confined microsociety (Emurian, 1988). The functional interdependencies among activities ensure that performances of value to the welfare of the individual (e.g., physical exercise), to the welfare of the group (e.g., social recreation), and to the welfare of a mission (e.g., sustained individual and team performance effectiveness) occur recurrently over time. These interbehavioral and functional interdependencies reflect the “motivational” properties inherent within successive progressions through the program, and all incentives to maintain the overall operational status of the confined microsociety can reside within the design of the behavioral schedule itself. The necessarily reduced resource opportunities available to spaceflight participants require this behavioral program technology to optimize the impact of resource access to ensure continued value to crewmembers over extended durations. The behavioral program not only structures access to resources, but it also makes all corresponding activity units available for measurement.
The boundaries between successive activities in the program impose rigor on the assessment of individual and group preferences and effectiveness with those activities. Additionally, the program has the advantage of providing a comprehensive range of variables for observation and measurement, together reflecting the behavioral health of the organization. Although free-running spacelife schedules may impact and shift wake-sleep cycle routines and circadian rhythms (Kanas & Manzey, 2003, p. 136), it is not at all certain that inflexible work-rest routines will best serve the behavioral health of crew members under conditions of spaceflight durations associated with a Mars expeditionary mission. The effectiveness of a multi-operant behavioral program was affirmed repeatedly over a series of investigations where such a program was implemented (Brady & Emurian, 1978; Brady & Emurian, 1983; Emurian, Brady, Meyerhoff, & Mougey, 1983; Emurian, Emurian, & Brady, 1978; Emurian, Emurian, & Brady, 1985; Emurian, Emurian, & Brady, 1982; Emurian, Emurian, Bigelow, & Brady, 1976). Although experimental operations were performed during the course of those studies, the adoption of the behavioral program as the over-arching approach to generating and sustaining both individual and social work and recreational activities best reflects, perhaps, the technique of systematic replication by “affirming the consequent” (Sidman, 1960, p. 127), which was evidenced by the behavioral outcomes that were observed dependably across different groups and across different experimental interventions. Given the unavailability of concurrent access by crewmembers to many critical resources aboard the spacecraft, a programmatic schedule will likely require a combination of temporally bounded activity opportunities for each astronaut presented in concert with multi-operant sequences and options on other occasions. 
It is this intersection of two schedule design approaches that provides the occasion for a behavioral economics management model to be considered to bridge the inter-behavioral relationships among successful engagement and completion of critical mission "high-cost" activities and later engagement and completion of "high-demand" activities, the latter in support of maintaining optimal behavioral health during long-duration expeditionary missions.
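The synchronization contingency for social activities described earlier in this section can be illustrated with a minimal sketch. The function name, the participant labels, and the selection dictionary are assumptions made for illustration; only the rule that a social unit such as Food Three requires concurrent selection by two or three participants is taken from the text.

```python
def social_access(selections, activity, min_participants=2):
    """Return the participants granted access to a social activity.

    `selections` maps each participant to the unit he or she is currently
    selecting. Access is granted only when at least `min_participants`
    select the same social activity on the same occasion; otherwise nobody
    enters and an empty list is returned.
    """
    requesting = sorted(p for p, unit in selections.items() if unit == activity)
    return requesting if len(requesting) >= min_participants else []


# Hypothetical snapshots of a three-person crew's current selections.
synchronized = {"P1": "Food Three", "P2": "Food Three", "P3": "Sleep"}
unsynchronized = {"P1": "Food Three", "P2": "Health Check", "P3": "Sleep"}

print(social_access(synchronized, "Food Three"))
print(social_access(unsynchronized, "Food Three"))
```

In the second snapshot only one participant requests the social meal, so access is withheld until another crewmember's schedule is brought into step, which is the communication requirement the contingency was designed to produce.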


Behavioral economics (Kagel & Winkler, 1972; Lambert, 2006) provides the conceptual framework for scheduling and interpreting inter-operant relationships within the behavioral program for fixed inter-behavioral sequencing of performance units and for choice management on those occasions where a selection of required and optional activity units is available for use. In particular, consideration of demand elasticity associated with valued activities, together with potential substitutability of one valued activity for another, has direct relevance to managing the resources available to members of a closed economy (Hursh, 1980). Such management is to be understood in terms of suggesting baseline behavioral program ingredients and parameters and countermeasure responses when a dynamic change in the organizational context of a microsociety, as determined by the interactions among multi-operant units, is detected. Behavioral economics will provide an analytical decision support tool for interpreting the reinforcing strength of access to activity units that may be delayed, temporally and sequentially, for selection due to antecedent performance requirements that precede the opportunity to engage in such optional units. And when an otherwise reinforcing unit having anticipated beneficial side-effects to crew cohesion, such as social occasions, is determined to show a loss of strength as evidenced by decreased engagement, a behavioral economics management model can suggest a reorientation of available high-value opportunities in support of social occasions whose occurrence now relates to a requirement, rather than an option. Similar re-stabilization of the meta-operant behavioral program is anticipated to occur in response to dynamic changes in individual and crew adjustment to isolation and confinement and to dynamic alterations of the value of access to available environmental resources and performance requirements.
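Demand elasticity, as invoked above, can be made concrete with the standard midpoint (arc) elasticity formula, which expresses the proportional change in consumption relative to the proportional change in cost. In the sketch below, "cost" stands for a response requirement and "use" for the number of times an activity unit is engaged; the function names and all numbers are invented for illustration and are not data from any of the studies cited.

```python
def arc_elasticity(cost_a, use_a, cost_b, use_b):
    """Midpoint (arc) elasticity of demand between two cost/consumption points."""
    pct_use = (use_b - use_a) / ((use_a + use_b) / 2)
    pct_cost = (cost_b - cost_a) / ((cost_a + cost_b) / 2)
    return pct_use / pct_cost


def classify(elasticity):
    """Label demand elastic when consumption changes proportionally more than cost."""
    return "elastic" if abs(elasticity) > 1 else "inelastic"


# Hypothetical example: the response requirement for a unit doubles from 10 to 20.
music = arc_elasticity(10, 8, 20, 3)   # engagement falls sharply
meal = arc_elasticity(10, 8, 20, 7)    # engagement barely changes
print(classify(music), classify(meal))
```

In this framing, a unit whose engagement is insensitive to rising requirements (inelastic demand) might be expected to retain its reinforcing value when placed behind high-cost mission tasks, whereas an elastic unit might not; whether that holds for any particular activity is an empirical question.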
Under conditions of extreme isolation and confinement, it should be anticipated that activity units will exhibit a combination of performance requirements (i.e., “cost”) and reinforcing value (i.e., “demand”) and that these properties will change over time. The fact that instructional control of work-schedule compliance, which leaves uncertain the precise controlling variables independent of a training history (Kelly et al., 2005), may be compromised by crew autonomy during long-duration expeditionary missions (Brady, 2005) calls attention to the need for realistically alternative approaches to maintain vital individual and crew performances under conditions of isolation and confinement.

MISSION TO MARS

An expeditionary mission to Mars is projected to take up to three years (Manzey, 2004). Despite the impressive accomplishments in the areas of crewmember screening, selection, and training for spaceflight operations, evidence-based countermeasures to the demands on behavioral health of such a mission have yet to be developed. How a behavioral program might be integrated into a traditionally time-oriented schedule is considered here as a potential countermeasure. Based initially upon Russian experiences (e.g., Gushin, Kholin, & Ivanovsky, 1993), there is suggestive evidence of stages of spaceflight adaptation (Manzey, 2004). Despite the inconclusive findings regarding the generality of such stages across disparate simulation and actual spaceflight conditions (Kanas & Manzey, 2003, ch. 2), consideration of the potential for time-based reactions to isolation and confinement by space dwelling groups provides the occasion for proposing countermeasures directed toward particular stages of adaptation.
For example, Manzey (2004) suggests that after approximately 12 weeks in space, during which crew members have adjusted to the routine of their mission, the deleterious effects of boredom and monotony will begin to impact the crew, with increasing stress becoming evident as a function of time spent under those conditions. Accordingly, a countermeasure to such circumstances is to implement a behavioral program. A spacelife systems engineering approach, then, might consider the use of rigorous time-oriented work and rest routines during the early weeks of a mission, during which the activity level is plausibly intense as the crew concludes a launch and prepares for the long months ahead of the transfer journey to Mars. A transition from such routines to a behavioral program is suggested as a countermeasure to the stress consequences of the early months in space. As the crew approaches Mars


after many more months, a transition back to a time-oriented schedule might occur in preparation for the demands of the landing and residence on Mars. Finally, a recent survey of 11 Russian cosmonauts gave suggestive evidence of a preference by some for crewmember regulation of work and rest schedules depending on the stage of the flight (Nechaev, Polyakov, & Morukov, 2007).

It must be acknowledged, however, that managing the behavioral health of space dwelling groups is but one component among several equally critical health-related issues that must be confronted and overcome if a mission to Mars is to succeed. In that regard, it has been estimated that during the journey to Mars, crewmembers could lose as much as 40% of their muscle mass and 25% of their bone (Hawkey, 2005). In a dramatic example of what could happen, Haddy (2007) posed the following question: “Imagine a radiation sick, sleep-deprived astronaut stepping on Mars; muscle-and-bone weakened and dehydrated, he or she becomes hypotensive, faints, and breaks a leg. What now, Houston?” (p. 643). His argument, which takes into consideration the earlier findings of Krauss (1991), is that insufficient biological science research has been conducted to address and overcome such problems as long-term exposure to radiation and microgravity. Adding such problems to an astronaut already compromised by the cumulative stresses of life in isolation and confinement is to appreciate the compelling need for an aggressive multi-disciplinary research program to provide evidence-based countermeasures to all challenges associated with a mission to Mars. In that regard, however, Mark G. Benton of Boeing Space and Intelligence Systems has published a vehicle architecture for a six-person spaceship, based on technologies currently near the state-of-the-art, that provides both shielding against galactic cosmic rays and artificial gravity to mitigate crew physiological problems on long-duration missions, to include an expeditionary mission to Mars (Benton, 2006). A technical solution to such problems leaves crewmember behavioral health as a major consideration for current research initiatives.

CHALLENGES

There is increasing attention given to the benefits of residential environments to address questions of scientific importance in such areas as behavioral pharmacology (e.g., Donny, Bigelow, & Walsh, 2003), drug effects on learning and cognition (e.g., Kelly, Foltin, & Fischman, 1993), behavioral and physiological effects of phase shifts in sleep cycles (e.g., Hart, Ward, Haney, Nasser, & Foltin, 2003), and properties of reinforcement (Bernstein, 1998). The intent of such work is, for the most part, to understand functional relationships that will account for behavior outside the boundaries of such investigative laboratories (“external validity”). With respect to spaceflight operations, however, the intent is different. In such latter cases, the intent is to understand how best to promote and sustain human performance, adaptation, and endurance under the conditions of an inherently constrained environment (“internal validity”). Research addressing such considerations, however, is challenged, if not directly undermined, by the prevailing culture of scientific endeavors. As stated by Musson and Helmreich (2005), “The short-term nature of funded research and the expectation of producing meaningful results in the near-term is a result of the culture of experimental scientific research. Such an approach, however, does not seem to suit such settings as human spaceflight...” (p. B124). Although these comments reflected research initiatives relating personality traits to performance in spaceflight and analogue settings, similar challenges exist in undertaking long-duration simulations in ground-based residential laboratories investigating programmed environment management of confined microsocieties. NASA is aware of such challenges, as indicated by Recommendation 4.2 in the Executive Summary of The National Aeronautics and Space Administration’s Bioastronautics Roadmap: “How to support the extensive behavioral research program that would be necessary to validate processes or countermeasures such as select-in/select-out criteria (both for individual crew members and for a composite crew), issues related to cultural diversity, crew interactions, and isolation or stress-induced hazards. These issues may well require long lead times to study adequately” (Longnecker & Molins, 2006, p. 13).


These challenges notwithstanding, the report by Ball and Evans (2001) concludes Chapter 5 (Behavioral Health and Performance) recommending that NASA should give priority to increasing the knowledge base of the effects of living conditions and behavioral interactions on the health and performance of individuals and groups involved in long-duration missions beyond Earth orbit. Investigative attention should focus on the following factors:

• Understanding group interactions in extreme, confined, and isolated microenvironments;
• Understanding the roles of sex, ethnicity, culture, and other human factors on performance;
• Understanding potentially disruptive behaviors;
• Developing means of behavior monitoring and interventions;
• Developing evidence-based criteria for reliable means of crew selection and training and for the management of harmonious and productive crew interactions; and
• Training of both space-dwelling and ground-based support groups specifically selected for involvement in operations beyond Earth orbit.

Further investigations, then, should be directed toward the development of countermeasures to overcome such challenges as cultural differences among members of multinational crews, personality differences among members of disparate professional and technical disciplines, the distribution of authority and roles within mixed gender crews, and sexual interactions. In the case of sexual interactions, careful consideration must be given to providing living arrangements that will accommodate this potential challenge to group cohesiveness.

CONCLUSIONS

Everything reasonable must be done to support the development of evidence-based principles (“countermeasures”) to support the design of confined microsocieties for spaceflight applications (Ball & Evans, 2001). In that regard, the Institute for Biomedical Problems of the Russian Academy of Sciences is currently planning a 500-day ground-based study (Project MARS-500), scheduled to begin in late 2008, to simulate the duration of components of a Mars mission for a 6-person crew.²³ Developing evidence-based principles to foster and maintain the interest and willingness of inhabitants of confined microsocieties to perform in ways that are beneficial to themselves and to a mission is critically important to the success of future manned space initiatives. The multi-operant behavioral program, which is derived from behavior analytic methods and procedures, provides a promising structural and functional approach to the problem of motivating and monitoring individual and group behavior for the continuous management, observation, and assessment of a confined microsociety. Long-duration simulations and evaluations of this approach, undertaken within a continuously programmed environment ecology, obviously constitute the reasonable next step.

REFERENCES

Baer, D.M., Wolf, M.M., & Risley, T.R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1, 91-97.

Ball, J.R., & Evans, C.H., Jr. (2001). Safe Passage: Astronaut Care for Exploration Missions. Committee on Creating a Vision for Space Medicine During Travel Beyond Earth Orbit, Board on Health Sciences Policy, Institute of Medicine. URL: http://www.nap.edu/catalog/10218.html#orgs

²³ http://www.imbp.ru/Mars500/Mars500-e.html


Benton, M.G., Sr. (2006). Spaceship Discovery – Vehicle architecture for human exploration of Moon, Mars, and beyond. A Collection of Technical Papers: AIAA Space Conference & Exposition (pp. 2305-2328), San Jose, CA, September 19-21, 2006. ISBN: 9781563478246.

Bernstein, D.J. (1998). Establishment of a laboratory for continuous observation of human behavior. In K.A. Lattal & M. Perone (Eds.), Handbook of Research Methods in Human Operant Behavior (pp. 509-539). New York: Plenum.

Bigelow, G.E., Emurian, H.H., & Brady, J.V. (1973). A programmed environment for the experimental analysis of individual and small group behavior. Presented at the Symposium entitled Controlled Environment Research and its Potential Relevance to the Study of Behavioral Economics and Social Policy. Addiction Research Foundation, Toronto, Canada.

Brady, J.V. (in press). Behavior analysis in the space age. The Behavior Analyst Today.

Brady, J.V. (2005). Behavioral health: The propaedeutic requirement. Aviation, Space, and Environmental Medicine, 76(6), 13-24.

Brady, J.V. (1992). Continuously programmed environments and the experimental analysis of human behavior. Cambridge, MA: Cambridge Center for Behavioral Studies.

Brady, J.V. (1990). Toward applied behavior analysis of life aloft. Behavioral Science, 35, 11-23.

Brady, J.V., Bernstein, D.J., Foltin, R.W., & Nellis, M.J. (1988). Performance enhancement in a semiautonomous confined microsociety. The Pavlovian Journal of Biological Science, 23, 111-117.

Brady, J.V., & Emurian, H.H. (1978). Behavior analysis of motivational and emotional interactions in a programmed environment. In R. Dienstbier & R. Howe (Eds.), Nebraska Symposium on Motivation (pp. 81-122). University of Nebraska Press.

Brady, J.V., & Emurian, H.H. (1983). Experimental studies of small groups in programmed environments. Journal of the Washington Academy of Sciences, 73(1), 1-15.

Brady, J.V., Bigelow, G.E., Emurian, H.H., & Williams, D.M. (1975). Design of a programmed environment for the experimental analysis of social behavior. In D.H. Carson (Ed.), Man-Environment Interactions: Evaluations and Applications. 7: Social Ecology (pp. 187-208).

Carter, J.A., Buckey, J.C., Greenhalgh, L., Holland, A.W., & Hegel, M.T. (2005). An interactive media program for managing psychosocial problems on long-duration spaceflights. Aviation, Space, and Environmental Medicine, 76(6), Section II, B213-B223.

Dick, S.J., & Cowing, K.L. (2004). Risk and Exploration: Earth, Sea, and the Stars. NASA Administrator’s Symposium, September 26–29, 2004, Naval Postgraduate School, Monterey, California. URL: http://history.arc.nasa.gov/hist_pdfs/book_risk+explore/riskandexploration_all.pdf

Donny, E.C., Bigelow, G.E., & Walsh, S.L. (2003). Choosing to take cocaine in the human laboratory: Effects of cocaine dose, inter-choice interval, and magnitude of alternative reinforcement. Drug and Alcohol Dependence, 69, 289-301.


Emurian, H.H. (1988). Programmed environment management of confined microsocieties. Aviation, Space, and Environmental Medicine, 59(10), 976-980.

Emurian, H.H., Brady, J.V., Meyerhoff, J.L., & Mougey, E.H. (1983). Small groups in programmed environments: Behavioral and biological interactions. The Pavlovian Journal of Biological Science, 18(4), 199-210.

Emurian, H.H., Brady, J.V., Ray, R.L., Meyerhoff, J.L., & Mougey, E.H. (1984). Experimental analysis of team performance. Naval Research Reviews, 36(1), 3-19.

Emurian, H.H., Emurian, C.S., & Brady, J.V. (1978). Effects of a pairing contingency on behavior in a three-person programmed environment. Journal of the Experimental Analysis of Behavior, 29, 319-329.

Emurian, H.H., Emurian, C.S., & Brady, J.V. (1982). Appetitive and aversive reinforcement schedule effects on behavior: A systematic replication. Basic and Applied Social Psychology, 3(1), 19-27.

Emurian, H.H., Emurian, C.S., & Brady, J.V. (1985). Positive and negative reinforcement effects on behavior in a three-person programmed environment. Journal of the Experimental Analysis of Behavior, 44, 157-174.

Emurian, H.H., Emurian, C.S., Bigelow, G.E., & Brady, J.V. (1976). The effects of a cooperation contingency on behavior in a continuous three-person environment. Journal of the Experimental Analysis of Behavior, 25(3), 293-302.

Findley, J.D. (1962). An experimental outline for building and exploring multi-operant behavior repertoires. Journal of the Experimental Analysis of Behavior, 5(1 Suppl), 113-166.

Findley, J.D. (1966). Programmed environments for the experimental analysis of human behavior. In W.K. Honig (Ed.), Operant Behavior: Areas of Research and Application (pp. 827-848). Englewood Cliffs, NJ: Prentice-Hall, Inc.

Findley, J.D., Migler, B.M., & Brady, J.V. (1963). A long-term study of human performance in a continuously programmed experimental environment. Technical Report to the National Aeronautics and Space Administration. University of Maryland, College Park. URL: http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19640001916_1964001916.pdf

Gibson, T.M. (2006). The bioethics of enhancing human performance for spaceflight. Journal of Medical Ethics, 32, 129-132.

Gushin, V.I., Kholin, S.F., & Ivanovsky, Y.R. (1993). Soviet psychophysiological investigations of simulated isolation: Some results and prospects. In S.L. Bonting (Ed.), Advances in Space Biology and Medicine, 3. London: JAI Press.

Haddy, F.J. (2007). NASA – Has its biological groundwork for a trip to Mars improved? Journal of the Federation of American Societies for Experimental Biology, 21, 643-647.

Harris, W.C., Hancock, P.A., & Harris, S.C. (2005). Information processing changes following extended stress. Military Psychology, 17(2), 115-128.


Harrison, A.A. (2005). Behavioral health: Integrating research and application in support of exploration missions. Aviation, Space, and Environmental Medicine, 76(6), Section II, Supplement, B3-B12.

Hart, C.L., Ward, A.S., Haney, M., Nasser, J., & Foltin, R.W. (2003). Methamphetamine attenuates disruptions in performance and mood during simulated night-shift work. Psychopharmacology, 169, 42-51.

Hawkey, A. (2005). Physiological and biomechanical considerations for a human Mars mission. The Journal of the British Interplanetary Society, 58, 117-130.

Herring, L. (1997). Astronaut draws attention to psychology. Human Performance in Extreme Environments, 2, 42-47.

Hursh, S.R. (1980). Economic concepts for the analysis of behavior. Journal of the Experimental Analysis of Behavior, 34, 219-238.

Kagel, J.H., & Winkler, R.C. (1972). Behavioral economics: areas of cooperative research between economics and applied behavior analysis. Journal of Applied Behavior Analysis, 5, 335-342.

Kanas, N., & Manzey, D. (2003). Space Psychology and Psychiatry. Boston, MA: Kluwer Academic Publishers.

Kanas, N., Salnitskiy, V., Gushin, V., Weiss, D.S., Grund, E.M., Flynn, C., Kozerenko, O., Sled, A., & Marmar, C.R. (2001). Asthenia—Does it exist in space? Psychosomatic Medicine, 63, 874-880. URL: http://www.psychosomaticmedicine.org/cgi/content/full/63/6/874

Kane, R.L., Short, P., Sipes, W., & Flynn, C.F. (2005). Development and validation of the spaceflight cognitive assessment tool for windows (WinSCAT). Aviation, Space, and Environmental Medicine, 76(6), Section II, B183-B191.

Karash, Y. (2000). The Real World, Moscow-Style. Space.com. URL: http://www.space.com/news/spacestation/isolation_russia_000412.html

Kelly, T.H., Foltin, R.W., & Fischman, M.W. (1993). Effects of smoked marijuana on heart rate, drug ratings and task performance by humans. Behavioural Pharmacology, 4, 167-178.

Kelly, T.H., Hienz, R.D., Zarcone, T.J., Wurster, R.M., & Brady, J.V. (2005). Crewmember performance before, during, and after spaceflight. Journal of the Experimental Analysis of Behavior, 84(2), 227-241.

Krauss, R.W. (1991). NASA – A new course? A biologist’s view of the Augustine Report. The Journal of the Federation of American Societies for Experimental Biology, 5, 251.

Lambert, C. (2006). The marketplace of perceptions. Harvard Magazine, March-April. URL: http://www.harvardmagazine.com/on-line/030640.html

Longnecker, D., & Molins, R. (2006). A Risk Reduction Strategy for Human Exploration of Space: A Review of NASA's Bioastronautics Roadmap. URL: http://books.nap.edu/execsumm_pdf/11467.pdf


Manzey, D. (2004). Human missions to Mars: New psychological challenges and research issues. Acta Astronautica, 55, 781-790.

Morgan, B.B., & Alluisi, E.A. (1972). Synthetic work: Methodology for assessment of human performance. Perceptual and Motor Skills, 35, 835-845.

Musson, D.M., & Helmreich, R.L. (2005). Long-term personality data collection in support of spaceflight and analogue research. Aviation, Space, and Environmental Medicine, 76(6), Section II, B119-B125.

NASA-STD-3000. The standards are available on the World Wide Web. URL: http://msis.jsc.nasa.gov/. The quotation was taken from the Workload section of the document: http://msis.jsc.nasa.gov/sections/section04.htm#_4.10_WORKLOAD.

Nechaev, A.P., Polyakov, V.V., & Morukov, B.V. (2007). Martian manned mission: What cosmonauts think about this. Acta Astronautica, 60, 351-353.

Palinkas, L.A. (2001). Psychosocial issues in long-term space flight: overview. Gravitational and Space Biology Bulletin, 14(2), 25-33.

Ramstack, T. (2006). Manned mission to Mars. The Washington Times, Published April 23, 2006.

Rohles, F.H. (1960). Behavioral measurements on animals participating in space flight. American Psychologist, 15(10), 668-669.

Rohles, F.H. (1966). Operant methods in space technology. In W.K. Honig (Ed.), Operant Behavior: Areas of Research and Application (pp. 677-717). Englewood Cliffs, NJ: Prentice-Hall, Inc.

Rohles, F.H. (1992). Orbital bar pressing: A historical note of Skinner and the chimpanzees in space. American Psychologist, 47(11), 1531-1533.

Sandal, G.M. (2004). Culture and tension during an international space station simulation: Results from SFINCSS '99. Aviation, Space, and Environmental Medicine, 75(7), Supplement, C44-C51.

Shephard, J.M., & Kosslyn, S.M. (2005). The MiniCog rapid assessment battery: Developing a “blood pressure cuff for the mind.” Aviation, Space, and Environmental Medicine, 76(6, Section II), B192-B197.

Sidman, M. (1960). Tactics of Scientific Research. New York: Basic Books, Inc.

Skinner, B.F. (1953). Science and Human Behavior. New York: The Free Press.

Skinner, B.F. (1960). Pigeon in a pelican. American Psychologist, 15, 28-37.

Sullivan, W. (2006). Q&A: A Missionary for Mars Exploration. U.S. News & World Report. Posted 12/8/2006. URL: http://www.usnews.com/usnews/news/articles/061208/8nasa.htm

Tafforin, C. (2005). Ethological indicators of isolated and confined teams in the perspectives of missions to Mars. Aviation, Space, and Environmental Medicine, 76(11), 1083-1087.


Williams, R.S., & Davis, J.R. (2005). A critical strategy: Ensuring behavioral health during extended-duration space missions. Preface to a special issue entitled “New Directions in Spaceflight Behavioral Health: A Workshop Integrating Research and Application.” Aviation, Space, and Environmental Medicine, 76(6), Section II, Supplement, entire issue.

Zubrin, R. (2000). The Mars direct plan. Scientific American, 282, 34-37.

Author Contact Information:

Henry H. Emurian
College of Engineering and Information Technology
UMBC
1000 Hilltop Circle
Baltimore, Maryland 21250
Email: [email protected]
Web: http://nasa1.ifsm.umbc.edu/
Voice: 410-455-3206

Joseph V. Brady
Behavioral Biology Research Center
Johns Hopkins University School of Medicine
5510 Nathan Shock Drive
Baltimore, Maryland 21224
Email: [email protected]


The Behavior Analyst Today

Volume 8, Issue 2, 2007

Evaluating Features of Behavioral Treatments in the Nonhuman Animal Laboratory

John C. Borrero, Timothy R. Vollmer, Andrew L. Samaha, Kimberly N. Sloman and Monica T. Francisco

Behavioral treatments for severe problem behavior are often derived from basic behavioral principles initially evaluated in the nonhuman operant laboratory. It is less common to see laboratory experiments conducted for the specific purpose of addressing nuances of behavioral treatments. Because of functional analysis/assessment methods that are now commonly used in applied behavior analysis, integrated basic and applied research is more feasible and bidirectional. This is true because a functional analysis identifies reinforcers for problem behavior and, thus, control of those reinforcers is possible in a way that is similar to controlling access to reinforcers in basic research. Because of our enthusiasm for the possibility of conducting basic research on common behavioral treatments, we initiated a rat laboratory for that purpose. In this paper, we describe some early work from the laboratory.

Keywords: behavioral treatment, severe problem behavior, functional analysis, basic and applied research, rat laboratory.

Applied behavior analysis has benefited significantly from laboratory findings involving nonhuman subjects (Branch & Hackenberg, 1998). Treatments designed to decrease severe problem behavior have included various forms of differential reinforcement (e.g., differential reinforcement of incompatible behavior [DRI], differential reinforcement of alternative behavior [DRA], or differential reinforcement of other behavior [DRO]; Vollmer & Iwata, 1992); extinction (EXT; Lerman & Iwata, 1996); positive and negative punishment (Lerman & Vorndran, 2002); and time-based schedules, or noncontingent reinforcement (NCR; Vollmer, Iwata, Zarcone, Smith, & Mazaleski, 1993). All of these procedures were initially evaluated in the nonhuman operant laboratory. For example, the basic parameters of delivering punishing stimuli (e.g., immediacy, intensity, schedule, and motivation) in applied settings are to this day based on the pioneering work of Azrin and colleagues in the 1950s and 1960s involving nonhumans as subjects (e.g., Azrin, 1956; Azrin, 1960; Azrin & Holz, 1966).

Historically, the relationship between basic and applied research has been largely unidirectional (Vollmer & Hackenberg, 2001). That is, findings from nonhuman animal laboratories have been utilized by applied researchers and incorporated into behavioral treatments designed to reduce behavioral excesses or increase behavioral deficits among a wide range of populations (Cooper, Heron, & Heward, 1987; Miltenberger, 2004). The reverse, however (nonhuman animal research designed to answer questions of applied significance), has been the subject of appreciably less research. This unidirectional course may seem pragmatic to some, and as Mace (1994) noted, its proponents may suggest that in the absence of effective technologies designed to address problems of social significance, nonhuman animal research may yield limited tangible outcomes.
Although this unidirectional relationship has been effective in the majority of instances, several nuances encountered in application have not been addressed by nonhuman animal research. In addition, as suggested by Baer, Wolf, and Risley (1968), the difference between applied and basic research is not simply that which discovers and that which applies. Applied researchers may venture back to the nonhuman animal laboratory to investigate problems analogous to those of social significance encountered in more typical applied settings. This approach is common in other scientific fields, such as genetics and pharmacy. The purpose of this paper is to first detail the logic of studying behavioral treatments in a laboratory context. We contend that the approach is useful because: (a) laboratory research lends a degree of control over specific nuances of treatments that could not be controlled in applied situations, and (b)


the advent of functional analysis methods allows applied researchers to control and manipulate specific reinforcers for problem behavior; therefore, control and manipulation of reinforcers are more analogous to those in basic research than they would be if arbitrary reinforcers were used to treat behavior disorders. A second purpose is to briefly describe some of the experiments from this laboratory, in order to provide examples of the approach.

The logic of a laboratory model

Over the past 10-15 years, there has been a growing interest in applied research designed to replicate and extend basic research findings (Fisher & Mazur, 1997; Hineline & Wacker, 1993; Mace & Wacker, 1994). For example, Mace (1994) and others (e.g., Iwata & Michael, 1994; McDowell, 1982; 1988; 1989; Myerson & Hale, 1984; Nevin, 1996) have suggested one or more areas in which applied research might benefit from nonhuman laboratory findings. These areas included, for example, extensions of the matching law (Baum, 1974; Herrnstein, 1961, 1970) and behavioral momentum (Nevin, Mandell, & Atak, 1983), the applied implications of which have been recognized and expanded upon considerably (e.g., Ardoin, Martens, & Wolfe, 1999; Borrero & Vollmer, 2002; Mace & Belfiore, 1990; Mace et al., 1988; Mace et al., 1990; Strand, Wahler, & Herring, 2000; Symons, Hock, Dahl, & McComas, 2003). A complementary approach would be to strengthen the link between basic and applied behavior analysis by developing research questions that can be addressed in both kinds of laboratory settings. For example, treatment failures inherently associated with behavioral treatments could be more clearly elucidated in nonhuman animal laboratories while treatment failures associated with human implementation could be elucidated in applied research.
Further, in application it may be recognized that several variations of a particular treatment are available, but it may not be clear which is most effective with all extraneous (human implemented) variables held constant. In such a case, it could be the applied researcher who detects a potential question and turns to the nonhuman laboratory to minimize extraneous variables that might contribute to treatment failures. The nonhuman laboratory provides a convenient setting in which extended and unknown learning histories can be minimized, and establishing operations (EOs) can be precisely controlled. For example, nonhuman subjects, when naïve to experimental arrangements, are typically obtained several weeks after birth, while the target response is shaped in the context of the experimental environment. On the other hand, human participants bring extensive learning histories to bear on the specific preparations. Similarly, levels of deprivation (e.g., food deprivation) can be precisely controlled in the nonhuman laboratory. Applied research has armed clinicians with an arsenal of effective behavior change procedures, but some features of testing may be addressed more easily in the basic animal laboratory, thereby reducing potential threats to internal validity. In addition, if multiple reversals and extended conditions are required to compare the relative efficacy of treatment procedures, it might be best to initially compare the procedures with no risk of harm to human participants (who might engage in self-injury, severe aggression, and so on). One obvious concern is that control by behavioral contingencies in the laboratory does not equate to control by behavioral contingencies in complex environments (i.e., a concern about external validity of laboratory findings). As suggested previously, basic research involving nonhuman subjects has the added advantage of a known reinforcer. 
Analogously, when a good functional analysis has been conducted (described next in more detail), the assessment and treatment of severe problem behavior involves manipulation of known reinforcers. The refinement of functional analysis methods has brought along numerous demonstrations of control by extinction, differential reinforcement, and other reinforcement schedules. It appears that this is the case because the specific reinforcer (or reinforcers) is (are) isolated via functional analysis and, therefore, extinction, differential reinforcement (and so on) are more analogous to those processes and procedures studied in the laboratory. Thus, because behavioral

treatments now commonly involve manipulation and control of reinforcers previously maintaining the target behavior, functional analysis has increased the external validity of basic research that also involves manipulation of reinforcers maintaining a target response (such as key pecking or lever pressing). The functional analysis method is perhaps best exemplified by the method introduced by Iwata, Dorsey, Slifer, Bauman and Richman (1982/1994). Iwata et al. designed an experimental assessment procedure that included condition-specific discriminative stimuli and condition-specific EOs thought to evoke self-injurious behavior (SIB). Also, particular consequences hypothesized to reinforce SIB (such as attention and escape from instructional demands) were presented. More recently, functional analyses have also included a tangible (or materials) test condition (Durand & Crimmins, 1988; Vollmer, Marcus, Ringdahl, & Roane, 1995), in which specific tangible items like toys or food are presented as a consequence to problem behavior. The significance of functional analysis to the treatment of problem behavior is that the event (or events) shown to reinforce problem behavior can be delivered independent of behavior (typically on fixed-time [FT] or variable-time [VT] schedules, e.g., Mace & Lalli, 1991; Vollmer et al., 1993), withheld completely (EXT; e.g., Iwata, Pace, Cowdery, & Miltenberger, 1994; Zarcone, Iwata, Smith, Mazaleski, & Lerman, 1994), delivered contingent on the omission of behavior (DRO; e.g., Cowdery, Iwata, & Pace, 1990; Lindberg, Iwata, Kahng, & DeLeon, 1999), or delivered contingent on an alternative response (DRA; e.g., Vollmer, Roane, Ringdahl, & Marcus, 1999). The application of the functional analysis method of assessment allows for the logical link between assessment results and treatment development. 
In turn, the link between assessment and treatment of problem behavior provides the essential components necessary to design nonhuman basic research that is analogous to that conducted in applied settings. In short, functional analysis and functional analysis-based treatments make possible the nonhuman laboratory experimentation designed to improve applied technologies.

Examples of basic research on applied problems

Next is a brief overview of several ongoing projects in the rat laboratory that we have designed to address some questions faced in application. We have not yet published these studies, so the point of the current discussion is not to draw conclusions based on the results but to use these studies as examples of the type of research that has been, or could be, developed. The general topics of our research include: (a) differential reinforcement, and (b) noncontingent reinforcement (NCR).

Differential Reinforcement

Differential reinforcement is the process by which reinforcers are provided contingent upon the occurrence of desirable behavior (DRA) or the absence of undesired behavior (DRO). The goal of such procedures is to decrease undesired behavior through selective reinforcement of more desirable behavior. Two of our current studies relate to DRO. In one study, we are comparing a variation of DRO known as “resetting” DRO to a variation of DRO known as “momentary” DRO. In a resetting DRO, any response resets the interval to the beginning. For example, in a 5-min resetting DRO, if SIB occurs 20 s into the interval, reinforcement is not available for another 5 min. In a momentary DRO, the reinforcer is delivered at the end of the interval so long as the target response has not occurred in that very “moment” (the point of delivery).
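The difference between the two DRO variations can be made concrete with a short simulation. The Python sketch below is illustrative only and does not describe the laboratory procedure: the function names, the 1-s width assumed for the observation “moment,” and the example response times are all assumptions introduced here. It reproduces the 5-min example above, in which a response 20 s into the session postpones the next reinforcer under a resetting DRO.

```python
def resetting_dro(responses, interval, session):
    """Reinforcer delivery times (s) under a resetting DRO.

    Any response restarts the interval; a reinforcer is delivered only
    after `interval` seconds have passed without a response.
    """
    deliveries = []
    pending = sorted(responses)
    i = 0
    timer_start = 0
    while timer_start + interval <= session:
        due = timer_start + interval
        # Skip responses that occurred before the current interval began.
        while i < len(pending) and pending[i] <= timer_start:
            i += 1
        if i < len(pending) and pending[i] <= due:
            timer_start = pending[i]  # the response resets the interval
            i += 1
        else:
            deliveries.append(due)    # a response-free interval elapsed
            timer_start = due
    return deliveries


def momentary_dro(responses, interval, session, moment=1.0):
    """Reinforcer delivery times (s) under a momentary DRO.

    Behavior is checked only at the end of each interval. `moment` is an
    assumed observation window (s) ending at the check time.
    """
    deliveries = []
    check = interval
    while check <= session:
        occurring = any(check - moment < r <= check for r in responses)
        if not occurring:
            deliveries.append(check)
        check += interval
    return deliveries


# Hypothetical 15-min session with 5-min (300-s) intervals.
single_response = [20]
resetting_times = resetting_dro(single_response, 300, 900)
momentary_times = momentary_dro(single_response, 300, 900)
```

With a single response at 20 s, the resetting schedule delivers reinforcers at 320 s and 620 s, whereas the momentary schedule delivers them at 300, 600, and 900 s. This is in line with the expectation that the momentary procedure yields more reinforcers for the same pattern of behavior.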
In application, both kinds of DRO procedures are implemented by delivering or withholding (accordingly) the reinforcer previously maintaining problem behavior (e.g., Mazaleski, Iwata, Vollmer, Zarcone, & Smith, 1993; Repp, Barton, & Brulle, 1983). In our laboratory, these procedures are implemented by delivering or withholding (accordingly) food reinforcers following the occurrence (baseline) or nonoccurrence (“treatment”) of rats’ lever pressing. If momentary DRO turns out to be as effective as whole-interval DRO under highly controlled laboratory conditions, the momentary procedure has several potential advantages: (a) it should be easier to implement because it only requires

monitoring of behavior at the very end of an interval (as opposed to throughout the interval), and (b) it should yield a higher rate of reinforcement.

Another study related to DRO involves arranging “negative contingencies” such that the nonoccurrence of behavior is more likely to produce a reinforcer than is the occurrence of behavior. A true resetting DRO implemented with perfect integrity is a prototypical example of a negative contingency because the nonoccurrence of behavior produces reinforcers, whereas the occurrence of the target behavior never produces reinforcers. However, in actual application it is likely that the DRO is not implemented perfectly and, thus, the environment arranges reinforcement contingencies on a continuum from negative, to neutral (the occurrence of target behavior does not change the probability of a reinforcer), to positive (the occurrence of target behavior is more likely to produce reinforcers than is the nonoccurrence of target behavior). When DRO is implemented at various degrees of integrity, it is possible that target behavior with a strong history of reinforcement will persist in the face of a negative contingency (perhaps a negative contingency representing good but not great treatment implementation integrity). To address this possibility, we have arranged several experiments designed to test levels of responding under negative contingencies in order to compare the rates to baseline (no reinforcement) and positive contingency conditions. If behavior persists under negative contingencies, we will have learned not only something new about reinforcement but also more about the effects of treatment integrity failures on problem behavior.

Noncontingent reinforcement (NCR)

The treatment package called NCR in applied behavior analysis essentially represents a time-based schedule of reinforcer delivery.
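Before time-based schedules are described further, the negative-to-neutral-to-positive continuum discussed above for DRO can be illustrated numerically. One common way to quantify a contingency, assumed here and not necessarily the measure used in the experiments, is the difference between the probability of a reinforcer given a response and the probability of a reinforcer given no response. The function names, the tolerance used to label a value “neutral,” and the example data are hypothetical.

```python
def contingency_value(bins):
    """Return P(reinforcer | response) - P(reinforcer | no response).

    `bins` is a sequence of (responded, reinforced) pairs, one pair per
    observation interval. Values near -1 indicate a strongly negative
    contingency, values near 0 a neutral one, values near +1 a positive one.
    """
    after_response = [sr for responded, sr in bins if responded]
    after_no_response = [sr for responded, sr in bins if not responded]
    if not after_response or not after_no_response:
        raise ValueError("need intervals both with and without a response")
    p_response = sum(after_response) / len(after_response)
    p_no_response = sum(after_no_response) / len(after_no_response)
    return p_response - p_no_response


def describe(value, tolerance=0.05):
    """Label a contingency value; `tolerance` is an arbitrary assumption."""
    if value < -tolerance:
        return "negative"
    if value > tolerance:
        return "positive"
    return "neutral"


# DRO with perfect integrity: responses never reinforced,
# response-free intervals always reinforced.
perfect_dro = [(True, False)] * 4 + [(False, True)] * 4

# DRO with imperfect integrity: one response is reinforced in error and
# one response-free interval goes unreinforced.
imperfect_dro = ([(True, True)] + [(True, False)] * 3
                 + [(False, True)] * 3 + [(False, False)])

perfect_value = contingency_value(perfect_dro)      # -1.0
imperfect_value = contingency_value(imperfect_dro)  # -0.5
```

In this hypothetical example the integrity failures move the contingency from -1.0 to -0.5: still negative, but closer to neutral, which is the kind of condition under which persistence of the target behavior is in question.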
In a time-based schedule, stimuli are presented independent of behavior on either a fixed-, variable-, or random-time schedule (Catania, 1998). Currently, FT schedules are most common in application, although the use of VT schedules is emerging (e.g., Ahearn, Clark, Gardenier, Chung, & Dube, 2003; Carr, Kellum, & Chong, 2001; Van Camp, Lerman, Kelley, Contrucci, & Vorndran, 2000). One frequently noted advantage of time-based schedules is the ease of implementation (e.g., Fischer, Iwata, & Mazaleski, 1997; Hagopian, Fisher, & Legacy, 1994; Vollmer et al., 1993). Because reinforcer delivery is dependent on the passing of a prespecified interval of time and not on the occurrence (or nonoccurrence) of the target behavior, close monitoring of target responses is not required. In contrast, DRO procedures require the caregiver to carefully observe instances of problem behavior to ensure proper implementation of the resetting schedule. Time-based delivery of stimuli allows caregivers to attend to several clients or students at once.

Vollmer et al. (1993) assessed the feasibility of time-based schedules as a treatment for SIB by comparing them to DRO. Functional analyses suggested attention as the reinforcer for the SIB of 3 individuals with developmental disabilities. Accordingly, procedures consisted of the delivery of 10 s of attention on a FT schedule, which was gradually faded from continuous attention to one interaction every 5 min. For all participants, both the time-based schedule and DRO were effective in reducing SIB immediately following implementation. However, for two participants, the time-based schedule initially suppressed SIB to lower levels than did DRO, and overall rates of SIB under the time-based conditions were lower and less variable than those produced by DRO. Because reinforcer delivery was based on a preset schedule and not on responding, participants received stimuli at a considerably higher rate under the time-based schedule than in DRO.
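The difference in obtained reinforcement between the two arrangements follows directly from how each schedule treats responding. The Python sketch below is a simplified illustration and not a model of the Vollmer et al. (1993) procedures: it assumes 1-s time resolution, a 15-min session, 5-min schedule values, and a hypothetical stream of responses, and the function names are invented for the example.

```python
def fixed_time(interval, session):
    """Delivery times (s) under FT: one reinforcer every `interval`
    seconds, regardless of what the participant does."""
    return list(range(interval, session + 1, interval))


def resetting_dro(response_seconds, interval, session):
    """Delivery times (s) under a resetting DRO, stepped second by second.

    A response restarts the interval, so a reinforcer is delivered only
    after `interval` consecutive response-free seconds.
    """
    responded = set(response_seconds)
    deliveries = []
    elapsed = 0
    for t in range(1, session + 1):
        if t in responded:
            elapsed = 0  # the response restarts the interval
            continue
        elapsed += 1
        if elapsed == interval:
            deliveries.append(t)
            elapsed = 0
    return deliveries


# Hypothetical session: responses at 50, 150, 250, 350, and 450 s.
responses = [50, 150, 250, 350, 450]
ft_deliveries = fixed_time(300, 900)                 # behavior is ignored
dro_deliveries = resetting_dro(responses, 300, 900)  # behavior postpones delivery
```

For this response stream the FT 5-min schedule delivers three reinforcers (at 300, 600, and 900 s), whereas the resetting DRO delivers only one (at 750 s), because each response postpones the next delivery.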
Unlike the DRO arrangement, there are no lost opportunities for reinforcement when a time-based schedule is in effect. Although NCR is typically used to decrease problem behavior, applied research has also shown that under some circumstances, behavior is maintained even if reinforcer delivery is response-independent. Some of our studies are designed to evaluate the phenomena of suppressive and maintaining

effects of the procedure under FT arrangements. In application, the suppressive effects of time-based schedules might be influenced by the degree to which the time-based schedule (i.e., the treatment) differs from the baseline rate of reinforcement (Ringdahl, Vollmer, Borrero, & Connell, 2001). For example, if problem behavior is reinforced once every 30 s in baseline, and a FT 30-s schedule is prescribed as treatment, the treatment may prove ineffective if the baseline rate and pattern of responding persist under treatment conditions. In one of our laboratory studies we are comparing the response-suppressive effects of two time-based schedules: one that is temporally similar to baseline and one that is temporally dissimilar to baseline. The rats are first exposed to FI 60-s baseline schedules and then are exposed to either FT 30-s (different) or FT 60-s (similar) “treatment” schedules. If the temporal similarity between baseline reinforcement rate and a time-based schedule (designed as treatment) is an important factor in the response-suppressive effects of time-based schedules, then this information would have obvious applied implications. Specifically, researchers and clinicians could recommend time-based schedules that bear little temporal similarity to baseline reinforcement rate. Alternatively, if a caregiver or teacher wanted behavior to persist during FT, then he or she might intentionally match the baseline rate of reinforcement. For example, if academic engagement is initially reinforced on a FI 30-s schedule, a FT 30-s schedule might be easier for the teacher to implement yet might maintain behavior.

In the natural environment, it is possible that the implementation of treatment will suffer from periods of poor treatment integrity. For example, the recommendation to deliver a reinforcer every 5 min may not be implemented as prescribed.
In some circumstances, periods of poor integrity could be alternated with periods of good integrity and these periods could be “signaled,” such as when one parent implements treatment poorly and the other parent implements treatment well. In other circumstances, periods of poor and good integrity could be unsignaled, such as when the same caregiver is diligent at times and not so diligent at others. In our laboratory experiments, we have alternated periods of response-dependent schedules (FI) with periods of response-independent schedules (and extinction). In some experimental conditions, a signal is associated with each component schedule (a multiple schedule); in others, no signals are provided (a mixed schedule). Our findings show that responding (on FT) under a multiple schedule is higher than when FT is presented in isolation. Further, responding (on FT) under a mixed schedule is indistinguishable from responding on the FI. The implication is that if periods of free reinforcement and response-contingent reinforcement are alternated, responding may persist during the free reinforcement periods, especially if those periods are not signaled. This would be undesirable if the target response was problem behavior (such as self-injury or aggression), but it would be desirable if the target response was something like academic performance.

Summary

We have initiated a line of nonhuman animal research to evaluate some common difficulties encountered in the treatment of severe problem behavior and in the development of important skills via reinforcement. The impetus was to restructure the typical unidirectional approach of basic and applied research from one in which researchers apply procedures that evolved from the nonhuman animal laboratory to one in which applied researchers take a more active role in directing the type of nonhuman animal research that is conducted.
The nonhuman animal laboratory provides luxuries in terms of control that can only be obtained with significant effort in applied settings. We still have much to learn about basic behavioral treatments such as DRO, EXT, and NCR. Specific investigations under highly controlled contexts may assist clinicians in developing effective behavior change programs.

As the purpose of each of the aforementioned experiments is to improve applications, various experiments are either in development or in progress to assess the applicability of these procedures involving socially significant populations and responses. In addition, we are continuing the line of basic research to evaluate other features of reinforcement contingencies.

References

Ahearn, W. H., Clark, K. M., Gardenier, N. C., Chung, B. I., & Dube, W. V. (2003). Persistence of stereotypic behavior: Examining the effects of external reinforcers. Journal of Applied Behavior Analysis, 36, 439-448.
Ardoin, S. P., Martens, B. K., & Wolfe, L. A. (1999). Using high-probability instruction sequences with fading to increase student compliance during transitions. Journal of Applied Behavior Analysis, 32, 339-351.
Azrin, N. H. (1956). Some effects of two intermittent schedules of immediate and non-immediate punishment. Journal of Psychology: Interdisciplinary and Applied, 42, 3-21.
Azrin, N. H. (1960). Effects of punishment intensity during variable-interval reinforcement. Journal of the Experimental Analysis of Behavior, 3, 123-142.
Azrin, N. H., & Holz, W. C. (1966). Punishment. In W. K. Honig (Ed.), Operant behavior: Areas of research and application (pp. 380-447). New York: Appleton-Century-Crofts.
Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1, 91-97.
Baum, W. M. (1974). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231-242.
Borrero, J. C., & Vollmer, T. R. (2002). An application of the matching law to severe problem behavior. Journal of Applied Behavior Analysis, 35, 13-27.
Branch, M. N., & Hackenberg, T. D. (1998). Humans are animals, too: Connecting animal research to human behavior and cognition. In W. O'Donohue (Ed.), Learning and behavior therapy (pp. 15-35). Needham Heights, MA: Allyn & Bacon.
Carr, J. E., Kellum, K. K., & Chong, I. M. (2001). The reductive effects of noncontingent reinforcement: Fixed-time versus variable-time schedules. Journal of Applied Behavior Analysis, 34, 505-509.
Catania, A. C. (1998). Learning. Englewood Cliffs, NJ: Prentice Hall.
Cooper, J. O., Heron, T. E., & Heward, W. L. (1987). Applied behavior analysis. New York: Macmillan.
Cowdery, G. E., Iwata, B. A., & Pace, G. M. (1990). Effects and side effects of DRO as treatment for self-injurious behavior. Journal of Applied Behavior Analysis, 23, 497-506.
Durand, V. M., & Crimmins, D. B. (1988). Identifying the variables maintaining self-injurious behavior. Journal of Autism and Developmental Disorders, 18, 99-117.

Fischer, S. M., Iwata, B. A., & Mazaleski, J. L. (1997). Noncontingent delivery of arbitrary reinforcers as treatment for self-injurious behavior. Journal of Applied Behavior Analysis, 30, 239-249.
Fisher, W. W., & Mazur, J. E. (1997). Basic and applied research on choice responding. Journal of Applied Behavior Analysis, 30, 387-410.
Hagopian, L. P., Fisher, W. W., & Legacy, S. M. (1994). Schedule effects of noncontingent reinforcement on attention-maintained destructive behavior in identical quadruplets. Journal of Applied Behavior Analysis, 27, 317-325.
Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267-272.
Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243-266.
Hineline, P. N., & Wacker, D. P. (1993). JEAB, November '92: What's in it for the JABA reader? Journal of Applied Behavior Analysis, 26, 269-274.
Iwata, B. A., Dorsey, M. F., Slifer, K. J., Bauman, K. E., & Richman, G. S. (1994). Toward a functional analysis of self-injury. Journal of Applied Behavior Analysis, 27, 197-209. (Reprinted from Analysis and Interventions in Developmental Disabilities, 2, 3-20, 1982)
Iwata, B. A., & Michael, J. L. (1994). Applied implications of theory and research on the nature of reinforcement. Journal of Applied Behavior Analysis, 27, 183-193.
Iwata, B. A., Pace, G. M., Cowdery, G. E., & Miltenberger, R. G. (1994). What makes extinction work: An analysis of procedural form and function. Journal of Applied Behavior Analysis, 27, 131-144.
Lerman, D. C., & Iwata, B. A. (1996). Developing a technology for the use of operant extinction in clinical settings: An examination of basic and applied research. Journal of Applied Behavior Analysis, 29, 345-382.
Lerman, D. C., & Vorndran, C. M. (2002). On the status of knowledge for using punishment: Implications for treating behavior disorders. Journal of Applied Behavior Analysis, 35, 431-464.
Lindberg, J. S., Iwata, B. A., Kahng, S., & DeLeon, I. G. (1999). DRO contingencies: An analysis of variable-momentary schedules. Journal of Applied Behavior Analysis, 32, 123-136.
Mace, F. C. (1994). Basic research needed for stimulating the development of behavioral technologies. Journal of the Experimental Analysis of Behavior, 61, 529-550.
Mace, F. C., & Belfiore, P. (1990). Behavioral momentum in the treatment of escape-motivated stereotypy. Journal of Applied Behavior Analysis, 23, 507-514.
Mace, F. C., Hock, M. L., Lalli, J. S., West, B. J., Belfiore, P., Pinter, E., et al. (1988). Behavioral momentum in the treatment of noncompliance. Journal of Applied Behavior Analysis, 21, 123-141.

Mace, F. C., & Lalli, J. S. (1991). Linking descriptive and experimental analyses in the treatment of bizarre speech. Journal of Applied Behavior Analysis, 24, 553-562.
Mace, F. C., Lalli, J. S., Shea, M. C., Lalli, E. P., West, B. J., Roberts, M., et al. (1990). The momentum of human behavior in a natural setting. Journal of the Experimental Analysis of Behavior, 54, 163-172.
Mace, F. C., & Wacker, D. P. (1994). Toward greater integration of basic and applied behavioral research: An introduction. Journal of Applied Behavior Analysis, 27, 569-574.
Mazaleski, J. L., Iwata, B. A., Vollmer, T. R., Zarcone, J. R., & Smith, R. G. (1993). Analysis of the reinforcement and extinction components in DRO contingencies with self-injury. Journal of Applied Behavior Analysis, 26, 143-156.
McDowell, J. J (1982). The importance of Herrnstein's mathematical statement of the law of effect for behavior therapy. American Psychologist, 37, 771-779.
McDowell, J. J (1988). Matching theory in natural human environments. The Behavior Analyst, 11, 95-109.
McDowell, J. J (1989). Two modern developments in matching theory. The Behavior Analyst, 12, 153-166.
Miltenberger, R. G. (2004). Behavior modification: Principles and procedures (3rd ed.). Pacific Grove, CA: Wadsworth.
Myerson, J., & Hale, S. (1984). Practical implications of the matching law. Journal of Applied Behavior Analysis, 17, 367-380.
Nevin, J. A. (1996). The momentum of compliance. Journal of Applied Behavior Analysis, 29, 535-547.
Nevin, J. A., Mandell, C., & Atak, J. R. (1983). The analysis of behavioral momentum. Journal of the Experimental Analysis of Behavior, 39, 49-59.
Repp, A. C., Barton, L. E., & Brulle, A. R. (1983). A comparison of two procedures for programming the differential reinforcement of other behaviors. Journal of Applied Behavior Analysis, 16, 435-445.
Ringdahl, J. E., Vollmer, T. R., Borrero, J. C., & Connell, J. E. (2001). Fixed-time schedule effects as a function of baseline reinforcement rate. Journal of Applied Behavior Analysis, 34, 1-15.
Strand, P. S., Wahler, R. G., & Herring, M. (2000). Momentum in child compliance and opposition. Journal of Child and Family Studies, 9, 363-375.
Symons, F. J., Hoch, J., Dahl, N. A., & McComas, J. J. (2003). Sequential and matching analyses of self-injurious behavior: A case of overmatching in the natural environment. Journal of Applied Behavior Analysis, 36, 267-270.
Van Camp, C. M., Lerman, D. C., Kelley, M. E., Contrucci, S. A., & Vorndran, C. M. (2000). Variable-time reinforcement schedules in the treatment of socially maintained problem behavior. Journal of Applied Behavior Analysis, 33, 545-557.
Vollmer, T. R., & Hackenberg, T. D. (2001). Reinforcement contingencies and social reinforcement: Some reciprocal relations between basic and applied research. Journal of Applied Behavior Analysis, 34, 241-253.
Vollmer, T. R., & Iwata, B. A. (1992). Differential reinforcement as treatment for behavior disorders: Procedural and functional variations. Research in Developmental Disabilities, 13, 393-417.
Vollmer, T. R., Iwata, B. A., Zarcone, J. R., Smith, R. G., & Mazaleski, J. L. (1993). The role of attention in the treatment of attention-maintained self-injurious behavior: Noncontingent reinforcement and differential reinforcement of other behavior. Journal of Applied Behavior Analysis, 26, 9-21.
Vollmer, T. R., Marcus, B. A., Ringdahl, J. E., & Roane, H. S. (1995). Progressing from brief assessments to extended experimental analyses in the evaluation of aberrant behavior. Journal of Applied Behavior Analysis, 28, 561-576.
Vollmer, T. R., Roane, H. S., Ringdahl, J. E., & Marcus, B. A. (1999). Evaluating treatment challenges with differential reinforcement of alternative behavior. Journal of Applied Behavior Analysis, 32, 9-23.
Zarcone, J. R., Iwata, B. A., Smith, R. G., Mazaleski, J. L., & Lerman, D. C. (1994). Reemergence and extinction of self-injurious escape behavior during stimulus (instructional) fading. Journal of Applied Behavior Analysis, 27, 307-316.

Authors’ Note

The research described in the paper was supported in part by the National Institute of Child Health and Human Development, Grant HD386898-02, and the National Institute of Mental Health, Grant 5R03MH060643-02. We thank Stephen Haworth, Frans van Haaren, and the graduate and undergraduate students who assisted in running various aspects of the studies. Monica Francisco is now at the University of Kansas.

Author Contact Information:

John C. Borrero, Ph.D.
University of the Pacific
Department of Psychology
Stockton, CA 95211
Tel.: 209-946-7317
[email protected]

Timothy R. Vollmer
Department of Psychology
University of Florida
Gainesville, FL 32611
Tel.: 352-392-0601 ext. 280
[email protected]

A Behavior Analytic Look at Contemporary Issues in the Assessment of Child Sexual Abuse

W. Joseph Wyatt

The assessment of child sexual abuse has largely been ignored by behavior analysts, although behavior analytic theory and methodology, if applied, likely would advance the field. Three classic cases demonstrate historic errors that might have been avoided, had a behaviorally based approach been employed. Functional analytic interpretations are provided for phenomena that have been explored in a representative sample of studies that, though empirical, do not appear in the behavioral literature. Specific recommendations for practice, and a call for greater involvement of behavior analysis, are presented.

Keywords: Child sexual abuse; behavioral assessment; behavior analysis; behavioral assessment of child sexual abuse.

When considering the assessment of child sexual abuse, one may ask, “What are the characteristics of a behavioral approach that make it suited to the task?” The natural science of behavior analysis and the philosophy of behaviorism appear to be well suited to address the assessment of child sexual abuse—especially when examined in contrast to the practices and philosophies that have been heavily relied upon up to now. To make this latter point clear it is helpful to briefly review several high-profile cases. (See Ceci & Bruck, 1995, for more extensive presentations of the cases.) They demonstrate what may occur when a non-scientific approach is applied to the assessment of sexual abuse in young children, specifically to those aged six and under. In reviewing the following cases it is important to remember that the majority of reports of child sexual abuse are true. Estimates of false accusations range from 5% to 35% (Bruck, Ceci & Hembrook, 1998). For example, in a Denver study of 576 reports in a single year it was found that 53% were “indicated” (likely true), 24% contained insufficient information on which to make a true/untrue decision, and 23% were “unfounded” (clearly untrue). Within the last group about three-fourths were thought to have been made in good faith, with the remainder thought to have been knowingly false (Jones & McGraw, 1987). Studies in Michigan looked at true and false allegations of child sexual abuse in divorce cases. One review of 169 cases found that 33% were “unlikely” to be true, based upon Child Protective Services and the court appointed evaluators’ opinions, while another review of 215 cases found that 20% were “unlikely” based on extensive (a mean of 62 hours) multi-disciplinary investigation (Faller, 2001). Thus, the parallel imperatives of child protection and fair treatment of the accused mandate a scientific skepticism about the methods by which reports of child sexual abuse are assessed. 
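To give a sense of scale, the Denver percentages cited above can be converted into approximate numbers of reports. The short Python sketch below is only a back-of-the-envelope calculation: the percentages are rounded in the text and “about three-fourths” is treated as exactly 0.75, so the resulting counts are estimates and not figures reported by Jones and McGraw (1987). The variable names are introduced here for illustration.

```python
total_reports = 576

# Percentages as cited in the text (rounded in the source).
shares = {
    "indicated": 0.53,
    "insufficient information": 0.24,
    "unfounded": 0.23,
}

# Approximate number of reports in each category.
counts = {label: round(total_reports * share) for label, share in shares.items()}

# "About three-fourths" of unfounded reports were thought to be made in
# good faith; 0.75 is an assumption standing in for that phrase.
unfounded = counts["unfounded"]
good_faith = round(unfounded * 0.75)
knowingly_false = unfounded - good_faith

# Share of all reports estimated to be knowingly false.
knowingly_false_share = knowingly_false / total_reports
```

Under these assumptions, roughly 132 of the 576 reports were unfounded, about 99 of those were made in good faith, and about 33 (a little under 6% of all reports) were thought to be knowingly false.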
Wee Care Nursery School

Kelly Michaels was a staff member at the Wee Care Nursery School in Maplewood, New Jersey, where she was accused of sexual abuse of numerous children. The accusations began when a 4-year-old was having his temperature taken rectally by his doctor and said, “Her takes my temperature.” The child’s mother contacted Child Protective Services and the boy was then interviewed by an examiner who used anatomically detailed dolls. The child disclosed that Michaels had also molested two other boys, although when they were interviewed they denied it.

At that point the school’s staff sent letters to all parents advising them of the disclosures and had a social worker make a presentation to the parents. Many parents, convinced that the allegations were true, placed their children in therapy. Following many interviews by various professionals (therapists, prosecutors, and mental health professionals hired by the prosecution), twenty children accused Michaels of molestation. Many of the accusations seemed incredible, such as that Michaels had penetrated them with a sword, smeared peanut butter all over their bodies and licked it off, and had done all of these things at the school although it was open and accessible by parents and staff at all times. In August, 1988, Kelly Michaels was convicted of 119 counts of abuse and sentenced to 47 years in prison. However, after serving five years Michaels was ordered released on bail by New Jersey’s Supreme Court, which found the children’s testimony had been tainted by the questionable interview techniques used by the professionals. The court then placed a burden on the prosecution: Before Michaels could be re-tried the prosecution would have to appear at a pre-trial “taint hearing” and show that the children’s testimony should still be considered reliable. Faced with such a burden, the prosecution dropped all charges against Michaels and she was again a free woman, although her life had been forever changed (State v. Michaels, 1988).

Little Rascals Day Care Center

The North Carolina coastal community of Edenton was the location of the Little Rascals Day Care Center where accusations unfolded in 1989. They began when a parent claimed that the owner’s husband, Bob Kelly, had molested her son. Within a month three other children had made similar claims. Eventually ninety children accused Kelly and other staff members of similar acts of molestation.
The children claimed that they had been abused in numerous locations including at the school, in outer space, on supervised boat outings, and that babies had been murdered during rituals of abuse. The children claimed also that the molestation had been photographed. No babies were ever reported missing and no photographs were ever found. All of the abuse was said to have happened even though the nursery school was open, parents were free to drop in at any time and none had ever witnessed any suspicious activity. In April, 1992, Bob Kelly was convicted of 99 counts of sexual abuse and sentenced to life in prison. A staff member, Dawn Wilson, was also convicted. Aware of this, two other staff members pled guilty to lesser charges and received much lighter sentences in exchange for their guilty pleas. However, in May, 1995, the convictions of Kelly and Wilson were overturned by the North Carolina Supreme Court, for reasons similar to those cited by the New Jersey Supreme Court in the Michales case. In May, 1997, all charges against Kelly and Wilson were dropped and they were released, while those who had pled guilty had to serve out their sentences. At the same time the charges were dropped, the prosecution announced that it would pursue a new charge against Kelly, claiming he had molested another child in 1987 (State v. Kelly, 1991). Prosecutors appeared to have little appreciation for the possibility, or likelihood, that they were pursuing innocent people. Prosecutorial fervor for the case evidently persisted long after it had become clear that the case had taken a series of wrong turns. Despite the disastrous results, one of the prosecutors continues to hold herself out as an expert. As recently as November, 2006, Nancy Lamb, still working as an assistant district attorney, was co-presenter of a training program for professionals titled “The Necessary Components of a Legally Defensible Child Sex Abuse Investigation” (Ryan & Lamb, 2006).

McMartin Preschool

Perhaps the best known case of alleged multiple child sexual abuse occurred in Manhattan Beach, California, starting in 1983. It began when a parent, who was known to have previously made false allegations of sexual abuse against her ex-husband, went to police, claiming that her 2-year-old had been sexually molested by Ray Buckey. Buckey worked at the McMartin Preschool, which was owned by his grandmother. Before the case was ended, Buckey, his grandmother, his mother and five other staff members were accused of 321 counts of child sexual abuse, some of which were said to have occurred ritualistically in tunnels beneath the pre-school. By 1986, all charges, except several against Buckey and his mother, had been dropped for lack of evidence. In January, 1990, following what was up to then the most expensive trial in California history, the jury returned not guilty verdicts on 52 counts, but remained deadlocked on 12 others against Buckey and one against his mother. The judge then dismissed the single count against Buckey’s mother. A second trial was held on the remaining charges against Buckey. It also ended in a mistrial. At that point the prosecution decided not to re-try Buckey and the case ended, after seven years and a cost of $16 million (State v. Buckey, 1990).

Costs

In both human and financial terms, the costs of these and similar cases of misapplied science (or, rather, application of non-science) in the examinations of children were enormous. Many children grew up convinced that they had been molested when, most likely, they had not. At the very least, the children came to believe that the legal system had let them down. One wonders whether later in life these children will experience routine difficulties in relationships or other functioning that they will attribute to sexual abuse that never happened. There are other costs. Defendants who probably were not guilty spent years in prison, their lives forever changed.
Defendants’ families and children’s families suffered as well. Professionals and professions suffered lost esteem in the wake of these cases. Financial costs of prosecutions and defenses were enormous. As well, it is possible that the adverse publicity from failed prosecutions continues to influence juries who may assume that allegations of child sexual abuse are overblown and frequently false (West, 1996). This could result in not guilty verdicts for actual perpetrators who then go free to continue their abuse. Thus, the costs make clear that the behavioral sciences, including behavior analysis, address the issue of child sexual abuse assessment. Specifically, What Went Wrong? The absence of a science-based approach to the assessment of child sexual abuse contributed powerfully to the disastrous outcomes described above, as well as to similarly unfortunate outcomes in numerous other cases. The problems have been articulated at length elsewhere (See Ceci & Bruck, 1995) and will be only briefly discussed here. However, much of what went wrong may be reduced to the examiners’ evident failure to appreciate the situational control of behavior. Those who conducted the examinations of young children in the above cases frequently failed to appreciate their own antecedent and consequent control over the children’s reports. Within that general set of issues, however, were a number of specific assessment activities that deserve discussion.

Possibly the most characteristic error was the examiners’ pursuit of a single hypothesis: that any child adjustment problems, and especially those of an anxious nature, must have been due to sexual abuse. For example, if a child demonstrated regressed toileting, excessive nightmares, fingernail biting or other anxious behaviors, it was thought that sexual abuse was the culprit. Little thought was given to whether there may have been alternative causes of the child’s distress. No one asked whether the child’s difficulties in adjustment could have been due to events such as a move to a new neighborhood, death of a grandparent, marital strife between the parents, or the like. There is no compelling evidence that most childhood behavior problems, including anxious behaviors, are due to sexual abuse. Research tells us that there is no standard way that children respond to sexual abuse. While thoughts, feelings, and overt behavior of an anxious nature are fairly common among some child victims, many others exhibit extreme sadness, conduct problems or other problems. Moreover, one-third of child victims exhibit no measurable problems at all in their overt behavior, thoughts or feelings (Saywitz, Mannarino, Berliner & Cohen, 2000). In many of the cases such as those described above, an effort to explore all potential causes of observed behavioral difficulties would have improved the assessment process.

Second, it was common for children to be interviewed numerous times, including by examiners who proceeded as if the abuse had occurred, regardless of whether a disclosure had been made. Within those interviews it was not uncommon for the examiners to engage in several other unfortunate tactics: They often used anatomically detailed dolls, devices that are plagued by concerns about their suggestiveness. They asked many leading questions and tended to differentially reinforce statements that would implicate (as opposed to exculpate) an alleged perpetrator. At times, interview tapes and transcripts show, they punished exculpatory statements with disapproval and threats (“You can go home, but only after you tell us what we know you know.”). They assumed that children, in their innocence, could not fabricate a tale of sexual abuse and further assumed that the presence of peripheral details in a child’s story was a valid indication that abuse had occurred (Ceci & Bruck, 1995).

There were other errors as well. Interviewers frequently failed to directly assess the child’s tendency to be easily led. Some believed that, although a child might fabricate enjoyable events, the child could not fabricate unpleasant or frightening events. They often employed negative stereotypes of the accused (“You can help us stop Mr. Jones from hurting other kids if you tell us what happened to you.”), and they typically failed to directly assess the child’s understanding of the concepts of either fantasy/reality or truth/non-truth. Making things worse was the absence of science-based, agreed-upon assessment procedures that defined best practices in the assessment process. Finally, once an allegation was made, parents and professionals often became emotionally involved to the point that objectivity was lost.

There Were No Tunnels Under the McMartin Pre-school

An absence of appreciation for objectivity and the resulting drift from it are illustrated in a sidebar to the McMartin case. During numerous suggestive interviews and therapy sessions, several of the pre-schoolers had alleged that ritualistic sexual abuse, at times including animal sacrifices, had taken place in tunnels under the school building. Law enforcement authorities had looked under the building and found nothing. There was neither a basement nor tunnels, and the concrete floor of the pre-school showed no signs of having been disturbed or jack-hammered to provide an opening in the floor. Numerous parents, however, were convinced that their children’s stories of tunnels were true, and that the alleged perpetrators had backfilled the tunnels after accusations had arisen.

Unhappy with police efforts to locate the tunnels, several parents brought in a backhoe and began digging. Authorities halted the process, but the parents then received permission from the property’s owner to hire a professional archeologist who dug into the site in search of evidence that the tunnels had once existed (Stickel, 1993). His evident lapses in objectivity were startling. In an area under the building, the archeologist found differential soil compaction and many artifacts such as bottles and tin cans, as well as animal bones. These caused him to conclude, in part, that, “There is no other scenario that fits all of the facts except that the feature was indeed a tunnel…therefore, this project’s goals or objectives were met with data which probabilistically corroborates (sic) reports made by the children regarding the site” (p.96).

However, there is substantial reason to conclude that the archeologist ignored or discounted much of his own evidence and, thus, reached an erroneous conclusion. Importantly, the pre-school building had been constructed in 1966. Next door to it was a vacant lot where, in the once rural California community, a home had stood from the 1930s or 1940s until about 1972. In their zeal to find evidence to corroborate the children’s reports of tunnels, the archeologist and the children’s parents had missed an important and likely more plausible explanation of the findings: that the pre-school had been constructed atop a family’s old trash dump, a pit that had been used by the owners of the home next door in an era that pre-dated both construction of the pre-school and home trash pick-up in the area.

For example, the archeologist’s analysis of the artifacts revealed that all of the bottles dated from the 1950s and earlier. The animal bones came from chickens, pigs, dogs, birds and cattle, all of which were adult animals at the time of death. The cuts on the bones were the type made with a band saw, such as is used in a butcher shop (and therefore unlikely to have been made during an animal sacrifice). Only two of the several hundred artifacts dated after construction of the school. They were a clip used to repair plumbing (which likely was left during a repair job), and a “snack sized cellophane wrapper” which probably was placed there by a burrowing rodent in a process known as bioturbation.

One may question why an archeologist would not have asked himself how or why alleged sexual perpetrators re-filled a tunnel with dirt that contained essentially no trash manufactured after the 1940s. Why would they have used a band saw in a ritualistic sacrifice? How would they have gotten an adult beef into a small tunnel? (The archeologist found the tunnel to have been only four feet high.) These and similar gaps in the archeologist’s analysis are described more fully elsewhere (Wyatt, 2002). A theoretical functional analysis of the archeologist’s premise and conclusions is instructive.

A Functional Analysis of Professional Behavior

By all accounts, antecedents to the archeologist’s conclusions included the presence of an emotionally charged group of parents and their sincere descriptions to the archeologist of what they believed had happened. As well, the archeologist may have believed that children were incapable of fabricating stories of sexual abuse in tunnels, and he may not have appreciated the extent to which suggestive techniques can cause children to fabricate, a phenomenon that is now well known and is described in some detail below. Consequences that could have contributed to his discounting, or failure to consider, the possibility of a trash pit may have included praise from parents and members of the community as the dig progressed.

“Operant seeing” (Skinner, 1974) may explain how a classroom door that was always open, and a non-functioning fire alarm, were taken by the archeologist as likely evidence of child sexual abuse (Stickel, 1993). Similarly, operant seeing (or operant reading) may explain another of the archeologist’s conclusions. In addition to his own digging, he had hired a local firm to employ Ground Penetrating
Radar (GPR) in search of tunnels. The GPR firm’s complete results were stated in two sentences: “In Areas One, Two, Three and Four (see Figures Two, Three, Four and Five) the GPR depth of penetration was approximately 8 to 10 feet below ground level. No evidence was found to support the existence of filled-in below ground tunnels.” However, when the archeologist wrote his final report, he interpreted the unambiguous GPR results to mean the opposite: “Thus, the GPR was successful in detecting the main tunnel at the locus of the dividing wall between the two classrooms” (Stickel, 1993).

Perhaps the archeologist had never lived in, or otherwise been exposed to, a rural area where families dig a pit and burn their trash. Such a history would serve as another antecedent, or establishing condition, one that could have contributed to his failure to consider a plausible alternative hypothesis, that a long-ago family’s trash pit would have parsimoniously explained the debris he found.

One would think that an additional artifact should have prompted the archeologist to reconsider his “tunnels” conclusion. This was the battered mailbox, complete with name and address, of the family that had lived in the house next door decades earlier. A different learning history might have come together with observation of the mailbox with the result that the archeologist would have asked himself potentially disconfirming questions. How would Ray Buckey have happened upon fill dirt that contained that mailbox? Would he have saved the original, artifact-filled soil and then returned it? If he had saved that soil, why had it contained only decades-old artifacts?

An especially interesting question is this: What history would have been powerful enough to cause an archeologist to overlook or discount disconfirming evidence? Perhaps emotional antecedents, coupled with social reinforcers for preliminary reports of a “tunnels find,” are all, or part, of the answer. In any event, it is clear that non-science was the result.

Behavioral Assessment Issues

The cases described at the opening of this article reveal a number of mistakes by well-intentioned interviewers that brought about unreliable reports of child sexual abuse. These mistakes have occurred repeatedly across many cases. There were a number of particularly salient errors of technique and theory. These included pursuit of a single hypothesis (usually that the abuse did occur); repetition of questions and interviews; use of anatomically detailed dolls; use of leading questions; differential reinforcement of statements that either implicate (most often) or exculpate the accused; the assumption that children, in their innocence, could not be wrong about a matter as serious as sexual abuse; the belief that description of peripheral details validated the central allegation of abuse; interviewers’ failure to actively test children to determine their level of suggestibility; the belief that children cannot fabricate unpleasant events; suggestion of a negative feature or stereotype of the accused (“He did it to some other boys and girls. Did he do it to you?”); and failure to appreciate, or actively evaluate, the child’s ability to differentiate both fantasy from reality and truth from untruth.

That these assumptions and procedures were deleterious to the examination process has by now been relatively well established. What follows is not intended as an exhaustive review of each of these issues. Rather, representative research and examples, several of them seminal, are presented.

Pursuit of a Single Hypothesis and Repetition of Questions

What is likely to result when an interviewer of young children proceeds with a bias that abuse occurred, or with a bias that it did not occur? Some light was shed on that question by a study of two groups of children aged five and six who were visited in their classroom by a man. One group observed the man and talked to him as he cleaned some dolls. The second group observed and interacted with the man as he played roughly with the dolls. The children were questioned several times on one day by interviewers who employed one of three verbal styles regarding the man’s handling of the dolls:
accusatory, exculpatory or neutral. Later, the children were questioned by other interviewers. It was found that 75% of the children’s remarks were consistent with the tones employed by their earlier interviewers, rather than with the actions of the classroom visitor (Clark-Stewart, Thompson & Lepore, 1989). The examiners’ verbal styles exerted powerful stimulus control over the children’s reports.

In another study, interviewers were purposely given erroneous information about a game that had been played by children aged 3 to 6. The interviewers, who had not observed the game as it was played, were asked to get the children to describe as much about the game as possible. Among 3 and 4-year-old children, 34% corroborated one or more fictional events that the interviewers believed had occurred. With 5 and 6-year-olds, 18% corroborated such events (White, Leichtman & Ceci, 1997). Clearly, when an interviewer establishes a preferred direction for children’s answers, substantial percentages of children acquiesce. For most young children adult approval is a relatively powerful secondary reinforcer, and has acquired some intrinsic reinforcement value as well. Prompting, smiling, nods of the head, words of praise and the like were most likely unintended, but powerful, influences upon the children’s reports.

The influence of repeated questioning is further evident in a study in which young children were examined by a pediatrician. The doctor did not touch their genitals. Later they were repeatedly asked (using non-leading questions, but with anatomically detailed dolls) whether the doctor had touched their genitals. Ultimately about 50% of them said that their genitals had been touched during the exam. In contrast, at the first interview following the pediatric examination, none had said so (Ceci, Leichtman & Bruck, 1995).

Functionally analyzed, it is probable that the interviewer’s style or tone (accusatory/exculpatory/neutral) serves as an antecedent leading to agreement with that tone, at least for many children. Similarly, repetition of questions serves as a relatively powerful consequence. That is, repetition of a question implies that the preferred response has not yet been given. Under such circumstances, it is not surprising that an interviewer’s tone comes together with a child’s history of having been taught to comply with the wishes of authority figures. The result is agreement with the interviewer, rather than telling what happened.

The controlling authority of adult approval has been demonstrated in countless studies (e.g., see any issue of The Journal of Applied Behavior Analysis). Recently that control was described anecdotally in an open letter written by a man who had been one of the McMartin children in the 1980s. In his letter, published in the Los Angeles Times, he described feeling that he would never be allowed to go home unless he told the interviewers what they wanted to hear, and how he broke down under repeated questioning. “Anytime I would give them an answer that they didn’t like, they would ask again and encourage me to give the answer they were looking for…I remember breaking down and crying. I felt everyone knew I was lying. But my parents said, ‘You’re doing fine. Don’t worry.’ And everyone was saying how proud they were of me, not to worry” (Zirpolo & Nathan, 2005).

Use of Anatomically Detailed Dolls

Use of anatomically detailed dolls (AD dolls) in the assessment of child sexual abuse remains controversial. The concern is that they will contribute to false accusations because children’s attention will be drawn to the dolls’ genitals, which will in turn lead children to make false allegations. In a study of 9 abused and 9 non-abused children’s free play with the dolls, the children exhibited no differences in
their amounts of sexualized play (Kenyon-Jump, Burnette & Robertson, 1991), suggesting that the dolls are of little use. However, free play is not the same as the forensic interview process. In another pediatric study, two groups of 3-year-old children were examined. One group received gentle touching of their buttocks and genitals, the other no such touches. Later an interviewer asked, pointing to parts of the AD dolls, “Did the doctor touch you here? Or here?” Among the children who had been touched, only 47% replied affirmatively to questions about their genitals and buttocks. Among those who had not been touched, 50% indicated that their genitals or buttocks had been touched. Thus, the children’s responses were essentially random, probably due to their young ages. The dolls were not helpful.

The American Psychological Association’s approach to the use of the AD dolls has changed over the years. Earlier the organization held that there were no standardized methods for their use, and that normative data on abused and non-abused children’s responses to the dolls were essentially non-existent. In spite of those conclusions, the APA went on to say that the AD dolls “may be the best available practical solution for a pressing and frequent clinical problem” (American Psychological Association, 1992). APA later seemed to modify its position when a blue-ribbon committee re-evaluated the state of the literature and recommended that APA reconsider whether valid doll-centered assessment techniques exist (Koocher, Goodman, White & Friedrich, 1995). Today many professionals suggest that the dolls not be used in the assessment of child sexual abuse at all, because it is difficult to rule out the possibility that normal curiosity about the dolls’ genitalia serves as an antecedent for invalid statements by the child (Wyatt, 1999).
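The claim that the 3-year-olds’ doll-prompted answers were "essentially random" can be checked with simple arithmetic. The sketch below is illustrative only and is not part of the original study: it treats the reported 47% as a hit rate and the 50% as a false-alarm rate, computes their ratio, and applies it to a prior. The 50% prior is a hypothetical value chosen for illustration (the study’s group sizes are not given here), and the function names are likewise assumptions.

```python
def likelihood_ratio(hit_rate: float, false_alarm_rate: float) -> float:
    """Ratio P(child says 'yes' | touched) / P(child says 'yes' | not touched)."""
    if not 0.0 < false_alarm_rate <= 1.0:
        raise ValueError("false_alarm_rate must be in (0, 1]")
    return hit_rate / false_alarm_rate


def posterior_probability(prior: float, lr: float) -> float:
    """Update a prior probability with a likelihood ratio (odds form of Bayes' rule)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1.0 + posterior_odds)


# Figures reported in the text for the doll-prompted questions.
HIT_RATE = 0.47          # touched children who answered "yes"
FALSE_ALARM_RATE = 0.50  # untouched children who answered "yes"

# Hypothetical prior, assumed for illustration only.
HYPOTHETICAL_PRIOR = 0.50

lr = likelihood_ratio(HIT_RATE, FALSE_ALARM_RATE)
posterior = posterior_probability(HYPOTHETICAL_PRIOR, lr)

print(f"likelihood ratio of a 'yes' answer: {lr:.2f}")
print(f"probability of touch after a 'yes': {posterior:.3f}")
```

A ratio of 1.0 would mean a "yes" carries no information at all. Under these assumptions the ratio is 0.94, so a "yes" leaves the probability of actual touching nearly unchanged, and in fact slightly below the prior, which is consistent with the conclusion that the dolls were not helpful with children this young.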

Leading Questions and the Belief that Children Can Not be Wrong

Does accuracy of a child’s report depend on whether pleasant or unpleasant events are being reported? Could a child possibly be wrong or fabricate events as troubling as sexual abuse? Does any of this relate to whether interviews are suggestive or repeated? These questions were subjects of a study (Bruck, Ceci & Hembrook, 1997) in which preschoolers were interviewed regarding four kinds of events. Two of the events were true and two had never happened, but were suggested to the children. Furthermore, one of the true events was positive (helping a school visitor who had tripped, an event which was staged for the children), while the other actual event was negative (a recent punishment by a parent or teacher). Of the fictional events that were suggested by the interviewers, one was positive (the suggestion of having helped a lady look for a lost monkey in a park) and one negative (the suggestion of having witnessed a man stealing food from the day-care building).

The children were interviewed on five occasions. At the first interview they were asked whether these events had happened and for any details. The second, third and fourth interviews were suggestive: Peer references were used (“Sue said you saw it too. Did you?”). Visualization was employed (“Try to think what might have happened.”). Questions were frequently repeated and children were praised for assenting to the suggestions. Then, a fifth interview was conducted by a new interviewer who used nonsuggestive questions.

By the third interview more than 50% of the children assented to all of the events, whether the events were positive or negative, and whether the events had actually occurred or were fictional. The
children’s tendency to assent held through the fifth interview as well. Thus the crucial variables for most of the children were not the events, but the interview techniques. Clearly the assumptions that children cannot be wrong or may not be easily led to report negative events are erroneous. Control of a young child’s responses to questions about alleged sexual abuse may go far beyond the possibility that the abuse actually occurred. Rather, antecedent interviewer suggestion and consequent repetition of questions, combined with differential praise and the like, may come together to render the child’s reports invalid.

Do Children’s Linguistic Styles Provide Clues to Validity?

It is common for examiners to hold that they are able to make judgments about the validity of a child’s report based on the child’s style of reporting. For example, a child’s report is sometimes said to be more believable when its utterances are fairly spontaneous, its descriptions maintain a consistent time frame, it employs dialog said to have occurred, it includes elaboration of details, or it is consistent across interviews. To test these assumptions, tapes of the monkey-thief study described above were analyzed using only sequences in which the children fully assented to witnessing or participating in events (Bruck, Ceci & Hembrook, 1997).

Results showed that spontaneous utterances were rare, even from the first interview, except for the true positive event (helping a visitor who had tripped). By the second interview there were no differences in frequency of spontaneous utterances about true and false events, a finding that held through the fifth interview. An examination of time frame markers, use of dialog, and elaboration revealed no differences by the second interview. By the third interview the false event narratives contained more creativity (more elaborate details) than did descriptions of true events.

Perhaps this last finding should not be surprising. Creativity has been addressed by behavior analysts. Skinner (1974) noted that, “In the field of human behavior the possibility arises that contingencies of reinforcement may explain a work of art or the solution to a problem in mathematics or science without appealing to a different kind of creative mind or to a trait of creativity…” (p. 246-7). Behavior analytic research has shown that children’s creative activity may come under operant control (Glover & Gary, 1976; Goetz & Baer, 1972; Goetz & Baer, 1973; Goetz & Salmonson, 1972). Thus it is not surprising that creativity in the description of alleged sexual abuse may also come under operant control. That phenomenon might be especially pronounced when the allegations are false, because the occasion for creativity is present from the start.

Consistency is typically taken as an indication of validity of the child’s report. In the monkey-thief study it was found that true stories contained more consistency, defined as mention of a detail in more than one of the five interviews. However, when inconsistency (mention of “A” in one interview and “not A” in another) was examined it was found that both true and false events produced the same, fairly low rates of inconsistency.

Thus, a single non-suggestive interview allowed for clearest differentiation between true and false events. Repeated interviews only caused false stories to resemble true stories on the linguistic markers of interest. An exception was that repeated interviews actually increased the likelihood that false stories would contain more frequent descriptive statements than true stories.

Are Children Lying When They Make False Claims of Sexual Abuse?

Some professionals have assessed children for sexual abuse based upon the assumption that children, in their innocence, could not be wrong about allegations of sexual abuse. The historical
frequency with which expert witnesses have said that children “would not lie” about sexual abuse is unknown, but anecdotally is thought to have been common, and continues to occur occasionally. However, it is misleading to conclude that experts and jurors must choose between two distasteful options. That choice is painfully restrictive in a case in which the child seems to earnestly describe sexual abuse, but no other compelling evidence of a crime exists, and/or the other evidence actually favors the accused. In such cases both experts and jurors are reluctant to conclude either that the child is lying or that they must send a possibly innocent defendant to jail.

There is a third option, one more palatable to jurors in such cases: that the non-abused child has erroneously come to believe he or she was abused. Thus, the child is not lying in the usual sense of the word. This may explain why children in the cases presented at the beginning of this article gave testimony in earnest, with no “signs” of lying. In those cases the children were interviewed repeatedly, typically using suggestive antecedent verbal behavior coupled with differential reinforcement of accusatory, vis-à-vis exculpatory, statements. As a result of the interview process children came to believe they had been abused, although they had not. In light of such a strong possibility, it seems important that an expert witness be prepared to explain the “third option” to jurors.

Negative Representations of the Accused

What happens when an interviewer describes the accused in negative terms? May this serve as an antecedent for a child’s saying that sexual abuse occurred? The answer likely is yes. This is particularly true if other inappropriate techniques (e.g., repeated interviews, suggestive questions) are combined with negative depictions of the accused. As with many of the other empirical questions about assessment of child sexual abuse, it is difficult to ethically get at the issue. However, some approximations are available.

In one study children ages 3 to 6 were told that they would be visited by a man named Sam Stone, and that he was a clumsy man. He visited the children, but did nothing clumsy. Over the next ten weeks the children were interviewed four times and asked with frequent leading questions whether he had ripped things or had carelessly tossed things into the air, etc. Among 3-yr.-olds, 72% said he had engaged in one or more clumsy behaviors, and 44% said they had seen him do so. Among 6-yr.-olds, 11% said they had seen it (Leichtman & Ceci, 1995). Of particular interest is that the children described the fictional clumsy events quite sincerely. In a follow-up to the study, over one thousand researchers and professionals in the area of child testimony were shown videotapes of the children’s interviews. When asked to make a determination about which of the children’s statements were accurately describing what had happened, the experts were wrong somewhat more than 50% of the time (Ceci & Bruck, 1995).

Knowledge of Developmentally Normal Behavior

Sexual behavior in young children is statistically normal (Rathus, Nevid & Fichner-Rathus, 1993), a fact that is at odds with opinions sometimes offered on the witness stand by experts. In a study of more than 1,000 children who had not been molested, it was found that sexual behaviors including sexual self-stimulation, exhibitionism, sexually rubbing up against someone and other behaviors were reported by their mothers as fairly common, too common to be employed as markers of sexual abuse (Friedrich, Fisher, Broughten, Houston & Shafran, 1998).

Similarly, a child’s sexual knowledge “beyond his years” must not be taken as evidence of sexual abuse. We are awash in a culture with sexual images in magazines, on television, in videogames, and in
other media. Moreover, in many instances sexually suggestive television programming is not confined to hours that children ordinarily would be in bed. Thus, what has historically been thought to be a marker for sexual abuse must be discarded.

Use of One, or Only a Few, Therapists

In each of the cases described at the beginning of this article, only a few professionals examined dozens of children. This was problematic for a reason that becomes obvious. Two important opinions have been formed when a professional concludes that a child has been sexually abused by a specific individual. The first is that the child has been a victim, and the second is that the accused is a pedophile. The professional, having labeled the accused as a pedophile, is more likely to conclude that additional children who had contact with the accused were molested as well. The initial positive finding serves as an establishing condition for additional positive findings, when other children are examined by the same professional. Thus, it is preferable that in any given case in which several children may have been abused, a given professional should examine only one child. Other professionals will be needed to examine additional children.

Is There a Routine Pattern to Disclosures of Sexual Abuse?

The professional’s job would be much easier if a routine pattern existed by which disclosures are made. Unfortunately, such a pattern does not seem to exist. An early effort to define such a pattern was known as the Child Sexual Abuse Accommodation Syndrome (CSAAS) (Summit, 1983). The model was theoretical and held that the child would go through stages of secrecy, helplessness, entrapment and accommodation, disclosure, and recantation. Many professionals came to accept the CSAAS and rely upon it in examining children, although it was never validated and its originator had advised against using it diagnostically. An extensive review of research relative to the CSAAS found that delay of disclosure is its only component for which there is empirical support (London, Bruck, Ceci & Shuman, 2005).

Similarly, there seems to be no consistent pattern of diagnostic signs among children who have been sexually abused. About one-third of them exhibit rather serious problems as a result of sexual abuse. Although anxious behaviors are the most frequently seen, sexually abused children often exhibit depressive behaviors, conduct problems or other difficulties. Moreover, about one-third of sexually abused children exhibit only minor, sub-clinical, problems in adjustment, while another one-third exhibit no measurable problems in their overt behavior, thoughts or feelings. This last phenomenon may be due to the fact that they quickly received optimal support once they made their disclosures, or had better early training about sexual abuse (e.g., had been taught that sexual abuse is never a child’s fault, etc.), or other factors. (For more see Saywitz, Mannarino, Berliner & Cohen, 2000.) Thus, the notions that there is either a pattern to disclosures or that there are behaviors that are diagnostic of sexual abuse must be abandoned.

What May We Conclude from the Research?

The research suggests a number of conclusions and recommendations. These fall into two categories, those involving our guiding assumptions and those that provide us with specific directions for practice:


The Behavior Analyst Today

Assumptions



Volume 8, Issue 2, 2007

• Assume no predictable pattern of disclosure. The one exception is that many children tend to delay their disclosures. There is no tendency to recant.
• Similarly, there are no standard diagnostic “signs” of sexual abuse. Although anxious behaviors are the most common, they are by no means observed in a majority of sexually abused children. One-third of sexually abused children exhibit no problem behaviors at all, and another one-third exhibit only minor problems in adjustment.
• There are no consistent linguistic markers within children’s sexual abuse accounts.

Directions for Practice

• A single well done interview is preferable to multiple interviews.
• Anatomically detailed dolls should not be used.
• The examiner must consider and explore alternative hypotheses that might account for a child’s disclosure, his problem behaviors, or both.
• Leading questions are to be avoided. Typically these are questions that can be answered either yes or no. It should be noted, however, that at some point an examiner should clarify definitions of, and whether the child has experienced, good and bad touches. If the child answers affirmatively regarding bad touches then the examiner should ask that the child describe what happened, rather than leading the child through a series of specific questions that may be answered yes or no.
• In general, repetition of questions and interviews leads one away from a valid description of events. Repetition implies that the preferred answer has not yet been heard or provides an examiner opportunities to inadvertently reinforce erroneous statements. At the same time, an examiner is probably wise to assess the consistency of the child’s report. This may be assessed on a one-time basis by feigning forgetfulness and asking the child to describe the events again.
• The examiner must avoid negatively stereotyping anyone whom the child has accused, or may accuse.
• The examiner must possess adequate knowledge of child development and of what are, and are not, statistically normal sexual activities of children.
• If multiple children are thought to have been abused, a given examiner should assess only one of them. Other children should be referred to other examiners.

Additional Recommendations





• The examiner should actively assess the child’s understanding of the difference between truth and non-truth. This may be done by asking the child to describe or define these concepts, then quizzing the child with a few simple questions. (“If I said your name is Johnny, is that the truth or a lie?”) It is also wise to ask the child to tell the examiner something that is true and something that is a lie. (“Tell me the truth about the weather outside.”… “Now tell me a lie about it.”) This approach will assess the child’s understanding more completely.
• Similarly, the examiner should actively assess the child’s understanding of the difference between reality and fantasy. The examiner may ask whether certain television characters are real (humans) or not real (cartoons). However, one must be alert to the fact that the perceptive child may conclude that neither is real because even the humans in situation comedies and dramatic programs are playing roles. Careful inquiry may sort out any misunderstandings.


• It is developmentally normal for a young child to be easily led. Thus, following a disclosure the examiner should actively assess the extent to which a child will assent to suggestion. This may be done by asking the child whether a specific event (never reported by the child) happened. For example, with a child who has made an allegation that he was anally penetrated, the examiner may ask (with uncertainty), “I don’t remember whether you said this, but did you say he made you touch his penis with your hand? Did that happen?” The ease with which a child may be led to affirm a suggestion is useful information, particularly in a case where no corroboration of the child’s allegations exists, and in which the accused asserts that the child has been coached to make a false allegation. However, it must be remembered that suggestibility is not diagnostic of sexual abuse. Rather, it is useful information to the ultimate finder of fact (judge or jury) if the examiner had actively assessed that dimension of the child’s functioning.
• The examiner should ask the child whether anyone has promised or suggested that his or her report will result in reinforcers (e.g., money, attention, keeping Daddy out of jail).
• Avoid use of anatomically detailed dolls. The dolls’ genitalia may well be overly suggestive and, in any event, it may be difficult to convince a judge or jury that the doll’s genitalia are not suggestive.
• Assess and use the child’s terms for its body parts.
• Once a child has made a disclosure, the examiner should inquire as to the number of people the child has told, the number of interviews (formal and informal) it has undergone, and under what circumstance it has disclosed. This is important, given what is known about the problems of interview repetition.
• Finally, a given professional must undertake either a forensic examination or therapy, not both, with any given child. The roles of forensic evaluator and ongoing therapist are different. The forensic evaluator must not become an advocate for the child, which is a role that is often difficult to avoid when one is an ongoing therapist. For this reason the American Psychological Association’s Guidelines for Psychological Evaluations in Child Protection Matters (Committee on Professional Practice and Standards, APA Board of Professional Affairs, 1999) holds, “Psychologists generally do not conduct psychological evaluations in child protection matters in which they serve in a therapeutic role for the child or the immediate family or have had other involvement that may compromise their objectivity.” (p. 589)

The Role of Psychometric Instruments

This article has focused upon the forensic interview because the field is generally without other suitable means of assessing children for sexual abuse. Direct observation of free play with anatomically detailed dolls revealed no reliable differences between abused and non-abused children (Kenyon-Jump, Burnette & Robertson, 1991). Play therapy techniques have never been researched, much less validated, as useful in assessment of child sexual abuse according to the then-President of the Association for Play Therapy, Sue Bratton (personal communication, January 27, 2002). Are there psychometric instruments that reliably assess for child sexual abuse? Evidently not. One such instrument that is in relatively widespread use, however, is the Child Sexual Behavior Inventory (CSBI). It contains a list of 38 sexual behaviors which are rated retrospectively, covering the past six months, by the child’s parent or other full-time caretaker, on a four-point scale from zero to three (Purcell, Beilke & Friedrich, 1986). Sexual behaviors, such as “masturbates with hand” are rated as follows: never


observed, observed less than once per month, observed one to three times per month, or observed once per week or more. Although the CSBI is frequently employed diagnostically, the retrospective nature of such accounts is problematic. As well, all of the behaviors assessed are seen in non-abused children. For example, one study looked at hundreds of parents’ ratings of their children on the CSBI’s four-point scale. The children were in three groups: normals (n = 1,114), those with psychiatric histories who had not been sexually abused (n = 577), and those known to have been sexually abused (n = 620). Their mean ages were 6.0, 7.5 and 7.3 years, respectively. Regarding the item “masturbates with hand,” the normals’ mean rating was .18 (SD = .54); the psychiatric group’s mean was .16 (SD = .58); and the sexually abused group’s mean was .43 (SD = .90) (Friedrich, Fisher, Dittner, Acton, Berliner, Butler, Damon, Davies, Gray & Wright, 2001). The group differences were statistically significant, primarily due to the large numbers of subjects in each group. However, the results probably have little meaning for the examiner who is assessing an individual child. A look at the means and standard deviations reveals that many children who were never abused tend to masturbate, while some who were abused do not (or have not been observed to do so any more frequently than have a number of non-abused children). Moreover, a parent may have been sensitized to become more observant of a child’s sexualized behaviors, if abuse has been reported or suspected. Although the CSBI is widely used in the assessment of child sexual abuse, its interpretation involves enough difficulties that its use should be held in abeyance at present, except for purposes of research.

Behavior Analytic Contributions

A cursory examination of the behavioral literature leads to the unfortunate conclusion that little attention has been paid to the assessment of child sexual abuse. For example, a review of a ten-year index (1988-1997) of the Journal of Applied Behavior Analysis found no subject index listings for the following terms: sexual abuse, child abuse, child sexual abuse, assessment of child sexual abuse or behavioral assessment of child sexual abuse. EBSCO Host searches from 1975-present for these terms found no matches for Behavior Research and Therapy, Behavior Modification or Behavioral Assessment. That is disturbing, given the topic’s visibility in popular media and in non-behavioral literature such as American Psychologist, for which a search resulted in 307 matches to “child sexual abuse,” alone.

However, behavior analytic thinking and methods would seem to be desirable in the assessment of child sexual abuse, for several reasons. First is the focus upon working with individuals. A judge or jury wants to know about that child, under its specific circumstances. Behavioral approaches emphasize the individual, in context, and thus seem fundamentally well suited to the assessment of child sexual abuse. Second, behavior analysts are trained to avoid speculation that goes well beyond the data. A tendency to engage in such speculation contributed to the disastrous outcomes of the cases that were profiled at the top of this article. Third, behavior analysts generally avoid the temptation to interpret behaviors as evidence of otherworldly phenomena that fall outside the world of matter and energy with which natural sciences deal. Historic efforts by examiners to uncover inner conflicts, libido fixations, repressed memories and other


hidden phenomena contributed to the unfortunate outcomes in the high profile cases described earlier in this article.

A recent article in the Journal of Applied Behavior Analysis follows upon, and extends, methodology employed in the non-behavioral literature described above. It provides some insight into how behavior analytic procedures may better inform us. The study used a single-subject, A-B-C-B experimental design with four 5-year-old children who underwent a simulated ten-minute health check by a confederate who weighed each child, listened through a stethoscope, etc. Later other confederates, who were blind to what had occurred in the examinations, interviewed the children 12 times, with interviews two to three days apart. In between the interviews, parents talked to the children, making suggestions about the “doctor” that were either positive or negative. Positive suggestions included, “I like the way (health checker) looked at you. You had fun, didn’t you?” Negative suggestions included, “I do not like the way (health checker) looked at you. Are you okay?” Additional questions were neutral: “Did (health checker) measure you with the tape?” “Did you get a shot?” It was found that repeated interviews reduced the accuracy of children’s answers. Positive and negative suggestions further decreased accuracy of the children’s reports, with negative suggestions resulting in somewhat greater distortion than positive suggestions (Doepke, Henderson & Critchfield, 2003). The authors concluded, in part, “Studying testimony at the level of the individual by utilizing single-subject experimental designs is the first step toward launching a behavioral analysis to explore the utility of behavioral theory in understanding eyewitness testimony.” (p. 461)

Behavior analytic philosophy and methods have found substantial acceptance in arenas such as special education and autism treatment. They have found limited but growing acceptance in other areas such as safety, corrections, business, regular classrooms, and therapy clinics. Fifty years ago Skinner (1956) commented on the potential spread of behavior analysis to our work with psychotic behavior. His observation, if applied to this article’s topic, provides us with guidance: “It is rare to find behavior dealt with as a subject matter in its own right. Instead it is regarded as evidence for a mental life, which is then taken as the primary object of inquiry.” (p. 84) It is time we apply behavior analytic principles and methods to the assessment of child sexual abuse.

References

American Psychological Association (1992). Guidelines for psychological evaluations in child protection matters. Washington, DC: Author.

Bruck, M., Ceci, S. J. & Hembrooke, H. (1997). Children’s reports of pleasant and unpleasant events. In D. Read and S. Lindsay (Eds.), Recollections of trauma: Scientific research and clinical practice (pp. 199-219). New York: Plenum Press.

Bruck, M., Ceci, S. J. & Hembrooke, H. (1998). Reliability and credibility of young children’s reports: From research to policy and practice. American Psychologist, 53, 136-151.

Ceci, S. J., Leichtman, M. D. & Bruck, M. (1995). The suggestibility of children’s eyewitness reports: Methodological issues. In F. Weinert & W. Schneider (Eds.), Memory performance and competencies: Issues in growth and development. Hillsdale, NJ: Erlbaum.


White, T. L., Leichtman, M. D. & Ceci, S. J. (1997). The good, the bad and the ugly: Accuracy, inaccuracy and elaboration in preschoolers’ reports about a past event. Applied Cognitive Psychology, 11, S37-S54.

Ceci, S. J. & Bruck, M. (1995). Jeopardy in the courtroom: A scientific analysis of children’s testimony. Washington, DC: American Psychological Association.

Clarke-Stewart, A., Thompson, W. & Lepore, S. (1989, May). Manipulating children’s interpretations through interrogation. Paper presented at the biennial meeting of the Society for Research on Child Development, Kansas City, MO.

Committee on Professional Practice and Standards (1999). Guidelines for psychological evaluations in child protection matters. American Psychologist, 54, 586-593.

Doepke, K. J., Henderson, A. L. & Critchfield, T. S. (2003). Social antecedents of children’s eyewitness testimony: A single-subject experimental analysis. Journal of Applied Behavior Analysis, 36, 459-463.

Faller, K. (2001, October). Allegations of abuse in divorce cases. Workshop presented at the meeting of the 9th Annual West Virginia Children’s Justice Task Force Conference 2001, Charleston, WV.

Friedrich, W. N., Fisher, J., Broughten, D., Houston, M. & Shafran, C. R. (1998). Normative sexual behavior in children: A contemporary sample. Pediatrics, 101.

Friedrich, W. N., Fisher, J. A., Dittner, C. A., Acton, R., Berliner, L., Butler, J., Damon, L., Davies, W. H., Gray, A. & Wright, J. (2001). Child Sexual Behavior Inventory: Normative, psychiatric and sexual abuse comparisons. Child Maltreatment, 6, 37-49.

Glover, J. & Gary, A. L. (1976). Procedures to increase some aspects of creativity. Journal of Applied Behavior Analysis, 9, 79-84.

Goetz, E. M. & Baer, D. M. (1972). Descriptive social reinforcement of “creative” blockbuilding by young children. In E. Ramp and B. L. Hopkins (Eds.), A new direction for education: Behavior analysis. Lawrence: University of Kansas.

Goetz, E. M. & Baer, D. M. (1973). Social control of form diversity and the emergence of new forms in children’s blockbuilding. Journal of Applied Behavior Analysis, 6, 209-218.

Goetz, E. M. & Salmonson, M. (1972). The effects of general and descriptive reinforcement on “creativity” in easel painting. In G. Semb (Ed.), Behavior analysis and education. Lawrence: University of Kansas.

Jones, D. & McGraw, J. M. (1987). Reliable and fictitious accounts of sexual abuse in children. Journal of Interpersonal Violence, 2, 27-45.

Kenyon-Jump, R., Burnette, M. & Robertson, M. (1991). Comparison of behaviors of suspected sexually abused and nonsexually abused preschool children using anatomical dolls. Journal of Psychopathology and Behavioral Assessment, 13, 225-240.


Koocher, G. P., Goodman, G. S., White, C. S. & Friedrich, W. N. (1995). Psychological science and the use of anatomically detailed dolls in child sexual abuse assessments. Psychological Bulletin, 118, 199-222.

Leichtman, M. & Ceci, S. J. (1995). The effects of stereotypes and suggestions on preschoolers’ reports. Developmental Psychology, 31, 568-578.

London, L., Bruck, M., Ceci, S. J. & Shuman, D. W. (2005). Disclosure of child sexual abuse: What does the research tell us about the ways that children tell? Psychology, Public Policy, and Law, 11, 194-226.

Purcell, J., Beilke, R. L. & Friedrich, W. N. (1986, August). The Child Sexual Behavior Inventory: Preliminary normative data. Paper presented at the 94th Annual Convention of the American Psychological Association, Washington, DC.

Rathus, S. A., Nevid, J. S. & Fichner-Rathus, L. (1993). Human sexuality in a world of diversity. Boston: Allyn and Bacon.

Ryan, G. & Lamb, N. (2006, November). The necessary components of a legally defensible child sexual abuse investigation. Workshop presented at the meeting of the 14th Annual West Virginia Children’s Justice Task Force Conference 2006, Charleston, WV.

Saywitz, K. J., Mannarino, A. P., Berliner, L. & Cohen, J. A. (2000). Treatment for sexually abused children and adolescents. American Psychologist, 55, 1040-1049.

Skinner, B. F. (1974). About behaviorism. New York: Random House.

Skinner, B. F. (1956). What is psychotic behavior? In F. Gildea (Ed.), Theory and treatment of the psychoses: Some newer aspects (pp. 77-99). St. Louis: Washington University Committee on Publications.

Stickel, E. G. (1993). Archeological Investigation of the McMartin Preschool Site: Manhattan Beach, California. Unpublished manuscript, McMartin Tunnel Project.

State v. Buckey, Superior Court, Los Angeles County, California, #A750900 (1990).

State v. Michaels, Superior Court, Essex County, New Jersey (1998).

State v. Robert Fulton Kelly, Jr., Superior Criminal Court, Pitt County, North Carolina, #91-CRS-42504363 (1991-1992).

Summit, R. C. (1983). The child sexual abuse accommodation syndrome. Child Abuse and Neglect, 7, 177-193.

West, B. (Executive Producer). (1996, November 14). Turning Point. New York: ABC News.

Wyatt, W. J. (1999). Assessment of child sexual abuse: Research and proposal for a bias-free interview, part 2. The Forensic Examiner, 8, 24-27.

Wyatt, W. J. (2002). What was under the McMartin preschool? A review and behavioral analysis of the “tunnels” find. Behavior and Social Issues, 12, 29-39.


Zirpolo, K. & Nathan, D. (2005, October 30). I’m sorry: A long-delayed apology from one of the accusers in the notorious McMartin Pre-School molestation case. Los Angeles Times. Retrieved November 7, 2006, from http://www.legalspring.com.

Author Note

W. Joseph Wyatt, Department of Psychology, Marshall University. Correspondence concerning this article may be addressed to the author at the Department of Psychology, Marshall University, 1 John Marshall Dr., Huntington, WV 25755 USA, Telephone: 304-696-2778, or [email protected]


Extending Research on the Validity of Brief Reading Comprehension Rate and Level Measures to College Course Success

Robert L. Williams, Christopher H. Skinner, and Kathryn E. Jaspers

Students in an undergraduate human development course (N = 215) participated in a brief assessment of their reading (comprehension level, reading speed, comprehension rate) and multiple-choice test-taking skills on the second day of class. Students first read a one-page, 400-word passage unrelated to the course and then answered 10 multiple-choice questions over the passage without referring back to the passage. To control for test-taking skills, students also answered 10 multiple-choice questions from an equivalent passage they did not read. Videotapes of student participation permitted individual assessment of time required to complete each phase. Subsequently, during the semester students took five 50-item multiple-choice exams over the major units in the course. Results showed that the brief reading comprehension measures predicted multiple-choice exam performance and that comprehension level accounted for most of the variance in exam performance. Discussion focuses on enhancing brief reading assessment procedures by including direct measures of comprehension.

Keywords: Reading Comprehension, Reading Comprehension Rate, Test-taking, College Grades

Several measures predict success in college courses, including measures of critical thinking, generic vocabulary, background knowledge, and reading comprehension (Behrman & Street, 2005; Jackson, 2005; Williams & Eggert, 2002; Williams, Oliver, Allin, Winn, & Booher, 2003a; Williams, Oliver, Allin, Winn, & Booher, 2003b; Williams, Oliver, & Stockdale, 2004; Williams & Worth, 2002). Not all course activities require high-level critical thinking, an advanced vocabulary, or background knowledge, but virtually all course activities require student reading of course material (Behrman & Street, 2005). Thus, the current study focused on reading comprehension measures as predictors of success in a large undergraduate course. Although reading comprehension is predictive of student success (Jackson, 2005), reading speed or fluency may also be related to comprehension levels, effort required to read, and reinforcement for reading (Skinner, Pappas, & Davis, 2005). Researchers have measured reading fluency by timing students' oral reading and scoring word accuracy. These data are then used to calculate words correct per minute (WCM), which has been shown to correlate with reading comprehension and other reading skills as measured via standardized, norm-referenced tests with strong psychometric properties (Deno & Merkin, 1977; Deno, Merkin, & Chiang, 1982; Fuchs, & Fuchs, 1992; Fuchs, Fuchs, Hosp, & Jenkins, 2001; Fuchs, Fuchs, & Maxwell, 1988; Hintze & Shapiro, 1997; Jenkins & Jewell, 1993). These studies provide support for several theories that suggest causal mechanisms for explaining the relationship between reading speed and comprehension. Students who read rapidly and are not required to apply their attention or other cognitive resources (e.g., working memory) towards decoding words have more cognitive resources available to apply towards comprehension. Additionally, as time passes information may become inaccessible (e.g., fading from working memory). 
Thus, as they read, rapid readers may have access to more information from material read earlier than slow readers, which may enhance their ability to synthesize information as they progress through passages (LaBerge & Samuels, 1974; Perfetti, 1992; Rasinski, 2004; Stanovich, 1986). Reading, like many other skills, improves as people choose to spend more time engaged in the activity (Daly, Chafouleas, & Skinner, 2005; Stanovich, 1986). Even when slower readers are able to


comprehend the material at the same level as rapid readers, rapid readers are more likely to choose to read the assigned work because it requires less time and effort and results in a higher rate of reinforcement (Billington, Skinner, Hutchins, & Malone, 2004; Skinner, 1998; Skinner, 2002; Skinner, Neddenriep, Bradley-Klug, & Ziemann, 2002; Skinner, Wallace, & Neddenriep, 2002). Additionally, fluent readers have more time to engage in other behaviors that may enhance learning (e.g., studying notes) and test performance (e.g., carefully considering all responses on a multiple choice exam) than those who require more time to read (Skinner et al., 2005). Although the psychometric research base supporting WCM is strong, the correlations with broad reading skill development begin to decline as skills develop beyond the 4th or 5th grade level (Hintze & Shapiro, 1997; Jenkins & Jewell 1993; Neddenriep, Skinner, Hale, Oliver, & Winn, in press). Jackson’s (2005) comparison of reading comprehension, oral reading fluency, and decoding showed that reading comprehension was the only significant predictor of college students’ grade point average. Thus, reading comprehension, rather than the ability to read aloud both accurately and rapidly, may be the most essential reading skill for older students (Chall, 1983; Fuchs & Fuchs, 1992; Potter & Wamre, 1990; Shapiro, 2004; Skinner, Neddenriep et al., 2002). However, because college students are often given a limited amount of time to complete numerous assigned readings and exams, reading speed also may predict success in college courses. Recently, researchers have developed and evaluated a measure of reading comprehension rate (Hale, Skinner, Williams, Neddenriep, & Dizer, in press; Neddenriep et al., in press; Skinner, 1998). Reading comprehension rate (RCR) is similar to WCM in that it takes into account reading speed, using time required to read the material in the denominator. 
However, the numerator (number of words read aloud and accurately) is replaced with a measure of reading comprehension level (percent correct on comprehension questions). This RCR measure can be converted to a common metric, percent passage comprehended for each minute spent reading (Skinner, 1998). Neddenriep et al. (in press) correlated reading comprehension level and RCR with broad reading cluster scores on the Woodcock-Johnson III Tests of Achievement (Woodcock, McGrew, & Mather, 2001). Results showed that silent reading comprehension level (percent comprehension questions correct) moderately correlated with 10th grade readers’ broad reading scores (r = .40). When this measure was converted to an RCR measure, the correlation increased (r = .53). These results showed that altering the traditional comprehension level measure to a comprehension rate measure enhanced the ability to predict broad reading skills. Others have shown that RCR is a sensitive measure of subtle differences in reading comprehension occasioned by different intervention procedures (Freeland, Jackson, & Skinner, 1999; Freeland, Skinner, Jackson, McDaniel, & Smith, 2000; Hale et al., in press; Hale, Skinner, Winn, Oliver, Allin, & Molloy, 2005; McDaniel, Watson, Freeland, Smith, Jackson, & Skinner, 2001). Researchers concerned with measuring reading fluency by assessing rate of aloud reading accuracy have suggested that measuring RCR (e.g., percent of passage understood for each minute spent reading) may provide a more direct measure of functional reading skills, especially in advanced readers (Skinner, 1998; Skinner, Neddenriep, et al., 2002). The current study was designed to extend the research on the validity of brief assessments of reading comprehension levels and rates by determining if these measures could predict a functional outcome for skilled readers (i.e., success in a large college course). We controlled for test-taking skills (Stough, 1993) by requiring students to answer questions covering passages that they had not read. Analysis included performance on factual and inferential comprehension questions. Finally, we also examined the relationship between reading speed and comprehension.
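The two rate measures described above can be illustrated with a minimal sketch. It follows only the general definitions given in the text (an accuracy or comprehension numerator divided by reading time in minutes) and is not necessarily the exact computation used in any of the cited studies. The function names and the example scores and times are hypothetical and chosen purely for illustration.

```python
def words_correct_per_minute(words_correct: int, seconds: float) -> float:
    """WCM: words read aloud accurately, per minute of reading time."""
    if seconds <= 0:
        raise ValueError("reading time must be positive")
    return words_correct / (seconds / 60.0)


def reading_comprehension_rate(questions_correct: int,
                               questions_total: int,
                               seconds: float) -> float:
    """RCR: percent of comprehension questions correct, per minute of reading time."""
    if questions_total <= 0 or seconds <= 0:
        raise ValueError("question count and reading time must be positive")
    percent_correct = 100.0 * questions_correct / questions_total
    return percent_correct / (seconds / 60.0)


# Hypothetical readers with the same comprehension level (8 of 10 correct)
# but different reading times for the same passage.
fast_reader = reading_comprehension_rate(8, 10, 120.0)  # 80% over 2 minutes
slow_reader = reading_comprehension_rate(8, 10, 240.0)  # 80% over 4 minutes
print(fast_reader, slow_reader)
```

In this sketch the two hypothetical readers have identical comprehension levels, yet the faster reader's rate is twice that of the slower reader, which is the distinction between level and rate that the measure is meant to capture.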

Method


Participants

All students came from five sections of an undergraduate human development course required for entry into the teacher-education program at a major state university in the Southeastern United States. Three large sections had approximately 55 students per section and two small sections had approximately 25 students per section. Because a few students dropped the course, 210 students out of the original 215 ultimately completed all the measures related to the study. Approximately 25% of the students were males and 75% females. With respect to students’ academic level, 3% were freshmen, 40% sophomores, 30% juniors, 17% seniors, and 10% graduate students.

Research Materials and Procedures

When students entered the classroom on the second day of the course, they found a packet of materials on their desk. After students sat down, they first displayed their name card on the desk immediately in front of them. Then they filled in selected demographic information (e.g., academic classification, sex, and course section) on the front sheet of the materials packet, but did not open the rest of the packet until instructed to do so. When instructed to turn to the second page of the packet, they saw the following information:

On the following page, you will find a reading passage. When instructed, please turn to the passage and read it silently. Read at your normal rate, neither faster nor slower than usual. However, read carefully because you will be asked questions about the passage. Raise your hand and keep it raised for three seconds when you have finished reading the passage.

A 400-word passage taken from Spargo (1989) appeared on the third page of the packet. The selected passage, entitled “More Rare than Rubies,” mainly dealt with the contemporary importance of rare heavy metals and the methods of prospecting for these metals (p. 43). The course did not deal with heavy metals, and students were unlikely to have much background information regarding heavy metals. After finishing the passage and raising their hand for three seconds, students then proceeded to the next page. This page contained the following instructions:

On the following page, you will find comprehension questions based on the passage you just read. Try your best to answer these questions accurately. Do not refer back to the passage when answering the questions. When you have finished answering the questions, raise the packet of materials in your hand and keep it raised for three seconds.

The subsequent page contained ten multiple-choice questions, with three options per question. Five of the questions were labeled “Recalling Facts” and five “Understanding the Passage.” The latter questions required inferences from the factual information in the passage. The bottom of this page repeated the instruction to raise the packet and keep it up for three seconds after completing the questions and before proceeding to the next page. After turning to the next page, students read instructions related to answering questions over a passage they had not read:

On the following page, you will find comprehension questions based on a passage that you have not read. Try your best to answer these questions accurately. When you have finished answering the questions, raise your hand and keep it raised for three seconds.

The following page contained ten multiple-choice questions, with three options per question, based on a passage entitled “Sewerage Disposal” from Spargo (1989, p. 91). Five questions were factual and five were inferential. Again, the course did not deal with the information in this passage and students were unlikely to have much background information regarding the content in the passage. Thus, their


performance would likely be based on chance or cues embedded in the questions themselves. The bottom of this page included instructions reminding the students to raise their hand and keep it raised for three seconds after completing the questions and before turning the page. The final page instructed the students to place their packet face down on the desk in front of them and wait for instructions regarding the next research activity.

In order to precisely calibrate how much time each student took to complete each phase of the data collection, we used multiple video cameras to record the hand/packet signals of every student in the classroom. The combination of students’ positioning name cards on the desk in front of them and raising their hands and/or packets when they finished each phase of the data collection permitted precise determination of when each student began and ended each phase. In viewing the videotapes, research associates used stopwatches to determine times to the nearest second.

In addition to the pre-course data collection, we administered five 50-item multiple-choice exams (four options per item) over the five units in the course: physical development, cognitive development, psychological development, social development, and character development. Previous research (Wallace & Williams, 2003) on the exam items has indicated that about 26% of the items strictly assessed recall of factual information, 58% required use of factual information to make inferential judgments, and 16% did not exclusively fit either the factual or inferential category. All three item types differentiated between high and low performers on the course exam.

Results

Presentation of the results first highlights the relationship of reading and test-taking variables to performance on the course multiple-choice exams. Next, we present the results of our analysis of passage reading speed and comprehension.
Finally, high and low performers on the exams were compared across reading and test-taking variables.

Exam Performance and Reading Measures

Table 1 shows that virtually all of the reading-comprehension variables correlated significantly with exam performance.

Table 1
Correlations between Pre-course Measures and Course-Exam Totals
_________________________________________________________________
Reading measure           Reading passage    No reading passage
_________________________________________________________________
Factual correct                .21*                .08 ns
Inference correct              .31**               .10 ns
Total correct                  .37**               .13 ns
Passage reading time          -.19*
Total correct rate             .39**
Total question time           -.07 ns              .02 ns
Total question rate            .06 ns              .06 ns
_________________________________________________________________
* p < .01   ** p < .001   ns = non-significant

RCR was calculated using two terms, total questions correct in the numerator and reading time in the denominator. The significant negative correlation (i.e., r = -.19; r² = .04) between passage reading time and exam performance suggests that reading speed is related to performance in college course work. However, the numerator, comprehension questions correct, accounted for more variance (r = .37; r² = .14) in exam scores than the denominator, reading time (r² =


.04). Furthermore, much of the variance in exam performance accounted for by reading speed is shared with the variance accounted for by the total reading comprehension level measure (total questions correct). Thus, when these two measures are combined, RCR correlated only slightly higher (r = .39; r² = .15) with exam performance than reading comprehension level (r = .37; r² = .14).

When number correct on the five factual items was compared with number correct on the five inferential items as predictors of course exam performance, the inferential score accounted for 10% of the variance in exam scores (r = .31; r² = .10) and factual scores accounted for 4% of the variance (r = .21; r² = .04). The combination of the factual and inferential scores accounted for 14% of the variance in exam scores. Interestingly, this analysis showed that factual and inferential comprehension did not account for the same variance in exam score performance.

Comparisons of time taken to answer the questions across passages showed that students spent significantly more time answering the non-passage items (M = 167.84 seconds) than the passage items (M = 133.60 seconds), t(214) = 13.18, p < .001. These comparisons suggest that students attempted to apply their test-taking skills to answer the non-passage items. However, Table 1 also shows that none of the pre-course scores based on an unread passage correlated significantly with course exam performance. Even for the reading passage, time taken to answer the questions and rate of correctly answering the questions did not correlate significantly with performance on the course exams. Thus, test-taking skills were ruled out as a predictor of exam performance in the course and as a plausible rival hypothesis accounting for the relationship between reading variables and exam scores.
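The percentages of variance reported above follow from squaring each correlation. As a minimal sketch (the function names are illustrative and not part of the study), the arithmetic can be checked in Python. The second function applies the standard two-predictor formula for R², taking the near-zero correlation between factual and inferential scores (r = .002, reported in Table 2) as its third input.

```python
def variance_explained(r):
    """Proportion of criterion variance accounted for by one predictor (r squared)."""
    return r ** 2


def two_predictor_r2(r1, r2, r12):
    """Multiple R^2 for two predictors, computed from zero-order correlations.

    r1, r2: correlation of each predictor with the criterion.
    r12: correlation between the two predictors.
    """
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


# Correlations with course-exam totals, as printed in Table 1.
reported = {
    "passage reading time": -0.19,
    "factual correct": 0.21,
    "inference correct": 0.31,
    "total correct": 0.37,
    "total correct rate (RCR)": 0.39,
}

for name, r in reported.items():
    print(f"{name}: r = {r:+.2f}, r^2 = {variance_explained(r):.2f}")

# Factual and inferential scores were almost uncorrelated (r = .002, Table 2),
# so their combined R^2 is close to the sum of their separate r^2 values.
combined = two_predictor_r2(0.21, 0.31, 0.002)
print(f"factual + inferential: R^2 = {combined:.2f}")
```

Because the two question types were almost uncorrelated, the combined value is essentially the sum of the two separate proportions, which agrees with the 14% reported in the text.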
Passage Reading Time and Passage Comprehension Measures

Table 2 shows the correlations between the three passage-reading-comprehension measures (i.e., factual items correct, inferential items correct, and total correct) and passage reading time.

Table 2
Correlations between Reading Time and Comprehension Questions Correct
_______________________________________________________________________
                         Passage         Factual     Inference   Total
                         reading time    correct     correct     correct
_______________________________________________________________________
Passage reading time      1.000          -.073 ns    -.160*      -.161*
Factual correct           -.073 ns       1.000        .002 ns     .752**
Inference correct         -.160*          .002 ns    1.000        .661**
Total correct             -.161*          .752**      .661**     1.000
_______________________________________________________________________
* p < .05   ** p < .001   ns = non-significant

Results show a significant correlation (p < .05) between total questions correct and reading time (r = -.161). This small negative correlation between comprehension level and reading time explains why converting reading comprehension level to a rate measure (i.e., incorporating time to read) did not account for much more variance than reading comprehension alone. Table 2 also shows a significant correlation (p < .05) between inferential reading comprehension and reading time (r = -.160), but a nonsignificant relationship between factual questions correct and reading time (r = -.073). These results show that students who spent less time reading tended to answer more inferential passage questions correctly than students who spent more time reading.

Mean Comparisons


Students scoring in the top and bottom 20% on the unit exam totals were compared on all the reading variables. Table 3 shows significant differences on the pre-course performance measures linked to the read passage, but not on those linked to the unread passage. Additionally, the high and low exam performers did not differ in the time taken to answer the questions over the reading passage or in the rate of correctly answering these questions. Although the effect sizes for total reading comprehension and RCR were both large, the effect size for RCR (2.00) was almost twice as great as that for reading comprehension level (1.17). Thus, at the performance extremes on the course exams, high-performing students clearly differed from low-performing students on the reading comprehension measures, most especially RCR, but not on the test-taking variables. The ability to comprehend reading material, especially at a more rapid rate, apparently facilitated exam performance, but test-taking pace and skills per se did not significantly predict exam performance.
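The effect sizes cited above can be recomputed from the means and standard deviations printed in Table 3. The following is a minimal sketch, assuming the computation described in the table note (high-exam mean minus low-exam mean, divided by the standard deviation of the low-exam group); the function name is illustrative. This form resembles Glass's delta, with the low-exam group serving as the reference group.

```python
def effect_size(high_mean, low_mean, low_sd):
    """Standardized mean difference scaled by the low-exam group's SD."""
    return (high_mean - low_mean) / low_sd


# (high-exam mean, low-exam mean, low-exam SD) as printed in Table 3
# for the measures linked to the read passage.
rows = {
    "factual correct": (3.26, 2.48, 1.09),
    "inference correct": (3.77, 2.74, 1.08),
    "total correct": (7.02, 5.21, 1.55),
    "passage reading time": (137.09, 154.67, 29.62),
    "total correct rate (RCR)": (0.05, 0.03, 0.01),
}

for name, (high, low, sd) in rows.items():
    print(f"{name}: effect size = {effect_size(high, low, sd):.2f}")
```

Because the rate values in the table are printed to only two decimals, the recomputed RCR effect size is sensitive to rounding in those entries and should be read with that in mind.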
Table 3
Comparison of Pre-course Means for High and Low Performers on Course Examinations
_____________________________________________________________________________________________
Reading measure           High exam   Low exam   t ratio                     Low SD^a   Effect size^b
_____________________________________________________________________________________________
Reading passage
  Factual correct            3.26       2.48     t(83) = 3.33, p = .001        1.09        0.72
  Inference correct          3.77       2.74     t(83) = 4.90, p = .0001       1.08        0.95
  Total correct              7.02       5.21     t(83) = 5.43, p = .0001       1.55        1.17
  Passage reading time     137.09     154.67     t(83) = -2.68, p = .01       29.62       -0.59
  Total correct rate^c       0.05       0.03     t(83) = 5.74, p = .0001       0.01        2.00
  Total question time      130.26     132.48     t(83) = -0.36, ns            33.83        0.07
  Total question rate^d      0.06       0.05     t(83) = 0.52, ns              0.07        0.14
No reading passage
  Factual correct            2.23       1.93     t(83) = 1.46, ns              0.92        0.33
  Inference correct          2.87       2.69     t(83) = 0.68, ns              1.26        0.14
  Total correct              5.09       4.61     t(83) = 1.36, ns              1.51        0.32
  Total question time      171.26     169.86     t(83) = 0.17, ns             39.77        0.04
  Total question rate        0.03       0.03     t(83) = 0.95, ns              0.01        0.00
_____________________________________________________________________________________________
a Low SD = Standard deviation of scores for the low exam group.
b Effect sizes were computed by subtracting the low-exam mean from the high-exam mean and then dividing by the standard deviation of the low-exam scores.
c Total correct rate was computed by dividing the number of questions answered correctly by the total number of seconds required to read the passage.
d Total question rate was computed by dividing the number of questions answered correctly by the total number of seconds required to respond to the questions.

Discussion

Neddenriep et al. (in press) found that brief RCR measures correlated with standardized measures of broad reading skill development in elementary and 10th grade students.
We extended this research to college students and showed that RCR could predict performance on college exams, a more functional criterion measure than performance on standardized tests. Overall, relationships between the reading variables and performance on the course exams ranged from small to medium, but comparisons of high and low exam performers showed substantial differences on the reading variables but no significant differences on the test-taking pace and skill variables. These results suggest that the reading measures, not


test-taking skills, account for these relationships.

These findings may have theoretical and applied implications. RCR measures include both a measure of reading speed (denominator) and comprehension (numerator). Some theorists, both behavioral (e.g., Skinner, 1998; Skinner, Wallace et al., 2002) and cognitive (e.g., LaBerge & Samuels, 1974; Perfetti, 1992), have proposed causal models describing the relationship between reading speed or fluency and the development of reading skills that are functional (i.e., comprehension level and rate). These theories and the research base supporting these theories have caused educators, researchers, and policy makers to recommend that educators implement procedures to enhance reading speed (e.g., Adams, 1990; Daly, Chafouleas, & Skinner, 2005; National Assessment of Educational Progress, 2005; Rasinski, 2004). Several of our findings support these theories and recommendations. First, the time students spent reading the passage was negatively correlated (r = -.16) with passage comprehension (total questions correct). Thus, students who spent less time reading tended to answer more questions correctly. Second, the significant correlation (r = -.19) between reading time and exam performance showed that more rapid reading is associated with higher performance on course exams. Compared to factual questions, inferential questions may require students to synthesize more information across the text. The significant negative correlation between reading time and inferential questions answered correctly is consistent with the theory that slower readers may have had difficulty synthesizing information across the passage because information dropped from working memory as time passed. The current results support the relationship between reading speed and success in college courses.
However, almost all of the variance in students' exam scores accounted for by the RCR measure can be attributed to the measure of reading comprehension (numerator), as opposed to reading time (denominator), embedded within the RCR measure. These results support previous researchers who have stressed the need to alter brief assessments of reading skills in advanced readers by incorporating direct assessment of reading comprehension level and rate (Jackson, 2005; Neddenriep et al., in press; Skinner et al., 2002). The current results extend the research base by suggesting that the direct assessment of comprehension, as opposed to speed, is most critical for predicting success in college courses. Our results with college students are consistent with the Neddenriep et al. (in press) finding that RCR better predicted overall reading skill than reading comprehension level in 10th-grade students. However, our results differed in that RCR correlated only slightly higher (r = .39) with exam scores than reading comprehension level (r = .37), whereas Neddenriep et al. found greater differences between the predictive ability of silent reading comprehension level (r = .40) and silent RCR (r = .53). Several differences between the current study and the Neddenriep et al. study may account for the stronger support for RCR in the Neddenriep study. The current study included a sample of college students, as opposed to 10th-grade students. A group of college students is likely to have less variability in their reading skills than a group of 10th-grade students, thus reducing the predictive potential of reading scores for college students. Also, the criterion variable differed across the two studies, with Neddenriep et al. using a standardized measure of broad reading ability and the current researchers using exam scores in a college class. 
Additional studies are needed to determine if RCR is a more significant indicator of overall reading skills, but reading comprehension alone may be sufficient to adequately predict classroom performance.

The current results showed that correctly answering inferential questions over what one has just read was more predictive of exam performance than correctly answering factual questions. A majority of the items on the course exams required some degree of inference about course information (Wallace & Williams, 2003), suggesting that test-text overlap may have enhanced the correlations. Perhaps some of the predictive potential arose because both the pre-course assessment and the in-course examinations used a multiple-choice format. However, one must keep in mind that performance on the pre-course multiple-choice questions linked to an unread passage was not predictive of course exam performance. Regardless,


additional research is needed to determine if reading comprehension question type (fact versus inference, multiple choice versus essay) influences the predictive validity of reading comprehension measures across various criterion measures (e.g., multiple-choice exams, essay exams, standardized reading achievement tests, tests of critical thinking).

Individually administering lengthy standardized measures of reading comprehension at the beginning of courses may strengthen the relationships between reading skill assessment and exam performance. However, given that time typically is at a premium in large undergraduate courses, devoting a significant amount of time to pre-course assessment would be a questionable use of class time. Group administration of the reading comprehension measures used in the current study required much less time (the student requiring the most time needed 13.92 minutes) to get a sense of how well students were likely to do on major course exams across the semester. Actually, it appears quite remarkable that an activity requiring less than 15 minutes for the slowest-working student could generate data predictive of performance on course exams spanning an entire semester. Regardless, it is very easy to group administer and score comprehension level, but more difficult and time consuming to collect reading speed data (i.e., taping all group members, viewing the tapes, and recording each student's time spent reading). Our results suggest that collecting data on reading speed may not be necessary, as including reading time results in only a small increase in predictive power. Future researchers may find that they can efficiently strengthen the predictive power associated with group administration of silent reading comprehension level measures by using longer passages, passages with more questions, and/or multiple passages with the median scores used as the predictor (Skinner et al., 2002).
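To make the two measures concrete, the sketch below computes comprehension level (questions correct) and RCR (questions correct per second of reading) for one hypothetical student, and takes the median across several passages as suggested above. All scores and times are invented for illustration and are not data from the study; the function name is likewise an assumption.

```python
from statistics import median


def comprehension_rate(questions_correct, reading_seconds):
    """Reading comprehension rate: questions correct per second of reading time."""
    if reading_seconds <= 0:
        raise ValueError("reading_seconds must be positive")
    return questions_correct / reading_seconds


# Hypothetical results for one student on three passages:
# (questions correct out of 10, seconds spent reading the passage).
passages = [(7, 140), (6, 150), (8, 160)]

levels = [correct for correct, _ in passages]
rates = [comprehension_rate(correct, seconds) for correct, seconds in passages]

# The median across passages serves as the student's predictor score.
median_level = median(levels)
median_rate = median(rates)

print(f"median comprehension level: {median_level} of 10")
print(f"median RCR: {median_rate:.3f} correct per second "
      f"({median_rate * 60:.1f} per minute)")
```

Only the rate requires timing each student's reading; the level can be scored from the answer sheet alone, which is why it is the easier measure to collect in a group setting.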
The findings of the current study challenge a claim often advanced by our students who do poorly on the multiple-choice exams in the target course. Anecdotally, a rather common claim among these students is that they understood the material really well but the tests did not allow them to demonstrate their understanding. An extension of this claim is that the course exams primarily measure skill in taking multiple-choice tests rather than understanding of course content. However, the purest measure of test-taking skills in the current study (answering multiple-choice questions over a passage not read) was not related to performance on the multiple-choice exams in the course, even at the extremes of exam performance. In contrast, the ability to answer multiple-choice questions over a passage just read was predictive of exam performance.

Summary

Skinner (1998) indicated that reinforcement for reading is typically contingent upon comprehension (i.e., the function of reading is comprehension). The most commonly used brief measure of oral reading fluency is WCM (Daly et al., 2005). Although WCM correlates with reading comprehension, using correlates to indirectly measure anything requires a level of inference that may lead to questionable outcomes. For example, educators may attempt to apply procedures that enhance oral reading speed but do not necessarily enhance reading comprehension. Thus, a more direct measure of functional reading skills is a measure of silent reading comprehension level and/or rate (Skinner et al., 2002). In addition to conducting additional studies designed to investigate and enhance the psychometric properties of silent reading comprehension measures (e.g., validity, reliability, sensitivity), researchers should continue to use these measures to assess the effects of interventions designed to enhance silent reading comprehension, the most direct measure of functional reading skills.

References

Adams, M. J. (1990).
Beginning to read: Thinking and learning about print. Cambridge, MA: Massachusetts Institute of Technology.

Behrman, E. H., & Street, C. (2005). The validity of using a content-specific reading comprehension test for college placement. Journal of College Reading and Learning, 35(2), 5-21.


Billington, E. J., Skinner, C. H., Hutchins, H., & Malone, J. C. (2004). Varying problem effort and choice: Using the interspersal technique to influence choice towards more effortful assignments. Journal of Behavioral Education, 13, 193-207.

Chall, J. S. (1983). Stages of reading development. New York: McGraw-Hill.

Daly, E. J., Chafouleas, S., & Skinner, C. H. (2005). Interventions for Reading Problems: Designing and Evaluating Effective Strategies. New York: The Guilford Press.

Deno, S. L., & Merkin, P. K. (1977). Data-based problem modification: A manual. Reston, VA: Council for Exceptional Children.

Deno, S. L., Merkin, P. K., & Chiang, B. (1982). Identifying valid measures of reading. Exceptional Children, 49, 36-45.

Freeland, J., Jackson, B., & Skinner, C. H. (1999, Nov). The effects of reinforcement on reading rate of comprehension. Paper presented at the twenty-sixth annual meeting of the Mid-South Educational Research Association, Point Clear, AL.

Freeland, J. T., Skinner, C. H., Jackson, B., McDaniel, C. E., & Smith, S. (2000). Measuring and increasing silent reading comprehension rates via repeated readings. Psychology in the Schools, 37, 415-429.

Fuchs, L. S., & Fuchs, D. (1992). Identifying a measure for monitoring student reading progress. School Psychology Review, 21, 45-58.

Fuchs, L. S., Fuchs, D., Hosp, M. K., & Jenkins, J. R. (2001). Oral reading fluency as an indicator of reading competence: A theoretical, empirical, and historical analysis. Scientific Studies of Reading, 5, 239-256.

Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading comprehension measures. Remedial and Special Education, 9(2), 20-28.

Hale, A., Skinner, C. H., Williams, J., Neddenriep, C. E., & Dizer, J. (in press). Comparing comprehension following silent and aloud curriculum-based measurement reading across elementary and secondary students. Behavior Analyst Today.

Hale, A. D., Skinner, C. H., Winn, B. D., Oliver, R., Allin, J. D., & Molloy, C. C. M. (2005). An investigation of listening and listening-while-reading accommodations on reading comprehension levels and rates in students with emotional disorders. Psychology in the Schools, 42, 39-52.

Hintze, J. M., & Shapiro, E. S. (1997). Curriculum-based measurement and literature-based reading: Is curriculum-based measurement meeting the needs of changing curricula? Journal of School Psychology, 35, 351-375.

Jackson, N. E. (2005). Are university students’ component reading skills related to their text comprehension and academic achievement? Learning and Individual Differences, 15, 113-139.

Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measures for formative teaching: Reading aloud and maze. Exceptional Children, 59, 421-432.


LaBerge, D., & Samuels, S. J. (1974). Toward a theory of automatic processing in reading. Cognitive Psychology, 6, 293-323.

McDaniel, C. E., Watson, T. S., Freeland, J. T., Smith, S. L., Jackson, B., & Skinner, C. H. (2001, May). Comparing silent repeated reading and teacher previewing using silent reading comprehension rate. Paper presented at the Annual Convention of the Association for Applied Behavior Analysis, New Orleans.

National Assessment of Educational Progress (NAEP) (2005). The nation’s report card: Reading 2005. Retrieved on August 24, 2006, from http://nces.ed.gov/nationsreportcard/pdf/main2005/2006451.pdf

Neddenriep, C. E., Skinner, C. H., Hale, A. D., Oliver, R., & Winn, B. D. (in press). An investigation of the concurrent validity of reading comprehension rate: A direct, dynamic measure of reading comprehension. Psychology in the Schools.

Perfetti, C. A. (1992). The representation problems in reading acquisition. In P. B. Gough, L. C. Ehri, & R. Treiman (Eds.), Reading acquisition (pp. 145-174). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Potter, M. L., & Wamre, H. M. (1990). Curriculum-based measurement and developmental reading models: Opportunities for cross-validation. Exceptional Children, 57, 16-25.

Rasinski, T. V. (2004). Creating fluent readers. Educational Leadership, 61(6), 46-51.

Runyan, M. K. (1991). The effect of extra time on reading comprehension scores for university students with and without learning disabilities. Journal of Learning Disabilities, 24, 104-108.

Skinner, C. H. (1998). Preventing academic skills deficits. In T. S. Watson & F. Gresham (Eds.), Handbook of child behavior therapy: Ecological considerations in assessment, treatment, and evaluation (pp. 61-83). New York: Plenum.

Skinner, C. H. (2002). An empirical analysis of interspersal research: Evidence, implications and applications of the discrete task completion hypothesis. Journal of School Psychology, 40, 347-368.

Skinner, C. H., Neddenriep, C. E., Bradley-Klug, K. L., & Ziemann, J. M. (2002). Advances in Curriculum-Based Measurement: Alternative rate measures for assessing reading skills in pre- and advanced readers. Behavior Analyst Today, 3, 270-281.

Skinner, C. H., Pappas, D. N., & Davis, K. A. (2005). Enhancing academic engagement: Providing opportunities for responding and influencing students to choose to respond. Psychology in the Schools, 42, 389-403.

Skinner, C. H., Wallace, M. A., & Neddenriep, C. E. (2002). Academic Remediation: Educational application of research on assignment preference and choice. Child and Family Behavior Therapy, 24, 51-65.

Shapiro, E. S. (2004). Academic skills problems: Direct assessment and intervention (3rd ed.). New York: The Guilford Press.

Spargo, E. (1989). Timed readings. Providence, RI: Jamestown Publishers.


Stanovich, K. E. (1986). Matthew effects in reading: Some consequences of individual differences in the acquisition of literacy. Reading Research Quarterly, 21, 360-406.

Stough, L. M. (1993). Research on multiple-choice questions: Implications for strategy instruction. Paper presented at the Annual Convention of the Council for Exceptional Children, San Antonio, April 5-9.

Wallace, M., & Williams, R. L. (2003). Multiple-choice exams: Explanations for student choices. Teaching of Psychology, 29, 234-237.

Williams, R. L., & Eggert, A. (2002). Notetaking predictors of test performance. Teaching of Psychology, 29, 234-237.

Williams, R. L., Oliver, R., Allin, J., Winn, B., & Booher, C. (2003a). Knowledge and critical thinking as course predictors and outcomes. Inquiry: Critical Thinking across the Disciplines, 22, 57-63.

Williams, R. L., Oliver, R., Allin, J., Winn, B., & Booher, C. (2003b). Psychological critical thinking as a course predictor and outcome variable. Teaching of Psychology, 30, 220-223.

Williams, R. L., Oliver, R., & Stockdale, S. (2004). Psychological versus generic critical thinking as predictors and outcome measures in a large undergraduate human development course. Journal of General Education, 53, 37-58.

Williams, R. L., & Worth, S. (2002). Thinking skills and work habits: Contributors to course performance. Journal of General Education, 51, 200-227.

Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson Tests of Achievement-Third Edition. Itasca, IL: Riverside Publishing.

Corresponding Author: Dr. Christopher H. Skinner, Telephone: 865-974-8403, Email: [email protected]

Author Contact information:

Christopher H. Skinner, Ph.D.
Professor, University of Tennessee, Knoxville
525 Claxton Addition
1122 Volunteer Blvd
Knoxville, TN 37996-3452
Telephone: 865-974-8403
Email: [email protected]

Robert Williams, Ph.D.
Professor
University of Tennessee, Knoxville
525 Claxton Addition
1122 Volunteer Blvd
Knoxville, TN 37996-3452


Telephone: 865-974-6625
Email: [email protected]

Kathryn E. Jaspers, B.A.
University of Tennessee, Knoxville
525 Claxton Addition
1122 Volunteer Blvd
Knoxville, TN 37996-3452
Telephone: 865-974-4169
Email: [email protected]



Within-Session Changes in the Preratio Pause on Fixed-Ratio Schedules of Reinforcement

Adam Derenne and Kathryn A. Flannery

Performances under fixed-ratio schedules of reinforcement are characterized by a “post-reinforcement” or “preratio” pause that precedes responding for the reinforcer. This paper summarizes views on the origins of pausing and presents a series of molecular analyses that shed new light on the conditions under which pausing occurs. Among the findings are that pausing is not limited to the delay before the first response of the ratio and that pause durations increase over time (in some cases more than doubling across a 40-ratio session). Similar findings were also obtained with variable-ratio schedules. The results are consistent with the view that pausing is a byproduct of a competition between the scheduled reinforcer and other sources of behavioral control.

Keywords: fixed ratio, variable ratio, schedule of reinforcement, preratio pause, rats

Fixed-ratio (FR) schedules of reinforcement consist of the delivery of a reinforcer after every nth response. If, for example, subjects consistently are required to complete 50 responses to obtain a single food pellet, then the ratio of responses to reinforcers is 50:1 (FR 50). Performances engendered by FR schedules are characteristically biphasic: an initial delay in responding (i.e., what is termed either the postreinforcement pause or the preratio pause) precedes a period of responding at a sustained rapid pace that terminates with the delivery of the reinforcer. The preratio pause is of interest to researchers because it is a well-known feature of FR performances, yet one that is not required by the contingencies. Indeed, an optimally efficient pattern of responding would not include the pause: the sooner the subject completes the response requirement, the sooner the reinforcer will be delivered. For this reason, the preratio pause has been likened to human procrastination (Shull & Lawrence, 1998). Pausing under FR schedules might occur, in part, because the subject needs time to consume the previously delivered reinforcer or to recover from fatigue incurred in the course of responding. However, it is unlikely that either factor can explain pausing as comparisons with variable-ratio (VR) schedules show that subjects can complete equivalent numbers of responses while making much shorter pauses (Crossman, Bonem, & Phelps, 1987; Mazur, 1983). There is not a consensus as to why pausing occurs, but contemporary accounts emphasize competing sources of behavioral control (for recent discussions of the origins of pausing see Mazur, 2005; Perone, 2003; Pierce & Cheney, 2004; Wade-Galuska, Perone, & Wirth, 2005). For example, pausing has been described as a byproduct of a competition between inhibition elicited by the nonreinforcement of responses early in the ratio and excitation elicited by the upcoming reinforcer (e.g., Derenne & Baron, 2001). 
Alternatively, pausing might be an act of temporary “escape” from aversive features of the response requirement (e.g., Perone, 2003). Two additional explanations for pausing are highlighted by an earlier study (Derenne & Baron, 2002) in which subjects lever pressing for food on an FR schedule had continuous access to a bottle filled with a saccharin solution. Although the subjects were not liquid deprived, the presence of the saccharin solution led to polydipsia-like levels of drinking. Drinking principally occurred at the beginning of each ratio and corresponded with markedly longer pausing. Perhaps pausing occurs in part because the low probability of the scheduled reinforcer at the beginning of the ratio promotes adjunctive-like behaviors that compete with the response; in this case the result was excessive drinking, but other behaviors might show a similar effect (cf. Pierce & Cheney, 2004). However, the drinking during the pause also suggests parallels between pausing and the appearance of “impulsiveness” under the Ainslie-Rachlin model of self-control (Ainslie, 1974; Rachlin, 1970; Rachlin & Green, 1972). That is, the

175

The Behavior Analyst Today

Volume 8, Issue 2, 2007

scheduled food reinforcer may be akin to the larger-later reinforcer in that it is relatively efficacious, but available only after the response requirement has been completed. Alternative sources of reinforcement (like the saccharin) may serve a role similar to smaller-sooner reinforcers in that they are of a lesser magnitude, but immediately available. Pausing, therefore, could occur because subjects choose one or more of the alternative reinforcers over the scheduled reinforcer.

One implication of the various accounts of pausing summarized here is that pausing should be of variable duration due to the waxing and waning of competing sources of behavioral control. For example, under the excitation-inhibition account, inhibition may increase over the course of the session because each successive ratio results in the nonreinforcement of responses early in the ratio. At the same time, excitation should also be variable as instances of earlier responding may be differentially reinforced by the resulting improved rate of reinforcement. Other variables subject to change may include the efficacy of the reinforcer for reasons of satiation and habituation, and tolerance for the aversive aspects of responding because of habituation. However, textbook representations of FR performances depict cumulative records that resemble an ascending staircase in which uniformly long pauses (the “steps”) abruptly transition into periods of sustained rapid responding. The clearest evidence of the variable nature of pausing comes from studies reporting the frequency distribution of pause durations (e.g., Baron & Herpolsheimer, 1999; Derenne & Baron, 2002; Griffiths & Thompson, 1973). These distributions are typically positively skewed. With a skewed distribution, it is possible that the mean pause is of pronounced duration, yet subjects’ performances are frequently efficient.
In other words, on many ratios the pause may not greatly exceed the minimum time the subject needs to consume the previous reinforcer and prepare to respond again. It is only under certain conditions that the subject ceases to respond for a prolonged period. Little is known about why or when FR performances may fluctuate between relative efficiency and inefficiency. To help shed light on the conditions under which pausing is especially likely, the following two studies characterize how pausing varies within experimental sessions.

Study 1

The first data set was obtained with a group of rats that were trained under FR schedules using a procedure similar to that which we have reported elsewhere (Derenne & Baron, 2002; Derenne, Richardson, & Baron, 2006). In this case, once steady-state behavior was achieved, a series of molecular analyses was performed to characterize the nature of within-session changes in responding. One possibility is that the competing sources of behavioral control do not remain in balance over the course of the session, but rather pause durations become increasingly long or increasingly short as some sources achieve ascendancy. In other words, the positive tail of the distribution may be composed of pauses that occur chiefly during the first or final part of the session. Two reasons why pause durations might increase over time are an accumulation of response inhibition and a decrease in reinforcer efficacy due to satiation and habituation. Conversely, pausing might decrease over time because subjects habituate to response costs or because variations in the reinforcement rate serve to differentially reinforce short pauses. Several studies appear to show that there are within-session changes in FR performances, but the evidence is not definitive.
Cumulative records in Ferster and Skinner’s (1957) seminal Schedules of Reinforcement, for example, show a tendency for pause durations to increase over time; however, firm conclusions cannot be reached because subjects had complex learning histories, the records frequently depict transitional rather than steady-state behavior, and in many cases only a part of the session is shown. More recently, McSweeney and colleagues found that the rate of responding decreases within FR sessions (Aoyama & McSweeney, 2001; McSweeney & Swindell, 1999), which is also consistent with an increase in pausing. However, such an outcome might reflect a change in run times alone rather than a change in


pausing (the two phases of FR performances were combined into a single measure). Furthermore, Crossman, Trapp, Bonem, and Bonem (1985) reported that under at least small FRs (e.g., FR 2), pause durations tend to decrease within sessions.

Method

Subjects

Six male Sprague-Dawley rats were approximately 11 months old at the start of the experiment; all had previous training with moderate-sized FR schedules. The rats were maintained at 80% of their free-feeding weights, and housed individually with free access to water. Illumination in the vivarium followed a 12:12 hr light/dark cycle; data were collected during the light period at approximately the same time each day.

Apparatus

Data were collected in two Coulbourn Instruments experimental chambers for rats (12 in. x 10 in. x 12 in. interior dimensions). Each chamber was enclosed in a Coulbourn isolation cubicle. The food reinforcer (a 45-mg Noyes pellet) was delivered to a recessed food cup positioned at floor level in the middle of the front wall of the chamber. A retractable lever (Coulbourn H23-17A) was located 6 cm to the left of the food cup and 7 cm above the grid floor. A speaker was located 6.5 cm to the right of the food cup and 8.5 cm above the floor. A house light was positioned at the top of the rear wall of the chamber, across from the food cup. The chambers were linked to microcomputers in an adjacent room; data were recorded using Coulbourn Graphic State 2.0 software.

Procedure

Experimental sessions were conducted 5 days a week. At the beginning of each session, the house light was turned on and the lever was extended into the chamber. When the ratio requirement was completed, a 1000 Hz tone was sounded for 0.5 s, after which the reinforcer (one food pellet) was delivered to the food cup. Sessions ended after the completion of 41 ratios. At the end of the session, the light was turned off and the lever was retracted.
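The session contingency just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Graphic State program used in the study: the function name `run_fr_session`, the list-of-timestamps input, and the toy press times are hypothetical, and the sketch assumes that any presses made during the tone count toward the next ratio. Only the 0.5-s tone and the 41-ratio session length come from the procedure.

```python
def run_fr_session(response_times, ratio, max_ratios=41, tone_s=0.5):
    """Return pellet delivery times under a fixed-ratio (FR) schedule.

    response_times: ascending lever-press times in seconds (hypothetical input).
    ratio: presses required per reinforcer (e.g., 25 for FR 25).
    The tone starts at the final press of each ratio and the pellet is
    delivered when the tone ends. Assumption: presses made during the tone
    count toward the next ratio. The session stops after max_ratios.
    """
    deliveries = []
    count = 0
    for t in response_times:
        count += 1
        if count == ratio:
            deliveries.append(t + tone_s)  # pellet follows the 0.5-s tone
            count = 0
            if len(deliveries) == max_ratios:
                break
    return deliveries


# Toy example: FR 3, one press per second, session capped at two ratios.
presses = [float(s) for s in range(1, 10)]
print(run_fr_session(presses, ratio=3, max_ratios=2))  # [3.5, 6.5]
```

The delivery times returned here are the reference points from which the preratio pause of the following ratio would be measured.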
Because the subjects had previous experience with FR schedules, training began with a ratio of 15. The ratio then was increased in steps of 5 to FR 30 and, for some, steps of 10 to larger ratios. Subjects received a minimum of three sessions at each ratio. The goal was to obtain for each rat a ratio at which marked pausing occurred (as defined by a median pause duration of 10 s or longer), and yet experimental sessions consistently were completed. The terminal ratio ranged from FR 25 to FR 80. Subjects received training at the final ratio for 12 sessions or until performances stabilized as determined by visual inspection of the data. The total duration of observations ranged from 37 to 55 sessions (the median was 53 sessions), including from 13 to 31 sessions under the terminal ratio size (the median was 17.5 sessions).

Results and Discussion

The preratio pause is typically defined as the time from the delivery of the reinforcer until the completion of the first response of the following ratio. “Pauses” may occur after the first response because responding on one ratio carries over onto the subsequent ratio (Griffiths and Thompson [1973] referred to this phenomenon as a “response overrun”). Mazur and Hyslop (1982), however, also demonstrated that the transition from a period of nonresponding to the rapid responding that characterizes the ratio run may be gradual. One reason for such a “warm-up” effect may be that the subject is at or near an indifference point between competing sources of behavioral control. To ensure that all pauses in initial responding are incorporated into the measure of the preratio pause, some researchers have defined the preratio pause as the time until the fourth or fifth response of the ratio rather than the first (e.g., Ator,


1980; Capehart, Eckerman, Guilkey, & Shull, 1980; Iversen, 1976). We adopted a similar approach for some of the analyses reported here, but for the purpose of characterizing early responding within the ratio, pausing was defined initially in terms of the time until the first response. Figure 1 shows the mean duration of the latency until the first response (the traditional definition of the pause) as well as the subsequent four interresponse times (e.g., “0-1” refers to the time from the delivery of the previous reinforcer until the completion of the first response; “1-2” refers to the time from the first to the second response). The data were derived from the last eight sessions with the terminal ratio. Pause durations are shown for ratios 2 through 41; data from the first ratio were omitted because the initial pause duration could not be measured from a previous reinforcer. The panels representing individual performances have been grouped by ratio. To show within-session changes, the data within each panel have been grouped into four consecutive blocks of 10 ratios.
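The interval measures described above ("0-1", "1-2", and so on, averaged within consecutive blocks of ratios) can be illustrated with a short Python sketch. The function names `early_intervals` and `block_means` and the example timestamps are hypothetical and are not taken from the authors' analysis software; the sketch only assumes that the time of each reinforcer delivery and of each subsequent response is available.

```python
from statistics import mean


def early_intervals(reinforcer_time, response_times, n=5):
    """Intervals "0-1", "1-2", ... for the first n responses of one ratio.

    The first value is the latency from reinforcer delivery to the first
    response; the remaining values are interresponse times (IRTs).
    """
    marks = [reinforcer_time] + list(response_times[:n])
    return [later - earlier for earlier, later in zip(marks, marks[1:])]


def block_means(per_ratio_intervals, block_size=10):
    """Average each interval position within consecutive blocks of ratios."""
    blocks = []
    for start in range(0, len(per_ratio_intervals), block_size):
        block = per_ratio_intervals[start:start + block_size]
        blocks.append([mean(column) for column in zip(*block)])
    return blocks


# One hypothetical ratio: pellet at 10.0 s, then six lever presses.
print(early_intervals(10.0, [22.0, 24.0, 25.0, 25.5, 26.0, 26.4]))
```

With 40 analyzed ratios per session (ratios 2 through 41) and a block size of 10, `block_means` would yield the four blocks per panel described in the text.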

[Figure 1: six panels, one per subject (R38, R43, R23, R24, R40, R42), grouped by terminal ratio (FR 25, FR 60, FR 80). Y-axis: Mean Interresponse Time (in seconds), logarithmic scale from 0.1 to 100. X-axis: Consecutive Blocks of Ratios (1st to 4th). Series: 0-1, 1-2, 2-3, 3-4, 4-5.]

Figure 1. Mean interresponse times for the first five responses of the ratio in Study 1. The data within each panel have been grouped into four blocks of 10 ratios. The vertical lines show the standard error.

The figure shows that the time from the delivery of the previous reinforcer until the first response was a longer latency than any of the subsequent few interresponse times. The first IRT was nearly as long


as the pause for some subjects (e.g., R40), but in all cases the IRT decreased to 1 s or less by the fifth response. Although this pattern generally held throughout the session, the time until the completion of the first response increased within the session for all subjects (the increase ranged from 3.1 s to 36.1 s; median = 11.1 s). An increase was found as well in some of the interresponse times, although the results in this regard were less consistent. Figure 2 provides a more detailed examination of how pausing changes over time. The data points show the mean pause for each successive ratio comprising the session. In light of the finding, described above, that substantial pausing may occur after the first response, the pause was redefined as the time from the delivery of the previous reinforcer until the completion of the fifth response of the ratio (i.e., the five data points for each time point in Figure 1 have been compressed into a single value). The figure confirms that an increase in pausing can be observed even when a different definition of pausing is used. The increase in pausing appears larger here than in Figure 1, in part because the y-axis scaling is not logarithmic. The rate of increase varied across subjects (note that the slopes of a regression analysis are displayed in each panel); the exact change as measured by the difference from the first five to the last five ratios ranged from +2.7 s to +26.7 s (median = +20.9 s). For some subjects pause durations more than doubled during the session. The change is considerably smaller for other subjects in part because pausing initially decreased before the increase began (the effect is most pronounced for R38 and R43). The initial decrease in pausing is consistent with a reinforcer sensitization effect (cf. McSweeney, Hinson, & Cannon, 1996).
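The two summary measures used in this paragraph, the regression slope across consecutive ratios and the change from the first five to the last five ratios, can be sketched as follows. The helper names (`pause_to_nth`, `ols_slope`, `first_last_change`) and the example values are hypothetical, and the slope is computed by ordinary least squares, which is assumed rather than confirmed to be the regression method used for Figure 2.

```python
def pause_to_nth(reinforcer_time, response_times, n=5):
    """Preratio pause defined as time from reinforcer delivery to the nth response."""
    return response_times[n - 1] - reinforcer_time


def ols_slope(mean_pauses, first_ratio=2):
    """Ordinary least-squares slope of mean pause against ratio position.

    mean_pauses[i] is the mean pause at ratio first_ratio + i, so the
    result is in seconds per ratio. Requires at least two values.
    """
    xs = list(range(first_ratio, first_ratio + len(mean_pauses)))
    x_bar = sum(xs) / len(xs)
    y_bar = sum(mean_pauses) / len(mean_pauses)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, mean_pauses))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def first_last_change(mean_pauses, k=5):
    """Mean of the last k values minus the mean of the first k values."""
    return sum(mean_pauses[-k:]) / k - sum(mean_pauses[:k]) / k


# Hypothetical mean pauses (s) that grow by 0.5 s per ratio over ratios 2-41.
example = [10.0 + 0.5 * i for i in range(40)]
print(ols_slope(example), first_last_change(example))
```

For a strictly linear series like the example, the two measures agree (a slope of 0.5 s per ratio over 35 ratio positions gives a 17.5-s change); with real data they can diverge, for instance when pausing first decreases and then increases.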

[Figure 2: six panels, one per subject (R38, R43, R23, R24, R40, R42), grouped by terminal ratio (FR 25, FR 60, FR 80). Y-axis: Mean Pause Duration (in seconds), linear scale. X-axis: Consecutive Ratios (0 to 40). Regression slopes reported across the six panels: 0.29, 0.35, 0.55, 0.59, 0.85, 1.32.]

Figure 2. The mean preratio pause as a function of the position within the session in Study 1. The value on each panel shows the slope of the regression equation.


A limitation of showing changes in the mean pause over time is that the mean does not reveal whether all pauses increased or whether the change occurred only in some parts of the distribution. As noted in the Introduction, the distribution of individual pause durations has a positive skew. Baron and Herpolsheimer (1999) noted that although pausing is usually described in terms of the mean duration, it would be more appropriate to characterize pausing either in terms of a measure of central tendency less sensitive to outliers (e.g., the median), or to compare how the whole distribution varies across conditions. Figure 3 shows how the distribution of pausing (represented by the 10th, 25th, 75th, and 90th percentiles of pause durations) varied within sessions. The figure shows the distribution to have a positive skew (note that the difference between the 90th and 75th percentiles is usually considerably larger than that between the 10th and 25th percentiles). The y-axis scaling is logarithmic because of the extreme durations found at the 90th percentile.
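The percentile comparison described above can be made concrete with a small Python sketch. The function names and the pause values are hypothetical, and linear interpolation between order statistics is an assumed convention; the source does not say how the percentiles were computed.

```python
def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between order statistics."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def spread_summary(pauses):
    """10th, 25th, 75th, and 90th percentiles plus the two tail gaps."""
    p10, p25, p75, p90 = (percentile(pauses, p) for p in (10, 25, 75, 90))
    return {
        "p10": p10, "p25": p25, "p75": p75, "p90": p90,
        "lower_gap": p25 - p10,  # spread in the short-pause tail
        "upper_gap": p90 - p75,  # spread in the long-pause tail
    }


# Hypothetical pauses (s) from one block of ratios: mostly short, a few very long.
block = [3.0, 4.0, 5.0, 5.0, 6.0, 7.0, 8.0, 10.0, 14.0, 40.0, 120.0]
print(spread_summary(block))
```

An `upper_gap` much larger than the `lower_gap` is the signature of positive skew that the text points to; computing the summary separately for each block of ratios would show whether that asymmetry grows within the session.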

[Figure 3: six panels, one per subject (R38, R43, R23, R24, R40, R42), grouped by terminal ratio (FR 25, FR 60, FR 80). Y-axis: Pause Duration (in seconds), logarithmic scale. X-axis: Consecutive Blocks of Ratios (1st to 4th). Series: 10th, 25th, 75th, and 90th percentiles.]

Figure 3. The values of the 10th, 25th, 75th, and 90th percentiles of the distribution of pause durations in Study 1. The data within each panel have been grouped into four blocks of 10 ratios.

The figure shows that the distribution became increasingly skewed over the course of the session. Moreover, those subjects that showed the greatest change in the skew also showed the greatest change in


the mean pause (cf. Figure 2). In other words, the change in the mean pause was due principally to an increase in the longest pauses (the positive tail of the distribution). Figure 3 also shows, however, that changes in the distribution were not limited to the upper range of the distribution, but included to a lesser extent other percentiles as well.

The increase in within-session pausing can be attributed to several sources. Increasing pause durations may reflect a progressive loss of reinforcer efficacy due to accumulated satiation and habituation (cf. McSweeney et al., 1996). It is also possible that the increase in pausing reflects an increase in conditioned inhibition due to the repeated nonreinforcement of responses early in the ratio. According to the Rescorla-Wagner (1972) theory of conditioning, however, the changes in inhibition should be greatest following the first few instances and then become increasingly less pronounced over subsequent ratios (i.e., the function should resemble a learning curve); instead, the figure shows nearly the opposite pattern of behavior.

Study 2

Study 2 was designed to confirm and extend the findings of Study 1 by examining whether the same trends would be observed if less pausing occurred overall. For this reason, ratio sizes were smaller and session durations were shorter than was the case in Study 1. In addition, half of the subjects were trained on an FR schedule and the other half on a VR schedule. Pausing under FR and VR schedules appears to be affected by similar variables, including the overall response requirement and the magnitude of the reinforcer (Blakely & Schlinger, 1988; Schlinger, Blakely, & Kaczor, 1990). The chief reason why VR pause durations are shorter is that VR pausing is additionally dependent on the size of the smallest ratio appearing in the schedule (Schlinger et al., 1990).
Specifically, pauses are of minimal duration when only a single response sometimes leads to reinforcement, and they become increasingly long as the smallest possible ratio increases. This finding is consistent with the various perspectives on pausing described above. For example, from an excitation-inhibition standpoint, pausing is reduced because the initial part of the ratio is correlated to a lesser degree with nonreinforcement (i.e., there is less inhibition). Alternatively, the occasional small work requirement may reduce the aversiveness of responding, or reduce discounting of the upcoming reinforcer. Assuming that pausing under the two ratio schedules is controlled by the same set of processes, a molecular examination of VR performances should show that pause durations increase within sessions, and that this increase is reflected in changes in the skew of the underlying distribution.

Method

Subjects and Apparatus

Eight male Sprague-Dawley rats were approximately 5 months old at the start of the experiment. Details relating to animal care and the apparatus used in data collection were the same as those described above for Study 1.

Procedure

Unless otherwise noted, details of the procedure were the same as those described in Study 1. Following initial lever training, half of the subjects were assigned to FR training, and the other half to VR training. The subjects received 25 sessions with a series of progressively larger ratios. The subjects then received an additional 32 sessions of training under either an FR 20 or a VR 20 schedule. The VR schedule was composed of seven different ratios that appeared in an irregular sequence: 5, 10, 15, 20, 25, 30, and 35. Sessions ended after the completion of 26 ratios.

Results and Discussion


Analyses of the data were based on the last eight sessions of training. The preratio pause was defined as the time from the delivery of the previous reinforcer until the completion of the fifth response of the ratio for the reasons described in Study 1. Data from the first ratio were omitted because the pause could not be measured from a previous reinforcer. Figure 4 shows the 10th, 25th, 75th, and 90th percentiles of the distribution of pause durations for five consecutive blocks of five ratios. Note that the scaling of the y-axis for the FR subjects is double (or more in the case of R25) that for the VR subjects. Despite relatively small ratios and short session durations, the results resemble those obtained in Study 1. For the four FR subjects, the distribution became increasingly positively skewed over the course of the session. That is, substantial increases were observed at the 90th percentile of pause durations, whereas changes at lower percentiles were considerably more modest. The VR subjects evince similar trends, although pausing was shorter and the changes in skew more modest. Three of the four VR subjects showed a decrease in pausing at the beginning of the session, which is similar to the sensitization-like effect observed in Study 1.

[Figure 4: eight panels, one per subject, in two columns. VR 20: R14, R15, R16, R22. FR 20: R23, R24, R25, R26. Y-axis: Pause Duration (in seconds), linear scale. X-axis: Consecutive Blocks of Ratios (1st to 5th, ending at ratios 5, 10, 15, 20, and 25). Series: 10th, 25th, 75th, and 90th percentiles.]

Figure 4. The values of the 10th, 25th, 75th, and 90th percentiles of the distribution of pause durations in Study 2. The data within each panel have been grouped into five blocks of five ratios.

Figure 5 shows how pausing varied on a ratio-by-ratio basis within sessions. The figure does not show the mean pause, as was reported in Figures 1 and 2, but rather the geometric mean of pauses, an alternative measure of central tendency that we have used in earlier studies (Derenne & Baron, 2002; Derenne, Richardson, & Baron, 2006). The geometric mean is well suited as a measure of central tendency in pausing because it is sensitive to the skew of the distribution (unlike the median) but it is less


influenced by extreme values (unlike the mean; cf. Shull, 1991). The figure confirms that pause durations increase within sessions. A regression analysis was performed for each set of data points, and the slopes of the function appear in each panel. The rate of increase was found to be greater for the four FR subjects than the four VR subjects.
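The contrast drawn above among the median, the geometric mean, and the arithmetic mean can be checked on a toy example. The pause values below are hypothetical; `statistics.geometric_mean` is part of the Python standard library (version 3.8 and later).

```python
from statistics import geometric_mean, mean, median

# Hypothetical pause durations (s) with one long pause in the positive tail.
pauses = [2.0, 4.0, 4.0, 8.0, 64.0]
# The same set with the longest pause made eight times longer.
longer_tail = [2.0, 4.0, 4.0, 8.0, 512.0]

for label, data in (("original", pauses), ("longer tail", longer_tail)):
    print(label, median(data), round(geometric_mean(data), 2), mean(data))
```

In this example the median is unchanged when the tail lengthens, the arithmetic mean rises from 16.4 s to 106 s, and the geometric mean rises by a much smaller amount, which is the intermediate behavior the text attributes to it.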

[Figure 5: eight panels, one per subject, in two columns. VR 20: R14, R15, R16, R22. FR 20: R23, R24, R25, R26. Y-axis: Geometric Mean of Pause Durations (in seconds). X-axis: Consecutive Ratios. Regression slopes shown in the panels: 0.12, 0.43, 0.04, 0.39, 0.27, 0.53, 0.52, 0.12.]

Figure 5. The geometric mean of preratio pause durations as a function of the position within the session in Study 2. The value on each panel shows the slope of the regression equation.

General Discussion


The present research examined the possibility of within-session changes in the preratio pause on FR schedules. Study 1 revealed that two changes occur. First, pausing becomes increasingly long during the course of the session (chiefly because the distribution of pauses becomes increasingly positively skewed). Second, the transition from a period of nonresponding at the beginning of the ratio to a pattern of sustained, rapid responding becomes increasingly gradual. Study 2 suggested that these trends can be observed when the ratio is small and the session relatively short, or when VR schedules are used in place of FR schedules. Additionally, the findings from the two studies taken in combination suggest that changes in within-session pausing can be observed regardless of whether the pause is defined in terms of the first or fifth response or whether pausing is described in terms of the mean or geometric mean.

Within-session changes have been observed with a wide range of operant behavior (cf. McSweeney et al., 1996). However, the changes have always been found in some measure of response rate, and it is noteworthy that within-session changes can also be obtained in a measure of behavior defined as the absence of responding. McSweeney and colleagues (e.g., McSweeney et al., 1996; McSweeney & Roll, 1998) have identified sensitization, satiation, and habituation to the reinforcer as primary causes of within-session changes in behavior, and it is possible that these processes also contributed to the changing behavior patterns described here. For example, the initial decrease in pausing observed with some subjects could reflect sensitization to the reinforcer (i.e., an increase in reinforcer efficacy). The subsequent increase in pause durations observed with all subjects could reflect satiation and habituation (i.e., a decrease in reinforcer efficacy).
It is unclear why a sensitization-like effect was not observed in all cases, or why the effect was most pronounced with the VR-trained subjects. Perhaps sensitization effects are most likely to be found in pausing when the reinforcer has relatively greater control over early-ratio behavior (as occurs under VR schedules). Consistent with this interpretation, Crossman et al. (1985) found pause durations to decrease within sessions under very small FRs.

As noted earlier, preratio pause durations might increase within sessions for reasons other than variations in reinforcer efficacy. Empirically, there is little basis to choose among the various possibilities, and as we have suggested elsewhere (Derenne & Baron, 2001, 2002), it may be the case that the various explanations for pausing are essentially correct; that is, the most accurate description of pausing may be one that draws upon multiple perspectives. Nevertheless, the results of the present research do make a stronger case for within-session changes in pausing as resulting from changes in reinforcer efficacy than from changes in other sources of behavioral control. For example, pausing could increase within sessions because of increases over time in conditioned inhibition: each successive ratio causes the delivery of one reinforcer to be associated with a subsequent period of nonreinforcement. Conditioning generally entails the greatest changes in behavior following initial stimulus pairings, and subsequent trials bring diminishing returns. In the present case, however, the greatest increases in pausing did not occur at the beginning of the session. Rather, pausing sometimes decreased at the beginning of the session, and the largest increases were obtained at the end of the session.

References

Ainslie, G. W. (1974). Impulse control in pigeons. Journal of the Experimental Analysis of Behavior, 21, 485-489.

Aoyama, K., & McSweeney, F. K. (2001). Habituation may contribute to within-session decreases in responding under high-rate schedules of reinforcement. Animal Learning & Behavior, 29, 79-91.

Ator, N. A. (1980). Mirror pecking and timeout under a multiple fixed-ratio schedule to food delivery. Journal of the Experimental Analysis of Behavior, 34, 319-328.

Baron, A., & Herpolsheimer, L. R. (1999). Averaging effects in the study of fixed-ratio response patterns. Journal of the Experimental Analysis of Behavior, 71, 145-153.

Blakely, E., & Schlinger, H. (1988). Determinants of pausing under variable-ratio schedules: Reinforcer magnitude, ratio size, and schedule configuration. Journal of the Experimental Analysis of Behavior, 50, 65-73.

Capehart, G. W., Eckerman, D. A., Guilkey, M., & Shull, R. L. (1980). A comparison of ratio and interval reinforcement schedules with comparable interreinforcement times. Journal of the Experimental Analysis of Behavior, 34, 61-76.

Crossman, E. K., Bonem, E. J., & Phelps, B. J. (1987). A comparison of response patterns on fixed-, variable-, and random-ratio schedules. Journal of the Experimental Analysis of Behavior, 48, 395-406.

Crossman, E. K., Trapp, N. L., Bonem, E. J., & Bonem, M. K. (1985). Temporal patterns of responding in small fixed-ratio schedules. Journal of the Experimental Analysis of Behavior, 43, 115-130.

Derenne, A., & Baron, A. (2001). Time-out punishment of long pauses on fixed-ratio schedules of reinforcement. The Psychological Record, 51, 39-51.

Derenne, A., & Baron, A. (2002). Preratio pausing: Effects of an alternative reinforcer on fixed and variable-ratio responding. Journal of the Experimental Analysis of Behavior, 77, 272-282.

Derenne, A., Richardson, J. V., & Baron, A. (2006). Long-term effects of suppressing the preratio pause. Behavioural Processes, 72, 32-37.

Felton, M., & Lyon, D. O. (1966). The post-reinforcement pause. Journal of the Experimental Analysis of Behavior, 9, 131-134.

Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. New York: Appleton-Century-Crofts.

Griffiths, R. R., & Thompson, T. (1973). The post-reinforcement pause: A misnomer. The Psychological Record, 23, 229-235.

Iversen, I. H. (1976). Interactions between reinforced responses and collateral responses. The Psychological Record, 26, 399-413.

Mazur, J. E. (1983). Steady-state performance on fixed-, mixed, and random-ratio schedules. Journal of the Experimental Analysis of Behavior, 39, 293-307.

Mazur, J. E. (2005). Learning and Behavior (6th ed.). Upper Saddle River, NJ: Pearson Education, Inc.

Mazur, J. E., & Hyslop, M. E. (1982). Fixed-ratio performance with and without a postreinforcement timeout. Journal of the Experimental Analysis of Behavior, 38, 143-155.

McSweeney, F. K., Hinson, J. M., & Cannon, C. B. (1996). Sensitization-habituation may occur during operant conditioning. Psychological Bulletin, 120, 256-271.

McSweeney, F. K., & Roll, J. M. (1998). Do animals satiate or habituate to repeatedly presented reinforcers? Psychonomic Bulletin & Review, 5, 428-442.

McSweeney, F. K., & Swindell, S. (1999). Behavioral economics and within-session changes in responding. Journal of the Experimental Analysis of Behavior, 72, 355-371.

Perone, M. (2003). Negative effects of positive reinforcement. The Behavior Analyst, 26, 1-14.

Pierce, W. D., & Cheney, C. D. (2004). Behavior analysis and learning (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.

Powell, R. W. (1968). The effects of small sequential changes in fixed-ratio size on the postreinforcement pause. Journal of the Experimental Analysis of Behavior, 11, 589-593.

Rachlin, H. (1970). Introduction to modern behaviorism. San Francisco: W.H. Freeman.

Rachlin, H., & Green, L. (1972). Commitment, choice and self-control. Journal of the Experimental Analysis of Behavior, 17, 15-22.

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory. New York: Appleton-Century-Crofts.

Schlinger, H., Blakely, E., & Kaczor, T. (1990). Pausing under variable-ratio schedules: Interaction of reinforcer magnitude, variable-ratio size, and lowest ratio. Journal of the Experimental Analysis of Behavior, 53, 133-139.

Shull, R. L. (1991). Mathematical description of operant behavior: An introduction. In I. H. Iversen & K. A. Lattal (Eds.), Experimental analysis of behavior (Part 2, pp. 243-282). Amsterdam: Elsevier Science.

Shull, R. L., & Lawrence, P. S. (1998). Reinforcement: Schedule performance. In K. A. Lattal & M. Perone (Eds.), Handbook of research methods in human operant behavior (pp. 95-129). New York: Plenum Press.

Wade-Galuska, T., Perone, M., & Wirth, O. (2005). Effects of past and upcoming response-force requirements on fixed-ratio pausing. Behavioural Processes, 68, 91-95.

Author Contact Information: Adam Derenne & Katherine Ann Flannery Department of Psychology University of North Dakota PO Box 8380 Grand Forks ND 58202-8380 Tel.: 701-777-4215 Email: [email protected] [email protected]


Evolutionary Psychology and Behavior Analysis: Toward Convergence

Jeremy E. C. Genovese

Evolutionary psychology and behavior analysis are both committed to a scientific understanding of human behavior. This paper argues that - despite decades of mutual misunderstanding - greater convergence between these two theoretical orientations is now possible. Observations from evolutionary neurobiology, trait psychology, and behavior genetics allow us to move beyond old controversies.

Keywords: Evolutionary psychology; Behavior Analysis; Behaviorism

It could be argued that many of the great leaps in human understanding occur when separate domains of inquiry are discovered to be united by some previously unsuspected commonality. Examples include Descartes’ unification of geometry and algebra (Singer, 1959) and the neo-Darwinian synthesis of natural selection and Mendelian genetics. In the latter case it is worth noting the acrimony that existed between Darwinians and Mendelians before the synthesis. The followers of Mendel had come to see inherited traits as discrete while the Darwinians thought in terms of continuous variation and tended toward Lamarckian notions of heredity. At the time these camps seemed irreconcilable, yet the combined work of empirical investigators and mathematical theorists was able to create a powerful theory that embraced both Mendelian genetics and selection (Mayr, 1982).

One wonders if such a synthesis is now possible between the discoveries of behavior analysis and the insights of evolutionary psychology. There is a deep history of antagonism between these two approaches, which has reappeared in various guises over the last half century (e.g., Lorenz, 1965). It is not hard to find instances of misunderstanding and even incomprehension of the rival point of view. For example, Schlinger (1996) claims that:

the problem with evolutionary explanations of behavior is that the evidence proffered to support them is so fraught with methodological problems that it is simply insufficient to warrant any conclusions about the role of genes and, thus, evolution (p. 75)

Meanwhile, Pinker’s (2002) The Blank Slate, a persuasive polemic for evolutionary and genetic influences on human behavior, gratuitously dismisses Skinner as a “Maoist” (p. 246).

Skinner (1953) was certainly open to the possibility that genes and natural selection shape behavior.
Skinner (1948), through the fictional Frazier, founder of Walden Two, argues: “What is ‘original nature of man?’ I mean, what are the basic psychological characteristics of human behavior – the inherited characteristics, if any, and the possibilities of modifying them and creating others? That’s certainly an experimental question – for a science of behavior to answer” (p. 162). This statement is typical of Skinner’s comments on this topic: he always admits that genes and evolution may play a role, but he is vague about what those influences might be or skeptical about our ability to understand or study them. For example, in Science and Human Behavior he even seems favorably disposed to such unorthodox ideas as Huntington’s (1938) climatic determinism and Sheldon’s (Sheldon & Stevens, 1942) theory of somatotypes. Yet, just when he acknowledges these intriguing possibilities, he dismisses their relevance to the study of behavior:

Even when it can be shown that some aspect of behavior is due to season of birth, gross body type, or genetic constitution, the fact is of limited use. It may help us in predicting behavior, but it is of little value in an experimental analysis or in practical control because such a condition cannot be manipulated after the individual has been conceived. The most that can be said is that knowledge of the genetic factor may enable us to make better use of other causes. If we know that an individual has certain inherent limitations, we may use our techniques of control more intelligently, but we cannot alter the genetic factor (Skinner, 1953, p. 26).

There are several things wrong with this argument. First, in experimental situations with animals it was quite possible to manipulate genetic background through selective breeding. Skinner must have known this, since some of his earliest research was on maze-bright and maze-dull rodent strains (Skinner, 1930; Heron & Skinner, 1940). Genetic engineering has now given us the power to manipulate specific genes, and the use of knock-out animals has become common in behavioral research. In addition, Skinner makes the common mistake of assuming that a genetic trait is necessarily unchangeable. On the contrary, it has been well known since the 1960s that mental retardation can be prevented in individuals with the genetic disorder phenylketonuria through dietary manipulation (Woolf, 1962). Thus, contrary to Skinner, there are cases where a genetically determined behavior may be altered through environmental intervention. It is hard to see how Skinner could not have been aware of this. The advent of PCR technology, which gives us the ability to identify specific genes and determine their behavioral correlates, seems to render much of Skinner’s reluctance about genetic research moot. Finally, advances in neurophysiological techniques and understanding seem to void Skinner’s (1953) prediction that we “may never have this sort of neurological information at the moment it is needed in order to predict a specific instant of behavior” (p. 28).
The Evolutionary Function of Learning

Perhaps a good place to begin an inquiry into the relationship between evolutionary psychology and behaviorism is to examine what we know about the evolution of learning. Dawkins (1976) pointed out that learning probably evolved as a method for coping with unpredictable environments:

One way for genes to solve the problem of making predictions in rather unpredictable environments is to build in a capacity for learning. Here the program may take the form of the following instructions to the survival machine: ‘Here is a list of things defined as rewarding: sweet taste in the mouth, orgasm, mild temperature, smiling children. And here is a list of nasty things: various sorts of pain, nausea, empty stomach, screaming child. If you should happen to do something which is followed by one of the nasty things, don’t do it again, but on the other hand repeat anything which is followed by one of the nice things.’ The advantage of this sort of programming is that it greatly cuts down the number of detailed rules which have to be built into the original program; and it is also capable of coping with changes in the environment which could not have been predicted in detail (pp. 60 – 61).

This is, of course, a very general conjecture; how much can we really know about the evolutionary history of learning and the selective forces that have shaped it? Evolutionary hypotheses about both the selective advantages of specific traits and the lines of descent of particular species have been criticized as “just so stories,” incapable of falsification (Schlinger, 1996). This objection has considerably less force today than it might have had in the past. Phylogenetic trees, inferred from comparative anatomy or behavior, can now be tested against genetic relatedness (Papini, 2002B; Striedter, 1998). In addition, selective breeding experiments can shed considerable light on evolutionary processes (e.g. Lofdahl, Holliday, & Hirsch, 1992; Mery & Kawecki, 2002).

Modularity versus General Laws of Learning

By some accounts the distance between evolutionary psychology and behaviorism appears vast. However, some of the differences may not be as great as they seem. Evolutionary psychologists are, for the most part, cognitivists and repeat the standard criticisms of behaviorism (e.g. Pinker, 1997), but these criticisms are aimed at a straw person. Modern behaviorists may use a different vocabulary and have a different philosophical orientation, but internal mental states are not considered unworthy of consideration (Shimp, 1989). Using conceptual tools such as “rule governed behavior” and “relational frame theory” (Hayes & Wilson, 1995), behaviorists try to explain many of the same phenomena that cognitivists investigate.

One disagreement, however, involves a fundamental characterization of behavior. Evolutionary psychologists tend to speak of the mind’s modularity. They see the brain as a collection of specific, largely independent problem solving mechanisms, while behaviorism has historically been committed to the notion of universal laws of learning that transcend species. Evolutionary psychologists point out that natural selection is not teleological; it cannot anticipate future adaptive challenges. Evolution is not aimed at some distal end point; selection is based on the differential reproductive success of organisms solving proximal adaptive challenges or, as Dawkins (1976) states, “evolution is blind to the future” (p. 9). Therefore, the argument goes, we should not expect to find general adaptive strategies but only specific behavioral/cognitive modules. Pinker (1997) has succinctly described this proposition:

the mind is organized into modules or mental organs, each with a specialized design that makes it an expert in one area of interaction with the world. The modules’ basic logic is specified by our genetic program. Their operation was shaped by natural selection to solve the problems of the hunting and gathering life led by our ancestors in most of our evolutionary history. The various problems of our ancestors were subtasks of one big problem for their genes, maximizing the number of copies that made it into the next generation (p. 21).

Tooby and Cosmides (1992) have claimed modularity as a central tenet of evolutionary psychology, although some evolutionary psychologists have dissented from this view (e.g. Scher & Rauscher, 2002). Pinker (1997) compares any version of general laws of learning with the discredited belief in a universal protoplasm. After the discovery of the intracellular substance in the 1830s, some biologists postulated a vitalistic notion of protoplasm as the force behind all physiological processes. This hypothesis was overthrown by the discovery of functionally specific organelles (Asimov, 1959; Mayr, 1982). Reasoning by analogy, Pinker claims “a jack-of-all-trades is master of none, and that is just as true for our mental organs as for physical organs” (p. 28).

Behaviorists have, on the other hand, noted the similarity in basic patterns of learning across phylogeny. Certain types of learning (habituation, sensitization, Pavlovian conditioning, and operant conditioning) appear in an astonishing array of organisms, both vertebrate and invertebrate (Abramson, 1994; Papini, 2002A). Moreover, the underlying physiologies of these processes are similar (Kandel, 1976). Kandel (2005), using the language of cognitivism, argued that common adaptive challenges led to common learning mechanisms:

if learning in humans and other higher animals involves the establishment of certain cognitive structures, why are aspects of such cognitive mechanisms likely to be similar in humans and in simple animals like the snail Aplysia? One good reason for believing that this would be the case lies in the consequences of adaptation to evolutionary pressure.
Animals that differ greatly in habitat and heritage nonetheless face common problems of adaptation and survival, problems for which learning and flexible decision making are useful. When different species face a common environmental constraint, they often manifest homologous patterns of adaptation because a successful solution to an environmental challenge, first evolved in a common primitive ancestor, will continue to be inherited as long as it remains useful and the selective pressure is present (p. 124).

Is synthesis possible? The fact that brains have an evolutionary history can help us recover and reframe the notion of general laws of learning. Williams (1992) pointed out that “an organism is a living record of its own history” (p. 76) and that:

adaptation is seriously constrained by phylogeny. Natural selection never designs new machinery to cope with new problems. In an important sense it does not even redesign old machinery. It can only achieve a slow alteration in the parameters of old machinery. Given enough time, natural selection can readily turn something like an arm and hand into something like a leg and hoof, much less readily into a bifurcate crustacean appendage. Appreciation of the long-term blindness of natural selection makes it possible to understand why the giraffe, in evolving an ever longer neck, did not add more vertebrae. Adding new vertebrae would have required a redesign of the neck skeleton. A slow increase in one parameter of each vertebra, its length, was the available way selection could bring about the general elongation (p. 76).

This suggests that evolution often proceeds through the gradual adjustment of regulatory parameters and not by the de novo creation of specialized modules. Since the evidence supports the existence of basic forms of learning, we can assume that there are basic learning mechanisms that are common across species and are, most likely, the consequence of common descent or parallel evolution (Papini, 2002B).
There is nothing in evolutionary theory to preclude the existence of general laws of learning, in the sense that if basic mechanisms of learning are features of even relatively simple nervous systems, then we would expect these features to continue to be selected for and preserved across phylogeny. For example, habituation has been observed in many species and seems to be a property of very simple nervous systems (Rankin, 2004). It can be reasonably assumed that habituation represents a basic principle of animal learning.

The laws of learning cannot be separated from their biological substrate (i.e. the nervous system). Brains are homologous structures and share many common features, which must, in turn, determine many common behavioral traits. As Quartz (2002) points out, there are basic design features of the nervous system that are common to both vertebrates and invertebrates. He gives the example of brain reward systems, found in species as diverse as humans and honey bees, and notes that:

the existence of highly conserved nervous system developmental mechanisms suggests that nervous systems, despite their apparent diversity, share a deep structure, or common design principles, just as the fact that two million distinct species share only 35 major body plans suggest that body plans share many common design principles (p. 195).

The fact that neural reward structures are highly conserved over the course of evolution suggests that it is reasonable to speak of operant conditioning – the obvious behavioral correlate of those structures (Brembs, 2003) – as a general law of learning. On the other hand, we can expect that divergent patterns of brain evolution reflect the modification of these common laws of learning to solve species-specific learning problems. What evolutionary psychologists call modularity may be viewed instead as evolved constraints on learning. This perspective accomplishes two tasks. It validates the notion of general learning laws and it provides a mechanism for evolved differences. Evolutionary psychologists are quick to hypothesize the existence of modules, but slow to describe what those mechanisms might be. The concept of preparedness might help bridge this gap.

Thinking about Preparedness

Textbooks on behavior analysis now acknowledge the importance of evolutionary constraints on learning and include discussions of such phenomena as instinctual drift, the Garcia effect, and preparedness (e.g. Leslie, 1996; Mazur, 2002). The notion of a preparedness dimension to learning was spelled out in a 1970 paper by Seligman, who argued that “the organism may be more or less prepared by the evolution of its species to associate a given CS and US or a given response with an outcome” (p. 408). He proposed that for any given species a learned behavior could be placed on a continuum running from prepared through unprepared to contraprepared.

One way to think about evolutionary psychology is that it is the study of evolved species-specific preparedness. Preparedness can be thought of as a parameter of learning - not dissimilar to the type of parameter described by Williams (1992) - susceptible to selection. The human brain is a product of natural selection, and we should expect that an understanding of our evolutionary history would be a rich source of information about human behavior. An interesting example of human species-specific preparedness can be seen in a study by Sharps et al. (2002), which found evidence that humans have a superior ability to learn animal tracks in comparison with certain other forms of visual learning. Sharps and his colleagues suggest that this ability was selected for during our long evolution as hunter gatherers.

Understanding Individual Differences

Both evolutionary psychologists and behaviorists have been criticized for ignoring individual differences. In the past evolutionary psychology tended to focus more on common species traits and ignore individual differences (Tooby & Cosmides, 1992), while behaviorists tended to reduce individual differences to differences in reinforcement history. Behaviorists have been particularly suspicious of trait theories of personality (Flora, 2004; Skinner, 1974).
Yet trait psychology has produced a body of evidence that has real predictive power and shows that these traits have strong genetic components (Furnham & Heaven, 1999; Matthews, Deary, & Whiteman, 2003). The conceptual problem we have is to try to understand the underlying neuropsychology of the trait constructs. A number of personality theorists have tried to establish, with some success, correlations between personality and conditionability (e.g. Vogel, 1961). I would like to argue that this promising approach is insufficiently reductionist. We should take the individual differences in conditionability as our starting point. Differential conditionability in no way contradicts the central ideas of learning theory, but may explain the genetic component of individual behavioral differences. An example of this approach is Eysenck’s (1983) argument that genetic differences in susceptibility to respondent conditioning explain individual differences in anti-social behavior. Gray’s (Pickering et al., 1997; Pickering, Diaz, & Gray, 1995) reinforcement sensitivity model also attempts to understand personality in terms of differential conditionability.

A useful research approach might be to test a large sample of people on a variety of conditioning tasks and investigate their behavioral correlates. Exploratory factor analysis, which is fundamentally a pattern recognition technique, would be useful in discerning basic regularities of conditionability. This research would bear resemblance to Eysenck’s (1953) early work, but it would be unburdened by preconceived personality constructs.

Genetic differences in conditionability may help explain why individuals are differentially affected by environmental stimuli. An important study by Caspi et al. (2002) found that a polymorphism of the gene for the enzyme monoamine oxidase A (MAOA) moderates the effects of maltreatment in childhood. Maltreatment during childhood increased the probability of antisocial behavior. Maltreated children with the low activity MAOA gene were more likely to develop antisocial behavior than maltreated children with the high activity MAOA gene. Moffitt (2005) has pointed out that this type of research might help us improve the effectiveness of our interventions.

Recognizing the existence of genetically based differences in conditionability also has implications for integrating the insights of behavioral and evolutionary approaches. Selective breeding experiments have demonstrated that it is possible to change a population’s intrinsic level of conditionability (Lofdahl, Holliday, & Hirsch, 1992). Variation of a genetic trait in a population is a prerequisite for natural selection (Minkoff, 1983). Progress in understanding the genetics, adaptive value, and neuromechanisms of conditionability could contribute to a robust evolutionary explication of the preparedness phenomenon.

Toward Convergence

Behaviorism and evolutionary psychology share a common commitment to a scientific understanding of human behavior. Yet the relationship between these two currents has been clouded by mutual misunderstanding. Today, there is reason to think that greater convergence is possible. An article by Barash (2005) in The Chronicle of Higher Education suggests a rethinking is in order. Barash, an evolutionary psychologist, writes:

Into my own comfortable conceptual dichotomy (“behaviorism bad; evolutionism good”), there came an apple of discord when I happened to reread Skinner’s Beyond Freedom and Dignity, published more than three decades ago. Please don’t misunderstand; I haven’t become a convert to behaviorism. But I have emerged with a deeper respect for B. F. Skinner and his work, and a recognition that in his legacy, not just evolutionary biologists but all scientists have a potent intellectual ally.
His research didn’t encompass neurobiology, sociobiology, or, indeed, biology at all. But there is no doubt that he “did” science, and, moreover, that he provided the rest of us with some conceptual tools and arguments that will help us along our way (p. B10).

No science of human behavior can be inconsistent with the principles of natural selection. Any evolutionary model must explain the cross species regularities in learning. Evolutionary psychologists and behaviorists have much to learn from each other. We would all do well to pay heed to Skinner’s (1977) admonition: “the behavior of organisms is a single field in which both phylogeny and ontogeny must be taken into account” (p. 1012).

References

Abramson, C. I. (1994). A primer of invertebrate learning: A behavioral perspective. Washington, DC: American Psychological Association.
Asimov, I. (1959). Words of science and the history behind them. Boston: Houghton Mifflin Company.
Barash, D. P. (2005). B.F. Skinner, Revisited. Chronicle of Higher Education, 51(30), B10.
Brembs, B. (2003). Operant reward learning in Aplysia. Current Directions in Psychological Science, 12, 218 – 221.
Caspi, A., McClay, J., Moffitt, T., Mill, J., Martin, J., Craig, I. W., Taylor, A., & Poulton, R. (2002). Role of genotype in the cycle of violence in maltreated children. Science, 297, 851 – 854.

Dawkins, R. (1976). The selfish gene. New York: Oxford University Press.
Eysenck, H. J. (1953). The structure of human personality. London: Methuen & Company.
Eysenck, H. J. (1983). The social application of Pavlovian theories. The Pavlovian Journal of Biological Sciences, 18, 117 – 125.
Flora, S. R. (2004). The power of reinforcement. Albany, NY: State University of New York Press.
Furnham, A., & Heaven, P. (1999). Personality and social behaviour. London: Arnold.
Hayes, S. C., & Wilson, K. G. (1995). The role of cognition in complex human behavior: A contextualistic perspective. Journal of Behavior Therapy and Experimental Psychiatry, 26, 241 – 248.
Heron, W. T., & Skinner, B. F. (1940). The rate of extinction in maze-bright and maze-dull rats. The Psychological Record, 4, 11 – 18.
Huntington, E. (1938). Season of birth: Its relation to human abilities. New York: John Wiley & Sons, Inc.
Kandel, E. R. (1976). Cellular basis of behavior: An introduction to behavioral neurobiology. San Francisco: W. H. Freeman and Company.
Kandel, E. R. (2005). Psychiatry, psychoanalysis, and the new biology of mind. Washington, DC: American Psychiatric Publishing, Inc.
Leslie, J. C. (1996). Principles of behavior analysis. Amsterdam: Harwood Academic Publishing.
Lofdahl, K. L., Holliday, M., & Hirsch, J. (1992). Selection for conditionability in Drosophila melanogaster. Journal of Comparative Psychology, 106, 172 – 183.
Lorenz, K. (1965). Evolution and modification of behavior. Chicago: University of Chicago Press.
Matthews, G., Deary, I. J., & Whiteman, M. C. (2003). Personality traits (2nd ed.). Cambridge, UK: Cambridge University Press.
Mayr, E. (1982). The growth of biological thought: Diversity, evolution, and inheritance. Cambridge, MA: The Belknap Press of Harvard University Press.
Mazur, J. E. (2002). Learning and behavior. Upper Saddle River, NJ: Prentice Hall.
Mery, F., & Kawecki, T. (2002). Experimental evolution of learning ability in fruit flies. Proceedings of the National Academy of Sciences, 99, 14274 – 14279.
Minkoff, E. C. (1983). Evolutionary biology. Reading, MA: Addison-Wesley Publishing Company.
Moffitt, T. E. (2005). The new look of behavioral genetics in developmental psychopathology: Gene-environment interplay in antisocial behaviors. Psychological Bulletin, 131, 533 – 554.

Papini, M. R. (2002A). Comparative psychology: Evolution and development of behavior. Upper Saddle River, NJ: Prentice Hall.
Papini, M. R. (2002B). Pattern and process in the evolution of learning. Psychological Review, 109, 186 – 201.
Pickering, A. D., Corr, P. J., Powell, J. H., Kumari, V., Thorton, J. C., & Gray, J. A. (1997). Individual differences in reactions to reinforcing stimuli are neither black nor white: To what extent are they Gray? In H. Nyborg (Ed.), The scientific study of human nature: Tribute to Hans J. Eysenck at eighty (pp. 36 – 67). Amsterdam: Pergamon/Elsevier Science Inc.
Pickering, A. D., Diaz, A., & Gray, J. A. (1995). Personality and reinforcement: An exploration using a maze-learning task. Personality & Individual Differences, 18, 541 – 558.
Pinker, S. (1997). How the mind works. New York: W. W. Norton & Company.
Pinker, S. (2002). The blank slate: The modern denial of human nature. New York: Viking Penguin Putnam Inc.
Quartz, S. R. (2002). Toward a developmental evolutionary psychology. In S. J. Scher & F. Rauscher (Eds.), Evolutionary psychology: Alternative approaches (pp. 183 – 210). Boston: Kluwer Academic Publishers.
Rankin, C. H. (2004). Invertebrate learning: What can’t a worm learn. Current Biology, 14, R617 – R618.
Scher, S. J., & Rauscher, F. (Eds.) (2002). Evolutionary psychology: Alternative approaches. Boston: Kluwer Academic Publishers.
Schlinger Jr., H. D. (1996). How the human got its spots. Skeptic, 4(1), 68 – 76.
Seligman, M. E. P. (1970). On the generality of the laws of learning. Psychological Review, 77, 406 – 418.
Sharps, M. J., Villegas, A. B., Nunes, M. A., & Barber, T. L. (2002). Memory for animal tracks: A possible cognitive artifact of human evolution. The Journal of Psychology, 136, 469 – 492.
Sheldon, W. H., & Stevens, S. S. (1942). The varieties of temperament: A psychology of constitutional differences. New York: Harper & Row, Publishers.
Shimp, C. P. (1989). Contemporary behaviorism versus the old behavioral straw man in Gardner’s The mind’s new science: A history of the cognitive revolution. Journal of the Experimental Analysis of Behavior, 51, 163 – 171.
Singer, C. (1959). A short history of scientific ideas to 1900. New York: Oxford University Press.
Skinner, B. F. (1930). On the inheritance of maze behavior. Journal of General Psychology, 4, 342 – 346.
Skinner, B. F. (1948/1973). Walden two. New York: Macmillan Publishing Company.
Skinner, B. F. (1953). Science and human behavior. New York: The Free Press.

Skinner, B. F. (1974). About behaviorism. New York: Alfred A. Knopf.
Skinner, B. F. (1977). Herrnstein and the evolution of behaviorism. American Psychologist, 32, 1006 – 1012.
Striedter, G. F. (1998). Progress in the study of brain evolution: From speculative theories to testable hypotheses. The Anatomical Record, 4, 105 – 112.
Tooby, J., & Cosmides, L. (1992). The psychological foundations of culture. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary psychology and the generation of culture (pp. 19 – 136). New York: Oxford University Press.
Vogel, M. D. (1961). GSR conditioning and personality factors in alcoholics and normals. Journal of Abnormal and Social Psychology, 63, 417 – 421.
Williams, G. C. (1992). Natural selection: Domains, levels, and challenges. New York: Oxford University Press.
Woolf, L. I. (1962). Nutrition in relation to phenylketonuria. Proceedings of the Nutrition Society, 21, 21 – 29.

Author Contact Information: Jeremy Genovese Department of Curriculum and Foundations, College of Education and Human Services, Cleveland State University, Rhodes Tower 1351, Cleveland, Ohio, 44115 Tel: 216 523-7130 Email: [email protected]

The Aggregation of Single-Case Results Using Hierarchical Linear Models

Wim Van den Noortgate and Patrick Onghena
Katholieke Universiteit Leuven, Belgium

In applied research, various simultaneous and sequential replication strategies are used to investigate the generalizability of the results of single-case experimental studies that evaluate the effect of one or more treatments. We discuss one approach for aggregating the results for single cases: the use of hierarchical linear models. This approach has the potential not only to improve inferences about the effects for the individual cases, but also to estimate and test the overall effect and to explore the generality of this effect across cases and under different conditions.

Keywords: single-case; hierarchical linear model; replication; aggregation

Single-case experimental designs are used to evaluate the effect of one or more treatments on a single case. The case may be a subject or another single entity that forms the research unit, such as a school or a family. This entity is repeatedly observed over the levels of one or several manipulated independent variables (Onghena, 2005). In the most basic design, the AB-phase design or interrupted time series design, the case is observed repeatedly during a first phase (A), typically a baseline phase before an intervention takes place, and in a second phase (B) after or during an intervention. To evaluate the effect of the intervention, scores in both phases are compared.

Single-case designs have a long history in behavioral science (Ittenbach & Lawhead, 1997), but in recent decades single-case methodology has been further elaborated, with the aim of improving the internal validity of the conclusions. For instance, reversal phase designs (e.g., an ABAB-design) or alternation designs with rapidly alternating conditions (e.g., an AABBBABAABB-design) may be used rather than a simple AB-phase design in order to assess or control statistically for the effect of history, maturation, or other time-related confounding variables. The effect of such confounding variables may further be controlled by means of randomization while setting up the study, for instance by randomly assigning measurement occasions to treatments or by randomizing the time of intervention (Edgington, 1996).

Although group designs receive much more attention in methodological courses and handbooks, in the last decades there has been renewed interest in single-case designs, especially in behavior modification and clinical psychology (Barlow & Hersen, 1984; Kazdin, 1982), neuropsychology (Caramazza, 1990), psychopharmacology (Cook, 1996), and educational research (Kratochwill & Levin, 1992).
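
The AB-phase comparison and the randomization of the time of intervention described above can be illustrated with a minimal sketch. It is not the hierarchical linear model approach that this article develops; it only shows, under stated assumptions, how a difference between phase means could be evaluated with a randomization test in the spirit of Edgington (1996). The scores, the function names, and the minimum phase length of three observations are all hypothetical choices made for the illustration.

```python
from statistics import mean


def ab_mean_difference(scores, start_b):
    """Return mean(phase B) - mean(phase A); phase B begins at index start_b."""
    return mean(scores[start_b:]) - mean(scores[:start_b])


def ab_randomization_test(scores, start_b, min_phase_len=3):
    """One-sided randomization test for a single AB-phase design.

    Assumption: the intervention point was chosen at random among all start
    points that leave at least min_phase_len observations in each phase.
    The p-value is the share of admissible start points whose mean difference
    is at least as large as the one observed.
    """
    candidates = range(min_phase_len, len(scores) - min_phase_len + 1)
    if start_b not in candidates:
        raise ValueError("start_b is not an admissible intervention point")
    observed = ab_mean_difference(scores, start_b)
    reference = [ab_mean_difference(scores, s) for s in candidates]
    p_value = sum(d >= observed for d in reference) / len(reference)
    return observed, p_value


# Hypothetical scores: five baseline sessions (A) followed by seven
# intervention sessions (B).
scores = [2, 3, 2, 3, 2, 6, 7, 6, 7, 8, 7, 8]
observed, p_value = ab_randomization_test(scores, start_b=5)
print(f"difference in phase means: {observed:.2f}, p = {p_value:.3f}")
```

With twelve observations and at least three per phase there are only seven admissible intervention points, so the smallest p-value this test can produce is 1/7. That limit is one way to see why a single AB-phase series supports only weak statistical conclusions on its own.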
The popularity of the designs is also reflected in the relatively large number of articles published in the Behavior Analyst Today that discuss or apply a variety of single-case designs (about twenty between 2001 and 2006). Single-case designs are indeed very attractive in several situations (Franklin, Allison, & Gorman, 1997; Onghena, 2005). Single-case studies may be relatively easy to set up and are much less expensive than large-scale group-comparison studies. This also makes the designs attractive for practitioners who want a first insight into the effect of a treatment. An additional strength of single-case designs is that, in contrast to group designs, which give insight into the average effect of a treatment, they give an in-depth insight into the behavior of one single case. Especially in clinical settings, the research indeed often focuses on the effect of a treatment for a specific case.

Finally, since only a single case is investigated, the design often allows making a large number of repeated observations, enabling a detailed study of the evolution of the behavior.

Single-case designs thus are (initially) aimed at drawing valid conclusions regarding one entity. Sometimes, for instance in applied clinical settings, the primary interest may indeed be in this single entity, since it concerns a case that presented itself with a problem to solve. Moreover, the problem presented by the client may be relatively rare. Yet the question often arises of what can be learned from this case for other cases, or whether and to what degree the results of the case can be generalized to other similar cases. An important drawback of single-case experimental studies is that the results are in principle restricted to the cases that were studied. In situations for which it seems reasonable that the effect is common for a whole population, the results of a single-case study may be informative for other entities of this population, but such a generalization cannot be made on statistical grounds.

A natural way to explore the generalizability of single-case results is replication over other entities. According to Barlow and Hersen (1984), this can be done by a) direct replication, which is a replication of the study by the same investigator but on other entities, b) systematic replication, replicating the study varying one or more factors such as the characteristics of the setting, the experimenter or the treatment, and c) clinical replication, which is a more advanced replication by the same investigator of a treatment package containing two or more distinct treatment procedures on a series of clients presenting similar problems. Such replications can be carried out sequentially, for instance every time a client enters a clinical setting. Alternatively, simultaneous replication can be part of the study design.
In a multiple baseline across participants design, for instance, several cases are investigated simultaneously by means of an AB-phase design (Ferron & Scott, 2005). While group designs typically give information about the effect of a treatment on ‘an average case’, and single-case designs give information about the effect on a specific case, a set of single-case studies combines the strengths of both designs. It offers the opportunity to draw conclusions about specific cases, but also about an average or typical case, as well as exploring systematically the conditions under which an effect will or will not occur. In this article, we want to describe a way of aggregating the results from single cases in a quantitative and systematic way, maximally exploiting the information that is available in the data: the use of hierarchical linear models. The approach will be introduced by means of an elaborated example.

An example

In a recent number of the Journal of Early and Intensive Behavior Intervention, Lawson and Greer (2006) use a multiple baseline design with middle school students with academic delays, to study the effects of writer immersion and viewing the effect of their writing on responses from readers on the quality of writing. Participants were seven 9th graders, diagnosed with behavioral and learning disabilities. In a first experiment, three students were asked in several sessions to write an essay describing a simple picture, such that a reader could reconstruct the picture without seeing it. Initially, in the baseline phase, they were not given any feedback. In a second phase, the teacher gave them written and verbal praise for correct structural components (grammar and spelling), and corrections for incorrect responses, as well as verbal feedback on the content of the descriptions. In a final phase, the writer immersion phase, all communication was done in written form.
Feedback in this phase included comments on the structural components, as well as viewing the effect their writing had on a reader who tried to draw the picture based on the description and who was blind to the objectives or the setup of the study. In both intervention phases, the pupils were asked to revise their essay after receiving the feedback. Cycles of feedback and revision continued until the descriptions were written correctly and resulted in correct drawings when used as instructions for a reader. In Figure 1, the results are shown for one of the dependent variables, more specifically, the structural components of the text, evaluated in each session prior to feedback. The variable equals the number of correct responses to spelling, punctuation, capitalization, word choice, and sentence structure, divided by the total number of opportunities to respond within each essay, multiplied by 100 %.

Figure 1. The percent accurate structural components in essays for each of the three phases prior to teacher editing for Student A, B, and C in Experiment 1

The second experiment was a systematic replication of the first experiment. To assess the effect of writer immersion, without being preceded by the traditional approach of simple teacher editing, the effect of writer immersion was now investigated for four new cases, but without the second phase. Results are displayed in Figure 2.

Figure 2. The percent accurate structural components in essays prior to teacher editing for Student D, E, F, and G in Experiment

It is clear from Figure 1 that for the cases of the first experiment, scores in the teacher feedback phase and especially in the writer immersion phase are substantially higher than the scores in the baseline phase. The large effect of writer immersion that was found in the first experiment however could be a cumulative effect of teacher editing alone and writer immersion. The second experiment gives evidence for a substantial effect of writer immersion, even when not preceded by the simple teacher editing phase. This is also found if for each student the mean percentages for the phases are calculated and compared, as is done by the authors. For the first subject (Student A from Figure 1) for instance, the mean percentage was 30 % for the first phase, as compared to 73.25 for the second and 83.33 % for the third phase, a rather impressive difference of 40.25 % and 53.33 %! Unfortunately, large effect magnitudes are relatively rare in behavioral science (Cohen, 1988; Pillemer, 1984). Therefore drawing conclusions based on visual inspection of a graphical display or of summary statistics may be risky since small effects are difficult to see with the naked eye and therefore may remain undetected (Edgington & Onghena, 2007), although there is also evidence that based on visual inspection, it is too easily concluded that there is an effect (Matyas & Greenwood, 1990). Considering this, it is recommended to supplement the visual analysis of the graphical display by a statistical significance test. One could for instance think about t-tests or an analysis of variance to compare the phase means. Equivalently, a regression analysis can be performed to compare phase means for a specific student, including dummy variables indicating the phase:

$$
\mathrm{Score}_i = \beta_0 + \beta_1 (\mathrm{phase2})_i + \beta_2 (\mathrm{phase3})_i + e_i \tag{1}
$$

with $\mathrm{Score}_i$ equal to the percentage of accurate structural components in session $i$, and $(\mathrm{phase2})_i$ and $(\mathrm{phase3})_i$ equal to 1 if session $i$ is part of the second or third phase, respectively, and 0 otherwise. While the intercept, $\beta_0$, can be interpreted as the expected outcome in the baseline phase, $\beta_1$ and $\beta_2$ reflect the effects of teacher editing and writer immersion, as compared to the baseline phase. The estimate of the intercept will be equal to the mean observed baseline score (for the first student, this is 30 %), and the estimates of the regression coefficients of phase2 and phase3 will be equal to the differences in the observed means (40.25 % and 53.33 % for the first student). Testing the regression coefficients yields information about the evidence against the null hypothesis that there is no effect at all. For the first student, both effects appear to be statistically significant (p < .001). Unfortunately, an analysis of variance, t-test, or regression analysis assumes that the residuals ($e_i$) are independent, an assumption that is likely to be violated: there is probably some autocorrelation (Busk & Marascuilo, 1988). For instance, it is likely that subsequent residuals are more alike, due to time-varying factors that are not controlled for in the model. Variables that have an influence at a specific moment will often also affect subsequent observations. The analyses further assume that scores are normally distributed, although they are relatively robust to violations of this assumption if groups are of comparable size, if they are not too small, and if the shapes of the distributions are similar (Posten, 1978). Techniques that were developed for the analysis of time series could be used to account for the problem of autocorrelation, but these techniques require a large number (a minimum of 50 to 100) of measurements in each phase (Box & Jenkins, 1970; see Gorman & Allison, 1997, for an extensive discussion of statistical analyses of single-case data).
In the following, we will see how the simple regression model can be extended to a hierarchical linear model to aggregate the results from several cases, that can easily be adapted to account for autocorrelation, even if for each case a small number of observations is available.
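As a rough illustration of Equation 1 and of the independence assumption, the following Python sketch fits the dummy-variable regression by ordinary least squares and then computes the lag-1 autocorrelation of the residuals. The scores are hypothetical values invented for this illustration (they are not the Lawson and Greer data), and the helper names `ols` and `lag1_autocorrelation` are assumptions made for this sketch. It shows that the intercept reproduces the baseline mean and that the two slopes reproduce the differences between phase means.

```python
from statistics import mean

# Hypothetical percent-accurate scores for one student, in session order.
# These values are invented for illustration only.
baseline = [25.0, 35.0, 30.0, 30.0]
editing = [60.0, 72.0, 75.0, 73.0]
immersion = [80.0, 85.0, 85.0]

scores = baseline + editing + immersion
# Design matrix rows: [intercept, phase2 dummy, phase3 dummy] (Equation 1).
X = ([[1.0, 0.0, 0.0]] * len(baseline)
     + [[1.0, 1.0, 0.0]] * len(editing)
     + [[1.0, 0.0, 1.0]] * len(immersion))


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def lag1_autocorrelation(e):
    """Lag-1 autocorrelation of a series, using deviations from its mean."""
    m = mean(e)
    num = sum((a - m) * (b - m) for a, b in zip(e, e[1:]))
    den = sum((a - m) ** 2 for a in e)
    return num / den


b0, b1, b2 = ols(X, scores)
# Residuals e_i = observed score minus fitted phase mean, in session order.
residuals = [y - (b0 * x0 + b1 * x1 + b2 * x2)
             for y, (x0, x1, x2) in zip(scores, X)]
r1 = lag1_autocorrelation(residuals)

print(f"intercept = {b0:.2f} (baseline mean = {mean(baseline):.2f})")
print(f"phase 2 effect = {b1:.2f}, phase 3 effect = {b2:.2f}")
print(f"lag-1 autocorrelation of residuals = {r1:.3f}")
```

With so few sessions per phase, an autocorrelation estimated in this way for a single case is likely to be very imprecise, so it should be read as a descriptive check rather than as a test.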

Aggregating single-case results

The graphs clearly suggest that, at least for the cases participating in the study, there is a positive effect of the treatment on the quality of writing. The results suggest that there will probably be an effect for similar cases. An appealing question now is whether we can indeed expect a positive effect for a new case, and how large this effect is. Furthermore, in view of generalizability, an answer to the question whether the effect varies over cases is called for. Especially if the effect appears to vary over cases, we may search for an explanation of this variation by exploring whether the effect varies according to known characteristics of the cases. In the example, we could explore whether the difference in performance between the baseline phase and the writer immersion phase depends on the presence of an intermediate phase. In this section, we systematically combine and compare the results of the seven cases by means of a hierarchical linear model, in order to answer these questions.

Hierarchical linear models were developed to analyze clustered data, for instance data stemming from pupils that can be grouped according to the school they belong to. In a hierarchical linear model, a regression equation is used to describe the variation of the scores within groups. In contrast to an ordinary regression equation, the coefficients of the regression equation are allowed to vary over groups, and this variation is described by means of one or more additional regression equations (Raudenbush & Bryk, 2002; Snijders & Bosker, 1999). In our example, we have a similar structure: the scores we obtained can be grouped according to the pupil they stem from. We already saw a regression equation that can be used to describe the data of one case (Equation 1). For each case, we could define a similar regression equation.
To write this set of seven equations in one single equation, we use an additional index $j$ to indicate the student:

$$
\mathrm{Score}_{ij} = \beta_{0j} + \beta_{1j} (\mathrm{phase2})_{ij} + \beta_{2j} (\mathrm{phase3})_{ij} + e_{ij} \tag{2}
$$

$\mathrm{Score}_{ij}$ now indicates the percentage of accurate structural components for student $j$ in session $i$. The index $j$ is also added to the regression coefficients, indicating that these coefficients can vary over cases. At the higher level of the hierarchy, the level of the cases, this variation of regression coefficients is described with additional regression equations. The most basic equations state that the regression coefficients vary around a mean value:

$$
\begin{aligned}
\beta_{0j} &= \gamma_{00} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + u_{1j} \\
\beta_{2j} &= \gamma_{20} + u_{2j}
\end{aligned}
\tag{3}
$$

It is assumed that the units at this level form a sample out of a population of units, and typically, that the regression coefficients of the first level (the $\beta$’s) are normally distributed within this population. To that end, the residuals (the $u$’s) are defined to follow a normal distribution with zero mean. The coefficient $\gamma_{00}$ can now be interpreted as the expected performance (this is the population mean) under the baseline condition, and $\gamma_{10}$ and $\gamma_{20}$ as the expected effects of teacher editing and writer immersion, respectively. $u_{0j}$ indicates the degree to which the baseline performance of case $j$ deviates from this expected performance, while $u_{1j}$ and $u_{2j}$ refer to the deviations of the effects of the treatments for case $j$ from the expected effects. In the analysis, Equations 2 and 3 are regarded as one single model, and the $\gamma$’s, as well as the variances of the residuals ($\sigma^2_e$, $\sigma^2_{u_0}$, $\sigma^2_{u_1}$, and $\sigma^2_{u_2}$), are estimated. It is also possible to assume that the different kinds of residuals at the second level covary, and to estimate the covariances ($\sigma_{u_0 u_1}$, $\sigma_{u_0 u_2}$, and $\sigma_{u_1 u_2}$). Note that the residuals (the $e$’s and $u$’s), as well as the individual regression coefficients, the $\beta$’s, are not estimated, although they could be estimated afterwards, as will be discussed further on.

To estimate the parameters of the model by means of the commonly used restricted maximum likelihood procedure, we used the procedure MIXED from SAS (Littell, Milliken, Stroup, Wolfinger, & Schabenberger, 2006). Due to the small number of cases, especially in the second phase, we simplified the model by excluding the covariances, and by excluding the variance of the effect of the second phase. Results are given in Table 1, in the column headed Model 1. The code for performing the analysis in SAS and in SPSS is given in the Appendix.

Table 1. Parameter estimates and standard errors (SE, in parentheses) for the example

| Parameter | Notation | Model 1 | Model 2 | Model 3 |
| --- | --- | --- | --- | --- |
| Mean baseline | $\gamma_{00}$ | 39.00 (3.72) | 33.63 (5.81) | 33.50 (5.94) |
| Mean effect Phase2 | $\gamma_{10}$ | 33.02 (5.12) | 36.90 (5.43) | 37.05 (5.78) |
| Mean effect Phase3 | $\gamma_{20}$ | 43.58 (4.60) | 54.23 (5.40) | 54.56 (5.79) |
| Increase baseline in group 2 | $\gamma_{01}$ | | 9.57 (7.48) | 9.20 (7.69) |
| Increase effect phase 3 in group 2 | $\gamma_{21}$ | | -21.98 (7.64) | -21.79 (8.08) |
| Variance between cases: Intercept | $\sigma^2_{u_0}$ | 47.91 (47.20) | 65.70 (56.68) | 62.40 (57.79) |
| Variance between cases: Effect Phase3 | $\sigma^2_{u_2}$ | 38.50 (69.77) | 0 | 0 |
| Autocorrelation | | | | .079 (.16) |
| Residual variance | $\sigma^2_e$ | 219.20 (40.84) | 205.65 (36.30) | 210.27 (39.54) |
| Deviance | | 611.1 | 592.3 | 592.1 |

It can be seen that the expected percentage of correct structural components is 39 % under the baseline condition, while it is 33.02 and 43.58 percentage points higher in the teacher editing and writer immersion conditions, respectively. Note that the effect of the second condition is based on the data of the three cases of the first experiment only, since this phase was omitted in the second experiment. As a rule of thumb, if the estimate is twice as large as the standard error, the parameter differs significantly from zero when performing a two-sided test with a .05 significance level, since the ratio of the estimate over the standard error follows approximately a standard normal distribution (although it may be more accurate for the regression parameters to compare the ratio to a t-distribution, and to use a likelihood ratio test for testing the variance components; see Snijders & Bosker, 1999, for more details). In any case, it is clear that the difference between baseline and both treatment conditions is highly significant. Comparing the estimates of the variance components with their corresponding standard errors reveals that there is no convincing evidence for differences between cases in the baseline performance or in the effect of writer immersion. This is confirmed by performing a likelihood ratio test: the p-values are .08 and .48, respectively.
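The rule of thumb above can be made concrete with a few lines of Python. The sketch below takes the fixed-effect estimates and standard errors reported for Model 1 in Table 1, forms the ratio of each estimate to its standard error, and converts it to a two-sided p-value under the standard normal approximation mentioned in the text. The names `MODEL1` and `two_sided_p_normal` are hypothetical, and with only seven cases a t-distribution may be the more accurate reference, so the p-values are indicative only.

```python
import math

# Fixed-effect estimates and standard errors copied from Table 1, Model 1.
MODEL1 = {
    "mean baseline": (39.00, 3.72),
    "mean effect phase 2": (33.02, 5.12),
    "mean effect phase 3": (43.58, 4.60),
}


def two_sided_p_normal(z):
    """Two-sided p-value for a ratio z under a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Ratio of estimate to standard error for each fixed effect.
ratios = {name: est / se for name, (est, se) in MODEL1.items()}
p_values = {name: two_sided_p_normal(z) for name, z in ratios.items()}

for name in MODEL1:
    print(f"{name}: estimate/SE = {ratios[name]:.2f}, p = {p_values[name]:.2g}")
```

All three ratios are far above 2, in line with the conclusion that the baseline level and both treatment effects differ clearly from zero.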

Although there is no compelling evidence for variation between cases in baseline performance, nor for variation in the effect of writer immersion, we could test whether there are differences between the cases from the first and the second experiment, by extending the regression equations describing the variation over cases, including the group as an additional independent variable:

$$
\begin{aligned}
\beta_{0j} &= \gamma_{00} + \gamma_{01} (\mathrm{group2})_j + u_{0j} \\
\beta_{1j} &= \gamma_{10} \\
\beta_{2j} &= \gamma_{20} + \gamma_{21} (\mathrm{group2})_j + u_{2j}
\end{aligned}
\tag{4}
$$

where $(\mathrm{group2})_j$ equals 1 if case $j$ belongs to the second group, 0 otherwise. Now $\gamma_{00}$ can be interpreted as the expected performance in the baseline condition for a case of the first group, and $\gamma_{01}$ as the increase in this expected value if the case belongs to the second group. The expected effect of writer immersion in the first experiment is indicated by $\gamma_{20}$, the additional effect in the second experiment by $\gamma_{21}$. Results of this model are also presented in Table 1 (Model 2). The expected performance in the baseline condition is 33.63 % for the first group, while it is 9.57 percentage points higher in the second experiment, a difference that is, however, not statistically significant, t = 1.28, df = 59, p = .21. In the first experiment, cases score 54.23 percentage points higher in the writer immersion condition, while in the second experiment the effect of writer immersion is 21.98 percentage points smaller (and so equals only 32.25). The difference between the baseline phase and the writer immersion phase thus is larger in the first experiment than in the second one, a difference that is statistically significant, t = -2.88, df = 59, p = .006. Further note that residual differences between cases again are not statistically significant. The estimate of the between-cases variance of the effect of writer immersion even equals zero. It may look strange that the estimate is exactly zero. This is because, after taking the group into account, observed differences between cases are even smaller than could be expected based on chance alone. This situation would lead to a negative estimate of the variance (see Snijders & Bosker, 1999), which is of course outside the range of possible values, and therefore the estimate is set to zero. This small variance estimate is no surprise: in the preceding model, we have already seen that there are only small differences between cases.
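To make the interpretation of Model 2 concrete, the following sketch plugs the fixed-effect estimates from Table 1 into Equations 2 and 4, with all residuals set to zero, and computes the expected percentage for each group and phase. The function `expected_score` and the constant names are hypothetical helpers introduced for this illustration, not part of the original analysis.

```python
# Fixed-effect estimates copied from Table 1, Model 2.
G00 = 33.63   # expected baseline, first group
G10 = 36.90   # effect of teacher editing (phase 2)
G20 = 54.23   # effect of writer immersion (phase 3), first group
G01 = 9.57    # increase in baseline for the second group
G21 = -21.98  # change in the phase 3 effect for the second group


def expected_score(phase2, phase3, group2):
    """Expected percentage under Equations 2 and 4, residuals set to zero."""
    b0 = G00 + G01 * group2
    b1 = G10
    b2 = G20 + G21 * group2
    return b0 + b1 * phase2 + b2 * phase3


print("group 1 baseline:", round(expected_score(0, 0, 0), 2))
print("group 1 teacher editing:", round(expected_score(1, 0, 0), 2))
print("group 1 writer immersion:", round(expected_score(0, 1, 0), 2))
print("group 2 baseline:", round(expected_score(0, 0, 1), 2))
print("group 2 writer immersion:", round(expected_score(0, 1, 1), 2))

# The phase 3 effect in the second group is the sum of the two coefficients.
effect_group2 = expected_score(0, 1, 1) - expected_score(0, 0, 1)
print("writer immersion effect, group 2:", round(effect_group2, 2))
```

The second group did not go through the teacher editing phase, so no expected value is computed for that combination.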
Since there seems to be a substantial difference between both groups of cases, differences between cases after accounting for the group must indeed be very small.

Where we described a regression analysis for one case, we discussed two problems: the small number of observations, as well as the possible autocorrelation in residuals. One merit of using hierarchical linear models for aggregating single-case results is that a small number of observations per case is allowed, since the regression coefficients are not estimated for each case separately and conclusions rather focus on the population of cases. Cases with only very few measurements (or even only one) also yield (a little) information about the population of cases. More important for the analysis is the number of cases, which is rather small in our example, a problem that will be discussed later on. Further, the hierarchical linear models we described before still require independent residuals, but the models can easily be adapted to accommodate autocorrelated residuals. To explore a possible autocorrelation, we re-estimated the parameters of the second model, extended with a first-order autocorrelation parameter (see Appendix for the SAS and SPSS code). The estimate of the autocorrelation parameter appeared to be relatively small and not statistically significant (p = .65 when using a likelihood ratio test), and therefore all other parameter estimates and standard errors remain approximately the same (Table 1, Model 3), and conclusions are unaffected.
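The likelihood ratio test for the autocorrelation parameter can be retraced from the deviances in Table 1. Models 2 and 3 share the same fixed part and differ by one parameter, so the difference in deviance is compared here to a chi-square distribution with one degree of freedom. The sketch below assumes this reference distribution and uses the identity that the upper tail probability of a chi-square variable with one degree of freedom at x equals erfc(sqrt(x / 2)); the function name is hypothetical.

```python
import math


def lrt_p_value_1df(deviance_restricted, deviance_extended):
    """p-value of a likelihood ratio test with one degree of freedom."""
    x = deviance_restricted - deviance_extended
    if x <= 0.0:
        return 1.0
    # Upper tail of a chi-square(1) variable: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


# Deviances copied from Table 1: Model 2 (no autocorrelation) and Model 3.
p_value = lrt_p_value_1df(592.3, 592.1)
print(f"deviance difference = {592.3 - 592.1:.1f}, p = {p_value:.2f}")
```

The result agrees, after rounding, with the p-value of .65 reported above.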

Discussion and conclusions

The use of hierarchical linear models allows summarizing the findings of several cases examined in the same or in several studies in a systematic and quantitative way. More important, by aggregating single-case results, conclusions are not necessarily restricted to the studied cases, but may refer to a broader population as well. The use of hierarchical linear models permits, for instance, assessing the effect of a certain treatment for ‘an average case’, or, otherwise stated, the expected effect if a new case from the population were investigated. They further give the opportunity to evaluate the degree to which the effect varies over cases, and to look for characteristics of persons, settings, or treatments that have an influence on the effect. By aggregating the results of several cases, we increase the power for assessing an overall effect of a treatment. It is, for instance, possible that for each case in a set, no significant treatment effect is found although there is a tendency for an effect in a certain direction. When combining the results of the cases, the small pieces of evidence are accumulated, the overall treatment effect may become visible (i.e., statistically significant), and the size of the effect can be estimated more accurately. The results of a set of cases thus combine the merits of single-case designs (giving information about individual cases) and of group designs (giving information about an average case). Yet, the aggregation of single-case results may also lead to improved estimates of the effects for individual cases.
Based on the results of the analysis using hierarchical linear models, one could calculate estimates for the regression coefficients for specific cases (the β ’s from Equation 2), that are an optimally weighted mean of the overall regression coefficients (the γ ’s from Equations 3 and 4) and the regression coefficients that would have been obtained when performing an ordinary regression analysis for the separate cases (Equation 1). Especially when the number of observations for a case is small, and therefore the regression coefficients that would be obtained if using the data from that case only are relatively unreliable, and when the effect does not appear to vary largely over cases, the profit of using these estimates, called empirical Bayes estimates, is large (Snijders & Bosker, 1999). In the example, only relatively simple hierarchical linear models are illustrated. Hierarchical linear models however can easily be adapted, for instance to aggregate more complex phase designs, such as withdrawal or interaction designs, or alternation designs such as completely randomized single-case designs. The model could also include a second independent variable, yielding a factorial single-case design that can be used to evaluate the main effects of both variables, as well as their interaction effect (see Barlow & Hersen, 1984, and Edgington & Onghena, 2007, for more information about these and other single-case designs). A time-varying covariate, such as the session number, can be included in the model describing the individual scores (Equation 2) in order to model growth over time for instance due to maturation. An interaction term between this covariate and the phase indicator is included to model a possible differential growth in the conditions. Additional regression coefficients (e.g., the linear trend) may be defined as varying over cases, and more complex variance structures for the residuals can be defined. 
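The idea of an empirical Bayes estimate as a weighted mean can be sketched for the simplest situation, a model with only a random intercept. In that situation the weight given to a case's own mean is the reliability, equal to the between-cases variance divided by the sum of the between-cases variance and the residual variance divided by n, with n the number of observations for the case (Raudenbush & Bryk, 2002). The sketch below applies this to the baseline level only, borrowing the Model 1 estimates from Table 1 and the observed baseline mean of 30 % for the first student. The numbers of sessions are hypothetical, the function names are assumptions, and the full model with three coefficients per case would require matrix-valued weights, so the output is illustrative only.

```python
def reliability(var_between, var_within, n_obs):
    """Weight given to the case's own mean in a random-intercept model."""
    return var_between / (var_between + var_within / n_obs)


def empirical_bayes_mean(case_mean, overall_mean, var_between, var_within, n_obs):
    """Weighted mean of the case's own mean and the overall mean."""
    lam = reliability(var_between, var_within, n_obs)
    return lam * case_mean + (1.0 - lam) * overall_mean


# Estimates copied from Table 1 (Model 1) and the text.
OVERALL_BASELINE = 39.00   # gamma_00
VAR_BETWEEN = 47.91        # between-cases variance of the intercept
VAR_WITHIN = 219.20        # residual variance
CASE_BASELINE_MEAN = 30.0  # observed baseline mean of the first student

# Hypothetical numbers of baseline sessions, chosen for illustration.
estimates = {
    n: empirical_bayes_mean(CASE_BASELINE_MEAN, OVERALL_BASELINE,
                            VAR_BETWEEN, VAR_WITHIN, n)
    for n in (2, 4, 8, 16)
}

for n, value in estimates.items():
    lam = reliability(VAR_BETWEEN, VAR_WITHIN, n)
    print(f"n = {n:2d}: weight = {lam:.2f}, shrunken baseline = {value:.2f}")
```

The fewer observations a case contributes, the more its estimate is pulled toward the overall mean, which is the behavior described in the text.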
As an example of a more complex variance structure, the assumption in Equation 2 that the residual variance is the same in the three phases can be dropped, and separate variance components for each phase can be defined. Finally, these hierarchical linear models can further be adapted to aggregate single-case results that are summarized by means of measures of effect size. Several extensions are discussed and illustrated by Van den Noortgate and Onghena (2003a; 2003b).

Although the aggregation of single-case data using hierarchical linear models has many potential assets, aggregation should be performed with care, and may even turn out to be difficult or impossible for certain designs or research settings. A first consideration is that cases are usually not sampled randomly in single-case research. Analyses performed or conclusions drawn in single-case research usually also do not require random sampling, since the focus is on validly assessing the effect for the case concerned, not on generalizing the results to a broader population. Yet, the assumption that cases form a random sample out of a population of cases must be made when generalizing the results based on a set of single-case data. The researcher therefore should carefully examine and describe the characteristics of the cases involved. In the study that was used for the example, participants were also purposefully selected: cases were persons with behavioral and learning disabilities “chosen because of many structural errors in their writing, as well as their ability to write functionally” (Lawson & Greer, 2006, p. 153). It is clear that results are not to be generalized to cases that differ from the cases selected for the study. Note that this problem also often occurs in group experimental research, since group experiments are also frequently carried out on a sample that was not (completely) randomly drawn (Edgington & Onghena, 2007).

A second limitation is that it may be difficult to model all the important patterns shown by the data. For instance, in the example, we merely compared the mean performance in the three phases (as did the original authors, Lawson & Greer, 2006). Yet, the graphs suggest that the effect is not immediately fully present, which is not surprising since during the intervention phase, students learn from the feedback they are given in each session. Furthermore, in the writer immersion phase, ceiling effects are likely to be present. This suggests that the estimated effect probably underestimates the real effect. These phenomena, and others such as cross-over effects, can in principle be modeled, but the modifications would make the model much more complex and therefore more difficult to estimate and interpret. Whenever possible, graphical displays can (and, in many instances, should) be used to qualify and supplement the results of the hierarchical linear analysis. Another point of concern is the number of cases included in the analysis.
It is clear that with only a small number of cases available, it is not possible to get reliable estimates of population characteristics, such as the mean effect and the variation of this effect over cases. Moreover, it will be difficult to detect an existing overall effect or possible moderator effects, unless they are as large as seems to be the case in the example. The number of cases is of concern for a second reason: the parameters of hierarchical linear models are commonly estimated using maximum likelihood procedures, and tests are based on large sample properties of maximum likelihood estimates. This means that if the results of only a few cases are aggregated, as in the example, results should be considered as only indicative. To perform the analysis comfortably, at least about 20 cases are recommended, or even more if several independent variables are included and/or several (co)variance components have to be estimated at the level of cases. The issue of the number of cases may be even more problematic if hierarchical linear models are used to combine the single-case results of several studies. Single-case studies are often very heterogeneous regarding procedures, situations, and subject characteristics (Salzberg, Strain, & Baer, 1987). Heterogeneity could be regarded as a source of information since it allows searching for variables that moderate the effect. As illustrated in the example, those variables could be included as independent variables in the hierarchical linear model. Yet, if studies and cases differ with respect to a large number of factors, modeling a possible moderator effect of these factors would imply a considerable extension of the model at the level of cases, requiring a substantial number of cases.
Finally, we note that if the treatment is randomly assigned to measurement occasions or if the moment of the phase change is randomly chosen, a nonparametric randomization test can be used to evaluate the treatment effect (Edgington & Onghena, 2007; Ferron & Onghena, 1996; Todman & Dugard, 2001). Randomization tests and other nonparametric procedures have also been proposed for aggregating single-case results from simultaneous or sequential replication designs (Edgington & Onghena, 2007; Ferron & Sentovich, 2002). The attractiveness of these tests lies in the fact that they do not resort to distributional assumptions and are applicable even for small numbers of observations and cases. Yet, the hierarchical linear models approach offers more possibilities, including a systematic search for moderator variables and estimation of the size of the treatment and moderator effects, as well as of the variation between cases in these effects.
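A minimal sketch of such a randomization test for a single AB-phase design is given below, assuming that the first intervention session was selected at random among the admissible points, with at least `min_len` sessions in each phase. The test statistic (the difference between the phase means), the one-sided direction, the scores, and the function name are assumptions made for illustration; see Edgington and Onghena (2007) for the general procedure.

```python
from statistics import mean


def ab_randomization_test(scores, start, min_len=3):
    """One-sided randomization test for an AB design.

    `start` is the index of the first intervention session, assumed to have
    been drawn at random from all points that leave at least `min_len`
    sessions in each phase.
    """
    n = len(scores)
    admissible = list(range(min_len, n - min_len + 1))
    if start not in admissible:
        raise ValueError("start is not an admissible phase change point")

    def statistic(k):
        # Mean of the intervention phase minus mean of the baseline phase.
        return mean(scores[k:]) - mean(scores[:k])

    observed = statistic(start)
    # Proportion of admissible change points giving a statistic at least
    # as large as the one observed.
    as_extreme = sum(1 for k in admissible if statistic(k) >= observed)
    return observed, as_extreme / len(admissible)


# Hypothetical scores: five baseline sessions followed by six intervention
# sessions. These values are invented for illustration only.
scores = [30, 28, 35, 31, 33, 70, 74, 80, 78, 82, 85]
observed, p_value = ab_randomization_test(scores, start=5)
print(f"observed difference = {observed:.2f}, p = {p_value:.3f}")
```

Because the smallest attainable p-value equals one divided by the number of admissible change points, a design with few admissible points cannot reach conventional significance levels for a single case, whatever the size of the effect.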

To conclude, we want to encourage behavioral researchers and practitioners to continue replicating single-case studies and to consider using hierarchical linear models to aggregate single-case results from their own studies and/or those of others, which allows inferences to be made at the level of the individual cases as well as at the group level.

References

Barlow, D. H., & Hersen, M. (1984). Single-case experimental designs: Strategies for studying behavior change (2nd ed.). New York: Pergamon Press.
Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis, forecasting and control. San Francisco: Holden-Day.
Busk, P. L., & Marascuilo, L. A. (1988). Autocorrelation in single-subject research: A counterargument to the myth of no autocorrelation. Behavioral Assessment, 10, 229-242.
Caramazza, A. (1990). Cognitive neuropsychology and neurolinguistics: Advances in models of cognitive function and impairment. Hillsdale, NJ: Lawrence Erlbaum.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Cook, D. J. (1996). Randomized trials in single subjects: The N of 1 study. Psychopharmacology Bulletin, 32, 363-367.
Edgington, E. S. (1996). Randomized single-subject experimental designs. Behaviour Research and Therapy, 34, 567-574.
Edgington, E. S., & Onghena, P. (2007). Randomization tests (4th ed.). Boca Raton, FL: Chapman & Hall/CRC.
Ferron, J., & Onghena, P. (1996). The power of randomization tests for single-case phase designs. Journal of Experimental Education, 64, 231-239.
Ferron, J., & Scott, H. (2005). Multiple baseline designs. In B. S. Everitt & D. C. Howell (Eds.), Encyclopedia of statistics in behavioral science (Vol. 3, pp. 1306-1309). Chichester, UK: John Wiley & Sons.
Ferron, J., & Sentovich, C. (2002). Statistical power of randomization tests used with multiple-baseline designs. Journal of Experimental Education, 70, 165-178.
Franklin, R. D., Allison, D. B., & Gorman, B. S. (1997). Introduction. In R. D. Franklin, D. B. Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 1-12). Mahwah, NJ: Lawrence Erlbaum.
Gorman, B. S., & Allison, D. B. (1997). Statistical alternatives for single-case designs. In R. D. Franklin, D. B. Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 159-214). Mahwah, NJ: Lawrence Erlbaum.
Ittenbach, R. F., & Lawhead, W. F. (1997). Historical and philosophical foundations of single-case research. In R. D. Franklin, D. B. Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 13-39). Mahwah, NJ: Lawrence Erlbaum.
Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings. New York: Oxford University Press.
Kratochwill, T. R., & Levin, J. R. (1992). Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Lawrence Erlbaum.
Lawson, T. R., & Greer, R. D. (2006). Teaching the function of writing to middle school students with academic delays. Journal of Early and Intensive Behavior Intervention, 3, 151-170.
Littell, R. C., Milliken, G. A., Stroup, W. W., Wolfinger, R. D., & Schabenberger, O. (2006). SAS® system for mixed models (2nd ed.). Cary, NC: SAS Institute Inc.
Matyas, T. A., & Greenwood, K. M. (1990). Visual analysis of single-case time series: Effects of variability, serial dependence, and magnitude of intervention effects. Journal of Applied Behavior Analysis, 23, 341-351.
Onghena, P. (2005). Single-case designs. In B. S. Everitt & D. C. Howell (Eds.), Encyclopedia of statistics in behavioral science (Vol. 3, pp. 1850-1854). Chichester, UK: John Wiley & Sons.
Pillemer, D. B. (1984). Conceptual issues in research synthesis. Journal of Special Education, 18, 27-40.
Posten, H. O. (1978). The robustness of the two-sample t test over the Pearson system. Journal of Statistical Computation and Simulation, 6, 295-311.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). London: Sage.
Salzberg, C. L., Strain, P. S., & Baer, D. M. (1987). Meta-analysis for single-subject research: When does it clarify, when does it obscure? Remedial and Special Education, 8, 43-48.
SAS Institute. (2004). SAS/STAT 9.1 User's guide. Cary, NC: SAS Publishing.
Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage.
SPSS Inc. (2005). Linear mixed modeling in SPSS [Web Page]. URL http://www.spss.com/home_page/wp127.htm [2007, January 17].
Todman, J. B., & Dugard, P. (2001). Single-case and small-n experimental designs: A practical guide to randomization tests. Mahwah, NJ: Lawrence Erlbaum.
Van den Noortgate, W., & Onghena, P. (2003a). Combining single-case experimental studies using hierarchical linear models. School Psychology Quarterly, 18, 325-346.
Van den Noortgate, W., & Onghena, P. (2003b). Hierarchical linear models for the quantitative integration of effect sizes in single-case research. Behavior Research Methods, Instruments, & Computers, 35, 1-10.

Author Contact Information:

Wim Van den Noortgate
Department of Educational Sciences, Katholieke Universiteit Leuven.
Vesaliusstraat 2, B-3000 Leuven, Belgium.
Tel.: +32 16 32 61 92. Fax: +32 16 32 59 34.
E-mail: [email protected]

Patrick Onghena
Department of Educational Sciences, Katholieke Universiteit Leuven.
Vesaliusstraat 2, B-3000 Leuven, Belgium.
Tel.: +32 16 32 59 54. Fax: +32 16 32 59 34.
E-mail: [email protected]

Appendix: SAS and SPSS codes for the models from the example

SAS

The dataset, named Lawson, consists of one row for each score on the dependent variable. The number of rows per case therefore is equal to the number of sessions during which the case was measured. Besides the dependent variable, called Percent, it includes indicator variables for the case and session (called Case and Session), for the second group (Group2), and the teacher feedback and writer immersion treatment (Phase2 and Phase3). For information about managing data sets, we refer to the Help menu, as well as to SAS Institute (2004). The first model is fitted by running the following code:

    PROC MIXED DATA=Lawson COVTEST;
    CLASS Case;
    MODEL Percent = Phase2 Phase3 / SOLUTION;
    RANDOM Intercept Phase3 / SUB=Case;
    RUN;

In the first statement, the MIXED procedure is called. The DATA= option refers to the data set in which the data are stored. The COVTEST option is added to get standard errors and approximate p-values for the variance components. In the second statement, the indicator variable Case is defined as a categorical variable. In the third statement, the fixed part of the model is described: the variable Percent is defined as the dependent variable, and the variables Phase2 and Phase3 as independent variables. The model includes an intercept by default. The SOLUTION option is used to request in the output the estimates, standard errors, t-statistics, and p-values for significance testing for all fixed effects. The RANDOM statement is used to describe the random part of the model. We indicate that the intercept as well as the coefficient of Phase3 can vary randomly across cases. The code is closed by the RUN-command.
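Written out as a two-level model, the first SAS specification corresponds to the sketch below. The symbols are assumptions chosen for illustration and may differ from the notation used in the body of the article: i indexes sessions, j indexes cases, the gammas are the fixed effects reported under SOLUTION, and the u terms are the case-level random effects named in the RANDOM statement. Because no TYPE= option is given in the RANDOM statement, PROC MIXED uses its default variance-components structure, so the two random effects are treated as uncorrelated.

```latex
\begin{align}
\text{Percent}_{ij} &= \beta_{0j} + \beta_{1}\,\text{Phase2}_{ij} + \beta_{2j}\,\text{Phase3}_{ij} + e_{ij},
  & e_{ij} &\sim N(0, \sigma^2_{e}) \\
\beta_{0j} &= \gamma_{00} + u_{0j}, & u_{0j} &\sim N(0, \sigma^2_{u0}) \\
\beta_{2j} &= \gamma_{20} + u_{2j}, & u_{2j} &\sim N(0, \sigma^2_{u2})
\end{align}
```

The coefficient of Phase2 carries no case subscript because Phase2 appears only in the MODEL statement, not in the RANDOM statement.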

For the second analysis, the model is extended by including additional independent variables. This is done by extending the MODEL-statement as follows:

    MODEL Percent = Phase2 Phase3 Group2 Group2*Phase3 / SOLUTION;

Finally, to estimate the first-order autocorrelation, a single statement is added (before the RUN-command), while the rest of the code remains unchanged:

    REPEATED / TYPE=AR(1) SUB=Case;

While in the RANDOM statement, the random part on the second level is described, in the REPEATED statement the same is done for the first level. The option TYPE=AR(1) requests modeling a first-order autocorrelation within cases.
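For reference, a first-order autoregressive structure implies the following covariance between two first-level residuals of the same case. The symbols are assumptions for illustration: i and i' are two sessions of case j, sigma squared is the residual variance, and rho is the autocorrelation that the REPEATED statement asks SAS to estimate.

```latex
\operatorname{Cov}\left(e_{ij},\, e_{i'j}\right) = \sigma^2_{e}\,\rho^{\,|i - i'|}, \qquad |\rho| < 1
```

Residuals of adjacent sessions are therefore correlated rho, residuals two sessions apart rho squared, and so on, while residuals of different cases remain uncorrelated. Since the statement names no repeated effect, the rows of each case should be in session order for this reading to hold.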

SPSS

The codes in SPSS are highly similar. Extensive information is found in the Help menu, as well as in SPSS (2005). For the first model, we run the following code:

    MIXED Percent WITH Phase2 Phase3
      /PRINT = SOLUTION TESTCOV
      /FIXED = Phase2 Phase3
      /RANDOM Intercept Phase3 | SUBJECT(Case).

For the second model, we have:

    MIXED Percent WITH Phase2 Phase3 Group2
      /PRINT = SOLUTION TESTCOV
      /FIXED = Phase2 Phase3 Group2 Group2*Phase3
      /RANDOM Intercept Phase3 | SUBJECT(Case).

To model a possible first-order autocorrelation, we write:

    MIXED Percent WITH Phase2 Phase3 Group2
      /PRINT = SOLUTION TESTCOV
      /FIXED = Phase2 Phase3 Group2 Group2*Phase3
      /RANDOM Intercept Phase3 | SUBJECT(Case)
      /REPEATED | COVTYPE(AR1) SUBJECT(Case).

Weighing the Evidence for Psychotherapy Equivalence: Implications for Research and Practice

Robin Westmacott & John Hunsley
University of Ottawa

In the past two decades, numerous meta-analyses have been published that examine the question of psychotherapy equivalence. Hunsley and Di Giulio (2002) critically reviewed this literature and concluded that there was abundant evidence that the Dodo bird verdict of equivalence across psychotherapies is false. In this article, we summarize and update Hunsley and Di Giulio’s (2002) review of recent meta-analyses and comparative treatment studies relevant to this question. Taken together, the empirical evidence clearly indicates that psychotherapy nonequivalence is the rule, not the exception. We discuss these findings and their implications for psychological research and practice.

Keywords: psychotherapy equivalence, Dodo bird, comparative treatment studies, evidence-based treatments, effectiveness research

Since Rosenzweig (1936) asserted that all psychotherapy produced equivalent outcomes (and quoted the Dodo bird from Alice in Wonderland saying, “Everyone has won, and all must have prizes”), psychotherapy equivalence has been referred to as the Dodo bird verdict, and frequent claims have been made about the general equivalence of all forms of psychotherapy. Proponents of this perspective have argued that psychotherapy, in general, is effective and that there is no compelling evidence to suggest that some treatments are better than others for clinical problems (e.g., Bohart, O’Hara, & Leitner, 1998; Zinbarg, 2000). Accordingly, the various theoretical orientations are merely variations on a single theme and, although their distinctions may be important to clinicians and psychotherapy researchers, they are essentially meaningless with respect to actual treatment outcome. Claiming all psychotherapies are equivalent is like suggesting that, for example, because applied behavioral analysis is useful for treating autistic disorder, any treatment provided for this disorder, be it thought field therapy or play therapy, is likely to be equally effective. Indeed, Luborsky et al. (2003) recently suggested that psychoanalysis, despite a lack of empirical comparisons with other treatments, may plausibly be assumed to be equivalent to other efficacious psychotherapies in light of the typical research finding of psychotherapy equivalence. Given the ubiquity of the claims for psychotherapy equivalence and the limited attention typically given to the actual research purporting to support the claim, there is the real possibility that practitioners and students in mental health fields accept the Dodo bird verdict simply because it appears to be generally and uncritically accepted by others. In the past two decades, numerous meta-analyses have been published that examine this question of psychotherapy equivalence. 
Hunsley and Di Giulio (2002) critically reviewed this literature and concluded that there was abundant evidence that the Dodo bird verdict is false. In this article, we begin by summarizing Hunsley and Di Giulio’s (2002) review and then provide an updated review of relevant meta-analyses and comparative treatment studies published since that review. We consider evidence from (a) treatment outcome studies that compare the treated group with a control group to whom no services are provided (typically, a wait-list control group) and (b) comparative treatment studies that compare at least two active treatments (with a no-treatment control group sometimes, but not always, included). Clearly, comparative treatment studies are most relevant to the Dodo bird verdict as they provide a “head-to-head” comparison of treatments drawing on the same sample of clients randomly assigned to each condition. For the most part, the results we present use the d statistic for estimating the effect size of treatments (i.e., the difference between treatments, or between treatment and no treatment, is expressed in standard deviation units); in some instances, when useful for interpretative purposes, we
also provide information on other types of effect sizes. Finally, we briefly discuss the implications of these findings for psychotherapy research and current efforts to promote evidence-based psychotherapeutic practices.

Before considering the meta-analytic evidence, it is important to note that many authors have raised scientific cautions to consider when interpreting evidence that appears to indicate psychotherapy equivalence (Beutler, 1991; Cuijpers, 1998; Hsu, 2000; Norcross, 1995; Reid, 1997; Shadish & Sweeney, 1991; Stiles, Shapiro, & Elliott, 1986). Treatment fidelity, researcher theoretical allegiance, and measurement quality should all be considered before tentatively accepting that there may be no true difference between treatments in a given study. Sample size is also a critical element. Kazdin and Bass (1989) calculated that, based on the typical differences found between treatments, researchers wishing to compare two or more treatments should plan to include over 70 participants per treatment condition if they wish to have adequate power to detect treatment differences.

Meta-Analytic Evidence Presented by Hunsley and Di Giulio (2002)

Smith, Glass, and Miller (1980) conducted the first meta-analysis of psychotherapy. Based on several hundred treatment outcome and comparative treatment studies, they found strong evidence for significant differences among effects of different types of therapy (Table 5-4, p. 94): Treatment outcome studies indicated that cognitive and cognitive-behavioral treatments had the largest effect sizes (mean d values of 1.31 and 1.21, respectively), followed by behavioral and psychodynamic treatments (0.91 and 0.78), humanistic treatments (0.63), and developmental treatments (including vocational-personal development counseling and “undifferentiated counseling”; 0.42). The authors then analyzed their data based on client diagnoses and again found substantial differences among treatment types (Table 5-5, p. 96).
These, however, are not the results presented by advocates of the Dodo bird verdict; instead, they focus on Smith et al.’s (1980) analyses conducted on therapy “classes,” in which, based on treatment outcome studies, behavioral (mean d = 0.98) and verbal (mean d = 0.85) treatments produced comparable effects. These “classes” were constructed by grouping cognitive-behavioral, behavioral, behavior modification, systematic desensitization, and other behavioral treatments in the behavioral class and grouping psychodynamic, humanistic, and cognitive treatments in the verbal class. The logic of classifying cognitive treatments with psychodynamic and humanistic treatments is highly questionable, yet it is only in these analyses, among the dozens reported by Smith et al., that psychotherapy equivalence was found. In other words, the strongest evidence for the Dodo bird verdict from Smith et al. is based on a very questionable classification strategy! They also conducted analyses on 56 comparative treatment outcome studies involving behavioral and verbal classes of treatment. Even with the questionable classification strategy, behavioral treatments (mean d = 0.96) were significantly superior to the verbal treatments (mean d = 0.77; Table 5-14, p. 108).

Weisz and colleagues conducted a series of meta-analyses focusing specifically on the child and adolescent treatment literature. Weisz, Weiss, Alicke, and Klotz (1987) meta-analyzed treatment outcome studies published between 1958 and 1984 and concluded that there was strong evidence for the superiority of behavioral treatments (including cognitive treatments) over nonbehavioral treatments. Subsequently, Weisz, Weiss, Han, Granger, and Morton (1995) meta-analyzed 150 child and adolescent treatment outcome studies published between 1983 and 1993.
Behavioral treatments (cognitive, cognitive-behavioral, parent training, operant methods, respondent methods, and social skills training) yielded a mean d of 0.54, significantly greater than the mean d of 0.30 for the nonbehavioral treatments (client-centered and insight-oriented therapies). Taking into account important methodological features (such as random assignment, attrition, and therapist experience), Weiss and Weisz (1995) evaluated the relative effectiveness of behavioral (including cognitive) versus nonbehavioral (psychodynamic and

humanistic) treatments in a subset of the studies used by Weisz et al. (1987). This meta-analysis included 105 studies of treatments for anxiety disorders, depression and social skills deficits. Controlling for methodological quality, the mean d values of behavioral and nonbehavioral treatments were 0.86 and 0.38, respectively, with the relative difference even greater in the 10 comparative treatment studies in the sample that directly compared behavioral to nonbehavioral treatments (mean d values of 0.76 and 0.17, respectively). Reid (1997) reviewed findings from 42 focused meta-analyses that examined treatments for specific conditions such as depression, insomnia, smoking cessation, and bulimia. He concluded that 74% showed evidence of differential treatment effects. He noted that behavioral (including cognitive and cognitive-behavioral) treatments showed clear superiority to other treatments for child maladaptation, child abuse, juvenile delinquency, and panic-agoraphobia. On the basis of his review, Reid concluded that there was little evidence to support the Dodo bird verdict. In the most direct test of the Dodo bird verdict to date, Wampold, Mondin, Moody, Stich, Benson, et al. (1997) conducted a meta-analysis of comparative treatment studies published between 1970 and 1995. The authors calculated all effect size values between pairs of treatments and then calculated their d values in two ways. First, they aggregated all the absolute values of the obtained effect sizes, and divided by the number of effect sizes. Second, they calculated a mean d value by randomly assigning a positive or negative sign to each obtained effect size and dividing the aggregate of these values by the number of obtained effect sizes. They reported a mean d of 0.19 for their first estimate (significantly different from zero) and a mean d of 0.0021 for their second (a nonsignificant effect). 
Although emphasizing that their results strongly supported the Dodo bird verdict, Wampold et al. explicitly cautioned that their results were not evidence that all psychotherapies found in professional practice are equally efficacious or as efficacious as those included in their sample. In fact, closer examination of their results actually reveals that their data provided strong evidence for a lack of treatment equivalence. As Crits-Christoph (1997) and Hunsley and Di Giulio (2002) pointed out, the majority of studies included in their analysis compared one type of cognitive-behavioral treatment to another cognitive-behavioral treatment; thus, even if warranted, the conclusion of psychotherapy equivalence could only be confidently applied within the family of cognitive-behavioral treatments (CBT), not to psychotherapy treatments in general. More importantly, Wampold et al. erred greatly in their calculations, as their second method for calculating the mean d value could, by definition, only yield a mean value of zero regardless of the true value (cf. Howard, Krause, Saunders, & Kopta, 1997).

The final meta-analysis reviewed by Hunsley and Di Giulio (2002) was that of Shadish, Matt, Navarro, and Phillips (2000). These researchers meta-analyzed 90 treatment outcome studies of clinically representative psychotherapy, only selecting studies in which clients, treatments, and therapists were representative of typical clinical settings. Shadish et al. found overall evidence of significant treatment effects in the sampled studies (mean d = 0.41). Using a random-effects model to predict treatment effect sizes, treatment orientation (i.e., behavioral vs. nonbehavioral) was found to be a significant predictor. In other words, treatment effect sizes were larger for behavioral than for nonbehavioral treatments as practiced in typical treatment settings with typical clients and therapists.
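The objection to Wampold et al.'s second estimate, discussed above, can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not a reanalysis of their data: the effect sizes are hypothetical values invented for illustration, and the function names are our own. It averages absolute values (the first method) and then computes the exact expected value of the randomly signed mean (the second method) by enumerating every possible assignment of signs.

```python
from itertools import product
from statistics import mean


def mean_absolute_effect(effects):
    """First method: average the absolute values of the effect sizes."""
    return mean(abs(d) for d in effects)


def expected_random_sign_mean(effects):
    """Second method: expected value of the mean after random sign assignment.

    Every +/- pattern is equally likely, so averaging the signed mean over all
    2**k patterns gives the exact expectation of the randomly signed mean.
    """
    magnitudes = [abs(d) for d in effects]
    signed_means = [
        mean(sign * m for sign, m in zip(signs, magnitudes))
        for signs in product((1, -1), repeat=len(magnitudes))
    ]
    return mean(signed_means)


# Hypothetical between-treatment effect sizes, invented for illustration only.
small_differences = [0.05, -0.10, 0.15, 0.20, -0.25, 0.30]
large_differences = [10 * d for d in small_differences]

for label, effects in (("small", small_differences), ("large", large_differences)):
    print(label,
          round(mean_absolute_effect(effects), 3),
          round(expected_random_sign_mean(effects), 3))
```

Under these assumptions the randomly signed mean has an expected value of zero whether the underlying differences are small or ten times larger, whereas the mean of absolute values grows with the size of the differences. A near-zero result from the second method therefore carries no information about treatment equivalence.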
Updating the Review: Article Search and Review Criteria

Our search for meta-analyses comparing psychotherapy treatments was conducted via a computer-based literature search of PsycInfo and Medline databases. Our search labels, specifying years 2002–2007, included “psychotherapy and equivalence,” “dodo bird,” “psychotherapy and meta-analysis,” “empirically supported treatments,” “allegiance,” “psychotherapy efficacy,” “differential treatment,” “common factors,” and “comparative treatment.” We then searched among the meta-analyses
generated by this search strategy and selected only those studies in which the effects of different psychotherapies were compared via statistical analysis, not simply visual inspection. We found only one comprehensive meta-analysis published since 2002 that included a range of treatments for a range of client conditions (Luborsky et al., 2002) and six other more focused meta-analyses that examined treatment effects for short-term psychodynamic psychotherapy versus other treatments for various patient conditions (Leichsenring, Rabung, & Leibing, 2004), sex offenders (Lösel & Schmucker, 2005), CBT for panic disorder with and without agoraphobia (Mitte, 2005), CBT and self-regulatory treatments for chronic low back pain (Hoffman, Papas, Chatkoff, & Kerns, 2007), and child and adolescent disorders (mostly externalizing problems and depression; Weisz, Jensen-Doss, & Hawley, 2006; Weisz, Valeri, & McCarty, 2006).

Luborsky and colleagues’ comprehensive meta-analysis. Luborsky et al. (2002) examined 17 meta-analyses of comparative treatment studies and reported a mean d value of 0.20; they described this value as nonsignificant, although the precise nature of the statistical analysis used to reach this conclusion is not clear. Although not discussed in their review, there were a number of primary research studies that were used in more than one of the meta-analyses they examined. Accordingly, the effect sizes reported in the 17 meta-analyses are not independent of each other. The precise impact these dependencies had on the accuracy and generalizability of their findings is hard to estimate, especially as only three of the meta-analyses contained over 10 studies, but it does raise questions about the accuracy of the .20 value they reported.
When commenting on their findings in a subsequent article, they stated “Our impression is that the occasional differences are likely to be attributable to chance factors, after all results are taken together” (Luborsky et al., 2003, p. 458). However, in our view, these differences are important given that the overall effect size estimate of 0.20 was derived from comparing one efficacious treatment to another efficacious treatment. To put this result in context, it is informative to convert d to the metric of number needed to treat (NNT) commonly used in medicine. NNT provides information on the number of patients one would need to treat with the target treatment to have one more successful patient outcome than would be possible with the comparison treatment. Converting an effect size d value of 0.20 to NNT yields a value of approximately 9 (8.892; see Kraemer & Kupfer, 2006). Thus, the relative benefits of the more efficacious treatment become evident before even 10 patients are treated. In this light, a d value of 0.20 may well be important in a clinical context.

Were obtained differences between treatments due primarily to chance factors? Upon further examination of Luborsky et al.’s (2002) findings, it seems unlikely, as it is possible to discern some distinctive patterns in their results. Sixteen meta-analytic estimates involved comparisons among psychotherapies, with one involving a comparison between psychotherapy and pharmacotherapy. Of the 16 effect sizes, 5 involved comparisons within the family of CBT approaches (e.g., cognitive vs. behavioral), with only 1 being statistically significant. There were three significant comparisons between variants of CBT and a group of treatments labeled as “general verbal” treatments, with all three favoring CBT. Four meta-analytic results compared the CBT family of treatments to the psychodynamic family of treatments, with only one significant result (favoring CBT).
Finally, there were 4 comparisons between the psychodynamic family of treatments and other treatments (described as nonspecific, nonpsychiatric, psychiatric, and other, respectively), and none of these comparisons was significant. Thus, 4 of 5 significant results involved comparisons between cognitive-behavioral treatments and treatments based on other theoretical orientations. It has been suggested that research allegiance to a particular theoretical orientation may result in delivering the preferred treatment in a more sophisticated and informed manner (Luborsky et al., 1993; Luborsky et al., 1999). Luborsky and colleagues (2002) attempted to control for such effects by averaging

the score of three measures of researcher allegiance (ratings of the reprint, ratings by colleagues who know the researcher’s work well, and self-ratings of allegiance by the researchers themselves) and calculating the correlation of this score and the outcome of the treatments compared. The result was an r of .85 for a sample of 29 comparative treatment studies; when applied to their meta-analytic findings, this resulted in a corrected mean d value of 0.14 (in contrast to the original 0.20). There are significant problems when dealing with research allegiance statistically, as this is likely to overcorrect for any researcher allegiance effect that may have had a biasing influence on study results (Hunsley & Di Giulio, 2002).

Focused meta-analyses. Leichsenring et al. (2004) conducted a meta-analysis of 17 randomized studies of short-term psychodynamic psychotherapy (STPP) across a range of patient conditions (social phobia, PTSD, depression, cocaine and opiate dependence, personality disorders, chronic functional dyspepsia, and anorexia and bulimia nervosa). Some of these studies were also included in the meta-analyses used by Luborsky et al. (2002). They included only randomized controlled trials which compared an STPP to another active treatment and required that treatment manuals were used and therapists were experienced in STPP or specifically trained in STPP for the study (N=15 studies). STPP yielded a mean d of 1.39 after therapy (p<.001) and 1.57 at follow-up (p<.001) on target problems. The authors compared the efficacy of STPP with other forms of psychotherapy (mostly CBT, but also group interpersonal psychotherapy, brief supportive therapy, routine primary care, drug counselling, and brief adaptive psychotherapy); only two of the included studies had group sizes of more than 70 participants per treatment group. Leichsenring and colleagues separated treatment outcomes into target problems, general psychiatric symptoms, and social functioning.
Within each study, they calculated the effect size difference between the active treatment groups for pre-post and post-follow-up for each of these groups of outcomes. They then averaged these differences for each outcome group and found mean between-group d’s ranging from -0.22 to 0.23, none of which was statistically significant when analyses were conducted separately for (a) pretherapy-posttherapy effect size differences and then (b) posttherapy-follow-up effect size differences in target symptoms, general psychiatric symptoms, and social functioning. These results are very promising in terms of the efficacy of STPP. However, given that the between-group comparisons were calculated with only 7 to 15 studies and with small sample sizes within each treatment group in all but two of the studies, the power to detect small differences that may exist between treated groups was very low. In the most comprehensive meta-analysis of sex-offender treatment to date, Lösel and Schmucker (2005) examined controlled treatment outcome evaluations published prior to 2004. Outcome was defined as recidivism; the authors followed a broad definition of recidivism, ranging from incarceration to lapse behavior. Sixty-nine studies containing 80 independent comparisons between treated and untreated offenders were analyzed. Although physical treatments (including physical castration and hormonal treatment) had much higher effects than psychosocial treatments (odds ratio = 7.37 vs. 1.32, respectively, largely due to the extreme effects of physical castration), only CBT (OR = 1.45) and behavioral treatments (OR = 2.19) had a significant impact on sexual recidivism. With odds ratios close to 1, the other approaches (including therapeutic community, insight-oriented, other and psychosocial treatments) did not significantly influence recidivism. 
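The power concerns raised in this review (Kazdin and Bass's recommendation of more than 70 participants per condition, and the small groups in most of the STPP comparisons) can be made concrete with a rough calculation. The Python sketch below uses the normal approximation to a two-sided, two-group comparison of means with equal group sizes. It is not Kazdin and Bass's own calculation, and the d values and group sizes fed to it are hypothetical assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate two-sided power for comparing two group means.

    Normal approximation to the two-sample t test with equal group sizes;
    `d` is the assumed true standardized mean difference between treatments.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    noncentrality = d * sqrt(n_per_group / 2)
    return (std_normal.cdf(noncentrality - z_crit)
            + std_normal.cdf(-noncentrality - z_crit))


# Hypothetical scenarios: these d values are assumptions, not estimates from the review.
for d, n in ((0.5, 70), (0.5, 20), (0.2, 70), (0.2, 20)):
    print(f"d = {d}, n per group = {n}: power ~ {approx_power(d, n):.2f}")
```

Under this approximation, a medium-sized true difference is detected most of the time with 70 participants per group, while a small true difference between two active treatments is usually missed even at that sample size, and almost always missed with 20 per group. A nonsignificant comparison from small groups is therefore weak evidence of equivalence.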
Mitte (2005) conducted a meta-analysis of (randomized and nonrandomized) behavioral, CBT, and pharmacological treatments for panic disorder with and without agoraphobia. Mitte computed the average of between-treatment (behavioral versus CBT) effect size differences across studies, and found no significant differences for anxiety symptoms (mean d=0.09, effect sizes ranged from -0.07 to 0.24; n=26 studies), but found significant differences in favor of CBT for associated depressive symptoms (mean d=0.18, effect sizes ranged from 0.01 to 0.35; n=22 studies). As suggested previously, it is hardly surprising that significant differences are not always found (especially within a small sample of studies) when comparing various forms of behavioral and cognitive-behavioral treatments.
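Two of the translations of d used in this review, the NNT of approximately 9 for d = 0.20 and the statement that a d of 0.30 leaves the average treated youth better off than 62% of the comparison group, can be sketched in Python as follows. The sketch assumes normally distributed outcomes with equal variances in both groups. The NNT formula is the one we understand Kraemer and Kupfer (2006) to describe, and the function names are our own.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def nnt_from_d(d):
    """Number needed to treat implied by a standardized mean difference d > 0."""
    # Probability that a randomly chosen treated patient has a better outcome
    # than a randomly chosen comparison patient.
    auc = _STD_NORMAL.cdf(d / sqrt(2))
    success_rate_difference = 2 * auc - 1
    return 1 / success_rate_difference


def percent_of_comparison_group_below_treated_mean(d):
    """Percentage of the comparison group scoring below the average treated person."""
    return 100 * _STD_NORMAL.cdf(d)


print(round(nnt_from_d(0.20), 3))
print(round(percent_of_comparison_group_below_treated_mean(0.30)))
```

With these assumptions the first call reproduces the NNT of 8.892 reported for d = 0.20, and the second reproduces the 62% figure reported for d = 0.30.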

Hoffman et al. (2007) conducted a meta-analysis of psychological interventions for chronic low back pain; across four studies (averaging within-study between-treatment effect size differences), CBT was equivalent to self-regulatory treatments (SRT; including hypnosis and behavioral treatments such as biofeedback and relaxation training) at posttest (mean d = -0.13, ns) for pain intensity, and marginally less effective than SRT across three studies for associated depression (mean d = -0.41, p<.10). Given the low number of studies in these analyses, it is difficult to conclude what the true treatment difference might be.

Weisz, Jensen-Doss, and Hawley (2006) meta-analyzed 32 studies from the child and adolescent treatment literature that directly compared evidence-based treatments (i.e., treatments included in at least one published list of treatments showing beneficial effects) to usual care (psychotherapy, counselling, or case management provided as part of regular services). Client conditions were largely externalizing problems (conduct problems and delinquency were the focus of all but two of the studies). Averaging across the 32 studies, the authors found the mean d for evidence-based treatments (EBT) versus usual care (UC) was 0.30, indicating that the average youth treated with an EBT was better off after treatment than 62% of youths who received UC. Follow-up data from 16 of the studies indicated that the mean difference at follow-up in effect size between the EBT and UC groups was a significant d = 0.38. Notably, the superiority of EBTs over UC was not due to the use of homework to facilitate treatment generalization, efforts to ensure treatment integrity in the EBTs, research therapists delivering EBTs, theoretical allegiance of the researchers, evidence from voluntary treatment seekers, or differences in treatment setting. EBT superiority was not reduced by high levels of youth severity, comorbidity, or by inclusion of minority youths as study participants.

Weisz, Valeri, and McCarty (2006) meta-analyzed 35 randomized studies of psychotherapy for child and adolescent depressive symptomatology (elevated levels of depressive symptoms or formal diagnosis of major depressive disorder or dysthymic disorder). When data from multiple informants (i.e., youth, parents, and teachers) were combined, they found an overall mean d of 0.34. This was significantly less than the mean d of 0.99 found in previous meta-analyses for the treatment of depression and less than the mean effect size typically found for the treatment of youth disorders in general. One element of their analysis involved determining whether treatments that emphasized cognitive change were more effective than treatments that did not. They computed mean d (pre-post) separately for 31 treatments that involved an emphasis on cognition (i.e., cognitive therapy and CBT; mean d = 0.35, p<.01) and 13 treatments that did not emphasize cognition (primarily behavioral treatments, but also included attachment-based family training and interpersonal psychotherapy; mean d = 0.47, p<.01). None of the included studies were comparative treatment studies. The difference between treatments with a cognitive emphasis and those without was not significant.

As is evident in a number of the meta-analyses just reviewed, researchers sometimes conduct analyses to test for treatment equivalence using a relatively small number of studies, many of which have small sample sizes. It is important for readers of meta-analyses to remember that meta-analytic methods are not a simple statistical remedy for improving upon underpowered studies often found in the clinical treatment literature. Combining data from multiple treatment outcome studies may well provide a better estimate of the “true” impact of the treatment than is possible with one single study. However, comparing different treatments on the basis of such meta-analytic estimates can be problematic.
If only a handful of underpowered studies are used to estimate a treatment effect, the accuracy of the estimate may be poor and the meta-analytic comparison may, itself, be underpowered to detect differences between the compared treatments. Similar problems will occur if the treatment comparisons are based on a small number of underpowered comparative treatment studies. Like the distribution of most psychological data, meta-analytic estimates are distributed around the “true” population mean value of the treatment effect (see Schmidt, 1992). Estimates derived from a small set of studies, involving relatively small sample sizes, are likely to be found across the distribution,

not just clustered near the population mean. As with all data, sample estimates are likely to more accurately reflect the population mean if the sample is large and generally representative. This elementary statistical fact is as true for secondary data analysis (i.e., meta-analytic data) as it is for data obtained for primary studies.

Evidence from Recent Comparative Treatment Studies

Given the limited number of recent meta-analyses examining the psychotherapy equivalence question, we decided to examine the outcome of recent comparative treatment studies. Accordingly, we examined the contents of several journals that typically publish such studies (American Journal of Psychiatry, Archives of General Psychiatry, Journal of the American Medical Association, Journal of Clinical Psychology, and Journal of Consulting and Clinical Psychology) for 2006, the most recent complete publication year. Our literature search returned 12 randomized trials (two effectiveness trials and 10 efficacy trials) in which (a) two active treatments were compared or (b) a treatment was compared to whatever treatments were usually offered in the clinical setting (i.e., treatment as usual). In all instances, to be included in our presentation, researchers must have conducted statistical analyses directly comparing the outcomes of patients in the differing treatment conditions. Table 1 presents a summary of the studies. Every study we found included a variant of CBT, broadly defined, as one of the tested treatments. Inspection of Table 1 reveals that many of the treatments resulted in substantial patient improvement. However, in nine of the studies, one treatment was significantly more efficacious than the other(s). This finding was obtained in both adequately powered and underpowered studies.
In three studies, no treatment differences were reported: Christensen, Atkins, Yi, Baucom, and George (2006) compared two forms of behavioral couple therapy (traditional versus integrative), McBride, Atkinson, Quilty, and Bagby (2006) compared CBT and interpersonal psychotherapy for depression, and Strauman et al. (2006) compared cognitive therapy and self-system therapy (which contains some aspects of CBT) for the treatment of depression. Of these three studies, two were underpowered to detect treatment differences, with only the Christensen et al. (2006) study having a sample size close to the recommended 70 participants per treatment condition.

Table 1. Randomized Trials in 2006 Comparing Two or More Psychotherapy Treatments.

| Study | Treatment 1 | Improvement (Treatment 1) | Treatment 2 | Improvement (Treatment 2) | Outcome |
| --- | --- | --- | --- | --- | --- |
| Addis et al. (2006) | Panic Control Treatment (CBT) for Panic Disorder (n=38, at least 8 sessions) | An average of 42.9% of participants achieved clinically significant change on a range of outcome measures (completer analysis) | Treatment as Usual (provided by experienced psychotherapists, ranging from supportive psychotherapy to interpersonal to CBT techniques; n=42, at least 8 sessions) | An average of 18.8% of participants achieved clinically significant change on a range of outcome measures (completer analysis) | 1>2 |
| Bellack et al. (2006) | Behavioral Treatment for Substance Abuse in Severe and Persistent Mental Illness (BTSAS; group format, 2X weekly for 6 months); n=62, completer analysis | 44.3% had multiple 4-week blocks of clean urine tests | Supportive Treatment for Addiction Recovery (STAR; group format, 2X weekly for 6 months); n=49, completer analysis | 10.2% had multiple 4-week blocks of clean urine tests | 1>2 |
| Carpentier et al. (2006) | Group Cognitive Behavioral Therapy (manualized) for 5-12 year olds with sexual behavior problems (n=64, 12 60-minute sessions) | 10-year follow-up data: 2% of children had future sex offenses | Group Play Therapy (GPT; manualized) for 5-12 year olds with sexual behavior problems (n=71, 12 60-minute sessions) | 10-year follow-up data: 10% of children had future sex offenses | 1>2 |
| Christensen et al. (2006) | Integrative Behavioral Couple Therapy (IBCT) for distressed couples, n=66 couples | 69% showed clinically significant improvement on the Dyadic Adjustment Scale (DAS) at 2-year follow-up | Traditional Behavioral Couple Therapy (TBCT) for distressed couples, n=68 couples | 60% of couples showed clinically significant improvement on the DAS at 2-year follow-up | 1=2 |
| Clark et al. (2006) | Cognitive Therapy for Social Phobia (n=21, up to 14 weekly sessions) | 84% no longer met criteria for Social Phobia (intent to treat analysis) | Exposure and Applied Relaxation (behavioral treatments) for Social Phobia (n=21, up to 14 weekly sessions) | 42% no longer met criteria for Social Phobia (intent to treat analysis) | 1>2 |
| Dimidjian et al. (2006) | Behavioral Activation for major depression (n=43, 36 completed; max of 24 50-minute sessions over 16 weeks) | High-severity subgroup: 76% of participants met criteria for response or remission on the BDI | Cognitive Therapy for major depression (n=45, 39 completed; max of 24 50-minute sessions over 16 weeks) | High-severity subgroup: 48% of participants met criteria for response or remission on the BDI | 1>2 |
| Giesen-Bloo et al. (2006) | Schema-focused therapy for borderline PD (n=44, 3 years, 2X weekly) | 65.9% met criteria for reliable change, 45.5% met criteria for recovery | Transference-focused therapy for borderline PD (n=42, 3 years, 2X weekly) | 42.9% met criteria for reliable change, 23.8% met criteria for recovery | 1>2 |
| Gosselin et al. (2006) | CBT for benzodiazepine (BZD) discontinuation among GAD patients (n=28 completers; 12 90-minute sessions) | 75% ceased BZD intake, 35.5% still met criteria for GAD at posttest and 32.3% at 3-month follow-up | Nonspecific therapy (Borkovec & Costello, 1993; characterized by empathetic communication, active listening/reflecting, supportive statements); n=28 completers; 12 90-minute sessions | 37% ceased BZD intake, 80% still met criteria for GAD at posttest and 60% met criteria at 3-month follow-up | 1>2 |
| Linehan et al. (2006) | Dialectical Behavior Therapy for suicidal behaviours and Borderline Personality Disorder (n=52, 1 year) | 23.1% made a suicide attempt; DBT patients required less hospitalization (p<.01) | Community treatment by experts; all therapists identified as either “eclectic but nonbehavioural” or “psychodynamic” (n=49, 1 year, minimum one session weekly) | 46.0% made a suicide attempt | 1>2 |
| McBride et al. (2006) | CBT (manual: Greenberger & Padesky, 1995) for major depressive disorder (n=28, M=16.56 sessions) | CBT equally as effective as IPT | IPT (manual: Weissman, Markowitz & Klerman, 2000) for major depressive disorder (n=27, M=17.15 sessions) | IPT equally as effective as CBT | 1=2 |
| Strauman et al. (2006) | Cognitive Therapy for depressive symptomatology (n=18 completers, M=19.6 sessions) | 50% met criteria for clinically significant change (decrease greater than 50% on both Hamilton Rating Scale for Depression and Beck Depression Inventory). 33.3% recovered (score of 6 or less on Beck Depression Inventory) | Self-System Therapy for depressive symptomatology (SST; based on regulatory theory, Vieth et al., 2003) (n=21 completers, M=21.7 sessions) | 57.2% met criteria for clinically significant change. 47.6% recovered | 1=2 |
| Tang & Harvey (2006) | Behavioral Experiment (with actigraph and sleep diary) for insomnia (n=24, 3 sessions) | As a group, showed significant reductions in sleep impairment on the Insomnia Severity Index (ISI), d = 0.87 | Verbal feedback (with actigraph and sleep diary) for insomnia (n=24, 3 sessions) | As a group, did not show significant reductions in sleep impairment on the ISI, d = 0.27 | 1>2 |

Note: 1>2 indicates that treatment 1 was significantly more efficacious than treatment 2, and vice versa. 1=2 indicates that the treatments were equally efficacious.
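The power concern raised above, that only samples near 70 participants per treatment condition are adequately powered, can be made concrete with a rough calculation. The sketch below is an illustration, not part of the original analysis: it uses a normal approximation to a two-sided, two-group comparison of means at alpha = .05. The function name `approx_power` and the effect sizes (d = 0.2, 0.5, and 0.8, Cohen's conventional small, medium, and large values) are assumptions chosen for illustration; the group sizes are in the range reported in Table 1.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation to the t-test: the test statistic is treated as
    normal with mean d * sqrt(n / 2) and unit variance (equal group sizes).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    shift = d * sqrt(n_per_group / 2)      # expected value of the statistic
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Illustrative group sizes and effect sizes (assumptions, see lead-in).
for n in (21, 45, 70):
    for d in (0.2, 0.5, 0.8):
        print(f"n per group = {n:2d}, d = {d:.1f}: power ~ {approx_power(d, n):.2f}")
```

Under these assumptions, 70 participants per condition gives power of roughly .84 for a medium effect (d = 0.5) but only about .22 for a small effect (d = 0.2), which is why nonsignificant differences from small trials are hard to interpret as equivalence.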

One study merits particular attention. In comparing dialectical behavior therapy (DBT) to treatment as usual, Linehan et al. (2006) attended to several methodological suggestions made by treatment researchers to ensure fairness in comparative treatment studies. Specifically, in order to maximize internal validity, they controlled for availability of treatment, assistance finding and getting to a first appointment with a therapist, hours of individual psychotherapy offered, therapist sex, therapist training, therapist clinical experience, therapist expertise (with the alternative treatment group therapists having more expertise), availability of group clinical consultation, allegiance to treatment approach, institutional prestige associated with treatment, and general factors associated with receiving psychotherapy. Therapists delivering the alternative treatment (community treatment by experts; CTBE) were nominated by community mental health leaders as experts at treating difficult clients. The content and dosage of therapy were not prescribed by the researchers (i.e., experts could treat clients as they saw fit within the constraint of seeing clients at least once per week), the study paid for CTBE at the same rate as for DBT, and no participants were dropped because of failure to pay. Even when controlling for these factors, which have been shown in previous research to be salient to treatment outcome across a variety of treatment modalities, the DBT group was half as likely as the comparison group to attempt suicide during the treatment year, and used crisis services significantly less (1% of DBT patients went to the emergency department at least once for any type of psychiatric emergency, versus 57.8% of CTBE patients).

Implications for Psychotherapy Research

Taken together, the empirical evidence clearly indicates that, when statistical comparisons have adequate power to detect differences, psychotherapy nonequivalence is the rule, not the exception.
Across age groups and patient conditions, most researchers have found that some treatments are superior to others. That being said, it also appears that searching for differences among variants of CBT may not always yield statistically significant findings. From our perspective, there is little to be gained from more research comparing one treatment to another: the Dodo bird verdict is generally not supported in well-designed and adequately powered studies. The only circumstance in which comparative treatment studies can be useful is when a new and promising treatment is compared to a treatment of established efficacy. It could be argued that, if a new treatment is to be tested for a condition for which there is already extensive replicated evidence of efficacy for an established treatment, it would not be ethical to compare the new treatment to a no-treatment control group. Instead of conducting a treatment outcome study of that kind, it may be most appropriate to contrast the new treatment with an established treatment in a comparative treatment design, rather than withhold from patients access to a treatment known to be efficacious.

Nevertheless, many comparative treatment studies not involving new treatments will undoubtedly continue to be conducted. Based on available evidence, and assuming they have sufficient power to detect group differences, most such studies will continue to find that treatments are not equivalent in their clinical effects. Knowing full well that such studies are to be conducted, we join with a growing number of researchers in suggesting that these studies should be designed to also provide information on both mechanisms and mediators of efficacious treatment (e.g., Jensen, Weersing, Hoagwood, & Goldman, 2005; Kazdin, in press). We need to know much more about how and why treatments work or fail to work, not just that one treatment is better than another. This type of information is especially important because, as shown repeatedly in the meta-analytic literature, treatments that fail to demonstrate their superiority in comparative trials nonetheless demonstrate efficacy with respect to some conditions for some patients. Do all therapies exert their influence through the same mechanisms? If some therapies work via different mechanisms, is it possible to develop a treatment that optimally combines these differing mechanisms?
These are the types of questions that need to be answered in order to truly advance our knowledge about the effects of psychological treatments.

Implications for Evidence-Based Psychological Practice

For some clinical conditions, the inescapable conclusion based on many hundreds of treatment studies is that some specific forms of psychological treatment should be viewed as first line options for clinicians. An increasing number of practice guidelines are now available that encourage attention to such findings. These include guidelines available from the Agency for Healthcare Research and Quality (http://www.ahrq.gov/), the National Institute for Health and Clinical Excellence (http://www.nice.org.uk/), the American Psychiatric Association (http://www.psych.org/psych_pract/treatg/pg/prac_guide.cfm), and the American Academy of Child and Adolescent Psychiatry (http://www.aacap.org/page.ww?section=Practice+Parameters&name=Practice+Parameters). It is also important to recognize that there are some conditions for which there may be multiple treatment options that work relatively well, including adult depression and couple conflict (see Hunsley & Lee, 2006).

Unfortunately, the emphasis within the field on establishing psychotherapy equivalence or treatment superiority has resulted in a rather substantial blind spot for many psychotherapy researchers and, possibly, clinicians. Some treatments are better than others but, as stated above, that does not mean that the less efficacious treatments are worthless. It is very important to know which treatment is likely to be most beneficial to a client, but it is also important to know that, if that treatment fails to work for a specific client, there is another viable option to consider, even if this alternative treatment has been found to be somewhat less efficacious in clinical trials.
The movement to promote EBTs in clinical practice is precisely about encouraging the use of all treatments that have been shown to work in sound empirical investigations (e.g., Hunsley, 2007, in press).

Consider what is known about the impact of psychotherapy as routinely delivered in clinical settings. Hansen, Lambert, and Forman (2002) analyzed data from over 6,000 adult patients seen in a range of clinical settings (e.g., employee assistance programs, university counseling centers, community mental health clinics, and health maintenance organizations). In this large data set only 35% of clients met criteria for improvement or recovery. Very similar outcome results (29% of patients met criteria for improvement) were recently reported by Wampold and Brown (2005) in their sample of over 6,100 adult patients who received therapy services through a managed care organization. In a meta-analysis of 2,500 clients in “real world” clinical practice, Lambert et al. (2003) found that fewer than a quarter of patients usually make substantial gains in treatment. Furthermore, a meta-analysis of studies of usual clinical care for children and adolescents indicated that obtained effect sizes averaged about zero (mean d = 0.01; Weisz, 2004; Weisz, Donenberg, Han, & Weiss, 1995). In contrast to these findings, across 28 studies of EBTs, involving over 2,100 adult patients, Hansen et al. (2002) found that 57% of patients met criteria for recovery by the end of treatment, and fully two-thirds met criteria for improvement or recovery.

Is there evidence that EBTs can work in real world clinical settings? Hunsley and Lee (2007) reviewed 35 treatment effectiveness studies for adult (N=21) and child/adolescent disorders (N=14). They included only studies that were designed to test an efficacious treatment (i.e., one that had been previously tested in at least one efficacy study) in a routine clinical setting. They reported that the treatments provided in these effectiveness studies typically obtained outcome results comparable to those found in meta-analytic summaries of the efficacy studies on the same treatments. These findings suggest that treatments with established efficacy can be transported to clinical settings without any substantial loss of effectiveness. When combined with the growing evidence base showing that EBTs are superior to usual clinical services (Addis et al., 2004; Linehan et al., 2006; Mufson et al., 2004; Weisz et al., 2006), the need for dissemination and utilization of all, not just the best, EBTs is obvious.

Conclusion

Based on decades of research, it is clear that psychotherapies are decidedly not all equivalent in their clinical impact.
Even among efficacious treatments, the mean difference between treatments is frequently estimated to be approximately d = .20 (Luborsky et al., 2003; Wampold et al., 1997). If d = .20 is the best estimate of differences between efficacious treatments, and assuming a normal distribution of individual study results, it is entirely expected that there will be some instances of treatment equivalence in the literature. Whether these instances are informative or meaningful is, however, a separate issue. Based on evidence to date, a small number of these instances of treatment equivalence may be very interesting and clinically useful (e.g., finding that two very different types of treatment yield comparable results), some are only relatively informative (e.g., finding that different forms of CBT yield comparable results), and some, frankly, are misleading and irrelevant (e.g., finding that two treatments yield comparable results in studies without adequate power to detect group differences).

In this era of enhanced professional accountability and evidence-based health care, it is unlikely that evaluators, including policymakers, third party payers, and prospective clients, will be as benign and generous as the Dodo bird was in declaring all therapies to be “winners” (Winter, 2006). However, in promoting the fact that some treatments are better than others, we must not throw the proverbial “baby” out with the “bathwater.” If, despite persistent attempts by a clinician skilled in the provision of the first line treatment, insufficient progress is made in therapy, the responsible step is to consider alternative treatment options that have some supporting empirical evidence. Turning to second and third line treatments is routinely done in psychiatry and in other areas of medicine, and there is no reason that psychotherapy patients should expect any less attention from clinicians to the full range of available evidence-based psychotherapy options.

References

Addis, M. E., Hatgis, C., Krasnow, A. D., Jacob, K., Bourne, L., & Mansfield, A. (2004). Effectiveness of cognitive-behavioral treatment for panic disorder versus treatment as usual in a managed care setting. Journal of Consulting and Clinical Psychology, 72, 625-635.

Bellack, A. S., Bennett, M. E., Gearon, J. S., Brown, C. H., & Yang, Y. (2006). A randomized clinical trial of a new behavioral treatment for drug abuse in people with severe and persistent mental illness. Archives of General Psychiatry, 63, 426-432.

Beutler, L. E. (1991). Have all won and must all have prizes? Revisiting Luborsky et al.’s verdict. Journal of Consulting and Clinical Psychology, 59, 226-232.

Bohart, A. C., O'Hara, M., & Leitner, L. M. (1998). Empirically violated treatments: Disenfranchisement of humanistic and other psychotherapies. Psychotherapy Research, 8, 141-157.

Borkovec, T. D., & Costello, E. (1993). Efficacy of applied relaxation and cognitive-behavioral therapy in the treatment of generalized anxiety disorder. Journal of Consulting and Clinical Psychology, 61, 611-619.

Bradley, R., Greene, J., Russ, E., Dutra, L., & Westen, D. (2005). A multidimensional meta-analysis of psychotherapy for PTSD. American Journal of Psychiatry, 162, 214-227.

Carpentier, M. Y., Silovsky, J. F., & Chaffin, M. (2006). Randomized trial of treatment for children with sexual behavior problems: Ten-year follow-up. Journal of Consulting and Clinical Psychology, 74, 482-488.

Christensen, A., Atkins, D. C., Yi, J., Baucom, D. H., & George, W. H. (2006). Couple and individual adjustment for 2 years following a randomized clinical trial comparing traditional versus integrative behavioral couple therapy. Journal of Consulting and Clinical Psychology, 74, 1180-1191.

Clark, D. M., Ehlers, A., Hackmann, A., McManus, F., Fennell, M., Grey, N., et al. (2006). Cognitive therapy versus exposure and applied relaxation in social phobia: A randomized controlled trial. Journal of Consulting and Clinical Psychology, 74, 568-578.

Crits-Christoph, P. (1997). Limitations of the Dodo bird verdict and the role of clinical trials in psychotherapy research: Comment on Wampold et al. (1997). Psychological Bulletin, 122, 216-220.

Cuijpers, P. (1998). Minimizing interventions in the treatment and prevention of depression: Taking the consequences of the Dodo bird verdict. Journal of Mental Health, 7, 335-365.

Dimidjian, S., Hollon, S. D., Dobson, K. S., Schmaling, K. B., Kohlenberg, R. J., Addis, M. E., et al. (2006). Randomized trial of behavioral activation, cognitive therapy, and antidepressant medication in the acute treatment of adults with major depression. Journal of Consulting and Clinical Psychology, 74, 658-670.

Giesen-Bloo, J., Van Dyck, R., Spinhoven, P., Van Tilburg, W., Dirksen, C., Van Asselt, T., et al. (2006). Outpatient psychotherapy for borderline personality disorder: Randomized trial of schema-focused therapy versus transference-focused psychotherapy. Archives of General Psychiatry, 63, 649-658.

Gosselin, P., Ladouceur, R., Morin, C. M., Dugas, M. J., & Baillargeon, L. (2006). Benzodiazepine discontinuation among adults with GAD: A randomized trial of cognitive-behavioral therapy. Journal of Consulting and Clinical Psychology, 74, 908-919.

Greenberger, D., & Padesky, C. A. (1995). Mind over mood: A cognitive therapy treatment manual for clients. New York: Guilford Press.

Hansen, N. B., Lambert, M. J., & Forman, E. M. (2002). The psychotherapy dose-response effect and its implications for treatment delivery services. Clinical Psychology: Science and Practice, 9, 329-343.

Hoffman, B. M., Papas, R. K., Chatkoff, D. K., & Kerns, R. D. (2007). Meta-analysis of psychological interventions for chronic low back pain. Health Psychology, 26, 1-9.

Howard, K. I., Krause, M. S., Saunders, S. M., & Kopta, S. M. (1997). Trials and tribulations in the meta-analysis of treatment differences: Comment on Wampold et al. (1997). Psychological Bulletin, 122, 221-225.

Hsu, L. M. (2000). Effects of directionality of significance tests on the bias of accessible effect sizes. Psychological Methods, 5, 333-342.

Hunsley, J. (2007). Training psychologists for evidence-based practice. Canadian Psychology, 48, 32-42.

Hunsley, J. (in press). Addressing key challenges in evidence-based practice in psychology. Professional Psychology: Research and Practice.

Hunsley, J., & Di Giulio, G. (2002). Dodo bird, phoenix, or urban legend? The question of psychotherapy equivalence. The Scientific Review of Mental Health Practice, 1, 11-22.

Hunsley, J., & Lee, C. M. (2006). Introduction to clinical psychology: An evidence-based approach. Toronto: John Wiley & Sons.

Hunsley, J., & Lee, C. M. (2007). Research-informed benchmarks for psychological treatments: Efficacy studies, effectiveness studies, and beyond. Professional Psychology: Research and Practice, 38, 21-33.

Jensen, P. S., Weersing, R., Hoagwood, K. E., & Goldman, E. (2005). What is the evidence for evidence-based treatments? A hard look at our soft underbelly. Mental Health Services Research, 7, 53-74.

Kazdin, A. E. (in press). Mediators and mechanisms of change in psychotherapy research. Annual Review of Clinical Psychology.

Kazdin, A. E., & Bass, D. (1989). Power to detect differences between alternative treatments in comparative psychotherapy outcome research. Journal of Consulting and Clinical Psychology, 57, 138-147.

Kraemer, H. C., & Kupfer, D. J. (2006). Size of treatment effects and their importance to clinical research and practice. Biological Psychiatry, 59, 990-996.

Lambert, M. J., Whipple, J. L., Hawkins, E. J., Vermeersch, D., Nielsen, S. L., & Smart, D. W. (2003). Is it time to track patient outcome on a routine basis? A meta-analysis. Clinical Psychology: Science and Practice, 10, 288-301.

Leichsenring, F., Rabung, S., & Leibing, E. (2004). The efficacy of short-term psychodynamic psychotherapy in specific psychiatric disorders. Archives of General Psychiatry, 61, 1208-1216.

Linehan, M. M., Comtois, K. A., Murray, A. M., Brown, M. Z., Gallop, R. J., Heard, H. L., et al. (2006). Two-year randomized controlled trial and follow-up of dialectical behavior therapy vs therapy by experts for suicidal behaviors and borderline personality disorder. Archives of General Psychiatry, 63, 757-766.

Lösel, F., & Schmucker, M. (2005). The effectiveness of treatment for sexual offenders: A comprehensive meta-analysis. Journal of Experimental Criminology, 1, 117-146.

Luborsky, L., Rosenthal, R., Diguer, L., Andrusyna, T. P., Berman, J. S., Levitt, J., et al. (2002). The dodo bird verdict is alive and well - mostly. Clinical Psychology: Science and Practice, 9, 2-12.

Luborsky, L., Rosenthal, R., Diguer, L., Andrusyna, T. P., Levitt, J. T., Seligman, D. A., et al. (2003). Are some psychotherapies much more effective than others? Journal of Applied Psychoanalytic Studies, 5, 455-460.

McBride, C., Atkinson, L., Quilty, L. C., & Bagby, R. M. (2006). Attachment as a moderator of treatment outcome in major depression: A randomized control trial of interpersonal psychotherapy versus cognitive behavior therapy. Journal of Consulting and Clinical Psychology, 74, 1041-1054.

Mitte, K. (2005). A meta-analysis of the efficacy of psycho- and pharmacotherapy in panic disorder with and without agoraphobia. Journal of Affective Disorders, 88, 27-45.

Mufson, L., Dorta, K. P., Wickramaratne, P., Nomura, Y., Olfson, M., & Weissman, M. M. (2004). A randomized effectiveness trial of interpersonal psychotherapy for depressed adolescents. Archives of General Psychiatry, 61, 577-584.

Nock, M. K., & Ferriter, C. (2005). Parent management of attendance and adherence in child and adolescent therapy: A conceptual and empirical review. Clinical Child and Family Psychology Review, 8, 149-166.

Norcross, J. C. (1995). Dispelling the Dodo bird verdict and the exclusivity myth in psychotherapy. Psychotherapy, 32, 500-504.

Reid, W. J. (1997). Evaluating the Dodo's verdict: Do all interventions have equivalent outcomes? Social Work Research, 21, 5-16.

Rosenzweig, S. (1936). Some implicit common factors in diverse methods of psychotherapy. American Journal of Orthopsychiatry, 6, 412-415.

Schmidt, F. L. (1992). What do data really mean? Research findings, meta-analysis, and cumulative knowledge in psychology. American Psychologist, 47, 1173-1181.

Shadish, W. R., Matt, G. E., Navarro, A. M., & Phillips, G. (2000). The effects of psychological therapies under clinically representative conditions: A meta-analysis. Psychological Bulletin, 126, 512-529.

Shadish, W. R., & Sweeney, R. B. (1991). Mediators and moderators in meta-analysis: There's a reason why we don't let Dodo birds tell us which psychotherapies should have prizes. Journal of Consulting and Clinical Psychology, 59, 883-893.

Smith, M. L., Glass, G. V., & Miller, T. I. (1980). The benefits of psychotherapy. Baltimore: Johns Hopkins University Press.

Stiles, W. B., Shapiro, D. A., & Elliott, R. (1986). Are all psychotherapies equivalent? American Psychologist, 41, 165-180.

Strauman, T. J., Vieth, A. Z., Merrill, K. A., Kolden, G. G., Woods, T. E., Klein, M. H., et al. (2006). Self-system therapy as an intervention for self-regulatory dysfunction in depression: A randomized comparison with cognitive therapy. Journal of Consulting and Clinical Psychology, 74, 367-376.

Tang, N. K. Y., & Harvey, A. G. (2006). Altering misperception of sleep in insomnia: Behavioral experiment versus verbal feedback. Journal of Consulting and Clinical Psychology, 74, 767-776.

Vieth, A., Strauman, T. J., Kolden, G., Woods, T., Michels, J., & Klein, M. H. (2003). Self-system therapy: A theory-based psychotherapy for depression. Clinical Psychology: Science and Practice, 10, 245-268.

Wampold, B. E., & Brown, G. S. (2005). Estimating variability in outcomes attributable to therapists: A naturalistic study of outcomes in managed care. Journal of Consulting and Clinical Psychology, 73, 914-923.

Wampold, B. E., Mondin, G. W., Moody, M., & Ahn, H. (1997). The flat earth as a metaphor for the evidence for uniform efficacy of bona fide psychotherapies: Reply to Crits-Christoph (1997) and Howard et al. (1997). Psychological Bulletin, 122, 226-230.

Wampold, B. E., Mondin, G. W., Moody, M., Stich, F., Benson, K., & Ahn, H. (1997). A meta-analysis of outcome studies comparing bona fide psychotherapies: Empirically, "All must have prizes." Psychological Bulletin, 122, 203-215.

Weiss, B., & Weisz, J. R. (1995). Relative effectiveness of behavioral versus nonbehavioral child psychotherapy. Journal of Consulting and Clinical Psychology, 63, 317-320.

Weissman, M. M., Markowitz, J. C., & Klerman, G. L. (2000). Comprehensive guide to interpersonal psychotherapy. New York: Basic Books.

Weisz, J. R. (2004). Psychotherapy for children and adolescents: Evidence-based treatments and case examples. Cambridge, UK: Cambridge University Press.

Weisz, J. R., Donenberg, G. R., Han, S. S., & Weiss, B. (1995). Bridging the gap between laboratory and clinic in child and adolescent psychotherapy. Journal of Consulting and Clinical Psychology, 63, 688-701.

Weisz, J. R., Jensen-Doss, A., & Hawley, K. M. (2006). Evidence-based youth psychotherapies versus usual clinical care. American Psychologist, 61, 671-689.

Weisz, J. R., Valeri, S. M., & McCarty, C. A. (2006). Effects of psychotherapy for depression in children and adolescents: A meta-analysis. Psychological Bulletin, 132, 132-149.

Weisz, J. R., Weiss, B., Alicke, M. D., & Klotz, M. L. (1987). Effectiveness of psychotherapy with children and adolescents: A meta-analysis for clinicians. Journal of Consulting and Clinical Psychology, 55, 542-549.

Weisz, J. R., Weiss, B., Han, S. S., Granger, D. A., & Morton, T. (1995). Effects of psychotherapy with children and adolescents revisited: A meta-analysis of treatment outcome studies. Psychological Bulletin, 117, 450-468.

Winter, D. (2006). Avoiding the fate of the Dodo bird: the challenge of evidence-based practice. In D. Loewenthal, & D. Winter (Eds.), What is psychotherapy research? (pp. 41-46). London: Karnac Books.

Zinbarg, R. E. (2000). Comment on "Role of emotion in cognitive-behavior therapy": Some quibbles, a call for greater attention to patient motivation for change, and implications of adopting a hierarchical model of emotion. Clinical Psychology: Science and Practice, 7, 394-399.

Author Contact Information:

Robin Westmacott
School of Psychology
University of Ottawa
618 – 120 University Ave.
Ottawa, Ontario
Canada K1N 6N5
E-mail: [email protected]

John Hunsley
School of Psychology
University of Ottawa
521 Vanier Building
11 Marie Curie
Ottawa, Ontario
Canada K1N 6N5
E-mail: [email protected]

Relative Efficiency of Response-Contingent and Response-Independent Stimulation on Child Learning and Concomitant Behavior

Carl J. Dunst, Melinda Raab, Orelena Hawks, Linda L. Wilson, and Cindy Parkey

Observations of teachers’ use of noncontingent stimulation to elicit behavior from young children with multiple disabilities and profound developmental delays led us to evaluate the relative efficiency of response-contingent and response-independent stimulation in producing child behavior change. Data from three children (2 females, 1 male) with multiple disabilities and delays were analyzed to determine how many child contingency, visual attention, and social-affective behaviors would be produced per 100 learning opportunities under contrasting stimulus conditions (contingent vs. noncontingent). Results showed that, for all three child behaviors, response-contingent stimulation was overwhelmingly more efficient in producing behavior change than response-independent stimulation. Implications for intervention are described.

Keywords: Response-contingent stimulation, response-independent stimulation, child operant learning, concomitant child behavior, efficiency

More than 25 years ago, we initiated a line of research and practice investigating the value of response-contingent learning opportunities with young children with multiple disabilities and profound developmental delays (e.g., Dunst, 1981; Dunst, Cushing, & Vance, 1985; Dunst & Didoha, 1976; Laub & Dunst, 1974). This work was begun in response to experience showing that traditional early intervention and therapy were not effective in promoting the learning and development of these children. Observations of many practitioners over many years have found that the interventions they used with young children with profound developmental delays more often than not involved response-independent or noncontingent stimulation to elicit or evoke child behavior. In most cases, the more disabled and delayed the children, the greater the likelihood that practitioners would spend large amounts of time using noncontingent stimulation in an attempt to change child behavior.

The influences of response-contingent and response-independent stimulation on child learning and behavior have been the focus of investigation for many years (e.g., Dunst, 1984; O'Brien, 1992; Utley, Duncan, Strain, & Scanlon, 1983; Vietze, Foster, & Friedman, 1974). Response-contingent stimulation involves the provision of stimulation contingent upon a child’s behavior, whereas response-independent stimulation involves the provision of stimulation that is noncontingent or nondependent upon a child’s behavior. Findings from studies indicate that noncontingent stimulation at least initially elicits child attention and increased behavior responding, but that children habituate to the response-independent stimulation the longer it is available. In contrast, response-contingent stimulation elicits and maintains child behavior responding for longer periods of time and is often associated with positive social-emotional behavior following contingency detection and awareness (see e.g., Dunst, 2003).
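The efficiency comparison described in the abstract, child behaviors produced per 100 learning opportunities under contingent versus noncontingent stimulation, can be sketched as a simple rate. This is a minimal illustration that assumes efficiency is the behavior count divided by the number of opportunities, scaled to 100; the exact computation used in the analyses is not shown in this passage, the function name `rate_per_100` is hypothetical, and the counts are invented for illustration and are not study data.

```python
def rate_per_100(behavior_count, opportunities):
    """Return the number of behaviors produced per 100 learning opportunities."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return 100.0 * behavior_count / opportunities


# Hypothetical counts for illustration only; these are NOT data from the study.
hypothetical = {
    "contingent": {"behaviors": 45, "opportunities": 60},
    "noncontingent": {"behaviors": 6, "opportunities": 120},
}

for condition, counts in hypothetical.items():
    rate = rate_per_100(counts["behaviors"], counts["opportunities"])
    print(f"{condition}: {rate:.1f} behaviors per 100 opportunities")
```

Scaling to a common base of 100 opportunities lets the two conditions be compared even when they offered different numbers of learning opportunities.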
The purpose of the analyses in this brief report was to determine the relative efficiency of response-contingent and response-independent stimulation on child operant learning and two concomitant child behaviors (visual attention and affective behavior). The analyses were completed on data collected as part of a study promoting teachers’ use of contingency games as a form of early childhood intervention with young children with multiple disabilities and profound developmental delays (Raab, Dunst, Wilson, & Parkey, 2007). During the baseline condition of the study, teachers were observed using noncontingent stimulation to elicit child behavior. The effect appeared to be behavior suppression rather than behavior enhancement. The extent to which this observation was confirmed in a secondary analysis of the study data was the focus of this report.

The conduct of the original study was guided by a conceptual and operational framework that postulated both immediate and extended benefits of contingency learning opportunities (Dunst et al., 1985; Raab & Dunst, 1997). Contingency learning games are characterized by behavior-based contingencies where the production or provision of reinforcement is contingent on the child’s behavior (Tarabulsy, Tessier, & Kappas, 1996). The immediate effect of the learning games is increased operant responding. The extended benefits of the contingency games include, but are not limited to, increased child visual attention to the consequences of contingency behavior and child affect (smiling, laughter, and vocalizations) in response to the consequences of a child’s behavior.

Method

Participants

The participants were three children (“Amy,” “Brenda,” and “Cory”) with profound developmental delays and their teachers. The three children, who were between 34 and 52 months of age, attained developmental ages of only 3 to 5 months as determined by the Griffiths (1954) developmental scales. The children had Griffiths GDQs (General Development Quotients) between 6 and 16. The three children each had cerebral palsy, two had visual impairments, and one had a seizure disorder. Two of the three children had multiple disabilities.

Setting and Procedure

Data collection during the different study phases was done in the children’s classrooms. A multiple baseline design across study participants was used to assess the effectiveness of the contingency learning games for promoting response-contingent child behavior and child concomitant behavior. The study included baseline, intervention (both acquisition and mastery), and maintenance phases. The baseline phase consisted of observations of the teachers implementing 2 or 3 activities they currently used to promote child learning.
The two intervention phases consisted of behavior-based contingency games (activities) where a child used a behavior to produce an environmental consequence (e.g., batting at a mobile producing movement and sound) or a behavior was reinforced by the teacher (e.g., teacher singing to the child for reaching toward and touching the teacher’s mouth). An intervention session typically included 2 or 3 games. Maintenance included teacher implementation of 2 or 3 contingency games following the termination of teacher coaching. Any one game or activity could include up to 15 trials (learning opportunities). The study was implemented over the course of 44 sessions (days).

Measures

Measures of child response-contingent behavior, visually attending to the consequences of a contingency behavior, and child smiling, laughter, and vocalizations were obtained from observations of the children by the study investigators. A child contingency behavior was defined as a behavior that produced or elicited a reinforcement during a game trial that was unprompted or unaided by the teachers. Child visual attention was defined as a fixated look on the consequence of his or her behavior. Smiling or laughter was defined as a closed or open upward turning of the corners of the mouth or an audible laughing sound without smiling indicative of joy or exuberance. A vocalization was defined as an audible open vowel sound (other than laughing). Inter-rater reliability was assessed for 26% of the games. Percent agreement was 91 (Range = 84 to 96) for contingency behavior, 72 (Range = 69 to 75) for visual attention, and 91 (Range = 87 to 94) for child affect (smiling/laughing/vocalizations).
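The report gives percent agreement figures but not the formula used to obtain them. A minimal sketch of one common approach, trial-by-trial agreement between two observers, is shown below. Both the formula and the function name `percent_agreement` are assumptions made for illustration, and the observer codes are hypothetical, not data from the study.

```python
def percent_agreement(rater_a, rater_b):
    """Trial-by-trial percent agreement between two observers.

    Assumed formula (not stated in the report):
    100 * (trials coded identically) / (total trials coded).
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both observers must code the same trials")
    if not rater_a:
        raise ValueError("at least one coded trial is required")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * agreements / len(rater_a)


# Hypothetical codes for 10 game trials (1 = behavior occurred, 0 = did not).
observer_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
observer_2 = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
print(percent_agreement(observer_1, observer_2))  # 9 of 10 trials agree -> 90.0
```

Computing agreement separately for each coded behavior (contingency behavior, visual attention, affect) would give one score per measure, as in the values reported above.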

Efficiency

Efficiency was measured in terms of the relationship between inputs (i.e., learning opportunities) and outputs (e.g., child operant responding), and was calculated in two ways: as the number of outputs produced by a predetermined number of inputs (Rumble, 1997) and as the number of inputs necessary to produce a predetermined number of outputs (Coelli, 2005). The former, termed educational efficiency (UNESCO, 1995), is calculated as:

Educational Efficiency = (Outputs/Inputs) x 100

The latter, termed allocative efficiency (Comanor & Leibenstein, 1969), is calculated as:

Allocative Efficiency = (Inputs/Outputs) x 100

Each provides a different lens for understanding the efficiency of different kinds of inputs.

Efficiency was determined for the baseline (noncontingent stimulation) phase and each of the three contingency (acquisition, mastery, and maintenance) phases of the study, and was calculated for child contingency, visual fixation, and social affective behavior. First, we calculated the number of contingency and concomitant behaviors that would be produced per 100 game trials during each phase of the study (educational efficiency). For example, if a study phase included 200 game trials, and a child produced 150 contingency behaviors, his or her contingency efficiency score would be 75 ([150/200] x 100), meaning that 100 learning opportunities would likely result in 75 reinforcing consequences. Second, efficiency was assessed in terms of the number of inputs (learning opportunities) necessary to produce 100 outputs (e.g., response-contingent behaviors) (allocative efficiency). In the example above, it would require 133 game trials to produce 100 reinforcing consequences ([200/150] x 100). The larger the allocative efficiency score, the less efficient the learning condition.

Results

Child Learning

Figure 1 summarizes the results of the multiple baseline design. (See Raab et al., 2007, for a more complete presentation of the patterns of learning.)
Results showed that few learning game trials during the baseline phase included behavior that produced or elicited reinforcing consequences. In contrast, the children demonstrated improved operant learning during the sessions immediately following the introduction of the contingency games (acquisition), followed by sustained high levels of operant responding (mastery). The children continued to demonstrate high levels of operant responding during follow-up (compared to the baseline).

Efficiency

The educational efficiency of response-contingent and response-independent learning opportunities on child operant responding is shown in Figure 2. Response-contingent learning opportunities were clearly more efficient in terms of the likelihood that the children would produce behavior having reinforcing consequences. Two of the three children (Amy and Brenda) showed incremental increases in the efficiency of their contingency behavior from the acquisition to mastery to maintenance phases of the study. Cory showed incremental increases in the efficiency of his contingency behavior from the acquisition to mastery phases.
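The efficiency scores reported in this section follow the two formulas given in the Method section. As a minimal sketch (the function names are ours, chosen for illustration, and are not part of the original analysis), the worked example of 200 game trials and 150 contingency behaviors can be reproduced as follows:

```python
def educational_efficiency(outputs, inputs):
    """Outputs produced per 100 inputs: (Outputs / Inputs) x 100."""
    if inputs <= 0:
        raise ValueError("inputs must be positive")
    return 100 * outputs / inputs


def allocative_efficiency(outputs, inputs):
    """Inputs needed to produce 100 outputs: (Inputs / Outputs) x 100."""
    if outputs <= 0:
        raise ValueError("outputs must be positive")
    return 100 * inputs / outputs


# Worked example from the Method section: 200 game trials, 150 contingency behaviors.
trials, behaviors = 200, 150
print(educational_efficiency(behaviors, trials))        # 75.0
print(round(allocative_efficiency(behaviors, trials)))  # 133
```

The two scores are reciprocal views of the same ratio (their product is always 10,000), so a high educational efficiency score always corresponds to a low allocative efficiency score. The allocative form is undefined when a phase produces no outputs, which the sketch treats as an error.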

[Figure 1 plot: three panels (Amy, Brenda, Cory); y-axis: percent of game trials (0 to 100); x-axis: session blocks; phases: baseline, acquisition, mastery, maintenance.]

Figure 1. Child production of contingency behavior during the response-independent (baseline) and response-contingent (acquisition, mastery, maintenance) phases of the study.

[Figure 2 plot: contingency behavior efficiency scores (y-axis, 0 to 100) by learning phase (x-axis: baseline, acquisition, mastery, maintenance) for Amy, Brenda, and Cory.]

Figure 2. Relative efficiency of response-independent and response-contingent learning opportunities on the production of child behavior producing reinforcing consequences.

Allocative efficiency, measured in terms of the number of learning opportunities (inputs) needed to produce 100 contingency behaviors (outputs), makes clear the differences in the effects of response-independent and response-contingent stimulation. The number of response-independent inputs (baseline) needed to elicit 100 behaviors producing reinforcing consequences was 1,529 for Amy, 641 for Brenda, and 3,015 for Cory. In contrast, the number of response-contingent inputs (intervention and maintenance) needed to elicit 100 contingency (output) behaviors was between 103 and 157 for the three children.

Figure 3 shows the educational efficiency scores for the children’s visual attention to the consequences of their behavior during the baseline (noncontingent) and three learning (acquisition, mastery, maintenance) phases of the study. Results again show that response-contingent learning opportunities were more efficient for affecting changes in child visual behavior compared to response-independent stimulation. The relative efficiency of response-independent and response-contingent stimulation is highlighted by the differences in the input to output ratios of learning opportunities necessary to produce 100 visually fixated responses (allocative efficiency). The number of noncontingent-stimulation baseline learning episodes needed to produce 100 fixated looks was 369 for Amy, 495 for Brenda, and 1,633 for Cory. In contrast, it would require only between 107 and 171 learning episodes to produce the same number of fixated behaviors during the response-contingent learning phases of the study.

The educational efficiency of response-contingent and response-independent learning opportunities in terms of influencing the display of child social affective behavior is shown in Figure 4.
Noncontingent stimulation was associated with very few social affective behaviors, whereas there were incremental increases in child smiling, laughter, and vocalizations across the three contingency study phases. The pattern and amount of responding are very much like those found in other studies (Dunst, 2003). The small increase during the acquisition phase followed by larger increases during the mastery and maintenance phases is indicative of a child’s contingency detection and awareness (Tarabulsy et al., 1996; Watson, 2001).

[Figure 3 plot: visual fixation efficiency scores (y-axis, 0 to 100) by learning phase (x-axis: baseline, acquisition, mastery, maintenance) for Amy, Brenda, and Cory.]

Figure 3. Relative efficiency of response-independent and response-contingent learning opportunities on the children visually attending to the consequences of their behavior.

[Figure 4 plot: affective behavior efficiency scores (y-axis, 0 to 60) by learning phase (x-axis: baseline, acquisition, mastery, maintenance) for Amy, Brenda, and Cory.]

Figure 4. Relative efficiency of response-independent and response-contingent learning opportunities on the production of child social affective behavior.

The input to output ratios of the number of learning opportunities needed to elicit 100 child positive affective behaviors (allocative efficiency) show with little doubt that response-contingent stimulation is much more efficient in producing child social-affective behavior compared to response-independent stimulation. The number of noncontingent stimulation game trials needed to elicit 100 social affective behaviors, for all three children combined, was 1,126 (Range = 852 to 10,700), whereas the number of response-contingent stimulation game trials needed to produce the same number of affective behaviors at the end of the study was 280 (Range = 273 to 287).

Discussion

Findings showed that response-contingent learning opportunities were overwhelmingly more efficient in affecting changes in children’s behavior compared to response-independent stimulation. Results indicated that attempts to elicit child behavior using noncontingent stimulation suppress or inhibit child behavior responding, whereas response-contingent stimulation enhances and promotes child behavior and has concomitant behavior consequences.

A cursory examination of intervention procedures used with young children with disabilities indicates that, although noncontingent stimulation is not an efficient treatment procedure, it is a major part of many different kinds of interventions with young children with disabilities, and especially children with profound developmental delays and multiple disabilities. These include, but are not limited to, passive range of motion exercises (Flett, 2003), infant massage (Booth, Johnson-Crowley, & Barnard, 1985), craniosacral therapy (Sullivan, 1997), neurodevelopmental treatment (Palisano, 1991), therapeutic electrical stimulation (Sommerfelt, Markestad, Berg, & Saetesdal, 2001), oral motor stimulation (Domaracki & Sisson, 1990), and vestibular stimulation (Sandler & Voogt, 2001), many of which are used widely by early intervention practitioners (IDEA Infant and Toddler Coordinators Association, 2002; McWilliam, 1999).
Further examination of available information indicates that many early intervention practices used with young children with profound developmental delays and multiple disabilities are not likely to be effective because so many intervention activities include large doses of noncontingent stimulation. Ironically, large doses of noncontingent stimulation over extended periods of time probably make children with disabilities and delays more passive and more resistant to learning contingency behavior (Dunst, 1982; Hutto, 2003; Watson, 1971). This is most likely the case because young children subjected to noncontingent stimulation learn that “people do things to them” rather than acquire an understanding of how their behavior can be used to affect environmental consequences. As part of an extensive examination of children’s IFSPs and IEPs, for example, we found that the largest number of intervention activities on these plans included procedures that involved use of noncontingent learning opportunities, calling into question the value of the interventions (Dunst, Bruder, Trivette, Raab, & McLean, 1998).

Early childhood practitioners have at their disposal a wealth of information about the characteristics of and conditions under which early learning activities and opportunities are most likely to have optimal behavior enhancing consequences. There is most certainly a need to use this information to ensure that young children with disabilities and delays, and especially children with multiple disabilities and profound developmental delays, are afforded the kind of learning opportunities that are most likely to affect positive changes in their behavior and collateral responding. Available evidence, however, indicates that this does not seem to be widely the case in many Individuals with Disabilities Education Act (IDEA) Part C early intervention programs and Part B(619) preschool special education programs (see e.g., Campbell & Halbert, 2002; McBride & Peterson, 1997).

References

Booth, C. L., Johnson-Crowley, N., & Barnard, K. E. (1985). Infant massage and exercise: Worth the effort? MCN: American Journal of Maternal-Child Nursing, 10, 184-189.
Campbell, P. H., & Halbert, J. (2002). Between research and practice: Provider perspectives on early intervention. Topics in Early Childhood Special Education, 22, 213-226.
Coelli, T. (2005). An introduction to efficiency and productivity analysis (2nd ed.). New York: Springer.
Comanor, W. S., & Leibenstein, H. (1969). Allocative efficiency, x-efficiency and the measurement of welfare losses. Economica, 36, 304-309.
Domaracki, L. S., & Sisson, L. A. (1990). Decreasing drooling with oral motor stimulation in children with multiple disabilities. American Journal of Occupational Therapy, 44, 680-684.
Dunst, C. J. (1981). Social concomitants of cognitive mastery in Down's syndrome infants. Infant Mental Health Journal, 2, 144-154.
Dunst, C. J. (1982). Theoretical bases and pragmatic considerations. In J. Anderson (Ed.), Curricula for high-risk and handicapped infants (pp. 13-23). Chapel Hill, NC: Technical Assistance Development System.
Dunst, C. J. (1984). Infant visual attention under response-contingent and response-independent conditions. Journal of Applied Developmental Psychology, 5, 203-211.
Dunst, C. J. (2003). Social-emotional consequences of response-contingent learning opportunities. Bridges, 1(4), 1-17. Available at http://www.researchtopractice.info/bridges/bridges_vol1_no4.pdf.
Dunst, C. J., Bruder, M. B., Trivette, C. M., Raab, M., & McLean, M. (1998, May). Increasing children's learning opportunities through families and communities early childhood research institute: Year 2 progress report. Asheville, NC: Orelena Hawks Puckett Institute.
Dunst, C. J., Cushing, P. J., & Vance, S. D. (1985). Response-contingent learning in profoundly handicapped infants: A social systems perspective. Analysis and Intervention in Developmental Disabilities, 5, 33-47.
Dunst, C. J., & Didoha, J. (1976). An infant intervention strategy: The development of contingency awareness. Western Carolina Center Papers and Reports, 6, Number 1. Morganton, NC: Family, Infant, and Preschool Program, Western Carolina Center.
Dunst, C. J., Raab, M., Trivette, C. M., Wilson, L. L., Hamby, D. W., Parkey, C., Gatens, M., & French, J. (2007). Characteristics of operant learning games associated with optimal child and adult social-emotional consequences. Manuscript submitted for publication.
Flett, P. J. (2003). Rehabilitation of spasticity and related problems in childhood cerebral palsy. Journal of Paediatrics and Child Health, 39, 6-14.
Griffiths, R. (1954). The abilities of babies: A study in mental measurement. London: University of London Press.
Hutto, M. D. (2003). Latency to learn in contingency studies of young children with disabilities or developmental delays. Bridges, 1(5), 1-16. Available at http://www.researchtopractice.info/bridges/bridges_vol1_no5.pdf.
IDEA Infant and Toddler Coordinators Association. (2002, January). Alternative or complementary therapies and approaches survey results. Retrieved March 27, 2007, from http://www.ideainfanttoddler.org/hottpic3.htm.
Laub, K. W., & Dunst, C. J. (1974, May). Effects of imitative and non-imitative adult vocalizations on a developmentally delayed infant's rate of vocalization. Paper presented at the annual meeting of the North Carolina Speech and Hearing Association, Durham.
McBride, S. L., & Peterson, C. (1997). Home-based early intervention with families of children with disabilities: Who is doing what? Topics in Early Childhood Special Education, 17, 209-233.
McWilliam, R. A. (1999). Controversial practices: The need for a reacculturation of early intervention fields. Topics in Early Childhood Special Education, 19, 177-188.
O'Brien, Y. (1992). Reactions to response-contingent and noncontingent stimulation by non-handicapped infants and infants with multiple learning difficulties. Dissertation Abstracts International, 52(07), 3924B.
Palisano, R. J. (1991). Research on the effectiveness of neurodevelopmental treatment. Pediatric Physical Therapy, 3, 143-148.
Raab, M., & Dunst, C. J. (1997, November). Extended benefits of active learning for young children with severe disabilities. Paper presented at the International Conference on Children with Special Needs, New Orleans, LA.
Raab, M., Dunst, C. J., Wilson, L. L., & Parkey, C. (2007). Early contingency learning and child and teacher concomitant social-emotional behavior. Manuscript submitted for publication.
Rumble, G. (1997). The costs and economics of open and distance learning. London: Kogan Page.
Sandler, A. G., & Voogt, K. (2001). Vestibular stimulation: Effects on visual and auditory alertness in children with multiple disabilities. Journal of Developmental and Physical Disabilities, 13, 333-341.
Sommerfelt, K., Markestad, T., Berg, K., & Saetesdal, I. (2001). Therapeutic electrical stimulation in cerebral palsy: A randomized, controlled, crossover trial. Developmental Medicine and Child Neurology, 43, 609-613.
Sullivan, C. (1997). Introducing the cranial approach in osteopathy and the treatment of infants and mothers. Complementary Therapies in Nursing and Midwifery, 3, 72-76.
Tarabulsy, G. M., Tessier, R., & Kappas, A. (1996). Contingency detection and the contingent organization of behavior in interactions: Implications for socioemotional development in infancy. Psychological Bulletin, 120, 25-41.
UNESCO. (1995). UNESCO thesaurus: A structured list of descriptors for indexing and retrieving literature in the fields of education, science, social and human science, culture, communication and information. Paris: UNESCO Publishing.
Utley, B., Duncan, D., Strain, P., & Scanlon, K. (1983). Effects of contingent and noncontingent visual stimulation on visual fixation in multiply handicapped children. Journal of the Association for People with Severe Handicaps, 8(3), 29-42.
Vietze, P., Foster, M., & Friedman, S. (1974). A portable system for studying head movements in infants in relation to contingent and noncontingent sensory stimulation. Behavior Research Methods and Instrumentation, 6, 338-340.
Watson, J. S. (1971). Cognitive-perceptual development in infancy: Setting for the seventies. Merrill-Palmer Quarterly, 17, 139-152.
Watson, J. S. (2001). Contingency perception and misperception in infancy: Some potential implications for attachment. Bulletin of the Menninger Clinic, 65, 296-320.

Acknowledgement

The research described in this paper was supported, in part, by funding from the U. S. Department of Education, Office of Special Education Programs, Research to Practice Division (H024B0015). The opinions expressed however are those of the authors and do not necessarily reflect the official position of the Department of Education. Appreciation is extended to the teachers who participated in the study.

Author Contact Information:

Carl J. Dunst
Research Scientist
Orelena Hawks Puckett Institute
18A Regent Park Blvd.
Asheville, North Carolina 28806
(828) 255-0470
[email protected]

Melinda Raab, Ph.D.
Associate Research Scientist
Orelena Hawks Puckett Institute
18A Regent Park Blvd.
Asheville, NC 28806
(828) 255-0470
[email protected]

Linda Wilson, M.A.
Teacher
Burke County Public Schools
700 East Parker Road
Morganton, NC 28680
(828) 439-4312
[email protected]

Cindy Parkey, M.A.
Assistant Branch Head, Early Intervention Branch
North Carolina Department of Health and Human Services
1916 Mail Service Center
Raleigh, NC 27699
(919) 707-5534
[email protected]


Persistent Preference in Concurrent-Chain Schedules

Paul Neuman, Natalie Hansell and Elizabeth Kriso
Bryn Mawr College

Twelve pigeons, eight naïve and four with a history of choice between different variable-interval schedules, were exposed to concurrent-chains schedules to examine the effect of history on choice patterns. The initial links were always fixed-interval (FI) 3s schedules, and the terminal links were 5-valued variable-ratio (VR) schedules that varied by condition. During baseline, terminal link requirements were identical VR 60 schedules (equal alternative conditions) for both the red and white keys, producing indifference in all but one instance. Preferences were established by making the response requirements larger or smaller (depending on the condition) on the red key alternative (unequal alternative conditions). After preferences were established, 4 subjects were exposed to the equal alternative conditions, and 8 subjects were exposed to forced choice sessions with the same equal response requirements as during baseline before exposure to the equal alternative conditions. It was found that 10 of the 12 pigeons showed preferences that persisted during returns to baseline that were primarily influenced by either the immediately preceding unequal alternative condition or another particular unequal alternative condition. Two of the pigeons with prior histories did not show any shifts in preference with changes in terminal link response requirements.

Keywords: choice, persistence, preference, concurrent-chains schedules, pigeons

Given two response alternatives where one reinforcement rate is higher (rich) relative to the other (lean), behavior is allocated toward the richer alternative (Herrnstein, 1970). That is, behavior is allocated between concurrently available sources of reinforcement so as to match the relative rates of reinforcement obtained from those alternatives. In addition, it has been shown that preference exists for variable over fixed schedules of reinforcement (Ahearn, Hineline, & David, 1992; Field, Tonneau, Ahearn, & Hineline, 1996), and for the smaller ratio in concurrent variable-ratio (VR) schedules (Herrnstein & Loveland, 1975; MacDonall, 1988; McMillan, Hardwick, & Li, 2002).

There is considerable evidence suggesting that prior conditions affect later performance. Using human subjects, Weiner (1964, 1969) found that responding under identical FI schedules differed as a function of previous schedule exposure. Specifically, response rates during an FI condition were similar to either a low or high response rate previously produced by either a fixed-ratio (FR) schedule or differential-reinforcement-of-low-rate (DRL) schedule. Additionally, research conducted by LeFrancois and Metzger (1993) provided further support for the influence of history on later responding. They trained rats to lever press under a DRL schedule before changing to an FI schedule of reinforcement. With a second group of rats, lever presses were trained first under a DRL schedule and later shifted to an FR schedule, before an FI schedule was put in place. Low rates of responding during the FI condition were found only with the first group of rats, where the DRL immediately preceded the FI condition, suggesting that immediate history had more of a direct influence on FI responding than remote history in this case.
Similar effects of schedule history can be found in research by Cohen, Pedersen, Kinney, & Myers (1994), Urbain, Poling, Millam, & Thompson (1978), Freeman & Lattal (1992), and Wanchisen, Tatham, & Mooney (1989).

An important question involves persistence after changes in reinforcement schedules. Research by Poppen (1982) with young adult university students involved lever pressing producing points programmed on a concurrent schedule. After training with one set of concurrent schedules, all subjects were shifted to a second schedule with different response requirements. The investigations showed that for many of the schedules, naïve subjects performed differently than subjects with prior histories. Specifically, history exposure produced a subsequent pattern of responding (defined by response rate and post-reinforcement pauses) similar to that demonstrated under the previous set of schedule alternatives. That is, once a pattern of responding was established, it tended to persist after a change in schedule alternatives, leading “to very inefficient performance.”

Much of the research examining schedule history was on resistance to change of a single response. However, McLean and Blampied (1995) examined resistance to change with pigeons in multiple schedules where the components involved concurrent schedules and responding was assessed in transition and at steady state. In part 1 of the study, a two-component multiple schedule was in place where pecks on the left key were reinforced on either a rich or lean variable-interval (VI) schedule, depending on the component. The right key schedule was initially a VI 120s schedule, and responding was disrupted on both keys by changing the right key to some other VI schedule. Consistent with behavioral momentum, responding on the left key in part 1 changed more in the component with the lower rate of reinforcement when it was disrupted by changes in the rate of reinforcement on the right key. However, responding on the right key changed more in the component with the higher rate of reinforcement (inconsistent with behavioral momentum). Parts 2 and 3 involved extinction and VI 80s assigned to the right key, and various variable-time (VT) food deliveries were disrupters. During these conditions, there was greater resistance to change in the components with the greatest overall reinforcement, which was consistent with other behavioral momentum experiments. More recently, Nevin, Grace, Shasta, and McLean (2001) addressed the relation between resistance to change and preference. Their experiments demonstrated that resistance to change and preference do depend in part on response rate.

Using pigeons as subjects, Bailey and Mazur (1990) investigated the rate of acquisition of preference in concurrent schedules by changing the probability of reinforcement of one of two alternatives. The experiment included ten conditions consisting of one or more equal probability of reinforcement sessions, one transition session, and one to five unequal probabilities of reinforcement sessions.
In the equal probability procedure, reinforcers were assigned to one of two response alternatives with equal probability. This procedure produced near equal rates of responding on each key, or indifference. The transition session consisted of trials from the previous equal probability procedure followed by unequal probability trials. The unequal probability procedure involved alternatives with two distinct reinforcement probabilities, one higher than the other. The probabilities of reinforcement used in the unequal probability procedure varied across the conditions, and in nearly every case, subjects developed a preference for the key with the higher probability of reinforcement. The results showed that the rate of preference acquisition was faster when the ratio of the two reinforcement probabilities was relatively large, regardless of the arithmetic difference between the probabilities. Additional experiments using similar free-operant procedures produced identical results (Mazur & Ratti, 1991). In addition, research using similar procedures has demonstrated that preference acquisition is further accelerated when the overall reinforcement rate in a session is increased (Mazur, 1992).

Research on the effects of history on patterns of choice provides the best context for the current experiments. Nevin and Grace (2000) examined the relation between preference (concurrent-chains schedules) and persistence (multiple schedules) in constant duration components. Resistance to change was characterized by additive effects when disrupted by intercomponent food, extinction, and intercomponent food plus extinction, and was systematically related to reinforcement ratios. However, preference was even more sensitive to reinforcement ratios. They concluded that constant duration components increase the sensitivity of preference and resistance to change, and that both are independent and convergent effects of reinforcement history.
Field, Tonneau, Ahearn, and Hineline (1996) had pigeons choose between an FR schedule and a VR schedule with an arithmetic mean twice that of the FR schedule (e.g., FR 30 vs. VR 60) in a series of concurrent-chains schedules. They systematically manipulated the minimum and maximum response requirements of the VR alternative (ascending/descending, descending/ascending) every 11 sessions while keeping the arithmetic mean constant, and assessed preference for the FR alternative. The proportion of FR selections increased and decreased as the minimum response requirement increased and decreased, indicating sensitivity to the minimum requirement of the VR alternative. In a follow-up study (Andrzejewski, Field, & Hineline, 2001), pigeons chose between an FR 20 and a VR 40 schedule of reinforcement in which variations in the response requirement of the VR alternative occurred within each session. Half of the pigeons (group 1) were exposed to an ascending and then descending series of ratio requirements of the VR, and the other half (group 2) were exposed to a descending and then ascending series of ratio requirements. Choice proportions changed with the cycling response requirement for the second group (descend/ascend) of pigeons as in Field et al. (1996), but not for the first group (ascend/descend). In a second condition, group 2 subjects were exposed to an unchanging condition, but their choice proportions continued to cycle as if the VR response requirements were changing as in phase 1. The authors also observed that cycling of preference proportions was more strongly determined by the descending portion of the series, as in Field et al. (1996).

Ono (2004, 2005) examined the effects of prior experience on preference. In the initial experiments, pigeons were provided with one of three possible histories: forced choice, free choice, or both forced and free choice, with each terminal link ending in a food delivery. In a subsequent experiment, pigeons were exposed to either forced choice or free choice, but the probability of food delivery was .5 at the end of each terminal-link. Preference was assessed in a test phase by arranging free and forced choice as terminal-links. Pigeons’ preferences rapidly shifted to the terminal-links to which they had no prior exposure. The same shift occurred when the probability of reinforcement was lower, but it was not as immediate.

Ono, Yamagishi, Aotsuka, Hojo, and Nogawa (2005) examined the effects of a history with particular initial-link stimuli on subsequent preference. Pigeons were exposed to multiple concurrent-chains schedules in which the initial response requirements of the terminal-links were equal, producing indifference. When one of the terminal-link alternatives in each component was changed to extinction, a strong preference for the alternative terminal-link occurred.
With subsequent returns to equal terminal-link response requirements, choice patterns were characterized by indifference, except when initial-link and terminal-link stimuli were identical to those of the preference formation condition. That is, when the stimuli were the same as during the preference formation condition, a clear preference for the stimulus that had been correlated with the non-extinction alternative remained, despite the fact that the terminal-link response requirements were identical.

The experiments reported here are primarily concerned with the degree of persistent preference as a function of history. Short initial-links were used so that preferences were primarily determined by the terminal-link response requirements. Initially, the terminal-link response requirements were equal (VR 60 schedules of reinforcement), producing indifference. Then, the response requirement for one alternative was made either larger or smaller, depending on the condition, producing a preference. Subsequent returns to baseline, where terminal-link response requirements were equal, did not completely eliminate preferences, indicating history effects.

Method

Subjects

The subjects were twelve White Carneau pigeons designated as A1, A2, A5, A6, X14, X5, X22, X6, Z23, Z24, Z27, and Z28. All pigeons were trained to key-peck prior to participation in this experiment. Pigeons X14, X5, X22, and X6 had previous experimental histories with concurrent-chains schedules that were similar to the current experiments, but involved different terminal-links (interval schedules as opposed to ratio schedules). The pigeons were housed individually in stainless steel cages and were subject to a 12:12-hour light/dark cycle (lights on at 8:00 a.m.). The subjects were maintained at 80% of their free-feeding weights by additional feeding after sessions when necessary. Pigeons had free access to water and grit while in their home cages.
Apparatus

Experimentation was conducted in three operant chambers for pigeons measuring 30.5 cm high, 30.5 cm wide, and 31 cm long. The chambers were enclosed in a sound-attenuating box containing a ventilation fan. Two chambers were identical, both being equipped with a food hopper and two translucent response keys. Each response key was mounted 22 cm above the floor on the back wall of the chamber, and could be illuminated either red or white. The food hopper was located under and between the two response keys, and access to it was accompanied by illumination inside the hopper-housing unit. The third chamber had identical measurements, but included a third center key directly above the food hopper that was not used. Reinforcement consisted of 2.25 seconds of access to the food hopper, which was filled with mixed grain. Procedural events and data recording for the experiment were accomplished using a MED-PC® system and a personal computer.

Procedure

Experimental sessions were conducted five days per week. All procedures involved concurrent-chain schedules with the initial-link providing a choice between a red or white key. The initial-links consisted of FI 3s schedules to ensure that pigeons had sufficient time to observe both keys and, at the same time, were short enough that preferences were almost exclusively determined by the terminal-links. Responding during the initial-link turned off the unselected key (terminal-link entry). The terminal-links were VR schedules, which were correlated with key color, and randomly assigned to the right and left keys for each cycle. Ratio schedules were chosen for the terminal-links because changes tend to be easily discriminated so that preference shifts, if they were to occur, would be relatively quick. The schedule requirement correlated with the white key was always VR 60. All VR schedules in this experiment consisted of five values, including the mean, sampled with replacement. For instance, a VR 60 consisted of the values 50, 55, 60, 65, and 70. The response requirement correlated with the red key changed depending on condition. Completing the VR response requirement for the chosen alternative resulted in access to food, followed immediately by the opportunity to make another choice, a return to the initial-links. Each session ended after 40 such cycles, or after 90 minutes had elapsed, whichever occurred first. Incomplete sessions (fewer than 40 cycles) because of mechanical failures or slow responding, which rarely occurred, were repeated to ensure adequate exposure to each condition.
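The VR sampling rule described above can be illustrated with a minimal Python sketch. It is not the MED-PC program used in the experiment. The function names are hypothetical, and the 5-response spacing for VR 30 and VR 90 is an assumption carried over from the VR 60 example (50, 55, 60, 65, 70), since the spacing for the other schedules is not stated. Only the 40-cycle session limit is modeled.

```python
import random

def vr_values(mean, step=5, count=5):
    """Return `count` ratio values centered on `mean` and spaced `step` apart.

    The spacing of 5 is taken from the VR 60 example; applying it to other
    means is an assumption.
    """
    half = count // 2
    return [mean + step * k for k in range(-half, half + 1)]

def sample_requirement(mean, rng):
    """Draw one response requirement, sampling with replacement."""
    return rng.choice(vr_values(mean))

def simulate_session(choices, red_mean, rng, white_mean=60, max_cycles=40):
    """Pair each initial-link choice with a sampled terminal-link requirement.

    `choices` is a sequence of "white" or "red" selections (hypothetical data).
    Only the 40-cycle limit is modeled; the 90-minute limit is not.
    """
    cycles = []
    for choice in list(choices)[:max_cycles]:
        mean = white_mean if choice == "white" else red_mean
        cycles.append((choice, sample_requirement(mean, rng)))
    return cycles

rng = random.Random(0)  # fixed seed so the sketch is repeatable
session = simulate_session(["white", "red"] * 25, red_mean=30, rng=rng)
print(vr_values(60))   # [50, 55, 60, 65, 70]
print(len(session))    # 40
```

Sampling with `rng.choice` on every cycle corresponds to sampling with replacement, so the same requirement can recur on consecutive cycles.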
For pigeons A1, A2, A5, and A6, there were two types of conditions that were replicated for a total of 9 conditions: equal alternatives (conditions 1, 3, 5, 7, 9) and unequal alternatives (conditions 2, 4, 6, 8). After the equal alternative baseline, condition types occurred in a fixed order, as outlined in Table 1. Each condition was in place for 15 to 20 sessions. For pigeons X14, X5, X22, X6, Z23, Z24, Z27, and Z28, there were 3 types of conditions that were replicated for a total of 13 conditions: equal alternatives (conditions 1, 4, 7, 10, 13), unequal alternatives (conditions 2, 5, 8, 11), and forced choice (conditions 3, 6, 9, 12). After the equal alternative baseline, condition types occurred in a fixed order (unequal alternatives, forced choice, and equal alternatives), as shown in Table 1. Equal alternative and unequal alternative conditions, except for baseline, were in place for 15 sessions each, as this was sufficient to produce shifts in choice patterns when using ratio schedules in terminal-links (Neuman, Ahearn, & Hineline, 1997, 2000). Forced choice conditions lasted 5 sessions, except in condition 6 due to a mechanical failure. Condition types are described below.

Table 1. Order and Number of Sessions Per Condition

Pigeons A1, A2, A5, and A6

Condition   TLW     TLR (A1)   TLR (A2)   TLR (A5)   TLR (A6)
1           VR 60   VR 60      VR 60      VR 60      VR 60
2           VR 60   VR 30      VR 30      VR 90      VR 90
3           VR 60   VR 60      VR 60      VR 60      VR 60
4           VR 60   VR 90      VR 90      VR 30      VR 30
5           VR 60   VR 60      VR 60      VR 60      VR 60
6           VR 60   VR 30      VR 30      VR 90      VR 90
7           VR 60   VR 60      VR 60      VR 60      VR 60
8           VR 60   VR 90      VR 90      VR 30      VR 30
9           VR 60   VR 60      VR 60      VR 60      VR 60

Pigeons X14, X5, Z27, and Z28 (Ss = number of sessions)

Condition   TLW     TLR     Ss (X14)   Ss (X5)   Ss (Z27)   Ss (Z28)
1           VR 60   VR 60   39         10        18         18
2           VR 60   VR 30   15         15        15         15
3           Forced choice   5          5         5          5
4           VR 60   VR 60   15         15        15         15
5           VR 60   VR 90   15         15        15         15
6           Forced choice   4          4         4          4
7           VR 60   VR 60   15         15        15         15
8           VR 60   VR 30   15         15        15         15
9           Forced choice   5          5         5          5
10          VR 60   VR 60   15         15        15         15
11          VR 60   VR 90   15         15        15         15
12          Forced choice   5          5         5          5
13          VR 60   VR 60   15         15        15         15

Pigeons X22, X6, Z23, and Z24 (Ss = number of sessions)

Condition   TLW     TLR     Ss (X22)   Ss (X6)   Ss (Z23)   Ss (Z24)
1           VR 60   VR 60   36         36        18         18
2           VR 60   VR 90   15         15        15         15
3           Forced choice   5          5         5          5
4           VR 60   VR 60   15         15        15         15
5           VR 60   VR 30   15         15        15         15
6           Forced choice   4          4         4          4
7           VR 60   VR 60   15         15        15         15
8           VR 60   VR 90   15         15        15         15
9           Forced choice   5          5         5          5
10          VR 60   VR 60   15         15        15         15
11          VR 60   VR 30   15         15        15         15
12          Forced choice   5          5         5          5
13          VR 60   VR 60   15         15        15         15
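The fixed ordering of conditions summarized in Table 1 can be generated with a short Python sketch. The function name and the tuple layout are hypothetical and are used here for illustration only. The sketch assumes, as in the table, that the two unequal values (VR 30 and VR 90) alternate and that TLW is always VR 60.

```python
def condition_sequence(first_unequal, forced_choice=True):
    """Build the ordered list of conditions as (type, TLW mean, TLR mean).

    `first_unequal` is the TLR value in the first unequal condition (30 or 90).
    With `forced_choice=False` the 9-condition order is produced; with
    `forced_choice=True` the 13-condition order is produced.
    """
    other = {30: 90, 90: 30}[first_unequal]
    conditions = [("equal", 60, 60)]  # condition 1: baseline
    for tlr in (first_unequal, other, first_unequal, other):
        conditions.append(("unequal", 60, tlr))
        if forced_choice:
            # Both terminal links are VR 60 during forced choice.
            conditions.append(("forced", 60, 60))
        conditions.append(("equal", 60, 60))
    return conditions

for number, (kind, tlw, tlr) in enumerate(condition_sequence(30), start=1):
    print(number, kind, f"TLW VR {tlw}", f"TLR VR {tlr}")
```

Calling `condition_sequence(90)` gives the order in which TLR is VR 90 first, and `condition_sequence(30, forced_choice=False)` gives the 9-condition order without forced choice.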

Equal Alternatives. The white terminal-link (TLW), which was always a VR 60 as described previously, and the red terminal-link (TLR) had identical response requirements during baseline and during the subsequent equal alternative conditions. The equal alternatives were designed to produce indifference during baseline, and to reveal the extent to which preferences persisted during subsequent returns to equal alternatives. The baseline condition lasted 10 sessions or until stable preferences were established (based on visual inspection of figures showing no changes in day-to-day variability; Sidman, 1960).

Unequal Alternatives. The red terminal-link (TLR) shifted from a VR 60 during baseline and the equal alternative conditions to either a VR 90 or a VR 30, depending on the condition. The purpose of these conditions was to create a preference for either TLW or TLR, which depended on the response requirement of TLR (either VR 30 or VR 90).

Forced Choice. Forced choice comprised 20 trials of exposure to TLW and 20 trials of exposure to TLR. The response requirement for both terminal-links was VR 60. With the exception of baseline, forced choice conditions always preceded equal alternative conditions. The purpose of forced choice was to control for the possibility that selection of the terminal-link preferred during unequal alternative conditions continued during equal alternatives and thereby produced a higher reinforcement rate for that alternative. That is, forced choice was intended to rule out an alternative explanation for the persistent preference observed during equal alternative conditions for pigeons A1, A2, A5, and A6.

Results

The number of TLW (the unchanging alternative) selections was determined and plotted for all subjects and conditions. Because the number of sessions differed across conditions, Figure 1 shows the first 5 sessions and last 5 sessions of each condition, separated by a line break, for pigeons A1, A2, A5, and A6.
As noted, both TLW and TLR were initially VR 60 schedules for all 4 pigeons. This condition continued until there was no preference for either of the two alternatives. As Figure 1 shows, it continued until a pattern of indifference emerged, that is, between 15 and 25 selections of each alternative out of a total of 40 cycles. In conditions 2, 4, 6, and 8, when the terminal-link response requirements were unequal (TLR was either VR 30 or VR 90), strong preferences were established for either TLW or TLR, depending on whether TLR was a VR 30 or a VR 90. These data are shown by the closed circles and squares in each panel. For pigeon A2, preferences were not strong until the last 5 sessions of conditions 6 and 8.
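The summary used for Figure 1 (first 5 and last 5 sessions of a condition) and the indifference range mentioned above can be sketched in Python as follows. The session counts are hypothetical and are not data from the experiment. The function names are also hypothetical, and treating the 15-to-25 range as a fixed cutoff is an assumption made for illustration. The authors describe indifference from visual inspection rather than from a formal criterion.

```python
from statistics import mean

def first_and_last_five(selections):
    """Return the first 5 and last 5 session counts of one condition."""
    return selections[:5], selections[-5:]

def classify(tlw_count, low=15, high=25):
    """Label a TLW selection count (out of 40 cycles).

    Counts from `low` to `high` inclusive are labeled indifferent; the
    cutoffs are an illustrative assumption, not a criterion from the study.
    """
    if tlw_count > high:
        return "TLW preferred"
    if tlw_count < low:
        return "TLR preferred"
    return "indifferent"

# Hypothetical TLW selections per session for one 15-session condition.
sessions = [31, 30, 28, 29, 27, 26, 27, 25, 26, 24, 27, 26, 28, 27, 26]
first, last = first_and_last_five(sessions)
print(classify(mean(first)), classify(mean(last)))
```

Comparing the label for the first 5 sessions with the label for the last 5 sessions is one simple way to express whether a preference persisted through the end of an equal alternative condition.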


[Figure 1: four panels (A1, A2, A5, A6). Y-axis: Number of VR 60 Selections (0 to 40). X-axis: Session. Legend: Baseline (VR 60/VR 60); VR 30/VR 60; VR 90/VR 60; After VR 30 (VR 60/VR 60); After VR 90 (VR 60/VR 60).]

Figure 1: The number of VR 60 (TLW) selections by subject A1 (upper left panel), A2 (upper right panel), A5 (lower left panel) and A6 (lower right panel), when TLR was VR 90, VR 30, or VR 60, depending on condition. Each data series consists of the first 5 and last 5 sessions of each condition, separated by a break. After baseline, each section shows either two unequal alternative or two equal alternative conditions for the purposes of comparison.


Persistent preference was evident with returns to baseline (conditions 3, 5, 7, and 9), when TLW and TLR had equal response requirements of VR 60. As the open squares and circles of Figure 1 show, preference persisted for all 4 pigeons during the first 5 sessions of each of these conditions, with A2 showing the weakest preference. What is most interesting is that, with the exception of A2, preferences persisted through the last 5 sessions of the equal alternative conditions. That is, choice selections were a function of preferences established during conditions 2, 4, 6, and 8. Had choice not been characterized by persistent preferences that were established during earlier conditions, data for conditions 3, 5, 7, and 9 would have been characterized by indifference, as during baseline (condition 1). Although preferences were repeatedly established by returning to the conditions that produced them, repeated returns to the contingencies that initially produced indifference did not eliminate choice preference.

The last ten sessions of baseline, where the terminal links were equal, are shown in the first section of the upper panels of Figures 2 through 5. Choice patterns during baseline were characterized by indifference for all subjects except X22, as the mean selection of TLW was typically between 17 and 26 of a total of 40 selection opportunities, depending on the subject. For instance, the mean number of TLW selections during baseline for Z27 (the lower left panel of Figure 2) was 17, indicating indifference. Pigeon X22 showed a preference for TLR (the lower right panel of Figure 5), as its mean number of TLW selections was 12, below that of the other subjects. In the upper panels of Figures 2 through 5, each section after baseline shows two data series, an unequal alternative condition (open circles) and the subsequent equal alternative condition (closed circles). The lower panels are bar graphs with the means of each data series calculated for each condition.
A solid black horizontal line at the baseline mean stretches the length of the lower panels to facilitate comparison between baseline and subsequent equal alternative conditions. Conditions 3, 6, 9, and 12 were forced choice, where selections were not made, and are therefore not shown.

In Figure 2, the upper panels show TLW selections for each session of each condition for pigeons Z27 and Z28. During conditions with unequal response requirements in the terminal links, both of these pigeons preferred the alternative with the lower response requirement. Persistent preference was evident with returns to baseline, when TLW and TLR had equal response requirements of VR 60. For pigeon Z27, preferences persisted during equal alternative conditions, as choice patterns were influenced by the unequal response requirements that immediately preceded each equal alternative condition. That is, rather than returning to indifference, choice patterns showed a preference for the alternative that had the lower response requirement during the previous condition. Although the preference was not typically as extreme as during the unequal conditions, it was as extreme during the last return to equal alternatives (condition 13).


[Figure 2: panels for Z27 (left) and Z28 (right). Upper panels: Number of VR 60 Selections (0 to 40) by Session, with the TLR/TLW requirements (30/60 or 90/60) marked above each unequal alternative condition. Lower panels: mean VR 60 selections by Condition (1, 2, 4, 5, 7, 8, 10, 11, 13) for Equal Alternatives (60/60), Unequal Alternatives (90/60), and Unequal Alternatives (30/60).]

Figure 2: The number of VR 60 (TLW) selections by subject Z27 (upper left panel) and by subject Z28 (upper right panel), when TLR was VR 90, VR 30, or VR 60, depending on the condition. The first section shows the baseline (equal alternative, closed circles), and subsequent sections include an equal alternative condition (closed circles). The other data series (open circles) in each section subsequent to baseline is an unequal alternative condition with response requirements for that condition indicated above the data. The lower panels show mean selections of VR 60 (TLW) in each condition by subject Z27 (lower left panel) and subject Z28 (lower right panel).

Pigeon Z28 also showed persistent preference as noted. However, preference was a function of the unequal response requirements during condition 2, when TLR was VR 30. After that, TLW selections never exceeded a mean of 17 (condition 13) during the equal alternative conditions. While this may be regarded as evidence of indifference, it was far from the mean number of TLW selections during baseline, which was 26 selections. This is noteworthy given that during conditions 5 and 11, when TLR was VR 90, there was a small but clear preference for TLW.

Figure 3 shows data for pigeons Z23 and Z24. Like Z27 and Z28, both Z23 and Z24 showed evidence of persistent preference with returns to equal alternative conditions. As can be seen in the upper and lower left panels, Z23 showed a preference shift toward the smaller response requirement during unequal conditions, and this preference persisted during equal alternative conditions. Preference was most extreme when TLR was a VR 30 (conditions 5 and 11), and the greatest effect on equal alternative preference followed these conditions (conditions 7 and 13). Pigeon Z24 (right panels) showed a preference for TLW after TLR was a VR 90 that persisted into condition 4. When TLR changed to a VR 30 (center section upper panel, condition 5 lower panel), preference shifted to that alternative. However, there was a return to indifference during the next equal alternative condition (center section upper panel, condition 7 lower panel). As can be seen in the session-by-session data (upper panels) and means for conditions (lower panels), a preference for TLR was again established during condition 11 and persisted during the subsequent equal alternative condition (condition 13).

[Figure 3: panels for Z23 (left) and Z24 (right). Upper panels: Number of VR 60 Selections (0 to 40) by Session, with the TLR/TLW requirements (90/60 or 30/60) marked above each unequal alternative condition. Lower panels: mean VR 60 selections by Condition (1, 2, 4, 5, 7, 8, 10, 11, 13) for Equal Alternatives (60/60), Unequal Alternatives (90/60), and Unequal Alternatives (30/60).]

Figure 3: The number of VR 60 (TLW) selections by subject Z23 (upper left panel) and by subject Z24 (upper right panel), when TLR was VR 90, VR 30, or VR 60, depending on the condition. The first section shows the baseline (equal alternative, closed circles), and subsequent sections include an equal alternative condition (closed circles). The other data series (open circles) in each section subsequent to baseline is an unequal alternative condition with response requirements for that condition indicated above the data. The lower panels show mean selections of VR 60 (TLW) in each condition by subject Z23 (lower left panel) and subject Z24 (lower right panel).

For pigeons X5, X6, X14, and X22, which had histories with interval schedules as terminal links, the results were mixed. Figure 4 shows data for pigeons X5 (left panels) and X14 (right panels). Pigeon X5 showed a preference for TLR during the second condition that persisted throughout the remainder of the experiment. Although preference shifts occurred when TLR was a VR 90, so that there were more TLW selections than when TLR was a VR 30, there were never as many TLW selections as during baseline. Likewise, although this shift persisted during the subsequent equal alternative conditions (conditions 7 and 13), there were not as many TLW selections as during baseline.

[Figure 4: panels for X5 (left) and X14 (right). Upper panels: Number of VR 60 Selections (0 to 40) by Session, with the TLR/TLW requirements (30/60 or 90/60) marked above each unequal alternative condition. Lower panels: mean VR 60 selections by Condition (1, 2, 4, 5, 7, 8, 10, 11, 13) for Equal Alternatives (60/60), Unequal Alternatives (90/60), and Unequal Alternatives (30/60).]

Figure 4: The number of VR 60 (TLW) selections by subject X5 (upper left panel) and by subject X14 (upper right panel), when TLR was VR 90, VR 30, or VR 60, depending on the condition. The first section shows the baseline (equal alternative, closed circles), and subsequent sections include an equal alternative condition (closed circles). The other data series (open circles) in each section subsequent to baseline is an unequal alternative condition with response requirements for that condition indicated above the data. The lower panels show mean selections of VR 60 (TLW) in each condition by subject X5 (lower left panel) and subject X14 (lower right panel).

Pigeon X14 showed a choice pattern of indifference during baseline. The first switch to unequal alternatives (TLR of VR 30) produced a preference for that alternative that persisted through condition 4, which involved equal alternatives. For the remainder of the experiment, pigeon X14 showed indifferent choice patterns. This was true even when TLR was a VR 90, when preference for TLW was predicted due to its smaller response requirement.

Figure 5 shows data for pigeons X6 (left panels) and X22 (right panels). The choice pattern of pigeon X6 was indifferent during baseline, and remained that way until condition 11, when TLR was a VR 30. Then, there was a preference for TLR. During the final equal alternative condition (condition 13), there was a return to indifference, but still with fewer TLW selections than in any other condition except condition 11. This provides slight evidence of influence from the immediately preceding unequal alternative condition.

[Figure 5: panels for X6 (left) and X22 (right). Upper panels: Number of VR 60 Selections (0 to 40) by Session, with the TLR/TLW requirements (90/60 or 30/60) marked above each unequal alternative condition. Lower panels: mean VR 60 selections by Condition (1, 2, 4, 5, 7, 8, 10, 11, 13) for Equal Alternatives (60/60), Unequal Alternatives (90/60), and Unequal Alternatives (30/60).]

Figure 5: The number of VR 60 (TLW) selections by subject X6 (upper left panel) and by subject X22 (upper right panel), when TLR was VR 90, VR 30, or VR 60, depending on the condition. The first section shows the baseline (equal alternative, closed circles), and subsequent sections include an equal alternative condition (closed circles). The other data series (open circles) in each section subsequent to baseline is an unequal alternative condition with response requirements for that condition indicated above the data. The lower panels show mean selections of VR 60 (TLW) in each condition by subject X6 (lower left panel) and subject X22 (lower right panel).

Pigeon X22 showed preferences during equal alternative conditions that were a function of the immediately preceding unequal alternative conditions. During baseline, this pigeon preferred TLR despite the fact that both terminal links were a VR 60. The first and third switches to unequal alternatives (TLR of VR 90, conditions 2 and 8) eliminated this preference, so choice was characterized by indifference. Interestingly, returning to equal alternatives (conditions 4 and 10) resulted in a slight preference for TLW, which is the opposite of baseline. During conditions 5 and 11, when TLR was a VR 30, X22 preferred this alternative slightly more than during baseline. The equal alternative conditions (conditions 7 and 13) that followed TLR of VR 30 also produced a strong preference for the red alternative. In fact, the strongest preference overall was during condition 13, the final equal alternative condition.

Discussion

Ten of the twelve subjects showed evidence of persistent preferences. The first group of pigeons (A1, A2, A5, and A6) all showed preferences during the equal alternative conditions that were affected by the immediately preceding unequal alternative conditions. With the exception of conditions 7 and 9 for pigeon A2, preferences persisted through the last 5 sessions of the equal alternative conditions. There were two experimental design concerns with this group of pigeons that were addressed with the two subsequent groups of pigeons. First, each condition with the exception of baseline was in place for 15 to 20 sessions. The other two groups of pigeons were exposed to 15 sessions per condition with the exception of baseline. Second, forced choice sessions were introduced after the unequal alternative conditions for the other two groups of pigeons to address an alternative to interpreting persistent preference as a history effect.
That is, one interpretation of persistence during equal alternative conditions is that the previously preferred alternative continues to yield a higher rate of reinforcement during equal alternative conditions by virtue of its higher rate of selection. The purpose of introducing the forced choice sessions was to ensure equal exposure to both VR 60 alternatives before persistent preference was examined during the equal alternative conditions.

Six of the eight remaining pigeons showed evidence of persistent preference. Preferences considered persistent can be characterized by two distinct patterns. The first pattern, shown by three of the six pigeons (Z24, Z27, and X22), consisted of preferences during the equal alternative conditions that were influenced by the preceding unequal alternative condition. That is, the alternative that was preferred during a particular unequal alternative condition was the same alternative preferred during the subsequent equal alternative condition. In addition, preference shifted for these three pigeons according to the response requirement of TLR during the unequal alternative conditions. The second pattern of persistence, demonstrated by the other three pigeons (Z23, Z28, and X5), involved a preference established by one unequal alternative condition that continued through subsequent equal alternative conditions. In fact, a preference was established for pigeon X5 that continued for the remainder of the experiment. It is important to note that terminal link changes were discriminable, as shifts in preference did occur. However, these shifts were not great enough for choice patterns during equal alternative conditions to be described as indifferent, as they were during the equal alternative baseline.

Pigeons X6 and X14 did not show persistent preferences. However, neither of these pigeons demonstrated any shifts in choice patterns with changes in terminal link response requirements. Interestingly, these two pigeons, along with X5 and X22, had histories with variable interval schedules as terminal links. None of the four pigeons showed preference shifts when there were changes in terminal links during those experiments.


It is possible that the history with interval schedules as terminal links contributed to insensitivity to terminal link changes during these experiments since none of the pigeons without such histories demonstrated the same insensitivity, while half of the pigeons did. Therefore, one could argue that the lack of shifts in preference with terminal link changes represents a strong history effect. The results from the present investigation expand the generality of history effects to include persistent preference. History effects have been described as a prolonged period of behavioral adjustment (Sidman, 1960). Behavior does not immediately adjust after a change in conditions, and contingencies must be contacted for some time before there is a change in behavior. What is of interest is that preferences persisted as long as they did. One might argue that 15 sessions is not sufficient exposure to assert that preferences were quite persistent. However, it was obviously long enough to produce preferences during unequal alternative conditions for all pigeons without prior histories with interval schedules as terminal links and half with such histories. References Ahearn, W., Hineline, P.N., & David, F.D. (1992). Relative preference for various bivalued ratio schedules. Animal Learning & Behavior, 20, 407-415. Bailey, J.T., & Mazur, J.E. (1990). Choice behavior in transition: Development of preference for the higher probability of reinforcement. Journal of the Experimental Analysis of Behavior, 53, 409-422. Catania, C.A. (1998). Learning. Upper Saddle River, NJ: Prentice Hall. Cohen, S.L., Pedersen, J., Kinney, G.G., & Myers, J. (1994). Effects of reinforcement history on responding under progressive-ratio schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 61, 375-387. Freeman, T.J., & Lattal, K. A. (1992). Stimulus control of behavioral history. Journal of the Experimental Analysis of Behavior, 57, 5-15. 
Field, D.P., Tonneaau, F., Ahearn, W., & Hineline, P.N. (1996). Preference between variable-ratio and fixed-ratio schedules: Local and extended relations. Journal of the Experimental Analysis of Behavior, 66, 283-295. Grace, R. G., Nevin, J. A., (1997). On the relation between preference and resistance to change. Journal of the Experimental Analysis of Behavior, 67, 43-65. Hernstein, R.J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243-266. Hernstein, R.J., & Loveland, D.H., (1975). Matching and maximizing on concurrent ratio schedules. Journal of the Experimental Analysis of Behavior, 24, 107-116. LeFrancois, J.R., & Metzger, B. (1993). Low-response-rate conditioning history and fixed-internal responding in rats. Journal of the Experimental Analysis of Behavior, 59, 543-549. MacDonall, J.S. (1988). Concurrent variable-ratio schedules: Implications for the generalized matching law. Journal of the Experimental Analysis of Behavior, 50, 55-64. Mazur, J.E., & Ratti, T.A. (1991). Choice behavior in transition: Development of preference in a free-operant procedure. Animal Learning & Behavior, 19, 241-248.


Mazur, J. E. (1992). Choice behavior in transition: Development of preference with ratio and interval schedules. Journal of Experimental Psychology: Animal Behavior Processes, 18, 364-378.
Mazur, J. E. (2001). Hyperbolic value addition and general models of animal choice. Psychological Review, 108, 96-112.
McLean, A. P., & Blampied, N. M. (1995). Resistance to reinforcement change in multiple and concurrent schedules assessed in transition and at steady state. Journal of the Experimental Analysis of Behavior, 63, 1-17.
McMillan, D. E., Hardwick, W. C., & Li, M. (2002). Drug discrimination under concurrent variable-ratio variable-ratio schedules. Journal of the Experimental Analysis of Behavior, 77, 91-104.
Neuman, P., Ahearn, W. H., & Hineline, P. N. (1997). Pigeons’ choices between fixed-ratio and geometrically escalating schedules. Journal of the Experimental Analysis of Behavior, 68, 357-374.
Nevin, J. A. (1974). Response strength in multiple schedules. Journal of the Experimental Analysis of Behavior, 21, 389-408.
Nevin, J. A., Mandell, C., & Atak, J. R. (1983). The analysis of behavioral momentum. Journal of the Experimental Analysis of Behavior, 39, 49-60.
Nevin, J. A. (1992). An integrative model for the study of behavioral momentum. Journal of the Experimental Analysis of Behavior, 57, 301-316.
Nevin, J. A., & Grace, R. C. (2000). Preference and resistance to change with constant-duration schedule components. Journal of the Experimental Analysis of Behavior, 74, 79-100.
Nevin, J. A., Grace, R. C., Holland, S., & McLean, A. P. (2001). Variable-ratio versus variable-interval schedules: Response rate, resistance to change, and preference. Journal of the Experimental Analysis of Behavior, 76, 43-74.
Ono, K. (2004). Effects of experience on preference between forced and free choice. Journal of the Experimental Analysis of Behavior, 81, 27-37.
Ono, K., Yamagishi, N., Aotsuka, T., Hojo, R., & Nogawa, Y. (2005). The role of terminal-link stimuli in concurrent-chain schedules: Revisited using a behavioral-history procedure. Behavioural Processes, 70, 1-9.
Poppen, R. (1982). Human fixed-interval performance with concurrently programmed schedules: A parametric analysis. Journal of the Experimental Analysis of Behavior, 37, 251-266.
Sidman, M. (1960). Tactics of scientific research: Evaluating experimental data in psychology. New York, NY: Basic Books.
Urbain, C., Poling, A., Millam, J., & Thompson, T. (1978). d-Amphetamine and fixed-interval performance: Effect of operant history. Journal of the Experimental Analysis of Behavior, 29, 385-392.
Wanchisen, B. A., Tatham, T. A., & Mooney, S. E. (1989). Variable-ratio conditioning history produces high- and low-rate fixed-interval performance in rats. Journal of the Experimental Analysis of Behavior, 52, 167-179.


Weiner, H. (1964). Conditioning history and human fixed-interval performance. Journal of the Experimental Analysis of Behavior, 7, 383-385.
Weiner, H. (1969). Controlling human fixed-interval performance. Journal of the Experimental Analysis of Behavior, 12, 349-373.

Author’s Note

Special thanks to Rich Willard of the instrument shop at Bryn Mawr College for his help in constructing experimental chambers. Additional thanks to workers in the experimental analysis of behavior laboratory for their help in the execution of this research.

Author Contact Information:

Paul Neuman
Psychology Department
Bryn Mawr College
101 N. Merion Ave.
Bryn Mawr, PA 19010
(610) 526-5011
[email protected]

Natalie Sheridan, M.A.
Immaculata University
911 Charleston Green
Malvern, PA 19355
Tel: (610) 745-2168
[email protected]

Elizabeth Kriso
2805 Olson Drive
Boulder, CO 80303
Tel: 267-257-6561


ASSESSING THE FUNCTION OF AGGRESSION IN PSYCHIATRIC INPATIENTS
Michael Daffern, BAT 8.1, 2007

www.behavior-analyst-today.com

Instructions: This article by Michael Daffern, which appeared in BAT 8.1, is available for one (1) Continuing Education credit for BACB Certificants. To earn the credit, read the article in that issue, answer the questions below, and submit your answers with payment. For details on receiving your CE Certificate for this (or any other) article, go to the end of this article.

Questions written by Michele Katz

1) Does staff understanding of the patient’s aggression impact their willingness to help that patient? YES NO
2) Traditionally, aggression has been considered:
   a)
   b)
3) What are the characteristics most strongly associated with Instrumental Aggression?
4) What are the characteristics of Anger Mediated Aggression?
5) When a patient’s aggressive behavior is labeled as instrumental, what reactions may occur in staff?
6) Research using ACF found that, across 502 aggressive behaviors occurring within a secure psychiatric hospital, there was little evidence supporting the proposition that aggression was used to obtain…?
7) In the same study, only 16 of the 502 aggressive behaviors were described as having…?
8) In ACF research, the assessment of function is likely to be less reliable than the recording of: _______________
9) What are the 5 functions of aggressive behaviors according to FRAB?
10) What is the indirect benefit associated with the focus on function?


OBHS is providing CE credits for reading articles from the Behavior Analyst Today (or other Behavior Analyst Online journals as available) web site and answering some questions to demonstrate that you have read the article. Each article provides one CE hour. The fee is $14.00 for one Continuing Education credit; for two or more CE credits (two or more articles), please send $12.50 per article. Go to www.behavior-analyst-today.com or www.behavior-analyst-online.org and click on the archives or past issues links to find articles available for Continuing Education hours for BACB Certificants. Your CE Certificate will be sent by mail.

Steps for obtaining CEs:

1. Read the article.
2. Click on the link, below, for the CE registration download form.
3. Write your responses to the questions directly into email or an MS Word document (reference your answers by question number).
4. Email them to: [email protected] (MS Word attachments are fine) or include a hard copy in step 5.
5. Mail your check for $14 per article, or $12.50 per CE for more than one article, payable to:
   Teresa Maxson
   Orlando Behavior Health Services, LLC
   185 Fabyan Rd.
   N. Grosvenordale, CT 06255
6. You will receive your CE Certificate in the mail. Each article is worth one Continuing Education hour unless otherwise specified.

Questions? Please call 860-315-7115 and ask for Dr. Teresa Maxson.

OBH(S), LLC is an approved Type 2 CE provider by the BACB, and maintains responsibility for its CE offerings.

Continuing Education Hours Form

1. Read the article(s) in the Behavior Analyst Today or any Behavior Analyst Online journal article offered for Continuing Education Credits (online – see links below).
2. Write your responses to the questions directly into email or an MS Word document (reference your answers by question number).
3. Email them to: [email protected] (MS Word attachments are preferred) or include a hard copy (typewritten please) of your answers in step 4.
4. Mail your completed form and a check, made out to Teresa Maxson, in the amount of $14 for one article, or $12.50 per article for two or more articles, to:
   Dr. Teresa Maxson
   Orlando Behavior Health Services, L.L.C.
   185 Fabyan Rd.
   N. Grosvenordale, CT 06255
5. You will receive your CE Certificate in the mail. Each article is worth one Continuing Education hour.

For questions or information, please call 860-315-7115. Please fill out this form and return with your check to Teresa Maxson. All information is strictly confidential and used only as necessary for providing CE Certificates. Please check the box below to indicate each article for which you are submitting answers.

Name
Street
City State Zip
Certificant Number
Email address
Phone
Check here if you do not want to be notified of other CE opportunities (seminars).

BAT Issue    Author               Article
6.3          Raimo Lappalainen    Functional Behavior Analysis of Anorexia Nervosa
7.2          Gary Bernfeld        Unified perspective on family systems – teaching family model
7.2          Gary Bernfeld        The struggle for treatment integrity in a "disIntegrated" service delivery system
7.2          Derek Hopko          Behavioral Activation for Anxiety Disorders
NEW! 8.1     Michael Daffern      Assessing the functions of aggression in psychiatric inpatients
