Wobegon Republicans and Test Score Inflation

by Anonymous

It would be difficult not to be aware of it. When this Republican think tank completes a project, the universe is quickly informed. An article published in a “peer-reviewed” education journal, we are told, verifies that we can trust the results of high-stakes tests.

Here’s what the fellow did. He gathered average test score data from several states. The states he chose were special in that they administered both high-stakes and low- or no-stakes tests statewide. He looked at the trends in average statewide scores of both types of tests. He judged that the trends were, essentially, parallel. That is, when year-over-year scores on the high-stakes tests rose, so did the year-over-year scores on the low- or no-stakes tests, and by approximately the same proportion. When the high-stakes test scores declined, so did the low- or no-stakes test scores, and by approximately the same proportion.
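For readers who want the comparison spelled out, here is a minimal sketch of that kind of parallel-trend check. It is not the think tanker's actual procedure, which the article does not need reproduced here; the function names, the one-percentage-point tolerance, and the score series are all made up for illustration.

```python
def year_over_year_changes(scores):
    """Proportional change between each pair of consecutive yearly averages."""
    return [(later - earlier) / earlier
            for earlier, later in zip(scores, scores[1:])]


def trends_are_parallel(high_stakes, low_stakes, tolerance=0.01):
    """True if both series move in the same direction every year and by
    roughly the same proportion. The tolerance is an assumed cutoff."""
    if len(high_stakes) != len(low_stakes):
        raise ValueError("both series must cover the same years")
    high = year_over_year_changes(high_stakes)
    low = year_over_year_changes(low_stakes)
    return all((h > 0) == (l > 0) and abs(h - l) <= tolerance
               for h, l in zip(high, low))


# Hypothetical statewide averages, invented for this example only.
high_stakes_avgs = [200.0, 204.0, 202.0, 206.0]
low_stakes_avgs = [50.0, 51.0, 50.5, 51.5]
print(trends_are_parallel(high_stakes_avgs, low_stakes_avgs))  # True
```

Note what the check does and does not establish: it shows only that two series move together. It says nothing about whether either series is an honest measure.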

This, he declared, is evidence that we “can trust” the results of high-stakes tests. Why?

Because “There is no reason for schools or students to manipulate the results of low-stakes tests.”

This latter statement is a long-standing assertion of testing opponents, of course – one they have made repeatedly over the years in the face of its rather obvious untruth. Its acceptance by many journalists, and apparently now even by the Republicans, is perhaps anti-testing research’s greatest triumph.

What they say is the exact opposite of reality, but they have continued saying it for over a decade. Given the normal parameters for public discussion of education research, where those who would dispute what they say are simply not allowed to talk, the testing critics have successfully managed to replace reality with their own artificial reality.

For those who do not understand my point, it’s time for a history lesson.

The United States has been, and remains, one of the few industrialized countries in the world without a multi-level, multi-target high-stakes testing system. For many years in much of the country students faced no systemwide high-stakes tests whatsoever. Nonetheless, standardized tests were widely administered – tests that carried no consequences for the students or teachers – no-stakes tests. These no-stakes tests were administered and scored, however, and those scores, and the trends in those scores, were reported by district superintendents to the public.

School districts purchased these norm-referenced standardized tests “off-the-shelf” from commercial test publishers. Left alone with no stakes and, thus, no one really paying any attention, local educators were free to manipulate any and all aspects of the tests. They could look at the test items beforehand, and let their teachers look at them, too. They could give the students as much time to finish as they felt like giving them. They could keep using the same form of the test year after year. They could even score the test themselves if they wanted.

The results from these internally-administered, no-stakes tests primed many a press release. Some superintendents engineered steady year-over-year improvements in their students’ average scores that had nothing to do with genuine increases in student achievement.

Left alone to manipulate the measures of their own performance, thousands of U.S. school superintendents performed magnificently.

Any new superintendent hired into a school district after a several-year run-up in scores from one of these test score pyramid schemes faced three choices – administer tests honestly and face the fallout from the resulting plunge in scores; continue the sleight-of-hand in some fashion; or declare standardized tests to be invalid measures of “real learning,” or some such, and discontinue the testing.

Many knew, of course, that numbers were being fudged but probably no one knew how pervasive the practice was until a West Virginia physician, John Jacob Cannell, published the results of his study of the problem. He was surprised that West Virginia students kept scoring “above the national average” on these national norm-referenced standardized tests, given the state’s low relative standing on other measures of academic performance.

He surveyed the situation in other states and with other national norm-referenced tests and discovered that the students in every state were “above the national average,” according to their norm-referenced test scores. The phenomenon was quickly dubbed the “Lake Wobegon Effect,” in tribute to the mythical radio comedy community of Lake Wobegon, where “all the children are above average.”

The Cannell report remains our country’s most compelling indictment of education system self-evaluation. It suggested that half the school superintendents in the country were lying about their schools’ academic achievement. It implied that, with poorer results, the other half would probably lie, too.

The cure for test score inflation was obvious to everyone – externally-administered, high-stakes testing. An agency external to the local school district would be responsible for administering the tests under standardized, monitored, secure conditions, just the way it is done in many other countries. The tests would have stakes, so that students, parents, teachers, and policy makers alike would take them seriously, and adequate resources would be invested toward ensuring test quality and security.

It hardly needs to be said, however, that most education insiders, then and now, prefer the old internally-administered no-stakes tests, over which they maintain complete control, and dislike external high-stakes testing, over which they do not.

The solution to the Lake Wobegon problem, for education insiders, was to frame the innocent and keep the guilty free. Their solution was to blame the solution for the problem and to make the problem the solution. As improbable as this sleight-of-logic would seem, apparently they have managed to pull it off.

In the artificial reality of testing opponents, internal no-stakes tests – the source of the Lake Wobegon Effect – are no longer susceptible to score inflation because, as the Republican think tanker puts it: “There is no reason for schools or students to manipulate the results of low-stakes tests.”

Now, in the artificial reality, score inflation is, instead, endemic to externally and securely administered high-stakes tests. You know, tests like the SAT (for which average scores have declined over the past several decades).

J.J. Cannell has long since moved on to other matters. It is just as well. If he knew, he would be mighty disappointed with his legacy – the very people and fraudulent practices he criticized are now largely supported by a perverted interpretation of his work.

The Republican think tanker obviously believes that he has scored a point for the good guys, providing empirical support for the use of high-stakes tests. But, his method is valid only within the artificial reality of anti-testing research where a strong alignment between standards, instruction, and test content is considered a bad thing (i.e., teaching to the test) and the scores, and score increases, of internally-administered and unmonitored no-stakes tests are as pure as polished porcelain.

The think tanker’s work has achieved some victories, but for the wrong side:

· he provides Republican concurrence with testing critics’ assertion that external high-stakes testing causes score inflation, and that internal no-stakes testing does not;

· he provides Republican concurrence with testing critics’ assertion that “teaching to the test” (which occurs naturally with good alignment) is a bad thing, and is measurable; and

· he provides Republican concurrence with testing critics’ assertion that it is legitimate to measure the “real” score increases of high-stakes standards-based tests only with an unrelated no-stakes shadow test (that, likely, will be less than a fifty percent match to those standards).

Testing critics make much of the notion that the domain of humanity’s knowledge base is infinitely vast and any standardized test samples only a tiny fraction of that domain. Then, sometimes in the same speech or article, they will declare that high-stakes tests should be considered invalid if their scores, or score increases, do not “generalize” to other unrelated and dissimilar tests that sample a different tiny fraction of the vast domain.

Granted, scores on different, general tests of ability or achievement often parallel each other very closely, as intelligence researchers point out. But, it is naive to expect this from all standards-based testing. Medical students at the end of their surgical training are not likely to perform very well on a linguistics exam. More importantly, it is not fair to expect them to.

There’s more. The think tanker also provides Republican concurrence with testing critics’ assertion that there is no correlation among high stakes, increases in motivation, and increases in achievement, in the manner explained below.

Most anti-testing research defies common sense. For example, do students and teachers try harder and learn more when facing a test with stakes than one without? Ask any Joe on the street and you will be told that of course they do. Surveys of students and teachers corroborate the notion. High stakes induce achievement gains, not only through stronger alignment of standards, instruction, and tests, but through increased motivation.

Controlled experiments from the 1960s through the 1980s tested the hypothesis (as if that really needed to be done). Half of the students in a population were assigned to a course of study and told there would be a final exam with consequences (reward or punishment) riding on the results. The other half were assigned to the same course of study and told that their performance on the final exam would have no consequences. Guess which group studied harder and learned more?

The Republican think tanker has now joined with testing opponents in ruling out the possibility of motivation-induced achievement gains. With his methodology, any increase in scores on a high-stakes test exceeding increases in an unrelated parallel no-stakes test must be caused by “teaching to the test,” and is, thus, an artificial and inflated score gain, not evidence of “real learning.”

It’s unfortunate. Our country desperately needs a group to disseminate accurate education research and information, a group with the resources and power to match the vested interests in reaching public forums. The Republican think tanks seemed the most likely candidates. Sadly, however, they appear to believe most of the anti-testing fallacies themselves.