Have Testing and Shaming Improved Schools?

The past 10 years have seen a huge shift in educational focus.  The schools you and I attended were not test-driven and were not shamed and threatened by the federal government.  Since the advent of school report cards, with their harsh penalties and the intense media criticism of schools that followed, teachers have had to focus on the subjects tested (primarily reading and math) to the exclusion of other curricula.  If this school improvement strategy were working, we could expect clear data supporting it.

Instead, the data-driven policy wonks who continue to push testing and shaming ignore the data, even their own much-heralded test data.  Many would argue that test results, particularly now that students are instructed in how to take multiple-choice tests and practice test-taking ad nauseam, are an inadequate indicator of learning.  Nevertheless, it is the language the policy wonks understand.

State tests vary considerably from one state to another.  But there are three national and international tests with sufficient longitudinal data showing American students' test performance in reading, math and science from 2000 (just prior to NCLB and the year of the first school report cards). 

The National Assessment of Educational Progress is a national exam administered at three grade levels.  It has been given since 1971, but I've focused on the past 10 years.  You can see the full results by following the link below.

I've highlighted in green if the most recent score was higher than before NCLB, and in red if the most recent score was lower.  

National Assessment of Educational Progress (NAEP)

Math Test Scores 1999 2004 2008
Grade 12 308 307 306
Grade 8 276 281 281
Grade 4 232 241 243

Reading Test Scores 1999 2004 2008
Grade 12 288 285 286
Grade 8 259 259 260
Grade 4 212 219 220

The NAEP does show gains at 8th and 4th grades, though the 8th grade gains were minor.  At 12th grade, students scored worse after 8 years of instruction focused on testing in these two subjects.

There are two international tests comparing multiple countries, both of which focus on math and science -- PISA and TIMSS.  The PISA also includes reading.  I've extracted only US test score data, but you can look at the international comparisons by following the link below.

Program for International Student Assessment (PISA)

Math Test Scores 2000 2003 2006 2009
Grade 10 493 483 474 487

Reading Test Scores 2000 2003 2006 2009
Grade 10 504 495 -- 500

Science Test Scores 2000 2003 2006 2009
Grade 10 499 491 489 502

The PISA, which has received tremendous media attention for the US's poor performance relative to other OECD countries, shows that all of our laser focus on reading and math testing has actually worsened our test performance.  It's important to note that while the NAEP tests students according to the national standards and formats familiar to students who've practiced their state tests, the PISA is an entirely different test involving a slightly different range of skills.  Test-drilling in the classroom yields narrow gains in the specific items and formats to be tested.  Real teaching gives broader results.

The other international test is the TIMSS, which was given this year (2011).  Unfortunately, 2011 results are not yet available, but we do have 2007 results to show gains and losses since 1995, the first year of the test.  (I included 1995 because 4th graders were not tested in 1999.)

Trends in International Math and Science Study (TIMSS)

Math Test Scores 1995 1999 2003 2007
Grade 8 492 502 504 508
Grade 4 545 -- 518 529

Science Test Scores 1995 1999 2003 2007
Grade 8 534 515 527 520
Grade 4 565 -- 536 539

Looking at the cumulative results from these three tests, I note the following:

Cumulative Results (All 3 tests)

Total Gains: 6
Total Losses: 7
Total Points Gained: 44
Total Points Lost: 70
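For readers who want to check the arithmetic, the tallies follow directly from the tables above: compare each test/subject/grade series' earliest score to its latest, count gains and losses, and sum the point changes. A short script (the score pairs are simply copied from the tables printed above) makes the bookkeeping explicit:

```python
# Tally gains and losses by comparing the earliest and latest available
# score in each series from the NAEP, PISA, and TIMSS tables above.
series = {
    # (test, subject, grade): (earliest score, latest score)
    ("NAEP", "Math", 12): (308, 306),
    ("NAEP", "Math", 8): (276, 281),
    ("NAEP", "Math", 4): (232, 243),
    ("NAEP", "Reading", 12): (288, 286),
    ("NAEP", "Reading", 8): (259, 260),
    ("NAEP", "Reading", 4): (212, 220),
    ("PISA", "Math", 10): (493, 487),
    ("PISA", "Reading", 10): (504, 500),
    ("PISA", "Science", 10): (499, 502),
    ("TIMSS", "Math", 8): (492, 508),
    ("TIMSS", "Math", 4): (545, 529),
    ("TIMSS", "Science", 8): (534, 520),
    ("TIMSS", "Science", 4): (565, 539),
}

# A "gain" is any series whose latest score exceeds its earliest; a "loss"
# is the reverse.  Point totals are the sums of the absolute changes.
gains = [last - first for first, last in series.values() if last > first]
losses = [first - last for first, last in series.values() if last < first]

print("Gains:", len(gains), "totaling", sum(gains), "points")
print("Losses:", len(losses), "totaling", sum(losses), "points")
# → Gains: 6 totaling 44 points
# → Losses: 7 totaling 70 points
```

The six gains and 44 points gained match the cumulative table; the same method yields seven losses totaling 70 points.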

If our diversion to teaching to tests and pressuring schools over test results is scientifically sound, we have ten years of experience to prove or disprove it.  My former superintendent observed that you can't fatten a pig by weighing it.  It appears lecturing and threatening the pig before it steps on the scale don't work either.

It's time for the data-driven crowd to start reviewing their results.