
Perspective

Potholes in the Road to Proving Technology

By Robert Tinker

News Flash: Bridges Found Useless!

A research study just released by Professor Slam A. Bridge of the University of Southern North Dakota shows that bridges have no value. In side-by-side comparisons, the study proved that bridges have no advantage over roads. Researchers gave drivers the option of selecting either a bridge or a straight road. To make a fair comparison, a bridge and a road of the same length and height were placed next to each other. Both were capable of handling the same traffic. So as not to give the bridge an advantage, no river or other obstacle was placed in their path. Professor Bridge found no significant difference in most measures: transit time, accidents, or preferences.

Educational technology is getting a bad name because of some bad research. Two varieties of bad research have recently received far too much press. These illustrate the “hobbled horse race” and “trivial treatment” fallacies.

The hobbled horse race

The ersatz bridge vs. road story is an example of the horse race genre of media research. In an attempt to be “scientific,” researchers hold everything constant except the presence of computers and then look to see whether kids learn more. There is nothing magic about a computer. Its presence does not confer any inherent advantage, so this is setting up a senseless comparison.


In one early study of the value of probes, researchers designed two versions of the classic middle school cooling-curve lab. The experiment involves placing mothballs in a test tube, melting them, and then cooling them in air. The temperature of the mothballs is measured while they cool and a graph is drawn. At the melting temperature, the cooling curve shows a distinct plateau that can be related to the energy released on forming a solid. The computerized version of this experiment can be done with smaller amounts, speeding up the process and allowing more experiments in the same time it takes to generate one graph by hand. The extra experiments can be used to show what a cooling curve looks like without a phase transition, so that when a plateau is observed, students realize that it is something to be explained. Experiments with other substances and mixtures can further enrich the probe-based lab.

None of this, however, is “fair.” The researchers in question wanted everything held constant, so exactly the same experiment was done with and without the computer at the same pace with the same amount of mothballs. Furthermore, to ensure reproducibility, the students were drilled in exactly what computer options to use, what buttons to press, and shown what data to expect. Both versions of the experiment were very “cookbook,” and students were repeatedly warned not to deviate from the procedures because of possible danger to themselves, the computers, or the experimental results. The results were inevitable—no significant difference in student learning.

More recently, it has become popular to question the value of computer-generated animations. A recent issue of Education Week (Viadero, 2007) presented this as a debate. The anti-animation viewpoint was represented by Barbara Tversky, Ph.D., whose review of research (see, e.g., Tversky et al., 2002) has a strong similarity to the bridge and probeware stories.

She concludes, “Yet the research on the efficacy of animated over static graphics is not encouraging. In cases where animated graphics seem superior to static ones, scrutiny reveals lack of equivalence between animated and static graphics in content or procedures; the animated graphics convey more information or involve interactivity” (summary, p. 247).

In reviewing the literature, Tversky throws out all studies in which students interacted with an animation or where there was more information in the animation than in an equivalent static graphic.

“In order to know if animation per se is facilitatory, animated graphics must be compared to informationally equivalent static graphics. That way, the contributions of animation can be separated from the contributions of graphics alone without confounding with content” (p. 251).

The only conclusion was that when animations and graphics are equivalent, there is no significant difference in learning. While this may be “fair” from a research perspective, it does not establish that computer animations are ineffective, only that when they are hobbled to match the capacity of static graphics, they are no better.

The more interesting question is whether highly interactive animations result in more effective learning of difficult concepts. Are there things that simply cannot be taught other ways? It seems obvious, for instance, that the intimate knowledge that students gain of the atomic world through experimentation with the Molecular Workbench software (see p.12) or learning about molecules by “roving” around them (see p.14) would be hard to duplicate with static drawings. It is difficult to imagine how a comparison with drawings would be fair, given how immediate, interactive, and flexible the software is. It conveys more information and provides many more opportunities for learning.


News Flash: Lawn Fertilizer Found Useless!

In a careful, $10 million study of the effectiveness of fertilizing lawns, no effect was found. “This proves that fertilizers should be banned,” said a leading environmentalist, Dr. I. M. Phony. The study involved 10,000 homes in nine different climates divided at random into fertilized and non-fertilized lawns. After a year, no significant difference was found in grass growth between treated and non-treated lawns as reported by the highly respected researchers who used sophisticated statistics. (Because of practical and cost issues, the amount of fertilizer in the treatment was limited to 17 ounces per acre per year.)

The trivial treatment

The fertilizer study sounds impressive until that last sentence sinks in. The amount of fertilizer is minimal. Of course they didn’t see an effect, but the problem isn’t the fertilizer or the statistics; it is the study design. The researchers didn’t apply a large enough treatment to have an effect. The only correct conclusion is that you need more fertilizer than they used to cause measurable growth. This is an example of the “trivial treatment” fallacy.

In April 2007, a $10 million Department of Education study of 15 software products made front-page news across the country. A headline in the Washington Post read, “Software’s Benefits on Tests in Doubt: Study Says Tools Don’t Raise Scores.” The congressionally mandated study included nearly 10,000 students in more than 130 schools randomly selected by teacher. The major finding, as the Post headline suggested, was that test scores were not significantly higher in classrooms where the software products were used. This sounds like a body blow to educational technology until you realize that this study had a trivial treatment.

Teachers typically used the software about 10% of instructional time over the course of a year, which is much less than recommended by the vendors that supplied the products being tested. For example, students used the sixth grade math products for about 17 hours per year. Over 180 school days, that would average less than six minutes daily, a trivial treatment.
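The arithmetic behind that daily figure is easy to check. The short Python sketch below uses only the two numbers quoted above (17 hours per year and 180 school days). The function name `minutes_per_day` is ours, introduced for illustration, and the calculation assumes the software use was spread evenly across the school year.

```python
# Figures quoted in the article for the sixth grade math products.
HOURS_PER_YEAR = 17   # reported annual software use
SCHOOL_DAYS = 180     # school-year length used in the article


def minutes_per_day(hours_per_year: float, school_days: int) -> float:
    """Average daily exposure in minutes, assuming use is spread evenly
    over every school day (an illustrative simplification)."""
    return hours_per_year * 60 / school_days


daily = minutes_per_day(HOURS_PER_YEAR, SCHOOL_DAYS)
print(f"{daily:.1f} minutes per school day")
```

Run as written, this prints 5.7 minutes per school day, consistent with the "less than six minutes" estimate.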

The Department of Education made another huge error in the design of the study: they used multiple-choice tests. To see a small effect, you need a sensitive instrument. Good software is particularly valuable in producing gains in higher-order thinking skills, which are notoriously difficult to measure with multiple-choice tests. Research from the Technology Enhanced Learning of Science Center that we co-founded has recently shown convincing gains in student thinking as a result of well-designed activities (Linn et al., 2006). As in the DoE study, the tests were delivered to large numbers of students at the end of a year during which students had 10-20 hours of exposure to the treatment. The instrument had both multiple-choice and open-response items. An analysis of the multiple-choice items showed no effect, but significant gains were visible when the open-ended responses were analyzed with a rubric that looked for correct ideas that were linked meaningfully, which is a practical definition of higher-order thinking. Had the DoE researchers used similarly well-designed open-ended questions and scoring rubrics, they might have had a chance at seeing some effect.

Average readers will conclude from the publicity generated by the Department of Education study that educational technology is useless. They will use the study to support opposition to school funding for computers and other technology.

Twenty-five years ago, a study of computer-assisted instruction examined classrooms in which students used software for either 10 or 20 minutes per day, and found positive effects on test scores. Why did we need to spend $10 million to learn again that it takes significant amounts of time to increase students’ test scores?

What’s needed

If thoroughbred technology is not hobbled, it can sweep past the plow horses educators have been using. In the right conditions, with well-designed, highly interactive software, probeware enhances student explorations and animations give unparalleled educational value. We do need more studies to demonstrate this to skeptics, but not studies that use minuscule treatments and poor measurement instruments.


Robert Tinker (bob@concord.org) is President of the Concord Consortium.

@Concord





References

Viadero, D. (2007, January 31). Computer animation being used to bring science concepts to life: Evidence of learning gains remains sparse. Education Week, pp. 12-13.

Tversky, B., Morrison, J. B., & Betrancourt, M. (2002). Animation: Can it facilitate? International Journal of Human–Computer Studies, 57, 247-262.

Paley, A. R. (2007, April 5). Software’s benefits on tests in doubt: Study says tools don’t raise scores. Washington Post, p. A01.

Linn, M., Lee, H.-S., Tinker, R., Husic, R., & Chiu, J. (2006). Teaching and assessing knowledge integration in science. Science, 313, 1049-1050.

Note: Thanks to Andy Zucker for his analysis of the Department of Education study.