We’ve discussed the results from a number of meta-analytic studies looking at the general relationship between some kind of Intervention and any kind of Behavior Change. I consider this evidence probative for persuasion research and practice: almost all of the Interventions employ communication as a crucial part of the treatment, they all look at a Change, whether in observed or self-reported thoughts, feelings, or actions, and they are never lab studies, but always applied in field settings, even when they include randomization, control, or comparison groups.
The most interesting outcome for me is how similar the results are to the first intervention metas ever reported, the ones in the late 1990s from Leslie Snyder’s team on communication-based interventions. Leslie found a Small Windowpane of 46/54, an r of 0.09 (d = 0.18) with considerable heterogeneity of variance indicating a non-normal distribution of effect sizes.
More recently we saw Blair Johnson’s team conduct a meta-meta that came in with an effect size of d = 0.21, again with quite a bit of heterogeneity of variance, and then a meta of Internet-delivered interventions with a d = 0.16, again with quite a bit of heterogeneity of variance. Now, I’ll add (very briefly) two more Intervention metas from a team led by Vicki Conn. They looked at the impact of workplace interventions on physical activity and found a d = 0.21, then ran a grand meta of any kind of intervention (workplace included) on physical activity and found a d = 0.19, and, hey, stop me if I’ve mentioned this before, in both metas the Conn team found quite a bit of heterogeneity of variance in the effect sizes.
Do you see a pattern, folks?
Yeah, a d of approximately 0.20 or a Small Windowpane effect of 45/55. And seriously non-normal distributions around that Small d effect.
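If you want to check that arithmetic yourself, here’s a minimal sketch (my own code, not anything from the cited papers) that converts a d into an r and then into the Windowpane split, assuming the Windowpane is the usual binomial effect size display of 50 ± 100r/2 per hundred people:

```python
import math

def d_to_r(d):
    """Convert a standardized mean difference d to a correlation r
    (equal-group-sizes approximation)."""
    return d / math.sqrt(d ** 2 + 4)

def windowpane(r):
    """Binomial Effect Size Display: 'changed' rates per 100 people,
    50 - 100*r/2 for the comparison group, 50 + 100*r/2 for treatment."""
    return round(50 - 100 * r / 2), round(50 + 100 * r / 2)

for d in (0.18, 0.16, 0.19, 0.20, 0.21):
    r = d_to_r(d)
    lo, hi = windowpane(r)
    print(f"d = {d:.2f}  ->  r = {r:.3f}  ->  Windowpane {lo}/{hi}")
```

Run it on the d values from the metas above and every one of them lands in the 45/55 to 46/54 range, which is the point: they are all the same Small effect.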
The mean and distribution simply do not change in a meaningful theoretical or practical way with any obvious Intervention characteristic (communication, workplace, pharmacological; large or small sample size; carefully controlled field trial versus wild community rollout) or any obvious Change characteristic (the domain, like physical activity or medical compliance; observed versus self-reported change). When folks attempt to change behavior in the real world and measure it scientifically (in other words, a program plus its evaluation), they average a Small effect, but the results vary wildly and with a decidedly skewed distribution.
What are the implications of these repetitive and consistent research results?
1. Look at the variance, not the mean.
In this instance the distribution is more informative than the average. The only way the recurring heterogeneity can occur is if the left side of the distribution is seriously different from the right side, which is another way of saying, I think, that most interventions are badly done. I’m going out on a limb here and generalizing only from Leslie’s data, since that is the only paper in this bunch that has allowed me to determine the distribution. Authors and editors are not publishing stem-and-leaf diagrams or a complete listing of all the effects – just the summary tables. Assuming that Leslie’s meta is like everyone else’s, then most interventions are coming in at a practical zero effect even if they can report statistically significant change. Fewer than a third are reliably producing a practical change.
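To make that concrete, here’s a toy simulation – my invented numbers, not Leslie’s or anyone else’s data – in which 30 hypothetical studies sit at practical zero and 10 are well run. The pile averages out to a Small d while throwing off serious heterogeneity:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 40                                             # hypothetical studies
d = np.concatenate([rng.normal(0.05, 0.05, 30),    # ~practical zero effects
                    rng.normal(0.60, 0.15, 10)])   # a few well-run interventions
n = rng.integers(50, 300, size=k)                  # per-group sample sizes
var = 2 / n + d ** 2 / (4 * n)                     # approx. sampling variance of d
w = 1 / var                                        # inverse-variance weights

d_bar = np.sum(w * d) / np.sum(w)                  # weighted mean effect
Q = np.sum(w * (d - d_bar) ** 2)                   # Cochran's Q
I2 = max(0.0, (Q - (k - 1)) / Q) * 100             # I^2: % of variance beyond chance

print(f"mean d = {d_bar:.2f}, Q = {Q:.1f} on {k - 1} df, I^2 = {I2:.0f}%")
```

The average looks like every meta in the list above; the Q and I² scream that the average is hiding two very different piles of studies.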
Think about this. Don’t glaze over with the Brooks Effect. A small average effect hides many badly done interventions and a few well done ones.
2. d = 0.20 is not your standard.
Ever since Leslie’s meta, I’ve seen and heard some people treating that number as some kind of gold standard for achievement. Typically this occurs when the person is also reporting that their intervention was within range of that average, and almost always that means their effect size is d = 0.14 or d = 0.09. If you think that d = 0.20 earns you a gold star, you do not understand the research. If your intervention is within the 95% CI of the average, then you ran a weak intervention. Properly run interventions do better.
Given that wild variance in the distribution of effect sizes, the average is a highly suspect and misleading statistic. There is clearly Something Else going on. Any researcher publishing in the peer-reviewed literature who notes she hit the average is saying too much about how little she knows.
3. What elements in interventions moderate heterogeneity?
Given all that heterogeneity, researchers have tried to partition effects into categories (called moderators) to see if they can explain all that variance. Essentially this is a button-sorting task just like the one my Great-Grandmother Mattie gave me on rainy days. Take a messy pile of buttons, sort them into categories (size, color, shape), and you don’t have a messy pile but an orderly theory of buttons!
If you scrutinize the moderator analyses from these metas, you find no consistent pattern. Most moderators address only a subset of effects, and even within that subset, large variance remains. Rarely does a moderator reduce the variance within its subset to functional zero. And if you then look across different metas at the same moderator, you find different outcomes: in one meta the moderator produces functional zero variance within its subset, while in the others it still leaves large variance.
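Here’s the button-sorting task in code, again with made-up effects: take the same toy pile from the earlier sketch, sort it by an arbitrary moderator (delivery channel, say), and check whether the within-category heterogeneity actually shrinks.

```python
import numpy as np

rng = np.random.default_rng(7)
k = 40
d = np.concatenate([rng.normal(0.05, 0.05, 30), rng.normal(0.60, 0.15, 10)])
n = rng.integers(50, 300, size=k)
var = 2 / n + d ** 2 / (4 * n)
w = 1 / var
# An arbitrary moderator that has nothing to do with implementation quality.
channel = rng.choice(["print", "internet", "face-to-face"], size=k)

for level in np.unique(channel):
    m = channel == level
    d_bar = np.sum(w[m] * d[m]) / np.sum(w[m])
    Q = np.sum(w[m] * (d[m] - d_bar) ** 2)
    print(f"{level:13s} k = {m.sum():2d}  mean d = {d_bar:.2f}  Q = {Q:6.1f} on {m.sum() - 1} df")
```

Each subgroup keeps roughly the same messy spread as the whole pile, because the sorting variable isn’t the thing driving the mess.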
Moderator analyses to date have never included some kind of quality control that measures how well a team actually implemented what it claimed to implement. My experience tells me that many teams either do not properly implement the theory or maintain poor process control of the intervention as it rolls along. In either case, you’ve got a bad intervention.
It’s been my recurring experience to see teams divide the intervention into the theory side and the execution side, and the two sides aren’t talking with each other. The theory group devises a great description of TpB, then hands it off to the creative group, which thinks TpB stands for a new strain of tuberculosis and then does what it knows best.
The best way to see this is to read their description of the “theory” or “concept” they’re using and then how they operationalize that. The communication interventions are the best for this. They claim to use Planned Behavior or Social Cognitive or Whatever, but then all you have to do is look at the messages and realize they clearly have no idea what a Norm is or what Self Efficacy is. I don’t need to Name Names here because the test is so simple. If somebody reports an effect size for their intervention that is below d = 0.20, they did something wrong.
If you’ve ever worked on any kind of intervention for any kind of behavior change, whether for profit, charity, research, greed, or just plain curiosity, you’ve seen the myriad ways it can go bad. Interventions are almost always run by teams, and anytime you have teams you have opportunities for miscommunication, misunderstanding, or mere incompetence. Team members typically trust each other’s credentials, as if a degree or prior experience means you don’t have to ride herd on the process.
On the academic side the worst problem I’ve seen is that bad interventions carry no punishment for the researchers who run them. As long as they can publish reports and write grant applications, they’re golden. You can publish a d = 0.05 as long as your sample size is big enough to drive the p-value under alpha so you can holler statistical significance, thereby fooling many reviewers and editors into publishing your failure. You can also offer many wise observations for Future Research, and then you’re off to the next grant application. NIH study sections clearly give little regard to prior effect sizes when reviewing the next application. As long as the rationale is interesting (or Important), they’ll gladly ignore the bad work in the past.
This sounds like a propeller-head observation, as if I’m a judge at a beauty contest finding fascination in the thread count of the bikini rather than the body in the bikini, but in this case the bikini’s construction is crucial, more like those swimsuits made from NASA materials that Olympic swimmers can no longer wear. Stated more bluntly, people don’t know how to dress their intervention bodies. And that’s being kind. Most interventionists don’t even have the body to begin with, much less a well-designed and well-built bikini.
Scientists are not nearly as smart as we’d like to believe we are. We persist in proven failures (Health Belief Model? Prospect Theory and Framing in Health? Learning Styles? Discovery Learning?), waiting only until all the advocate researchers have departed before we move away from the scene of the crime. Metas have found a lot of dead bodies, but the Zombies persist. This post, a well-aimed shot, is not the Silver Bullet, and the unDead will continue to ravage the literature and study sections for many years to come.
For all you persuasion mavens who never forget It’s About The Other Guy and All Bad Persuasion Is Sincere, move past these metas:
1. Stone cold and simple model of effect.
2. Stone cold and efficient implementation of that model.
3. Ruthless evaluation – take no prisoners, hostages, or captives.
4. Always save your wounded, but bury your dead.