The Windowpane
Transparent Statistics

Persuasion science uses statistics.  Some people want to run away screaming anytime math enters the equation, but I want to show you a way of using numbers in a simple and straightforward way that helps you understand what’s really happening.  It’s called the “effect size” and it answers a simple question:  How big of a difference is it?  An outstanding psychologist named Jacob Cohen articulated this concept.  Another excellent psychologist, Robert Rosenthal, developed the idea further.  I’m going to combine their ideas and put them in the Windowpane Display.  Here’s how the Windowpane works.

The Windowpane

Think about a window.  Imagine that it is divided into four equal panes.  Easy to visualize, right?  Those 2 by 2 panes.  Now, let’s say we do an experiment where we randomize one group of people to get the New Thing while another group of people get the Old Thing.  To make the math simple, we’ll give 100 people the New Thing (treatment group) and 100 people the Old Thing (control group).  After each group does its Thing, we carefully observe each person to see if they Changed.  Either they did Change or they did Not Change.  Let’s dress up the Windowpanes with these labels.


Pretty simple so far.  We’re testing the New Thing against the Old Thing.  We have 100 people randomly assigned to each group.  We then see how the people Change either into Yes or No.  Now, let’s fill in each of the four little windowpanes to demonstrate different scenarios.

We’ll start with failure which is what usually happens with science.  All our good intentions are smashed against the Rock of Experimental Science and nothing happens.  Let’s be polite and call this the No Effect outcome rather than use the words scientists use when looking at the stat results on the screen and realizing their next grant application just died.  It looks like this.


We’ve got 50 people in each little windowpane.  To understand what’s happening, read each row.  We started with 100 people in the treatment condition who got the New Thing and when we observed them we found that 50 of the 100 changed and 50 of the 100 didn’t change.  We also started with 100 people in the control condition who got the Old Thing and when we observed them we found 50 of the 100 changed and 50 didn’t.  No effect.  Nada.  Zip.  The New Thing is not different from the Old Thing.  [Side Bar:  If you’re a stat maven or just pretty quick you know that failure would result if both rows were 10/90 or 30/70 or even 90/10, anything as long as both rows have the same percentage.  Failure is not just 50/50, but rather when both rows show the same finding.  I used 50/50 because it makes the math and the concept easier to follow.]

Now, let’s create an example where we start to get differences.  Let’s assume that Something Happens when people get the New Thing and it looks like this.


We now see on the rows and the columns, a 45/55 effect, a 10 point difference.  In social science parlance, this 10 point difference is called a “small” effect as popularized by Jacob Cohen in his work on power analysis and effect sizes.  Make sure that you see the impact of the treatment.  Notice in this example that more people who get the New Thing showed the desired change (read the row) compared to people who got the Old Thing (read their row).

Now, let’s increase the effect size.  Let’s go from “small” to “medium.”  Here’s the Windowpane for a medium effect.


Now, our row values are 35 and 65.  A moderate effect is a 30 point difference.  That sounds somewhat impressive, a 30 percentage point difference.  Think about this medium effect another way.  Notice that 65 is almost twice as large as 35.  A medium effect means that you’re getting almost twice as much change in the treatment group compared to the control group.  A medium effect is getting to be pretty obvious.  Think how obvious a “large” effect must be.  It looks like this.


The row values here are 25 and 75, a 50 point difference.  Now the rate of change is three times as high, with the Treatment producing 300% of the Control’s change (a 200% increase).  That’s big.  That’s obvious.  Take a quick scan now and review the four Windowpanes, No Effect, Small Effect, Medium Effect, and Large Effect.  See the numbers change.
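If you would rather let a computer read the rows, here is a minimal Python sketch of the four Windowpanes above. The cell counts come straight from the text; the dictionary layout and the function names are my own illustrative choices, not part of any standard library.

```python
# The four Windowpanes from the text: (changed, not changed) out of 100 per group.
# Layout and names are illustrative assumptions, not a standard format.
WINDOWPANES = {
    "No Effect":     {"new": (50, 50), "old": (50, 50)},
    "Small Effect":  {"new": (55, 45), "old": (45, 55)},
    "Medium Effect": {"new": (65, 35), "old": (35, 65)},
    "Large Effect":  {"new": (75, 25), "old": (25, 75)},
}


def change_rate(row):
    """Percentage of one row (one group) that Changed."""
    changed, not_changed = row
    return 100 * changed / (changed + not_changed)


def point_difference(pane):
    """Treatment change rate minus control change rate, in percentage points."""
    return change_rate(pane["new"]) - change_rate(pane["old"])


def times_as_much(pane):
    """How many times as much change the New Thing row shows versus the Old Thing row."""
    return change_rate(pane["new"]) / change_rate(pane["old"])


for name, pane in WINDOWPANES.items():
    print(f"{name:14s} difference = {point_difference(pane):3.0f} points, "
          f"ratio = {times_as_much(pane):.2f}x")
```

The same functions also handle the Side Bar case: two rows of 10/90 give a difference of zero just as two rows of 50/50 do.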

Windowpane as a Jar of Marbles

If you’re still with me, let me offer my congratulations for your patience and motivation.  You’re hanging tough with numbers, never easy or fun.  So, I’ve got a treat for you.  Let’s use jars of marbles to illustrate effect sizes.  This will give you a quick and easy visual way of observing effects rather than counting effects.

We have two jars, one is the New Thing and the other is the Old Thing (or Treatment and Control; or Special Sauce and Regular Sauce; you get it).  Inside each jar are 100 marbles, either black or white.  If we’re testing mortality, white is Alive and black is Dead.  Let’s start with the No Effect condition like we did with numbers.

Here, each jar contains 50 white and 50 black marbles meaning there is no difference between the Jars or Things or Conditions or Sauces.  It’s just that 50/50 No Effect Condition.

Now, let’s demonstrate a Small Effect, that 45/55 Effect.  Which Jar has the most white marbles?

Not so easy to see at a glance or even with a long stare.  Many people, even trained and experienced statisticians, need to count the white marbles to figure out that the orange jar contains 55 white marbles and that the blue jar contains 45 white marbles.

Now, let’s look at the Medium Effect size of 35/65.  Which jar has the most white marbles?

Pretty obvious, isn’t it?  That orange jar clearly has more white marbles and only a propeller head counts everything following the Ronald Reagan Rule of “doveryai, no proveryai” (trust, but verify).  See how much of a difference a Medium Effect is.  On the medical side of things, many researchers call these effect sizes Clinically Significant, which means a physician in an examination room looking at a specific patient can see the impact of a new drug, for example.  If that drug has a Medium Effect, you’ll see it.

Now, just to be complete, let’s look at a Large Effect Size of 25/75 which you probably expect to be as obvious as a zit on your nose on Prom Night.  Here it is.

Boom!  Large Effects are incredibly obvious.  The impact of the New Thing compared to the Old Thing is so strong you wonder why anyone did the test in the first place.  Don’t we know that, on average, men can lift more weight than women, that athletes can run faster than injured people, that Melanie is prettier than I am?
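One rough way to put a number on "obvious at a glance" is to grab a random handful from each jar and see how often the handful from the New Thing jar holds more white marbles. The sketch below does that in Python. Treating a glance as a 10-marble handful is purely my assumption for illustration, as are the function name, the trial count, and the seed.

```python
import random


def handful_picks_right_jar(white_new, white_old, handful=10, trials=5000, seed=0):
    """Fraction of trials in which a random handful from the New Thing jar
    holds more white marbles than an equal handful from the Old Thing jar.

    Each jar holds 100 marbles; ties count as a miss. The handful size is
    an illustrative stand-in for "a glance", not something from the text.
    """
    rng = random.Random(seed)
    jar_new = [1] * white_new + [0] * (100 - white_new)  # 1 = white, 0 = black
    jar_old = [1] * white_old + [0] * (100 - white_old)
    wins = 0
    for _ in range(trials):
        if sum(rng.sample(jar_new, handful)) > sum(rng.sample(jar_old, handful)):
            wins += 1
    return wins / trials


for label, (new, old) in {
    "No Effect": (50, 50),
    "Small Effect": (55, 45),
    "Medium Effect": (65, 35),
    "Large Effect": (75, 25),
}.items():
    print(f"{label:14s} handful favors the New Thing jar "
          f"{handful_picks_right_jar(new, old):.0%} of the time")
```

The exact percentages depend on the handful size, but the ordering should not: the handful points to the right jar more and more often as the effect moves from Small to Medium to Large.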

If you’d like more examples of practical effect sizes, check out this Blog post. It looks at effects with speed, height, and IQ.

Zen with Venn Effect Sizes

Dr. Venn invented his eponymous diagrams as a Cambridge don in the 19th century. They remain interesting, useful, and attractive. Consider this series of four Venn diagrams as dramatizations of Effect Sizes.

I call these dramatizations because we should not understand the circles and their relationship as exact. They convey the sense of no, little, more, and much overlap, connection, association or whatever word you prefer.

Now realize another dramatization of the Venn diagrams. If you compare the shared area (the intersection of the two sets) with the unshared area, you can see them as a ratio. Imagine them as the numerator (top) and denominator (bottom) of a fraction. This becomes an illustration of the t-test. That test divides the amount of explained variance in the overlap area by the amount of error variance in the nonoverlap area.

Quickly see a bonus dramatization! This Zen of Venn also reveals that an effect size and its associated test of significance are identities! When you look at the overlap you see the effect size. When you compare that overlap with the nonoverlap you see the test of statistical significance.
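To put numbers on that ratio, here is a hedged Python sketch. It assumes the overlap is read as the squared correlation (r squared) and the nonoverlap as one minus that. Strictly speaking the ratio of the two, scaled by the degrees of freedom, is the F statistic, and t is its square root when there is a single predictor. The function name is illustrative.

```python
import math


def t_from_r(r, n):
    """t statistic for a correlation r observed on n people (df = n - 2).

    Reads r**2 as the overlap (explained) and 1 - r**2 as the nonoverlap
    (error), then takes the square root of their ratio times df.
    """
    df = n - 2
    explained = r ** 2
    unexplained = 1 - r ** 2
    return math.copysign(math.sqrt(explained / unexplained * df), r)


# Small, medium, and large correlations with 200 people in total,
# the same head count as the 100-per-group Windowpane.
for r in (0.10, 0.30, 0.50):
    print(f"r = {r:.2f}  ->  t = {t_from_r(r, 200):.2f}")
```

Notice that the only inputs are the effect size and the number of people. Same overlap, same t, which is the sense in which the effect size and its test travel together.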

See yet another dramatization: the Venn of two predictors on one outcome.

Zen this Venn for a moment. We see that each predictor overlaps with the outcome. We can contemplate both the likely statistical significance and the effect sizes. Both appear to be SSD and Medium. We also see that using two predictors increases the amount of explained variance (the overlap) and also decreases the amount of error variance (the nonoverlapped part of the outcome). This example dramatizes a main effect for both predictors, but no interaction because the two predictors only overlap with the outcome, but not each other.

More drama. Add another predictor. Like this.

Zen again. The new predictor shows little overlap with the outcome, but notice what now happens when we form the t-test. The presence of the other, stronger, predictors consumes variance from the outcome, shrinking the error term now making that little overlap more significant than it would be alone.

This is the Sin of Venn with the Observational Tooth Fairy. You see its operation in Climate Change, Breast Cancer, Soda Pop, Sitting, Statins, Exercise, Diet, and even Health Insurance. Adjust the Venn with extra predictors that consume error variance and convert it to explained variance, then use that shrunken error term to test your Tooth Fairy predictor.
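Here is a small Python sketch of that shrinking error term. It assumes the predictors overlap only with the outcome and never with each other, as in the Venn above, so each one's share of explained variance is simply its squared correlation with the outcome. The function name and the example correlations are hypothetical.

```python
import math


def t_for_predictor(r_target, r_others, n):
    """Size of t for one predictor when all predictors are mutually uncorrelated.

    r_target: correlation of the tested predictor with the outcome
    r_others: correlations of the other predictors with the outcome
    n:        number of people
    Returns the magnitude of t only (no sign).
    """
    k = 1 + len(r_others)  # total predictors in the model
    explained_total = r_target ** 2 + sum(r ** 2 for r in r_others)
    if explained_total >= 1:
        raise ValueError("squared correlations must sum to less than 1")
    error_per_df = (1 - explained_total) / (n - k - 1)  # the shrunken error term
    return math.sqrt(r_target ** 2 / error_per_df)


alone = t_for_predictor(0.10, [], 200)
with_help = t_for_predictor(0.10, [0.50, 0.50], 200)
print(f"Small predictor alone:              t = {alone:.2f}")
print(f"Same predictor beside two big ones: t = {with_help:.2f}")
```

In this made-up case the little predictor's overlap never changes, yet its t climbs from about 1.41 to about 2.00, just past the usual two-tailed .05 cutoff of roughly 1.97 for this many people.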

P.S. I always thought that the Alan Parsons Project should do a math and science concept album with Dr. Venn as an inspiration. Something like Dr. Tarr and Professor Fether (YouTube) from Tales of Mystery and Imagination.

A Nuance or Sophistical Statistics or Persuading with Numbers

Rarely will you see reports offering something like the Windowpane.  Instead you’ll see correlations (r), standard deviation effects (d or g), or ratios (risk, absolute, relative, hazard, odds).  Here’s a handy conversion guide again using the Cohen conventions.

The relationship among the three types (r, d or g, and ratios) is exact, meaning you can mathematically transform one into another.  Whether a report uses r or g is typically determined by the type of data and the training of the statistician.  But, realize that an r of .10 is conceptually equivalent to a d of .20 even though the d is numerically the larger value.  If you like scaring people, whenever you are sitting through a PowerPoint, ask the presenters to translate the statistics they are using into the various Effect Sizes, like d or r.  If they look at you like you are from Mars or get defensive, check your wallet.
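For the curious, here is a short Python sketch of the r and d translation. It uses the common textbook formulas, which assume two groups of equal size. The last function turns an r back into the two rows of a 100-per-group Windowpane by splitting the difference around 50. All function names are my own.

```python
import math


def r_to_d(r):
    """Convert a correlation r to a standardized difference d (equal group sizes assumed)."""
    return 2 * r / math.sqrt(1 - r ** 2)


def d_to_r(d):
    """Convert a standardized difference d back to a correlation r (equal group sizes assumed)."""
    return d / math.sqrt(d ** 2 + 4)


def windowpane_rows_from_r(r):
    """Percent Changed in the (treatment, control) rows of a 100-per-group Windowpane."""
    return (50 + 50 * r, 50 - 50 * r)


r = 0.10
print(f"r = {r:.2f}  ->  d = {r_to_d(r):.2f}")
print(f"back again:  r = {d_to_r(r_to_d(r)):.2f}")
print(f"Windowpane rows: {windowpane_rows_from_r(r)}")
```

Run on r = .10 it returns a d of about .20 and rows of 55 and 45, matching the Small Effect pairing in the text.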

Here’s a nice and simple formula (PDF) for translating an Odds Ratio into the d effect size.  Take the natural log (ln) of the OR then divide by 1.81.  That will create a very close approximation of a Cohen d.  What if they don’t report an Odds Ratio, but some other kind of Ratio?  If you’re a statistician trying to get published, you know what to do, but for the rest of us living in the practical world – just use the same formula.  It’s close enough for the real world.  We’re talking money, guns, and lawyers here, not NIH peer review.
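That formula fits in a few lines of Python. This is only a sketch of the shortcut just described; the function name is made up, and the divisor of 1.81 is the rounded value from the text (it is close to pi divided by the square root of 3).

```python
import math


def odds_ratio_to_d(odds_ratio, divisor=1.81):
    """Approximate Cohen's d from an Odds Ratio: ln(OR) / 1.81."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    return math.log(odds_ratio) / divisor


for odds_ratio in (1.5, 2.5, 4.0):
    print(f"OR = {odds_ratio:.1f}  ->  d is roughly {odds_ratio_to_d(odds_ratio):.2f}")
```

An Odds Ratio of 1.0 comes out as a d of exactly zero, which is a handy sanity check: no difference in odds, no effect.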

I also caution you to go High WATT when reading those ratios, whether called hazard or odds or absolute or whatever Brand Name some epidemiologist is selling that day.  Ratios are extremely persuasive numbers that can be played like chords on a guitar to produce different tones for the same values.  For example, a risk reported as “150%” of the control group’s (a ratio of 1.5) sounds like a big deal, but it is only a Small Effect.  Just scan that range from 1.5 to 2.5 through 4.0 to get a sense of scale.  Sure, “150%” sounds big, but compared to “250%” or “400%,” not so much.  And most health and safety research published in good peer-reviewed journals finds these Small Effects, so when you see “134%” there’s a natural temptation to go “Wow!” when the effect is barely detectable over random variation.  You can read more about this in a great review article here (PDF).
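To hear those different tones for yourself, here is a Python sketch that takes one Windowpane and reports it several ways. The Small Effect counts come from the text above; the function name and the dictionary keys are illustrative choices of mine.

```python
def describe(changed_new, n_new, changed_old, n_old):
    """Report one 2 by 2 result as several different-sounding numbers."""
    p_new = changed_new / n_new  # share of the New Thing group that Changed
    p_old = changed_old / n_old  # share of the Old Thing group that Changed
    risk_ratio = p_new / p_old
    odds_ratio = (p_new / (1 - p_new)) / (p_old / (1 - p_old))
    return {
        "point_difference": 100 * (p_new - p_old),
        "risk_ratio": risk_ratio,
        "percent_of_control": 100 * risk_ratio,
        "percent_increase": 100 * (risk_ratio - 1),
        "odds_ratio": odds_ratio,
    }


# The Small Effect Windowpane: 55 of 100 Changed with the New Thing, 45 of 100 with the Old.
for key, value in describe(55, 100, 45, 100).items():
    print(f"{key:20s} {value:7.2f}")
```

Every number on that list describes the same 200 people. The 10 point difference, the "22% increase," the "122% of control," and the Odds Ratio near 1.5 are all one Small Effect played in different keys.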

Worse still is all the “adjustment” that observational researchers will employ.  Are you looking at the analysis of the raw data or have the data been modified?  Look for covariates, transformations, and other adjustments and make sure you understand what happened.  Always seek the rawest number you can find in a report, before all the adjustments.  Change is as subtle as a hammer with a thumb – either it hits or it misses.  Nuance is groovy, but simple should always be reported.  As a Rule of Thumb (and keep hammers in mind with such a Rule), whenever a presenter will not or cannot make nuance simple, I’d caution – Caution!  A good argument should proceed from simple to complex with all the steps in between made apparent.  Change is from THAT to THIS, obvious, exclusive, bordered, different.

The Cool Table guys know exactly how to play the statistical sophist with ratios and make the weaker Argument appear to be the stronger Argument.  Cassandra speaks!  You are warned!


The point of this demonstration is to show that you can think with numbers in a practical and efficient way without having a statistician in the room.  Anyone can handle the windowpane approach with numbers.  Just have a clear definition of Changed? (Yes or No) and a clear definition of the Group (Treatment or Control).  Then just count and look for percentage differences.  A 10 point difference is small, 30 points is moderate, and 50 points is large.  And, realize that while “small” may be hard to detect, it can definitely have a big practical effect.
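The whole count-and-compare routine can be sketched in a few lines of Python. The input format (a list of group and Changed pairs), the function names, and the choice to treat 10, 30, and 50 points as "at least" cutoffs are all my assumptions for illustration.

```python
from collections import Counter


def build_windowpane(observations):
    """Count (group, changed) pairs into the four panes.

    observations: iterable of (group, changed), where group is
    "treatment" or "control" and changed is True or False.
    """
    counts = Counter((group, bool(changed)) for group, changed in observations)
    return {
        group: {"yes": counts[(group, True)], "no": counts[(group, False)]}
        for group in ("treatment", "control")
    }


def percent_changed(row):
    """Percentage of one row that Changed."""
    total = row["yes"] + row["no"]
    if total == 0:
        raise ValueError("a row needs at least one person")
    return 100 * row["yes"] / total


def label_difference(points):
    """Label a percentage point difference, using 10/30/50 as minimum cutoffs (an assumption)."""
    size = abs(points)
    if size >= 50:
        return "large"
    if size >= 30:
        return "medium"
    if size >= 10:
        return "small"
    return "below small"


# Hypothetical raw data that reproduces the Medium Effect Windowpane.
people = ([("treatment", True)] * 65 + [("treatment", False)] * 35
          + [("control", True)] * 35 + [("control", False)] * 65)
pane = build_windowpane(people)
difference = percent_changed(pane["treatment"]) - percent_changed(pane["control"])
print(pane)
print(f"{difference:.0f} point difference: {label_difference(difference)}")
```

Working in percentages rather than raw counts means the same steps still apply when the two groups are not exactly 100 people each.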

Now whether you conceptualize Effect Sizes as windowpanes or jars with marbles, you now understand what the idea, Difference, means.  You can count or see No, Small, Medium, or Large Differences and interpret those complex statistical arguments you encounter all the time.  Realize again, that this approach is not Statistics for Dummies, Idiots, or Fools, but is a standard and mathematically correct way to present quantitative information.