Lies, Damned Lies, and Statistics

Note:  The above quotation is often erroneously attributed to Mark Twain.  He actually attributed it to Benjamin Disraeli, although no record exists of Disraeli having actually used it.  The actual originator of the phrase is unclear.

In a world where our collective attention spans appear to be growing shorter by the day, we are all increasingly susceptible to being fooled by statistical manipulations and the manipulators’ hidden agendas.

I’m firmly convinced that a clever person who gathers a large amount of data on almost any controversial subject will be able to find a way to mathematically support their position, regardless of the validity of their argument.  All it takes is a bit of selective manipulation, a bit of assumption massaging, or a couple of rounds of aggregation/disaggregation, and you can “prove” almost anything.

Diet Drinks

I recently came across a couple of articles from supposed “health-news” resources (the type that is often shared on Facebook) decrying the cardiac risk of diet soft drinks.  Now, as a disclaimer, I must admit that I enjoy my diet drinks, so I have my own dog in this hunt.  Both articles cited a recent study by the University of Iowa.  I went to the source, which turned out to be a press release rather than the actual study, and which was itself derived from another press release from the American College of Cardiology.  The referenced press release (which is, apparently, an adequate source for the development of online articles) states the following:

“We only found an association (between diet drinks and increased cardiac risk), so we can’t say that diet drinks cause these problems,” Vyas says, adding that there may be other factors about people who drink more diet drinks that could explain the connection.

Looking deeper, I saw that the study was based on self-reported consumption of the drinks – already a monkey wrench in the study’s validity.  I also discovered that those people in the subject group (postmenopausal women) who drank more diet soda also had a higher incidence of smoking, diabetes, and hypertension, and were more likely to be overweight.  The study supposedly corrected for these risk factors (how do they do this?  I’m suspicious…), and the incidence rate of cardiac problems was 7% for light diet drink consumers and 9% for heavy diet drinkers!  Even in a large study (and this one had 60,000 subjects) this is at the level of splitting hairs.

Yet this is what had people all fired up.

Clearly, it’s not the data doing that, it’s the presentation.
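To make the presentation point concrete, here is a minimal sketch that frames the same two incidence rates (the 7% and 9% quoted above) in two different ways.  The function names are my own, purely for illustration; nothing here comes from the study itself.

```python
# Incidence rates as quoted in the press release discussed above.
light_rate = 0.07  # light diet drink consumers
heavy_rate = 0.09  # heavy diet drink consumers


def absolute_difference(low, high):
    """Gap between the two rates, in percentage points."""
    return (high - low) * 100


def relative_increase(low, high):
    """How much larger `high` is than `low`, as a percentage of `low`."""
    return (high - low) / low * 100


print(f"Absolute framing: {absolute_difference(light_rate, heavy_rate):.1f} percentage points")
print(f"Relative framing: {relative_increase(light_rate, heavy_rate):.0f}% higher risk")
```

The same pair of numbers reads as a 2-point gap or as a risk that is roughly 29% higher.  Neither framing is wrong, but a headline writer will usually pick the second one.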

This week’s data dump

I happened to latch onto another news story this week, one that manipulates opinions through over-aggregation.  The Huffington Post reported on a release from the American Association of University Women (AAUW) concerning the “pay gap” between men and women.  According to the article, women earn 78 cents for every dollar a man earns – a statistic I’ve heard quoted often in the past (actually, it used to be 77 cents) and one that is used as the backbone for the “equal pay for equal work” argument.  AAUW thoughtfully provided a breakdown by state and racial group – apparently to allow those most greatly victimized to feel the most offended.

Unfortunately, the breakdown is useless when it comes to understanding the numbers.

To the author’s credit, the article mentions that some of these differences are driven by choices women make in their careers – choices that might negatively impact their earnings.  However, it also notes that… “AAUW found that a 7 percent gender wage gap existed among college graduates one year out of school after controlling for "college major, occupation, economic sector, hours worked, months unemployed since graduation, GPA, type of undergraduate institution, institution selectivity, age, geographical region, and marital status." And 10 years after graduating college, that gap grows to 12 percent.”

There’s that damned “correcting” stuff, again.

Since the obvious conclusion – that women, even recent college graduates, are being systematically discriminated against – is completely unsupported by my personal experiences in the workforce (I can’t recall ever seeing a woman being paid less simply because she was a woman), I started digging into the data.

The top level number, the one that everyone quotes, is based on a complete aggregation of everything – the median wage for all females that work more than 35 hours, divided by the median wage for all males that work more than 35 hours.  I suppose I should be pleased that they didn’t use averages, which would be further skewed by outliers like the predominance of men in CEO positions.  The data is interesting, but if you’re campaigning for “equal pay for equal work” I don’t think this gives you much.  There’s no effort made to make the “work” equal, so it shouldn’t be any surprise that the “pay” also isn’t equal.
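As a rough illustration of why the choice between median and average matters, here is a small sketch using made-up wages.  Every number below is hypothetical (none of it is AAUW data); the two groups are identical except that one contains a single CEO-sized salary.

```python
from statistics import mean, median

# Hypothetical annual wages in $K. The two groups match except for the
# last entry, where group_b has one CEO-level outlier.
group_a = [38, 42, 45, 50, 55, 60, 65]
group_b = [38, 42, 45, 50, 55, 60, 2000]

median_ratio = median(group_a) / median(group_b)
mean_ratio = mean(group_a) / mean(group_b)

print(f"Ratio of medians:  {median_ratio:.2f}")
print(f"Ratio of averages: {mean_ratio:.2f}")
```

The ratio of medians barely notices the outlier, while the ratio of averages collapses because of one paycheck.  That is the skew the median avoids.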

Then we come to the factors “corrected” in the statistical analysis.  I had to do some searching to get some details on this, but eventually came across another Huffington Post article that talks about the methodology used by the AAUW and some of its flaws.

The “correcting,” in this case, involves looking at recent college graduates (at both one year post graduation and ten years) and comparing males and females in the same majors.  That seems like a reasonable approach, but again, the devil is in the details.

The article notes that for the purposes of the study, the “same major” categories are overly broad, and probably explain at least a portion – if not all – of the residual pay gap.  For example, when comparing those earning degrees in “Social Science,” women one year out earn only 83% of what men earn.  But “Social Science” includes degrees in Sociology (68% female, median wage of $40K) and Economics (66% male, median wage of $70K).  In my book, this is still apples and oranges, as I wouldn’t hire a Sociology major if I were looking for an Economist, and vice versa.  And I’d definitely pay only what I had to pay to get the person with the educational background I was after.
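The aggregation effect can be worked through with the figures quoted above.  This is only a sketch under two assumptions of mine that appear nowhere in the articles: both majors graduate the same number of people, and everyone in a major earns exactly that major’s median wage (so the pay gap inside each major is zero).

```python
# Figures quoted above: share of women and median wage ($K) in each major.
majors = {
    "Sociology": {"female_share": 0.68, "wage": 40},
    "Economics": {"female_share": 0.34, "wage": 70},  # 66% male
}

# Assumption (mine): equal class sizes in the two majors.
GRADS_PER_MAJOR = 100


def average_wage(majors, sex):
    """Average wage for one sex across the combined 'Social Science' bucket."""
    total_pay = 0.0
    headcount = 0.0
    for major in majors.values():
        share = major["female_share"] if sex == "female" else 1 - major["female_share"]
        count = share * GRADS_PER_MAJOR
        # Assumption (mine): everyone in a major earns that major's median wage.
        total_pay += count * major["wage"]
        headcount += count
    return total_pay / headcount


ratio = average_wage(majors, "female") / average_wage(majors, "male")
print(f"Women earn {ratio:.0%} of what men earn in the combined category")
```

Under those assumptions the combined category shows women earning about 83% of what men earn, even though men and women are paid identically within each major.  The match with the quoted figure depends entirely on my assumptions, so treat it as an illustration of the mechanism, not as a reconstruction of the AAUW analysis.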

The article further notes that… “The AAUW study classifies jobs as diverse as librarian, lawyer, professional athlete, and "media occupations" under a single rubric--"other white collar."”

So is there a problem here?  Are women systematically underpaid?

Who the heck can tell from the data presented?  My personal experiences would say “no,” which is why I started looking into this subject in the first place.  Unfortunately, I’m just as guilty as the next person of stopping my research once I find an opinion that agrees with my preconceived notions.

Just remember, as you troll through all the alarmist online articles written from press releases that quote “statistics” and “studies,” that the conclusions are potentially subject to the same statistical manipulation, half-truths, and improper causal relationships that I’ve described above.

And also remember that everyone reporting on a subject has some kind of an agenda.