During my PhD, I reviewed hundreds of supplement studies. Testosterone boosters, adaptogenic herbs, mineral complexes, proprietary blends with impressive-sounding names — the entire spectrum. And I can tell you with uncomfortable certainty that the vast majority of published supplement research is, to put it clinically, garbage.
That's not a reflexive dismissal. I'm not anti-supplement. I take compounds myself that I believe are supported by strong evidence. But the path to finding those few compounds required wading through an ocean of bad science, misleading claims, and studies designed to produce a specific result rather than discover the truth.
Here's the framework I use to separate signal from noise. If you apply it consistently, you'll dismiss about 90% of what the supplement industry puts in front of you. And the 10% that survives will be worth paying attention to.
The Conflict of Interest Problem
Let's start with the structural issue that contaminates everything downstream. The supplement industry largely funds its own research. This isn't inherently disqualifying — pharmaceutical companies fund their own clinical trials too. But the supplement industry operates under far less regulatory scrutiny than pharmaceuticals. There's no FDA approval process for supplements. There's no requirement for pre-market safety or efficacy data. The bar is on the floor.
When a company funds its own study, it controls the design, the endpoints, the statistical analysis, and the publication decision. Negative results quietly disappear into file drawers. Positive results get amplified across marketing materials and Amazon listings. This is publication bias at its most brazen, and it's endemic to the supplement world.
This doesn't mean every industry-funded study is worthless. But it does mean you need to read them with a much more critical eye than you would an NIH-funded trial.
The Red Flags
Over the years, I've compiled a list of immediate red flags that, when I see them, dramatically lower my confidence in a study's conclusions. Here are the most common ones:
Tiny sample sizes. If a study has fewer than 30 participants, it's badly underpowered: real effects are likely to go undetected, and the "significant" results that do surface are disproportionately flukes or inflated. I've seen testosterone booster studies with 12 participants that claim statistically significant results. With 12 people, a single outlier can swing the entire outcome. Yet these studies get cited on product labels as if they were definitive.
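To make that concrete, here's a minimal simulation in Python with assumed numbers (a true effect of exactly zero and a between-measurement SD of 100 ng/dL, both invented for illustration), showing how often a 12-person study produces an impressive-looking average change from noise alone:

```python
import numpy as np

rng = np.random.default_rng(0)

# A supplement that truly does nothing: each participant's testosterone
# "change" is pure noise (mean 0, assumed SD of 100 ng/dL).
n_participants, n_studies = 12, 10_000
changes = rng.normal(loc=0.0, scale=100.0, size=(n_studies, n_participants))
study_means = changes.mean(axis=1)

# Fraction of these null studies showing an apparent boost of
# 40 ng/dL or more, purely by chance:
print((study_means >= 40).mean())   # ~0.08, roughly 1 study in 12
```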
No placebo control. A surprising number of supplement studies are open-label — meaning the participants know they're taking the active compound. The placebo effect in subjective outcomes like "energy" and "libido" is massive. Without a placebo arm, you're measuring belief, not biochemistry.
Animal studies presented as human evidence. "Shown to increase testosterone by 40%!" Shown in whom? Rats. The metabolic differences between rodents and humans are significant enough that a compound's effect in a rat model tells you almost nothing about what it will do in a human body. Animal studies are useful for generating hypotheses. They're not evidence of efficacy in humans.
Proprietary blends hiding doses. If a product label says "Proprietary Male Vitality Complex: 800mg" and lists six ingredients without individual doses, you have no way of knowing whether any single ingredient is present at a clinically relevant amount. In most cases, it isn't. The blend exists to put impressive ingredient names on the label while including trace amounts that cost almost nothing.
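The arithmetic is worth seeing. Here's a back-of-the-envelope sketch with a hypothetical blend (the ingredient names and dose breakdown are invented; only the 800 mg total comes from the label):

```python
# Hypothetical "Proprietary Male Vitality Complex: 800 mg" with six
# ingredients (names and doses invented for illustration).
blend_total_mg = 800
ingredients = ["herb A", "herb B", "herb C", "mineral D", "extract E", "absorption aid F"]

# Best case for any single ingredient: a perfectly even split.
print(f"{blend_total_mg / len(ingredients):.0f} mg each at best")  # ~133 mg

# In practice the split is skewed. Labels must list blend ingredients in
# descending order by weight, so trailing ingredients can be present at
# only a few milligrams. One plausible (hypothetical) breakdown:
doses_mg = [450, 200, 80, 40, 20, 10]   # sums to 800
for name, mg in zip(ingredients, doses_mg):
    print(f"{name:16s} {mg:>4d} mg")
```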
The Hierarchy of Evidence
Not all study designs are created equal. Understanding the hierarchy of evidence is the single most important skill for evaluating any health claim, not just supplement claims.
At the bottom, you have case reports and anecdotes — "I took this and felt great." This is the weakest form of evidence. It tells you nothing about causation and everything about placebo, expectation, and individual variation.
Above that are observational studies — population-level correlations. "People who consume more of X tend to have higher levels of Y." These can identify associations but cannot establish that X causes Y. Confounding variables lurk everywhere.
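Here's a toy simulation of why, assuming a single hidden confounder (call it overall health) that drives both the exposure and the outcome while the exposure itself does nothing:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hidden confounder: overall health drives both behavior and biology.
health = rng.normal(0, 1, n)
consumes_x = health + rng.normal(0, 1, n)   # health-conscious people take more X
level_of_y = health + rng.normal(0, 1, n)   # healthy people have higher Y

# X has zero causal effect on Y, yet they correlate strongly:
print(np.corrcoef(consumes_x, level_of_y)[0, 1])   # ~0.5
```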
Then come randomized controlled trials (RCTs). Participants are randomly assigned to receive either the active compound or a placebo. Ideally, it's double-blind — neither the participants nor the researchers know who got what until the data is analyzed. This design controls for placebo effects, researcher bias, and most confounding variables. An RCT is the minimum threshold for me to take a supplement claim seriously.
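A companion sketch shows why randomization is the fix: assignment is made independently of everything about the participant, so the arms come out balanced on the confounder, and on every confounder you never thought to measure:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000

# Same kind of population, same hidden confounder.
health = rng.normal(0, 1, n)

# Coin-flip assignment knows nothing about health...
active_arm = rng.random(n) < 0.5

# ...so both arms end up with essentially the same baseline health,
# and any outcome difference can be attributed to the treatment.
print(f"active arm:  {health[active_arm].mean():+.2f}")
print(f"placebo arm: {health[~active_arm].mean():+.2f}")
```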
At the top sit systematic reviews and meta-analyses — studies of studies. These aggregate data from multiple RCTs to produce more robust estimates of effect size and to identify patterns that individual trials might miss.
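For intuition, the core arithmetic of a fixed-effect meta-analysis is inverse-variance weighting: more precise trials count more, and the pooled estimate is tighter than any single trial. A sketch with invented trial numbers:

```python
import numpy as np

# Invented per-trial testosterone effects (ng/dL) and standard errors.
effects = np.array([15.0, 42.0, 8.0, 25.0])
std_errs = np.array([20.0, 30.0, 10.0, 15.0])

# Fixed-effect (inverse-variance) pooling: precise trials get more weight.
weights = 1.0 / std_errs**2
pooled = (weights * effects).sum() / weights.sum()
pooled_se = weights.sum() ** -0.5

print(f"pooled effect: {pooled:.1f} ng/dL (SE {pooled_se:.1f})")
```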
Reading Abstracts Critically: Effect Size vs. P-Value
Even when you find a well-designed RCT, the abstract can be misleading if you don't know what to look for. The most common trick is emphasizing statistical significance (p-value) while burying effect size.
A p-value tells you how likely you'd be to see a result at least as extreme as the one observed if the supplement actually did nothing. A p-value of 0.05 means that, in a world where the true effect is zero, a result this large shows up about 5% of the time. But a result can be statistically significant and clinically meaningless. If a supplement raises testosterone by 8 ng/dL with a p-value of 0.03 in a large sample, that's a "statistically significant" result that would make zero perceptible difference in how a man feels, looks, or performs. The numbers moved, but your body didn't notice.
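A simulation makes the point, with assumed numbers (baseline around 500 ng/dL, SD around 100, a true effect of +8 ng/dL, and a sample large enough to make the p-value look spectacular):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n = 5_000                                   # participants per arm
placebo = rng.normal(500, 100, n)
active = rng.normal(508, 100, n)            # true effect: +8 ng/dL

t_stat, p_value = stats.ttest_ind(active, placebo)
diff = active.mean() - placebo.mean()

print(f"p = {p_value:.4f}")                 # comfortably "significant"
print(f"effect = {diff:.1f} ng/dL ({diff / placebo.mean():.1%})")  # ~1.6%
```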
Always ask: How much did the variable change, not just whether it changed? A 20% increase in testosterone is meaningful. A 2% increase is noise, no matter how impressive the p-value.
Relative vs. Absolute Risk Reduction
This is a classic trick used across health marketing, not just supplements. "Reduces risk by 50%!" sounds incredible. But if the baseline risk was 2% and it dropped to 1%, the absolute reduction is 1 percentage point. The relative reduction is 50%. Both are technically accurate. Only one is honest.
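The arithmetic in code, using the same numbers, plus the number needed to treat, which is the most honest framing of all:

```python
baseline_risk = 0.02    # 2 in 100 untreated people have the outcome
treated_risk = 0.01     # 1 in 100 treated people do

arr = baseline_risk - treated_risk    # absolute risk reduction
rrr = arr / baseline_risk             # relative risk reduction
nnt = 1 / arr                         # people treated for one to benefit

print(f"relative: {rrr:.0%}, absolute: {arr:.0%}, NNT: {nnt:.0f}")
# relative: 50%, absolute: 1%, NNT: 100
```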
In supplement studies, you'll often see relative percentage changes highlighted while absolute values are tucked into supplementary tables. Always find the raw numbers.
Cherry-Picking Endpoints
A study might measure 15 biomarkers and find a statistically significant change in one of them. If that one result becomes the headline — "Shown to improve hormonal health!" — while the other 14 null results are downplayed, the study is being used as marketing material, not as evidence. Look at the pre-registered primary endpoints (if they exist) and compare them to what the study actually reports. If the goalpost moved, be skeptical.
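The mechanism is easy to simulate: measure 15 biomarkers the compound truly doesn't affect, and more than half of such studies will still hand marketing at least one "significant" result to headline:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

n_markers, n_per_arm, n_studies = 15, 30, 2_000
hits = 0
for _ in range(n_studies):
    # All 15 biomarkers are pure noise in both arms: no real effect.
    active = rng.normal(0, 1, (n_markers, n_per_arm))
    placebo = rng.normal(0, 1, (n_markers, n_per_arm))
    p_values = stats.ttest_ind(active, placebo, axis=1).pvalue
    hits += (p_values < 0.05).any()

print(hits / n_studies)   # ~0.54, close to 1 - 0.95**15
```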
My Personal Framework
After years of refining my approach, here's what I require before I'll take any compound seriously as a testosterone support (the same criteria are sketched as a simple code filter after the list):
- At least one randomized, double-blind, placebo-controlled trial
- Conducted in healthy human males (not rats, not women, not clinically ill populations)
- Sample size of at least 50 participants
- Duration of at least 8 weeks (hormonal changes take time)
- Clinically meaningful effect size — not just statistical significance
- Published in a peer-reviewed journal, not just a company white paper
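For concreteness, here's the checklist written as that filter. The Study fields are hypothetical stand-ins for what you'd extract by actually reading the paper; no metadata database hands you these:

```python
from dataclasses import dataclass

@dataclass
class Study:
    # Hypothetical fields, filled in by reading the paper itself.
    randomized: bool
    double_blind: bool
    placebo_controlled: bool
    population: str               # e.g. "healthy human males"
    n_participants: int
    duration_weeks: int
    clinically_meaningful: bool   # judged from effect size, never p-value alone
    peer_reviewed: bool

def worth_attention(study: Study) -> bool:
    """Apply the framework: every criterion must pass."""
    return (
        study.randomized
        and study.double_blind
        and study.placebo_controlled
        and study.population == "healthy human males"
        and study.n_participants >= 50
        and study.duration_weeks >= 8
        and study.clinically_meaningful
        and study.peer_reviewed
    )
```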
This framework eliminated the vast majority of testosterone boosters I encountered. The popular compounds you see on every shelf — Tribulus terrestris, fenugreek, D-aspartic acid, most ashwagandha formulations — either failed to meet these criteria entirely or produced effect sizes so small they wouldn't be perceptible to the person taking them.
But a few compounds survived. A very few. And when something passes this filter, it gets my attention in a way that no marketing copy ever could — because the evidence did the convincing, not the label.
I'll be writing about those compounds individually in future articles, applying this exact framework to the actual data. If a compound has strong evidence, I'll say so clearly. If the evidence is mixed or weak, I'll say that too. The framework doesn't care what I want to be true. It only cares about what the data shows.
Next in this series: I apply this framework to a specific compound — Shilajit — and walk through the clinical data study by study. The results surprised me.