Polygenic Scores and Causation

1400 words

The use of polygenic scores has caused much excitement in the field of socio-genomics. A polygenic score (PGS) is derived from the statistical gene associations found in a genome-wide association study (GWAS). By using genes that are associated with many traits, proponents claim, they will be able to unlock the genomic causes of diseases and socially valued traits. The methods of GWA studies also assume that the ‘information’ that is ‘encoded’ in the DNA sequence is “causal in terms of cellular phenotype” (Baverstock, 2019).
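
To make the definition concrete: a polygenic score is usually computed as a weighted sum of a person's allele counts, with the weights taken from GWAS association estimates. The following is a minimal Python sketch of that arithmetic only. The function name, the allele counts, and the weights are all hypothetical and do not come from any real study.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of allele counts (0, 1 or 2 copies per SNP).

    Both arguments are hypothetical inputs for illustration:
    allele_counts holds one person's genotype at each SNP, and
    weights holds the per-SNP association estimates from a GWAS.
    """
    if len(allele_counts) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    return sum(w * x for w, x in zip(weights, allele_counts))


# Made-up association weights for four SNPs.
weights = [0.10, -0.05, 0.20, 0.00]

# Made-up genotypes for two people at the same four SNPs.
person_a = [0, 1, 2, 0]
person_b = [2, 0, 1, 1]

score_a = polygenic_score(person_a, weights)
score_b = polygenic_score(person_b, weights)
print(score_a, score_b)
```

Nothing in this calculation distinguishes a causal variant from a merely correlated one; the score simply inherits whatever the underlying associations reflect.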

For instance, Robert Plomin claims that “predictions from polygenic scores have unique causal status. Usually correlations do not imply causation, but correlations involving polygenic scores imply causation in the sense that these correlations are not subject to reverse causation because nothing changes the inherited DNA sequence variation.”

Take the stronger claim from Plomin and Stumm (2018):

GPS are unique predictors in the behavioural sciences. They are an exception to the rule that correlations do not imply causation in the sense that there can be no backward causation when GPS are correlated with traits. That is, nothing in our brains, behaviour or environment changes inherited differences in DNA sequence. A related advantage of GPS as predictors is that they are exceptionally stable throughout the life span because they index inherited differences in DNA sequence. Although mutations can accrue in the cells used to obtain DNA, like any cells in the body these mutations would not be expected to change systematically the thousands of inherited SNPs that contribute to a GPS.

This is a strange claim for two reasons.

(1) The scores do not, in fact, imply causation, since they are derived from GWA studies, which are associational and therefore cannot show causes. GWA studies are essentially giant correlational studies that scan the genomes of hundreds of thousands of people and look for genetic variants that are more common among people with the disease/“trait” in question. These studies are also heavily skewed toward European populations and, even if they were valid for European populations (which they are not), they would not be valid for non-European ethnic groups (Martin et al, 2017; Curtis, 2018; Haworth et al, 2018).

(2) The claim that “nothing changes inherited DNA sequence variation” is patently false; what people experience throughout their lives can most definitely change their inherited DNA sequence variation (Baedke, 2018; Meloni, 2019).

But, as pointed out by Turkheimer, Plomin and Stumm are assuming that no top-down causation exists (see, e.g., Ellis, Noble, and O’Connor, 2011). We know that both top-down (downward) and bottom-up (upward) causation exist (e.g., Noble, 2012; see Noble 2017 for a review). Plomin, it seems, is coming from a very hardline view of genes and how they work, a view that, it seems to me, derives from the Darwinian view of genes and how they ‘work.’

Such work is also carried out under the assumption that ‘nature’ and ‘nurture’ are independent and can therefore be separated. Indeed, the title of Plomin’s 2018 book Blueprint implies that DNA is a blueprint. In the book he claims that DNA is a “fortune-teller” and that things like PGSs are “fortune-telling devices” (Plomin, 2018: 6). PGS studies also rest on the assumption that the heritability estimates derived from twin/family/adoption studies tell us something about how “genetic” a trait is. But since the equal environments assumption (EEA) is false (Joseph, 2014; Joseph et al, 2015), we should outright reject any and all genetic interpretations of these kinds of studies. PGS studies are premised on those twin/adoption/family studies showing the “genetic variation” in traits, so if the main assumptions are false, their conclusions crumble.

Indeed, lifestyle factors are better indicators of one’s disease risk than polygenic scores are: “This means that a person with a “high” gene score risk but a healthy lifestyle is at lower risk than a person with a “low” gene score risk and an unhealthy lifestyle” (Joyner, 2019). Janssens (2019) argues that PRSs (polygenic risk scores) “do not ‘exist’ in the same way that blood pressure does … [nor do they] ‘exist’ in the same way clinical risk models do …” Janssens and Joyner (2019) also note that “Most [SNP] hits have no demonstrated mechanistic linkage to the biological property of interest.” Only by showing mechanistic relations between the proposed gene(s) and the disease phenotype would researchers be on their way to showing “causation” for PGS/PRS.

Nevertheless, Sexton et al (2018) argue that “While research has shown that height is a polygenic trait heavily influenced by common SNPs [712], a polygenic score that quantifies common SNP effect is generally insufficient for successful individual phenotype prediction.” Smith-Woolley et al (2018) write that “… a genome-wide polygenic score … predicts up to 5% of the variance in each university success variable.” But think about the words “predicts up to”: this is a meaningless phrase. Such language is, of course, causal, even though neither they nor anyone else has shown that such scores are indeed causal (mechanistically).

Spurious correlations

What these studies are indexing are not causal genetic variants for disease and other “traits”; rather, they reflect the population structure of the sample in question (Richardson, 2017; Richardson and Jones, 2019). Furthermore, the demographic history of the sample can also mediate the stratification in the population (Zaidi and Mathieson, 2020). Therefore, claims that PGSs are causal are unfounded; indeed, GWA studies cannot show causation. GWA studies survive on the correlational model but, as has been shown by many authors, they show spurious correlations, not the “genetics” of any studied “trait”, and therefore they do not show causation.
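
How population structure alone can produce a gene-trait correlation can be illustrated with a toy simulation. The Python sketch below is an assumption-laden illustration, not a reanalysis of any cited study: the two groups, their allele frequencies (0.2 and 0.8), and their trait means are invented, and the trait is drawn independently of genotype, so the variant has no effect by construction.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_group(rng, n, allele_freq, trait_mean):
    """Draw genotypes (0/1/2 allele copies) and traits for one group.

    The trait depends only on the group's mean (standing in for a shared
    environment), never on the genotype.
    """
    genotypes = [sum(rng.random() < allele_freq for _ in range(2))
                 for _ in range(n)]
    traits = [rng.gauss(trait_mean, 1.0) for _ in range(n)]
    return genotypes, traits


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical groups differing in allele frequency and in trait mean.
geno_a, trait_a = simulate_group(rng, 2000, allele_freq=0.2, trait_mean=0.0)
geno_b, trait_b = simulate_group(rng, 2000, allele_freq=0.8, trait_mean=1.0)

r_a = pearson(geno_a, trait_a)  # within group A
r_b = pearson(geno_b, trait_b)  # within group B
r_pooled = pearson(geno_a + geno_b, trait_a + trait_b)  # groups mixed

print("within A:", round(r_a, 3))
print("within B:", round(r_b, 3))
print("pooled:  ", round(r_pooled, 3))
```

In a run of this sketch the correlation should sit near zero within each group and become clearly positive once the groups are pooled, even though the variant does nothing. How much of any real GWAS signal arises this way is an empirical question that a toy model cannot settle.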

One further nail in the coffin for hereditarian claims about PGS/PRS and GWA studies is that the larger the dataset (the larger the number of datapoints), the more spurious correlations will be found (Calude and Longo, 2017). When it comes to hereditarian claims, this is relevant to twin studies (e.g., Polderman et al, 2015) and GWA studies for “intelligence” (e.g., Sniekers et al, 2017). It is entirely possible, as is argued by Richardson and Jones (2019), that the results from GWA studies “for intelligence” are entirely spurious, since the correlations may appear due to the size of the dataset, not the nature of it (Calude and Longo, 2017). Zhou and Zao (2019) argue that “For complex polygenic traits, spurious correlation makes the separation of causal and null SNPs difficult, leading to a doomed failure of PRS.” This is troubling for hereditarian claims when it comes to “genes for” “intelligence” and other socially-valued traits.
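
A simple way to see how chance correlations pile up as more variables are examined is to correlate pure noise with pure noise. The Python sketch below is only a multiple-comparisons illustration under invented settings (100 people, up to 1000 random predictors, a cutoff of |r| > 0.2). It is not the argument Calude and Longo make, and it varies the number of candidate predictors rather than the number of subjects.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable

n_people = 100       # hypothetical sample size
n_predictors = 1000  # hypothetical number of candidate variables
threshold = 0.2      # arbitrary cutoff for calling a correlation a "hit"

# The outcome and every predictor are independent random noise, so any
# correlation that clears the threshold is spurious by construction.
outcome = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
predictors = [[rng.gauss(0.0, 1.0) for _ in range(n_people)]
              for _ in range(n_predictors)]
correlations = [pearson(p, outcome) for p in predictors]

# Count hits among the first 10, 100 and 1000 predictors examined.
hits = {}
for k in (10, 100, 1000):
    hits[k] = sum(abs(r) > threshold for r in correlations[:k])
    print(k, "predictors examined ->", hits[k], "spurious hits")
```

Because every variable here is noise, the count of hits can only grow as more predictors are examined; none of them reflects a real relationship.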

How can hereditarians show PGS/PRS causation?

This is a hard question to answer, but I think I have an answer. The hereditarian must:

(1) provide a valid deductive argument whose conclusion is the phenomenon to be explained; (2) provide an explanans (the sentences adduced as the explanation for the phenomenon) that contains at least one lawlike generalization; and (3) show that the remaining premises, which state the preceding conditions, have empirical content and are true.

An explanandum is a description of the events that need explaining (in this case, PGS/PRS), while an explanans does the explaining, meaning that its sentences are adduced as explanations of the explanandum. Garson (2018: 30) gives the example of zebra stripes and flies. The explanans is Stripes deter flies while the explanandum is Zebras have stripes. So we can then say that zebras have stripes because stripes deter flies.

Causation for PGS would not be shown, for example, by showing that certain races/ethnies have higher PGSs for “intelligence”. The claim is that since Jews have higher PGSs for “intelligence” then it follows that PGSs can show causation (e.g., Dunkel et al, 2019; see Freese et al, 2019 for a response). But this just shows how ideology can and does color the conclusions one gleans from certain data. That is NOT sufficient to show causation for PGS.


PGSs cannot, currently, show causation. The studies that such scores are derived from fall prey to the fact that spurious correlations are inevitable in large datasets, which is also a problem for other hereditarian claims (about twins and GWA studies for “intelligence”). Thus, PGSs do not show causation, and because large datasets lead to spurious correlations, even increasing the number of subjects in a study would not elucidate “genetic causation.”

