“Microcephalin, a gene regulating brain size, continues to evolve adaptively in humans” (Evans et al, 2005) and “Adaptive evolution of ASPM, a major determinant of cerebral cortical size in humans” (Evans et al, 2004) are two papers from the same research team which purport to show that both MCPH1 and ASPM are “adaptive” and were therefore “selected-for” (see Fodor, 2008; Fodor and Piattelli-Palmarini, 2010 for discussion). Evans et al (2004) claim that “Darwinian selection” has “operated on” the ASPM gene; identifying signs of selection, along with the gene’s functional effect, is taken as evidence that it was “selected-for.” But the combination of a functional effect with signs of (supposedly) positive selection does not license the claim that the gene was “selected-for.”
One of the investigators who participated in these studies was Bruce Lahn, who stated in an interview that MCPH1 “is clearly favored by natural selection.” Evans et al (2005) specifically report that the MCPH1 variant supposedly under selection occurred at its lowest frequencies in Africans and its highest in Europeans.
But, unfortunately for IQ-ists, neither of these two alleles is associated with IQ. Mekel-Bobrov et al (2007: 601) write that their “overall findings suggest that intelligence, as measured by these IQ tests, was not detectably associated with the D-allele of either ASPM or Microcephalin.” Timpson et al (2007: 1036A), in a sample of 9,000 genotyped children, found “no meaningful associations with brain size and various cognitive measures, which indicates that contrary to previous speculations, ASPM and MCPH1 have not been selected for brain-related effects.” Rushton, Vernon, and Bons (2007) write that “No evidence was found of a relation between the two candidate genes ASPM and MCPH1 and individual differences in head circumference, GMA or social intelligence.” Bates et al’s (2008) analysis likewise shows no relationship between IQ and MCPH1-derived genes.
But, to bring up Fodor’s critique, if MCPH1 is coextensive with another gene, and both enhance fitness, then how can there be direct selection on the gene in question? There is no way for selection to distinguish between the two linked genes. Take Mekel-Bobrov et al (2005: 1722) who write:
The recent selective history of ASPM in humans thus continues the trend of positive selection that has operated at this locus for millions of years in the hominid lineage. Although the age of haplogroup D and its geographic distribution across Eurasia roughly coincide with two important events in the cultural evolution of Eurasia—namely, the emergence and spread of domestication from the Middle East ~10,000 years ago and the rapid increase in population associated with the development of cities and written language 5000 to 6000 years ago around the Middle East—the significance of this correlation is not clear.
Surely both of these genetic variants had a hand in the dawn of these civilizations and in the behaviors of our ancestors; they are correlated, right? Note that this conclusion is drawn from the research studies themselves—these types of wild speculation appear in the papers referenced above. Lahn and his colleagues, then, are engaging in very wild speculation—if, that is, these variants are even under positive selection.
So it seems that this research and the conclusions drawn from it are ripe for a just-so story. We need to do a just-so story check. Now let’s consult Smith’s (2016: 277-278) seven just-so story triggers:
1) proposing a theory-driven rather than a problem-driven explanation;
2) presenting an explanation for a change without providing a contrast for that change;
3) overlooking the limitations of evidence for distinguishing between alternative explanations (underdetermination);
4) assuming that current utility is the same as historical role;
5) misusing reverse engineering;
6) repurposing just-so stories as hypotheses rather than explanations; and
7) attempting to explain unique events that lack comparative data.
For example, take (1): a theory-driven explanation leads to a just-so story. As Shapiro (2002: 603) notes, “The theory-driven scholar commits to a sufficient account of a phenomenon, developing a “just so” story that might seem convincing to partisans of her theoretical priors. Others will see no more reason to believe it than a host of other “just so” stories that might have been developed, vindicating different theoretical priors.” The claim that these two genes were “selected-for” is, for Evans et al, a theory-driven explanation, and it therefore falls prey to the just-so story criticism.
Rasmus Nielsen (2009) has a paper on the thirty years of adaptationism after Gould and Lewontin’s (1979) Spandrels paper. In it, he critiques purported examples of genes being “selected-for”: the lactase gene, and MCPH1 and ASPM. Nielsen (2009) writes of MCPH1 and ASPM:
Deleterious mutations in ASPM and microcephalin may lead to reduced brain size, presumably because these genes are cell‐cycle regulators and very fast cell division is required for normal development of the fetal brain. Mutations in many different genes might cause microcephaly, but changes in these genes may not have been the underlying molecular cause for the increased brain size occurring during the evolution of man.
In any case, Currat et al (2006: 176a) show that “the high haplotype frequency, high levels of homozygosity, and spatial patterns observed by Mekel-Bobrov et al. (1) and Evans et al. (2) can be generated by demographic models of human history involving a founder effect out-of-Africa and a subsequent demographic or spatial population expansion, a very plausible scenario (5). Thus, there is insufficient evidence for ongoing selection acting on ASPM and microcephalin within humans.” McGowen et al (2011) show that there is “no evidence to support an association between MCPH1 evolution and the evolution of brain size in highly encephalized mammalian species. Our finding of significant positive selection in MCPH1 may be linked to other functions of the gene.”
Lastly, Richardson (2011: 429) writes that:
The force of acceptance of a theoretical framework for approaching the genetics of human intellectual differences may be assessed by the ease with which it is accepted despite the lack of original empirical studies – and ample contradictory evidence. In fact, there was no evidence of an association between the alleles and either IQ or brain size. Based on what was known about the actual role of the microcephaly gene loci in brain development in 2005, it was not appropriate to describe ASPM and microcephalin as genes controlling human brain size, or even as ‘brain genes’. The genes are not localized in expression or function to the brain, nor specifically to brain development, but are ubiquitous throughout the body. Their principal known function is in mitosis (cell division). The hypothesized reason that problems with the ASPM and microcephalin genes may lead to small brains is that early brain growth is contingent on rapid cell division of the neural stem cells; if this process is disrupted or asymmetric in some way, the brain will never grow to full size (Kouprina et al, 2004, p. 659; Ponting and Jackson, 2005, p. 246)
Now that we have a better picture of both of these alleles and what they are proposed to do, let’s turn to Lahn’s comments on his studies. Lahn, of course, brought up “lactase” and “skin color” genes in defense of his assertion that genes like ASPM and MCPH1 are linked to “intelligence” and thus were selected-for just that purpose. However, as Nielsen (2009) shows, that a gene has a functional effect and shows signs of selection does not license the claim that the gene in question was selected-for. Therefore, Lahn and colleagues engaged in fallacious reasoning; they did not show that such genes were “selected-for”, and even studies done by some prominent hereditarians failed to find an association between such genes and IQ.
As with what we now know about the FOXP2 gene—for which there is no evidence of recent positive or balancing selection (Atkinson et al, 2018)—we can now say the same for other evolutionary just-so stories that try to give an adaptive tinge to a trait. We cannot take selection plus function as evidence for adaptation. Such just-so stories, like the one described above along with others on this blog, can be told about any trait or gene to explain why it was selected and stabilized in the organism in question. But such historical narratives may be unfalsifiable. As Sterelny and Griffiths write in their book Sex and Death:
Whenever a particular adaptive story is discredited, the adaptationist makes up a new story, or just promises to look for one. The possibility that the trait is not an adaptation is never considered.
The “fade-out effect” occurs when interventions given to children to increase their IQs, such as Head Start (HS) or other similar programs, produce gains that later disappear. In such instances, where the initial IQ gains are clear, hereditarians argue that the effect of the interventions “washes away” or “fades out.” Thus, when discussing such studies, hereditarians think they are standing in victory: that the effects of the intervention fade is taken to be evidence for the hereditarian position and to refute a developmental, interactionist position. However, that couldn’t be further from the truth.
Think about where the majority of HS children come from—poorer environments, which are more likely to contain disadvantaged people. Since IQ tests—along with other tests of ability—are experience-dependent, it logically follows that one who is not exposed to the test items or the structure of the test, among other things, will be differentially prepared to take the test compared to, say, middle-class children who are exposed to such items daily.
When it comes to HS, for instance, whites who attend HS are “significantly more likely to complete high school, attend college, and possibly have higher earnings in their early twenties. African-Americans who participated in Head Start are less likely to have been booked or charged with a crime” (Garces, Thomas, and Currie, 2002). Deming (2009) shows many positive health outcomes in those who attend HS. This is beside the point, though (even if we accept the hereditarian hypothesis here, there are still many, many good reasons for programs such as HS).
Students who were randomly assigned to higher quality classrooms in grades K–3—as measured by classmates’ end-of-class test scores—have higher earnings, college attendance rates, and other outcomes. Finally, the effects of class quality fade out on test scores in later grades, but gains in noncognitive measures persist.
So such gains “faded out”, therefore hereditarianism is a more favorable position, right? Wrong.
Think about test items, and testing as a whole. Then think about the differing environments that social classes occupy. Now think about how exposure to such items and similar questions would affect the test-taking ability of the individual in question. Since tests of ability are experience-dependent, the logical position to hold is that those exposed to the knowledge and experience needed for successful test-taking will score higher. And this is what we see when such individuals are enrolled in the program; but when the program ends and the scores decrease, the hereditarian trumpets it as another piece of the puzzle, another piece of evidence in favor of their position. Howe (1997: 53) explains this perfectly:
It is an almost universal characteristic of acquired competences that when there is a prolonged absence of opportunities to use, practise, and profit from them, they do indeed decline. It would therefore be highly surprising if acquired gains in intelligence did not fade or diminish. Indeed, had the research findings shown that IQs never fade or decline, that evidence would have provided some support for the view that measured intelligence possesses the inherent — rather than acquired — status that intelligence theorists and other writers within the psychometric position have believed it to have.
A similar claim is made by Sauce and Matzel (2018):
In simpler terms, the analysis of Protzko should not lead us to conclude that early intervention programs such as Head Start can have no long-term benefits. Rather, these results highlight the need to provide participants with continuing opportunities that would allow them to capitalize on what might otherwise be transient gains in cognitive abilities.
Now, if we think in the context of HS and similar interventions, we can see why such stark differences in scores appear, and why some studies show a fade-out effect. The new knowledge and skills (what IQ tests are tests of; Richardson, 2002) are largely useless back in the children’s home environments, where they have little to no opportunity to hone their newly acquired skills.
Take success in an action video game, weight-lifting, bodybuilding (muscle-gaining), or pole-vaulting. One who does well in any of these pursuits will of course have put in countless hours of training, learning new techniques and skills. They continue this for a while. Then they abruptly stop. They are no longer honing (and practicing) their acquired skills, so they begin to lose them. The “fade-out effect” has affected their performance, and the reason is their environmental stimulation—the same holds for IQ test scores.
I’ll use the issue of muscle-building to illustrate the comparison. Imagine you’re 20 years old and just start going to the gym on a good program. In the first few months you get what are termed “newbie gains”, as your body and central nervous system begin to adapt to the new stressor you’re placing on your body. After this initial period, at about 2 to 3 months, these gains taper off, and you then have to be consistent with your training and diet or you won’t progress in weight lifted or body composition. But suppose you are consistent with training and diet, and you achieve a satisfactory body composition and strength gains.
But then things change: you stop going to the gym as often as you did before and you get lazy with your nutrition. The body composition you worked so hard for, along with your strength gains, starts to dissipate, since you’re no longer placing your body under the stressor it was previously under. (There is, though, something called “muscle memory”, which occurs due to motor learning in the central nervous system.)
The comparison here is clear: strength is IQ and lifting weights is doing tests/tasks to prepare for the tests (exposure to middle-class knowledge and skills). So when one leaves their “enriching environments” (in this case, the gym and a good nutritional environment), they then lose the gains they worked for. The parallel then becomes clear: leave the enriched environments and return to the baseline. This example I have just illustrated shows exactly how and why these gains “fade out” (though they don’t in all of these types of studies).
One objection to my comparison I can imagine an IQ-ist making is that in training for strength (which is analogous to the interventions in programs like HS), one can only get as strong as, for example, one’s frame allows—that there is a limit past which one cannot add musculature. They may say, likewise, that one can only reach a certain IQ before one’s “genetic potential” maxes out, as it would in the muscle-building and strength-gaining example. But the objection fails. Tests of ability (IQ tests) are cultural in nature; since they are cultural in nature, exposure to what’s on the test (middle-class knowledge and skills) will have one score better. That is, IQ tests are experience-dependent, as are body composition and strength—but IQ tests (1) are not construct valid and (2) are biased due to the items selected to be on them. With weights, by contrast, we have an objective, valid measure. Sure, weight lifted reflects a whole slew of variables in addition to what it is intended to measure (strength), but the measure itself is objective.
Therefore, my example with weights illustrates that if one removes oneself from the enriching environments that allow X, then X will necessarily decline. But due to, in this example, muscle memory, one can quickly return to where one was. Such gains will “fade out” if, and only if, one discontinues training and meal prep, among other things. The same is true for IQ in these intervention studies.
Howe (1997: 54-55) (this editorial here has the discussion, pulled directly from the book) discusses the study carried out by Zigler and Seitz, who measured the effects of a four-year intervention program emphasizing math skills. The participants were inner-city children enrolled in the program at kindergarten. The program was successful, in that those who participated were two years ahead of a control group, but by a later follow-up they were only a year ahead. Howe (1997: 54-55) explains why:
For instance, to score well at the achievement tests used with older children it is essential to have some knowledge of algebra and geometry, but Seitz found that while the majority of middle-class children were being taught these subjects, the disadvantaged pupils were not getting the necessary teaching. For that reason they could hardly be expected to do well. As Seitz perceived, the true picture was not one of fading ability but of diminishing use of it.
So in this case, the knowledge gained from the intervention was not lost. Do note, though, how middle-class knowledge continues to appear in these discussions. That’s because tests of ability are cultural in nature, since culture-fair tests are impossible (Cole, 2004). Cole imagines a West African Binet who constructs a test of Kpelle culture. Cole (2004) concludes that:
tests of ability are inevitably cultural devices. This conclusion must seem dreary and disappointing to people who have been working to construct valid, culture-free tests. But from the perspective of history and logic, it simply confirms the fact, stated so clearly by Franz Boas half a century ago, that “mind, independent of experience, is inconceivable.”
So, in this case, the test would be testing Kpelle knowledge, and not middle-class cultural skills and knowledge, which proves that IQ tests are bound by culture and that culture-fair (“free”) tests are impossible. This, then, also shows why such gains in test scores decrease: they are not in the types of environments that are conducive to that type of culture-specific knowledge (see some examples of questions on IQ tests here).
The fact of the matter is this: the individuals in such studies return to their “old” environments, and that is why their IQ gains disappear. People just focus on the scores and say “they decreased”, hardly thinking about why. Why should test scores reflect the efficacy of HS and similar programs, and not the fact that outcomes for children in these programs are substantially better than for those who did not participate? For example:
HS children fared better than non-HS children on cognitive and socio-emotive measures and had fewer negative behaviors (Zhai et al, 2011). Adults who were in the HS program are more likely to graduate high school, go to college, and receive a secondary degree (Bauer and Schanzenbach, 2016). A pre-school program raised standardized test scores through grade 5. Those who attended HS were less likely to become incarcerated or become teen parents, and are more likely to finish high school and enroll in college (Barr and Gibbs, 2017).
The cause of the fading out of scores is simple: if you don’t use it, you lose it, as can be seen in the examples given above. That IQ scores can and do increase is evidenced by the Flynn effect, so that is not touched by the fade-out effect. But this “fading-out” of scores (in most studies; see Howe for more information) is, in my opinion, ancillary to the main point: those who attend HS and similar programs have better outcomes in life than those who did not attend. The literature on the matter is vast. Therefore, the “fading-out” of test scores doesn’t matter, as outcomes for those who attended are better than outcomes for those who did not.
HS and similar programs show that IQ is, indeed, malleable, not “set” or “stable” as hereditarians claim. That IQ tests are experience-dependent implies that those who receive such interventions get a boost, but when they leave, their abilities decrease, because they stop learning new ones and return to their previous, less stimulating environments. The cause of the “fading-out” is therefore simple: during the intervention they are engrossed in an enriching environment, learning, by proxy, the middle-class knowledge and skills which help with test performance. But after they’re done they return to their previous environments, where they do not put their skills to use, and they therefore regress. As with my muscle-building example: if you don’t use it, you lose it.
Validity for IQ tests is fleeting. IQ tests are said to be “validated” on the basis of correlations with other IQ tests and with job performance (see Richardson and Norgate, 2015). Further, IQ tests are claimed not to be biased against any social class or racial group. Moreover, through the process of “item selection”, test constructors create the type of distribution they want (normal) and get the results they want through the subjective procedure of removing items that don’t agree with their preconceived notions of who is or is not “intelligent.” Lastly, “intelligence” is a descriptive measure, not an explanatory concept, and treating it as an explanatory concept can—and does—lead to circularity (which is rife in the subject of IQ testing; see Richardson, 2017b and Taleb’s article IQ is largely a pseudoscientific swindle). This article will show that, on the basis of test construction, item analysis (the selection and deselection of items), and the fact that there is no theory of what is being measured by so-called intelligence tests, such tests do not, in fact, test what they purport to.
Richardson (1991: 17) states that “To measure is to give … a more reliable sense of quantity than our senses alone can provide”, and that “sensed intelligence is not an objective quantity in the sense that the same hotness of a body will be felt by the same humans everywhere (given a few simple conditions); what, in experience, we choose to call ‘more’ intelligence, and what ‘less’ [is] a social judgement that varies from people to people, employing different criteria or signals.” Richardson (1991: 17-18) goes on to say that:
Even if we arrive at a reliable instrument to parallel the experience of our senses, we can claim no more for it than that, without any underlying theory which relates differences in the measure to differences in some other, unobserved, phenomena responsible for those differences. Without such a theory we can never be sure that differences in the measure corresponding with our sensed intelligence aren’t due to something else, perhaps something completely different. The phenomenon we at first imagine may not even exist. Instead of such verification, most inventors and users of measures of intelligence … have simply constructed the source of differences in sensed intelligence as an underlying entity or force, rather in the way that children and naïve adults perceive hotness as a substance, or attribute the motion of objects to a fictitious impetus. What we have in cases like temperature, of course, are collateral criteria and measures that validate the theory, and thus the original measures. Without these, the assumed entity remains a fiction. This proved to be the case with impetus, and with many other naïve conceptions of nature, such as phlogiston (thought to account for differences in health and disease). How much greater such fictions are likely to be with unobserved, dynamic and socially judged concepts like intelligence.
Richardson (1991: 32-35) then goes on to critique many of the old IQ tests, in that they had no way of being construct valid, and that the manuals did not even discuss the validity of the test—it was just assumed.
If we do not know what exactly is being measured when test constructors make and administer these tests, then how can we logically state that “IQ tests test intelligence”? Even Arthur Jensen admitted that psychometricians can create any type of score distribution they please (1980: 71); he tacitly admits that tests are devised through the selection and deselection of items that correspond to the test constructors’ preconceived notions of what “intelligence” is. This, again, is admitted by Jensen (1980: 147-148), who writes, “The items must simply emerge arbitrarily from the heads of test constructors.”
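Jensen’s admission is easy to illustrate with a toy simulation. The model below is entirely hypothetical (a simple logistic item-response model with made-up difficulties and a deliberately skewed latent trait), but it shows how an item-analysis pass over the pool, dropping whichever item most reduces the skewness of the total score, reshapes the score distribution without anyone touching the test-takers:

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_items = 2000, 60

# Hypothetical skewed latent trait: most people low, a long right tail.
trait = rng.exponential(scale=1.0, size=n_people)
difficulty = rng.uniform(-1.0, 4.0, size=n_items)

# Item responses from a toy logistic model (an illustrative assumption,
# not any real test's scoring model).
p_correct = 1.0 / (1.0 + np.exp(-(trait[:, None] - difficulty[None, :])))
responses = (rng.random((n_people, n_items)) < p_correct).astype(float)

def skew(x):
    x = x - x.mean()
    return (x**3).mean() / (x**2).mean() ** 1.5

def score_skew(items):
    return skew(responses[:, items].sum(axis=1))

keep = list(range(n_items))
before = score_skew(keep)

# "Item analysis": repeatedly drop whichever item most reduces the skewness
# of the total score, i.e., select items toward a chosen distribution shape.
for _ in range(20):
    best = min(keep, key=lambda i: abs(score_skew([j for j in keep if j != i])))
    if abs(score_skew([j for j in keep if j != best])) < abs(score_skew(keep)):
        keep.remove(best)
    else:
        break

after = score_skew(keep)
print(f"skew before: {before:.2f}, after item selection: {after:.2f}")
```

The same greedy procedure could target any shape criterion (symmetry, a particular spread, a chosen group ordering); the distribution that comes out is a property of the chosen item pool, not of the people taking the test.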
To build on Richardson’s temperature example: we know exactly what is being measured when we look at the height of mercury in a thermometer. That is, the concept of “temperature” and the instrument to measure it (the thermometer) were verified independently, without circular reliance on the thermometer itself (see Hasok Chang’s 2004 book Inventing Temperature). IQ tests, on the other hand, are supposedly “validated” through measures of job performance and correlations with other, previous tests assumed to be (construct) valid—but those tests were themselves just assumed to be valid; it was never shown.
For another example (as I’ve shown with IQ many times) of a psychological construct that is not valid, take ASD (autism spectrum disorder). Waterhouse, London, and Gillberg (2016) present “14 groups of findings reviewed in this paper that together argue that ASD lacks neurobiological and construct validity. No unitary ASD brain impairment or replicated unitary model of ASD brain impairment exists.” That a construct is valid—that is, that the test measures what it purports to—is of utmost importance in test measurement. Without validity, we don’t know whether we’re measuring something completely different from what we hope—or purport—to measure.
There is another problem: for one of the most-used IQ tests, there is no underlying theory of item selection, as seen in John Raven’s personal notes (see Carpenter, Just, and Shell, 1990). Items on the Raven were selected based on Raven’s intuition, not on any formal theory—and the same can be said, of course, about modern-day IQ tests. Carpenter, Just, and Shell (1990) write that John Raven “used his intuition and clinical experience to rank order the difficulty of the six problem types . . . without regard to any underlying processing theory.”
These preconceived notions of what “intelligence” is, though, fail without (1) a theory of what intelligence is (and, as admitted by Ian Deary (2001), there is no theory of human intelligence in the way that physics has theories); and (2) what is termed “construct validity”—evidence that a test measures what it purports to. There are a few kinds of validity; what IQ-ists claim most often is that IQ tests have predictive validity—that is, that they can predict an individual’s outcomes in life and job performance. However, “intelligence” is “a descriptive measure, not an explanatory concept … [so] measures of intelligence level have little or no predictive value” (Howe, 1988).
Howe (1997: ix) also tells us that “Intelligence is … an outcome … not a cause. … Even the most confidently stated assertions about intelligence are often wrong, and the inferences that people have drawn from those assertions are unjustified.”
The correlation between IQ and school performance, according to Richardson (1991: 34) “may be a necessary aspect of the validity of tests, but is not a sufficient one. Such evidence, as already mentioned, requires a clear connection between a theory (a model of intelligence), and the values on the measure.” But, as Richardson (2017: 85) notes:
… it should come as no surprise that performance on them [IQ tests] is associated with school performance. As Robert L. Thorndike and Elizabeth P. Hagen explained in their leading textbook, Educational and Psychological Measurement, “From the very way in which the tests were assembled [such correlation] could hardly be otherwise.”
Gottfredson (2009) claims that the construct validity argument against IQ is “fallacious”, noting it as one of her “fallacies” on intelligence testing (one of her “fallacies” was the “interactionism fallacy”, which I have previously discussed). However, unfortunately for Gottfredson (2009), “the phenomena that testers aim to capture” are built into the test and, as noted here numerous times, preconceived by the constructors of the test. So, Gottfredson’s (2009) claim fails.
Such construction, too, comes into the claim of a “normal distribution.” Just like notions of who is or is not “intelligent”, the normal distribution is an artifact of test construction—of the selection and deselection of items to conform with the test constructors’ presuppositions. The “bell curve” of IQ is created by the presuppositions that the test constructors have about people and society (Simon, 1997).
Charles Spearman, in the early 1900s, claimed to have found a “general factor” that explains the correlations between different tests. This positive manifold he termed “g”, for “general intelligence.” Spearman stated, “The (g) factor was taken, pending further information, to consist in something of the nature of an ‘energy’ or ‘power’…” (quoted in Richardson, 1991: 38). The refutation of “g” is a simple, logical one: while a correlation between performances “may be a necessary requirement for a general factor … it is not a sufficient one.” This is because “it is quite possible for quite independent factors to produce a hierarchy of correlations without the existence of any underlying ‘general’ factor (Fancher, 1985a; Richardson and Bynner, 1984)” (Richardson, 1991: 38). The fact of the matter is, Spearman’s “g” has been refuted for decades: it was shown to be reified by Gould (1981), and further defenses of “general intelligence”, like Jensen’s (1998), have been refuted, most forcefully by Peter Schönemann. Moreover, “g” is something built into the test by way of test construction (Richardson, 2002).
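That independent factors can yield an all-positive correlation matrix is easy to demonstrate with a toy simulation. In the sketch below (all names and loadings are hypothetical), four mutually independent abilities are combined into four tests, each drawing on three of the four, so every pair of tests correlates positively even though no general factor exists by construction:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000  # simulated test-takers

# Four mutually independent "abilities": no general factor by construction.
abilities = rng.normal(size=(n, 4))

# Each hypothetical test draws on three of the four abilities, so every
# pair of tests shares two of them.
tests = np.column_stack(
    [abilities[:, [j for j in range(4) if j != i]].sum(axis=1) for i in range(4)]
)
tests += rng.normal(scale=0.5, size=tests.shape)  # measurement noise

corr = np.corrcoef(tests, rowvar=False)
off_diag = corr[~np.eye(4, dtype=bool)]
print(off_diag.min() > 0)  # a positive manifold with no underlying "g"
```

A principal components analysis of `tests` would even produce a large first component, which is exactly why a positive manifold alone cannot establish a general factor.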
Castles (2013: 93) notes that “Spearman did not simply discover g lurking in his data. Instead, he chose one peculiar interpretation of the relationships to demonstrate something in which he already believed—unitary, biologically based intelligence.”
So what explains differences in “g”? The same test construction noted above along with differences in social class, due to stress, self-confidence, test preparedness and other factors correlated with social class, termed the “sociocognitive-affective nexus” (Richardson, 2002).
Constance Hilliard, in her book Straightening the Bell Curve (Hilliard, 2012), notes that there were differences in IQ between rural and urban white South Africans. She notes that differences between those who spoke Afrikaans and those who spoke another language were completely removed through test construction (Hilliard, 2012: 116). Hilliard (2012) notes that if the tests that the constructors formulate don’t agree with their preconceived notions, they are then thrown out:
If the individuals who were supposed to come out on top didn’t score highly or, conversely, if the individuals who were assumed would be at the bottom of the scores didn’t end up there, then the test designers scrapped the test.
Sex differences in “intelligence” (IQ) were the subject of some debate in the early-to-mid 1900s. Test constructors debated amongst themselves what to do about such differences between the sexes. Hilliard (2012) quotes Harrington (1984; in Perspectives on Bias in Mental Testing), who writes about normalizing test scores between men and women:
It was decided [by IQ test writers] a priori that the distribution of intelligence-test scores would be normal with a mean (X=100) and a standard deviation (SD=15), also that both sexes would have the same mean and distribution. To ensure the absence of sex differences, it was arranged to discard items on which the sexes differed. Then, if not enough items remained, when discarded items were reintroduced, they were balanced, i.e., for every item favoring males, another one favoring females was also introduced.
One who would construct a test for intellectual capacity has two possible methods of handling the problem of sex differences.
1 He may assume that all the sex differences yielded by his test items are about equally indicative of sex differences in native ability.
2 He may proceed on the hypothesis that large sex differences on items of the Binet type are likely to be factitious in the sense that they reflect sex differences in experience or training. To the extent that this assumption is valid, he will be justified in eliminating from his battery test items which yield large sex differences.
The authors of the New Revision have chosen the second of these alternatives and sought to avoid using test items showing large differences in percents passing. (McNemar 1942:56)
Change “sex differences” to “race” or “social class” differences and we could likewise change the distribution of the curve, along with notions of who is or is not “intelligent.” Previously low scorers can, by way of test construction, become high scorers, and high scorers can be made into low scorers. There is no logical or empirical justification for the inclusion of specific items on whatever IQ test is in question. To put it another way, the inclusion of items on a test is subjective: it comes down to the test designers’ preconceived notions, not to an objective measure of which types of items belong on the test. As Raven stated, there is no underlying theory for the inclusion of items; it is based on “intuition” (which is the same thing modern-day test constructors rely on). These two quotes from IQ-ists in the early 20th century are paramount in the attack on the validity of IQ tests, and on the causes of score differences between groups.
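The procedure McNemar describes, discarding items on which the groups differ, can be sketched in a few lines. This is a toy simulation under assumed parameters of my own (two groups of 2,000, 60 pass/fail items, one group given a pass-rate advantage on half of them); it is not data from any real test:

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_group, n_items = 2000, 60

# Group A passes the first 30 ("culturally loaded") items more often;
# the remaining 30 items behave identically in both groups.
ease_a = np.concatenate([np.full(30, 0.7), np.full(30, 0.5)])
ease_b = np.full(n_items, 0.5)

resp_a = rng.random((n_per_group, n_items)) < ease_a
resp_b = rng.random((n_per_group, n_items)) < ease_b

# Full test: a large gap in mean total score between the groups.
gap_full = resp_a.sum(axis=1).mean() - resp_b.sum(axis=1).mean()

# "Item analysis": discard every item whose pass rates differ
# noticeably between groups, then rescore on the surviving items.
diff = resp_a.mean(axis=0) - resp_b.mean(axis=0)
keep = np.abs(diff) < 0.05
gap_trimmed = (resp_a[:, keep].sum(axis=1).mean()
               - resp_b[:, keep].sum(axis=1).mean())

print("gap on full test:", round(gap_full, 2))
print("gap after item selection:", round(gap_trimmed, 2))
```

The gap of roughly six points on the full test all but vanishes once the differing items are discarded; run the selection in the opposite direction (keeping only the loaded items) and the gap grows instead. The score distribution simply follows the item-selection rule.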
He and van de Vijver (2012: 7) write that “An item is biased when it has a different psychological meaning across cultures. More precisely, an item of a scale (e.g., measuring anxiety) is said to be biased if persons with the same trait, but coming from different cultures, are not equally likely to endorse the item (Van de Vijver & Leung, 1997).” Indeed, Reynolds and Suzuki (2012: 83) write that “item bias” can be due to:
“poor item translation, ambiguities in the original item, low familiarity/appropriateness of the item content in certain cultures, or influence of culture specifics such as nuisance factors or connotations associated with the item wording” (van de Vijver and Tanzer, 2004: 127)
Drame and Ferguson (2017) note that their “Results indicate that use of the Ravens may substantially underestimate the intelligence of children in Mali” while the cause may be due to the fact that:
European and North American children may spend more time with play tasks such as jigsaw puzzles or connect the dots that have similarities with the Ravens and, thus, train on similar tasks more than do African children. If African children spend less time on similar tasks, they would have fewer opportunities to train for the Ravens (however unintentionally) reflecting in poorer scores. In this sense, verbal ability need not be the only pitfall in selecting culturally sensitive IQ testing approaches. Thus, differences in Ravens scores may be a cultural artifact rather than an indication of true intelligence differences. [Similar arguments can be found in Richardson, 2002: 291-293]
The same was also found by Dutton et al (2017), who write that “It is argued that the undeveloped nature of South Sudan means that a test based around shapes and analytic thinking is unsuitable. It is likely to heavily under-estimate their average intelligence.” So if the Raven has these problems across countries, then it should have such biases within a single country as well, say, America.
It is also true that the types of items on IQ tests are not as complex as everyday life (see Richardson and Norgate, 2014). The questions on IQ tests are, in effect, ones of middle-class knowledge and skills; knowing how IQ tests are structured, along with the types of items that eventually make it onto a particular test, makes this claim clear. Richardson (2002) reproduces a few questions from modern-day IQ tests, as does Castles (2013) from the Stanford-Binet. This, of course, reflects the social class of the test constructors. Some examples can be seen here:
‘What is the boiling point of water?’ ‘Who wrote Hamlet?’ ‘In what continent is Egypt?’ (Richardson, 2002: 289)
‘When anyone has offended you and asks you to excuse him—what ought you do?’ ‘What is the difference between esteem and affection?’ [these are from the Binet Scales, but “It is interesting to note that similar items are still found on most modern intelligence tests” (Castles, 2013).]
Castles (2013: 150) further gives made-up examples of the kinds of items on the WAIS (as a licensed psychologist, she cannot legally reveal actual questions), and she writes:
One section of the WAIS-III, for example, consists of arithmetic problems that the respondent must solve in his or her head. Others require test-takers to define a series of vocabulary words (many of which would be familiar only to skilled readers), to answer school-related factual questions (e.g., “Who was the first president of the United States?” or “Who wrote the Canterbury Tales?”), and to recognize and endorse common cultural norms and values (e.g., “What should you do if a sales clerk accidentally gives you too much change?” or “Why does our Constitution call for division of powers?”). True, respondents are also given a few opportunities to solve novel problems (e.g., copying a series of abstract designs with colored blocks). But even these supposedly culture-fair items require an understanding of social conventions, familiarity with objects specific to American culture, and/or experience working with geometric shapes or symbols.
All of these factors coalesce into the claim, and the argument, that IQ tests are tests of middle-class knowledge and skills. The thing is, contrary to the claims of IQ-ists, there is no such thing as a culture-free IQ test. Richardson (2002: 293) notes that “Since all human cognition takes place through the medium of cultural/psychological tools, the very idea of a culture-free test is, as Cole (1999) notes, ‘a contradiction in terms . . . by its very nature, IQ testing is culture bound’ (p. 646). Individuals are simply more or less prepared for dealing with the cognitive and linguistic structures built in to the particular items.”
Cole (1981) notes that “the notion of a culture free IQ test is an absurdity” because “all higher psychological processes are shaped by our experiences and these experiences are culturally organized” (a point that Richardson has driven home for decades), while also—rightly—stating that “IQ tests sample school activities, and therefore, indirectly, valued social activities, in our culture.”
One of the last stands for the IQ-ist is to claim that IQ tests are useful for identifying individuals at risk for learning disabilities (the purpose for which Binet originally created the first IQ tests). However, IQ tests are neither necessary nor sufficient for the identification of those with learning disabilities. Siegel (1989) states that “On logical and empirical grounds, IQ test scores are not necessary for the definition of learning disabilities.”
When Goddard brought the first IQ tests to America and translated them from French into English, the IQ testing conglomerate really took off (see Zenderland, 1998 for a review). These tests were used to justify existing social ranks. As Richardson (1991: 44) notes, “The measurement of intelligence in the twentieth century arose partly out of attempts to ‘prove’ or justify a particular world view, and partly for purposes of screening and social selection. It is hardly surprising that its subsequent fate has been one of uncertainty and controversy, nor that it has raised so many social and political issues (see, for example, Joynson 1989 for discussion of such issues).” So what actual attempts at validation did the constructors of such tests make in the 20th century, when they knew full well what they wanted to show and, unsurprisingly, observed it (since they constructed the tests to guarantee that outcome)?
The conceptual arguments just given here point to a few things:
(1) IQ tests are not construct valid because there is no theory of intelligence, nor is there an underlying theory which relates differences in IQ (the unseen function) to, for example, a physiological variable. (See Uttal, 2012; 2014 for arguments against fMRI studies that purport to link physiological variables to cognition.)
(2) The fact that items on the tests are biased against certain classes/cultures; this obviously matters since, as noted above, there is no theory for the inclusion of items, it comes down to the subjective choice of the test designers, as noted by Jensen.
(3) ‘g’ is a reified mathematical abstraction; Spearman “discovered” nothing, he just chose the interpretation that, of course, went with his preconceived notion.
(4) The fact that sex differences in IQ scores were seen as a problem and, through item analysis, made to go away. This tells us that we can do the same for class/race differences in intelligence. Score differences are a function of test construction.
(5) The fact that the Raven has been shown to be biased in two African countries lends credence to the claims here.
So this brings us to the ultimate claim of this article: IQ tests don’t test intelligence; they test middle-class knowledge and skills. The scores on IQ tests are therefore not measures of intelligence but an index of one’s familiarity with the knowledge structure of the middle class. Thus, IQ scores are, in actuality, “middle-class knowledge and skills” scores. So, contra Jensen (1980), there is bias in mental testing, due to the items chosen for inclusion on the test (we have admission from IQ-ists themselves that score variances and distributions can be changed).
Look at a person with Down Syndrome (DS) and then look at an Asian person. Do you see any similarities? Others, throughout the course of the 20th century, have. DS is a disorder arising from a chromosomal defect which causes mental and physical abnormalities, short stature, a broad facial profile, and slanted eyes. In most cases, a person with DS has an extra copy of chromosome 21, which is why the disorder is called “trisomy 21” in the scientific literature.
I am not sure whether most “HBDers” know this, but Asians in America were treated similarly to blacks in the mid-20th century (with similar claims made about genital and brain size). Whites used to be said to have the biggest brains of all the races, but this changed sometime in the 20th century. Lieberman (2001: 72) writes that:
The shrinking of “Caucasoid” brains and cranial size and the rise of “Mongoloids” in the papers of J. Philippe Rushton began in the 1980s. Genes do not change as fast as the stock market, but the idea of “Caucasian” superiority seemed contradicted by emerging industrialization and capital growth in Japan, Taiwan, Hong Kong, Singapore, and Korea (Sautman 1995). Reversing the order of the first two races was not a strategic loss to raciocranial hereditarianism, since the major function of racial hierarchies is justifying the misery and lesser rights and opportunities of those at the bottom.
So Caucasian skulls began to shrink just as, coincidentally I’m sure, Japan began to climb out of the rut it fell into after WWII. Morton claimed that Caucasians had the biggest brains, with Mongoloids in the middle and Africans with the smallest; then came Rushton to state that, in fact, it was East Asians who had the biggest brains. Hilliard (2012: 90-91) writes:
In the nineteenth century, Chinese, Japanese, and other Asian males were often portrayed in the popular press as a sexual danger to white females. Not surprising, as Lieberman pointed out, during this era, American race scientists concluded that Asians had smaller brains than whites did. At the same time, and most revealing, American children born with certain symptoms of mental retardation during this period were labeled “mongoloid idiots.” Because the symptoms of this condition, which we now call Down Syndrome, includes “slanting” eyes, the old label reinforced prejudices against Asians and assumptions that mental retardation was a peculiarly “mongoloid” racial characteristic.
[Hilliard also notes that “Scholars identified Asians as being less cognitively evolved and having smaller brains and larger penises than whites.” (pg 91)]
So, views on Asians were different back in the 19th and 20th centuries, with it even being said that Asians had smaller brains and bigger penises than whites (weird…).
Mafrica and Fodale (2007) note that the history of the term “mongolism” began in 1866, with the author distinguishing between “idiots”: the Ethiopian, the Caucasian, and the Mongoloid. What led Langdon Down (the author of the 1866 paper) to make this comparison was the almond-shaped eyes that people with DS have as well. Though Mafrica and Fodale (2007: 439) note that other traits could have prompted the comparison, “such as fine and straight hair, the distribution of apparatus piliferous, which appears to be sparse.” Mafrica and Fodale (2007: 439) also note more similarities between people with DS and Asians:
Down persons during waiting periods, when they get tired of standing up straight, crouch, squatting down, reminding us of the ‘‘squatting’’ position described by medical semeiotic which helps the venous return. They remain in this position for several minutes and only to rest themselves this position is the same taken by the Vietnamese, the Thai, the Cambodian, the Chinese, while they are waiting at a bus stop, for instance, or while they are chatting.
There is another pose taken by Down subjects while they are sitting on a chair: they sit with their legs crossed while they are eating, writing, watching TV, as the Oriental peoples do.
Another, funnier, thing noted by Mafrica and Fodale (2007) is that people with DS may like to have a few plates across the table, while preferring foodstuffs that are high in MSG (monosodium glutamate). They also note that people with DS are more likely to have thyroid disorders, like hypothyroidism; there is an increased risk of congenital hypothyroidism in Asian families, too (Rosenthal, Addison, and Price, 1988). They also note that people with DS are likely “to carry out recreative–reabilitative activities, such as embroidery, wicker-working ceramics, book-binding, etc., that is renowned, remind the Chinese hand-crafts, which need a notable ability, such as Chinese vases or the use of chop-sticks employed for eating by Asiatic populations” (pg 439). They then state that “it may be interesting to know the gravity with which the Downs syndrome occurs in Asiatic population, especially in Chinese population.” How common is it, and do Chinese DS babies look any different from other races’ DS babies?
See, e.g., Table 2 from Emanuel et al (1968):
Emanuel et al (1968: 465) write that “Almost all of the stigmata of Down’s syndrome presented in Table 2 appear also to be of significance in this group of Chinese patients. The exceptions have been reported repeatedly, and they all probably occur in excess in Down’s syndrome.”
Examples such as this are great for showing the contingency of certain observations, like racial differences in “intelligence.” Asians, today, are revered for “hard work,” being “very intelligent,” and having “low crime rates.” But even as recently as the mid-20th century, and going back to the mid-18th century, Asians (or Mongoloids, as Rushton calls them) were said to have smaller brains and larger penises. Anti-miscegenation laws applied to Asians too, of course, forbidding intermarriage between Asians and whites in order “to preserve the ‘racial integrity’ of whites” (Hilliard, 2012: 91).
Hilliard (2012) also discusses the stereotype of the effeminate, small-penised Asian man. Hilliard (2012: 86) writes:
However, it is also possible that establishing the racial supremacy of whites was not what drove this research on racial hierarchies. If so, the IQ researchers were probably justified in protesting their innocence, at least in regard to the charge of being racial supremacists, for in truth, the Asians’ top ranking might have unintentionally underscored the true sexual preoccupations underlying this research in the first place. It now seems that the real driving force behind such work was not racial bigotry so much as it was the masculine insecurities emanating from the unexamined sexual stereotypes still present within American popular culture. Scholars such as Rushton, Jensen, and Herrnstein provided a scientific vocabulary and mathematically dense charts and graphs to give intellectual polish to these preoccupations. Thus, it became useful to tout the Asians’ cognitive superiority but only so long as whites remained above blacks in the cognitive hierarchy.
Of course, by switching the racial hierarchy—but keeping the bottom the same—IQ researchers can say “We’re not racists! If we were, why would we state that Asians were better on trait T than we were!”, as has been noted by John Relethford (2001: 84) who writes that European-descended researchers “can now deflect charges of racism or ethnocentrism by pointing out that they no longer place themselves at the top. Lieberman aptly notes that this shift does not affect the major focus of many ideas regarding racial superiority that continue to place people of recent African descent at the bottom.” While biological anthropologist Fatima Jackson (2001: 83) states that “It is deemed acceptable for “Mongoloids” to have larger brains and better performance on intelligence tests than “Caucasoids,” since they are (presumably) sexually and reproductively compromised with small genitalia, low fertility, and delayed maturity.”
The main thesis of Straightening the Bell Curve is that preoccupations with brain and genital size are a driving force for the psychologists who study racial differences. Asians were said to have smaller penises and larger heads but little interest in sex; blacks, larger penises, smaller heads, and a greater interest in sex; while whites were, like Goldilocks, juuuuuust right: a penis size in between Asians and blacks and a brain neither too big nor too small. So, stating that a given race had smaller brains and bigger penises seems, as Hilliard argues, to be a coping mechanism for certain researchers and a way to drive women away from that racial group.
In any case, how weird it is for Asians (“Mongoloids”) to be ridiculed as having small brains and large penises (a hilarious reversal of Rushton’s r/K bullshit) and then, all of a sudden, to come out on top over whites, while whites remain above blacks in the racial hierarchy. How weird it is for the placements to change with certain economic events in a country’s history. Though, as many authors have noted, for instance Chua (1999), Asian men have faced emasculation and feminization in American society. So, since they were seen to be “undersexed, they were thus perceived as minimal rivals to white men in the sexual competition for women” (Hilliard, 2012: 87).
So, just as observations of racial/national IQ are contingent on the time and place of the observation, so too are observations of racial differences in other traits, and so too is how they can be used for a political agenda. As Constance Hilliard (2012: 85) writes, referring to Professor Michael Billig’s article A dead idea that will not lie down (in reference to race science), “… scientific ideas did not develop in a vacuum but rather reflected underlying political and economic trends.” And so, this is why “mongoloid idiots” and undersexed Asians appeared in American thought in the mid-20th century. The ideas noted here (mongolism; undersexed Asians; Asians with small or large penises and small or large brains, depending on the time and place of the observation) show, again, the contingency of these racial hierarchies, which, of course, still keep blacks on the bottom and whites above them. Is it not strange that whites moved a rung down on this hierarchy as soon as Rushton appeared in the picture? (Morton, after all, held that Mongoloids had smaller heads than Caucasians; Lieberman, 2001.)
The origins of the term “mongolism” are interesting, especially how they tie into the origins of the term Down Syndrome, and how they relate to the “Asian look,” the peculiarities of people with DS, and (to Westerners) the peculiarities of Asian living. This is, of course, why one’s political motives, while not fully telling of one’s objectives, may, in a way, point one in the right direction as to why one formulates such hypotheses and theories.
“Women may be hardwired to prefer pink,” “Study: Why Girls Like Pink,” “Do girls like pink because of their berry-gathering female ancestors?,” and “Pink for a girl and blue for a boy – and it’s all down to evolution” are some of the popular news headlines that came out 12 years ago, when Hurlbert and Ling (2007) published their study Biological components of sex differences in color preference. They tested 208 people: 171 British Caucasians (79 of whom were male) and 37 Han Chinese (19 of whom were male). Hurlbert and Ling (2007) found that “females prefer colors with ‘reddish’ contrast against the background, whereas males prefer the opposite.”
Both males and females have a preference for blueish hues, but females additionally prefer reddish ones, and so, the story goes, women liking pink is an evolved trait, on top of the shared preference for blue. The authors “speculate that this sex difference arose from sex-specific functional specializations in the evolutionary division of labour.” Specializing for gathering berries, the “female brain” evolved “trichromatic adaptations” (trichromacy being color vision based on three channels), which is the supposed cause of women preferring “redder” hues. Since women were gatherers, while men were hunters, women needed to discern redder/pinker hues to gather ripe berries. Hurlbert and Ling (2007) also offer an alternative explanation, which “is the need to discriminate subtle changes in skin color due to emotional states and social-sexual signals; again, females may have honed these adaptations for their roles as care-givers and ‘empathizers’.”
The proposed cause of sex differences in color preference is simple: men and women faced different adaptive problems in their evolutionary history, men being the hunters and women the gatherers, and this evolutionary history then shaped color preferences in the modern world. So women’s brains are more specialized for gathering-type tasks, to be able to identify ripe fruits with a pinker hue, whether purple or red. Males, meanwhile, preferred green or blue, implying that as men evolved, the preference for these colors arose from the colors men encountered more frequently in their EEA (environment of evolutionary adaptedness).
He et al (2011) studied 436 Chinese college students from a Chinese university. They found that men preferred “blue > green > red > yellow > purple > orange > white > black > pink > gray > brown,” while women preferred “purple > blue > yellow > green > pink > white > red > orange > black > gray > brown” (He et al, 2011: 156). So men preferred blue and green while women preferred pink, purple and white. Here is the just-so story (He et al, 2011: 157-158):
According to the Hunter-Gatherer Theory, as a consequence of an adaptive survival strategy throughout the hunting-gathering environment, men are better at the hunter-related task, they need more patience but show lower anxiety or neuroticism, and, therefore would prefer calm colors such as blue and green; while women are more responsible to the gatherer-related task, sensitive to the food-related colors such as pink and purple, and show more maternal nurturing and peacefulness (e.g., by preferring white).
Just-so stories like this come from the usual suspects (e.g., Buss, 2005; Schmidt, 2005). Regan et al (2001) argue that “primate colour vision has been shaped by the need to find coloured fruits amongst foliage, and the fruits themselves have evolved to be salient to primates and so secure dissemination of their seeds.” Men are more sensitive to blue-green hues in these studies, and, according to Vining (2006), this is why men prefer these colors: it would have been easier for men to hunt if they could discern between blue and green hues; that men like these kinds of colors more than the other “feminine” colors is evidence in favor of the “hunter-gatherer” theory.
So, according to evolutionary psychologists, there is an evolutionary reason for these sex differences in color preference. If men were more likely to like blueish-greenish hues over red ones, then we can say that this was a specific adaptation from the hunting days: men needed to ascertain color differences, preferring blue for, among other reasons, the ability to notice sky and water, which would have made them better hunters. And so, according to the principle of evolution by natural selection, the men who could ascertain these colors and hues had better reproductive success than those who could not, and passed their genes, including those color-sensing genes, on to the next generation. The same is true for women: that women prefer pinkish, purpleish hues is taken as evidence that, in an evolutionary context, they needed to ascertain pinkish, purpleish colors to identify ripe fruits. And so again, on this principle of natural selection, the women who could better ascertain the colors and hues seen in berries passed their genes on to the next generation, too.
This theory hinges, though, on Man the Hunter and Woman the Gatherer. Men ventured out to hunt, which explains men’s color preferences, while women stayed at the ‘home,’ took care of the children, and gathered berries, which explains women’s color preferences (gathering pink berries, discriminating differences in skin color due to emotional states). So the hypothesis must have a solid evolutionary basis: it makes sense and comports with the data we have, so it must be true, right?
Here’s the thing: boys and girls didn’t always wear blue and pink respectively; this convention is a recent development. Jasper Pickering, writing for Business Insider, explains this well in an interview with color expert Gavin Moore:
“In the early part of the 20th Century and the late part of the 19th Century, in particular, there were regular comments advising mothers that if you want your boy to grow up masculine, dress him in a masculine colour like pink and if you want your girl to grow up feminine dress her in a feminine colour like blue.”
“This was advice that was very widely dispensed with and there were some reasons for this. Blue in parts of Europe, at least, had long been associated as a feminine colour because of the supposed colour of the Virgin Mary’s outfit.”
“Pink was seen as a kind of boyish version of the masculine colour red. So it gradually started to change however in the mid-20th Century and eventually by about 1950, there was a huge advertising campaign by several advertising agencies pushing pink as an exclusively feminine colour and the change came very quickly at that point.”
While Smithsonian Magazine quotes the Earnshaw Infants Department (from 1918):
The generally accepted rule is pink for the boys, and blue for the girls. The reason is that pink, being a more decided and stronger color, is more suitable for the boy, while blue, which is more delicate and dainty, is prettier for the girl.
So, just like “differences” in “cognitive ability” (i.e., how, if modern-day “IQ” researchers had been around in antiquity, they would have formulated a completely different theory of intelligence and not used Cold Winters Theory), if these EP-minded researchers had been around in the early 20th century, they’d have seen the opposite of what they see today: boys wearing pink and girls wearing blue. What, then, could account for such observations? I’d guess something like “Boys like pink because it’s a hue of red, and boys, evolved as hunters, had to like seeing red as they would be fighting either animals or other men and seeing blood a majority of the time.” As for girls liking blue, I’d guess something like “Girls had to be able to distinguish green leaves from the blue sky, and so they were better able to gather berries while men were out hunting.”
That’s the thing with just-so stories: you can think of an adaptive story for any observation. As Joyner, Boros, and Fink (2018: 524) note for the Bajau diving story and the sweat gland story, “since the dawn of the theory of evolution, humans have been incredibly creative in coming up with evolutionary and hence genetic narratives and explanations for just about every human trait that can be measured,” and this can most definitely be said of the sex-differences-in-color-preference story. We humans are very clever at making an adaptive story out of everything, even when there isn’t one to be found. Even if it could be established that there are such things as “trichromatic adaptations” underlying men’s and women’s color preferences, the combination of functional effect (women liking pink for better gathering, men liking blue and green for better hunting) and signs that the trait was “selected-for” does not license the claim that selection acted on the specific trait in question, since we cannot “exclude the possibility that selection acted on some other pleiotropic effect of the mutation” (Nielsen, 2009).
In sum, the proposed causes of today’s sex differences in color preference make no sense. These researchers are looking for justification for current cultural/societal trends in which sex likes which colors, and then weaving “intricate” adaptive stories in order to claim that this is due in part to men’s and women’s “different” evolutionary histories: man as hunter and woman as gatherer. However, because culture and society change so quickly, we end up asking questions we would not have asked before, and then ascribing evolutionary causes to our observations. As Constance Hilliard (2012: 85) writes, referring to Professor Michael Billig’s article A dead idea that will not lie down (in reference to race science), “… scientific ideas did not develop in a vacuum but rather reflected underlying political and economic trends.”
On Twitter, getting into discussions with Charles Murray acolytes, someone asked me to write a short piece describing the argument in The Bell Curve (TBC) by Herrnstein and Murray (H&M). This is because I was linking my short Twitter thread on the matter, which can be seen here:
In TBC, H&M argue that America is becoming increasingly stratified by social class, and that the main driver is the rise of a “cognitive elite.” The assertion is that social class in America, which used to be determined by one’s social origin, is now determined by one’s cognitive ability as tested by IQ tests. H&M make six assertions at the beginning of the book:
(i) That there exists a general cognitive factor which explains differences in test scores between individuals;
(ii) That all standardized tests measure this general cognitive factor but IQ tests measure it best;
(iii) IQ scores match what most laymen mean by “intelligent”, “smart”, etc.;
(iv) Scores on IQ tests are stable, but not perfectly so, throughout one’s life;
(v) Administered properly, IQ tests are not biased against classes, races, or ethnic groups; and
(vi) Cognitive ability as measured by IQ tests is substantially heritable, at 40-80%.
In the second part, H&M argue that high cognitive ability predicts desirable outcomes whereas low cognitive ability predicts undesirable outcomes. Using the NLSY, H&M show that IQ scores predict one’s life outcomes better than parental SES does. All NLSY participants took the ASVAB, while some also took IQ tests, which were then correlated with the ASVAB; the correlation came out to .81.
They analyzed whether one had ever been incarcerated; whether one was unemployed for more than one month in the year; whether one dropped out of high school; whether one was a chronic welfare recipient; among other social variables. When they controlled for IQ in these analyses, most of the differences between ethnic groups, for example, disappeared.
Now, in the most controversial part of the book—the third part—they discuss ethnic differences in IQ scores, stating that Asians have higher IQs than whites, who have higher IQs than ‘Hispanics’, who have higher IQs than blacks. H&M argue that the white-black IQ gap is not due to test bias, since IQ scores do not underpredict blacks’ school or job performance. H&M famously wrote about the nature of lower black IQ in comparison to whites:
If the reader is now convinced that either the genetic or environmental explanation has won out to the exclusion of the other, we have not done a sufficiently good job of presenting one side or the other. It seems highly likely to us that both genes and environment have something to do with racial differences. What might the mix be? We are resolutely agnostic on that issue; as far as we can determine, the evidence does not yet justify an estimate.
Finally, in the fourth and last section, H&M argue that efforts to raise cognitive ability through the alteration of the social and physical environment have failed, though we may one day find some things that do raise ability. They also argue that the educational experience in America neglects the small, intelligent minority, and that we should begin to attend to them as they will “greatly affect how well America does in the twenty-first century” (H&M, 1996: 387). They also argue forcefully against affirmative action, in the end arguing that equality of opportunity—over equality of outcome—should be the goal of colleges and workplaces. They finally predict that this “cognitive elite” will continuously isolate itself from society, widening the cognitive gap between itself and everyone else.
… Rushton is a serious scholar who has amassed serious data. (Herrnstein and Murray, 1996: 564)
How serious of a scholar was Rushton, and what kind of “serious data” did he amass? Of course, since The Bell Curve is a book on IQ, H&M mean that his IQ data is “serious data” (I am not aware of Murray’s views on Rushton’s penis “data”). Many people over the course of Rushton’s career pointed out that Rushton was anything but a “serious scholar who has amassed serious data.” Take, for example, Constance Hilliard’s (2012: 69) comments on Rushton’s escapades at a Toronto shopping mall, where he trolled the mall looking for blacks, whites, and Asians (he paid them 5 dollars apiece) to ask them questions about their penis size, sexual frequency, and how far they could ejaculate:
An estimated one million customers pass through the doors of Toronto’s premier shopping mall, Eaton Centre, in any given week. Professor Jean-Phillipe Rushton sought out subjects in its bustling corridors for what was surely one of the oddest scientific studies that city had known yet—one that asked about males’ penis sizes. In Rushton’s mind, at least, the inverse correlation among races between intelligence and penis size was irrefutable. In fact, it was Rushton who made the now famous assertion in a 1994 interview with Rolling Stone magazine: “It’s a trade-off: more brains or more penis. You can’t have everything. … Using a grant from the conservative Pioneer Fund, the Canadian professor paid 150 customers at the Eaton Centre mall—one-third of whom he identified as black, another third white, and the final third Asian—to complete an elaborate survey. It included such questions as how far the subject could ejaculate and “how large [is] your penis?” Rushton’s university, upon learning of this admittedly unorthodox research project, reprimanded him for not having the project preapproved. The professor defended his study by insisting that approval for off-campus experiments had never been required before. “A zoologist,” he quipped, “doesn’t need permission to study squirrels in his back yard.” [As if one does not need to get approval from the IRB before undertaking studies on humans… nah, this is just an example of censorship from the Left who want to hide the truth of ‘innate’ racial differences!]
(I wonder if Rushton’s implicit assumption here was that, since the brain consumes a large share of our energy, blacks’ supposedly smaller brains meant the calories consumed instead went to “power” their larger penises? The world may never know.)
Imagine you’re walking through a mall with your wife and two children. As you’re shopping, you see a strange man with a combover, holding measuring tape, approaching different people (whom you observe belong to three different social-racial groups) and asking them questions for a survey. He then comes up to you and your family, pulling you aside to ask about how frequently you have sex, how far you can ejaculate, and how long your penis is.
Rushton: “Excuse me sir. My name is Jean-Phillipe Rushton and I am a psychologist at the University of Western Ontario. I am conducting a research study, surveying individuals in this shopping mall, on racial differences in penis size, sexual frequency, and how far they can ejaculate.”
You: “Errrrr… OK?”, you say, looking over uncomfortably at your family, standing twenty feet away.
Rushton: “First off, sir, I would like to ask you which race you identify as.”
You: “Well, professor, I identify as black, quite obviously“, as you look over at your wife, who has a stern look on her face.
Rushton: “Well, sir, my first question for you is: How far can you ejaculate?”
You: “Ummm I don’t know, I’ve never thought to check. What kind of an odd question is that?“, you say, as you try to keep your voice down as to not alert your wife and children to what is being discussed.
Rushton: “OK, sir. How long would you say your penis is?”
You: “Professor, I have never measured it but I would say it is about 7 inches“, you say, with an uncomfortable look on your face. You think, “Why is this strange man asking me such uncomfortable questions?”
Rushton: “OK, OK. So how much sex would you say you have with your wife? And what size hat do you wear?“, Rushton asks, seeming to size up your whole body, with a twinkle in his eye.
You: “I don’t see how that’s any of your business, professor. What I do with my wife in the confines of my own home doesn’t matter to you. What does my hat size have to do with anything?”, you say, not knowing Rushton’s ulterior motives for his “study.” “I’m sorry, but I’m going to have to cut this interview short. My wife is getting pissed.”
Rushton: “Sir, wait!! Just a few more questions!”, Rushton says while chasing you, the measuring tape dragging across the ground, as you get away from him as quickly as possible, alerting security to this strange man bothering—harassing—mall shoppers.
If I were out shopping and some strange man started asking me such questions, I’d tell him tough luck bro, find someone else. (I don’t talk to strange people trying to sell me something or trying to get information out of me.) In any case, what a great methodology, Rushton, because men never lie about their penis size when asked.
Hilliard (2012: 71-72) then explains how Rushton used the “work” of the French Army Surgeon (alias Dr. Jacobus X):
Writing under the pseudonym Dr. Jacobus X, the author asserted that it was a personal diary that brought together thirty years of medical practice as a French government army surgeon and physician. Rushton was apparently unaware that the book, while unknown to American psychologists, was familiar to anthropologists working in Africa and Asia and that they had nicknamed the genre from which it sprang “anthroporn.” Such books were not actually based on scientific research at all; rather, they were a uniquely Victorian style of pornography, thinly disguised as serious medical field research. Untrodden Fields [the title of Dr. Jacobus X’s book that Rushton drew from] presented Jacobus X’s observations and photographs of the presumably lurid sexual practices of exotic peoples, including photographs of the males’ mammoth-size sexual organs.
In the next fifteen years, Rushton would pen dozens of articles in academic journals propounding his theories of an inverse correlation among the races between brain and genital size. Much of the data he used to “prove” the enormity of the black male organ, which he then correlated inversely to IQ, came from Untrodden Fields. [Also see the discussion of “French Army Surgeon” in Weizmann et al, 1990: 8. See also my articles on penis size on this blog.]
Rushton also cited “research” from the Penthouse forum (see Rushton, 1997: 169). So his sources included an anonymous “field surgeon”, the Penthouse Forum, and surveys of random people in a mall about their sexual history, penis size, and how far they could ejaculate. Rushton’s penis data, and even one of the final papers he penned, “Do pigmentation and the melanocortin system modulate aggression and sexuality in humans as they do in other animals?” (Rushton and Templer, 2012), is so full of flaws I can’t believe it got past review. I guess a physiologist was not on the review board when Rushton’s and Templer’s paper went up for review…
Rushton pushed the just-so story of cold winters (his signature idea, on which his racial-differences hypothesis hinged), along with his long-refuted human r/K selection theory (see Anderson, 1991; Graves, 2002). Also watch the debate between Rushton and Graves. Rushton got quite a lot wrong (see Flynn, 2019; Cernovsky and Litman, 2019), as a lot of people do, but he was in no way a “serious scholar”.
Why yes, Mr. Herrnstein and Mr. Murray, Rushton was, indeed, a very serious scholar who has amassed serious data.
1843 Magazine published an article back in July titled The Curse of Genius, stating that “Within a few points either way, IQ is fixed throughout your life …” How true is this claim? How much is “a few points”? Would it allow for any substantial increase or decrease? Several studies have looked at IQ scores in one sample longitudinally, and they show that IQ is not “like height”, as most hereditarians claim (the analogy being that height is “stable” at adulthood, like IQ, and that only certain events can decrease it, like IQ). These claims fail.
IQ is, supposedly, a stable trait: like height at a certain age, it does not change (barring significant life events, such as a back injury that causes one to slouch, decreasing height, or a traumatic brain injury, though even that does not always decrease IQ scores). IQ tests supposedly measure a stable biological trait, “g” or general intelligence, which is built into the test (see Richardson, 2002, and see Schonemann’s papers for refutations of Jensen’s and Spearman’s “g“).
IQ levels are expected to stick to people like their blood group or their height. But imagine a measure of a real, stable bodily function of an individual that is different at different times. You’d probably think what a strange kind of measure. IQ is just such a measure. (Richardson, 2017: 102)
Neuroscientist Allyson Mackey’s team, for example, found “that after just eight weeks of playing these games the kids showed a pretty big IQ change – an improvement of about 30% or about 10 points in IQ.” Mackey et al (2011) recruited 7-9 year old children from low SES backgrounds to participate in cognitive training programs for an hour a day, two days a week. They predicted that children from a lower SES would benefit more from such cognitive/environmental enrichment (think of the differences in access to such enrichment between lower and middle SES households).
Mackey et al (2011) tested the children on their processing speed (PS), working memory (WM), and fluid reasoning (FR). To assess FR, they used a matrix reasoning task with two versions (for the retest after the 8-week training). For PS, they used a cross-out test, where “one must rapidly identify and put a line through each instance of a specific symbol in a row of similar symbols” (Mackey et al, 2011: 584), and a coding test, part of the WISC-IV, which “is a timed test in which one must rapidly translate digits into symbols by identifying the corresponding symbol for a digit provided in a legend” (ibid.). Working memory was assessed through digit and spatial span tests from the Wechsler Memory Scale.
The kinds of games they used were computerized and non-computerized (like using a Nintendo DS). Mackey et al (2011: 585) write:
Both programs incorporated a mix of commercially available computerized and non-computerized games, as well as a mix of games that were played individually or in small groups. Games selected for reasoning training demanded the joint consideration of several task rules, relations, or steps required to solve a problem. Games selected for speed training involved rapid visual processing and rapid motor responding based on simple task rules.
At the end of the 8-week program, cognitive abilities had increased in both groups. The children in the reasoning training solved an average of 4.5 more matrices than on their previous try. Mackey et al (585-586) write:
Before training, children in the reasoning group had an average score of 96.3 points on the TONI, which is normed with a mean of 100 and a standard deviation of 15. After training, they had an average score of 106.2 points. This gain of 9.9 points brought the reasoning ability of the group from below average for their age [to above average]. [But such gains were not significant on the test of nonverbal intelligence, showing an increase of 3.5 points.]
One of the biggest surprises was that 4 out of the 20 children in the reasoning training showed an increase of over 20 points. This, of course, refutes the claim that such “ability” is “fixed”, as hereditarians have claimed. Mackey et al (2011: 587) write that “the very existence and widespread use of IQ tests rests on the assumption that tests of FR measure an individual’s innate capacity to learn.” This assumption, quite obviously, is false. (It comes from Cattell, no less.) This buttresses the claim that IQ test performance is, of course, experience-dependent.
This study shows that IQ is malleable: exposure to certain cultural tools leads to increases in test scores, as hypothesized (Richardson, 2002, 2017).
Salthouse (2013) writes that:
results from different types of approaches are converging on a conclusion that practice or retest contributions to change in several cognitive abilities appear to be nearly the same magnitude in healthy adults between about 20 and 80 years of age. These findings imply that age comparisons of longitudinal change are not confounded with differences in the influences of retest and maturational components of change, and that measures of longitudinal change may be underestimates of the maturational component of change at all ages.
Moreno et al (2011) show that after 20 days of computerized training, children in the music group showed enhanced scores on a measure of verbal ability—90 percent of the sample showed the same improvement. They further write that “the fact that only one of the groups showed a positive correlation between brain plasticity (P2) and verbal IQ changes suggests a link between the specific training and the verbal IQ outcome, rather than improvement due to repeated testing.”
Schellenberg (2004) describes how an advertisement sought 6 year olds for enrollment in art lessons. 112 children were enrolled into four groups: two groups received music lessons for a year (either standard keyboard lessons or Kodaly voice training), while the other two groups received either drama training or no training at all. Schellenberg (2004: 3) writes that “Children in the control groups had average increases in IQ of 4.3 points (SD = 7.3), whereas the music groups had increases of 7.0 points (SD = 8.6).” So, compared to either drama training or no training at all, the children in the music training gained 2.7 IQ points more.
(Figure 1 from Schellenberg, 2004)
Ramsden et al (2011: 3-4) write:
The wide range of abilities in our sample was confirmed as follows: FSIQ ranged from 77 to 135 at time 1 and from 87 to 143 at time 2, with averages of 112 and 113 at times 1 and 2, respectively, and a tight correlation across testing points (r = 0.79; P < 0.001). Our interest was in the considerable variation observed between testing points at the individual level, which ranged from −20 to +23 for VIQ, −18 to +17 for PIQ and −18 to +21 for FSIQ. Even if the extreme values of the published 90% confidence intervals are used on both occasions, 39% of the sample showed a clear change in VIQ, 21% in PIQ and 33% in FSIQ. In terms of the overall distribution, 21% of our sample showed a shift of at least one population standard deviation (15) in the VIQ measure, and 18% in the PIQ measure. [Also see The Guardian article on this paper.]
Richardson (2017: 102) writes “Carol Sigelman and Elizabeth Rider reported the IQs of one group of children tested at regular intervals between the ages of two years and seventeen years. The average difference between a child’s highest and lowest scores was 28.5 points, with almost one-third showing changes of more than 30 points (mean IQ is 100). This is sufficient to move an individual from the bottom to the top 10 percent or vice versa.” [See also the page in Sigelman and Rider, 2011.]
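To see what swings of that size mean in percentile terms, here is a minimal sketch (my own illustration, not taken from Sigelman and Rider), assuming the standard IQ norming of mean 100 and SD 15:

```python
from statistics import NormalDist

# IQ tests are normed to a mean of 100 and a standard deviation of 15
iq = NormalDist(mu=100, sigma=15)

# a 28.5-point swing centered on the mean: e.g. a child scoring 85.75
# on one occasion and 114.25 on another
low, high = 100 - 28.5 / 2, 100 + 28.5 / 2

pct_low = iq.cdf(low)    # percentile rank at the low score (~17th)
pct_high = iq.cdf(high)  # percentile rank at the high score (~83rd)
```

On those assumptions, the same child would rank around the 17th percentile on one testing occasion and around the 83rd on another, which makes concrete why treating a single score as a fixed property of the person is dubious.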
Mortensen et al (2003) show that IQ remains stable in mid- to young adulthood in low birthweight samples. Schwartz et al (1975: 693) write that “Individual variations in patterns of IQ changes (including no changes over time) appeared to be related to overall level of adjustment and integration and, as such, represent a sensitive barometer of coping responses. Thus, it is difficult to accept the notion of IQ as a stable, constant characteristic of the individual that, once measured, determines cognitive functioning for any age level for any test.”
There is even instability in IQ among high SES Guatemalans born between 1941 and 1953 (Mansukoski et al, 2019). Mansukoski et al’s (2019) analysis “highlight[s] the complicated nature of measuring and interpreting IQ at different ages, and the many factors that can introduce variation in the results. Large variation in the pre-adult test scores seems to be more of a norm than a one-off event.” Possible reasons for the change could be “adverse life events, larger than expected deviations of individual developmental level at the time of the testing and differences between the testing instruments” (Mansukoski et al, 2019). They also found that “IQ scores did not significantly correlate with age, implying there is no straightforward developmental cause behind the findings“, how weird…
Summarizing such studies that show an increase in IQ scores in children and teenagers, Richardson (2017: 103) writes:
Such results suggest that we have no right to pin such individual differences on biology without the obvious, but impossible, experiment. That would entail swapping the circumstances of upper-and lower-class newborns—parents’ inherited wealth, personalities, stresses of poverty, social self-perception, and so on—and following them up, not just over years or decades, but also over generations (remembering the effects of maternal stress on children, mentioned above). And it would require unrigged tests based on proper cognitive theory.
In sum, the claim that IQ is stable at a certain age, like other physical traits, is clearly false. Numerous interventions and life events can increase or decrease one’s IQ score. The results discussed in this article show that familiarity with certain types of cultural tools increases one’s score (as in the low SES group tested in Mackey et al, 2011). Although the n is low (which I know is one of the first things I will hear), I’m not worried about that; what matters is the individual change in IQ at certain ages, and these studies show it. So the results here support Richardson’s thesis that “IQ scores might be more an index of individuals’ distance from the cultural tools making up the test than performance on a singular strength variable” (Richardson, 2012).
IQ is not stable; IQ is malleable, whether through exposure to certain cultural/class tools or through exposures that are more common in some classes than in others. Indeed, this lends credence to Castles’ (2013) claim that “Intelligence is in fact a cultural construct, specific to a certain time and place.“
Why do some groups of people use chopsticks and others do not? Years back, Hamer and Sirota (2000) constructed a thought experiment to answer this kind of question. A researcher finds a few hundred students from a university and gathers DNA samples from their cheeks, which are then mapped for candidate genes associated with chopstick use. Come to find out, one of the genetic markers is associated with chopstick use, accounting for 50 percent of the variation in the trait. The effect even replicates many times and is highly significant: but it is biologically meaningless.
One may look at East Asians and ask “Why do they use chopsticks?” or “Why are they so good at using them while Americans aren’t?” and design a ridiculous study like the one described above. One may even find an association between the trait/behavior and a genetic marker, and find that it replicates and is a significant hit. But it can all be for naught, because population stratification has reared its head. Population stratification “refers to differences in allele frequencies between cases and controls due to systematic differences in ancestry rather than association of genes with disease” (Freedman et al, 2004). It “is a potential cause of false associations in genetic association studies” (Oetjens et al, 2016).
Such population stratification in the chopstick-gene study described above should have been anticipated, since two different populations were studied. Kaplan (2000: 67-68) described this well:
A similar argument, by the way, holds true for molecular studies. Basically, it is easy to mistake mere statistical associations for a causal connection if one is not careful to properly partition one’s samples. Hamer and Copeland develop an amusing example of some hypothetical, badly misguided researchers searching for the “successful use of selected hand instruments” (SUSHI) gene (hypothesized to be associated with chopstick usage) between residents in Tokyo and Indianapolis. Hamer and Copeland note that while you would be almost certain to find a gene “associated with chopstick usage” if you did this, the design of such a hypothetical study would be badly flawed. What would be likely to happen here is that a genetic marker associated with the heterogeneity of the group involved (Japanese versus Caucasian) would be found, and the heterogeneity of the group involved would independently account for the differences in the trait; in this case, there is a cultural tendency for more people who grow up in Japan than people who grow up in Indianapolis to learn how to use chopsticks. That is, growing up in Japan is the causally important factor in using chopsticks; having a certain genetic marker is only associated with chopstick use in a statistical way, and only because those people who grow up in Japan are also more likely to have the marker than those who grew up in Indianapolis. The genetic marker is in no way causally related to chopstick use! That the marker ends up associated with chopstick use is therefore just an accident of design (Hamer and Copeland, 1998, 43; Bailey 1997 develops a similar example).
In this way, most—if not all—of the results of genome-wide association studies (GWASs) can be accounted for by population stratification. Hamer and Sirota (2000) is a warning to psychiatric geneticists not to be quick to ascribe function and causation to hits on certain genes from association studies (of which GWASs are one type).
Many studies, for example Sniekers et al (2017) and Savage et al (2018), purport to “account for” less than 10 percent of the variance in a trait like “intelligence” (derived from non-construct-valid IQ tests). Other GWA studies purport to identify genes that affect testosterone production, showing that those who have a certain variant are more likely to have low testosterone (Ohlsson et al, 2011). Population stratification can have an effect in these studies, too: GWASs give rise to spurious correlations arising from population structure, which is what GWASs are actually measuring; they are measuring social class, not a “trait” (Richardson, 2017b; Richardson and Jones, 2019). Note that correcting for socioeconomic status (SES) fails, as the two are distinct (Richardson, 2002). (Note, too, that GWASs lead to PGSs, which are, of course, flawed as well.)
Such papers presume that correlations are causes and that interactions between genes and environment either don’t exist or are irrelevant (see Gottfredson, 2009 and my reply). Both of these presumptions are false. Correlations can, of course, lead to figuring out causes, but, as with the chopstick example above, attributing causation to hits that are “replicable” and “strongly significant” will still produce false positives due to that same population stratification. Of course, GWAS and similar studies are attempting to account for the heritability estimates gleaned from twin, family, and adoption studies. But the assumptions behind those kinds of studies have been shown to be false; heritability estimates are therefore highly exaggerated (and flawed), which leads to “looking for genes” that aren’t there (Charney, 2012; Joseph et al, 2016; Richardson, 2017a).
Richardson’s (2017b) argument is simple: (1) there is genetic stratification in human populations, which will correlate with social class; (2) since there is genetic stratification that correlates with social class, the genetic stratification will be associated with the “cognitive” variation; (3) if (1) and (2), then what GWA studies are finding is not “genetic differences” between groups in terms of “intelligence” (as shown by “IQ tests”) but population stratification between social classes. Population stratification persists even in “homogeneous” populations (see references in Richardson and Jones, 2019), and so the “corrections for” population stratification are anything but.
So what accounts for the pittance of “variance explained” in GWASs and other similar association studies (Sniekers et al, 2017, “explained” less than 5 percent of the variance in IQ)? Population stratification—specifically, the capture of genetic differences that arose through migration. GWA studies use huge samples in order to find the signals of the genes of small effect that underlie the complex trait being studied. Take what Noble (2018) says:
As with the results of GWAS (genome-wide association studies) generally, the associations at the genome sequence level are remarkably weak and, with the exception of certain rare genetic diseases, may even be meaningless (13, 21). The reason is that if you gather a sufficiently large data set, it is a mathematical necessity that you will find correlations, even if the data set was generated randomly so that the correlations must be spurious. The bigger the data set, the more spurious correlations will be found (3).
Calude and Longo (2016; emphasis theirs) “prove that very large databases have to contain arbitrary correlations. These correlations appear only due to the size, not the nature, of data. They can be found in “randomly” generated, large enough databases, which — as we will prove — implies that most correlations are spurious.”
So why should we take association studies seriously when they fall prey to the problem of population stratification (measuring differences between social classes and other populations) along with the fact that big datasets lead to spurious correlations? I fail to think of a good reason why we should take these studies seriously. The chopsticks gene example perfectly illustrates the current problems we have with GWASs for complex traits: we are just seeing what is due to social—and other—stratification between populations and not any “genetic” differences in the trait that is being looked at.
The Modern Synthesis (MS) has been entrenched in evolutionary thought since its consolidation in the mid-twentieth century. The MS is the integration of Darwinian natural selection and Mendelian genetics. Key assumptions include “(i) evolutionarily significant phenotypic variation arises from genetic mutations that occur at a low rate independently of the strength and direction of natural selection; (ii) most favourable mutations have small phenotypic effects, which results in gradual phenotypic change; (iii) inheritance is genetic; (iv) natural selection is the sole explanation for adaptation; and (v) macro-evolution is the result of accumulation of differences that arise through micro-evolutionary processes” (Laland et al, 2015).
Laland et al (2015) even have a helpful table on core assumptions of both the MS and Extended Evolutionary Synthesis (EES). The MS assumptions are on the left while the EES assumptions are on the right.
Darwinian cheerleaders, such as Jerry Coyne and Richard Dawkins, would claim that neo-Darwinism can—and already does—account for the assumptions of the EES. However, that claim is clearly false. At its core, the MS is a gene-centered perspective whereas the EES is an organism-centered perspective.
To followers of the MS, evolution occurs through random mutations and changes in allele frequencies, which are then selected for by natural selection when they increase an organism’s fitness; the trait the genes ’cause’ then carries on to the next generation due to its contribution to fitness. Drift, mutation, and gene flow also change allele frequencies, but to the Darwinian, selection is the strongest of these modes of evolution. The debate between the MS and the EES comes down to gene-selectionism vs. developmental systems theory.
On the other hand, the EES is an organism-centered perspective. Adherents to the EES state that the organism is inseparable from its environment. Jarvilehto (1998) describes this well:
The theory of the organism-environment system (Järvilehto, 1994, 1995) starts with the proposition that in any functional sense organism and environment are inseparable and form only one unitary system. The organism cannot exist without the environment and the environment has descriptive properties only if it is connected to the organism.
At its core, the EES makes evolution about the organism and its developmental system, relegating genes to the role not of active causes of traits and behaviors but of passive resources, used by and for the system as needed (Noble, 2011; Richardson, 2017).
One can see that the core assumptions of the MS are very much like what Dawkins describes in his book The Selfish Gene (Dawkins, 1976). In the book, Dawkins claimed that we amount to “gene machines”—mere vehicles for their riders, the genes. So, if we are just gene machines, and if genes are literally selfish “things”, then all of our actions and behaviors can be reduced to the fact that our genes “want” to survive. But the “selfish gene” theory “is not even capable of direct empirical falsification” (Noble, 2011), for Richard Dawkins emphatically stated in The Extended Phenotype (Dawkins, 1982: 1) that “I doubt that there is any experiment that could prove my claim” (quoted in Noble, 2011).
Noble (2011) goes on to discuss Dawkins’ view of genes:
Now they swarm in huge colonies, safe inside gigantic lumbering robots, sealed off from the outside world, communicating with it by tortuous indirect routes, manipulating it by remote control. They are in you and me; they created us, body and mind; and their preservation is the ultimate rationale for our existence. (1976, 20)
Noble then switches the analogy: rather than attributing “selfishness” to genes, he likens them to “prisoners”, stuck in the body with no way of escape. Since there is no experiment that could distinguish between the two views (as Dawkins himself admitted), Noble concludes that, instead of being “selfish”, genes are viewed by the physiological sciences as “cooperative”, since they need to “cooperate” with the environment, other genes, gene networks, and so on—all of which together comprise the whole organism.
In his 2018 book Agents and Goals in Evolution, Samir Okasha distinguishes between type I and type II agential thinking. “In type 1 [agential thinking], the agent with the goal is an evolved entity, typically an individual organism; in type 2, the agent is ‘mother nature’, a personification of natural selection” (Okasha, 2018: 23). An example of type I agential thinking is Dawkins’ selfish genes, while type II is the personification imputed onto natural selection—a kind of thinking that, Okasha states, “Darwin was himself first to employ” (Okasha, 2018: 36).
Okasha states that each gene’s ultimate goal is to outcompete other genes—for the gene in question to increase its frequency in the population. Genes can also have intermediate goals, such as maximizing the organism’s fitness. Okasha gives three criteria for what makes something “an agent”: (1) goal-directedness; (2) behavioral flexibility; and (3) adaptedness. So the “selfish” element “constitutes the strongest argument for agential thinking” about genes (Okasha, 2018: 73). However, as Denis Noble has tirelessly pointed out, genes (DNA sequences) are inert molecules (and only one part of the developing system), and so they show neither behavioral flexibility nor goal-directedness. Genes can (along with the other parts of the system working in concert with them) exert adaptive effects on the phenotype; but when genes (and traits) are coextensive, selection cannot distinguish between the fitness-enhancing trait and the free-riding trait, so it only makes logical sense to claim that organisms are selected, not any individual traits (Fodor and Piattelli-Palmarini, 2010a, 2010b).
It is because of this that the neo-Darwinian gene-centric paradigm has failed, and why we need a new evolutionary synthesis. Some wish only to tweak the MS a bit in order to allow in what it does not currently incorporate, while others want to overhaul the entire thing and extend it.
Here is the main reason why the MS fails: there is absolutely no reason to privilege any level of the system above any other. Causation is multi-level and constantly interacting; there is no a priori justification for privileging any developmental variable over any other (Noble, 2012, 2017). Both downward and upward causation exist in biological systems (which means that molecules depend on organismal context). The organism is also able to harness stochasticity—which is “used to … generate novelty” (Noble and Noble, 2018). Lastly, there is the creation of novelty at new levels of selection, as when the organism is an active participant in the construction of its environment.
Now, what does the EES bring that is different from the MS? A whole bunch. Most importantly, it makes a slew of novel predictions. Laland et al (2016) write:
For example, the EES predicts that stress-induced phenotypic variation can initiate adaptive divergence in morphology, physiology and behaviour because of the ability of developmental mechanisms to accommodate new environments (consistent with predictions 1–3 and 7 in table 3). This is supported by research on colonizing populations of house finches, water fleas and sticklebacks [55,133] and, from a more macro-evolutionary perspective, by studies of the vertebrate limb. The predictions in table 3 are a small subset of those that characterize the EES, but suffice to illustrate its novelty, can be tested empirically, and should encourage deriving and testing further predictions.
There are other ways to verify EES predictions, and they’re simple and can be done in the lab. In his book Above the Gene, Beyond Biology: Toward a Philosophy of Epigenetics, philosopher of biology Jan Baedke notes that studies of epigenetic processes induced in the lab and those observed in nature share the same methodological framework. So we can use lab-induced epigenetic processes to ask evolutionary questions and get evolutionary answers in an epigenetic framework. There are two problems, though. First, we don’t know whether experimentally induced and naturally occurring epigenetic changes will match up; and second, we don’t know whether epigenetic explanations, which focus on proximate rather than ultimate causes, can address evolutionary explananda. Baedke (2018: 89) writes:
The first has been addressed by showing that studies of epigenetic processes that are experimentally induced in the lab (in molecular epigenetics) and those observed in natural populations in the field (in ecological or evolutionary epigenetics) are not that different after all. They share a similar methodological framework, one that allows them to pose heuristically fruitful research questions and to build reciprocal transparent models. The second issue becomes far less fundamental if one understands the predominant reading of Mayr’s classical proximate-ultimate distinction as offering a simplifying picture of what (and how) developmental explanations actually explain. Once the nature of developmental dependencies has been revealed, the appropriateness of developmentally oriented approaches, such as epigenetics, in evolutionary biology is secured.
Further arguments for epigenetics from an evolutionary approach can be found in Richardson’s (2017) Genes, Brains, and Human Potential (chapters 4 and 5) and Jablonka and Lamb’s (2005) Evolution in Four Dimensions. More than genes alone are passed on and inherited, and this throws a wrench into the MS.
Some may fault DST for not offering anything comparable to Darwinism, as Dupre (2003: 37) notes:
Critics of DST complain that it fails to offer any positive programme that has achievements comparable to more orthodox neo-Darwinism, and so far this complaint is probably justified.
But this is irrelevant. For if we look at DST as just one part of the whole EES programme, then it is the EES that needs to—and does—“offer a positive programme that has achievements comparable to more orthodox neo-Darwinism” (Dupre, 2003: 37). And that is exactly what the EES does: it makes novel predictions; it explains what needs to be explained better than the MS does; and the MS has been shown to be incoherent (that is, there cannot be selection on only one level; there can only be selection on the organism). That the main tool of the MS (natural selection) has been shown by Fodor to be vacuous and non-mechanistic is yet another strike against it.
Since DST is a main part of the EES, and DST is “a wholeheartedly epigenetic approach to development, inheritance and evolution” (Griffiths, 2015), and the EES incorporates epigenetic theories, the EES will live or die on whether its evolutionary epigenetic theories are confirmed. And with the recent slew of books and articles attesting to the large epigenetic component of evolution (e.g., Baedke, 2018; Bonduriansky and Day, 2018; Meloni, 2019), it is most definitely worth seeing what we can find in evolutionary epigenetics studies, since epigenetic changes induced in the lab and those observed in natural populations are not that different. This can then confirm or disconfirm major hypotheses of the EES—of which there are many. It is time for Lamarck to make his return.
It is clear that the MS is lacking, as many authors have pointed out. To understand evolutionary history and why organisms have the traits they do, we need much more than the natural selection-dominated neo-Darwinian Modern Synthesis. We need a new synthesis (one which has been in formulation for the past 15–20 years), and only through it can we understand the hows and whys. The MS was good when we didn’t know any better, but the reductionism it assumes is untenable; there cannot be direct selection on any single level (e.g., the gene), so it is a nonsensical programme. Genes are not directly selected, nor are traits that enhance fitness. Whole organisms and their developmental systems are selected and propagate into future generations.
The EES (and DST along with it) holds to the causal parity thesis—“that genes/DNA play an important role in development, but so do other variables, so there is no reason to privilege genes/DNA above other developmental variables.” This causal parity among all the tools of development is telling: what is selected is not just one level of the system, as genetic reductionists (neo-Darwinists) would like to believe; selection occurs on the whole organism and what it interacts with (the environment), and environments are inherited too. Once we purge the falsities the MS forced upon us regarding organisms, their relationship with the environment, and evolution as a whole, we can truly understand how and why organisms evolve the phenotypes they do; we cannot do so with genetic reductionism and its sloppy logic. So who wins? Not the MS, since it gets causation in biology wrong. That leaves the EES as the superior theory, predictor, and explainer.