NotPoliticallyCorrect


Category Archives: Refutations


Test Validity, Test Bias, Test Construction, and Item Selection

3400 words

Validity for IQ tests is fleeting. IQ tests are said to be “validated” on the basis of their agreement with other IQ tests and their correlation with job performance (see Richardson and Norgate, 2015). Further, IQ tests are claimed not to be biased against any social class or racial group. Through the process of “item selection”, however, test constructors produce the distributions they want (normal) and get the results they want through the subjective procedure of removing items that don’t agree with their preconceived notions of who is or is not “intelligent.” Finally, “intelligence” is a descriptive measure, not an explanatory concept, and treating it like an explanatory one can, and does, lead to circularity (which is rife in the subject of IQ testing; see Richardson, 2017b and Taleb’s article IQ is largely a pseudoscientific swindle). This article will show that, given how the tests are constructed, how their items are analyzed (selected and deselected), and the fact that there is no theory of what so-called intelligence tests measure, they do not, in fact, test what they purport to.

Richardson (1991: 17) states that “To measure is to give … a more reliable sense of quantity than our senses alone can provide”, and that “sensed intelligence is not an objective quantity in the sense that the same hotness of a body will be felt by the same humans everywhere (given a few simple conditions); what, in experience, we choose to call ‘more’ intelligence, and what ‘less’, is a social judgement that varies from people to people, employing different criteria or signals.” Richardson (1991: 17-18) goes on to say:

Even if we arrive at a reliable instrument to parallel the experience of our senses, we can claim no more for it than that, without any underlying theory which relates differences in the measure to differences in some other, unobserved, phenomena responsible for those differences. Without such a theory we can never be sure that differences in the measure that correspond with our sensed intelligence aren’t due to something else, perhaps something completely different. The phenomenon we at first imagine may not even exist. Instead of such verification, most inventors and users of measures of intelligence … have simply constructed the source of differences in sensed intelligence as an underlying entity or force, rather in the way that children and naïve adults perceive hotness as a substance, or attribute the motion of objects to a fictitious impetus. What we have in cases like temperature, of course, are collateral criteria and measures that validate the theory, and thus the original measures. Without these, the assumed entity remains a fiction. This proved to be the case with impetus, and with many other naïve conceptions of nature, such as phlogiston (thought to account for differences in health and disease). How much greater such fictions are likely to be with unobserved, dynamic and socially judged concepts like intelligence.

Richardson (1991: 32-35) then critiques many of the old IQ tests: there was no way of showing that they were construct valid, and their manuals did not even discuss validity. It was simply assumed.

If we do not know exactly what is being measured when test constructors make and administer these tests, how can we logically state that “IQ tests test intelligence”? Even Arthur Jensen admitted that psychometricians can create any type of distribution they please (1980: 71); he tacitly admits that tests are devised through the selection and deselection of items that correspond to the test constructors’ preconceived notions of what “intelligence” is. Jensen (1980: 147-148) admits this, too, when he writes that “The items must simply emerge arbitrarily from the heads of test constructors.”

To build on Richardson’s temperature example: we know exactly what is being measured when we read the height of the mercury column in a thermometer. That is, the concept of “temperature” and the instrument used to measure it (the thermometer) were verified independently, without circular reliance on the thermometer itself (see Hasok Chang’s 2007 book Inventing Temperature). IQ tests, on the other hand, are supposedly “validated” through measures of job performance and through correlations with other, earlier tests assumed to be (construct) valid. But those earlier tests were, of course, just assumed to be valid; it was never shown.

Another example of a psychological construct that is not valid (as I’ve argued for IQ many times) is ASD (autism spectrum disorder). Waterhouse, London, and Gillberg (2016) describe “14 groups of findings reviewed in this paper that together argue that ASD lacks neurobiological and construct validity. No unitary ASD brain impairment or replicated unitary model of ASD brain impairment exists.” Construct validity, that is, whether a test measures what it purports to, is of utmost importance in measurement. Without it, we don’t know whether we are measuring something completely different from what we hope, or purport, to measure.

There is another problem: for one of the most-used IQ tests, there is no underlying theory of item selection, as seen in John Raven’s personal notes (see Carpenter, Just, and Shell, 1990). Items on the Raven were selected on the basis of Raven’s intuition, not any formal theory, and the same can be said of modern-day IQ tests. Carpenter, Just, and Shell (1990) write that John Raven “used his intuition and clinical experience to rank order the difficulty of the six problem types . . . without regard to any underlying processing theory.”

These preconceived notions of what “intelligence” is fail, though, without (1) a theory of what intelligence is (and, as Ian Deary (2001) admits, there is no theory of human intelligence in the way that physics has theories); and (2) what is ultimately termed “construct validity”, that is, evidence that a test measures what it purports to. There are several kinds of validity, and the one IQ-ists claim most is predictive validity: that IQ tests can predict an individual’s outcomes in life and (it is claimed) job performance. However, “intelligence” is “a descriptive measure, not an explanatory concept … [so] measures of intelligence level have little or no predictive value” (Howe, 1988).

Howe (1997: ix) also tells us that “Intelligence is … an outcome … not a cause. … Even the most confidently stated assertions about intelligence are often wrong, and the inferences that people have drawn from those assertions are unjustified.”

According to Richardson (1991: 34), the correlation between IQ and school performance “may be a necessary aspect of the validity of tests, but is not a sufficient one. Such evidence, as already mentioned, requires a clear connection between a theory (a model of intelligence), and the values on the measure.” But, as Richardson (2017: 85) notes:

… it should come as no surprise that performance on them [IQ tests] is associated with school performance. As Robert L. Thorndike and Elizabeth P. Hagen explained in their leading textbook, Educational and Psychological Measurement, “From the very way in which the tests were assembled [such correlation] could hardly be otherwise.”
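The Thorndike and Hagen point can be sketched with a toy simulation. Everything in it is a hypothetical assumption: a simulated “school performance” criterion, a pool of 200 candidate items that each track it by a different, mostly weak, amount, and a rule that keeps the 30 items most correlated with the criterion. The sketch only shows that a test assembled this way correlates with the criterion by design; it does not model any real test.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def build_tests(n_people=1000, n_items=200, test_length=30, seed=5):
    rng = random.Random(seed)
    # Hypothetical criterion, e.g. one school-performance score per person.
    school = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
    # Each candidate item tracks the criterion by a different, mostly weak, amount.
    loadings = [rng.uniform(-0.1, 0.3) for _ in range(n_items)]
    items = [[lam * s + rng.gauss(0.0, 1.0) for s in school] for lam in loadings]
    # Arbitrary test: simply the first `test_length` items in the pool.
    arbitrary = [sum(person) for person in zip(*items[:test_length])]
    # Criterion-keyed test: keep the items that correlate best with the criterion.
    ranked = sorted(range(n_items), key=lambda i: pearson(items[i], school), reverse=True)
    keyed = [sum(person) for person in zip(*[items[i] for i in ranked[:test_length]])]
    return pearson(arbitrary, school), pearson(keyed, school)

r_arbitrary, r_keyed = build_tests()
print(round(r_arbitrary, 2), round(r_keyed, 2))
```

Both tests are scored on the same simulated sample used to pick the items, so the keyed test's advantage is partly built in twice over; the point is the design logic, not a validation study.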

Gottfredson (2009) claims that the construct validity argument against IQ is “fallacious”, listing it as one of her “fallacies” about intelligence testing (another is the “interactionism fallacy”, which I have previously discussed). Unfortunately for Gottfredson (2009), however, “the phenomena that testers aim to capture” are built into the test and, as noted here numerous times, are preconceived by the test’s constructors. So Gottfredson’s (2009) claim fails.

The same kind of construction also lies behind the claim of a “normal distribution.” Just as with preconceptions of who is or is not “intelligent”, the normal distribution is an artifact of test construction, produced along with the selection and deselection of items to conform with the test constructors’ presuppositions; the “bell curve” of IQ is created by the presuppositions that test constructors hold about people and society (Simon, 1997).
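A minimal sketch of how item selection alone can change the shape of the score distribution, under loudly stated assumptions: a toy logistic (Rasch-type) item model, an invented simulated population, and three hypothetical 40-item tests that differ only in item difficulty. Because the seed is fixed and each call draws random numbers in the same order, the same simulated people take all three tests.

```python
import math
import random

def simulate_total_scores(difficulties, n_people=5000, seed=1):
    """Total scores on a hypothetical test with the given item difficulties."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_people):
        theta = rng.gauss(0.0, 1.0)  # toy latent value for one simulated person
        score = 0
        for b in difficulties:
            # Logistic chance of passing an item of difficulty b.
            if rng.random() < 1.0 / (1.0 + math.exp(-(theta - b))):
                score += 1
        scores.append(score)
    return scores

def skewness(xs):
    """Moment coefficient of skewness (0 for a symmetric distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

moderate = simulate_total_scores([0.0] * 40)  # items pitched at the middle
easy = simulate_total_scores([-2.5] * 40)     # items almost everyone passes
hard = simulate_total_scores([2.5] * 40)      # items almost everyone fails
for name, s in (("moderate", moderate), ("easy", easy), ("hard", hard)):
    print(name, round(skewness(s), 2))
```

With items pitched at the middle the totals come out roughly symmetric; with very easy or very hard items the same people produce strongly skewed totals (ceiling and floor effects). Only the items changed.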

In the early 1900s, Charles Spearman claimed to have found a “general factor” that explains the correlations between different tests. He attributed this positive manifold to a factor he termed “g”, for “general intelligence.” Spearman stated that “The (g) factor was taken, pending further information, to consist in something of the nature of an ‘energy’ or ‘power’…” (quoted in Richardson, 1991: 38). The refutation of “g” is a simple, logical one: while a correlation between performances “may be a necessary requirement for a general factor … it is not a sufficient one.” This is because “it is quite possible for quite independent factors to produce a hierarchy of correlations without the existence of any underlying ‘general’ factor (Fancher, 1985a; Richardson and Bynner, 1984)” (Richardson, 1991: 38). The fact of the matter is that Spearman’s “g” has been refuted for decades: it was shown to be reified by Gould (1981), and later defenses of the concept of “general intelligence”, such as Jensen’s (1998), have been refuted, most forcefully by Peter Schonemann. Rather, “g” is something built into the test by way of test construction (Richardson, 2002).
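Richardson’s point that independent factors can produce all-positive correlations has a classic formal counter-model to Spearman: Godfrey Thomson’s “sampling” (or “bonds”) model. The sketch below is a minimal version with hypothetical parameters: 300 mutually independent components, six tests that each sum a random subset of them, and 2,000 simulated people. No single general factor exists in the model, yet every pair of tests correlates positively because the subsets overlap.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def thomson_sampling_model(n_bonds=300, n_tests=6, n_people=2000, seed=2):
    """Scores on tests that each sum a random subset of independent bonds."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_tests):
        share = rng.uniform(0.3, 0.6)  # fraction of all bonds this test draws on
        subsets.append(rng.sample(range(n_bonds), int(share * n_bonds)))
    scores = [[] for _ in range(n_tests)]
    for _ in range(n_people):
        # Bonds are mutually independent: nothing here is a general factor.
        bonds = [rng.gauss(0.0, 1.0) for _ in range(n_bonds)]
        for t, subset in enumerate(subsets):
            scores[t].append(sum(bonds[i] for i in subset))
    return scores

scores = thomson_sampling_model()
corrs = [pearson(scores[i], scores[j])
         for i in range(len(scores)) for j in range(i + 1, len(scores))]
print("smallest:", round(min(corrs), 2), "largest:", round(max(corrs), 2))
```

Tests that draw on larger shares of the bonds also tend to correlate more highly with the others, giving a rough hierarchy of correlations of the kind Richardson mentions.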

Castles (2013: 93) notes that “Spearman did not simply discover g lurking in his data. Instead, he chose one peculiar interpretation of the relationships to demonstrate something in which he already believed—unitary, biologically based intelligence.”

So what explains differences in “g”? The same test construction noted above, along with differences in social class: stress, self-confidence, test preparedness and other factors correlated with social class, which Richardson (2002) terms the “sociocognitive-affective nexus.”

Constance Hilliard, in her book Straightening the Bell Curve (Hilliard, 2012), notes that there were differences in IQ between rural and urban white South Africans, and that differences between those who spoke Afrikaans and those who spoke another language were completely removed through test construction (Hilliard, 2012: 116). She also notes that if the tests the constructors formulated didn’t agree with their preconceived notions, they were thrown out:

If the individuals who were supposed to come out on top didn’t score highly or, conversely, if the individuals who were assumed would be at the bottom of the scores didn’t end up there, then the test designers scrapped the test.

Sex differences in “intelligence” (IQ) were the subject of some debate in the early-to-mid 1900s, and test constructors debated amongst themselves what to do about them. Hilliard (2012) quotes Harrington (1984; in Perspectives on Bias in Mental Testing), who writes about normalizing test scores between men and women:

It was decided [by IQ test writers] a priori that the distribution of intelligence-test scores would be normal with a mean (X=100) and a standard deviation (SD=15), also that both sexes would have the same mean and distribution. To ensure the absence of sex differences, it was arranged to discard items on which the sexes differed. Then, if not enough items remained, when discarded items were reintroduced, they were balanced, i.e., for every item favoring males, another one favoring females was also introduced.
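The procedure Harrington describes can be written out as a short sketch. The item pool, its pass rates, and the 0.02 tolerance are invented for illustration, and the function names are hypothetical: the pool as a whole favours group A, items with a gap above the tolerance are discarded, and discarded items are then reintroduced in pairs, one favouring each group, until the test is full.

```python
import random

def make_item_pool(n_items=200, seed=3):
    """Hypothetical pool of (pass rate in group A, pass rate in group B)."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n_items):
        base = rng.uniform(0.3, 0.7)
        gap = rng.uniform(-0.05, 0.20)  # invented gaps; on average they favour A
        pool.append((base + gap / 2, base - gap / 2))
    return pool

def expected_gap(items):
    """Expected difference in mean total score, group A minus group B."""
    return sum(pa - pb for pa, pb in items)

def build_balanced_test(pool, n_wanted=60, tolerance=0.02):
    # Step 1: discard every item on which the groups differ by more than `tolerance`.
    kept = [it for it in pool if abs(it[0] - it[1]) <= tolerance]
    # Step 2: if too few items remain, reintroduce discarded items in pairs,
    # one favouring each group, matched by size of gap so the pair roughly cancels.
    favour_a = sorted((it for it in pool if it[0] - it[1] > tolerance),
                      key=lambda it: it[0] - it[1])
    favour_b = sorted((it for it in pool if it[1] - it[0] > tolerance),
                      key=lambda it: it[1] - it[0])
    for item_a, item_b in zip(favour_a, favour_b):
        if len(kept) + 2 > n_wanted:
            break
        kept.extend([item_a, item_b])
    return kept[:n_wanted]

pool = make_item_pool()
unselected = pool[:60]  # a 60-item test assembled without any item analysis
balanced = build_balanced_test(pool)
print(round(expected_gap(unselected), 2), round(expected_gap(balanced), 2))
```

The unselected items carry a sizeable expected gap in favour of group A; the balanced test built from the very same pool carries almost none.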

Richardson (1998: 114), for his part, quotes McNemar (1942) on the two choices test constructors had when confronted with sex differences on the items they administered:

One who would construct a test for intellectual capacity has two possible methods of handling the problem of sex differences.
1 He may assume that all the sex differences yielded by his test items are about equally indicative of sex differences in native ability.
2 He may proceed on the hypothesis that large sex differences on items of the Binet type are likely to be factitious in the sense that they reflect sex differences in experience or training. To the extent that this assumption is valid, he will be justified in eliminating from his battery test items which yield large sex differences.
The authors of the New Revision have chosen the second of these alternatives and sought to avoid using test items showing large differences in percents passing. (McNemar 1942:56)

Change “sex differences” to “race” or “social class” differences and we can, likewise, change the distribution of the curve, along with notions of who is or is not “intelligent.” By way of test construction, previously low scorers can become high scorers, and vice versa. There is no logical or empirical justification for the inclusion of specific items on any given IQ test. To put it another way, the inclusion of items on a test is subjective: it comes down to the test designers’ preconceived notions, not an objective criterion for which types of items belong on the test. As Raven’s notes show, there is no underlying theory for the inclusion of items; it is based on “intuition” (which is what modern-day test constructors rely on as well). These two quotes from early-20th-century IQ-ists are paramount in the attack on the validity of IQ tests, and on the explanation of score differences between groups.
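To make the generalization concrete, here is a sketch using a hypothetical pool of per-item pass-rate gaps that favours neither group overall. Choosing the 60 items that most favour one group or the other produces opposite expected score gaps from the same pool. The pool, the group labels, and the selection rule are assumptions for illustration only.

```python
import random

def make_symmetric_pool(n_items=300, seed=7):
    """Hypothetical per-item pass-rate gaps (group A minus group B), centred on zero."""
    rng = random.Random(seed)
    return [rng.uniform(-0.15, 0.15) for _ in range(n_items)]

def select_items(gaps, n_items, favour):
    """Return the n_items gaps that most favour `favour` ("A" or "B")."""
    return sorted(gaps, reverse=(favour == "A"))[:n_items]

gaps = make_symmetric_pool()
test_a = select_items(gaps, 60, "A")
test_b = select_items(gaps, 60, "B")
# By linearity, the expected difference in mean total score is the sum of item gaps.
print("whole pool:", round(sum(gaps), 2))
print("test favouring A:", round(sum(test_a), 2))
print("test favouring B:", round(sum(test_b), 2))
```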

He and van de Vijver (2012: 7) write that “An item is biased when it has a different psychological meaning across cultures. More precisely, an item of a scale (e.g., measuring anxiety) is said to be biased if persons with the same trait, but coming from different cultures, are not equally likely to endorse the item (Van de Vijver & Leung, 1997).” Reynolds and Suzuki (2012: 83) write that item bias is due to:

… “poor item translation, ambiguities in the original item, low familiarity/appropriateness of the item content in certain cultures, or influence of culture specifics such as nuisance factors or connotations associated with the item wording” (p. 127) (van de Vijver and Tanzer, 2004)
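The definition quoted above (same trait level, different groups, different probability of endorsing the item) is what psychometricians call differential item functioning (DIF), and the Mantel-Haenszel common odds ratio is one standard way to screen for it. The sketch uses simulated data under a Rasch model with invented parameters: 20 anchor items, an unbiased studied item U, and an item X deliberately made harder for the focal group. Test-takers are matched on total score, so a group difference in the trait itself does not count as bias.

```python
import math
import random

def passes(rng, theta, b):
    # Rasch item: the pass probability depends only on theta - b.
    return rng.random() < 1.0 / (1.0 + math.exp(-(theta - b)))

def simulate_group(rng, n, mean_theta, anchor_bs, b_u, b_x):
    """Rows of (anchor-items score, passed item U, passed item X)."""
    rows = []
    for _ in range(n):
        theta = rng.gauss(mean_theta, 1.0)
        anchor = sum(passes(rng, theta, b) for b in anchor_bs)
        rows.append((anchor, passes(rng, theta, b_u), passes(rng, theta, b_x)))
    return rows

def mantel_haenszel_or(reference, focal, item):
    """Mantel-Haenszel common odds ratio for column `item` (1 = U, 2 = X).

    People are stratified by anchor score plus the studied item; values near 1
    mean no DIF, values above 1 mean the item favours the reference group.
    """
    strata = {}
    for is_ref, rows in ((True, reference), (False, focal)):
        for row in rows:
            # cell = [ref correct, ref wrong, focal correct, focal wrong]
            cell = strata.setdefault(row[0] + row[item], [0, 0, 0, 0])
            cell[(0 if is_ref else 2) + (0 if row[item] else 1)] += 1
    num = den = 0.0
    for a, b, c, d in strata.values():
        t = a + b + c + d
        num += a * d / t
        den += b * c / t
    return num / den

rng = random.Random(4)
anchor_bs = [rng.uniform(-1.5, 1.5) for _ in range(20)]
# Identical items for both groups, except that item X is 0.8 logits harder
# for the focal group at every trait level: that is the built-in bias.
reference = simulate_group(rng, 3000, 0.0, anchor_bs, 0.0, 0.0)
focal = simulate_group(rng, 3000, -0.5, anchor_bs, 0.0, 0.8)
print("item U:", round(mantel_haenszel_or(reference, focal, 1), 2))
print("item X:", round(mantel_haenszel_or(reference, focal, 2), 2))
```

Item U comes out near 1 and item X well above 1. Because matching uses the test’s own score, the procedure flags items that behave differently from the rest of the test; it cannot detect a bias shared by every item.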

Drame and Ferguson (2017) note that their “Results indicate that use of the Ravens may substantially underestimate the intelligence of children in Mali”, and suggest that the cause may be that:

European and North American children may spend more time with play tasks such as jigsaw puzzles or connect the dots that have similarities with the Ravens and, thus, train on similar tasks more than do African children. If African children spend less time on similar tasks, they would have fewer opportunities to train for the Ravens (however unintentionally) reflecting in poorer scores. In this sense, verbal ability need not be the only pitfall in selecting culturally sensitive IQ testing approaches. Thus, differences in Ravens scores may be a cultural artifact rather than an indication of true intelligence differences. [Similar arguments can be found in Richardson, 2002: 291-293]

Dutton et al (2017) found the same, writing that “It is argued that the undeveloped nature of South Sudan means that a test based around shapes and analytic thinking is unsuitable. It is likely to heavily under-estimate their average intelligence.” So if the Raven has these problems across countries, then it SHOULD have similar biases within a country such as America.

The items on IQ tests are also not as complex as everyday life (see Richardson and Norgate, 2014). The questions on IQ tests are, in effect, questions of middle-class knowledge and skills, and knowing how IQ tests are structured (along with the types of items that eventually make it onto a particular test) makes this clear. Richardson (2002) gives a few questions from modern-day IQ tests, and Castles (2013) gives a few from the Stanford-Binet. This, of course, is due to the social class of the test constructors. Some examples:

‘What is the boiling point of water?’ ‘Who wrote Hamlet?’ ‘In what continent is Egypt?’ (Richardson, 2002: 289)

and

‘When anyone has offended you and asks you to excuse him—what ought you do?’ ‘What is the difference between esteem and affection?’ [this is from the Binet Scales, but “It is interesting to note that similar items are still found on most modern intelligence tests” (Castles, 2013).]

Castles (2013: 150) further gives made-up examples of what is on the WAIS (as a licensed psychologist, she cannot legally give actual test questions away), writing:

One section of the WAIS-III, for example, consists of arithmetic problems that the respondent must solve in his or her head. Others require test-takers to define a series of vocabulary words (many of which would be familiar only to skilled readers), to answer school-related factual questions (e.g., “Who was the first president of the United States?” or “Who wrote the Canterbury Tales?”), and to recognize and endorse common cultural norms and values (e.g., “What should you do if a sales clerk accidentally gives you too much change?” or “Why does our Constitution call for division of powers?”). True, respondents are also given a few opportunities to solve novel problems (e.g., copying a series of abstract designs with colored blocks). But even these supposedly culture-fair items require an understanding of social conventions, familiarity with objects specific to American culture, and/or experience working with geometric shapes or symbols.

All of these factors coalesce into the claim, and the argument, that IQ tests are tests of middle-class knowledge and skills. Contrary to the claims of IQ-ists, there is no such thing as a culture-free IQ test. Richardson (2002: 293) notes that “Since all human cognition takes place through the medium of cultural/psychological tools, the very idea of a culture-free test is, as Cole (1999) notes, ‘a contradiction in terms . . . by its very nature, IQ testing is culture bound’ (p. 646). Individuals are simply more or less prepared for dealing with the cognitive and linguistic structures built in to the particular items.”

Cole (1981) notes “that the notion of a culture free IQ test is an absurdity” because “all higher psychological processes are shaped by our experiences and these experiences are culturally organized” (a point Richardson has driven home for decades), while also, rightly, stating that “IQ tests sample school activities, and therefore, indirectly, valued social activities, in our culture.”

One of the last stands for the IQ-ist is to claim that IQ tests are useful for identifying individuals at risk for learning disabilities (the purpose for which Binet created the first IQ tests). However, IQ tests are neither necessary nor sufficient for identifying those with learning disabilities. Siegel (1989) states that “On logical and empirical grounds, IQ test scores are not necessary for the definition of learning disabilities.”

IQ testing really took off when Goddard brought the first IQ tests to America and translated them from French into English (see Zenderland, 1998 for a review). These tests were used to justify existing social ranks. As Richardson (1991: 44) notes, “The measurement of intelligence in the twentieth century arose partly out of attempts to ‘prove’ or justify a particular world view, and partly for purposes of screening and social selection. It is hardly surprising that its subsequent fate has been one of uncertainty and controversy, nor that it has raised so many social and political issues (see, for example, Joynson 1989 for discussion of such issues).” So what actual validation did the constructors of such tests need in the 20th century, when they knew full well what they wanted to show and, unsurprisingly, observed it (since they constructed the tests to produce exactly that)?

The conceptual arguments just given here point to a few things:

(1) IQ tests are not construct valid, because there is no theory of intelligence, nor is there an underlying theory relating differences in IQ (the unseen function) to, for example, a physiological variable. (See Uttal, 2012; 2014 for arguments against fMRI studies that purport to show differences in physiological variables underlying cognition.)

(2) Items on the tests are biased against certain classes/cultures. This obviously matters since, as noted above, there is no theory for the inclusion of items; it comes down to the subjective choice of the test designers, as Jensen admits.

(3) ‘g’ is a reified mathematical abstraction; Spearman “discovered” nothing, he just chose the interpretation that, of course, went with his preconceived notion.

(4) Sex differences in IQ scores were seen as a problem and, through item analysis, made to go away. This tells us that we can do the same for class/race differences in intelligence. Score differences are a function of test construction.

(5) The fact that the Raven has been shown to be biased in two African countries lends credence to the claims here.

This brings us to the ultimate claim of this article: IQ tests don’t test intelligence; they test middle-class knowledge and skills. Therefore, IQ scores are not scores of intelligence but an index of one’s knowledge of middle-class culture and its knowledge structures. Thus, IQ scores are, in actuality, “middle-class knowledge and skills” scores. So, contra Jensen (1980), there is bias in mental testing due to the items chosen for inclusion on the test (and we have an admission from IQ-ists themselves that score variances and distributions can be changed).


Chopsticks Genes and Population Stratification

1200 words

Why do some groups of people use chopsticks while others do not? Years back, a thought experiment was devised along these lines (Hamer and Sirota, 2000). A researcher finds a few hundred students at a university and gathers DNA samples from their cheeks, which are then mapped for candidate genes associated with chopstick use. As it turns out, one of the genetic markers is associated with chopstick use, accounting for 50 percent of the variation in the trait. The effect even replicates many times and is highly significant, but it is biologically meaningless.

One may look at East Asians and ask “Why do they use chopsticks?” or “Why are they so good at using them while Americans aren’t?” and arrive at ridiculous studies like the one described above. One may even find an association between the trait/behavior and a genetic marker, and even find that it replicates and is a significant hit. But it can all be for naught because of population stratification, which “refers to differences in allele frequencies between cases and controls due to systematic differences in ancestry rather than association of genes with disease” (Freedman et al, 2004). It “is a potential cause of false associations in genetic association studies” (Oetjens et al, 2016).

Population stratification in the chopsticks gene study described above should have been anticipated, since the study drew on two different populations. Kaplan (2000: 67-68) describes this well:

A similar argument, by the way, holds true for molecular studies. Basically, it is easy to mistake mere statistical associations for a causal connection if one is not careful to properly partition one’s samples. Hamer and Copeland develop an amusing example of some hypothetical, badly misguided researchers searching for the “successful use of selected hand instruments” (SUSHI) gene (hypothesized to be associated with chopstick usage) between residents in Tokyo and Indianapolis. Hamer and Copeland note that while you would be almost certain to find a gene “associated with chopstick usage” if you did this, the design of such a hypothetical study would be badly flawed. What would be likely to happen here is that a genetic marker associated with the heterogeneity of the group involved (Japanese versus Caucasian) would be found, and the heterogeneity of the group involved would independently account for the differences in the trait; in this case, there is a cultural tendency for more people who grow up in Japan than people who grow up in Indianapolis to learn how to use chopsticks. That is, growing up in Japan is the causally important factor in using chopsticks; having a certain genetic marker is only associated with chopstick use in a statistical way, and only because those people who grow up in Japan are also more likely to have the marker than those who grew up in Indianapolis. The genetic marker is in no way causally related to chopstick use! That the marker ends up associated with chopstick use is therefore just an accident of design (Hamer and Copeland, 1998, 43; Bailey 1997 develops a similar example).
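Kaplan’s point can be checked with a toy simulation. The marker frequencies and chopstick-use rates below are invented for illustration, not taken from Hamer and Copeland: within each city the marker and chopstick use are generated independently, so any association in the pooled sample can only come from mixing the two populations.

```python
import random

def simulate_city(rng, n, marker_freq, chopstick_rate):
    # Within a city, marker carriage and chopstick use are drawn independently:
    # by construction the marker has no effect on chopstick use.
    return [(rng.random() < marker_freq, rng.random() < chopstick_rate) for _ in range(n)]

def odds_ratio(people):
    """Odds ratio for the 2x2 table of marker carriage by chopstick use."""
    a = sum(1 for marker, chops in people if marker and chops)
    b = sum(1 for marker, chops in people if marker and not chops)
    c = sum(1 for marker, chops in people if not marker and chops)
    d = sum(1 for marker, chops in people if not marker and not chops)
    return (a * d) / (b * c)

rng = random.Random(6)
tokyo = simulate_city(rng, 10000, marker_freq=0.7, chopstick_rate=0.95)
indianapolis = simulate_city(rng, 10000, marker_freq=0.2, chopstick_rate=0.10)

print("pooled:", round(odds_ratio(tokyo + indianapolis), 2))  # strong "association"
print("Tokyo only:", round(odds_ratio(tokyo), 2))             # close to 1
print("Indianapolis only:", round(odds_ratio(indianapolis), 2))
```

In this toy case the confounder (city) is known exactly and recorded for everyone, which is what makes the within-city comparison straightforward.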

In this way, most, if not all, of the results of genome-wide association studies (GWASs) can be accounted for by population stratification. Hamer and Sirota (2000) is a warning to psychiatric geneticists not to be quick to ascribe function and causation to hits on certain genes from association studies (which GWASs are).

Many studies, for example Sniekers et al (2017) and Savage et al (2018), purport to “account for” less than 10 percent of the variance in a trait like “intelligence” (derived from non-construct-valid IQ tests). Other GWA studies purport to show genes that affect testosterone production, such that those who carry a certain variant are more likely to have low testosterone (Ohlsson et al, 2011). Population stratification can have an effect in these studies, too. GWASs give rise to spurious correlations that arise from population structure, which is what GWASs are actually measuring: they measure social class, not a “trait” (Richardson, 2017b; Richardson and Jones, 2019). Note that correcting for socioeconomic status (SES) fails, as the two are distinct (Richardson, 2002). (Note also that GWASs lead to polygenic scores (PGSs), which are, of course, flawed too.)

Such papers presume that correlations are causes and that interactions between genes and environment either don’t exist or are irrelevant (see Gottfredson, 2009 and my reply). Both of these presumptions are false. Correlations can, of course, help in figuring out causes, but, as with the chopstick example above, attributing causation to associations that are even “replicable” and “strongly significant” will still lead to false positives due to population stratification. Of course, GWASs and similar studies are attempting to account for the heritability estimates gleaned from twin, family, and adoption studies. But the assumptions used in those kinds of studies have been shown to be false and, therefore, heritability estimates are highly exaggerated (and flawed), which leads to “looking for genes” that aren’t there (Charney, 2012; Joseph et al, 2016; Richardson, 2017a).

Richardson’s (2017b) argument is simple: (1) there is genetic stratification in human populations which will correlate with social class; (2) since there is genetic stratification in human populations which will correlate with social class, the genetic stratification will be associated with the “cognitive” variation; (3) if (1) and (2) then what GWA studies are finding are not “genetic differences” between groups in terms of “intelligence” (as shown by “IQ tests”), but population stratification between social classes. Population stratification still persists even in “homogeneous” populations (see references in Richardson and Jones, 2019), and so, the “corrections for” population stratification are anything but.

So what accounts for the pittance of “variance explained” in GWASs and similar association studies (Sniekers et al, 2017 “explained” less than 5 percent of the variance in IQ)? Population stratification: specifically, these studies are capturing genetic differences that arose through migration. GWA studies use huge samples in order to find the genetic signals of the genes of small effect thought to underlie the complex trait being studied. Take what Noble (2018) says:

As with the results of GWAS (genome-wide association studies) generally, the associations at the genome sequence level are remarkably weak and, with the exception of certain rare genetic diseases, may even be meaningless (1321). The reason is that if you gather a sufficiently large data set, it is a mathematical necessity that you will find correlations, even if the data set was generated randomly so that the correlations must be spurious. The bigger the data set, the more spurious correlations will be found (3).

Calude and Longo (2016; emphasis theirs) “prove that very large databases have to contain arbitrary correlations. These correlations appear only due to the size, not the nature, of data. They can be found in “randomly” generated, large enough databases, which — as we will prove — implies that most correlations are spurious.”
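The simplest version of the large-dataset problem is multiple comparisons, sketched below. This is a narrower, well-known effect, not a reconstruction of Calude and Longo’s proof, which rests on Ramsey theory. Every variable here is independent random noise (an assumption of the sketch), so every correlation that passes the nominal p < 0.05 cut-off is spurious, and the number of such hits grows with the number of pairs tested.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def count_spurious_hits(n_vars, n_obs=100, seed=8):
    """Count pairs of pure-noise variables whose correlation looks 'significant'.

    The cut-off uses the Fisher z approximation for two-sided p < 0.05:
    |r| > tanh(1.96 / sqrt(n_obs - 3)).
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_vars)]
    r_crit = math.tanh(1.96 / math.sqrt(n_obs - 3))
    hits = pairs = 0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            pairs += 1
            if abs(pearson(data[i], data[j])) > r_crit:
                hits += 1
    return hits, pairs

for n_vars in (20, 60):
    hits, pairs = count_spurious_hits(n_vars)
    print(n_vars, "variables:", hits, "spurious hits out of", pairs, "pairs")
```

Roughly one pair in twenty passes the cut-off whatever the number of variables, so the absolute count of spurious “findings” rises with the size of the search.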

So why should we take association studies seriously when they fall prey to population stratification (measuring differences between social classes and other populations), along with the fact that big datasets lead to spurious correlations? I cannot think of a good reason. The chopsticks gene example perfectly illustrates the current problems with GWASs for complex traits: we are just seeing social (and other) stratification between populations, not any “genetic” differences in the trait being looked at.

The “Interactionism Fallacy”

2350 words

A fallacy is an error in reasoning that makes an argument invalid. The “interactionism fallacy”, coined by Gottfredson (2009), is the supposed fallacy of arguing that, since genes and environment interact, heritability estimates are not useful, especially for humans (they are useful for nonhuman animals, whose environments can be fully controlled; see Schonemann, 1997; Moore and Shenk, 2016). There are many reasons why this ‘fallacy’ is anything but a fallacy; it is a simple truism: genes and environment (along with other developmental products) interact to ‘construct’ the organism (what Oyama, 2000 terms ‘constructive interactionism’, “whereby each combination of genes and environmental influences simultaneously interacts to produce a unique result”). The causal parity thesis (CPT) is the thesis that genes/DNA play an important role in development, but so do other variables, so there is no reason to privilege genes/DNA above other developmental variables (see Noble, 2012 for a similar approach). Genes are not special developmental resources, and so they are not more important than other developmental resources. The thesis, then, is that genes and other developmental resources are developmentally ‘on par’.

Genes need the environment; without it, genes would not be expressed. Behavior geneticists claim to be able to partition genes from environment, nature from nurture, on the basis of heritability estimates gleaned mostly from twin and adoption studies. However, the method is flawed: since genes interact with the environment and with other genes, how could the effects of genes be neatly partitioned from the effects of the environment? Behavior geneticists, and others, cite the “interactionism fallacy”, the supposed fallacy that because genes interact with the environment, heritability estimates are useless. This “fallacy”, though, confuses the issue.

Behavior geneticists claim to show how genes and the environment affect the ontogeny of traits in humans with twin and adoption studies (though these methods are highly flawed). The purpose of this “fallacy” is to disregard what developmental systems theorists claim about the interaction of nature and nurture—genes and environment.
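For reference, the classic twin-study estimator (Falconer’s formula) shows how such a partition is computed. Under the standard ACE model assumptions, which are exactly what the critics cited here dispute (additive effects, no gene-environment interaction or correlation, and equal environments for MZ and DZ twins), the MZ and DZ twin correlations are modelled as below; the numbers in the last line are hypothetical, chosen only to show the arithmetic.

```latex
r_{MZ} = a^2 + c^2, \qquad r_{DZ} = \tfrac{1}{2}a^2 + c^2
\;\Longrightarrow\;
a^2 = 2\,(r_{MZ} - r_{DZ}), \quad c^2 = 2\,r_{DZ} - r_{MZ}, \quad e^2 = 1 - r_{MZ}

% hypothetical example: r_{MZ} = 0.8,\; r_{DZ} = 0.5
a^2 = 2\,(0.8 - 0.5) = 0.6, \quad c^2 = 2(0.5) - 0.8 = 0.2, \quad e^2 = 1 - 0.8 = 0.2
```

If MZ twins share more similar environments than DZ twins, part of the difference r_MZ − r_DZ is environmental and the formula credits it to genes, and a gene-environment interaction has no term of its own in the model.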

Gottfredson (2009) coins the “interactionism fallacy,” which she describes as “an irrelevant truth [which is] that an organism’s development requires genes and environment to act in concert,” that the “two forces are … constantly interacting,” and that “Development is their mutual product.” Gottfredson also states that “heritability … refers to the percentage of variation in … the phenotype, which has been traced to genetic variation within a particular population.” (She also claims that “One’s genome is fixed at birth”; this is false, see epigenetics/methylation studies.) According to Philip Kitcher, heritability estimates are “‘irrelevant’ and the fact that behavior geneticists persist in using them is ‘an unfortunate tic from which they cannot free themselves’ (Kitcher, 2001: 413)” (quoted in Griffiths, 2002).
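For readers who want the definition Gottfredson paraphrases written in symbols, here is the standard textbook variance decomposition (a sketch in conventional notation, not taken from Gottfredson’s paper). Broad-sense heritability is the genetic share of phenotypic variance, and the covariance and interaction terms are the ones usually assumed to be zero when a single percentage is reported:

```latex
\sigma^2_P = \sigma^2_G + \sigma^2_E + 2\,\mathrm{Cov}(G,E) + \sigma^2_{G\times E},
\qquad
H^2 = \frac{\sigma^2_G}{\sigma^2_P}
```

Only when the gene-environment covariance and the gene-by-environment interaction term are both zero does the phenotypic variance split cleanly into a genetic part and an environmental part that add up to the whole.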

Gottfredson is engaging in developmental denialism. Developmental denialism “occurs when heritability is treated as a causal mechanism governing the developmental reoccurrence of traits across generations in individuals.” Gottfredson, with her “interactionism fallacy,” is denying organismal development by attempting to partition genes from environment. As Rose (2006) notes, “Heritability estimates are attempts to impose a simplistic and reified dichotomy (nature/nurture) on non-dichotomous processes.” The nature vs nurture argument is over, and neither side has won (contra Plomin’s take), since they interact.

Gottfredson seems confused, since this point was debated by Plomin and Oyama back in the 1980s (Plomin’s review of Oyama’s book The Ontogeny of Information; see Oyama, 1987, 1988; Plomin, 1988a, b). In any case, it is true that development requires genes and environment to interact. But Gottfredson is talking about the concept of heritability: the attempt to partition genes and environment through twin, adoption, and family studies (which have a whole slew of problems). As Moore and Shenk (2016: 6) write:

Heritability statistics do remain useful in some limited circumstances, including selective breeding programs in which developmental environments can be strictly controlled. But in environments that are not controlled, these statistics do not tell us much.

Susan Oyama writes in The Ontogeny of Information (2000, pg 67):

Heritability coefficients, in any case, because they refer not only to variation in genotype but to everything that varied (was passed on) with it, only beg the question of what is passed on in evolution. All too often heritability estimates obtained in one setting are used to infer something about an evolutionary process that occurred under conditions, and with respect to a gene pool, about which little is known. Nor do such estimates tell us anything about development.

Characters are produced by the interaction of nongenetic and genetic factors. The biological flaw, as Moore and Shenk note, throws a wrench into the claims of Gottfredson and other behavior geneticists. Phenotypes are ALWAYS due to genetic and nongenetic factors interacting. So the two flaws of heritability, the environmental and the biological flaw (Moore and Shenk, 2016), come together to “interact” and refute simplistic claims that genes and environment (nature and nurture) can be separated.

For instance, as Moore (2016) writes, though “twin study methods are among the most powerful tools available to quantitative behavioral geneticists (i.e., the researchers who took up Galton’s goal of disentangling nature and nurture), they are not satisfactory tools for studying phenotype development because they do not actually explore biological processes.” (See also Richardson, 2012.) This is because twin studies ignore biological/developmental processes that lead to phenotypes.

Gamma and Rosenstock (2017) write that the concept of heritability that behavioral geneticists use “is a generally useless quantity” and that “the behavioral genetic dichotomy of genes vs environment is fundamentally misguided.” This brings us back to the CPT: there is causal parity among all the processes/interactants that form the organism and its traits, so the concept of heritability that behavioral geneticists employ is a useless measure. Oyama, Griffiths, and Gray (2001: 3) write:

These often overlooked similarities form part of the evidence for DST’s claim of causal parity between genes and other factors of development. The “parity thesis” (Griffiths and Knight 1998) does not imply that there is no difference between the particulars of the causal roles of genes and factors such as endosymbionts or imprinting events. It does assert that such differences do not justify building theories of development and evolution around a distinction between what genes do and what every other causal factor does.

Behavior geneticists’ endeavor, though, is futile. Aaron Panofsky (2016: 167) writes that “Heritability estimates do not help identify particular genes or ascertain their functions in development or physiology, and thus, by this way of thinking, they yield no causal information.” (Also see Panofsky, 2014; Misbehaving Science: Controversy and the Development of Behavior Genetics.) So, the behavioral genetic method of partitioning genes and environment does not—and can not—show causation for trait ontogeny.

Now, while people like Gottfredson and others may deny it, they are genetic determinists. Genetic determinism, as defined by Griffiths (2002) is “the idea that many significant human characteristics are rendered inevitable by the presence of certain genes.” Using this definition, many behavior geneticists and their sympathizers have argued that certain traits are “inevitable” due to the presence of certain genes. Genetic determinism is literally the idea that genes “determine” aspects of characters and traits, though it has been known for decades that it is false.

Now we can take a look at Brian Boutwell’s article Not Everything Is An Interaction. Boutwell writes:

Albert Einstein was a brilliant man. Whether his famous equation of E=mc2 means much to you or not, I think we can all concur on the intellectual prowess—and stunning hair—of Einstein. But where did his brilliance come from? Environment? Perhaps his parents fed him lots of fish (it’s supposed to be brain food, after all). Genetics? Surely Albert hit some sort of genetic lottery—oh that we should all be so lucky. Or does the answer reside in some combination of the two? How very enlightened: both genes and environment interact and intertwine to yield everything from the genius of Einstein to the comedic talent of Lewis Black. Surely, you cannot tease their impact apart; DNA and experience are hopelessly interlocked. Except, they’re not. Believing that they are is wrong; it’s a misleading mental shortcut that has largely sown confusion in the public about human development, and thus it needs to be retired.

[…]

Most traits are the product of genetic and environmental influence, but the fact that both genes and environment matter does not mean that they interact with one another. Don’t be lured by the appeal of “interactions.” Important as they might be from time to time, and from trait to trait, not everything is an interaction. In fact, many things likely are not.

I don’t even know where to begin here. Boutwell, like Gottfredson, is confused. The only thing that needs to be retired because it “has largely sown confusion in the public about human development” is, ironically, the concept of heritability (Moore and Shenk, 2016)! I have no idea why Boutwell claims that it is false that “DNA and experience [environment] are hopelessly interlocked.” They are interlocked because, as Schneider (2007) notes, “the very concept of a gene requires an environment.” If the very concept of the gene requires the environment, how can we disentangle the two into the neat percentages that behavior geneticists claim to produce? That’s right: we can’t. Do be lured by the appeal of interactions; all biological and nonbiological stuff constantly interacts.

Boutwell’s claims are nonsense. It is worth quoting Richard Lewontin’s foreword to the second edition (2000) of Susan Oyama’s The Ontogeny of Information (emphasis Lewontin’s):

Nor can we partition variation quantitatively, ascribing some fraction of variation to genetic differences and the remainder to environmental variation. Every organism is the unique consequence of the reading of its DNA in some temporal sequence of environments and subject to random cellular events that arise because of the very small number of molecules in each cell. While we may calculate statistically an average difference between carriers of one genotype and another, such average differences are abstract constructs and must not be reified with separable concrete effects of genes in isolation from the environment in which the genes are read. In the first edition of The Ontogeny of Information Oyama characterized her construal of the causal relation between genes and environment as interactionist. That is, each unique combination of genes and environment produces a unique and a priori unpredictable outcome of development. The usual interactionist view is that there are separable genetic and environmental causes, but the effects of these causes acting in combination are unique to the particular combination. But this claim of ontogenetically independent status of the causes as causes, aside from their interaction in the effects produced, contradicts Oyama’s central analysis of the ontogeny of information. There are no “gene actions” outside environments, and no “environmental actions” can occur in the absence of genes. The very status of environment as a contributing cause to the nature of an organism depends on the existence of a developing organism. Without organisms there may be a physical world, but there are no environments. In like the manner no organisms exist in the abstract without environments, although there may be naked DNA molecules lying in the dust. Organisms are the nexus of external circumstances and DNA molecules that make these physical circumstances into causes of development in the first place. 
They become causes only at their nexus, and they cannot exist as causes except in their simultaneous action. That is the essence of Oyama’s claim that information comes into existence only in the process of ontogeny. (Oyama, 2000: 16)
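Lewontin’s point that variation cannot be partitioned quantitatively can be made concrete with a toy model. The sketch below is a hypothetical illustration, not drawn from any study cited here: it assumes a balanced design in which every genotype is raised in every environment and phenotype is simply genotype value times environment value (`P = G * E`), and it computes the fraction of phenotypic variance an analysis of variance would attribute to genotype. The function name `genotype_share` and all numbers are made up for illustration.

```python
"""Toy illustration (hypothetical numbers): when phenotype is the product of a
genetic and an environmental value, the share of variance 'due to genes'
depends on which environments happen to be sampled."""
from itertools import product
from statistics import fmean, pvariance


def genotype_share(genotypes, environments):
    """Fraction of phenotypic variance explained by genotype main effects.

    Assumes a balanced design (every genotype raised in every environment,
    equal frequencies) and a purely multiplicative developmental rule P = G * E.
    """
    phenotypes = [g * e for g, e in product(genotypes, environments)]
    # Mean phenotype of each genotype, averaged over all sampled environments.
    genotype_means = [fmean(g * e for e in environments) for g in genotypes]
    return pvariance(genotype_means) / pvariance(phenotypes)


same_genes = [1, 2]
narrow_env = [1, 2]   # environments that differ little
wide_env = [1, 10]    # environments that differ a lot

print(f"{genotype_share(same_genes, narrow_env):.2f}")  # 0.47
print(f"{genotype_share(same_genes, wide_env):.2f}")    # 0.13
```

With the same two genotypes and the same developmental rule, the “genetic” share of variance drops from roughly 47 percent to roughly 13 percent only because the sampled environments differ more. The number describes a particular population in particular environments; it does not isolate a separable effect of the genes.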

There is an “interactionist consensus” (see Oyama, Griffiths, and Gray, 2001; What is Developmental Systems Theory? pg 1-13): the organism and its suite of traits are due to the interaction of genetic, environmental, epigenetic, and other resources at every stage of development. Therefore, successful organismal development always requires the interaction of genes, environment, epigenetic processes, and everything else used to ‘construct’ the organism and the traits it has. Thus “it makes no sense to ask if a particular trait is genetic or environmental in origin. Understanding how a trait develops is not a matter of finding out whether a particular gene or a particular environment causes the trait; rather, it is a matter of understanding how the various resources available in the production of the trait interact over time” (Kaplan, 2006).

Lastly, I will briefly comment on Sesardic’s (2005: chapter 2) critique of developmental systems theorists and their critique of heritability and the concept of interactionism. Sesardic argues in that chapter that interaction between genes and environment (nature and nurture) does not undermine heritability estimates (the nature/nurture partition). Philosopher of science Helen Longino argues in her book Studying Human Behavior (2013):

By framing the debate in terms of nature versus nurture and as though one of these must be correct, Sesardic is committed to both downplaying the possible contributions of environmentally oriented research and to relying on a highly dubious (at any rate, nonmethodological) empirical claim.

In sum, the “interactionism fallacy” (coined by Gottfredson) is not a ‘fallacy’ (error in reasoning) at all. As Oyama writes in Evolution’s Eye: A Systems View of the Biology-Culture Divide, “A not uncommon reaction to DST is, ‘That’s completely crazy, and besides, I already knew it’” (pg 195). This is exactly what Gottfredson (2009) does: she states that she “already knew” that nature and nurture interact, but she goes on to deny the arguments of Oyama, Griffiths, Stotz, Moore, and others that heritability estimates are useless and that nature and nurture cannot be neatly partitioned into percentages, since they are constantly interacting. Causal parity between genes and other developmental resources, too, upends the claim that heritability estimates for any trait make sense (not least because of how heritability estimates are gleaned for humans: mostly twin, family, and adoption studies). Developmental denialism, which Gottfredson and others often engage in, runs rampant in the “behavioral genetic” sphere; Oyama, Griffiths, Stotz, and others show why we should not deny development and why we should discard these estimates for human traits.

Heritability estimates imply a “nature vs nurture” when it is “nature and nurture” that are constantly interacting. Because of this interaction of numerous developmental resources, we should discard these estimates; it does not make sense to partition an interacting, self-organizing developmental system. Claims from behavior geneticists that genes and environment can be separated are clearly false.

Men Are Stronger Than Women

1200 words

The claim that “Men are stronger than women” does not need to be said—it is obvious through observation that men are stronger than women. To my (non-)surprise, I saw someone on Twitter state:

“I keep hearing that the sex basis of patriarchy is inevitable because men are (on average) stronger. Notwithstanding that part of this literally results from women in all stages of life being denied access to and discourage from physical activity, there’s other stuff to note.”

To which I replied:

“I don’t follow – are you claiming that if women were encouraged to be physically active that women (the population) can be anywhere *near* men’s (the population) strength level?”

I then got told to “Fuck off,” because I’m a “racist” (due to the handle I use and my views on the reality of race). In any case, it is true that part of this difference stems from cultural factors: many women want the “toned” look and do not want to get “big and bulky” (as if that happens overnight), so they avoid lifting heavy weights because they think they will become cartoonish.

Here’s the thing, though: men have about 61 percent more muscle mass than women (which is attributed to higher levels of testosterone). Most of the muscle mass difference is allocated to the upper body: men have about 75 percent more arm muscle mass than women, which accounts for 90 percent greater upper-body strength in men. Men also have about 50 percent more lower-body muscle mass than women, and this is related to men’s 65 percent greater lower-body strength (see references in Lassek and Gaulin, 2009: 322).
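The figures in this paragraph are stated as how much more men have, whereas the Janssen et al (2000) figures state how much weaker women are. The two framings are not interchangeable: “90 percent more” does not mean “90 percent less.” A minimal sketch of the conversion, assuming each figure is a simple ratio of group averages (the function name `percent_less` is just for illustration):

```python
"""Convert 'group A has p% more than group B' into 'group B has q% less than A'."""


def percent_less(percent_more: float) -> float:
    """If A = B * (1 + p/100), then B is p / (100 + p) * 100 percent below A."""
    return 100 * percent_more / (100 + percent_more)


# 'Percent more' figures for men, as cited via Lassek and Gaulin (2009).
figures = {
    "total muscle mass": 61,
    "arm muscle mass": 75,
    "upper-body strength": 90,
    "lower-body strength": 65,
}
for trait, more in figures.items():
    print(f"{trait}: men +{more}% -> women {percent_less(more):.1f}% lower")
```

This gives about 37.9, 42.9, 47.4, and 39.4 percent, respectively. Because these figures come from different studies and samples than Janssen et al’s, the conversion only puts the numbers on a common scale; it does not reconcile them.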

Men have around 24 pounds more skeletal muscle mass than women, and in this study women were about 40 percent weaker in the upper body and 33 percent weaker in the lower body (Janssen et al, 2000). Miller et al (1993) found that women had a 45 percent smaller cross-sectional area (CSA) in the biceps brachii, 45 percent smaller in the elbow flexors, 30 percent smaller in the vastus lateralis, and 25 percent smaller in the knee extensors, as I wrote in Muscular Strength by Gender and Race, where I concluded:

The cause for less upper-body strength in women is due to the distribution of women’s lean tissue being smaller.

Men have larger muscle fibers, which in my opinion is a large part of the reason for men’s strength advantage over women. Now, if women were indeed “discouraged” from physical activity, this would be a problem for their bone density: our bones are porous, and by doing a lot of activity we can strengthen them (see e.g., Fausto-Sterling, 2005). Bishop, Cureton, and Collins (1987) show that the sex difference in strength between similarly trained men and women “is almost entirely accounted for by the difference in muscle size,” which lends credence to the claim I made above.

Lindle et al (1997) conclude that:

… the results of this study indicate that Con strength levels begin to decline in the fourth rather than in the fifth decade, as was previously reported. Contrary to previous reports, there is no preservation of Ecc compared with Con strength in men or women with advancing age. Nevertheless, the decline in Ecc strength with age appears to start later in women than in men and later than Con strength did in both sexes. In a small subgroup of subjects, there appears to be a greater ability to store and utilize elastic energy in older women. This finding needs to be confirmed by using a larger sample size. Muscle quality declines with age in both men and women when Con peak torque is used, but declines only in men when Ecc peak torque is used. [“Con” and “Ecc” strength refer to concentric and eccentric actions]

Women are shorter than men and have less fat-free mass than men. Women also have a weaker grip, and even when matched for height and weight, men had higher levels of lean mass than women (92 and 79 percent, respectively; Nieves et al, 2009). Men also had greater bone mineral density (BMD) and bone mineral content (BMC) than women. Now do some quick thinking: do you think that someone with weaker bones could be stronger than someone with stronger bones? If person A had higher BMC and BMD than person B, who do you think would be stronger and do best on whatever strength test: the one with the weaker bones or the one with the stronger bones? Quite obviously, the stronger one’s bones, the more weight they can bear. So if someone has weak bones (low BMC/BMD) and puts a heavy load on their back, their bones could snap during the lift.

Alswat (2017) reviewed the literature on bone density in men and women and found that men had higher BMD in the hip and higher BMC in the lower spine. Women also had bone fractures earlier than men. Some of this is no doubt cultural, as explained above. However, even if we had a boy and a girl locked in a room for their whole lives and they did the same exact things, ate the same food, and lifted the same weights, I would bet my freedom that there would still be a large difference between the two, skewed in the direction we know it would skew. Women are more likely to suffer from osteoporosis than men are (Sözen, Özışık, and Başaran, 2016).

So if women have weaker bones than men, how could they possibly be stronger? Even if men and women had the same kind of physical activity down to a T, could you imagine women being stronger than men? I couldn’t, but that’s because I have more than a basic understanding of anatomy and physiology and what that means for differences in strength (or running) between men and women.

I don’t doubt that there are cultural reasons that account for part of the large differences in strength between men and women; I do doubt, though, that the gap can be meaningfully closed. Yes, biology interacts with culture. So the developmental variables that coalesce to make men “Men” and those that coalesce to make women “Women” converge to create the stark phenotypic differences between the sexes, which in turn explain how sex differences in strength manifest.

Differences in bone strength between men and women, along with distribution of lean tissue, differences in lean mass, and differences in muscle size explain the disparity in muscular strength between men and women. You can even imagine a man and woman of similar height and weight and they would, of course, look different. This is due to differences in hormones—the two main players being testosterone and estrogen (see Lang, 2011).

So yes, part of the difference in strength between men and women is rooted in culture and in how we view women who strength train (way more women should strength train, as a matter of fact), though I find it hard to believe that, even if the “cultural stigma” attached to women who lift heavy weights at the gym disappeared overnight, women would be stronger than men. Differences in strength exist between men and women, and this difference exists due to the complex relationship between biology and culture, nature and nurture (which cannot be disentangled).

The Distinction Between Action and Behavior

1300 words

Action and behavior are distinct concepts, although in the common lexicon they are used interchangeably. The two concepts are needed to distinguish between what one intends to do and what one merely reacts to (and how one reacts). In this article, I will explain the distinction between the two and how and why people get it wrong when discussing them, since using them interchangeably is inaccurate.

Actions are intentional; they are done for reasons (Davidson, 1963). Actions are determined by the agent’s current intentional state, and the agent then acts for reasons. So, in effect, the agent’s intentional states cause the action, but the action is carried out for reasons. Actions are that which is done by an agent; stated this way, though, the term could be used interchangeably with behavior. The Wikipedia article on “action” states:

action is an intentional, purposive, conscious and subjectively meaningful activity

So actions are conscious, compared to behaviors which are reflexive and unconscious—not done for reasons.

Davidson (1963: 685) writes:

Whenever someone does something for a reason, therefore, he can be characterized as (a) having some sort of pro attitude toward actions of a certain kind, and (b) believing (or knowing, perceiving, noticing, remembering) that his action is of that kind.

So providing the reason why an agent did A requires naming the pro attitude (a), the related belief (b), or both; Davidson calls this pair the agent’s primary reason, and it is what caused the agent’s action. When I explain behavior, this will become clear.

Behavior is different: behavior is a reaction to a stimulus, and this reaction is unconscious. Take, for example, a visit to the doctor’s office. Hitting the knee in the right spot causes the knee to jerk up; doctors use this test to check for nerve damage. It tests the L2, L3, and L4 segments of the spinal cord, so if there is no reflex, the doctor knows there is a problem.

This is done without thought—the patient does not think about the reflex. This then shows how and why action and behavior are distinct concepts. Here’s what occurs when the doctor hits the patient’s knee:

When the doctor hits the knee, the patient’s thigh muscle stretches. When the thigh muscle stretches, a signal is then sent along the sensory neuron to the spinal cord where it interacts with a motor neuron which goes to the thigh muscle. The muscle then contracts which causes the reflex. (Recall my article on causes of muscle movement.)

The contrast between this reflex and consciously moving your leg (say, deliberately jerking it the way the patellar reflex would) is what distinguishes action from behavior. Sure, the behavior of the patellar reflex had a cause, but it was not done consciously by the agent, so it is not an action.

Perhaps it would be important at this point to explain the differences between action, conduct, and behavior, because we have used these three terms in the discussion of caring. …

Teleology, the reader is reminded, involves goals or lures that provide the reasons for a person acting in a certain way. It is goals or reasons that establish action from simple behavior. On the other hand the concept of efficient causation is involved in the concept of behavior. Behavior is the result of antecedent conditions. The individual behaves in response to causal stimuli or antecedent conditions. Hence, behavior is a reaction to what already is—the result of a push from the past to do something in the present. In contrast, an action aims at the future. It is motivated by a vision of what can be. (Brencick and Webster, 2000: 147)

This is another thing that Darwin got wrong. He believed that instincts and reflexes are inherited; this is not wrong, since they are behaviors, and behaviors are dispositional, which means they can be selected. However, he believed that before they were inherited as instincts and reflexes, they were intentional acts. As Badcock (2000: 56) writes in Evolutionary Psychology: A Critical Introduction:

Darwin explicitly states this when he says that ‘it seems probable that some actions, which were at first performed consciously, have become through habit and association converted into reflex actions, and are now firmly fixed and inherited.’

This is quite obviously wrong, as I have explained above; instead of “reflex actions,” Darwin should have written “reflex behaviors.” So it seems that Darwin did not grasp the distinction between “action” and “behavior” either.

We can then form a simple argument about cognition:

(1) Cognition is intentional;
(2) Behavior is dispositional;
(3) Therefore, cognition is not responsible for behavior.

This is a natural outcome of what has been argued here, due to the distinction between action and behavior. So when we think of “cognition,” what comes to mind? Thinking. Thinking is an action, so thinking (cognition) is intentional. Intentionality is “the power of minds and mental states to be about, to represent, or to stand for, things, properties and states of affairs.” So, when we think, our minds/mental states can represent, or stand for, things, properties, and states of affairs. Therefore, cognition is intentional. Since cognition is intentional and behavior is dispositional, it directly follows that cognition cannot be responsible for behavior.

Thinking is a mental activity which results in a thought. So if thinking is a mental activity which results in a thought, what is a thought? A thought is a mental state of considering a particular idea or answer to a question or committing oneself to an idea or answer. These mental states are, or are related to, beliefs. When one considers a particular answer to a question they are paving the way to holding a particular belief; when they commit themselves to an answer they have formulated a new belief.

Beliefs are propositional attitudes: believing p involves adopting the belief attitude to proposition p. So, cognition is thinking: a mental process that results in the formation of a propositional belief. When one acquires a propositional attitude by thinking, a process takes place in stages. Later propositional attitudes are justified by earlier propositional attitudes. So cognition is thinking; thinking is a mental state of considering a particular view (proposition).

Therefore, thinking is an action (since it is intentional) and cannot possibly be a behavior (a disposition). Something can be either an action or a behavior—it cannot be both.

Let’s say that I believe there is food downtown, and I desire to eat. So I intend to go downtown to get some food; the cause, meanwhile, is the sensation of hunger. This chain shows how actions are intentional: how one intends to act.

Furthermore, to use the example I explained above, the patellar reflex the doctor assesses is a behavior: it is not an action, since the agent himself did not cause it. One could say that performing the test is an action for the doctor, but the reflex cannot be an action for the agent the test is being done on; it is, therefore, a behavior.

I have explained the difference between action and behavior and how and why they are distinct. I gave an example of action (cognition) and of behavior (patellar reflex) and explained how they are distinct. I then gave an argument showing how cognition (an action) cannot possibly be responsible for behavior. I showed how Darwin believed (falsely) that actions could eventually become behaviors. Darwin pretty much stated, “Actions can be selected and eventually become behaviors.” This is nonsense. Actions, by virtue of being intentional, cannot be selected; even if they are done over and over again, they do not eventually become behaviors. Behavior, on the other hand, by virtue of being dispositional, can be selected. In any case, I have definitively shown that the two concepts are distinct and that it is nonsense to conflate the terms.

(The Lack of) IQ Construct Validity and Neuroreductionism

2400 words

Construct validity for IQ is fleeting. Some people may refer to Haier’s brain imaging data as evidence for the construct validity of IQ, even though there are numerous problems with brain imaging and even though neuroreductionist explanations for cognition are “probably not” possible (Uttal, 2014; also see Uttal, 2012). Construct validity refers to how well a test measures what it purports to measure, and this is non-existent for IQ (see Richardson and Norgate, 2014). If the tests did test what they purport to (intelligence), then they would be construct valid. I will show an example of a measure that was validated and shown to be reliable without circular reliance on the instrument itself; I will show that the measures people use in an attempt to prove that IQ has construct validity fail; and finally I will provide an argument that the claim “IQ tests test intelligence” is false, since the tests are not construct valid.

Jung and Haier (2007) formulated the P-FIT hypothesis, the Parieto-Frontal Integration Theory of intelligence. The theory purports to show how individual differences in test scores are linked to variations in brain structure and function. There are, however, a few problems with the theory (as Richardson and Norgate, 2007 point out in the same issue; pg 162-163). IQ and brain region volumes are experience-dependent (e.g., Shonkoff et al, 2014; Betancourt et al, 2015; Lipina, 2016; Kim et al, 2019). Since they are experience-dependent, different experiences will form different brains and different test scores. Richardson and Norgate (2007) state that larger brain areas are not the cause of IQ; rather, the association reflects the experience-dependency of both: exposure to middle-class knowledge and skills leads to a better knowledge base for test-taking (Richardson, 2002), while access to better nutrition is more likely in the middle and upper classes, since, as Richardson and Norgate (2007) note, lower-quality, more energy-dense foods are more likely to be found in the lower classes. Thus, on the basis of simplistic correlations, Haier et al did not “find” what they purported to.

Now let me provide the argument about IQ test experience-dependency:

Premise 1: IQ tests are experience-dependent.
Premise 2: IQ tests are experience-dependent because some classes are more exposed to the knowledge and structure of the test by way of being born into a certain social class.
Premise 3: If IQ tests are experience-dependent because some social classes are more exposed to the knowledge and structure of the test along with whatever else comes with the membership of that social class then the tests test distance from the middle class and its knowledge structure.
Conclusion 1: IQ tests test distance from the middle class and its knowledge structure (P1, P2, P3).
Premise 4: If IQ tests test distance from the middle class and its knowledge structure, then how an individual scores on a test is a function of that individual’s cultural/social distance from the middle class.
Conclusion 2: How an individual scores on a test is a function of that individual’s cultural/social distance from the middle class since the items on the test are more likely to be found in the middle class (i.e., they are experience-dependent) and so, one who is of a lower class will necessarily score lower due to not being exposed to the items on the test (C1, P4)
Conclusion 3: IQ tests test distance from the middle class and its knowledge structure, thus, IQ scores are middle-class scores (C1, C2).

Further, regarding neuroimaging, we need to take a look at William Uttal’s work.

Uttal (2014) writes that “The problem is that both of these approaches are deeply flawed for methodological, conceptual, and empirical reasons. One reason is that simple models composed of a few neurons may simulate behavior but actually be based on completely different neuronal interactions. Therefore, the current best answer to the question asked in the title of this contribution [Are neuroreductionist explanations of cognition possible?] is–probably not.”

Uttal even has a book on meta-analyses and brain imaging, which, of course, has implications for Jung and Haier’s P-FIT theory. In his book Reliability in Cognitive Neuroscience: A Meta-meta Analysis, Uttal (2012: 2) writes:

There is a real possibility, therefore, that we are ascribing much too much meaning to what are possibly random, quasi-random, or irrelevant response patterns. That is, given the many factors that can influence a brain image, it may be that cognitive states and brain image activations are, in actuality, only weakly associated. Other cryptic, uncontrolled intervening factors may account for much, if not all, of the observed findings. Furthermore, differences in the localization patterns observed from one experiment to the next nowadays seems to reflect the inescapable fact that most of the brain is involved in virtually any cognitive process.

Uttal (2012: 86) also warns about individual variability throughout the day, writing:

However, based on these findings, McGonigle and his colleagues emphasized the lack of reliability even within this highly constrained single-subject experimental design. They warned that: “If researchers had access to only a single session from a single subject, erroneous conclusions are a possibility, in that responses to this single session may be claimed to be typical responses for this subject” (p. 708).

The point, of course, is that if individual subjects are different from day to day, what chance will we have of answering the “where” question by pooling the results of a number of subjects?

That the neural activations gleaned from neuroimaging studies vary from individual to individual, and even within the same individual depending on the time of day, means that these differences are not accounted for in group analyses (meta-analyses). “… the pooling process could lead to grossly distorted interpretations that deviate greatly from the actual biological function of an individual brain. If this conclusion is generally confirmed, the goal of using pooled data to produce some kind of mythical average response to predict the location of activation sites on an individual brain would become less and less achievable” (Uttal, 2012: 88).

Clearly, individual differences in brain imaging are not stable and they change day to day, hour to hour. Since this is the case, how does it make sense to pool (meta-analyze) such data and then point to a few brain images as important for X if there is such large variation in individuals day to day? Neuroimaging data is extremely variable, which I hope no one would deny. So when such studies are meta-analyzed, inter- and intrasubject variation is obscured.

The idea of an average or typical “activation region” is probably nonsensical in light of the neurophysiological and neuroanatomical differences among subjects. Researchers must acknowledge that pooling data obscures what may be meaningful differences among people and their brain mechanisms. However, there is an even more negative outcome. That is, by reifying some kinds of “average,” we may be abetting and preserving some false ideas concerning the localization of modular cognitive function (Uttal, 2012: 91).
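To make the pooling worry concrete, here is a toy simulation. It is a hypothetical sketch, not Uttal’s analysis: it assumes each simulated subject has a single activation peak on a one-dimensional axis, with half the subjects peaking near one location and half near another (all numbers are invented). Averaging the peaks, as a naive pooled analysis might, yields a location that no individual subject actually shows.

```python
import random
import statistics

# Hypothetical toy model: each simulated subject has one activation peak on a
# 1-D axis (arbitrary units). Half peak near 10, half near 30. These numbers
# are invented for illustration and do not come from any imaging study.
def simulate_peaks(n_subjects=20, seed=1):
    rng = random.Random(seed)
    peaks = []
    for i in range(n_subjects):
        centre = 10.0 if i % 2 == 0 else 30.0
        peaks.append(rng.gauss(centre, 1.0))  # small individual jitter
    return peaks


def pooled_peak(peaks):
    # Naive pooling: average the peak locations across subjects.
    return statistics.mean(peaks)


def distance_to_nearest_subject(peaks, location):
    # How far the pooled location lies from the closest real subject's peak.
    return min(abs(p - location) for p in peaks)


peaks = simulate_peaks()
avg = pooled_peak(peaks)
print(f"pooled peak: {avg:.1f}")
print(f"distance to nearest individual peak: {distance_to_nearest_subject(peaks, avg):.1f}")
```

With these assumed settings the pooled peak lands near the midpoint, several units away from every individual’s peak. Real imaging data are far messier; the sketch only illustrates the logic of a “mythical average.”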

So when we are dealing with the raw neuroimaging data (i.e., the unprocessed locations of activation peaks), plots of the peaks do not converge onto a small number of brain areas for a given cognitive process.

… inconsistencies abound at all levels of data pooling when one uses brain imaging techniques to search for macroscopic regional correlates of cognitive processes. Individual subjects exhibit a high degree of day-to-day variability. Intersubject comparisons between subjects produce an even greater degree of variability.

[…]

The overall pattern of inconsistency and unreliability that is evident in the literature to be reviewed here again suggests that intrinsic variability observed at the subject and experimental level propagates upward into the meta-analysis level and is not relieved by subsequent pooling of additional data or averaging. It does not encourage us to believe that the individual meta-analyses will provide a better answer to the localization of cognitive processes question than does any individual study. Indeed, it now seems plausible that carrying out a meta-analysis actually increases variability of the empirical findings (Uttal, 2012: 132).

So since reliability is low at all levels of neuroimaging analysis, it is very likely that the relations between particular brain regions and specific cognitive processes have not been established and may not even exist. The numerous reports purporting to find such relations report random and quasi-random fluctuations in extremely complex systems.

Construct validity (CV) is “the degree to which a test measures what it claims, or purports, to be measuring.” A “construct” is a theoretical psychological concept, so CV in this instance refers to whether IQ tests test intelligence. We accept an observable measure as a valid measure of an unseen function when differences in the one can be mechanistically related to differences in the other, e.g., blood alcohol and level of consumption, or the height of a mercury column and blood pressure. These measures are valid because they rely on well-understood theoretical models. There is no theory of individual differences in intelligence (Richardson, 2012), so IQ tests can’t be construct valid.

Thermometers measure temperature; IQ tests (supposedly) measure intelligence. There is a difference between the two, though: the reliability of thermometers for measuring temperature was established without circular reliance on the thermometer itself (see Chang, 2007).

In regard to IQ tests, it is proposed that the tests are valid since they predict school performance and adult occupation levels, income and wealth. However, this is circular reasoning and doesn’t establish that IQ tests are valid measures (Richardson, 2017). IQ tests rely on other tests in attempting to prove that they are valid. Unlike thermometers, which were validated without circular reliance on the instrument itself, IQ tests are said to be valid because they predict test scores and life success. IQ tests and similar tests are different versions of the same test, so their agreement with one another cannot validate them: a new test’s “validity” is judged by how well it agrees with previous IQ tests, for example, the Stanford-Binet. As Richardson (2002: 301) puts it, “Most other tests have followed the Stanford–Binet in this regard (and, indeed are usually ‘validated’ by their level of agreement with it; Anastasi, 1990).” How odd: new tests are validated by their agreement with other tests that are not themselves construct valid, which does not, of course, prove the validity of IQ tests.

IQ tests are constructed by selecting and excising items according to how they discriminate between “better” and “worse” test takers, meaning, of course, that the bell curve is not natural but forced (see Simon, 1997). Humans make the bell curve; for IQ tests it is not a natural phenomenon, since the first tests produced odd-looking distributions. (Also see Richardson, 2017a, Chapter 2 for more arguments against the bell curve distribution.)
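The point that the shape of the score distribution follows from item selection can be illustrated with a small simulation. This is a hypothetical sketch under stated assumptions, not a reconstruction of how any real test was built: it assumes a logistic (Rasch-style) response model, invents all item difficulties, and deliberately draws the simulated examinees from a flat (uniform) distribution. Holding the examinees fixed, a test built only from easy items gives a negatively skewed score distribution, one built from hard items a positively skewed one, and one built from middling items a roughly symmetric one.

```python
import math
import random
import statistics

# Hypothetical sketch: a logistic (Rasch-style) response model is ASSUMED, and
# every difficulty value is invented. Nothing here reproduces a real IQ test.
def p_correct(person, difficulty):
    # Probability that a simulated person passes an item of given difficulty.
    return 1.0 / (1.0 + math.exp(-(person - difficulty)))


def total_scores(difficulties, n_people=2000, seed=7):
    rng = random.Random(seed)  # same seed and item count, so the same simulated people
    scores = []
    for _ in range(n_people):
        person = rng.uniform(-3.0, 3.0)  # deliberately flat, NOT a bell curve
        score = sum(rng.random() < p_correct(person, d) for d in difficulties)
        scores.append(score)
    return scores


def skewness(xs):
    # Population skewness: negative = long left tail, positive = long right tail.
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


tests = {
    "easy items only": [-4.0] * 30,
    "hard items only": [4.0] * 30,
    "middling items": [0.0] * 30,
}
for name, items in tests.items():
    print(f"{name}: skewness {skewness(total_scores(items)):+.2f}")
```

The only thing that changes between the three runs is which items the (hypothetical) constructor keeps, yet the scores pile up at the top, at the bottom, or spread symmetrically depending on that choice.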

Finally, Richardson and Norgate (2014) write:

In scientific method, generally, we accept external, observable, differences as a valid measure of an unseen function when we can mechanistically relate differences in one to differences in the other (e.g., height of a column of mercury and blood pressure; white cell count and internal infection; erythrocyte sedimentation rate (ESR) and internal levels of inflammation; breath alcohol and level of consumption). Such measures are valid because they rely on detailed, and widely accepted, theoretical models of the functions in question. There is no such theory for cognitive ability nor, therefore, of the true nature of individual differences in cognitive functions.

That “There is no such theory for cognitive ability” is even admitted by lead IQ-ist Ian Deary in his 2001 book Intelligence: A Very Short Introduction, in which he writes “There is no such thing as a theory of human intelligence differences—not in the way that grown-up sciences like physics or chemistry have theories” (Richardson, 2012). This is yet another barrier to IQ’s attempted validation, since there is no theory of human intelligence differences.

Conclusion

In sum, neuroimaging meta-analyses (like Jung and Haier, 2007; see also Richardson and Norgate, 2007 in the same issue, pg 162-163) do not show what they purport to show, for numerous reasons. (1) Malnutrition has consequences for brain development, and the lower classes are less likely to have their nutritional needs met (Ruxton and Kirk, 1996); (2) the lower classes are more likely to be exposed to substance abuse (Karriker-Jaffe, 2013), which may well affect brain regions; (3) “Stress arising from the poor sense of control over circumstances, including financial and workplace insecurity, affects children and leaves ‘an indelible impression on brain structure and function’ (Teicher 2002, p. 68; cf. Austin et al. 2005)” (Richardson and Norgate, 2007: 163); and (4) working-class attitudes are related to poor self-efficacy beliefs, which also affect test performance (Richardson, 2002). So Jung and Haier’s (2007) theory “merely redescribes the class structure and social history of society and its unfortunate consequences” (Richardson and Norgate, 2007: 163).

In regard to neuroimaging, pooling together (meta-analyzing) numerous studies is fraught with conceptual and methodological problems, since a high degree of individual variability exists. Thus, the attempt to find “average” brain differences between individuals fails: the meta-analytic technique used (eg by Jung and Haier, 2007) does not find what its users want to find, namely average brain areas where, supposedly, cognition occurs. Meta-analyzing such disparate studies does not reveal an “average” location where cognitive processes occur and thereby cause differences in IQ test-taking. Reductionist neuroimaging studies do not, as is popularly believed, pinpoint where cognitive processes take place in the brain; such localizations have not been established, and they may not even exist.

Neuroreductionism does not work; attempting to reduce cognitive processes to different regions of the brain, even using meta-analytic techniques as discussed here, fails. There “probably cannot” be neuroreductionist explanations for cognition (Uttal, 2014), and so using these studies to attempt to pinpoint where in the brain cognition supposedly occurs for such ancillary things as IQ test-taking fails. (Neuro)Reductionism fails.

Since there is no theory of individual differences in IQ, IQ tests cannot be construct valid. Even if there were a theory of individual differences, IQ tests would still not be construct valid, since it would need to be established that there is a mechanistic relation between IQ tests and variable X. Attempts at validating IQ tests rely on correlations with other tests and older IQ tests, but that is exactly what is under contention (IQ validity), so correlating with older tests does not give IQ tests the requisite validity to make the claim “IQ tests test intelligence” true. IQ does not even measure ability for complex cognition; real-life tasks are more complex than the most complex items on any IQ test (Richardson and Norgate, 2014b).

Now, having said all that, the argument can be formulated very simply:

Premise 1: If the claim “IQ tests test intelligence” is true, then IQ tests must be construct valid.
Premise 2: IQ tests are not construct valid.
Conclusion: Therefore, the claim “IQ tests test intelligence” is false. (modus tollens, P1, P2)

Defending Minimalist Races: A Response to Joshua Glasgow

2000 words

Michael Hardimon published Rethinking Race: The Case for Deflationary Realism last year (Hardimon, 2017). I was awaiting some critical assessment of the book, and at the end of March some criticism finally came, from another philosopher, Joshua Glasgow, in the journal Mind (Glasgow, 2018). The article mostly argues against Hardimon’s minimalist race concept and against one case Hardimon discusses in his book: a twin earth and what we would call out-and-out clones of ourselves on this twin earth. Glasgow makes some good points, but I think he is largely misguided about Hardimon’s view of race.

Hardimon (2017) is the latest defense of the existence of race. It denies the existence of “racialist races” (the idea that races differ in mores, “intelligence” etc), taking the racialist view and “stripping it down to its barebones” to show that race exists in a minimal way. This is what Hardimon calls “social constructivism” in the pernicious sense: racialist races, in Hardimon’s eyes, are socially constructed in a pernicious sense, since racialist races do not represent any “facts of the matter” and the racialist concept “supports and legalizes domination” (pg 62). The minimalist concept, on the other hand, does not “support and legalize domination”, nor does it assume that there are differences in “intelligence”, mores and other mental characters; it rests only on superficial physical features. These superficial physical features are distributed geographically across the globe, and the groups that show them are real. Thus, race, in a minimal sense, exists. However, people like Glasgow have a few things to say about that.

Glasgow (2018) begins by praising Hardimon (2017) for “dispatching racialism” in his first chapter, also claiming that “academic writings have decisively shown why racialism is a bad theory” (pg 2). Hardimon argues that to believe in race, one need not believe what the racialist concept pushes; one must only acknowledge and accept that there are:

1) differences in visible physical features which correspond to geographic ancestry; 2) these differences in visible features which correspond to geographic ancestry are exhibited between real groups; 3) these real groups that exhibit these differences in physical features which correspond to geographic ancestry satisfy the conditions of minimalist race; C) therefore race exists.

This is a simple enough argument, but Glasgow disagrees. As a counter, Glasgow brings up the “twin earth” argument. Imagine a twin earth was created. On Twin Earth, everything is exactly the same; there are copies of you, me, copies of companies, animals, history mirrored down to the exact minutiae, etc. The main contention here is that Hardimon claims that ancestry is important for our conception of race. But in the twin earth argument, since everything is the same down to the last detail, the people who live on twin earth look just like us (they share our patterns of visible physical features) but do not share ancestry with us, so what race would we call them? Glasgow thus concludes that “sharing ancestry is not necessary for a group to count as a race” (pg 3). But, clearly, sharing ancestry is important for our conception of race. While the thought experiment is a good one, it fails, since ancestry is very clearly necessary for a group to count as a race, as Hardimon has argued.

Hardimon (2017: 52) addresses this, writing:

Racial Twin Americans might share our concept of race and deny that races have different geographical origins. This is because they might fail to understand that this is a component of their race concept. If, however, their belief that races do not have different geographical origins did not reflect a misunderstanding of their “race concept,” then their “race concept” would not be the same concept as the concept that is the ordinary race concept in our world. Their use of ‘race’ would pick out a different subject matter entirely from ours.

and on page 45 writes:

Glasgow envisages Racial Twin Earth in such a way that, from an empirical (that is, human) point of view, these groups would have distinctive ancestries, even if they did not have distinctive ancestries an sich. But if this is so, the groups [Racial Twin Earthings] do not provide a good example of races that lack distinctive ancestries and so do not constitute a clear counterexample to C(2) [that members of a race are “linked by a common ancestry peculiar to members of that group”].

C(2) (P2 in the simple argument for the existence of race) is fine, and Glasgow’s objections do not show that C(2) is false at all. The Racial Twin Earth argument is a good one; it is a clever thought experiment. However, as Hardimon had already noted in his book, Glasgow’s objection to C(2) does not rebut the fact that races share a peculiar ancestry unique to them.

Next, Glasgow criticizes Hardimon’s viewpoints on “Hispanics” and Brazilians. These two groups, says Glasgow, show that two siblings with the same ancestry but different skin colors would be different races in Brazil. He uses this example to state that “This suggests that race and ancestry can be disconnected” (pg 4). He criticizes Hardimon’s solution to the problem of race and Brazilians, namely that our term “race” and the term used in Brazil do not track the same things. “This is jarring. All that anthropological and sociological work done to compare Brazil with the rest of the world (including the USA) would be premised on a translation error” (pg 4). Since Americans and Brazilians, in Glasgow’s eyes, can have a serious conversation about race, this suggests to Glasgow that “our concept of race must not require that races have distinct ancestral groups” (pg 5).

I did cover Brazilians and “Hispanics” as regards the minimalist race concept. Some argue that the “color system” in Brazil is actually a “racial system” (Guimaraes 2012: 1160). While Brazilians denote race as ‘COR’ (Portuguese for ‘color’), one can argue that the term used for ‘color’ means ‘race’ and that we would have no problem discussing ‘race’ with Brazilians, since Brazilians and Americans have similar views on what ‘race’ really is. Hardimon (2017: 49) writes:

On the other hand, it is not clear that the Brazilian concept of COR is altogether independent of the phenomenon we Americans designate using ‘race.’ The color that ‘COR’ picks out is racial skin color. The well-known, widespread preference for lighter (whiter) skin in Brazil is at least arguably a racial preference. It seems likely that white skin color is preferred because of its association with the white race. This provides a reason for thinking that the minimalist concept of race may be lurking in the background of Brazilian thinking about race.

Since ‘COR’ picks out racial skin color, it can be safely argued that Brazilians and Americans at least are generally speaking about the same things. Since the color system in Brazil pretty much mirrors what we know as racial systems, demarcating races on the basis of physical features, we are, it can be argued, talking about the same (or similar) things.

Further, the fact that “Latinos” do not fit into Hardimon’s minimalist race concept is not a problem with Hardimon’s arguments about race, but a problem with how “Latinos” see themselves and racialize themselves as a group. “Latinos” can count as a social race, but they do not, and cannot, count as a minimalist race (such as the Caucasian minimalist race, the African minimalist race, the Asian minimalist race etc), since they do not share visible physical patterns which correspond to differences in geographic ancestry. Since they do not exhibit the characters that demarcate minimalist races, they are not a minimalist race. Comparing Cubans with, say, Mexicans (on average) is enough to buttress this point.

Glasgow then argues that there are similar problems when you make the claim “that having a distinct geographical origin is required for a group to be a race” (pg 5). He says that a “Twin Trump” and a “Twin Clinton” might be created from “whole cloth” on two different continents, but we would still call them both “white.” Glasgow then claims that “I worry that visible trait groups are not biological objects because the lines between them are biologically arbitrary” (pg 5). He argues, for example, that without a principled “dividing line”, skin color is an arbitrary trait by which to divide races. But if we look at skin color as an adaptation to the climate of the people in question (Jones et al, 2018), then this trait is not “arbitrary”, and it is linked to geographic ancestry.

Glasgow then goes down the old and tired route that “There is no biological reason to mark out one line as dividing the races rather than another, simply based on visible traits” (pg 5). He then discusses the fact that Hardimon invokes Rosenberg et al (2002), who show that our genes cluster by geographic ancestry, as biological evidence for the existence of race. Glasgow brings up two objections to demarcating races on the basis of both physical appearance and genetic analyses. The first: picture the color spectrum. “Now thicken the orange part, and thin out the light red and yellow parts on either side of orange. You’ve just created an orange ‘cluster’” (pg 6). He then asks:

Does the fact that there are more bits in the orange part mean that drawing a line somewhere to create the categories orange and yellow now marks a scientifically principled line, whereas it didn’t when all three zones on the spectrum were equally sized?

I admit this is a good question, and that this objection would indeed apply to the visible trait of skin color in regard to race. But, as I said above, since skin color can be conceptualized as a physical adaptation to climate, it is a good proxy for geographic ancestry. Whether or not skin color varies smoothly as you move away from the equator, it is evidence that “races” have biological differences, and these differences start on the biggest organ in the human body. This is just the classic continuum fallacy in action: X and Y lie at two ends of a continuum; there is no definable point where X becomes Y; therefore, there is no difference between X and Y.

As for Glasgow’s other objection, he writes (pg 6):

if we find a large number of individuals in the band below 62.3 inches, and another large grouping in the band above 68.7 inches, with a thinner population in between, does that mean that we have a biological reason for adopting the categories ‘short’ and ‘tall’?

It really depends on what the average height is in regard to “adopting the categories ‘short’ and ‘tall’” (pg 6). The first question is better than the second; alas, neither does a good job of objecting to Hardimon’s race concept.

In sum, Glasgow’s (2018) review of Hardimon’s (2017) book Rethinking Race: The Case for Deflationary Realism is an all-right review, though it leaves a lot to be desired, and I do think his critique could have been argued more strongly. Minimalist races do exist and are biologically real.

I am of the opinion that what matters regarding the existence of race is not biological science, i.e., testing to see which populations have which differing allele frequencies etc; what matters are the philosophical aspects of race. The debates in the philosophical literature regarding race are extremely interesting (I will cover them in the future), and center on racial naturalism and racial eliminativism.

(Racial naturalism “signifies the old, biological conception of race”; racial eliminativism “recommends discarding the concept of race entirely”; racial constructivism holds that “races have come into existence and continue to exist through ‘human culture and human decisions’ (Mallon 2007, 94)”; thin constructivism “depicts race as a grouping of humans according to ancestry and genetically insignificant, ‘superficial properties that are prototypically linked with race,’ such as skin tone, hair color and hair texture (Mallon 2006, 534)”; and racial skepticism “holds that because racial naturalism is false, races of any type do not exist”.) (Also note that Spencer (2018) critiques Hardimon’s viewpoints in his book as well, which will also be covered in the future, along with the back-and-forth debate in the philosophical literature between Quayshawn Spencer (e.g., 2015) and Adam Hochman (e.g., 2014).)

Steroid Mythconceptions and Racial Differences in Steroid Use

2000 words

Steroids get a bad reputation. It largely comes from movies, from people’s anecdotal experiences, and from people repeating stories they hear from the media and other forms of entertainment, usually claiming that a phenomenon called ‘roid rage’ makes steroid users violent. Is this true? Are any myths about steroids true, such as a shrunken penis? Are there ways to offset it? Steroids and their derivatives are off-topic for this blog, but it needs to be stressed that a few myths get pushed about steroids and what they do to behavior, their supposed effects on aggression and so forth.

With about 3 million (ab)users of AAS (anabolic-androgenic steroids) in America (El Osta et al, 2016), knowing the effects of steroids and similar drugs such as Winny (a cutting agent) would be useful, since, of course, athletes are their main users.

Shrunken testicles

This is, perhaps, one of the most popular myths. Though the myth is that AAS use causes the penis to shrink (which is not true), in reality AAS use causes the testicles to shrink: it causes the Leydig cells to decrease natural testosterone production, which decreases the firmness and shape of the testicles and results in a loss of size.

In one study, a questionnaire was given to 772 gay men at 6 gyms in January and February (think of the bias there: ‘Resolutioners’ are more likely to go to the gym in those months). 15.2 percent of the men had used steroids, with 11.7 percent of them injecting within the past 12 months. HIV positive men were more likely to have used than HIV negative men (probably due to prescriptions). Fifty-one percent of them reported testicular atrophy, and steroid users were more likely to report suicidal thoughts (Bolding, Sherr, and Elford, 2002). They conclude:

One in seven gay men surveyed in central London gyms in 2000 said they had used steroids in the previous 12 months. HIV positive men were more likely to have used steroids than other men, some therapeutically. Side effects were reported widely and steroid use was associated with having had suicidal thoughts and feeling depressed, although cause and effect could not be established. Our findings suggest that steroid use among gay men may have serious consequences for both physical and mental health.

Of course, those who (ab)use substances have more psychological problems than those who do not. Another study of 203 bodybuilders found that 8 percent (n = 17) reported testicular atrophy (for what it’s worth, it was an internet survey of drug utilization) (Perry et al, 2005). Another study found that 88 percent of individuals who abused the drug complained of side effects of AAS use, and about 40 percent described testicular atrophy (Evans, 1997), while a third noted testicular atrophy in about 50 percent of cases (n = 24) (Darke et al, 2016).

Sperm production

One study of steroid users found that only 17 percent of them had normal sperm levels (Torres-Calleja et al, 2001). This is because exogenous testosterone results in the atrophy of germinal cells, which causes a decrease in spermatogenesis. Increased AAS (ab)use may also lead to infertility later in life. Knuth et al (1989) also studied 41 bodybuilders with an average age of 26.7. The men reported a long list of the different steroids they had taken over their lives. Nineteen of the men were still using steroids at the time of the investigation (group I), whereas 12 of them (group II) had stopped taking steroids 3 months prior, and 10 of them (group III) had stopped 4 to 24 months prior.

They found that only 5 of them had sperm counts below the lower limit of the normal range, 20 million sperm per ml, while 24 of the bodybuilders showed other symptoms. No difference between groups I and II was noticed, and group III (the group that had abstained for 4 to 24 months) largely had sperm levels in the normal range. So, the data suggest that even in cases of severe decrease of sensitivity to androgens due to AAS (ab)use, spermatogenesis may still continue normally in some men, even when high levels of androgens are administered exogenously, and that, even after prolonged use, sperm levels can return to the normal range (Knuth et al 1989).

Aggression and crime

Now it’s time for the fun part and my reason for writing this article. Does (ab)using steroids cause someone to go into an uncontrollable rage, à la the Incredible Hulk, when they inject themselves with testosterone? The idea, pushed by the media, has latched onto the minds of many, with films and TV shows depicting the insanely aggressive man who has been (ab)using AAS. But how true is this? A few papers have shown that this phenomenon is indeed real (Konacher and Workman, 1989; Pope and Katz, 1994), but how true is it on its own, since AAS (ab)users are known to use multiple substances?

Konacher and Workman (1989) is a case study of one man with no criminal history who began taking AAS three months before he murdered his wife; they conclude that AAS can be said to be a ‘personality changer’. Piacentino et al (2015) conclude in their review of steroid use and psychopathology in athletes that “AAS use in athletes is associated with mood and anxiety disturbances, as well as reckless behavior, in some predisposed individuals, who are likely to develop various types of psychopathology after long-term exposure to these substances. There is a lack of studies investigating whether the preexistence of psychopathology is likely to induce AAS consumption, but the bulk of available data, combined with animal data, point to the development of specific psycho-pathology, increased aggressiveness, mood destabilization, eating behavior abnormalities, and psychosis after AAS abuse/dependence.” I would add that, since most steroid abusers are polysubstance abusers (they use multiple illicit drugs on top of AAS), the steroids per se are not causing crime or aggressive behavior; it’s the other drugs that the steroid (ab)user is also taking. And there is evidence for this assertion.

Lundholm et al (2015) showed just that: that AAS (ab)use was confounded with other substances used while the individual in question was also taking AAS. They write:

We found a strong association between self-reported lifetime AAS use and violent offending in a population-based sample of more than 10,000 men aged 20-47 years. However, the association decreased substantially and lost statistical significance after adjusting for other substance abuse. This supports the notion that AAS use in the general population occurs as a component of polysubstance abuse, but argues against its purported role as a primary risk factor for interpersonal violence. Further, adjusting for potential individual-level confounders initially attenuated the association, but did not contribute to any substantial change after controlling for polysubstance abuse.
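The confounding pattern Lundholm et al describe can be sketched with a toy simulation. The data-generating process below is an assumption invented for illustration, not their data or their statistical model, and it adjusts by simple stratification, which may differ from the method they used. In it, polysubstance abuse raises the probability of both AAS use and violent offending, while AAS use has no effect of its own. The crude comparison still shows AAS users offending far more often, yet within each polysubstance stratum the risk ratio sits near 1.

```python
import random

# Hypothetical data-generating process; every probability is invented.
# Polysubstance abuse raises the chance of BOTH AAS use and violent offending,
# while AAS use itself has NO effect on violence in this toy model.
def simulate(n=200_000, seed=3):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        poly = rng.random() < 0.10
        aas = rng.random() < (0.20 if poly else 0.02)
        violent = rng.random() < (0.15 if poly else 0.02)  # depends on poly only
        rows.append((poly, aas, violent))
    return rows


def risk(rows, aas, poly=None):
    # Proportion of violent offenders among people with the given AAS status,
    # optionally restricted to one polysubstance stratum.
    group = [v for p, a, v in rows if a == aas and (poly is None or p == poly)]
    return sum(group) / len(group)


rows = simulate()
print(f"crude risk ratio: {risk(rows, True) / risk(rows, False):.2f}")
for stratum in (False, True):
    rr = risk(rows, True, stratum) / risk(rows, False, stratum)
    print(f"risk ratio within polysubstance={stratum}: {rr:.2f}")
```

Stratifying is the simplest form of adjustment; the sketch only shows how a strong crude association can shrink toward nothing once a factor that drives both variables is held fixed.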

Even the National Institutes of Health (NIH) writes: “In summary, the extent to which steroid abuse contributes to violence and behavioral disorders is unknown. As with the health complications of steroid abuse, the prevalence of extreme cases of violence and behavioral disorders seems to be low, but it may be underreported or underrecognized.” We don’t know whether steroids cause aggression or whether more aggressive athletes are more likely to use them (Freberg, 2009: 424). Clearly, the claims of steroids causing aggressive behavior and crime are overblown, and there is as yet no scientific consensus on the matter. A great documentary on the subject is Bigger, Stronger, Faster, which goes through the myths of testosterone while chronicling the use of illicit drugs in bodybuilding and powerlifting.

This was also seen in one study in which men were given supraphysiologic doses of testosterone to measure its effects on muscle size and strength, since this had never been tested: no changes in mood or behavior occurred (Bhasin et al, 1996). Furthermore, injecting individuals with supraphysiological doses of testosterone of 200 and 600 mg per week does not cause heightened anger or aggression (Tricker et al, 1996; O’Connor et al, 2002). Testosterone is one of the most abused AASs around, and if a heightened level of T doesn’t cause crime, and testosterone levels being higher this week than last don’t seem to trigger crime either, we can safely disregard claims of ‘roid rage’, since they coincide with other drug use (polysubstance abuse). So, since supraphysiologic doses of testosterone cause neither crime nor aggression, we can say that AAS use on its own (and even with other drugs) does not cause crime or heightened aggression: aggression elevates testosterone secretion; testosterone doesn’t elevate aggression.

One review also suggests that the medical issues associated with AAS (ab)use are exaggerated to deter athletes from using them (Hoffman and Ratamess, 2006). They conclude that “Existing data suggest that in certain circumstances the medical risk associated with anabolic steroid use may have been somewhat exaggerated, possibly to dissuade use in athletes.”

Racial differences in steroid use

Irving et al (2002) found that, within the past 12 months, 2.1 percent of whites had used steroids to gain muscle, compared with 7.6 percent of blacks, 6.1 percent of ‘Hispanics’, a whopping 14.1 percent of Hmong Chinese, 7.9 percent of ‘other Asians’, 3.1 percent of ‘Native Americans’, and 11.3 percent of mixed-race people. Middle schoolers were more likely to use than high schoolers, and people from lower SES brackets were more likely to use than those in higher SES brackets.

Stilger and Yesalis (1999: 134) write (emphasis mine):

Of the 873 high school football players participating in the study, 54 (6.3%) reported having used or currently using AAS. Caucasians represented 85% of all subjects in the survey. Nine percent were African-American while the remainder (6%) consisted of Hispanics, Asian, and other. Of the AAS users, 74% were Caucasian, 13% African American, 7% Hispanic, and 3% Asian, χ²(4, N = 854) = 4.203, p = .38. The study also indicated that minorities are twice as likely to use AAS as opposed to Caucasians. Cross tabulated results indicate that 11.2% of all minorities use/used AAS as opposed to 6.5% of all Caucasians (data not displayed).
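The chi-square result in this quotation (4 degrees of freedom, a statistic of 4.203, p = .38) can be checked by hand: for an even number of degrees of freedom the chi-square upper tail has a closed form, so a few lines of Python reproduce it. The same snippet computes the ratio of the two cross-tabulated rates quoted (11.2% vs 6.5%). The function name is just illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k   # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


p = chi2_sf_even_df(4.203, 4)
print(f"p = {p:.3f}")                                   # p = 0.379
ratio = 11.2 / 6.5
print(f"minority/Caucasian rate ratio = {ratio:.2f}")   # 1.72
```

The p-value rounds to the reported .38, and the two cross-tabulated rates give a ratio of about 1.7.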

One study even had whites and blacks reporting the same level of steroid abuse in their sample (n = 10,850 ‘Caucasians’ and n = 1,883 black Americans), with blacks also reporting lower levels of other drug abuse (Green et al, 2001). Studies of college students indeed find higher rates of drug use among white Americans than among other ethnic groups (McCabe et al, 2007). But black Americans also tend to underreport their drug use (Ledgerwood et al, 2008; Lu et al, 2001), and blacks are more likely than whites to go to the ER after abusing drugs (Drug Abuse Warning Network, 2011). Bauman and Ennett (1994) also found that blacks underreport drug use whereas whites overreport.

So can we take black athletes’ self-reports that they do not (ab)use AAS at face value? Not entirely. Given the evidence that black respondents underreport drug use in general, there is little reason to assume their self-reports are accurate for AAS (ab)use in particular.

Conclusion

As with anything you use and abuse, there are always side effects. However, the media furor one hears regarding AAS and testosterone (ab)use is largely blown out of proportion. The risks associated with AAS (ab)use are ‘transient’ and will subside after one discontinues using the drugs. Blacks seem to take more AAS than whites, even though black respondents tend to underreport drug use. (And other races, too, seem to use it at higher rates than whites.) Steroid use does not seem to be ‘bad’ if one knows what one is doing and is under a doctor’s supervision, but even then, if you want to know the truth about AAS, you need to watch the documentary Bigger, Stronger, Faster. I chalk this up to the media demonizing testosterone itself, along with ‘toxic masculinity’ and the ‘toxic jock effect’ (Miller, 2009; Miller, 2011). Though if you dig into the literature yourself you’ll see that there is scant evidence for AAS and testosterone (ab)use causing crime, that doesn’t stop papers like Miller’s two from talking about the effects of ‘toxic jocks’ and, in effect, deriding masculine men and with them the hormone that makes Men men: testosterone. If taken safely, there is nothing wrong with AAS/testosterone use.

(Note: under a doctor’s supervision only, etc.)

The Native American Genome and Dubious Interpretations

1100 words

A recent paper was published on the origins of Native Americans titled Terminal Pleistocene Alaskan genome reveals first founding population of Native Americans (Moreno-Mayar et al, 2018). An infant genome was studied, and the group of people the infant belonged to was found to be similar to modern Native Americans but not directly ancestral to them. The infant’s group and modern Native Americans do, however, share the same common ancestors. This, of course, supports the hypothesis that Native Americans are descended from Asian migrants.

The infant is also related to both northern and southern Native Americans, which implies they descend from a single migration. (Though I am aware of a hypothesis that there were three waves of migration into the Americas from Beringia, along with back migrations from South America into Asia.)

Moreno-Mayar et al (2018) write in the abstract: “Our findings further suggest that the far-northern North American presence of northern Native Americans is from a back migration that replaced or absorbed the initial founding population of Ancient Beringians.” And they conclude (pg 5):

 The USR1 results provide direct genomic evidence that all Native Americans can be traced back to the same source population from a single Late Pleistocene founding event. Descendants of that population were present in eastern Beringia until at least 11.5 ka. By that time, however, a separate branch of Native Americans had already established itself in unglaciated North America, and diverged into the two basal groups that ultimately became the ancestors of most of the indigenous populations of the Americas.

This is a highly interesting paper which shows, as we’ve known for decades, that the ancestors of the Native Americans crossed the Bering Land Bridge around 11 kya. Though my reason for writing this article is not this very interesting paper itself, but the ‘conclusions’ that people are drawing from it.

Dubious ‘interpretations’

Of course, whenever a study like this gets published you get a whole slew of people who read the popular articles on the matter and don’t read the actual journal article. The problem here is that some people took the chance to claim that this paper showed that the origins of Man were in Europe, not Africa.

In a tweet, Black Pigeon Speaks, a YouTuber, purportedly shows a quotation from the Nature article which said:

“…represent a growing body of evidence being discovered across the world that suggests the origins of the human race may have been Europe and not Africa as once believed.”

So I read the paper, read it again, and even Ctrl+F’d it, and didn’t see the phrase. So where did the phrase come from?

I did some digging and found the source of the quote, which, of course, was not in the Nature article. The quote in question comes from an article titled Scientists discover DNA proving original Native Americans were White. Oh, wow. Isn’t that interesting? Maybe he read a different paper than I did.

The author stated that the infant was “more closely related to modern white Europeans”, though of course this too wasn’t stated anywhere in the paper. He also quoted an evolutionary biologist who stated “This is a new population of Native Americans — the white Native American.” Wow, this is interesting. Now let’s look at what else this author writes:

Working with scientists at the University of Alaska and elsewhere, Willerslev compared the genetic makeup of the baby, named Xach’itee’aanenh t’eede gaay or “sunrise child-girl” by the local community, with genomes from other ancient and modern people. They found that nearly half of the girls DNA came from the ancient North Europeans who lived in what is no Scandinavia. The rest of her genetic makeup was a roughly even mixed of DNA now carried by northern and southern Native Americans. Using evolutionary models, the researchers showed the ancestors of the first Native Americans started to emerge as a distinct population about 35,000 years ago.

Isn’t that weird? This is nowhere in the original paper. So I did some digging, and what did I find? The author of this article literally plagiarized, almost word for word, another article from The Guardian:

Working with scientists at the University of Alaska and elsewhere, Willerslev compared the genetic makeup of the baby, named Xach’itee’aanenh t’eede gaay or “sunrise child-girl” by the local community, with genomes from other ancient and modern people. They found that nearly half of the girls DNA came from the ancient north Eurasians who lived in what is now Siberia. The rest of her genetic makeup was a roughly even mixed of DNA now carried by northern and southern Native Americans.

Using evolutionary models, the researchers showed the ancestors of the first Native Americans started to emerge as a distinct population about 35,000 years ago.

This is straight-up plagiarism, and the plagiarist literally only switched “Siberia” for “Scandinavia” and “ancient north Eurasians” for “ancient North Europeans”. Ancient north Eurasians are NOT WHITE! Where does he get this from?! There is NO INDICATION that they were ‘ancient north Europeans’!
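The side-by-side comparison the reader is invited to make can also be done mechanically. Below is a minimal sketch using Python’s standard difflib module: it splits both versions of the sentence into words, compares them case-insensitively (so capitalization alone is not flagged), and reports each run of words that differs. The two strings are copied from the passages quoted above; the function name word_swaps is just illustrative.

```python
import difflib

GUARDIAN = ("They found that nearly half of the girls DNA came from the ancient "
            "north Eurasians who lived in what is now Siberia.")
COPY = ("They found that nearly half of the girls DNA came from the ancient "
        "North Europeans who lived in what is no Scandinavia.")


def word_swaps(original, altered):
    """Return (original_words, altered_words) pairs for each changed run of words.

    Matching is case-insensitive; the returned words keep their original case.
    """
    a = original.split()
    b = altered.split()
    matcher = difflib.SequenceMatcher(
        None, [w.lower() for w in a], [w.lower() for w in b], autojunk=False
    )
    swaps = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":   # 'replace', 'delete' or 'insert'
            swaps.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return swaps


for old, new in word_swaps(GUARDIAN, COPY):
    print(f"{old!r} -> {new!r}")
```

Run on these two sentences it flags exactly two changes, “Eurasians” to “Europeans” and “now Siberia.” to “no Scandinavia.”; everything else is identical.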

In sum, if you ever see articles like this that purport to show that Native Americans were white Europeans and that this supposedly calls the OoA model into question, always, ALWAYS check the claims and don’t fall for plagiarist bullshit. It is truly incredible that not only did the author literally copy and paste a full article, he also changed a few words to fit the narrative he was pushing! I will be notifying the author of the Guardian article of this plagiarism. You can check it out yourself: read the first article cited above, then read the Guardian article. Do people really think they can get away with plagiarizing an article like that, word for word?

This article is on a whole other level compared to the claims that modern Man began in Europe and that a few teeth upend the OoA model. This guy didn’t even read the paper; it seems like he read the Guardian article, copy and pasted it, and changed a few words for his own ‘gain’ to ‘show’ that the first Native Americans were white. There is no way one can interpret this paper in this manner if one has truly read and understood it. Always, always read original journal articles and, if you must read popular science articles, read them on reputable websites, not kooky websites with an agenda that literally plagiarize other people’s work. You can tell who’s gullible and who’s not just by what they say about new papers that can be misinterpreted.

The Non-Validity of IQ: A Response to The Alternative Hypothesis

1250 words

Ryan Faulk, like most IQ-ists, believes that the correlation between IQ and job performance is somehow evidence for IQ’s validity. He further believes that because self- and peer-ratings correlate with one’s IQ scores, that is further evidence for IQ’s validity.

The Validity of IQ

Too bad for Faulk: validating IQ through correlations with other tests, including other IQ tests, rests on circular assumptions. The first problem, as I’ve covered before, is that there is no agreed-upon model or description of IQ/intelligence/‘g’, and therefore we cannot reliably and truthfully state that differences in ‘g’, this supposed ‘mental power’ or ‘strength’, are what cause differences in test scores. Unfortunately for Ryan Faulk and other IQ-ists, this comes back, again, to our good old friend test construction. It’s no wonder that IQ tests correlate around .5 (or so it is claimed) with job performance: IQ test scores also correlate at around .5 with school achievement, because some items contain knowledge that has been learned in school, such as “In what continent is Egypt?”, “Who wrote Hamlet?” and “What is the boiling point of water?” As Ken Richardson writes in his 2017 book Genes, Brains, and Human Potential: The Science and Ideology of Intelligence (pg 85):

So it should come as no surprise that performance on them [IQ tests] is associated with school performance. As Robert L. Thorndike and Elizabeth P. Hagen explained in their leading textbook, Educational and Psychological Measurement, “From the very way in which the tests were assembled [such correlation] could hardly be otherwise.”

So, obviously, neither of the two tests independently establishes that it measures intelligence, this so-called innate power; because they’re different versions of the same test, there is a moderate correlation between them. This goes back to item analysis and test construction. Is it any wonder, then, that correlations between IQ and achievement increase with age? It’s built into the test! And while Faulk does cite high correlations from one of Schmidt and Hunter’s meta-analyses on the subject, what he doesn’t tell you is that one review found a correlation of .66 between teachers’ judgments and their students’ achievement, higher than the correlation between IQ and job performance (Hoge and Coladarci, 1989). They write (pg 303): “The median correlation, 0.66, suggests a moderate to strong correspondence between teacher judgments and student achievement.” This echoes Layzer (1972), whom I quoted the other day in my response to Grey Enlightenment:

Admirers of IQ tests usually lay great stress on their predictive power. They marvel that a one-hour test administered to a child at the age of eight can predict with considerable interest whether he will finish college. But as Burt and colleagues have clearly demonstrated, teachers subjective assessments afford even more reliable predictors. This is almost a truism.

So the correlation of .5 between occupational level and IQ is self-fulfilling, since the two are not independent measures. As for the correlation between IQ and job performance, which I’ve discussed in the past, studies in the 70s showed much lower correlations, between .2 and .3, as Jensen points out in The g Factor.
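One common way to put these coefficients on a comparable footing is the proportion of variance two measures share, which is the square of the correlation, r². A short sketch using only the correlations cited in this section (the labels are just descriptive):

```python
def shared_variance(r):
    """Proportion of variance two measures share, given their correlation r."""
    return r * r


correlations = [
    ("IQ and job performance (claimed)", 0.50),
    ("IQ and job performance (1970s studies, low end)", 0.20),
    ("IQ and job performance (1970s studies, high end)", 0.30),
    ("teacher judgments and achievement (Hoge and Coladarci)", 0.66),
]

for label, r in correlations:
    print(f"{label}: r = {r:.2f}, r^2 = {shared_variance(r):.0%}")
```

With these inputs, r = .5 corresponds to 25 percent shared variance, r = .2 to .3 to between 4 and 9 percent, and the .66 for teacher judgments to about 44 percent.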

The problem with the so-called validity studies carried out by Schmidt and Hunter, as cited by Ryan Faulk, is that they included numerous other tests that were not IQ tests, like memory tests, reading tests, the SAT, university admission tests, employment selection tests, and a variety of armed forces tests. “Just calling these ‘general ability tests,’ as Schmidt and Hunter do, is like reducing a diversity of serum counts to a ‘general blood test’” (Richardson, 2017: 87). The problem with using vastly different tests, of course, is that they tap into different abilities and sources of individual differences. The correlation between SAT scores and high school grades is .28, whereas the correlations of the SAT and of high school grades with IQ are each about .2. So these tests are clearly not measuring the same “general ability.”

Furthermore, job performance is based on one measure: supervisor ratings. These ratings are highly subjective and prone to bias: age effects, and halo effects from height and facial attractiveness, have been seen to sway judgments of how well one works. Measures of job performance are unreliable, especially those from supervisors, due to the assumptions and biases that go into them.
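Unreliable ratings matter for the numbers, too. Meta-analyses in the Schmidt and Hunter tradition correct observed validity coefficients for, among other things, unreliability in the criterion, and the standard formula for that step is Spearman’s correction for attenuation: the corrected r equals the observed r divided by the square root of the product of the two measures’ reliabilities. A minimal sketch follows; every input in it (the observed .25 and the assumed reliabilities of supervisor ratings) is hypothetical, chosen only to show how the result depends on the assumed reliability.

```python
import math


def disattenuate(r_observed, rel_x=1.0, rel_y=1.0):
    """Spearman's correction for attenuation.

    Estimates the correlation between two error-free measures from the observed
    correlation and the reliability of each measure (values in (0, 1]).
    """
    for rel in (rel_x, rel_y):
        if not 0 < rel <= 1:
            raise ValueError("reliabilities must be in (0, 1]")
    return r_observed / math.sqrt(rel_x * rel_y)


# Hypothetical inputs: an observed validity of .25 and several assumed
# reliabilities for supervisor ratings (the criterion, rel_y).
for rel_y in (0.8, 0.6, 0.5):
    corrected = disattenuate(0.25, rel_y=rel_y)
    print(f"assumed rating reliability {rel_y}: corrected r = {corrected:.2f}")
```

With these made-up inputs, the same observed .25 becomes roughly .28, .32 or .35 depending only on the reliability one assumes for the ratings, so the final figure is only as trustworthy as that assumption.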

I’ve also shown back in October that there is little relationship between IQ and promotion to senior doctor (McManus et al, 2013).

Do IQ tests test neural processes? Not really. One of the most-studied variables is reaction time. The story goes that the quicker people react to a stimulus, the higher their IQ on average, since they process information more quickly. Detterman (1987) notes that factors other than ‘processing speed’ can explain differences in reaction time, including but not limited to stress, understanding of instructions, motivation to do the task, attention, arousal, sensory acuity, and confidence. Khodadadi et al (2014) even write “The relationship between reaction time and IQ is too complicated and reveal a significant correlation depends on various variables (e.g. methodology, data analysis, instrument etc.).” Complex cognition in real life is also completely different from the simple questions asked in the Raven (Richardson and Norgate, 2014).

It is easy to look at the puzzles that make up IQ tests and be convinced that they really do test brain power. But then we ignore the brain power that nearly everyone displays in their everyday lives. Some psychologists have noticed that people who stumble over formal tests of cognitive can handle highly complex problems in their real lives all the time. As Michael Eysenck put it in his well-known book Psychology, “There is an apparent contradiction between our ability to deal effectively with our everyday environment and our failure to perform well on many laboratory reasoning tasks.” We can say the same about IQ tests.

[…]

Real-life problems combine many more variables that change over time and interact. It seems that the ability to do pretentious problems in a pencil-and-paper (or computer) format, like IQ test items, is itself a learned, if not-so-complex skill. (Richardson, 2017: 95-96)

Finally, Faulk cites studies showing that how people rated their own intelligence, and how their peers rated it, predicted how well they did on IQ tests. This isn’t surprising. Since IQ scores correlate with academic achievement at .5, someone who is good academically will more often than not have a high test score. That friends rate each other highly and their scores end up matching is no surprise either, as people generally group together with others like themselves and so will have similar achievements. That is not evidence for test validity, though!! See Richardson and Norgate (2015): “In scientific method, generally, we accept external, observable differences as a valid measure of an unseen function when we can mechanistically relate differences in one to differences in the other …” So even Faulk’s attempt to ‘validate’ IQ tests using peer- and self-ratings of ‘intelligence’ (whatever that is) falls on its face, since it’s not a true measure of validity. It’s not construct validity. (EDIT: Psychological constructs are validated ‘by testing whether they relate to measures of other constructs as specified by theory’ (Strauss and Smith, 2009). This doesn’t exist for IQ; therefore IQ isn’t construct valid.)

In sum, Faulk’s article leaves a ton to be desired and doesn’t outright prove that IQ tests are valid because, as I’ve shown in the past, validity for IQ is nonexistent. Some have tried to establish it (using correlations with job performance as evidence), but Richardson and Norgate (2015) take down those claims and show that the correlation is between .2 and .3, not the .5+ cited by Hunter and Schmidt in their ‘validation studies’. The criteria laid out by Faulk do not prove that IQ tests have true construct validity, and it is test construction that produces these correlations with educational achievement.