No one really discusses how IQ tests are constructed; people just accept the numbers that are spit out and think that it shows one’s intelligence level relative to others who took the test. However, there are huge methodological flaws in regard to IQ tests—one of the largest, in my opinion, being that they are constructed to fit a normal curve and based on the ‘prior knowledge’ of who is or is not intelligent.
What people don’t understand about test construction is that the behavior genetic (BG) method must assume a normal distribution. IQ tests have been constructed to display this normal distribution, so we cannot say whether or not it exists in nature, though few human traits fall on the normal distribution. The fact of the matter is this: the normal curve is achieved by keeping relatively more items that about half of test-takers get right and relatively few items that either most or only a few of them get right. This forces the normal curve and all of the assumptions that come along with this so-called IQ bell curve.
Even then, the fact that the normal distribution is forced doesn’t matter as much as the assumptions and conclusions drawn from the forced curve. It is assumed that individual differences in test scores arise out of ‘biology’. However, since test items are manipulated to get the results that the test constructors want, we cannot know whether these distributions are ‘biological’ in nature.
The fact of the matter is, the tests are constructed on the basis of prior knowledge of who is or is not intelligent. This means that we can ‘build the test’ to fit these preconceived notions. The problem of item selection was discussed by Richardson (1998), who discussed boys scoring a few points higher than girls and the question of whether these differences should be ‘allowed to persist’. Richardson (1998: 114) writes (12/26/17 Edit: I’ll also provide the quote that precedes this one):
“One who would construct a test for intellectual capacity has two possible methods of handling the problem of sex differences.
1 He may assume that all the sex differences yielded by his test items are about equally indicative of sex differences in native ability.
2 He may proceed on the hypothesis that large sex differences on items of the Binet type are likely to be factitious in the sense that they reflect sex differences in experience or training. To the extent that this assumption is valid, he will be justified in eliminating from his battery test items which yield large sex differences.
The authors of the New Revision have chosen the second of these alternatives and sought to avoid using test items showing large differences in percents passing.” (McNemar 1942: 56)
This is, of course, a clear admission of the subjectivity of such assumptions: while ‘preferring’ to see sex differences as undesirable artefacts of test composition, other differences between groups or individuals, such as different social classes or, at various times, different ‘races’, are seen as ones ‘truly’ existing in nature. Yet these, too, could be eliminated or exaggerated by exactly the same process of assumption and manipulation of test composition.
And further writes on page 121:
Suffice it to say that investigators have simply made certain assumptions about ‘what to expect’ in the patterns of scores, and adjusted their analytical equations accordingly: not surprisingly, that pattern emerges!
The only ‘assumption’ the test constructors have is their existing bias about who is or is not ‘intelligent’; they then construct the test through item selection, excising items that don’t fit their desired distribution. Is that supposed to be scientific? You can ask a group of children a bunch of questions and then construct a test to get the conclusion you want based on item selection.
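To see how much leverage item selection gives, here is a minimal sketch in Python. Everything in it is hypothetical: the ten trial items, the two groups (A and B), the pass percentages, and the 5-point threshold are invented for illustration and come from no real test. It only does the arithmetic of the procedure McNemar describes (dropping items with large differences in percent passing), plus the reverse choice of keeping only items that favor one group.

```python
# Hypothetical trial items: (item_id, percent of group A passing, percent of group B passing).
# These numbers are invented for illustration; they are not from any real test.
trial_items = [
    (1, 62, 60), (2, 55, 57), (3, 70, 52), (4, 48, 49), (5, 81, 64),
    (6, 35, 36), (7, 66, 50), (8, 58, 59), (9, 44, 42), (10, 73, 74),
]


def gap_in_points(items):
    """Sum of (A minus B) pass-rate differences, in percentage points.

    Dividing by 100 gives the expected raw-score gap on a test made of these items.
    """
    return sum(pct_a - pct_b for _, pct_a, pct_b in items)


def drop_large_differences(items, threshold):
    """Keep only items whose pass-rate difference is at most `threshold` points."""
    return [it for it in items if abs(it[1] - it[2]) <= threshold]


def keep_items_favoring_b(items):
    """Keep only items that group B passes more often than group A."""
    return [it for it in items if it[2] > it[1]]


balanced = drop_large_differences(trial_items, threshold=5)
favoring_b = keep_items_favoring_b(trial_items)

print("all items:        ", gap_in_points(trial_items))
print("large gaps removed:", gap_in_points(balanced))
print("only B-favoring:  ", gap_in_points(favoring_b))
```

On these made-up numbers the full pool favors group A, the filtered pool shows almost no gap, and the last pool favors group B, all drawn from the same ten trial items.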
The BG method needs to assume that IQ test scores lie on a normal curve and that IQ is a quantitative trait that exhibits a normal distribution, though Micceri (1989) showed that normal distributions are the exception, rather than the rule, for numerous measurable traits. Richardson (1998: 113) further writes:
The same applies to many other ‘characteristics’ of IQ. For example, the ‘normal distribution, or bell-shaped curve, reflects (misleadingly as I have suggested in Chapters 1 to 3) key biological assumptions about the nature of cognitive abilities. It is also an assumption crucial to many statistical analyses done on test scores. But it is a property built into a test by the simple device of using relatively more items on which about half the testees pass, and relatively few items on which either many or only a few of them pass. Dangers arise, of course, when we try to pass this property off as something happening in nature instead of contrived by test constructors.
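Richardson’s point about item difficulty can be illustrated with a toy simulation. This is a sketch under loud assumptions, not a reconstruction of how any real test was built: the logistic response model, the flat (uniform) spread of simulated ‘ability’, the difficulty values, and the sample size are all my own choices for illustration. It compares the skewness of total scores for two item pools given to the same simulated test-takers.

```python
import math
import random


def simulate_total_scores(difficulties, abilities, rng):
    """Total score per person under an assumed logistic response model."""
    scores = []
    for theta in abilities:
        total = 0
        for b in difficulties:
            p_pass = 1.0 / (1.0 + math.exp(-(theta - b)))  # chance of passing this item
            if rng.random() < p_pass:
                total += 1
        scores.append(total)
    return scores


def skewness(values):
    """Sample skewness: third central moment over the variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumption: simulated 'ability' is flat (uniform), i.e. deliberately NOT bell-shaped.
abilities = [rng.uniform(-2.0, 2.0) for _ in range(5000)]

# Pool 1: mostly items that about half pass, a few easy and a few hard ones.
mid_heavy_pool = [0.0] * 30 + [-2.0] * 5 + [2.0] * 5
# Pool 2: only easy items that most people pass.
easy_pool = [-2.0] * 40

skew_mid = skewness(simulate_total_scores(mid_heavy_pool, abilities, rng))
skew_easy = skewness(simulate_total_scores(easy_pool, abilities, rng))

print("skewness, mostly mid-difficulty items:", round(skew_mid, 2))
print("skewness, easy items only:            ", round(skew_easy, 2))
```

In this toy setup the mostly mid-difficulty pool should give a skewness near zero, while the all-easy pool gives a clearly negative one, even though the simulated ‘abilities’ are the same flat distribution in both cases. The shape of the score distribution here follows from which items were kept.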
So with the knowledge of test construction, then there is something very obvious here: we can construct IQ tests that, say, show blacks scoring higher than whites and women scoring higher than men. We can then make the assumption that there are genes that are responsible for this distribution and then ‘find genes’ that supposedly cause these differences in test scores (which are constructed to show the differences!). What then? Let’s say that someone did do that, would the logical conclusion be that there are genes ‘driving’ the differences in IQ test scores?
Richardson (2017: 3) writes:
In summary, either directly or indirectly, IQ and related tests are calibrated against social class background, and score differences are inevitably consequences of that social stratification to some extent. Through that calibration, they will also correlate with any genetic cline within the social strata. Whether or not, and to what degree, the tests also measure “intelligence” remains debateable because test validity has been indirect and circular. … Such circularity is also reflected in correlations between IQ and adult occupational levels, income, wealth, and so on. As education largely determines the entry level to the job market, correlations between IQ and occupation are, again, at least partly, self-fullfilling. … CA [cognitive ability], as measured by IQ-type tests, is intrinsically inter-twined with social stratification, and its associated genetic background, by the very nature of the tests.
This, again, falls back on the non-existent construct validity that IQ tests have. Construct validity “defines how well a test or experiment measures up to its claims.” No such construct validity exists for IQ tests. If breathalyzers didn’t test someone’s fitness to drive, would they still be a good measure? If they had no construct validity, if there was no biological model to calibrate the breathalyzer against, would we still accept it as a realistic model to test people against and judge their fitness to drive? Yet another definition of construct validity comes from Strauss and Smith (2009), who write that psychological constructs are “validated by testing whether they relate to measures of other constructs as specified by theory.” No such biological model exists for IQ; why expect some type of biological model like this when there are other perfectly well-reasoned responses to how and why individuals differ in IQ test scores (Richardson, 2002)?
The normal distribution is forced, which IQ-ists claim to know. Richardson (1998) notes that Jensen “noted how ‘every item is carefully edited and selected on the basis of technical procedures known as “item analysis”, based on tryouts of the items on large samples and the test’s target population’ (1980:145).” These ‘tryouts’ are what force the normal curve, and no matter how ‘technical’ the procedures are, there are still huge biases, which then make people draw huge assumptions, again, based on who is or is not intelligent.
Simon (1997: 204) writes (emphasis mine):
There is another, and completely irrefutable, reason why the bell-shaped curve proves nothing at all in the context of H-M’s book: The makers of IQ tests consciously force the test into such a form that it produces this curve, for ease of statistical analysis. The first versions of such tests invariably produce odd-shaped distributions. The test-makers then subtract and add questions to find those that discriminate well between more-successful and less-successful test-takers. For this reason alone the bell-shaped IQ curve must be considered an artifact rather than a fact, and therefore tells us nothing about human nature or human society.
Simon (1997) rightly notes, as I have numerous times, how biased (against certain classes) the excision of test items during item analysis and selection is. This shows that both the so-called normal curve and the outcomes it supposedly shows aren’t “natural”, but are chosen and forced by the test constructors and their biases and presuppositions about what “intelligence” is. John Raven, for example, also stated in his personal notes how he used his “intuition” to rank-order items, while others further noted that there was no “underlying processing theory” to guide item difficulty and the retention of old items on newer versions of the test (Carpenter, Just, and Shell: 408).
In sum, IQ tests are constructed to fit a normal curve on the basis of an assumption of a normal distribution, and on the presupposed basis of who is or is not ‘intelligent’ (whatever that means). The BG method needs to assume that IQ is a quantitative trait which exhibits a normal distribution. IQ is assumed to be like height or weight, but which physiological process in the body does it mimic? I have argued that there is no physiological basis to ‘IQ’ or to what the tests measure, and that score differences can be explained not by biology but by test construction. I wonder what the distributions of IQ test scores would look like without forced normal distributions. Since it is assumed that IQ tests measure something directly measurable, as height and weight are, the scores must fall on a normal distribution, which most other measurable psychological traits do not show (Micceri, 1989; Buzsaki and Mizuseki, 2014).
Some may argue that ‘they know this’ (‘they’ being psychometricians). However, ‘they’ must also know that most of their assumptions and conclusions about ‘good and bad genes’ rest on the huge assumption of the normal distribution. IQ test scores do not show a normal distribution; the tests were designed to create it. That most psychological traits show a strong skew to one side, and that this is why a normal distribution is forced, is beside the point. The way the tests are constructed means that we should be cautious about what these tests measure, given the assumptions that we currently hold about them.
Melo,
Of course there is no way to ‘know’ whether or not it ‘makes data less accurate’, but given that other psychological traits are not normally distributed, it’s a great guess. There is a chance that the data is less accurate, so one should be cautious about the conclusions drawn from the tests. (Like saying X is smarter than Y because he scored higher on the test and the reason is ‘genetic’.)
It’s not ‘redundant’ nor does it ‘try to paint a “conspiracy”’. The basic argument brings up the flaws in test construction and cautions readers to be extremely careful with the conclusions drawn from the forced normal distribution. It is then assumed that people towards the right end have a surfeit of ‘good genes’ while those towards the middle have ‘average genes’ and those towards the left end have ‘bad genes’. The assumption is that genes are ‘additive’ and have ‘independent genetic effects’. This implies that genes work ‘independent’ of the environment, which is very, very wrong:
… these conclusions are erroneous due to large violations of the additivity assumption underlying behavioral genetics methods – that sources of genetic and shared and nonshared environmental variance are independent and non-interactive.
Daw, J., Guo, G., & Harris, K. M. (2015). Nurture net of nature: Re-evaluating the role of shared environments in academic achievement and verbal intelligence. Social Science Research, 52, 422-439. doi:10.1016/j.ssresearch.2015.02.011
That’s one reason why the assumption of the normal distribution is flawed; it then assumes that people have ‘good and bad’ genes that ‘cause’ their IQ scores. Genes don’t work like that.
Psychology is one of the softest ‘sciences’ out there, so they’ll do anything to protect their ‘golden egg’ called ‘IQ’, since it’s the ‘best they have’. Even then, as I’m showing, it’s not good enough. They have fooled themselves and others into believing that their construct called ‘IQ’ predicts life success because it is ‘testing for something biological’ (whatever it is), and that if you score higher you have a surfeit of ‘better genes’ than one who scores lower. Genes don’t work that way, yet that is what psychologists want you to believe with their forced normal distribution through item selection.
The validity of IQ does not rest upon its genetic correlations. If that’s your only argument then I’m not sure what you want from me.
It rests on there being no agreed-upon model for its validity. If there were no validity to, say, scale weight, would it be a useful measure? Breathalyzers? White blood cell count? Its validity (or lack thereof) is important to discuss.
The physiology argument will have to wait, RR. Plus, aren’t you writing something about that?
IQ tests are not perfect at measuring intelligence, but they do a good enough job. High scores correlate with success in life, both long-term success such as career and income and the acquisition of abstract, cognitively demanding skills such as coding, math, and writing. Very seldom will someone online post about a high SAT and/or IQ score and be as dumb as a rock; usually, such people are quite articulate and well-read. Sub-tests such as digit recall admit a normal distribution without any need for forcing.
They don’t even do a ‘good enough job’ because there is a preconceived notion of who is or is not intelligent which is built into the test. The ‘high scores correlate with success in life’ because of how they’re constructed. They ‘correlate with academic achievement’ because they’re different versions of the same test. What people post online about their own test scores is irrelevant, but of course there is a relationship like that between IQ and achievement tests because they’re, again, different versions of the same test. Are people’s IQs just digit recall though? Digit recall tests working memory (whatever that is), so the fact that a subtest has an unforced normal distribution (barely) says nothing about the overall critique of the normal curve being forced through item selection.
“we can construct IQ tests that, say, show blacks scoring higher than whites ”
If you could create such a test while maintaining its usefulness, you would become very rich. Why don’t you construct such a test and make lots and lots of money?
What usefulness? The ‘usefulness’ is built into the test. I’m not trained in ‘item analysis’ (Jensen, 1980) so I cannot construct such a test. However, what I stated is the logical conclusion of test construction. The quotation from McNemar shows how arbitrary the process of ‘item analysis’ is and proves my point on even group ‘differences’.
That’s not to say that I don’t believe that races are ‘equal’ in their mental faculties, even ‘intelligence’ (whatever that is). However, IQ doesn’t test ability for complex cognition (Richardson and Norgate, 2014); the ‘usefulness’ of the test is built into it. It only has as much ‘predictive power’ as the test constructors allow.
Okay so, are you saying that all studies around IQ are false?
Such as the consumption of fish and IQ for example? Such as those about lead and IQ?
Because I think that your opinion isn’t conclusive enough compared to the empirical evidence around IQ (Flynn effect, regression to the mean, etc.)
Also, what’s the point of making a test where black women score higher than white men if they still have lower SES?
To finish, you seem to depend too much on Richardson.
Hey Steve, why don’t you come back when you actually have some kind of credentials in this sort of thing? Because I know you’re so qualified to talk about matters of biology and psychology with your Master’s Degree in… Finance and Marketing.
By the way, how’s that whole Trump thing going?
False? No. Does it test what psychometricians et al claim it does? Not by a long shot.
I touched briefly on breastfeeding and intelligence in my reply to Jared Taylor. The fact that it’s associated with IQ in RCTs means… What? That fatty acids are good for brain development? Who knew? In regard to lead and IQ, lead changes how the brain functions, which also can be passed down epigenetically, from mother to child, then child to the grandchild. The fact that lead depresses IQ is meaningless because it disrupts normal brain functioning.
The Flynn Effect can be explained by the rise in the middle class. Regression to the mean proves that IQ tests biological processes?
What’s the point of this comment?
Irrelevant.
rw95,
Appeals to authority aren’t cool.
RR,
The urge to stick it to Steve is extremely tempting. But your blog, your rules.
RaceRealist, your response about ‘regression to the mean’ is too vague.
And my criticism about Richardson is not incorrect. I’ve heard that his studies are based on small samples; check E. Kirkegaard’s article on you, which also talks about Richardson.
I’m not saying that you’re wrong, just that you lack solid evidence for your main arguments. Richardson seems to be a huge contrarian.
But I will concede the rest of your points, such as lead, SES, etc.
It’s only asking if that proves that there is a biological substrate to IQ.
In this instance it is because the subject of test construction and validity has not been addressed.
I’m aware of his replies to a few of my articles. I’ll respond to him in due time.
The evidence for test construction and validity is sound and doesn’t fully rely on Richardson (as if that matters).
It affects normal functioning and therefore is irrelevant to normal variation, which is what the discussion rests on.