There are a lot of conceptual problems with IQ tests that I never see talked about. The main ones are how the tests are constructed (to fit a normal curve, no less); the fact that there is no construct validity to the tests (IQ tests aren’t calibrated against a biological model the way breathalyzers are calibrated against a model of alcohol in the bloodstream); and how the Raven’s Progressive Matrices test is actually biased despite being touted as the most culture-free test, since all you’re supposedly doing is rotating abstract symbols to see what comes next in the sequence. These three problems have important implications for the ‘power’ of IQ tests, the most important being test construction and validity.
I) IQ test construction
IQ tests are constructed with the assumption that we know what IQ tests test (we don’t) and with the prior ‘knowledge’ of who is or is not intelligent. Test constructors construct the tests to reveal presumed differences between individuals.
It is assumed that IQ scores lie on a normal distribution (they don’t), even though few natural biological functions conform to this curve. Another problem with IQ test construction is the assumption that IQ increases with age and levels off after puberty. This, like the other things, has been built into the test by choosing items that an increasing proportion of children pass as they get older. You can, of course, reverse this effect by choosing items that younger people do well on and older people don’t.
Further, they keep a large proportion of the items that about 50 percent of children get right while keeping a smaller proportion of the items that most children get right or most get wrong, which, in effect, presupposes who is or is not intelligent.
Though, you never see those who believe that IQ is a ‘good enough’ proxy for intelligence ever bring this up. Why? This is very important for the validity of these tests. If the way the tests are constructed is wrong, and test scores are made to fit a normal distribution when no normal distribution actually exists for most human mental (including IQ scores) and physiological traits, then the assumptions and conclusions drawn from them are wrong. IQ tests are constructed with a prior idea of who is or is not ‘intelligent’, and this is done through how the items are chosen: more of the items that about 50 percent of people get right are kept, while fewer of the items that most people get right or most get wrong are kept. This is how the so-called ‘normal curve’ appears in IQ tests and is why the book The Bell Curve has the name it has. But bell curves don’t exist for a multitude of traits, including IQ!
Simon (1997: 204) writes (emphasis mine):
There is another, and completely irrefutable, reason why the bell-shaped curve proves nothing at all in the context of H-M’s book: The makers of IQ tests consciously force the test into such a form that it produces this curve, for ease of statistical analysis. The first versions of such tests invariably produce odd-shaped distributions. The test-makers then subtract and add questions to find those that discriminate well between more-successful and less-successful test-takers. For this reason alone the bell-shaped IQ curve must be considered an artifact rather than a fact, and therefore tells us nothing about human nature or human society.
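To make the item-selection point concrete, here is a minimal sketch in Python. It is a toy model, not how any actual IQ test is built: I am assuming test-takers whose underlying ‘ability’ is spread uniformly (so not bell-shaped), and a simple logistic rule for the chance of passing an item of a given difficulty. The names score_distribution and skewness and all the numbers are made up for illustration. The only thing that changes between the three runs is which items are kept.

```python
import math

def score_distribution(abilities, difficulties):
    """Exact distribution of total scores, averaged over all test-takers.

    Assumes (hypothetically) that a person with ability `theta` passes an
    item of difficulty `b` with probability 1 / (1 + exp(-(theta - b))).
    """
    total = [0.0] * (len(difficulties) + 1)
    for theta in abilities:
        dist = [1.0]  # before any item, the score is 0 with probability 1
        for b in difficulties:
            p = 1.0 / (1.0 + math.exp(-(theta - b)))
            new = [0.0] * (len(dist) + 1)
            for score, prob in enumerate(dist):
                new[score] += prob * (1.0 - p)   # item failed
                new[score + 1] += prob * p       # item passed
            dist = new
        for score, prob in enumerate(dist):
            total[score] += prob / len(abilities)
    return total

def skewness(dist):
    """Standardized third moment of a score distribution (0 = symmetric)."""
    mean = sum(s * q for s, q in enumerate(dist))
    var = sum((s - mean) ** 2 * q for s, q in enumerate(dist))
    third = sum((s - mean) ** 3 * q for s, q in enumerate(dist))
    return third / var ** 1.5

# Hypothetical population: abilities spread evenly from -2 to 2.
abilities = [(i - 20) / 10 for i in range(41)]

# Three hypothetical 20-item tests that differ only in which items are kept.
skew_easy = skewness(score_distribution(abilities, [-2.0] * 20))   # most pass
skew_middle = skewness(score_distribution(abilities, [0.0] * 20))  # ~50% pass
skew_hard = skewness(score_distribution(abilities, [2.0] * 20))    # most fail

print("easy items:  ", round(skew_easy, 3))
print("middle items:", round(skew_middle, 3))
print("hard items:  ", round(skew_hard, 3))
```

With the same people in all three runs, the easy-item test gives a lopsided curve with a tail toward low scores, the hard-item test gives the mirror image, and the test built from items that about half of people pass gives a symmetric curve. The shape of the score distribution in this sketch is a product of item choice, which is the point Simon is making.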
The analysis and selection of the items that go on the tests are biased, since there is no cognitive theory on which the analysis and selection are based. Carpenter, Just and Shell (1990: 408) note that John Raven, the creator of the Raven’s Progressive Matrices, even discussed this in his personal notes. They write: “He used his intuition and clinical experience to rank order the difficulty of the six problem types. Many years later, normative data from Forbes (1964), shown in Figure 3, became the basis for selecting problems for retention in newer versions of the test and for arranging the problems in order of increasing difficulty, without regard to any underlying processing theory.”
II) IQ test validity
Another problem with IQ tests is their validity. People attempt to ‘prove’ their validity by correlating job performance with IQ scores, though there are huge flaws in the studies purporting to show a .5 correlation between IQ and job performance (Richardson, 2002; Richardson and Norgate, 2015). IQ tests are not like, say, breathalyzers (which are calibrated against a model of blood alcohol) or white blood cell count (which is a proxy for disease in the body). Those two measures have a solid theoretical basis and underpinning: a higher blood alcohol reading means the individual consumed more alcohol. The same is true for white blood cell count. The same is not true for IQ tests.
One of the most common measures used to validate IQ tests against job performance is supervisor rating. However, supervisory ratings are hugely subjective, and a lot of the factors that lead a supervisor to rate someone a ‘good worker’ have little to do with the job itself.
The only ‘validity’ that IQ tests have is their correlation with other IQ tests and tests like the SAT. This is not validity. Say the breathalyzer wasn’t calibrated against a model of blood alcohol in the body: would breathalyzers still be a valid tool to test people’s blood alcohol level? On that same note, let’s say that white blood cell count wasn’t construct valid. Would we be able to reliably use it as a valid measure for disease in the body? These very same problems plague IQ tests, yet people accept them as ‘proxies’ for intelligence. Supposedly they test ‘enough of intelligence’ to be able to say that one person is smarter than another because they scored higher on a test, and therefore tap into this mystical ‘g’ that the higher scorer has more of, which is treated like a ‘power’ or ‘energy’.
These tests, therefore, are constructed with the idea of who is or is not intelligent and you can see that by looking at how the items are chosen for the test. That’s not scientific. So a true test of ‘intelligence’ may not even exist since these tests have this type of construct bias already in them.
IQ tests have no validity of the kind that breathalyzers and white blood cell counts have, and the so-called ‘culture-free’ IQ test, Raven’s Progressive Matrices, is anything but.
III) Raven’s and culture bias
I specifically asked Dr. James Thompson about Raven’s being culture-fair. I said that I recall Linda Gottfredson saying that people call Raven’s culture-fair only because Jensen said it:
So that’s one thing about Raven’s that crumbles. A quote from Ken Richardson’s book Genes, Brains, and Human Potential: The Science and Ideology of Intelligence:
It is well known that families and subcultures vary in their exposure to, and usage of, the tools of literacy, numeracy, and associated ways of thinking. Children will vary in these because of accidents of background. …that background experience with specific cultural tools like literacy and numeracy is reflected in changes in brain networks. This explains the importance of social class context to cognitive demands, but it says nothing about individual potential.
(This argument on social class is much more complex than ‘poor people are genetically predisposed to be dumb and poor’.
Consider a recent GCTA study by Plomin et al., who reported a SNP-based heritability estimate of 35% for “general cognitive ability” among UK 12 year olds (as compared to a twin heritability estimate of 46%). According to the Wellcome Trust “genetic map of Britain,” striking patterns of genetic clustering (i.e. population stratification) exist within different geographic regions of the UK, including distinct genetic clusterings comprised of the residents of the South, South-East and Midlands of England; Cumbria, Northumberland and the Scottish borders; Lancashire and Yorkshire; Cornwall; Devon; South Wales; the Welsh borders; Anglesey in North Wales; Scotland and Ireland; and the Orkney Islands. Now consider the title of a study from the University and College Union: “Location, Location, Location – the widening education gap in Britain and how where you live determines your chances”. This state of affairs (not at all unique to the UK), combined with widespread geographic population stratification, is fertile ground for spurious heritability estimates.
I think this argument is interesting, and it throws a wrench into a lot of things, but more on that another day.)
In other words, items like those in the Raven contain hidden structure which makes them more, not less, culturally steeped than any other kind of intelligence testing items, like the Raven, as somehow not knowledge-based, when all are clearly learning dependent. Ironically, such cultural-dependency testing is sometimes tacitly admitted by test users. For example, when testing children in Kuwait on the Raven in 2006, Ahmed Abdel-Khalek and John Raven transposed the items “to read from left to right following the custom of Arabic writings.” (Richardson, 2017: 99)
Finally, we have this dissertation which shows that urban peoples score better than hunter-gatherers (relevant to this present article):
Reading was the greatest predictor of performance Raven’s, despite controlling for age and sex. Attendance was also strongly correlated with Raven’s performance. These findings suggest that reading, or pattern recognition, could be fundamentally affecting the way an individual problem solves or learns to learn, and is somehow tapping into ‘g’. Presumably the only way to learn to read is through schooling. It is, therefore, essential that children are exposed to formal education, have the motivation to go/stay in school, and are exposed to consistent, quality training in order to develop the skills associated with improved performance. (pg. 83)
This is telling: it means that there is no such thing as a ‘culture-free’ IQ test, and there will always be something involved that makes it culturally unfair.
People may say ‘It’s only rotating pictures and shapes to get the final answer, how much schooling could you need?’ Well, as seen in the dissertation quoted above, schooling is very important to IQ tests, since they test learned skills. I’ve seen some people claim that IQ tests don’t test learned ability and that it’s all native, unlearned ability. That claim is simply incorrect.
So although the symbols in a test like the RPM are experience-free, the rules governing their changes across the matrix are certainly not, and they are more likely to be already represented in the minds of children from middle-class homes, less so in others. Performance on the Raven’s test, in other words, is a question not of inducing ‘rules’ from meaningless symbols, in a totally abstract fashion, but of recruiting ones that are already rooted in the activities of some cultures rather than others. Like so many problems in life, including fields as diverse as chess, science and mathematics (e.g. Chi & Glaser, 1985), each item on the Raven’s test is a recognition problem (matching the covariation structure in a stimulus array to ones in background knowledge) before it is a reasoning problem. The latter is rendered easy when the former has been achieved. Similar arguments can be made about other so-called ‘culture-free’ items like analogies and classifications (Richardson & Webster, 1996). (Richardson, 2002: pg 292-292)
Everyday life is also more complex than the hardest items on Raven’s Matrices; the test is not complex in its demands compared to tasks undertaken in everyday life (Carpenter, Just, and Shell, 1990). They conclude that differences in performance come down to differences in working memory, but that is an ill-defined concept in psychology. They do say, though, that “The processes that distinguish among individuals are primarily the ability to induce abstract relations and the ability to dynamically manage a large set of problem-solving goals in working memory.” So item complexity alone can’t explain why some people find Raven’s items more difficult than others do, since everyday life is more complex.
I’ll end with a bit of physiology. What physiological process does IQ mimic in the body? If it is a physiological process, surely you’re aware that physiological processes *are not* static. IQ is said to be stable in adulthood; what a strange physiological process. Let’s say for argument’s sake that IQ really does test some intrinsic, biological process. Does it seem weird to you that a supposed real, stable, biological, bodily function of an individual would be different at different times?
There are a lot of assumptions about IQ tests that are never talked about. The most important is that the tests are constructed to fit a normal curve when most traits important for survival aren’t normally distributed. IQ tests are constructed with an assumption of who is or isn’t intelligent, which you can see just from how the items are selected for the test. When you look at how the tests are constructed, you can see that they are built to fit the normal curve, because most of their assumptions and conclusions rest on the reality of the normal curve. There is no construct validity to IQ tests; they’re not like breathalyzers, which are calibrated against a model of blood alcohol, or white blood cell count, which is a proxy for disease in the body. Raven’s, despite what is commonly stated about the test, is not unbiased; it is perhaps the most biased IQ test of them all. This highlights the problems with IQ tests that are rarely ever spoken about, and should have you call into question the ‘power’ of the IQ test, which assumes who is or isn’t intelligent ahead of time.