Measurement invariance (MI) is a concept used in psychology, sociology, and education research which holds that the same property is being measured across different groups and/or contexts (eg Borsboom, 2006), while measurement noninvariance (MNI) means that “a construct has a different structure or meaning to different groups or on different measurement occasions in the same group, and so the construct cannot be meaningfully tested or construed across groups or across time” (Putnick and Bornstein, 2017). MI assumes that psychological constructs are measurable and that the same construct can be measured across different groups and contexts.
But MI doesn’t need to be a concrete physical reality to show that the construct that’s supposedly being measured is the same between cultures/contexts, so the story goes. Furthermore, establishing MI is important in showing that a construct is a valid one—that is, if a so-called measure is construct valid. But upon further examination of MI, the concept seems to be highly flawed. It assumes that psychological constructs are fixed, objective and can be measured with the same precision as physical objects. However, this assumption runs counter to the arguments I’ve been making for years about the immateriality of mind and psychological processes along with the limits of scientific inquiry.
In this article, I will provide arguments against the concept of MI and defend their premises, arguing that MI is an unrealistic, invalid concept due to the assumptions both implicit and explicit in the concept. According to its proponents, if a test is measurement invariant, then scores are comparable between groups and any differences between the groups can be said to be down to true differences in the hypothesized construct that is supposedly being measured. But if this construct is a psychological one, then there is no measurement occurring. It is also important to note that even if a test is so-called measurement invariant, that does not mean that the test is free from bias.
Arguing against measurement invariance
So MI is concerned with operationalizing and measuring certain indicators. However, since MI assumes that the underlying psychological construct that is purported to be measured exists, is objective, and is a stable entity that can be measured accurately, it can be seen as a form of reification, just like the concept of “general intelligence.” Although I reject MI due to its conceptual and theoretical assumptions as I will argue below, it is worth noting that Wicherts (2016; cf Shuttleworth-Edwards, 2016) showed that not all IQ tests are measurement invariant; that is, some are measurement noninvariant. Wicherts states that “psychological test scores need to be valid and reliable,” though I am not aware of any IQ test that is indeed construct valid (Richardson and Norgate, 2015).
Construct validity (CV) is also related to the overarching arguments I will mount. CV refers to whether or not a test or measure accurately assesses what it purports to assess. So when it comes to CV for “intelligence” and other psychological traits, how can CV be established if there is no physicality to the construct, that is, if these traits are immaterial? Further, the neurosciences have attempted to show CV for IQ, but this fails too.
I’m not really worried about whether or not IQ test results are measurement invariant. I am, however, concerned with the conceptual and theoretical underpinnings and assumptions of MI. It is these assumptions and theoretical underpinnings, I will argue, that invalidate the concept of MI. Long-time readers will know that I argue against the materiality (and so argue for the immateriality) of psychological traits. Since MI is concerned with operationalizing and measuring certain indicators that supposedly relate to the constructs in question, the arguments I am about to give and have given in the past completely invalidate MI. These arguments are conceptual, and so empirical evidence is irrelevant to them. MI assumes that psychological constructs are measurable, that is, that there is a physical basis for psychological constructs. Metaphysical/philosophical arguments, though, refute this claim.
Now I will give the arguments against MI.
P1: MI assumes that the same construct is being measured in the same way across different groups and contexts.
P2: The meaning of a construct may differ between different cultural and social groups.
P3: Differences in cultural and social meaning can affect the reliability and validity of the measurement instruments which are used to assess the construct.
C: Thus, it may be inappropriate to assume that measures of psychological constructs are invariant across different cultural and social groups.
Premise 1: P1 is definitional, and is the accepted definition of MI in the literature.
Premise 2: Cultural and social differences can influence the meaning of a construct. A construct like self-esteem may be interpreted differently in individualistic and collectivist cultures, where an individualistic culture would look at self-esteem as an individual attribute while a collectivist culture would look at it as a relational attribute (eg between Western and East Asian cultures) (Heine and Hamamura, 2007). Further, aggressiveness and assertiveness could also be valued differently in individualistic and collectivist cultures (Church, 2000).
Premise 3: So-called measurement instruments that are used to assess psychological constructs could be influenced by cultural and social differences. A psychological test developed and “validated” in one culture may not be “valid” in another due to differences in culture, language, or norms (van de Vijver and Leung, 1997). Furthermore, differences in item content or even response styles may also lead to measurement bias, which could then affect the reliability and validity of the so-called measurement instruments and, in turn, of the so-called measures they produce.
So the conclusion here logically follows from the premises: it could be inappropriate to assume that measures of psychological constructs are comparable between groups.
P1: If MI is a valid concept, then the same construct should be measured in the same way across groups/contexts.
P2: If cultural and social differences can affect the meaning and measurement of a construct, then the same construct cannot be measured in the same way across groups/contexts.
P3: Cultural or social differences can affect the meaning of a construct.
C: So MI isn’t a valid concept. (Modus ponens from P2 and P3, then modus tollens with P1)
If psychological constructs can have different meanings across cultural and social groups, and if these differences can affect the “measurement” of the so-called construct, then it is impossible to achieve MI as defined in P1. While it is important to note that MI would then need to be redefined or applied with caution and awareness of cultural and social factors relevant to the so-called measurement, it is also important to note that different cultures give different meanings to concepts. This argument is a sound one.
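The logical form of the argument above can be checked mechanically with a truth table. The following is a minimal sketch, not part of the original argument: the letters `v`, `s`, and `d` are my own hypothetical labels for the propositions, and I am assuming that the antecedent of P2 and the claim in P3 express the same proposition. It only checks validity (that the conclusion follows from the premises); whether the premises are true is what the premise-by-premise defense addresses.

```python
from itertools import product


def implies(a, b):
    """Material conditional: 'if a then b'."""
    return (not a) or b


def is_valid(premises, conclusion, n_vars):
    """True if no truth assignment makes all premises true and the conclusion false."""
    for values in product([True, False], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True


# Hypothetical propositional letters (labels are assumptions, not the article's):
# v = "MI is a valid concept"
# s = "the same construct is measured in the same way across groups/contexts"
# d = "cultural and social differences can affect the meaning of a construct"
premises = [
    lambda v, s, d: implies(v, s),      # P1
    lambda v, s, d: implies(d, not s),  # P2
    lambda v, s, d: d,                  # P3
]
conclusion = lambda v, s, d: not v      # C

argument_is_valid = is_valid(premises, conclusion, 3)
print(argument_is_valid)
```

Dropping P3 from the list makes the check fail, which shows that the empirical claim about cultural differences is doing real work in the argument.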
Premise 1: Again, this is definitional and a widely-accepted definition of MI.
Premise 2: Different cultures have different psychological and cultural tools, which then affect how those who find themselves in that culture think and behave (eg Markus and Kitayama, 1991; Nisbett, 2003; Heine, 2008). The implication here is obviously that psychological constructs have different meanings based on cultural, national, and social context. So the references provided show a valid concern which challenges a main assumption of MI.
Premise 3: Along with the references to Markus and Kitayama, Nisbett, and Heine, it has also been observed that cultural differences in beliefs and values could affect how people respond to personality tests, and that language can affect the interpretation of so-called psychological scales. So P3 is well-evidenced.
So the conclusion logically follows, and what I wrote in the paragraph directly below the argument still applies.
P1: Psychological constructs are fundamentally subjective and culturally situated.
P2: MI assumes the possibility of objective and universal measurement of psychological constructs.
C: Therefore, MI is completely invalid.
Premise 1: Psychological constructs are abstract, subjective constructs which reflect the subjective experiences, values, and beliefs of individuals and communities. They are, furthermore, also shaped by cultural and social contexts which could affect the meanings and associations of particular constructs (Markus and Kitayama, 1991; Nisbett, 2003; Heine, 2008). So this implies that psychological constructs cannot be measured in a consistent and objective way across cultural and social contexts, since they are inherently subjective and culturally situated.
Premise 2: This, again, is definitional. So based on the well-accepted definition of MI, psychological constructs are universal and can be measured in a consistent and objective way across cultures. But P1 directly contradicts this—psychological constructs are subjective and culturally situated.
Conclusion: So based on P1 and P2, MI is invalid. If psychological constructs are fundamentally subjective and culturally situated, and if MI assumes the possibility of objective and universal measurement of these constructs, then MI is fundamentally (conceptually and theoretically) flawed and so it cannot be applied in a meaningful way. So MI is an invalid concept.
The conceptual arguments I have given against MI challenge the assumptions and underlying theoretical framework of MI. So, the next and final argument I will provide will prove that the nature and immateriality of psychological traits means that MI is an invalid concept.
P1: If construct validity requires that the same construct is being measured in the same way across contexts and groups, and if psychological traits are immaterial and so immeasurable, then MI is an invalid concept.
P2: Psychological traits are immaterial and so immeasurable.
P3: Construct validity requires that the same construct is being measured in the same way across contexts and groups.
C: So MI is an unrealistic and invalid concept.
Premise 1: CV requires that the same construct is being measured across all groups and contexts (Kane, 2006). If we cannot ensure that we are measuring the same construct across all groups and contexts, then we cannot establish construct validity. So if psychological traits are immaterial and immeasurable, and we cannot ensure that we are measuring the same construct across different groups and contexts, then the concept of MI is unrealistic and therefore invalid.
Premise 2: This premise is of course widely-debated in philosophy of mind (Chalmers, 1996), and I have provided numerous arguments that establish the immateriality and immeasurability of psychological traits (here, here, here and here). So since psychological traits are immaterial and not directly observable or measurable, then they pose a huge and insurmountable obstacle for the psychometric claims that mentality can be measured.
Premise 3: This premise establishes that for the same construct to be measured across groups and contexts, then it needs to be established that the same construct is indeed being measured. However, see the discussion of the conclusion below.
Conclusion: Since psychological traits are immaterial and so immeasurable, and construct validity requires that the same construct is being measured across all groups and contexts, then MI isn’t a valid concept.
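The structure of this final argument can also be sketched as a truth-table check. This is an illustration under my own assumptions: the letters `r`, `i`, and `x` are hypothetical labels for the three propositions, and the check concerns only the form of the argument, not the truth of its premises. It also shows which premise the conclusion hinges on.

```python
from itertools import product


def implies(a, b):
    """Material conditional: 'if a then b'."""
    return (not a) or b


def entails(premises, conclusion):
    """True if every assignment satisfying all premises also satisfies the conclusion."""
    for r, i, x in product([True, False], repeat=3):
        if all(p(r, i, x) for p in premises) and not conclusion(r, i, x):
            return False
    return True


# Hypothetical propositional letters (labels are assumptions, not the article's):
# r = "construct validity requires the same construct to be measured
#      in the same way across contexts and groups"
# i = "psychological traits are immaterial and so immeasurable"
# x = "MI is an invalid concept"
p1 = lambda r, i, x: implies(r and i, x)  # P1
p2 = lambda r, i, x: i                    # P2
p3 = lambda r, i, x: r                    # P3
c = lambda r, i, x: x                     # C

full_argument = entails([p1, p2, p3], c)
without_p2 = entails([p1, p3], c)
print(full_argument, without_p2)
```

The conclusion follows from all three premises together but not once P2 is removed, which is why the immateriality premise carries the most weight here.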
I have discussed what MI is and what is required for MI in the literature. I then gave four arguments which conclude that MI isn’t a valid concept. This is because different cultures have different psychological and cultural tools, and so different concepts would have different meaning in different cultures and nations, which does indeed have empirical support.
Most importantly, the immateriality of psychological traits means that they are not directly observable or measurable, and this is yet another reason why the concept of MI is invalid. For X to be measured, X needs a specified measured object, object of measurement and measurement unit (Berka, 1983; Nash, 1990). I have termed this “the Berka/Nash measurement objection.” For if those three conditions do not hold, then one cannot logically state that they are measuring anything. Furthermore, standardized tests exist to assess social function (Garrison, 2009: 5). Psychometricians also assume that what they are measuring is quantitative, without showing that it is. The claim that “something” unknown is being measured simply because tests and questionnaires yield numerical values is patently ridiculous. Psychometricians design the test first and then attempt to ascertain what it measures (Nash, 1990).
Now, while I think I have refuted the concept of MI, of course it will continue to be used in psychometric research. However, just because a thing is continuously used, and just because a whole field carries on even when there are lethal conceptual objections against it, doesn’t mean that it is in the clear. The fact of the matter is, psychometrics isn’t measurement (Uher, 2021), and no experiment can establish that it is; psychometricians need to grapple with the conceptual arguments first. MI just isn’t a valid concept due to the arguments and reasons I have given.