Brazilian WHOQOL-OLD Module version: a Rasch analysis of a new instrument


Versão em português do Módulo WHOQOL-OLD: análise de Rasch de um novo instrumento



Eduardo Chachamovich(I); Marcelo P Fleck(I); Clarissa Trentini(I); Mick Power(II)

(I) Programa de Pós-Graduação em Psiquiatria. Universidade Federal do Rio Grande do Sul. Porto Alegre, RS, Brazil
(II) Clinical and Health Psychology. University of Edinburgh Medical School. Edinburgh, UK





OBJECTIVE: To evaluate the Brazilian version of WHOQOL-OLD Module and to test potential changes to the instrument to increase its psychometric adequacy.
METHODS: A total of 424 older adults living in a city in Southern Brazil completed the WHOQOL-OLD instrument in 2005. Rasch analysis, as implemented in the RUMM2020 software, was used to explore the psychometric performance of the scale. Item-trait interaction, threshold disorders, presence of differential item functioning and item fit were analyzed.
RESULTS: Two ("death and dying" and "sensory abilities") out of six domains showed inadequate item-trait interactions. Rescoring the response scale and deleting the most misperforming items led to scale improvement. The evaluation of domains and items individually showed that the "intimacy" domain does perform well in contrast to the findings using the classical approach. In addition, the "sensory abilities" domain does not derive an interval measure in its current format.
CONCLUSIONS: Unidimensionality and local independence were seen in all domains. Changes in the response scale and deletion of problematic items improved the scale's performance.

Descriptors: Health of the Elderly. Questionnaires. Quality of Life. Psychometrics. Validity of Tests. WHOQOL-OLD. Rasch.


OBJETIVO: Analisar a versão brasileira do Módulo WHOQOL-OLD, indicando alterações potenciais do instrumento para aumentar a adequação psicométrica.
MÉTODOS: O total de 424 idosos residentes em Porto Alegre, RS, responderam o instrumento WHOQOL-OLD em 2005. O modelo de Rasch foi utilizado para a análise do desempenho psicométrico da escala, a partir do software RUMM2020. Foram analisadas a interação item-traço, a presença de funcionamento diferencial dos itens e a adequação dos itens ao modelo de Rasch.
RESULTADOS: Dois domínios ("morte e morrer" e "funcionamento do sensório") apresentaram interação item-total insuficiente. Remodelar a escala de resposta e excluir itens com pior performance resultou em melhora da escala. A análise dos domínios e itens individualmente foi capaz de indicar que o domínio "intimidade" teve boa performance, ao contrário dos resultados gerados pela abordagem psicométrica clássica. O domínio "funcionamento dos sentidos" não fornece uma medida intervalar em seu formato atual.
CONCLUSÕES: Todos os domínios apresentaram unidimensionalidade e independência local. As alterações na escala de resposta e a exclusão de itens problemáticos determinaram melhora da performance da escala.

Descritores: Saúde do Idoso. Questionários. Qualidade de Vida. Psicometria. Validade dos Testes. WHOQOL-OLD. Rasch.




The world has been experiencing a profound and irreversible demographic shift as older people are living longer and healthier lives than ever before.24 The most dramatic increases in the proportion of older people are seen in the most advanced age group (people over 80 years old), with an almost fivefold increase projected from 69 million in 2000 to 377 million in 2050.24 The World Health Organization (WHO) has described this demographic shift as both a major societal achievement and a challenge.25 Increased longevity has been experienced in the developed and the developing world alike, but whereas developed countries grew rich before they grew old, developing countries are growing old before they have grown rich.25

This shift in the age pyramid due to increased elderly population demands further research specifically approaching the aging process. One important area to be assessed is quality of life. Although there are several studies on this issue, systematic reviews have pointed out that the instruments most frequently used in these investigations are not sufficiently comprehensive and/or are not validated for application in older adult populations.4,11

The WHO Quality of Life Group has recently developed the WHOQOL-OLD Module.16 Through a simultaneous transcultural methodology, this instrument is designed to be suitable for cross-cultural comparisons. In addition, it was developed to specifically assess quality of life of the elderly, thus ensuring that important areas concerning old age are covered by the instrument. Its comprehensiveness is sustained by an initial intense qualitative phase.7,10 The WHOQOL-OLD module represents an additional tool, alongside the WHOQOL-100 or WHOQOL-BREF, and it is a useful alternative in the investigation of quality of life in older adults, including relevant aspects not covered by instruments originally designed for non-elderly populations.

The validation of the Brazilian version of the WHOQOL-OLD Module is reported in detail elsewhere.8 Briefly, it involved a classic psychometric approach to analyze internal consistency, discriminant validity, criterion validity, concurrent validity and test-retest reliability. The findings indicated suitable psychometric properties for this version.

Rasch measurement theory is a modern psychometric approach to the development and validation of instruments. It has emerged as a powerful tool for examining instrument performance in depth, allowing both the instrument as a whole and individual items to be assessed. In addition, the Rasch model is helpful for providing potential solutions for misperforming instruments. It is suggested that combining traditional and modern psychometric approaches is a valuable strategy to enhance the power of validation processes.20 Furthermore, the use of the Rasch measurement model for the development and application of quality of life instruments has been increasingly stressed.16,19,21

The present study aimed at evaluating the Brazilian version of the WHOQOL-OLD Module using a modern psychometric approach and testing potential changes to the instrument in order to increase its psychometric adequacy.



The data collected for the original classic validation8 was also analyzed in this study. A minimum sample of 300 subjects stratified by gender (50% women and 50% men), age (60–69 years, 70–79 years and over 80) and self-perceived health status (50% considering themselves healthy and 50% unhealthy) was selected at a university hospital, nursing homes, and in the community according to the WHOQOL-OLD project. Convenience sampling was used. The stratification process provided minimum subsamples that allowed for the assessment of the instrument under different conditions.

Inclusion criteria were age 60 or above and clinical ability to understand and respond to the instruments administered. Subjects were required to answer the question "In general, do you consider yourself healthy or unhealthy?" and were later stratified as healthy or unhealthy exclusively according to their subjective self-perception, regardless of their actual objective health status. This methodology is based on the theoretical background for quality of life instruments developed by the WHO, where the quality of life construct is seen as multidimensional and essentially subjective.23

Subjects completed a sociodemographic information form, the WHOQOL-OLD Module and the Geriatric Depression Scale 15-item version.18 The WHOQOL-BREF instrument was also part of the assessment, and its psychometric performance is reported elsewhere.4

The sociodemographic information form included questions about gender, age, educational level, marital status, subjective self-perception of health status, and consumption of alcohol, tobacco and illegal substances. The data obtained from this questionnaire was utilized for demographic description, as well as for differential item functioning (DIF) analysis.

The WHOQOL-OLD is a 24-item self-report instrument. It is divided into six domains (sensory abilities, autonomy, past-present-future activities, social participation, death and dying and intimacy). Each domain provides an individual score. In addition, an overall score is calculated from the set of 24 items. Answers are based on a 5-point Likert response scale.16 It is validated in Brazilian Portuguese, and this version presents good classic psychometric performance.8

Data was examined with the Rasch model using the RUMM2020 software.3 Linacre states that the ideal sample size varies according to the scale targeting. For a well-targeted scale (40–60% endorsement rates on dichotomous items), a sample size of 108 would give 99% confidence that person estimates are within ±0.5 logits. For poorly targeted scales, though, the minimum sample size for satisfactory estimates would be 243 subjects.13

The Rasch model is understood as a template which puts into operation the axioms for additive conjoint measurement.14 This theory presents a set of methods to determine whether a variable has an additive structure and, then, is amenable to be measured on an interval scale.17 Originally developed to be applied in dichotomous scales, the Rasch model is also applicable to polytomous data.1

Basically, the Rasch model assumes that the probability of a given subject endorsing an item is a function of the relative distance between the item location and the person location on a common linear scale.15 In the case of a scale measuring depression, for example, the probability that a person endorses an item is a logistic function of the difference between the subject's ability (level of depression) and the level of depression expressed by the item. The following equation illustrates this statement:

$$\ln\left(\frac{P_{ni}}{1 - P_{ni}}\right) = \theta_n - b_i$$

where ln is the natural logarithm, $P_{ni}$ is the probability of person n endorsing item i, $\theta_n$ is the person's level of ability and $b_i$ is the level of ability expressed by the item. If the data fits the Rasch model, then both the person's ability and the item difficulty are placed on a common metric scale (log-odds units, or logits), which allows a linear transformation of the raw scale. Thus, when the data fits the model, and the assumptions of local independence are met, the scale is then suitable for valid parametric approaches.14 Since Rasch analysis is strongly dependent on unidimensionality, each of the six WHOQOL-OLD domains was tested individually as a separate scale.15
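The dichotomous form of the model described above, in which the log-odds of endorsement equal the person location minus the item location, can be sketched in a few lines of Python. This is only an illustration: the function names and the example locations are assumptions made here, not part of the study or of RUMM2020.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability that a person located at theta endorses an item located at b.

    Both locations are expressed in logits on the same scale.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Hypothetical locations, chosen only for illustration.
p_matched = rasch_probability(theta=0.5, b=0.5)  # person and item at the same location
p_above = rasch_probability(theta=1.5, b=0.5)    # person one logit above the item
```

When person and item share the same location the probability is 0.5, and the log-odds recovered from `p_above` equal the one-logit difference between the two locations.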

Apart from unidimensionality, local independence is also a Rasch assumption. Items are required not to depend on each other, so that the probability of endorsing one item is not associated with the response to any other item in the scale. Local independence should be examined for each scale to be analyzed by the Rasch model.

If Rasch assumptions are satisfied, and the scale fits the expected model, then it is also guaranteed that the performance of the instrument is stable and not dependent on the sample being assessed, or on certain characteristics such as gender or age, which is called specific objectivity.21

First, overall fit statistics were examined. Item-trait interaction was analyzed using the chi-square test; a non-significant p-value indicates that the invariance property holds (i.e., the observed data are similar to those expected under the model). The standardized distributions of items and persons were examined by way of a diagram.

Furthermore, individual item statistics were analyzed for residuals and chi-square statistics. Again, if a given item fits the model, low residuals (within ±2.5) and non-significant chi-square statistics are expected. Bonferroni correction was applied to control for multiple testing. Threshold disorders were also examined using threshold maps and category probability curves for each individual item.
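The item-level decision rule described above (a residual within ±2.5 and a chi-square p-value that is not significant after Bonferroni correction) can be sketched as follows. The overall significance level of 0.05 is an assumption, since it is not stated here, and the function name and example values are hypothetical.

```python
def item_fits(residual: float, p_value: float, n_items: int,
              alpha: float = 0.05, residual_limit: float = 2.5) -> bool:
    """Return True when an item meets both fit criteria.

    alpha=0.05 is an assumed overall significance level; the Bonferroni
    correction divides it by the number of items tested together.
    """
    bonferroni_alpha = alpha / n_items
    residual_ok = abs(residual) <= residual_limit
    chi_square_ok = p_value >= bonferroni_alpha  # non-significant means no evidence of misfit
    return residual_ok and chi_square_ok


# Hypothetical items from a four-item domain (corrected alpha = 0.0125).
fits = item_fits(residual=1.2, p_value=0.03, n_items=4)
misfits_on_residual = item_fits(residual=3.1, p_value=0.50, n_items=4)
misfits_on_chi_square = item_fits(residual=0.4, p_value=0.001, n_items=4)
```

Note that a p-value of 0.03 would count as misfit at an uncorrected 0.05 level but not after the correction for four items.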

An estimate of internal consistency was also obtained through the person separation index (PSI), which is comparable to Cronbach's alpha coefficient. Items were examined for DIF. The presence of DIF indicates that a subgroup (e.g., males or younger subjects) responds to an item in a consistently different way from other subgroups, despite having the same amount of the latent trait. Both uniform DIF (when the difference is constant through the whole range of the item curve) and non-uniform DIF (when the difference occurs only at a certain level of the attribute) were checked.
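As a rough illustration of the person separation index mentioned above, one common formulation expresses it as the share of the variance in person estimates that is not attributable to measurement error. The sketch below assumes that formulation; the exact computation in RUMM2020 may differ in detail, and the person estimates and standard errors are invented for the example.

```python
from statistics import mean, variance


def person_separation_index(estimates: list[float],
                            standard_errors: list[float]) -> float:
    """(observed variance - mean error variance) / observed variance.

    Assumed formulation, analogous in interpretation to Cronbach's alpha.
    """
    observed_variance = variance(estimates)                   # sample variance of person locations
    error_variance = mean(se ** 2 for se in standard_errors)  # mean squared standard error
    return (observed_variance - error_variance) / observed_variance


# Hypothetical person locations (logits) and their standard errors.
psi = person_separation_index([-1.0, 0.0, 1.0, 2.0], [0.5, 0.5, 0.5, 0.5])
```

Values closer to 1 indicate that the scale separates persons along the trait with little measurement error.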

Finally, modifications were tested when fit statistics indicated misfit. Item rescoring and deletion were carried out in order to achieve the best item structure possible.

All respondents were informed about the objectives of the study and confidentiality of the data obtained. Subjects signed an informed consent approved by the Research Ethics Committee of the university hospital where the study was carried out.



The sample comprised 424 subjects and its characteristics are described in Table 1. The Geriatric Depression Scale mean and standard deviation (SD) indicate that the sample is predominantly non-depressed. In addition, around two thirds of the subjects perceived themselves as healthy, regardless of their objective health condition. Subjective self-perception is known to be related to depression levels. Thus, the high rate of "healthy" subjects may be an indirect effect of the low depression levels in the sample.



As for the Rasch analysis results, the check for missing values showed that only items 1 and 3 had any missing responses, at extremely low rates (between 0.2% and 0.4%). The distributions of responses across the five points did not show major problems. These findings corroborate the high item response rate of the WHOQOL-OLD in a Brazilian sample. It is likely that the close assistance research staff offered to subjects during data collection contributed to the unexpectedly low number of missing values. Table 2 shows item contents, missing values, medians and distributions.

The item-trait interaction was analyzed for the six domains individually through chi-square statistics. This test checks whether the observed model (i.e., the data collected) fits the expected model (based on a probabilistic adaptation of the Guttman scale).2 Thus, as Kline states, it is primarily a test of "badness-of-fit," since statistically significant results (p-values below the critical value, after Bonferroni correction) indicate that the observed model differs from the expected one.12 The "death and dying" domain had an inadequate result (domain χ² = 51.72, p = 0.00012). The "sensory abilities" domain also showed high chi-square results (domain χ² = 101.10, p < 0.0001).

Local dependence was examined for the six domains and the 24-item set. A correlation of residuals for all items was carried out. Coefficients equal to or higher than 0.3 were considered indicators of local dependence. No dependence was found for any domain or the overall scale.
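The residual-correlation check described above can be sketched as follows: compute the correlation between the residuals of every pair of items and flag pairs at or above the 0.3 cutoff. The use of Pearson correlation, the helper names and the residual values are assumptions for illustration only, not the study's data.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long lists of residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def locally_dependent_pairs(residuals: dict[str, list[float]],
                            cutoff: float = 0.3) -> list[tuple[str, str]]:
    """Item pairs whose residual correlation is equal to or higher than the cutoff."""
    flagged = []
    for a, b in combinations(sorted(residuals), 2):
        if pearson(residuals[a], residuals[b]) >= cutoff:
            flagged.append((a, b))
    return flagged


# Invented standardized residuals for three items and four persons.
example = {
    "item_a": [1.0, -1.0, 0.5, -0.5],
    "item_b": [0.9, -1.1, 0.6, -0.4],
    "item_c": [-0.5, -0.5, 1.0, 1.0],
}
flagged = locally_dependent_pairs(example)
```

In this invented example only the first two items would be flagged, since their residuals move together after the trait has been accounted for.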

Items 4, 5, 9 and 20 showed reversed thresholds. A threshold is the point on the trait at which a subject is equally likely (probability of 0.50) to respond in either of two adjacent response categories. Threshold disorders thus suggest that the response scale does not discriminate efficiently between two ability levels, so that a subject with more ability could respond in the same category as another with lower ability. In other words, the response scale would not be working adequately to order subjects with distinct levels of ability. These items were examined and rescored according to the point of the disorder in the response scale. For items 4, 5 and 9, response categories two and three were merged into one. For item 20, categories three and four were collapsed (category values refer to the original instrument).
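The rescoring described above can be applied at the analysis stage with a simple recoding of the original 1-to-5 responses. The sketch below assumes that, after merging, the remaining categories are renumbered consecutively; the function and variable names are hypothetical.

```python
# Lower category of each merged pair, on the original 1-5 response scale:
# items 4, 5 and 9 merge categories 2 and 3; item 20 merges categories 3 and 4.
MERGE_POINT = {4: 2, 5: 2, 9: 2, 20: 3}


def collapse_adjacent(response: int, lower: int) -> int:
    """Merge category `lower` with `lower + 1` and renumber consecutively."""
    return response if response <= lower else response - 1


def rescore(item: int, response: int) -> int:
    """Return the rescored response; items without a merge point are unchanged."""
    lower = MERGE_POINT.get(item)
    return response if lower is None else collapse_adjacent(response, lower)


item_5_scale = [rescore(5, r) for r in range(1, 6)]    # [1, 2, 2, 3, 4]
item_20_scale = [rescore(20, r) for r in range(1, 6)]  # [1, 2, 3, 3, 4]
```

Because the recoding is a pure function of the recorded response, it leaves the data collection format of the instrument untouched.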

Figure 1 illustrates the category probability curves of item 5 in its original 5-point form and after rescoring. One can see that the original form presents reversed thresholds (i.e., category number 2 is not the most probable response at any point of the continuum). After rescoring, categories are well distributed. The RUMM2020 software3 automatically renames the categories in order to assign the value 0 to the first category. In the instrument, however, the categories range from 1 to 5.
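To see why a reversed threshold leaves a category that is never the most likely response, the category probability curves can be computed directly from the polytomous Rasch model, in which the probability of category k is proportional to the exponential of the sum of (person location minus threshold) over the first k thresholds. The threshold values below are invented for illustration and are not the estimates for item 5; categories are numbered from 0, as in the software output.

```python
import math


def category_probabilities(theta: float, thresholds: list[float]) -> list[float]:
    """Probabilities of categories 0..m for one item at person location theta."""
    cumulative = [0.0]  # category 0 corresponds to an empty sum
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - tau))
    numerators = [math.exp(c) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]


def modal_categories(thresholds: list[float], lo: float = -4.0,
                     hi: float = 4.0, steps: int = 81) -> set[int]:
    """Categories that are the most probable response somewhere on a grid of theta."""
    modal = set()
    for s in range(steps):
        theta = lo + (hi - lo) * s / (steps - 1)
        probs = category_probabilities(theta, thresholds)
        modal.add(probs.index(max(probs)))
    return modal


ordered = modal_categories([-1.0, 0.0, 1.0])     # thresholds in increasing order
disordered = modal_categories([0.5, -0.5, 1.5])  # first two thresholds reversed
```

With ordered thresholds every category is the most probable response over some interval of the trait, whereas with the reversed pair category 1 never is, which is the pattern that motivates merging adjacent categories.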

The distributions of persons and item thresholds are illustrated in Figure 2. Persons' locations are placed on the top half of the chart. The mean person location was 0.719 logits (SD=0.744). This is slightly above the mean item location (which is set at zero logits). The threshold distribution is located on the bottom half of the chart. The scale's peak of information (if taken as a 24-item set) is located between 0 and -1 logits. However, thresholds adequately cover the whole range of ability, which ensures that the scale is able to provide information for all levels.

DIF was assessed by gender (male and female) and age (60 to 79 years and 80 or older). Item bias indicates that item performance is not homogeneous and, thus, has distinct performance on different subjects when controlling for the level of underlying construct measured by the test.6 As a result, scores obtained from an item with DIF are not comparable across populations. Items were analyzed for uniform and non-uniform DIF. Briefly, the former is related to a constant difference of functioning through the entire spectrum of the construct, while the latter indicates that the DIF is seen only in a certain part of the curve.5 Uniform DIF items can be either excluded from the scale or, alternatively, be used to create two different scales (and then the item would have distinct weights in each).22

Item 3 ("sensory abilities" domain) showed uniform DIF for age. No DIF was found for other items.

The first step in the scale modification was rescoring response categories. Besides resolving threshold disorders, rescoring improved the item-trait interaction for the "sensory abilities" domain (original χ² = 142.44; after rescoring χ² = 93.32). This improvement was not sufficient to adjust this domain to the expected model.

Item 3 showed differential functioning, as well as misfit on the chi-square test and residuals. These three statistics suggest that item 3 is not performing according to the expected Rasch model. Thus, item 3 was deleted and the domain was then re-examined. The item-trait interaction showed improvement (χ² changed from 93.32 to 59.28). However, the model still misfits after the deletion.

The "death and dying" domain also showed item-trait interaction misfit in its original format (χ² = 60.03). Rescoring item 20 improved the model (χ² = 51.72). At this stage, the chi-square statistic was still significant, indicating persistent misfit. Deletion of item 18 (which presented high chi-square results) resulted in an adjusted structure.

Table 3 describes the fit statistics for the refined WHOQOL-OLD version.



The WHOQOL-OLD Module was developed through a simultaneous transcultural methodology, which is able to include different cultural contexts from the first steps of the instrument construction.9 This is regarded as a major characteristic of the WHOQOL-OLD.16

In addition to the theoretical design, it is also crucial that a new international measure is adequately validated. This ensures that the original strengths of the instrument remain in the new version in a different language. The validation of a scale or instrument is a longitudinal process and ideally should involve its testing in distinct contexts.

The combination of different psychometric approaches for the validation or development of a new measure is supported in the literature. In particular, it has been argued that the Rasch measurement model adds important input, since it puts into operation the axioms for additive conjoint measurement.14 Using both traditional and Rasch analyses seems to be a useful strategy and provides relevant insight regarding scale performance.16,20

The findings of the present study are in line with the results previously reported through classical psychometric theory.8 The "sensory abilities" domain showed inadequate performance in multiple linear regression analyses in previous studies. The Rasch analysis corroborated the misfit of this domain. The "intimacy" domain, however, performed poorly in the classical psychometric approach (multiple linear regression), but not in the Rasch analysis. This discrepancy indicates that the domain itself functions well as a set, and the items show satisfactory performance. It is suggested that the previous findings are due to limitations of multiple linear regression, particularly the choice of a suitable dependent variable.

Rescoring and item deletion did not result in adequate improvement in the "sensory abilities" domain. Interestingly, item rescoring and deletion markedly improved the performance of the "death and dying" domain. After these changes, the domain fit the Rasch model.

These potential changes should not produce crucial modifications in the scale format, since they can be made during the statistical analysis phase and not necessarily in the data collection stage. Replications of these findings in different samples are needed to confirm the results.



1. Andrich D. Rating formulation for ordered response categories. Psychometrika. 1978;43(4):561-73.

2. Andrich D. Rasch models for measurement. London: Sage University Paper; 1988.        

3. Andrich D, Lyne A, Sheridan B, Luo G. RUMM 2020. Perth: RUMM Laboratory; 2003.        

4. Chachamovich E, Trentini C, Fleck MP. Assessment of the psychometric performance of the WHOQOL-BREF instrument in a sample of Brazilian older adults. Int Psychogeriatr. 2006;19(4):635-46.        

5. Crane PK, Gibbons LE, Jolley L, van Belle G, Selleri R, Dalmonte E, et al. Differential item functioning related to education and age in Italian version of the Mini-Mental State Examination. Int Psychogeriatr. 2006;18(3):505-15.        

6. Crane PK, Gibbons LE, Narasimhalu K, Lai JS, Cella D. Rapid detection of differential item functioning in assessments of health-related quality of life: the functional assessment of cancer therapy. Qual Life Res. 2007;16(1):101-14.        

7. Fleck MPA, Chachamovich E, Trentini CM. Projeto WHOQOL-OLD: método e resultados de grupos focais no Brasil. Rev Saude Publica. 2003;37(6):793-9.

8. Fleck MPA, Chachamovich E, Trentini C. Development and validation of the Portuguese version of the WHOQOL-OLD module. Rev Saude Publica. 2006;40(5):785-91.        

9. Guillemin F. Cross-cultural adaptation and validation of health status measures. Scand J Rheumatol. 1995;24(2):61-3.

10. Hawthorne G, Davidson N, Quinn K, McCrate F, Winkler I, Lucas R, et al. Issues in conducting cross-cultural research: implementation of an agreed international protocol designed by the WHOQOL Group for the conduct of focus groups eliciting the quality of life of older adults. Qual Life Res. 2006;15(7):1257-70.

11. Haywood KL, Garratt AM, Fitzpatrick R. Quality of life in older people: a structured review of generic self-assessed health instruments. Qual Life Res. 2005;14(7):1651-68.

12. Kline RB. Principles and practice of structural equation modelling. 2. ed. New York: Guilford Press; 2005.        

13. Linacre JM. Sample size and item calibration stability. Rasch Meas Trans. 1994;7(4):328.        

14. Pallant J, Miller R, Tennant A. Evaluation of the Edinburgh Post Natal Depression Scale using Rasch analysis. BMC Psychiatry. 2006;6:28.

15. Pallant J, Tennant A. An introduction to the Rasch measurement model: an example using the Hospital Anxiety and Depression Scale (HADS). Br J Clin Psychol. 2007;46(Pt 1):1-18.

16. Power M, Quinn K, Schmidt S, WHOQOL-OLD Group. Development of the WHOQOL-Old module. Qual Life Res. 2005;14(10):2197-214.        

17. Rasch G. Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press; 1960.        

18. Sheikh JI, Yesavage JA. Geriatric Depression Scale (GDS): recent evidence and development of a shorter version. Clin Gerontol. 1986;37:819-20.        

19. Swaine-Verdier A, Doward LC, Hagell P, Thorsen H, McKenna SP. Adapting quality of life instruments. Value in Health. 2004;7(Suppl 1):S27-30.        

20. Tammaru M, McKenna SP, Meads DM, Maimets K, Hansen E. Adaptation of the rheumatoid arthritis quality of life scale for Estonia. Rheumatol Int. 2006;26(7):655-62.        

21. Tennant A, McKenna SP, Hagell P. Application of Rasch analysis in the development and application of quality of life instruments. Value Health. 2004;7(Suppl 1):S22-6.

22. Tennant A, Pallant JF. Unidimensionality matters! (A tale of two Smiths?). Rasch Meas Trans. 2006;20:1048-51.        

23. The WHOQOL Group. The World Health Organization quality of life assessment (WHOQOL): development and general psychometric properties. Soc Sci Med. 1998;46(12):1569-85.

24. United Nations. World population prospects: the 2002 revision. New York: United Nations Population Division; 2003.        

25. World Health Organization. Active ageing: a policy framework. Geneva; 2002.        



Eduardo Chachamovich
Rua Florêncio Ygartua, 391/308
90430-010 Porto Alegre, RS, Brazil

Received: 6/6/2007
Approved: 5/31/2007



E Chachamovich was supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES – Process 3604-06/3; Foreign Scholarship for Doctorate studies).
E Chachamovich was affiliated to the University of Edinburgh Medical School at the time of the study.

Faculdade de Saúde Pública da Universidade de São Paulo São Paulo - SP - Brazil