On-line version ISSN 1518-8787
Print version ISSN 0034-8910
Rev. Saúde Pública vol.41 n.3 São Paulo Jun. 2007
SPECIAL ARTICLE
Sensitivity analysis for an unmeasured confounder using an electronic spreadsheet
Maria Deolinda Borges Cabral (I); Ronir Raggio Luiz (II)
(I) Instituto Brasileiro de Geografia e Estatística. Rio de Janeiro, RJ, Brasil
(II) Núcleo de Estudos de Saúde Coletiva. Faculdade de Medicina da Universidade Federal do Rio de Janeiro. Rio de Janeiro, RJ, Brasil
In studies assessing the effect of a given exposure variable on a specific outcome of interest, confounding may create the mistaken impression that the exposure variable is producing the outcome of interest, when in fact the observed effect is due to an existing confounder. However, quantitative techniques are rarely used to determine the potential influence of unmeasured confounders. Sensitivity analysis is a statistical technique that allows quantitative measurement of the impact of an unmeasured confounding variable on the association of interest that is being assessed. The purpose of this study was to make it feasible to apply two sensitivity analysis methods available in the literature, developed by Rosenbaum and Greenland, using an electronic spreadsheet. This makes it easier for researchers to include this quantitative tool in the set of procedures commonly used in the result validation stage.
Keywords: Statistical interpretation of data. Sensitivity analysis. Confounding (Epidemiology). Observational studies. Electronic spreadsheet.
Establishing causes and causal relationships is of great interest in epidemiology. While the science is concerned with the frequency, distribution and determinants of disease, methodological procedures based on statistical models have been developed to identify causes of diseases.3,4,7 However, these models rely on assumptions that frequently cannot be tested with the observed data; that is, the discussion of causality comes down to assessing the validity of the findings obtained in the studies.
A study is considered valid, with a resulting causal interpretation, if it is bias-free, i.e., there are no systematic errors that explain the association found as an alternative to the causal hypothesis.4
In studies assessing the effect of a given exposure variable on a specific outcome of interest, confounding may create the mistaken impression that the exposure variable produces the outcome of interest when the observed effect is actually due to an existing confounding factor. According to Koopman* (1997), confounding occurs when a non-causal association is observed between the exposure and the outcome of interest in a reference population. Two types of bias may result from confounding: overt bias, caused by confounders that are measured in the study, and hidden bias, caused by confounders that are not measured in the study (Rosenbaum5, 1991).
When analyzing observational studies, the measured potential confounders are usually "adjusted" for analytically, using statistical techniques such as stratification and matching, among others. However, quantitative techniques are rarely used to determine the potential impact of unmeasured confounders. According to Greenland2 (1996), the random errors and measured confounders in the data generation process often constitute only a fraction of the total error, and are rarely the only important sources of uncertainty. It is thus convenient to develop and use an appropriate statistical tool that allows a quantitative evaluation of such errors. Sensitivity analysis is one such technique: it allows quantitative measurement of the impact of an unmeasured confounding variable on the association of interest that is being assessed.
Although conceptually well-developed, the two sensitivity analysis methods available in the literature developed by Rosenbaum6 (1995) and Greenland2 (1996) require laborious calculations not handled by currently available software programs. However, such methods may be fully applied through an electronic spreadsheet. The purpose of the present study is to make it feasible to apply each of these methods using an electronic spreadsheet in order to make it easier for researchers to include this quantitative tool in the set of procedures that have been commonly used in the stage of result validation. The selection of a spreadsheet is prompted by its widespread use.
SENSITIVITY ANALYSIS METHODS
Rosenbaum5 and Greenland2 developed two sensitivity analysis methods for dichotomous variables that allow the behavior of study results to be analyzed in the presence of unmeasured confounders.
Also known as the external adjustment method, the Greenland method tries to quantify the variation in the association observed in a specific study when adjusted for a potential unmeasured confounding variable. The method consists of simulating various plausible values for the confounder prevalences by exposure level, specifically in those individuals who do not show the outcome, as well as the magnitude of association between the confounder and the outcome, then calculating an estimate of the association between the exposure and the outcome "adjusted" for the specified confounding variable for each combination studied.
In contrast to the Greenland method, which considers the classic confounding scheme (i.e., the confounder must be associated with the exposure and be an independent predictor of the outcome), the Rosenbaum method works only with the association between the confounder and the exposure. This method quantifies the magnitude of the association between the unmeasured confounder and the exposure variable that would be required to make the association found between the exposure and the outcome statistically non-significant, assuming that the association between the confounder and the outcome is strong enough for confounding to arise.
ELEMENTS AND NOTATIONS FOR THE APPLICATION OF A SENSITIVITY ANALYSIS
To formalize the Greenland2 and Rosenbaum6 methods, a hypothetical study is considered in which the exposure (E), the outcome (D) and the unmeasured confounder (Z) are dichotomous variables.
Table 1 shows the general scheme for presenting the findings obtained in this hypothetical study.
The following magnitudes are now considered:
PZ1: prevalence of the unmeasured confounding variable among exposed individuals;
PZ0: prevalence of the unmeasured confounding variable among non-exposed individuals;
ORDE: odds ratio between the outcome and the exposure;
ORDZ: odds ratio between the outcome and the confounding variable;
OREZ: odds ratio between the exposure and the confounding variable.
The Greenland method speculates on the plausible values for ORDZ, PZ1 and PZ0, and, consequently, it speculates on the possible values for the association between E and Z (OREZ), because OREZ is affected by the values of PZ1 and PZ0, according to the following formula (1).
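Under the usual definition of an odds ratio, and assuming that PZ1 and PZ0 are the only inputs, the relationship between OREZ and the two prevalences can be written as in the first line below. The second line is the same expression solved for PZ0, which is one way to derive a PZ0 value from a speculated OREZ and PZ1. The subscripted symbols are simply typeset versions of the names defined above.

```latex
\[
  OR_{EZ} \;=\; \frac{P_{Z1}\,(1 - P_{Z0})}{P_{Z0}\,(1 - P_{Z1})}
\]
\[
  P_{Z0} \;=\; \frac{P_{Z1}}{P_{Z1} + OR_{EZ}\,(1 - P_{Z1})}
\]
```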
In order to find the values to complete Table 2, the hypothesis formulated is that the odds ratio between E and D has the same value in both Z strata (Z is a confounding variable for the association between E and D). Thus, by speculating about plausible values for these three (or four) magnitudes, various ORDE values "adjusted" for Z are obtained, allowing an analysis of whether the resulting variations are epidemiologically relevant and may point to conclusions other than those originally obtained.
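The external adjustment described above can be sketched in a few lines of Python. This is an illustrative sketch, not the spreadsheet itself: the argument names (`a1`, `a0` for exposed and non-exposed individuals with the outcome, `b1`, `b0` for those without it) are hypothetical, and the prevalences are assumed to refer to individuals without the outcome, as stated in the description of the Greenland method. The function splits each cell of the observed table by Z and returns the E-D odds ratio in each Z stratum together with the implied OREZ.

```python
def greenland_external_adjustment(a1, a0, b1, b0, pz1, pz0, or_dz):
    """Split an observed 2x2 table by a speculated dichotomous confounder Z.

    a1, a0: exposed / non-exposed individuals with the outcome (hypothetical names)
    b1, b0: exposed / non-exposed individuals without the outcome
    pz1, pz0: speculated prevalence of Z among exposed / non-exposed non-cases
    or_dz: speculated odds ratio between the outcome and Z
    """
    # Individuals without the outcome who carry Z, by exposure level
    b11 = pz1 * b1
    b01 = pz0 * b0
    # Individuals with the outcome who carry Z, implied by or_dz within each exposure level
    a11 = or_dz * a1 * b11 / (or_dz * b11 + (b1 - b11))
    a01 = or_dz * a0 * b01 / (or_dz * b01 + (b0 - b01))
    # Exposure-outcome odds ratio within each stratum of Z
    or_de_z1 = (a11 * b01) / (a01 * b11)
    or_de_z0 = ((a1 - a11) * (b0 - b01)) / ((a0 - a01) * (b1 - b11))
    # Association between exposure and confounder implied by the two prevalences
    or_ez = (pz1 * (1.0 - pz0)) / (pz0 * (1.0 - pz1))
    return or_de_z1, or_de_z0, or_ez


# Made-up counts for illustration only (crude odds ratio = 2.0)
or_z1, or_z0, or_ez = greenland_external_adjustment(100, 50, 100, 100, 0.5, 0.25, 3.0)
print(or_z1, or_z0, or_ez)
```

Because the same ORDZ is applied at both exposure levels, the two stratum-specific odds ratios coincide, which is the single "adjusted" ORDE value.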
On the other hand, the Rosenbaum method5,6 speculates on the value of G, the magnitude of the association between the unmeasured confounder and the exposure that makes the observed association of interest (ORDE) statistically non-significant. For dichotomous variables, the method is based on the Mantel-Haenszel statistic (T), a test statistic normally used in analyses that take into account a third variable that may "mask" the association found between the exposure and the outcome of interest (Fleiss1, 1981). T is the total number of exposed individuals showing the outcome (T = A in the hypothetical case presented in Table 1). The expectation and variance of T are calculated approximately, using the normal distribution and holding fixed the marginal totals of individuals with the outcome (R) and of exposed individuals (M) in Table 1. The expectation is the solution of a second-degree equation, obtained by setting the odds ratio between the exposure and the outcome, under the null hypothesis that the exposure is not associated with the outcome, equal to the speculated value of the association between the exposure and the unmeasured confounder (G). The variance calculation uses the expectation value together with the A1+, R, M, N and G values. Once the expectation and variance are obtained, the standardized T statistic (Tstd) is calculated and the upper-limit p-value is obtained. To calculate the lower-limit p-value, G is replaced by 1/G in the odds ratio equation between the exposure and the outcome, and the expectation and variance are recalculated. The value sought by this method is the lowest value of G that makes the observed association of interest (ORDE) statistically non-significant at a 95% confidence level. The formulas for calculating the expectation, variance and standardized T were developed by Stevens8 (1951).
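The calculation for a single 2x2 table can be sketched in Python as follows. This is a sketch under stated assumptions, not the spreadsheet formulas themselves: it assumes the large-sample form in which the expectation E solves E(N - R - M + E) / ((R - E)(M - E)) = G and the variance is the reciprocal of the sum of the reciprocals of the four expected cells, and it assumes a one-sided upper-tail p-value. The function names and the counts in the usage lines are hypothetical.

```python
import math


def rosenbaum_moments(r, m, n, gamma):
    """Approximate expectation and variance of T for fixed margins.

    r: total with the outcome, m: total exposed, n: table total,
    gamma: speculated association G between confounder and exposure.
    """
    # Quadratic (gamma - 1) E^2 - b E + gamma r m = 0
    a = gamma - 1.0
    b = a * (r + m) + n
    c = gamma * r * m
    disc = math.sqrt(b * b - 4.0 * a * c)
    # Numerically stable form of the admissible root; equals r*m/n when gamma == 1
    e = 2.0 * c / (b + disc)
    v = 1.0 / (1.0 / e + 1.0 / (r - e) + 1.0 / (m - e) + 1.0 / (n - r - m + e))
    return e, v


def upper_tail_p(t, e, v):
    """Standardized statistic and one-sided normal p-value."""
    z = (t - e) / math.sqrt(v)
    return z, 0.5 * math.erfc(z / math.sqrt(2.0))


def rosenbaum_bounds(t, r, m, n, gamma):
    """Lower and upper limits of the p-value for a speculated G >= 1."""
    p_upper = upper_tail_p(t, *rosenbaum_moments(r, m, n, gamma))[1]
    p_lower = upper_tail_p(t, *rosenbaum_moments(r, m, n, 1.0 / gamma))[1]
    return p_lower, p_upper


# Made-up table for illustration: 35 exposed cases, R = 50, M = 50, N = 100
for g in (1.0, 1.5, 2.0, 3.0):
    print(g, rosenbaum_bounds(35, 50, 50, 100, g))
```

Scanning increasing G values in this way and noting where the upper limit first exceeds 0.05 mirrors the search for the lowest G described above.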
SENSITIVITY ANALYSIS PERFORMANCE SPREADSHEETS
To make available the two sensitivity analysis methods under consideration, two spreadsheets were developed that allow the calculations to be carried out as required for their application.
Figure 1 shows the spreadsheet for applying the Greenland method. All cells in this spreadsheet should be completed as described in Table 3. The cells C6, C7, E6, and E7 must be filled out with the data observed in the study and the magnitudes to be speculated should be entered into cells B12, B13 and B14. Once completed as described in Table 1, all findings will be automatically generated by the spreadsheet. Cells B23 and B24 show the odds ratio between the exposure variable and the outcome, between individuals exposed and not exposed to Z, respectively, adjusted for the speculated values in cells B12, B13 and B14. These values are identical as Z is considered a confounding variable for E and D in the method development. Cell B22 provides the value of the association between the confounder and the exposure variable (OREZ).
In turn, the spreadsheet in Figure 2 (Rosenbaum method) should be completed as described in Table 4. When the specified cells are completed, the observed odds ratio between the exposure and the outcome (ORDE) is calculated automatically in cell H6; the T expectation, T variance, Tstd statistic and p-value for G = 1 are automatically calculated in cells B16, C16, G16 and H16, respectively. Cell B20 should be completed with the values to be speculated for G, when G ≠ 1. Once it is completed, the T expectation, T variance, Tstd statistic and upper-limit p-value are automatically calculated in cells G24, H24, C27 and D27, respectively. The expectation and variance required to calculate the lower-limit p-value are obtained in the same way, replacing the G value by 1/G in cell B20.
If there are two or more strata, the expectation and variance should be calculated for each stratum and for each G value considered. The T statistic is then the sum, over all strata, of the exposed individuals showing the outcome, and the T expectation and T variance are the sums of the stratum expectations and variances, respectively. After obtaining the T statistic, T expectation and T variance, the Tstd value and the upper-limit p-value are calculated with formulas similar to those in cells C27 and D27, respectively. Once again, to calculate the lower-limit p-value, the calculations are repeated, replacing G by 1/G.
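The stratified case can be sketched in Python by summing the stratum quantities before standardizing. As an assumption, each stratum uses the same large-sample expectation and variance as a single 2x2 table (expectation from the quadratic in G, variance as the reciprocal of the summed reciprocals of the four expected cells). The tuple layout `(t, r, m, n)` and the counts are hypothetical.

```python
import math


def _moments(r, m, n, gamma):
    """Approximate expectation and variance of T in one stratum."""
    a = gamma - 1.0
    b = a * (r + m) + n
    c = gamma * r * m
    e = 2.0 * c / (b + math.sqrt(b * b - 4.0 * a * c))
    v = 1.0 / (1.0 / e + 1.0 / (r - e) + 1.0 / (m - e) + 1.0 / (n - r - m + e))
    return e, v


def stratified_p_value(strata, gamma):
    """strata: list of (t, r, m, n) tuples, one per stratum.

    Returns the standardized statistic and the one-sided p-value for G = gamma.
    Call again with 1.0 / gamma to obtain the lower limit.
    """
    t_sum = sum(t for t, _, _, _ in strata)
    moments = [_moments(r, m, n, gamma) for _, r, m, n in strata]
    e_sum = sum(e for e, _ in moments)
    v_sum = sum(v for _, v in moments)
    z = (t_sum - e_sum) / math.sqrt(v_sum)
    return z, 0.5 * math.erfc(z / math.sqrt(2.0))


# Two made-up strata, for illustration only
strata = [(35, 50, 50, 100), (20, 40, 30, 90)]
print(stratified_p_value(strata, 2.0))
print(stratified_p_value(strata, 1.0 / 2.0))
```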
EXAMPLE OF A SENSITIVITY ANALYSIS APPLICATION
As an example of the use of the spreadsheets presented, a hypothetical observational study is considered, analyzing the association between the exposure to a factor E and an outcome of interest D, whose findings are presented in Table 5.
In order to verify the behavior of the association found in the presence of a potential unmeasured confounder (Z), it was decided to apply a sensitivity analysis to the observed data. Given the importance of the two methods available, it is suggested that they be applied in an integrated manner (Cabral**, 2005). Initially, the Rosenbaum method was applied in order to obtain the G value that makes the ORDE value adjusted for the unmeasured confounding variable statistically non-significant. The spreadsheet shown in Figure 2 was used for G values equal to 1.0, 1.5, 1.8, 1.9, 2.0 and 3.0, and also for the corresponding 1/G values; the findings are shown in Table 6.
According to Table 6, the lowest G value making the adjusted ORDE value statistically non-significant at a 95% confidence level is G = 1.9. The suggestion is to start the Greenland method by taking the minimum value of G in the Rosenbaum method as the initial value for speculating on the value of the association between the exposure variable and the unmeasured confounder Z (OREZ). Thus, the spreadsheet in Figure 1 was used with OREZ values set at 1.9, 2.5 and 3.0, ORDZ values "speculated" at 3.0, 5.0, 10.0 and 15.0, PZ1 values varying between 0.1 and 0.9, and the corresponding PZ0 values obtained through formula (2).
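The grid of speculated values can be sketched in Python as below. Two points are assumptions rather than the authors' formulas: PZ0 is obtained by solving the odds ratio definition OREZ = PZ1(1 - PZ0) / (PZ0(1 - PZ1)) for PZ0, and the adjusted ORDE uses the closed form that follows when the same ORDZ applies at both exposure levels and the prevalences refer to individuals without the outcome. The PZ1 step of 0.2 is also an arbitrary choice; the observed ORDE of 2.57 is the value reported for the example.

```python
def pz0_from_or_ez(pz1, or_ez):
    """Prevalence of Z among the non-exposed implied by pz1 and OREZ."""
    return pz1 / (pz1 + or_ez * (1.0 - pz1))


def adjusted_or(or_de, pz1, pz0, or_dz):
    """Observed ORDE externally adjusted for a dichotomous confounder Z."""
    return or_de * (1.0 + pz0 * (or_dz - 1.0)) / (1.0 + pz1 * (or_dz - 1.0))


def sensitivity_grid(or_de, or_ez_values, or_dz_values, pz1_values):
    """One row per combination: (OREZ, ORDZ, PZ1, PZ0, adjusted ORDE)."""
    rows = []
    for or_ez in or_ez_values:
        for or_dz in or_dz_values:
            for pz1 in pz1_values:
                pz0 = pz0_from_or_ez(pz1, or_ez)
                rows.append((or_ez, or_dz, pz1, pz0,
                             adjusted_or(or_de, pz1, pz0, or_dz)))
    return rows


grid = sensitivity_grid(
    or_de=2.57,
    or_ez_values=[1.9, 2.5, 3.0],
    or_dz_values=[3.0, 5.0, 10.0, 15.0],
    pz1_values=[0.1, 0.3, 0.5, 0.7, 0.9],  # assumed step
)
for row in grid:
    print("OREZ=%.1f ORDZ=%.1f PZ1=%.1f PZ0=%.3f adjusted ORDE=%.2f" % row)
```

Starting the OREZ grid at the smallest G found with the Rosenbaum method keeps the number of combinations small, which is the point of applying the two methods in sequence.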
Analyzing the findings shown in Table 7 for this hypothetical study, it can be noted that the adjusted ORDE values move away from the observed ORDE value (2.57) when the unmeasured confounder increases the odds of exposure by a factor of 2.5 and also has an odds ratio with the outcome of at least 10.
The two spreadsheets presented in Figures 1 and 2 are intended to provide an operating tool that streamlines the application of a sensitivity analysis by researchers, allowing quantitative measurements of the impact of an unmeasured confounding variable on the association of interest that is being assessed.
The spreadsheets provided are easy to use and allow the immediate application of the Greenland2 (1996) and Rosenbaum6 (1995) methods. The Greenland method approach focuses more on the epidemiological elements of the study, while the Rosenbaum method addresses the statistical significance of the findings observed.
As these two approaches are important for observational studies, the example presented suggests a way of integrating the two techniques in order to direct and reduce the number of calculations required for a sensitivity analysis. The calculations presented for these two methods assume that the exposure, the outcome and the confounder are dichotomous variables.
It should be stressed that with the Greenland method, should it prove necessary to stratify for a measured confounder, the calculations in the method description should be repeated for each stratum, and the findings obtained for each of them should then be merged. Moreover, the spreadsheet provided to apply the Rosenbaum method may be used when the marginal totals for each stratum are large, i.e., when M, N - M, R and N - R are large. Otherwise, other expressions for the exact expectation and variance of the T distribution should be used, which may be found in Rosenbaum6 (1995).
1. Fleiss JL. Statistical methods for rates and proportions. 2. ed. New York: John Wiley & Sons; 1981.
2. Greenland S. Basic methods for sensitivity analysis of biases. Int J Epidemiol. 1996;25(6):1107-16.
3. Holland PW. Statistics and causal inference. J Am Stat Assoc. 1986;81(396):945-60.
4. Luiz RR, Struchiner CJ. Inferência Causal em Epidemiologia: o modelo de respostas potenciais. Rio de Janeiro: Editora Fiocruz; 2002.
5. Rosenbaum PR. Discussing hidden bias in observational studies. Ann Intern Med. 1991;115(11):901-5.
6. Rosenbaum PR. Observational Studies. New York: Springer-Verlag; 1995.
7. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66(5):688-701.
8. Stevens WL. Mean and variance of an entry in a contingency table. Biometrika. 1951;38(3-4):468-70.
Maria Deolinda Borges Cabral
Av. República do Chile, 500, 5º andar
20031-170 Rio de Janeiro, RJ, Brasil
Article based on the master's dissertation of MDB Cabral, presented to the Núcleo de Estudos de Saúde Coletiva of Universidade Federal do Rio de Janeiro, in 2005.
* Koopman NJ. Stratification of exposure-disease relationships upon a third variable and the assessment of joint effects [monograph on the Internet]. Ann Arbor; 1997. Available at: http://www.sph.umich.edu/group/epid/ [Accessed on 17 May 2006]
** Cabral MDB. Análise de sensibilidade em estudos epidemiológicos [master's dissertation]. Rio de Janeiro: Núcleo de Estudos de Saúde Coletiva da UFRJ; 2005.