Structured approaches for the screening and diagnosis of childhood tuberculosis in a high prevalence region of South Africa

Hatherill, Mark; Hanslo, Monique; Hawkridge, Tony; Little, Francesca; Workman, Lesley; Mahomed, Hassan; Tameris, Michele; Moyo, Sizulu; Geldenhuys, Hennie; Hanekom, Willem; Geiter, Lawrence; Hussey, Gregory

RESEARCH

Mark Hatherill^I,^*; Monique Hanslo^I; Tony Hawkridge^II; Francesca Little^III; Lesley Workman^I; Hassan Mahomed^I; Michele Tameris^I; Sizulu Moyo^I; Hennie Geldenhuys^I; Willem Hanekom^I; Lawrence Geiter^IV; Gregory Hussey^I

^I School of Child and Adolescent Health, University of Cape Town, Anzio Road, Cape Town, 7925, South Africa
^II Aeras Global TB Vaccine Foundation, Rockville, United States of America (USA)
^III Department of Statistical Sciences, University of Cape Town, Cape Town, South Africa
^IV Otsuka Pharmaceutical Development and Commercialization Inc., Rockville, USA

ABSTRACT

OBJECTIVE: To measure agreement between nine structured approaches for diagnosing childhood tuberculosis; to quantify differences in the number of tuberculosis cases diagnosed with the different approaches, and to determine the distribution of cases in different categories of diagnostic certainty.
METHODS: We investigated 1445 children aged < 2 years during a vaccine trial (2001-2006) in a rural South African community. Clinical, radiological and microbiological data were collected prospectively. Tuberculosis case status was determined using each of the nine diagnostic approaches. We calculated differences in case frequency and categorical agreement for binary (tuberculosis/not tuberculosis) outcomes using McNemar's test (with 95% confidence intervals, CIs) and Cohen's kappa coefficient (K).
FINDINGS: Tuberculosis case frequency ranged from 6.9% to 89.2% (median: 41.7%). Significant differences in case frequency (P < 0.05) occurred in 34 of the 36 pair-wise comparisons between structured diagnostic approaches (range of absolute differences: 1.5-82.3%). Kappa ranged from 0.02 to 0.71 (median: 0.18). The two systems that yielded the highest case frequencies (89.2% and 70.0%) showed fair agreement (K: 0.33); the two that yielded the lowest case frequencies (6.9% and 10.0%) showed slight agreement (K: 0.18).
CONCLUSION: There is only slight agreement between structured approaches for the screening and diagnosis of childhood tuberculosis and high variability between them in terms of case yield. Diagnostic systems that yield similarly low case frequencies may be identifying different subpopulations of children. The study findings do not support the routine clinical use of structured approaches for the definitive diagnosis of childhood tuberculosis, although high-yielding systems may be useful screening tools.

Introduction

Despite the scale of the worldwide tuberculosis epidemic, the disease remains very difficult to diagnose in children, especially in regions with limited resources.¹ Childhood tuberculosis is often paucibacillary and the diagnosis rests on interpretation of chest radiograph findings and non-specific symptoms and signs.¹ Improving diagnostic accuracy and reliability is key to integrating childhood tuberculosis into national control programmes, and the World Health Organization (WHO) has thus prioritized diagnostic criteria for childhood tuberculosis.² Objective, reproducible tuberculosis diagnosis will also be pivotal for defining end-points in trials of new tuberculosis vaccines.³ The need for accurate diagnosis is felt most acutely among younger children, who contribute substantially to the burden of tuberculosis in high prevalence regions.^4-7

Routine clinical use of a structured diagnostic approach that is unsuited to a particular setting can result in systematic errors in estimating the burden of tuberculosis and in patient management. It follows that regional guidelines for screening and diagnosis of childhood tuberculosis should be tailored to their epidemiological context.

The relative merits of existing structured diagnostic approaches are debatable.^5,8-17 Hesseling et al. reviewed 16 such approaches and noted that few of the scoring systems, algorithms and classifications for the screening and diagnosis of childhood tuberculosis have been validated against a gold standard. Most have been developed for hospital-based studies and their usefulness in community settings is relatively unknown.^5,18-21 Some have suggested that structured diagnostic approaches should be used only as screening tools to select children for further investigation,^9,10 while others have proposed a simplified case definition of childhood tuberculosis, based on cardinal symptoms, as an alternative to complex diagnostic systems.^1,22

Existing structured approaches to childhood tuberculosis provide a logical and reproducible basis for diagnosis based on clinical acumen, which Cundall termed "the art of the possible".²³ However, we hypothesized that commonly used, structured approaches for screening and diagnosing childhood tuberculosis may show poor agreement and yield highly variable case frequency results. The objectives of this paper were to quantify the tuberculosis case frequencies obtained by means of nine different diagnostic systems, to assess agreement between systems, and to offer possible explanations for discordant findings.

Methods

This analysis is based on data collected during a bacille Calmette-Guérin (BCG) vaccine trial conducted by the South African Tuberculosis Vaccine Initiative (SATVI) from March 2001 to August 2006 near Cape Town, South Africa (clinical trials identifier: NCT00242047).⁸ In the Boland-Overberg region of South Africa, tuberculosis incidence among children aged < 2 years was estimated at > 3000 cases per 100 000 in 2006.^6,8,24 In the trial, which compared the vaccine efficacy obtained with percutaneous versus intradermal Tokyo-172 BCG, 11 680 neonates were followed up for a minimum of 2 years after vaccination.⁸

Children in the community suspected of having tuberculosis due to a history of contact with an adult case or to the presence of symptoms compatible with the disease were identified by a regional surveillance system. All such children underwent comprehensive radiological and bacteriological investigation, even if they had no symptoms. The presence and duration of cough, wheezing, fever or weight loss; the response to antibiotics; and the proximity of contact with an adult having tuberculosis (mother, other person within the household, person outside of the household), were recorded. Human immunodeficiency virus (HIV) status was determined by a rapid antibody test and, if the result was positive, confirmatory polymerase chain reaction (PCR) was performed as well. Tuberculin skin tests included both Mantoux and Tine. Chest radiographs (anteroposterior and lateral) were reviewed by three paediatricians and classified in terms of the likelihood of tuberculosis (Table 1). Two consecutive, paired gastric lavages and induced sputum samples were obtained for smear microscopy and culture of Mycobacterium tuberculosis using mycobacteria growth indicator tubes (Becton Dickinson and Co., Sparks, MD, United States of America). A diagnostic algorithm was developed, based on approaches described by Cundall and WHO, for objective post hoc determination of tuberculosis status as the trial end-point.^21,23 The decision to start tuberculosis treatment was made on discharge by the attending clinician on the basis of all available results, independent of the assigned trial end-point.

A protocol-specified objective was to compare the structured approaches used to diagnose childhood tuberculosis in developing countries with a high prevalence of tuberculosis and limited resources. Diagnostic approaches relevant to sub-Saharan Africa, dating from 1990 onwards, were selected by literature review and expert consultation. Recent modifications were preferred over versions predating the HIV era. Eight structured approaches were compared with the SATVI trial algorithm for tuberculosis case frequency.⁸ The country of origin, lineage and type of approach are summarized in Table 2.

Structured diagnostic approaches were categorized as follows:

i) binary, with the diagnosis being simply positive or negative (yes = tuberculosis; no = not tuberculosis);^12,15

ii) hierarchical, with stratification into categories of diagnostic certainty, such as "definite", "probable", "possible", "unlikely" or "not tuberculosis";^8,14,16 or

iii) numerical, with a score obtained by adding the weighted values assigned to each variable (score > x = tuberculosis).^9-11,13

Data for the variables used in these diagnostic approaches were collected prospectively during the trial. Missing variables were assigned a zero value. Referenced threshold values were used for the analysis; where cut-off thresholds were unspecified, trial algorithm values were used as the default.⁸ To standardize reporting, the terms for the hierarchical categories of diagnostic certainty were "unlikely/not", "possible", "probable", and "definite" tuberculosis.^11,13,14,16 Details of the various diagnostic approaches are provided in Appendix A (available at: http://vacfa.com/index.php?option=com_content&view=section&layout=blog&id=10&Itemid=10).

The variables required by each system to compute a tuberculosis outcome for each child were programmed using STATA version 10 (StataCorp, Inc., College Station, TX, USA). Tuberculosis cases were defined by:

i) "positive" classification for binary (tuberculosis/not tuberculosis) systems;

ii) "definite", "probable" or "possible" classification for hierarchical systems; or

iii) score > the specified cut-off for numerical scoring systems.
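The three case definitions above can be read as a single rule that collapses each system's output into a binary case flag. The following is a minimal Python sketch of that rule, not the STATA program used in the trial; the function name, argument names and the example cut-off are illustrative assumptions, and the strict inequality follows the definition as printed.

```python
from typing import Optional, Union

# Category labels follow the standardized terms used in the text.
# Everything else here (names, signature) is a hypothetical illustration.
CASE_CATEGORIES = {"possible", "probable", "definite"}


def is_tb_case(system_type: str,
               result: Union[bool, str, float],
               cutoff: Optional[float] = None) -> bool:
    """Collapse one system's output for one child into a binary case flag."""
    if system_type == "binary":
        # Binary systems already return tuberculosis / not tuberculosis.
        return bool(result)
    if system_type == "hierarchical":
        # Any category above "unlikely/not" counts as a case.
        return result in CASE_CATEGORIES
    if system_type == "numerical":
        if cutoff is None:
            raise ValueError("numerical systems need a cut-off")
        # Strict inequality, as the definition is written in the text.
        return result > cutoff
    raise ValueError(f"unknown system type: {system_type}")


# Example with an invented cut-off of 6 for a numerical system.
print(is_tb_case("hierarchical", "possible"))   # True
print(is_tb_case("numerical", 6, cutoff=6))     # False
```

Counting "possible" diagnoses as cases is what makes the hierarchical systems comparable with the binary ones, and it is also one reason why inclusive hierarchical systems can report high case frequencies.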

The analysis of binary outcomes compared the nine diagnostic approaches in terms of the number and percentage of tuberculosis cases diagnosed among the children investigated. McNemar's test was used to compare the paired proportions of tuberculosis cases diagnosed with each system; P-values were not adjusted for multiple comparisons. Cohen's kappa coefficient (K) was used to examine agreement between individual observations for each system. Weighted K statistics were calculated for systems with hierarchical classifications. The degree of agreement was defined by the following values of K: 0-0.2 = slight; 0.2-0.4 = fair; 0.4-0.6 = moderate; 0.6-0.8 = substantial; and 0.8-1.0 = nearly perfect.²⁵
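For readers who want to see how the two statistics are obtained from paired binary calls, the sketch below computes McNemar's test and Cohen's K for two systems applied to the same children. It is an illustration under stated assumptions rather than the analysis code: the function names are hypothetical, the McNemar statistic is the uncorrected chi-square form (the text does not say which variant was used), the input data are invented, and a K value that falls exactly on a band boundary is assigned to the lower band.

```python
import math


def paired_table(a, b):
    """Cross-tabulate two systems' binary calls for the same children."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    only_a = sum(1 for x, y in zip(a, b) if x and not y)
    only_b = sum(1 for x, y in zip(a, b) if not x and y)
    neither = sum(1 for x, y in zip(a, b) if not x and not y)
    return both, only_a, only_b, neither


def mcnemar_p(only_a, only_b):
    """Two-sided P-value from McNemar's chi-square (1 df, no correction)."""
    discordant = only_a + only_b
    if discordant == 0:
        return 1.0
    stat = (only_a - only_b) ** 2 / discordant
    # Upper tail of chi-square with 1 df equals erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


def cohen_kappa(both, only_a, only_b, neither):
    """Chance-corrected agreement for a 2 x 2 table (undefined if expected = 1)."""
    n = both + only_a + only_b + neither
    observed = (both + neither) / n
    p_a = (both + only_a) / n
    p_b = (both + only_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Map K to the descriptive bands listed in the text."""
    for upper, label in [(0.2, "slight"), (0.4, "fair"),
                         (0.6, "moderate"), (0.8, "substantial")]:
        if kappa <= upper:
            return label
    return "nearly perfect"


# Invented calls for ten children (1 = tuberculosis, 0 = not tuberculosis).
system_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
system_b = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]
table = paired_table(system_a, system_b)
print(table, mcnemar_p(table[1], table[2]), cohen_kappa(*table))
```

McNemar's test uses only the discordant pairs, so it addresses whether two systems differ in case frequency, whereas K uses the whole table and addresses whether they classify the same children in the same way. Two systems can therefore have similar case frequencies and still show low K.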

In total, 1869 case episodes involving 1654 children were investigated, and one case episode was selected for each child. Since children older than 2 years were excluded, 1445 children were included in this analysis.

Results

The median age at investigation was 11.4 months (interquartile range: 6.0-17.4). Contact with an adult with tuberculosis was reported for 952 children (65.9%), and 628 children (43.5%) had cough lasting > 2 weeks. Weight was recorded as being 60-80% of expected weight-for-age in 316 (21.9%) children and as being < 60% of expected weight-for-age in 29 children (2.0%). Of the 1445 children studied, 54 (3.7%) tested positive for HIV with enzyme-linked immunosorbent assay, and 28 of these children (1.9%) were confirmed positive for HIV by polymerase chain reaction (PCR) assay. The chest radiograph was compatible with tuberculosis in 271 children (18.8%) and Mycobacterium tuberculosis was cultured from induced sputum or gastric lavage in 172 children (11.9%). Treatment for tuberculosis was started by the attending clinician in 611 children (42.3%).

Comparison of binary outcomes

Fig. 1 illustrates the number and percentage of tuberculosis cases diagnosed with each system. The median tuberculosis case frequency was 41.7% (602 of the 1445 children investigated).

Differences in tuberculosis case frequency are shown in Table 3. The differences were significant (P < 0.05) in 34 of 36 possible pair-wise comparisons between the various structured diagnostic approaches. Only the comparisons between the Stegen-Toledo and SATVI approaches and between the Stoltz-Donald and Fourie approaches yielded non-significant differences. The pair-wise differences in tuberculosis case frequency ranged from 1.5% to 82.3%.

Table 4 summarizes the observed agreement between all structured diagnostic approaches and shows the K statistics for binary "tuberculosis/not tuberculosis" outcomes. For the 36 pair-wise comparisons, K ranged from 0.02 to 0.71 (median K: 0.18).
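The figure of 36 comparisons follows from taking every unordered pair among nine systems (9 x 8 / 2 = 36). A short Python sketch of that enumeration is given below; the system labels and the calls are placeholders invented for illustration, and simple observed agreement stands in for whichever pair-wise statistic is of interest.

```python
from itertools import combinations
from statistics import median


def pairwise(calls_by_system, statistic):
    """Apply `statistic` to every unordered pair of systems."""
    return {
        (name_a, name_b): statistic(calls_by_system[name_a],
                                    calls_by_system[name_b])
        for name_a, name_b in combinations(sorted(calls_by_system), 2)
    }


def observed_agreement(a, b):
    """Proportion of children given the same call by both systems."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Nine placeholder systems with invented calls for four children.
toy_calls = {
    f"system_{i}": [(child + i) % 3 == 0 for child in range(4)]
    for i in range(1, 10)
}
results = pairwise(toy_calls, observed_agreement)
print(len(results))              # 36 pair-wise comparisons
print(median(results.values()))  # median of the pair-wise statistic
```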

Two systems based on clinical, radiological and bacteriological source data (Osborne and Kibel) generated the highest tuberculosis case frequencies, yet showed only fair agreement. Four systems - MASA, Osborne, Fourie and WHO-Harries - demonstrated poor to fair agreement with all of the structured diagnostic approaches analysed. Notably, two numerical systems - MASA and WHO-Harries - classified the fewest case episodes as tuberculosis, but showed only slight agreement.

Comparison of hierarchical outcomes

The distribution of diagnoses in categories of ascending diagnostic certainty is illustrated for three hierarchical and two numerical-hierarchical scoring systems (Fig. 2). The distribution of the diagnostic categories assigned by the Osborne and Kibel systems was similar: a bell-shaped curve with most diagnoses grouped in the "possible" and "probable" categories. By contrast, the Stegen-Toledo and Stoltz-Donald systems yielded results with opposite distributions, with most cases in the "not"/"unlikely" or "definite" categories.

Table 5 summarizes the observed agreement and weighted K for hierarchical and numerical-hierarchical systems across categories of increasing diagnostic certainty. Hierarchical agreement was nearly perfect between SATVI and Stoltz-Donald, and substantial between Kibel and Osborne.
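Weighted K gives partial credit when two systems place a child in adjacent categories rather than identical ones. The Python sketch below shows one common form of the calculation. The weighting scheme is an assumption, since the text does not state whether linear or quadratic weights were used; linear disagreement weights are shown here, and the function name and example ratings are hypothetical.

```python
# Ordered categories, using the standardized terms from the text.
CATEGORIES = ["unlikely/not", "possible", "probable", "definite"]


def weighted_kappa(ratings_a, ratings_b, categories=CATEGORIES):
    """Cohen's weighted kappa with linear disagreement weights (assumed)."""
    k = len(categories)
    index = {name: i for i, name in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions for each pair of categories.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected disagreement.
    d_obs = d_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            d_obs += weight * observed[i][j]
            d_exp += weight * row[i] * col[j]
    # Undefined (division by zero) if both systems use a single category.
    return 1 - d_obs / d_exp


# Invented ratings for four children.
a = ["unlikely/not", "possible", "probable", "definite"]
near = ["possible", "possible", "probable", "definite"]   # one adjacent miss
far = ["definite", "possible", "probable", "definite"]    # one extreme miss
print(weighted_kappa(a, near), weighted_kappa(a, far))
```

In this toy example the adjacent disagreement is penalized less than the extreme one, which is the behaviour that allows related hierarchical systems to show better agreement on the weighted statistic than on the binary one.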

Comparison of numerical outcomes

Tuberculosis case frequency ranged from 10.0% to 70.0% across four numerical scoring systems (Kibel, Fourie, WHO-Harries and Stegen-Toledo) when set at the pre-specified threshold (Fig. 3). Relative to the observed distribution of scores, two of the numerical systems (Kibel and Stegen-Toledo) used a low threshold for tuberculosis diagnosis, resulting in case frequencies of 70.0% and 53.4%, respectively. The other two systems (Fourie and WHO-Harries) used a relatively high diagnostic threshold, resulting in case frequencies of only 30.4% and 10.0%.
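The dependence of case frequency on the diagnostic threshold can be illustrated with a few lines of Python. The scores and cut-offs below are invented toy values, not study data, and the function name is hypothetical; the point is only that the same score distribution yields very different case frequencies as the cut-off moves.

```python
def case_frequency(scores, cutoff):
    """Percentage of children whose score exceeds the cut-off."""
    cases = sum(1 for score in scores if score > cutoff)
    return 100 * cases / len(scores)


# Invented scores for ten children on a hypothetical numerical system.
toy_scores = [1, 2, 2, 3, 3, 4, 5, 5, 6, 8]

for cutoff in (3, 5):
    print(cutoff, case_frequency(toy_scores, cutoff))
# A cut-off of 3 gives 50.0%; a cut-off of 5 gives 20.0%.
```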

Discussion

The most striking finding of this study was the wide variation (6.9-89.2%) in the frequency of tuberculosis cases diagnosed with the nine structured diagnostic systems. The fact that the differences in tuberculosis case frequency were statistically significant for all but two of 36 possible paired comparisons between systems suggests that the burden of childhood tuberculosis in a given population could be under- or overestimated by as much as 82%. The risk of systematic clinical error is clearly high, and excess morbidity or unnecessary treatment may result if an inappropriate diagnostic system is used for routine management. The variability in tuberculosis case frequency also underscores the importance of accurate phenotyping for interpretation of clinical trial end-points, genotypic studies and studies of immune correlates.

The second major finding is that the systems that yielded the highest and lowest tuberculosis case frequencies, namely the Osborne (89.2%) and Kibel (70.0%) and the MASA (6.9%) and WHO-Harries (10.0%) systems, demonstrated only fair or slight agreement with each other. Although the two outlier systems that generated the lowest results yielded similar tuberculosis case frequencies, the slight agreement suggests that they may be identifying different subpopulations.

In this study, the variation in tuberculosis case frequency observed when different structured diagnostic approaches were used and the relatively poor agreement between systems were more pronounced than previously reported. Edwards et al. retrospectively assessed agreement between clinical scoring systems used to diagnose tuberculosis among 91 children at a hospital in Kinshasa, Democratic Republic of the Congo. The four approaches (Fourie, WHO provisional guidelines, Stegen-Kaplan, and Ghidey-Habte) generated tuberculosis case frequencies ranging from 87% to 96%.^9,19-21 Agreement between systems ranged from fair (K: < 0.4) to moderate (K: 0.4-0.6).²⁶ The reason Edwards et al. found less variation in case frequency may be that the study was hospital-based and all children had been diagnosed with tuberculosis on the original Edwards scale.^18,26

We have also shown marked variation between hierarchical systems in the certainty of the diagnosis of tuberculosis.^13,14 The evaluation of related hierarchical approaches with similar distributions (SATVI and Stoltz-Donald) by weighting K for concordant and discordant categories resulted in better agreement than for binary outcomes.^8,16 Although hierarchical and numerical systems that share key variables, such as a positive tuberculin skin test, a positive chest radiograph, and a positive sputum culture (Stegen-Toledo, Stoltz-Donald, and SATVI) showed moderate agreement, other systems with the same common variables showed less agreement and outlying case frequencies (Kibel, Osborne).^8,11,13,14,16 It follows that system structure, weighting of variables and the exact order of Boolean decision-making may be as important as the constituent variables in determining the diagnostic output of each system.

There are several other reasons for the observed variation in tuberculosis case frequency and the relatively poor agreement between diagnostic approaches. They include differences in: (i) the purpose for which the systems were developed (as a screening tool or for definitive diagnosis; for clinical management or to obtain a trial end-point); (ii) clinical setting (community or hospital); (iii) disease severity (mild or severe tuberculosis); and (iv) regional prevalence of tuberculosis and/or HIV infection (low or high). Ideally, for clinical trials a low-yielding diagnostic system should be used to minimize false positives at the expense of lower sensitivity.⁸ On the other hand, clinicians might prioritize sensitivity to avoid the potentially fatal consequences of underdiagnosis and delayed treatment.^14,27 Therefore, approaches designed for clinical management, especially to serve as screening tools, might yield higher tuberculosis case frequencies.^9,14,27 Although the SATVI trial algorithm lay in the mid-range of case frequency estimates, in the absence of a gold standard it is not possible to determine which of the nine approaches yielded the most accurate rate of tuberculosis.⁸ However, the proportion of children treated for tuberculosis on clinical grounds (42.3%) was almost identical to the median tuberculosis case frequency across all nine diagnostic approaches (41.7%).

The importance of context

This study was carried out in a community in which children with suspected tuberculosis were identified early, when the disease was probably mild.⁸ By contrast, the WHO-Harries system assigns the highest diagnostic weight to chronic illness, severe malnutrition and extra-pulmonary tuberculosis, all of which occur more frequently in hospitalized children. It is therefore not surprising that this approach yielded a low tuberculosis case frequency in our context.¹⁰ Similarly, the MASA approach, which requires the presence of the complete triad of symptoms compatible with tuberculosis, as well as a positive tuberculin skin test and a suggestive chest radiograph, is designed as a treatment guideline for hospitalized children.¹⁵ The Osborne approach, which yielded results at the upper extreme of tuberculosis case frequency, was designed in a developing country setting where the index of suspicion for tuberculosis is high. It functions best as a screening tool, since children with suspected or possible tuberculosis are not necessarily treated.^11,14,16 Similarly, the Kibel system is designed to guide initial treatment decisions rather than to establish a definitive diagnosis in resource-limited settings.^11,27 The Fourie system, also designed as a screening tool, yielded one of the lowest tuberculosis case frequencies, which suggests that it may be unsuitable for screening in our epidemiological setting.⁹ Some have noted that regional HIV prevalence may affect the performance of a particular diagnostic approach unless HIV infection status is incorporated.^5,8,14 The confounding effect of HIV status on diagnostic decision-making is likely to be greatest in systems that emphasize the non-specific features of malnutrition.¹⁰ Edwards et al. noted that HIV-infected children scored higher on the Keith Edwards scale,¹⁸ a feature that would be common to the WHO-Harries approach. Consequently, the current edition of the WHO's TB/HIV: a clinical manual no longer recommends the use of diagnostic scoring systems.^10,26

Study limitations

This study has several limitations. Investigations were nested within a clinical trial that might not reflect clinical practice in developing regions. Variables were analysed in a standardized fashion that may differ from that used in the original diagnostic systems, and we acknowledge the potential limitations of K scores for assessing agreement. Children were younger than 2 years (an age group in which diagnostic imprecision is highest) and the findings may not be applicable to older children with a different disease spectrum. Since the study was community-based and investigations were geared towards pulmonary tuberculosis, there may have been a bias against diagnostic approaches that included features of extra-pulmonary tuberculosis. Furthermore, since all children identified by active case-finding were investigated for tuberculosis, even if they had no symptoms, the discrepancies between clinical, symptom-based and bacteriology-based systems may have been exaggerated. Structured diagnostic approaches were selected on the basis of relevance to the sub-Saharan region. Thus, four of the nine approaches were of South African origin.^8,11,15,16 We acknowledge the existence of other structured approaches for diagnosing childhood tuberculosis, such as the Sant'Anna score, but they were not included in this analysis.^17,28

Significance of findings

The public health significance of these findings is illustrated by the marked differences in tuberculosis case frequency and the poor agreement between diagnostic systems. Regional tuberculosis control programmes should make an informed decision to advocate a specific approach for the screening and diagnosis of childhood tuberculosis. Clearly, the study data do not support the routine, uncritical use of any particular diagnostic system for therapeutic decision-making. Some diagnostic approaches may in fact be best suited to specific settings. For example, a high-yielding system, such as Osborne, may be suitable as a screening tool, whereas the low-yielding WHO-Harries system may be most appropriate as a tool for diagnosing severe tuberculosis in regions with a low prevalence of HIV infection.

Conclusion

Although systems with a moderate case yield are less prone to extreme diagnostic error, the predictive value of any one system cannot be determined in the absence of a gold standard. Any structured approach to estimate tuberculosis case frequency can yield biased results if used in a way that differs from that for which it was originally designed, whether for clinical care or research purposes, screening or definitive diagnosis, mild or severe disease, or in low or high tuberculosis prevalence regions. However, in the absence of validation cohorts, there is limited evidence that these systems would have better diagnostic accuracy in their original settings. The findings of this study should not undermine confidence in existing diagnostic methods. Instead, they should encourage innovative research and critical analysis in the search for improved diagnostics for childhood tuberculosis.

Acknowledgements

We thank the staff of The South African Bacille Calmette-Guérin Trial Team for data collection; Maurice Kibel, John Burgess and Robert Gie for expert radiology review; Suzanne Verver for epidemiological support; and Lyness Matizirofa for statistical support.

Funding: The study was supported by the Aeras Global TB Vaccine Foundation, a non-profit organization that aims to develop tuberculosis vaccines.

Competing interests: TH and LG are current and previous full-time employees of the trial sponsor. The authors have not entered into any agreements that have limited the completion of the research as planned, and they have had full control of all primary data.

References

1. Marais BJ, Gie RP, Hesseling AC, Schaaf HS, Lombard C, Enarson DA et al. A refined symptom-based approach to diagnose pulmonary tuberculosis in children. Pediatrics 2006;118:e1350-9. doi:10.1542/peds.2006-0519 PMID:17079536

2. A research agenda for childhood tuberculosis. Geneva: World Health Organization; 2007 (WHO/HTM/TB/2007.2381).

3. Skeiky YA, Sadoff JC. Advances in tuberculosis vaccine strategies. Nat Rev Microbiol 2006;4:469-76. doi:10.1038/nrmicro1419 PMID:16710326

4. Nicol MPDM, Wood K, Hatherill M, Workman L, Hawkridge A, Eley B et al. A comparison of T-SPOT.TB and tuberculin skin test for the evaluation of young children at high risk for tuberculosis in a community setting. Pediatrics 2009;123:38-43. doi:10.1542/peds.2008-0611 PMID:19117858

5. Hesseling AC, Schaaf HS, Gie RP, Starke JR, Beyers N. A critical review of diagnostic approaches used in the diagnosis of childhood tuberculosis. Int J Tuberc Lung Dis 2002;6:1038-45. PMID:12546110

6. Groenewald P. Boland-Overberg region annual health status report 2004. Worcester: Department of Information Management, Department of Health; 2004.

7. Houwert KA, Borggreven PA, Schaaf HS, Nel E, Donald PR, Stolk J. Prospective evaluation of World Health Organization criteria to assist diagnosis of tuberculosis in children. Eur Respir J 1998;11:1116-20. doi:10.1183/09031936.98.11051116 PMID:9648965

8. Hawkridge A, Hatherill M, Little F, Goetz MA, Barker L, Mahomed H et al. Efficacy of percutaneous versus intradermal BCG in the prevention of tuberculosis in South African infants: randomised trial. BMJ 2008;337:a2052. doi:10.1136/bmj.a2052 PMID:19008268

9. Fourie PB, Becker PJ, Festenstein F, Migliori GB, Alcaide J, Antunes M et al. Procedures for developing a simple scoring method based on unsophisticated criteria for screening children for tuberculosis. Int J Tuberc Lung Dis 1998;2:116-23. PMID:9562121

10. Harries A, Maher D, Graham S. TB/HIV: a clinical manual. 2nd ed. Geneva: World Health Organization; 2004.

11. Kibel M. A point system for management of childhood tuberculosis. Cape Town: Institute of Child Health, University of Cape Town; 1999.

12. Migliori GB, Borghesi A, Rossanigo P, Adriko C, Neri M, Santini S et al. Proposal of an improved score method for the diagnosis of pulmonary tuberculosis in childhood in developing countries. Tuber Lung Dis 1992;73:145-9. doi:10.1016/0962-8479(92)90148-D PMID:1421347

13. Montenegro SH, Gilman RH, Sheen P, Cama R, Caviedes L, Hopper T et al. Improved detection of Mycobacterium tuberculosis in Peruvian children by use of a heminested IS6110 polymerase chain reaction assay. Clin Infect Dis 2003;36:16-23. doi:10.1086/344900 PMID:12491196

14. Osborne CM. The challenge of diagnosing childhood tuberculosis in a developing country. Arch Dis Child 1995;72:369-74. doi:10.1136/adc.72.4.369 PMID:7763076

15. Pinkney-Atkinson V. TB practical guidelines: managed care and quality review. Cape Town: Medical Association of South Africa; 1996.

16. Stoltz AP, Donald PR, Strebel PM, Talent JM. Criteria for the notification of childhood tuberculosis in a high-incidence area of the western Cape Province. S Afr Med J 1990;77:385-6. PMID:2330522

17. Sant'Anna CC, Orfaliais CT, March Mde F, Conde MB. Evaluation of a proposed diagnostic scoring system for pulmonary tuberculosis in Brazilian children. Int J Tuberc Lung Dis 2006;10:463-5. PMID:16602415

18. Edwards K. The diagnosis of childhood tuberculosis. P N G Med J 1987;30:169-78. PMID:3314246

19. Ghidey Y, Habte D. Tuberculosis in childhood: an analysis of 412 cases. Ethiop Med J 1983;21:m-7. PMID:6603973

20. Stegen G, Jones K, Kaplan P. Criteria for guidance in the diagnosis of tuberculosis. Pediatrics 1969;43:260-3. PMID:5304285

21. Provisional guidelines for the diagnosis and classification of the EPI target diseases for primary health care, surveillance and special studies. Geneva: World Health Organization; 1983 (EPI/GEN/83/84).

22. Marais BJ, Gie RP, Obihara CC, Hesseling AC, Schaaf HS, Beyers N. Well defined symptoms are of value in the diagnosis of childhood pulmonary tuberculosis. Arch Dis Child 2005;90:1162-5. doi:10.1136/adc.2004.070797 PMID:16131501

23. Cundall DB. The diagnosis of pulmonary tuberculosis in malnourished Kenyan children. Ann Trop Paediatr 1986;6:249-55. PMID:2435230

24. Country profile. South Africa. In: Global tuberculosis control. WHO report 2008. Geneva: World Health Organization; 2008. Available from: http://www.who.int/globalatlas [accessed 20 November 2009].

25. McGinn T, Wyer PC, Newman TB, Keitz S, Leipzig R, For GG. Tips for learners of evidence-based medicine: 3. Measures of observer variability (kappa statistic). CMAJ 2004;171:1369-73. PMID:15557592

26. Edwards DJ, Kitetele F, Van Rie A. Agreement between clinical scoring systems used for the diagnosis of pediatric tuberculosis in the HIV era. Int J Tuberc Lung Dis 2007;11:263-9. PMID:17352090

27. Kibel MA, Hussey G. Problems in the diagnosis of childhood tuberculosis. S Afr Med J 1990;77:379-80. PMID:2330519

28. Sant'Anna CC, Santos MA, Franco R. Diagnosis of pulmonary tuberculosis by score system in children and adolescents: a trial in a reference center in Bahia, Brazil. Braz J Infect Dis 2004;8:305-10. doi:10.1590/S1413-86702004000400006 PMID:15565261

(Submitted: 9 January 2009 - Revised version received: 11 September 2009 - Accepted: 7 October 2009 - Published online: 29 December 2009)

* Correspondence to Mark Hatherill (e-mail: mark.hatherill@uct.ac.za).