**RESEARCH**

**Prediction of community prevalence of human onchocerciasis in the Amazonian onchocerciasis focus: Bayesian approach **

**Prévision de la prévalence communautaire de l'onchocercose humaine dans le foyer amazonien : approche bayésienne**

**Predicción de la prevalencia comunitaria de la oncocercosis humana en el foco amazónico: un enfoque bayesiano**

**Hélène Carabin ^{I}; Marisela Escalona^{II}; Clare Marshall^{III}; Sarai Vivas-Martínez^{II, IV}; Carlos Botto^{II, V}; Lawrence Joseph^{VI}; María-Gloria Basáñez^{II, VII, 1}**

^{I}Department of Biostatistics and Epidemiology, Oklahoma University Health Sciences Center, Oklahoma City, USA

^{II}Centro Amazónico para Investigación y Control de Enfermedades Tropicales 'Simón Bolívar' (CAICET), Estado Amazonas, Venezuela

^{III}Department of Epidemiology and Public Health, Faculty of Medicine (St Mary's Campus), Imperial College London, London, England

^{IV}Departamento de Medicina Preventiva y Social, Escuela Luis Razetti, Facultad de Medicina, Universidad Central de Venezuela, Caracas, Venezuela

^{V}Instituto de Medicina Tropical 'Dr Felix Pifano C.', Universidad Central de Venezuela, Caracas, Venezuela

^{VI}Department of Epidemiology and Biostatistics, McGill University, and Division of Clinical Epidemiology, Montreal General Hospital, Quebec, Canada

^{VII}Department of Infectious Disease Epidemiology, Faculty of Medicine (St Mary's Campus), Imperial College London, Norfolk Place, London W2 1PG, England (email: mbasanez@imperial.ak.uk)

**ABSTRACT**

**OBJECTIVE:** To develop a Bayesian hierarchical model for human onchocerciasis with which to explore the factors that influence prevalence of microfilariae in the Amazonian focus of onchocerciasis and predict the probability of any community being at least mesoendemic (>20% prevalence of microfilariae), and thus in need of priority ivermectin treatment.** METHODS:** Models were developed with data from 732 individuals aged

__>__15 years who lived in 29 Yanomami communities along four rivers of the south Venezuelan Orinoco basin. The models' abilities to predict prevalences of microfilariae in communities were compared. The deviance information criterion, Bayesian

*P*-values, and residual values were used to select the best model with an approximate cross-validation procedure.

**A three-level model that acknowledged clustering of infection within communities performed best, with host age and sex included at the individual level, a river-dependent altitude effect at the community level, and additional clustering of communities along rivers. This model correctly classified 25/29 (86%) villages with respect to their need for priority ivermectin treatment.**

FINDINGS:

FINDINGS:

**Bayesian methods are a flexible and useful approach for public health research and control planning. Our model acknowledges the clustering of infection within communities, allows investigation of links between individual- or community-specific characteristics and infection, incorporates additional uncertainty due to missing covariate data, and informs policy decisions by predicting the probability that a new community is at least mesoendemic.**

CONCLUSION:

CONCLUSION:

**Keywords:** Onchocerciasis/epidemiology/drug therapy; Onchocerca volvulus; Ivermectin/therapeutic use; Prevalence; Risk factors; Bayes theorem; Models, Statistical; Venezuela (*source: MeSH, NLM*).

**RÉSUMÉ**

**OBJECTIF:** Mettre au point un modèle hiérarchique de type bayésien applicable à l'onchocercose humaine permettant d'étudier les facteurs qui influent sur la prévalence des microfilaires dans le foyer amazonien d'onchocercose et de prévoir la probabilité que l'onchocercose sévisse au moins sur le mode mésoendémique dans une communauté donnée (prévalence des microfilaires >20 %) et nécessite par conséquent en priorité un traitement par l'ivermectine. **MÉTHODES:** Mise au point de modèles à partir des données recueillies auprès de 732 personnes de 15 ans au moins habitant dans 29 communautés Yanomami situées le long de quatre fleuves du bassin méridional de l'Orénoque au Venezuela. La capacité des divers modèles à prévoir la prévalence des microfilaires dans la communauté a été comparée. Le meilleur modèle a été sélectionné par approximation croisée en utilisant le critère d'information bayésien, les valeurs de p des modèles bayésiens et les résidus. **RÉSULTATS:** Le meilleur modèle est un modèle à trois niveaux qui tient compte du regroupement des cas dans les communautés, avec, au niveau individuel, la prise en compte des caractéristiques d'âge et de sexe de l'hôte, au niveau communautaire, la prise en compte de l'effet de l'altitude fleuve-dépendant et, au troisième niveau, la prise en compte de l'agrégation des communautés le long des fleuves. Ce modèle a permis de classer correctement 25 des 29 villages (soit 86 %) quant à la priorité du traitement par l'ivermectine. **CONCLUSION:** Les méthodes de Bayes sont une approche souple et utile pour la recherche en santé publique et la planification de la lutte contre les maladies. Notre modèle tient compte de l'agrégation des cas au sein des communautés, permet d'étudier le lien entre d'une part les caractéristiques particulières aux individus ou aux communautés et d'autre part l'infection, tient compte de l'incertitude supplémentaire due aux données manquantes concernant les covariables, et permet d'étayer les décisions politiques grâce à des variables prédictives de la probabilité que l'onchocercose soit au moins mésoendémique dans une nouvelle communauté.

**Mots clés:** Onchocercose/épidémiologie/chimiothérapie; Onchocerque volvulus; Ivermectine/usage thérapeutique; Prévalence; Facteur risque; Théorème Bayes; Modèle statistique; Venezuela (*source: MeSH, INSERM*).

**RESUMEN**

**OBJETIVO:** Desarrollar un modelo jerárquico bayesiano de la oncocercosis humana para estudiar los factores que influyen en la prevalencia de microfilarias en el foco amazónico de oncocercosis, y estimar la probabilidad de que una comunidad sea como mínimo mesoendémica (prevalencia de microfilarias > 20%) y necesite por tanto tratamiento prioritario con ivermectina. **MÉTODOS:** Se desarrollaron modelos con datos de 732 individuos __>__ 15 años que vivían en 29 comunidades yanomami a lo largo de cuatro ríos de la cuenca meridional del Orinoco venezolano, y se comparó la capacidad de cada modelo para predecir la prevalencia de microfilarias en las comunidades. Se seleccionó el mejor modelo por aproximación cruzada utilizando el criterio de información bayesiano, los valores *P* de los modelos bayesianos y los residuos. **RESULTADOS:** El modelo que mejor funcionó fue uno de tres niveles que tenía en cuenta el agrupamiento de los casos en las comunidades. El modelo incluía la edad y el sexo del huésped a nivel individual, un efecto de altitud río-dependiente a nivel de la comunidad y, en el tercer nivel, el agrupamiento adicional de las comunidades a lo largo de los ríos. Este modelo permitió clasificar correctamente 25/29 (86%) aldeas en lo referente a su necesidad de tratamiento prioritario con ivermectina. **CONCLUSIÓN:** Los métodos bayesianos brindan un criterio flexible y útil para las investigaciones de salud pública y la planificación de la lucha contra las enfermedades. Nuestro modelo reconoce el agrupamiento de la infección en las comunidades, permite investigar la relación entre la infección y características particulares de los individuos y las comunidades, incorpora la incertidumbre adicional por falta de datos de covariables, y puede informar las decisiones de política mediante variables predictivas de la probabilidad de que una nueva comunidad sea como mínimo mesoendémica.

**Palabras clave:** Oncocercosis/epidemiología/quimioterapia; Onchocerca volvulus;Ivermectina/uso terapéutico; Prevalencia; Factores de riesgo; Teorema de Bayes; Modelos estadísticos; Venezuela (*fuente: DeCS, BIREME*).

**Introduction**

Data produced from public health research often are organized in a hierarchical structure, with clustering within units. For example, individuals "cluster" in the same community, and communities cluster within regions. Individuals who belong to the same "unit" may share common genetic, behavioural, or social risk factors of disease. They may also have similar exposures to environmental factors or, in the case of communicable diseases, infectious agents. The health outcomes of two individuals within the same unit, therefore, will correlate more highly than those of two individuals from different units. This correlation structure must be accounted for irrespective of whether data on chronic diseases or on communicable diseases are being analysed.

Hierarchical or random effect models acknowledge the nested form of such data and allow for appropriate modelling of the correlation structure (*14*). The advantages of hierarchical models are not exclusive to the Bayesian framework; nevertheless, Bayesian hierarchical models are unique in that they provide a single coherent framework that allows the incorporation of multiple sources of variability (including variability that arises from missing outcomes or exposures) and subsequent quantification of within- and between-unit variability in outcome through the investigation of potential risk factors at each "level" of the model. The appropriate pooling of information across units means that hierarchical Bayesian models also overcome problems associated with small sample sizes and thus produce more reliable estimates (or predictions) of individual- and unit-specific parameters.

A fully Bayesian approach to inference requires the specification of a full probability (likelihood) model for the data, together with a prior distribution for all the unknown parameters. Once data are available, inference is made on the basis of the posterior distribution. The posterior represents what is known currently, including the prior information and that contained in the data. By Bayes' theorem, this joint posterior distribution is proportional to the product of the likelihood function and the prior distribution.

In practice, interest lies typically with the marginal posterior distributions of a subset of parameters. In realistically complex applications, evaluation of these marginal posterior distributions requires high-dimensional integration and rarely is possible analytically. One powerful technique (and the approach taken in this paper) is the implementation of a Markov chain Monte Carlo algorithm, such as the Gibbs sampler, to obtain samples from the marginal posteriors. These sampled values are then used to describe the complete distributions for the parameters of interest or to provide summaries, such as point and interval estimates. It also is possible to estimate the posterior distribution of any arbitrary function of the parameters; this is particularly useful when estimating quantities needed to inform decision making. For example, several WHO guidelines for the control of parasitic infections use threshold prevalence values to guide priority interventions (*56*). Often, definitive diagnosis at an individual level is difficult to acquire, in which case, interest lies with the probability that, conditional on easily observable characteristics, the (unknown) prevalence of infection in a given community is above a pre-defined threshold.

This study aimed to show the use of Bayesian methods in the analysis of the type of clustered data often encountered in public health research. In particular, we developed a Bayesian hierarchical model for human onchocerciasis to explore a variety of factors thought to influence the prevalence of infection. Onchocerciasis is caused by the parasitic nematode* Onchocerca volvulus* and is transmitted from person to person by the bite of river-breeding blackfly vectors of the genus* Simulium* (*7*). Onchocerciasis is the second most common worldwide cause of infectious blindness, and it also causes severe and incapacitating skin disease. More than 90% of the people afflicted live in Africa, but smaller foci are found in Latin America (*7*). In the latter, the Amazonian focus between Venezuela and Brazil is one of the most remote and severe (*8 12*). In addition to investigating risk factors for onchocerciasis, therefore, we estimated the probability of any community being "at least mesoendemic" and thus in need of priority ivermectin treatment (*13*), on the basis of information given by predictive variables that do not require invasive parasitological procedures. Such a method would provide an advantage over those that rely solely on skin biopsies, as well as providing valuable information for the setting of treatment priorities in areas of difficult access. A given community is at least mesoendemic when the infection prevalence surpasses 20% in the Americas (*14*) and 35% in Africa (*15*). We used the former criterion for the Amazonian focus and a method that allowed us to acknowledge the clustering of infection within communities, investigate links between individual- or community-specific characteristics and infection, incorporate additional uncertainty due to missing covariate data, and inform policy decisions by predicting the probability that a new community is at least mesoendemic*.*

**Data and methods**

**Study area**

From April 1995 to August 1999, 46 Yanomami communities were visited as part of an ongoing onchocerciasis epidemiological survey and control programme coordinated by Centro Amazónico para Investigación y Control de Enfermedades Tropicales. We included 29 of these communities in our study because they had not been treated with ivermectin; were situated near to or along the rivers Ocamo, Orinoco, Padamo, and Mavaca; and were located less than 250 m above sea level (Fig. 1). Above this elevation, altitude has been shown to have no impact on the prevalence of microfilariae and all communities are considered to be hyperendemic that is, to have a prevalence __>__60% (*11, 15*). From the 29 selected villages, 732 Yanomami individuals aged __>__15 years were examined for the presence of *O. volvulus *microfilariae in the skin; this age group was chosen because it had been identified as the indicator age group in the Amazonian focus (*12*). Investigators reached the communities by first flying to the closest rural medical dispensary and then by travelling on foot or by boat, or both. Twelve villages were situated along the Ocamo river, eight along the Orinoco, four along the Padamo, and five along the Mavaca.

** Measurement of microfilarial prevalence and ethical issues**

Two iliac skin snips from each participating individual were taken with a disinfected Holth-type corneoscleral punch; after 24 hours' incubation, emerging microfilariae were confirmed morphologically as* O. volvulus* (*9, 10*). The outcome was dichotomized as positive or negative for microfilariae. Informed consent for the parasitological evaluation was obtained from each individual. All eligible members of the village were treated after the parasitological examination, irrespective of their infectious status.

**Measurement of factors potentially associated with prevalence of microfilariae**

Potential risk factors for infection with microfilariae were measured at both individual and community levels. At the individual level, age and sex were recorded. At the community level, altitude, accessibility, and presence of a missionary post were recorded. Other variables, such as level of clothing and type of housing, correlated almost exactly with the presence of a mission and so were not considered further (data not shown). Altitude was measured in metres above sea level, as described previously (*11*). Accessibility was entered in the model as a categorical variable: "near" (<5 hours to reach the community), "intermediate" (524 hours), and "remote" (>24 hours), where the number of hours to reach the community started from the nearest rural medical dispensary (Fig. 1).

We hypothesized that the prevalence of onchocerciasis is influenced by two main pathways in addition to individual-level variables. First, altitude may affect infection status through its influence upon entomological determinants (*11*). Second, the presence of a mission may influence behavioural patterns, which in turn influence exposure to vectors.

**Statistical analysis**

**Descriptive statistics**

Summary statistics for all variables including means, medians, standard deviations, and proportions were generated first, so we could perform preliminary investigations of the association between each variable and the outcome of interest (microfilarial status) and of the potential for confounding between variables. Subsequent analyses used age as a categorical variable (1519 years, 2039 years, and __>__40 years), because its effect was not linear on a logit scale.

**Bayesian hierarchical model **(*16, 17*)

At the first level, we assumed a logistic regression model, in which the logit probability of infection of each individual was modelled as an additive function of possible individual-level risk factors (for example, age and sex) and of a random community-specific intercept. The latter reflected the underlying prevalence level (on a logit scale) in each community after individual-level characteristics were adjusted for. A linear regression model was then considered at the second level, and the community effects were modelled as a function of community-level risk factors (e.g. altitude and mission). This model was later extended to a third "river" level by introducing a random river-specific intercept at the second level (see Appendix A, web version only, available at: http://www.who.int/bulletin). The third model acknowledged the presence of unmeasured river-specific effects that might influence an individual's probability of infection and so induce correlation in prevalence among communities along the banks of one river. This consideration was motivated by the observation in the two-level model (and even after altitude was adjusted for) that the posterior estimates of the community-specific intercepts were higher in communities along the Ocamo and Orinoco rivers than in those along the Padamo and Mavaca.

All regression coefficients and associated 95% Bayesian credible intervals (95% BCI) were computed via the Gibbs sampler, which was implemented using WinBUGS software (*18*). The exponential of these coefficients was taken to obtain estimates of prevalence odds ratios and their 95% BCI. Risk factors were retained in the model if the associated 95% BCI excluded one or was borderline.

The results were based on two runs of 10 000 iterations each, after a burn-in of 1000 iterations. Convergence was assessed using the Gelman and Rubin statistic (*19, 20*).

**Missing data**

Covariate data were imputed for two individuals with missing age and for a community for which no altitude measurement was available. The missing ages were imputed at each iteration of the sampler in relative proportion to the age distribution in the remaining population. To reflect a priori knowledge, the missing altitude was imputed from a uniform distribution with range 165200 m above sea level. The posterior estimates of all parameters in this model fully incorporated the additional uncertainty in these imputed data.

**Model selection**

After we identified the important predictors of infection, we used the deviance information criterion to compare the fit of: a naive flat model that assumed independence of disease status across all individuals after the already identified risk factors were adjusted for; a two-level hierarchical model that acknowledged clustering of infection within communities; and a three-level model that acknowledged additional clustering of communities along rivers. The deviance information criterion is a measure of model "support" that aims to balance the fit (deviance) and complexity (effective number of parameters) of a model (*21*). In this way, it can be viewed as a generalization of the Akaike information criterion (*22*). Lower values of the deviance information criterion indicate higher model support, and a difference __>__4 was used to discriminate between models. This threshold, which was proposed for Akaike information criterion (*23*), also was regarded as appropriate for deviance information criterion (*21*).

We then assessed the candidate models on the basis of their ability to predict the observed community-specific prevalences of microfilariae. Assessment with a Bayesian predictive model such as this is in the spirit of classical hypothesis testing (*24, 25*), in that each candidate model can be criticized without explicit consideration of an alternative. An ideal approach involves cross-validation, which contrasts the observed prevalence in each community with its corresponding predictive distribution, given the data from all remaining communities. Because this can quickly become computationally prohibitive, we adopted an approximation that allowed us to estimate the cross-validatory predicted prevalences in all of our communities in a single fit of the model (*26, 27*).

The comparison between observed and predicted prevalence is summarized via the tail area probability, or the Bayesian *P*-value:* P*(predicted__>__observed) (*24, 25*). A probability close to 0% or to 100% would mean that prediction is nearly always lower or higher, respectively, than that observed, thus casting doubt on the model. In our context, it is less acceptable not to treat a meso-or hyperendemic community than to treat a hypoendemic community, so particular concerns would surround a model that led to estimates of *P *close to 0%.

In addition to the deviance information criterion and* P*(predicted__>__observed) values, we calculated posterior estimates of the residuals (predicted observed) and their 95% BCI, as a third criterion of "goodness of fit". Finally, for the model that seemed to be the most accurate according to these three criteria, we calculated the probability that each community was at least mesoendemic (*P*(predicted prevalence of microfilariae >20%)).

**Results**

Table 1 gives the characteristics of the study population. The overall observed prevalence of microfilariae was 32.8% (240/ 732), although the variation between communities was large, ranging from 0% to 100% (Fig. 2).

Accessibility to the community was associated strongly and negatively with the presence of a mission. A decision was made not to include these variables in the same model, therefore, as this introduced near-multicollinearity (data not shown). The inclusion of the presence of a mission or of accessibility did not provide significant improvements in fit, as defined above. Host age and village altitude thus were confirmed as important predictors of individual infection status. The variable "sex" was kept in all models to control for its possible confounding effect. The effect of altitude on an individual's infection status was modified clearly according to which river their community was located along (Fig. 2). The coefficient for altitude was assumed, therefore, to be river-specific. The small numbers of communities along each river meant that we achieved greater precision in our estimates of river-specific coefficients by assuming that they are exchangeable a priori.

The values of the deviance information criteria for the three models tested were highly supportive of the two models that acknowledged clustering of infection within communities. The deviance information criterion for model 1 was 669, for model 2 was 588, and for model 3 was 587 a difference of more than 80 was seen between the flat and hierarchical models.

Fig. 3 shows the predicted prevalences of microfilariae and their 95% BCIs, along with observed prevalences. The predictions from model 3 were markedly better than those from model 2. Table 2 gives Bayesian* P*-values and residuals. Model predictions seem to be reasonably good for all communities except for community 3 (Aweitheri), which is at a relatively low altitude (162 m) but has a very high recorded prevalence of infection (86%).

Model 3 showed that prevalence odds ratios increased with age: prevalence odds ratios were 2.44 (95% BCI 1.294.17) times higher for 2039 year olds than for individuals aged 1519 (reference group) and 4.18 (2.007.76) higher for those aged __>__40 years. The prevalence odds ratio also was slightly higher for men (1.40, 0.882.14) than women (reference group). The effect of altitude varied according to which river the community was situated along. For communities situated along the Ocamo and Orinoco rivers, the prevalence odds ratio increased as the altitude measured in metres above sea level increased: 1.03 (1.021.04) vs 1.04 (1.021.05), respectively. In contrast, altitude had a negligible effect on the prevalence odds ratio for communities along the Padamo river (1.01, 0.961.06), and even slightly reduced the prevalence odds ratio for communities along the Mavaca river (0.99, 0.971.00). These prevalence odds ratios indicate the increase in the prevalence odds for a community only one metre higher than its reference.

In model 3, the residual variability in (adjusted) prevalence between communities on the logit scale was decomposed into the variability between communities situated along the same river and the variability between rivers. After age, sex, and the differential effect of altitude on probability of infection were adjusted for, approximately 77% of the between-community variability was attributed to differences between rivers. This information is crucial to the reliable prediction of community-specific prevalence (Fig. 3).

To demonstrate the usefulness of model 3 in identifying communities that warrant mass ivermectin treatment, the approximate cross-validatory probability that the model would predict a prevalence of microfilariae >20% was assessed for each community. Twenty-five of the 29 communities were classified correctly (Fig. 4). One of the incorrectly classified communities (Purima) had an observed prevalence of 19.4%, but its probability of being classified as at least mesoendemic was 65%. Toothothopiwei had an observed prevalence of 39%, but the probability that we classified this as at least mesoendemic was 45%, marginally below our decision threshold. Haruri is on the banks of the Padamo river, yet unlike the other three communities alongside this river, its prevalence of onchocerciasis was mesoendemic (observed prevalence, 29%). Finally, Yepropë had a very low recorded prevalence (only 8%) compared with other communities lining the Ocamo at the same altitude.

**Discussion**

This paper shows the broad applications and usefulness of Bayesian random-effects models in dealing with covariates measured at various levels, clustering effects, and missing data. Another advantage of this approach is its ability to obtain estimates of arbitrary functions of model parameters, which automatically and coherently take into account all sources of uncertainty. We were able to predict correctly the outcome variable of interest for control planning purposes (that is, the probability that a new community has a prevalence larger than a predetermined threshold) in 25 out of 29 Yanomami communities of the Amazonian focus of onchocerciasis.

In this focus, it had already been established that communities situated along the Ocamo and Orinoco rivers and higher than 200 m above sea level were hyperendemic (prevalence of microfilariae __>__60%) (*12*). Below this altitude, however, prevalence had been shown to range from hypoendemic to hyperendemic, so that other criteria on which a decision on whether a community should receive priority treatment could be made needed to be identified. We extended the analysis to two additional river systems (Padamo and Mavaca) and to other community-level variables.

The study population represented approximately 40% of the Yanomami communities located lower than 250 m above sea level in the Venezuelan Amazonian focus of onchocerciasis; this in turn represented 30% of the total communities (*28*). The study communities varied widely in their degree of contact with mainstream culture because their selection was not biased in favour of the most accessible communities, and this allowed us to explore their contribution to the predictive model. In addition to the four rivers investigated, other river systems in the region (for example, Siapa) hardly have been studied. Ideally, the same exercise should be run for a sample of communities located on all rivers.

Our results show that most variability in prevalence is attributable to differences between rivers, but that important between-community variations remained, even within river systems. At the community level, and for a chronic infection such as onchocerciasis that has average latent and duration periods of one year and >10 years, respectively, the patterns of micro-movements (every 23 years) and macro-movements of Yanomami communities (*29*) may contribute towards the variation of prevalence of microfilariae among villages. Over the last 200 years, the Yanomami people have tended to migrate from the higher altitudes of the Parima mountains to lower riverine locations (*29*). Some communities now found at similar altitudes may have had different geographical origins within the region, and geographical proximity between villages is not necessarily a good reflection of their contact and exposure patterns.

Another factor that could explain the large variation in prevalence of microfilariae among communities may be the slope of the terrain. We showed that within the Orinoco and Ocamo river systems, prevalence of microfilariae increased with altitude. Along these two rivers, altitudinal gradient was a strong determinant of the presence, species composition, and abundance of blackfly vectors, which also differed in vectorial competence and capacity (*10, 30, 31*). Altitude itself, therefore, had a strong biological effect on infection prevalence. In contrast, the effect of altitude was negligible for communities located along the Padamo and Mavaca rivers. The rate at which altitude varies with distance is very different between these rivers, and slope (rather than just altitude) could influence the distribution of sites suitable for the immature stages of the different vector species.

At the river level, micro-movements of communities usually take place along the same river, and some river-specific variables might explain why prevalence of microfilariae is higher along the Ocamo and Orinoco rivers. The variability among rivers could be due to ecological factors that in turn determine the availability and productivity of the breeding sites of the different vector species, as well as the abundance and age-structure of the biting population (*32*). As an example, previous work conducted between 1995 and 1999 showed that, although the three vector species* Simulium guianense*,* S. incrustatum*, and* S. oyapockense* were present in communities along the Ocamo river,* S. incrustatum* consistently was absent along the Orinoco (*32*). Preliminary entomological surveys along the Padamo river showed that *S. oyapockense *(a less successful vector) was present but that* S. guianense* (the species with highest vector competence) was absent. Other, as yet unrecognized, characteristics of the rivers or characteristics common to communities situated along the same rivers may be important for the transmission of onchocerciasis. Investigation of the pattern of estimated community- and river-specific intercepts may help generate new hypotheses about these characteristics.

Given our current knowledge about factors that precisely determine prevalence of onchocerciasis, accurate prediction of the prevalence of infection remains difficult, and even our best model is imperfect. The problem is of obvious importance, however, as decisions must be made about which communities should be included in control programmes. As more data become available, our model can be updated to provide increasingly accurate predictions.

Although we considered variables other than altitude in our analyses, we were unable to disentangle behaviour-related pathways from vector-related pathways associated with the prevalence of microfilariae. This was because most missions were located at a low altitude (and therefore were easier to access) in our dataset. *S. oyapockense* had already been shown to be the predominant species at sites lower than 150200 m above sea level (*11, 31*). A larger dataset, with more altitudinal variation in the location of missionary posts, would be needed to disentangle the possibly independent effects of altitude (our indicator of prevailing vectors) and presence of a mission (our indicator of cultural changes). Well-established mission posts are found at 250 and 950 m (in villages located in the Sierra Parima, which were not analysed here therefore), and these have a predominance of the highly competent *S. guianense* (*30, 31*) and a high (pre-control) prevalence of microfilariae despite long-lasting missionary influence (*811, 33*).

Although we have shown the application of a Bayesian approach to the selection of communities with prevalence of microfilariae >20%, a different threshold for mesoendemicity could be set according to the epidemiological patterns of a given focus of onchocerciasis. In the Amazonian focus, previous research highlighted that prevalence of microfilariae increases with age, the threshold prevalence could be 30% instead of 20% when the indicator age group is used (*12, 34*). In Africa, the threshold prevalence would be set at 3540% (*13, 15*). The approach presented here would need information to be collected on the age and sex distribution, as well as on the size of Yanomami communities not yet evaluated. This is becoming possible, as several health care programmes are being implemented, and communities are visited regularly by trained personnel (*35*). In this way, Bayesian modelling could play an important role in the planning of community-based interventions of onchocerciasis in general and of control programmes in particular.

**Acknowledgements**

We thank all the Yanomami communities that participated in this study. William Bourgeon, Hortensia Frontado, Mayila García, Miliam Pacheco, Nátali Vásquez, and Néstor Villamizar from Centro Amazónico para Investigación y Control de Enfermedades Tropicales helped process the parasitological and entomological samples. We also wish to thank the Onchocerciasis Elimination Program for the Americas, the health personnel of the Upper Orinoco Health District, the authorities of the Regional Health Directorate, the missionaries, and the Venezuelan Air Force for logistical support. Roberto Barrera kindly helped prepare the maps in Fig. 1. Finally, we thank David Balding and two anonymous referees for their helpful comments on an earlier draft of this paper.

**Funding: **H.C. and M.-G.B. received funding from the Wellcome Trust, M.E. and M.-G.B. from the British Council, S.V.-M. and C.B. from World Bank (grant no. BM-VEN-96002), M.E. and C.B. from Onchocerciasis Elimination Program for the Americas, and L.J. from the Canadian Institute for Health Research.

**Conflicts of interest:** none declared.

**References**

1. Laird NM, Warem JH. Random effects models for longitudinal data. *Biometrics* 1982;38:963-74.

2. Bryk AS, Raudenbush SW. *Hierarchical models*. Newbury Park (CA): Sage; 1992.

3. Longford NT. *Random coefficient models*. Clarendon Hills (IL): Clarendon Press; 1993.

4. Goldstein H. *Multilevel statistical models*. London: Edward Arnold; 1995.

5. World Health Organization. *Report of the WHO informal consultation on hookworm infection and anaemia in girls and women*. Geneva: WHO; 1996.

6. World Health Organization. *Guidelines for the evaluation of soil-transmitted helminthiasis and schistosomiasis at the community level*. Geneva: WHO; 1998.

7. *Onchocerciasis and its control. Report of a WHO Expert Committee on Onchocerciasis Control*. Geneva: WHO; 1995 (WHO Technical Report Series, No 852).

8. Rassi E, Monzón H, Castillo M, Hernández I, Ramírez-Pérez J, Convit J. Discovery of a new onchocerciasis focus in Venezuela. *Bulletin of the Pan American Health Organization* 1977;11:41-64.

9. Yarzábal L, Botto C, Arango M, Raga LM, Wong F, Allan R, et al. Epidemiological aspects of onchocerciasis in the Sierra Parima, Federal Territory of Amazonas, Venezuela. In: Yarzábal L, Botto C, Allan R, editors. *La oncocercosis en América*. Caracas: Centro Amazónico para la Investigación y Control de Enfermedades Tropicales; 1985. p. 43-63.

10. Basáñez M-G, Yarzábal L. Onchocerciasis in the Sierra Parima and Upper Orinoco regions, Federal Territory of Amazonas, Venezuela. In: Miller MJ, Love EJ, editors. *Parasitic diseases: treatment and control.* Boca Raton (FL): CRC Press; 1989. p. 231-56.

11. Vivas-Martínez S, Basáñez M-G, Grillet M-E, Weiss H, Botto C, García M, et al. Onchocerciasis in the Amazonian focus of southern Venezuela: altitude and blackfly species composition as predictors of endemicity to select communities for ivermectin control programmes. *Transactions of the Royal Society of Tropical Medicine and Hygiene* 1998;92:613-20.

12. Vivas-Martínez S, Basáñez M-G, Botto C, Villegas L, García M, Curtis CF. Parasitological indicators of onchocerciasis relevant to ivermectin control programmes in the Amazonian focus of southern Venezuela. *Parasitology* 2000;121:527-34.

13. Richards Jr FO, Boatin B, Sauerbrey M, Sékétéli A. Control of onchocerciasis today: status and challenges. *Trends in Parasitology* 2001;17:558-63.

14. Onchocerciasis Elimination Program for the Americas. *Evaluaciones epidemiológicas de la oncocercosis en América*. Taller Operativo de Epidemiología. Ecuador: OEPA; 1996.

15. Prost A, Hervouet JP, Thylefors B. Les niveaux d'endémicité dans l'onchocercose.* Bulletin of the World Health Organization* 1979;57:655-62.

16. Gelman A, Carlin JB, Stern HS, Rubin DB. *Bayesian data analysis*. London: Chapman and Hall; 1995.

17. Gilks WR, Richardson S, Spiegelhalter DJ. *Markov chain Monte Carlo in practice*. London: Chapman and Hall; 1996.

18. Spiegelhalter D, Thomas A, Best N. WinBUGS Version 1.3. 2000. Available from: URL: http://www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml

19. Gelman, A, Rubin, D. Inference from iterative simulation using multiple sequences. *Statistical Science* 1992;7:457-511.

20. Cowles MK, Carlin BP. Markov chain Monte Carlo convergence diagnostics a comparative review. *Journal of the American Statistical Association* 1996;91:883-904.

21. Spiegelhalter DJ, Best NG, Carlin BP. Bayesian measures of model complexity and fit. *Journal of the Royal Statistical Society, Series B* 2002;64:583- 616.

22. Lindsey JK. *Parametric statistical inference*. Oxford: Oxford Science Publications; 1996.

23. Burnham KP, Anderson DR. *Model selection and inference. A practical information theoretic approach*. New York (NY): Springer-Verlag; 1998.

24. Gelman A, Meng X-L, Stern H. Posterior predictive assessment of model fitness via realized discrepancies. *Statistica Sinica* 1996;6:733-807.

25. Gelfand AE, Dey DK, Chang H. Model determination using predictive distributions with implementation via sampling based methods. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. *Bayesian statistics 4*. Oxford: Oxford University Press; 1992. p. 147-67.

26. Marshall EC, Spiegelhalter DJ. Approximate cross-validatory predictive checks in disease mapping models. *Statistics in Medicine* (forthcoming).

27. Marshall EC, Spiegelhalter DJ. *Identifying outliers in Bayesian hierarchical models: a simulation-based approach. Technical report. 2003*. London: Department of Epidemiology and Public Health, Imperial College; 2003.

28. Ministerio del Ambiente y Recursos Naturales Renovables, Servicio Autónomo de Desarrollo Ambiental Amazonas. *Proyecto reserva de la biósfera Alto Orinoco-Casiquiare, Venezuela*. Informe Técnico no. FT/93/09. [ Project for the biosphere reserve of the Upper Orinoco-Casiquiare area. Technical report no. FT/93/ 09.] Caracas: Ministerio del Ambiente y Recursos Naturales Renovables; 1998. In Spanish.

29. Chagnon NA. *Yanomamö.* Fort Worth (TX): Harcourt Brace College Publishers; 1997.

30. Takaoka H, Suzuki H, Noda S, Tada I, Basáñez M-G, Yarzábal L. Development of *Onchocerca volvulus larvae in Simulium pintoi* in the Amazonas region of Venezuela. *American Journal of Tropical Medicine and Hygiene* 1984;33:414-9.

31. Basáñez M-G, Yarzábal L, Takaoka H, Suzuki H, Noda S, Tada I. The vectorial role of several blackfly species (Diptera: Simuliidae) in relation to human onchocerciasis in the Sierra Parima and Upper Orinoco regions of Venezuela.* Annals of Tropical Medicine and Parasitology* 1988;82:597-611.

32. Grillet M-E, Basáñez M-G, Vivas-Martínez S, Villamizar N, Frontado H, Cortez J, et al. Human onchocerciasis in the Amazonian area of southern Venezuela: spatial and temporal variations in biting and parity rates of black fly (Diptera: Simuliidae) vectors. *Journal of Medical Entomology* 2001;38:520-30.

33. Yarzábal L, Arango M, Botto C, Jaimes JL, Sánchez-Beaujon R, Raga LM. Nuevas observaciones sobre la endemia oncocercósica de la Sierra Parima, Territorio Federal Amazonas, Venezuela. [New observations on endemic onchocerciasis in the Sierra Parima, Amazonas Federal Territory, Vanezuela.] In: Yarzábal L, Holmes R, Basáñez M-G, Petralanda I, Botto C, Arango M, et al., editors. * Las filariasis humanas en el Territorio Federal Amazonas, Venezuela*. [Human filariases in the Amazonas Federal Territory, Venezula.] Caracas: PROICET-Amazonas; 1983. p. 3-19. In Spanish.

34. Basáñez M-G. *Report of a short-term epidemiology consultancy for the Onchocerciasis Elimination Program for the Americas*: Puerto Ayacucho: Centro Amazónico para la Investigación y Control de Enfermedades Tropicales, Onchocerciasis Elimination Program for the Americas; 1999.

35. Ministerio de Salud y Desarrollo Social, Dirección Regional de Salud de Amazonas, Distrito Sanitario del Alto Orinoco, Laboratorios de Salud Pública, Centro Amazónico de Investigación y Control de Enfermedades Tropicales.* Plan de salud para el pueblo Yanomami*. [Health plan for the Yanomami population]. Puerto Ayacucho: Ministerio de Salud y Desarrollo Social, Dirección Regional de Salud de Amazonas, Distrito Sanitario del Alto Orinoco, Laboratorios de Salud Pública, Centro Amazónico de Investigación y Control de Enfermedades Tropicales; 2000. In Spanish.

1 Correspondence should be addressed to this author.

**Appendix A. Detailed descriptions of models**

**Model 3 **

**Stage 1**

The observed status of infection with microfilariae (0/1), *Y _{icr}*, of individual

*i*(

*i*= 1,.., K

_{cr}) living in community

*c*(

*c*= 1, ... C

_{r}) along river

*r*(

*r*= 1, 2, 3, 4) is modelled as a Bernoulli variate with mean q

_{icr}. That is Y

_{icr }~ Bernoulli(q

_{icr}), where:

logit(q_{icr})= d_{cr }+ b_{age1} ´ age1_{icr} + b_{age2} ´ age2_{icr }+ b_{sex} ´ sex_{icr} (1)

The parameter q_{icr} corresponds to the underlying probability of infection for the given individual; b_{age1 }and b_{age2} represent the regression coefficients for the age groups 1 (2039 years) and 2 (__>__40 years), respectively; b_{sex} represents the effect of being male; and d_{cr} represents the underlying community-within-river specific intercept.

**Stage 2**

The community-specific intercepts are modelled in stage 2, where b_{alt[r] }represents the specific regression coefficients for altitude varying by river *r*, and f_{r} represents the underlying river-specific intercept. The parameter s_{c}^{2} reflects the variability in prevalence (on a logit scale) between communities located along the same river after adjustment for altitude.

d_{cr }~ Normal (µ_{cr}, s_{c}^{2}) (2)

µ_{cr} = f_{r }+ b_{alt[r]}´altitude_{cr} (3)

**Stage 3**

The river-specific intercepts and regression coefficients for altitude are assumed to be normally distributed, where l_{r} = global model intercept and n_{r}^{2} = measure of variability in prevalence between rivers.

b_{alt[r]}~ Normal (l_{alt}, n_{alt}^{2}) (4)

f_{r}~ Normal (l_{r}, n_{r}^{2}) (5)

**Stage 4**

At the fourth and last stage of the model, prior distributions are set for all unknown parameters, including s_{c}^{2}, l_{r}, n_{r}^{2}, and all regression coefficients. We used diffuse prior distributions, so that* a priori* all values in the feasible range have approximately equal values.

Fig. 5 gives a graphical representation (*1A*).

**Model 1**

In contrast to the above, model 1 assumes common underlying prevalence of infection in all communities after taking into account sampling variability and differences in known individual- and community-level factors. Thus, model 1 assumes simply that s_{cr }= l_{r} + b_{alt[r]}´ altitude_{cr} in equation (1) and ignores the additional complexity represented in equations (2)(5).

Diffuse priors are specified for all unknown parameters in model 1.

**Model 2**

Model 2 goes a step further than model 1 to acknowledge explicitly that, after adjustment for known risk factors, prevalence of infection still varies between communities (due to the effect of unmeasured or unmeasurable factors). Communities are, however, assumed to be fully exchangeable, and so, in the above notation, model 2 assumes d_{cr} ~N(µ_{cr}, s_{c}^{2}), where now µ_{cr =}l_{r }+ b_{alt[r]}´ altitude_{cr}.

Diffuse priors are specified for all unknown parameters in model 2.

**Model implementation**

Convergence of the Gibbs sampler was assessed informally by examining trace and auto-correlation plots, and, more formally, via Gelman and Rubin's criterion (see Cowles and Carlin (*2A*) and incorporated references). Auto-correlation plots (not shown) illustrated negligible within-chain correlation.

In the approximate cross-validation, we drew replicate community-specific random effects and used these to predict replicate individual-level outcomes (0 or 1). The mean of these predicted outcomes for each community formed a realization from the posterior distribution of the predictive prevalence that was conditional on the known characteristics of the individuals in that community and the community itself. We then compared the predicted prevalence with the observed prevalence in each community, by computing the posterior probability that the former was higher than or equal to the latter (that is,* P*(predicted__>__observed)). These Bayesian* P*-values were computed with a Markov chain Monte Carlo algorithm, by introducing a dummy indicator variable for each community that took the value 1 at a given iteration if, at that iteration, the value of the predicted prevalence was greater than or equal to the observed prevalence, and 0 otherwise. The average of the indicator variables over all iterations for a given community gave the required *P*-value.

**References**

1A. Whittaker, J. *Graphical models in applied multivariate analysis*. Chichester: Wiley; 1992.

2A. Cowles MK, Carlin BP. Markov chain Monte Carlo convergence diagnostics a comparative review. *Journal of the American Statistical Association* 1996;91:883-904.