# Abstract

## OBJECTIVE:

To present aspects of the sampling plan of the Brazilian Oral Health Survey (SBBrasil Project), with theoretical and operational issues that should be taken into account in the primary data analyses.

## METHODS:

The studied population was composed of five demographic groups from urban areas of Brazil in 2010. Two- and three-stage cluster sampling was used, adopting different primary units. Sample weighting and design effects (deff) were used to evaluate sample consistency.

## RESULTS:

In total, 37,519 individuals were reached. Although the majority of deff estimates were acceptable, some domains showed distortions. The majority (90%) of the samples showed results in concordance with the precision proposed in the sampling plan. The measures to prevent losses and the effects of the cluster sampling process in the minimum sample sizes proved to be effective for the deff, which did not exceed 2, even for results derived from weighting.

## CONCLUSIONS:

The samples achieved in the SBBrasil 2010 survey were close to the main proposals for accuracy of the design. Some probabilities proved to be unequal among the primary units of the same domain. Users of this database should bear this in mind, introducing sample weighting in calculations of point estimates, standard errors, confidence intervals and design effects.

Dental Health Surveys; methods; Cluster Sampling; Epidemiologic Research Design

# INTRODUCTION

The Brazilian Oral Health Survey (SBBrasil 2010) is a health monitoring strategy which uses primary data to produce information which aids in implementing oral health care policies. It is the second large-scale nationwide oral health survey; a similar, previous one was carried out in 2003. Two other nationwide surveys were carried out in 1986 and 1996, although only in the state capitals and assessing fewer health problems.

The SBBrasil 2010 was planned during 2009 and data collection took place between February and November 2010 in 177 municipalities, including the 27 state capitals. A total of 37,519 interviews and oral examinations were carried out on the age groups recommended by the World Health Organization (five year olds, 12 year olds, 15 to 19 year olds, 35 to 44 year olds and 65 to 74 year olds). The survey covered the main oral health care problems (dental caries, periodontal disease, malocclusion, fluorosis, trauma and edentulism), as well as socioeconomic data and information on use of orthodontic services, self-reported oral morbidities and self-perception of oral health. The final report and the database of original data are available on the Brazilian Ministry of Health General Coordination of Oral Health site.^{a}

There were 160 samples distributed according to 32 geographical domains, representing the populations of the aforementioned age groups resident in the state capitals or in municipalities in the interior of the five regions of Brazil. Obtaining epidemiological information directly from these samples, whether for an age group, a state capital or a municipality in the interior, requires knowledge of the sampling plan. In other words, the inferences made should take into account the method designed for the inclusion, that is, the selection, of a specific individual in the sample from the domain to which they belong. The general model used was cluster sampling in multiple stages, in which the sampling units were selected with probability proportional to the number of residences in each.

The aim of this article was to present aspects of the sampling plan, explaining theoretical and operational issues which should be taken into account when analyzing the primary data.

# METHODS

## Expected number of interviews and oral examinations

For those aged five and those aged 12, and for the 65 to 74 year old age group, the coefficient of variation of ratios was adopted as a measure of accuracy, because most of the health problems consisted of categorical variables. The quantitative decayed, missing and filled teeth index (DMFT) proved to be inadequate as a parameter due to its low mean value and high variability, especially at ages five and 12. The results obtained through the equation

CV(p) = [p(1 − p)/n]^{1/2} / p

varied between 3% and 27% depending on the expected prevalence values (p) for the population, when n = 125. As this was the minimum acceptable number for the domains of the abovementioned groups, it was verified that the absolute values of the standard error were below 5% and never above 18% of the rates of prevalence above 10%. In order to diminish the cluster sampling effect on this accuracy criterion, it was decided to double the number of interviews (deff = 2) and select 250 individuals in each domain.^{4}
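The quoted range can be checked numerically. The sketch below assumes the usual simple-random-sampling coefficient of variation of a proportion, CV(p) = sqrt(p(1 − p)/n)/p; the function name `cv_proportion` and the grid of prevalence values are illustrative and are not taken from the survey documentation.

```python
import math


def cv_proportion(p: float, n: int) -> float:
    """Coefficient of variation of a proportion under simple random sampling.

    Assumed form: CV(p) = sqrt(p * (1 - p) / n) / p.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n) / p


n_min = 125  # minimum number of interviews per domain quoted in the text

for p in (0.10, 0.20, 0.50, 0.90):  # illustrative prevalence values
    se = math.sqrt(p * (1.0 - p) / n_min)  # absolute standard error
    print(f"p = {p:.2f}  SE = {se:.3f}  CV = {cv_proportion(p, n_min):.1%}")
```

Under this assumption, n = 125 gives a coefficient of variation of roughly 27% at a prevalence of 10% and roughly 3% at a prevalence of 90%, with an absolute standard error that stays below 5%.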

For the 15 to 19 and the 35 to 44 year old age groups, the sample size was calculated using the expression n = [(s_{x} × 1.96)/m]^{2}, with 1.96 being the value of the normal distribution corresponding to the 95% confidence interval estimated for the mean number of decayed, missing and filled teeth (DMFT) in each domain; (m) is the tolerable margin of error inherent to the simple random sampling process; and (s_{x}) estimates the standard error using data from the 2003 survey. The initial results were corrected to compensate for a response rate of 80% and a design effect (deff) of 2.
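A minimal sketch of this calculation follows. The text does not spell out how the two corrections were applied; the code assumes they are multiplicative (n = n0 × deff / response rate). The function name and the input values are hypothetical.

```python
import math


def sample_size_mean(s_x: float, m: float, z: float = 1.96,
                     deff: float = 2.0, response_rate: float = 0.80) -> int:
    """Sample size for estimating a mean, rounded up to a whole interview.

    n0 = (z * s_x / m) ** 2 is the simple-random-sampling size.
    Assumption: the corrections are applied as n0 * deff / response_rate.
    """
    if s_x <= 0 or m <= 0 or deff <= 0:
        raise ValueError("s_x, m and deff must be positive")
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must be in (0, 1]")
    n0 = (z * s_x / m) ** 2
    return math.ceil(n0 * deff / response_rate)


# Hypothetical inputs: variability term of 4.0 teeth, margin of error of 0.8 teeth.
print(sample_size_mean(s_x=4.0, m=0.8))
```

Setting `deff=1.0` and `response_rate=1.0` returns the uncorrected simple-random-sampling size, which makes the cost of the two corrections easy to see.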

The samples of residences in the 160 domains were calculated using the equation dom = n/(r × 0.9), in which “n” is the minimum number of interviews, determined by the aforementioned criterion of accuracy, and “r” is the density of elements (of each demographic group) per residence, calculated based on data from the 2000 demographic census. The correction of 0.9 aimed to prevent loss of accuracy due to closed or vacant residences and to refusals to take part in the study.
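The following sketch shows how the number of residences responds to the density of eligible individuals. It reads the equation as dom = n/(r × 0.9), so that the 0.9 correction inflates the number of residences, which is an interpretation of the text. The function name and the two density values are illustrative only.

```python
import math


def residences_needed(n: int, r: float, correction: float = 0.9) -> int:
    """Residences to select so that about n eligible individuals are found.

    Assumed reading of the equation: dom = n / (r * correction), rounded up.
    r is the number of eligible individuals per residence.
    """
    if n <= 0 or r <= 0:
        raise ValueError("n and r must be positive")
    if not 0.0 < correction <= 1.0:
        raise ValueError("correction must be in (0, 1]")
    return math.ceil(n / (r * correction))


# Illustrative densities: 5 and 15 eligible individuals per 100 residences.
for label, r in (("5 per 100 residences", 0.05), ("15 per 100 residences", 0.15)):
    print(label, residences_needed(250, r))
```

The lower the density of a demographic group, the more residences must be visited to reach the same number of interviews.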

## Sampling process

The Figure shows the distribution of the state capitals and the interior municipalities included in the macro-region samples. The method used for selecting the sample followed the general model of cluster sampling in multiple stages, with probability proportional to size (PPS).^{1} In the first stage, 30 census tracts were selected for each state capital and 30 municipalities for the interior of each region. These were the primary sampling units (PSUs), which were included when drawing up the files as well as in estimating standard errors and confidence intervals.

In the second stage, residences in the census tracts of each capital and two census tracts in each of the municipalities making up the sample from the interior were selected. Each geographical region contained 30 tracts for each capital and 60 for the sample of municipalities in the interior. In the third stage, which only applied to the municipalities in the interior, residences were randomly selected within each of the census tracts selected in the previous stage.

In the samples of residences in each demographic domain and each group, all of the elements deemed eligible were interviewed and examined. Therefore, the probability of an individual being selected was the same as the probability of their residence being selected.

In the state capitals, the equation

calculates the theoretical probability of a residence being included in the sample, in which (*D*_{j}) is the number of residences in the j^{th} census tract and (d) is the number of residences selected within each census tract. For those residing in the interior, where the selection process had three stages, the probability of inclusion was calculated by

^{th} census tract situated in the territory of the municipality (j) selected in the first stage. In the interior, (d) was also the number of residences selected within the census tract.

However, in both equations, the denominator of the last fraction recorded the result of the quick count, conducted in the field, in order to update the data from the 2000 census, which was used to select the municipalities or census tracts in the previous stages. The self-weighting of the samples according to the PPS method was therefore abandoned, and the equations shown were effectively calculated substituting these terms with their respective values (*D*^{’}_{j}) and (*D*^{’}_{m.j.i}) updated in 2010.
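The effect of the quick count on self-weighting can be illustrated with a generic two-stage PPS inclusion probability. This is a sketch under stated assumptions and not a reproduction of the survey's own equations: the form f = (a × D_j / D) × (d / D'_j), the function name and all numeric values are hypothetical.

```python
def inclusion_probability(a: int, D_j: float, D_total: float,
                          d: int, D_j_updated: float) -> float:
    """Generic two-stage inclusion probability of a residence (assumed form).

    Stage 1: tract j is drawn with PPS           -> a * D_j / D_total
    Stage 2: d residences are drawn in the tract -> d / D_j_updated
    D_j is the census count used for the PPS draw; D_j_updated is the
    count found in the field (the quick count).
    """
    if min(a, D_j, D_total, d, D_j_updated) <= 0:
        raise ValueError("all arguments must be positive")
    if d > D_j_updated:
        raise ValueError("cannot select more residences than the tract holds")
    return (a * D_j / D_total) * (d / D_j_updated)


# Hypothetical capital: 30 tracts drawn from 100,000 census residences,
# 20 residences selected in each tract.
a, D_total, d = 30, 100_000, 20
tracts = {
    "tract A": (400, 400),  # quick count equals census count
    "tract B": (400, 500),  # tract grew since the census
    "tract C": (250, 200),  # tract shrank since the census
}
for name, (census, quick) in tracts.items():
    f = inclusion_probability(a, census, D_total, d, quick)
    print(f"{name}: f = {f:.4f}, weight = {1.0 / f:.1f}")
```

When the quick count equals the census count, the tract size cancels and f reduces to a × d / D for every tract, which is the self-weighting property. Any discrepancy between the two counts makes f differ between tracts of the same domain.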

## Sampling weights and design effects

The sample weights were calculated by the inverse of the probability equations (f)^{-1} and added to the files of the individuals examined. This meant attributing the data from each element included in the sample to those not included in the same PSU. This mechanism can reduce potential bias due to the disproportionality of the numbers observed in the interviews between PSUs. In theoretical terms, it means affirming that the sampling plan does not follow the principle of self-weighting, according to which the probability of an individual being included in the samples for all the domains, in each demographic group,^{3} would be equal and expressed by (f = n/N).

The weights (w) were calculated for each primary sampling unit of the sample, including, as seen in the mathematical equations, the terms of the probability of being selected at each stage. Operationally, the result obtained for a PSU was attributed to all of the individuals included in it, and the final data file contains this weight in each individual record.
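As a minimal illustration of how such weights enter a point estimate, the sketch below computes a weighted proportion from records that each carry the inclusion probability of their PSU. The record layout, the function name and the values are hypothetical.

```python
from typing import Iterable, Tuple


def weighted_proportion(records: Iterable[Tuple[float, int]]) -> float:
    """Weighted proportion with w = 1 / f for each record.

    records: pairs (f, y), where f is the inclusion probability attached
    to the individual's PSU and y is 1 if the condition is present, else 0.
    """
    numerator = 0.0
    denominator = 0.0
    for f, y in records:
        if not 0.0 < f <= 1.0:
            raise ValueError("inclusion probabilities must be in (0, 1]")
        w = 1.0 / f  # sampling weight: inverse of the inclusion probability
        numerator += w * y
        denominator += w
    if denominator == 0.0:
        raise ValueError("no records supplied")
    return numerator / denominator


# Hypothetical records: (inclusion probability of the PSU, outcome).
records = [(0.006, 1), (0.006, 0), (0.0048, 1), (0.0075, 0)]

unweighted = sum(y for _, y in records) / len(records)
print(f"unweighted = {unweighted:.3f}, weighted = {weighted_proportion(records):.3f}")
```

The two estimates coincide only when every record has the same inclusion probability, which is the self-weighting case.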

Estimates of means or proportions, standard errors and confidence intervals were calculated with and without the basic weight, using the “SVY” (survey) module of the Stata program, version 11.2. This application introduces design variables (defining the domains) and basic weights in the statistical process. Estimates of standard error were calculated using the Taylor linearization method, applicable to data from complex sampling plans.^{1,2}

Design effects (deffs) were calculated for the estimates of each domain according to geographical region and demographic group. Comparing these measures calculated with and without the basic weight made it possible to assess the effect of intra-class homogeneity and the impact of the sample weights on accuracy.^{3}
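The idea behind a design effect can be sketched for a weighted proportion within a single domain. The code below uses a first-order (Taylor) linearization with PSUs treated as clusters sampled with replacement, and takes p(1 − p)/m as the simple-random-sampling variance. These choices, the function name and the data are assumptions made for illustration; this is not a reproduction of the Stata computation.

```python
import math
from collections import defaultdict


def prevalence_se_deff(records):
    """Return (p, se, deff) for a weighted proportion in one domain.

    records: list of (psu_id, weight, y) with y equal to 0 or 1.
    Design variance: linearized scores w * (y - p) / sum(w), totalled by
    PSU, with the between-PSU estimator k / (k - 1) * sum(z_c ** 2).
    SRS variance (assumed form): p * (1 - p) / m, m = number of records.
    """
    total_w = sum(w for _, w, _ in records)
    p = sum(w * y for _, w, y in records) / total_w

    # Linearized score of each record, summed within its PSU.
    psu_totals = defaultdict(float)
    for psu, w, y in records:
        psu_totals[psu] += w * (y - p) / total_w

    k = len(psu_totals)
    if k < 2:
        raise ValueError("at least two PSUs are needed")
    # PSU totals sum to zero by construction, so no centring is required.
    var_design = k / (k - 1) * sum(z * z for z in psu_totals.values())

    var_srs = p * (1.0 - p) / len(records)
    if var_srs == 0.0:
        raise ValueError("deff is undefined when p is 0 or 1")
    return p, math.sqrt(var_design), var_design / var_srs


# Hypothetical domain: 4 PSUs of 5 individuals each, equal weights.
outcomes = {"A": [1, 1, 1, 1, 1], "B": [0, 0, 0, 0, 0],
            "C": [1, 1, 1, 0, 0], "D": [0, 0, 1, 1, 0]}
records = [(psu, 1.0, y) for psu, ys in outcomes.items() for y in ys]

p, se, deff = prevalence_se_deff(records)
print(f"p = {p:.2f}, se = {se:.3f}, deff = {deff:.2f}")
```

In this toy data the PSUs are internally homogeneous, so the deff is well above 1. Running the function once with the actual weights and once with all weights set to 1 mirrors the with-and-without-weight comparison described above.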

The SBBrasil 2010 project was carried out following the standards set by the Declaration of Helsinki and was approved by the *Conselho Nacional de Ética em Pesquisa*, record no. 15.498, 7^{th} January 2010.

# RESULTS

The sample was divided into geographic regions, defined by the 27 state capitals and the 150 interior municipalities in the five macro-regions of Brazil (Figure). In total, 1,110 census tracts were selected: 30 for each state capital and 60 for each sample of the municipalities in the interior.

The number of residences selected in order to achieve the minimum number of interviews and oral examinations in the domains can be found in Table 1. It can be seen that, in the majority of cases, the results are greater for the groups of five and 12 year olds, who have lower intra-residence density. The only exception is in the North, which showed higher samples for the elderly in Porto Velho, Macapá and Palmas. This important demographic detail should not be overlooked in sampling plans which take the residence as the sampling unit at some stage of the selection. For example, in the state capital São Paulo, in order to achieve 250 interviews, it was necessary to select 5,637 residences for the first group and almost three times fewer (1,904) for the group aged 65 to 74. This difference is the result of unequal densities, calculated by the ratio of individuals/residence, equal to five children or 15 elderly individuals for each 100 residences.

**Table 1.** Number of selected residences according to samples of individuals (n), age group and geographic domain. SBBrasil, 2010.

Identifying and selecting the addresses within each census tract, supervised by the research coordination team, sought to preserve the criteria of accuracy defined in the sampling plan. However, the effective number of interviews and oral examinations achieved in each sample was rarely the same as or above the defined minimums (Table 2). Only in the samples in the interior of each geographical region were the minimums defined in the sampling plan preserved, achieving at least 70% of the planned interviews in all of the domains.

**Table 2.** Number of interviews achieved according to geographic domains and age groups. SBBrasil, 2010.

In the state capitals, despite the process of updating the register of residences in each census tract, circumstances related to the infrastructure and logistics of field work can be associated with the results found. Almost half of the samples in the 35 to 44 year old age group did not achieve 50% of the number expected in the plan, thus losing the protection included against the cluster sampling effect (deff = 2). Of the 13 occasions on which this occurred, 10 were in capitals in the North or Northeast. Samples with a performance below 50% of the expected sample size may record abrupt departures from the accuracy criteria, and their estimates of standard error and deffs should be analyzed with caution.

Table 3 shows the results of estimated prevalence, standard error and deff for the five year old, 12 year old and 65 to 74 year old age groups, according to geographic domain. For the five year olds, the prevalence of dental caries in deciduous teeth is illustrated, represented by the proportion of individuals with deft ≥ 1 and, at age 12, the prevalence in permanent teeth is shown (proportion of individuals with DMFT ≥ 1). The prevalence of bleeding gums is shown for the 65 to 74 year old age group. In general, the impact of the cluster sampling process and of the sampling weight on accuracy is low. Occurrences such as the prevalence estimate of bleeding gums among 65 to 74 year olds in Campo Grande, where the deff doubled when estimated with weights, were rare. Also, as expected in the planning, the values for the coefficients of variation do not exceed 15%.

Estimates of the mean, standard errors and deffs of the DMFT in the 15 to 19 year old group and the 35 to 44 year old group (Table 4) may also be considered stable in the great majority of domains and, thus, compatible with the accuracy criteria set in the design. However, the highest values for deff were reached in the 35 to 44 year old age group in Macapá, São Luís and municipalities in the interior of the macro-regions, for estimates without weighting. These results are due to the impact of similarity between the individuals who made up the primary sampling units and were expected since, as seen before, it is in this group that the greatest deviations between the sample sizes proposed by the design and those actually achieved occurred. This is because interviews were only carried out with eligible elements residing in the selected residences included in the sample.

# DISCUSSION

It can be concluded that the samples achieved in the SBBrasil 2010 come close to the principles proposed in the design. With a response rate of above 70% for the residences selected, the probability of being included for all of the individuals was the same as the probability of their residences being selected. However, due to the difference in the age group composition of the base addresses used and what was actually found in the field, these probabilities ended up being unequal between primary units in the same domain. Thus, the users of this database should bear in mind this peculiarity, introducing sampling weight in calculations of point estimates, standard errors, confidence intervals and design effects.

The results shown for the deffs consolidate the results obtained for the means and proportions in four demographic domains. The conservative design feature seems to have preserved the accuracy criteria and the cluster effect, keeping them at desired levels. The number of interviews and examinations carried out was above the minimum planned in 142 of the samples, with deff values of 2 or below. Unfortunately, the 18 samples which achieved below 50% and with deffs above 2 were concentrated in the demographic domain of the 35 to 44 year old age group and in the North and Northeast of Brazil. These results should guide future sample designs for epidemiological surveys for this demographic section of the Brazilian population, especially those which, like the SBBrasil 2010 project, work with professionals of local health care services in order to obtain samples of residences and to carry out interviews and examinations.

Thus, it is important to highlight that the SBBrasil 2010 used an epidemiological survey model which could be incorporated into daily practice by the health care services as an essential tool in health care planning and assessment activities. It should, therefore, combine operational feasibility with representative data.

# References

^{1} Cochran WG. Sampling techniques. New York: John Wiley & Sons; 1977.

^{2} Kish L. Survey sampling. New York: John Wiley & Sons; 1965.

^{3} Korn EL, Graubard BI. Analysis of health surveys. New York: John Wiley & Sons; 1999. (Wiley Series in Probability and Statistics).

^{4} United Nations, Department of Economic and Social Affairs, Statistics Division. Household sample surveys in developing and transition countries. New York; 2005 [cited 2013 Aug 13]. (Series F, 96). Available from: http://stats.un.org/unsd/hhsurveys/pdf/Household_surveys.pdf

- ^{a} Ministério da Saúde. Coordenação de Saúde Bucal da Secretaria de Assistência à Saúde. Projeto SBBrasil 2010 - Pesquisa Nacional de Saúde Bucal. [cited 2013 Sep 04]. Available from: http://dab.saude.gov.br/cnsb/sbbrasil/index.html
- The *Pesquisa Nacional de Saúde Bucal 2010* (SBBrasil 2010, Brazilian Oral Health Survey) was financed by the General Coordination of Oral Health/Brazilian Ministry of Health (COSAB/MS), through the *Centro Colaborador do Ministério da Saúde em Vigilância da Saúde Bucal*, *Faculdade de Saúde Pública* at *Universidade de São Paulo* (CECOL/USP), process no. 750398/2010.
- This article underwent the peer review process adopted for any other manuscript submitted to this journal, with anonymity guaranteed for both authors and reviewers.
- Editors and reviewers declare that there are no conflicts of interest that could affect their judgment with respect to this article. The authors declare that there are no conflicts of interest.
- Article available from: www.scielo.br/rsp

**Correspondence:** Nilza Nunes da Silva - Depto. Epidemiologia - Faculdade de Saúde Pública - Universidade de São Paulo - Av. Dr. Arnaldo, 715 - Cerqueira Cesar - 01246-904 - São Paulo, SP, Brasil - E-mail: nndsilva@usp.br

# Publication Dates

**Publication in this collection**

Dec 2013

# History

**Received**

16 May 2012

**Accepted**

18 Apr 2013