Geocoding processes in cohort studies: methods applied in the EpiFloripa Aging

Catharina Cavasin Salvador, Adalberto Aparecido dos Santos Lopes, Danilo Resendes, Fernanda Faccio Demarco, Marcelo Dutra Della Justina, Renato Tibiriçá de Saboya, Cassiano Ricardo Rech, Eleonora d’Orsi

ABSTRACT

OBJECTIVE

To describe the process and epidemiological implications of georeferencing in EpiFloripa Aging samples (2009–2019).

METHOD

The EpiFloripa Aging Cohort Study sought to investigate and monitor the living and health conditions of the older adult population (≥ 60 years) of Florianópolis in three study waves (2009/2010, 2013/2014, 2017/2019). The residential addresses were spatialized with an automatic geocoding tool, which made it possible to investigate the effect of sample losses from georeferencing on 19 variables evaluated in the three waves. The influence of different neighborhood definitions (census tracts, Euclidean buffers, and buffers across the street network) was examined in the results of seven variables: area, income, residential density, mixed land use, connectivity, health unit count, and public open space count. Pearson’s correlation coefficients were calculated to evaluate the differences between neighborhood definitions for three variables: contextual income, residential density, and land use diversity.

RESULT

The losses imposed by geocoding (6%, n = 240) caused no statistically significant difference between the total sample and the geocoded sample. The analysis of the study variables suggests that the geocoding process may have included a higher proportion of participants with better income, education, and living conditions. The correlation coefficients showed little correspondence between measures calculated by the three neighborhood definitions (r = 0.37–0.54). The statistical difference between the variables calculated by buffers and census tracts highlights limitations in their use in the description of geospatial attributes.

CONCLUSION

Despite the challenges related to geocoding, such as inconsistencies in addresses, adequate correction and verification mechanisms provided a high rate of assignment of geographic coordinates. The findings suggest that adopting buffers, made possible by geocoding, has potential for spatial epidemiological analyses by improving the representation of environmental attributes and the understanding of health outcomes.

Health of Aged Persons; Environment and Public Health; Health Surveys; Geographic Mapping; Geographic Information Systems; Spatial analysis

INTRODUCTION

With the increase in the world urban population, a growing number of investigations seek to understand the relationships between urbanized environments and health outcomes¹. Planning and managing cities efficiently may promote health and well-being, as well as reduce the incidence of chronic non-communicable diseases²,³, with a lasting effect⁴. Geographic Information Systems (GIS) are a set of technologies that allow the integration, in the same environment, of variables about different aspects of reality and at different aggregation scales⁵,⁶. Geographic models based on GIS support the analysis of concepts related to health disparities, such as neighborhood context, health services availability, physical activity practice, and daily destination accessibility⁷, and can contribute to work on health and quality of life in cities.

Advances in GIS in the last two decades have increased the specificity with which an individual’s neighborhood environment can be spatially defined⁸. GIS analyses in the Collective Health field are generally based on the residential location of an individual, which can be defined at various levels of geographic resolution, such as: a) administrative boundaries (neighborhoods, municipalities, or other regionalizations); b) census tracts (territorial unit defined at each census by the Brazilian Institute of Geography and Statistics, IBGE, to control the collection of population data); and c) latitude and longitude of a residential address. For administrative boundaries and census tracts, converting the address into a coordinate is unnecessary; however, the correspondence of the address with the territorial limit under study should be verified. The latter, on the other hand, requires a process of converting textual addresses into geographic coordinates, known as geocoding⁶,¹¹.

The importance of geocoding for analyzing health data has been evidenced by national surveys¹².

Geocoding allows the adoption of buffers: zones around an individual’s home address (point) that establish a boundary area, defined by a specified maximum distance, within which spatial data of interest are aggregated. Buffers define and characterize the neighborhood accurately, helping to manage census tract limitations and the modifiable areal unit problem⁷. Despite the importance of the scale used to aggregate environmental variables, few studies have examined the influence of different neighborhood definitions on the results of analyses¹³. Thus, the objective attributes of the urban environment obtained with each type of geographical resolution may differ, overestimating or underestimating the real exposure of the participants of an epidemiological study to the attributes of interest.

Although the agility in the spatialization of a large volume of sites is an advantage of geocoding, the conversion process increases the risk of position and classification errors. Previous works have reported variable geocoding rates and losses caused by problematic addresses and poor record quality¹⁴. Errors can lead to incorrect descriptions of the built environment variables, distorted conclusions about the association between dependent and independent variables, and inadequate public health decisions¹¹. International studies use ArcGIS®/ArcView®, licensed software, for geocoding⁶, but point out risks of incorrect localization⁶ and errors when applied in other countries¹⁹. Other studies hire commercial companies with trained professionals, their own software, and continuous spatial corrections¹⁸. Therefore, to minimize internal geocoding expenses, high-quality locational data is critical.

The EpiFloripa Aging Cohort Study, conducted in Florianópolis, Santa Catarina, sought to investigate and monitor the living and health conditions of the older population (60 years or older) living in the urban area of the municipality²⁰. Publications from this project have, so far, used the census tracts as a spatial unit of analysis and representation of the participants’ neighborhoods²¹,²². With household geocoding, new studies can be developed, applying more specific units of analysis to the urban environment that can effectively be accessed within a certain time interval. However, this process imposes several technological and operational challenges that need to be addressed to ensure the reliability and accuracy of the results.

Thus, this study describes the process and epidemiological implications of geocoding the residences of the EpiFloripa Aging Cohort Study (2009–2019) participants. For the latter, more specifically, we: a) compare sociodemographic data, perception of the environment, and health conditions obtained for the total sample and for the portion that was geocoded, searching for possible distortions; and b) compare the performance of three possible neighborhood definitions (census tracts, Euclidean buffers, and buffers across the street network) for some relevant variables, such as income, residential density, mixed land use, and connectivity.

METHODS

The EpiFloripa Ageing project is a population-based cohort study developed by the Federal University of Santa Catarina²³. The spatial context of the study involves the entire city of Florianópolis (SC), with 421,240 inhabitants and 11.4% of the population over 60 years of age¹⁹. The sample was selected by clusters in two stages: the first-stage units were the census tracts and the second-stage units were the households. Initially, in 2009, the 420 urban census tracts of the municipality were ordered by income decile of the heads of households, and eight tracts were systematically drawn in each decile. Subsequently, to reduce the coefficient of variation of the number of households per tract, the tracts with the largest number of households (> 500) were divided and those with the lowest number (< 150) were grouped, which resulted in 83 tracts comprising a total of 22,846 households. At baseline, 1,911 older adults (≥ 60 years old) were identified and considered eligible.

Data were collected with a standardized questionnaire, applied in face-to-face interviews at the participant’s residence, which provided the registration data necessary for geolocation: the participant’s identification code (ID), name, telephone, street, residential number, residential postal code (ZIP code), and neighborhood.

The study had three waves of assessment: baseline (2009–2010), follow-up after five years (2013–2014), and follow-up after 10 years (2017–2019). The first wave involved 1,705 respondents; removing two duplicate participants and one with incompatible age took the sample to 1,702, keeping the response rate at 89.2%. The second wave reached 1,197 participants. From the third wave on, the study became an open cohort with 1,335 participants, of which 743 were follow-up interviews, 105 were older adults from the EpiFloripa Adult sample, and 487 were new recruits²³. Further methodological details can be found in previous studies²⁰,²³,²⁴.

The geocoding procedure followed several steps in this study, with three main strategies: a) address standardization; b) manual correction; and c) coordinate assignment and checking (Figure 1). The recurrence of incomplete address records or those with formatting incompatible with the geocoding program required standardization and normalization in a format suitable for import. For a low-cost procedure that does not require trained staff, we opted for the free Google Earth Pro software. It was also chosen because researchers qualified to use it were available and because it quickly and automatically processes the coordinates corresponding to the addresses⁹, suggesting corrections for invalid addresses.
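The standardization step can be illustrated with a minimal sketch. The study does not report its normalization rules, so the abbreviation table, the function name `standardize_address`, the output layout, and the example address below are all assumptions made for illustration only.

```python
import re
import unicodedata

# Hypothetical abbreviation table; the actual rules used in the study are not reported.
ABBREVIATIONS = {
    "r": "Rua",
    "av": "Avenida",
    "rod": "Rodovia",
    "serv": "Servidão",
    "trav": "Travessa",
}


def standardize_address(street, number, neighborhood,
                        city="Florianópolis", state="SC"):
    """Return a one-line address string in a uniform layout for batch import."""
    # Normalize accents to composed form and collapse repeated whitespace.
    street = unicodedata.normalize("NFC", street).strip()
    street = re.sub(r"\s+", " ", street)

    # Expand a leading street-type abbreviation such as "R." or "av".
    first, _, rest = street.partition(" ")
    key = first.rstrip(".").lower()
    if key in ABBREVIATIONS and rest:
        street = f"{ABBREVIATIONS[key]} {rest}"

    # Keep only the digits of the house number; missing numbers become "".
    number = re.sub(r"\D", "", str(number or ""))

    head = f"{street}, {number}" if number else street
    parts = [head, neighborhood.strip(), city, state, "Brazil"]
    return " - ".join(p for p in parts if p)


if __name__ == "__main__":
    # Made-up record, not taken from the study database.
    print(standardize_address("R.  das  Flores ", "123 A", "Centro"))
```

Records left without a house number after this step would still need the manual treatment described in the text.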

Figure 1
Geocoding processes applied in three monitoring waves in Florianópolis. EpiFloripa Ageing Cohort Study, 2009–2019.

To assess the coverage (proportion of successfully geocoded addresses) and the positional accuracy of the participants’ households (how closely the geocoded coordinates correspond to the true coordinates)¹¹, a preliminary geocoding of the baseline was generated (EpiFloripa Idoso, 2009–2010). It highlighted the need to correct the addresses, preparing them for a definitive import.

Strategies used to deal with incomplete addresses are among the main determinants of geocoding positional error¹¹. Thus, addresses that were not found were verified on a case-by-case basis (Figure 1). The correction process involved processing the database (Microsoft Excel 2013) and updating the addresses by consulting additional reported data. Searches on mapping sites (Google Maps, Google Street View) and municipal road system data (http://geo.pmf.sc.gov.br) supported the manual geocoding of addresses that were not found.

Due to the change in the number of census tracts by the IBGE between the 2000 and 2010 censuses, we chose to group tracts with similar mean per capita income, to guarantee a minimum number of older adults in each location. Thus, the study created what was called an Episector: a grouping of adjacent census tracts with similar characteristics, considering their geographical location and corresponding income decile¹⁵. The same grouping was used as a mechanism to verify geocoding.

To avoid sample loss, participants recruited in the first wave who lived outside the boundary of the selected Episector were reconsidered based on a safety margin defined by the average size of a block (100 meters around the Episectors). Thus, data from individuals living at the edges of the census tract, but within its zone of influence, were retained. For the participants in the three waves of the study, a location outside the tolerance margin of the Episector was disregarded as an error factor, favoring longitudinal studies.

In similar studies, inaccessible addresses were solved by generating a “midpoint of the street segment,” deriving a centroid⁶,²⁵. Therefore, for participants without a recorded residential number and without possibility of contact, the latitude/longitude coordinates of the centroid of the reported street were assigned. On long streets, the house numbering within the Episector in question was sought.
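As an illustration of the street-midpoint fallback, the sketch below finds the point halfway along a street centerline. It is an assumption-laden example: the street is taken to be a polyline in a projected coordinate system (meters), the function name `polyline_midpoint` is hypothetical, and the geometric centroid actually computed by GIS software can differ from this along-the-line midpoint on curved streets.

```python
import math


def polyline_midpoint(points):
    """Point halfway along a polyline given as [(x, y), ...] in projected meters."""
    if len(points) < 2:
        raise ValueError("a street needs at least two vertices")

    # Length of each consecutive segment and half of the total length.
    segments = list(zip(points, points[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    target = sum(lengths) / 2.0

    # Walk along the line until the segment containing the halfway point.
    walked = 0.0
    for (a, b), seg in zip(segments, lengths):
        if seg > 0 and walked + seg >= target:
            t = (target - walked) / seg  # fraction of this segment to cover
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        walked += seg

    # Degenerate case: every vertex coincides.
    return points[-1]


if __name__ == "__main__":
    # L-shaped street, 200 m long: the midpoint falls on the corner.
    print(polyline_midpoint([(0, 0), (100, 0), (100, 100)]))
```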

The same spatialization criteria were followed for the second and third waves of the study. Participants who changed addresses between the waves of research had their new home address checked and formatted for a new geocoding.

Participants with valid addresses were analyzed regarding 19 variables derived from the EpiFloripa Ageing, which encompass blocks of the questionnaire with sociodemographic data, perception of the environment, and health conditions across the three waves of follow-up. The information collection method has been described in previous studies²⁰,²³,²⁴. The data were compared with the total samples, to identify the effect of georeferencing losses on the sample data of the three waves. The statistical significance (at the 95% confidence level) of the difference between the values for the total sample and the geocoded sample was assessed with a Z test for proportions.
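A Z test for the difference between two proportions can be sketched as follows. This is a generic pooled two-sample version and not necessarily the exact procedure used by the authors; it treats the two samples as independent, which is a simplification here because the geocoded sample is a subset of the total sample. The category counts in the usage example are invented, while the sample sizes (1,702 and 1,570) are the wave 1 totals reported in the text.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided Z test for p1 = x1/n1 versus p2 = x2/n2.

    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; proportions are all 0 or all 1")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts for one category (for example, one income band):
    # 400 of 1,702 in the total sample versus 360 of 1,570 in the geocoded sample.
    z, p = two_proportion_z_test(400, 1702, 360, 1570)
    print(f"z = {z:.3f}, p = {p:.3f}, significant at 5%: {p < 0.05}")
```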

Neighborhood definitions were adopted according to three different units of spatial analysis (Figure 2). From the database of streets in the municipality (Florianópolis City Hall, PMF, 2012), Euclidean (circular) and network (detailed) buffers were generated, which were then compared with the area pre-delimited by the traditional analysis unit, the census tract. The dimension adopted for the buffer (500 meters) follows previous studies and is based on a distance that allows active travel²⁶ and on the average gait speed according to age group²⁷, representing 10 minutes of walking from home.
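To make the Euclidean buffer concrete, the sketch below counts facilities (for example, health units) within 500 meters of a home using great-circle distance. It is only an approximation of a circular buffer built in GIS software: the spherical Earth radius, the function names, and the coordinates are assumptions, and network buffers are not covered because they require the street graph.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_within_buffer(home, facilities, radius_m=500.0):
    """Count facilities whose straight-line distance from home is <= radius_m."""
    lat0, lon0 = home
    return sum(
        1 for lat, lon in facilities
        if haversine_m(lat0, lon0, lat, lon) <= radius_m
    )


if __name__ == "__main__":
    # Illustrative coordinates only, not study data.
    home = (-27.600, -48.550)
    units = [(-27.603, -48.550), (-27.610, -48.550)]
    print(count_within_buffer(home, units))
```

A 500-meter circular buffer covers π × 0.5² ≈ 0.785 km², and covering 500 meters in 10 minutes corresponds to a walking speed of about 0.83 m/s.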

Figure 2
Comparison between three types of neighborhood definition, census tract, Euclidean buffer, and detailed network buffer. EpiFloripa Ageing Cohort Study, 2009–2019.

To investigate the differences between the three neighborhood definitions, seven environmental variables were calculated for each spatial unit of analysis. For the samples geocoded in the three waves, the variables area (km²), mean income per capita (census tracts²⁸), residential density (housing per hectare), mixed land use (entropy), street connectivity (three intersections or more), and counts of health units and public open spaces were calculated²⁹. When census data were used with buffers, the tracts intersected by the buffer and the portion of each tract inside it were considered, weighting the values by the area of each census tract contained in the buffer. To perform the calculations, scripts were created in the QGIS Graphical Modeler, combining different analyses into a single process and taking the analysis unit as a calculation parameter.
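The area weighting and the entropy measure can be sketched in a few lines. The text only states that tract values were weighted by the tract area inside each buffer and that mixed land use was measured by entropy; the normalized Shannon form used below, the land use categories, and the function names are assumptions for illustration, not the authors' QGIS model.

```python
import math


def area_weighted_mean(pieces):
    """Area-weighted mean of tract values inside one buffer.

    pieces: [(tract_value, overlap_area), ...], one entry per census tract
    portion that falls inside the buffer (any consistent area unit).
    """
    total_area = sum(area for _, area in pieces)
    if total_area <= 0:
        raise ValueError("buffer does not overlap any tract")
    return sum(value * area for value, area in pieces) / total_area


def land_use_entropy(area_by_use):
    """Normalized Shannon entropy of land use shares, between 0 and 1.

    0 means a single land use; 1 means all k categories have equal area.
    """
    k = len(area_by_use)
    total = sum(area_by_use.values())
    if k < 2 or total <= 0:
        return 0.0
    shares = [a / total for a in area_by_use.values() if a > 0]
    return -sum(p * math.log(p) for p in shares) / math.log(k)


if __name__ == "__main__":
    # Hypothetical buffer overlapping two tracts: (income, overlap in km2).
    print(area_weighted_mean([(1000.0, 0.6), (2000.0, 0.2)]))
    # Hypothetical land use areas (m2) inside the same buffer.
    print(land_use_entropy({"residential": 500.0, "commercial": 300.0, "leisure": 200.0}))
```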

Medians and standard deviations were calculated for the variables income, residential density, and entropy. Finally, Pearson’s correlations between the representations by network buffer, circular buffer, and census tracts indicated the relationship between the spatial units for the same three variables. Scatter plots were used to represent the relationship between network buffers and census tract values for the three variables, showing how the different representations resulted in similar or different values.
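For reference, Pearson’s correlation between two representations of the same variable (for instance, residential density by network buffer and by census tract) can be computed as in the sketch below. The function name and the toy values are assumptions; in practice each list would hold one value per geocoded participant.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy values only: density per participant under two neighborhood definitions.
    network_buffer = [12.0, 30.5, 18.2, 44.0, 25.1]
    census_tract = [10.0, 41.0, 15.5, 38.0, 35.0]
    print(round(pearson_r(network_buffer, census_tract), 2))
```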

RESULTS

Figure 1 shows the number of successfully geocoded addresses and describes the specificities of the addresses during the three waves of data collection. The baseline data of the EpiFloripa Aging (2009–2010) required the highest percentage of adjustment (17% of the records were incomplete, nw1 = 301) and generated a higher number of losses than the other waves (nw1 = 132). Error correction and verification against the expanded limit of the Episector (census tract) identified addresses that were outside it, inconsistent, or without a house number (the latter geocoded by the centroid of the street). The second wave of the study (2013–2014) had 77 losses, and the third (2017–2019) had 31, most of which were due to moves to another municipality (nw3 = 22). Finally, reconsidering participants from the three study waves with residential locations outside the expanded limit of their respective Episector avoided 18 losses (Figure 1).
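The coverage per wave follows directly from the sample sizes and losses reported in the text, as the short calculation below shows. Only the function and variable names are assumptions; the counts are those given for each wave.

```python
# Interviewed participants and geocoding losses per wave, as reported in the text.
WAVES = {
    "wave 1 (2009-2010)": (1702, 132),
    "wave 2 (2013-2014)": (1197, 77),
    "wave 3 (2017-2019)": (1335, 31),
}


def coverage(interviewed, losses):
    """Return (geocoded count, coverage proportion) for one wave."""
    geocoded = interviewed - losses
    return geocoded, geocoded / interviewed


def pooled_loss_rate(waves):
    """Return (total losses, total interviewed, loss proportion) over all waves."""
    total = sum(n for n, _ in waves.values())
    lost = sum(losses for _, losses in waves.values())
    return lost, total, lost / total


if __name__ == "__main__":
    for label, (n, lost) in WAVES.items():
        geocoded, share = coverage(n, lost)
        print(f"{label}: {geocoded} geocoded ({share:.1%})")
    lost, total, rate = pooled_loss_rate(WAVES)
    print(f"all waves: {lost} losses out of {total} ({rate:.1%})")
```

The geocoded counts obtained this way (1,570; 1,120; 1,304) match the sample sizes given with Tables 2 and 3, and the pooled loss of 240 out of 4,234 addresses is about 5.7%, reported as 6%.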

Comparison between Total Sample and Geocoded Sample

Table 1 shows the percentage distribution and p-value according to sociodemographic data, environment and health condition perception of the total sample compared with the georeferenced sample, for the three follow-up waves.

Table 1
Older adults’ sociodemographic variables, environmental perception, and health conditions over three follow-ups in Florianópolis according to the total and georeferenced samples. EpiFloripa Aging Cohort Study, 2009–2019.

Comparing income and schooling values shows a small bias in the direction of higher incomes and higher education, although these differences are not statistically significant in any of the cases. The geocoded sample showed a reduced proportion of participants with up to 1 minimum wage and an increased proportion of individuals with more than 10 minimum wages. Similarly, the variables related to the environment also show a clear bias towards better conditions of the georeferenced samples compared with the total sample: in both wave 1 and wave 2, the georeferenced sample has more sidewalks, crosswalks, lighting, and safety during the day than the total, whereas only wave 1 has the same effect for the presence of flat streets, traffic conditions, safety at night, and the presence of public spaces. In all cases, however, these differences were not statistically significant.

The same pattern, although less pronounced, occurs for the variables of health perception, depression symptoms, cognitive deficit, and physical activity, which are more favorable in the georeferenced sample than in the total, whereas the reverse is true for overweight, diabetes, and hypertension.

Table 2 presents seven descriptive variables for the three spatial units considered here: census tracts, circular buffers, and network buffers. In general, the standard deviations of the two types of buffers are smaller than those of the census tracts. The values of the environmental characteristics for the three units indicate low variability between the neighborhoods across the three waves of the study, except for contextual income, which increased. Attributes such as mixed land use, number of health units, and number of public open spaces remain low over the three follow-up waves. These low values, observed in all three units of analysis, reveal limited access to different land uses and to health and leisure facilities in the sampled neighborhoods.

Table 2
Neighborhood characteristics of older adults’ residence over three follow-up waves in Florianópolis according to geocoded samples. EpiFloripa Ageing Cohort Study, 2009–2019. (nw1 = 1,570; nw2 = 1,120; nw3 = 1,304).

Table 3 shows that measures of mixed land use and residential density for circular and network buffers are highly correlated across the three waves, with values ranging from 0.74 to 0.83, whereas the correlation of both types of buffers with census tracts is much lower (0.37 to 0.54). For the income variable, all measures in all spatial units are highly correlated, ranging from 0.85 to 0.97.

Table 3
Pearson’s correlation between spatial units regarding income (average per capita income in BRL), residential density (dwellings per hectare), and objective entropy according to geocoded samples. EpiFloripa Ageing Cohort Study, 2009–2019. (nw1 = 1,570; nw2 = 1,120; nw3 = 1,304).

DISCUSSION

The geocoding of data from the EpiFloripa Ageing Cohort Study with Google Earth Pro had a high proportion of matches, despite the difficulties related to inconsistencies in the addresses. Among the residential data of the three study waves, only 6% (nw1,w2,w3 = 240) were considered losses, and 1% (nw1,w2,w3 = 44) received coordinates corresponding to the centroid of their respective street, which led to the absence of statistically significant difference between the total sample and the georeferenced sample (Table 1).

Although the coordinate assignment rate approached 100%, a substantial part of the losses involved addresses that were not found (nw1 = 79; nw2 = 39; nw3 = 9). This is partially explained by the physical-geographical characteristics of the municipality and its historical occupation process. The earlier rural structure and maritime flows led to the formation of a disjointed and fragmented urban fabric, with fishbone street patterns, varied easements, and disconnected peripheral neighborhoods [30]. In addition, the slight difference in the proportion of income groups indicates possible problems in geocoding populations of neighborhoods of lower socioeconomic status (Table 1).

In the insular portion, low-income settlements are located on hillsides and in areas with little accessibility [30]. Irregular occupation and urban exclusion produce inequalities in the municipal registry, which hampers georeferencing. This problem is not unique to this research: another Brazilian study [19] revealed weaknesses in the geocoding of less urbanized sectors, neighborhoods of lower socioeconomic level, and recent settlements, with irregular completeness and precision, which may impact public health and education actions precisely in the areas that need them most.

Another factor that may explain the volume of losses is the small number of interviewers in the field in the first wave of the study, their turnover, and the need for replacement in the second wave [9]. These factors limited the accuracy and rigor of the procedure for registering the participants’ home addresses. Additionally, 53 addresses were located outside the Episector, excluding participants of the three waves (nw1,w2,w3 = 18). These results reinforce the need for epidemiological studies to include in their planning training on how to obtain higher-quality, more accurate address data, or to use other forms of geolocation, such as devices for real-time location (e.g., mobile phones and portable GPS units), which can ensure higher quality of the georeferenced data.

Regarding the possibility of bias introduced by the losses imposed by geocoding, the p-values in Table 1 indicate that, for all variables considered and for the three waves, the total sample and the georeferenced sample showed no statistically significant difference. That is, the losses in the georeferencing of the three study waves did not affect their representativeness compared with the total sample. Despite this, all variables of built environment perception showed a slight increase in the georeferenced sample. Considering that higher values in these characteristics indicate areas of higher quality (greater presence of sidewalks, greater safety during the day and at night, etc.), this suggests that the geocoding process may have introduced a small (and statistically non-significant) distortion by including a higher proportion of participants with better levels of income, education, and living conditions. The proportions of income groups are consistent with this impression, reinforcing the earlier observation that losses were greater in areas with more socioeconomic problems.
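A check of this kind can be sketched as follows. This section does not state which test produced the p-values in Table 1, so the example is only an assumption: it applies a Pearson chi-square test (1 degree of freedom, no continuity correction) to a 2×2 table that contrasts geocoded and lost records by income group, a related but different comparison from the one in Table 1. The function name `chi_square_2x2` and all counts are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom and no
    continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts (not study data):
# rows = geocoded / lost, columns = lower income / higher income
stat, p = chi_square_2x2(700, 800, 50, 40)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

A p-value above the chosen significance level would be read as no evidence that losses are associated with income group, although a small shift in proportions of the kind described above could still be present.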

On the other hand, although the process caused sample losses, geocoding allowed the adoption of buffers, showing that buffer-based measures differ statistically from those calculated by census tracts and highlighting flaws in describing spatial attributes calculated on this territorial unit. The artificial spatial delimitation of the census tract creates units of different dimensions and aggregation levels, which generated spatial measures with higher variation (larger standard deviations) than buffer-based measures, especially for area, income, residential density, and mixed land use (Table 2). Pearson’s correlation coefficients showed little correspondence between measures calculated by census tracts and by buffers during the three study waves, except for the income measure, calculated with data at the census tract level (Table 3). This was probably because limitations in the data source meant that aggregation in buffers used data from the census tracts themselves. The results point to the influence of the use of census tracts on findings of spatial epidemiological analyses [6], suggesting that adopting buffers can help manage their limitations, representing a more effective aggregation unit for environmental data [7,13].
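To make the buffer idea concrete, the following minimal sketch counts facilities (for example, health units) inside a circular buffer around a geocoded residence, using the haversine great-circle distance. The 500 m radius, the coordinates, and the function names are assumptions for illustration and are not the parameters used in the study. A network buffer would instead require routing along the street network, which is not shown here.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    h = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def count_within_buffer(home, facilities, radius_m=500):
    """Count facilities whose straight-line distance from `home` is <= radius_m."""
    return sum(
        1
        for lat, lon in facilities
        if haversine_m(home[0], home[1], lat, lon) <= radius_m
    )


# Hypothetical coordinates (illustrative only)
home = (-27.5954, -48.5480)
health_units = [
    (-27.5954, -48.5480),  # same location as the residence
    (-27.5944, -48.5480),  # about 111 m to the north
    (-27.5854, -48.5480),  # about 1.1 km to the north
]
print(count_within_buffer(home, health_units))
```

Because every residence receives a buffer of the same radius, the resulting counts refer to areas of equal size, unlike census tracts, whose dimensions vary.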

Due to these problems, we recommend that household-based surveys standardize records, expanding the detailing of location information [9]. The use of specific software and programming for normalizing and searching the input addresses could have reduced the time spent updating problematic addresses. Therefore, future studies may employ different geocoding methods, including address verification algorithms [16], precision measurements of geocoded locations, and positional error assessments. Similarly, we recognize the need for a team familiar with geocoding and data manipulation software.
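As a rough illustration of what address normalization before geocoding could look like, the sketch below strips accents, unifies case and punctuation, and expands a leading street-type abbreviation. The abbreviation map, the function names, and the sample addresses are hypothetical; a real project would need a list adapted to the local registry.

```python
import re
import unicodedata

# Hypothetical map of street-type abbreviations to their full forms
ABBREVIATIONS = {
    "R": "RUA",
    "AV": "AVENIDA",
    "ROD": "RODOVIA",
    "SERV": "SERVIDAO",
    "TV": "TRAVESSA",
}


def strip_accents(text):
    """Remove diacritics, e.g. 'João' -> 'Joao'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_address(raw):
    """Return an upper-case, accent-free address with the street type expanded."""
    text = strip_accents(raw).upper()
    text = re.sub(r"[^A-Z0-9 ]", " ", text)  # replace punctuation with spaces
    tokens = text.split()  # also collapses repeated whitespace
    if tokens and tokens[0] in ABBREVIATIONS:
        tokens[0] = ABBREVIATIONS[tokens[0]]
    return " ".join(tokens)


# Illustrative inputs only
print(normalize_address("R. João Pinto, 123"))
print(normalize_address("  av. Beira-Mar Norte 1000 "))
```

Normalized strings of this kind make it easier to detect duplicates and to match records against a reference street list before they are submitted to a geocoder.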

Finally, the low quality of municipal records in peripheral areas highlights a problem that impairs knowledge about urban reality and limits the creation of evidence-based public policies aimed at the most vulnerable populations. Municipal registries therefore need to be improved, with more detailed location information to serve as input for geocoding.

REFERENCES

  • 1
    Schulz AJ. Urban environments and health. In: Nriagu JO, ed. Encyclopedia of Environmental Health. [place unknown]: Elsevier; 2011. p. 549-55.
  • 2
    Giles-Corti B, Vernez-Moudon A, Reis R, Turrell G, Dannenberg AL, Badland H, et al. City planning and population health: a global challenge. Lancet. 2016 Dec;388(10062):2912-24. https://doi.org/10.1016/S0140-6736(16)30066-6
  • 3
    Renalds A, Smith TH, Hale PJ. A systematic review of built environment and health. Fam Community Health. 2010;33(1):68-78. https://doi.org/10.1097/FCH.0b013e3181c4e2e5
  • 4
    Bauman AE, Reis RS, Sallis JF, Wells JC, Loos RJ, Martin BW; Lancet Physical Activity Series Working Group. Correlates of physical activity: why are some people physically active and others not? Lancet. 2012 Jul;380(9838):258-71. https://doi.org/10.1016/S0140-6736(12)60735-1
  • 5
    Michael Y, Beard T, Choi D, Farquhar S, Carlson N. Measuring the influence of built neighborhood environments on walking in older adults. J Aging Phys Act. 2006 Jul;14(3):302-12. https://doi.org/10.1123/japa.14.3.302
  • 6
    McElroy JA, Remington PL, Trentham-Dietz A, Robert SA, Newcomb PA. Geocoding addresses from a large population-based study: lessons learned. Epidemiology. 2003 Jul;14(4):399-407. https://doi.org/10.1097/01.EDE.0000073160.79633.c1
  • 7
    Brownson RC, Hoehner CM, Day K, Forsyth A, Sallis JF. Measuring the built environment for physical activity: state of the science [Internet]. Am J Prev Med. 2009 Apr;36(4 Suppl):S99-123.e12. https://doi.org/10.1016/j.amepre.2009.01.005
  • 8
    Frank LD, Fox EH, Ulmer JM, Chapman JE, Kershaw SE, Sallis JF, et al. International comparison of observation-specific spatial buffers: maximizing the ability to estimate physical activity. Int J Health Geogr. 2017 Jan;16(1):4. https://doi.org/10.1186/s12942-017-0077-9
  • 9
    Lopes AA, Hino AA, Moura EN, Reis RS. O Sistema de Informação Geográfica em pesquisas sobre ambiente, atividade física e saúde. Rev Bras Atividade Física Saúde. 2019 Aug;23:1-11. https://doi.org/10.12820/rbafs.23e0065
  • 10
    Leslie E, Coffee N, Frank L, Owen N, Bauman A, Hugo G. Walkability of local communities: using geographic information systems to objectively assess relevant environmental attributes. Health Place. 2007 Mar;13(1):111-22. https://doi.org/10.1016/j.healthplace.2005.11.001
  • 11
    Jacquez GM. A research agenda: does geocoding positional error matter in health GIS studies? Spat Spatio-Temporal Epidemiol. 2012 Apr;3(1):7-16. https://doi.org/10.1016/j.sste.2012.02.002
  • 12
    Hino P, Villa TC, Sassaki CM, Nogueira JD, Dos Santos CB. Geoprocessamento aplicado à área da saúde. Rev Lat Am Enfermagem. 2006 Nov;14(6):939-43. https://doi.org/10.1590/S0104-11692006000600016
  • 13
    Oliver LN, Schuurman N, Hall AW. Comparing circular and network buffers to examine the influence of land use on walking for leisure and errands. Int J Health Geogr. 2007 Sep;6(1):41. https://doi.org/10.1186/1476-072X-6-41
  • 14
    Vine MF, Degnan D, Hanchette C. Geographic information systems: their use in environmental epidemiologic research. J Environ Health. 1997 Jun;105(6):598-605. https://doi.org/10.1289/ehp.97105598
  • 15
    Bonner MR, Han D, Nie J, Rogerson P, Vena JE, Freudenheim JL. Positional accuracy of geocoded addresses in epidemiologic research. Epidemiology. 2003 Jul;14(4):408-12. https://doi.org/10.1097/01.EDE.0000073121.63254.c5
  • 16
    Zinszer K, Jauvin C, Verma A, Bedard L, Allard R, Schwartzman K, et al. Residential address errors in public health surveillance data: a description and analysis of the impact on geocoding. Spat Spatio-Temporal Epidemiol. 2010 Jul;1(2-3):163-8. https://doi.org/10.1016/j.sste.2010.03.002
  • 17
    Silveira IH, Oliveira BFA, Junger WL. Utilização do Google Maps para o georreferenciamento de dados do Sistema de Informações sobre Mortalidade no município do Rio de Janeiro, 2010-2012. Epidemiol Serv Saude. 2017 Oct-Dec;26(4):881-6. https://doi.org/10.5123/S1679-49742017000400018
  • 18
    Schootman M, Sterling DA, Struthers J, Yan Y, Laboube T, Emo B, et al. Positional accuracy and geographic bias of four methods of geocoding in epidemiologic research. Ann Epidemiol. 2007 Jun;17(6):464-70. https://doi.org/10.1016/j.annepidem.2006.10.015
  • 19
    Davis CA Jr, Alencar RO. Evaluation of the quality of an online geocoding resource in the context of a large Brazilian city. Trans GIS. 2011;15(6):851-68. https://doi.org/10.1111/j.1467-9671.2011.01288.x
  • 20
    Schneider IJ, Confortin SC, Bernardo CO, Bolsoni CC, Antes DL, Pereira KG, et al. EpiFloripa Aging cohort study: methods, operational aspects, and follow-up strategies. Rev Saude Publica. 2017;51:104. https://doi.org/10.11606/S1518-8787.2017051006776
  • 21
    Weber Corseuil Giehl M, Hallal PC, Weber Corseuil C, Schneider IJ, d’Orsi E. Built environment and walking behavior among Brazilian older adults: a population-based study. J Phys Act Health. 2016 Jun;13(6):617-24. https://doi.org/10.1123/jpah.2015-0355
  • 22
    Corseuil Giehl MW, Hallal PC, Brownson RC, d’Orsi E. Exploring associations between perceived measures of the environment and walking among Brazilian Older adults. J Aging Health. 2017 Feb;29(1):45-67. https://doi.org/10.1177/0898264315624904
  • 23
    Orsi E, Rech CR, Paiva KM, Lopes AAS, Boing AC, Barbosa AR, et al. Estudo de coorte EpiFloripa Idoso 3a onda (2017-2019) relatório técnico-científico. Florianópolis: Universidade Federal de Santa Catarina; 2020 [cited 2021 Mar 30]. Available from: https://repositorio.ufsc.br/handle/123456789/219631
  • 24
    Confortin SC, Schneider IJC, Antes DL, Cembranel F, Ono LM, Marques LP, et al. Condições de vida e saúde de idosos: resultados do estudo de coorte EpiFloripa Idoso. Epidemiol Serv Saúde. 2017 Apr;26(2):305-17. https://doi.org/10.5123/S1679-49742017000200008
  • 25
    Goldberg DW, Swift JN, Wilson JP. Geocoding best practices: reference data, input data, and feature matching. Los Angeles: University of Southern California; 2008.
  • 26
    Yun HY. Environmental factors associated with older adult’s walking behaviors: a systematic review of quantitative studies. Sustainability (Basel). 2019;11(12):3253. https://doi.org/10.3390/su11123253
  • 27
    Weber D. Differences in physical aging measured by walking speed: evidence from the English Longitudinal Study of Ageing [Internet]. BMC Geriatr. 2016 Jan;16(1):31. https://doi.org/10.1186/s12877-016-0201-x
  • 28
    Instituto Brasileiro de Geografia e Estatística. Censo demográfico. Brasília, DF: Instituto Brasileiro de Geografia e Estatística; 2010.
  • 29
    Malta DC, Iser BP, Santos MA, Andrade SS, Stopa SR, Bernal RT, et al. Estilos de vida nas capitais Brasileiras segundo a pesquisa nacional de saúde e o sistema de vigilância de fatores de risco e proteção para doenças crônicas não transmissíveis por inquérito telefônico (Vigitel), 2013. Rev Bras Epidemiol. 2015;18 suppl 2:68-82. https://doi.org/10.1590/1980-5497201500060007
  • 30
    Saboya RT, Reis AF, Bueno AP. Continuidades e descontinuidades urbanas à beira-mar: uma leitura morfológica e configuracional da área conurbada de Florianópolis. Oculum Ensaios. 2016;13(1):129. https://doi.org/10.24220/2318-0919v13n1a2756

  • Funding: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq; Processes 06/2008 and 569834/2008-2). Economic and Social Research Council (ESRC), multicenter project Promoting Independence in Dementia (PRIDE; contract 75/2017).

Publication Dates

  • Publication in this collection
    13 Nov 2023
  • Date of issue
    2023

History

  • Received
    25 July 2022
  • Accepted
    02 Jan 2023
Faculdade de Saúde Pública da Universidade de São Paulo São Paulo - SP - Brazil
E-mail: revsp@org.usp.br