Exploratory Spatial Data Analysis: Emerging tools and concepts

This article originally appeared in Geospatial Solutions Magazine's Net Results column of July 1, 2003. Other Net Results articles about the role of emerging technologies in the exchange of spatial information are also online.

1. Introduction and Glossary   2. Naive to cynical   3. Cynical to critical   4. Pure Critics

According to Joel Best, professor of sociology and criminal justice at the University of Delaware, people approach statistics from one of four perspectives: awestruck, naïve, cynical, or critical. Most people (myself included) occupy one of the first two categories. However, if we are to believe Edward Teller, father of the hydrogen bomb, critical statistical thinking is increasingly in demand. Along the arc of his career, Teller observed his fellow scientists first believing that anything, if studied long enough, could be understood — but gradually recognizing that we simply can’t know everything and so must make the best of uncertainty. In characteristic wry fashion, Teller once quipped, "Physics seems to have explanations that are somewhat complete for everything except life."

The geospatial industry is no stranger to uncertainty; researchers often must statistically analyze incomplete or “fuzzy” spatial datasets to discover their hidden order, or to confirm the lack of one. This column surveys emerging applications of exploratory spatial data analysis (ESDA) and its associated spatial-statistical tools and methods.

Awestruck, curious, or just desperate?

ESDA, or the use of statistics to better understand a spatial dataset, is popular in fields such as public policy analysis, marketing, social science, epidemiology, and geology. Ultimately, the desire to discover the secrets hidden by incomplete or uncertain data drives all ESDA. In some scenarios, the researcher is certain about a few locations but has to guess at others. For example, mining geologists analyze a limited number of successful extraction points to deduce new areas potentially rich in oil, gas, or minerals, a practice called geostatistics. In other projects, the researcher has multiple datasets with overlapping spatial coverage but uncertain interrelationships. Epidemiologists, for example, may suspect that people living near nuclear facilities have an above-average risk of getting cancer, and so statistically compare the distribution of cancer cases with suspected radiation sources, aggregating similar demographic groups and searching for clusters. In most ESDA efforts, researchers compare individual spatial features to their nearest neighbors to measure spatial autocorrelation, the degree to which nearby features have similar values. For example, are violent crimes randomly distributed or, for whatever reason, more concentrated in downtown clusters?
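To make the neighbor comparison concrete, here is a minimal sketch of one common global measure of spatial autocorrelation, Moran's I, paired with a simple permutation test against spatial randomness. The column itself does not prescribe this statistic. The grid of crime counts, the rook (shared-edge) definition of a neighbor, and the function names `rook_weights`, `morans_i`, and `permutation_p_value` are all assumptions made for illustration.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Binary weights for a regular grid: 1 where two cells share an edge."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:  # neighbor directly below
                j = (r + 1) * ncols + c
                w[i, j] = w[j, i] = 1.0
            if c + 1 < ncols:  # neighbor directly to the right
                j = r * ncols + (c + 1)
                w[i, j] = w[j, i] = 1.0
    return w


def morans_i(values, w):
    """Global Moran's I = (n / S0) * (z'Wz) / (z'z), z = deviations from the mean."""
    x = np.asarray(values, dtype=float).ravel()
    z = x - x.mean()
    s0 = w.sum()  # sum of all weights
    return (x.size / s0) * (z @ w @ z) / (z @ z)


def permutation_p_value(values, w, n_perm=999, seed=0):
    """One-sided pseudo p-value: how often do shuffled maps look as clustered?"""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float).ravel()
    observed = morans_i(x, w)
    sims = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    p = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    return observed, p


# Hypothetical crime counts on a 4 x 4 grid, with a "downtown" hot spot
# in the upper-left corner. These numbers are invented for illustration.
clustered = np.array([[9, 8, 1, 0],
                      [8, 9, 1, 1],
                      [1, 0, 0, 1],
                      [0, 1, 1, 0]])

# A checkerboard is the opposite extreme: every neighbor is unlike its neighbor.
checker = np.indices((4, 4)).sum(axis=0) % 2

w = rook_weights(4, 4)
i_clustered, p_clustered = permutation_p_value(clustered, w)
i_checker = morans_i(checker, w)

print(f"clustered map: I = {i_clustered:.3f}, pseudo p = {p_clustered:.3f}")
print(f"checkerboard:  I = {i_checker:.3f}")
```

Positive values of I indicate that similar values sit next to each other (clustering), negative values indicate that neighbors tend to differ, and values near the expected value under randomness, -1/(n-1), suggest no spatial pattern. Real analyses would typically build the weights from actual feature geometry rather than a regular grid.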

Why bother with ESDA? When the underlying data contain uncertainty or limited sample points, statistical analysis may be the only way to tickle some truth from an otherwise puzzling dataset. And even if no amount of analysis can squeeze true certainty from a fuzzy dataset, ESDA may at least speed up the search by mapping areas most likely to yield results under closer scrutiny.

Glossary

CSISS Center for Spatially Integrated Social Science
ESDA Exploratory Spatial Data Analysis
LISA Local Indicators of Spatial Association
NIH National Institutes of Health
TM Thematic Mapper
