Critical spatio-statistical thinkers
The statisticians agree that space matters, but they aren’t sure how much. Taking the opposite
tack from Dr. Snow in his time of cholera, Stanford doctoral student and NIH grantee Michael Choy
is using an already well-documented association between demographic profile and stomach cancer to
test the validity of a spatial-statistical aggregation technique. Unlike London’s localized cholera
epidemic, stomach cancer is relatively rare and spread thinly across the US population. In all of
the year 2000, for instance, the incidence of stomach cancer was approximately 5 per 1,000 people
in the greater San Francisco Bay Area
(www.nccc.org/pdf/Registries/annual_reports/incidence/stomach1.pdf),
compared to the 1854 cholera outbreak’s dramatic 32 deaths per 1,000 people in one London
neighborhood in less than two months. This makes understanding the spatial distribution of
stomach cancer a more elusive subject.
Numerous epidemiological studies have established a stomach bacterium, Helicobacter pylori,
as the etiologic agent of stomach cancer. Infection with H. pylori is strongly correlated with a
demographic profile including parameters of age, race/ethnicity, gender, income, place of birth
(foreign versus native) and smoking behavior. The stomach cancer records also include a spatial
reference: the census block group. Comparing the demographics of block groups containing stomach
cancer should reveal the same strong correlation researchers have already derived independently
of spatial information. However, because stomach cancer is relatively rare, many block groups
have no stomach cancer cases. This eliminates their populations from the sample and increases the
error associated with the estimate of the association. What to do?
Choy’s problem, measuring the margin of error in his own statistics, should be familiar to anyone
interested in politics. Consider a political poll: 45 percent for Bush, 45 percent for Gore, with
a margin of error of plus-or-minus 4 percentage points. Without the margin of error, Bush and Gore
appear equally popular. But given the margin of error, either candidate could be as much as 8
percentage points ahead of or behind the other. While margins of error are routine in polls, reliable
error calculation in spatial methodologies remains a field of active research. So, coming up with an
estimate of association
is only half of the problem. It takes an accurate assessment of the error to address the question:
"How good is our answer?"
The devil and the details
Choy’s strategy is to aggregate contiguous block groups, cancer-free or
otherwise, based on their common demographic profiles. This approach increases the overall sample
size and stability of the statistical correlation. Easy enough to conceptualize, but again, the devil
is in the details. Choy wants to know how the results of his aggregations differ depending on the
order in which he aggregates contiguous block groups, and on the degree of similarity required for
aggregation. For both variables, he is working on computationally intensive methods that run through
the spectrum of possibilities, remember each result, and return the entire range of results.
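One way to picture this kind of sensitivity test is the Python sketch below. It is not Choy's method, which is not spelled out here. It assumes each block group's demographic profile can be boiled down to a single hypothetical score, merges contiguous regions greedily when their mean scores fall within a threshold, and repeats the exercise over several thresholds and random merge orders. The toy data, the names, and the choice of "regions left with no cases" as the recorded outcome are all assumptions.

```python
import random

# Hypothetical toy data: block group id -> (demographic score, cancer cases).
BLOCKS = {
    "A": (0.20, 0), "B": (0.25, 1), "C": (0.60, 0),
    "D": (0.65, 0), "E": (0.30, 0), "F": (0.90, 2),
}
# Hypothetical contiguity: pairs of block groups that share a boundary.
ADJACENT = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E"), ("D", "F"), ("E", "F")]


def aggregate(order, threshold):
    """Greedily merge contiguous regions whose mean scores differ by <= threshold.

    `order` is the sequence in which adjacent pairs are considered (one pass).
    Returns a sorted list of regions, each a sorted list of block group ids.
    """
    parent = {b: b for b in BLOCKS}
    members = {b: [b] for b in BLOCKS}

    def find(b):
        while parent[b] != b:
            b = parent[b]
        return b

    def mean_score(root):
        return sum(BLOCKS[m][0] for m in members[root]) / len(members[root])

    for left, right in order:
        a, b = find(left), find(right)
        if a != b and abs(mean_score(a) - mean_score(b)) <= threshold:
            parent[b] = a
            members[a].extend(members.pop(b))
    return sorted(sorted(m) for m in members.values())


def zero_case_regions(regions):
    """Count regions that still contain no cancer cases after aggregation."""
    return sum(1 for r in regions if sum(BLOCKS[b][1] for b in r) == 0)


def sensitivity(thresholds, n_orders=20, seed=0):
    """For each threshold, try random merge orders and keep the (min, max) outcome."""
    rng = random.Random(seed)
    results = {}
    for t in thresholds:
        counts = []
        for _ in range(n_orders):
            order = ADJACENT[:]
            rng.shuffle(order)
            counts.append(zero_case_regions(aggregate(order, t)))
        results[t] = (min(counts), max(counts))
    return results


summary = sensitivity([0.0, 0.1, 0.3, 1.0])
for threshold, (fewest, most) in summary.items():
    print(f"threshold {threshold}: {fewest} to {most} regions with no cases")
```

A wide gap between the minimum and maximum at a given threshold would suggest that the merge order, not the data, is driving the result.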
The vendor community is also pushing the ESDA envelope. Steve Kopp of ESRI, lead product specialist
for Spatial Analyst, is expanding the ESDA tools in ArcGIS beyond those already in the
Geostatistical Analyst extension to include investigating and quantifying relationships of point
and polygon data, as well as exploring multitemporal, multiscenario ESDA techniques. Kopp described
ESRI’s efforts as "ways to visualize and summarize a spectrum of analyses to detect trends or
patterns," and hinted that some of the most interesting applications of ESDA are still evolving.
"What best communicates the results of multiple simulations of a problem scenario?" asked Kopp.
"You can't just show the user 100 slightly different maps, or blend them into one summary map, for
example. There needs to be useful visualization and exploration tools to make use of the results."
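As a rough illustration of the summarizing problem Kopp describes, the Python sketch below pairs a per-cell mean with a per-cell standard deviation across many simulated grids, so the blended map comes with a companion map of how much the runs disagree. This is only one simple possibility, not an ESRI tool; `simulate_surface` is a hypothetical stand-in for a real simulation, and the grids are plain nested lists.

```python
import random
import statistics


def simulate_surface(rng, rows=3, cols=4):
    """Hypothetical stand-in for one simulation run: a small grid of values.

    Noise grows from left to right so the spread map has something to show.
    """
    return [[rng.gauss(10 + r, 1 + c) for c in range(cols)] for r in range(rows)]


def summarize(surfaces):
    """Return per-cell mean and per-cell standard deviation grids across runs."""
    rows, cols = len(surfaces[0]), len(surfaces[0][0])
    mean = [[statistics.mean(s[r][c] for s in surfaces) for c in range(cols)]
            for r in range(rows)]
    spread = [[statistics.stdev(s[r][c] for s in surfaces) for c in range(cols)]
              for r in range(rows)]
    return mean, spread


rng = random.Random(42)
runs = [simulate_surface(rng) for _ in range(100)]
mean_map, spread_map = summarize(runs)

for row in spread_map:
    print(" ".join(f"{value:5.2f}" for value in row))
```

Cells with a large spread are the ones where a single summary map would be most misleading.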
Maybe we should check with Joel Best, who must himself fall into the critical statistical
thinking category. Is there a best practice for critically designed ESDA techniques? In Best’s
own words, “No statistic is perfect, but some are less imperfect than others. Good or bad, every
statistic reflects its creators’ choices.” So, if you don’t like the stats, go out and make some of
your own.