Consider a
conceptual overlay problem. First, software vendors have encouraged their users
to migrate spatial data from files to databases. Over the past several years,
many users have successfully done so. Second, ongoing homeland security pressures
have raised the importance of data integration across organizational and
geographic boundaries; spatial data appears to be an ideal multijurisdictional
"glue". Do these two trends--use of spatial databases and need for spatial data
integration--overlay neatly? Is it possible (or even desirable) to integrate
spatial databases? This month’s column investigates one integration strategy:
replicating spatial data across distributed database systems.
Warning: proceed with caution; don’t set your hopes too high. To my knowledge,
no existing product can replicate spatial data across all distributed
heterogeneous database systems. Although most database vendors support
replication of standard data types (text, numbers, and dates), replication of
nonstandard spatial types is only now moving from research labs into commercial
products and is typically limited to replication between databases
made by the same vendor--Oracle (www.oracle.com) to Oracle, but not Oracle to
IBM’s (www.ibm.com) DB2, for instance. A step in the right direction is Lakeview
Technologies’ (www.lakeviewtech.com) OmniReplicator, a product that replicates
spatial data between SQL Server and Oracle databases. Why does such a potentially
useful capability have such limited support in the marketplace? A basic
understanding of both distributed database systems and replication will help
solve this puzzle.
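To picture the limitation described above, here is a toy replicator in Python. It is a minimal sketch under loudly stated assumptions: `VendorGeometry`, `CONVERTERS`, and `replicate_row` are hypothetical names, the "vendors" are stand-ins, and nothing here models how Oracle, DB2, or OmniReplicator actually encode or move geometry. It only shows the pattern the column describes: text, numbers, and dates copy anywhere, while a vendor-specific geometry copies only to the same vendor unless someone has written a converter for that particular vendor pair.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict, Tuple

# Hypothetical vendor-specific geometry value (not any real product's format).
@dataclass(frozen=True)
class VendorGeometry:
    vendor: str   # which (hypothetical) DBMS produced this value
    payload: Any  # that vendor's own internal representation

# Standard types: every DBMS understands these the same way.
STANDARD_TYPES = (str, int, float, date)

# Hand-written converters, one per (source vendor, target vendor) pair.
Converter = Callable[[VendorGeometry], VendorGeometry]
CONVERTERS: Dict[Tuple[str, str], Converter] = {}

def replicate_value(value: Any, target_vendor: str) -> Any:
    """Copy one column value into a database made by target_vendor."""
    if isinstance(value, STANDARD_TYPES):
        return value  # text, numbers, dates pass through unchanged
    if isinstance(value, VendorGeometry):
        if value.vendor == target_vendor:
            return value  # same vendor: native format copies as-is
        converter = CONVERTERS.get((value.vendor, target_vendor))
        if converter is None:
            raise TypeError(f"no converter from {value.vendor} to {target_vendor}")
        return converter(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")

def replicate_row(row: Dict[str, Any], target_vendor: str) -> Dict[str, Any]:
    """Replicate every column of a row, failing if any value cannot be carried over."""
    return {col: replicate_value(v, target_vendor) for col, v in row.items()}

if __name__ == "__main__":
    row = {"parcel_id": 17, "owner": "City", "surveyed": date(2003, 5, 1),
           "shape": VendorGeometry("VendorA", ((0, 0), (1, 0), (1, 1)))}
    print(replicate_row(row, "VendorA") == row)  # True: same vendor
    try:
        replicate_row(row, "VendorB")
    except TypeError as err:
        print(err)  # no converter from VendorA to VendorB
```

In this toy, supporting n vendors fully would take up to n × (n − 1) one-way converters, which is one plausible way to see why cross-vendor support tends to arrive one pair at a time.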
A first date
A distributed database system is a collection of networked sites, each able to
function alone, but also able to access data anywhere else in the network exactly
as if that data were stored locally. Each site has its own local databases, and
users can continue working even when the network is inoperable. The sites may
be geographically distant, but more often than not are distributed logically
across a single organization, such as by department or project. The idea of
distributed but partnered databases is simple enough on the surface--what’s so
special about copying data?--but in practice quickly becomes complex.
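The definition above has two parts: each site works on its own, yet any site can reach data elsewhere as if it were local. A minimal Python sketch of just those two properties follows; the `Site` class, its `read` method, and the `network_up` flag are hypothetical illustrations, and the "network" is simulated in memory. It deliberately ignores the hard parts (updates, concurrency, consistency) that make real systems complex.

```python
from typing import Dict, Optional

class Site:
    """Toy site: owns local tables and can reach peer sites over a simulated network."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tables: Dict[str, Dict[int, str]] = {}  # table name -> {key: value}
        self.peers: Dict[str, "Site"] = {}
        self.network_up = True  # simulated state of this site's network link

    def connect(self, other: "Site") -> None:
        self.peers[other.name] = other
        other.peers[self.name] = self

    def read(self, table: str, key: int) -> Optional[str]:
        # Local data first: no network needed, so local users keep working in an outage.
        if table in self.tables:
            return self.tables[table].get(key)
        # Otherwise find the table at a peer; the caller never names the site.
        if not self.network_up:
            raise ConnectionError(f"{self.name}: network unavailable for table {table!r}")
        for peer in self.peers.values():
            if table in peer.tables:
                return peer.tables[table].get(key)
        raise KeyError(table)

if __name__ == "__main__":
    planning, utilities = Site("planning"), Site("utilities")
    planning.connect(utilities)
    planning.tables["parcels"] = {1: "Parcel 1"}
    utilities.tables["mains"] = {7: "Water main 7"}
    print(planning.read("mains", 7))    # Water main 7 (fetched from utilities)
    planning.network_up = False
    print(planning.read("parcels", 1))  # Parcel 1 (local, so the outage does not matter)
```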
C.J. Date, author of the classic "An Introduction to Database Systems," describes
distributed database systems with language uncannily similar to the message of
homeland security integrators:
Carried to its logical conclusion, full support for [a] distributed
database [system] implies that a single application should be able to operate
"transparently" on data that is spread across a variety of different databases,
managed by a variety of different DBMSs, running on a variety of different
machines, supported by a variety of different operating systems, and connected
together by a variety of different communication networks--where "transparently"
means that the application operates from a logical point of view as if the data
were all managed by a single DBMS running on a single machine. Such a capability
might sound like a pretty tall order!--but it is highly desirable from a
practical perspective, and vendors are working hard to make such systems a
reality.
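Date’s notion of "transparency" can be pictured as a facade over several differently built engines. The Python sketch below is a toy under stated assumptions: `RowStoreDB`, `TupleStoreDB`, and `TransparentDB` are hypothetical stand-ins rather than real DBMS interfaces, and it leaves out networks, operating systems, transactions, and query languages, which are where the "tall order" actually lies. It shows only the logical point: the application asks one object for a table by name, and a catalog hides which engine holds it and how that engine stores rows.

```python
from typing import Dict, List, Optional, Protocol, Tuple

class Backend(Protocol):
    """What the facade needs from any engine, however it stores data internally."""
    def get(self, table: str, key: int) -> Optional[dict]: ...

class RowStoreDB:
    """Hypothetical engine #1: rows kept as dicts keyed by primary key."""
    def __init__(self) -> None:
        self.data: Dict[str, Dict[int, dict]] = {}

    def get(self, table: str, key: int) -> Optional[dict]:
        return self.data.get(table, {}).get(key)

class TupleStoreDB:
    """Hypothetical engine #2: rows kept as tuples plus a separate column list."""
    def __init__(self) -> None:
        self.columns: Dict[str, List[str]] = {}
        self.rows: Dict[str, List[Tuple]] = {}

    def get(self, table: str, key: int) -> Optional[dict]:
        cols = self.columns.get(table, [])
        for row in self.rows.get(table, []):
            if row[0] == key:  # assumes the first column is the key
                return dict(zip(cols, row))
        return None

class TransparentDB:
    """Facade: the application sees one database; a catalog records where tables live."""
    def __init__(self) -> None:
        self.catalog: Dict[str, Backend] = {}

    def register(self, table: str, backend: Backend) -> None:
        self.catalog[table] = backend

    def get(self, table: str, key: int) -> Optional[dict]:
        return self.catalog[table].get(table, key)

if __name__ == "__main__":
    a, b = RowStoreDB(), TupleStoreDB()
    a.data["parcels"] = {1: {"id": 1, "owner": "City"}}
    b.columns["mains"] = ["id", "diameter_in"]
    b.rows["mains"] = [(7, 12)]
    db = TransparentDB()
    db.register("parcels", a)
    db.register("mains", b)
    print(db.get("parcels", 1))  # {'id': 1, 'owner': 'City'}
    print(db.get("mains", 7))    # {'id': 7, 'diameter_in': 12}
```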
Date wrote this almost 10 years ago, and today it still sounds like a pretty
tall order. But vendors continue to tinker with distributed database solutions
because what made them highly desirable in the early 1990s remains compelling today.
Namely, enterprises of every shape, size, and color already are distributed,
whether by logical divisions (such as departments or workgroups) or physical
separation (for instance, a factory or laboratory). And we all typically want
to keep our precious data local, where it most logically belongs.
No one encounters this local-data sentiment more often than centralized data
warehouse consultants. During planning sessions, these integrators can encounter
strong resistance from their customers when suggesting a shift of local data to
a remote centralized location. The seasoned database administrators in the
discussion quickly get territorial, even hostile. "What happens to my workflow
when the network is down, or when traffic is heavy? I have live users to support!"
snorts the DBA in a panic. And they’re justified; performance is best when the
users and their data are in the same place. So, matching the data storage to the
reality of the business structure maximizes the efficiency of local data
processing. Preserving those local efficiencies while simultaneously spreading
the common wealth across the enterprise--efficiency plus mutual accessibility--is
what makes the distributed database strategy so highly desirable.
Glossary
DBA: Database Administrator
DBMS: Database Management System