A Date with Replication: Distributed Spatial Database Systems This article originally appeared in Geospatial Solutions Magazine's Net Results column of April 1, 2003. Other Net Results articles about the role of emerging technologies in the exchange of spatial information are also online.

1. Introduction and Glossary   2. How's my complexion?   3. Updating that little black book   4. Delayed gratification

Consider a conceptual overlay problem. First, software vendors have encouraged their users to migrate spatial data from files to databases. Over the past several years, many users have successfully done so. Second, ongoing homeland security pressures have raised the importance of data integration across organizational and geographic boundaries; spatial data appears to be an ideal multijurisdictional "glue". Do these two trends--use of spatial databases and need for spatial data integration--overlay neatly? Is it possible (or even desirable) to integrate spatial databases? Investigating the integration strategy of spatial data replication across distributed database systems is the topic of this month’s column.

Warning: proceed with caution; don’t set your hopes too high. To my knowledge, no existing product can replicate spatial data across all distributed heterogeneous database systems. Although most database vendors support replication of standard data types (text, numbers, and dates), replication of nonstandard spatial types is just beginning to emerge from research labs as a commercial product and is typically limited to replication between databases made by the same vendor--Oracle (www.oracle.com) to Oracle, but not Oracle to IBM’s (www.ibm.com) DB2, for instance. A step in the right direction is Lakeview Technologies’ (www.lakeviewtech.com) OmniReplicator, a product that replicates spatial data between SQL Server and Oracle databases. Why does such a potentially useful capability have such limited support in the marketplace? A basic understanding of both distributed database systems and replication will help solve this puzzle.

A first date

A distributed database system is a collection of networked sites, each able to function alone, but also able to access data anywhere else in the network exactly as if that data were stored locally. Each site has its own local databases and users can continue working even when the network is inoperable. The sites may be geographically distant, but more often than not are distributed logically across a single organization, such as by department or project. The idea of distributed but partnered databases is simple enough at the surface--what’s so special about copying data?--but in practice quickly becomes complex.

C.J. Date, author of the classic "An Introduction to Database Systems," describes distributed database systems with language uncannily similar to the message of homeland security integrators:

Carried to its logical conclusion, full support for [a] distributed database [system] implies that a single application should be able to operate "transparently" on data that is spread across a variety of different databases, managed by a variety of different DBMSs, running on a variety of different machines, supported by a variety of different operating systems, and connected together by a variety of different communication networks--where "transparently" means that the application operates from a logical point of view as if the data were all managed by a single DBMS running on a single machine. Such a capability might sound like a pretty tall order!--but it is highly desirable from a practical perspective, and vendors are working hard to make such systems a reality.
Date wrote this almost 10 years ago, and today it still sounds like a pretty tall order. But vendors continue to tinker with distributed database solutions because what made them highly desirable in the early 90s remains compelling today. Namely, enterprises of every shape, size, and color already are distributed, whether by logical divisions (such as departments or workgroups) or physical separation (for instance, a factory or laboratory). And we all typically want to keep our precious data local, where it most logically belongs.

No one encounters this local-data sentiment more often than centralized data warehouse cosultants. During planning sessions, these integrators can encounter strong resistance from their customers when suggesting a shift of local data to a remote centralized location. The seasoned database administrators in the discussion quickly get territorial, even hostile. "What happens to my workflow when the network is down, or when traffic is heavy? I have live users to support!" snorts the DBA in a panic. And they’re justified; performance is best when the users and their data are in the same place. So, matching the data storage to the reality of the business structure maximizes the efficiency of local data processing. Preserving those local efficiencies while simultaneously spreading the common wealth across the enterprise--efficiency plus mutual accessibility--is what makes the distributed database strategy so highly desirable.


Glossary
DBA: Database Administrator
DBMS: Database Management System

1. Introduction and Glossary   2. How's my complexion?   3. Updating that little black book   4. Delayed gratification