How’s my complexion?
If the efficiency of mirroring enterprise structure with database structure is
the greatest benefit of a distributed database system, the greatest disadvantage
is the resulting complexity. Networked users everywhere can and do make changes
to their local data, and sometimes, one hopes unwittingly, to remote data. (If
the system is well-built, users don’t even know that some of their data actually
originates from another city rather than under their own desks.) Especially along
shared boundaries, two users may attempt to update the same record at the same
time, both moving the same street segment, for instance, but in different
directions. What happens then? Will one user’s change be ignored or overwritten
by the other? If the system is distributed, but the network is down, and each
user changes her own copy of the same street segment, will the two databases
end up inconsistent with each other? This multiuser example of potential data
corruption is only one of several threats to a distributed organization’s data.
In fact, threats to data integrity, whether deliberate or accidental, are largely
responsible for the creation of databases in the first place: databases are
designed to protect data.
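To make the threat concrete, here is a minimal sketch of that lost update in Python with SQLite. The segments table is invented for illustration, and a single connection stands in for the two users; a real distributed system would involve two sessions on two machines.

```python
import sqlite3

# One street segment at position x = 100.0 (all names are hypothetical).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE segments (id INTEGER PRIMARY KEY, x REAL)")
db.execute("INSERT INTO segments VALUES (1, 100.0)")
db.commit()

# Both users read the segment's current position...
x_a = db.execute("SELECT x FROM segments WHERE id = 1").fetchone()[0]
x_b = db.execute("SELECT x FROM segments WHERE id = 1").fetchone()[0]

# ...then user A moves it east and user B moves it west. B's later
# write silently overwrites A's.
db.execute("UPDATE segments SET x = ? WHERE id = 1", (x_a + 10,))
db.execute("UPDATE segments SET x = ? WHERE id = 1", (x_b - 10,))
db.commit()

print(db.execute("SELECT x FROM segments WHERE id = 1").fetchone()[0])
# prints 90.0: A's eastward move has vanished without a trace
```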
Database theory: a quickie. Databases protect data with integrity, recovery, and
concurrency mechanisms that prevent users from entering bad data, losing
data during a system failure, or corrupting existing data. New database users
learn about integrity during their first database design effort. Because different
users have different opinions regarding data entry, databases respond with
integrity rules to guarantee that, for instance, "Avenue" will always be
abbreviated as "Ave" or that streams will only flow downhill. Recovery and
concurrency are less obvious data protection mechanisms. All disk heads must
eventually crash; databases respond with recovery procedures to minimize data
loss. Multiple database users touch the same data at the same time; databases
respond with concurrency control mechanisms to keep one request from interfering
with another, avoiding lost updates (as in the street-segment example above) or inconsistent
analyses.
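The integrity half is the easiest to picture. Here is a sketch in Python with SQLite, where a check constraint plays the part of the "Ave" rule; the table, its columns, and the list of approved abbreviations are all assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE streets (
        name TEXT NOT NULL,
        -- the integrity rule: only the agreed abbreviations are legal
        type TEXT NOT NULL CHECK (type IN ('Ave', 'St', 'Blvd', 'Rd'))
    )
""")

db.execute("INSERT INTO streets VALUES ('Fifth', 'Ave')")         # accepted
try:
    db.execute("INSERT INTO streets VALUES ('Sixth', 'Avenue')")  # rejected
except sqlite3.IntegrityError as err:
    print("integrity rule fired:", err)
```

Bad data never reaches the table; the rule is enforced by the database itself rather than by the goodwill of each user.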
To deliver their data-protection guarantee, particularly regarding recovery and
concurrency, databases apply the same rules to every request (or transaction) by
every user. Specifically, databases take an all-or-nothing approach called
atomicity to each user’s request. If a database is able to complete all parts of
a user’s request successfully, it still is not finished--it must then perform an
operation called a commit. Like a fighter pilot painting a star on his plane’s
nose for each enemy shot down, the database logs each successfully completed
transaction as committed. If any part of a request fails, however,
the database reverts or "rolls back" to the state preceding the request, undoing
every part of the transaction. Even if the system crashes in the middle of a
transaction, then, during the recovery process, the absence of a commit signals
the need to roll back the disrupted transaction. Atomic transactions guarantee
that, for example, your request to delete all alleyways will either succeed
completely or fail completely; it will never delete just some alleyways, get
disrupted, and then, unbeknownst to you, leave the job unfinished.
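Here is what that guarantee might look like in Python with SQLite, managing the transaction by hand. The segments table and the simulated crash are, of course, invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:", isolation_level=None)  # transactions managed by hand
db.execute("CREATE TABLE segments (id INTEGER PRIMARY KEY, kind TEXT)")
db.executemany("INSERT INTO segments (kind) VALUES (?)",
               [("alley",), ("alley",), ("street",)])

def purge_alleys(db, crash=False):
    """Delete every alleyway atomically: commit on success, roll back otherwise."""
    try:
        db.execute("BEGIN")
        db.execute("DELETE FROM segments WHERE kind = 'alley'")
        if crash:
            raise RuntimeError("simulated crash mid-transaction")
        db.execute("COMMIT")    # the star painted on the nose: transaction logged
    except RuntimeError:
        db.execute("ROLLBACK")  # no commit recorded, so undo every part

purge_alleys(db, crash=True)
print(db.execute("SELECT COUNT(*) FROM segments WHERE kind = 'alley'").fetchone()[0])
# prints 2: the disrupted delete left no trace

purge_alleys(db)
print(db.execute("SELECT COUNT(*) FROM segments WHERE kind = 'alley'").fetchone()[0])
# prints 0: the retried request committed and finished the job
```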
Hand in hand with atomicity is database locking, a mechanism that prevents
simultaneous transactions on the same data from interfering with each other. The
moment one user begins to edit a street segment, the database locks that segment’s
table, or a subset of the table records, or even the single record itself, until
after the edit has been committed. While the lock holds, no other user can edit, or, in
some implementations, even see the segment. (While this strategy solves concurrency
problems, it raises new ones, such as deadlocks, that lie beyond the scope of this
discussion. The key concept, though, is that locks prevent conflicts and protect
data concurrency.)
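A last sketch, again in Python with SQLite, shows the lockout in action. SQLite locks at the coarsest grain of all, the whole database file, rather than a table or a record, but the effect is the one described above; the two connections and all names are invented.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "streets.db")
alice = sqlite3.connect(path, isolation_level=None)
bob = sqlite3.connect(path, isolation_level=None, timeout=0.1)

alice.execute("CREATE TABLE segments (id INTEGER PRIMARY KEY, x REAL)")
alice.execute("INSERT INTO segments VALUES (1, 100.0)")

alice.execute("BEGIN IMMEDIATE")  # Alice takes the write lock as her edit begins
alice.execute("UPDATE segments SET x = 110.0 WHERE id = 1")

try:
    bob.execute("UPDATE segments SET x = 90.0 WHERE id = 1")  # Bob must wait
except sqlite3.OperationalError as err:
    print("Bob is locked out:", err)  # 'database is locked'

alice.execute("COMMIT")  # the lock is released along with the commit
bob.execute("UPDATE segments SET x = 90.0 WHERE id = 1")  # now Bob may edit
```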