Chapter 20


Our conservative locking in _db_dodelete is there to avoid a race condition with db_nextrec. If the call to _db_writedat were not protected by a write lock, the data record could be erased while db_nextrec was reading it: db_nextrec would read an index record, determine that it was not blank, and then read the data record, which _db_dodelete could erase between the calls to _db_readidx and _db_readdat in db_nextrec.


Assume that db_nextrec calls _db_readidx, which reads the key into the index buffer for the process. This process is then stopped by the kernel, and another process runs. This second process calls db_delete, and the record being read by the first process is deleted. Both its key and its data are rewritten in the two files as all blanks. The first process resumes, calls _db_readdat (from db_nextrec), and reads the all-blank data record. The read lock taken by db_nextrec prevents this: it makes the read of the index record, followed by the read of the data record, an atomic operation (with regard to other cooperating processes using the same database).
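
The atomic read described above can be sketched with fcntl record locking. This is a minimal illustration, not the library's code: read_pair_locked, its parameters, and the choice of lock byte (lock_off) are hypothetical names introduced here, and lock_reg is a small wrapper around fcntl written for this sketch. It assumes that a deleter takes a write lock on the same byte before blanking either record.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Set or clear an fcntl record lock on a byte range of fd. */
static int lock_reg(int fd, int cmd, short type, off_t offset, off_t len)
{
    struct flock lock;

    memset(&lock, 0, sizeof(lock));
    lock.l_type = type;        /* F_RDLCK, F_WRLCK, or F_UNLCK */
    lock.l_whence = SEEK_SET;  /* offset is measured from the start of the file */
    lock.l_start = offset;
    lock.l_len = len;
    return fcntl(fd, cmd, &lock);
}

/*
 * Hypothetical helper (not part of the db library): read an index record
 * and its data record while holding one read lock on byte lock_off of the
 * index file. A deleter that write-locks the same byte cannot blank either
 * record between the two reads. Returns 0 on success, -1 on error.
 */
int read_pair_locked(int idxfd, int datfd, off_t lock_off,
                     off_t idxoff, char *idxbuf, size_t idxlen,
                     off_t datoff, char *datbuf, size_t datlen)
{
    int ok;

    assert(idxbuf != NULL && datbuf != NULL);

    /* Block until the read lock is granted. */
    if (lock_reg(idxfd, F_SETLKW, F_RDLCK, lock_off, 1) < 0)
        return -1;

    /* Both reads happen while the lock is held. */
    ok = pread(idxfd, idxbuf, idxlen, idxoff) == (ssize_t)idxlen &&
         pread(datfd, datbuf, datlen, datoff) == (ssize_t)datlen;

    /* Release the lock whether or not the reads succeeded. */
    if (lock_reg(idxfd, F_SETLK, F_UNLCK, lock_off, 1) < 0)
        return -1;
    return ok ? 0 : -1;
}
```

Because these are advisory locks, the guarantee holds only if every process that modifies the records takes the corresponding write lock first; a process that writes without locking is not stopped.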


With mandatory locking, other readers and writers are affected: the kernel blocks their reads and writes of the locked regions until the locks placed by _db_writeidx and _db_writedat are removed.


By writing the data record before the index record, we protect ourselves from generating a corrupt record if the process should be killed in between the two writes. With the data written first, the worst outcome is a data record that no index record refers to. If the process were to write the index record first, but be killed before writing the data record, then we'd have a valid index record that pointed to invalid data.
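
The ordering argument can be illustrated with a simplified sketch. It is not the library's _db_writedat and _db_writeidx: store_record, open_db_file, the stop_after_data flag (which simulates the process being killed between the two writes), and the "key:offset:length" index line are all assumptions made for this example, and the real index record format differs.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open (or create) one of the two files read/write. */
int open_db_file(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0600);
}

/*
 * Hypothetical helper: append the data record first, then an index line
 * "key:offset:length\n" that refers to it.
 * Returns 0 on success, 1 if stopped after the data write, -1 on error.
 */
int store_record(int idxfd, int datfd, const char *key, const char *data,
                 int stop_after_data)
{
    char idxline[256];
    size_t datlen;
    off_t datoff;
    int n;

    assert(key != NULL && data != NULL);
    datlen = strlen(data);

    /* Step 1: the data record goes at the end of the data file. */
    if ((datoff = lseek(datfd, 0, SEEK_END)) < 0)
        return -1;
    if (write(datfd, data, datlen) != (ssize_t)datlen)
        return -1;

    /* Simulated kill: the data is orphaned, but no index record names it. */
    if (stop_after_data)
        return 1;

    /* Step 2: only now make the record reachable through the index file. */
    n = snprintf(idxline, sizeof(idxline), "%s:%lld:%zu\n",
                 key, (long long)datoff, datlen);
    if (n < 0 || (size_t)n >= sizeof(idxline))
        return -1;
    if (lseek(idxfd, 0, SEEK_END) < 0)
        return -1;
    if (write(idxfd, idxline, (size_t)n) != (ssize_t)n)
        return -1;
    return 0;
}
```

Calling store_record with stop_after_data set to 1 leaves the index file unchanged, so a later reader never follows an index record to data that was not written. Reversing the two steps would make the same interruption leave an index line whose offset and length describe bytes that do not exist.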