Nailing down the bug

by John Samuel Anderson last modified Jan 15, 2009 01:56 AM
Finally, we find the root cause.

Source Code snippets

For the sake of continuity, here's the function source code that we're going to look at while debugging. I'm including it here, so that it doesn't get in the way of the debug session.

NOTE: You could also see these snippets interactively, using the Pdb list command.  However, I prefer to open the file in a text editor, since it's easier for browsing code.

From /var/plone/infocenters/eggs/ZODB3-3.8.1b7-py2.4-linux-x86_64.egg/ZODB/utils.py:

def repr_to_oid(repr):
    if repr.startswith("0x"):
        repr = repr[2:]
    try:
        as_bin = unhexlify(repr)
    except TypeError:
        import pdb; pdb.set_trace()
    as_bin = "\x00"*(8-len(as_bin)) + as_bin
    return as_bin

From /var/plone/infocenters/eggs/ZODB3-3.8.1b7-py2.4-linux-x86_64.egg/ZODB/blob.py:

    def listOIDs(self):
        """Lists all OIDs and their paths.

        """
        for candidate in os.listdir(self.base_dir):
            if candidate == 'tmp':
                continue
            oid = utils.repr_to_oid(candidate)
            yield oid, self.getPathForOID(oid)
Also from blob.py:
    @non_overridable
    def _packUndoing(self, packtime, referencesf):
        # Walk over all existing revisions of all blob files and check
        # if they are still needed by attempting to load the revision
        # of that object from the database.  This is maybe the slowest
        # possible way to do this, but it's safe.
        base_dir = self.fshelper.base_dir
        for oid, oid_path in self.fshelper.listOIDs():
            files = os.listdir(oid_path)

            for filename in files:
                filepath = os.path.join(oid_path, filename)
                whatever, serial = self.fshelper.splitBlobFilename(filepath)
                try:
                    fn = self.fshelper.getBlobFilename(oid, serial)
                    self.loadSerial(oid, serial)
                except POSKeyError:
                    remove_committed(filepath)

            if not os.listdir(oid_path):
                shutil.rmtree(oid_path)

Catching the Bug

After restarting Zope, I using the ZMI to re-run the "pack" command.  After a while, it dropped to a Pdb prompt:

> /var/plone/infocenters/eggs/ZODB3-3.8.1b7-py2.4-linux-x86_64.egg/ZODB/utils.py(146)repr_to_oid()
-> as_bin = "\x00"*(8-len(as_bin)) + as_bin
What's on the stack?
(Pdb) dir()
['pdb', 'repr']
Hmmm. What's repr?
(Pdb) repr
'.layout'
Interesting. What function called this?
(Pdb) up
> /var/Plone/eggs/ZODB3-3.8.1b7-py2.4-linux-x86_64.egg/ZODB/blob.py(391)listOIDs()
-> oid = utils.repr_to_oid(candidate)
NOTE: We switched to another file: blob.py. So, what's candidate?
(Pdb) candidate
'.layout'
Well, that's not so helpful. OK, what's the stack hold now?
(Pdb) dir()
['candidate', 'oid', 'self']
(Pdb) oid
'\x00\x00\x00\x00\x00\r\xf49'
(Pdb) self
How about up the stack?
(Pdb) up
> /var/Plone/eggs/ZODB3-3.8.1b7-py2.4-linux-x86_64.egg/ZODB/blob.py(497)_packUndoing()
-> for oid, oid_path in self.fshelper.listOIDs():
I wonder, can we find out which directory is being affected?
(Pdb) dir()
['base_dir', 'filename', 'filepath', 'files', 'fn', 'oid', 'oid_path', 'packtime', 'referencesf', 'self', 'serial', 'whatever']
(Pdb) oid_path
'/var/Plone/var/blobstorage/0x0df439'
Let's take a look at that directory listing:
$ ll /var/Plone/var/blobstorage/0x0df439
total 184
-rwxr-xr-x 1 plone plone 184217 Sep 16 16:21 0x03777fad209a54ee.blob
Drat. I don't think we're looking at the right stack. Let's try to get back to the listOIDs() stack frame.
(Pdb) down
> /var/Plone/eggs/ZODB3-3.8.1b7-py2.4-linux-x86_64.egg/ZODB/blob.py(391)listOIDs()
-> oid = utils.repr_to_oid(candidate)
Good. We're here. I wonder what self.base_dir is?
(Pdb) self.base_dir
'/var/plone/infocenters/var/blobstorage'
(Pdb) candidate
'.layout'
Oh, wait a moment! Maybe there's a file in the base directory that's causing trouble. Let's see:
$ ll -a | grep layout
-rw-r--r--    1 root  root       4 Dec  4 13:28 .layout
Yippee! This is the .layout file that's causing trouble. We know from the source code of listOIDs() in blob.py that blob storage is expecting a hex name for everything except tmp. When it tries to convert the value ".layout" from hex, it bombs.