Bug of the day, Tuesday 2013-04-02

Implicitly finding bugs in other people’s code

A crash was reported with a call stack leading to some code I’d been making changes to a week ago (code that previously featured in Unexpected use of class and Initialized, but not in the right order).

The code I’d changed now looks something like this:

struct Foo : public Baz {
  Bar* m_Bar;
  Foo() : m_Bar(NULL) {}
  void SetBar(Bar* bar) { m_Bar = bar; }

  virtual void Destroy() override;
};

The cause of the crash was something trying to make use of a null m_Bar member of a Foo object.

With regard to the previous crashes I’d encountered, my first suspicion that this was an object that was somehow being created in a way that had not correctly initialized the m_Bar. Checking the circumstances of the crash (the reporter was able to find a specific cause, which was a great help), I was unable to see a way that this object could have been created with m_Bar remaining null.

Sidebar: A way in which my debugging skills have improved in the last year

I have identified the following tendency in my thinking: see a bug, take a guess at the cause, and then look for that cause in the following way:

1. Examine some part of the codebase.
2. If expected cause not found, goto 1.

So if I guessed the wrong cause, I get stuck in a loop: “It must be X, but I haven’t yet found how X could happen — I must look harder!”

Part of this came, I think, from a degree of insecurity about my own ability to read and understand code: “the bug is in here, but I must have missed something”

What I’ve realized in the past year is that my code reading skills aren’t that bad — if I haven’t found the cause I’m looking for, it’s more likely that I’m looking for the wrong cause. Particularly, I am now able to approach a problem with a hypothesis about the cause, and can more clearly identify when I’ve found some information that invalidates that hypothesis and I should be looking for a different cause.

In the context of this bug, validating how the object was being created and seeing that the codepath would always validly initialize m_Bar meant that the bug was not in the initialization — it must be somewhere else.

The other nuller

So something else was setting m_Bar to null after the initialization. There is one other place where that happens…

  virtual void Destroy() {
    if( m_Bar != NULL) {
      m_Bar->Destroy();
      m_Bar = NULL;
    } else
      HARD_TO_MISS_NOTIFICATION( USEFUL_WARNING_37 );
  }

So, it looks like there’s a good chance that this object has had its Destroy() called, and looking through the code, it becomes clear that this is indeed what has happened.

There was a Validate() function called on our object that checked certain things about that object’s state, calling Destroy() if it did not meet certain conditions.
Validate() was called before attempting to make use of m_Bar.
The bool returned by Validate() is ignored by the calling code.
The circumstances that lead to the crash are such that Validate() will always call Destroy().

One if(!Validate()) return; later, the crash no longer occurs.

This bug came from a combination of circumstances:

existing code that ignored the logical (? I guess that’s the right word…) validity of the object — it had been destroyed, but still contained valid data, and
my change with caused the logical state to be reflected in actual state.

I did examine all the other callers of the Validate() function — none of these exhibited the same type of problem.

Implicitly finding bugs in other people’s code

Sidebar: A way in which my debugging skills have improved in the last year

The other nuller

Leave a Reply