Bug of the day, Tuesday 2013-04-02

Implicitly finding bugs in other people’s code

A crash was reported with a call stack leading to some code I’d been making changes to a week ago (code that previously featured in Unexpected use of class and Initialized, but not in the right order).

The code I’d changed now looks something like this:

struct Foo : public Baz {
  Bar* m_Bar;
  Foo() : m_Bar(NULL) {}
  void SetBar(Bar* bar) { m_Bar = bar; }

  virtual void Destroy() override;
};

The cause of the crash was something trying to make use of a null m_Bar member of a Foo object.

With regard to the previous crashes I’d encountered, my first suspicion that this was an object that was somehow being created in a way that had not correctly initialized the m_Bar. Checking the circumstances of the crash (the reporter was able to find a specific cause, which was a great help), I was unable to see a way that this object could have been created with m_Bar remaining null.

Sidebar: A way in which my debugging skills have improved in the last year

I have identified the following tendency in my thinking: see a bug, take a guess at the cause, and then look for that cause in the following way:

1. Examine some part of the codebase.
2. If expected cause not found, goto 1.

So if I guessed the wrong cause, I get stuck in a loop: “It must be X, but I haven’t yet found how X could happen — I must look harder!”

Part of this came, I think, from a degree of insecurity about my own ability to read and understand code: “the bug is in here, but I must have missed something”

What I’ve realized in the past year is that my code reading skills aren’t that bad — if I haven’t found the cause I’m looking for, it’s more likely that I’m looking for the wrong cause. Particularly, I am now able to approach a problem with a hypothesis about the cause, and can more clearly identify when I’ve found some information that invalidates that hypothesis and I should be looking for a different cause.

In the context of this bug, validating how the object was being created and seeing that the codepath would always validly initialize m_Bar meant that the bug was not in the initialization — it must be somewhere else.

The other nuller

So something else was setting m_Bar to null after the initialization. There is one other place where that happens…

  virtual void Destroy() {
    if( m_Bar != NULL) {
      m_Bar->Destroy();
      m_Bar = NULL;
    } else
      HARD_TO_MISS_NOTIFICATION( USEFUL_WARNING_37 );
  }

So, it looks like there’s a good chance that this object has had its Destroy() called, and looking through the code, it becomes clear that this is indeed what has happened.

  • There was a Validate() function called on our object that checked certain things about that object’s state, calling Destroy() if it did not meet certain conditions.
  • Validate() was called before attempting to make use of m_Bar.
  • The bool returned by Validate() is ignored by the calling code.
  • The circumstances that lead to the crash are such that Validate() will always call Destroy().

One if(!Validate()) return; later, the crash no longer occurs.

This bug came from a combination of circumstances:

  • existing code that ignored the logical (? I guess that’s the right word…) validity of the object — it had been destroyed, but still contained valid data, and
  • my change with caused the logical state to be reflected in actual state.

I did examine all the other callers of the Validate() function — none of these exhibited the same type of problem.

 

Leave a Reply

Your email address will not be published.