Implicitly finding bugs in other people’s code
A crash was reported with a call stack leading to some code I’d been making changes to a week ago (code that previously featured in Unexpected use of class and Initialized, but not in the right order).
The code I’d changed now looks something like this:
struct Foo : public Baz {
Bar* m_Bar;
Foo() : m_Bar(NULL) {}
void SetBar(Bar* bar) { m_Bar = bar; }
virtual void Destroy() override;
};
The cause of the crash was something trying to make use of a null m_Bar member of a Foo object.
With regard to the previous crashes I’d encountered, my first suspicion that this was an object that was somehow being created in a way that had not correctly initialized the m_Bar. Checking the circumstances of the crash (the reporter was able to find a specific cause, which was a great help), I was unable to see a way that this object could have been created with m_Bar remaining null.
Sidebar: A way in which my debugging skills have improved in the last year
I have identified the following tendency in my thinking: see a bug, take a guess at the cause, and then look for that cause in the following way:
1. Examine some part of the codebase.
2. If expected cause not found, goto 1.
So if I guessed the wrong cause, I get stuck in a loop: “It must be X, but I haven’t yet found how X could happen — I must look harder!”
Part of this came, I think, from a degree of insecurity about my own ability to read and understand code: “the bug is in here, but I must have missed something”
What I’ve realized in the past year is that my code reading skills aren’t that bad — if I haven’t found the cause I’m looking for, it’s more likely that I’m looking for the wrong cause. Particularly, I am now able to approach a problem with a hypothesis about the cause, and can more clearly identify when I’ve found some information that invalidates that hypothesis and I should be looking for a different cause.
In the context of this bug, validating how the object was being created and seeing that the codepath would always validly initialize m_Bar meant that the bug was not in the initialization — it must be somewhere else.
The other nuller
So something else was setting m_Bar to null after the initialization. There is one other place where that happens…
virtual void Destroy() {
if( m_Bar != NULL) {
m_Bar->Destroy();
m_Bar = NULL;
} else
HARD_TO_MISS_NOTIFICATION( USEFUL_WARNING_37 );
}
So, it looks like there’s a good chance that this object has had its Destroy() called, and looking through the code, it becomes clear that this is indeed what has happened.
- There was a Validate() function called on our object that checked certain things about that object’s state, calling Destroy() if it did not meet certain conditions.
- Validate() was called before attempting to make use of m_Bar.
- The bool returned by Validate() is ignored by the calling code.
- The circumstances that lead to the crash are such that Validate() will always call Destroy().
One if(!Validate()) return; later, the crash no longer occurs.
This bug came from a combination of circumstances:
- existing code that ignored the logical (? I guess that’s the right word…) validity of the object — it had been destroyed, but still contained valid data, and
- my change with caused the logical state to be reflected in actual state.
I did examine all the other callers of the Validate() function — none of these exhibited the same type of problem.