Problem solving can be tricky. Critical Problem Solving skills are essential in any team, and a structured approach to solving problems is particularly crucial in software development.
Have you ever had a team make an early assumption about the root cause and then waste time devising and deploying a solution that doesn't actually fix the problem?
One of the core techniques in my approach to Critical Problem Solving is what I call "Prediction as Proof".
My assertion is that if you can predict something confidently and repeatably then it is "a thing". If your prediction fails, then it is not "a thing" and you need a new hypothesis, or at least a modified one to explain the failed prediction.
This helps to avoid the scenario where you spend many hours or days chasing a solution to a flawed assumed root cause based on a couple of unstructured tests where the answer looked about right.
So, when we have a problem to solve, and my team comes up with a hypothesis to explain it, I ask them to Prove It. I ask them to predict the outcome from interactions that are designed to test the hypothesis.
When they can predict the outcome of a test with confidence, we can start to have confidence that they are on to something. It is just as important to prove that the opposite case is predictable. What I mean is that if you can predict that the symptoms of a problem will appear under a given set of circumstances, you must also demonstrate that the symptoms do not appear when those circumstances are absent. If any of the predictions fail (as long as they're directly aligned to the hypothesis), then the team is unlikely to have found the real root cause of the problem.
I was once working with a software developer trying to work out why a particular set of information was not displaying correctly on a user's screen. I won't be more specific than that because I don't want to give any clues as to which of the businesses I've worked for or with was involved.
The information didn't always display incorrectly. In some situations it was there and in others it was missing, but superficially there was no difference between the scenarios where it happened and didn't happen. It was a real headscratcher.
Eventually we tracked it down to a very minor difference in the data passing into the system between the success and failure scenarios. The data was a collection of timestamped records, sampled once per second, and it was being compared to a separate set of data that was also timestamped. Very occasionally the two timestamps were further apart than the allowed tolerance, and when that happened the particular characteristic of the data was ignored on display.
When we challenged ourselves to prove it we were able to say "if record x and record y are more than z milliseconds apart in the timestamp, the information will not display, otherwise it will". Then we set about testing the theory. Without fail, over dozens of tests, when we inspected the incoming data and applied our theory we saw the outcome we expected.
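As a rough illustration, here is a minimal Python sketch of that kind of check. Everything in it is hypothetical: the names, the 50 ms tolerance standing in for "z", and the sample observations are made up for the example and are not the real system's values. The idea is simply to write the prediction down as a rule, compare it with what was actually observed, and list every case where the prediction failed.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One test run: the measured timestamp gap and what we actually saw."""
    gap_ms: float     # gap between record x and record y (hypothetical field)
    displayed: bool   # True if the information appeared on screen


def predict_displayed(gap_ms: float, tolerance_ms: float) -> bool:
    """The hypothesis as a rule: more than the tolerance apart means no display."""
    return gap_ms <= tolerance_ms


def failed_predictions(observations, tolerance_ms):
    """Return every observation where the prediction did not match reality."""
    return [
        obs for obs in observations
        if predict_displayed(obs.gap_ms, tolerance_ms) != obs.displayed
    ]


# Made-up sample data covering both sides of the prediction:
# cases where the issue should appear and cases where it should not.
TOLERANCE_MS = 50  # assumed value for "z", for illustration only
observations = [
    Observation(gap_ms=12, displayed=True),
    Observation(gap_ms=49, displayed=True),
    Observation(gap_ms=73, displayed=False),
    Observation(gap_ms=120, displayed=False),
]

failures = failed_predictions(observations, TOLERANCE_MS)
if failures:
    print(f"{len(failures)} prediction(s) failed; the hypothesis needs rework")
else:
    print("Every prediction held for this set of observations")
```

A single entry in the failures list is enough to send you back to the hypothesis. An empty list only builds confidence once it covers enough tests on both sides of the rule.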
We had predicted both the scenarios when the issue presented itself and when it did not. Over enough tests, we considered that prediction was proof, and we were able to design a solution to the problem. On deployment, the issue did not present itself again, ever.
The key point in all of this is that most issues have many possible root causes at the outset, but only one of those options is actually causing the problem. That option will always adhere to the rules of your prediction, so test it until you can predict confidently and repeatably, and you will find the answer.