The best reason to follow a structured approach is to avoid the shot in the dark that often results when you don't gather enough information or work to narrow down the cause of the problem. Without structure, troubleshooting becomes haphazard guessing, with little record of what you have changed and why.
A general outline of troubleshooting, as described in the first paragraph of this chapter:
1. Report
2. Define the issue
3. Gather Information
4. Better define the issue
5. Hypothesize root cause
6. Propose resolution
7. Test resolution
8. Document solution
Troubleshooting Model:
1. Problem report — Often needs further investigation.
2. Problem diagnosis
a. Collect information — Use show and debug commands to get a better understanding of the problem.
b. Examine information
— Look for evidence that points to the cause.
— Look for evidence that can eliminate a vector.
— What is happening on the network.
— What should be happening on the network.
c. Eliminate causes — Start to form hypotheses for what is wrong.
d. Decide most likely cause — Once the possible causes have been narrowed down, decide on the most plausible one.
e. Verify hypothesis — Determine how you can test whether you have found the problem.
3. Problem resolution — Apply solution, test and document.
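The diagnosis steps above (examine information, eliminate causes, decide the most likely cause) can be sketched in a few lines of Python. This is only an illustration: the `Hypothesis` class, the observation labels, and the candidate causes are all hypothetical and not taken from the chapter.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """A candidate cause (hypothetical structure, for illustration only)."""
    name: str
    # Observations that rule this cause out if they are seen.
    ruled_out_by: set = field(default_factory=set)
    # Observations that point toward this cause if they are seen.
    supported_by: set = field(default_factory=set)


def rank_hypotheses(hypotheses, observations):
    """Drop causes contradicted by the evidence, then rank the rest by support."""
    remaining = [h for h in hypotheses if not (h.ruled_out_by & observations)]
    return sorted(
        remaining,
        key=lambda h: len(h.supported_by & observations),
        reverse=True,
    )


# Made-up observations, as if collected with show and debug commands.
observations = {"interface_up", "ping_gateway_ok", "ping_remote_fails"}

hypotheses = [
    Hypothesis("cable fault",
               ruled_out_by={"interface_up"},
               supported_by={"interface_down"}),
    Hypothesis("missing route",
               supported_by={"ping_gateway_ok", "ping_remote_fails"}),
    Hypothesis("wrong VLAN on access port",
               ruled_out_by={"ping_gateway_ok"},
               supported_by={"ping_gateway_fails"}),
]

ranked = rank_hypotheses(hypotheses, observations)
print([h.name for h in ranked])
```

With these sample observations, the cable fault and the VLAN problem are eliminated, which leaves the missing route as the hypothesis to verify.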
Troubleshooting Methods:
- Top-down Method — Start at layer 7, the application and work down.
- Bottom-up Method — Start at layer 1 and make sure the physical layer is correct. Not efficient in larger networks.
- Divide and Conquer — Begin in the middle of the OSI stack, typically by pinging at layer 3, and work up or down the stack depending on the result.
- Follow the traffic — Work through the network one switch at a time from source to destination.
- Configuration Comparison — Compare a working configuration with one that does not work.
- Component swapping — Swap components one at a time to find the one that is not working, whether it is faulty hardware or misconfigured.
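As one illustration of the configuration comparison method, the sketch below uses Python's standard `difflib` module to list only the lines that differ between a working and a non-working configuration. The two interface snippets are invented for the example, and `config_diff` is a hypothetical helper name.

```python
import difflib

# Invented sample configurations; in practice these would be saved device configs.
working = """interface GigabitEthernet0/1
 switchport mode access
 switchport access vlan 10
 no shutdown
"""

broken = """interface GigabitEthernet0/1
 switchport mode access
 switchport access vlan 20
 no shutdown
"""


def config_diff(good, bad):
    """Return only the lines that were removed from or added to the config."""
    diff = difflib.ndiff(good.splitlines(), bad.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ";
    # unchanged lines ("  ") and hint lines ("? ") are dropped here.
    return [line for line in diff if line.startswith(("- ", "+ "))]


changes = config_diff(working, broken)
for line in changes:
    print(line)
```

Here the only difference reported is the access VLAN, which narrows the investigation to a single configuration line.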
Troubleshooting depends on knowing what should be happening on the network as opposed to what is actually happening. The best way to know that is to have a baseline, which could include SNMP and NetFlow data.
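A minimal sketch of comparing current measurements against a baseline follows. The metric names, the values, and the 25% tolerance are assumptions made for illustration; in practice the numbers would come from whatever SNMP or NetFlow collection is in place rather than being hard-coded.

```python
def deviations(baseline, current, tolerance=0.25):
    """Return metrics whose current value differs from the baseline
    by more than `tolerance`, expressed as a fraction of the baseline."""
    flagged = {}
    for metric, expected in baseline.items():
        observed = current.get(metric)
        if observed is None:
            continue  # metric not collected this time, nothing to compare
        if expected == 0:
            # Any non-zero value is a change when the baseline is zero.
            changed = observed != 0
        else:
            changed = abs(observed - expected) / abs(expected) > tolerance
        if changed:
            flagged[metric] = (expected, observed)
    return flagged


# Hypothetical numbers: what the network normally looks like vs. right now.
baseline = {"cpu_percent": 20, "link_util_percent": 35, "input_errors_per_min": 0}
current = {"cpu_percent": 22, "link_util_percent": 90, "input_errors_per_min": 14}

flagged = deviations(baseline, current)
for metric, (expected, observed) in flagged.items():
    print(f"{metric}: baseline {expected}, now {observed}")
```

With these sample values, link utilization and input errors are flagged, while the small change in CPU load stays within the tolerance.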