The article below aims to clarify some ITIL concepts for the benefit of readers and the community.
This write-up is prompted by the need to clarify the difference between ITIL Incident Management and ITIL Problem Management. One would guess that there should be a simple answer, given the enormous amount of resources and information available on the subject. True, the subject has been debated extensively, yet the jury is still out.
It is worth noting that the answer rests on best practice rather than on a hard rule. Whether an organization uses the terms incident and problem interchangeably or clearly delineates the two is up to the organization to decide.
Generally speaking, an incident is an event that results in disruption to a service. Incident Management is tasked with restoring normal operations when a service is interrupted, degraded or down. A problem, on the other hand, is the cause of an incident. Problem Management is tasked with preventing the problem from recurring. In other words, Incident Management wants to "fix it now" by just about any means necessary, including a workaround. Problem Management is a more methodical discipline: it looks for patterns that indicate a systemic problem and sets in motion actions that will prevent it from happening again.
According to ITIL V3, “A problem is a cause of one or more Incidents. The cause is not usually known at the time a Problem Record is created, and the Problem Management Process is responsible for further investigation”. ITIL V3 Foundation training has become one of the most sought-after courses among organizations across the globe.
The confusion between incident and problem management arises from several factors:
- Misunderstanding of the difference between an incident and a problem: Problems are not "really big incidents", as some people describe them. Theoretically, every incident has an underlying cause, which is a problem. Multiple incidents can point to one underlying problem, such as a network cable pulled in the data center that shuts down several services for end users. Or a problem could simply be a user who needs some training.
- Multi-functional resources: If the people doing Incident Management are the same people doing Problem Management, proper Problem Management will simply not happen, because of the constant firefighting. Problem Management follows a more methodical approach; it takes time, special skills and tools, and is a major investment, unlike Incident Management, where the focus is on quick turnaround. The decision needs to be made carefully and the payoff weighed: the gains from fewer outages and a reduced need for Incident Management resources should outweigh the investment in Problem Management.
- No clear guidance on major incidents: There is no clear guidance on who should handle major incidents or how, which leaves the decision to each organization and creates room for confusion.
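The first point above, where several incidents trace back to one underlying problem, can be sketched as a small data model. This is only an illustration in Python: the class names, fields and record IDs are hypothetical and are not taken from ITIL or from any particular service desk tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    """A single service disruption reported to the service desk (hypothetical model)."""
    incident_id: str
    service: str
    description: str
    resolved: bool = False            # service restored, possibly via a workaround
    problem_id: Optional[str] = None  # set once an underlying problem is identified


@dataclass
class Problem:
    """The underlying cause of one or more incidents (hypothetical model)."""
    problem_id: str
    summary: str
    root_cause: Optional[str] = None  # usually unknown when the record is created
    incident_ids: List[str] = field(default_factory=list)

    def link(self, incident: Incident) -> None:
        """Associate an incident with this problem, avoiding duplicates."""
        incident.problem_id = self.problem_id
        if incident.incident_id not in self.incident_ids:
            self.incident_ids.append(incident.incident_id)


# Made-up example: one pulled network cable disrupts several services.
incidents = [
    Incident("INC-1", "email", "Users cannot send mail"),
    Incident("INC-2", "file share", "Shared drive unreachable"),
    Incident("INC-3", "intranet", "Intranet pages time out"),
]
problem = Problem("PRB-1", "Network outage in the data center")
for inc in incidents:
    problem.link(inc)
```

Each incident can be resolved on its own timeline, while the problem record stays open until the root cause is found and removed.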
One of the major differences between incident and problem management is that the former is reactive and the latter proactive. Incident management focuses on restoring services disrupted by an event (the problem or cause), while problem management focuses on eliminating the cause altogether, which leads to fewer incidents.
Incident management works on the concept of “here and now”. One cannot leave an incident unresolved for a significant time while the problem management team works on a resolution for the root cause. The severity and impact of the incident must dictate how quickly a workaround is developed, within the SLAs prescribed for incident management.
Severity & Impact
The severity and impact of an incident are not the same thing. Severity usually denotes the scale of the disruption, while impact is the number of users who cannot use the service and the extent to which they are affected.
Incident and problem management exist side by side; again, how they are combined is the organization's choice.
Incident management is never concerned with a formal root cause analysis (RCA) of the incident; that is formally the role of problem management. Again, how RCA is handled is up to the organization. If, however, the cause is known during the incident, it is documented for the record and verified through formal problem management.
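To make the distinction concrete, here is a minimal Python sketch that keeps severity and impact as separate fields and combines them into a restoration target. Everything in it is an assumption made for illustration: the severity levels, the user-count thresholds, the minute values and the function name `restore_target_minutes` are invented, and a real organization would define its own scales and SLAs.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Scale of the disruption (hypothetical levels)."""
    DEGRADED = 1
    PARTIAL_OUTAGE = 2
    FULL_OUTAGE = 3


@dataclass
class Impact:
    """Who is affected and to what extent (hypothetical fields)."""
    affected_users: int   # number of users who cannot use the service
    fully_blocked: bool   # True if users are blocked entirely, False if only hindered


def restore_target_minutes(severity: Severity, impact: Impact) -> int:
    """Return an illustrative SLA target, in minutes, for restoring service.

    The scoring and thresholds below are made up for this example.
    """
    score = int(severity)
    if impact.affected_users >= 100:
        score += 2
    elif impact.affected_users >= 10:
        score += 1
    if impact.fully_blocked:
        score += 1

    if score >= 5:
        return 60
    if score >= 3:
        return 240
    return 480


# A full outage blocking 500 users gets the tightest target.
urgent = restore_target_minutes(Severity.FULL_OUTAGE, Impact(500, True))
# A degraded service hindering 3 users gets the most relaxed one.
minor = restore_target_minutes(Severity.DEGRADED, Impact(3, False))
```

Keeping the two values separate means that a severe disruption affecting a handful of users and a mild one affecting everyone can be weighed differently, instead of being collapsed into a single rating too early.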