This article aims to clarify some ITIL concepts for the benefit of readers and the community.
Incident Management and Problem Management are frequently confused with each other.
Problem management should aim to reduce the adverse impact of incidents and problems caused by errors within the IT infrastructure, and to prevent the recurrence of incidents related to those errors. Problems should be addressed in priority order, with higher priority given to problems that can cause serious disruption to critical IT services.
A 'Problem' is the unknown cause of one or more incidents, often identified as a result of multiple similar incidents. A 'Known Error' is a Problem whose root cause has been identified.
The Problem Management process uses these inputs:
- Incident records and details about incidents
- Known errors
- Information about CIs (configuration items) from the CMDB (Configuration Management Database)
- Information from other processes
The Problem Management process produces these outputs:
- RFCs (Requests for Change)
- Management information
- Workarounds
- Known errors
- Updated problem records (marked as solved once the known error is resolved)
Problem management uses analysis techniques to identify the cause of a problem. Incident management is usually not concerned with the cause, only with the cure, i.e. restoration of service. Problem management takes longer and should be carried out once the urgency of the incident has been dealt with. For example, replacing a faulty computer with a working one removes the urgency and leaves the faulty computer available for diagnosis.
Significant points of difference between Incident & Problem Management
An incident never becomes a problem. An incident is an event that results in disruption to a service. A problem is its cause.
Problem management is not concerned with resolving incidents. Instead, it may provide workarounds that incident management uses to resolve incidents, or it may identify and remove the underlying causes so that the incidents do not occur in the future (proactive).
The goal of incident management is not to determine the root cause of an incident; it focuses only on restoring service. If the service worked yesterday, the team should be able to make it work as it did yesterday. For example, if the fastest way to resolve an incident is to reboot a server, then that is what is done; the incident management team does not need to know the cause to do this. Problem management, by contrast, aims to eliminate the root cause and prevent recurrence.
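The distinction above can be illustrated with a small record model. This is only a sketch under stated assumptions: the `Incident` and `Problem` classes and the `restore_service` and `link_to_problem` helpers are hypothetical and do not correspond to any particular ITSM tool. It shows that an incident is linked to a problem rather than turned into one, that service can be restored before the cause is known, and that a problem counts as a known error once its root cause is identified.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Problem:
    # Hypothetical record: the (initially unknown) cause of one or more incidents.
    problem_id: str
    root_cause: Optional[str] = None   # set once diagnosis identifies the cause
    workaround: Optional[str] = None   # may be handed to incident management
    incident_ids: List[str] = field(default_factory=list)

    def is_known_error(self) -> bool:
        # A problem whose root cause has been identified is treated as a known error.
        return self.root_cause is not None


@dataclass
class Incident:
    # Hypothetical record: an event that disrupts a service.
    incident_id: str
    description: str
    service_restored: bool = False
    problem_id: Optional[str] = None   # a link to its cause, never a conversion into a problem


def restore_service(incident: Incident) -> None:
    # Incident management: restore service (e.g. reboot a server)
    # without needing to know the root cause.
    incident.service_restored = True


def link_to_problem(incident: Incident, problem: Problem) -> None:
    # Problem management: associate the incident with its (possibly still unknown) cause.
    incident.problem_id = problem.problem_id
    problem.incident_ids.append(incident.incident_id)


# Usage: two similar incidents are resolved first, then linked to one problem.
prb = Problem("PRB-1")
inc1 = Incident("INC-1", "Server unresponsive")
inc2 = Incident("INC-2", "Server unresponsive again")
for inc in (inc1, inc2):
    restore_service(inc)        # restoring service comes first
    link_to_problem(inc, prb)   # both incidents remain incidents

print(prb.is_known_error())  # False: the cause has not been found yet
prb.root_cause = "Faulty memory module"
print(prb.is_known_error())  # True
```

Keeping incidents and problems as separate, linked records mirrors the point that an incident never becomes a problem: closing `INC-1` and `INC-2` says nothing about whether `PRB-1` has been solved.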
What comes first
The first priority is always to restore the service. One never waits for problem management to find the root cause before trying to resolve an incident. Afterwards, if the impact was significant, the cause and how to eliminate it can be investigated; otherwise the incident may simply be recorded for future reference.
A problem can exist without any immediate impact on users, for example when network diagnostics indicate that a system is not behaving as expected. Incidents are usually more visible, and their impact on users is more immediate, in the form of either degraded service or a complete outage.
Often, a delay in resolving an incident results in a financial or other penalty, as agreed in the Statement of Work between the parties. Although this increases the impact, it is not, as many believe, a reason to categorize the incident as a problem. It may, however, legitimately raise the priority of the incident.
Benefits of Problem Management
- Improved quality of the IT service
- Incident volume reduction
- Permanent solutions
- Improved organizational learning
- A better first-time fix rate at the Service Desk