ITIL Intermediate OSA - Problem Management Tutorial

Welcome to lesson 5 ‘Problem Management’ of the ITIL Intermediate OSA Tutorial, which is a part of the ITIL Intermediate OSA Certification Course. This lesson is all about Problem Management and its features.

Let us begin with the objectives of this lesson.


By the end of this ‘Problem Management’ lesson, you will be able to:

  • Discuss the purpose, objectives, scope, activities, key concepts, triggers, inputs and outputs, challenges, risks, critical success factors (CSFs) and key performance indicators (KPIs) of the Problem Management process.

  • Describe the types of problem management and key terms such as problem model, Known Error Database (KEDB), and problem-solving techniques.

Let us learn about the purpose and objective of problem management in the next section.


Problem Management Purpose and Objective

Problem Management is the process responsible for managing the lifecycle of all problems from first identification through further investigation, documentation and eventual removal.

The primary objectives of Problem Management are to:

  • Prevent problems and resulting Incidents from happening

  • Eliminate recurring Incidents

  • Minimize the impact of Incidents that cannot be prevented.

Next, we will look at the scope of problem management.

Problem Management - Scope

What is the scope of problem management?

Problem Management contains the activities needed to diagnose the root cause of Incidents and determine the resolution to the resulting problems. It is responsible for ensuring that the resolution is implemented adhering to the correct control procedures, in particular, Change Management and Release Management.

Problem Management will maintain information about problems and the appropriate workarounds and resolutions so that the organization is able to reduce the number and impact of Incidents over time. In this respect, Problem Management has strong links with Knowledge Management, and tools such as the Known Error Database will be used for both.

Although Incident and Problem Management are separate processes, they are closely related and typically will use the same tools, and may use similar categorization, impact, and priority coding systems. This will ensure effective communication when dealing with related incidents and problems.

The Problem Management process has both a reactive and a proactive approach, explained as follows:

  • Reactive problem management is concerned with solving problems in response to one or more incidents

  • Proactive problem management is concerned with identifying and solving problems and known errors before further incidents related to them can occur

  • While reactive problem management activities are performed in reaction to specific incident situations, proactive problem management activities take place as ongoing activities targeted to improve the overall availability and end user satisfaction with IT services.

  • Conducting major incident reviews, and periodically reviewing event logs for patterns and trends in warnings and exceptions, are also within the scope of problem management.

  • Conducting brainstorming sessions to identify trends, and using check sheets to proactively collect data on service or operational quality issues that may help detect underlying problems, are part of the routine work of this process.

Does problem management add value to the business? Let’s look at it in the next section.

Problem Management - Value to the Business

As mentioned earlier, problem management is the process responsible for managing the lifecycle of all problems. Let us now learn how it adds value to the business. Problem Management works together with Incident Management and Change Management to ensure that IT service availability and quality are increased.

When incidents are resolved, information about the resolution is recorded. Over time, this information is used to speed up resolution and identify permanent solutions, reducing both the number of Incidents and the time taken to resolve them. This results in less downtime and less disruption to business-critical systems.

Let us look into the key concepts and terms of problem management.

Problem Management - Key Concepts

Let us learn about the following problem management concepts and key terms.

Problem

A Problem is the unknown cause of one or more Incidents.

Workaround

A temporary solution used until the permanent solution is found.

Known Error

A problem that has a documented root cause and a workaround is a Known Error. Known Errors are created and managed throughout their lifecycle by problem management. Known Errors may also be identified by Developers or Suppliers.


Similar to Incident Models, Problem Models can be used to ensure quicker diagnosis. A workaround reduces or eliminates the impact of an Incident or Problem for which a full resolution is not yet available, for example by restarting a failed configuration item (CI).

Workarounds for Problems are documented in Known Error Records. Workarounds for Incidents that do not have associated Problem Records are documented in the Incident Record.

Known Error Database (KEDB)

The purpose of a Known Error Database is to store previous knowledge of Incidents and Problems, and how they were overcome, to allow quicker diagnosis and resolution if they recur.

The Known Error Record should hold exact details of the fault and the symptoms that occurred, together with precise details of any workaround or resolution action that can be taken to restore the service and/or resolve the Problem.

The Known Error Database is used at the initial diagnosis activity in the Incident Management process to see if any Incidents with the same or similar symptoms already exist. If they do exist, most likely there is a workaround that can be used to restore the service. The Known Error Database is owned by problem management.
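To make the KEDB lookup concrete, here is a minimal Python sketch of a small in-memory KEDB that matches a new Incident's symptoms against stored Known Error Records. The record fields (`ke_id`, `symptoms`, `root_cause`, `workaround`), the function `find_known_errors`, and the "count overlapping symptoms" matching rule are all hypothetical, chosen for illustration; real ITSM tools define their own KEDB schema and search features.

```python
from dataclasses import dataclass


@dataclass
class KnownErrorRecord:
    # Hypothetical fields; real ITSM tools define their own KEDB schema.
    ke_id: str
    symptoms: set[str]
    root_cause: str
    workaround: str


def find_known_errors(kedb, incident_symptoms, min_overlap=1):
    """Return Known Error Records that share at least `min_overlap`
    symptoms with the Incident, largest overlap first."""
    scored = []
    for record in kedb:
        overlap = len(record.symptoms & incident_symptoms)
        if overlap >= min_overlap:
            scored.append((overlap, record))
    # sort() is stable, so records with equal overlap keep their KEDB order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]


# Illustrative records, invented for this example.
kedb = [
    KnownErrorRecord("KE-001", {"login timeout", "sso error"},
                     "Expired SSO certificate", "Sign in with local credentials"),
    KnownErrorRecord("KE-002", {"slow report", "db lock"},
                     "Batch job holds a table lock", "Rerun the report after the batch window"),
]

matches = find_known_errors(kedb, {"sso error", "login timeout", "blank page"})
if matches:
    print(f"{matches[0].ke_id}: apply workaround -> {matches[0].workaround}")
else:
    print("No Known Error found; continue Incident diagnosis")
```

If a match is found, the Service Desk can apply the recorded workaround to restore service; if not, diagnosis continues and a new Problem Record may be needed.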

Problem Models

Now let’s see what a Problem Model is.

Many problems will be unique and will require handling in an individual way, but it is conceivable that some Incidents may recur because of dormant or underlying Problems (for example, where the cost of a permanent resolution will be high, so leadership decides instead to “live with the Problem”).

A Problem Model is a way of predefining the steps that should be taken to handle a process (in this case a process for dealing with a particular type of Problem) in an agreed way. Support tools can then be used to manage the required process. This will ensure that “standard” problems are handled along a predefined path and within predefined timeframes. This concept is similar to Incident Models.
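One way to picture a Problem Model is as data: an ordered list of steps, each with an owner and an agreed timeframe, from which a support tool can derive target dates. The Python sketch below encodes a hypothetical model for a recurring "disk full" problem type. The model name, steps, roles, and durations are invented for illustration, and the sketch assumes the steps run one after another.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ModelStep:
    name: str
    owner: str            # hypothetical role responsible for the step
    timeframe: timedelta  # agreed maximum duration for the step


# Hypothetical Problem Model for a recurring "disk full" problem type.
DISK_FULL_MODEL = [
    ModelStep("Confirm affected CI in the CMS", "Service Desk", timedelta(minutes=30)),
    ModelStep("Apply documented workaround (purge temp files)", "Server Team", timedelta(hours=1)),
    ModelStep("Analyse growth trend and root cause", "Problem Analyst", timedelta(days=2)),
    ModelStep("Raise RFC for permanent fix", "Problem Manager", timedelta(days=1)),
]


def schedule(model, start):
    """Return (step name, owner, target completion time) for each step,
    assuming the steps are performed sequentially from `start`."""
    plan, deadline = [], start
    for step in model:
        deadline += step.timeframe
        plan.append((step.name, step.owner, deadline))
    return plan


plan = schedule(DISK_FULL_MODEL, datetime(2024, 1, 8, 9, 0))
for name, owner, due in plan:
    print(f"{due:%Y-%m-%d %H:%M}  {owner:<15} {name}")
```

Keeping the model as data rather than hard-coded logic means the agreed path and timeframes can be reviewed and changed without altering the tool that enforces them.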

Like Incident Management, Problem Management has its own process flow. Let’s check this out in the next section.

Problem Management Process Flow

The figure below depicts the standard flow of a problem lifecycle.

It all starts with problem detection.

Once the problem is detected, it is categorized and prioritized.

Once the priority is set, the investigation starts.

In the investigation, first try to identify a solution in the KEDB. If one exists, you just need to provide that solution; if not, you have to identify the root cause of the problem through root cause analysis.

Before starting the root cause analysis, however, provide a workaround or temporary solution so that the business can keep running while the analysis is done. After the investigation, and once the resolution is implemented, inform the user and update the KEDB before closing the ticket.
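The flow above can be sketched as a small state machine. The Python below is a simplified, hypothetical model: the state names, the allowed transitions, and the `ProblemRecord` class are assumptions based on the steps described in this section, not a prescribed ITIL data model. It rejects transitions that skip a step, such as closing a record before the KEDB has been updated.

```python
from enum import Enum, auto


class State(Enum):
    DETECTED = auto()
    CATEGORIZED = auto()
    PRIORITIZED = auto()
    INVESTIGATING = auto()
    WORKAROUND_PROVIDED = auto()
    ROOT_CAUSE_ANALYSIS = auto()
    RESOLVED = auto()
    KEDB_UPDATED = auto()
    CLOSED = auto()


# Allowed transitions, following the flow described in the text.
TRANSITIONS = {
    State.DETECTED: {State.CATEGORIZED},
    State.CATEGORIZED: {State.PRIORITIZED},
    State.PRIORITIZED: {State.INVESTIGATING},
    # KEDB hit: resolve directly. No hit: provide a workaround first.
    State.INVESTIGATING: {State.RESOLVED, State.WORKAROUND_PROVIDED},
    State.WORKAROUND_PROVIDED: {State.ROOT_CAUSE_ANALYSIS},
    State.ROOT_CAUSE_ANALYSIS: {State.RESOLVED},
    State.RESOLVED: {State.KEDB_UPDATED},
    State.KEDB_UPDATED: {State.CLOSED},  # user is informed at closure
    State.CLOSED: set(),
}


class ProblemRecord:
    def __init__(self, ref):
        self.ref = ref
        self.state = State.DETECTED
        self.history = [State.DETECTED]

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.ref}: cannot go from {self.state.name} to {new_state.name}")
        self.state = new_state
        self.history.append(new_state)


def investigate(record, kedb_has_solution):
    """Drive a prioritized record through investigation to closure."""
    record.move_to(State.INVESTIGATING)
    if not kedb_has_solution:
        record.move_to(State.WORKAROUND_PROVIDED)  # keep the business running
        record.move_to(State.ROOT_CAUSE_ANALYSIS)
    record.move_to(State.RESOLVED)
    record.move_to(State.KEDB_UPDATED)
    record.move_to(State.CLOSED)


pr = ProblemRecord("PRB-101")  # hypothetical record reference
pr.move_to(State.CATEGORIZED)
pr.move_to(State.PRIORITIZED)
investigate(pr, kedb_has_solution=False)
print([s.name for s in pr.history])
```

Encoding the allowed transitions in one table makes the "workaround before root cause analysis" and "update the KEDB before closing" rules explicit and easy to audit.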

In the next few sections, we will get into the details of some of the activities of problem management.

Problem Management Activities

There are two major activities of problem management: Reactive and Proactive Problem Management.  

Reactive Problem Management

It is generally executed as part of Service Operation.

Proactive Problem Management

It is initiated in Service Operation but generally driven as part of Continual Service Improvement. CSI and Problem Management are closely related, since one of the goals of problem management is to identify and permanently remove errors that impact services from the infrastructure.

This directly supports CSI activities of identifying and implementing service improvements. Problem Management also supports CSI activities through trend analysis and the targeting of preventive action. It is likely that multiple ways of detecting problems will exist in all organizations.

These will include:

  • Suspicion or detection of an unknown cause of one or more Incidents, resulting in a Problem Record being raised.

  • Analysis of an Incident by a technical support group which reveals that an underlying Problem exists, or could exist.

  • Automated detection of an infrastructure or application fault using event/alert tools, or a notification from a supplier that a Problem exists and has to be resolved.

  • Analysis of Incidents as part of proactive Problem Management, resulting in the need to raise a Problem Record so that the underlying fault can be investigated further.


All the relevant details of the problem must be recorded so that a full historical record exists. This record must be date- and time-stamped to allow suitable control and escalation.

The problem must be categorized in the same way as Incidents (and it is good practice to use the same system) so that the true nature of the problem can be easily traced in the future and meaningful management information can be obtained.

Problems should be prioritized in the same way and for the same reason as Incidents – but the frequency, impact, and cost to resolve the related Incidents must also be taken into account.
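As an illustration of combining Incident-style priority coding with frequency and cost, the Python sketch below ranks Problem Records first by a base priority and then by the total cost of their related Incidents. The impact/urgency matrix, the field names, and the tie-breaking rule are hypothetical assumptions, not values defined by ITIL; each organization agrees its own criteria.

```python
from dataclasses import dataclass

# Hypothetical impact/urgency matrix (1 = highest priority), similar in
# shape to the priority coding many organizations use for Incidents.
PRIORITY_MATRIX = {
    ("high", "high"): 1, ("high", "medium"): 2, ("high", "low"): 3,
    ("medium", "high"): 2, ("medium", "medium"): 3, ("medium", "low"): 4,
    ("low", "high"): 3, ("low", "medium"): 4, ("low", "low"): 5,
}


@dataclass
class Problem:
    ref: str
    impact: str
    urgency: str
    related_incidents: int    # frequency: Incidents linked so far
    cost_per_incident: float  # estimated cost to resolve each related Incident


def base_priority(problem):
    return PRIORITY_MATRIX[(problem.impact, problem.urgency)]


def ranking_key(problem):
    # Lower priority number first; within the same priority, the Problem
    # whose related Incidents have cost the most comes first.
    total_cost = problem.related_incidents * problem.cost_per_incident
    return (base_priority(problem), -total_cost)


def prioritize(problems):
    return sorted(problems, key=ranking_key)


# Illustrative records, invented for this example.
problems = [
    Problem("PRB-1", "medium", "medium", 12, 500.0),
    Problem("PRB-2", "high", "medium", 3, 2000.0),
    Problem("PRB-3", "medium", "medium", 40, 100.0),
]
print([p.ref for p in prioritize(problems)])
```

Here PRB-2 ranks first on impact and urgency, while PRB-1 outranks PRB-3 at the same base priority because its related Incidents have cost more in total.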

Next, we will learn about the techniques involved in problem management.

Problem Management Techniques

What are these techniques used in resolving problems?

The investigation should be conducted according to the priority code allocated; however, speed will depend upon the impact, severity and urgency of the problem. There are a number of problem-solving techniques to help diagnose and resolve problems, such as:

  • Chronological Analysis, which places the timeline of events in chronological order to aid the investigation

  • Pain Value Analysis, a broader analysis of the impact of an Incident or Problem. An in-depth analysis is done to determine exactly what level of pain has been caused to the organization or business by these Incidents or Problems.

  • Kepner and Tregoe, one of the most common techniques, is a structured approach to problem analysis that can be used formally to investigate deeper-rooted Problems.

  • Ishikawa Diagrams are a method of documenting causes and effects, which can be useful in helping identify where something may be going wrong or could be improved.

  • Pareto Analysis is a technique for separating important potential causes from more trivial issues. These are the techniques of Problem Management used during the analysis of the root cause.
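Pareto Analysis lends itself to a short calculation: rank causes by how often they occur and keep the "vital few" that together account for most occurrences. The Python sketch below uses the common 80% cut-off as an assumed default, and the cause counts are invented for illustration; in practice they would come from categorized Incident and Problem Records.

```python
def pareto(cause_counts, threshold=0.8):
    """Return the 'vital few' causes that together account for at least
    `threshold` of all occurrences, largest first, with cumulative share."""
    total = sum(cause_counts.values())
    if total == 0:
        return []
    ranked = sorted(cause_counts.items(), key=lambda kv: kv[1], reverse=True)
    vital, cumulative = [], 0
    for cause, count in ranked:
        cumulative += count
        vital.append((cause, count, cumulative / total))
        if cumulative / total >= threshold:
            break  # remaining causes are the "trivial many"
    return vital


# Illustrative counts of incidents per suspected cause, invented for this example.
counts = {"network timeout": 45, "disk full": 25, "expired certificate": 12,
          "config drift": 10, "operator error": 5, "other": 3}

for cause, n, share in pareto(counts):
    print(f"{cause:<20}{n:>4}  cumulative {share:.0%}")
```

With these numbers, three of the six causes cover 82% of occurrences, which suggests where root cause analysis effort is best spent first.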

Next, we will look at the inputs and outputs of problem management.

Problem Management - Inputs and Outputs

The inputs and outputs of problem management are discussed below:


Inputs of problem management include:

  • Incident Records for incidents that have triggered problem management activities

  • Information about CIs and their status

  • Incident reports and histories that will be used to support proactive problem trending

  • Communication and feedback about incidents and their symptoms

  • Communication and feedback about RFCs and releases that have been implemented or planned for implementation

  • Communication of events that were triggered by event management

  • Operational and service level objectives

  • Agreed criteria for prioritizing and escalating problems

  • Outputs from risk management and risk assessment activities

  • Customer feedback on the success of problem resolution activities and overall quality of problem management activities


Outputs of problem management include:

  • Resolved Problems

  • Updated Problem Management Record

  • RFCs to remove infrastructural errors

  • Workarounds for incidents

  • Known Error Records

  • Problem Management Reports

  • Improvement recommendations

Let us look into the triggers of problem management in the next section.

Problem Management - Triggers

With reactive problem management, the vast majority of problem records will be triggered in reaction to one or more incidents, and many will be raised or initiated via service desk.

Other problem records, and corresponding known error records, may be triggered in testing, particularly the later stages of testing such as user acceptance testing (UAT), if a decision is made to go ahead with a release even though some faults are known.

Suppliers may trigger the need for some problem records through the notification of potential faults or known deficiencies in their products or services. With proactive problem management, problem records may be triggered by identification of patterns and trends in incidents when reviewing historical incident records.
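For the proactive trigger described above, a simple approach is to count recent Incidents per CI and category and flag any combination that recurs often enough to justify a Problem Record. The Python sketch below assumes a 30-day window, a threshold of five Incidents, and dictionary-based Incident records with `ci`, `category`, and `opened` keys; all of these are hypothetical choices for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def trend_candidates(incidents, today, window_days=30, threshold=5):
    """Return ((ci, category), count) pairs with at least `threshold`
    Incidents opened in the last `window_days` days, most frequent first."""
    since = today - timedelta(days=window_days)
    counts = Counter(
        (inc["ci"], inc["category"])
        for inc in incidents
        if since < inc["opened"] <= today
    )
    return [(key, n) for key, n in counts.most_common() if n >= threshold]


# Illustrative incident history, invented for this example.
today = date(2024, 3, 31)
history = (
    [{"ci": "mail-srv-01", "category": "performance",
      "opened": today - timedelta(days=d)} for d in (1, 3, 4, 9, 15, 22)]
    + [{"ci": "hr-app", "category": "login",
        "opened": today - timedelta(days=d)} for d in (2, 40, 55)]
)

for (ci, category), n in trend_candidates(history, today):
    print(f"Consider raising a Problem Record: {ci} / {category} ({n} incidents)")
```

The window and threshold would be tuned to each organization's incident volumes; a human reviewer still decides whether a flagged pattern warrants a Problem Record.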

Next, we will learn about the interfaces of problem management.

Problem Management Interfaces

Problem Management interfaces with other processes which we will study below:

Change Management

With Change Management, Problem Management ensures that all resolutions or workarounds that require a change to a CI are submitted to Change Management through an RFC.

Configuration Management

With Configuration Management, Problem Management uses the CMS to identify faulty CIs and also to determine the impact of problems and resolutions. The CMS can also be used to form the basis for the KEDB and to hold or integrate with Problem Records.

Release and Deployment Management

Problem Management also interfaces with Release and Deployment Management, which is responsible for rolling Problem fixes out into the live environment. Release and Deployment Management also assists in ensuring that the associated Known Errors are transferred from the development Known Error Database into the live Known Error Database.

Availability Management

Availability Management is involved in determining how to reduce downtime and increase uptime. As such, it has a close relationship with Problem Management, especially in the proactive areas.

Capacity Management

Some problems, such as performance issues, require investigation by Capacity Management teams and techniques. Capacity Management will also assist in assessing proactive measures.

IT Service Continuity Management

With IT Service Continuity, Problem Management acts as an entry point into IT Service Continuity Management where a significant Problem is not resolved before it starts to have a major impact on the business.

Service Level Management

Problem Management also shares a relationship with Service Level Management. The occurrence of Incidents and Problems affects the level of service delivery measured by SLM. Problem Management contributes to improvements in service levels, and its management information is used as the basis of some of the SLA review components.

Financial Management

Financial Management assists in assessing the impact of proposed resolutions or workarounds, as well as in Pain Value Analysis. Problem Management provides management information about the cost of resolving and preventing problems, which is used as input into budgeting and accounting systems. This shows the strong link between Financial Management and Problem Management.

We have now looked at how problem management works in tandem with other processes. Let us proceed to see how information is managed in the problem management process.

Problem Management - Information Management

The CMS will hold details of all of the components of the IT infrastructure as well as the relationships between these components. It will act as a valuable source for problem diagnosis and for evaluating the impact of Problems (e.g., if this server is down, what data is on that server?; which services use that data?; which users use those services?).

As the CMS also holds details of previous activities, it can be used as a valuable source of historical data to help identify trends or potential weaknesses, a key part of proactive problem management. The KEDB is another means of managing information under problem management.
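The impact questions above (which data sits on a server, which services use that data, which users use those services) amount to walking the CMS relationships outward from a failed CI. The Python sketch below does this with a breadth-first search over a hypothetical CMS extract; the CI names and the "CI maps to the CIs that depend on it" layout are assumptions for illustration, since real CMS tools expose relationships through their own interfaces.

```python
from collections import deque

# Hypothetical CMS extract: each CI maps to the CIs that depend on it
# (server -> data it hosts -> services using the data -> user groups).
DEPENDENTS = {
    "srv-db-01": ["customer-db"],
    "customer-db": ["online-orders", "crm"],
    "online-orders": ["web-customers"],
    "crm": ["sales-team", "support-team"],
}


def impacted_by(ci, dependents):
    """Breadth-first walk returning every CI that directly or indirectly
    depends on `ci`, in the order they are reached."""
    seen, order, queue = {ci}, [], deque([ci])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in seen:  # guard against cycles and shared dependents
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


print(impacted_by("srv-db-01", DEPENDENTS))
```

Breadth-first order lists the nearest dependents first, which roughly mirrors the server, data, service, and user layers in the question above.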

Let us look into the metrics of problem management.

Problem Management - Metrics

Based on the goals of the target audience (operational, tactical, or strategic), the process owners need to define what they should measure in a perfect world. To do this, map the activities of the process that need to be measured.

Consider what measurements would indicate that each service and Service Management activity is being performed consistently and can determine the health of the process. It is important to identify the measurements that can be provided based on existing toolsets, organizational culture, and process maturity.

Note that there may be a gap in what can be measured vs. what should be measured. When initially implementing processes don’t try to measure everything, rather be selective about what measures will help to understand the health of a process. A major mistake many organizations make is trying to do too much in the beginning. Be smart about what you choose to measure.

Next, let us look at the CSFs and KPIs of problem management.

Problem Management - CSFs and KPIs

Each organization should identify appropriate CSFs based on its objectives for the process. Each sample CSF is followed by a small number of typical KPIs that support the CSF. These KPIs should not be adopted without careful consideration.

Each organization should develop KPIs that are appropriate for its level of maturity, its CSFs and its particular circumstances. Achievement against KPIs should be monitored and used to identify opportunities for improvement, which should be logged in the continual service improvement (CSI) register for evaluation and possible implementation.

The following table depicts the CSFs and their corresponding KPIs:



Minimizing the impact to the business of incidents that cannot be prevented.

  • The number of known errors added to the KEDB

  • The percentage accuracy of the KEDB

  • Percentage of incidents closed by the service desk without reference to other levels of support

  • Average incident resolution time for those incidents linked to problem records

Maintain quality of IT services through the elimination of recurring incidents

  • Total number of problems

  • Size of current problem backlog for each IT Service

  • Number of repeat incidents for each IT service

Provide overall quality and professionalism of problem handling activities to maintain business confidence in IT capabilities

  • The number of major problems

  • The percentage of major problem reviews successfully performed

  • The percentage of major problem reviews completed successfully and on time

  • Number and percentage of problems incorrectly assigned

  • Number and percentage of problems incorrectly categorized.
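Several of the KPIs above can be computed directly from Incident data. The Python sketch below calculates three of them: the percentage of incidents closed by the service desk, the average resolution time of incidents linked to Problem Records, and repeat incidents per IT service. The `Incident` fields and the definition of a "repeat" (any incident after the first with the same service and category) are hypothetical assumptions for illustration; each organization should agree its own definitions.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Incident:
    service: str
    category: str
    resolved_by: str                   # "service_desk" or another support level
    resolution_hours: float
    problem_ref: Optional[str] = None  # linked Problem Record, if any


def first_line_closure_rate(incidents):
    """Percentage of incidents closed by the service desk without escalation."""
    if not incidents:
        return 0.0
    closed_at_desk = sum(1 for i in incidents if i.resolved_by == "service_desk")
    return 100.0 * closed_at_desk / len(incidents)


def avg_resolution_linked(incidents):
    """Average resolution time (hours) of incidents linked to Problem Records."""
    linked = [i.resolution_hours for i in incidents if i.problem_ref]
    return mean(linked) if linked else None


def repeat_incidents_per_service(incidents):
    # Assumption: a "repeat" is any incident after the first one with the
    # same service and category.
    per_key = Counter((i.service, i.category) for i in incidents)
    repeats = Counter()
    for (service, _category), n in per_key.items():
        repeats[service] += n - 1
    return dict(repeats)


# Illustrative data, invented for this example.
incidents = [
    Incident("email", "mailbox full", "service_desk", 1.0),
    Incident("email", "mailbox full", "service_desk", 0.5),
    Incident("email", "sync error", "second_line", 6.0, "PRB-7"),
    Incident("payroll", "batch failure", "third_line", 10.0, "PRB-9"),
]

print(first_line_closure_rate(incidents))
print(avg_resolution_linked(incidents))
print(repeat_incidents_per_service(incidents))
```

Tracking these values over time, rather than as one-off snapshots, is what makes them useful for spotting improvement opportunities for the CSI register.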

Problem Management - Challenges and Risks

Key Challenges associated with problem management are:

  • Linking Incident and Problem Management tools

  • The ability to relate Incident and Problem Records

  • Ensuring that second- and third-line staff have a good working relationship with first-line staff

  • Making sure that business impact is well understood by all staff working on problem resolution

  • The ability to use all available Knowledge and Configuration Management resources

  • Training of technical staff

Problem Management risks include:

  • Not properly aligned with incident management

  • Not focused on process improvement

  • Lack of management commitment

  • Poor communications


In this lesson, we have learned about the purpose, objectives, scope, types, metrics, CSFs and KPIs, challenges and risks of problem management.

The next lesson talks about Access Management.
