ITIL Intermediate OSA - Problem Management Tutorial

5.1 Problem Management

Hello and welcome to the eLearning course on ITIL 2011 Intermediate OSA module preparation, by Simplilearn. This is module 5, on Problem Management. As a reminder, you must have access to all 9 modules and attempt the quiz at the end of this module. So far, we have covered the ITIL Foundation, Introduction to OSA, Event Management, Incident Management and Request Fulfillment modules. Now let us proceed to learn about Problem Management. The next slide covers the agenda of this module.

5.2 Problem Management

Here we will discuss the purpose, objectives, scope, activities, key concepts, triggers, inputs and outputs, challenges, risks, critical success factors (CSFs) and KPIs of the Problem Management process. We will also learn about the types of problem management and be introduced to terms such as problem model and Known Error Database (KEDB), and to problem-solving techniques. Let us learn about the purpose and objectives of problem management in the next slide.

5.3 Problem Management - Objective

What are the purpose and objectives of problem management? Problem Management is the process responsible for managing the lifecycle of all Problems. Its primary objectives are to prevent Problems and resulting Incidents from happening, to eliminate recurring Incidents and to minimize the impact of Incidents that cannot be prevented. Next, we will look at the scope of problem management.

5.4 Problem Management - Scope

What is the scope of problem management? Problem Management contains the activities needed to diagnose the root cause of Incidents and determine the resolution to the resulting Problems. It is responsible for ensuring that the resolution is implemented through the correct control procedures, in particular Change Management and Release Management. Problem Management also maintains information about Problems and the appropriate work-arounds and resolutions, so that the organization is able to reduce the number and impact of Incidents over time. In this respect, Problem Management has strong links with Knowledge Management, and tools such as the Known Error Database will be used for both. Although Incident and Problem Management are separate processes, they are closely related and will typically use the same tools, and may use similar categorization, impact and priority coding systems. This ensures effective communication when dealing with related Incidents and Problems.

The Problem Management process has both a reactive and a proactive approach:
• Reactive problem management is concerned with solving problems in response to one or more incidents.
• Proactive problem management is concerned with identifying and solving problems and known errors before further incidents related to them can occur.
• While reactive problem management activities are performed in reaction to specific incident situations, proactive problem management activities are ongoing and aim to improve the overall availability of IT services and end user satisfaction with them.
• Conducting major incident reviews and periodic scheduled reviews of event logs, targeting patterns and trends of warnings and exceptions, is also within the scope of Problem Management.
• Conducting brainstorming sessions to identify trends, and using check sheets to proactively collect data on service or operational quality issues that may help to detect underlying problems, are close to day-to-day work for this process.

Does problem management add value to the business? Let’s look at it in the next slide.

5.5 Problem Management - Value to the Business

As mentioned earlier, Problem Management is the process responsible for managing the lifecycle of all problems. Let us now see how it adds value to the business. Problem Management works together with Incident Management and Change Management to ensure that IT service availability and quality are increased. When Incidents are resolved, information about the resolution is recorded. Over time, this information is used to speed up resolution and identify permanent solutions, reducing both the number of Incidents and the time taken to resolve them. This results in less downtime and less disruption to business-critical systems. Let us look into the key concepts and terms of problem management.

5.6 Problem Management - Key Concepts

In this slide we will learn about problem management concepts and key terms. A Problem is the underlying cause of one or more Incidents; the cause is usually not known at the time the Problem Record is created. A Problem that has a documented Root Cause and a Work-around is a Known Error. Known Errors are created and managed throughout their lifecycle by Problem Management. Known Errors may also be identified by Developers or Suppliers. Many Problems will be unique and will require handling in an individual way, but some Incidents may recur because of dormant or underlying Problems (for example, where the cost of a permanent resolution is high and a decision has been taken to “live with the Problem” rather than go ahead with an expensive solution). As with Incident Models, you can use Problem Models to ensure quicker diagnosis in such cases. A Work-around is a way of reducing or eliminating the impact of an Incident or Problem for which a full resolution is not yet available, for example by restarting a failed Configuration Item. Work-arounds for Problems are documented in Known Error Records. Work-arounds for Incidents that do not have associated Problem Records are documented in the Incident Record. Having looked at the key concepts, we continue in the next slide with some of the key terms.

5.7 Problem Management - Key Concepts

In this slide we will see the details of two key concepts of problem management: the Known Error Database and Problem Models.

Known Error Database (KEDB): The purpose of a Known Error Database is to store previous knowledge of Incidents and Problems, and how they were overcome, to allow quicker diagnosis and resolution if they recur. The Known Error Record should hold exact details of the fault and the symptoms that occurred, together with precise details of any work-around or resolution action that can be taken to restore the service and/or resolve the Problem. The Known Error Database is used during the initial diagnosis activity in the Incident Management process to see whether any Incidents with the same or similar symptoms have been recorded before. If they have, there is most likely a work-around that can be used to restore the service. The Known Error Database is owned by Problem Management.

Problem Model: Many Problems will be unique and will require handling in an individual way, but some Incidents may recur because of dormant or underlying Problems (for example, where the cost of a permanent resolution is high, so leadership decides instead to “live with the Problem”). A Problem Model is a way of predefining the steps that should be taken to handle a particular type of Problem in an agreed way. Support tools can then be used to manage the required process. This ensures that “standard” Problems are handled along a predefined path and within predefined timeframes. The concept is similar to that of Incident Models.

Like the incident management process, problem management has its own process flow. Let’s check this out in the next slide.
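
To make the KEDB lookup during initial diagnosis more concrete, here is a minimal Python sketch. The record fields (`error_id`, `fault`, `symptoms`, `workaround`), the sample entries and the simple symptom-overlap matching are all assumptions made for illustration; they are not prescribed by ITIL or taken from any particular service management tool.

```python
from dataclasses import dataclass, field


@dataclass
class KnownErrorRecord:
    """Hypothetical Known Error Record: fault, symptoms and work-around."""
    error_id: str
    fault: str
    symptoms: set = field(default_factory=set)
    workaround: str = ""


def find_known_errors(kedb, incident_symptoms):
    """Return records sharing at least one symptom with the incident,
    ordered so that the record with the most shared symptoms comes first."""
    reported = {s.lower() for s in incident_symptoms}
    matches = []
    for record in kedb:
        shared = reported & {s.lower() for s in record.symptoms}
        if shared:
            matches.append((len(shared), record))
    # Sort on the overlap count only; ties keep their KEDB order.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in matches]


# Illustrative data only.
kedb = [
    KnownErrorRecord("KE-001", "Mail queue stalls after certificate renewal",
                     {"mail delayed", "queue growing"},
                     "Restart the mail relay service"),
    KnownErrorRecord("KE-002", "Print spooler memory leak",
                     {"printing slow", "spooler crash"},
                     "Restart the print spooler"),
]

hits = find_known_errors(kedb, ["Mail delayed", "login slow"])
for record in hits:
    print(record.error_id, "->", record.workaround)
```

A real KEDB search would usually be richer (free-text search, categories, affected CIs); the point here is only that matching symptoms lead the service desk to a documented work-around.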

5.8 Problem Management - Process Flow

The figure depicts the standard flow of a problem lifecycle. It all starts with problem detection. Once the problem is detected, it is categorized and prioritized. Once the priority is set, the investigation starts. In the investigation you first try to identify a solution from the KEDB. If one is there, you simply provide that solution; if not, you have to work towards identifying the cause of the problem through root cause analysis. Before starting the root cause analysis, however, you have to provide a workaround or temporary solution so that the business can keep running while the analysis is done. After the investigation, and once the resolution is implemented, inform the user and update the KEDB before closing the problem record. In the next few slides we will go into the details of some of the activities of problem management.
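
The flow described above can be sketched in a few lines of Python. This is an illustrative outline only: the function name `run_problem_flow`, the dictionary used as a stand-in for the KEDB, and the two callback parameters are assumptions introduced for this example, not part of the ITIL process definition.

```python
def run_problem_flow(symptom, kedb, find_workaround, analyse_root_cause):
    """Walk one problem through the flow; return (solution, ordered step log).

    kedb is a plain dict mapping a symptom to a documented solution
    (a hypothetical stand-in for a real Known Error Database).
    """
    log = ["detect", "categorize", "prioritize", "investigate"]

    solution = kedb.get(symptom)
    if solution is not None:
        # A documented solution already exists: just provide it.
        log.append("apply known solution")
    else:
        # Keep the business running while the analysis is under way.
        workaround = find_workaround(symptom)
        log.append(f"provide workaround: {workaround}")
        solution = analyse_root_cause(symptom)
        log.append("root cause analysis")

    log.append("implement resolution")
    log.append("inform user")
    kedb[symptom] = solution  # record the outcome for future incidents
    log.append("update KEDB")
    log.append("close")
    return solution, log


# Illustrative usage with made-up data.
kedb = {"disk full": "archive old logs"}

known_solution, known_log = run_problem_flow(
    "disk full", kedb,
    find_workaround=lambda s: "unused",
    analyse_root_cause=lambda s: "unused",
)

new_solution, new_log = run_problem_flow(
    "slow login", kedb,
    find_workaround=lambda s: "restart auth service",
    analyse_root_cause=lambda s: "patch auth service",
)
print(new_log)
```

The second call shows the longer path: no KEDB entry exists, so a workaround is issued first, root cause analysis follows, and the KEDB gains a new entry before closure.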

5.9 Problem Management - Activities

There are two major types of Problem Management activity: reactive and proactive (we have already looked at them briefly under the scope of problem management). Reactive Problem Management is generally executed as part of Service Operation. Proactive Problem Management is initiated in Service Operation, but generally driven as part of Continual Service Improvement (CSI). CSI and Problem Management are closely related, since one of the goals of Problem Management is to identify and permanently remove from the infrastructure the errors that impact services. This directly supports CSI activities of identifying and implementing service improvements. Problem Management also supports CSI activities through trend analysis and the targeting of preventive action.

It is likely that multiple ways of detecting Problems will exist in all organizations. These will include:
• Suspicion or detection of an unknown cause of one or more Incidents, resulting in a Problem Record being raised.
• Analysis of an Incident by a technical support group which reveals that an underlying Problem exists, or could exist.
• Automated detection of an infrastructure or application fault, using event/alert tools.
• A notification from a supplier that a Problem exists that has to be resolved.
• Analysis of Incidents as part of proactive Problem Management, resulting in the need to raise a Problem Record so that the underlying fault can be investigated further.

Once a Problem is detected, it is logged, categorized and prioritized:
• Logging: all the relevant details of the Problem must be recorded so that a full historic record exists. The record must be date and time stamped to allow suitable control and escalation.
• Categorization: Problems must be categorized in the same way as Incidents (and it is good practice to use the same system) so that the true nature of the Problem can be easily traced in the future and meaningful management information can be obtained.
• Prioritization: Problems should be prioritized in the same way and for the same reasons as Incidents, but the frequency and impact of the related Incidents, and the cost of resolving them, must also be taken into account.

Next we will learn about the techniques involved in problem management.

5.10 Problem Management - Techniques

So, what are the techniques used in resolving problems? The investigation should be conducted according to the priority code allocated; however, its speed will depend upon the impact, severity and urgency of the Problem. There are a number of problem-solving techniques that help to diagnose and resolve Problems:
• Chronological Analysis takes into account the timeline of events, in chronological order, to aid investigation.
• Pain Value Analysis is a broader analysis of the impact of an Incident or Problem. An in-depth analysis is done to determine exactly what level of pain has been caused to the organization or business by these Incidents or Problems.
• Kepner and Tregoe is one of the most common techniques. It is a useful method of problem analysis that can be used formally to investigate deeper-rooted Problems.
• Ishikawa Diagrams are a method of documenting causes and effects, which can be useful in helping to identify where something may be going wrong or could be improved.
• Pareto Analysis is a technique for separating important potential causes from more trivial issues.

These are the techniques of Problem Management used during the analysis of the root cause. Next we will look at the inputs and outputs of problem management.
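
As a small illustration of Pareto Analysis, the following Python sketch ranks incident causes by frequency and keeps the few causes that together account for a chosen share of all incidents. The cause names, the counts and the 80% threshold are assumptions chosen for the example; the slide itself does not prescribe a threshold.

```python
from collections import Counter


def pareto(causes, threshold=0.8):
    """Return the top causes whose cumulative share first reaches `threshold`.

    Each result row is (cause, count, cumulative_share).
    """
    counts = Counter(causes)
    total = sum(counts.values())
    vital, cumulative = [], 0
    for cause, count in counts.most_common():
        cumulative += count
        share = cumulative / total
        vital.append((cause, count, share))
        if share >= threshold:
            break
    return vital


# Made-up incident causes for one review period.
incident_causes = (["network"] * 50 + ["software"] * 30
                   + ["hardware"] * 12 + ["user error"] * 8)

result = pareto(incident_causes)
for cause, count, share in result:
    print(f"{cause}: {count} incidents, cumulative {share:.0%}")
```

With this sample data, two of the four causes account for 80% of the incidents, so they would be the natural first candidates for problem investigation.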

5.11 Problem Management - Inputs and Outputs

Inputs to problem management include:
• Incident Records for incidents that have triggered problem management activities
• Information about CIs and their status
• Incident reports and histories that will be used to support proactive problem trending
• Communication and feedback about incidents and their symptoms
• Communication and feedback about RFCs and releases that have been implemented or planned for implementation
• Communication of events that were triggered from event management
• Operational and service level objectives
• Agreed criteria for prioritizing and escalating problems
• Outputs from risk management and risk assessment activities
• Customer feedback on the success of problem resolution activities and the overall quality of problem management activities

Outputs are:
• Resolved Problems
• Updated Problem Management Records
• RFCs to remove infrastructure errors
• Workarounds for incidents
• Known Error Records
• Problem Management Reports
• Improvement Recommendations

5.12 Problem Management - Triggers

With reactive problem management, the vast majority of problem records will be triggered in reaction to one or more incidents, and many will be raised or initiated via the service desk. Other problem records, and corresponding known error records, may be triggered in testing, particularly in the later stages of testing such as user acceptance testing (UAT), if a decision is made to go ahead with a release even though some faults are known. Suppliers may trigger the need for some problem records through the notification of potential faults or known deficiencies in their products or services. With proactive problem management, problem records may be triggered by the identification of patterns and trends in incidents when reviewing historical incident records. Next we will learn about the interfaces of problem management.

5.13 Problem Management - Interfaces

Problem Management interfaces with other processes, which we will study in this slide.

Change Management: Problem Management ensures that all resolutions or work-arounds that require a change to a CI are submitted to Change Management through an RFC.

Configuration Management: Problem Management uses the CMS to identify faulty CIs and also to determine the impact of Problems and resolutions. The CMS can also be used to form the basis for the KEDB and to hold or integrate with Problem Records.

Release and Deployment Management: this process is responsible for rolling Problem fixes out into the live environment. It also assists in ensuring that the associated Known Errors are transferred from the development Known Error Database into the live Known Error Database.

Availability Management: this process is involved with determining how to reduce downtime and increase uptime. As such, it has a close relationship with Problem Management, especially in the proactive areas.

Capacity Management: some Problems, for example performance issues, will require investigation by Capacity Management teams and techniques. Capacity Management will also assist in assessing proactive measures.

IT Service Continuity Management: Problem Management acts as an entry point into IT Service Continuity Management where a significant Problem is not resolved before it starts to have a major impact on the business.

Service Level Management (SLM): the occurrence of Incidents and Problems affects the level of service delivery measured by SLM. Problem Management contributes to improvements in service levels, and its management information is used as the basis of some of the SLA review components.

Financial Management: this process assists in assessing the impact of proposed resolutions or work-arounds, as well as in Pain Value Analysis. Problem Management provides management information about the cost of resolving and preventing Problems, which is used as input into the budgeting and accounting systems.

We have now looked at how problem management works in tandem with other service management processes. Let us proceed to see how information is managed in the problem management process.

5.14 Problem Management - Information Management

The CMS will hold details of all of the components of the IT infrastructure as well as the relationships between these components. It will act as a valuable source for Problem diagnosis and for evaluating the impact of Problems (for example: if this server is down, what data is on that server? Which services use that data? Which users use those services?). As it will also hold details of previous activities, it can be used as a valuable source of historical data to help identify trends or potential weaknesses, a key part of proactive Problem Management. The KEDB is another means of managing information under problem management (refer to slide 120, where we have already discussed the KEDB in detail). Next, let us look at metrics for problem management.
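
The impact questions above (server, then data, then services, then users) amount to walking the relationships held in the CMS. The sketch below assumes, purely for illustration, that those relationships are available as a Python dictionary mapping each item to the items that depend on it; the CI names and the function `impacted_items` are hypothetical and do not reflect any real CMS schema.

```python
from collections import deque


def impacted_items(depended_on_by, failed_ci):
    """Breadth-first walk: everything that directly or indirectly
    depends on `failed_ci`, in the order it is discovered."""
    seen = {failed_ci}
    order = []
    queue = deque([failed_ci])
    while queue:
        ci = queue.popleft()
        for dependent in depended_on_by.get(ci, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


# Made-up relationships: server -> data -> services -> user groups.
cms = {
    "server-01": ["orders-db"],
    "orders-db": ["order-service", "reporting-service"],
    "order-service": ["sales-team"],
    "reporting-service": ["finance-team", "sales-team"],
}

affected = impacted_items(cms, "server-01")
print(affected)
```

Here a failure of the hypothetical `server-01` reaches one data store, two services and two user groups, which is the kind of answer the CMS is expected to give during Problem diagnosis.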

5.15 Problem Management - Metrics

Based on the goals of the target audience (operational, tactical or strategic), the process owners need to define what they would measure in a perfect world. To do this, map the activities of the process that need to be measured, and consider which measurements would indicate that each service and Service Management activity is being performed consistently and would show the health of the process. It is important to identify the measurements that can be provided based on existing tool sets, organizational culture and process maturity. Note that there may be a gap between what can be measured and what should be measured. When initially implementing processes, don’t try to measure everything; rather, be selective about which measures will help you understand the health of a process. A major mistake many organizations make is trying to do too much in the beginning. Be smart about what you choose to measure. Next, let us look at the CSFs and KPIs of problem management.

5.16 Problem Management - CSFs and KPIs

The list includes some sample CSFs for problem management. Each organization should identify appropriate CSFs based on its objectives for the process. Each sample CSF is followed by a small number of typical KPIs that support the CSF. These KPIs should not be adopted without careful consideration. Each organization should develop KPIs that are appropriate for its level of maturity, its CSFs and its particular circumstances. Achievement against KPIs should be monitored and used to identify opportunities for improvement, which should be logged in the continual service improvement (CSI) register for evaluation and possible implementation.

Let us take an example of a CSF: minimize the impact to the business of incidents that cannot be prevented. Supporting KPIs for this CSF would be:
• The number of known errors added to the KEDB
• The percentage accuracy of the KEDB
• The percentage of incidents closed by the service desk without reference to other levels of support
• The average incident resolution time for those incidents linked to problem records

A second example CSF: maintain quality of IT services through elimination of recurring incidents. The supporting KPIs are:
• The total number of problems
• The size of the current problem backlog for each IT service
• The number of repeat incidents for each IT service

A third example CSF: provide overall quality and professionalism of problem handling activities to maintain business confidence in IT capabilities. The supporting KPIs are:
• The number of major problems
• The percentage of major problem reviews successfully performed
• The percentage of major problem reviews completed successfully and on time
• The number and percentage of problems incorrectly assigned
• The number and percentage of problems incorrectly categorized
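
Two of the KPIs listed for the first CSF can be computed directly from incident records. The following Python sketch assumes a simple, hypothetical record layout (`escalated`, `problem_id`, `resolution_hours`) and made-up values; a real tool will expose this data differently.

```python
def kpi_summary(incidents):
    """Compute two sample KPIs from a list of incident dictionaries.

    Assumed (hypothetical) fields per incident:
      escalated        - True if it was referred beyond the service desk
      problem_id       - linked problem record id, or None
      resolution_hours - time taken to resolve the incident
    """
    total = len(incidents)
    closed_at_desk = sum(1 for i in incidents if not i["escalated"])
    linked_hours = [i["resolution_hours"] for i in incidents
                    if i["problem_id"] is not None]
    return {
        # Percentage closed without reference to other levels of support.
        "pct_closed_by_service_desk":
            100 * closed_at_desk / total if total else 0.0,
        # Average resolution time for incidents linked to problem records.
        "avg_resolution_hours_linked":
            sum(linked_hours) / len(linked_hours) if linked_hours else 0.0,
    }


# Made-up sample data.
incidents = [
    {"escalated": False, "problem_id": "P-1", "resolution_hours": 2.0},
    {"escalated": False, "problem_id": "P-1", "resolution_hours": 4.0},
    {"escalated": True, "problem_id": None, "resolution_hours": 1.0},
    {"escalated": False, "problem_id": None, "resolution_hours": 3.0},
]

summary = kpi_summary(incidents)
print(summary)
```

Tracked over successive reporting periods, a rising first figure and a falling second figure would suggest that the KEDB is helping the service desk, which is what this CSF is about.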

5.17 Problem Management - Challenges and Risks

It is important that Problem Management is able to use Knowledge Management and Configuration Management resources. Another CSF is the ongoing training of technical staff, both in the technical aspects of their job and in the business implications of the services they support and the processes they use. Key challenges could be:
• The ability to identify Problems as early as possible
• Ensuring all staff understand the difference between Incidents and Problems
• Availability of information about Problems and Known Errors
• Integration with the CMS to determine relationships between CIs and to refer to the history of CIs
• Alignment with the Incident Management process to support analysis and investigation

5.18 Problem Management Summary

We have come to the end of this module; let us quickly recap the Problem Management module. In this module we have learnt about the purpose, objectives, scope, challenges, risks, metrics, types of problem management, CSFs and KPIs of Problem Management. See you in Module 6. Thank you!
