ITIL Intermediate OSA - Incident Management Tutorial

3.1 Incident Management

Hello and welcome to the e-learning course by Simplilearn on ITIL 2011 Intermediate OSA module preparation. This is module 3 of the syllabus. We hope that you have had time to go through our introductory, foundation and event management modules. In this module we will learn about Incident Management. At the end of this module you will take a quiz to gauge your understanding. Let’s quickly move on to the next slide, which introduces the agenda of this module.

3.2 Incident Management

As in modules 1 and 2, we will discuss the objectives, scope, key concepts, activities, interfaces, challenges, risks and success factors of the process. In addition, we will learn about the process flow, escalations, categorization and prioritization in Incident Management. Let us look into the purpose of Incident Management.

3.3 Incident Management - Purpose

What is the purpose of Incident Management? The purpose of Incident Management is to restore normal service operation as quickly as possible and to minimize the adverse impact on business operations, thus ensuring that agreed levels of service quality are maintained. The next slide talks about the objectives of Incident Management.

3.4 Incident Management - Objective

What are the objectives of Incident Management? There are several:
• Ensure that standardized methods and procedures are used for efficient and prompt response, analysis, documentation, ongoing management and reporting of incidents.
• Increase visibility and communication of incidents to business and IT support staff.
• Enhance the business perception of IT through a professional approach to quickly resolving and communicating incidents when they occur.
• Align Incident Management activities and priorities with those of the business.
• Maintain user satisfaction with the quality of IT services.
Next, let’s look into the scope of Incident Management.

3.5 Incident Management - Scope

What is the scope of Incident Management? Incident Management includes any event which disrupts, or which could disrupt, a service. This includes events which are communicated directly by users, either through the Service Desk or through an interface from Event Management to Incident Management tools. Incidents can also be reported or logged by technical staff. This does not mean that all events are incidents. Many types of events are not related to disruptions at all, but are indicators of normal operation or are simply informational. In the next slide we will learn about the value Incident Management brings to the business.

3.6 Incident Management - Value to Business

Like Event Management, does Incident Management add any value to the business? Let’s look into the details.
• Incident Management provides the ability to detect and resolve incidents, which results in lower downtime for the business, which in turn means higher availability of the service. This means that the business is able to exploit the functionality of the service as designed.
• It provides the ability to align IT activity to real-time business priorities. This is because Incident Management includes the capability to identify business priorities and dynamically allocate resources as necessary.
• It provides the ability to identify potential improvements to services. This happens as a result of understanding what constitutes an incident, and also from being in contact with the activities of business operational staff.
Let us move on to learn about the policies of Incident Management.

3.7 Incident Management - Policies

Like any other process, Incident Management has a set of policies. Policies can be thought of as rules. Let us see which policies are stated for Incident Management:
• Incidents and their status must be communicated in a timely and effective manner.
• Incidents must be resolved within timeframes acceptable to the business.
• Customer satisfaction must be maintained at all times.
• Incident processing and handling should be aligned with overall service levels and objectives.
• All incidents should be stored and managed, and should subscribe to a standard classification schema.
• Incident records should be audited on a regular basis.
• All incident records should use a common format and a common, agreed set of criteria for prioritization and escalation.
In the next few slides we will learn about the key concepts of Incident Management and be introduced to its terms.

3.8 Incident Management - Key Concepts

The key concepts of Incident Management are timescales, incident models, major incidents and service requests (the last two are covered in the next slide).

Timescales: Timeframes must be agreed for all incident-handling stages (these will differ depending upon the priority level of the incident). They are based upon the overall incident response and resolution targets within SLAs, and are captured as targets within OLAs and Underpinning Contracts (UCs). All support groups should be made fully aware of these timeframes. Service Management tools should be used to automate timeframes and escalate the incident as required, based on predefined rules.

Incident Models: Many incidents are not new; they involve dealing with something that has happened before and may well happen again. For this reason, many organizations find it helpful to have standard Incident Models and apply them to the appropriate incidents when they occur. An Incident Model is a way of predefining the steps that should be taken to handle an incident in an agreed way. Support tools can then be used to manage the required process. The Incident Model should include:
• The predefined steps that should be taken to handle the incident
• The chronological order of these steps, with any associated dependencies
• The roles and responsibilities of those involved
• Timeframes and thresholds for completion of actions
• Escalation procedures: who should be contacted and when
• Any necessary extra information that may need to be recorded (particularly relevant for security-related and capacity-related incidents)
The next slide continues with the key concepts.
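The elements of an Incident Model listed above can be pictured as a small data structure. The following is only a sketch: the class names, field names and the "Printer offline" example are hypothetical and do not come from ITIL or from any particular Service Management tool.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class ModelStep:
    """One predefined step of an incident model (illustrative fields)."""
    order: int               # chronological position of the step
    action: str              # what should be done
    responsible_role: str    # who should do it
    time_limit: timedelta    # threshold for completing the action
    escalate_to: str         # who is contacted if the threshold is missed


@dataclass
class IncidentModel:
    name: str
    steps: list = field(default_factory=list)

    def ordered_steps(self):
        """Return the steps in the chronological order they should run."""
        return sorted(self.steps, key=lambda s: s.order)

    def total_time_limit(self):
        """Sum of the per-step thresholds for the whole model."""
        return sum((s.time_limit for s in self.steps), timedelta())


# Hypothetical model for a "printer offline" incident.
printer_offline = IncidentModel(
    name="Printer offline",
    steps=[
        ModelStep(2, "Restart the print spooler", "Service Desk analyst",
                  timedelta(minutes=15), "Desktop support team"),
        ModelStep(1, "Confirm the printer is powered and connected",
                  "Service Desk analyst", timedelta(minutes=10),
                  "Desktop support team"),
    ],
)
```

A support tool could walk through `printer_offline.ordered_steps()` and raise an escalation to `escalate_to` whenever a step exceeds its `time_limit`.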

3.9 Incident Management - Key Concepts

Here we will look at two Incident Management terms that are frequently used in the IT Service Management industry.

Major Incident: A separate procedure, with shorter timeframes and greater urgency, must be used for major incidents. A definition of what constitutes a major incident must be agreed and ideally mapped onto the overall incident prioritization system.

Service Request: A Service Request is a request from a user for information, for advice, for a standard change or for access to an IT service. Examples are a request to reset a password or to provide standard IT services for a new user. Service Requests are usually handled by the Service Desk and do not require an RFC to be submitted. These types of Service Requests can be performed through standard changes.

Next, let us learn about the process flow of Incident Management.

3.10 Incident Management - Process Flow

The incident management process consists of the following steps:
1. Identification - The incident is detected or reported. This could happen through Event Management, or the impacted user could report it through a web interface, over a phone call or by email.
2. Registration - When an incident is reported by any of these means, the incident is logged and a record is created.
3. Categorization - The registered incident is coded by type, status, impact, urgency, SLA, et cetera. At this step it may be realized that the issue reported is not an incident but a request from the user or a change proposal. If it is a service request, it is categorized as such and handled as per the Request Fulfillment process. If it is a change proposal, it is handled by Service Portfolio Management. For example, a user calls in to report that her email is not working and the Service Desk analyst realizes that her email has not been configured, so it is not an incident but a service request for email configuration.
4. Prioritization - Once the incident has been categorized, it is assigned an appropriate prioritization code to determine how the incident is to be handled by support tools and support staff. Recall that priority is decided by the impact and urgency of the issue. Major incidents are also identified during this step; if the incident is a major incident, it is handled as per the procedure defined for major incidents.
5. Initial diagnosis - After prioritization, an initial diagnosis is carried out to try to discover the full symptoms of the incident.
6. Escalation - When the Service Desk cannot resolve the incident itself, the incident is escalated for further support. This is called functional escalation; it is based on knowledge or expertise and is also known as horizontal escalation. If the incident is more serious, the appropriate IT managers must be notified. This is called hierarchical escalation; it allows authorized line management to take corrective action. It is also known as vertical escalation and is usually done when the resolution of an incident will not be in time or will not be satisfactory to the end user.
7. Investigation - If no escalation is required and there is no known solution, the incident is investigated. This investigation for a solution could also happen at a functional escalation level.
8. Resolution - Once a solution has been found, it is applied and the issue can be resolved.
9. Closure - Finally, the Service Desk should check that the incident is fully resolved, that the service has been recovered to a fully functional level and that the user is satisfied with the solution. The incident can then be closed.
The key thing about Incident Management is that the Service Desk typically owns and is accountable for all incidents. It monitors progress and manages the escalation of incidents.
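The nine steps above can be read as a simple sequence of stages in which escalation is optional. The sketch below is a simplification under that assumption; the `Stage` names and the `next_stage` helper are hypothetical, and a real workflow would also allow loops (for example, repeated escalation) that are not modelled here.

```python
from enum import Enum


class Stage(Enum):
    IDENTIFICATION = 1
    REGISTRATION = 2
    CATEGORIZATION = 3
    PRIORITIZATION = 4
    INITIAL_DIAGNOSIS = 5
    ESCALATION = 6
    INVESTIGATION = 7
    RESOLUTION = 8
    CLOSURE = 9


def next_stage(current, needs_escalation=False):
    """Return the stage that follows `current`, or None after closure.

    Escalation is skipped unless the initial diagnosis shows it is needed.
    """
    if current is Stage.CLOSURE:
        return None
    if current is Stage.INITIAL_DIAGNOSIS and not needs_escalation:
        return Stage.INVESTIGATION
    return Stage(current.value + 1)
```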

3.11 Incident Management - Activities

Action to resolve an incident cannot take place until the incident has been identified. It is not considered good practice for the technology team to wait until an impacted user reports the incident to the Service Desk. Therefore, all key components need to be monitored so that failures or potential failures are detected early and the Incident Management process can be started. The quality of incident identification is heavily dependent on the Event Management process.

All incidents must be fully logged and date/time stamped, regardless of whether they are raised through a Service Desk telephone call or detected automatically via an event alert. All relevant information relating to the nature of the incident must be logged so that a full historical record is maintained. If the incident has to be referred to other support groups, they will then have all relevant information on hand to assist them. While logging an incident, one should ensure that the following information is recorded:
• Incident categorization
• Incident urgency
• Incident impact
• Incident prioritization
• Name/ID of the person and/or group recording the incident
• Name/department/phone/location of the user
• Description of symptoms
• Incident status (active, waiting, closed, etc.)
• Related CI
• Support group/person to which the incident is allocated
• Related Problem/Known Error
• Activities undertaken to resolve the incident
• Closure category
Once an incident is logged, it must be categorized by type, which helps in resolving it. Let’s learn about categorization in the next slide.
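As a rough illustration of the logging fields listed in this slide, an incident record could be sketched as follows. The class and attribute names are assumptions made for this example and are not taken from any specific tool; the record is date/time stamped automatically when it is created.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _now():
    """Current time in UTC, used for date/time stamping."""
    return datetime.now(timezone.utc)


@dataclass
class IncidentRecord:
    incident_id: str
    category: str
    urgency: str
    impact: str
    priority: int
    recorded_by: str                       # person or group logging the incident
    user_contact: str                      # name/department/phone/location
    symptoms: str
    status: str = "active"                 # active, waiting, closed, ...
    related_ci: Optional[str] = None
    assigned_group: Optional[str] = None
    related_problem: Optional[str] = None  # problem or known error reference
    activities: list = field(default_factory=list)
    closure_category: Optional[str] = None
    logged_at: datetime = field(default_factory=_now)

    def add_activity(self, note):
        """Append a time-stamped note so the full history is kept."""
        self.activities.append((_now(), note))
```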

3.12 Incident Management - Categorization

To ensure a quick response to the incidents recorded, segregating them into different categories is very important. Let us see how this can be done. The final part of the initial logging phase is to allocate suitable incident categorization coding so that the exact type of incident is recorded. This will be important later when looking at incident types and frequencies to establish trends for use in Problem Management, Supplier Management and other ITSM activities. Multi-level categorization is available in most tools, usually to three or four levels of granularity.

Note: The check for Service Requests in this step does not imply that Service Requests are incidents. It simply recognizes that Service Requests are sometimes incorrectly logged as incidents (e.g., a user incorrectly enters the request as an incident from the web interface). This check will detect any such requests and ensure that they are passed to the Request Fulfillment process.

The figure on the slide is an example of multi-level categorization. After categorization, prioritizing the incident for a quick response is very important. Let’s learn about prioritization in the next slide.
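The figure from the slide is not reproduced in this transcript. As a stand-in, here is a minimal sketch of a three-level categorization scheme; the category names are invented for illustration, and an organization would define its own.

```python
from collections import Counter

# Hypothetical three-level scheme: level 1 -> level 2 -> set of level 3 values.
CATEGORIES = {
    "Hardware": {
        "Server": {"Memory board", "Disk"},
        "Printer": {"Paper jam"},
    },
    "Software": {
        "Email": {"Cannot send", "Cannot receive"},
    },
}


def is_valid_category(path):
    """Check a (level 1, level 2, level 3) tuple against the scheme."""
    if len(path) != 3:
        return False
    level1, level2, level3 = path
    return level3 in CATEGORIES.get(level1, {}).get(level2, set())


def count_by_top_level(paths):
    """Count incidents per level 1 category, e.g. for trend analysis."""
    return Counter(path[0] for path in paths)
```

Counting incidents per category in this way is the kind of trend data that Problem Management could later draw on.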

3.13 Incident Management - Prioritization

Prioritization determines how an incident is handled, both by support tools and by support staff. Priority can normally be determined by taking into account both the urgency of the incident (how quickly the business needs a resolution) and the level of impact it is causing. In all cases, clear guidance, with practical examples, should be provided for all support staff to enable them to determine the correct urgency and impact levels, so that the correct priority is allocated. Such guidance should be produced during service level negotiation. The table on the slide shows an example of a priority coding system. Once the incident has been categorized and prioritized, it is diagnosed to find a solution. Let us learn about Diagnosis in the next slide.
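The slide's table is not reproduced in this transcript, so the sketch below uses an assumed, illustrative coding system: impact and urgency are each rated high, medium or low, and the two ratings combine into a priority code from 1 (highest) to 5 (lowest). The ranks and the formula are assumptions for this example; each organization should agree its own table.

```python
# Assumed three-point scale; 1 is the most severe rating.
RANK = {"high": 1, "medium": 2, "low": 3}


def priority(impact, urgency):
    """Combine impact and urgency into a priority code between 1 and 5."""
    return RANK[impact.lower()] + RANK[urgency.lower()] - 1
```

With this scheme, a high-impact, high-urgency incident gets priority 1 and a low-impact, low-urgency incident gets priority 5. An unknown rating raises a `KeyError`, which makes data-entry mistakes visible rather than silently assigning a priority.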

3.15 Incident Management - Escalation

When does an incident get escalated? Let us look into it.

Incident routing is called horizontal or functional escalation and primarily takes place due to a lack of knowledge or expertise. As soon as it becomes clear that the Service Desk is unable to resolve the incident itself (or when first-level resolution target times are about to be breached), the incident must be immediately escalated for further support. If the organization has a second-level support group and the Service Desk believes that the incident can be resolved by that group, it should refer the incident to them. If the second-level support group cannot resolve the incident, or if the incident requires deeper technical knowledge, it must be escalated to the third-level support group. This could be an internal department or an external third party such as a software supplier or hardware manufacturer. The rules for escalation and handling of incidents must be agreed in OLAs and UCs with internal and external groups respectively. When referring incidents, the Service Desk should take care that SLA resolution times are not exceeded.

Vertical or hierarchical escalation can take place at any moment during the incident lifecycle. It usually occurs when major incidents are reported or when it becomes apparent that an incident will not be resolved in time, which would result in a breached SLA. This allows the relevant authority to take corrective action.

Escalation never turns an incident into a problem, although it may result in ownership of an incident passing to the Problem Manager for administrative reasons and/or in the identification of an associated problem. The Service Desk owns the incident throughout its lifecycle, regardless of where it has been escalated. The Service Desk is responsible for tracking progress, keeping users informed and, ultimately, for incident closure.

The picture on the slide shows the flow of hierarchical escalation and functional escalation. In this slide we have learnt about the two types of escalation. Moving on, let us learn about the resolution and recovery of incidents.
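The two kinds of escalation described in this slide can be sketched as a small decision function. This is an illustration only: the function name, the three support levels and the `warn_fraction` threshold (how much of the SLA target may elapse before management is notified) are assumptions for this example.

```python
from datetime import timedelta


def escalation_actions(support_level, can_resolve, elapsed, sla_target,
                       is_major=False, warn_fraction=0.8):
    """Return the escalation actions that apply to an open incident."""
    actions = []
    # Functional (horizontal): the current level lacks the knowledge or expertise.
    if not can_resolve and support_level < 3:
        actions.append(f"functional: refer to level {support_level + 1} support")
    # Hierarchical (vertical): major incident, or the SLA target is at risk.
    if is_major or elapsed >= sla_target * warn_fraction:
        actions.append("hierarchical: notify line management")
    return actions
```

Both kinds of escalation can apply at the same time, which is why the function returns a list. Ownership is deliberately not part of the return value, since the Service Desk keeps ownership wherever the incident is escalated.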

3.16 Incident Management - Resolution and Recovery

Resolution is an important part of Incident Management. This is the step where a resolution is identified and the incident is resolved. When a potential resolution has been identified, it should be applied and tested. The specific actions to be undertaken and the people who will be involved in taking the recovery actions may vary, depending upon the nature of the fault. Even when a resolution has been found, sufficient testing must be performed to ensure that the recovery action is complete and that the service has been fully restored. The resolving group should then pass the incident back to the Service Desk for closure. Let’s look into the closure step of Incident Management.

3.17 Incident Management - Closure

Let us look at the steps involved in closure. The Service Desk should check that the incident is fully resolved and that the users are satisfied and willing to agree that the incident can be closed. The Service Desk should also check the following:
• Closure categorization: Check and confirm that the initial incident categorization was correct. Where the categorization subsequently turned out to be incorrect, update the record so that a correct closure categorization is recorded for the incident, seeking advice or guidance from the resolving group(s) as necessary.
• User satisfaction survey: Carry out a user satisfaction call-back or e-mail survey for the agreed percentage of incidents.
• Incident documentation: Chase any outstanding details and ensure that the incident record is fully documented, so that a full historical record at a sufficient level of detail is complete.
• Ongoing or recurring problem? Determine (in conjunction with resolving groups) whether it is likely that the incident could recur and decide whether any preventive action is necessary to avoid this. In conjunction with Problem Management, raise a problem record in all such cases so that preventive action is initiated.
• Formal closure: Formally close the incident record.
We have now looked at all phases of Incident Management. However, you might ask what happens when an incident recurs after it has been closed. The answer to this question is in the next slide.

3.18 Incident Management - Rules for reopening incidents

What happens when incidents recur? Despite all adequate care, there will be occasions when incidents recur even though they have been formally closed. Because of such cases, it is wise to have predefined rules about if and when an incident can be reopened. It might make sense, for example, to agree that if the incident recurs within one working day then it can be reopened, but that beyond this point a new incident must be raised and linked to the previous incident(s). The exact time thresholds may vary between individual organizations, but clear rules should be agreed and documented. The choice made must consider its effect on data collection, so that the recurrence and the associated work are clearly recorded and accurately reported. Like Event Management, Incident Management has its own triggers. Let us look at the details.
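The one-working-day example above can be expressed as a short rule. The sketch below assumes that working days are Monday to Friday and ignores public holidays; the function names and the two return values are hypothetical.

```python
from datetime import datetime, timedelta


def add_working_days(start, days):
    """Advance `start` by a number of working days (Monday to Friday)."""
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            days -= 1
    return current


def handle_recurrence(closed_at, recurred_at, window_days=1):
    """Reopen within the agreed window; otherwise raise a new, linked incident."""
    if recurred_at <= add_working_days(closed_at, window_days):
        return "reopen"
    return "new_linked_incident"
```

For an incident closed on a Friday afternoon, a recurrence on Monday morning still falls inside the one-working-day window, while a recurrence on Tuesday would be logged as a new incident linked to the original.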

3.19 Incident Management - Triggers

Incidents can be triggered in many ways. The most common route is when a user rings the Service Desk or completes a web-based incident-logging screen, but increasingly incidents are raised automatically via Event Management tools. Technical staff may notice potential failures and raise an incident, or ask the Service Desk to do so, so that the fault can be addressed. Some incidents may also arise at the initiation of suppliers, who may send some form of notification of a potential or actual difficulty that needs attention. In the next slide we will learn about the inputs and outputs of Incident Management.

3.20 Incident Management - Inputs and Outputs

As we know, every process has its own set of inputs and outputs. Let us go through them one by one, starting with the inputs.

Inputs to Incident Management could be:
• Information about CIs and their status
• Information about known errors and their workarounds
• Communication and feedback about incidents and their symptoms
• Communication and feedback about RFCs and releases that have been implemented or are planned for implementation
• Communication of events that were triggered from Event Management
• Operational and service level objectives
• Customer feedback on the success of incident resolution activities and the overall quality of Incident Management activities
• Agreed criteria for prioritizing and escalating incidents

Outputs of the process could be:
• Resolved incidents and the actions taken to achieve their resolution
• Updated Incident Management records with accurate incident detail and history
• Updated classification of incidents, to be used to support proactive Problem Management activities
• Raising of problem records for incidents where an underlying cause has not been identified
• Validation that incidents have not recurred for problems that have been resolved
• Feedback on incidents related to changes and releases
• Identification of CIs associated with or impacted by incidents
• Satisfaction feedback from customers who have experienced incidents
• Feedback on the level and quality of monitoring technologies and Event Management activities
• Communication about incidents and resolution history detail, to assist with the identification of overall service quality
In the next slide we will learn about the interfaces of Incident Management.

3.21 Incident Management - Interfaces

Let us look at how Incident Management interfaces with other service management processes:
• Problem Management: Incident Management forms part of the overall process of dealing with problems. Incidents are often caused by underlying problems, which must be solved to prevent the incidents from recurring. Incident Management provides a point where these are reported.
• Configuration Management: This process provides the data used to identify and progress incidents. The CMS contains information about which categories of incident should be assigned to which support group. In turn, Incident Management can maintain the status of faulty CIs. It can also assist Configuration Management in auditing the infrastructure while working to resolve an incident.
• Change Management: Whenever a change is required to implement a workaround or resolution, it needs to be logged as an RFC and progressed through Change Management. In turn, Incident Management is able to detect and resolve incidents that arise from failed changes.
• Availability Management: This process uses Incident Management data to determine the availability of IT services and to look at improvements.
• Service Level Management: The ability to resolve incidents in a specified time is a key part of delivering an agreed level of service. Incident Management enables SLM to define measurable responses to service disruptions. It also provides reports that enable SLM to review SLAs objectively and regularly.

3.22 Incident Management - Metrics and Information Management

Based on the goals of the target audience (operational, tactical or strategic), the service owners need to define what they should measure in a perfect world. To do this they must:
• Map the activities of the process that need to be measured.
• Consider what measurements would indicate that each service and Service Management activity is being performed consistently, and would show the health of the process.
• Identify the measurements that can be provided based on existing tool sets, organizational culture and process maturity.
Note: There may be a gap between what can be measured and what should be measured. When initially implementing processes, don’t try to measure everything; rather, be selective about which measures will help you understand the health of a process. A major mistake many organizations make is trying to do too much in the beginning. Be smart about what you choose to measure.

3.23 Incident Management - Information Management

IT must now be able to measure and report against an end-to-end service. This information is important in feeding CSI, enabling it to answer any business questions. Therefore, for information management, one should maintain:
1. Incident Management tools, which include resolution actions and history.
2. Incident record data, which includes the incident classification, details of any action taken, the incident category, impact, urgency and priority, and relationships with other incidents, problems, changes or known errors.
So far we have studied metrics and information management. Let us proceed to see what the challenges of Incident Management are.

3.24 Incident Management - Challenges

There can be multiple challenges while implementing Incident Management:
• The ability to detect incidents as early as possible. This requires education of the users reporting incidents and the configuration of Event Management tools, which can be a huge challenge for the organization.
• Convincing all staff (technical teams as well as users) that all incidents must be logged, and encouraging the use of self-help web-based capabilities.
• Availability of information about problems and known errors. This enables Incident Management staff to learn from previous incidents and also to track the status of resolutions.
• Integration with the CMS, to determine relationships between CIs and to refer to the history of CIs while performing first-line support.
• Alignment with the SLM process. This helps Incident Management to assess the impact and priority of incidents correctly and assists in defining and executing escalation procedures.
We have looked at the challenges of Incident Management. Let us now learn about the Critical Success Factors and KPIs.

3.25 Incident Management - CSFs and KPIs

The following list includes some sample CSFs for Incident Management. Each organization should identify appropriate CSFs based on its objectives for the process. Each sample CSF is followed by a small number of typical KPIs that support the CSF. These KPIs should not be adopted without careful consideration. Each organization should develop KPIs that are appropriate for its level of maturity, its CSFs and its particular circumstances. Achievement against KPIs should be monitored and used to identify opportunities for improvement, which should be logged in the continual service improvement (CSI) register for evaluation and possible implementation.

For example, take the CSF "resolve incidents as quickly as possible, minimizing impacts on the business". Supporting KPIs would be:
• Elapsed time to achieve incident resolution or circumvention, broken down by impact code
• Breakdown of incidents at each stage (e.g. logged, work in progress, closed)
• Percentage of incidents closed by the Service Desk without reference to other levels of support
• Number and percentage of incidents resolved remotely, without the need for a visit
• Number of incidents resolved without impact to the business

As a second example, take the CSF "maintain quality of IT services". Supporting KPIs would be:
• Total number of incidents (as a control measure)
• Size of the current incident backlog for each IT service
• Number and percentage of major incidents for each IT service
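Two of the sample KPIs above lend themselves to a short calculation: the percentage of incidents closed by the Service Desk without reference to other levels of support, and the elapsed resolution time broken down by impact code. The sketch below assumes incidents are available as simple dictionaries; the keys `status`, `escalated`, `impact` and `resolution_time` are hypothetical names chosen for this example.

```python
from collections import defaultdict
from datetime import timedelta


def first_line_resolution_rate(incidents):
    """Percentage of closed incidents that were never escalated."""
    closed = [i for i in incidents if i["status"] == "closed"]
    if not closed:
        return 0.0
    first_line = sum(1 for i in closed if not i["escalated"])
    return 100.0 * first_line / len(closed)


def mean_resolution_time_by_impact(incidents):
    """Average elapsed resolution time of closed incidents, per impact code."""
    buckets = defaultdict(list)
    for i in incidents:
        if i["status"] == "closed":
            buckets[i["impact"]].append(i["resolution_time"])
    return {
        impact: sum(times, timedelta()) / len(times)
        for impact, times in buckets.items()
    }
```

Open incidents are excluded from both figures, since they have no resolution time yet and cannot be counted as resolved at any level.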

3.26 Incident Management - Risks

A number of risks can be associated with Incident Management. Here we will look at those risks:
• Being overambitious. Never try to improve everything at once; be realistic with timelines and expectations.
• Not discussing improvement opportunities with the business. The business has to be involved in improvement decisions that will impact it.
• Not keeping a balanced focus on both the services as a whole and Incident Management.
• Not prioritizing improvement projects.
• Not making strategic, tactical or operational decisions based on the knowledge gained. Reports should actually be used, and people should see that they are being used.
• Lack of management action on recommended service improvement opportunities.
• Lack of meetings with the business to understand new business requirements.
• A communication/awareness campaign for an improvement that is lacking, late or missing altogether.
• Not involving the right people at all levels to plan, build, test and implement the improvement.
This is the last topic of the module. Let us summarize now.

3.27 Service Operation - Incident Management Summary

We have come to the end of this module, so let us quickly recap. Under Incident Management, we have learnt about the purpose, objectives, scope, key concepts, categorization, prioritization, escalations, resolution and recovery, closure, challenges, risks, metrics and information management. In the next module we will cover Request Fulfillment. Thank you, and see you in module 4!

3.28 Quiz
