ITIL - Service Operations Processes Tutorial

1 Service Operations Processes

This lesson covers the basics of the processes related to service operations.

2 Objectives

After completing this lesson, you will be able to: •Explain the purpose, objective, scope, value and basic concepts of the processes of service operations

3 Event Management-Overview

Let us start this lesson with an overview of event management. In the previous lesson, we discussed what an event is. Event management is the basis for operational monitoring and control. We will now focus on the purpose, objective and scope of event management.

Purpose: The purpose of event management is to detect events, understand them and ensure that appropriate control action is coordinated. These events should be managed throughout their lifecycle.

Objective: The primary objective of event management is to detect all changes of state that are significant for the management of a Configuration Item (CI) or an IT service. Event management determines the appropriate control action for events and ensures that they are communicated to the appropriate functions. It provides the means to compare actual operating performance and behaviour against design standards and SLAs. It also provides a basis for service assurance, reporting and service improvement.

Scope: The scope of event management applies to all facets of service management that require control. The scope includes:
•configuring items, for example, updating a file server;
•detecting environmental conditions, for example, fire and smoke detection;
•monitoring software licenses to ensure optimum or legal utilisation and allocation;
•maintaining security, for example, intrusion detection; and
•executing activities such as mainframe utilisation and batch jobs.

4 Event Management-Process Activities

Following is a generic process flow of the activities involved in event management:
•Event occurs: The event lifecycle starts when an event occurs. Events occur continuously, but not all of them are detected or registered.
•Event notification: Once an event happens, a notification is issued. The notification can be an email or SMS. It can be logged in a tool, or it may be a flag on the monitoring dashboard.
•Event detection: Once the notification is generated, it is detected by an agent running on the same system, or it is transmitted directly to a management tool.
•Event logging: The event is then recorded as an event record in the event management tool, or is left as an entry in the system log.
•First-level event correlation: The purpose of first-level correlation is to decide whether the event should be communicated to the management tool or can be ignored. If it is ignored, it is still recorded in the event log file.
•Significance: If the event is sent to the management tool, it is categorised according to its significance. There are three broad categories: informational, warning and exception.
•Second-level correlation: If the event is a warning, a decision is made on how to deal with the event and what action needs to be taken.
•Further action: If second-level correlation determines that the event is significant, a response to the event is required.
•Response selection: There are different types of responses. Each type of response is designed specifically for the task that it should initiate. Examples: auto response, alert and human intervention, and opening a Request for Change (RFC).
•Review action: It is important to check whether an event has been handled appropriately. The review is used as an input to Continual Service Improvement (CSI) and to the evaluation and audit of the event management process.
•Close event: It is important to ensure that all events are closed. Events that give rise to incidents should be formally closed with a link to the appropriate incident record.
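The significance and response selection steps can be sketched in a few lines of Python. This is only an illustration: the `Significance` enum, the `select_response` function and the response names are hypothetical and do not come from any particular event management tool.

```python
from enum import Enum


class Significance(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


def select_response(significance, auto_fix_available=False):
    """Pick a response type for an event (assumed rules, for illustration only)."""
    if significance is Significance.INFORMATIONAL:
        return "log only"
    if significance is Significance.WARNING:
        # A warning is either handled automatically or raised as an alert.
        return "auto response" if auto_fix_available else "alert for human intervention"
    # Exceptions are handed over to incident, problem or change management.
    return "open incident, problem or RFC"


for level in Significance:
    print(level.value, "->", select_response(level))
```

Whatever response is selected, the event is still logged, reviewed and closed as described in the last two steps.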

5 Event Logging and Filtering

Let us now focus on how event logging and filtering is done. Events occur continuously in an organisation, but not all of them are detected or registered. This can be intentional and based on business rules, because the volume of events generated is huge. Once an event notification (sometimes known as a trap) is generated, it is detected by an agent running on the same system, or it is transmitted directly to a management tool. Filtering then takes place to decide whether to process the event or ignore it. If ignored, the event is recorded in a log file on the device, but no further action is taken. If not ignored, first-level correlation is performed to determine whether the event is informational, a warning, or an exception. An agent residing at the CI or server usually performs the first-level correlation. Filtering is not always necessary. For some Configuration Items, every event is significant and moves directly into the management tool's correlation engine, even when it is duplicated. In addition, it is possible to turn off all unwanted event notifications at the source.
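A minimal sketch of this filtering step is shown below. The CI names, the noise keywords and the two lists standing in for the device log and the management tool are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Event:
    ci: str        # Configuration Item that raised the event
    message: str


# Hypothetical business rules: CIs whose events are never filtered,
# and message fragments that are treated as noise.
ALWAYS_SIGNIFICANT = {"payment-gateway"}
NOISE = ("heartbeat", "debug")

device_log = []        # ignored events stay in the log file on the device
management_tool = []   # forwarded events go to the correlation engine


def filter_event(event):
    """Return True if the event is forwarded, False if it is only logged."""
    is_noise = any(fragment in event.message for fragment in NOISE)
    if event.ci in ALWAYS_SIGNIFICANT or not is_noise:
        management_tool.append(event)
        return True
    device_log.append(event)
    return False


print(filter_event(Event("file-server", "heartbeat ok")))      # filtered out
print(filter_event(Event("payment-gateway", "heartbeat ok")))  # always forwarded
```

Note that an ignored event is not discarded. It stays in the device log, which matches the rule that all events are logged.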

6 Manage Exceptional Events

Now let us discuss how to manage exceptional events. If the output of filtering is an exception, it means that a service or device is currently operating abnormally. For example, it may be defined that if the organisation's website takes more than 5 seconds to load, this is an exceptional event that needs to be looked into. An exception typically means that an Operational Level Agreement (OLA) or a Service Level Agreement (SLA) has been breached and that the business is impacted. Exceptions can represent a total failure, impaired functionality or degraded performance. An exception does not always represent an incident or a problem. It can be managed by using an incident record, a Request for Change, or both. This depends on the organisation's incident and change management policies.

Following are examples of exceptions:
•A server is down.
•The response time of a standard transaction across the network has slowed to more than 15 seconds.
•More than 1500 users have logged on to the Internet application concurrently.

The image illustrates how exceptional events are managed. Whenever there is an exceptional event, the incident management team decides whether it should be treated as an incident. If yes, it goes to the incident management process. If the event is not considered an incident, the team decides whether it is a problem. If it is, the problem management team handles the event. If the incident management team decides that the event requires a change, the change management team steps in to handle it.
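The decision sequence for exceptional events can be written as a small routing function. The three boolean flags stand in for the judgement of the incident management team and are an assumption of this sketch, as is the fallback for events that match none of the three.

```python
def route_exception(is_incident, is_problem, is_change):
    """Return the process that handles an exceptional event."""
    if is_incident:
        return "incident management"
    if is_problem:
        return "problem management"
    if is_change:
        return "change management"
    # Assumed fallback: the lesson does not describe this case.
    return "log and review"


print(route_exception(True, False, False))
print(route_exception(False, True, False))
print(route_exception(False, False, True))
```

The order of the checks mirrors the text: the incident question is asked first, then problem, then change.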

7 Manage Informational and Warning Events

We will now discuss how to manage informational and warning events. If the output of filtering is informational, the event does not require any action and does not represent an exception. Such events are stored in the system or service log files for a fixed period. Informational events are used to check the status of a device or service. They are also used to confirm the successful completion of an activity and to generate statistics such as the number of users logged on. These events are also used as inputs to an investigation, such as finding out which jobs completed successfully before the transaction processing queue hung.

Following are examples of informational events:
•A device has come online.
•A job in the batch queue has completed successfully.
•A user logs into an application.
•A transaction has completed successfully.

A warning is generated when a service or device approaches a predefined threshold. Warnings are usually not raised for a device failure. For example, memory utilisation on a server is currently at 65% and it is increasing. If it reaches 75%, the response time will be excessively long and the OLA for that department will be breached.

The image illustrates how an informational event and a warning are handled. If an event is informational, it is simply logged. If the event is a warning, there are two ways to handle it. If manual intervention is needed, the concerned team needs to log an incident and work on it. If the response to the warning is automated, the event is handled by the Configuration Item (CI) and it is logged before being closed. Irrespective of the type of event, all events have to be logged.
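The memory utilisation example can be expressed as a simple threshold check. The 75% figure comes from the example; treating 65% as the point where a warning starts is an assumption made for this sketch, since the lesson only says that utilisation is at 65% and rising.

```python
WARNING_THRESHOLD = 65.0   # percent; assumed warning level for this sketch
BREACH_THRESHOLD = 75.0    # percent; level at which the OLA would be breached


def classify_memory_event(utilisation):
    """Classify a memory utilisation reading as an event type."""
    if utilisation >= BREACH_THRESHOLD:
        return "exception"
    if utilisation >= WARNING_THRESHOLD:
        return "warning"
    return "informational"


for reading in (40.0, 65.0, 75.0):
    print(reading, classify_memory_event(reading))
```

In practice the thresholds would be set per CI and agreed with the teams that own the relevant OLA.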

8 Knowledge Check

Let us do a quick recall of the event management process activities.

9 Incident Management-Overview

We will now focus on an overview of incident management. The process of dealing with all incidents throughout their lifecycle is known as incident management. Incidents can be raised over the phone or by email, or through web-based and event management tools. It is the responsibility of the incident management team to coordinate the fixing of the service, so that the service is available to users again as soon as possible. Let us now understand the purpose, objective, scope and value of incident management.

Purpose: The purpose of incident management is to restore normal service operation as soon as possible. Incident management also helps to minimise the adverse impact on business operations, so that agreed levels of service quality and availability are maintained.

Objective: The primary objective of incident management is to ensure that standardised methods and procedures are used for efficient and prompt response to incidents, and for their analysis, documentation and reporting. Incident management increases the visibility and communication of incidents to the business and IT support staff. In addition, it aligns incident management activities and priorities with those of the business.

Scope: The scope of incident management covers any disruption or potential disruption to live IT services. This includes events that are communicated directly by users through the service desk, or through an interface from event management to incident management tools. Incidents may also be reported or logged by the technical staff.

Value: Incident management adds value to the business in several ways. It lowers business downtime, which in turn leads to higher availability of the services. It improves the capability to identify business priorities and allocate resources dynamically as required. Let us understand this with a scenario.

Suppose an IT service provider has no incident management process in place but it has people to provide IT support. In the absence of incident management, issues are handled on a first-come, first-served basis. This means that if there are 10 issues related to printing, the IT support staff are busy resolving those issues. So when a major business service becomes unavailable, there is no IT support available to respond. On the other hand, if the IT service provider has an incident management process in place, printer calls are assigned a low priority, and fewer resources are allocated to resolve them. If a high-priority incident occurs, resources can be shifted immediately to resolve it, while the printer issues are still managed. Another value that incident management adds to a business is that it increases the ability to identify potential improvements to services.
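The difference between first-come, first-served handling and priority-based handling in this scenario can be illustrated with a priority queue. The priority scale (1 is highest) and the incident summaries are hypothetical.

```python
import heapq
from itertools import count

_arrival = count()   # tie-breaker so equal priorities keep arrival order
queue = []


def log_incident(priority, summary):
    """Add an incident; a lower number means a higher priority (assumed scale)."""
    heapq.heappush(queue, (priority, next(_arrival), summary))


def next_incident():
    """Return the summary of the most urgent open incident."""
    return heapq.heappop(queue)[2]


# Three printer calls arrive first, then a major service outage.
for i in range(1, 4):
    log_incident(4, f"printer issue {i}")
log_incident(1, "order-processing service unavailable")

print(next_incident())   # the outage is picked up before the printer calls
```

With a plain first-in, first-out list, the outage would have waited behind all three printer calls.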

10 Incident Management-Basic Concepts

Some of the basic concepts of incident management are time scales, incident models and major incidents. The objective of incident management is to restore service as soon as possible, so the time scale in which incidents are resolved is important. To commit to such a time scale, the service provider and the customer must agree on it and document it in the Service Level Agreement. Time scales depend on the priority defined, and the supporting targets are documented in the OLAs and the Underpinning Contracts (UCs). Service management tools are used to automate time scales and to escalate the incident as required, on the basis of predefined rules.

This brings up another key concept, incident models. An incident model is a predefined way of handling a particular type of incident. It gives a detailed description of the steps to take and the order in which to take them.

A separate procedure is used for major incidents. A major incident is a break in service that threatens to cause, or is causing, loss to the business and that, if not given immediate attention, may lead to the invocation of disaster recovery. The loss can be in terms of finance or brand image. The procedure for major incidents has shorter time scales and greater urgency.
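As a rough sketch of how a tool might automate time scales, the function below flags an incident for escalation once a set fraction of its target resolution time has elapsed. The target times, the 80% rule and the function name are assumptions for illustration; real values come from the SLA and the organisation's escalation rules.

```python
from datetime import datetime, timedelta

# Hypothetical target resolution times per priority.
TARGETS = {
    1: timedelta(hours=1),
    2: timedelta(hours=8),
    3: timedelta(hours=24),
}


def needs_escalation(priority, logged_at, now, warn_fraction=0.8):
    """Return True once warn_fraction of the target time has elapsed."""
    elapsed = now - logged_at
    return elapsed >= TARGETS[priority] * warn_fraction


logged = datetime(2024, 1, 1, 9, 0)
print(needs_escalation(1, logged, logged + timedelta(minutes=30)))
print(needs_escalation(1, logged, logged + timedelta(minutes=50)))
```

A major incident procedure would typically use shorter targets than any of the normal priorities.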

11 Incident Management-Process Flow

Following are the steps involved in the process flow of incident management:
•Identification: The incident is detected or reported through event management. Alternatively, the impacted user can register it through a web interface, phone call or email.
•Registration: The incident is logged, and a record is created.
•Incident categorisation: The registered incident is categorised according to type, status, impact, urgency or SLA. If the reported issue turns out to be a user request or a change proposal rather than an incident, it is handled according to the request fulfilment process.
•Prioritisation: Once the incident is categorised, it is assigned an appropriate prioritisation code, which determines how it is handled by support tools and support staff. Priority is decided on the basis of the impact and urgency of the issue.
•Functional escalation: After prioritisation, an initial diagnosis is carried out to discover the full symptoms of the incident. If the service desk cannot resolve the incident, it is escalated for further support. This is known as functional escalation. It is based on knowledge or expertise and is also known as horizontal escalation.
•Hierarchical escalation: If incidents are more serious, the appropriate IT managers should be notified. This is called hierarchical escalation. Because it involves authorised line management taking corrective action, it is also known as vertical escalation. It is also done when the resolution of an incident is not satisfactory to the end user.
•Investigation: If no escalation is required and there is no known solution, the incident is investigated. This investigation for a solution can also happen at the level of functional escalation.
•Resolved: If a solution is found, it is applied and the issue is resolved.
•Closed: If the incident is fully resolved, the service has recovered to a fully functional level and the user is satisfied with the solution, the incident can be closed.

Take a look at the example of incident management.
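Prioritisation by impact and urgency is often implemented as a lookup matrix. The sketch below uses a common three-level scheme that yields priority codes from 1 to 5. The level names and the formula are assumptions for illustration, and each organisation defines its own matrix.

```python
LEVELS = {"high": 1, "medium": 2, "low": 3}


def priority(impact, urgency):
    """Return a priority code from 1 (most urgent) to 5 (least urgent)."""
    return LEVELS[impact] + LEVELS[urgency] - 1


for impact in LEVELS:
    row = [priority(impact, urgency) for urgency in LEVELS]
    print(f"impact={impact:<6} -> {row}")
```

A high-impact, high-urgency incident gets code 1, while a low-impact, low-urgency one gets code 5.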

12 Process Interfaces

Following are the interactions of incident management with other processes in its day-to-day activities:
•Service Level Management or SLM: Incident management and SLM are interdependent. Incident management needs to know the SLA that applies to the CI that needs to be fixed, and SLM depends on incident management for its SLA targets and performance.
•Event management: This is a one-way dependency where incident management is dependent on event management to raise incidents.
•Problem management: This is a one-way dependency where problems are raised from incidents when incident management is not able to resolve the underlying cause.
•Change management: This is an interdependent process because some incidents are fixed only by changes in the configuration, and some changes that are implemented may cause incidents.
•Service Asset and Configuration Management or SACM: Incident management and SACM are interdependent. Information from the Configuration Management Database (CMDB) is needed to resolve incidents, and the CMDB needs to be updated with the faulty CIs on a periodic basis.
•Availability management: This is a one-way dependency where incident management needs to coordinate with availability management to know the available time, planned downtime and available incidents.
•Capacity management: Incident management and capacity management are interdependent. Some incidents may arise due to capacity issues, so it is important to know the performance of the components and the service.

13 Problem Management-Overview

We will now focus on problem management. Problem management is the process responsible for managing the lifecycle of all problems. Let us now understand the purpose, objective, scope and value of problem management. Purpose: The purpose of problem management is to identify and eliminate the root cause of incidents in the IT infrastructure. The process helps to eliminate recurring incidents and minimise the impact of incidents that cannot be prevented. Objective: The primary objective of problem management is to prevent problems and resulting incidents from taking place. It also eliminates recurring incidents, and fixes the root cause to minimise the impact of the incidents. Scope: The primary scope of problem management is to diagnose the root cause of incidents and determine their resolution. The process ensures that the resolution is implemented through appropriate control procedures. It also ensures that information on the problems is maintained. Value: Problem management adds value to any business. It improves IT service availability. It also reduces downtime and disruption of business critical systems. Problem management reduces the expenditure on workarounds or fixes that do not work. It also reduces the cost of effort made to resolve repeating incidents.

14 Types of Problem Management Processes

Let us now discuss the types of problem management processes. Problem management has two sub-processes: reactive and proactive problem management.

Reactive problem management deals with analysing and resolving the causes of incidents. These activities are performed by service operations. The initial activities in reactive problem management are similar to those of incident management. The subsequent activities are different, as this is where the actual root-cause analysis is performed and Known Errors are corrected.

Proactive problem management deals with activities that detect and prevent future problems or incidents. It includes the identification of trends or potential weaknesses. It is initiated by service operations. There are two main activities in proactive problem management:
•Trend analysis: It involves reviewing reports from other processes. Examples: trends in incidents, availability levels, relationships with changes and releases, and identification of recurring problems.
•Targeting preventative action: It involves performing a cost-benefit analysis of all costs associated with prevention, and targeting the specific areas that require the most support and attention.
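Trend analysis often starts with something as simple as counting how many incidents each CI has caused. The sketch below assumes incident records are available as (incident id, CI) pairs. The sample data and the threshold of three incidents are invented for the example.

```python
from collections import Counter

# Hypothetical incident records: (incident id, affected CI)
incidents = [
    ("INC001", "mail-server"),
    ("INC002", "printer-3"),
    ("INC003", "mail-server"),
    ("INC004", "mail-server"),
    ("INC005", "vpn-gateway"),
]


def recurring_cis(records, min_count=3):
    """Return the CIs that appear in at least min_count incidents."""
    counts = Counter(ci for _, ci in records)
    return [ci for ci, n in counts.most_common() if n >= min_count]


print(recurring_cis(incidents))
```

A CI that shows up in this list is a candidate for a proactively raised problem record.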

15 Reactive Problem Management-Process Flow

Following is the simple process flow of reactive problem management:
•Detection and logging: The service desk detects the problem. If a definitive cause of the incident is not identified or the incident is suspected to recur, a problem record is created. The incident or incidents that initiated the problem record are cross-referenced, and all relevant details are copied from the incident record(s) to the problem record.
•Categorisation: Once the problem is recorded, it is categorised and prioritised in the same way as an incident. However, the frequency and impact of the related incidents also play a role in the prioritisation of problems.
•Investigation and diagnosis: Once the problem is prioritised, effort is spent on investigation and diagnosis to find the root cause. In some cases, it may be possible to find a workaround, which is a temporary way of overcoming the difficulties of the incidents caused by the problem. For example, a manual amendment is made to an input file so that a program runs successfully and a billing process completes satisfactorily. But it is important that work on a permanent resolution continues. In this scenario, that means finding the reason for the file becoming corrupt and correcting the cause of the defect to prevent it from happening again. When a workaround is found, it is important that the problem record remains open and that details of the workaround are always documented within the problem record.
•Known Error recorded: As soon as the diagnosis is complete, and a workaround is found, a Known Error record must be raised and placed in the Known Error Database. This helps to identify further incidents or problems when they arise. It also helps to restore the service faster.
•Request for Change: As soon as a solution is found, a Request for Change (RFC) is raised to apply the solution. Not all solutions are cost justified, so a decision might be taken not to apply the solution and to keep using a workaround. After the change is implemented and the resolution applied, the problem record is formally closed.
•Major problems: A major problem is determined by the priority of the problem. Once a major problem is detected, a review is conducted and the lessons learnt are documented before the problem record is closed.
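To show how a Known Error Database helps to restore service faster, here is a minimal lookup sketch. The record fields, the keyword matching and the sample entry (based on the billing file example) are assumptions, since real tools use far richer matching.

```python
from dataclasses import dataclass


@dataclass
class KnownError:
    problem_id: str     # link back to the open problem record
    keywords: tuple     # symptoms that identify the known error
    workaround: str


# Hypothetical Known Error Database with a single entry.
kedb = [
    KnownError("PRB042", ("billing", "corrupt"),
               "manually amend the input file and rerun the job"),
]


def find_known_error(incident_summary):
    """Return the first known error whose keywords all appear in the summary."""
    text = incident_summary.lower()
    for error in kedb:
        if all(keyword in text for keyword in error.keywords):
            return error
    return None


match = find_known_error("Billing job stopped: input file corrupt")
print(match.workaround if match else "no known error found")
```

The `problem_id` field keeps the link to the problem record, which remains open until a permanent resolution is applied.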

16 Problem Management-Interface with Other Processes

Following are the processes that problem management interacts with in its day-to-day activities:
•Change management: Problem management and change management are interdependent. Some problems can be fixed only by changes in the configuration, whereas some changes that are implemented may cause problems.
•SACM: Problem management and SACM are interdependent. Problem management needs information from the CMDB to solve problems, and the CMDB needs to be updated with the faulty CIs on a periodic basis. This helps in proactive problem management.
•Release and Deployment Management or RDM: Problem management and RDM are interdependent. Some problems can be fixed only by a version upgrade, which needs release planning as part of proactive problem management.
•IT Service Continuity Management: Problem management and IT Service Continuity Management are interdependent. Problem management has to know the continuity level, business continuity plan and disaster recovery plans.
•Availability management: Problem management needs to coordinate with availability management to know the availability time and the planned downtime for releases.
•Capacity management: Problem management and capacity management are interdependent. Some problems may arise due to capacity issues, so you should know the performance of the components and the services.
•Service Level Management: Service Level Management is dependent on problem management to fix the problems that affect its targets and performance.
•Financial management: Problem management and financial management are interdependent because many problems are solved by changes. These changes may incur costs, which the finance team needs to include in the financial plan for the year.

17 Request Fulfilment-Overview

Now we will focus on an overview of request fulfilment. The process of dealing with service requests from the users is known as request fulfilment. Many elements of request fulfilment are automated through self-help channels such as websites and user applications, with manual activities being used where necessary. Let us understand the purpose, objective and scope of request fulfilment.

Purpose: The main purpose of request fulfilment is to manage all service requests raised by users throughout their lifecycle.

Objective: Maintaining customer satisfaction is the primary objective of request fulfilment. This is achieved through efficient and professional handling of requests. Request fulfilment also provides a channel for users to request and receive standard services for which a predefined authorisation and qualification process exists. Another objective of request fulfilment is to source and deliver the components of the requested standard services. This process also assists the customer with general information, complaints or comments.

Scope: In an organisation where a large number of service requests are handled, the scope of request fulfilment is to separate that work stream from other work. The process records and manages the service requests as a separate record type. The organisation decides what should be handled as a service request. Anything that occurs frequently and has low cost, risk and impact on the organisation's business is considered a service request.

The basic concept of the request fulfilment process is the request model, which defines the specific procedure for handling a certain type of request. Examples of such requests are IMAC requests, that is install, move, add and change, and password resets.

18 Service Request

Let us discuss the service request. Service request is a generic description of the different types of demands that users place on the IT department. Most of these requests are small changes that are low risk, frequently occurring and low cost. Because of their scale, frequency and low-risk nature, they are better handled by a separate process than by the normal incident and change management processes, which they would otherwise congest. Hence, the request fulfilment process is used to handle them. Example: a request to change a password or unlock an account, or a request to install an approved software application onto a particular PC.
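The criteria for treating something as a service request (frequent, low cost, low risk, low impact) can be captured in a simple check. The `Request` fields and the routing labels below are hypothetical, and an organisation would define its own qualification rules.

```python
from dataclasses import dataclass


@dataclass
class Request:
    description: str
    frequent: bool
    low_cost: bool
    low_risk: bool
    low_impact: bool


def is_service_request(req):
    """True when the request meets all four criteria."""
    return req.frequent and req.low_cost and req.low_risk and req.low_impact


def route(req):
    # Assumed routing: anything that does not qualify goes to change management.
    return "request fulfilment" if is_service_request(req) else "change management"


print(route(Request("reset password", True, True, True, True)))
print(route(Request("replace core network switch", False, False, False, False)))
```

A request model would then define the steps for each qualifying request type, such as a password reset.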

19 Access Management-Overview

Now we will focus on an overview of access management. Access management is the operational execution of the policies defined by information security management and availability management. It is the process of granting authorised users the right to use a service while preventing access by non-authorised users. Let us understand the purpose, objective and scope of access management.

Purpose: The key purpose of access management is to grant authorised users the right to use a service. The process also helps to prevent access by non-authorised users in an organisation. Access management ensures that users are given the right to use a service, but it does not ensure that the access is available at all agreed times. That is the role of availability management.

Objective: The objective of access management is to execute the policies and actions defined by Information Security Management (ISM) and availability management. Often this process is coordinated centrally by the Service Desk and can involve the technical and application management functions. Example: If you want Internet access for your bank account, you call the helpdesk at the bank. The helpdesk then coordinates with the Internet banking application support team to get your access granted.

Scope: The scope of access management is to enable the organisation to manage the confidentiality, integrity and availability of its data and intellectual property.
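Granting and preventing access can be illustrated with a small rights table. The user names, the service name and the three helper functions are assumptions for this sketch. In practice the policy comes from information security management and the rights live in a directory or identity system.

```python
# Hypothetical access rights, as they might be recorded after Service Desk requests.
access_rights = {"alice": {"internet-banking"}, "bob": set()}


def grant(user, service):
    """Give a user the right to use a service."""
    access_rights.setdefault(user, set()).add(service)


def revoke(user, service):
    """Remove a user's right to use a service, if present."""
    access_rights.get(user, set()).discard(service)


def can_use(user, service):
    """Return True only for users who hold the right to the service."""
    return service in access_rights.get(user, set())


print(can_use("bob", "internet-banking"))   # not yet authorised
grant("bob", "internet-banking")
print(can_use("bob", "internet-banking"))   # authorised after the grant
```

The check only covers the right to use the service. Whether the service is actually up at that moment is outside the scope of access management.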

20 Summary

Let us summarise what we have learnt in this lesson:
•Event management is the basis of operational monitoring and control.
•The process of dealing with all incidents throughout their lifecycle is known as incident management.
•The three basic concepts of incident management are time scales, incident models and major incidents.
•Problem management has two sub-processes: reactive problem management and proactive problem management.
•The process of dealing with service requests is known as request fulfilment.
•Access management is the operational enforcement of policies.

Next, we will focus on the third lesson of this unit, which is Functions.

