ITIL - Service Operations Processes Tutorial

Welcome to the eleventh chapter of the ITIL Foundation tutorial (part of the ITIL® Foundation Certification Training). This chapter covers the basics of the processes related to service operations.

Objectives

After completing this chapter, you will be able to explain the purpose, objective, scope, value, and basic concepts of the processes of service operations.

Event Management - Overview

Let us start this lesson with an overview of event management.

In the previous lesson, we have already discussed what an event is. Event management is the basis for operational monitoring and control.

We will now focus on the purpose, objective and scope of event management.

Purpose

The purpose of event management is to:

  • Detect events

  • Ensure that an appropriate control action is coordinated

  • Manage these events throughout their lifecycle.

Objective

The objective of event management is to:

  • Detect all changes that are significant for the management of a CI (Configuration Item) or an IT service;

  • Determine the appropriate control action for events;

  • Provide the means to compare actual operating performance and behavior against design standards and SLAs; and

  • Provide a basis for service assurance, reporting and service improvement.

Scope

The scope of event management applies to all facets of service management that require control.

The scope includes:

  • Configuration Items (CIs), for example, the updating of a file server

  • Detecting environmental conditions, for example, fire and smoke detection

  • Monitoring software licenses to ensure optimum or legal utilization and allocation

  • Maintaining security, for example, intrusion detection

  • Tracking normal activities such as mainframe utilization and batch jobs.

Event Management - Process Activities

Following is a generic process flow of the activities involved in event management:

Event Occurs

The event lifecycle starts when an event occurs. Events occur continuously; however, not all of them are detected or registered.

Event Notification

Once an event happens, a notification is issued. The notification can be an email or SMS. It can be logged in a tool, or it may be a flag on the monitoring dashboard.

Event Detection

Once the event is notified, it is detected by an agent running on the same system, or transmitted directly to a management tool.

Event Logging

The event is then recorded in the event record of the event management tool or is left in the system as an entry.

First-level Event Correlation

The purpose of first-level correlation is to decide if the event should be communicated to the management tool or can be ignored. If it is ignored, it will be recorded in the event log file.

Significance

If the event is sent to the management tool, it is categorized according to its significance. There are three broad categories: informational, warning, and exception.

Second-level Correlation

If the event is a warning, a decision is made on the way to deal with the event and the action that needs to be taken.

Further Action

If second-level correlation determines that the event is significant, a response to the event is required.

Response Selection

There are different types of responses. Each type of response is designed specifically for a task that it should initiate.

Examples: auto response, alert and human intervention, and opening a Request for Change (RFC)

Review Action

It is important to check if an event has been handled appropriately. The review is used as an input to Continual Service Improvement or CSI, evaluation, and audit of the event management process.

Close Event

It is important to ensure that all events are closed. Events that give rise to incidents should be formally closed with a link to the corresponding incident record.

Figure: Event management process activities

The image given above illustrates process activities involved in event management.
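The significance, response selection, review, and closure steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` class, the `RESPONSES` mapping, and the lifecycle labels are hypothetical names invented for this sketch, and a real event management tool would drive these decisions from configurable rules.

```python
from dataclasses import dataclass, field
from enum import Enum


class Significance(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


# Hypothetical mapping from significance to a response type.
RESPONSES = {
    Significance.INFORMATIONAL: "log only",
    Significance.WARNING: "alert and human intervention",
    Significance.EXCEPTION: "open incident, problem or RFC",
}


@dataclass
class Event:
    source: str                                   # the CI that raised the event
    message: str
    significance: Significance
    history: list = field(default_factory=list)   # lifecycle steps taken so far


def handle(event: Event) -> str:
    """Walk one event through logging, response selection, review and closure."""
    event.history.append("logged")
    response = RESPONSES[event.significance]
    event.history.append(f"response: {response}")
    event.history.append("reviewed")
    event.history.append("closed")
    return response


event = Event("file-server-01", "disk usage high", Significance.WARNING)
print(handle(event))      # alert and human intervention
print(event.history[-1])  # closed
```

Keeping the lifecycle steps in `history` mirrors the point that every event, whatever its significance, should end up logged and closed.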

Event Logging and Filtering

Let us now focus on how event logging and filtering is done.

Events occur continuously in an organization, but not all of them are detected or registered. This can be intentional: because the volume of events generated is huge, business rules determine which events are worth capturing.

When an event is generated:

  • It is detected by an agent known as a trap.

  • It is filtered to decide whether it should be treated or ignored.

  • If ignored, the event is recorded in a log file on the device.

  • If not ignored, first-level correlation is performed.

An agent residing at the CI or server usually performs the first-level correlation. Filtering is not always necessary. For some Configuration Items, every event is significant, so each one moves directly into the management tool's correlation engine, even when duplicated.

The image given below shows event logging and event filtering.

Figure: Event logging and event filtering

In addition, it is possible to turn off all unwanted event notifications.
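The filtering decision described in this section can be sketched as follows. The rule sets `IGNORED_TYPES` and `ALWAYS_SIGNIFICANT`, the CI names, and the two in-memory lists are all hypothetical; they stand in for the business rules and storage that an agent or management tool would actually use.

```python
# Hypothetical business rules: event types that an agent may ignore, and
# CIs for which every event is significant, so filtering is skipped.
IGNORED_TYPES = {"heartbeat", "debug"}
ALWAYS_SIGNIFICANT = {"payment-gateway"}

device_log = []          # ignored events stay in a log file on the device
correlation_queue = []   # events forwarded for correlation


def filter_event(ci: str, event_type: str) -> str:
    """Decide whether an event is ignored or forwarded for correlation."""
    if ci not in ALWAYS_SIGNIFICANT and event_type in IGNORED_TYPES:
        device_log.append((ci, event_type))
        return "ignored"
    correlation_queue.append((ci, event_type))
    return "forwarded"


print(filter_event("print-server", "heartbeat"))     # ignored
print(filter_event("print-server", "disk_full"))     # forwarded
print(filter_event("payment-gateway", "heartbeat"))  # forwarded
```

Note that an ignored event is still recorded; it is only kept out of the correlation step.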

Manage Exceptional Events

Now let us discuss how to manage exceptional events.

If the output of filtering is an exception, it means that a service or device is currently operating abnormally.

For example, it may be defined that if the organization’s website takes more than 5 seconds to load, it should be an exceptional event and needs to be looked into.

Typically, this means that an Operational Level Agreement (OLA) and a Service Level Agreement (SLA) have been breached and that the business is impacted. Exceptions can represent a total failure, impaired functionality, or degraded performance.

An exception does not always represent an incident or a problem. It can be managed by using either an incident record or a Request for Change or both. This depends on the organization’s incident and change management policies.

The following are examples of exceptions:

  • A server is down.

  • The response time of a standard transaction across the network has slowed to more than 15 seconds.

  • More than 1500 users have logged on to the Internet application concurrently.
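The three example exceptions above can be expressed as simple rules. This is a minimal sketch: the function name `find_exceptions` and its parameters are hypothetical, and the limits are taken directly from the examples rather than from any standard.

```python
def find_exceptions(server_up: bool, response_seconds: float, concurrent_users: int) -> list:
    """Return the exception conditions from the examples that currently hold."""
    reasons = []
    if not server_up:
        reasons.append("server down")
    if response_seconds > 15:
        reasons.append("response time above 15 seconds")
    if concurrent_users > 1500:
        reasons.append("more than 1500 concurrent users")
    return reasons


print(find_exceptions(True, 3.0, 200))     # []
print(find_exceptions(True, 22.5, 1800))
```

An empty list means the measurements describe normal operation; any entry in the list would be handled as an exceptional event.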

The image below illustrates how exceptional events are managed.

Figure: Managing exceptional events

Whenever there is an exceptional event, the incident management team decides if this should be treated as an incident. If yes, then it goes to the incident management process.

If the event is not considered as an incident, the team decides if it is a problem.

If the event is considered as a problem, the problem management team handles the event.

If the incident management team decides that the event is a change, the change management team steps in to handle the event.


Manage Informational and Warning Events

We will now discuss how to manage informational and warning events.

If the output of filtering is informational, the event does not require any action and does not represent an exception. Such events are stored in the system or service log files for a fixed period. Informational events are used to check the status of a device or service.

They are also used to confirm the successful completion of an activity and to generate statistics such as the number of users logged on. These events are used as inputs to an investigation, such as finding out which jobs completed successfully before the transaction processing queue hung.

The following are examples of informational events:

  • A device has come online.

  • A job in the batch queue has completed successfully.

  • A user logs into an application.

  • A transaction has completed successfully.

A warning output is generated when a service or device approaches a predefined threshold. Warnings are usually not raised for a device failure.

For example, memory utilization on a server is currently at 65% and it is increasing. If it reaches 75%, the response time will be excessively long and the OLA for that department will be breached.
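Because a warning is about approaching a threshold, it can help to estimate how long it will take to reach it. The sketch below uses the 65% and 75% figures from the example and a straight-line projection through the first and last readings. Treating 65% as the warning level, the sample readings, and the function names are assumptions made for illustration only.

```python
WARNING_AT = 65.0   # percent: assumed warning level (the current reading in the example)
BREACH_AT = 75.0    # percent: the level at which the OLA would be breached


def needs_warning(percent: float) -> bool:
    """A warning applies from the warning level up to, but not including, the breach level."""
    return WARNING_AT <= percent < BREACH_AT


def minutes_until_breach(samples):
    """Estimate minutes until BREACH_AT from (minute, percent) samples.

    Uses the straight line through the first and last sample. Returns None
    when there is too little data or utilization is not rising, and 0.0
    when the breach level has already been reached.
    """
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if v1 >= BREACH_AT:
        return 0.0
    if t1 <= t0:
        return None
    rate = (v1 - v0) / (t1 - t0)   # percentage points per minute
    if rate <= 0:
        return None
    return (BREACH_AT - v1) / rate


samples = [(0, 60.0), (10, 62.5), (20, 65.0)]
print(needs_warning(samples[-1][1]))   # True
print(minutes_until_breach(samples))   # 40.0
```

A linear projection is the simplest possible choice; a monitoring tool might instead use a moving average or a longer history.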

Figure: Managing informational and warning events

The image above illustrates how an informational event and a warning are handled.

If an event is an informational event, it is simply logged. But if the event is a warning, there are two ways to handle it. If any manual intervention is needed, the concerned team needs to log the incident and work on it.

If the response to the warning is automated, the event is handled by the CI or Configuration Item and it is logged before being closed. Irrespective of the type of event, all events have to be logged.

Incident Management - Overview

We will now focus on an overview of incident management.

The process of dealing with all incidents throughout their lifecycle is known as incident management. Incidents can be raised over the phone or by email, or through web-based and event management tools.

It is the responsibility of the incident management team to coordinate the fixing of the service. This is to ensure that the service is available to users as soon as possible.

Let us now understand the purpose, objective, scope, and value of incident management.

Purpose

The purpose of incident management is to:

  • Restore normal service operations as soon as possible;

  • Minimise adverse impact on business operations; and

  • Ensure best possible levels of service quality and availability.

Objective

Following are the objectives of the incident management process:

  • To ensure that standardized methods and procedures are used for efficient and prompt response.

  • To analyze, document, and report incidents throughout the process.

  • To increase visibility and communication of incidents to business and IT support staff.

  • To align incident management activities and priorities with those of the business.

Scope

The scope of incident management covers any event that disrupts, or could disrupt, a live IT service. This includes events communicated directly by users, either through the service desk or through an interface from event management to incident management tools. Incidents may also be reported or logged by technical staff.

Value

Incident management adds a lot of value to the business.

It lowers business downtime, which in turn leads to the higher availability of the services. It improves the capability to identify business priorities and allocate resources dynamically as required.

Let us understand this with a scenario.

Suppose an IT service provider has no incident management process in place but it has people to provide IT support. In the absence of incident management, issues are handled on a first-come, first-served basis.

This means that if there are 10 issues related to printing, the IT support staff is busy resolving those issues. So when a major business service becomes unavailable, there is no IT support available to respond.

On the other hand, if the IT service provider has an incident management process in place, printer calls are assigned a low priority, and fewer resources are allocated to resolve them. So if a high-priority incident occurs, resources can be shifted immediately to resolve it, while the printer issues are still managed.

Another value that incident management adds to a business is that it increases the ability to identify potential improvements to services.

Incident Management - Basic Concepts

Some of the basic concepts of incident management are timescales, incident models, and major incidents.

The objective of incident management is to restore service as soon as possible.

Timescales

The timescale in which incidents are resolved is important. To commit to such a timescale, the service provider and the customer must agree on it and document it in the Service Level Agreements.

Timescales depend on the priority defined and are also reflected in the Operational Level Agreements (OLAs) and Underpinning Contracts (UCs).

Service management tools are used to automate timescales and escalate the incident as required. This is done on the basis of predefined rules. This brings up another key concept of incident models.
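A rule of this kind might look like the following sketch. The resolution targets per priority, the 80% escalation point, and the function name `escalation_due` are hypothetical; in practice the targets come from the agreed SLAs.

```python
from datetime import datetime, timedelta

# Hypothetical resolution targets per priority code; real values come from the SLA.
TARGETS = {1: timedelta(hours=1), 2: timedelta(hours=8), 3: timedelta(hours=24)}


def escalation_due(priority: int, opened: datetime, now: datetime,
                   fraction: float = 0.8) -> bool:
    """Return True once the given fraction of the target time has elapsed."""
    return now - opened >= TARGETS[priority] * fraction


opened = datetime(2024, 1, 1, 9, 0)
print(escalation_due(1, opened, opened + timedelta(minutes=30)))  # False
print(escalation_due(1, opened, opened + timedelta(minutes=50)))  # True
```

Escalating before the target is reached, rather than at the deadline, leaves time for the next support level to act.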

Incident models

Incident models are based on predefined steps to handle a particular incident. They are detailed descriptions of the steps and order in which incidents are handled.

An example of an incident model is a major incident.

Major incidents

A major incident is a break in service that is causing, or threatens to cause, loss to the business and that, if not given immediate attention, may require disaster recovery to be invoked. The loss can be financial or reputational (brand image).

A separate procedure, with shorter timescales and greater urgency, is used for major incidents.

Incident Management - Process Flow

Following are the steps involved in the process flow of incident management.

Identification

Here the incident is detected or reported through event management. Alternatively, the user impacted can register it through a web interface, phone call or email.

Registration

Here the incident is logged, and a record is created.

Incident categorization

The registered incident is categorized according to type, status, impact, urgency, or SLA. If the reported issue turns out to be a user request or a change proposal rather than an incident, it is handled through the request fulfillment process.

Prioritization

Once the incident is categorized, it is assigned an appropriate priority code that determines how it is handled by support tools and support staff. Priority is based on the impact and urgency of the issue.
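One common way to derive a priority code is a simple impact and urgency matrix. The sketch below assumes three levels for each and a five-point priority scale, which is a widely used convention but not something the text above prescribes; the names `LEVELS` and `priority` are hypothetical.

```python
# Assumed three-level scale: a lower rank means higher impact or urgency.
LEVELS = {"high": 1, "medium": 2, "low": 3}


def priority(impact: str, urgency: str) -> int:
    """Return a priority code from 1 (highest) to 5 (lowest)."""
    return LEVELS[impact] + LEVELS[urgency] - 1


print(priority("high", "high"))    # 1
print(priority("medium", "low"))   # 4
```

An organization can replace the formula with an explicit lookup table if its matrix is not symmetric.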

Functional escalation

After prioritization, an initial diagnosis is carried out to discover the full symptoms of the incident. If the service desk cannot resolve the incident, it is escalated for further support. This is known as functional escalation.

Functional escalation is based on knowledge or expertise. It is also known as horizontal escalation.

Hierarchical escalation

If an incident is more serious, the appropriate IT managers should be notified. This is called hierarchical escalation, also known as vertical escalation.

Hierarchical escalation brings in authorized line management to take corrective action. It is also used when the resolution of an incident is not satisfactory to the end user.

Investigation

If no escalation is required and there is no known solution, the incident is investigated. This investigation for a solution can also happen at the level of functional escalation.

Resolved

If the solution is found, it is applied and the issue is resolved.

Closed

Once the incident is fully resolved, the service has recovered to a fully functional level, and the user is satisfied with the solution, the incident can be closed.

Figure: Incident management process flow

The image above illustrates the steps involved in the incident management process flow.

Take a look at the example of incident management below.

Figure: Incident management example

Process Interfaces

Following are the interactions of incident management with other processes for its day-to-day activities:

Service Level Management or SLM

Incident management and SLM are interdependent. Incident management needs to know the SLA of the CI that needs to be fixed, and Service Level Management depends on incident management to meet its SLA targets and performance.

Event management

This is a one-way dependency where incident management depends on event management to raise incidents.

Problem management

This is a one-way dependency where problems are raised from incidents that incident management is not able to resolve.

Change management

This is an interdependent process because some incidents are fixed only by changes in the configuration, and some changes implemented may cause incidents.

Service Asset and Configuration Management or SACM

Incident management and SACM are interdependent. Information from the Configuration Management Database (CMDB) is needed to resolve incidents, and the CMDB in turn needs to be updated with faulty CIs on a periodic basis.

Availability management

This is a one-way dependency where incident management needs to coordinate with availability management to know the available time, planned downtime, and availability-related incidents.

Capacity management

Incident management and capacity management are interdependent. This is because some of the incidents may arise due to capacity issues. So it is important to know the performance of the components and the service.

Figure: Process interfaces of incident management

The image above illustrates process interfaces of incident management.

Problem Management - Overview

We will now focus on problem management. Problem management is the process responsible for managing the lifecycle of all problems.

Let us now understand the purpose, objective, scope, and value of problem management.

Purpose

The purpose of problem management is to:

  • Identify and eliminate the root cause of incidents in the IT infrastructure;

  • Eliminate recurring incidents; and

  • Minimise the impact of incidents that cannot be prevented.

Objective

Following are the objectives of problem management:

  • To prevent problems and resulting incidents from taking place.

  • To eliminate recurring incidents.

  • To fix the root cause and minimize the impact of the incidents.

Scope

Following is the scope of problem management:

  • To diagnose the root cause of incidents.

  • To determine the resolution to the problems.

  • To ensure the resolution is implemented through appropriate control procedures.

  • To maintain information about problems.

Value

Following are the values problem management adds to businesses:

  • Improves IT service availability.

  • Reduces downtimes and disruptions of business-critical systems.

  • Reduces expenditure on workarounds or fixes that do not work.

  • Reduces cost of effort in resolving repeating incidents.

Types of Problem Management Processes

Let us now discuss the types of problem management processes.

Problem management has two sub-processes, which are reactive and proactive problem management processes.

Reactive problem management

Reactive problem management deals with analyzing and resolving the causes of incidents. These activities are performed by service operations. The initial activities in reactive problem management are similar to those of incident management.

The subsequent activities differ, because this is where the actual root-cause analysis is performed and Known Errors are corrected.

Proactive problem management

Proactive problem management deals with activities that detect and prevent future problems or incidents. Proactive problem management includes the identification of trends or potential weaknesses. It is initiated by service operations.

There are two main activities in proactive problem management:

Trend analysis

It involves the reviewing of reports from other processes.

Example: Trends in incidents, availability levels, relationships with changes, releases and identification of recurring problems

Targeting preventative action

It involves performing a cost-benefit analysis of all costs associated with prevention and targeting the specific areas that require the most support and attention.

Reactive Problem Management - Process Flow

Following is the simple process flow of reactive problem management:

Detection and logging

Here the service desk detects the problem. If a definitive cause of an incident has not been identified, or the incident is suspected to recur, a problem record is created.

The problem record is cross-referenced to the incident(s) that initiated it, and all relevant details are copied from the incident record(s) to the problem record.

Categorization

Once problems are recorded, they are categorized and prioritized in the same way as incidents. However, the frequency and impact of the related incidents also play a role in the prioritization of problems.

Investigation and diagnosis

Once the problem is prioritized, effort is spent on investigation and diagnosis to find the root cause. In some cases, it may be possible to find a workaround, which is a temporary way of overcoming the difficulties caused by the incidents that the problem gives rise to.

For example, a manual amendment is made to a corrupt input file so that a program runs successfully and a billing process completes satisfactorily.

However, it is important that work on a permanent resolution continues.

In this scenario, that means finding the reason the file became corrupt and correcting the cause of the defect to prevent it from happening again.

In case a workaround is found, it is important that the problem record remains open and details of the workaround are always documented in the problem record.

Known Error recorded

As soon as the diagnosis is complete, and a workaround is found, a Known Error record must be raised and placed in the Known Error Database. This helps to identify further incidents or problems when they arise. It also helps to restore the service faster.
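To show how a Known Error Database can speed up restoration, the sketch below matches the description of a new incident against stored symptoms and returns the documented workaround. The record structure, the two sample entries, and the keyword-overlap matching are all hypothetical simplifications; real tools use richer search.

```python
from dataclasses import dataclass


@dataclass
class KnownError:
    error_id: str
    symptoms: frozenset   # keywords describing the fault
    workaround: str


# Hypothetical Known Error Database content.
KEDB = [
    KnownError("KE-001", frozenset({"billing", "input", "file", "corrupt"}),
               "Manually amend the input file and rerun the job"),
    KnownError("KE-002", frozenset({"printer", "offline"}),
               "Restart the print spooler"),
]


def find_known_error(description: str, min_matches: int = 2):
    """Return the best-matching known error, or None if nothing matches well."""
    words = set(description.lower().split())
    best = max(KEDB, key=lambda ke: len(ke.symptoms & words))
    return best if len(best.symptoms & words) >= min_matches else None


match = find_known_error("Billing job failed because the input file is corrupt")
print(match.error_id if match else "no known error")  # KE-001
```

Requiring at least two matching keywords is an arbitrary choice that keeps a single common word from producing a false match.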

Request for Change

As soon as a solution is found, a Request for Change or RFC is raised to apply the solution. Not all solutions are cost justified. So decisions might be taken to not apply the solution and use a workaround. After the change is implemented and the resolution applied, the problem record is formally closed.

Major problems

A major problem is determined by the priority of the problem. After every major problem, a review is conducted and the lessons learned are documented before the problem record is closed.

Problem Management – Interface with Other Processes

Following are the processes that problem management interfaces with in its day-to-day activities:

Change management

Problem management and change management are interdependent. This is because some problems can be fixed only by changes in the configuration, whereas some changes that are implemented may cause problems.

SACM

Problem management and SACM are interdependent. Problem management needs information from the CMDB to solve problems, and the CMDB needs to be updated with faulty CIs on a periodic basis. This helps in proactive problem management.

Release and Deployment Management or RDM

Problem management and RDM are interdependent. This is because some problems can be fixed only by a version upgrade, and proactive problem management needs release planning.

IT Service Continuity Management

Problem management and IT Service Continuity Management are interdependent. This is because problem management has to know the continuity level, business continuity plan, and disaster recovery plans.

Availability management

Problem management needs to coordinate with availability management to know the availability time and planned downtime for releases.

Capacity management

Problem management and capacity management are interdependent. This is because some of the problems may arise due to capacity issues. So you should know the performance of the components and the services.

Service Level Management

Service Level Management depends on problem management to fix problems so that it can meet its SLA targets and performance.

Finance management

Problem management and finance management are interdependent because many problems are resolved through changes. These changes may incur costs, which the finance team needs to include in its financial planning for the year.

Figure: Problem management interfaces with other processes

The image above illustrates the processes that integrate with problem management for its day-to-day activities.

Request Fulfillment - Overview

Now we will focus on an overview of request fulfillment.

The process of dealing with service requests from the users is known as request fulfillment. Many elements of request fulfillment are automated through self-help such as websites and user applications, with manual activities being used where necessary.

Let us understand the purpose, objective and scope of request fulfillment.

Purpose

The main purpose of request fulfillment is to manage the lifecycle of all service requests raised by users.

Objective

Following are the objectives of request fulfillment:

  • Maintain user and customer satisfaction through efficient and professional handling.

  • Provide a channel for users to request and receive standard services.

  • Source and deliver the components of requests.

  • Assist the customer with general information, complaints or comments.

Scope

The scope of request fulfillment in an organization, where a large number of service requests are handled, is to separate the work stream. The process also records and manages the service requests as separate record types. The organization decides what should be handled as a service request.

Anything that occurs frequently and has low cost, risk, and impact on the organization's business is considered as a service request. The basic concept of the request fulfillment process is the request model, which is used for specific procedures.

The request model is also used for handling certain types of requests, such as IMAC (Install, Move, Add, Change) requests and password resets.
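A request model can be thought of as a predefined list of steps per request type. The sketch below is a hypothetical illustration: the request types, the steps, and the fallback for unknown requests are invented for this example.

```python
# Hypothetical request models: each maps a request type to its predefined steps.
REQUEST_MODELS = {
    "password_reset": ["verify identity", "reset password", "notify user"],
    "software_install": ["check approval", "check license", "install", "confirm with user"],
}


def steps_for(request_type: str) -> list:
    """Return the predefined steps, or route unknown requests for manual review."""
    return REQUEST_MODELS.get(request_type, ["route to service desk for manual review"])


print(steps_for("password_reset"))
print(steps_for("new_office_chair"))
```

Because the steps are data rather than code, new request types can be added without changing the handling logic.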

Service Request

Let us discuss service requests.

A service request is a generic description of the different types of demands that users place on the IT department.

Most of these requests are small changes that are low risk, low cost, and frequently occurring.

Their scale, frequency and low-risk nature mean that instead of obstructing the normal incident and change management processes, you should implement a separate process to handle the requests better. Hence, the request fulfillment process is used to handle them.

Example: Request to change a password or unlock accounts, or request to install an approved software application onto a particular PC

Access Management - Overview

Now we will focus on an overview of access management.

Access management is the operational execution of the policies defined by information security and availability management. It is the process of granting authorized users the right to use a service while preventing access by non-authorized users.

Let us understand the purpose, objective and scope of access management.

Purpose

The key purpose of access management is to grant authorized users the right to use a service. The process also helps to prevent access by non-authorized users. Access management ensures that users are given the right to use a service, but it does not ensure that the access is available at all agreed times; that is the role of availability management.

Objective

The objective of access management is to execute policies and actions defined by Information Security Management or ISM and availability management. Often this process is coordinated centrally by the Service Desk and can involve the technical and application management functions.

Example: If you want Internet access to your bank account, you call the help desk at the bank. The help desk then coordinates with the Internet banking application support team to get your access granted.
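Granting and preventing access can be sketched with a small register of rights. The class `AccessRegister`, its methods, and the user and service names are hypothetical; the sketch only records who holds which right and does not decide policy, which remains with information security management.

```python
class AccessRegister:
    """Tracks which users hold the right to use which services."""

    def __init__(self):
        self._rights = {}   # user -> set of service names

    def grant(self, user: str, service: str) -> None:
        self._rights.setdefault(user, set()).add(service)

    def revoke(self, user: str, service: str) -> None:
        self._rights.get(user, set()).discard(service)

    def is_authorized(self, user: str, service: str) -> bool:
        return service in self._rights.get(user, set())


register = AccessRegister()
register.grant("alice", "internet-banking")
print(register.is_authorized("alice", "internet-banking"))  # True
print(register.is_authorized("bob", "internet-banking"))    # False
```

Access is denied by default: a user who was never granted a right is treated as non-authorized.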

Scope

The scope of access management is to enable the organization to manage the confidentiality, integrity, and availability of their data and intellectual property.


Summary

Let us summarize what we have learned in this lesson:

  • Event management is the basis for operational monitoring and control.

  • The process of dealing with all incidents throughout their lifecycle is known as incident management.

  • The three basic concepts of incident management are Timescales, Incident models, and Major incidents.

  • Problem management has two sub-processes: reactive problem management and proactive problem management.

  • The process of dealing with service requests is known as request fulfillment.

  • Access management is the operational execution of the policies defined by information security and availability management.

Conclusion

Next, we will focus on the twelfth chapter which is Functions.
