Incident Management Practices

Overview

This article outlines the practices and associated compliance requirements, roles and responsibilities, and overall process standards for Incident Management at Duck Creek Technologies for the Software-as-a-Service (SaaS) Operations Team. It provides a mechanism to carry out the Incident Management Process in a structured and consistent manner in accordance with ITIL guidelines.

Performed effectively, the Incident Management Practice will minimize the adverse impact of incidents on the business and restore service as quickly as possible to ensure service levels are maintained and service meets Customer contractual requirements.

The practices outlined in this article are intended to influence and determine decisions and actions surrounding Incident Management. This Practice is reviewed annually, and in accordance with the enterprise Practice, Standard, and Procedure Management Practice, to ensure it is current and accurately reflects business activities and processes.

Requirements

A complete Incident Management Practice meets the following requirements:

  • Outlines the steps that should be taken to handle the incident.
  • Identifies the required roles and responsibilities.
  • Supports the classification of incidents.
  • Supports and is able to match the incident against known problems and errors.
  • Provides and supports integration with a Problem Management Practice.
  • Supports and assists in the provision of initial support and diagnosis of incidents.
  • Supports and assists in the activities associated with the assessment of incidents.
  • Defines the activities required to support the resolution of incidents and the subsequent recovery of services.

Scope

This practice statement applies only to Duck Creek SaaS Operations and provides direction on managing resources supporting Duck Creek SaaS Operations networks and applications. This Practice applies to, but is not limited to, the following Duck Creek SaaS product items:

  • Infrastructure Management
  • Environment Management
  • Database Management
  • Disaster Recovery and Backup Management
  • Performance Management
  • Integration Management
  • Updates or Upgrades
  • Base Software Defects

The following parties must adhere to the requirements outlined in this article:

  • Duck Creek SaaS Operations employees
  • Customers
  • Contractors
  • Consultants
  • System Implementation Teams (SIs)
  • Other Duck Creek workers, including all personnel affiliated with third-party partners

Additionally, the Practice adheres to the specific scope exclusions detailed in each respective customer contract.

Related Practices

Incident Management has a close relationship with a number of related practices, which are outlined in the table below.

PRACTICE | RELATIONSHIP
Knowledge Management | IT Service Management demands a customer-centric view of IT. Knowledge Management helps Duck Creek achieve customer satisfaction, exceed customer expectations, and manage customer perceptions. Knowledge Management is a separate practice and is used alongside Service Request Management as a whole and Incident Management in particular.
Problem Management | Problem Management assists Incident Management by providing the next path to escalation and resolution (as part of the Major Incident Management Practice), establishing root cause and known errors, supporting Incident Management in restoring services, and providing management reporting on historical data and trend analysis. For additional information on Problem Management, see Problem Management.
Configuration Management | Configuration Management assists Incident Management by providing valuable information on how much of the IT infrastructure is affected, the Configuration Item (CI) relationships and dependencies of other CIs, up-to-date information on customers, the owner and status of CIs, and identification of incidents of a similar CI type.
Change Management | Change Management assists Incident Management by providing information on current and future change activity, change history, controlled implementation of changes, and up-to-date information to customers on the progress of changes. For additional information on Change Management, see Change Management.
Service Level Management | Service Level Management assists Incident Management by providing performance metrics on incident response and resolution times and establishing a contact point for Customers when escalations are breached.
Escalation and Notification | Escalation and Notification is triggered by Incident Management to assist with the resolution of major incidents within clearly defined timeframes.
Major Incident Management (MIM) | Major Incident Management is initiated when a critical system component outage is not resolved within a predetermined time limit. The restoration team assumes sole responsibility for the restoration of the impacted services and establishes a formal communication vehicle between groups involved in the restoration. For additional information on Major Incident Management, see Major Incident Management Practices.

Definitions

Key terms used in the Incident Management Practice are defined in the table below.

TERM | DEFINITION
Incident | Per ITIL, an incident is defined as “an unplanned interruption to an IT Service or a reduction in the Quality of an IT Service.” Typically, this interruption is a minor occurrence or condition that requires an explanation, workaround, or resolution. An incident is the initial notification that something unexpected has occurred. It is possible for multiple similar or related incidents to be logged, which all relate back to a single problem.
Incident Severities | Incident Severity is the assigned severity of the incident ticket based on that incident’s business impact and urgency. To identify severity, Duck Creek uses an urgency and impact matrix.
Impact | Duck Creek defines impact as the scope, or number, of users affected by the service interruption.
Urgency | Urgency is defined as the measure of the effect an incident has on business processes.
Service Level Agreements (SLA) | Each incident severity level has a contractually agreed-upon response time and resolution time. SLAs are the contractually agreed-upon incident response and resolution times, defined by Duck Creek and the customer.
Service Level Objectives (SLO) | Each incident severity level has an internal objective, or goal, for response time and resolution time. Service Level Objectives act as internal targets for the Incident Management team to promote continuous improvements in process and timeframe for delivery of Incident Management services.
Configuration Item | A Configuration Item (CI) is defined as any component under the control of Change Management. Although CIs may vary immensely in complexity, size, and type, from an entire system to a specific component, the Incident Management Process is primarily concerned with the CI representing the affected Service, also known as the Application CI.
Ticket Transfer | Ticket transfer is the process of reassigning or escalating an incident to the appropriate functional team for resolution.
Ticket State | Ticket state is the current status of the incident ticket during its lifecycle.

Roles and Responsibilities

Roles included in the Incident Management Practice and their designated responsibilities are outlined in the table below.

ROLE | RESPONSIBILITIES
Incident Management Process Owner | The Incident Management Process Owner is responsible for the lifecycle of the Incident Management process. This person represents and is accountable for the performance of the process in the enterprise. The process owner maintains the procedures and this document, ensures compliance, and enforces the directives contained in this document.
Major Incident Management (MIM) Team | The Major Incident Management (MIM) Team facilitates the resolution of a major incident, which includes the following actions:
  • Ensuring that all impacted parties are notified of the outage.
  • Establishing the MIM Team by Product or impacted services.
  • Driving the bridge until service is restored.
  • Ensuring the timely internal and external notification of MIM communication updates. Internal updates are provided to the Incident Commander, and external updates are provided to the CSM so that they can then be conveyed to the customer.
  • Ensuring that a Scribe records all data and information on the Incident Summary tab in ServiceNow.
For more information on the responsibilities of the MIM Team, see Major Incident Management Practices.
Incident Commander | The Incident Commander (IC) is responsible for creating and facilitating the MIM bridge, engaging Product Subject Matter Experts (SMEs) to investigate on the bridge, sending out major incident communications, and ensuring that the Post Incident Review (PIR) is scheduled and that an Incident Owner has agreed to attend the PIR. In the MIM Process, the Incident Commander (IC) coordinates the efforts and resources to manage the restoration of service.
Incident Coordinator | The Incident Coordinator is designated to oversee the incident ticket queues of one or more functions. This person ensures mandated response times are met, ensures proper escalation procedures are followed, and tracks all incidents in the queue from assignment through closure for the non-major incident process. The coordinator identifies the required resources to work the incident, communicates and documents the timeline of events, and performs post-incident reviews as needed.
Incident Fulfiller (Assignee) | The Incident Fulfiller (assignee) is an individual on the Duck Creek Team who is responsible at a single incident level for resolving the incident, recording an incident summary, recording outage information for business impact reporting, and ensuring the veracity of all information in the ticket prior to resolution. There may be more than one Incident Fulfiller for each incident. Incident Fulfillers can be of any rank or title and may work independently, in collaboration, or as lateral points of escalation and consultation.
Customer Service Manager (CSM) | The Customer Service Manager (CSM) acts as the primary contact between Duck Creek and the customer. Any necessary communications to the customer are first communicated to the CSM.
Qualification Team Member | The Qualification Team Member acts as the gatekeeper for incidents input into the associated team’s queue. The team member verifies that the Severity Level is in accordance with standardized matrixes or SLA agreements, ensures relevant details such as Description, Repeatable Failure Path, Environment details, Product Version, and Product Family are accurate and present within the incident case, and communicates with the incident case submitter to gather any missing details.
Engineering | The Engineering Group is responsible for delivering new features and maintaining the base product code. They are assigned incidents deemed to be base code defects. They may also be pulled in to consult on specific incidents and problems.
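
The Qualification Team Member's check for missing details, described in the table above, can be illustrated with a small sketch. The field keys and the function name `missing_details` are hypothetical and are not ServiceNow field names. Only the list of details (Description, Repeatable Failure Path, Environment details, Product Version, Product Family) comes from this article.

```python
# Hypothetical keys for the details the Qualification Team Member verifies.
REQUIRED_DETAILS = (
    "description",
    "repeatable_failure_path",
    "environment_details",
    "product_version",
    "product_family",
)


def missing_details(incident_case: dict) -> list:
    """Return the required details that are absent or blank in an incident case."""
    return [
        field
        for field in REQUIRED_DETAILS
        # Treat None, empty strings, and whitespace-only values as missing.
        if not str(incident_case.get(field) or "").strip()
    ]
```

A non-empty result would prompt the team member to contact the incident case submitter for the listed details. Checking that the details are accurate remains a manual step.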

Incident Severity

An incident’s severity is calculated based on its urgency and impact. Urgency is the measure of the effect an incident has on business processes, while impact is the scope, or number, of users affected by the service outage.

Incident Urgency

Urgency categories for the Incident Management Practice are described in the table below. In order to determine an incident’s urgency, choose the most relevant category.

URGENCY CATEGORY | DESCRIPTION
All work functions not available
  • Complete outage of one or more product applications.
  • Causes significant financial or reputational impact.
  • Example: a user is unable to access Claims Desktop or bind transactions within Duck Creek Policy.
Work functions not available
  • Certain business activities within a product are unavailable.
  • Causes financial or reputational impact.
  • Example: a user is able to bind policies but not print forms.
Work functions partially impacted
  • Degradation or latency of one application or within certain business activities.
  • Potential to cause financial or reputational impact.
  • Urgent concern that needs to be addressed before go-live.
  • Example: a user is able to complete work, but the system is running slowly, or a rating is improper for a specified state that has not gone live.
Work functions at risk
  • Concern that needs to be addressed before go-live.
  • Issue for which a workaround already exists.
  • Example: a required indicator is not showing for a required field on a page.

Incident Impact

Impact categories for the Incident Management Practice are described in the table below. In order to determine an incident’s impact, choose the most relevant category.

IMPACT CATEGORY | DESCRIPTION
Enterprise, Region | Affects all users for a company.
Department, Location, Business Unit | Affects an entire department, location, or business unit.
Multiple Users | Affects multiple users.
Single User | Affects only a few users or a single user.

Incident Severity Matrix

If classes are defined to rate urgency and impact, an Urgency-Impact Matrix (also referred to as an Incident Severity Matrix) can be used to define severity levels. In the table below, the numbers represent the severity level determined by impact and urgency, ranging from 1 (critical severity) to 4 (low severity).

URGENCY \ IMPACT | Critical – Enterprise, Region | High – Department, Location, Business Unit | Medium – Multiple Users | Low – Single User
Critical – All work functions not available | 1 | 1 | 2 | 2
High – Work functions not available | 1 | 2 | 3 | 3
Medium – Work functions partially impacted | 2 | 3 | 3 | 4
Low – Work functions at risk | 2 | 3 | 4 | 4
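
The matrix above can be read as a simple two-key lookup. The following is a minimal sketch that assumes the short labels Critical, High, Medium, and Low for both axes. The function name `incident_severity` is hypothetical and is not part of any Duck Creek or ServiceNow tooling.

```python
# Short labels for both axes, in the order used by the Incident Severity Matrix.
LEVELS = ("Critical", "High", "Medium", "Low")

# Rows are urgency and columns are impact, copied from the matrix above.
SEVERITY_MATRIX = (
    (1, 1, 2, 2),  # Critical urgency: all work functions not available
    (1, 2, 3, 3),  # High urgency: work functions not available
    (2, 3, 3, 4),  # Medium urgency: work functions partially impacted
    (2, 3, 4, 4),  # Low urgency: work functions at risk
)


def incident_severity(urgency: str, impact: str) -> int:
    """Return the severity level (1 = critical, 4 = low) for an urgency and impact."""
    if urgency not in LEVELS or impact not in LEVELS:
        raise ValueError(f"urgency and impact must each be one of {LEVELS}")
    return SEVERITY_MATRIX[LEVELS.index(urgency)][LEVELS.index(impact)]
```

For example, an incident in which work functions are not available (High urgency) and that affects multiple users (Medium impact) maps to Severity 3.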

Service Level Objectives

A Service Level Objective (SLO) is a key element of a Service Level Agreement (SLA) between Duck Creek and a customer. SLOs are agreed upon as a means to measure the performance of the service provider and are outlined to avoid disputes between the two parties due to misunderstandings. Duck Creek sets the SLOs based on the severity levels defined in the Incident Severity Matrix.

The Support Window for Severity Levels 2-4 starts on Sunday at 7 PM EST and runs through Friday 7 PM EST, excluding holidays.
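
As an illustration of the Support Window rule, the sketch below checks whether a given moment falls inside the window. It rests on several assumptions that this article does not confirm: "EST" is read literally as a fixed UTC-5 offset, a holiday excludes the whole calendar day, and the closing edge on Friday at 7 PM is exclusive. The function name `in_support_window` and the `holidays` parameter are hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: "EST" is a fixed UTC-5 offset with no daylight saving adjustment.
EST = timezone(timedelta(hours=-5))
WINDOW_EDGE = time(19, 0)  # 7 PM


def in_support_window(moment: datetime, holidays: frozenset = frozenset()) -> bool:
    """Return True if a timezone-aware moment is inside the Severity 2-4 window."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    local = moment.astimezone(EST)
    # Assumption: a holiday removes the entire calendar day from the window.
    if local.date() in holidays:
        return False
    weekday = local.weekday()  # Monday is 0, Sunday is 6
    if weekday == 6:  # Sunday: the window opens at 7 PM
        return local.time() >= WINDOW_EDGE
    if weekday == 4:  # Friday: the window closes at 7 PM
        return local.time() < WINDOW_EDGE
    return weekday != 5  # Monday through Thursday are inside, Saturday is outside
```

Severity 1 incidents are supported 24×7, so this check applies only to Severity Levels 2-4.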

The SLOs are outlined in the table below.

SEVERITY LEVEL | RESOLUTION OBJECTIVE (IN HOURS) | SUPPORT WINDOW
Severity 1 | 4 | 24×7
Severity 2 | 24 | 24×5
Severity 3 | 48 | 24×5
Severity 4 | 60 | 24×5
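
The resolution objectives above can be captured as a lookup keyed by severity level. The sketch below is illustrative only, and the names `RESOLUTION_OBJECTIVE_HOURS` and `naive_resolution_deadline` are hypothetical. It counts continuous clock hours, which matches the 24×7 window for Severity 1. For Severity Levels 2-4, this article does not state whether the clock pauses outside the Support Window, so the result should be read as the earliest possible deadline rather than a confirmed one.

```python
from datetime import datetime, timedelta

# Resolution objective in hours for each severity level, from the table above.
RESOLUTION_OBJECTIVE_HOURS = {1: 4, 2: 24, 3: 48, 4: 60}


def naive_resolution_deadline(opened_at: datetime, severity: int) -> datetime:
    """Add the resolution objective to the open time, counting continuous hours.

    Assumption: no pause outside the Support Window is applied.
    """
    if severity not in RESOLUTION_OBJECTIVE_HOURS:
        raise ValueError("severity must be 1, 2, 3, or 4")
    return opened_at + timedelta(hours=RESOLUTION_OBJECTIVE_HOURS[severity])
```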

Incident Management Process

The Incident Management Process is triggered by an incident that causes an interruption to or a reduction in the quality of service, and concludes with the successful restoration of service and the incident being resolved to the customer’s satisfaction.

The inputs and outputs of the Incident Management Process are listed in the table below.

INPUTS
  • An incident reported by the customer, Customer Service Manager (CSM), a support group, or a System Monitoring Tool.
  • Configuration details from the Configuration Management Database (CMDB).
  • Response from an incident matching against Problems and known errors.
  • Response on Requests for Change (RFC) to effect resolution for incident(s).
OUTPUTS
  • Service is restored.
  • A completed Incident Record, including resolution details.
  • Notification to customers including formal RCA document, if applicable.
  • Management information (in the form of reports).
  • Problem raised in Problem Management System.
  • Changes raised in Change Management System.
  • Request for new knowledge article.
  • Updated Configuration Management records.

The Incident Management process consists of the following sub-processes:

  1. Incident Intake
  2. Qualification
  3. Diagnose and Escalate
  4. Resolve
  5. Incident Closure
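
The ordering of the sub-processes above can be sketched as an ordered enumeration. This sketch assumes a strictly linear flow from intake to closure. The process diagrams may include branches or loops (for example, handoff to Major Incident Management) that it does not model. The names `SubProcess` and `next_sub_process` are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class SubProcess(IntEnum):
    """Sub-processes of the Incident Management Process, in listed order."""
    INCIDENT_INTAKE = 1
    QUALIFICATION = 2
    DIAGNOSE_AND_ESCALATE = 3
    RESOLVE = 4
    INCIDENT_CLOSURE = 5


def next_sub_process(current: SubProcess) -> Optional[SubProcess]:
    """Return the sub-process that follows `current`, or None after closure."""
    if current is SubProcess.INCIDENT_CLOSURE:
        return None
    # Assumption: each sub-process hands off to the next one in the list.
    return SubProcess(current + 1)
```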

A high-level diagram of the Incident Management Process is provided below.

Incident Intake Process

The Incident Intake Process is outlined in the diagram below. For additional information on the Service Request Management Process, see Service Request Management Practices.

Qualification Process

The Qualification Process is outlined in the diagram below. For additional information on the Major Incident Management Process, see Major Incident Management Practices.

Diagnose and Escalate Process

The Diagnose and Escalate Process is outlined in the diagram below.

Resolve Process

The Resolve Process is outlined in the diagram below. For additional information on the Change Management Process, see Change Management Practices, and for more information on the Problem Management Process, see Problem Management Practices.

Incident Closure Process

The Incident Closure Process is outlined in the diagram below. For additional information on the Problem Management Process, see Problem Management Practices.