Incident Management Practices
Overview
This article outlines the practices and associated compliance requirements, roles and responsibilities, and overall process standards for Incident Management at Duck Creek Technologies for the Software-as-a-Service (SaaS) Operations Team. It provides a mechanism to carry out the Incident Management Process in a structured and consistent manner in accordance with ITIL guidelines.
Performed effectively, the Incident Management Practice will minimize the adverse impact of incidents on the business and restore service as quickly as possible to ensure service levels are maintained and service meets Customer contractual requirements.
The practices outlined in this article are intended to influence and determine decisions and actions surrounding Incident Management. This Practice is reviewed annually, and in accordance with the enterprise Practice, Standard, and Procedure Management Practice, to ensure it is current and accurately reflects business activities and processes.
Requirements
A complete Incident Management Practice meets the following requirements:
- Outlines the steps that should be taken to handle the incident.
- Identifies the required roles and responsibilities.
- Supports the classification of incidents.
- Supports matching the incident against known problems and errors.
- Provides and supports integration with a Problem Management Practice.
- Supports and assists in the provision of initial support and diagnosis of incidents.
- Supports and assists in the activities associated with the assessment of incidents.
- Defines the activities required to support the resolution of incidents and the subsequent recovery of services.
Scope
This practice statement applies only to Duck Creek SaaS Operations and provides direction on managing resources supporting Duck Creek SaaS Operations networks and applications. This Practice applies to, but is not limited to, the following Duck Creek SaaS product items:
- Infrastructure Management
- Environment Management
- Database Management
- Disaster Recovery and Backup Management
- Performance Management
- Integration Management
- Updates or Upgrades
- Base Software Defects
The following parties must adhere to the requirements outlined in this article:
- Duck Creek SaaS Operations employees
- Customers
- Contractors
- Consultants
- System Implementation Teams (SIs)
- Other Duck Creek workers, including all personnel affiliated with third-party partners.
Additionally, the Practice adheres to the specific scope exclusions detailed in each respective customer contract.
Related Practices
Incident Management has a close relationship with a number of related practices, which are outlined in the table below.
PRACTICE | RELATIONSHIP |
---|---|
Knowledge Management | IT Service Management demands a customer-centric view of IT. Knowledge Management helps Duck Creek achieve customer satisfaction, exceed customer expectations, and manage customer perceptions. Knowledge Management is a separate practice and is used alongside Service Request Management as a whole and Incident Management in particular. |
Problem Management | Problem Management assists Incident Management by providing the next path to escalation and resolution (as part of the Major Incident Management Practice), establishing root cause and known errors, supporting Incident Management in restoring services, and providing management reporting on historical data and trend analysis. For additional information on Problem Management, see Problem Management. |
Configuration Management | Configuration Management assists Incident Management by providing valuable information on how much of the IT infrastructure is affected, the Configuration Item (CI) relationships and dependencies of other CIs, up-to-date information on customers, the owner and status of CIs, and identification of incidents of a similar CI type. |
Change Management | Change Management assists Incident Management by providing information on current and future change activity, change history, controlled implementation of changes, and up-to-date information to customers on the progress of changes. For additional information on Change Management, see Change Management. |
Service Level Management | Service Level Management assists Incident Management by providing performance metrics on incident response and resolution times and establishing a contact point for Customers when escalations are breached. |
Escalation and Notification | Escalation and Notification is triggered by Incident Management to assist with the resolution of major incidents within clearly defined timeframes. |
Major Incident Management (MIM) | Major Incident Management is initiated when a critical system component outage is not resolved within a predetermined time limit. The restoration team assumes sole responsibility for the restoration of the impacted services and establishes a formal communication vehicle between groups involved in the restoration. For additional information on Major Incident Management, see Major Incident Management Practices. |
Definitions
Key terms used in the Incident Management Practice are defined in the table below.
TERM | DEFINITION |
---|---|
Incident | Per ITIL, an incident is defined as “an unplanned interruption to an IT Service or a reduction in the Quality of an IT Service.” Typically, this interruption is a minor occurrence or condition that requires an explanation, workaround, or resolution. An incident is the initial notification that something unexpected has occurred. It is possible for multiple similar or related incidents to be logged, which all relate back to a single problem. |
Incident Severities | Incident Severity is the assigned severity of the incident ticket based on that incident’s business impact and urgency. To identify severity, Duck Creek uses an urgency and impact matrix. |
Impact | Duck Creek defines impact as the scope or number of users affected by the service interruption. |
Urgency | Urgency is defined as the measure of effect an incident has on business processes. |
Service Level Agreements (SLA) | SLAs are the contractually agreed-upon incident response and resolution times, defined by Duck Creek and the customer. Each incident severity level has its own agreed-upon response time and resolution time. |
Service Level Objectives (SLO) | Each incident severity level has an internal objective, or goal, for response time and resolution time. Service Level Objectives act as internal targets for the Incident Management team to promote continuous improvements in process and timeframe for delivery of Incident Management services. |
Configuration Item | A CI is defined as any component under the control of Change Management. Although CIs may vary immensely in complexity, size, and type, from an entire system to a specific component, the Incident Management Process is primarily concerned with the CI representing the affected Service, also known as the Application CI. |
Ticket Transfer | Ticket transfer is the process of reassigning or escalating an incident to the appropriate functional team for resolution. |
Ticket State | Ticket state is the current status of the incident ticket during its lifecycle. |
Roles and Responsibilities
Roles included in the Incident Management Practice and their designated responsibilities are outlined in the table below.
ROLE | RESPONSIBILITIES |
---|---|
Incident Management Process Owner | The Incident Management Process Owner is responsible for the lifecycle of the Incident Management process. This person represents and is accountable for the performance of the process in the enterprise. The process owner maintains the procedures and this document, ensures compliance, and enforces the directives contained in this document. |
Major Incident Management (MIM) Team | The Major Incident Management (MIM) Team facilitates the resolution of a major incident, which includes the following actions:
|
Incident Commander | The Incident Commander (IC) is responsible for creating and facilitating the MIM bridge, engaging Product Subject Matter Experts (SMEs) to investigate on the bridge, sending out major incident communications, ensuring the Post Incident Review (PIR) is scheduled, and confirming that an Incident Owner has agreed to attend the PIR. In the MIM Process, the Incident Commander (IC) coordinates the efforts and resources to manage the restoration of service. |
Incident Coordinator | The Incident Coordinator is designated to oversee the incident ticket queues of one or more functions. This person ensures that mandated response times are met and proper escalation procedures are followed, and tracks all incidents in his or her queue from assignment through closure for the non-major incident process. The coordinator identifies the required resources to work the incident, communicates and documents the timeline of events, and performs post-incident reviews as needed. |
Incident Fulfiller (Assignee) | The Incident Fulfiller (assignee) is an individual on the Duck Creek Team who is responsible at a single incident level for resolving the incidents, recording an incident summary, recording outage information for business impact reporting, and ensuring the veracity of all information in the ticket prior to resolution. There may be more than one Incident Fulfiller for each incident. Incident Fulfillers can be of any rank or title and may work independently, in collaboration, or as lateral points of escalation and consultation. |
Customer Service Manager (CSM) | The Customer Service Manager (CSM) acts as the primary contact between Duck Creek and the customer. Any necessary communications to the customer are first communicated to the CSM. |
Qualification Team Member | The Qualification Team Member acts as the gatekeeper for incidents input into the associated team’s queue. The team member verifies that the Severity Level is in accordance with standardized matrices or SLAs, ensures relevant details such as Description, Repeatable Failure Path, Environment details, Product Version, and Product Family are accurate and present within the incident case, and communicates with the incident case submitter to gather any missing details. |
Engineering | The Engineering Group is responsible for delivering new features and maintaining the base product code. They are assigned incidents deemed to be base code defects. They may also be pulled in to consult on specific incidents and problems. |
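The completeness check that the Qualification Team Member performs on an incident case could be sketched roughly as follows. This is only an illustration: the `IncidentCase` class, its field names, and the `missing_details` helper are hypothetical and do not describe the actual ticketing system. The check covers only whether each detail is present, not whether it is accurate.

```python
from dataclasses import dataclass, fields


@dataclass
class IncidentCase:
    """Hypothetical incident case holding the details named in the role description."""
    description: str = ""
    repeatable_failure_path: str = ""
    environment_details: str = ""
    product_version: str = ""
    product_family: str = ""


def missing_details(case: IncidentCase) -> list[str]:
    """Return the names of required details that are empty or whitespace only."""
    return [f.name for f in fields(case) if not getattr(case, f.name).strip()]


# Example: a case submitted with only a description and a product version.
case = IncidentCase(description="Policy search times out", product_version="2023.1")
to_request = missing_details(case)
```

In this sketch, `to_request` lists the details the team member would still need to gather from the incident case submitter.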
Incident Severity
An incident’s severity is calculated based on its urgency and impact. Urgency is the measure of effect an incident has on business processes, while impact is the scope or number of users affected by the service outage.
Incident Urgency
Urgency categories for the Incident Management Practice are described in the table below. In order to determine an incident’s urgency, choose the most relevant category.
URGENCY CATEGORY | DESCRIPTION |
---|---|
All work functions not available | |
Work functions not available | |
Work functions partially impacted | |
Work functions at risk | |
Incident Impact
Impact categories for the Incident Management Practice are described in the table below. In order to determine an incident’s impact, choose the most relevant category.
IMPACT CATEGORY | DESCRIPTION |
---|---|
Enterprise, Region | Affects all users for a company. |
Department, Location, Business Unit | Affects an entire department, location, or business unit. |
Multiple Users | Affects multiple users. |
Single User | Affects only a few users or a single user. |
Incident Severity Matrix
If classes are defined to rate urgency and impact, an Urgency-Impact Matrix (also referred to as an Incident Severity Matrix) can be used to define severity levels. In the table below, the numbers represent the severity level determined by impact and urgency, ranging from 1 (critical severity) to 4 (low severity).
URGENCY | IMPACT: CRITICAL (Enterprise, Region) | IMPACT: HIGH (Department, Location, Business Unit) | IMPACT: MEDIUM (Multiple Users) | IMPACT: LOW (Single User) |
---|---|---|---|---|
Critical – All work functions not available. | 1 | 1 | 2 | 2 |
High – Work functions not available. | 1 | 2 | 3 | 3 |
Medium – Work functions partially impacted. | 2 | 3 | 3 | 4 |
Low – Work functions at risk. | 2 | 3 | 4 | 4 |
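As an illustration, the matrix above can be expressed as a simple lookup. This is a minimal sketch: the `Level` enum, the `SEVERITY_MATRIX` constant, and the `severity` function are hypothetical names chosen for the example, while the severity values are copied from the matrix.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical shared scale for both urgency and impact categories."""
    CRITICAL = 0
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Rows are urgency, columns are impact, in the same order as the matrix.
SEVERITY_MATRIX = (
    (1, 1, 2, 2),  # Critical urgency: all work functions not available
    (1, 2, 3, 3),  # High urgency: work functions not available
    (2, 3, 3, 4),  # Medium urgency: work functions partially impacted
    (2, 3, 4, 4),  # Low urgency: work functions at risk
)


def severity(urgency: Level, impact: Level) -> int:
    """Return the severity level (1 = critical, 4 = low) for an urgency/impact pair."""
    return SEVERITY_MATRIX[urgency][impact]
```

For example, `severity(Level.HIGH, Level.MEDIUM)` returns 3, matching the cell where High urgency meets Medium impact (Multiple Users).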
Service Level Objectives
A Service Level Objective (SLO) is a key element of a Service Level Agreement (SLA) between Duck Creek and a customer. SLOs are agreed upon as a means to measure the performance of the service provider and are outlined to avoid disputes between the two parties due to misunderstandings. Duck Creek sets the SLOs based on the severity levels defined in the Incident Severity Matrix.
The Support Window for Severity Levels 2-4 starts on Sunday at 7 PM EST and runs through Friday at 7 PM EST, excluding holidays.
The SLOs are outlined in the table below.
SEVERITY LEVEL | RESOLUTION OBJECTIVE (IN HOURS) | SUPPORT WINDOW |
---|---|---|
Severity 1 | 4 | 24×7 |
Severity 2 | 24 | 24×5 |
Severity 3 | 48 | 24×5 |
Severity 4 | 60 | 24×5 |
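To show how the resolution objectives and support windows in the table above might combine, the sketch below computes when an objective would run out. It rests on several assumptions that this article does not state: the clock is assumed to pause outside the 24×5 window, a holiday is assumed to exclude its entire calendar date, and "EST" is read literally as a fixed UTC-5 offset. All names (`SLO_HOURS`, `in_support_window`, `resolution_due`) are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: "EST" is a fixed UTC-5 offset with no daylight saving adjustment.
EST = timezone(timedelta(hours=-5), "EST")

# Resolution objectives in hours, copied from the SLO table.
SLO_HOURS = {1: 4, 2: 24, 3: 48, 4: 60}


def in_support_window(moment: datetime, holidays: frozenset = frozenset()) -> bool:
    """Return True if a timezone-aware moment falls inside the 24x5 window."""
    local = moment.astimezone(EST)
    if local.date() in holidays:  # assumption: a holiday excludes the whole date
        return False
    weekday = local.weekday()  # Monday == 0 ... Sunday == 6
    if weekday == 5:  # Saturday is always outside the window
        return False
    if weekday == 6:  # Sunday: the window opens at 7 PM
        return local.hour >= 19
    if weekday == 4:  # Friday: the window closes at 7 PM
        return local.hour < 19
    return True


def resolution_due(opened: datetime, severity: int,
                   holidays: frozenset = frozenset()) -> datetime:
    """Return when the resolution objective runs out for a timezone-aware `opened`."""
    remaining = timedelta(hours=SLO_HOURS[severity])
    if severity == 1:  # 24x7 support: the clock never pauses
        return opened + remaining
    step = timedelta(minutes=1)
    current = opened
    # Walk forward minute by minute, counting only time inside the window.
    while remaining > timedelta(0):
        if in_support_window(current, holidays):
            remaining -= step
        current += step
    return current
```

Under these assumptions, a Severity 2 incident opened on a Friday at 6 PM EST would come due the following Monday at 6 PM EST: one hour counts before the window closes, and the remaining 23 hours count from Sunday at 7 PM.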
Incident Management Process
The Incident Management Process is triggered by an incident that causes an interruption to or a reduction in the quality of service, and concludes with the successful restoration of service and the incident being resolved to the customer’s satisfaction.
The inputs and outputs of the Incident Management Process are listed in the table below.
INPUTS | OUTPUTS |
---|---|
|
|
The Incident Management process consists of the following sub-processes:
- Incident Intake
- Qualification
- Diagnose and Escalate
- Resolve
- Incident Closure
A high-level diagram of the Incident Management Process is provided below.
Incident Intake Process
The Incident Intake Process is outlined in the diagram below. For additional information on the Service Request Management Process, see Service Request Management Practices.
Qualification Process
The Qualification Process is outlined in the diagram below. For additional information on the Major Incident Management Process, see Major Incident Management Practices.
Diagnose and Escalate Process
The Diagnose and Escalate Process is outlined in the diagram below.
Resolve Process
The Resolve Process is outlined in the diagram below. For additional information on the Change Management Process, see Change Management Practices, and for more information on the Problem Management Process, see Problem Management Practices.
Incident Closure Process
The Incident Closure Process is outlined in the diagram below. For additional information on the Problem Management Process, see Problem Management Practices.