Minimizing Fallout With Incident Management

Every business or organization wants to ensure smooth operations and provide uninterrupted services to its customers, but unexpected problems such as power outages or server failures (which are more common than you might expect) can interrupt these services and lead to losses. The problem is not that unexpected incidents occur, but how effectively and efficiently an organization handles and resolves them before they significantly affect the business. This is where incident management comes into play.

What is an Incident?

In the information technology space, ITIL (Information Technology Infrastructure Library) defines an incident as any unplanned event that could interrupt, or reduce the quality of, an IT service. This includes events that may not disrupt a service completely but still degrade its quality, e.g. slow internet speed or viruses consuming processing power.

What is Incident Management?

Incident management is the process of identifying, managing, and analyzing such incidents to restore service operations to normal with minimum impact on the business.

You already have your own processes and tools in everyday life to guard against potential misfortune, such as:

  • Making sure your phone is plugged in or having a separate alarm clock to keep you from being late for work, or
  • Installing a smoke detector in your apartment to catch a fire early and reduce the potential damage.

For the IT services in your business, the equivalent would include implementing firewalls, detection systems, and similar controls to protect and monitor your systems.

The Need for Incident Management

In 2010, the Stuxnet worm destroyed multiple centrifuges at Iran’s Natanz uranium enrichment facility. It was not a remote attack; it is widely believed to have spread through an infected USB drive. A simple case of unauthorized access led to a huge political and national crisis, with losses in the millions.

A more recent example is the exploitation of the Print Spooler service on Windows systems, dubbed PrintNightmare. A combination of remote code execution and privilege escalation enabled an attacker to take control of the system. All that has to be done to prevent this is to disable the affected service until your system is updated with a patch (hopefully done by the time you are reading this).

Being part of the Incident Management team does not mean acting only when there is a fire to put out. It also means creating and refining preventative processes that reduce both the chances of an incident occurring in the first place and its potential fallout.

Categorization

From a printer not working to a service being completely down, incidents do not all carry the same level of impact, and each one needs to be categorized in order to be resolved efficiently. This is done by keeping several variables in mind:

Impact – The effect of an incident on the business services or processes

Priority – The variable used to define the importance of an incident. It is usually defined as Low, Medium, or High.

Time period – The agreed response time and resolution time for the event. These are usually incorporated in the SLAs (service level agreements) and defined for each phase of Incident Management.

Urgency – How quickly the incident’s impact will begin to affect the business significantly.

Usually, an ‘Impact’ and ‘Urgency’ matrix is used to assign a final priority level to an incident. A high-impact incident may have low urgency and vice versa, so how each combination is ranked needs to be defined by the organization. An incident with high impact and high urgency is known as a Major Incident.
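As a rough illustration, such a matrix can be written as a small lookup table. The Python sketch below is only an example: the three-level scale, the value in each cell, and the names `Level`, `PRIORITY_MATRIX`, `assign_priority`, and `is_major_incident` are all assumptions made for this article, and every organization would define its own.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical matrix: keys are (impact, urgency), values are the priority.
# The cell values are illustrative only; each organization defines its own.
PRIORITY_MATRIX = {
    (Level.HIGH, Level.HIGH): Level.HIGH,
    (Level.HIGH, Level.MEDIUM): Level.HIGH,
    (Level.HIGH, Level.LOW): Level.MEDIUM,
    (Level.MEDIUM, Level.HIGH): Level.HIGH,
    (Level.MEDIUM, Level.MEDIUM): Level.MEDIUM,
    (Level.MEDIUM, Level.LOW): Level.LOW,
    (Level.LOW, Level.HIGH): Level.MEDIUM,
    (Level.LOW, Level.MEDIUM): Level.LOW,
    (Level.LOW, Level.LOW): Level.LOW,
}


def assign_priority(impact: Level, urgency: Level) -> Level:
    """Look up the priority for a given impact/urgency pair."""
    return PRIORITY_MATRIX[(impact, urgency)]


def is_major_incident(impact: Level, urgency: Level) -> bool:
    """A Major Incident is one with both high impact and high urgency."""
    return impact == Level.HIGH and urgency == Level.HIGH
```

Keeping the matrix as data rather than a chain of conditions makes it easy to review and adjust when the organization changes how it ranks a combination.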


Incident Management Life Cycle

There are many standards and frameworks, such as ITIL, the NIST Computer Security Incident Handling Guide, and PCI DSS, that define Incident Management processes, but broadly the multiple phases can be divided into three main stages:

  1. Pre-Incident is mostly administrative and focuses on detecting and identifying an incident.
  2. Incident Response actually mitigates and resolves the incident that has occurred.
  3. Post-incident wraps up the process and usually focuses on generating detailed reports and lessons learned.

Let’s take a closer look at the various stages of an incident.

Pre-Incident

1. Identification & Logging

Identification – As the name suggests, this step establishes that an incident has occurred. This is usually carried out by the monitoring and detection systems in place. These systems do not guarantee that every incident will be detected in advance, e.g. zero-day attacks exploit vulnerabilities that are not yet known.

Logging – After identifying an incident, we need to keep track of it throughout its lifetime until it is resolved. This is usually done by generating a ticket against the incident with information like the date and time of the incident, its impact, etc. Logging and documenting also build a record of previous incidents, which can be reviewed later for purposes such as auditing, trend analysis, or forensics.
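To make the idea of a ticket concrete, here is a minimal Python sketch of the kind of record a ticketing system might keep. The class name `IncidentTicket`, its fields, and the sample values are hypothetical and chosen for illustration; a real ticketing tool will have its own schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count
from typing import Optional

_ticket_ids = count(1)  # simple in-memory ID generator for this sketch


@dataclass
class IncidentTicket:
    summary: str
    impact: str                      # e.g. "low", "medium", "high"
    detected_at: datetime            # date and time of the incident
    ticket_id: int = field(default_factory=lambda: next(_ticket_ids))
    status: str = "open"
    history: list = field(default_factory=list)  # (timestamp, note) pairs

    def log(self, note: str, when: Optional[datetime] = None) -> None:
        """Append a timestamped note so the ticket keeps an audit trail."""
        when = when or datetime.now(timezone.utc)
        self.history.append((when, note))


# Example usage with made-up values.
ticket = IncidentTicket(
    summary="Email server not responding",
    impact="high",
    detected_at=datetime(2021, 7, 1, 9, 30, tzinfo=timezone.utc),
)
ticket.log("Ticket created from monitoring alert")
```

Every note is stored with its timestamp, which is what later makes auditing, trend analysis, and forensics possible.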

2. Classification & Prioritization

Classification – This step is important in resolving the issue, and the categories used are usually defined according to the requirements of the organization. An incident can be categorized as hardware or software, for example, and further sub-categorized into printers, servers, etc. Simplicity is key here: if you create too many categories and subcategories, the scheme can quickly become unmanageable.

Prioritization – This step assigns a level to the incident based on both its impact on the business and its urgency. How the two are weighed is up to the organization; in some schemes, for example, an incident with low impact and high urgency is given a higher priority than one with high impact and low urgency.

Incident Response

3. Investigation & Diagnosis

First, we need to determine who needs to be involved in resolving the incident and perform an initial diagnosis to understand the underlying problem. Can the IT team resolve the incident? Does executive management need to get involved?

4. Resolution & Recovery

Easier said than done, but this step comes down to finding a solution to the incident and ensuring that business services and operations resume as soon as possible.

Post-Incident

5. Incident Closure

After the incident has been successfully resolved, the ticket can be closed. Reports can be generated to check whether it is a recurring incident, and meetings can be held with the relevant members of the organization as needed.
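The five steps described above can be modelled as a simple ordered sequence. The Python sketch below is one possible way to do it, and it assumes a strictly linear flow; in practice an incident may loop back (for example, from diagnosis to re-classification). The names `Step`, `STAGE_OF`, and `next_step` are hypothetical.

```python
from enum import Enum


class Step(Enum):
    IDENTIFICATION_AND_LOGGING = 1
    CLASSIFICATION_AND_PRIORITIZATION = 2
    INVESTIGATION_AND_DIAGNOSIS = 3
    RESOLUTION_AND_RECOVERY = 4
    INCIDENT_CLOSURE = 5


# Which of the three broad stages each step belongs to.
STAGE_OF = {
    Step.IDENTIFICATION_AND_LOGGING: "Pre-Incident",
    Step.CLASSIFICATION_AND_PRIORITIZATION: "Pre-Incident",
    Step.INVESTIGATION_AND_DIAGNOSIS: "Incident Response",
    Step.RESOLUTION_AND_RECOVERY: "Incident Response",
    Step.INCIDENT_CLOSURE: "Post-Incident",
}


def next_step(current: Step):
    """Return the step after `current`, or None once the incident is closed.

    Assumes a strictly linear life cycle, which is a simplification.
    """
    if current is Step.INCIDENT_CLOSURE:
        return None
    return Step(current.value + 1)
```

Tracking the current step on each ticket makes it straightforward to report how many incidents are sitting in each stage at any given time.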

Best Practices for Incident Management

  • Define Incident Management procedures, policies, protocols for communication during an incident, and guidelines for detecting, assessing, documenting, reporting, and responding to an incident.
  • Devise an Incident Response Checklist that can help guide an employee or customer in identifying an incident.
  • Establish an Incident Response team with skilled members. Define roles and responsibilities for each member. The team should have representation from other departments (IT, finance, legal, etc.) as well.
  • Put a process in place, in cooperation with the legal team, to inform involved or impacted parties.
  • Automate the classification and ongoing status tracking of incidents to reduce the chances of errors and save time. Besides being efficient, this also helps to keep track of multiple active incidents.
  • Develop a training program to test your Incident Management plan and practice security procedures. Run awareness campaigns for the company’s employees as well.
  • Analyze past incidents to identify any recurring events and narrow down vulnerable areas of the organization’s cybersecurity framework. Establish a forensics team (or use third-party services) for the analysis and investigation of incidents that have occurred.
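The analysis of past incidents mentioned in the last point can start out very simply, for instance by counting how often each category appears among closed tickets. The Python sketch below assumes incidents are recorded as (category, subcategory) pairs; the sample data, the `recurring` function, and its `threshold` parameter are hypothetical.

```python
from collections import Counter

# Hypothetical closed incidents as (category, subcategory) pairs.
past_incidents = [
    ("hardware", "printer"),
    ("software", "email"),
    ("hardware", "printer"),
    ("hardware", "server"),
    ("hardware", "printer"),
    ("software", "email"),
]


def recurring(incidents, threshold=2):
    """Return (category, count) pairs seen at least `threshold` times,
    most frequent first."""
    counts = Counter(incidents)
    return [(cat, n) for cat, n in counts.most_common() if n >= threshold]


print(recurring(past_incidents))
```

With the sample data above, the printer and email categories are flagged as recurring, which points to where a closer investigation might pay off.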

Provided it is devised effectively, an Incident Management Plan supports customer satisfaction through the swift and efficient response to, analysis of, and logging of incidents. This makes it an essential tool for any service-based organization. StandardFusion is an end-to-end GRC software that can be used to develop an Incident Management plan centred on your organization’s information security and compliance requirements. Contact our team and set up a demo to see how you can develop your own incident management plan for any scenario.