Are you doing these things to reduce IT incidents?

Every time there's an IT incident, there's an impact on IT's resources and on IT's reputation. Service Desk or Help Desk staff get involved, 2nd level support get involved, managers get involved (when there's a big stuff up or there's an escalation) and, no matter how you look at it, the customer is disrupted.

Every incident carries the risk of IT's reputation being eroded and confidence in IT being lost.

The best way to improve the support you provide to your customer is to reduce the need to provide support in the first place. By proactively reducing the number of incidents experienced each month, an IT department can increase customer satisfaction and reduce its support costs at the same time.

So, what can you do to reduce the volume of IT incidents in your IT department?

Here are nine ways to do it. How many of them are you already doing?

#1 Implement ITIL Change Management

According to the itSMF (IT Service Management Forum), 80% of incidents are caused by changes made to the IT environment. This pretty much lines up with my experience. We have a client that halved their ongoing number of Priority 1 incidents by implementing ITIL Change Management and providing incentives for their staff to adopt it. Essentially, ITIL Change Management is a quality control that helps to stop people doing whatever they like in production.

#2 Improve Release Management

Sound management of releases (both their preparation and the way they are introduced into production) will prevent incidents before they happen. This means having a methodology and supporting tools that ensure appropriate testing has been done prior to a release and that releases into production are automated as much as possible. Both act as quality controls.

Another client of ours reduced their unplanned outage time for a key service from an average of 2 hours per month (yikes!) to about 2 minutes per month by implementing ITIL Change Management and improving their release management capability.

#3 Do Root Cause Analysis

After a major incident, root cause analysis should be conducted to understand what caused the incident and how it can be prevented from happening again. This would usually be done as part of the Major Incident Review.

Obviously, the important bit is to make sure that whatever actions were identified to stop the major incident from recurring are actually carried out. Sounds simple, but many organisations seem to stop at minuting the findings of the review.

#4 Analyse Incident Trends

Periodic trend analysis of incident records can help you identify recurring incidents. Depending on how you categorise incidents (Resolution Code is often useful here), you can do a Pareto type of analysis to see what types of incident are most frequent and what their causes are. Sometimes you'll be amazed at how many incidents don't have technical root causes, but can be eliminated by changing a business process or providing decent customer training.

If your data doesn't support meaningful analysis, then this can be done by asking the domain experts simple questions like "What can we do to reduce the number of incidents that come to your team?". The domain experts usually know what recurring incidents they keep having to resolve and have a good idea of what can be done to eliminate them.
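Where the incident data is good enough, the Pareto type of analysis can be sketched in a few lines. This is only an illustration: the Resolution Code values and the `pareto` function are made up for the example, and a real export from your service desk tool will look different.

```python
from collections import Counter


def pareto(resolution_codes):
    """Return (code, count, cumulative_percent) rows, most frequent first."""
    counts = Counter(resolution_codes)
    total = sum(counts.values())
    rows = []
    running = 0
    for code, count in counts.most_common():
        running += count
        rows.append((code, count, round(100 * running / total, 1)))
    return rows


# Hypothetical Resolution Codes exported from a month of incident records.
sample = (
    ["password reset"] * 5
    + ["user training"] * 3
    + ["disk full"] * 1
    + ["failed change"] * 1
)

for code, count, cumulative in pareto(sample):
    print(f"{code:<16} {count:>3} {cumulative:>6}%")
```

In this made-up sample, the top two codes account for 80% of the incidents, and neither has a technical root cause, which is the kind of finding the analysis is meant to surface.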

#5 Create 'On-the-fly' Problem Records

Get the Service Desk and 2nd level support teams to create a Problem Record when they notice that an incident keeps recurring, e.g. when an engineer notices that this is the third time this month that a server needed to be rebooted, she would create a Problem Record. Problem Records can be assigned to the appropriate staff (a Problem Management team in larger IT departments, domain experts in smaller departments) so they can investigate the root cause and determine how it can be eliminated.

This same discipline, of creating Problem Records so that root causes and solutions can be determined, can also be applied to Points 3 and 4 above.
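The "third time this month" rule of thumb can also be applied to the incident records themselves, as a prompt for staff rather than a replacement for their judgement. The sketch below assumes each incident can be reduced to an opened date, a Configuration Item and a symptom. The record layout, the host names, the 30-day window and the threshold of three are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def recurring_candidates(incidents, today, window_days=30, threshold=3):
    """Return (ci, symptom) pairs seen at least `threshold` times in the window."""
    cutoff = today - timedelta(days=window_days)
    counts = defaultdict(int)
    for opened, ci, symptom in incidents:
        if cutoff <= opened <= today:
            counts[(ci, symptom)] += 1
    return sorted(key for key, n in counts.items() if n >= threshold)


# Hypothetical incident records: (opened date, Configuration Item, symptom).
incidents = [
    (date(2011, 3, 2), "srv-app-01", "reboot required"),
    (date(2011, 3, 11), "srv-app-01", "reboot required"),
    (date(2011, 3, 24), "srv-app-01", "reboot required"),
    (date(2011, 3, 20), "printer-3f", "paper jam"),
    (date(2011, 1, 5), "printer-3f", "paper jam"),
]

# Each pair returned is a candidate for a Problem Record.
print(recurring_candidates(incidents, today=date(2011, 3, 31)))
```

Here only the server that needed three reboots in the window is flagged. The printer has two incidents, but one falls outside the window.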

#6 Implement Configuration Management (well)

When ITIL Configuration Management is done well (the Configuration Items and their relationships are accurate and meaningful, and the CMDB - the tool where the config info is stored and managed - is actually used by people), it can help reduce incidents. In environments with complex IT infrastructure (which I suspect is most of us!), a good CMDB can help IT staff identify the potential impact of the change(s) they want to make and therefore avoid any undesirable consequences (incidents).
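To make the impact question concrete, here is a minimal sketch of what a CMDB lookup boils down to: walking the relationships from the item being changed to everything that depends on it. The `DEPENDENTS` data and the item names are hypothetical, and real CMDB tools expose this through their own queries and reports rather than a Python dictionary.

```python
from collections import deque

# Hypothetical CMDB extract: each Configuration Item maps to the items
# that depend on it directly.
DEPENDENTS = {
    "san-01": ["db-server-01"],
    "db-server-01": ["payroll-app", "crm-app"],
    "payroll-app": ["payroll-service"],
    "crm-app": ["crm-service"],
}


def impacted_by(ci, dependents):
    """Breadth-first walk returning every item that depends on `ci`,
    directly or indirectly."""
    seen = set()
    queue = deque([ci])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# What could a change to the database server affect?
print(impacted_by("db-server-01", DEPENDENTS))
```

The answer is only as good as the relationships recorded, which is why the accuracy of the Configuration Items matters so much.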

#7 Monitor for Events

Event Management technology can be used to proactively monitor the network and servers and generate alerts that warn IT operations of looming problems, e.g. when disks start filling up or servers start becoming overloaded. Provided that the technology has been configured in the right way, domain experts can be notified before service is impacted and they can take steps to return things to normal before there is an incident.
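Most of this configuration comes down to thresholds: a warning level that gives the domain expert time to act, and a critical level that signals service is about to be affected. The sketch below shows the idea only. The metric names, host names and threshold values are assumptions, and a real monitoring product has its own way of defining them.

```python
# Hypothetical thresholds per metric: (warning level, critical level), in percent.
THRESHOLDS = {
    "disk_used_pct": (80, 95),
    "cpu_load_pct": (70, 90),
}


def classify(metric, value, thresholds):
    """Return 'ok', 'warning' or 'critical' for a percentage reading."""
    warning, critical = thresholds[metric]
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


def alerts(readings, thresholds):
    """Return only the readings that should be brought to someone's attention."""
    raised = []
    for host, metric, value in readings:
        level = classify(metric, value, thresholds)
        if level != "ok":
            raised.append((host, metric, value, level))
    return raised


# Hypothetical readings collected from three servers.
readings = [
    ("srv-file-01", "disk_used_pct", 86),
    ("srv-app-01", "cpu_load_pct", 45),
    ("srv-db-01", "disk_used_pct", 97),
]

for host, metric, value, level in alerts(readings, THRESHOLDS):
    print(f"{level.upper():<8} {host} {metric}={value}")
```

Setting the warning level too low floods people with alerts they learn to ignore, and setting it too high leaves no time to act, so the values are worth agreeing with the domain experts who will receive them.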

#8 Communicate Proactively

A good Service Desk will proactively notify customers when there is an incident that affects them, e.g. by a recorded message on the phone system, text messages, tweets, messages on the intranet, or a handwritten notice on the broken printer. Proactively notifying customers of an incident will prevent multiple customers from reporting the same incident. While technically this does not reduce the number of incidents, it does reduce the overhead associated with managing all those extra calls.

#9 Promote the Right Skills, Attitudes & Behaviours

Having IT staff who have the right skills and the right attitude to quality will go a long way to compensating for not having good processes and quality controls in place. Conversely, good processes will not compensate for inexperienced staff or cowboys who don't give two hoots about what happens when their change is made to the production environment. Having the right skills, attitudes and behaviours is about having some good people management stuff in place:

- A clearly articulated and thoroughly communicated vision, mission & values for IT (what are we here for? how should we behave?).

- An IT Balanced Scorecard, individual and team KPIs (measures) and SMART objectives (a big topic all by itself!).

- An effective performance management process that provides regular coaching and performance feedback; tries to match people's wants and skills to IT roles; provides appropriate training and development; provides timely reward and recognition; acts decisively on underperformance.

- An effective staff selection and induction process (finding the right people and giving them a great start).

Every incident carries the risk of IT's reputation being eroded and confidence in IT being lost. So, what are you doing about it?

© Silversix Pty Ltd 2011