BA risks adding insult to injury: Boards & IT Crisis Management

June 3, 2017

I have been keeping up with the news about the IT outage at BA and noticed that it resonates with the subject of a blog entry I wrote in February.

Many will know of the IT outage at the British Airways data centre near Heathrow airport, which the press has blamed on a power failure probably linked to human error (please, no jokes about the location being in the flight path!).

Of course, it would be good to have more clarity on the exact causes of the incident. However, it certainly looks as though either there was a single point of failure in the power delivery to IT, or someone negligently bypassed existing equipment during a maintenance activity.

I noticed that senior IAG and BA executives were expressing confidence in their colleagues' IT strategy and recovery efforts. This is generally good practice in crisis management, but it should not precede a clear explanation of the situation. It will also not resonate well with those who suffered hours, if not days, of anxious delay, lost belongings and, often, the complete cancellation of their holidays at huge cost. Separately, I have yet to see any article commenting on the annual leave that many of those affected have lost.

There is much that boards can do to upskill, but they face a massive problem in the lack of truly knowledgeable specialists who speak, and can translate between, three key languages: boardroom/executive{ese}, operational risk{ish} and practical/deep IT{an}.

This problem, at least in the UK, stems historically from the abject lack of good electronics and IT engineering training, supported by government and business, from the mid-1980s until recently. The result is a massive skills gap, with much of the true knowledge found in the over-50s or the under-25s. That is a real problem for the many organisations whose strategy is to divest themselves of an expensive over-50s workforce and who rarely regard under-25s as suitable to advise on executive decisions.

What exacerbates the situation is that, quite understandably, the 25-year IT skills vacuum (the 'challenged 25') has been filled. Yes, there are some in the 25 to 50 age group who know their onions, but they are relatively few. Most have risen up, often through a series of sideways moves into project management or planning roles, learning the jargon as they go. Yet they have little foundational understanding, insight or appreciation of how their organisations need to leverage and maintain technology and manage cyber risk.

Perhaps Bill Francis, Head of IT at IAG, is indeed ahead of the curve in recognising the skills gap, and it was this, rather than purely saving money, that drove the decision to outsource British Airways' IT capability. I appreciate that situations can be complex, but surely the unions are also partners in ensuring that suitable IT skills are available to the organisations their members serve? Certainly, I was lucky enough to benefit from strategic cooperation between my first employer, the government and the unions during my electrical/electronics engineering apprenticeship in the 1980s. Such collaboration seems much weaker and less widely available in the UK now than it was then, although things do at last seem to be improving.

A General Lack of IT Maturity

Some of the key risk indicators I use when assessing the IT and cyber risk maturity of an organisation of any size are the ratios of end users to IT specialists and of end users to cyber security specialists. Determining these ratios accurately is almost always complicated. For instance, a large organisation may have a large IT complement, but much of that workforce is often in the 'challenged 25', so some careful analysis is required to uncover the real knowledge and skills. IT outsourcing is another complication, so working out the actual number of adequately skilled FTEs takes some effort. However, once complete, these analyses are revealing and almost always show a low ratio of adequately skilled specialists to end users, and consequently a high risk.
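The ratio analysis described above can be sketched as follows. This is a minimal illustration, not the CYBER3 method itself. The `StaffPool` structure, the `skilled_fraction` and `dedicated_fraction` discounts and the per-1,000-users scale are all hypothetical assumptions. They are chosen only to show how nominal headcount can overstate effective capability once skills and outsourcing are taken into account.

```python
from dataclasses import dataclass


@dataclass
class StaffPool:
    """A group of IT or cyber security staff (hypothetical structure for illustration)."""
    headcount_fte: float             # nominal full-time equivalents on paper
    skilled_fraction: float          # assumed share judged adequately skilled (0.0 to 1.0)
    dedicated_fraction: float = 1.0  # assumed share of an outsourced team's time spent on this organisation


def nominal_per_1000_users(end_users: int, pools: list[StaffPool]) -> float:
    """Specialists per 1,000 end users, taking headcount at face value."""
    if end_users <= 0:
        raise ValueError("end_users must be positive")
    return 1000 * sum(p.headcount_fte for p in pools) / end_users


def effective_fte(pools: list[StaffPool]) -> float:
    """FTE after discounting for skill level and, for outsourced pools, dedication."""
    return sum(p.headcount_fte * p.skilled_fraction * p.dedicated_fraction for p in pools)


def specialists_per_1000_users(end_users: int, pools: list[StaffPool]) -> float:
    """Adequately skilled specialists per 1,000 end users; a lower value suggests higher risk."""
    if end_users <= 0:
        raise ValueError("end_users must be positive")
    return 1000 * effective_fte(pools) / end_users


# Hypothetical figures: 20,000 end users, an in-house team and an outsourced provider.
it_pools = [
    StaffPool(headcount_fte=200, skilled_fraction=0.25),                         # in-house
    StaffPool(headcount_fte=100, skilled_fraction=0.5, dedicated_fraction=0.5),  # outsourced
]
print(nominal_per_1000_users(20_000, it_pools))      # → 15.0
print(specialists_per_1000_users(20_000, it_pools))  # → 3.75
```

The same functions can be applied to cyber security staff on their own to produce the end user/cyber security specialist indicator. The discount factors are judgement calls made during the assessment, not measurable constants.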

 
A General Lack of IT Appreciation

I read a recent comment on a LinkedIn post from a gentleman for whom I have a lot of respect (and who shall remain nameless) that "IT must be our slave and not our master". To the many who are not confident with IT, this suggests a genuine hierarchy of priority. It also reinforces the tongue-in-cheek, sometimes Luddite, attitude apparent in many boardrooms that often lies behind under-funding. The reality is that IT is an extension of ourselves, as individuals and as collective organisations. IT extends our capabilities in ways that a slave/master relationship (thankfully) never can. Essentially, an organisation's IT is now a direct reflection of the organisation's true capability, and no amount of staff expertise alone can change this fact. A small organisation with good IT and cyber risk management is more capable than an organisation of any size that lacks it. Many business leaders and governments still do not appreciate this reality and consequently do not invest suitably, yet their organisations continue to suffer losses as a result. The NHS is one of many blatant examples.

For many years now I have held the depressing view that organisations big and small almost always under-invest in IT. This is another consistent cyber risk indicator in my CYBER3 assessments. When helping me determine their organisation's real investment, many executives are a bit shocked to learn that organisations consistently suffering cyber incidents have a low ratio of IT spend to revenue. This I find startling! You would not expect such a view of spending on the upkeep of physical assets, yet many consider IT a necessary evil run by staff who should be seen and not heard. Some even think they could operate their business without technology, when the stark reality is that their business is technology. This is no longer a question of naivety. It is nothing short of blind ignorance, which is still too often evident in the highest positions in society.

Many boards, and the organisations that support them (including the government and regulators), appear to have been sleepwalking into this situation. As a result, even the largest organisations now have big legacy IT problems to contend with. These are not purely related to cyber attacks by external criminals. They exist at the core of the organisations' implementation and use of IT. The problem will only get worse as layer upon layer of new technology is added (IoT adoption being the latest layer of risk) without any root-and-branch redesign. To solve it, radical and bold steps are needed to sort the wheat from the chaff in IT and cyber security departments nationwide.

Specifically on the BA Incident

Here are my observations:

Five things it is often not a good idea to do:

  1. Congratulate senior staff on IT strategy and system recovery when the investigation is likely to expose gaps in IT understanding and scenario planning soon, and with them problems in those very plans. In short, complete your investigation, then congratulate those who deserve praise.
     

  2. Fail to adequately train customer-facing staff in business continuity planning and exercises involving total systems outage.
     

  3. Maintain a relative PR blackout about a non-malicious incident. From dealing with many similar incidents, I can attest that it is often possible to be clearer about the situation and the steps to recovery without prejudicing the investigation.
     

  4. Make statements to rebut speculation (in this case about cost-cutting) while at the same time stating that an investigation is still under way.
     

  5. Amplify, through your actions, any ignorance of IT from which your senior management team may suffer. Doing so will reinforce any concerns your customers form, in both the short and the long term.

Five things it is often a really good idea to do:

  1. Ensure that systems are properly upgraded and renewed, and that systems assumed to be fault-tolerant are properly tested. This includes the review and renewal of interfaces, which are too often used as a reason not to update.
     

  2. Ensure that changes to Facilities provisioning, including power, cannot catastrophically affect critical systems. Ensure that Facilities Management, IT, Business Continuity and Risk Management all interoperate effectively. Be careful of blaming third parties for a failure that, regardless, should not have affected you so catastrophically.
     

  3. Ensure your Business Continuity Planning (BCP) includes adequate scenario modelling that covers operator/process error and complex systems inter-dependence, from the 'mains to the brains' (of the end user). Those responsible should include people with solid IT skills, not solely project managers. Be ready to pay high rates for the top-shelf IT knowledge you will need.
     

  4. Have at least one executive and one non-executive on the board with actual practical IT and cyber risk management experience; project management or IT procurement/planning experience will often not be sufficient. If you cannot find this knowledge, then contract it. Part-time, high-quality knowledge will always trump second-rate permanent staff.
     

  5. Task the board with adopting a much better-informed strategy, driven from a Business Information (the cart) perspective, in which the latest IT (the horse) is used to pull it. In the information revolution, it simply doesn't make good business sense to operate sub-optimally for the sake of saving significantly less than you could lose when your IT fails. Ensure your insurance cover is also optimally aligned to digital risk.
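The scenario modelling in recommendation 3 covers complex systems inter-dependence 'from the mains to the brains'. It can be sketched as a small dependency model that looks for single points of failure, one of the two possible causes raised above for the power problem. This is a minimal illustration with explicit assumptions. The component names and topology are hypothetical, not BA's actual set-up. Each component is assumed to need at least one working supplier from every one of its dependency groups. The sketch tests only one failure at a time; real BCP scenario work would also cover combined failures and operator/process error.

```python
# Hypothetical model: each component lists its dependency groups. A component
# works only if, for EVERY group, at least ONE supplier in that group works
# (AND across groups, OR within a group, i.e. redundancy).
Deps = dict[str, list[set[str]]]


def operational(deps: Deps, failed: set[str]) -> set[str]:
    """Return the components that end up working when `failed` are taken down."""
    up: set[str] = set()
    changed = True
    while changed:  # repeat until no further component can come up
        changed = False
        for comp, groups in deps.items():
            if comp in up or comp in failed:
                continue
            # all([]) is True, so components with no dependencies act as sources;
            # components caught in a dependency cycle never come up in this model.
            if all(group & up for group in groups):
                up.add(comp)
                changed = True
    return up


def single_points_of_failure(deps: Deps, critical: str) -> list[str]:
    """Components whose failure ON ITS OWN takes the critical service down."""
    if critical not in operational(deps, set()):
        raise ValueError(f"{critical} is not operational even with nothing failed")
    return sorted(
        comp for comp in deps
        if comp != critical and critical not in operational(deps, {comp})
    )


# Illustrative topology only (not BA's real architecture): dual mains feeds,
# but a single UPS, power bus and core network between them and the servers.
model: Deps = {
    "mains_feed_a": [],
    "mains_feed_b": [],
    "ups": [{"mains_feed_a", "mains_feed_b"}],
    "power_bus": [{"ups"}],
    "core_network": [{"power_bus"}],
    "server_1": [{"power_bus"}, {"core_network"}],
    "server_2": [{"power_bus"}, {"core_network"}],
    "check_in_app": [{"server_1", "server_2"}],
}

print(single_points_of_failure(model, "check_in_app"))
# → ['core_network', 'power_bus', 'ups']
```

You can add a second UPS and power bus to the model and re-run the check. This is a quick way to see whether a proposed change actually removes a single point of failure before relying on it being fault-tolerant.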

As always, I hope these comments help and inform. They are just a view :-)
