I had the opportunity to spend time at Dell HQ this week getting a deep-dive look at some of their initiatives. One discussion revolved around overall IT systems availability, which is always an important consideration. It was also a good reminder that many IT shops still have a long way to go before they can maximize their availability potential. With that in mind, here are five things CIOs should do to, among other things, improve that availability metric.
Automate to reduce human error
While this may come as a shock to some, people can and will make mistakes. The more often people have to touch systems manually, the more likely it is that they will make mistakes. When mistakes are made in IT, they can hurt the business by bringing down key systems.
For those that have yet to introduce comprehensive automation to IT, it’s time to practice what you preach! There are technologies and services that can help IT automate routine tasks, which reduces errors and, as a result, improves availability. Examples include identity management systems and data center orchestration tools.
A nice side benefit is that, with the right solution, you can also free staff from manual processes and redirect them to higher-value activities.
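To make the point concrete, here is a minimal sketch of the desired-state style that many automation and identity management tools follow: compare what should exist with what does exist, then act only on the differences. It is a toy model, not any product's API; the `plan` and `apply` functions, the `desired` mapping (standing in for something like an HR feed) and the `directory` dict (standing in for a real directory service) are all hypothetical. Because the same inputs always produce the same actions, and a second run produces none, there is no step for a person to forget or fat-finger.

```python
# Toy desired-state reconciliation. All names are hypothetical; a real
# identity management system would read from an HR feed and a directory.
Action = tuple[str, str, str]  # (operation, user, group-or-empty)


def plan(desired: dict[str, set[str]], actual: dict[str, set[str]]) -> list[Action]:
    """Return the actions needed to make `actual` match `desired`."""
    actions: list[Action] = []
    for user in sorted(desired):
        current = actual.get(user)
        if current is None:
            actions.append(("create", user, ""))
            current = set()
        for group in sorted(desired[user] - current):
            actions.append(("add_group", user, group))
        for group in sorted(current - desired[user]):
            actions.append(("remove_group", user, group))
    # Accounts that exist but are no longer wanted (e.g. departed staff).
    for user in sorted(actual.keys() - desired.keys()):
        actions.append(("deprovision", user, ""))
    return actions


def apply(actions: list[Action], directory: dict[str, set[str]]) -> None:
    """Apply actions to the toy directory (deprovisioned users are dropped)."""
    for op, user, group in actions:
        if op == "create":
            directory[user] = set()
        elif op == "add_group":
            directory[user].add(group)
        elif op == "remove_group":
            directory[user].discard(group)
        elif op == "deprovision":
            del directory[user]


desired = {"alice": {"finance"}, "bob": {"it", "vpn"}}
directory = {"bob": {"it"}, "carol": {"sales"}}

actions = plan(desired, directory)
for action in actions:
    print(action)
apply(actions, directory)
print(plan(desired, directory))  # a second run finds nothing left to do: []
```

The design choice worth noting is the separation between planning and applying: the plan can be reviewed or logged before anything changes, which also fits neatly with a change management process.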
Implement change management processes
Change can be a good thing, but when it goes unmanaged, it can wreak havoc in the data center and disrupt other activities taking place across the organization. This is why a well-considered change management process is critical. The process forces additional planning and communication to take place ahead of each change. The planning ensures that all aspects of the change have been considered, and the communication ensures that everyone knows the change is coming. Together, they provide an opportunity to confirm that a change will not interfere with an important organizational activity.
Implement automatic workload migration tools
Just as humans make mistakes every so often, hardware fails every so often, too. When it does, availability suffers immediately. In traditional shops, it might take some time for a service to be brought back up: first, IT has to notice that something is down, and then someone has to manually put that service back into operation. With features such as VMware vSphere High Availability (HA), which restarts virtual machines from a failed host on the surviving hosts in the cluster, and Distributed Resource Scheduler (DRS), which balances workloads across those hosts, workloads can be automatically shifted around these kinds of failures. There may still be a few minutes of downtime while workloads restart, but the impact is much smaller than it would be otherwise.
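The sketch below illustrates the basic idea behind automatic restart after a host failure: find the workloads that were running on the failed host and start them on healthy hosts that have room. It is a toy simulation, not how vSphere HA or DRS is implemented; `Host`, `fail_over`, the capacity units and the host names are hypothetical, and the placement rule (largest workloads first, each on the healthy host with the most free capacity) is just one simple heuristic.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: int                      # hypothetical capacity units, e.g. vCPUs
    healthy: bool = True
    workloads: dict[str, int] = field(default_factory=dict)  # name -> size

    def free(self) -> int:
        return self.capacity - sum(self.workloads.values())


def fail_over(hosts: list[Host]) -> list[str]:
    """Restart workloads from failed hosts on healthy hosts that have room.
    Returns the names of workloads that could not be placed anywhere."""
    healthy = [h for h in hosts if h.healthy]
    unplaced: list[str] = []
    for host in hosts:
        if host.healthy:
            continue
        # Place the largest workloads first, while the most capacity is free.
        for name, size in sorted(host.workloads.items(), key=lambda kv: -kv[1]):
            candidates = [h for h in healthy if h.free() >= size]
            if candidates:
                target = max(candidates, key=Host.free)  # most free capacity
                target.workloads[name] = size
            else:
                unplaced.append(name)  # stays down until capacity is found
        host.workloads.clear()  # nothing is running on the failed host any more
    return unplaced


hosts = [
    Host("host-a", 16, workloads={"erp": 8, "mail": 4}),
    Host("host-b", 16, workloads={"web": 4}),
    Host("host-c", 16, workloads={"crm": 6}),
]
hosts[0].healthy = False  # simulate a hardware failure on host-a
print(fail_over(hosts))   # []
for h in hosts:
    print(h.name, h.workloads)
```

The returned list of unplaced workloads matters in practice: if the cluster lacks spare capacity, automatic restart cannot help, which is why clusters are typically sized with headroom for at least one host failure.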
Review and revise backup and recovery systems and policies
What’s your current recovery time objective (RTO)? What’s your current recovery point objective (RPO)? Are you running backup and recovery software that best matches the organization’s recovery needs? If you’re still relying on traditional nightly backups, can you afford to lose a complete day’s worth of work? Do you need to start thinking about continuous data protection (CDP) tools to reduce downtime and data loss?
CIOs should revisit these questions on a regular cycle as they review the rest of the environment to determine whether it continues to meet organizational needs.
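A quick way to sanity-check a backup design against those questions is some back-of-the-envelope arithmetic. The sketch below assumes the worst case for RPO is a failure just before the next recovery point is captured, so the data at risk equals the interval between recovery points. It treats restore time as data size divided by sustained restore throughput (using 1 GB = 1,000 MB), which is only a lower bound on RTO because it ignores detection, decision-making and application start-up. The function names and the sample figures (2,000 GB, 500 MB/s, a 4-hour RPO and a 2-hour RTO) are hypothetical.

```python
from datetime import timedelta


def estimated_restore_time(data_gb: float, restore_mb_per_s: float) -> timedelta:
    """Naive lower bound on RTO: time to copy the data back at a sustained rate.
    Ignores failure detection, decision-making and application start-up."""
    return timedelta(seconds=data_gb * 1000 / restore_mb_per_s)


def meets_objectives(recovery_point_interval: timedelta, data_gb: float,
                     restore_mb_per_s: float, rpo: timedelta, rto: timedelta) -> bool:
    # Worst case for RPO: failure just before the next recovery point is taken,
    # so everything written since the previous one is lost.
    worst_case_loss = recovery_point_interval
    return (worst_case_loss <= rpo
            and estimated_restore_time(data_gb, restore_mb_per_s) <= rto)


rpo, rto = timedelta(hours=4), timedelta(hours=2)
for label, interval in [("nightly backup", timedelta(hours=24)),
                        ("hourly snapshots", timedelta(hours=1))]:
    ok = meets_objectives(interval, data_gb=2000, restore_mb_per_s=500, rpo=rpo, rto=rto)
    print(f"{label}: worst-case loss {interval}, meets objectives: {ok}")
```

With these assumed figures, the restore itself takes 4,000 seconds (about 67 minutes), so the nightly schedule fails on RPO rather than RTO, while hourly snapshots satisfy both.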
Implement comprehensive monitoring
Items in your environment will fail. It’s gonna happen. It’s how you react to those failures that makes the difference. If you have automated workload failover in place, that’s great, but the failed instance still needs to be returned to service at some point. You might be able to use workload orchestration technologies to automatically restart the original workload when a failure is detected, but first you have to actually detect that a failure occurred.
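A common detection pattern, sketched here as a minimal example, is to probe a service periodically and declare it down only after several consecutive failed probes, so a single dropped connection doesn't trigger an unnecessary restart. The `FailureDetector` class, the `tcp_probe` helper and the threshold of three are hypothetical choices for illustration; a real monitoring or orchestration product would supply its own probes, scheduling and alerting.

```python
import socket
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> Callable[[], bool]:
    """Build a probe that succeeds if a TCP connection can be opened."""
    def probe() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:  # refused, unreachable, or timed out
            return False
    return probe


class FailureDetector:
    """Declare a service failed only after `threshold` consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], threshold: int = 3) -> None:
        self.probe = probe
        self.threshold = threshold
        self.consecutive_failures = 0

    def poll(self) -> bool:
        """Run one probe; return True once the service should be treated as down."""
        if self.probe():
            self.consecutive_failures = 0  # any success resets the count
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


# Scripted probe results stand in for a real service: one blip, then an outage.
results = iter([True, False, True, False, False, False])
detector = FailureDetector(lambda: next(results), threshold=3)
print([detector.poll() for _ in range(6)])  # [False, False, False, False, False, True]
```

Against a real service, the probe would be built with something like `tcp_probe("app01.example.internal", 443)` (a placeholder host name), and a `True` from `poll()` would be the signal handed to whatever restarts the workload.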
Even better is knowing when a failure is about to occur, since many problems are perfectly preventable. For example, when a server runs out of disk space, the workloads on it stop working. These kinds of issues are easy to prevent with even the most basic monitoring solution, yet too many shops run without one, depending instead on administrators to notice when issues are about to arise.
Every IT shop should have basic monitoring in place and, where possible, comprehensive monitoring that reaches down to the application level, down to the disk level, and as deep as is reasonable to keep systems running.
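As an illustration of how little it takes to catch the disk-space case, here is a minimal threshold check using the Python standard library's `shutil.disk_usage`. The 80% warning and 90% critical thresholds and the `classify`/`check_path` names are assumptions for the example; in practice a monitoring tool would run a check like this on a schedule, across every volume, and route the result to an alerting system.

```python
import shutil


def classify(used_bytes: int, total_bytes: int,
             warn_pct: float = 80.0, crit_pct: float = 90.0) -> str:
    """Map disk utilization to an alert level (thresholds are illustrative)."""
    used_pct = 100.0 * used_bytes / total_bytes
    if used_pct >= crit_pct:
        return "CRITICAL"
    if used_pct >= warn_pct:
        return "WARNING"
    return "OK"


def check_path(path: str = ".") -> str:
    """Check the file system that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify(usage.used, usage.total)


print(check_path("."))
```

Alerting at a warning level well below 100% is the point: it leaves time to clean up or expand storage before workloads stop.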
Summary
As IT continues to grow in importance, CIOs must remain vigilant about maintaining, and even increasing, availability levels. After all, if a CIO can’t demonstrate the ability to “keep the lights on,” how can they be taken seriously as a business equal?