Premise. Resilient data systems are essential to reducing digital business risk. These systems are engineered to ensure that data that must be live is live. Analytic systems, the core of data-first organizations, must also be made resilient. In a multicloud world, data protection and true private cloud architectures are the foundation for data-resilient multiclouds.
Executive Summary
Resilience is risk mitigation that is engineered into your IT infrastructure. It’s the confidence that your infrastructure won’t fail you, especially in times of crisis. Banking and financial services firms, by the nature of their business, must incorporate comprehensive risk mitigation into every layer of their infrastructure to remain viable.
Using IT resilience to reduce risk is intuitive and not necessarily a new theme, even as applied to the cloud. However, technology on a rapid evolutionary trajectory forces out traditional or legacy approaches and inspires the development of new, more efficient replacements.
Implementing a comprehensive data resilience strategy requires an enterprise data environment built from the following principles:
- Ensure that the benefits of cloud computing are always available as organizations migrate to flexible hybrid and multi-cloud IT architectures;
- Guarantee that all data is always live, replicated, protected, complete, consistent, and available for all analytical, transactional, and other uses, subject to applicable security, policy, and compliance constraints;
- Automate 24×7 IT management; and
- Always provide a “single throat-to-choke” point of responsibility for end-to-end service levels.
True Private Cloud as an Enterprise Resilience Architecture
Resilience is the confidence that your IT infrastructure won’t fail you, especially in times of crisis. Wikibon refers to the target enterprise architecture for comprehensive resilience as a True Private Cloud.
This refers to an IT infrastructure that addresses the following mission-critical requirements:
- Responsibility: It provides a single “throat to choke” point of responsibility for managing the entire system end-to-end.
- Flexibility: It supports hybrid private-public cloud and flexible multi-cloud architectures without compromising the service levels that enterprise users would enjoy in fully on-premises environments.
- Interoperability: It ensures seamless integration of all resources and platforms from end to end across all functional service layers.
- Accessibility: It provides self-service access to elastic, scalable, robust pools of compute, storage, and network resources.
- Completeness: It provides simple, unified access to all data resources, from structured to unstructured, across all sources and platforms, both historical and real-time streaming.
- Consumability: It allows users to consume resources on a long-term commitment or on a pay-as-you-go basis.
- Robustness: It ensures robust disaster recovery, high availability, reliability, predictability, security, agility, and trustworthiness.
- Automation: It automates disaster recovery, system monitoring, event logging, capacity optimization, backup and recovery, predictive maintenance, load balancing, anomaly detection, root cause analysis, alerting and escalation, closed-loop issue remediation, and other operational functions to meet baseline 24×7 service levels.
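The automation requirement above can be sketched as a minimal monitoring-and-remediation loop. This is an illustrative stub, not a production design: the node records, health probe, and remediation actions are hypothetical stand-ins for real monitoring APIs and orchestration tooling.

```python
# Hypothetical health states for illustration; a real system would query
# monitoring APIs rather than read a status field from a dict.
HEALTHY, DEGRADED, FAILED = "healthy", "degraded", "failed"

def check_node(node):
    """Stub health probe; a real probe would call a monitoring service."""
    return node.get("status", HEALTHY)

def remediate(node):
    """Closed-loop remediation: restart if possible, else fail over."""
    if node.get("restartable", True):
        node["status"] = HEALTHY          # simulate a successful restart
        return f"restarted {node['name']}"
    node["status"] = FAILED
    return f"failed over from {node['name']} to standby"

def monitor(nodes):
    """One monitoring pass: detect, remediate, and report actions taken."""
    actions = []
    for node in nodes:
        if check_node(node) != HEALTHY:
            actions.append(remediate(node))   # alert plus closed-loop fix
    return actions
```

In a real deployment this loop would run continuously, feed an alerting and escalation pipeline, and log every action for root cause analysis, as the Automation requirement describes.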
Resilience is Critical to all Industries
True Private Clouds are fundamental to robust IT infrastructure in every industry.
Resilience is risk mitigation that is engineered into all your IT assets. It’s the confidence that your infrastructure won’t ever fail you, especially in times of crisis. Resilience that’s baked into the True Private Cloud ensures that businesses can weather the once-in-a-lifetime “black swan,” “perfect storm,” and other disruption scenarios that can put them and their stakeholders out of business permanently.
To evolve toward this resilience architecture, enterprises must take steps to ensure that their migration of data, analytics, and other IT infrastructure to cloud environments is comprehensively resilient. The path to the True Private Cloud requires a keen focus on building unshakeable resilience into distributed data assets.
Management information systems and enterprise data warehouse systems have historically been viewed as “second-class citizens” among IT infrastructure platforms, and occasional failures of these systems were tolerated. However, those platforms now act as the backbone data source for business-critical operational processes, such as risk monitoring and measurement, without which banks cannot operate for even a moment. It is no longer acceptable for organizations to suffer downtime or lose 24×7 access to transactional systems and business applications.
The key resilience challenges facing enterprises include:
- Business resilience: This involves achieving robust profitability via sustainable cost reduction and control throughout business technology systems. It also involves having the agility to accelerate the data-driven process transformations necessary to compete in a world where consumer and corporate customers demand seamlessly digital cross-channel self-service capabilities.
- Experience resilience: This relies on robust performance of data analytics and other business technology platforms in Web, mobile, and other customer-facing digital engagement channels. An IT infrastructure that can deliver consistently excellent customer experience is absolutely essential to avoid the competitive damage that outages can cause to the institution’s reputation.
- Compliance resilience: Compliance in financial and other regulated institutions, requires robust reliability, timeliness, and accuracy of key IT and business operational reporting to regulators. A data infrastructure with assured high availability can help banking and financial services firms to avoid fines and protect their reputation in today’s increasingly demanding compliance landscape.
- Cloud resilience: This depends on robust movement of data analytics and other IT workloads to multiclouds, mitigating against cloud-provider lock-in. The key to cloud resilience is being able to use the right cloud for the right job and to ensure that data is protected everywhere it’s stored and processed.
- Application resilience: This comes from meeting service-level agreements consistently across all applications – which depends intimately on iron-clad assurance of data availability, consistency, and recoverability throughout end-to-end IT infrastructure.
How Poorly Engineered Data Analytics Infrastructures Can Impact Resilience
In the real world, poorly engineered data infrastructure often impedes true multi-level resilience. The checklist in Table 1 spells out the symptoms.
| SYMPTOMS | DISCUSSION |
| --- | --- |
| Increasing operational overhead | Business bottom lines suffer when a poorly engineered data analytics architecture introduces operational risks and unprotected systems. |
| Declining customer satisfaction | Customer experience suffers when businesses cannot respond to inquiries with personalized service because data is lost or unavailable. |
| Longer compliance latencies | Compliance plummets and fines grow when regulated institutions are unsure of their ability to deliver required report formats to regulators in real-time, low-latency, and other time-sensitive scenarios. |
| Excessive unplanned downtime | Cloud availability deteriorates when glitches lead to extended downtime, e.g., when transactional systems crash without a ready hot-failover system to bring online. Application service levels take a hit when the time needed to bring primary business systems back to full usability after an outage or other unplanned downtime continues to lengthen. |

Table 1: Symptoms of Inadequate Data Resilience
Failure to keep business data always available, live, replicated, protected, and consistent can have dire business consequences for enterprises. Here are two examples of data-related resilience issues faced by typical banking and financial services companies:
- Business and compliance resilience issues: Cost reduction requires banks to keep capital buffers at their lowest permissible levels. However, banks that fail to manage transaction-level data effectively may not be able to track capital requirements closely enough to ensure this, forcing them to boost their buffers and hence their overhead. Low margins are a risk factor that may put the bank’s solvency and survivability in jeopardy. One European bank faced this challenge. For this €500B bank, adding capital buffers to its balance sheet cost around €50M per basis point. It had spent around €500M on this challenge since 2001, and yet the data infrastructure in place—based on Teradata EDW, IBM Z, IBM DataStage, secure FTP, and manual operations—was failing to keep up with the pace of change. If it could enable transaction-level risk analysis, it could potentially reduce capital buffers by €1B. It was also confronting difficulties in rolling up consistent, detailed transactional data across all functions—including risk, compliance, and finance—as defined in BCBS 239. Because its data had no single transactional source and had been inconsistently aggregated across disparate sources, the bank needed to add a “protective” capital buffer to its balance sheet to accommodate inaccuracies in what it must report to regulatory authorities.
- Experience, application, and cloud resilience issues: A balky enterprise data management environment may deprive banks of the agility to boost customer experience in their mobile, Web, and other digital channels. For example, a UK-based bank launched a customer-facing mobile app running on a Hadoop platform in its private cloud. The bank’s data innovation team brought the app to market fast to gain a competitive advantage over rivals who were introducing the same capabilities into their customer-facing systems. The app’s code embeds disaster-recovery capabilities. The desired outcome is to automate both low-latency recovery and application failover, cut the eight-hour disaster-recovery window to near zero, and raise the app’s availability service-level agreement to the top tier by moving to a private-cloud Hadoop platform. However, the app currently draws data directly from the bank’s core mainframe systems, and disaster recovery is provided through cold backup of the mainframe data, taking that private-cloud-based system offline for eight hours. The bank is thus caught in a Catch-22 in its attempt to migrate away from the mainframe and move all transaction and balance inquiries to its newer Hadoop platform: any resilience issues in the underlying cloud infrastructure will impair its ability to migrate data to the new Hadoop platform without compromising availability, and will likewise degrade the customer’s experience. Meanwhile, maintaining the app codebase is proving to be a drag on development capacity, which in turn is causing the bank to miss its milestones in the migration away from its mainframe-based legacy data.
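The capital-buffer arithmetic in the first example above can be checked in a few lines. The figures are the ones stated in the case and are illustrative; the calculation simply shows what they imply.

```python
# Back-of-the-envelope check of the European bank example. Figures come
# from the case as stated; they are illustrative, not audited data.
cost_per_basis_point = 50_000_000      # ~EUR 50M of capital buffer per basis point
target_reduction = 1_000_000_000       # ~EUR 1B potential buffer reduction

# Transaction-level risk analysis would need to shave roughly this many
# basis points off the protective buffer to realize the stated saving:
basis_points_saved = target_reduction / cost_per_basis_point
print(basis_points_saved)  # 20.0
```

In other words, the stated €1B opportunity corresponds to trimming the protective buffer by about 20 basis points at €50M per basis point.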
Data Protection Is the Heart of Comprehensive Resilience
Data is the fundamental business asset managed within IT infrastructures, and it must be resilient 24×7. Regardless of the underlying infrastructure, all data should always be live, replicated, protected, consistent, and available for all analytical, transactional, and other uses – subject to appropriate security, policy, and compliance constraints.
Accordingly, an enterprise data-resilience environment will conform to the hyperscale architecture presented in Figure 1.
Figure 1: End-to-End Data Resilience Architecture (source: WANdisco)
Considering all the moving parts one finds in a modern IT landscape, resilience can be difficult to ensure. A typical bank or financial services firm might incorporate any or all of the following data platforms listed in Table 2 in their various lines of business operations.
| DATA PLATFORMS | EXAMPLES |
| --- | --- |
| Mainframe | IBM z Systems, etc. |
| Cloud storage | AWS S3, Azure Data Lake, Google Cloud Platform, IBM Cloud, Oracle Cloud Infrastructure, Alibaba, etc. |
| Big data | Hadoop, Hive, etc. |
| Data warehousing | Teradata, Oracle, etc. |
| Stream computing | Spark Streaming, Kafka, NoSQL, etc. |
| Business intelligence, reporting, and statistical analysis | Tableau, Microsoft Power BI, Qlik, SAS, SAP BusinessObjects, IBM Cognos, Oracle BI Enterprise Edition, etc. |
| Advanced analytics | Spark, R, TensorFlow, H2O.ai, IBM SPSS, SAS, etc. |
Table 2: Typical Data Platforms in a Banking or Financial Services Firm’s Environment
Banks and financial services firms are acutely aware of the escalating costs arising from data management across the expanding set of platforms. However, there is a perception among many of these firms that solutions are too hard to implement or too expensive to acquire, leaving them with suboptimal data management. Technical complexities, operational fragmentation, and competitive churn make it difficult for financial institutions to implement the unified enterprise-wide data resilience infrastructures needed to address these issues comprehensively. Frequently, personnel in a financial services institution’s various departments may rely on disparate analytics packages and myriad data sources.
This state of affairs tends to produce inconsistent, out-of-date, and error-ridden reports and other deliverables that expose the bank to significant operational and regulatory risks. By the time that banks have migrated to new, converged platforms to address the previous data management issues, regulations and reporting requirements may have changed, and the range and diversity of new data sources may have expanded. Moreover, the steady pace of business acquisitions, mergers, and divestments in which these organizations engage may leave their data environments in a state of perpetual chaos.
A True Private Cloud ensures comprehensive data resilience, with sustainable cost reduction, ongoing regulatory compliance, and the competitive advantages that derive from smart uses of business analytics. Whether you’re in the financial services arena or any other sector, building robust resilience into your IT infrastructure requires a multipronged approach that comprises:
- Business resilience rooted in global data completeness and consistency. This involves maintaining a data analytics infrastructure that provisions guaranteed, automated, and scalable access to replicated data assets across clouds. It supports arbitraging costs across different cloud vendors, avoiding lock-in to cloud providers, and persisting live data in a consistent, reconciled fashion across the multicloud.
- Experience resilience, which requires data analytics infrastructure that supports the modernization of enterprise analytics capabilities, sourcing data from mainframes through big data platforms. It should enhance digital engagement performance through real-time and advanced analytics, and it should support guaranteed, automated, and scalable data replication in place of manual disaster recovery processes.
- Compliance resilience, demanding data analytics infrastructure that ensures integrity. Hallmarks include high availability, reliability, scalability, performance, and security of business-critical reporting and analytics platforms, with guaranteed, automated, and scalable data replication and disaster recovery.
- Cloud resilience, requiring data analytics infrastructure that makes analytics applications agnostic to the cloud locality in which data assets are deployed. It should enable global access to data analytics. And, via guaranteed, automated, and scalable data replication, it ensures that data is always where it needs to be.
- Application resilience. This requires data analytics infrastructure that ensures high availability for scalable multicloud environments spanning thousands of compute and storage nodes and concurrent users, along with guaranteed, automated, and scalable data replication for near-zero recovery-time and recovery-point objectives with guaranteed consistency.
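As a rough illustration of the near-zero recovery objectives mentioned above, the sketch below treats the achieved recovery-point objective (RPO) as the replication lag between the last committed write and the last replicated one. The function names and the five-second threshold are hypothetical illustrations, not any vendor's API.

```python
from datetime import datetime, timedelta

def achieved_rpo(last_commit: datetime, last_replicated: datetime) -> timedelta:
    """Worst-case data-loss window if the primary failed right now."""
    return last_commit - last_replicated

def meets_rpo(last_commit: datetime, last_replicated: datetime,
              objective: timedelta = timedelta(seconds=5)) -> bool:
    """True when replication lag is within the recovery-point objective."""
    return achieved_rpo(last_commit, last_replicated) <= objective

now = datetime(2024, 1, 1, 12, 0, 0)
# Replica 3 seconds behind: within a 5-second objective.
print(meets_rpo(now, now - timedelta(seconds=3)))   # True
# Replica 2 minutes behind: objective breached.
print(meets_rpo(now, now - timedelta(minutes=2)))   # False
```

"Guaranteed, automated, and scalable replication" amounts to keeping this lag continuously near zero across every node, so that failover loses effectively no committed data.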
Data Resilience Takeaways
Fundamental resilience issues stem from the proliferation of independent, fragmented data sources. Many banks and financial services institutions have failed to bring together their data platforms in a single fabric that aggregates, standardizes, and reconciles all data to mandated formats. Moreover, many have not yet rolled out an end-to-end resilience architecture to ensure that all data is always live, replicated, protected, consistent, and available for all uses.
CDOs and IT professionals at banks and financial services firms striving for data resilience are encouraged to architect a True Private Cloud with the capabilities presented in Table 3 below.
| CAPABILITY | DISCUSSION |
| --- | --- |
| Comprehensive data consistency | The True Private Cloud provides a scalable, shared pool of data across the enterprise in both hybrid cloud and multicloud environments. It requires support for traditional SAN/NAS file systems, as well as Hadoop for Hive and Spark, plus public cloud environments such as AWS S3 and AWS Snowball, Microsoft Azure HDInsight and Data Box, IBM OpenStack Swift, and Oracle OCI and BDCS. It requires the extensibility to accommodate new and legacy data types. It also requires a hyperscale architecture for scale-out across on-premises and cloud environments. |
| Proactive data availability | The environment should ensure continuous synchronization and replication of data with guaranteed data availability, consistency, and integrity across all nodes, with simplified cloud migration to any supported public or private cloud without application downtime. |
| Automated data protection | The data-protection mechanisms should support automated execution of enterprise data policies for cloud data management, replication, and protection, and should ensure stringent service-level attainment to support near-zero recovery-point and recovery-time objectives for data availability. |
Table 3: Data Resilience in the True Private Cloud
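To make Table 3's "automated data protection" capability concrete, here is a minimal sketch of a declarative replication policy and a check against it. The schema, field names, and thresholds are hypothetical illustrations, not a WANdisco or cloud-vendor API.

```python
# Hypothetical declarative data-protection policy of the kind Table 3
# describes: where a dataset must stay live, and the recovery objectives.
policy = {
    "dataset": "core-transactions",
    "replicas": ["on-prem-hadoop", "aws-s3", "azure-data-lake"],
    "min_live_replicas": 2,   # data must stay live in at least 2 locations
    "rpo_seconds": 5,         # near-zero recovery-point objective
    "rto_seconds": 60,        # near-zero recovery-time objective
}

def replication_targets(policy, failed=()):
    """Locations that should currently hold a live replica; raises when
    failures leave fewer live copies than the policy permits."""
    live = [r for r in policy["replicas"] if r not in failed]
    if len(live) < policy["min_live_replicas"]:
        raise RuntimeError("policy violated: too few live replicas")
    return live
```

An automated protection layer would evaluate policies like this continuously and trigger re-replication or failover as soon as a violation is detected, rather than waiting for an operator to notice.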