Contributing Wikibon Analysts
David Floyer
Peter Burris
Ralph Finos
Premise
Architecting data centers to satisfy the growing demand for high performance is challenging, primarily because existing chip technology is running out of gas. IT professionals will have to adopt new approaches to meet these demanding requirements.
Key System Trends and Challenges
System performance improvements are sustained by advances along four dimensions:
- Chip technology (e.g., process geometries and clock speeds);
- Chip architecture (e.g., CISC vs. RISC);
- Inter- and intra-system communication (e.g., memory bus speed and bandwidth, PCIe Gen 3, RoCE (RDMA over Converged Ethernet), NVMe, and NVMe-oF (NVMe over Fabrics));
- System organization (e.g., greater use of parallelism).
For much of the computing industry's 50-plus-year history, the industry largely relied on Moore's Law, which predicted a doubling of performance from chip technology alone roughly every 18 months. However, this technology contribution has slowed to a crawl. The fastest processor clocks in at about 5GHz (the IBM Z mainframe), and the latest Intel Skylake processors run in the 2.5-4.0GHz range. Total compute density has continued to rise by packing more parallel cores into a processor (up to 24), but at the cost of lower clock speeds. Moreover, Amdahl's law places a natural limit on the number of cores that can usefully be applied to a single problem (see the formula after the list below), forcing hardware vendors to seek performance improvements elsewhere. While architectural advances are on the horizon (e.g., GPUs and FPGAs for offloads), most performance improvements are catalyzed by advances in system organization, including:
- Server SAN to exploit hyper-convergence trends. First, storage was connected to specific servers and could only be accessed by requesting data from those servers. Then, storage was networked, granting data access to any server administered on that network. Server SAN provides the best of both worlds: local performance speeds and network-style access. It’s the basis for future storage advances in hyperconverged infrastructure.
- Flash storage to improve storage productivity. Flash storage devices use semiconductor memories to:
  - Reduce latencies relative to disk;
  - Increase IOPS and bandwidth;
  - Increase compute density, with lower power and 15TB-capacity drives;
  - Facilitate copying of live data for multiple applications, including dev/test;
  - Remove the data-access bottleneck in application design with new protocols (NVMe over PCIe, and NVMe-oF over network fabrics), which increase by orders of magnitude the amount of data that applications can process.
- I/O parallelization to accelerate storage subsystem performance. For most of the industry’s history, disk performance doubled at between one-third and one-half the rate of semiconductor performance. Why? Because disks are electromechanical — they spin. As a result, the performance improvements that could be obtained through greater use of I/O parallelism were modest. However, flash drives change that equation, catalyzing important new developments in I/O parallelism.
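Amdahl's law, cited above, is worth stating precisely, since it explains why simply piling on cores cannot substitute for a leaner I/O path. With p the fraction of the work that can be parallelized and N the number of cores, the achievable speedup is bounded:

```latex
% Amdahl's law: speedup S(N) on N cores when a fraction p of the work
% parallelizes; the serial remainder (1 - p) caps the gain.
S(N) = \frac{1}{(1 - p) + p/N},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

For example, if 95% of a workload parallelizes (an assumed figure), no number of cores can deliver more than a 20x speedup; shrinking the serial fraction, much of which is I/O wait, is the only way past that ceiling.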
Server SAN To Exploit Hyper-convergence Trends
Server SAN has emerged as a key leg of a compelling solution to these performance slowdowns. Server SAN storage sits closer to the processor, reducing the time data spends traversing I/O interconnects. As a result, Server SAN is rapidly replacing traditional array storage, a trend that will accelerate through the end of the decade. Hyperscale storage, deployed largely by cloud and other service providers that do not use traditional arrays, is also growing rapidly.
Wikibon estimates the total worldwide enterprise Server SAN storage market will reach roughly $37 billion in 2026, representing a large majority of all storage spending and fundamentally replacing traditional disk arrays by the mid-2020s.
Flash Storage To Improve Storage Productivity
Adopting a flash-only strategy for active data will enable Server SAN solutions to run even faster. Flash devices are much faster than traditional disk systems across all important metrics, including both read and write access times. How much faster? Disk latency is measured in milliseconds; flash latency is typically measured in microseconds. As a result, I/O times for flash devices are far lower than for disk, other things being equal, and lower I/O times improve CPU utilization. I/O wait time is the bane of the DBA's and storage administrator's life, and fast flash dramatically improves the utilization of both servers and people.
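To make the milliseconds-versus-microseconds gap concrete, here is a minimal back-of-the-envelope sketch; the latency figures are illustrative assumptions, not measurements from any specific device:

```python
# Back-of-the-envelope: serial I/Os per second at a given device latency.
# Latency figures below are illustrative assumptions, not measured values.
disk_latency_s = 5e-3     # ~5 ms: seek + rotation on a typical spinning disk
flash_latency_s = 100e-6  # ~100 microseconds: a typical flash read

for name, lat in [("disk", disk_latency_s), ("flash", flash_latency_s)]:
    print(f"{name}: {1 / lat:,.0f} serial IOPS ({lat * 1e3:.2f} ms per I/O)")

# disk: 200 serial IOPS (5.00 ms per I/O)
# flash: 10,000 serial IOPS (0.10 ms per I/O)
# A 50x gap before any parallelism is applied; every one of those saved
# milliseconds is CPU wait time returned to the application.
```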
For databases, I/O response time is especially critical, and so is its predictability. I/O jitter, the variability inherent in disk-based solutions, is as important a factor in throughput as the average response time: even a few very long I/O delays can have a crippling effect on database throughput.
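A minimal sketch shows why a small fraction of slow I/Os matters so much; the 99%/1% latency mix below is an assumed, disk-like distribution chosen purely for illustration:

```python
# Sketch: rare slow I/Os dominate the mean and throttle serial throughput.
# The 99%/1% latency mix below is an assumption chosen for illustration.
import random

random.seed(0)
# 99% of I/Os complete in 0.2 ms; 1% are 20 ms stragglers (disk-like jitter).
latencies = [0.0002 if random.random() < 0.99 else 0.020
             for _ in range(100_000)]

mean = sum(latencies) / len(latencies)
print(f"mean latency: {mean * 1e3:.2f} ms -> {1 / mean:,.0f} serial IOPS")

# The 1% of 20 ms outliers roughly doubles the mean (0.2 ms -> ~0.4 ms),
# halving serial throughput even though 99% of I/Os are fast. Flash, with
# low jitter, avoids this tail.
```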
The savings extend to licenses, CPU time, and, most expensive of all, people time: flash frees the DBAs and systems administrators otherwise required to constantly monitor and tune I/O to keep databases performing. From a total-cost-of-ownership perspective, it is easy to justify flash as best of breed for active data.
Another benefit of flash is that live data can be snapshotted and used immediately by other applications, which is impractical with disk. For example, by snapshotting the production database, 30 developers can each start work straightaway on a full, current copy, rather than waiting weeks for copies of the copy to reach all interested parties.
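Under the hood, such snapshots typically work by copy-on-write: the snapshot shares blocks with the original, and only blocks that subsequently change consume new space. A toy sketch of the idea follows; it is a conceptual illustration only, not any vendor's implementation:

```python
# Toy copy-on-write snapshot: conceptual only, not a real storage stack.
class Volume:
    def __init__(self, blocks):
        self.blocks = blocks              # block number -> data

    def snapshot(self):
        snap = Volume(self.blocks)        # share the block map: instant, O(1)
        self.blocks = dict(self.blocks)   # writer diverges onto its own map
        return snap

    def write(self, blockno, data):
        self.blocks[blockno] = data       # only changed blocks use new space

prod = Volume({0: "jan", 1: "feb"})
dev = prod.snapshot()                     # a developer's instant "full copy"
prod.write(1, "mar")                      # production keeps changing
print(dev.blocks[1], prod.blocks[1])      # feb mar: snapshot stays frozen
```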
I/O Parallelism To Accelerate Storage Subsystem Performance
Traditional storage arrays have attacked the parallelization problem by dedicating a large number of processes to it; a typical array may run 16 separate processes to parallelize data. Extra cores and extra processing power are needed to make this style of parallelization work.
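To see why parallel I/O pays off on flash at all, consider a generic sketch: issuing many reads concurrently keeps a device's internal parallelism busy instead of leaving it idle at queue depth 1. This illustrates the general technique, not DataCore's implementation; the file path and sizes are assumptions:

```python
# Generic I/O parallelism sketch: serial vs. concurrent 4 KiB reads.
# /tmp/testfile is a hypothetical file; on a cached file the OS page cache
# will mask device behavior, so use O_DIRECT or a raw device for real tests.
import os
import time
from concurrent.futures import ThreadPoolExecutor

PATH = "/tmp/testfile"   # hypothetical; must be at least BLOCK * COUNT bytes
BLOCK = 4096             # 4 KiB, a common database page size
COUNT = 4096             # number of reads to issue

fd = os.open(PATH, os.O_RDONLY)
offsets = [i * BLOCK for i in range(COUNT)]

# Serial: one outstanding I/O at a time (queue depth 1).
t0 = time.perf_counter()
for off in offsets:
    os.pread(fd, BLOCK, off)
serial = time.perf_counter() - t0

# Parallel: 32 outstanding I/Os, analogous to a deep NVMe queue.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(lambda off: os.pread(fd, BLOCK, off), offsets))
parallel = time.perf_counter() - t0

os.close(fd)
print(f"serial {serial:.3f}s  parallel {parallel:.3f}s  "
      f"speedup {serial / parallel:.1f}x")
```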
However, there is a significant additional performance benefit to be had by deploying parallel I/O technology that reduces overhead within the Server SAN architecture itself. DataCore's Server SAN offering reduces the number of cores required dramatically through its parallel I/O technology, which completely rewrites the traditional I/O path. The performance benefits are clearly shown in the SPC-1 results in Table 1 below. This benchmark compares the DataCore Parallel Server and DataCore SANsymphony against the highest-performing storage arrays. The SPC-1 IOPS data shows an enormous increase in throughput with DataCore: price/performance is a factor of eight or nine better, and less storage is actually required.
One of the most important figures for a system designer is I/O response time. The data shows 0.28 milliseconds for the DataCore Parallel Server and 0.22 milliseconds for DataCore SANsymphony, around four times better than the other systems. This matters because the faster the I/O, the less the processor waits for it (CPU wait time), the greater the throughput through the processor, and the lower the parallelization overhead becomes. Working in concert with this, DataCore's parallel I/O architecture achieves much lower I/O response times by using all of the many cores in today's Intel processors and by significantly reducing the path length of each I/O. This is a truly virtuous circle: less storage is required (3-6x more I/O throughput per drive), far fewer processors are needed (4-8x fewer), and the cost per I/O falls to about 12% of that of traditional high-performance storage arrays.
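A rough way to see why response time drives everything else is Little's Law: sustained IOPS equal the number of outstanding I/Os divided by the response time. A sketch using the response times above follows; the queue depth of 64 and the 1.0 ms comparison figure are assumptions:

```python
# Little's Law: IOPS = outstanding I/Os / response time.
# 0.28 ms is the DataCore Parallel Server SPC-1 figure cited above;
# 1.0 ms stands in for a slower array, and queue depth 64 is assumed.
QUEUE_DEPTH = 64

for name, resp_ms in [("parallel I/O, 0.28 ms", 0.28), ("array, ~1.0 ms", 1.0)]:
    iops = QUEUE_DEPTH / (resp_ms / 1000)
    print(f"{name}: {iops:,.0f} IOPS at queue depth {QUEUE_DEPTH}")

# parallel I/O, 0.28 ms: 228,571 IOPS
# array, ~1.0 ms: 64,000 IOPS
# Cutting response time ~4x yields ~4x the IOPS for the same concurrency,
# which is why lower latency compounds into less hardware per unit of work.
```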
Every benchmark, of course, has its idiosyncrasies, and every workload is different; care is needed to ensure that the test workload is a reasonable approximation of the target workload. But the data does show that the I/O overhead of the DataCore systems is very likely to be significantly lower than that of other leading storage systems, and it shows how I/O architecture is key to improving system performance.
Action Item
New approaches to advancing overall system performance must be studied and applied whenever possible. Server SAN, which moves storage closer to processors without giving up shared data access, is one advance. Deploying flash-based storage devices is another, especially in situations that demand either predictable database performance or large numbers of live copies. Finally, new technologies for increasing I/O parallelism, such as DataCore's Parallel I/O, can amplify the benefits of Server SAN and flash storage.