GPU databases have been on the market for a few years, gaining steady though not overwhelming adoption in the high-performance computing arena.
GPU databases use the underlying GPU chipsets as parallel-processing accelerators for data analytics. Most GPU databases incorporate a master-slave architecture, in which a central CPU-based node farms out subqueries to an array of GPU-accelerated database instances, each on a separate server. The individual servers execute their subqueries in parallel and send their result sets back to the master node. The master node then combines them and sends a single result set back to the client.
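To make the scatter-gather flow described above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor's implementation: the shard data, the function names, and the use of a thread pool to stand in for separate GPU servers are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the shard held by one
# GPU-accelerated database instance. Rows are (region, amount).
SHARDS = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 8), ("north", 1)],
]


def run_subquery(shard):
    """Worker side: compute a partial SUM(amount) GROUP BY region on one shard."""
    partial = {}
    for region, amount in shard:
        partial[region] = partial.get(region, 0) + amount
    return partial


def merge_partials(partials):
    """Master side: combine the partial aggregates into one result set."""
    combined = {}
    for partial in partials:
        for region, subtotal in partial.items():
            combined[region] = combined.get(region, 0) + subtotal
    return combined


def scatter_gather(shards):
    """Farm out one subquery per shard in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(run_subquery, shards))
    return merge_partials(partials)


if __name__ == "__main__":
    print(scatter_gather(SHARDS))
```

The pattern only pays off when the merge step is cheap relative to the per-shard work, as it is here because sums can be combined from partial sums. Queries whose results cannot be assembled from independent partial results fit it less well.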
Clearly, this architecture works best when you have data analytics applications that can be easily broken down into parallelizable tasks. The rise of the GPU database market is a consequence of continuing demand across GPU technology’s three core (and highly parallelizable) use cases: artificial intelligence (AI), gaming, and cryptocurrencies.
Considering that any existing DBMS architecture could conceivably add a GPU accelerator, it’s not clear whether there’s a future for GPU databases as a stand-alone segment. For example, any PostgreSQL database can simply add the PG-Strom custom-scan provider module to leverage the parallel processing of GPU devices for sequential scans, hash-based table joins, and aggregate functions.
Nevertheless, this remains a very hot market niche. Wikibon expects the following incumbent GPU database vendors to make strong showings at this week’s GTC 2018:
- Brytlyt: This GPU-accelerated database and analytics platform can cost-effectively query multibillion-row datasets in seconds. Built on PostgreSQL, the database is smoothly scalable, supporting flexible addition and removal of GPU-acceleration nodes. It is optimized for real-time insights on large and streaming data sets and supports easy integration with existing code, analytics, and visualization systems.
- BlazingDB: This GPU-accelerated data analytics platform runs fast, simple SQL queries on massive data sets. Optimized for data warehousing workloads, the SQL engine runs on a cluster of distributed GPU servers and leverages multi-tiered storage media (RAM, SSD, HDD) throughout the compute clusters. It vectorizes SQL operations for execution on thousands of processing cores in each server in a cluster. It enables users to create separate GPU clusters for different workloads, all running off the same original data source, so that no workload can impact another.
- Kinetica: This GPU-powered data analytics engine uses in-memory storage and distributed processing across up to 6,000 cores in parallel. It can perform standard SQL queries on billions of rows in microseconds while concurrently visualizing results, executing machine-learning models, and ingesting large amounts of streaming data. It provides native support for geospatial objects and comes with a suite of geospatial functions for filtering data by area, by track, by custom shape, and more.
- MapD: This GPU-powered SQL data platform uses in-memory storage and also leverages modern SSDs for persistent storage. It boasts microsecond query processing performance on billions of rows. It compiles SQL queries with a just-in-time LLVM-based compiler into machine code that can run on Nvidia GPUs as well as x86 or Power CPUs. It intelligently caches hot data in main memory and GPU VRAM. It parallelizes computation across multiple GPUs and CPUs, and it can also execute entirely on CPUs. It replicates data across multiple servers for resiliency and redundancy to meet service level agreements.
- SQream: This GPU-accelerated columnar database processes trillions of rows in near real-time. Leveraging CPUs as well as massively parallel multi-core GPU nodes, it supports ad-hoc and high-throughput queries with fast data ingestion. Its automatic and transparent partitioning allows selective access to the required subset of columns, reducing disk and memory I/O when compared with standard row storage. It is also extensible for machine learning. The vendor has a wide range of partnerships with cloud, data integration, edge computing, business intelligence, machine learning, and other vendors.
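The I/O advantage of columnar storage noted in the SQream entry can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not SQream's storage engine: the three-column table, the column names, and the `bytes_scanned` helper are all hypothetical, and real engines add compression and chunking on top of this.

```python
from array import array

# Hypothetical table of 1,000 rows with columns (id, price, qty).
ROWS = [(i, i * 1.5, i % 7) for i in range(1000)]


def to_columns(rows):
    """Pivot row tuples into one contiguous typed array per column."""
    return {
        "id": array("q", (r[0] for r in rows)),      # 8-byte integers
        "price": array("d", (r[1] for r in rows)),   # 8-byte floats
        "qty": array("q", (r[2] for r in rows)),     # 8-byte integers
    }


def bytes_scanned(columns, needed):
    """Bytes a scan must read when it touches only the named columns."""
    return sum(columns[name].itemsize * len(columns[name]) for name in needed)


columns = to_columns(ROWS)

# A row store has to read every field of every row, even for SUM(price).
row_store_bytes = bytes_scanned(columns, list(columns))

# A column store reads just the price column for the same query.
column_store_bytes = bytes_scanned(columns, ["price"])

if __name__ == "__main__":
    total_price = sum(columns["price"])
    print(total_price, row_store_bytes, column_store_bytes)
```

In this toy case the single-column scan reads one third of the bytes of the full-row scan. The saving grows with the number of columns a query does not need.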
GPU databases may gain a new lease on life through performance-hungry use cases such as graph processing. That possibility was suggested by last year’s launch of AWS Neptune, which incorporates the Blazegraph GPU database that the cloud powerhouse had previously acquired. This GPU-accelerated graph database supports RDF/SPARQL APIs and the Apache TinkerPop stack, processing up to 50 billion edges on a single system.
Nevertheless, this is a tricky time for vendors who wish to ground their continued success in GPU technology. Already, the AI market is finding cost-effective parallel processing alternatives to GPUs, and the cryptocurrency market is softening. Also, skyrocketing prices for GPUs themselves, as discussed in this recent Forbes article, could act as a further dampener on market growth.
Here’s Kinetica CEO Paul Appleby interviewed recently on theCUBE at Big Data SV 2018.