Premise
Deep learning developers are gravitating toward the leading modeling frameworks, most notably TensorFlow, MXNet, and CNTK. In addition to having well-developed ecosystems, these frameworks enable developers to compose, train, and deploy DL models in their preferred languages, access functionality through simple APIs, and tap into rich algorithm libraries and pre-defined modular components.
Analysis
Today’s deep learning (DL) frameworks range widely in their support for core enterprise requirements. In addition to being commercially immature, most DL tools are unfamiliar to the average application developer, and even to those who have experience in the overlapping field of ML. Nevertheless, substantial developer communities and growing vendor ecosystems have begun to form around the most popular DL toolkits.
Sorting through this fast-evolving field can be daunting. The myriad of DL distributions, toolkits, and other offerings, as enumerated in this recent survey by The Data Incubator, makes it difficult to ascertain which might be best for your needs. Developers should place priority on the following requirements when searching for the right DL modeling tool:
- Enable coding of DL models in their preferred languages;
- Support access to DL functionality through simple APIs;
- Allow composition of DL models from rich algorithm libraries and pre-defined modular components;
- Facilitate efficient training of DL models on top of the developer’s preferred data lakes and parallel computing clusters;
- Accelerate compilation of DL models to the most efficient execution formats for back-end hardware platforms;
- Support automatic packaging and deployment of DL models to target back-end application platforms; and
- Streamline management of ongoing DL model release, monitoring, governance, and administration through well-engineered toolchains and DevOps platforms.
Most of today’s leading DL toolkits were developed by major solution and cloud providers to support the requirements of their internal product teams and IT staff. This describes the most popular open-source DL toolkits, including the Google-developed TensorFlow, the AWS-developed Apache MXNet, the Microsoft-developed Cognitive Toolkit (CNTK), and the Facebook-developed Caffe2. Just as important, the leading toolkits are supported in a growing range of commercial DL workbenches, platforms, services, and solutions from a wide variety of vendors.
What follows are technical profiles of each of the most popular general-purpose DL open-source distributions: TensorFlow, MXNet, CNTK, Caffe2, DeepLearning4J, Torch, and Theano. Readers are also encouraged to explore any of the myriad other DL tools in circulation, including Apache Singa, BigDL, Chainer, DeepDist, DistBelief, Distributed Deep Learning, DLib, DyNet, HP Cognitive Computing Toolkit, Keras, MatConvNet, Nervana Neon, NVIDIA Digits, OpenDeep, OpenDL, OpenNN, PaddlePaddle, PlaidML, PyTorch, Sonnet, TFLearn, and Veles. If your interest is in specialized DL toolkits for device-embedded mobile applications, you should also explore such distributions as TensorFlow Lite, Caffe2Go, and CoreML.
Table 1 provides an overview of the leading general-purpose DL modeling-tool distributions.
| Distro | Source | License | Platforms | Languages | APIs | Libraries | Hardware |
|---|---|---|---|---|---|---|---|
| TensorFlow | Google | Apache 2.0 | 64-bit Linux, macOS, and Windows; Android and iOS mobile platforms | Python, C/C++, Java, Go, and R; third-party packages for C#, Julia, Scala, Haskell, and Rust | Keras, CUDA, SYCL | CNNs, RNNs, RBMs, DBMs, symbolic math functions | GPUs, CPUs, TPUs |
| MXNet | AWS | Apache 2.0 | Linux, Mac OS X, Windows, Android, and iOS; AWS cloud | C++, Python, Julia, Matlab, JavaScript, Go, R, Scala, Perl, and Wolfram | OpenMP, CUDA, Gluon | CNNs, RNNs, RBMs, DBMs, LSTMs | GPUs, CPUs |
| CNTK | Microsoft | MIT | Windows, Linux, and Mac OS X | Python, C++, C#, BrainScript | Keras, OpenMP, CUDA, NumPy | RNNs, CNNs, LSTMs, batch normalization, sequence-to-sequence with attention | GPUs, CPUs |
| Caffe2 | Facebook | Apache 2.0 | Linux (Ubuntu), Mac OS X, and Windows | Python, C++, MATLAB | OpenMP, CUDA | RNNs, CNNs, LSTMs | GPUs, CPUs |
| DeepLearning4J | Skymind | Apache 2.0 | Linux, Mac OS X, Windows, Android | Java, Scala, Clojure, Python, or any other JVM language | Keras, OpenMP, CUDA | RNNs, CNNs, RBMs, DBMs; ND4J (NumPy for the JVM), DataVec, JavaCPP, Arbiter, RL4J | GPUs, CPUs |
| Torch | Collobert, Kavukcuoglu, Farabet | BSD | Linux, Mac OS X, Windows, Android, iOS | Lua, LuaJIT, C | OpenMP, OpenCL, CUDA | RNNs, CNNs, RBMs, DBMs, ML neural-net algorithms | GPUs, CPUs |
| Theano | Montreal Institute for Learning Algorithms | BSD | Linux (Ubuntu, Gentoo, CentOS 6), macOS, Windows; NVIDIA Jetson TX1 embedded platform; Docker images | Python | Keras, OpenMP, CUDA | CNNs, RNNs, RBMs, DBMs, LSTMs, auxiliary classifiers, optimization methods, other mathematical expressions involving multi-dimensional arrays; NumPy | GPUs, CPUs |
Table 1: Leading Deep Learning Modeling Tool Distributions
TensorFlow
Google released TensorFlow initially in November 2015 and issued its most recent stable release (1.3.0) in August 2017; the codebase’s current version (as of this writing) is here. The tool’s website is here and its GitHub is here.
As noted in this recent InfoWorld article, TensorFlow is extremely popular: it has been far and away the most-forked project on GitHub over the past 12 months, and it ranks in the top four in terms of number of contributors. It is now incorporated into more than 6,000 open-source repositories. At the same time, it is beginning to provoke a fair amount of developer backlash.
TensorFlow was developed as an open-source codebase by the Google Brain team for internal Google use, replacing its closed-source predecessor, DistBelief. Available under an Apache 2.0 license, TensorFlow runs on 64-bit Linux, macOS, and Windows, as well as on the Android and iOS mobile platforms.
Developers can program TensorFlow in Python, C/C++, Java, Go, and R, and third-party packages are available for writing TensorFlow models using C#, Julia, Scala, Haskell, and Rust. Developers can leverage TensorFlow’s native support for Keras, CUDA, and SYCL APIs. The distribution is geared to model training through supervised learning methods.
TensorFlow enables developers to build DL models as symbolic programs that instantiate dataflow graphs. These models, which express neural networks, their gradients, and other complex computations, are directed graphs whose edges carry n-dimensional arrays (tensors) of base datatypes. To accelerate development, TensorFlow includes a library of pretrained DL models.
TensorFlow enables efficient compilation of an entire DL computation graph, or subgraphs, for execution in a single session. To facilitate development of these graphs, the tool provides a library of symbolic math functions as well as common DL algorithms, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), Restricted Boltzmann Machines (RBMs), and Deep Boltzmann Machines (DBMs).
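As a minimal sketch of this graph-and-session model, using the TF 1.x Python API (the shapes, values, and names here are arbitrary), the developer first defines the graph symbolically and then executes it within a session:

```python
import tensorflow as tf

# Define the dataflow graph symbolically; no computation happens yet.
x = tf.placeholder(tf.float32, shape=[None, 3], name="x")  # input tensor
W = tf.Variable(tf.random_normal([3, 2]), name="W")        # learnable weights
y = tf.nn.softmax(tf.matmul(x, W), name="y")               # a graph node, not a value

# Execute the graph (or any subgraph) within a single session.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))
```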
TensorFlow’s reference implementation (clients, master controller, and worker processes) runs locally on a single OS/hardware platform. However, the code is architected to enable massively parallel distributed DL model execution across multinode clusters and clouds. The back-end TensorFlow graph-execution engine may distribute computations across any combination of graphics processing units (GPUs), central processing units (CPUs), and even hardware that implements Google’s own DL application-specific integrated circuit (ASIC) architecture, called Tensor Processing Units (TPUs).
TPUs enable high throughput for low-precision DL arithmetic (e.g., 8-bit) and are optimized for inferencing and training of TensorFlow models. Available in Google Compute Engine, TPUs, now in their second generation, deliver up to 180 teraflops of performance and, when organized into clusters of 64 TPUs, provide up to 11.5 petaflops. Through the TensorFlow Research Cloud, Google also provides TPU-based inferencing at no charge to accelerate open DL research.
Regardless of the mix of hardware back-end architectures, developers can access this parallelized multi-node execution from a single API, which they can use to accelerate training or inferencing of TensorFlow models. A client DL application calls the TensorFlow session API to communicate with a back-end master controller on the TensorFlow graph-execution engine. The master, in turn, schedules, distributes, and monitors TensorFlow computations that execute across one or more distributed worker processes. Within a TensorFlow computation session, each worker process arbitrates the execution of computational jobs on one or more specific hardware “devices” (such as a CPU core, GPU card, or embedded mobile TPU) under the master’s dynamic supervision and control.
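The following sketch shows the developer-facing side of this arrangement in the TF 1.x API: ops are pinned to named devices, and the session hands the graph to a master for scheduling. The gRPC address in the comment is a hypothetical example, not a real endpoint:

```python
import tensorflow as tf

# Pin graph nodes to specific devices; the master schedules them on workers.
with tf.device("/cpu:0"):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
with tf.device("/gpu:0"):
    b = tf.matmul(a, a)

# allow_soft_placement falls back to CPU if no GPU device is present;
# log_device_placement reports where each op actually ran.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)

# tf.Session() with no target uses the local in-process master; passing a
# gRPC target (e.g., "grpc://worker0.example.com:2222", hypothetical) would
# instead hand the graph to a remote master for distributed execution.
with tf.Session(config=config) as sess:
    print(sess.run(b))
```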
Within the open-source distribution, users can access a visualization utility, TensorBoard, to examine TensorFlow DL graph structures and roll up summary statistics on execution across local or distributed environments.
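A minimal sketch of feeding that utility from the TF 1.x API (the log directory and the tracked value are arbitrary):

```python
import tensorflow as tf

loss = tf.reduce_mean(tf.square(tf.random_normal([10])), name="loss")
tf.summary.scalar("loss", loss)          # register a scalar statistic to track
merged = tf.summary.merge_all()

with tf.Session() as sess:
    # Write the graph structure and summaries for TensorBoard to display.
    writer = tf.summary.FileWriter("/tmp/tf_logs", sess.graph)
    summary, _ = sess.run([merged, loss])
    writer.add_summary(summary, global_step=0)
    writer.close()
# Then inspect with: tensorboard --logdir /tmp/tf_logs
```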
Apache MXNet
Amazon Web Services (AWS)’s MXNet distribution was accepted into the Apache incubator in January 2017. The distribution’s website is here, its incubator page is here, and its GitHub is here.
The MXNet project now counts over 400 contributors, including developers from AWS, Apple, Samsung, Microsoft, Intel, Dato, Baidu, and Wolfram Research. It is also supported by such research institutions as Carnegie Mellon, MIT, the University of Washington, and the Hong Kong University of Science and Technology.
Currently available in version 0.11 under the Apache 2.0 license, MXNet runs on Linux, Mac OS X, Windows, Android, and iOS, as well as in the AWS cloud. Developers can develop MXNet models in many programming languages, including C++, Python, Julia, Matlab, JavaScript, Go, R, Scala, Perl, and Wolfram. It supports the OpenMP and CUDA APIs, as well as both imperative and symbolic programming methods, as sketched below. It includes a rich library of CNNs, RNNs, RBMs, DBMs, and Long Short-Term Memory (LSTM) networks, as well as pretrained models. The distribution is geared to model training through supervised learning methods.
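A minimal sketch contrasting the two programming styles (shapes and values are arbitrary): the imperative NDArray API computes eagerly, while the Symbol API builds a graph that is bound to data and executed later:

```python
import mxnet as mx

# Imperative style: operations execute immediately on NDArrays.
a = mx.nd.ones((2, 3))
b = (a * 2) + 1                     # computed eagerly
print(b.asnumpy())

# Symbolic style: define a graph first, then bind and execute it.
x = mx.sym.Variable("x")
y = (x * 2) + 1                     # graph definition only; no computation yet
executor = y.bind(ctx=mx.cpu(), args={"x": mx.nd.ones((2, 3))})
print(executor.forward()[0].asnumpy())
```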
Regardless of front-end language, API, or libraries, MXNet compiles all code to C++ for optimized back-end deployment. The distribution enables automatic scaling of distributed MXNet DL workflows across all available distributed GPUs and CPUs for high-performance execution in multi-node training and/or inferencing. In addition to handling multi-GPU training and deployment of complex models in the cloud, MXNet produces lightweight neural network model representations that can run on lower-powered edge devices, such as Raspberry Pi, smartphones, IoT devices (using AWS Greengrass), serverless environments (using AWS Lambda), and Docker containers.
Recently, AWS added a new set of simplified development capabilities under the “Gluon” initiative. Gluon provides a development abstraction for prototyping, building, training and optimization of DL models. The Gluon application programming interface, defined in Python, is agnostic to underlying DL frameworks and runtime engines—and, in fact, Microsoft has also committed to adding it to its own CNTK DL framework in the near future. The specification allows other DL engines to be plugged into the Gluon API without hurting the training speed enabled by those engines.
Currently available in MXNet 0.11, Gluon allows DL developers to do the following (illustrated in the sketch after this list):
- Prototype, build, and train DL models in any framework, and deploy them in an efficient format to any target platform;
- Program models using a concise Python API, which reduces the amount of coding associated with any given DL project;
- Define DL models flexibly, like any other data structure;
- Create DL models on the fly, with any structure, and change them rapidly using Python’s native control flow;
- Reuse pre-built, optimized DL building blocks, including predefined layers, optimizers and initializers;
- Use standard programming loops and conditionals to prototype, build, revise, and debug DL models;
- Work without needing to know specific DL compilation or execution details;
- Easily track, debug, save checkpoints for, and modify hyperparameters of DL models; and
- Optimize DL training algorithms automatically in alignment with model revisions.
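A minimal sketch of that workflow (layer sizes, learning rate, and data are arbitrary): a Gluon model is composed from pre-built blocks, initialized, and trained through one step using Python’s native control flow:

```python
import mxnet as mx
from mxnet import autograd, gluon, nd

# Compose a network from pre-built Gluon building blocks.
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(64, activation="relu"))
    net.add(gluon.nn.Dense(10))
net.collect_params().initialize(mx.init.Xavier())

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), "sgd", {"learning_rate": 0.1})

# One training step on arbitrary data, using ordinary Python control flow.
data, label = nd.random_normal(shape=(32, 100)), nd.zeros((32,))
with autograd.record():            # record the forward pass on the fly
    loss = loss_fn(net(data), label)
loss.backward()                    # compute gradients
trainer.step(batch_size=32)        # update parameters
```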
CNTK
Microsoft released its Cognitive Toolkit (CNTK) under a free MIT license in April 2015. The distribution’s website is here and its GitHub is here.
Developed by Microsoft’s Technology and Research division, CNTK receives code contributions from many Microsoft product teams, including Skype, Cortana, Bing, and Xbox. Microsoft uses the distribution in production for speech recognition and for image and text training.
CNTK runs on Windows, Linux, and Mac OS X. CNTK DL models can be programmed in Python, C++, C#, BrainScript, and from the command line. The distribution supports the Keras, OpenMP, CUDA, and (for scientific computing) NumPy APIs. Its library includes algorithms and routines for RNNs, CNNs, LSTMs, batch normalization, and sequence-to-sequence with attention, as well as pretrained models. It includes built-in readers for training DL models from multiple input files and for working reliably with massive datasets. Readers are fully customizable, allowing support for arbitrary input formats. And, as noted above, Microsoft plans to bring Gluon’s abstraction layer to the distribution soon.
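A minimal sketch of composing and training a model through CNTK’s Python API (the dimensions, learner settings, and data here are arbitrary):

```python
import numpy as np
import cntk as C

# Declare model inputs and compose a network from library layers.
x = C.input_variable(4)
y = C.input_variable(2)
model = C.layers.Sequential([
    C.layers.Dense(8, activation=C.relu),
    C.layers.Dense(2)
])(x)

loss = C.cross_entropy_with_softmax(model, y)
metric = C.classification_error(model, y)
learner = C.sgd(model.parameters,
                C.learning_rate_schedule(0.1, C.UnitType.minibatch))
trainer = C.Trainer(model, (loss, metric), [learner])

# One training step on arbitrary data.
features = np.random.randn(32, 4).astype(np.float32)
labels = np.eye(2)[np.random.randint(0, 2, 32)].astype(np.float32)
trainer.train_minibatch({x: features, y: labels})
```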
In support of training and inferencing workloads, CNTK scales efficiently from single-node environments to parallel execution across multi-machine, multi-node networks of GPUs and CPUs, including Azure GPU clouds. The distribution provides automatic hyperparameter tuning and automatic inferencing of a DL model’s optimal shape based on the characteristics of the data being analyzed.
In addition to supervised learning, CNTK supports unsupervised learning, reinforcement learning, and generative adversarial networks. It provides a plug-in architecture allowing users to define their own computation nodes within DL graphs.
Caffe2
Facebook released Caffe2 in April 2017 under Apache 2.0 license. Its website is here and GitHub is here.
Developed by Facebook as a superset of the original UC Berkeley-developed Caffe, Caffe2 runs on Linux (Ubuntu), Mac OS X, and Windows, and can run on cloud services inside Docker images. It enables programming in Python, C++, and MATLAB, and supports the OpenMP and CUDA APIs. It also integrates with Android Studio, Microsoft Visual Studio, or Xcode for mobile development.
Caffe2’s library includes algorithms for RNNs, CNNs, and LSTMs, as well as pretrained models. It supports parallel scale-out from single-node GPU or CPU platforms to multi-node GPU or CPU clusters.
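A minimal sketch of Caffe2’s Python workflow (the blob names and values are arbitrary): feed a NumPy array into the workspace, define a net as a graph of operators, and run it:

```python
import numpy as np
from caffe2.python import core, workspace

# Feed an input blob into the global workspace.
workspace.FeedBlob("X", np.random.randn(2, 3).astype(np.float32))

# Define a net as a graph of operators, then run it once.
net = core.Net("relu_demo")
net.Relu(["X"], ["Y"])              # operator: Y = max(X, 0)
workspace.RunNetOnce(net)

print(workspace.FetchBlob("Y"))     # retrieve the result as a NumPy array
```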
DeepLearning4J
Developed by Skymind to support DL on Hadoop and Spark, DeepLearning4J is available under Apache 2.0 license. Its website is here.
DeepLearning4J runs on Linux, Mac OS X, Windows, and Android. It supports DL programming in Java, Scala, Clojure, Python, or any other JVM language. It supports the Keras, OpenMP, and CUDA APIs.
The distribution includes a library of RNN, CNN, RBM, and DBM algorithms, as well as pretrained models. Also included are the following companion libraries: ND4J (N-Dimensional Arrays for Java, a NumPy for the JVM); DataVec, a tool for machine learning ETL operations; JavaCPP, a bridge between Java and native C++; Arbiter, an evaluation tool for machine learning algorithms; and RL4J, deep reinforcement learning for the JVM. In addition, DL4J can import neural-net models from most major frameworks (including TensorFlow, Caffe, and Theano) via Keras, as sketched below.
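For example, here is a hedged sketch of the Keras side of that import path (the layer sizes and file name are arbitrary): a model saved in Keras’s HDF5 format can then be loaded on the JVM through DL4J’s Keras model-import facility:

```python
from keras.models import Sequential
from keras.layers import Dense

# Build and save a simple Keras model in HDF5 format.
model = Sequential()
model.add(Dense(64, activation="relu", input_dim=100))
model.add(Dense(10, activation="softmax"))
model.compile(optimizer="sgd", loss="categorical_crossentropy")
model.save("model.h5")   # DL4J can load this file via its Keras model import
```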
DL models built in DL4J can be scaled out from single-node to multi-node parallel execution. DL4J models are compiled to C, C++, or CUDA for execution on target platforms. They can be executed on Spark and Hadoop clusters involving distributed CPUs or GPUs, including in the AWS cloud.
Torch
Torch is a scientific computing library with strong DL and ML features that was developed by Ronan Collobert, Koray Kavukcuoglu, and Clement Farabet. It is available under BSD license. Its website is here and GitHub is here.
Torch, currently in version 7, runs on Linux, Mac OS X, Windows, Android, and iOS. It supports DL programming in Lua, the LuaJIT fast scripting language, and C. It supports the OpenMP, OpenCL, and CUDA APIs. Torch’s library includes RNNs, CNNs, RBMs, and DBMs, as well as a broad range of ML neural-network algorithms and pretrained models. Torch’s community ecosystem provides packages for ML, computer vision, signal processing, parallel processing, and image, video, and audio processing.
As a scientific computing framework, Torch also includes tensor routines for indexing, slicing, transposing, type-casting, resizing, sharing storage, and cloning; mathematical operations such as max, min, and sum; statistical distributions such as uniform, normal, and multinomial; basic linear algebra operations such as dot product, matrix-vector multiplication, and matrix-matrix multiplication; energy-based models; and numeric optimization routines.
DL models and scientific-computing routines built in Torch can be scaled out and parallelized from single-node to multi-node back-ends consisting of GPUs and CPUs and, via ports to iOS and Android back-ends, can be embedded in mobile devices. Torch modules can be installed with LuaRocks, the Lua package manager, which is included with the distribution.
Theano
Launched by the Montreal Institute for Learning Algorithms (MILA) in 2007, Theano is a cross-platform DL and scientific computing tool. The institute recently announced that it has ended Theano development with the 1.0 release. Its website is here.
Available under a BSD license, Theano runs on Linux (Ubuntu, Gentoo, CentOS 6), macOS, and Windows, as well as on the NVIDIA Jetson TX1 embedded platform and in Docker images. It supports programming in Python and supports the Keras, OpenMP, and CUDA APIs. Its library is the Lasagne model zoo, which includes CNNs, RNNs, RBMs, DBMs, and LSTMs, as well as auxiliary classifiers, optimization methods (Nesterov momentum, RMSprop, and ADAM), and other mathematical expressions involving multi-dimensional arrays. It also integrates tightly with NumPy for scientific computing.
DL models and scientific computing routines built in Theano can be compiled for parallel execution as efficient C code in multi-node environments consisting of CPUs and GPUs.
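A minimal sketch of that compilation model, using the classic logistic-function example (the input values are arbitrary): Theano builds a symbolic expression graph, and theano.function compiles it to efficient native code:

```python
import numpy as np
import theano
import theano.tensor as T

# Define a symbolic expression over a matrix variable.
x = T.dmatrix("x")
s = 1 / (1 + T.exp(-x))            # element-wise logistic function

# Compile the expression graph to efficient native code.
logistic = theano.function([x], s)

print(logistic(np.array([[0.0, 1.0], [-1.0, -2.0]])))
```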
Action Item
Developers should adopt DL modeling frameworks that allow them to prototype and program these models in their preferred languages, through simple APIs, and with rich statistical and scientific computing libraries. The leading DL development frameworks (most notably TensorFlow, MXNet, and CNTK) all support fast, efficient training and deployment of sophisticated DL models for diverse applications running in disparate cloud, application, and accelerator-chipset architectures.