Optimizing applications for the sprawl we call the Internet of Things (IoT) is a daunting challenge.
To craft high-performance IoT apps, developers need a federated environment that distributes algorithmic capabilities for execution at IoT network endpoints, also known as “edge devices.” Federation is essential because many IoT edge devices—such as mobile phones—lack sufficient local resources for storing all data and executing all the algorithms needed to do their jobs effectively.
Key among the capabilities being federated to the IoT edges are machine learning (ML), deep learning (DL), and other cognitive-computing algorithms. These analytic capabilities enable IoT edge devices—such as drones, self-driving cars, and industrial robots—to make decisions and take actions autonomously based on locally acquired sensor data. In particular, these algorithms drive the video recognition, motion detection, natural-language processing, clickstream processing, and other real-time pattern-sensing applications upon which IoT apps depend.
Federated decisioning, driven by device-embedded ML/DL, is the essence of a well-tuned edge application. As my Wikibon colleague David Floyer recently remarked on theCUBE:
“When you look at the amount of data that is expected to come from video, Wikibon did some research a few years ago looking at the amount of data from different types of sensors, and video is, around 40% of all the data is going to be coming in from video….[T]he interesting new thing is then, where do you make that [sensor-driven] decision? Do you take a snapshot of that head, make a thumbprint, and send it up to the cloud? Do you send it to some other process locally? Or do you keep it very close to the camera? And most of, most people agree that the final landing point for this technology of recognition is going to be in the camera itself. Why? Because that’s where most of the data is. It’s raw data, it’s the original data, you want to make the decision there and then. That puts a time scale on the decision. So you’ve got to put in a fair amount of compute power, into that device. In this case, the camera itself.…perhaps also with a real-time streaming feed of decision logic or instructions from some application in the cloud that’s looking at the broader pattern related to patterns that it sees coming in from sensors at other edges.”
Here are the key scenarios where federated management of ML, DL, and other IoT data analytics workloads will prove essential:
- Facilitate embedding of algorithmic capabilities so that IoT devices and apps can adapt continuously and react locally to their environments and, as needed, to metrics and commands from neighboring devices;
- Enable developers to rapidly access, configure, train, and tweak any algorithm, including those discovered on the fly in neighboring nodes, that is suited to the IoT analytics challenges they face;
- Execute algorithms on any size device at the edge or at gateways in containers on widely supported processing frameworks, including Spark and Hadoop, within a distributed IoT fabric;
- Analyze data as it streams at the device level and then move it rapidly to federated cloud-based platforms for storage, thereby eliminating the need to retain it locally;
- Support execution of selected IoT analytics workloads locally, reducing or eliminating the need to round-trip many capabilities back to federated computing clusters in the cloud;
- Perform spatio-temporal sensor-data analysis and summarization at the IoT endpoint, and send the rest to a federated IoT data lake or log database for further analysis, storage, and archiving;
- Cleanse incoming sensor data at the endpoints, such as by imputing missing values, and then forward it to a federated IoT gateway for cross-endpoint normalization, aggregation, inference, and analysis;
- Access data from any streaming data platform, as well as from any RDBMS, hub, repository, file system, or other store in any cloud, on-premises, or other at-rest data platform;
- Tap into the massively parallel computing power of the cloud computing clusters at the heart of distributed IoT environments;
- Scale IoT applications to support any size, volume, and speed of data from any federated IoT device anywhere on the planet;
- Source data from any federated device, sensor, database, stream, and middleware fabric in the IoT;
- Correlate events at the application level across the vast span of the IoT;
- Distribute execution of analytic algorithms dynamically out to disparate IoT edge devices and gateways in order to maximize end-to-end application speed, throughput, and agility;
- Enable robust, secure, and efficient performance, job, state, and health management under centralized administration across heterogeneous IoTs, clouds, devices, and apps;
- Expose a functional API that simplifies development, testing, and deployment of algorithms and other complex application logic artifacts that are being deployed to the edges of a complex IoT application; and
- Aggregate a library of prebuilt IoT algorithms, maintained across federated repositories, to boost developer productivity in building and maintaining edge applications.
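To make the endpoint-side items in the list above more concrete, here is a minimal Python sketch of cleansing and summarizing sensor readings on the device before anything is forwarded to a gateway. Everything in it is an illustrative assumption rather than part of any particular IoT framework: the function names `impute_missing` and `summarize`, the carry-forward imputation rule, and the sample temperature readings are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def impute_missing(readings: List[Optional[float]]) -> List[float]:
    """Fill gaps (None) with the most recent valid reading.

    Gaps at the start are filled with the first valid reading.
    Returns an empty list if there is no valid reading at all.
    """
    valid = [r for r in readings if r is not None]
    if not valid:
        return []
    filled = []
    last = valid[0]
    for r in readings:
        if r is not None:
            last = r
        filled.append(last)
    return filled


def summarize(window: List[float]) -> Dict[str, float]:
    """Reduce a non-empty window of readings to a compact summary record."""
    if not window:
        raise ValueError("cannot summarize an empty window")
    return {
        "count": len(window),
        "min": min(window),
        "max": max(window),
        "mean": mean(window),
    }


# Hypothetical temperature readings with dropped samples.
raw = [21.0, None, 21.4, None, None, 22.0]
clean = impute_missing(raw)
summary = summarize(clean)  # only this small record would go upstream
print(clean)
print(summary)
```

The design choice here is that the device ships a small summary record instead of every raw sample, which is one way to reduce both local storage and upstream traffic. A real deployment would pick its imputation and summarization methods to suit the sensor.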
To keep data-driven IoT apps fit for their core purposes, developers need federated infrastructure for training and updating the ML/DL algorithms embedded on edge devices. In this regard, an important announcement recently came from Google Research, which said it is testing a federated ML/DL training capability for Android phones. Google Federated Learning uses a scaled-down version of TensorFlow on the device to support on-device data collection and training, along with the exchange of model updates to and from Android devices. The company is currently testing Federated Learning on Gboard for Android, its keyboard app, which provides real-time suggestions as the user types.
How does it work? Google Federated Learning uses crowdsourcing to drive on-the-go, continuous retraining of device-embedded ML/DL algorithms across IoT fogs. Each device evaluates the latest fog-delivered ML/DL algorithm against locally acquired sensor data and sends updates securely to a coordinating server in the IoT fog. The server averages the updates from all devices to which it has federated access in the fog, retrains the corresponding ML/DL model to improve performance, and sends the updated model back to devices for immediate execution. All of this takes place in real time in a manner that is transparent, non-disruptive, compressed, secure, low power, and low latency. Because devices send model updates rather than raw data, the coordinating server can learn from device-acquired data without compromising user privacy.
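The train-locally, average-centrally loop described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not Google's implementation. The model is a two-parameter linear fit, the function names `local_update` and `federated_average` are hypothetical, and weighting each device's update by its sample count is an assumption the article does not spell out. The sketch also leaves out the compression and secure-aggregation steps mentioned in the text.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair observed on a device


def local_update(weights: List[float], data: List[Sample],
                 lr: float = 0.1) -> List[float]:
    """Run one pass of SGD on-device for the model y = w0 + w1 * x.

    Returns only the weight delta; the raw samples never leave the device.
    """
    w0, w1 = weights
    for x, y in data:
        err = (w0 + w1 * x) - y
        w0 -= lr * err
        w1 -= lr * err * x
    return [w0 - weights[0], w1 - weights[1]]


def federated_average(global_weights: List[float],
                      updates: List[List[float]],
                      sizes: List[int]) -> List[float]:
    """Apply the sample-count-weighted average of the device deltas."""
    total = sum(sizes)
    avg = [
        sum(u[i] * n for u, n in zip(updates, sizes)) / total
        for i in range(len(global_weights))
    ]
    return [w + d for w, d in zip(global_weights, avg)]


# Two hypothetical devices whose local data follow y = 1 + 2x.
devices = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
]

weights = [0.0, 0.0]
for _ in range(50):  # each iteration is one server-coordinated round
    updates = [local_update(weights, d) for d in devices]
    weights = federated_average(weights, updates, [len(d) for d in devices])

print(weights)  # moves toward the underlying parameters [1.0, 2.0]
```

The server only ever sees weight deltas, which is the property the privacy claim rests on. Production systems add further safeguards on top of this basic exchange.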
For further depth on the challenges of optimizing IoT edge applications, here’s an excellent recent Wikibon Research report by David Floyer.