A Guide to Concepts in Digital Twins

By George Gilbert | October 31, 2017

Premise. The digital twin programming concept will extend well beyond IoT. It presents a richer representation of real things than traditional programming technologies do. Users need conventions for the core concepts and for how they fit together.

In our conversations with the Wikibon community, we hear both interest and confusion regarding the notion of digital twins. We believe digital twins will have enormous impacts on IoT — and future classes of digital business systems. Adopting these notions, however, requires coherent conventions, which we present in Table 1.

DT IoT Artifact	Definition
Digital Twin (DT)	Wikibon’s definition of a Digital Twin is a representation or model of a product, process, service, customer, or supplier: any entity involved in a business. IBM’s definition is a bit narrower. To IBM, a DT is a working model that digitizes the operations of a physical product and its subsystems, including mechanical, electronic, and software. The DT is also meant to capture the structural attributes and behavior of an entity as designed, built, tested, deployed, operated, and serviced. The DT can also be considered a rendering into digital terms whose fidelity is a promise that grows over time.
Edge Device	The edge device is typically the physical asset represented by the DT. Sensors instrument the edge device, and analytics take place locally to achieve the lowest possible latency. Analytics include predictions from machine learning models that are usually trained in the cloud, where the richest and largest datasets reside. Edge devices communicate with each other via a high-speed backplane; for example, the wheels and brakes in a car self-adjust to avoid locking.
Gateway controller	This server typically connects to multiple edge devices. The server is responsible for ingesting data coming from the sensors on the edge devices or physical assets, analyzing the data, and then “programming” the edge devices through their DTs. The gateway controller can aggregate and filter data from multiple edge devices. The filtered data represents a small fraction of all the sensor data, but it is what must be published to the cloud for future retraining of the models. The gateway controller also has a user interface to configure sensors on the edge devices and to provision and manage software on the edge devices, including models trained in the cloud or updated locally. The administrator for gateway controllers comes from operations technology (OT), not from information technology (IT), whose staff tend to applications in the data center or the cloud.
Operational model	The operational model collects the sensor data to create the behavioral representation of the DT. It captures the range of operational states of a device (for an elevator: open, closed, opening, closing, moving up, moving down). Simulating a device, or gathering actual data from an experiment (like a vehicle in a wind tunnel), creates an operational model. The operational model is used for creating the machine learning model(s). The ML models can be used for prescriptive suggestions, such as for maintenance or for a better product design. With more data or simulations, the operational model improves in fidelity over time.
Data model	The data model represents the structural properties of the Digital Twin but not its operation. This structural representation gets richer over time. Ultimately, it is similar to a bill of materials for a discrete manufactured product.
API	Exposes some or all of the operational model of the DT to developers. It should conform to the data model for maximum developer usability. A “downward-facing” API ingests data, a backplane analyzes it (complex event processing (CEP), predictive, or prescriptive), and an “upward-facing” API publishes the output for other applications to consume.
Level of detail	The hierarchical structure that organizes multiple DTs and their APIs and data models. For example, the DTs for four anti-lock brakes fit within the DT for the drivetrain of a car.
Machine learning models	Models can correspond to multiple levels of detail in a DT. At the lowest level, a model might correspond to a valve on a pipe with a sensor that reports the volume flowing through it. At a higher level, a model might correspond to a car, though there are likely to be additional models at lower levels of detail. Models can be either predictive or prescriptive. The models explain what is happening and what will happen; with prescriptive models, you can also adjust the inputs to get the optimal output. In other words, prescriptive models let you perform simulations. With a car, you can look at wind tunnel data across multiple simulations or physical experiments and optimize the car for a balance of wind resistance and styling. You base the mechanics of model building on the observations in the experiment or simulation. The process also works at multiple levels of detail, such as modeling the brakes, which are part of the drivetrain, which is part of the car. For each model, you do feature selection, feature engineering, and training. How much each feature weighs in or contributes to each model is determined by the training process.
ML model features	Features are the drivers or independent variables in an ML model. You can think of them as the knobs that represent volume, treble, and bass that collectively drive the sound output of a stereo. Data scientists select and engineer the features they believe will drive the most accurate answers when fed first training data and then live data.
ML model feature coefficients	Features have coefficients, more colloquially known as weights or values, that adjust the influence of each feature in the ML model. While features are knobs on the stereo, their coefficients correspond to how the knobs are set. The training process for a model typically sets these coefficients so that new recordings sound faithful. A high setting on the bass produces a deep sound, independent of the volume level.
ML model hyper-parameters	Hyper-parameters are the metadata that describe such things as the structure of the model or its learning rate, chosen so that the model can best fit the data. Data scientists typically set these parameters manually, while the coefficients of the model’s features come from the data that trains the model.
ML model hyper-parameter coefficients	Hyper-parameter coefficients adjust the weight of each of the hyper-parameters, much the way feature coefficients adjust the weight of each of the features.
Knowledge Graph	Creates a common abstraction layer, in the form of a data model and API, that integrates multiple component data models and APIs, such as the structural model from a CAD design, the operational model based on behavior observed by sensors, and the maintenance model that predicts anomalies in the operational model. The KG knows how the pieces fit together and provides semantic consistency to an operator and a developer. In addition to a generic version, there are customer-specific extensions. But all customers should be running an instance of the canonical knowledge graph.
Canonical model	Represents the generic version of a DT or data model or Knowledge Graph that has no customer-specific extensions.
Security	A typical rule of thumb for enhancing security is to minimize the surface area of an object such as a DT. This can be challenging with DTs because many industrial devices in operations achieve security through physical isolation from traditional IT networks, and connecting them to a DT reduces that isolation.
Backward compatibility	Enhancements to the DT co-developed at one customer require that prior customers be able to upgrade to the canonical model of this most recent DT. Prior customers should also be able to add back their specific extensions to the canonical model without breaking compatibility.

Table 1: Glossary of Digital Twin artifacts
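
To make a few of the glossary entries concrete, the following is a minimal Python sketch, not a reference implementation, of how the level of detail hierarchy, the canonical model, and customer-specific extensions might fit together. The class name DigitalTwin, its fields, and the rule that an extension may not redefine a canonical attribute are all assumptions made for illustration; none of them come from Wikibon, IBM, or any particular product.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Tuple


@dataclass
class DigitalTwin:
    """Hypothetical DT node: a structural data model plus child twins."""
    name: str
    canonical: Dict[str, Any] = field(default_factory=dict)   # generic data model
    extensions: Dict[str, Any] = field(default_factory=dict)  # customer-specific
    children: List["DigitalTwin"] = field(default_factory=list)

    def data_model(self) -> Dict[str, Any]:
        """Merged view: canonical attributes plus customer extensions."""
        return {**self.canonical, **self.extensions}

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "DigitalTwin"]]:
        """Yield (level of detail, twin) pairs, parent before children."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

    def upgrade(self, new_canonical: Dict[str, Any]) -> None:
        """Swap in a newer canonical model and leave extensions untouched.

        Assumption: an extension may not redefine a canonical attribute,
        so a name clash is reported rather than silently resolved.
        """
        clash = set(new_canonical) & set(self.extensions)
        if clash:
            raise ValueError(f"extensions collide with canonical model: {sorted(clash)}")
        self.canonical = dict(new_canonical)


# Level of detail: four brake DTs sit inside the drivetrain DT, inside the car DT.
brakes = [DigitalTwin(f"brake_{pos}", {"type": "anti-lock"})
          for pos in ("fl", "fr", "rl", "rr")]
drivetrain = DigitalTwin("drivetrain", {"wheels": 4}, children=brakes)
car = DigitalTwin("car", {"schema": 1}, {"fleet_id": "A-17"}, [drivetrain])

# Backward compatibility: move to a newer canonical model, keep the extension.
car.upgrade({"schema": 2, "doors": 4})

for depth, twin in car.walk():
    print("  " * depth + twin.name)
print(car.data_model())
```

Keeping the canonical attributes and the extensions in separate dictionaries is one simple way to let a prior customer upgrade to the newest canonical model and then add back its own extensions, as the Backward compatibility entry requires. A real system would more likely use versioned schemas than plain dictionaries.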

George Gilbert

George Gilbert, lead data & analytics analyst for theCUBE Research. Former Gartner analyst, former lead enterprise software analyst for Credit Suisse First Boston, one of the top investment banks serving the technology sector. Big Data analyst for Gigaom Research. Co-founded Techalphapartners, a consultancy that advised vendors and institutional investors on market development and product strategy. George has led conference panels with prominent thought leaders in cloud infrastructure and big data. He has been profiled on the front page of the Wall Street Journal and published as a guest author in a major overview of the evolution of cloud computing in The Economist. Prior to being an analyst, George was a product manager on Notes at Lotus Development. George received his BA in economics from Harvard University.