Effective Approaches for Building IoT Digital Twins: Joint Development With IBM as Case Study

By George Gilbert | October 30, 2017

IBM’s Watson Digital Twin IoT solutions can improve and even transform customer businesses. However, IBM and the customer have to work closely together to build and operate these solutions. That close collaboration means IBM has access to the customer’s most sensitive data and intellectual property. IBM claims they will never share one customer’s data or IP with another customer. Users need to monitor that promise.

Building IoT Digital Twins (DTs) remains challenging for mainstream customers. The technology is immature and requires combinations of skills not widely available in either IT or OT groups. Our research shows that DT development and management works best when undertaken as a joint effort between vendors and mainstream customers. However, the ultimate solution provider has to be responsible for the whole product in order to satisfy concerns such as interoperability and security. Bringing all the necessary, scarce skills together for joint development requires intellectual property (IP) reuse across customer engagements.

Enterprise technology vendors such as IBM claim they can build industry and customer-specific IoT Digital Twin solutions without combining and sharing customer data the way consumer online services do. Little analysis exists that examines that claim all the way down to the detailed level of machine learning model design, training, and operation. Users need coherent conventions and approaches to building and maintaining digital twin-based systems. (In order to be precise about DTs and IP, we’ve included a detailed guide to concepts in a separate document.)

IP reuse isn’t new. Technology vendors and their customers have engaged in joint development for decades, especially in industry-specific solutions. But what’s different about joint development of DTs is that these solutions are among the most strategic software resources any company can develop. IoT solutions are strategic not only because they can drive deeper levels of consumer engagement and experience, but also because they can improve enterprise operations and enable new products, services, and business models. Moreover, because DTs are often run as a service by suppliers, users must ensure that future value is appropriated according to business and contractual expectations. Thus, DT strategies must feature contracting and relationships that account for two dynamics:

Joint development creates DTs with greater structural fidelity over time. IBM and the customer together define DTs with richer, higher resolution structure. Shared data plays a critical role in training the models.
Joint operation yields data that improves DT design and operational services long after go live. Ongoing operation of DTs and their machine learning models enables new business models built around after-market services. But now shared customer data risks extending leakage from design IP to operations IP.

Joint Development Creates DTs with Greater Structural Fidelity Over Time.

The recipe for building DTs requires a scarce mix of skills. Only by working together with customers can vendors such as IBM build these increasingly strategic solutions. But that skills scarcity means IBM has to rely on intellectual property (IP) reuse to stretch its limited talent. And that reuse exposes customers to some degree of IP “leakage” to their competition.

Both IBM and its customers get powerful leverage from joint development of DTs

Joint development of DT applications today provides leverage for most customers and their technology vendors that no other go-to-market approach can offer. Building Digital Twins requires a mix of skills that very few companies currently have at sufficient scale. IBM has been a pioneer in commercializing machine learning, yet it has only 1,000 data scientists worldwide. Few companies in non-technology industries have large numbers of data scientists. Insurance is one example of an industry with pools of statisticians; insurers can probably train their actuaries in data science. But there aren’t many other industries with deep pools of similar, related expertise.

A full DT solution requires additional skills to come together: platform, industry solutions, and customer expertise. Technology vendors more typically build platforms. Some enterprise application vendors and system integrators build industry solutions but rarely with the depth of specificity required for DTs. GE is widely recognized for pioneering industrial IoT applications. The reality, however, is that GE has the industry solutions skills but is struggling with the platform technology.

IBM is one of the few vendors that can combine machine learning platform technology with decades of experience building industry solutions between its Global Business Services and Industry Solutions groups. But the solutions those groups build are predicated on joint development. For now, when assembling all the required skills, joint development is the only way to build effective solutions for most customers and vendors.

Customers can’t avoid the risk of IP “leakage” to competitors via joint development

When a technology vendor such as IBM works with a customer in one industry, IBM assimilates expertise that it has to reuse with similar customers. If it doesn’t, IBM can support neither an industry solutions software group nor a global services group.

The industry solutions group is an applications business whose business model is built on repeatable solutions. Global Business Services (GBS) has a different constraint that requires reuse. The Watson IoT Consulting group within GBS has 3,000 people worldwide, according to a briefing on May 30, 2017. The 3,000 includes some fraction of IBM’s 1,000 data scientists. But the consulting group faces additional skills shortages. 70% of their customer engagements start with concept and strategy. But they can only maintain 5-8% of their headcount with the requisite strategy and architecture skills. IBM has to compete with other technology vendors to such a degree that they can’t recruit, train, and retain enough people with these skills to keep up with demand. This problem is likely symptomatic of IBM’s corporate-wide inability to grow its analytics businesses fast enough to make up for its more mature businesses. Reuse of intellectual property is the only leverage point left, according to Al Opher, the head of that consulting group.

Let’s take a look at jet engines as an example of how to take advantage of reuse in building a DT. Jet engines are a good example both because GE Digital has already articulated the value of such a solution and because they are among the most technically sophisticated products in any industry. They have three principal suppliers: GE, Rolls-Royce, and Pratt & Whitney. At a high level, all jet engines share a common structure. Moving from front to back, every engine has intake fans, air compressors, combustors, and turbine blades. This level of commonality helps make it possible to create generic DTs for jet engines.

Joint development between IBM and an engine manufacturer that leverages commonality doesn’t have an ending destination. The process is a journey because the DT’s fidelity increases over time. Sensors embedded in new engines are a proxy for the data available to “fit” to a digital structure. Several years ago new engines had roughly 100 sensors. Today, new engines have 5,000 sensors. The combination of shared design principles and ever greater fidelity creates the opportunity for a vendor such as IBM to develop ever richer DTs for multiple engine manufacturers.

In IBM’s go-to-market model, the same industry development and global business services teams work extremely closely with customers on joint development. As a result, IBM can’t act completely as a “clean room” development facility where no learnings transfer from one client to the next. The most precise way of describing the IP protection process is to understand that there are three components to a solution (see Figure 1 below), as Veeru Ramaswamy, IBM VP of IoT Platform Development, explains. The three are:

the data inputs;
the analytic black box, which is the machine learning model; and
the insights that come as outputs.

The customer keeps all three, but IBM can take a copy of the black box. The black box is where IBM’s answer is actually somewhat nuanced. In machine learning terms the data inputs don’t just flow through the black box, or model, without “touching” it. The data inputs can actually “program” the black box and make it a richer model over time. So the models in IBM’s industry-specific DTs gain fidelity and value over time.

Figure 1: 3 Core IP components are input data; the machine learning model, which acts like a black box from the customer’s perspective; and the output predictions and prescriptions. If customers have exclusive rights to the input data and the output prescriptions, it’s easy to assume that customers own all the IP from their joint development projects. But in the age of machine learning, data can be just as important as humans in creating key parts of programs. The customer data flowing through models improves them over time. That improvement is where the risk of IP leakage exists. Customers must monitor just how much richer the models get during joint development.
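To make the idea that data “programs” the black box concrete, here is a minimal sketch, assuming a simple linear model trained by online gradient descent. It is an illustrative stand-in, not IBM’s actual modeling approach; the function name `online_update`, the learning rate, and the readings are all hypothetical.

```python
from typing import List, Sequence, Tuple


def online_update(weights: List[float], features: Sequence[float],
                  target: float, learning_rate: float = 0.1) -> List[float]:
    """Return new weights after one gradient step on a single observation."""
    prediction = sum(w * x for w, x in zip(weights, features))
    error = target - prediction
    return [w + learning_rate * error * x for w, x in zip(weights, features)]


# Hypothetical customer readings: one input feature, target is exactly 2 * input.
observations: List[Tuple[List[float], float]] = [
    ([0.5], 1.0), ([1.0], 2.0), ([1.5], 3.0), ([2.0], 4.0),
]

weights = [0.0]          # the "black box" before it has seen any customer data
for _ in range(50):      # customer data keeps flowing through the model
    for features, target in observations:
        weights = online_update(weights, features, target)

print(weights)           # the coefficient has moved from 0.0 toward 2.0
```

The structure of the model never changes in this sketch; only its weights do. The customer’s data is nonetheless encoded in the trained model, which is why a copy of the black box is not a neutral artifact.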

We’ve seen how reuse works in theory. Now let’s take a look at a real example in the Watson IoT Consulting group. As part of a briefing in May, IBM explained that the company has always leveraged insights from multiple customers to build solutions. In the old era of “Smarter Planet” solutions, IBM built common data models (see Table 1), which they also refer to as semantic models, so that they could ingest data without significant additional engagement with each new client. Today’s IoT solutions are like the Smarter Planet ones except that they are real-time and have richer context from other sources of data. IBM cited its solution for engineering, procurement, and construction (EPC) as an example of IoT reuse. These are huge capital projects that cross multiple industries such as aerospace and oil exploration and refinery construction. They typically all have a common way of doing project planning and work breakdowns. Quoting the head of the Watson IoT Consulting group, “We harnessed all the work we did and harvested all the data associated with these engagements so that now we have semantic models for the EPC industry… Now, whenever we have an EPC engagement, we can immediately take their data and run it through our models to drive prediction and prescription… without going through a 6-8 month ingestion cycle.”

Using the model of a DT with three components (input data, an analytic black box, and insights or prescriptions as output), we can break down with precision in Table 1 what’s reusable in the black box.

“Black Box” Artifact	Role in reusable joint DT development
Digital Twin / Knowledge Graph	A Digital Twin is actually a composition of many digital artifacts that collectively define the structure and behavior of a product or process. The Knowledge Graph is the representation that creates a common abstraction for a programmer to get at all these components. Using IBM’s terminology, the Knowledge Graph is roughly equivalent to the semantic models it has been building since its Smarter Planet Solutions.
Data model	Using the jet engine example, the top-level data model includes the intake fans, air compressors, combustors, and turbine blades. This data model grows in detail and fidelity with more customer engagements as it captures more of the component parts, otherwise known as the bill of materials. When IBM talks about how it “harnessed and harvested” all the data from many engagements with similar customers to build an industry solution, a common data model is the output.
Level of detail	The level of detail is related to the data model in that it captures the structure of the DT at different levels of resolution. The highest resolution corresponds to the visibility of the lowest-level components, such as a single turbine blade. Structural fidelity increases with additional levels of detail.
Machine learning models	Machine learning models describe the behavior of the DT and have their own levels of detail. These models also grow in fidelity across customer engagements. The key metric for their fidelity is how closely the jet engine DT behaves relative to the models’ predictions. Under the covers, multiple models might participate in an ensemble to describe how part of the DT works. In addition, each machine learning model might have more “features” or variables that describe how it should work. These features come from the training process and are part of how the DT grows in fidelity over time. According to Ramaswamy, IBM can share models with richer features across customers but not the “coefficients” or values of the variables in the machine learning models. The coefficients indicate the relative importance of each individual feature.
Canonical model	The canonical model corresponds to the part of the DT that IBM decides is generic and share-able across customers. For example, the engineering, procurement, and construction model is a canonical model. And a jet engine could have a canonical model that didn’t capture vendor and model-specific differences.

Table 1: Definition of the technical components in the “analytic black box”.
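The relationship between the data model and its level of detail in Table 1 can be sketched as a tree. This is a minimal sketch, assuming a hypothetical `Component` class; the node names below the four top-level sections (such as `blade_1`) are illustrative and do not come from any real bill of materials.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """One node in a hypothetical DT data model (a bill-of-materials tree)."""
    name: str
    children: List["Component"] = field(default_factory=list)


def levels_of_detail(node: Component) -> int:
    """Depth of the tree: more levels means higher structural resolution."""
    if not node.children:
        return 1
    return 1 + max(levels_of_detail(child) for child in node.children)


def names_at_level(node: Component, level: int) -> List[str]:
    """Component names visible when the twin is viewed at a given level (1 = top)."""
    if level == 1:
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(names_at_level(child, level - 1))
    return names


# Top-level sections are those named in the article; the blade nodes are illustrative.
engine = Component("jet_engine", [
    Component("intake_fans"),
    Component("air_compressors"),
    Component("combustors"),
    Component("turbine_blades", [Component("blade_1"), Component("blade_2")]),
])

print(levels_of_detail(engine))      # 3
print(names_at_level(engine, 2))     # the four generic sections
print(names_at_level(engine, 3))     # only the individually modeled blades
```

In this framing, each additional engagement that adds nodes or levels to the tree raises structural fidelity, and a canonical model would be the subset of the tree that holds across manufacturers.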

The process of developing a strategic IoT solution such as a DT highlights the leverage both customer and vendor get from working jointly on solutions. Reviewing the shared artifacts from IBM engagements as an example highlights the nuanced risk that competing customers take in participating in the process.

Joint Operation Yields Data That Improves DT Design and Operational Services Long After Go Live.

The same skills shortage that drives joint development is also driving joint operation. Unlike past versions of joint operation, this time vendors can get access to the critical data from ongoing operations. While vendors typically will not share this data, it informs and improves the automated decisions in the analytic black box through which it flows via continuous training. But even more crucially, the data enables IBM to augment those decisions. The operational data supports simulations that inform the design of ever more advanced models that can prescribe valuable “after-market” services. The same risks of IP leakage exist with these operational models as they do with the models coming from joint development.

IBM and its customers both get leverage when IBM jointly operates competing DTs

GE’s growth over Jack Welch’s 20-year tenure was built on capturing value beyond just manufacturing products. GE also focused on “after-market” services such as finance and maintenance, repair, and overhaul, which often represented more than 70% of total spend over a product’s lifetime. Operational data represents a similar, major new source of after-market value. Continuing with the jet engine example, a new model from Pratt & Whitney with 5,000 sensors throws off as much as 10GB of data per second. That level of data generation means a twin-engine jet flying a 12-hour route can accumulate close to a PB of data. Many new, ongoing services are possible with that operational data.
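The “close to a PB” figure follows from simple arithmetic. Here is a quick check, assuming both engines stream at the quoted peak rate for the whole flight and using decimal units (1 PB = 1,000,000 GB); sustained real-world rates would likely be lower.

```python
GB_PER_SECOND_PER_ENGINE = 10   # peak rate quoted in the article
ENGINES = 2                     # twin-engine jet
FLIGHT_HOURS = 12
SECONDS_PER_HOUR = 3600
GB_PER_PB = 1_000_000           # decimal units (assumption)

total_gb = GB_PER_SECOND_PER_ENGINE * ENGINES * FLIGHT_HOURS * SECONDS_PER_HOUR
total_pb = total_gb / GB_PER_PB

print(total_gb)   # 864000
print(total_pb)   # 0.864
```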

DT IoT solutions consume and analyze real-time data. But operating the solutions can be almost as difficult as building them. This difficulty is partly because the solutions are still semi-custom and partly because the skills for operating the solutions are almost as scarce as they are for development. As a result, 80% of the Watson IoT Consulting group’s clients engage IBM for some amount of ongoing management. That means live customer data continues to flow through the analytic “black box” long after the original solution goes into production.

So what does joint access to the data mean? Just as generic DTs of a jet engine share a common structure, they also share common operational behavior. For example, the turbine blades all share operational metrics such as rotational speed, temperature, vibration, and air pressure. And just as joint development improves structural fidelity over time, jointly running the DT improves the DT’s operational fidelity.

Customers benefit powerfully from that greater fidelity. For example, IBM could work with engine manufacturers to collect operational data in real time across the manufacturers’ entire installed base. The DTs could then develop much richer representations of how the products behave over their lifecycles. A DT of a jet engine can evolve to capture not only how it was designed, but how it was manufactured, delivered, operated, and serviced. Having this work across manufacturers would produce many times more data and, therefore, better prescriptions. Engine manufacturers would have new ongoing services to offer their airline customers. The manufacturers could prescribe highly specific operational best practices for the fleet, for particular routes, climates, and air quality conditions that wear the engines differently, and even for particular pilots. Prescriptions could optimize for uptime or fuel efficiency or other outcomes.

Customers can’t avoid risk of competitive leakage of after-market value-add services

While history doesn’t repeat itself, it’s often close enough to “rhyme”. Just as joint development risks IP leakage, so does joint operation.

Let’s continue to use the metaphor of input data, an analytic black box, and insights as output. This time we are going to map operational data to the artifacts in a DT that benefit from joint operation and reuse.

“Black Box” Artifact	Role in reusable joint DT operation
Digital Twin / Knowledge Graph	At this level of abstraction, the Knowledge Graph presents a consistent representation of the ever richer behavioral functionality of a jet engine’s DT.
Operational model	Unlike joint development, joint operation allows a vendor such as IBM to continue improving the shared analytic black box after it goes into production. As more sensors capture more operational data across more instances of a jet engine, the behavioral fidelity of the DT increases just the way joint development improves its structural fidelity.
Level of detail	Operational level of detail, or fidelity, maps closely to the operational data coming from the rapidly growing number of sensors. The number of sensors is a proxy for representing the “resolution” of the operational model of more of the components within the DT.
Machine learning models	Access to operational data has the greatest impact on shared machine learning models. Instead of just using training data to build models at design time, ever increasing levels of detail enable data scientists to build more and richer models. Richer models represent DT operational behavior with greater fidelity. As machine learning models get richer, they can ultimately support simulations that enable new ways of modeling behavior, not just tweaking existing models. Machine learning richness can come in the form of more features that capture how the DT works as well as the use of multiple models working together in an ensemble.
Canonical model	A canonical jet engine DT ultimately represents the artifact, with its common operational attributes, that is shared across product models and manufacturers. It gets richer over the operational lifecycle of products. That increasing richness ultimately represents the IP that can “leak” between competing customers.

Table 2: The artifacts used in joint operation of DT solutions.

Joint development of DTs can create highly strategic applications but the process itself creates risks of IP leakage. At the most concrete level, IBM says it can reuse the analytic black box in new customer engagements. But there is a precise boundary beyond which IBM won’t share IP. The weightings, or coefficients, of the features in the machine learning models remain with each customer. These weightings translate into how much each factor influences the output of the models.
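The boundary described here, shared features but private coefficients, can be sketched as follows. This is a minimal sketch under stated assumptions, not IBM’s actual mechanism: the `TwinModel` class, the `shareable_copy` function, and the coefficient values are hypothetical, and the feature names reuse the turbine-blade metrics mentioned earlier.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TwinModel:
    """Hypothetical model artifact: feature structure plus learned weights."""
    feature_names: List[str]
    coefficients: Optional[Dict[str, float]] = None  # customer-specific


def shareable_copy(model: TwinModel) -> TwinModel:
    """Keep the feature structure, drop the customer-trained coefficients."""
    return TwinModel(feature_names=list(model.feature_names), coefficients=None)


customer_a = TwinModel(
    feature_names=["rotational_speed", "temperature", "vibration", "air_pressure"],
    coefficients={"rotational_speed": 0.4, "temperature": 0.3,
                  "vibration": 0.2, "air_pressure": 0.1},  # made-up values
)

template = shareable_copy(customer_a)
print(template.feature_names)   # the feature list carries over to the next engagement
print(template.coefficients)    # None: the weights stay with customer A
```

Even with the coefficients stripped, the feature list itself reflects what was learned during customer A’s engagement, which is the nuance data scientists should examine when the contract defines what may be reused.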

Action item. These dynamics of joint development and operation will affect how systems and digital businesses are designed while the technology matures and the required skills are scarce. Customers need to be extremely precise in contractual negotiations with vendors such as IBM about the exact IP that can be shared. In fact, customers should include data scientists in the contract negotiations that their lawyers normally drive. Some level of IP reuse is going to be a fact of life. Customers should make sure vendor contracts give them a migration path to access future improvements in DT solutions that vendors jointly develop with other customers. Not all improvements can be backward compatible, but DT canonical models can support this critical process.


George Gilbert

George Gilbert, lead data & analytics analyst for theCUBE Research. Former Gartner analyst, former lead enterprise software analyst for Credit Suisse First Boston, one of the top investment banks serving the technology sector. Big Data analyst for Gigaom Research. Co-founded Techalphapartners, a consultancy that advised vendors and institutional investors on market development and product strategy. George has led conference panels with prominent thought leaders in cloud infrastructure and big data. He has been profiled on the front page of the Wall Street Journal and published as a guest author in a major overview of the evolution of cloud computing in The Economist. Prior to being an analyst, George was a product manager on Notes at Lotus Development. George received his BA in economics from Harvard University.
