
2017 Big Data & Machine Learning Predictions

Premise: Machine learning applications that deliver strategic differentiation will remain science projects, but mainstream enterprises will at least be able to see a path to their adoption starting in 2017.

The challenge facing the big data arena in 2017 is complexity. Why are so many pilots failing? Because setting up big data infrastructure consumes significant resources and diminishes the focus on delivering the actual big data outcomes. Why are developers still largely on the sidelines, despite big data being around in some form for nearly a decade? Because toolset complexity continues to grow as open source-based start-ups try to position themselves into sustainable niches. Why is it difficult to leverage big data successes across multiple applications? Because analytic pipeline complexity often takes bespoke forms that reduce opportunities to leverage pipeline investments for other uses.

The general theme for our 2017 big data and machine learning predictions is that the industry starts addressing these and other forms of complexity directly. Will this solve all complexity-related problems? No. Too many shops still struggle to develop the type of concrete use cases required for big data investments to succeed. But until infrastructure, tool, and pipeline capability complexities are addressed, most companies won’t get the chance to excel at defining use cases.

The need is clear. Here are actions we predict big data players will take in 2017:


  • The leading public cloud vendors drive machine learning apps mainstream.
  • Machine learning reaches beyond data scientists to mainstream developers.
  • Strategic machine learning apps remain science projects that IBM, Accenture, and Palantir can deliver to mainstream customers.
  • Live machine learning models with data feedback loops become the source of sustainable differentiation for enterprises.


1 – The leading public cloud vendors drive machine learning apps mainstream.

While the big data ecosystem has rapidly invented and delivered an impressive array of big data tooling, the conventions for how to fit all this great software into reliable, repeatable, and leverageable analytic pipelines don’t exist. Public cloud vendors are in the best position to take machine learning applications mainstream because they can:

  1. Simplify how to fit the building blocks together. They can coordinate product roadmaps across a very wide range of services that constitute the analytic data pipeline at the heart of big data and machine learning applications. That top-down coordination can minimize the “seams” between the services, offering greater simplicity to developers and administrators. In fact, building the products exclusively for operation in the public cloud can turn the products into SaaS applications that have minimal administrative demands.
  2. “Feed” an enterprise sales and service organization. Larger vendors have deep enough product lines and the market presence required to pursue large deals, which means they can pay for direct sales forces. That go-to-market strategy is critical for helping mainstream companies evaluate, architect, deploy, and operate custom or semi-custom machine learning applications.
  3. Design their technology to minimize the cost of operation. Azure runs millions of instances of SQL Server databases. If the Azure version of SQL Server required traditional on-premises DBA ratios, it would need tens or even hundreds of thousands of DBAs. The design goal for cloud services is lights-out operation. While public cloud customers will face the trade-offs of less choice and greater lock-in, Wikibon believes those trade-offs will be attractive relative to finally making the technology accessible to organizations without a large staff of data scientists and specialized admins.


2 – Machine learning reaches beyond data scientists to mainstream developers.

Machine learning to date (outside the large Internet vendors) has mostly consisted of “science projects” in which highly trained data scientists build predictive models. Several challenges keep it that way. First, mapping business data to machine learning algorithms requires close collaboration between data engineers and data scientists, and the latter are in extremely short supply. Second, once the models are built and trained, operationalizing these custom models outside the machine learning development environment means contending with poor tooling and fragmented technology. These two factors have severely limited adoption in mainstream enterprises. Mainstream developers need all this complexity hidden behind simple API’s.

Within a still emerging ecosystem of machine learning technology, basic capabilities are being “packaged” into API’s that mainstream developers can use without collaborating with data scientists. For example, IBM’s Watson API’s now include a framework for building a chat bot and integrating it with Facebook Messenger, Slack, and other services. Several years ago, conversational bot technology belonged in the realm of the biggest tech companies. With Watson, after uploading the data, any developer can call the API from any application. However, the crucial data feedback loop that improves the model as it’s used over time doesn’t exist yet. It’s on IBM’s roadmap, but it’s not part of the current package. Public cloud vendors are introducing many other out-of-the-box predictive services as well. For example, recommendation engines accessible via API’s are widely available. But they are typically limited to recommending products from people with overlapping likes. As the simple API’s present ever more sophisticated machine learning capabilities, mainstream developers will be able to adopt them with less and less friction.
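The “overlapping likes” approach described above is classic user-based collaborative filtering. A minimal sketch of the idea, using hypothetical data rather than any vendor’s actual API:

```python
# Minimal user-based collaborative filtering sketch (hypothetical data,
# not any vendor's API): recommend items liked by the users whose
# "likes" overlap most with the target user's.

def recommend(likes, target, k=2):
    """likes: dict mapping user -> set of liked items."""
    target_likes = likes[target]
    # Score every other user by how many likes they share with the target.
    overlap = {
        user: len(items & target_likes)
        for user, items in likes.items()
        if user != target
    }
    # Pool the items liked by the k most similar users, minus what the
    # target already likes, ranked by how many neighbors liked each item.
    neighbors = sorted(overlap, key=overlap.get, reverse=True)[:k]
    candidates = {}
    for user in neighbors:
        for item in likes[user] - target_likes:
            candidates[item] = candidates.get(item, 0) + 1
    return sorted(candidates, key=candidates.get, reverse=True)

likes = {
    "alice": {"A", "B", "C"},
    "bob":   {"A", "B", "D"},
    "carol": {"B", "C", "E"},
    "dave":  {"F"},
}
print(recommend(likes, "alice"))  # bob and carol overlap most; suggests D and E
```

This is exactly the limitation the text notes: the recommender only knows who likes what, nothing about the items themselves.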


3 – Strategic machine learning apps remain science projects that IBM, Accenture, and Palantir can deliver to mainstream customers.

Netflix’s recommendation service is an example of an application delivering strategic differentiation. Netflix also qualifies as one of the big Internet vendors with enough in-house skills to build such an application, something few other companies can undertake. Unlike out-of-the-box recommenders, Netflix ranks recommendations by finding similar people and movies based on their unique attributes, such as a movie’s genre, storyline, actors, and director. Netflix can also learn in real-time while a user is browsing the library. The recommender even highlights some of the “long tail” of its catalog. The more time users spend watching movies other than the most popular titles, the more bargaining power Netflix has with its suppliers. This type of learning application is far from being available as an off-the-shelf solution. Out-of-the-box recommenders typically just take a user with a partial list of likes and fill out the list based on similar but more comprehensive lists from other users.
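Ranking by shared item attributes, as the Netflix example suggests, can be sketched as simple content-based similarity over item metadata. This toy illustration is an assumption about the general technique, not Netflix’s actual algorithm:

```python
# Toy content-based ranking sketch (illustrative only; not Netflix's
# actual recommender): score candidate movies by how many attributes
# (genre, director, actors, ...) they share with a movie the user liked.

def attribute_score(a, b):
    """Jaccard similarity between two attribute sets."""
    return len(a & b) / len(a | b)

movies = {
    "Heat":         {"crime", "thriller", "mann", "pacino", "deniro"},
    "Collateral":   {"crime", "thriller", "mann", "cruise"},
    "The Irishman": {"crime", "drama", "scorsese", "pacino", "deniro"},
    "Top Gun":      {"action", "cruise"},
}

def rank_similar(liked, movies):
    """Rank every other movie by attribute overlap with the liked one."""
    liked_attrs = movies[liked]
    scored = [
        (title, attribute_score(liked_attrs, attrs))
        for title, attrs in movies.items()
        if title != liked
    ]
    return sorted(scored, key=lambda ts: ts[1], reverse=True)

for title, score in rank_similar("Heat", movies):
    print(f"{title}: {score:.2f}")
```

Because the scoring uses the items’ own attributes rather than other users’ lists, it can surface “long tail” titles that no overlapping user has rated yet.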


The machine learning chops to build applications approaching Netflix’s sophistication belong to high-end professional services firms such as IBM, Accenture, and Palantir, and to boutiques such as Pivotal Labs and Silicon Valley Data Science. The critical skills include mapping industry- and company-specific information, such as grocery inventory and replenishment data, to learning algorithms whose variables and parameters are meaningful only to data scientists. Other critical skills include training those models, integrating them with operational applications, comparing the predictions with the application’s actual results, and continually retraining the models on that feedback. This data science functionality isn’t available in packaged applications outside specialized verticals such as ad-tech.


4 – Live machine learning models with data feedback loops become the source of sustainable differentiation for enterprises.

Prediction #3 above about strategic machine learning apps is all about rare skills. Data scientists and data engineers have to map machine learning algorithms to domain-specific data in order to create a predictive model. But having the right mix of skills to build these models is table stakes. It doesn’t create sustainable differentiation.

Sustainable differentiation requires data feedback loops, generated by operating an application, that continuously improve the predictive model. That connection between the software and the data is a living model that gets better with use. It also exhibits network effects: the more users the living model has, the harder it is for a competitor to start later, collect similar data volumes, and kick-start its own self-reinforcing feedback cycle. There are two types of feedback loops: one features strategic data that an enterprise wouldn’t want to share; the other is share-able because the data is non-strategic.
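A data feedback loop in its simplest form is just online learning: the model updates every time the application reports an outcome back. A minimal sketch of the pattern, with hypothetical names and no particular vendor’s implementation in mind:

```python
# Minimal "living model" sketch: an online success-rate estimator that
# updates with every outcome the application feeds back. Hypothetical
# illustration of a data feedback loop, not a production design.

class LivingModel:
    def __init__(self):
        self.successes = 0
        self.trials = 0

    def predict(self):
        # Laplace-smoothed estimate so the cold-start prediction is 0.5.
        return (self.successes + 1) / (self.trials + 2)

    def feedback(self, outcome):
        # The application reports what actually happened, closing the loop.
        self.trials += 1
        self.successes += int(outcome)

model = LivingModel()
print(model.predict())  # 0.5 before any data
for outcome in [True, True, False, True]:
    model.feedback(outcome)
print(model.predict())  # estimate sharpens as feedback accumulates
```

The network effect follows directly: a competitor starting later begins at the cold-start prediction and must accumulate comparable feedback volume before its estimates catch up.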

  1. Strategic feedback loops require privacy: Bloomberg has traditionally been a market data provider, but some of its most sophisticated customers are exploring hosting their predictive models with Bloomberg so that it becomes a provider of analytic feeds. Bloomberg also knows that it can’t build its own private or shared models on top of these new feeds. Similarly, when building custom applications, IBM knows that if it works with customers in one industry, it can’t leverage the living models of multiple competitors or it will never be trusted. Ginni Rometty, its CEO, said as much during a talk at the Churchill Club on November 9.
  2. Non-strategic feedback loops benefit all who share the “commons”: IBM’s The Weather Company offers an example of an app that can only exist on shared data, but the data is non-differentiating to its customers. Its turbulence prediction map runs as part of a suite of apps on airline pilots’ iPads. The gyroscopes in each iPad collect turbulence data in flight and pass it back to the cloud in real-time, where all the feeds are combined with other weather data and turned into a real-time map that guides other pilots. Competing airlines share this data because it’s non-differentiating, but The Weather Company’s high market penetration and first-mover advantage make it virtually impossible to dislodge with current technology. Another example of a living model with network effects built on non-differentiating customer data is cybersecurity intrusion patterns. A living model confined to one enterprise cannot possibly keep up with the always-morphing attack patterns that large-scale aggregation could detect. The cybersecurity service provider that can own this living model will have an incredibly valuable franchise.


Action item

Look to the public cloud vendors not only to simplify big data infrastructure, but also to make machine learning more accessible to mainstream developers by packaging functionality into easy-to-use API’s. For those enterprises that want to undertake strategic machine learning applications, partner with a professional services firm with the requisite skills, such as IBM, Accenture, or Palantir. Finally, put data feedback loops in place in order to keep improving the living models and ensure sustainable differentiation.
