Wikibon Big Data Definitions and Methodology

By Dr. Ralph Finos | March 30, 2016

Contributing Analysts

Ralph Finos

George Gilbert

David Floyer

Peter Burris

Overview

This document describes Wikibon’s definition of big data and big data categories. It also serves as the methodological underpinning for our big data market forecasts and other related forecasts, as well as vendor big data market shares. This research document is used as a basis for related Wikibon research.

Big Data Definition

The big data market is defined as workloads with data sets whose size, type and variety, speed of creation, and velocity make them less practical to process and analyze with traditional infrastructure and software technologies, and which therefore require new tools and management processes to execute and manage successfully.

Representative workloads that meet our definition include the following. The list is illustrative, not exhaustive.

Customer data analytics
- Customer segmentation
- Customer churn analysis
- Recommendation engine (e.g., upselling customers)
- Patient diagnosis, outcomes and remote management
- Sentiment analysis
Transaction analytics
- Fraud detection
- Marketing campaign analysis
- Supply chain optimization
- Workflow optimization
Risk management
- Financial
- Weather
- Security Analytics – anomaly detection, threat assessment
Machine-related monitoring/prediction
- IT operations support
- Industrial equipment predictive maintenance and failure reporting
- Gas & oil field monitoring
- Network monitoring and analysis
- Smart meters and Smart Grid
- Application performance management
- Distributed sensor data management, analysis, and coordination (Internet of Things)
Spatial and Location-based applications
- Smart Cities
- Transportation
- Emergency response
- Field force management
Rich media analytics
- Surveillance
- Entertainment
Content Analytics
Data Lakes

Big Data Revenues

Wikibon counts revenues derived from sales of hardware, software, and services to the end users of big data, who in turn use big data and big data analytics for their enterprise. Typically this involves an enterprise:

- Purchasing compute clusters, storage, Hadoop, or other big data-related hardware and infrastructure software
- Purchasing analytic software, big data tools, databases, middleware, application software, or application services (e.g., SaaS services) that use the big data infrastructure and create business value
- Deploying internally or using external professional services (business and IT consulting, system integration, application development, data management) to realize the value of big data for the enterprise

Product and Service Categories

In this report, Wikibon defines the product and services segments as follows:

Hardware – Hardware includes compute, storage, and networking. A fundamental premise of big data is to leverage a linear scale-out (versus scale-up) approach where possible to reduce the cost of the underlying hardware architecture supporting big data storage, processing, and analysis. As such, the majority of hardware-related revenue in the Big Data market is associated with commodity servers with direct-attached storage that support scale-out Hadoop and MPP analytics database clusters. A major change in hardware will start to be deployed in the second five years of the forecast, as edge computing takes hold to manage the sensors distributed in all areas of human endeavor, including agriculture. Key vendors include HP, EMC, Dell, and Cisco. For the purposes of this study, Wikibon includes the IaaS portion of big data Public Cloud services (e.g., AWS, Microsoft Azure) as hardware.
Core Technologies – Hadoop, Spark and Streaming – Hadoop, Spark, and Streaming are the core foundation technologies that enable large-scale big data storage, processing and, ultimately, analytics. Revenue from additional data management, analytics, and applications that reside on or utilize the Hadoop, Spark, and Streaming platforms is counted in those categories (defined below). Spark and Streaming are poised to become important complements and components of the Big Data foundation over the next few years. Examples of providers of Hadoop platforms include Cloudera, Hortonworks, MapR, and IBM.
Big Data Database – Traditional software tools that are utilized to support big data deployments and usage are included in this category, as well as NoSQL databases. These may be tools developed for more traditional database environments, but they are included in the big data market to the extent that they have been extended to support big data use or are adaptable to solving big data workload and data set requirements as defined above. Despite the emphasis on new approaches to data analytics (e.g., Hadoop), SQL-based relational technologies also play an important role in developing and creating big data applications. This is because SQL is well understood by systems developers and system development senior executives. SQL can significantly reduce the cost of development, as more is done for the programmer. NoSQL has lower overheads but needs more complex coding, testing, and maintenance. Big data database includes both relational database management systems, which feed big data technologies such as Hadoop and in some cases are used to operationalize big data insights, and data warehouses, which often are used as environments to blend big data-driven insights with more traditional data and reporting. There is also an emphasis on applying SQL techniques to Hadoop, with all the Hadoop distribution vendors developing SQL-on-Hadoop technologies. Wikibon expects the Hadoop and SQL segments of the big data market to increasingly overlap in the coming years. In-memory databases, both DRAM and flash, as applied to big data workloads are included here as well. Examples of providers of big data database software tools include IBM, HP, SAP, Oracle, Pivotal, and Teradata. Examples of providers of NoSQL software include Aerospike, Amazon, MarkLogic, DataStax, MongoDB, Couchbase, and Basho. NoSQL databases are usually supplied with SQL options, for the reasons given above.

Data Management – Big Data optimized data integration and data quality platforms, tools, and services supporting Big Data workloads and applications are included here. Layered on top of the core Big Data Hadoop and NoSQL platforms are the data management tools required to make Big Data useful: data integration, data transformation for analysis, applications development, data veracity and quality management, data governance and compliance, and process integration software as applied to Big Data workloads. This segment initially formed around the concept of the data lake as a foundational approach to Big Data platform deployments. Data management is critical today in creating data that is readily adaptable for applications that optimize business processes. Examples of providers of Big Data data management tools include IBM, SAP, Informatica, Oracle, Talend, Syncsort, Datameer, Attunity, Paxata, Trifacta, Tibco, DataTorrent, Attivio, and Altiscale.

Big Data Applications, Analytics, and Tools – This market segment is where the “rubber meets the road”, i.e., where the transformative business value of big data analytics and usage takes shape. It includes data science, analytics, business intelligence, and data visualization tools and applications, as well as packaged analytics-focused applications. Today, the vast majority of revenue in this software category is associated with business intelligence and data visualization tooling as opposed to packaged, operational big data applications. Wikibon believes the latter, however, will have a bigger impact on the value delivered from big data than business intelligence. Applications that automate business processes hold significant promise to transform business processes across industries, but the maturation of applications will lag other parts of the big data market due to the complexity of the technologies involved (e.g., artificial intelligence, machine learning, cognitive computing). Over time, Wikibon expects this segment of the market to grow significantly as vendors provide mature packaged application offerings that leverage and deliver the results of advanced analytics, data science, and cognitive computing to end users. These offerings will come from software designed to run on internal private or public cloud, as well as services provided by Software-as-a-Service (SaaS) vendors. This growth will come at the expense of professional services, which will become less critical to big data projects as software matures. Examples of vendors in this category include IBM, SAP, Palantir, SAS Institute, Splunk, Tableau Software, and Qlik.
Professional Services – Professional services play a crucial role in helping big data practitioners efficiently and effectively apply the technology to real-world business problems. This includes identifying initial use cases, designing and deploying the supporting infrastructure, architecting data flows and transformations to enable data lakes, practicing analytics and data science to derive insights from Big Data, and providing consulting, education, and related services to operationalize big data applied to specific business challenges. The market is led by the usual large system integrators (IBM, Accenture, etc.), but also includes thousands of small and mid-sized SIs and consultancies, especially those with traditional data warehouse, analytics, and business intelligence DNA, as well as those with vertical market expertise. While the big data professional services market makes up the largest slice of the overall big data market today, Wikibon expects professional services to become less critical to big data projects over the long term (5-10+ years out) as software (both platforms and applications) and hardware (private and public clouds) mature. This is projected to make big data more accessible to less sophisticated practitioners and organizations without the assistance of armies of consultants. Examples of vendors in this category include IBM, Accenture, Palantir, and Teradata.
Public Cloud Services – The market for public cloud services (native big data tools, plus IaaS and PaaS supporting native and third-party big data tools and workloads) is an increasingly important factor in the market. A growing portion of big data development and applications is finding its way to the cloud, and in many cases is already resident there. Cloud-native services such as AWS Kinesis and Azure HDInsight are gaining traction, especially with cloud-resident data sets. Hybrid solutions involving data moving back and forth between on-premises workloads and the cloud are becoming more frequent, although they are constrained by the latency of moving such large data sets. Examples of vendors in this category are AWS, Microsoft Azure, IBM, and Google GCP. Public cloud revenue derived by AWS, Microsoft Azure, Google GCP, etc. from their customers is treated as an orthogonal view of the overall forecast. PaaS and SaaS software native to public clouds is included in our software figures. Software delivered via third parties is attributed to each third-party provider. Hardware (typically ODM) required to deliver big data public cloud services includes IaaS to support native and third-party software, as well as hardware services used for big data directly by enterprises for their own in-house code. Any related professional services delivered by public cloud providers are counted as professional services.
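The attribution rules for public cloud revenue described above can be illustrated with a minimal sketch. This is not Wikibon’s actual model: the record shape, the `kind` labels, the function names, and the revenue figures are all hypothetical, and the several software segments are collapsed into a single "software" bucket for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class CloudLineItem:
    """One revenue line sold through a public cloud (hypothetical record shape)."""
    cloud_provider: str                    # e.g. "AWS"
    kind: str                              # "iaas", "paas", "saas",
                                           # "third_party_software", "professional_services"
    revenue_musd: float                    # revenue in millions of US dollars
    software_vendor: Optional[str] = None  # set only for third-party software


def attribute(item: CloudLineItem) -> Tuple[str, str]:
    """Return (segment, credited_vendor) for one line item."""
    if item.kind == "iaas":
        # The IaaS portion of big data public cloud services counts as hardware.
        return ("hardware", item.cloud_provider)
    if item.kind in ("paas", "saas"):
        # PaaS and SaaS native to the public cloud count as software.
        return ("software", item.cloud_provider)
    if item.kind == "third_party_software":
        # Third-party software is credited to the third party, not the cloud.
        if item.software_vendor is None:
            raise ValueError("third-party software needs a software_vendor")
        return ("software", item.software_vendor)
    if item.kind == "professional_services":
        return ("professional_services", item.cloud_provider)
    raise ValueError(f"unknown kind: {item.kind!r}")


def roll_up(items: Iterable[CloudLineItem]) -> Dict[Tuple[str, str], float]:
    """Sum revenue per (segment, credited_vendor) pair."""
    totals: Dict[Tuple[str, str], float] = {}
    for item in items:
        key = attribute(item)
        totals[key] = totals.get(key, 0.0) + item.revenue_musd
    return totals


# Placeholder figures for illustration only, not Wikibon data.
items = [
    CloudLineItem("AWS", "iaas", 10.0),
    CloudLineItem("AWS", "paas", 4.0),
    CloudLineItem("AWS", "third_party_software", 2.0, software_vendor="Cloudera"),
    CloudLineItem("AWS", "professional_services", 1.0),
]
totals = roll_up(items)
```

Keying the totals on both segment and credited vendor keeps the segment view and the vendor market-share view derivable from the same roll-up, which is one way to read the "orthogonal view" of public cloud revenue.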

Big Data Application Patterns

Wikibon has extended an orthogonal view of its big data forecast to highlight the key classes of use cases that we see forming and their growth over time. This analysis is covered in our report Forecasting Big Data Application Patterns. Briefly:

Data Lakes pattern. Data lakes are going to go through a transition from repositories that complement data warehouses to platforms that produce production-ready predictive models, based on machine learning, not of population segments but of individuals. Successful practitioners are manually integrating disparate tools for data prep, modeling, and visualization and starting to simplify their execution engines with Spark.
Designing Intelligent Systems of Engagement pattern. Applications with an engaging user experience grow in value if developers can integrate predictive models to anticipate, influence, and ultimately personalize each interaction. That requires deep integration of models between outward-facing consumer applications and live Systems of Record running operations.
Evaluating Intelligent Self-Tuning Systems pattern. Many applications, like fraud detection, have to continually learn new patterns of activity. These applications need machine learning to track the new patterns automatically, without having data scientists always in the loop. Lots of new computer science is required, but it is beginning to show up in technologies such as Spark and, in key cases, customers are starting to experiment as design partners for leading-edge vendors.

2015 Changes in Big Data Definitions from Prior Years

It is critically important to understand how Wikibon defines big data as it relates to the market size overall and to revenue estimates for specific vendors in particular. In our 2016 sizing of the 2014 and 2015 markets and projections to 2026, entitled “2016 – 2026 Worldwide Big Data Market Forecast”, we have altered our definition to reflect the evolution of the market and the appearance of product suites from a wide variety of vendors aimed at solving big data problems.

In Wikibon’s prior forecasts, we included spending on projects where practitioners embraced an exploratory and experimental mindset regarding data and analytics, replacing gut instinct with data-driven decision-making. In prior years, workloads and projects whose processes were informed by this mindset met Wikibon’s definition of big data and were included, even in cases where some of the tools and technology involved may not have met it.

This definition was suitable for the state of a disruptive and initially forming market where practitioners were often using free, open-source tools to experiment with the possibilities of realizing the business value of this technology. The market was led in this period by Fortune 500 enterprises vying for advantage at large scale and by web-native data-driven companies. We believe that the market has moved beyond its infancy into (perhaps) its adolescence and is well on its way to a modest level of maturity and rationalization that will help facilitate real business process solutions and value.

As a result, our 2016 Forecast takes a narrower perspective that reflects big data’s move into the mainstream with a landscape of targeted tools and solutions aimed at enabling more modest enterprises to participate meaningfully. Also, we recognize that users are moving beyond free open-source tools to paying for support so that they can confidently move to real workloads with real business value.

As such, we believe it is time to require that the big data workload and dataset be the primary gate – with “mindset”, per se, being excluded. However, we do include some traditional database and application components and tools which include capabilities and extensions to support big data use cases where workload and data set size, type, and speed-of-creation are the primary considerations for solution selection.

In addition, we have redefined the software taxonomy in the following way to accommodate the trends that we see emerging over the forecast period:

Data Management – No change from our 2015 Big Data Market Forecast
Core Technologies – Hadoop, Spark and Streaming – We are extending the Hadoop software category to account for Spark and Streaming, since these will become increasingly important components of big data solutions in the coming years.
Big Data Database – This new category recognizes that traditional SQL tools and database technologies are converging with NoSQL approaches and that the separation of the two is an artifact of an earlier time.
Big Data Applications, Analytics, and Tools – No change from our 2015 Big Data Market Forecast

Big Data Market Share & Forecast Research Methodology

Wikibon has built a body of vendor and user research on Big Data since 2012, when we initiated our first annual Big Data Market Forecast. Wikibon’s big data market size, forecast, and related market-share data are based on hundreds of extensive interviews with vendors, conversations on theCube at big data events, discussions with venture capitalists and resellers regarding customer pipelines and product roadmaps, and feedback from the Wikibon community of IT practitioners. Third-party sources include public financial data, media reports, and Wikibon and third-party surveys of big data practitioners. Information types used to estimate the revenue of private big data vendors include supply-side data collection, number of employees, number of customers, size of average customer engagement, amount of venture capital raised, and the vendor’s years of operation.

Wikibon’s overarching research approach is “Top-Down & Bottom-Up”. That is, we consider the state and possibilities of technology in the context of the potential business value that is deliverable (Top-Down) and leaven that with both supply-side (vendor revenue and directions, product segment conditions) and demand-side (user deployment, expectations, application benefits, adoption friction, and business attitudes) perspectives (Bottom-Up). In general, we believe a ten-year forecast window is preferable to a five-year forecast for emerging, disruptive, and dynamic markets because there are significant market forces, among both providers and users, that won’t play out completely over a shorter time period. By extending our window we are able to describe these trends better and indicate more clearly how Wikibon believes they will play out.


Dr. Ralph Finos

Ralph Finos, Ph.D., is an innovative IT industry market research analyst and consultant with 20+ years designing and executing strategic projects for IT vendors. He’s experienced in applying the right methodology and approach to solve challenging market research problems. He complements his industry knowledge and research know-how with hands-on management and continuous communication and collaboration with clients throughout the engagement. Dr. Finos’s areas of interest include the fundamental adoption drivers and behavior involved in technology innovation and the development of market and company models. Recent project work includes sizing and forecasting cloud markets, big data user trends, organizational and job role implications of storage management, and analysis of the application of technology to line-of-business applications. He earned a BA from Bowdoin College and a Ph.D. from Hofstra University in Applied Research. He led IDC’s US Hardware, Software, and Services Consulting practice for many years before founding ralphfinosconsulting.com.
