Premise
In just about any software market, customers have a choice between best-of-breed and integrated tools. The direction an organization takes depends on the degree of differentiation and value-add derived from each option. Integration yields simplicity, greater manageability, and lower operational costs, but it limits differentiation and places the value-add mandate outside the software technology stack (e.g., business model, unique IP, pricing).
This research note applies that premise to highlight some of the trade-offs between specialized data preparation tools and integrated products. Although this is a high-level look at the market, the examples in each category should illustrate which approaches resonate with IT practitioners' needs. Informatica has earned a place in the comparison with the announcement of its Big Data v.10 product.
Approaches to building tools for the analytic data pipeline
Given the steps laid out in the previous article on requirements, we can map the scope of integration that different vendors' tools offer. Competing on scope of integration highlights the trade-off between having different modules reinforce one another and building the richest possible functionality into any one module.
Extract + Load/Ingest
This step was left implicit in the last article as the first stage of the pipeline. In the data warehouse era, ETL tools had their own connectors to applications and databases, and these fed the transformation hub. In the Data Lake era, Apache Kafka is rapidly taking over that role. It doesn't substitute for the rest of the pipeline, but it does make connectivity and transport fast and easy.
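As a concrete illustration, the sketch below publishes records extracted from a source system onto a Kafka topic, where any number of downstream stages can consume them independently. It assumes the open-source kafka-python client and a broker at localhost:9092; the topic name and record fields are hypothetical.

```python
# Minimal sketch of the ingest step: publish source-system records to
# Kafka for downstream pipeline stages to consume. Assumes the
# kafka-python package and a broker at localhost:9092; the topic name
# and record fields are hypothetical.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Each extracted record is published once; wrangling, governance, and
# the Data Lake itself can all subscribe to the same topic.
for record in [{"order_id": 1, "amount": 250.0}, {"order_id": 2, "amount": 75.5}]:
    producer.send("source-app.orders", value=record)

producer.flush()  # block until the broker has acknowledged the sends
```

This decoupling is what makes Kafka complementary rather than competitive with the rest of the pipeline: producers and consumers evolve separately.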
Governance: Waterline Data, Alation, Attunity
Several standalone governance tools have emerged because the scope of the need is so much greater than it was in the data warehouse era. Two of the most prominent are Waterline Data and Alation. Waterline takes an inventory by crawling the data and then adds structure and meaning to it so that it can be further curated and analyzed manually. In other words, it jumpstarts the process so data engineers and scientists encounter a partially curated pool of data.
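To make the crawl-then-tag pattern concrete, here is a toy sketch of the approach, not Waterline's actual implementation: walk a landing zone of raw files, sample each one, and guess column types so humans start from a partially curated inventory. The path and heuristics are hypothetical.

```python
# Toy illustration of crawl-then-tag governance: profile raw CSV files
# so data engineers inherit a partially curated inventory. The landing
# path and the type heuristic are hypothetical.
import csv
from pathlib import Path

def profile_csv(path: Path, sample_rows: int = 100) -> dict:
    """Infer a rough schema by sampling the first rows of a CSV file."""
    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        rows = [row for _, row in zip(range(sample_rows), reader)]
    schema = {}
    for col in (reader.fieldnames or []):
        values = [r[col] for r in rows if r[col]]
        numeric = all(v.replace(".", "", 1).lstrip("-").isdigit() for v in values)
        schema[col] = "numeric" if values and numeric else "text"
    return {"file": str(path), "columns": schema, "rows_sampled": len(rows)}

# Crawl the lake's landing zone and build the inventory.
inventory = [profile_csv(p) for p in Path("/data/landing").glob("*.csv")]
```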
Alation takes a crowdsourcing approach. The more end users access the data, the more it learns about what’s valuable and how it can be used.
Attunity also belongs in this group: it helps create the proper data warehouse design and then optimizes resource utilization so that the right workloads run on the right platforms.
Security: PHEMI
PHEMI takes a new approach to security. It assumes orders of magnitude more users will be accessing data, so traditional security models built on perimeters and permissions will break down. Instead, it determines what data a particular user can see based on policies that take into account attributes of both the user and the data being accessed.
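The sketch below shows the general shape of attribute-based access control, the policy style this description suggests: evaluate user attributes against data attributes at query time rather than relying on a perimeter. The attribute names and the policy itself are hypothetical, not PHEMI's actual rules.

```python
# Minimal sketch of attribute-based access control (ABAC): visibility is
# decided per request from attributes of the user and the record.
# Attribute names and the policy are hypothetical.
from dataclasses import dataclass

@dataclass
class User:
    role: str
    clearance: int

@dataclass
class Record:
    sensitivity: int  # e.g., 0 = public ... 3 = restricted

def can_see(user: User, record: Record) -> bool:
    """Hypothetical policy: clearance must cover the record's sensitivity,
    and restricted records are visible only to analysts."""
    if user.clearance < record.sensitivity:
        return False
    if record.sensitivity >= 3 and user.role != "analyst":
        return False
    return True

print(can_see(User("analyst", 3), Record(3)))    # True
print(can_see(User("marketing", 3), Record(3)))  # False
```

The key property is that the policy scales with attributes, not with an ever-growing list of per-user permissions.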
Data Wrangling: Trifacta
Trifacta was one of the earliest of the new generation of standalone data preparation tools. They help data analysts and engineers find structure when more automated approaches come up short.
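A toy example of the wrangling problem these tools tackle, with a hypothetical log format and field names: imposing tabular structure on semi-structured text that automated schema inference misses.

```python
# Toy wrangling example: extract a tidy table from semi-structured log
# lines. The log format and field names are hypothetical.
import re
import pandas as pd

raw = [
    "2016-01-04 10:02:11 | user=alice action=login status=ok",
    "2016-01-04 10:03:40 | user=bob action=export status=denied",
]

pattern = re.compile(
    r"(?P<date>\S+) (?P<time>\S+) \| user=(?P<user>\w+) "
    r"action=(?P<action>\w+) status=(?P<status>\w+)"
)

rows = []
for line in raw:
    match = pattern.match(line)
    if match:
        rows.append(match.groupdict())

df = pd.DataFrame(rows)  # now a structured table ready for analysis
print(df)
```

Tools like Trifacta automate and suggest these structural transforms interactively rather than requiring hand-written patterns.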
Integration and Runtime: Hadoop MapReduce, Hive, Spark
Increasingly, vendors are following their customers toward Hadoop infrastructure as the most scalable and cost-effective way to transform data and make it ready for analysis in the Data Lake. In the data warehouse generation, data preparation and integration took place on a proprietary software foundation that typically lived on a very expensive server. Executing on a Hadoop cluster is now table stakes; bonus points go to vendors who can execute on a choice of runtime engines such as MapReduce, Hive, or Spark directly.
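For a sense of what running a preparation step directly on Spark looks like, here is a minimal sketch using the standard PySpark API; the file paths and column names are hypothetical.

```python
# Minimal sketch of a preparation step executed directly on Spark.
# Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("prep-orders").getOrCreate()

orders = spark.read.json("hdfs:///landing/orders")  # raw ingested records

# Typical preparation: standardize types, drop bad rows, derive fields.
prepared = (
    orders
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("amount").isNotNull())
    .withColumn("order_date", F.to_date("ts"))
)

prepared.write.mode("overwrite").parquet("hdfs:///lake/orders_prepared")
```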
Where some of the more integrated vendors fit
Informatica chose a scope of product integration that leaves off just where their last-generation product did: they include everything up to analysis. Their perspective is that few companies standardize on one tool for the whole analytic data pipeline. However, if a vendor can integrate competitive functionality across data wrangling, integration, governance, and security, they can build capabilities spanning those features that standalone vendors can't match.
Talend appears to go further than Syncsort but not as far as Informatica. They find the synergistic intersection between data preparation and integration on one side and master data management on the other. Master data helps ensure the quality of the data they are preparing and integrating.
At the other extreme is Pentaho. They chose to extend data prep and integration all the way through analytics. Their appeal is to customers for whom collaboration around curating the data to be analyzed can help inform what data to prepare and integrate. These scenarios typically show up when the analysis is embedded in an application to drive an operational decision.
Action Item
Choosing a data preparation and integration tool is the first step in upgrading Systems of Record to Systems of Intelligence. When IT decision makers are weighing best-of-breed vs. integrated solutions, they should consider the ultimate objectives of their new applications. If the goal is highly differentiated functionality aligned with a business strategy, specialized systems may have an advantage. If the goal is matching the competition with the new systems and differentiating elsewhere, the greater simplicity of an integrated solution will probably be more competitive.