Formerly known as Wikibon

Adding Data Science To Application Development

Premise

Enterprises in all industries are tasking data science-driven development teams with building their most strategic do-or-die applications.

Analysis

Traditional application development focuses on codifying well-understood business processes (e.g., accounting) and the corresponding data structures (e.g., general ledgers) in software. Generations of developers have spent their careers writing the deterministic business rules that automate a vast range of repetitive tasks. The net result has been to make those business functions more predictable and less prone to human errors, delays, and inconsistencies.

As we push deeper into the 21st century, the era of predominantly deterministic programming is giving way to a new world where many of the most important processes are fundamentally probabilistic. Predictive algorithms capture the uncertain, accidental, and contingent flow of control that drives many real-world applications. For example, predictive algorithms guide how mobile-commerce apps respond to dynamically shifting user clickstreams and physical locations.

In this new era, more business and consumer applications encapsulate probabilistic execution logic that is defined through the techniques of data science. Developers are shifting their focus toward building and optimizing machine learning, predictive analytics, natural-language processing, and other statistical algorithms. Without data scientists, organizations would not be able to develop or manage the statistical logic that drives recommendation engines, facial recognition, streaming media analytics, cognitive chatbots, autocaptioning, mobile experience management, and other innovative features inside today’s most disruptive applications.

Though data scientists’ role in the development process will continue to grow, it won’t be at the expense of traditional programmers. Within multidisciplinary development teams, data science and programming are complementary roles. Whereas data science professionals build and optimize statistical algorithms, programmers write business rules, if/then/else statements, and other types of deterministic code that forms the backbone of practical solutions.
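
As a minimal sketch of how these complementary roles can meet in code, the hypothetical Python below wraps a probabilistic churn score in deterministic if/then/else business rules. The feature names, weights, bias, and threshold are illustrative assumptions, not values from any real model; in practice a data scientist would fit them from data.

```python
import math

# Hypothetical weights a data scientist might have fitted offline (assumed values).
WEIGHTS = {"support_tickets": 0.8, "months_inactive": 0.5}
BIAS = -2.0


def churn_probability(customer: dict) -> float:
    """Probabilistic logic: logistic score in (0, 1) from weighted features."""
    z = BIAS + sum(w * customer.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def retention_action(customer: dict, threshold: float = 0.5) -> str:
    """Deterministic logic: if/then/else rules wrapped around the model's score."""
    if customer.get("account_closed", False):
        return "none"  # business rule: never contact closed accounts
    if churn_probability(customer) >= threshold:
        return "send_offer"
    return "no_action"
```

Here the statistical model only estimates a likelihood; the surrounding rules decide what the application actually does with that estimate, which is where the programmer's deterministic code remains essential.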

To support this shift, organizations must bring data science skills, tools, and methodologies into the heart of their application development initiatives. From the C-level on down, organizations must:

  • Identify how data science professionals can best contribute to development initiatives. Digital business imperatives confront companies in all industries, size classes, and geographies to various degrees. Chief among these imperatives is becoming a more effectively predictive business, able to anticipate and mitigate uncertainties in your competitive environment. Predictive analytics, the heart and soul of data science, can help you survive and even thrive in spite of uncertainties related to marketplace demand, customer sentiment, labor supplies, factor prices, and exchange rates. In a world of uncertainties, data scientists can be your prime resource for ensuring that you have the right algorithmic models to respond and adapt to any contingency. In this regard, businesses should assess the extent to which they’ve factored data science skills, tools, and methodologies into their digitization strategies.
  • Gauge the development team’s data science readiness. Data science requires a wide range of roles, skills, tools, platforms, data, algorithms, and other assets. Depending on the development initiatives in which data science techniques will be applied, these enabling assets may vary considerably. Organizations should continually reassess their readiness to deliver the data science resources needed for successful application development, with a keen focus on current staff capabilities, tool sophistication, and DevOps workflow methodologies.
  • Align data science teams with enterprise app-development organizations. Alongside traditional coding and IT infrastructure management, data science is one of several workstreams that must be managed in parallel for successful delivery of digital business capabilities. Organizations should align the provisioning of data science resources with the structure—centralized, decentralized, or otherwise—of the organization that develops, deploys, and manages business applications and IT infrastructure.

Identify How Data Science Professionals Can Best Contribute To Development Initiatives

Data science is the process of extracting statistical insights from data.

Data science professionals generally work in teams, in which multiple specialties collaborate to discover, acquire, model, train, and refine algorithmically based statistical models. Just as important, data science teams establish repeatable, patterned pipelines for deploying their statistical models into applications, processes, and other touch-points. Furthermore, data science teams improve their models through an iterative cycle of monitoring, revision, redeployment, and renewed monitoring to ensure that they are effective in their designated tasks, such as predicting whether a customer is likely to churn or automatically tagging a caption on a video stream based on its content.
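
The monitoring step of that iterative cycle can be sketched as follows. This is a hypothetical illustration, not a prescribed method: the `ModelMonitor` class, its window size, and its accuracy threshold are all assumptions, and real teams may track different metrics entirely.

```python
from collections import deque


class ModelMonitor:
    """Tracks recent prediction outcomes and flags when retraining looks warranted."""

    def __init__(self, window: int = 100, min_accuracy: float = 0.8):
        self.outcomes = deque(maxlen=window)  # True where prediction matched reality
        self.min_accuracy = min_accuracy      # assumed threshold, tune per use case

    def record(self, predicted, actual) -> None:
        self.outcomes.append(predicted == actual)

    def accuracy(self) -> float:
        if not self.outcomes:
            return 1.0  # no evidence yet; treat the model as healthy
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        # Only judge once the window is full, to avoid reacting to a few samples.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.min_accuracy
```

A deployed churn model, for example, could call `record()` each time a customer's actual behavior becomes known and trigger revision and redeployment when `needs_retraining()` returns true.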

In the real world, data science teams vary widely in size, composition, specialties, tools, projects, and other attributes. However, most data-science teams have individuals who play these core roles:

  • Statistical modelers: These individuals build, train, and iterate statistical models, such as predictive analytics, machine learning, natural language processing, and other algorithmic models, to extract insights from data.
  • Data engineers: These individuals deploy and manage the data acquisition, integration, preparation, storage, and governance platforms, lakes, and pipelines used by data scientists to build, train, and iterate their statistical models.
  • Data-driven programmers: These individuals use R, Python, Scala, and other programming languages to build the declarative, procedural, and other business logic inside data-driven applications.
  • Subject-domain specialists: These individuals provide the subject-matter expertise necessary for data scientists to explore and model the application domain effectively; for data engineers to manage the governance of data sets used in statistical modeling; and for data-driven programmers to build applications that deliver statistical insights directly into real-world scenarios.

Gauge The Development Team’s Data Science Readiness

Data has always been fundamental to application development. Traditional programming involves designing the data and coding the logic that drives applications. Befitting this focus, many professionals who gravitate toward programming come from the data-centric disciplines often known as “STEM” (science, technology, engineering, mathematics).

As development organizations add data science to their skillsets, they need to consider a new set of factors—above and beyond current data management practices—that can help them deliver better results on data-driven projects. The readiness of a data-science team to deliver results on development initiatives depends on the following factors:

  • Expertise: Readiness requires high-performance data-science team collaboration that involves individuals with diverse aptitudes, skills, and roles. At the very least, these teams should include statistical modelers, data engineers, subject matter experts and business analysts, data-driven application developers, and analytics team leaders. These highly skilled specialties may be in short supply, so your organization must constantly be on the lookout for new sources of expertise to address these requirements. In addition to talent recruitment from external sources, your organization should establish an ongoing program of professional enhancement that encourages existing developers and analysts to cultivate data science skills.
  • Applications: Readiness requires that every developer learn your organization’s chief business applications of data science. This may require that you recruit new data scientists from statistically knowledgeable personnel in the business functions, such as marketing and finance. It may also involve embedding subject matter experts from these business functions in your data science organization to cross-train statistical modelers and others on the application domain of interest. Some of this domain expertise may also come embedded in the statistical modeling tools, templates, and applications that your data science team uses to develop customized applications for your specific requirements.
  • Algorithms: Readiness requires that every developer obtain a core understanding of linear algebra, basic statistics, linear and logistic regression, data mining, predictive modeling, cluster analysis, association rules, market basket analysis, decision trees, time-series analysis, forecasting, machine learning, Bayesian and Monte Carlo statistics, supervised learning, support vector machines, and constrained optimization. As a key productivity accelerator for your data science staff, most of these algorithms should be included in the libraries bundled into the data science workbenches provided by key tool vendors.
  • Tools: Readiness requires that every developer master a core group of modeling, development, and visualization tools used on your data science projects. Depending on your environment, and the extent to which data scientists work with both structured and unstructured data, this may involve some combination of Spark, Hadoop, Kafka, TensorFlow, Caffe2, and other platforms. It will probably also entail providing instruction in R, Python, Scala, and other open-source programming languages geared to data science. Your development team’s tools should enable automation of most data-science pipeline tasks, including data discovery, preparation, modeling, training, deployment, and governance.
  • Practices: Readiness requires that every developer acquire a grounding in core concepts of data science, analytics, and data management. They should gain a common understanding of the data science lifecycle. They should learn a standard DevOps approach for establishing, managing, and operationalizing data science workstreams in the business.

Align Data Science Teams With Enterprise App-Development Organizations

As a team that contributes to application development initiatives, data science professionals can be effective only if they are clearly aligned to how your organization manages the software delivery lifecycle.

Typically, an organization deploys its data science resources in any of the following alignments vis-à-vis software development teams:

  • Centralized: In this alignment, which is consistent with the concept of a “shared-services organization,” data science teams serve all business functions throughout the entire organization. They typically report to a chief data scientist who decides which app-development projects the teams will work on, how the projects will be managed, and how data science professionals will interface with the rest of the development organization.
  • Decentralized: In this alignment, data-science teams work with development teams in specific business units such as marketing, research and development, operations, and logistics, with the data science teams reporting to and taking directions from the app-development leads in those units.
  • Embedded: In this alignment, data-science teams are decentralized and dedicated to particular functional business units. However, unlike in the purely decentralized alignment, they report to a single chief data scientist as opposed to functional units’ app-development leads.

Data science professionals might organize their efforts differently depending on how they align with your organization’s software development teams. Here are the chief factors that may distinguish data science teams in different app-dev organizational alignments:

  • Role specialization: In centralized data-science teams, there might be separate pools of specialists within each of the professional categories previously discussed (statistical modelers, data engineers, data-driven programmers, etc.). Depending on the scope of your organization’s data science initiatives, each of those centralized specialties might also have personnel focused on various sub-specialties. For example, there might be dedicated statistical modeling specialists for such specialties as regression analysis, natural language processing, artificial neural networks, and behavioral graph modeling. This division of responsibilities might lessen the need for “unicorn” data scientists who are versatile and adept in all or most of these specialties. However, these “unicorns” might play an important role in decentralized and embedded data science teams. Decentralized teams are often self-sufficient and dedicated to specific business projects. This arrangement typically places a higher priority on each team member being adept in a wider range of data-science specialties and having a strong grasp of a specific application domain, such as marketing, finance, or security.
  • Tooling and platforms: A centralized data-science team might have the advantage of a larger IT budget than smaller, more function-specific decentralized or embedded teams. In those circumstances, the centralized teams might have the resources to invest in scalable data lakes, sophisticated modeling tools, automated data-preparation programs, and productivity-enhancing collaboration tools that are beyond the wherewithal of smaller, more focused, decentralized teams. To the extent that data science teams are decentralized out to business units and report directly to app-development leads in those units, the data scientists might use different platforms, tools, and data sets than their counterparts in other business units. However, when the decentralization is in the context of an “embedded” alignment (in other words, function-dedicated data scientists who report up to an enterprise-wide chief data scientist who defines standard practices and may even provide access to shared data-science resources), the practices in dispersed teams may be kept in rough alignment with each other.

Action Item

Data science is a foundational capability for application development professionals in the era of artificial intelligence, machine learning, and predictive analytics. To support development and deployment of data science assets into enterprise applications, developers should add data science to their core curricula, recruit data scientists into their teams or cross-train existing developers on data science, incorporate the tools and techniques of data science into their work, and integrate data-science workstreams into their DevOps practices.
