Formerly known as Wikibon

Cybersecurity: Emerging Departmental Systems of Intelligence

Premise. Systems of intelligence aren’t process-based. Rather, they operate on insights into patterns of human interaction. But the industry hasn’t yet figured out how to package models of human interaction into enterprise applications. This report discusses how IT can deploy packaged systems of intelligence that are more comprehensive than big data micro apps (BDMAs) but not as broad in scope as ERP.

As ever more organizational capital becomes digital, protecting it becomes ever more critical. Packaged cybersecurity applications are evolving rapidly to help. What’s significant about these applications is that they are among the first to employ pervasive, built-in machine learning. Why? Because they serve a relatively well-understood scope that has been successfully modeled. Although they focus on IT network and security operations (NOCs and SOCs), the concepts used to package cybersecurity applications can be extended to a broader set of human interactions, complementing the BDMA approach. Splunk’s packaged cybersecurity app, UBA (User Behavior Analytics), is a good example of these concepts in action. It illustrates how big-data-driven, machine-learning apps serving a departmental scope are likely to emerge. UBA takes advantage of clear domain boundaries to achieve its packaging goals, including:

  • Relatively familiar malicious activities. While broad-based threats continue to appear in unpredictable forms, many of the discrete activities within the attacks fall into well-known categories, making it easier to use machine learning to flag what’s likely to be suspicious behavior. Discernment in flagging activity is critical to minimizing false positives, which are the security equivalent of “the boy who cried wolf.”
  • Known “entities” on the network. Rather than requiring scarce and expensive data scientists and engineers to wrangle data from an unknown set of sources, Splunk UBA knows the “entities” on the network. The large majority of devices, applications, and users on networks are already declared to the system as data sources. Splunk can therefore provide out-of-the-box “connectors” in UBA that easily acquire and prep source data from network resources.
  • Minimal disruption to existing people, processes, and technologies. The roles and responsibilities of individuals in SOCs and NOCs are well-known, and cybersecurity applications naturally support them. The apps also interoperate with other key security technologies, such as SIEM applications, in ways that go beyond simple data integration.

Anticipating malicious behavior

Cybersecurity apps can start with basic knowledge of discrete activities that should raise red flags, such as account takeover, moving data outside the network, and malware that moves across the network. Knowing about these activities makes it possible to build in machine learning models that know how to look for threats.

Traditional cybersecurity systems of record, such as security information and event management (SIEM) apps, create a repository for all relevant security data in order to support real-time visualization and analysis. Examples include HP’s ArcSight and IBM’s QRadar. While SIEMs provide fully automated and continuous threat monitoring and alerting, threat characteristics typically have to be declared to the system. In other words, humans have to program the SIEM explicitly to search for dangerous activity patterns, and the application simply follows rules for raising alerts.

Machine learning promises to make cybersecurity far more effective and efficient. Rather than looking for predefined activities, machine learning can learn what a baseline of normal operations looks like in order to identify the “footprints” of anomalous activities. It can then connect those activities into richer patterns that constitute threats, rank them by severity, and display them in a dashboard for operator review and action.
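To make the contrast between a declared SIEM rule and a learned baseline concrete, here is a minimal Python sketch. The function names, the 500 MB limit, and the three-standard-deviation cutoff are illustrative assumptions, not Splunk UBA or SIEM internals; the learned baseline is a plain per-user z-score, one of the simplest forms of anomaly detection.

```python
from statistics import mean, stdev


# Rule-based (SIEM-style): a human declares the threat pattern up front.
# The 500 MB limit is a hypothetical, hand-picked value.
def rule_alert(mb_out: float, limit_mb: float = 500.0) -> bool:
    """Alert only when a fixed, hand-written threshold is crossed."""
    return mb_out > limit_mb


# Learned baseline: each user's own history defines what "normal" looks like.
# The z-score cutoff of 3.0 is an illustrative assumption.
def baseline_alert(history_mb: list[float], today_mb: float, z_cutoff: float = 3.0) -> bool:
    """Flag today's outbound volume if it sits more than z_cutoff sample
    standard deviations above this user's historical mean."""
    if len(history_mb) < 2:
        return False  # not enough history to estimate a baseline yet
    mu = mean(history_mb)
    sigma = stdev(history_mb)
    if sigma == 0:
        return today_mb > mu  # perfectly steady user: any increase is unusual
    return (today_mb - mu) / sigma > z_cutoff


# A light user who suddenly sends 200 MB, and a heavy user having a normal day.
light_user = [9, 11, 10, 12, 8, 10]
heavy_user = [790, 810, 800, 805, 795]
print(rule_alert(200), baseline_alert(light_user, 200))  # False True
print(rule_alert(812), baseline_alert(heavy_user, 812))  # True False
```

The fixed rule misses the light user’s twentyfold jump and fires on the heavy user’s ordinary day; the per-user baseline does the opposite, which is the false-positive trade-off the text describes.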

Figure 1: A partial list of machine learning models specific to network security. Because the models are domain specific rather than generic, security analysts rather than data scientists can train the application to look for ongoing threats.
© Wikibon Big Data Project 2016

Splunk’s UBA cybersecurity app embeds threat models, which makes the machine learning process more circumscribed and simpler than starting from a blank canvas of all log data on the network (see Figure 1 for examples of machine learning models focused on security threats). However, while cybersecurity apps come with built-in activities to look for, many of those activities may fall just slightly outside the bell curve of normal activity. Distinguishing between activity that’s slightly outside the norm and activity that likely constitutes a threat is where company-specific machine learning is required. Training the models on what normal behavior looks like at each company makes the difference between a flood of false positives and finding the “actionable” needles in the information haystack (see Figure 2 for an example of a company-wide threat dashboard). Today, Splunk UBA learns about normal behavior and threats within an individual company. In 3-4 years, Splunk could aggregate these patterns into a cloud-based service that self-tunes UBA deployments using much richer, crowd-sourced patterns of likely threats, similar to how anti-virus vendors distribute malware signatures.
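One common way to make an alert threshold company specific is to calibrate it against that company’s own history of anomaly scores, so that only a target fraction of routine activity would have been flagged. The Python sketch below assumes the scores come from some upstream anomaly model and that the history is mostly benign; the helper names and the 1% rate are hypothetical, and percentile calibration is a generic technique, not a description of how Splunk UBA tunes itself.

```python
# Hypothetical helper: choose an alert threshold from one company's own
# history of anomaly scores (assumed to be mostly benign activity).
def calibrate_threshold(history: list[float], target_alert_rate: float) -> float:
    """Return a score such that roughly `target_alert_rate` of the historical
    scores lie strictly above it (ties can make the realized rate lower)."""
    if len(history) < 2:
        raise ValueError("need at least two historical scores")
    if not 0 < target_alert_rate < 1:
        raise ValueError("target_alert_rate must be between 0 and 1")
    ranked = sorted(history)
    # How many historical events we are willing to have flagged.
    n_flagged = max(1, round(len(ranked) * target_alert_rate))
    n_flagged = min(n_flagged, len(ranked) - 1)
    return ranked[-n_flagged - 1]


def is_alert(score: float, threshold: float) -> bool:
    return score > threshold


# Two companies whose "normal" looks different: B's routine activity
# produces scores about twice as high as A's.
company_a = [float(s) for s in range(1000)]
company_b = [2.0 * s for s in range(1000)]
t_a = calibrate_threshold(company_a, 0.01)
t_b = calibrate_threshold(company_b, 0.01)
print(t_a, t_b)                                  # 989.0 1978.0
print(is_alert(1500, t_a), is_alert(1500, t_b))  # True False
```

The same score of 1500 is a likely threat at company A but routine at company B, which is why a single global threshold tends to produce false positives somewhere.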

Figure 2: A company-wide threat dashboard shows threat activity by type on the left side. Multiple “entities” engaged in a sequence of anomalous behavior constitute a threat. Machine learning determines both what counts as an anomalous event and which sequences of events are threats.
© Wikibon Big Data Project 2016
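The idea in Figure 2, that a threat is a sequence of anomalies linked through shared entities, can be sketched in Python. This is a minimal illustration under stated assumptions, not necessarily how UBA implements it: anomalies are linked when they share any entity (computed with union-find), a group counts as a threat only if its anomalies occur in the order of a hypothetical three-stage pattern, and threats are ranked by summed anomaly score as a stand-in for severity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    entities: frozenset[str]  # users, hosts, apps involved in the event
    kind: str                 # hypothetical activity label
    ts: int                   # event time in seconds
    score: float              # output of some upstream anomaly model


# Hypothetical threat pattern: stages that, seen in this order, suggest an attack.
PATTERN = ("account_takeover", "lateral_movement", "data_exfiltration")


def group_by_shared_entity(anomalies: list[Anomaly]) -> list[list[Anomaly]]:
    """Union-find: anomalies end up in one group when they are linked,
    directly or transitively, through a shared entity."""
    parent = list(range(len(anomalies)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen: dict[str, int] = {}
    for i, a in enumerate(anomalies):
        for entity in a.entities:
            if entity in first_seen:
                parent[find(i)] = find(first_seen[entity])
            else:
                first_seen[entity] = i

    groups: dict[int, list[Anomaly]] = {}
    for i, a in enumerate(anomalies):
        groups.setdefault(find(i), []).append(a)
    return list(groups.values())


def matches_pattern(group: list[Anomaly], pattern: tuple[str, ...] = PATTERN) -> bool:
    """True if the pattern's stages appear, in order, in the time-sorted group.
    `stage in kinds` consumes the iterator, which enforces the ordering."""
    kinds = iter(a.kind for a in sorted(group, key=lambda a: a.ts))
    return all(stage in kinds for stage in pattern)


def rank_threats(anomalies: list[Anomaly]) -> list[list[Anomaly]]:
    """Keep groups that match the pattern, most severe (highest total score) first."""
    threats = [g for g in group_by_shared_entity(anomalies) if matches_pattern(g)]
    return sorted(threats, key=lambda g: sum(a.score for a in g), reverse=True)
```

Because the check works on ordered stages, the same three anomalies reported out of order would not be flagged, and an isolated anomaly on an unrelated entity never becomes a threat on its own.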

Integrating data without data prep

Simply put, UBA knows where to look for the data it needs, just as it knows how to look for threatening behavior. One of the banes of packaging big data apps is data wrangling: the complex, labor-intensive, and skills-dependent process of translating or mapping data from one source to another before it can be processed and analyzed. Splunk short-circuits this problem in UBA by packaging data integration and preparation models for known network devices, directories, firewalls, malware detection controls, apps, and users. Each of these entities emits events describing its operation (see Figure 3 for examples of events emitted by several different databases and messaging systems). Admins configure which entities Splunk UBA tracks. Splunk then uses pre-packaged “parsers” that read these logs and unpack the data they contain. That unpacking makes the data immediately available for visualization, alerts, and machine learning.
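A minimal Python sketch of the pre-packaged parser idea: a registry maps each source type to a parser that unpacks a raw log line into a normalized event, and only the sources an admin has chosen to track are ingested. The log formats, field names, and source-type keys are hypothetical; real devices emit vendor-specific formats, and this is not Splunk’s parser API.

```python
import re
from typing import Callable, Iterable, Optional

# Hypothetical log formats; real firewall and directory logs vary by vendor.
FIREWALL_RE = re.compile(
    r"(?P<ts>\S+) (?P<action>ALLOW|DENY) src=(?P<src>\S+) dst=(?P<dst>\S+) port=(?P<port>\d+)"
)
AUTH_RE = re.compile(
    r"(?P<ts>\S+) (?P<outcome>LOGIN_OK|LOGIN_FAIL) user=(?P<user>\S+) host=(?P<host>\S+)"
)


def parse_firewall(line: str) -> Optional[dict]:
    m = FIREWALL_RE.fullmatch(line.strip())
    if m is None:
        return None  # unrecognized line: skip rather than guess
    return {"ts": m["ts"], "type": "network", "action": m["action"].lower(),
            "entities": [m["src"], m["dst"]], "port": int(m["port"])}


def parse_auth(line: str) -> Optional[dict]:
    m = AUTH_RE.fullmatch(line.strip())
    if m is None:
        return None
    return {"ts": m["ts"], "type": "auth", "action": m["outcome"].lower(),
            "entities": [m["user"], m["host"]]}


# Registry of pre-packaged parsers, keyed by (hypothetical) source type.
PARSERS: dict[str, Callable[[str], Optional[dict]]] = {
    "firewall": parse_firewall,
    "directory": parse_auth,
}


def ingest(source_type: str, lines: Iterable[str], tracked: set[str]) -> list[dict]:
    """Normalize lines from a tracked source; ignore untracked sources."""
    if source_type not in tracked:
        return []
    parser = PARSERS[source_type]
    return [event for event in map(parser, lines) if event is not None]
```

After normalization, events from different kinds of devices share common fields such as `entities` and `action`, which is what lets downstream visualization, alerting, and machine learning consume them without per-source wrangling.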

Figure 3: Splunk UBA obviates the need for most “data wrangling” by data scientists and engineers. UBA comes with importers for most of the entities emitting data that UBA needs to rely on.
© Wikibon Big Data Project 2016

Minimizing disruption and adoption “friction”

Change management challenges become simpler when applications augment the expertise of people in existing roles, working within existing workflows, and interoperating with existing technologies.

Cybersecurity apps such as Splunk UBA fit this profile. NOC and SOC analyst roles and workflows are relatively well-known. Ironically, for all its out-of-the-box simplicity, Splunk UBA is built on components from the Hadoop ecosystem, including Spark for machine learning, Storm for processing high-velocity event streams, HDFS or S3 for persistence, a time series database, HBase, Hive, and a graph database. The Splunk UBA app hides these moving parts. Threats above a certain risk threshold can be forwarded to other security systems of record, such as Splunk’s core product or a SIEM. UBA can also ingest data continuously from the core Splunk product.
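The forwarding step can be sketched in Python as a simple gate on risk score. The record fields, the 0 to 10 risk scale, and the threshold value are hypothetical, and the `send` callable stands in for whatever transport the receiving system of record accepts, which this report does not specify.

```python
import json
from typing import Callable, Iterable


def forward_threats(threats: Iterable[dict], risk_threshold: float,
                    send: Callable[[str], None]) -> int:
    """Serialize and send every threat at or above risk_threshold, highest
    risk first. Returns the number of threats forwarded."""
    sent = 0
    for threat in sorted(threats, key=lambda t: t["risk"], reverse=True):
        if threat["risk"] < risk_threshold:
            break  # list is sorted descending, so nothing after this qualifies
        payload = {"source": "uba", "name": threat["name"], "risk": threat["risk"],
                   "entities": sorted(threat["entities"])}
        send(json.dumps(payload, sort_keys=True))
        sent += 1
    return sent


# Example: collect outgoing messages in a list instead of a real transport.
outbox: list[str] = []
threats = [
    {"name": "possible data exfiltration", "risk": 9, "entities": ["db-2", "alice"]},
    {"name": "unusual port scan", "risk": 3, "entities": ["bob"]},
]
print(forward_threats(threats, risk_threshold=7, send=outbox.append))  # 1
```

Keeping the transport behind a callable means the same gate can feed Splunk’s core product or another SIEM without changing the filtering logic.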

Action Item. Splunk’s UBA deserves the attention of Doers for two reasons. First, it can help solve cybersecurity challenges. Second, it highlights an approach for evaluating the likely effectiveness of other big data applications with a departmental or functional scope. Doers should assess such applications along two dimensions: how well the relevant activities and entities are known and can be modeled, and whether the application can fit in with minimal disruption to existing skills, processes, and infrastructure. Splunk’s UBA is one such application, making it easy to get up and running with a very sophisticated solution.
