
Attackers can fool AI programs. Here’s how developers can fight back

Artificial intelligence isn’t all that different from natural intelligence. No matter how smart you are, you can be fooled. If the tricksters are also intelligent at their craft, they can dupe you with well-designed illusions that prey on weaknesses in your perceptual makeup.

In the entertainment field, that’s called magic and it can be a lot of fun. But in the world where AI is being designed to do everything from driving cars to managing distributed supply chains, it can be catastrophic. As the primary developers of AI apps, data scientists need to build statistical models that can detect and defend against efforts to lead them astray.

What cybersecurity professionals call the “attack surface” of an enterprise AI model can be vast and mysterious. Vulnerabilities in your deep neural networks can expose your company to considerable risk if third parties discover and exploit them before you even realize they exist or have implemented defenses. The potential for adversarial attacks against deep neural networks, such as those behind computer vision, speech recognition and natural language processing, is an increasing cause for concern within the data science profession. The research literature is full of documented instances where deep neural networks have been fooled by adversarial attacks.

Much of the research focus has been on the potential for minute, largely undetectable alterations to images — which researchers refer to generally as “noise perturbations” — that cause computer algorithms to misidentify or misclassify the images. Attackers can succeed with these tactics even if they don’t know the details of how the target neural net was constructed or trained. Adversarial tampering can be extremely subtle and hard to detect, even all the way down to pixel-level subliminals.
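
To make the black-box point concrete, here is a minimal sketch of how an attacker might craft a perturbation using nothing but queries. Everything in it is an assumption for illustration: the four-pixel "image," the hidden weights, the "stop sign" label and the function names are hypothetical, and a real attack against a deep neural net would need far more queries and a more careful search.

```python
import math

# Hypothetical stand-in for a deployed classifier. The attacker may call it
# but is assumed never to read the weights inside.
_HIDDEN_WEIGHTS = [0.9, -0.4, 0.7, -0.2]
_HIDDEN_BIAS = -0.5

def black_box_score(pixels):
    """Return the model's probability that the input is a stop sign."""
    z = sum(w * p for w, p in zip(_HIDDEN_WEIGHTS, pixels)) + _HIDDEN_BIAS
    return 1.0 / (1.0 + math.exp(-z))

def estimate_gradient_signs(score_fn, pixels, delta=1e-3):
    """Nudge one pixel at a time and record whether the score rises or falls."""
    base = score_fn(pixels)
    signs = []
    for i in range(len(pixels)):
        probe = list(pixels)
        probe[i] += delta
        change = score_fn(probe) - base
        signs.append((change > 0) - (change < 0))
    return signs

def craft_adversarial(score_fn, pixels, epsilon):
    """Move every pixel by at most epsilon in the direction that lowers the score."""
    signs = estimate_gradient_signs(score_fn, pixels)
    return [min(1.0, max(0.0, p - epsilon * s)) for p, s in zip(pixels, signs)]

original = [0.6, 0.5, 0.5, 0.5]
adversarial = craft_adversarial(black_box_score, original, epsilon=0.05)

print("original score:   ", round(black_box_score(original), 3))
print("adversarial score:", round(black_box_score(adversarial), 3))
```

In this toy setup, no pixel moves by more than 0.05 on a 0-to-1 scale, yet the score crosses the 0.5 decision threshold. The attacker learned which way to push each pixel purely by watching how outputs changed with inputs.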

This is no idle threat. Eliciting false algorithmic inferences can cause an AI-based app to make incorrect decisions, such as when a self-driving vehicle misreads a traffic sign and then turns the wrong way or, in a worst-case scenario, crashes into a building, vehicle or pedestrian. Though most of the attacks discussed so far have been simulated in controlled laboratory environments rather than carried out against deployed AI applications in the real world, general knowledge that these attack vectors are available will almost certainly lead terrorists, criminals or mischievous parties to exploit them.

Going forward, AI developers should follow these guidelines to build antiadversarial protections into their applications:

  • Assume the possibility of adversarial attacks on all in-production AI assets: As AI is deployed everywhere, developers need to assume that their applications will be high-profile sitting ducks for adversarial manipulation. AI exists to automate cognition, perception and other behaviors that, if they produce desirable results, might merit the praise one normally associates with “intelligence.” However, AI’s adversarial vulnerabilities might result in cognition, perception and other behaviors of startling stupidity, perhaps far worse than any normal human being would have exhibited under the circumstances.
  • Perform an adversarial threat assessment prior to initiating AI development: Upfront and throughout the lifecycle of their AI apps, developers should frankly assess their projects’ vulnerability to adversarial attacks. As noted in a 2015 research paper published by the IEEE, developers should weigh the possibility of unauthorized parties gaining direct access to key elements of the AI project, including the neural net architecture, training data, hyperparameters, learning methodology and loss function being used. It’s clear that attack vectors will continue to expand as your data scientists develop more models in more tools and languages, incorporating more features that are fed from more data sources, scored and validated more rapidly, and delivering algorithmic insights into more business processes.
  • Recognize that attackers may use indirect methods to generate adversarial examples: Someone might be able to collect a surrogate dataset from the same source as the original training data that was used to optimize a deep neural net. This could provide the adversary with insights into what type of ersatz input data might fool a classifier model that was built with the targeted deep neural net. As stated in the 2015 IEEE paper, an adversary who lacks direct visibility into the targeted neural net and associated training data could still exploit tactics that let them observe “the relationship between changes in inputs and outputs … to adaptively craft adversarial samples.”
  • Generate adversarial examples as a standard activity in the AI training pipeline: AI developers should immerse themselves in the growing body of research on the many ways in which subtle adversarial alterations may be introduced into the images processed by convolutional neural networks, or CNNs. Data scientists should avail themselves of the growing range of open source tools, such as this one on GitHub, for generating adversarial examples to test the vulnerability of CNNs and other AI models.
  • Test algorithms against a wide range of inputs to determine the robustness of their inferences: AI developers should be able to measure how reliably their neural nets filter a wide range of inputs to produce consistent, reliable outcomes, such as recognizing specific faces or classifying objects in a scene into the correct categories.
  • Recognize the need to rely on both human curators and algorithmic discriminators of adversarial examples: The effectiveness of an adversarial attack depends on its ability to fool your AI apps’ last line of defense. Adversarial manipulation of an image might be obvious to the naked eye but still somehow fool a CNN into misclassifying it. Conversely, a different manipulation might be too subtle for a human curator to detect, but a well-trained discriminator algorithm in a generative adversarial network, or GAN, may be able to pick it out without difficulty. One promising approach to the second issue is to have a GAN in which an adversary model alters each data point in an input image, thereby trying to maximize classification errors, while a countervailing discriminator model tries to minimize misclassification errors.
  • Build ensemble models that use a range of AI algorithms for detecting adversarial examples: Some algorithms may be more sensitive than others to the presence of adversary-tampered images and other data objects. For example, researchers at the University of Campinas found a scenario in which a shallow classifier algorithm might detect adversarial images better than a deeper-layered CNN. They also found that some algorithms are best suited for detecting manipulations across an entire image, while others may be better at finding subtle fabrications in one small section of an image. One approach for immunizing CNNs from these attacks might be to add what Cornell University researcher Arild Nøkland calls an “adversarial gradient” to the backpropagation of weights during an AI model’s training process. It would be prudent for data science teams to test the relative adversary-detection advantages of different algorithms using ongoing A/B testing both in development and production environments.
  • Reuse adversarial-defense knowledge to improve AI resilience against bogus input examples: As noted in a 2016 IEEE research paper, data scientists can use transfer-learning techniques to reduce the sensitivity of a CNN or other model to adversarial alterations in input images. Whereas traditional transfer learning involves applying statistical knowledge from an existing model to a different one, the paper discusses how a model’s existing knowledge, gained through training on a valid data set, might be “distilled” to spot adversarial alterations. According to the authors, “we use defensive distillation to smooth the model learned by a [deep neural network] architecture during training by helping the model generalize better to samples outside of its training dataset.” The result is that a model should be better able to tell the difference between adversarial examples, which closely resemble examples in its training set but have been subtly altered, and nonadversarial examples, which are legitimate even though they may deviate significantly from those in its training set.
  • Address ongoing adversary attack defenses throughout the lifecycle of deployed AI models: To mitigate the risks of proliferating adversary-exploitable vulnerabilities in your AI apps, your data scientists’ DevOps environment should support strong lifecycle governance controls. These should include consistent enforcement of adversary protections into configuration management, change tracking, version control, permission management and validation controls across all AI projects and assets.
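
The defensive distillation guideline in the list above rests on one simple mechanism: a softmax computed at a raised "temperature," which turns a model's near-certain predictions into softer probability labels that a second model is then trained on. The sketch below shows only that softening step, not the full training procedure. The logit values and the temperature of 20 are hypothetical choices for illustration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw class scores into probabilities, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from a first ("teacher") model for three classes.
teacher_logits = [8.0, 2.0, 1.0]

hard_labels = softmax_with_temperature(teacher_logits, temperature=1.0)
soft_labels = softmax_with_temperature(teacher_logits, temperature=20.0)

print("T=1: ", [round(p, 3) for p in hard_labels])
print("T=20:", [round(p, 3) for p in soft_labels])
```

At a temperature of 1 the first class takes almost all of the probability mass. At the higher temperature the ranking of the classes is unchanged but the distribution is much flatter, so the labels carry information about how similar the classes look to the first model. Training on these softer targets is what the paper's authors describe as smoothing the learned model.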

Ideally, working data scientists should have sophisticated antiadversarial tools to guide them in applying these practices throughout the AI development and operationalization lifecycle. In that regard, the cybersecurity industry witnessed a significant milestone this past week with IBM’s launch of its Adversarial Robustness Toolbox. Announced at the annual RSA Conference, this is the first open-source toolkit that includes attacks, defenses and benchmarks for:

  • Detecting adversarially tampered inputs: The toolbox includes runtime methods for flagging input data that an adversary might have tampered with in an attempt to try to exploit abnormal activations in the internal representation layers of a deep neural net.
  • Hardening neural-net models’ architectural defenses against adversarially tampered inputs: Hardening involves changing a deep neural net’s architecture to prevent adversarial signals from propagating through the internal representation layer, augmenting the training data with adversarial examples, and/or making preprocessed changes to the inputs of a deep neural net. The toolbox supports three such methods: feature squeezing, spatial smoothing and label smoothing.
  • Measuring neural-net robustness against adversarially tampered inputs: Robustness is measured by recording the loss of accuracy on adversarially altered inputs and how much the internal representations and the output of a deep neural net vary when small changes are applied to its inputs. The toolbox implements a new metric, CLEVER (Cross Lipschitz Extreme Value for nEtwork Robustness), that can be used to evaluate any neural network classifier. The metric indicates how easy it would be for a would-be attacker to compromise a neural net and cause a model to misclassify data input. It estimates the minimum attack strength required for an adversarial attack to be successful at modifying a natural image to an adversarial one.
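
Two of the input-preprocessing methods named above, feature squeezing and spatial smoothing, are simple enough to sketch in a few lines. The version below is a plain-Python illustration of the general ideas (reducing pixel bit depth and applying a median filter), not the toolbox's own implementation. The function names and the tiny 3x3 grayscale images are hypothetical.

```python
from statistics import median

def squeeze_bit_depth(image, bits):
    """Feature squeezing: quantize pixel values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return [[round(p * levels) / levels for p in row] for row in image]

def median_smooth(image, radius=1):
    """Spatial smoothing: replace each pixel with the median of its neighborhood."""
    height, width = len(image), len(image[0])
    smoothed = []
    for r in range(height):
        row = []
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            row.append(median(window))
        smoothed.append(row)
    return smoothed

# A clean black-and-white pattern and a copy with faint adversarial noise.
clean = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
noisy = [[0.03, 0.02, 0.97], [0.01, 0.04, 0.98], [0.02, 0.0, 0.96]]
squeezed = squeeze_bit_depth(noisy, bits=1)

# A flat gray patch with one tampered pixel in the middle.
spiked = [[0.2, 0.2, 0.2], [0.2, 0.9, 0.2], [0.2, 0.2, 0.2]]
smoothed = median_smooth(spiked)

print("squeezing removed the noise:", squeezed == clean)
print("smoothed center pixel:", smoothed[1][1])
```

Both transforms throw away fine detail that a classifier should not need, which is also where a low-amplitude perturbation lives. The trade-off is that overly aggressive squeezing or smoothing can degrade accuracy on legitimate inputs, so the bit depth and window size would need tuning against real data.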

Developed in IBM’s labs in Dublin, Ireland, and written in Python, the toolbox is open-source and works with deep neural network models developed in multiple deep learning frameworks. The first release supports TensorFlow and Keras, while plans call for support to be extended to PyTorch and MXNet in subsequent releases. Adversarial defenses written in the toolbox can be trained on IBM Research’s recently released Fabric for Deep Learning, or FfDL, which provides a consistent way to deploy, train and visualize deep learning jobs across multiple frameworks, or on IBM Deep Learning as a Service within Watson Studio. Developers can access the toolbox’s open source code for these frameworks through the ART GitHub repository.

Currently, the toolbox’s libraries support adversarial robustness for only one type of deep neural network: those for visual recognition and classification. The toolbox includes several sample attacks (DeepFool, Fast Gradient Method, Jacobian Saliency Map). Future releases will include model hardening for deep neural networks designed to handle speech, text and time series data.
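
Of the sample attacks listed, the Fast Gradient Method is the easiest to show end to end: it shifts every input feature by a fixed small amount in the direction that increases the model's loss. The sketch below applies that idea to a tiny logistic regression classifier rather than a deep neural net, so the gradient can be written out by hand. The weights, bias, input and epsilon are hypothetical, and this is not the toolbox's code.

```python
import math

def predict_proba(weights, bias, x):
    """Probability of the positive class under a logistic regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def loss_gradient_wrt_input(weights, bias, x, label):
    """Gradient of the cross-entropy loss with respect to each input feature."""
    p = predict_proba(weights, bias, x)
    return [(p - label) * w for w in weights]

def sign(value):
    return (value > 0) - (value < 0)

def fast_gradient_method(weights, bias, x, label, epsilon):
    """Step each feature by epsilon along the sign of the loss gradient."""
    grad = loss_gradient_wrt_input(weights, bias, x, label)
    return [min(1.0, max(0.0, xi + epsilon * sign(g))) for xi, g in zip(x, grad)]

# Hypothetical trained model and a correctly classified positive example.
weights, bias = [1.5, -2.0, 0.5], -0.2
x, true_label = [0.6, 0.4, 0.5], 1

x_adv = fast_gradient_method(weights, bias, x, true_label, epsilon=0.05)

print("clean prediction:      ", round(predict_proba(weights, bias, x), 3))
print("adversarial prediction:", round(predict_proba(weights, bias, x_adv), 3))
```

Unlike a query-only attack, this one assumes the attacker can compute gradients, which means having access to the model or to a close surrogate of it. In this toy case a change of 0.05 per feature is enough to push the prediction across the 0.5 threshold.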

The toolbox’s libraries are also primarily geared to defending against “evasion attacks,” in which adversarial data are introduced during a neural net model’s operational inferencing. However, defenses against “poisoning attacks,” in which training data is tampered with during upfront model development, will be provided in future releases.
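
The difference between the two attack types is easier to see in a toy case. In the sketch below, a deliberately simple nearest-centroid classifier (a hypothetical stand-in for a real model) is trained twice: once on clean data and once after an attacker slips mislabeled samples into the training set. The feature values and the "benign" and "malicious" labels are made up for illustration.

```python
def train_centroids(samples):
    """Compute the mean feature value for each label in (value, label) pairs."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))

clean_training = [
    (1.0, "benign"), (2.0, "benign"), (3.0, "benign"),
    (7.0, "malicious"), (8.0, "malicious"), (9.0, "malicious"),
]

# Poisoning: the attacker injects high-valued samples mislabeled as benign.
poison = [(9.0, "benign"), (10.0, "benign"), (11.0, "benign")]

clean_model = train_centroids(clean_training)
poisoned_model = train_centroids(clean_training + poison)

suspicious_input = 6.0
print("clean model says:   ", classify(clean_model, suspicious_input))
print("poisoned model says:", classify(poisoned_model, suspicious_input))
```

The input never changes here. What changes is the model itself, because the tampered training data dragged the "benign" centroid toward the malicious region. An evasion attack, by contrast, leaves the trained model alone and alters the input presented at inference time.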

Wikibon recommends that data scientists working on sensitive deep learning projects evaluate the IBM toolkit to help build robustness into their AI models. For more information, check out the research team’s discussions in this blog, this research paper and this presentation deck. And here’s an excellent video from late last year in which the toolbox’s developers discuss their approach for building efficient defenses against adversarial examples for deep neural networks:

