ARLA: Using Reinforcement Learning to Strengthen DNNs

By IEEE Computer Society Team on April 1, 2025

Deep neural networks will be crucial to future human–machine teams aiming to modernize safety-critical systems. Yet DNNs have at least two key problems:

  • Researchers have proposed many defense schemes to counter a wide range of attack vectors, yet none has fully secured DNNs against adversarial examples (AEs).
  • This vulnerability to AEs makes the role of DNNs in safety-critical systems problematic.

Enter the Adversarial Reinforcement Learning Agent (ARLA), a novel AE attack based on reinforcement learning that was designed to discover DNN vulnerabilities and generate AEs to exploit them.

ARLA is described in detail in Matthew Akers and Armon Barton’s Computer magazine article, “Forming Adversarial Example Attacks Against Deep Neural Networks With Reinforcement Learning.” Here, we offer a glimpse at ARLA’s approach and its capabilities.

The Reinforcement Learning Approach


ARLA is the first adversarial attack based on reinforcement learning (RL); in RL, an agent

  • Uses its sensors to observe an unknown environment
  • Makes decisions and receives feedback (rewards or penalties) on those decisions via changes in its sensory perceptions
  • Uses trial and error to learn actions that maximize its expected rewards (a minimal sketch of this loop follows the list)
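
To make that loop concrete, here is a minimal Python sketch of a tabular Q-learning agent interacting with a toy environment. The environment (a short one-dimensional corridor), the reward values, and the hyperparameters are all hypothetical, chosen only to illustrate the observe, act, and receive-feedback cycle described above; none of this is part of ARLA itself.

```python
import random
from collections import defaultdict

# Hypothetical toy environment (illustrative only): a one-dimensional corridor
# where the agent starts at position 0 and is rewarded for reaching the far end.
class CorridorEnv:
    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos                      # the agent "observes" its position

    def step(self, action):                  # action: 0 = move left, 1 = move right
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + delta))
        done = self.pos == self.length - 1
        reward = 1.0 if done else -0.01      # small step penalty rewards speed
        return self.pos, reward, done

# Tabular Q-learning: trial and error gradually improves an estimate of the
# expected reward for each state-action pair.
q = defaultdict(lambda: [0.0, 0.0])
alpha, gamma, epsilon = 0.1, 0.99, 0.2       # illustrative hyperparameters

env = CorridorEnv()
for episode in range(200):
    state, done = env.reset(), False
    while not done:
        # Explore with probability epsilon, otherwise exploit current estimates.
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = 0 if q[state][0] >= q[state][1] else 1
        next_state, reward, done = env.step(action)
        # Feedback from the environment (the reward) updates the value estimate.
        q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
        state = next_state
```

Each iteration mirrors the three points above: the agent observes its state, chooses an action, and uses the resulting reward or penalty to refine its estimate of which actions pay off.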

The authors offer a simple example of this in a Pac-Man RL agent acting in a 2D grid:

  • The agent senses its location and its distance from pellets and from ghosts.
  • To maximize its expected reward, it learns to avoid ghosts while eating the maximum number of pellets in the shortest amount of time.

The agent learns which state–action pairs generate the most rewards, but because the agent’s knowledge is always partial, RL entails an exploration/exploitation tradeoff:

  • During exploration, the agent randomly chooses actions to broaden its knowledge of the environment.
  • During exploitation, the agent uses existing knowledge to estimate actions with the highest reward.

To ensure that the agent continues to explore rather than simply exploit what it already knows, it is given a policy that determines how much time it spends on each of the two activities. This balance is tuned to achieve the best possible test performance.
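
One common way to implement such a policy is epsilon-greedy action selection with a decaying exploration rate. The sketch below illustrates that general idea in Python; the article does not specify ARLA's exact policy or schedule, so the function names and parameter values here are assumptions for illustration only.

```python
import random

def select_action(q_values, epsilon):
    """Epsilon-greedy policy: explore with probability epsilon, exploit otherwise."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                    # exploration: random action
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploitation: best-known action

def epsilon_at(step, start=1.0, end=0.05, decay_steps=10_000):
    """Linearly decay epsilon so the agent explores heavily early in training
    and shifts toward exploitation later; the schedule is a tunable choice."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)
```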

The ARLA Attack


ARLA uses double Q-learning with a dueling deep Q-network agent architecture. At a high level, ARLA

  • Uses a benign sample image as a learning environment to generate AEs
  • Seeks the adversarial example with the smallest Euclidean distance from the original benign sample (a minimal sketch of this setup follows the list)
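
The following PyTorch sketch illustrates those two ingredients at a very high level: a dueling Q-network head, in which separate value and advantage streams are recombined into Q-values, and a reward that pays for fooling the classifier while penalizing Euclidean (L2) distance from the benign image. The layer sizes, action space, distance weight, and reward shape are our assumptions for illustration; consult the paper for ARLA's actual architecture, action encoding, and reward function.

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Dueling DQN head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # advantage stream A(s, a)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.trunk(obs)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)

def adversarial_reward(classifier, original, perturbed, true_label, dist_weight=0.1):
    """Illustrative reward (not the paper's exact formulation): reward the agent
    for causing a misclassification, penalize L2 distance from the benign sample."""
    with torch.no_grad():
        pred = classifier(perturbed.unsqueeze(0)).argmax(dim=-1).item()
    fooled = float(pred != true_label)
    l2_distance = torch.norm(perturbed - original).item()
    return fooled - dist_weight * l2_distance
```

Under this kind of reward, an agent that keeps fooling the classifier is pushed toward perturbations that stay as close as possible to the original image, which matches the goal stated above.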

In experiments, the authors report that ARLA significantly degraded the accuracy of five CIFAR-10 DNNs—four of which used a state-of-the-art defense. They also compared ARLA to other state-of-the-art attacks and found evidence that ARLA is adaptive, making it a useful tool for testing the reliability of DNNs before they are deployed.

Dig Deeper


DNNs used in image recognition are especially susceptible to perturbed or noisy data. As the authors point out, an RL approach to adversarial testing such as ARLA could be used to develop robust testing protocols to identify these and other DNN vulnerabilities to adversarial attacks.

For details about the innovative ARLA approach, its results, and future research areas, read Akers and Barton’s “Forming Adversarial Example Attacks Against Deep Neural Networks With Reinforcement Learning.”
