ADELPHI, Md. -- Cyberattacks and the extent of their damage have increased in recent years. In response, Army researchers discovered a method to enhance the effectiveness of cyber defenders and resilient systems in support of military operations by developing an adversarial framework for evaluating a cyber-alert inspection system.
This method produces sub-optimal but usable defender policies, which are used to train the cyber-alert investigation system until it is robust to the discovered attacker policy.
Researchers from the U.S. Army Combat Capabilities Development Command’s Army Research Laboratory, the University of South Florida, Singapore Management University and George Mason University conducted an adversarial evaluation of the defender’s approach. The peer-reviewed journal ACM Transactions on Intelligent Systems and Technology published the team’s findings in April.
According to Dr. Hasan Cam, a researcher at the laboratory, large organizations operate a first line of cyber defense, known as cyber security operation centers.
These centers consist of a team of specialized analysts, engineers and responders who maintain and improve cybersecurity, Cam said. The inspection of cyber alerts is a critical part of their operations.
“Given the high false positive rates of cyber alerts, it is important to screen the alerts effectively to identify any real attack signal from these alerts,” Cam said. “It is also required to maintain the queue length of alerts within acceptable limits.”
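The queue-length concern Cam describes can be illustrated with a minimal Python sketch. It is not the researchers' model: the function name `simulate_queue`, the first-in-first-out ordering and the fixed per-step inspection capacity are all assumptions made for illustration.

```python
from collections import deque


def simulate_queue(arrivals_per_step, inspections_per_step):
    """Return the alert backlog after each time step.

    Hypothetical toy model: alerts wait in a FIFO queue and analysts
    inspect at most `inspections_per_step` alerts per step.
    """
    queue = deque()
    history = []
    next_id = 0
    for arrivals in arrivals_per_step:
        # New alerts join the back of the queue.
        for _ in range(arrivals):
            queue.append(next_id)
            next_id += 1
        # Analysts inspect the oldest alerts first, up to capacity.
        for _ in range(min(inspections_per_step, len(queue))):
            queue.popleft()
        history.append(len(queue))
    return history


if __name__ == "__main__":
    # Three busy steps followed by two quiet ones, with capacity 2.
    print(simulate_queue([3, 3, 3, 0, 0], 2))  # prints [1, 2, 3, 1, 0]
```

In this toy run the backlog grows whenever arrivals exceed inspection capacity and drains only once arrivals stop, which is the kind of queue growth an organization would want to keep within acceptable limits.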
As a computational approach to understanding and automating learning and decision-making, reinforcement learning places more emphasis on learning through the interaction between a learning agent and its environment, he said.
“The premise of this research is to create a reinforcement learning framework for an adversary that learns the patterns of security decisions made by an organization to safeguard their critical assets,” Cam said. “By continuously interacting with the security environment, the adversary learns optimal timing and the amount of actions that are needed to overload the cyber-alert inspection system of the organization and gain critical time needed to execute malicious activities in their network.”
This work first exploits the fundamental learning-based model to develop a successful adversarial attack policy and then finds a new relearned defender policy robust to the discovered attacker policy, Cam said. The proposed reinforcement learning-based approach reveals that there exists a defender’s policy that is robust against any attacker policy.
“With reinforcement learning, the defender/agent interacts with its environment to obtain information and choose an action from a set of available actions in each time step,” Cam said. “Then, the agent receives a reward of the current time step and the environment moves to a new state.”
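The interaction loop Cam describes (observe a state, choose an action, receive a reward, move to a new state) can be sketched in Python with tabular Q-learning. This is an illustrative assumption, not the CSOC-RL model itself: the `ToyAlertQueue` environment, its reward weights and every hyperparameter below are hypothetical.

```python
import random


class ToyAlertQueue:
    """Hypothetical environment whose state is the current alert backlog."""

    def __init__(self, max_backlog=10, arrivals=(0, 1, 2), seed=0):
        self.max_backlog = max_backlog
        self.arrivals = arrivals
        self.rng = random.Random(seed)
        self.backlog = 0

    def reset(self):
        self.backlog = 0
        return self.backlog

    def step(self, action):
        # action = analysts assigned this step; each clears one alert.
        self.backlog -= min(action, self.backlog)
        # Random new alerts arrive; the backlog is capped at max_backlog.
        arrived = self.rng.choice(self.arrivals)
        self.backlog = min(self.max_backlog, self.backlog + arrived)
        # Assumed reward: penalize backlog and analyst effort.
        reward = -1.0 * self.backlog - 0.5 * action
        return self.backlog, reward


def train_defender(env, actions=(0, 1, 2), episodes=200, steps=50,
                   alpha=0.1, gamma=0.9, epsilon=0.1, seed=1):
    """Tabular Q-learning over (backlog, action) pairs."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(env.max_backlog + 1) for a in actions}
    for _ in range(episodes):
        state = env.reset()
        for _ in range(steps):
            # Epsilon-greedy choice from the set of available actions.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            # The environment returns the new state and this step's reward.
            next_state, reward = env.step(action)
            best_next = max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
    return q


if __name__ == "__main__":
    q_table = train_defender(ToyAlertQueue())
    # Greedy action the learned table prefers when five alerts are waiting.
    print(max((0, 1, 2), key=lambda a: q_table[(5, a)]))
```

A lookup table is used here only because the toy state space is tiny; the article does not say which learning algorithm the researchers used.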
To test the limits of the defender’s reinforcement learning approach, the researchers present several cybersecurity alert generation policies launched by adversaries and the best response against them for various defenders’ inspection policies.
This research extends the defender’s reinforcement learning model to a game model with adversarial reinforcement learning. The proposed model has an adversarial interaction of two players on top of the queuing process, which, as far as the researchers know, has not been addressed in the queuing theory literature.
“Inspired by game theory, this research’s approach keeps retraining the cyber defender using episodes from the discovered attacker policy, so that the defender’s policy learns becoming robust to the attacker policy,” Cam said.
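One round of this attack-then-retrain cycle can be sketched in Python. To keep the example small, exhaustive search stands in for reinforcement learning on both sides, which is a deliberate substitution and not the researchers' method. The horizon, the budgets and the backlog-based score are all assumed for illustration.

```python
from itertools import product

HORIZON = 4         # time steps per episode (assumed)
ATTACK_BUDGET = 6   # total alerts the attacker may inject (assumed)
DEFENSE_BUDGET = 6  # total inspections the defender may spend (assumed)


def allocations(total, horizon):
    """All ways to split exactly `total` units across `horizon` steps."""
    return [a for a in product(range(total + 1), repeat=horizon)
            if sum(a) == total]


def total_backlog(attack, defense):
    """Sum of end-of-step backlog over the episode; higher favors the attacker."""
    backlog, score = 0, 0
    for injected, inspected in zip(attack, defense):
        backlog = max(0, backlog + injected - inspected)
        score += backlog
    return score


def best_attack(defense):
    """Attacker's best response: the schedule that maximizes backlog."""
    return max(allocations(ATTACK_BUDGET, HORIZON),
               key=lambda a: total_backlog(a, defense))


def best_defense(attack):
    """Defender 'retraining': the schedule that minimizes backlog."""
    return min(allocations(DEFENSE_BUDGET, HORIZON),
               key=lambda d: total_backlog(attack, d))


if __name__ == "__main__":
    initial = (0, 2, 2, 2)                    # hypothetical starting policy
    attack = best_attack(initial)             # discovered attacker policy
    relearned = best_defense(attack)          # defender retrained against it
    print(attack, total_backlog(attack, initial))       # (6, 0, 0, 0) 12
    print(relearned, total_backlog(attack, relearned))  # (6, 0, 0, 0) 0
```

In this toy, the relearned schedule is tuned only to the one discovered attack, so a fresh attacker search could expose a new weakness. That is consistent with the approach described above, which keeps retraining the defender instead of stopping after a single round.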
The relearned defender policy now exhibits more systematic and tactical behavior in allocating resources, Cam said, and allows more of the cyber-alert backlog to be inspected than the defender’s previously known policy did.
Overall, this work, which supports the Network Army Modernization Priority, proposes an adversary that can fully observe and interact with the defender’s reinforcement learning approach, called the CSOC-RL model. The adversary uses a reinforcement learning framework to learn the best policy of actions for testing the robustness of the CSOC-RL model.
“I am very optimistic that this research work will help establish the basic framework, particularly theoretical foundations, of cyber resilient systems over tactical networks,” Cam said. “This research has provided a game theoretic formulation of the problem, enabling us to understand the defender-adversary interaction. The theory has yielded simple and sub-optimal, but usable defender policies.”
According to Cam, this work will help any science and technology effort that aims to enhance the effectiveness of cyber defenders and resilient systems using reinforcement learning and learning-based decision-making systems over enterprise and tactical networks.