As the industrial cybersecurity landscape adopts AI and ML technologies to enhance anomaly detection across OT (operational technology) and ICS (industrial control systems) environments, these technologies also improve visibility and response capabilities throughout the organizational systems lifecycle. Applying AI (artificial intelligence) in OT environments presents unique challenges, because the data preconditions differ: OT systems tend to yield noisy, unstructured, or incomplete data, necessitating specialized domain-knowledge filtering and extensive preprocessing.

Unlike traditional signature-based detection methods, which struggle to keep up with new threats, AI/ML systems can evaluate massive datasets and recognize unusual behavioral patterns indicative of potential threats, enabling mitigation in real time. Nonetheless, deploying AI in OT systems poses data-quality challenges: the data produced by OT systems is often noisy, unstructured, or incomplete, so inferring reliable outputs requires comprehensive domain-specific preprocessing and tuning.

AI and ML technologies are advancing the detection of unknown threats. By moving beyond previously verified signatures, AI/ML can identify subtler deviations that signify novel exploits. This reliance on AI also carries risks: small parameter adjustments may improve detection rates, but they are equally prone to introducing false negatives or false positives. Poorly tuned parameters bury analysts in irrelevant alerts, so countering alert fatigue requires meticulous control over the trade-off between sensitivity and specificity.

The infusion of AI/ML into cybersecurity requires organizational teams to update their operational competencies. New roles demand a fundamental comprehension of ML (machine learning) algorithms alongside data science, data analytics, and threat modeling. Constantly evolving standards also require stronger collaboration among data scientists, algorithm engineers, and cybersecurity professionals to interpret data, refine models, and devise new cybersecurity measures that bolster OT/ICS infrastructure against increasingly complex cyber threats.

Using AI and ML to enhance anomaly detection in OT/ICS systems

Industrial Cyber reached out to industry experts to explore how AI and ML technologies boost efficiency and accuracy in industrial cybersecurity, especially in detecting anomalies within OT/ICS systems.

Ofir Arkin, manager and senior distinguished architect for cybersecurity platforms at NVIDIA

Ofir Arkin, manager and senior distinguished architect for cybersecurity platforms at NVIDIA, told Industrial Cyber that there is a unique opportunity to apply AI-powered behavioral analytics in OT networks due to their unique characteristics and the data they hold. 

“For example, telemetry data containing commands issued to devices enables comparing devices of the same device type to identify devices configured outside of the norm,” Arkin detailed. “The same data can be used to identify anomalous commands that are not usually being issued. Using telemetry data received from devices about their operation allows identifying devices operating outside of the norm, providing predictive maintenance capabilities. The value here stretches beyond cybersecurity and well into operational resiliency.”
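The peer-comparison idea Arkin describes can be sketched in a few lines: group devices by type, then flag any device issuing commands that few of its peers issue. The telemetry schema (`device_id`, `device_type`, command sets) and the `min_support` threshold here are illustrative assumptions, not NVIDIA's implementation.

```python
from collections import Counter

def flag_outlier_devices(telemetry, min_support=0.5):
    """Flag devices issuing commands rare among peers of the same type.

    telemetry: dict mapping device_id -> (device_type, set of commands seen).
    A command counts as 'normal' for a type if at least min_support of that
    type's devices issue it; devices using rarer commands are flagged.
    """
    by_type = {}
    for dev, (dtype, cmds) in telemetry.items():
        by_type.setdefault(dtype, []).append((dev, cmds))

    flagged = {}
    for dtype, devs in by_type.items():
        counts = Counter(c for _, cmds in devs for c in cmds)
        n = len(devs)
        for dev, cmds in devs:
            rare = {c for c in cmds if counts[c] / n < min_support}
            if rare:
                flagged[dev] = rare
    return flagged
```

In practice the command vocabulary would come from parsed OT protocol traffic, and the support threshold would need tuning per device population size.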

Another example Arkin cited, one not unique to industrial cybersecurity, is AI-powered log intelligence. “By leveraging machine learning and automation, it can make sense out of the endless stream of log data, detecting anomalies proactively and prioritizing detections. By automating the detection triage, AI-powered log intelligence allows teams to significantly reduce the amount of time invested in investigation, providing faster detection times (hours of work to minutes or less), faster responses to threats, and guided instructions. In many implementations, one can query the log data using LLMs/AI agents, enabling interaction using plain language prompts to fast-track investigation, for example,” he added. 

Jeffrey Macre, industrial security solutions architect at Darktrace

“AI is revolutionizing cybersecurity across OT/ICS systems. AI can learn the unique network communication patterns of each device within these environments,” Jeffrey Macre, industrial security solutions architect at Darktrace, told Industrial Cyber. “Unlike traditional rule-based methods, unsupervised ML can detect anomalies in real-time, spotting subtle changes, like unusual device behavior or network traffic, that may indicate potential threats. This approach makes monitoring much more accurate and reduces false positives.” 

Macre added that in the world of ICS, where keeping systems up and running is critical, proactive AI-powered anomaly detection helps quickly identify threats, ensuring operations stay safe and systems remain intact.
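The learn-normal, flag-deviations loop Macre describes can be illustrated with a streaming per-metric baseline. This stdlib-only sketch uses Welford's online mean/variance algorithm with a z-score cut; it is a deliberately simplified stand-in for the proprietary self-learning models he refers to, and the `k` and `warmup` parameters are assumptions.

```python
import math

class OnlineBaseline:
    """Streaming baseline for one metric; flags readings more than `k`
    standard deviations from the mean learned so far (Welford's algorithm)."""

    def __init__(self, k=3.0, warmup=30):
        self.k, self.warmup = k, warmup
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        # Return True if x is anomalous relative to the baseline so far,
        # then fold x into the running statistics.
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.k * std:
                anomalous = True
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous
```

A production system would maintain one such baseline per device and per metric, and would typically exclude flagged readings from the baseline update rather than absorbing them as this sketch does.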

Carlos Buenaño, CTO for OT at Armis

Carlos Buenaño, chief technology officer for OT at Armis, told Industrial Cyber that by leveraging crowdsourced information from a wide array of devices, AI/ML algorithms can establish baseline behaviors specific to each system component and identify deviations from these norms that may signal potential threats or anomalies. “This capability is essential because traditional cybersecurity measures often rely on static rules or signatures based on known threats, which are insufficient in dynamic industrial settings where new vulnerabilities are constantly emerging.” 

With machine learning, models can be trained on historical data to recognize patterns of normal operation, incorporating various parameters such as device performance metrics, communication logs, and environmental conditions, Buenaño detailed. As new data streams in from a large number of heterogeneous devices, the algorithms continuously update and refine their understanding of what constitutes ‘normal’ behavior within the operational context, thus improving both the precision and recall of anomaly detection. This proactive approach allows for the identification of subtle changes or suspicious activities that might indicate a cyber threat, even in its early stages. 

“The integration of crowdsourced data allows for cross-device learning, where insights from one device or network segment can enhance the understanding of others, creating a more robust defense mechanism,” according to Buenaño. “For instance, if an anomaly is detected in one part of the network or device type, information about this event can be shared across the rest of the networks or device types, informing other systems of potential risks and allowing them to adapt accordingly. This interconnectedness not only speeds up the response time to threats but also enables the development of a more collaborative cybersecurity framework, where collective intelligence is harnessed to strengthen defenses.” 

He added that AI/ML can automate the anomaly detection process, reducing the need for manual intervention and enabling cybersecurity teams to focus on more strategic tasks rather than overwhelming alert management. This automation leads to quicker threat identification, minimizing the window of opportunity for attackers and consequently reducing potential damage.

Clint Bodungen, founder, president, and CEO at ThreatGEN

“AI/ML can process much more data than humans can, to the tune of orders of magnitude,” Clint Bodungen, founder, president, and CEO at ThreatGEN, told Industrial Cyber. “Additionally, AI/ML is not limited to pattern matching like a traditional code-based workflow. It can process statistical analysis and heuristics to infer behavior and anomalies. With the addition of generative AI, it can now even analyze correlative relationships, semantics, and even ‘fact check’ its own output.” 

Addressing data quality challenges in OT/ICS environments

The executives examine the primary challenges in acquiring and sustaining high-quality, labeled datasets for OT/ICS environments, and how those challenges are being addressed.

Arkin said that the challenge might be slightly different. “Gaining granular visibility at the network, host, and application level in OT/ICS environments is challenging. That level of insight is required to produce valuable telemetry, which is then used as the input to AI/ML to enable use cases for cybersecurity and operational resiliency. Without this level of insight, it is hard to secure these networks regardless of the use of AI/ML, so it serves multiple purposes.”

He added that this is why the NVIDIA cybersecurity AI platform for OT is pushing protection to the server edge, enabling granular visibility and control at the server/workstation level while maintaining operational resiliency and producing the telemetry that is used as input by Morpheus, the company’s cybersecurity AI framework.

Macre said obtaining high-quality, labeled datasets in OT/ICS environments is challenging due to limited attack data, system complexity, and operational sensitivity. Legacy systems often lack standardized logging, while real-world incidents are rare, making labeled examples scarce. “This makes unsupervised ML techniques particularly useful as they do not rely on pre-labeled datasets. Instead, AI can learn normal behavior from raw, unlabeled data in real-time, adapting to each unique ICS setup. This approach overcomes data scarcity and ensures continuous, accurate threat detection without disrupting critical operations,” he added.

Likewise, Buenaño also observed that obtaining and maintaining high-quality, labeled datasets for OT/ICS environments presents several significant challenges, largely due to the unique characteristics of these systems. “Primarily, OT/ICS environments are often characterized by limited connectivity; many systems are designed with minimal exposure to external networks to enhance security and reliability. This limited connectivity complicates the collection of real-time data and reduces the frequency of updates, making it difficult to gather comprehensive datasets.” 

Additionally, he noted that the processing power of many OT/ICS devices is low, as these systems are typically optimized for specialized tasks rather than data-intensive operations. “This constraint hampers the ability to perform complex data processing or machine learning tasks on-site, further complicating the acquisition of high-quality labeled data. High latency in these environments poses challenges when transmission of data occurs, particularly as real-time monitoring and updates are hindered, further affecting the responsiveness and relevance of the acquired data.” 

Buenaño added that addressing these challenges requires a multifaceted approach. “One common strategy involves establishing a hybrid architecture where edge computing devices with more robust processing capabilities are deployed to preprocess data locally before sending it back to central systems for further analysis. This allows for a reduction in data volumes transmitted over the network, facilitating higher-quality datasets while accommodating the limitations of the OT environment.”
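The edge-preprocessing strategy Buenaño outlines can be sketched as simple windowed aggregation: the edge node ships compact per-window summaries upstream instead of every raw sample. The reading schema (timestamp, value pairs) and the summary fields are illustrative assumptions.

```python
from statistics import mean

def summarize_window(readings, window=60):
    """Aggregate raw sensor readings into per-window summaries to cut the
    data volume sent from the edge to central analysis.

    readings: list of (timestamp_seconds, value) pairs.
    Returns one summary dict per time window, in chronological order.
    """
    buckets = {}
    for ts, val in readings:
        buckets.setdefault(int(ts // window), []).append(val)
    return [
        {"window_start": b * window,
         "count": len(vals),
         "min": min(vals),
         "max": max(vals),
         "mean": round(mean(vals), 3)}
        for b, vals in sorted(buckets.items())
    ]
```

Summaries like these preserve enough signal for central anomaly models while respecting the low bandwidth and high latency Buenaño describes; which statistics to keep depends on what the downstream model consumes.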

“I think the main challenge is obvious. Organizations don’t want to disclose their sensitive data, and rightfully so. It is also a tall order to ask asset owners to sanitize their own data for AI training,” Bodungen said. “As a result, we are left with mechanisms for proprietary training specifically on the asset owners’ data, or training models generically on sanitized data. Generative AI provides the opportunity to allow for accurate synthetic data creation as well as a deeper level of inferred analysis that traditional ML can’t achieve… not as easily, at least.”

Decoding zero-day threats with AI/ML and behavior analysis

The executives examine how AI/ML algorithms identify and address unknown or zero-day threats, and consider the role of behavioral analysis in this process.

Arkin illustrated this with an example, mentioning the recent release by NVIDIA of an open-source GNN-based autoencoder designed for NetFlow anomaly detection, which is part of the NVIDIA Morpheus cybersecurity AI software framework. “By modeling network flows as graphs, this approach can capture complex relationships between hosts, ports, and protocols, enabling the detection of anomalous traffic that is indicative of malicious activity.”

He added that cybersecurity ISVs (independent software vendors) in OT/ICS can leverage this capability to immediately detect anomalies in network traffic and react to them in real-time, minimizing their possible effect.
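A GNN-based autoencoder like the one Arkin mentions is far beyond a short sketch, but the signal it produces, namely how badly a flow fits the learned model of normal traffic, can be illustrated with a drastically simplified stand-in: score each flow by its distance from the centroid of benign flow feature vectors. The feature schema and any threshold are assumptions for illustration only; this is not the Morpheus approach itself.

```python
import math

def fit_profile(benign_flows):
    """Learn a per-feature mean from benign flow feature vectors
    (e.g. [bytes, packets, duration] per flow; schema is illustrative)."""
    n = len(benign_flows)
    dims = len(benign_flows[0])
    return [sum(f[i] for f in benign_flows) / n for i in range(dims)]

def anomaly_score(flow, profile):
    # Euclidean distance from the learned 'normal' profile -- a crude
    # stand-in for an autoencoder's reconstruction error: the further a
    # flow sits from what the model learned, the higher its score.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(flow, profile)))
```

The real autoencoder captures relationships between hosts, ports, and protocols that a centroid cannot, but the operational pattern is the same: train on normal traffic, score new flows, alert above a threshold.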

“AI can detect zero-day threats by focusing on behavioral analysis rather than known signatures,” Macre mentioned. “They establish a baseline of normal activity for OT/ICS devices and networks, then flag deviations—like unexpected data flows or protocol misuse—that could indicate an unknown attack. This proactive approach catches threats that evade traditional defenses. Behavioral analysis enables real-time response by prioritizing anomalies based on risk, empowering security teams to act swiftly, even without prior threat knowledge, ensuring robust protection against unknown exploits or ones we have never seen before.”

Buenaño noted that AI and ML algorithms play a crucial role in detecting and responding to unknown or zero-day threats, leveraging sophisticated models and behavioral analysis to enhance security protocols across networks. “Traditional security measures often rely on known signatures or heuristics to identify threats, which can leave organizations vulnerable to novel attacks that exploit zero-day vulnerabilities. AI-driven systems address this gap by utilizing advanced algorithms that can learn from vast datasets and recognize patterns of behavior that deviate from the norm.” 

He added that one significant approach is cross-device learning, where algorithms analyze data from multiple devices within a network ecosystem. The collaborative learning process allows the AI to develop a comprehensive understanding of typical behaviors across different environments, enabling it to identify anomalous activities that signal potential threats. 

Buenaño added that the approach enhances threat detection accuracy because it uses contextual information, such as user behavior, device interactions, and network traffic patterns, creating a nuanced understanding of what constitutes normal and abnormal activity. This methodology empowers organizations to detect unknown and zero-day threats more effectively, respond preemptively, and continuously reinforce their defenses, contributing to a more secure digital environment.

Bodungen explained that, without delving into technical details, the focus is on behavioral analysis, heuristics, and now, analysis through generative AI. AI, machine learning, and generative AI excel in understanding behavior, identifying patterns, and conducting analysis.

Double-edged sword of AI/ML: Mitigating false positives and negatives

The executives evaluate the risks associated with false positives and false negatives in AI/ML-driven threat detection and examine short-term strategies to mitigate these challenges.

Arkin said that AI-powered solutions should produce fewer false positives and negatives compared to traditional systems. “Off-the-shelf AI-powered SOC solutions, as an example, can leverage the data and context they gather from the massive amounts of data they process to automatically determine whether an event is a false positive or negative. In addition, there is usually further tuning that can be used with these models,” he added. 

Macre said that his company manages false positives and negatives in AI/ML-driven threat detection by leveraging its unique self-learning AI that refines its understanding of OT/ICS environments over time. “It correlates anomalies with context, like device roles, reducing irrelevant alerts. False negatives are minimized by detecting subtle behavioral shifts, not just known threats. Short-term strategies include fine-tuning alert thresholds, integrating human oversight, and prioritizing high-risk anomalies for immediate review. This balances sensitivity and accuracy, ensuring critical threats aren’t missed while avoiding alert fatigue,” he added.
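The threshold fine-tuning Macre mentions can be sketched as a search over a labeled validation sample: pick the highest anomaly-score threshold that still catches a required fraction of known-bad events, trading false positives against coverage. The data shape and `min_recall` floor are illustrative assumptions.

```python
def tune_threshold(scores_labels, min_recall=0.95):
    """Pick the highest anomaly-score threshold that still catches at
    least min_recall of known threats in a validation sample.

    scores_labels: list of (score, is_threat) pairs. Higher thresholds
    mean fewer alerts (fewer false positives) but risk missed threats.
    """
    candidates = sorted({s for s, _ in scores_labels}, reverse=True)
    total_threats = sum(1 for _, t in scores_labels if t)
    for thr in candidates:
        caught = sum(1 for s, t in scores_labels if t and s >= thr)
        if total_threats and caught / total_threats >= min_recall:
            return thr
    return min(candidates)
```

Raising `min_recall` pushes the threshold down and the alert volume up, which is exactly the sensitivity-versus-alert-fatigue balance the interviewees describe.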

“Managing the risks of false positives and negatives in AI/ML-driven threat detection is crucial for maintaining security integrity and operational efficiency in increasingly complex digital environments,” Buenaño said. “False positives—instances where legitimate activities are flagged as threats—can lead to unnecessary alarm, wasted resources, and potential desensitization to real threats. Consequently, false negatives—where actual threats are not detected—can result in significant security breaches.” 

To mitigate these risks, he suggested that organizations often implement several short-term strategies that focus on understanding the behaviors of devices within their networks and comparing these behaviors across datasets. 

“One effective method is the deployment of vulnerability scanners that continuously assess system vulnerabilities and provide baseline behavior profiles for each device,” according to Buenaño. “By establishing these profiles, organizations can differentiate benign activity from malicious actions, reducing the likelihood of false positives. Additionally, anomaly detection techniques can be employed, allowing AI/ML algorithms to learn normal operational patterns of devices over time, thus improving the accuracy of threat detection.”

“‘Human in the loop’ is still the best practice, but using generative AI reflective agents (i.e., AI-based agents that can fact-check their own, or other, AI output) is becoming a viable solution,” Bodungen identified. “Generative AI is, of course, known to provide false information (hallucinations) sometimes, but it is exceptionally good at verifying and validating existing output.”

Exploring key skills for cybersecurity teams in AI and ML age

The executives discuss how cybersecurity professionals can effectively collaborate with AI and ML systems, highlighting the essential skills OT/ICS teams need to manage and interpret these advanced tools.

Arkin believes this provides a unique opportunity to rethink how cybersecurity is applied in OT networks. “Acquiring the right telemetry as input to AI/ML is key. The more granular that data is, the more you can do and the more you can understand. And with an AI/ML-powered solution, it would be faster and more accurate. This re-thinking needs to factor in another question – which is how can we adopt enterprise-grade security in OT networks without jeopardizing operational resiliency? These are the questions we have put in front of ourselves when we created the NVIDIA Cybersecurity AI platform for OT and built a strong collaboration with our ecosystem of partners,” he added.

He noted that, as third-party cybersecurity ISVs, GSIs (global system integrators), and even manufacturers continue to provide AI-powered solutions, the skillset OT/ICS teams need is the ability to apply the right context and user feedback to create a data flywheel. “The idea being they would use natural language and get quicker, more accurate results that also include guided instructions. Automated threat triage, for example, and the ability to provide guided instructions, reduce, in some cases, the level of expertise needed.”

“To fully realize AI’s potential in cybersecurity, we need to be confident that AI cybersecurity systems are being developed and applied in a way that is secure and trustworthy so OT/ICS teams can view AI as a trusted partner,” Macre highlighted. “Essential skills for OT/ICS teams include understanding industrial protocols (e.g., Modbus, DNP3), interpreting behavioral anomalies, and contextualizing alerts within operational workflows. Basic data analysis and system administration skills help manage and tune the AI, ensuring it aligns with specific environments. This partnership enhances efficiency and strengthens defenses against sophisticated threats.”
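Macre's point about protocol fluency can be made concrete with even a toy policy check over Modbus public function codes (1-4 are reads of coils, discrete inputs, holding registers, and input registers; 5, 6, 15, and 16 are writes): interpreting an alert requires knowing which codes write. This is a sketch of the idea, not a Modbus parser, and the allowlist policy is an assumption.

```python
# Modbus public function codes: 1-4 read (coils, discrete inputs, holding
# registers, input registers); 5, 6, 15, 16 write (coils/registers).
READ_CODES = {1, 2, 3, 4}
WRITE_CODES = {5, 6, 15, 16}

def check_request(unit_id, function_code, write_allowed_units=frozenset()):
    """Toy policy check: reads are always allowed, writes only from
    explicitly allowed units; unknown codes always alert."""
    if function_code in READ_CODES:
        return "ok"
    if function_code in WRITE_CODES:
        if unit_id in write_allowed_units:
            return "ok"
        return "alert: unexpected write"
    return "alert: unknown function code"
```

A real deployment would derive the write-allowed set from observed engineering-workstation behavior rather than hard-coding it, which is where the AI-learned baselines discussed above come back in.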

Buenaño recognizes that in the rapidly evolving landscape of cybersecurity, collaboration between cybersecurity professionals and AI/ML systems is paramount in safeguarding OT/ICS from sophisticated cyber threats. “Effective collaboration begins with knowledge sharing, which is essential for developing robust AI/ML models to detect and remediate threats. Cybersecurity professionals must engage in active communication, exchanging insights about emerging threats, vulnerabilities, and detection strategies.” 

He added that participating in industry conferences and open forums can further enhance this exchange of knowledge, allowing professionals to learn from case studies and share best practices. These platforms also foster connections among experts who can collectively address challenges, thus driving innovation in threat detection and incident response. Crowdsourcing is another effective strategy, enabling professionals to contribute to shared datasets that enhance the accuracy of AI/ML systems. 

“By aggregating diverse inputs, organizations can build comprehensive datasets that reflect real-world threats, improving machine learning models’ ability to identify anomalies and protect systems,” Buenaño added. “Also, the use of standardized frameworks plays a critical role in enabling OT/ICS teams to manage and interpret AI/ML tools effectively. Frameworks such as the NIST Cybersecurity Framework or IEC 62443 provide guidelines for developing and implementing processes that are consistent across various environments. This standardization not only aids in streamlining detection and remediation processes but also fosters collaboration among different teams by creating a common language for interpreting data and responses to threats.”

Bodungen identified that in terms of AI/ML, this is a specialized skillset that limits where asset owners can collaborate, aside from providing training data and verifying outputs. “However, generative AI is much more accessible. OT/ICS teams can interact directly with agentic AI systems through ‘prompt engineering’ and as the ‘human in the loop.’ They can even create their own customized agent workflows to suit their needs. Teams need a thorough understanding of how generative AI systems work. Specifically, agentic systems.” 

He added that although no-code agent creation systems do exist, it is highly advisable to be able to understand generative AI frameworks and their code base. “However, probably most importantly, is having an understanding of how to keep your data private and secure when working with large language models (LLMs), which is the foundation of generative AI.”
