In this article, I will demonstrate how to move from simply forecasting outcomes to actively intervening in systems to steer toward desired goals. With hands-on examples in predictive maintenance, I will show how data-driven decisions can optimize operations and reduce downtime.
Most data analyses start with descriptive analysis to investigate "what has happened". In predictive analysis, we aim for insights and determine "what will happen". With Bayesian prescriptive modeling, we can go beyond prediction and aim to intervene in the outcome: I will demonstrate how you can use data to "make it happen". To do this, we need to understand the complex relationships between variables in a (closed) system. Modeling causal networks is key, and in addition, we need to make inferences to quantify how the system is affected when we push it toward the desired outcome. I will briefly start by explaining the theoretical background. In the second part, I will demonstrate how to build causal models that guide decision-making for predictive maintenance. Finally, I will explain that in real-world scenarios there is another important factor to consider: how cost-effective is it to prevent failures? I will use bnlearn for Python across all my analyses.
This blog contains hands-on examples! This will help you to learn quicker, understand better, and remember longer. Grab a coffee and try it out! Disclosure: I'm the author of the Python package bnlearn.
What You Need To Know About Prescriptive Analysis: A Brief Introduction.
Prescriptive analysis may be the most powerful way to understand your business performance and trends, and to optimize for efficiency, but it is certainly not the first step in your analysis. The first step should be, as always, understanding the data through descriptive analysis and Exploratory Data Analysis (EDA). This is the step where we figure out "what has happened". It is super important because it provides deeper insights into the variables and their dependencies in the system, which subsequently helps to clean, normalize, and standardize the variables in our data set. A cleaned data set is the foundation of every analysis.
With the cleaned data set, we can start working on our prescriptive model. In general, these types of analysis need a lot of data. The reason is simple: the more accurately we can learn a model that fits the data, the better we can detect causal relationships. In this article, I will use the notion of 'system' frequently, so let me first define it. A system, in the context of prescriptive analysis and causal modeling, is a set of measurable variables or processes that influence each other and produce outcomes over time. Some variables will be the key players (the drivers), while others are less relevant (the passengers).
As an example, suppose we have a healthcare system that contains information about patients: their symptoms, treatments, genetics, environmental variables, and behavioral information. If we understand the causal process, we can intervene by influencing (one or multiple) driver variables. To improve the patient's outcome, we may only need a relatively small change, such as improving their diet. Importantly, the variable that we aim to influence or intervene on must be a driver variable to make it impactful. Generally speaking, changing variables for a desired outcome is something we do in our daily lives: from closing the window to keep the rain out, to the advice from friends, family, or professionals that we take into consideration for a specific outcome. But this can also be a trial-and-error procedure. With prescriptive analysis, we aim to determine the driver variables and then quantify what happens on intervention.
With prescriptive analysis, we first need to distinguish the driver variables from the passengers, and then quantify what happens on intervention.
Throughout this article, I will focus on applications with systems that include physical components, such as bridges, pumps, and dikes, in combination with environmental variables such as rainfall, river levels, and soil erosion, and human decisions (e.g., maintenance schedules and costs). In the field of water management, there are classic cases of complex systems where prescriptive analysis can offer serious value. A great candidate for prescriptive analysis is predictive maintenance, which can increase operational time and decrease costs. Such systems often contain various sensors, making them data-rich. At the same time, the variables in these systems are often interdependent, meaning that actions in one part of the system often ripple through and affect others. For example, opening a floodgate upstream can change water pressure and flow dynamics downstream. This interconnectedness is exactly why understanding causal relationships is important. When we understand the crucial parts of the entire system, we can intervene more accurately. With Bayesian modeling, we aim to uncover and quantify these causal relationships.
Variables in systems are often interdependent, meaning that an intervention in one part of the system often ripples through and affects others.
In the next section, I will start with an introduction to Bayesian networks, together with practical examples. This will help you to better understand the real-world use case in the coming sections.
Bayesian Networks and Causal Inference: The Building Blocks.
At its core, a Bayesian network is a graphical model that represents probabilistic relationships between variables. These networks with causal relationships are powerful tools for prescriptive modeling. Let's break this down using a classic example: the sprinkler system. Suppose you're trying to figure out why your grass is wet. One possibility is that you turned on the sprinkler; another is that it rained. The weather plays a role too; on cloudy days, it's more likely to rain, and the sprinkler might behave differently depending on the forecast. These dependencies form a network of causal relationships that we can model. With bnlearn for Python, we can model the relationships as shown in the code block:
# Install Python bnlearn package
pip install bnlearn
# Import library
import bnlearn as bn
# Define the causal relationships
edges = [('Cloudy', 'Sprinkler'),
('Cloudy', 'Rain'),
('Sprinkler', 'Wet_Grass'),
('Rain', 'Wet_Grass')]
# Create the Bayesian network
DAG = bn.make_DAG(edges)
# Visualize the network
bn.plot(DAG)
This creates a Directed Acyclic Graph (DAG) where each node represents a variable, each edge represents a causal relationship, and the direction of the edge shows the direction of causality. So far, we have not modeled any data, but only provided the causal structure based on our own domain knowledge about the weather in combination with our understanding/hypothesis of the system. It is important to realize that such a DAG forms the basis for Bayesian learning! We can thus either create the DAG ourselves or learn the structure from data using structure learning. See the next section on how to learn the DAG from data.
Learning Structure from Data.
On many occasions, we don't know the causal relationships beforehand, but we do have data from which we can learn the structure. The bnlearn library provides several structure-learning approaches that can be selected based on the type of input data (discrete, continuous, or mixed data sets): the PC algorithm (named after Peter and Clark), Exhaustive-Search, Hillclimb-Search, Chow-Liu, NaiveBayes, TAN, and ICA-LiNGAM. The choice of algorithm also depends on the type of network you aim for; you can, for example, set a root node if you have a good reason for this. In the code block below, the structure of the network is learned from a dataframe where the variables are categorical. The output is a DAG that is identical to that of Figure 1.
# Import library
import bnlearn as bn
# Load Sprinkler data set
df = bn.import_example(data='sprinkler')
# Show dataframe
print(df)
+--------+------------+------+------------+
| Cloudy | Sprinkler | Rain | Wet_Grass |
+--------+------------+------+------------+
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 |
| ... | ... | ... | ... |
+--------+------------+------+------------+
[1000 rows x 4 columns]
# Structure learning
model = bn.structure_learning.fit(df)
# Visualize the network
bn.plot(model)
DAGs Matter for Causal Inference.
The bottom line is that Directed Acyclic Graphs (DAGs) depict the causal relationships between the variables. This learned model forms the basis for making inferences and answering questions like:
- If we change X, what happens to Y?
- What is the effect of intervening on X while holding others constant?
Making inferences is crucial for prescriptive modeling because it helps us understand and quantify the impact of the variables on intervention. As mentioned before, not all variables in a system are of interest or subject to intervention. In our simple use case, we can intervene on Wet Grass through the Sprinkler, but we cannot intervene on Wet Grass through Rain or Cloudy conditions because we cannot control the weather. In the coming sections, I will dive into a hands-on, real-world use case on predictive maintenance: I will demonstrate how to build and visualize causal models, learn structure from data, make interventions, and quantify those interventions using inference. But first, here is a minimal sketch of such an inference query on the sprinkler example.
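The sketch below assumes the DAG and the data set from the previous code blocks; the exact probabilities depend on the parameters learned from the data.
# Learn the conditional probability tables (CPTs) for the DAG using the sprinkler data
model = bn.parameter_learning.fit(DAG, df)
# Query: how likely is wet grass when we set the sprinkler to on (1)?
q = bn.inference.fit(model, variables=['Wet_Grass'], evidence={'Sprinkler': 1})
print(q)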
Generate Synthetic Data in Case You Only Have Experts’ Knowledge or Few Samples.
In many domains, such as healthcare, finance, cybersecurity, and autonomous systems, real-world data can be sensitive, expensive, imbalanced, or difficult to collect, particularly for rare or edge-case scenarios. This is where synthetic data becomes a powerful alternative. Roughly speaking, there are two main categories of creating synthetic data: probabilistic and generative. In case you need more data, I recommend reading my blog on synthetic data generation [3]. It discusses various concepts together with hands-on examples (a minimal sampling sketch follows the list below). Among the discussed points are:
- Generate synthetic data that mimics existing continuous measurements (expected to be independent variables).
- Generate synthetic data that mimics expert knowledge (expected to be continuous and independent variables).
- Generate synthetic data that mimics an existing categorical data set (expected to contain dependent variables).
- Generate synthetic data that mimics expert knowledge (expected to be categorical and with dependent variables).
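As a quick illustration of the probabilistic route, here is a minimal sketch that learns a Bayesian network from an existing data set and then draws new synthetic samples from it. I reuse the sprinkler data purely as an example; bn.sampling draws records from the learned joint distribution.
# Import library
import bnlearn as bn
# Learn structure and parameters from the available samples
df = bn.import_example(data='sprinkler')
model = bn.structure_learning.fit(df)
model = bn.parameter_learning.fit(model, df)
# Draw 1000 new synthetic samples that follow the learned joint distribution
df_synthetic = bn.sampling(model, n=1000)
print(df_synthetic.head())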

A Real-World Use Case in Predictive Maintenance.
To this point, I have briefly described the Bayesian theory and demonstrated how to learn structures using the sprinkler data set. In this section, we will work with a complex real-world data set to determine the causal relationships, perform inferences, and assess whether we can recommend interventions in the system to change the outcome of machine failures. Suppose you're responsible for the engines that operate a water lock, and you're trying to understand what factors drive potential machine failures, because your goal is to keep the engines running without failures. In the following sections, we will go step by step through the data modeling parts to work toward that goal.

Step 1: Data Understanding.
The data set we will use is a predictive maintenance data set [1] (CC BY 4.0 license). It captures a simulated but realistic representation of sensor data from machinery over time. In our case, we treat it as if it were collected from a complex infrastructure system, such as the motors controlling a water lock, where equipment reliability is critical. See the code block below to load the data set.
# Import library
import bnlearn as bn
# Load data set
df = bn.import_example('predictive_maintenance')
# print dataframe
+-------+------------+------+------------------+----+-----+-----+-----+-----+
| UDI | Product ID | Type | Air temperature | .. | HDF | PWF | OSF | RNF |
+-------+------------+------+------------------+----+-----+-----+-----+-----+
| 1 | M14860 | M | 298.1 | .. | 0 | 0 | 0 | 0 |
| 2 | L47181 | L | 298.2 | .. | 0 | 0 | 0 | 0 |
| 3 | L47182 | L | 298.1 | .. | 0 | 0 | 0 | 0 |
| 4 | L47183 | L | 298.2 | .. | 0 | 0 | 0 | 0 |
| 5 | L47184 | L | 298.2 | .. | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | .. | ... | ... | ... | ... |
| 9996 | M24855 | M | 298.8 | .. | 0 | 0 | 0 | 0 |
| 9997 | H39410 | H | 298.9 | .. | 0 | 0 | 0 | 0 |
| 9998 | M24857 | M | 299.0 | .. | 0 | 0 | 0 | 0 |
| 9999 | H39412 | H | 299.0 | .. | 0 | 0 | 0 | 0 |
|10000 | M24859 | M | 299.0 | .. | 0 | 0 | 0 | 0 |
+-------+------------+------+------------------+----+-----+-----+-----+-----+
[10000 rows x 14 columns]
The predictive maintenance data set is a so-called mixed-type data set containing a combination of continuous, categorical, and binary variables. It captures operational data from machines, including both sensor readings and failure events. For instance, it includes physical measurements like rotational speed, torque, and tool wear (all continuous variables reflecting how the machine is behaving over time). Alongside these, we have categorical information such as the machine type and environmental data like air temperature. The data set also records whether specific types of failures occurred, such as tool wear failure or heat dissipation failure, represented as binary variables. This mix of variables allows us to not only observe what happens under different conditions but also explore the potential causal relationships that might drive machine failures.

Step 2: Data Cleaning.
Before we can begin learning the causal structure of this system using Bayesian methods, we first need to perform some pre-processing steps. The first step is to remove irrelevant columns, such as the unique identifiers (UDI and Product ID), which hold no meaningful information for modeling. The next step would be to impute or remove missing values, but this data set contains none. If there were missing values, bnlearn provides two imputation methods for handling them: the K-Nearest Neighbor imputer (knn_imputer) and the MICE imputation approach (mice_imputer). Both methods follow a two-step approach in which the numerical values are imputed first, followed by the categorical values. This two-step approach is an enhancement over existing methods for handling missing values in mixed-type data sets.
# Remove IDs from Dataframe
del df['UDI']
del df['Product ID']
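Although not needed here, below is a hedged sketch of how such an imputation could look. The exact module path and default parameters are assumptions on my side; check the bnlearn documentation for your version.
# Hedged sketch (this data set has no missing values, so the calls are commented out)
# KNN-based imputation: numeric columns are imputed first, then the categorical ones
# df = bn.impute.knn_imputer(df)
# MICE-based imputation as an alternative
# df = bn.impute.mice_imputer(df)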
Step 3: Discretization Using Probability Density Functions.
Most Bayesian models are designed to model categorical variables. Continuous variables can distort computations because they require assumptions about the underlying distributions, which are not always easy to validate. For data sets that contain both continuous and discrete variables, it is best to discretize the continuous variables. There are multiple ways to discretize, and the following solutions are implemented in bnlearn:
- Discretize using probability density fitting. This approach automatically fits the best distribution for the variable and bins it into 95% confidence intervals (the thresholds can be adjusted). A semi-automatic approach is recommended as the default CII (upper, lower) intervals may not correspond to meaningful domain-specific boundaries.
- Discretize using a principled Bayesian discretization method. This approach requires providing the DAG before applying the discretization method. The underlying idea is that experts’ knowledge will be included in the discretization approach, and therefore increase the accuracy of the binning.
- Do not discretize, but model continuous and hybrid data sets with a semi-parametric approach. Two approaches implemented in bnlearn can handle mixed data sets: Direct-LiNGAM and ICA-LiNGAM, which both assume linear relationships.
- Manually discretize using the expert's domain knowledge. Such a solution can be beneficial, but it requires expert-level mechanical knowledge or access to detailed operational thresholds. A limitation is that it can introduce bias into the variables, as the thresholds reflect subjective assumptions and may not capture the true underlying variability or relationships in the data.
Approaches 2 and 3 may be less suitable for our current use case: Bayesian discretization methods often require strong priors or assumptions about the system (the DAG) that I cannot confidently provide, while the semi-parametric approach may introduce unnecessary complexity for this relatively small data set. The discretization approach that I will use is probability density fitting [3] in combination with the specifications of the operating ranges of the mechanical devices. I don't have the expert-level mechanical knowledge to confidently set the thresholds, but the specifications for normal mechanical operation are listed in the documentation [1]. Let me elaborate on this. The data set description lists the following specifications: Air Temperature is measured in Kelvin, around 300 K with a standard deviation of 2 K. The Process temperature within the manufacturing process is approximately the Air Temperature plus 10 K. The Rotational speed of the machine is in revolutions per minute and is calculated from a power of 2860 W. The Torque is in Newton-meters, around 40 Nm and without negative values. The Tool wear is the cumulative number of minutes of use. With this information, we can define whether we need to set lower and/or upper boundaries for our probability density fitting approach.

See Table 2, where I defined the normal and critical operation ranges, and the code block below to set the threshold values based on the data distributions of the variables.
# Install distfit library
pip install distfit
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from distfit import distfit
# Discretize the following columns
colnames = ['Air temperature [K]', 'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]']
colors = ['#87CEEB', '#FFA500', '#800080', '#FF4500', '#A9A9A9']
# Apply distribution fitting to each variable
for colname, color in zip(colnames, colors):
    # Initialize and set the 95% confidence interval
    if colname=='Tool wear [min]' or colname=='Process temperature [K]':
        # Set model parameters to determine the medium-high ranges
        dist = distfit(alpha=0.05, bound='up', stats='RSS')
        labels = ['medium', 'high']
    else:
        # Set model parameters to determine the low-medium-high ranges
        dist = distfit(alpha=0.05, stats='RSS')
        labels = ['low', 'medium', 'high']
    # Distribution fitting
    dist.fit_transform(df[colname])
    # Plot
    dist.plot(title=colname, bar_properties={'color': color})
    plt.show()
    # Define bins based on the fitted distribution
    bins = [df[colname].min(), dist.model['CII_min_alpha'], dist.model['CII_max_alpha'], df[colname].max()]
    # Remove None (only one boundary is set when bound='up')
    bins = [x for x in bins if x is not None]
    # Discretize using the defined bins and add to the dataframe
    df[colname + '_category'] = pd.cut(df[colname], bins=bins, labels=labels, include_lowest=True)
    # Delete the original column
    del df[colname]
This semi-automated approach determines the optimal binning for each variable given the critical operation ranges. We thus fit a probability density function (PDF) to each continuous variable and use statistical properties, such as the 95% confidence interval, to define categories like low, medium, and high. This preserves the underlying distribution of the data while still allowing for interpretable discretization aligned with natural variations in the system, and it creates bins that are both statistically sound and interpretable. As always, plot the results and perform sanity checks, as the resulting intervals may not always align with meaningful, domain-specific thresholds. See Figure 2 with the estimated PDFs and thresholds for the continuous variables. In this scenario, we can see that two variables are binned into medium-high, while the rest are binned into low-medium-high.

Step 4: The Final Cleaned Data set.
At this point, we have a cleaned and discretized data set. The remaining variables in the data set are the failure modes (TWF, HDF, PWF, OSF, RNF), which are boolean variables that need no transformation step. These variables are kept in the model because of their possible relationships with the other variables. As an example, Torque can be linked to OSF (overstrain failure), Air temperature differences to HDF (heat dissipation failure), and Tool Wear to TWF (tool wear failure). The data set description states that if at least one failure mode is true, the process fails and the Machine Failure label is set to 1. It is, however, not transparent which of the failure modes caused the process to fail. In other words, the Machine Failure label is a composite outcome: it only tells you that something went wrong, but not which causal path led to the failure. In the next step, we will learn the structure to discover the causal network.
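As an optional sanity check on this composite-outcome claim, one could verify that the Machine failure label coincides with the logical OR of the individual failure modes. This is a sketch; the column names follow the data set description.
# Hedged sanity check: Machine failure should be 1 whenever at least one failure mode is 1
failure_modes = ['TWF', 'HDF', 'PWF', 'OSF', 'RNF']
composite = (df[failure_modes].sum(axis=1) > 0).astype(int)
print('Agreement with Machine failure label:', (composite == df['Machine failure']).mean())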
Step 5: Learning The Causal Structure.
In this step, we will determine the causal relationships. In contrast to supervised machine learning approaches, we do not need to set a target variable such as Machine Failure. The Bayesian model learns the causal relationships from the data using a search strategy and a scoring function. A scoring function quantifies how well a specific DAG explains the observed data, and the search strategy efficiently walks through the search space of DAGs to find the most optimal DAG without testing them all. For this use case, we will use HillClimbSearch as the search strategy and the Bayesian Information Criterion (BIC) as the scoring function. See the code block to learn the structure using bnlearn for Python.
# Structure learning
model = bn.structure_learning.fit(df, methodtype='hc', scoretype='bic')
# [bnlearn] >Warning: Computing DAG with 12 nodes can take a very long time!
# [bnlearn] >Computing best DAG using [hc]
# [bnlearn] >Set scoring type at [bds]
# [bnlearn] >Compute structure scores for model comparison (higher is better).
print(model['structure_scores'])
# {'k2': -23261.534992034045,
# 'bic': -23296.9910477033,
# 'bdeu': -23325.348497769708,
# 'bds': -23397.741317668322}
# Compute edge weights using ChiSquare independence test.
model = bn.independence_test(model, df, test='chi_square', prune=True)
# Plot the best DAG
bn.plot(model, edge_labels='pvalue', params_static={'maxscale': 4, 'figsize': (15, 15), 'font_size': 14, 'arrowsize': 10})
dotgraph = bn.plot_graphviz(model, edge_labels='pvalue')
dotgraph
# Store to pdf
dotgraph.view(filename='bnlearn_predictive_maintanance')
Each model can be scored based on its structure. The scores have no straightforward interpretation on their own, but they can be used to compare different models. A higher score represents a better fit, and remember that the scores are usually log-likelihood based, so a less negative score is better. From the results, we can see that K2 = -23261 scored the best, meaning that the learned structure had the best fit on the data. However, the difference with BIC = -23296 is very small. I then prefer the DAG determined by BIC over K2, because DAGs detected with BIC are generally sparser, and thus cleaner, as BIC adds a penalty for complexity (number of parameters, number of edges). The K2 approach, on the other hand, determines the DAG purely on the likelihood, i.e., the fit on the data; there is no penalty for making a more complex network (more edges, more parents). The causal DAG is shown in Figure 3, and in the next section I will interpret the results. This is exciting: does the DAG make sense, and can we actively intervene in the system to move toward our desired outcome? Keep on reading!
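As an aside, if you want to see how much the K2 and BIC structures actually differ, here is a minimal sketch. It assumes bn.compare_networks is available in your bnlearn version; it plots where the two adjacency matrices agree or disagree.
# Learn a second structure with the K2 score and compare it against the BIC-based model
model_k2 = bn.structure_learning.fit(df, methodtype='hc', scoretype='k2')
bn.compare_networks(model, model_k2)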

Identify Potential Interventions for Machine Failure.
I introduced the idea that Bayesian analysis enables active intervention in a system, meaning that we can steer toward our desired outcomes, aka prescriptive analysis. To do so, we first need a causal understanding of the system. At this point, we have obtained our DAG (Figure 3) and can start interpreting it to determine the possible driver variables of machine failures.
From Figure 3, it can be observed that the Machine Failure label is a composite outcome; it is influenced by multiple underlying variables. We can use the DAG to systematically identify the variables for intervention on machine failures. Let's start by examining the root variable, which is PWF (Power Failure). The DAG shows that preventing power failures would directly contribute to preventing machine failures overall. Although this finding is intuitive (power issues lead to system failure), it is important to recognize that this conclusion has now been derived purely from data. If it had been a different variable, we would have needed to think about what it could mean and whether the DAG is accurate for our data set.
When we continue to examine the DAG, we see that Torque is linked to OSF (Overstrain Failure), Air Temperature is linked to HDF (Heat Dissipation Failure), and Tool Wear is linked to TWF (Tool Wear Failure). Ideally, we expect the failure modes (TWF, HDF, PWF, OSF, RNF) to be effects, while physical variables like Torque, Air Temperature, and Tool Wear act as causes. Although structure learning detected these relationships quite well, it does not always capture the correct causal direction purely from observational data. Nonetheless, the discovered edges provide actionable starting points that can be used to design our interventions:
- Torque → OSF (Overstrain Failure): Actively monitoring and controlling torque levels can prevent overstrain-related failures.
- Air Temperature → HDF (Heat Dissipation Failure): Managing the ambient environment (e.g., through improved cooling systems) may reduce heat dissipation issues.
- Tool Wear → TWF (Tool Wear Failure): Real-time tool wear monitoring can prevent tool wear failures.
Additionally, Random Failures (RNF) show no incoming or outgoing connections, indicating that such failures are truly stochastic within this data set and cannot be mitigated through interventions on observed variables. This is a great sanity check for the model, because we would not expect RNF to be important in the DAG!
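These observations can also be verified programmatically by inspecting the learned edge list. A small sketch, assuming the key name ('model_edges') of the model dictionary returned by bnlearn:
# Print all learned edges as (source, target) pairs
print(model['model_edges'])
# Count how often RNF appears in any edge; we expect 0 for this data set
n_rnf = sum('RNF' in edge for edge in model['model_edges'])
print('Edges involving RNF:', n_rnf)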
Quantify with Interventions.
Up to this point, we have learned the structure of the system and identified which variables can be targeted for intervention. However, we are not finished yet. To make these interventions meaningful, we must quantify the expected outcomes.
This is where inference in Bayesian networks comes into play. Let me elaborate a bit more on this: when I describe intervention, I mean changing a variable in the system, like keeping Torque at a low level, reducing Tool Wear before it hits high values, or making sure Air Temperature stays stable. In this manner, we can reason over the learned model, because the system is interdependent and a change in one variable can ripple throughout the entire system.
To make these interventions meaningful, we must quantify the expected outcomes.
Inference is thus important for various reasons:
- Forward inference: predict future outcomes given current evidence.
- Backward inference: diagnose the most likely cause after an event has occurred.
- Counterfactual inference: simulate "what-if" scenarios.
In the context of our predictive maintenance data set, inference can now help answer specific questions. But first, we need to learn the inference model, which is done easily as shown in the code block below. With the model, we can start asking questions and see how the effects ripple throughout the system.
# Learn inference model
model = bn.parameter_learning.fit(model, df, methodtype="bayes")
What is the probability of a Machine Failure if Torque is high?
q = bn.inference.fit(model, variables=['Machine failure'],
evidence={'Torque [Nm]_category': 'high'},
plot=True)
+-------------------+----------+
| Machine failure | p |
+===================+==========+
| 0 | 0.584588 |
+-------------------+----------+
| 1 | 0.415412 |
+-------------------+----------+
Machine failure = 0: No machine failure occurred.
Machine failure = 1: A machine failure occurred.
Given that the Torque is high:
There is about a 58.5% chance the machine will not fail.
There is about a 41.5% chance the machine will fail.
A high Torque value thus significantly increases the risk of machine failure. Think about it: without conditioning, machine failure probably happens at a much lower rate. Controlling the torque and keeping it out of the high range could therefore be an important prescriptive action to prevent failures.
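To put the 41.5% into perspective, we can contrast it with a low-Torque query and with the unconditional failure rate in the data. A minimal sketch; the exact numbers depend on the learned parameters.
# Same query, but now with Torque in the low range
q_low = bn.inference.fit(model, variables=['Machine failure'],
                         evidence={'Torque [Nm]_category': 'low'})
# Empirical (unconditional) failure rate as a reference point
print('Baseline P(Machine failure = 1):', df['Machine failure'].mean())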

If we manage to keep the Air Temperature in the medium range, how much does the probability of Heat Dissipation Failure decrease?
q = bn.inference.fit(model, variables=['HDF'],
evidence={'Air temperature [K]_category': 'medium'},
plot=True)
+-------+-----------+
| HDF | p |
+=======+===========+
| 0 | 0.972256 |
+-------+-----------+
| 1 | 0.0277441 |
+-------+-----------+
HDF = 0 means "no heat dissipation failure."
HDF = 1 means "there is a heat dissipation failure."
Given that the Air Temperature is kept at a medium level:
There is about a 97.2% chance that no failure will happen.
There is only about a 2.8% chance that a failure will happen.
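To actually quantify the decrease, the same query can be repeated with the Air Temperature in the high range and the two probabilities compared. A minimal sketch:
# Same query, but now with a high Air Temperature
q_high = bn.inference.fit(model, variables=['HDF'],
                          evidence={'Air temperature [K]_category': 'high'})
# The difference between P(HDF=1 | high) and P(HDF=1 | medium) is the expected
# reduction achieved by keeping the temperature in the medium range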

Given that a Machine Failure has occurred, which failure mode (TWF, HDF, PWF, OSF, RNF) is the most probable cause?
q = bn.inference.fit(model, variables=['TWF', 'HDF', 'PWF', 'OSF'],
evidence={'Machine failure': 1},
plot=True)
+----+-------+-------+-------+-------+-------------+
| | TWF | HDF | PWF | OSF | p |
+====+=======+=======+=======+=======+=============+
| 0 | 0 | 0 | 0 | 0 | 0.0240521 |
+----+-------+-------+-------+-------+-------------+
| 1 | 0 | 0 | 0 | 1 | 0.210243 | <- OSF
+----+-------+-------+-------+-------+-------------+
| 2 | 0 | 0 | 1 | 0 | 0.207443 | <- PWF
+----+-------+-------+-------+-------+-------------+
| 3 | 0 | 0 | 1 | 1 | 0.0321357 |
+----+-------+-------+-------+-------+-------------+
| 4 | 0 | 1 | 0 | 0 | 0.245374 | <- HDF
+----+-------+-------+-------+-------+-------------+
| 5 | 0 | 1 | 0 | 1 | 0.0177909 |
+----+-------+-------+-------+-------+-------------+
| 6 | 0 | 1 | 1 | 0 | 0.0185796 |
+----+-------+-------+-------+-------+-------------+
| 7 | 0 | 1 | 1 | 1 | 0.00499062 |
+----+-------+-------+-------+-------+-------------+
| 8 | 1 | 0 | 0 | 0 | 0.21378 | <- TWF
+----+-------+-------+-------+-------+-------------+
| 9 | 1 | 0 | 0 | 1 | 0.00727977 |
+----+-------+-------+-------+-------+-------------+
| 10 | 1 | 0 | 1 | 0 | 0.00693896 |
+----+-------+-------+-------+-------+-------------+
| 11 | 1 | 0 | 1 | 1 | 0.00148291 |
+----+-------+-------+-------+-------+-------------+
| 12 | 1 | 1 | 0 | 0 | 0.00786678 |
+----+-------+-------+-------+-------+-------------+
| 13 | 1 | 1 | 0 | 1 | 0.000854361 |
+----+-------+-------+-------+-------+-------------+
| 14 | 1 | 1 | 1 | 0 | 0.000927891 |
+----+-------+-------+-------+-------+-------------+
| 15 | 1 | 1 | 1 | 1 | 0.000260654 |
+----+-------+-------+-------+-------+-------------+
Each row represents a possible combination of failure modes:
TWF: Tool Wear Failure
HDF: Heat Dissipation Failure
PWF: Power Failure
OSF: Overstrain Failure
Most of the time, when a machine failure occurs, it can be traced back to exactly one dominant failure mode:
HDF (24.5%)
OSF (21.0%)
PWF (20.7%)
TWF (21.4%)
Combined failures (e.g., HDF and PWF active at the same time) are much less frequent (each combination below 5%).
When a machine fails, it's almost always due to one specific failure mode and not a combination.
Heat Dissipation Failure (HDF) is the most common root cause (24.5%), but the others are very close.
Intervening on these individual failure types could significantly reduce machine failures.
I demonstrated three examples using inference with interventions at different points. Remember that to make the interventions meaningful, we must quantify the expected outcomes. If we don't quantify how much these actions will change the probability of machine failure, we are just guessing. The quantification, "If I lower Torque, what happens to the failure probability?", is exactly what inference in Bayesian networks does: it updates the probabilities based on our intervention (the evidence) and then tells us how much impact our control action will have. I have one last section to share, which is about cost-sensitive modeling. The question you should ask yourself is not just "Can I predict or prevent failures?" but also "How cost-effective is it?". Keep on reading into the next section!
Cost-Sensitive Modeling: Finding the Sweet Spot.
How cost-effective is it to prevent failures? This is the question you should ask yourself before “Can I prevent failures?”. When we build prescriptive maintenance models and recommend interventions based on model outputs, we must also understand the economic returns. This moves the discussion from pure model accuracy to a cost-optimization framework.
One way to do this is by translating the traditional confusion matrix into a cost-optimization matrix, as depicted in Figure 6. The confusion matrix has the four known states (A), but each state can have a different cost implication (B). For illustration, in Figure 6C, a premature replacement (false positive) costs €2000 in unnecessary maintenance. In contrast, missing a true failure (false negative) can cost €8000 (including €6000 damage and €2000 replacement costs). This asymmetry highlights why cost-sensitive modeling is critical: False negatives are 4x more costly than false positives.
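To make this concrete, here is a minimal sketch that turns the confusion matrix into expected costs, using the illustrative figures from Figure 6. The error counts in the example calls are hypothetical.
# Cost figures from the example: premature replacement vs. missed failure
COST_FP = 2000   # false positive: unnecessary maintenance
COST_FN = 8000   # false negative: 6000 damage + 2000 replacement

def expected_cost(n_fp, n_fn, cost_fp=COST_FP, cost_fn=COST_FN):
    # Total cost of the model's errors over an evaluation period
    return n_fp * cost_fp + n_fn * cost_fn

# A 'cautious' model with more premature replacements can still be cheaper overall
print(expected_cost(n_fp=30, n_fn=2))   # 76,000
print(expected_cost(n_fp=5, n_fn=15))   # 130,000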

In practice, we should therefore not only optimize for model performance but also minimize the total expected costs. A model with a higher false positive rate (premature replacement) can therefore be more optimal if it significantly reduces the costs compared to the much costlier false negatives (Failure). Having said this, this does not mean that we should always go for premature replacements because, besides the costs, there is also the timing of replacing. Or in other words, when should we replace equipment?
The exact moment when equipment should be replaced or serviced is inherently uncertain. Mechanical processes with wear and tear are stochastic. Therefore, we cannot expect to know the precise point of optimal intervention. What we can do is look for the so-called sweet spot for maintenance, where intervention is most cost-effective, as depicted in Figure 7.

This figure shows how the costs of owning (orange) and repairing an asset (blue) evolve over time. At the start of an asset’s life, owning costs are high (but decrease steadily), while repair costs are low (but rise over time). When these two trends are combined, the total cost initially declines but then starts to increase again.
The sweet spot occurs in the period where the total cost of ownership and repair is at its lowest. Although the sweet spot can be estimated, it usually cannot be pinpointed exactly because real-world conditions vary; it is therefore better to define a sweet-spot window. Good monitoring and data-driven strategies allow us to stay close to it and avoid the steep costs associated with unexpected failure later in the asset's life. Acting during this sweet-spot window (e.g., replacing, overhauling) ensures the best financial outcome. Intervening too early means missing out on usable life, while waiting too long leads to rising repair costs and an increased risk of failure. The main takeaway is that effective asset management aims to act near the sweet spot, avoiding both unnecessary early replacement and costly reactive maintenance after failure.
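As a tiny numerical illustration of this trade-off, here is a sketch with hypothetical cost curves; the shapes are assumptions and are not derived from the data set.
# Import library
import numpy as np

years = np.arange(1, 21)
owning_cost = 10000 / years        # declining ownership cost (hypothetical shape)
repair_cost = 150 * years**1.5     # rising repair cost (hypothetical shape)
total_cost = owning_cost + repair_cost

# The sweet spot is where the combined cost curve reaches its minimum
sweet_spot = years[int(np.argmin(total_cost))]
print(f'Lowest total cost around year {sweet_spot}')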
Wrapping up.
In this article, we moved from a raw data set to a causal Directed Acyclic Graph (DAG), which enabled us to go beyond descriptive statistics toward prescriptive analysis. I demonstrated a data-driven approach to learn the causal structure of a data set and to identify which aspects of the system can be adjusted to reduce failure rates. Before making interventions, we must also perform inference, which gives us the updated probabilities when we fix (or observe) certain variables. Without this step, the intervention is just guessing, because actions in one part of the system often ripple through and affect others. This interconnectedness is exactly why understanding causal relationships is so important.
Before moving into prescriptive analytics and taking action based on our analytical interventions, it is highly recommended to investigate whether the cost of failure outweighs the cost of maintenance. The challenge is to find the sweet spot: the point where the cost of preventive maintenance is balanced against the rising risk and cost of failure. I showed with Bayesian inference how variables like Torque can shift the failure probability. Such insights provide an understanding of the impact of an intervention. The timing of the intervention is crucial to make it cost-effective; being too early wastes resources, and being too late can result in high failure costs.
Just like all other models, Bayesian models are also “just” models, and the causal network needs experimental validation before making any critical decisions.
Be safe. Stay frosty.
Cheers, E.
You have come to the end of this article! I hope you enjoyed and learned a lot! Experiment with the hands-on examples! This will help you to learn quicker, understand better, and remember longer.
References
1. AI4I 2020 Predictive Maintenance Data Set (2020). UCI Machine Learning Repository. Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
2. E. Taskesen, bnlearn for Python library.
3. E. Taskesen, How to Generate Synthetic Data: A Comprehensive Guide Using Bayesian Sampling and Univariate Distributions, Towards Data Science (TDS), May 2026.