Prediction of Military UAV Development Cost Based on Partial Least Squares Regression

In recent years, rapid advances in computing and other high technologies have significantly enhanced the performance and complexity of military unmanned aerial vehicles (UAVs), making them a critical focus for nations worldwide. Because defense resources are limited, accurately predicting the development cost of military UAVs holds substantial practical value. The development cost of military UAVs is influenced by numerous factors, such as maximum take-off mass, cruise velocity, flight altitude, endurance time, payload, and airframe length. These parameters interact in intricate ways and are often strongly correlated. Moreover, due to the classified nature of military projects, sample data for military UAV development are typically scarce and incomplete. Traditional multivariate regression methods struggle with such small-sample, high-dimensional, and multicollinear data, leading to unreliable predictions. Precise and robust prediction methods are therefore needed.

Neural networks, particularly backpropagation (BP) networks, have been employed for cost prediction, but they suffer from drawbacks like local minima and slow convergence. Radial basis function (RBF) networks offer improvements in approximation capability and learning speed. However, these methods often overlook the small-sample characteristics and the inherent multicollinearity among explanatory variables. In contrast, partial least squares regression (PLSR) excels in handling such challenges. PLSR, introduced by Wold and Albano in 1983, integrates principles from multiple linear regression, principal component analysis, and canonical correlation analysis. It provides a robust framework for modeling relationships between variables when data are limited and highly correlated. In this paper, I present a comprehensive approach to predicting military UAV development cost using PLSR, leveraging its unique advantages for small-sample analysis. I will detail the theoretical foundations, modeling steps, and practical application, supplemented with formulas, tables, and an illustrative example. The goal is to demonstrate that PLSR offers a reliable and interpretable alternative for cost estimation in the defense sector, where data constraints are common.

The core idea behind PLSR is to extract latent components (also called factors or scores) from both the predictor matrix $\mathbf{X}$ and the response vector $\mathbf{Y}$. These components are constructed to maximize the covariance between $\mathbf{X}$ and $\mathbf{Y}$, thereby addressing multicollinearity and enhancing predictive accuracy with limited samples. For military UAV cost prediction, let $\mathbf{X}$ be an $n \times p$ matrix of standardized predictor variables (e.g., airframe length, mass, velocity), where $n$ is the number of military UAV samples and $p$ is the number of predictors. Let $\mathbf{Y}$ be an $n \times 1$ vector of standardized development cost. PLSR seeks to find a set of orthogonal components $\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_m$ that explain the variation in both $\mathbf{X}$ and $\mathbf{Y}$. The number of components $m$ is typically determined through cross-validation.

The PLSR algorithm proceeds iteratively. First, standardize $\mathbf{X}$ and $\mathbf{Y}$ to have zero mean and unit variance, resulting in matrices $\mathbf{E}_0$ and $\mathbf{F}_0$. For each component $h = 1, 2, \ldots, m$, compute the weight vector $\mathbf{w}_h$ as:

$$\mathbf{w}_h = \frac{\mathbf{E}_{h-1}^T \mathbf{F}_{h-1}}{\|\mathbf{E}_{h-1}^T \mathbf{F}_{h-1}\|}$$

Then, calculate the component score vector $\mathbf{t}_h = \mathbf{E}_{h-1} \mathbf{w}_h$. Next, perform regressions of $\mathbf{E}_{h-1}$ and $\mathbf{F}_{h-1}$ on $\mathbf{t}_h$ to obtain loadings $\mathbf{p}_h$ and regression coefficient $r_h$:

$$\mathbf{p}_h = \frac{\mathbf{E}_{h-1}^T \mathbf{t}_h}{\|\mathbf{t}_h\|^2}, \quad r_h = \frac{\mathbf{F}_{h-1}^T \mathbf{t}_h}{\|\mathbf{t}_h\|^2}$$

Update the residual matrices: $\mathbf{E}_h = \mathbf{E}_{h-1} - \mathbf{t}_h \mathbf{p}_h^T$ and $\mathbf{F}_h = \mathbf{F}_{h-1} - \mathbf{t}_h r_h$. This process continues until the desired number of components is extracted. The final predictive model for $\mathbf{Y}$ in terms of the original predictors is derived by back-transforming the standardized regression coefficients. The cumulative explanatory power of the components can be assessed using metrics like $R^2$ and cross-validated $Q^2$.
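The iteration above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the code used for the case study: the names `pls1_fit` and `pls1_predict`, the use of the sample standard deviation (`ddof=1`) for standardization, and the closed form $\mathbf{W}(\mathbf{P}^T\mathbf{W})^{-1}\mathbf{r}$ for the standardized coefficients are my own choices.

```python
import numpy as np

def pls1_fit(X, y, m):
    """Fit a single-response PLS model with m components by successive deflation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean, y_std = y.mean(), y.std(ddof=1)
    E = (X - x_mean) / x_std              # E_0: standardized predictors
    F = (y - y_mean) / y_std              # F_0: standardized response
    n, p = E.shape
    W, P = np.zeros((p, m)), np.zeros((p, m))
    T, r = np.zeros((n, m)), np.zeros(m)
    for h in range(m):
        w = E.T @ F
        w = w / np.linalg.norm(w)         # weight vector w_h (unit length)
        t = E @ w                         # score vector t_h
        tt = t @ t
        p_h = E.T @ t / tt                # loading p_h
        r_h = F @ t / tt                  # regression coefficient r_h
        E = E - np.outer(t, p_h)          # residual E_h
        F = F - t * r_h                   # residual F_h
        W[:, h], P[:, h], T[:, h], r[h] = w, p_h, t, r_h
    # Coefficients on the standardized scale, then back-transformed to original units.
    beta_std = W @ np.linalg.solve(P.T @ W, r)
    beta = beta_std * y_std / x_std
    intercept = y_mean - x_mean @ beta
    return {"W": W, "P": P, "T": T, "r": r, "beta": beta, "intercept": intercept}

def pls1_predict(model, X_new):
    """Predict the response in original units for new predictor rows."""
    return model["intercept"] + np.asarray(X_new, dtype=float) @ model["beta"]
```

A useful sanity check on such a sketch is that, when $m$ equals the rank of the centered predictor matrix, the PLS coefficients coincide with the ordinary least squares solution; the benefit of PLSR comes from stopping at a smaller $m$.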

For military UAV applications, it is crucial to evaluate the importance of each predictor variable. PLSR provides the variable importance in projection (VIP) score, defined for the $j$-th predictor as:

$$\text{VIP}_j = \sqrt{\frac{p}{\sum_{h=1}^m \text{SS}( \mathbf{Y}, \mathbf{t}_h )} \sum_{h=1}^m \text{SS}( \mathbf{Y}, \mathbf{t}_h ) \cdot w_{hj}^2}$$

where $\text{SS}( \mathbf{Y}, \mathbf{t}_h )$ is the sum of squares explained by component $\mathbf{t}_h$ for $\mathbf{Y}$, and $w_{hj}$ is the weight for predictor $j$ in component $h$. Predictors with VIP scores greater than 1 are considered influential. Additionally, outlier detection is performed using Hotelling’s $T^2$ statistic to identify anomalous military UAV samples that may distort the model.
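The VIP formula can be evaluated directly from the weights, scores, and regression coefficients of a fitted model. The sketch below assumes unit-length weight vectors and takes $\text{SS}(\mathbf{Y}, \mathbf{t}_h) = r_h^2 \|\mathbf{t}_h\|^2$; the function name `vip_scores` and its argument layout are hypothetical.

```python
import numpy as np

def vip_scores(W, T, r):
    """VIP per predictor.

    W: (p, m) unit-length weight vectors, one column per component
    T: (n, m) score vectors
    r: (m,)   regression coefficients of the response on each score
    """
    W = np.asarray(W, dtype=float)
    T = np.asarray(T, dtype=float)
    r = np.asarray(r, dtype=float)
    p = W.shape[0]
    # Sum of squares of Y explained by each component: r_h^2 * ||t_h||^2
    ss = r ** 2 * np.sum(T ** 2, axis=0)
    # Weighted average of squared weights, scaled by the number of predictors
    return np.sqrt(p * (W ** 2 @ ss) / ss.sum())
```

Two consequences follow from the definition. The squared VIP scores sum to $p$, so not every predictor can exceed 1 unless all are exactly 1. With a single component the score reduces to $\text{VIP}_j = \sqrt{p}\,|w_{1j}|$.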

To apply PLSR for military UAV development cost prediction, I outline the following steps:

1. Data Collection: Gather historical data on military UAV development costs and associated technical parameters. Key parameters often include airframe length $L$ (m), maximum take-off mass $W$ (kg), cruise velocity $V$ (km/h), flight altitude $H$ (km), endurance time $T$ (h), and payload capacity $N$ (kg). These parameters encapsulate the performance characteristics that drive costs for military UAV systems.
2. Data Preprocessing: Examine the data for outliers and multicollinearity. Calculate correlation matrices to assess relationships among variables. Standardize all variables to mitigate scale effects.
3. Component Extraction: Apply the PLSR algorithm to extract latent components. Use cross-validation to determine the optimal number of components $m$. The cross-validated $Q^2$ statistic is computed as $Q_h^2 = 1 - \frac{\text{PRESS}_h}{\text{SS}_{h-1}}$, where $\text{PRESS}_h$ is the prediction residual sum of squares with $h$ components and $\text{SS}_{h-1}$ is the residual sum of squares of the model with $h-1$ components. A component is retained if $Q_h^2 \geq 0.0975$.
4. Model Building: Construct the regression model linking the extracted components to development cost. Translate the model back to the original predictor space to obtain interpretable coefficients.
5. Validation and Prediction: Validate the model using hold-out samples or cross-validation. Use the model to predict costs for new military UAV designs and conduct sensitivity analyses to understand cost drivers.
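The component-retention rule in the extraction step can be sketched with leave-one-out cross-validation. This is one plausible reading of the $Q_h^2$ criterion, not necessarily the exact procedure used here: I assume leave-one-out folds, re-standardization inside each fold, and a residual sum of squares $\text{SS}_{h-1}$ taken from the $(h-1)$-component model fitted on all samples. The names `pls1_fit_predict` and `q2_statistic` are hypothetical. The threshold 0.0975 equals $1 - 0.95^2$.

```python
import numpy as np

def pls1_fit_predict(X_train, y_train, X_new, m):
    """Fit an m-component PLS1 model on the training rows and predict X_new."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
    ym, ys = y_train.mean(), y_train.std(ddof=1)
    E = (X_train - mu) / sd
    F = (y_train - ym) / ys
    Z = (X_new - mu) / sd                 # new rows on the training scale
    pred = np.zeros(len(Z))
    for _ in range(m):
        w = E.T @ F
        w = w / np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p_h = E.T @ t / tt
        r_h = F @ t / tt
        t_new = Z @ w                     # scores of the new rows
        pred += t_new * r_h
        E = E - np.outer(t, p_h)          # deflate training data
        F = F - t * r_h
        Z = Z - np.outer(t_new, p_h)      # deflate new rows the same way
    return ym + ys * pred                 # m = 0 returns the training mean

def q2_statistic(X, y, h):
    """Q_h^2 = 1 - PRESS_h / SS_{h-1} with leave-one-out PRESS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    n = len(y)
    ss_prev = np.sum((y - pls1_fit_predict(X, y, X, h - 1)) ** 2)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        y_hat = pls1_fit_predict(X[keep], y[keep], X[i:i + 1], h)[0]
        press += (y[i] - y_hat) ** 2
    return 1.0 - press / ss_prev

def keep_component(X, y, h, threshold=0.0975):
    """Retention rule: keep component h when Q_h^2 reaches the threshold."""
    return q2_statistic(X, y, h) >= threshold
```

Because both PRESS and SS are computed in the units of the response, the ratio does not depend on whether cost is expressed in standardized or original units.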

To illustrate, I consider a dataset similar to that in prior literature, comprising seven military UAV prototypes. The predictors are as defined above, and the response is development cost in billions of US dollars. The data are shown in Table 1. I use six samples for training and one for testing to evaluate predictive performance.

Table 1: Development Cost and Parameters for Military UAV Samples
UAV Model	$L$ (m)	$W$ (kg)	$V$ (km/h)	$H$ (km)	$T$ (h)	$N$ (kg)	Cost (billion USD)
A	13.5	11622	557	19.8	42	900	3.71
B	5.25	480	306	4	7	130	1.33
C	2.08	160	218	4	4	165	0.95
D	4.27	400	30	2	5	14.5	1.02
E	13.5	10395	648	20.4	46	905	4.19
F	4.6	3900	555	15.2	12	450	2.65
K (Test)	8.22	1020	139	7.3	40	204	2.07

The correlation matrix (Table 2) reveals severe multicollinearity among the predictors: every pairwise correlation is at least 0.726, and most exceed 0.9. Under these conditions ordinary least squares coefficient estimates are unstable, which underscores the need for PLSR when analyzing military UAV data.

Table 2: Correlation Matrix for Predictor Variables and Cost
Variable	$L$	$W$	$V$	$H$	$T$	$N$	Cost
$L$	1.000	0.959	0.726	0.845	0.982	0.919	0.942
$W$	0.959	1.000	0.822	0.945	0.981	0.984	0.966
$V$	0.726	0.822	1.000	0.942	0.788	0.869	0.913
$H$	0.845	0.945	0.942	1.000	0.901	0.976	0.983
$T$	0.982	0.981	0.788	0.901	1.000	0.966	0.952
$N$	0.919	0.984	0.869	0.976	0.966	1.000	0.980
Cost	0.942	0.966	0.913	0.983	0.952	0.980	1.000
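A correlation matrix of this kind can be computed from the rows of Table 1 as sketched below. I assume here that only the six training samples (A to F) are used; the source does not state which rows underlie Table 2, so the output need not reproduce its entries exactly. The eigenvalue check illustrates a related point: with six samples, the centered predictor matrix has rank at most five, so the 6 x 6 predictor correlation matrix is singular and an ordinary least squares fit on all six predictors is not uniquely determined.

```python
import numpy as np

# Columns: L (m), W (kg), V (km/h), H (km), T (h), N (kg), Cost (billion USD)
data = np.array([
    [13.5, 11622, 557, 19.8, 42, 900.0, 3.71],   # A
    [5.25,   480, 306,  4.0,  7, 130.0, 1.33],   # B
    [2.08,   160, 218,  4.0,  4, 165.0, 0.95],   # C
    [4.27,   400,  30,  2.0,  5,  14.5, 1.02],   # D
    [13.5, 10395, 648, 20.4, 46, 905.0, 4.19],   # E
    [4.6,   3900, 555, 15.2, 12, 450.0, 2.65],   # F
    [8.22,  1020, 139,  7.3, 40, 204.0, 2.07],   # K (test)
])
names = ["L", "W", "V", "H", "T", "N", "Cost"]

train = data[:6]                           # assumption: training rows A-F only
corr = np.corrcoef(train, rowvar=False)    # 7 x 7 matrix, variables in columns

# Eigenvalues of the predictor block; the smallest is zero up to rounding
# because six samples give a centered matrix of rank at most five.
eigvals = np.linalg.eigvalsh(corr[:6, :6])

for name, row in zip(names, corr):
    print(name, np.round(row, 3))
print("smallest eigenvalue of predictor block:", eigvals.min())
```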

Applying PLSR, I first standardize the data. The relationship between the first component $\mathbf{t}_1$ and the response $\mathbf{Y}$ is linear, justifying the use of a linear PLSR model. Outlier detection via Hotelling’s $T^2$ shows no anomalous military UAV samples. I extract one PLS component, as cross-validation yields $Q_1^2 = 0.85 > 0.0975$, indicating sufficient predictive power. This component explains 92.49% of the variance in $\mathbf{X}$ and 97.97% in $\mathbf{Y}$. The VIP scores for all predictors are above 0.95, that is, close to or above the conventional threshold of 1, which suggests that every parameter contributes materially to explaining military UAV development cost. The final regression model in original units is:

$$\text{Cost} = 0.5463 + 0.0469L + 0.0000471W + 0.000973V + 0.0296H + 0.0126T + 0.000635N$$

For the test military UAV model K, the predicted cost is 1.9616 billion US dollars, an error of 5.24% relative to the actual 2.07 billion US dollars.
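As a quick check, the regression equation can be evaluated for sample K using the coefficients as printed. Because those coefficients are rounded, the result comes out at roughly 1.965 rather than the reported 1.9616; I assume the reported figure was computed with unrounded coefficients. The variable names below are illustrative.

```python
# Coefficients as printed in the regression equation (rounded values).
intercept = 0.5463
coef = {"L": 0.0469, "W": 0.0000471, "V": 0.000973,
        "H": 0.0296, "T": 0.0126, "N": 0.000635}

# Parameters of test sample K from Table 1.
sample_k = {"L": 8.22, "W": 1020, "V": 139, "H": 7.3, "T": 40, "N": 204}

predicted = intercept + sum(coef[name] * sample_k[name] for name in coef)
actual = 2.07
relative_error_pct = abs(actual - predicted) / actual * 100

print(f"predicted cost: {predicted:.4f} billion USD")
print(f"relative error: {relative_error_pct:.2f}%")
```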

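I compare PLSR with other methods: stepwise multivariate regression (SMR), BP neural network, and RBF neural network. The results are summarized in Table 3. PLSR achieves the lowest prediction error on the test sample (5.24%), narrowly ahead of the RBF network (5.30%) and well ahead of the BP network (8.70%) and SMR (17.00%). Because the comparison rests on a single held-out sample, it is indicative rather than conclusive, but it supports the efficacy of PLSR for military UAV cost estimation with small samples.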

Table 3: Comparison of Prediction Errors for Military UAV Development Cost
Method	Predicted Cost (billion USD)	Error (%)
PLSR	1.9616	5.24
SMR	1.7181	17.00
BP Neural Network	1.8900	8.70
RBF Neural Network	1.9600	5.30

The superiority of PLSR stems from its ability to handle multicollinearity and small sample sizes simultaneously. Unlike neural networks, PLSR offers interpretability through component analysis and VIP scores, allowing defense analysts to identify key cost drivers for military UAV projects. For instance, in this model, flight altitude $H$ and payload $N$ have high VIP scores, suggesting that enhancements in these areas may substantially impact development costs. Sensitivity analysis can be conducted by varying predictor values and observing cost changes, aiding in trade-off studies during military UAV design.
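A one-at-a-time sensitivity analysis of this kind can be sketched as follows. I assume the rounded coefficients printed in the regression equation, take sample K as an illustrative baseline, and use a 10% increase per parameter; the baseline and step size are arbitrary choices of mine, not taken from the case study.

```python
# Coefficients as printed in the regression equation (rounded values).
intercept = 0.5463
coef = {"L": 0.0469, "W": 0.0000471, "V": 0.000973,
        "H": 0.0296, "T": 0.0126, "N": 0.000635}

# Illustrative baseline: parameters of sample K from Table 1.
baseline = {"L": 8.22, "W": 1020, "V": 139, "H": 7.3, "T": 40, "N": 204}

def cost(params):
    """Predicted development cost (billion USD) from the linear model."""
    return intercept + sum(coef[name] * params[name] for name in coef)

base_cost = cost(baseline)
effects = {}
for name in coef:
    bumped = dict(baseline)
    bumped[name] = baseline[name] * 1.10   # raise one parameter by 10%
    effects[name] = cost(bumped) - base_cost

# Parameters ordered from largest to smallest cost change at this baseline.
ranked = sorted(effects, key=effects.get, reverse=True)
for name in ranked:
    print(f"{name}: +{effects[name]:.4f} billion USD")
```

For a linear model each effect is simply the coefficient times the change in the parameter, so the ranking depends on the chosen baseline. Since the predictors are strongly correlated, changing one while holding the others fixed may describe a design that is not physically realistic.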

Furthermore, PLSR facilitates robust model validation. The cross-validated $Q^2$ gives direct evidence of how well the model generalizes to unseen data, which is critical given the limited availability of military UAV cost data. The method also provides diagnostic tools, such as score plots and residual analysis, to assess model adequacy. For example, a plot of $\mathbf{t}_1$ versus $\mathbf{Y}$ can visually confirm linearity, while residual plots can detect heteroscedasticity or nonlinear patterns. These features make PLSR a comprehensive tool for cost forecasting in the military UAV domain.

In practice, collecting data for military UAV development can be challenging due to classification issues. However, even with few samples, PLSR can yield reliable estimates if the predictors are carefully chosen to reflect technological complexity. It is advisable to include parameters that capture aerodynamic design, propulsion systems, avionics, and mission capabilities. As military UAV technology evolves, updating the model with new data will enhance its accuracy. Additionally, integrating PLSR with domain knowledge, such as expert judgments on cost factors, can further improve predictions.

Beyond cost prediction, PLSR can be extended to other aspects of military UAV lifecycle management, such as operational cost estimation or reliability analysis. The methodology is flexible and can accommodate multiple response variables (e.g., using PLS2 for multidimensional outcomes). For instance, one could model both development cost and production cost simultaneously for a fleet of military UAVs. This multivariate approach can uncover shared latent factors driving multiple cost components, providing a holistic view of resource allocation.

In conclusion, I have demonstrated that partial least squares regression is a powerful and practical method for predicting military UAV development cost. Its ability to handle small, multicollinear datasets makes it particularly suitable for defense applications where data are scarce. In the case study, PLSR gave a lower prediction error on the held-out sample than stepwise regression and the BP neural network, and a marginally lower error than the RBF neural network. The model not only provides cost estimates but also offers insights into the relative importance of design parameters, supporting informed decision-making in military UAV procurement and development. Future work could explore nonlinear PLSR variants or integrate PLSR with simulation-based cost models to capture more complex relationships. As military UAV technologies continue to advance, robust cost prediction tools will remain essential for efficient resource management and strategic planning.