Optimal Abort Policy for Mission-Critical Systems Under Imperfect Condition Monitoring
Published in | Operations Research |
---|---|
Main Authors | Sun, Hu, Ye |
Format | Journal Article |
Language | English |
Published | 26.05.2025 |
ISSN | 0030-364X, 1526-5463 |
DOI | 10.1287/opre.2022.0643 |
Summary: Controlling stochastic systems often relies on the assumption of Markovian dynamics. However, this assumption frequently breaks down in mission-critical systems subject to failures—such as drones for power grid inspections—where the system failure rate increases over time. To enhance system survivability, operators may choose to abort missions based on noisy condition-monitoring signals. Yet, determining the optimal abort time in such settings leads to an intractable stopping problem under partial observability and non-Markovian behavior. In “Optimal Abort Policy for Mission-Critical Systems Under Imperfect Condition Monitoring,” Sun, Hu, and Ye introduce a novel Erlang mixture-based approximation that transforms the original non-Markovian process into continuous-time Markov chains. This approximation enables the formulation of partially observable Markov decision processes (POMDPs), whose optimal policies are shown to converge almost surely to the original optimal abort decision rules as the Erlang rate increases. Structural properties of the optimal POMDP policy are established, and a modified point-based value iteration algorithm is proposed to numerically solve the POMDP.
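For context, the Erlang-mixture idea can be illustrated with the standard denseness construction for distributions on the positive half-line (a generic sketch under our own notation F and n; the authors' construction may differ in detail): a sojourn-time distribution F is approximated by a mixture of Erlang distributions with common rate n, where the mixing weights are increments of F,

\[
F_n(t) \;=\; \sum_{k=1}^{\infty} \Big[ F\!\big(\tfrac{k}{n}\big) - F\!\big(\tfrac{k-1}{n}\big) \Big] \left( 1 - \sum_{j=0}^{k-1} e^{-n t} \frac{(n t)^j}{j!} \right), \qquad t \ge 0,
\]

with the k-th component being an Erlang(k, n) distribution. In this classical construction, F_n(t) converges to F(t) at every continuity point of F as the rate n grows, which is the sense in which a surrogate chain built from exponential phases can mimic non-exponential sojourn times.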
Although most on-demand mission-critical systems are engineered to be reliable to support critical tasks, occasional failures may still occur during missions. To increase system survivability, a common practice is to abort the mission before an imminent failure. We consider optimal mission abort for a system whose deterioration follows a general three-state (normal, defective, failed) semi-Markov chain. The failure is assumed self-revealed, whereas the healthy and defective states have to be inferred from imperfect condition-monitoring data. Because of the non-Markovian process dynamics, optimal mission abort for this partially observable system is an intractable stopping problem. For a tractable solution, we introduce a novel tool of Erlang mixtures to approximate nonexponential sojourn times in the semi-Markov chain. This allows us to approximate the original process by a surrogate continuous-time Markov chain whose optimal control policy can be solved through a partially observable Markov decision process (POMDP). We show that the POMDP optimal policies converge almost surely to the optimal abort decision rules when the Erlang rate parameter diverges. This implies that the expected cost by adopting the POMDP solution converges to the optimal expected cost. Next, we provide comprehensive structural results on the optimal policy of the surrogate POMDP. Based on the results, we develop a modified point-based value iteration algorithm to numerically solve the surrogate POMDP. We further consider mission abort in a multitask setting where a system executes several tasks consecutively before a thorough inspection. Through a case study on an unmanned aerial vehicle, we demonstrate the capability of real-time implementation of our model, even when the condition-monitoring signals are generated with high frequency.
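The following is a minimal numerical sketch of the same Erlang-mixture construction (not the authors' code or data); the Weibull sojourn-time distribution, the rate values, and the truncation level are illustrative assumptions standing in for the defective-state duration of the semi-Markov chain.

```python
import numpy as np
from scipy.stats import erlang, weibull_min

# Illustrative sketch only: approximate a non-exponential sojourn-time CDF F
# by a truncated Erlang mixture with common rate n, using the increment
# weights p_k = F(k/n) - F((k-1)/n).

def erlang_mixture_cdf(t, F, n, k_max):
    """CDF at t of the rate-n Erlang-mixture approximation of F, truncated at k_max phases."""
    grid = np.arange(k_max + 1) / n
    weights = np.diff(F(grid))                              # p_1, ..., p_{k_max}
    phases = np.arange(1, k_max + 1)
    component_cdfs = erlang.cdf(t, phases, scale=1.0 / n)   # Erlang(k, rate n) CDFs at t
    return float(weights @ component_cdfs)

# Assumed example: Weibull sojourn time with increasing failure rate (shape 2),
# standing in for the defective-state duration.
F = weibull_min(c=2.0, scale=1.0).cdf
t = 0.8

for n in (5, 20, 80):                                       # Erlang rate parameter
    approx = erlang_mixture_cdf(t, F, n, k_max=10 * n)
    print(f"n = {n:3d}:  F_n({t}) = {approx:.4f}   (exact F({t}) = {F(t):.4f})")
```

As n increases, the approximate CDF values should approach the exact Weibull value, mirroring the convergence behavior that the abstract claims for the surrogate POMDP policies as the Erlang rate parameter diverges.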
Funding: This work was supported in part by the National Science Foundation of China [Grants 72171037, 72471144, 72371161, and 72071071], Singapore MOE AcRF Tier 2 grants [Grants A-8001052-00-00 and A-8002472-00-00], and the Future Resilient Systems project supported by the National Research Foundation Singapore under its CREATE program.
Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2022.0643.