Age-based maintenance under population heterogeneity: Optimal exploration and exploitation
Abstract
We consider a system with a finite lifespan and a single critical component that is subject to random failures. An age-based replacement policy is applied to preventively replace the component before its failure. The components used for replacement come from either a weak population or a strong population, referred to as population heterogeneity. However, the true population type is unknown to the decision maker. By considering that the decision maker has a belief on the probability of having a weak population, we build a partially observable Markov decision process model with the objective of minimizing the total cost over the lifespan of the system. The resulting optimal policy updates the belief variable in a Bayesian fashion by using the data obtained over the course of the lifespan, and it denotes when to execute the preventive replacement. It optimally balances the trade-off between the cost of learning the true population type (via deliberately delaying the preventive replacement time to better learn the population type) and the cost of maintenance activities. By addressing this so-called exploration-exploitation trade-off, we generate insights on the optimal policy and compare its performance with existing heuristic approaches from the literature. We also characterize a lower bound on the total cost, allowing us to determine the value of resolving the uncertainty on the population type.
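The Bayesian belief update mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the lifetime distributions or any parameter values, so everything below is an assumption made for illustration: Weibull lifetimes, the hypothetical parameter pairs `WEAK` and `STRONG`, and the helper name `update_belief`. The belief is taken to be the probability that components come from the weak population. A component either fails at some age (an exact lifetime observation) or survives until its preventive replacement age (a censored observation), and each outcome shifts the belief differently.

```python
import math

def weibull_pdf(t, shape, scale):
    """Density of a Weibull lifetime at age t."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def weibull_sf(t, shape, scale):
    """Probability that a Weibull lifetime exceeds age t."""
    return math.exp(-((t / scale) ** shape))

def update_belief(p_weak, age, failed, weak, strong):
    """Posterior probability of the weak population after one observation.

    failed=True  : the component failed at `age` (use the density).
    failed=False : the component was replaced preventively at `age`
                   without failing (censored, use the survival function).
    """
    likelihood = weibull_pdf if failed else weibull_sf
    l_weak = likelihood(age, *weak)
    l_strong = likelihood(age, *strong)
    numerator = p_weak * l_weak
    return numerator / (numerator + (1.0 - p_weak) * l_strong)

# Hypothetical (shape, scale) pairs; not taken from the paper.
WEAK = (2.0, 5.0)
STRONG = (2.0, 10.0)

prior = 0.5
p_after_failure = update_belief(prior, 3.0, True, WEAK, STRONG)
p_after_survival = update_belief(prior, 8.0, False, WEAK, STRONG)
```

Under these assumed parameters, an early failure raises the belief in the weak population, while survival to a late replacement age lowers it. This is why delaying preventive replacement can be informative: a later replacement age makes the two populations easier to tell apart, at the price of a higher failure risk.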
Similar resources
Nearly Optimal Exploration-Exploitation Decision Thresholds
While in general trading off exploration and exploitation in reinforcement learning is hard, under some formulations relatively simple solutions exist. Optimal decision thresholds for the multi-armed bandit problem, one for the infinite horizon discounted reward case and one for the finite horizon undiscounted reward case are derived, which make the link between the reward horizon, uncertainty ...
Shortest Path under Uncertainty: Exploration versus Exploitation
In the Canadian Traveler Problem (CTP), a traveler seeks a shortest path to a destination through a road network, but unknown to the traveler, some roads may be blocked. This paper studies the Bayesian CTP (BCTP), in which road states are correlated with known prior probabilities and the traveler can infer the states of an unseen road from past observations of other correlated roads. As general...
Human and Optimal Exploration and Exploitation in Bandit Problems
We consider a class of bandit problems in which a decision-maker must choose between a set of alternatives, each of which has a fixed but unknown rate of reward, to maximize their total number of rewards over a short sequence of trials. Solving these problems requires balancing the need to search for highly-rewarding alternatives with the need to capitalize on those alternatives already known to be...
Infomax strategies for an optimal balance between exploration and exploitation
Proper balance between exploitation and exploration is what makes good decisions, which achieve high rewards like payoff or evolutionary fitness. The Infomax principle postulates that maximization of information directs the function of diverse systems, from living systems to artificial neural networks. While specific applications are successful, the validity of information as a proxy for reward...
An Optimal Exploration-Exploitation Approach for Assortment Selection
We consider an online assortment optimization problem, where in every round, the retailer offers a K-cardinality subset (assortment) of N substitutable products to a consumer, and observes the response. We model consumer choice behavior using the widely used multinomial logit (MNL) model, and consider the retailer’s problem of dynamically learning the model parameters, while optimizing cumulativ...
Journal
Journal title: European Journal of Operational Research
Year: 2022
ISSN: 1872-6860, 0377-2217
DOI: https://doi.org/10.1016/j.ejor.2021.11.038