Realizing Midcourse Penetration With Deep Reinforcement Learning
Authors
Abstract
A midcourse maneuver controller is obtained using deep reinforcement learning to maintain the survivability of a ballistic missile. First, the midcourse is abstracted as a Markov decision process (MDP) with an unknown system state equation. Then, a controller formed by a Dueling Double Deep Q (D3Q) neural network is used to approximate the state-action value function of the MDP. To let the controller improve its intelligence through learning, the state space, action space, and instant reward of the MDP are customized. The controller uses the real-time situation as input and outputs the ignition states of the pulse motors. Offline training shows that the controller can converge to the optimal strategy after approximately 65 hours. Online tests demonstrate the controller's ability to avoid the interceptor intelligently and to account for the re-entry error. In scenarios with multiple random factors, the controller achieved a penetration probability of 100% and a mean re-entry error of less than 5000 m.
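The two ideas behind the D3Q name can be illustrated with a minimal sketch. It shows the standard dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean(A), and the standard double Q-learning target, where the online network selects the next action and a separate target network evaluates it. The abstract does not give the paper's network architecture, discount factor, or action encoding, so the function names, the three-action example, and the value gamma = 0.5 below are illustrative assumptions, not the authors' implementation.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-advantage baseline."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def double_q_target(
    reward: float,
    gamma: float,
    next_q_online: List[float],
    next_q_target: List[float],
    done: bool,
) -> float:
    """Double Q-learning target: the online net picks the action, the target net scores it."""
    if done:
        # Terminal transition: no bootstrapped future value.
        return reward
    best_action = max(range(len(next_q_online)), key=lambda i: next_q_online[i])
    return reward + gamma * next_q_target[best_action]


# Hypothetical example with three discrete actions (e.g. candidate ignition choices).
q_online_next = dueling_q_values(2.0, [1.0, 0.0, -1.0])  # online network, next state
q_target_next = [0.5, 4.0, 0.0]                          # target network, next state (assumed values)
target = double_q_target(1.0, 0.5, q_online_next, q_target_next, done=False)
```

In this example the online network prefers action 0, so the target uses the target network's value for action 0 (0.5) rather than its own maximum (4.0). This decoupling of selection from evaluation is what reduces the overestimation of action values in plain Q-learning.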
Similar resources
Deep Reinforcement Learning with POMDPs
Recent work has shown that Deep Q-Networks (DQNs) are capable of learning human-level control policies on a variety of different Atari 2600 games [1]. Other work has looked at treating the Atari problem as a partially observable Markov decision process (POMDP) by adding imperfect state information through image flickering [2]. However, these approaches leverage a convolutional network structure...
Reinforcement Learning with Deep Architectures
There is both theoretical and empirical evidence that deep architectures may be more appropriate than shallow architectures for learning functions which exhibit hierarchical structure, and which can represent high level abstractions. An important development in machine learning research in the past few years has been a collection of algorithms that can train various deep architectures effective...
Deep Reinforcement Learning with Double Q-Learning
The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether this harms performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-le...
Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm
In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristic of the problem, it is first formulated as a Markov Decision Process (MDP). Next, the Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...
Collaborative Deep Reinforcement Learning
Besides independent learning, the human learning process is highly improved by summarizing what has been learned, communicating it with peers, and subsequently fusing knowledge from different sources to assist the current learning goal. This collaborative learning procedure ensures that the knowledge is shared, continuously refined, and concluded from different perspectives to construct a more profound...
Journal
Journal title: IEEE Access
Year: 2021
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2021.3091605