DEEP DETERMINISTIC POLICY GRADIENT WITH GENERALIZED INTEGRAL COMPENSATOR FOR HEIGHT CONTROL OF QUADROTOR

Anlin Liu; Lei Liu; Jinde Cao; Fawaz E. Alsaadi

doi:10.11948/20210442

2022 Volume 12 Issue 3

Anlin Liu, Lei Liu, Jinde Cao, Fawaz E. Alsaadi. DEEP DETERMINISTIC POLICY GRADIENT WITH GENERALIZED INTEGRAL COMPENSATOR FOR HEIGHT CONTROL OF QUADROTOR[J]. Journal of Applied Analysis & Computation, 2022, 12(3): 868-894. doi: 10.11948/20210442

1. College of Science, Hohai University, Nanjing 210098, China
2. School of Mathematics, Southeast University, Nanjing 210096, China
3. Yonsei Frontier Lab, Yonsei University, Seoul 03722, South Korea
4. Department of Information Technology, Faculty of Computing and IT, King Abdulaziz University, Jeddah, Saudi Arabia

Dedicated to Professor Jibin Li on the occasion of his 80th birthday.
Corresponding author. Email: liulei_hust@163.com (L. Liu), liulei_hust@hhu.edu.cn (L. Liu)

Abstract

This paper is concerned with the desired height control of a quadrotor under the framework of the deep deterministic policy gradient algorithm with prioritized experience replay (PER-DDPG). The reward functions are designed around an out-of-bounds penalty mechanism. Introducing a generalized integral compensator into the actor-critic structure yields the proposed PER-DDPG-GIC algorithm. The quadrotor is controlled by a neural network trained with PER-DDPG-GIC, which maps the system state directly to control commands. Simulation results show that the generalized integral compensator effectively reduces the steady-state error and substantially increases the reward. The generalization ability and robustness of the controller, with respect to quadrotor models of different weights and sizes, are also verified in simulation.
Keywords: Height control, quadrotor vehicle, deep reinforcement learning, deterministic policy gradient, neural network
MSC: 93C10, 93C40
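
The abstract names two design elements, an out-of-bounds penalty in the reward and an integral compensator added to the actor-critic input, without giving their formulas. The following is a minimal sketch of how such elements are commonly set up for a height-control task, not the authors' definitions. It assumes a plain time integral of the height error as a stand-in for the generalized integral compensator, and every name and numeric value (HeightTask, beta, penalty, the height bounds) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeightTask:
    """Hypothetical task settings; none of these values come from the paper."""
    target: float = 5.0      # desired height (m)
    z_min: float = 0.0       # lower bound of the allowed height range (m)
    z_max: float = 10.0      # upper bound of the allowed height range (m)
    dt: float = 0.02         # control period (s)
    beta: float = 0.1        # weight of the accumulated error in the state
    penalty: float = 100.0   # extra cost when the height leaves the range


def augmented_state(z, vz, integral, task):
    """Update the error integral and append its weighted value to the state.

    Assumption: a plain integral of the height error stands in for the
    paper's generalized integral compensator.
    """
    error = task.target - z
    integral = integral + error * task.dt
    state = (error, vz, task.beta * integral)
    return state, integral


def reward(z, task):
    """Negative absolute height error, minus a fixed penalty when out of bounds.

    Returns the reward and a flag that a training loop could use to end
    the episode.
    """
    out_of_bounds = z < task.z_min or z > task.z_max
    r = -abs(task.target - z)
    if out_of_bounds:
        r -= task.penalty
    return r, out_of_bounds
```

In a training loop, the tuple returned by augmented_state would be fed to the actor and critic networks in place of the raw state, so that a persistent offset from the target height accumulates in the third component and can be acted on by the policy.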
