Jian Fu and Yadong Zhong and Feng Yang
International Conference on Advanced Robotics and Mechatronics (ICARM)
The performance of deep neural networks deteriorates when the underlying data distribution (the domain) differs between training and testing. Domain generalization aims to learn from multiple source domains so that a model generalizes to previously unseen target domains. We propose hybrid domain generalization, which uses a source domain together with multiple latent domains, as a new research scenario, and we attempt to train a generalization model that self-generates latent domain labels. To address this scenario, we use MixStyle to generate latent domain samples and assume that the style of a sample is closely related to its domain. We therefore cluster the latent domains with a Gaussian mixture model (GMM) according to style features and iteratively assign pseudo domain labels before introducing them into adversarial training. By using image style features, our proposed method successfully synthesizes latent domains and achieves adversarial domain generalization without latent domain labels. In addition, because the original domain labels are otherwise underutilized, the method introduces an auxiliary feature extractor to improve model performance. Experiments demonstrate that our method generalizes well and outperforms classical domain generalization methods.
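The MixStyle step named in this abstract can be sketched as follows. This is a minimal illustration of the publicly described MixStyle idea (re-normalising one sample with feature statistics interpolated between two samples), assuming a single channel represented as a flat list; the function names `channel_stats` and `mix_style` are hypothetical and this is not the authors' implementation. The (mean, standard deviation) pair returned by `channel_stats` is the kind of style feature that a GMM could cluster.

```python
import math

def channel_stats(features):
    """Return (mean, std) of one feature channel; this pair acts as the style feature."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    return mean, math.sqrt(var + 1e-6)  # small epsilon avoids division by zero

def mix_style(x, x_other, lam):
    """Re-normalise x using statistics interpolated between x and x_other.

    lam = 1.0 keeps the style of x; lam = 0.0 adopts the style of x_other.
    """
    mu, sigma = channel_stats(x)
    mu_o, sigma_o = channel_stats(x_other)
    mu_mix = lam * mu + (1.0 - lam) * mu_o
    sigma_mix = lam * sigma + (1.0 - lam) * sigma_o
    return [sigma_mix * (f - mu) / sigma + mu_mix for f in x]
```

In MixStyle the mixing coefficient is usually drawn at random, for example with `random.betavariate(0.1, 0.1)`; it is passed in explicitly here to keep the sketch deterministic.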
Abstract
Jian Fu, Feng Yang, Yadong Zhong & Zhu Yang
School of Automation, Wuhan University of Technology, Wuhan 430070, China
Making robots adapt their obstacle avoidance during human-robot collaboration is one of the open challenges in the community. In real environments, robots often encounter unanticipated obstacles that make it difficult to complete a task. In this paper, we propose an obstacle avoidance framework based on Interaction Probabilistic Movement Primitives (iProMP) that combines online static obstacle avoidance with offline static obstacle avoidance. For unanticipated obstacles in human-robot collaboration, we find obstacle avoidance trajectories by solving the Lagrange equation, and then use the product of Gaussian distributions to fuse the two iProMP trajectories, switching smoothly from the original trajectory to the obstacle avoidance trajectory to achieve fast online static obstacle avoidance. However, this obstacle avoidance trajectory is not optimal. When the human-robot collaboration is over, the obstacle is usually not cleared immediately, so the unanticipated obstacle becomes an anticipated one. To obtain a better obstacle avoidance trajectory, the Path Integral Policy Improvement with Covariance Matrix Adaptation algorithm is used to train on the demonstration trajectories and obtain new iProMP parameters, which are then used in human-robot collaboration to realize offline static obstacle avoidance. Experimental results on two-dimensional trajectory obstacle avoidance and UR5 obstacle avoidance demonstrate the feasibility and effectiveness of the proposed framework.
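The product-of-Gaussians fusion mentioned in this abstract can be illustrated in one dimension. The sketch below assumes each trajectory is given as a list of per-time-step (mean, variance) pairs; the function names and this data layout are illustrative assumptions, not the paper's formulation, which operates on iProMP trajectory distributions.

```python
def gaussian_product(mean_a, var_a, mean_b, var_b):
    """Mean and variance of the normalised product of two 1-D Gaussians."""
    precision = 1.0 / var_a + 1.0 / var_b
    var = 1.0 / precision
    mean = var * (mean_a / var_a + mean_b / var_b)
    return mean, var

def blend_trajectories(traj_a, traj_b):
    """Fuse two trajectories given as lists of (mean, variance) per time step."""
    return [
        gaussian_product(ma, va, mb, vb)
        for (ma, va), (mb, vb) in zip(traj_a, traj_b)
    ]
```

Because the fused mean is weighted by precision, giving the original trajectory a growing variance and the avoidance trajectory a shrinking one moves the fused mean gradually from the first to the second.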
Abstract
Jian Fu; Yucai Wang; Fan Luo; Xiaolong Li
2022 International Conference on Advanced Robotics and Mechatronics (ICARM)
Demonstration learning based on Probabilistic Movement Primitives (ProMP) has been widely used in robot skill learning. For trajectory planning in traditional ProMP, a sequential online learning method is adopted: only one data point is considered at a time, and the model parameters are updated accordingly. As the number of new data points to be fitted increases, this usually means that old points the model once fitted accurately are no longer fitted accurately. In this paper, we show that the uncertainty of the predictive distribution gradually decreases as the number of observed data points increases, and that this is responsible for the above phenomenon. To solve this problem, we propose a weight combination algorithm. The points to be fitted are processed one by one, and only the basis functions that fall within the highly correlated range of the current point are involved in the regression. The weight vector components corresponding to these basis functions are then concatenated and combined to obtain the complete weight vector. We prove mathematically that the new algorithm is better than the traditional online algorithm. Finally, simulation experiments confirm the soundness of the new algorithm and show that its accuracy is higher than that of traditional ProMP.
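The shrinking uncertainty described in this abstract can be seen in a deliberately simplified setting: sequential Bayesian updating of a single weight w with observations y = phi * w + noise. The scalar model and the names `sequential_update` and `variance_trace` are assumptions made for illustration; the paper's analysis concerns the full ProMP weight vector.

```python
def sequential_update(mean, var, phi, y, noise_var):
    """One Bayesian update of a scalar weight after observing y = phi * w + noise."""
    post_var = 1.0 / (1.0 / var + phi * phi / noise_var)
    post_mean = post_var * (mean / var + phi * y / noise_var)
    return post_mean, post_var

def variance_trace(prior_var, features, noise_var):
    """Posterior variance after each observation; it depends only on the features."""
    trace = [prior_var]
    for phi in features:
        trace.append(1.0 / (1.0 / trace[-1] + phi * phi / noise_var))
    return trace
```

Each observation with a nonzero feature adds precision, so the variance can only fall. Later observations therefore move the weight less and less, which is consistent with the loss of fit on earlier points that the abstract describes.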
Abstract
Jian Fu; Qifeng Wang
2023 International Conference on Advanced Robotics and Mechatronics (ICARM)
Robot vision, which integrates measurement and perception, plays an important role in robot manipulation applications. Although current deep learning-based visual neural network models have high perception capabilities, their network structures are usually too large to be deployed on embedded devices for robot vision. In this paper, we propose a neural architecture search (NAS) method that combines pre-training and pruning operations to simplify deep neural network architectures. It not only solves this problem without losing network accuracy, but also significantly alleviates the long computation time and redundant search space of traditional NAS methods. Experiments and comparisons show that the neural network generated by the proposed algorithm outperforms the manually designed neural network, which demonstrates the effectiveness of the method.
Abstract
Jian Fu; Nan Wang
2023 International Annual Conference on Complex Systems and Intelligent Science October 20~22, 2023, Shenzhen, China
Decomposing a task into multiple phases and connecting the phases in sequence to achieve complex motion planning is the prevailing approach in robotics research. However, this paradigm faces the problem of how to increase the generalization capacity of each phase and ensure smooth transitions between adjacent phases. To address this problem, we propose task-oriented sequential postural motion primitives. A number of phases are partitioned based on task characteristics, and the pose movement primitives for each phase are parameterized in a data-driven manner to facilitate the acquisition of specific skills from multiple demonstration trajectories. In addition, the transition between temporally sequential motion primitives is modeled as the problem of tracking a moving target, in order to achieve a seamless merging of movements. Finally, experiments on the Sawyer robot demonstrate the effectiveness and feasibility of the proposed method.
Abstract
Jian Fu, Xiaolong Li, Zhu Yang
School of Automation, Wuhan University of Technology, Wuhan 430070, China
Person re-identification remains a major challenge, as the complex network structures and unsatisfactory generalization performance of widely used deep neural networks make them unsuitable for real-world problems. In this paper, we propose a global feature-based person re-identification network with strong generalization. The feature extraction part contains two channels for feature fusion: the feature extraction module and the feature generalization module. The feature generalization module is a new MixStyle module added to the feature extraction module; it mixes the style information of images from different domains, or even from the same domain, to form multiple latent domain features, thus improving the generalization performance of the model. In addition, this paper improves the loss function by adding a new constraint on the positive sample pair distance, so that training reduces the intra-class distance as far as possible in addition to pushing different classes apart. Experimental results on two datasets, Market1501 and DukeMTMC, show that the proposed method has strong generalization performance on person re-identification and outperforms current global feature-based person re-identification methods.
Abstract
Jian Fu, Xiaolong Li, Zhu Yang
School of Automation, Wuhan University of Technology, Wuhan 430070, China
The application of motion primitives to encode robot motion has attracted considerable attention in academic research. Existing models predominantly focus on reproducing the position of a task trajectory, often neglecting the significance of orientation. Orientation Probabilistic Movement Primitives (ProMPs) indirectly encode motion primitives for orientation through their trajectory probabilities on Riemannian manifolds, specifically the 3-sphere S^3. However, the Gaussian assumption limits their capabilities. We propose Mixed Orientation ProMPs to enhance trajectory planning and minimize the occurrence of singular configurations. The model consists of multiple separate Gaussian distributions in the tangent space, enabling the approximation of arbitrary distributions. Furthermore, Lagrangian-type optimization objective functions can incorporate constraints such as singularity avoidance. Finally, the effectiveness and reliability of the algorithm were validated through trajectory planning experiments on the UR5 robotic arm.
Abstract
Jian Fu; Jinyu Du; Xiang Teng; Yuxiang Fu; Lu Wu
IEEE Access
Learning from demonstrations with Probabilistic Movement Primitives (ProMPs) has been widely used in robot skill learning, especially in human-robot collaboration. Although ProMP has been extended to multi-task situations inspired by the Gaussian mixture model, it still treats each task independently. It ignores the common scenario in which robots adaptively switch between collaborative tasks to follow instantaneous changes in human intention. To solve this problem, we propose an alternate learning-based parameter estimation method and an empirical minimum variation-based decomposition strategy with projection points, combined with a linear interpolation strategy for the weights, within a Gaussian mixture model framework. Alternate learning of weights and parameters in multi-task ProMP (MTProMP) allows the robot to obtain a smooth composite trajectory that passes through the expected via points. The decomposition strategy determines how the desired via-point state is projected onto each individual ProMP component so as to minimize the total deviation between each projection point and its respective prior. Linear interpolation is used to adjust the weights between sequential via points automatically. The proposed method and strategy are successfully extended to multi-task interaction ProMPs (MTiProMP). With MTProMP and MTiProMP, the robot can be applied to multiple tasks in industrial factories and collaborate with a worker, switching from one task to another according to the changing intentions of the human. Classical via-point trajectory planning experiments and human-robot collaboration experiments are performed on the Sawyer robot. The experimental results show that MTProMP and MTiProMP with the proposed method and strategy perform better.
Abstract
Jian Fu; Xiang Teng; Ce Cao; Zhaojie Ju; Ping Lou
IEEE Transactions on Neural Networks and Learning Systems Early online 24 Sep 2020
Recent research in Learning from Demonstration (LfD) demonstrates that reinforcement learning is effective for robots to improve their movement skills. The main remaining challenge is how to generate new robot motions that meet similar preassigned performance indicators but differ from the demonstrated tasks. To address this issue, this paper proposes a framework to represent the policy and to conduct imitation learning and optimization for robot intelligent trajectory planning, based on improved locally weighted regression (iLWR) and policy improvement with path integrals by dual perturbation (PI2-DP). In addition, reward-guided weight searching and adaptive evolution of the basis functions are performed alternately in two spaces, i.e., the basis function space and the weight space. The alternate learning process constructs a sequence of two-tuples that joins the demonstration task and the new one together for motor skill transfer. In this way, robot skills can be learned gradually from similar tasks, and those skills can also be transferred from the demonstrated tasks to dissimilar tasks with different criteria. Classical via-point trajectory planning experiments are performed with the SCARA manipulator, a 10-DOF planar robot, and the UR robot. The results show that the proposed method is both feasible and effective.
Abstract
Jian Fu; Cong Li; Xiang Teng; Fan Luo; Boqun Li
Applied Sciences. Appl. Sci. 2020, 10(15), 5346; Received: 30 June 2020 / Revised: 28 July 2020 / Accepted: 30 July 2020 / Published: 3 August 2020
Discovering the implicit pattern and using it as heuristic information to guide the policy search is one of the core factors in speeding up robot motor skill acquisition. This paper proposes PI2-CMA-KCCA, a reinforcement learning algorithm for policy improvement guided by compound heuristic information. Its structure and workflow are similar to a double closed-loop control system. The outer loop, realized by Kernel Canonical Correlation Analysis (KCCA), infers the implicit nonlinear heuristic information between the joints of the robot. The inner loop, operated by Covariance Matrix Adaptation (CMA), discovers the hidden linear correlations between the basis functions within each joint of the robot. These patterns, which benefit learning of the new task, automatically determine the mean and variance of the exploration perturbation for Path Integral Policy Improvement (PI2). Compared with classical PI2, PI2-CMA, and PI2-KCCA, PI2-CMA-KCCA not only endows the robot with the ability to transfer trajectory planning from the demonstration to the new task, but also completes the transfer more efficiently. Classical via-point experiments on SCARA and Sawyer robots validate that the proposed method converges quickly and can find a solution for the new task.
Abstract
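Jian Fu; Xiang Teng; Ce Cao; Ping Lou.
Journal of Huazhong University of Science and Technology (Natural Science Edition), 2019, No. 11, pp. 96-102 (7 pages)
Current learning-from-demonstration reinforcement learning (LfDRL) frameworks do not consider the relationships among a robot's joints when facing a new task, which degrades learning. To address this shortcoming, we use the idea of a pseudo covariance matrix and, based on the reproducing kernel Hilbert space (RKHS) and the generalized Rayleigh quotient, construct a local coordinate system of inter-joint perturbation correlations oriented toward a functional performance index. On this basis we design a reinforcement learning method that integrates kernel canonical correlation analysis (KCCA) with path integral policy improvement (PI2). Using data from learning experience, KCCA infers the implicit nonlinear heuristic information among the robot's joints for the trajectory planning task and guides PI2 toward an optimal or suboptimal policy, so that the robot quickly transfers from the demonstrated trajectory planning task to a new one and completes it with high quality. Transfer learning experiments on intelligent trajectory planning through one and two via-points were carried out on a Selective Compliance Assembly Robot Arm (SCARA) and a Universal Robots UR5 robot. The results show that the average cost reduction rate of reinforcement learning with KCCA-inferred heuristic information is clearly better than that of the classical PI2 algorithm, and that the approach improves both the learning convergence speed and the accuracy with which the robot completes the new task.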
Abstract
Yingbo Liang; Jian Fu.
International Journal of Pattern Recognition and Artificial Intelligence
The traditional watershed algorithm suffers from false markers in medical image segmentation, which cause over-segmentation, and the images may also be contaminated by noise during acquisition. In this study, we propose an improved watershed segmentation algorithm for medical images based on morphological processing and the total variation (TV) model. First, morphological gradient preprocessing is performed on MRI images of brain lesions. Second, the gradient images are denoised by the total variation model, which reduces image noise while retaining the edge information of the MRI images. Then, the internal and external markers are obtained by the forced minimum technique, and the gradient magnitude images are corrected using these markers. Finally, the modified gradient image is subjected to the watershed transformation. Segmentation and simulation experiments on brain lesion MRI images are carried out in MATLAB, and the segmentation results are compared with those of other watershed algorithms. The experimental results demonstrate that our method yields the fewest regions and can extract brain lesions from MRI images effectively. In addition, the method inhibits over-segmentation, improving the segmentation of lesions in brain MRI images.
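The first step of this pipeline, the morphological gradient, is the difference between a grey-scale dilation and erosion. The sketch below assumes a 3x3 square structuring element and an image stored as a list of rows; both choices, and the function name, are illustrative assumptions, since the abstract does not state the structuring element used (and the paper's experiments are in MATLAB).

```python
def morphological_gradient(image):
    """Dilation minus erosion with a 3x3 square structuring element.

    Border pixels use only the neighbours that exist inside the image.
    """
    rows, cols = len(image), len(image[0])
    gradient = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            # Local maximum is the dilation, local minimum is the erosion.
            gradient[r][c] = max(window) - min(window)
    return gradient
```

Flat regions map to zero and intensity edges map to large values, which is why the watershed transform is applied to the gradient image rather than to the raw intensities.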
Abstract
Jian Fu; Siyuan Shen; Ce Cao; Cong Li.
International Conference on Intelligent Robotics and Applications ICIRA 2019: Intelligent Robotics and Applications pp 356-367
Learning from demonstration with reinforcement learning (LfDRL) has been successfully applied to acquire robot movement skills. However, the optimization process of LfDRL usually converges slowly when the new task differs considerably from the imitation task. In this paper, we propose a ProMPs-Bayesian-PI2 algorithm to expedite the transfer process. The main idea is to add new heuristic information that guides the optimization search, rather than searching randomly from the state reached by imitation learning. Specifically, we use the result of Bayesian estimation as the heuristic information to guide PI2 during its random search. Finally, we verify the method on a UR5 and compare it with the traditional ProMPs-PI2 method. The experimental results show that the method is feasible and effective.
Abstract
Jian Fu; Ce Cao; Jinyu Du; Siyuan Shen.
International Conference on Intelligent Robotics and Applications ICIRA 2019: Intelligent Robotics and Applications pp 379-389
Motor skill acquisition and refinement is critical for robots to enter human daily life, as it endows them with the ability to autonomously perform unfamiliar tasks. However, how a robot can autonomously fulfill a new motion task with preassigned performance, based on a demonstration task, is still a challenge. In this paper, we propose a novel motor skill acquisition policy to address this problem, based on improved locally weighted regression (iLWR) and policy improvement with path integrals (PI2). In addition, Gaussian mixture regression (GMR)-guided self-reconstruction of the basis functions and the search for the weight coefficients in the policy expression are performed alternately, in the basis function space and the weight space, to seek an optimal or suboptimal solution. In this way, the robot gradually acquires movement skills, moving from similar tasks related to the demonstration to dissimilar tasks with different criteria. Finally, classical via-point trajectory planning experiments are performed with a SCARA manipulator and a NAO humanoid robot to verify that the proposed method is effective and feasible.
Abstract
Jian Fu; ChaoQi Wang; JingYu Du; Fan Luo.
International Conference on Intelligent Robotics and Applications ICIRA 2019: Intelligent Robotics and Applications pp 701-714
This paper proposes a new method to endow a robot with the abilities of human-robot collaboration and online obstacle avoidance simultaneously. We construct a probabilistic model of human-robot collaboration primitives that learns, from demonstrated interaction trajectories, the nonlinear correlations between the human and the robot in both joint space and Cartesian space. This multidimensional probabilistic model not only helps to infer the robot's collaborative motion from the human action through the human-robot correlation in joint space, but also makes it convenient to carry out obstacle avoidance through inverse kinematics from Cartesian space via the correlation between the two spaces. Specifically, for the latter, a modulation matrix is built from the obstacle's shape to automatically generate the robot's obstacle avoidance trajectory in Cartesian space. Obstacle avoidance in a human-robot collaboration experiment is investigated, and the simulation results verify the feasibility and efficiency of the algorithm.
Abstract
Xiang Teng; Jian Fu; Cong Li; ZhaoJie Ju.
International Conference on Intelligent Robotics and Applications ICIRA 2019: Intelligent Robotics and Applications pp 342-355
Reinforcement learning (RL) has been successfully applied to multi-degree-of-freedom robots to acquire motor skills. However, it rarely considers the relationships between joints, or considers only the linear relationships between them. To capture the nonlinear relationships between degrees of freedom (DOFs), we propose a Pseudo Covariance Matrix (PCM) to guide reinforcement learning for motor skill acquisition. Specifically, the method combines Path Integral Policy Improvement (PI2) with Kernel Canonical Correlation Analysis (KCCA), where KCCA is used to obtain the PCM in a high-dimensional space and record it as heuristic information for searching for an optimal or suboptimal strategy. Experiments on SCARA and UR5 robots demonstrate that the new method is feasible and effective.
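The PI2 component referred to in this abstract updates policy parameters by cost-weighted averaging of random perturbations. The sketch below is a simplified, episode-level version of that rule with isotropic exploration; the function names and the `sensitivity` constant are assumptions for illustration. It does not include the per-time-step weighting of full PI2, nor the KCCA-derived pseudo covariance matrix that the paper uses to shape the perturbations.

```python
import math

def pi2_weights(costs, sensitivity=10.0):
    """Softmax-style weights: lower-cost rollouts receive exponentially more weight."""
    lo, hi = min(costs), max(costs)
    span = hi - lo
    if span == 0.0:
        # All rollouts are equally good, so weight them uniformly.
        return [1.0 / len(costs)] * len(costs)
    expo = [math.exp(-sensitivity * (c - lo) / span) for c in costs]
    total = sum(expo)
    return [e / total for e in expo]

def pi2_update(theta, perturbations, costs, sensitivity=10.0):
    """Shift theta by the cost-weighted average of the sampled perturbations."""
    weights = pi2_weights(costs, sensitivity)
    return [
        t + sum(w * eps[i] for w, eps in zip(weights, perturbations))
        for i, t in enumerate(theta)
    ]
```

Heuristic information of the kind described above would enter through how `perturbations` are sampled, for example from a non-isotropic covariance, rather than through the averaging step itself.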
Abstract
Jian Fu ; Sujuan Wei ; Haibo He ; Shengyong Wang
Conference paper. First Online: 06 August 2019. Part of the Lecture Notes in Computer Science book series (LNCS, volume 11745)
Motor skill acquisition and refinement is critical for robots to enter human daily life, as it endows them with the ability to autonomously perform unfamiliar tasks. However, how a robot can autonomously fulfill a new motion task with preassigned performance, based on a demonstration task, is still a challenge. In this paper, we propose a novel motor skill acquisition policy to address this problem, based on improved locally weighted regression (iLWR) and policy improvement with path integrals (PI2). In addition, Gaussian mixture regression (GMR)-guided self-reconstruction of the basis functions and the search for the weight coefficients in the policy expression are performed alternately, in the basis function space and the weight space, to seek an optimal or suboptimal solution. In this way, the robot gradually acquires movement skills, moving from similar tasks related to the demonstration to dissimilar tasks with different criteria. Finally, classical via-point trajectory planning experiments are performed with a SCARA manipulator and a NAO humanoid robot to verify that the proposed method is effective and feasible.
Abstract
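Jian Fu, Siming Chen, Muye Pang, Ping Lou
Journal of Huazhong University of Science and Technology (Natural Science Edition), 2017, No. 10, pp. 90-94
To address the difficulty of enabling a robot to autonomously acquire the ability to complete new tasks by learning from demonstration tasks, we propose a representation, imitation, and optimization framework that combines Gaussian mixture regression with path integral policy improvement (GMR-PI2), together with a scheme that searches alternately in two spaces: the basis function space and the space of weight coefficients of the policy representation. The core idea is that when the weight search reaches the neighborhood of the best approximation point, the basis functions are self-reorganized according to the set of empirically optimal trajectories, and the weight search is then restarted, thereby achieving progressive motor skill acquisition from the demonstration task to tasks constrained by a set of performance indices. Classical via-point trajectory planning experiments show that the method is effective and feasible.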
Abstract
Xinmin Zhou;Kaiyuan Wang;Jian Fu.
Industrial Informatics - Computing Technology, Intelligent Technology, Industrial Information Integration (ICIICII), 2016 International Conference on: 19 January 2017
Scale-invariant feature transform (SIFT) is a popular pattern recognition method for 2D images because it extracts features that are invariant to rotation, scaling, and brightness changes. It is therefore reasonably stable for objects subject to viewpoint changes and noise. However, the dimension of SIFT descriptors is too high and the runtime is too long. To address this disadvantage, this paper proposes a new method that generates feature descriptors based on hierarchical regions and treats different regions differently. The improved SIFT algorithm redefines the descriptor-generating regions, using a circular area divided into 2×2+1 sub-regions instead of the rectangular area of the original algorithm. In the feature matching stage, different thresholds are set for the 2×2 fan-shaped regions and the 1 annular region, in order to retain as many correct matches as possible while removing incorrect ones. Compared with the SIFT algorithm, experimental results show that under blur, illumination change, rotation, and affine transformation, the improved SIFT algorithm performs image matching well and significantly improves matching speed.
Abstract
Jian Fu;Junwei Sun;Kaiyuan Wang.
Industrial Informatics - Computing Technology, Intelligent Technology, Industrial Information Integration (ICIICII), 2016 International Conference on 10.1109/ICIICII.2016.0023: 19 January 2017
Apache Spark is a distributed, memory-based computing framework that is naturally suited to machine learning. Compared with Hadoop, Spark offers better computing performance. In this paper, we analyze Spark's primary framework and core technologies, and run a machine learning instance on it. Finally, we analyze the results and describe our hardware setup.
Abstract
Jian Fu; Da Wei.
Industrial Informatics - Computing Technology, Intelligent Technology, Industrial Information Integration (ICIICII), 2016 International Conference on: 19 January 2017
Endowing robots with the capability to learn is an important goal for the robotics research community, and an important part of this research is skill learning. Dynamic movement primitives (DMPs) are a powerful model for robot learning from demonstration. In this paper, we improve Locally Weighted Regression (LWR), the original regression technique in DMPs. Specifically, we change the phase from integration to a time average and introduce a logistic function to ensure that the final forcing term is zero. Results with the min-jerk criterion demonstrate the effectiveness and efficiency of the approach.
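For context, the standard discrete DMP formulation that this abstract builds on can be written as below. This is the commonly cited textbook form, given here as background under that assumption; the symbols are the usual ones from the DMP literature, and the paper's modified phase and logistic gating are not reproduced.

```latex
\begin{aligned}
\tau \dot{z} &= \alpha_z \bigl( \beta_z (g - y) - z \bigr) + f(x), \\
\tau \dot{y} &= z, \\
\tau \dot{x} &= -\alpha_x x, \\
f(x) &= \frac{\sum_{i=1}^{N} \psi_i(x)\, w_i}{\sum_{i=1}^{N} \psi_i(x)}\; x\, (g - y_0),
\qquad \psi_i(x) = \exp\!\bigl( -h_i (x - c_i)^2 \bigr).
\end{aligned}
```

Here y is the position, z the scaled velocity, g the goal, y_0 the start, x the phase variable, and w_i the weights that LWR fits from a demonstration. In this standard form the forcing term vanishes at the end of the movement because x decays to zero.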
Abstract
Jian Fu;Siming Chen.
Industrial Informatics - Computing Technology, Intelligent Technology, Industrial Information Integration (ICIICII), 2016 International Conference on: 19 January 2017
Learning from demonstration has been applied successfully to acquire similar motor skills for robots. However, how to accomplish different tasks without explicit demonstration is still a challenging issue. In this paper, we propose a novel robot skill learning method consisting of Dynamical Movement Primitives with Gaussian Mixture Regression (DMPS-GMR) and Policy Improvement with Path Integrals (PI2). DMPS-GMR gives the robot the ability to learn a fundamental task from a rough demonstration, and Policy Improvement with Path Integrals based on GMR (PI2-GMR) then provides the robot with an optimal or suboptimal solution for a dissimilar task, starting from the imitated state obtained from DMPS-GMR. Experimental results demonstrate that the proposed approach enables the robot to acquire skills more accurately.
Abstract
Jian Fu ; Sujuan Wei ; Li Ning ; Kui Xiang
Published in: 2015 Chinese Automation Congress (CAC)
Dynamic movement primitives (DMPs) are a very powerful model for robot learning from demonstration. In this paper, we put forward a method for learning the forcing term based on Gaussian Mixture Regression (GMR). Specifically, we apply a Gaussian Mixture Model (GMM) to model the joint probability over data from demonstrations (desired values, positions, and velocities from the canonical system). We then obtain the generalized prediction from the corresponding conditional distribution. The proposed method achieves higher fitting precision than Locally Weighted Regression (LWR), the classical regression technique in DMPs. Simulation results on trajectory planning with the min-jerk criterion demonstrate its effectiveness and efficiency.
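The conditional-distribution step of GMR can be sketched for a scalar input and scalar output. The sketch assumes an already fitted mixture whose components are given as tuples (prior, mean_x, mean_y, var_x, cov_xy); this tuple layout and the function name are illustrative assumptions, and fitting the GMM itself (for example by EM) is not shown.

```python
import math

def gmr_predict(x, components):
    """Conditional mean E[y | x] under a joint Gaussian mixture over (x, y).

    components: list of (prior, mean_x, mean_y, var_x, cov_xy) tuples.
    """
    responsibilities = []
    conditional_means = []
    for prior, mx, my, vxx, cxy in components:
        # Weight of this component given the input x.
        density = math.exp(-0.5 * (x - mx) ** 2 / vxx) / math.sqrt(2.0 * math.pi * vxx)
        responsibilities.append(prior * density)
        # Gaussian conditioning: linear regression inside the component.
        conditional_means.append(my + cxy / vxx * (x - mx))
    total = sum(responsibilities)
    return sum(r / total * m for r, m in zip(responsibilities, conditional_means))
```

Each component contributes a local linear model, and the responsibilities blend them smoothly. If x lies extremely far from every component the densities can underflow to zero, so a production version would work in log space.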
Abstract
Fu, Jian; Ning, Li; Wei, Sujuan; Zhang, Liyan.
Industrial Informatics - Computing Technology, Intelligent Technology, Industrial Information Integration (ICIICII), 2015 International Conference on: 3-4 Dec. 2015 ,111-115
Imitation learning is a promising paradigm for enabling robots to autonomously perform new tasks, similar to the way humans acquire motion skills. In this paper, we present a novel DS-GMR coupled primitive (DGCP) for robotic motion skill learning based on imitation learning. DGCP comprises a dominant linear ordinary differential dynamic component and a GMR-based forcing component. Furthermore, we carefully design the linkage mechanism of the hyperparameters to achieve synchronous spatiotemporal coupling. In this way, intelligent trajectory planning in similar scenarios (reaching the target at a different time and position) can be generated spontaneously. Finally, a simulation in which a robot performs trajectory planning with the min-jerk criterion over various durations demonstrates the practical capability and efficiency of the presented method.
Abstract
Jian Fu; Bingjie Ma; Qinyi Xiong; Sujuan Wei; Jun Zhang.
Patent: 2015-08-26
Abstract
Jian Fu; Sujuan Wei; Haibo He; Shengyong Wang.
Neural Networks (IJCNN), 2014 International Joint Conference on: 2014/7 ,vol. 6 no.11,3649-3656
We present a novel online learning control algorithm (OLCPA) which comprises projected gradient temporal difference for action-value function (PGTDAVF) and advanced heuristic dynamic programming with one step delay (AHDPOSD). PGTDAVF guarantees the convergence of temporal difference (TD)-based policy learning with smooth action-value function approximators, such as neural networks. AHDPOSD is a specially designed framework in which PGTDAVF is embedded to conduct online learning control. It not only coincides with the intention of temporal difference learning but also enables PGTDAVF to be effective in a nonidentical policy environment, which makes it more practical. In this way, the proposed algorithm achieves stability and practicability simultaneously. Finally, simulation of online learning control on a cart-pole benchmark demonstrates the practical control capability and efficiency of the presented method.
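The temporal-difference error that TD-based methods such as the one above build on can be shown with a tabular action-value update. This is a plain one-step TD update on a dictionary of Q-values, used only to illustrate the TD error; the function name and table layout are assumptions, and it is not the projected gradient variant (PGTDAVF) or the neural approximator described in the abstract.

```python
def td_action_value_update(q, state_action, reward, next_state_action,
                           alpha=0.1, gamma=0.95):
    """Apply one TD update to q[state_action] and return the TD error.

    q maps (state, action) pairs to value estimates; missing entries count as 0.
    """
    current = q.get(state_action, 0.0)
    target = reward + gamma * q.get(next_state_action, 0.0)
    td_error = target - current
    q[state_action] = current + alpha * td_error
    return td_error
```

With function approximators such as neural networks, this simple update is not guaranteed to converge, which is the gap that projected gradient TD methods aim to close.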
Abstract
Jian Fu; Haibo He; Huiying LI; Qing Liu.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): 2012 ,vol 7367 ,p 555-564
In this paper, we propose a novel strategy for approximating policy evaluation during the online critic-actor learning procedure. At the early stage, we adopt adaptive differential evolution with elites (ADEE), which is good at global search, to optimize moving least square temporal difference with one step (MLSTD(0)). Next, we apply a gradient method to perform local search efficiently and effectively. This resolves the dilemma between exploration and exploitation in the weight search for the critic neural network. Simulation results on the online learning control of a cart-pole benchmark demonstrate the efficiency of the presented method.
Abstract
Haibo He,Zhen Ni,Jian Fu
NEUROCOMPUTING. Volume 78, Issue 1. 2011. PP 3-13
In this paper, we propose a novel adaptive dynamic programming (ADP) architecture with three networks, an action network, a critic network, and a reference network, to develop an internal goal representation for online learning and optimization. Unlike the traditional ADP design, which normally has an action network and a critic network, our approach integrates a third network, the reference network, into the actor-critic design framework to automatically and adaptively build an internal reinforcement signal that facilitates learning and optimization over time to accomplish goals. We present the detailed design architecture and its associated learning algorithm to explain how effective learning and optimization can be achieved in this new ADP architecture. Furthermore, we test the performance of our architecture on both the cart-pole balancing task and the triple-link inverted pendulum balancing task, which are popular benchmarks in the community, to demonstrate its learning and control performance over time.
Abstract
Jian Fu; Haibo He; Zhen Ni.
Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2011 IEEE Symposium on: 29 July 2011
In this paper we propose to integrate the recursive Levenberg-Marquardt method into the adaptive dynamic programming (ADP) design for improved learning and adaptive control performance. Our key motivation is to consider a balanced weight updating strategy with the consideration of both robustness and convergence during the online learning process. Specifically, a modified recursive Levenberg-Marquardt (LM) method is integrated into both the action network and critic network of the ADP design, and a detailed learning algorithm is proposed to implement this approach. We test the performance of our approach based on the triple link inverted pendulum, a popular benchmark in the community, to demonstrate online learning and control strategy. Experimental results and comparative study under different noise conditions demonstrate the effectiveness of this approach.
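As background for the weight update mentioned above, the standard (batch) Levenberg-Marquardt step for minimizing a sum of squared errors is shown below. This is the textbook form, stated under that assumption; the recursive variant and its modification used in the paper are not reproduced here.

```latex
E(w) = \tfrac{1}{2}\, e(w)^{\top} e(w), \qquad
J = \frac{\partial e}{\partial w}, \qquad
\Delta w = -\bigl( J^{\top} J + \mu I \bigr)^{-1} J^{\top} e .
```

The damping factor \mu sets the balance the abstract refers to: as \mu \to 0 the step approaches the Gauss-Newton step (fast convergence near a solution), while a large \mu yields a short gradient-descent-like step (more robust far from a solution or under noise).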
Abstract
Jian Fu; Haibo He; Xinmin Zhou.
Neural Networks, IEEE Transactions on: 16 June 2011 ,vol.22, no.7, ,pp.1133-1148
Adaptive dynamic programming (ADP) is a promising research field for the design of intelligent controllers that can both learn on the fly and exhibit optimal behavior. Over the past decades, several generations of ADP design have been proposed in the literature and have been applied successfully to various benchmarks and industrial problems. While many existing studies focus on multiple-input-single-output systems with steepest descent search, in this paper we investigate a generalized multiple-input-multiple-output (GMIMO) ADP design for online learning and control, which is applicable to a wider range of practical real-world applications. Furthermore, an improved weight-updating algorithm based on recursive Levenberg-Marquardt methods is presented and embodied in the GMIMO approach to improve its performance. Finally, we test the performance of this approach on a practical complex system, namely, the learning and control of the tension and height of the looper system in a hot strip mill. Experimental results demonstrate that the proposed approach achieves effective and robust performance.
Abstract
Jian Fu ; Qing Liu ; Xinmin Zhou ; Kui Xiang ; Zhigang Zeng
Published in: 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence)
In this paper, we propose an adaptive variable strategy Pareto differential evolution algorithm for multi-objective optimization (AVSPDE). It differs from general adaptive DE methods, which are regulated by variable parameters and applied to single-objective problems. Based on real-time information from the tournament selection set (TSS), two DE variants are switched dynamically during the run: one aims at fast convergence and the other focuses on a diverse spread. Theoretical analysis and numerical simulation show that the presented method achieves better performance.
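For readers unfamiliar with differential evolution, one generation of the basic single-objective DE/rand/1/bin scheme is sketched below. It is the classical baseline that DE variants are built from, not AVSPDE itself: the Pareto ranking, the tournament selection set, and the dynamic switching between two variants are not modelled, and the function name and default constants are assumptions.

```python
import random

def de_step(population, objective, rng, scale=0.5, crossover=0.9):
    """One generation of DE/rand/1/bin with greedy selection (minimisation).

    population: list of equal-length lists of floats, with at least 4 members.
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    dim = len(population[0])
    next_population = []
    for i, target in enumerate(population):
        others = [j for j in range(len(population)) if j != i]
        a, b, c = (population[j] for j in rng.sample(others, 3))
        forced = rng.randrange(dim)  # at least one gene always comes from the mutant
        trial = [
            a[k] + scale * (b[k] - c[k])
            if (rng.random() < crossover or k == forced)
            else target[k]
            for k in range(dim)
        ]
        # Keep the trial only if it is no worse than the current individual.
        next_population.append(trial if objective(trial) <= objective(target) else target)
    return next_population
```

Changing which individuals form the mutant (for example using the current best instead of a random one) trades diversity for convergence speed, which is the kind of trade-off that the two variants in the abstract address.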
Abstract