Expert heuristic tuning design for the FRIQ-learning
Keywords: reinforcement learning, heuristically accelerated reinforcement learning, expert knowledgebase, Q-learning, fuzzy Q-learning
Conventional reinforcement learning (RL) methods (e.g. Q-learning, SARSA, fuzzy Q-learning) search for a solution starting from an initially empty knowledgebase, which is then filled incrementally with problem-related knowledge over the iterations. These traditional RL systems have no additional external knowledge about the solution, so the learning phase can be a long process. Many methods exist that are able to inject external information into an RL system; this area of RL is called heuristically accelerated reinforcement learning. The heuristically accelerated version of fuzzy rule interpolation-based Q-learning (FRIQ-learning) can incorporate external expert knowledge, given in the form of a fuzzy rule-base, into its knowledgebase. In this FRIQ-learning system the expert knowledge is static: it does not change during the learning phase. If the external knowledge is not entirely correct, it can have a negative influence on the efficiency of the system (e.g. a low convergence rate). A methodology is therefore needed that can also optimize (tune) the external knowledge rule-base (Q-function) during the learning phase. The main goal of this paper is to suggest a method for the FRIQ-learning system that may be suitable for optimizing the injected expert knowledgebase (Q-function) as well.
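The idea of injecting expert knowledge and then letting the learner tune it can be illustrated with a minimal sketch. This is not the FRIQ-learning method itself: it assumes plain tabular Q-learning on a hypothetical six-state chain, with the expert knowledge represented simply as initial Q-values. The environment, the `expert_prior` helper, and all parameter values are illustrative assumptions, not taken from the paper. One expert entry is deliberately wrong, and because the injected values live in the same table that the Q-learning update modifies, that entry can be corrected during learning.

```python
import random

N_STATES = 6        # hypothetical chain of states 0..5; state 5 is the goal
ACTIONS = (-1, +1)  # action 0 moves left, action 1 moves right


def step(state, action_idx):
    """Deterministic chain: reward 1.0 only when the goal state is reached."""
    nxt = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def expert_prior():
    """Assumed expert knowledge: 'move right' everywhere, except for one
    deliberately incorrect entry in state 2 that prefers 'move left'."""
    q = [[0.0, 0.5] for _ in range(N_STATES)]
    q[2] = [0.5, 0.0]  # the incorrect piece of expert knowledge
    return q


def q_learning(q_init, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning started from q_init instead of an empty table.
    The injected values are updated like any other entry, so they are tuned."""
    rng = random.Random(seed)
    q = [row[:] for row in q_init]  # copy so the prior itself stays unchanged
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)                # explore
            else:
                a = 0 if q[s][0] > q[s][1] else 1   # exploit (ties go right)
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])   # standard Q-learning update
            s = s2
            if done:
                break
    return q


if __name__ == "__main__":
    tuned = q_learning(expert_prior())
    for s in range(N_STATES - 1):
        print(s, [round(v, 3) for v in tuned[s]])
```

If the table were frozen after injection, the wrong preference in state 2 would persist; letting the update rule act on the injected entries is the tabular analogue of tuning the expert rule-base during the learning phase.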