Self-Generation of Reward by Moderate-Based Index for Sensor Inputs
KURASHIGE, Kentarou
Journal of Robotics and Mechatronics
63, 2015-12-18, Fuji Technology Press Ltd. (富士技術出版株式会社)
In conventional reinforcement learning, the reward function influences the learning results and is therefore very important. Designing this function for a task requires knowledge of reinforcement learning. In addition, a reward function must be designed for each task. These requirements make the design of a reward function infeasible. We focus on this problem and aim to realize a method that generates a reward without designing a special reward function. In this paper, we propose a universal evaluation for sensor inputs, which is independent of the task and is modeled on the indicator of pleasure and pain in biological organisms. This evaluation estimates the trend of sensor inputs based on the ease of input prediction. Instead of designing a reward function, our approach assists a human being in learning how to interact with an agent and in teaching it his/her demand. We recruited a research participant and attempted to solve a path planning problem. The results show that a participant can teach an agent his/her demand by interacting with the agent, and that the agent can generate an adaptive route by interacting with the participant and the environment.