XAI and Strategy Extraction via Reward Redistribution
International Digital Security Forum
Assigning credit for a received reward to previously performed actions is one of the central tasks in reinforcement learning. Credit assignment often uses world models, in either a forward or a backward view. In a forward view, the future return is estimated by replacing the environment with a model or by rolling out sequences until the end of the episode. A backward view either learns a backward model or performs a backward analysis of a forward model that predicts or models the return of an episode. Our method RUDDER performs such a backward analysis to construct a reward redistribution that credits the actions that caused a reward. Its extension Align-RUDDER learns a reward redistribution from only a few demonstrations. An optimal reward redistribution has zero expected future reward and therefore immediately credits each action for everything it will cause.
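To make the idea concrete, here is a minimal Python sketch of reward redistribution by differences of return predictions, in the spirit of RUDDER's backward analysis. It assumes a hypothetical toy task in which picking up a key decides whether a delayed reward arrives at the end of the episode. `key_door_predictor` is a hand-written stand-in for the learned return model (an LSTM in RUDDER). The helper `redistribute_reward` and the rule of assigning the prediction residual to the last step are illustrative assumptions, not the published implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in type for a learned return predictor: it maps a prefix
# of actions to a predicted return for the whole episode.
ReturnPredictor = Callable[[Sequence[str]], float]


def redistribute_reward(actions: Sequence[str],
                        episode_return: float,
                        predict_return: ReturnPredictor) -> List[float]:
    """Redistribute a delayed episode return over the individual steps.

    Each step receives the change in predicted return caused by its action:
    r_t = g(a_0..a_t) - g(a_0..a_{t-1}), with g(empty prefix) = 0.
    Assumption for this sketch: the residual (true return minus the final
    prediction) is added to the last step, so the redistributed rewards
    always sum to the original return.
    """
    redistributed: List[float] = []
    previous = 0.0
    for t in range(len(actions)):
        current = predict_return(actions[: t + 1])
        redistributed.append(current - previous)
        previous = current
    if redistributed:
        redistributed[-1] += episode_return - previous
    return redistributed


def key_door_predictor(prefix: Sequence[str]) -> float:
    """Hypothetical toy predictor: the episode pays 1.0 iff the key was picked up."""
    return 1.0 if "pick_key" in prefix else 0.0


episode = ["move", "pick_key", "move", "move", "open_door"]
# The original reward is delayed: 1.0 arrives only at the final step.
print(redistribute_reward(episode, 1.0, key_door_predictor))
# -> [0.0, 1.0, 0.0, 0.0, 0.0]
```

The credit moves from the final step to the step that caused the reward. In this deterministic toy case, no further reward is expected once the key has been collected, which mirrors the zero-expected-future-reward property of an optimal redistribution.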
XAI also addresses credit assignment when it asks what caused a model to produce a particular output for a given input. Beyond that, XAI seeks to understand how and why a policy solved a task, why an agent outperforms humans, and why a particular decision was made. Humans comprehend an agent's strategy best when all of its actions are evaluated immediately and have no hidden consequences in the future. Reward redistributions learned by RUDDER and Align-RUDDER therefore help to explain the task-solving strategies of both humans and machines.