Learning Reward Machines: A Study in Partially Observable Reinforcement Learning

dc.catalogador: jlo
dc.contributor.author: Toro Icarte, Rodrigo Andrés
dc.contributor.author: Klassen, Toryn Q.
dc.contributor.author: Valenzano, Richard
dc.contributor.author: Castro Anich, Margarita
dc.contributor.author: Waldie, Ethan
dc.contributor.author: McIlraith, Sheila A.
dc.date.accessioned: 2023-08-07T19:33:55Z
dc.date.available: 2023-08-07T19:33:55Z
dc.date.issued: 2023
dc.description.abstract: Reinforcement Learning (RL) is a machine learning paradigm wherein an artificial agent interacts with an environment with the purpose of learning behaviour that maximizes the expected cumulative reward it receives from the environment. Reward machines (RMs) provide a structured, automata-based representation of a reward function that enables an RL agent to decompose an RL problem into structured subproblems that can be efficiently learned via off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential.
dc.fechaingreso.objetodigital: 2023-08-07
dc.format.extent: 60 pages
dc.fuente.origen: ORCID
dc.identifier.doi: 10.1016/j.artint.2023.103989
dc.identifier.eissn: 1872-7921
dc.identifier.issn: 0004-3702
dc.identifier.uri: https://doi.org/10.1016/j.artint.2023.103989
dc.identifier.uri: https://repositorio.uc.cl/handle/11534/74370
dc.identifier.wosid: WOS:001062209400001
dc.information.autoruc: Escuela de Ingeniería; Toro Icarte, Rodrigo Andrés; 0000-0002-7734-099X; 170373
dc.information.autoruc: Escuela de Ingeniería; Castro Anich, Margarita; 0000-0002-4689-6143; 170767
dc.language.iso: en
dc.nota.acceso: Partial content
dc.pagina.final: 60
dc.pagina.inicio: 1
dc.revista: Artificial Intelligence
dc.rights: restricted access
dc.subject: Reinforcement learning
dc.subject: Reward machines
dc.subject: Partial observability
dc.subject: Automata learning
dc.subject.ddc: 000
dc.subject.dewey: Computer science
dc.title: Learning Reward Machines: A Study in Partially Observable Reinforcement Learning
dc.type: preprint
sipa.codpersvinculados: 170373
sipa.codpersvinculados: 170767
sipa.trazabilidad: ORCID;2023-08-07
Files
Original bundle
Name: Learning Reward Machines.pdf
Size: 79.34 KB
Format: Adobe Portable Document Format