Elaboration on the policy improvement theorem for soft policies in reinforcement learning

Tim Kovacs, Elaboration on the policy improvement theorem for soft policies in reinforcement learning. CSTR-10-007, University of Bristol. October 2010. No electronic version available.

Abstract

In Section 5.4 of their book on reinforcement learning, Sutton and Barto show that the policy improvement theorem applies to soft policies: making a soft policy greedier (while keeping it soft) with respect to its Q-function yields an improved policy. I found this material difficult to follow and wrote this short document to elaborate on their proof. Familiarity with the book's material up to that section is assumed.
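As a rough illustration of the claim, and not taken from the report itself, the sketch below numerically checks the one-step inequality behind it for a single state. It assumes the soft policies in question are epsilon-soft (every action has probability at least epsilon divided by the number of actions) and that "greedier" means epsilon-greedy with respect to the Q-values. The Q-values are random stand-ins rather than the output of any real evaluation, and all function names are hypothetical.

```python
import random


def epsilon_greedy(q, epsilon):
    """Epsilon-greedy distribution over actions for the Q-values of one state."""
    n = len(q)
    best = max(range(n), key=lambda a: q[a])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def random_epsilon_soft(n, epsilon, rng):
    """A random epsilon-soft distribution: each action gets at least epsilon / n."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [epsilon / n + (1.0 - epsilon) * w / total for w in weights]


def expected_value(policy, q):
    """Expected Q-value of one state under a distribution over actions."""
    return sum(p * x for p, x in zip(policy, q))


rng = random.Random(0)
epsilon = 0.1
n_actions = 4

# Assumption: q stands in for the Q-values of the current policy in one state.
q = [rng.uniform(-1.0, 1.0) for _ in range(n_actions)]
pi = random_epsilon_soft(n_actions, epsilon, rng)
pi_greedier = epsilon_greedy(q, epsilon)

v_pi = expected_value(pi, q)             # state value under the original policy
v_step = expected_value(pi_greedier, q)  # greedier policy for one step, then the original

print(v_step >= v_pi)
```

The check covers only the single-step inequality in one state. Extending it to a statement about the value functions of the two policies is the job of the policy improvement theorem itself.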
