Elaboration on the policy improvement theorem for soft policies in reinforcement learning
Tim Kovacs, Elaboration on the policy improvement theorem for soft policies in reinforcement learning. CSTR-10-007, University of Bristol. October 2010. No electronic version available.
In section 5.4 of their book on reinforcement learning, Sutton and Barto show that the policy improvement theorem applies to soft policies: making a soft policy greedier (but still soft) with respect to its Q-function yields an improved policy. I found this material difficult to follow and wrote this short document to elaborate on their proof. Familiarity with the material up to that section is assumed.
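As a rough numerical illustration of the claim (not the proof itself, and not taken from the report), the sketch below evaluates an epsilon-soft policy on a small random MDP, makes it epsilon-greedy with respect to its own Q-function, and compares the state values of the two policies. Everything concrete here is an assumption made for the example: the finite discounted MDP with a known model, the discount factor, the value of epsilon, and all function names.

```python
import random


def normalise(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def random_mdp(rng, n_states, n_actions):
    """Hypothetical MDP: P[s][a][t] is a transition probability, R[s][a] an expected reward."""
    P = [[normalise([rng.random() for _ in range(n_states)])
          for _ in range(n_actions)] for _ in range(n_states)]
    R = [[rng.uniform(-1.0, 1.0) for _ in range(n_actions)]
         for _ in range(n_states)]
    return P, R


def random_soft_policy(rng, n_states, n_actions, eps):
    """Mix a uniform policy (weight eps) with an arbitrary one (weight 1 - eps).

    Every action then has probability at least eps / n_actions, so the
    result is eps-soft.
    """
    policy = []
    for _ in range(n_states):
        arbitrary = normalise([rng.random() for _ in range(n_actions)])
        policy.append([eps / n_actions + (1.0 - eps) * p for p in arbitrary])
    return policy


def evaluate(policy, P, R, gamma, sweeps=2000):
    """Iterative policy evaluation; returns (V, Q) for a stochastic policy."""
    n_states, n_actions = len(R), len(R[0])

    def backup(s, a, V):
        # One-step lookahead value of taking action a in state s.
        return R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))

    V = [0.0] * n_states
    for _ in range(sweeps):
        V = [sum(policy[s][a] * backup(s, a, V) for a in range(n_actions))
             for s in range(n_states)]
    Q = [[backup(s, a, V) for a in range(n_actions)] for s in range(n_states)]
    return V, Q


def epsilon_greedy(Q, eps):
    """Policy that is greedy w.r.t. Q with probability 1 - eps, uniform otherwise."""
    policy = []
    for q in Q:
        best = max(range(len(q)), key=q.__getitem__)
        policy.append([eps / len(q) + ((1.0 - eps) if a == best else 0.0)
                       for a in range(len(q))])
    return policy


def check_improvement(seed, n_states=4, n_actions=3, eps=0.2, gamma=0.9):
    """Return (V_old, V_new) for a random eps-soft policy and its eps-greedy successor."""
    rng = random.Random(seed)
    P, R = random_mdp(rng, n_states, n_actions)
    old_policy = random_soft_policy(rng, n_states, n_actions, eps)
    V_old, Q_old = evaluate(old_policy, P, R, gamma)
    new_policy = epsilon_greedy(Q_old, eps)
    V_new, _ = evaluate(new_policy, P, R, gamma)
    return V_old, V_new


if __name__ == "__main__":
    V_old, V_new = check_improvement(seed=0)
    for s, (old, new) in enumerate(zip(V_old, V_new)):
        print(f"state {s}: V_old = {old:.4f}, V_new = {new:.4f}")
```

If the theorem holds as stated, V_new should be at least V_old in every state, up to numerical tolerance. A check like this can only support the result on particular instances; the argument in the report is what establishes it in general.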