In online one step actor critic, why does the weights update become less significant as the episode progresses?

Answer by Neil Slater for In online one step actor critic, why does the...

February 12, 2019, 9:43 am

This "decay" of later values is a direct consequence of the episodic formula for the objective function for REINFORCE:

$$J(\theta) = v_{\pi_\theta}(s_0)$$

That is, the expected return from the first...


In online one step actor critic, why does the weights update become less...

February 12, 2019, 9:43 am

The Reinforcement Learning book by Richard Sutton et al., section 13.5, shows an online actor-critic algorithm. Why do the weight updates depend on the discount factor via $I$? It seems that the more we...
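To make the role of $I$ concrete, here is a minimal sketch of how that factor evolves over one episode. It assumes $I$ starts at 1 and is multiplied by the discount factor after every step, so that $I = \gamma^t$ at step $t$. The function name `actor_update_scales` and the value `gamma=0.9` are hypothetical, chosen only for illustration.

```python
def actor_update_scales(gamma, num_steps):
    """Return the factor I that scales the update at steps 0..num_steps-1.

    Assumption: I is reset to 1 at the start of the episode and then
    multiplied by gamma after each step, so I equals gamma**t at step t.
    """
    scales = []
    I = 1.0  # reset at the start of every episode
    for _ in range(num_steps):
        scales.append(I)  # scale applied to the update at this step
        I *= gamma        # I <- gamma * I
    return scales


# Illustrative values only: with gamma < 1 the scale shrinks geometrically.
scales = actor_update_scales(gamma=0.9, num_steps=5)
print(scales)
```

With $\gamma < 1$ the printed scales shrink geometrically, which is the effect the question describes: later steps of an episode contribute smaller updates. With $\gamma = 1$ every step is weighted equally.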
