Answer by Neil Slater for In online one step actor critic, why does the...
This "decay" of later values is a direct consequence of the episodic formula for the objective function for REINFORCE: $$J(\theta) = v_{\pi_\theta}(s_0)$$ That is, the expected return from the first...
In online one step actor critic, why does the weights update become less...
The Reinforcement Learning book by Richard Sutton et al., section 13.5, shows an online actor-critic algorithm. Why do the weight updates depend on the discount factor via $I$? It seems that the more we...
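As a rough illustration of the factor the question asks about: in the episodic one-step actor-critic pseudocode of section 13.5, $I$ starts at 1 at the beginning of an episode and is multiplied by $\gamma$ after every step, so the update at step $t$ is scaled by $\gamma^t$. The sketch below tracks only that scalar. The function name `discount_weights` and the value $\gamma = 0.9$ are illustrative assumptions, not part of the book's algorithm.

```python
def discount_weights(gamma, num_steps):
    """Return the values taken by I over the first num_steps steps of an episode.

    Hypothetical helper for illustration only: it mirrors the scalar
    bookkeeping I <- 1 at episode start, then I <- gamma * I after each step.
    """
    weights = []
    I = 1.0  # I is reset to 1 at the start of each episode
    for _ in range(num_steps):
        weights.append(I)  # the update at this step is scaled by the current I
        I *= gamma         # I <- gamma * I, so I equals gamma**t at step t
    return weights


# gamma = 0.9 is an assumed value chosen only to make the decay visible
weights = discount_weights(gamma=0.9, num_steps=5)
print(weights)
```

With $\gamma < 1$ the scale shrinks geometrically over the episode, which is the "decay" the answer excerpt refers to; with $\gamma = 1$ every step is weighted equally.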