Channel: In online one step actor critic, why does the weights update become less significant as the episode progresses? - Artificial Intelligence Stack Exchange
Browsing all 2 articles
Browse latest View live

Answer by Neil Slater for In online one step actor critic, why does the...

This "decay" of later values is a direct consequence of the episodic formula for the objective function for REINFORCE: $$J(\theta) = v_{\pi_\theta}(s_0)$$ That is, the expected return from the first...

In online one step actor critic, why does the weights update become less...

The Reinforcement Learning book by Richard Sutton et al., section 13.5, shows an online actor-critic algorithm. Why do the weight updates depend on the discount factor via $I$? It seems that the more we...
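To make the question concrete, here is a minimal sketch of how the factor $I$ evolves over one episode, assuming it starts at 1 and is multiplied by the discount factor after every step, so that $I = \gamma^t$ at step $t$. The function name `discount_scales` and its parameters are hypothetical, chosen for illustration only; this is not the book's pseudocode.

```python
def discount_scales(gamma, num_steps):
    """Return the value of I at each step of one episode.

    Assumption: I is initialised to 1 and multiplied by gamma after
    every step, so the entry at index t equals gamma ** t.
    """
    scales = []
    I = 1.0
    for _ in range(num_steps):
        scales.append(I)  # factor applied to the update at this step
        I *= gamma        # the next step's update is scaled down further
    return scales


print(discount_scales(0.5, 4))  # [1.0, 0.5, 0.25, 0.125]
```

With $\gamma < 1$, any update multiplied by $I$ shrinks geometrically as the episode goes on; with $\gamma = 1$ the factor stays at 1 and has no effect.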
