this post was submitted on 20 Nov 2023

Machine Learning

I am playing with CartPole using the RL DQN method, similar to the approach outlined here. I then examined the greedy actions taken at various cart velocities; the results below show that the decision boundary shifts downward as velocity increases.

https://preview.redd.it/agmnudtexi1c1.png?width=1787&format=png&auto=webp&s=f93caad02435a3871685fc0ce7476c92e59727ad

My physical intuition suggests that these results might be incorrect: I believe the cart velocity should not influence the optimal action. The optimal action at (θ, ω) should aim to reduce or maintain the magnitude of θ, and to decrease the magnitude of ω when θ is close to zero. Applying force to the cart changes the angular acceleration, which alters ω and, in turn, θ; this change in ω and θ is not affected by the cart's current velocity or position. Equations describing the cart-pole dynamics can be found here, and in them the cart velocity and position have no effect on the angular motion.
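
To make this concrete, here is a minimal sketch of the pole's angular acceleration using the standard cart-pole equations (the parameter values are Gym's CartPole-v1 defaults, assumed here). Note that neither the cart position nor the cart velocity appears anywhere in the formula:

```python
import math

# Standard cart-pole parameters (Gym CartPole-v1 defaults; assumed here)
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
POLE_HALF_LENGTH = 0.5
FORCE_MAG = 10.0

def theta_ddot(theta, omega, force):
    """Angular acceleration of the pole.

    The inputs are only (theta, omega, force): the cart's velocity
    and position do not enter the angular dynamics at all.
    """
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + MASS_POLE * POLE_HALF_LENGTH * omega**2 * sin_t) / TOTAL_MASS
    return (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t**2 / TOTAL_MASS)
    )
```

Whatever the cart's speed, `theta_ddot` returns the same value for the same (θ, ω, force), which is exactly the independence claimed above.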

Another argument that the policy should be consistent across cart velocities comes from changing the frame of reference to one moving with the cart. In that frame, an observer sees exactly the same pole dynamics as if the cart were stationary, so the same policy should apply as in the stationary case.
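
This invariance can be checked numerically. The sketch below (again assuming Gym's CartPole-v1 parameters and a simple Euler integrator) simulates the full state and shows that adding a constant boost to the cart's velocity changes only the cart's trajectory, not the pole's:

```python
import math

# Gym CartPole-v1 defaults (assumed)
G, M_C, M_P, L = 9.8, 1.0, 0.1, 0.5
M = M_C + M_P
DT = 0.02  # integration time step

def step(state, force):
    """One Euler step of the full cart-pole dynamics."""
    x, x_dot, theta, omega = state
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + M_P * L * omega**2 * sin_t) / M
    theta_acc = (G * sin_t - cos_t * temp) / (L * (4.0 / 3.0 - M_P * cos_t**2 / M))
    x_acc = temp - M_P * L * theta_acc * cos_t / M
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * omega, omega + DT * theta_acc)

def pole_trajectory(x_dot0, forces):
    """Pole angles over time, starting from a given cart velocity."""
    state = (0.0, x_dot0, 0.05, 0.0)  # same pole state, different cart velocity
    thetas = []
    for f in forces:
        state = step(state, f)
        thetas.append(state[2])
    return thetas

forces = [10.0, -10.0] * 25
# Boosting the frame by 5 m/s leaves the pole's motion unchanged
assert pole_trajectory(0.0, forces) == pole_trajectory(5.0, forces)
```

Since θ̈ depends only on (θ, ω, force), the cart velocity feeds only into x, and the pole trajectories coincide exactly.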

Is there any flaw in my reasoning, or could the difference in greedy policy at different velocities be attributed to artifacts in RL, such as the agent lacking sufficient experience at high velocities?

[–] mrfox321@alien.top 1 points 10 months ago

Correct. Physical intuition (symmetry) should inform how one models the problem.

Here, Galilean invariance is invoked to reduce the relevant degrees of freedom of the objective function.
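
One way this symmetry could be exploited (a sketch, not part of the original DQN setup) is to drop the cart coordinates from the network input before training, so the Q-function cannot depend on them in the first place:

```python
def reduce_state(state):
    """Keep only the pole coordinates of a CartPole observation.

    By Galilean invariance the optimal balancing action does not depend
    on x or x_dot. Caveat: in Gym's CartPole the episode terminates when
    |x| > 2.4, so position does matter near the track edges; this
    reduction is only valid far from them.
    """
    x, x_dot, theta, omega = state
    return (theta, omega)
```

Feeding `reduce_state(obs)` to the Q-network instead of the raw 4-dimensional observation bakes the symmetry into the model, at the cost of ignoring the track-boundary constraint.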