this post was submitted on 20 Nov 2023

Machine Learning

I am playing with CartPole using the RL DQN method, similar to the approach outlined here. I then examined the greedy actions taken at various cart velocities; the results below show that the decision boundary shifts downward as velocity increases.

https://preview.redd.it/agmnudtexi1c1.png?width=1787&format=png&auto=webp&s=f93caad02435a3871685fc0ce7476c92e59727ad

My physical intuition suggests that these results might be incorrect: I believe the cart velocity should not influence the optimal action. The optimal action at (θ, ω) should aim to reduce or maintain the magnitude of θ, and to decrease the magnitude of ω when θ is close to zero. Applying force to the cart changes the angular acceleration, which alters ω and, in turn, θ; this change in ω and θ is not affected by the cart's current velocity or position. Equations describing the cart-pole dynamics can be found here, and in them the cart velocity and position have no effect on the angular motion.
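
To make this concrete, here is a minimal sketch of the pole's angular acceleration using the standard cart-pole equations (the parameter values are Gym's CartPole-v1 defaults, assumed here). Note that neither the cart position nor the cart velocity appears anywhere in the formula:

```python
import math

# Standard cart-pole parameters (Gym CartPole-v1 defaults; assumed here)
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
POLE_HALF_LENGTH = 0.5
FORCE_MAG = 10.0

def theta_ddot(theta, omega, force):
    """Angular acceleration of the pole.

    The inputs are only (theta, omega, force): the cart's velocity
    and position do not enter the angular dynamics at all.
    """
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + MASS_POLE * POLE_HALF_LENGTH * omega**2 * sin_t) / TOTAL_MASS
    return (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t**2 / TOTAL_MASS)
    )
```

Whatever the cart's speed, `theta_ddot` returns the same value for the same (θ, ω, force), which is exactly the independence claimed above.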

Another argument that the policy should be consistent across cart velocities comes from changing the frame of reference to one moving with the cart. In that frame, an observer sees exactly the same pole dynamics as if the cart were stationary, so the same policy should apply as in the stationary case.
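
This invariance can be checked numerically. The sketch below (again assuming Gym's CartPole-v1 parameters and a simple Euler integrator) simulates the full state and shows that adding a constant boost to the cart's velocity changes only the cart's trajectory, not the pole's:

```python
import math

# Gym CartPole-v1 defaults (assumed)
G, M_C, M_P, L = 9.8, 1.0, 0.1, 0.5
M = M_C + M_P
DT = 0.02  # integration time step

def step(state, force):
    """One Euler step of the full cart-pole dynamics."""
    x, x_dot, theta, omega = state
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + M_P * L * omega**2 * sin_t) / M
    theta_acc = (G * sin_t - cos_t * temp) / (L * (4.0 / 3.0 - M_P * cos_t**2 / M))
    x_acc = temp - M_P * L * theta_acc * cos_t / M
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * omega, omega + DT * theta_acc)

def pole_trajectory(x_dot0, forces):
    """Pole angles over time, starting from a given cart velocity."""
    state = (0.0, x_dot0, 0.05, 0.0)  # same pole state, different cart velocity
    thetas = []
    for f in forces:
        state = step(state, f)
        thetas.append(state[2])
    return thetas

forces = [10.0, -10.0] * 25
# Boosting the frame by 5 m/s leaves the pole's motion unchanged
assert pole_trajectory(0.0, forces) == pole_trajectory(5.0, forces)
```

Since θ̈ depends only on (θ, ω, force), the cart velocity feeds only into x, and the pole trajectories coincide exactly.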

Is there any flaw in my reasoning, or could the difference in greedy policy at different velocities be attributed to artifacts in RL, such as the agent lacking sufficient experience at high velocities?

[–] mrfox321@alien.top 1 points 10 months ago

Correct. Physical intuition (symmetry) should inform how one models the problem.

Here, Galilean invariance is invoked to reduce the relevant degrees of freedom of the objective function.
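
One way this symmetry could be exploited (a sketch, not part of the original DQN setup) is to drop the cart coordinates from the network input before training, so the Q-function cannot depend on them in the first place:

```python
def reduce_state(state):
    """Keep only the pole coordinates of a CartPole observation.

    By Galilean invariance the optimal balancing action does not depend
    on x or x_dot. Caveat: in Gym's CartPole the episode terminates when
    |x| > 2.4, so position does matter near the track edges; this
    reduction is only valid far from them.
    """
    x, x_dot, theta, omega = state
    return (theta, omega)
```

Feeding `reduce_state(obs)` to the Q-network instead of the raw 4-dimensional observation bakes the symmetry into the model, at the cost of ignoring the track-boundary constraint.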