
cleandqn, cleansac and cleanppo

Thilo Fryen requested to merge custom_dqn into devel

Adds three "clean" algorithms that are based on CleanRL.

CleanDQN

A one-file implementation of DQN. Here is how it performs compared to SB3 DQN on CartPole-v1, averaged over five runs. Note that the hyperparameters differ: when I used the cleandqn hyperparameters for the SB3 version, it did not learn anything, so I stuck to the SB3 default parameters for it.

image

CleanSAC

A one-file implementation of SAC with optional HER (hindsight experience replay). Here is how it performs compared to SB3 SAC with the same hyperparameters on FetchReach-v2, averaged over five runs (left: cleansac, right: SB3 SAC).
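For readers unfamiliar with HER, here is a minimal sketch of the relabeling idea. It is not the code from this merge request: the function names, the dict-based transition layout, and the choice of the "final" goal strategy are assumptions made for illustration, and the 0.05 success radius is assumed to match the sparse Fetch reward.

```python
import math

# Assumed success radius for the sparse reward (illustrative value).
DISTANCE_THRESHOLD = 0.05


def sparse_reward(achieved_goal, desired_goal):
    """Return 0.0 if the achieved goal is within the threshold, else -1.0."""
    distance = math.dist(achieved_goal, desired_goal)
    return 0.0 if distance <= DISTANCE_THRESHOLD else -1.0


def relabel_final(episode):
    """Hypothetical helper: relabel an episode with the 'final' HER strategy.

    `episode` is a list of dicts with keys "achieved_goal" (the goal reached
    after the step), "desired_goal", and "reward". Every transition is copied,
    its desired goal is replaced by the goal achieved at the end of the
    episode, and its reward is recomputed for that new goal.
    """
    new_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for transition in episode:
        copy = dict(transition)  # leave the original transition untouched
        copy["desired_goal"] = new_goal
        copy["reward"] = sparse_reward(transition["achieved_goal"], new_goal)
        relabeled.append(copy)
    return relabeled


# A failed two-step episode: the gripper never reaches (1, 1, 1).
episode = [
    {"achieved_goal": (0.0, 0.0, 0.0), "desired_goal": (1.0, 1.0, 1.0), "reward": -1.0},
    {"achieved_goal": (0.5, 0.0, 0.0), "desired_goal": (1.0, 1.0, 1.0), "reward": -1.0},
]
hindsight = relabel_final(episode)
```

After relabeling, the last transition counts as a success for the substituted goal, which is what gives the agent a learning signal from otherwise failed episodes. Both the original and the relabeled transitions would typically be stored in the replay buffer.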

image

CleanPPO

A one-file implementation of PPO. It converges more slowly than the SB3 version of PPO, so I will open a separate issue for that. Results are on FetchReach-v2, averaged over five runs.

image

Other changes

  • always set torch.backends.cudnn.deterministic = True when setting a seed
  • smoke tests explicitly exclude dqn and cleandqn because they don't work with continuous action spaces
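The seeding change in the first bullet could look roughly like the sketch below. The helper name `set_seed` and the optional imports are assumptions for illustration, not the actual code in this merge request; only `torch.backends.cudnn.deterministic = True` is taken from the description above.

```python
import random


def set_seed(seed: int) -> None:
    """Hypothetical helper: seed the common RNGs and make cuDNN deterministic."""
    random.seed(seed)

    try:
        import numpy as np
    except ImportError:
        pass  # NumPy is not installed, so there is nothing to seed
    else:
        np.random.seed(seed)

    try:
        import torch
    except ImportError:
        return  # PyTorch is not installed, so there is nothing more to seed

    torch.manual_seed(seed)
    # Always request deterministic cuDNN kernels when a seed is set.
    torch.backends.cudnn.deterministic = True


set_seed(42)
```

Deterministic cuDNN kernels can be slower than the default ones, so this trades some speed for run-to-run reproducibility.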

closes #137 (closed)

Edited by Thilo Fryen
