cleandqn, cleansac and cleanppo
Adds three "clean" algorithms that are based on CleanRL.
CleanDQN
A one-file implementation of DQN. Here is how it performs in comparison to SB3 DQN on CartPole-v1, averaged over five runs. Note that the hyperparameters are different: when I used the cleandqn hyperparameters for the SB3 version, it did not learn anything, so I stuck to the default parameters for SB3.
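For reference, here is a minimal sketch of how per-seed curves can be averaged into one curve like the ones compared here. The plotting script is not part of this description, so `average_runs` and its input layout (one list of evaluation returns per run, all of equal length) are assumptions for illustration only.

```python
from statistics import mean, stdev


def average_runs(runs):
    """Return per-step mean and sample standard deviation across runs.

    `runs` is a list of runs; each run is a list of evaluation returns
    logged at the same steps (hypothetical layout).
    """
    if len({len(r) for r in runs}) != 1:
        raise ValueError("all runs must log the same number of evaluation points")
    per_step = list(zip(*runs))  # regroup values by evaluation step
    means = [mean(values) for values in per_step]
    # stdev needs at least two samples; fall back to 0.0 for a single run
    stds = [stdev(values) if len(values) > 1 else 0.0 for values in per_step]
    return means, stds
```

Reporting the spread next to the mean makes it easier to judge whether a gap between two implementations is larger than the seed-to-seed noise.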
CleanSAC
A one-file implementation of SAC, with the option to use HER. Here is how it performs on FetchReach-v2 in comparison to SB3 SAC using the same hyperparameters (left is cleansac, right is SB3 SAC), averaged over five runs.
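To illustrate what the HER option does conceptually, here is a minimal sketch of hindsight relabeling with the "final" goal-selection strategy. It is not the cleansac code, and the description does not say which strategy cleansac uses. The transition layout, the function names, and the 0.05 distance threshold are assumptions chosen to resemble a sparse-reward reach task.

```python
import math


def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    """0.0 when the goal is reached within `threshold`, otherwise -1.0.

    The threshold value is an assumption for this sketch.
    """
    return 0.0 if math.dist(achieved_goal, desired_goal) <= threshold else -1.0


def relabel_final(episode):
    """Return hindsight copies of `episode` using the 'final' strategy.

    Each transition is a dict with at least the keys `achieved_goal`
    (the goal actually reached after the step) and `desired_goal`
    (hypothetical layout). The goal reached at the end of the episode
    replaces the desired goal, and rewards are recomputed against it.
    """
    new_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for transition in episode:
        relabeled.append({
            **transition,
            "desired_goal": new_goal,
            "reward": sparse_reward(transition["achieved_goal"], new_goal),
        })
    return relabeled
```

The relabeled transitions are stored in the replay buffer alongside the originals, so that even episodes that missed the real goal contain at least one transition with a non-negative reward.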
CleanPPO
A one-file implementation of PPO. It converges more slowly than the SB3 version of PPO, so I will open a separate issue for that. Results are on FetchReach-v2, averaged over five runs.
other changes
- always set torch.backends.cudnn.deterministic = True when setting a seed
- smoke tests explicitly exclude dqn and cleandqn because they don't work with continuous action spaces
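The seeding change listed above could look roughly like the sketch below. `set_seed` is a hypothetical helper name, not necessarily the one used in the repository, and the guarded imports are only there so the sketch also runs where numpy or torch is not installed.

```python
import random

try:  # optional so the sketch runs without numpy installed
    import numpy as np
except ImportError:
    np = None

try:  # optional so the sketch runs without torch installed
    import torch
except ImportError:
    torch = None


def set_seed(seed: int) -> None:
    """Seed the common RNGs and request deterministic cuDNN kernels."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)  # seeds PyTorch's default generators
        # The change described above: always set this together with the seed.
        torch.backends.cudnn.deterministic = True
```

Deterministic cuDNN kernels can be slower than the autotuned ones, and this flag alone may not make every operation reproducible; `torch.use_deterministic_algorithms(True)` is a stricter option if that is needed.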
closes #137