CleanDQN

CleanDQN is a single-file implementation of DQN. Here is how it performs in comparison to SB3 DQN on CartPole-v1, averaged over five runs. Note that the two use different hyperparameters: with the CleanDQN hyperparameters, the SB3 version did not learn anything, so I kept SB3's default parameters.
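The core of a DQN update is a one-step TD target computed from a target network. The sketch below is plain Python with hypothetical function names (`dqn_targets`, `td_loss`); it is not CleanDQN's code, and it assumes a mean-squared-error loss, whereas some implementations use a Huber loss instead.

```python
from typing import List, Sequence


def dqn_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[Sequence[float]],
    dones: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """One-step TD targets: y = r + gamma * (1 - done) * max_a' Q_target(s', a').

    next_q_values[i] holds the target network's Q-values for every action in s'_i.
    """
    return [
        r + gamma * (1.0 - d) * max(q)  # (1 - done) stops bootstrapping past a terminal state
        for r, q, d in zip(rewards, next_q_values, dones)
    ]


def td_loss(q_taken: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between Q(s, a) for the taken actions and the TD targets."""
    return sum((q - y) ** 2 for q, y in zip(q_taken, targets)) / len(targets)


# Tiny batch of two transitions; the second one ends the episode.
targets = dqn_targets([1.0, 0.5], [[0.0, 2.0], [3.0, 1.0]], [0.0, 1.0], gamma=0.5)
print(targets)  # [2.0, 0.5]
print(td_loss([1.5, 0.5], targets))  # 0.125
```

In a full agent, `q_taken` would come from the online network and `next_q_values` from a periodically synced copy of it; the target network is what keeps the regression target from shifting on every gradient step.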

CleanSAC

CleanSAC is a single-file implementation of SAC with optional Hindsight Experience Replay (HER). Here is how it performs in comparison to SB3 SAC using the same hyperparameters (left: CleanSAC, right: SB3 SAC) on FetchReach-v2, averaged over five runs.
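A compact sketch of the two pieces named above. `sac_soft_target` shows SAC's soft Bellman target with clipped double-Q and a fixed entropy coefficient `alpha` (many implementations tune `alpha` automatically instead; whether CleanSAC does is not stated here). `her_relabel_future` shows HER with the "future" goal-sampling strategy. All function names, the transition dict layout, `k=4`, and the 0.05 distance threshold (modeled on the Fetch tasks' sparse reward) are assumptions for illustration, not CleanSAC's code.

```python
import math
import random
from typing import Dict, List, Optional, Sequence


def sac_soft_target(reward: float, done: float, next_q1: float, next_q2: float,
                    next_log_prob: float, alpha: float = 0.2, gamma: float = 0.99) -> float:
    """Soft Bellman target: r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s')).

    a' is sampled from the current policy at s'; Q1' and Q2' come from the two target critics.
    """
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob  # clipped double-Q plus entropy bonus
    return reward + gamma * (1.0 - done) * soft_value


def sparse_reward(achieved_goal: Sequence[float], desired_goal: Sequence[float],
                  threshold: float = 0.05) -> float:
    """0.0 if the achieved goal is within `threshold` of the desired goal, otherwise -1.0."""
    return 0.0 if math.dist(achieved_goal, desired_goal) <= threshold else -1.0


def her_relabel_future(episode: List[Dict], k: int = 4,
                       rng: Optional[random.Random] = None) -> List[Dict]:
    """Return each transition plus k copies relabeled with goals achieved later in the episode.

    Each transition dict has (at least) "achieved_goal", the goal reached after the step,
    and "desired_goal". Rewards are recomputed for every stored transition.
    """
    rng = rng or random.Random()
    out: List[Dict] = []
    for t, tr in enumerate(episode):
        out.append({**tr, "reward": sparse_reward(tr["achieved_goal"], tr["desired_goal"])})
        for _ in range(k):
            future = rng.randint(t, len(episode) - 1)  # inclusive: any step from t onward
            new_goal = episode[future]["achieved_goal"]
            out.append({**tr, "desired_goal": new_goal,
                        "reward": sparse_reward(tr["achieved_goal"], new_goal)})
    return out


# Three-step episode that never reaches its original goal.
episode = [
    {"obs": t, "action": 0, "achieved_goal": (float(t), 0.0, 0.0), "desired_goal": (5.0, 5.0, 5.0)}
    for t in range(3)
]
buffer = her_relabel_future(episode, k=2, rng=random.Random(0))
print(len(buffer))  # 9: each of the 3 transitions plus 2 relabeled copies
```

The point of relabeling is that a failed episode still yields transitions with reward 0.0 for goals it did reach, which gives the critic a learning signal under sparse rewards like FetchReach's.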

CleanPPO

CleanPPO is a single-file implementation of PPO. On FetchReach-v2, averaged over five runs, it converges more slowly than SB3 PPO, so I will open a separate issue for that.
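The central piece of a PPO update is the clipped surrogate objective. A minimal plain-Python sketch, assuming per-sample log-probabilities and precomputed advantages (for example from GAE); `ppo_clip_loss` and `clip_eps=0.2` are illustrative choices, not necessarily what CleanPPO uses.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Negative clipped surrogate objective, averaged over the batch (minimize this)."""
    losses = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped objectives.
        losses.append(-min(ratio * adv, clipped_ratio * adv))
    return sum(losses) / len(losses)


# Unchanged policy: the ratio is 1, so the loss is the negative mean advantage.
print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, -2.0]))  # 0.5
# A ratio of 2 with a positive advantage is clipped to 1 + clip_eps = 1.2.
print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))  # -1.2
```

Taking the minimum means clipping only ever removes incentive to move the policy further; with a negative advantage and a large ratio, the unclipped (more pessimistic) term is kept.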

Other changes

Always set torch.backends.cudnn.deterministic = True when a seed is set.
Smoke tests explicitly exclude dqn and cleandqn because these algorithms do not support continuous action spaces.
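The seeding change can be pictured as a helper like the one below. `set_seed` is a hypothetical name, and the numpy and torch imports are optional so the sketch runs without them; only the `torch.backends.cudnn.deterministic = True` line reflects what this MR describes.

```python
import random

try:
    import numpy as np
except ImportError:  # numpy is optional in this sketch
    np = None

try:
    import torch
except ImportError:  # torch is optional in this sketch
    torch = None


def set_seed(seed: int) -> None:
    """Hypothetical helper: seed every RNG available and request deterministic cuDNN kernels."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)  # seeds the CPU and all CUDA devices
        # The change in this MR: force cuDNN to use deterministic algorithms whenever a seed is given.
        torch.backends.cudnn.deterministic = True


set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True: same seed, same draws
```

Deterministic cuDNN kernels can be slower, which is why the flag is tied to seeding rather than turned on unconditionally.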

Closes #137

Edited Oct 23, 2023 by Thilo Fryen
