Enable A2C and PPO
A2C and PPO differ from the other SB3 algorithms that our framework runs so far. They are on-policy: they collect experience in a rollout buffer instead of a replay buffer. With some adjustments to the code, we should still be able to use them.