Integrate easy-to-modify versions of common RL algorithms

Algorithms

PPO
SAC + HER
simple Q-learning
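As an illustration of how small and easy to modify the simple Q-learning variant could be, here is a minimal tabular sketch. It assumes a hypothetical five-state chain environment (the `step` function below is made up for illustration and is not from Clean-RL or any library). The agent uses epsilon-greedy exploration and the standard off-policy target r + gamma * max_a' Q(s', a').

```python
import random


def step(state, action, n_states):
    """Hypothetical toy chain: action 1 moves right, action 0 moves left.
    Reaching the last state gives reward 1 and ends the episode."""
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy(q_row, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def q_learning(n_states=5, episodes=2000, alpha=0.1, gamma=0.9,
               epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy behaviour policy
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = greedy(q[state], rng)
            next_state, reward, done = step(state, action, n_states)
            # Q-learning target: bootstrap from the best next action unless terminal
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
# Greedy action for every non-terminal state (1 = move right)
policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(len(q_table) - 1)]
print(policy)
```

Keeping the update rule on a single line makes it straightforward to swap in variants (for example SARSA's on-policy target) without touching the rest of the loop.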

Functionality

saving and loading of models
logging of metrics similar to SB3 algorithms
rendering
at least one performance test per algorithm
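A rough sketch of the saving/loading and logging items, under stated assumptions: the `KeyValueLogger` class is hypothetical, and its `record(key, value)` / `dump(step)` method names only mirror the shape of SB3's logger interface; this is not SB3's implementation, and the flat table it prints merely loosely imitates SB3's console output. Models are assumed to be plain Q-tables saved as JSON together with their hyperparameters; `save_model` and `load_model` are illustrative names.

```python
import json
import os
import tempfile


class KeyValueLogger:
    """Hypothetical minimal logger; method names mirror SB3's record/dump."""

    def __init__(self):
        self.values = {}

    def record(self, key, value):
        # Store the latest value for this key until the next dump
        self.values[key] = value

    def dump(self, step):
        # Print all recorded values as a table, then reset for the next interval
        self.values["time/total_timesteps"] = step
        lines = [f"| {key:<28} | {self.values[key]!s:<10} |" for key in sorted(self.values)]
        border = "-" * len(lines[0])
        text = "\n".join([border, *lines, border])
        print(text)
        self.values.clear()
        return text


def save_model(path, q_table, hyperparams):
    """Write the Q-table and hyperparameters to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"q_table": q_table, "hyperparams": hyperparams}, f)


def load_model(path):
    """Read back what save_model wrote."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["q_table"], data["hyperparams"]


logger = KeyValueLogger()
logger.record("rollout/ep_rew_mean", 0.5)
logger.record("train/learning_rate", 0.1)
table = logger.dump(step=1000)

q = [[0.0, 0.25], [0.5, 1.0]]
path = os.path.join(tempfile.mkdtemp(), "q_table.json")
save_model(path, q, {"gamma": 0.9})
loaded_q, params = load_model(path)
```

JSON keeps saved tabular models human-readable. Neural-network policies such as PPO or SAC would more likely be stored with their framework's own serialization instead.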

I'll probably rely mostly on CleanRL.

Edited Nov 30, 2023 by Ghost User