Add a very basic custom algorithm that is not part of SB3
So far, we only use SB3 algorithms. It is important to also have an algorithm whose interface is absolutely basic. It should still use the SB3 logger, so that we can keep using wandb and the hydra hyperparameter system. The algorithm should build on, and be compatible with, the following:
- random action selection
- alternating between training and testing
- finite-step episodes
- SB3-based logger
- hyperparameter optimization
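
To make the requirements above concrete, here is a minimal sketch of what such an algorithm could look like. It is not a proposed final implementation. All names (`RandomAgent`, `CoinFlipEnv`, `DictLogger`, the `train/ep_return` and `test/mean_ep_return` keys) are hypothetical. `DictLogger` is only a stand-in so the sketch runs without SB3 installed; it mimics the `record(key, value)` and `dump(step)` calls of the SB3 `Logger`, so in the real setup one would presumably pass in the object returned by `stable_baselines3.common.logger.configure(...)` instead. The toy environment assumes a Gymnasium-style `reset()`/`step()` interface.

```python
import random


class DictLogger:
    """Stand-in exposing the same record()/dump() calls as the SB3 Logger.
    Assumption: in the real setup, an SB3 Logger is passed in instead."""

    def __init__(self):
        self.current = {}
        self.history = []

    def record(self, key, value):
        self.current[key] = value

    def dump(self, step=0):
        # Store a snapshot of everything recorded since the last dump.
        self.history.append((step, dict(self.current)))
        self.current.clear()


class CoinFlipEnv:
    """Hypothetical toy env with a Gymnasium-style reset()/step() interface."""

    n_actions = 2

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        reward = float(action == 1)
        truncated = self.t >= self.max_steps  # finite-step episode
        return self.t, reward, False, truncated, {}


class RandomAgent:
    """Bare-bones 'algorithm': picks uniformly random actions, learns nothing."""

    def __init__(self, env, logger, seed=0):
        self.env = env
        self.logger = logger
        self.rng = random.Random(seed)
        self.num_timesteps = 0

    def _run_episode(self, count_steps):
        self.env.reset()
        done, ep_return = False, 0.0
        while not done:
            action = self.rng.randrange(self.env.n_actions)
            _, reward, terminated, truncated, _ = self.env.step(action)
            ep_return += reward
            done = terminated or truncated
            if count_steps:  # only training steps count towards the budget
                self.num_timesteps += 1
        return ep_return

    def learn(self, total_timesteps, n_test_episodes=2):
        mean_test = 0.0
        while self.num_timesteps < total_timesteps:
            # Alternate: one training episode, then a few test episodes.
            train_return = self._run_episode(count_steps=True)
            test_returns = [self._run_episode(count_steps=False)
                            for _ in range(n_test_episodes)]
            mean_test = sum(test_returns) / len(test_returns)
            self.logger.record("train/ep_return", train_return)
            self.logger.record("test/mean_ep_return", mean_test)
            self.logger.dump(step=self.num_timesteps)
        # A scalar return value that a hyperparameter optimizer could use.
        return mean_test


logger = DictLogger()
agent = RandomAgent(CoinFlipEnv(max_steps=5), logger, seed=0)
score = agent.learn(total_timesteps=20)
```

Returning a single scalar from `learn` is a design choice aimed at the hyperparameter optimization requirement: a hydra task function can return that value so a sweeper plugin can use it as the objective. Whether the final test return is the right objective is left open here.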