Refactor CustomEvalCallback

Currently, many of the framework's features are provided by the CustomEvalCallback. It handles:

  • the train / test loop
  • the evaluation
  • early stopping
  • rendering & recording

Should new algorithms use the callback?

How do new algorithms get information like n_eval_episodes, which is currently only provided to the callback?
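One possible answer, as a basis for discussion: move the evaluation settings out of the callback into a small shared object that both the callback and any new algorithm receive. The sketch below is only an illustration. EvalConfig, EvalCallbackSketch, NewAlgorithmSketch and the render field are hypothetical names and not part of the current code; only n_eval_episodes comes from the existing callback.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalConfig:
    """Hypothetical container for evaluation settings (not in the current code)."""
    n_eval_episodes: int = 5
    render: bool = False  # assumed setting, shown only for illustration


class EvalCallbackSketch:
    """Stand-in for the callback: it reads its settings from the shared config."""

    def __init__(self, config: EvalConfig):
        self.config = config


class NewAlgorithmSketch:
    """Stand-in for a new algorithm that does not go through the callback."""

    def __init__(self, config: EvalConfig):
        self.config = config

    def evaluate(self, run_episode: Callable[[], float]) -> float:
        # run_episode is any callable that returns the return of one episode.
        returns = [run_episode() for _ in range(self.config.n_eval_episodes)]
        return sum(returns) / len(returns)
```

With this layout, n_eval_episodes is defined once and handed to whichever component runs the evaluation, so an algorithm that skips the callback still sees the same value.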

We need to settle these questions, then refactor the callback so that new algorithms are easier to integrate and the code has a clear structure.
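As a starting point for that discussion, one option is to split the callback into small components with one responsibility each, which an algorithm can combine as needed. The following is a minimal sketch under that assumption. Hook, EarlyStopping, RewardLog and run_hooks are hypothetical names, and the patience-based stopping rule is an illustrative choice, not the rule the current callback uses.

```python
from typing import List


class Hook:
    """Hypothetical base class: one responsibility per hook."""

    def on_eval(self, mean_reward: float) -> bool:
        """Return False to request that training stops."""
        return True


class EarlyStopping(Hook):
    """Requests a stop after `patience` evaluations without improvement."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best = float("-inf")
        self.stale = 0

    def on_eval(self, mean_reward: float) -> bool:
        if mean_reward > self.best:
            self.best, self.stale = mean_reward, 0
        else:
            self.stale += 1
        return self.stale < self.patience


class RewardLog(Hook):
    """Records every evaluation result."""

    def __init__(self) -> None:
        self.history: List[float] = []

    def on_eval(self, mean_reward: float) -> bool:
        self.history.append(mean_reward)
        return True


def run_hooks(hooks: List[Hook], mean_reward: float) -> bool:
    # Call every hook (no short-circuit) so logging still runs on the stopping step.
    results = [hook.on_eval(mean_reward) for hook in hooks]
    return all(results)
```

A new algorithm could then pick only the hooks it needs (for example early stopping without rendering) instead of inheriting everything the callback does today.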