Refactor CustomEvalCallback

Currently, many of the framework's features are provided by the CustomEvalCallback. It handles:

Should new algorithms use the callback?

How do new algorithms get information like n_eval_episodes, which is currently only provided to the callback?

We need to discuss these questions and refactor the callback so that new algorithms are easier to integrate and the callback's responsibilities are clearly defined.
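As a starting point for the discussion, here is a minimal sketch of one possible direction: move the evaluation settings out of the callback into a small shared config object that both the callback and any new algorithm receive. Everything here except the names `CustomEvalCallback` and `n_eval_episodes` is an assumption made for illustration: `EvalConfig`, `eval_freq`, `run_evaluation`, `should_evaluate`, and `NewAlgorithm` are hypothetical, and the `CustomEvalCallback` class below is only a stand-in, not the existing implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EvalConfig:
    """Hypothetical container for evaluation settings shared by all components."""

    n_eval_episodes: int = 5
    eval_freq: int = 1000  # assumed setting: evaluate every `eval_freq` steps

    def __post_init__(self) -> None:
        if self.n_eval_episodes < 1:
            raise ValueError("n_eval_episodes must be at least 1")
        if self.eval_freq < 1:
            raise ValueError("eval_freq must be at least 1")


def run_evaluation(run_episode: Callable[[], float], config: EvalConfig) -> float:
    """Run `config.n_eval_episodes` episodes and return the mean episode return."""
    returns: List[float] = [run_episode() for _ in range(config.n_eval_episodes)]
    return sum(returns) / len(returns)


class CustomEvalCallback:
    """Stand-in for the existing callback: it reads settings from the shared config."""

    def __init__(self, config: EvalConfig) -> None:
        self.config = config

    def should_evaluate(self, step: int) -> bool:
        # Evaluate on every positive multiple of eval_freq.
        return step > 0 and step % self.config.eval_freq == 0


class NewAlgorithm:
    """Hypothetical algorithm that does not use the callback at all."""

    def __init__(self, config: EvalConfig) -> None:
        self.config = config

    def evaluate(self, run_episode: Callable[[], float]) -> float:
        # Same settings as the callback, obtained without depending on it.
        return run_evaluation(run_episode, self.config)
```

With this layout, an algorithm that bypasses the callback still sees the same `n_eval_episodes` value, because both components are constructed from one `EvalConfig` instance. Whether the callback should remain the entry point for evaluation is left open; the sketch only shows that the settings do not have to live inside it.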