Re-think the testing scripts
Currently, the testing scripts perform a grid search over a set of predefined parameter spaces, and they allow us to define a target success rate to be reached after a given number of steps, etc. The testing scripts rely heavily on the logging mechanisms that we have used so far. However, these logging mechanisms should be simplified further, which in turn calls for improved testing scripts.
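To make the current behavior concrete, here is a minimal, hedged sketch of what such a grid-search test loop looks like. All names (`PARAM_SPACE`, `run_training`, `SUCCESS_THRESHOLD`, `MAX_STEPS`) are illustrative assumptions, not the actual codebase API, and `run_training` is a stand-in for a real training run:

```python
from itertools import product

# Hypothetical parameter space; the real scripts define several of these.
PARAM_SPACE = {
    "lr": [1e-3, 1e-4],
    "batch_size": [64, 256],
}
SUCCESS_THRESHOLD = 0.8  # required success rate after MAX_STEPS (assumed)
MAX_STEPS = 10_000

def run_training(lr, batch_size, max_steps):
    """Stand-in for a real training run; returns a final success rate.

    In the real scripts this value would be read from the logging
    mechanisms mentioned above."""
    return 0.9  # dummy value for illustration only

def grid_test():
    """Run every parameter combination and collect those that fail."""
    failures = []
    for lr, bs in product(PARAM_SPACE["lr"], PARAM_SPACE["batch_size"]):
        rate = run_training(lr, bs, MAX_STEPS)
        if rate < SUCCESS_THRESHOLD:
            failures.append((lr, bs, rate))
    return failures
```

The drawback this note is addressing: the loop, the thresholds, and the logging hooks are all entangled in the test scripts themselves rather than being specified declaratively.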
The new testing mechanism should:

- Rely on hydra to specify the desired parameters and performance expectations.
- Have a `function` and a `performance` mode. In `function` mode, a test only checks whether a script runs through without errors for 2 or so epochs; in `performance` mode, it additionally checks whether the script's performance meets the expectations. The function check should execute quickly, within an hour or so for all environment/algorithm combinations. The performance check will naturally take one or more days, depending on how many environment/algorithm combinations we want to support and check.
- Probably: rely on the custom optuna sweeper in combination with grid search as the optimization mechanism. This allows using the `search_space` config option to specify which env/alg combinations we want to investigate.
Edited by Manfred Eppe