Re-think the testing scripts
Currently, the testing scripts perform a grid search over a set of predefined parameter spaces, and they allow us to define a target success rate to be reached after a given number of steps, etc. The testing scripts rely heavily on the logging mechanisms that we have used so far. However, these logging mechanisms should be simplified further, which in turn calls for improved testing scripts.
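To make the current behavior concrete, here is a minimal, hedged sketch of what such a grid-search test loop looks like. All names (`PARAM_SPACE`, `run_training`, `SUCCESS_THRESHOLD`, `MAX_STEPS`) are illustrative assumptions, not the actual codebase API, and `run_training` is a stand-in for a real training run:

```python
from itertools import product

# Hypothetical parameter space; the real scripts define several of these.
PARAM_SPACE = {
    "lr": [1e-3, 1e-4],
    "batch_size": [64, 256],
}
SUCCESS_THRESHOLD = 0.8  # required success rate after MAX_STEPS (assumed)
MAX_STEPS = 10_000

def run_training(lr, batch_size, max_steps):
    """Stand-in for a real training run; returns a final success rate.

    In the real scripts this value would be read from the logging
    mechanisms mentioned above."""
    return 0.9  # dummy value for illustration only

def grid_test():
    """Run every parameter combination and collect those that fail."""
    failures = []
    for lr, bs in product(PARAM_SPACE["lr"], PARAM_SPACE["batch_size"]):
        rate = run_training(lr, bs, MAX_STEPS)
        if rate < SUCCESS_THRESHOLD:
            failures.append((lr, bs, rate))
    return failures
```

The drawback this note is addressing: the loop, the thresholds, and the logging hooks are all entangled in the test scripts themselves rather than being specified declaratively.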
The new testing mechanism should:

- Rely on hydra to specify the desired parameters and performance expectations.
- Have a `function` and a `performance` mode. In `function` mode, a test only checks whether a script runs through without errors for 2 or so epochs; in `performance` mode, it additionally checks whether the script's performance meets the expectations. The function check should execute quickly, within an hour or so for all environment/algorithm combinations. The performance check will naturally take one or more days, depending on how many environment/algorithm combinations we want to support and check.
- Probably: rely on the custom optuna sweeper in combination with grid search as the optimization mechanism. This allows using the `search_space` config option to specify which env/alg combinations we want to investigate.
Edited by Manfred Eppe