Hyperopt-Score and epoch-limiting

I noticed an unfavorable interplay between the hyperopt score and the epoch-limiting during the hyperopt.

  • Refresher: The hyperopt score is the sum of the overall mean of the evaluation metric (usually test/success_rate) and its mean over the last early_stop_last_n epochs, divided by the number of epochs. In pseudo-Python code:
# early_stop_data_column holds the per-epoch values of the evaluation metric
score = mean(early_stop_data_column) + mean(early_stop_data_column[-early_stop_last_n:])
score /= epochs  # epochs = number of epochs the run was trained for

Therefore, the hyperopt score is highly dependent on the number of epochs.

  • The custom sweeper limits n_epochs to int(n_epochs_of_fastest_run * 1.5) (see the sketch below).
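As a rough sketch of how I read that cap (the variable names are mine, not the sweeper's, and I read "limits" as taking the minimum):

n_epochs_of_fastest_run = 5   # fastest run so far reached early stopping after 5 epochs
requested_n_epochs = 10       # what the next configuration would normally train for
n_epochs = min(requested_n_epochs, int(n_epochs_of_fastest_run * 1.5))
print(n_epochs)               # 7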

The following example shows that this can be problematic:

Say we have an easy-to-learn environment and the following history of test/success_rate for one run:

suc_hist = [0.5, 0.5, 0.8, 0.7, 0.9, 0.8, 0.6, 1.0, 0.8, 0.9]

It ran for 10 epochs. Its hyperopt score is 0.165.

The hyperopt continues, and one lucky configuration manages to reach early stopping in only 5 epochs. Now n_epochs is capped at int(n_epochs_of_fastest_run * 1.5) = int(5 * 1.5) = 7.

The same run from before would now be stopped after 7 epochs and achieve a hyperopt score of about 0.207.
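For reference, a small self-contained snippet that reproduces both numbers (the hyperopt_score helper is mine; early_stop_last_n = 3 is an assumption, but it is the only value consistent with both scores above):

from statistics import mean

def hyperopt_score(history, early_stop_last_n=3):
    # Overall mean plus mean of the last early_stop_last_n epochs,
    # divided by the number of epochs (the scoring rule quoted above).
    score = mean(history) + mean(history[-early_stop_last_n:])
    return score / len(history)

suc_hist = [0.5, 0.5, 0.8, 0.7, 0.9, 0.8, 0.6, 1.0, 0.8, 0.9]

print(round(hyperopt_score(suc_hist), 3))      # 0.165 -> full 10 epochs
print(round(hyperopt_score(suc_hist[:7]), 3))  # 0.207 -> same run capped at 7 epochs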

This gives a huge advantage to all configurations that run after a run with few epochs. Maybe we should disable the epoch-limiting or at least add an option to disable it. @ckv0173 @czs4581 what do you think?