Hyperopt-Score and epoch-limiting
I noticed an unfavorable interplay between the hyperopt score and the epoch-limiting during the hyperopt.
- Refresher: The hyperopt score is the sum of the mean of the evaluation metric (usually `test/success_rate`) over all epochs and the mean of the evaluation metric over the last `early_stop_last_n` epochs, divided by the number of epochs. In pseudo-Python:

  ```python
  score = mean(early_stop_data_column) + mean(early_stop_data_column[-early_stop_last_n:])
  score /= epochs
  ```
Therefore, the hyperopt score is highly dependent on the number of epochs.
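To make the epoch dependence concrete, here is a runnable sketch of the score. The function name, its signature, and the `mean` helper are mine for illustration, not the sweeper's actual API:

```python
# Illustrative reimplementation of the hyperopt score described above.
# hyperopt_score is a hypothetical name, not the sweeper's real function.
def hyperopt_score(column, early_stop_last_n):
    mean = lambda xs: sum(xs) / len(xs)
    epochs = len(column)
    score = mean(column) + mean(column[-early_stop_last_n:])
    return score / epochs

# A constant metric scores half as much when the run takes twice as long:
hyperopt_score([0.8] * 5, early_stop_last_n=3)   # 5 epochs
hyperopt_score([0.8] * 10, early_stop_last_n=3)  # 10 epochs -> half the score
```

Because both means are bounded by the metric itself, the final division makes the score scale roughly like 1/epochs for runs with similar metric values.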
- The custom sweeper limits `n_epochs` to `int(n_epochs_of_fastest_run * 1.5)`.
The following example shows that this can be problematic:
Say we have an easy-to-learn environment and the following history of `test/success_rate` for one run:

```python
suc_hist = [0.5, 0.5, 0.8, 0.7, 0.9, 0.8, 0.6, 1.0, 0.8, 0.9]
```

It had 10 epochs. Its hyperopt score is 0.165 (with `early_stop_last_n = 3`).
The hyperopt continues, and one lucky configuration manages to reach early stopping after only 5 epochs. Now `n_epochs` is `int(n_epochs_of_fastest_run * 1.5) = int(5 * 1.5) = 7`.
The same run from before would now stop after 7 epochs and achieve a hyperopt_score of about 0.207.
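Both numbers can be reproduced with a short sketch. The `score` helper is illustrative (not the sweeper's code), and `early_stop_last_n = 3` is an assumption that matches the quoted figures:

```python
# Reproduce the two scores from the example above.
# score() is a hypothetical helper; early_stop_last_n = 3 is assumed.
suc_hist = [0.5, 0.5, 0.8, 0.7, 0.9, 0.8, 0.6, 1.0, 0.8, 0.9]

def score(hist, last_n=3):
    return (sum(hist) / len(hist) + sum(hist[-last_n:]) / last_n) / len(hist)

full = score(suc_hist)        # all 10 epochs
capped = score(suc_hist[:7])  # truncated to n_epochs = int(5 * 1.5) = 7
print(round(full, 3), round(capped, 3))  # roughly 0.165 vs 0.207
```

The truncated run scores higher even though it contains strictly less training: the division by `epochs` rewards stopping early more than it rewards the metric itself.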
This gives a huge advantage to all configurations that run after a run with few epochs. Maybe we should disable the epoch limiting, or at least add an option to disable it. @ckv0173 @czs4581 what do you think?