# Scilab-RL issues
https://collaborating.tuhh.de/ckv0173/Scilab-RL/-/issues

## #171: play around with multi-agent RL
https://collaborating.tuhh.de/ckv0173/Scilab-RL/-/issues/171 · 2024-02-02 · Manfred Eppe

## #170: Integrate Carlotta's environment
https://collaborating.tuhh.de/ckv0173/Scilab-RL/-/issues/170 · 2024-01-11 · Manfred Eppe · assignee: Carlotta Langer

Integrate Carlotta's free-energy testing environment into our framework.

## #167: Add performance test for one-step actor critic
https://collaborating.tuhh.de/ckv0173/Scilab-RL/-/issues/167 · 2023-12-30 · Manfred Eppe

Thilo added the one-step actor-critic algorithm from Sutton and Barto's book. However, it has not yet been double-checked with a performance test. This should be added.

M.

## #159: Add CliffWalking environment performance test
https://collaborating.tuhh.de/ckv0173/Scilab-RL/-/issues/159 · 2023-12-11 · Thilo Fryen

The `CliffWalking` env is a simple RL environment which can be used to showcase the differences between basic RL algorithms. For example, [this notebook](https://marcinbogdanski.github.io/rl-sketchpad/RL_An_Introduction_2018/1305a_One_Step_Actor_Critic.html) implements the `CliffWalking` environment and shows that SARSA and one-step actor-critic solve it differently.
For teaching purposes, it would be nice to also have this in our framework. Luckily, it is already [available](https://www.gymlibrary.dev/environments/toy_text/cliff_walking/) in Gymnasium.
See if our custom algorithm `onestepac` can solve it and create a performance test for it. Maybe also upload a video of the agent solving the environment to this issue.

Assignee: Ismail Barkin Ulusoy

## #94: Hyperopt-Score and epoch-limiting
https://collaborating.tuhh.de/ckv0173/Scilab-RL/-/issues/94 · 2022-05-16 · Ghost User

I noticed an unfavorable interplay between the hyperopt score and the epoch limiting during the hyperopt.
- Refresher: the hyperopt score is the sum of the mean of the evaluation metric (usually `test/success_rate`) over all epochs and the mean of the evaluation metric over the last `early_stop_last_n` epochs, divided by the number of epochs. In pseudo-Python code:
```python
score = mean(early_stop_data_column) + mean(early_stop_data_column[-early_stop_last_n:])
score /= epochs
```
Therefore, the hyperopt score is highly dependent on the number of epochs.
- The custom sweeper limits `n_epochs` to `int(n_epochs_of_fastest_run * 1.5)`.
The following example shows why this can be problematic.
Say we have an easy-to-learn environment and the following history of `test/success_rate` for one run:
```python
suc_hist = [0.5, 0.5, 0.8, 0.7, 0.9, 0.8, 0.6, 1.0, 0.8, 0.9]
```
It ran for 10 epochs. Its hyperopt score is 0.165 (with `early_stop_last_n = 3`).
The hyperopt continues and one lucky configuration manages to reach early stopping in only 5 epochs. Now `n_epochs` is `int(n_epochs_of_fastest_run * 1.5) = 7`.
The same run from before would now be stopped after 7 epochs and achieve a hyperopt score of about 0.207.
This gives a huge advantage to all configurations that run after a run with few epochs. Maybe we should disable the epoch limiting, or at least add an option to disable it. @ckv0173 @czs4581, what do you think?
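To make the effect concrete, here is a runnable sketch of the score computation described above. The function name is made up for illustration, and `early_stop_last_n = 3` is an assumption; it is the value that reproduces the 0.165 and ~0.207 figures quoted in this issue.

```python
from statistics import mean

def hyperopt_score(success_history, early_stop_last_n=3):
    """Mean of the metric over all epochs plus its mean over the
    last `early_stop_last_n` epochs, divided by the number of epochs."""
    epochs = len(success_history)
    score = mean(success_history) + mean(success_history[-early_stop_last_n:])
    return score / epochs

suc_hist = [0.5, 0.5, 0.8, 0.7, 0.9, 0.8, 0.6, 1.0, 0.8, 0.9]
print(round(hyperopt_score(suc_hist), 3))      # 0.165 for the full 10-epoch run
print(round(hyperopt_score(suc_hist[:7]), 3))  # 0.207 when capped at 7 epochs
```

Note that the 7-epoch run scores higher even though its success history is strictly a prefix of the 10-epoch run's, which is exactly the bias described above.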