Coding principle discussion: Redundant but simple vs. elegant but more complex

Two of the most popular RL libraries are stable-baselines3 (SB3) and CleanRL. They follow very different coding principles. While SB3 is very modular, CleanRL tries to fit almost everything about one algorithm into one single script and not use paradigms like inheritance.

CleanRLs' approach has the advantage that everything important about one algorithm is located in a single file and it is easier to keep an overview. However, there is also a lot of redundancy.

SB3-code is harder to understand but also covers more cases while being much less redundant.

We wanted to have our own versions of popular algorithms so that they are easier to modify. Therefore, I started to copy CleanRL code and adjust it to our framework. For the case of SAC, I now want your opinion on the following design decision:

Problem

CleanRLs' SAC works with flat observations and does not use HER. We also want to use HER, which uses environments with dict-observations. Suppose we have a working version of SAC with flat observations and a simple ReplayBuffer that can solve environments like Hopper-v4. We could now

Solution 1 (CleanRL-style)

Copy the whole file and make adjustments so that it uses HER and works with dict-observations. This keeps the code simpler but carries the risk that, if we find a mistake or make an improvement in one file, we forget it in the other.

Solution 2 (SB3-style)

Adjust the file with an extra flag to use HER or not and also check the environment observation-space to handle the case of dict-observations. This bloats the code but prevents redundancy.

Both approaches have their (dis-)advantages. What do you think? @czs4581 @cjd7625 @ckv0173