@NutsAboutAmiga
We use a mixture of runtime and compile-time abstraction for this. For example, if you run the generic 020+ build, it chooses an 030 or 040 optimised C2P path depending on the CPU detected. This adds no real overhead, since it's just a call through a function pointer: we set the pointers up as soon as we know the screenmode and use them from then on. This is a runtime abstraction.
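To illustrate the function pointer idea, here is a minimal sketch in C. Every name in it (CpuType, C2PFunc, c2p_030, c2p_040, select_c2p) is made up for the example and is not taken from the actual source, and the C2P routines are stubs that only record which path was chosen.

```c
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical, for illustration only. */
typedef enum { CPU_68020, CPU_68030, CPU_68040, CPU_68060 } CpuType;

typedef void (*C2PFunc)(const uint8_t *chunky, uint8_t *planar, size_t pixels);

/* Records which stub ran; stands in for the real conversion work. */
static int c2p_last_path = 0;

static void c2p_030(const uint8_t *chunky, uint8_t *planar, size_t pixels)
{
    (void)chunky; (void)planar; (void)pixels;
    c2p_last_path = 30;
}

static void c2p_040(const uint8_t *chunky, uint8_t *planar, size_t pixels)
{
    (void)chunky; (void)planar; (void)pixels;
    c2p_last_path = 40;
}

/* Chosen once at startup, then called every frame without re-checking. */
static C2PFunc c2p_convert = c2p_030;

static void select_c2p(CpuType cpu)
{
    c2p_convert = (cpu >= CPU_68040) ? c2p_040 : c2p_030;
}
```

After select_c2p() has run once, the per-frame code only ever calls c2p_convert(), so the cost of the abstraction is one indirect call per frame.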
If you run the 040 or 060 build, the same code path is just hardwired to use the 040+ C2P and the 030 C2P code isn't even present in the executable.
If you are running the 060 build, several inner loops are replaced entirely with their 060 optimised counterparts. These are less amenable to calling by indirection because that indirection would happen on countless individual drawing calls, so we don't abstract it at runtime; you just use the 060 build. That's a compile-time abstraction.
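As a rough sketch of the compile-time side, the idea is that a build flag picks the inner loop and the other variant never reaches the executable. BUILD_68060 and draw_span are hypothetical names for this example, and the "060" variant is only a stand-in unrolled copy, not real tuned code.

```c
#include <stddef.h>
#include <stdint.h>

/* BUILD_68060 is a hypothetical macro, e.g. passed as -DBUILD_68060. */
#if defined(BUILD_68060)
/* Stand-in for a hand-tuned 060 loop: unrolled four pixels at a time. */
static inline void draw_span(uint8_t *dst, const uint8_t *src, size_t n)
{
    while (n >= 4) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst += 4;
        src += 4;
        n -= 4;
    }
    while (n--)
        *dst++ = *src++;
}
#else
/* Generic loop used by every other build. */
static inline void draw_span(uint8_t *dst, const uint8_t *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}
#endif
```

Because draw_span is resolved at compile time, the compiler can inline it into each caller, which is exactly what a function pointer would prevent.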
Usually the rendering paths are the same in all the builds (except for the inner loop optimisations above), but one example where they differ somewhat is the text message rendering. In fullscreen mode (RTG or AGA), the text is chunky-plotted to the Fast RAM chunky buffer every frame. In 2/3 RTG mode, the text is chunky-plotted directly to the VRAM region (while locked) of the display/back buffers below the game view, when and only when the text updates.
In 2/3 AGA mode, a completely different, planar-based plotter writes directly to a single plane in Fast RAM, which is then longword-copied to the display/back buffers. This is why the text in AGA 2/3 mode looks grey on black: the single bitplane we write to corresponds to a grey colour in the palette. Another option to revisit later is to render it via sprites, as this will work better with double-height pixel modes and can support different colours again.
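For anyone unfamiliar with planar plotting, a simplified sketch of the single-plane approach might look like this. It assumes an 8x8 1-bit font, a 320 pixel wide plane (40 bytes per row) and byte-aligned glyph positions; the names plot_glyph and copy_plane_long are invented for the example and the real plotter will differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 320 pixels wide = 40 bytes per row, 8 rows of text. */
enum { PLANE_WIDTH_BYTES = 40, TEXT_ROWS = 8 };

/* Plot one 8x8 glyph at byte column `col` (pixel x = col * 8).
   Each glyph row is one byte, so it drops straight into the plane. */
static void plot_glyph(uint8_t *plane, const uint8_t glyph[8], int col)
{
    for (int row = 0; row < TEXT_ROWS; ++row)
        plane[row * PLANE_WIDTH_BYTES + col] = glyph[row];
}

/* Copy the finished plane to a display bitplane one longword at a time.
   `bytes` is assumed to be a multiple of 4. */
static void copy_plane_long(uint32_t *dst, const uint32_t *src, size_t bytes)
{
    for (size_t i = 0; i < bytes / 4; ++i)
        dst[i] = src[i];
}
```

Since only one bitplane is touched, every lit pixel ends up with the same palette index, which is where the single text colour comes from.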
We already made most of the obvious low-level optimisations in the code quite some time ago. For example, at one point there were up to 8000 32-bit long division instructions per frame in my mod, due to its greater use of polygon models, which need division for perspective correction. They were mostly eliminated by converting each division into a multiplication by a reciprocal (which is a table lookup). On the 060, the difference is significant because multiplication is so fast. However, division is slow enough on earlier CPUs that even the 040 and 030 benefit, just by smaller margins.
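The reciprocal trick can be sketched as follows. This is a generic 16.16 fixed-point version with an assumed table size and unsigned operands, not the actual code from the mod; RECIP_MAX, recip_table, init_recip_table and div_by_recip are names made up for the example.

```c
#include <stdint.h>

/* Assumed range of divisors (e.g. depth values); illustration only. */
enum { RECIP_MAX = 1024 };

/* recip_table[z] holds 65536 / z, i.e. 1/z in 16.16 fixed point. */
static uint32_t recip_table[RECIP_MAX];

static void init_recip_table(void)
{
    recip_table[0] = 0; /* no meaningful reciprocal for zero */
    for (uint32_t z = 1; z < RECIP_MAX; ++z)
        recip_table[z] = 65536u / z;
}

/* Approximates x / z with one lookup, one multiply and one shift.
   x is limited to 16 bits so the product always fits in 32 bits. */
static uint32_t div_by_recip(uint16_t x, uint32_t z)
{
    return ((uint32_t)x * recip_table[z]) >> 16;
}
```

The table is built once with real divisions, and after that the hot path never divides. Because the stored reciprocal is truncated, the result can come out one lower than a true integer division (for example 99 / 3 gives 32 here), which is usually acceptable for perspective correction but worth knowing about.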
The optimisations now are higher level and are largely being done in C. The primary issue is overdraw. There are examples in the original game where an entire complex scene is rendered and then completely drawn over by a wall. I've managed to eliminate a lot of that, but one of the worst offenders still remains: door/lift zones. That's the area I'm working on next.
Last edited by Karlos on 22-Dec-2024 at 12:25 PM.
_________________ Doing stupid things for fun...