Re: Integrating Warp3D into my 3D engine

[#41] Heimdall (Member, joined 20-Jan-2025, 47 posts, from North Dakota)
Posted on 1-Feb-2025 10:38:19

@thellier
thellier wrote:
> Yes, Wazp3D uses RGBA32 textures internally and also has functions to write directly to an RGBA32 screen, but it can also convert the pixels to many 15/16/24/32-bit screen formats, so you can use any RTG screen format, up to 2048x2048, but not 8-bit modes.
> Certainly WaZp3D is more compatible(*) than any existing WaRp3D driver.
32-bit is all I need, so I'm happy :)
thellier wrote:
> It also supports all the texture formats, all Z-buffer operators (equal, greater or equal, etc.) and all blend functions.
A working Z-buffer would make my life easier for many future games, but the current racing engine simply draws back to front, so the absence of a Z-buffer would not be an issue anyway. Still, it's important to know that I *can* rely on a Z-buffer for future designs! I certainly skipped a few features even now because of the missing Z-buffer...
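Drawing back to front without a Z-buffer is the painter's algorithm. Here is a minimal sketch in C. It assumes a hypothetical `Tri` record that holds per-vertex view-space depths (larger means farther) and a flat colour. The engine's real data layout is not shown in the thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical triangle record: view-space depth per vertex (larger = farther)
   and a flat colour such as 0xRRGGBB. */
typedef struct {
    int z[3];
    unsigned colour;
} Tri;

/* Sort key: sum of the three depths, i.e. 3x the average depth. */
static long tri_depth(const Tri *t)
{
    return (long)t->z[0] + t->z[1] + t->z[2];
}

/* qsort comparator for descending depth: farthest triangle first. */
static int cmp_far_to_near(const void *a, const void *b)
{
    long da = tri_depth((const Tri *)a);
    long db = tri_depth((const Tri *)b);
    return (da < db) - (da > db);
}

/* Painter's algorithm: after sorting, drawing in array order lets nearer
   triangles overwrite farther ones, so no Z-buffer is needed. */
void sort_back_to_front(Tri *tris, size_t count)
{
    assert(tris != NULL || count == 0);
    if (count < 2)
        return;
    qsort(tris, count, sizeof *tris, cmp_far_to_near);
}
```

The average depth is only the simplest sort key. It can put long or intersecting triangles in the wrong order, which is one reason a Z-buffer makes later designs easier. A racing engine that already walks the track from far to near may not need a sort at all.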
thellier wrote:
> The main things that WaZp3D doesn't support are stencil, 8-bit mode and dithering.
At first I was sad about the 8-bit mode, but ten seconds later I realized that I only need 8-bit to support the 040/060, which will soon have a separate C2P codepath anyway :)
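Chunky-to-planar (C2P) conversion turns one byte per pixel into the Amiga's separate bitplanes. The sketch below is deliberately naive C, for reference only. It assumes 8 bitplanes and a row width that is a multiple of 8. Real 040/060 C2P routines use merge/transpose tricks with masks and shifts and run far faster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive chunky-to-planar for one row: 'width' 8-bit chunky pixels (a multiple
   of 8) become 8 bitplanes of width/8 bytes each. In every plane byte, bit 7
   holds the leftmost pixel, matching Amiga bitplane order. */
void c2p_row(const uint8_t *chunky, size_t width, uint8_t *planes[8])
{
    assert(width % 8 == 0);
    for (size_t x = 0; x < width; x += 8) {
        for (int p = 0; p < 8; p++) {
            uint8_t out = 0;
            for (int i = 0; i < 8; i++) {
                /* take bit p of pixel i and place it at bit (7 - i) */
                out |= (uint8_t)(((chunky[x + i] >> p) & 1u) << (7 - i));
            }
            planes[p][x / 8] = out;
        }
    }
}
```

Each output byte in plane p collects bit p of eight consecutive pixels. This bit-by-bit gathering is exactly the work that optimized C2P code restructures.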
thellier wrote:
> Also, the OS4 version doesn't really support multitexturing.
Not a deal breaker for a flatshader :)
thellier wrote:
> * So beware that a program coded on Wazp3D and working on Wazp3D may not work on some real-hardware WaRp3D drivers that don't implement some features.
The real HW group is certainly a minority compared to PiStorm now, let alone in the future. How many people are we talking about here anyway? Like, 5-7?

[#42] Heimdall
Posted on 1-Feb-2025 10:41:58

What kind of provisions are there to run 68k code on OS4? Are there emulators, or perhaps translators that convert 68k instructions to PowerPC ones?
I'm just wondering if there's an easy way to test my 68k code on PowerPC somehow. I'm guessing there isn't?

[#43] Karlos (Elite Member, joined 24-Aug-2003, 4843 posts, from "As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!")
Posted on 1-Feb-2025 11:08:29

@Heimdall
OS4 runs user-mode 68K code under emulation, usually under JIT but occasionally problematic applications may have to run interpretively.
Unless you are banging metal, it's pretty seamless. Same goes for MorphOS.
Back when I was rocking my old BlizzardPPC, my 240 MHz 603e ran 68K games like Doom attack at playable rates at 640x400 resolution, whereas they were a slideshow on the companion 68040 at 25 MHz.
_________________
Doing stupid things for fun...

[#44] thellier (Regular Member, joined 2-Nov-2009, 270 posts, from Paris)
Posted on 1-Feb-2025 11:55:38

Wazp3D (including a WinUAE version) is on Aminet.
Wazp3D for Vampire uses Maggie for hardware rendering; it is a work in progress that I periodically post on the ApolloTeam Discord.
I can also send it to you by email.

[#45] Heimdall
Posted on 2-Feb-2025 0:04:38

@Karlos
Karlos wrote:
> @Heimdall
> OS4 runs user-mode 68K code under emulation, usually under JIT but occasionally problematic applications may have to run interpretively.
> Unless you are banging metal, it's pretty seamless. Same goes for MorphOS.
What exactly do you mean by banging metal here? Cache-specific CPU instructions? Or the OCS/AGA system registers?
If PPC runs native OCS games, it must be capable of handling all the OCS registers, no?
Karlos wrote:
> Back when I was rocking my old BlizzardPPC, my 603e 240MHz ran 68K games like Doom attack at playable rates at 640*400 resolution, whereas this was a slideshow with the counterpart 68040 25MHz
A "playable rate" at a mere 4x the pixel count of 320x200 sounds pretty awful for a 240 MHz RISC! What kind of architecture was it? Didn't its pipeline have several execution units? Wouldn't they execute most ops in 1 cycle? Maybe its branch prediction was horrendous, or pipeline stalls were common?
I've worked a lot with the DSP and GPU RISCs on the Jaguar, so I'm familiar with that kind of RISC instruction throughput (at 26.6 MHz).
This one is, like, an order of magnitude faster!
Perhaps Fast RAM access is as slow as Chip RAM access from the 040, meaning the CPU mostly just waits for RAM?
I should probably look up the PowerPC PDF...
On the Jaguar, I had a ~raycaster at 640x200 in 65,536 colors (RGB 565) at a playable framerate on a 26.6 MHz RISC, so I'm very curious about the bottlenecks on the PPC!
Last edited by Heimdall on 02-Feb-2025 at 12:27 AM.

[#46] Karlos
Posted on 2-Feb-2025 0:51:58

@Heimdall
By metal banging I mean anything that touches the chipset; I'm less sure about supervisor-mode registers. Metal banging does work on machines with the actual metal, e.g. an A1200. Otherwise, it's UAE.
If you have system friendly code that relies on library calls, generally it will work.
As for the Doom example, the fact is that there are still many bottlenecks in that old system. Under the circumstances, it was doing well to be playable at that resolution and rate. Native ports that I tried weren't radically better.
Last edited by Karlos on 02-Feb-2025 at 12:58 AM.

[#47] Heimdall
Posted on 2-Feb-2025 2:43:52

@Karlos
Karlos wrote:
> @Heimdall
> By metal banging I mean anything that touches the chipset, less sure about supervisor mode registers. Metal banging does work on machines with the actual metal, e.g. A1200 etc. Otherwise, it's UAE.
> If you have system friendly code that relies on library calls, generally it will work.
My OS-friendliness ends the moment I obtain the framebuffer pointer from RTG (I don't use RTG functions to draw). After that, there's no OS within the frame loop.
I guess I should still refactor it, because RTG could invalidate that pointer at any time. It just hasn't happened to me yet under WinUAE or on the V4, so I had no motivation to do that, but I am told that the framebuffer pointer is only valid for the duration of a single frame...
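The usual way to respect that rule is to run the same sequence every frame: lock the bitmap, read the base address and row pitch, draw, then unlock. The pointer is never cached between frames. Below is a sketch in C under stated assumptions. `rtg_lock()` and `rtg_unlock()` are hypothetical stubs backed by a static buffer, so the example runs anywhere. On a real system they would map to the RTG library's lock calls, for example CyberGraphX's `LockBitMapTagList()`/`UnLockBitMap()`. Check the exact tags against the SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a lock hands back; base and pitch may differ from frame to frame. */
typedef struct {
    uint8_t *base;     /* first byte of the locked 32-bit surface */
    size_t   pitch;    /* bytes per row, may exceed width * 4 */
    int      width, height;
} FrameLock;

#define FAKE_W 4
#define FAKE_H 4
static uint32_t fake_vram[FAKE_W * FAKE_H];
static int fake_locked = 0;

/* Hypothetical stand-in for the RTG lock call. */
static int rtg_lock(FrameLock *fl)
{
    if (fake_locked)
        return 0;
    fake_locked = 1;
    fl->base = (uint8_t *)fake_vram;
    fl->pitch = FAKE_W * sizeof(uint32_t);
    fl->width = FAKE_W;
    fl->height = FAKE_H;
    return 1;
}

/* Hypothetical stand-in for the RTG unlock call. */
static void rtg_unlock(void) { fake_locked = 0; }

/* One frame: lock, draw through the freshly obtained pointer, unlock.
   Returns 0 (frame skipped) if the lock fails. */
int render_frame(uint32_t clear_colour)
{
    FrameLock fl;
    if (!rtg_lock(&fl))
        return 0;
    assert(fl.pitch >= (size_t)fl.width * sizeof(uint32_t));
    for (int y = 0; y < fl.height; y++) {
        uint32_t *row = (uint32_t *)(fl.base + (size_t)y * fl.pitch);
        for (int x = 0; x < fl.width; x++)
            row[x] = clear_colour;      /* stand-in for the real rasterizer */
    }
    rtg_unlock();                       /* fl.base must not be used after this */
    return 1;
}
```

Re-reading the pitch on every lock matters too, because the row stride is not guaranteed to equal width * 4.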
Karlos wrote:
> As for the Doom example, the fact is that there are still many bottlenecks in that old system. Under the circumstances, it was doing well to be playable at that resolution and rate. Native ports that I tried weren't radically better.
True, Doom was designed around a 386DX at 40 MHz with the typical bandwidth of that era.
A 240 MHz RISC beast would probably wipe the floor with the 386 in MIPS terms, especially if the code was nicely pipelined with minimal pipeline stalls (I'm not sure whether the CPU behaves nicely from a coder's perspective, or whether you have to be careful not to break/overwrite the pipeline stage results like on the Jaguar)...
Now I'm curious whether the performance of a 240 MHz PPC is comparable to that of the Celeron 266 MHz that I overclocked to 448 MHz, because that was one hell of an upgrade from a DX4-133! All my Pascal code suddenly compiled in a split second, and everything was extremely snappy, especially under DOS...

[#48] Heimdall
Posted on 2-Feb-2025 2:49:39

thellier wrote:
> Wazp3D (with WinUAE version) is on Aminet.
Thanks. Just downloaded it from Aminet.
thellier wrote:
> Wazp3D for Vampire uses Maggie for hardware rendering; this is a work in progress that I periodically post on the ApolloTeam Discord.
I'm not sure I can see the performance benefit of Warp on Maggie for flat shading.
Does Maggie now have some kind of scanline or triangle rasterizer? Last time I checked, it did have scanline texturing.
Does it perhaps now have some sort of blitter that will fill a scanline with a fixed color? That'd be handy at higher resolutions!
Up to 1280x800 I can get playable framerates on my V4SA, but 1920x1080 is a slideshow...

[#49] Heimdall
Posted on 2-Feb-2025 2:53:15

Karlos wrote:
> Native ports that I tried weren't radically better.
That begs the question. What's the best-performing native PPC 3D game? Preferably written in ASM (not just recompiled from C/C++).

[#50] matthey (Elite Member, joined 14-Mar-2007, 2460 posts, from Kansas)
Posted on 2-Feb-2025 5:02:23

Heimdall wrote:
> "Playable rate" at mere 4x pixel rate (of 320x200) sounds pretty awful for a 240 MHz RISC! What kind of architecture was it? Didn't its pipeline have several execution units? Wouldn't they execute most ops in 1 cycle? Maybe its branch prediction was horrendous or the pipeline stalls were a common occurrence?
> I've worked a lot with DSP and GPU RISCs on Jaguar so I'm familiar with that kind of a RISC instruction throughput (which was at 26.6 MHz).
> This one is, like, an order of magnitude faster!
> Perhaps the FastRAM access is as slow like chipram access from 040, meaning the CPU mostly just waits for the RAM?
> I should probably look up the PDF on the PowerPC...
> On Jaguar, I had a ~raycaster at 640x200 at 65,536 colors (RGB 565) and it had a Playable framerate on a 26.6 MHz RISC, so I'm most definitely very curious about the bottlenecks on the PPC!
The PPC603(e) bottlenecks are many. It can issue two instructions and a branch every cycle, but it has only one load/store unit, one simple integer unit and tiny OoO queues, so instruction scheduling is difficult and multi-execution/retirement rates are low. It only has static branch prediction, which is incorrect 25%-35% of the time, but it has only a 4-stage pipeline and multiple condition code registers that can sometimes become valid before speculative execution is required. The shallow 4-stage pipeline made it difficult to clock up, but that was nothing expensive die shrinks could not solve, and it ended up clocking higher than the more powerful 6-stage 604e, which was likely clock-limited by the large caches it carried before L2 caches were common. The PPC603(e) FPU was pipelined for single-precision floating point, whereas compilers defaulted to double precision. It had a selectable 32- or 64-bit data bus and the Amiga accelerators had good memory bandwidth, but poor code density wasted bandwidth on instruction fetches. The accelerators also had a slot for the GPU, avoiding the lower-performance Amiga Zorro III bus, but the Permedia2-based GPUs had the limitations of early GPUs, as already discussed, and no upgrades were available for the non-standard slots.
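Here is a toy C model that shows why static prediction misses so often. It uses the classic PowerPC default rule: backward conditional branches are predicted taken, forward ones not taken. The branch's "y" hint bit can invert that rule per branch, but the sketch ignores it. The branch trace is invented for illustration and is not a measured 603 workload.

```c
#include <assert.h>
#include <stddef.h>

/* One dynamic execution of a conditional branch. */
typedef struct {
    int displacement;   /* < 0 for a backward branch such as a loop end */
    int taken;          /* 1 if the branch was actually taken */
} BranchEvent;

/* Counts mispredictions under the static "backward taken, forward not
   taken" rule, with no hint bits and no history. */
size_t count_mispredicts(const BranchEvent *ev, size_t n)
{
    assert(ev != NULL || n == 0);
    size_t miss = 0;
    for (size_t i = 0; i < n; i++) {
        int predicted_taken = ev[i].displacement < 0;
        if (predicted_taken != ev[i].taken)
            miss++;
    }
    return miss;
}
```

In the test trace, a 10-iteration loop-closing branch misses only once, on exit. A forward branch that depends on the data and is taken every other time misses on each taken case. Data-dependent forward branches are where a history-free scheme loses the most.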
Heimdall wrote:
> That begs the question. What's the best-performing native PPC 3D game? Preferably written in ASM (not just recompiled from C/C++)
Cough, cough. There are no AmigaNOne PPC games written in assembly. I expect few games even have PPC assembly optimizations. PPC assembly is acronym hell and nobody writes whole games in it, at least not since Apple PPC, when there was a large enough hardware base to have a chance at getting development costs back. The number of PPC AmigaNOne units sold is only in the low thousands, and some of the units have failed or their users gave up on the PPC AmigaNOne failure. Vamp/AC hardware volumes are higher, in a shorter period of time. THEA500 Mini hardware likely sold a couple of hundred thousand units, and nothing has come close to that since the Commodore days.
Wipeout was one of the best PPC 3D game ports but it plays fine on a 68k Amiga using emulation.
WipEout 68k Amiga port https://www.youtube.com/watch?v=3JHCYEJNaCg
The remaining enviable PPC games that are not available on the 68k include Tower 57, which is maybe only pseudo-3D.
Tower 57 - New 2018 Amiga Game! https://www.youtube.com/watch?v=khueCD7qmIw
And Shogo: Mobile Armor Division.
Shogo Armor Division AmigaOS 4.1 https://www.youtube.com/watch?v=taU-cUkPw4w
Emulated 68k Amigas have higher CPU performance than PPC hardware, so the only advantage PPC hardware has left comes from blocking 3D for the 68k. Even the obsolete Warp3D is too much of a threat for A-EonKit to allow, at least until Trevor has unloaded his dead PPC hardware inventory, and then we may see it on the A600GS only. Without competitive hardware, it comes down to using software to protect hardware. Only Amiga makes it possible.
Last edited by matthey on 02-Feb-2025 at 02:28 PM. Last edited by matthey on 02-Feb-2025 at 05:09 AM. Last edited by matthey on 02-Feb-2025 at 05:06 AM.

[#51] Karlos
Posted on 2-Feb-2025 11:19:07

@Heimdall
Heimdall wrote:
> Karlos wrote:
>> Native ports that I tried weren't radically better.
> That begs the question. What's the best-performing native PPC 3D game? Preferably written in ASM (not just recompiled from C/C++)
Hahaha. We'll just wait for someone to write one, then we'll know!
Almost nothing is written in PPC assembly language, with the exception of the occasional function that's been tuned by hand.
This is one reason why I've never understood why it's still even a thing. It doesn't have any "native" software really. At least not in the sense you imply.
Here is a very old (poor quality phonecam) video of Doom attack on that old HW https://youtu.be/AQ1t5q3xmYk?si=kKPiy0cAtNqxL5qQ
The screen wipe effect is very slow, as it involves reading back from VRAM, *I think* bytewise. Once the game had started, the frame rate was OK, better than the 68040/25 was getting at 320x200. My mouse was a bit knackered and I had only one hand. It's my movement that's jerky, not the game.
In context I think it's quite a good achievement, given that you are running code optimised for one CPU on a completely different one, under emulation. Whatever the emulation overheads are, they quickly get hidden by other bottlenecks. I'm sure the sound still being submitted to chip RAM didn't help; IIRC chip RAM access from the PPC side is brutal.
Last edited by Karlos on 02-Feb-2025 at 12:27 PM.

[#52] Heimdall
Posted on 2-Feb-2025 12:36:18

@matthey
matthey wrote:
> The PPC603(e) bottlenecks are many. It can issue two instructions and a branch every cycle but it has only one load unit, one simple integer unit and tiny OoO queues so instruction scheduling is difficult and multi-execution/retirement rates are low. It only has static branch prediction which is incorrect 25%-35% of the time but it only has a 4-stage pipeline and it has multiple condition code registers that can sometimes become valid before requiring speculative execution. The shallow 4-stage pipeline made it difficult to clock up but it is nothing expensive die shrinks could not solve and it ended up clocking up further than the more powerful 6-stage 604e which was likely clock limited by the large caches before L2 caches.
That is very similar to the design of the DSP/GPU in the Jaguar, which had a horrendous number of HW bugs, so I have to wonder how many HW bugs slipped through the cracks here. During one of my 10-week ASM coding sprees I discovered multiple undocumented HW bugs on the Jaguar, which made coding it very problematic: the code you wrote was technically correct, it just didn't execute correctly.
Perhaps PowerPC had more funds than Atari, and its engineers were able to persuade management not to release the CPU while it was barely in the alpha stage?
matthey wrote:
> Heimdall wrote:
>> That begs the question. What's the best-performing native PPC 3D game? Preferably written in ASM (not just recompiled from C/C++)
> Cough, cough. There are no AmigaNOne PPC games written in assembly. I expect few games to even have PPC assembly optimizations. PPC assembly is acronym hell and nobody writes whole games using it, at least not since Apple PPC when there was a large enough hardware base to have a chance at getting development costs back.
That is just batsh*t crazy! Wow! In over a quarter century nobody could be arsed to use that Pure RISC POWAH! WTF!
I mean, I understand from first-hand Atari experience that RISC ASM is savage, but I'm going to assume that PowerPC wouldn't release the chip in the alpha stage like Atari did, so at the very least, the chip actually works.
matthey wrote:
> The number of PPC AmigaNOne hardware sold is only in the low thousands of units and some of the units have failed or users gave up on the PPC AmigaNOne failure. Vamp/AC hardware volumes are higher in a shorter period of time. THEA500 Mini hardware likely sold a couple of hundred thousand units and nothing has come close to that since the Commodore days.
A few thousand units is not an insubstantial market, though. It's enough for a single developer, for sure, maybe even enough to pay an artist for a few months of work (the game genre would have to be selected carefully, of course).
matthey wrote:
> Emulated 68k Amigas are higher CPU performance than PPC hardware so the only advantage PPC hardware has left comes from blocking 3D for the 68k. Even the obsolete Warp3D is too much of a threat for A-EonKit to allow, at least until Trevor has unloaded his PPC hardware dead inventory and then we may see it on the A600GS only. Without competitive hardware, it comes down to using software to protect hardware. Only Amiga makes it possible.
Yes, in 2025 it has long been surpassed by the V4SA, the RPi et al. But there was about a quarter century when it was dominant, no?

[#53] Heimdall
Posted on 2-Feb-2025 12:52:40

@matthey
matthey wrote:
> There are no AmigaNOne PPC games written in assembly. I expect few games to even have PPC assembly optimizations. PPC assembly is acronym hell and nobody writes whole games using it, at least not since Apple PPC when there was a large enough hardware base to have a chance at getting development costs back.
While I have no idea how good the C compiler for PowerPC is, I'm inclined to believe that it's going to be at least marginally better than the one for the 68000, with which I have a lot of experience: at the beginning of my Jaguar journey I was working with the 13.3 MHz 68000.
And even if it weren't better, at 240 MHz that is basically unimportant for 90% of the code, because at that frequency even crappy, bloated code will have plenty of frame time to execute.
I'd argue that you can write 90% of the code in C and it's still going to run at 30/60 fps anyway.
From my experience writing a racing game, the following components would be just fine even if written in C, and the game would still run at 60 fps:
- Track culling
- Strafing physics
- AI for 8 enemies
- Frame loop
- Input
- Menus
- HUD
The only component that would need to be in ASM is the flatshader itself; the rest can absolutely be crappy compiled C code. All you'd have to do to make it run at 60 fps is benchmark the flatshader's throughput and adjust the 3D scene complexity based on how much CPU time is left.
There's no way you could spend a whole frame's worth of 240 MHz cycles on the frame loop (excluding the rasterizer), but even if one somehow did, there's another whole frame time for the ASM rasterizer, and that's plenty for 30 fps.
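The budget argument is easy to put in numbers. Here is a small C sketch of the raw cycle budget per frame. It also gives the per-pixel budget at the 640x400 resolution mentioned earlier in the thread. Both figures are upper bounds, because cache misses, bus waits and pipeline stalls all eat into them.

```c
#include <assert.h>

/* Raw CPU cycles available per frame at a given clock and frame rate. */
unsigned long cycles_per_frame(unsigned long clock_hz, unsigned long fps)
{
    assert(fps > 0);
    return clock_hz / fps;
}

/* Raw cycles per pixel if the whole frame budget went to drawing. */
unsigned long cycles_per_pixel(unsigned long clock_hz, unsigned long fps,
                               unsigned long width, unsigned long height)
{
    assert(width > 0 && height > 0);
    return cycles_per_frame(clock_hz, fps) / (width * height);
}
```

At 240 MHz this gives 4,000,000 cycles per frame at 60 fps and 8,000,000 at 30 fps. That is a lot for game logic, but only about 15 cycles per pixel at 640x400 and 60 fps. So the rasterizer is the part worth hand-tuning.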

[#54] Karlos
Posted on 2-Feb-2025 15:06:54

@Heimdall
If you only require flat shading, or even Gouraud shading without textures, using W3D as a rasterizer is going to fly on 68K, even with the possible complication of having to convert your fixed-point coordinates to full floating point. That conversion overhead is per vertex, but even the ancient Permedia2 can fill flat and basic shaded polygons at a much faster rate than the CPU can.

[#55] kolla (Elite Member, joined 20-Aug-2003, 3359 posts, from Trondheim, Norway)
Posted on 2-Feb-2025 16:54:08

@matthey
matthey wrote:
> Wipeout was one of the best PPC 3D game ports but it plays fine on a 68k Amiga using emulation
But that's not really using the CPU emulation much: it uses a Warp3D wrapper that hands the 3D work to Windows-native 3D through WinUAE. If you want to see how it runs under 68k CPU emulation, you should use Wazp3D instead of QuarkTex.
Last edited by kolla on 02-Feb-2025 at 04:56 PM.
_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

[#56] matthey
Posted on 2-Feb-2025 17:53:33

Heimdall wrote:
> That is very similar to the design of DSP/GPU in Jaguar, which had horrendous amount of HW bugs - I gotta wonder how many HW bugs slipped through the cracks here. During one of my 10-week ASM coding sprees I discovered multiple undocumented HW bugs on Jaguar, which made coding it very problematic, as the code you wrote was technically correct, it just didn't execute correctly
RISC CPU designs evolved from the classic RISC pipeline. The 5-stage scalar classic RISC pipeline was simple and minimalist, working reasonably well until superscalar cores became desirable.
https://en.wikipedia.org/wiki/Classic_RISC_pipeline
The PPC603(e) is a superscalar design, and limited OoO execution was chosen to deal with irregular instruction execution caused by dependencies and by limited resources for multiple execution pipelines. This is in contrast to superscalar in-order CPU core designs, which use separate instruction fetch and execution pipelines with an instruction buffer between them to decouple consistent instruction fetch from inconsistent instruction execution. The latter design choice is more popular today for low-end load/store ARM and RISC-V cores, RISC-V being the ISA evolution of David Patterson's influential Berkeley RISC.
https://en.wikipedia.org/wiki/Berkeley_RISC https://www.cs.utexas.edu/~fussell/courses/cs352h/papers/risc.pdf
Ironically, the RISC-V SiFive 7-series 8-stage superscalar CPU core design resembles the 8-stage superscalar 68060 design more than the classic RISC pipeline or the PPC603(e) limited OoO design. In a single execution pipeline per cycle, the 68060 can execute more powerful CISC instructions that are each the equivalent of 2 RISC instructions. An example is "add mem,reg", which RISC breaks into "load mem,reg1" + "add reg1,reg2", also requiring an extra register. The SiFive 7-series adds extra hardware to the design, counter to RISC philosophy, to avoid load-to-use stalls, but the weak RISC ISA limits performance, a cost they accept for RISC philosophical reasons.
The shallow, limited-OoO PPC603(e) design also hides the minimal load-to-use latency. The problem was that shallow-pipeline CPU cores do not clock up well, and deepening the RISC pipeline generally increases load-to-use latencies and requires much larger OoO queues. Deeper-pipelined PPC core designs used more power and were not as efficient. Ironically, the 8-stage superscalar 68060 was the better design to clock up and had better performance efficiency (performance/MHz) than the PPC603 core. However, Motorola entered into the AIM Alliance and cancelled the announcement of a 68060@66MHz, never allowing the full 68060 with MMU and FPU to be clocked up.
Heimdall wrote:
> Perhaps PowerPC had more funds than Atari and its engineers were able to persuade management to not release the CPU while it was barely in the Alpha stage?
Motorola/Freescale/NXP made professional 68k and PPC CPU designs with few bugs. Intel CPUs generally had more bugs but their designs were often more aggressive and they incrementally upgraded them more often. Motorola/Freescale was often too conservative, too slow to make incremental improvements and suffered from management issues.
Heimdall wrote:
> That is just batsh*t crazy! Wow! In over quarter century nobody could be arsed to use that Pure RISC POWAH! WTF!
> I mean, I understand from first-hand Atari experience that RISC ASM is savage, but I'm going to assume that PowerPC wouldn't release the chip in Alpha stage like Atari did, so at the very least, the chip actually works.
I believe MorphOS has the better low-level PPC programmers and an OS and software better optimized for PPC. The existence of Hyperion's AmigaOS 4 and AmigaNOne is practically due to a scam and the coercion of the financially distressed Amiga Inc owners. The lawsuits are still going on today, wasting valuable resources that could be used to improve the Amiga. Trevor/A-EonKit was once seen as the way forward with PPC despite likely financing the Hyperion shenanigans, and now they are more of an Amiga roadblock, hanging onto obsolete PPC and low-end ARM emulation.
There are still Amiga assembly programmers but they are 68k Amiga assembly programmers. They choose to program the 68k Amiga because it is more fun and easier, even if they understand PPC assembly. Take Frank Wille for example. He has optimized PPC code for the VBCC compiler and PPC games but he has written many games in assembly for the 68k Amiga and even has written his own 68k assemblers and linkers.
https://www.lemonamiga.com/games/list.php?list_people=Frank%20Wille https://amiga.abime.net/artists/view/frank-wille-phx http://sun.hasenbraten.de/~frank/
Frank is not just an Amiga legend but a great guy too.
Heimdall wrote:
> Few Thousands units - That's not an insubstantial market though. It's enough for a single developer, for sure, maybe also paying for an artist for few months of work (the game genre would have to be selected carefully of course).
Why PPC for a user base of a few thousand? The 68k market is much larger. The 68060 Amiga user base alone is likely larger. There are several new 68060 accelerators to go with older designs, and 68060s have dried up, with rev6 68060s that clock around 100 MHz fetching around 500 euros now. There used to be thousands of used 68060s from the embedded market available, and now full 68060s are difficult to find, with people using 68LC060s instead. The Vamp/AC hardware has likely sold over 10,000 units. THEA500 Mini has likely sold at least 200,000 units. Amiga users want real 68k Amiga hardware, but they are more willing to accept 68k Amiga emulation with Amiga chipset compatibility than PPC 68k emulation without it. There are likely tens of thousands of WinUAE, RPi emulation and other emulation users, and tens of thousands of other 68k Amiga FPGA hardware users. PPC hardware is insanely priced, the performance is not good enough for the price, and 68k Amiga compatibility is not good enough. It is dead for all intents and purposes. Everyone has moved on except Trevor, who loves his PPC and is still trying to sell his inventory of dead PPC hardware.
There is actually a chance to create 68k Amiga SoC ASICs for a low cost. The RP2040 MCU uses more transistors than a 68060+AGA and is sold for less than $1 USD.
https://en.wikipedia.org/wiki/RP2040 https://en.wikipedia.org/wiki/RP2350
The upgraded RP2350 uses more transistors and also sells for ~$1 USD. The 68060 or AC68080 would no longer be limited to 100 MHz but could clock to 1+ GHz. AGA would no longer be the slow chipset on new silicon with modern memory. Actually, SRAM could be used for chip mem, and it would outperform modern GPU memory if a small amount of memory were adequate. SRAM for the MCU memory is, after all, what made the transistor counts of the RPi MCUs higher than a 68060+AGA transistor count.
Heimdall wrote:
> Yes, in 2025, it's been long surpassed by V4SA, rPi et al. But, there was about quarter century when it was dominant, no?
I used to be on the Apollo team and tried to talk Gunnar into planning for an ASIC but he optimized the Apollo core and Apollo ISA for a FPGA instead. Only Amiga makes it possible to have a large 68k Amiga market and pitiful hardware.

[#57] Heimdall
Posted on 3-Feb-2025 1:50:21

Karlos wrote:
> If you only require flat shading, or even gouraud shading without textures, using W3D as a rasterizer, even with the possible complication of having to convert your fixed point coordinates to full floating point, is going to fly on 68K.
I can't imagine that conversion function taking more than a day, really, especially considering I already have a method working with floating point in ASM. Besides, a basic FMOVE should work just fine, I think: I'm using fixed point (FxP) for 3D coordinates, but the 3D transform, while taking fixed-point input, outputs simple integer screen coords, hence FMOVE should work.
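The conversion itself is small. Here is a C sketch that assumes a hypothetical 16.16 fixed-point format; the engine's actual fraction width is not stated. Integer screen coordinates need only a plain int-to-float conversion, which is what FMOVE.L from a data register does on a 68k FPU. Fixed-point values also need a scale by 2^-16.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point format: 16 integer bits, 16 fraction bits. */
#define FRAC_BITS 16

/* Integer screen coordinate -> float: a plain conversion (FMOVE.L Dn,FPn).
   Screen-sized values are well inside float's exact integer range. */
static inline float int_to_float(int32_t v)
{
    assert(v > -(1L << 24) && v < (1L << 24));
    return (float)v;
}

/* 16.16 fixed point -> float: convert, then scale by 2^-FRAC_BITS.
   The power-of-two scale is exact; only the int->float step can round,
   and only for magnitudes above 2^24. */
static inline float fixed_to_float(int32_t v)
{
    return (float)v * (1.0f / (float)(1L << FRAC_BITS));
}
```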
Karlos wrote:
> That conversion overhead is per vertex but even the ancient Permedia2 can fill flat and basic shaded polygons at a much faster rate than the CPU can.
You just made me realize that, even though I primarily want Warp support for the RPi4, it's going to have the side effect of reaching a much broader HW base, especially if I don't require texturing...
Meaning, that'd actually help the 030/040 CPUs, as they wouldn't have to do any drawing whatsoever, right? I have no idea whether anyone even has such a config (an 030 with a Permedia; is that even possible?), but once I have C2P, the user will be able to choose the rasterizer at start-up (1. SW RTG (32-bit), 2. SW C2P (8-bit), 3. Warp).
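A start-up choice like that maps naturally onto a table of function pointers, so the frame loop never needs to know which back end is active. Here is a sketch in C. The `Rasterizer` interface, the back-end functions and the fallback policy (drop to the software RTG path when the choice is invalid or its initialisation fails) are all hypothetical. The draw functions are empty placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical back-end interface shared by all renderers. */
typedef struct {
    const char *name;
    int  (*init)(void);          /* returns 1 on success */
    void (*draw_frame)(void);
} Rasterizer;

static int  ok_init(void)   { return 1; }
static void draw_rtg(void)  { /* software fill into a 32-bit RTG buffer */ }
static void draw_c2p(void)  { /* software fill into a chunky buffer, then C2P */ }
static void draw_warp(void) { /* submit vertex arrays to Warp3D */ }

static const Rasterizer backends[] = {
    { "SW RTG (32-bit)", ok_init, draw_rtg  },
    { "SW C2P (8-bit)",  ok_init, draw_c2p  },
    { "Warp",            ok_init, draw_warp },
};

/* 'choice' is the 1-based menu number shown at start-up. Falls back to the
   software RTG path if the choice is out of range or its init() fails. */
const Rasterizer *select_rasterizer(int choice)
{
    size_t n = sizeof backends / sizeof backends[0];
    assert(n > 0);
    if (choice >= 1 && (size_t)choice <= n && backends[choice - 1].init())
        return &backends[choice - 1];
    return &backends[0];
}
```

The frame loop then calls `r->draw_frame()` on whatever `select_rasterizer()` returned. Adding another back end later means adding one table entry.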
That would actually encompass the great majority of all Amiga configs.
I like this forum: I slowly come to realizations that I wouldn't have had otherwise!

[#58] Heimdall
Posted on 3-Feb-2025 2:11:07

Karlos wrote:
> Hahaha. We'll just wait for someone to write one, then we'll know!
> Almost nothing is written in PPC assembly language, with the exception of the occasional function that's been tuned by hand.
Actually, it's not an entirely unreasonable idea! My high-level ASM compiler Higgs (which outputs to vasm) has several backends that I wrote over the years:
- 6502 (Lynx and Atari XL)
- 68000
- Z80 (ready for the Spectrum Next that I got via Kickstarter)
- RISC DSP
- RISC GPU
Adding the PowerPC backend would take about 2 weeks, once I learn the syntax and experiment with it. It wouldn't produce the fastest and best PowerPC code (that takes time and experience), but it would be native code, and it would use all 32 registers before resorting to RAM variables.
My Higgs compiler already has some basic functionality for rearranging instructions to fit the pipeline stages. I could refactor that around the PowerPC 603 pipeline, and it should then produce code with minimal stalls.
Now, that's a rabbit hole I don't want to enter before this summer, because I primarily want to finish the damn game first.
But once my part-time job becomes full-time around July and I only have a couple of days here and there (which are unusable for game coding), those days will be just fine for adding features to the Higgs compiler's PowerPC backend.
BTW, didn't the early Acorn Archimedes have one of those PowerPC chips? Those STB instructions and 3-operand ops look awfully similar to the Archie ones...
Until then (~June '25), my plan is to have C2P, Warp and a full game. Maybe I'll also set up a dev environment for PowerPC experiments...
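A toy version of pipeline-aware reordering: a greedy list scheduler for a single-issue, in-order pipeline. Each instruction lists the earlier instructions whose results it reads, and each result has a latency. The latencies and dependency model are made-up assumptions, not PowerPC 603 figures, and anything else that constrains order (memory ordering, register reuse) would have to be encoded as an extra dependency.

```c
#define MAX_INSNS 16
#define MAX_DEPS 3

typedef struct {
    int latency;        /* cycles until the result can be used */
    int ndeps;
    int deps[MAX_DEPS]; /* instructions whose results this one reads */
} Insn;

/* Total cycles (stalls included) to issue all instructions in the given
   order. The order must respect dependencies. */
int schedule_length(const Insn *code, const int *order, int n) {
    int issue[MAX_INSNS];
    int cycle = -1;
    for (int k = 0; k < n; k++) {
        int i = order[k];
        int t = cycle + 1; /* at most one instruction per cycle */
        for (int d = 0; d < code[i].ndeps; d++) {
            int dep = code[i].deps[d];
            int ready = issue[dep] + code[dep].latency;
            if (ready > t) t = ready; /* stall until the operand is ready */
        }
        issue[i] = t;
        cycle = t;
    }
    return cycle + 1;
}

/* Greedy list scheduling: repeatedly pick the ready instruction that can
   issue soonest (ties go to the earliest in program order). */
void list_schedule(const Insn *code, int n, int *order) {
    int issue[MAX_INSNS], done[MAX_INSNS] = {0};
    int cycle = -1;
    for (int k = 0; k < n; k++) {
        int best = -1, best_t = 0;
        for (int i = 0; i < n; i++) {
            if (done[i]) continue;
            int t = cycle + 1, ready = 1;
            for (int d = 0; d < code[i].ndeps; d++) {
                int dep = code[i].deps[d];
                if (!done[dep]) { ready = 0; break; }
                if (issue[dep] + code[dep].latency > t)
                    t = issue[dep] + code[dep].latency;
            }
            if (ready && (best < 0 || t < best_t)) { best = i; best_t = t; }
        }
        if (best < 0) return; /* cyclic dependencies: give up */
        done[best] = 1;
        issue[best] = best_t;
        cycle = best_t;
        order[k] = best;
    }
}
```

With two independent loads (latency 3), each feeding an add, the original order 0, 1, 2, 3 takes 8 cycles. Hoisting the second load into the first load's shadow gives 0, 2, 1, 3 and 5 cycles.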
Karlos
Re: Integrating Warp3D into my 3D engine Posted on 3-Feb-2025 6:47:25 [ #59 ]
Elite Member
Joined: 24-Aug-2003 Posts: 4843
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!
@Heimdall
Although an 030/882 will run W3D, I am not aware of any specific hardware configuration that had anything less than an 040, for the Permedia at least.
As for Archimedes, they were ARM all the way, AFAIK
Quote:
Meaning, that'd actually help the 030/040 CPUs as they wouldn't have to do any drawing whatsoever, right ? |
Your workload would come down to converting your fixed-point coordinates to floating point after transformation (as you say, you can use FMOVE for that) and filling an array of vertices. For W3D v4 you have 3 different pointers: one for geometry, one for colour data and one for texture data. IIRC, there are 2 colour pointers and 16 texture pointers, but no v4 driver supports multitexturing. By the sounds of it, that's irrelevant for your use case anyway.
You have a range of options for how to draw your primitives at this point. You might want to keep dense vertex data and an index buffer so that you can use DrawElements(), or you might want to emit a linear array of data for use with DrawArray(). Both have advantages and disadvantages.
Drawing-wise, you're going to be stuck with triangles, fans and strips. Don't count on point and line primitives working on most drivers.
Last edited by Karlos on 03-Feb-2025 at 10:54 AM.
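A sketch of that workload for a flat shader, with separate geometry and colour streams in the spirit of the per-attribute pointers described above. The structures are hypothetical, not the real Warp3D v4 types, and the engine output (integer screen x/y plus 16.16 fixed-point depth) is an assumption. A flat shader simply repeats the face colour for all three vertices.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical engine output after projection. */
typedef struct { int32_t sx, sy, z_fix; } ScreenVertex; /* z in assumed 16.16 */
typedef struct { ScreenVertex v[3]; uint32_t argb; } FlatTri;

/* Separate streams: one for geometry, one for colour (illustrative only). */
typedef struct {
    float    *xyz;   /* 3 floats per vertex */
    uint32_t *argb;  /* 1 packed colour per vertex */
    size_t    count; /* vertices written so far */
} Streams;

/* Append each triangle as 3 vertices, converting to float on the way. */
void emit_flat_tris(Streams *s, const FlatTri *tris, size_t ntris) {
    for (size_t t = 0; t < ntris; t++) {
        for (int k = 0; k < 3; k++) {
            size_t i = s->count++;
            s->xyz[3 * i + 0] = (float)tris[t].v[k].sx;
            s->xyz[3 * i + 1] = (float)tris[t].v[k].sy;
            s->xyz[3 * i + 2] = (float)tris[t].v[k].z_fix / 65536.0f;
            s->argb[i] = tris[t].argb; /* flat shading: same colour on every vertex */
        }
    }
}
```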
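One way to see the trade-off: a quad stored indexed needs 4 unique vertices plus 6 indices (DrawElements() style), while the same quad as a linear array needs 6 full vertices (DrawArray() style). Indexing means fewer per-vertex conversions when vertices are shared, at the cost of an index buffer and an indirection. The helper below expands the former into the latter. It is a hypothetical illustration, and the 16-bit index type is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { float x, y, z; } Vec3;

/* Expand an indexed triangle list into a linear array. Shared vertices are
   duplicated, so out must have room for nidx entries. */
void deindex(Vec3 *out, const Vec3 *verts, const uint16_t *idx, size_t nidx) {
    for (size_t i = 0; i < nidx; i++)
        out[i] = verts[idx[i]];
}
```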
_________________ Doing stupid things for fun... |
matthey
Re: Integrating Warp3D into my 3D engine Posted on 3-Feb-2025 18:33:51 [ #60 ]
Elite Member
Joined: 14-Mar-2007 Posts: 2460
From: Kansas
Karlos Quote:
Although 030/882 will run W3D I am unaware of any specific HW configuration that had anything less than 040 for Permedia at least .
|
As I recall, the Warp3D.library itself is compiled for the 68020+6888x, so it is not a problem. The code is so bad that it looks more like 68000 or RISC code. The compiler used MOVE as if it were THE load instruction and did not know about index register scaling, so it recalculated the scaled index for every use. This, combined with other problems, literally doubled the number of instructions: the unofficial optimized version was approaching half the size and could likely reach it, without exaggeration. EGCS compiler trash?
The Warp3D Avenger/Voodoo driver libraries were likely compiled with a newer version of GCC for the 68040. GCC uses the FSOP/FDOP FPU instructions, which round results to single/double precision, for better IEEE consistency between platforms, but the disadvantages are reduced precision for intermediate results and lost compatibility with the 6888x FPUs (in contrast, VBCC code compiled for the 68040 or 68060 FPU should be compatible with the 6888x). The code is compiled specifically for the 68040: the 68040 is missing FINT(RZ), commonly used for floating point to integer rounding and the only FPU instruction(s) restored in hardware on the 68060, and the code is a bloated mess of thousands of inlined software equivalent conversions. I did not recognize what the Permedia2 driver libraries were compiled with. Maybe an older version of GCC? Perhaps Storm C? What CPU(s)+FPU(s) were they compiled for?
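For a sense of what "software equivalent" round-toward-zero conversion involves when no hardware instruction is available, here is a generic integer-only float-to-int truncation (C cast semantics) for IEEE-754 single precision. It is an illustration of the technique only and is not claimed to be the sequence GCC actually inlines for the 68040; saturating out-of-range values (including NaN and infinity) is an assumption of this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Truncate an IEEE-754 single toward zero using only integer operations. */
int32_t soft_ftoi_rz(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);               /* reinterpret the float's bits */
    int sign = (int)(bits >> 31);
    int exp = (int)((bits >> 23) & 0xFFu) - 127;  /* unbiased exponent */
    uint32_t mant = (bits & 0x7FFFFFu) | 0x800000u; /* restore the implicit 1 */
    if (exp < 0)
        return 0;                                 /* |f| < 1 (also zero, denormals) */
    if (exp > 30)
        return sign ? INT32_MIN : INT32_MAX;      /* out of range: saturate */
    /* Shift the 24-bit mantissa so only the integer part remains. */
    uint32_t mag = (exp >= 23) ? (mant << (exp - 23)) : (mant >> (23 - exp));
    return sign ? -(int32_t)mag : (int32_t)mag;
}
```

Even this minimal version is a dozen or so operations per conversion, which shows how inlining such code at every float-to-int site bloats a library.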
Karlos Quote:
As for Archimedes, they were ARM all the way, AFAIK
|
The early ARM Archimedes predates PPC entirely; at most, PPC601 prototypes may have been available when the last Archimedes models were introduced.
1985 ARM1 introduced
1986 ARM2 introduced
1987 first Archimedes released
1988
1989
1990
1991
1992 last Archimedes models released
1993 PPC601 released
The original ARM architecture was used in the 3DO but then disappeared from desktop and console use, and it was not competitive for embedded use until the 1994 Thumb ISA, which came after licensing Hitachi SuperH to improve code density. This saved ARM, but the 68k and SuperH remained more popular: in 1997, ARM was a distant 4th in 32-bit embedded volumes, behind the 68k, MIPS and SuperH.
1997 32-bit embedded volumes:
1. 68k 79.3 million
2. MIPS 44.0 million
3. SuperH 23.5 million
4. ARM 10 million
5. i960 9 million
6. x86 9 million
7. PPC 3.9 million
Source: "RISC Volume Gains But 68K Still Reigns", https://websrv.cecs.uci.edu/~papers/mpr/MPR/19980126/120102.pdf
Even with PPC Mac volumes and PPC embedded volumes combined, 68k volumes were still about 8 times higher, about the same dominance the 68k had over ARM in the embedded market. ARM replaced its fat ARM ISA with the skinny Thumb ISA and eventually went from #4 to #1, while Motorola replaced its #1 skinny 68k ISA with the fat PPC ISA, and it eventually went extinct in the embedded market. ARM even achieved this reversal of fortunes using technology borrowed from the 68k by 68k second source Hitachi and licensed to ARM for Thumb. Code density mattered even in the desktop market, where the fat Alpha, PA-RISC, MIPS, SPARC, original ARM and PPC ISAs were all replaced by ISAs with better code density.
Last edited by matthey on 03-Feb-2025 at 06:38 PM.