Poster | Thread |
Karlos
|  |
Re: Integrating Warp3D into my 3D engine Posted on 3-Feb-2025 18:40:23
| | [ #61 ] |
|
|
 |
Elite Member  |
Joined: 24-Aug-2003 Posts: 4843
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition! | | |
|
| @matthey
I seem to recall it was all StormC4 in GCC mode. Except for my unreleased Permedia2 iteration which I think ended up compiling with StormC "classic".
Most of those code quality issues were irrelevant; almost all the performance issues and bloat were due to source code organisation problems - every combination of vertex/colour/texture settings ended up generating full sets of duplicate code for the v4 API call internals.
That was the main thing I fixed - I had small, highly targeted fetch routines for each element in each possible format, which were called indirectly from a single copy of the routine. Whenever the format or key state changed, we'd update those pointers to the matching fetch operation. _________________ Doing stupid things for fun... |
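A minimal C sketch of the indirect-fetch idea described above. It is an illustration only, not the actual Warp3D driver code: the colour formats and the names `DrawContext`, `set_colour_format` and `process_colours` are hypothetical, and only one vertex element (colour) is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical colour formats a client array might use. */
typedef enum { COL_RGBA_U8, COL_RGBA_F32, COL_FORMAT_COUNT } ColourFormat;

/* Internal representation every fetch routine converts to. */
typedef struct { float r, g, b, a; } Colour;

/* A fetch routine reads one element in one specific source format. */
typedef void (*FetchColourFn)(const void *src, Colour *out);

static void fetch_rgba_u8(const void *src, Colour *out)
{
    const uint8_t *p = (const uint8_t *)src;
    out->r = p[0] / 255.0f;
    out->g = p[1] / 255.0f;
    out->b = p[2] / 255.0f;
    out->a = p[3] / 255.0f;
}

static void fetch_rgba_f32(const void *src, Colour *out)
{
    const float *p = (const float *)src;
    out->r = p[0];
    out->g = p[1];
    out->b = p[2];
    out->a = p[3];
}

/* One small routine and one stride per format, indexed by ColourFormat. */
static const FetchColourFn colour_fetchers[COL_FORMAT_COUNT] = {
    fetch_rgba_u8, fetch_rgba_f32
};
static const size_t colour_strides[COL_FORMAT_COUNT] = {
    4 * sizeof(uint8_t), 4 * sizeof(float)
};

typedef struct {
    FetchColourFn fetch_colour; /* updated only when the format changes */
    size_t colour_stride;
} DrawContext;

/* Called on a state change: repoint the context at the matching routine. */
void set_colour_format(DrawContext *ctx, ColourFormat fmt)
{
    assert((int)fmt >= 0 && (int)fmt < COL_FORMAT_COUNT);
    ctx->fetch_colour = colour_fetchers[fmt];
    ctx->colour_stride = colour_strides[fmt];
}

/* The single shared copy of the array walk; it never tests the format. */
void process_colours(const DrawContext *ctx, const void *colours,
                     size_t count, Colour *out)
{
    const unsigned char *p = (const unsigned char *)colours;
    for (size_t i = 0; i < count; i++) {
        ctx->fetch_colour(p, &out[i]);
        p += ctx->colour_stride;
    }
}
```

The array walk exists once however many formats are supported; each extra format costs one small fetch routine rather than another full copy of the loop.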
|
|
|
matthey
|  |
Re: Integrating Warp3D into my 3D engine Posted on 3-Feb-2025 19:29:48
| | [ #62 ] |
|
|
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2460
From: Kansas | | |
|
| Karlos Quote:
I seem to recall it was all StormC4 in GCC mode. Except for my unreleased Permedia2 iteration which I think ended up compiling with StormC "classic".
|
Storm C as I suspected. That likely explains the 68k FPU color rounding problems in the Permedia2 driver. The Storm C GCC versions are known to be buggy. The code probably just needed a recompile with a newer version of GCC but Storm C was not updated to work with newer versions of GCC and programmers using Storm C were reluctant to abandon the nice IDE.
So there were unreleased 68k Permedia2 and Radeon drivers? I can see why PPC AmigaOS 4 is so poorly optimized compared to MorphOS. It is not just PPC that makes optimization and debugging difficult but Hyperion development incompetence. Well, I guess it is no longer Hyperion's fault but A-EonKit is just as bad. Development would be easier without these road blocks that claim to support the Amiga.
Karlos Quote:
Most of those code quality issues were irrelevant; almost all the performance issues and bloat were due to source code organisation problems - every combination of vertex/colour/texture settings ended up generating full sets of duplicate code for the v4 API call internals.
That was the main thing I fixed - I had small, highly targeted fetch routines for each element in each possible format, which were called indirectly from a single copy of the routine. Whenever the format or key state changed, we'd update those pointers to the matching fetch operation.
|
Yea, lots of duplicate code with minor differences in Warp3D. Inlining gone bad. Maybe the programmers treated the 68k like PPC with the expensive function prologues and epilogues making function calls expensive and encouraging inlining. PPC optimization is usually max inlining and loop unrolling but that gives max bloat and minimum code sharing. One of the remaining 68k AmigaOS strengths is code sharing that allows a smaller footprint than other OSs but PPC sabotages it big time. It is not surprising that Hyperion's intentions of using the PPC AmigaOS for embedded use were never realized.
|
|
|
|
Karlos
|  |
Re: Integrating Warp3D into my 3D engine Posted on 3-Feb-2025 20:00:43
| | [ #63 ] |
|
|
 |
Elite Member  |
Joined: 24-Aug-2003 Posts: 4843
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition! | | |
|
| @matthey
My 68K (also WarpOS) Permedia2 driver was never released, but it was ported to become the driver for 4.1.
The techniques for deduplicating all the vertex array stuff ended up in the Radeon drivers too, IIRC. _________________ Doing stupid things for fun... |
|
|
|
Karlos
|  |
Re: Integrating Warp3D into my 3D engine Posted on 3-Feb-2025 20:09:36
| | [ #64 ] |
|
|
 |
Elite Member  |
Joined: 24-Aug-2003 Posts: 4843
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition! | | |
|
| Just to clarify, the reason the P2 drivers weren't released wasn't any sort of embargo or legal nonsense; there were maintenance/repo issues, not helped by the fact that my code had become incompatible with the SC4 GCC compile path. _________________ Doing stupid things for fun... |
|
|
|
Heimdall
|  |
Re: Integrating Warp3D into my 3D engine Posted on 4-Feb-2025 2:17:24
| | [ #65 ] |
|
|
 |
Member  |
Joined: 20-Jan-2025 Posts: 47
From: North Dakota | | |
|
| @matthey
Quote:
matthey wrote:
The PPC603(e) bottlenecks are many. It can issue two instructions and a branch every cycle but it has only one load/store unit, one simple integer unit and tiny OoO queues so instruction scheduling is difficult and multi-execution/retirement rates are low.
|
Yeah, that's the standard RISC deal. It took me some time to get the inner loops on Jaguar's GPU to work at peak pipeline efficiency with zero stalls. Sometimes I would write the function in half a day and then it took a week of refactoring and rearranging till it was 100% instruction throughput. Typical performance difference was 3:1, so it was worth the effort.
Quote:
matthey wrote:
It only has static branch prediction which is incorrect 25%-35% of the time but it only has a 4-stage pipeline and it has multiple condition code registers that can sometimes become valid before requiring speculative execution. The shallow 4-stage pipeline made it difficult to clock up but it is nothing expensive die shrinks could not solve and it ended up clocking up further than the more powerful 6-stage 604e which was likely clock limited by the large caches before L2 caches. |
Well, I guess that means that the performance difference with PPC603e between unoptimized and pipeline-optimized code would be up to 4:1. Ouch.
Quote:
matthey wrote:
The PPC603(e) bottlenecks are many.
|
The single most important system-level RISC bottleneck on Jaguar, and one we couldn't do anything about, was bus bandwidth - even if you coded a short 10-op loop writing #0 into main RAM, you couldn't do more than 19,000 writes during a single NTSC frame. That's less than 1/3 of a 320x200 screen. Now contrast that number (19,000 store ops) with an actual instruction throughput of 443,000 ops within an NTSC frame. That's a ratio of almost 25:1! That's how much slower main RAM access was than the 4 KB RISC cache.
What kind of RAM access throughput can one expect on PPC603e ?
On Jaguar, we had Blitter, so I would have 2 scanlines in the 4 KB cache, one being written to by GPU, the other being copied in parallel by Blitter. Is there anything like that on PPC603e ?
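As a rough check on the figures quoted above, the arithmetic can be written out in C. The inputs are the numbers from the post (19,000 writes and 443,000 ops per NTSC frame, a 320x200 screen); the helper names are made up for illustration, and one main-RAM write per pixel is an assumption.

```c
#include <assert.h>

/* Fraction of a frame buffer that can be written in one video frame,
   assuming one main-RAM write per pixel. */
double screen_fraction_per_frame(double writes_per_frame, double pixels)
{
    assert(pixels > 0.0);
    return writes_per_frame / pixels;
}

/* Instructions that could have executed in the time of one main-RAM write. */
double ops_per_write(double ops_per_frame, double writes_per_frame)
{
    assert(writes_per_frame > 0.0);
    return ops_per_frame / writes_per_frame;
}

/* Whole frames needed to touch every pixel once (rounded up). */
unsigned frames_to_fill(unsigned pixels, unsigned writes_per_frame)
{
    assert(writes_per_frame > 0);
    return (pixels + writes_per_frame - 1) / writes_per_frame;
}
```

With the quoted numbers, `screen_fraction_per_frame(19000, 320 * 200)` comes to just under 0.30, `ops_per_write(443000, 19000)` to roughly 23 (the same ballpark as the 25:1 mentioned), and `frames_to_fill(320 * 200, 19000)` to 4 frames.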
|
|
|
|
Heimdall
|  |
Re: Integrating Warp3D into my 3D engine Posted on 4-Feb-2025 2:29:48
| | [ #66 ] |
|
|
 |
Member  |
Joined: 20-Jan-2025 Posts: 47
From: North Dakota | | |
|
| @matthey
Quote:
Why PPC for a few thousand user base? The 68k market is much larger. The 68060 Amiga user base alone is likely larger. There are several new 68060 accelerators to go with older designs and 68060s have dried up, with rev6 68060s that clock around 100 MHz bringing around 500 Euros now. There used to be thousands of used 68060s from the embedded market available and now full 68060s are difficult to find with people using 68LC060s instead. The Vamp/AC hardware has likely sold over 10,000 units. THEA500 Mini has likely sold at least 200,000 units. Amiga users want real 68k Amiga hardware but they are more willing to accept 68k Amiga emulation with Amiga chipset compatibility than PPC 68k emulation without it. There are likely tens of thousands of WinUAE, RPi emulation and other emulation users and tens of thousands of other 68k Amiga FPGA hardware users. PPC hardware is insanely priced, the performance is not good enough for the price and 68k Amiga compatibility is not good enough. It is dead for all intents and purposes. Everyone has moved on except Trevor who loves his PPC and is still trying to sell his inventory of dead PPC hardware. |
From my perspective, even if the A500Mini sold 1 Mil units, it still wouldn't matter, as OCS is just too slow for the games I love creating and playing.
The second biggest potential market is going to be PiStorm/EMU68, as long-term, that's the only viable route given the HW keeps dying (and prices keep rising).
The numbers on Vampires sold vary wildly, so I'll take a guess that it sold over 5,000 units.
I have zero idea as to how many active 060s are in the wild. Is it even 1,000? I doubt there are more of them than V2/V4 Vampires, though...
It probably doesn't help that each day more of us die of old age and life... |
|
|
|
matthey
|  |
Re: Integrating Warp3D into my 3D engine Posted on 4-Feb-2025 5:29:41
| | [ #67 ] |
|
|
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2460
From: Kansas | | |
|
| Heimdall Quote:
Yeah, that's the standard RISC deal. It took me some time to get the inner loops on Jaguar's GPU to work at peak pipeline efficiency with zero stalls. Sometimes I would write the function in half a day and then it took a week of refactoring and rearranging till it was 100% instruction throughput. Typical performance difference was 3:1, so it was worth the effort.
|
It is the standard deal for low end low power superscalar RISC cores anyway. Minimal units and load-to-use latency make instruction scheduling difficult and impossible in some cases. The PPC603 design was alright after a 2nd integer unit and dynamic branch prediction were added to create the PPC G3.
Arthur Revitalizes PowerPC Line https://websrv.cecs.uci.edu/~papers/mpr/MPR/ARTICLES/110203.PDF
The PPC604e had 2 load/store units too which makes instruction scheduling easy even for RISC but it was far from low end and the PPC G3 was based on the PPC603 because it is lower power. More OoO units means more OoO queues (reservation stations in PPC terminology). PPC designs make me appreciate the superscalar in-order 68060 design with dual address generation and integer units, no load-to-use stalls and good branch prediction. It is a much better low end design than most low end RISC designs.
Heimdall Quote:
Well, I guess that means that the performance difference with PPC603e between unoptimized and pipeline-optimized code would be up to 4:1. Ouch.
|
The shallow pipeline meant the branch mispredict penalty was small. Motorola was still preferring the shallow PPC 603 pipeline advantages over the deeper PPC 604 pipeline from the paper above. Steve Jobs was so unhappy with shallow pipeline PPC CPUs not clocking up though that he switched to x86.
Heimdall Quote:
The single most important system-level RISC bottleneck on Jaguar, and one we couldn't do anything about, was bus bandwidth - even if you coded a short 10-op loop writing #0 into main RAM, you couldn't do more than 19,000 writes during a single NTSC frame. That's less than 1/3 of a 320x200 screen. Now contrast that number (19,000 store ops) with an actual instruction throughput of 443,000 ops within an NTSC frame. That's a ratio of almost 25:1! That's how much slower main RAM access was than the 4 KB RISC cache.
|
Small scratch pad memory? Yea, SRAM performance blows DRAM memory performance away. The 68060 8kiB I+D caches are SRAM which reach 600 MiB/s @50MHz using a 500nm chip process.
https://www.nxp.com/docs/en/data-sheet/MC68060UM.pdf Quote:
This pipeline architecture supports extremely high data transfer rates within the MC68060 processor. The on-chip instruction and operand data caches provide 600 MBytes/sec @ 50 MHz to the pipelines, while the integer execute engines can support sustained transfer rates of 1.2 GBytes/sec.
|
The CyberStorm MKIII 68060 accelerator reaches a sustained 68 MiB/s with DRAM SIMMs.
http://amiga.resource.cx/manual/CyberStorm3.pdf
This is only about a 9:1 ratio but the CyberStorm MKIII had good DRAM performance at that time. Newer 68060 accelerators have better memory performance with more modern memory though.
Heimdall Quote:
What kind of RAM access throughput can one expect on PPC603e ?
|
Good question. The Phase5 CyberStorm MKIII manual gives the memory bandwidth but the Phase5 Blizzard PPC manual does not.
http://amiga.resource.cx/exp/blizzardppc http://amiga.resource.cx/manual/BlizzardPPC-de.pdf
The Blizzard PPC accelerator with PPC603e and 32-bit data/memory bus was not as high end as the Cyberstorm PPC accelerator with PPC604e and 64-bit data/memory bus. I would not be surprised if the CyberStorm MK-III 68060 accelerators had a higher memory bandwidth.
Heimdall Quote:
On Jaguar, we had Blitter, so I would have 2 scanlines in the 4 KB cache, one being written to by GPU, the other being copied in parallel by Blitter. Is there anything like that on PPC603e ?
|
No, but a CPU would not have a blitter. The Amiga chipset has a blitter but it was poorly upgraded by Commodore so even the AGA blitter is not worth using with faster 68k CPUs. Maybe the Vamp/AC hardware with SAGA has a fast enough blitter as an FPGA chipset with high memory bandwidth pulls closer to a low clocked FPGA CPU core. An FPGA blitter could have new features for 3D rendering too. Some PPC SoCs have DMA engines and discrete GPUs have DMA engines too but there may not be enough documentation to get the latter working.
Heimdall Quote:
From my perspective, even if the A500Mini sold 1 Mil units, it still wouldn't matter, as OCS is just too slow for the games I love creating and playing.
|
THEA500 Mini emulates AGA and reaches 68030 levels of performance with JIT turned on which is off by default for better compatibility. It has better performance than probably 90% of the Amigas Commodore sold but is low performance for 3D.
Heimdall Quote:
The second biggest potential market is going to be PiStorm/EMU68, as long-term, that's the only viable route given the HW keeps dying (and prices keep rising).
The numbers on Vampires sold vary wildly, so I'll take a guess that it sold over 5,000 units.
|
I have heard 10,000 units but many would be older, cheaper and less capable accelerators. Maybe the newer more expensive V4 hardware reaches 5,000 units.
Heimdall Quote:
I have zero idea as to how many active 060s are in the wild. Is it even 1,000? I doubt there are more of them than V2/V4 Vampires, though...
It probably doesn't help that each day more of us die of old age and life...
|
I expect the number of 68060 accelerators exceeds the number of V4 hardware. There are over a dozen 68060 accelerator designs including at least 4 *new* designs released in the last 20 years and still available today. There are more planned too. The supply of 68060s has dried up and the few that pop up sell for several times what they did a few years ago. I expect there are easily 10,000 68060 accelerators in working Amigas and double that would not surprise me. The number of 68060 accelerators that have been sold is likely higher but the ancient hardware is dying as well as the people. It is a problem for the PiStorm as well even though some Amiga users prefer their museum pieces over new standalone hardware. Well, standalone emulation hardware is not an Amiga but using real Amiga hardware for only the keyboard and mouse is far removed from the original Amiga philosophy. I still think the majority of Amiga fans stay away from the Amiga market but the remaining optimists seem to be perfectly happy with emulation despite the abandonment of the Amiga philosophy and elegance.
|
|
|
|
Karlos
|  |
Re: Integrating Warp3D into my 3D engine Posted on 4-Feb-2025 11:10:00
| | [ #68 ] |
|
|
 |
Elite Member  |
Joined: 24-Aug-2003 Posts: 4843
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition! | | |
|
| @Heimdall
Quote:
What kind of RAM access throughput can one expect on PPC603e ? |
That depends on which RAM you are writing to. Chip RAM writes were the worst and local Fast RAM writes were the best. I don't have the numbers to hand, but VRAM on the BVision (which was designed to work with the 603e card) was still only around 20MB/s, though I could be misremembering that so don't take it as read. _________________ Doing stupid things for fun... |
|
|
|
Heimdall
|  |
Re: Integrating Warp3D into my 3D engine Posted on 4-Feb-2025 14:40:22
| | [ #69 ] |
|
|
 |
Member  |
Joined: 20-Jan-2025 Posts: 47
From: North Dakota | | |
|
| @matthey
Quote:
I used to be on the Apollo team and tried to talk Gunnar into planning for an ASIC but he optimized the Apollo core and Apollo ISA for a FPGA instead. Only Amiga makes it possible to have a large 68k Amiga market and pitiful hardware.
|
5 years ago it would have been ludicrous, but given current prices for 060 rev6, we might as well crowdfund an ASIC.
I wonder, is the reasonable amount for 060 ASIC still in the $10 Mil range these days? |
|
|
|
Karlos
|  |
Re: Integrating Warp3D into my 3D engine Posted on 4-Feb-2025 16:20:53
| | [ #70 ] |
|
|
 |
Elite Member  |
Joined: 24-Aug-2003 Posts: 4843
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition! | | |
|
| As controversial a figure as he is, Gunnar chose FPGA. It was a practical and achievable goal. An ASIC is not, it's just a pipedream where people have millions to burn on a vanity project for the fun of it. _________________ Doing stupid things for fun... |
|
|
|
NutsAboutAmiga
|  |
Re: Integrating Warp3D into my 3D engine Posted on 4-Feb-2025 19:12:50
| | [ #71 ] |
|
|
 |
Elite Member  |
Joined: 9-Jun-2004 Posts: 12962
From: Norway | | |
|
| |
|
|
Karlos
|  |
Re: Integrating Warp3D into my 3D engine Posted on 4-Feb-2025 19:51:45
| | [ #72 ] |
|
|
 |
Elite Member  |
Joined: 24-Aug-2003 Posts: 4843
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition! | | |
|
| |
|
|
matthey
|  |
Re: Integrating Warp3D into my 3D engine Posted on 4-Feb-2025 20:52:19
| | [ #73 ] |
|
|
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2460
From: Kansas | | |
|
| Heimdall Quote:
5 years ago it would have been ludicrous, but given current prices for 060 rev6, we might as well crowdfund an ASIC.
I wonder, is the reasonable amount for 060 ASIC still in the $10 Mil range these days?
|
ASIC planning and development should have begun more than 5 years ago. There have been many 68k Amiga FPGA developments including BoXeR, MiniMig, MiST, Natami and Vampire. You may have heard of some due to 68k Atari FPGA development as well like MiST supporting both and newer Vamp/AC hardware supporting Atari graphics modes. For a better overview of post Amiga hardware, including FPGA hardware, there is Trevor's Amiwest 2024 video where he tries to answer, "What is an Amiga in 2024?"
Dickinson Keynote Speech - Amiwest 2024 https://youtu.be/RSfkPQOsmOM?t=208
My involvement began with Thomas Hirsch's Natami project which was started in 2003.
https://www.reviewmylife.co.uk/blog/2011/03/23/natami-native-amiga-interview/
Thomas was working on the SAGA chipset replacement and making it functional with real 68k CPUs but there was a limited supply of high end 68k CPUs and they offered poor value using 1990s silicon. Jens Kuenzer created the N68050 CPU core and then Gunnar von Boehn started to help him make a more advanced superscalar version. They knew each other from working at IBM Germany together.
The Natami project gained strong interest with no advertising. Many interested users and developers appeared offering to help which is where I came in. Thomas refused financial help though and tried to focus on his work. The Natami "MX Bringup Thread" had 761487 views at one point I recorded and likely went much higher. It was Thomas documenting the bringup of the Natami MX board.
Thomas Hirsch Quote:
First true enhancement
For the first time on the MX board it is getting exciting, even for me. Till now I was "only" adapting the LX design to the new board. OK, by doing that I made quite huge progress in stability and usability. The board can now even be operated stand-alone. But these were just all the many mandatory things needed for the completeness of the system itself. But now... I was able (and had the time) to improve, or rather extend ECS. As you (hopefully) know, OCS had a hard wired frame generator. Because of that there were two different Agnus chips, one for NTSC and one for PAL. With ECS this issue was resolved in a quite superior manner. They did not only implement a NTSC/PAL switch but also added a complete set of frame generation registers. From that on there was no limitation to the screen size anymore. Even the A2024 resolutions (1024x1024) were possible. This was a lot more than a common PC could offer that time (in 1988).
With the ECS frame generator it was even possible to display some VGA screen modes as 640x480. But there was still one limitation. The pixel clock was limited to fixed 28MHz. For the A2024 this was no problem, the refresh rate was set to 10Hz and the monitor itself had a built in frame buffer to display the image content at a much higher frequency. VGA and Multisync monitors had no internal memory. So this technique could not be used and resolutions that high as the A2024 were not possible to display on them. And even the 800x600 resolution needed to be in interlace because of the in comparison low pixel frequency. AGA did increase the color depth and the overall number of colors available, but left the pixel clock unchangeable.
The pixel clock on the Natami is not generated by an external oscillator. It is synthesized by a programmable PLL (Phase Locked Loop). Its frequency can be changed at run time. I now implemented an interface which allows the PLL being accessed through DFF registers. With that I am able to set up a basic screen resolution of 1280x1024 in 60Hz for a functionality test. Not system friendly, just a part of a memory field and mouse pointer. But it actually works. I have known it from the beginning that it is possible and will work, but seeing that the Natami can now match the native resolution of my test TFT is something different! I`ll send a design update to Annika as soon as I can.
And the second good news is that with the new resolution I was able to confirm that the digital portion of DVI is also working.
Chipset Features
(new) Frame generation .......... ECS and variable pixel clock -> UCS
SyncZorro Interface ....... preliminary version
Copper .................... fully implemented, with buffered data fetch
Video DMA ................. fully implemented
256 color registers ....... fully implemented
AGA HAM8 .................. fully implemented
Sprites ................... 16bit linebuffer
blitter ................... basic implementation. Block and fill mode only, line to come
Video priority ............ half implemented
Scandoubler ............... fully implemented
Interrupts ................ fully implemented
Paula DMA control ......... fully implemented
Audio out ................. fully implemented
Disk DMA .................. 880k and 1760k, read only
Serial Port Paula UART .... fully implemented
Slow peripheral I/O ....... fully implemented (Joy/Mouse/Keyb/PRT/DSK/SER)
PC mouse and kbd support .. o
CIAs ...................... fully implemented

Board Features
VGA out (DVI-A) ........... working
(new) DVI out (DVI-D) ........... working
PCI ....................... transfer only, arbiter and config missing
IDE ....................... PIO mode 0 working
Compact Flash connector ... o
NEC USB PCI ............... o
RTL 8110 LAN .............. o
Battery-backed up clock.... working
15k Video out (module) .... o
15k Video in (module) ..... o
Audio in .................. o
|
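The pixel clock limitation described in the quote comes down to simple arithmetic, sketched here in C. The function names are hypothetical. The 28 MHz figure is the one given in the quote, and the 1688x1066 totals (visible area plus blanking) used in the note below are an assumption based on the common VESA timing for 1280x1024 at 60 Hz, not something stated in the thread.

```c
#include <assert.h>

/* Pixel clock (Hz) needed to scan h_total x v_total pixel periods
   refresh_hz times per second. Totals may include blanking. */
double required_pixel_clock_hz(unsigned h_total, unsigned v_total,
                               double refresh_hz)
{
    return (double)h_total * (double)v_total * refresh_hz;
}

/* Highest refresh rate (Hz) that a fixed pixel clock can sustain. */
double max_refresh_hz(double pixel_clock_hz, unsigned h_total,
                      unsigned v_total)
{
    assert(h_total > 0 && v_total > 0);
    return pixel_clock_hz / ((double)h_total * (double)v_total);
}
```

Counting visible pixels only, `required_pixel_clock_hz(1280, 1024, 60.0)` is about 78.6 MHz, and with the assumed 1688x1066 totals it is about 108 MHz, so a fixed 28 MHz clock cannot get there; `max_refresh_hz(28.0e6, 1280, 1024)` is only around 21 Hz. By contrast, 1024x1024 at 10 Hz needs roughly 10.5 MHz, which fits under 28 MHz.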
 https://www.reviewmylife.co.uk/blog/2011/03/23/natami-native-amiga-interview/
The pressure got to Thomas and he took the project underground. Gunnar started the Apollo Team to continue the FPGA CPU development and invited me to join. Jens was there but much less involved. The early hardware came from my suggestion of helping and partnering with Majsta who had a cheap FPGA based Amiga 600 accelerator that clamped on to the 68000 like teeth, which is where the Vampire name came from. A larger FPGA was really needed from the beginning and eventually there were discussions of new boards. I suggested adding SAGA/RTG with digital output and Gunnar contacted Thomas about using SAGA. I left before Thomas rejoined the team after Gunnar accused me of sabotaging development and sent his goons to harass me on forums. Gunnar is more than capable of sabotaging his own project though. I did push for an ASIC and worked toward that goal. I contacted InnovASIC, which specializes in embedded chips, and the architect of the Fido CPU32 (68k compatible) who sounded interested in helping us create an ASIC that they could sell into the embedded market. Gunnar acted excited based on the news but then ignored me as I tried to bring it together. Gunnar continued right on optimizing for an FPGA, including the core and ISA, convincing me that it was his toy project.
It is possible to create an ASIC for much less than $10 million USD. The low end cost is likely below $1 million for a simple FPGA to ASIC conversion but the performance gain is minimal. Converting a large FPGA gains many transistors which increases performance and features for the price but the clock speed needs to be raised to 1-2 GHz to be competitive. A more expensive ASIC process and development work is needed for this and I am not sure how the costs separate out between the development costs, ASIC costs and chip costs. A professionally developed ASIC SoC would likely cost several million dollars to begin producing chips. Licensing existing IP including CPU cores, GPU cores and SoC I/O could be half a million plus each and there could be royalties, although I believe it is possible to avoid some royalties ARM charges, giving a competitive advantage. Licensed GPU cores and HDMI royalties are difficult to avoid though. The actual cost would depend on the ASIC goal with a lower end 68k Amiga MCU like the RP2040 or RP2350 costing much less than a RPi 4 SoC competitor. A FPGA could be optional for 68k retro chipsets as a FPGA is more competitive for chipsets than CPUs.
Crowd funding to produce an ASIC is a possibility but most of the development should be complete and the SoC ready to go into production. The Ouya microconsole raised $8,596,475 USD via Kickstarter back in 2013 so crowd funding everything is possible but developing a SoC ASIC is a slow process that could easily take 18 months. There are a lot of Amiga IP loose ends currently with the endless lawsuits. Uncertainty is the enemy of investment. The cost of developing a 68k Amiga ASIC SoC is low enough to be possible for small businesses. It may be a large investment for a small business but it is reasonable considering the cost of continued noncompetitive hardware. Trevor, who is in the video I linked above, has likely malinvested millions USD into PPC Amiga hardware. The X5000 and A1222 funding was at least $1.2 million.
A-EON Technology & Ultra Varisys sign $1.2M agreement for new PowerPC hardware https://web.archive.org/web/20140328052600/http://www.a-eon.com/18-10-2013-3.pdf
This does not include the X1000, AmigaOS, other software or lawsuit funding. Hyperion claimed to have spent 400,000 Euros on legal fees.
https://docs.google.com/file/d/1OeDNpvkf99a5-4F-3Y11HqE22rqjV767/edit
The Amiga Documents site claims the Hyperion lawsuits were funded by Trevor and has evidence to back it up, some of which has come to light due to the lawsuits.
https://sites.google.com/site/amigadocuments/
All that PPC AmigaNOne malinvestment and nothing to show for it but a stockpile of expensive outdated PPC hardware that has sold less than Vamp/AC FPGA hardware and ancient 68060 silicon accelerators despite the sabotage of the 68k market. Trevor claims to be an angel investor but he is no angel and no investor in the Amiga. Only Amiga and Amiga mental cases make it possible.
Last edited by matthey on 04-Feb-2025 at 09:09 PM. Last edited by matthey on 04-Feb-2025 at 09:07 PM. Last edited by matthey on 04-Feb-2025 at 09:01 PM.
|
|
|
|
NutsAboutAmiga
|  |
Re: Integrating Warp3D into my 3D engine Posted on 4-Feb-2025 22:32:56
| | [ #74 ] |
|
|
 |
Elite Member  |
Joined: 9-Jun-2004 Posts: 12962
From: Norway | | |
|
| |
|
|
matthey
|  |
Re: Integrating Warp3D into my 3D engine Posted on 4-Feb-2025 22:50:04
| | [ #75 ] |
|
|
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2460
From: Kansas | | |
|
| NutsAboutAmiga Quote:
$10 million USD
you know that’s $1 million per customer 
|
Maybe for PPC AmigaNOne. I know the PPC AmigaNOne market is in steep decline but is it that bad already?
There are hundreds of thousands of 68k Amiga users vs thousands of PPC AmigaNOne users after two decades of sabotaging the 68k Amiga. Only Amiga and Amiga mental cases make it possible.
|
|
|
|
Heimdall
|  |
Re: Integrating Warp3D into my 3D engine Posted on 5-Feb-2025 13:37:41
| | [ #76 ] |
|
|
 |
Member  |
Joined: 20-Jan-2025 Posts: 47
From: North Dakota | | |
|
| @matthey Quote:
Yea, lots of duplicate code with minor differences in Warp3D. Inlining gone bad. Maybe the programmers treated the 68k like PPC with the expensive function prologues and epilogues making function calls expensive and encouraging inlining.
PPC optimization is usually max inlining and loop unrolling but that gives max bloat and minimum code sharing.
|
I can only speak for the 3D engine portion of RISC optimizations, but I've experienced many scenarios on Jaguar, with benchmarks to prove it, where unrolling the loops was faster not just than the simple rolled loop, but faster even after paying for the following process:
1. HALT GPU
2. HALT DSP
3. HALT 68000
4. Halt Blitter (it is always busy during flatshading)
5. Copy Unrolled code into 4 KB RISC Cache using Blitter
6. Restart DSP
7. Restart GPU
8. Restart 68000
9. Wait for the routine to finish
10. Go back to Step 1 and copy the code the GPU was supposed to be doing before this
We're not talking here just about the naive performance difference between unrolled loops, but the performance of all 3-4 chips that is completely lost because all of those chips stop processing the 3D scene while the new code is being copied.
That's a metric crapton of performance lost across all the processors (which were busy processing AI, input, audio, culling, etc.), yet due to the nature of RISC, it's faster to stop everything just to copy the unrolled code.
Oh, and once the unrolled loop is done, then you have to halt all processors again, just to copy the 4 KB chunk that was supposed to be processed before the unrolled loop.
Meaning, you pay that performance price twice. Yet, it's still faster overall! Absolutely mind-boggling! Imagine if you didn't have to freeze the system just to copy a new 4 KB code chunk...
So, the term Code Bloat is not appropriate here. If you want actual performance on RISC, that's what you gotta do, otherwise we're getting maybe 15% of the true performance potential of the RISC processor...
Last edited by Heimdall on 05-Feb-2025 at 01:39 PM.
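For readers unfamiliar with the trade-off being described, here is a small C sketch of a rolled versus an 8x unrolled flat-shaded span fill. It is not Jaguar code (that would be hand-scheduled GPU assembly) and the function names are hypothetical; it only shows how unrolling trades code size for fewer loop-control instructions per pixel. A modern optimizing compiler may perform this transformation on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rolled version: one store plus loop overhead (compare, branch,
   index update) for every pixel. */
void fill_span_rolled(uint8_t *dst, size_t count, uint8_t colour)
{
    assert(dst != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        dst[i] = colour;
    }
}

/* Unrolled by 8: the loop overhead is paid once per 8 pixels, at the
   cost of a larger loop body and a tail loop for the leftovers. */
void fill_span_unrolled8(uint8_t *dst, size_t count, uint8_t colour)
{
    assert(dst != NULL || count == 0);
    size_t blocks = count / 8;
    size_t rest = count % 8;

    while (blocks--) {
        dst[0] = colour;
        dst[1] = colour;
        dst[2] = colour;
        dst[3] = colour;
        dst[4] = colour;
        dst[5] = colour;
        dst[6] = colour;
        dst[7] = colour;
        dst += 8;
    }
    while (rest--) {
        *dst++ = colour;
    }
}
```

Both functions write exactly the same pixels; the only difference is how much code sits in the inner loop, which is the size-versus-speed trade being argued about when that code has to fit in a 4 KB local memory.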
|
|
|
|
matthey
|  |
Re: Integrating Warp3D into my 3D engine Posted on 5-Feb-2025 21:54:03
| | [ #77 ] |
|
|
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2460
From: Kansas | | |
|
| @Heimdall
Early consoles like the Jaguar typically had CPUs with small instruction caches, no data caches, small scratch pad memories and multiple memories with different performance for the CPU, video and sound. This makes them more difficult to program than later consoles. The Amiga chipset combined the video and sound memory into chip memory and Commodore rarely shipped Amiga computers with fast memory so it was almost like a unified memory. The 1996 N64 changed the typical console to become more like desktop hardware with a higher clocked CPU with larger caches and unified memory but transistor budgets and improved silicon allowed this. There was still stall happy RISC hardware with high theoretical maximum performance but the PS3 PPC Cell processor was another inflection point resulting in the abandonment of difficult to program PPC consoles and the replacement with easier to program CISC CPU cores. Modern consoles are practically standardized lower end x86-64 desktop hardware which required modern silicon to finally lower the power and heat to allow x86-64 cores in a console. The x86-64 hardware is not as easy to program or as low power as 68k hardware using equivalent silicon but some people call themselves Amiga supporters while road blocking and sabotaging the 68k Amiga.
Even RISC hardware does not have to be as difficult to program. It is a philosophy and design choice. Simple instructions result in more instructions that bloat code and make programming tedious. Load/store instructions split CISC mem-reg instructions creating dependent instructions and load-to-use stalls. Minimizing the number of units means only one instruction of each type can be executed per cycle. Some so called RISC ISAs have more complex and powerful CISC like instructions and addressing modes like ARM64/AArch64. Some so called RISC hardware adds more hardware and CISC like hardware to add more units and eliminate load-to-use stalls like the in-order SiFive 7 series CPU cores. These violations of the RISC philosophy to be simple move RISC closer to CISC and make programming, optimization and debugging easier but only use some of the CISC benefits. There are also some CISC benefits RISC developers have not figured out like using a variable length encoding to not only reduce the code size but reduce the number of instructions. Modern RISC more closely resembles CISC than classic RISC but RISC developers are afraid they would not be accepted as RISC anymore if they abandon too many RISC principles. If RISC developers would abandon everything but load/store memory accesses, an easy to use and powerful RISC architecture is possible. Maybe the RISC propaganda and CISC bad reputation would finally go away and we would see the return of CISC reg-mem accesses and maybe even revival of good CISC architectures like the 68k.
Last edited by matthey on 05-Feb-2025 at 09:54 PM.
|
|
|
|
kolla
|  |
Re: Integrating Warp3D into my 3D engine Posted on 6-Feb-2025 15:02:50
| | [ #78 ] |
|
|
 |
Elite Member  |
Joined: 20-Aug-2003 Posts: 3359
From: Trondheim, Norway | | |
|
| @matthey
Quote:
There are hundreds of thousands of 68k Amiga users |
Where are they hiding!?
_________________ B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC |
|
|
|
matthey
|  |
Re: Integrating Warp3D into my 3D engine Posted on 6-Feb-2025 18:07:06
| | [ #79 ] |
|
|
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2460
From: Kansas | | |
|
| matthey Quote:
There are hundreds of thousands of 68k Amiga users
|
kolla Quote:
Where are they hiding!?
|
They are not hiding unless with a bag over the head from the current embarrassing Amiga situation and hardware. The market is divided and some are not very active. Many just play retro Amiga games and do not buy anything in the Amiga market or communicate in the Amiga community. My guess is about 300,000 68k Amiga users.
THEA500 Mini 200,000
other emulation (WinUAE and ARM/RPi) 50,000
Vamp/AC 10,000
other FPGA 20,000
original hardware 20,000
---
total: ~300,000
It is not bad considering there is no competitive hardware. Amiga fans are less likely to invest in EOL hardware and software and less likely to continue using it.
|
|
|
|
NutsAboutAmiga
|  |
Re: Integrating Warp3D into my 3D engine Posted on 6-Feb-2025 21:02:58
| | [ #80 ] |
|
|
 |
Elite Member  |
Joined: 9-Jun-2004 Posts: 12962
From: Norway | | |
|
| |
|
|