Poster | Thread | pixie
 |  |
Re: Integrating Warp3D into my 3D engine Posted on 24-Feb-2025 6:42:25
| | [ #161 ] |
| |
 |
Elite Member  |
Joined: 10-Mar-2003 Posts: 3449
From: Figueira da Foz - Portugal | | |
|
| @matthey
Quote:
You say can't before you even tried. I'm surprised you aren't still crawling from saying you can't walk as a baby. |
Would it allow for more software to be developed that could take advantage of that speed? Why isn't that happening already with the emulation route? From a user's point of view, they can't tell the difference, so why isn't more software appearing? You have a success with The A500 Mini; where are the new games being sold? Heck, where's the old IP being sold so that it can attract new developers? Is it really due to a lack of a new 68k processor?
_________________ Indigo 3D Lounge, my second home. The Illusion of Choice | Am*ga |
| Status: Online! |
| | Karlos
|  |
Re: Integrating Warp3D into my 3D engine Posted on 24-Feb-2025 9:15:23
| | [ #162 ] |
| |
 |
Elite Member  |
Joined: 24-Aug-2003 Posts: 4937
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition! | | |
|
| @matthey
Quote:
You say can't before you even tried. I'm surprised you aren't still crawling from saying you can't walk as a baby. |
Lol, is that your riposte? You don't have a single leg to stand on. Look, I said it *could* be done. Of course it could: you'd need an experienced designer (perhaps more than one) and access to fabrication and testing facilities. Custom ASICs are made for all kinds of purposes and there's an entire sector around it. Then you need to pay for it all, and you need to debug it. There could be several iterations involved before you get a verified working part.
And that's just to produce a working design. But what other factors are involved? What bus protocol will your new ASIC use? How will you interface it to memory and peripherals? In order to be useful at 1-2GHz you're not going to want a Zorro or trapdoor expansion interfaced to some paltry few tens of MB/s to your ancient RTG solution.
You're going to want support for modern solid state storage. What about RTG, 3D, USB and Networking?
It's not just about building a new ASIC, it's an entirely new system. You're going to end up redesigning all those other components too if you want to avoid potential endian issues and "keep the original vision" as you keep alluding to. We all know how you feel about byteswapping, you've complained about "swizzles" more times than I can count.
You're talking about an outlay of millions of standard Earth credits. What ROI are you going to get? You're going to have to charge prices higher than AEON does for their systems and they're using an *existing* CPU. Trevor has made this happen at his own considerable expense and even at the prices his systems sold for it's not a remotely viable product. It's a vanity project at best and he didn't have to start by *designing, implementing, fabricating and testing* the fecking CPU from scratch.
Who else would be in the market for an expensive 68K ASIC? Sure there are other retro communities but there's no ongoing business concern that would pick it up that aren't already invested in ARM, x64 or even RISC V.
You are an intelligent guy and you know your low level detail, but you appear to live in a state of complete delusion. It's really very sad to see.
You could be contributing towards a more performant emulation, with hotspot tracing, instruction rescheduling and other features to help mitigate what you consider to be the worst latencies in execution.
But that's probably about as likely as the ASIC. You couldn't even be bothered to spend the minutes it took to write a benchmark tool to test your own hypothesis! I had to do it for you.
Last edited by Karlos on 24-Feb-2025 at 09:22 AM. Last edited by Karlos on 24-Feb-2025 at 09:16 AM.
_________________ Doing stupid things for fun... |
| Status: Offline |
| | Karlos
|  |
Re: Integrating Warp3D into my 3D engine Posted on 24-Feb-2025 9:54:02
| | [ #163 ] |
| |
 |
Elite Member  |
Joined: 24-Aug-2003 Posts: 4937
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition! | | |
|
| Quote:
I'm sorry. I walk away with what 68k Amiga hardware is today like hundreds of thousands of Amiga fans. I hope you and a few thousand eternal optimists enjoy your ARM and PPC Frankenmigas. The masses have already voted with their feet. There is nothing to see in Amiga Neverland but a pitiful ending controlled by anti-Jay thinkers |
Anti Jay Thinkers. You should trademark this! The evil AJT.
Look, there aren't any Anti Jay Thinkers, except in your distorted fantasy world. Every extant computing platform works in the same fundamental way as the Amiga did, even going further:
- Dedicated peripheral hardware for graphics, sound, IO, etc, complete with DMA, bus mastering, etc.
- Asynchronous, concurrent execution of hardware accelerated operations.
- Multitasking. Asynchronous concurrent execution of software tasks: genuinely, across multiple CPU cores.
- Graphical user interface with HID input.
When Jay brought a machine to this world first sporting all these fundamental principles together, various other vendors were still asking if basic multitasking was even useful on their dull monochrome textmode beep boxes.
All that changed in pretty short order. The Amiga design legacy is literally everywhere in modern desktop computing.
Anti Jay Thinkers, lol.
-edit- Typos, etc.
Last edited by Karlos on 24-Feb-2025 at 10:13 AM. Last edited by Karlos on 24-Feb-2025 at 10:00 AM.
_________________ Doing stupid things for fun... |
| Status: Offline |
| | Karlos
|  |
Re: Integrating Warp3D into my 3D engine Posted on 24-Feb-2025 10:06:38
| | [ #164 ] |
| |
 |
Elite Member  |
Joined: 24-Aug-2003 Posts: 4937
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition! | | |
|
| Actually, now that I think about it, maybe you *do* have a point. Jay is on record as stating that he didn't believe everyone needed or should want a personal computer and that it might actually be a bad thing for society in general.
Pinning your future direction on hardware that doesn't exist and that in all probability will not ever exist, is one way to ensure that.
Except for the fact that everyone already does have a personal computer these days, whether that's a phone, tablet, laptop, desktop, smart watch, other...
_________________ Doing stupid things for fun... |
| Status: Offline |
| | matthey
|  |
Re: Integrating Warp3D into my 3D engine Posted on 24-Feb-2025 21:35:22
| | [ #165 ] |
| |
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2602
From: Kansas | | |
|
| pixie Quote:
Would it allow for more software to be developed that could take advantage of that speed? Why isn't that happening already with the emulation route? From a user's point of view, they can't tell the difference, so why isn't more software appearing? You have a success with The A500 Mini; where are the new games being sold? Heck, where's the old IP being sold so that it can attract new developers? Is it really due to a lack of a new 68k processor?
|
THEA500 Mini and A600GS, standalone hardware based on ARM Cortex-A53 SoCs, have likely sold 100,000-200,000 units together, and both qualify as 68040+ spec CPU performance and at least AGA spec. The sales volume may be approaching the number of AGA Amigas sold by Commodore. It does not feel like the community has received a large influx of new Amiga fans or even returning fans. The problem is that the hardware is not real 68k Amiga hardware and is not good enough to be taken seriously. There is not much use for such low end ARM hardware when other ARM hardware, like RPi hardware, is cheaper. The future of emulation is EOL support with no need for a roadmap showing a dead end. Virtual machines died with Amiga Nowhere, with the Java virtual machine (JVM), where poor performance resulted in Java code being compiled into native code, and with Android's Dalvik, which, due to poor performance, was replaced by the complex Android Runtime (ART) using a combination of AOT and JIT. Mentioning a VM is like trying to sell the lowest end hardware possible. "Come get your slow hardware, nobody has slower hardware than us." Even Amiga fans are not going to return en masse for crap hardware. THEA500 Mini is a facade designed to bring back nostalgic memories but everyone knows it uses emulation which is the least accurate and lowest end form of hardware recreation. A good review of the hardware is that it is adequate. Developers are highly unlikely to target these low end systems because it is well known that most of these limited devices soon end up in a drawer or the trash. Compilers rarely target a VM, and any support is more likely to come from the VM developers than the compiler developers. It is not worthwhile to optimize code for a moving VM target, and even compilers with good Amiga support like VBCC require a real 68k CPU as a target even though more Amiga users likely use Amiga VMs than real hardware today.
The Amiga market is controlled by Amiga IP squatters that may as well shout, "we are crooked amateurs now buy our crap hardware while we defend what we stole and sabotage everyone else". In contrast, the Natami project brought in returning Amiga users and developers that showed up on the Natami forum practically every day offering to help with developing or financing, all with grassroots word of mouth and no advertising. The Natami MX Bringup thread with hardware testing quickly shot up to 761,487 views, the figure I recorded, and likely went higher. Real quality and competitive hardware makes a huge difference. A real 68k CPU is important because CPU cores in FPGAs affordable to the consumer cannot be clocked up enough and emulation of the 68k can only provide a fraction of the performance of a CPU core. While JIT compilation outperforms FPGA CPU cores like the AC68080, there will never be 64-bit support, SMP support, SIMD support or other enhancements. A single 1990s era CPU core does not offer much value without new silicon and enhancements.
|
| Status: Offline |
| | Karlos
|  |
Re: Integrating Warp3D into my 3D engine Posted on 24-Feb-2025 21:49:34
| | [ #166 ] |
| |
 |
Elite Member  |
Joined: 24-Aug-2003 Posts: 4937
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition! | | |
|
| @matthey
Quote:
While JIT compilation outperforms FPGA CPU cores like the AC68080, there will never be 64-bit support, SMP support, SIMD support or other enhancements. |
Yes, because 68K AmigaOS is literally bristling with support for these things! If 64-bit and SMP updates to AmigaOS were a simple task, then NG would have gone that way already given that multicore 64-bit hardware has been available to both for years and years. It's easier to do if you want to break all kinds of backwards compatibility.
Anyway the argument you make here is unfounded. Creating a reimagined 64 bit 68K in software that's already running on a 64-bit substrate is a far simpler prospect than building a 64-bit 68K multi core ASIC, for the same reason that building anything complex in software is easier than doing it from scratch in hardware.
_________________ Doing stupid things for fun... |
| Status: Offline |
| | ZXDunny
|  |
Re: Integrating Warp3D into my 3D engine Posted on 24-Feb-2025 23:33:38
| | [ #167 ] |
| |
 |
New Member |
Joined: 7-Feb-2025 Posts: 7
From: Unknown | | |
|
| Yeah but...
The PiStorm is available now. The 1GHz+ 68060 is not and never will be.
I mean, that's like... a no-brainer really. The PiStorm is revitalising development and causing new patches to older software to be developed - which 060 users also get a benefit from - and new ports are coming through to take advantage of the increased speed and RTG graphics.
It's not a massive upheaval, but I don't see anyone losing anything due to the existence of the PiStorm.
|
| Status: Offline |
| | matthey
|  |
Re: Integrating Warp3D into my 3D engine Posted on 25-Feb-2025 0:58:10
| | [ #168 ] |
| |
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2602
From: Kansas | | |
|
| Karlos Quote:
Lol, is that your riposte? You don't have a single leg to stand on. Look, I said it *could* be done. Of course it could: you'd need an experienced designer (perhaps more than one) and access to fabrication and testing facilities. Custom ASICs are made for all kinds of purposes and there's an entire sector around it. Then you need to pay for it all, and you need to debug it. There could be several iterations involved before you get a verified working part.
|
I guess I'm down to your crawling level so maybe we can see eye to eye. I know verification and testing is a lot of the work. Using already tested and verified IP can save a lot of time but licensing is not cheap either. Fabless semi development is common today and is something even small businesses are capable of. Thomas Hirsch developed the Natami boards with his own lab and equipment. Jens and Gunnar developed the N68050 and N68070 into the AC68080 on a shoestring budget. Ideally when developing for an ASIC, a large high end FPGA would be used for development which can get expensive. Gunnar has tested the AC68080 in higher end FPGAs so even that is within his budget and he made an ASIC sound prohibitively expensive too. Once the core is prepared for the fab, they receive it and use their own equipment. It is not easy but accessible enough that the Amiga community has already developed and tested many FPGA cores which could be further developed into ASIC cores. When I was talking to Dave at InnovASIC, which specialized in creating ASICs for embedded use, we discussed specs but he never asked me about financing even though I told him I was from the Apollo team instead of a business. I did not expect an ASIC would be free but assumed some kind of partnership where they would be able to market into the embedded space. I specifically looked for embedded market end users that could increase economies of scale too. All it takes is one IoT customer that needs millions of SoCs, and the ASIC becomes cheap and lower risk. It is surprising how easy it is to talk to CEOs of some embedded businesses doing IoT but I suppose they made it to their position by being better communicators and negotiators than Gunnar.
Karlos Quote:
And that's just to produce a working design. But what other factors are involved? What bus protocol will your new ASIC use? How will you interface it to memory and peripherals? In order to be useful at 1-2GHz you're not going to want a Zorro or trapdoor expansion interfaced to some paltry few tens of MB/s to your ancient RTG solution.
You're going to want support for modern solid state storage. What about RTG, 3D, USB and Networking?
|
Internally, the standard HDL in the US is Verilog and the standard bus is AMBA.
https://en.wikipedia.org/wiki/Advanced_Microcontroller_Bus_Architecture
ColdFire uses Verilog and AMBA. The 68060 is likely written in Verilog but predates AMBA even though it is written to be modular too. It would be useful to license the ColdFire V5 also as it is very similar and parts could be used to quickly upgrade and enhance the 68060, such as a hardware return stack, ColdFire ISA instructions, improved branch prediction and 32kiB I+D L1 caches. There is a full static design 68000 core as well. As far as modern and tested I/O goes, licensing SiFive IP, which is also written in Verilog and uses AMBA, would likely be the easiest. Connecting the modules as needed should provide a mostly modern working system but some work may be needed for the Amiga chipset and system. Most legacy I/O could be provided on board headers. I would hope there would be no RTG but rather chunky modes along with legacy modes. There are several AGA cores to choose from. SAGA is one of the nicest but it is written in VHDL along with the AC68080 core although it was translated from the original AHDL. I believe an integrated 3D GPU is necessary to be competitive for small SBCs and it makes sense as many discrete GPUs have stopped supporting big endian modes. There are many GPU cores available to license.
Karlos Quote:
It's not just about building a new ASIC, it's an entirely new system. You're going to end up redesigning all those other components too if you want to avoid potential endian issues and "keep the original vision" as you keep alluding to. We all know how you feel about byteswapping, you've complained about "swizzles" more times than I can count.
|
Endianness can be a real pain in software. It is easier to fix in hardware as often the wires can be rewired as desired. I doubt there would be a major problem replacing the RISC-V core of SiFive IP with a 68k core although it is a good question to ask hardware engineers. Some licensable GPUs have likely already dropped/removed BE support but there are enough licensable GPUs available that I doubt it would be a problem.
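For anyone following along who hasn't hit the problem, here is a minimal sketch of the software-side "swizzle" being discussed: a big-endian CPU such as the 68k writes a 32-bit value to memory, a little-endian device reads the same bytes, and software has to swap them back. The helper name `swap32` and the sample pixel value are made up for illustration and are not taken from any Amiga or Warp3D API.

```python
import struct

def swap32(value: int) -> int:
    """Reverse the byte order of a 32-bit unsigned value (hypothetical helper)."""
    return int.from_bytes(value.to_bytes(4, "big"), "little")

# An arbitrary 32-bit value, e.g. one ARGB pixel (illustrative only).
pixel = 0x11223344

# How a big-endian CPU lays the value out in memory: 11 22 33 44.
big_endian_bytes = struct.pack(">I", pixel)

# A little-endian device interpreting those same bytes sees a swizzled value.
misread = struct.unpack("<I", big_endian_bytes)[0]

# One explicit byte swap in software restores the intended value.
restored = swap32(misread)
```

Doing this per word in software costs instructions on every transfer, which is the overhead that rewiring the byte lanes in hardware would avoid.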
Karlos Quote:
You're talking about an outlay of millions of standard Earth credits. What ROI are you going to get? You're going to have to charge prices higher than AEON does for their systems and they're using an *existing* CPU. Trevor has made this happen at his own considerable expense and even at the prices his systems sold for it's not a remotely viable product. It's a vanity project at best and he didn't have to start by *designing, implementing, fabricating and testing* the fecking CPU from scratch.
|
There is risk but it is still better than flushing cash down the drain for fantasy "desktop" hardware. The 68k and ColdFire were very popular for embedded use and there is no doubt a market for replacement hardware, developers who like the 68k/ColdFire and developers who like ease of development. It is a tough sell against entrenched ARM hardware but not nearly as tough as PPC hardware with the AmigaOS for the desktop market. The embedded market is huge and diverse. The RPi entered the embedded market with a SoC using a single scalar 32-bit ARM11 CPU core and quickly defined an embedded hardware form factor and GPIO layout. The RPi Fabless Semi developed SoCs use dual core superscalar CPU cores that clock to 150MHz in the latest ASIC which they sell in the millions at ~$1 each. Dual ARM Cortex-M33@150MHz, 520kiB of SRAM and 2MiB NOR flash (like ROM/kickstart) is higher spec than the most popular 68k Amiga OCS spec but does not have HDMI. The 68k AmigaOS has the code sharing they need to allow useful HDMI with a not much larger spec. I believe there is room in the embedded market for both as they would have different power, performance and features. There is no room in the embedded market for a Cortex-A53@1.4GHz emulating a 68060@150MHz though.
Karlos Quote:
Who else would be in the market for an expensive 68K ASIC? Sure there are other retro communities but there's no ongoing business concern that would pick it up that aren't already invested in ARM, x64 or even RISC V.
|
Very low priced retro toys could easily become embedded hardware. The original RPi was thought to be a toy from an unknown when it first came out.
Desktop OoO x86-64 hardware does not scale low enough to be a threat to a 68k in-order core. RISC-V has competitive enough low end hardware but lacks software (especially games), maturity, ease of use and standards. For example, see the following videos of VisionFive 2 games.
Games on VisionFive 2 https://www.youtube.com/playlist?list=PLwFl__0ckx_xHsGp-4QWRGEskkbdxMO7q
Emulation, no standard OS, compiling games before playing, bugs. The CPU and GPU performance for this sub $100 SBC is on par with old PPC hardware though. The SoC uses RISC-V in-order 8-stage SiFive U74 CPU cores that have better performance/MHz than a PPC G5 CPU. The design resembles a 68060 design and gains performance from avoiding load-to-use stalls but the RISC-V ISA can not fully take advantage of the design.

https://en.wikichip.org/wiki/sifive/microarchitectures/7_series#Memory_subsystem Quote:
Memory subsystem
The largest change in the 7 Series is the overhaul of the memory subsystem. The data cache and the optional tightly integrated memory (TIM) can now span two cycles, enabling large SRAM/TIM to be included with the core. Additionally, two sets of ALUs have been incorporated into the pipeline in order to allow a zero cycle load-to-use latency where the first stage is used for the address generation and the last stage can be used to operate on the data.
|
ALUs are relatively cheap by today's CPU core standards. RISC-V has the better in-order "CISC" design but a weak ISA. ARM has a stronger ARM64 ISA but weak low power in-order CPU designs and reduced power efficiency high end in-order designs. ARM knew load-to-use stalls were performance killers as they had just reduced the Cortex-A7 load-to-use penalty from 2 cycles to 1 cycle and promoted the "performance optimizing feature" before downgrading the Cortex-A53 successor to a 3 cycle load-to-use penalty.
https://community.arm.com/cfs-file/__key/telligent-evolution-components-attachments/01-2142-00-00-00-00-45-56/Enabling_5F00_Mobile_5F00_Innovation_5F00_with_5F00_the_5F00_Cortex_2D00_A7_5F00_Processor.pdf Quote:
Memory System Tuned to Minimize memory latency
There are several performance optimizing features in the memory system. The address generation unit is shifted one stage back in the pipeline to enable a single cycle load-use penalty. The design team increased TLB size to 256 entries, up from 128 entries for the Cortex-A5 and Cortex-A9; this reduces page walks saving power and significantly improves performance for large workloads like web browsing with large data sets that span a large number of pages. Also, page tables entries can be cached in L1, improving the speed of page table walks on TLB misses. The bus interface unit has support for multiple outstanding read and write transactions. Finally, the physically indexed caches enable efficient OS Context switching.
|
Sometimes architects change and lessons are unlearned. Customers will buy ARM no matter what because ARM has the best embedded chips. Reputations can be better than products but many non-technical people make hardware decisions and nobody ever got fired for buying IBM or ARM. ARM improved the load-to-use penalty from 3 to 2 cycles in the Cortex-A55 but the Cortex-A53 is still more popular. ARM did not advertise their Cortex-A53 load-to-use penalty like they did when it was good and were perhaps too embarrassed to tell customers they fixed the "performance degrading feature" with the Cortex-A55.
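To put rough numbers on why the load-to-use penalty matters, here is a toy model of a single-issue in-order pipeline. It is only a sketch under heavy assumptions: one instruction issues per cycle, ALU results forward with no stall, every load hits the cache, and the five-instruction program and register names are invented for illustration. It does not model any real core.

```python
def count_cycles(program, load_to_use_penalty):
    """Count issue cycles for a toy single-issue in-order pipeline.

    program: list of (op, dest, sources) tuples, op is "load" or "alu".
    A value produced by a load becomes usable load_to_use_penalty
    cycles later than a value produced by an ALU instruction.
    """
    ready = {}  # register -> first cycle in which its value can be consumed
    cycle = 0   # earliest cycle in which the next instruction may issue
    for op, dest, sources in program:
        # In-order issue: wait until every source register is ready.
        issue = max([cycle] + [ready.get(reg, 0) for reg in sources])
        extra = load_to_use_penalty if op == "load" else 0
        ready[dest] = issue + 1 + extra
        cycle = issue + 1
    return cycle

# Hypothetical instruction sequence: two loads, each followed by a consumer.
program = [
    ("load", "r1", ["a0"]),
    ("alu",  "r2", ["r1", "r3"]),  # uses the load result immediately
    ("load", "r4", ["a1"]),
    ("alu",  "r5", ["r6", "r7"]),  # independent work
    ("alu",  "r8", ["r4", "r5"]),  # uses the second load one slot later
]

# Penalties of 0 to 3 cycles, the range quoted in the post above.
results = {penalty: count_cycles(program, penalty) for penalty in range(4)}
```

With this particular sequence the model gives 5, 6, 8 and 10 cycles for penalties of 0, 1, 2 and 3. Real code has fewer back-to-back dependencies and compilers schedule around loads, so the gap on actual hardware will be smaller than this worst-case style example suggests.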
Karlos Quote:
You are an intelligent guy and you know your low level detail, but you appear to live in a state of complete delusion. It's really very sad to see.
You could be contributing towards a more performant emulation, with hotspot tracing, instruction rescheduling and other features to help mitigate what you consider to be the worst latencies in execution.
But that's probably about as likely as the ASIC. You couldn't even be bothered to spend the minutes it took to write a benchmark tool to test your own hypothesis! I had to do it for you.
|
There is no future for the Amiga on the current path. The Amiga powers like the status quo and sabotage anyone who tries to alter the demise of the Amiga. No more wasted Amiga development from me!
|
| Status: Offline |
| | Karlos
|  |
Re: Integrating Warp3D into my 3D engine Posted on 25-Feb-2025 7:54:47
| | [ #169 ] |
| |
 |
Elite Member  |
Joined: 24-Aug-2003 Posts: 4937
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition! | | |
|
| @matthey
Quote:
Very low priced retro toys could easily become embedded hardware. The original RPi was thought to be a toy from an unknown when it first came out |
How on earth, given everything you just said, is your completely custom ASIC solution, designed from the ground up and mixing in various licensed IP to solve all the other edge/peripheral cases going to be "low priced" ? This is the bit you just don't seem to get.
The Pi was inexpensive because the chip already existed and ARM devices tend to have all the additional hardware you might want already. ARM was already well established in the embedded space by the time the Pi was released.
Yes there are some uses of coldfire still in the embedded sector. Why don't you compare how many there are relative to arm and even x86/x64?
Your custom solution, if it was ever built, would not be inexpensive. Unless you think all the money required to perform the design and verification is not going to be passed on to the end consumer. You should have asked specifically about the costs when talking to "Dave" and not made any assumptions of any kind about how it might be financed. Just because he heard you out and didn't ask doesn't mean he thought your idea was viable. He may just have been being polite.
_________________ Doing stupid things for fun... |
| Status: Offline |
| | Karlos
|  |
Re: Integrating Warp3D into my 3D engine Posted on 25-Feb-2025 11:53:04
| | [ #170 ] |
| |
 |
Elite Member  |
Joined: 24-Aug-2003 Posts: 4937
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition! | | |
|
| @matthey
Quote:
There is no future for the Amiga on the current path. The Amiga powers like the status quo and sabotage anyone who tries to alter the demise of the Amiga. No more wasted Amiga development from me! |
Statements like this make me worry for your state of mind. With respect, it's 2025. The demise already happened decades ago. There was a brief moment where NG development looked like it might pick up the reins, but it too ended up as an "also ran".
The legacy of the Amiga's overall approach to computing is everywhere, but the Amiga itself is nowhere. You couldn't bring it back today, and meet today's expectations on security, multi user, SMP, 64-bit etc without changing everything that made it such a joy to use in the first place. If you wanted a modern take on the operating system, at best you're going to be running all the legacy code in a 32-bit sandbox. It's going to be no different than running something like AROS natively on x64 or ARM with the ability to run old 68K code in a box. Which you can do already, only without the expense of a bespoke hardware platform.
Don't get me wrong. If you offered me some supercharged 68K on a motherboard with backwards hardware compatibility and quasi modern performance, at a remotely sane price, I'd take it in a heartbeat. But you aren't offering that. You're offering some delirium what-ifery that is so long in the tooth now. How many years have you been beating the drum for the ASIC approach?
No, you'd rather deride other people's efforts to create something that users can actually make use of, here and now, as "killing the Amiga future".
Seriously, sort your life out.
_________________ Doing stupid things for fun... |
| Status: Offline |
| | matthey
|  |
Re: Integrating Warp3D into my 3D engine Posted on 25-Feb-2025 19:40:49
| | [ #171 ] |
| |
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2602
From: Kansas | | |
|
| Karlos Quote:
How on earth, given everything you just said, is your completely custom ASIC solution, designed from the ground up and mixing in various licensed IP to solve all the other edge/peripheral cases going to be "low priced" ? This is the bit you just don't seem to get.
|
Economies of scale from the retro 68k market and embedded market could pay for the ASIC. An ASIC allows the price to go down which Retro Games Limited understood when they approached Jeri Ellsworth about creating an ASIC for the upcoming THEA500 Mini. Value is better if not just compatibility is maintained but performance improved and modern I/O supported. The ASIC production cost does not increase much by providing performance and modern I/O with compatibility as can be seen in the Vortex86 ASICs. The 68k series is considered EOL which should allow reduced license costs. The Vortex86 line of ASICs use EOL x86 cores and even appears to have swapped from a Rise Technology developed superscalar 8-stage mP6 CPU core to a lower power scalar 6-stage Cyrix 5x86 core. There are at least 9 different Vortex86 ASICs.
1. Vortex86 (original)
2. Vortex86SX
3. Vortex86DX
4. Vortex86MX
5. Vortex86MX+
6. Vortex86DX2
7. Vortex86EX
8. Vortex86DX3
9. Vortex86EX2
https://en.wikipedia.org/wiki/Vortex86
Having to downgrade to a scalar x86 CPU core to save power for embedded use makes the ASIC SoCs less competitive in the embedded market but the combined sales into the embedded market and x86 retro market has allowed for 9 ASICs. The 68k dominated the x86 in the embedded market during the 1990s because the 68k is lower power and scales lower, even all the way down to ColdFire MCUs. In 1997, the 68k was #1 in the 32-bit embedded market with a volume of 79.3 million CPU shipments compared to #6 x86 with a volume of 9 million. There were more 32-bit 68k CPU shipments than 32-bit x86 desktop shipments in 1997! A superscalar 68k CPU core would be more competitive than a scalar x86 CPU core for the embedded market but it did not stop the creation of 9 x86 ASICs. Do you think 9 ASICs were created without them being profitable?
Karlos Quote:
The Pi was inexpensive because the chip already existed and ARM devices tend to have all the additional hardware you might want already. ARM was already well established in the embedded space by the time the Pi was released.
|
The RPi using commodity SoCs was no doubt very cheap, with ridiculously low capital investment, especially for the tech industry, for RPI to become a 1.15B GBP market cap business today (after a recent loss of about 20%). Many tech businesses require billions in investment and do not turn profitable for many years. A professional quality 68k ASIC likely requires 3-7 million USD of investment and maybe 5-10 million USD to bring products to market. You and Gunnar may think that is impossible when business people would look at it and think it is cheap before writing a check. Yes, there are investors with millions of USD in their checking accounts and more in their investing accounts. Also, RPi produces their own "custom" ASIC SoCs which have been hugely successful and sell in the millions.
https://en.wikipedia.org/wiki/RP2040 https://en.wikipedia.org/wiki/RP2350
The RPi ASIC SoCs are not compatible or standardized with other SoCs, have very limited OS support and have no GUI or HDMI support. An ASIC 68k Amiga SoC would be compatible with the AmigaOS and a large library of 68k Amiga software with other 68k software compatibility possible with a small FPGA and could provide a GUI and HDMI output with not much larger footprint.
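The economies of scale argument is easier to judge with the arithmetic written out. The sketch below simply spreads the up-front investment over the number of units sold. The investment figures are the 5-10 million USD range from this post; the 100,000-200,000 unit volumes are the THEA500 Mini and A600GS sales estimate from earlier in the thread, and the 1,000,000 unit volume is a hypothetical stand-in for a single IoT customer needing "millions" of SoCs. Per-unit production cost, licensing fees and margin are deliberately left out, so this is a lower bound on what each unit would have to carry.

```python
def nre_per_unit(investment_usd: float, units: int) -> float:
    """Up-front (non-recurring) investment spread evenly over units sold."""
    return investment_usd / units

# Investment range quoted in the post; volumes are assumptions (see above).
investments = [5_000_000, 10_000_000]
volumes = [100_000, 200_000, 1_000_000]

table = {
    (investment, units): nre_per_unit(investment, units)
    for investment in investments
    for units in volumes
}

for (investment, units), cost in table.items():
    print(f"${investment:,} over {units:,} units -> ${cost:,.2f} per unit")
```

At retro-market volumes the up-front cost adds 25 to 100 USD to every unit, while at a million units it drops to 5 to 10 USD, so the disagreement in this thread is really about which volume is realistic.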
Karlos Quote:
Yes there are some uses of coldfire still in the embedded sector. Why don't you compare how many there are relative to arm and even x86/x64?
|
Even the 68000 can still be purchased directly from NXP.
https://www.nxp.com/products/MC68000?tab=Buy_Parametric_Tab&_gl=1*13uw9t*_ga*MTE5MjU2NDU5OC4xNzQwNTA2NTQ2*_ga_WM5LE0KMSH*MTc0MDUwNjU0Ni4xLjAuMTc0MDUwNjU0Ni4wLjAuMA..#/
It is out of production and EOL but there is still embedded demand after 46 years! EOL, no development and no roadmap is a problem for developers, as we saw with emulation, and it applies to hardware too which is why it would be good to get 68k and ColdFire hardware back into production. Big businesses go after big fish often leaving profitable niches behind.
Karlos Quote:
Your custom solution, if it was ever built, would not be inexpensive. Unless you think all the money required to perform the design and verification is not going to be passed on to the end consumer. You should have asked specifically about the costs when talking to "Dave" and not made any assumptions of any kind about how it might be financed. Just because he heard you out and didn't ask doesn't mean he thought your idea was viable. He may just have been being polite.
|
It is common that all capital and R&D expenditures are not passed on to a first product. Many businesses take years to become profitable. Consider Roblox (RBLX) which has not made a net profit in the last 5 years.
https://www.google.com/finance/quote/RBLX:NYSE
The Roblox revenue is up every year while the net profit has been down every year until 2024 which was good enough for the stock price to rise over the last year. It is very questionable whether this stock will ever be profitable but I believe there is a bubble in tech stocks reminiscent of 2000. I would rather invest in my idea than Roblox. I am watching the profitable RPI though as it falls to a more reasonable valuation. Stock sell offs often scare investors out of the market and create value opportunities but I usually like some dividend protection.
|
| Status: Offline |
| | Karlos
|  |
Re: Integrating Warp3D into my 3D engine Posted on 25-Feb-2025 21:09:22
| | [ #172 ] |
| |
 |
Elite Member  |
Joined: 24-Aug-2003 Posts: 4937
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition! | | |
|
| @matthey
9 iterations of an x86 ASIC only demonstrate that x86 is still in demand. There are doubtless many industrial applications that still need them and very specifically don't need or aren't compatible with a full 64 bit refresh. Not to mention the increased costs of such an upgrade where it's not strictly needed.
Whatever demand there is for 68K in this space seems to be covered by the fact you can still get 68000, 68020 and coldfire. There doesn't seem to be any demand for the 68040/68060. Things that needed them either moved on to coldfire later or to completely alternative processors. _________________ Doing stupid things for fun... |
| Status: Offline |
| | matthey
|  |
Re: Integrating Warp3D into my 3D engine Posted on 25-Feb-2025 22:08:40
| | [ #173 ] |
| |
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2602
From: Kansas | | |
|
| Karlos Quote:
9 iterations of an x86 ASIC only demonstrate that x86 is still in demand. There are doubtless many industrial applications that still need them and very specifically don't need or aren't compatible with a full 64 bit refresh. Not to mention the increased costs of such an upgrade where it's not strictly needed.
|
It is easier to convert x86 to ARM due to both being little endian. However, most "industrial" embedded is big endian.
https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/9tg4gGg8w-c/m/ADy-5lrHAgAJ Shumpei Kawasaki Quote:
I made that comment in the marketing members' meeting.
99 percent of PC, mobile and data center applications are little-endian but it is also true that 90 percent industrial and infrastructure applications are big-endian. ARM users actively use its big-endian mode and will continue to do so.
The bi-endian feature can improve performance or simplify the logic of networking devices and software. Many architectures (ARM, PowerPC, Alpha, SPARC V9, MIPS, PA-RISC, SuperH SH-4 and IA-64) feature a setting which allows for switchable endianness in data segments, code segments or both (Source: https://en.wikipedia.org/wiki/Endianness).
GNU Compiler Collections, binutils, Linux, UEFI and other cross tools and OSes support bi-endian in clean manners. It is more cross tool work that is needed and work in hardware. We can start some ground work to provide a bi-endian platform for RISC-V and RISC-V GCC shows no prior bi-endian work so developers will need to work with community. We know that this reduces porting work involved in convert applications to RISC-V.
This feature on RISC-V will creates an easier transition path from PowerPC, SH, 68K, and Coldfire.
|
The thread is about the lack of big endian support in RISC-V and Shumpei is specifically talking about Japan but the rest of the world is similar. The 68k, SuperH, ColdFire, MIPS and PPC dominated the "industrial" embedded market and defaulted to big endian. Many ARM users were using the better big endian support of ARM but that is deprecated and has been dropped on newer cores. Not only was the 68k 32-bit embedded market nearly 9 times larger than the x86 32-bit embedded market but actively updated 68k ASICs like the Vortex86 ASICs would gain ColdFire, SuperH, PPC and MIPS industrial embedded customers. ColdFire is a nearly compatible subset of the 68k and most new ColdFire instruction encodings are open and can be easily supported on the 68k while improving code density (Gunnar finally saw the light years after I pushed the idea) and SuperH assembly code can be mistaken for 68k code at first glance with the 68k ISA being more flexible and able to support many SuperH instructions one for one. Read the rest of the RISC-V forum thread. RISC-V was originally going to be big endian but later little endian was chosen for x86 and ARM compatibility with little thought given to BE support. I would rather have different and even unique products rather than trying to compete head to head with x86 and ARM. Commodore discovered this when the bottom fell out of the commodity x86 clone market while Amiga computers remained profitable and demand resilient.
Karlos Quote:
Whatever demand there is for 68K in this space seems to be covered by the fact you can still get 68000, 68020 and coldfire. There doesn't seem to be any demand for the 68040/68060. Things that needed them either moved on to coldfire later or to completely alternative processors.
|
The 68000 CPUs are unique and likely used for hobby, educational markets and old embedded products. Even low production 68k cores today are likely FPGA cores in FPGA SoCs. The 68040 and 68060 are no longer competitive because they are not SoCs. ColdFire used simple SoCs but they become noncompetitive due to aging silicon and old I/O over time. New ASICs do not even require frequent updates as the Vortex86 ASICs show but development, support and periodic enhancements to keep them current are required. If the whole line of 68k CPU cores was licensed, it may be possible to sublicense them for FPGA cores like ColdFire cores are by a 3rd party today. Relatively small fabless semi businesses like SiFive, RPI and DM&P Electronics look good compared to perpetually subsidized Trevor Amiga businesses.
Last edited by matthey on 26-Feb-2025 at 12:49 AM.
|
| Status: Offline |
| | kolla
|  |
Re: Integrating Warp3D into my 3D engine Posted on 26-Feb-2025 2:39:44
| | [ #174 ] |
| |
 |
Elite Member  |
Joined: 20-Aug-2003 Posts: 3418
From: Trondheim, Norway | | |
|
| @matthey
Quote:
It does not feel like the community has received a large influx of new Amiga fans or even returning fans. The problem is that the hardware is not real 68k Amiga hardware |
No, the “problem” is that a vast majority of the THEA500 owners are primarily _gamers_ and not users. Same for the vast majority of amiga owners in the past, it wasn’t the hardware itself that attracted them, it was games and various dedicated software. Once games and the software moved to other platforms, they moved too. _________________ B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC |
| Status: Offline |
| | Hammer
 |  |
Re: Integrating Warp3D into my 3D engine Posted on 26-Feb-2025 4:57:46
| | [ #175 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6320
From: Australia | | |
|
| @matthey
Quote:
There is risk but it is still better than flushing cash down the drain for fantasy "desktop" hardware. The 68k and ColdFire were very popular for embedded use and there is no doubt a market for replacement hardware, developers who like the 68k/ColdFire and developers who like ease of development. It is a tough sell against entrenched ARM hardware but not nearly as tough as PPC hardware with the AmigaOS for the desktop market. The embedded market is huge and diverse. The RPi entered the embedded market with a SoC using a single scalar 32-bit ARM11 CPU core and quickly defined an embedded hardware form factor and GPIO layout.
|
The 1st Raspberry Pi B used Broadcom's BCM2835 SoC (ref 1) that contains ARM1176JZFS (ARMv6, VFPv2 FPU, Jazelle) and Broadcom VideoCore 4 iGPU .
ARM1176JZF-S includes DSP SIMD instructions for 32-bit registers (ref 2). 32bit SIMD pack math for quad 8bit and dual 16bit datatypes.
From Raspberry Pi (ref 1).
Capable of BluRay quality playback, using H.264 at 40MBits/s.
Provides Open GL ES 2.0, hardware-accelerated OpenVG, and 1080p30 H.264 high-profile decode.
Capable of 1Gpixel/s, 1.5Gtexel/s or 24 GFLOPs of general purpose compute and features a bunch of texture filtering and DMA infrastructure.
That is, graphics capabilities are roughly equivalent to Xbox 1 level of performance.
Broadcom Videocore 4 GPU has claims of the original Xbox performance level. Notice the comparison is against a well known game console.
ARM1176JZFS's 700 Mhz clock speed is close to the original Xbox's Intel Coppermine 128K's 733 Mhz clock speed target.
Broadcom's BCM2835 SoC is more than the solo 68060 CPU.
Reference 1. https://raspberry-projects.com/pi/pi-hardware/raspberry-pi-model-b/hardware-general-specifications
2. https://developer.arm.com/documentation/dui0425/f/hardware-description/arm1176jzf-s-development-chip/arm1176jzf-s-development-chip-overview
Quote:
Not only was the 68k 32-bit embedded market nearly 9 times larger than the x86 32-bit embedded market
|
Break down the model sales for 68000, 68020/68030/CPU32, 68040 and 68060.
68060B R&D was canceled.
You will find 68K's large scale embedded sales didn't translate into big core 68K models.
I'm game for another DataQuest based debate.
Embedded AI for smart devices requires higher level compute power for embedded markets.
For example https://en.wikipedia.org/wiki/Mobileye#/media/File:Lane_Guidance_Camera_PCB.jpg
Mobileye EyeQ2 architecture consists of two floating point, hyper-threaded 32-bit RISC MIPS34K CPUs, five Vision Computing Engines (VCE), three Vector Microcode Processors (VMP), a Denali 64-bit Mobile DDR controller, a 128-bit internal Sonics interconnect, dual 16-bit video input and 18-bit video output controllers, 16 channels of DMA and several peripherals. One MIPS34K CPU manages the five VCEs, the three VMPs, the multi-channel DMA, the second MIPS34K CPU and the other peripherals. The five VCEs, three VMPs and the MIPS34K CPU perform all the intensive vision computations required by applications such as tracking and pattern classification.
Mobileye EyeQ2 was on the market in 2010.
For 2024, Mobileye EyeQ6 High has 34 TIOPS for INT8 and is still based on the 64-bit MIPS I6500 architecture, like the EyeQ5. The 2 clusters of this SoC contain 4 cores which are capable of running 4 threads. Besides this, it features multiple controllers such as the classic UART, high speed I2C, SPI, as well as CAN-FD, PCIe Gen4, Octal/Quad SPI Flash interface, Gigabit Ethernet, MIPI CSI-2, MIPI DSI, and eMMC 5.1. It also includes a Hardware Security Module, Functional Safety Hardware, video encoders and more.
https://www.phoronix.com/news/Linux-6.11-Mobileye-EyeQ6H https://en.wikipedia.org/wiki/Mobileye
Quote:
Much of the JIT performance comes from caches. The 68k code translated to ARM code likely more than doubles in size but modern CPU cores have 4 times the L1 caches of the 68060. L2 caches are much better performance than the 68060 memory and even when the memory must be accessed, modern memory is much better too. The max CPU performance is when the code and data is in the L1 caches which is what we are looking at here and JIT is not good at keeping code in the L1 due to simple translation and much worse code density ISAs with ARM64 and x86-64. The translated code is not nearly as efficient as native compiled code either. The Karlos emububble benchmarks only examine all L1 cache performance which should scale linearly with the clock speed.
|
On Zen 4, below the L1 instruction cache and after the decode stage, the micro-op (mops) cache can hold 6.75K micro-operations, dispatch is 9 mops per cycle, and the loop buffer / mops queue can hold 144 entries.
I plan to disable the mops cache and assess the impact with Karlos' emububble benchmark. Disabling SMT gives a single thread a larger share of cache storage.
Last edited by Hammer on 26-Feb-2025 at 03:16 PM.
_________________ Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68) Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68) Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB |
| Status: Offline |
| | Hammer
 |  |
Re: Integrating Warp3D into my 3D engine Posted on 26-Feb-2025 6:54:07
| | [ #176 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6320
From: Australia | | |
|
| @matthey
Quote:
Economies of scale from the retro 68k market and embedded market could pay for the ASIC. An ASIC allows the price to go down which Retro Games Limited understood when they approached Jeri Ellsworth about creating an ASIC for the upcoming THEA500 Mini. Value is better if not just compatibility is maintained but performance improved and modern I/O supported. The ASIC production cost does not increase much by providing performance and modern I/O with compatibility as can be seen in the Vortex86 ASICs. The 68k series is considered EOL which should allow reduced license costs. The Vortex86 line of ASICs use EOL x86 cores and even appears to have swapped from a Rise Technology developed superscalar 8-stage mP6 CPU core to a lower power scalar 6-stage Cyrix 5x86 core. There are at least 9 different Vortex86 ASICs.
1. Vortex86 (original)
2. Vortex86SX
3. Vortex86DX
4. Vortex86MX
5. Vortex86MX+
6. Vortex86DX2
7. Vortex86EX
8. Vortex86DX3
9. Vortex86EX2
|
1. Vortex86 has RISE mP6 and MMX. https://encyclopedia.pub/entry/32847
2. Vortex86SX.
https://archive.org/details/bitsavers_dmpelectrox86SXBriefDataSheet_2527972/page/4/mode/2up Vortex86SX has split 16KB instruction and 16KB data cache L1 design.
i486 has unified 8KB L1 cache design.
Cyrix 5x86 has unified 16KB L1 cache design. https://www.cpu-world.com/CPUs/5x86/Cyrix-5x86-100GP.html
3. Vortex86DX has a split 16KB instruction and 16KB data cache L1 design. Includes 256KB L2 cache https://www.dmp.com.tw/tech/vortex86dx/Vortex86DX_V0.9A_Brief.pdf
Vortex86-DX is an i586-class CPU and lacks cmov.
4. Vortex86MX appear to have implemented MMX instructions. https://encyclopedia.pub/entry/32847
5. Vortex86MX+,
6. Vortex86DX2, https://www.cpu-world.com/cgi-bin/CPUID.pl?CPUID=66102 With FPU, MMX and CMPXCHG8B.
7. Vortex86EX,
8. Vortex86DX3 has a split 32 KB instruction and 32KB data L1 cache, 512KB L2 cache and dual CPU cores. https://www.vortex86.com/products/Vortex86DX3
9. Vortex86EX2 has a split 16KB instruction and 16KB data L1 cache, 256KB L2 cache, MMX, SSE, SSE2, SSE3, SSSE3, CMOV, CMPXCHG8B, FXSAVE/FXRSTOR and NX bit/XD-bit.
https://www.cpu-world.com/cgi-bin/CPUID.pl?CPUID=72324
_________________ Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68) Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68) Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB |
| Status: Offline |
| | Hammer
 |  |
Re: Integrating Warp3D into my 3D engine Posted on 26-Feb-2025 8:11:04
| | [ #177 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6320
From: Australia | | |
|
| @matthey
Quote:
On stock Ryzen 5 7600X (fat Zen 4 with 32 MB L3 cache) on cheapo B650 motherboard + 128bit DDR5-6000 office PC, WinUAE 5.2,
Test case 0: 30 ms, like 68060 @ 3.395 GHz
Test case 1: 15 ms, like 68060 @ 5.093 GHz
Test case 2: 10 ms, like 68060 @ 6.360 GHz
Test case 3: 10 ms, like 68060 @ 5.725 GHz
Test case 4: 35 ms
Test case 5: 19 ms
Test case 6: 11 ms
Ryzen 7 8845HS SoC (mobile Zen 4 16 MB L3 cache, 128bit DDR5L-6400), WinUAE 5.3.1
Test case 0: 38 ms, like 68060 @ 2.680 GHz
Test case 1: 18 ms, like 68060 @ 4.244 GHz
Test case 2: 12 ms, like 68060 @ 5.300 GHz
Test case 3: 12 ms, like 68060 @ 4.770 GHz
Test case 4: 41 ms
Test case 5: 23 ms
Test case 6: 14 ms
This mobile Zen 4 16 MB L3 cache variant acts like Zen 4C.
Last edited by Hammer on 26-Feb-2025 at 02:18 PM. Last edited by Hammer on 26-Feb-2025 at 08:16 AM.
_________________ Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68) Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68) Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB |
| Status: Offline |
| | matthey
|  |
Re: Integrating Warp3D into my 3D engine Posted on 26-Feb-2025 20:54:04
| | [ #178 ] |
| |
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2602
From: Kansas | | |
|
| kolla Quote:
No, the “problem” is that a vast majority of the THEA500 owners are primarily _gamers_ and not users. Same for the vast majority of amiga owners in the past, it wasn’t the hardware itself that attracted them, it was games and various dedicated software. Once games and the software moved to other platforms, they moved too.
|
"Gamers" are not impressed with hardware that has lag (high latency and jitter), low performance, low expandability and mediocre compatibility. The Amiga used to be one of the best affordable high performance and quality gaming systems. Retro gamers using emulators and universal FPGA hardware likely look for the Amiga version of games from the late 1980s and early 1990s. Amiga hardware has little to offer them today though. Hardcore Amiga fans may have enjoyed THEA500 Mini unboxing, tank mouse and CD32 controller but they are better off using the controllers on a RPi retro gaming setup that has better value. THEA500 Mini has some value but it comes from the controllers and the ready to play installed games rather than the SBC hardware.
Hammer Quote:
The 1st Raspberry Pi B used Broadcom's BCM2835 SoC (ref 1) that contains ARM1176JZFS (ARMv6, VFPv2 FPU, Jazelle) and Broadcom VideoCore 4 iGPU .
ARM1176JZF-S includes DSP SIMD instructions for 32-bit registers (ref 2). 32bit SIMD pack math for quad 8bit and dual 16bit datatypes.
|
You did not mention the most important new feature of the ARM1176JZFS CPU core. The feature that made this scalar ARM core so popular and a reasonable choice for a cheap 256MiB SBC is Thumb-2.
o ARM1136
o ARM1156 - introduced Thumb2 instructions
o ARM1176 - introduced security extensions
o ARM11MPcore - introduced multicore support
Thumb already existed and has good code density but is copied from the fixed length 16-bit encoded SuperH which has performance "problems" due to a "significant increase in the number of instructions executed".
Profile Guided Selection of ARM and Thumb Instructions https://www2.cs.arizona.edu/~arvind/papers/lctes02.pdf Quote:
While the use of Thumb instructions generally gives smaller code size and lower instruction cache energy, there are certain problems with using the Thumb mode. In many cases the reductions in code size are obtained at the expense of a significant increase in the number of instructions executed by the program. In our experiments this increase ranged from 9% to 41%. In fact in case of one of the benchmarks, the increase in dynamic instruction count was so high that instead of obtaining reductions in cache energy used, we observed an increase in the total amount of energy expended by the instruction cache.
|
Thumb-2 introduced a variable length 16-bit encoding like the 68k but with only 2 sizes. This reduces the number of instructions executed which improves performance but the 68k still has fewer instructions executed. The 68k and even ColdFire has more encoding sizes and more GP registers than Thumb-2 which improves performance without sacrificing code density. I do not believe a 256 MiB memory RPi would have been a success with ARM64/AArch64. The RPi 3 and RPi Zero 2 W have ARM64 cores but 512 MiB of memory. The 68k Amiga has a much smaller footprint than the RPi and a better performance ISA than Thumb-2.
Hammer Quote:
From Raspberry Pi (ref 1).
Capable of BluRay quality playback, using H.264 at 40MBits/s.
Provides Open GL ES 2.0, hardware-accelerated OpenVG, and 1080p30 H.264 high-profile decode.
Capable of 1Gpixel/s, 1.5Gtexel/s or 24 GFLOPs of general purpose compute and features a bunch of texture filtering and DMA infrastructure.
That is, graphics capabilities are roughly equivalent to Xbox 1 level of performance.
Broadcom Videocore 4 GPU has claims of the original Xbox performance level. Notice the comparison is against a well known game console.
ARM1176JZFS's 700 Mhz clock speed is close to the original Xbox's Intel Coppermine 128K's 733 Mhz clock speed target.
Broadcom's BCM2835 SoC is more than the solo 68060 CPU.
|
Your comparison of the original 2001 XBox and original RPi SoC using a 2003 ARM11 core is interesting. The Coppermine Pentium III performance destroys the ARM11 core though.
https://en.wikipedia.org/wiki/Raspberry_Pi#Hardware Quote:
Performance
While operating at 700 MHz by default, the first generation Raspberry Pi provided a real-world performance roughly equivalent to 0.041 GFLOPS.[83][84] On the CPU level the performance is similar to a 300 MHz Pentium II of 1997-99. The GPU provides 1 Gpixel/s or 1.5 Gtexel/s of graphics processing or 24 GFLOPS of general purpose computing performance. The graphical capabilities of the Raspberry Pi are roughly equivalent to the performance of the Xbox of 2001.
|
The scalar ARM11 core is weaker than the superscalar 68060.
year | CPU/core | caches | int bench | board
1994 | 68060 | L1=16kiB | 1.8 DMIPS/MHz |
1999 | Pentium3 | L1=64kiB+L2=256kiB | 3.4 DMIPS/MHz |
2003 | ARM1176 | L1=32kiB+L2=128kiB | 1.25 DMIPS/MHz | RPi1
2011 | Cortex-A7 | L1=64kiB+L2=256kiB | 1.9 DMIPS/MHz | RPi2
2012 | Cortex-A53 | L1=64kiB+L2=512kiB | 2.3 DMIPS/MHz | RPi3
The 1994 in-order superscalar 68060 was outperforming the 2003 scalar ARM1176 which had the benefit of twice the L1 caches and nearly a decade newer silicon. ARM needed about 17 years to surpass the 68060 with the Cortex-A7 using quadruple the caches and newer silicon. The planned 68060+, which likely doubled the L1 caches, Motorola claimed would provide a 20%-30% performance increase independent of clock frequency. With the same caches and process, I expect a 68060 core to be much closer in performance to the Pentium 3 than an ARM11 core. ARM cores were weak sauce until recently. ARM liked to clock up their weak sauce cores not only because they needed to for performance but higher clocked cores fool people who naively look at and compare clock speed. An ARM1176@700MHz is maybe equivalent to a P3@150-300MHz. The original RPi has 256 MiB of SDRAM while the XBox has 64 MiB of DDR SDRAM which likely gives a Windows paging performance handicap. The XBox was one hot box while the RPi passive cooling allows for a tiny affordable SBC. While the scalar ARM1176@700MHz has low performance efficiency, overall CPU performance is better than the high performance efficiency 68060@100MHz which is the problem. Ideally, the high performance and power efficiency core is the one to clock up but the 68060 has been ignored if not sabotaged.
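The per-MHz figures and the clock-speed argument above can be combined with a little arithmetic. The following is only a sketch: it takes the DMIPS/MHz values from the table at face value and assumes performance scales linearly with clock speed, which ignores memory, caches and everything else Dhrystone does not measure. The function names are hypothetical.

```python
# DMIPS/MHz values as given in the table above.
DMIPS_PER_MHZ = {"68060": 1.8, "Pentium3": 3.4, "ARM1176": 1.25}


def dmips(core: str, mhz: float) -> float:
    """Total DMIPS, assuming linear scaling with clock frequency."""
    return DMIPS_PER_MHZ[core] * mhz


def equivalent_mhz(core: str, mhz: float, other: str) -> float:
    """Clock the `other` core would need to match `core` running at `mhz`."""
    return dmips(core, mhz) / DMIPS_PER_MHZ[other]


arm = dmips("ARM1176", 700)      # original RPi clock
m68k = dmips("68060", 100)       # 68060@100MHz as mentioned in the post
p3_match = equivalent_mhz("ARM1176", 700, "Pentium3")

print(f"ARM1176@700MHz: {arm:.0f} DMIPS")
print(f"68060@100MHz:   {m68k:.0f} DMIPS")
print(f"Pentium3 clock matching ARM1176@700MHz: {p3_match:.0f} MHz")
```

On these numbers the ARM1176 at 700 MHz comes out at 875 DMIPS against 180 DMIPS for a 68060 at 100 MHz, and a Pentium 3 would need about 257 MHz to match it, which sits inside the 150-300 MHz range estimated above.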
Hammer Quote:
On stock Ryzen 5 7600X (fat Zen 4 with 32 MB L3 cache) on cheapo B650 motherboard + 128bit DDR5-6000 office PC, WinUAE 5.2,
Test case 0: 30 ms, like 68060 @ 3.395 GHz
Test case 1: 15 ms, like 68060 @ 5.093 GHz
Test case 2: 10 ms, like 68060 @ 6.360 GHz
Test case 3: 10 ms, like 68060 @ 5.725 GHz
Test case 4: 35 ms
Test case 5: 19 ms
Test case 6: 11 ms
|
Ryzen 5 7600X has 4.7-5.3GHz OoO cores, the 5nm SoC has a list price of $299, the 38MiB of L2+L3 caches is more than the memory of most Amigas and the 105W TDP (~158W peak) requires a more expensive power supply, fans and heatsink than a RPi SBC. Yes, the performance is impressive compared to ARM designs which have improved a lot but have a ways to go for general purpose processing. The OoO Ryzen 5 7600X is wasteful to emulate a high clocked in-order 68060 with half the performance at most. An ASIC with 68060@1-2GHz cores using a 5nm process could likely be passively cooled. Electricity has about 1/100 of the distance to travel using a 5nm process compared to the 500nm 68060 process and about 1/1000 the distance to travel compared to the 5000nm Amiga chipset. A 68060&AA+ SoC ASIC could be mass produced for less than $1 USD and use fewer transistors than an ARM1176 core. WinUAE on x86-64 CISC processors makes more sense than the low end ARM offerings. Low end 68k Amiga hardware needs to become more affordable and higher performance or it disappears. The 68k Amiga has the small footprint, code density, standardization and software, especially games, that is desirable for low end hardware while many advantages disappear with emulation. One last observation is that the CISC cores appear to be more tolerant of code quality while RISC/ARM cores require more optimization for good performance but that was my point. Usually in-order cores require code with instruction scheduling to gain much superscalar performance but the 68060 is an exception and even tolerated loop code better than the much larger and more complex OoO Cortex-A72 which is able to rearrange the code. Most compilers do not have 68060 instruction schedulers and only hand optimized 68060 assembly code has reached the limits of 68060 performance. It is one of the greatest in-order CPUs ever that was unfairly locked away before it could grow up in clock frequency!
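The "like 68060 @ x GHz" figures quoted above appear to scale inversely with the measured time for each test case. That is an inference from the posted numbers, not something the benchmark's author states here, so treat the following as a sketch under that assumption. It takes the Ryzen 5 7600X results as the reference and recomputes the equivalent clocks for the Ryzen 7 8845HS times posted earlier in the thread; the function name is hypothetical.

```python
def scale_equivalent_clock(ref_ms: float, ref_ghz: float, new_ms: float) -> float:
    """Equivalent 68060 clock for a new timing, assuming clock is proportional to 1/time."""
    return ref_ghz * ref_ms / new_ms


# (time in ms, equivalent 68060 GHz) for test cases 0-3 on the Ryzen 5 7600X.
reference = [(30, 3.395), (15, 5.093), (10, 6.360), (10, 5.725)]

# Times in ms for the same test cases on the Ryzen 7 8845HS.
mobile_ms = [38, 18, 12, 12]

for case, ((ref_ms, ref_ghz), new_ms) in enumerate(zip(reference, mobile_ms)):
    ghz = scale_equivalent_clock(ref_ms, ref_ghz, new_ms)
    print(f"Test case {case}: {new_ms} ms, like 68060 @ {ghz:.3f} GHz")
```

The recomputed values agree with the posted 8845HS figures to within rounding, which supports reading each test case as having a fixed reference timing behind it.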
Last edited by matthey on 27-Feb-2025 at 06:51 AM.
|
| Status: Offline |
| | Hammer
 |  |
Re: Integrating Warp3D into my 3D engine Posted on 26-Feb-2025 22:01:32
| | [ #179 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6320
From: Australia | | |
|
| @matthey
Quote:
Your comparison of the original 2001 XBox and original RPi SoC using a 2003 ARM11 core is interesting. The Coppermine Pentium III performance destroys the ARM11 core though.
|
Xbox comparison is only for the GPU.
Quote:
The scalar ARM11 core is weaker than the superscalar 68060.
|
For integer performance, ARM11 is reliant on 32-bit integer SIMD. Your benchmark doesn't use SIMD.
Refer to Amiga Hombre's PA-RISC's custom SIMD extensions for a similar direction.
Intel i860 has MMX-like 64-bit SIMD extensions.
Jazelle (Java acceleration) is important for Android and Blu-ray. https://en.wikipedia.org/wiki/BD-J
Quote:
year | CPU/core | caches | int bench | board
1994 | 68060 | L1=16kiB | 1.8 DMIPS/MHz |
1999 | Pentium3 | L1=64kiB+L2=256kiB | 3.4 DMIPS/MHz |
2003 | ARM1176 | L1=32kiB+L2=128kiB | 1.25 DMIPS/MHz | RPi1
2011 | Cortex-A7 | L1=64kiB+L2=256kiB | 1.9 DMIPS/MHz | RPi2
2012 | Cortex-A53 | L1=64kiB+L2=512kiB | 2.3 DMIPS/MHz | RPi3
|
ARM9 generation (e.g. ARM925T) pushed out Motorola/Freescale Dragon Ball VZ from the smart handheld market.
DragonBall Super VZ (MC68SZ328) has 68000 at 66Mhz with 10.8 MIPS.
Tungsten T (Palm OS version 5) used the Texas Instruments OMAP 1510 (ARM925T with MMU) at 144 MHz, which displaced the 68000-based DragonBall Super VZ.
DragonBall Super VZ wasn't competitive in the PDA, smart mobile phone, and handheld game console market.
Nintendo DSi has 133 MHz ARM9 and 33 MHz ARM7 cores.
Nintendo's dual-core ARM11 MPCore @ 268 Mhz + single-core ARM9 enabled Nintendo 3DS handheld game console.
These ARM9 and ARM11 CPU cores are 68040 class CPUs with very low power consumption, 32-bit SIMD multimedia extensions for ARM11, and triple-digit high clock speeds.
With DEC's engineering skills, StrongARM SA-110 was released in 1996 with clock speeds of 100, 160, and 200 Mhz. The SA-110's first design win was the Apple MessagePad 2000. The lessons from StrongARM influenced ARM9.
SA-1500 was a derivative of the SA-110 developed by DEC with 200 to 300 MHz clock speeds with FPU and 64-bit SIMD via on-chip Attached Media Processor. SA-1500 influenced later ARM designs when they gained ARM-designed FPU and SIMD.
StrongARM, Xscale (ARM architecture v5) ARM7, ARM9, and ARM11 are strong in the handheld market. ARM gained a "safe space" away from powerful desktop processors while there's market demand for powerful evolving handheld CPUs.
Intel XScale PXA25x has ARMv5TE, ARM Thumb, ARM DSP. Intel XScale PXA27x has iwMMXt that contains 64-bit SIMD MMX and integer SSE.
Intel sold the Xscale PXA family to Marvell Technology Group in June 2006.
Fit 68060 inside Apple MessagePad 2000 PDA in 1996!
Quote:
Ryzen 5 7600X has 4.7GHz OoO cores, the 5nm SoC has a list price of $299, the 38MiB of L2+L3 caches is more than the memory of most Amigas and the 105W TDP (~158W peak) requires a more expensive power supply, fans and heatsink than a RPi SBC.
|
For mobile, there's the Zen CPU family U series, which is qualified for aggressive undervolting and has half the L3 cache and lower power consumption.
Undervoltage is a silicon lottery on desktop CPUs.
https://pcpartpicker.com/products/cpu/#F=99,103&sort=price&page=1
Ryzen 5 8500G = $148.88 (box SKU includes cooler and fan)
Ryzen 5 8400F = $149.00 (box SKU includes cooler and fan)
Ryzen 5 7600 = $184.98
Ryzen 5 7600X = $209.00
Ryzen 5 9600X = $240.00
All Ryzen SKUs on AM5 have iGPU except for F models.
The 105W TDP (~158W peak) applies when running something like Blender MT or Cinebench MT.
Monitoring a Ryzen 5 7600X with Ryzen Master while running Karlos's small benchmark multiple times shows the CPU consuming 21 to 22 watts.
https://www.cpubenchmark.net/cpu.php?cpu=AMD+Ryzen+5+7600X&id=5033
AMD Ryzen 5 7600X
Physics: 1,840 Frames/Sec
Extended Instructions: 24,283 Million Matrices/Sec
Single Thread: 4,139 MOps/Sec (doesn't test single core)
https://www.cpubenchmark.net/cpu.php?cpu=AMD+Ryzen+5+9600X&id=6199
AMD Ryzen 5 9600X
Physics: 2,003 Frames/Sec
Extended Instructions: 27,983 Million Matrices/Sec
Single Thread: 4,577 MOps/Sec (doesn't test single core)
https://www.cpubenchmark.net/cpu.php?cpu=AMD+Ryzen+7+8840U&id=6003
AMD Ryzen 7 8840U (mobile)
Physics: 1,187 Frames/Sec
Extended Instructions: 18,783 Million Matrices/Sec
Single Thread: 3,599 MOps/Sec (doesn't test single core)
AMD Ryzen AI 7 350 (mobile)
Physics: 1,297 Frames/Sec
Extended Instructions: 20,553 Million Matrices/Sec
Single Thread: 4,088 MOps/Sec (doesn't test single core)
https://www.cpubenchmark.net/cpu.php?cpu=ARM+Cortex-A72+4+Core+2000+MHz&id=4077
ARM Cortex-A72 4 Core 2000 MHz
Physics: 72 Frames/Sec
Extended Instructions: 785 Million Matrices/Sec
Single Thread: 570 MOps/Sec
https://www.cpubenchmark.net/cpu.php?cpu=ARM+Cortex-A76+4+Core+3000+MHz&id=5739
ARM Cortex-A76 4 Core 3000 MHz
Physics: 125 Frames/Sec
Extended Instructions: 2,902 Million Matrices/Sec
Single Thread: 1,299 MOps/Sec
Raspberry Pi 5 16GB is priced around $120 USD.
https://videocardz.com/newz/steam-deck-is-now-available-for-only-296 SteamDeck reached USD296
https://www.theverge.com/2024/1/10/24033161/ayaneo-next-lite-steam-deck-competitor-steamos Ayaneo’s Next Lite is a USD299 Steam Deck competitor with AMD Ryzen 5 4500U.
For UAE's single-thread JIT and IGP, $109.97 Intel Core i3-14100 would do the job.
https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i3-14100&id=5831
Physics: 1,011 Frames/Sec
Extended Instructions: 11,903 Million Matrices/Sec
Single Thread: 3,769 MOps/Sec
All Intel Raptor Lake R SKUs have iGPU except for F models.
The "claim to fame" for PiStorm is accelerating Commodore-Amiga Inc's genuine A500/A600/A2000/A1200 on CPU and RTG since "the name" (TM) can be important in the retro scene.
The modern wedge-keyboard desktop microcomputers are known as laptops with a built-in display. Many ultrabooks are missing the keypad just like A600. Most +15 inch to 16-inch laptops have the keypad.
Back to the Amiga, if the A600 had had the CD32's core features, it would have been a nice little wedge-keyboard desktop microcomputer for 1992.
The main reasons x86 survives are Intel and its second-source insurance, AMD. 68K was effectively abandoned by Motorola when Apple focused on RISC. Other x86 cloners are using the desktop x86 PC ecosystem's momentum as leverage to spread into embedded markets.
Commodore f---up during 1992 which killed 68K's second best-selling desktop microcomputer platform.
None of the major 68K licensees are interested in clean sheet cloning the CISC 68K e.g. Hitachi, Rockwell, Signetics, Thomson/SGS-Thomson and Toshiba are not AMD's R&D level.
In 2014, Rochester Electronics re-established manufacturing capability for the 68020 microprocessor. Rochester Electronics CEO and founder Curt Gerrish was an ex-Motorola employee for more than 20 years. Curt Gerrish has passed away at age 88 on Dec 20, 2024.
Single purpose GE Flight Management Computer Model 2907C1 has 68040 @ 60 Mhz (30Mhz bus). Boeing is not into cutting-edge GUI when compared to Space X.
Last edited by Hammer on 27-Feb-2025 at 03:51 AM.
_________________ Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68) Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68) Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB |
| Status: Offline |
| | Hammer
 |  |
Re: Integrating Warp3D into my 3D engine Posted on 27-Feb-2025 5:20:23
| | [ #180 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6320
From: Australia | | |
|
| @Karlos
Quote:
When Jay brought a machine to this world first sporting all these fundamental principles together, various other vendors were still asking if basic multitasking was even useful on their dull monochrome textmode beep boxes.
All that changed in pretty short order. The Amiga design legacy is literally everywhere in modern desktop computing.
|
When the Amiga Lorraine was demoed in 1984, Apple had already released the Lisa (custom MMU + 68000, 720 × 364p GUI) and the Mac 128K (68000, 512 × 342p GUI).
The Santa Cruz Operation (SCO) published the Microsoft Xenix (version 3) operating system for the 1984 second-revision Lisa.
Apple's Lisa OS had pre-emptive multitasking, virtual memory and memory protection, all of which were removed on the MMU-less Macintosh with 128K RAM.
https://arstechnica.com/gadgets/2023/01/revisiting-apples-ill-fated-lisa-computer-40-years-on/ Lisa OS had a GUI, 32-bit pre-emptive multitasking, virtual memory, memory protection via Apple's custom MMU, and error-correcting RAM. Apple would repeat Lisa OS's features with Apple Unix, which had Mac app compatibility, and again with Mac OS X; third time's the charm.
Microsoft's Xenix team had 68000 experience. At the 1985 meeting, Bill Gates took a pro-32-bit 80386 position against IBM's pro-16-bit 80286 position.
The purpose of the OS/2 project was to replace PC-DOS/MS-DOS and Xenix's expensive AT&T license. IBM's goal with the OS/2 project was to reduce MS to a junior partner. Bill Gates was not naive about IBM's ultimate goal for OS/2.
Meanwhile, IBM's RT PC project had a 32-bit RISC CPU intended to displace Intel. The 32-bit 80386 would interfere with the RT PC project's 32-bit CPU transition. This IBM transition phase is similar to the later 64-bit desktop transition with the PowerPC 970. With Apple's mass-production support, IBM's PowerPC 970 had a good chance of knocking out Intel's Itanium, except that AMD, the x86 second-source insurance IBM itself had imposed, created 64-bit extensions for x86.
For the 386 project in early 1985, Intel partnered with Compaq, MS and SCO to rapidly release the Compaq 386/AT standard in 1986, followed by the Windows 386 and Xenix 386 releases in 1987. The 1986-released GUI version of MS Excel has an embedded Windows 2.0 runtime and is a port of the 1985 Mac GUI version of Excel.
You have to factor in IBM throttling the PC's 32-bit evolution to serve its own 32-bit transition interests.
The 80386 comes with an MMU as standard for a reason, e.g. MS Xenix. Xenix 386 can run multiple virtual 8086 DOS text programs. Around 1986 to 1987, Commodore designed two custom MMUs, one for the 68000 (the C= MMU was too slow and needed TLB caches) and one for the 68020. The C900 team members had no loyalty to AmigaOS and focused on a Coherent-based Unix clone without AmigaOS app compatibility. C= resources and HR were focused on AMIX instead of AmigaOS.
I reviewed C= AMIX (1991), NeXTSTEP 68K (1988/1990), SGI IRIX 6.5 (1993), and MS Windows NT 3.1 (1993).
Last edited by Hammer on 27-Feb-2025 at 05:22 AM.
_________________ Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68) Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68) Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB |
| Status: Offline |
| |