matthey 
PowerPC lost in space to RISC-V the final nail in the coffin for PowerPC?
Posted on 20-Nov-2024 23:23:10
[ #1 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2355
From: Kansas

The PowerPC RAD750 (PPC G3) has been replaced by Microchip's RISC-V PIC64-HPSC SoC, which uses SiFive X280 in-order CPU cores.

https://www.janschafrich.com/overview-over-nasas-new-risc-v-based-processor/
http://microelectronics.esa.int/riscv/rvws2022/presentations/04-SiFive_Intelligence_X280_for_Space_Exploration_v2.0_Dec_22.pdf

The X280 in-order CPU cores are based on the SiFive 7 series U74 cores, which I recognized earlier as having a CISC-like CPU design and impressive performance.

https://fuse.wikichip.org/news/7115/sifive-introduces-a-new-coprocessor-interface-targets-custom-accelerators/ Quote:

Launched under the new family of processors called SiFive Intelligence, the X280 is the first core to cater to AI acceleration. At a high level, the X280 builds on top of their silicon-proven U7-series high-performance (Linux-capable) core. SiFive’s Intelligence X280 is somewhat of a unique processor from SiFive. Targetting ML workloads, its main feature point is both the new RISC-V Vector (RVV) Extension as well as SiFive Intelligence Extensions – the company’s own RISC-V custom-extension for handling ML workloads which includes fixed-point data types from 8-bits to 64-bits as well as 16-64 bit FP and the BFloat16 data type. On the RVV extension side, the X280 supports 512-bit vector register lengths, allowing variable length operations up to 512-bits.


The CISC-like core design reduces performance-killing load-to-use stalls.

https://en.wikichip.org/wiki/sifive/microarchitectures/7_series Quote:

0-cycle load-to-use latency (down from 1 cycle)


SiFive has improved the U74 core design and its performance at least twice. As I mentioned in earlier posts, the design has plenty of room for improvement even though RISC-V lacks several 68k performance advantages, such as being able to execute the equivalent of 2 RISC instructions from each execution pipeline, more powerful addressing modes and better code density. I previously mentioned that if the weak RISC-V ISA could reach 2.64 DMIPS/MHz with the U74 core, then the 68k should be able to reach 3-4 DMIPS/MHz with the stronger 68k ISA. The X280 core is now reaching a claimed 3.3 DMIPS/MHz, which is better than any PPC core ever.

http://microelectronics.esa.int/riscv/rvws2022/presentations/04-SiFive_Intelligence_X280_for_Space_Exploration_v2.0_Dec_22.pdf Quote:

Performance
o 5.8 CoreMarks/MHz 3.3 Dhrystone/MHz
o 4.5 SpecINT2006/GHz 3.4 SpecFP2006/GHz (HiPerf config)
o 4.8 TOPS (INT8 Matrix Multiplication)


https://www.sifive.com/cores/intelligence-x280 Quote:

Performance benchmarks
- 5.75 CoreMarks/MHz
- 3.25 DMIPS/MHz
- 4.6 SpecINT2k6/GHz


The X280 cores may benefit from an L3 cache instead of the L2 cache most U74 SoCs used, depending on the flexible configuration. Memory can be up to DDR4. I'm not sure what chip fab process is used, but it is likely 28nm or smaller, since 28nm is the largest process the SiFive U74 SoCs used. Their upcoming OoO cores are using 12nm and maybe even 7nm though.

https://www.sifive.com/blog/incredibly-scalable-high-performance-risc-v-core-ip Quote:

SiFive also offers high-bandwidth memory interface IP, supporting SiFive TileLink and industry standard protocols, for SoC or chiplet style designs for memory intensive workloads that require the latest HBM2E+ memory capabilities. With validation in 7nm and 12nm process technology currently in progress, SiFive is extending high-performance DRAM capabilities from existing 16nm processes to leading-edge technologies.


PPC Amiga1 hardware is stuck at 45nm for desktop hardware while RISC-V hardware is using up to 7nm for embedded hardware. Intel was not competing well with a 10nm process against AMD's 7nm process and Apple's 3nm process. If it wasn't for AMD 7nm hardware competing against Apple 3nm hardware, some people might believe the x86-64 ISA was noncompetitive.

I will now answer some related questions asked by Minator that were off topic in another thread.

matthey Quote:
an in-order 68k CPU core like the 68060 can compete with low end ARM OoO CPU cores in performance using less area/transistors and power resulting in a cost advantage and lower system cost.


minator Quote:

What are you basing this on?


A RISC-V in-order 8-stage SiFive U74 core uses a CISC like design and is outperforming the in-order 8-stage Cortex-A53 and Cortex-A55 cores. It was competitive with low end OoO Cortex-A57 cores.



These would be low end configured Cortex-A57 cores using smaller caches and older fab sizes. The SiFive X280 is claimed to reach 3.3 DMIPS/MHz, which may outperform some low end Cortex-A57 cores in some benchmarks (high end configuration Cortex-A57 cores may reach 4.1 DMIPS/MHz though). A Cortex-A57 OoO core is about 6 times the size of an in-order Cortex-A53 core in transistors, and the in-order core is much lower power as active transistors use power. A wafer may give 6 times the number of in-order cores and better chip fab yields. The lower power of the in-order CPU core may allow the use of a cheaper fab process, or a move to a more advanced fab process where the smaller dies compete against fat and hot OoO cores. The transistor and power budget saved by the in-order CPU can be spent on the GPU instead.

If all a RISC OoO CPU is doing is removing load-to-use stalls, then an in-order CISC design like the SiFive U74/X280 core design makes a lot of sense. It makes even more sense to use a CISC ISA to gain more performance than any RISC ISA, especially the weak RISC-V ISA. You want to beef up an in-order CPU design as much as possible to compete with vulnerable OoO designs, not to mention that a beefed up 68k in-order CPU design could most likely outperform every PPC CPU ever made if a weak RISC-V ISA in-order core design can do it. In-order designs are relatively simple and cheaper to develop compared to OoO designs too.

The major downside of in-order CPU cores can be the need for instruction scheduling, but designs like the 68060 and SiFive U74/X280 cores minimize stalls and therefore this disadvantage. Most compilers don't have a 68060 specific instruction scheduler, yet performance is still very good, and there is room for improvement by reducing scheduling needs and adding instruction schedulers to compilers. The 68060 was outperforming OoO PPC CPUs like the PPC601 and PPC603 in integer performance/MHz and should have been able to out clock them with the deeper pipeline while requiring fewer caches.

Year | CPU | transistors | notes
1975 | 6502 | 3,500
1979 | 68000 | 68,000
1984 | 68020 | 190,000
1985 | ARM1 | 25,000
1985 | 80386 | 275,000
1986 | ARM2 | 30,000
1987 | 68030 | 273,000
1990 | 68040 | 1,170,000
1993 | Pentium | 3,100,000 | superscalar in-order 2-way
1994 | 68060 | 2,530,000 | superscalar in-order 2-way
1994 | ARM7 | 250,000
1995 | PentiumPro | 5,500,000 | OoO uop
2002 | ARM11 | 7,500,000
2008 | Nehalem | 731,000,000 | (1st gen Core i7 with 4 cores) 64 bit OoO uop
2011 | Cortex-A7 | 10,000,000 | superscalar in-order 2-way
2012 | Cortex-A53 | 12,500,000 | 64-bit superscalar in-order 2-way
2012 | Cortex-A57 | 75,000,000 | 64-bit OoO 3-way big.LITTLE companion of Cortex-A53

Rough ARM transistor counts come from the following link.
https://www.sciencedirect.com/topics/computer-science/stage-pipeline

The transistor budgets of in-order ARM cores were increasing linearly, but those of OoO ARM cores are increasing exponentially. The RPi 4 Cortex-A72 is two generations past the Cortex-A57. I can no longer find the transistor counts for newer ARM cores but I will guess the trend below for the RPi 4 and RPi 5.

RPi 1 ARM11 core - ~3 times 68060 transistors
RPi 2 Cortex-A7 - ~4 times 68060 transistors times 4 cores is ~16 times 68060 transistors
RPi 3 Cortex-A53 - ~5 times 68060 transistors times 4 cores is ~20 times 68060 transistors
??? Cortex-A57 - ~30 times 68060 transistors times 4 cores is ~119 times 68060 transistors
RPi 4 Cortex-A72 - ~40? times 68060 transistors times 4 cores is ~160? times 68060 transistors
RPi 5 Cortex-A76 - ~60? times 68060 transistors times 4 cores is ~240? times 68060 transistors

RPi 3 Cortex-A53 baseline
??? Cortex-A57 - ~6 times Cortex-A53 transistors
RPi 4 Cortex-A72 - ~8 times Cortex-A53 transistors
RPi 5 Cortex-A76 - ~12 times Cortex-A53 transistors
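
The ratios in the two lists above can be reproduced from the transistor table with a few lines of Python. This is only a sketch of the arithmetic: the counts are the rough figures from the table, the Cortex-A72 and Cortex-A76 rows are left out because their counts are guesses, and the names `TRANSISTORS`, `BOARDS` and `ratio` are made up for the example.

```python
# Rough transistor counts copied from the table in this post.
TRANSISTORS = {
    "68060": 2_530_000,
    "ARM11": 7_500_000,
    "Cortex-A7": 10_000_000,
    "Cortex-A53": 12_500_000,
    "Cortex-A57": 75_000_000,
}

# (board, core, number of cores); "???" marks a core with no RPi board.
BOARDS = [
    ("RPi 1", "ARM11", 1),
    ("RPi 2", "Cortex-A7", 4),
    ("RPi 3", "Cortex-A53", 4),
    ("???", "Cortex-A57", 4),
]


def ratio(core: str, baseline: str) -> float:
    """Return how many times more transistors `core` has than `baseline`."""
    return TRANSISTORS[core] / TRANSISTORS[baseline]


for board, core, cores in BOARDS:
    per_core = ratio(core, "68060")
    print(f"{board} {core}: ~{per_core:.1f}x 68060 per core, "
          f"~{cores * per_core:.0f}x 68060 for {cores} core(s)")

print(f"Cortex-A57 vs Cortex-A53: ~{ratio('Cortex-A57', 'Cortex-A53'):.0f}x")
```

The ~119 figure for four Cortex-A57 cores comes from multiplying the unrounded per-core ratio (about 29.6) by 4, not from 4 times the rounded ~30.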

A SiFive U74 core is likely smaller than a Cortex-A53 core, although it didn't have a SIMD unit before the X280 added a vector unit, which is different, and I don't know how it would compare. The ARM64/AArch64 ISA is large compared to RISC-V, the 68k and even PPC, so the cores are larger in area. Even OoO RISC-V cores have a large area advantage and a small code density advantage, which is a selling point for RISC-V in the embedded market.




https://www.allaboutcircuits.com/news/with-its-new-risc-v-processors-sifive-bets-on-compute-density/

Who would buy an in-order Cortex-A55 when a SiFive P470 OoO core is 30% smaller and has 2.75 times the single thread performance? Who would buy a Cortex-A78 when a SiFive P670 core with 5% less performance is half the size?
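
To put those two comparisons in compute density terms (performance per unit of area), here is a small Python sketch. It only divides the relative performance by the relative area quoted above, which are vendor claims and not measurements, and the function name `relative_compute_density` is made up for the example.

```python
def relative_compute_density(perf_ratio: float, area_ratio: float) -> float:
    """Performance per unit area relative to the baseline core (baseline = 1.0)."""
    return perf_ratio / area_ratio


# SiFive P470 vs Cortex-A55: 30% smaller (0.70x area), 2.75x single thread performance.
p470_vs_a55 = relative_compute_density(perf_ratio=2.75, area_ratio=0.70)

# SiFive P670 vs Cortex-A78: 5% less performance (0.95x), half the size (0.50x area).
p670_vs_a78 = relative_compute_density(perf_ratio=0.95, area_ratio=0.50)

print(f"P470 vs Cortex-A55: ~{p470_vs_a55:.2f}x compute density")
print(f"P670 vs Cortex-A78: ~{p670_vs_a78:.2f}x compute density")
```

On those claimed numbers the P470 would have roughly 3.9 times the compute density of the Cortex-A55 and the P670 roughly 1.9 times that of the Cortex-A78.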

I tried to find compute density charts like this with the SiFive U74/X280 cores but I couldn't. Many customers choose ARM because ARM makes creating ASICs easy and has a good reputation for low power embedded cores. But ARM changed ISAs and now their cores are fat, which is not good for power, and their code density is no longer embedded market leading like the Thumb-2 and 68k ISAs. SiFive is trying to make creating custom ASICs easier, cheaper and more accessible too.

https://www.anandtech.com/show/10488/sifive-unveils-freedom-platforms-for-riscvbased-semicustom-chips Quote:

SiFive does not elaborate how much money its customers will be able to save due to the free RISC-V microarchitecture, any pre-developed platforms (with re-used components), proven silicon, open-source software or other advantages that the company has to offer. This is understandable because every customer product could be unique in complexity and customization. However, SiFive says that in certain cases it will be able to deliver products to startups that do not have any silicon teams at all, which essentially means that the developer plans to address needs of very small players. Typically, such companies cannot get access to custom silicon because of high costs and other difficulties, but SiFive implies that with their pre-developed Freedom platforms the startups may get their chance to build semi-custom chips and take advantage of things like higher performance and/or lower power consumption compared to off-the-shelf not-customized silicon or FPGAs. The VP of SiFive told us that he could see a future where a couple of engineers in a garage can get access to a custom SoC “with a moderate Kickstarter campaign.”


A couple of engineers in a garage using crowd funding is of course beyond the tiny 68k Amiga market though.

minator Quote:

I guess the idea of an ASIC 68K is kind of cool, but it's not very realistic. Even if it could be built, it'll be slow.

If you increase clock speed by 10X the cycle time reduces by 10X, however this also means latency goes up by 10X the clock cycles. Without all the modern mechanisms used to deal with this, the CPU will be constantly stalling. I doubt it'll get anywhere near the performance of an Arm A53, never mind Arm's OOO cores.


ASIC logic speed has greatly outpaced memory speed, which means a high CPU clock speed needs more caches. This is true of an in-order or an OoO CPU core, but a fat OoO CPU core uses up transistor and power budget that could go to the caches. An in-order CPU core has about half the max performance potential of an OoO CPU core, but that is still more performance than any PPC CPU or virtual 68k Amiga, as the in-order SiFive X280 core shows, while leaving performance potential that only an efficient CISC ISA can fully exploit.
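
The clock speed versus memory latency trade-off can be put into numbers with a short Python sketch. All figures here are illustrative assumptions, not measurements of any real chip: a fixed 60 ns memory latency, a 1-cycle cache hit and a 97% hit rate. The function names are made up for the example.

```python
def latency_in_cycles(latency_ns: float, clock_mhz: float) -> float:
    """Convert a fixed memory latency in nanoseconds into CPU clock cycles."""
    return latency_ns * clock_mhz / 1000.0


def average_access_cycles(hit_rate: float, hit_cycles: float,
                          miss_penalty_cycles: float) -> float:
    """Average memory access time in cycles: hit cost plus the weighted miss penalty."""
    return hit_cycles + (1.0 - hit_rate) * miss_penalty_cycles


MEMORY_LATENCY_NS = 60.0   # assumed, for illustration only
HIT_RATE = 0.97            # assumed cache hit rate
HIT_CYCLES = 1.0           # assumed cost of a cache hit

for clock_mhz in (100, 1000):
    miss = latency_in_cycles(MEMORY_LATENCY_NS, clock_mhz)
    avg = average_access_cycles(HIT_RATE, HIT_CYCLES, miss)
    print(f"{clock_mhz} MHz: miss costs {miss:.0f} cycles, "
          f"average access {avg:.2f} cycles")
```

With these assumed numbers a 10x clock increase makes each miss cost 10x as many cycles, which is why the hit rate (cache size) matters more as the clock goes up, whether the core is in-order or OoO.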

minator Quote:

The 68K instruction set won't help, multi-length instructions makes instruction decoding complex and unpredictable, trying to prefetch is going to take a load of complex logic. Then you've got potential page faults in the middle of instructions, and how 68K handles exceptions is even worse.


A variable length encoding is no problem at all. It is handled by the decoupled instruction fetch pipeline (IFP) feeding into an instruction buffer which feeds the execution pipelines (OEPs). Most superscalar in-order RISC cores (and some OoO RISC cores) use the same technique because superscalar execution is of a variable number of instructions due to dependencies. The decoupled IFP and OEPs with an instruction buffer allow for a smaller fetch size, saving power. The 68060 only fetches 4 bytes/cycle but is a powerful superscalar CPU, while PPC and ARM64/AArch64 cores can't even be superscalar with such a small instruction fetch. The improved code density reduces fetch needs, increases remaining memory bandwidth and increases cache hit rates or allows cache sizes to be reduced.
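
A toy Python model shows why a 4 bytes/cycle fetch can still feed two execution pipelines when instructions are short. It is a sketch under heavy assumptions and not the 68060's actual microarchitecture: it ignores dependencies, branches and cache misses, the 32-byte buffer size is arbitrary, and `simulate` and its parameters are made up for the example.

```python
def simulate(instr_lengths, fetch_bytes=4, issue_width=2, buffer_bytes=32):
    """Return the cycles needed to issue all instructions in a toy decoupled front end.

    Each cycle the execution side issues up to `issue_width` instructions that are
    completely in the buffer, then the fetch side adds up to `fetch_bytes` bytes.
    """
    if any(length > buffer_bytes for length in instr_lengths):
        raise ValueError("an instruction is longer than the buffer")
    total_bytes = sum(instr_lengths)
    fetched = 0      # bytes fetched so far
    consumed = 0     # bytes of instructions already issued
    next_instr = 0   # index of the next instruction to issue
    cycles = 0
    while next_instr < len(instr_lengths):
        cycles += 1
        # Execution side: issue fully buffered instructions in program order.
        for _ in range(issue_width):
            if next_instr == len(instr_lengths):
                break
            length = instr_lengths[next_instr]
            if fetched - consumed < length:
                break  # next instruction is not completely fetched yet
            consumed += length
            next_instr += 1
        # Fetch side: refill the buffer if there is room and code left to fetch.
        room = buffer_bytes - (fetched - consumed)
        fetched += min(fetch_bytes, room, total_bytes - fetched)
    return cycles


short_code = [2] * 8   # eight 2-byte instructions (16 bytes of code)
fixed_code = [4] * 8   # eight fixed 4-byte instructions (32 bytes of code)
print("2-byte instructions:", simulate(short_code), "cycles")  # 5 cycles
print("4-byte instructions:", simulate(fixed_code), "cycles")  # 9 cycles
```

In this model the 2-byte stream sustains 2 instructions per cycle after the first fetch, while the fixed 4-byte stream is limited to 1 instruction per cycle by the fetch width, which is the point made above about PPC and ARM64/AArch64 needing a wider fetch to be superscalar.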

Are page faults in the middle of instructions any worse than imprecise OoO exceptions with rollback and weak memory model ordering of loads/stores? Did RISC just move the complexity elsewhere after breaking down CISC instructions into smaller weaker RISC instructions and putting them back together in a less efficient way with OoO?

minator Quote:

BTW The A53 is a 12 year old processor, and it was the low end option even then. The RPi 5 is many times faster and half the price of an A500 mini.


The Cortex-A53 is still used because it is one of the smallest 64-bit ARM64/AArch64 cores. It is one of ARM's best attempts to minimize area for an application core (the Cortex-A35 tries to minimize power). The ARM ISA is fat and it isn't just SiFive going after them on area.



https://www.jonpeddie.com/news/imagination-launches-apxm-6200-risc-v-cpu-ip-core/
https://www.eetimes.com/imagination-reveals-risc-v-processor-at-embedded-world-2024/

That is an Imagination Technologies RISC-V APXM-6200 11-stage in-order core claiming to outperform newer in-order ARM cores. There is no RVC compression on this one, so it would be better suited for small code embedded applications, but those exist. The claim is that it saves memory bandwidth when used with an Imagination Technologies GPU, so perhaps there is some kind of cache coherency between the CPU and GPU core caches like HSA. There isn't a perfect CPU ISA for everything even though there are primarily only 3 major ISAs remaining, with PPC well behind the 3rd place, primarily embedded player and standing still as technology disappears further and further out of sight, much as it did for Commodore. Trevor is like Irving after all. He loves his PPC and it does have an area advantage over ARM64/AArch64, which he is leveraging with the A1222+ CPU with its castrated FPU. RISC-V is difficult to beat for area though. I would rather play the 68k in-order performance, code density and retro gaming advantages.

Last edited by matthey on 20-Nov-2024 at 11:59 PM.

Hammer 
Re: PowerPC lost in space to RISC-V the final nail in the coffin for PowerPC?
Posted on 21-Nov-2024 2:52:33
[ #2 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5906
From: Australia

@matthey

Is that relevant for 68K Amiga?

PiStorm16 targets RPi CM4 (Cortex A72) for 16-bit Amigas.

PiStorm16 replaces the original PiStorm.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

matthey 
Re: PowerPC lost in space to RISC-V the final nail in the coffin for PowerPC?
Posted on 21-Nov-2024 6:10:36
[ #3 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2355
From: Kansas

Hammer Quote:

Is that relevant for 68K Amiga?


The NASA PPC RAD750 CPU replacement is relevant to the PPC, 68k and Amiga. The PPC shallow pipeline 4-stage limited OoO CPU was replaced by a RISC-V 8-stage in-order SiFive X280 CPU that closely resembles the 8-stage in-order design of the 68060. What do they have in common?

68060 and SiFive U74/X280 similarities
o in-order 8-stage pipeline
o variable length encoding
o 2 instruction dispatch/issue to 2 execution pipelines
o decoupled instruction fetch pipeline feeding an instruction buffer feeding the execution pipelines
o address generation ALU before execution ALU in execution pipeline to eliminate load-to-use stalls
o early execution of instructions in the AG ALU or late execution of instructions in the execute ALU
o single cycle throughput of most instructions
o correct dynamic branch prediction gives zero cycle branches in most cases

There are differences too but the pipeline design is very similar. The architects obviously resurrected a CISC design and it very well could have been the 68060 design they copied. The weak RISC-V ISA leaves quite a bit of performance on the table compared to a good CISC ISA like the 68k but we can see that the 68060 architects nailed it only to be sabotaged to push the limited OoO shallow pipeline PPC603/G3 design that was difficult to clock up (the RAD750 only reached 200MHz using a 150nm process at best).

Will this change anything Amiga related? No. Trevor likes his PPC turds and making them fly. The sane Amiga masses left and the remaining Amiga fans are very tolerant of anything with an Amiga label or are happy with their original 68000@7MHz Amiga hardware. They don't care how Motorola and Commodore mistreated the 68k and Amiga. Amiga Corporation made it and Commodore fucked it up but nobody cares about fixing it anymore because it wouldn't be the original Motorola and Commodore nostalgic fuck ups. At least we can see that the 68060 was headed in the right direction and the PPC direction looks more dead end now. PPC is far more dead than the 68060 was when replaced by PPC and ironically one of the last PPC replacements was with a 68060 like design. The PPC Amiga attempted replacement in the first place would not have been necessary if Motorola/Freescale had clocked up the 68060 but then maybe the 68060 would have been clocked up for or by Commodore had they thought it was important to upgrade Amiga CPUs and chipsets. Maybe it would be better to let the Amiga die rather than use ancient bastardized castrated embedded PPC CPUs or emulate deficient Motorola/Freescale and Commodore 68k CPUs and chipsets while pretending there is some elegance from the original design left. At least then, the Amiga embarrassment would be over.


Copyright (C) 2000 - 2019 Amigaworld.net.
Amigaworld.net was originally founded by David Doyle