matthey (Elite Member)
Joined: 14-Mar-2007, Posts: 2017, From: Kansas
Re: risc-v news, Posted on 10-Feb-2022 2:34:48 [ #21 ]

bison Quote:
@ferrels
You should get in touch with Intel and tell them that you know more than they do.
lol
If performance was the only metric to measure an architecture then ARM would be dead too. Architectures have advantages and disadvantages and RISC-V is no exception.
+ open hardware (Linux of the hardware world)
+ large free encoding space allows 3rd party customization for specialized workloads
+ compressed encoding provides better code density than most RISC architectures
+ simplicity and versatility allows it to scale down to deeply embedded applications
+ simplicity reduces power and area
- simplicity makes performance challenging
- practically a boring moderately improved SPARC/MIPS architecture
- strong versatility and configurability results in less standardization and compatibility
IMO, RISC-V is most suited for embedded markets but doesn't have universal appeal. ARM architectures are more human friendly. The SuperH open hardware J-Core project is another example where RISC-V was rejected (by someone with 68k roots). The friendliness of an architecture was not supposed to matter but I believe the decline of PPC had a lot to do with developers not liking it leading to less optimized code, buggier programs and less software development. Even the quirky x86 was more friendly but one of the most human friendly architectures of all time is the 68k.
hardwaretech Quote:
Yes. The Raspberry Pi 3 has 4 Cortex-A53 cores which are low power and not performance cores using a 40nm process. The Raspberry Pi 4 has 4 Cortex-A72 cores which are high performance cores using a 28nm process. The performance difference between the RPi 3 and RPi 4 is huge. It looks like the benchmarks in the link were chosen to take advantage of the 4 cores of the RPi 3. In practice, parallel processing of many workloads is not as advantageous. Still, the RPi 3 has much better performance/MHz and performance/W in a single core comparison with the PPC G5 and the RPi 4 likely has better overall performance/core than the PPC G5.
hardwaretech Quote:
If so what does it say about Amiga ppc?
The PPC G5 uses an old fab process and its older memory system has significantly lower bandwidth.
Cortex-A72 (2015), 28nm for the RPi 4 SoC: 64 bit, 14-16 stage pipeline (15 cycle branch mispredict penalty), micro-oped OoO, instruction fusion, decode up to 3 instructions, dispatch up to 5 per cycle, performance design
Cortex-A53 (2012), 40nm for the RPi 3 SoC: 64 bit, 8 stage pipeline (7 cycle branch mispredict penalty), in-order superscalar, 2 instructions per cycle, low power throughput design
PPC 970 (G5) (2002), 130nm process: 64 bit, 16-21 stage pipeline (16 cycle branch mispredict penalty), micro-oped OoO, fetch and decode up to 8 instructions, dispatch up to 5, issue up to 8 and retire up to 5 per cycle, performance design
68060 (1994), 500nm: 32 bit, 8 stage pipeline (7 cycle branch mispredict penalty), in-order superscalar, 2 instructions per cycle (3 instructions per cycle with branch folding), balanced design
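As a rough illustration of why the mispredict penalties in the list above matter, here is a minimal Python sketch that converts a penalty into average extra cycles per instruction. The penalties come from the list; the branch fraction and mispredict rate are hypothetical numbers chosen only for illustration, and using the same mispredict rate for every core is an assumption since their branch predictors differ.

```python
# Branch mispredict penalties in cycles, copied from the spec list in the post.
MISPREDICT_PENALTY = {
    "Cortex-A72": 15,
    "Cortex-A53": 7,
    "PPC 970 (G5)": 16,
    "68060": 7,
}


def mispredict_cpi(penalty, branch_fraction, mispredict_rate):
    """Average extra cycles per instruction lost to branch mispredicts.

    branch_fraction: share of executed instructions that are branches.
    mispredict_rate: share of those branches that are mispredicted.
    """
    return branch_fraction * mispredict_rate * penalty


if __name__ == "__main__":
    # Hypothetical workload (assumption): 20% branches, 5% of them mispredicted.
    for core, penalty in MISPREDICT_PENALTY.items():
        extra = mispredict_cpi(penalty, branch_fraction=0.20, mispredict_rate=0.05)
        print(f"{core}: +{extra:.2f} cycles per instruction")
```

With equal mispredict rates the cost scales linearly with the penalty, which is why a deep pipeline needs a better predictor just to break even with a shallow one.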
The PPC G5 and Cortex-A72 cores are the more similar pair, both being OoO designs built for high performance. The Cortex-A53 is a much simpler low power throughput design. I added the 68060 specs because its design is somewhat similar to the Cortex-A53. The 8 stage pipeline of the 68060 was actually considered long back then in comparison to the early shallow PPC pipeline designs and even the early Pentium pipeline design, but it was ahead of its time. There is a nice article with a diagram of the Cortex-A53 and the similar Cortex-A55 successor which states the following.
https://www.anandtech.com/show/11441/dynamiq-and-arms-new-cpus-cortex-a75-a55/4 Quote:
At a high level, the A55 is still a dual-issue, in-order CPU with an 8-stage pipeline. According to ARM, 8 stages is still the sweet spot, because it’s not seeing significant frequency improvements when moving from 16/14nm to 10nm to 7nm (most of the process gains are with area scaling and reduced dynamic/leakage power). With 8 stages, the A55 should reach a similar peak frequency as A53. Moving to a shorter pipeline would reduce the max frequency without a significant improvement to power or area, while a longer pipeline would increase area and power consumption for only a small frequency gain.
They didn't change much when upgrading from the Cortex-A53 to the Cortex-A55. One of the minor tweaks was a move to a 4-way set associative L1 ICache like the 68060 has. The Cortex-A53 has some modern features the 68060 didn't get, most of which are mentioned in the article: full 64 bit support, symmetric dual-issue for most instructions so both issue slots can feed instructions to any pipeline, larger L1 and new L2 and L3 caches, loop termination prediction (in the Apollo core, which I suggested), a hardware return/link stack (in the Apollo core), an indirect branch predictor, a SIMD unit (in the Apollo core but integer only), IEEE half precision support (which would improve code density for fp code), etc.

One other change mentioned for the Cortex-A55 is separate load and store units, which is likely what "reduced the L1 pointer chasing load-to-use latency from 3 cycles in A53 to 2 cycles in A55". The 68060 doesn't need this, though, because most load-to-use penalties are avoided by its horizontal pipeline, which is beneficial with CISC instructions. AArch64 added CISC like addressing modes but its instructions are still broken apart into separate load/store and reg-reg ALU instructions with the load-to-use penalty in between. The 68060 does the EA calculation and ALU operation in the same execution pipeline, and both execution pipes can do this simultaneously with single cycle throughput. RISC only pipelines instructions which access registers, while the 68060 design can pipeline accesses to registers and memory together with no load-to-use penalty. ARM is aware of the performance killing load-to-use penalties of in-order RISC cores and goes to great lengths to minimize them to 2 or 3 cycles. The RISC-V BOOMv2 core, for example, has a 4 cycle load-to-use penalty.
; 68k Amiga library Open()
; a6=libptr, d0=version
Open:
   move.l a6,d0                     ; return library pointer in d0
   addq.w #1,(LIB_OPENCNT,a6)       ; increment library open count
   bclr   #LIBB_DELEXP,(sb_Flags,a6) ; prevent delayed expunges
   rts
; RISC Amiga library Open()
; r30=libptr, r3=version
Open:
   load   (LIB_OPENCNT,r30),r3
   ; load-to-use stall/bubble BOOMv2=4 cycles, Cortex-A53=3 cycles, Cortex-A55=2 cycles
   add    #1,r3,r3
   store  r3,(LIB_OPENCNT,r30)
   load   (sb_Flags,r30),r3
   ; load-to-use stall/bubble BOOMv2=4 cycles, Cortex-A53=3 cycles, Cortex-A55=2 cycles
   bclr   #LIBB_DELEXP,r3
   store  r3,(sb_Flags,r30)
   move.l r30,r3                    ; return library pointer in r3
   return
None of these RISC cores can execute this code in fewer cycles than the ancient 68060, which also needs less than half the code. The humble little 68060, with its in-order core design, was outperforming some high performance OoO cores back in the day while using a similar or even older fab process. That is unlike the in-order Cortex-A53 on a 40nm process outperforming the OoO G5 on a 130nm process from a decade earlier.
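To make the load-to-use bubbles in the RISC listing above concrete, here is a toy Python model that counts issue cycles for that sequence. It is only a sketch under loud assumptions: a single-issue in-order pipeline, every non-load result available one cycle after issue, a load result available `load_to_use` cycles after issue, and no cache misses or other hazards. The `Instr` class and `issue_cycles` function are hypothetical names, and the model says nothing about real 68060, Cortex or BOOM timings.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str               # pseudo assembly from the post
    op: str                 # "load", "alu", "store" or "ret"
    dst: Optional[str]      # register written, if any
    srcs: Tuple[str, ...]   # registers read


# The RISC pseudo code for Open() from the post, one entry per instruction.
RISC_OPEN = (
    Instr("load (LIB_OPENCNT,r30),r3", "load", "r3", ("r30",)),
    Instr("add #1,r3,r3", "alu", "r3", ("r3",)),
    Instr("store r3,(LIB_OPENCNT,r30)", "store", None, ("r3", "r30")),
    Instr("load (sb_Flags,r30),r3", "load", "r3", ("r30",)),
    Instr("bclr #LIBB_DELEXP,r3", "alu", "r3", ("r3",)),
    Instr("store r3,(sb_Flags,r30)", "store", None, ("r3", "r30")),
    Instr("move.l r30,r3", "alu", "r3", ("r30",)),
    Instr("return", "ret", None, ()),
)


def issue_cycles(program, load_to_use):
    """Cycles needed to issue every instruction in order, one per cycle,
    stalling whenever a source register is not ready yet (toy model)."""
    ready = {}      # register -> first cycle its value can be used
    next_slot = 0   # earliest cycle the next instruction may issue
    for ins in program:
        start = max([next_slot] + [ready.get(reg, 0) for reg in ins.srcs])
        if ins.dst is not None:
            latency = load_to_use if ins.op == "load" else 1
            ready[ins.dst] = start + latency
        next_slot = start + 1
    return next_slot


if __name__ == "__main__":
    for name, latency in (("2 cycle", 2), ("3 cycle", 3), ("4 cycle", 4)):
        print(f"{name} load-to-use: {issue_cycles(RISC_OPEN, latency)} cycles")
```

In this model the 8 instructions take 10, 12 and 14 cycles for load-to-use latencies of 2, 3 and 4, because each of the two loads is followed immediately by its consumer. A dual-issue core or a scheduler that moves independent work between the load and its use would change the totals.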
hardwaretech Quote:
All I know is that RISC CPUs are getting more powerful and cheaper each year. It takes time to make a product and sell it. If they started now, they could leave the CPU choice open and select it in the last year before making the product. Software like AROS is designed to port to any hardware. I am sure the rest of AmigaOS could do the same with that much lead time.
Chip fab process improvements are not giving as much benefit as they used to, but they are still coming for now even though they are getting more difficult and expensive. That has nothing to do with RISC, which had the advantage of being quicker to design when Moore's Law was in full swing, allowing RISC chips to get to market sooner on a better fab process. RISC has adopted CISC features like advanced addressing modes and variable length instructions to try to close the performance gap. Some people would argue these designs aren't even RISC anymore, but they still aren't getting all the CISC performance.
Last edited by matthey on 10-Feb-2022 at 04:40 PM.

hardwaretech (Member)
Joined: 5-May-2010, Posts: 62, From: blaine minnesota usa
Re: risc-v news, Posted on 11-Feb-2022 22:07:41 [ #22 ]

@matthey
You stated: "Chip fab process improvements are not giving as much benefit as they used to, but they are still coming for now even though they are getting more difficult and expensive. That has nothing to do with RISC, which had the advantage of being quicker to design when Moore's Law was in full swing, allowing RISC chips to get to market sooner on a better fab process. RISC has adopted CISC features like advanced addressing modes and variable length instructions to try to close the performance gap. Some people would argue these designs aren't even RISC anymore, but they still aren't getting all the CISC performance."
But from what I understand, RISC-V lets you custom build the instruction set according to the user's needs. If you don't need something, it is not added. So if it is more like CISC, then someone wanted it, but it can be leaner.

MEGA_RJ_MICAL (Super Member)
Joined: 13-Dec-2019, Posts: 1200, From: AMIGAWORLD.NET WAS ORIGINALLY FOUNDED BY DAVID DOYLE
Re: risc-v news, Posted on 12-Feb-2022 1:51:18 [ #23 ]

@matthey
I was sitting with Matthew,
we were watching TV I said;
Hey Matthew, what do you see?
Do you see the guns? Do you see the bombs?
See those people throwing all of those stones?
Do you see the cars going up in flames?
See their faces, do you know their names?
Hey Matthew, when you're watching TV;
Hey, Hey Matthew, what do you see?
Do you see the tension in a rich man's house?
Do you see the cat? Do you see the mouse?
Do you see the beauty or the big bad beast?
Do you see the famine? Do you see the feast?
In this world of villains do you see the crime?
Is a superhero waiting at the edge of time?
Hey Matthew, when you're watching TV;
I said Hey, Hey Matthew, what do you see?
I see Dallas, Dynasty, Terrahawks,
He-Man, Tom and Jerry, Dukes of Hazzard,
Airwolf, Blue Thunder, Rambo, Road Runner,
Daffy Duck, The A-Team, The A-Team, I see the A-Team.
I was sitting with Matthew,
we were watching TV;
I said Hey Matthew,
what'll you be?
Will you walk like a lion in the danger zone?
Will you pass unnoticed in the great unknown?
When you see the press out on a witch hunt.
When you see a political publicity stunt.
Will you fight for the right? Will you be a man?
Will you step aside? Will you give a damn?
Will you ride that tide with the starry eyed?
Will you give and take?
Will you laugh 'til you ache?
Will you learn to live?
Will you take? Will you give?
As the bridges burn, will you live and learn? Will you be numbered with the brave and true? Well,
good luck kid, here's lookin' at you.
In the future, Matthew, so bright and free,
Hey Matthew, what'll you be?
Hey Hey Matthew, what'll you be?
I want to be a soldier, street fighter, be a policeman, A captain of a boat - big boat.
I want to be a medic man, a cowboy,
a train driver, high jump champion,
a fireman, a pilot, I want to be your friend.
It's all a game - I hope.
Hey Matthew,
Hey Hey Matthew.
I hope.
I hope.
I hope.
I hope.
_________________
I HAVE ABS OF STEEL -- CAN YOU SEE ME? CAN YOU HEAR ME? OK FOR WORK

matthey (Elite Member)
Joined: 14-Mar-2007, Posts: 2017, From: Kansas
Re: risc-v news, Posted on 12-Feb-2022 1:53:18 [ #24 ]

hardwaretech Quote:
You stated: "Chip fab process improvements are not giving as much benefit as they used to, but they are still coming for now even though they are getting more difficult and expensive. That has nothing to do with RISC, which had the advantage of being quicker to design when Moore's Law was in full swing, allowing RISC chips to get to market sooner on a better fab process. RISC has adopted CISC features like advanced addressing modes and variable length instructions to try to close the performance gap. Some people would argue these designs aren't even RISC anymore, but they still aren't getting all the CISC performance."
But from what I understand, RISC-V lets you custom build the instruction set according to the user's needs. If you don't need something, it is not added. So if it is more like CISC, then someone wanted it, but it can be leaner.
RISC-V allows choosing among standard extensions. For example, RV32E is about as simple as it gets, with only a minimal set of 32 bit integer instructions for embedded use. An RV64IMAC variant is a 64 bit CPU with the normal integer instructions, multiply and divide instructions, atomic instructions and variable length compressed instructions. This flexibility allows the hardware to scale from very small deeply embedded CPU cores to supercomputers, but it is difficult to support all variations well, and different RISC-V CPU cores will be less standardized than ARM AArch64. ARM took the opposite approach and made a large standardized 64 bit instruction set that is bloated and expensive to support in hardware for simple embedded use. Even though RISC-V is an open standard, it is still copyrighted and licensed, so it can't be altered, as that would defeat the purpose of a standard. The customization allowed is to choose the variant and to add new instructions in a large encoding area set aside for them. The following link describes how to add new instructions (it is also where I grabbed the pic in my last post).
https://www.elektormagazine.com/articles/what-is-risc-v
I don't believe it would be practical to add new addressing modes or reg-mem CISC instructions (eliminating load-to-use stalls in an in-order RISC core is not that easy). Most likely, register only RISC style instructions are all that can be added, but many of them can be added. It's a nice feature. Another option to increase flexibility and customizability is to include FPGA capabilities. For parallel workloads, an FPGA has much higher performance than a sequential CPU, but it is more difficult to program and uses more power than custom CPU instructions. FPGA capabilities are especially enticing for a 68k retro machine as many FPGA cores already exist. It would be great to be able to load different chipsets, simple CPU, DSP and 3D cores, multimedia codecs, custom embedded cores, etc., into the FPGA while multitasking with the AmigaOS.
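The variant names mentioned above, such as RV32E and RV64IMAC, follow a simple pattern: "RV", the register width, then one letter per base or standard extension. Here is a minimal Python sketch that decodes such a name. It only covers a handful of single-letter extensions, the descriptions are paraphrased, and `describe_isa` is a hypothetical helper name, not part of any RISC-V tool.

```python
# Single-letter base ISAs and standard extensions handled by this sketch.
# The list is deliberately incomplete and the descriptions are paraphrased.
EXTENSIONS = {
    "I": "base integer instructions",
    "E": "reduced base integer instructions for embedded use (16 registers)",
    "M": "integer multiply and divide",
    "A": "atomic instructions",
    "F": "single precision floating point",
    "D": "double precision floating point",
    "C": "compressed 16 bit instructions",
}


def describe_isa(name):
    """Split a name like 'RV64IMAC' into (width, [extension descriptions])."""
    name = name.upper()
    if not name.startswith("RV"):
        raise ValueError(f"not a RISC-V variant name: {name!r}")
    end = 2
    while end < len(name) and name[end].isdigit():
        end += 1
    if end == 2:
        raise ValueError(f"missing register width in {name!r}")
    width = int(name[2:end])
    if width not in (32, 64, 128):
        raise ValueError(f"unsupported register width: {width}")
    letters = name[end:]
    unknown = [c for c in letters if c not in EXTENSIONS]
    if not letters or unknown:
        raise ValueError(f"extensions not handled by this sketch: {unknown}")
    return width, [EXTENSIONS[c] for c in letters]


if __name__ == "__main__":
    for variant in ("RV32E", "RV64IMAC"):
        width, parts = describe_isa(variant)
        print(f"{variant}: {width} bit, " + "; ".join(parts))
```

Real toolchains accept a richer syntax (multi-letter extensions, version numbers, the G shorthand), which this sketch rejects with a `ValueError` rather than guessing.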