Amigaworld.net - The Amiga Computer Community Portal Website


risc-v news

matthey

Re: risc-v news
Posted on 10-Feb-2022 2:34:48

[ #21 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2017
From: Kansas

bison Quote:

@ferrels

You should get in touch with Intel and tell them that you know more than they do.

lol

If performance were the only metric by which to measure an architecture, then ARM would be dead too. Architectures have advantages and disadvantages, and RISC-V is no exception.

+ open hardware (Linux of the hardware world)
+ large free encoding space allows 3rd party customization for specialized workloads
+ compressed encoding provides better code density than most RISC architectures
+ simplicity and versatility allow it to scale down to deeply embedded applications
+ simplicity reduces power and area
- simplicity makes performance challenging
- practically a boring moderately improved SPARC/MIPS architecture
- strong versatility and configurability results in less standardization and compatibility

IMO, RISC-V is best suited to embedded markets but doesn't have universal appeal. ARM architectures are more human friendly. The SuperH open hardware J-Core project is another example where RISC-V was rejected (by someone with 68k roots). The friendliness of an architecture was not supposed to matter, but I believe the decline of PPC had a lot to do with developers not liking it, which led to less optimized code, buggier programs and less software development. Even the quirky x86 was more friendly, but one of the most human friendly architectures of all time is the 68k.

hardwaretech Quote:

More risc news
https://youtu.be/z89ysU4RwIg
https://youtu.be/yU2MHsRMtZE
https://youtu.be/yTMRGERZrQE
https://forums.macrumors.com/threads/raspberry-pi-vs-power-mac-g5.2111057/
A pi 3 was slightly weaker than a Mac g5 does that mean a pi 4 is equal or more powerful?

Yes. The Raspberry Pi 3 has 4 Cortex-A53 cores, which are low power cores rather than performance cores, built on a 40nm process. The Raspberry Pi 4 has 4 Cortex-A72 cores, which are high performance cores built on a 28nm process. The performance difference between the RPi 3 and RPi 4 is huge. It looks like the benchmarks in the link were chosen to take advantage of the 4 cores of the RPi 3. In practice, many workloads do not benefit as much from parallel processing. Still, the RPi 3 has much better performance/MHz and performance/W in a single core comparison with the PPC G5, and the RPi 4 likely has better overall performance/core than the PPC G5.

hardwaretech Quote:

If so what does it say about Amiga ppc?

The fab process used for the PPC G5 is old, and the memory bandwidth of that era is significantly lower.

Cortex-A72
2015
28nm for RPi 4 SoC
64 bit
14-16 stage pipeline (15 cycle branch mispredict penalty)
micro-oped OoO, instruction-fusion
decode up to 3 instructions, dispatch up to 5 per cycle
performance design

Cortex-A53
2012
40nm for RPi 3 SoC
64 bit
8 stage pipeline (7 cycle branch mispredict penalty)
in-order
superscalar 2 instructions per cycle
low power throughput design

PPC 970 (G5)
2002
130nm process
64 bit
16-21 stage pipeline (16 cycle branch mispredict penalty)
micro-oped OoO
fetch and decode up to 8 instructions, dispatch up to 5, issue up to 8 and retire up to 5 per cycle
performance design

68060
1994
500nm
32 bit
8 stage pipeline (7 cycle branch mispredict penalty)
in-order
superscalar 2 instructions per cycle (3 instructions per cycle with branch folding)
balanced design

The PPC G5 and Cortex-A72 cores are the more similar pair, both being OoO designs for high performance. The Cortex-A53 is a much simpler low power throughput design. I added the 68060 specs because that design is somewhat similar to the Cortex-A53. The 8 stage pipeline of the 68060 was actually considered long back then in comparison to the early shallow PPC pipeline designs and even the early Pentium pipeline design, but it was ahead of its time. There is a nice article with a diagram of the Cortex-A53 and its similar successor, the Cortex-A55, which states the following.

https://www.anandtech.com/show/11441/dynamiq-and-arms-new-cpus-cortex-a75-a55/4 Quote:

At a high level, the A55 is still a dual-issue, in-order CPU with an 8-stage pipeline. According to ARM, 8 stages is still the sweet spot, because it’s not seeing significant frequency improvements when moving from 16/14nm to 10nm to 7nm (most of the process gains are with area scaling and reduced dynamic/leakage power). With 8 stages, the A55 should reach a similar peak frequency as A53. Moving to a shorter pipeline would reduce the max frequency without a significant improvement to power or area, while a longer pipeline would increase area and power consumption for only a small frequency gain.

They didn't change much when upgrading from the Cortex-A53 to the Cortex-A55. One of the minor tweaks was a move to a 4-way set associative L1 ICache like the 68060 has. The Cortex-A53 has some modern features the 68060 didn't get, like full 64 bit support, symmetric dual-issue for most instructions so both issue slots can feed instructions to any pipeline, larger L1 and new L2 and L3 caches, loop termination prediction (in the Apollo core, which I suggested), a hardware return/link stack (in the Apollo core), an indirect branch predictor, a SIMD unit (in the Apollo core but integer only), IEEE half precision support (which would improve code density for fp code), etc., most of which is mentioned in the article.

One other change mentioned for the Cortex-A55 is separate load and store units, which is likely what "reduced the L1 pointer chasing load-to-use latency from 3 cycles in A53 to 2 cycles in A55". The 68060 doesn't need this, though, because most load-to-use penalties are avoided by its horizontal pipeline, which is beneficial with CISC instructions. AArch64 added CISC-like addressing modes, but its instructions are still broken apart into separate load/store and reg-reg ALU instructions with the load-to-use penalty between them. The 68060 does the EA calculation and the ALU operation in the same execution pipeline, and both execution pipes can do this simultaneously with single cycle throughput. RISC only pipelines instructions which access registers, while the 68060 design can pipeline accesses to registers and memory together with no load-to-use penalty.

ARM is aware of the performance killing load-to-use penalties of in-order RISC cores and goes to great lengths to minimize them to 2 or 3 cycles. The RISC-V BOOMv2 core (an out-of-order design), for example, has a 4 cycle load-to-use penalty.

; 68k Amiga library Open()
; a6=libptr, d0=version
Open:
move.l a6,d0 ; return library pointer in d0
addq.w #1,(LIB_OPENCNT,a6) ; increment library open count
bclr #LIBB_DELEXP,(sb_Flags,a6) ; prevent delayed expunges
rts

; RISC Amiga library Open() (generic load/store pseudo-assembly)
; r30=libptr, r3=version
Open:
load (LIB_OPENCNT,r30),r3 ; fetch library open count
; load-to-use stall/bubble BOOMv2=4 cycles, Cortex-A53=3 cycles, Cortex-A55=2 cycles
add #1,r3,r3 ; increment library open count
store r3,(LIB_OPENCNT,r30) ; write open count back
load (sb_Flags,r30),r3 ; fetch library flags
; load-to-use stall/bubble BOOMv2=4 cycles, Cortex-A53=3 cycles, Cortex-A55=2 cycles
bclr #LIBB_DELEXP,r3 ; prevent delayed expunges
store r3,(sb_Flags,r30) ; write flags back
move r30,r3 ; return library pointer in r3
return

None of these RISC cores can execute this code in fewer cycles than the ancient 68060, which also needs less than half the code. The humble little 68060, with its in-order core design, was outperforming some high performance OoO cores back in the day while using a similar or even older fab process. That is unlike the in-order Cortex-A53, whose 40nm process helps it outperform the OoO G5 built on a 130nm process from a decade earlier.
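As a rough check of the stall comments in the RISC listing above, here is a minimal Python sketch. It assumes a deliberately simple model that is not from any vendor documentation: single issue, in order, one cycle per instruction, plus a fixed bubble whenever an instruction reads a register written by the load immediately before it. The bubble values are the ones quoted in the listing. The byte sizes assume the standard 68k encodings with 16 bit displacements and a hypothetical fixed 4 byte RISC encoding. Dual issue, caches and branch cost are ignored, so the numbers only illustrate the arithmetic.

```python
# Hypothetical model: single-issue, in-order, 1 cycle per instruction,
# plus a fixed bubble when an instruction consumes a just-loaded register.

# (mnemonic, destination register or None, source registers)
RISC_OPEN = [
    ("load", "r3", ["r30"]),        # load (LIB_OPENCNT,r30),r3
    ("add", "r3", ["r3"]),          # add #1,r3,r3
    ("store", None, ["r3", "r30"]), # store r3,(LIB_OPENCNT,r30)
    ("load", "r3", ["r30"]),        # load (sb_Flags,r30),r3
    ("bclr", "r3", ["r3"]),         # bclr #LIBB_DELEXP,r3
    ("store", None, ["r3", "r30"]), # store r3,(sb_Flags,r30)
    ("move", "r3", ["r30"]),        # move r30,r3
    ("return", None, []),           # return
]

# Bubble sizes as quoted in the listing's comments.
LOAD_TO_USE = {"BOOMv2": 4, "Cortex-A53": 3, "Cortex-A55": 2}

# Encoded sizes in bytes. 68k: move.l a6,d0 / addq.w #1,(d16,a6) /
# bclr #n,(d16,a6) / rts. RISC: assumed fixed 32-bit instructions.
M68K_SIZES = [2, 4, 6, 2]
RISC_SIZES = [4] * len(RISC_OPEN)


def count_cycles(program, bubble):
    """Count cycles for `program` under the simple model described above."""
    cycles = 0
    loaded = None  # register written by the immediately preceding load
    for op, dest, sources in program:
        if loaded is not None and loaded in sources:
            cycles += bubble  # stall until the loaded value is available
        cycles += 1
        loaded = dest if op == "load" else None
    return cycles


if __name__ == "__main__":
    for core, bubble in LOAD_TO_USE.items():
        print(core, count_cycles(RISC_OPEN, bubble), "cycles")
    print("68k bytes:", sum(M68K_SIZES), "RISC bytes:", sum(RISC_SIZES))
```

Under these assumptions the RISC sequence costs 8 cycles plus two bubbles, and the 68k routine is 14 bytes against 32 bytes, which is where "less than half the code" comes from.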

hardwaretech Quote:

All I know risc cpus are getting more powerful and cheaper each year. It takes time to make a product and sell it. If they started now they cpu leave open be able to select it the last year before making the product. Software like Aros is designed to port to any hardware. I am sure the rest of Amiga OS could do the same with that much lead time.

Chip fab process improvements are not giving as much benefit as they used to, but they are still coming for now, even though they are getting more difficult and expensive. That has nothing to do with RISC, whose advantage was being quicker to design when Moore's Law was in full swing, which allowed getting to market sooner on a better fab process. RISC has adopted CISC features like advanced addressing modes and variable length instructions to try to close the performance gap. Some people would argue these architectures aren't even RISC anymore, but they still aren't getting all the CISC performance.

Last edited by matthey on 10-Feb-2022 at 04:40 PM.
Last edited by matthey on 10-Feb-2022 at 07:57 AM.
Last edited by matthey on 10-Feb-2022 at 07:44 AM.
Last edited by matthey on 10-Feb-2022 at 02:46 AM.
Last edited by matthey on 10-Feb-2022 at 02:38 AM.

Status: Offline

hardwaretech

Re: risc-v news
Posted on 11-Feb-2022 22:07:41

[ #22 ]

Member

Joined: 5-May-2010
Posts: 62
From: blaine minnesota usa

@matthey
You stated -Chip fab process improvements are not giving as much benefit as they used to but they are still coming for now even though they are getting more difficult and expensive. It has nothing to do with RISC which has the advantage of being quicker to design when Moore's Law was in full swing allowing to get to market quicker using a better fab process. RISC has adopted CISC features like advanced addressing modes and variable length instructions to try to close the performance gap. Some people would argue they aren't even RISC anymore but they still aren't getting all the CISC performance.

But from what I understand, RISC-V lets you custom build the instruction set according to the user's needs. If you don't need something, it isn't added. So if it is becoming more like CISC, then someone wanted that, but it can also be leaner.

Status: Offline

MEGA_RJ_MICAL

Re: risc-v news
Posted on 12-Feb-2022 1:51:18

[ #23 ]

Super Member

Joined: 13-Dec-2019
Posts: 1200
From: AMIGAWORLD.NET WAS ORIGINALLY FOUNDED BY DAVID DOYLE

@matthey


I was sitting with Matthew,

we were watching TV I said;

Hey Matthew, what do you see?

Do you see the guns? Do you see the bombs?

See those people throwing all of those stones?

Do you see the cars going up in flames?

See their faces, do you know their names?

Hey Matthew, when you're watching TV;

Hey, Hey Matthew, what do you see?

Do you see the tension in a rich man's house?

Do you see the cat? Do you see the mouse?

Do you see the beauty or the big bad beast?

Do you see the famine? Do you see the feast?

In this world of villains do you see the crime?

Is a superhero waiting at the edge of time?

Hey Matthew, when you're watching TV;

I said Hey, Hey Matthew, what do you see?

I see Dallas, Dynasty, Terrahawks,

He-Man, Tom and Jerry, Dukes of Hazzard,

Airwolf, Blue Thunder, Rambo, Road Runner,

Daffy Duck, The A-Team, The A-Team, I see the A-Team.

I was sitting with Matthew,

we were watching TV;

I said Hey Matthew,

what'll you be?

Will you walk like a lion in the danger zone?

Will you pass unnoticed in the great unknown?

When you see the press out on a witch hunt.

When you see a political publicity stunt.

Will you fight for the right? Will you be a man?

Will you step aside? Will you give a damn?

Will you ride that tide with the starry eyed?

Will you give and take?

Will you laugh 'til you ache?

Will you learn to live?

Will you take? Will you give?

As the bridges burn, will you live and learn? Will you be numbered with the brave and true? Well,

good luck kid, here's lookin' at you.

In the future, Matthew, so bright and free,

Hey Matthew, what'll you be?

Hey Hey Matthew, what'll you be?

I want to be a soldier, street fighter, be a policeman, A captain of a boat - big boat.

I want to be a medic man, a cowboy,

a train driver, high jump champion,

a fireman, a pilot, I want to be your friend.

It's all a game - I hope.

Hey Matthew,

Hey Hey Matthew.

I hope.

I hope.

I hope.

I hope.

_________________
I HAVE ABS OF STEEL
--
CAN YOU SEE ME? CAN YOU HEAR ME? OK FOR WORK

Status: Offline

matthey

Re: risc-v news
Posted on 12-Feb-2022 1:53:18

[ #24 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2017
From: Kansas

hardwaretech Quote:

You stated -Chip fab process improvements are not giving as much benefit as they used to but they are still coming for now even though they are getting more difficult and expensive. It has nothing to do with RISC which has the advantage of being quicker to design when Moore's Law was in full swing allowing to get to market quicker using a better fab process. RISC has adopted CISC features like advanced addressing modes and variable length instructions to try to close the performance gap. Some people would argue they aren't even RISC anymore but they still aren't getting all the CISC performance.

But from what I understand, RISC-V lets you custom build the instruction set according to the user's needs. If you don't need something, it isn't added. So if it is becoming more like CISC, then someone wanted that, but it can also be leaner.

RISC-V allows implementers to choose among standard extensions. For example, RV32E is about as simple as it gets, with only a minimal set of 32 bit integer instructions for embedded use. An RV64IMAC variant would have the normal integer instructions, multiply and divide instructions, atomic instructions and variable length compressed instructions for a 64 bit CPU. This flexibility allows the hardware to scale from very small deeply embedded CPU cores to supercomputers, but it is difficult to support all variations well, and different RISC-V CPU cores will be less standardized than ARM AArch64. AArch64 took the opposite approach of a large standardized 64 bit instruction set, which is bloated and expensive to support in hardware for simple embedded use. Even though RISC-V is an open standard, it is still copyrighted and licensed, so it can't be altered, as that would defeat the purpose of a standard. The customization allowed is to choose the variant and to add new instructions in a large encoding area set aside for them. The following link describes how to add new instructions (it is also where I grabbed the pic in my last post).

https://www.elektormagazine.com/articles/what-is-risc-v
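To make variant names like RV32E and RV64IMAC concrete, here is a small Python sketch that splits such a name into register width, base and extension letters. The function name `describe_isa` and the description strings are my own illustration. Only the single-letter extensions I, E, M, A, F, D and C are modeled, and anything else is reported as not modeled rather than guessed at.

```python
import re

# Base integer ISAs and a few standard single-letter extensions.
BASES = {
    "I": "base integer ISA (32 registers)",
    "E": "embedded base integer ISA (16 registers)",
}
EXTENSIONS = {
    "M": "integer multiply/divide",
    "A": "atomic instructions",
    "F": "single-precision floating point",
    "D": "double-precision floating point",
    "C": "compressed (16-bit) instructions",
}


def describe_isa(name):
    """Split a name such as 'RV64IMAC' into width, base and extensions."""
    match = re.fullmatch(r"RV(32|64)([IE])([A-Z]*)", name.upper())
    if match is None:
        raise ValueError(f"not a variant name this sketch understands: {name!r}")
    xlen, base, letters = match.groups()
    return {
        "xlen": int(xlen),
        "base": BASES[base],
        "extensions": [
            (letter, EXTENSIONS.get(letter, "not modeled in this sketch"))
            for letter in letters
        ],
    }


if __name__ == "__main__":
    for variant in ("RV32E", "RV64IMAC"):
        info = describe_isa(variant)
        print(variant, info["xlen"], info["base"])
        for letter, meaning in info["extensions"]:
            print("  ", letter, "=", meaning)
```

A real toolchain accepts many more extension names (multi-letter ones included), so treat this only as a reading aid for the names used in this post.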

I don't believe it would be practical to add new addressing modes or reg-mem CISC instructions (eliminating load-to-use stalls in an in-order RISC core is not that easy). Most likely, register-only RISC style instructions are all that can be added, but many of them can be added. It's a nice feature. Another option to increase flexibility and customizability is to include FPGA capabilities. For parallel workloads, an FPGA has much higher performance than a sequential CPU, but it is more difficult to program and uses more power than custom CPU instructions. FPGA capabilities are especially enticing for a 68k retro machine, as many FPGA cores already exist. It would be great to be able to load different chipsets, simple CPU, DSP and 3D cores, multimedia codecs, custom embedded cores, etc., into the FPGA while multitasking with the AmigaOS.
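As an illustration of what a register-only custom instruction looks like at the bit level, here is a minimal Python sketch that packs a three-register instruction into the RISC-V R-type layout using the custom-0 major opcode, one of the opcode slots the base encoding sets aside for extensions. The helper name `encode_r_type` and the choice of registers are hypothetical, and the instruction itself has no defined meaning. Only the field layout is shown.

```python
# "custom-0" major opcode from the RISC-V base opcode map (bits 6..0).
CUSTOM_0 = 0b0001011


def encode_r_type(opcode, rd, rs1, rs2, funct3=0, funct7=0):
    """Pack fields into the 32-bit R-type layout:
    funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
    """
    fields = (
        ("opcode", opcode, 7),
        ("rd", rd, 5),
        ("rs1", rs1, 5),
        ("rs2", rs2, 5),
        ("funct3", funct3, 3),
        ("funct7", funct7, 7),
    )
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (
        (funct7 << 25)
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | (rd << 7)
        | opcode
    )


if __name__ == "__main__":
    # Hypothetical custom operation: x10 = op(x11, x12)
    word = encode_r_type(CUSTOM_0, rd=10, rs1=11, rs2=12)
    print(f"0x{word:08X}")
```

The layout has room for three register fields and a few function bits but no memory operand, which fits the point above that added instructions will most likely be register-only.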

Status: Offline


Amigaworld.net was originally founded by David Doyle