Amigaworld.net - The Amiga Computer Community Portal Website

some words on senseless attacks on ppc hardware


Karlos

Re: some words on senseless attacks on ppc hardware
Posted on 17-Nov-2023 21:23:21

[ #41 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@amigang

You should see how well OS4 runs on a RAD750 at 150MHz. You might even get to see it boot up before you die of acute radiation poisoning in its intended environment.

_________________
Doing stupid things for fun...


NutsAboutAmiga

Re: some words on senseless attacks on ppc hardware
Posted on 17-Nov-2023 22:06:04

[ #42 ]

Elite Member

Joined: 9-Jun-2004
Posts: 12993
From: Norway

@amigang

The Mars rover is old now, and the other article talks about DDR3 RAM; new chips use DDR5.
I don’t hate PowerPC; on the contrary, I do like it, but there is no market for it.

_________________
http://lifeofliveforit.blogspot.no/
Facebook::LiveForIt Software for AmigaOS


kolla

Re: some words on senseless attacks on ppc hardware
Posted on 17-Nov-2023 22:18:32

[ #43 ]

Elite Member

Joined: 20-Aug-2003
Posts: 3474
From: Trondheim, Norway

@NutsAboutAmiga

So, you suggest NASA, ESA etc. send out upgrades to their spacecraft to replace hardware? Or give them up because they’re using a certain architecture?

Heh.

PowerPC was mainstream long enough to need support for decades to come, and the architecture still has a few niches, and that’s really all it takes to keep on living.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC


NutsAboutAmiga

Re: some words on senseless attacks on ppc hardware
Posted on 17-Nov-2023 22:28:15

[ #44 ]

Elite Member

Joined: 9-Jun-2004
Posts: 12993
From: Norway

@kolla

They were talking about upgrading Hubble, I know, as part of a refueling maintenance project: a new module docks with the telescope and takes over.

Last edited by NutsAboutAmiga on 17-Nov-2023 at 10:39 PM.

_________________
http://lifeofliveforit.blogspot.no/
Facebook::LiveForIt Software for AmigaOS


kolla

Re: some words on senseless attacks on ppc hardware
Posted on 17-Nov-2023 23:58:28

[ #45 ]

Elite Member

Joined: 20-Aug-2003
Posts: 3474
From: Trondheim, Norway

@NutsAboutAmiga

Quote:

NutsAboutAmiga wrote:
@kolla

They were talking about upgrading Hubble, I know, as part of a refueling maintenance project: a new module docks with the telescope and takes over.

Yes, Hubble can be serviced, since it’s close by, but many if not most, are out of reach for various reasons… distance being one.

Just about all computer hardware in space is “old news”, not just because decisions about hardware and software are typically made years before launch, but also because not every generation or type of computer hardware ends up radiation hardened. For mission critical systems it’s also not uncommon to not rely on one system from one team, but to have an odd number of systems built with different architectures by different teams, and let them all solve problems in parallel, taking the consensus as the output. One of these has often been PowerPC for the last couple of decades.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC


DiscreetFX

Re: some words on senseless attacks on ppc hardware
Posted on 18-Nov-2023 1:05:23

[ #46 ]

Elite Member

Joined: 12-Feb-2003
Posts: 2555
From: Chicago, IL

@amigang

Can PPC solve the thrust problem? Humans are unable to make spacecraft that reach near light speed.

_________________
Sent from my Quantum Computer.


kolla

Re: some words on senseless attacks on ppc hardware
Posted on 18-Nov-2023 1:15:47

[ #47 ]

Elite Member

Joined: 20-Aug-2003
Posts: 3474
From: Trondheim, Norway

@DiscreetFX

OS/2 Warp for PowerPC?

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC


DiscreetFX

Re: some words on senseless attacks on ppc hardware
Posted on 18-Nov-2023 1:21:52

[ #48 ]

Elite Member

Joined: 12-Feb-2003
Posts: 2555
From: Chicago, IL

@kolla

Last update was ages ago.

4.52 / December 2001; 21 years ago

https://en.wikipedia.org/wiki/OS/2

_________________
Sent from my Quantum Computer.


agami

Re: some words on senseless attacks on ppc hardware
Posted on 18-Nov-2023 3:04:41

[ #49 ]

Super Member

Joined: 30-Jun-2008
Posts: 1958
From: Melbourne, Australia

@kolla

Quote:
kolla wrote:
@NutsAboutAmiga

PowerPC was mainstream long enough to need support for decades to come, and the architecture still has a few niches, and that’s really all it takes to keep on living.

You mean keep on existing.
Existing is not the same as living.

Point-in-time thinking does not help. Not enough of you employ trend thinking.
The trend for PowerPC/Power ISA/OpenPower is that it is a shrinking market. Systems do not grow in shrinking ecosystems. They die out.

Now y'all can stand there on the 15-degree inclined deck of the Titanic and think how it's still not bad as most of the ship is still above water. Or y'all can acknowledge where the ship is headed and jump across to another fully seaworthy vessel. With state-of-the-art iceberg detection systems, fully segmented hull, GPS navigation, satellite communications, and an adequate number of life boats.

Last edited by agami on 18-Nov-2023 at 03:23 AM.
Last edited by agami on 18-Nov-2023 at 03:05 AM.

_________________
All the way, with 68k


agami

Re: some words on senseless attacks on ppc hardware
Posted on 18-Nov-2023 3:22:31

[ #50 ]

Super Member

Joined: 30-Jun-2008
Posts: 1958
From: Melbourne, Australia

@amigang

Quote:
amigang wrote:
There is one market where PowerPC may have a future and remain the processor of choice,

NASA / SPACE!

So take that haters 😀

Also military seem to like to use PowerPC,

Until a better radiation-hardened part comes along with x86+FPGA, ARM or RISC V cores.

The age of general computing CPUs is finally coming to an end.
Fit-for-purpose logic and software defined silicon is a core pillar of the 3rd revolution in computing.
DARPA, like many others, is investing big in RISC V (among other things).

Also, for the record, I'm not a hater of PowerPC. I was a big fan of it in the '90s and was as devastated the day Apple announced the switch away from PowerPC as I was when NASA announced the end to the Space Shuttle program.

But my emotions were irrelevant when it comes to the logic of the decisions. Despite my personal feelings on the matter, these were fully justified logical decisions.
At the end of the day, reason must prevail.
And there is no good reasoning in the decisions to push on with PowerPC for the Amiga ethos.

_________________
All the way, with 68k


fishy_fis

Re: some words on senseless attacks on ppc hardware
Posted on 18-Nov-2023 4:42:55

[ #51 ]

Elite Member

Joined: 29-Mar-2004
Posts: 2170
From: Australia

@NutsAboutAmiga

Quote:
it’s the equivalent as to insulting a breast toaster

Damn!!

What the heck do you get up to behind closed doors in your spare time?
Where would someone even get such a torture device?

Actually, no, please don't tell me. It sounds serial-killer-ey.


cdimauro

Re: some words on senseless attacks on ppc hardware
Posted on 18-Nov-2023 7:10:37

[ #52 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4432
From: Germany

@K-L

Quote:

K-L wrote:
@Thread

Not enough popcorn (it was pretty obvious it was going to go this way) with the usual suspects making ultra long answers that nearly no one will be willing to read.

Who cares? If YOU don't like it you already have the solution: don't read it and skip the comment...

@Hypex

Quote:

Hypex wrote:
@matthey

Quote:
IBM was unlikely to produce "cheaper" POWER chips. They need high end server and workstation features and POWER backward compatibility making their POWER cores huge. They also demand high margins.

It also looks like the recent open sourcing of POWER, so people can freely fabricate their own designs, hasn't encouraged any new lower cost chips to be produced either.

Quote:
Simple code like this to bump the open count of a library is common on the 68k but is not safe on the PPC. The 68k uses a single instruction atomic RMW operation while RISC uses separate load, add and store instructions where another program can change the value before it is updated. Most Amiga code is written for a CISC big endian CPU and the 68k remains the best choice for compatibility.

The most sensitive would be Forbid/Permit. Actually more sensitive would be Disable/Enable. And these need to modify a single byte. PPC doesn't like to work with scalars that aren't words, especially with alignment, and Amiga structures are not long aligned.

However PPC does include instructions for atomic access: the lwarx and stwcx. pair. In typical PPC style the value must be loaded, modified, then stored back. This looks like a logical way to perform an atomic lock or an atomic increment. However, it must be accessed as a whole word. It is easy enough to long align it and mask it off, but that looks somewhat sloppy, and code should be able to directly modify the intended scalar. Of course the 68K had no problem doing a simple atomic add. But I do wonder: if 256 tasks in a row called Forbid, then each relinquished it temporarily with a Wait, that would surely break it.

Because Forbid/Permit, Disable/Enable and their ridiculous 8 bit counters are clear examples of why the Amiga o.s. is BadByDesign.
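
To make the "long align it and mask it off" idea concrete, here is a minimal sketch in portable C11 rather than PPC assembly. It is only an illustration and not how exec.library or AmigaOS 4 actually handles Forbid/Disable nesting: the helper name atomic_byte_add and its bit-offset parameter are made up for the example, and the compare-exchange loop merely plays the role that a lwarx/stwcx. retry loop would play on PPC.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomically add `delta` to the 8-bit field that
 * starts at bit `shift` (0, 8, 16 or 24) of an aligned 32-bit word.
 * Only whole-word atomic accesses are used; the other three bytes of
 * the word are preserved. Returns the new value of the byte. */
uint8_t atomic_byte_add(_Atomic uint32_t *word, unsigned shift, uint8_t delta)
{
    uint32_t old = atomic_load(word);
    uint32_t desired;
    uint8_t updated;

    do {
        /* Recompute from the freshly observed word on every retry. */
        updated = (uint8_t)(((old >> shift) & 0xFFu) + delta);
        desired = (old & ~(0xFFu << shift)) | ((uint32_t)updated << shift);
        /* On failure, `old` is reloaded with the current word value. */
    } while (!atomic_compare_exchange_weak(word, &old, desired));

    return updated;
}
```

Because the field is only 8 bits wide, 256 increments in a row bring it back to its starting value, which is the overflow case wondered about above.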
Quote:
Amusingly enough, there is a Microsoft blog that talks about it. It's fairly recent from 2018, but the original must surely be 2 decades old going by the context, since the only time I recall Microsoft touching PPC was for the short lived WindowsNT PPC port.

https://devblogs.microsoft.com/oldnewthing/20180814-00/?p=99485

Well, this is not "a blog": it's THE blog. Raymond Chen is simply amazing!

@OneTimer1

Quote:

OneTimer1 wrote:
@bhabbott

Quote:

Actually it was bit more than just 'some' assembler.

Good luck translating that to ppc!

Well he could have used AROS but we all know 'a real AmigaOS' can not be based on such a highly portable open source abomination, that was translated to x86 before it got to 68k.

That's wrong: AROS was written in C with the goal of being portable to multiple architectures / platforms.

It was NOT written for x86. However, x86 was the first architecture that they targeted, but with the aim of expediting the o.s. development (since it was the mainstream platform of the time, and still is).
Quote:
AFAIK there is a PPC version running hosted on Linux

There's an AROS native PowerPC version: http://aros.org/nightly1.php

"sam440-ppc-boot-iso

The native version for Sam440EP, Sam440EP Flex and Sam460ex computers. This is the bootable CD-ROM ISO image including ub2lb (Parthenope) SLB. It contains all core AROS system files."

but I don't know its status.
Quote:
but that would be too simple for real Amiga-OS-Fourians.

@bhabbott

Quote:

bhabbott wrote:
@matthey

C source code tends to be a lot terser than assembler so those numbers might be a bit misleading. I'm guessing about 1/4 of the actual kickstart binary is asm. However when it comes to porting it's the source code that matters. Translating assembler to C isn't trivial.

It was already done by the OS4 developers, who have replaced almost all 68k code in the o.s., and added a bunch of PowerPC assembly for the most critical / very low-level parts.

Anyway, translating assembly (not "assembler"! This is the compiler, not the language!) to C could be trivial, having the right tool.
Quote:
That's a very good reason to use 68k - but not a good reason to discuss it here. This forum is about OS4 and ppc hardware, not 68k hardware.

This forum is about Amiga technology in general and similar things, so it's NOT about OS4 or PowerPC only.

This sub-forum is about OS4, anyway.
Quote:
OP says 'it is stupid to made software for emulator when simply switch to native code made everything run many times faster'. But emulating 68k is only done when running classic Amiga software that is designed to run on classic 68k Amiga hardware, where it is absolutely required. Vampire and PiStorm are ways of getting a faster 68k CPU into your classic Amiga - and that is all. How they do it internally (HDL or software emulation) is irrelevant. This has nothing to do with OS4, which runs natively on ppc.

AFAIK nobody is attacking ppc hardware. Nobody is stopping ppc 'Amiga' owners from porting C code to their systems. Nobody is preventing development of OS 4 and apps for it. Then the OP says there are 'no reasons to use aros on x86 or arm.' The OP complains about people 'attacking ppc' when they are doing the same to other platforms, telling us we have 'no reason' to use them.

The OP is simply THE Parrot / Troll of the forum, so whatever he says usually is pure BS. Forget it.

@NutsAboutAmiga

Quote:

NutsAboutAmiga wrote:
@bhabbott

Quote:
ppc code takes up twice the space of 68k code.

Not always; in the case of JMP it’s 6 bytes on 680x0, while on PowerPC all instructions are 4 bytes,
so actually 66% of the size compared to 680x0. I must admit, you need to load a register with the value to jump to, so it takes two instructions. That’s 8 bytes, but that’s not twice the size,

Are you kidding? The JMP used on Amiga libraries allows you to reach ANY address on the 32-bit address space.

Try to do it on PowerPC and tell me how many instructions are needed for that...
Quote:
compared to 6 bytes, It is really not that important, C code is bigger because of debug stuff, and standard libraries, if you compile without the standard libraries, and without the debug symbols, the exe files are often not that big. So blame GCC, and newlibs / clib, etc, not PowerPC, they are just as big on x64 and ARM.

Irrelevant: all this burden you have on ANY hardware platform.
Quote:
Another fun fact is that bigger code is often faster; take unrolled loops as an example. Optimizing for different CPU targets can produce completely different results. A 680x0 CPU often does not have a lot of instruction cache, so unrolling loops can in fact make the code slower. But let's say you had a new super duper 68090 CPU with lots of instruction cache: you would most definitely want to unroll the loops, and that will make the code fat. So we are not really comparing apples to apples here.

Exactly: you're comparing not unrolled code with unrolled code, which is complete nonsense, since you can have / do both for ANY processor architecture (yes, even on 68k).
Quote:
We also talk about out of order execution, a nice technique where, if the next instruction does not depend on the previous instruction, they can be executed in parallel. PowerPC with its 32 registers can take advantage by loading values in series into R0, R1, R2, R4, etc., and then saving the values from R0, R1, R2, R4. Because loading a value into R1 does not depend on the value loaded into R0, parts of the instructions can overlap. It's complicated to write code like that because you need to know which instructions benefit from out of order execution, and which instructions collide.

This is about microarchitecture, NOT about architecture.
Quote:
It’s actually not the syntax of PowerPC that’s most complicated, yes, their horrible names of a few instructions, most of it is understandable.

So, it's complicated... by definition.
Quote:
A good compiler most often has a better understanding of these things than a simple human does. When you hand write assembler code it can end up slower than what a good C compiler produces; yes, there are exceptions where the compiler sucks. But really it's often a question of mastering the language, and giving the compiler the hints, to make the code faster. For example, you can tell arguments of a function to be assigned to registers, or you can tell the iterators to be registers; you don’t need iterators often.

This you can also do on ANY architecture...
Quote:
When it comes down to fast code, it’s so often memory copy stuff and table manipulation; reading in a series of data uninterrupted can also reduce the need for data cache flushes. If you have AltiVec it's naturally the best tool in the PowerPC tool box, but as I wrote before it's not the only tool.

Tell Trevor about Altivec...
Quote:

NutsAboutAmiga wrote:
@DiscreetFX

It’s called Power, not PowerPC. The same chip makers who make PowerPC for embedded also make ARM chips. As everyone is asking for ARM, and no one is asking for PowerPC anymore, development has halted.

IBM used to make PowerPC chips by cutting Power chips in half, so it's not dead, dead, as long as IBM has interest in it, but other embedded chipset makers are as faithful as your ex-wife.

It's dead because practically everybody moved to either ARM or RISC-V.

Where do you live, in a cave? Don't you follow the market?
Quote:
In any case the argument why it's better to write 680x0 assembler over C or C++ does not hold water, in particular if you want to port the OS to a different CPU.

And if the end goal is to port everything to C, it does not matter how fat the PowerPC binaries are, it’s the equivalent as to insulting a breast toaster. It has no meaning.

But C code compiled for PowerPC results... in FAT binaries. That's the point here.

@amigang

Quote:

amigang wrote:
There is one market where PowerPC may have a future and remain the processor of choice,

NASA / SPACE!

https://www.theverge.com/tldr/2021/3/2/22309412/nasa-perseverance-mars-rover-processor-cpu-imac-1998

So take that haters 😀

Also military seem to like to use PowerPC,
https://www.unmannedsystemstechnology.com/2022/06/new-sosa-aligned-powerpc-sbc-for-military-and-defense-applications/

ROFL


matthey

Re: some words on senseless attacks on ppc hardware
Posted on 18-Nov-2023 8:11:29

[ #53 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2747
From: Kansas

bhabbott Quote:

ppc code takes up twice the space of 68k code.

NutsAboutAmiga Quote:

Not always; in the case of JMP it’s 6 bytes on 680x0, while on PowerPC all instructions are 4 bytes,
so actually 66% of the size compared to 680x0. I must admit, you need to load a register with the value to jump to, so it takes two instructions. That’s 8 bytes, but that’s not twice the size compared to 6 bytes. It is really not that important; C code is bigger because of debug stuff and standard libraries. If you compile without the standard libraries, and without the debug symbols, the exe files are often not that big. So blame GCC, and newlibs / clib, etc., not PowerPC; they are just as big on x64 and ARM.

The 68k JMP instruction can be as little as 2 bytes depending on the size of the displacement/absolute address data. The data can be 0, 2 or 4 bytes that add to the size of the variable length instruction. A 4 byte PPC branch instruction uses either 14 or 24 bit data. A PPC 14 bit branch range is similar to that of a 4 byte 68k JMP with 16 bit branch range as PPC did not encode the lower bits which is why a compressed encoding was so difficult (PPC VLE uses a whole new encoding and incompatible mode). The 68k JMP xxx.L that is 6 bytes covers the whole 32 bit address range which is not possible with a 32 bit PPC branch instruction and the reason why at least 2 PPC instructions are needed using at least 8 bytes.
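
As a rough illustration of the branch ranges discussed above, the reach of a PC-relative branch follows directly from the width of its displacement field plus any low bits that are implied to be zero. The sketch below assumes the PPC encodings as I understand them (a 24 bit field for the unconditional branch and a 14 bit field for the conditional branch, each with two implied zero bits) and a plain 16 bit displacement on the 68k; the function names are invented for this example.

```c
#include <stdint.h>

/* Most negative displacement (in bytes) encodable by a signed field of
 * `field_bits` bits whose value is shifted left by `implied_zero_bits`. */
int64_t branch_disp_min(unsigned field_bits, unsigned implied_zero_bits)
{
    return -((int64_t)1 << (field_bits + implied_zero_bits - 1));
}

/* Most positive displacement (in bytes) for the same encoding. */
int64_t branch_disp_max(unsigned field_bits, unsigned implied_zero_bits)
{
    return ((int64_t)1 << (field_bits + implied_zero_bits - 1))
         - ((int64_t)1 << implied_zero_bits);
}

/* Returns 1 if `disp` is aligned for the encoding and within its range. */
int branch_reaches(int64_t disp, unsigned field_bits, unsigned implied_zero_bits)
{
    int64_t align_mask = ((int64_t)1 << implied_zero_bits) - 1;

    return (disp & align_mask) == 0
        && disp >= branch_disp_min(field_bits, implied_zero_bits)
        && disp <= branch_disp_max(field_bits, implied_zero_bits);
}
```

With these assumptions, (24, 2) gives roughly +/-32 MiB, (14, 2) gives roughly +/-32 KiB, and (16, 0) gives the 68k 16 bit range of -32768 to 32767 bytes. None of them covers an arbitrary 32 bit target, which is why a register-indirect sequence is needed on PPC for that case.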

Debug data should be similar for both the 68k and PPC executables which should be stripped for comparisons. I expect PPC code to average about 50% larger in size than 68k code. PPC may use "twice" the memory of the 68k though. The 68k is a stack and data miser compared to PPC. AmigaOS 4 PPC requires about 10 times the amount of stack of the 68k.

https://amigaworld.net/modules/newbb/viewtopic.php?topic_id=45043&forum=14&start=200&200#864317 Quote:

I was talking more in general but accessing memory is the major bottleneck of load/store architectures. Not reduced instruction set load/store architectures should be efficient when all data is in registers although larger immediates can result in multiple dependent instructions. Accessing memory not only causes increased code memory traffic but it likely causes increased data traffic as well. The 68k AmigaOS 3.1 had a default stack size of about 4kiB while some increased this to 8kiB for safety. It looks like 64kiB is the default stack size for PPC AmigaOS 4 and a Hyperion AmigaOS Core Developer, broadblues, recommended increasing this to 80kiB for safety.

https://forum.hyperion-entertainment.com/viewtopic.php?t=2934

Some of the difference is more efficient stack alignment but PPC often needs double the stack space to go with 50% more code size and both increase memory traffic. More GP registers is supposed to reduce memory traffic though?

NutsAboutAmiga Quote:

Another fun fact is that bigger code is often faster; take unrolled loops as an example. Optimizing for different CPU targets can produce completely different results. A 680x0 CPU often does not have a lot of instruction cache, so unrolling loops can in fact make the code slower. But let's say you had a new super duper 68090 CPU with lots of instruction cache: you would most definitely want to unroll the loops, and that will make the code fat. So we are not really comparing apples to apples here.

The biggest disadvantage of loop unrolling and function inlining is that "bigger code" is slower due to increased instruction cache misses. Performance gained by removing stalls and decreasing the number of executed instructions can more than offset the performance loss of larger code though.

Loop:
lw $t0, 0($s1) // $t0 = array element
add $t0, $t0, $s2 // add scalar in $s2
sw $t0, 0($s1) // store result
addi $s1, $s1, -4 // decrement pointer
bne $s1, $zero, Loop // branch if $s1 != 0

The MIPS RISC code above can be found at https://people.cs.pitt.edu/~melhem/courses/1541p/pipelining5.pdf. There is a 1 cycle load-to-use stall on the shallow MIPS pipeline after the lw instruction so an instruction is moved between.

Loop:
lw $t0, 0($s1) // $t0 = array element
addi $s1, $s1, -4 // decrement pointer
add $t0, $t0, $s2 // add scalar in $s2
sw $t0, 4($s1) // store result
bne $s1, $zero, Loop // branch if $s1 != 0

Deeper pipelines often have a larger load-to-use penalty. If there was a 3 cycle load-to-use penalty like the popular Cortex-A53, there aren't enough instructions that can be moved between. The RISC example at the link shows unrolling the loop to reduce the decrement and branch instruction overhead but a 3 cycle load-to-use penalty increases the advantage of unrolling by giving more independent instructions to fill the load-to-use bubbles.

Loop:
lw $t0, 0($s1)
lw $t1, -4($s1)
addi $s1, $s1, -8
add $t0, $t0, $s2
add $t1, $t1, $s2
sw $t0, 8($s1)
sw $t1, 4($s1)
bne $s1, $zero, Loop

This is unrolled twice and can't eliminate the 3 cycle load-to-use penalty so it is strongly encouraged to unroll it 3 times. What will be three lw and sw instructions usually can't be executed together as most cores can only perform a single read or write mem access per cycle. Many registers are required as well.
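
For readers who prefer C to MIPS assembly, the same transformation can be sketched at source level. This is only an illustrative sketch with made-up function names; a real compiler decides the unroll factor and instruction order itself, and the MIPS example walks the array downwards while this one walks upwards.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward loop: one load, add and store per iteration. */
void add_scalar(int32_t *a, size_t n, int32_t s)
{
    for (size_t i = 0; i < n; i++)
        a[i] += s;
}

/* Unrolled twice: both loads are issued before either result is used,
 * giving the hardware independent work to overlap with load latency,
 * at the cost of more code and more live registers. */
void add_scalar_unrolled2(int32_t *a, size_t n, int32_t s)
{
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        int32_t t0 = a[i];
        int32_t t1 = a[i + 1];
        a[i] = t0 + s;
        a[i + 1] = t1 + s;
    }
    if (i < n)          /* leftover element when n is odd */
        a[i] += s;
}
```

The tail handling for odd lengths is part of the code size cost of unrolling that the assembly listings above do not show.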

loop:
add.l d2,(a1) ;pOEP
subq.l #4,a1 ;sOEP
bne loop ; folded away when predicted

The 68060 executes this code in 1 cycle per load+calc+store iteration where scheduled MIPS code required 4 cycles per iteration. Unrolling two times is 5 cycles per 2 iterations or 2.5 cycles per iteration. Even unrolling 3 times can't match the performance of the 68060 which is also using only 2 registers and 6 bytes of code. The superscalar MIPS shows a "Load-use hazard" in slide 83 with example code and states "Should be separated by at least one cycle" which is the load-to-use penalty. PPC cores originally tried to keep pipelines shallow to minimize load-to-use penalties but they did less work and couldn't be clocked up. The RISC-V U74 core uses a similar pipeline design as the 68060 with the tremendous value of avoiding all load-to-use penalties but the RISC-V ISA does not allow a single cycle "add.l d2,(a1)" in one pipeline.

NutsAboutAmiga Quote:

We also talk about out of order execution, a nice technique where, if the next instruction does not depend on the previous instruction, they can be executed in parallel. PowerPC with its 32 registers can take advantage by loading values in series into R0, R1, R2, R4, etc., and then saving the values from R0, R1, R2, R4. Because loading a value into R1 does not depend on the value loaded into R0, parts of the instructions can overlap. It's complicated to write code like that because you need to know which instructions benefit from out of order execution, and which instructions collide. It’s actually not the syntax of PowerPC that’s most complicated; yes, there are the horrible names of a few instructions, but most of it is understandable. A good compiler most often has a better understanding of these things than a simple human does. When you hand write assembler code it can end up slower than what a good C compiler produces; yes, there are exceptions where the compiler sucks. But really it's often a question of mastering the language, and giving the compiler the hints, to make the code faster. For example, you can tell arguments of a function to be assigned to registers, or you can tell the iterators to be registers; you don’t need iterators often.

I have read the PowerPC Compiler Writer's Guide.

https://cr.yp.to/2005-590/powerpc-cwg.pdf Quote:

The example in Figure 4-6 uses pointer chasing to illustrate how execution pipelines can stall because of the latency for cache access. This latency stalls dispatch of the dependent compare, creating an idle execution cycle in the pipeline. Moving an independent instruction between the compare and the branch can hide the stall, that is, perform useful work during the delay. The delay is referred to as the load-use delay. The same principle applies to any instruction which follows the load and has operands that depend on the result of the load.

During a single cycle load-to-use stall, a 2 instruction issue 2 instruction completion per cycle superscalar core could benefit from 2 independent instructions during the delay and another could potentially be issued and executed in the same cycle as the load. The PPC 604 is the newest PPC CPU referenced in the guide and can issue and complete 4 instructions per cycle so could benefit from more independent instructions during the load-to-use stall. Loop unrolling is more important for the PPC 604 and it has a 2nd load/store unit so it can perform 2 sequential loads per cycle increasing the performance advantage of loop unrolling. Newer PPC cores often have deeper pipelines with larger load-to-use delays and need more loop unrolling but don't have enough independent instructions to fill all the bubbles. The guide talks a lot about instruction scheduling considering limited OoO is supposed to fix the problem by reordering loads and stores as the next paragraph in the guide explains.

https://cr.yp.to/2005-590/powerpc-cwg.pdf Quote:

To enhance performance, some PowerPC implementations may dynamically reorder the execution of memory accessing instructions, executing loads prior to stores with the intent of preventing processor starvation. Processor starvation occurs when an execution unit is stalled waiting for operand data. This reordering could violate program semantics if a reordered load is executed prior to a store that modifies an overlapping area in memory. This situation is called a load-following-store contention. PowerPC implementations must correct this situation in order to maintain correct program behavior, but the mechanism of correction varies among implementations. The correction, however, must result in re-executing the load in program order. This serialization of the load may involve redispatching or even refetching the load and subsequent instructions, thus significantly adding to the effective latency of the load instruction. This situation can arise in implementations with a single Load-Store Unit that dynamically reorder the loads and stores, or in implementations with multiple Load-Store Units, which can execute a load instruction and a store instruction during the same cycle in different units.
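
The load-following-store case the guide describes can be shown with a tiny C function. This is a sketch with a hypothetical function name, not code from the guide: the two pointers may or may not refer to the same location, so hardware that speculatively runs the load ahead of the store has to detect the overlap and redo the load.

```c
/* Store through `dst`, then load through `src`. If the two pointers
 * alias, program order requires the load to see the stored value, so
 * a core that hoisted the load above the store must re-execute it. */
int store_then_load(int *dst, const int *src, int value)
{
    *dst = value;   /* store */
    return *src;    /* later load, possibly from the same address */
}
```

When the pointers differ the reordering is harmless and hides latency; when they are equal the result must be the freshly stored value, and that correction is the extra latency the quoted paragraph talks about.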

Does OoO combined with weak memory ordering make implementing SMP more difficult in PPC AmigaOS 4?

The Guide's glossary has a nice definition for load-use delay.

Quote:

load-use delay - The time between when a value is requested from cache or memory and when it is available to a subsequent instruction.
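
Pointer chasing, which the guide uses as its example, is the worst case for this delay because every load needs the result of the previous one. A minimal C sketch, with type and function names chosen only for illustration:

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Each iteration must wait for the load of n->next before the next
 * node can even be addressed, so the load-use delay is paid serially. */
int list_sum(const struct node *n)
{
    int sum = 0;

    while (n != NULL) {
        sum += n->value;
        n = n->next;
    }
    return sum;
}

/* Array addresses are known in advance, so these loads are independent
 * of each other and can be overlapped or unrolled. */
int array_sum(const int *v, size_t count)
{
    int sum = 0;

    for (size_t i = 0; i < count; i++)
        sum += v[i];
    return sum;
}
```

Both functions compute the same kind of sum; only the dependency between consecutive loads differs, and no amount of unrolling helps the list version.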

Even the well written guide was not enough to save PPC. Neither was limited OoO as the load-to-use stalls crept up as the pipelines deepened and scheduling around stalls became impossible.

NutsAboutAmiga Quote:

When it comes down to fast code, it’s so often memory copy stuff and table manipulation; reading in a series of data uninterrupted can also reduce the need for data cache flushes. If you have AltiVec it's naturally the best tool in the PowerPC tool box, but as I wrote before it's not the only tool.

A SIMD unit is specialized. It should be able to do large memory copies without flushing the data cache but it isn't as good for dealing with small memory copies, tables and structures like the Amiga uses. These you want to be cached and a 68k CPU core is the preferred solution.

Last edited by matthey on 18-Nov-2023 at 04:02 PM.


Karlos

Re: some words on senseless attacks on ppc hardware
Posted on 18-Nov-2023 13:10:19

[ #54 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

Comparing individual cherry-picked instruction sizes to make a case for overall code density? Are you guys for real? There are always going to be examples favouring each but what matters is the statistical frequency of instructions in real world cases.

You need to compare the typical object code output by your best compilers for the same input code for equivalent optimisation (do one round for speed and one round for size). I mean if you want to have a serious discussion anyway.

_________________
Doing stupid things for fun...


NutsAboutAmiga

Re: some words on senseless attacks on ppc hardware
Posted on 18-Nov-2023 14:32:55

[ #55 ]

Elite Member

Joined: 9-Jun-2004
Posts: 12993
From: Norway

@fishy_fis

You should never go full HotRod

_________________
http://lifeofliveforit.blogspot.no/
Facebook::LiveForIt Software for AmigaOS


OneTimer1

Re: some words on senseless attacks on ppc hardware
Posted on 18-Nov-2023 17:01:20

[ #56 ]

Super Member

Joined: 3-Aug-2015
Posts: 1257
From: Germany

Quote:

Karlos wrote:

Are you guys for real?

No, they are not. On most embedded platforms, where code size might be an advantage, no one cares about it.


NutsAboutAmiga

Re: some words on senseless attacks on ppc hardware
Posted on 18-Nov-2023 19:50:03

[ #57 ]

Elite Member

Joined: 9-Jun-2004
Posts: 12993
From: Norway

@OneTimer1

But it’s relevant in Amiga land, where Amigas are using old CPUs like the 68020, with a minimum of instruction and CPU cache, and without FPU and MMU.

Often in Amiga land, code is optimized for minimum specs; sometimes people spend lots of time trying to get things working on Kickstart 1.2, yet no one uses those ROMs anymore.

If you actually care about fitting things on floppy disks, you will resort to compressing exe files.

On the contrary, on AmigaOS4.1, where we have lots of space, we often want good debug reports, so we don’t remove the debug stuff, and sometimes we compile with extra debug enabled, like -gstabs

Last edited by NutsAboutAmiga on 18-Nov-2023 at 08:11 PM.
Last edited by NutsAboutAmiga on 18-Nov-2023 at 08:08 PM.
Last edited by NutsAboutAmiga on 18-Nov-2023 at 08:07 PM.

_________________
http://lifeofliveforit.blogspot.no/
Facebook::LiveForIt Software for AmigaOS

Status: Offline

OneTimer1

Re: some words on senseless attacks on ppc hardware
Posted on 18-Nov-2023 21:37:33

[ #58 ]

Super Member

Joined: 3-Aug-2015
Posts: 1257
From: Germany

@NutsAboutAmiga

Quote:

NutsAboutAmiga wrote:

But its relevant in Amiga land ...

So we should rip out the 68k CPU and replace it with an ARM that has Thumb ... */sarcasm*

No, we won't do it, because software compatibility is much more important than code density, and we could use an ARM only if it runs 68k code, even if that means 'bloat' compared to 68k CPUs.

Last edited by OneTimer1 on 18-Nov-2023 at 09:43 PM.
Last edited by OneTimer1 on 18-Nov-2023 at 09:42 PM.

Status: Online!

matthey

Re: some words on senseless attacks on ppc hardware
Posted on 18-Nov-2023 21:53:44

[ #59 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2747
From: Kansas

Karlos Quote:

Comparing individual cherry-picked instruction sizes to make a case for overall code density? Are you guys for real? There are always going to be examples favouring each but what matters is the statistical frequency of instructions in real world cases.

You need to compare the typical object code output by your best compilers for the same input code for equivalent optimisation (do one round for speed and one round for size). I mean if you want to have a serious discussion anyway.

The paper "SPARC16: A new compression approach for the SPARC architecture" has the best code density comparison I'm aware of. The SPEC2006 benchmark programs are compiled with GCC for code size with the following results.

https://www.researchgate.net/publication/221306454_SPARC16_A_new_compression_approach_for_the_SPARC_architecture

The 68k had the best code density by geometric mean, while Thumb was the best by arithmetic mean (AM is biased by size while GM is not, so I prefer GM). This likely means that one or two of the 68k compiles were outliers that had compiler issues. Already by 2009, when the paper was released, 68k compiler support had declined. The PPC code is about 45% larger by GM, which is normalized to the 68k as it has the best code density by GM. There are developers who swap GCC GAS for vasm as it has a better peephole optimizer, so there is significant room for improvement, especially for FPU code. GCC 3.4.6 was released in 2006, and I noticed significant code bloat and a decline in code quality after GCC 3.3. Vince Weaver's hand-assembled results show PPC with 81% larger code than the 68k; put the other way, the 68k code is about 45% smaller.

https://docs.google.com/spreadsheets/u/0/d/e/2PACX-1vTyfDPXIN6i4thorNXak5hlP0FQpqpZFk2sgauXgYZPdtJX7FvVgabfbpCtHTkp5Yo9ai6MhiQqhgyG/pubhtml?gid=909588979&single=true&pli=1
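To make the percentages and the AM versus GM point concrete, here is a minimal Python sketch. The function names and the three program sizes are made up for illustration and are not taken from the paper or from Vince Weaver's spreadsheet. The first part only checks that "81% larger" and "about 45% smaller" describe the same ratio. The second part shows how one large program can dominate an arithmetic mean of sizes while a geometric mean of per-program ratios weighs every program equally.

```python
import math

def smaller_by(larger_by):
    """If B is `larger_by` (as a fraction) bigger than A, return how much smaller A is than B."""
    return 1.0 - 1.0 / (1.0 + larger_by)

def geometric_mean(values):
    """Geometric mean of a list of positive numbers."""
    return math.prod(values) ** (1.0 / len(values))

# 81% larger PPC code means the 68k code is roughly 45% smaller.
saving = smaller_by(0.81)

# Hypothetical code sizes in KiB for three programs on two ISAs (made-up numbers).
sizes_a = [10, 20, 1000]
sizes_b = [20, 40, 900]

# Ratio of arithmetic means: dominated by the one large program.
am_ratio = (sum(sizes_b) / len(sizes_b)) / (sum(sizes_a) / len(sizes_a))

# Geometric mean of per-program ratios: every program counts equally.
gm_ratio = geometric_mean([b / a for a, b in zip(sizes_a, sizes_b)])

print(f"68k smaller by {saving:.1%}")
print(f"AM ratio B/A: {am_ratio:.2f}, GM ratio B/A: {gm_ratio:.2f}")
```

With these invented numbers, ISA B looks slightly denser by arithmetic mean even though it is twice as large on two of the three programs, which is the kind of size bias the geometric mean avoids.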

RISC-V research found that every 25%-30% code density improvement is like doubling the size of the instruction cache, so an 8kiB instruction cache on the 68k gives nearly the performance of a 32kiB instruction cache on PPC. Some people don't think that's important except for embedded systems, yet the fat 8 of 15 architectures in the graph above are dead, and they don't even include PA-RISC or it would be 9 of 16. AArch64 and RVC have better code densities than the old ARM ISA in the graph. Can ISA viability be predicted by code density? Was throwing out the 68k baby with the bath water to switch to PPC a bad, politically motivated move by Motorola?
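One way to read that rule of thumb is sketched below in Python. It assumes the effect compounds, so that each 25%-30% step in code size counts as one doubling of instruction cache; that compounding model and the function name `equivalent_icache_kib` are assumptions made here for illustration, not something stated by the RISC-V research itself.

```python
import math

def equivalent_icache_kib(icache_kib, size_ratio, step):
    """Instruction cache a less dense ISA would need to match `icache_kib`.

    size_ratio: code size of the less dense ISA relative to the denser one
                (1.45 means 45% larger code).
    step:       density step treated as worth one cache doubling (0.25 to 0.30).
    Assumes the rule of thumb compounds geometrically (an assumption).
    """
    doublings = math.log(size_ratio) / math.log(1.0 + step)
    return icache_kib * 2.0 ** doublings

# 8 KiB on the 68k against PPC code that is 45% larger (GM) or 81% larger (hand-assembled).
for ratio in (1.45, 1.81):
    for step in (0.25, 0.30):
        kib = equivalent_icache_kib(8, ratio, step)
        print(f"ratio {ratio:.2f}, step {step:.0%}: about {kib:.1f} KiB")
```

Under this reading the 45% figure lands somewhat below 32 KiB and the 81% figure somewhat above it, so the "nearly 32kiB" estimate sits between the two data sets.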

Status: Offline

Karlos

Re: some words on senseless attacks on ppc hardware
Posted on 19-Nov-2023 1:00:13

[ #60 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@matthey

I put it to you that the code density argument is somewhat moot here. The smallest instruction cache on a PPC machine running OS4 is 16K, which is the 603e on the BlizzardPPC. That goes up to 32K on the 604e. So by your reasoning the 68060, which is the highest performing "true 68K" has the same code density/icache profile as the CSPPC, which already starts at around 200MHz and has a faster bus.

Yet as anyone who's ever measured it can attest, 68K code JIT executed on that same 604 outperforms the same 68K by a wide margin. You can make the same icache/code density comparison between the 040 and the 603e. And the same facts hold. I myself measured a 10x performance improvement for some compute bound code relative to 040/25 MHz for that specific combination.

I'm not condemning PPC because it's objectively bad in some way. Sure the code density is lower than 68K but you have to be very kind of special to think that it makes that much difference in reality - at least in the context of Amiga software.

I don't dislike the PPC architecture. In fact, there's nothing about PPC itself I dislike. I am not a hater.

I am, however, a realist. The objection to PPC is that it's too niche and too expensive, which is due to the fact that as a desktop processor it basically died the day Apple abandoned it for Intel. Which was good news for a while - suddenly there was a surge of reasonable second hand, well-engineered hardware, but no, we must only have some new hardware. I am sure OS4 for Mac PPC would've sold far more copies than the rest put together. But that's a different conversation.

All the technical arguments for sticking to PPC are absolute unmitigated drivel. Everything written for it that's worthwhile keeping is already portable enough to move to another architecture. There may be some bits and bobs in assembler here and there but it's going to be a small minority for sure.

Everything that was said about 68K being a dead end when PPC first appeared applies to PPC now and has done for years.

Last edited by Karlos on 19-Nov-2023 at 01:01 AM.

_________________
Doing stupid things for fun...

Status: Offline


Amigaworld.net was originally founded by David Doyle