Forum Index / Amiga OS4 Hardware / some words on senseless attacks on ppc hardware
bhabbott 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 1:41:53
#81 ]
Regular Member
Joined: 6-Jun-2018
Posts: 339
From: Aotearoa

@cdimauro

Quote:

cdimauro wrote:
@bhabbott

I beg to differ: it's absolutely relevant, otherwise you're comparing apples with oranges.

No, it's not comparing 'apples to oranges', it's comparing the 68k and ppc versions of a closed-source commercial program. Unless the IBrowse people decide to open-source it the binaries are all we are ever going to get. That means that in practice the difference is what you see - the ppc version being much larger than its 68k equivalent.

I bet they didn't optimize the 68k version for size, because it needs the speed more. But how much difference would it make if they did? When compiling A09 I optimized for speed, but I just now recompiled it for size and guess how much smaller the executable got? a miserable 12 bytes. I also checked the size of the executables in memory, and again the difference was only 12 bytes (1011088 vs 1011100). So there's one case where changing the optimization made practically no difference. In my experience this is typical of SASC - you wait for ages while it 'optimizes' the code, and it makes hardly any difference. :(

Quote:
Quote:
bhabbott wrote:
I recompiled A09 with SASC and the size went down from 102240 bytes to 95612 bytes (6.5% smaller).

Even better, ok, but it's irrelevant because you can't do the same for PowerPCs.

No, it's totally relevant. As Karlos said, "You need to compare the typical object code output by your best compilers...". With ppc in particular (and many other platforms) theoretical discussions about how code could be smaller are worthless if there is no compiler available that can achieve it. Some CPUs are easier to make compilers for that generate efficient code. If the Amiga happens to have a more efficient compiler for 68k code then that's a factor in its favor. If a 30 year old compiler can beat the latest tech that just shows how good 68k is!

The other question is what is 'best'? I think it's generally 'best' to trust the developer who created the code to know what's 'best' for it. I think the IBrowse people know what they are doing.

But hey, perhaps you are right. Maybe VBCC produces much smaller ppc code than GCC. If you want to know then recompile A09 with VBCC and tell us what you get. I doubt it will make a big difference, but you won't know until you try.

Quote:
Quote:
bhabbott wrote:

This isn't a perfect measure of machine code size because many programs include data with a fixed size (eg. text strings) that make the ratio smaller. Both IBrowse and A09 have a lot of it.

Which is ok, because data should also be included IMO: some architectures need to store immediates to memory because they have no way to directly specify them in the instructions.

Taking into account only instructions and not counting such data would be like cheating, in my opinion.

I agree. I'm willing to let it slide because it's the total size that matters.

Some programs have a lot more data than code, so code size isn't that important. Others are practically all code, and then it matters a lot. Duke Nukem is a good example of the latter.

OTOH if a CPU cannot handle data without jamming it into instructions then it is relevant. I have worked with other CPUs that had such issues:-

RCA CDP1802 - has sixteen 16 bit registers. Sounds wonderful, but you can only load them 8 bits at a time. That means loading an immediate address uses 33% more space and takes longer than CPUs that have 16 bit immediate arguments. The 16~24 clock cycles per instruction don't help either.

Microchip PIC16 - cannot read data from ROM. The only way to do it is with an immediate load instruction. Luckily it has an instruction called RETLW (return with 'literal' byte in working register) which can be used with computed GOTO to read tables of data. Each instruction uses 14 bits, so data wastes 43% of the memory it's stored in - which is a lot when you only have 1k of ROM.

Quote:
However it could be a pain to do it and using the overall executable size might be a good compromise anyway. As long as the binaries use the same executable format (e.g.: comparing HUNK vs HUNK is ok. But definitely NOT comparing HUNK vs ELF).

Size on disk can matter too. If ELF or HUNK format is significantly more bloated then that's a factor that should be considered. But as you say, overall executable size is probably close enough for rough comparisons, which is all we are doing here (when I said ppc code is twice as large, I didn't mean literally 2.000 times bigger!).

cdimauro 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 5:42:25
#82 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3650
From: Germany

@OneTimer1

Quote:

OneTimer1 wrote:
@cdimauro

Quote:

cdimauro wrote:

I assume that you aren't a fan of computer architectures and don't follow their evolution, ...


Wrong, it's my job and no one there is asking for code density, they are asking for availability.

Then the second part of my sentence doesn't apply.

As I've said, code density is one of the key topics when talking about computer architecture: you can easily find tons of publications, articles, etc., and architectures have specific extensions just for this. That's a fact which anyone can easily verify.

If this doesn't matter in your work, well, I can understand that, but then you're living in your own silo...


@bhabbott

Quote:

bhabbott wrote:
@cdimauro

Quote:

cdimauro wrote:
@bhabbott

I beg to differ: it's absolutely relevant, otherwise you're comparing apples with oranges.

No, it's not comparing 'apples to oranges', it's comparing the 68k and ppc versions of a closed-source commercial program. Unless the IBrowse people decide to open-source it the binaries are all we are ever going to get. That means that in practice the difference is what you see - the ppc version being much larger than its 68k equivalent.

Then it's very simple: don't use it when comparing architectures on code density. Easy-peasy...
Quote:
I bet they didn't optimize the 68k version for size, because it needs the speed more. But how much difference would it make if they did? When compiling A09 I optimized for speed, but I just now recompiled it for size and guess how much smaller the executable got? a miserable 12 bytes. I also checked the size of the executables in memory, and again the difference was only 12 bytes (1011088 vs 1011100). So there's one case where changing the optimization made practically no difference. In my experience this is typical of SASC - you wait for ages while it 'optimizes' the code, and it makes hardly any difference. :(

That's for SASC. Other compilers could behave (very) differently.

You can't take a single case and generalize to the overall situation...
Quote:
Quote:
Even better, ok, but it's irrelevant because you can't do the same for PowerPCs.

No, it's totally relevant. As Karlos said, "You need to compare the typical object code output by your best compilers...".

Which doesn't work when talking about code density studies. That's very simple to understand: you need to fix ALL variables EXCEPT one (the architectures to be compared).

That's why such studies use a compiler like GCC (some also use LLVM) which is available for the benchmarked architectures, which has an overall good support for all of them.

Have you ever seen Intel's compiler used in such studies? Rarely (usually only when the study involves benchmarking different compilers), because of the above. However, ask yourself which compiler gives (on average) the better results for the x86 and x64 architectures, and you can easily give yourself the answer. Which, unfortunately, does NOT support your statement...
Quote:
With ppc in particular (and many other platforms) theoretical discussions about how code could be smaller are worthless if there is no compiler available that can achieve it. Some CPUs are easier to make compilers for that generate efficient code. If the Amiga happens to have a more efficient compiler for 68k code then that's a factor in its favor. If a 30 year old compiler can beat the latest tech that just shows how good 68k is!

Well, I have no problem even using different compilers for different architectures in such benchmarks. The relevance of the results will be limited, because of what I've reported above, but it could be an interesting comparison anyway.
Quote:
The other question is what is 'best'? I think it's generally 'best' to trust the developer who created the code to know what's 'best' for it. I think the IBrowse people know what they are doing.

But hey, perhaps you are right. Maybe VBCC produces much smaller ppc code than GCC. If you want to know then recompile A09 with VBCC and tell us what you get. I doubt it will make a big difference, but you won't know until you try.

By chance (well, not really: I have some interest), yesterday I spent a lot of time tinkering with VBCC, VLink and VAsm, generating cross-compilers/assemblers at least for 68k, i386/x86 and ARM (which to me are the most important architectures nowadays; I haven't set up anything for PowerPC for this reason), and I've also tinkered with and inspected their sources (VBCC and VAsm specifically; VLink looks architecture-independent).

What came out is that VBCC and VAsm are very strong / good for 68k and poor / very bad for i386/x86 and especially for ARM.
VBCC basically supports only the 80386 + the x87 FPU. There's no trace of newer architectures, SIMD, etc.
VAsm even supports x64, but only up to the first generation of the architecture, and it's also incomplete (a lot of stuff is missing).
ARM support... well... no comment: I just tried to compile a simple Fibonacci and VBCC generated an empty file. I haven't checked VAsm (because I had no generated source to test and was too lazy to try something else).

Even looking at their source code, VBCC has about 1.5 times more code for 68k than for i386, whereas VAsm has about 2.5 times more for 68k than for x86. It's crystal clear that their main target is 68k.

VBCC is good because it's a very small compiler and adding a backend for a new architecture (or enhancing an existing one) requires much less effort compared to LLVM and GCC, but the results are also very limited (because you can mostly only benchmark against 68k, which is its most mature target).
Vasm is more or less the same.

Unfortunately their source code doesn't look good: it's difficult to read because it's terse, with few or no comments (many are in German! Not a big problem, because I understand some of it), and the documentation is scarce.
And what's even worse is that... there's no test suite for them! So you change something and then you have to compile and check for yourself whether the produced output is what you expect. That's really terrible: it's like developing in the Stone Age!

Anyway, maybe VAsm could be worth a try: it's simple enough and adding a test suite shouldn't be a big problem for me (that's what I've been doing for the last two decades, and Python helps a lot here).
Quote:
bhabbott wrote:

OTOH if a CPU cannot handle data without jamming it into instructions then it is relevant. I have worked with other CPUs that had such issues:-

RCA CDP1802 - has sixteen 16 bit registers. Sounds wonderful, but you can only load them 8 bits at a time. That means loading an immediate address uses 33% more space and takes longer than CPUs that have 16 bit immediate arguments. The 16~24 clock cycles per instruction don't help either.

Microchip PIC16 - cannot read data from ROM. The only way to do it is with an immediate load instruction. Luckily it has an instruction called RETLW (return with 'literal' byte in working register) which can be used with computed GOTO to read tables of data. Each instruction uses 14 bits, so data wastes 43% of the memory it's stored in - which is a lot when you only have 1k of ROM.

At least their cores are, or should be, small. In the deeply embedded market, that's what matters.

Hans 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 7:03:56
#83 ]
Elite Member
Joined: 27-Dec-2003
Posts: 5067
From: New Zealand

@matthey

Quote:
It is possible to do endian conversions at or near copy speed. It is also possible to do endian conversion mappings of whole pages which is a feature of PPC MMUs. What specifically makes a big endian OS difficult for Vulkan?


It's not the OS's endianness, but the CPU & GPU having opposite endianness. Vulkan's data buffers don't come with detailed descriptors of their contents. Endianness conversion is only possible if you know ahead of time what format the data is in (16-bit endian conversion is different from 32-bit or 64-bit). Getting the GPU to do endianness conversion would waste a lot of cycles, and would also render the texture interpolation/filtering hardware useless (because your endianness code runs after texture fetching). It's not feasible.
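
To make the "you have to know the element size" point concrete, here is a minimal C sketch (my own illustration, not code from any driver or from Warp3D Nova): byte-swapping the same four bytes as two 16-bit values gives a different result from swapping them as one 32-bit value, so a raw buffer with no format descriptor cannot be converted blindly.

#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Swap a buffer as an array of 16-bit values. */
static void swap_as_16(uint8_t *p, size_t bytes)
{
    for (size_t i = 0; i + 1 < bytes; i += 2) {
        uint8_t t = p[i]; p[i] = p[i + 1]; p[i + 1] = t;
    }
}

/* Swap the same buffer as an array of 32-bit values. */
static void swap_as_32(uint8_t *p, size_t bytes)
{
    for (size_t i = 0; i + 3 < bytes; i += 4) {
        uint8_t t;
        t = p[i];     p[i]     = p[i + 3]; p[i + 3] = t;
        t = p[i + 1]; p[i + 1] = p[i + 2]; p[i + 2] = t;
    }
}

int main(void)
{
    uint8_t a[4] = { 0x11, 0x22, 0x33, 0x44 };
    uint8_t b[4] = { 0x11, 0x22, 0x33, 0x44 };

    swap_as_16(a, sizeof a);   /* becomes 22 11 44 33 */
    swap_as_32(b, sizeof b);   /* becomes 44 33 22 11 */

    printf("as 16-bit: %02X %02X %02X %02X\n", a[0], a[1], a[2], a[3]);
    printf("as 32-bit: %02X %02X %02X %02X\n", b[0], b[1], b[2], b[3]);
    return 0;
}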

Another option would be for all Vulkan game ports to convert their data to little-endian before sending it to the GPU *if* the CPU & GPU endianness differ (which it currently does on all supported hardware, but this isn't guaranteed). Good luck with that...

The Vulkan specification was a draft when I was designing Warp3D Nova. I studied it, and realized it wasn't feasible unless both the CPU & GPU use the same endianness. This problem is why Warp3D Nova has functions to set the buffer layout. That way the endianness conversion can be done by the drivers.

Hans

_________________
http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more. Home of the RadeonHD driver for Amiga OS 4.x project.
https://keasigmadelta.com/ - More of my work.

MagicSN 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 9:00:36
#84 ]
Hyperion
Joined: 10-Mar-2003
Posts: 670
From: Unknown

@Hans

Thanks for explaining it in detail (with my earlier post I was referring to the private email you sent me some time ago, but the way I wrote it on the forum was probably not completely correct). After this more detailed explanation I actually understand it completely, thanks again.

I guess in the current situation the "easiest" way to deal with it would be to port a Vulkan game over to gl4es/Warp3DNova. But of course, as they already use different shader languages, this would be work which requires someone who knows both OpenGL ES and Vulkan very well.

Possibly it needs to be evaluated whether AI could be used to "convert" the shader-language programs (not at runtime, but during development). Up to now, of course, nobody has acquired a licence for a game which requires Vulkan.

Anyway, this is why I said earlier in this thread "there is an advantage to taking x86" (even though I do not favor x86 - but for the sake of argument).

I also now understand why you have been favoring a switch to little endian for some time.

Karlos 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 10:27:51
#85 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4405
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@MagicSN

Endianness is one of those weirdly emotive things. The thing I always found odd, however, is that the 68K and PPC both basically treat their register files as little endian anyway. You do a byte or word sized operation on a register and it's always the least significant portion that is affected.

Last edited by Karlos on 20-Nov-2023 at 01:52 PM.

_________________
Doing stupid things for fun...

kolla 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 18:24:28
#86 ]
Elite Member
Joined: 21-Aug-2003
Posts: 2917
From: Trondheim, Norway

@agami

Quote:
Until a better radiation-hardened part comes along with x86+FPGA, ARM or RISC V cores


In these scenarios it’s often the code (binaries) that sets the premise - a lot of code was made for powerpc, and as long as companies like Honeywell are able to provide hardened ppc systems for the existing code, that’s what’s being used.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

NutsAboutAmiga 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 18:52:33
#87 ]
Elite Member
Joined: 9-Jun-2004
Posts: 12825
From: Norway

@Karlos

Quote:
is that the 68K and PPC both basically treat their register files as little endian anyway.


I find that an odd thing to say. A 16-bit LE value has this layout in memory:
low address = low byte, high address = high byte
b07 b06 b05 b04 b03 b02 b01 b00 | b15 b14 b13 b12 b11 b10 b09 b08

A 16-bit BE value has this layout in memory:
low address = high byte, high address = low byte
b15 b14 b13 b12 b11 b10 b09 b08 | b07 b06 b05 b04 b03 b02 b01 b00

The BE format reads the way you read it; the LE format only makes sense if you assume memory should run backwards, counting down from the highest address (say 0xFFFFFFFF) - and even then it doesn't really, because then all the ASCII text would come out backwards.

Now, the reason what you're saying doesn't make sense to me is that there is no address space inside a register, so there is no byte order there: every register has a fixed size. I'd call that undetermined, or BE if anything. As far as I know, all CPUs handle registers the same way: shift left moves bits towards the high bits, shift right moves them towards the low bits. Bits don't suddenly get pushed into the low bits when you shift left.

The only major difference is how values are stored in memory, and that causes a bunch of issues, which is why the format is important.

I can only think of one reason why LE can be good, and that's casting from int32 to int16: you can just read the first bytes without any offset. But why would you want to crop the number and discard the precision? It doesn't seem like a valid argument to me.

Now don't get me wrong (and you probably set the argument up as bait): I do realize that most of the world standardized on the LE format, and we must accept the inferior format. Sadly, that's true - we are stuck with LE.

Last edited by NutsAboutAmiga on 20-Nov-2023 at 07:12 PM.

_________________
http://lifeofliveforit.blogspot.no/
Facebook::LiveForIt Software for AmigaOS

Karlos 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 19:10:32
#88 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4405
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@NutsAboutAmiga

Stop. Nothing you have said can excuse the dissonance between performing operations on registers versus the same operation on a memory location.

Imagine for a moment you load a 32-bit value from memory into a register, perform some byte-sized operation on it, and write back the whole 32-bit value. Then contrast that with performing the same byte-sized operation directly on the same 32-bit word in memory. On a big-endian machine you have to remember to increase the address by 3 to make sure you modify the correct byte, or you get the wrong result.

On a little-endian machine there's no difference: the byte, word and long portions of the value all live at the same address, so doing the byte operation on a 32-bit entity in memory somewhere is no different than if you'd used a register. For a non-load/store architecture in particular, this is a good thing (tm).
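
A self-contained C sketch of that (just an illustration, nothing beyond standard C assumed): the same storage viewed as a long, two words and four bytes through a union.

#include <stdint.h>
#include <stdio.h>

union word32 {
    uint32_t l;
    uint16_t w[2];
    uint8_t  b[4];
};

int main(void)
{
    union word32 v = { .l = 0x11223344 };

    /* Little endian: v.b[0] == 0x44 - the least significant byte shares the
       address of the whole long, so a byte op at that address behaves just
       like a byte op on a register.
       Big endian: v.b[0] == 0x11 and the least significant byte is v.b[3],
       which is where the "add 3 to the address" comes from. */
    printf("b[0] = %02X, b[3] = %02X\n", v.b[0], v.b[3]);

    v.b[0] += 1;   /* only on a little-endian machine does this match l += 1
                      (for values that don't carry out of the low byte) */
    printf("l = %08X\n", v.l);
    return 0;
}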

As for why this discrepancy between memory and register "addressing" needs to exist at all, it's just the side effect of deciding to store bytes most significant digit first. However, a machine doesn't need to store numbers "the way you write them down", that's an anthropomorphism. What it should focus on is consistency.

Most of the time, endianness is not an issue. But after 30+ years of coding, if I had to choose, I'd say little endian makes more sense for a machine, even if not for a human.

_________________
Doing stupid things for fun...

Karlos 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 19:34:00
#89 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4405
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

Also, the argument about "how numbers are written" is somewhat nonsensical. We use the 0-9 digit system introduced to Europe by the Arabs. In Arabic, the numbers are written exactly as we write them, with the most significant digit on the left. However, Arabic is a right-to-left script, and consequently numbers are written down, and read, least significant digit first.

_________________
Doing stupid things for fun...

kolla 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 19:57:29
#90 ]
Elite Member
Joined: 21-Aug-2003
Posts: 2917
From: Trondheim, Norway

@Karlos

‘cause the Arabs got it from the Indians.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

Karlos 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 20:06:38
#91 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4405
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@kolla

They got the concept of zero as a distinct number from there. Both cultures had been using a decimal counting scheme independently for most, if not all, of their recorded history (which makes sense if you are counting on fingers). Hence the scheme is known as Indo-Arabic. None of which changes the fact that it was the Arabic-specific notation which found its way into Europe. There are buildings from the Middle Ages that have actual eastern Arabic numerals engraved into wooden beams, right here in the UK, even though it was ultimately the western Arabic style that eventually became dominant.

Last edited by Karlos on 20-Nov-2023 at 08:07 PM.

_________________
Doing stupid things for fun...

NutsAboutAmiga 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 21:05:27
#92 ]
Elite Member
Joined: 9-Jun-2004
Posts: 12825
From: Norway

@Karlos

perhaps we should have used the Sexagesimal format.

_________________
http://lifeofliveforit.blogspot.no/
Facebook::LiveForIt Software for AmigaOS

Karlos 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 21:13:38
#93 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4405
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@NutsAboutAmiga

Whatever floats your boat.

_________________
Doing stupid things for fun...

bhabbott 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 22:36:21
#94 ]
Regular Member
Joined: 6-Jun-2018
Posts: 339
From: Aotearoa

@Karlos

Quote:

Karlos wrote:

Imagine for a moment you load a 32-bit value from memory into a register, perform some byte-sized operation on it, and write back the whole 32-bit value. Then contrast that with performing the same byte-sized operation directly on the same 32-bit word in memory. On a big-endian machine you have to remember to increase the address by 3 to make sure you modify the correct byte, or you get the wrong result.

But performing 'byte' arithmetic on a 32-bit value doesn't make sense. So you don't have to remember anything, because bytes and longwords are different types that shouldn't be mixed.

This reminds me of the 'bug' in SASC V6.5.0 where the bit order in bit-field structures was switched around. Provided you only used those bit-fields inside the program it was fine, but anything else expecting them to be in a particular order would blow up (I put 'bug' in scare quotes because the C standard doesn't specify a bit order or what padding to apply between bit-fields; it's all 'implementation defined').
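
A tiny, hypothetical example of the underlying trap (not the actual SASC case): the same bit-field struct can be laid out differently by different compilers, so only code inside the same build can safely rely on it.

#include <stdio.h>
#include <string.h>

/* The allocation order of bit-fields within a storage unit, and any padding
   between them, are implementation-defined in C. */
struct hw_flags {
    unsigned ready   : 1;
    unsigned error   : 1;
    unsigned channel : 3;
};

int main(void)
{
    struct hw_flags f = { 1, 0, 5 };
    unsigned char first_byte;

    memcpy(&first_byte, &f, 1);
    /* One compiler may put 'ready' in bit 0 of this byte, another in bit 7,
       so dumping the struct to a file, register or another program is not
       portable - exactly the kind of thing that "blew up". */
    printf("first byte = 0x%02X\n", first_byte);
    return 0;
}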

Quote:
As for why this discrepancy between memory and register "addressing" needs to exist at all, it's just the side effect of deciding to store bytes most significant digit first. However, a machine doesn't need to store numbers "the way you write them down", that's an anthropomorphism. What it should focus on is consistency.

Yes. That's why you need to be careful when using Texas Instruments chips from the 70s and 80s, because they numbered bits in the opposite direction. Bit 0 was the most significant bit, and bit 7 (byte) or bit 15 (16-bit word or address) was the least significant bit. Now you might think "who cares, so long as it's consistent", but having the numbering backwards compared to the weighting (e.g. bit '15' is 2^0, not 2^15) was mighty confusing.
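
For what it's worth, working with those parts usually ends up with a little helper like this (a made-up macro, just to show the mapping, assuming a 16-bit word):

#include <stdint.h>
#include <stdio.h>

/* TI-style numbering on those chips: bit 0 is the MSB of a 16-bit word and
   bit 15 is the LSB, so the weight of "bit n" is 2^(15 - n). */
#define TI_BIT16(n)   ((uint16_t)(1u << (15 - (n))))

int main(void)
{
    /* Their "bit 0" is the conventional bit 15; their "bit 15" has weight 2^0. */
    printf("TI bit 0  = 0x%04X\n", TI_BIT16(0));   /* 0x8000 */
    printf("TI bit 15 = 0x%04X\n", TI_BIT16(15));  /* 0x0001 */
    return 0;
}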

In computers anthropomorphism isn't a bad thing. A computer is effectively an extension of the human mind. We design and program them to do things the way we would if we were inside a computer. It may not be the most efficient way, but it's a way that we can understand. That's starting to break now as we apply AI techniques like genetic algorithms and machine learning. In the future we may have blobs of code that nobody can understand, all we know is that they (hopefully) work.

Quote:
Most of the time, endianness is not an issue. But after 30+ years of coding, if I had to choose, I'd say little endian makes more sense for a machine, even if not for a human.

I'd say most of the time it doesn't matter much to the hardware, but sometimes it does.

For example, an A/D converter with a SAR (successive approximation register) outputs bits from most significant to least significant, with each bit providing another bit of precision as the remaining voltage difference is divided by 2. This can be sent straight down a wire in big-endian order, but for little-endian you have to store up the whole word and send it 'backwards' when complete, increasing latency and requiring more transistors.
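
A rough software model of that, just to illustrate the order in which the bits become available (a simulated comparison against a known value, no real hardware involved):

#include <stdint.h>
#include <stdio.h>

/* Simulated successive-approximation conversion: the most significant bit is
   resolved first, so it can go on the wire as soon as it is known. */
static uint16_t sar_convert(uint16_t input)
{
    uint16_t result = 0;
    for (int bit = 15; bit >= 0; --bit) {
        uint16_t trial = result | (uint16_t)(1u << bit);
        if (input >= trial)          /* stands in for the analogue comparator */
            result = trial;
        /* this bit is now final: a big-endian (MSB-first) serial link could
           transmit it immediately; a little-endian link has to buffer the
           whole word before sending anything */
    }
    return result;
}

int main(void)
{
    printf("0x%04X\n", sar_convert(0xABCD));  /* prints 0xABCD */
    return 0;
}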

It may also matter when interfacing to humans, which computers often have to do. For example we struggle to read text that is reversed every 4 characters, and numbers are worse. Computers are not humans, but they do have to 'anthropomorphize' themselves to communicate with us.

Last edited by bhabbott on 20-Nov-2023 at 10:37 PM.

OneTimer1 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 23:02:30
#95 ]
Cult Member
Joined: 3-Aug-2015
Posts: 984
From: Unknown

@Karlos

Quote:

Karlos wrote:

Endianness is one of those weirdly emotive things.


It's only a problem if you share data. You know, having 68k-compatible shared structures that must be compatible with other shared structures.

A new AmigaOS shouldn't use shared structures at all, and 68k software should be kept in a UAE-like sandbox.

This sentence was me, shouting 'Jehovah'.

Last edited by OneTimer1 on 20-Nov-2023 at 11:08 PM.
Last edited by OneTimer1 on 20-Nov-2023 at 11:07 PM.

Karlos 
Re: some words on senseless attacks on ppc hardware
Posted on 20-Nov-2023 23:29:58
#96 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4405
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@bhabbott

There are plenty of reasons why you might want to access words, bytes and longwords at the same address. C has unions for this very reason. And if you want an actual use case, how about CPU emulation, in particular, emulating the registers of a CPU?

Take the 68K data registers and dump them to some memory location. There are eight 32-bit words, maybe organised so that the lowest is d0 and the highest is d7, all nice and logical. But if you want to perform a byte operation on one of these, mimicking how the 68K would actually do it, you have to fanny around with offsets, because although the actual 68K register behaves like a union would on a little-endian machine, in any big-endian memory model the in-memory copy does not.
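
A hedged sketch of that fannying about (my own illustration, not from any real emulator): the eight data registers kept as 32-bit words in host memory, with a byte-sized access to the low byte of Dn. EMU_BIG_ENDIAN_HOST is a hypothetical build flag.

#include <stdint.h>
#include <stdio.h>

/* Eight 68K data registers kept as 32-bit words in host memory. */
static uint32_t d_regs[8];

/* On a little-endian host the least significant byte of d_regs[n] sits at
   byte offset 4*n, matching how the 68K itself treats MOVE.B into Dn.
   On a big-endian host it sits at 4*n + 3, hence the offset juggling
   (often written as an XOR with 3). */
static uint8_t read_low_byte(int n)
{
    const uint8_t *bytes = (const uint8_t *)d_regs;
#if defined(EMU_BIG_ENDIAN_HOST)      /* hypothetical configuration macro */
    return bytes[4 * n + 3];
#else
    return bytes[4 * n];
#endif
}

int main(void)
{
    d_regs[0] = 0x12345678;
    printf("d0 low byte = %02X\n", read_low_byte(0));  /* expect 78 */
    return 0;
}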

As I said. Little endian may not fit our particular human left-to-right reading sensibilities, but it's more sensible for a machine.

_________________
Doing stupid things for fun...

matthey 
Re: some words on senseless attacks on ppc hardware
Posted on 21-Nov-2023 0:15:45
#97 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2026
From: Kansas

cdimauro Quote:

That's for SASC. Other compilers could behave (very) differently.

You can't take a single case and generalize to the overall situation...


Correct. It is surprising that optimizing for size in SAS/C compiled programs doesn't make more of a difference. Usually better quality support for a target architecture and more mature compilers will make more changes with different -O compiler options. Vbcc should make some changes. I know vasm uses different peephole optimizations depending on the -O optimization selection and I expect vbcc will stop loop unrolling and any function inlining that would increase code size. This should make a big difference which is, or at least was, expected with GCC for the 68k back when it was better supported.

cdimauro Quote:

Which doesn't work when talking about code density studies. That's very simple to understand: you need to fix ALL variables EXCEPT one (the architectures to be compared).

That's why such studies use a compiler like GCC (some also use LLVM) which is available for the benchmarked architectures, which has an overall good support for all of them.

Have you ever seen Intel's compiler used in such studies? Rarely (usually only when the study involves benchmarking different compilers), because of the above. However, ask yourself which compiler gives (on average) the better results for the x86 and x64 architectures, and you can easily give yourself the answer. Which, unfortunately, does NOT support your statement...


Using the same cross compiler for a benchmark reduces the number of variables but there is a big difference in compiler support for different architectures. There is also an over reliance on GCC which may have flaws for particular targets, especially less supported ones. It would be interesting to compare a basket of compiled executables for a target from any source, especially if one of them has a better result than from GCC. It is easier to spot and compare a good compiled result with -Os than with -On and any better result sets a new standard. Optimizing with -Os is more common for embedded use where GCC may not be the best compiler. Vbcc is targeted more at embedded use but it is hit and miss.

Optimizing for size with -Os doesn't change 68k code much except for a 68040 target. This is because smaller code is usually faster. This is in contrast to x86-64, where high performance code has much reduced code density, while optimizing for size makes the code look like x86 code: small instructions, much increased memory traffic, and using only 8 GP registers where possible. This was a result of x86 running out of encoding space; instead of x86-64 re-encoding based on the modern frequency of instruction use, longer encodings were used.

cdimauro Quote:

By chance (well, not really: I have some interest), yesterday I spent a lot of time tinkering with VBCC, VLink and VAsm, generating cross-compilers/assemblers at least for 68k, i386/x86 and ARM (which to me are the most important architectures nowadays; I haven't set up anything for PowerPC for this reason), and I've also tinkered with and inspected their sources (VBCC and VAsm specifically; VLink looks architecture-independent).

What came out is that VBCC and VAsm are very strong / good for 68k and poor / very bad for i386/x86 and especially for ARM.
VBCC basically supports only the 80386 + the x87 FPU. There's no trace of newer architectures, SIMD, etc.
VAsm even supports x64, but only up to the first generation of the architecture, and it's also incomplete (a lot of stuff is missing).
ARM support... well... no comment: I just tried to compile a simple Fibonacci and VBCC generated an empty file. I haven't checked VAsm (because I had no generated source to test and was too lazy to try something else).

Even looking at their source code, VBCC has about 1.5 times more code for 68k than for i386, whereas VAsm has about 2.5 times more for 68k than for x86. It's crystal clear that their main target is 68k.


ARM support was not working the last I heard. Yes, it is strange for a compiler targeting embedded use. ARM is more difficult to support than it first appears. There are essentially 4 ISAs with different modes and many extensions and variants until AArch64 which is standardized but huge. It would be difficult to compete with the support in big compilers. Vbcc has actually become as much of a niche market compiler for odd targets as it has for embedded use. There needs to be interest, cooperation and incentive to support a target. PPC is supported well as Volker worked with PPC in the automotive embedded field. Volker and Frank are 68k Amiga users too. The x86(-64) support is minimal but adequate for cross compiling and testing. They could use an x86(-64) expert if you want to volunteer to improve it while implementing its replacement.

cdimauro Quote:

VBCC is good because it's a very small compiler and adding a backend for a new architecture (or enhancing an existing one) requires much less effort compared to LLVM and GCC, but the results are also very limited (because you can mostly only benchmark against 68k, which is its most mature target).
Vasm is more or less the same.


The lack of dependencies is one of the biggest advantages. Vbcc can be downloaded and recompiled with itself without downloading other software packages. The compiler frontend is modern and sophisticated enough. It's not a good sign for the future that the 68k and PPC are the most mature backends but it has been good for the (cursed?) Amiga.

cdimauro Quote:

Unfortunately their source code doesn't look good: it's difficult to read because it's terse, with few or no comments (many are in German! Not a big problem, because I understand some of it), and the documentation is scarce.
And what's even worse is that... there's no test suite for them! So you change something and then you have to compile and check for yourself whether the produced output is what you expect. That's really terrible: it's like developing in the Stone Age!

Anyway, maybe VAsm could be worth a try: it's simple enough and adding a test suite shouldn't be a big problem for me (that's what I've been doing for the last two decades, and Python helps a lot here).


Volker's C code is all bunched together and not to my liking either. He is a doctor and there is a joke here about how unreadable doctor writing is. I thought the commenting was ok but it is in German. There is documentation in the vbcc manual for writing a backend and it appears to be much simpler than for some other compilers. There is a test suite and support code available for developers. Some of the support code is copyrighted so it is not available to everyone but it is available for developers that need it. That's what I worked on so it isn't too difficult to get. E-mail Frank Wille if you are interested. He is extremely helpful if you have questions and is the main person I dealt with. He is practically a support team himself which is a game changer for development efforts.

matthey 
Re: some words on senseless attacks on ppc hardware
Posted on 21-Nov-2023 2:15:31
#98 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2026
From: Kansas

Hans Quote:

It's not the OS's endianness, but the CPU & GPU having opposite endianness. Vulkan's data buffers don't come with detailed descriptors of their contents. Endianness conversion is only possible if you know ahead of time what format the data is in (16-bit endian conversion is different from 32-bit or 64-bit). Getting the GPU to do endianness conversion would waste a lot of cycles, and would also render the texture interpolation/filtering hardware useless (because your endianness code runs after texture fetching). It's not feasible.


I thought endian conversion was correct if the data was swizzled at the time it was written, in the datatype size it was written. Like I mentioned before, the PPC can use the MMU to map page endianness. It is supported on the PPC 440 and even in the P1022, surviving the embedded hardware castration.

http://datasheets.chipdb.org/IBM/PowerPC/440/PowerPC-440-Core.pdf Quote:

Big Endian and Little Endian Support
The PPC440 supports big endian or little endian byte ordering for instructions and data stored in external memory. The PowerPC Book E architecture is endian neutral; each page in memory can be configured for big or little endian byte ordering via a storage attribute contained in the TLB entry for that region. Strapping signals on the PPC440 core initialize the beginning TLB entry’s endian attribute, so the PPC440 can boot from little or big endian memory.

...

Each page of memory is accompanied by a set of storage attributes. These attributes include cacheability, write through/write back mode, big/little endian, guarded and four user-defined attributes. The user-defined attributes can be used to mark a memory page with an application-specific meaning. The guarded attribute controls speculative accesses. The big/little endian attribute marks a memory page as having big or little endian byte ordering. Write through/write back specifies whether memory is updated in addition to the cache during store operations.


I expect the Vulkan data buffers you refer to are in GPU memory, not CPU memory. They should still have MMU pages, as the GPU memory needs to be marked as non-cacheable without HSA. I was under the impression that changing the endianness of the MMU pages would swizzle the data as it is written, in the datatype sizes it is written in, while the GPU may not know the datatype sizes unless they are defined by a structure. GPU swizzling doesn't sound like a good option as there is overhead and the datatype is not always known. The PPC MMU method is very cheap though, as swizzling the store data is minimal logic. I'm not a fan of PPC, but the MMU endianness bit is a nice feature. I believe PPC had the feature before POWER, but I expect it made the conversion of POWER to little endian much easier: just mark new MMU pages as little endian when setting them up, while older programs can still use big-endian pages for compatibility. IBM choosing to switch POWER from BE to LE demonstrates that endianness is important for software, while the switch may have been encouraged by the minimal hardware changes needed.

Hans Quote:

Another option would be for all Vulkan game ports to convert their data to little-endian before sending it to the GPU *if* the CPU & GPU endianness differ (which it currently does on all supported hardware, but this isn't guaranteed). Good luck with that...


One of the 68k Warp3D Avenger libraries did that. Data was swizzled before being written to GPU registers. It bloated the code, not to mention the performance impact. The compiler made a mess of it in places too. Immediate/constant data could have been pre-swizzled, but instead it was loaded and then swizzled at execution time. Does anyone look at the code the compiler generates? It was 68k code, not PPC code, after all.
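
For illustration, a small C sketch of the difference (hypothetical code, not taken from the Avenger driver): swizzling at run time versus pre-swizzling a constant so the generated code is just a plain store.

#include <stdint.h>
#include <stdio.h>

/* Portable 32-bit byte swap (most compilers also provide a builtin for this). */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* What the Avenger-style code ended up doing for a constant: load it, then
   swizzle it at execution time before every register write. */
static uint32_t command_runtime(void)
{
    uint32_t cmd = 0x00010203u;
    return swap32(cmd);
}

/* What could have been done instead: pre-swizzle the immediate, so the
   generated code is just a plain store of the constant. */
#define COMMAND_PRESWIZZLED  0x03020100u

int main(void)
{
    printf("swizzled at run time: %08X\n", command_runtime());
    printf("pre-swizzled:         %08X\n", COMMAND_PRESWIZZLED);
    return 0;
}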

Hans Quote:

The Vulkan specification was a draft when I was designing Warp3D Nova. I studied it, and realized it wasn't feasible unless both the CPU & GPU use the same endianness. This problem is why Warp3D Nova has functions to set the buffer layout. That way the endianness conversion can be done by the drivers.


The other option is an integrated GPU which still has big and little endian support. The physical address lines can be swizzled too so there are more options. Some of the embedded GPUs may have better big endian support as there are a few big endian CPUs still being used like MIPS/Loongson and PPC. Big endian CPUs used to be preferred for networking. Some embedded GPUs are very advanced like the high end Imagination Technologies GPUs but balance performance and power.

Hans 
Re: some words on senseless attacks on ppc hardware
Posted on 21-Nov-2023 5:21:17
#99 ]
Elite Member
Joined: 27-Dec-2003
Posts: 5067
From: New Zealand

@matthey

Quote:
I thought endian conversion was correct if the data was swizzled at the time it was written, in the datatype size it was written. Like I mentioned before, the PPC can use the MMU to map page endianness. It is supported on the PPC 440 and even in the P1022, surviving the embedded hardware castration.


I looked into that, but it's not a workable solution. Here's a simple situation where it fails: a program writes data to a big-endian page, and copies it to a little-endian page. The memory copy routine has no idea what format the data it's copying is in, and therefore cannot do any conversion. So, big-endian data ends up in a "little-endian page." It doesn't matter whether the memory copy is done via DMA or the CPU.

Bulk data copies are common in graphics.

Quote:
The other option is an integrated GPU which still has big and little endian support. The physical address lines can be swizzled too so there are more options. Some of the embedded GPUs may have better big endian support as there are a few big endian CPUs still being used like MIPS/Loongson and PPC. Big endian CPUs used to be preferred for networking. Some embedded GPUs are very advanced like the high end Imagination Technologies GPUs but balance performance and power.


Know of any modern GPUs on PCIe cards that still have big-endian support? The Northern Islands (Radeon HD 69xx series) were the last Radeon cards to be fully bi-endian.

I've said it before: my personal opinion is that we need to accept that little-endian "won," and make the switch to little-endian just like IBM has done.

Hans

_________________
http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more. Home of the RadeonHD driver for Amiga OS 4.x project.
https://keasigmadelta.com/ - More of my work.

Hans 
Re: some words on senseless attacks on ppc hardware
Posted on 21-Nov-2023 5:34:21
#100 ]
Elite Member
Joined: 27-Dec-2003
Posts: 5067
From: New Zealand

@Karlos

Quote:
Endianness is one of those weirdly emotive things. The thing I always found odd, however, is that the 68K and PPC both basically treat their register files as little endian anyway. You do a byte or word sized operation on a register and it's always the least significant portion that is affected.

I'm grateful that they did, or porting software between big and little endian CPUs would be even harder.

This is all legacy from the days when one company would do the opposite of what their main competitor did just to be different. Or, they'd do it differently to get around patent/copyright issues.

The silliest thing I remember encountering at university was that everyone except National Semiconductor had to use the term "three-state" instead of "tri-state", because National Semiconductor had trademarked the term. Tri-state rolls off the tongue slightly more easily. What a competitive advantage!

Hans

_________________
http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more. Home of the RadeonHD driver for Amiga OS 4.x project.
https://keasigmadelta.com/ - More of my work.
