Amigaworld.net - The Amiga Computer Community Portal Website

Packed Versus Planar: FIGHT

Gunnar

Re: Packed Versus Planar: FIGHT
Posted on 9-Oct-2022 11:32:49

[ #521 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

Salute Karlos!

Quote:

Karlos wrote:
@Gunnar

How wide is D2? If the answer is 32 bits and you are using it to process single 32-bit ARGB values at a time, it's not really SIMD, is it? Sure, it's faster than doing each channel separately, but the "data element" in the operation here is a 32-bit pixel, and your code is doing them one at a time.

I appreciate this is splitting hairs.

All 68080 registers are 64-bit.

Quote:
68080 PROGRAMMING MODEL

16 64-Bit Address registers (A0-A15)
8 64-Bit General Purpose Data registers (D0-D7)
8 64-Bit FPU registers (Fp0-Fp7)
24 64-Bit General Purpose Data registers (E0-E23) which can be used by both ALU and FPU.

32 Data Registers (D0-D7, E0-E23)
These registers are for bit and bit field (1 - 32 bits), byte (8 bits), word (16 bits), long-word (32 bits), and quad-word (64 bits) operations. D0-D7 can also be used as index registers in EA calculation.

32 FPU Registers (Fp0-Fp7,E0-E23)
The FPU has access to 32 work registers. In addition, FPU instructions can also use the 8 Dn registers as source.
Therefore the FPU has 32 registers it can update with calculations,
and 40 registers it can use as source.

16 Address Registers (A0-A15)
These registers can be used as software stack pointers, index registers, or base address registers. The base address registers can be used for word and long-word operations. Register A7 is used as a hardware stack pointer during stacking for subroutine calls and exception handling. In the user programming model, A7 refers to the user stack pointer (USP).

68K Function Programming Model
A7 is defined as the stack pointer
D0, D1, E0-E23 are regarded as scratch registers

Status: Offline

Karlos

Re: Packed Versus Planar: FIGHT
Posted on 9-Oct-2022 11:43:46

[ #522 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Gunnar

I clearly haven't read the manual, have I?

_________________
Doing stupid things for fun...

Status: Offline

Gunnar

Re: Packed Versus Planar: FIGHT
Posted on 9-Oct-2022 12:09:46

[ #523 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

@Karlos

Quote:

Karlos wrote:
@Gunnar

I clearly haven't read the manual, have I?

In the old days the grumpy experienced guys always used to say:

But hasn't first reading the manual, and then talking about the topic, gone totally out of fashion?

Last edited by Gunnar on 09-Oct-2022 at 12:10 PM.

Status: Offline

Karlos

Re: Packed Versus Planar: FIGHT
Posted on 9-Oct-2022 12:29:18

[ #524 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Gunnar

I had no idea about the extra registers. Presumably all existing 68K object code treats the registers as 32 bit.

32 sounds a bit excessive unless it's also for vectorisation purposes (i.e. treating 2 adjacent 64-bit registers as 4 32-bit ints and that sort of thing). Is there any compiler support or is it all for assembly language use?

_________________
Doing stupid things for fun...

Status: Offline

Gunnar

Re: Packed Versus Planar: FIGHT
Posted on 9-Oct-2022 13:14:44

[ #525 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

@Karlos

Hello Karlos, how are you?

Quote:

Karlos wrote:
@Gunnar

I had no idea about the extra registers.

The manual...

Quote:

Presumably all existing 68K object code treats the registers as 32 bit.

Not 100% sure what you mean by this.
68K instructions define the operation size. For example, CLR.B clears a byte.
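
To illustrate the point that the instruction's size suffix, not the register width, decides how much gets touched, here is a minimal Python sketch. It models only the classic 68K behaviour on a 32-bit data register; the function name `clr` and the `SIZE_MASKS` table are made up for this illustration, and it makes no claim about what the 68080 does with the upper 32 bits.

```python
# Hypothetical model: size suffix -> mask of the bits the operation affects.
SIZE_MASKS = {"B": 0xFF, "W": 0xFFFF, "L": 0xFFFFFFFF}

def clr(reg_value, size):
    """Clear only the low byte/word/long of a 32-bit register value."""
    mask = SIZE_MASKS[size]
    return reg_value & ~mask & 0xFFFFFFFF

d0 = 0x12345678
print(hex(clr(d0, "B")))  # low byte cleared, upper 24 bits kept
print(hex(clr(d0, "W")))  # low word cleared, upper 16 bits kept
```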

Quote:

32 sounds a bit excessive unless.

Let's look at this in detail.
The 68K had 8 FPU registers, 8 pointer registers, and 8 data registers.

The FPU can read from FPU registers, integer registers, memory, and immediates.
This means it has 16 registers as possible inputs.
If you write FPU performance code, e.g. 3D matrix stuff,
then being able to use the integer registers as source really helps, because you can put some variables in them.

The 68K FPU was sequential and all operations needed several clock cycles.
In other words, it was slow.

Today all good FPUs are fully pipelined.
Every FPU operation still needs several clock cycles to finish.
But you can start a new FPU instruction every clock!
This means you have several instructions in flight in parallel.
All modern FPUs work like this: on POWER, on Intel, on ARM, on the 68080, they are all pipelined.

Typically today's FPUs have about 6 or more operations in flight.
The 68080 can have up to 22 FPU operations in flight in parallel!

For this to work you need more registers.
Everyone uses many registers for this.
PowerPC used to have 32 FPU registers, but some years ago this was upgraded to 64 registers to reach better performance. CELL used 128 registers to support this.
ARM uses 32 registers, and so on.
Intel also has a huge number of registers.
Even if you don't see them at first look, they use "hidden" internal registers for this.

So everyone in the industry knows that to make FPU code good, you need many registers.
8 registers are not sufficient. 32 registers is a good working number.
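
The link between pipelining and register count can be sketched with a toy model. The Python below is only an illustration under stated assumptions: one instruction issued per clock at most, a fixed latency of 6 cycles (a made-up figure, not a 68080 number), and work spread round-robin over a number of independent accumulator registers. The function name is hypothetical.

```python
def cycles_to_finish(num_ops, latency, accumulators):
    """Toy pipelined-FPU model: each op depends on the previous op that
    used the same accumulator register; at most one op issues per cycle."""
    ready_at = [0] * accumulators  # cycle when each accumulator's result is ready
    next_issue = 0                 # earliest cycle the next op may issue
    for i in range(num_ops):
        acc = i % accumulators
        issue = max(next_issue, ready_at[acc])  # stall until the operand is ready
        ready_at[acc] = issue + latency
        next_issue = issue + 1
    return max(ready_at)

# 24 adds with an assumed 6-cycle latency:
print(cycles_to_finish(24, 6, accumulators=1))  # one register: every add waits
print(cycles_to_finish(24, 6, accumulators=6))  # six registers: pipeline stays full
```

With a single accumulator the pipeline is idle most of the time; with as many independent registers as the latency, one operation can start every clock. That is the sense in which more in-flight operations call for more registers.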

Let's look at the address registers, the pointers.
Yes, the 8 pointers of the 68000 are a very good start.
Of the 8 pointers, an often-seen usage pattern is A7 = stack pointer, A6 = library pointer, A5 = LINK register ... A4 = variable base. Sometimes having more pointers would be very helpful.

A simple Amiga example: chunky-to-planar conversion.
1 chunky pointer, 8 planar pointers, 1 stack pointer = with 10 registers this is easy to code.
While 8 pointer registers are good, having 16 makes coding easier and also increases performance.
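
For readers who have not written one, here is a minimal sketch of what a chunky-to-planar conversion does, in Python rather than 68K assembly. It is a plain reference version, not an optimized routine; the function name and the convention that the leftmost pixel maps to the most significant bit are assumptions for the illustration. The one source buffer and the eight plane buffers correspond to the one chunky pointer and eight planar pointers mentioned above.

```python
def chunky_to_planar(chunky, depth=8):
    """Split one-byte-per-pixel data into `depth` bitplanes.
    Bit p of every pixel goes to plane p; 8 pixels fill one byte per plane."""
    if len(chunky) % 8:
        raise ValueError("pixel count must be a multiple of 8")
    planes = [bytearray(len(chunky) // 8) for _ in range(depth)]
    for i, pixel in enumerate(chunky):
        byte_index, bit = divmod(i, 8)
        mask = 0x80 >> bit  # leftmost pixel -> most significant bit
        for p in range(depth):
            if (pixel >> p) & 1:
                planes[p][byte_index] |= mask
    return planes

planes = chunky_to_planar(bytes([1, 0, 0, 0, 0, 0, 0, 0xFF]))
print([hex(plane[0]) for plane in planes])
```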

In my experience 8 address registers were GOOD, but having 16 is SUPER.
There are often workloads where having a few more pointer registers is great.
Having 16 address registers available is RICH.
There is little need for or benefit from more address registers, in my opinion.

Let's look at the data registers. 8 data registers are nice, and you can do a lot with them.
But every coder knows the problem of running out of registers in a work loop.
Having more registers makes coding easier and increases performance.
Every time the code runs out of registers, it needs to compensate with more costly operations, like spilling registers to the stack or doing calculations in memory/on the stack.
So having more registers does improve performance.

And before you say "Intel has 16 registers, no one needs more":
Intel has more than 16 registers. But Intel "hides" them and uses them by implementing very costly hardware renaming logic.

In my opinion a cleaner and also more energy-efficient option is to have the registers visible in the architecture.

We have many assembly coders and all of them REALLY appreciate that we have more registers.

More registers increase performance, but maybe a lot more important:
more registers help to make assembly coding nicer and easier.

Status: Offline

Karlos

Re: Packed Versus Planar: FIGHT
Posted on 9-Oct-2022 14:32:31

[ #526 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Gunnar

Good, thanks for asking. Yourself?

Quote:
Not 100% sure how you mean this

Ok, so you have 64-bit address registers. What does, for example, pea a0 do? Does it move all 64 bits of a0 onto the stack or just 32 bits worth? Moving 64 is a breaking change for existing software that assumes 32 bit, is it not?

As for lots of registers, yes I am sure assembly language devs like it. Are there any compilers that support it?

Last edited by Karlos on 09-Oct-2022 at 02:34 PM.

_________________
Doing stupid things for fun...

Status: Offline

Gunnar

Re: Packed Versus Planar: FIGHT
Posted on 9-Oct-2022 15:36:47

[ #527 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

@Karlos

Quote:

Good, thanks for asking. Yourself?

very well. Thank you very much.

Quote:

Ok, so you have 64-bit address registers. What does, for example, pea a0 do? Does it move all 64 bits of a0 onto the stack or just 32 bits worth? Moving 64 is a breaking change for existing software that assumes 32 bit, is it not?

Thanks for the example. Now I understand your question.

As you said that you didn't read the manual, let me give a bit of general info
about the 68080 design. Maybe this will make it easier to understand.

(1) All the 68080 registers are 64-bit.
(2) The CPU's external bus interface to memory is 64-bit for data, with 32 address lines.
This means a 68080 @ 100 MHz can write up to 800 MB/sec to memory.
(3) The CPU Icache is 128 bits wide. The CPU can read and decode 128 bits of instructions per clock.
(4) The CPU data cache is 64 bits wide and supports the following per cycle:
a read (even misaligned, at no extra cost)
a write (even misaligned, at no extra cost)
a prefetch from memory, executed in parallel.
(5) The execution units can do 64-bit data operations, like MOVE/ADD/AND, you name them, per cycle.
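
The 800 MB/sec figure in point (2) follows directly from the bus width and clock. A minimal Python check, assuming one full-width transfer per clock (the function name is made up for the illustration):

```python
def peak_mb_per_s(bus_width_bits, clock_mhz):
    """Peak transfer rate, assuming one full-width transfer per clock."""
    return (bus_width_bits // 8) * clock_mhz

print(peak_mb_per_s(64, 100))  # 8 bytes per clock x 100 MHz
print(128 // 16)               # 16-bit instruction words per 128-bit Icache fetch
```

This is a peak number; sustained throughput depends on the memory behind the bus.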

This means the CPU is designed to process a good amount of data per cycle.
It can move and process a lot of data, and has no issues with memory alignment.
This makes it very easy to program.
Programmers will recall what a pain misalignment was on the 68000.

The address lines are 32-bit!
In Bill Gates' words: no one needs more than 4 GB of memory on Amiga OS.

Having 64-bit registers helps a lot to improve throughput.
Having a 64-bit memory bus highly improves performance.
Having 64-bit MOVE and SIMD operations highly improves performance.
Having 32 address lines connected keeps 100% Amiga compatibility, and we do not need more memory today.

This means the CPU is 100% compatible with existing software.
PEA, as an example, is 100% compatible and puts 32 bits on the stack.
You can also use the An registers for 64-bit operations.
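
A small Python model may make the PEA answer concrete. It is a sketch under one loud assumption: that a legacy long-word push takes only the low 32 bits of a 64-bit address register. The post above only states that PEA stays 32-bit on the stack; what happens to the upper half is not specified there. The function name `push_long` is hypothetical.

```python
import struct

def push_long(stack, sp, reg64):
    """Push the low 32 bits of a 64-bit register value, big-endian,
    the way a classic 68K long-word push moves the stack pointer by 4."""
    sp -= 4
    stack[sp:sp + 4] = struct.pack(">I", reg64 & 0xFFFFFFFF)  # assumed truncation
    return sp

stack = bytearray(16)
sp = push_long(stack, 16, 0x1122334455667788)
print(sp, stack[sp:sp + 4].hex())
```

Because the stack pointer still moves by 4 bytes, existing code that expects a 32-bit address on the stack sees the same layout as before.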

Quote:

for lots of registers, yes I am sure assembly language devs like it. Are there any compilers that support it?

GCC already supports several features, like a number of new, improved instructions. The extra registers are not supported yet.

Status: Offline

Karlos

Re: Packed Versus Planar: FIGHT
Posted on 9-Oct-2022 17:27:16

[ #528 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Gunnar

Interesting. Is there a specific reason the address registers are 64 bit? Did it simplify the design having all registers the same width? Are there any plans to "Go 64" and support other 64-bit capable OS ?

When designing the MC64K VM (it's not a CPU, it's not compatible in any binary sense, it's just a bytecode interpreter with a 68K inspired assembler front end), I settled on 16 64-bit GPR (data or address) and 16 FPR. Pretty much every dyadic instruction (with a couple of exceptions) can use the full gamut of effective addressing modes (most of the 68000 ones) for each operand simultaneously. For register to register operations there's an alternative compact encoding and 16 is the sweet spot, packing both into a byte.
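
The "16 is the sweet spot" remark comes down to two 4-bit register indices fitting in one byte. A minimal Python sketch of such a packing follows; which nibble holds the destination is an assumption for the illustration, not taken from MC64K, and the function names are made up.

```python
def pack_reg_pair(dst, src):
    """Pack two register indices (0-15) into one byte: dst high nibble, src low."""
    if not (0 <= dst < 16 and 0 <= src < 16):
        raise ValueError("register index must be in 0..15")
    return (dst << 4) | src

def unpack_reg_pair(byte):
    """Inverse of pack_reg_pair: returns (dst, src)."""
    return byte >> 4, byte & 0x0F

encoded = pack_reg_pair(3, 12)
print(hex(encoded), unpack_reg_pair(encoded))
```

With 32 registers each index would need 5 bits, so a register pair would no longer fit in a single byte.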

Performance-wise, on an old mobile i7, I'm getting around 500 MIPS for an example Mandelbrot generator (2048x2048), down to about 130 for dyadic EA modes (EA decode call overhead). This seems OK for an interpreter, but I knew there would be some things the design won't be good at. There's no SIMD. There are not enough FPU registers for 4x4 matrix operations on homogeneous 4f coordinates in 3D work. To solve this, it has a "host trap" mechanism, a very low latency (about the same as an unconditional branch) means of calling native operations. It uses two immediate operand bytes: one defines a broad area of concern, the second the specific operation. Any parameters remain in registers. This is ultimately how I've interfaced it to "the outside". It includes a few library-like components, one of which is a whole matrix/vector library.
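
The host trap idea described above can be sketched as a two-byte dispatch. The Python below is an illustration only and not MC64K code: the class, the area and operation numbers, and the `dot3` routine are all hypothetical. It shows the shape of the mechanism: one byte selects an area, the second selects the operation, and arguments stay in the VM registers.

```python
class VM:
    def __init__(self):
        self.fpr = [0.0] * 16   # 16 floating point registers
        self.host_calls = {}    # (area, op) -> native Python function

    def register(self, area, op, fn):
        self.host_calls[(area, op)] = fn

    def host_trap(self, area, op):
        """Dispatch on the two immediate bytes; parameters stay in registers."""
        self.host_calls[(area, op)](self)

VECTOR_AREA = 1  # hypothetical "vector maths" area
DOT3 = 0         # hypothetical operation number inside that area

def dot3(vm):
    """Native helper: dot product of fpr[0..2] and fpr[3..5], result in fpr[0]."""
    a, b = vm.fpr[0:3], vm.fpr[3:6]
    vm.fpr[0] = sum(x * y for x, y in zip(a, b))

vm = VM()
vm.register(VECTOR_AREA, DOT3, dot3)
vm.fpr[0:6] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
vm.host_trap(VECTOR_AREA, DOT3)
print(vm.fpr[0])
```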

Last edited by Karlos on 09-Oct-2022 at 06:13 PM.
Last edited by Karlos on 09-Oct-2022 at 05:29 PM.

_________________
Doing stupid things for fun...

Status: Offline

NutsAboutAmiga

Re: Packed Versus Planar: FIGHT
Posted on 9-Oct-2022 17:51:50

[ #529 ]

Elite Member

Joined: 9-Jun-2004
Posts: 12993
From: Norway

@Gunnar

With that many registers, won't you have very long context switches between tasks, since you need to save and restore so many registers? Or can you only use these new registers when you have disabled multitasking? Or do you patch the kernel, or is this only supported on the AROS kernel?
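
To put a rough number on the question, here is a back-of-the-envelope Python estimate using the register counts from the programming model quoted earlier in the thread. It assumes every register is saved at its full 64-bit width on each switch and ignores PC, status and control registers; whether any given kernel actually saves all of them is not stated in this thread. The names are made up for the illustration.

```python
# Register groups as listed in the quoted 68080 programming model.
REGISTERS_68080 = {"A0-A15": 16, "D0-D7": 8, "FP0-FP7": 8, "E0-E23": 24}
# Classic 68K integer registers, for comparison (32 bits each).
REGISTERS_68K_INT = {"A0-A7": 8, "D0-D7": 8}

def save_area_bytes(register_groups, bytes_per_register):
    """Bytes needed to save every listed register once."""
    return sum(register_groups.values()) * bytes_per_register

print(save_area_bytes(REGISTERS_68080, 8))    # all 64-bit registers
print(save_area_bytes(REGISTERS_68K_INT, 4))  # classic integer set only
```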

Last edited by NutsAboutAmiga on 09-Oct-2022 at 05:53 PM.
Last edited by NutsAboutAmiga on 09-Oct-2022 at 05:52 PM.

_________________
http://lifeofliveforit.blogspot.no/
Facebook::LiveForIt Software for AmigaOS

Status: Offline

matthey

Re: Packed Versus Planar: FIGHT
Posted on 9-Oct-2022 20:46:28

[ #530 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2747
From: Kansas

NutsAboutAmiga Quote:

With that many registers, won't you have very long context switches between tasks, since you need to save and restore so many registers...?

Now you PPC fans who were so proud of 32 GP integer registers for so many years mention the down sides.

1) longer context switches
2) extra function prologue & epilogue register saving and restoring overhead (POWER/PPC)
3) increased instruction traffic from larger code, which offsets some of the data traffic reduction *

* PPC lost performance because of 32 GP register code bloat with limited memory bandwidth common on embedded systems. This is why compressed RISC encodings were used for embedded use. Motorola used PPC for embedded markets anyway despite having one of the best compressed code ISAs in 68k which was already the 32 bit embedded champion.

At least the PPC 32 GP integer registers are orthogonal and likely do give a few percent of performance on higher end systems. There is no increased cost or additional decreased code density from using all 32 registers and 3 op instructions are available which give a minor advantage. The separate register files for FPU/SIMD unit use can also go to sleep when these units are not used which is often.

CISC reg-mem CPUs do much better with 16 GP integer registers than load/store RISC CPUs. Rather than spilling a register to memory, they can often use variables in memory instead, with a fraction of the overhead of most RISC CPUs. Some developers overlook or ignore the CISC advantages though. Some even unnecessarily bloat the code and increase encoding overhead, reducing or eliminating CISC advantages. The 68060 was able to predecode most 68k instructions in a single stage/cycle of an 8 stage pipeline using a table lookup. That code is then ready for RISC like execution but has the advantage of more powerful instructions with pipelined OP+mem accesses.

A similar 68k 64 bit mode could use a different table allowing optimum reencoding of the whole ISA. I would start with the field for size=8,16,32,64 bit which is a cleaner encoding scheme. The other option is to go down the x86-64 road where 64 bit ops need a REX prefix, which is likely used about 40% of the time with a 64 bit OS (the average x86-64 instruction size increase was ~0.4 bytes, which is 40% of a 1 byte REX prefix). Maybe a 64 bit 68k would use the prefix less as it already has most of the integer registers it needs, but the cost of a prefix is twice as much at 2 bytes and there is still a decoder overhead increase.

Reencoding for 64 bit offers efficiency increases as the length of the instructions can be adjusted for the frequency of use instead of using 68000 encodings where all ops are 16 bit. A reencoded 68k 64 bit ISA could likely come very close to the same code density of the 68k 32 bit ISA and the simpler 16 bit encoding is easier to decode than the 8 bit x86-64 encoding. That's the way I see it not that anybody listens to arm chair experts.
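
The prefix arithmetic in the post can be written out explicitly. The Python sketch below only restates the post's own figures (an average growth of about 0.4 bytes per instruction, a 1-byte REX prefix, a 2-byte prefix on a hypothetical 64-bit 68k); the function names are made up, and the assumption that a 68k prefix would be used just as often is the pessimistic case, not a measurement.

```python
def prefix_usage_fraction(avg_growth_bytes, prefix_bytes):
    """Fraction of instructions carrying the prefix, given average size growth."""
    return avg_growth_bytes / prefix_bytes

def avg_growth(usage_fraction, prefix_bytes):
    """Average per-instruction size growth for a given prefix usage."""
    return usage_fraction * prefix_bytes

rex_usage = prefix_usage_fraction(0.4, 1)  # x86-64 figure quoted in the post
print(rex_usage)
print(avg_growth(rex_usage, 2))            # same usage with a 2-byte prefix
```

At equal usage, a 2-byte prefix doubles the average growth per instruction, which is why the post argues for reencoding instead of prefixing.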

Last edited by matthey on 14-Oct-2022 at 01:35 AM.
Last edited by matthey on 09-Oct-2022 at 09:18 PM.
Last edited by matthey on 09-Oct-2022 at 08:55 PM.
Last edited by matthey on 09-Oct-2022 at 08:52 PM.

Status: Offline

Karlos

Re: Packed Versus Planar: FIGHT
Posted on 9-Oct-2022 21:28:57

[ #531 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@matthey

Quote:
Reencoding for 64 bit offers efficiency increases as the length of the instructions can be adjusted for the frequency of use instead of using 68000 encodings where all ops are 16 bit. A reencoded 68k 64 bit ISA could likely come very close to the same code density of the 68k 32 bit ISA and the simpler 16 bit encoding is easier to decode than the 8 bit x86-64 encoding. That's the way I see it not that anybody listens to arm chair experts.

Why don't you try it then? What is stopping you from devising an optimum code-density encoding for a hypothetical 64-bit extension to 68K and implementing a basic interpreter or simple disassembler to validate it works in practise? Why should someone else have to validate the ideas of an "armchair expert"?

Last edited by Karlos on 09-Oct-2022 at 09:57 PM.
Last edited by Karlos on 09-Oct-2022 at 09:56 PM.
Last edited by Karlos on 09-Oct-2022 at 09:29 PM.

_________________
Doing stupid things for fun...

Status: Offline

matthey

Re: Packed Versus Planar: FIGHT
Posted on 9-Oct-2022 22:49:15

[ #532 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2747
From: Kansas

Karlos Quote:

Why don't you try it then? What is stopping you from devising an optimum code-density encoding for a hypothetical 64-bit extension to 68K and implementing a basic interpreter or simple disassembler to validate it works in practise? Why should someone else have to validate the ideas of an "armchair expert"?

I'm not asking anyone to do anything. It's a waste of time to design a 68k ISA for real hardware or maybe you would have done it yourself. The Amiga future is emulation, virtual machines and FPGA optimized ISAs and CPU cores. Remember?

Status: Offline

Karlos

Re: Packed Versus Planar: FIGHT
Posted on 9-Oct-2022 23:22:07

[ #533 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@matthey

Who said anything about real hardware? I'll ask again.

Quote:
A similar 68k 64 bit mode could use a different table allowing optimum reencoding of the whole ISA. I would start with the field for size=8,16,32,64 bit which is a cleaner encoding scheme.

If you have a vision of how to create a neat, space efficient 68K instruction set encoding for some hypothetical 64-bit extension of the 68K, why not do it? You make enormous posts about it and throw around numbers frequently while (assuming the comment is self-referential) simultaneously bemoaning that nobody listens to your armchair expertise.

Given that nobody else is going to do it, whether you are asking or not, you could just go right ahead and do it yourself. Define your opcode layout and write, at the very least, a basic assembler and disassembler front end or a simple interpreter for it to demonstrate it works. Show us how you would keep the code density and backwards compatibility with existing 32-bit object code of the 68K we all know and love while extending it to support 64 bit operations.

Last edited by Karlos on 09-Oct-2022 at 11:48 PM.

_________________
Doing stupid things for fun...

Status: Offline

cdimauro

Re: Packed Versus Planar: FIGHT
Posted on 10-Oct-2022 4:30:41

[ #534 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4432
From: Germany

@Gunnar

Quote:

Gunnar wrote:
Quote:
Much better would have been to use a different syntax for the SIMD registers, like S2 in this case instead of D2, because D2 is causing confusion, like for you.

You misunderstand this.
This is not another register.
AMMX can also work on the Integer Register and this code example did use the Integer Register D2.

I fully understood and I knew very well of what I was talking about.

In fact it was a SUGGESTION to change the data register names IN THIS SPECIFIC CASE: when used on SIMD instructions.
Quote:
Quote:
Yes it tells me it's loading into D2. I don't see it any other way. Redesign it before they start writing assembler parsers.

You wrongly assume the SIMD unit would be limited to only new registers.
This is not the case.

In my opinion a very important feature of the 68K architecture is - that it always had less limits than some other architectures.

For example, the 68K FPU is not limited to FPU registers only.
The FPU can read/move from and to integer registers.
Hammer made a point about how Quake suffers on the PowerPC FPU,
and the reason for this performance problem
is that the PowerPC FPU is not able to write to integer registers.

Like the 68K FPU, the AMMX SIMD unit can also use Integer registers as source and destination.

Which I've already reported here some weeks ago and you said nothing. So I knew it very well.

But actually in THIS part you quoted HypeX which didn't know some AMMX details and was just asking.
Quote:
Please read the 68080 documentation.
Then you will make fewer wrong posts based on false assumptions.

No, next time you have to understand what people are saying before you start typing on the keyboard.

And, more importantly, you should learn how to quote, since you quoted one sentence from me and another from HypeX, and mixed them all up together...

Status: Offline

cdimauro

Re: Packed Versus Planar: FIGHT
Posted on 10-Oct-2022 4:54:20

[ #535 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4432
From: Germany

@Gunnar

Quote:

Gunnar wrote:
Quote:
Irrelevant. Another logical fallacy, but we know that you're used to them. You lack elementary logic.

Quote:
Irrelevant again. You're the king of logical fallacies.

Lets talk about this:

I do love Amiga and many other people love Amiga too.
Many Amiga lovers can code, but some can not code.

Of the many kinds of coding one can do, some tasks are more difficult than others.
For example, it's often much easier to make a small demo or write a small tool than to make a full game. Making a game needs a really good coder.

All the Amiga lovers I know who still use the Amiga and can code
enjoy coding and do it again and again.
They write a demo here, a nice tool which ends up on Aminet there.
Maybe they write some improved datatype, maybe even a game.
But you need to be a good coder to make a game!

My friend, Cesare Di Mauro, you like to speak a lot about your coding inventions.
How you invented a new double buffering for Amiga ..
How you did something never seen on Amiga before.
How you made games.

The thing is that all the people I know who can code on the Amiga do code on the Amiga, again and again.

Are you really the one exception in the whole world?
Someone who has great coding skills but never actually codes for the Amiga at all?

Do you understand how strange and unbelievable this looks to all of us?

Dear Gunnar von Goebbles,

you continue repeating a pile of logical fallacies because it's quite evident that you have nothing else to "argue" in a discussion where you were destroyed each time, since I replied to everything and you weren't able to do the same.

Specifically, you built up a completely invented story about what an Amiga demo or game coder should have done. It is something you imagined and which exists only in your mind, which should raise a big alert, since you are completely detached from reality.

In fact, you created this scenario in your mind and of course gave absolutely NO proof of it. People should believe it just because you wrote it. This could happen only with your Minions. However, logic and science demand it of you: YOURS is the claim and YOURS should be the burden of proof. Which is systematically missing from you. As usual. And as people can always see by chronologically reading all posts in any discussion which involved you.

So, those are logical fallacies, of course.

But you continue with some other, because it wasn't enough for you, eh!

In fact you might have known of SOME Amiga developers that behaved that way, and assumed that ALL others should have done the same. Well, it's another very common logical fallacy.

In fact, I was one who did NOT follow this path. And I'm not the only one, since the same goes for Dario Merola, Fightin' Spirit's main coder. The same for Davide Busetta, who was the musician but also contributed to the game code (because he was and is a very talented coder): he did nothing after that.
Actually, the graphic artists also did NOTHING else.

So, as you can see, this shows that your claim is plainly false.

Anyway, your logical fallacies are so big that nobody sane could believe them. In fact, did you talk to ALL coders who did demos and/or games for the Amiga? ALL OF THEM? And ALL of them still produced stuff for the Amiga after the Commodore bankruptcy? Care to PROVE it?

Anyway, it's also very easy to rebut it without going deep into the logical fallacies. You said "all of us". Which means also me. And I've already said that I did something completely different. So, I've easily rebutted your statement.

This is elementary logic: what you're missing.

Gunnar, Gunnar, Gunnar...
Quote:
This is like someone claiming to be a guitar rock star, a fantastic super player.
Do you have a guitar?
No! This is irrelevant!

Did you play guitar in the last 30 years?
No! This is irrelevant!

This is all irrelevant. A really talented player never needs to play.
I'm a super talented guitar player.

This sounds just like a lot of bullshit to all of us.

Again same logical fallacies. And with "all of us" I'm included, so the elementary logic confutes you again.

Anyway, what I see is a classic pattern that one of your compatriots classified as "penis envy".

In fact, I worked on the Amiga, squeezing the most out of it for one of the best games for this platform (and another was in progress). So, I have a place in Amiga history.

Whereas you did NOTHING for the Amiga (A-MI-GA). And you take credit for some quick and mediocre port which didn't use the available resources; rather the contrary.

The difference is quite evident and you seem to suffer for that. Poor Gunnar.
Quote:
Dear Friend, Cesare Di Mauro,

And when we talk about bullshit, I always have to think about your involvement in the TINA project.

In the TINA project, you and your friends, Cesare Di Mauro, clearly named the ALTERA FPGA model the project would be using, and you gave many technical details of what TINA implements.

You and your friends boasted about the claimed hardware features,
just the same way you brag about your inventions in your unreleased game.

You guys bragged that TINA has
- a 128-bit memory bus
- and a 400 MHz CPU clock

But if you look at the ALTERA manual for this FPGA model, you clearly see that the memory bus cannot go over 32 bits, and the clock rate cannot go over 200 MHz.

"If you repeat a lie often enough, people will believe it, and you will even come to believe it yourself." - Joseph Goebbels

Same as before. Replied here: https://amigaworld.net/modules/newbb/viewtopic.php?topic_id=44169&start=200&post_id=855345&order=0&viewmode=flat&pid=0&forum=17#855345

Dear Gunnar von Goebbles, you should have understood that repeating lies doesn't work with me.
Quote:
So all this bragging, what was it?
Now tell me, were all the values claimed for the TINA project bullshit? Yes or No?

Well, you know what: I can answer you. I've absolutely no problem with that.

But I'll do so AFTER you answer ALL the questions that I've asked you. I think that's fair, right?

Status: Offline

cdimauro

Re: Packed Versus Planar: FIGHT
Posted on 10-Oct-2022 4:58:04

[ #536 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4432
From: Germany

@Gunnar

Quote:

Gunnar wrote:
@Karlos

Hello Karlos, how are you?

Quote:

Karlos wrote:
@Gunnar

I had no idea about the extra registers.

The manual...

Quote:

Presumably all existing 68K object code treats the registers as 32 bit.

Not 100% sure what you mean by this.
68K instructions define the operation size. For example, CLR.B clears a byte.

Quote:

32 sounds a bit excessive unless.

Let's look at this in detail.
The 68K had 8 FPU registers, 8 pointer registers, and 8 data registers.

The FPU can read from FPU registers, integer registers, memory, and immediates.
This means it has 16 registers as possible inputs.
If you write FPU performance code, e.g. 3D matrix stuff,
then being able to use the integer registers as source really helps, because you can put some variables in them.

The 68K FPU was sequential and all operations needed several clock cycles.
In other words, it was slow.

Today all good FPUs are fully pipelined.
Every FPU operation still needs several clock cycles to finish.
But you can start a new FPU instruction every clock!
This means you have several instructions in flight in parallel.
All modern FPUs work like this: on POWER, on Intel, on ARM, on the 68080, they are all pipelined.

Typically today's FPUs have about 6 or more operations in flight.
The 68080 can have up to 22 FPU operations in flight in parallel!

For this to work you need more registers.
Everyone uses many registers for this.
PowerPC used to have 32 FPU registers, but some years ago this was upgraded to 64 registers to reach better performance. CELL used 128 registers to support this.
ARM uses 32 registers, and so on.
Intel also has a huge number of registers.
Even if you don't see them at first look, they use "hidden" internal registers for this.

So everyone in the industry knows that to make FPU code good, you need many registers.
8 registers are not sufficient. 32 registers is a good working number.

Let's look at the address registers, the pointers.
Yes, the 8 pointers of the 68000 are a very good start.
Of the 8 pointers, an often-seen usage pattern is A7 = stack pointer, A6 = library pointer, A5 = LINK register ... A4 = variable base. Sometimes having more pointers would be very helpful.

A simple Amiga example: chunky-to-planar conversion.
1 chunky pointer, 8 planar pointers, 1 stack pointer = with 10 registers this is easy to code.
While 8 pointer registers are good, having 16 makes coding easier and also increases performance.

I my experience 8 Address register was GOOD, but having 16 is SUPER.
There are often workloads were having a few more pointer registers is great.
Having 16 Address register available is RICH.
There is little need or benefit of more address register in my opinion.

Let's look at the DATA registers. 8 data registers are nice, and you can do a lot with them.
But every coder knows the problem of running out of registers in a work loop.
And having more registers makes coding easier, and increases performance.
Every time the CPU runs out of registers, it needs to compensate with more costly operations, like spilling registers to the stack, or doing calculations in memory/stack.
So having more registers does improve performance.

And before you say "INTEL has 16 registers, no one needs more":
INTEL has more than 16 registers. But INTEL "hides" them and uses them by implementing very costly hardware renaming logic.

In my opinion a cleaner and also more energy efficient option is to have the registers visible in the architecture.

We have many assembly coders and all of them REALLY appreciate that we have more registers.

The extra registers increase performance, but maybe a lot more important:
the extra registers help to make assembly coding nicer and easier.

In short: you continue to confuse physical registers with internal / microarchitecture registers trying to justify your ridiculous choice of providing an enormous amount of registers on your ISA.
Quote:

Gunnar wrote:
@Karlos

Quote:

Good, thanks for asking. Yourself?

very well. Thank you very much.

Quote:

Ok, so you have 64-bit address registers. What does, for example, pea a0 do? Does it move all 64 bits of a0 onto the stack or just 32 bits worth? Moving 64 is a breaking change for existing software that assumes 32 bit, is it not?

Thanks for the example. Now I understand your question.

As you said that you didn't read the manual, let me give a bit of general info
about the 68080 design. Maybe this will make it easier to understand.

(1) All the 68080 registers are 64bit
(2) The CPU external bus interface to memory is 64bit for data, with 32 address lines.
This means a 68080 @ 100MHz can write up to 800MB/sec to memory.
(3) The CPU Icache is 128bit wide. The CPU can read and decode 128 bits of instructions per clock.
(4) The CPU Data-cache is 64bit wide and supports the following per cycle:
a read (even misaligned for no extra cost)
a write (even misaligned for no extra cost)
a parallel executed prefetch from memory.
(5) The execution units can do 64bit DATA operations, like MOVE / ADD / AND / you name them, per cycle.

This means the CPU is designed to process a good amount of data per cycle.
It can move and process a lot of data, and has no issues with memory alignment.
This makes it very easy to program.
Programmers will recall what a pain misalignment was on the 68000.

The address lines are 32bit!
In Bill Gates' words - no one needs more than 4GB of memory on Amiga OS.
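
The two figures quoted above (800MB/sec and 4GB) follow from simple arithmetic, sketched here in C. The helper names `peak_bytes_per_sec` and `addressable_bytes` are made up for this example, and the bandwidth figure assumes one full-width transfer on every clock, which is a best case ("up to"), not a measured rate.

```c
#include <assert.h>
#include <stdint.h>

/* Peak bytes per second for a bus that moves bus_bits on every clock.
 * Assumes one transfer per cycle, i.e. a theoretical upper bound. */
uint64_t peak_bytes_per_sec(uint32_t bus_bits, uint64_t clock_hz)
{
    assert(bus_bits % 8 == 0);
    return (uint64_t)(bus_bits / 8) * clock_hz;
}

/* Number of bytes that can be addressed with the given address lines. */
uint64_t addressable_bytes(unsigned address_lines)
{
    assert(address_lines < 64);
    return (uint64_t)1 << address_lines;
}
```

With a 64bit bus at 100MHz the first function gives 800,000,000 bytes per second, and 32 address lines give 4,294,967,296 bytes, the 4GB limit mentioned in the post.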

Having 64bit registers helps a lot to improve throughput.
Having a 64bit memory bus highly improves performance.
Having 64bit MOVE and SIMD operations highly improves performance.
Having 32 address lines connected keeps 100% Amiga compatibility - and we do not need more memory today.

This means the CPU is 100% compatible with existing software.
PEA, for example, is 100% compatible and puts 32 bits on the stack.
You can also use the An registers for 64bit operations.

Quote:

for lots of registers, yes I am sure assembly language devs like it. Are there any compilers that support it?

GCC already supports several features, like a number of new, improved instructions. The extra registers are not supported yet.

And here you gave proof that addresses on 68080 are 32-bit and NOT 64-bit.

So you LIED (guess what!) before when you quoted my sentence...

Status: Offline

Gunnar

Re: Packed Versus Planar: FIGHT
Posted on 10-Oct-2022 6:29:56

[ #537 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

@NutsAboutAmiga

Quote:

NutsAboutAmiga wrote:
@Gunnar

With that many registers, won't you have very long context switches between tasks, since you need to save and restore so many registers?

The context switch time is not a problem at all.

For interrupt code the extra registers need not be saved.
An interrupt routine only saves the registers it touches.

For task switch code, we need to keep in mind that this is done very seldom.
Let's say you run at 100MHz, and do a task switch every 20ms.
This means you have 2 million clocks between task switches.
This means your CPU might execute 2-4 million instructions before a task switch.
What are 64 more cycles for a register save/restore in comparison to 2 million?
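
To put numbers on this, here is a small C sketch of the arithmetic. The function names are invented for this example, and the 64 extra cycles are simply the figure used above, not a measured value.

```c
#include <assert.h>
#include <stdint.h>

/* Clock cycles available in one time slice of slice_ms milliseconds. */
uint64_t cycles_per_timeslice(uint64_t clock_hz, uint32_t slice_ms)
{
    return clock_hz / 1000u * slice_ms;
}

/* Extra save/restore cost relative to the slice, in parts per million. */
uint64_t overhead_ppm(uint64_t extra_cycles, uint64_t slice_cycles)
{
    assert(slice_cycles > 0);
    return extra_cycles * 1000000u / slice_cycles;
}
```

At 100MHz with a 20ms slice this gives 2,000,000 cycles per slice, and 64 extra cycles come out at 32 parts per million, or about 0.003% of the slice.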

The 68080 "monitors" whether a program makes use of the new registers.
And EXEC only needs to save the new registers for programs that really use them.

This means for old programs nothing changes in the task switch.
And new programs using the extra registers gain massive performance because of them.
So the cycles spent saving them are worth it 1000 times over.

Also, if you look at the 68080 register file, we did an important trick to make this faster.
Other architectures with 32 integer registers and 32 FPU registers have to save 64 registers.
The 68080 only needs to save 40 registers for the same.
We did it this way on purpose to make context switches even faster.

The 68080 has the 8 DATA registers, the 8 FPU registers, and 24 extended registers which both units can use. This design has many advantages. It makes it very easy to use results from the FPU in integer code and vice versa, like Quake does for example.
In real world applications you sometimes need the FPU to use 32 registers, and some algorithms need 32 integer registers - it's very uncommon that a routine needs both at the same time.

Quote:

or can you only use these new registers when you have disabled multitasking?

Every program can use them. And we do this in our programs too.
RIVA would not be that fast without the registers.

Quote:

Or do you patch the kernel, or is this only supported on AROS kernel?

Yes, we replace this part of EXEC.

Status: Offline

Gunnar

Re: Packed Versus Planar: FIGHT
Posted on 10-Oct-2022 7:13:47

[ #538 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

Matthew Hey,

Your arguments are not correct.

Quote:

1) longer interrupts where registers need saving

Mind that interrupt code only needs to back up the registers that it uses.
If your interrupt code uses 2 registers, then you need to save 2 registers.
Whether your CPU has 8 registers total or 256 registers total does not change this.
You only need to save the 2 registers that you use.
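
The same point can be sketched in C with a register-use bitmask, similar in spirit to the register list of a 68K MOVEM instruction. The mask layout (bit i set means register i is used) and the function names are assumptions for this example only. The cost depends on how many bits are set, not on how wide the mask is.

```c
#include <assert.h>
#include <stdint.h>

/* Count the registers named in a save mask (one bit per register). */
unsigned regs_saved(uint64_t used_mask)
{
    unsigned n = 0;
    while (used_mask) {
        used_mask &= used_mask - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Bytes of stack needed to save them, given the register width in bytes. */
unsigned save_bytes(uint64_t used_mask, unsigned reg_bytes)
{
    assert(reg_bytes > 0);
    return regs_saved(used_mask) * reg_bytes;
}
```

A handler that touches two registers has a mask with two bits set, so it saves two registers whether the register file holds 16 or 64 of them.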

Quote:

1) context switches where registers need saving

A task switch is done so seldom that this is irrelevant.
And a task switch has to execute so many other instructions that saving a few more registers is not relevant.

Quote:

2) extra function prologue & epilogue register saving and restoring overhead (POWER/PPC)

Having more hardware CPU registers does not mean that functions get more expensive.
Actually it often means functions get cheaper, as more scratch registers are available.
A function only needs to save the registers that it uses
and that are not scratch registers.

Please mind that the function saving overhead depends not on the number of CPU registers, but on the number of non-volatile registers. In other words this is not a CPU hardware question,
but a software / programming ABI choice.

The 68000 for example has 8 address and 8 data registers.
Of these 16 registers, 4 are scratch (A0/A1, D0/D1) and 12 are non-volatile.
The POWERPC has 32 registers; of these, 9 are scratch registers per software ABI.
This means on POWERPC a function has more free scratch registers to use,
which reduces the need to SAVE/RESTORE non-volatile registers.

The 68080 for example added more registers.
And all of the extra registers are defined as scratch registers.
This means functions have more free registers - functions get faster
and the need to save/back up non-volatile registers is reduced.

Having more registers = more scratch registers.
This reduces the function overhead = this makes them faster.
It is exactly the opposite effect of what you think.

Quote:

The separate register files for FPU/SIMD unit use can also go to sleep when these units are not used which is often.

You should not talk about ASIC and CPU design if you have no clue.
First of all, IBM POWER chips use a combined register file - exactly like the 68080 does.
Second, register files work very differently than you think.
There is not one register file somewhere in the chip.
Real CPUs have local copies of the register file, next to the execution units.
And if a unit "sleeps" then also the read ports of its local copy of the register file can sleep.

Quote:

CISC reg-mem CPUs do much better with 16 GP integer registers than load/store RISC CPUs.

CISC has the advantage that it can operate directly on memory, and can make better use of immediates.
This makes the code in general more compact = you need fewer instructions.

RISC generally needs more instructions for the same amount of integer code.
Tracking the number of instructions in flight is today the CPU design limitation.

RISC CPUs have simple decoding.
RISC CPUs have no problem decoding 8 instructions per clock,
but tracking them is what does not work!
This is why RISC can not be stronger than CISC.

Both CISC and RISC today track the same amount of instructions per cycle.
CISC can do more work with them.

RISC would need to increase the instruction count to keep up - but the tracking limitations prevent this.
You need to track the instructions, and this is the limit.

The 68080 tracks up to 4 instructions issued per clock cycle. (Often code uses less than 4)
4 instructions per clock cycle is state of the art.

Last edited by Gunnar on 10-Oct-2022 at 07:27 AM.

Status: Offline

Gunnar

Re: Packed Versus Planar: FIGHT
Posted on 10-Oct-2022 7:41:06

[ #539 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

@Karlos

Hello Karlos,

Quote:

Karlos wrote:
@Gunnar

Interesting. Is there a specific reason the address registers are 64 bit? Did it simplify the design having all registers the same width?

As you know the 68K architecture has both address and data registers.
The address registers are for EA calculation.
But data registers can also be used for this.

The data registers are for ALU operations - but address registers can also be used in ALU operations.
Like for example ADD.L A0,D0 = ALU operation using A0 as source.

Quote:

Are there any plans to "Go 64" and support other 64-bit capable OS ?

The Apollo-Team are all 100% Amiga fans.
We all love the programmer friendly 68K CPU,
we love the Amiga chipset, we love the clever design idea of the Amiga DMA,
and we also all like the slim, low overhead, small Amiga OS.

As you know Amiga OS is based on a 32bit address design.
If you want to run Amiga OS, then 32bit addressing is the world in which you live.

We have no desire to run Linux on 68K.
We want to run AmigaOS. Therefore 64bit address makes no sense for us.

In our experience on POWER, 64bit addressing does increase program size and costs some performance. A 64bit memory space is useful if you want to run a 100 GB SAP in-memory database.
This is not what we can or want to do on Amiga OS.

Amiga programs are generally small, and need fewer resources.
4GB is a lot of memory. I believe that the 4 GB memory space of AmigaOS is enough for the rest of our lives.

Status: Offline

Gunnar

Re: Packed Versus Planar: FIGHT
Posted on 10-Oct-2022 8:03:08

[ #540 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

Matthey Hey,

Quote:

That's the way I see it not that anybody listens to arm chair experts.

The problem with you and other arm chair experts is that you often talk total nonsense because of your lack of understanding.

You don't know how an ASIC is internally designed.
You don't know how register files are internally designed.
You have in your head a totally unrealistic idea of how this is done.
And based on your wrong conception - you again and again post the same bullshit - for years.

How many posts did you write in Amiga forums claiming a mixed FPU/INT register file would not work or be bad in an ASIC? How many times?

Your claims are total nonsense.
This design works very well and is often used, by IBM and many others too.

How many times did you post that more registers would make functions slower?
This claim is also totally wrong.
The truth is that more registers make functions faster.

Matthew Hey,
I was told you work in the farming area.
Are you good at this?
Then give people advice on how to breed cattle or which tractor is good. Give advice in areas that you understand.

What would you think about an IBM chip designer nerd coming to a farm and talking out of his ass about cattle breeding?

Last edited by Gunnar on 10-Oct-2022 at 08:15 AM.

Status: Offline


Amigaworld.net was originally founded by David Doyle