bhabbott 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 21-Aug-2024 5:03:11
[ #381 ]
Cult Member
Joined: 6-Jun-2018
Posts: 509
From: Aotearoa

@Lou

Quote:

Lou wrote:

Normally, aka in the real world, you'd do something like this to an array, such as a color plane in memory. Once again the 6502 family would crush the 68000 as it can access memory much faster.

In the real world you are limited by memory speed.

In a typical home computer you were even more limited by the need to prioritize video DMA. On many early systems the CPU could not access display RAM at all during the active display time, either because it was locked out (eg. RCA COSMAC, ZX81) or the display would be corrupted (IBM CGA). However the 6800, 6502, 6809 and 68000 have a handy feature where the first half of the memory cycle is not used by the CPU so video DMA can be interleaved with it.

Other CPUs like the Z80 didn't have that ability, so other techniques were used. The ZX Spectrum stretched the CPU clock until the memory cycle lined up. The Amstrad CPC added a wait state to make all memory cycles take 4 clocks, lowering the effective CPU speed from 4MHz to 3MHz. Machines using the TMS9918 VDP had to access video RAM via the VDP's I/O port, with a programmed delay between each access.

In the real world the 6502 didn't crush the 68000, which had the advantages of a 16 bit data bus, a 16 MB flat addressing range and a large number of internal 32 bit registers. And it had a direct path to full 32 bit.

As you say, benchmarks like Byte Sieve are pretty worthless for comparing real-world performance. A 6502 might be good for small programs written in highly optimized assembler, but was a poor fit to compilers of the day. It would perform poorly in the Amiga because it didn't have the required addressing range or the ability to multitask large programs efficiently. That's why Jay Miner chose the 68000.

Quote:
In the real world, the cpu doesn't do much math. It uses LUT to get the answer.

In the real world a general purpose computer doesn't use LUTs instead of proper math because the memory usage is onerous.

Quote:
You can have your 140-152 cycle DIV opcode. It's a joke. BRA takes about 12 cycles. You can go down the line and cheer that you can do 16/32bit math faster but it doesn't matter. Your fancy 'programmer-friendly' addressing modes take up to 48 cycles. You NEED 8Mhz minimum just to feel fast.

You are getting hung up on CPU clock speed. What matters is memory cycles. 68000 Bcc takes 3 memory cycles when the branch is taken (2 when not) and has a range of +- 32k Bytes. The 6502 takes 4 cycles if the destination isn't on the same page, and 4-7 cycles if the distance is greater than +-128 bytes (Bcc + Bcc or JMP).

Quote:
Smarter engineers than us/you/all-of-us have already done the analysis. On average a 1Mhz 6502 is generally equal to a 2.47Mhz 68000...add 20% when using a 65C02...add another 25% when comparing to a 65CE02. Deny reality all you want.

Once again, CPU clock speed doesn't matter. The fastest 6502 you could get in 1983 (when the Amiga was designed) was 3MHz. 3 x 2.47 = 7.41, which makes a 12.5 MHz 68000 (introduced in 1982) roughly 70% faster even by your simplistic measure. But the Amiga used an 8 MHz 68000 throttled back to 7.09 MHz to match the memory cycle time, which was set by the video system.
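
For anyone who wants to check that arithmetic, here is a minimal Python sketch. The 2.47 factor is simply the figure from the quoted claim, not a measured value, and the function name is made up for illustration.

```python
# Rough clock-equivalence check using the 2.47x factor from the quoted claim.
# The factor is an assumption taken from the thread, not a benchmark result.
FACTOR_6502_TO_68000 = 2.47

def equivalent_68000_mhz(mhz_6502: float) -> float:
    """68000 clock that a 6502 at mhz_6502 would match under the quoted factor."""
    return mhz_6502 * FACTOR_6502_TO_68000

equiv = equivalent_68000_mhz(3.0)   # fastest 6502 cited for 1983
print(f"3 MHz 6502 is treated as a {equiv:.2f} MHz 68000")
print(f"12.5 MHz 68000 vs that: {12.5 / equiv:.2f}x")
print(f"that vs the Amiga's 7.09 MHz 68000: {equiv / 7.09:.2f}x")
```

Under that factor the 3 MHz part lands close to the Amiga's 7.09 MHz clock and well short of the 12.5 MHz part, which is all the comparison above relies on.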

Smarter engineers chose the 68000 because the 6502 wasn't up to the task. Amiga OS 2+ comes on a 512kB ROM. Let's be generous and assume the 6502 would only need half the space. That still means a massive amount of bank switching just to execute the OS code. Add 1MB of RAM and you see the problem. Your poor little 6502 with its pathetic 256 byte zero page and tiny 256 byte stack would be chasing its tail all day long just trying to boot up!

Quote:
You can cherry pick tasks all you want, but it's still reality. This is why ARM won. 68K was inefficient. When it got efficient (040/060), it was too late and too expensive. ARM was superior...is superior.

I have an Acorn Archimedes A3000. I haven't switched it on in over a year. Why? Because the Amiga won. In the real world ARM stunk.

Last edited by bhabbott on 21-Aug-2024 at 05:04 AM.

kolla 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 21-Aug-2024 5:46:43
[ #382 ]
Elite Member
Joined: 20-Aug-2003
Posts: 3356
From: Trondheim, Norway

@mskov

Quote:

mskov wrote:
@kolla

Someone shared a downloadable manual for the A300 on Twitter recently. It looked like the A600 but I seem to recall that there was no possibility for an internal hard drive.

Here it is:

https://archive.org/details/introducing-the-amiga-300/mode/2up


Very cool - yes, it looks exactly like an A600.

So how would such an A300 have been any less expensive to make than the A600? How would this A300 not have failed the same way the A600 did? It too lacks the full keyboard desperately needed for flight sims and whatnot...

From what I recall, the higher cost at the time had more to do with the move to SMD than what it had to do with adding the Gayle IDE - the PCMCIA was already there.

Last edited by kolla on 21-Aug-2024 at 09:26 AM.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

Lou 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 21-Aug-2024 15:28:07
[ #383 ]
Elite Member
Joined: 2-Nov-2004
Posts: 4229
From: Rhode Island

@bhabbott

Quote:

bhabbott wrote:
@Lou

In a typical home computer you were even more limited by the need to prioritize video DMA. On many early systems the CPU could not access display RAM at all during the active display time, either because it was locked out (eg. RCA COSMAC, ZX81) or the display would be corrupted (IBM CGA). However the 6800, 6502, 6809 and 68000 have a handy feature where the first half of the memory cycle is not used by the CPU so video DMA can be interleaved with it.

Other CPUs like the Z80 didn't have that ability, so other techniques were used. The ZX Spectrum stretched the CPU clock until the memory cycle lined up. The Amstrad CPC added a wait state to make all memory cycles take 4 clocks, lowering the effective CPU speed from 4MHz to 3MHz. Machines using the TMS9918 VDP had to access video RAM via the VDP's I/O port, with a programmed delay between each access.

In the real world the 6502 didn't crush the 68000, which had the advantages of a 16 bit data bus, a 16 MB flat addressing range and a large number of internal 32 bit registers. And it had a direct path to full 32 bit.

As you say, benchmarks like Byte Sieve are pretty worthless for comparing real-world performance. A 6502 might be good for small programs written in highly optimized assembler, but was a poor fit to compilers of the day. It would perform poorly in the Amiga because it didn't have the required addressing range or the ability to multitask large programs efficiently. That's why Jay Miner chose the 68000.

Again, I'm not attacking 'Amiga' per se. I'm attacking the choice of cpu. A short-run decision gimped the platform in the long run. Perhaps he was blinded by 'marketing' at the time.

Quote:

Quote:
In the real world, the cpu doesn't do much math. It uses LUT to get the answer.

In the real world a general purpose computer doesn't use LUTs instead of proper math because the memory usage is onerous.

Quote:
You can have your 140-152 cycle DIV opcode. It's a joke. BRA takes about 12 cycles. You can go down the line and cheer that you can do 16/32bit math faster but it doesn't matter. Your fancy 'programmer-friendly' addressing modes take up to 48 cycles. You NEED 8Mhz minimum just to feel fast.

You are getting hung up on CPU clock speed. What matters is memory cycles. 68000 Bcc takes 3 memory cycles when the branch is taken (2 when not) and has a range of +- 32k Bytes. The 6502 takes 4 cycles if the destination isn't on the same page, and 4-7 cycles if the distance is greater than +-128 bytes (Bcc + Bcc or JMP).

No, actually I think you and other members are the ones getting hung up on clock speed.
There were ways around that. For instance, the C128 offered an MMU that let you relocate the zero page and the stack page, which lets you use the faster addressing modes. Eventually these features made it into extra registers, like the 65CE02's Z register addition.
Here's an example on the C128:
https://www.youtube.com/watch?v=u-ae8ZFZwaI

Quote:

Quote:
Smarter engineers than us/you/all-of-us have already done the analysis. On average a 1Mhz 6502 is generally equal to a 2.47Mhz 68000...add 20% when using a 65C02...add another 25% when comparing to a 65CE02. Deny reality all you want.

Once again, CPU clock speed doesn't matter. The fastest 6502 you could get in 1983 (when the Amiga was designed) was 3MHz. 3 x 2.47 = 7.41, which makes a 12.5 MHz 68000 (introduced in 1982) roughly 70% faster even by your simplistic measure. But the Amiga used an 8 MHz 68000 throttled back to 7.09 MHz to match the memory cycle time, which was set by the video system.

Smarter engineers chose the 68000 because the 6502 wasn't up to the task. Amiga OS 2+ comes on a 512kB ROM. Let's be generous and assume the 6502 would only need half the space. That still means a massive amount of bank switching just to execute the OS code. Add 1MB of RAM and you see the problem. Your poor little 6502 with its pathetic 256 byte zero page and tiny 256 byte stack would be chasing its tail all day long just trying to boot up!

Show me the Amiga that launched with a 12Mhz 68000. Only the Megadrive/Genesis CD add-on gave you that...and in the 90's. In 1984 the 65C02 and 65816, which are 20% more efficient than a 6502, were doing 4Mhz. A cheap MMU like in the C128 is all it takes to address stack and memory limitations.
The 65816 can address 1MB without an MMU, the A1000 launched with 256k. A lot of these arguments for the time are a moot point. They made the decisions they made because that's what they wanted, pre-Commodore.

Quote:

Quote:
You can cherry pick tasks all you want, but it's still reality. This is why ARM won. 68K was inefficient. When it got efficient (040/060), it was too late and too expensive. ARM was superior...is superior.

I have an Acorn Archimedes A3000. I haven't switched it on in over a year. Why? Because the Amiga won. In the real world ARM stunk.

Again, are we discussing the cpu or the platform? Your Arch A3000 may have stunk to you due to lack of software support but in what I saw, it had great performance on what it offered.
The ARM2 cpu was doing 8Mhz in 1986 and had an IPC of .5 (4 MIPS @ 8Mhz) which outperforms all Amigas until the Amiga 3000 out of the box. (Again - I ignore accelerators.) ARM3 was doing 25Mhz in 1989.

So again, I re-iterate: an Amiga launch with a 4Mhz 65816 would have eventually led to a move to ARM instead of the delayed over-priced and under-performing and late to the game Motorola poop-show followed by an attempt to move to the failed PPC line.

Last edited by Lou on 21-Aug-2024 at 04:04 PM.

Lou 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 21-Aug-2024 15:49:29
[ #384 ]
Elite Member
Joined: 2-Nov-2004
Posts: 4229
From: Rhode Island

@cdimauro

Quote:

cdimauro wrote:
@Lou

Quote:

That makes sense as to why that moron started them.


Don't try to change the cards on the table: it's enough to sequentially read the comments to see who started insulting.

You're so childish that you aren't even able to take responsibility for your actions...

Still butt-hurt I see...

Quote:

Don't worry: the video that you've already shared was enough to see how much "fast" it was.

I only cared enough about the C900 to prove you wrong. No more.

Quote:

There's also a video which shows it in action. Were you ashamed to share it? Here it is: https://www.youtube.com/watch?v=iYFQZyK3xSo

A nice and slow... slideshow.

a demo of a slide show that is a slide show .... somehow I'm supposed to be offended?
You are not smart.

Quote:

That's the most that you can get, since from the code it's clearly visible why:
// copy bitmap (256x200=6400 bytes) from C128 RAM to VDC RAM
VDC_BlitBitmap:
[...]
loop: lda $0000,y
!: bit VDC_REG
bpl !-
sta VDC_DATA_REG
iny
bne loop
inc loop+2
inx
cpx #$19
bne loop
rts

The super slow copy operation from the CPU's RAM to the VDC's RAM.

That's for transferring ONE byte at a time, and before each transfer you need to check the VDC's status bit, otherwise you interfere with it.
In fact, you can only transfer data when it's NOT displaying something (i.e. only during the vertical or horizontal blank period).
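
As a rough way to put a number on that loop, here is a Python sketch of a best-case cycle estimate. It uses the standard NMOS 6502 instruction timings and assumes the VDC is always ready on the first poll, so it is a lower bound and not a measurement. The 1 MHz and 2 MHz clocks are the C128's two CPU speeds; everything the elided `[...]` part of the routine does is left out.

```python
# Best-case cycle estimate for the inner copy loop quoted above.
# Assumption: the VDC status bit is already set on the first poll,
# so the BPL never loops. Real transfers will be slower.
INNER_LOOP_CYCLES = {
    "lda abs,y": 4,         # base address is page aligned, no page-cross penalty
    "bit abs": 4,           # poll the VDC status register once
    "bpl (not taken)": 2,   # VDC ready, fall through
    "sta abs": 4,           # write the byte to the VDC data register
    "iny": 2,
    "bne (taken)": 3,       # back to the top of the loop
}
BYTES = 0x19 * 256          # cpx #$19: 25 pages of 256 bytes = 6400 bytes

def best_case_ms(clock_mhz: float) -> float:
    """Lower-bound copy time in milliseconds at the given CPU clock."""
    cycles = BYTES * sum(INNER_LOOP_CYCLES.values())
    return cycles / (clock_mhz * 1000.0)

for mhz in (1.0, 2.0):
    print(f"{mhz:.0f} MHz: at least {best_case_ms(mhz):.1f} ms for {BYTES} bytes")
```

The per-page overhead (inc, inx, cpx, bne) is ignored because it only runs 25 times; the time spent waiting on the status bit is the part this estimate cannot capture.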

That's why you can do very little with the VDC and it's not suitable for games: its memory is too limited for storing both the screen and the graphics assets, so you need to use the CPU's memory for them, but only through this slow copy operation.

In short: USELESS CRAP.

Oh really where's your time study? Also - again, you display your complete and utter incompetence in all aspects of development.

Assets are copied to the gpu's memory so that they can be reused as necessary. This is how programming every gpu works...unless you are in a shared-memory environment like the VIC-II and Amiga. Once in memory, you're just manipulating registers to do block-copying, etc. to do animation.

When you turn on a C128, it literally dumps the 8k characterset ROM into the VDC's memory once regardless of what display mode you're in. That's done ONCE. Not every time it wants to display a character. How long did that take?

This is not rocket science. You're not smart.

Last edited by Lou on 21-Aug-2024 at 03:55 PM.

Karlos 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 21-Aug-2024 20:56:09
[ #385 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4843
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@matthey

Quote:
The 68k takes 7 instructions to add two 32-bit numbers in memory a byte at a time


But the point is you wouldn't do that, you'd just use a single 32-bit sized add. Contrast that with the 6502, where you have to clear the carry and then add 4 bytes one at a time (which may also need additional data movement operations) to achieve the same result. So while a 6502 may be able to perform some simple operations in fewer cycles than the 68000, the 68000 can do significantly more in fewer instructions.
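
To make the contrast concrete, here is a small Python sketch that models both approaches. It is only an illustration: the function names are made up, and the step count assumes the plain memory-to-memory case of one CLC followed by LDA/ADC/STA for each of the four bytes.

```python
# Sketch: byte-at-a-time 32-bit add (6502 style) vs. one 32-bit add (68000 style).
def add32_bytewise(a: bytes, b: bytes) -> tuple[bytes, int]:
    """Add two 4-byte little-endian values with an 8-bit accumulator.
    Returns the result and the number of instructions modelled."""
    carry = 0                       # CLC
    steps = 1
    out = bytearray(4)
    for i in range(4):              # LDA a+i / ADC b+i / STA out+i
        total = a[i] + b[i] + carry
        out[i] = total & 0xFF
        carry = total >> 8          # carry ripples into the next byte
        steps += 3
    return bytes(out), steps

def add32_single(a: int, b: int) -> int:
    """One 32-bit add with wraparound, as a single long-sized add would do."""
    return (a + b) & 0xFFFFFFFF

x, y = 0x12345678, 0x0FEDCBA9
result, steps = add32_bytewise(x.to_bytes(4, "little"), y.to_bytes(4, "little"))
print(hex(int.from_bytes(result, "little")), "in", steps, "modelled instructions")
print(hex(add32_single(x, y)), "in 1 modelled instruction")
```

Both paths give the same sum; the difference is purely in how many instructions it takes to get there.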

_________________
Doing stupid things for fun...

cdimauro 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 21-Aug-2024 21:07:08
[ #386 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4127
From: Germany

@Lou

Quote:

Lou wrote:
@bhabbott

Quote:

bhabbott wrote:
@Lou

In a typical home computer you were even more limited by the need to prioritize video DMA. On many early systems the CPU could not access display RAM at all during the active display time, either because it was locked out (eg. RCA COSMAC, ZX81) or the display would be corrupted (IBM CGA). However the 6800, 6502, 6809 and 68000 have a handy feature where the first half of the memory cycle is not used by the CPU so video DMA can be interleaved with it.

Other CPUs like the Z80 didn't have that ability, so other techniques were used. The ZX Spectrum stretched the CPU clock until the memory cycle lined up. The Amstrad CPC added a wait state to make all memory cycles take 4 clocks, lowering the effective CPU speed from 4MHz to 3MHz. Machines using the TMS9918 VDP had to access video RAM via the VDP's I/O port, with a programmed delay between each access.

In the real world the 6502 didn't crush the 68000, which had the advantages of a 16 bit data bus, a 16 MB flat addressing range and a large number of internal 32 bit registers. And it had a direct path to full 32 bit.

As you say, benchmarks like Byte Sieve are pretty worthless for comparing real-world performance. A 6502 might be good for small programs written in highly optimized assembler, but was a poor fit to compilers of the day. It would perform poorly in the Amiga because it didn't have the required addressing range or the ability to multitask large programs efficiently. That's why Jay Miner chose the 68000.

Again, I'm not attacking 'Amiga' per se. I'm attacking the choice of cpu. A short-run decision gimped the platform in the long run. Perhaps he was blinded by 'marketing' at the time.

But you can easily change it with your time machine, right?
Quote:
Quote:

In the real world a general purpose computer doesn't use LUTs instead of proper math because the memory usage is onerous.

You are getting hung up on CPU clock speed. What matters is memory cycles. 68000 Bcc takes 3 memory cycles when the branch is taken (2 when not) and has a range of +- 32k Bytes. The 6502 takes 4 cycles if the destination isn't on the same page, and 4-7 cycles if the distance is greater than +-128 bytes (Bcc + Bcc or JMP).

No, actually I think you and other members are the ones getting hung up on clock speed.

Actually you were the only one talking about (hypothetical) clocks. And about the crappy IPC/MIPS measure.

Whereas you never think about combining them. IPC * clock = ? I leave it to you as homework.
Quote:
There were ways around that. For instance, the C128 offered an MMU that let you relocate the zero page and the stack page, which lets you use the faster addressing modes. Eventually these features made it into extra registers, like the 65CE02's Z register addition.
Here's an example on the C128:
https://www.youtube.com/watch?v=u-ae8ZFZwaI

Then show me a recursive Fibonacci implementation using it. And then measure it.

I'm preparing more popcorn in the meantime...
Quote:
Quote:

Once again, CPU clock speed doesn't matter. The fastest 6502 you could get in 1983 (when the Amiga was designed) was 3MHz. 3 x 2.47 = 7.41, which makes a 12.5 MHz 68000 (introduced in 1982) roughly 70% faster even by your simplistic measure. But the Amiga used an 8 MHz 68000 throttled back to 7.09 MHz to match the memory cycle time, which was set by the video system.

Smarter engineers chose the 68000 because the 6502 wasn't up to the task. Amiga OS 2+ comes on a 512kB ROM. Let's be generous and assume the 6502 would only need half the space. That still means a massive amount of bank switching just to execute the OS code. Add 1MB of RAM and you see the problem. Your poor little 6502 with its pathetic 256 byte zero page and tiny 256 byte stack would be chasing its tail all day long just trying to boot up!

Show me the Amiga that launched with a 12Mhz 68000. Only the Megadrive/Genesis CD add-on gave you that...and in the 90's.

Not needed: even a 7Mhz 68000 completely crushes the 65xx CRAProcessors when running any Amiga software, OS included.
Quote:
In 1984 the 65C02 and 65816, which are 20% more efficient than a 6502, were doing 4Mhz.

Please, tell me where I can buy a time machine like yours:
https://en.wikipedia.org/wiki/WDC_65C816
Launched: 1985

Plus: https://en.wikipedia.org/wiki/MOS_Technology_6502
Max. CPU clock rate 1 MHz to 3 MHz
Quote:
A cheap MMU like in the C128 is all it takes to address stack and memory limitations.

Sure. Then it should be very easy to use it and implement the above exercise, right?
Quote:
The 65816 can address 1MB without an MMU,

Actually it's 16MB.
Quote:
the A1000 launched with 256k. A lot of these arguments for the time are a moot point. They made the decisions they made because that's what they wanted, pre-Commodore.

And because they had no time machine available...
Quote:

Quote:

I have an Acorn Archimedes A3000. I haven't switched it on in over a year. Why? Because the Amiga won. In the real world ARM stunk.

Again, are we discussing the cpu or the platform?

It depends: usually you compare entire platforms when the discussion was only about CPUs, to show some advantage for your favourite CRAPprocessors.

Now you do the opposite.

So, as per YOUR CONVENIENCE...
Quote:
Your Arch A3000 may have stunk to you due to lack of software support but in what I saw, it had great performance on what it offered.
The ARM2 cpu was doing 8Mhz in 1986 and had an IPC of .5 (4 MIPS @ 8Mhz) which outperforms all Amigas until the Amiga 3000 out of the box. (Again - I ignore accelerators.)

As per above: only because of YOUR CONVENIENCE.
Quote:
ARM3 was doing 25Mhz in 1989.

Intel's 80486 did the same.
Quote:
So again, I re-iterate: an Amiga launch with a 4Mhz 65816 would have eventually led to a move to ARM instead of the delayed over-priced and under-performing and late to the game Motorola poop-show followed by an attempt to move to the failed PPC line.

I implore you: where can I get your time machine?
Quote:

Lou wrote:
@cdimauro

Quote:

cdimauro wrote:
@Lou

Don't try to change the cards on the table: it's enough to sequentially read the comments to see who started insulting.

You're so childish that you aren't even able to take responsibility for your actions...

Still butt-hurt I see...


Quote:
Quote:

Don't worry: the video that you've already shared was enough to see how much "fast" it was.

I only cared enough about the C900 to prove you wrong. No more.

Ah, do you mean with the other slideshow that you've shown about it?
Quote:
Quote:

There's also a video which shows it in action. Were you ashamed to share it? Here it is: https://www.youtube.com/watch?v=iYFQZyK3xSo

A nice and slow... slideshow.

a demo of a slide show that is a slide show .... somehow I'm supposed to be offended?
You are not smart.

Well, a smart person would not have provided the source of more big laughs.
Quote:
Quote:

That's the most that you can get, since from the code it's clearly visible why:
// copy bitmap (256x200=6400 bytes) from C128 RAM to VDC RAM
VDC_BlitBitmap:
[...]
loop: lda $0000,y
!: bit VDC_REG
bpl !-
sta VDC_DATA_REG
iny
bne loop
inc loop+2
inx
cpx #$19
bne loop
rts

The super slow copy operation from the CPU's RAM to the VDC's RAM.

That's for transferring ONE byte at a time, and before each transfer you need to check the VDC's status bit, otherwise you interfere with it.
In fact, you can only transfer data when it's NOT displaying something (i.e. only during the vertical or horizontal blank period).

That's why you can do very little with the VDC and it's not suitable for games: its memory is too limited for storing both the screen and the graphics assets, so you need to use the CPU's memory for them, but only through this slow copy operation.

In short: USELESS CRAP.

Oh really where's your time study?

I already reported it in a previous comment. Goldfish syndrome?
Quote:
Also - again, you display your complete and utter incompetence in all aspects of development.

Well, actually I'm the only one who has shown something about development.

You claimed to have written software for the C128, but you have provided neither a single example nor an implementation of the simple exercise which I gave you.

Guess why: you have no clue at all about software development.
Quote:
Assets are copied to the gpu's memory so that they can be reused as necessary. This is how programming every gpu works...unless you are in a shared-memory environment like the VIC-II and Amiga. Once in memory, you're just manipulating registers to do block-copying, etc. to do animation.

When you turn on a C128, it literally dumps the 8k characterset ROM into the VDC's memory once regardless of what display mode you're in. That's done ONCE. Not every time it wants to display a character. How long did that take?

This is not rocket science. You're not smart.

Sure. And how much memory does the VDC have? 16kB in TOTAL.

Let me do some basic math. 8kB are wasted by this character table. 16 - 8 = 8kB (great math operation!) are left.

Now let's assume that we have a 320x200 screen, which takes 8000 bytes. So we have 8kB = 8192 - 8000 = 192 bytes left.

Basically the only thing that you can do is draw characters from this table to the screen. And only that!
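
The same budget as a few lines of Python, in case anyone wants to plug in other screen sizes. This is only a sketch of the arithmetic above: it assumes the 16kB VDC RAM and 8kB character set figures from this post and a 1 bit per pixel bitmap, and the names are made up for illustration.

```python
# VDC memory budget, using the figures from the post above.
VDC_RAM_BYTES = 16 * 1024      # total VDC memory assumed here
CHARSET_BYTES = 8 * 1024       # character set copied in at power-on

def bitmap_bytes(width: int, height: int, bits_per_pixel: int = 1) -> int:
    """Bytes needed for a packed bitmap of the given size."""
    return width * height * bits_per_pixel // 8

def free_after(width: int, height: int) -> int:
    """Bytes left for assets once the charset and one bitmap are resident."""
    return VDC_RAM_BYTES - CHARSET_BYTES - bitmap_bytes(width, height)

print("320x200 bitmap:", bitmap_bytes(320, 200), "bytes")
print("left for assets:", free_after(320, 200), "bytes")
```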

Now let's take a look at what I was able to achieve with USA Racing (my car racing game) on a 1MB Amiga. It used 640 32x32 tiles with 32 colours = 480kB of memory only for that. The virtual screen for the player was 8192x65536 pixels in size, and it was scrolled in all directions at a maximum speed of 800 pixels/s (in each direction) at a rock solid 50FPS.

Ah, the "slow" 68000 was doing a big part there, by moving such tiles from the Slow/Fast mem to the Chip mem (since 480kB of graphics for the tiles alone cannot all stay in Chip mem).

I transferred the same technique to Fightin' Spirit, and that's the reason why it was able to move so much graphics data for the big characters on the screen at 25FPS.

Do the same with your crappy VDC, even using a REU.

matthey 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 21-Aug-2024 22:49:43
[ #387 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2452
From: Kansas

Lou Quote:

Again, I'm not attacking 'Amiga' per se. I'm attacking the choice of cpu. A short-run decision gimped the platform in the long run. Perhaps he was blinded by 'marketing' at the time.


Your statements defy intellect and history. The 68000 was one of the best performing MPUs in the early 1980s and was likely the best performing commodity MPU when introduced in 1979. The performance was just one advantage though. Jay Miner likely wanted and chose the 68000 because of its features, one of the most important being a large flat address space. This feature alone was evolutionary and futuristic for an MPU in 1979. It was as evolutionary as the introduction of the low priced 6502, yet you blatantly ignore it even in hindsight. Jay Miner recognized it quickly, before there were hardware implementations, and correctly predicted that the price would soon drop enough to use it and that engineering of a chipset should be started soon to take advantage of the opportunity.

Lou Quote:

Again, are we discussing the cpu or the platform? Your Arch A3000 may have stunk to you due to lack of software support but in what I saw, it had great performance on what it offered.
The ARM2 cpu was doing 8Mhz in 1986 and had an IPC of .5 (4 MIPS @ 8Mhz) which outperforms all Amigas until the Amiga 3000 out of the box. (Again - I ignore accelerators.) ARM3 was doing 25Mhz in 1989.

So again, I re-iterate: an Amiga launch with a 4Mhz 65816 would have eventually led to a move to ARM instead of the delayed over-priced and under-performing and late to the game Motorola poop-show followed by an attempt to move to the failed PPC line.


You can argue that the Amiga should have had a 65816 which would have been cheaper than the 68000 and MOS/CSG could have potentially produced it, not that Amiga Corporation knew CBM would buy them. The 68000 was the better choice then and now though. The 68k Amiga would not be nearly as popular today without the large flat address space. The AmigaOS would be much more limited and less dynamic than it is. The 68k Amiga completely avoided the world of various kludge memory bank switching techniques on the 65816, 808x/x86, Z architectures, etc. It was the 80386 with large flat address space which allowed x86 to quickly catch up, despite baggage, 5 years after the 68000 was introduced. RISC architectures introduced in the mid-1980s often had large flat address spaces too although some had ISA flaws like ARM. Early Acorn RISC OS software is incompatible with later ARM CPUs because early ARM CPUs were not 32 bit clean (only supported 26 bits of addressing internally). Most Amiga software developed for the 16-bit 1979 68000 will run on a 32-bit 1994 68060. Code written for the 68000 will have better performance on the 68060 than 808x/186/286 code on a Pentium where compatibility is more challenging too. Motorola/Freescale certainly deserves some criticism for mistakes made with the 68020 ISA but the 68000 ISA was so good that it made this possible.

If the 65816 was so good, there wouldn't be a need to move to another architecture. I'm not so sure RISC is the natural replacement for the 6502 accumulator architecture. ARM introduced performance handicaps which the 6502 did not have because of RISC.


opa mem ; operation of mem to accumulator (3-4 cycles for zero page access)


RISC separates memory accesses giving 2 instructions.


ldr r4,mem ; load mem to register (3 cycles)
op r5,r5,r4 ; reg to reg operation (1 cycle)


ARM needs 2 instructions, 2 registers and 8 bytes of code to do the same work as the 6502 with 1 instruction, 1 register and less code. The 6502 minimum instruction cycles can be less too. Is this the worthy successor to the 6502?

That was the original 3-stage ARM RISC pipeline but a deeper pipeline usually does not improve RISC performance as much as CISC performance. Let's take a look at the 5-stage ARM pipeline (ARM9TDMI for example).


ldr r4,mem ; load mem to register (1 cycle)
load-to-use stall (2 cycles)
op r5,r5,r4 ; reg to reg operation (1 cycle)


Load is finally 1 cycle but there is a 2 cycle load-to-use stall if the next instruction touches the result. This is still an improvement as two independent instructions can be placed between if possible. A two way superscalar CPU would need 4 independent instructions between. The popular Cortex-A53 is two way superscalar and the 8-stage pipeline has a 3 cycle load-to-use penalty so it needs 6 independent instructions between the load and an operation using the load. Load-to-use stalls are the great RISC performance killer as I've posted about before. A simulation in one paper showed that 8 GP registers without load-to-use stalls had better performance than 32 GP registers with a 2 cycle load-to-use stall. CISC pipelines usually avoided load-to-use stalls while ARM only has 14-15 GP registers (the PC is not GP while the link register is debatable).
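
The counts in that paragraph follow from a simple rule of thumb, sketched here in Python. This is a simplified model (every issue slot during the stall needs an independent instruction) and the function name is made up for illustration; it is not taken from any ARM documentation.

```python
# Simplified model: to hide a load-to-use stall completely, every issue slot
# during the stalled cycles must be filled with an independent instruction.
def independent_instructions_needed(stall_cycles: int, issue_width: int) -> int:
    """Instructions to schedule between a load and the first use of its result."""
    return stall_cycles * issue_width

cases = {
    "5-stage scalar pipeline, 2-cycle stall": (2, 1),
    "2-way superscalar, 2-cycle stall": (2, 2),
    "2-way superscalar, 3-cycle stall (Cortex-A53 figures above)": (3, 2),
}
for name, (stall, width) in cases.items():
    print(f"{name}: {independent_instructions_needed(stall, width)} instructions")
```

The three cases reproduce the 2, 4 and 6 instruction figures given above; a compiler or programmer has to find that many independent instructions after every load to avoid the penalty.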

ARM Appendix Instruction Cycle Timings
https://gab.wallawalla.edu/~curt.nelson/cptr380/textbook/advanced%20material/Appendix_B3.pdf

The 6502 architecture is a simplified 6800 architecture. The 68000 has influences from the 6800 so is similar to the 6502 too. Many of the instruction mnemonics are the same, like AND, ASL, Bcc, CMP, EOR, JMP, JSR, LSR, ROL, ROR, RTS, and operate in a similar way with often more powerful options. Instead of only one accumulator and 2 index registers, the 68k is like having 8 accumulators (data registers) and 8 index registers (address registers). The code closely resembles 6502 code, although a size should be specified and a register has to be specified.


op.b mem,reg ; operation of mem to register (byte size)


There is a single instruction now able to access multiple sized datatypes in a large flat address space using only a single register and without load-to-use stalls. The 68k needs a 7-8 stage pipeline and more hardware resources to make this efficient but the 68060 can execute cached operations like this in a single cycle including most addressing modes with no penalty. At the same time, it can execute another integer instruction in parallel. In my opinion, the 68k is closer to an upgraded 6800 or 6502 than RISC ISAs. It was also heavily influenced by the PDP-11.

In case you missed my post in the other thread...

http://bitsavers.trailing-edge.com/components/zilog/z80000/Z80000_CPU_Preliminary_Technical_Manual_Sep84.pdf

P.S.
Lou and Hammer should read pages E-19 and E-20 of the Z80000 manual linked above. They show the proper way to calculate millions of instructions per second (VAX MIPS), which includes not just the average instruction latency but also the average pipeline delay, average addressing delay and average memory delay. The Z80000 averages 1.8 execution cycles per instruction but the average can increase to 2.5 to 4.0 cycles per instruction depending on the memory used. This is with 16 GP registers and a fully associative 256B unified cache with a 62% to 88% hit ratio. A 6502 with 3 registers and no cache is likely to have a hard time with the memory delay no matter how optimized the memory accesses are.
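To make the point above concrete, here is a rough sketch of how the extra delays eat into the instruction rate. It is not the manual's formula, only the general idea that the effective cycles per instruction are the execution cycles plus the average delays. The function names and the 10 MHz clock are assumptions for illustration, and the result is native MIPS, not VAX MIPS.

```python
def effective_cpi(execution_cycles: float, *average_delays: float) -> float:
    """Average cycles per instruction once the delays are added.

    `average_delays` stands for the pipeline, addressing and memory
    delays mentioned in the post; how they are split is not modelled.
    """
    return execution_cycles + sum(average_delays)


def native_mips(clock_mhz: float, cpi: float) -> float:
    """Millions of native instructions per second at a given clock."""
    return clock_mhz / cpi


CLOCK_MHZ = 10.0  # hypothetical clock, chosen only to keep the numbers simple

# 1.8 is the execution figure from the post; 0.7 is just the gap to the
# best-case 2.5 total quoted there, lumped into one delay term.
best_case_cpi = effective_cpi(1.8, 0.7)
print(native_mips(CLOCK_MHZ, 1.8))            # execution cycles only
print(native_mips(CLOCK_MHZ, best_case_cpi))  # with delays, best case
print(native_mips(CLOCK_MHZ, 4.0))            # worst-case total from the post
```

The gap between the first and last line is the reason peak instruction timings alone say little about real throughput.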

Last edited by matthey on 21-Aug-2024 at 11:14 PM.

Hammer 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 22-Aug-2024 7:19:50
#388 ]
Elite Member
Joined: 9-Mar-2003
Posts: 6168
From: Australia

@Lou

Quote:

Again, are we discussing the cpu or the platform? Your Arch A3000 may have stunk to you due to lack of software support but in what I saw, it had great performance on what it offered.
The ARM2 cpu was doing 8Mhz in 1986 and had an IPC of .5 (4 MIPS @ 8Mhz) which outperforms all Amigas until the Amiga 3000 out of the box. (Again - I ignore accelerators.) ARM3 was doing 25Mhz in 1989.

So again, I re-iterate: an Amiga launch with a 4Mhz 65816 would have eventually led to a move to ARM instead of the delayed over-priced and under-performing and late to the game Motorola poop-show followed by an attempt to move to the failed PPC line.

The A2500/030 offered a 68030 "out of the box" experience from 1989. The A3000 wasn't the first Amiga with a 25 MHz 68030 "out of the box".

The cheapo ARM60 in the 3DO lacks a fast MUL instruction, hence the PIO-driven matrix math co-processor in MADAM is needed. ARMv3 and the ARM60 don't guarantee a fast MUL instruction.

For 3D, other game console platform vendors like Sega or Sony weren't stupid enough to select ARM60.

https://3dodev.com/_media/documentation/hardware/arm60_datasheet_-_gec_plessey_semiconductors.pdf
Page 49. Using barrel shifter for multiplication operations.
Page 58, 2-bit Booth multiplication implementation with variable completion cycle times.
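As a generic illustration of multiplying with shifts instead of a fast MUL (this is the textbook shift-and-add idea, not code from the ARM60 datasheet), a constant multiply can be broken into one shifted add per set bit of the constant. The function name is hypothetical.

```python
def mul_by_shifts(x: int, constant: int) -> int:
    """Multiply x by a non-negative constant using only shifts and adds.

    Each set bit of `constant` contributes one shifted copy of x, which
    is the kind of work a barrel shifter can fold into an add.
    """
    if constant < 0:
        raise ValueError("sketch handles non-negative constants only")
    result = 0
    shift = 0
    while constant:
        if constant & 1:
            result += x << shift
        constant >>= 1
        shift += 1
    return result


# x * 10 becomes (x << 3) + (x << 1)
print(mul_by_shifts(7, 10))
```

The cost grows with the number of set bits in the constant, which is why this only pays off for small or sparse constants and does not replace a hardware multiplier for matrix math.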



_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Hammer 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 22-Aug-2024 7:28:38
#389 ]
Elite Member
Joined: 9-Mar-2003
Posts: 6168
From: Australia

@matthey

Quote:
Lou and Hammer should read the Z80000 manual linked above pages E-19 and E-20. The proper way to calculate millions of instructions per second (VAX MIPS) is shown which includes not just the average instruction latency but average pipeline delay, average addressing delay and average memory delay. The Z80000 average instruction execution cycles is 1.8 cycles but the average instruction cycles can increase to 2.5 to 4.0 cycles depending on the memory used. This is with 16 GP registers and a fully associative 256B unified cache with a 62% to 88% hit ratio. A 6502 with 3 registers and no cache is likely to have a hard time with the memory delay no matter how optimized the memory accesses are.

Note why I argued for Quake as the benchmark standard, e.g. it is resistant to the Pentium OverDrive's larger 32 KB L1 cache.

Look in the mirror with your 68060's 32-bit external bus.

Quote:

Load is finally 1 cycle but there is a 2 cycle load-to-use stall if the next instruction touches the result. This is still an improvement as two independent instructions can be placed between if possible. A two way superscalar CPU would need 4 independent instructions between. The popular Cortex-A53 is two way superscalar and the 8-stage pipeline has a 3 cycle load-to-use penalty so it needs 6 independent instructions between the load and an operation using the load. Load-to-use stalls are the great RISC performance killer as I've posted about before. A simulation in one paper showed that 8 GP registers without load-to-use stalls had better performance than 32 GP registers with a 2 cycle load-to-use stall. CISC pipelines usually avoided load-to-use stalls while ARM only has 14-15 GP registers (the PC is not GP while the link register is debatable).

PiStorm16 officially supports RPI CM4 which includes ARM Cortex A72.

Original PiStorm with RPi 4B works with Emu68.

Implied load-store's performance with CISC is dependent on the implementation e.g. Zen 2 vs Zen 3. Both Zen 2 and Zen 3 have similar ALU and FP/Vec units with significant differences in the load-store units.

Last edited by Hammer on 22-Aug-2024 at 08:02 AM.
Last edited by Hammer on 22-Aug-2024 at 07:34 AM.
Last edited by Hammer on 22-Aug-2024 at 07:30 AM.

Hammer 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 22-Aug-2024 9:09:47
#390 ]
Elite Member
Joined: 9-Mar-2003
Posts: 6168
From: Australia

@matthey

Quote:
You can argue that the Amiga should have had a 65816 which would have been cheaper than the 68000 and MOS/CSG could have potentially produced it, not that Amiga Corporation knew CBM would buy them. The 68000 was the better choice then and now though. The 68k Amiga would not be nearly as popular today without the large flat address space. The AmigaOS would be much more limited and less dynamic than it is. The 68k Amiga completely avoided the world of various kludge memory bank switching techniques on the 65816, 808x/x86, Z architectures, etc. It was the 80386 with large flat address space which allowed x86 to quickly catch up, despite baggage, 5 years after the 68000 was introduced. RISC architectures introduced in the mid-1980s often had large flat address spaces too although some had ISA flaws like ARM. Early Acorn RISC OS software is incompatible with later ARM CPUs because early ARM CPUs were not 32 bit clean (only supported 26 bits of addressing internally). Most Amiga software developed for the 16-bit 1979 68000 will run on a 32-bit 1994 68060. Code written for the 68000 will have better performance on the 68060 than 808x/186/286 code on a Pentium where compatibility is more challenging too. Motorola/Freescale certainly deserves some criticism for mistakes made with the 68020 ISA but the 68000 ISA was so good that it made this possible.

Using the upper address bits to store data is also a problem on the 68020, since the 68000/68010 have a 24-bit memory address range. Mac 68K also has "32-bit clean" issues.




matthey 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 22-Aug-2024 19:39:39
#391 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2452
From: Kansas

Hammer Quote:

Note why I argued for Quake benchmark standard e.g. resistant against Pentium Overdrive's larger 32KB L1 cache.

Look in the mirror with your 68060's 32-bit external bus.


This thread is about Doom where the 68060 likely has an advantage over the original Pentium due to the 68060 having better integer performance and more efficient caches. The Pentium 64-bit data bus and off-chip L2 cache support probably can't overcome the large performance deficit.

Quake is a software anomaly and was even a game anomaly back then. Most of the data was used once so it didn't do any good to cache it. Yes, a 64-bit data bus was likely an advantage. Also, the FPU was used more than in most software, especially games. The Pentium was not used in consoles because it was too expensive and too hot. The 68060 could have been used as it was lower power and the 32-bit data bus was more typical and practical for a console then. The 68060 FPU was on the weak side for playing Quake but it would have been acceptable if it had been clocked up like the 8-stage pipeline should have been. Most consoles at that time did not have FPUs but the 68060 FPU is certainly better than the alternative of converting Quake to use integer fixed point math. Games like Quake really needed 3D hardware acceleration so cheaper and lower power CPU hardware could be used. Quake on my 68060@75MHz with Voodoo4 512x384x16 averages about 25fps and looks good. This is without hardware T&L, with a bottlenecked Zorro III to PCI bus and with poor 68060 compiler support.

Hammer Quote:

PiStorm16 officially supports RPI CM4 which includes ARM Cortex A72.

Original PiStorm with RPi 4B works with Emu68.

Implied load-store's performance with CISC is dependent on the implementation e.g. Zen 2 vs Zen 3. Both Zen 2 and Zen 3 have similar ALU and FP/Vec units with significant differences in the load-store units.


ARM wastes a lot of power and generates a lot of heat with OoO to reduce load-to-use stalls and aggressive OoO may not be able to completely remove them according to the Zero-Cycle Loads paper simulations. These are not the simple small low power cores that ARM started with but rather huge OoO cores with many times the transistors and instructions of a 68060. ARM is approaching x86(-64) bloat levels in their pursuit of competitive performance. Expensive die shrinks solve most efficiency problems but it is not the way to improve price efficiency.

Hammer Quote:

Using a higher memory address to store data is also a problem for 68020 since 68000/68010 has a 24-bit memory address range. Mac 68K also has "32-bit clean" issues.


The Mac problem was purely a software problem and not a 68k problem. The ARM 26-bit problem is an ARM hardware problem.

https://en.wikipedia.org/wiki/26-bit_computing#Early_ARM_processors Quote:

Early ARM processors

In the ARM processor architecture, 26-bit refers to the design used in the original ARM processors where the Program Counter (PC) and Processor Status Register (PSR) were combined into one 32-bit register (R15), the status flags filling the high 6 bits and the Program Counter taking up the lower 26 bits.

In fact, because the program counter is always word-aligned the lowest two bits are always zero which allowed the designers to reuse these two bits to hold the processor's mode bits too. The four modes allowed were USR26, SVC26, IRQ26, FIQ26; contrast this with the 32 possible modes available when the program status was separated from the program counter in more recent ARM architectures.

This design enabled more efficient program execution, as the Program Counter and status flags could be saved and restored with a single operation. This resulted in faster subroutine calls and interrupt response than traditional designs, which would have to do two register loads or saves when calling or returning from a subroutine.

Despite having a 32-bit ALU and word-length, processors based on ARM architecture version 1 and 2 had only a 26-bit PC and address bus, and were consequently limited to 64 MiB of addressable memory. This was still a vast amount of memory at the time, but because of this limitation, architectures since have included various steps away from the original 26-bit design.

The ARM architecture version 3 introduced a 32-bit PC and separate PSR, as well as a 32-bit address bus, allowing 4 GiB of memory to be addressed. The change in the PC/PSR layout caused incompatibility with code written for previous architectures, so the processor also included a 26-bit compatibility mode which used the old PC/PSR combination. The processor could still address 4 GB in this mode, but could not execute anything above address 0x3FFFFFC (64 MB). This mode was used by RISC OS running on the Acorn Risc PC to utilise the new processors while retaining compatibility with existing software.

ARM architecture version 4 made the support of the 26-bit addressing modes optional, and ARM architecture version 5 onwards has removed them entirely.


It was a bad idea for the PC and status register to be in the orthogonal register file and to be writeable for branching which made the problem worse. Mistakes were made which is nothing another mode can't fix. There have been three ARM do over ISAs since then too. The baggage would almost be like x86(-64) except Intel keeps it while ARM removes their baggage and mistakes over time to try to compete with the CISC ISA that is inferior to the 68k. Well, Motorola removed hardware too even though they retained very good compatibility. At least the 68k doesn't need any modes to access the large flat 32-bit (4GiB) address space that even some 32-bit architectures had trouble getting right. Sometimes inferior technology wins and everyone is a loser.
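The combined PC/PSR layout described in the quote can be sketched as plain bit packing. This is a minimal model assuming the layout stated there: 6 flag bits on top, a word-aligned PC in the middle and 2 mode bits at the bottom. The names are hypothetical and the individual flag and mode encodings are deliberately not modelled.

```python
FLAG_SHIFT = 26         # status flags occupy bits 31..26
PC_MASK = 0x03FFFFFC    # word-aligned program counter, bits 25..2
MODE_MASK = 0x3         # processor mode, bits 1..0


def pack_r15(flags: int, pc: int, mode: int) -> int:
    """Combine flags, PC and mode into one 32-bit R15 value."""
    if pc & ~PC_MASK:
        raise ValueError("PC must be word aligned and below 64 MiB")
    return ((flags & 0x3F) << FLAG_SHIFT) | pc | (mode & MODE_MASK)


def unpack_r15(r15: int) -> tuple[int, int, int]:
    """Split a 32-bit R15 value back into (flags, pc, mode)."""
    return (r15 >> FLAG_SHIFT) & 0x3F, r15 & PC_MASK, r15 & MODE_MASK


r15 = pack_r15(flags=0b100000, pc=0x8000, mode=3)
print(hex(r15), unpack_r15(r15))
```

The highest PC that fits is 0x3FFFFFC, which is where the 64 MiB code limit in the quote comes from. Any address arithmetic done on R15 in software also touches the flag and mode bits, which is why old code breaks once the PC grows to 32 bits.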

Last edited by matthey on 22-Aug-2024 at 07:40 PM.

matthey 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 22-Aug-2024 22:28:37
#392 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2452
From: Kansas

Karlos Quote:

But the point is you wouldn't do that, you'd just use a single 32-bit sized add. Contrast to the 6502 where you have to clear and then add 4 bytes (which also may need additional data movement operations) achieve the same result. So while a 6502 may be able to perform some simple operations in fewer cycles than the 68000, the 68000 can do significantly more in fewer instructions.


Of course. It is interesting to compare taking away the advantage of larger datatype sizes though. The 68000 used 7 instructions but the 6502 needs many more. I believe the 6502 code should look something like the following code.


lda (x) ; load accumulator with least significant byte of 1st num
clc ; clear carry
adc (y) ; add with carry least significant byte of 2nd num
sta (x) ; store accumulator to 1st num

inx ; increment x if LE or decrement x if BE
iny ; increment y if LE or decrement y if BE
lda (x) ; load accumulator with next byte of 1st num
adc (y) ; add with carry next byte of 2nd num
sta (x) ; store accumulator to 1st num

inx ; increment x if LE or decrement x if BE
iny ; increment y if LE or decrement y if BE
lda (x) ; load accumulator with next byte of 1st num
adc (y) ; add with carry next byte of 2nd num
sta (x) ; store accumulator to 1st num

inx ; increment x if LE or decrement x if BE
iny ; increment y if LE or decrement y if BE
lda (x) ; load accumulator with most significant byte of 1st num
adc (y) ; add with carry most significant byte of 2nd num
sta (x) ; store accumulator to 1st num


I haven't coded 6502 assembly in decades and I was a novice then. Maybe our 6502 expert Lou can point out any mistakes and count the 6502 code size in bytes.

CPU | instructions | code size
6502 | 19 | 31?
68000 | 7 | 16
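For anyone who wants to check the carry chain without a 6502 at hand, the byte-at-a-time add above can be modelled in a few lines of Python. This is only a sketch of the arithmetic (binary mode, no decimal mode), not an emulator, and the helper names are made up for illustration.

```python
def adc(a: int, operand: int, carry: int) -> tuple[int, int]:
    """Model an 8-bit add with carry: returns (result byte, carry out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def add32_bytewise(x: int, y: int) -> tuple[int, int]:
    """Add two 32-bit values one byte at a time, least significant first."""
    xb = x.to_bytes(4, "little")
    yb = y.to_bytes(4, "little")
    carry = 0  # the single clc before the first adc
    out = bytearray()
    for xi, yi in zip(xb, yb):
        byte, carry = adc(xi, yi, carry)  # carry ripples into the next byte
        out.append(byte)
    return int.from_bytes(out, "little"), carry


print(hex(add32_bytewise(0x0001FFFF, 0x00000001)[0]))
print(add32_bytewise(0xFFFFFFFF, 0x00000001))
```

The loop body runs four times, matching the four lda/adc/sta groups in the listing, while the 68000 does the same work with one 32-bit add.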

The 68000 vs 6502 match up is like a race between the Tortoise and the Hare. The 6502 is quicker to execute simple instructions but is it enough to beat the slower but more powerful 68000?

https://en.wikipedia.org/wiki/The_Tortoise_and_the_Hare

The moral of the story is that arrogance can lose the race. I wonder if Lou knows the story.

Last edited by matthey on 22-Aug-2024 at 10:37 PM.

Hammer 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 23-Aug-2024 2:20:51
#393 ]
Elite Member
Joined: 9-Mar-2003
Posts: 6168
From: Australia

@matthey

Quote:

This thread is about Doom where the 68060 likely has an advantage over the original Pentium due to the 68060 having better integer performance and more efficient caches. The Pentium 64-bit data bus and off-chip L2 cache support probably can't overcome the large performance deficit.

The 68060 is limited to fetching four bytes per cycle from the L1 instruction cache, a limit which SysInfo easily exceeds. For example, two ADD.L $xxxx instructions would trip over the 4-byte (32-bit) fetch limitation.

AC68080's 16-byte (128-bit) fetch per cycle from the L1 instruction cache fixes this 68060's design flaw.

Pentium has a 32-byte (256-bit) fetch per cycle from the L1 cache.
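The fetch argument can be put into rough numbers. The sketch below assumes a very simplified model where sustained throughput is capped by both the issue width and the bytes fetched per cycle, and it ignores any prefetch buffering that can hide short bursts. The fetch widths are the ones claimed in this post, the issue width of 2 is an assumption, and the function name is hypothetical.

```python
def fetch_bound_ipc(fetch_bytes_per_cycle: int,
                    avg_instruction_bytes: float,
                    issue_width: int) -> float:
    """Upper bound on sustained instructions per cycle.

    The core cannot retire more instructions than it can issue, nor
    more than the instruction fetch can deliver on average.
    """
    return min(float(issue_width), fetch_bytes_per_cycle / avg_instruction_bytes)


# ADD.L with a 16-bit absolute address: opcode word + address word = 4 bytes
ADD_L_ABS_SHORT_BYTES = 4

print(fetch_bound_ipc(4, ADD_L_ABS_SHORT_BYTES, 2))   # 4 bytes/cycle fetch
print(fetch_bound_ipc(16, ADD_L_ABS_SHORT_BYTES, 2))  # 16 bytes/cycle fetch
print(fetch_bound_ipc(4, 2, 2))                       # 2-byte instructions
```

With 2-byte instructions the narrow fetch is not the limit, which is why the effect depends heavily on the instruction mix being benchmarked.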

By design, Motorola crippled the 68060 relative to the PowerPC 601's 256-bit fetch from the L1 cache. The 68060 wouldn't be able to compete given Motorola's sabotage. The 68K had no competent "AMD"-style clone licensee as insurance to enhance the original 68060 design against the wishes of the original CPU vendor, i.e. Apollo-Core didn't exist in the 1990s.

For the X86 PC platform, AMD's X86-64 shows IBM's enforced "second source" insurance works when Intel wanted to execute the "PowerPC" job (i.e. Itanium) on the X86 PC platform. Intel caught the VLIW hype from HP's PA-WideWord.

------

When Motorola released the 68060 (600 nm) at 50 MHz in April 1994, Intel had already moved on to its second generation P5, i.e. the P54C (600 nm).

In 1993, Pentium 60/66's 800 nm competed against 68040's 650 nm. 68040 reached 40Mhz in 1993.

Pentium 90 and 100 models were released in March 1994 and Pentium 75 was released in October 1994.

Multiple PC vendors have quicker motherboard releases with Socket 5 P54C with Intel offering completed Socket 5 PC reference motherboard designs. During the PCI era, Intel performs most of the motherboard engineering for "cut-and-paste" PC vendors.

Amiga's 68060 experience started in 1995 via Phase 5 and Quikpak 060 card. For A4000T/040, Commodore's A3640 v3.2 card was replaced by Quikpak 060 card. Phase 5's 060 is bound by A1200/A2000/A3000/A4000's install base which is less than 1 million. A1200 expansion edge connector adapter for 166,000 CD32 units didn't exist from 1994 to 1996.

Phase 5 couldn't expand the AGA install base with their 060 cards since the Amiga wasn't a clone platform.

If AGA sales numbers are treated as worldwide
44,000 (A1200, the UK has 30,000 during its launch),
100,000 (A1200, AF50, Sep 1993),
170,000 (A1200, AF56, Feb 1994),
166,000 (CD32, Commodore US president, Jan 1994),
7,500 (Germany's A4000/030),
3,800 (Germany's A4000/040),
Total: 491,300 AGA units.

For Amitech's A2200-1/A2200-2 clone, Commodore Canada's 65,000 CD32 orders are locked up in the Philippines warehouse.

Commodore Germany sold all their 25,000 CD32 allocation.
Commodore UK sold their 75,000 CD32 allocation. Commodore UK is complaining about A1200 and CD32 production issues.

Escom era has an additional 20,000 A1200s.

To fix Commodore's financial problem, 400,000 CD32s would be needed (cite: Commodore - The Final Years). You're looking at the 725,000 AGA units target to sustain Commodore.

Commodore European operations couldn't sell A1200/CD32 into A500's sales boom level when Commodore International couldn't fund enough A1200/CD32 production units.

When Commodore International went bust, the Amiga lost the "economies of scale" enabler.

The A600 shouldn't have been released, i.e. the A600's manufacturing funds should have gone to AGA machines.

Jeff Porter's original intent for CD32 is integrated CL-450 SoC ($50, includes a custom MIPS-X @ 40Mhz CPU-DSP) with "A1200" (AGA/EC020/+Akiko C2P) and 8 MB RAM (extra $20). Mehdi Ali's management team has reduced this design to barebone CD32. Jeff Porter has to justify every component in CD32.

Motorola doesn't have a cheap 68EC040 in the CL-450 SoC's $50 range.
Motorola doesn't have a cheap 68EC060 in the CL-450 SoC's $50 range.

As per Motorola's pricing policy, your pro-68060 argument doesn't fit Commodore's "economies of scale" model.

A1000 Plus's $800 price target could have tolerated 68EC040-25 or 68LC040-25.

Commodore Germany's PC clone production capability would have been used for the "A1000 Plus", but it was shut down in 1993 due to financial and inventory mismanagement.

------

With Doom or software renderer workload, memory bandwidth is a major factor.

With PC100 SDR missing in action between 1993 and 1996, the Pentium has a memory bandwidth advantage.

Using the CPU as a software render device has a streaming compute behavior, i.e. it acts like a GPGPU device.

In 1995, Doom was displaced by other games such as Star Wars Dark Forces.

Quote:

ARM wastes a lot of power and generates a lot of heat with OoO to reduce load-to-use stalls

Reminder, A1200/CD32 are desktop platforms.

In real life, stock RPi 4B @ 1.8 Ghz or CM4 are within TDP specs when coupled with PiStorm and A1200 or A500. Installing RPi active cooling is for +2 Ghz overclocking e.g. my A1200's PiStorm32-CM4 has successful 2.2 Ghz overclocking. I could install a heatpipe linked with A1200's metal shield for passive cooling.

My PiStorm32's CM4 adapter is purpose-designed with a fan header and A1200's breakout end-user I/O panel.

68000-based DragonBall VZ was pushed out of the smartphone market since ARMv4T.

I also installed extra cooling for TF1260's 68060 rev1 overclocking attempts.

Quote:

and aggressive OoO may not be able to completely remove them according to the Zero-Cycle Loads paper simulations. These are not the simple small low power cores that ARM started with but rather huge OoO cores with many times the transistors and instructions of a 68060. ARM is approaching x86(-64) bloat levels in their pursuit of competitive performance. Expensive die shrinks solve most efficiency problems but it is not the way to improve price efficiency.

At 1 GHz, 68060's front-end design has weaker handling for higher latency DDR3 memory.

Warp1260's 68060 configuration is coupled with an external 64 KB L2 cache for its DDR3. Warp 1260's extra bandwidth from DDR3 benefits its RTG.

TF1260 has 100Mhz SDR.

CPUs are designed with the intended memory design.

68060's 32-bit 68K ISA is frozen in time with no support for 32x32=64, general computing 64-bit, FMA, Tensor (pack math resolves to 32-bit result), SIMD/pack math instruction set, and gather-load/scatter-store.

68060's TLB caches are small by modern standards.
68060's 8 data registers for FPU are dated by modern standards.

N64's geometry processor is a cut-down MIPS R4000-based CPU (a subset of MIPS-III) with a 128-bit vector unit with 32 128-bit registers, e.g. pack math operates on eight 16-bit values per 128-bit register in one instruction, whereas the scalar 68060 would perform 8 separate math operations.
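What "pack math" means can be shown with a small model of a 128-bit register holding eight 16-bit lanes. This is only a sketch of the lane behaviour with wraparound adds, written with hypothetical helper names. In hardware the whole add is one instruction, while the Python loop below is effectively what a scalar CPU has to do.

```python
LANES = 8
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def pack_lanes(values: list[int]) -> int:
    """Pack eight 16-bit values into one 128-bit integer, lane 0 lowest."""
    if len(values) != LANES:
        raise ValueError("expected exactly 8 lane values")
    reg = 0
    for i, v in enumerate(values):
        reg |= (v & LANE_MASK) << (i * LANE_BITS)
    return reg


def unpack_lanes(reg: int) -> list[int]:
    """Split a 128-bit integer back into its eight 16-bit lanes."""
    return [(reg >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add16(a: int, b: int) -> int:
    """Lane-wise 16-bit add; a carry never crosses into the next lane."""
    sums = [x + y for x, y in zip(unpack_lanes(a), unpack_lanes(b))]
    return pack_lanes(sums)


a = pack_lanes([1, 2, 3, 4, 5, 6, 7, 8])
b = pack_lanes([10] * 8)
print(unpack_lanes(packed_add16(a, b)))
```

Saturating adds, multiplies and the rest of a real vector instruction set are left out; the point is only that eight independent results come out of one register-wide operation.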

N64 followed PSX's example of two MIPS-based IPs for CPU and GTE.

For the game console job, Jeff Porter also selected a MIPS-based solution for the original intended CD32 to augment A1200's baseline.

I already told you that emulating PS3 CELL style 8-bit (byte) pack math 128-bit vector instructions with SSE2, which lacks an equivalent, is very slow, let alone with 68060's 32-bit scalar. Intel Core 2's SSSE3 is needed to match it. Intel's Core 2 release is timely.

SSSE3's pshufb instruction is invaluable for emulating CELL's shufb instruction.

https://whatcookie.github.io/posts/why-is-avx-512-useful-for-rpcs3/
Quote:

The performance when targeting SSE2 is absolutely terrible, likely due to the lack of the pshufb instruction from SSSE3. pshufb is invaluable for emulating the shufb instruction, and it’s also essential for byteswapping vectors, something that’s necessary since the PS3 is a big endian system, while x86 is little endian.

The SSE4.1 target achieves an average of 160 FPS, while the AVX2/FMA target achieves an average of 190 FPS. This is a 18% improvement over the SSE4.1 target. AVX2 doesn’t include many new instructions over SSE4.1, but it does include a new 3 operand form for instructions, which eliminates many register to register mov instructions. Crucially, all CPUs that support AVX2 also support FMA instructions. FMA instructions aren’t just faster than a chain of multiply + add instructions, but can also produce different results due to not rounding to single precision between the multiply and the add. Accurately emulating this without FMA instructions adds some overhead, and so native FMA operations help out quite a bit.

The Icelake tier AVX-512 target hits a ludicrous 235 FPS average, 23% faster than the AVX2/FMA target. The sheer number of new instructions added in AVX-512 is so large that quite a number of them end up being useful for RPCS3. Unlike AVX2 which was mostly a straightforward extension of existing SSE instructions to 256 bits, AVX-512 includes a huge number of new features which are very useful for SIMD programming, even at lower bit widths. However, since intel chose to market AVX-512 with the -512 moniker, people who aren’t familiar with the instruction set usually fixate on the 512 bit vector aspect of the instruction set.


https://whatcookie.github.io/Gow3Comparison.png
From left to right, SSE2 (4.83 fps), SSE4.1(165.74 fps), AVX2/FMA (187.36 fps), Icelake tier AVX-512 (241.97 fps) running PS3's God of War3 on Core i9 12900K @ 5.2GHz with AVX-512 enabled.
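For readers who haven't used pshufb, the byte shuffle mentioned in the quote above can be modelled in a few lines. This sketch follows the 128-bit SSSE3 behaviour as I understand it: each mask byte either selects a source byte by its low 4 bits or, if its top bit is set, produces zero. The names are hypothetical and this is not RPCS3 code.

```python
def pshufb(src: bytes, mask: bytes) -> bytes:
    """Model of a 128-bit byte shuffle.

    For each of the 16 mask bytes: if bit 7 is set the result byte is 0,
    otherwise the low 4 bits index into `src`.
    """
    if len(src) != 16 or len(mask) != 16:
        raise ValueError("both operands must be 16 bytes")
    return bytes(0 if m & 0x80 else src[m & 0x0F] for m in mask)


# Mask that reverses the bytes inside each 32-bit word, i.e. the
# big endian <-> little endian swap for four words at once.
BSWAP32_MASK = bytes([3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12])

vector = bytes(range(16))
print(list(pshufb(vector, BSWAP32_MASK)))
```

Without such an instruction the same swap needs a chain of shifts, masks and ORs per vector, which gives a feel for why the SSE2-only target in the quote does so badly.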

Good luck with the 3 Ghz 68060's scalar processing. LOL

AVX2 is standard for X86-64v3 level.

AVX-512 is standard for X86-64v4 level.

Intel's next ArrowLake-S enables AVX-512's 256-bit subset as part of AVX-10.

X86 CPUs can process integer workloads on FPU and vector units i.e. 32 registers with AVX10 and AVX512.

Since Pentium III, SSE has acted like another CPU within a CPU due to scalar integer processing capability.
--------------

For NVIDIA's SoundStorm, NVIDIA's main complaint about Motorola/Freescale's DSP56300 is cost. LOL. Without Microsoft's Xbox subsidy, NVIDIA didn't include DSP56300 in the follow-on nForce 3.

For PS5, AMD's TrueAudio Next removes extra DSP IP cost i.e. the removal of Cadence Tensilica HiFi EP DSP with Tensilica Xtensa SP float support. PS5's DSP is based on AMD's GCN CU design bundled with AMD's APU SoC.

For 3D games, $294 Steamdeck's semi-custom AMD APU has no problem destroying fictional 1 Ghz 68060 into oblivion. https://videocardz.com/newz/steam-deck-is-now-available-for-only-296

Intel and AMD beat NXP on 2D/3D gaming, i.e. the workload that matters for most Amiga users. Qualcomm and NVIDIA are the gaming SoC alternative vendors.

My point is about performance at a given price.




Last edited by Hammer on 23-Aug-2024 at 05:31 AM.
Last edited by Hammer on 23-Aug-2024 at 02:35 AM.

cdimauro 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 23-Aug-2024 4:31:56
#394 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4127
From: Germany

@matthey

Quote:

matthey wrote:
Karlos Quote:

But the point is you wouldn't do that, you'd just use a single 32-bit sized add. Contrast to the 6502 where you have to clear and then add 4 bytes (which also may need additional data movement operations) achieve the same result. So while a 6502 may be able to perform some simple operations in fewer cycles than the 68000, the 68000 can do significantly more in fewer instructions.


Of course. It is interesting to compare taking away the advantage of larger datatype sizes though.

I don't agree because... we have the larger datatype sizes on 68000 and other architectures, and we want to take full advantage of them. That's the reason why the 68000 wipes out the 6502 in the real world, with regular algorithms.
Quote:
The 68000 used 7 instructions but the 6502 needs many more. I believe the 6502 code should look something like the following code.


lda (x) ; load accumulator with least significant byte of 1st num
clc ; clear carry
adc (y) ; add with carry least significant byte of 2nd num
sta (x) ; store accumulator to 1st num

inx ; increment x if LE or decrement x if BE
iny ; increment y if LE or decrement y if BE
lda (x) ; load accumulator with next byte of 1st num
adc (y) ; add with carry next byte of 2nd num
sta (x) ; store accumulator to 1st num

inx ; increment x if LE or decrement x if BE
iny ; increment y if LE or decrement y if BE
lda (x) ; load accumulator with next byte of 1st num
adc (y) ; add with carry next byte of 2nd num
sta (x) ; store accumulator to 1st num

inx ; increment x if LE or decrement x if BE
iny ; increment y if LE or decrement y if BE
lda (x) ; load accumulator with most significant byte of 1st num
adc (y) ; add with carry most significant byte of 2nd num
sta (x) ; store accumulator to 1st num


I haven't coded 6502 assembly in decades and I was a novice then.

There's one problem: the zero page indexed addressing mode can only be used with the X register.
So, you have to resort to the absolute indexed addressing modes, which can use X and Y.
It means that each LDA/ADC/STA instruction takes 3 bytes and 4 cycles (one more if the address crosses a page).
Quote:
Maybe our 6502 expert Lou can point out any mistakes and count the 6502 code size in bytes.

Don't expect anything from him: he has never shown a single line of code, even though he claimed to have done something with his C128. He's clearly a cheater.
Quote:
CPU | instructions | code size
6502 | 19 | 31?
68000 | 7 | 16

43 bytes for the 6502: 12 * 3 + 7 * 1 bytes.
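The count is easy to reproduce. A small sketch, assuming every LDA/ADC/STA uses a 3-byte absolute indexed encoding as described above and CLC/INX/INY are 1-byte implied instructions (the table and function names are made up for illustration):

```python
# Instruction sizes in bytes under the assumptions above
SIZES = {"lda": 3, "adc": 3, "sta": 3, "clc": 1, "inx": 1, "iny": 1}

# The 32-bit add listing: first byte, then three more bytes with index updates
PROGRAM = ["lda", "clc", "adc", "sta"] + ["inx", "iny", "lda", "adc", "sta"] * 3


def code_size(program: list[str]) -> int:
    """Total size in bytes of a list of mnemonics."""
    return sum(SIZES[op] for op in program)


print(len(PROGRAM), "instructions,", code_size(PROGRAM), "bytes")
```

That gives 19 instructions, 12 of them 3 bytes and 7 of them 1 byte, against 16 bytes for the 68000 version.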

It's also interesting to notice how many memory accesses are needed in its case.
Quote:
The 68000 vs 6502 match up is like a race between the Tortoise and the Hare. The 6502 is quicker to execute simple instructions but is it enough to beat the slower but more powerful 68000?

https://en.wikipedia.org/wiki/The_Tortoise_and_the_Hare

The moral of the story is that arrogance can lose the race. I wonder if Lou knows the story.

What's sure is that he doesn't know coding...

Hammer 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 23-Aug-2024 7:13:01
#395 ]
Elite Member
Joined: 9-Mar-2003
Posts: 6168
From: Australia

@matthey

Quote:

Games like Quake really needed 3D hardware acceleration so cheaper and lower power CPU hardware could be used. Quake on my 68060@75MHz with Voodoo4 512x384x16 averages about 25fps and looks good

Voodoo 4 didn't exist until its October 2000 release. Unless you have a time machine, your counter-argument is not realistic.

Phase 5's CyberVision 3D was released in 1996 and it is based on the S3 ViRGE.

My Quake statements are for software rendering only without using GLQuake.

Your Voodoo 4 PCI is bottlenecked by the slow 68060-75.

For GLQuake 640x480x16 with 1.8 GHz Athlon XP.
Voodoo 3 3500 AGP2x has 196 fps.
Voodoo 5 5500 PCI has 343 fps.
GeForce 2 Pro AGP4x has 444.5 fps.

1.8 GHz Athlon XP can push 714.2 fps when the GPU has less per frame rendering workload e.g. 320x240x16.

For 320x240x16.
GeForce 2 Pro AGP4x has 714 fps.
Voodoo 3 3500 AGP2x has 544 fps.
Voodoo 3 2000 PCI has 502 fps.

In late 2000, my CPU was a K7 Athlon Thunderbird 1133 Mhz. Intel Celeron 533 Mhz overclocked to 600Mhz (via 75 Mhz FSB) was my previous CPU before the Athlon 1133 Mhz build. I also overclocked Athlon 1133 MHz into the 1200 MHz range.

https://www.anandtech.com/show/449/3
Good success rate with Intel Celeron 533 MHz overclocked to 600 MHz (via 75 MHz FSB, Intel 440ZX). My older Celeron 300A was able to overclock to 450 MHz since there's a Pentium II 450 MHz.

The GHz race between AMD and Intel has pushed out other non-X86 CPUs' chances on the desktop. Falling X86 CPU prices and intense competition enabled the Pentium III/Celeron Coppermine @ 733 MHz (originally a K7 Duron before Bill Gates' override) to enter the original Xbox game console.

I have given my Socket 7 Pentium 166 Mhz PC to my relatives.

Your 3D acceleration argument is not realistic.

On price and 3D gaming, Motorola ColdFire v5e wouldn't be able to beat Intel Celeron "Mendocino" and Coppermine-128.

I still have my old 533 MHz Celeron gaming PC somewhere.

The Atari TOS platform went in the ColdFire direction.

http://kronos.lutece.net/ has benchmarks comparing V4SA AC68080 @ 93 MHz vs CT060 @ 100 MHz vs ColdFire V4e @ 200 MHz vs FireBee ColdFire V4e @ 266 MHz.

V4SA AC68080 @ 93 MHz is roughly equivalent to Freescale ColdFire V4e @ 200 MHz. AC68080 V4 @ 93 MHz crushed MC68060 @ 100 MHz.

Atari ColdFire users seem less inclined to post Doom/Quake benchmarks than Amiga users.

Last edited by Hammer on 23-Aug-2024 at 08:14 AM.
Last edited by Hammer on 23-Aug-2024 at 08:10 AM.
Last edited by Hammer on 23-Aug-2024 at 08:05 AM.
Last edited by Hammer on 23-Aug-2024 at 07:30 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

WolfToTheMoon 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 23-Aug-2024 8:37:37
#396 ]
Super Member
Joined: 2-Sep-2010
Posts: 1410
From: CRO

MOS 6502 has fewer than 3,500 transistors
WDC 65816 has approx. 20,000 transistors
MC 68000 has approx. 68,000 transistors.

Bang for the buck, in the late 70s and early to mid 80s, the MOS chip line was probably the better choice. Now if only Tramiel had done something more with MOS instead of cost cutting.



_________________

Karlos 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 23-Aug-2024 8:51:57
#397 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4843
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@matthey

Quote:
@matthey

Quote:

Games like Quake really needed 3D hardware acceleration so cheaper and lower power CPU hardware could be used. Quake on my 68060@75MHz with Voodoo4 512x384x16 averages about 25fps and looks good


Somewhere on my long-dead Seagate (pretty sure it could be recovered by a data recovery company) there's an experimental Quake I was working on. It was identical to the software renderer right up to the point where it actually has to draw stuff, where it used Warp3D directly to render triangles and write the depth buffer. It was very unfinished and buggy, but it was pretty quick at just drawing the world. This was specifically for the BVision and its limited capabilities. The pain point was texture management (which the GL version also has), but I managed to streamline some of that by putting lightmap textures into a tall, thin (32 px wide) strip where possible. It's easier and more efficient to update them when they are laid out like this because, at a width of 32, the sub-patching isn't used, so uploading the damage region becomes a linear copy rather than a complex bunch of address permutations.

You got to strip out all the hacky Quake-to-GL stuff and the varying inefficiencies of MiniGL on W3D, while keeping the ruthlessly optimised software 3D transformation and clipping. And you replaced only the lowest-level rasterization with hardware.
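
As a rough illustration of why the narrow strip helps (a sketch only, not the code from that lost drive; `STRIP_WIDTH`, the one-byte-per-texel format and both helper names are assumptions, and the hardware's sub-patch address layout is not modelled): when the damaged rows span the full width of the texture they are contiguous in memory, so the update is a single copy, whereas a sub-rectangle of a wider atlas needs one copy per row.

```python
STRIP_WIDTH = 32  # texels per row of the tall, thin strip (1 byte per texel assumed)


def update_strip(strip: bytearray, first_row: int, patch: bytes) -> None:
    """Overwrite whole rows of a full-width strip with one linear copy."""
    if len(patch) % STRIP_WIDTH:
        raise ValueError("patch must contain whole rows")
    start = first_row * STRIP_WIDTH
    if start < 0 or start + len(patch) > len(strip):
        raise ValueError("patch does not fit in the strip")
    # Full-width rows are contiguous, so the damage region is a single slice.
    strip[start:start + len(patch)] = patch


def update_atlas(atlas: bytearray, atlas_width: int, x: int, y: int,
                 patch_width: int, patch: bytes) -> None:
    """Overwrite a sub-rectangle of a wider atlas: one copy per row."""
    rows = len(patch) // patch_width
    for r in range(rows):
        dst = (y + r) * atlas_width + x   # start of this row inside the atlas
        src = r * patch_width             # start of this row inside the patch
        atlas[dst:dst + patch_width] = patch[src:src + patch_width]
```

For example, `update_strip(strip, 1, new_rows)` replaces rows 1 onward in one slice assignment, while `update_atlas` has to recompute a destination address for every row of the rectangle.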

_________________
Doing stupid things for fun...

NutsAboutAmiga 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 23-Aug-2024 11:01:01
#398 ]
Elite Member
Joined: 9-Jun-2004
Posts: 12960
From: Norway

@Hammer

Those were interesting benchmarks. The ARM implementation must be horrible compared to what we have on PiStorm, or perhaps it is just a standard emulator. What is most cost effective, considering V4SA / 080 vs ColdFire 200 MHz?

Is it a full computer, or do you need to own a very old computer to use it?

Last edited by NutsAboutAmiga on 23-Aug-2024 at 11:02 AM.

_________________
http://lifeofliveforit.blogspot.no/
Facebook::LiveForIt Software for AmigaOS

Hypex 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 23-Aug-2024 16:35:58
#399 ]
Elite Member
Joined: 6-May-2007
Posts: 11351
From: Greensborough, Australia

@Lou

Quote:
So again, I re-iterate: an Amiga launch with a 4Mhz 65816 would have eventually led to a move to ARM instead of the delayed over-priced and under-performing and late to the game Motorola poop-show followed by an attempt to move to the failed PPC line.


However, the 6502/65816 and ARM are totally different CPUs. They might as well have used ARM first if it had been available. The Amiga already ended up in a CPU mess when Motorola abandoned the 68K and PPC was introduced to the Amiga. About the only thing the 65816 has in common with ARM is being little-endian. As a follow-up to the C128, the 65816 would make sense in the next big Commodore. But the Amiga wasn't a Commodore computer.

WolfToTheMoon 
Re: DoomAttack (Akiko C2P) on Amiga CD32 + Fast RAM (Wicher CD32)
Posted on 23-Aug-2024 18:13:20
#400 ]
Super Member
Joined: 2-Sep-2010
Posts: 1410
From: CRO

@Hypex

The Amiga originally being little-endian would mean the world to some of the Amiga NG projects.
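
For anyone unfamiliar with the term, a minimal sketch of what the byte-order difference means (illustrative only; the value and the helper name are made up for this example). The 68000 family stores the most significant byte first, while the 6502/65816 and x86 store the least significant byte first, so data written by one side has to be byte-swapped by the other:

```python
VALUE = 0x12345678

# Big-endian layout, as the 68000 family stores a 32-bit value in memory.
big = VALUE.to_bytes(4, "big")
# Little-endian layout, as the 6502/65816 and x86 store multi-byte values.
little = VALUE.to_bytes(4, "little")


def read_u32(raw: bytes, byteorder: str) -> int:
    """Interpret four raw bytes as an unsigned 32-bit value in the given order."""
    return int.from_bytes(raw[:4], byteorder)


print(big.hex())                        # 12345678
print(little.hex())                     # 78563412
print(hex(read_u32(big, "little")))     # 0x78563412: wrong value without a swap
```

Reading big-endian data with the wrong byte order silently gives a different number, which is why software and file formats written for a big-endian machine need swaps everywhere when moved to a little-endian CPU.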

_________________

Copyright (C) 2000 - 2019 Amigaworld.net.
Amigaworld.net was originally founded by David Doyle