Amigaworld.net - The Amiga Computer Community Portal Website

Amiga Development

Packed Versus Planar: FIGHT


Hammer

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 14:16:22

[ #381 ]

Elite Member

Joined: 9-Mar-2003
Posts: 6500
From: Australia

@Gunnar

Quote:

Gunnar wrote:
@Karlos

Quote:

Karlos wrote:
@Gunnar

The issue here is clearly one of relative versus absolute performance. I believe your claim is that, per MHz, AMMX is faster than AltiVec. I'm sure that may be true for some operations. However, the claim that it's "faster than" AltiVec, without further qualification, risks being misread as delivering higher throughput than any AltiVec implementation. I doubt you are claiming that the Apollo as it stands today can compete with the last-generation 1-2 GHz G4 and G5 parts in raw vector performance.

Yes, for the workloads we speak about, AMMX is more efficient than AltiVec.
Yes, for the workloads we speak about, AMMX has more performance than AltiVec.
If you put a PowerPC in the same technology, e.g. in an FPGA, then AMMX offers higher performance than AltiVec.

So it's clear that if you produce them in the same technology, both FPGA or both ASIC, then the AMMX system can beat AltiVec.

But what if you compare at different clock rates?
What if you compete with existing systems like the AmigaOne or Pegasos?

If you compare the real-world performance of existing systems,
if you compare what the V4 delivers versus the AmigaOne G4 and Pegasos 2 G4,
if you look at the real-life memory and game-blitting performance they deliver,
then the 68080 AMMX system outperforms 1 GHz PowerPC systems
in maximum real-screen game/sprite blitting performance.

Please show Quake demo3 320x200 resolution benchmark for Vampire V4.

Without the Emu68 monitor program running in the background, PiStorm's Quake demo3 320x200 resolution benchmark yields 65 fps which is about Pentium II 266/300 and Celeron 300A performance levels.

Quake delivers 3D perspective parallax that is superior when compared to fake 2D multi-parallax.

https://www.youtube.com/watch?v=nR3w5scj1Yw
Quake engine mod in a top-down view. This 3D engine mod can support perspective-correct parallax in top-down shoot 'em up games.

This is why Commodore focused on the OpenGL-capable Amiga Hombre chipset.

You're wasting AC68080 performance on fake 2D multi-parallax.

Last edited by Hammer on 04-Oct-2022 at 02:26 PM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Karlos

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 14:19:10

[ #382 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4954
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Gunnar

I don't mind waiting for you to code it up. You can just use raw binary image dumps if it makes it easier to implement and validate the output.

You've gone to lengths to point out the inadequacies in the claims made around TINA. You've subsequently made claims that sound equally outlandish, namely that the 68080 using AMMX can outperform a 1 GHz G4 using AltiVec for classic stream processing operations like alpha blending.

Perhaps someone else here with a functional PPC system (MorphOS on a last-generation G4 Mac, for example) can provide a version to test against if you'd prefer to work only on the AMMX version.

Your claim sounds like horseshit, but I'm perfectly willing to be persuaded by real benchmarks like the above.

_________________
Doing stupid things for fun...

Karlos

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 14:27:25

[ #383 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4954
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Bosanac

Quote:

Bosanac wrote:
@Gunnar

Quote:

The armchair CPU experts' performance discussions here, by people with no real-world coding experience, are just a waste of time.

You hear that?

Just exactly what do you do all day you lazy oaf?

And you can take your name off the AmigaOS credits too! Bloody shyster!

To be fair, I haven't written any altivec code since the A1 croaked.

_________________
Doing stupid things for fun...

Gunnar

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 14:30:39

[ #384 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

Yes, we have PowerPC systems here, from G2 to G5, and we also have CELL.
And we did a lot of benchmarks.
You will recall that we also ported a game to MorphOS and produced software for PPC.
When the Pegasos 2 was new, we did all those benchmarks; you might recall those numbers.
But this was quite some time ago, 2005/2006 I think.
We did publish the benchmark tools and the results.
Today my A1 is toast and I've not turned any of my Pegasos machines on in maybe ten years.

I still have sources and EXEs around, and can happily give you source to compile.

Here, for example, is a well-tuned memcpy using AltiVec:
Be impressed:

Quote:

memcpy:
subf. r7,r4,r3
cmpwi cr1,r5,0
cmpwi cr7,r5,16
addi r8,r4,-1
addi r9,r3,-1
add r10,r4,r5
beqlr
add r11,r3,r5
subf r0,r3,r4
beqlr cr1
bgt- Cpy_bkwd
cmpwi cr5,r0,128
bgt- cr7,v_memcpy
mtctr r5

Byte_cpy_fwd:
lbzu r0,1(r8)
stbu r0,1(r9)
bdnz+ Byte_cpy_fwd
blr
nop

Cpy_bkwd:
cmpwi cr5,r7,128
cmpw cr6,r7,r5
bgt- cr7,v_memmove
mtctr r5

Byte_cpy_bwd:
lbzu r0,-1(r10)
stbu r0,-1(r11)
bdnz+ Byte_cpy_bwd
blr

v_memmove:
clrlwi r8,r4,28
clrlwi r9,r3,28
bge- cr6,MC_entry
lis r11,268
subf. r8,r9,r8
lvsr v2,r0,r7
ori r11,r11,65504
dst r10,r11,0
addi r11,r10,-1
bgt- Rt_shft
addi r8,r8,16

Rt_shft:
rlwinm r11,r11,0,0,27
addi r7,r5,-1
subf r0,r11,r10
add r11,r3,r7
addi r10,r3,16
subf. r8,r0,r8
clrlwi r0,r11,28
rlwinm r10,r10,0,0,27
blt- Get_bytes_rt
lvx v1,r4,r7
addi r4,r4,-16

Get_bytes_rt:
lvx v0,r4,r7
subf r10,r10,r11
cmpwi cr7,r0,15
cmpwi cr1,r9,0
rlwinm r10,r10,28,4,31
add r0,r3,r5
cmpwi cr6,r10,0
vperm v3,v0,v1,v2
vor v1,v0,v0
beq- cr7,Rt_just
mtcrf 1,r0
rlwinm r11,r11,0,0,27
li r9,0
bge- cr7,Only_1W_bkwd
stvewx v3,r11,r9
addi r9,r9,4
stvewx v3,r11,r9
addi r9,r9,4

Only_1W_bkwd:
ble- cr7,Only_2W_bkwd
stvewx v3,r11,r9
addi r9,r9,4

Only_2W_bkwd:
bne- cr7,Only_B_bkwd
stvehx v3,r11,r9
addi r9,r9,2

Only_B_bkwd:
bns- cr7,All_done_bkwd
stvebx v3,r11,r9
b All_done_bkwd

Rt_just:
stvx v3,r3,r7

All_done_bkwd:
addi r7,r7,-16
ble- cr6,Last_load
mtctr r10
cmpwi cr6,r10,4

QW_loop:
lvx v0,r4,r7
vperm v3,v0,v1,v2
vor v1,v0,v0
stvx v3,r3,r7
addi r7,r7,-16
bdnzf+ 4*cr6+gt,QW_loop
add r9,r3,r7
bgt- cr6,GT_4QW

Last_load:
blt- No_ld_bkwd
addi r4,r4,16

No_ld_bkwd:
lvx v0,r0,r4
dss 0
vperm v3,v0,v1,v2
subfic r9,r3,16
beq- cr1,Lt_just
mtcrf 1,r9
li r9,0
bns- cr7,No_B_bkwd
stvebx v3,r3,r9
addi r9,r9,1

No_B_bkwd:
bne- cr7,No_H_bkwd
stvehx v3,r3,r9
addi r9,r9,2

No_H_bkwd:
ble- cr7,No_W1_bkwd
stvewx v3,r3,r9
addi r9,r9,4

No_W1_bkwd:
bge- cr7,No_W2_bkwd
stvewx v3,r3,r9
addi r9,r9,4
stvewx v3,r3,r9
b No_W2_bkwd

Lt_just:
stvx v3,r0,r3

No_W2_bkwd:
blr

GT_4QW:
lvx v0,r4,r7
mtcrf 2,r9
vperm v3,v0,v1,v2
vor v1,v0,v0
addi r9,r9,-16
stvx v3,r3,r7
vor v7,v0,v0
addi r7,r7,-16
bdnzt+ 4*cr6+so,GT_4QW
lis r8,258
mtcrf 2,r3
addi r9,r7,-16
ori r8,r8,65504
addi r11,r4,-64
bso- cr6,B32_bkwd
bdnz- B32_bkwd

B32_bkwd:
lvx v6,r4,r7
addi r11,r11,-32
lvx v1,r4,r9
vperm v3,v6,v7,v2
dst r11,r8,1
dcba r3,r9
vperm v4,v1,v6,v2
vor v7,v1,v1
stvx v3,r3,r7
addi r7,r9,-16
bdz- Nxt_loc_bkwd

Nxt_loc_bkwd:
stvx v4,r3,r9
addi r9,r7,-16
bdnz+ B32_bkwd
bns- cr6,One_odd_QW
b Last_load

One_odd_QW:
lvx v1,r4,r7
vperm v4,v1,v7,v2
stvx v4,r3,r7
b Last_load
nop

v_memcpy:
clrlwi r8,r4,28
clrlwi r9,r3,28

MC_entry:
lis r10,268
subf. r8,r8,r9
lvsr v2,r0,r7
ori r10,r10,32
dst r4,r10,0
addi r10,r3,16
addi r11,r11,-1
bge- Ld_bytes_rt
lvx v0,r0,r4
addi r4,r4,16

Ld_bytes_rt:
lvx v1,r0,r4
rlwinm r10,r10,0,0,27
cmpwi cr1,r9,0
subf r0,r3,r10
subf r10,r10,r11
li r7,0
mtcrf 1,r0
rlwinm r10,r10,28,4,31
vperm v3,v0,v1,v2
vor v0,v1,v1
beq- cr1,Left_just
bns- cr7,No_B_fwd
stvebx v3,r3,r7
addi r7,r7,1

No_B_fwd:
bne- cr7,No_H_fwd
stvehx v3,r3,r7
addi r7,r7,2

No_H_fwd:
ble- cr7,No_W1_fwd
stvewx v3,r3,r7
addi r7,r7,4

No_W1_fwd:
bge- cr7,No_W2_fwd
stvewx v3,r3,r7
addi r7,r7,4
stvewx v3,r3,r7
b No_W2_fwd

Left_just:
stvx v3,r0,r3

No_W2_fwd:
clrlwi r0,r11,28
cmpwi cr6,r10,0
li r7,16
cmpwi cr1,r0,15
cmpwi cr7,r10,14
ble- cr6,Last_ld_fwd
mtctr r10
cmpwi cr6,r10,4

QW_fwd_loop:
lvx v1,r4,r7
vperm v3,v0,v1,v2
vor v0,v1,v1
stvx v3,r3,r7
addi r7,r7,16
bdnzf+ 4*cr6+gt,QW_fwd_loop
add r9,r3,r7
addi r10,r10,-1
bgt- cr6,GT_4QW_fwd

Last_ld_fwd:
add r11,r3,r5
add r10,r4,r5
bge- No_ld_fwd
addi r10,r10,-16

No_ld_fwd:
mtcrf 1,r11
addi r11,r11,-1
addi r0,r10,-1
lvx v1,r0,r0
dss 0
dss 1
vperm v3,v0,v1,v2
beq- cr1,Rt_just_fwd
rlwinm r11,r11,0,0,27
li r9,0
bge- cr7,Only_1W_fwd
stvewx v3,r11,r9
addi r9,r9,4
stvewx v3,r11,r9
addi r9,r9,4

Only_1W_fwd:
ble- cr7,Only_2W_fwd
stvewx v3,r11,r9
addi r9,r9,4

Only_2W_fwd:
bne- cr7,Only_B_fwd
stvehx v3,r11,r9
addi r9,r9,2

Only_B_fwd:
bns- cr7,All_done_fwd
stvebx v3,r11,r9
b All_done_fwd

Rt_just_fwd:
stvx v3,r3,r7

All_done_fwd:
blr
nop
nop
nop

GT_4QW_fwd:
lvx v1,r4,r7
addi r10,r10,-1
mtcrf 2,r9
addi r9,r9,16
addi r0,r10,-2
vperm v3,v0,v1,v2
vor v0,v1,v1
stvx v3,r3,r7
addi r7,r7,16
bdnzf+ 4*cr6+so,GT_4QW_fwd
mtcrf 2,r11
lis r8,260
addi r9,r7,16
ori r8,r8,32
rlwinm r11,r0,29,3,31
rlwinm r0,r0,0,0,28
bgt- cr7,Big_loop

No_big_loop:
addi r11,r4,256
xoris r8,r8,6
bns- cr6,B32_fwd
bdnz- B32_fwd

B32_fwd:
lvx v1,r4,r7
addi r11,r11,32
lvx v6,r4,r9
vperm v4,v0,v1,v2
dst r11,r8,1
dcba r3,r7
vperm v3,v1,v6,v2
vor v0,v6,v6
stvx v4,r3,r7
addi r7,r9,16
bdz- Nxt_loc_fwd

Nxt_loc_fwd:
stvx v3,r3,r9
addi r9,r7,16
bdnz+ B32_fwd
bso- cr6,One_even_QW
b Last_ld_fwd

One_even_QW:
lvx v1,r4,r7
vperm v3,v0,v1,v2
vor v0,v1,v1
stvx v3,r3,r7
addi r7,r7,16
b Last_ld_fwd
nop

Big_loop:
subf r10,r0,r10
blt+ cr5,No_big_loop
mtctr r11
addi r11,r4,256

Loop_of_128B:
lvx v1,r4,r7
addi r9,r7,32
addi r11,r11,128
lvx v7,r4,r9
addi r9,r9,32
lvx v9,r4,r9
addi r9,r9,32
lvx v11,r4,r9
addi r9,r7,16
lvx v6,r4,r9
addi r9,r9,32
lvx v8,r4,r9
addi r9,r9,32
lvx v10,r4,r9
addi r9,r9,32
vperm v3,v0,v1,v2
lvx v0,r4,r9
vperm v4,v1,v6,v2
dst r11,r8,1
dcba r3,r7
stvx v3,r3,r7
addi r7,r7,16
vperm v5,v6,v7,v2
stvx v4,r3,r7
addi r7,r7,16
vperm v6,v7,v8,v2
dcba r3,r7
stvx v5,r3,r7
addi r7,r7,16
vperm v7,v8,v9,v2
stvx v6,r3,r7
addi r7,r7,16
vperm v8,v9,v10,v2
dcba r3,r7
stvx v7,r3,r7
addi r7,r7,16
vperm v9,v10,v11,v2
stvx v8,r3,r7
addi r7,r7,16
vperm v10,v11,v0,v2
dcba r3,r7
stvx v9,r3,r7
addi r7,r7,16
stvx v10,r3,r7
addi r7,r7,16
bdnz+ Loop_of_128B
mtctr r10
addi r9,r7,16
bns+ cr6,B32_fwd
bdnz+ B32_fwd

bcopy:
mr r0,r3
mr r3,r4
mr r4,r0
b bcopy+0xc
nop
nop

I will dig around and give you some EXEs and results.

But honestly, beating an AmigaONE XE is not a hero's deed.
This is nothing to be very proud of. This is like kicking a disabled guy in a wheelchair.

The AmigaONE XE is totally crippled because of its low memory performance.

Karlos

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 14:33:04

[ #385 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4954
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Gunnar

Quote:
You misunderstood what I said.

Can you explain how I misunderstood you here? I asked a very clear question and you responded unambiguously.

Quote:

Gunnar wrote:

Quote:

Karlos wrote:
@Gunnar

Those are bold claims.

I have both 68080 and PowerPC Systems here

Quote:

Your workloads must be very selective. Alpha blending is a good example, though. Suppose I have two large pixel arrays, e.g. 1080p, of ARGB 32-bit pixels and I want to alpha blend buffer B onto buffer A using B's alpha channel.

Are you claiming the 68080, at its normal clock rate, using AMMX will complete this in less time than a 1 GHz PPC using AltiVec instructions to perform this task?

Yes correct.

Emphasis mine.

Last edited by Karlos on 04-Oct-2022 at 02:34 PM.

_________________
Doing stupid things for fun...

Gunnar

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 14:39:45

[ #386 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

@Karlos

Fair enough, let us make a competition:

Let's make a really simple example. Here is my proposal:
Let's code and compare an "8-bit Cookie Cut Sprite Copy" for both systems.
Let's say both src and dst are 1024x1024 pixels.
Let's say that the source starts at a misaligned address.
Let us measure how many milliseconds each system needs for the task!

Are you ready for this?

The code needed for the 68080 is only the mini loop that was posted before:

Quote:

.loop
load (A0)+,D0
storem2.b D0,(A1)+
dbra.l D1,.loop
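As a reference point for the proposed test, here is a scalar C sketch of an 8-bit "cookie cut" copy: a source pixel is written only where it is not the transparent colour. It assumes colour index 0 is transparent and a row-major layout with byte strides (both assumptions, not part of the proposal), and it makes no claim about what `storem2.b` does internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy a w x h block of 8-bit pixels from src to dst, skipping pixels equal to
   the transparent key (assumed here to be 0). Strides are in bytes, so either
   pointer may start at any (misaligned) address. */
static void cookie_cut_copy8(uint8_t *dst, size_t dst_stride,
                             const uint8_t *src, size_t src_stride,
                             size_t w, size_t h)
{
    const uint8_t transparent = 0;   /* assumed colour key */
    for (size_t y = 0; y < h; y++) {
        const uint8_t *s = src + y * src_stride;
        uint8_t *d = dst + y * dst_stride;
        for (size_t x = 0; x < w; x++) {
            if (s[x] != transparent)   /* opaque pixel: overwrite destination */
                d[x] = s[x];
        }
    }
}
```

For the proposed test, `w` and `h` would both be 1024 and `src` would point a few bytes past an aligned base address.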

Last edited by Gunnar on 04-Oct-2022 at 02:40 PM.

Karlos

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 14:42:08

[ #387 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4954
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Gunnar

I already asked first, and my example should be simpler to write for both platforms as it doesn't require handling misaligned data, nor does it have to worry about pathologically slow read access to VRAM.

My A1 is dead so it's not even in the race.
https://imgflip.com/i/6vp7t7

Last edited by Karlos on 04-Oct-2022 at 02:43 PM.

_________________
Doing stupid things for fun...

Hammer

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 14:45:55

[ #388 ]

Elite Member

Joined: 9-Mar-2003
Posts: 6500
From: Australia

@Gunnar

PiStorm/Emu68 and the associated Raspberry Pi 3A+ hardware (quad-core ARM Cortex-A53 @ 1.4 GHz, Broadcom VideoCore IV OpenGL ES 2.0 IGP) is a real and present competitor to the FireBird V4.

Raspberry Pi 3a+ can deliver about 2194.68 megabytes per second of memory bandwidth (1)

Reference
1. https://magpi.raspberrypi.com/articles/raspberry-pi-specs-benchmarks
The Raspberry Pi Foundation defines MBps as megabytes per second.

------

Raspberry Pi 4 CM and 4B+ are coming with PiStorm32 and PiStorm32 Lite. PiStorm32 has NVMe storage support.

TF1200 Buffee was upgraded with ARM Cortex A53-based SOC from TI.

Last edited by Hammer on 04-Oct-2022 at 02:50 PM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Gunnar

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 14:48:43

[ #389 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

@Karlos

Quote:

Can you explain how I misunderstood you here? I asked a very clear question and you responded unambiguously.

Yes I can explain you this:

Your proposal was:
Quote:

For the purposes of this, you can have everything optimally aligned and in whatever passes for Fast RAM

My point was that benchmarks that always use optimally aligned data are wrong and useless, because in the real world you need to be able to support misaligned operations.

Just look at your mouse pointer: it can move to any location on screen, right?
It doesn't only jump to positions evenly divisible by 16, right?
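The mouse-pointer example can be made concrete for a planar display: a sprite row placed at an arbitrary x has to be shifted by `x % 16` bits and may straddle an extra 16-bit word. A small C sketch of that arithmetic, assuming 16-pixel words; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: where a w-pixel-wide row placed at horizontal
   position x lands in a row of 16-bit planar words. */
typedef struct {
    uint32_t first_word;   /* index of the first 16-bit word touched */
    uint32_t shift;        /* bit shift needed inside that word (x mod 16) */
    uint32_t words;        /* number of words the row spans */
} word_span;

/* Assumes w > 0. */
static word_span planar_span(uint32_t x, uint32_t w)
{
    word_span s;
    s.first_word = x / 16;
    s.shift = x % 16;
    s.words = (s.shift + w + 15) / 16;   /* round covered bits up to whole words */
    return s;
}
```

A 16-pixel-wide pointer at x = 5 therefore touches two words per row and needs a 5-bit shift, whereas at x = 0 it touches a single word with no shift.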

So proposing to run a benchmark only under "useless" conditions would be useless.
All clear now, and do you agree with me?

Gunnar

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 14:52:09

[ #390 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

@Hammer

Can you do anything other than post off-topic images?
We all know that posting Intel pictures in Amiga forums is a superpower of yours.

If you can code, then why don't you help Karlos code the PPC version he wants?

Hammer

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 14:55:20

[ #391 ]

Elite Member

Joined: 9-Mar-2003
Posts: 6500
From: Australia

@Gunnar

1. The only PowerPC hardware that I owned is a color laser printer. Quake frame rate from paper printing is very slow.

2. Quake is a real gaming benchmark. Again, please show Quake demo3 320x200 resolution benchmark for Vampire V4.

I have Coffin R58's and R60's Quake installs.

3. I have posted a memory bandwidth benchmark for Raspberry Pi 3a+ hardware.

Unlike Apollo-Core's "Intel Inside" Cyclone III and V FPGAs, the Raspberry Pi 3A+ doesn't include an Intel CPU or Cyclone FPGA.

PiStorm does include an "Intel Inside" Altera MAX II CPLD.

Last edited by Hammer on 04-Oct-2022 at 04:10 PM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Gunnar

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 14:55:31

[ #392 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

@Karlos

Quote:

Karlos wrote:
@Gunnar

I already asked first, and my example should be simpler to write for both platforms as it doesn't require handling misaligned data, nor does it have to worry about pathologically slow read access to VRAM.

Yes, VRAM reads are slow on the AmigaONE XE, much slower than on the Vampire.
But FASTMEM reads are also slower on the AmigaONE XE than on the Vampire.

This is my point.

Karlos

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 15:00:07

[ #393 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4954
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Gunnar

You seem to have a problem with recall. The thing you claim I misunderstood is not stated in your original claim. I just asked about blending a pair of 1080p buffers.

You said, without any ambiguity, that the 68080/AMMX would be faster. I stated my incredulity at this and challenged you to prove it.

If you allocate a pair of pixel arrays like this, e.g. two 1080p buffers, you aren't going to intentionally misalign their addresses unless you are particularly silly. At the very least you'd align them to a 32-bit boundary, and more than likely to a cache-line boundary.
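Allocating the two buffers with that kind of alignment is straightforward in portable C. A sketch using C11 `aligned_alloc`, assuming a 32-byte boundary (a common cache-line size on G4-class CPUs); on AmigaOS itself one would use the system's own allocator, which this sketch does not attempt to model. `WIDTH`, `HEIGHT`, `ALIGN` and `alloc_frame` are illustrative names.

```c
#include <stdint.h>
#include <stdlib.h>

#define WIDTH  1920
#define HEIGHT 1080
#define ALIGN  32   /* assumed cache-line size */

/* Allocate one 1080p ARGB32 buffer aligned to ALIGN bytes.
   C11 aligned_alloc requires the size to be a multiple of the alignment;
   1920 * 1080 * 4 = 8,294,400 bytes, which is a multiple of 32. */
static uint32_t *alloc_frame(void)
{
    size_t bytes = (size_t)WIDTH * HEIGHT * sizeof(uint32_t);
    return (uint32_t *)aligned_alloc(ALIGN, bytes);
}
```

Both buffers are released with `free()` when the test is done.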

_________________
Doing stupid things for fun...

Hammer

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 15:08:43

[ #394 ]

Elite Member

Joined: 9-Mar-2003
Posts: 6500
From: Australia

@Gunnar

AmigaONE XE's MAI Articia 'S' northbridge is garbage.

For a modern POWER memory-bandwidth example, use the Raptor Blackbird POWER9-based system.

Raptor Blackbird is a nice PCIe 4.0 class motherboard, and my ASUS ROG Strix X570 PCIe 4.0 class motherboard murdered the Raptor Blackbird in the price category. Both the Raptor Blackbird and the ASUS Strix X570 support DDR4 ECC memory.

Raptor Blackbird is a modern motherboard.

I don't understand why Hyperion is not porting AmigaOS 4.1 FE for the Raptor Blackbird.

If the Raptor Blackbird were price-competitive, and the Blackbird with a POWER9 v2 (4 cores, 16 threads) were feature-competitive, it would be on my buy list.

Last edited by Hammer on 04-Oct-2022 at 03:10 PM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Gunnar

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 15:10:59

[ #395 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

@Karlos

Quote:

You seem to have a problem with recall. The thing you claim I misunderstood is not stated in your original claim. I just asked about blending a pair of 1080p buffers.

I wrongly assumed you had read my post before.
I tried to explain that real-world systems need to handle misaligned data.
And my point was that benchmarks using only aligned data are nonsense by design.

If you take a look at the AltiVec memcpy that I posted, you will see that it's 400 lines of code.
The reason why the code is so complicated is that it needs to handle alignment.
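Why alignment dominates such a routine can be sketched in a few lines of C. Assuming 16-byte vector blocks, a copy is split into a head (bytes until the destination is aligned), a body of whole blocks and a tail; in the body, each misaligned source vector is assembled from two aligned blocks, modelled here on `vperm` with a shift-style control vector, which is roughly the role of the `lvsr`/`vperm` pair in the listing. The helper names are hypothetical and this is a scalar model, not a replacement for the AltiVec code.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a vperm-style merge: from two consecutive aligned 16-byte blocks
   lo:hi, extract the 16 bytes starting `shift` bytes into lo (shift < 16). */
static void merge_shifted(const uint8_t lo[16], const uint8_t hi[16],
                          unsigned shift, uint8_t out[16])
{
    for (unsigned i = 0; i < 16; i++) {
        unsigned k = i + shift;              /* index into the 32-byte concatenation */
        out[i] = (k < 16) ? lo[k] : hi[k - 16];
    }
}

/* Split an n-byte copy to address dst into head, whole 16-byte blocks, tail. */
typedef struct { size_t head, blocks, tail; } copy_plan;

static copy_plan plan_copy(uintptr_t dst, size_t n)
{
    copy_plan p;
    size_t mis = (size_t)(dst & 15u);        /* bytes past the last 16-byte boundary */
    p.head = mis ? 16 - mis : 0;
    if (p.head > n)
        p.head = n;                          /* short copies are all head */
    p.blocks = (n - p.head) / 16;
    p.tail = (n - p.head) % 16;
    return p;
}
```

For example, a 100-byte copy to address 0x1003 becomes 13 head bytes, 5 aligned blocks and 7 tail bytes; only when source and destination share the same misalignment can the body skip the merge step.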

Either you overlooked that this was my starting point,
or you ignored it.
Proposing an aligned-only benchmark after we made the point that all aligned-only benchmarks are nonsense is a little strange.

But it does not matter: even in the best case the AmigaOne XE cannot win,
as the maximum speed of the operation is memory-bound,
and the memory speed of the 800 MHz G4 PPC system is lower than that of the 100 MHz 68080 CPU.
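The memory-bound argument can be made quantitative. For this blend, each pixel is read from both buffers and the destination is written back once, so a 1080p ARGB32 blend moves 3 × 8,294,400 = 24,883,200 bytes. A C sketch of the resulting lower bound on time for a given sustained bandwidth; the bandwidth is an input, not a measured value for either machine, and MB is taken as 10^6 bytes.

```c
/* Lower bound (ms) for a memory-bound blend: total bytes moved divided by the
   sustained bandwidth. Assumes src and dst are each read once and dst is
   written once; caches and write-allocate traffic are ignored. */
static double blend_lower_bound_ms(unsigned width, unsigned height,
                                   unsigned bytes_per_pixel, double bandwidth_mb_s)
{
    double frame = (double)width * height * bytes_per_pixel;  /* bytes per buffer */
    double traffic = 3.0 * frame;                             /* 2 reads + 1 write */
    return traffic / (bandwidth_mb_s * 1e6) * 1e3;
}
```

At a sustained 800 MB/s, for instance, the bound is about 31.1 ms; doubling the bandwidth halves it.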

Alpha blending is very easy for the 68080:

.loop
load (a0),D0
mulalpha (a1)+,D0
store D0,(a0)+
dbra.l d1,.loop

The 68080 can do full alpha blending of 2 sources, writing 1 destination, in 3 clocks per loop, processing 3 × 64 bits per loop. This means three 64-bit memory accesses in three clock cycles.
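Taking the stated figures at face value (3 clocks per iteration, one 64-bit word, i.e. two ARGB32 pixels, per iteration, and the 100 MHz clock given above), the time for a 1080p blend follows directly. A sketch of that arithmetic; the inputs are the post's claims, not measurements.

```c
/* Time (ms) for a loop that handles pixels_per_iter pixels in clocks_per_iter
   cycles at clock_mhz, over a width x height image. */
static double loop_time_ms(unsigned width, unsigned height,
                           unsigned pixels_per_iter, unsigned clocks_per_iter,
                           double clock_mhz)
{
    double pixels = (double)width * height;
    double iterations = pixels / pixels_per_iter;
    double cycles = iterations * clocks_per_iter;
    return cycles / (clock_mhz * 1e6) * 1e3;
}
```

With those inputs the result is about 31.1 ms, which is the same as moving 3 × 64 bits per 3 clocks at 100 MHz, i.e. 800 MB/s of memory traffic.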
With the slow memory interface the AmigaOne XE can never win this.
Please write your testcase, and let us run them :)

Last edited by Gunnar on 04-Oct-2022 at 03:16 PM.

Karlos

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 15:34:16

[ #396 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4954
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Gunnar

It doesn't matter what your opinion of alignment is because both versions are free to use the most optimal alignment for the test. What's good for one is good for the other.

Alpha Blend a source 1080p 32-bit ARGB buffer onto a destination 1080p 32-bit ARGB buffer, each optimally aligned in the fastest RAM you can read from/write to. Time the complete operation in milliseconds.

This is the challenge, everything else you've said about it since is irrelevant. You already claimed 68080 would outperform a GHz class PPC using AltiVec optimised code for the same task.

All I ask is that you prove it.
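To pin the task down, a plain C reference for "blend B onto A using B's alpha" might look like the following. It is a sketch under stated assumptions: non-premultiplied ARGB32 pixels stored as native `uint32_t` with alpha in the top byte, the classic `(s*a + d*(255-a)) / 255` per-channel formula with rounding, and the destination alpha left unchanged. Either side's optimised version could be checked against its output.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend one 8-bit channel: s weighted by a, d weighted by (255 - a), rounded. */
static uint32_t blend_channel(uint32_t s, uint32_t d, uint32_t a)
{
    return (s * a + d * (255u - a) + 127u) / 255u;
}

/* Alpha-blend n ARGB32 pixels of src onto dst using src's alpha channel.
   The destination's alpha byte is kept as-is (an assumption of this sketch). */
static void blend_argb32(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t s = src[i], d = dst[i];
        uint32_t a = s >> 24;
        uint32_t r = blend_channel((s >> 16) & 0xFFu, (d >> 16) & 0xFFu, a);
        uint32_t g = blend_channel((s >> 8) & 0xFFu, (d >> 8) & 0xFFu, a);
        uint32_t b = blend_channel(s & 0xFFu, d & 0xFFu, a);
        dst[i] = (d & 0xFF000000u) | (r << 16) | (g << 8) | b;
    }
}
```

Timing `blend_argb32(dst, src, 1920 * 1080)` on each machine would give the millisecond figure asked for.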

Quote:
Please write your testcase, and let us run them

You already know I don't have a functional G4 class PPC so this is just showboating. Hopefully someone else here does though.

Last edited by Karlos on 04-Oct-2022 at 03:37 PM.

_________________
Doing stupid things for fun...

kolla

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 20:41:22

[ #397 ]

Elite Member

Joined: 20-Aug-2003
Posts: 3473
From: Trondheim, Norway

@Gunnar

I want to see a video of Slamtilt on the V4, with someone good enough to actually reach multiball (and in an interlaced screen mode).

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

kolla

Re: Packed Versus Planar: FIGHT
Posted on 4-Oct-2022 20:48:15

[ #398 ]

Elite Member

Joined: 20-Aug-2003
Posts: 3473
From: Trondheim, Norway

@cdimauro

Quote:

Anyway, the important message was that they were upset with the very bad work done (Again! According to them) by those demo makers.

Hyperbole, jargon and timing suggests tongue-in-cheek humour.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

Hypex

Re: Packed Versus Planar: FIGHT
Posted on 5-Oct-2022 5:50:18

[ #399 ]

Elite Member

Joined: 6-May-2007
Posts: 11351
From: Greensborough, Australia

@Karlos

Quote:
PHPoop. How'd I miss that?

Ha!

Yes the unfortunate OOP extension to PHP.

Quote:
PHP lets you write OO code or not, much like C++. Before v5, the OO features were rather basic. It became significantly better with 5, with the conversion to reference semantics for objects by default (the norm for other OO languages generally), support for interfaces, abstract classes, namespaces and later traits. 7.4 "introduces" proper covariance (in my view this is a "fix" to something that should've worked in the first place). All "modern" PHP projects tend to be highly OO.

Looks like they've really improved it a lot. Not so poopy then now. A good feature is not being locked to OOP always, and being able to write in it or not, which reminds me of AmigaE.

Quote:
That's mostly due to a class-per-file "best practises" approach, tbh.

I find it's good for keeping things organised when a project expands. For example, back in 2007 or thereabouts, CIAgent started off as one source file. I'd never really written anything big that needed to go beyond one file, apart from includes or AmigaE modules, which are compiled objects. But I was using the free GoldED to edit it, which had a limit of 500 lines or something, so I needed to split it up. It turned out to be a good idea, as it keeps expanding and is now organised into ten files or so. I ended up buying CubicIDE, which includes the full GoldED, so it's somewhat ironic that I now don't have line limits but still need to keep the sources split up into different files.

Quote:
Probably. I do sometimes fall foul of mixing the terms. The current design is OO so it will likely be a relatively simple port. It could also be ported to C with a little more effort.

When there's a mention of OOP I immediately think of C++.

Would it be worth the effort porting to C?

I tend to think of what happened with datatypes when they ported Obj-C to C. The mess is still there in OS4, even though OS4 includes OOP ideas, such as methods, that could really have cleaned up the datatypes API.

Hypex

Re: Packed Versus Planar: FIGHT
Posted on 5-Oct-2022 6:19:34

[ #400 ]

Elite Member

Joined: 6-May-2007
Posts: 11351
From: Greensborough, Australia

@cdimauro

Quote:
So, you still haven't the original version.

No, but this one is on Aminet:
http://aminet.net/package/demo/mega/StateOfTheArt

Quote:
I think it was a typo from Skid Row: it should have been "WITH" (not WITHOUT) 680x0.

Yes. When x != 0.

Quote:
Anyway, the important message was that they were upset with the very bad work done (Again! According to them) by those demo makers.

Amigaworld.net was originally founded by David Doyle