Karlos 
Relative performance of some 680x0 code...
Posted on 7-May-2004 2:06:58
#1 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4404
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

Hi,

I have recently been testing out some 680x0 code I wrote across a variety of systems for compatibility purposes.

I have a small program that tests some pixel format conversion routines I wrote and also measures simple read/write/copy operations on RAM<->RAM / RAM<->VRAM.

The code is interesting in that it doesn't rely on OS calls (apart from timing and bitmap locking) and does real work: it operates on data and stores the results, so it isn't "optimized away" by JIT emulation.

Anyhow, I discovered something that I thought was interesting and figured I'd share it.

What I discovered is that under interpretive 68020/FPU emulation on a 240MHz BlizzardPPC/OS4 the code outperformed my real 68040 considerably (up to 2x for the memory copy test).

Out of interest, I decided to try the same thing in UAE, using non-JIT emulation of 68020+FPU (for comparison fairness), using "fastest possible" emulation settings. This was on a fairly old PC, a 500MHz K6-II with 100MHz FSB and AGP 2x Voodoo2.
The results were much slower than I expected, even though the potential bandwidth is a lot higher.

Anyway, here they are. All transfers are for 16x unrolled aligned 32-bit operations (apart from some move16 tests on 040 which are obviously 16 byte aligned).

I've only included the tests that involve moving (rather than converting) data. Thus the memory speed plays an important part here, but of course the PC used has the fastest bandwidth available out of the configurations tested (making the results more surprising).

Test Configuration 1

BlizzardPPC [68040 25MHz] OS3.5 BB2, CGX 4.2
BVision, ScreenMode 1024x768x16 85Hz
256MB 60ns

Read RAM: 23237.34 K/sec
Write RAM: 23692.00 K/sec
RAM->RAM: 8563.27 K/sec
RAM->RAM[16]: 12790.70 K/sec
Read VRAM: 4423.96 K/sec
Write VRAM: 10344.83 K/sec
RAM->VRAM: 6916.43 K/sec
RAM->VRAM[16]: 9248.55 K/sec
VRAM->RAM: 3302.75 K/sec
VRAM->RAM[16]: 4332.13 K/sec


Test Configuration 2

BlizzardPPC 603e 240MHz [68020+FPU emulated, non-JIT], OS4.0Beta (kernel 50.30), P96
BVision, ScreenMode 1024x768x16 85Hz
256MB 60ns

Read RAM: 26164.52 K/sec
Write RAM: 26061.20 K/sec
RAM->RAM: 18343.20 K/sec
RAM->RAM[16]: N/A
Read VRAM: 4660.19 K/sec
Write VRAM: 14371.26 K/sec
RAM->VRAM: 11078.72 K/sec
RAM->VRAM[16]: N/A
VRAM->RAM: 4146.10 K/sec
VRAM->RAM[16]: N/A


Test Configuration 3

Apollo 1240 [68040 28MHz], OS3.9 BB2, P96
Voodoo3000, ScreenMode 1024x768x16PC 75Hz
32Mb 60ns

Read RAM: 31516.35 K/sec
Write RAM: 31641.79 K/sec
RAM->RAM: 12524.85 K/sec
RAM->RAM[16]: 17786.56 K/sec
Read VRAM: 5586.59 K/sec
Write VRAM: 9266.41 K/sec
RAM->VRAM: 7086.61 K/sec
RAM->VRAM[16]: 7894.74 K/sec
VRAM->RAM: 4166.67 K/sec
VRAM->RAM[16]: 6000.00 K/sec


Test Configuration 4

K6-II 500MHz, UAE [68020+FPU emulated, non-JIT], OS3.5 bb2, P96
UAEGFX, ScreenMode 1024x768x16PC 85Hz
256Mb 10ns

Read RAM: 8991.01 K/sec
Write RAM: 8259.59 K/sec
RAM->RAM: 11439.47 K/sec
RAM->RAM[16]: N/A
Read VRAM: 8670.52 K/sec
Write VRAM: 11875.59 K/sec
RAM->VRAM: 8771.93 K/sec
RAM->VRAM[16]: N/A
VRAM->RAM: 8179.16 K/sec
VRAM->RAM[16]: N/A

I really expected a lot more out of UAE than this. I must stress that these tests measure code that *does not* depend on OS calls, giving the OS4 version no native performance advantage.
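
For anyone who prefers ratios to raw figures, here is a rough Python sketch that divides the Configuration 2 numbers by the Configuration 1 and Configuration 4 numbers. The values are copied from the tables above; the function name `speedup` and the dictionary layout are only illustrative and have nothing to do with the test program itself.

```python
# Figures copied from the result tables above, in K/sec.
blizzard_040 = {"Read RAM": 23237.34, "Write RAM": 23692.00,
                "RAM->RAM": 8563.27, "RAM->VRAM": 6916.43}
os4_603e = {"Read RAM": 26164.52, "Write RAM": 26061.20,
            "RAM->RAM": 18343.20, "RAM->VRAM": 11078.72}
uae_k6 = {"Read RAM": 8991.01, "Write RAM": 8259.59,
          "RAM->RAM": 11439.47, "RAM->VRAM": 8771.93}


def speedup(a, b):
    """Return a/b for every test that appears in both result sets."""
    return {test: a[test] / b[test] for test in a if test in b}


vs_040 = speedup(os4_603e, blizzard_040)   # emulated 020 on 603e vs real 040
vs_uae = speedup(os4_603e, uae_k6)         # emulated 020 on 603e vs UAE on K6-II

for test in vs_040:
    print(f"{test:10s} 603e/040: {vs_040[test]:.2f}x   603e/K6-UAE: {vs_uae[test]:.2f}x")
```

The RAM->RAM row is the one behind the "up to 2x" remark for the memory copy test.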

-edit-

Sweeping generalisation follows:

From the above observation that a PPC less than half the clockspeed of the x86 walks all over it for *interpreted* 680x0 emulation, we can conclude PPC rocks, x86 sucks

/me zips up fire retardant asbestos overcoat...

Right, flame on

_________________
Doing stupid things for fun...

DrBombcrater 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 2:30:00
#2 ]
Super Member
Joined: 6-Feb-2004
Posts: 1382
From: UK

@Karlos

Emulators in general and UAE in particular, are very sensitive to cache effectiveness and memory throughput, both of which are very poor on the K6-2. Change that K6 for a chip with on-die L2 cache and a better support chipset and the results would go up considerably, even using the same PC100 memory as the K6.

_________________
Who do you serve, and who do you trust? - Galen

Karlos 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 2:42:42
#3 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4404
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@DrBombcrater

Point taken, but is it poorer than the 603e with its total lack of L2 cache and 32-bit memory interface running at 60MHz?

-edit-

I'm expecting some results from a P4 system running UAE and an A1XE (G4 800MHz) soon...

_________________
Doing stupid things for fun...

Hammer 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 6:08:16
#4 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5285
From: Australia

@Karlos

Note that the K6-2's performance doesn't represent modern x86 performance, owing to numerous changes in the CPU core and the supporting core logic. Do you have the test software available for download (I have WinUAE-JIT/AOS3.9 on K7 Athlon XP/nForce2 400 Ultra as a test box)?

Note that there are some flaws with the K6's decoder, i.e. it can't supply enough decoded instructions to the post-RISC core.

For 65 CPUs from 100 MHz to 3066 MHz benchmarks refer to
http://www4.tomshardware.com/cpu/20030217/index.html

Most of the test systems were equipped with ATI Radeon 9700 VPU card (with exception of a single PC setup i.e. Pentium Classic @100Mhz was fitted with a Geforce 4 TI 4200).

_________________
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB
Amiga 1200 (Rev 1D1, KS 3.2, PiStorm32lite/RPi 4B 4GB/Emu68)
Amiga 500 (Rev 6A, KS 3.2, PiStorm/RPi 3a/Emu68)

EntilZha 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 11:36:45
#5 ]
OS4 Core Developer
Joined: 27-Aug-2003
Posts: 1679
From: The Jedi Academy, Yavin 4

@Karlos

Can you send me the test program ? I'll run it through the JIT...

_________________
Thomas, the kernel guy

"I don't have a frigging clue. I'm norwegian" -- Ole-Egil

All opinions expressed are my own and do not necessarily represent those of Hyperion Entertainment

Rogue 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 11:38:14
#6 ]
OS4 Core Developer
Joined: 14-Jul-2003
Posts: 3999
From: Unknown

@Hammer

Quote:
Note that the K6-2's performance doesn't represent modern x86 performance


Neither does a 603 represent modern PPC performance

_________________
Seriously, if you want to contact me do not bother sending me a PM here. Write me a mail

DrBombcrater 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 12:21:53
#7 ]
Super Member
Joined: 6-Feb-2004
Posts: 1382
From: UK

@Karlos

Quote:
Point taken, but is it poorer than the 603e with its total lack of L2 cache and 32-bit memory interface running at 60MHz?

The K6 is obviously quicker than the 603e but remember that UAE's overhead is much greater than a plain 68k emulator because it's emulating an entire platform, not just the CPU. Cutting away that non-CPU stuff can boost the speed of emulation by a factor of 2 or 3 -- that's why Amithlon is so much quicker than UAE despite using a similar CPU emulation core.

Also, don't underestimate how bad the memory bus can be on K6 systems. At default timings many of them can barely pull 100MB/sec from main memory.

Quote:
I'm expecting some results from a P4 system running UAE and an A1XE (G4 800MHz) soon...

I'd be happy to provide results from Amithlon on AthlonXP hardware, if you want them

_________________
Who do you serve, and who do you trust? - Galen

Georg 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 12:32:04
#8 ]
Regular Member
Joined: 14-May-2003
Posts: 451
From: Unknown

@Karlos

Quote:

From the above observation that a PPC less than half the clockspeed of the x86 walks all over it for *interpreted* 680x0 emulation, we can conclude PPC rocks, x86 sucks


It would be better to do the comparison between Linux/PPC/UAE and Linux/x86/UAE ...

KimmoK 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 12:41:29
#9 ]
Elite Member
Joined: 14-Mar-2003
Posts: 5211
From: Ylikiiminki, Finland

@DrBombcrater

"Cutting away that non-CPU stuff can boost the speed of emulation by a factor of 2 or 3 -- that's why Amithlon is so much quicker than UAE despite using a similar CPU emulation core."

I thought it's more because of JIT and the fact that it is using native GFX HW instead of emulated.

What is the real difference of UAE and Amithlon on CPU or memory intensive tests?

"I'd be happy to provide results from Amithlon on AthlonXP hardware, if you want them "

That would be interesting.


Also running the test on AOS4 version of UAE would be interesting... insane as well, but aren't we all a little bit

_________________
- KimmoK
// For freedom, for honor, for AMIGA
//
// Thing that I should find more time for: CC64 - 64bit Community Computer?

Karlos 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 13:19:46
#10 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4404
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@all

To my mind JIT warps the results for any benchmark, although I find it difficult to see how a JIT could optimise away the code in this test (it actually generates data that is written back to memory). The exception is the memory read tests, which naturally don't do anything useful (a big fat unrolled loop of move.l (a0)+, d0).

You can get the test program here.

Usage is

pixeltest [width < w >] [height < h >] [fullscreen < d >] [fmt < f >]

Where:

w/h = test area dimensions (default is 640x480)

d = depth for fullscreen test. Must be >= 15. Defaults to running in a window. You must have a 15 bit or higher workbench.

f = absolute pixel format of source data (default is same format as display)

valid values for this parameter are

rgb16b, rgb15l, brg15b, bgr15l
rgb16b, rgb16l, brg16b, bgr16l
rgb24p, bgr24p
argb32b, argb32l, abgr32b, abgr32l
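
For readers unfamiliar with names like these, here is a small Python sketch of how they might be decoded. It assumes the pattern is channel order, then bits per pixel, then a one-letter suffix, and it guesses that b/l/p stand for big-endian, little-endian and packed. Those meanings are an assumption on the reader's side, not something the program documents, and `parse_pixel_format` is a made-up helper.

```python
import re

# Assumed naming scheme: <channel order><bits per pixel><suffix>.
# The suffix meanings below are a guess (b/l = byte order, p = packed).
_FORMAT = re.compile(r"^([argb]+)(\d+)([blp])$")
_SUFFIX = {"b": "big-endian", "l": "little-endian", "p": "packed"}


def parse_pixel_format(name):
    """Split a format name such as 'argb32l' into its three parts."""
    match = _FORMAT.match(name)
    if match is None:
        raise ValueError(f"unrecognised pixel format: {name!r}")
    order, bits, suffix = match.groups()
    return {"order": order, "bits": int(bits), "layout": _SUFFIX[suffix]}


print(parse_pixel_format("argb32l"))
print(parse_pixel_format("rgb24p"))
```

The sketch only splits the name apart; it makes no claim about which combinations the program actually accepts.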

Be warned, it has bugs that can crash your system if you use 24 bit packed source data and strange widths that cause a modulus. It is only a test program written for my own use so I'm not about to waste time fixing it (it seems the bug is in the actual application when it does the memory write test, not the 24-bit -> X conversion code).

If you simply run it without any arguments, as I did here, it will open a 640x480 window on your (15 bit or higher) workbench and run its battery of tests. The conversion test can largely be ignored because it will effectively be only a scanline copy in this case.

To be fair to the original tests, I recommend using a 16-bit 1024x768 display at about 85 Hz refresh (or as close as you can get). Don't worry about the endian nature of the screen since the read/write/copy tests don't do any conversion.

The code generates more output than I posted (various information about the screen/test data format and so on), I only put the raw read/write/copy speeds.

@EntilZha & Hammer

Sure, I'd be interested to see the results but I felt the non-JIT tests were slightly more representative. I expect that on any type of optimising JIT, the read test performance will be sky high. You might have to watch out for overflow there

I'm also keen to see interpreted mode tests on these systems, so if you could run those also, I'd be grateful.

@Rogue

Exactly what I was thinking too.

Still, in playing around with the thing I feel comfortable in saying the interpretive 020/FPU emulation in OS4 is of a comparable speed to my real 040 - some things are appreciably quicker also - no doubt code that ultimately calls the OS gets a boost if it is for anything cpu intensive

_________________
Doing stupid things for fun...

DrBombcrater 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 13:30:44
#11 ]
Super Member
Joined: 6-Feb-2004
Posts: 1382
From: UK

@KimmoK

Quote:
I thought it's more because of JIT and the fact that it is using native GFX HW instead of emulated.

It's hard to make a direct comparison as Amithlon is built around JIT and pretty much grinds to a halt if you switch it off, but the gap between UAE with JIT and Amithlon with JIT is huge.

GFX hardware shouldn't be much of an issue with a test like this, but it is possible to disable hardware gfx acceleration in Amithlon to even things up.

Quote:
What is the real difference of UAE and Amithlon on CPU or memory intensive tests?

I find Amithlon is 2-4 times faster than UAE-JIT, depending on the task and the hardware involved.

_________________
Who do you serve, and who do you trust? - Galen

Karlos 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 13:42:59
#12 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4404
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

Graphics acceleration makes no difference in these tests as everything is performed by the CPU. Only refreshing the window uses blitting and that's outside the timing loop anyway.

For the curious, the benchmarking works by repeating a test a number of times, timing each iteration and accumulating the total time. Once this accumulated time exceeds a threshold, the value is calculated based on the total number of iterations and the accumulated time. I believe this to be much less error prone than timing a fixed number of iterations which may result in small interval measure problems.

Strictly, the timing is carried out using the EClock, but considered accurate to the nearest millisecond.
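
As a rough illustration of that scheme (not the actual 680x0 code), here is a Python sketch. The function name `measure_throughput`, the default threshold, and the choice of 1 K = 1024 bytes are all assumptions made for the example; the real program uses the EClock rather than `time.perf_counter`.

```python
import time


def measure_throughput(test, bytes_per_iteration, threshold_s=1.0,
                       clock=time.perf_counter):
    """Repeat `test` until the accumulated time reaches the threshold.

    Each iteration is timed on its own and added to a running total, so
    only time spent inside the test is counted. Returns K/sec, taking
    1 K as 1024 bytes (an assumption).
    """
    total_time = 0.0
    iterations = 0
    while total_time < threshold_s:
        start = clock()
        test()
        total_time += clock() - start
        iterations += 1
    kbytes = iterations * bytes_per_iteration / 1024.0
    return kbytes / total_time


# Example: time a buffer copy the size of a 640x480 16-bit window.
src = bytearray(640 * 480 * 2)
dst = bytearray(len(src))


def copy_test():
    dst[:] = src


rate = measure_throughput(copy_test, len(src), threshold_s=0.05)
print(f"RAM->RAM: {rate:.2f} K/sec")
```

Because the loop runs until a time threshold is reached rather than for a fixed count, a very fast test simply gets more iterations instead of producing an interval too short to measure.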


_________________
Doing stupid things for fun...

KimmoK 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 13:53:47
#13 ]
Elite Member
Joined: 14-Mar-2003
Posts: 5211
From: Ylikiiminki, Finland

@DrBombcrater

I think AmigaHW emulation should not play any part on CPU/memory intensive test. ( unless 7mhz 16bit memory is emulated )

Here is an example: http://www.volny.cz/luky-amiga/Benchmarks.html

CPU68k test with JIT:
WinUAE Athlon 1800+ 1381
Amithlon Athlon 1700+ 1295

So it seems that UAE and Amithlon are as fast on CPU intensive tests.

btw. it seems that non-JIT 68k emulation sucks hard on x86

_________________
- KimmoK
// For freedom, for honor, for AMIGA
//
// Thing that I should find more time for: CC64 - 64bit Community Computer?

Anonymous 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 14:49:57
# ]


Athlon64 3200+ on WindowsXP Pro with WinUAE 0.8.26b3 (JIT):

Read RAM : 299400.00 K/sec
Write RAM : 299400.00 K/sec
RAM->RAM : 149700.60 K/sec
RAM->RAM (OS) : 1248600.00 K/sec
RAM->RAM[16] : 149700.60 K/sec
Read VRAM : 297604.79 K/sec
Write VRAM : 300000.00 K/sec
RAM->VRAM : 141690.96 K/sec
RAM->VRAM[16] : 142829.27 K/sec
VRAM->RAM : 148126.80 K/sec
VRAM->RAM[16] : 147842.76 K/sec
Conversion : 142551.12 K/sec [output bandwidth]
Conversion : 72986173.32 pix/sec

 
Karlos 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 17:50:32
#15 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4404
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Enverex

So what was the non-JIT performance like? I'm actually more keen to see that since I can't be sure the JIT hasn't invalidated some of the tests.

_________________
Doing stupid things for fun...

Karlos 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 17:55:40
#16 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4404
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

Here are the results for interpretive 68020/FPU emulation on A1 800MHz 750FX PPC, 133 MHz FSB

Read RAM: 118444.67 K/sec
Write RAM: 118326.69 K/sec
RAM->RAM: 60059.46 K/sec
RAM->RAM[16]: N/A
Read VRAM: 119760.48 K/sec
Write VRAM: 144827.59 K/sec
RAM->VRAM: 54493.31 K/sec
RAM->VRAM[16]: N/A
VRAM->RAM: 59940.06 K/sec
VRAM->RAM[16]: N/A

@all

Note the RAM->RAM (OS) is a quick and dirty CopyMem() test added at the last minute. This will invariably end up calling the OS to do the job, so I haven't included those figures, for fairness, since there's no guarantee the (emulated) 680x0 is performing that operation.

_________________
Doing stupid things for fun...

GadgetMaster 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 19:32:03
#17 ]
Cult Member
Joined: 26-Dec-2002
Posts: 603
From: TrustVille

@Karlos

Here are my results running latest WINUAE on a 2.4Ghz Pentium 4 CPU 266Mhz memory bus speed (Emulated AmigaOS 3.1) at 1024x768 16Bit PC @ 85 Hz (No JIT)

Read RAM : 115200.00 K/sec
Write RAM : 107462.69 K/sec
RAM->RAM : 79446.64 K/sec
RAM->RAM (OS) : 90443.35 K/sec
RAM->RAM[16] : N/A
Read VRAM : 119174.04 K/sec
Write VRAM : 115820.90 K/sec
RAM->VRAM : 81466.80 K/sec
RAM->VRAM[16] : N/A
VRAM->RAM : 84282.91 K/sec
VRAM->RAM[16] : N/A

Interesting indeed.

_________________
Trust me. I'm a doctor.

DrBombcrater 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 21:49:00
#18 ]
Super Member
Joined: 6-Feb-2004
Posts: 1382
From: UK

@Karlos

I can't get your program to run successfully on Amithlon. On a 16-bit screen it crashes as soon as the window opens, and on a 32-bit screen it runs then brings down the OS.

There isn't time to note down all the results before the crash, but I did get the first two tests (RAM read/write), which both come out at 1200000 K/sec on my XP 3200+ with Amithlon in JIT mode.

_________________
Who do you serve, and who do you trust? - Galen

Karlos 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 22:13:05
#19 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4404
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@DrBombcrater

Bugger I dunno. Seems to work everywhere else - from 040+CGX to 040+P96 to 603, 750, G4 and various UAE.

I know of one bug that affects 24 bit source data during the mem copy test.

Perhaps Amithlon has some issues? The code queries and locks bitmaps using CGX v3 compliant code. Locks are not held for longer than a frame typically.

-edit-

Pity it doesn't work properly - the read/write tests are probably wrecked by the JIT optimising them away.

Is it possible to test in non JIT mode?

_________________
Doing stupid things for fun...

Karlos 
Re: Relative performance of some 680x0 code...
Posted on 7-May-2004 22:15:42
#20 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4404
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@GadgetMaster

Compare that to the PPC 750 @ 800 MHz 133 FSB
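
One crude way to make that comparison is to divide the non-JIT "Read RAM" figures by clock speed. The Python sketch below does this with numbers copied from the posts above; the labels and the `per_mhz` helper are only illustrative, and clock speed is of course just one factor (memory bus and cache differ a lot between these machines).

```python
# (system, clock in MHz, "Read RAM" in K/sec), all interpreted
# 68020+FPU emulation, figures copied from the posts above.
results = [
    ("BlizzardPPC 603e, OS4", 240, 26164.52),
    ("K6-II, UAE", 500, 8991.01),
    ("A1 750FX, OS4", 800, 118444.67),
    ("Pentium 4, WinUAE", 2400, 115200.00),
]


def per_mhz(rows):
    """Return K/sec divided by host clock speed for each system."""
    return {name: rate / mhz for name, mhz, rate in rows}


normalised = per_mhz(results)
for name, value in sorted(normalised.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:24s} {value:8.1f} K/sec per MHz")
```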

-edit-

More A1 results just in:

A1 XE PPC 7400 800MHz (68020+FPU emulated, no JIT), OS4 50.9 Beta
Radeon 7000, 1024x768 16Bit PC

Read RAM : 99701.49 K/sec
Write RAM : 99800.80 K/sec
RAM->RAM : 74701.20 K/sec
RAM->RAM (OS) : 99401.20 K/sec
RAM->RAM[16] : N/A
Read VRAM : 100000.00 K/sec
Write VRAM : 99708.45 K/sec
RAM->VRAM : 73714.29 K/sec
RAM->VRAM[16] : N/A

_________________
Doing stupid things for fun...
