AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Poll : AmigaOS4 KVM/Emulation
I would get AmigaOS4 Forever Edition/check out emulation
I already run OS4 in Emulation
Interesting, see where this goes...
AmigaOS4 Hardware only!
Not interested in Emulation
Not interested in OS4
Pancakes!
 
V8 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 23-Oct-2023 9:22:57
#161 ]
Regular Member
Joined: 30-Mar-2022
Posts: 134
From: Unknown

@NutsAboutAmiga

Quote:
you can’t get free chip memory from Execbase, you need to call AvailMem(), so they dropped support for software written for Kickstart 1.2. AvailMem function does not exist in Kickstart 1.2, if want support it all you need to do some version checking.


This makes no sense at all. Why would the OS4 developers do this?
Sure, I get it that you may need to add new calls with a new API for these kinds of things.

But why remove the old API? Why? Why not leave it in for compatibility reasons and just document "new software should only use the new API, old API is deprecated but still available for backward compatibility".

Just leaving the old API in would have kept compatibility. It would even have been LESS work. Instead they did the extra work to remove the old API?


Is this really what they did or did I misunderstand something? Because this sounds less than optimal.

Karlos 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 23-Oct-2023 12:27:35
#162 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4405
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@matthey

Quote:
In theory, PPC code shouldn't be much more difficult to emulate than 68k code...


In theory, theory is like practise. In practise, it isn't. PPC FPU has entered the chat...

Even without MMU there are challenges. Emulating the integer unit of the PPC is perfectly doable and QEMU does quite a good job there. The FPU, though, is tricky. It has a lot of intricate behaviours that made it quite performant in the day but are not trivial to emulate. In the old 68K days, an FPU was optional for AmigaOS so most code tended not to use one. However, OS4 and MOS were built on a minimum PPC specification that includes an FPU and AFAIK have never targeted anything without one (does anyone know of any counterexamples?). I would hazard a guess that as a result, both systems more than happily use the FPU wherever it makes sense to do so, because it's always there and performs better than some integer based fudge would.

The most compatible PPC FPU emulation in QEMU is still software based. Therein lies the challenge. Not only is the floating point performance hobbled by having to use a software implementation, but the integer performance also suffers because the host integer unit is busy executing soft-emulated floating point operations and not the translated integer code.
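To make "intricate behaviours" a bit more concrete, here is a minimal Python sketch of one candidate. The choice of example is an assumption on my part, since no specific instruction is named above: PowerPC has fused multiply-add (fmadd), which rounds once, whereas a separate multiply followed by an add rounds twice. The helper names are made up for illustration, and exact rational arithmetic stands in for the single rounding.

```python
from fractions import Fraction

def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round to double once (fmadd-style)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # Fraction -> float performs one correct rounding

def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Round after the multiply and again after the add."""
    return (a * b) + c

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fused_multiply_add(a, b, c)
unfused = separate_multiply_add(a, b, c)
print("fused:", fused, "unfused:", unfused)
```

The two results differ because the exact product 1 - 2^-60 rounds to 1.0 before the add in the unfused case. An emulator that wants bit-exact results cannot simply map such an operation onto two host instructions, which is one reason a software implementation is the safe default.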

I don't know how vector unit emulation fares (it might be a lot better) but like the FPU in the original 68K Amiga, that's something optional. Not every PPC currently running an "NG" OS has one. I know that MOS applications tend to be optimised for that, but presumably use the vanilla FPU for anything not explicitly vectorisable. Which tends to be most non-stream-processing things.

_________________
Doing stupid things for fun...

umisef 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 23-Oct-2023 14:04:34
#163 ]
Super Member
Joined: 19-Jun-2005
Posts: 1714
From: Melbourne, Australia

@Kronos

Quote:
I do remember what my AthlonXP Amithlon box could do, and there 50% was an extremely best case cherry picked result.


Not entirely true --- the best cherry picked result was for RC5, for which the 68k code had a giant inner loop without any branches; And that one, from memory, did considerably better than 50%.

Also, keep in mind that things have somewhat changed in the last 22 years. Amithlon, I believe, was designed to run well on (a) machines with 32MB of RAM and (b) just a single core. That severely limited what it could do with regards to compiling through branches and subroutine calls. Also, it was running on an x86 host, which left it with bugger all host registers to use, meaning there was a lot of shuffling 68k registers from and to host memory.

These days, you'd easily set aside a GB or two for keeping track of your compilation, and would separate the job of determining what to compile (in the main execution thread) from the job of actually doing the compile (in a separate thread, or even multiple). This would allow farming out the actual compile to separate cores while the main execution continues unimpeded (albeit at that point interpretatively) --- which reduces the impact of compilation on latency.
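A minimal sketch of that split, in Python rather than anything resembling real JIT code: the main thread keeps interpreting and only decides what is hot, while a worker thread does the "compile". Everything here is hypothetical (the `TieredExecutor` class, the threshold, the toy instruction format), and `compile_block` is only a stand-in that builds a closure rather than generating host code.

```python
import queue
import threading

HOT_THRESHOLD = 3  # hypothetical: executions before a block counts as hot

def interpret(block, x):
    """Slow path: walk the guest 'instructions' one at a time."""
    for op, arg in block:
        x = x + arg if op == "add" else x * arg
    return x

def compile_block(block):
    """Stand-in for code generation: wrap the block in one host closure."""
    ops = list(block)
    def compiled(x):
        for op, arg in ops:
            x = x + arg if op == "add" else x * arg
        return x
    return compiled

class TieredExecutor:
    def __init__(self):
        self.counts = {}        # how often each block was interpreted
        self.compiled = {}      # finished translations, filled by the worker
        self.requested = set()  # blocks already queued for compilation
        self.work = queue.Queue()
        threading.Thread(target=self._compiler, daemon=True).start()

    def _compiler(self):
        # Runs on a separate thread so compilation never blocks execution.
        while True:
            key, block = self.work.get()
            self.compiled[key] = compile_block(block)
            self.work.task_done()

    def run(self, key, block, x):
        fast = self.compiled.get(key)
        if fast is not None:
            return fast(x)
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] >= HOT_THRESHOLD and key not in self.requested:
            self.requested.add(key)
            self.work.put((key, block))  # hand off, keep interpreting
        return interpret(block, x)

block = (("add", 2), ("mul", 3))
executor = TieredExecutor()
results = [executor.run("loop", block, i) for i in range(10)]
executor.work.join()  # only so the example can inspect the finished state
```

The main thread never waits for the compiler; it simply starts using the translation once it shows up in the cache.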

Put all that together, and a 2023 JIT has hugely more opportunity to optimise the living daylights out of the code it generates, compared to the 2001 Amithlon JIT.

pixie 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 23-Oct-2023 14:40:47
#164 ]
Elite Member
Joined: 10-Mar-2003
Posts: 3153
From: Figueira da Foz - Portugal

@umisef

Quote:
Not entirely true --- the best cherry picked result was for RC5, for which the 68k code had a giant inner loop without any branches; And that one, from memory, did considerably better than 50%.


Indeed, on Emu68 it is one of those cases where it goes beyond 1:1.

JIT Statistics
1979 MIPS - M68k speed
1800 MIPS - ARM speed

Quote:
What you see here is m68k code doing more instructions per second than AArch64 cpu clock ticks. Ah, it is not only the code above, this ARM counter does take into account entire aarch64 environment, including JIT translator and JIT loop. We see here the power of superscalar aarch64 which can issue more than one instruction per cpu cycle and this is happening with RC5-72. The Effectiveness here stopped at 100% but actually it should have gone almost up to 110%...

Michal Schulz: Emu68 0.15.3 - new nightly
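For reference, the "almost up to 110%" follows directly from the two figures in the JIT statistics. A quick check (variable names are mine):

```python
m68k_mips = 1979  # guest-side figure from the JIT statistics above
arm_mips = 1800   # host-side figure from the same statistics

ratio = m68k_mips / arm_mips
print(f"{ratio:.1%}")
```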

_________________
Indigo 3D Lounge, my second home.
The Illusion of Choice | Am*ga

Matt3k 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 23-Oct-2023 16:09:31
#165 ]
Regular Member
Joined: 28-Feb-2004
Posts: 223
From: NY

@pixie
Say you shift for Arm, you could do it just as you did up till now, fall back on 68k software when needed, on x86 it would be harder to do it like that.


Good point, and I would hope that is factored in to a decision.

At the same time, having all the same apps in the same OS will make it easier to cut ties with the past, since you have a full solution running natively already. I think it also helps to have the web browser and email client coders as part of the core team, so the pieces needed under the hood come together with as little drama as possible. The code has been nurtured and developed for 30 years by the same team.

That is as opposed to having separate code for the same OS, some of it not "legal", half-started programs and initiatives, coders who have come and gone over the years and are not coming back, unfinished drivers, and multicore set to work first on only one system that you can't even buy and that will likely never be completed, like Timberwolf, System 54, etc. (by a part-time person). It is just a hot mess, to be kind... It's done at this point and has been for a while, but we can't seem to wake up from the dream. I'm not saying stop entirely, but do a real evaluation of the status, make some tough and sobering decisions, and use more substantial and proven resources that have a track record to help everyone instead of the current plan. I'm not saying this to be cruel, but to be objective and helpful.

Karlos 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 23-Oct-2023 16:55:16
#166 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4405
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@umisef

That? Oh that's not a nerd sniper rifle at all.. it's er, my new lamp stand. Yeah. Lampstand.

No no. Those are my, er, electric wind chimes. Yeah, indoor. Not the sound of ammunition cartridges falling down a stairwell. It just sounds like it.


Honest.

_________________
Doing stupid things for fun...

NutsAboutAmiga 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 23-Oct-2023 17:30:45
#167 ]
Elite Member
Joined: 9-Jun-2004
Posts: 12825
From: Norway

@pixie

I think this kind of thing shows that MIPS is not a reliable benchmark.
2 x 32bit writes can be combined to 1 x 64bit write.
4 x 16bit writes can be combined to 1 x 64bit write.
2 x 8bit writes can be combined to 1 x 16bit write, and so on.

The benchmark tool thinks you executed the operations on the left, but you actually executed the operations on the right.
What this means is that the benchmark becomes just a lie.
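To illustrate the counting argument, here is a small Python sketch of a hypothetical combiner that merges runs of adjacent writes. It is not how any particular emulator or CPU does it, and it ignores alignment and power-of-two size rules that real hardware would impose.

```python
def coalesce(writes, max_bytes=8):
    """Merge runs of adjacent writes into single wider writes.

    writes: list of (address, size_in_bytes) in program order.
    Returns the (address, size) list a combining host might issue.
    """
    merged = []
    for addr, size in writes:
        if merged:
            last_addr, last_size = merged[-1]
            contiguous = last_addr + last_size == addr
            if contiguous and last_size + size <= max_bytes:
                merged[-1] = (last_addr, last_size + size)
                continue
        merged.append((addr, size))
    return merged

guest = [
    (0x1000, 4), (0x1004, 4),                            # 2 x 32bit
    (0x2000, 2), (0x2002, 2), (0x2004, 2), (0x2006, 2),  # 4 x 16bit
]
host = coalesce(guest)
print(len(guest), "guest writes ->", len(host), "host writes")
```

A tool that counts the six guest writes as six executed operations reports three times the work the host actually did for this fragment.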

Last edited by NutsAboutAmiga on 23-Oct-2023 at 05:31 PM.

_________________
http://lifeofliveforit.blogspot.no/
Facebook::LiveForIt Software for AmigaOS

michalsc 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 23-Oct-2023 17:43:41
#168 ]
AROS Core Developer
Joined: 14-Jun-2005
Posts: 377
From: Germany

@matthey

Quote:
AArch64 is little endian only with a few endian conversion instructions to convert big endian data


Not true. Many AArch64 CPUs support both big and little endian modes. Emu68 is actively using the AArch64 CPU in big endian mode and because of that does not need any further data manipulation instructions to adjust the endianness of fetched data.
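A small Python illustration of what the big endian mode saves. This only models the byte order of a load; it is not Emu68 code.

```python
import struct

memory = bytes([0x12, 0x34, 0x56, 0x78])  # a 68k longword as stored in RAM

# Host running big endian: the load already yields the guest's value.
big_endian_load = struct.unpack(">I", memory)[0]

# Host running little endian: the raw load comes out byte-reversed...
little_endian_load = struct.unpack("<I", memory)[0]
# ...so a translator has to add a byte swap (REV on AArch64) after it.
swapped = int.from_bytes(little_endian_load.to_bytes(4, "little"), "big")

print(hex(big_endian_load), hex(little_endian_load), hex(swapped))
```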

michalsc 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 23-Oct-2023 17:46:07
#169 ]
AROS Core Developer
Joined: 14-Jun-2005
Posts: 377
From: Germany

@NutsAboutAmiga

Quote:
I think this kind of thing shows that MIPS is not a reliable benchmark.
2 x 32bit writes can be combined to 1 x 64bit write.

Emu68 does that sometimes, and so does the 68060 with its write buffer.

Quote:

4 x 16bit writes can be combined to 1 x 64bit write.

Emu68 does not do that.

Quote:

2 x 8bit writes can be combined to 1 x 16bit write, and so on.

Emu68 does not do that.

Quote:

The benchmark tool thinks you executed the operations on the left, but you actually executed the operations on the right.
What this means is that the benchmark becomes just a lie.


Every benchmark is a lie to some degree, I repeat it almost every time. Much better is using real software and comparing the performance that way.

Karlos 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 23-Oct-2023 19:01:51
#170 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4405
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

Does it matter that much how writes are combined if the target memory is cacheable? I would've thought line transfers would take care of that for you regardless.

I agree regarding benchmarks. Real world applications are a better fit. Even then someone will find a reason to disagree because you didn't test their application of choice.

Most applications are IO bound most of the time, waiting for something to happen. So you are left with compute-bound stuff. I thought the lightwave example was a good one. Computationally dense, lots of memory access, lots of branching and basically a nightmare for silicon and emulation alike.

But people object because it's too floating pointy, or it's too "doesn't make my hardware preference feel suitably validated and it's obsolete old application anyway"-ish.

You can't win either way. Keep on making Emu68, sir. It's kicking arse.

Last edited by Karlos on 23-Oct-2023 at 07:05 PM.
Last edited by Karlos on 23-Oct-2023 at 07:04 PM.

_________________
Doing stupid things for fun...

Karlos 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 23-Oct-2023 19:03:35
#171 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4405
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@agami

Quote:

agami wrote:
@Karlos

Quote:
Karlos wrote (edited):

Real Amiga hardware, unreal 68K performance

That should be the official marketing tag for the PiStorm32.



I am fully in favour

_________________
Doing stupid things for fun...

matthey 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 23-Oct-2023 19:58:10
#172 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2024
From: Kansas

Karlos Quote:

In theory, theory is like practise. In practise, it isn't. PPC FPU has entered the chat...

Even without MMU there are challenges. Emulating the integer unit of the PPC is perfectly doable and QEMU does quite a good job there. The FPU, though, is tricky. It has a lot of intricate behaviours that made it quite performant in the day but are not trivial to emulate. In the old 68K days, an FPU was optional for AmigaOS so most code tended not to use one. However, OS4 and MOS were built on a minimum PPC specification that includes an FPU and AFAIK have never targeted anything without one (does anyone know of any counterexamples?). I would hazard a guess that as a result, both systems more than happily use the FPU wherever it makes sense to do so, because it's always there and performs better than some integer based fudge would.

The most compatible PPC FPU emulation in QEMU is still software based. Therein lies the challenge. Not only is the floating point performance hobbled by having to use a software implementation, but the integer performance also suffers because the host integer unit is busy executing soft-emulated floating point operations and not the translated integer code.


PPC AmigaOS 4 had a standard FPU spec until the A1222. The 68k AmigaOS 3 could use software math libraries but I expect using these is slower than emulating FPU instructions in most cases. The most common emulated FPU instructions likely translate to few instructions and sometimes a 1:1 translation to the host FPU/SIMD unit even though some PPC FPU instructions are unusual. The PPC FPU is no doubt more difficult to emulate than the 68060 hardware FPU instructions but perhaps of similar complexity to emulating a 6888x, which does not appear to be a problem for performance, at least when extended precision is dropped, which can cause problems. I still see MMU emulation as being more difficult. MMUs vary more in features and instructions than basic FPU support, so translating hardware MMU support to the host MMU is difficult and emulating that support is likely difficult and low performance. Heavy FPU use in PPC AmigaOS 4 could make the FPU emulation the larger bottleneck, much like the A1222 standard PPC FPU trapping, although that has the added overhead of a trap on every FPU instruction (the 68040/68060 does not trap on the most common FPU instructions and FPU registers increase the efficiency). It would be better to patch PPC programs once at startup but that would be a significantly more difficult programming job. It was a major mistake to choose a PPC CPU without a standard FPU. This hints at another problem of PPC that makes it difficult to emulate. PPC CPUs are not nearly as standardized as AArch64 CPUs and specifications/ISAs changed from the original desktop standard to less standard embedded standards.

Karlos Quote:

I don't know how vector unit emulation fares (it might be a lot better) but like the FPU in the original 68K Amiga, that's something optional. Not every PPC currently running an "NG" OS has one. I know that MOS applications tend to be optimised for that, but presumably use the vanilla FPU for anything not explicitly vectorisable. Which tends to be most non-stream-processing things.


A SIMD unit is not part of the pseudo PPC AmigaOS 4 standard so is nothing to worry about. Just make the PPC CPU detect as a CPU without a SIMD unit. Trying to get extra performance by translating PPC SIMD unit instructions to the host SIMD instructions is not worthwhile for PPC AmigaOS emulation where even basic PPC emulation doesn't attract much open source developer interest. There are not a bunch of PPC assembler coders that are going to show up to help unlike the many 68k assembler fans.

Karlos 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 23-Oct-2023 21:56:44
#173 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4405
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@matthey

Quote:

PPC AmigaOS 4 had a standard FPU spec until the A1222


That must be why it took a little bit of extra time to get into production, eh?

_________________
Doing stupid things for fun...

matthey 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 23-Oct-2023 22:18:40
#174 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2024
From: Kansas

umisef Quote:

Not entirely true --- the best cherry picked result was for RC5, for which the 68k code had a giant inner loop without any branches; And that one, from memory, did considerably better than 50%.


pixie Quote:

Indeed, on Emu68 it is one of those cases where it goes beyond 1:1.

JIT Statistics
1979 MIPS - M68k speed
1800 MIPS - ARM speed

Michal Schulz: Emu68 0.15.3 - new nightly


Let's take a closer look at this emu68 RC5 code translation.



68k: 41 instructions, 148 bytes
ARM: 47 instructions, 188 bytes (+15% instructions, +27% code size)

This cherry picked code is not bad for an emulation translation but the 68k still has a significant advantage according to instruction count and code density metrics. The compiler emitting the 68k code has issues with CISC code generation which are handicapping it though. The poor register management results in extra MOVE mem,reg instructions instead of OR mem,reg instructions in at least 3 cases (the last use of a register variable does not need to be saved, allowing a reg-mem operation on it).

68k: 38 instructions, 142 bytes
ARM: 47 instructions, 188 bytes (+24% instructions, +32% code size)
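The percentages in both comparisons can be reproduced from the raw counts. A quick Python check (the `overhead` helper and the labels are mine):

```python
def overhead(arm, m68k):
    """Percentage by which the ARM figure exceeds the 68k figure."""
    return (arm / m68k - 1) * 100

ARM_INSNS, ARM_BYTES = 47, 188

for label, m68k_insns, m68k_bytes in [
    ("as compiled", 41, 148),
    ("with reg-mem ops", 38, 142),
]:
    print(
        label,
        f"instructions +{overhead(ARM_INSNS, m68k_insns):.0f}%",
        f"code size +{overhead(ARM_BYTES, m68k_bytes):.0f}%",
    )
```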

Now we will look at how this code executes on an in-order superscalar ARM Cortex-A53 vs an in-order superscalar 68060. We will assume that all data is in the L1 cache.

The Cortex-A53 load/store unit is in a different pipeline from the execution units so there is a 3 cycle load-to-use latency or penalty on all loads before another instruction can use the loaded data. I count 18 loads so 18*3=24 cycles of the Cortex-A53 doing nothing as I don't see a single case where a non-dependent instruction comes after the load that could be executed. I only see a couple of opportunities for superscalar execution of 2 instructions at once on the Cortex-A53 with this code but this does not come close to offsetting the load-to-use penalties. I would estimate total cycles would be 47+24-2=69 which is significantly less than one instruction per cycle through this code.

The 68060 pipelines the addressing mode calculation with the execution pipeline so there is no load-to-use penalty after a load. The 68060 has 2 integer pipes that can do an addressing mode calculation like this vs the Cortex-A53 only having a single load/store unit. The 68060 sometimes allows multiple loads/stores in the same cycle by using a multi-banked cache while the Cortex-A53 has a single load/store unit that only allows one load or store per cycle. The 68060 has optimizations so that a MOVE.L mem,reg+OP.L reg,EA and OP.L EA,reg+OP.L reg,mem can have superscalar execution which helps it here.

The 68060 instructions per cycle is greater than one through this code while the Cortex-A53 is about 0.68 (47/69) and the 68060 has fewer instructions to execute. An in-order Cortex-A53 needs instruction scheduling for performance while the 68060 is a memory munching monster that performs well without instruction scheduling. The Cortex-A53 has much larger caches and much higher clock speeds which make up for some of the deficit but the RISC bottlenecks remain.

OoO execution can reduce the load-to-use RISC bottleneck and increase performance but with increased area, power and cost.
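The cycle estimate above can be written out as a small Python sketch. It takes the 24 stall cycles used in the total as given and treats the stall count as a parameter; note that charging all 18 loads the full 3 cycles would give 54 stall cycles instead, so that case is computed as well. The function name is made up for illustration.

```python
def estimated_ipc(instructions, stall_cycles, dual_issue_pairs):
    """In-order estimate: one cycle per instruction, plus stall cycles,
    minus one cycle for each pair of instructions issued together."""
    cycles = instructions + stall_cycles - dual_issue_pairs
    return cycles, instructions / cycles

# Figures used in the post: 47 instructions, 24 stall cycles, 2 pairs.
cycles, ipc = estimated_ipc(47, 24, 2)
print(cycles, round(ipc, 2))

# Same model if every one of the 18 loads costs a full 3 cycles.
worst_cycles, worst_ipc = estimated_ipc(47, 18 * 3, 2)
print(worst_cycles, round(worst_ipc, 2))
```

Either way the model lands well below one instruction per cycle, which is the point being made.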

michalsc Quote:

Not true. Many AArch64 cpus support both big and little endian modes. Emu68 is actively using AArch64 CPU in big endian mode and because of that does not need any further data manipulation instructions to adjust endianness of fetched data.


Most AArch64 CPUs do currently support 32 bit AArch32 and big endian support that comes with it but is that going away eventually? The low end RPi likely could and would use older cores with 32 bit support as their OS defaults to 32 bit because the older RPis did not have AArch64 support. Eventually, these AArch64 only cores may replace them. Who would have thought the RPi 5 would have moved up to a Cortex-A76 already? For your emu68 project it is fine as older RPis will remain available but it could be important if planning to port the 32 bit big endian AmigaOS to the RPi while hoping to retain similar compatibility as 68k emulation like PPC AmigaOS 4. ARM CPUs may be 64 bit only by the time AmigaOS 4 is ported to ARM in Amiga Neverland.

Last edited by matthey on 24-Oct-2023 at 01:23 PM.
Last edited by matthey on 23-Oct-2023 at 10:28 PM.
Last edited by matthey on 23-Oct-2023 at 10:25 PM.

Hans 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 24-Oct-2023 3:11:43
#175 ]
Elite Member
Joined: 27-Dec-2003
Posts: 5067
From: New Zealand

@Karlos

Quote:
Does it matter that much how writes are combined if the target memory is cacheable? I would've thought line transfers would take care of that for you regardless.

It matters for writes. Imagine writing a single byte to a cache-line. The memory controller will load the entire cache line in, and then overwrite a single byte. That's fine for the single byte case. However, it's a huge waste if you're going to overwrite the whole cache line, anyway.

It's easier for the memory controller to know what to do if you're writing in larger blocks. Of course, using cache instructions is even better because you can tell it that the entire line will be overwritten. They're an absolute pain in the butt to use, though, and that's before handling different cache line sizes (as happens on PPC machines).
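As a rough model of the point about whole-line writes, the Python sketch below counts how many bytes a write-allocate cache would fetch to fill lines before storing into them. The 32-byte line size and the rule that a store covering a full line skips the fill are assumptions for illustration; whether real hardware skips the fill depends on the CPU, the memory type and the instructions used.

```python
LINE_SIZE = 32  # bytes; an assumption, real line sizes differ between CPUs

def bytes_fetched(writes, line_size=LINE_SIZE):
    """writes: list of (address, size) stores, none spanning two lines,
    all starting with a cold cache. Returns the bytes read from memory
    to fill cache lines before the stores can complete."""
    cached = set()
    fetched = 0
    for addr, size in writes:
        line = addr // line_size
        if line in cached:
            continue  # line already present, no further traffic
        cached.add(line)
        covers_line = addr % line_size == 0 and size == line_size
        if not covers_line:
            fetched += line_size  # fill the line, then overwrite part of it
    return fetched

byte_by_byte = [(0x1000 + i, 1) for i in range(LINE_SIZE)]
whole_line = [(0x1000, LINE_SIZE)]

print(bytes_fetched(byte_by_byte), bytes_fetched(whole_line))
```

In the byte-by-byte case the 32 bytes fetched are all overwritten afterwards, which is the waste described above.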

Hans

_________________
http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more. Home of the RadeonHD driver for Amiga OS 4.x project.
https://keasigmadelta.com/ - More of my work.

Hammer 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 24-Oct-2023 3:41:53
#176 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5312
From: Australia

@matthey

Integer RC5 workload is useless for FPU-enabled Quake.

The RC5 performance test is almost exclusively a measure of Integer performance. Factors such as memory bandwidth, and FSB frequency have less influence on RC5's score.

PiStorm has access to the low-cost RPi 4B's Cortex-A72 @ 1.8 GHz with out-of-order processing.

The RPi 4B costs less than Amibay's 68060 Rev 5 from Russia.

The 68060 being a memory munching monster is meaningless when the 68060 has inferior performance to the competition and an inferior performance vs cost ratio. High GHz clock speed is a design feature.

I have TF1260 with full 68060 rev 1, PiStorm32-RPi 4B and PiStorm-RPi 3A+.


_________________
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB
Amiga 1200 (Rev 1D1, KS 3.2, PiStorm32lite/RPi 4B 4GB/Emu68)
Amiga 500 (Rev 6A, KS 3.2, PiStorm/RPi 3a/Emu68)

matthey 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 24-Oct-2023 20:29:47
#177 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2024
From: Kansas

Karlos Quote:

That must be why it took a little bit of extra time to get into production, eh?


I expect too much time was wasted on A1222 development trying to improve standard FPU performance and compatibility due to the lack of standard PPC FPU hardware but that isn't necessarily the reason for A1222 delays. There may not be a solution that offers good performance. Solutions that worked well for the 68k may not work well for the PPC.

Trap missing FPU instructions
PPC: Simple to implement but performance is likely worse than software emulation. After the trap, a branch to a prologue function is necessary, which creates a new stack frame to save the integer registers used for the instruction emulation; the FPU instruction is then emulated and the epilogue function is branched to in order to restore the integer registers and stack.
68040/68060: Most common FPU instructions are in hardware executing at full performance. FPU registers are in hardware allowing trapped FPU instructions to use them. State data and registers only need to be saved and restored to the stack as needed.

Patch on the fly (OxyPatcher/CyberPatcher technique)
PPC: It should be possible to replace 32 bit FPU instructions with a 32 bit branch either relative or absolute on first trap of the FPU instruction. There are range limitations of addressing modes but MMU remapping of memory could solve this problem. If an address range can be specified for flushing the instruction cache, it may have acceptable warmup performance even though larger caches and multi-level caches will slow the warmup.
68k: This works on the 68k because 32 bit FPU instructions can be replaced by a 32 bit JSR (xxx).W using absolute short addressing. An MMU to map the support code to a negative 16 bit range is more system friendly. Instruction cache flushing is also needed but small caches improve performance.

Disassemble and patch all FPU instructions at program execution
PPC: Feasibility depends on the accuracy of the disassembly so it may not be practical. It would avoid all the instruction cache flushes of patching on the fly if possible. The patched executable could be saved to disk after first patching avoiding any warmup time.
68k: The variable length encoding makes it more difficult to accurately disassemble 68k executables. It is possible to have a success rate above 50% with system friendly software and higher with debugging symbols but I don't consider this good enough for average users.

Automated download of recompiled executables
PPC: There is not much PPC AmigaOS 4 software and most of the source code is available. First time execution speed depends on internet connection speed but executable could be saved to disk for later execution. For the A1222, the dropped non-standard FPU compiler support is a pain for anyone wanting to compile for the A1222. An online database of executables is extra work but newer version updates could be performed at the same time like AmiUpdate does. If AmiUpdate is small and flexible enough to check for particular updates, it likely could perform an executable check and replacement on demand.
68k: The same could work for the 68k Amiga but much of the software and support is older.
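For the patch-on-the-fly technique, the range limits come straight from the instruction encodings. The Python sketch below encodes the two 32-bit replacements: a PowerPC I-form branch (primary opcode 18, 24-bit word offset, AA and LK bits) and the 68k absolute short call, JSR (xxx).W (opcode word $4EB8 plus a 16-bit address). The function names are made up, and this only builds the instruction words; the trap handling, patching and cache flushing around it are not shown.

```python
def ppc_branch(pc, target, absolute=False, link=False):
    """Encode a PowerPC b/ba/bl/bla, or return None if the target is
    out of reach for a single 32-bit branch."""
    value = target if absolute else target - pc
    if absolute and value >= 1 << 31:
        value -= 1 << 32  # high addresses are reached via sign extension
    if value % 4 or not -(1 << 25) <= value < (1 << 25):
        return None  # only word-aligned targets within +/-32 MiB fit
    return (18 << 26) | (value & 0x03FFFFFC) | (int(absolute) << 1) | int(link)

def m68k_jsr_abs_short(target):
    """Encode JSR (xxx).W, or return None if the 32-bit target is not a
    sign-extended 16-bit address."""
    target &= 0xFFFFFFFF
    if not (target < 0x8000 or target >= 0xFFFF8000):
        return None
    return bytes([0x4E, 0xB8, (target >> 8) & 0xFF, target & 0xFF])

print(hex(ppc_branch(0x1000, 0x1100)))       # relative branch forward
print(ppc_branch(0, 0x4000000))              # too far: None
print(m68k_jsr_abs_short(0xFFFF8000).hex())  # top 32 KiB is reachable
print(m68k_jsr_abs_short(0x00F80000))        # not reachable: None
```

This is why remapping the support code with the MMU matters in both cases: the handler has to sit inside the small window the 32-bit replacement can reach.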

Hammer Quote:

68060 being a memory munching monster is meaningless when the 68060 has inferior performance than the competition and inferior performance vs cost ratio. Ghz high clock speed is a design feature.


The 68060 has an 8 stage integer pipeline while the Alpha 21164 has a 7 stage integer pipeline. Both these CPUs were released in 1994, used a 500nm chip fab process and operated at 3.3V yet the Alpha 21164 had a 333MHz part and the Alpha 21164A with no changes to the pipeline or voltage had a 666MHz part using a 350nm chip fab process. Motorola originally planned to release parts with a higher clock rating than were ever released, not that the 68060 would clock as high as the Alpha but it didn't need to as it had significantly better integer performance/MHz like the Pentium CPUs which ended the Alpha and DEC. A deeper pipeline and smaller L1 caches allow higher clock ratings and Motorola didn't want the 68060 competing with the shallow pipeline PPC CPUs with big caches that couldn't be clocked as high. The 68060 already had better integer performance/MHz (DMIPS/MHz) than the PPC 601, PPC 603 and all Alpha CPUs. Motorola made a political decision to push the PPC with the AIM alliance which improved economies of scale but ignored existing technology.

Chip cost is pretty simple. Area, the fab process and economies of scale are the primary factors in cost. ARM cores originally had an advantage in area as the ARM2 CPU used 30k transistors to the 68000's 68k transistors. ARM CPUs originally tried clocking up the memory with the CPU but this led to expensive memory prices for the Acorn computers and caches were adopted in ARM3 but the 4kiB cache used more transistors than the whole ARM2 core (4kiB SRAM=196,608 transistors). Lack of code density significantly reduced the efficiency too. Modern CPUs use many more transistors for caches than the CPU cores. Even lower end ARM cores like the Cortex-A53 have large caches. The 68060 used about 2.5 million transistors while a Cortex-A53 core uses roughly 12.5 million transistors according to "Digital Design and Computer Architecture". They are both superscalar in-order 8 stage pipeline cores although the Cortex-A53 is 64 bit while the 68060 is 32 bit. The 68060 used in an Amiga doesn't need 64 bit or as many cores which saves transistors. The 68060 uses fewer transistors than the RP2040 ARM based SoC chip that only costs $1 using a 40nm chip fab process. THEA500 Mini is evidence that mass production may be possible. Cortex-A53 emulation of the 68k seems to be the choice though. Amiga purgatory seems to be expensive hardware or poor emulation with nothing in between.
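The 196,608 figure for the 4kiB cache works out if each bit is stored in a six-transistor SRAM cell. That cell type is my assumption, and tags and control logic are not counted. A quick Python check:

```python
TRANSISTORS_PER_BIT = 6  # assumes a classic 6T SRAM cell, data array only

def sram_transistors(kib):
    """Transistors in the data array of an SRAM of the given size."""
    bits = kib * 1024 * 8
    return bits * TRANSISTORS_PER_BIT

cache = sram_transistors(4)
arm2_core = 30_000  # figure quoted above

print(cache, round(cache / arm2_core, 1))
```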

umisef 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 25-Oct-2023 8:35:14
#178 ]
Super Member
Joined: 19-Jun-2005
Posts: 1714
From: Melbourne, Australia

@Hans

Quote:
Imagine writing a single byte to a cache-line.


Back in 2001, that sort of thing caused issues even with x86 registers on quite a few processors. Doing partial register writes would seriously interfere with register renaming, and often stall the pipeline.

Amithlon spent quite a bit of its limited cycle budget for optimisation on working out, separately for bits 0...7, 8...15 and 16...31 of each 68k register, whether it could prove at the point of a register write that any of those parts were "don't care" (i.e. will be overwritten down the track without first having any influence on the long-term program state), mostly so it could replace 8 or 16 bit loads on the 68k side with 32 bit ones on x86.
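The kind of analysis described here can be sketched as a backward liveness pass over register parts. The Python below is a toy reconstruction under my own assumptions, not Amithlon's actual code: instructions are reduced to (kind, register, size) tuples, and anything still unknown at the end of the block is conservatively treated as live.

```python
# Register parts touched by each 68k operand size.
PARTS = {"b": {"lo"}, "w": {"lo", "mid"}, "l": {"lo", "mid", "hi"}}
ALL = PARTS["l"]  # lo = bits 0..7, mid = bits 8..15, hi = bits 16..31

def widenable_writes(block):
    """block: list of (kind, reg, size), kind being 'read' or 'write'.

    Returns the indices of byte/word writes whose untouched register
    parts are dead afterwards, so a 32-bit write is equally correct."""
    live = {}  # reg -> parts that may still be read; default: all live
    result = []
    for index in range(len(block) - 1, -1, -1):  # walk backwards
        kind, reg, size = block[index]
        after = live.get(reg, set(ALL))
        if kind == "write":
            untouched = ALL - PARTS[size]
            if untouched and not (untouched & after):
                result.append(index)
            live[reg] = after - PARTS[size]  # written parts are dead before
        else:
            live[reg] = after | PARTS[size]  # read parts are live before
    return sorted(result)

block = [
    ("write", "d0", "b"),  # 0: byte load into d0
    ("write", "d0", "l"),  # 1: d0 fully overwritten later
    ("write", "d1", "w"),  # 2: word load into d1
    ("read", "d1", "l"),   # 3: all of d1 is read afterwards
]
print(widenable_writes(block))
```

Only the first write qualifies: the upper parts of d0 are overwritten before anyone reads them, while the upper half of d1 is still needed.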

Last edited by umisef on 25-Oct-2023 at 08:39 AM.

Karlos 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 25-Oct-2023 14:19:43
#179 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4405
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@matthey

I know it wasn't the reason for the delay, but it's a great example of the farce of chasing some PPC hardware/OS pipedream.

An irony is that this sort of incompatibility wouldn't matter if OS4 and its software were 68K/JIT based, since only the JIT would need fixing for the guest code to continue working.

_________________
Doing stupid things for fun...

matthey 
Re: AmigaOS4 KVM Edition? virtual gpu driver Picasso96 coming soon.
Posted on 25-Oct-2023 20:27:35
#180 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2024
From: Kansas

umisef Quote:

Back in 2001, that sort of thing caused issues even with x86 registers on quite a few processors. Doing partial register writes would seriously interfere with register renaming, and often stall the pipeline.


Partial register stalls and forwarding/bypassing of only register sized results are common for modern CPU cores. The same problem occurs with smaller 32 bit code on a 64 bit CPU, which is usually solved by clearing the upper 32 bits on 32 bit operations. Additional hardware can reduce the performance loss but is not free, so it is usually not done. For x86, MOVSX and MOVZX instructions were added to sign extend or clear the upper part of registers, as the default behavior of 8 bit and 16 bit operations on x86 is the same as on the 68k (the upper bits of the register are preserved). The 68060 suffers from partial register stalls and can only forward 32 bit register results, like many Pentium and later x86 CPUs, but the 68k ISA never evolved to gain similar instructions, although ColdFire did with the MVS and MVZ instructions. These encodings are open on the 68k, and MVS/MVZ reduce code size on the 68k/CF while MOVSX/MOVZX increase code size on x86.

All this doesn't really matter unless trying for performance. AmigaOS 3.2+ is still compiled for the 68000, where 16 bit and 8 bit datatypes are preferred over 32 bit datatypes, which is the opposite of what gives best performance on the 68060. The emu68 RC5 code example earlier is so efficient for AArch64 because it uses all 32 bit datatypes, which are supported in AArch64, allowing a 1:1 translation for many instructions. Even for emulation, there can be a difference in performance based on compiler options.

Unfortunately, the 68060 did not receive good compiler support (compilers often lacked an instruction scheduler), it suffers from partial register writes and can only forward 32 bit results, it didn't get MOVSX/MOVZX style instructions, and too many instructions were dropped, especially the 64 bit integer math instructions. It still shows amazing integer performance for an in-order CPU, including on code not optimized for the 68060. Even the poor optimization level of the example emu68 RC5 code costs the 68060 only through inferior code density, while the most popular in-order core in the world, the Cortex-A53, falls on its face from load-to-use penalties. This code was cherry picked as a best case emulation result for AArch64 and not for the 68060, yet the 68060 shows up the Cortex-A53 used in THEA500 Mini and A600GS.
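
A small Python sketch of the difference between the two kinds of write may help. It only models the architectural result on a 32 bit register, not any pipeline timing, and the function names are made up for illustration. The point is that the merging form needs the old register value as an input, which is the extra dependency that renaming and forwarding hardware has to deal with, while the extending forms do not.

```python
MASK32 = 0xFFFFFFFF


def write_byte_merge(old: int, value: int) -> int:
    """68k MOVE.B Dn / x86 MOV AL style: bits 8..31 of the old value survive."""
    return (old & 0xFFFFFF00) | (value & 0xFF)


def write_byte_zero_extend(value: int) -> int:
    """MOVZX / MVZ.B style: result depends only on the new byte."""
    return value & 0xFF


def write_byte_sign_extend(value: int) -> int:
    """MOVSX / MVS.B style: bit 7 is copied into bits 8..31."""
    v = value & 0xFF
    return (v - 0x100) & MASK32 if v & 0x80 else v


print(hex(write_byte_merge(0x12345678, 0xAB)))  # 0x123456ab
print(hex(write_byte_zero_extend(0xAB)))        # 0xab
print(hex(write_byte_sign_extend(0xAB)))        # 0xffffffab
```

Only write_byte_merge takes the old register contents as a parameter; the other two produce a complete 32 bit result from the byte alone.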

Karlos Quote:

I know it wasn't the reason for the delay, but it's a great example of the farce of chasing some PPC hardware/OS pipedream.

An irony is that this sort of incompatibility wouldn't matter if OS4 and its software were 68K/JIT based, since only the JIT would need fixing for the guest code to continue working.


Kissing frogs (A1222 e500v2 and A600GS Cortex-A53 cores) isn't going to turn them into a princess. It's more likely to turn a prince into a frog as everyone runs away in disgust.

Last edited by matthey on 25-Oct-2023 at 08:31 PM.

Copyright (C) 2000 - 2019 Amigaworld.net.
Amigaworld.net was originally founded by David Doyle