Karlos 
Re: New Classic Amiga market?
Posted on 29-Jul-2024 21:13:55
#141 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4943
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@michalsc

So, theoretically, an optimising JIT could be even faster? I appreciate that keeping it simple is the goal.

_________________
Doing stupid things for fun...

pixie 
Re: New Classic Amiga market?
Posted on 29-Jul-2024 22:14:38
#142 ]
Elite Member
Joined: 10-Mar-2003
Posts: 3468
From: Figueira da Foz - Portugal

@Karlos

It would be cool to have some optimisations if they make sense, not for benchmarks though.

_________________
Indigo 3D Lounge, my second home.
The Illusion of Choice | Am*ga

OneTimer1 
Re: New Classic Amiga market?
Posted on 29-Jul-2024 22:47:05
#143 ]
Super Member
Joined: 3-Aug-2015
Posts: 1205
From: Germany

Quote:

michalsc wrote:

Results vary because SysInfo is a very bad benchmark. The Dhrystone test has nothing to do with the real Dhrystone benchmark - it is just the MIPS value multiplied by a fixed factor.

The Pi5 and Pi400 results are most likely from some UAE variant. The PiStorm results are totally broken since they were acquired with an ancient version of SysInfo. This table compares apples with oranges…


I was already told about the stupid results given by SysInfo, but faking Dhrystones is even worse; they could have used the sources from the net:

https://github.com/sifive/benchmark-dhrystone

AIBB (I was told) has better algorithms:
https://aminet.net/search?name=AIBB


There is also Whetstone (important for benchmarking floating point calculations):

https://github.com/zvonkok/benchmarks/blob/master/whetstone.c

I haven't seen any cross-platform benchmarks for disk speed or 2D GFX.

But one could easily be written in C so that it runs on any Amiga or Linux system.

Last edited by OneTimer1 on 29-Jul-2024 at 10:53 PM.
Last edited by OneTimer1 on 29-Jul-2024 at 10:52 PM.

pixie 
Re: New Classic Amiga market?
Posted on 29-Jul-2024 22:52:08
#144 ]
Elite Member
Joined: 10-Mar-2003
Posts: 3468
From: Figueira da Foz - Portugal

@OneTimer1

https://aminet.net/package/util/cli/WhetDhryStone

Might be interesting since it has source and runs also on MorphOS/AmigaOS 4

_________________
Indigo 3D Lounge, my second home.
The Illusion of Choice | Am*ga

Karlos 
Re: New Classic Amiga market?
Posted on 30-Jul-2024 0:47:09
#145 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4943
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@pixie

The problem with benchmarks is not an optimising JIT; it's the idea that any purely synthetic test is a realistic indicator of anything other than what the test itself does, which, being a test, is usually non-representative.

Don't get me wrong, I write synthetic tests for stuff all the time, to evaluate ideas and algorithms, but never as a gauge of system performance.

The only meaningful measure of system performance is the quantifiable improvement to the software you use, whether that's quake timedemo FPS, elapsed render time in your favourite 3D tool, rendering audio tracks, image processing, whatever it is you actually use it for. All of these things legitimately benefit from any optimisations a JIT can apply, just as they legitimately may benefit from 060 superscalar execution, 2 cycle multiplication, zero cycle branches or other things that seem like cheating when comparing to older 68K.

Last edited by Karlos on 30-Jul-2024 at 12:49 AM.

_________________
Doing stupid things for fun...

matthey 
Re: New Classic Amiga market?
Posted on 30-Jul-2024 2:06:52
#146 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2669
From: Kansas

michalsc Quote:

Emu68 does not optimize code producing unused results. Even a single NOP is not optimized away…


Is it this black and white though? Is Emu68 doing some optimizations on the level of peephole optimizations but not profiling/trace optimizations? Are some unintentional optimizations done due to architecture instruction conversion assumptions? What is the Emu68 68k to AArch64 translation of NOP for example?

The 68k NOP instruction is different than the AArch64 NOP instruction as it performs a pipeline synchronization.

M68000 Programmer's Reference Manual Quote:

Description: Performs no operation. The processor state, other than the program counter, is unaffected. Execution continues with the instruction following the NOP instruction. The NOP instruction does not begin execution until all pending bus cycles have completed. This synchronizes the pipeline and prevents instruction overlap.


The AArch64 equivalent of a pipeline synchronization is an ISB (Instruction Synchronization Barrier) instruction, which is very expensive on a pipelined CPU (68060 NOP has a 9 cycle latency without superscalar execution). An AArch64 DSB (Data Synchronization Barrier) instruction may be needed too. There are major differences in the hardware, so it depends on how robust a level of compatibility is desired. For example, the 68040 allows areas of memory to be marked as serialized, the 68060 always performs serialized memory accesses, and AArch64 memory accesses may neither be serialized nor have a way to mark areas of memory as serialized besides inhibiting the caches. The preferred RISC way seems to be many fence style instructions to make more serialized CISC code work, and the weak RISC memory models become more problematic with SMP. I expect AmigaOS 4 has added thousands of SYNC/EIEIO instructions trying to get SMP working, to no avail. SYNC would have been a better name for the existing 68k NOP instruction while TRAPF is a better NOP instruction for code alignment. There is no 68k equivalent of AArch64 DSB, PPC EIEIO and similar RISC data access fence instructions, probably because it would only be useful on the 68040 when memory regions could not be marked as serialized.

michalsc Quote:

My own will to not do any sort of JIT cheating and the beauty of keeping Emu68 as simple as possible. For the same reason all hardware drivers are pure m68k code - the only hardware Emu68 is touching is GPIO - used for communication with FPGA/CPLD on PiStorm.


Less jitter at the cost of maximum performance but folks are already asking for more optimizations and performance. All the optimists that were excited about a maybe Pentium III circa 2000 level of performance may have to reevaluate their thinking. The 68k Amiga moved forward about 5 years but is still about 25 years behind modern.

Karlos Quote:

So, theoretically, an optimising JIT could be even faster? I appreciate that keeping it simple is the goal.


Optimizing JIT is not just theory as it exists. Look at ART for the Android OS which does a combination of the following.

o trace based just-in-time (JIT) compilation
o ahead-of-time (AOT) compilation
o profiling/trace based optimizations from used hardware as well as similar hardware
o detects idle and battery charging to minimize performance loss when optimizing

https://en.wikipedia.org/wiki/Android_Runtime

It's very sophisticated but still a waste and slower than standardized native code. The OS standard profiling idea is good and could be optionally enabled for a real CPU too.

There is pretty much endless potential for optimizations. Other cores could help with the optimizations but cache sharing of the JIT buffers reduces the performance gain. Instruction scheduling would be a major improvement for most in-order RISC cores and should improve OoO cores some too. The problem, though, is that improvements go into the emulators rather than into CPU compilers. There is no reason to improve the 68060 backend and write an instruction scheduler if it is being replaced by emulation. Just compile for a 68000, do more emulation optimizations and look for faster emulation hardware. EOL. RIP Amiga.

Hammer 
Re: New Classic Amiga market?
Posted on 30-Jul-2024 4:37:34
#147 ]
Elite Member
Joined: 9-Mar-2003
Posts: 6434
From: Australia

@matthey

Quote:

Less jitter at the cost of maximum performance but folks are already asking for more optimizations and performance. All the optimists that were excited about a maybe Pentium III circa 2000 level of performance may have to reevaluate their thinking. The 68k Amiga moved forward about 5 years but is still about 25 years behind modern.

ARM's Cortex A72 evolved from Cortex A57 which evolved from Cortex A15 i.e. these ARM CPU models have three decoders and out-of-order processing.

RPi 5's ARM Cortex A76 with four decoders is still far from the 8 decoders of Apple's M1 Firestorm cores.

Marvell's ThunderX3 ARMv8.3+ has 8-wide decoders.

Investment money is needed to glue a super fat ARMv8 SoC with backward compatibility with the Amiga chipset, anything less is just WinUAE or Amiga Forever on an inexpensive Ryzen 5 7600X/B650 motherboard which is being displaced by Ryzen 5 9600X.

Quote:

There is no reason to improve the 68060 backend and write an instruction scheduler if it is being replaced by emulation. Just compile for a 68000, do more emulation optimizations and look for faster emulation hardware. EOL. RIP Amiga.

Reminder, Motorola terminated 68K. Motorola itself exited the CPU market by creating Freescale, which was then purchased by NXP.

Many non-x86 platform vendors are using ARM and RISC-V as lifeboats because many RISC instruction sets have fallen out of the mainstream.

ARM has a "safe space" commercial application software platform via Google's Android application platform which dominates the handheld smartphone market.

One trick pony embedded microcontrollers are different from application CPU platforms.

Last edited by Hammer on 30-Jul-2024 at 04:40 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Hammer 
Re: New Classic Amiga market?
Posted on 30-Jul-2024 4:46:10
#148 ]
Elite Member
Joined: 9-Mar-2003
Posts: 6434
From: Australia

@OneTimer1

Quote:

OneTimer1 wrote:
Quote:

michalsc wrote:

Results vary because SysInfo is a very bad benchmark. The Dhrystone test has nothing to do with the real Dhrystone benchmark - it is just the MIPS value multiplied by a fixed factor.

The Pi5 and Pi400 results are most likely from some UAE variant. The PiStorm results are totally broken since they were acquired with an ancient version of SysInfo. This table compares apples with oranges…


I was already told about the stupid results given by SysInfo, but faking Dhrystones is even worse; they could have used the sources from the net:

https://github.com/sifive/benchmark-dhrystone

AIBB (I was told) has better algorithms:
https://aminet.net/search?name=AIBB

There is also Whetstone (important for benchmarking floating point calculations):

https://github.com/zvonkok/benchmarks/blob/master/whetstone.c

I haven't seen any cross-platform benchmarks for disk speed or 2D GFX.

But one could easily be written in C so that it runs on any Amiga or Linux system.

SysInfo reached the 68060's limit of a 4-byte fetch per cycle from the L1 cache, which shows the major weakness of the 68060.

SysInfo is useful for showing 68060's major bottleneck design flaws.

Last edited by Hammer on 30-Jul-2024 at 04:47 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Hammer 
Re: New Classic Amiga market?
Posted on 30-Jul-2024 4:50:52
#149 ]
Elite Member
Joined: 9-Mar-2003
Posts: 6434
From: Australia

@kolla

Quote:

kolla wrote:
@Hammer

Quote:
AmiCube F5000


According to google, the only mention of such a thing is you here on AWN.
In other words… now you’re just making up stuff as you go.

Bullshit.

Quote:

AmiCube seems busy evaluating Spartan 7 dev boards after he ran his MiST clone F1200 project into the ground over something as silly as refusing to give credit to the MiST project.

Try youtube instead of google.

https://www.youtube.com/watch?v=HsjiMXMYqrU
From AmiCube:
Quote:

5 months ago

No, but AmiCube F5000 does. This is new design based on Cyclone 10 that we are working on.


1. https://www.youtube.com/results?search_query=AmiCube+F5000
2. click on "AmiCube F1200 - Hardware introduction" video.
3. Look for "Would it support PiStorm?" question.

You can't even use the internet properly.



Last edited by Hammer on 30-Jul-2024 at 06:03 AM.
Last edited by Hammer on 30-Jul-2024 at 06:01 AM.
Last edited by Hammer on 30-Jul-2024 at 06:00 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Hammer 
Re: New Classic Amiga market?
Posted on 30-Jul-2024 4:54:47
#150 ]
Elite Member
Joined: 9-Mar-2003
Posts: 6434
From: Australia

@amigakit

Quote:

amigakit wrote:
@amigang

The A600GS will be much faster in many areas compared to pure emulation such as TheA500Mini. This is due to native ARM libraries running in AmiBench. We are slowly moving libraries over to native ARM which will make performance increase in areas such as graphics and raw number crunching.

Of course TheA500Mini is not able to have a Workbench distributed with it due to the legal constraints of the 2009 Settlement Agreement. Retro Games Limited are not Amiga developers so will not want to natively progress their own Workbench environment - it would be far too much cost and breadth of work. They are interested in moving on to commercialising other retro platforms once they have extracted as much money from our community as possible.


TheA500Mini has licensed Amiga ROMs © 1985–1993 Cloanto Corporation. TheA500Mini is focusing on "kick-the-OS" Amiga retro games. Cloanto provided the necessary Amiga ROM IP.

The current Amiga Forever 10 R3 has a bootable, Linux-hosted UAE with Cloanto's customized "AmigaOS 3.X" desktop.

I have "Amiga Forever 10" and I can verify the customized "Workbench 3.X" distribution.

https://sites.google.com/site/amigadocuments/hyperion-entertainment-vs-amiga-2
The legal battle continues beyond the 2009 Settlement Agreement.

Ben Hermans (Hyperion Entertainment) and Robert Trevor Dickinson (A-Eon) vs Mike Battilana (Amiga Corporation, Cloanto Corporation) and Gordon Troy.

Last edited by Hammer on 30-Jul-2024 at 05:13 AM.
Last edited by Hammer on 30-Jul-2024 at 05:04 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

michalsc 
Re: New Classic Amiga market?
Posted on 30-Jul-2024 5:01:20
#151 ]
AROS Core Developer
Joined: 14-Jun-2005
Posts: 437
From: Germany

@OneTimer1

Yes, AIBB is much better but not optimal. In the case of very fast machines the benchmarking time is too short. Nevertheless it gives a much wider range of results focused on specific areas of the machine:





michalsc 
Re: New Classic Amiga market?
Posted on 30-Jul-2024 5:47:20
#152 ]
AROS Core Developer
Joined: 14-Jun-2005
Posts: 437
From: Germany

@matthey

Quote:
Is it this black and white though? Is Emu68 doing some optimizations on the level of peephole optimizations but not profiling/trace optimizations?


It is pretty much black and white. However (there is always however) keep in mind I am not emulating a specific CPU. Rather, I am implementing 680x0 instruction set architecture by means of JIT. Which means, everything that is internal to JIT is my decision.

Consider the following instructions (ignore their stupidity):
Quote:
moveq #10, d0
add.l d0, (a2)+


Since the add.l instruction will set the XNZVC flags fully, there is no need to set any flags right after the moveq instruction. This would be a waste of cycles and efficiency, and Emu68 does not do that. Another example:

Quote:
moveq #0, d0
move.b (a0)+, d0


Here one wants to load 32-bit register with an unsigned 8-bit value, clearing the remaining 24 bits. Since AArch64 does have a single instruction doing exactly this, the two instructions from m68k will be replaced by a single aarch64 instruction (yes, including post-increment). Eventually NZ00 calculation will take place too.
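The two-into-one replacement described above is a classic peephole pattern match. A minimal sketch of the idea follows; it is not Emu68's code, and the tuple format, the `load_u8_postinc` pseudo-op and the function name are all made up for illustration.

```python
# Hypothetical, simplified instruction tuples: (mnemonic, source, destination).
# "load_u8_postinc" is an invented pseudo-op standing in for a single
# zero-extending byte load with post-increment on the host CPU.

def fuse_zero_extend_loads(m68k_ops):
    """Fuse 'moveq #0,Dn' followed by 'move.b (Am)+,Dn' into one pseudo-op."""
    out = []
    i = 0
    while i < len(m68k_ops):
        op = m68k_ops[i]
        nxt = m68k_ops[i + 1] if i + 1 < len(m68k_ops) else None
        if (
            nxt is not None
            and op[0] == "moveq" and op[1] == "#0"
            and nxt[0] == "move.b"
            and nxt[1].startswith("(") and nxt[1].endswith(")+")
            and nxt[2] == op[2]          # both write the same data register
        ):
            base = nxt[1][1:-2]          # "(a0)+" -> "a0"
            out.append(("load_u8_postinc", base, op[2]))
            i += 2                       # consume both m68k instructions
        else:
            out.append(op)
            i += 1
    return out


if __name__ == "__main__":
    block = [("moveq", "#0", "d0"), ("move.b", "(a0)+", "d0")]
    print(fuse_zero_extend_loads(block))
```

The sketch only handles the data movement; as the post notes, the condition codes of the second instruction would still have to be produced afterwards.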

Quote:
Are some unintentional optimizations done due to architecture instruction conversion assumptions?


If there are optimizations (like above) then they are all intentional.
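The condition-code example earlier in this post (moveq followed by add.l) amounts to a backward liveness pass over the flags. The following is only a sketch of that general technique, assuming a tiny hand-written flag table and assuming every flag is needed at the end of the block; it says nothing about how Emu68 implements it internally.

```python
ALL_FLAGS = frozenset("XNZVC")

# Hypothetical table for a few mnemonics: (flags written, flags read).
FLAG_EFFECTS = {
    "moveq": (frozenset("NZVC"), frozenset()),   # X is left untouched
    "add.l": (frozenset("XNZVC"), frozenset()),
    "beq":   (frozenset(), frozenset("Z")),
}


def flags_to_compute(mnemonics, live_out=ALL_FLAGS):
    """For each instruction, return the flags whose values are actually used later."""
    live = set(live_out)                 # flags still needed after this point
    needed = []
    for m in reversed(mnemonics):
        writes, reads = FLAG_EFFECTS[m]
        needed.append(writes & live)     # only these writes are observable
        live = (live - writes) | reads   # overwritten flags die, read flags become live
    needed.reverse()
    return needed


if __name__ == "__main__":
    for m, f in zip(["moveq", "add.l"], flags_to_compute(["moveq", "add.l"])):
        print(m, "".join(sorted(f)) or "(no flags needed)")
```

For the moveq/add.l pair the pass reports that moveq needs no flag computation at all, because add.l overwrites every flag before anything reads them.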

Quote:
What is the Emu68 68k to AArch64 translation of NOP for example?


NOP is translated into full DSB and synchronization of register containing program counter. I could also emit ISB, but this, considering JIT nature, is pretty irrelevant in this case. Btw, why not check it yourself? https://github.com/michalsc/Emu68/blob/b9f91e87a46fed362cede4fbb9b96b182fe6d83c/src/M68k_LINE4.c#L1467C1-L1478C2

BTW, if someone is interested, there are three debug logs from Emu68 containing output from the JIT translator. Feel free to analyse them and find the "benchmark" loop:

1. Launching SysInfo
2. SysInfo benchmark started
3. SysInfo quit

These are from an older version of Emu68, so there might be some changes between this and what would be generated now.

Hammer 
Re: New Classic Amiga market?
Posted on 30-Jul-2024 5:49:40
#153 ]
Elite Member
Joined: 9-Mar-2003
Posts: 6434
From: Australia

@matthey

Quote:

matthey:

No, but benchmarks based on realistic code can be useful for comparing systems if limitations and flaws are known. Sysinfo Dhrystones benchmark code is not realistic but synthetic and useless. Benchmarking JIT emulators with useless code is worthless as code which gives unused results may be optimized away.

What's the matter?

Can't you handle SysInfo exposing 68060's 4 bytes per cycle fetch from L1 instruction cache bottleneck?

Quote:

matthey:

FPS games provide a much better benchmark than SysInfo. I previously showed that roughly half the performance of the Cortex-A53 is lost to load-to-use stalls of unscheduled 68k to AArch64 instructions.

For the A500's PiStorm, I replaced my RPi 3A+ with an RPi 4B, and it still works, including "turtle mode" WHDLoad multi-parallax-enabled games.

If additional compute power is required, there's RPi 4B or CM4. RPi 4B/CM4's overclocking beyond 1.8 Ghz is easy.

Your fixation on the ARM Cortex A53 is unwarranted when there's a working upgrade path.

Quote:

Before Hammer creates another wall of text, I expect the 68060 FPU has stages (and separate FADD, FMUL and FDIV/FSQR units) even though it is not fully pipelined. See figure 2 in the following Microprocessor Report for a picture of multiple FPU stages even though it still doesn't show the individual unit pipelines.

Hint: 100 Mhz 32-bit memory bus is 75 percent of 66 Mhz 64-bit memory's bandwidth.

For Quake benchmarks, 75 percent of Pentium 100Mhz's 26.70 fps average is 20.025 fps.

Warp1260's 68060 rev6 @ 100Mhz with RTG's 19 fps average shows memory bandwidth bound by 68060's single data rate 32-bit external bus @ 100 Mhz.
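The arithmetic behind the figures above can be checked in a few lines. This is a rough sketch that assumes peak theoretical bandwidth (clock times bus width, no wait states or burst effects) and takes the nominal 66 MHz Pentium bus as 66.67 MHz; the function name is invented for illustration.

```python
def peak_bandwidth_mb_s(clock_mhz, bus_bits):
    """Peak transfer rate in MB/s for a single-data-rate bus (idealised)."""
    return clock_mhz * bus_bits / 8


bw_68060 = peak_bandwidth_mb_s(100, 32)        # 32-bit bus at 100 MHz
bw_pentium = peak_bandwidth_mb_s(200 / 3, 64)  # 64-bit bus at 66.67 MHz (assumed)

ratio = bw_68060 / bw_pentium                  # about 0.75
pentium_fps = 26.70                            # Quake average quoted in the post
scaled_fps = pentium_fps * ratio               # bandwidth-scaled estimate

if __name__ == "__main__":
    print(f"bandwidth ratio: {ratio:.2%}")
    print(f"scaled fps estimate: {scaled_fps:.3f}")
```

Scaling a frame rate linearly with bus bandwidth is itself an assumption; it only holds to the extent that the workload really is memory-bandwidth bound.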

Memory bandwidth is very important for rendering. VRAM is not a paper tiger. VRAM has been replaced by SDRAM-based SGRAM and GDDR memory types.

68060's external bus is not designed for double-rate data (DDR).

Quote:

The Pi4 and Pi5 OoO execution at least partially remove the load-to-use stalls but the CPU cores are huge and generate a lot of heat doing it even with expensive chip fab improvements. The heat is outpacing the die shrinks making the RPi4 and especially RPi5 less practical for embedded use "building blocks" like is used for the PiStorm and limits how much of a GPU upgrade is possible for these already large and hot SoC chips.

RPi 4B's temps are fine for the A500. I used a 2.0 GHz overclock on the RPi 4B without a CPU voltage increase.

The Amiga is not an embedded platform.

Last edited by Hammer on 30-Jul-2024 at 10:18 AM.
Last edited by Hammer on 30-Jul-2024 at 05:51 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

amigang 
Re: New Classic Amiga market?
Posted on 30-Jul-2024 10:44:59
#154 ]
Elite Member
Joined: 12-Jan-2005
Posts: 2133
From: Cheshire, England

I'm dropping the idea of comparing speed with SysInfo/Dhrystones; the numbers are all over the place and there are too many different factors. (It could maybe be useful in the context of other benchmarks. How long one of these platforms takes to render a 3D scene in, say, LightWave would, I agree, likely be a better benchmark.)

I was more just trying to gauge the speeds of each one of these 68K+ platforms in relation to one another.

I guess the systems would be in this order, with the right config and settings, with PC or Pi5 being the fastest of these systems and the A600GS & A500 Mini being pretty much the same, with maybe the A600GS coming out on top due to the better software support being written for ARM gfx.

68K Plus Platforms

PC WinUAE ---------- Fastest (any modern x86 clocked over 2Ghz+)
Raspberry Pi5 ------
Raspberry Pi400 ---
A600 GS ---------------
A500 Mini -------------
Pi4 Pistorm32 -------
Pi3A Pistorm32 -----
Vampire V4 -----------
68060 at 100Mhz ---- Slowest

I think it may be fun for someone to make a benchmark suite that could be tested on all these systems, just to really find out what difference the hardware and even, say, the software each make: time to boot, time to render a scene, FPS in, say, Quake, lha compression time, browser render times etc.
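The skeleton of such a suite is small: a list of named real workloads, each timed several times, with the median reported. A minimal sketch of that structure follows, written in Python for hosts that can run it (on a 68k target the same shape would more likely be written in C). The zlib compression job is only a stand-in for something like lha, and all names here are invented for illustration.

```python
import statistics
import time
import zlib


def time_workload(fn, repeats=5):
    """Run fn several times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compress_workload():
    """Stand-in workload: compress 1 MiB of repetitive data at maximum level."""
    data = bytes(range(256)) * 4096
    zlib.compress(data, 9)


def run_suite(workloads, repeats=5):
    """Time every named workload and return {name: median seconds}."""
    return {name: time_workload(fn, repeats) for name, fn in workloads.items()}


if __name__ == "__main__":
    results = run_suite({"zlib level 9, 1 MiB": compress_workload})
    for name, seconds in results.items():
        print(f"{name}: {seconds:.4f} s")
```

Using the median of several runs rather than a single run reduces the effect of one-off stalls, which matters on emulated systems where the first run may include JIT translation time.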

Something like this would be cool (AmigaONE X1000 benchmarked compared to PPC Macs)
https://amiga-news.de/de/news/AN-2012-02-00011-EN.html


_________________
AmigaNG, YouTube, LeaveReality Studio

matthey 
Re: New Classic Amiga market?
Posted on 31-Jul-2024 9:35:12
#155 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2669
From: Kansas

michalsc Quote:

It is pretty much black and white. However (there is always however) keep in mind I am not emulating a specific CPU. Rather, I am implementing 680x0 instruction set architecture by means of JIT. Which means, everything that is internal to JIT is my decision.


Good. I hope Emu68 JIT compatibility is more black and white than WinUAE JIT, which has to be turned off for compatibility, for some of the features and for Toni to accept bug reports. Some of the problems may be timing related as Windows is far from an RTOS and the Amiga needs specific timing in some cases. Disabling the MMU and using an RTOS no doubt helps (THEA500 Mini and A600GS?) but no OS is the best to reduce jitter. As long as everyone is happy with the compatibility and satisfied forever with the current level of performance, Emu68 is finished besides bug fixes?

michalsc Quote:

Consider following instructions (ignore their stupidity)

Quote:

moveq #10, d0
add.l d0, (a2)+


Since add.l instruction will set XNZVC flags fully, there is no need to set any flags right after moveq instruction. This would be waste of cycles and waste of efficiency and Emu68 does not do that.


Looks good to me as long as the flags are set correctly for a 68k interrupt. This pair of 68k instructions could be one instruction and 2 bytes instead of 4 bytes with 68k ISA enhancements. My idea was to introduce an immediate word addressing mode for ADD.L and other OP.L instructions to improve performance metrics. I believe the AC68080 also has an ADDIW.L instruction although it may use a different encoding than my idea.

michalsc Quote:

Another example:

Quote:

moveq #0, d0
move.b (a0)+, d0


Here one wants to load 32-bit register with an unsigned 8-bit value, clearing the remaining 24 bits. Since AArch64 does have a single instruction doing exactly this, the two instructions from m68k will be replaced by a single aarch64 instruction (yes, including post-increment). Eventually NZ00 calculation will take place too.


This pair of 68k instructions could be a single ColdFire MVZ.B (a0)+,d0 using 2 bytes of code instead of 4 bytes. The 68k did not get ColdFire enhancements though while ARM gained at least 3 new ISAs and thousands of instructions. The 68k has the reduced instruction set architecture but AArch64 is still considered RISC. The AC68080 recently reenabled the MVZ, MVS and MOV3Q ColdFire instructions for the FPGA ISA. It looks like they would be simple to support in Emu68 if you wanted to have partial compatibility with ColdFire and the AC68080. I added an option in the ADis disassembler to disassemble some ColdFire instructions in 68k code. The encodings are free on the 68k although the MOV3Q encoding uses A-line which interferes with some emulators.

michalsc Quote:

NOP is translated into full DSB and synchronization of register containing program counter. I could also emit ISB, but this, considering JIT nature, is pretty irrelevant in this case. Btw, why not check it yourself? https://github.com/michalsc/Emu68/blob/b9f91e87a46fed362cede4fbb9b96b182fe6d83c/src/M68k_LINE4.c#L1467C1-L1478C2


After pondering for awhile, my initial thought is that a DSB instruction alone is adequate for interpreted emulation where the result of each instruction is obtained before the next instruction is interpreted. This effectively serializes the execution of code as if the execution was not pipelined. This is similar to using an ISB instruction after every executed instruction.

JIT execution executes pipelined code like the emulated CPU. There can be multiple instructions executing in parallel due to pipelining, superscalar execution and/or OoO execution. The way I interpret the 68k NOP description, all parallel instruction execution should complete before execution of the NOP instruction itself. For my logic, this is an ISB+DSB instruction pair. For example, a long latency division instruction should no longer be executing after a NOP instruction. This could affect interrupts or an instruction which saves the current state of the machine if there is no stall logic to wait for all results to complete. I would recommend seeking other opinions too. Thomas "ThoR" Richter may have some insights.

It would be interesting to know the full history of the 68k NOP instruction. The M68000 User Manual says nothing about pipeline synchronization but the 68000 is not pipelined. NOP on the 68000 is only 4 cycles latency where TRAPF does not exist as an alternative until the 68020 ISA. According to the 68020 User Manual, the 68020 ISA may be the first to hijack the NOP "alignment" instruction and effectively turn it into a SYNC instruction which was a bad idea as it is both confusing and more expensive on more modern 68k CPUs. The 68020 NOP instruction including synchronization is relatively cheap at 2-3 cycles but so is TRAPF at 1-5 cycles. TRAPF became lower latency on the 68040 and 68060 than NOP due to the pipeline synchronization.

CPU   | NOP cycles | TRAPF cycles
68000 | 4          | N/A
68020 | 2-3        | 1-5
68040 | 8          | 5
68060 | 9          | 1

On the 68060, NOP is pOEP only and TRAPF is pOEP|sOEP so the 68060 can execute 18 TRAPF instructions in the time it takes to execute one NOP instruction (TRAPF.W and TRAPF.L are 1 cycle latency too widening the TRAPF advantage further). It may be possible to fill the instruction fetch pipeline (IFP) and instruction buffer while waiting for the operand execution pipelines (OEPs) and write buffer to empty thus reducing the NOP latency by 4-5 cycles although this may require a little more logic than the 68060 currently has.
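The figure of 18 follows directly from the numbers given: a 9 cycle NOP window, 1 cycle TRAPF latency, and two pipelines (pOEP and sOEP) that can both take TRAPF. A small sketch of that calculation, assuming ideal back-to-back issue with no stalls (the helper name is invented):

```python
def instructions_in_window(window_cycles, latency_cycles, issue_width):
    """Instructions completed in a cycle window under ideal, stall-free issue."""
    return (window_cycles // latency_cycles) * issue_width


NOP_CYCLES_68060 = 9      # from the table above
TRAPF_CYCLES_68060 = 1    # from the table above
PIPELINES = 2             # pOEP and sOEP can both execute TRAPF

trapf_per_nop = instructions_in_window(NOP_CYCLES_68060, TRAPF_CYCLES_68060, PIPELINES)

if __name__ == "__main__":
    print(trapf_per_nop)  # 18
```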

The best option for programmers is usually to avoid executing NOP instructions where possible on the 68k. Aligning branch targets may give some performance but gains are offset by the execution of instructions in the code and reduced code density. NOP instructions outside of executed code to 32-bit align branch targets is an option for performance critical code but 68k CPUs generally have minimal mis-alignment overhead which improves code density. This is in contrast to x86 code where code density reducing NOPs are more common and mis-alignment penalties at least used to be higher (variable length byte encodings result in innately worse code alignment than variable length 16-bit encodings).

Last edited by matthey on 31-Jul-2024 at 01:19 PM.
Last edited by matthey on 31-Jul-2024 at 12:04 PM.
Last edited by matthey on 31-Jul-2024 at 09:43 AM.

kolla 
Re: New Classic Amiga market?
Posted on 31-Jul-2024 13:51:14
#156 ]
Elite Member
Joined: 20-Aug-2003
Posts: 3452
From: Trondheim, Norway

@Hammer

So a random comment on a YouTube vid about a project that they have already cancelled… yay…

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

michalsc 
Re: New Classic Amiga market?
Posted on 31-Jul-2024 14:33:07
#157 ]
AROS Core Developer
Joined: 14-Jun-2005
Posts: 437
From: Germany

@matthey

Quote:
Good. I hope Emu68 JIT compatibility is more black and white than WinUAE JIT, which has to be turned off for compatibility, for some of the features and for Toni to accept bug reports. Some of the problems may be timing related as Windows is far from an RTOS and the Amiga needs specific timing in some cases.


As you probably know Emu68 consists of JIT only, so the only way to turn JIT off is to toggle the power switch. The JIT parameters are highly configurable at runtime and because of that, even such demanding demos as e.g. State of the Art (where one CPU instruction modifies a subsequent instruction) can be run properly. Some people even wrote a wrapper (script) with a large database of JIT parameters which can be used to enjoy nearly everything in WHDLoad.

Quote:
Emu68 is finished besides bug fixes?


No. This is a moving target and there are still things to implement there, such as MMU for example. There are also some ideas for improving the performance and avoiding unnecessary recompilations after flushing JIT caches. Another wild idea is an additional "mixer" which would eventually schedule subsequent instructions into previous ones to avoid stalls on memory read/write, a kind of software-controlled OoO execution. Register renaming could be another fun idea to try (e.g. marking D1 as an "alias" of D0 after MOVE.L D0, D1 until it is modified or written into).
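The register alias idea can be pictured as a small bookkeeping table kept by the translator. What follows is only a sketch of that general technique under simplifying assumptions: it tracks data movement only and ignores the condition codes that MOVE.L also sets, and the class and method names are hypothetical rather than anything from Emu68.

```python
class AliasTracker:
    """Track 'dst is currently just a copy of src' instead of emitting the copy."""

    def __init__(self):
        self.alias_of = {}  # e.g. {"d1": "d0"} means d1 currently mirrors d0

    def resolve(self, reg):
        """Return the register that really holds reg's current value."""
        return self.alias_of.get(reg, reg)

    def on_write(self, reg):
        """reg is about to change; return (src, dst) copies that must be emitted first."""
        self.alias_of.pop(reg, None)             # reg stops being an alias itself
        dependants = [r for r, root in self.alias_of.items() if root == reg]
        for r in dependants:
            del self.alias_of[r]                 # they need a real copy now
        return [(reg, r) for r in dependants]

    def on_move(self, src, dst):
        """MOVE.L src,dst: record an alias; return any copies that had to be flushed."""
        root = self.resolve(src)
        if root == dst:
            return []                            # value is already there
        flushed = self.on_write(dst)
        self.alias_of[dst] = root
        return flushed


if __name__ == "__main__":
    t = AliasTracker()
    t.on_move("d0", "d1")          # no host copy emitted yet
    print(t.resolve("d1"))         # d0
    print(t.on_write("d0"))        # [('d0', 'd1')]: copy must happen before d0 changes
```

The alias ends in either of the two ways the post mentions: the alias register itself is written, or the register it mirrors is about to be modified, in which case the deferred copy has to be materialised first.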

The rest is bug fixes and HW drivers (but those are not on the Emu68 side, only on the m68k one).
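
To make the register aliasing idea more concrete, here is a minimal sketch in Python. It is not Emu68 code and not how Emu68 works today; the class name AliasTracker, its methods and the "copy"/"add" pseudo host operations are all made up for illustration, and condition codes (which MOVE.L also sets) are ignored. The point is only that MOVE.L D0,D1 can be recorded as bookkeeping, and a real copy is emitted only if D0 is overwritten while D1 still refers to it.

```python
class AliasTracker:
    """Hypothetical bookkeeping for m68k register aliases inside a translator."""

    def __init__(self):
        self.alias_of = {}  # alias register -> register that really holds the value
        self.emitted = []   # pseudo host operations emitted so far (illustrative)

    def read(self, reg):
        """Return the register that actually holds reg's current value."""
        return self.alias_of.get(reg, reg)

    def move(self, src, dst):
        """MOVE.L src,dst: record dst as an alias of src instead of copying."""
        owner = self.read(src)
        if owner == dst:
            return  # dst already holds this value, nothing changes
        self._before_write(dst)
        self.alias_of[dst] = owner

    def write(self, reg, op):
        """Any instruction that modifies reg; op is just a label for the sketch."""
        self._before_write(reg)
        self.emitted.append(f"{op} {reg}")

    def _before_write(self, reg):
        # reg is about to get a new value, so it stops being an alias
        self.alias_of.pop(reg, None)
        # registers still aliasing reg need a real copy of the old value first
        for alias, owner in list(self.alias_of.items()):
            if owner == reg:
                self.emitted.append(f"copy {reg} -> {alias}")
                del self.alias_of[alias]


tracker = AliasTracker()
tracker.move("D0", "D1")    # no host instruction emitted, D1 now reads from D0
tracker.write("D0", "add")  # D0 changes, so D1 is given its own copy first
print(tracker.emitted)
```

If D1 is overwritten before D0 changes, the alias is simply dropped and the copy is never emitted, which is where the saving would come from.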

Quote:
Looks good to me as long as the flags are set correctly for a 68k interrupt.


Regular interrupts are "imprecise", so it's up to Emu68 to decide where to handle them. And they are handled in a way where such CCR optimizations do no harm.

Quote:
This pair of 68k instructions could be one instruction and 2 bytes instead of 4 bytes with 68k ISA enhancements. My idea was to introduce an immediate word addressing mode for ADD.L and other OP.L instructions to improve performance metrics. I believe the AC68080 also has an ADDIW.L instruction although it may use a different encoding than my idea.


Extending the 680x0 ISA was never on my schedule and I'm not a big fan of it. For that reason the only "extensions" in Emu68 are additional control registers.

Quote:
It looks like they would be simple to support in Emu68 if you wanted to have partial compatibility with ColdFire and the AC68080.


Nope, not going to happen :) I prefer to do this kind of optimization when translating m68k to aarch64; having an extra opcode for that wouldn't change anything in my case.

Quote:
After pondering for awhile, my initial thought is that a DSB instruction alone is adequate for interpreted emulation where the result of each instruction is obtained before the next instruction is interpreted. This effectively serializes the execution of code as if the execution was not pipelined. This is similar to using an ISB instruction after every executed instruction.


You are probably right and I might add ISB there. For me the DSB was much more important as memory barriers are essential when dealing with MMIO registers. But you are right, letting it empty the pipeline might be a good thing to do.

kolla 
Re: New Classic Amiga market?
Posted on 31-Jul-2024 21:45:49
#158 ]
Elite Member
Joined: 20-Aug-2003
Posts: 3452
From: Trondheim, Norway

@amigang

Your list is whack. PiStorm with Emu68 runs circles around other emulators on ARM (if not, people wouldn't use it, would they?)

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

Hammer 
Re: New Classic Amiga market?
Posted on 2-Aug-2024 1:43:07
#159 ]
Elite Member
Joined: 9-Mar-2003
Posts: 6434
From: Australia

@amigang

Quote:

amigang wrote:
I'm dropping the idea of comparing speed with SysInfo/Dhrystones; the numbers are all over the place and there are too many different factors. (It could maybe be useful in the context of other benchmarks; I agree that how long one of these platforms takes to render a 3D scene in, say, LightWave would likely be a better benchmark.)

I was more just trying to gauge the speeds of each of these 68K+ platforms in relation to one another.

I guess the systems would be in this order, with the right config and settings, with a PC or Pi 5 being the fastest of these systems, and the A600GS & A500 Mini being pretty much the same, with maybe the A600GS coming out on top due to the better software support being written for ARM gfx.

68K Plus Platforms

PC WinUAE ---------- Fastest (any modern x86 clocked at 2 GHz or more)
Raspberry Pi5 ------
Raspberry Pi400 ---
A600 GS ---------------
A500 Mini -------------
Pi4 Pistorm32 -------
Pi3A Pistorm32 -----
Vampire V4 -----------
68060 at 100 MHz ---- Slowest

I think it may be fun for someone to make a benchmark suite that could be run on all these systems, just to really find out what difference the hardware and even, say, the software each make, like time to boot, time to render a scene, FPS in, say, Quake, lha compression time, browser render times etc.

Something like this would be cool (AmigaONE X1000 benchmarked compared to PPC Macs)
https://amiga-news.de/de/news/AN-2012-02-00011-EN.html


Pi 4B or Pi CM4 PiStorm32 with Emu68 is faster than full-machine Amiga emulation (UAE) on Linux-hosted Pi 400. Unlike UAE or full Amiga machine emulation, Emu68 doesn't emulate the Amiga chipset.

Pi 400 has Pi 4 level hardware, i.e. quad-core ARM Cortex-A72 @ 1.8 GHz.

PC WinUAE ---------- Fastest (any modern x86 clocked at 2 GHz or more)
Pi4 Pistorm32 ------- ARM Cortex A72 @ 1.8 GHz, bare-metal 68k emulation only
Pi4 Pistorm ---------- ARM Cortex A72 @ 1.8 GHz, bare-metal 68k emulation only
Pi3A+ Pistorm32 --- ARM Cortex A53 @ 1.4 GHz, bare-metal 68k emulation only
Pi3A+ Pistorm ------ ARM Cortex A53 @ 1.4 GHz, bare-metal 68k emulation only
Raspberry Pi400 --- full machine emulation with ARM Cortex A72 @ 1.8 GHz and Linux host
A600 GS --------------- full machine emulation with native ARM graphics layers and Linux host
A500 Mini ------------- full machine emulation with ARM Cortex A53 @ 1.8 GHz and Linux host
Vampire V4 ----------- 68080 V4
68060 at 100 MHz ---- Slowest

I haven't tested Raspberry Pi5 with UAE and I don't use slower Linux-hosted Musashi.

Last edited by Hammer on 02-Aug-2024 at 01:51 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Hammer 
Re: New Classic Amiga market?
Posted on 2-Aug-2024 1:58:23
#160 ]
Elite Member
Joined: 9-Mar-2003
Posts: 6434
From: Australia

@kolla

Quote:

kolla wrote:
@Hammer

So, a random comment on a YouTube video about a project that they have already cancelled… yay…

It wasn't random when the answer was from the AmiCube itself.

As long as the solution is an FPGA Amiga AGA clone with support for PiStorm, it fulfills the goal. There are multiple FPGA choices for hosting the Amiga AGA clone.

For low-cost ARM CPU cores, emulating the full Amiga machine on a Linux host can slow down the 68K emulation, e.g. PiStorm with RPi 3A+ and Emu68 beats the Linux-hosted TheA500 Mini (full Amiga machine emulation).

Apollo-Core has the potential for a standalone Vampire V4 SAGA with PiStorm32 support since Apollo-Core knows A1200's internal expansion bus protocols and AGA.

Last edited by Hammer on 02-Aug-2024 at 02:13 AM.
Last edited by Hammer on 02-Aug-2024 at 02:11 AM.
Last edited by Hammer on 02-Aug-2024 at 02:09 AM.
Last edited by Hammer on 02-Aug-2024 at 02:03 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Copyright (C) 2000 - 2019 Amigaworld.net.
Amigaworld.net was originally founded by David Doyle