/  Forum Index
   /  Amiga General Chat
      /  We should be united !!!
cdimauro 
Re: We should be united !!!
Posted on 13-Jun-2025 5:52:19
#61
Elite Member
Joined: 29-Oct-2012
Posts: 4420
From: Germany

@ppcamiga1

Quote:

ppcamiga1 wrote:
mattay agami hammer di mauro
stop trolling start working on mui
you had it written on first page of this thread what to do

You've written about Unix, which is pure shit.

Since YOU like that shit, then it's YOUR work.


BTW, MUI is a private, closed-source project: nobody else can work on it besides its author.
You don't even know the basics of the Amiga, and yet you continue to spout complete nonsense. That's the reason why you're a hamster.

bhabbott 
Re: We should be united !!!
Posted on 14-Jun-2025 9:11:22
#62
Cult Member
Joined: 6-Jun-2018
Posts: 551
From: Aotearoa

@matthey

Quote:

matthey wrote:

I respectfully disagree. The only place the 68k Amiga virtual machine is succeeding is as EOL support for a dead platform. The user base is likely growing somewhat due to the increased accessibility of affordable 68k Amiga virtual machines but there is minimal development beyond compatibility for an EOL and dead platform. In some ways, the 68k Amiga platform is declining.

Not a dead platform, a retro platform. The emphasis has changed from being a commercial product to an object of nostalgia and familiarity. There is so much I still have to learn and achieve on this 'dead' platform that I doubt I will live long enough to enjoy it all - no need or desire to make it anything more!

The Amiga platform probably is declining, but so what? I don't care what happens to it so long as I get enjoyment out of it. Back in the day I was getting enjoyment out of systems that had a user base of only a few hundred. But that was before the internet, when most people were isolated and didn't know what others were doing. Today we are so much more connected that the community only needs a few hundred members around the entire world.

Quote:
Commodore upgraded the AmigaOS to the 68020 ISA which improves performance and code density on 32-bit CPUs but the Hyperion 68k AmigaOS is back to the 16-bit 68000.

Only half true. Commodore also produced OS3.1 for the 68000. The 'improvements' for AGA machines (which had at least a 68020) were minimal.

Quote:
I tried to introduce 68k ISA improvements years ago which would have increased performance, improved code density and made compiler development easier but EOL retro guys were not interested

Not interested because it's not important. For those who want faster operation and more free memory we have PiStorm. Any code improvement you could make pales in comparison to the improvement it provides.

What we really need is PiStorm with MMU so we can debug our code more easily. Maybe some other features too, like profiling. But I prefer to do it on vintage hardware because it's actually the retro experience I crave, not the results. The problem with PiStorm is that efficient coding isn't necessary, which robs you of one of the most enjoyable aspects of programming on the Amiga.

Quote:
The 68k and Amiga compilers are mostly not getting better as there is no reason to improve them for an EOL virtual machine.

No need, but there could be a desire.

Quote:
There is not much new software for the 68k Amiga either and most is hardware hitting low spec games or quick ports of games like every other platform has. Productivity software is not common and lacking.

You want productivity? Just use a modern PC. There's no shame in using the best tool for the job.

Sometimes I want to do stuff on the Amiga because it's more convenient than firing up the PC (which takes ~2 minutes to boot) or because it's related to something I'm doing on the Amiga. But there's little point trying to replicate stuff that already works well on a PC - other than just to see if it can be done.

Last edited by bhabbott on 14-Jun-2025 at 09:13 AM.
Last edited by bhabbott on 14-Jun-2025 at 09:12 AM.

Matt3k 
Re: We should be united !!!
Posted on 14-Jun-2025 16:22:21
#63
Regular Member
Joined: 28-Feb-2004
Posts: 277
From: NY

@bhabbott

You want good productivity software and a boot time less than 20 seconds? Well then, the choice is obvious :)

The fun for me is using your Amiga-like system to do as much as possible, including productivity. It has been similar in all my years, not only with Amiga but in my 8-bit days as well.

In 2025 it is arguable that one Amiga flavor is on par with PC and Mac offerings, closer than at most other times in the history of the platform. Given the literal hundreds of updates combined for Wayfarer, Iris, and PolyOrga, it is easy to see why. Now we just need a native spreadsheet to replace TurboCalc... I sense one is coming very soon :)

matthey 
Re: We should be united !!!
Posted on 14-Jun-2025 22:26:13
#64
Elite Member
Joined: 14-Mar-2007
Posts: 2733
From: Kansas

bhabbott Quote:

Not a dead platform, a retro platform. The emphasis has changed from being a commercial product to an object of nostalgia and familiarity. There is so much I still have to learn and achieve on this 'dead' platform that I doubt I will live long enough to enjoy it all - no need or desire to make it anything more!

The Amiga platform probably is declining, but so what? I don't care what happens to it so long as I get enjoyment out of it. Back in the day I was getting enjoyment out of systems that had a user base of only a few hundred. But that was before the internet, when most people were isolated and didn't know what others were doing. Today we are so much more connected that the community only needs a few hundred members around the entire world.


The 68k Amiga does not have to be declining. It is possible to make low end small footprint 68k Amiga hardware, with enough value and a low enough price, that it would proliferate, new and ex-Amiga people would be able to experience the 68k Amiga and it would survive for future generations to enjoy. Why accept a declining 68k Amiga community when the RPi community is growing and proliferating?

Raspberry Pi celebrates 12 years as sales break 61 million units
https://www.tomshardware.com/raspberry-pi/raspberry-pi-celebrates-12-years-as-sales-break-61-million-units Quote:

According to the Guinness Book of Records, the Commodore 64, released in 1982, is the best-selling desktop computer, selling 12.5 million units between 1982 and 1993. The 61 million units of Raspberry Pi sales are across a great deal of product SKUs, unlike the Commodore 64 which had only a few cost-reducing revisions during its lifetime.

Originally released on February 29 2012, the Raspberry Pi was initially created to help university students get to grips with computer science. Initially there were just 10,000 units made for release, a number that was soon snapped up by an army of enthusiasts.


The 68k Amiga has advantages over the RPi like a smaller footprint and retro games. The difference is competitive hardware which is not possible with a 68k Amiga virtual machine.

bhabbott Quote:

Only half true. Commodore also produced OS3.1 for the 68000. The 'improvements' for AGA machines (which had at least a 68020) were minimal.


Commodore believed it was worthwhile to release a 68020-compiled AmigaOS 3.1 to gain the performance and code density advantages. Optimizing for a 16-bit 68000 vs a 32-bit 68020+ is very different. The 68000's 32-bit ISA reduces the inefficiency somewhat, but the 32-bit 68020-68060 have more in common and can mostly be optimized for together. Commodore did not release separate 68030 and 68040 compiled versions of AmigaOS 3.1 because the difference would be minor in comparison, but it understood the significant difference between a 16-bit 68000 and a 32-bit 68020. The Hyperion 68k AmigaOS targets a 68k Amiga virtual machine, as the 68k Amiga is a second-class citizen to PPC AmigaOS 4 hardware. The closed AmigaOS means it is not possible to compile for the 68020 or any other custom target. I understand that the AmigaOS is lightweight, so in many cases even a 68000-compiled AmigaOS will have an unnoticeable performance difference on a 68020+ CPU, but there are places where the additional performance and smaller executables would be nice.

bhabbott Quote:

Not interested because it's not important. For those who want faster operation and more free memory we have PiStorm. Any code improvement you could make pales in comparison to the improvement it provides.

What we really need is PiStorm with MMU so we can debug our code more easily. Maybe some other features too, like profiling. But I prefer to do it on vintage hardware because it's actually the retro experience I crave, not the results. The problem with PiStorm is that efficient coding isn't necessary, which robs you of one of the most enjoyable aspects of programming on the Amiga.


Any 68k Amiga virtual machine improvements will be in the virtual machine JIT code, which is the problem. There have not been any enhancements to the 68k Amiga virtual machine standard and there will not be, including the lack of an MMU. A virtual machine MMU implementation is slow, so it is out. An extended-precision FPU is slow, so it is out. Improved code density should provide an improvement to a virtual machine, but not as much as on real hardware, so it is out. The 68k Amiga virtual machine is not a 68k+ standard but a 68k- standard. Performance is the priority and everything else is sacrificed for it. It is a sign of a dead platform when development shortcuts are supposed to be the way forward while development is actually sliding backward.

bhabbott Quote:

No need, but there could be a desire.


We do not need computers at all. We can throw away all our tools and try to survive but it is not desirable. We are better off building on the shoulders of others and moving forward but the 68k Amiga is no longer moving forward and much of the good tech will be forgotten over the next 20 years because it will not be passed on to the next generation. A few people in future generations will play retro games on a 68k Amiga virtual machine but the next generation will mostly know nothing about the hardware and software. The 68k and Amiga are on course to experience a mass extinction event over the next 20 years despite hundreds of thousands if not millions of 68k and Amiga fans right now. RPi hardware sales will continue to grow and be adopted by the next generation at the same time.

bhabbott Quote:

You want productivity? Just use a modern PC. There's no shame in using the best tool for the job.

Sometimes I want to do stuff on the Amiga because it's more convenient than firing up the PC (which takes ~2 minutes to boot) or because it's related to something I'm doing on the Amiga. But there's little point trying to replicate stuff that already works well on a PC - other than just to see if if can be done.


RPi hardware that costs as much as a meal is capable of running productivity software, but the 68k Amiga is not, even though the Amiga used to make it possible. The Amiga dinosaurs accept extinction.

Hammer 
Re: We should be united !!!
Posted on 14-Jun-2025 23:51:55
#65
Elite Member
Joined: 9-Mar-2003
Posts: 6481
From: Australia

@bhabbott

Quote:
What we really need is PiStorm with MMU so we can debug our code more easily. Maybe some other features too, like profiling. But I prefer to do it on vintage hardware because it's actually the retro experience I crave, not the results. The problem with PiStorm is that efficient coding isn't necessary, which robs you of one of the most enjoyable aspects of programming on the Amiga.


Efficient coding still partly matters, since the ARM Cortex-A72 core is relatively weak by modern standards. On an RPi CM4 with Emu68, the emulated 68K is roughly equivalent to a 700-750 MHz Pentium III with SSE disabled, i.e. a 1999-level CPU.

The RPi 5's ARM Cortex-A76 can brute-force Amiberry to be faster than the RPi CM4 and Emu68 combo.

---------------------------------
Atm, the RPi 5 can't be used for PiStorm because its GPIO is connected through higher-latency PCIe. The batching nature of PC graphics APIs is inherited from PCIe.

https://www.aewin.com/application/pcie-5-0-harnessing-the-power-of-high-speed-data-transfers/
PCIe 5.0 has lower latency than previous generations.

Newer PCIe generations (e.g., PCIe 4.0, 5.0) reduce latency compared to older versions.

https://www.synopsys.com/articles/pcie-7-design-ai-bandwidth.html
PCIe 7.0 reduces latency vital for real-time processing and responsiveness in AI algorithms and high-speed data processing in HPC.

PCIe is adapting towards lower latency due to increased GpGPU AI / CPU IO exchange workload.

Lower latency benefits any future PiStorm-like solution.

https://www.crucial.com/support/articles-faq-ssd/pcie-speeds-limitations
PCIe generations' relative improvements:
PCIe 1.0/1.1 = latency: moderate, bandwidth overhead: high
PCIe 2.0/2.1 = latency: low, bandwidth overhead: high
PCIe 3.0/3.1 = latency: very low, bandwidth overhead: lower
PCIe 4.0/4.1 = latency: very low, bandwidth overhead: low
PCIe 5.0 = latency: lowest, bandwidth overhead: lowest

PCIe 1.x and PCIe 2.x were f__kups for GpGPU, but the PC world adapts when there's a competitive use case. PCIe f__kups also affected the RPi 5's old PCIe implementation.

Last edited by Hammer on 14-Jun-2025 at 11:53 PM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Hammer 
Re: We should be united !!!
Posted on 15-Jun-2025 0:25:10
#66
Elite Member
Joined: 9-Mar-2003
Posts: 6481
From: Australia

@matthey

Quote:
The 68k Amiga has advantages over the RPi like a smaller footprint

68K wouldn't win code density with multimedia stream processing.

Semi-modern 128-bit SIMD ADD can process four (FP32/INT32) or eight (FP16/INT16) data pair elements with a single instruction vs 68K's single data element pair i.e. four or eight 68K instructions for the same job.

FMA is a single instruction for fused MUL and ADD, while they are separate on 68K.

128-bit FMA3 SIMD instruction murders 68K equivalent multiple instruction sequence.

PPC was ideal for 3D due to FMA instructions. The x86 world used higher clock speeds until they gained FMA, starting with the SSE4.2 generation. RISC competition struggled with high clock speeds, including the Alpha CPU family being defeated in the 1 GHz race.

X86 and 68K have advantages with load-store fused simple arithmetic operations.

ARM Cortex A53 has dual FPU/FMA/64-bit SIMD pipelines, a little CPU core designed for 3D and multimedia.

68K is a victim of RISC.



Last edited by Hammer on 15-Jun-2025 at 12:37 AM.
Last edited by Hammer on 15-Jun-2025 at 12:35 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

matthey 
Re: We should be united !!!
Posted on 15-Jun-2025 3:20:27
#67
Elite Member
Joined: 14-Mar-2007
Posts: 2733
From: Kansas

Hammer Quote:

68K wouldn't win code density with multimedia stream processing.

Semi-modern 128-bit SIMD ADD can process four (FP32/INT32) or eight (FP16/INT16) data pair elements with a single instruction vs 68K's single data element pair i.e. four or eight 68K instructions for the same job.


The majority of the footprint of a computer system is memory used by code in the OS. A paper estimated that 75% of the footprint of a particular 32-bit system was code but I expect wide variations. A 64-bit system would likely have a lower percentage used for code as a higher percentage of memory is used for data because of 64-bit pointers in structures. The code of an OS is mostly integer instructions. The 68k AmigaOS is especially light on floating point code. Even floating point heavy code is mostly integer instructions. Recall the old Photoshop analysis by Cesare Di Mauro.

x86 & x64 Statistics – Part 1 (Instruction Macrofamilies)
https://www-appuntidigitali-it.translate.goog/18054/statistiche-su-x86-x64-parte-1-macrofamiglie-di-istruzioni/?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=en-US&_x_tr_pto=wapp Quote:

Below are the results for PS32:

Class Count % Avg sz
INTEGER 1631136 93.39 3.2
FPU 114521 6.56 3.2
SSE 912 0.05 4.0

and for PS64:

Class Count % Avg sz
INTEGER 1638505 94.31 4.3
SSE 93942 5.41 5.2
FPU 4884 0.28 3.1


Floating point heavy Photoshop was ~94% integer instructions for x86 and x86-64. The SIMD instructions for x86 were 0.05% of the total while they were 5.41% for x86-64 because the x86 FPU was deprecated. Most of the x86-64 SIMD instructions are scalar floating point operations that could be performed in a FPU like the x86 version. Integer code density is by far the most important because it is most of the code. FPU code density is less important and SIMD code density should be practically negligible, even with the large average SIMD instruction size on x86-64. However, x86-64 uses the SIMD unit for scalar operations like a FPU and they are frequent enough and large enough to have a small impact on overall code density with floating point heavy code. An OS is not floating point heavy so the impact on the memory footprint is likely to be negligible for x86-64 and less for an architecture with a FPU using more compact FPU instructions.

Hammer Quote:

FMA is a single instruction for fused MUL and ADD, while they are separate on 68K.

128-bit FMA3 SIMD instruction murders 68K equivalent multiple instruction sequence.

PPC was ideal for 3D due to FMA instructions. The x86 world used higher clock speeds until they gained FMA, starting with the SSE4.2 generation. RISC competition struggled with high clock speeds, including the Alpha CPU family being defeated in the 1 GHz race.


FMA does not use intermediate rounding, meaning its results differ from separate FMUL and FADD instructions. Compilers generally do not use FMA to replace FMUL+FADD because the result would differ, so there is generally no code density advantage. It is generally necessary to use GCC's -ffast-math and/or -mfma, which can have other unwanted side effects, or to call the C99 functions fma(), fmaf() or fmal().

https://en.cppreference.com/w/c/numeric/math/fma

It may be possible to have an unrounded/fused FMA and a rounded/unfused FMA using the same pipeline where the intermediate rounding stage is skipped on the fused FMA but I do not know of any architecture that does this. One instruction encoding bit could select between the two with the unfused FMA usable for most compiled code with a potential code density benefit as FMUL+FADD are a common combo. However, FPU and SIMD code density improvements would likely barely affect overall code density and the system footprint.

Hammer Quote:

X86 and 68K have advantages with load-store fused simple arithmetic operations.


The 68k FPU uses CISC Fop mem,reg instructions but does not allow CISC Fop reg,mem instructions, including RMW instructions. This gains most of the CISC advantage while simplifying the FPU. Likewise, an FMA mem,reg,reg should be possible but likely not an FMA reg,reg,mem.

Hammer Quote:

ARM Cortex A53 has dual FPU/FMA/64-bit SIMD pipelines, a little CPU core designed for 3D and multimedia.

68K is a victim of RISC.


RISC is a victim of x86(-64). The 68k is a victim of a bad AIM.

Last edited by matthey on 15-Jun-2025 at 03:25 AM.

Hammer 
Re: We should be united !!!
Posted on 16-Jun-2025 2:39:42
#68
Elite Member
Joined: 9-Mar-2003
Posts: 6481
From: Australia

@matthey

That's a flawed Photoshop argument: 2D effects are performed by small performance-critical code sections.

Photoshop is not computationally intensive compared to raytracing, 3D games, or video NLE use cases.


https://www.phoronix.com/news/FFmpeg-AVX-512-uyvytoyuv422
AVX-512 Optimization For FFmpeg Shows Wild Improvement On AMD Ryzen

Merged today for the widely-used FFmpeg open-source multimedia library was yet another AVX-512 optimized code path... Compared to the pure C code, the AVX2 code path was 10.98x faster while this new AVX-512 code path clocks in at 18x the performance of the common C code.

The latest FFmpeg code seeing the AVX-512 treatment is the uyvytoyuv422 function for UYVY to YUV422 format conversion. The AVX-512 optimized code path via hand-written Assembly is a great benefit here. AVX-512 namely found with Intel Xeon processors or all AMD Ryzen and EPYC processors since Zen 4. The benchmarks posted for this patch were carried out with an AMD Ryzen 9 7950X.


https://www.phoronix.com/news/Intel-AVX10-Drops-256-Bit
Date: 19 March 2025
Intel's AVX10 version 3.0 drops the 256-bit-only option: it is full AVX-512, including on future E-cores.

Within GCC patches posted today it's also spelled out clearly:

In this new whitepaper, all the platforms will support 512 bit vector width (previously, E-core is up to 256 bit, leading to hybrid clients and Atom Server 256 bit only). Also, 256 bit rounding is not that useful because we currently have rounding feature directly on E-core now and no need to use 256-bit rounding as somehow a workaround. HW will remove that support.

Thus, there is no need to add avx10.x-256/512 into compiler options. A simple avx10.x supporting all vector length is all we need. The change also makes -mno-evex512 not that useful. It is introduced with avx10.1-256 for compiling 256 bit only binary on legacy platforms to have a partial trial for avx10.x-256. What we also need to do is to remove 256 bit rounding.


Intel will fully support X86-64 v4 (AVX512) with future desktop SKUs.


https://www.phoronix.com/benchmark/result/amd-ryzen-9-7950x-ryzen-9-9950x-avx-512-comparison/ospray-gravity_spheres_volume-dim_512-ao-real_time.svgz
OSPRay 3.2 boosted by AVX-512, higher is better
gravity spheres volume
Ryzen 9 9950X
AVX-512 On = 9.727
AVX-512 Off = 6.7006

Ryzen 9 7950X
AVX-512 On = 8.07646
AVX-512 Off = 4.44233

https://www.phoronix.com/benchmark/result/amd-ryzen-9-7950x-ryzen-9-9950x-avx-512-comparison/ospray-gravity_spheres_volume-dim_512-pathtracer-real_time.svgz
OSPRay 3.2
Gravity Sphere volume Path Tracer
Ryzen 9 9950X
AVX-512 On = 11.03
AVX-512 Off = 9.83

Ryzen 9 7950X
AVX-512 On = 9.17
AVX-512 Off = 6.35

https://www.phoronix.com/benchmark/result/amd-ryzen-9-7950x-ryzen-9-9950x-avx-512-comparison/tensorflow-cpu-64-resnet-50.svgz
TensorFlow 2.16.1 (higher is better)
Ryzen 9 9950X
AVX-512 On = 51.74
AVX-512 Off = 38.99

Ryzen 9 7950X
AVX-512 On = 44.08
AVX-512 Off = 19.73



https://whatcookie.github.io/posts/why-is-avx-512-useful-for-rpcs3/
Why Is AVX 512 Useful for RPCS3?
https://whatcookie.github.io/Gow3Comparison.png
From left to right: SSE2, SSE4.1, AVX2/FMA, and Icelake tier AVX-512.
Running God of War 3 game on PS3 emulator
SSE2 = 4.83 fps
SSE4.1 = 165.73 fps
AVX2/FMA = 187.36 fps
AVX-512 = 241.97 fps

The performance when targeting SSE2 is absolutely terrible, likely due to the lack of the pshufb instruction from SSSE3. pshufb is invaluable for emulating the shufb instruction, and it’s also essential for byteswapping vectors, something that’s necessary since the PS3 is a big endian system, while x86 is little-endian.


Supplemental Streaming SIMD Extensions 3 (SSSE3) pshufb is invaluable for emulating the PS3 CELL's shufb vector instruction, a 128-bit byte-shuffle. Intel made sure the x86 world had the necessary vector instruction set update to be on par with the IBM CELL.

https://steamcommunity.com/app/1085660/discussions/0/3105764982068500052/?l=brazilian
Game not launching is related to non AVX & AES-NI support

Due to AMD's Zen 2 standard in the current generation game console market, many modern PC games require the AVX2 (Advanced Vector Extensions 2) instruction set.

From Google AI,

Spider-Man 2: As mentioned, it explicitly requires AVX and AVX2.

Death Stranding: The Steam store page notes it requires AVX.

Sony Ports: Many Sony PC ports like God of War, Horizon Forbidden West, Ratchet & Clank: Rift Apart, The Last of Us Part 1, and Uncharted 4 require AVX2.

Alan Wake 2: This game also requires AVX2 support.

Forza Horizon series: Forza titles, including Forza Horizon 3, 4, and 5, require AVX2.

Microsoft Flight Simulator (MSFS 2020): This simulation game relies on AVX2.

Other Notable Titles: Yakuza: Like a Dragon, Persona 5 Royal, Helldivers 2, Dying Light 2, and Resident Evil 8 Village are also known to require AVX2.

Emulators: Some emulators like Yuzu (Nintendo Switch emulator) and Xenia (Xbox 360 emulator) also require AVX2.


https://walbourn.github.io/directxmath-avx2/
DirectX Math AVX2 support across the board.

https://walbourn.github.io/directxmath-arm64/
DirectX Math ARM64 supports NEON i.e. the Windows on ARM (64-bit) platform assumes support for ARMv8, ARM-NEON, and VFPv4.


https://chipsandcheese.com/p/cinebench-2024-reviewing-the-benchmark
Cinebench 2024: Reviewing the Benchmark

Both libx264 and Cinebench contrast with Y-Cruncher, which is dominated by AVX-512 instructions. AVX-512 is used in Cinebench 2024, but in such low amounts that it’s irrelevant.

Although SSE and AVX provide 128-bit and 256-bit vector support respectively, Cinebench 2024 makes little use of vector compute. Most AVX or SSE instructions operate on scalar values. The most common FP/vector math instructions are VMULSS (scalar FP32 multiply) and VADDSS (scalar FP32 add). About 6.8% of instructions do math on 128-bit vectors. 256-bit vectors are nearly absent, but AVX isn’t just about vector length. It provides non-destructive three operand instruction formats, and Cinebench leverages that.


Cinebench 2024 exploits non-destructive three-operand instruction formats, in the style of classic RISC workstations, which AVX's scalar instructions provide.

Cinebench 2024's code base is old, from the scalar multithreading era, despite being marketed with the 2024 year.

Blender's Cycles raytracing render engine utilizes Advanced Vector Extensions (AVX) instructions, including AVX, AVX2, and AVX-512, to enhance rendering performance.

Future PS6's CPU has AVX512 via AMD's Zen 5 or 6. AMD has already designed a cost-reduced embedded Zen 5c (compact) that is used in AMD's Strix Point laptops' E-Cores.
PS5 used a PC laptop's area-reduced Zen 2 variant.


I wonder where Amiga's primary gaming audience went.

https://www.youtube.com/watch?v=vxYnz3S87uE

The Carmageddon 68k port on an Amiga 4000 with a 68060 running at 100 MHz and ZZ9000 gfx doesn't get a good framerate.

In 1997, I had a PC with a Pentium 166 MHz on a 430VX chipset.


Quote:

RISC is a victim of x86(-64). The 68k is a victim of a bad AIM.

https://www.youtube.com/watch?v=SQDZV3K1jYI
The First ARM Transition: Palm's DragonBall 68000 CPU emulation. ARM9xx cores (ARMv4T) displaced the 68000-based DragonBall.
https://en.wikipedia.org/wiki/ARM9 (the DragonBall VZ killer)

ARM9-class 100+ MHz clock speeds had already existed with DEC's StrongARM.

ARMv8/ARMv9 is competing.

https://deepcomputing.io/product/dc-roma-risc-v-ai-pc/
SiFive P550 CPU (8 cores) + Imagination iGPU SoC mainboard for the Framework 13 laptop form factor is competing. This is the 2nd RISC-V mainboard for the Framework 13 form factor.

Last edited by Hammer on 16-Jun-2025 at 03:27 AM.
Last edited by Hammer on 16-Jun-2025 at 03:17 AM.
Last edited by Hammer on 16-Jun-2025 at 03:13 AM.
Last edited by Hammer on 16-Jun-2025 at 03:10 AM.
Last edited by Hammer on 16-Jun-2025 at 03:00 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

agami 
Re: We should be united !!!
Posted on 16-Jun-2025 3:51:59
#69
Super Member
Joined: 30-Jun-2008
Posts: 1953
From: Melbourne, Australia

@cdimauro

Quote:
cdimauro wrote:
@agami

Where to get the funds? Do you have some already, and/or do you plan to ask for some?

If uniting the various Amiga-compatible missions were my goal, then I would be taking the aforementioned course of action.

I've talked about it a few times in a bunch of threads: My goal is to establish the third commercial personal computing platform, in somewhat of a spiritual mirroring of what Hi-Toro/Amiga Corporation accomplished (despite Commodore) in the mid to late '80s. That's where my fundraising plans and efforts are concentrated.

Quote:
Quote:
Is this "new 68k ISA":
- binary-compatible with the existing one
or
- assembly-level compatible (100% or very close)
or
- strongly inspired (e.g.: very similar in many aspects, but a distinct, new ISA)?

The three architecture scenarios are already so different that it'll take ages to reach a consensus.

Consensus belongs to the person paying for the outcome. Were I that person, I may initially think that the 3rd option is best aligned to the mission/vision, but I would want to hear from the advisory board on pros and cons for each before making the call.

Quote:
I can't imagine what'd happen when defining the rest of the platform...

Again, this would not be some open source project. Advisors advise, product manager makes the call.


Last edited by agami on 16-Jun-2025 at 03:55 AM.
Last edited by agami on 16-Jun-2025 at 03:53 AM.

_________________
All the way, with 68k

cdimauro 
Re: We should be united !!!
Posted on 16-Jun-2025 4:34:05
#70
Elite Member
Joined: 29-Oct-2012
Posts: 4420
From: Germany

@matthey

Quote:

matthey wrote:
Hammer Quote:

68K wouldn't win code density with multimedia stream processing.

Semi-modern 128-bit SIMD ADD can process four (FP32/INT32) or eight (FP16/INT16) data pair elements with a single instruction vs 68K's single data element pair i.e. four or eight 68K instructions for the same job.


The majority of the footprint of a computer system is memory used by code in the OS. A paper estimated that 75% of the footprint of a particular 32-bit system was code but I expect wide variations.

Could you please share the paper or, at least, its title? It would be very interesting to have as a reference study on this subject.
Quote:
A 64-bit system would likely have a lower percentage used for code as a higher percentage of memory is used for data because of 64-bit pointers in structures.

Yes, assuming there's not much difference between the 32- and 64-bit architectures.

However, the differences might be very relevant (e.g. x86 vs x86-64: the latter needs around 25% more space for code than the former).
Quote:
The code of an OS is mostly integer instructions. The 68k AmigaOS is especially light on floating point code. Even floating point heavy code is mostly integer instructions. Recall the old Photoshop analysis by Cesare Di Mauro.

x86 & x64 Statistics – Part 1 (Instruction Macrofamilies)
https://www-appuntidigitali-it.translate.goog/18054/statistiche-su-x86-x64-parte-1-macrofamiglie-di-istruzioni/?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=en-US&_x_tr_pto=wapp Quote:

Below are the results for PS32:

Class     Count     %      Avg sz
INTEGER   1631136   93.39  3.2
FPU        114521    6.56  3.2
SSE           912    0.05  4.0

and for PS64:

Class     Count     %      Avg sz
INTEGER   1638505   94.31  4.3
SSE         93942    5.41  5.2
FPU          4884    0.28  3.1


Floating point heavy Photoshop was ~94% integer instructions for x86 and x86-64. The SIMD instructions for x86 were 0.05% of the total while they were 5.41% for x86-64 because the x86 FPU was deprecated. Most of the x86-64 SIMD instructions are scalar floating point operations that could be performed in a FPU like the x86 version.

Integer code density is by far the most important because it is most of the code. FPU code density is less important and SIMD code density should be practically negligible, even with the large average SIMD instruction size on x86-64.

However, x86-64 uses the SIMD unit for scalar operations like a FPU and they are frequent enough and large enough to have a small impact on overall code density with floating point heavy code. An OS is not floating point heavy so the impact on the memory footprint is likely to be negligible for x86-64 and less for an architecture with a FPU using more compact FPU instructions.

Correct: code / code density is dominated by "integer" / GP instructions.
Quote:
Hammer Quote:

FMA is a single instruction for fused MUL and ADD, while they are separate on 68K.

128-bit FMA3 SIMD instruction murders 68K equivalent multiple instruction sequence.

PPC was ideal for 3D due to FMA instructions. The x86 world used higher clock speeds until they gained FMA, starting with the SSE4.2 generation. RISC competition struggled with high clock speeds, including the Alpha CPU family being defeated in the 1 GHz race.


FMA generally does not apply intermediate rounding, meaning its results differ from a separate FMUL and FADD sequence. Compilers therefore do not substitute FMA for FMUL+FADD on their own, because the result would change, so in practice there is no code density advantage. It is necessary to use GCC's -ffast-math and/or -mfma, which can have other unwanted side effects, or to call the C99 functions fma(), fmaf() or fmal().

https://en.cppreference.com/w/c/numeric/math/fma

It may be possible to have an unrounded/fused FMA and a rounded/unfused FMA using the same pipeline where the intermediate rounding stage is skipped on the fused FMA but I do not know of any architecture that does this. One instruction encoding bit could select between the two with the unfused FMA usable for most compiled code with a potential code density benefit as FMUL+FADD are a common combo. However, FPU and SIMD code density improvements would likely barely affect overall code density and the system footprint.

Again, the rounding problems. Dammit.

Another solution is to have an FMADD instruction which is simply a MUL+ADD with both operations using the current rounding mode.
Quote:
Hammer Quote:

X86 and 68K have advantages with load-store fused simple arithmetic operations.


The 68k FPU uses CISC Fop mem,reg instructions but does not allow CISC Fop reg,mem instructions, including RMW instructions. This gains most of the CISC advantage while simplifying the FPU. Likewise, an FMA mem,reg,reg should be possible but likely not an FMA reg,reg,mem.

It would be possible, but it would waste too much encoding space on the 68k.

In the end, loads are much more frequent than stores (usually about twice as many), so being able to read only from memory is enough for the 68k.

 Status: Offline
Profile     Report this post  
cdimauro 
Re: We should be united !!!
Posted on 16-Jun-2025 4:43:54
#71 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4420
From: Germany

@Hammer

Quote:

Hammer wrote:
@matthey

That's a flawed Photoshop argument when 2D effects are performed by small performance-critical code sections.

Photoshop is not computationally intensive when compared to raytracing, 3D games, or video NLE use cases.
[...]

Totally irrelevant, since this part of the discussion was/is about CODE DENSITY.

BTW and FYI, AVX-512 has a very poor code density.

Whereas the 68k's FPU has a very good code density, thanks to its instructions' ability to directly reference memory AND to the post-increment addressing mode.

An example of the CISC advantage: https://www.appuntidigitali.it/21533/nex64t-7-the-new-simd-vector-unit/
I bet that the 68k's FPU can do a similarly good job, for the above reasons.

Anyway, and as I've already explained several times, code density isn't very relevant for FPU/SIMD code: it's on integer/GP code that it matters most.

 Status: Offline
Profile     Report this post  
cdimauro 
Re: We should be united !!!
Posted on 17-Jun-2025 4:42:07
#72 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4420
From: Germany

@minator

Quote:

minator wrote:
@matthey

Quote:
ARM Thumb licensed from Hitachi


You keep saying this, but it's not true.

Thumb was invented quite independently by then Arm Chief Architect Dave Jagger in the early 90s as a way to decrease code size. They probably found some prior art when they went to file patents, and licensed the relevant patents from Hitachi.

How it came about is mentioned in a talk on Youtube just after 27 minutes.

Thanks for the great video, I really enjoyed it!

IMO this Hitachi licensing is more likely related to some patent which was filed for SuperH and which Thumb "accidentally" (!) infringed. So, ARM had to license it to avoid legal issues.

Besides that, what I also liked a lot is the chart showing how many ARM processors were sold, split by "category". Cortex-M dominates even nowadays, which says a lot.

 Status: Offline
Profile     Report this post  
Hammer 
Re: We should be united !!!
Posted on 18-Jun-2025 6:10:41
#73 ]
Elite Member
Joined: 9-Mar-2003
Posts: 6481
From: Australia

@cdimauro

Quote:
Totally irrelevant, since this part of discussion was/is about CODE DENSITY.

My argument is about arithmetic-operation intensity.

Quote:

BTW and FYI, AVX-512 has a very poor code density.

Factor in instruction issue slots. How many 68K instructions to match one AVX-512 VNNI VPDPBUSD?

Quote:

An example of the CISC advantage: https://www.appuntidigitali.it/21533/nex64t-7-the-new-simd-vector-unit/

That's largely fiction. Another two weeks(TM)?

I can buy an 8-core SiFive P550-based SoC mainboard for my empty Framework 13 laptop case. https://deepcomputing.io/product/dc-roma-risc-v-ai-pc/
The Framework 13 laptop already has two generations of RISC-V mainboards.

Quote:

I bet that the 68k's FPU can do a similar good job because of the above reasons.

Which 68k FPU? 68882 FPU? 68060 FPU? LOL


Quote:

Anyway, and as I've already explained even other times, code density isn't much relevant on FPU/SIMD code: it's on integer/GP code which is very very important.

My focus on FPU/SIMD is 3D games, i.e. the Amiga's primary target audience when the platform was mainstream.

Code density is not a major concern in modern GPUs; e.g. on CUDA GPUs, each 8-byte instruction is accompanied by an 8-byte "op-steering" control block that is not publicly documented. The usual motivation is to simplify the processor hardware while maximizing math units. An x86 GPU was attempted by Intel.

68060 sucked at floating point 3D.

I purchased a TF1260 with a 1994-era 68060 rev1 CPU to investigate the 68060 hype.

The 68K's code density advantage wasn't able to overcome the subpar FPU, the 68060's L1 cache fetch width, and the 32-bit external data bus I/O bottlenecks.

Copy-and-paste engineering would retain the 68060's external 32-bit data bus design scaled to 1 GHz, as a "what if" example. A 1 GHz 32-bit (4-byte) external data I/O is equivalent to 250 MHz at 128 bits. Who's going to improve the 68060's external 32-bit data bus within the proposed SoC?

Modern gaming SoCs have a ZLIB hardware block with storage I/O.

Again: deliver a superior Quake benchmark result against Pentium 100. This is getting f_cking absurd with promises from 68K fanboys.

For 1995's what if, deliver a f_cking PS1-like plan with a f_cking 68060! You can't do it with Motorola's cost structure.

Show me the money.

Last edited by Hammer on 19-Jun-2025 at 07:52 AM.
Last edited by Hammer on 18-Jun-2025 at 07:34 AM.
Last edited by Hammer on 18-Jun-2025 at 06:55 AM.
Last edited by Hammer on 18-Jun-2025 at 06:30 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

 Status: Offline
Profile     Report this post  
matthey 
Re: We should be united !!!
Posted on 18-Jun-2025 21:25:31
#74 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2733
From: Kansas

cdimauro Quote:

Could you please share the paper or, at least, its title? It's very very interesting to have it as a reference study on this subject.


I did not save the paper because I did not think it was good. The paper was useful only because it made me think about how memory is used in relation to the footprint. I looked again for it and did not find it but I found some good info on footprint which I will put in your old code density thread.

cdimauro Quote:

Again, the rounding problems. Dammit.

Another solution is to have an FMADD instruction which is simply a MUL+ADD with both operations using the current rounding mode.


Both a fused (no intermediate rounding) and a non-fused (with intermediate rounding) version of FMA is what I was talking about. With a 68k floating-point FMA instruction, the 'F' would stand for floating-point, not fused, so a fused version might be FFMA (floating-point fused multiply add) and a non-fused one FMA (floating-point multiply add).

An ISA which has both encodings would make switching between them easier for compilers than having only a fused multiply add, as well as improve floating point code density, even though that is less important than integer code density.

From the hardware perspective, I do not know if it is a good idea though. I know it is possible to skip pipeline stages (the intermediate rounding stage) but there may be other considerations. It likely could be split into separate FMUL and FADD instructions for FPUs without a FMA pipeline, so it would likely be ok.

cdimauro Quote:

It would be possible, but it wastes too much encoding on the 68k.

At the end, loads are much more (usually twice) the stores, so having only the possibility from reading from memory is enough for the 68k.


Right. There are usually 2 to 3 times as many loads as stores and, as I recall, the load to store ratio increases with floating point code. Any performance gain from "Fop reg,mem" would likely not be worth the increased complexity and encoding space loss.

 Status: Offline
Profile     Report this post  
cdimauro 
Re: We should be united !!!
Posted on 22-Jun-2025 21:05:07
#75 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4420
From: Germany

@matthey

Quote:

matthey wrote:
cdimauro Quote:

Could you please share the paper or, at least, its title? It's very very interesting to have it as a reference study on this subject.


I did not save the paper because I did not think it was good. The paper was useful only because it made me think about how memory is used in relation to the footprint. I looked again for it and did not find it but I found some good info on footprint which I will put in your old code density thread.

OK, np. Your contribution on the memory footprint is already valuable, thanks.
Quote:
cdimauro Quote:

Again, the rounding problems. Dammit.

Another solution is to have an FMADD instruction which is simply a MUL+ADD with both operations using the current rounding mode.


Both a fused (no intermediate rounding) and non fused (with intermediate rounding) version of FMA is what I was talking about. With a 68k floating-point FMA instruction, the 'F' would stand for floating-point, not fused, so a fused version may be FFMA (floating-point fused multiply add) and a non fused may be FMA (floating-point multiply add).
An ISA which has both encodings would make switching between them easier for compilers than just having a fused multiply add as well as improve floating point code density, even though it is less important than integer code density.

From the hardware perspective, I do not know if it is a good idea though. I know it is possible to skip pipeline stages (the intermediate rounding stage) but there may be other considerations. It likely could be split into separate FMUL and FADD instructions for FPUs without a FMA pipeline, so it would likely be ok.

The mnemonic isn't that important. However, I've just checked the x64 manual, and there are only the fused versions (several VFM* instructions), which operate only on SIMD data. x87 has no such instructions.

The multiplication works with "infinite precision", and the rounding is applied only after the addition (according to the FP size: 64, 32 or 16 bits).

Since there's no (unfused) MUL + ADD instruction, I assume that the "infinite precision" of the intermediate result causes no problems.

Something similar could be done on the 68k, but only with a field which allows selecting the rounding precision after the addition (since the FPU always operates in extended precision).

 Status: Offline
Profile     Report this post  
cdimauro 
Re: We should be united !!!
Posted on 22-Jun-2025 21:23:08
#76 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4420
From: Germany

@Hammer

Quote:

Hammer wrote:
@cdimauro

Quote:
Totally irrelevant, since this part of discussion was/is about CODE DENSITY.

My argument position is relevant for arithmetic operations intensity.

And it remains completely irrelevant, for the same reason which I've already tried, in vain, to explain to you, since you don't get it (which is the usual problem with you: you talk about things that you've no clue about, at all).
Quote:
Quote:

BTW and FYI, AVX-512 has a very poor code density.

Factor in instruction issue slots.

Talking about code density?
Quote:
How many 68K instructions to match one AVX-512 VNNI VPDPBUSD?

Let me check my crystal ball... ah, yeah: it says that more instructions will be added to AVX-512 in the future (and to other architectures as well), which will require more 68k instructions to emulate.

However, the ball still says that this is irrelevant regarding the context (which was: code density).
Quote:
Quote:

An example of the CISC advantage: https://www.appuntidigitali.it/21533/nex64t-7-the-new-simd-vector-unit/

That's largely fiction.

I'll reveal a secret to you: the 68k CPU & FPU are already able to... directly address a memory operand... and use post-increment when accessing arrays.

Note: it was the crystal ball which told it to me.
Quote:
Another two weeks(TM)?

It was completed a couple of years ago. Now it's time for a new architecture (already mostly defined. I still have some pending stuff for the kernel mode).
Quote:
I can buy a SiFive P550 8 cores based SoC mainboard for my Framework 13 laptop empty case. https://deepcomputing.io/product/dc-roma-risc-v-ai-pc/
The Framework 13 laptop has two RISC-V mainboard generations already.

Completely irrelevant?
Quote:
Quote:

I bet that the 68k's FPU can do a similar good job because of the above reasons.

Which 68k FPU? 68882 FPU? 68060 FPU? LOL

The 68881 is enough when talking about the FPU: it's already able to access a memory location, and use the post-increment addressing mode.

A SIMD unit... it still has to be defined.
Quote:
Quote:

Anyway, and as I've already explained even other times, code density isn't much relevant on FPU/SIMD code: it's on integer/GP code which is very very important.

My focus on FPU/SIMD is 3D games

And... who cares? The situation will be the same: the code of a game is largely dominated by "integer"/GP instructions.
Quote:
i.e. Amiga's primary target audience when the platform was mainstream.



This doesn't automatically become true just because YOU said it.

In fact, how many 3D games were published for the Amiga?
Quote:
Code density is not a major concern in modern GPUs; e.g. on CUDA GPUs, each 8-byte instruction is accompanied by an 8-byte "op-steering" control block that is not publicly documented. The usual motivation is to simplify the processor hardware while maximizing math units. An x86 GPU was attempted by Intel.

68060 sucked at floating point 3D.

I purchased a TF1260 with a 1994-era 68060 rev1 CPU to investigate the 68060 hype.

The 68K's code density advantage wasn't able to overcome the subpar FPU, the 68060's L1 cache fetch width, and the 32-bit external data bus I/O bottlenecks.

Copy-and-paste engineering would retain the 68060's external 32-bit data bus design scaled to 1 GHz, as a "what if" example. A 1 GHz 32-bit (4-byte) external data I/O is equivalent to 250 MHz at 128 bits. Who's going to improve the 68060's external 32-bit data bus within the proposed SoC?

Modern gaming SoCs have a ZLIB hardware block with storage I/O.

Again: deliver a superior Quake benchmark result against Pentium 100. This is getting f_cking absurd with promises from 68K fanboys.

For 1995's what if, deliver a f_cking PS1-like plan with a f_cking 68060! You can't do it with Motorola's cost structure.

Show me the money.

Again, you completely derail the discussion, talking about totally different things -> pure nonsense, AKA Hammer's PADDING.

If you want to discuss OTHER things, then you can open a thread and move your hallucinations there.

Because this has NOTHING to do with code density, where everything which was already discussed still applies.

In fact, and to prove your ignorance, I can give you some homework: pick any architecture that explicitly supports increasing code density (e.g.: has a proper subset of opcodes/instructions ONLY FOR THIS PURPOSE) and count how many of those instructions are for GP/"integer" computing, how many for the FPU, and how many for the SIMD unit.

I'm preparing the spaceship to colonize the Andromeda Galaxy, filling it with popcorn...

 Status: Offline
Profile     Report this post  
Goto page ( Previous Page 1 | 2 | 3 | 4 )

Copyright (C) 2000 - 2019 Amigaworld.net.
Amigaworld.net was originally founded by David Doyle