agami 
Re: some words on senseless attacks on ppc hardware
Posted on 31-Jan-2024 23:25:38
#801 ]
Super Member
Joined: 30-Jun-2008
Posts: 1648
From: Melbourne, Australia

@BigD

It's always about high performance with these guys.

We said the Cell CPU was cool in reference to its architecture, not because it had top performance.
In many ways it was a curious choice for Sony to put it in a game console, even if it was intended as more than just a game console.
By its design, it was more suited to cluster computing.

As it happens, around the turn of the millennium, when consumer-grade CPUs reached 1GHz, I spent some time working on a cellular computing architecture. Different to how IBM/Toshiba/Sony ended up doing it, but similar in some of the philosophies.

The main change in philosophy is what I would liken to the Navy SEAL's mantra of "Slow is Smooth, Smooth is Fast".

_________________
All the way, with 68k

matthey 
Re: some words on senseless attacks on ppc hardware
Posted on 1-Feb-2024 0:59:38
#802 ]
Super Member
Joined: 14-Mar-2007
Posts: 1999
From: Kansas

BigD Quote:

No one is going to buy a £1000+ PPC AmigaOne from outside the elitist "Classes not the Masses" AmigaOne fanboydom. The Cell had its use and brought us TLOU ahead of its time in 2013. That's good enough for me!


I have been talking about designing and producing budget hardware, potentially for the Amiga masses (under £100). Native 68k Amiga hardware that is better than an RPi 3 should be possible for a similar cost to the A500 Mini. It requires more up front cost though.

agami Quote:

It's always about high performance with these guys.

We said the Cell CPU was cool in reference to its architecture, not because it had top performance.
In many ways it was a curious choice for Sony to put it in a game console, even if it was intended as more than just a game console.
By its design, it was more suited to cluster computing.

As it happens, around the turn of the millennium, when consumer-grade CPUs reached 1GHz, I spent some time working on a cellular computing architecture. Different to how IBM/Toshiba/Sony ended up doing it, but similar in some of the philosophies.

The main change in philosophy is what I would liken to the Navy SEAL's mantra of "Slow is Smooth, Smooth is Fast".


I like usable, consistent performance as opposed to theoretical and peak performance, which are more about hype. The Cell CPU fell into the hyped category. The PS2 design was somewhat similar, with a MIPS CPU with a vector/SIMD unit and a separate vector/SIMD unit used for T&L.

https://en.wikipedia.org/wiki/Emotion_Engine

This setup was popular and worked well. The Cell processor's separate vector/SIMD units (SPEs) didn't have a specific purpose and the CPU core (PPE) was stripped down so far it didn't work well as a general purpose CPU core. If the CPU core had been more usable, Cell wouldn't have a bad reputation. The concept was a reasonable design taken to extremes (clocking the CPU core too high and stripping it to add more SIMD performance for an undefined purpose). The Cell processor may be a decent parallel media processor but it is a poor gaming engine. I like tech that is "different" but it needs to be easy to use with at least as many advantages as disadvantages.

Karlos 
Re: some words on senseless attacks on ppc hardware
Posted on 1-Feb-2024 1:27:48
#803 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4402
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@matthey

Quote:
I have been talking about designing and producing budget hardware, potentially for the Amiga masses (under £100). Native 68k Amiga hardware that is better than an RPi 3 should be possible for a similar cost to the A500 Mini. It requires more up front cost though.


You've been talking about it, but let's be honest, it's a massive load of bollocks. You want some form of physical 68K, and in the past you've touted a custom ASIC for this. Newsflash. Hardware, just like software, can have bugs, and hardware bugs are a bitch to sort out unless your hardware is fundamentally rewritable (FPGA) or you can just keep throwing money at it until it works.

Look, if you want better than RPi 3 performance, you can use a CM4. Job done. And the best bit is that when incompatibilities and bugs turn up in your virtual 68K, you can get a software update to fix them. Marvellous.



_________________
Doing stupid things for fun...

Hammer 
Re: some words on senseless attacks on ppc hardware
Posted on 1-Feb-2024 1:59:36
#804 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5273
From: Australia

@agami

Quote:

agami wrote:
@BigD

It's always about high performance with these guys.

We said the Cell CPU was cool in reference to its architecture, not because it had top performance.
In many ways it was a curious choice for Sony to put it in a game console, even if it was intended as more than just a game console.
By its design, it was more suited to cluster computing.

As it happens, around the turn of the millennium, when consumer-grade CPUs reached 1GHz, I spent some time working on a cellular computing architecture. Different to how IBM/Toshiba/Sony ended up doing it, but similar in some of the philosophies.

The main change in philosophy is what I would liken to the Navy SEAL's mantra of "Slow is Smooth, Smooth is Fast".



The SPEs couldn't even pointer-swap with the host PPE CPU the way normal multi-core CPUs, AMD APUs, NVIDIA CUDA v12.3, and the Xbox 360's PPE/Xenos can.

_________________
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB
Amiga 1200 (Rev 1D1, KS 3.2, PiStorm32lite/RPi 4B 4GB/Emu68)
Amiga 500 (Rev 6A, KS 3.2, PiStorm/RPi 3a/Emu68)

Hammer 
Re: some words on senseless attacks on ppc hardware
Posted on 1-Feb-2024 4:23:51
#805 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5273
From: Australia

@matthey

Quote:

Sure, separate FPU units act like OoO which is nothing special (68k FPUs were similar). The "zero-cycle FXCH" is necessary for the stack based x86 FPU to take advantage of the FPU pipelining as most FPU instructions used the top of stack register/variable creating dependency chains using the result of the previous instruction which is commonly still executing for multi-cycle latency instructions.

Your focus on the FXCH argument is a nothing burger. When a heavy FPU workload was required, the classic Pentium-based workstation was readily available.

You argued that the 68060's FPU needs its own special treatment. Look in the mirror.

What matters is the end-user use cases, e.g. Lightwave and Quake benchmarks trump promises made by the 68060.

Classic Pentium (P5) was Intel's flagship product from 1993 to 1995. Pentium Pro (P6) was released in 1995 and took over the flagship role for Intel's X86 product range. From 1995, classic Pentium filled the second-tier "Celeron" role until Celeron's release.

In 1998, Intel created a three-tier product stack for Pentium II-based SKUs i.e. Celeron, Pentium II, and Xeon.

AMD offered the K6 MMX (RISC86 microarchitecture) alternative from 1997. AMD's K5 (29K RISC-based X86) was a failure (e.g. 133 MHz clock speed wall) and its R&D was terminated in favor of NexGen's RISC86 microarchitecture R&D.

Keep in mind that the 68060 was released in 1994 and the Amiga end user's 68060 @ 50 MHz experience started in 1995.

The 68060's instruction cache provides a maximum of 32 bits per cycle, and the chip has a 32-bit front-side bus.

The P5 Pentium's instruction cache provides a maximum of 64 bits per cycle, and the chip has a 64-bit front-side bus.

From https://www.youtube.com/watch?v=XOLaOVXT1Gk

Quake on 68060 50MHz vs 100MHz Amiga

Same resolution: 320x240. 50 MHz gets 9 FPS, 100 MHz gets 18.1 FPS, on Warp1260.


https://www.youtube.com/watch?v=ZNdfgF7DCNw
Quake Clickboom on BFG9060's 68060 Rev 6 @ 100Mhz Cybervision 64 (S3 Trio 64V) 320x200x8bit Demo1 (A4000)
Results: 23.46 fps.

During 1996, I estimated the Quake performance of a potential upgrade for my A3000, Phase 5's CyberStorm 060 @ 50 MHz / Cyberstorm 64 (S3 Trio 64U), and it was not cost vs performance competitive against a new-build Pentium 150 / OEM S3 Trio 64UV+ PCI-based PC. When the Amiga Quake port was released in 1998, my prediction proved correct.

BFG9060's 32-bit 100 MHz memory design didn't exist in 1996, hence a 1996-era 50 MHz 68060 card gets about half the BFG9060's frame rate.

https://thandor.net/benchmark/33
320x200x8bit Demo1
Pentium 90 (430VX) = 24.30 fps.
Pentium 100 (430VX) = 26.70 fps
Pentium 150 (430VX) = 33.90 fps.
Pentium 166 (430VX) = 37.30 fps.
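
To put the 320x200x8bit Demo1 numbers quoted above on one scale, here is a rough Python sketch that divides each frame rate by the CPU clock. The names are illustrative, and it assumes the runs are comparable, which ignores differences in graphics cards, chipsets and Quake builds.

```python
# Quake Demo1 results quoted in this post (320x200x8bit).
# Keys are (label, clock in MHz); values are the reported fps.
RESULTS = {
    ("68060 rev 6 (BFG9060)", 100): 23.46,
    ("Pentium 90 (430VX)", 90): 24.30,
    ("Pentium 100 (430VX)", 100): 26.70,
    ("Pentium 150 (430VX)", 150): 33.90,
    ("Pentium 166 (430VX)", 166): 37.30,
}


def fps_per_mhz(fps, mhz):
    """Frames per second delivered per MHz of CPU clock."""
    return fps / mhz


def per_clock_table(results):
    """Map each label to its fps/MHz figure."""
    return {label: fps_per_mhz(fps, mhz) for (label, mhz), fps in results.items()}


for label, value in per_clock_table(RESULTS).items():
    print(f"{label:24s} {value:.3f} fps/MHz")
```

On these figures the 100 MHz 68060 sits between the Pentium 100 and the Pentium 150 per clock, but it needs 2024 hardware to get there.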

I jumpered my Pentium 150's 60 MHz FSB to a 66 MHz FSB, hence the "Pentium 166 MHz" recognition in the BIOS start-up.
This is my 1996 Pentium motherboard https://theretroweb.com/motherboards/s/pcpartner-mb520n-35-8258-xx#docs

I overclocked my 68060 rev 1 from 50 MHz to 62.5 MHz on TF1260. I tried 74 MHz and failed.


Quote:

While the Pentium was issuing/executing FXCH instructions, the 68060 could execute integer instructions instead which improved performance for common mixed integer and FPU code. Quake likely used hand coded assembler FPU inlines to take advantage of the pipelined FPU but the advantage is partially offset by FPU advantages of the 68060 like better mixed code and memory handling, shorter instruction latencies in some cases and a cleaner FPU ISA.

Pentium FXCH instruction is zero cycle.

FXCH didn't actually move values around on Pentium and Pentium MMX processors either despite them being "in-order" cores. There was 'rename hardware' sitting in front of the X87/MMX register file.

The FXCH instruction does not in reality swap the contents of two registers, it only swaps their names. Instructions that push or pop the register stack also work by renaming. Floating point register renaming has been highly optimized on the Pentiums so that a register may be renamed while in use.
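
A toy Python model of the renaming idea described above: FXCH exchanges two entries in a name table and leaves the register contents where they are. The class and method names are hypothetical, and the model ignores the x87 top-of-stack pointer and everything else about the real Pentium circuitry.

```python
class RenamedFpStack:
    """Toy x87-style register file with a rename table in front of it.

    Illustration only: it models swapping names instead of values,
    not the actual Pentium hardware.
    """

    def __init__(self):
        self.physical = [0.0] * 8      # physical register contents
        self.rename = list(range(8))   # logical ST(i) -> physical index
        self.value_moves = 0           # counts writes to the register file

    def load(self, i, value):
        """Write a value into logical register ST(i)."""
        self.physical[self.rename[i]] = value
        self.value_moves += 1

    def read(self, i):
        """Read logical register ST(i)."""
        return self.physical[self.rename[i]]

    def fxch(self, i):
        """Exchange ST(0) and ST(i) by swapping table entries only."""
        self.rename[0], self.rename[i] = self.rename[i], self.rename[0]


stack = RenamedFpStack()
stack.load(0, 1.5)
stack.load(3, 2.5)
stack.fxch(3)
print(stack.read(0), stack.read(3), stack.value_moves)
```

The exchange costs no data movement in this model, which is the property that lets the real instruction pair with another FPU instruction.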

Pentium 4's FXCH instruction is not zero cycles.

The 68060 does not support several integer instructions, for example MUL64 and DIV64.

Quote:

The Pentium FDIV instruction is not pipelined like most FPUs and has a longer latency than the 68060 so the 68060 has a small advantage with FDIV.

For Quake, the Pentium's FDIV latency is 19 clock cycles.

Quote:

Having separate pipelined units for FADD, FMUL and FDIV is a significant advantage.

I believe the 68060 combines the FMUL and FADD unit and I don't know if the FDIV unit can execute an instruction in parallel. FXCH is a kludge to allow a benefit from pipelining with the stack based x86 FPU ISA. There is a reason it was replaced by the SIMD unit.

Your focus on FXCH is a nothing burger.

Intel didn't completely replace x87 (double precision FP64) until SSE2's FP64 support.

WinUAE 64-bit edition still has a host FP80-enabled option.

For example, in the D language on Linux x64, the types float and double compile to SSE/SSE2 code, while real tells the compiler to use the implementation's most precise type, i.e. x87's FP80. https://godbolt.org/z/50kr-H

Microsoft's anti-X87 directive is meaningless for some use cases. Until AMD/Intel release hardware FP128 like IBM's POWER9 has, X87's FP80 is still in play in certain use cases.

Quote:

The Bytemark benchmark shows the 68060 isn't far behind in FPU performance and would likely be on par or better in performance with a 25% clock speed advantage or double the caches (ala 68060+) and both could be done together for more of an advantage.

NBench (Bytemark) is not Quake. Bytemark is meaningless for gamers.


Quote:

I improved the vbcc compiler backend support for the 68060. Frank Wille just happened to compile the Bytemark benchmark and record 68060 results not knowing the FPU results were nearly on par with a Pentium at the same clock speed. My changes were only 68060 support code changes with the largest gains likely from eliminating the use of trapped instructions. This does not include changes to the 68k backend for the FPU, FPU arguments are still passed on the stack and there is still no instruction scheduler so it is possible the 68060 FPU outperforms the in-order Pentium at the same clock speed. The Bytemark FPU benchmark index is actually a composite of several realistic FPU benchmarks but there are some algorithms where the 68060 would lack performance like heavy transcendental (trigonometry related) math. Hand coded FPU assembler for the pipelined Pentium FPU would give an advantage in some cases as well like with matrix math. Most FPU code is mixed integer/FPU code even for Quake where the 68060 can be surprisingly competitive with the Pentium even with compiler generated code from a far from major compiler. Any more proof won't happen with Amiga being an emulated EOL platform.

Bytemark doesn't concern the PC audience. What matters is Quake and it sent many X86 cloners to their market death.


Last edited by Hammer on 01-Feb-2024 at 05:39 AM.

_________________
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB
Amiga 1200 (Rev 1D1, KS 3.2, PiStorm32lite/RPi 4B 4GB/Emu68)
Amiga 500 (Rev 6A, KS 3.2, PiStorm/RPi 3a/Emu68)

Hammer 
Re: some words on senseless attacks on ppc hardware
Posted on 1-Feb-2024 5:32:53
#806 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5273
From: Australia

@matthey

Quote:

Motorola had problems with the 68040 hot running chip which is well known. The 68060 runs cool and they were testing 68060@66MHz versions before it was released. Motorola produced PPC CPUs well over 100 MHz using the same process. Most of Motorola's competitors were producing 150+ MHz parts with 7-9 stage core designs using the same process.

In 1995, https://everymac.com/systems/by_year/macs-released-in-1995.html
Motorola's PowerPC 604 reached 132 MHz and PowerPC 603 reached 120 MHz.

Intel's Pentium Pro reached 200 MHz and Pentium reached 133 MHz.

---
In 1996, https://everymac.com/systems/by_year/macs-released-in-1996.html
Motorola's PowerPC 604e reached 200 MHz and PowerPC 603e reached 200 MHz.

Intel's Pentium Pro stayed at 200 MHz and Pentium reached 200 MHz.

---
In 1997, https://everymac.com/systems/by_year/macs-released-in-1997.html
Motorola's PowerPC 750 (G3) reached 266 MHz.

Cyberstorm PPC was released in 1997.

Intel Pentium II reached 300 MHz and Pentium MMX reached 233 MHz.
AMD K6 (Model 6) reached 233 MHz.

---
In 1998, https://everymac.com/systems/by_year/macs-released-in-1998.html
Motorola's PowerPC 750 (G3) reached 333 MHz.

Intel Pentium II reached 450 MHz.
AMD K6 (Model 7) reached 300 MHz and K6-2 (Model 8) reached 400 MHz.

---
In 1999, https://everymac.com/systems/by_year/macs-released-in-1999.html
Motorola's PowerPC 750 (G3) reached 500 MHz.

Intel Pentium III reached 800 MHz (Dec 1999).
AMD K7 Athlon (Model 2) reached 750 MHz.

---
In 2000, https://everymac.com/systems/by_year/macs-released-in-2000.html
Motorola's PowerPC 7400 was at 500 MHz.

Intel Pentium III reached 1133 MHz and Pentium 4 reached 1500 MHz.
AMD K7 Athlon (Model 4) reached 1200 MHz.


Quote:

The Pentium (P55C) MMX increased the pipeline by one stage from 5 to 6. The PPro had a pipeline length of 14 stages. Yes, with a 14 stage pipeline it should be possible to get 150MHz using a 0.5um process. Intel learned how to increase clock speeds with deeper pipelines.

DEC sued Intel on the copied Alpha IP issue. https://www.wired.com/1997/10/intel-dec-settle-alpha-chip-dispute/

Around 1997, Intel gained DEC's state-of-the-art semiconductor fabrication facility in Hudson, Massachusetts, as well as development operations in Jerusalem and Austin (Texas).

Both Intel (via the DEC business unit buyout) and AMD (via NexGen) gained personnel from DEC going into the clock speed wars.

Motorola's clock speeds largely kept up with Intel's until the 1997 to 1998 date range, i.e. DEC's brain drain towards Intel was a major factor.

AMD and Intel engaged in a major clock-speed war around 1998.

_________________
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB
Amiga 1200 (Rev 1D1, KS 3.2, PiStorm32lite/RPi 4B 4GB/Emu68)
Amiga 500 (Rev 6A, KS 3.2, PiStorm/RPi 3a/Emu68)

Hammer 
Re: some words on senseless attacks on ppc hardware
Posted on 2-Feb-2024 2:37:28
#807 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5273
From: Australia

@matthey

Quote:
The first 4 cores in the 7-zip benchmark above are in-order core designs while the last 2 are OoO cores. In-order cores are smaller and simpler with the 3 newer in-order cores used in sub $100 USD hardware while the PPC OoO cores were expensive to develop and produce. The newer in-order cores destroy Cell while the SiFive U74 comes surprisingly close to the 2019 POWER9 using a 14nm chip process which is smaller than used for most of the in-order cores. The SiFive U74 in-order core design is as close as RISC cores come to the 68060 design but CISC instructions can be executed out of each execution pipeline each cycle which are the equivalent of two RISC instructions while avoiding more multi-cycle load-to-use stalls which is the purpose of the CISC like U74 design to begin with. A SiFive U74 core could likely reach 3 DMIPS/MHz executing CISC instructions like the 68k uses (U74 and 68060 cores can execute the equivalent of 5 RISC instructions/cycle using CISC instructions). Some people may think the RPi 4 with OoO ARM Cortex-A72 and RPi 5 with OoO ARM Cortex-A76 have surpassed in-order performance and won the core wars but these OoO cores are several times larger, use several times the power leaving less for the GPU and requiring more expensive power supplies with cooling fans and are much more complex and expensive to develop with increased security risks. 3 DMIPS/MHz in-order cores have a large cost advantage which can be leveraged and a SBC with a good GPU but moderately weaker CPU cores is likely to be more impressive. The VisionFive2 using SiFive U74 CPU cores which are higher performance than the RPi 3 and a better GPU than any of the RPi models is already competitive at $89.99 USD for the 8GiB SBC but RISC-V lacks the software to take advantage.


https://en.wikichip.org/wiki/sifive/microarchitectures/7_series#google_vignette
SiFive's 7 Series microarchitecture has two decoders i.e. dual issue.
SiFive U7 has 2.5 DMIPS/MHz or 4.9 CoreMarks/MHz.

VisionFive2's SiFive U74 CPU cores are clocked at 1.5 GHz. The asking price is USD 89.99 or $138.07 AUD.
----
https://www.eembc.org/coremark/scores.php
ARM Cortex A72 (DRA726) has 5.24 CoreMarks/MHz.

https://www.amazon.com/Raspberry-Pi-Computer-Suitable-Workstation/dp/B0899VXM8F
Raspberry Pi 4 Model B 8GB. The asking price is $118.77 AUD.
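
A back-of-envelope Python sketch of what the figures above imply per core and per dollar. It assumes CoreMark scales linearly with clock, that the RPi 4's Cortex-A72 runs at its stock 1.5 GHz (not stated in this post), and that the DRA726 score carries over to the RPi 4's A72, which it may not. All names are illustrative.

```python
# Per-core figures quoted in this post.
U74_COREMARK_PER_MHZ = 4.9     # SiFive U7, wikichip page
A72_COREMARK_PER_MHZ = 5.24    # Cortex-A72 (DRA726), EEMBC list
VISIONFIVE2_MHZ = 1500         # stated in the post
RPI4_MHZ = 1500                # ASSUMPTION: stock RPi 4 Model B clock
VISIONFIVE2_AUD = 138.07       # asking price from the post
RPI4_AUD = 118.77              # asking price from the post


def coremark(per_mhz, mhz):
    """Estimated single-core CoreMark, assuming linear scaling with clock."""
    return per_mhz * mhz


def coremark_per_aud(per_mhz, mhz, price_aud):
    """Estimated single-core CoreMark per Australian dollar of board price."""
    return coremark(per_mhz, mhz) / price_aud


for name, per_mhz, mhz, price in (
    ("VisionFive2 (U74)", U74_COREMARK_PER_MHZ, VISIONFIVE2_MHZ, VISIONFIVE2_AUD),
    ("RPi 4 8GB (A72)", A72_COREMARK_PER_MHZ, RPI4_MHZ, RPI4_AUD),
):
    print(f"{name:20s} {coremark(per_mhz, mhz):8.0f} CoreMark, "
          f"{coremark_per_aud(per_mhz, mhz, price):6.1f} per AUD")
```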

GPIO's performance is a larger concern for the PiStorm-like solution.

Claude Schwarz's PiStorm32 (non-lite) 2021 prototype has a Raspberry Pi CM4 PCIe linkage and a Lattice ECP5-12 FPGA.

The current PiStorm32 Lite has Efinix FPGA with RPi GPIO linkage.


Last edited by Hammer on 02-Feb-2024 at 02:52 AM.

_________________
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB
Amiga 1200 (Rev 1D1, KS 3.2, PiStorm32lite/RPi 4B 4GB/Emu68)
Amiga 500 (Rev 6A, KS 3.2, PiStorm/RPi 3a/Emu68)

Karlos 
Re: some words on senseless attacks on ppc hardware
Posted on 2-Feb-2024 11:26:31
#808 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4402
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

I'll just say it again. There's zero sense in trying to build high performance 68K hardware (except perhaps FPGA that can be bugfixed readily). Even real 68060 units are difficult to interface with modern memory these days.

What we need from 68K is the instruction set compatibility only. The PiStorm is an exemplar of a high performance 68K accelerator done right. It exists already, it's relatively inexpensive, it provides RTG, its bugs/incompatibilities tend to be software fixes, and it has plenty of memory capacity and bandwidth.

_________________
Doing stupid things for fun...

matthey 
Re: some words on senseless attacks on ppc hardware
Posted on 2-Feb-2024 18:59:14
#809 ]
Super Member
Joined: 14-Mar-2007
Posts: 1999
From: Kansas

Karlos Quote:

You've been talking about it, but let's be honest, it's a massive load of bollocks. You want some form of physical 68K, and in the past you've touted a custom ASIC for this. Newsflash. Hardware, just like software, can have bugs, and hardware bugs are a bitch to sort out unless your hardware is fundamentally rewritable (FPGA) or you can just keep throwing money at it until it works.


Jay Miner must have been insane. There were no affordable FPGAs or auto layout tools back then and chip development and production were very expensive.

1982-1983
average car cost: ~$9,000
median house cost: ~$70,000
Fed funds effective rate: ~8.5-15% (corporate borrowing costs would be several percent higher)
initial Amiga investment: ~$7 million
RJ Mical estimated necessary Amiga investment: ~$49 million
RJ estimate would have Amiga costing the equivalent of ~5444 cars to bring to market
RJ estimate would have Amiga costing the equivalent of ~700 houses to bring to market

2023
average car cost: ~$48,000
median house cost: ~$430,000
Fed funds effective rate: 5.33%
my rough estimate to bring a small 68k Amiga SBC with SoC ASIC to market: $5-15 million
A $10 million estimate would have Amiga costing the equivalent of ~208 cars to bring to market.
A $10 million estimate would have Amiga costing the equivalent of ~23 houses to bring to market.

If I'm insane, Jay Miner was more insane.
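
The car and house equivalents above are plain division. A minimal Python sketch that reproduces them from the dollar figures in this post (function and variable names are illustrative):

```python
def equivalents(investment, car_price, house_price):
    """Express an investment as whole cars and whole houses."""
    return investment // car_price, investment // house_price


# 1982-1983: RJ Mical's ~$49 million estimate.
cars_1983, houses_1983 = equivalents(49_000_000, 9_000, 70_000)

# 2023: the $10 million midpoint of the $5-15 million estimate.
cars_2023, houses_2023 = equivalents(10_000_000, 48_000, 430_000)

print(cars_1983, houses_1983)   # 5444 700
print(cars_2023, houses_2023)   # 208 23
```

Measured in cars, the 2023 estimate is roughly 26 times smaller than the 1983 one; measured in houses, roughly 30 times smaller.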

Karlos Quote:

Look, if you want better than RPi 3 performance, you can use a CM4. Job done. And the best bit is that when incompatibilities and bugs turn up in your virtual 68K, you can get a software update to fix them. Marvellous.


Emulated performance is not as good as native performance and brings many disadvantages.

Karlos Quote:

I'll just say it again. There's zero sense in trying to build high performance 68K hardware (except perhaps FPGA that can be bugfixed readily). Even real 68060 units are difficult to interface with modern memory these days.


FPGAs are an important tool for development and could be used in a final design where flexibility is desired or testing is incomplete. An ASIC is required for competitive CPU clock speeds though.

The 68060 doesn't have a built-in memory controller, so more external logic is required to interface with memory. That turned out to be flexible: synchronous memory became popular, and a built-in controller would have tied the chip to older asynchronous memory. The 68060 core design is similar to newer designs but is missing some newer advancements.

Karlos Quote:

What we need from 68K is the instruction set compatibility only. The PiStorm is an exemplar of a high performance 68K accelerator done right. It exists already, it's relatively inexpensive, it provides RTG, its bugs/incompatibilities tend to be software fixes, and it has plenty of memory capacity and bandwidth.


The PiStorm won't bring many users or developers back to the Amiga. Most ex-Amiga users have WinUAE on x86-64 hardware elsewhere that has superior performance and features. Emulation and virtual machines are nice gimmicks but THEA500 Mini didn't provide many new Amiga users or developers despite likely hundreds of thousands of units sold. The Amiga is dead while the RPi continues to attract real users and real developers using real hardware.

matthey 
Re: some words on senseless attacks on ppc hardware
Posted on 2-Feb-2024 22:05:05
#810 ]
Super Member
Joined: 14-Mar-2007
Posts: 1999
From: Kansas

Hammer Quote:

Your focus on the FXCH argument is a nothing burger. When a heavy FPU workload was required, the classic Pentium-based workstation was readily available.

You argued that the 68060's FPU needs its own special treatment. Look in the mirror.

What matters is the end-user use cases, e.g. Lightwave and Quake benchmarks trump promises made by the 68060.


The 68060 has the best FPU performance with mixed integer/FPU code. The 68040 and 486 also have best performance with mixed integer/FPU code. The reason is that integer instructions can be executed at the same time as FPU instructions, similar to OoO execution. The 68060 can issue integer instructions with FPU instructions where the 68040 and 680x0+6888x generally can't. The 68060 FPU instructions usually require fewer cycles to execute, limiting the number of integer instructions that can be executed in parallel, and superscalar multi-issue partially offsets the reduced cycles. The takeaway is that instruction scheduling for the 68060, 68040 and 6888x is mostly similar.

This is not true for the Pentium and predecessors, where code for the 486 practically doesn't benefit from the FPU pipelining and code for the Pentium runs poorly on the 486 where the FXCH instructions are not free. The inputs for pipelined FPU instructions need to be independent to avoid dependencies and only one can be without FXCH. It looks like a common FPU instruction with FXCH is 6 bytes for all stack register independent inputs, which bloats the code as well (all register input 68k instructions are usually 4 bytes). Also, only 8 FPU registers limits the amount of pipelining possible, at least without FPU register renaming, which was expensive back then. I rarely needed more than 8 FPU registers without pipelining for 68k FPU support code due to CISC mem-reg FPU capabilities.

The missing 6888x FPU instructions were more of a problem for 68040 and 68060 performance. This affected Lightwave more than Quake. The expected solution from Motorola was to recompile code for the 68040 or 68060 when trapped FPU instructions were common. Lightwave is compiled with SAS/C which received minimal 68040 and 68060 enhancements (nothing like what I did for vbcc). SAS/C continues to use trapped instructions when compiling for the 68040 and 68060 while GCC will also when compiling for 680x0-68060. Motorola's strategy did not work out as well as they hoped and 68060 compiler support was especially poor. Motorola engineers were correct that a relatively few common FPU instructions were used very often and provided most of the performance. The transistor savings from trapping less common FPU instructions and not pipelining the FPU could have been used to double the cache sizes providing a performance boost to CPU and FPU instructions (68060+). Today, a few hundred thousand transistors are nothing, all FPU instructions could be implemented, multiple FPU unit pipelining and FPU register renaming could all be implemented. The Motorola strategy was enticing though.


Hammer Quote:

The 68060's instruction cache provides a maximum of 32 bits per cycle, and the chip has a 32-bit front-side bus.

The P5 Pentium's instruction cache provides a maximum of 64 bits per cycle, and the chip has a 64-bit front-side bus.


The 68060 was already ahead of the Pentium in performance using a 32 bit memory bus, due to better cache efficiency, which allowed for a cheaper CPU chip with fewer pins and cheaper 32 bit memory. Doubling the caches would have provided more performance, perhaps 20%-30% more from the 68060+ while costing less than the Pentium.

Hammer Quote:

The 68060 does not support several integer instructions, for example MUL64 and DIV64.


It's too bad 64 bit integer MULx was removed as it is not uncommon. GCC uses it for magic number code to avoid 32 bit divides. The 68060 already has 32x32=32 and a 32x32 multiplier can produce a 64 bit result. The 64 bit division takes quite a few transistors though (unless done in the FPU which wasn't always available with some 68060 variants). The move may make some sense for embedded use but was the opposite direction of many desktop and even high end embedded CPUs which were adding integer 64 bit MUL and DIV support around the same time.
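
To show why the 64 bit product matters for that magic number code, here is a small Python sketch of the technique: an unsigned 32 bit divide by a constant becomes one 32x32=64 multiply plus a shift of the high half. The divisor 10 and its constant 0xCCCCCCCD are a standard textbook example chosen for illustration, not something taken from GCC output for the 68k, and the function names are made up.

```python
MASK32 = 0xFFFFFFFF


def mulu_64(a, b):
    """32x32 -> 64 bit unsigned multiply, returned as (high, low) halves."""
    product = (a & MASK32) * (b & MASK32)
    return product >> 32, product & MASK32


def div10(n):
    """Unsigned 32 bit n // 10 without a divide instruction."""
    high, _low = mulu_64(n, 0xCCCCCCCD)   # magic = ceil(2**35 / 10)
    return high >> 3                      # 32 + 3 = total shift of 35 bits


for n in (0, 9, 10, 99, 123456789, 0xFFFFFFFF):
    assert div10(n) == n // 10
```

Only the high half of the product is used, so a CPU with only the 32x32=32 form has to build that half from several narrower multiplies or fall back to a real divide.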

Hammer Quote:

NBench (Bytemark) is not Quake. Bytemark is meaningless for gamers.


I wouldn't say Bytemark is meaningless for games but it is just one benchmark. Quake is quite a good benchmark itself. Unfortunately, the vbcc 68k backend doesn't produce integer code anywhere close to the best GCC compiler versions for the 68k and GCC compiler versions that generate good 68k integer code are poor at generating 68k FPU code. THEA500 Mini, WinUAE x86-64, PiStorm and A600GS won't change this situation. They encourage ARM and x86-64 developers, not 68k Amiga developers.


Hammer Quote:

https://en.wikichip.org/wiki/sifive/microarchitectures/7_series#google_vignette
SiFive's 7 Series microarchitecture has two decoders i.e. dual issue.
SiFive U7 has 2.5 DMIPS/MHz or 4.9 CoreMarks/MHz.

VisionFive2's SiFive U74 CPU cores are clocked at 1.5 GHz. The asking price is USD 89.99 or $138.07 AUD.


Newer versions of SiFive U74 cores are claimed to be up to 2.64 DMIPS/MHz. That is with just two integer execution units which is very impressive (in-order U74 core is close to the performance of the best OoO PPC cores using fewer resources). Like the 68060 CPU core, they can remove/fold out predicted branches and probably superscalar execute FPU instructions so at least 3 instructions/cycle can be retired. Unlike the U74 core, the 68060 can execute instructions in each integer pipe in a single cycle that are the equivalent of 2 RISC instructions so the 68060 can retire the equivalent of 5 RISC instructions per cycle (U74 core can retire at least 3 RISC instructions/cycle). The U74 core design is very similar to the 68060 core design but is handicapped by executing weak RISC instructions.



https://allinfo.space/2022/08/28/visionfive-2-single-board-computer-with-risc-v-processor-and-3d-graphics-chip/

Hammer Quote:

https://www.eembc.org/coremark/scores.php
ARM Cortex A72 (DRA726) has 5.24 CoreMarks/MHz.


The newer ARM OoO cores are not even limited OoO like PPC but aggressive micro-oped OoO designs like x86-64 cores. These OoO cores cost tens of millions of dollars to develop, they are many times the size and cost of an in-order CPU core, they are approaching the power draw of x86-64 cores and they have more security risks. With Moore's Law slowing, that is not where I would want to try to compete. How cheap can a relatively simple 3 DMIPS/MHz in-order core be produced is what I would like to see explored (SiFive U74 core is the right idea but they have weak RISC instructions and don't have a large enough market). I see a 68k advantage in executing more powerful CISC instructions and having a better code density and smaller footprint than the competition. It may be possible to have a cost advantage by avoiding royalties compared to ARM commodity cores too.

Karlos 
Re: some words on senseless attacks on ppc hardware
Posted on 2-Feb-2024 22:35:10
#811 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4402
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@matthey

Jay was a chip designer, but he didn't design CPUs. Which is it you design?

Quote:
Emulated performance is not as good as native performance


Since no ARM runs 68K code natively and the hypothetical ASIC doesn't fecking exist, I'll file that observation under "nonsensical statement that never gets any better no matter how many times reiterated."

Quote:
and brings many disadvantages


Name one that actually matters, at all, for properly written user mode software.

_________________
Doing stupid things for fun...

matthey 
Re: some words on senseless attacks on ppc hardware
Posted on 2-Feb-2024 23:18:42
#812 ]
Super Member
Joined: 14-Mar-2007
Posts: 1999
From: Kansas

Karlos Quote:

Jay was a chip designer, but he didn't design CPUs. Which is it you design?


And Jay didn't understand the finances well or he would have looked at the numbers above and decided never to start an Amiga computer. The numbers may come close to working for a simple game device/console but when the gaming market crashed, continuing with a more expensive computer was a long shot at best. He was fortunate to not lose his house but he lost something that may have been as important to him, control of his baby. Of course C= was suffering because of the crazy borrowing costs too, but their finances were not tight in the right places. Of course you have to find the right people. Jay found good help at least.

Karlos Quote:

Since no ARM runs 68K code natively and the hypothetical ASIC doesn't fecking exist, I'll file that observation under "nonsensical statement that never gets any better no matter how many times reiterated."


A scalar 68k ASIC CPU that can execute 1 instruction/cycle would outperform many if not most ARM CPUs at the same clock speed.

Karlos Quote:

Name one that actually matters, at all, for properly written user mode software.


Disadvantages of 68k emulation on ARM include higher CPU % of use constantly, more jitter (inconsistent performance), more caches wasted, more memory used, more power used, no 68k development, not competitive or attractive for most uses.

Karlos 
Re: some words on senseless attacks on ppc hardware
Posted on 2-Feb-2024 23:56:03
#813 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4402
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@matthey

Quote:
A scalar 68k ASIC CPU that can execute 1 instruction/cycle would outperform many if not most ARM CPUs at the same clock speed.


A deluxe 10 flavour ice cream making machine with dual cone dispenser that can execute 2 instructions/cycle would outperform your ASIC at the same clockspeed and can clock way higher due to the inbuilt refrigeration system used to keep the ice cream cold.

The difference is, the CM4/PiStorm exists. Your CPU and my ice cream machine/CPU hybrid do not.

Quote:
Disadvantages of 68k emulation on ARM include higher CPU % of use constantly, more jitter (inconsistent performance), more caches wasted, more memory used, more power used, no 68k development, not competitive or attractive for most uses.


Hypothetical nonsense. I asked for a disadvantage that matters to actual well written user mode software. The variance in performance under JIT might be a problem for a critical realtime process that depends on cycle exact timing. But guess what. That's a problem for almost any CPU that has caches and an MMU and has more than one thing to do at once. Which is like every full 68030+ equipped Amiga.

The memory usage is irrelevant because it's being used by the emulator. The 68K has 2GB addressable all together out of 4GB or more available to the host.

Now let's consider the RPi advantages over your proposed alternative:

It exists and is available right now, doesn't cost an up front wedge in R&D, provides more than just a CPU, is actively maintained..

I could go on, but I don't need to because *actually existing* crits your wet dream imaginary solution for all the HP.

_________________
Doing stupid things for fun...

Hammer 
Re: some words on senseless attacks on ppc hardware
Posted on 3-Feb-2024 6:06:11
#814 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5273
From: Australia

@matthey

Quote:

The 68060 has the best FPU performance with mixed integer/FPU code.

Words are cheap. Prove it with Quake benchmarks.

The Amiga lost the Lightwave rendering niche.

Quote:

The 68040 and 486 also have best performance with mixed integer/FPU code. The reason is that integer instructions can be executed at the same time as FPU instructions similar to OoO execution. The 68060 can issue integer instructions with FPU instructions where the 68040 and 680x0+6888x generally can't.

The 68040 is single-issue only, so that is expected.

Quote:

The 68060 FPU instructions usually require fewer cycles to execute limiting the number of integer instructions that can be executed in parallel and superscalar multi-issue partially offsets the reduced cycles. The take away is that instruction scheduling for the 68060, 68040 and 6888x is mostly similar. This is not true for the Pentium and predecessors where code for the 486 practically doesn't benefit from the FPU pipelining and code for the Pentium runs poorly on the 486 where the FXCH instructions are not free.

68060's instruction set is not a strict superset of 68040's instruction set, hence 68060 wasn't a straight drop-in replacement for Mac 68K.

68060 didn't match Pentium's backward compatibility.

https://www.nxp.com/docs/en/supporting-information/MC68060AR.pdf
Porting software from an MC68040 to an MC68060.


------

In 1996, many PC gamers like myself had switched over to Pentium-based PCs since PS1 game ports required this performance level.

PS1 arrived in Western markets in Q4 1995.

P5 Pentium architecture was already 3 years old in 1996.

Quote:

The inputs for pipelined FPU instructions need to be independent to avoid dependencies and only one can be without FXCH. It looks like a common FPU instruction with FXCH is 6 bytes for all stack register independent inputs which bloats the code as well (all register input 68k instructions are usually 4 bytes). Also, only 8 FPU registers limits the amount of pipelining possible, at least without FPU register renaming which was expensive back then. I rarely needed more than 8 FPU registers without pipelining for 68k FPU support code due to CISC mem-reg FPU capabilities.

68060 instruction cache can only provide 4 bytes (32 bits) of instructions per cycle, hence this design issue has reduced 68060's performance. This design issue is fixed on AC68080.

https://en.m.wikipedia.org/wiki/File:Intel_Pentium_arch.svg
The diagram shows
1. Pentium's instruction cache provides 32 bytes (256 bits) of instructions per cycle.
2. Pentium FPU with three functional units. The bottleneck is Control/Register File's single FP instruction dispatcher. Control/Register File already has register rename capability.

On the Intel 486 processor, each FXCH takes 4 clocks while there is no penalty on the Pentium processor.

Pentium FPU is pipelined despite 8 FPU registers.

AC68080 needs 68K MMU for embedded market Linux.

Quote:

The missing 6888x FPU instructions were more of a problem for 68040 and 68060 performance. This affected Lightwave more than Quake. The expected solution from Motorola was to recompile code for the 68040 or 68060 when trapped FPU instructions were common.

The 68882's throughput is in the low 1.x MFLOPS range, which is less useful for games.

Quote:

Lightwave is compiled with SAS/C which received minimal 68040 and 68060 enhancements (nothing like what I did for vbcc). SAS/C continues to use trapped instructions when compiling for the 68040 and 68060 while GCC will also when compiling for 680x0-68060. Motorola's strategy did not work out as well as they hoped and 68060 compiler support was especially poor. Motorola engineers were correct that a relatively few common FPU instructions were used very often and provided most of the performance. The transistor savings from trapping less common FPU instructions and not pipelining the FPU could have been used to double the cache sizes providing a performance boost to CPU and FPU instructions (68060+). Today, a few hundred thousand transistors are nothing, all FPU instructions could be implemented, multiple FPU unit pipelining and FPU register renaming could all be implemented. The Motorola strategy was enticing though.

From game console vendor's POV, 68K incompatibility is a major factor.

The Amiga platform has benefited from the WHDLoad effort's patching for the full 32-bit 68K CPUs.

Besides the FP80 issue, Emu68 (self-reports as 68040) implements the full 68K instruction set, like AC68080.

Quote:

The 68060 was already ahead of the Pentium in performance using a 32 bit memory bus, due to better cache efficiency, which allowed for a cheaper CPU chip with fewer pins and cheaper 32 bit memory. Doubling the caches would have provided more performance, perhaps 20%-30% more from the 68060+ while costing less than the Pentium.

Your cheaper assertion doesn't reflect the real world, where Pentium Socket 5 and Socket 7 PC motherboards are manufactured with economies of scale.

Logistics wins the wars.


Quote:

It's too bad 64 bit integer MULx was removed as it is not uncommon. GCC uses it for magic number code to avoid 32 bit divides. The 68060 already has 32x32=32 and a 32x32 multiplier can produce a 64 bit result. The 64 bit division takes quite a few transistors though (unless done in the FPU which wasn't always available with some 68060 variants). The move may make some sense for embedded use but was the opposite direction of many desktop and even high end embedded CPUs which were adding integer 64 bit MUL and DIV support around the same time.
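The "magic number" trick mentioned above replaces a divide by a constant with a multiply whose full 64 bit product is then shifted, which is why a 32x32=64 multiply matters. A minimal sketch of the round-up method for unsigned 32 bit values follows; the function names are illustrative only, and this is not claimed to be exactly what GCC emits for the 68k.

```python
def magic_for(d: int, bits: int = 32) -> tuple:
    """Return (multiplier, shift) such that n // d == (n * multiplier) >> shift
    for every unsigned n below 2**bits (round-up method)."""
    if d < 1:
        raise ValueError("divisor must be positive")
    log2_ceil = (d - 1).bit_length()        # ceil(log2(d))
    shift = bits + log2_ceil
    multiplier = -(-(1 << shift) // d)      # ceil(2**shift / d)
    return multiplier, shift


def magic_div(n: int, d: int) -> int:
    """Divide unsigned 32 bit n by constant d using multiply and shift only."""
    multiplier, shift = magic_for(d)
    # The product needs more than 32 bits, so the upper half must be available.
    return (n * multiplier) >> shift


# Spot-check against ordinary integer division, including the 32 bit extremes.
for d in (3, 7, 10, 641):
    for n in (0, 1, d - 1, d, 123456789, 0xFFFFFFFF):
        assert magic_div(n, d) == n // d
print("multiply-and-shift matches n // d for the sampled values")
```

The multiplier produced this way can need 33 bits; compilers usually reduce it when it is even. For d = 10 it halves to the familiar 0xCCCCCCCD with a shift of 35. Either way the upper half of the product is what carries the quotient, so a multiply that only returns the low 32 bits cannot be used directly.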


https://www.nxp.com/docs/en/supporting-information/MC68060AR.pdf
Porting software from an MC68040 to an MC68060 has a list of missing 68K instructions for 68060.

This missing instructions design issue is fixed on AC68080.


Quote:

I wouldn't say Bytemark is meaningless for games but it is just one benchmark. Quake is quite a good benchmark itself. Unfortunately, the vbcc 68k backend doesn't produce integer code anywhere close to the best GCC compiler versions for the 68k and GCC compiler versions that generate good 68k integer code are poor at generating 68k FPU code. THEA500 Mini, WinUAE x86-64, PiStorm and A600GS won't change this situation. They encourage ARM and x86-64 developers, not 68k Amiga developers.

PiStorm is just a gateway interface between the Amiga and the RPi SBC. Emu68 is the software component that acts as the 68K-to-ARMv8 translation hypervisor.

The CPU vendors who didn't learn from Quake have doomed themselves. The problem with Quake 3 has gimped PowerMac gaming.

PiStorm-Emu68 still encourages 68K development; think of Transmeta's Code Morphing Software. https://www.cs.cornell.edu/courses/cs6120/2019fa/blog/transmeta/
https://ieeexplore.ieee.org/document/1191529

If ARM software is the target, I have my Qualcomm Snapdragon 8 Android phone.

Quote:

The newer ARM OoO cores are not even limited OoO like PPC but aggressive micro-oped OoO designs like x86-64 cores. These OoO cores cost tens of millions of dollars to develop, they are many times the size and cost of an in-order CPU core, they are approaching the power draw of x86-64 cores and they have more security risks. With Moore's Law slowing, that is not where I would want to try to compete. How cheap can a relatively simple 3 DMIPS/MHz in-order core be produced is what I would like to see explored (SiFive U74 core is the right idea but they have weak RISC instructions and don't have a large enough market). I see a 68k advantage in executing more powerful CISC instructions and having a better code density and smaller footprint than the competition. It may be possible to have a cost advantage by avoiding royalties compared to ARM commodity cores too.

The quoted retail asking prices are stated by the mentioned Amazon links.

Both Qualcomm Oryon (ARMv9 ISA) and Apple M3 (ARMv8.6 ISA) are ARM64 clones that compete against ARM Ltd's Cortex X series.

RISC-V mostly covers single-purpose embedded devices, e.g. GeForce RTX GPUs have NVIDIA's custom RISC-V CPU, and it's hidden from 3rd party programmers.

Last edited by Hammer on 03-Feb-2024 at 06:25 AM.
Last edited by Hammer on 03-Feb-2024 at 06:21 AM.

_________________
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB
Amiga 1200 (Rev 1D1, KS 3.2, PiStorm32lite/RPi 4B 4GB/Emu68)
Amiga 500 (Rev 6A, KS 3.2, PiStorm/RPi 3a/Emu68)

kolla 
Re: some words on senseless attacks on ppc hardware
Posted on 3-Feb-2024 9:37:26
#815 ]
Elite Member
Joined: 21-Aug-2003
Posts: 2882
From: Trondheim, Norway

@matthey

Years and years of the same… please just do it already!

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

matthey 
Re: some words on senseless attacks on ppc hardware
Posted on 3-Feb-2024 20:36:43
#816 ]
Super Member
Joined: 14-Mar-2007
Posts: 1999
From: Kansas

Karlos Quote:

A deluxe 10 flavour ice cream making machine with dual cone dispenser that can execute 2 instructions/cycle would outperform your ASIC at the same clockspeed and can clock way higher due to the inbuilt refrigeration system used to keep the ice cream cold.

The difference is, the CM4/PiStorm exists. Your CPU and my ice cream machine/CPU hybrid do not.


The 68k Amiga doesn't get a deluxe ice cream maker. It's more like a cheap shaved ice machine with corn syrup sweetener but has more markup. That's what the ARM Cortex-A53 based THEA500 Mini and A600GS are like. A scalar 68k CPU could and should outperform the in-order superscalar Cortex-A53 at the same clock speed due to fewer instructions (CISC instruction=2xRISC instructions) and no load-to-use stalls.
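The claim above can be put into a toy cycle count to see what it depends on. This is a sketch under loudly stated assumptions: the workload size, the sustained IPC values and the stall count are all hypothetical numbers chosen for illustration, not measurements of any 68k or Cortex-A53 part. Only the 2x instruction ratio comes from the post.

```python
def cycles(instructions: float, sustained_ipc: float, stall_cycles: float = 0.0) -> float:
    """Very rough cycle estimate: issue time plus extra stall cycles."""
    return instructions / sustained_ipc + stall_cycles


CISC_INSTRUCTIONS = 1000                      # hypothetical workload in 68k instructions
RISC_INSTRUCTIONS = 2 * CISC_INSTRUCTIONS     # the post's "CISC instruction = 2 x RISC"

# Scalar 68k: 1 instruction/cycle, no load-to-use stalls (as the post assumes).
scalar_68k = cycles(CISC_INSTRUCTIONS, sustained_ipc=1.0)

# Dual-issue in-order core at its theoretical peak: ties the scalar 68k.
a53_peak = cycles(RISC_INSTRUCTIONS, sustained_ipc=2.0)

# Same core with a hypothetical sustained IPC of 1.5 and 100 load-to-use stall cycles.
a53_stalled = cycles(RISC_INSTRUCTIONS, sustained_ipc=1.5, stall_cycles=100)

print(f"scalar 68k:        {scalar_68k:.0f} cycles")
print(f"dual-issue, peak:  {a53_peak:.0f} cycles")
print(f"dual-issue, stalls:{a53_stalled:.0f} cycles")
```

Under these assumptions the scalar core only pulls ahead when the dual-issue core falls short of 2 instructions/cycle or stalls, so the argument rests on how often that happens in real code and on whether the 2x ratio holds.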

Karlos Quote:

Hypothetical nonsense. I asked for a disadvantage that matters to actual well written user mode software. The variance in performance under JIT might be a problem for a critical realtime process that depends on cycle exact timing. But guess what. That's a problem for almost any CPU that has caches and an MMU and has more than one thing to do at once. Which is like every full 68030+ equipped Amiga.

The memory usage is irrelevant because it's being used by the emulator. The 68K has 2GB addressable all together out of 4GB or more available to the host.


What percentage of embedded systems emulate another ISA? Even desktop emulation and virtual machines went the way of the dodo bird because they aren't efficient. The Amiga is going the way of the dodo bird too.

Karlos Quote:

Now let's consider the RPi advantages over your proposed alternative:

It exists and is available right now, doesn't cost an up front wedge in R&D, provides more than just a CPU, is actively maintained..

I could go on, but I don't need to because *actually existing* crits your wet dream imaginary solution for all the HP.


Not much would ever be created if alternatives were considered that would suffice or were "good enough" including the original Amiga. Technology would not improve with your stick-your-head-in-a-hole view of the world.

Karlos 
Re: some words on senseless attacks on ppc hardware
Posted on 3-Feb-2024 22:37:45
#817 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4402
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@matthey

Quote:
. The Amiga is going the way of the dodo bird too

Is going? It's 2024. It went that way decades ago already!

Quote:
What percentage of embedded systems emulate another ISA? Even desktop emulation and virtual machines went the way of the dodo bird because they aren't efficient.


This is a total strawman argument because there generally aren't any embedded systems that *need* to emulate another ISA. That said, Java and .net are still more popular than the Amiga has ever been and virtual machines, by which I specifically mean software defined bytecode machines, continue to power most of the internet and mobile applications you use day to day. Horrific though it is to contemplate, half the things you do on a mobile device these days are just mini chromeless browsers running JS. I know, it makes me shudder as well, but that's how it is.

Yes native is great, but there is no modern native 68K hardware solution and no matter how much you go on about the cycle efficiency of 68060, it's just not relevant. Unless you have a huge disposable wedge of money and the ability to bring it to market, it's just a dream.

Quote:
Not much would ever be created if alternatives were considered that would suffice or were "good enough" including the original Amiga


What? There have been numerous new 060 boards in recent years that are "good enough" solutions for most people. There has been the Vampire, which is even more performant.

The PiStorm utterly batters these into the dust performance wise and is more affordable.

I hate to be the one to break it to you but you don't have a hope in hell of creating an ASIC solution, because even if it was better than the deluxe ten flavour ice cream version, you don't have the up front development funds and you don't have the guaranteed market to sell it to, either. We are all dropping dead, slowly but surely.

You've gone on about this idea for decades. Either do it, if you can, or accept that you can't and just enjoy what there is, while there is still the time left to enjoy it.

Last edited by Karlos on 03-Feb-2024 at 10:59 PM.

_________________
Doing stupid things for fun...

OneTimer1 
Re: some words on senseless attacks on ppc hardware
Posted on 3-Feb-2024 23:26:36
#818 ]
Cult Member
Joined: 3-Aug-2015
Posts: 973
From: Unknown

@matthey

Quote:


A scalar 68k ASIC CPU that can execute 1 instruction/cycle would outperform many if not most ARM CPUs at the same clock speed.


The 68060 runs some instructions even faster, it was a good CPU but never got the updates it would need to compete with a Pentium.

Oh BTW, a cheap ARM CPU from a RasPi has 4 cores and does 'out of order' execution, so it might beat every 'scalar 68k ASIC CPU' by a factor of 4.

matthey 
Re: some words on senseless attacks on ppc hardware
Posted on 4-Feb-2024 2:38:09
#819 ]
Super Member
Joined: 14-Mar-2007
Posts: 1999
From: Kansas

Hammer Quote:

The 68040 is single-issue only, so that is expected.


The 68040 FPU can execute more than one FPU instruction "concurrently".

M68040 User's Manual Quote:

Like the IU, the FPU has been optimized for the most frequently used instructions and data types to provide the highest possible performance. To boost performance further, the FMOVE instruction concurrently executes with arithmetic calculations and executes completely transparent to the user. Instructions can execute nonsequentially as long as there are no register dependencies.


Hammer Quote:

68060's instruction set is not a strict superset of 68040's instruction set, hence 68060 wasn't a straight drop-in replacement for Mac 68K.

68060 didn't match Pentium's backward compatibility.


For integer instructions, the removal of the 64 bit result MULx and DIVx hurt by far the most. The others had negligible effect on performance or code density and were not so bad to trap. I have never seen a CHK2, CMP2 or CAS2 instruction used in Amiga code while CAS2 is documented as illegal on the Amiga. MOVEP was used by a few old Amiga games while the only issue was a lack of 680x0.library in kickstart at startup.

For FPU instructions, the 68060 removed FScc, FDBcc and FTRAPcc, which I have never seen used on the Amiga, and added back FINTRZ and FINT, which are very common. Removing FINT(RZ) from the 68040 FPU was an epic mistake that never should have happened. The 68060 FPU was overall a nice improvement from the 68040 FPU as far as frequency of trapped instructions.

Overall, the 68060 instruction removal was minor other than the 64 bit MULx and DIVx. The other instructions removed were rare. Many of the 6888x FPU instructions removed with the 68040 are more useful, although it is arguable whether the code should be part of the FPU or as regular 68k functions. Not counting trap overhead, most removed 6888x instructions are not much faster than similar functions. The legacy Pentium FPU instructions likely didn't give much of a performance advantage after trapped FPU instructions on the 68060 were replaced with function calls. The saved transistors likely could be used to increase performance more with increased caches.

Hammer Quote:

68060 instruction cache can only provide 4 bytes (32 bits) of instructions per cycle, hence this design issue has reduced 68060's performance. This design issue is fixed on AC68080.


Motorola had a different perspective on this 68060 feature.

Motorola High-Performance Internal Product Portfolio Overview Quote:

Competitive Advantages:
Intel Pentium: Dominates PC-DOS market
o Weaknesses: Requires 64-bit bus.
68060: Superior integer performance with low-cost memory system


Hammer Quote:

From game console vendor's POV, 68K incompatibility is a major factor.

The Amiga platform has benefited from the WHDLoad effort's patching for the full 32-bit 68K CPUs.

Besides the FP80 issue, Emu68 (self-reports as 68040) implements the full 68K instruction set support like AC68080.


What game console used a 68040, 68060 or 6888x? Most consoles used a 68000 and the only 68000 instructions removed were MOVEP and user mode MOVE from SR (which was removed from later 68000 variations). The CD32 was the only 68020+ console that I can recall and only the removed 64 bit result MULx and DIVx instructions are common. We are practically talking about 1-2 incompatible instructions for 68000 compatibility and 3-4 instructions for 68020 compatibility. MOVE from SR was mostly dealt with back in C= days and MOVEP has been patched in many WHDLoad patches. Bring back the 64 bit result removed MULx and DIVx and put the 680x0.library in kickstart to give good compatibility and performance. There are 68060 accelerators for the CD32 which I expect give good enough compatibility to make them worthwhile.

The last I heard, the AC68080 does not even implement all 68060 FPU instructions in hardware and it uses double precision instead of extended precision.

Hammer Quote:

Your cheaper assertion doesn't reflect the real world when Pentium's Socket 5 and Socket 7 PC motherboards are manufactured economies of scale.


68060 cost advantages over Pentium
+ 32 bit memory bus uses fewer expensive CPU pins and allows the use of cheaper 32 bit memory
+ 68060 uses significantly fewer transistors giving smaller cheaper chips
+ 68060 static logic design is simpler and cheaper to develop than Pentium dynamic logic design
+ fully static 68060 design uses cheap CMOS process instead of expensive BiCMOS process
+ 68060 CMOS design uses less power and produces less heat giving power supply and cooling savings
+ 68060 code density and cache efficiency is better than the Pentium

The 68060 had many cost advantages but the Pentium had the desktop economies of scale advantage. Motorola gave up the first and the last advantage when they went with fat PPC and dumped the 68060.

Last edited by matthey on 04-Feb-2024 at 02:46 AM.

matthey 
Re: some words on senseless attacks on ppc hardware
Posted on 4-Feb-2024 5:30:25
#820 ]
Super Member
Joined: 14-Mar-2007
Posts: 1999
From: Kansas

kolla Quote:

Years and years of the same… please just do it already!


Karlos Quote:

I hate to be the one to break it to you but you don't have a hope in hell of creating an ASIC solution, because even if it was better than the deluxe ten flavour ice cream version, you don't have the up front development funds and you don't have the guaranteed market to sell it to, either. We are all dropping dead, slowly but surely.

You've gone on about this idea for decades. Either do it, if you can, or accept that you can't and just enjoy what there is, while there is still the time left to enjoy it.


Larry Kaplan probably said the same thing to Jay Miner. Do it already. It's not a one man project and there are IP squatters and road blocks. After development, imagine an Amiga device without the Amiga name, AmigaOS or Workbench. Oh wait, that already happened with THEA500 Mini and it still likely sold in the hundreds of thousands. The Amiga legacy continues to be made by the likes of Irving Gould, Mehdi Ali, Bill Sydnes, Ben Hermans and Trevor Dickinson.

OneTimer1 Quote:

The 68060 runs some instructions even faster, it was a good CPU but never got the updates it would need to compete with a Pentium.

Oh BTW, a cheap ARM CPU from a RasPi has 4 cores and does 'out of order' execution, so it might beat every 'scalar 68k ASIC CPU' by a factor of 4.


Most ARM CPU cores produced are in-order. Most customers buy the cheapest low power compromise processors including in Amiga Neverland with the ARM Cortex-A53 in THEA500 Mini and A600GS. The micro-oped OoO ARM cores beat any in-order core but they are likely 50 times the transistors/area, usually use a more expensive chip process and are significantly more expensive.

Copyright (C) 2000 - 2019 Amigaworld.net.
Amigaworld.net was originally founded by David Doyle