Amigaworld.net - The Amiga Computer Community Portal Website

Amiga OS4 Hardware

some words on senseless attacks on ppc hardware


matthey

Re: some words on senseless attacks on ppc hardware
Posted on 1-May-2024 20:59:33

[ #1261 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2150
From: Kansas

Hammer Quote:

68060's FPU wasn't pipelined. Your 8-stage pipelines are for integers. CPU's clock speed potential is only as good as the weak point.

The 68060 FPU is not considered pipelined as a unit, but FPU instructions still flow through a deeper instruction pipeline than the five-stage classic RISC pipeline.

1. IF (Instruction Fetch)
2. ID (Instruction Decode)
3. EX (EXecute)
4. MEM (MEMory access)
5. WB (register Write Back)

https://en.wikipedia.org/wiki/Classic_RISC_pipeline

Now let's look at the 68060 instruction pipeline.

1. IAG (Instruction Address Generation)
2. IC (Instruction fetch Cycle)
3. IED (Instruction Early Decode)
4. IB (Instruction Buffer)
5. DS (Decode instruction and Select)
6. AG (operand Address Generation)
7. OC (Operand fetch Cycle)
8. EX (instruction EXecution)
--- stages 9 and 10 are optional ---
9. DA (Data Available)
10. WB (Write Back)
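To make the stage counts concrete, here is a minimal Python sketch of an idealized single-issue pipeline: one instruction enters per cycle, with no stalls, branches or cache misses. It is only an illustration and not a model of the 68060, which is superscalar with two integer OEPs; the function names are hypothetical. It shows that a deeper pipeline costs a few extra fill cycles but holds more instructions in flight at once.

```python
def pipeline_cycles(stages: int, instructions: int) -> int:
    """Cycles for `instructions` to drain through an ideal `stages`-deep pipeline.

    Idealized single-issue model: one instruction enters per cycle and nothing
    stalls. The first instruction completes after `stages` cycles, then one
    more completes every cycle.
    """
    if instructions <= 0:
        return 0
    return stages + instructions - 1


def in_flight(stages: int, instructions: int, cycle: int) -> int:
    """How many instructions occupy some pipeline stage at `cycle` (1-based)."""
    # Instruction i (0-based) enters the first stage at cycle i + 1 and leaves
    # the last stage at the end of cycle i + stages.
    return sum(1 for i in range(instructions) if i + 1 <= cycle <= i + stages)


for name, depth in (("classic RISC", 5), ("68060, 8 stages", 8), ("68060, 10 stages", 10)):
    print(f"{name:16} 100 instructions: {pipeline_cycles(depth, 100)} cycles, "
          f"{in_flight(depth, 100, 50)} in flight at cycle 50")
```

The extra fill cycles are negligible over long runs; the cost of a deeper pipeline shows up instead when a mispredicted branch or a load-use dependency forces it to refill.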

Both 68060 integer instruction stages and FPU instruction stages are similar with the difference being the EX stage which is performed by the integer operand execution pipelines (OEPs) while FPU instructions are sent to the FPU. The FPU is actually in the primary OEP (pOEP) though. The 68060 FPU internal operation is serial and not pipelined but the FPU can execute FPU instructions in parallel with the integer OEPs. The FPU EX stage may have more work to perform than typical integer instructions like normalizing fp numbers including 67 bit barrel shift and extra result and error checking but CISC pipelines have multi cycle integer instructions like division too (no hardware MUL or DIV for classic RISC pipeline). From what I've read, critical clock speed limiting stages are more likely to be related to MMU or cache accesses where professional chip designs use custom or licensed optimized IP blocks. The 68060 designers likely just needed more time to analyze where the critical timing areas are and optimize them.
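The normalization step mentioned above can be sketched in Python. This is a hypothetical illustration only: it assumes the 67 bits are a 64-bit extended-precision mantissa plus guard, round and sticky bits, which the post does not state, and it ignores rounding, denormals and exceptions. Hardware would find the shift amount with a leading-zero counter and apply it with one barrel shift; here `int.bit_length()` stands in for the leading-zero counter.

```python
WIDTH = 67  # assumed: 64-bit mantissa + guard, round and sticky bits (not from the post)


def normalize(mantissa: int, exponent: int) -> tuple[int, int]:
    """Shift a WIDTH-bit intermediate left until its most significant bit is set.

    Returns (normalized_mantissa, adjusted_exponent).
    """
    if mantissa < 0 or mantissa >> WIDTH:
        raise ValueError("mantissa must be a non-negative WIDTH-bit value")
    if mantissa == 0:
        return 0, exponent  # zero has no leading one; real FPUs treat it as a special case
    shift = WIDTH - mantissa.bit_length()  # leading-zero count within WIDTH bits
    # One shift by a variable amount is what a barrel shifter does in hardware.
    return mantissa << shift, exponent - shift


# Subtracting two nearly equal values leaves a tiny result that needs a large shift.
m, e = normalize((1 << 66) - ((1 << 66) - 5), 0)
print(f"shift by {-e} bits -> top bit set: {m >> (WIDTH - 1) == 1}")
```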

Hammer Quote:

Without a 64-bit front-side bus, sustained FP64 and dual INT32 wouldn't be optimal.

Each 68060 OEP can access the banked data cache in the same cycle, so this would be 2x32 bit data cache accesses. The FPU has a 64 bit path to the data cache. Any time sustained accesses to memory are necessary, performance is not optimal. The Pentium P5 and P6 have a small advantage here, made even smaller by subtracting the memory bandwidth used by loading ~20% more code. I still prefer Motorola's strategy with the 68060+ to provide competitive performance at a much lower cost.

Hammer Quote:

AMD's K5 has a 6-stage pipeline and runs into a 133 Mhz clock-speed wall. AMD's K6 (with technology from NexGen's ex-Alpha DEC engineers) still has 6-stage pipelines and can reach higher clock speeds.

Pipeline stages are one aspect of clock speed potential.

The 8 stage 68060 ran into a pencil pusher wall at only 50MHz making Motorola look incompetent. The high performance at a low clock speed was perfect for embedded use though. As I recall, the 68060 life was over a decade and the last revision and die shrink came in 1999 but was still rated at 50MHz. Even though Motorola handed ARM with Thumb2 the embedded market by shoving fat PPC down customers' throats, ARM didn't have a CPU core which could match 68060 DMIPS/MHz performance until about 2005 with the Cortex-A8 but it had a less general purpose 13 stage integer pipeline that improves ILP at the expense of branch performance (more of a media processor like the Pentium 4 but with less heat).

Hammer Quote:

That's a flawed argument when Pentium Pro's larger cache and front end cover the gap between the slower 64-bit 66 Mhz front side bus and the CPU's higher clock speed.

Additional transistors are spent when there's a large gap between the slower front side bus and CPU clock speed.

I was comparing the 68060+ with 16kiB I+D to the PPro.

68060+
8 stage in-order superscalar design
16kiB I+D (64 bit internal data paths to data cache where advantageous like FPU)
32 bit data bus to memory (reduces CPU, memory and board costs)
single die ~3.3 million transistors

Pentium Pro
14 stage OoO design (OoO and a longer than necessary pipeline waste transistors)
L1 8kiB I+D
L2 256kiB
64 bit data bus to memory
2-3 dies bonded together to make expensive CPU module of ~5.5 million transistors

The 32 bit data bus of the 68060 was usually not a bottleneck, but doubling the caches with the 68060+ reduces any disadvantage the 68060 had because double the data can be accessed using internal 64 bit paths to the caches instead of accessing memory over the 32 bit data bus. Memory accesses are also reduced with the larger caches, making the 32 bit data bus less of an issue. It is true that higher CPU clock speeds would increase data requirements, making the 32 bit data bus more of an issue. The PPro definitely has the advantage for larger workloads with the L2 cache and 64 bit data bus, but the 68060+ would have been potent while much more efficient and much cheaper. Let's not forget that 68k code density is about 20% better than x86 code density, which reduces memory accesses and improves instruction cache efficiency. RISC-V research found that every 25%-30% code density improvement is like doubling the instruction cache size. With ISA code density improvements like adding ColdFire instructions, the 16kiB instruction cache may have held as much code as an x86 32kiB instruction cache (adding instructions to x86 worsens code density because there is no free encoding space left, so new instructions are larger, while adding instructions to the 68k can improve code density). It certainly looks like code density improvements were worthwhile enough to add them to the AC68080, giving code density beyond the Thumb2 embedded standard and without the performance loss that RISC compressed encodings suffer from increased instruction counts. The ColdFire instructions would have helped 68060 performance not just through the better cache efficiency of improved code density, but also by eliminating partial register writes and producing 32 bit results that improve forwarding/bypassing capabilities.
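The cache arithmetic above can be sketched in Python. The first function is plain byte counting: if 68k code is ~20% smaller than x86 code, 16kiB holds what x86 would need ~20kiB for. The second is a hypothetical interpolation of the quoted RISC-V rule of thumb (each 25%-30% code density improvement is worth about a doubling of the instruction cache); the exponential form and the function names are assumptions for illustration, not taken from that research.

```python
def linear_equivalent_kib(cache_kib: float, size_ratio: float) -> float:
    """Cache size the other ISA needs to hold the same code (bytes only).

    size_ratio = our code size / other ISA's code size, e.g. 0.8 for 20% smaller.
    """
    return cache_kib / size_ratio


def rule_of_thumb_equivalent_kib(cache_kib: float, improvement: float,
                                 doubling_step: float = 0.25) -> float:
    """Assumed interpolation: every `doubling_step` of density improvement
    counts as one doubling of the instruction cache."""
    return cache_kib * 2 ** (improvement / doubling_step)


print(f"byte count, 20% denser: {linear_equivalent_kib(16, 0.8):.1f} kiB")
for step in (0.25, 0.30):
    print(f"rule of thumb, 20% denser, step {step:.2f}: "
          f"{rule_of_thumb_equivalent_kib(16, 0.20, step):.1f} kiB")
print(f"rule of thumb, 25% denser: {rule_of_thumb_equivalent_kib(16, 0.25):.1f} kiB")
```

Under these assumptions, the ~20% 68k advantage alone puts a 16kiB instruction cache somewhere between 20kiB and ~28kiB of x86-equivalent capacity, and it takes roughly a 25%-30% improvement (for example, from added ColdFire instructions) to reach the 32kiB figure.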

Hammer Quote:

http://archive.computerhistory.org/resources/access/text/2013/04/102723315-05-01-acc.pdf
Page 86 of 417, DataQuest 1995

1994 Worldwide Microprocessor Market Share Ranking.

For 1994 Market Share
1. Intel, 73.2%
2. AMD, 8.6%
3. Motorola, 5.2%
4. IBM, 2.2%

Page 84 of 417,

This is MPU market share by revenue which shows how much higher margin high end computer markets like desktop and workstation markets were than the embedded market which Motorola led with the 68k.

Hammer Quote:

Supply Base for 32-Bit Microprocessors, 1994,
For Product's Share of Total 32-Bit-and-Up MPU Market 1994

68000, 17%
80386SX/SL, 3%
80386DX, 3%
80486SX, 16%
80486DX, 21%
683XX, 9%
68040, 3%
68030, 1%
68020, 3%
80960, 4%
AM29000, 1%
32X32, 3%
R3000/R4000, 1%
Sparc, 1%
Pentium, 4%
Others, 10%

Motorola wasn't able to carry the 68000's success over to the 68020, 68030 and 68040. This factor has weakened Motorola's independent R&D capability.

Now you are talking about volume where the 68k looks much better because of the high volume of the embedded market.

68000, 17%
683XX, 9%
68040, 3%
68020, 3%
68030, 1%
---
68k is ~33% of 32+ bit MPU market by volume in 1994 but most of this is for the low margin embedded market. The 68000 should really be classified as a 16 bit CPU but it does have a 32 bit ISA.

80386SX/SL, 3%
80386DX, 3%
80486SX, 16%
80486DX, 21%
Pentium, 4%
---
x86 is ~47% of 32+ bit MPU market by volume in 1994 but most of this is for the high margin desktop market.

Last edited by matthey on 01-May-2024 at 09:10 PM.


Hammer

Re: some words on senseless attacks on ppc hardware
Posted on 2-May-2024 7:37:51

[ #1262 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5616
From: Australia

@matthey
Quote:

Both 68060 integer instruction stages and FPU instruction stages are similar with the difference being the EX stage which is performed by the integer operand execution pipelines (OEPs) while FPU instructions are sent to the FPU. The FPU is actually in the primary OEP (pOEP) though. The 68060 FPU internal operation is serial and not pipelined but the FPU can execute FPU instructions in parallel with the integer OEPs. The FPU EX stage may have more work to perform than typical integer instructions like normalizing fp numbers including 67 bit barrel shift and extra result and error checking but CISC pipelines have multi cycle integer instructions like division too (no hardware MUL or DIV for classic RISC pipeline). From what I've read, critical clock speed limiting stages are more likely to be related to MMU or cache accesses where professional chip designs use custom or licensed optimized IP blocks. The 68060 designers likely just needed more time to analyze where the critical timing areas are and optimize them.

Show 68060 Rev 6 100 Mhz beating Pentium 100 at Quake benchmark. Hint: Warp1260 with RTG couldn't do it.

The Pentium's Floating-Point Unit (FPU) has an eight-stage pipeline:
Prefetch (PF),
Decode-1 (D1),
Decode-2 (D2),
Execute (dispatch),
Floating Point Execute-1 (X1)
Floating Point Execute-2 (X2)
Write Float (WF)
Error Reporting (ER)

The Pentium MMX's Floating-Point Unit (FPU) has a nine-stage pipeline:
Prefetch (PF),
Fetch (F),
Decode-1 (D1),
Decode-2 (D2),
Execute (dispatch),
Floating Point Execute-1 (X1)
Floating Point Execute-2 (X2)
Write Float (WF)
Error Reporting (ER)

Pipelining enables "instructions-in-flight".

-----
P54 Pentium 75/90/100/120/133/150/166/200's internal instruction bus is 256 bits wide which is an improvement over the 1993 released P5 Pentium 60/66's 128-bit internal instruction bus. (Cite: Intel's Pentium Processor Family Developer's Manual, 1997, Page 24 of 609).

P55 Pentium includes MMX.

Quote:

Each 68060 OEP can access the banked data cache in the same cycle, so this would be 2x32 bit data cache accesses. The FPU has a 64 bit path to the data cache. Any time sustained accesses to memory are necessary, performance is not optimal. The Pentium P5 and P6 have a small advantage here, made even smaller by subtracting the memory bandwidth used by loading ~20% more code. I still prefer Motorola's strategy with the 68060+ to provide competitive performance at a much lower cost.

Show 68060 Rev 6 100 Mhz match Pentium 100 at Quake benchmark.

The lower platform cost argument from 68060 is a joke in practice.

Quote:

The 8 stage 68060 ran into a pencil pusher wall at only 50MHz making Motorola look incompetent. The high performance at a low clock speed was perfect for embedded use though. As I recall, the 68060 life was over a decade and the last revision and die shrink came in 1999 but was still rated at 50MHz.

Amiga/Atari Falcon's 68060 accelerators are not limited by Motorola's official 50Mhz.

Quote:

Even though Motorola handed ARM with Thumb2 the embedded market by shoving fat PPC down customers' throats,

PowerPC 601 has 2.8 million transistors, scaled to 120 Mhz clock speed in 1995, and good FPU.
68060 has 2.5 million transistors.

Power Macintosh 8100's PPC 601 reached 80Mhz in March 1994.

Intel Pentium reached 100 Mhz in March 1994.

Power Macintosh 8100/110's PPC 601 reached 110 Mhz in Nov 1994.

PowerPC came out strong in 1994.

-------
In 1995...Pentium Pro 150,166,180 and 200 Mhz models were released in Nov 1995.
Pentium 133 Mhz was released in June 1995.

PowerPC 604 reached 132 Mhz with Power Mac 9500/132's June 1995 release.

-------
Pentium 150/166 in Jan 1996.

Power Mac 9500/150's 604 reached 150 MHz CPU in April 1996.

Pentium 200 in June 1996.
AMD K5-100 reached 100 MHz in June 1996.

Power Mac 9500/200's 604e reached 200 MHz CPU in August 1996.

-------
1997
Power Mac 9600's 604e reached 233 Mhz around February 1997.

AMD K6 with MMX SIMD (mainstream Socket 7) reached 233 Mhz in Apr 1997.

Pentium II 233, 266, and 300 Mhz with MMX SIMD was released in May 1997.

Pentium MMX (mainstream Socket 7) 233 Mhz was released in June 1997.

Power Mac 9600's 604e reached 350 Mhz around August 1997.
-------
1998
Pentium II "Deschutes" 266, 300, and 333 Mhz were released in Jan 1998.
K6 266Mhz was released in Jan 1998.

Pentium II "Deschutes" 350 and 400 Mhz were released in April 1998.
K6 300Mhz was released in April 1998.

PowerPC camp ran into a clock speed wall in 1998.

Pentium II "Deschutes" 450 Mhz was released in August 1998.

K6-2 400 Mhz with 3DNow SIMD was released in Nov 1998.
-------
1999 to 2000, the Ghz race between Intel Pentium III (reached 1Ghz on March 8, 2000), AMD Athlon (reached 1Ghz on March 6, 2000), and Alpha EV67 (750 Mhz)/EV68 (1Ghz in 2001).

Quote:

ARM didn't have a CPU core which could match 68060 DMIPS/MHz performance until about 2005 with the Cortex-A8 but it had a less general purpose 13 stage integer pipeline that improves ILP at the expense of branch performance (more of a media processor like the Pentium 4 but with less heat).

You omitted that the ARM Cortex-A8 includes a 64-bit wide NEON SIMD unit.

Quote:

This is MPU market share by revenue which shows how much higher margin high end computer markets like desktop and workstation markets were than the embedded market which Motorola led with the 68k.

A higher revenue margin is important for R&D health.

Quote:

Now you are talking about volume where the 68k looks much better because of the high volume of the embedded market.

68000, 17%
683XX, 9%
68040, 3%
68020, 3%
68030, 1%
---
68k is ~33% of 32+ bit MPU market by volume in 1994 but most of this is for the low margin embedded market. The 68000 should really be classified as a 16 bit CPU but it does have a 32 bit ISA.

80386SX/SL, 3%
80386DX, 3%
80486SX, 16%
80486DX, 21%
Pentium, 4%
---
x86 is ~47% of 32+ bit MPU market by volume in 1994 but most of this is for the high margin desktop market.

You missed my point on why I have shown specific 68K models, i.e. to remove the 68000 cloak that hides the 68020's, 68030's, and 68040's volume market share.

The 68000 is not a Doom-capable CPU.

The 80386SX has a 16-bit front side bus, so its 32-bit ALU is gimped, which puts it in the same boat as the 68000/68010. The 80386SX has a built-in i386 MMU.

Removing the "16-bit" lame-duck 32-bit CPUs:

68040, 3%
68020, 3%
68030, 1%
Total: 7%

80386DX, 3%
80486SX, 16%
80486DX, 21%
Pentium, 4%
Total: 44%
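The totals in both the full and filtered lists can be checked with a few lines of Python. The shares are the DataQuest 1994 figures quoted in this thread; `EXCLUDED` is simply the set of parts dropped in the list above, and the other names are hypothetical. Since the source percentages are rounded, the totals are approximate.

```python
# 1994 share (%) of the 32-bit-and-up MPU market by volume, as quoted from DataQuest.
SHARES = {
    "68000": 17, "683XX": 9, "68040": 3, "68020": 3, "68030": 1,
    "80386SX/SL": 3, "80386DX": 3, "80486SX": 16, "80486DX": 21, "Pentium": 4,
}
FAMILIES = {
    "68k": ["68000", "683XX", "68040", "68020", "68030"],
    "x86": ["80386SX/SL", "80386DX", "80486SX", "80486DX", "Pentium"],
}
# Parts removed in the filtered list above.
EXCLUDED = frozenset({"68000", "683XX", "80386SX/SL"})


def family_total(family: str, exclude: frozenset = frozenset()) -> int:
    """Sum the volume shares of a family's parts, skipping any in `exclude`."""
    return sum(SHARES[part] for part in FAMILIES[family] if part not in exclude)


for family in FAMILIES:
    print(f"{family}: all parts {family_total(family)}%, "
          f"filtered {family_total(family, EXCLUDED)}%")
```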

At 1992 wholesale prices, Motorola's 68030-25 wasn't cost-effective against AMD's Am386-40.

Last edited by Hammer on 02-May-2024 at 09:15 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB


matthey

Re: some words on senseless attacks on ppc hardware
Posted on 2-May-2024 23:21:09

[ #1263 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2150
From: Kansas

Hammer Quote:

Show 68060 Rev 6 100 Mhz beating Pentium 100 at Quake benchmark. Hint: Warp1260 with RTG couldn't do it.

The compiler support between the P5 Pentium which became the most popular desktop CPU and the 68060 which was a high end and thus lower volume embedded CPU is incomparable. The 68060 received minimal if any compiler support but high performance CPU cores usually require highly tuned and optimized code to get anywhere close to their potential. The 68060 performance with old and poorly optimized code is impressive but there is more possible. My VBCC support code enhancements demonstrate what improvements to one part of compiler support can make but it is just scratching the surface of what is possible to the many parts.

o compiler backend (68k compilers lack good int and fp code generation)
o compiler support software (GCC support primitive, VBCC has some assembly code and inlines)
o compiler instruction scheduler (no 68060 specific instruction scheduler for GCC or VBCC)
o compiler other (VBCC uses VASM which has the best peephole optimizer for the 68k)
o game optimizations (Amiga Quake versions may have assembly code but some not 68060 optimized)
o OS optimizations (AmigaOS is optimized for a 16 bit 68000 CPU with a 25 year old compiler)
o gfx/RTG driver optimizations (many drivers are compiled so poorly optimized)

While compiler support is the most important for performance, Quake was designed and highly optimized for an x86 PC target using likely man years of optimizations that only a profitable game market can provide. OS and gfx drivers play a part and the quality of their code usually depends on compiler support as well.

Hammer Quote:

The Pentium's Floating-Point Unit (FPU) has an eight-stage pipeline:
Prefetch (PF),
Decode-1 (D1),
Decode-2 (D2),
Execute (dispatch),
Floating Point Execute-1 (X1)
Floating Point Execute-2 (X2)
Write Float (WF)
Error Reporting (ER)

The Pentium MMX's Floating-Point Unit (FPU) has a nine-stage pipeline:
Prefetch (PF),
Fetch (F),
Decode-1 (D1),
Decode-2 (D2),
Execute (dispatch),
Floating Point Execute-1 (X1)
Floating Point Execute-2 (X2)
Write Float (WF)
Error Reporting (ER)

Pipelining enables "instructions-in-flight".

The 68060 allows multiple FPU "instructions-in-flight" in the 68060 pipeline but only one can be executed at a time inside the FPU. There are several single cycle latency FPU instructions like FMOVE, FABS, FNEG, FCMP and FTST where there is no disadvantage to a non-pipelined FPU. FDIV and FSQRT are usually not pipelined in pipelined FPUs. FADD, FSUB and FMUL are the important FPU instructions to pipeline as they are used often but there is no performance loss if they are scheduled 3 cycles apart which is their execution latency. The 68060 can likely have more pipelined "instructions-in-flight" than the Pentium.
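Here is a minimal Python sketch of the scheduling argument above. It assumes a fixed 3-cycle latency for FADD/FSUB/FMUL as stated in the post and ignores data dependencies, register ports and the Pentium's own issue rules. It compares a non-pipelined FPU (a new op starts only after the previous one finishes) with a fully pipelined one (a new op may start every cycle). The function name is hypothetical.

```python
def finish_times(ready_cycles, latency=3, pipelined=False):
    """Completion cycle of each FP op, given the cycle each op is ready to issue.

    Non-pipelined: an op may start only when the previous op has finished.
    Pipelined: an op may start one cycle after the previous op started.
    Ops issue in order; data dependencies are ignored.
    """
    finishes = []
    prev_start = prev_finish = None
    for ready in ready_cycles:
        if prev_start is None:
            start = ready
        elif pipelined:
            start = max(ready, prev_start + 1)
        else:
            start = max(ready, prev_finish)
        prev_start, prev_finish = start, start + latency
        finishes.append(prev_finish)
    return finishes


back_to_back = [0, 1, 2, 3]   # four FADD/FMULs ready on consecutive cycles
spaced = [0, 3, 6, 9]         # the same ops scheduled 3 cycles apart
for label, ready in (("back-to-back", back_to_back), ("3 apart", spaced)):
    print(label,
          "non-pipelined:", finish_times(ready),
          "pipelined:", finish_times(ready, pipelined=True))
```

Back-to-back ops favour the pipelined unit, but once the ops are spaced 3 cycles apart (with the gaps filled by integer instructions, which the 68060 can issue in parallel), both units finish on the same cycles.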

https://websrv.cecs.uci.edu/~papers/mpr/MPR/ARTICLES/061505.pdf Quote:

One advantage that the 68060 will have over Pentium is that it can issue an integer instruction in parallel with most floating-point instructions. Also, it can continue to issue integer instructions into both pipes while a long-latency operation continues in the floating-point unit. Pentium, in contrast, ties up the entire processor when performing floating-point calculations.

https://en.wikipedia.org/wiki/Motorola_68060#Architecture Quote:

Against the Pentium, the 68060 can perform better on mixed code; Pentium's decoder cannot issue an FP instruction every opportunity and hence the FPU is not superscalar as the ALUs were. If the 68060's non-pipelined FPU can accept an instruction, it can be issued one by the decoder. This means that optimizing for the 68060 is easier: no rules prevent FP instructions from being issued whenever was convenient for the programmer other than well understood instruction latencies. However, with properly optimized and scheduled code, the Pentium's FPU is capable of double the clock for clock throughput of the 68060's FPU.

Most FPU using code is mixed code (int+fp instructions) where the 68060 has the performance advantage but heavy FPU code (mostly fp instructions) on the Pentium has a much higher theoretical FPU performance limit that is rarely realized, especially by compilers. The 68060 designers focused on integer performance and were likely trying to save transistors to double the caches with the 68060+ which would have improved FPU and integer performance. A pipelined FPU and/or multiple FPU sub units (FADD/FSUB/FMUL, FDIV/FSQRT, FMOVE/FABS/FNEG/FCMP/FTST FPU instruction execution in parallel) could be provided when transistor costs were cheaper.

Jay Miner Quote:

They say that engineering is the art of compromise and I can really attest to that.

https://youtu.be/n-MqC35aWrQ?t=439

Hammer Quote:

Show 68060 Rev 6 100 Mhz match Pentium 100 at Quake benchmark.

The lower platform cost argument from 68060 is a joke in practice.

The low price of the 68060 allowed it to survive and succeed in the lower volume high end embedded market. In April of 1994, the 68060@50MHz had a price of $308 each while the Pentium P54C@100MHz price was $995 each for 1000 units. The 68060 had better performance/price than the Pentium even though it was reduced by the low clock speed which only grew worse as it was held at 50MHz and by lackluster compiler benchmarks compared to the Pentium which received much better compiler support.
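The price comparison can be turned into a break-even figure. This Python sketch assumes performance is simply per-clock performance times MHz, which ignores memory, bus and compiler effects; the prices are the April 1994 1000-unit prices quoted above, and the function name is hypothetical. It answers: what fraction of the P54C's per-clock performance does a 68060@50MHz need for equal performance/price?

```python
def breakeven_per_clock_ratio(price_a: float, mhz_a: float,
                              price_b: float, mhz_b: float) -> float:
    """Per-clock performance CPU A needs, relative to CPU B, for equal perf/price.

    With perf = per_clock * mhz:
        per_clock_a * mhz_a / price_a == per_clock_b * mhz_b / price_b
    =>  per_clock_a / per_clock_b == (mhz_b / price_b) / (mhz_a / price_a)
    """
    return (mhz_b / price_b) / (mhz_a / price_a)


m68060 = (308.0, 50.0)   # $ each, MHz (April 1994, 1000 units)
p54c = (995.0, 100.0)
print(f"68060 ${m68060[0] / m68060[1]:.2f}/MHz, P54C ${p54c[0] / p54c[1]:.2f}/MHz")
ratio = breakeven_per_clock_ratio(*m68060, *p54c)
print(f"68060 needs {ratio:.0%} of the P54C's per-clock performance to break even")
```

Under that simplification, any 68060 per-clock performance above about 62% of the Pentium's would give the 68060 the better performance/price at those prices.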

Hammer Quote:

PowerPC 601 has 2.8 million transistors, scaled to 120 Mhz clock speed in 1995, and good FPU.
68060 has 2.5 million transistors.

Power Macintosh 8100's PPC 601 reached 80Mhz in March 1994.

Intel Pentium reached 100 Mhz in March 1994.

Power Macintosh 8100/110's PPC 601 reached 110 Mhz in Nov 1994.

PowerPC came out strong in 1994.

The PPC 601 has a 32kiB unified cache where the 68060 and Pentium have 8kiB I+D (16kiB total). This is a good example of the RISC concept of using a simpler shorter (4 stage) pipeline and applying the transistor savings to the caches. Performance/MHz was competitive with the 68060 and Pentium but the shallow pipeline and 32kiB cache likely limited the max clock speed resulting in the PPC 601+ with an expensive die shrink. As I recall, the PPC 601 and PPC 603 FPUs were only fully pipelined for single precision while compilers used practically all double precision instructions before C99. Still, it wasn't difficult to outperform the ugly stack based Pentium FPU or the minimalist 68060 FPU.

Hammer Quote:

In 1995...Pentium Pro 150,166,180 and 200 Mhz models were released in Nov 1995.
Pentium 133 Mhz was released in June 1995.

PowerPC 604 reached 132 Mhz with Power Mac 9500/132's June 1995 release.

The 14 stage PPro pipeline was overkill, wasting transistors, but it allowed a high max clock speed. The PPC 604 had a longer 6 stage pipeline that allowed it to be clocked up, and the split 16kiB I+D (32kiB total) was an improvement over the PPC 601. As I recall, the PPC 604 received an FPU that was fully pipelined for double precision fp, which was also a nice upgrade. It's a powerful practical design but now 3.6 million transistors, and Steve Jobs wants more MHz to compete with the deeply pipelined PPro.

Hammer Quote:

Pentium 150/166 in Jan 1996.

Power Mac 9500/150's 604 reached 150 MHz CPU in April 1996.

Pentium 200 in June 1996.
AMD K5-100 reached 100 MHz in June 1996.

Power Mac 9500/200's 604e reached 200 MHz CPU in August 1996.
-------
1997
Power Mac 9600's 604e reached 233 Mhz around February 1997.

AMD K6 with MMX SIMD (mainstream Socket 7) reached 233 Mhz in Apr 1997.

Pentium II 233, 266, and 300 Mhz with MMX SIMD was released in May 1997.

Pentium MMX (mainstream Socket 7) 233 Mhz was released in June 1997.

Power Mac 9600's 604e reached 350 Mhz around August 1997.

The PPC 604e received the same old recipe of a die shrink plus doubled caches that had boosted clock speeds in the past. The caches were now 32kiB I+D (64kiB total), which may have limited the max clock speed, but the poor code density required more instruction cache at higher speeds, the 6 stage pipeline was barely adequate at higher clock speeds and the 5.1 million transistor chip was expensive. The PPC 604e was one of the most powerful desktop CPUs for a short period of time but the design didn't have much potential left. PPC looked like a dead end until the PPC G3, with L2 cache to feed the RISC instruction fetch bottleneck, replaced the 604(e) design. The early PPC AmigaOne line used some of the first PPC G3 CPUs to have on-chip L2 caches, now using 20+ million transistors, mostly for caches to feed the RISC monster. Those shallow pipeline PPC cores are definitely smaller than more powerful and more deeply pipelined CISC cores though.

Hammer Quote:

1998
Pentium II "Deschutes" 266, 300, and 333 Mhz were released in Jan 1998.
K6 266Mhz was released in Jan 1998.

Pentium II "Deschutes" 350 and 400 Mhz were released in April 1998.
K6 300Mhz was released in April 1998.

PowerPC camp ran into a clock speed wall in 1998.

PPC G3 designs started appearing about this time, although the first ones did not have an on-chip L2. PPC designs generally stopped trying to compete for max clock speeds, which was always the intention of the PPC ISA. The PPC ISA breaks from the classic RISC philosophy, choosing more complexity to better compete with simpler RISC ISAs. This strategy was ahead of its time as most RISC ISAs added complexity and adopted more CISC-like features. So what went wrong? PPC designs focused on shallow pipeline, limited OoO designs to minimize load-to-use stalls and branch prediction logic. While generally a good idea, they were locked into this concept for too long. Another problem was that their code density was not enough of an improvement over older classic RISC ISAs. ARM Thumb ISAs took the low end of the PPC market with much better, 68k-like code density, but lacked the 68k-like performance needed to finish off PPC, which outperformed Thumb. AArch64, which is similar to PPC but has better performance and significantly better code density, did finish it off.

Hammer Quote:

Pentium II "Deschutes" 450 Mhz was released in August 1998.

K6-2 400 Mhz with 3DNow SIMD was released in Nov 1998.
-------
1999 to 2000, the Ghz race between Intel Pentium III (reached 1Ghz on March 8, 2000), AMD Athlon (reached 1Ghz on March 6, 2000), and Alpha EV67 (750 Mhz)/EV68 (1Ghz in 2001).

The odd man out here is the DEC Alpha, where architects pioneered the L2 cache to minimize the RISC instruction fetch bottleneck. Just because Alpha, MIPS, SPARC, PA-RISC and PPC are dead RISC ISAs with some of the worst code densities doesn't mean their death was all about code density. Performance matters too, as a 1GHz x86 CPU is a lot more powerful than a 1GHz classic RISC CPU. CISC ISAs are stronger because both register and cache accesses can be pipelined, fewer and more powerful instructions and addressing modes are used, memory traffic is reduced with fewer memory accesses and fewer code fetches, and cache space is saved with compressed code. Well, x86 is far from a good example of CISC but it was good enough to take down the RISC competition, which had 4 times as many GP registers.

Hammer Quote:

You omitted that the ARM Cortex-A8 includes a 64-bit wide NEON SIMD unit.

A SIMD unit back then didn't make any difference to compiled benchmarks and most general purpose compiled software. Even today, auto vectorization is tricky to extract consistent performance gains from general purpose compiled code.

Hammer Quote:

A higher revenue margin is important for R&D health.

True. Intel and AMD won not just because of x86 CISC superiority but because of economies of scale in their high margin markets. Had Motorola not thrown out their 68k baby, they may have been able to leverage larger volume economies of scale in the embedded market like ARM did. Granted, the embedded market is much larger now but I believe Intel is more worried about ARM trying to push up into their high end high margin markets than AMD competition because ARM is more destructive to margins than AMD.

Hammer Quote:

You missed my point on why I have shown specific 68K models, i.e. to remove the 68000 cloak that hides the 68020's, 68030's, and 68040's volume market share.

The 68000 is not a Doom-capable CPU.

The 80386SX has a 16-bit front side bus, so its 32-bit ALU is gimped, which puts it in the same boat as the 68000/68010. The 80386SX has a built-in i386 MMU.

Removing the "16-bit" lame-duck 32-bit CPUs:

68040, 3%
68020, 3%
68030, 1%
Total: 7%

80386DX, 3%
80486SX, 16%
80486DX, 21%
Pentium, 4%
Total: 44%

At 1992 wholesale prices, Motorola's 68030-25 wasn't cost-effective against AMD's Am386-40.

The 68EC030 was very good value from historic prices I have seen. CBM should have moved to a 68EC030@28MHz instead of 68EC020@14MHz but they needed a chipset that could run at 28MHz. A 68EC030@28MHz with AA+ would have competed better against cheap 386s and saved their reputation. Motorola generally expected closer to desktop margins for their full CPUs though. Intel's pricing seemed to be more about higher margins for higher clock speed rated CPUs which they pushed more but it made more sense to push for high margin markets.

Last edited by matthey on 02-May-2024 at 11:39 PM.


Hammer

Re: some words on senseless attacks on ppc hardware
Posted on 3-May-2024 0:20:26

[ #1264 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5616
From: Australia

@matthey

Quote:
While compiler support is the most important for performance, Quake was designed and highly optimized for an x86 PC target using likely man years of optimizations that only a profitable game market can provide. OS and gfx drivers play a part and the quality of their code usually depends on compiler support as well.

Quake was designed for Pentium FPU.
https://youtu.be/DWVhIvZlytc?t=934
K6 vs Pentium FPU with Quake and Quake 2.

K6-3 includes the full FPU design fix. It took AMD about 23 months to fix the K6's FPU while K7 Athlon R&D was running concurrently.

Quote:

The 68060 allows multiple FPU "instructions-in-flight" in the 68060 pipeline but only one can be executed at a time inside the FPU. There are several single cycle latency FPU instructions like FMOVE, FABS, FNEG, FCMP and FTST where there is no disadvantage to a non-pipelined FPU. FDIV and FSQRT are usually not pipelined in pipelined FPUs. FADD, FSUB and FMUL are the important FPU instructions to pipeline as they are used often but there is no performance loss if they are scheduled 3 cycles apart which is their execution latency. The 68060 can likely have more pipelined "instructions-in-flight" than the Pentium.

Nope. 68060 has a 32-bit front-side bus issue in addition to 68060's FPU design issues.

https://www.youtube.com/watch?v=0_dW-21gdkw
Warp 1260 with RTG playing Quake. Warp 1260's RTG doesn't have a Zorro III/Super Buster bottleneck. Warp 1260 includes on PCB L2 cache.

Warp 1260's 68060 @ 100 MHz has a 32-bit 100 MHz front side bus, while the Pentium 100 MHz has a 64-bit 66 MHz front side bus. A 32-bit 100 MHz front side bus is equivalent in peak bandwidth to a 64-bit 50 MHz front side bus.
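The bus equivalence above is peak bandwidth arithmetic: bytes per transfer times transfers per second. A minimal Python sketch, assuming one transfer per bus clock; real buses lose cycles to address phases, wait states and burst rules, so these are upper bounds. The clock figures are the ones given in the post (the Pentium 100's 66 MHz bus actually runs at about 66.67 MHz, giving ~533 MB/s). The function name is hypothetical.

```python
def peak_bus_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak bandwidth in MB/s (10**6 bytes), one transfer per clock."""
    return width_bits / 8 * clock_mhz


buses = [
    ("Warp 1260 68060, 32-bit @ 100 MHz", 32, 100),
    ("64-bit @ 50 MHz", 64, 50),
    ("Pentium 100, 64-bit @ 66 MHz", 64, 66),
    ("Super Socket 7, 64-bit @ 100 MHz", 64, 100),
]
for name, width, clock in buses:
    print(f"{name:34} {peak_bus_mb_per_s(width, clock):6.0f} MB/s")
```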

Warp 1260's 68060 @ 100 MHz delivers Pentium 75 level results.

K6-3 has up to 100 Mhz 64-bit front side bus Super Socket 7 that competed against Pentium II 450's 100Mhz 64-bit front side bus.

K6-2 has 100Mhz Super Socket 7 support, but without the full FPU design fix.

Motorola didn't port the 88110's 60x bus to the 68060. By 1994, Motorola's MPU revenues were less than AMD's and Motorola was fully focused on PowerPC. Motorola designed two distinct PowerPC core designs, the 603 and 604, with different CPU core die sizes for its high/low product segmentation.

Intel just cut Pentium II's L2 cache and called it Celeron for its high/med/low product segmentation i.e. Xeon, Pentium II, and Celeron.

PowerPC 601/603/604 FPU has a simplified FP64 design instead of Pentium's and 68060's FP80.

683XX led to the lesser tier Coldfire R&D.

Quote:

True. Intel and AMD won not just because of x86 CISC superiority but because of economies of scale in their high margin markets. Had Motorola not thrown out their 68k baby, they may have been able to leverage larger volume economies of scale in the embedded market like ARM did.

ARM's supporters create a safe, high-margin market space for ARM, i.e. handheld smartphones.

Quote:

Granted, the embedded market is much larger now but I believe Intel is more worried about ARM trying to push up into their high end high margin markets than AMD competition because ARM is more destructive to margins than AMD.

Nope. Intel's Raptor Lake-R suffered another unstable, Pentium III GHz race-like debacle. https://wccftech.com/only-5-out-of-10-core-i9-13900k-2-out-of-10-core-i9-14900k-cpus-stable-in-auto-profile-intel-board-partners-stability-issues/

The tester is the owner of a studio that buys several CPUs for their own needs. In invoices shared by the tester, it is revealed that he has bought and tested at least 100s of Intel Core i9-13900K and Core i9-14900K CPUs and it looks like almost all of the chips he acquired had some sort of issue in terms of stability. Motherboards used by the studio include ASUS's Z790, B760, Z690 and B660 boards.

The software he runs requires each CPU and PC to pass through a certain variety of tests and at the Auto profile set in the ASUS motherboards, the majority of CPUs fail this test and have to be resold. Based on these tests, the tester determined a probability rate respective to the CPU's stability & it is shared below:

Intel Core i9-13900K "AUTO - 253W" - 40-50% (4-5 of 10 units stable)
Intel Core i9-13900K "Reduced Loadline" - 50-60% (5-6 of 10 units stable)
Intel Core i9-13900K "B760/B660 Board" - 60-70% (6-7 of 10 units stable)
Intel Core i9-14900K "AUTO - 253W" - 20% (2 of 10 units stable)
Intel Core i9-14900K "Reduced Loadline" - ~30% (3 of 10 units stable)
Intel Core i9-14900K "B760/B660 Board" - 40% (4 of 10 units stable)

So the out-of-the-box experience with Intel 13th and 14th Gen CPUs is bad. It is reported that the chips might work fine for a week or a little over a month but usually end up developing stability issues.

Quote:

The 68EC030 was very good value from historic prices I have seen. CBM should have moved to a 68EC030@28MHz instead of 68EC020@14MHz but they needed a chipset that could run at 28MHz. A 68EC030@28MHz with AA+ would have competed better against cheap 386s and saved their reputation. Motorola generally expected closer to desktop margins for their full CPUs though. Intel's pricing seemed to be more about higher margins for higher clock speed rated CPUs which they pushed more but it made more sense to push for high margin markets.

In 1992, Motorola should have recognized it was not a leading CPU vendor and adopted price disruption tactics like AMD's.

For the Saturn, Sega rejected the 68030 on pricing. Motorola focused on Intel instead of non-Intel competitors.

For "kick-the-OS, hit-the-metal" games, the 68020/68030 weren't 100% instruction-set compatible with the 68000.

Motorola's premium MMU pricing didn't encourage Linux on 68K. ARM's uptake came with the low-cost, low-power, MMU-equipped ARM710T (ARMv4T) CPUs riding the Linux kernel wave.

Last edited by Hammer on 03-May-2024 at 12:53 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Status: Offline

matthey

Re: some words on senseless attacks on ppc hardware
Posted on 3-May-2024 20:57:38

[ #1265 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2150
From: Kansas

Hammer Quote:

Quake was designed for Pentium FPU.
https://youtu.be/DWVhIvZlytc?t=934
K6 vs Pentium FPU with Quake and Quake 2.

K6-3 includes the full FPU design fix. It took AMD about 23 months to fix K6's FPU with concurrent K7 Athlon's R&D.

There are many FPU design decisions which affect FPU performance.

o ISA
o ABI
o FPU algorithm efficiency (FPU instruction latency)
o FPU pipelining (FPU instruction throughput)
o FPU register renaming (FPU registers can sometimes be reused, easier instruction scheduling)
o FPU parallel sub units (like OoO execution for FPU)
o FPU to CPU interface
o memory and cache resources
o code optimization

AMD was unfortunate that on the OoO K6, independent instructions could not be retired while an FDIV was executing. This is not really a bug but perhaps a design oversight. Most floating-point programs do not use FDIV as heavily as Quake's 3D transformations, and only a few years later, perspective-correct T&L 3D gfx boards reduced the importance of FDIV performance. Intel dodged a bullet by giving the Pentium P5 good FPU performance, as the P5 was weak at the more important integer performance compared to the CISC competition.

How well can the minimalist 68060 FPU handle FDIV for Quake in comparison? The Pentium P5 has an FDIV latency of 19 cycles for single precision, 33 for double precision and 39 for extended precision. All precisions have a latency of 37 cycles on the 68060. The 68k FPU ISA has a single-precision division instruction called FSGLDIV, which on the 6888x has a 69-cycle latency instead of the 103-cycle FDIV latency. The 68040 and 68060 also received FSop and FDop instructions, which round the result to single and double precision respectively and may allow optimizing for lower precision. The advantage of these FPU instructions is that the FPCR does not need to be changed, and changed back, to select a different precision, which are expensive operations.

FMOVE Dn,FPCR ; 8 cycles on 68060
FMOVE FPCR,Dn ; 4 cycles on 68060

FLDCW ; 8 cycles on P5 Pentium
FNSTCW ; 2 cycles on P5 Pentium

I believe x86 has to change the equivalent of the FPCR (the x87 control word, CW), and there are no instructions which select a different precision without this expensive overhead. Quake sometimes needs more than single-precision variables, so it is not possible to set the global precision to single precision all the time. Integer instructions on the 68060 can continue to execute in parallel with the FDIV, as on the P5 Pentium, and shouldn't ever stall like on the AMD K6. The 68060 single-precision FDIV takes 18 cycles longer than the P5 Pentium's, but the 68060 can continue to execute integer instructions while the AMD K6 stalls for 14 cycles with limited execution of instructions. The 68060 avoids the overhead of changing the precision in the FPCR/CW as well. Since we know integer instructions execute in parallel with the FDIV, I would say the 68060 is likely to perform better than the AMD K6 in this specific case. The K6 came out in 1997 with OoO execution, 32 KiB I+D caches and 8.8 million transistors versus the 68060's 2.5 million, so overall the K6's FPU performance was likely better. The earlier P5 Pentium FPU no doubt has more potential than the 68060 FPU, but the x86 stack-based FPU ISA reduces the advantage, and the 68060's clean GP register FPU ISA boosts its FP performance to be surprisingly close in practice. Quake may have been the exception, with Pentium P5 specific optimizations and lots of hand-optimized assembly code needed to unlock the difficult-to-extract x86 FPU performance.
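The precision-switching argument can be put into numbers with a simple cost model. This sketch uses only the cycle counts quoted above (P5 FDIV 19/33/39, 68060 FDIV 37, FLDCW 8, FNSTCW 2), assumes the P5 must save, set and restore the control word around each run of divides, and ignores overlap with integer instructions; the function names are hypothetical.

```python
# Cycle counts as quoted in the thread; not independently verified.
P5_FDIV = {"single": 19, "double": 33, "extended": 39}
M68060_FDIV = 37   # same latency for all precisions
P5_FLDCW = 8       # load x87 control word (select precision)
P5_FNSTCW = 2      # store x87 control word (save old setting)


def p5_cycles_per_divide(precision: str, divides_per_switch: int = 1,
                         save_cw: bool = True) -> float:
    """Average cycles per FDIV on the P5 when the precision is switched to
    `precision` and back around a run of `divides_per_switch` divides."""
    overhead = 2 * P5_FLDCW + (P5_FNSTCW if save_cw else 0)
    return P5_FDIV[precision] + overhead / divides_per_switch


def m68060_cycles_per_divide() -> float:
    """FSDIV/FDDIV pick the rounding precision per instruction, so no FPCR
    writes are needed around the divide."""
    return float(M68060_FDIV)


for n in (1, 2, 6):
    print(n, p5_cycles_per_divide("single", n), m68060_cycles_per_divide())
```

In this model a lone single-precision divide costs the P5 about as much as the 68060's 37 cycles, while batching several divides under one precision setting amortizes the switch, which is why it pays to group divides that share a precision.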

Hammer Quote:

Nope. 68060 has a 32-bit front-side bus issue in addition to 68060's FPU design issues.

The max number of "instructions-in-flight" is a best-case scenario with everything cached and is highly dependent on pipeline length, where the 68060 has the advantage over the P5 Pentium. The 2 extra stages of the P5 Pentium FPU don't make up the shortfall. On average, the 68060 has more "instructions-in-flight" too, as it has fewer superscalar multi-issue restrictions and multi-issues a lot more (45%-55% dual/triple issue with existing 68k code and 50%-65% dual/triple issue with 68060-optimized code). Also, the P5 Pentium is really good at tying up the integer units with FXCH instructions while FPU instructions are executing, rather than executing integer instructions in parallel with FPU instructions. Don't underestimate the advantage of a more orthogonal and cleaner CISC ISA.
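The claim earlier in this thread that FADD, FSUB and FMUL lose nothing on a non-pipelined FPU when they are scheduled 3 cycles apart (their latency) can be checked with a toy timing model. This sketch assumes independent operations, a fixed 3-cycle latency, and a pipelined unit that accepts one new operation per cycle; it ignores register dependencies, FPU/integer pairing rules and cache effects, and the function name is hypothetical.

```python
def completion_cycles(n_ops: int, latency: int, issue_interval: int,
                      pipelined: bool) -> int:
    """Cycle at which the last of `n_ops` independent FPU ops completes.

    An op is offered to the FPU every `issue_interval` cycles. A pipelined
    unit starts each op as soon as it is offered; a non-pipelined unit must
    wait until the previous op has finished.
    """
    start = 0  # first op starts at cycle 0
    for k in range(1, n_ops):
        offered = k * issue_interval
        start = offered if pipelined else max(offered, start + latency)
    return start + latency


LATENCY = 3  # FADD/FSUB/FMUL execution latency as given in the thread
for interval in (1, 3):
    for pipelined in (True, False):
        kind = "pipelined" if pipelined else "non-pipelined"
        cycles = completion_cycles(10, LATENCY, interval, pipelined)
        print(f"interval {interval}, {kind}: {cycles} cycles for 10 ops")
```

Back-to-back issue shows the pipelining advantage (12 versus 30 cycles for 10 operations), while a 3-cycle spacing makes both units finish together, which is the point being argued.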

Hammer Quote:

683XX led to the lesser tier Coldfire R&D.

Motorola should have adopted the CPU32 ISA (6833x) across all their embedded products and upgraded it with ColdFire instructions. This would have given good 68k compatibility with some simplification over the 68020 ISA, full 32-bit ISA support and better code density than Thumb-2. Instead, ColdFire destroyed 68k compatibility, reduced compiler support and further divided the 68k embedded market.

Hammer Quote:

ARM's supporters create a safe high-margin market space for ARM i.e. handheld smart phones.

I'm not sure how safe Qualcomm considers that market space for ARM.

Hammer Quote:

In 1992, Motorola should have recognized it was not a leading CPU vendor and adopted price disruption tactics like AMD's.

For the Saturn, Sega rejected the 68030 on pricing. Motorola focused on Intel instead of non-Intel competitors.

For "kick-the-OS, hit-the-metal" games, the 68020/68030 weren't 100% instruction-set compatible with the 68000.

Motorola's premium MMU pricing didn't encourage Linux on 68K. ARM's uptake came with the low-cost, low-power, MMU-equipped ARM710T (ARMv4T) CPUs riding the Linux kernel wave.

The 68000 had Japanese 2nd sources in Hitachi and Toshiba, which improved the likelihood of its use in Japanese products. The fallout and lawsuits with Hitachi resulted in Motorola becoming spooked by 2nd suppliers and producing the high-end 68020+ chips themselves. As I recall, Hitachi tried to get an injunction that would stop the 68030 from being sold. This likely means that Motorola used some of Hitachi's fab technology to produce the 68030, and Hitachi may have been slated to 2nd source it, perhaps to supply Sega. Sega sided with the Japanese Hitachi and went with SuperH instead of 68k.

I didn't like Motorola's 68k pricing strategy either. It really didn't make sense with the 68040+, which could have, and perhaps should have, included a standard MMU and FPU at no additional cost and just charged more for higher clocked and enhanced versions. For example, the first 68060s included the MMU and FPU but were also sold as EC and LC versions, likely including chips that failed MMU and FPU testing while others were fully functional. Their biggest priority was to make new dies for EC and LC chips without the MMU and/or FPU, but by this time there was little silicon to save by eliminating them. Economies of scale favored making more of one standard chip. They also could have made a 68060+ with double the caches for not much more development effort than the EC and LC chips, and it could have been sold at a significantly higher margin. Don't downgrade customers, upgrade them. Intel knew how to upgrade their products and customers. Upgrading the 68060 would have damaged the PPC market though, so down was the only direction for the 68k until it was stripped down to ColdFire. It's kind of like the CBM strategy of stripping down and cost-reducing products like the 68000+OCS/ECS Amiga to become a C64, only to find that the now obsolete product has little value and demand.

Last edited by matthey on 03-May-2024 at 09:51 PM.


Hypex

Re: some words on senseless attacks on ppc hardware
Posted on 25-Jun-2024 5:48:41

[ #1266 ]

Elite Member

Joined: 6-May-2007
Posts: 11281
From: Greensborough, Australia

@jPV

Quote:
For me it works exactly the same on MorphOS and AmigaOS 3.1, so I can't see any incoherence here. Both add quotation marks on the RequestFile output, and give the same result when combining.

Unfortunately I didn't take any specific notes. I just set about writing a script that was compatible with everything. And in this case that is one script that could work with OS 3.1, 3.9, 4.x and MOS.

On retesting, I found MOS does work like OS3.1. But OS3.9 was different, as was OS4, the latter of which would filter out quotes. I tend to test OS3.1 using RunInUAE on OS4, and have an OS3.9 setup on FS-UAE on my laptop.

But there are other differences and I had to go searching to find them. Links work differently. On OS3.1, files can be linked into WBStartup. On MOS that fails. It could be due to SFS not fully supporting links.

Then I found the guide viewer on MOS didn't work the same. I had a guide with tables laid out using guide tags, but in the MOS guide viewer it was all out of place. I had to replace my neat tags with literal spaces.

So by the end of it, I had one script that should have worked transparently on everything based on a few install settings, but in the end I had to put in all these conditionals to account for all the differences. And not just for MOS. It had to account for Enhancer command differences as well, so that's almost like 5 different platforms.


Hammer

Re: some words on senseless attacks on ppc hardware
Posted on 25-Jun-2024 9:07:45

[ #1267 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5616
From: Australia

@matthey

Quote:

There are many FPU design decisions which affect FPU performance.

o ISA
o ABI
o FPU algorithm efficiency (FPU instruction latency)
o FPU pipelining (FPU instruction throughput)
o FPU register renaming (FPU registers can sometimes be reused, easier instruction scheduling)
o FPU parallel sub units (like OoO execution for FPU)
o FPU to CPU interface
o memory and cache resources
o code optimization

Quake had a major public profile.

Michael Abrash optimized Quake's Pentium code path and was later rewarded with work on Intel's Larrabee effort.

In modern times, Zen 5 Strix (different from desktop Zen 5) is optimized for certain benchmarks used by Apple and Cinebench, i.e. a focus on MUL while reducing ADD.

Quote:

AMD was unfortunate with the OoO K6 that independent instructions could not be retired under a FDIV. This is not really a bug but maybe a design oversight. Most floating point programs do not use FDIV as much as 3D transformations for Quake and only a few years later perspective corrected T&L 3D gfx boards reduced the importance of FDIV performance.

https://thandor.net/benchmark/33
Quake benchmark
Intel Pentium MMX 166 MHz = 39.80 fps
Intel Pentium 166 MHz = 37.30 fps
AMD K6 166ALR (166 MHz) = 34.70 fps

Cyrix 6x86MX PR200 (166 MHz) = 27.40 fps
Intel Pentium 100 MHz = 26.70 fps
Cyrix/IBM 6x86 P200+ (150 MHz) = 22.90 fps

AMD's per-clock (IPC) Quake gap with Intel wasn't big compared to Cyrix's offerings. Cyrix was culled from the market.
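To put a number on the per-clock gap, the fps figures listed above can be normalized by clock speed relative to the Pentium 166. This is a crude IPC proxy only: it takes the listed clocks at face value and ignores differences in bus speed, L2 cache and graphics card between the test systems; the helper name is made up for illustration.

```python
# Quake fps figures from the thandor.net list quoted above: (name, MHz, fps)
results = [
    ("Intel Pentium MMX 166", 166, 39.80),
    ("Intel Pentium 166", 166, 37.30),
    ("AMD K6 166ALR", 166, 34.70),
    ("Cyrix 6x86MX PR200", 166, 27.40),
    ("Intel Pentium 100", 100, 26.70),
    ("Cyrix/IBM 6x86 P200+", 150, 22.90),
]

BASE_MHZ, BASE_FPS = 166, 37.30  # Pentium 166 as the reference


def relative_fps_per_mhz(mhz: float, fps: float) -> float:
    """fps per MHz relative to the Pentium 166 (1.0 = same per-clock speed)."""
    return (fps / mhz) / (BASE_FPS / BASE_MHZ)


for name, mhz, fps in results:
    print(f"{name:24s} {relative_fps_per_mhz(mhz, fps):.2f}")
```

On this measure the K6 is about 7% behind the Pentium per clock while the Cyrix parts are roughly 27-32% behind. The Pentium 100's high ratio likely reflects its faster bus relative to its core clock (a 1.5x multiplier versus 2.5x for the Pentium 166).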

Quote:

Intel dodged a bullet by having good FPU performance for the Pentium P5 which was weak at more important integer performance compared to the CISC competition.

The Pentium's integer performance can be good since its FPU can also process integers, e.g. FIMUL. SSE units handling integer data types are not new.

The Pentium's legacy integer unit with just GPRs is inferior to the competition. When Pentium-class games require an FPU, it doesn't necessarily mean they need floating-point processing.

Quote:

How well can the minimalist 68060 FPU handle FDIV for Quake in comparison? The Pentium P5 has a FDIV latency in cycles of 19 single precision, 33 double precision and 39 extended precision.

For IEEE 3D, the precision of SSE1 or 3DNow! is enough. GLfloat is IEEE FP32.
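What "FP32 is enough" means in practice can be seen by rounding a double to IEEE single precision, which is roughly what a single-precision result (a GLfloat, or a 68040/68060 FSop instruction) keeps. A minimal sketch using Python's standard struct module, whose 'f' format stores an IEEE binary32 value; the helper name is made up.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (IEEE FP64) to the nearest IEEE FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


third = 1.0 / 3.0
third32 = to_fp32(third)
print(third, third32)

# FP32 has a 24-bit significand, so integers above 2**24 start losing steps.
print(to_fp32(2.0**24 + 1))   # 16777216.0
```

The relative error of the single-precision result is at most 2**-24 (about 6e-8), which is fine for vertex positions but not for every intermediate value, consistent with the point made earlier in the thread that Quake sometimes needs more than single precision.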

Modern gaming GPGPUs are mostly optimized for IEEE FP32.

The last good FP64 gaming GPGPU was the Radeon HD 7970 GHz Edition. If you want a good FP64 GPGPU, buy an expensive workstation or server model.

Reminder: the 68060 has a 4-bytes-per-cycle instruction cache fetch limit.
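The effect of a 4-byte-per-cycle fetch limit depends on average instruction length, since sustained throughput cannot exceed fetch bandwidth divided by bytes per instruction. This sketch assumes a peak of 2 instructions per cycle (the 68060's two integer pipelines, not counting folded branches) and uses hypothetical average lengths; it ignores the instruction buffer that lets short bursts run ahead of fetch, as well as branches and cache misses. The function name is made up.

```python
def fetch_bound_ipc(fetch_bytes_per_cycle: float, avg_insn_bytes: float,
                    issue_width: int) -> float:
    """Upper bound on sustained instructions per cycle: the lower of the
    issue width and what instruction fetch can deliver."""
    return float(min(issue_width, fetch_bytes_per_cycle / avg_insn_bytes))


# Hypothetical average instruction lengths in bytes.
for avg_len in (2.0, 3.0, 4.0):
    print(avg_len, round(fetch_bound_ipc(4, avg_len, issue_width=2), 2))
```

With 2-byte instructions the fetch limit matches the issue width, but at an average of 3 bytes it caps sustained throughput at about 1.33 instructions per cycle, which is why code density matters for this design.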

Quote:

How well can the minimalist 68060 FPU handle FDIV for Quake in comparison? The Pentium P5 has a FDIV latency in cycles of 19 single precision, 33 double precision and 39 extended precision. All precisions have a latency of 37 cycles on the 68060. The 68k FPU ISA has a single precision division instruction called FSGLDIV which on the 6888x has a 69 cycle latency instead of FDIV 103 cycle latency. The 68040 and 68060 also received FSop and FDop instructions which round the instruction result to single and double precision which it may be possible to optimize for less precision.

The advantage of these FPU instructions is that the FPCR register does not need to be changed and changed back for selecting different precisions which are expensive operations.

FMOVE Dn,FPCR ; 8 cycles on 68060
FMOVE FPCR,Dn ; 4 cycles on 68060

FLDCW ; 8 cycles on P5 Pentium
FNSTCW ; 2 cycles on P5 Pentium

Agner Fog lists 8 cycles for the Pentium's FLDCW.

https://home.zcu.cz/~dudacek/SOJ/manualy/80x87set.pdf
This table gives 7 cycles for the Pentium's FLDCW and 4 cycles for the 486's.

FLDCW is not a major problem.

Quote:

I believe x86 has to change the equivalent of the FPCR (CW?) and there are no instructions which select a different precision without this expensive overhead. Quake needs more than single precision variables sometimes so it is not possible to set the global precision to single precision all the time.

Integer instructions on the 68060 can continue to execute in parallel with the FDIV like the P5 Pentium and shouldn't ever stall like the AMD K6. The 68060 single precision FDIV takes 18 cycles longer than the P5 Pentium but can continue to execute int instructions while the AMD K6 stalls for 14 cycles with limited execution of instructions.

Quake benchmark
Intel Pentium 166 MHz = 37.30 fps
AMD K6 166ALR (166 MHz) = 34.70 fps

https://www.youtube.com/watch?v=0_dW-21gdkw
Quake on the Warp 1260's 68060 Rev 6 @ 100 MHz with 64 KB L2 cache and RTG. The results are similar to a Pentium 75 or a Pentium 83 MHz OverDrive (which uses the 486's 32-bit bus).

The Pentium 83 MHz OverDrive has 16KB+16KB of L1 cache while normal Pentiums have 8KB+8KB. The Pentium MMX's L1 cache is also 16KB+16KB.

Quote:

The 68000 had Japanese 2nd sources in Hitachi and Toshiba, which improved the likelihood of its use in Japanese products. The fallout and lawsuits with Hitachi resulted in Motorola becoming spooked by 2nd suppliers and producing the high-end 68020+ chips themselves. As I recall, Hitachi tried to get an injunction that would stop the 68030 from being sold. This likely means that Motorola used some of Hitachi's fab technology to produce the 68030, and Hitachi may have been slated to 2nd source it, perhaps to supply Sega. Sega sided with the Japanese Hitachi and went with SuperH instead of 68k.

Most of the 68030's IPC gains were already present in the 68020.

Other 68K second-source vendors didn't develop their CISC CPU R&D capability to levels similar to AMD or Cyrix.

Quote:

I didn't like Motorola's 68k pricing strategy either. It really didn't make sense with the 68040+ which could have and perhaps should have included a standard MMU and FPU at no additional cost and just charged more for higher clocked and enhanced versions.

A 68EC040 with the 68030's cache control mode would be sufficient, using write-through mode.

The 040 and 060 have copyback caches, hence an MMU is needed.

Last edited by Hammer on 25-Jun-2024 at 11:31 AM.

Hypex

Re: some words on senseless attacks on ppc hardware
Posted on 25-Jun-2024 16:08:10

[ #1268 ]

Elite Member

Joined: 6-May-2007
Posts: 11281
From: Greensborough, Australia

@matthey

Quote:
I agree even though a 68k AmigaOS 4 may be interesting. It would likely be more work to port AmigaOS 4 back to 68k than to improve the performance of the emulation of PPC. The poor performance of the emulation of PPC likely is due to the MMU with the easy solution being an option to disable the use of the MMU in AmigaOS 4. The Hyperion 68k AmigaOS developers are currently porting back some of the AmigaOS 4 features to improve API compatibility for software development which may be better than a full AmigaOS 4 backport. There are AmigaOS 4 features that are heavy for an emulated 68k Amiga virtual machine like compositing and MMU use.

If the MMU is dropped then some core features are lost as well. The Grim Reaper crash logger relies on the MMU to catch traps, for example a simple null pointer write or an invalid memory access. Now, for 68K, making use of the MMU was a developer's benefit. It wasn't a standard in the OS. But making it standard and catching bad accesses is a good idea for stability. If OS4-style crash reporting had been around in 68K times, we'd have more stable software. To make a contrast, it's almost like the stability of 68K is security through obscurity. The open nature of the OS wasn't secure, but bad software could trash memory without any obvious clues, so it was rather obscure. You had to judge a book by its cover, as most software ran fine, and only when something went too far did the Guru jump up.

That aside, I think for 68K, an original AmigaOS 4 would have been best, that is, the AmigaOS 4 Commodore would have been developing. Though, by that stage, Commodore were moving away from 68K on the roadmap, so AmigaOS 4, had it been released, may not even have been on 68K. The talk was about Hombre and PA-RISC, but from what I've read from people online, the PA-RISC may not have been the replacement CPU we were all led to think. I've read it would have been used as a GPU, so not as a direct 68K replacement, with another CPU driving it, which possibly could have been a PPC.

Quote:
With emulation, the major development effort goes into the emulator instead of the compiler as this gives the most benefit. This is why emulation of the 68k Amiga is so good while 68k Amiga compiler support declines. There is minimal effort to improve the performance of the 68k AmigaOS which is only compiled for the same 16 bit 68000 target from 1985. The A600GS performs optimizations for ARM software to recover the lost performance from RISC load-to-use stalls during 68k emulation. The 68k Amiga is not moving forward but rather x86(-64) Windows with WinUAE, ARM Raspberry Pi with emu68, ARM custom hardware with THEA500 Mini and A600GS, etc. Where is the new 68k Amiga development with hundreds of thousands of units using emulation of the 68k Amiga already sold?

I'd say the PiStorm would be the closest answer to that. It is a new board for old Amigas. Without new chips there cannot be new Amigas, since any new redesign relies on scrapping the Amiga chips. It almost takes a step back by backporting to the 68000, and all those CPU-specific compiles of big software are almost made redundant, since backward compatibility with a stock A500 is in demand. Perhaps this is also what held the Amiga back, and not just Commodore. Amiga users were reluctant to upgrade. Now sure, you could really only upgrade the CPU and memory, if that was supported. But the wedge design meant it was only possible to upgrade an Amiga by replacing the Amiga. I don't know if Apple or MS were held to the same standards, needing all MacOS versions to run back to the 68000 and all Windows versions to be compatible with the 80286.


kolla

Re: some words on senseless attacks on ppc hardware
Posted on 26-Jun-2024 1:06:59

[ #1269 ]

Elite Member

Joined: 21-Aug-2003
Posts: 3072
From: Trondheim, Norway

@Hypex

CBM's OS4 was released by Microsoft under the name Windows NT 4.0.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC


Hammer

Re: some words on senseless attacks on ppc hardware
Posted on 26-Jun-2024 6:17:40

[ #1270 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5616
From: Australia

@kolla

Quote:

kolla wrote:
@Hypex

CBM's OS4 was released by Microsoft under the name Windows NT 4.0.

https://www.openpa.net/windows_netware_pa-risc.html
Quote:

Windows NT

There were development efforts in the mid-1990s to port Microsoft Windows NT to PA-RISC. HP wanted to hedge its bets in the workstation market and especially the anticipated NT workstation market, which threatened Unix workstations.

Several magazine sources and Usenet posts around 1993 point to HP pursuing a PA-RISC port of NT, modifying the PA-RISC architecture for bi-endianness and even conducting a back-room presentation at the '94 Comdex conference of a (modified HP 712?) PA-7100LC workstation running Windows NT.

Mentions of NT on PA-RISC continued in 1994 with some customer interest but ended around 1995.

Sources at HP (from the Unix division no less) spoke of dim prospects for NT on PA-RISC in October 1994 and a dead-end architecture in 1996. The final nail was the missing application landscape for PA-RISC on NT.

Consensus at HP at the time seemed to favor the anticipated move to the post-RISC era with VLIW EPIC/Itanium, which did support Windows NT.

Windows NT apparently ran on the following HP 9000 PA-RISC computers:

712 based on PA-7100LC processors. https://www.openpa.net/systems/hp-9000_712.html

Commodore had early information on modified PA-RISC with Windows NT (little endian).

Commodore's proposed AmigaOS port to PA-RISC is big-endian.
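The endianness mismatch mentioned above is easy to see at the byte level: the same 32-bit value is laid out in opposite byte orders, so a structure written by a big-endian OS reads back as a different number on a little-endian one unless every access is byte-swapped. A minimal illustration using Python's standard struct module; the value is arbitrary.

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)      # big-endian layout (68k, AmigaOS)
little = struct.pack("<I", value)   # little-endian layout (x86, Windows NT)
print(big.hex(), little.hex())      # 12345678 78563412

# Reading big-endian bytes as if they were little-endian gives a different value.
misread = struct.unpack("<I", big)[0]
print(hex(misread))                 # 0x78563412

# Interpreting the bytes in the correct order recovers the original value.
print(hex(int.from_bytes(big, "big")))  # 0x12345678
```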

Last edited by Hammer on 26-Jun-2024 at 06:24 AM.

Hypex

Re: some words on senseless attacks on ppc hardware
Posted on 26-Jun-2024 17:04:16

[ #1271 ]

Elite Member

Joined: 6-May-2007
Posts: 11281
From: Greensborough, Australia

@kolla

I've heard of the Windows NT replacement, but that is not an AmigaOS 4, since it isn't an Amiga OS. It wouldn't even qualify in the way the x86 AmigaOS 4 that Amiga Inc proposed would.

However, this looks to be no more than a popular rumour. One obvious technical issue is that PA-RISC is big endian while Windows NT is little endian. Windows NT is like AmigaOS in one respect: it is not portable across byte orders, as it is tied to one endianness. This was pointed out above. Another is that, where it's mentioned that Windows NT was planned to run on Hombre, it's not said that it was to be exclusive.

Amiga people are not skin deep surface dwellers, the OS mattered. Except for Amiga gamers. Replacing the OS with any kind of Windows would have both insulted and destroyed the majority of the user base. We know how hard core Amiga people are at Amiga being 68K with that original chipset and nothing else. Replacing that with four layer parallax of 3d and real 16 bit sound was already taking a huge risk. Especially for Amiga gamers.

Last edited by Hypex on 26-Jun-2024 at 05:07 PM.


matthey

Re: some words on senseless attacks on ppc hardware
Posted on 26-Jun-2024 21:24:13

[ #1272 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2150
From: Kansas

Hypex Quote:

If the MMU is dropped then some core features are lost as well. The Grim Reaper crash logger relies on the MMU to catch traps, for example a simple null pointer write or an invalid memory access. Now, for 68K, making use of the MMU was a developer's benefit. It wasn't a standard in the OS. But making it standard and catching bad accesses is a good idea for stability. If OS4-style crash reporting had been around in 68K times, we'd have more stable software. To make a contrast, it's almost like the stability of 68K is security through obscurity. The open nature of the OS wasn't secure, but bad software could trash memory without any obvious clues, so it was rather obscure. You had to judge a book by its cover, as most software ran fine, and only when something went too far did the Guru jump up.

Sure. Even without process isolation, protecting against and catching illegal accesses to the zero page, code, unused addresses, etc. is valuable. AmigaOS 3 has optional and modular use of the MMU through ThoR's MMU Libraries. An example of how much difference this makes is a Warp3D Avenger (Voodoo) library bug that steps through memory, trashing it and crashing the system without an active MMU, while ThoR's MMU support stopped the crashing and let me find and fix the bug (there is still no official fix for the 68k Warp3D bug from A-Eon). AmigaOS 4 may have a few more MMU features as standard, like the Grim Reaper and code protection on by default, but this is possible with ThoR's MMU library even though it is not the default on 68k AmigaOS 3. Does requiring an MMU and making it standard have a performance advantage compared to optional modular MMU access? I doubt there is much difference if whole MMU and non-MMU modules are swapped out. Most of the MMU instructions occur within a few AmigaOS modules and not in user mode programs. Optional modular MMU support has the advantage that it can be turned off to improve performance for games, real-time embedded use, emulation, etc. It provides the best of both worlds.
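The kind of protection described above can be sketched as a page-permission check applied to every access, which is what the MMU does in hardware. This is a toy model, not how ThoR's MMU libraries or the Grim Reaper work internally: it assumes 4 KiB pages (the 68040/68060 MMUs support 4 KiB or 8 KiB pages), treats the zero page as no-access except for reading the ExecBase pointer at address 4, and marks a hypothetical code hunk read-only. All class names and addresses are made up for illustration.

```python
from enum import Flag

PAGE_SIZE = 4096      # assumption: 4 KiB pages
ABS_EXEC_BASE = 4     # AmigaOS keeps the ExecBase pointer at address 4


class Perm(Flag):
    NONE = 0
    READ = 1
    WRITE = 2


class AccessFault(Exception):
    """Stands in for the access fault the MMU would raise."""


class PageTable:
    def __init__(self, default: Perm = Perm.READ | Perm.WRITE):
        self.default = default
        self.pages = {}  # page number -> Perm

    def protect(self, start: int, length: int, perm: Perm) -> None:
        """Apply `perm` to every page overlapping [start, start + length)."""
        for page in range(start // PAGE_SIZE, (start + length - 1) // PAGE_SIZE + 1):
            self.pages[page] = perm

    def check(self, addr: int, write: bool) -> None:
        """Raise AccessFault if the access is not permitted."""
        if not write and ABS_EXEC_BASE <= addr < ABS_EXEC_BASE + 4:
            return  # reading ExecBase stays legal even with the zero page locked
        perm = self.pages.get(addr // PAGE_SIZE, self.default)
        needed = Perm.WRITE if write else Perm.READ
        if needed not in perm:
            kind = "write" if write else "read"
            raise AccessFault(f"{kind} fault at ${addr:08X}")


mmu = PageTable()
mmu.protect(0, PAGE_SIZE, Perm.NONE)        # zero page: catch NULL pointer use
mmu.protect(0x10000, 0x2000, Perm.READ)     # hypothetical code hunk: read-only

for addr, write in [(0x4, False), (0x0, True), (0x10010, False), (0x10010, True)]:
    try:
        mmu.check(addr, write)
        print(f"${addr:08X} {'write' if write else 'read'}: ok")
    except AccessFault as fault:
        print(fault)
```

Because the check sits in one place, switching protection off amounts to swapping in a table that permits everything, which mirrors the "optional modular" argument above.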

Standard support is more important where the features are more commonly used. For the CPU, some features are so commonly used that it is better to use the instructions directly, without any abstraction, for performance reasons. AmigaOS 3 doesn't even have 68020+ integer MUL/DIV instructions as standard, to avoid separate 68000 and 68020 compiles. Also, IEEE math libraries are used instead of FPU instructions. The abstraction cuts performance to a fraction of inlined instruction performance and discourages using these features. The A1222 introduced the end of the standard PPC FPU for AmigaOS 4, which is much worse than removing the MMU as standard. The standard FPU instructions went from a major performance advantage to a major bottleneck. The ISA trend has been to make more units and their instructions standard, although this limits how low the ISA scales, which is why the A1222 chose not to use the standard FPU. Which units gain the most benefit from being standard?

1. CPU (integer) - CPUs mostly execute integer code so most integer instructions should be standard
2. FPU - an FPU standard is good for all but the lowest embedded use
3. SIMD unit - debatable as standard, poor scaling, resource hog for limited use, autovectorization challenging
4. MMU - commonly standard but instructions not used often and are grouped together

Which units are standard depends on how low the ISA should scale. I see the optional modular MMU support in the 68k AmigaOS as an advantage even though it is unusual. At the same time, improved and easier MMU support in the AmigaOS would be beneficial (e.g. MMU protection of code hunks with a flag).

Hypex Quote:

That aside, I think for 68K, an original AmigaOS 4 would have been best, that is, the AmigaOS 4 Commodore would have been developing. Though, by that stage, Commodore were moving away from 68K on the roadmap, so AmigaOS 4, had it been released, may not even have been on 68K. The talk was about Hombre and PA-RISC, but from what I've read from people online, the PA-RISC may not have been the replacement CPU we were all led to think. I've read it would have been used as a GPU, so not as a direct 68K replacement, with another CPU driving it, which possibly could have been a PPC.

CBM was leaving their options open. Hombre may have been used for the following, according to the CBM post-bankruptcy docs.

1. Games Console/Interactive Multimedia Player
2. Home Computers
3. Multimedia Video/Audio adapter for PCs
4. Audio/Video Subsystem for Next Generation Amiga
5. Interactive TV Set Top Boxes and Other Embedded Applications

Options #3 and #4 would likely use Hombre as a 3D GPU with audio capabilities. Option #4 is likely shown in another slide where an Hombre core and an Amiga core are together.

Hombre
RISC 3D (2 chips)
135MHz 64-bit
PA-RISC processor

Single Chip Amiga (1 chip, includes CPU)
57MHz 32-bit (upgrade of AA+ chipset)
68k processor

This would be a 68k Amiga with Hombre 3D GPU. I expect the AmigaOS would remain on the Amiga while the Hombre GPU would have its own OS, perhaps called HombreOS. It may have been possible to port the AmigaOS to the Hombre GPU so it could be called AmigaOS too. There could have been 2 separate AmigaOS instances at the same time although they would not be able to share code as they use different ISAs. There are other OS options but this is likely what the "Amiga" option would have been judging from the CBM docs.

Hypex Quote:

I'd say the PiStorm would be the closest to answering that. It is a new board for old Amigas. Without new chips there cannot be new Amigas, since any new redesigns rely on scrapping the Amiga chips. It almost takes a step back by backporting to the 68000, and all those CPU-specific compiles of big software are almost made redundant, since backward compatibility with a stock A500 is in demand. Perhaps this is also what held the Amiga back, and not just Commodore. Amiga users were reluctant to upgrade. Now sure, you could really only upgrade the CPU and memory, if it was supported. But the wedge design meant it was only possible to update an Amiga by replacing the Amiga. I don't know if Apple or MS were held to the same standards, needing all of MacOS to run back to the 68000 and all of Windows to be compatible with the 80286.

I expect most 68k Amiga users would welcome new 68k Amiga hardware that is compatible, a worthy spiritual successor and affordable. The PiStorm is compatible and affordable without being a spiritual successor and sees some success. The AC/Vamp hardware is closer to a spiritual successor and is compatible but not affordable enough. I don't expect 68060 availability and prices to drop like they would for a spiritual successor that is also compatible and affordable. A 68k Amiga SoC ASIC and small SBC could be a spiritual successor, compatible, affordable and higher performance than any of the other Amiga hardware solutions. There may be a few die-hards that would prefer to use original hardware but they would likely buy new affordable hardware too. The old hardware can break down after all, which is part of the problem, including with the PiStorm which relies on it.

kolla

Re: some words on senseless attacks on ppc hardware
Posted on 30-Jun-2024 5:26:34

[ #1273 ]

Elite Member

Joined: 21-Aug-2003
Posts: 3072
From: Trondheim, Norway

@Hypex

Quote:

I've heard of the Windows NT replacement but that is not an AmigaOS 4.

There were no AmigaOS 4 plans, that idea came much later.

Quote:
Since it isn't an Amiga OS, it wouldn't even qualify as the x86 Amiga OS4 Amiga Inc proposed.

You mean QNX Neutrino? Or TAO?

Quote:
However, this looks to be no more than a popular rumour.

A rather widespread rumour, from multiple sources within CBM.

Quote:
One obvious technical issue is that PA-RISC is big-endian while Windows NT is little-endian.

That's only obvious in hindsight. There is no technical reason WinNT could not work on big-endian. HP was working with Microsoft on porting Windows NT to PA-RISC, and it was even demonstrated around the time CBM went bust. PA-RISC was also likely to go bi-endian like its peers at the time. WinNT for SPARC was in a similar state: the port was announced but never materialized. And then things like Itanium showed up...

Quote:
It is like AmigaOS in one respect: it is not portable, as it is tied to one endianness.

Windows NT ran on just about all relevant CPU archs; the list of contemporary archs for which NT 4.0 didn't exist, or for which there never was an effort for a port, is rather short.

The major obstacle for any non-x86 WinNT was updates and native applications, and in case of any big-endian WinNT, exchange of binary data/files with little-endian systems.

Quote:
Another is that it's mentioned how Windows NT was planned to run on Hombre, but not that it was to be exclusive.

What do you mean exclusive?

Quote:
Amiga people are not skin deep surface dwellers, the OS mattered.

CBM was not led by "Amiga people", and even the Amiga people at CBM were perfectly and painfully aware of the shortcomings of the OS.

Quote:
Except for Amiga gamers. Replacing the OS with any kind of Windows would have both insulted and destroyed the majority of the user base.

The vast majority of the user base were gamers. The professional Amiga portfolio and its users were already migrating to Windows and NT... so who are you talking about? Us, the zealots, but we were never a majority of the original user base!

Quote:

We know how hardcore Amiga people are about the Amiga being 68K with that original chipset and nothing else. Replacing that with four-layer parallax of 3D and real 16-bit sound was already taking a huge risk, especially for Amiga gamers.

"Amiga gamers" is a constructed concept. In reality it was more like gamers who at the time happened to have an Amiga as their platform. They quickly moved to PlayStation etc and dropped the Amiga.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

kolla

Re: some words on senseless attacks on ppc hardware
Posted on 30-Jun-2024 5:50:25

[ #1274 ]

Elite Member

Joined: 21-Aug-2003
Posts: 3072
From: Trondheim, Norway

@matthey

Quote:
spiritual successor

What does that even mean? What are the criteria for "spiritual" similarity and who gets to decide those? I always felt "spiritual" similarities between m68k and ARM in that both archs had lives in truly embedded systems (as controller chips) as well as general purpose workstations and everything between.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

Kronos

Re: some words on senseless attacks on ppc hardware
Posted on 30-Jun-2024 6:19:33

[ #1275 ]

Elite Member

Joined: 8-Mar-2003
Posts: 2615
From: Unknown

@Hypex

Quote:

Hypex wrote:

Amiga people are not skin deep surface dwellers, the OS mattered. Except for Amiga gamers. Replacing the OS with any kind of Windows would have both insulted and destroyed the majority of the user base.

"Amiga people" 2024 != "Amiga people" 1994

Gamers would have gone where the best games were.

Video editors would have gone to whoever got non-linear video right first.

Other users had already mostly gone.

Which leaves a small subset of "Intel outside" sticker owning fanboys whining about how "Amiga" doesn't match their hyperspecific definition of Amiga.

Or in short "alles bleibt anders" (everything stays different).

_________________
- We don't need good ideas, we haven't run out on bad ones yet
- blame Canada

Hypex

Re: some words on senseless attacks on ppc hardware
Posted on 30-Jun-2024 16:20:43

[ #1276 ]

Elite Member

Joined: 6-May-2007
Posts: 11281
From: Greensborough, Australia

@matthey

Quote:
Sure. Even without process isolation, protecting and catching illegal accesses to the zero page, code, unused addresses, etc. are valuable. AmigaOS3 has optional and modular use of the MMU through ThoR's MMU Libraries. An example of how much difference this makes is a Warp3D Avenger (Voodoo) library bug that steps through memory trashing it and crashing the system without active MMU while ThoR's MMU support stopped the crashing and let me find and fix the bug (there is still no official fix for the 68k Warp3D bug from A-Eon). AmigaOS 4 may have a few more MMU features as standard like the Grim Reaper and code protection on by default but this is possible with ThoR's MMU library even though it is not the default on 68k AmigaOS 3. Does requiring a MMU and making it standard have a performance advantage compared to optional modular MMU access? I doubt there is much difference if whole MMU and non-MMU modules are swapped out. Most of the MMU instructions occur within a few AmigaOS modules and not by user mode programs. Optional modular MMU support has an advantage that MMU support can be optionally turned off to improve performance for games, real time embedded use, emulation, etc. It provides the best of both worlds.

I would say no, as the OS works the same. The MMU is really used for some speed-up tricks. However, the PPC is an MMU-driven MPU: memory is split into virtual and physical addresses, so the kernel has to deal with that and internally sets up MMU mappings. AmigaOS [4] is somewhat lazy, as it doesn't check that all input pointers are valid. A related problem is using real pointers instead of passing around handles, but handles would have overhead. So the MMU can catch bad accesses while the OS works fast and assumes all pointers are valid. On 68K a bad access won't take the system down unless it's really bad. On OS4 it will usually be caught, but then the Grim Reaper might just be annoying to see.
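As a rough illustration of the pointer vs. handle trade-off, here is a minimal C sketch of a generation-checked handle table. It is hypothetical (the names and the 16/16-bit split are assumptions) and is not how AmigaOS 4 manages objects; it only shows that each handle access costs a bounds check, a table load and a compare, in exchange for rejecting stale or bogus values that a raw pointer would happily dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handle layout: slot index in the low 16 bits,
   generation counter in the high 16 bits. */
#define MAX_OBJECTS 64

typedef uint32_t Handle;

typedef struct {
    void    *ptr;         /* the real object, or NULL if the slot is free */
    uint16_t generation;  /* bumped every time the slot is freed */
} Slot;

static Slot table[MAX_OBJECTS];

Handle handle_alloc(void *obj)
{
    for (uint16_t i = 0; i < MAX_OBJECTS; i++) {
        if (table[i].ptr == NULL) {
            table[i].ptr = obj;
            return ((Handle)table[i].generation << 16) | i;
        }
    }
    return 0xFFFFFFFFu;   /* table full: index 0xFFFF is always invalid */
}

void handle_free(Handle h)
{
    uint16_t i = (uint16_t)(h & 0xFFFFu);
    if (i < MAX_OBJECTS && table[i].generation == (uint16_t)(h >> 16)) {
        table[i].ptr = NULL;
        table[i].generation++;   /* old copies of h become stale */
    }
}

/* The overhead a raw pointer avoids: validate before every use. */
void *handle_lookup(Handle h)
{
    uint16_t i = (uint16_t)(h & 0xFFFFu);
    if (i >= MAX_OBJECTS || table[i].ptr == NULL ||
        table[i].generation != (uint16_t)(h >> 16))
        return NULL;
    return table[i].ptr;
}
```

A stale or corrupted handle yields NULL instead of a wild memory access, which is the kind of check that would otherwise fall to the MMU.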

OS4 uses the MMU for emulation jumps. It can try to execute 68K code and trap because the memory block isn't marked as executable. It then checks the address and, if it is PPC code, jumps native. This removes the need to check every code address. In the other direction, 68K code has PPC traps as backdoor hooks to jump back to native. Obviously a pure 68K port wouldn't need these MMU tricks, which are like a soft CPU context switch. But on a 68K CPU with an MMU, it would be good to have some basic protection built in. It may be limited, as 68K doesn't divide code and data into marked blocks; 68K hunks are too dynamic for such a rigid organisation.
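The MMU trick can be modelled with a toy page table. This C sketch assumes native code pages are marked executable and 68K code pages are not, so only a jump into a non-executable page reaches a handler that decides between emulating 68K code and reporting a genuine bad access. All names are hypothetical and it does not reproduce OS4's actual implementation; on real hardware the permission check is done by the MMU at no software cost, which is the point of the trick.

```c
#include <assert.h>
#include <stdint.h>

/* Toy page table: 16 pages of 4 KiB each (sizes are arbitrary). */
#define PAGE_SHIFT 12
#define NUM_PAGES  16

typedef enum { PAGE_DATA, PAGE_NATIVE, PAGE_68K } PageKind;
typedef enum { RAN_NATIVE, RAN_EMULATED, BAD_ADDRESS } CallResult;

static PageKind page_kind[NUM_PAGES];   /* zero-initialised: all PAGE_DATA */

/* Stands in for the MMU permission check: only native code pages execute. */
static int mmu_allows_execute(uint32_t page)
{
    return page_kind[page] == PAGE_NATIVE;
}

/* Stands in for the fault handler, reached only when a jump
   lands in a non-executable page. */
static CallResult exec_fault_handler(uint32_t page)
{
    if (page_kind[page] == PAGE_68K)
        return RAN_EMULATED;   /* hand the address to the 68K emulator */
    return BAD_ADDRESS;        /* a genuine bad jump: report it */
}

CallResult call_address(uint32_t addr)
{
    uint32_t page = addr >> PAGE_SHIFT;
    assert(page < NUM_PAGES);               /* keep the toy model in range */
    if (mmu_allows_execute(page))
        return RAN_NATIVE;                  /* common case: the jump just runs */
    return exec_fault_handler(page);        /* only faulting jumps get here */
}
```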

Quote:
Standard support is more important where the features are more commonly used. For the CPU, some features are so commonly used that it is better to use the instructions directly without any abstraction for performance reasons. AmigaOS 3 doesn't even have 68020+ integer MUL/DIV instructions as standard to keep from having separate 68000 and 68020 compiles. Also, math IEEE libraries are used instead of FPU instructions. The abstraction makes the performance a fraction of inlined instruction performance and discourages using these features. The A1222 introduced the end of the standard PPC FPU for AmigaOS 4 which is much worse than removing the MMU as standard. The standard FPU instructions went from a major performance advantage to a major bottleneck. The ISA trend has been to make more units and their instructions standard although this limits how low the ISA scales and the reason why the A1222 chose not to use the standard FPU. Which units gain the most benefit from being standard?

OS3 should support 68020+ through utility.library and by patching the related functions. Code would need to use those functions, but that is the only upward-compatible way I know to make use of a 32-bit CPU. As I understand it, the FPU libraries were produced by Motorola for 68000 math. So when you have an FPU, in the chip or CPU, those math libraries are deprecated. But most FPU code came in an FPU-specific version, so even an FPU math library wouldn't be as efficient. A lot of this optimised code was done by hand, with different binaries. No fat binaries existed, though the overlay functions in the loader could possibly have simulated them. There was no system in place in the OS to run the best binary, though libraries could have been used for that. So one transparent binary would have been possible, but it needed more work.
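The utility.library approach (it does provide functions such as UMult32()) amounts to calling through a patchable vector. A minimal C sketch, assuming a generic shift-and-add routine as the 68000 path and a plain multiply as the 68020+ path; the vector and function names are hypothetical, and the patching only mimics what SetFunction() does to a real library's jump table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Generic path: shift-and-add, works on any CPU (stands in for 68000 code). */
static uint32_t umult32_generic(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;
        a <<= 1;
        b >>= 1;
    }
    return result;
}

/* Fast path: a compiler targeting 68020+ can emit one MULU.L here. */
static uint32_t umult32_fast(uint32_t a, uint32_t b)
{
    return a * b;
}

/* The vector every program calls through. It starts generic and is
   patched once at startup when a better CPU is detected. */
static uint32_t (*umult32_vector)(uint32_t, uint32_t) = umult32_generic;

void math_init(bool cpu_is_68020_or_better)
{
    if (cpu_is_68020_or_better)
        umult32_vector = umult32_fast;
}

uint32_t umult32(uint32_t a, uint32_t b)
{
    return umult32_vector(a, b);   /* one indirect call per multiply */
}
```

Programs written against the vector speed up automatically once it is patched, but every multiply still pays for a call, which is the abstraction overhead discussed earlier in the thread.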

The PPC did have CPU, MMU and FPU as standard in BookS. User code for a 603, such as Heretic II, is mostly compatible with a G3. The G4 combined this with SIMD. Then they introduced BookE and messed it up. Well, introducing BookE to the Amiga world was even more messed up. The Sam 440 at least was CPU and FPU compatible, but less so, with Heretic II randomly crashing. The A1222 was even worse, with a limited-edition SPE merging FPU and SIMD in an incompatible CPU, using opcodes from both the FPU and AltiVec. A project ten years and counting, only now being realised, at a price higher than ten years ago. If the CPU was cheap, then it caused the whole board to cost a fortune. Writing the LTE code emulator would surely have cost a small fortune. It was a waste of time writing a PPC emulator for a short-lived PPC CPU when OS4 needed ongoing work. If Trevor has a technical advisor, then he should sue till the pants are on the ground. I think the A1222 should be dropped like an X1000 hotcake! Move on to the next thing.

Quote:
CBM was leaving their options open. Hombre may have been used for the following, according to the CBM post-bankruptcy docs.

So just like the Amiga, CDTV and CD32 in the last century.

Quote:
This would be a 68k Amiga with Hombre 3D GPU. I expect the AmigaOS would remain on the Amiga while the Hombre GPU would have its own OS, perhaps called HombreOS. It may have been possible to port the AmigaOS to the Hombre GPU so it could be called AmigaOS too. There could have been 2 separate AmigaOS instances at the same time although they would not be able to share code as they use different ISAs. There are other OS options but this is likely what the "Amiga" option would have been judging from the CBM docs.

Interesting. I'm not sure which 68K. I suppose the 68060 made sense, though stocks would soon dry up and they needed it to be future-proof.

Quote:
I expect most 68k Amiga users would welcome new 68k Amiga hardware that is compatible, a worthy spiritual successor and affordable. The PiStorm is compatible and affordable without being a spiritual successor and sees some success. The AC/Vamp hardware is closer to a spiritual successor and is compatible but not affordable enough. I don't expect 68060 availability and prices to drop like they would for a spiritual successor that is also compatible and affordable. A 68k Amiga SoC ASIC and small SBC could be a spiritual successor, compatible, affordable and higher performance than any of the other Amiga hardware solutions. There may be a few die-hards that would prefer to use original hardware but they would likely buy new affordable hardware too. The old hardware can break down after all, which is part of the problem, including with the PiStorm which relies on it.

The Vampire is a good example of reproducing a 68K Amiga. Even if it doesn't use "real" ASIC chips. But, it is more expensive than a standard PC, and is less powerful, so I don't know why it isn't criticised on every Amiga street corner. Modern Amigans like to complain how new Amiga hardware is too expensive and decades behind their Ryzen PC!

The Vampire is a specifically tailored design. But is it worth it? So there is a virtual 64-bit 68K and a virtual enhanced AGA chipset. I say virtual as there is no official 64-bit 68K design, nor are the SAGA chipset enhancements based on any official register map of AAA or the like. It's about as official as a Warp3D graphics card and a PPC 604e. Both go beyond an official Amiga. So dedicated software will not be Amiga compatible, just like OS4 or WOS software is not Amiga compatible without the right CPU expansion cards that are not 68K. Is it enough? Will Amiga people want to use it as a virtual expanded modern Amiga from the 2000s? Or will it be a novelty for classic 68K software? Right now I'm starting to see PiStorm CPU figures beating the Vampire, in particular with the CM4 compute module. But, unlike the average accelerator, the PiStorm is not plug and play. It's still experimental; mine keeps crashing. And there is no Amiga install disk, so it relies on a PC or Mac.

But as to spiritual successor, what does this even mean? Computers are digital devices, they have no links to the supernatural. Is there a ghost in the machine? Another guru meditating? A spirit or essence.

Last edited by Hypex on 30-Jun-2024 at 06:06 PM.
Last edited by Hypex on 30-Jun-2024 at 04:22 PM.

Hypex

Re: some words on senseless attacks on ppc hardware
Posted on 30-Jun-2024 18:02:06

[ #1277 ]

Elite Member

Joined: 6-May-2007
Posts: 11281
From: Greensborough, Australia

@kolla

Quote:
There were no AmigaOS 4 plans, that idea came much later.

It is alluded to, even if unofficially. Check this game out. But be careful in emulation; somehow it takes down my UAE. You can just read the source.

Classic Commodore installer game from 1993. It's practically built around "OS4". You need OS4 to escape Commodore!

http://aminet.net/package/game/misc/InstallerGame

Quote:
You mean QNX Neutrino? Or TAO?

I'd say TAO. Amiga Inc or Amiga Mc had a thing on TAO, and OS5 was a plan: first the AmigaOS4 x86 PC developer OS, which would become AmigaOS5, an all-PC user OS supporting all popular CPUs. I suspect that if they had ever brought out an OS4 which turned into an OS5, all hardware other than x86 would suddenly have been dropped. TAO didn't even work on WarpOS or 68K despite being said to support PPC. In reality it was x86 only.

Quote:
A rather widespread rumour, from multiple sources within CBM.

Are there any references for these sources or revealed papers?

Quote:
That's only obvious in hindsight. There is no technical reason WinNT could not work on big-endian. HP was working with Microsoft on porting Windows NT to PA-RISC, and it was even demonstrated around the time CBM went bust. PA-RISC was also likely to go bi-endian like its peers at the time. WinNT for SPARC was in a similar state: the port was announced but never materialized. And then things like Itanium showed up...

PPC can however give us a real world example. Raymond Chen gives some small details in this blog post:
https://devblogs.microsoft.com/oldnewthing/20180806-00/?p=99425

Quote:
Windows NT ran on just about all relevant CPU archs; the list of contemporary archs for which NT 4.0 didn't exist, or for which there never was an effort for a port, is rather short.

All common ones I found are MIPS, Alpha, Itanium and PPC which are all bi-endian.

Quote:
The major obstacle for any non-x86 WinNT was updates and native applications, and in case of any big-endian WinNT, exchange of binary data/files with little-endian systems.

Common media would be RIFF WAVE and BMP. But back then there was big-endian support for little-endian formats, as the Mac had to support them. Though RIFF looks like another rip-off of IFF, as AIFF is.
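The endianness difference shows up right in the chunk headers: IFF (the FORM container, which AIFF also uses) stores chunk sizes big-endian, while RIFF stores them little-endian. A minimal C sketch of reading either header portably, by assembling the value from bytes so it behaves the same on big- and little-endian CPUs; the function names are hypothetical, and only the 8-byte top-level header is handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Big-endian 32-bit read, as used by IFF chunk sizes. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Little-endian 32-bit read, as used by RIFF chunk sizes. */
uint32_t read_le32(const uint8_t *p)
{
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the size field of a top-level container chunk, or 0 if the
   4-character ID is neither "FORM" (IFF) nor "RIFF". */
uint32_t container_size(const uint8_t header[8])
{
    if (memcmp(header, "FORM", 4) == 0)
        return read_be32(header + 4);
    if (memcmp(header, "RIFF", 4) == 0)
        return read_le32(header + 4);
    return 0;
}
```

Because neither function depends on the host's byte order, the same source handles both formats on a 68k, PPC or x86 machine.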

Quote:
What do you mean exclusive?

I mean as the official OS with a Windows desktop. Contrasted with Amigas being supplied with AmigaOS and a Workbench desktop.

Quote:
CBM was not led by "Amiga people", and even the Amiga people at CBM were perfectly and painfully aware of the shortcomings of the OS.

No, but the Amiga still came with AmigaOS till the last model. And Commodore had stopped using MS OS in the C128.

Quote:
The vast majority of the user base were gamers. The professional Amiga portfolio and its users were already migrating to Windows and NT... so who are you talking about? Us, the zealots, but we were never a majority of the original user base!

I suppose that would be the zealots. The only ones left that survived Commodore. But I thought the Amiga pros shifted to Mac before Windows.

Quote:
"Amiga gamers" is a constructed concept. In reality it was more like gamers who at the time happened to have an Amiga as their platform. They quickly moved to PlayStation etc and dropped the Amiga.

Well, in this case I mean the Amiga zealot gamers: those who bought a CD32 because it was an Amiga, and those who continued to use their Amigas for games.

The PlayStation was once described as the spiritual descendant of the Amiga, by some Amigans, though I think the Kambrook 3DO has more in common.

Quote:
What does that even mean?

Exactly.

Last edited by Hypex on 01-Jul-2024 at 04:18 AM.
Last edited by Hypex on 30-Jun-2024 at 06:08 PM.

matthey

Re: some words on senseless attacks on ppc hardware
Posted on 1-Jul-2024 1:35:33

[ #1278 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2150
From: Kansas

kolla Quote:

What does that even mean? What are the criteria for "spiritual" similarity and who gets to decide those? I always felt "spiritual" similarities between m68k and ARM in that both archs had lives in truly embedded systems (as controller chips) as well as general purpose workstations and everything between.

A "spiritual successor" is a successor that is in the spirit of the original. The development philosophy is similar and products are worthy of the original as judged by customers and users. Hardware and businesses can be spiritual successors of original hardware and businesses. For example, the Raspberry Pi Foundation is a spiritual successor of Acorn and the RPi hardware a spiritual successor of the Acorn Archimedes.

Many Amiga users would like to see a spiritual successor to Amiga Corporation that would produce Amiga hardware that is a spiritual successor of original Amiga hardware. A-Eon and Hyperion failed to achieve spiritual successor status with PPC Amiga1 hardware, as judged by low acceptance and sales in the Amiga community. Shady, arrogant and greedy, if not criminal, business dealings have made Hyperion more like a CBM successor, and it appears A-Eon/Trevor have supported Hyperion in nefarious acts. Michele Battilana created a new Amiga Corporation, understanding the importance of the business name for a spiritual successor, but Hyperion has been obstructive and A-Eon passive at best to retain the status quo of a two-decade-old failed business model. The years roll by as the Amiga window of opportunity passes. There have been projects and hardware that came close to being a spiritual successor, like the Boxer, Minimig, Natami, AA3000+, AC/Vamp and THEA500 Mini. Some of the hardware was good enough to be a spiritual successor, but they lack(ed) the value to be widely accepted by the Amiga community and ex-Amiga community. The two major issues are the lack of a competitive 68k CPU without an ASIC and legal issues from divided control of Amiga IP.

Hypex Quote:

I would say no, as the OS works the same. The MMU is really used for some speed-up tricks. However, the PPC is an MMU-driven MPU: memory is split into virtual and physical addresses, so the kernel has to deal with that and internally sets up MMU mappings. AmigaOS [4] is somewhat lazy, as it doesn't check that all input pointers are valid. A related problem is using real pointers instead of passing around handles, but handles would have overhead. So the MMU can catch bad accesses while the OS works fast and assumes all pointers are valid. On 68K a bad access won't take the system down unless it's really bad. On OS4 it will usually be caught, but then the Grim Reaper might just be annoying to see.

Certainly on lower-end hardware, virtual addressing gives an overall performance decrease due to TLB misses, which also commonly have longer penalties. Jitter (worst-case latency) is increased on all hardware that uses TLBs for virtual addressing. A MMU can likely increase performance in some cases, but in most cases I expect overall performance to be better without virtual addressing. PPC CPUs were good at providing MMUs with virtual address support, but this may have reduced their ability to scale down to compete with ARM/Thumb. The idea was to use higher-level languages and more advanced OSs like Linux, since PPC is difficult to hand code at a low level and its code is too fat to scale down anyway. Using pointers directly instead of handles is a performance vs. security/stability trade-off. Not catching bad accesses with the MMU has a good chance of leading to more bad accesses and likely a crash. I think most Amiga users would like to have the stability increase of a MMU, but there are times when the performance and compatibility hit may be too much and it would be nice to be able to turn it off. I believe optional MMU support is possible, with a reboot required to turn the MMU on/off.
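The TLB cost can be put in the usual back-of-the-envelope form. This is a sketch assuming a single-level TLB and ignoring any overlap with other stalls; the numbers below are illustrative assumptions, not measurements of any Amiga or PPC CPU.

```latex
t_{\text{avg}} = t_{\text{hit}} + m \cdot t_{\text{walk}}
```

Here m is the fraction of memory accesses that miss the TLB and t_walk is the miss penalty (a hardware table walk or software refill). With an assumed m = 0.01 and t_walk = 30 cycles, the average cost is only 0.01 x 30 = 0.3 extra cycles per access, but any single access can still take the full 30 extra cycles. That worst case is the jitter that matters for games and real-time embedded use.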

Hypex Quote:

OS4 uses the MMU for emulation jumps. It can try to execute 68K code and trap because the memory block isn't marked as executable. It then checks the address and, if it is PPC code, jumps native. This removes the need to check every code address. In the other direction, 68K code has PPC traps as backdoor hooks to jump back to native. Obviously a pure 68K port wouldn't need these MMU tricks, which are like a soft CPU context switch. But on a 68K CPU with an MMU, it would be good to have some basic protection built in. It may be limited, as 68K doesn't divide code and data into marked blocks; 68K hunks are too dynamic for such a rigid organisation.

ThoR's mmu.library can provide the same support, and programs should exit gracefully if MMU support is not available. As important as 68k emulation is to AmigaOS 4, another method would be needed when MMU support is not available. The most likely reason a MMU would not be available in AmigaOS 4 would be because PPC code is being emulated, meaning there would be emulation within emulation, which is cringeworthy. Two decades after the 68k to PPC transition, Macs had not only dropped built-in emulation of 68k code but also built-in emulation of PPC code after transitioning to x86(-64). If 68k support is still so important, isn't it time to transition back to the 68k to at least solve the emulation-within-emulation issue? How much more absurd can Amiga Neverland get?

Hypex Quote:

OS3 should support 68020+ through utility.library and by patching the related functions. Code would need to use those functions, but that is the only upward-compatible way I know to make use of a 32-bit CPU. As I understand it, the FPU libraries were produced by Motorola for 68000 math. So when you have an FPU, in the chip or CPU, those math libraries are deprecated. But most FPU code came in an FPU-specific version, so even an FPU math library wouldn't be as efficient. A lot of this optimised code was done by hand, with different binaries. No fat binaries existed, though the overlay functions in the loader could possibly have simulated them. There was no system in place in the OS to run the best binary, though libraries could have been used for that. So one transparent binary would have been possible, but it needed more work.

There are newly released programs still using the 68k IEEE math libraries as it simplifies support if there is light floating point use. The 68k VBCC compiler is one of them although the source code is available to compile with direct FPU support and it is relatively easy due to few dependencies. I would not recommend compiling Quake with the IEEE libraries but it should have less overhead with a FPU than the A1222 executing standard PPC code with FPU instructions. I wouldn't be surprised if it is less buggy too. Of course both will be a slideshow.

Hypex Quote:

The PPC did have CPU, MMU and FPU as standard in BookS. User code for a 603, such as Heretic II, is mostly compatible with a G3. The G4 combined this with SIMD. Then they introduced BookE and messed it up. Well, introducing BookE to the Amiga world was even more messed up. The Sam 440 at least was CPU and FPU compatible, but less so, with Heretic II randomly crashing. The A1222 was even worse, with a limited-edition SPE merging FPU and SIMD in an incompatible CPU, using opcodes from both the FPU and AltiVec. A project ten years and counting, only now being realised, at a price higher than ten years ago. If the CPU was cheap, then it caused the whole board to cost a fortune. Writing the LTE code emulator would surely have cost a small fortune. It was a waste of time writing a PPC emulator for a short-lived PPC CPU when OS4 needed ongoing work. If Trevor has a technical advisor, then he should sue till the pants are on the ground. I think the A1222 should be dropped like an X1000 hotcake! Move on to the next thing.

BookE(mbedded). I doubt it is possible to sue someone for bad advice unless there is a safety or qualification problem. The leadership should take the blame.

Hypex Quote:

So just like the Amiga, CDTV and CD32 in the last century.

Interesting. I'm not sure which 68K. I suppose the 68060 made sense, though stocks would soon dry up and they needed it to be future-proof.

The time frame was 1995 so I doubt it would have been a 68060 yet. The 68k was still in production and even number one for 32 bit embedded use long after it disappeared from the desktop.

https://websrv.cecs.uci.edu/~papers/mpr/MPR/19980126/120102.pdf Quote:

Motorola Stays on Top; Other Players Double

Worldwide volume in 32-bit embedded microprocessors surpassed 180 million units, as Figure 1 shows. Of that total, three architectures (68K, MIPS, and SuperH) accounted for 80% of shipments in 1997. All the top vendors maintained their relative positions, although in some cases the gaps between players narrowed.

ARM, MIPS, and PowerPC won the biggest advances in terms of multiples. The first two more than doubled from 1996 to 1997, growing by 129% and 138%, respectively. MIPS also enjoyed the biggest unit increase, shipping 24.8 million more chips and CPU cores than it did the previous year. PowerPC is still in startup mode, multiplying from half a million in 1996 to about 3.9 million in 1997.

Motorola's 79.3 million units put it on top, as usual. Its 68K line has been the embedded 32-bit volume leader since it created the category. As the figure shows, sales of 68K chips were about equal to worldwide sales of PCs. Taken together, that's one new 32-bit microprocessor for every man, woman, and child living in the United States.

CBM was not in danger of losing 68k chips any time soon. They still had a plan to create a single-chip 68k Amiga SoC at 57MHz, with the diagram targeting early 1995. This would have been through a license for 68k CPUs. It could have been a 68EC020 or 68EC030 core even though the clock speed is given as 57MHz, because newer silicon could be used and better integration allows the system clock speed to be increased due to shorter distances for signals to travel. It could have been a 68040V core, but this was likely too new and expensive. It could have been a custom core, as changing things like cache sizes could be done; Sony customized the caches in the PS1 MIPS CPU, for example. Surprisingly, from the article above and thinking of the success of the PS1, it wasn't MIPS that finally surpassed the 68k for 32-bit embedded use but SuperH and then ARM. Hitachi's SuperH had good code density with tech borrowed from the 68k while they were a second-source 68k producer. They licensed the tech to ARM, who created the good code density Thumb and Thumb2 ISAs, allowing them to become number one in embedded in combination with their prolific licensing. Motorola was not much better than CBM at licensing, especially after their lawsuits with Hitachi.

Hypex Quote:

The Vampire is a good example of reproducing a 68K Amiga. Even if it doesn't use "real" ASIC chips. But, it is more expensive than a standard PC, and is less powerful, so I don't know why it isn't criticised on every Amiga street corner. Modern Amigans like to complain how new Amiga hardware is too expensive and decades behind their Ryzen PC!

The FPGA development is how CPU and chip development is done. The design could be turned into a low production cost ASIC (mass produced 1-2GHz 68k Amiga SoC for maybe $1 USD/chip). It is obvious that the value is not competitive in FPGA which is why I tried to convince Gunnar to plan for an ASIC including ISAs. I even tried to find the help to make it possible but Gunnar is oblivious, uncompromising and unprofessional which is not true of Thomas and Jens in my experience. I doubt anyone outside of the Amiga community would be interested enough in the SoC as is to help fund an ASIC and it is unlikely on his terms. It's too bad. SAGA looks pretty good and the 68k CPU has good performance and compatibility despite weirdness. Embedded market customers aren't going to want all the low utilization non-orthogonal registers and if someone wants a SIMD unit they are unlikely to want a 64 bit SIMD unit with no floating point support that can't easily be upgraded.

Hypex Quote:

The Vampire is a specifically tailored design. But is it worth it? So there is a virtual 64-bit 68K and a virtual enhanced AGA chipset. I say virtual as there is no official 64-bit 68K design, nor are the SAGA chipset enhancements based on any official register map of AAA or the like. It's about as official as a Warp3D graphics card and a PPC 604e. Both go beyond an official Amiga. So dedicated software will not be Amiga compatible, just like OS4 or WOS software is not Amiga compatible without the right CPU expansion cards that are not 68K. Is it enough? Will Amiga people want to use it as a virtual expanded modern Amiga from the 2000s? Or will it be a novelty for classic 68K software? Right now I'm starting to see PiStorm CPU figures beating the Vampire, in particular with the CM4 compute module. But, unlike the average accelerator, the PiStorm is not plug and play. It's still experimental; mine keeps crashing. And there is no Amiga install disk, so it relies on a PC or Mac.

SAGA more closely resembles the AA+ chipset, which was more practical than AAA. Some of the features are common to AAA and AA+. Chunky modes, multiple playfields and display layers, and audio upgrades are the biggest upgrades. There are minor upgrades like doubling the number of sprites and adding some other features that were sorely missing from the Amiga chipset. Some things are just natural extensions, like increasing the amount of chip memory. There are at least 3 AGA-compatible FPGA re-implementations, so there are plenty of choices. It would be nice if other developers were able to provide input and create standards, but Gunnar likes development closed so he can do what he wants. I miss the Natami days with more open development instead of the Cult of the Vampire with a single dictator.

Hypex

Re: some words on senseless attacks on ppc hardware
Posted on 1-Jul-2024 4:56:57

[ #1279 ]

Elite Member

Joined: 6-May-2007
Posts: 11281
From: Greensborough, Australia

@Kronos

Quote:
"Amiga people" 2024 != "Amiga people" 1994

Except for some newbies, they'd be the same people, just older.

Quote:
Gamers would have gone where the best games were.

Except nowadays most gamers are A500 fanboys who think AGA sucks! The Amiga went out with 3D games, both native examples like Genetic and extreme turbo examples like WipEout. Now the modern Amiga gamer wants to scrap all that and go back to his A500!

Quote:
Video editors would have gone to whomever got non-linear video right first.

Casablanca? Draco? A Mac, as recommended by Gordon Harwoods?

Quote:
Other users had already mostly gone.

I find some random comments by them when I am looking for Amiga stuff online.

Quote:
Which leaves a small subset of "Intel outside" sticker owning fanboys whining about how "Amiga" doesn't match their hyperspecific definition of Amiga.

Well, nowadays, perhaps ironically, the "Intel outside" fellows use Intel, or AMD if they still subscribe to "Intel outside" ideals. The sentiment is still somewhat around, as some people prefer AMD. But it's all x86-64 these days, so it doesn't really matter. Or ARM if you like Macs. I must say, if I had a choice I'd go AMD as well. I suppose using a Radeon in my OS4 machines all those years has given me some bias.

Other Amiga people are rigid about the Amiga being 68K with only one chipset. No expansions allowed. That would rule the Vampire out, and all other expansions too. RTG and sound cards would fail to meet the standard here as well.

The rest would highlight how expensive Amiga expansions are. You can't criticise an AmigaOne for being expensive, go on about how a PC is way cheaper, and then buy a ZZ9000 for an A4000. It's common for the average "Intel Inside" Amigan or ex-Amigan to point out that Amiga expansions or whole AmigaOne machines are too expensive to be realistic. But, to be realistic, the same criticism needs to be levelled at a Mediator as well. Did the same people criticise the Mediator for being too expensive because they could plug the same cards into a PC for free? Did the same people criticise the PPC expansions because they could just buy a Mac for the same price? Do the same people criticise expensive CPU and GPU Zorro cards in the modern age? That's what I wonder.

Last edited by Hypex on 01-Jul-2024 at 05:01 AM.

agami

Re: some words on senseless attacks on ppc hardware
Posted on 1-Jul-2024 4:58:54

[ #1280 ]

Super Member

Joined: 30-Jun-2008
Posts: 1718
From: Melbourne, Australia

@Hypex

Quote:
Hypex wrote:

The Vampire is a good example of reproducing a 68K Amiga... But, it is more expensive than a standard PC, and is less powerful, so I don't know why it isn't criticised on every Amiga street corner.

Because, value.
Putting the SA V4 aside, most of the V2 and V4 SKUs are packaged as accelerator upgrades for existing Amigas. In a market where such upgrades are rare and expensive, be they 68k or PPC, the Apollo boards represent excellent value for enhancing the abilities of Amigas and extending their usefulness.

When I think about what I spent on BlizzardPPC + Bvision, the Apollo IceDrake V4 is an absolute bargain. Plus I don't have to put the A1200 in a tower.

Would I prefer it to be cheaper? Of course. Given the R&D costs, the low volume, and the Apollo team's development goals, it's easy to see and accept the current retail pricing as practical and symbiotic.

Whereas with A-EON's AmigaOS 4 (PPC) hardware, it's not the sticker price itself that gives me pause. It's the overall lack of value.
Their products never made sense to me, which is also why I never thought that any attack on PPC hardware since 2010 has ever been senseless.

_________________
All the way, with 68k


Amigaworld.net was originally founded by David Doyle