One major reason why Motorola and 68k failed...
Hammer 
Re: One major reason why Motorola and 68k failed...
Posted on 14-Jun-2024 3:35:07
#201
Elite Member
Joined: 9-Mar-2003
Posts: 5859
From: Australia

@Matt3k

The CD32's FMV module has the following:

1. A 24-bit DAC (STM's STV8438CV) for a 16.7-million-color display.

2. An MPEG-1 decoder, the C-Cube CL450 SoC.

https://websrv.cecs.uci.edu/~papers/mpr/MPR/ARTICLES/060803.PDF

The CL450 has about 398K transistors and runs at up to 40 MHz. It includes a licensed MIPS-X RISC processor with semi-custom extensions. In quantities of 100K or more per year, the price was less than $50 in 1992.

The CL450's MIPS-X RISC processor still has the usual RISC instruction set.

You're looking at a 40 million instructions per second (MIPS) RISC CPU, i.e. comparable to parts of the PS1's CPU or the Rendition Verite V1000's MIPS-like CPU @ 25 MHz.

3. An LSI L64111QC digital audio decoder with a 16-bit DAC.

4. 512 KB of local RAM: a 4 Mbit NEC 423260 DRAM with 80 ns access time.

5. A Lattice ispLSI 1024-60LJ CPLD.

A MIPS CPU @ 33 MHz blows away a 68030 @ 50 MHz. Among small CPUs, RISC has the edge in arithmetic intensity over mostly microcoded CISC competitors.

Intel used x86's economies of scale and advanced process nodes to beat the MIPS-dominated Advanced Computing Environment (ACE) consortium. That tactic wouldn't work on ARM, since ARM has already reached large economies of scale and advanced process nodes; ARM and Qualcomm have a safe market segment in smartphones.

Although the 68020/68030 has a fast hardware barrel shifter, Motorola never implemented a few key math instructions directly in hardware on a 68030-sized CPU, nor created a free C++ compiler biased toward those very fast instructions. The sweet spot lies between the extremes of the mostly microcoded 68030 and the 68040's mostly hardwired implementation.

A 68EC035 with a few fast ADD and MUL instructions at a low price would have been nice.

Such a 68EC035 would have fast and slow instruction paths.

A 68885 FPU would have a fast FP32 subset while FP64 performance remained as is, i.e. fast FP32 and slow FP64 instruction paths. The 68060's fast FP64 is needed for non-gaming applications. The product segmentation tactic is similar to the PC GPGPU tactic, i.e.
1. cheap and fast IEEE FP32 with slow FP64,
2. expensive and fast IEEE FP64.

Gamers get cheap and fast 32-bit compute. The purpose is to maintain the consumer desktop install base and counter RISC's arithmetic-intensity advantage at a comparable price.

Aside from the Amiga, no other post-16-bit gaming platform placed its compute strength on Motorola's 68K, e.g. Sega rejected the 68030 and selected the SuperH-2, 3DO selected the ARM60, Sony selected MIPS, and Nintendo selected MIPS. Capcom replaced the 68K-based CPS2 with the SuperH-2-based CPS3.

Last edited by Hammer on 14-Jun-2024 at 05:18 PM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

matthey 
Re: One major reason why Motorola and 68k failed...
Posted on 14-Jun-2024 22:30:20
#202
Elite Member
Joined: 14-Mar-2007
Posts: 2270
From: Kansas

Hammer Quote:

For Emu68, the 68K's performance is dictated by the JIT 68k-to-ARM compiler's and host CPU core's quality. The compiler's optimization occurs with JIT.


Right. There are no compiler targets for emulation, even with sales of hundreds of thousands of hardware units as with THEA500 Mini. The few remaining developers are likely to use decades-old compilers like SAS/C and GCC 2.95.3 that lack modern functionality, because modern 68k compiler support is so bad.

Hammer Quote:

Pushing without the means (money) is futile. Mainstream venture capitalists are addicted to AI hype. It's either a "national security" or an AI issue.

68K advocates need to attach AI hype to attract venture capitalists.


I was talking to potential embedded business partners including one that could help develop and produce an ASIC and one that could improve economies of scale with mass produced IoT devices. Would Acorn/ARM have been successful without the help of VLSI Technology to produce the silicon and aggressive embedded market licensing to improve economies of scale? Did business partners make it possible for ARM to produce their own CPU on a shoe string budget?

I was exploring opportunities which were very limited considering the lack of a business. People had different visions and one important person was inflexible which caused me to abort before attempting to raise money. Is it more important to have a business plan before raising money or raise money before a business plan? Can there even be a business plan without a business?

Hammer Quote:

Flawed argument. You're using an old Photoshop CS6.

Adobe Photoshop CS6 was released in May 2012. Intel Sandy Bridge with AVX1 was released on January 9, 2011.


How does an old version of Photoshop affect the argument? Do you think newer versions of Photoshop using newer SIMD ISAs with larger SIMD instructions are going to have better code density? Do you think newer versions of Photoshop are compiled for both x86-64 and x86 so the code can be compared?

Hammer Quote:

Performance = IPC x clock speed.


Some instructions do more work than others. A CISC memory instruction is often the equivalent of 2 RISC instructions (a load or store plus an ALU operation). Early RISC ISAs had "reduced instruction sets" requiring more instructions for many algorithms. Some early RISC propaganda used the VAX MIPS benchmark, which was just the number of instructions executed.
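A toy calculation makes the point; the instruction counts below are illustrative, not measured:

```python
# At equal nominal MIPS, the ISA needing fewer instructions per task
# completes more tasks per second. Counts are illustrative: a CISC
# memory-to-memory add is 1 instruction where a load/store RISC
# needs 3 (load, add, store).
def tasks_per_second(mips, instructions_per_task):
    return mips * 1_000_000 / instructions_per_task

cisc = tasks_per_second(10, 1)  # 10-"MIPS" CPU, 1 instruction per task
risc = tasks_per_second(10, 3)  # 10-"MIPS" CPU, 3 instructions per task
print(cisc / risc)  # the CISC part does ~3x the work at equal MIPS
```

So a raw "MIPS" figure flatters whichever ISA executes more, smaller instructions.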

Hammer Quote:

68000 has it's four-cycle memory access. 68000 is good for hosting 32-bit OS.


The 68k CPUs were designed for good performance using low cost memory. Many RISC CPUs required expensive memory.

http://marc.retronik.fr/motorola/68K/68000/High-Performance_Internal_Product_Portfolio_Overview_with_Mask_Revision_[MOTOROLA_1995_112p].pdf Quote:

LSI Logic 33000 (Cut-down R3000 core)
Competition's Disadvantages (compared to EC030)
o Poorer DRAM performance
o Not cost effective
o Inferior development tools

LSI Logic 33020 (Cut-down R3000 core)
Competition's Disadvantages (compared to EC040)
o Poor DRAM performance
o Not general purpose
o Inferior development tools

Intel 960KA—Around 68EC030 performance at roughly same price
Weaknesses: Multiplexed address and data buses. No data cache. Performance very susceptible to wait states. Interrupt latency poor—Intel quote typically 1 ms at 33 MHz.

AMD29000—Performance lies between 68EC030 and 68EC040 levels
Weaknesses: Lower performance with DRAMs in burst-mode and much more susceptible to wait states. Large register sets—not well suited to multi-tasking.

Intel 960CA/F—Marketed as a high-end RISC solution
Weaknesses: High power consumption, lower performance. RISC machine intolerant of wait states, requires expensive high-speed SRAM.

AMD29030/35—Aggressive pricing, 4K/8K instruction cache
Weaknesses: No data cache. Very high bus usage. No support for multiprocessor systems. RISC machine intolerant of wait states, requires expensive high-speed SRAM. High power consumption.

IDT 3051/52, 3081/3082—Aggressive pricing, high performance, surface mount
Weaknesses: Multiplexed bus requires external components. RISC machine intolerant of wait states, requires expensive high-speed SRAM. High power consumption. Limited development-tool support compared to 68K.


RISC CPUs needed more caches and memory in addition to often needing more expensive memory. Many RISC systems were high cost systems. Compare the whole system cost.

Hammer Quote:

Intel i860 beats 68040 e.g. 68040 is not used for SGI RealityEngine's geometry engine.

3DO's ARM60 @ 12.5 Mhz Doom performance shows A1200's 68EC020 @ 14 Mhz with Fast RAM doesn't have ARM60's arithmetic intensity.


The i860 was not as good a general-purpose CPU as the 68040. As with DSPs, it is not peak or theoretical performance that matters but consistent performance. Small amounts of specialized code can sometimes be optimized to avoid stalls on VLIW CPUs and DSPs, but compilers have difficulty producing general-purpose code for them.

I wonder how true the rumor is that the ARM Archimedes needing high-cost memory is what sank Acorn. Acorn Archimedes computers were generally more expensive than 68k Amiga computers, but there were more factors than memory cost. Back then it was cheaper to boost performance with the chipset than with more expensive CPUs or memory. CBM failed when the Amiga chipset was not upgraded adequately.

Hammer Quote:

Dhrystone high
Motorola SYS1147's 68030 @ 20 Mhz = 6,334
Motorola SYS3600's 68030 @ 25 Mhz = 8,826

Compaq 386/20's 80386 @ 20 Mhz = 9,436
Compaq 386/25's 80386 @ 25 Mhz = 10,617

MIPS (RISC) R2000 @ 15 Mhz = 25,000

RISC threat is real.


Those are really low Dhrystone numbers for the 68030. They can be converted to DMIPS by dividing by 1757.

68030@20MHz 3.6DMIPS 0.18DMIPS/MHz
68030@25MHz 5.0DMIPS 0.20DMIPS/MHz

Motorola's official DMIPS/MHz are about twice the benchmark numbers you found.

68030@50MHz 17.9DMIPS 0.36DMIPS/MHz

This official number is from Motorola's High Performance Internal Product Portfolio.

http://marc.retronik.fr/motorola/68K/68000/High-Performance_Internal_Product_Portfolio_Overview_with_Mask_Revision_[MOTOROLA_1995_112p].pdf

R2000@15MHz 14.2DMIPS 0.95DMIPS/MHz (from your numbers)

The MIPS R2000 MPU used an R2010 floating-point accelerator chip and four R2020 write-buffer chips. It had a large external SRAM cache. This was a high-dollar chipset, originally sold as a fully populated board. Integration made later MIPS systems more practical, but the R2000 barely competed with the 68030. It was not uncommon for RISC CPUs to use multiple chips; even Motorola's own 88k was a multi-chip chipset. The 68030@50MHz had competitive performance with the R2000@15MHz, but they were very different in many ways. MIPS was a threat to the 68030 because it had better performance/MHz and could be clocked up. The 68k needed a longer pipeline and less microcoding, which arrived late, and not great, with the 68040.
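The conversion above can be sketched in a few lines, assuming the usual VAX 11/780 reference score of 1757 Dhrystones/second:

```python
# DMIPS = Dhrystones/second divided by the VAX 11/780 reference
# score of 1757 Dhrystones/second.
VAX_REFERENCE = 1757.0

def dmips(dhrystones_per_sec):
    return dhrystones_per_sec / VAX_REFERENCE

# Figures quoted in the thread:
for name, score, mhz in [("68030 @ 20 MHz", 6334, 20),
                         ("68030 @ 25 MHz", 8826, 25),
                         ("R2000 @ 15 MHz", 25000, 15)]:
    d = dmips(score)
    print(f"{name}: {d:.1f} DMIPS, {d / mhz:.2f} DMIPS/MHz")
```

Running it reproduces the 3.6/5.0/14.2 DMIPS figures above.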

Hammer Quote:

MIPS CPU @ 33 Mhz blows away 68030 @ 50 Mhz. On small CPUs, the RISC CPU has the edge on arithmetic intensity against mostly ROM'ed microcode CISC competitors.


Now you are likely talking about the 1988 R3000, which was still a chipset (external MMU and FPU chips) with external cache. There were some embedded derivatives, but they may not have had the same performance, as some or all of the caches may have been removed. MIPS likely had a performance advantage on the high end with large SRAM caches and fast memory, but the 68030@50MHz was not blown away for more affordable embedded and gaming hardware.

Hammer Quote:

Intel used X86's economies of scale and advanced process node to beat MIPS dominated Advanced Computing Environment (ACE) consortium. This Intel tactic wouldn't work on ARM since it has reached large-scale economies of scale and advanced process nodes. ARM and Qualcomm have a safe market segment with smartphones.


x86 beefed up and then bloated up in pursuit of performance which left an opening for lower power ISAs. ARM replaced MIPS because they adopted Thumb ISAs for embedded use. Thumb ISAs were licensed from Hitachi SuperH which was a 2nd source supplier of the 68000 and SuperH is obviously based on the 68000. ARM thinks AArch64 has good enough code density that they can abandon 68k like Thumb code density and I suppose they can get away with it if nobody challenges them.

Hammer Quote:

68885 FPU would fast FP32 subset and FP64's performance would remain as is. 68885 FPU would have fast FP32 and slow FP64 instruction paths. 68060's fast FP64 is needed for non-gaming applications. The product segmentation tactics are similar to PC's GpGPU tactics i.e.
1. cheap and fast IEEE FP32 and slow FP64,
2. expensive and fast IEEE FP64,

The gamers gets cheapo and fast 32-bit compute. The purpose is to maintain the consumer desktop computer install base and counter RISC's arithmetic intensity advantage at a comparable price.


For FPU instructions, there is not much difference in execution latency between IEEE single, double and extended precision instructions except for division.

P.S. Please slow down on your posts. Quality over quantity is better.

Last edited by matthey on 15-Jun-2024 at 11:49 AM.

Hammer 
Re: One major reason why Motorola and 68k failed...
Posted on 16-Jun-2024 8:06:15
#203
Elite Member
Joined: 9-Mar-2003
Posts: 5859
From: Australia

@matthey

Quote:
Right. There are no compiler targets for emulation even with sales of hundreds of thousands of hardware units like with THEA500 Mini. The few remaining developers are likely to use decades old compilers like SAS/C and GCC 2.95.3 that lack modern functionality because modern 68k compiler support is so bad.

I'm looking at Bebbo's amiga-gcc.


Quote:

I was talking to potential embedded business partners including one that could help develop and produce an ASIC and one that could improve economies of scale with mass produced IoT devices. Would Acorn/ARM have been successful without the help of VLSI Technology to produce the silicon and aggressive embedded market licensing to improve economies of scale? Did business partners make it possible for ARM to produce their own CPU on a shoe string budget?

VLSI Technology Inc. has aided ARM.

Quote:

I was exploring opportunities which were very limited considering the lack of a business. People had different visions and one important person was inflexible which caused me to abort before attempting to raise money. Is it more important to have a business plan before raising money or raise money before a business plan? Can there even be a business plan without a business?

Your 68K advocacy is your business.

Quote:

How does an old version of Photoshop affect the argument? Do you think newer versions of Photoshop using newer SIMD ISAs with larger SIMD instructions are going to have better code density? Do you think newer versions of Photoshop are compiled for both x86-64 and x86 so the code can be compared?

Vector optimization is not automatic despite auto vector compiler claims.

Photoshop CS6 is not a bastion of vector examples.

Quote:

The i860 was not as good of a general purpose CPU as the 68040. Like DSPs, it is not peak or theoretical performance that is important but consistent performance. Small amounts of specialized code can sometimes be optimized to avoid stalls with VLIW CPUs and DSPs but compilers have difficulty producing general purpose code.

For SGI's Onyx and Crimson workstations, the MIPS CPU handles the host OS and general-purpose code while an array of i860s handles geometry processing.

NVIDIA's Project Denver was a VLIW design targeting the ARMv8 ISA. Denver was followed by the improved Carmel core (used in Xavier).

https://www.phoronix.com/review/nvidia-carmel-quick/2

After Carmel, NVIDIA switched to the ARM Cortex-A78AE in the Orin SoC.

Quote:

I wonder how true the rumor is that ARM Archimedes needing high cost memory is what sunk ACORN. Acorn Archimedes computers were generally more expensive than 68k Amiga computers but there were more factors than memory cost. It was cheaper to boost performance with the chipset rather than with more expensive CPUs or memory back then. CBM failed when the the Amiga chipset was not upgraded adequately.

Amiga OCS addressed the 68000's weak IPC for 2D multimedia, but the Amiga chipset didn't address texture-mapped 3D. OCS was designed to saturate a 3.5 MHz (260 ns read/write cycle) 16-bit memory bus.

68K didn't help with texture-mapped 3D, since it has inferior math compute per dollar compared to the competition.
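Taking the figures above as given, a back-of-envelope peak bandwidth for that bus:

```python
# Peak bandwidth of a 16-bit (2-byte) bus transferring once per
# 3.5 MHz cycle, using the figures quoted above.
def bus_bandwidth_mb_per_s(clock_mhz, bus_width_bytes):
    return clock_mhz * bus_width_bytes  # MB/s, decimal megabytes

print(bus_bandwidth_mb_per_s(3.5, 2))  # → 7.0
```

That is roughly 7 MB/s of peak bandwidth shared between the CPU and the custom chips.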

Quote:
Those are really low Dhrystone numbers for the 68030. They can be converted to DMIPS by dividing by 1757.

68030@20MHz 3.6DMIPS 0.18DMIPS/MHz
68030@25MHz 5.0DMIPS 0.20DMIPS/MHz

Motorola's official DMIPS/MHz are about twice the benchmark numbers you found.

68030@50MHz 17.9DMIPS 0.36DMIPS/MHz

Your Motorola source claims "17.9 MIPS at 50 MHz". Where's the Dhrystone qualification?

From https://www.nxp.com/docs/en/data-sheet/MC68EC030TS.pdf
for MC68EC030
Quote:

25- and 40-MHz Operating Frequency (up to 9.2 MIPS)


You asserted "Dhrystones" for Motorola's "17.9 MIPS at 50 MHz" claim. Prove that Motorola's "17.9 MIPS at 50 MHz" is a Dhrystone figure.

It's well known that the 68030's integer MUL instructions are slow.

https://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/68030it.HTML
68030 instruction cycle times are very slow, e.g.
MUL takes 28 (word) and 44 (long) cycles.
DIV takes 56 and 78 cycles.
There are no native byte-size MUL and DIV operations.

https://www.digchip.com/datasheets/download_datasheet.php?id=59667&part-number=386DX
386DX instruction cycle times:
MUL takes 20 (byte), 30 (word), and 44 (doubleword) cycles.
DIV (unsigned) takes 17, 25, and 41 cycles.
IDIV (signed) takes 22, 30, and 46 cycles.

The 386 has faster byte-size MUL and DIV.
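Those cycle counts translate directly into throughput; a quick sketch (clocks chosen for illustration) puts them in perspective:

```python
# Multiplies per second from a cycle count: clock / cycles-per-MUL.
def muls_per_second(clock_mhz, cycles):
    return clock_mhz * 1_000_000 / cycles

# Long/doubleword MUL, both 44 cycles, at example clocks:
m68030 = muls_per_second(50, 44)  # 68030 @ 50 MHz: ~1.14 million/s
i386dx = muls_per_second(33, 44)  # 386DX @ 33 MHz: ~0.75 million/s
```

At equal cycle counts the clock decides; the byte-size gap is where the 386's tables above pull ahead.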

Pentium FDIV completes in 16 cycles and has limited out-of-order processing capability.

https://www.nxp.com/docs/en/data-sheet/MC68060UM.pdf
68060 FDIV takes 37 to 38 cycles.

I picked on the MUL and DIV instructions for a reason: the RISC and DSP competition has faster MUL instructions.

http://www.bitsavers.org/components/motorola/68000/CPU32_Reference_Manual_Aug90.pdf
CPU32's MUL (26 cycles for word, 52 for long) and DIV cycle times are 68030-like.

There's product segmentation: fast MUL is reserved for Motorola's DSP SKUs. If you want fast ADD and MUL, buy a $108 68EC040 CPU.

Last edited by Hammer on 16-Jun-2024 at 10:03 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

matthey 
Re: One major reason why Motorola and 68k failed...
Posted on 16-Jun-2024 20:44:00
#204
Elite Member
Joined: 14-Mar-2007
Posts: 2270
From: Kansas

Hammer Quote:

I'm looking at Bebbo's amiga-gcc.


Bebbo's solo 68k Amiga GCC effort is good, but does it surpass the earlier team efforts behind the unofficial "Geek Gadgets" GCC or VBCC?

Hammer Quote:

Vector optimization is not automatic despite auto vector compiler claims.

Photoshop CS6 is not a bastion of vector examples.


I expect Photoshop has more SIMD code and more optimized SIMD code than average.

Hammer Quote:

Amiga OCS addressed 68000's weak IPC for the 2D multimedia, but the Amiga chipset didn't address texture-mapped 3D. Amiga OCS was designed to saturate (use) a 3.5 Mhz (260 ns read/write cycle) 16-bit memory bus.

68K didn't help with texture-mapped 3D since it has inferior math compute power per dollar when compared to the competition.


How did the more expensive competition handle 2D and 3D graphics?

Year | Model | CPU | Graphics | Price
1984 | Macintosh (128K) | 68000@7.8MHz | black-and-white | US$2,495
1984 | IBM PC AT | 80286@6MHz | MDA/CGA/EGA, ISA | US$4,000-$6,000
1985 | Amiga 1000 | 68000@7.2MHz | OCS | US$1,285

Hammer Quote:

Your Motorola claims "17.9 MIPS at 50 MHz". Where's Dhrystones qualification?


Motorola generally backed up their numbers. Authors of news articles could and sometimes would ask for more specific benchmark data including the compiler used and sometimes would publish some of the details. For example, the Diab compiler was used to achieve 68060 DMIPS results. I don't know if individual customers would receive specific data if they contacted Motorola/Freescale. Intel did not respond to at least one request from a publication author to give specifics of benchmark results they used to attack the 68k with a propaganda campaign. The author published the lack of response from Intel.

Hammer Quote:

You asserted "Dhrystones" with Motorola's "17.9 MIPS at 50 MHz" claim. Prove Motorola's "17.9 MIPS at 50 MHz" is the Dhrystone benchmark.


VAX MIPS didn't help a CISC CPU like a RISC CPU so there was no reason to use them. There may be some DMIPS results from before the DMIPS 2.1 standard but they are not common. DMIPS results will vary depending on compiler and memory performance. It is acceptable to choose the best compiler results which may improve over time. Memory performance should be taken from existing hardware. Some of the modern retro 68030 accelerators may have better DMIPS/MHz performance. It is possible that SRAM with no wait states could be used and they are for MCUs which are designed to execute code from a limited amount of SRAM. MCUs can have surprisingly good DMIPS/MHz results because of this, even simple 6502 MCUs.

Hammer Quote:

It's well known that 68030's MUL integer instructions are slow.

https://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/68030it.HTML
68030's instruction cycle times which shows are very slow i.e.
MUL has 28 (word) and 44 (long) cycles.
DIV has 56 and 78 cycles.
No native byte size MUL and DIV operations.

https://www.digchip.com/datasheets/download_datasheet.php?id=59667&part-number=386DX
386DX instruction cycle times
MUL has 20 (byte), 30(word), and 44(doubleword) cycles.
DIV(unsigned) has 17, 25 and 41 cycles.
IDIV(signed) has 22, 30, and 46 cycles.

386 has faster byte size MUL and DIV.


Byte-sized integer MUL and DIV can use a table lookup, which is what most 8/16-bit CPUs used. Not many modern 32/64-bit CPUs have byte-sized MUL and DIV either. A 32x32 MUL was already down to 2-cycle latency on the 68060, and I believe single-cycle latency is possible with a modern chip fab process. Byte-sized MUL and DIV were already minor legacy baggage back in the 1980s, and both byte- and word-sized MUL and DIV are minor legacy baggage today.
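A minimal sketch of the table-lookup approach described above: the full 8-bit x 8-bit product is precomputed into a 64K-entry table, so each "multiply" is just an indexed read, as many 8/16-bit systems did in software.

```python
# 8x8 -> 16-bit multiply via a precomputed 64K-entry lookup table.
MUL8 = [[a * b for b in range(256)] for a in range(256)]

def mul8(a, b):
    # Mask to a byte, then read the product from the table.
    return MUL8[a & 0xFF][b & 0xFF]
```

Smaller tables (e.g. squares-only, using (a+b)² and (a-b)²) were common when 64 KiB was too much.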

Hammer Quote:

Pentium FDIV is completed with 16 cycles and with limited out-of-order processing capability.

https://www.nxp.com/docs/en/data-sheet/MC68060UM.pdf
68060 FDIV has 37 to 38 cycles.


The 6888x FPU has single precision FSGLDIV and FSGLMUL instructions to provide lower latency single precision FDIV and FMUL which were supported in hardware but not optimized in the 68040 and 68060. The fact they were supported in hardware means they had the option to optimize them later but the lack of optimization indicates area was prioritized over floating point performance. There was already not enough latency decrease from an extended precision FMUL to FSGLMUL by the time of the 68060 to be worthwhile. There is also not much difference in latency between an extended precision division and double precision. A single precision FDIV optimization is valuable, especially without a SIMD unit for high speed floating point processing. There are plenty of FPUs that do not optimize single precision FDIV but it makes more sense with an extended precision FPU.

Hammer Quote:

I picked on MUL and DIV instructions for a reason. The RISC or DSP competition has faster MUL instructions.


Most early RISC CPUs did not have hardware MUL and DIV including ARM and MIPS.

Hammer Quote:

http://www.bitsavers.org/components/motorola/68000/CPU32_Reference_Manual_Aug90.pdf
CPU32's MUL (26 for word, 52 for long) and DIV cycle times are 68030-like.

There's a product segmentation for fast MUL via Motorola DSP SKUs. If you want a fast ADD and MUL, buy a $108 68EC040 CPU.


The XCF5102, an early 68040/ColdFire hybrid, had an improved 68040 pipeline with 2 kiB instruction and 1 kiB data caches. This is almost what was needed to improve performance and lower cost, but the XCF5102 was only 68000- and ColdFire-compatible. Motorola nearly hit the target but missed the market at times.

http://marc.retronik.fr/motorola/68K/68000/High-Performance_Internal_Product_Portfolio_Overview_with_Mask_Revision_[MOTOROLA_1995_112p].pdf

Last edited by matthey on 16-Jun-2024 at 11:24 PM.

Hammer 
Re: One major reason why Motorola and 68k failed...
Posted on 17-Jun-2024 3:08:12
#205
Elite Member
Joined: 9-Mar-2003
Posts: 5859
From: Australia

@matthey

Quote:
I expect Photoshop has more SIMD code and more optimized SIMD code than average.

You assumed.

Blender 2.9 used Intel Embree for Cycles CPU ray tracing and Intel OpenImageDenoise for interactive denoising in the 3D viewport and in final renders, on Intel/AMD CPUs with at least SSE 4.1.

https://developer.blender.org/docs/release_notes/2.90/cycles/
Quote:

Intel Embree is now used for ray tracing on the CPU. This significantly improves performance in scenes with motion blur. Other scenes with high geometric complexity also benefit on average, depending on the scene contents.


Performance-critical sections are being optimized toward the CPU's vector extensions.

Quote:

How did the more expensive competition handle 2D and 3D graphics?

Where's the 1986-era Pixar Image Computer? Pixar Image Computer has an array of AMD 21116 bit-slice processors in a SIMD configuration.

From 1990 to 1994, the Archimedes didn't have an extensive 2.5D and texture-mapped 3D game library compared to the gaming PC platform.

The PC's 1990 brought both the Windows 3.0 and Wing Commander releases.

Michael Abrash published optimized Mode X techniques in July 1991 for public consumption, which helped the gaming PC platform. Where's the Amiga's published, optimized C2P (chunky-to-planar) work in the 1993 to 1996 timeframe?
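For readers unfamiliar with C2P, a naive (deliberately unoptimized) sketch of the conversion in question: chunky pixels hold a whole color index per pixel, while Amiga bitplanes store bit n of every pixel's index together.

```python
# Naive chunky-to-planar conversion. Real C2P routines use clever
# word-level merging tricks; this is only the reference behavior.
def chunky_to_planar(pixels, depth):
    """pixels: list of color indices; returns `depth` bytearrays (bitplanes)."""
    assert len(pixels) % 8 == 0
    planes = [bytearray(len(pixels) // 8) for _ in range(depth)]
    for i, pix in enumerate(pixels):
        byte, bit = divmod(i, 8)          # which plane byte, which bit
        for p in range(depth):
            if (pix >> p) & 1:            # bit p of the color index
                planes[p][byte] |= 0x80 >> bit  # leftmost pixel = MSB
    return planes
```

The per-pixel, per-plane loop is exactly the cost the optimized C2P routines were invented to avoid.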

Software sells hardware. Providing sufficient hardware is part of the solution; selling "dreams" is another.

Sony provided sufficient PS1 hardware with a good SDK, good first-party games, and timed-exclusive third-party games.

For the CD32's FMV module, Commodore had "RISC power" with the $50 CL450 SoC's MIPS-X CPU, in a limited use case.

Quote:

Year | PC Model | CPU | Graphics | Price]
1984 Macintosh (128K) 68000@7.8MHz black-and-white US$2,495

Your argument predates the 68K being pushed out of the Unix workstation market.

The Macintosh 512K had "next-generation" GUI business software such as MS Excel, MS Word, and Aldus PageMaker.

Quote:

1984 PC AT 80286@6MHz MDA|CGA|EGA/ISA US$4000-$6000

IBM wasn't the entity that kept the PC platform competitive.

Quote:

1985 Amiga 1000 68000@7.2MHz OCS US$1,285

The A1000 was a sales failure. Commodore didn't spend its US$50 million advertising budget on business-software deals. Most people have day jobs.

Steve Jobs arranged agreements with business-software providers, which is different from Commodore's A1000 marketing approach.

Quote:

Motorola generally backed up their numbers. Authors of news articles could and sometimes would ask for more specific benchmark data including the compiler used and sometimes would publish some of the details. For example, the Diab compiler was used to achieve 68060 DMIPS results. I don't know if individual customers would receive specific data if they contacted Motorola/Freescale. Intel did not respond to at least one request from a publication author to give specifics of benchmark results they used to attack the 68k with a propaganda campaign. The author published the lack of response from Intel.

So you have nothing to back Motorola's 17 MIPS claim for the 68030. You asserted that Motorola's 17 MIPS claim was a Dhrystone figure.

Quote:

Byte sized integer MUL and DIV can use a table lookup which is what most 8/16 bit CPUs used. Not many modern 32/64 bit CPUs have byte sized MUL and DIV either.

Wrong. "AI" has matrix-math INT8 and FP8 with 32-bit results.

Reminder: x86-64 still has native byte-size math, and expanded into SIMD matrix INT8 (byte) with 32-bit results for AI. I already told you about the CELL SPU and SSSE3 SIMD with packed byte-size math.

Quote:

A 32x32 MUL was already down to 2 cycle latency for the 68060

Reminder: the 68060 removed the 32x32=64 MUL and the corresponding 64-bit DIV.

A 32-bit x 32-bit product can exceed 32-bit results. The P5 Pentium didn't delete its 32x32=64 instruction.


Quote:

and I believe a single cycle latency is possible using a modern chip fab process.

That's speculative.

Quote:

Byte sized MUL and DIV were already minor legacy baggage back in the 1980s and both byte and word sized MUL and DIV are minor legacy baggage today.

You're in dreamland. x86-64 still has native byte-size math and expanded into SIMD matrix INT8 (byte) with 32-bit results for AI. I already told you about the CELL SPU and SSSE3 SIMD with packed byte-size math.

Quote:

The 6888x FPU has single precision FSGLDIV and FSGLMUL instructions to provide lower latency single precision FDIV and FMUL which were supported in hardware but not optimized in the 68040 and 68060.

Define "lower latency". I want numbers, not fluff.

Prove that a 68882 @ 50 MHz can deliver 25 MFLOPS on FP32 math operations.

Quote:

Most early RISC CPUs did not have hardware MUL and DIV including ARM and MIPS.

https://wiki.preterhuman.net/MIPS_architecture
Quote:

In 1984 Hennessy was convinced of the future commercial potential of the design, and left Stanford to form MIPS Computer Systems. They released their first design, the R2000, in 1985, improving the design as the R3000 in 1988. These 32-bit CPUs formed the basis of their company through the 1980s, used primarily in Silicon Graphics series of workstations. These commercial designs deviated from the Stanford academic research by implementing most of the interlocks in hardware, supplying full multiply and divide instructions (among others).

The commercial MIPS designs from MIPS Computer Systems deviated from the Stanford academic research MIPS.

The PS1 has an R3000 CPU from 1988.

For SuperH RISC CPU family,
https://antime.kapsi.fi/sega/files/h12p0.pdf
Quote:

A built-in multiplier can execute multiplication and addition as
quickly as DSP.

SuperH-2 has very fast MUL instructions, for example

16 x 16 = 32, execute in 1 to 3 cycles
32 x 32 = 32, execute in 2 to 4 cycles
32 x 32 = 64, execute in 2 to 4 cycles. 68060 is missing this feature.

32 x 32 + 64 = 64, execute in 2 to 4 cycles range.

SuperH-2 is "cheap RISC" i.e. cheaper than $108 68EC040.

For MUL, SuperH-2 is faster than 68030 @ 50 MHz.

Motorola wanted its customers to buy a 68030 (fast ADD) and a separate DSP (fast MUL), or the useless $108 68EC040, nickel-and-diming the fast basic instructions through product segmentation. Freescale had the same mentality in selling a "nickel and dime" PowerPC e500v2 CPU with a non-standard PowerPC FPU. Motorola also treated the MMU as a "nickel and dime" feature.

Staying with Motorola/Freescale is not optimal for desktop computers, gaming, and smart handheld devices.

During the ARM60 era, ARMv3M was the 1st ARM CPU with MUL instruction. ARMv3 and ARM60 don't guarantee MUL instruction.

Last edited by Hammer on 17-Jun-2024 at 05:02 AM.
Last edited by Hammer on 17-Jun-2024 at 03:39 AM.
Last edited by Hammer on 17-Jun-2024 at 03:29 AM.
Last edited by Hammer on 17-Jun-2024 at 03:26 AM.
Last edited by Hammer on 17-Jun-2024 at 03:13 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

 Status: Offline
Profile     Report this post  
matthey 
Re: One major reason why Motorola and 68k failed...
Posted on 17-Jun-2024 20:05:08
#206 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2270
From: Kansas

Hammer Quote:

You assumed.

Blender 2.9 used Intel Embree for CPU ray trace cycles and Intel OpenImageDenoise for interactive denoising in the 3D viewport and for final renders via Intel/AMD CPU with at least SSE 4.1.

https://developer.blender.org/docs/release_notes/2.90/cycles/
Quote:

Intel Embree is now used for ray tracing on the CPU. This significantly improves performance in scenes with motion blur. Other scenes with high geometric complexity also benefit on average, depending on the scene contents.


Performance-critical sections are being optimized toward the CPU's vector extensions.


Photoshop SIMD optimizations could be as good as Embree's by using preprocessor conditionals. Not having to support older SIMD ISAs and optimizing for a more standard "Intel" target may simplify and reduce the optimization work. A better comparison for Embree would be with Vulkan, which supports a wider variety of hardware but appears to be highly optimized. The original Photoshop selection was about comparing x86 and x86-64 code density and instruction size increases, mostly due to prefixes and large SIMD instructions, though. It would be interesting to compare early x86-64 SIMD code vs newer x86-64 SIMD code too. The SIMD instruction size likely keeps growing, requiring x86-64 hardware to be higher end and more expensive.

ARM AArch64 introduced a lesser SIMD standard that is also very standardized, but it did not scale well to low end hardware on introduction and grows slowly, leaving lower end markets to use less standardized Thumb Cortex-R and RISC-V hardware. This is where a more standardized 68k ISA with good code density and a lower spec than AArch64 and x86-64 could be competitive. The key is to do more with less rather than joining the bloat race against AArch64, x86-64 and POWER. That doesn't mean low performance either, as x86 showed the CISC advantage of being able to lead in performance with fewer registers than RISC ISAs.

Hammer Quote:

Where's the 1986-era Pixar Image Computer? Pixar Image Computer has an array of AMD 21116 bit-slice processors in a SIMD configuration.


Funny. A computer with a price that was more than a house and required a workstation to control it was competition for the 68000 Amiga 1000 with a price that was lower than a car?

Hammer Quote:

From 1990 to 1994, Archie didn't have an extensive 2.5D and texture-mapped 3D game library when compared to a gaming PC platform.

The PC in 1990 had both the Windows 3.0 and Wing Commander releases.

Michael Abrash published optimized Mode X R&D in July 1991 for public consumption which helped the gaming PC platform. Where's Amiga's published optimized C2P R&D in the 1993 to 1996 timeframe?


This is a thread about the 68k and obviously you don't limit it to PCs, so let's talk about the successes.

https://en.wikipedia.org/wiki/NeXT Quote:

Many successful applications have lineage from NeXT, including the first web browser and the video games Doom and Quake.


The 68k based NeXT didn't have any problem with chunky. Porting and optimizing for more limited and inferior x86 hardware was likely more difficult than programming on a 68040 NeXT. The 68040 had the performance necessary for Doom and Quake as well. The 68k Amiga should have moved in the direction of integrating NeXT like features into the chipset but it moved in the direction of an antiquated C64 replacement instead.

Hammer Quote:

Software sells hardware. Providing sufficient hardware is part of the solution, selling "dreams" is another.

Sony provided sufficient PS1 hardware with good SDK, good 1st party games, and timed exclusive 3rd party games.

For CD32's FMV module, Commodore had "RISC power" with $50 CL450 SoC's MIPS-X CPU in a limited use case.


Jay Miner tried to bring an affordable 68000 Amiga for the masses to the market instead of an expensive NeXT workstation. If successful at creating a large enough user base to attract software developers, he knew further upgrades and integration could bring NeXT like features to an affordable Amiga but CBM did not follow his vision. It was more difficult for NeXT and other 68k workstation producers to integrate and reduce prices without economies of scale. Affordable hardware is still a good way to increase the user base to gain software developers and improve economies of scale but emulation does not provide competitive value.

Hammer Quote:

Your argument is before 68K is pushed out of Unix workstation markets.

Macintosh 512K has "next-generation" GUI business software such as MS Excel, MS Word, and Aldus PageMaker.


Most business computers did not have advanced graphics back then. The black and white Mac was radical for using bitmapped graphics instead of a monochrome text based CLI. The Mac was not a pure business computer either.

Hammer Quote:

IBM wasn't the entity that kept the PC platform competitive.


In 1984, IBM was still a market leader setting PC standards like MDA, CGA and EGA on an ISA bus.

Hammer Quote:

A1000 is a sales failure. Commodore didn't spend their USD $50 million advertisement budget on business software deals. Most people have day jobs.

Steve Jobs arranged agreements with business software providers, which is different from Commodore's A1000 marketing approach.


As I recall, the 68k Mac launch was slow and software support was lacking at first too. Apple support and marketing were much better than what CBM delivered.

Hammer Quote:

So, you have nothing to back Motorola's 68030's 17 MIPS claim. You asserted Motorola's 68030's 17 MIPS claim to be Dhrystone.


Most Motorola literature uses MIPS but it is DMIPS. It wouldn't be anything else considering the values and what we generally know to be actual benchmark numbers.

Hammer Quote:

Wrong. "AI" has matrix math INT8 and FP8 with 32-bit results.

Reminder, X86-64 still has native byte size math and expanded into SIMD's matrix INT8 (byte) with 32-bit results for AI. I already told you about CELL SPU and SSSE3 SIMD with packed math byte size.


I was talking about the integer CPU lacking integer byte sized MUL and DIV instructions, not the SIMD unit. Datatypes in SIMD registers generally retain the same size across different operations. MUL and DIV may change the datatype size, which is necessary to keep from losing data. For example, an 8x8=16 bit multiply is necessary to avoid the overflow that is possible with 8x8=8 bits. This means that 8x8=8 bits is very limited.

moveq #16,d0
mulu.b d0,d0 ; 16x16=256 but unsigned d0.b range is 0-255 so we already have overflow

The 68k does not have MULx.B because it is so limited. The CPU integer registers can be extended before using a 16x16=32 bit or 32x32=32 bit multiply. In fact, modern compilers often keep datatypes less than the register size as already signed or unsigned extended data in registers since modern CPUs can benefit more often from forwarding/bypassing results and avoid partial register stalls. ColdFire went all the way to eliminating most byte and word operations which are often replaced with longword operations of extended datatypes with the help of some new extension instructions like MVS and MVZ.
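A minimal Python sketch of the overflow point above (the helper names are hypothetical, not real CPU intrinsics): truncating an 8x8 multiply to 8 bits loses the product as soon as it exceeds 255, while widening to 16 bits preserves it.

```python
def mul_8x8_8(a, b):
    """Hypothetical 8x8=8 bit multiply: result truncated to 8 bits."""
    return (a * b) & 0xFF

def mul_8x8_16(a, b):
    """Hypothetical widening 8x8=16 bit multiply: full product kept."""
    return (a * b) & 0xFFFF

# Same operands as the MULU.B example above: 16*16 = 256.
print(mul_8x8_8(16, 16))   # 0, the product overflowed the byte
print(mul_8x8_16(16, 16))  # 256, no data lost
```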

Hammer Quote:

Reminder, 68060 removed 32x32=64 MUL and DIV version.

32-bit x 32-bit would exceed 32-bit results. P5 Pentium didn't delete the 32x32=64 instruction.


I believe removing the 32x32=64 bit multiply was the biggest design mistake of the 68060. A 32 bit datatype is large enough that 32x32=32 is useful and more common, but 32x32=64 is an important building block for partial register multiplies in higher precision arithmetic, and it was already being used by GCC as an optimization to replace division by a constant with a multiply by a magic number adjusted with shifts and adds.
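As a sketch of that GCC optimization (shown in Python rather than 68k code; the example divisor 10 and its standard magic constant are illustrative choices): the high half of a 32x32=64 bit multiply plus a small shift replaces the much slower divide.

```python
MAGIC = 0xCCCCCCCD  # roughly 2**35 / 10, rounded up

def udiv10(n):
    """Unsigned 32-bit n // 10 without a divide instruction."""
    assert 0 <= n <= 0xFFFFFFFF
    hi = (n * MAGIC) >> 32  # high half of the 32x32=64 multiply
    return hi >> 3          # adjust with a shift

# Spot checks against real division.
for n in (0, 9, 10, 99, 123456789, 0xFFFFFFFF):
    assert udiv10(n) == n // 10
```

Without 32x32=64, the high half of the product is simply unavailable, so this trick needs to be rebuilt from several narrower multiplies.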

Hammer Quote:

You're in dreamland. X86-64 still has native byte size math and expanded into SIMD's matrix INT8 (byte) with 32-bit results for AI. I already told you about CELL SPU and SSSE3 SIMD with packed math byte size.


A SIMD 8x8=8 bit multiply is limiting, reducing the usefulness of an integer byte datatype when multiply is needed. For the integer CPU, 8x8=8 is baggage while 8x8=16 has limited usefulness because the latency is the same as 16x16=32 bits.

Hammer Quote:

Define "lower latency". I want numbers, not fluff.


Lower latency means fewer cycles are required to execute an instruction from start to finish.

Instruction | 6888x cycles | 68060 cycles | ColdFireV4e cycles
FMUL        |           71 |            3 |                  4
FSGLMUL     |           59 |            3 |                 no
FDIV        |          103 |           37 |                 23
FSGLDIV     |           69 |           37 |                 no

The table above shows FPU register only latencies in cycles. The 68k FPU FMUL and FDIV are extended precision while the ColdFire is double precision likely saving a few cycles for FDIV but none for FMUL. The FSGLMUL and FSGLDIV are single precision and save cycles for the 6888x while the 68060 lacks an optimization for this precision. ColdFire lacks these instructions altogether. I did not include the 68040 because the instruction timing chart is difficult to understand and is missing the FSGLMUL and FSGLDIV instructions. I believe they are supported in hardware but have no optimization for single precision.
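These latencies can be turned into rough throughput numbers, assuming one FPU operation in flight at a time (no pipelining), which is how the 6888x behaves: peak MFLOPS is roughly the clock in MHz divided by cycles per operation. By this estimate a 50 MHz 6888x peaks well under 1 MFLOPS for FMUL, nowhere near 25. This is a back-of-the-envelope sketch, not a benchmark.

```python
def peak_mflops(clock_mhz, cycles_per_op):
    """Peak ops per microsecond for a non-pipelined unit:
    one operation completes every cycles_per_op cycles."""
    return clock_mhz / cycles_per_op

print(peak_mflops(50, 71))  # 6888x FMUL @ 50 MHz: ~0.7 MFLOPS
print(peak_mflops(50, 59))  # 6888x FSGLMUL: ~0.85 MFLOPS
print(peak_mflops(50, 3))   # 68060 FMUL: ~16.7 MFLOPS
```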

Hammer Quote:

In 1984 Hennessy was convinced of the future commercial potential of the design, and left Stanford to form MIPS Computer Systems. They released their first design, the R2000, in 1985, improving the design as the R3000 in 1988. These 32-bit CPUs formed the basis of their company through the 1980s, used primarily in Silicon Graphics series of workstations. These commercial designs deviated from the Stanford academic research by implementing most of the interlocks in hardware, supplying full multiply and divide instructions (among others).


MIPS stands for Microprocessor without Interlocked Pipeline Stages, which means no forwarding/bypassing of results. If this came early in the MIPS history, it would have been less embarrassing to change the name than to add what the name implies is missing. It only took a few years to realize simple was bad for performance and to begin abandoning RISC principles. Most RISC architectures as introduced starting in about 1985 did not have hardware integer MUL and DIV instructions, as I stated earlier.

Hammer Quote:

SuperH-2 has very fast MUL instructions, for example

16 x 16 = 32, execute in 1 to 3 cycles
32 x 32 = 32, execute in 2 to 4 cycles
32 x 32 = 64, execute in 2 to 4 cycles. 68060 is missing this feature.

32 x 32 + 64 = 64, execute in 2 to 4 cycles.

SuperH-2 is "cheap RISC" i.e. cheaper than $108 68EC040.

For MUL, SuperH-2 is faster than 68030 @ 50 MHz.


SuperH was good for low end embedded CPUs and avoided some of the early RISC mistakes. The problem is a 16 bit fixed length encoding that limits scaling it up. Even though it is microcoded RISC, the cores were smaller than 68k cores allowing for more pipelining and caches at the time of limited silicon space. The 68k ISA has nicer features, fewer limitations and can scale up.

Hammer Quote:

Motorola wanted its customers to buy a 68030 (fast ADD) and a separate DSP (fast MUL), or the useless $108 68EC040, nickel-and-diming the fast basic instructions through product segmentation. Freescale had the same mentality in selling a "nickel and dime" PowerPC e500v2 CPU with a non-standard PowerPC FPU. Motorola also treated the MMU as a "nickel and dime" feature.


MIPS did not originally have hardware MUL and DIV instructions, the MMU on a single chip with the CPU or the FPU on a single chip with the CPU. ARM originally had a weird MMU, no hardware MUL and DIV instructions and no FPU. MPUs were primitive in those days and the 68k generally had nice features at the cost of some performance. As transistor budgets increased, performance improved as microcoding decreased but 68k development stopped just when the 68k was getting good.

Hammer Quote:

Staying with Motorola/Freescale is not optimal for desktop computers, gaming, and smart handheld devices.

During the ARM60 era, ARMv3M was the 1st ARM CPU with MUL instruction. ARMv3 and ARM60 don't guarantee MUL instruction.


Motorola/Freescale replaced the 68k with PPC for political reasons. It wasn't practical to even use FPGA 68k CPUs back then and it was much more expensive to make ASICs for anyone wanting to continue with the 68k. Licensing the 68k and adding to a custom Amiga SoC was considered by CBM but they were too little too late.

Last edited by matthey on 17-Jun-2024 at 10:55 PM.
Last edited by matthey on 17-Jun-2024 at 08:05 PM.

 Status: Offline
Profile     Report this post  
kolla 
Re: One major reason why Motorola and 68k failed...
Posted on 18-Jun-2024 1:50:30
#207 ]
Elite Member
Joined: 21-Aug-2003
Posts: 3187
From: Trondheim, Norway

@matthey

Quote:
The 68040 had the performance necessary for Doom and Quake as well.


You need to check out what exactly the NeXT was used _for_ with these games?

Just because a piece of software is (partly) developed _on_ a certain system, doesn’t imply it’s developed _for_ that same system.

https://youtu.be/b74vNajqCcA?si=8ZS5N2A37Zt8BHYx

And for comparisons…
https://youtu.be/jJwnmq-WGVA?si=9VLWXgAPvjoIy9bK

Quote:
The 68k Amiga should have moved in the direction of integrating NeXT like features into the chipset


What “NeXT like features”? You clearly have never actually used one. Fixed oddball non-standard resolutions that require special monitors and cabling, mostly made to satisfy the DTP market? Paying tons extra for colours instead of 2-bit gray scale? Display PostScript? Notice how Doom on NeXT is always in a tiny window and never full screen? Are these the “NeXT like features” you think of?

Last edited by kolla on 18-Jun-2024 at 02:48 AM.
Last edited by kolla on 18-Jun-2024 at 02:46 AM.
Last edited by kolla on 18-Jun-2024 at 02:44 AM.
Last edited by kolla on 18-Jun-2024 at 02:37 AM.
Last edited by kolla on 18-Jun-2024 at 01:51 AM.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

 Status: Offline
Profile     Report this post  
Hammer 
Re: One major reason why Motorola and 68k failed...
Posted on 18-Jun-2024 6:37:00
#208 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5859
From: Australia

@matthey

Quote:
Photoshop SIMD optimizations could be as good as for Embree by using preprocessor conditional commands.

That's speculative. Photoshop's modules may have performance-critical code.

Quote:

Not having to support older SIMD ISAs and optimizing for a more standard "Intel" target may simplify and reduce the work to optimize. A better comparison for Embree would be with Vulkan which supports a wider variety of hardware but appears to be highly optimized.

Vulkan requires staging setup time and it's not good for many small transactions i.e. PCIe I/O latency issues. Lower-level Vulkan API wouldn't beat the CPU's SIMD extension in low-latency transactions.

For modern PC games with real-time raytracing, the CPU maintains a BVH tree (with geometry boxes and dot geometry) to feed the BVH hardware-accelerated GPU.

Vulkan doesn't remove the CPU's BVH tree maintenance. GPU renders the CPU's viewport.

Quote:

Funny. A computer with a price that was more than a house and required a workstation to control it was competition for the 68000 Amiga 1000 with a price that was lower than a car?

Other companies adopted SIMD e.g. Intel i860 with MMX-like extensions which later influenced Pentium MMX and AMD's 3DNow.

Who is delivering "workstation compute" and "workstation graphics" for the masses?


Quote:
This is a thread about the 68k and obviously you don't limit to PCs so lets talk about the successes.

This thread is about Motorola's and 68K's failures, hence competitors will be involved.

Quote:

The 68k based NeXT didn't have any problem with chunky. Porting and optimizing for more limited and inferior x86 hardware was likely more difficult than programming on a 68040 NeXT. The 68040 had the performance necessary for Doom and Quake as well.

Quake on 68040. LOL.

Quake's 3D optimization was mostly done by Michael Abrash, i.e. the VGA Mode X optimization evangelist for the gaming PC platform.

Quote:

The 68k Amiga should have moved in the direction of integrating NeXT like features into the chipset but it moved in the direction of an antiquated C64 replacement instead.

There's a question of price with that particular NeXT experience.

There was no Michael Abrash-like (VGA Mode X optimization) advocate publishing optimized C2P for the Amiga platform in the 1991 to 1995 time period. I can name names who didn't open-source their recent ports.


Quote:

Jay Miner tried to bring an affordable 68000 Amiga for the masses to the market instead of an expensive NeXT workstation. If successful at creating a large enough user base to attract software developers, he knew further upgrades and integration could bring NeXT like features to an affordable Amiga but CBM did not follow his vision. It was more difficult for NeXT and other 68k workstation producers to integrate and reduce prices without economies of scale. Affordable hardware is still a good way to increase the user base to gain software developers and improve economies of scale but emulation does not provide competitive value.

Commodore management didn't believe in delivering "workstation compute" and "workstation graphics" for the masses.

Jay Miner's initial Amiga OCS success wasn't repeated. With Amiga OCS, there was single leadership under Jay Miner.

Amiga OCS's 4096 color palette (12-bit) use case wasn't original.

According to Commodore: The Final Years by Brian Bagnall, the post-OCS road map was directionless confusion. Amiga's low-risk ECS Denise was mostly completed near the end of 1987.

Amiga could have had ECS Denise and ECS Agnus in 1988, helping the A500/A2000's business use case. There was confusion with the post-OCS/ECS road map which delayed Amiga's 256-color display capability, i.e. "To be, or not to be, that is the question" - William Shakespeare.

MS released GUI Excel for Windows 2.x in 1988 and GUI Word in 1989 which partly dislodged text-based Lotus 123/Word Perfect incumbents.

The years from 1987 to 1989 were critical times for attracting business customers from text-based Lotus 123/Word Perfect incumbents.

Quote:

Most business computers did not have advanced graphics back then. The black and white Mac was radical for using bitmapped graphics instead of a monochrome text based CLI. The Mac was not a pure business computer either.

Mac has "next-gen" GUI MS Excel (1985), MS Word (1985), Aldus PageMaker (1985) and QuarkXPress (1987).

Apple obtained a business customer base sizable enough to sustain 1.2 million buyers of higher-priced PowerMacs during 1994.

From the Mac experience, Microsoft and Aldus focused on Windows 2.x GUI ports of their Mac applications Excel, Word, and PageMaker. Like Apple, Microsoft attracted a large enough business customer base for Windows 2.x's business applications and set the stage for the Windows 3.0 wave in 1990.

Apple's Macintosh II's 1987 release established the 256-color QuickDraw RTG road map for the Mac platform.

Quote:

In 1984, IBM was still a market leader setting PC standards like MDA, CGA and EGA on an ISA bus.

That's not an argument. IBM still established the VGA standard, and it was quickly cloned (e.g. ET3000AX) in the same year, 1987.

1986's MCGA has mode 13h which is carried into VGA's mode 13h.

VGA monitor timings and port are based on IBM PGC's monitor timings and port. A small tweak on PGC's monitor timings enables the 1984 monitor to display VGA.

Quote:

As I recall, the 68k Mac launch was slow and software support lacking at first too. Apple support and marketing was much better than CBM failed to deliver.

For business software and stable high-resolution display, Mac was ahead of the A1000 and 1987-era A500/A2000.

ECS can do stable 800x300p (240,000 pixels) with 16 colors without the slow double scan https://aminet.net/package/driver/moni/HighGFXnmore

1987 VGA has 640x480p (308,160 pixels) with 16 colors. Windows 2.x supported Video 7's SVGA and IBM 8514 modes.

For AGA, it would need 140 ns read/write cycle FP DRAM with a 32-bit bus, which is 4X the memory bandwidth of OCS.

Where's 1985 GUI MS Excel for A1000?

Quote:

Most Motorola literature uses MIPS but it is DMIPS. It wouldn't be anything else considering the values and what we generally know to be actual benchmark numbers.

Again, prove Motorola's 17 MIPS claim for the 68030 is Dhrystone. Furthermore, later Motorola literature changed the MIPS numbers, e.g. 9 MIPS for the 40 MHz 68EC030.

Quote:

I was talking about the integer CPU and not the SIMD unit lacking integer byte sized MUL and DIV instructions

The same SSE unit can run in scalar mode. LOL

Pixel processing is very friendly with parallelism which is useful for games.

Quote:

. Datatypes in SIMD registers generally retain the same size with different operations. MUL and DIV may change the datatype size which is necessary to keep from losing data. For example, an 8x8=16 bits is necessary to avoid overflow that is possible with 8x8=8 bits. This means that 8x8=8 bits is very limited.

moveq #16,d0
mulu.b d0,d0 ; 16x16=256 but unsigned d0.b range is 0-255 so we already have overflow

The lack of a packed-math 16x8-bit SIMD128 instruction is a major problem for a PS3 emulator since this is a significant CELL SPU instruction.


Quote:

The 68k does not have MULx.B because it is so limited. The CPU integer registers can be extended before using a 16x16=32 bit or 32x32=32 bit multiply.

68060 has removed 32x32=64 bit multiply.

Both AC68080 and Emu68 have restored the 32x32=64 bit multiply instruction, which undoes Motorola's stupidity.

Quote:

In fact, modern compilers often keep datatypes less than the register size as already signed or unsigned extended data in registers since modern CPUs can benefit more often from forwarding/bypassing results and avoid partial register stalls. ColdFire went all the way to eliminating most byte and word operations which are often replaced with longword operations of extended datatypes with the help of some new extension instructions like MVS and MVZ.

Motorola/Freescale's instruction set kitbashing is part of its DNA.


Quote:

Lower latency means fewer cycles are required to execute an instruction from start to finish.

Instruction | 6888x cycles | 68060 cycles | ColdFireV4e cycles
FMUL        |           71 |            3 |                  4
FSGLMUL     |           59 |            3 |                 no
FDIV        |          103 |           37 |                 23
FSGLDIV     |           69 |           37 |                 no

That's too late since ColdFireV4e was released in the year 2000. Do you want companies like Sony to pause their business until ColdFireV4e arrived in 2000?

A fully pipelined FPU can hide latency.

There's a reason why Motorola has exited the semiconductor business and Freescale was purchased by NXP.

Active game platform companies don't run their road map schedules with Motorola's or Freescale's.

The original Xbox development selected AMD's K7 Duron until Bill Gates overrode it in favor of the Pentium III Coppermine with 128 KB L2 cache. The original Xbox development started in 1998 codenamed "Midway" and in 1999 DirectX 8.1 was also in development.

For NVIDIA, there's an overlapping Y1999 R&D (GeForce's DirectX8) and Y1999 product release (e.g. GeForce 256, DirectX6.1/DirectX7) roadmap.

For the Y1998 to Y2000 time frame, the embedded gaming CPU competition against ColdFireV4e was AMD's K7 Duron and Intel's Pentium III Coppermine with 128 KB L2 cache.


Quote:

MIPS stands for Microprocessor without Interlocked Pipelined Stages which means no forwarding/bypassing of results. If this came early in the MIPS history, it would have been less embarrassing to change the name than to add what the name implies is missing.

That's a red herring.

Quote:

It only took a few years to realize simple was bad for performance and to begin abandoning RISC principals. Most RISC architectures as introduced starting in about 1985 did not have hardware integer MUL and DIV instructions as I stated earlier.

That's a red herring. "Cheapo RISC" wasn't a major factor in 1985.

1988 release of R3000 powered the later PS1.

Quote:

Motorola/Freescale replaced the 68k with PPC for political reasons.

1. Motorola had its own RISC road map with the MC88000 and was working with Apple.

2. IBM invited Apple into PowerPC and Apple included Motorola in PowerPC. PowerPC 601 bus recycled MC88110 bus interface. For Motorola, PowerPC displaced the MC88000 RISC project.

Most big-endian 68K Unix workstations exited Motorola's orbit with big-endian RISC projects.

MIPS Inc. spotted Motorola's weakness and offered a strong MUL-equipped MIPS R2000 CPU.

Motorola wasn't paranoid enough to respond with a 68030M, i.e. a MUL-strong 68030 variant.

My proposed 68030M would have a strong math ADD and MUL reg,reg-operands-only instruction subset, which differs from the 68040's mostly strong instruction set. I would be a bit more paranoid if I ran Motorola. The 68030M would specifically target game platforms with "cheapo RISC" requirements, i.e. blocking the SuperH-2, R2000, and R3000.

Quote:

It wasn't practical to even use FPGA 68k CPUs back then and it was much more expensive to make ASICs for anyone wanting to continue with the 68k.

The 68000 licensees didn't develop strong CISC R&D, i.e. none of the 68000 licensees developed their business the way AMD did.

Not including Atmel TS68020MR1B, I don't recall 68020/68030 clones from the usual 68000 licensees.

Hitachi and HP had a long-term alliance since 1989 that included joint PA-RISC development.
Hitachi was busy with both PA-RISC and their own SuperH CPU family.

NEC was in both MIPS and PA-RISC camps and NEC's X86 license adventure ended with 286's MMU release.

Quote:

Licensing the 68k and adding to a custom Amiga SoC was considered by CBM but they were too little too late.

Atmel had a 68020 license.

Last edited by Hammer on 18-Jun-2024 at 07:56 AM.
Last edited by Hammer on 18-Jun-2024 at 07:52 AM.
Last edited by Hammer on 18-Jun-2024 at 07:42 AM.
Last edited by Hammer on 18-Jun-2024 at 07:16 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

 Status: Offline
Profile     Report this post  
kolla 
Re: One major reason why Motorola and 68k failed...
Posted on 18-Jun-2024 6:53:39
#209 ]
Elite Member
Joined: 21-Aug-2003
Posts: 3187
From: Trondheim, Norway

Inspired by all the repetitive nonsense here, I spent quite a few hours this morning playing Doom (ADoom 1.4) on an A1000 with PiStorm. Yes, OCS with 512kB chipram and glorious EHB mode.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

 Status: Offline
Profile     Report this post  
Hammer 
Re: One major reason why Motorola and 68k failed...
Posted on 18-Jun-2024 7:22:54
#210 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5859
From: Australia

@kolla

Quote:

kolla wrote:
Inspired by all the repetitive nonsense here, I spent quite a few hours this morning playing Doom (ADoom 1.4) on an A1000 with PiStorm. Yes, OCS with 512kB chipram and glorious EHB mode.

Emu68 has its 1st Warp3D VideoCore4 hardware acceleration and I played Quake II yesterday.

Warp3D VideoCore4 library is missing a few OpenGL extensions.

My ECS Denise arrived early this morning and I installed it in my A500 Rev6A. ECS Denise is needed for AHI Paula 44.1/48 kHz mode with PiStorm-Emu68. I plan to transfer my A500 Rev6A-PiStorm to my brother.

My other A500 Rev5 is gaining missing chips i.e. it needs Gary and two CIA chips to be fully functional. I have my second RPi 3A+ for the second PiStorm.

Last edited by Hammer on 18-Jun-2024 at 07:32 AM.
Last edited by Hammer on 18-Jun-2024 at 07:26 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

 Status: Offline
Profile     Report this post  
ppcamiga1 
Re: One major reason why Motorola and 68k failed...
Posted on 18-Jun-2024 15:31:39
#211 ]
Cult Member
Joined: 23-Aug-2015
Posts: 858
From: Unknown

@Hammer

I also play Quake II on my rpi. And Doom.
Drop the emulator and use the native version on Debian.

 Status: Offline
Profile     Report this post  
pixie 
Re: One major reason why Motorola and 68k failed...
Posted on 18-Jun-2024 18:21:05
#212 ]
Elite Member
Joined: 10-Mar-2003
Posts: 3287
From: Figueira da Foz - Portugal

@ppcamiga1

why would you run quake ii on rpi? Don't you have a pc?

_________________
Indigo 3D Lounge, my second home.
The Illusion of Choice | Am*ga

 Status: Offline
Profile     Report this post  
matthey 
Re: One major reason why Motorola and 68k failed...
Posted on 18-Jun-2024 19:57:10
#213 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2270
From: Kansas

kolla Quote:

You need to check out what exactly the NeXT was used _for_ with these games?

Just because a piece of software is (partly) developed _on_ a certain system, doesn’t imply it’s developed _for_ that same system.


We don't have all the info so we can only piece together what we know and expect.

The NeXTstation market (~50,000 units sold) was likely too small for a profitable Doom and Quake release. The 68040 NeXTstation Color user base would have been even smaller and maybe even more like the size of the PPC Amiga1 user base today.

https://www.forbes.com/sites/quora/2016/09/01/why-john-carmack-chose-next-for-developing-doom-and-other-favorites/ Quote:

I have no regrets at all about developing Doom on a NeXT!

I bought our first NeXT (a ColorStation) out of personal interest. Jason Blochowiak had talked to me about the advantages of Unix-based systems from his time in college and I was interested in seeing what Steve Jobs' next big thing was. It is funny to look back; I can remember honestly wondering what the advantages of a real multi-process development environment would be over the DOS and older Apple environments that we were using. Actually, using the NeXT was an eye-opener, and it was quickly clear to me that it had a lot of tangible advantages for us, so we moved everything but pixel art (which was still done in Deluxe Paint on DOS) over. Using Interface Builder for our game editors was a NeXT unique advantage, but most Unix systems would have provided similar general purpose software development advantages (the debugger wasn’t nearly as good as Turbo Debugger 386, though!) Kevin Cloud even did our game manuals, starting with Wolfenstein 3D, in Framemaker on a NeXT.

This was all in the context of DOS or Windows 3.x; it was revolutionary to have a computer system that didn’t crash all the time. By the time Quake 2 came around, Windows NT was in a similar didn’t-crash-all-the-time state; it had hardware accelerated OpenGL, and Visual Studio was getting really good, so I didn’t feel too bad about moving over to it. At that transition point I did evaluate most of the other Unix workstations and didn’t find a strong enough reason not to go with Microsoft for our desktop systems.

Over the entire course of Doom and Quake 1’s development we probably spent $100,000 on NeXT computers, which isn’t much at all in the larger scheme of development. We later spent more than that on Unix SMP server systems (first a quad Alpha, then an eventually 16-way SGI system) to run the time consuming lighting and visibility calculations for the Quake series. I remember one year looking at the Top 500 supercomputer list and thinking that if we had expanded our SGI to 32 processors, we would have just snuck in at the bottom.


While the NeXTstation was likely used to cross-compile to an x86 PC, as your video shows, there was real and significant development on the NeXTstation.

https://www.nextcomputers.org/forums/index.php?msg=16560 Quote:

Recently I did come across the stolen quake source (before the GPL) and it does have more platform code for nextstep, so perhaps that'd help. I would imagine much like the i386 assembly code, if someone were to port that to the 68030 itd help a GREAT deal. I would imagine that the Amiga people would have done so... Now if there is an Amiga source dump of quake I can certainly look at it, although I don't have any m68k gear to test on...


A more expensive NeXTstation Color would be unnecessary for simple cross-compiling. NeXTstation Color Doom and Quake performance may be inferior to 68040 Amiga performance, but this may simply be due to a lack of optimization, much as AmigaOne software suffers from poor optimization because of its tiny market. The 68040 NeXTstation Color hardware is higher end than most Amiga hardware, with 4096 colors (RGB palette?), 1.5MiB VRAM, 16MiB 70ns main memory, a 56001 DSP, a 4.8MiB/s SCSI controller, etc.

My theory is that John Carmack was an Apple and Steve Jobs fan. The rivalry between 68k PCs was not so friendly back then, especially because the Amiga with "too much hardware" was a major threat to the inferior Mac hardware and even to NeXTstation hardware (the NeXTstation and Amiga 3000UX competed in the 68k workstation market in the early 1990s). This explains the trash talking of Amiga hardware as inadequate for Doom when it was later found to have acceptable Doom performance. Notice above that the DOS version of Deluxe Paint was used, even though the Amiga version was generally considered the best version and was used by other developers, including 3DO developers.

kolla Quote:

What “NeXT like features”? You clearly have never actually used one - fixed oddball non-standard higher resolutions that require special monitors and cabling, mostly were made to satisfy DTP market? Pay tons extra for colours and not just 3bit gray scale? Postscript Display? Notice how Doom on NeXT is always in a tiny window and never full screen? Are these the “NeXT like features” you think of?


Steve Jobs liked proprietary hardware, which is not the way to add workstation-like features. However, NeXT had high resolutions, VRAM, good main memory performance, networking capabilities, microphone support, etc. High end Amigas should have been upgraded into poor man's workstations instead of offering low end Amiga features in a big box with Zorro slots. The Amiga 3000 almost got there and would have if upgraded to AGA, but the Amiga 4000 was instead cheapened and dropped SCSI support. The Amiga 4000 in one of your video links takes more time to start Doom than all the other computers together, and Doom stutters when accessing the drive. The likely stock CBM A3640 accelerator and memory performance is also pathetic and far from workstation-like. I have little doubt that a NeXTstation Color with Doom optimizations could beat the Amiga 4000.

Hammer 
Re: One major reason why Motorola and 68k failed...
Posted on 19-Jun-2024 2:29:03
#214 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5859
From: Australia

@ppcamiga1

Quote:
I also play Quake II on my rpi. And Doom.
Drop emulator use native version on debian.

Debian is not "the name" AmigaOS.

https://amiga-news.de/en/news/AN-2015-02-00027-EN.html
Cloanto confirms transfers of Commodore/Amiga copyrights.

The current Warp3D acceleration for VideoCore4 is able to run GLQuake, Quake 2 GL, and Quake 3.

Emu68 is like ROM'ed microcode in CISC 68K.

Deal with it.

Last edited by Hammer on 19-Jun-2024 at 02:31 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

matthey 
Re: One major reason why Motorola and 68k failed...
Posted on 19-Jun-2024 3:43:54
#215 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2270
From: Kansas

Hammer Quote:

Vulkan requires staging setup time and it's not good for many small transactions i.e. PCIe I/O latency issues. Lower-level Vulkan API wouldn't beat the CPU's SIMD extension in low-latency transactions.

For modern PC games with real-time raytracing, the CPU maintains a BVH tree (with geometry boxes and dot geometry) to feed the BVH hardware-accelerated GPU.

Vulkan doesn't remove the CPU's BVH tree maintenance. GPU renders the CPU's viewport.


A Heterogeneous System Architecture (HSA) solves "PCIe I/O latency issues" and Vulkan supports HSA.



https://en.wikipedia.org/wiki/Heterogeneous_System_Architecture

It's not like integrated HSA ray tracing GPUs are too high end. They could be implemented as standard on RPi-level hardware or microconsoles. The full-sized console market seems to be the only market making them standard and gaining the advantages. It's amazing how many people think plug-in GPUs have a performance advantage over integrated GPUs when the opposite is true. Even in Amiga Neverland the Amiga was ahead of its time, headed toward an integrated CPU+2D SoC; today's SoCs are CPU+2D+3D. Jay Miner understood that closer means better performance and that the future is better integration of technology.

Hammer Quote:

Other companies adopted SIMD e.g. Intel i860 with MMX-like extensions which later influenced Pentium MMX and AMD's 3DNow.

Who is delivering "workstation compute" and "workstation graphics" for the masses?


ARM. The low and mid performance ARM hardware is for the "masses" while the more expensive high end x86-64 hardware is for the classes.

Hammer Quote:

Quake's 3D optimization was mostly done by Michael Abrash i.e. VGA Mode X optimization gaming PC evangelist.

...

There was no Michael Abrash-like advocate (à la VGA Mode X optimization) for optimized C2P on the Amiga platform in the 1991 to 1995 time period. I can name names who didn't open-source their recent ports.


Mikael Kalms?

https://github.com/Kalmalyzer/kalms-c2p Quote:

Here is a fairly complete set of c2p routines, almost one routine for
every occasion! The main aim is towards 68030-68060 processors, using
CPU-only conversion, although you can find the occasional oddity in some
darker corners of the archive.

The main aim of this archive is to provide a fairly broad range of fast and
easy-to-use c2p routines, so all other programmers need not re-invent the
wheel.


The origins date back to demo coders from at least the late 1990s. Rune Stensland is another demo coder who created some c2p documentation and later was an early Apollo Team member with me. Distribution of documentation was not as easy back then with dial-up.
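
For readers unfamiliar with the technique under discussion: chunky-to-planar (c2p) conversion redistributes the bits of chunky pixels (one byte per pixel) into the Amiga's separate bitplanes. A minimal reference sketch in Python (the function name is mine; the optimized 68030-68060 routines in the archive above achieve the same shuffle with merge/swap register tricks rather than a naive loop):

```python
def c2p_8px(pixels):
    """Convert 8 chunky 8-bit pixels into 8 planar bytes.

    Bit p of pixels[i] lands in plane p at bit (7 - i), so the
    leftmost pixel occupies the most significant bit of each plane.
    This is the straightforward reference form; fast Amiga c2p
    routines do the same permutation with logarithmic merge/swap
    passes over 32-bit registers.
    """
    assert len(pixels) == 8
    planes = []
    for p in range(8):                      # one output byte per bitplane
        byte = 0
        for i, px in enumerate(pixels):
            byte |= ((px >> p) & 1) << (7 - i)
        planes.append(byte)
    return planes

# Pixel value 0x01 in the leftmost position sets only plane 0's MSB.
print(c2p_8px([0x01, 0, 0, 0, 0, 0, 0, 0]))  # [128, 0, 0, 0, 0, 0, 0, 0]
```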

Hammer Quote:

Commodore management didn't believe in delivering "workstation compute" and "workstation graphics" for the masses.

Jay Miner's Amiga OCS initial success wasn't repeated. With Amiga OCS, you have a single leadership with Jay Miner.

Amiga OCS's 4096 color palette (12-bit) use case wasn't original.

From Commodore: The Final Years by Brian Bagnall, the post-OCS road map was directionless confusion. The Amiga's low-risk ECS Denise was mostly completed near the end of 1987.

Amiga OCS could have had ECS Denise and ECS Agnus in 1988, helping the A500/A2000's business use case. There was confusion with the post-OCS/ECS road map which delayed the Amiga's 256-color display capability i.e. "To be, or not to be, that is the question" - William Shakespeare.


ECS in 1988, AGA in 1990 and AA+ in 1992 may have been enough to survive. An earlier memory bandwidth increase and more colors were needed to maintain the position as best value for a graphics PC. CBM did nothing but watch competitors catch up and then surpass the Amiga in graphics capabilities between 1985 and 1990.

Hammer Quote:

Again, prove Motorola's 17 MIPS claim for the 68030 is Dhrystone. Furthermore, later Motorola literature changed MIPS numbers e.g. 9 MIPS for the 40 MHz 68EC030.


Like the Post Commodore Bankruptcy Documents, faked documents or tampered data are possible without insiders to verify the data, but complete hoaxes are unlikely: the data is elaborate, at least partially accurate, and the effort would be large with little to gain. The benchmark results are on the high side, but this may be because of improved compiler results at the later date of the documents. It's also possible that Motorola reused an older benchmark result by mistake in "later Motorola literature".

Hammer Quote:

The same SSE unit can run in scalar mode. LOL

Pixel processing is very friendly with parallelism which is useful for games.

...

The lack of a packed math 16x8-bit SIMD128 instruction is a major problem for a PS3 emulator, since this is a significant CELL SPU instruction.

Pixel processing is very friendly with parallelism which is useful for games.


SIMD units often have quirky instructions and many corner cases, and they don't scale well. They are not so friendly to use or to extract maximum performance from, and practically require optimization by a very knowledgeable programmer. Even sophisticated compilers using autovectorization have trouble maximizing performance. They are available because they can provide a major boost to performance when everything is just right. The x86-64 scalar non-SIMD use of SIMD instructions as an FPU replacement is easier than SIMD coding, but the instructions and code are larger than FPU instructions and the resource requirements unnecessarily high, limiting the ability to scale hardware down.

Hammer Quote:

Motorola/Freescale's instruction set kitbashing is part of its DNA.


ColdFire added back some 68k functionality, not for compatibility but for improved functionality. This demonstrates that they took their castration of the 68k too far, all for smaller cores, as there wasn't much to be gained in performance by the major castration. Well, there were political reasons too: ColdFire needed to scale smaller than PPC could. The 68k's registers were already half of PPC's due to CISC advantages, and even a castrated 68k can easily beat PPC in code density.

Hammer Quote:

That's too late since ColdFire V4e was released in the year 2000. Do you want companies like Sony to pause their business until ColdFire V4e's year 2000?

A full pipeline FPU can hide latency.


The 68060 FPU has overall better performance and features than the simplified ColdFire FPU. FDIV is an exception where the ColdFire FPU has better performance, but the 68060 FPU design could have optimized the FSGLDIV instruction and/or borrowed the 88110 extended precision FDIV, which has a latency of only 26 cycles compared to the ColdFire double precision latency of 23 cycles. The 88110 also had optimizations for single precision (13 cycles) and double precision (23 cycles) FDIV; the ColdFire double precision FDIV latency exactly matches the latter and may have been borrowed.

Instruction | Precision | 6888x cycles | 68060 cycles | ColdFireV4e cycles | 88110 cycles
FMUL        | .x        | 71           | 3            | 4                  | 3
FMUL        | .d        | 71           | 3            | 4                  | 3
FMUL        | .s        | 59           | 3            | 4                  | 3
FDIV        | .x        | 103          | 37           | no                 | 26
FDIV        | .d        | 103          | 37           | 23                 | 23
FDIV        | .s        | 69           | 37           | 23                 | 13

If 88110 sources were available to borrow from, the small 3 cycle difference between 88110 extended precision and double precision latency is probably not worth adding a 68k FPU FDBLDIV instruction, but an FSGLDIV with a 13 cycle latency would be nice.
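
The precision-dependent latencies above are what you would expect from iterative division: the hardware needs fewer refinement steps to reach 24 significand bits than 53 or 64. A sketch of Newton-Raphson reciprocal refinement, one common basis for such dividers (the seed constant here is an illustrative linear approximation, not any particular chip's lookup table):

```python
def recip_newton(d, iters):
    """Approximate 1/d for d in [0.5, 1.0) by Newton-Raphson.

    Each step x' = x * (2 - d*x) roughly doubles the number of
    correct bits, which is one reason a single-precision divide can
    retire in fewer cycles than a double or extended-precision one.
    """
    assert 0.5 <= d < 1.0
    x = 2.9142 - 2.0 * d        # crude linear seed, a few bits correct
    for _ in range(iters):
        x = x * (2.0 - d * x)
    return x

# Two iterations reach roughly single-precision accuracy; four reach
# double-precision accuracy.
print(abs(recip_newton(0.75, 2) - 1 / 0.75) < 1e-4)   # True
print(abs(recip_newton(0.75, 4) - 1 / 0.75) < 1e-12)  # True
```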

Hammer Quote:

For the Y1998 to Y2000 time frame, the embedded gaming CPU competition against ColdFire V4e is AMD's K7 Duron and Intel's Pentium III Coppermine with 128 KB L2 cache.


No. ColdFireV4e is for lower end embedded use like the ARM Cortex-R. Freescale would have competed with PPC for the console market but they lost the market to IBM.

2001 Nintendo Game Cube IBM PPC Gekko
2005 MS XBox 360 IBM PPC Xenon
2006 Sony PS3 IBM PPC Cell
2006 Nintendo Wii IBM PPC Broadway
2012 Nintendo Wii U IBM PPC Espresso

Freescale couldn't compete with IBM, and IBM then lost the console market due to PPC and poor designs. The PPC 970 (G5), Xenon and Cell CPUs were bad, while the Nintendo PPC cores were more practical but lackluster PPC G3 designs. Joining the AIM Alliance and developing PPC made things easier for Motorola/Freescale, just as they wanted; losing business is easy.

Hammer Quote:

1. Motorola has its own RISC road map with MC88000 and was working with Apple.

2. IBM invited Apple into PowerPC and Apple included Motorola in PowerPC. PowerPC 601 bus recycled MC88110 bus interface. For Motorola, PowerPC displaced the MC88000 RISC project.

Most big-endian 68K Unix workstations exited Motorola's orbit with big-endian RISC projects.

MIPS Inc. spotted Motorola's weakness and offered a strong MUL-equipped MIPS R2000 CPU.


Motorola made mistakes with their 88k too. They should have made either a 64-bit multi-chip 88k CPU or a single-chip 32-bit CPU. They were overprotective of their IP after the Hitachi lawsuits, when they should have been more aggressive about licensing their technology for integration into SoCs, as MIPS, SuperH and ARM were. Motorola was likely to lose the RISC ISA proliferation battle, so they surrendered both their 88k and 68k development to bet their future on the mediocre at best PPC.

Hammer Quote:

Motorola wasn't paranoid enough to respond with 68030M i.e. MUL strong 68030 variant.

My proposed 68030M would have a strong math add and mul reg,reg operands-only instruction subset which differs from 68040's mostly strong instruction set. I'm a bit paranoid if I run Motorola. 68030M would specifically target game platforms with "cheapo RISC" requirements i.e. blocking SuperH2, R2000, and R3000.


It's not that easy to increase 68020/68030 performance without a major redesign with a longer pipeline like the 68040's. Reducing the caches of the 3.3V 68040V is a better way to reduce the price, as I suggested earlier. The Motorola/Freescale XCF5102 68040/ColdFire hybrid has a 2kiB instruction cache, a 1kiB data cache, an improved decoupled pipeline and 3.3V operation, which are great, but it ignorantly dropped 68040 compatibility. ColdFire could have been a proper subset of the 68020, like CPU32, and compatible with it.

Hammer Quote:

68000 licensees didn't develop strong CISC R&D, i.e. none of the 68000 licensees developed their business the way AMD did.

Not including Atmel's TS68020MR1B, I don't recall 68020/68030 clones from the usual 68000 licensees.


Motorola/Freescale had a newer 68k ISA and more to protect, with a brighter future than 808x/x86, which was approaching EOL. It's natural that licensing would be stricter with the 68k, but IBM's mistake of choosing the inferior 808x for their PC changed everything.

Hammer Quote:

Atmel had a 68020 license.


There are several reimplemented 68020-compatible cores which could be licensed or used for an ASIC. The holy grail remains the MC-qualified MC68060, which shouldn't even be that expensive to license anymore considering its age. Amiga progression seems to be from 68k-incompatible Next Generation (NG) PPC AmigaNOne to more 68k-compatible Last Generation (LG) emulation though. It's Amiga amateur hour as an Amiga resurrection opportunity comes and goes with the THEA500 Mini, a one-and-done emulation toy.

Last edited by matthey on 19-Jun-2024 at 03:51 AM.

Hammer 
Re: One major reason why Motorola and 68k failed...
Posted on 19-Jun-2024 3:46:31
#216 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5859
From: Australia

@matthey

Quote:
A more expensive NeXTstation Color would be unnecessary for simple cross-compiling. NeXTstation Color Doom and Quake performance may be inferior to 68040 Amiga performance, but this may simply be due to a lack of optimization, much as AmigaOne software suffers from poor optimization because of its tiny market. The 68040 NeXTstation Color hardware is higher end than most Amiga hardware, with 4096 colors (RGB palette?), 1.5MiB VRAM, 16MiB 70ns main memory, a 56001 DSP, a 4.8MiB/s SCSI controller, etc.


NeXTstation Color can display 4096 colors from a 4096-color palette backed by VRAM.

For example, VRAM's serial access could range from 40 ns (e.g. IBM 8514, 25 MHz effective) to 20 ns (e.g. 3DO, 50 MHz effective). Pixel processing is sequential.

70 ns access FP DRAM is slightly faster than the A1200's 80 ns access FP DRAM.

The "ns access" figure doesn't show the full "read/write cycle". For example, the A1200's 80 ns access FP DRAM has a 140 ns read/write cycle, which is 7.14 MHz effective.

Tseng Labs claimed its ET4000 mastered "memory interleave" for FP DRAM to deliver VRAM-like performance.
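
The effective rates quoted above follow directly from the cycle times. A quick back-of-the-envelope check (note this ignores page-mode bursts, which is exactly what FPM interleaving exploits):

```python
def effective_mhz(cycle_ns):
    """Random-access transfer rate (MHz) implied by a full read/write cycle time."""
    return 1000.0 / cycle_ns

# VRAM serial port examples from the post above:
print(round(effective_mhz(40), 2))   # 25.0  (IBM 8514-class, 40 ns)
print(round(effective_mhz(20), 2))   # 50.0  (3DO-class, 20 ns)

# A1200 fast-page DRAM: the 80 ns *access* time hides a 140 ns cycle.
print(round(effective_mhz(140), 2))  # 7.14
```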



Hammer 
Re: One major reason why Motorola and 68k failed...
Posted on 19-Jun-2024 5:12:04
#217 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5859
From: Australia

@matthey

Quote:
https://en.wikipedia.org/wiki/Heterogeneous_System_Architecture

It's not like integrated HSA ray tracing GPUs are too high end. They could be implemented as standard on RPi-level hardware or microconsoles. The full-sized console market seems to be the only market making them standard and gaining the advantages. It's amazing how many people think plug-in GPUs have a performance advantage over integrated GPUs when the opposite is true. Even in Amiga Neverland the Amiga was ahead of its time, headed toward an integrated CPU+2D SoC; today's SoCs are CPU+2D+3D. Jay Miner understood that closer means better performance and that the future is better integration of technology.

Physical UMA has its downside.

PS4's CPU bandwidth reduces GPU bandwidth disproportionately.
https://cdn.wccftech.com/wp-content/uploads/2014/08/PS4-GPU-Bandwidth-140-not-176.png

PS4 Pro has 1 GB DDR3 and 8 GB GDDR5.

PS5 has 512 MB DDR4 and 16 GB GDDR6-14000.

The extra DDR memory controller is located in the custom Southbridge. The extra DDR memory pool reduces memory access interference from slow processors.

Due to UMA's downside, AMD's incoming Strix Halo APU has a 32 MB cache for 40 CU RDNA 3.5 IGP with 256-bit LPDDR5x-8000. This is an attempt to mitigate UMA's downside.

Unlike Intel's IGP which shares L3 cache with the CPU, AMD's IGP is implemented as a discrete GPU without discrete memory. AMD's IGP is not part of the CPU's cache scope.

Gaming PCs with discrete video memory hold the ultimate gaming performance crown. ReBAR enables the CPU to fully access the GPU's discrete video memory as a logical UMA, but it's still physically discrete memory pools.

Quote:

ARM. The low and mid performance ARM hardware is for the "masses" while the more expensive high end x86-64 hardware is for the classes.

There's also Qualcomm's profit expectation factor, which sits on top of the BOM cost from TSMC or Samsung.

AMD's APU profit margins for game consoles are low e.g. 15 to 18 percent range.

Quote:

Mikael Kalms?

The origins date back to demo coders from at least the late 1990s. Rune Stensland is another demo coder that created some c2p documentation and later was as early Apollo Team member with me. Distribution of documentation was not as easy back then with dial up.

My cited 1991 to 1995 date range is important.

Michael Abrash's Mode X optimization guide was published in July 1991 and there was a flood of texture-mapped 3D PC games in 1994. You didn't factor in the game's development time.

https://github.com/Kalmalyzer/kalms-c2p The 1996 code is too late for the 1991 to 1994 date range. For Western markets, Q4 1995 is Sony's PS1 era, which influenced the PC's Pentium-class CPU requirements in early 1996.

Quake was in development after Doom II with a Pentium CPU target. id Software hired Michael Abrash at the start of 1995.


From https://www.intel.fr/content/dam/doc/report/history-1994-annual-report.pdf
Intel reported the following:
1. In 1994's fourth quarter, Pentium unit sales accounted for 23 percent of Intel's desktop processor volume.
2. Millions of Pentiums were shipped.
3. During Q4 1993 and 1994, a typical PC purchase was a computer featuring the Intel 486 chip.
4. Net 1994 revenue reached $11.5 billion.
5. Net 1993 revenue reached $8.7 billion.
6. Growing demand and production for Intel 486 resulted in a sharp decline in sales for Intel 386 from 1992 to 1993.
7. Sales of the Intel 486 family comprised the majority of Intel's revenue during 1992, 1993, and 1994.
8. Intel reached its 6 to 7 million Pentiums shipped goal during 1994. This is only 23 percent unit volume.

By the end of 1994, Intel's Pentium PC install base crushed the entire Amiga install base of 4 to 5 million units!

The PC market was preparing to meet PS1's Western market release in H2 1995.

Quote:

Like the Post Commodore Bankruptcy Documents, faked documents or tampered data are possible without insiders to verify the data but unlikely to be complete hoaxes because the data is elaborate, at least partially accurate and the effort would be large with little to gain. The benchmark results are on the high side but this may be because of improved compiler results at the later date of the documents. It's also possible that Motorola reused an older benchmark result by mistake in "later Motorola literature".

You couldn't prove that the 17 MIPS figure for the 68030 @ 50 MHz is Dhrystone.

"Dhrystone" itself can change, e.g. weighting math instructions toward MUL will degrade CPUs like the 68030. This is why SysInfo's Dhrystone can only be compared to SysInfo's Dhrystone, since SysInfo does not use the official Dhrystone routine.

For 3D, the 68030 can't compete against the MUL-strong MIPS R3000 or SuperH2.

ARMv3M (during the ARM60 era) specifically targeted the strong-MUL-instruction CPU market.

Motorola/Freescale's 68000-based DragonBall VZ wasn't pushed out of the handheld market until around the ARMv4T era.


Quote:

The 68060 FPU has overall better performance and features than the simplified ColdFire FPU. FDIV is an exception where the ColdFire FPU has better performance but the 68060 FPU design could have optimized the FSGLDIV instruction and/or borrowed the 88110 extended precision FDIV which has a latency of only 26 cycles compared to the ColdFire double precision latency of 23 cycles. The 88110 also had optimizations for single precision (13 cycle) and double precision (23 cycle) FDIV which the ColdFire double precision FDIV latency exactly matches and may have borrowed.

Instruction | Precision | 6888x cycles | 68060 cycles | ColdFireV4e cycles | 88110 cycles
FMUL        | .x        | 71           | 3            | 4                  | 3
FMUL        | .d        | 71           | 3            | 4                  | 3
FMUL        | .s        | 59           | 3            | 4                  | 3
FDIV        | .x        | 103          | 37           | no                 | 26
FDIV        | .d        | 103          | 37           | 23                 | 23
FDIV        | .s        | 69           | 37           | 23                 | 13

If 88110 sources were available to borrow from, the small 3 cycle difference between 88110 extended precision and double precision latency is probably not worth adding a 68k FPU FDBLDIV instruction, but an FSGLDIV with a 13 cycle latency would be nice.

That's a "what if" argument.

Fact: the 68060's FPU is weaker than the P5 Pentium's.

The 68LC060 and 68EC060 are not low-cost-friendly for game consoles with "cheap RISC" requirements. Remember, Amiga Hombre's two chips were $40 with a 1 million transistor budget, similar to the PS1's 1 million transistor budget.

The 68EC060 is useless for platforms with DMA'ed devices.

For the 1993 to 1996 time period, no Motorola CPU product delivers the A1200's $599 retail price with the PS1's R3000 CPU @ 33 MHz performance. Prove me wrong.

Quote:

No. ColdFireV4e is for lower end embedded use like the ARM Cortex-R. Freescale would have competed with PPC for the console market but they lost the market to IBM.

NO.

The NXP ColdFire V4e @ 200 MHz is not cheap e.g.
https://www.mouser.sg/c/semiconductors/embedded-processors-controllers/microprocessors-mpu/?core=ColdFire%20V4e&m=NXP&tradename=ColdFire&sort=pricing
200 units @ $65.87 each

vs

Texas Instruments, ARM Cortex A9 @ 600 MHz
https://www.mouser.sg/c/semiconductors/integrated-circuits-ics/embedded-processors-controllers/?core=ARM%20Cortex%20A9&sort=pricing
90 units @ $13.46 each.

vs

Renesas Electronics, ARM Cortex A15 @ 1500 MHz
https://www.mouser.sg/c/semiconductors/integrated-circuits-ics/embedded-processors-controllers/?core=ARM%20Cortex%20A15
40 units @ $69.20 each

vs

Texas Instruments, ARM Cortex A53 @ 1400 MHz
https://www.mouser.sg/c/semiconductors/integrated-circuits-ics/embedded-processors-controllers/?core=ARM%20Cortex%20A53&sort=pricing
119 units @ $15.50 each

You're in dreamland.

IBM had the PPC 602 (with 1 million transistors) for the 3DO M2.

IBM's deal with 3DO for the M2 was two PPC 602s @ 66 MHz for a very low price.

Quote:

2001 Nintendo Game Cube IBM PPC Gekko
2005 MS XBox 360 IBM PPC Xenon
2006 Sony PS3 IBM PPC Cell
2006 Nintendo Wii IBM PPC Broadway
2012 Nintendo Wii U IBM PPC Espresso

Freescale couldn't compete with IBM and IBM lost the console market due to PPC and poor designs.

For the Xbox One and PS4, IBM's PowerPC A2 lost to AMD's Jaguar.

AMD's Jaguar also defeated its ARM Cortex A15 competitor.

Quote:

The PPC 970 (G5), Xenon and Cell CPUs were bad while the Nintendo PPC cores were more practical but lackluster PPC G3 designs.

The CELL SPU has useful instructions for pixel processing, matched by the 2006 release of Intel Core 2's SSSE3. Core 2 and the PS3 were both released in 2006.

Quote:

Motorola made mistakes with their 88k too. They should have either made a 64 bit multi-chip 88k CPU or a single chip 32 bit CPU. They were over protective of their IP after the Hitachi lawsuits where they should have been more aggressive at licensing their technology for integrating in SoCs like MIPS, SuperH and ARM. Motorola was likely to lose the too many RISC ISAs proliferation battle so they surrendered both their 88k and 68k development to bet their future on mediocre at best PPC.


https://en.wikipedia.org/wiki/SoundStorm
1. NVIDIA SoundStorm used a Motorola 56300-based digital signal processor (DSP), subsidized by Microsoft's Xbox contract.

2. NVIDIA decided the cost of including the SoundStorm SIP block (Motorola 56300 DSP) on the dies of their chipsets was too high, and it was not included in nForce3 and beyond.

Without Microsoft's Xbox subsidies, NVIDIA dropped the Motorola 56300 DSP.

Quote:

Motorola/Freescale had a newer 68k ISA and more to protect with a brighter future than 808x/x86 which was approaching EOL.

Your EOL assertion on x86 is FALSE. There are many false prophets.

Quote:

It's natural that licensing would be stricter with the 68k but IBM's mistake of choosing the inferior 808x for their PC changed everything.

Hint: the 68008 wasn't available in the required time frame for IBM's PC. There were also PCB and support-chip cost issues.

Last edited by Hammer on 19-Jun-2024 at 05:44 AM.


matthey 
Re: One major reason why Motorola and 68k failed...
Posted on 20-Jun-2024 3:57:26
#218 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2270
From: Kansas

Hammer Quote:

Physical UMA has its downside.

PS4's CPU bandwidth reduces GPU bandwidth disproportionately.
https://cdn.wccftech.com/wp-content/uploads/2014/08/PS4-GPU-Bandwidth-140-not-176.png

PS4 Pro has 1 GB DDR3 and 8 GB GDDR5.

PS5 has 512 MB DDR4 and 16 GB GDDR6-14000.

The extra DDR memory controller is located in custom Southbridge. The extra DDR memory pool is to reduce memory access interference from slow processors.

Due to UMA's downside, AMD's incoming Strix Halo APU has a 32 MB cache for 40 CU RDNA 3.5 IGP with 256-bit LPDDR5x-8000. This is an attempt to mitigate UMA's downside.

Unlike Intel's IGP which shares L3 cache with the CPU, AMD's IGP is implemented as a discrete GPU without discrete memory. AMD's IGP is not part of the CPU's cache scope.

Gaming PC with discrete video memory has the ultimate gaming performance crown. ReBar enables the CPU to fully access the GPU's discrete video memory in a logical UMA, but it's still physically discrete memory pools.


Copying memory between a discrete CPU and GPU, even with DMA, uses CPU memory bandwidth and GPU memory bandwidth, reducing system memory bandwidth disproportionately too. HSA reduces the overhead of shared memory but does not eliminate it. Non-shared memory can avoid shared memory overhead at the cost of reduced memory for all units. There is no free lunch. An integrated GPU still has a performance advantage over a discrete GPU because the CPU, GPU and memory can all be physically closer together, reducing communication latencies with or without HSA. The fact that the highest performance GPUs are discrete is due to economics and marketing. With chip fab process improvements and SRAM scaling slowing, don't be surprised if AMD APUs (SoCs) become higher performance and more competitive. Apple gets it with their SoCs, even though their integrated GPUs are designed more for low power than performance.

Hammer Quote:

You couldn't prove that the 17 MIPS figure for the 68030 @ 50 MHz is Dhrystone.

"Dhrystone" itself can change, e.g. weighting math instructions toward MUL will degrade CPUs like the 68030. This is why SysInfo's Dhrystone can only be compared to SysInfo's Dhrystone, since SysInfo does not use the official Dhrystone routine.


The old SysInfo does not use the Dhrystone benchmark code but labels its benchmark result as Dhrystones. This is a violation of the Dhrystone rules, and SysInfo results should not be used.

Dhrystone does not stress MUL performance. It primarily tests string manipulation and memory/cache handling. It is a poor synthetic integer benchmark, but it is very simple, easy to compile, has well defined rules, and benchmark values are available for many old CPUs.
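
For context, "Dhrystone MIPS" is conventionally normalized against the VAX 11/780, the canonical 1 MIPS machine, which scores 1757 Dhrystones per second. A small conversion sketch:

```python
VAX_11_780_DHRYSTONES_PER_SEC = 1757  # the "1 MIPS" reference machine

def dmips(dhrystones_per_sec):
    """Convert raw Dhrystones/s into VAX-normalized Dhrystone MIPS."""
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC

# A CPU scoring 1757 Dhrystones/s is by definition 1.0 DMIPS;
# a 17 MIPS claim implies roughly 17 * 1757 = 29869 Dhrystones/s.
print(dmips(1757))       # 1.0
print(round(17 * 1757))  # 29869
```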

Hammer Quote:

For 3D, the 68030 can't compete against the MUL-strong MIPS R3000 or SuperH2.

ARMv3M (during the ARM60 era) specifically targeted the strong-MUL-instruction CPU market.

Motorola/Freescale's 68000-based DragonBall VZ wasn't pushed out of the handheld market until around the ARMv4T era.


Integer MUL and DIV instructions were not commonly used in the 68000-68030 era. 3D was not common even in the early 68040 era. It was 1992+ before 3D started to take off.

Year | CPU   | 16-bit MUL cycles | 32-bit MUL cycles | 16-bit DIV cycles | 32-bit DIV cycles
1984 | 68020 | 25-28             | 41-44             | 42-57             | 76-91
1988 | R3000 | no                | 12                | no                | 75
1991 | R4000 | no                | 12                | no                | 75
1994 | 68060 | 2                 | 2                 | 22                | 38
1996 | R5000 | no                | 4                 | no                | 36

The early microcoded 68k designs were weak, and it took too long for improved designs to be released. Even the 68040 had longer latencies for MUL and DIV than the R3000, but the tables had turned with the 68060. You complain about the 68k not having 8-bit integer MUL and DIV, but MIPS CPUs don't have 8 or 16-bit MUL and DIV either.
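
The weak early MUL latencies above are why code of that era often avoided the instruction entirely: a multiply could be decomposed into shifts and adds that each cost only a cycle or two. A generic shift-and-add sketch of what those hand-coded sequences compute:

```python
def mul_shift_add(a, b):
    """Multiply two non-negative integers using only shifts and adds.

    On CPUs where MUL cost 40-90 cycles (see the table above),
    compilers and hand-coders expanded constant multiplies into
    exactly this kind of shift/add sequence.
    """
    result = 0
    while b:
        if b & 1:
            result += a     # add the shifted multiplicand when the bit is set
        a <<= 1
        b >>= 1
    return result

print(mul_shift_add(320, 200))  # 64000, e.g. a screen-offset calculation
```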

Hammer Quote:

Your EOL assertion on X86 is FALSE. There are many false prophets.


If the 68000 had been chosen for the IBM PC, x86 would be long dead and more forgotten than the 68k is today. The 1994 68060 hasn't been surpassed, yet 30 years later this CPU is in short supply. The 68k ISA is used by hundreds of thousands of users. The 68000 had product wins and was crushing 808x/x86 until IBM's fateful decision to choose an inferior and lower value CPU. IBM should have chosen the 68000 and not the 68008, which offered less value. Jay Miner got it right and created the IBM PC killer, but CBM mucked it up.

 Status: Offline
Profile     Report this post  
ppcamiga1 
Re: One major reason why Motorola and 68k failed...
Posted on 20-Jun-2024 6:38:48
#219 ]
Cult Member
Joined: 23-Aug-2015
Posts: 858
From: Unknown

@Hammer

An emulator is an emulator; you're fooling yourself.
Drop that Emu68 crap. Just play native Quake, Quake II, and Doom on an RPi.
It's cheaper and faster, and doesn't need an Amiga as a joystick interface.

 Status: Offline
Profile     Report this post  
Hammer 
Re: One major reason why Motorola and 68k failed...
Posted on 20-Jun-2024 7:06:39
#220 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5859
From: Australia

@matthey

Quote:

Copying memory between a discrete CPU and GPU, even with DMA, uses both CPU memory bandwidth and GPU memory bandwidth, reducing system memory bandwidth disproportionately.

1. The CPU generates a command list for the GPGPU's consumption. This is a small transaction.

2. When necessary, the GPGPU fetches data from main memory or from DirectX 12 Ultimate's DirectStorage NVMe.

GPUs equipped with large VRAM make fewer PCIe fetches.

For GPUs with less VRAM, the extra PCIe traffic from texture fetches shows up as hitches in the minimum frame times. This is not a problem for GPGPUs with 16 GB of VRAM or more.

https://www.slideshare.net/slideshow/framegraph-extensible-rendering-architecture-in-frostbite/72795495
DirectX 12 PC memory layout = 80 MB
PS4 memory layout = 77 MB
XBO memory layout = 76 MB
Non-aliasing memory layout = 147 MB (this is old school).

PC DirectX 12 has features to minimize data copies.

Quote:

HSA reduces the overhead of shared memory but does not eliminate it. Non-shared memory can avoid shared memory overhead at the cost of reduced memory for all units. There is no free lunch. An integrated GPU still has a performance advantage over a discrete GPU because the CPU, GPU and memory can all be physically closer together, reducing communication latencies with or without HSA.

A CPU and GPU sharing an APU also share a single TDP budget.

The absolute best 3D GPU is still the RTX 4090 for a reason. AMD's best 3D GPGPU is the RX 7900 XTX.

Quote:

The old SysInfo does not use the Dhrystone benchmark code but labels their benchmark result as Dhrystones. This is a violation of Dhrystone rules and SysInfo results should not be used.

SysInfo's Dhrystone scores are only ever compared against other SysInfo Dhrystone scores.

Quote:

Dhrystone does not stress MUL performance. It primarily tests string manipulation and memory/cache handling. It is a poor synthetic integer benchmark, but it is very simple, easy to compile, has well-defined rules, and benchmark values are available for many old CPUs.

Dhrystone is biased towards string copying performance.

Do you still defend the claim that the 68030 @ 50 MHz is in the same class as the PS1's R3000 @ 33 MHz or the Saturn's SuperH-2 @ 28 MHz?

Quote:

Integer MUL and DIV instructions were not commonly used in the 68000-68030 era. 3D was not common even in the early 68040 era. It was 1992+ before 3D started to take off.

Reminders:
1. The PC's 3D games were in development well before their 1992, 1993, and 1994 release dates, e.g.
Ultima Underworld: The Stygian Abyss (released in 1992) was in development from May 1990. The team demonstrated a tech-demo engine at the June 1990 Consumer Electronics Show (CES) and impressed Origin Systems.

Geoff Crammond's Formula 1 Grand Prix for the PC (1991) already had texture-mapped 3D.

2. The PS1's 3D games were in development before their 1994 and 1995 release dates. PS1 game development is known to have been active since 1993, influenced by the earlier SNES (released in 1990) with its Mode 7 and various 3D DSP add-on chips.

For strong MUL performance, did you miss DSPs?

MIPS Inc. decided to combine strong MUL instructions with a RISC CPU.


Last edited by Hammer on 20-Jun-2024 at 07:59 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

 Status: Offline
Profile     Report this post  
Goto page ( Previous Page 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 Next Page )

[ home ][ about us ][ privacy ] [ forums ][ classifieds ] [ links ][ news archive ] [ link to us ][ user account ]
Copyright (C) 2000 - 2019 Amigaworld.net.
Amigaworld.net was originally founded by David Doyle