Poster | Thread | Heimdall
|  |
Re: 32-bit PPC on FPGA Posted on 11-Mar-2025 11:16:38
| | [ #441 ] |
| |
 |
Regular Member  |
Joined: 20-Jan-2025 Posts: 103
From: North Dakota | | |
|
| | Status: Offline |
| | matthey
|  |
Re: 32-bit PPC on FPGA Posted on 12-Mar-2025 0:16:05
| | [ #442 ] |
| |
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2602
From: Kansas | | |
|
| Hammer Quote:
SNES's 65816 CPU's math power was supported by custom DSP and SuperFX (this 16-bit RISC coprocessor development led to Argonaut RISC Core CPU family) add-ons.
65xx/65xxx CPU family's slow R&D pace led to ARM and ARC RISC CPUs.
|
On paper, the 1985 Amiga is inferior to the 1988 Sega Genesis/Mega Drive and 1990 SNES in almost every way.
https://segaretro.org/Sega_Mega_Drive/Hardware_comparison Quote:
Vs. Amiga
The Mega Drive was generally more powerful than the Amiga. The Mega Drive's 68000 CPU is clocked at 7.6 MHz, while the Amiga's 68000 CPU was clocked at 7.16 MHz (NTSC) or 7.09 MHz (PAL). The Mega Drive displays eighty 15-color sprites at 32×32 pixels each, while the Amiga displays eight 3-color sprites at 8 pixels wide. The Mega Drive displays 61–64 colors standard and 183–192 colors with Shadow/Highlight, while the Amiga displays 2–32 colors standard and 64 colors with EHB. The Mega Drive's VDP can DMA blit 3.21845–6.4 MB/s bandwidth (6.4 MPixels/s fillrate), while the Amiga's Blitter can blit 1.7725–3.58 MB/s (2.363333–4.773333 MPixels/s with 64 colors). During active display, with 64 colors at 60 FPS, the VDP can write 708 KB/s to 2 MB/s (1.4–2 MPixels/s) during 320×224 display, while the Blitter can write 332.5–700 KB/s (443,333–933,333 pixels/s) during 320×200 display. The Mega Drive supports tilemap backgrounds, reducing processing, memory and bandwidth requirements by up to 64 times compared to the Amiga's bitmap backgrounds, giving the Mega Drive an effective tile fillrate of 6–36 MPixels/s. The Mega Drive has a Z80 sound CPU and supports 10 audio channels, while the Amiga lacks a sound CPU and supports 4 audio channels.
|
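One plausible reading of the "up to 64 times" figure in the quote above is that a single tilemap entry stands in for an 8×8 block of pixels. A minimal sketch of that arithmetic follows; the 8×8 tile size and the 320×224 screen are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Toy comparison: how many cells must be addressed to describe one screen
# as a per-pixel bitmap versus as a grid of tile references.
# Assumptions (not stated in the quote): 8x8-pixel tiles, 320x224 screen.
TILE_W = 8
TILE_H = 8
SCREEN_W = 320
SCREEN_H = 224


def bitmap_cells(width, height):
    """One cell per pixel."""
    return width * height


def tilemap_cells(width, height, tile_w=TILE_W, tile_h=TILE_H):
    """One cell per tile position."""
    return (width // tile_w) * (height // tile_h)


pixels = bitmap_cells(SCREEN_W, SCREEN_H)
tiles = tilemap_cells(SCREEN_W, SCREEN_H)
ratio = pixels // tiles

print(pixels, tiles, ratio)
```

This only counts addressable cells. It ignores the tile pattern data itself and bits per pixel, so the real memory saving depends on how many unique tiles a background uses.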
The SNES Ricoh 5A22 CPU MIPS is roughly equivalent to the 68000 in the Amiga and Mega Drive but the 68000 is much easier to program and has a large advantage with 16-bit and 32-bit datatypes. A good example of SNES sluggishness, inferior parallax scrolling and smaller sprites with less animation is Shadow of the Beast.
Shadow of the Beast | Amiga, MD, PCE, SNES | Comparison - Quad Longplay https://www.youtube.com/watch?v=QUT91K4mPlw
The SNES colors are good when looking at a static screen and the music is ok but the playability is the worst among the 4 consoles above. The Mega Drive version is almost as good as the Amiga version and seems fast including scrolling, maybe even faster than the Amiga version. The Amiga version looks like the highest resolution, remains a good all around contender and has the best music and sound. The SNES may have the best hardware other than the CPU and the SNES version of the game could likely be optimized more but this is where the 68000 is so much better. A 68000 SNES may have killed the Mega Drive and a 68020 may have even been possible as the CD32 delivered 3 years later. Console developers eventually learned the lesson of difficult to program hardware with hypothetical peak performance that was rarely realized.
Hammer Quote:
Commodore's bad 386 inventory control caused inventory write down.
|
It is necessary to buy CPUs in quantity to get quantity discounts. Buying many 386 CPUs probably looks bad in hindsight but Commodore was likely not the only one caught with outdated CPUs. Is it better to be like Trevor and pay high prices for a few hundred embedded PPC SoCs and choose castrated versions to try to make up for the lack of quantity discounts? How do you compete against the likes of RPi, SiFive/StarFive, DMP Electronics and WDC that have fabless semi development and produce SoCs starting at less than $1 USD enabling SBCs for less than $100 USD?
Hammer Quote:
Pentium 100 Mhz vs Cyrix 6x86 100 MHz and both have 50 MHz 64-bit FSB.
|
How many of the games and benchmarks were compiled and/or optimized for the Pentium? You don't think Intel played fair after Operation Crush lies do you?
Hammer Quote:
Unlike 68060, Cyrix 6x86 has 16 bytes per cycle fetch from L1 instruction cache.
...
https://archive95.net/view-amigaplus_n/http://www.newtek.com/tech/faqs/lightwav/multi/bench.html Lightwave benchmarks
...
Raytrace.lws
Amiga (Phase5/060): 5hr 28min. 14 sec.
MAC 8500/150mhz = 31 min. 51 sec.
Intel P5/166 = 37 min. 43 sec
Intel PP200 = 20 min. 24 sec
Dual PP200 = 12 min. 20 sec
Alpha 366 MHz = 13 min. 48 sec
Alpha 500 MHz = 12 min. 34 sec
SGI 02 R10K = 23 min. 58 sec.
Ultrasparc 167 MHz = 26 min 57 sec.
|
The 68060 must be horrible!
Hammer Quote:
Other Raytrace.lws benchmark results with low resolution settings from Hold and Modify on YouTube
A3000, Phase5-MKII (060@50), OS3.2.1 = 52m 24s
A1200, TF1260 (060@50)128MB, OS3.2.1 = 54m 22s
A4000, BFG060 (rev5) 060@50, 128MB, OS3.2.1 = 43m 46s
A2500, PP&S (040@25), OS3.3.2.1 = 2h 2m 8s
A1200/Vamp v2 Coffin56/Gold2.12fw - 1526s (25m 26s)
Amiga4000D 3.2.1 WarpEngine060@96MHz - 1477s (24m 37s)
Vampire V4SA+ Coffin57/SA_8435.jic (x14) - 1405s (23m 25s)
Amiga4000T 3.2.1, BFG9060@100MHz - 1358s (22m 38s)
A4000D, TF4060, 060@100 (Rev6), AmigaOS 3.2.2.1 - 28 minutes
|
But then the AC68080@~100MHz is worse than the 68060@100MHz with a 4B/cycle fetch, a 6B/cycle max instruction size for superscalar execution, a minimalist FPU which is not pipelined and only 8 GP FPU registers. The 68060 improved from a pathetic 5hr 28min for the 68060@50MHz to 22m 38s for the 68060@100MHz easily killing the P5 Pentium@166MHz with 37 min 43 sec and even coming close to the OoO P6 Pentium@200MHz? With that kind of improvement, forget the AC68080 and clock up the Pentium killer 68060!
What happened to the 68060 4B/cycle fetch handicap?
http://apollo-core.com/knowledge.php?b=2&note=32570&z=nGKBJI Gunnar von Boehn Quote:
A1200 coder Quote:
68060 2-3 instructions per clock (yes, Motorola manuals actually mention somewhere 3 instructions per clock)
|
You can answer this very easily.
68060 Icache does provides at maximum 32bit per cycle. 32bit is enough for maximal 2 of the shortest 68k instructions. 68k instructions have a size of 16bit or longer.
As many 68k instructions are 32bit or 48bit or longer the 68060 is often limited by this. Motorola was aware of this bottleneck and the 68060-dev-team wanted actually to double the Icache fetch ... but this never happened.
|
Gunnar should understand how the decoupled instruction fetch pipeline (IFP) and execution pipelines (OEPs) work as we discussed it as well as the 6B/cycle superscalar execution limit. Then again he did not think much of it or adding it to the AC68080 despite borrowing other ideas.
FPU instructions are 4B minimum yet this is not a FPU handicap? What about the lack of a pipelined FPU and only 8 FPU registers?
http://apollo-core.com/knowledge.php?b=2&note=40575&z=EKsqLH Gunnar von Boehn Quote:
A1200 coder Quote:
I know that even 68060 was better at integer performance than Pentium 1, but the FPU wasn't as good,
|
Many people say this but I'm not so sure this is true. I would say the 68000 FPU is generally more flexible and a lot better to code. The 68K has 8 FPU register it can use as destination. It can use the 8 FPU register, the 8 Data register and memory and immediate as source. This is very flexible and nice to program. The Intel FPU has less coding options, which means it often needs to do more complicated and needs more instructions for doing the same. Therefore the 68K FPU is better ... If you look at the cycles then some instructions are a little faster on intel, other instructions are a cycle or two faster on 060. In general I would say they are on the same ballpark in performance. Comparing with the Apollo 68080 FPU .. its very clear that the 080 FPU is by far the best. It can use 32 FPU register as destination. Can use 40 Register as source, plus immediates, plus memory The 68080 FPU can also reach a lot higher performance.
some example of peak FLOPS 68060@50 Pentium@90 68080@90
        68060   Pentium   68080
FNEG    50      90        90
FADD    17      90        90
FMUL    17      90        90
FDIV    1.5     3         90
FSQRT   0.7     1         90
The Pentium looks on paper better than in real world. In real world you often need an extra instruction to compensate its design weakness.
If you factor this in then its more like:
        68060   Pentium   68080
FNEG    50      45        90
FADD    17      45        90
FMUL    17      45        90
FDIV    1.5     3         90
FSQRT   0.7     1         90
|
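The arithmetic behind the two quoted tables can be sketched roughly as follows. This assumes the figures are millions of operations per second at the stated clocks (the quote only says "peak FLOPS"), and that the Pentium adjustment is one extra instruction per useful operation, which is what halving 90 to 45 implies. The names are hypothetical.

```python
# Figures copied from the first quoted table; units assumed to be
# millions of operations per second at the listed clock speeds.
CLOCK_MHZ = {"68060": 50, "Pentium": 90, "68080": 90}
PEAK = {
    "FNEG": {"68060": 50, "Pentium": 90, "68080": 90},
    "FADD": {"68060": 17, "Pentium": 90, "68080": 90},
    "FMUL": {"68060": 17, "Pentium": 90, "68080": 90},
    "FDIV": {"68060": 1.5, "Pentium": 3, "68080": 90},
    "FSQRT": {"68060": 0.7, "Pentium": 1, "68080": 90},
}


def implied_cycles(cpu, op):
    """Cycles per operation implied by the clock and the quoted peak rate."""
    return CLOCK_MHZ[cpu] / PEAK[op][cpu]


def derated(peak, instructions_per_useful_op):
    """Effective rate when each useful op needs extra helper instructions."""
    return peak / instructions_per_useful_op


for op in PEAK:
    cycles = {cpu: round(implied_cycles(cpu, op), 1) for cpu in CLOCK_MHZ}
    print(op, cycles)

# One helper instruction per useful op halves the Pentium's quoted peak.
print(derated(PEAK["FADD"]["Pentium"], 2))
```

Read this way, the first table claims about 3 cycles per FADD/FMUL on the 68060 and 1 cycle per operation for everything on the 68080, which is a throughput claim for a pipelined unit rather than a latency.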
The AC68080 has 32 GP FPU registers and pipelined FADD, FMUL, FDIV and FSQRT instructions. It has 16B/cycle fetch, larger caches and DDR3 memory. Where is the performance then? Perhaps the wonderful performance does not work with existing 68k code? Perhaps a recompile is needed with non-existent compiler support for the AC68080?
Most of the superscalar CPUs above really are fetching 16B/cycle of instructions. The 68060 4B/cycle fetch handicap does exist though.

 http://apollo-core.com/minibench/index.htm?page=benchmarks
These are Gunnar's MiniBench results. The 68060 "add-im16" and "And-im16" results would be all the way up with the "add-reg" and "And-reg" results with an 8B/cycle fetch. Like SysInfo, this is a synthetic benchmark with unrealistic code. If an "add.w #d16,Dn" that is 4B is repeated enough times, the instruction buffer will empty and the 68060 will drop from superscalar executing 2x "add.w #d16,Dn" per cycle to executing a single "add.w #d16,Dn" per cycle. It is odd that he does not have an "add.l #d32,Dn" test which would drain the instruction buffer faster with dual 6B instructions being executed. In realistic code, the 68060 may lose a few cycles when the instruction buffer starts to fill and when superscalar executing a sustained number of large instructions. Performance can be good with just a 4B/cycle fetch and is not a major bottleneck as 68060 performance results show in some software despite the real handicap which is 68060 compiler support. However, an 8B/cycle fetch coupled with 8B superscalar instruction execution, larger caches and other improvements should have synergies to provide a nice performance boost.
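The buffer-drain behaviour described above can be illustrated with a toy model. This is a rough sketch, not a cycle-accurate 68060: the 64-byte buffer capacity, the buffer starting full and the absence of any pairing rules are all assumptions, and `simulate` is a hypothetical helper.

```python
def simulate(fetch_bytes, instr_bytes, cycles, buffer_capacity=64, issue_width=2):
    """Return instructions issued per cycle for a toy decoupled fetch/execute model.

    fetch_bytes     -- bytes the fetch stage adds to the buffer each cycle
    instr_bytes     -- size of the single instruction repeated by the test loop
    buffer_capacity -- assumed instruction buffer size in bytes (hypothetical)
    issue_width     -- maximum instructions issued per cycle
    """
    buffered = buffer_capacity  # assume the buffer starts full
    issued_per_cycle = []
    for _ in range(cycles):
        # Fetch tops up the buffer, limited by its capacity.
        buffered = min(buffer_capacity, buffered + fetch_bytes)
        # Execute issues as many whole instructions as are available.
        issued = min(issue_width, buffered // instr_bytes)
        buffered -= issued * instr_bytes
        issued_per_cycle.append(issued)
    return issued_per_cycle


add_reg = simulate(fetch_bytes=4, instr_bytes=2, cycles=40)   # 2B register form
add_im16 = simulate(fetch_bytes=4, instr_bytes=4, cycles=40)  # 4B "add.w #d16,Dn"
add_im32 = simulate(fetch_bytes=4, instr_bytes=6, cycles=40)  # 6B "add.l #d32,Dn"
wide_fetch = simulate(fetch_bytes=8, instr_bytes=4, cycles=40)

print("2B @ 4B/cycle:", sum(add_reg))
print("4B @ 4B/cycle:", sum(add_im16))
print("6B @ 4B/cycle:", sum(add_im32))
print("4B @ 8B/cycle:", sum(wide_fetch))
```

In this model the 4B case issues two instructions per cycle until the buffer drains and then settles at one per cycle, the 6B case drains sooner, and the 8B/cycle fetch sustains dual issue of 4B instructions indefinitely. Real code mixes instruction sizes and takes branches, so the buffer gets chances to refill that a repetitive synthetic loop never provides.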
|
| Status: Offline |
| | matthey
|  |
Re: 32-bit PPC on FPGA Posted on 12-Mar-2025 2:16:49
| | [ #443 ] |
| |
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2602
From: Kansas | | |
|
| Heimdall Quote:
Thanks for the Gunnar's thread. That answers all my questions about the FPGA implementation.
I have to say I am surprised, though. While I don't know anything about FPGA coding, I don't quite understand how come the microcode of RISC's instructions takes more FPGA space/gates to implement than CISC instructions ?
Or is it because there's so many RISC instructions on 603e, perhaps ? Meaning, it just all adds up and you need a giant board just to fit it all in?
|
PPC is not a small RISC ISA. It had more instructions, more complex instructions and more addressing modes than most RISC ISAs when it was released although it is nothing like most ARM cores today which support the original ARM ISA, Thumb ISA, Thumb-2 ISA and AArch64 ISA with AArch64 alone being several times the number of instructions and addressing modes of PPC plus a SIMD unit. The standard PPC ISA included 32 GP 32-bit or 64-bit registers and a FPU with 32 GP 64-bit registers. The limited OoO PPC603 was actually a little smaller at 1.7 million transistors than the 68060 at 2.5 million both with 8kiB caches but the 68060 had more features.
68060 features that increased transistors compared to the PPC603
8-stage pipeline vs 4-stage PPC603 pipeline
dynamic branch prediction
2nd AGU+ALU integer unit & load/store unit combo with 2nd barrel shifter
multi-banked cache
more cache associative ways

PPC603 features that increased transistors compared to the 68060
limited OoO with unit reservation stations and a completion unit reorder buffer
32 GP 32-bit integer registers vs 16 GP 32-bit integer registers for the 68060
32 GP 64-bit FPU registers vs 8 GP 80-bit FPU registers for the 68060
The PPC603 had disappointing performance and the PPC603e with double the caches, more associative ways and a die shrink quickly replaced it but the core grew to 2.6 million transistors. The 68060 was more of a cache miser due to much better code density and still had similar performance efficiency (performance/MHz). The 68060 should have been able to clock higher using the same chip process due to the deeper pipeline though. The PPC shallow pipeline limited OoO designs did boost performance over ARM designs but the low clock speeds limited performance and Steve Jobs was not happy choosing to switch to deeper pipelined x86 cores while ignoring the 68060 which would have saved Apple from switching to PPC in the first place. The low power PPC603 design received a 2nd integer unit, branch prediction and more caches to become G3 designs but it did not receive the deeper pipeline for higher clock speeds until late in the short PPC desktop life.
Migrating from IBM 750GX to MPC7447A https://www.nxp.com/docs/en/application-note/AN2797.pdf Quote:
2.2 Pipeline Comparison
The difference in pipeline depths between the IBM 750GX and MPC7447A is significant. With the IBM 750GX, the minimum depth has been kept to a rather short four stages of instruction; fetch, dispatch/decode, execute, and complete. Write back is included in the complete stage. The pipeline diagram for the IBM 750GX is shown in Figure 3.
Figure 3 shows a maximum depth of six stages using the floating-point unit. If branch prediction does not work well for a particular application, having a short pipeline is advantageous due to a fairly small pipeline flushing penalty. However, branch prediction and modern compilers can, more often than not, prevent frequent pipeline flushes. As a result, the completion rate of two instruction retirements per clock becomes more of a performance bottleneck. It is also worth noting that the IBM 750GX will not be able to sustain clock rates of much greater than 1.1GHz without increasing the depth of the pipeline.
With a minimum depth of seven stages, the MPC7447A pipeline, shown in Figure 4, boasts efficient use of its additional hardware resources by dispatching three instructions per cycle to its execution units as well as the ability to retire three instructions per cycle. Due to the higher maximum frequency of the 7447A (up to 1.5GHz) the extra pipeline depth is required to make efficient use of faster running pipeline stage hardware, reducing the latency of certain instructions, such as many floating point and complex integer instructions. Compilers can take advantage of the extended pipeline to ensure that the target maximum of 16 instructions in flight at any one time is achieved as closely as possible.
|
The G4 PPC MPC7447A 7-stage pipeline allowed the clock speed to increase to 1.5 GHz from the G3 PPC 750GX 1.1 GHz clock speed with a 4-stage pipeline which is a 36% improvement. The 68060 8-stage pipeline likewise should have clocked about 50% higher than the PPC603(e)/G3 4-stage pipeline but Motorola/Freescale did not allow that to happen because of the AIM to push their PPC shallow pipeline designs. The limited OoO PPC design was not as efficient in some ways with the deeper pipeline judging from Gunnar's MiniBench results.
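The clock arithmetic above can be checked with a few lines. The linear extrapolation to an 8-stage pipeline is a deliberately naive assumption added here for illustration, since clock speed does not really scale linearly with stage count; the names are hypothetical.

```python
def percent_gain(new, old):
    """Percentage increase from old to new."""
    return (new / old - 1) * 100


# Figures from the application note quoted above.
G3_STAGES, G3_GHZ = 4, 1.1   # IBM 750GX
G4_STAGES, G4_GHZ = 7, 1.5   # MPC7447A

g4_over_g3 = percent_gain(G4_GHZ, G3_GHZ)

# Naive assumption: each extra pipeline stage buys the same clock gain.
gain_per_stage = g4_over_g3 / (G4_STAGES - G3_STAGES)
eight_stage_gain = gain_per_stage * (8 - G3_STAGES)

print(f"G4 over G3: {g4_over_g3:.1f}%")
print(f"Per extra stage: {gain_per_stage:.1f}%")
print(f"Naive 8-stage estimate over a 4-stage design: {eight_stage_gain:.1f}%")
```

Under that simple assumption an 8-stage design lands a little under 50% above a 4-stage design on the same process, which is in the same range as the estimate in the post.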

 http://apollo-core.com/minibench/index.htm?page=benchmarks
The 4-stage PPC G3 has a nice consistent performance profile while I suspect the 7-stage PPC G4 may have an increased load-to-use penalty that the limited OoO design is no longer able to remove from the "Work-LA" and similar tests. Branch prediction is worse at least partially due to the deeper pipeline but may have other problems. Without good branch prediction, loop overhead is high and loops need to be unrolled a lot to get good performance which is bad for code density. The 68060 has a deeper pipeline and much better branch prediction and lower loop overhead.

The 68060 has a great performance profile for as old as the design is. The "add-im16" and "And-im16" tests are only bad because of the unrealistic repetitive code with large instructions and would be fixed by an 8B/cycle fetch like the ColdFire V5 added anyway. The Work tests may be down some for the same reason. The "gosub" test could be better because the 68060 lacks a hardware return stack which the later CPUs have and the ColdFire V5 added as well.
A stripped down in-order superscalar PPC CPU with a very deep pipeline for a high clock speed of 3.2 GHz in the PS3 and a higher clock speed in a FPGA looks like the following.

The shallow pipeline limited OoO PPC designs were effective but did not scale to deeper pipelines well. There are other PPC603 limited OoO designs in Gunnar's benchmark list including the Efika 5200B which looks like it has a 2nd integer unit but only one barrel shifter compared to the e300/5121 which also adds the 2nd barrel shifter and is a big upgrade from the original PPC603 and is quite acceptable for embedded use other than the poor code density and much larger footprint compared to the 68k.
Heimdall Quote:
I would love to see the benchmark code they used, though. It's not easy to create a fair benchmark that is 50/50 fair to both CISCs and RISCs.
I should probably educate myself further on this topic. Yet another rabbit hole to go down instead of coding my games 
|
One of the benchmarks that the Apollo Team worked on improving for the competition was the DMIPS benchmark. Some of the benchmarks may have been more about compliance than performance. I believe the PPC core "won" the competition even though it did not have as much integer performance and was likely a larger core. The competition was before the Apollo core had a FPU which may have been important. It is not really surprising that IBM would choose a PPC core they designed and are familiar with over some unfamiliar design and architecture.
Last edited by matthey on 12-Mar-2025 at 10:16 AM.
|
| Status: Offline |
| | Hammer
 |  |
Re: 32-bit PPC on FPGA Posted on 12-Mar-2025 5:54:51
| | [ #444 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6320
From: Australia | | |
|
| @matthey
Quote:
The SNES Ricoh 5A22 CPU MIPS is roughly equivalent to the 68000 in the Amiga and Mega Drive but the 68000 is much easier to program and has a large advantage with 16-bit and 32-bit datatypes. A good example of SNES sluggishness, inferior parallax scrolling and smaller sprites with less animation is Shadow of the Beast.
Shadow of the Beast | Amiga, MD, PCE, SNES | Comparison - Quad Longplay https://www.youtube.com/watch?v=QUT91K4mPlw
|
Mega Drive and SNES can do better than that, e.g. Mortal Kombat.
SNES and Mega Drive have discrete 64KB VRAM (designed for 320x200 level resolution).
Quote:
The SNES colors are good when looking at a static screen and the music is ok but the playability is the worst among the 4 consoles above. The Mega Drive version is almost as good as the Amiga version and seems fast including scrolling, maybe even faster than the Amiga version. The Amiga version looks like the highest resolution, remains a good all around contender and has the best music and sound. The SNES may have the best hardware other than the CPU and the SNES version of the game could likely be optimized more but this is where the 68000 is so much better. A 68000 SNES may have killed the Mega Drive and a 68020 may have even been possible as the CD32 delivered 3 years later. Console developers eventually learned the lesson of difficult to program hardware with hypothetical peak performance that was rarely realized.
|
Reminders, 1. Mac LC-II with 68030-16 on 16-bit system bus and discrete 256K VRAM graphics was Apple's best selling model in 1992, from a total of 2.5 million Mac 1992 sales.
Jeff Porter's 1991 AA1000Plus's 68EC020-16 on 32bit system bus and UMA graphics inside a pizza box case was Porter's Jackintosh move against Apple's successful Mac LC (68020-16Mhz on 16bit bus and discrete 256K VRAM graphics).
Under USD$1000 AA1000Plus is important to shift the Amiga platform into Mac LC's mass production and profit levels.
AA1000Plus's near pizza case design looks like a desktop Mac LCish, instead of looking like a toy.
2. $50 from every A1200 unit was allocated to pay for the A600's old debts, hence there was very little room to maneuver.
A600's 1 million scale mass production has drained CBM's cash at the bank and maxed CBM's credit limit.
A300/A600 project is a large mistake.
Quote:
It is necessary to buy CPUs in quantity to get quantity discounts. Buying many 386 CPUs probably looks bad in hindsight but Commodore was likely not the only one caught with outdated CPUs. Is it better to be like Trevor and pay high prices for a few hundred embedded PPC SoCs and choose castrated versions to try to make up for the lack of quantity discounts? How do you compete against the likes of RPi, SiFive/StarFive, DMP Electronics and WDC that have fabless semi development and produce SoCs starting at less than $1 USD enabling SBCs for less than $100 USD?
|
Blame Commodore PC's Jeff Frank for the f--kup.
Quote:
How many of the games and benchmarks were compiled and/or optimized for the Pentium? You don't think Intel played fair after Operation Crush lies do you?
|
For IBM PC 5150, 8088 and 8087 are real while 68008 is late.
8087 handles INT32, INT64, FP32, FP64 and FP80 datatypes for the IBM PC 5150.
Motorola wouldn't have 68881 until the 1984 release.
PC's Lotus 123 2.0 supports 8087/80287 from 1985. Lotus spreadsheet GUI for Mac was late, which allowed MS Excel GUI to establish a foothold for the next gen GUI spreadsheet application market.
Apple's 1986 Mac Plus' 68000 has 68881 support. Commodore was late on this area since A2620's 68020/68881 release was attached with 68551's late release while Apple released Macintosh II (68020/68881) in 1987.
Quote:
The 68060 must be horrible!
|
An example of ecosystem f__kup. Needs OS related patches to get 68060 to behave correctly.
Emu68 and AC68080 don't need 68060 OS patches. A firmware matured AC68080 V2/V4 just works.
Quote:
But then the AC68080@~100MHz is worse than the 68060@100MHz with a 4B/cycle fetch, a 6B/cycle max instruction size for superscalar execution, a minimalist FPU which is not pipelined and only 8 GP FPU registers. The 68060 improved from a pathetic 5hr 28min for the 68060@50MHz to 22m 38s for the 68060@100MHz easily killing the P5 Pentium@166MHz with 37 min 43 sec (SNIP)
|
As I stated, Hold and Modify's LW benchmark has low resolution setting, hence note the separation between the two benchmark groups.
The benchmark group with SGI has undisclosed LW setting, hence they are NOT directly comparable to the EAB benchmark group.
Using Hold and Modify LW's settings
https://eab.abime.net/showpost.php?p=1667864&postcount=109
Date: February 2024
V4 Standalone Core 9128 12x = 1419s (23m 39s)
V4 Standalone Core 1063 14x = 1099s (18m 19s)

https://eab.abime.net/showpost.php?p=1605865&postcount=91
Date: March 2023
Amiga4000D 3.2.1 WarpEngine060 @96MHz - 1477s (24m 37s)
Amiga4000T 3.2.1, BFG9060 @100MHz - 1358s (22m 38s)

V4 Standalone Core 1063 14x is faster than BFG9060 @100MHz.
68080 V4's firmware took some time to mature.
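A quick way to see why the two benchmark groups should not be mixed is to convert the quoted times to seconds and compare ratios. This is a small sketch using only times quoted in this thread; `to_seconds` is a hypothetical helper written for the time formats that appear here.

```python
import re

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(text):
    """Parse strings such as '5hr 28min. 14 sec.' or '22m 38s' into seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(hr|h|min|m|sec|s)", text):
        total += int(value) * UNIT_SECONDS[unit[0]]
    return total


newtek_060_50 = to_seconds("5hr 28min. 14 sec.")  # Amiga (Phase5/060), NewTek list
eab_060_100 = to_seconds("22m 38s")               # BFG9060 @100MHz, EAB list
eab_v4_14x = to_seconds("18m 19s")                # V4 Standalone Core 1063 14x

cross_group_ratio = newtek_060_50 / eab_060_100
v4_vs_bfg = eab_060_100 / eab_v4_14x

print(f"060@50 (NewTek) vs 060@100 (EAB): {cross_group_ratio:.1f}x")
print(f"V4 core 1063 14x vs BFG9060@100: {v4_vs_bfg:.2f}x")
```

Doubling the clock cannot account for a roughly 14.5x difference between the two 68060 results, which is consistent with the two groups having been run with different Lightwave settings. Within the EAB group the V4 result is about 1.24x faster than the BFG9060.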
AC68080 only has a single FPU, not P6's split FADD and FMUL pipelines design.
AC68080 V2 quickly targeted Quake. Recall non-IEEE 52bit FP "Quake enabler" for AC68080 V2.
Quake is a mixed integer and floating point program that exploits concurrent pipelines.
Both 68060 and AC68080 have a single FPU, hence they are closer when the workload is heavy FP.
https://groups.google.com/g/comp.sys.amiga.misc/c/jWErDssEr5A/m/ATxHFW6zADwJ Date: 16-Jul-95 Benchmarks printed in Video Toaster User magazine show the '060 to be about half the speed of a P90.
Last edited by Hammer on 22-Mar-2025 at 10:07 PM.
_________________ Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68) Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68) Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB |
| Status: Offline |
| | Hammer
 |  |
Re: 32-bit PPC on FPGA Posted on 12-Mar-2025 7:55:40
| | [ #445 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6320
From: Australia | | |
|
| @matthey
Quote:
Gunnar should understand how the decoupled instruction fetch pipeline (IFP) and execution pipelines (OEPs) work as we discussed it as well as the 6B/cycle superscalar execution limit. Then again he did not think much of it or adding it to the AC68080 despite borrowing other ideas.
FPU instructions are 4B minimum yet this is not a FPU handicap? What about the lack of a pipelined FPU and only 8 FPU registers?
|
68060's 4 byte fetch per cycle problem is for superscalar, i.e. concurrent multiple pipeline operations.
A heavy floating point workload like Lightwave is not Quake's mixed integer and floating point that exploits concurrency.
A heavy floating point workload with a single port FPU largely negates superscalar.
Quote:
The AC68080 has 32 GP FPU registers and pipelined FADD, FMUL, FDIV and FSQRT instructions. It has 16B/cycle fetch, larger caches and DDR3 memory. Where is the performance then? Perhaps the wonderful performance does not work with existing 68k code? Perhaps a recompile is needed with non-existent compiler support for the AC68080?
|
32 FPR wouldn't work with legacy Amiga apps with 68K FPU support since they are designed with 8 FPR in mind. It's the same for X86-64's 16 SSE2 registers needing new software.
Out-of-order processing is linked with register renaming and works for legacy software.
AC68080 has two ALU/AGU, a FPU and AMMX units.
AC68080's power is shown with Quake's concurrent multiple pipeline exploits, which need a mix of integer and floating workloads. AC68080 V4 is effectively a Quake-optimized 68K clone CPU.
Remember, DirectX8 class 3D has floating-point vertex and fixed-point integer shader workloads. DirectX7 has a fixed-function version of floating-point T&L and fixed-point integer pixel pipelines, which is a 3D acceleration step from Quake's mixed integer and floating-point workload.
K7 Athlon / Core 2 class CPU has multiple ports to serve superscalar FADD and FMUL pipelines.
https://www.tomshardware.com/reviews/athlon-processor,121-8.html
The number one reason why Athlon can play in the same ballpark as the Intel CPUs is the fact that Athlon's FPU is now fully pipelined vs. the unpipelined FPU of K6, K6-2 and K6-3. That's not all however. Athlon has got three parallel FP execution units and, as we know from above, the three execution units can be fed at the same time, since each of them has its own port. Pentium III has also got 3 FP execution units, but unfortunately they're all behind one port. What is so great about the Athlon FPU is that it can execute two 80-bit extended operations a clock to Intel's one.
AC68080 V4 is not yet at K7 Athlon level, perhaps the future AC68080 evolution.
Quote:
These are Gunnar's MiniBench results. The 68060 "add-im16" and "And-im16" results would be all the way up with the "add-reg" and "And-reg" results with an 8B/cycle fetch. Like SysInfo, this is a synthetic benchmark with unrealistic code.
|
The 68060 behaving like a higher clocked 68040 is real for the TheForceEngine 68K Amiga port.
Quote:
If an "add.w #d16,Dn" that is 4B is repeated enough times, the instruction buffer will empty and the 68060 will drop from superscalar executing 2x "add.w #d16,Dn" per cycle to executing a single "add.w #d16,Dn" per cycle. It is odd that he does not have an "add.l #d32,Dn" test which would drain the instruction buffer faster with dual 6B instructions being executed. In realistic code, the 68060 may lose a few cycles when the instruction buffer starts to fill and when superscalar executing a sustained number of large instructions. Performance can be good with just a 4B/cycle fetch and is not a major bottleneck as 68060 performance results show in some software despite the real handicap which is 68060 compiler support. However, an 8B/cycle fetch coupled with 8B superscalar instruction execution, larger caches and other improvements should have synergies to provide a nice performance boost.
|
Why are you defending the obsolete 68060 design?
Classic Pentium / 68060 era CPU has FPU being attached to one of the integer pipelines.
ARM Cortex A53's dual FPU pipelines have a multi-port dispatcher unit with 5 ports. Multi-port dispatcher unit with 5 ports feeds
Pipe 0: ALU
Pipe 1: ALU / INT MUL / Branch
Pipe 2: AGU
Pipe 3: FADD / FMUL / FMA / NEON-A
Pipe 4: FADD / FMUL / FMA / NEON-B
Pipe 3 and 4 combine to form 128-bit FP/INT NEON unit.
ARM Cortex A53 has 8 bytes per cycle from L1 instruction cache into 2-way decoder.
Cortex A53's 3-cycle instruction fetch latency is for high clock speed like on the Intel Atom Bonnell. https://images.anandtech.com/reviews/cpu/intel/atom/deepdive/pipeline.jpg Intel Atom Bonnell has a 16-stage pipeline for high clock speed. Note that this is longer than the Core 2's 14 stage pipeline.
Cortex A55 has lowered instruction fetch latency to 2 cycles on a newer process node. For Bonnell's process node target, Intel's Austin design team traded latency for higher clock speed attainment.
There are valid reasons for Cortex A53's instruction fetch 3 cycle latency on a given process node.
68K ISA performance is dependent on microarchitecture implementation e.g. Athlon'ed 68K is superior when compared to 68060. ZEN'ed 68K is superior when compared to 68060. Core 2'ed 68K is superior when compared to 68060.
Last edited by Hammer on 12-Mar-2025 at 08:51 PM.
_________________ Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68) Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68) Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB |
| Status: Offline |
| | matthey
|  |
Re: 32-bit PPC on FPGA Posted on 13-Mar-2025 1:09:18
| | [ #446 ] |
| |
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2602
From: Kansas | | |
|
| Hammer Quote:
Mega Drive and SNES can do better than that, e.g. Mortal Kombat.
SNES and Mega Drive have discrete 64KB VRAM (designed for 320x200 level resolution).
|
Unified memory and an integrated chipset were the future which is now integrated SoCs! Jay Miner planned the Amiga correctly but Commodore failed to integrate and produce the 68k Amiga SoC as they planned so they failed. The advantages the 68000 Amiga still had on these later consoles were more chip memory and more storage. SNES had 64kiB limits for graphics and sound and it forced reductions from the earlier 68000 Amiga that had unified memory and a large flat address space. SNES ports often have the lowest resolution among competitors. Sound samples sometimes need to be shortened to fit in the discrete memories. More data needs to be moved around and memory is wasted. The 68k Amiga could have had more of an advantage with chipset enhancements like more chip memory, sprite flipping and HD floppy support giving more unified memory and increasing storage but Commodore upper management was incompetent and corrupt like A-EonKit. They will cut any corner to keep from professional development in the needed direction which is integration.
Hammer Quote:
For IBM PC 5150, 8088 and 8087 are real while 68008 is late.
|
A 68000 is a far cheaper solution than an Intel 8088 with 1980 8087 and can do acceptable lower precision software FP for spreadsheets. The Motorola 6809 with 1978 6839 FP chip may have been closer to fulfilling IBM's low cost PC requirements.
Floating Point - Past, present and Future https://youtu.be/LuKBvsvkzEs?list=PLISEtDmihMo1-ADUicHo5hl7RJRQ5ilJT&t=210 Quote:
Back 1978 I was working at Motorola and we did a FP ROM for the 6809 MPU probably don't know anything about that MPU that it came close to being the one used in the Intel PC for the IBM PC way back when.
|
It was not so much that the 8087 FPU was first but it was likely the best available at the time. William Kahan, "the Father of FP", helped develop the FP standard including IEEE FP standard for it. Specialized mathematicians and computer scientists worked on it rather than normal programmers. It was an intellectual pursuit to improve FP which was successful in many ways. Amiga technology or should I say lack of technology or deteriorating technology is the exact opposite. "The Father of FP" was wrong about extended precision FP to maintain double precision accuracy with intermediate calculations, even though Intel was still proud of the Pentium FPU if you keep watching the video above. Compatibility is not important as emulation FPUs and even 68k FPGA cores just castrate the extended precision FPU to double precision or better yet replace it with the standard double precision PPC FPU only to remove it entirely. "The father of the Amiga" was wrong about the 68k and integration too. Modern Amiga times require cutting corners and cutting compatibility for niche market competitiveness after all. It is kind of funny that Intel was still so proud of the Pentium FPU so close to replacing it and GCC refusing to properly support the extended precision because the FPU was deprecated. The Pentium even had the FDIV bug, the poor FSIN precision and the horrible stack register based ISA. The 68k FPU is so much better and so close to realizing the full potential of the extended precision FPU but even Volker does not want to improve the VBCC 68k backend with no real hardware and corner cutting hardware only supporting double precision. Compilers do not support EOL emulation and FPGA CPU cores.
Hammer Quote:
8087 handles INT32, INT64, FP32, FP64 and FP80 datatypes for the IBM PC 5150.
|
The 68k FPU supports FP32, FP64, FP80/FP96, INT8, INT16 and INT32 datatypes. Most 68k FPU instructions support immediates, data registers and all the FPU registers which x86 FPUs do not. The x86 FPU ISA is crap compared to the 68k FPU ISA!
Hammer Quote:
An example of ecosystem f__kup. Needs OS related patches to get 68060 to behave correctly.
Emu68 and AC68080 don't need 68060 OS patches. A firmware matured AC68080 V2/V4 just works.
|
The 68060 FPU is compatible with the 68k FPU ISA even though instructions are trapped which affects performance not accuracy.
https://www.nxp.com/docs/en/data-sheet/MC68060UM.pdf Quote:
C.3.1 Floating-Point Emulation Results
All numerical results and condition code settings produced by the M68060FPSP and visible to the user are identical to those produced by the MC68881/882 and MC68040 with the following exception: the M68060FPSP transcendental calculation results are not the same as for the MC68881/882, because the algorithms used in the MC68881/882 (CORDIC) cannot be effectively implemented in software. However, the error bound of the M68060FPSP transcendental routines (same as for the MC68040 routines) are equivalent or superior.
For floating-point arithmetic instructions, the error bound is one-half unit in the last place of the destination format in the round-to-nearest mode, and one unit in the last place in the other rounding modes. Transcendental instructions have an error bound of less than 0.6 unit in the last place of double precision. The error bound for decimal conversions is 0.97 unit in the destination precision for the round-to-nearest mode and 1.47 units in the last digit of the destination precision for the other rounding modes.
|
The only way the 68060 is able to maintain this accuracy is due to retaining the extended precision FPU! Any FPU that reduces precision to double precision is incompatible with the 68k FPU ISA! There is existing 68k FPU code that will fail!
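To put the precision gap in numbers, here is a rough sketch (my own illustration, not something from the Motorola manual) of the spacing between adjacent values near 1.0 for double precision (53-bit significand) and for the 68k extended format (64-bit significand). The extended figure is derived from the significand width rather than measured on hardware, and the helper name `ulp_at_one` is made up for this example.

```python
import sys

def ulp_at_one(significand_bits):
    """Spacing between 1.0 and the next representable value for a binary
    format whose significand holds `significand_bits` bits, counting the
    leading integer bit."""
    return 2.0 ** -(significand_bits - 1)

DOUBLE_BITS = 53     # IEEE 754 double precision
EXTENDED_BITS = 64   # 68k extended precision (assumed 64-bit significand)

ulp_double = ulp_at_one(DOUBLE_BITS)
ulp_extended = ulp_at_one(EXTENDED_BITS)

# Sanity check against the host's own double precision epsilon.
assert ulp_double == sys.float_info.epsilon

print(f"double   ulp(1.0) = {ulp_double:.3e}")
print(f"extended ulp(1.0) = {ulp_extended:.3e}")
print(f"extended spacing is {ulp_double / ulp_extended:.0f}x finer")
```

The 11 extra significand bits are the headroom the argument above relies on: intermediate results kept in extended precision carry more bits than the double precision result they are finally rounded to.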
Hammer Quote:
AC68080 only has a single FPU, not P6's split FADD and FMUL pipelines design.
AC68080 V2 quickly targeted Quake. Recall non-IEEE 52bit FP "Quake enabler" for AC68080 V2.
Quake is a mixed integer and floating point program that exploits concurrent pipelines.
Both 68060 and AC68080 have a single FPU, hence they are closer when the workload is heavy FP.
|
The AC68080 likely has separate FADD/FSUB, FMUL and FDIV/FSQRT pipelines which is a significant advantage but the 68060 likely has shorter latencies. The problem for the AC68080 is that a FPGA core needs deep pipelines like a high clocked CPU in an ASIC and like the 14-stage P6 Pentium's deeper FPU pipeline. On paper, the P6 Pentium FPU is much better than the shallow pipeline P5 Pentium FPU but latencies increased with the deeper pipeline and a recompile is required to take advantage of this much like the AC68080. The P6 Pentium has the added disadvantage of not enough FPU registers for deep FPU pipelines and the AC68080 has the added disadvantage of no compiler support.
Hammer Quote:
I have seen benchmark results where the Pentium blows away the 68060 without patching the trapped instructions to the 68060 blowing away the Pentium with a carefully configured 68060 system. Lightwave is compiled with SAS/C which barely has any changes for the 68060 and has primitive FPU support. The Pentium FPU suffers from poor compiler support for the horrible FPU too. The 68060 has a clear advantage with integer code, mixed integer and FPU code is about a tie and the P5 Pentium has a small advantage with FPU heavy code when compiler support is brought into the equation.
Hammer Quote:
68060's 4 byte fetch per cycle problem is for superscalar, i.e. concurrent multiple pipeline operations.
A heavy floating point workload like Lightwave is not Quake's mixed integer and floating point that exploits concurrency.
A heavy floating point workload with a single port FPU largely negates superscalar.
|
So you listened to me and believed me for once about the 68060 instruction buffer filling with heavy FPU code?
Quake and Lightwave both have FPU heavy code, mixed integer and FP code and all integer code. It just comes down to the percentages of each with integer code remaining the most common.
Hammer Quote:
32 FPR wouldn't work with legacy Amiga apps with 68K FPU support since they are designed with 8 FPR in mind. It's the same for X86-64's 16 SSE2 registers needing new software.
Out-of-order processing is linked with register renaming and works for legacy software.
|
Eventually register renaming came to the x86 FPU but it is much more expensive for FP80 than INT32. Requiring OoO execution and register renaming is an expensive solution for a crap FPU ISA.
Hammer Quote:
Why are you defending the obsolete 68060 design?
Classic Pentium / 68060 era CPU has FPU being attached to one of the integer pipelines.
|
I believe the 68060 FPU is a new design and not the 68040 FPU. The whole 68060 design looks new to me where the Pentium has been described as two 486s glued together. Motorola skipped the 68050 which was going to be an enhanced 68040 and created the more modern 68060 design. It is missing a few modern features like a hardware return stack and in some ways appears unfinished but it is a solid foundation for a line of in-order CPUs like the in-order P5 Pentium line but much better and especially lower power which is important for embedded use.
Hammer Quote:
ARM Cortex A53 has 8 bytes per cycle from L1 instruction cache into 2-way decoder.
|
AArch64 instructions use a fixed length 32-bit encoding so superscalar execution would not be possible with a 4B/cycle fetch. The average instruction length of AArch64 code is 4 bytes while the 68k average instruction length is likely 2.5-3.0 bytes. Big difference.
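As a back-of-the-envelope sketch of that difference, the snippet below divides fetch bandwidth by average instruction length to get an upper bound on instructions per cycle. It uses only the figures mentioned in this thread (4 and 8 bytes per cycle, 4-byte AArch64 instructions, an estimated 2.5 to 3.0 bytes for 68k code). It is a simplification that ignores instruction buffers, branches and cache misses, and the function name is just for illustration.

```python
def max_instructions_per_cycle(fetch_bytes_per_cycle, avg_instruction_bytes):
    """Upper bound on sustained instructions per cycle imposed by fetch
    bandwidth alone (ignores buffering, branches and cache misses)."""
    return fetch_bytes_per_cycle / avg_instruction_bytes

cases = {
    "AArch64, 4 B/cycle fetch": (4, 4.0),
    "AArch64, 8 B/cycle fetch (Cortex-A53 figure quoted above)": (8, 4.0),
    "68k, 4 B/cycle fetch, 2.5 B average length": (4, 2.5),
    "68k, 4 B/cycle fetch, 3.0 B average length": (4, 3.0),
}

for label, (fetch, avg) in cases.items():
    ipc = max_instructions_per_cycle(fetch, avg)
    print(f"{label}: at most {ipc:.2f} instructions/cycle")
```

With a 4 B/cycle fetch, fixed 4-byte instructions cap out at one instruction per cycle, while shorter average 68k instructions leave some room above one.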
Hammer Quote:
The in-order Bonnell pipeline is too deep for a low power design and my liking. I doubt there is a load-to-use penalty on the Bonnell CISC design despite the long latency fetch unlike the Cortex-A53 design. As I recall, both of these designs are much different in how they fetch and decode instructions compared to the 68060. Instruction cache lines fetched into the L1 cache are pre/partially decoded before being placed into the L1 cache. This reduces the latency of instructions in the L1 cache and decreases execution time decoding overhead but code in the L1 cache is not as compact and there is an impact on code coming from outside the L1 cache. Many IBM cores have used this style of instruction fetching in designs too.
Last edited by matthey on 13-Mar-2025 at 01:21 AM. Last edited by matthey on 13-Mar-2025 at 01:15 AM.
|
| Status: Offline |
| | cdimauro
|  |
Re: 32-bit PPC on FPGA Posted on 21-Mar-2025 21:54:22
| | [ #447 ] |
| |
 |
Elite Member  |
Joined: 29-Oct-2012 Posts: 4298
From: Germany | | |
|
| @Hammer
Quote:
Hammer wrote: @cdimauro
Quote:
Definitely, it was a bad luck. FPU weren't that important before Quake, and it was good to focus on the GP/integer part of the chips. |
Strong FPU is important for 3D rendering, not just Quake e.g. Bryce 3D.
MS Excel also supports FPU.
Bryce 3D v3 is available for Mac 68K, hence they should run on fake Mac Amigas.
Bryce 3D v3.1 is available for Windows 95/NT 3.5/NT 4. |
And? Consumer 3D was born with Amiga, and 3D applications usually came with 68000 and 68020+68881 executables to allow all users to use them.
Professionals obviously used accelerators with the math coprocessors (or newer Amigas which embedded them) to greatly speed-up their execution.
Games neither used nor required such accelerators, because they basically targeted only two platforms: Amiga 500 (usually with 512kB expansion) or Amiga 1200.
That's in Amiga land, but it was the same on the PC. Even 3D games didn't use math coprocessors.
Quake changed it, and FPUs became not only common on mainstream computers, but even mandatory.
But, again, this started with this game, and processor vendors had to change direction accordingly.
And... drum roll... they had no crystal ball to forecast this change: whoever had focused only on integer performance got a bad surprise when Quake arrived. Bad luck. DOT! |
| Status: Offline |
| | cdimauro
|  |
Re: 32-bit PPC on FPGA Posted on 21-Mar-2025 22:14:17
| | [ #448 ] |
| |
 |
Elite Member  |
Joined: 29-Oct-2012 Posts: 4298
From: Germany | | |
|
| @matthey
Quote:
matthey wrote: Hammer Quote:
SNES's 65816 CPU's math power was supported by custom DSP and SuperFX (this 16-bit RISC coprocessor development led to Argonaut RISC Core CPU family) add-ons.
65xx/65xxx CPU family's slow R&D pace led to ARM and ARC RISC CPUs.
|
On paper, the 1985 Amiga is inferior to the 1988 Sega Genesis/Mega Drive and 1990 SNES in almost every way.
https://segaretro.org/Sega_Mega_Drive/Hardware_comparison Quote:
Vs. Amiga
The Mega Drive was generally more powerful than the Amiga. The Mega Drive's 68000 CPU is clocked at 7.6 MHz, while the Amiga's 68000 CPU was clocked at 7.16 MHz (NTSC) or 7.09 MHz (PAL). The Mega Drive displays eighty 15-color sprites at 32×32 pixels each, while the Amiga displays eight 3-color sprites at 8 pixels wide. The Mega Drive displays 61–64 colors standard and 183–192 colors with Shadow/Highlight, while the Amiga displays 2–32 colors standard and 64 colors with EHB. The Mega Drive's VDP can DMA blit 3.21845–6.4 MB/s bandwidth (6.4 MPixels/s fillrate), while the Amiga's Blitter can blit 1.7725–3.58 MB/s (2.363333–4.773333 MPixels/s with 64 colors). During active display, with 64 colors at 60 FPS, the VDP can write 708 KB/s to 2 MB/s (1.4–2 MPixels/s) during 320×224 display, while the Blitter can write 332.5–700 KB/s (443,333–933,333 pixels/s) during 320×200 display. The Mega Drive supports tilemap backgrounds, reducing processing, memory and bandwidth requirements by up to 64 times compared to the Amiga's bitmap backgrounds, giving the Mega Drive an effective tile fillrate of 6–36 MPixels/s. The Mega Drive has a Z80 sound CPU and supports 10 audio channels, while the Amiga lacks a sound CPU and supports 4 audio channels.[
|
The SNES Ricoh 5A22 CPU MIPS is roughly equivalent to the 68000 in the Amiga and Mega Drive but the 68000 is much easier to program and has a large advantage with 16-bit and 32-bit datatypes. A good example of SNES sluggishness, inferior parallax scrolling and smaller sprites with less animation is Shadow of the Beast.
Shadow of the Beast | Amiga, MD, PCE, SNES | Comparison - Quad Longplay https://www.youtube.com/watch?v=QUT91K4mPlw
The SNES colors are good when looking at a static screen and the music is ok but the playability is the worst among the 4 consoles above. The Mega Drive version is almost as good as the Amiga version and seems fast including scrolling, maybe even faster than the Amiga version. The Amiga version looks like the highest resolution, remains a good all around contender and has the best music and sound. The SNES may have the best hardware other than the CPU and the SNES version of the game could likely be optimized more but this is where the 68000 is so much better. A 68000 SNES may have killed the Mega Drive and a 68020 may have even been possible as the CD32 delivered 3 years later. Console developers eventually learned the lesson of difficult to program hardware with hypothetical peak performance that was rarely realized. |
The SNES SotB conversion is really bad. If you carefully take a look at it, it doesn't even have the same assets. So, it looks like that it was developed from scratch, and not making good use of the hardware.
Just take a look at Donkey Kong Country: https://www.youtube.com/watch?v=hakuztODkAw You can see how many objects and playfields are moving and so fast and with so many colours and transparencies effects. Not even an Amiga 1200 can do the same at 60/50FPS.
Yes, the resolution is lower (256 pixels horizontally instead of 320 for the Amiga), but that's how Nintendo decided to implement the TV output. And for good reasons: the coordinates require just one byte. In fact, setting the position of a sprite requires just a write (store) to the specific register. |
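The one-byte point can be checked with simple bit-width arithmetic. The sketch below only counts how many bits are needed to address 256 versus 320 horizontal positions. It is not a description of the actual SNES sprite register layout, and the helper name is hypothetical.

```python
def bits_for_positions(count):
    """Minimum number of bits needed to address `count` distinct
    positions numbered 0 .. count-1."""
    return max(1, (count - 1).bit_length())

for width in (256, 320):
    bits = bits_for_positions(width)
    fits = "fits in" if bits <= 8 else "does not fit in"
    print(f"{width} pixels wide: {bits} bits, {fits} a single byte")
```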
| Status: Offline |
| | matthey
|  |
Re: 32-bit PPC on FPGA Posted on 22-Mar-2025 3:29:47
| | [ #449 ] |
| |
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2602
From: Kansas | | |
|
| @cdimauro Donkey Kong Country is a good SNES port. It is one of Nintendo's flagship games like Mario Bros so it is going to get the best programmers, resources and time to create it. These were the sprite and tile type games that the SNES was supposed to be good at with the good chipset but they still needed considerable optimizing that the 68000 does not need and suffered from lower resolution, limited sound samples, etc. The Amiga 1200 could still hang with the SNES too. An example is the Lion King where the Amiga version is a rush job, unfinished and poorly optimized in places but it is still impressive.
Lion King | Mega Drive, SNES & Amiga | Comparison - Triple Longplay https://www.youtube.com/watch?v=qAMbPwmx19k
From the video comments:
rafaellima83 Quote:
Although Amiga could be upgraded, like 99.9% of the games run from the out of box hardware. That was something many people complained about the Amiga: no point in getting upgrades since all games run on standard hardware.
The Lion King on Amiga was ported in just 2 months because it had to be ready for 1994 Christmas. There's a interview with him at Codetapper's site, where he states he was sure he could do it in 2 months because he would have access to the original source code... and when he received it was in C and he was expecting 68k Assembly... and he didn't know C at that time, so he had a lot more of work to do it than he first thought.... and that's why levels were cut from the game, because he wouldn't manage to do it for a xmas release.
|
SNES suffered more when games were outside of the low resolution sprite based platform games. For example, SNES Dungeon Master is far worse than the Atari ST version.
Dungeon Master SNES Intro and Gameplay https://www.youtube.com/watch?v=hakuztODkAw
Or how about a flight sim?
SNES Longplay [677] Super Strike Eagle (US) https://www.youtube.com/watch?v=TgPmROtU6XA
The SNES is not even capable of OCS/ECS flight sims like F/A-18 Interceptor and Falcon. The problem is not the chipset but the limited CPU. A SNES with the 68000 would have been higher performance, less limited with a large flat address space, less limited with larger datatype support and easier to program. Even if primarily poking hardware registers, the 68k is much nicer than the 6502 family. The 68k based consoles and computers were good. Even the Atari ST with minimal hardware was good for the price. Nintendo made a poor choice for the 1990 SNES CPU especially considering that they did not retain NES compatibility. Jay Miner had avoided the same mistake with the more dynamic and less limited 68000 Amiga 5 years earlier.
|
| Status: Offline |
| | cdimauro
|  |
Re: 32-bit PPC on FPGA Posted on 22-Mar-2025 6:03:07
| | [ #450 ] |
| |
 |
Elite Member  |
Joined: 29-Oct-2012 Posts: 4298
From: Germany | | |
|
| @matthey
Quote:
matthey wrote: @cdimauro Donkey Kong Country is a good SNES port. It is one of Nintendo's flagship games like Mario Bros so it is going to get the best programmers, resources and time to create it. These were the sprite and tile type games that the SNES was supposed to be good at with the good chipset but they still needed considerable optimizing that the 68000 does not need and suffered from lower resolution, limited sound samples, etc. |
Samples aren't limited on the SNES: there's a lot of space for them, especially thanks to the ADPCM compression, which reduced the space they occupy while keeping very good quality.
I think it's exactly the opposite: on the Amiga, samples had limited space. It might seem unbelievable, but the 512kB of Chip RAM had to hold graphics and samples (and the buffer for loading MFM data from the disk), so space was very limited and sample space was sacrificed the most.
The situation changed greatly with the Amiga 1200. But the SNES had already been out for two years... Quote:
The Amiga 1200 could still hang with the SNES too. An example is the Lion King where the Amiga version is a rush job, unfinished and poorly optimized in places but it is still impressive.
Lion King | Mega Drive, SNES & Amiga | Comparison - Triple Longplay https://www.youtube.com/watch?v=qAMbPwmx19k
From the video comments:
rafaellima83 Quote:
Although Amiga could be upgraded, like 99.9% of the games run from the out of box hardware. That was something many people complained about the Amiga: no point in getting upgrades since all games run on standard hardware.
The Lion King on Amiga was ported in just 2 months because it had to be ready for 1994 Christmas. There's a interview with him at Codetapper's site, where he states he was sure he could do it in 2 months because he would have access to the original source code... and when he received it was in C and he was expecting 68k Assembly... and he didn't know C at that time, so he had a lot more of work to do it than he first thought.... and that's why levels were cut from the game, because he wouldn't manage to do it for a xmas release.
|
|
Yes, Lion King was a good port, despite the short time and such problems.
The SNES version is the best one: more colourful, albeit with smaller resolution.
The Amiga suffers from the lack of sprites (and it has slightly fewer colours), which is visible only here: https://www.youtube.com/watch?v=qAMbPwmx19k&t=870s
The Amiga 1200 was good in this case, because this game only uses two playfields and a few sprites (with the exception of the above level). But with more demanding games (e.g. more playfields and/or more sprites) it had problems. Quote:
The Dungeon Master link is wrong. Here's the correct one: https://www.youtube.com/watch?v=1eM_k1ebRL8
This is an example of a very bad port, because with proper use of Mode 7 it could have been way better: SNES - BASTARD!! https://www.youtube.com/watch?v=-pUMQtj8K0A
The same applies to Super Strike Eagle: almost the whole game could have used Mode 7. The only exception is the "cockpit" sections, because there you have a real 3D screen to draw, and the SNES didn't even have a bitplane mode. Quote:
The SNES is not even capable of OCS/ECS flight sims like F/A-18 Interceptor and Falcon. The problem is not the chipset but the limited CPU. |
The primary problem is the chipset, because it lacks a bitmap mode.
The CPU, even with its 8-bit data bus, isn't so bad. The C64 had also 3D games, with a much slower CPU, and the SNES embedded fast multiplication circuits: https://snes.nesdev.org/wiki/Multiplication
So, again, it's a matter of properly optimizing the code. Quote:
A SNES with the 68000 would have been higher performance, less limited with a large flat address space, less limited with larger datatype support and easier to program. Even if primarily poking hardware registers, the 68k is much nicer than the 6502 family. The 68k based consoles and computers were good. Even the Atari ST with minimal hardware was good for the price. |
Yes, we know that the 68k is a much better CPU, but 65c816 was a very good choice for the SNES, thanks to its "8-bit chipset" (poking 8-bit registers is a perfect match for such CPU with the 8-bit data bus).
You save even single pennies when you mass produce something, and I think that a 68000 cost much more than the castrated 65c816 of the SNES. Quote:
Nintendo made a poor choice for the 1990 SNES CPU especially considering that they did not retain NES compatibility. Jay Miner had avoided the same mistake with the more dynamic and less limited 68000 Amiga 5 years earlier. |
Backward compatibility wasn't a problem for the consoles of the time.
Even SEGA opted for a completely incompatible design for its Genesis. |
| Status: Offline |
| | matthey
|  |
Re: 32-bit PPC on FPGA Posted on 22-Mar-2025 19:51:18
| | [ #451 ] |
| |
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2602
From: Kansas | | |
|
| cdimauro Quote:
Samples aren't limited on the SNES: there's a lot of space for them, especially thanks to the ADPCM compression, which reduced the space they occupy while keeping very good quality.
I think it's exactly the opposite: on the Amiga, samples had limited space. It might seem unbelievable, but the 512kB of Chip RAM had to hold graphics and samples (and the buffer for loading MFM data from the disk), so space was very limited and sample space was sacrificed the most.
The situation changed greatly with the Amiga 1200. But the SNES had already been out for two years...
|
A basic comparison has the SNES and Amiga close.
SNES: 64kiB video + 64kiB audio (3.56:1 compression, so like up to 228kiB) + 128kiB main memory = 256kiB-420kiB
Amiga: 512kiB OCS, 1MiB ECS, 2MiB AGA (CD-32 has CD-ROM music mixing)
The SNES has the ADPCM compression, has sprite flipping, uses tiles and can load from ROM cartridges faster which saves memory but has discrete video, audio and main memory size limits. The Amiga has HAM, HAM8, CD32 CD-ROM mixing and 68k code compression which improved from the 68000 to the 68020 for saving memory. Commodore could have added more memory saving features but chose the more expensive route of increasing the chip memory instead.
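The SNES totals in the comparison above can be reproduced with a few lines of arithmetic. This is a sketch under one stated assumption: that the 3.56:1 figure comes from the SNES BRR format packing 16 samples of 16-bit PCM (32 bytes) into a 9-byte block. It also treats all 64kiB of audio RAM as sample space, which is an upper bound.

```python
# SNES discrete memories as listed above, in KiB.
snes_video = 64
snes_audio = 64
snes_main = 128

# Assumption: BRR stores 16 samples of 16-bit PCM (32 bytes) in 9 bytes.
brr_ratio = 32 / 9

# Upper bound: pretends the whole audio RAM holds compressed samples.
audio_effective = snes_audio * brr_ratio

snes_raw_total = snes_video + snes_audio + snes_main
snes_effective_total = snes_video + audio_effective + snes_main

print(f"BRR ratio:            {brr_ratio:.2f}:1")
print(f"Effective audio:      {audio_effective:.0f} KiB")
print(f"SNES raw total:       {snes_raw_total} KiB")
print(f"SNES effective total: {snes_effective_total:.0f} KiB")
```

The rounded results line up with the 228kiB and 256kiB-420kiB figures given above.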
cdimauro Quote:
Yes, Lion King was a good port, despite the short time and such problems.
The SNES version is the best one: more colourful, albeit with smaller resolution.
The Amiga suffers from the lack of sprites (and it has slightly fewer colours), which is visible only here: https://www.youtube.com/watch?v=qAMbPwmx19k&t=870s
The Amiga 1200 was good in this case, because this game only uses two playfields and a few sprites (with the exception of the above level). But with more demanding games (e.g. more playfields and/or more sprites) it had problems.
|
The Amiga version of the Lion King suffers a little from lack of AGA enhancements and limitations. A more polished Amiga CD-32 version could have been one of the best versions though. The SNES port is a quality port and probably the best by a narrow margin. Not only is the resolution lower but there are a few changed/shortened sampled sounds.
JulioRedVerse Quote:
The only thing that bothers me is Simba's roar. In the Genesis version he sounds like a cub pretending to be adult, which is how it should be. But in the SNES one he meows, wtf? solarflare9078 Quote:
It's an issue with the SNES sound hardware: It's forced to use shorter/lower quality samples due to the SNES's sound RAM limit of 64KB
|
Tjoeb123 Quote:
@solarflare9078 And in the Genesis version, when he gets hit as young Simba, the adult version of the clip plays (instead of "Ow!") in all levels EXCEPT The Stampede (where the correct sound is played) for some reason.
|
CarloNassar Quote:
@solarflare9078 How is that because of the hardware? Anyone can tell it's a drastically different sound, so it seems more like Simba's roar was intentionally different.
|
|
I still feel like the SNES hardware had limitations and some of it was due to the poor choice of CPU. When starting with a CPU without a large flat address space, everything built around it reflects that as the small discrete video and audio memories show. The 68k Amiga started with the large flat address space and unified memory that was perfect to enhance and Commodore barely enhanced it other than increasing the chip memory.
cdimauro Quote:
This is an example of a very bad port, because with proper use of Mode 7 it could have been way better:
...
The same applies to Super Strike Eagle: almost the whole game could have used Mode 7. The only exception is the "cockpit" sections, because there you have a real 3D screen to draw, and the SNES didn't even have a bitplane mode.
|
There were plenty of bad SNES ports. Mode 7 was overrated as it was a blurry low resolution mess. The Amiga copper chunky was almost as good in games like Ambermoon.
cdimauro Quote:
Backward compatibility wasn't a problem for the consoles of the time.
Even SEGA opted for a completely incompatible design for its Genesis.
|
No. The Sega Genesis/Mega Drive was highly compatible with the Master System other than 3 games.
https://segaretro.org/Power_Base_Converter Quote:
Hardware
One of the key design features of the Mega Drive is its compatibility with its immediate predecessor, the Master System, as the Mega Drive's design is based upon the Master System's design, albeit enhanced and extended in many areas. As the cartridge slot of the Mega Drive is shaped differently than that of the Master System, and because its games could not be played directly through the Mega Drive, Sega released the Power Base Converter, an accessory that is placed between a Master System cartridge and the Mega Drive, allowing the user to play the previous generation of Sega games without the need for an extra console.
The Power Base Converter does not contain any Master System components but acts as a pass-through port. The converter contains two slots; a top slot for cartridge-based titles and a front slot for card-based games and accessories. The Power Base Converter would be fully compatible with the cost-reduced Mega Drive II, were it not for the different shape of the Mega Drive II's plastic casing.
In Europe the differently shaped Master System Converter II was released in order to satisfy Mega Drive II owners. This accessory is also fully compatible with the original Mega Drive, but lacks the ability to run cards.
Both 2-button Master System pads and standard Mega Drive pads can be used to play the majority of Master System games. Like the Master System, the PAUSE button is not part of the gamepad connector and instead is implemented as a push-button switch on the Power Base Converter or similar devices.
Technical information
In order to achieve backwards compatibility, the original Master System central processor and sound chip (the Z80 and SN76489) are included in the Mega Drive/Genesis and the new Video Display Processor is capable of the Master System VDP's mode 4 (though it cannot run in modes 0, 1, 2, or 3, so cannot run SG-1000 games). Once a Master System game is inserted, the system's bus controller chip (later integrated with the I/O chip into a single multi-purpose ASIC) will put the Z80 in control leaving the 68000 idle.
|
Keeping the Z80 and turning it into an I/O chip while adding a 68000 was a better upgrade path than the SNES which upgraded the 6502 family CPU to 16-bit but was unable to retain compatibility. The original SNES goal was to retain compatibility but the 6502 is not a good candidate to upgrade because the ISA is bad. I expect some people here will disagree and say Commodore made a mistake by not upgrading the 6502 family. The Z80 has a better ISA than the 6502 and Sega could have used the Z80 upgrade path but the 68000 was better. Intel should have thrown their 808x ISA away instead of trying to upgrade it but they started their Operation Crush propaganda campaign against the 68k instead. The IBM PC made Intel pigs fly and Motorola surrendered despite their 68k superiority.
|
| Status: Offline |
| | Hammer
 |  |
Re: 32-bit PPC on FPGA Posted on 22-Mar-2025 22:17:17
| | [ #452 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6320
From: Australia | | |
|
| @cdimauro
Atari MegaSTe (1991) has 68881 or 68882 with 68000 @ 8 MHz or 16 MHz. https://en.wikipedia.org/wiki/Atari_MEGA_STE
Atari MegaSTe's 68881 FPU came too late; by then the Mac had established a multi-million-unit, business-targeted install base.
Atari MegaSTe's US$1,799 price for display capabilities like 320×200 (16 of 4096 colors), 640×200 (4 of 4096 colors), and 640×400p (mono) is an uncompetitive joke when compared to Mac LC I.
Quote:
And? Consumer 3D was born with Amiga, and 3D applications usually came with 68000 and 68020+68881 executables to allow all users to use them.
|
1. Amiga was inferior in texture-mapped 3D and didn't scale with SMP per render node.
2. A2620's 1988 release was late. For OCS's stable high-resolution display, the production scale for the A2024 monitor is around 5000 units. Read Commodore - The Final Years book.
3. 3D wasn't the only graphics application for an FPU, e.g. vector 2D artwork with DTP. The Macintosh romped home with DTP in combination with MS Excel GUI / MS Word GUI.
4. PC had Autodesk's 3D Studio 1988 (3D Studio MAX for NT 3.5 in 1996) and AutoCAD for a long time. AutoCAD Release 10 1988 was available for MacOS System 6, Xenix, MS-DOS, and OS/2 1.x. Macintosh GUI business install base exceeds the Amiga.
AutoCAD Release 11 Mac edition was released in 1992 in time for Apple's best-selling 1992 Mac LC II. AutoCAD Release 12 was released for MacOS 7 in 1992 and Windows 3.1 in 1993.
Multiple millions of Macs beat a few 100,000-scale A2000s.
5. Full ECS was released in 1990 with A3000 which is inferior to Mac LC I's entry-level SVGA. Starting from October 1990, Mac LC I's 500,000 unit sales in its 1st year have no problems beating A2000 and A3000 unit sales.
A1000plus and A3000plus for 1991 were supposed to counter the Mac LC series and Mac IIsi.
For AutoCad's install base with stable high-resolution graphics.
According to Dataquest November 1989, VGA crossed more than 50 percent market share in 1989 i.e. 56%. http://bitsavers.trailing-edge.com/components/dataquest/0005190_PC_Graphics_Chip_Sets--Product_Analysis_1989.pdf
Low-End PC Graphics Market Share by Standard Type Estimated Worldwide History and Forecast
Total low-end PC graphic chipset shipment history and forecast:
1987 = 9.2 million, VGA 16.4% market share i.e. 1.5088 million VGA.
1988 = 11.1 million, VGA 34.2% i.e. 3.79 million VGA.
1989 = 13.7 million, VGA 54.6% i.e. 7.67 million VGA.
1990 = 14.3 million, VGA 66.4% i.e. 9.50 million VGA.
1991 = 15.8 million, VGA 76.6% i.e. 12.10 million VGA.
1992 = 16.4 million, VGA 84.2% i.e. 13.81 million VGA.
1993 = 18.3 million, VGA 92.4% i.e. 16.9 million VGA.
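The VGA unit figures are just total shipments multiplied by the VGA share, which a short script can re-derive as a cross-check. This is only a sketch using the numbers as posted. The 0.02 million tolerance is an arbitrary choice to absorb rounding, and nothing here was checked against the Dataquest report itself.

```python
# (year, total low-end chipsets in millions, VGA share in %, VGA millions as posted)
rows = [
    (1987, 9.2, 16.4, 1.5088),
    (1988, 11.1, 34.2, 3.79),
    (1989, 13.7, 54.6, 7.67),
    (1990, 14.3, 66.4, 9.50),
    (1991, 15.8, 76.6, 12.10),
    (1992, 16.4, 84.2, 13.81),
    (1993, 18.3, 92.4, 16.9),
]

TOLERANCE = 0.02  # millions of units; arbitrary allowance for rounding

def vga_units(total_millions, share_percent):
    """VGA chipsets in millions, given total shipments and VGA share."""
    return total_millions * share_percent / 100.0

mismatches = []
for year, total, share, posted in rows:
    computed = vga_units(total, share)
    note = ""
    if abs(computed - posted) >= TOLERANCE:
        mismatches.append(year)
        note = "  <-- posted figure does not follow from posted share"
    print(f"{year}: {total} M x {share}% = {computed:.2f} M (posted {posted} M){note}")

print("Years that do not reconcile:", mismatches)
```

Only the 1989 row fails to reconcile: 54.6% of 13.7 million is about 7.48 million, while the posted 7.67 million corresponds to the 56% share mentioned earlier in the post. Which of the two percentages is the one in the report is not something this check can settle.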
The estimate for the Amiga AGA install base is about 500,000 units. PC VGA/SVGA crushed the Amiga AGA on the production scale.
IBM 8514 and its clones exceeded OCS's A2024 monitor's 5000 units production scale!
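The per-year VGA unit counts above are simply total shipments multiplied by VGA share. A minimal Python sketch of that arithmetic, using only the shipment and share figures quoted in this post (the names `SHIPMENTS`, `vga_units_millions` and `vga_table` are made up for illustration):

```python
# Total low-end PC graphics chipset shipments (millions) and VGA share (%),
# copied from the Dataquest figures quoted in this post.
SHIPMENTS = {
    1987: (9.2, 16.4),
    1988: (11.1, 34.2),
    1989: (13.7, 54.6),
    1990: (14.3, 66.4),
    1991: (15.8, 76.6),
    1992: (16.4, 84.2),
    1993: (18.3, 92.4),
}


def vga_units_millions(total_millions, vga_share_percent):
    """VGA chipsets shipped (millions) = total shipments x VGA share."""
    return total_millions * vga_share_percent / 100.0


def vga_table():
    """Return {year: VGA units in millions, rounded to 2 decimals}."""
    return {
        year: round(vga_units_millions(total, share), 2)
        for year, (total, share) in SHIPMENTS.items()
    }


if __name__ == "__main__":
    for year, units in vga_table().items():
        print(f"{year}: {units:.2f} million VGA chipsets")
```

Most years reproduce the quoted numbers to within rounding. For 1989, the quoted 54.6% share gives about 7.48 million, while the 7.67 million figure corresponds to a 56% share.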
Quote:
And... rolling drum... they had no crystal ball to forecast this change: who has pointing to only enforce the integer performance got a bad surprise when Quake arrived. Bad luck. DOT! |
N64 was in development during 1993 and SGI offered $40 MIPS R4000 with relatively strong FPU.
https://web.archive.org/web/20150208022940/http://www.nytimes.com/1993/08/21/business/company-news-video-game-link-is-seen-for-nintendo.html [code] A computer industry official said MIPS, a subsidiary of Silicon Graphics, had developed a version of its R4000 processor that operated on less than one-half watt and could be produced for about $40 each [/code]
The 3DO M2, designed by ex-original Amiga engineers, had dual IBM PPC 602 cores @ 66 MHz, including a fully pipelined FP32 FPU, in 1995. Around 1998, AMD would re-enter the game console market during early original Xbox development (Project Midway).
Intel's Xbox involvement was later in 2001. https://kotaku.com/report-xboxs-last-second-intel-switcheroo-left-amd-eng-1847851074 Bill Gates made an Intel CPU announcement in 2001 overriding the Xbox team's AMD CPU selection in public.
Removing Bill Gates as Microsoft's CEO also removes the systemic privilege link with Intel Corp.
Last edited by Hammer on 22-Mar-2025 at 11:39 PM. Last edited by Hammer on 22-Mar-2025 at 11:15 PM. Last edited by Hammer on 22-Mar-2025 at 10:44 PM.
_________________ Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68) Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68) Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB |
| Status: Offline |
| | Hammer
 |  |
Re: 32-bit PPC on FPGA Posted on 22-Mar-2025 23:06:43
| | [ #453 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6320
From: Australia | | |
|
| @matthey
Quote:
The SNES is not even capable of OCS/ECS flight sims like F/A-18 Interceptor and Falcon. The problem is not the chipset but the limited CPU. A SNES with the 68000 would have been higher performance, less limited with a large flat address space, less limited with larger datatype support and easier to program. Even if primarily poking hardware registers, the 68k is much nicer than the 6502 family. The 68k based consoles and computers were good. Even the Atari ST with minimal hardware was good for the price. Nintendo made a poor choice for the 1990 SNES CPU especially considering that they did not retain NES compatibility. Jay Miner had avoided the same mistake with the more dynamic and less limited 68000 Amiga 5 years earlier.
|
Nintendo officially supported DSP and SuperFX add-ons to reduce risk for game developers. Nintendo's SuperFX support shows 65816's road map inferiority.
For 3rd party software developers' risk assessment and planning, there are no visible statistics for Amiga's 3rd party CPU accelerators, while the PC market has market intelligence statistics for its full 32-bit x86 CPU sales.
For the 1992 year:
1. Nintendo delivered a strong 2D gaming experience when compared to the A600.
2. Nintendo delivered a large production scale when compared to the A1200's 44,000 production scale.
3. Apple's Mac LC II was the standout best-selling Mac model for the 68K-based desktop platform. Apple sold 2.5 million Macs in 1992.
4. 1992 was the year the SNES went head-to-head in Amiga's core European market. The SNES destroyed the A600, and the A1200, with its very low production rate, was effectively missing in action.
5. SNES's 1990 release was only in the Japanese market. The SNES was released for the US market in 1991. The SNES has the jump on fast-VGA class install base build-up when compared to Amiga AGA and Atari Falcon. Gaming PCs switched to texture-mapped 3D gaming to avoid SNES's strong 2D.
| Status: Offline |
| | cdimauro
|  |
Re: 32-bit PPC on FPGA Posted on 23-Mar-2025 7:31:19
| | [ #454 ] |
| |
 |
Elite Member  |
Joined: 29-Oct-2012 Posts: 4298
From: Germany | | |
|
| @matthey
Quote:
matthey wrote: cdimauro Quote:
Samples aren't limited on the SNES: there's a lot of space for them, especially thanking the ADPCM compression which lowered the occupied space while keeping a very good quality.
I think that its the exact contrary, because for the Amiga samples had limited space. It might look unbelievable, but the 512kB of the Chip RAM had to be used for graphics and samples (and the buffer for loading MFM data from the disk), so the space was very limited and the samples' space was the most sacrificed.
The situation greatly changed with the Amiga 1200. But the SNES was already out since two years...
|
A basic comparison has the SNES and Amiga close.
SNES 64kiB video 64kiB audio (3.56:1 compression so like up to 228kiB) 128kiB main memory --- 256kiB-420kiB
Amiga 512kiB OCS 1MiB ECS 2MiB AGA (CD-32 has CD-ROM music mixing)
The SNES has the ADPCM compression, has sprite flipping, uses tiles and can load from ROM cartridges faster which saves memory but has discrete video, audio and main memory size limits. |
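The SNES totals quoted above can be checked with a few lines of Python. This is only a sketch: it assumes the 3.56:1 ratio comes from the SNES BRR sample format, where a 9-byte block decodes to 16 samples of 16 bits (32 bytes), and the function names are made up for illustration.

```python
# Assumption: the quoted 3.56:1 ratio is BRR's 9-byte block versus the
# 32 bytes of 16-bit PCM (16 samples) that the block decodes to.
BRR_BLOCK_BYTES = 9
PCM_BLOCK_BYTES = 16 * 2


def brr_ratio():
    """Compression ratio of BRR relative to 16-bit PCM."""
    return PCM_BLOCK_BYTES / BRR_BLOCK_BYTES


def snes_memory_kib(video=64, audio=64, main=128):
    """Return (raw, effective) memory in KiB.

    'effective' treats the whole audio RAM as holding compressed samples,
    which is an upper bound since audio RAM also holds the sound program.
    """
    raw = video + audio + main
    effective = video + audio * brr_ratio() + main
    return raw, effective


if __name__ == "__main__":
    raw, effective = snes_memory_kib()
    print(f"BRR ratio: {brr_ratio():.2f}:1")
    print(f"Audio RAM as uncompressed samples: {64 * brr_ratio():.0f} KiB")
    print(f"Total: {raw} KiB raw, about {effective:.0f} KiB effective")
```

Under those assumptions the numbers line up with the quoted 228kiB audio figure and the 256kiB-420kiB range.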
It has the same 24-bit address space of the 68000 and 68EC020, which provides access to a lot of memory.
On top of that, games can also implement bank switching in a very efficient and cheap way (it's very simple logic), to further increase the addressable memory when needed. A few games used it, like Tales of Phantasia: https://www.youtube.com/watch?v=CzCFv2vPNkw which impressed me. It's a notable example which has shown how to squeeze the SNES's hardware (including Mode 7, of course) to achieve great results. Another one is its "successor", Star Ocean: https://www.youtube.com/watch?v=xY6FL1SimdE&list=PLW9S_7BSQS0ViRC4bFHgzQuTEYDy49TtI but it needs to use another chip for decompression (too much stuff). Quote:
The Amiga has HAM, HAM8, CD32 CD-ROM mixing and 68k code compression which improved from the 68000 to the 68020 for saving memory. |
Unfortunately, there were only two configurations for games: Amiga OCS/ECS with 512kB of Chip RAM + an optional 512kB of "extra" memory (which many times was the stupid "Slow" Mem), and Amiga AGA with 2MB Chip RAM. The CD32 expanded the latter a bit, but it was too little, too late.
The main problem is that having only 512kB of Chip Mem heavily crippled what was doable with an Amiga OCS/ECS. Thanks to the "brilliant" Commodore engineers, who added such idiotic Slow Mem.
The Amiga 1200 was a different story, but unfortunately such geniuses didn't change the CPU's memory access pattern to the Chip RAM (which was exactly the same as for the 68000), heavily limiting how much the CPU could help the obsolete hardware. Quote:
Commodore could have added more memory saving features but chose the more expensive route of increasing the chip memory instead. |
Increasing the Chip RAM was very, very good! As I've already stated several times, the CPU was mostly the slave of the chipset: it's there for poking the hardware registers, letting the custom chips do most of the needed work. The business logic was a very small and limited part, so a powerful CPU wasn't usually required. Of course, 3D games are a different story.
However, the primary problem was the Commodore engineers: a bunch of incompetent and incapable people who never realized how the chipset really worked and how to correctly evolve it to address all such issues. They also made idiotic decisions, and the above Slow Mem was one of the most important ones (but not the only one).
The Amiga deserved way better professionals. Quote:
Quote:
cdimauro Quote:
Yes, Lion King was a good port, despite the short time and such problems.
The SNES version is the best one: more colourful, albeit with a lower resolution.
The Amiga suffers from the lack of sprites (and it has slightly fewer colours), which is visible only here: https://www.youtube.com/watch?v=qAMbPwmx19k&t=870s
The Amiga 1200 was good in this case, because this game only uses two playfields and a few sprites (with the exception of the above level). But with more demanding games (e.g. more playfields and/or more sprites) it had problems.
|
The Amiga version of the Lion King suffers a little from lack of AGA enhancements and limitations. A more polished Amiga CD-32 version could have been one of the best versions though. |
I fully agree. Not only for the missing levels, but also for the soundtrack. Quote:
The SNES port is a quality port and probably the best by a narrow margin. Not only is the resolution lower but there are a few changed/shortened sampled sounds. |
JulioRedVerse Quote:
The only thing that bothers me is Simba's roar. In the Genesis version he sounds like a cub pretending to be adult, which is how it should be. But in the SNES one he meows, wtf? solarflare9078 Quote:
It's an issue with the SNES sound hardware: It's forced to use shorter/lower quality samples due to the SNES's sound RAM limit of 64KB
|
Tjoeb123 Quote:
@solarflare9078 And in the Genesis version, when he gets hit as young Simba, the adult version of the clip plays (instead of "Ow!") in all levels EXCEPT The Stampede (where the correct sound is played) for some reason.
|
CarloNassar Quote:
@solarflare9078 How is that because of the hardware? Anyone can tell it's a drastically different sound, so it seems more like Simba's roar was intentionally different.
| |
Weird. But I don't think that it's due to limits of the SNES hardware. Quote:
I still feel like the SNES hardware had limitations and some of it was due to the poor choice of CPU. When starting with a CPU without a large flat address space, everything built around it reflects that as the small discrete video and audio memories show. The 68k Amiga started with the large flat address space and unified memory that was perfect to enhance and Commodore barely enhanced it other than increasing the chip memory. |
No, see above: the SNES has the same addressable memory of 68000 & 68EC020. Plus bank switching.
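To illustrate the point about a 24-bit address space built from 64kB banks, here is a minimal Python sketch of how such an address splits into an 8-bit bank and a 16-bit offset. It is only an illustration of the arithmetic: the function names are made up, and it does not model the actual SNES memory map or any cartridge mapper.

```python
ADDRESS_BITS = 24      # same address width as the 68000 / 68EC020
BANK_SIZE = 0x10000    # 64 KiB per bank


def split_address(addr):
    """Split a 24-bit address into (bank, offset within the bank)."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 24 bits")
    return addr >> 16, addr & 0xFFFF


def join_address(bank, offset):
    """Rebuild the 24-bit address from an 8-bit bank and 16-bit offset."""
    if not 0 <= bank <= 0xFF or not 0 <= offset <= 0xFFFF:
        raise ValueError("bank or offset out of range")
    return (bank << 16) | offset


def total_addressable_mib():
    """Size of the whole 24-bit space in MiB."""
    return (1 << ADDRESS_BITS) // (1024 * 1024)


if __name__ == "__main__":
    bank, offset = split_address(0x7E1234)
    print(f"bank ${bank:02X}, offset ${offset:04X}")
    print(f"{(1 << ADDRESS_BITS) // BANK_SIZE} banks, "
          f"{total_addressable_mib()} MiB in total")
```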
It's not flat (it can only use 64kB banks), but it's perfectly suitable for the specific needs of the console. Quote:
cdimauro Quote:
This is an example of a very bad port, because with a proper use of Mode 7 it could have been way better:
...
The same applies to Super Strike Eagle: almost the whole game could have used Mode 7. The only exception is the "cockpit" sections, because there you have a real 3D screen to draw, and the SNES didn't even have a bitplane mode.
|
There were plenty of bad SNES ports. Mode 7 was overrated as it was a blurry low resolution mess. |
Hum. Hard to take it. I've reported some examples of very good usage of Mode 7. Quote:
The Amiga copper chunky was almost as good in games like Ambermoon. |
I never played Ambermoon, but I took a look and its small 3D sections are really great to see in an OCS/ECS game. Great work!
However, I find them very limited. Quote:
cdimauro Quote:
Backward compatibility wasn't a problem for the consoles of the time.
Even SEGA decided for a completely incompatible choice for its Genesis.
|
No. The Sega Genesis/Mega Drive was highly compatible with the Master System other than 3 games.
https://segaretro.org/Power_Base_Converter Quote:
Hardware
One of the key design features of the Mega Drive is its compatibility with its immediate predecessor, the Master System, as the Mega Drive's design is based upon the Master System's design, albeit enhanced and extended in many areas. As the cartridge slot of the Mega Drive is shaped differently than that of the Master System, and because its games could not be played directly through the Mega Drive, Sega released the Power Base Converter, an accessory that is placed between a Master System cartridge and the Mega Drive, allowing the user to play the previous generation of Sega games without the need for an extra console.
The Power Base Converter does not contain any Master System components but acts as a pass-through port. The converter contains two slots; a top slot for cartridge-based titles and a front slot for card-based games and accessories. The Power Base Converter would be fully compatible with the cost-reduced Mega Drive II, were it not for the different shape of the Mega Drive II's plastic casing.
In Europe the differently shaped Master System Converter II was released in order to satisfy Mega Drive II owners. This accessory is also fully compatible with the original Mega Drive, but lacks the ability to run cards.
Both 2-button Master System pads and standard Mega Drive pads can be used to play the majority of Master System games. Like the Master System, the PAUSE button is not part of the gamepad connector and instead is implemented as a push-button switch on the Power Base Converter or similar devices.
Technical information
In order to achieve backwards compatibility, the original Master System central processor and sound chip (the Z80 and SN76489) are included in the Mega Drive/Genesis and the new Video Display Processor is capable of the Master System VDP's mode 4 (though it cannot run in modes 0, 1, 2, or 3, so cannot run SG-1000 games). Once a Master System game is inserted, the system's bus controller chip (later integrated with the I/O chip into a single multi-purpose ASIC) will put the Z80 in control leaving the 68000 idle.
|
|
Strange. I was recalling differently. Bad mistake.
But good choice to use the Z80 as I/O chip. Quote:
Keeping the Z80 and turning it into an I/O chip while adding a 68000 was a better upgrade path than the SNES which upgraded the 6502 family CPU to 16-bit but was unable to retain compatibility. The original SNES goal was to retain compatibility but the 6502 is not a good candidate to upgrade because the ISA is bad. |
No, the ISA wasn't bad for this specific purpose. To me it's a perfect match for a console which has mostly to set 8-bit registers.
What's strange is that Nintendo hasn't used the NES's 6502 for the I/O chip. In fact, the SPC (which handles music) is clearly derived from the 6502, albeit with some customization.
Probably Nintendo decided that it was more costly to add all legacy NES hardware to the console, so it wasn't worth keeping the 6502 as it is for the music chip. Quote:
I expect some people here will disagree and say Commodore made a mistake by not upgrading the 6502 family. |
Only a blind fanatical like Lou can say such stupidity.
I'm still waiting to see his 65xx version of the challenge (increasing an array of integers) that HE has proposed... Quote:
The Z80 has a better ISA than the 6502 and Sega could have used the Z80 upgrade path but the 68000 was better. |
Also because the Z80 had no future: all of Zilog's subsequent 16/32-bit processors have a completely different, and incompatible, ISA. Quote:
Intel should have thrown their 808x ISA away instead of trying to upgrade it but they started their Operation Crush propaganda campaign against the 68k instead. |
Indeed, and it made sense at the time: they were surprisingly lucky with the 8086, and basically they were forced to continue in this direction. Even their iAPX 432 was a big flop for this reason (and some technical reasons as well). Quote:
The IBM PC made Intel pigs fly and Motorola surrendered despite their 68k superiority. |
No, Motorola never had a chance.
There was no second supplier for the 68000, but more importantly it could never have managed to sell it for $5 (as per IBM's request; Intel had to sell its 8088 at a loss). |
| Status: Offline |
| | cdimauro
|  |
Re: 32-bit PPC on FPGA Posted on 23-Mar-2025 7:53:26
| | [ #455 ] |
| |
 |
Elite Member  |
Joined: 29-Oct-2012 Posts: 4298
From: Germany | | |
|
| @Hammer
Quote:
Hammer wrote: @cdimauro
Atari MegaSTe (1991) has 68881 or 68882 with 68000 @ 8 Mhz or 16 Mhz. https://en.wikipedia.org/wiki/Atari_MEGA_STE
Atari MegaSTe's 68881 FPU was too late when Mac established a multi-million business targeted Mac install base.
Atari MegaSTe's US$1,799 price for display capabilities like 320×200 (16 of 4096 colors), 640×200 (4 of 4096 colors), and 640×400p (mono) is an uncompetitive joke when compared to Mac LC I. |
And? Irrelevant. Quote:
Quote:
And? Consumer 3D was born with Amiga, and 3D applications usually came with 68000 and 68020+68881 executables to allow all users to use them.
| 1. Amiga was inferior in texture-mapped 3D |
And who cares? 3D applications never had such problem. Quote:
and didn't scale with SMP per render node. |
Please, tell me more about Apple and all other players. To be more clear: when did MacOS support SMP? When were SMP Windows NT systems available?
Amiga was The King of 3D applications and never faced such problems, like all other competitors (in its market segment). Quote:
2. A2620's 1988 release was late. |
Irrelevant. The Amiga had several accelerator cards available since the Amiga 1000. Quote:
For OCS's stable high-resolution display, the production scale for the A2024 monitor is around 5000 units. Read Commodore - The Final Years book. |
I know it, and? Quote:
3. 3D wasn't the only graphics application for FPU e.g. vector 2D artwork with DTP. Macintosh romp home with DTP in combination with MS Excel GUI / MS Word GUI. |
Irrelevant. The Amiga had several accelerator cards available since the Amiga 1000.
BTW, I've used Professional Page on my unexpanded Amiga 2000 (512kB Chip RAM + 512kB of Stupid Slow Mem) achieving great results on DTP, and the system was very usable. It could have been much better with Fast Mem and an accelerator card, but I was NOT a professional. Professionals had no problem buying them. Quote:
4. PC had Autodesk's 3D Studio 1988 (3D Studio MAX for NT 3.5 in 1996) |
Irrelevant. The Amiga was The King of 3D and it had several accelerator cards available since the Amiga 1000. Quote:
and AutoCAD for a long time. AutoCAD Release 10 1988 was available for MacOS System 6, Xenix, MS-DOS, and OS/2 1.x. Macintosh GUI business install base exceeds the Amiga. |
Looking at this list, it could have easily been ported to the Amiga: Mac OS 6 sucked and MS-DOS even more.
So, not an Amiga limit. Which, BTW, had also good and competitive CADs (but it wasn't my domain and never tried one). Quote:
AutoCAD Release 11 Mac edition was released in 1992 in time for Apple's best-selling 1992 Mac LC II. AutoCAD Release 12 was released for MacOS 7 in 1992 and Windows 3.1 in 1993. |
Same as above: not an inherent problem of the Amiga OS. Quote:
Multiple millions of Macs beat a few 100,000-scale A2000s. |
Totally irrelevant. Quote:
5. Full ECS was released in 1990 with A3000 which is inferior to Mac LC I's entry-level SVGA. Starting from October 1990, Mac LC I's 500,000 unit sales in its 1st year have no problems beating A2000 and A3000 unit sales.
A1000plus and A3000plus for 1991 were supposed to counter the Mac LC series and Mac IIsi. |
Same here: totally irrelevant. Quote:
For AutoCad's install base with stable high-resolution graphics. |
I reveal you a secret: Amiga had additional graphic cards available, for professional markets. Quote:
According to Dataquest November 1989, VGA crossed more than 50 percent market share in 1989 i.e. 56%. http://bitsavers.trailing-edge.com/components/dataquest/0005190_PC_Graphics_Chip_Sets--Product_Analysis_1989.pdf
Low-End PC Graphics Market Share by Standard Type Estimated Worldwide History and Forecast
Total low-end PC graphic chipset shipment history and forecast 1987 = 9.2. million, VGA 16.4% market share i.e. 1.5088 million VGA. 1988 = 11.1 million, VGA 34.2% i.e. 3.79 million VGA. 1989 = 13.7 million, VGA 54.6% i.e. 7.67 million VGA. 1990 = 14.3 million, VGA 66.4% i.e. 9.50 million VGA. 1991 = 15.8 million, VGA 76.6% i.e. 12.10 million VGA. 1992 = 16.4 million, VGA 84.2% i.e. 13.81 million VGA. 1993 = 18.3 million, VGA 92.4% i.e. 16.9 million VGA.
The estimate for the Amiga AGA install base is about 500,000 units. PC VGA/SVGA crushed the Amiga AGA on the production scale.
IBM 8514 and its clones exceeded OCS's A2024 monitor's 5000 units production scale! |
Irrelevant. Quote:
Quote:
And... rolling drum... they had no crystal ball to forecast this change: who has pointing to only enforce the integer performance got a bad surprise when Quake arrived. Bad luck. DOT! |
N64 was in development during 1993 and SGI offered $40 MIPS R4000 with relatively strong FPU.
https://web.archive.org/web/20150208022940/http://www.nytimes.com/1993/08/21/business/company-news-video-game-link-is-seen-for-nintendo.html [code] A computer industry official said MIPS, a subsidiary of Silicon Graphics, had developed a version of its R4000 processor that operated on less than one-half watt and could be produced for about $40 each [/code]
Ex-original Amiga engineers' 3DO M2's dual IBM PPC 602 @ 66 Mhz cores including full pipelined FP32 FPU in 1995. Around 1998, AMD would re-enter the game console market during early original Xbox development (Project Midway).
Intel's Xbox involvement was later in 2001. https://kotaku.com/report-xboxs-last-second-intel-switcheroo-left-amd-eng-1847851074 Bill Gates made an Intel CPU announcement in 2001 overriding the Xbox team's AMD CPU selection in public.
Removing Bill Gates as Microsoft's CEO also removes the systemic privilege link with Intel Corp. |
Irrelevant + usual Hammer's PADDING. |
| Status: Offline |
| | Hammer
 |  |
Re: 32-bit PPC on FPGA Posted on 25-Mar-2025 1:59:53
| | [ #456 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6320
From: Australia | | |
|
| @matthey
Quote:
On paper, the 1985 Amiga is inferior to the 1988 Sega Genesis/Mega Drive and 1990 SNES in almost every way.
|
Factor in the release timeline and market area.
Sega Genesis/Mega Drive has a discrete 64K VRAM, in addition to the 68000's 16-bit system bus, which is connected to the user game ROM and system RAM.
Against Sega Genesis/Mega Drive, Amiga OCS has its advantages e.g. Sega Mega Drive has an inferior palette of 512 color shades. Elf Mania and Shadow Fighter show Amiga OCS's strong technical 2D game delivery.
In a head-to-head battle against the Sega Mega Drive in Commodore's core European market, the Amiga (mostly Amiga 500 OCS) still scored its highest unit sales in 1991.
The major change in 1992's competitive environment near the A600's price range was SNES's 1992 European release.
| Status: Offline |
| | Hammer
 |  |
Re: 32-bit PPC on FPGA Posted on 25-Mar-2025 2:18:13
| | [ #457 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6320
From: Australia | | |
|
| @cdimauro
Quote:
The main problem is that having only 512kB of Chip Mem heavily crippled what was doable with an Amiga OCS/ECS. Thanks to the "brilliant" Commodore engineers, which added such idiotic Slow Mem.
|
ECS Agnus wasn't completed for A500's Rev 3 1987 release, hence the quick 512KB Slow RAM hack.
ECS Agnus' 1 MB and 2 MB Chip RAM address range capability is the same for the canceled Amiga Ranger. At least ECS Agnus A + ECS Denise was demonstrated in Q4 1988.
1989 era A500 Rev 6A didn't have A3000's $C0 range shadow on the Chip RAM address range feature.
A500 Rev 6A had to support A500 Rev 3/Rev5's $C0 512KB Slow RAM hack.
You can have some kind of 1MB Chip RAM even with 0.5MB Chip RAM + 0.5MB Slow RAM configuration as long as the Agnus is an ECS model e.g. a copper pointer set to 0x090000 sees memory at 0xC10000.
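The copper pointer example above can be written out as a small address translation. This Python sketch assumes a simple linear mapping, with the upper 512KB of the ECS Agnus 1MB range landing on the $C00000 Slow RAM block as seen by the CPU; the constant and function names are made up for illustration, and real boards may differ by revision and jumper settings.

```python
CHIP_RAM_SIZE = 0x080000   # 512 KiB of real Chip RAM at $000000
SLOW_RAM_BASE = 0xC00000   # CPU-side base of the 512 KiB Slow RAM
SLOW_RAM_SIZE = 0x080000


def agnus_to_cpu(agnus_addr):
    """Translate an address as seen by ECS Agnus into the CPU's view.

    Assumption: addresses in the second 512 KiB of Agnus' 1 MiB range
    map linearly onto the $C00000 Slow RAM block.
    """
    if 0 <= agnus_addr < CHIP_RAM_SIZE:
        return agnus_addr                      # ordinary Chip RAM
    if agnus_addr < CHIP_RAM_SIZE + SLOW_RAM_SIZE:
        return SLOW_RAM_BASE + (agnus_addr - CHIP_RAM_SIZE)
    raise ValueError("outside the 1 MiB range modelled here")


if __name__ == "__main__":
    # The example from the post: copper pointer 0x090000 -> 0xC10000.
    print(hex(agnus_to_cpu(0x090000)))
```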
A500 Rev 6A is the definitive Amiga 500 due to record sales from 1989 to 1991. A500 Rev 3 and Rev 5 are the minority.
Last edited by Hammer on 25-Mar-2025 at 02:20 AM.
| Status: Offline |
| | Hammer
 |  |
Re: 32-bit PPC on FPGA Posted on 25-Mar-2025 2:50:11
| | [ #458 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6320
From: Australia | | |
|
| @cdimauro
Quote:
Relevant for mainstream 3rd party developers i.e. install base numbers matter.
MegaST's minority sale numbers didn't change the overall Atari ST experience.
Quote:
And who cares? 3D applications never had such problem.
|
Prove Amiga's Lightwave has 4 million unit sales. Hint: PC's Doom/Doom 2 has 4 million unit sales.
Quote:
Please, tell me more about Apple and all other players. To be more clear, when MacOS supported SMP? When SMP Windows NT systems were available?
|
Did you forget the multi-CPU RISC Windows NT Lightwave hype in English Amiga magazines?
Quote:
Irrelevant. The Amiga had several accelerator cards available since the Amiga 1000.
|
Irrelevant. Hint: lack of statistics visibility for 3rd party CPU accelerated Amigas.
Quote:
Hint: Market size with productivity resolution capability. Apple was able to create a large enough business customer base which is not dependent on the lower price game market.
Hint 2: Commodore's primary revenue streams are from the C64 and the Amiga. CSG LSI group's VIC-20 and C64 revenue streams have been sustaining Commodore, NOT the "system engineering" group (e.g. C900 debacle, AMIX debacle and took over the Amiga project from the original Los Gatos Amiga team).
Amiga's revenue generation took over from the C64 during Amiga's golden years until it crashed and burned with the A600 project.
Quote:
Irrelevant. The Amiga was The King of 3D and it had several accelerator cards available since the Amiga 1000.
|
False.
https://www.annualreports.com/HostedData/AnnualReportArchive/a/NASDAQ_ADSK_1990.pdAutoDesk's 1990 financials.
The total MS-DOS-based CAD 3D market in 1989 is $252 million and Autodesk's market share is 63.5 percent of it. This doesn't include Autodesk's 3D software offerings in OS/2, SCO Xenix 386, and Macintosh markets.
You're in dreamland.
Quote:
Relevant for the platform's survival as a mainstream ongoing concern i.e. large economies of scale matters.
Quote:
I reveal you a secret: Amiga had additional graphic cards available, for professional markets.
|
I reveal you a secret: Amiga's add-on graphic cards don't have economies of scale.
Quote:
Relevant for VGA's absolute sales numbers.
Italy lost WW2 due to weak economies of scale and you're weak at economies of scale.
Last edited by Hammer on 25-Mar-2025 at 03:05 AM. Last edited by Hammer on 25-Mar-2025 at 03:03 AM.
| Status: Offline |
| | cdimauro
|  |
Re: 32-bit PPC on FPGA Posted on 25-Mar-2025 5:53:45
| | [ #459 ] |
| |
 |
Elite Member  |
Joined: 29-Oct-2012 Posts: 4298
From: Germany | | |
|
| @Hammer
Quote:
Hammer wrote: @cdimauro
Quote:
Relevant for mainstream 3rd party developers i.e. install base numbers matter.
MegaST's minority sale numbers didn't change the overall Atari ST experience. |
What's not clear to you is that the install base number is irrelevant when talking about professional software, as Mac, Atari ST and Amiga have proved. That's the first point.
The second point is that the same install base is irrelevant for professionals, because they can buy additional cards to accelerate / improve the original systems, to satisfy their needs. And that's the second point.
I'll not repeat them again. Quote:
Quote:
And who cares? 3D applications never had such problem.
|
Prove Amiga's Lightwave has 4 million unit sales. Hint: PC's Doom/Doom 2 has a 4 million unit sales. |
Irrelevant. See above. Quote:
Quote:
Please, tell me more about Apple and all other players. To be more clear, when MacOS supported SMP? When SMP Windows NT systems were available?
|
Did you forget the multi-CPU RISC Windows NT Lightwave hype in English Amiga magazines? |
You haven't answered. What a news... Quote:
Quote:
Irrelevant. The Amiga had several accelerator cards available since the Amiga 1000.
|
Irrelevant. Hint: lack of statistics visibility for 3rd party CPU accelerated Amigas. |
I'll check the old magazines when I've time and provide you that information.
"Strangely", it's the first time that you don't make your usual searches on such old magazines. Quote:
Quote:
Hint: Market size with productivity resolution capability. Apple was able to create a large enough business customer base which is not dependent on the lower price game market.
Hint 2: Commodore's primary revenue streams are from the C64 and the Amiga. CSG LSI group's VIC-20 and C64 revenue streams have been sustaining Commodore, NOT the "system engineering" group (e.g. C900 debacle, AMIX debacle and took over the Amiga project from the original Los Gatos Amiga team).
Amiga's revenue generation took over from C64 during Amiga's golden years until it crashed and burn with the A600 project. |
Nevertheless, third parties covered it (much better than Commodore) with their solutions to overcome those limits.
Do you need references here, or is the current thread on EAB, where you, Bruce and ThOR are fighting and which recently reported some of them, enough? Quote:
Irrelevant. This has NOT stopped other companies from developing other CAD software, even for the Amiga. Quote:
Sure. Told by a bot which doesn't even understand the context of a discussion, I take it as a compliment. Quote:
Quote:
Relevant for the platform's survival as a mainstream ongoing concern i.e. large economies of scale matters. |
Irrelevant: see above. Quote:
Quote:
I reveal you a secret: Amiga had additional graphic cards available, for professional markets.
|
I reveal you a secret: Amiga's add-on graphic cards don't have economies of scale. |
Same as above. Quote:
Quote:
Relevant for VGA's absolute sales numbers. |
Same as above. Quote:
Italy lost WW2 due to weak economies of scale and you're weak at economies of scale. |
Even worse.
To sum it up, you've no experience of markets other than PCs. Hence your continuous reporting of absolute PC numbers as your only way to compare the different markets.
The reality was made by professional software and additional cards which were available for markets other than this one, which allowed professionals to do their job, and many times better than on PCs.
When you're reborn, maybe you can enjoy what other people have already done.
Finally, and of course, IN THE LONG RUN PCs dominated all markets. But in the long run: NOT at the time when we're talking about, which is THE CONTEXT OF THE DISCUSSION. |
| Status: Offline |
| | Hammer
 |  |
Re: 32-bit PPC on FPGA Posted on 25-Mar-2025 12:20:54
| | [ #460 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6320
From: Australia | | |
|
| @cdimauro
Quote:
What's not clear to you is that the install base number is irrelevant when talking about professional software, as Mac, Atari ST and Amiga have proved. That's the first point.
The second point is that the same install base is irrelevant for professionals, because they can buy additional cards to accelerate / improve the original systems, to satisfy their needs. And that's the second point.
I'll not repeat them again. (SNIP for out of topic)
|
Irrelevant. This topic is about a platform's survival.
Against strong 2D gaming experience platforms like SNES, what's not clear to you is that the size of the install base with business/education customers is important for a platform vendor's survival.
Apple's business/education customer base could buy 1 million PowerMacs in less than a year i.e. the March 1994 to January 1995 time frame.
You're out of this topic.
Quote:
The second point is that the same install base is irrelevant for professionals, because they can buy additional cards to accelerate / improve the original systems, to satisfy their needs. And that's the second point.
|
No shit.
The second point is that add-on cards require a base platform that can accept them, and the big box Amigas are many magnitudes smaller than the Macintosh platform.
Great Valley Products liquidated itself in July 1995. In terms of 1991 revenue, Great Valley Products was one of the largest third-party add-on providers for the Amiga. Phase 5 survived for a while before filing for insolvency and playing company shell games.
The PPC AmigaNone camp's failure is treating the Amiga like a Mac, which it is not. Hint: demographics matter. Amithlon's "we don't care about games" stance was the same failure. Reminder: the Amiga is not a Mac.
Low-volume workstation vendors like SGI usually offset their low sales volume with prices well above the PC's. SGI attempted mass production via the N64 partnership, but SGI rebels founded ArtX instead and won the GameCube's GPU contract.
The PC market slowly chipped away at SGI's advantages.
It is a similar story for Evans & Sutherland, which offered low-volume rendering workstations at high cost, attempted to compete in Windows NT's professional OpenGL market, and was pushed out by NVIDIA's mass-produced NV10-based GeForce / Quadro. Evans & Sutherland exited workstations and the OpenGL add-on market, with NVIDIA assimilating its patents.
Old-school big-iron Unix workstation vendors were killed by the PC's death by a thousand cuts.
The platform's market economics matter for the commercial add-on vendor's production scale.
The Amiga's 3D raytracing companies, like NewTek and Maxon, have exited the Amiga platform.
LightWave users followed NewTek's exit from the Amiga market. Maxon users followed Maxon's exit from the Amiga market.
The XCad vendor, CADVision International, has exited the Amiga platform. CADVision Systems is a major provider for Dassault Systèmes SOLIDWORKS.
https://www.nextcomputers.org/NeXTfiles/Articles/NeXTWORLD/NeXTWORLD_Extra/92.04.SummerNWE/92.04.SummerNWExtra12.html Ditek has sold 10,000 units of DynaCADD since 1985, with 30 percent in North America and most of the balance in Germany, where Amiga and Atari flourish. Also shipping this fall, according to Asher, will be 10 to 15 third-party products taking advantage of DynaCADD's open, extensible architecture.
You're out of this topic's platform survival debate. Last edited by Hammer on 25-Mar-2025 at 01:16 PM.
_________________ Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68) Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68) Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB |
| Status: Offline |